Article

Accuracy and Efficiency of Various GMM Inference Techniques in Dynamic Micro Panel Data Models

1
Amsterdam School of Economics, University of Amsterdam, P.O. Box 15867, 1001 NJ Amsterdam, The Netherlands
2
IKZ, Newtonlaan 1-41, 3584 BX Utrecht, The Netherlands
*
Author to whom correspondence should be addressed.
Econometrics 2017, 5(1), 14; https://doi.org/10.3390/econometrics5010014
Submission received: 28 December 2016 / Revised: 6 March 2017 / Accepted: 10 March 2017 / Published: 20 March 2017
(This article belongs to the Special Issue Recent Developments in Panel Data Methods)

Abstract

Studies employing Arellano-Bond and Blundell-Bond generalized method of moments (GMM) estimation for linear dynamic panel data models are growing exponentially in number. However, for researchers it is hard to make a reasoned choice between many different possible implementations of these estimators and associated tests. By simulation, the effects are examined of many options regarding: (i) reducing, extending or modifying the set of instruments; (ii) specifying the weighting matrix in relation to the type of heteroskedasticity; (iii) using (robustified) 1-step or (corrected) 2-step variance estimators; (iv) employing 1-step or 2-step residuals in Sargan-Hansen overall or incremental overidentification restrictions tests. This is all done for models in which some regressors may be either strictly exogenous, predetermined or endogenous. Surprisingly, particular asymptotically optimal and relatively robust weighting matrices are found to be superior in finite samples to ostensibly more appropriate versions. Most of the variants of tests for overidentification and coefficient restrictions show serious deficiencies. The variance of the individual effects is shown to be a major determinant of the poor quality of most asymptotic approximations; therefore, the accurate estimation of this nuisance parameter is investigated. A modification of GMM is found to have some potential when the cross-sectional heteroskedasticity is pronounced and the time-series dimension of the sample is not too small. Finally, all techniques are applied to actual data and lead to insights which differ considerably from those published earlier.

1. Introduction

One of the major attractions of analyzing panel data rather than single indexed variables is that they allow us to cope with the empirically very relevant situation of unobserved heterogeneity correlated with included regressors. Econometric analysis of dynamic relationships on the basis of panel data, where the number of surveyed individuals is relatively large while covering just a few time periods, is very often based on GMM (generalized method of moments). Its reputation is built on its claimed flexibility, generality, ease of use, robustness and efficiency. Widely available standard software enables us to estimate models including exogenous, predetermined and endogenous regressors consistently, while allowing for semiparametric approaches regarding the presence of heteroskedasticity and the type of distribution of the disturbances. This software also provides specification checks regarding the adequacy of the internal and external instrumental variables employed and the specific assumptions made regarding (absence of) serial correlation.
Especially popular are the GMM implementations put forward by Arellano and Bond [1]. However, practical problems have often been reported, such as vulnerability due to the abundance of internal instruments, discouraging improvements of 2-step over 1-step GMM findings, poor size control of test statistics, and weakness of instruments especially when the dynamic adjustment process is slow (a root is close to unity). As remedies it has been suggested to reduce the number of instruments by renouncing some valid orthogonality conditions, but also to extend the number of instruments by adopting more orthogonality conditions. Extra orthogonality conditions can be based on certain homoskedasticity or stationarity assumptions or initial value conditions, see Blundell and Bond [2]. By abandoning weak instruments finite sample bias may be reduced, whereas by extending the instrument set with a few strong ones the bias may be further reduced and the efficiency enhanced. Presently, it is not clear yet how practitioners can best make use of these suggestions, because no set of preferred testing tools is yet available, nor a comprehensive sequential specification search strategy, which in a systematic fashion allow us to select instruments by assessing both their validity and their strength, as well as to classify individual regressors accurately as relevant and either endogenous, predetermined or strictly exogenous. Therefore, it happens often that, in applied research, models and techniques are selected simply on the basis of the perceived significance and plausibility of their coefficient estimates, whereas it is well known that imposing invalid coefficient restrictions and employing regressors wrongly as instruments will often lead to relatively small estimated standard errors. Then, however, these provide misleading information on the actual precision of the often seriously biased estimators.
The available studies on the performance of alternative inference techniques for dynamic panel data models have obvious limitations when it comes to advising practitioners on the most effective implementations of estimators and tests under general circumstances. As a rule, they do not consider various empirically relevant issues in conjunction, such as: (i) occurrence and the possible endogeneity of regressors additional to the lagged dependent variable, (ii) occurrence of individual effect (non-)stationarity of both the lagged dependent variable and other regressors, (iii) cross-section and/or time-series heteroskedasticity of the idiosyncratic disturbances, and (iv) variation in signal-to-noise ratios and in the relative prominence of individual effects. For example: the simulation results in Arellano and Bover [3], Hahn and Kuersteiner [4], Alvarez and Arellano [5], Hahn et al. [6], Kiviet [7], Kruiniger [8], Okui [9], Roodman [10], Hayakawa [11] and Han and Phillips [12] just concern the panel AR(1) model under homoskedasticity. Although an extra regressor is included in the simulation studies in Arellano and Bond [1], Kiviet [13], Bowsher [14], Hsiao et al. [15], Bond and Windmeijer [16], Bun and Carree [17,18], Bun and Kiviet [19], Gouriéroux et al. [20], Hayakawa [21], Dhaene and Jochmans [22], Flannery and Hankins [23], Everaert [24] and Kripfganz and Schwarz [25], this regressor is (weakly-)exogenous and most experiments just concern homoskedastic disturbances and stationarity regarding the impact of individual effects. Blundell et al. [26] and Bun and Sarafidis [27] include an endogenous regressor, but their design does not allow us to control the degree of simultaneity; moreover, they stick to homoskedasticity. Harris et al. [28] only examine the effects of neglected endogeneity. 
Heteroskedasticity is considered in a few simulation experiments in Arellano and Bond [1] in the model with an exogenous regressor, and just for the panel AR(1) case in Blundell and Bond [2]. Windmeijer [29] analyzes panel GMM with heteroskedasticity, but without including a lagged dependent variable in the model. Bun and Carree [30] and Juodis [31] examine effects of heteroskedasticity in the model with a lagged dependent and a strictly exogenous regressor under stationarity regarding the effects. Moral-Benito [32] examines stationary and nonstationary regressors in a dynamic model with heteroskedasticity, but the extra regressor is predetermined or strictly exogenous. Moreover, his study is restricted to time-series heteroskedasticity, while assuming cross-sectional homoskedasticity. In a micro context cross-sectional heteroskedasticity seems more realistic to us, whereas it is also trickier when N is large and T small.
So, knowledge is still scarce with respect to the performance of GMM when it is needed not only to cope with genuine simultaneity (which we consider to be the core of econometrics), but also because of the occurrence of heteroskedasticity of unknown form. Moreover, many of the simulation studies mentioned above did not systematically explore the effects of relevant nuisance parameter values on the finite sample distortions to asymptotic approximations. We examine estimating a prominent nuisance parameter, namely the variance of the individual effects, which to date has received surprisingly little attention in the literature. Regarding the performance of tests on the validity of instruments, worrying results have been obtained in Bowsher [14] and Roodman [10] for homoskedastic models. On the other hand, Bun and Sarafidis [27] report reassuring results, but these just concern models where $T = 3$. Hence, it is useful to examine more cases: our grid of examined cases will be much wider and have more dimensions. Moreover, we will deliberately explore both feasible and unfeasible versions of estimators and test statistics (in unfeasible versions particular nuisance parameter values are assumed to be known). Therefore we will be able to draw more useful conclusions on what aspects do have major effects on any inference inaccuracies in finite samples.
The data generating process designed here can be simulated for classes of models which may include individual and time effects, a lagged dependent variable regressor and another regressor which may be correlated with these and other individual effects and be either strictly exogenous or jointly dependent with regard to the idiosyncratic disturbances, whereas the latter may show a form of cross-section heteroskedasticity associated with both the individual effects. For a range of relevant parameter values we will verify in moderately large samples the properties of alternative GMM estimators, both 1-step and 2-step, focussing on alternative implementations regarding the weighting matrix and corresponding corrections to variance estimates according to the often practiced approach by Windmeijer [29]. This will include variants of the popular system estimator, which exploit as instruments the first-difference of lagged internal variables for the untransformed model in addition to lagged level internal variables as instruments for the model from which the individual effects have been removed. We will examine cases where the extra instruments are (in)valid in order to verify whether particular tests for overidentification restrictions have appropriate size and power, such that with reasonable probabilities valid instruments will be recognized as appropriate and invalid instruments will be detected and can be discarded. Moreover, following Kiviet and Feng [33], we shall investigate a rather novel modification of the traditional GMM implementation which aims at improving the strength of the exploited instruments in the presence of heteroskedasticity. Of course, also the simulation design used here has its limitations. It has only one extra regressor next to the lagged dependent variable, we only consider cross-sectional heteroskedasticity, and all basic random terms have been drawn from the normal distribution. 
Moreover, the design does not accommodate general forms of cross-sectional dependence between error terms. However, by including individual and time specific effects particular simple forms of cross-sectional dependence are accommodated.
Due to the high dimensionality of the Monte Carlo design a general discussion of the major findings is hard, because particular qualities (and failures) of inference techniques are usually not global but only occur in a particular limited context. However, in the last but one paragraph of the concluding Section 7 we nevertheless provide a list of eleven (a through k) established observations which seem very useful for practitioners. Here we will list a few more pieces of advice, most of which seem contrary to current dominant practice: (i) many studies claim to have dealt with the limitations of a static model by just including the lagged dependent variable, whereas its single extra coefficient just leads to a highly restrictive dynamic model; (ii) it seems widely believed that an exogenous regressor should just be instrumented by itself, whereas using its lags as instruments too is highly effective to instrument further non-exogenous regressors; (iii) test statistics involving a large number of degrees of freedom will generally lack power when they jointly test restrictions of which only few are false and therefore Sargan-Hansen statistics should as a rule be partitioned into a series of well-chosen increments; (iv) it has been reported (see, for instance, Hayashi [34] p. 218), that Sargan-Hansen tests tend to overreject, especially when using many instruments, though in our simulations we find that underrejection is predominant, as already reported under homoskedasticity by Bowsher [14]; (v) estimates of nuisance parameters are generally useful to interpret estimated parameters of primary interest and therefore not only the variance of the idiosyncratic disturbances but also the variance of the individual effects should be examined.
The structure of this study is as follows. In Section 2 we first present the major issues regarding IV and GMM coefficient and variance estimation in linear models and on inference techniques on establishing instrument validity and regarding the coefficient values by standard and by corrected test statistics. Next in Section 3 the generic results of Section 2 are used to discuss in more detail than provided elsewhere the various options for their implementation in linear models for single dynamic simultaneous micro econometric panel data relationships with both individual and time effects and some form of cross-sectional heteroskedasticity. In Section 4 the Monte Carlo design is developed to analyze and compare the performance of alternative often asymptotically equivalent inference methods in finite samples of empirically relevant parametrizations. Section 5 summarizes the simulation results, from which some preferred techniques for use in finite samples of particular models emerge, plus a warning regarding particular types of models that require more refined methods yet to be developed. An empirical illustration, which involves data on labor supply earlier examined by Ziliak [35], can be found in Section 6, where we also formulate a tentative comprehensive specification search strategy for dynamic micro panel data models. Finally, in Section 7 the major findings are summarized.

2. Basic GMM Results for Linear Models

Here we present concisely the major generic results on IV and GMM inference for single indexed data. First we define the model and estimators, discuss some of their special properties and consider specific test situations. From these general findings for linear regressions the examined implementations for specific linear dynamic panel data models follow easily in Section 3.

2.1. Model and Estimators

Let the scalar dependent variable $y_i$ depend linearly on $K$ regressors $x_i$ and an unobserved disturbance term $u_i$, and let there be $L \geq K$ variables $z_i$ (the instruments) that establish orthogonality conditions such that
$$y_i = x_i'\beta_0 + u_i, \qquad E[z_i(y_i - x_i'\beta_0)] = 0, \qquad i = 1, \ldots, N. \tag{1}$$
Here $x_i$ and $\beta_0$ are $K \times 1$ vectors, $\beta_0$ containing the true values of the unknown coefficients, and $z_i$ is an $L \times 1$ vector. Applying the analogy principle, the method of moments (MM) aims to find an estimator for model parameter $\beta$ by solving the $L$ sample moment equations
$$N^{-1}\sum_{i=1}^{N} z_i(y_i - x_i'\hat\beta) = 0. \tag{2}$$
Generally, these have a unique solution only when $L = K$ and then yield
$$\hat\beta = \Big(\sum_{i=1}^{N} z_i x_i'\Big)^{-1}\sum_{i=1}^{N} z_i y_i, \tag{3}$$
provided the inverse exists. For $L > K$ the MM recipe to find a unique estimator is: minimize with respect to $\beta$ the criterion function $\sum_{j=1}^{N}[(y_j - x_j'\beta)z_j']\, G \sum_{i=1}^{N}[z_i(y_i - x_i'\beta)]$ for some weighting matrix $G$. It can be shown that the asymptotically optimal choice for $G$ is an expression which has a probability limit that is proportional to the inverse of the asymptotic variance $V$ of $N^{-1/2}\sum_{i=1}^{N} z_i u_i \xrightarrow{d} N(0, V)$. When $u_i \sim iid(0, \sigma_u^2)$ an optimal choice for $G$ is proportional to the inverse of $\sum_{i=1}^{N} z_i z_i'$ and the MM estimator is
$$\hat\beta_{IV} = [X'Z(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'y, \tag{4}$$
where $y = (y_1 \cdots y_N)'$, $X = (x_1 \cdots x_N)'$ and $Z = (z_1 \cdots z_N)'$. But, when $E(z_i u_i) = 0$ while $u = (u_1 \cdots u_N)' \sim (0, \sigma_u^2\Omega)$, where $\Omega$ has full rank and without loss of generality $tr(\Omega) = N$, the optimal choice for $G$ is a matrix proportional to $(Z'\Omega Z)^{-1}$, yielding MM estimator
$$\hat\beta_{GMM} = [X'Z(Z'\Omega Z)^{-1}Z'X]^{-1}X'Z(Z'\Omega Z)^{-1}Z'y. \tag{5}$$
Note that for $\Omega = I_N$ the latter formula simplifies to $\hat\beta_{IV}$. When $L = K$ both $\hat\beta_{GMM}$ and $\hat\beta_{IV}$ simplify to (3) or $(Z'X)^{-1}Z'y$. Below we focus on cases where $\Omega$ is diagonal.
When $\Omega$ is unknown and therefore (5) is unfeasible, one should use an informed guess $\Omega^{(0)}$ to obtain the 1-step estimator $\hat\beta_{GMM}^{(1)}$, which is sub-optimal when $\Omega^{(0)} \neq \Omega$, though consistent under the assumptions made. Then the residuals
$$\hat u^{(1)} = y - X\hat\beta_{GMM}^{(1)} \tag{6}$$
are consistent for $u$. In cases where $\hat G^{(1)} = \big(N^{-1}\sum_{i=1}^{N}(\hat u_i^{(1)})^2 z_i z_i'\big)^{-1}$ is such that $\mathrm{plim}\,(\hat G^{(1)} - [N^{-1}Z'\Omega Z]^{-1}) = O$ the 2-step estimator
$$\hat\beta_{GMM}^{(2)} = [X'Z\hat G^{(1)}Z'X]^{-1}X'Z\hat G^{(1)}Z'y \tag{7}$$
is asymptotically equivalent to $\hat\beta_{GMM}$ and thus asymptotically optimal, given the $L$ instruments used.
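The estimators in this subsection can be illustrated with a minimal NumPy sketch. It is not the implementation used in the study: the simulated design, the helper names `gmm` and `one_and_two_step`, and all parameter values are assumptions made for illustration. The 1-step estimator takes $\Omega^{(0)} = I_N$, and the 2-step weighting matrix omits the factor $N^{-1}$ because any scalar multiple of the weighting matrix leaves the estimator unchanged.

```python
import numpy as np

def gmm(y, X, Z, G):
    """Linear GMM estimator [X'Z G Z'X]^{-1} X'Z G Z'y for a given weighting matrix G."""
    XZ = X.T @ Z                                  # K x L
    return np.linalg.solve(XZ @ G @ XZ.T, XZ @ G @ (Z.T @ y))

def one_and_two_step(y, X, Z):
    """1-step GMM with Omega^(0) = I_N (that is, IV), followed by 2-step GMM."""
    b1 = gmm(y, X, Z, np.linalg.inv(Z.T @ Z))
    u1 = y - X @ b1                               # 1-step residuals
    S = (Z * u1[:, None] ** 2).T @ Z              # sum_i u_i^2 z_i z_i'
    b2 = gmm(y, X, Z, np.linalg.inv(S))           # scalar factors in G cancel
    return b1, b2

# Hypothetical design: one endogenous and one exogenous regressor, L = 4 > K = 2.
rng = np.random.default_rng(0)
N = 2000
Z = rng.normal(size=(N, 4))
e = rng.normal(size=N)
u = e * np.sqrt(0.5 + Z[:, 0] ** 2)               # heteroskedastic, E(z_i u_i) = 0
x_endog = Z @ np.array([1.0, 0.5, 0.5, 0.0]) + 0.8 * e + rng.normal(size=N)
X = np.column_stack([x_endog, Z[:, 3]])           # second regressor instruments itself
beta0 = np.array([1.0, -0.5])
y = X @ beta0 + u

b1, b2 = one_and_two_step(y, X, Z)
print("1-step:", b1, "2-step:", b2)
```

In a just-identified case ($L = K$) the function `gmm` returns $(Z'X)^{-1}Z'y$ whatever weighting matrix is passed, in line with the remark on (3).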

2.2. Some Algebraic Peculiarities

Defining $P_Z = Z(Z'Z)^{-1}Z'$ and $\hat X = P_Z X$ one finds $\hat\beta_{IV} = (\hat X'\hat X)^{-1}\hat X'y$, which highlights its two-stage least-squares character. Now suppose that $X = (X_1, X_2)$ and $Z = (Z_1, Z_2)$ with $Z_2 = X_2$, whereas $X\beta = X_1\beta_1 + X_2\beta_2$, where $\beta_1$ and $\beta_2$ have $K_1$ and $K_2$ elements respectively. Standard results on partitioned regression yield
$$\hat\beta_{1,IV} = (\hat X_1' M_{\hat X_2}\hat X_1)^{-1}\hat X_1' M_{\hat X_2}\, y = (X_1' P_{M_{X_2}Z_1} X_1)^{-1} X_1' P_{M_{X_2}Z_1}\, y,$$
which is the IV estimator in the regression of $y$ on just $X_1$ using the $L - K_2$ instruments $M_{X_2}Z_1$. This result is known as partialling out the predetermined regressors $X_2$. It follows from $\hat X_2 = P_Z X_2 = X_2$ which yields
$$M_{\hat X_2}\hat X_1 = M_{X_2}P_Z X_1 = M_{X_2}(P_{X_2} + P_{M_{X_2}Z_1})X_1 = P_{M_{X_2}Z_1}X_1.$$
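The partialling-out identity is purely algebraic, so it can be checked numerically on arbitrary data. The sketch below is only an illustration under assumed dimensions ($K_1 = 1$, $K_2 = 2$, three extra instruments); the helper `proj` and all variable names are hypothetical.

```python
import numpy as np

def proj(A):
    """Orthogonal projector onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

# Hypothetical data; the identity is algebraic, so any y will do.
rng = np.random.default_rng(1)
N = 200
Z1 = rng.normal(size=(N, 3))                      # instruments not in the model
X2 = rng.normal(size=(N, 2))                      # regressors that instrument themselves
X1 = Z1 @ rng.normal(size=(3, 1)) + X2 @ rng.normal(size=(2, 1)) + rng.normal(size=(N, 1))
y = rng.normal(size=N)

X, Z = np.hstack([X1, X2]), np.hstack([Z1, X2])
X_hat = proj(Z) @ X
beta_iv = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)   # full IV, K = 3

M_X2 = np.eye(N) - proj(X2)
P = proj(M_X2 @ Z1)                               # projector on the L - K2 instruments
beta1_partial = np.linalg.solve(X1.T @ P @ X1, X1.T @ P @ y)

print(beta_iv[0], beta1_partial[0])               # the two values should coincide
```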
A similar result is not straightforwardly available for GMM because of the following.
Let positive definite matrix $\Omega$ be factorized as follows
$$\Omega^{-1} = \Psi'\Psi, \quad \text{so} \quad \Omega = \Psi^{-1}(\Psi')^{-1}.$$
Now define
$$y^* = \Psi y, \qquad X^* = \Psi X, \qquad Z^{\dagger} = (\Psi')^{-1}Z,$$
then
$$\hat\beta_{GMM} = [X^{*\prime}Z^{\dagger}(Z^{\dagger\prime}Z^{\dagger})^{-1}Z^{\dagger\prime}X^*]^{-1}X^{*\prime}Z^{\dagger}(Z^{\dagger\prime}Z^{\dagger})^{-1}Z^{\dagger\prime}y^* = (X^{*\prime}P_{Z^{\dagger}}X^*)^{-1}X^{*\prime}P_{Z^{\dagger}}y^*,$$
so GMM is equivalent to IV using transformed variables, but where $Z$ has been transformed differently. Therefore, if $X_2$ is such that $X_2^*$ establishes valid instruments in the transformed model $y^* = X^*\beta + u^*$, where $u^* = \Psi u \sim (0, \sigma_u^2 I_N)$, the regressors $X_2^*$ are not used as instruments in GMM in its IV interpretation. They would, though, if one would deliberately choose $Z_2 = \Omega^{-1}X_2$.
As is well-known and easily verified, linear transformations of the matrix of instruments in which $Z$ is replaced by $ZC$, where $C$ is a full rank $L \times L$ matrix, have no effect on $\hat\beta_{IV}$ nor on $\hat\beta_{GMM}$. However, there is not such invariance when the matrix $Z$ is premultiplied by some transformation matrix, and hence not the columns but the rows of $Z$ are directly affected. It has been shown in Kiviet and Feng [33] that such transformations, chosen in correspondence with the required transformation of the model when $\Omega \neq I_N$, may lead to modified GMM estimation achieving higher efficiency levels and better results in finite samples than standard GMM, provided the validity of the transformed instruments is maintained. We will examine here the effects of employing transformation $Z^* = \Psi Z$, which provides the modified GMM estimator
$$\hat\beta_{MGMM} = [X'\Omega^{-1}Z(Z'\Omega^{-1}Z)^{-1}Z'\Omega^{-1}X]^{-1}X'\Omega^{-1}Z(Z'\Omega^{-1}Z)^{-1}Z'\Omega^{-1}y.$$
When this can be made feasible, it yields $\hat\beta_{MGMM}^{(2)}$. This modification attempts to employ the optimal instruments, see Arellano [36], Appendix B.
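To make the two transformations of the instruments concrete, the following sketch computes the unfeasible GMM estimator both directly and as IV on transformed variables, and then the modified GMM estimator that uses $\Psi Z$ as instruments. It assumes a known diagonal $\Omega$ and a simulated design; the function `iv` and all numerical values are hypothetical choices for illustration, not the design of the study.

```python
import numpy as np

def iv(y, X, Z):
    """IV / 2SLS estimator [X'Z (Z'Z)^{-1} Z'X]^{-1} X'Z (Z'Z)^{-1} Z'y."""
    XZ = X.T @ Z
    W = np.linalg.inv(Z.T @ Z)
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

# Hypothetical design with a known diagonal Omega (an unfeasible setting).
rng = np.random.default_rng(2)
N = 500
Z = rng.normal(size=(N, 3))
omega = np.exp(Z[:, 0])
omega *= N / omega.sum()                          # normalisation tr(Omega) = N
e = rng.normal(size=N)
u = np.sqrt(omega) * e
X = (Z @ np.array([1.0, 1.0, 0.5]) + 0.5 * e + rng.normal(size=N))[:, None]
y = 0.7 * X[:, 0] + u

psi = 1.0 / np.sqrt(omega)                        # diagonal of Psi, Psi'Psi = Omega^{-1}
y_s, X_s = psi * y, psi[:, None] * X              # y* = Psi y, X* = Psi X

# Standard GMM, directly and as IV with instruments (Psi')^{-1} Z
XZ = X.T @ Z
W = np.linalg.inv((Z * omega[:, None]).T @ Z)     # (Z' Omega Z)^{-1}
b_gmm = np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))
b_gmm_as_iv = iv(y_s, X_s, Z / psi[:, None])

# Modified GMM: IV in the transformed model with instruments Psi Z
b_mgmm = iv(y_s, X_s, psi[:, None] * Z)

print(b_gmm, b_gmm_as_iv, b_mgmm)
```

The first two estimates agree up to rounding, whereas the modified estimate generally differs because the rows rather than the columns of the instrument matrix are rescaled.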

2.3. Particular Test Procedures

Inference on elements of $\beta_0$ based on $\hat\beta_{GMM}^{(2)}$ of (7) requires an asymptotic approximation to its distribution. Under correct specification the standard first-order approximation is
$$\hat\beta_{GMM}^{(2)} \overset{a}{\sim} N(\beta_0, [X'Z\hat G^{(1)}Z'X]^{-1}). \tag{12}$$
It allows testing general restrictions by Wald-type tests. For an individual coefficient, say $\beta_{0k} = e_{K,k}'\beta_0$, where $e_{K,k}$ is a $K \times 1$ vector with all elements zero except its $k$-th element ($1 \leq k \leq K$) which is unity, testing $H_0: \beta_{0k} = \beta_k^0$ and allowing for one-sided alternatives, amounts to comparing test statistic
$$W_{\beta_k} = (e_{K,k}'\hat\beta_{GMM}^{(2)} - \beta_k^0)\,/\,\{e_{K,k}'[X'Z\hat G^{(1)}Z'X]^{-1}e_{K,k}\}^{1/2} \tag{13}$$
with the appropriate quantile of the standard normal distribution. Note that this test statistic is actually an asymptotic t-test; in finite samples the type I error probability may deviate from the chosen nominal level, also depending on any employed loss of degrees of freedom corrections. In fact it has been observed that the consistent estimator of the variance of two-step GMM estimators $\widehat{Var}(\hat\beta_{GMM}^{(2)})$ given in (12) often underestimates the finite sample variance, because in its derivation the randomness of $\hat G^{(1)}$ is not taken into account. Windmeijer [29] provides a corrected formula $\widehat{Var}_c(\hat\beta_{GMM}^{(2)})$, see Appendix A, which can be used in the corrected t-test
$$W_{\beta_k}^{c} = (e_{K,k}'\hat\beta_{GMM}^{(2)} - \beta_k^0)\,/\,\{e_{K,k}'\widehat{Var}_c(\hat\beta_{GMM}^{(2)})\,e_{K,k}\}^{1/2}. \tag{14}$$
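As an illustration of the uncorrected statistic (13) only, the sketch below computes the asymptotic t-ratio from a 2-step GMM fit. The Windmeijer correction of Appendix A is deliberately not implemented here. The variance is computed as $[X'Z S^{-1} Z'X]^{-1}$ with $S = \sum_i (\hat u_i^{(1)})^2 z_i z_i'$, which is an assumption about the intended scaling; the simulated design and the function names are hypothetical as well.

```python
import math
import numpy as np

def two_step_gmm(y, X, Z):
    """2-step GMM estimate and its uncorrected variance estimate."""
    XZ = X.T @ Z
    W1 = np.linalg.inv(Z.T @ Z)
    b1 = np.linalg.solve(XZ @ W1 @ XZ.T, XZ @ W1 @ (Z.T @ y))
    u1 = y - X @ b1
    W2 = np.linalg.inv((Z * u1[:, None] ** 2).T @ Z)   # inverse of sum u_i^2 z_i z_i'
    A = XZ @ W2 @ XZ.T
    b2 = np.linalg.solve(A, XZ @ W2 @ (Z.T @ y))
    return b2, np.linalg.inv(A)

def wald_t(b, V, k, value):
    """Asymptotic t-ratio for H0: beta_k = value (k is a zero-based index)."""
    return (b[k] - value) / math.sqrt(V[k, k])

def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# Hypothetical data with one endogenous regressor and three instruments.
rng = np.random.default_rng(4)
N = 1000
Z = rng.normal(size=(N, 3))
e = rng.normal(size=N)
u = e * np.sqrt(0.5 + Z[:, 0] ** 2)
X = (Z @ np.array([1.0, 0.5, 0.5]) + 0.8 * e + rng.normal(size=N))[:, None]
y = 1.0 * X[:, 0] + u

b2, V = two_step_gmm(y, X, Z)
t_true = wald_t(b2, V, 0, 1.0)                    # null holds in this design
t_false = wald_t(b2, V, 0, 2.0)                   # null is false
print(t_true, 1.0 - normal_cdf(t_true))           # one-sided p-value, upper tail
print(t_false, normal_cdf(t_false))               # one-sided p-value, lower tail
```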
Provided $L > K$ the overidentification restrictions can be tested by the Sargan-Hansen statistic. When $\Omega = I_N$ this is given by $J_Z = \hat u_{IV}'P_Z\hat u_{IV}/\hat\sigma_u^2$, where $\hat u_{IV} = y - X\hat\beta_{IV}$ and $\hat\sigma_u^2 = \hat u_{IV}'\hat u_{IV}/N$. Because $P_Z(y - X\hat\beta_{IV}) = P_Z[I_N - X(\hat X'\hat X)^{-1}\hat X']u = P_Z M_{\hat X}u = (P_{\hat X} + P_{\hat X_\perp})M_{\hat X}u = P_{\hat X_\perp}u$, where $P_{\hat X_\perp}$ is idempotent, $(\hat X, \hat X_\perp)$ spans the same subspace as $Z$ while $\hat X'\hat X_\perp = O$, $J_Z$ is asymptotically $\chi^2$ distributed with $rank(\hat X_\perp) = L - K$ degrees of freedom under correct specification and using valid instruments.
For general $\Omega$ the derivations in subsection 2.2 indicate that its GMM generalization is simply given by $J_Z = (y^* - X^*\hat\beta_{GMM})'P_{Z^{\dagger}}(y^* - X^*\hat\beta_{GMM})/\hat\sigma_u^2$, where $Z^{\dagger} = (\Psi')^{-1}Z$, which necessarily has asymptotic null distribution $\chi^2_{L-K}$ too. Here
$$\hat\sigma_u^2 = (y^* - X^*\hat\beta_{GMM})'(y^* - X^*\hat\beta_{GMM})/N = \hat u_{GMM}'\Omega^{-1}\hat u_{GMM}/N, \tag{15}$$
where $\hat u_{GMM} = y - X\hat\beta_{GMM}$, and $P_{Z^{\dagger}}(y^* - X^*\hat\beta_{GMM}) = (\Psi')^{-1}Z(Z'\Omega Z)^{-1}Z'\hat u_{GMM}$. Substitution yields for $J_Z$ the familiar expression
$$J_Z = \hat u_{GMM}'Z(Z'\Omega Z)^{-1}Z'\hat u_{GMM}/\hat\sigma_u^2, \tag{16}$$
which is unfeasible when $\Omega$ is unknown.
We will consider various feasible test statistics when $\Omega$ is diagonal. Choosing $\hat\Omega^{(0)} = I_N$ yields
$$J_Z^{(1,0)} = \hat u^{(1)\prime}Z(Z'Z)^{-1}Z'\hat u^{(1)}\,/\,(\hat u^{(1)\prime}\hat u^{(1)}/N), \tag{17}$$
with $Z'\hat u^{(1)} = Z'[I_N - X(\hat X'\hat X)^{-1}\hat X']u$. The variance of the limiting distribution of $N^{-1/2}Z'\hat u^{(1)}$ differs from $\sigma_u^2\,\mathrm{plim}\,N^{-1}Z'Z$, unless $Var(u_i \mid z_i) = \sigma_u^2$. The same holds for
$$J_Z^{(1,1)} = \hat u^{(1)\prime}Z\hat G^{(1)}Z'\hat u^{(1)}. \tag{18}$$
So, these two tests have a limiting $\chi^2_{L-K}$ distribution only under unconditional or conditional homoskedasticity.
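For
$$J_Z^{(2,1)} = \hat u^{(2)\prime}Z\hat G^{(1)}Z'\hat u^{(2)}, \tag{19}$$
where $\hat u^{(2)} = y - X\hat\beta_{GMM}^{(2)}$ and $\mathrm{plim}\,\hat G^{(1)} = (\sigma_u^2\,\mathrm{plim}\,N^{-1}Z'\Omega Z)^{-1}$, we find
$$\begin{aligned} N^{-1/2}Z'\hat u^{(2)} &= N^{-1/2}Z'(y - X\hat\beta_{GMM}^{(2)}) = N^{-1/2}Z'[u - X(\hat\beta_{GMM}^{(2)} - \beta_0)] \\ &\approx N^{-1/2}Z'\{I_N - X[X'Z(Z'\Omega Z)^{-1}Z'X]^{-1}X'Z(Z'\Omega Z)^{-1}Z'\}u \\ &= N^{-1/2}Z^{\dagger\prime}\{I_N - X^*[X^{*\prime}P_{Z^{\dagger}}X^*]^{-1}X^{*\prime}P_{Z^{\dagger}}\}u^*, \end{aligned}$$
using the asymptotic approximation (A2) of Appendix A, with $Z^{\dagger} = (\Psi')^{-1}Z$ as in subsection 2.2. Then it follows that $J_Z^{(2,1)}$ converges to $u^{*\prime}M_{\hat X^*}P_{Z^{\dagger}}M_{\hat X^*}u^*/\sigma_u^2 = u^{*\prime}P_{\hat X^*_\perp}u^*/\sigma_u^2$, where $\hat X^* = P_{Z^{\dagger}}X^*$, $(\hat X^*, \hat X^*_\perp)$ spans the same subspace as $Z^{\dagger}$ and $\hat X^{*\prime}\hat X^*_\perp = O$. Hence $J_Z^{(2,1)}$ does have asymptotic null distribution $\chi^2_{L-K}$, and so will $J_Z^{(2,2)}$, where $\hat u^{(2)} = y - X\hat\beta_{GMM}^{(2)}$ is used to construct $\hat G^{(2)}$.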
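The four feasible overidentification statistics discussed above can be sketched as follows. This is an illustrative reading rather than the study's code: the weighting matrices are taken as the inverse of $\sum_i \hat u_i^2 z_i z_i'$ without the factor $N^{-1}$, so that each quadratic form is on the usual $\chi^2$ scale, and the data, function names and dictionary keys are hypothetical. The design has $L - K = 2$, which permits the closed-form p-value $\exp(-J/2)$ that is valid for two degrees of freedom only.

```python
import math
import numpy as np

def gmm(y, X, Z, W):
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def robust_weight(u, Z):
    """Inverse of sum_i u_i^2 z_i z_i' (no 1/N factor, so J needs no rescaling)."""
    return np.linalg.inv((Z * u[:, None] ** 2).T @ Z)

def quad(u, Z, W):
    m = Z.T @ u
    return float(m @ W @ m)

def sargan_hansen(y, X, Z):
    """Feasible J statistics; (a, b) = (step of residuals, step of weighting)."""
    N = len(y)
    W0 = np.linalg.inv(Z.T @ Z)
    u1 = y - X @ gmm(y, X, Z, W0)                 # 1-step residuals
    W1 = robust_weight(u1, Z)
    u2 = y - X @ gmm(y, X, Z, W1)                 # 2-step residuals
    W2 = robust_weight(u2, Z)
    return {
        "J(1,0)": quad(u1, Z, W0) / (u1 @ u1 / N),
        "J(1,1)": quad(u1, Z, W1),
        "J(2,1)": quad(u2, Z, W1),
        "J(2,2)": quad(u2, Z, W2),
    }

# Hypothetical heteroskedastic design with valid instruments, L = 3, K = 1.
rng = np.random.default_rng(3)
N = 1000
Z = rng.normal(size=(N, 3))
e = rng.normal(size=N)
u = e * np.sqrt(0.5 + Z[:, 0] ** 2)
X = (Z @ np.array([1.0, 0.5, 0.5]) + 0.8 * e + rng.normal(size=N))[:, None]
y = 1.0 * X[:, 0] + u

J = sargan_hansen(y, X, Z)
df = Z.shape[1] - X.shape[1]
p_value = math.exp(-J["J(2,1)"] / 2.0)            # chi-square survival function, df = 2 only
print(J, df, p_value)
```

Because the 2-step estimator minimizes the quadratic form built with the 1-step weighting matrix, the computed `J(2,1)` can never exceed `J(1,1)`.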
When $Z = (Z_m, Z_a)$, where $Z_m$ is an $N \times L_m$ matrix with $L_m \geq K$ containing the instruments whose validity seems very likely, then, under the maintained hypothesis $E(Z_m'u) = 0$, one can test the validity of the $L - L_m$ additional instruments $Z_a$ by the incremental test statistic
$$JI_{Z_a}^{(2,1)} = \hat u^{(2)\prime}Z\hat G^{(1)}Z'\hat u^{(2)} - \hat u_m^{(2)\prime}Z_m\hat G_m^{(1)}Z_m'\hat u_m^{(2)}, \tag{20}$$
which under correct specification of the model with valid instruments $Z$ is distributed as $\chi^2_{L-L_m}$ asymptotically. Of course, $\hat u_m^{(2)}$ and $\hat G_m^{(1)}$ are obtained by just using the instruments $Z_m$. Note that for $L_m = K$ we have $JI_{Z_a}^{(2,1)} = J_Z^{(2,1)}$ because in that case $Z_m'\hat u_m^{(2)} = 0$. Hence, when $L_m = K$, explicit specification of component $Z_m$ is meaningless.
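A minimal sketch of the incremental statistic, under the same scaling assumption as before (weighting matrix equal to the inverse of $\sum_i \hat u_i^2 z_i z_i'$), is given below. The function names, the convention that the maintained instruments occupy the first columns of `Z`, and the simulated data are all hypothetical. Since the two terms use different weighting matrices, the difference is not guaranteed to be non-negative in a finite sample.

```python
import numpy as np

def j_two_step(y, X, Z):
    """J^(2,1): 2-step residuals weighted by the 1-step-residual weighting matrix."""
    XZ = X.T @ Z
    W0 = np.linalg.inv(Z.T @ Z)
    u1 = y - X @ np.linalg.solve(XZ @ W0 @ XZ.T, XZ @ W0 @ (Z.T @ y))
    W1 = np.linalg.inv((Z * u1[:, None] ** 2).T @ Z)
    u2 = y - X @ np.linalg.solve(XZ @ W1 @ XZ.T, XZ @ W1 @ (Z.T @ y))
    m = Z.T @ u2
    return float(m @ W1 @ m)

def incremental_j(y, X, Z, n_maintained):
    """JI for the columns of Z beyond the first n_maintained, with its degrees of freedom."""
    Zm = Z[:, :n_maintained]
    return j_two_step(y, X, Z) - j_two_step(y, X, Zm), Z.shape[1] - n_maintained

# Hypothetical design: K = 1, L = 4, all instruments valid.
rng = np.random.default_rng(5)
N = 1000
Z = rng.normal(size=(N, 4))
e = rng.normal(size=N)
u = e * np.sqrt(0.5 + Z[:, 1] ** 2)
X = (Z @ np.array([1.0, 0.5, 0.5, 0.5]) + 0.8 * e + rng.normal(size=N))[:, None]
y = 1.0 * X[:, 0] + u

ji, df = incremental_j(y, X, Z, n_maintained=2)   # tests the last two instruments
print(ji, df)
print(j_two_step(y, X, Z[:, :1]))                 # just-identified subset: zero up to rounding
```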
In simulations it is interesting to examine as well unfeasible versions of the above test statistics, which exploit information that is usually not available in practice. This will produce evidence on what elements of the feasible asymptotic tests may cause any inaccuracies in finite samples. So, next to (13), (19) and (20) we will also examine
$$W_{\beta_k}^{(u)} = (e_{K,k}'\hat\beta_{GMM} - \beta_k^0)\,/\,\{\sigma_u^2\,e_{K,k}'[X'Z(Z'\Omega Z)^{-1}Z'X]^{-1}e_{K,k}\}^{1/2}, \tag{21}$$
$$J_Z^{(u)} = \hat u'Z(Z'\Omega Z)^{-1}Z'\hat u/\sigma_u^2, \quad \text{where } \hat u = y - X\hat\beta_{GMM}, \tag{22}$$
$$JI_{Z_a}^{(u)} = [\hat u'Z(Z'\Omega Z)^{-1}Z'\hat u - \hat u_m'Z_m(Z_m'\Omega Z_m)^{-1}Z_m'\hat u_m]/\sigma_u^2. \tag{23}$$
Similar feasible and unfeasible implementations of t-tests and Sargan-Hansen tests for MGMM based estimators follow straightforwardly.

3. Implementations for Dynamic Micro Panel Models

3.1. Model and Assumptions

We consider the balanced linear dynamic panel data model ($i = 1, \ldots, N$; $t = 1, \ldots, T$)
$$y_{it} = x_{it}'\beta + w_{it}'\gamma + v_{it}'\delta + \mu + \tau_t + \eta_i + \varepsilon_{it}, \tag{24}$$
where $x_{it}$ contains $K_x \geq 0$ strictly exogenous regressors (excluding a constant and fixed time effects), $w_{it}$ are $K_w \geq 0$ predetermined regressors (probably including lags of the dependent variable and other variables affected by lagged feedback from $y_{it}$ or just from $\varepsilon_{it}$), $v_{it}$ are $K_v \geq 0$ endogenous regressors (affected by instantaneous feedback from $y_{it}$ and therefore jointly dependent with $y_{it}$), $\mu$ is an overall constant, the $\tau_t$ are random or fixed time effects, the $\eta_i$ are random individual specific effects (most likely correlated with many of the regressors) such that
$$\eta_i \sim iid(0, \sigma_\eta^2), \tag{25}$$
whereas
$$E(\varepsilon_{it}) = 0, \quad E(\varepsilon_{it}^2) = \sigma_{it}^2, \quad E(\varepsilon_{it}\varepsilon_{js}) = 0, \quad E(\eta_i\varepsilon_{jt}) = 0, \quad \forall\, i, j, t \neq s. \tag{26}$$
The parameter vector $\tau$ could have all its elements equal and then will be absorbed by the overall intercept $\mu$ of the model. However, it seems better to allow for time effects in addition to an intercept, because this helps to underpin the assumption that both the idiosyncratic disturbances and the individual effects have expectation zero. Note, though, that for identification at least one restriction should be imposed on the $T + 1$ scalar parameters represented by $\mu$ and $\tau$.
The classification of the regressors implies
$$E(x_{it}\varepsilon_{is}) = 0, \quad E(w_{it}\varepsilon_{i,t+l}) = 0, \quad E(v_{it}\varepsilon_{i,t+1+l}) = 0, \quad \forall\, i, t, s, l \geq 0. \tag{27}$$
For the sake of simplicity we assume that all regressors are time varying and that the vectors $x_{it}$, $w_{it}$ or $v_{it}$ are defined for $t = 1, \ldots, T$. However, their elements may contain observations prior to $t = 1$ for regressors that are actually the $l$-th order lag of a current variable. Only these lagged regressors are observed from $t = 1 - l$ onwards. This means that all regressors in (24), be it current variables or lags of them, have exactly $T$ observations. So, any unbalancedness problems have been defined away; moreover, no internal instrumental variables can be constructed involving observations prior to those included in $x_{i1}$, $w_{i1}$ or $v_{i1}$.
Stacking the $T$ time-series observations of (24) the equation in levels can be written
$$y_i = X_i\beta + W_i\gamma + V_i\delta + I_T\tau + \iota_T(\mu + \eta_i) + \varepsilon_i, \tag{28}$$
where $y_i = (y_{i1} \cdots y_{iT})'$, $X_i = (x_{i1} \cdots x_{iT})'$, $W_i = (w_{i1} \cdots w_{iT})'$, $V_i = (v_{i1} \cdots v_{iT})'$, $\tau = (\tau_1 \cdots \tau_T)'$ and $\iota_T$ is the $T \times 1$ vector with all its elements equal to unity. We do allow $K_x = 0$, $K_w = 0$, $K_v = 0$, but not all three at the same time, so
$$K = K_x + K_w + K_v > 0. \tag{29}$$
We will focus on micro panels, where the number of time-series observations $T$ is usually very small, possibly a one digit number, and the number of cross-section units $N$ is large, usually at least several hundreds. Therefore, asymptotic approximations will be for $N \to \infty$ and $T$ finite.

3.2. Removing Individual Effects by First Differencing

First we consider estimating the model by GMM following the approach propounded by Arellano and Bond [1], see also Holtz-Eakin et al. [37]. To clarify this, and the consequences it may have for the time-dummies, we introduce the matrices
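$$D_T = \begin{pmatrix} -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & -1 & 1 \end{pmatrix} \quad \text{and} \quad D_{T-1}^* = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ -1 & 1 & & \vdots \\ & \ddots & \ddots & 0 \\ 0 & & -1 & 1 \end{pmatrix}, \tag{30}$$
where $D_T$ is $(T-1) \times T$ and $D_{T-1}^*$ is its $(T-1) \times (T-1)$ submatrix after removing the first column. By taking first differences the intercept and the individual effects are removed and one may estimate the $N$ sets of $T - 1$ equations $D_T y_i = D_T X_i\beta + D_T W_i\gamma + D_T V_i\delta + D_T\tau + D_T\varepsilon_i$. Denoting $\tilde y_i = D_T y_i$, $\tilde X_i = D_T X_i$, $\tilde W_i = D_T W_i$, $\tilde V_i = D_T V_i$, $\tilde\varepsilon_i = D_T\varepsilon_i$ and $\tilde\tau = D_T\tau$, where $\tilde\tau_t = \tau_t - \tau_{t-1}$ for $t = 2, \ldots, T$, this can compactly be expressed as $\tilde y_i = \bar R_i\bar\alpha + \tilde\varepsilon_i$, where $\bar R_i = (\tilde X_i, \tilde W_i, \tilde V_i, I_{T-1})$ and $\bar\alpha = (\beta', \gamma', \delta', \tilde\tau')'$.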
The popular Stata package xtabond2 (StataCorp LLC, College Station, TX, USA) reparametrizes the time-effects differently (as we found out by experimentation). It substitutes $\tilde\tau = D_{T-1}^*\tau^*$. Hence, $\tau_t^* = \tau_t - \tau_1$ for $t = 2, \ldots, T$, and it estimates
$$\tilde y_i = \tilde R_i\tilde\alpha + \tilde\varepsilon_i, \tag{31}$$
where $\tilde R_i = (\tilde X_i, \tilde W_i, \tilde V_i, D_{T-1}^*)$ and $\tilde\alpha = (\beta', \gamma', \delta', \tau^{*\prime})'$. So, basically, it addresses the problem that not all $T$ time-dummy coefficients can be identified, by replacing the submatrix of regressors $D_T$, which has rank $T - 1$, by full rank matrix $D_{T-1}^*$, so by simply removing its first column, with the effect that coefficients $\tau^*$ will be estimated.
Defining
$$Q_T = \begin{pmatrix} 0' \\ I_{T-1} \end{pmatrix} \quad \text{and} \quad A_T = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 1 & 1 & & \vdots \\ \vdots & & \ddots & 0 \\ 1 & 1 & \cdots & 1 \end{pmatrix}, \tag{32}$$
where $Q_T$ is $T \times (T-1)$ and $A_T$ is $T \times T$ lower-triangular, one easily finds that $D_T Q_T = D_{T-1}^* = A_{T-1}^{-1}$. Instead of $\tau^*$, one can directly estimate $\tilde\tau = D_{T-1}^*\tau^*$ by xtabond2 (StataCorp LLC) by replacing in the equation in levels $I_T\tau$ by $A_T\tau^{**}$, hence by replacing the time dummy variables by accumulated time dummies. Here $\tau^{**} = A_T^{-1}\tau = D_{T+1}Q_{T+1}\tau = D_{T+1}(0, \tau')' = (\tau_1, \tilde\tau')'$. Note that $D_T A_T = (0, I_{T-1})$. Hence, $I_{T-1}\tilde\tau = \tilde\tau$ would remain, after removal of the first column, as in $\tilde y_i = \bar R_i\bar\alpha + \tilde\varepsilon_i$. Of course, many more different transformations of the $T$ coefficients $\tau$ can be estimated, though, by taking first differences only $T - 1$ linear transformations of them can be identified; by interpreting them appropriately no fundamental differences emerge.
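The matrix identities used for the time effects are easy to confirm numerically. The sketch below builds the four matrices for an arbitrary illustrative choice of $T = 5$ and checks the stated relations; the function name `first_diff_matrix` and the example values of $\tau$ are hypothetical.

```python
import numpy as np

def first_diff_matrix(T):
    """D_T: (T-1) x T with -1 on the diagonal and +1 on the first superdiagonal."""
    return np.eye(T - 1, T, k=1) - np.eye(T - 1, T)

T = 5
D = first_diff_matrix(T)
D_star = D[:, 1:]                                 # D*_{T-1}: D_T without its first column
Q = np.vstack([np.zeros((1, T - 1)), np.eye(T - 1)])
A = np.tril(np.ones((T, T)))                      # A_T: accumulates time dummies
A_prev = np.tril(np.ones((T - 1, T - 1)))         # A_{T-1}

print(np.allclose(D @ Q, D_star))                 # D_T Q_T = D*_{T-1}
print(np.allclose(D_star @ A_prev, np.eye(T - 1)))  # D*_{T-1} = A_{T-1}^{-1}
print(np.allclose(D @ A, np.hstack([np.zeros((T - 1, 1)), np.eye(T - 1)])))
print(np.allclose(D @ np.ones(T), 0.0))           # constants and individual effects drop out

tau = np.array([0.3, 0.1, 0.4, 0.1, 0.5])         # hypothetical time effects
tau_2star = np.linalg.solve(A, tau)               # tau** = A_T^{-1} tau = (tau_1, tilde tau')'
print(tau_2star, D @ tau)
```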
To construct a full column rank matrix of instrumental variables $Z = (Z_1' \cdots Z_N')'$, which expresses as many linearly independent orthogonality conditions as possible for (31), while restricting ourselves to internal variables, i.e., variables occurring in (24), we define the following vectors
$$x_i^T = (x_{i1}' \cdots x_{iT}')', \quad w_i^t = (w_{i1}' \cdots w_{it}')', \quad v_i^t = (v_{i1}' \cdots v_{it}')'. \tag{33}$$
Without making this explicit in the notation it should be understood that these three vectors only contain unique elements. Hence, if vector $x_{is}$ (or $w_{is}$) contains for $1 < s \le T$ a particular value and also its lag (which is not possible for $v_{it}$), then this lag should be taken out since it already appears in $x_{i,s-1}$. Matrix $Z_i$ is of order $(T-1)\times L$ and consists of four blocks (though some of these may be void)
$$Z_i = (Z_i^x, Z_i^w, Z_i^v, D^*_{T-1}). \tag{34}$$
The final block spans the same space as $I_{T-1}$ and could thus be replaced by $I_{T-1}$. It is associated with the fundamental moment conditions $E(\tilde\varepsilon_i) = 0$. Therefore, it could form part of $Z_i$ even if one imposes $\tau = 0$. For the other blocks we have
$$Z_i^x = I_{T-1}\otimes (x_i^{T})', \quad Z_i^w = \begin{pmatrix} (w_i^{1})' & 0 & \cdots & 0 \\ O & & \ddots & O \\ 0 & \cdots & 0 & (w_i^{T-1})' \end{pmatrix}, \quad Z_i^v = \begin{pmatrix} 0 & \cdots & \cdots & 0 \\ (v_i^{1})' & 0 & \cdots & 0 \\ O & & \ddots & O \\ 0 & \cdots & 0 & (v_i^{T-2})' \end{pmatrix}. \tag{35}$$
The maximum possible number of columns of $Z_i^x$ is $K_xT(T-1)$, for $Z_i^w$ it is $K_wT(T-1)/2$ and for $Z_i^v$ it is $K_v(T-1)(T-2)/2$, thus
$$L \le (T-1)\{T[K_x + (K_w+K_v)/2] - K_v + 1\}, \tag{36}$$
whereas MM estimation requires $L \ge K+T-1$. It follows from (26) and (27) that $E(Z_i'\tilde\varepsilon_i) = 0$ indeed. In actual estimation one may use a subset of these instruments by taking the linear transformation $Z_i^* = Z_iC$, where $C$ is an $L\times L^*$ matrix (with all its elements often being either zero, one or minus one) of rank $L^* < L$, provided $L^* \ge K+T-1$. In the above we have implicitly assumed that the variables are such that $Z = (Z_1' \cdots Z_N')'$ will have full column rank, so another necessary condition is $N(T-1) \ge L^*$. Of course, it is not required that individual blocks $Z_i$ have full column rank.
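The instrument count and the two order conditions can be tabulated directly. The sketch below only restates the column counts given above; the function names `instrument_counts` and `order_conditions_hold` are hypothetical and chosen for illustration.

```python
def instrument_counts(T, K_x, K_w, K_v):
    """Maximum number of columns per block of Z_i, and their total L_max."""
    cols_x = K_x * T * (T - 1)              # strictly exogenous: all T periods, every row
    cols_w = K_w * T * (T - 1) // 2         # predetermined: 1 + 2 + ... + (T-1) lags
    cols_v = K_v * (T - 1) * (T - 2) // 2   # endogenous: 0 + 1 + ... + (T-2) lags
    cols_time = T - 1                       # block D*_{T-1} (or I_{T-1})
    return {"x": cols_x, "w": cols_w, "v": cols_v, "time": cols_time,
            "L_max": cols_x + cols_w + cols_v + cols_time}

def order_conditions_hold(L_star, K, T, N):
    """Necessary conditions L* >= K + T - 1 and N(T-1) >= L*."""
    return L_star >= K + T - 1 and N * (T - 1) >= L_star

counts = instrument_counts(T=6, K_x=1, K_w=1, K_v=1)
closed_form = (6 - 1) * (6 * (1 + (1 + 1) / 2) - 1 + 1)   # right-hand side of the bound on L
print(counts, closed_form)
```

For $T=6$ and one regressor of each type the sum of the block counts coincides with the closed-form bound, and the count grows quadratically in $T$, which is what motivates the reductions discussed next.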
Despite its undesirable effect on the asymptotic variance of method of moments estimators, reducing the number of instruments may improve estimation precision, because it may at the same time mitigate estimation bias in finite samples, especially when weak instruments are being removed. So, instead of including the block $D^*_{T-1}$ or $I_{T-1}$ in $Z_i$ one could (especially when the model has no time-effects) replace it by $I_{T-1}\iota_{T-1} = \iota_{T-1}$. Regarding $Z_i^w$ and $Z_i^v$ two alternative instrument reduction methods have been suggested, namely omitting long lags (see Bowsher [14], Windmeijer [29] and Bun and Kiviet [19]) and collapsing (see Roodman [10], but also suggested in Anderson and Hsiao [38]). Both are employed in Ziliak [35]; these two methods can also be combined.
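Omitting long lags could be achieved by reducing $Z_i^w$ to, for instance,
$$\begin{pmatrix} w_{i1}' & 0 & 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & w_{i1}' & w_{i2}' & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & w_{i2}' & w_{i3}' & \cdots & 0 & 0 \\ \vdots & & & & & \ddots & & \vdots \\ 0 & 0 & 0 & 0 & 0 & \cdots & w_{i,T-2}' & w_{i,T-1}' \end{pmatrix} \tag{37}$$
and similarly for $Z_i^v$. The collapsed versions of $Z_i^w$ and of $Z_i^v$ can be denoted as
$$Z_i^{*w} = \begin{pmatrix} w_{i1}' & 0 & \cdots & 0 \\ w_{i2}' & w_{i1}' & \ddots & \vdots \\ \vdots & & \ddots & 0 \\ w_{i,T-1}' & w_{i,T-2}' & \cdots & w_{i,1}' \end{pmatrix}, \quad Z_i^{*v} = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ v_{i1}' & 0 & \cdots & 0 \\ v_{i2}' & v_{i1}' & \ddots & \vdots \\ \vdots & & \ddots & 0 \\ v_{i,T-2}' & v_{i,T-3}' & \cdots & v_{i,1}' \end{pmatrix}. \tag{38}$$
Collapsing can be combined with omitting long lags, if one removes all the columns of $Z_i^{*w}$ and $Z_i^{*v}$ which have at least a certain number of zero elements (say 1 or 2 or more) in their top rows. In corresponding ways, the column space of $Z_i^x$ can be reduced by including in $Z_i$ either a limited number of lags and leads, or the collapsed matrix
$$Z_i^{*x} = \begin{pmatrix} x_{i2}' & x_{i1}' & 0 & \cdots & 0 \\ x_{i3}' & x_{i2}' & x_{i1}' & \ddots & \vdots \\ \vdots & & & \ddots & 0 \\ x_{i,T}' & x_{i,T-1}' & x_{i,T-2}' & \cdots & x_{i,1}' \end{pmatrix}, \tag{39}$$
or just its first two or three columns, or (what is often done in practice) simply the difference between the first two columns, the $K_x$ regressors $(\Delta x_{i2}, \ldots, \Delta x_{iT})'$.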
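To make the difference between the full and the collapsed instrument block concrete, the sketch below builds both versions of the block for the predetermined regressors of a single individual. It assumes the input array holds $w_{i1},\ldots,w_{i,T-1}$ row by row; the function names are hypothetical.

```python
import numpy as np

def full_predetermined_block(w):
    """Block-diagonal Z_i^w: row t holds (w_i1', ..., w_it') in its own columns.

    w has shape (T-1, K_w) with rows w_i1, ..., w_i,T-1.
    """
    Tm1, _ = w.shape
    stacks = [w[: t + 1].ravel() for t in range(Tm1)]   # the vectors w_i^t
    Z = np.zeros((Tm1, sum(len(s) for s in stacks)))
    col = 0
    for t, s in enumerate(stacks):
        Z[t, col: col + len(s)] = s
        col += len(s)
    return Z

def collapsed_predetermined_block(w):
    """Collapsed Z_i^{*w}: row t holds (w_it', w_i,t-1', ..., w_i1', 0, ...)."""
    Tm1, K = w.shape
    Z = np.zeros((Tm1, Tm1 * K))
    for t in range(Tm1):
        for lag in range(t + 1):
            Z[t, lag * K: (lag + 1) * K] = w[t - lag]
    return Z

w = np.arange(1.0, 5.0).reshape(4, 1)      # T = 5, K_w = 1, illustrative values 1..4
Z_full = full_predetermined_block(w)       # K_w T (T-1) / 2 = 10 columns
Z_coll = collapsed_predetermined_block(w)  # K_w (T-1) = 4 columns
print(Z_full.shape, Z_coll.shape)
```

Collapsing reduces the column count from quadratic to linear in $T$, because one moment condition per lag distance replaces one moment condition per lag and period.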
It seems useful to distinguish the following specific forms of instrument matrix reduction relative to the case where all instruments associated with valid linear moment restrictions are being used. The latter case we label as A (all); the reductions are labelled C (standard collapsing), L0, L1, L2, L3 (which primarily restrict the lag length), and C0, C1, C2, C3 (which combine the two reduction principles). In all the reductions we replace $I_{T-1}$ by $\iota_{T-1}$ when the model does not include time-effects. Regarding $Z_i^x$, $Z_i^w$ and $Z_i^v$ different types of reductions can be taken, which we will distinguish by using for example the characterization: $\mathrm{A}^v$, $\mathrm{L2}^w$, $\mathrm{C1}^x$, etc. This leads to the particular reductions as indicated and defined in Table 1.
Note that for all three types of regressors L2, like L1, uses one extra lag compared to L0, but does not impose the first-difference restrictions characterizing L1. We skipped a similar intermediary case between L2 and L3. Self-evidently $\mathrm{L2}^x$ can also be represented by combining $\mathrm{diag}(x_{i1}', \ldots, x_{i,T-1}')$ with $\mathrm{L1}^x$, and similarly for $\mathrm{L2}^w$ and $\mathrm{L2}^v$. The reductions C0 and C1, which yield just one instrument per regressor, constitute generalizations of the classic instruments suggested by Anderson and Hsiao [38]. These may lead to just identified models, where the number of instruments equals the number of regressors, which provokes the non-existence of moments problem. To avoid that, and also because we suppose that in general some degree of overidentification will have advantages regarding both estimation precision and the opportunity to test model adequacy, one may choose to restrict oneself to the popular $\mathrm{C1}^x$ and the reductions C and C3 as far as collapsing is concerned. In C3 just the first three columns of the matrices in (38) and (39) are being used as instruments.

3.2.1. Alternative Weighting Matrices

We assumed in (26) that the $\varepsilon_{it}$ are serially and cross-sectionally uncorrelated but may be heteroskedastic. Let us define the matrix $\Omega_i = \mathrm{diag}(\sigma^2_{i1}, \ldots, \sigma^2_{iT})$, thus $\varepsilon_i \sim (0, \Omega_i)$ and $\tilde\varepsilon_i = D_T\varepsilon_i \sim (0, D_T\Omega_iD_T')$. Under standard regularity we have
$$N^{-1/2}\sum_{i=1}^N Z_i'\tilde\varepsilon_i \xrightarrow{d} \mathcal{N}\Big(0, \mathrm{plim}\, N^{-1}\sum_{i=1}^N Z_i'D_T\Omega_iD_T'Z_i\Big). \tag{40}$$
Hence, the optimal GMM estimator of $\tilde\alpha$ of (31) should use a weighting matrix such that its inverse has probability limit proportional to $\mathrm{plim}\, N^{-1}\sum_{i=1}^N Z_i'D_T\Omega_iD_T'Z_i$. This can be achieved by first obtaining a consistent 1-step GMM estimator
$$\hat{\tilde\alpha}^{(1)} = \Big[\Big(\sum_{i=1}^N \tilde R_i'Z_i\Big)G^{(0)}\Big(\sum_{i=1}^N Z_i'\tilde R_i\Big)\Big]^{-1}\Big(\sum_{i=1}^N \tilde R_i'Z_i\Big)G^{(0)}\Big(\sum_{i=1}^N Z_i'\tilde y_i\Big), \tag{41}$$
which uses the weighting matrix
$$G^{(0)} = \Big(\sum_{i=1}^N Z_i'D_TD_T'Z_i\Big)^{-1}. \tag{42}$$
This is already efficient if $\Omega_i = \sigma_\varepsilon^2I_T$; otherwise, in a second step, the consistent 1-step residuals $\hat{\tilde\varepsilon}_i^{(1)} = \tilde y_i - \tilde R_i\hat{\tilde\alpha}^{(1)}$ can be used to construct the asymptotically optimal weighting matrix
$$\hat G_a^{(1)} = \Big(\sum_{i=1}^N Z_i'\hat{\tilde\varepsilon}_i^{(1)}\hat{\tilde\varepsilon}_i^{(1)\prime}Z_i\Big)^{-1}. \tag{43}$$
An alternative is using
$$\hat G_b^{(1)} = \Big(\sum_{i=1}^N Z_i'\hat H_i^{(1)b}Z_i\Big)^{-1}, \tag{44}$$
where $\hat H_i^{(1)b}$ is the band matrix
$$\hat H_i^{(1)b} = \begin{pmatrix} \hat{\tilde\varepsilon}_{i2}^{(1)}\hat{\tilde\varepsilon}_{i2}^{(1)} & \hat{\tilde\varepsilon}_{i2}^{(1)}\hat{\tilde\varepsilon}_{i3}^{(1)} & 0 & \cdots & 0 \\ \hat{\tilde\varepsilon}_{i3}^{(1)}\hat{\tilde\varepsilon}_{i2}^{(1)} & \hat{\tilde\varepsilon}_{i3}^{(1)}\hat{\tilde\varepsilon}_{i3}^{(1)} & \hat{\tilde\varepsilon}_{i3}^{(1)}\hat{\tilde\varepsilon}_{i4}^{(1)} & \ddots & \vdots \\ 0 & \hat{\tilde\varepsilon}_{i4}^{(1)}\hat{\tilde\varepsilon}_{i3}^{(1)} & \hat{\tilde\varepsilon}_{i4}^{(1)}\hat{\tilde\varepsilon}_{i4}^{(1)} & \ddots & 0 \\ \vdots & \ddots & \ddots & \hat{\tilde\varepsilon}_{i,T-1}^{(1)}\hat{\tilde\varepsilon}_{i,T-1}^{(1)} & \hat{\tilde\varepsilon}_{i,T-1}^{(1)}\hat{\tilde\varepsilon}_{iT}^{(1)} \\ 0 & \cdots & 0 & \hat{\tilde\varepsilon}_{iT}^{(1)}\hat{\tilde\varepsilon}_{i,T-1}^{(1)} & \hat{\tilde\varepsilon}_{iT}^{(1)}\hat{\tilde\varepsilon}_{iT}^{(1)} \end{pmatrix}. \tag{45}$$
Both $(N\hat G_a^{(1)})^{-1}$ and $(N\hat G_b^{(1)})^{-1}$ have a probability limit equal to the limiting variance of (40). The latter is less robust, but may converge faster when $\Omega_i$ is diagonal indeed. On the other hand (44) may not be positive definite, whereas (43) is.
For the special case $\Omega_i = \sigma^2_{\varepsilon,i}I_T$ of cross-section heteroskedasticity but time-series homoskedasticity, one could use
$$\hat G_c^{(1)} = \Big(\sum_{i=1}^N Z_i'\hat H_i^{(1)c}Z_i\Big)^{-1}, \tag{46}$$
with
$$\hat H_i^{(1)c} = \hat\sigma^{2,(1)}_{\varepsilon,i}H = \hat\sigma^{2,(1)}_{\varepsilon,i}\begin{pmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & \ddots & & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & 2 & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{pmatrix}, \tag{47}$$
where $H = D_TD_T'$ and
$$\hat\sigma^{2,(1)}_{\varepsilon,i} = \hat{\tilde\varepsilon}_i^{(1)\prime}H^{-1}\hat{\tilde\varepsilon}_i^{(1)}/(T-1). \tag{48}$$
Of course, these $N$ estimators are not consistent for $T$ finite. However, a consistent estimator for $\sigma_\varepsilon^2 = N^{-1}\sum_{i=1}^N\sigma^2_{\varepsilon,i}$ is given by
$$\hat\sigma^{2,(1)}_{\varepsilon} = N^{-1}\sum_{i=1}^N\hat\sigma^{2,(1)}_{\varepsilon,i}. \tag{49}$$
The three different weighting matrices can be used to calculate alternative $\hat{\tilde\alpha}^{(2)}_{(j)}$ estimators for $j \in \{a,b,c\}$ according to
$$\hat{\tilde\alpha}^{(2)}_{(j)} = \Big[\Big(\sum_{i=1}^N \tilde R_i'Z_i\Big)\hat G_j^{(1)}\Big(\sum_{i=1}^N Z_i'\tilde R_i\Big)\Big]^{-1}\Big(\sum_{i=1}^N \tilde R_i'Z_i\Big)\hat G_j^{(1)}\Big(\sum_{i=1}^N Z_i'\tilde y_i\Big). \tag{50}$$
When the employed weighting matrix is asymptotically optimal indeed, the first-order asymptotic approximation to the variance of $\hat{\tilde\alpha}^{(2)}_{(j)}$ is given by the inverse of the matrix in square brackets. From this (corrected) t-tests are easily obtained, see Section 3.4. Matching implementations of Sargan-Hansen statistics follow easily too, see Section 3.5. Note that estimators for $\sigma^2_{\varepsilon,i}$ or $\sigma^2_\varepsilon$ can also be obtained by employing second-stage residuals.
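The two-step procedure with the a-type weighting matrix can be sketched in a few lines. This is only an illustration under simplifying assumptions: the per-individual arrays are generic (they stand in for $\tilde R_i$, $Z_i$, $\tilde y_i$), the artificial data at the bottom are not generated from a dynamic panel model, and all function names are hypothetical.

```python
import numpy as np

def moment_sums(R, Z, y):
    """Return sum_i Z_i'R_i (L x K) and sum_i Z_i'y_i (L,)."""
    ZR = sum(Zi.T @ Ri for Zi, Ri in zip(Z, R))
    Zy = sum(Zi.T @ yi for Zi, yi in zip(Z, y))
    return ZR, Zy

def gmm_step(ZR, Zy, G):
    """Linear GMM for weighting matrix G; also returns the bracketed inverse."""
    A_inv = np.linalg.inv(ZR.T @ G @ ZR)
    return A_inv @ (ZR.T @ G @ Zy), A_inv

def two_step_gmm(R, Z, y):
    Tm1 = Z[0].shape[0]
    # H = D_T D_T': 2 on the diagonal, -1 on the first off-diagonals
    H = 2 * np.eye(Tm1) - np.eye(Tm1, k=1) - np.eye(Tm1, k=-1)
    ZR, Zy = moment_sums(R, Z, y)
    G0 = np.linalg.inv(sum(Zi.T @ H @ Zi for Zi in Z))          # 1-step weights
    a1, _ = gmm_step(ZR, Zy, G0)                                # 1-step estimator
    g = [Zi.T @ (yi - Ri @ a1) for Ri, Zi, yi in zip(R, Z, y)]  # Z_i' e_i^(1)
    Ga = np.linalg.inv(sum(np.outer(gi, gi) for gi in g))       # a-type weights
    a2, V2 = gmm_step(ZR, Zy, Ga)                               # 2-step estimator
    gbar = Zy - ZR @ a2                                         # sum_i Z_i' e_i^(2)
    J = gbar @ Ga @ gbar                                        # Sargan-Hansen, 1-step weights
    return a1, a2, V2, J

# Artificial illustration: K = 2 regressors, L = 3 valid instruments
rng = np.random.default_rng(0)
N, Tm1, alpha = 300, 4, np.array([0.5, -1.0])
Z = [rng.standard_normal((Tm1, 3)) for _ in range(N)]
R = [Zi[:, :2] + 0.5 * rng.standard_normal((Tm1, 2)) for Zi in Z]
y = [Ri @ alpha + rng.standard_normal(Tm1) for Ri in R]
a1, a2, V2, J = two_step_gmm(R, Z, y)
print(a1, a2, np.sqrt(np.diag(V2)), J)
```

Here `V2` is the uncorrected variance estimate (the inverse of the matrix in square brackets), and `J` is the overidentification statistic that uses 2-step residuals with weights from 1-step residuals; the b-type and c-type variants differ only in how the middle matrix of the weights is built.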
Let $\hat{\tilde\alpha}$ represent any of the consistent estimators of $\tilde\alpha$ mentioned above, and consider the level residuals $y_i - (X_i, W_i, V_i, Q_T)\hat{\tilde\alpha}$. Their overall mean gives the intercept estimate $\widehat{\mu+\tau_1} = N^{-1}T^{-1}\sum_{i=1}^N \iota_T'[y_i - (X_i, W_i, V_i, Q_T)\hat{\tilde\alpha}]$, and subtracting it yields the centred residuals $\hat u_i = y_i - (X_i, W_i, V_i, Q_T)\hat{\tilde\alpha} - (\widehat{\mu+\tau_1})\iota_T$, which for $N \to \infty$ converge to $u_i = \iota_T\eta_i + \varepsilon_i$. Since $E(u_iu_i') = \sigma^2_{\varepsilon,i}I_T + \sigma^2_\eta\iota_T\iota_T'$ we have $\mathrm{plim}\, N^{-1}\sum_{i=1}^N \hat u_i\hat u_i' = \sigma^2_\varepsilon I_T + \sigma^2_\eta\iota_T\iota_T'$. This yields $\mathrm{plim}\, N^{-1}\sum_{i=1}^N \iota_T'\hat u_i\hat u_i'\iota_T = T\sigma^2_\varepsilon + \sigma^2_\eta T^2$, from which the consistent estimator $T^{-2}N^{-1}\sum_{i=1}^N(\iota_T'\hat u_i)^2 - T^{-1}\hat\sigma^{2,(1)}_\varepsilon$ for $\sigma^2_\eta$ follows. From simulations we established that, especially when $\sigma_\eta$ is relatively small, this estimator is often negative. An alternative more satisfactory consistent estimator turns out to be
$$\hat\sigma^{2,(1)}_\eta = T^{-1}N^{-1}\sum_{i=1}^N \hat u_i'\hat u_i - \hat\sigma^{2,(1)}_\varepsilon. \tag{51}$$
This does not hinge as much on the serial uncorrelatedness of the $\varepsilon_{it}$. However, estimator (51) can be negative too, especially when $\sigma_\eta/\sigma_\varepsilon$ is small or $T$ is very small. When this happens it seems reasonable to set $\hat\sigma^{2,(1)}_\eta = 0$. Note that this does not jeopardize the consistency of the estimator; therefore we followed this approach in the simulations, rather than using the non-negative estimator $N^{-1}\sum_{i=1}^N\hat\eta_i^2$, where $\hat\eta_i = T^{-1}\iota_T'\hat u_i$ is the time average of the intercept-corrected level residuals of individual $i$, since this is inconsistent, because $E(\hat\eta_i^2) \neq \sigma^2_\eta$.
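The variance-component estimators, including the truncation at zero, can be sketched as follows. The example assumes the residuals are already available as arrays (centred level residuals and their first differences); the function name `variance_components` and the simulated inputs are illustrative only.

```python
import numpy as np

def variance_components(U, E):
    """Estimate sigma^2_{eps,i}, sigma^2_eps and sigma^2_eta from residuals.

    U: N x T centred level residuals; E: N x (T-1) first-differenced residuals.
    """
    N, T = U.shape
    H = 2 * np.eye(T - 1) - np.eye(T - 1, k=1) - np.eye(T - 1, k=-1)   # D_T D_T'
    H_inv = np.linalg.inv(H)
    s2_eps_i = np.einsum("it,ts,is->i", E, H_inv, E) / (T - 1)   # individual variances, as in (48)
    s2_eps = s2_eps_i.mean()                                     # their average, as in (49)
    s2_eta_raw = (U ** 2).sum() / (N * T) - s2_eps               # as in (51), may be negative
    return s2_eps_i, s2_eps, max(s2_eta_raw, 0.0)                # truncate at zero

# Illustration with true errors in place of residuals (sigma_eta = sigma_eps = 1)
rng = np.random.default_rng(1)
N, T = 2000, 6
U = rng.standard_normal((N, 1)) + rng.standard_normal((N, T))   # eta_i + eps_it
E = np.diff(U, axis=1)                                          # differencing removes eta_i
s2_eps_i, s2_eps, s2_eta = variance_components(U, E)
print(s2_eps, s2_eta)
```

Both printed values should be close to one in this artificial setting; with small $T$ or a small ratio $\sigma_\eta/\sigma_\varepsilon$ the raw estimate of $\sigma^2_\eta$ turns negative more often and the truncation becomes active.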

3.3. Respecting the Equation in Levels as Well

In this subsection we will examine whether the first-difference operation in the foregoing subsection implied a loss of valid orthogonality conditions embodied by our initial assumptions made in Section 3.1.
Since $\tau = \tau_1\iota_T + Q_T\tau^*$, we can rewrite model (28) as
$$y_i = X_i\beta + W_i\gamma + V_i\delta + Q_T\tau^* + (\mu+\tau_1+\eta_i)\iota_T + \varepsilon_i = R_i\tilde\alpha + (\mu+\tau_1)\iota_T + u_i = R_i^*\ddot\alpha + u_i, \tag{52}$$
where $R_i = (X_i, W_i, V_i, Q_T)$, $R_i^* = (R_i, \iota_T)$ and the $(K+T)\times 1$ vector $\ddot\alpha = (\tilde\alpha', \mu+\tau_1)'$; note that $\tilde R_i = D_TR_i$.
Regressor $\iota_T$ is a valid instrument for model (52). It embodies the single orthogonality condition $E[\sum_{t=1}^T(\eta_i+\varepsilon_{it})] = E[\sum_{t=1}^T u_{it}] = 0$ ($\forall i$), which is implied by the $T+1$ assumptions $E(\eta_i) = 0$ and $E(\varepsilon_{it}) = 0$ (for $t = 1,\ldots,T$) made in (25) and (26). These $T+1$ assumptions can also be represented (through linear transformation) by (i) $E(\eta_i) = 0$, (ii) $E(\Delta\varepsilon_{it}) = 0$ (for $t = 2,\ldots,T$) and (iii) $E(\sum_{t=1}^T u_{it}) = 0$. Because we cannot express $\eta_i$ exclusively in observed variables and unknown parameters it is impossible to convert (i) into a separate sample orthogonality condition. The $T-1$ orthogonality conditions (ii) are already employed by Arellano-Bond estimation, through including $I_{T-1}$ or $D^*_{T-1}$ in $Z_i$ of (34) for the equation in first differences. Orthogonality condition (iii), which is in terms of the level disturbance, can be exploited by including the column $\iota_T$ in the $i$-th block of an instrument matrix for level equation (52). Apparently, this condition will get lost when estimation is just based on the equation in first differences.
Combining the $T-1$ difference equations and the $T$ level equations in a system yields
$$\ddot y_i = \ddot R_i\ddot\alpha + \ddot u_i, \tag{53}$$
for each individual $i$, where $\ddot y_i = (\tilde y_i', y_i')'$, $\ddot R_i = (\tilde R_i^{*\prime}, R_i^{*\prime})'$, with $\tilde R_i^* = (\tilde R_i, 0)$, so it is extended by an extra column of zeros (to annihilate coefficient $\mu+\tau_1$ in the equation in first differences), and $\ddot u_i = (\tilde\varepsilon_i', u_i')'$. We find that $E(\tilde\varepsilon_iu_i') = E(D\varepsilon_i\varepsilon_i') = D\Omega_i$ and $E(u_iu_i') = E[(\eta_i\iota_T+\varepsilon_i)(\eta_i\iota_T+\varepsilon_i)'] = \sigma^2_\eta\iota_T\iota_T' + \Omega_i$, so
$$E(\ddot u_i\ddot u_i') = \begin{pmatrix} D\Omega_iD' & D\Omega_i \\ \Omega_iD' & \Omega_i + \sigma^2_\eta\iota_T\iota_T' \end{pmatrix}. \tag{54}$$
Model (53) can be estimated by MM using the $N(2T-1)\times(L+1)$ matrix of instruments with blocks
$$\ddot Z_i = \begin{pmatrix} Z_i & 0 \\ O & \iota_T \end{pmatrix}, \tag{55}$$
provided $N(2T-1) \ge L+1 \ge K+T$. Since both $\ddot R_i$ and $\ddot Z_i$ contain a column $(0', \iota_T')'$, and due to the occurrence of the $O$-block in $\ddot Z_i$, by a minor generalization of result (8) the IV estimator of $\ddot\alpha$ obtained by using instrument blocks $\ddot Z_i$ in (53) will be equivalent regarding $\tilde\alpha$ with the IV estimator of Equation (31) using instruments with blocks $Z_i$. That the same holds here for GMM under cross-sectional heteroskedasticity when using optimal instruments is due to the very special shape of $\ddot Z_i$ and is proved in Appendix B. Hence, there seems no good reason to estimate the system, just in order to exploit the extra valid instrument $(0', \iota_T')'$.

3.3.1. Effect Stationarity

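However, more valid internal instruments can be found for the equation in levels when some of the regressors $X_i$, $W_i$ or $V_i$ are known to be uncorrelated (like $\iota_T$) with the individual effects, or (which is more general) have time-invariant correlation with the individual effects. Then, after first differencing, these explanatory variables will be uncorrelated with $\eta_i$. Let $r^\circ_{it} = (x^{\circ\prime}_{it}, w^{\circ\prime}_{it}, v^{\circ\prime}_{it})'$ contain the $K^\circ = K^\circ_x + K^\circ_w + K^\circ_v$ unique elements of $r_{it}$ which are effect stationary, by which we mean that $E(r^\circ_{it}\eta_i)$ is time-invariant, so that
$$E(\Delta r^\circ_{it}\eta_i) = 0, \quad \forall i,\ t = 2,\ldots,T.$$
This implies that for the equation in levels the following orthogonality conditions hold
$$\left.\begin{array}{l} E[\Delta x^\circ_{it}(\eta_i+\varepsilon_{is})] = 0 \\ E[\Delta w^\circ_{it}(\eta_i+\varepsilon_{i,t+l})] = 0 \\ E[\Delta v^\circ_{it}(\eta_i+\varepsilon_{i,t+1+l})] = 0 \end{array}\right\} \quad \forall i,\ t > 1,\ s \ge 1,\ l \ge 0.$$
When $w_{it}$ includes $y_{i,t-1}$, then apparently $y_{it}$ is effect stationary so that the adopted model (24) suggests that all regressors in $r_{it}$ must be effect stationary, resulting in $K^\circ = K$.
Like for the $T-1$ conditions $E(\Delta\varepsilon_{it}) = 0$ discussed below Equation (52), many of the conditions (56) are already implied by the orthogonality conditions $E(Z_i'\tilde\varepsilon_i) = 0$ for the equation in first-differences. In Appendix C we demonstrate that a matrix $\tilde Z_i^s$ of instruments can be designed for the equation in levels (52) just containing instruments additional to those already exploited by $E(Z_i'\tilde\varepsilon_i) = 0$, whilst $E[\tilde Z_i^{s\prime}(\eta_i\iota_T+\varepsilon_i)] = 0$. This is the $T\times L^\circ$ matrix
$$\tilde Z_i^s = (\tilde Z_i^x, \tilde Z_i^w, \tilde Z_i^v, \iota_T),$$
where $L^\circ = K^\circ(T-1) - K^\circ_v + 1$, with
$$\tilde Z_i^x = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ \Delta x^{\circ\prime}_{i2} & 0 & \cdots & 0 \\ 0 & \Delta x^{\circ\prime}_{i3} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \Delta x^{\circ\prime}_{iT} \end{pmatrix}, \quad \tilde Z_i^w = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ \Delta w^{\circ\prime}_{i2} & 0 & \cdots & 0 \\ 0 & \Delta w^{\circ\prime}_{i3} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \Delta w^{\circ\prime}_{iT} \end{pmatrix}, \quad \tilde Z_i^v = \begin{pmatrix} 0 & \cdots & 0 \\ 0 & \cdots & 0 \\ \Delta v^{\circ\prime}_{i2} & & 0 \\ & \ddots & \\ 0 & & \Delta v^{\circ\prime}_{i,T-1} \end{pmatrix}.$$
Under effect stationarity of the $K^\circ$ variables (56) the system (53) can be estimated while exploiting the matrix of instruments
$$\ddot Z_i^s = \begin{pmatrix} Z_i & O \\ O & \tilde Z_i^s \end{pmatrix}.$$
If one decides to collapse the instruments included in $Z_i$, it seems reasonable to collapse $\tilde Z_i^s$ as well and replace it by
$$\ddot Z_i^{*s} = \begin{pmatrix} 0 & 0 & 0 & 1 \\ \Delta x^{\circ\prime}_{i2} & \Delta w^{\circ\prime}_{i2} & 0 & 1 \\ \Delta x^{\circ\prime}_{i3} & \Delta w^{\circ\prime}_{i3} & \Delta v^{\circ\prime}_{i2} & 1 \\ \vdots & \vdots & \vdots & \vdots \\ \Delta x^{\circ\prime}_{iT} & \Delta w^{\circ\prime}_{iT} & \Delta v^{\circ\prime}_{i,T-1} & 1 \end{pmatrix}.$$
Note that $\ddot Z_i^{*s}$ has $L^\circ = K^\circ + 1$ columns.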

3.3.2. Alternative Weighting Matrices under Effect Stationarity

For the above system we have
$$N^{-1/2}\sum_{i=1}^N \ddot Z_i^{s\prime}\ddot u_i \xrightarrow{d} \mathcal{N}\Big(0, \mathrm{plim}\, N^{-1}\sum_{i=1}^N \Phi_i\Big),$$
with
$$\Phi_i = \begin{pmatrix} Z_i'D\Omega_iD'Z_i & Z_i'D\Omega_i\tilde Z_i^s \\ \tilde Z_i^{s\prime}\Omega_iD'Z_i & \tilde Z_i^{s\prime}(\Omega_i + \sigma^2_\eta\iota_T\iota_T')\tilde Z_i^s \end{pmatrix}.$$
Hence a feasible initial weighting matrix is given by
$$S^{(0)}(q) = \Big[\sum_{i=1}^N \Phi_i^{(0)}(q)\Big]^{-1}, \tag{62}$$
where
$$\Phi_i^{(0)}(q) = \begin{pmatrix} Z_i'DD'Z_i & Z_i'D\tilde Z_i^s \\ \tilde Z_i^{s\prime}D'Z_i & \tilde Z_i^{s\prime}(I_T + q\iota_T\iota_T')\tilde Z_i^s \end{pmatrix},$$
with $q$ some nonnegative real value. Weighting matrix $S^{(0)}(q)$ would be optimal if $\Omega_i = \sigma^2_\varepsilon I_T$ with $q = \sigma^2_\eta/\sigma^2_\varepsilon$. For any nonnegative $q$ a consistent 1-step GMM system estimator is given by
$$\hat{\ddot\alpha}^{(1)}(q) = \Big[\Big(\sum_{i=1}^N \ddot R_i'\ddot Z_i^s\Big)S^{(0)}(q)\Big(\sum_{i=1}^N \ddot Z_i^{s\prime}\ddot R_i\Big)\Big]^{-1}\Big(\sum_{i=1}^N \ddot R_i'\ddot Z_i^s\Big)S^{(0)}(q)\Big(\sum_{i=1}^N \ddot Z_i^{s\prime}\ddot y_i\Big). \tag{63}$$
Next, in a second step, the consistent 1-step residuals $\hat{\ddot u}_i^{(1)} = \ddot y_i - \ddot R_i\hat{\ddot\alpha}^{(1)}(q)$ can be used to construct the asymptotically optimal weighting matrix
$$\hat S_a^{(1)} = \Big(\sum_{i=1}^N \ddot Z_i^{s\prime}\hat{\ddot u}_i^{(1)}\hat{\ddot u}_i^{(1)\prime}\ddot Z_i^s\Big)^{-1}, \tag{64}$$
where $\hat{\ddot u}_i^{(1)} = (\hat{\tilde\varepsilon}_i^{s(1)\prime}, \hat u_i^{s(1)\prime})'$ with $\hat{\tilde\varepsilon}_i^{s(1)} = \tilde y_i - \tilde R_i^*\hat{\ddot\alpha}^{(1)}(q)$ and $\hat u_i^{s(1)} = y_i - R_i^*\hat{\ddot\alpha}^{(1)}(q)$. However, several alternatives are possible. Consider weighting matrix
$$\hat S_b^{(1)} = \left\{\sum_{i=1}^N \begin{pmatrix} Z_i'\hat H_i^{s(1)}Z_i & Z_i'\hat D_i^{s(1)}\tilde Z_i^s \\ \tilde Z_i^{s\prime}\hat D_i^{s(1)\prime}Z_i & \tilde Z_i^{s\prime}\hat u_i^{s(1)}\hat u_i^{s(1)\prime}\tilde Z_i^s \end{pmatrix}\right\}^{-1}, \tag{65}$$
where $\hat H_i^{s(1)}$ is self-evidently like $\hat H_i^{(1)b}$ but on the basis of the residuals $\hat{\tilde\varepsilon}_i^{s(1)}$, and
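$$\hat D_i^{s(1)} = \begin{pmatrix} \hat{\tilde\varepsilon}_{i2}^{s(1)}\hat u_{i1}^{s(1)} & \hat{\tilde\varepsilon}_{i2}^{s(1)}\hat u_{i2}^{s(1)} & 0 & \cdots & 0 & 0 \\ 0 & \hat{\tilde\varepsilon}_{i3}^{s(1)}\hat u_{i2}^{s(1)} & \hat{\tilde\varepsilon}_{i3}^{s(1)}\hat u_{i3}^{s(1)} & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & \cdots & \hat{\tilde\varepsilon}_{i,T-1}^{s(1)}\hat u_{i,T-2}^{s(1)} & \hat{\tilde\varepsilon}_{i,T-1}^{s(1)}\hat u_{i,T-1}^{s(1)} & 0 \\ 0 & 0 & \cdots & 0 & \hat{\tilde\varepsilon}_{iT}^{s(1)}\hat u_{i,T-1}^{s(1)} & \hat{\tilde\varepsilon}_{iT}^{s(1)}\hat u_{iT}^{s(1)} \end{pmatrix}. \tag{66}$$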
For the special case $\sigma^2_\varepsilon\Omega_i = \sigma^2_{\varepsilon,i}I_T$ of cross-section heteroskedasticity and time-series homoskedasticity one can use the weighting matrix
$$\hat S_c^{(1)} = \left\{\sum_{i=1}^N \hat\sigma^{2,s(1)}_{\varepsilon,i}\begin{pmatrix} Z_i'HZ_i & Z_i'D\tilde Z_i^s \\ \tilde Z_i^{s\prime}D'Z_i & \tilde Z_i^{s\prime}[I_T + (\hat\sigma^{2,s(1)}_\eta/\hat\sigma^{2,s(1)}_{\varepsilon,i})\iota_T\iota_T']\tilde Z_i^s \end{pmatrix}\right\}^{-1}, \tag{67}$$
where
$$\hat\sigma^{2,s(1)}_{\varepsilon,i} = \hat{\tilde\varepsilon}_i^{s(1)\prime}H^{-1}\hat{\tilde\varepsilon}_i^{s(1)}/(T-1), \tag{68}$$
$$\hat\sigma^{2,s(1)}_\eta = T^{-1}N^{-1}\sum_{i=1}^N \hat u_i^{s(1)\prime}\hat u_i^{s(1)} - N^{-1}\sum_{i=1}^N \hat\sigma^{2,s(1)}_{\varepsilon,i}. \tag{69}$$
For $j \in \{a,b,c\}$ three alternative 2-step system estimators
$$\hat{\ddot\alpha}^{(2)}_{(j)} = \Big[\Big(\sum_{i=1}^N \ddot R_i'\ddot Z_i^s\Big)\hat S_j^{(1)}\Big(\sum_{i=1}^N \ddot Z_i^{s\prime}\ddot R_i\Big)\Big]^{-1}\Big(\sum_{i=1}^N \ddot R_i'\ddot Z_i^s\Big)\hat S_j^{(1)}\Big(\sum_{i=1}^N \ddot Z_i^{s\prime}\ddot y_i\Big) \tag{70}$$
are obtained, where the inverse matrix expression can be used again to estimate the variance of $\hat{\ddot\alpha}^{(2)}_{(j)}$ if all employed moment conditions are valid.

3.4. Coefficient Restriction Tests

Simple Student-type coefficient test statistics can be obtained from 1-step and 2-step AB and BB estimation for the different weighting matrices considered. The 1-step estimators can be used in combination with a robust variance estimate (which takes possible heteroskedasticity into account). The 2-step estimators can be used in combination with the standard or a corrected variance estimate.$^{1}$
When testing particular coefficient values, the relevant element of estimator $\hat{\tilde\alpha}^{(1)}$ given in (41) should under homoskedasticity be scaled by the corresponding diagonal element of the standard expression for its estimated variance given by
$$\widehat{\mathrm{Var}}(\hat{\tilde\alpha}^{(1)}) = N^{-1}\sum_{i=1}^N \hat\sigma^{2,(1)}_{\varepsilon,i}\,\Psi, \quad\text{with}\quad \Psi = \Big[\Big(\sum_{i=1}^N \tilde R_i'Z_i\Big)G^{(0)}\Big(\sum_{i=1}^N Z_i'\tilde R_i\Big)\Big]^{-1}, \tag{71}$$
where $\hat\sigma^{2,(1)}_{\varepsilon,i}$ is given in (48). Its robust version under cross-sectional heteroskedasticity uses for $j \in \{a,b,c\}$
$$\widehat{\mathrm{Var}}_{(j)}(\hat{\tilde\alpha}^{(1)}) = \Psi\Big(\sum_{i=1}^N \tilde R_i'Z_i\Big)G^{(0)}[\hat G_j^{(1)}]^{-1}G^{(0)}\Big(\sum_{i=1}^N Z_i'\tilde R_i\Big)\Psi. \tag{72}$$
However, under heteroskedasticity the estimators $\hat{\tilde\alpha}^{(2)}_{(j)}$ given in (50) are more efficient. The standard estimator for their variance is
$$\widehat{\mathrm{Var}}(\hat{\tilde\alpha}^{(2)}_{(j)}) = \Big[\Big(\sum_{i=1}^N \tilde R_i'Z_i\Big)\hat G_j^{(1)}\Big(\sum_{i=1}^N Z_i'\tilde R_i\Big)\Big]^{-1}. \tag{73}$$
The corrected version $\widehat{\mathrm{Var}}_c(\hat{\tilde\alpha}^{(2)}_{(j)})$ requires derivation for $k = 1,\ldots,K-1$ of the actual implementation of matrix $\partial\Omega(\beta)/\partial\beta_k$ of Appendix A which is here $N(T-1)\times N(T-1)$. We denote its $i$-th block as $\partial\tilde\Omega_{(j)i}(\tilde\alpha)/\partial\tilde\alpha_k$. For the a-type weighting matrix$^{2}$ the relevant $(T-1)\times(T-1)$ matrix $\partial\tilde\varepsilon_i\tilde\varepsilon_i'/\partial\tilde\alpha_k$ with $\tilde\varepsilon_i = \tilde y_i - \tilde R_i\tilde\alpha$, is $-(\tilde\varepsilon_i\tilde R_{ik}' + \tilde R_{ik}\tilde\varepsilon_i')$, where $\tilde R_{ik}$ denotes the $k$-th column of $\tilde R_i$. For weighting matrix b it simplifies to the matrix consisting of the main diagonal and the two first sub-diagonals of $-(\tilde\varepsilon_i\tilde R_{ik}' + \tilde R_{ik}\tilde\varepsilon_i')$ with all other elements zero. Moreover, $\partial\tilde\Omega_{(c)i}(\tilde\alpha)/\partial\tilde\alpha_k = -2[\tilde\varepsilon_i'H^{-1}\tilde R_{ik}/(T-1)]H$. So, we find
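$$\widehat{\mathrm{Var}}_c(\hat{\tilde\alpha}^{(2)}_{(j)}) = \widehat{\mathrm{Var}}(\hat{\tilde\alpha}^{(2)}_{(j)}) + \hat F_{(j)}\widehat{\mathrm{Var}}(\hat{\tilde\alpha}^{(2)}_{(j)}) + \widehat{\mathrm{Var}}(\hat{\tilde\alpha}^{(2)}_{(j)})\hat F_{(j)}' + \hat F_{(j)}\widehat{\mathrm{Var}}_{(j)}(\hat{\tilde\alpha}^{(1)})\hat F_{(j)}', \tag{74}$$
with the $k$-th column of $\hat F_{(j)}$ given by
$$\hat F_{(j)\cdot k} = -\widehat{\mathrm{Var}}(\hat{\tilde\alpha}^{(2)}_{(j)})\Big(\sum_{i=1}^N \tilde R_i'Z_i\Big)\hat G_j^{(1)}\Big[\sum_{i=1}^N Z_i'\,\frac{\partial\tilde\Omega_{(j)i}(\tilde\alpha)}{\partial\tilde\alpha_k}\Big|_{\hat{\tilde\alpha}^{(1)}}Z_i\Big]\hat G_j^{(1)}\Big(\sum_{i=1}^N Z_i'\hat{\tilde\varepsilon}_i^{(2)}\Big). \tag{75}$$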
All above expressions become a bit more complex when considering Blundell-Bond estimation of the $K$ coefficients $\ddot\alpha$. The suboptimal 1-step estimator (63) of $\ddot\alpha$ should not be used for testing, unless in combination with
$$\widehat{\mathrm{Var}}(\hat{\ddot\alpha}^{(1)}) = \Phi\Big(\sum_{i=1}^N \ddot R_i'\ddot Z_i^s\Big)S^{(0)}(q)\big[S^{(0)}(\hat\sigma^{2,s(1)}_\eta/\hat\sigma^{2,s(1)}_\varepsilon)\big]^{-1}S^{(0)}(q)\Big(\sum_{i=1}^N \ddot Z_i^{s\prime}\ddot R_i\Big)\Phi,$$
under homoskedasticity, or a robust variance estimator, which is
$$\widehat{\mathrm{Var}}_{(j)}(\hat{\ddot\alpha}^{(1)}) = \Phi\Big(\sum_{i=1}^N \ddot R_i'\ddot Z_i^s\Big)S^{(0)}(q)[\hat S_j^{(1)}]^{-1}S^{(0)}(q)\Big(\sum_{i=1}^N \ddot Z_i^{s\prime}\ddot R_i\Big)\Phi,$$
where $\Phi = [(\sum_{i=1}^N \ddot R_i'\ddot Z_i^s)S^{(0)}(q)(\sum_{i=1}^N \ddot Z_i^{s\prime}\ddot R_i)]^{-1}$. It seems better of course to use the efficient estimator $\hat{\ddot\alpha}^{(2)}_{(j)}$ of (70). The standard expression for its estimated variance is
$$\widehat{\mathrm{Var}}(\hat{\ddot\alpha}^{(2)}_{(j)}) = \Big[\Big(\sum_{i=1}^N \ddot R_i'\ddot Z_i^s\Big)\hat S_j^{(1)}\Big(\sum_{i=1}^N \ddot Z_i^{s\prime}\ddot R_i\Big)\Big]^{-1}.$$
Their corrected versions can be obtained by a formula similar to (74) upon changing $\tilde\alpha$ in $\ddot\alpha$ and $\hat F_{(j)\cdot k}$ of (75) in
$$\hat F_{(j)\cdot k} = -\widehat{\mathrm{Var}}(\hat{\ddot\alpha}^{(2)}_{(j)})\Big(\sum_{i=1}^N \ddot R_i'\ddot Z_i^s\Big)\hat S_j^{(1)}\Big[\sum_{i=1}^N \ddot Z_i^{s\prime}\,\frac{\partial\ddot\Omega_{(j)i}(\ddot\alpha)}{\partial\ddot\alpha_k}\Big|_{\hat{\ddot\alpha}^{(1)}}\ddot Z_i^s\Big]\hat S_j^{(1)}\Big(\sum_{i=1}^N \ddot Z_i^{s\prime}\hat{\ddot\varepsilon}_i^{(2)}\Big),$$
where the block of $\partial\ddot\Omega_{(j)i}(\ddot\alpha)/\partial\ddot\alpha_k$ corresponding to the equation in first differences is similar as before, but with an extra column and row of zeros for the intercept. The block corresponding to the equation in levels we took for weighting matrices a and b equal to $\partial u_iu_i'/\partial\ddot\alpha_k = -(u_i\bar R_{ik}^{*\prime} + \bar R_{ik}^*u_i')$, and for type c
$$\partial\Big\{\tilde\varepsilon_i'H^{-1}\tilde\varepsilon_i/(T-1) + \sum_{i=1}^N[(\iota_T'u_i)^2 - u_i'u_i]/[NT(T-1)]\Big\}I_T\big/\partial\ddot\alpha_k,$$
for which the first term yields $-2[(\tilde\varepsilon_i'H^{-1}\tilde R_{ik})/(T-1)]I_T$, and the second gives
$$-2\Big\{\sum_{i=1}^N(\iota_T'u_i\bar R_{ik}^{*\prime}\iota_T - \bar R_{ik}^{*\prime}u_i)/[NT(T-1)]\Big\}I_T.$$
For the nondiagonal upperblock of $\partial\ddot\Omega_{(j)i}(\ddot\alpha)/\partial\ddot\alpha_k$ we took in cases a and b $\partial\tilde\varepsilon_iu_i'/\partial\tilde\alpha_k = -(\tilde\varepsilon_i\bar R_{ik}^{*\prime} + \tilde R_{ik}u_i')$ and for the derivative with respect to the intercept $-\tilde\varepsilon_i\iota_T'$. In case c it is $-2[(\tilde\varepsilon_i'H^{-1}\tilde R_{ik})/(T-1)]D$ and a zero matrix for the derivative with respect to the intercept.

3.5. Tests of Overidentification Restrictions

Using Arellano-Bond and Blundell-Bond type estimation, many options exist with respect to testing the overidentification restrictions. These options differ in the residuals and weighting matrices being employed. After 1-step Arellano-Bond estimation, see (41) and (48), we have the test statistics
$$JAB^{(1,0)} = \Big(\sum_{i=1}^N \hat{\tilde\varepsilon}_i^{(1)\prime}Z_i\Big)\Big(\sum_{i=1}^N Z_i'HZ_i\Big)^{-1}\Big(\sum_{i=1}^N Z_i'\hat{\tilde\varepsilon}_i^{(1)}\Big)\Big/\Big(N^{-1}\sum_{i=1}^N\hat\sigma^{2,(1)}_{\varepsilon,i}\Big),$$
$$JAB_j^{(1,1)} = \Big(\sum_{i=1}^N \hat{\tilde\varepsilon}_i^{(1)\prime}Z_i\Big)\hat G_j^{(1)}\Big(\sum_{i=1}^N Z_i'\hat{\tilde\varepsilon}_i^{(1)}\Big), \quad j \in \{a,b,c\},$$
which are only valid in case of conditional homoskedasticity. Here $\hat G_j^{(1)}$ is given in (43), (44) and (46). From (70) one may obtain 2-step residuals $\hat{\tilde\varepsilon}_{i(j)}^{(2)} = \tilde y_i - \tilde R_i^*\hat{\ddot\alpha}^{(2)}_{(j)}$, and from these overidentification restrictions test statistics can be calculated which are valid under heteroskedasticity. These may differ depending on whether the $j$-th weighting matrix is now obtained still from 1-step or already from 2-step residuals. This leads to
$$JAB_j^{(2,h)} = \Big(\sum_{i=1}^N \hat{\tilde\varepsilon}_{a,i}^{(2)\prime}Z_i\Big)\hat G_j^{(h)}\Big(\sum_{i=1}^N Z_i'\hat{\tilde\varepsilon}_{a,i}^{(2)}\Big), \quad\text{for } h \in \{1,2\},$$
where the 2-step weighting matrices are either $\hat G_a^{(2)} = (\sum_{i=1}^N Z_i'\hat{\tilde\varepsilon}_{i(a)}^{(2)}\hat{\tilde\varepsilon}_{i(a)}^{(2)\prime}Z_i)^{-1}$, $\hat G_b^{(2)} = (\sum_{i=1}^N Z_i'\hat H_i^{(2)b}Z_i)^{-1}$ or $\hat G_c^{(2)} = (\sum_{i=1}^N \hat\sigma^{2,(2)}_{\varepsilon,i(c)}Z_i'HZ_i)^{-1}$, and $\hat H_i^{(2)b}$ is like $\hat H_i^{(1)b}$ of (45), though using $\hat{\tilde\varepsilon}_{i(b)}^{(2)}$ instead of $\hat{\tilde\varepsilon}_i^{(1)}$; furthermore $\hat\sigma^{2,(2)}_{\varepsilon,i(c)} = \hat{\tilde\varepsilon}_{i(c)}^{(2)\prime}H^{-1}\hat{\tilde\varepsilon}_{i(c)}^{(2)}/(T-1)$.
Exploiting effect stationarity of a subset of the regressors by estimating the Blundell-Bond system leads to the 1-step test statistics (exclusively for use under conditional homoskedasticity)
$$JBB^{(1,0)} = \Big(\sum_{i=1}^N \hat{\ddot u}_i^{(1)\prime}\ddot Z_i^s\Big)S^{(0)}(\hat\sigma^{2,s(1)}_\eta/\hat\sigma^{2,s(1)}_\varepsilon)\Big(\sum_{i=1}^N \ddot Z_i^{s\prime}\hat{\ddot u}_i^{(1)}\Big)\Big/\hat\sigma^{2,s(1)}_\varepsilon,$$
$$JBB_j^{(1,1)} = \Big(\sum_{i=1}^N \hat{\ddot u}_i^{(1)\prime}\ddot Z_i^s\Big)\hat S_j^{(1)}\Big(\sum_{i=1}^N \ddot Z_i^{s\prime}\hat{\ddot u}_i^{(1)}\Big), \quad j \in \{a,b,c\},$$
where $\hat\sigma^{2,s(1)}_\varepsilon = \sum_{i=1}^N\hat\sigma^{2,s(1)}_{\varepsilon,i}/N$ and $S^{(0)}(\cdot)$ and $\hat S_j^{(1)}$ can be found in (62), (64), (65) and (67). Defining the various 2-step residuals and variance estimators as $\hat{\ddot u}_{i(j)}^{(2)} = \ddot y_i - \ddot R_i\hat{\ddot\alpha}^{(2)}_{(j)} = (\hat{\tilde\varepsilon}_{i(j)}^{s(2)\prime}, \hat u_{i(j)}^{s(2)\prime})'$ and $\hat\sigma^{2,s(2)}_{\varepsilon,i(j)}$ and $\hat\sigma^{2,(2)}_{\eta(j)}$ similar to (68) and (69) though obtained from the appropriate two-step residuals $\hat{\tilde\varepsilon}_{i(j)}^{s(2)} = \tilde y_i - \tilde R_i^*\hat{\ddot\alpha}^{(2)}_{(j)}$ and $\hat u_{i(j)}^{s(2)} = y_i - R_i^*\hat{\ddot\alpha}^{(2)}_{(j)}$, the statistics to be used under heteroskedasticity after 2-step estimation are
$$JBB_j^{(2,h)} = \Big(\sum_{i=1}^N \hat{\ddot u}_{i(j)}^{(2)\prime}\ddot Z_i^s\Big)\hat S_j^{(h)}\Big(\sum_{i=1}^N \ddot Z_i^{s\prime}\hat{\ddot u}_{i(j)}^{(2)}\Big),$$
where $\hat S_a^{(2)}$ and $\hat S_b^{(2)}$ are like $\hat S_a^{(1)}$ and $\hat S_b^{(1)}$, except that they use $\hat{\ddot u}_{i(a)}^{(2)}$ and $\hat{\ddot u}_{i(b)}^{(2)}$ instead of $\hat{\ddot u}_i^{(1)}$. With respect to $\hat S_c^{(2)}$ one can use
$$\hat S_c^{(2)} = \left\{\sum_{i=1}^N \hat\sigma^{2,s(2)}_{\varepsilon,i(c)}\begin{pmatrix} Z_i'HZ_i & Z_i'D\tilde Z_i^s \\ \tilde Z_i^{s\prime}D'Z_i & \tilde Z_i^{s\prime}[I_T + (\hat\sigma^{2,(2)}_{\eta(c)}/\hat\sigma^{2,s(2)}_{\varepsilon,i(c)})\iota_T\iota_T']\tilde Z_i^s \end{pmatrix}\right\}^{-1}.$$
Under their respective null hypotheses the tests based on Arellano-Bond estimation follow asymptotically $\chi^2$ distributions with $L-K-T+1$ degrees of freedom, whereas the tests based on Blundell-Bond estimates have $L+L^\circ-K-T$ degrees of freedom$^{3}$. Self-evidently tests on the effect stationarity related orthogonality conditions are given by
$$JES^{(1,0)} = JBB^{(1,0)} - JAB^{(1,0)},$$
$$JES_j^{(l,h)} = JBB_j^{(l,h)} - JAB_j^{(l,h)}, \quad l \ge h,\ l,h \in \{1,2\},\ j \in \{a,b,c\},$$
where $JES^{(1,0)}$ and $JES_j^{(1,1)}$ require homoskedasticity. They should all be compared with a $\chi^2$ critical value for $L^\circ-1$ degrees of freedom.$^{4}$

3.6. Modified GMM

In the special case that panel model (28) has cross-sectional heteroskedasticity and no time-series heteroskedasticity, hence
$$\sigma^2_\varepsilon\Omega_i = \sigma^2_{\varepsilon,i}I_T, \quad\text{with}\quad \sum_{i=1}^N\sigma^2_{\varepsilon,i} = \sigma^2_\varepsilon N, \tag{88}$$
we can easily employ MGMM estimator (11). However, because $H^{-1}$ is not a lower-triangular matrix, not all instruments $\sigma^{-2}_{\varepsilon,i}H^{-1}Z_i$ would be valid for the equation in first-differences. This problem can be avoided by using, instead of first-differencing, the forward orthogonal deviation (FOD) transformation for removing the individual effects. Let
$$B = \mathrm{diag}\Big(\tfrac{T-1}{T}, \tfrac{T-2}{T-1}, \ldots, \tfrac{2}{3}, \tfrac{1}{2}\Big)^{1/2}\begin{pmatrix} 1 & -\frac{1}{T-1} & -\frac{1}{T-1} & \cdots & -\frac{1}{T-1} & -\frac{1}{T-1} \\ 0 & 1 & -\frac{1}{T-2} & \cdots & -\frac{1}{T-2} & -\frac{1}{T-2} \\ \vdots & & \ddots & & & \vdots \\ 0 & 0 & \cdots & 1 & -\frac{1}{2} & -\frac{1}{2} \\ 0 & 0 & \cdots & 0 & 1 & -1 \end{pmatrix},$$
and $\check\varepsilon_i = B\varepsilon_i$. Then $B\iota_T = 0$ and $Bu_i = \check\varepsilon_i \sim (0, \sigma^2_{\varepsilon,i}I_{T-1})$ provided (88) holds, whereas $E(Z_i'\check\varepsilon_i) = 0$ for $Z_i$ given by (34). Hence, premultiplying model $y_i = R_i\tilde\alpha + (\mu+\tau_1+\eta_i)\iota_T + \varepsilon_i$ by $B$ yields
$$\check y_i = \check R_i\tilde\alpha + \check\varepsilon_i,$$
where $\check y_i = By_i$ and $\check R_i = BR_i$. Estimating this by GMM, but using an instrument matrix with components $\sigma^{-2}_{\varepsilon,i}Z_i$, yields the unfeasible MABu estimator for the model with cross-sectional heteroskedasticity, which is
$$\hat{\tilde\alpha}_{MABu} = \Big[\Big(\sum_{i=1}^N\sigma^{-2}_{\varepsilon,i}\check R_i'Z_i\Big)\Big(\sum_{i=1}^N\sigma^{-2}_{\varepsilon,i}Z_i'Z_i\Big)^{-1}\Big(\sum_{i=1}^N\sigma^{-2}_{\varepsilon,i}Z_i'\check R_i\Big)\Big]^{-1} \times \Big(\sum_{i=1}^N\sigma^{-2}_{\varepsilon,i}\check R_i'Z_i\Big)\Big(\sum_{i=1}^N\sigma^{-2}_{\varepsilon,i}Z_i'Z_i\Big)^{-1}\Big(\sum_{i=1}^N\sigma^{-2}_{\varepsilon,i}Z_i'\check y_i\Big).$$
Note that the exploited moment conditions are here $E(\sigma^{-2}_{\varepsilon,i}Z_i'\check\varepsilon_i) = \sigma^{-2}_{\varepsilon,i}E(Z_i'\check\varepsilon_i) = 0$. For $\sigma^2_{\varepsilon,i} > 0$ these are intrinsically equivalent with $E(Z_i'\check\varepsilon_i) = 0$, but they induce the use of a different set of instruments yielding a different estimator. It should be intuitively obvious that the unfeasible standard AB estimator ABu, which uses instruments $\sigma_{\varepsilon,i}Z_i$ for regressors $\sigma^{-1}_{\varepsilon,i}\check R_i$, will most likely exploit weaker instruments than MABu, which uses $\sigma^{-1}_{\varepsilon,i}Z_i$ for regressors $\sigma^{-1}_{\varepsilon,i}\check R_i$.
To convert this into a feasible procedure, one could initially assume that all $\sigma^2_{\varepsilon,i}$ are equal. Then the first-step MGMM estimator $\hat{\tilde\alpha}^{(1)}_{MAB}$ is numerically equivalent to AB1 of (41), provided all instruments are being used.$^{5}$ Next, exploiting (48), the feasible 2-step MAB estimator can be obtained by
$$\hat{\tilde\alpha}^{(2)}_{MAB} = \Big[\Big(\sum_{i=1}^N\check R_i'Z_i/\hat\sigma^{2,(1)}_{\varepsilon,i}\Big)\Big(\sum_{i=1}^N Z_i'Z_i/\hat\sigma^{2,(1)}_{\varepsilon,i}\Big)^{-1}\Big(\sum_{i=1}^N Z_i'\check R_i/\hat\sigma^{2,(1)}_{\varepsilon,i}\Big)\Big]^{-1} \times \Big(\sum_{i=1}^N\check R_i'Z_i/\hat\sigma^{2,(1)}_{\varepsilon,i}\Big)\Big(\sum_{i=1}^N Z_i'Z_i/\hat\sigma^{2,(1)}_{\varepsilon,i}\Big)^{-1}\Big(\sum_{i=1}^N Z_i'\check y_i/\hat\sigma^{2,(1)}_{\varepsilon,i}\Big).$$
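The forward orthogonal deviation matrix $B$ can be constructed and checked numerically. The sketch below follows the definition of $B$ given above; the function name `fod_matrix` is hypothetical.

```python
import numpy as np

def fod_matrix(T):
    """(T-1) x T forward orthogonal deviation matrix B."""
    B = np.zeros((T - 1, T))
    for t in range(T - 1):
        m = T - 1 - t                    # number of future periods for this row
        scale = np.sqrt(m / (m + 1.0))   # sqrt((T-1)/T), ..., sqrt(1/2)
        B[t, t] = scale                  # current observation
        B[t, t + 1:] = -scale / m        # minus the mean of all future observations
    return B

T = 6
B = fod_matrix(T)
removes_effects = np.allclose(B @ np.ones(T), 0.0)    # B iota_T = 0
keeps_sphericity = np.allclose(B @ B.T, np.eye(T - 1))   # B B' = I_{T-1}
print(removes_effects, keeps_sphericity)
```

The second check is what distinguishes this transformation from first differencing: because $BB' = I_{T-1}$, errors that are homoskedastic and serially uncorrelated over time stay so after the transformation, whereas $D_TD_T' = H$ is not diagonal.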
Modifying the system estimator is more problematic, primarily because the inverse of the matrix $\mathrm{Var}(u_i) = \Sigma_i = \sigma^2_{\varepsilon,i}I_T + \sigma^2_\eta\iota_T\iota_T'$, which is $\Sigma_i^{-1} = \sigma^{-2}_{\varepsilon,i}[I_T - (T + \sigma^2_{\varepsilon,i}/\sigma^2_\eta)^{-1}\iota_T\iota_T']$, is nondiagonal. So, although $E(\tilde Z_i^{s\prime}u_i) = 0$, surely $E(\tilde Z_i^{s\prime}\Sigma_i^{-1}u_i) \neq 0$. However, as an unfeasible modified system estimator we can combine estimation of the model for $\check y_i$ using instruments $\sigma^{-2}_{\varepsilon,i}Z_i$ with estimation of the model for $y_i$ using instruments $(\sigma^2_{\varepsilon,i} + \sigma^2_\eta)^{-1}\tilde Z_i^s$. So, the system is then given by the model
$$\breve y_i = \breve R_i\ddot\alpha + \breve u_i,$$
where $\breve y_i = (\check y_i', y_i')'$, $\breve R_i = (\check R_i^{*\prime}, R_i^{*\prime})'$, with $\check R_i^* = (\check R_i, 0)$, and $\breve u_i = (\check\varepsilon_i', u_i')'$.
For the 1-step estimator we could again choose some nonnegative value $q$ and calculate the 1-step estimator BB1 given in (63) in order to find residuals and obtain the estimators $\hat\sigma^{2,s(1)}_{\varepsilon,i}$ and $\hat\sigma^{2,s(1)}_\eta$ of (68) and (69). Building on $E(\breve u_i\breve u_i')$ and instrument matrix block $\breve Z_i$, given by
$$E(\breve u_i\breve u_i') = \sigma^2_{\varepsilon,i}\begin{pmatrix} I_{T-1} & B \\ B' & I_T + (\sigma^2_\eta/\sigma^2_{\varepsilon,i})\iota_T\iota_T' \end{pmatrix} \quad\text{and}\quad \breve Z_i = \begin{pmatrix} \hat\sigma^{-2,s(1)}_{\varepsilon,i}Z_i & O \\ O & (\hat\sigma^{2,s(1)}_{\varepsilon,i} + \hat\sigma^{2,s(1)}_\eta)^{-1}\tilde Z_i^s \end{pmatrix},$$
one obtains weighting matrix
$$\hat S_{cB}^{(1)} = \left\{\sum_{i=1}^N \begin{pmatrix} \hat\sigma^{-2,s(1)}_{\varepsilon,i}Z_i'Z_i & (\hat\sigma^{2,s(1)}_{\varepsilon,i} + \hat\sigma^{2,s(1)}_\eta)^{-1}Z_i'B\tilde Z_i^s \\ (\hat\sigma^{2,s(1)}_{\varepsilon,i} + \hat\sigma^{2,s(1)}_\eta)^{-1}\tilde Z_i^{s\prime}B'Z_i & (\hat\sigma^{2,s(1)}_{\varepsilon,i} + \hat\sigma^{2,s(1)}_\eta)^{-2}\tilde Z_i^{s\prime}[\hat\sigma^{2,s(1)}_{\varepsilon,i}I_T + \hat\sigma^{2,s(1)}_\eta\iota_T\iota_T']\tilde Z_i^s \end{pmatrix}\right\}^{-1},$$
which can be exploited in the feasible 2-step MGMM system estimator MBB
$$\hat{\ddot\alpha}^{(2)}_{MBB} = \Big[\Big(\sum_{i=1}^N\breve R_i'\breve Z_i\Big)\hat S_{cB}^{(1)}\Big(\sum_{i=1}^N\breve Z_i'\breve R_i\Big)\Big]^{-1}\Big(\sum_{i=1}^N\breve R_i'\breve Z_i\Big)\hat S_{cB}^{(1)}\Big(\sum_{i=1}^N\breve Z_i'\breve y_i\Big).$$
For both $\hat{\tilde\alpha}^{(2)}_{MAB}$ and $\hat{\ddot\alpha}^{(2)}_{MBB}$ relevant t-test and Sargan-Hansen test statistics can be constructed. Regarding the latter we will just examine
$$JMAB = \Big(\sum_{i=1}^N\hat{\check\varepsilon}_i^{(2)\prime}Z_i/\hat\sigma^{2,s(1)}_{\varepsilon,i}\Big)\Big(\sum_{i=1}^N Z_i'Z_i/\hat\sigma^{2,s(1)}_{\varepsilon,i}\Big)^{-1}\Big(\sum_{i=1}^N Z_i'\hat{\check\varepsilon}_i^{(2)}/\hat\sigma^{2,s(1)}_{\varepsilon,i}\Big),$$
where $\hat{\check\varepsilon}_i^{(2)} = \check y_i - \check R_i\hat{\tilde\alpha}^{(2)}_{MAB}$, and
$$JMBB = \Big(\sum_{i=1}^N\hat{\breve u}_i^{(2)\prime}\breve Z_i\Big)\hat S_{cB}^{(1)}\Big(\sum_{i=1}^N\breve Z_i'\hat{\breve u}_i^{(2)}\Big),$$
with $\hat{\breve u}_i^{(2)} = \breve y_i - \breve R_i\hat{\ddot\alpha}^{(2)}_{MBB} = (\hat{\check\varepsilon}_i^{s(2)\prime}, \hat u_i^{s(2)\prime})'$. Under their respective null hypotheses these follow asymptotically $\chi^2$ distributions with $L-K-T+1$ and $L+L^\circ-K$ degrees of freedom. Self-evidently, the test on the effect stationarity related orthogonality conditions is given by
$$JESM = JMBB - JMAB.$$
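The closed-form inverse of $\mathrm{Var}(u_i) = \sigma^2_{\varepsilon,i}I_T + \sigma^2_\eta\iota_T\iota_T'$ follows from the Sherman-Morrison formula, with a minus sign in front of the rank-one term, and the nondiagonality that complicates the modified system estimator is easy to confirm. The numerical values below are arbitrary illustrations.

```python
import numpy as np

T = 4
s2_eps_i, s2_eta = 1.5, 0.7    # arbitrary illustrative variances
iota = np.ones((T, 1))

Sigma = s2_eps_i * np.eye(T) + s2_eta * (iota @ iota.T)
# Sherman-Morrison: (a I + b ii')^{-1} = a^{-1} [ I - (T + a/b)^{-1} ii' ]
Sigma_inv = (np.eye(T) - (iota @ iota.T) / (T + s2_eps_i / s2_eta)) / s2_eps_i

is_inverse = np.allclose(Sigma @ Sigma_inv, np.eye(T))
off_diagonal = Sigma_inv[0, 1]   # nonzero, so Sigma^{-1} mixes all periods
print(is_inverse, off_diagonal)
```

Because every off-diagonal element of the inverse is nonzero, premultiplying the level instruments by it would combine the disturbance of each period with instruments dated in all other periods, which is why the diagonal scaling $(\sigma^2_{\varepsilon,i} + \sigma^2_\eta)^{-1}$ is used instead.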

4. Simulation Design

We will examine the stable dynamic simultaneous heteroskedastic DGP ($i = 1, \dots, N$; $t = 1, \dots, T$)
$$y_{it} = \mu_y + \gamma y_{i,t-1} + \beta x_{it} + \sigma_\eta \eta_i + \sigma_\varepsilon \omega_i^{1/2} \varepsilon_{it} \qquad (|\gamma| < 1). \tag{99}$$
Here $\beta$ has just one element, relating to the regressor $x_{it}$, which for each $i$ is a stable autoregressive process:
$$x_{it} = \mu_x + \xi x_{i,t-1} + \pi_\eta \eta_i + \pi_\lambda \lambda_i + \sigma_v \omega_i^{1/2} v_{it}, \quad\text{where}$$
$$v_{it} = \rho_{v\varepsilon} \varepsilon_{it} + (1 - \rho_{v\varepsilon}^2)^{1/2} \zeta_{it}, \tag{101}$$
with $|\xi| < 1$ and $|\rho_{v\varepsilon}| < 1$. All random drawings $\eta_i$, $\varepsilon_{it}$, $\lambda_i$, $\zeta_{it}$ are $IID(0,1)$ and mutually independent. Parameter $\rho_{v\varepsilon}$ indicates the correlation between the cross-sectionally heteroskedastic disturbances $\sigma_\varepsilon \omega_i^{1/2} \varepsilon_{it}$ and $\sigma_v \omega_i^{1/2} v_{it}$, which are both homoskedastic over time. How we generated the values $\omega_1, \dots, \omega_N$ and the start-up values $x_{i,0}$ and $y_{i,0}$, and how we chose relevant numerical values for the other eleven parameters, will be discussed extensively below.
Note that in this DGP $x_{it}$ is either strictly exogenous ($\rho_{v\varepsilon} = 0$) or otherwise endogenous⁶; the only weakly exogenous regressor is $y_{i,t-1}$. Regressor $x_{it}$ may be affected contemporaneously by two independent individual specific effects when $\pi_\eta \neq 0$ and $\pi_\lambda \neq 0$, but also with delays if $\xi \neq 0$. The dependent variable $y_{it}$ may be affected contemporaneously by the (standardized) individual effect $\eta_i$, both directly and indirectly: directly if $\sigma_\eta \neq 0$, and indirectly via $x_{it}$ when $\beta \pi_\eta \neq 0$. However, $\eta_i$ will also have delayed effects on $y_{it}$ when $\gamma \neq 0$ or $\xi \beta \pi_\eta \neq 0$, and so has $\lambda_i$ when $\xi \beta \pi_\lambda \neq 0$.
The cross-sectional heteroskedasticity is determined by both $\eta_i$ and $\lambda_i$, the two standardized individual effects, and is thus associated with the regressors $x_{it}$ and $y_{i,t-1}$. It follows a lognormal pattern when both $\eta_i$ and $\lambda_i$ are standard normal, because we take
$$\omega_i = e^{h_i(\theta)}, \quad\text{with}\quad h_i(\theta) = -\theta^2/2 + \theta [\kappa^{1/2} \eta_i + (1 - \kappa)^{1/2} \lambda_i] \sim NID(-\theta^2/2, \theta^2),$$
where $0 \leq \kappa \leq 1$. This establishes a lognormal distribution with $E(\omega_i) = 1$ and $Var(\omega_i) = e^{\theta^2} - 1$. So, for $\theta = 0$ the disturbances of the $y_{it}$ and $x_{it}$ equations are homoskedastic. The seriousness of the heteroskedasticity increases with the absolute value of $\theta$. From $h_i(\theta)/2 \sim NID(-\theta^2/4, \theta^2/4)$ it follows that $\omega_i^{1/2} = e^{h_i(\theta)/2}$ is lognormally distributed too, hence $E(\sigma_\varepsilon \omega_i^{1/2}) = \sigma_\varepsilon e^{-\theta^2/8}$ and $Var(\sigma_\varepsilon \omega_i^{1/2}) = \sigma_\varepsilon^2 (1 - e^{-\theta^2/4})$. Table 2 presents some quantiles and moments of the distributions of $\omega_i$ and $\omega_i^{1/2}$ (taken as the positive square root of $\omega_i$) in order to disclose the effects of parameter $\theta$. It shows that $\theta \geq 1$ implies pretty serious heteroskedasticity, whereas it may be qualified as mild when $\theta \leq 0.3$, say. In all our simulation experiments the composite disturbance $\sigma_\varepsilon \omega_i^{1/2} \varepsilon_{it}$ will be unconditionally homoskedastic, irrespective of the value of $\theta$, because its second moment is $\sigma_\varepsilon^2 E(\omega_i \varepsilon_{it}^2) = \sigma_\varepsilon^2$. However, for $\theta \neq 0$ none of the experiments will be characterized by conditional homoskedasticity, for the following reason. The instrument $y_{i,t-2}$ will always be employed. Because a realization of $y_{i,t-2}$ depends on $\theta$, the second moment of this disturbance conditional on $y_{i,t-2}$ will depend on $\theta$ as well.
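The moment claims for $\omega_i$ and $\omega_i^{1/2}$ can be checked numerically. The following is a minimal sketch, assuming standard normal $\eta_i$ and $\lambda_i$; the function names `draw_omega` and `lognormal_moments`, the seed and the sample size are illustrative choices, not part of the original design.

```python
import math
import random


def draw_omega(n, theta, kappa, rng):
    """Draw omega_i = exp(h_i), h_i = -theta^2/2 + theta*(sqrt(kappa)*eta_i + sqrt(1-kappa)*lambda_i)."""
    omegas = []
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)   # standardized individual effect eta_i
        lam = rng.gauss(0.0, 1.0)   # standardized individual effect lambda_i
        h = -theta**2 / 2 + theta * (math.sqrt(kappa) * eta + math.sqrt(1 - kappa) * lam)
        omegas.append(math.exp(h))
    return omegas


def lognormal_moments(theta):
    """Theoretical moments stated in the text (with sigma_eps normalized to 1)."""
    return {
        "mean_omega": 1.0,
        "var_omega": math.exp(theta**2) - 1.0,
        "mean_root_omega": math.exp(-theta**2 / 8),
        "var_root_omega": 1.0 - math.exp(-theta**2 / 4),
    }


rng = random.Random(12345)          # arbitrary seed, for reproducibility only
omega = draw_omega(100_000, theta=1.0, kappa=0.5, rng=rng)
sample_mean = sum(omega) / len(omega)
sample_mean_root = sum(math.sqrt(w) for w in omega) / len(omega)
print(sample_mean, sample_mean_root, lognormal_moments(1.0))
```

The sample means should lie close to the theoretical values, and for $\theta = 0$ every $\omega_i$ equals 1 regardless of $\kappa$.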
Without loss of generality we may choose $\sigma_\varepsilon = 1$ and $\mu_y = \mu_x = 0$. Note that (99) implicitly specifies $\tau_1 = \dots = \tau_T = 0$. All simulation results refer to estimators where these $T$ restrictions have been imposed (there are no time effects), but $\mu_y = \mu_x = 0$ have not been imposed. Hence, when estimating the model in levels $\iota_T$ is one of the regressors. Moreover, we may always include $I_{T-1}$ in $Z_i$ and $\iota_T$ in $\tilde{Z}_i^{s}$ in order to exploit the fundamental moment conditions $E(\tilde{\varepsilon}_{it}) = 0$ (for $t = 2, \dots, T$) and $E[\sum_{t=1}^{T} (\eta_i + \varepsilon_{it})] = 0$ for $i = 1, \dots, N$.
Apart from values for $\theta$ and $\kappa$, we have to make choices on relevant values for eight more parameters. We could choose $\gamma \in \{0.2, 0.5, 0.8\}$, which covers a broad range of adjustment processes for dynamic behavioral relationships, and $\xi \in \{0.5, 0.8, 0.95\}$ to include less and more smooth $x_{it}$ processes. Next, interesting values should be given to the remaining six parameters, namely $\beta$, $\sigma_\eta$, $\pi_\eta$, $\pi_\lambda$, $\sigma_v$ and $\rho_{v\varepsilon}$. We will do this by choosing relevant values for six alternative, more meaningful notions, which are all functions of some of the eight DGP parameters and allow us to establish relevant numerical values for them, as suggested in Kiviet [39].
The first three notions will be based on (ratios of) particular variance components of the long-run stationary path of the process for $x_{it}$. Using lag-operator notation and assuming that $v_{it}$ (and $\varepsilon_{it}$) exist for $t = -\infty, \dots, 0, 1, \dots, T$, we find⁷ that the long-run path for $x_{it}$ consists of three mutually independent components, namely
$$x_{it} = (1 - \xi)^{-1} \pi_\eta \eta_i + (1 - \xi)^{-1} \pi_\lambda \lambda_i + \sigma_v \omega_i^{1/2} (1 - \xi L)^{-1} v_{it}. \tag{103}$$
The third component, the accumulated contributions of $v_{it}$, is a stationary AR(1) process with variance $\sigma_v^2 \omega_i / (1 - \xi^2)$. Approximating $N^{-1} \sum_{i=1}^{N} \omega_i$ by 1, the average variance is $\sigma_v^2 / (1 - \xi^2)$. The other two components have variances $\pi_\eta^2 / (1 - \xi)^2$ and $\pi_\lambda^2 / (1 - \xi)^2$ respectively, so the average long-run variance of $x_{it}$ equals
$$\bar{V}_x = (1 - \xi)^{-2} (\pi_\eta^2 + \pi_\lambda^2) + (1 - \xi^2)^{-1} \sigma_v^2.$$
A first characterization of the $x_{it}$ series can be obtained by setting $\bar{V}_x = 1$. This is an innocuous normalization, because $\beta$ is still a free parameter. As a second characterization of the $x_{it}$ series, we can choose what we call the (average) effects variance fraction of $x_{it}$, given by
$$EVF_x = (1 - \xi)^{-2} (\pi_\eta^2 + \pi_\lambda^2) / \bar{V}_x,$$
with $0 \leq EVF_x \leq 1$, for which we could take, say, $EVF_x \in \{0, 0.3, 0.6\}$. To balance the two individual effect variances we define, for the case $EVF_x > 0$, what we call the individual effect fraction of $\eta_i$ in $x_{it}$, given by
$$IEF_x^{\eta} = \pi_\eta^2 / (\pi_\eta^2 + \pi_\lambda^2).$$
So $IEF_x^{\eta}$, with $0 \leq IEF_x^{\eta} \leq 1$, expresses the fraction due to $\pi_\eta \eta_i$ of the (long-run) variance of $x_{it}$ stemming from the two individual effects. We could take, say, $IEF_x^{\eta} \in \{0, 0.3, 0.6\}$.
From these three characterizations we obtain
$$\pi_\lambda = (1 - \xi) [(1 - IEF_x^{\eta}) EVF_x]^{1/2},$$
$$\pi_\eta = (1 - \xi) [IEF_x^{\eta} EVF_x]^{1/2},$$
$$\sigma_v = [(1 - \xi^2)(1 - EVF_x)]^{1/2}. \tag{109}$$
For all three we will only consider the nonnegative root, because changing the sign would have no effects on the characteristics of $x_{it}$, as we will generate the series $\eta_i$, $\varepsilon_{it}$, $\lambda_i$ and $\zeta_{it}$ from symmetric distributions. The above choices regarding the $x_{it}$ process have the following implications for the average correlations between $x_{it}$ and its two constituting effects:
$$\bar{\rho}_{x\eta} = \pi_\eta / (1 - \xi) = [IEF_x^{\eta} EVF_x]^{1/2},$$
$$\bar{\rho}_{x\lambda} = \pi_\lambda / (1 - \xi) = [(1 - IEF_x^{\eta}) EVF_x]^{1/2}.$$
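Now the $x_{it}$ series can be generated upon choosing a value for $\rho_{v\varepsilon}$. This we obtain from the covariance between $x_{it}$ and the disturbance $\sigma_\varepsilon \omega_i^{1/2} \varepsilon_{it}$, which equals $\sigma_\varepsilon \sigma_v \rho_{v\varepsilon} \omega_i$ and on average is $\sigma_v \rho_{v\varepsilon}$. Hence, fixing the average simultaneity⁸ to $\bar{\rho}_{x\varepsilon}$, we should choose
$$\rho_{v\varepsilon} = \bar{\rho}_{x\varepsilon} / \sigma_v.$$
In order that both correlations are smaller than 1 in absolute value an admissibility restriction has to be satisfied, namely $\bar{\rho}_{x\varepsilon}^2 \leq \sigma_v^2$, giving
$$\bar{\rho}_{x\varepsilon}^2 \leq (1 - \xi^2)(1 - EVF_x).$$
When choosing $EVF_x = 0.6$ and $\xi = 0.8$ we should have $|\bar{\rho}_{x\varepsilon}| \leq 0.379$. That we should not exclude negative values of $\bar{\rho}_{x\varepsilon}$ will become obvious in due course. For the moment it seems interesting to examine, say, $\bar{\rho}_{x\varepsilon} \in \{-0.3, 0, 0.3\}$.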
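The mapping from the chosen characterizations to the DGP parameters of the $x_{it}$ process can be sketched as follows. This is an illustration only: the function name `x_process_parameters` and its argument names are hypothetical, the normalization $\bar{V}_x = 1$ is imposed, and the sketch restricts attention to $EVF_x < 1$ to avoid dividing by $\sigma_v = 0$.

```python
import math


def x_process_parameters(xi, evf_x, ief_x_eta, rho_bar_x_eps):
    """Return pi_lambda, pi_eta, sigma_v and rho_v_eps for given design characteristics.

    Assumes the normalization V_x = 1 and takes nonnegative roots throughout.
    """
    if not abs(xi) < 1:
        raise ValueError("x_it must be stable: |xi| < 1")
    if not (0 <= evf_x < 1 and 0 <= ief_x_eta <= 1):
        raise ValueError("need 0 <= EVF_x < 1 and 0 <= IEF_x_eta <= 1")
    pi_lambda = (1 - xi) * math.sqrt((1 - ief_x_eta) * evf_x)
    pi_eta = (1 - xi) * math.sqrt(ief_x_eta * evf_x)
    sigma_v = math.sqrt((1 - xi**2) * (1 - evf_x))
    # Admissibility restriction: rho_bar^2 <= (1 - xi^2)(1 - EVF_x) = sigma_v^2
    if rho_bar_x_eps**2 > sigma_v**2:
        raise ValueError("admissibility restriction violated")
    rho_v_eps = rho_bar_x_eps / sigma_v
    return {"pi_lambda": pi_lambda, "pi_eta": pi_eta,
            "sigma_v": sigma_v, "rho_v_eps": rho_v_eps}


params = x_process_parameters(xi=0.8, evf_x=0.6, ief_x_eta=0.3, rho_bar_x_eps=0.3)
# Recompute the average long-run variance of x_it as a consistency check.
v_x = ((params["pi_eta"]**2 + params["pi_lambda"]**2) / (1 - 0.8)**2
       + params["sigma_v"]**2 / (1 - 0.8**2))
print(params, v_x)
```

With $EVF_x = 0.6$ and $\xi = 0.8$, a request for $\bar{\rho}_{x\varepsilon} = 0.4$ is rejected by the admissibility check, in line with the bound quoted above.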
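The remaining choices concern $\beta$ and $\sigma_\eta$, which both directly affect the DGP for $y_{it}$. Substituting (103) and (101) in (99) we find that the long-run stationary path for $y_{it}$ entails four mutually independent components, since
$$\begin{aligned} y_{it} &= \beta (1 - \gamma L)^{-1} x_{it} + (1 - \gamma)^{-1} \sigma_\eta \eta_i + \sigma_\varepsilon \omega_i^{1/2} (1 - \gamma L)^{-1} \varepsilon_{it} \\ &= (1 - \gamma)^{-1} (1 - \xi)^{-1} \{ [\beta \pi_\eta + (1 - \xi) \sigma_\eta] \eta_i + \beta \pi_\lambda \lambda_i \} + \beta \sigma_v \omega_i^{1/2} (1 - \gamma L)^{-1} (1 - \xi L)^{-1} v_{it} + \sigma_\varepsilon \omega_i^{1/2} (1 - \gamma L)^{-1} \varepsilon_{it} \\ &= (1 - \gamma)^{-1} (1 - \xi)^{-1} \{ [\beta \pi_\eta + (1 - \xi) \sigma_\eta] \eta_i + \beta \pi_\lambda \lambda_i \} + \frac{\beta \sigma_v (1 - \rho_{v\varepsilon}^2)^{1/2} \omega_i^{1/2}}{(1 - \gamma L)(1 - \xi L)} \zeta_{it} + \frac{[\beta \rho_{v\varepsilon} \sigma_v + (1 - \xi L) \sigma_\varepsilon] \omega_i^{1/2}}{(1 - \gamma L)(1 - \xi L)} \varepsilon_{it}. \end{aligned} \tag{114}$$
The second term of the final expression constitutes for each $i$ an AR(2) process and the third one an ARMA(2,1) process. The variance of $y_{it}$ has four components given by (derivations in Appendix D)
$$\begin{aligned} V_\eta &= (1 - \gamma)^{-2} (1 - \xi)^{-2} [\beta \pi_\eta + (1 - \xi) \sigma_\eta]^2, \\ V_\lambda &= (1 - \gamma)^{-2} (1 - \xi)^{-2} \beta^2 \pi_\lambda^2, \\ V_\zeta^{(i)} &= \beta^2 \sigma_v^2 (1 - \rho_{v\varepsilon}^2) \frac{1 + \gamma \xi}{(1 - \gamma^2)(1 - \xi^2)(1 - \gamma \xi)} \omega_i, \\ V_\varepsilon^{(i)} &= \frac{[(1 + \beta \rho_{v\varepsilon} \sigma_v)^2 + \xi^2](1 + \gamma \xi) - 2 \xi (1 + \beta \rho_{v\varepsilon} \sigma_v)(\gamma + \xi)}{(1 - \gamma^2)(1 - \xi^2)(1 - \gamma \xi)} \omega_i. \end{aligned} \tag{115}$$
Averaging the last two over all $i$ yields $\bar{V}_\zeta$ and $\bar{V}_\varepsilon$. For the average long-run variance of $y_{it}$ we then can evaluate
$$\bar{V}_y = V_\eta + V_\lambda + \bar{V}_\zeta + \bar{V}_\varepsilon.$$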
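The four variance components can be evaluated directly. The sketch below assumes $\sigma_\varepsilon = 1$ and takes the average of $\omega_i$ as 1 by default; the function name `y_variance_components` and the numerical inputs are illustrative. It also makes two properties easy to verify: for $\rho_{v\varepsilon} = 0$ the $\varepsilon$ component collapses to $(1 - \gamma^2)^{-1}$, and flipping the signs of $\beta$, $\sigma_\eta$ and $\rho_{v\varepsilon}$ jointly leaves the total unchanged.

```python
def y_variance_components(gamma, xi, beta, sigma_eta, pi_eta, pi_lambda,
                          sigma_v, rho_v_eps, omega=1.0):
    """Long-run variance components of y_it (sigma_eps = 1); omega=1.0 gives the averages."""
    effects_scale = ((1 - gamma) * (1 - xi)) ** 2
    denom = (1 - gamma**2) * (1 - xi**2) * (1 - gamma * xi)
    v_eta = (beta * pi_eta + (1 - xi) * sigma_eta) ** 2 / effects_scale
    v_lambda = (beta * pi_lambda) ** 2 / effects_scale
    v_zeta = beta**2 * sigma_v**2 * (1 - rho_v_eps**2) * (1 + gamma * xi) / denom * omega
    a = 1 + beta * rho_v_eps * sigma_v      # coefficient of current eps in the ARMA(2,1) part
    v_eps = ((a**2 + xi**2) * (1 + gamma * xi) - 2 * xi * a * (gamma + xi)) / denom * omega
    return {"V_eta": v_eta, "V_lambda": v_lambda, "V_zeta": v_zeta,
            "V_eps": v_eps, "V_y": v_eta + v_lambda + v_zeta + v_eps}


# Illustrative parameter values (not taken from the paper's tables).
base = y_variance_components(0.5, 0.8, 0.9, 0.5, 0.08, 0.08, 0.5, 0.6)
flipped = y_variance_components(0.5, 0.8, -0.9, -0.5, 0.08, 0.08, 0.5, -0.6)
exogenous = y_variance_components(0.5, 0.8, 0.9, 0.5, 0.08, 0.08, 0.5, 0.0)
print(base["V_y"], flipped["V_y"], exogenous["V_eps"])
```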
When choosing fixed values for ratios involving these components to obtain values for $\beta$ and $\sigma_\eta$ we will run into the problem of multiple solutions. On the other hand, the four components of (115) have particular invariance properties regarding the signs of $\beta$, $\sigma_\eta$ and $\rho_{v\varepsilon}$, since changing the sign of all three yields exactly the same value of $\bar{V}_y$. We coped with this as follows. Although we note that $V_\eta$ does depend on $\beta \pi_\eta$, we set $\sigma_\eta$ simply by fixing the direct cumulated impact of $\eta_i$ on $y_{it}$ relative to the current noise $\sigma_\varepsilon = 1$. This is
$$DEN_y^{\eta} = \sigma_\eta / (1 - \gamma).$$
Because the direct and indirect (via $x_{it}$) effects from $\eta_i$ may have opposite signs, $DEN_y^{\eta}$ could be given negative values too, but we restricted ourselves to $DEN_y^{\eta} \in \{1, 4\}$, yielding
$$\sigma_\eta = (1 - \gamma) DEN_y^{\eta}.$$
Finally we fix a signal-noise ratio, which gives a value for $\beta$. Because under simultaneity the noise and current signal conflate, we focus on the case where $\bar{\rho}_{x\varepsilon} = 0$. Then we have
$$\bar{V}_\zeta = [(1 - \gamma^2)(1 - \xi^2)(1 - \gamma \xi)]^{-1} \beta^2 \sigma_v^2 (1 + \gamma \xi), \qquad \bar{V}_\varepsilon = (1 - \gamma^2)^{-1}.$$
Leaving the variance due to the effects aside, the average signal variance is $\bar{V}_\zeta + \bar{V}_\varepsilon - 1$, because the current average noise variance is unity. Hence, we may define a signal-noise ratio as
$$SNR = \bar{V}_\zeta + \bar{V}_\varepsilon - 1 = \frac{\beta^2 (1 - EVF_x)(1 + \gamma \xi)}{(1 - \gamma^2)(1 - \gamma \xi)} + \frac{\gamma^2}{1 - \gamma^2},$$
where we have substituted (109). For this we may choose, say, $SNR \in \{2, 3, 5\}$, in order to find
$$\beta = \left[ \frac{1 - \gamma \xi}{1 + \gamma \xi} \cdot \frac{SNR - \gamma^2 (SNR + 1)}{1 - EVF_x} \right]^{1/2}.$$
Note that here another admissibility restriction crops up, namely
$$\gamma^2 \leq SNR / (SNR + 1).$$
However, for $\gamma \leq 0.8$ this is satisfied for $SNR \geq 1.78$. From (119) we only examined the positive root.
Instead of fixing $SNR$ another approach would be to fix the total multiplier
$$TM = \beta / (1 - \gamma),$$
which would directly lead to a value for $\beta$, given $\gamma$. However, different $TM$ values will then lead to different $SNR$ values, because
$$SNR = TM^2 \frac{(1 - EVF_x)(1 - \gamma)(1 + \gamma \xi)}{(1 + \gamma)(1 - \gamma \xi)} + \frac{\gamma^2}{1 - \gamma^2}.$$
At this stage it is hard to say what would yield more useful information from the Monte Carlo, fixing $TM$ or $SNR$. Keeping both constant for different $\gamma$ and some other characteristics of this DGP is out of the question. We chose to fix $SNR = 3$, which yields $TM$ values in the range 1.5–1.8. When comparing with results for $TM = 1$ we did not note substantial differences of principle.
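The two routes to $\beta$ can be compared with a small calculation. The sketch below is illustrative: the function names are hypothetical, and the values $\xi = 0.8$ and $EVF_x = 0$ are simply one configuration from the design. It computes $\beta$ from a fixed $SNR$, derives the implied $TM$, and confirms that the $TM$-based expression returns the same $SNR$.

```python
import math


def beta_from_snr(gamma, xi, evf_x, snr):
    """Positive root for beta given the signal-noise ratio (case rho_bar_x_eps = 0)."""
    if gamma**2 > snr / (snr + 1):
        raise ValueError("admissibility restriction gamma^2 <= SNR/(SNR+1) violated")
    ratio = (1 - gamma * xi) / (1 + gamma * xi)
    return math.sqrt(ratio * (snr - gamma**2 * (snr + 1)) / (1 - evf_x))


def snr_from_tm(tm, gamma, xi, evf_x):
    """Signal-noise ratio implied by a given total multiplier TM = beta / (1 - gamma)."""
    first = tm**2 * (1 - evf_x) * (1 - gamma) * (1 + gamma * xi) / ((1 + gamma) * (1 - gamma * xi))
    return first + gamma**2 / (1 - gamma**2)


results = {}
for gamma in (0.2, 0.5, 0.8):
    beta = beta_from_snr(gamma, xi=0.8, evf_x=0.0, snr=3.0)
    tm = beta / (1 - gamma)
    results[gamma] = (beta, tm, snr_from_tm(tm, gamma, xi=0.8, evf_x=0.0))
    print(gamma, results[gamma])
```

Because $TM$ varies with $\gamma$ when $SNR$ is held fixed, the two cannot be kept constant simultaneously across the grid, which is the point made in the text.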
For all different design parameter combinations considered, which involve sample sizes $N \in \{200, 1000\}$ and $T \in \{3, 6, 9\}$, we used the very same realizations of the underlying standardized random components $\eta_i$, $\lambda_i$, $\varepsilon_{it}$ and $\zeta_{it}$ over the respective 10,000 replications that we performed. At this stage, all these components have been drawn from the standard normal distribution. To speed up the convergence of our simulation results, in each replication we have modified the $N$ drawings $\eta_i$ and $\lambda_i$ such that they have sample mean zero, sample variance 1 and sample correlation zero. This rescaling is achieved by first centering the $N$ draws, replacing each $\eta_i$ by $\eta_i - N^{-1} \sum_{j=1}^{N} \eta_j$, and next dividing each centered draw by the square root of the average of the squared centered draws. The $\lambda_i$ are replaced by the residuals obtained after regressing $\lambda_i$ on $\eta_i$ and an intercept, and these residuals are then scaled in the same way. In addition, we have rescaled in each replication the $\omega_i$ by dividing them by $N^{-1} \sum_{i=1}^{N} \omega_i$, so that the resulting $\omega_i$ sum to $N$, as they should, in order to avoid conflating the presence of heteroskedasticity with a larger or smaller average disturbance variance.
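The sample standardization described above can be sketched in a few lines. The function names `standardize_effects` and `rescale_omega` are hypothetical, and the sketch uses plain Python lists rather than whatever implementation the authors used.

```python
import math
import random


def standardize_effects(eta, lam):
    """Give eta and lambda sample mean 0, sample variance 1 and sample correlation 0."""
    n = len(eta)
    mean_eta = sum(eta) / n
    eta_c = [e - mean_eta for e in eta]                       # step 1: center eta
    scale_eta = math.sqrt(sum(e * e for e in eta_c) / n)
    eta_s = [e / scale_eta for e in eta_c]                    # step 2: scale eta
    # Residuals from regressing lambda on an intercept and eta.
    # Since eta_s has mean 0 and mean square 1, the slope is mean(lambda * eta_s).
    mean_lam = sum(lam) / n
    slope = sum(l * e for l, e in zip(lam, eta_s)) / n
    resid = [l - mean_lam - slope * e for l, e in zip(lam, eta_s)]
    scale_lam = math.sqrt(sum(r * r for r in resid) / n)
    lam_s = [r / scale_lam for r in resid]
    return eta_s, lam_s


def rescale_omega(omega):
    """Divide by the sample mean so that the rescaled omega_i sum to N."""
    mean_omega = sum(omega) / len(omega)
    return [w / mean_omega for w in omega]


rng = random.Random(2017)                                     # arbitrary seed
eta_raw = [rng.gauss(0.0, 1.0) for _ in range(200)]
lam_raw = [rng.gauss(0.0, 1.0) for _ in range(200)]
eta_std, lam_std = standardize_effects(eta_raw, lam_raw)
omega_std = rescale_omega([math.exp(-0.5 + e) for e in eta_std])   # theta = 1, kappa = 1
print(sum(eta_std) / 200, sum(l * e for l, e in zip(lam_std, eta_std)) / 200, sum(omega_std))
```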
In the simulation experiments we will start up the processes for $x_{it}$ and $y_{it}$ at pre-sample period $s < 0$ by taking $x_{i,s} = 0$ and $y_{i,s} = 0$ and next generate $x_{it}$ and $y_{it}$ for the indices $t = s + 1, \dots, T$. The data with time-indices $s, \dots, -1$ will be discarded when estimating the model. We suppose that for $s = -50$ both series will be on their stationary track from $t = 0$ onwards. When taking $s = -1$ or $-2$ the initial values $y_{i0}$ and $x_{i1}$ will be such that effect-stationarity has not yet been achieved. Due to the fixed zero start-ups (which are equal to the unconditional expectations) the (cross-)autocorrelations of the $x_{it}$ and $y_{it}$ series then have a very peculiar start too, so such results regarding effect nonstationarity will certainly not be fully general, but for $s$ close to zero they mimic in a particular way the situation that the process started only very recently.
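A minimal sketch of this start-up scheme follows, assuming $\sigma_\varepsilon = 1$ and $\mu_y = \mu_x = 0$. The function name `simulate_panel` and its signature are hypothetical, and for brevity the sketch skips the per-replication standardization of $\eta_i$, $\lambda_i$ and $\omega_i$ described earlier.

```python
import math
import random


def simulate_panel(n, t_max, gamma, beta, xi, sigma_eta, pi_eta, pi_lambda,
                   sigma_v, rho_v_eps, theta=0.0, kappa=0.0, s=-50, seed=0):
    """Generate x_it, y_it for t = 0..t_max from zero start-up values at period s < 0."""
    if s >= 0:
        raise ValueError("the start-up period s must be negative")
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)
        lam = rng.gauss(0.0, 1.0)
        h = -theta**2 / 2 + theta * (math.sqrt(kappa) * eta + math.sqrt(1 - kappa) * lam)
        root_omega = math.exp(h / 2)                 # omega_i^{1/2}
        x_prev, y_prev = 0.0, 0.0                    # x_{i,s} = y_{i,s} = 0
        x_kept, y_kept = [], []
        for t in range(s + 1, t_max + 1):
            eps = rng.gauss(0.0, 1.0)
            zeta = rng.gauss(0.0, 1.0)
            v = rho_v_eps * eps + math.sqrt(1 - rho_v_eps**2) * zeta
            x = xi * x_prev + pi_eta * eta + pi_lambda * lam + sigma_v * root_omega * v
            y = gamma * y_prev + beta * x + sigma_eta * eta + root_omega * eps
            if t >= 0:                               # discard periods s, ..., -1
                x_kept.append(x)
                y_kept.append(y)
            x_prev, y_prev = x, y
        panel.append({"eta": eta, "lambda": lam, "x": x_kept, "y": y_kept})
    return panel


# Illustrative design: xi = 0.8, EVF_x = 0.3, IEF_x_eta = 0.5, homoskedastic (theta = 0).
pi = 0.2 * math.sqrt(0.15)
demo = simulate_panel(n=2000, t_max=3, gamma=0.5, beta=1.0, xi=0.8, sigma_eta=0.5,
                      pi_eta=pi, pi_lambda=pi, sigma_v=math.sqrt(0.36 * 0.7),
                      rho_v_eps=0.0, seed=42)
x_last = [unit["x"][-1] for unit in demo]
mean_x = sum(x_last) / len(x_last)
var_x = sum((x - mean_x) ** 2 for x in x_last) / len(x_last)
print(len(demo), len(demo[0]["y"]), var_x)
```

With $s = -50$ the cross-sectional variance of $x_{it}$ should be close to the normalization $\bar{V}_x = 1$; setting `s=-1` instead keeps the series far from their stationary track.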
Another simple way to mimic a situation in which lagged first-differenced variables are invalid instruments for the model in levels can be designed as follows. Equations (103) and (114) highlight that in the long run $\Delta x_{it}$ and $\Delta y_{it}$ are uncorrelated with the effects $\eta_i$ and $\lambda_i$. This can be undermined by perturbing $x_{i0}$ and $y_{i0}$ as obtained from $s = -50$ in such a way that we add to them the values
$$\frac{\phi - 1}{1 - \xi} (\pi_\eta \eta_i + \pi_\lambda \lambda_i) \quad\text{and}\quad \frac{\phi - 1}{(1 - \gamma)(1 - \xi)} \{ [\beta \pi_\eta + (1 - \xi) \sigma_\eta] \eta_i + \beta \pi_\lambda \lambda_i \},$$
respectively. Note that for $\phi = 1$ effect stationarity is maintained, whereas for $0 \leq \phi < 1$ the dependence of $x_{i0}$ and $y_{i0}$ on the effects is mitigated in comparison to the stationary track (upon maintaining stationarity regarding $\varepsilon_{it}$ and $\zeta_{it}$), whereas for $\phi > 1$ this dependence is inflated. Note that this is a straightforward generalization of the approach followed in Kiviet [7] for the panel AR(1) model.
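The perturbation amounts to rescaling the effect components of the initial values by $\phi$. A small sketch, with the hypothetical function name `perturb_initial_values`, makes this explicit: applied to initial values that consist of the stationary effect components only, $\phi = 0$ removes them and $\phi = 1$ leaves them unchanged.

```python
def perturb_initial_values(x0, y0, eta, lam, phi, gamma, beta, xi,
                           sigma_eta, pi_eta, pi_lambda):
    """Add the phi-dependent perturbations to x_i0 and y_i0 (phi = 1 means no change)."""
    x_effect = (pi_eta * eta + pi_lambda * lam) / (1 - xi)
    y_effect = ((beta * pi_eta + (1 - xi) * sigma_eta) * eta
                + beta * pi_lambda * lam) / ((1 - gamma) * (1 - xi))
    return x0 + (phi - 1) * x_effect, y0 + (phi - 1) * y_effect


# Illustrative values only.
design = dict(gamma=0.5, beta=1.0, xi=0.8, sigma_eta=0.5, pi_eta=0.1, pi_lambda=0.05)
eta_i, lam_i = 1.3, -0.4
# Stationary effect components of x_i0 and y_i0 for this unit.
x_stat = (0.1 * eta_i + 0.05 * lam_i) / 0.2
y_stat = ((1.0 * 0.1 + 0.2 * 0.5) * eta_i + 1.0 * 0.05 * lam_i) / (0.5 * 0.2)
unchanged = perturb_initial_values(x_stat, y_stat, eta_i, lam_i, phi=1.0, **design)
removed = perturb_initial_values(x_stat, y_stat, eta_i, lam_i, phi=0.0, **design)
inflated = perturb_initial_values(x_stat, y_stat, eta_i, lam_i, phi=2.0, **design)
print(unchanged, removed, inflated)
```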

5. Simulation Results

To limit the number of tables we proceed as follows. Often we will first produce results on unfeasible implementations of the various inference techniques in relatively simple DGPs. These exploit the true values of $\omega_1, \dots, \omega_N$, $\sigma_\varepsilon^2$ and $\sigma_\eta^2$ instead of their estimates. Although this information is generally not available in practice, it seems useful to examine the performance of feasible implementations in more detail only when such unfeasible techniques behave reasonably well in finite samples. Results for the unfeasible Arellano and Bond [1] and Blundell and Bond [2] GMM estimators are denoted as ABu and BBu respectively. Their feasible counterparts are denoted as AB1 and BB1 for the 1-step (which under homoskedasticity are equivalent to their unfeasible counterparts) and AB2 and BB2 for the 2-step estimators. For 2-step estimators the lower case letters a, b or c are used (as in for instance AB2c) to indicate which type of weighting matrix has been exploited, as discussed in Section 3.2.1 and Section 3.3.2. For corresponding MGMM implementations these acronyms are preceded by the letter M. Under homoskedasticity their unfeasible implementation has been omitted when this is equivalent to GMM. In BB estimation we have always used $q = 1$.
First in Section 5.1 we will discuss the results for DGPs in which the initial conditions are such that BB estimation will be consistent and more efficient than AB, and subsequently in Section 5.2 the situation where BB is inconsistent is examined. Within these subsections we will examine different parameter value combinations for the DGP. We will start by presenting results for a reference parametrization (indicated P0) which has been chosen such that the model has in fact four parameters less, by taking $\bar{\rho}_{x\varepsilon} = 0$ ($x_{it}$ is strictly exogenous), $EVF_x = 0$ (hence $\pi_\lambda = \pi_\eta = 0$, so $x_{it}$ is neither correlated with $\lambda_i$ nor with $\eta_i$) and $\kappa = 0$ (any cross-sectional heteroskedasticity is just related with $\lambda_i$). These choices (implying that any heteroskedasticity will be unrelated to the mean of regressor $x_{it}$) may (hopefully) lead to results where little difference between unfeasible and feasible estimation will be found and where test sizes are relatively close to the nominal level of 5%. Next we will discuss the effects of settings (to be labelled P1, P2 etc.) which deviate from this reference parametrization P0 in one or more aspects regarding the various correlations and variance fractions and ratios. In P0 the relationship for $y_{it}$ will be characterized by $DEN_y^{\eta} = 1$ (the impacts on $y_{it}$ of the individual effect $\eta_i$ and of the idiosyncratic disturbance $\varepsilon_{it}$ have equal variance). The two remaining parameters have been held fixed over all cases examined (including P0); the $x_{it}$ series has autoregressive coefficient $\xi = 0.8$ and regarding $y_{it}$ we take $SNR = 3$ (excluding the impacts of the individual effects, the variance of the explanatory part of $y_{it}$ is three times as large as $\sigma_\varepsilon^2$).
In Section 3.2 we already indicated that we will examine implementations of GMM where all internal instruments associated with linear moment conditions will be employed (A), but also particular reductions based either on collapsing (C) or omitting long lags (L3, etc.), or a combination (C3, etc.). On top of this we will also distinguish situations that may lead to reductions of the instruments that are being used, because the regressor $x_{it}$ in model (99), which will either be strictly exogenous or endogenous with respect to $\varepsilon_{it}$, might be rightly or wrongly treated as either strictly exogenous, or as predetermined (weakly exogenous), or as endogenous. These three distinct situations will be indicated by the letters X, W and E respectively. So, in parametrization P0, where $x_{it}$ is strictly exogenous, the instruments used by either A, C or, say, L2, are not the same under the situations X, W and E. This is hopefully clarified in the next paragraph.
Since we assume that for estimation just the observations $y_{i0}, \dots, y_{iT}$ and $x_{i1}, \dots, x_{iT}$ are available, the number of internal instruments that are used under XA (all instruments, $x_{it}$ treated as strictly exogenous) for estimation of the equation in first differences is: $T - 1$ (time-dummies) $+\ T(T-1)/2$ (lags of $y_{it}$) $+\ (T-1)T$ (lags and leads of $x_{it}$). This yields $\{11, 50, 116\}$ instruments for $T = \{3, 6, 9\}$. Under WA this is $\{8, 35, 80\}$ and under EA $\{6, 30, 72\}$. From Section 3.3.1 it follows that for BB estimation this number of instruments increases with $1$ (intercept) $+\ T - 1$ (when $y_{i,t-1}$ is supposed to be effect stationary) $+\ T - 1$ (when $x_{it}$ is supposed to be effect stationary) $-\ 1$ (when $x_{it}$ is treated as endogenous). This implies for $T = \{3, 6, 9\}$ a total of $\{5, 11, 17\}$ extra instruments under XA and WA, and of $\{4, 10, 16\}$ under EA, whereas these extra instruments will be valid in Section 5.1 below and invalid in Section 5.2.
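The instrument counts can be reproduced with simple arithmetic. In the sketch below the XA formula is the one given in the text; the separate counts of $x_{it}$-based instruments under W, $T(T-1)/2$, and under E, $(T-1)(T-2)/2$, are assumptions chosen because they reproduce the reported totals. The function names are hypothetical.

```python
def ab_instrument_count(T, treatment):
    """Instruments for the first-differenced equation when all (A) instruments are used.

    treatment: "X" (strictly exogenous), "W" (predetermined) or "E" (endogenous).
    """
    time_dummies = T - 1
    lags_of_y = T * (T - 1) // 2
    x_instruments = {
        "X": (T - 1) * T,              # lags and leads of x_it (from the text)
        "W": T * (T - 1) // 2,         # assumption: matches the reported WA totals
        "E": (T - 1) * (T - 2) // 2,   # assumption: matches the reported EA totals
    }[treatment]
    return time_dummies + lags_of_y + x_instruments


def bb_extra_instruments(T, treatment):
    """Additional level-equation instruments used by BB under effect stationarity."""
    return 1 + (T - 1) + (T - 1) - (1 if treatment == "E" else 0)


for code in ("X", "W", "E"):
    print(code, [ab_instrument_count(T, code) for T in (3, 6, 9)],
          [bb_extra_instruments(T, code) for T in (3, 6, 9)])
```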
For the tables to follow we always examine the three values $\gamma \in \{0.2, 0.5, 0.8\}$ for the dynamic adjustment coefficient at the three sample size values $T \in \{3, 6, 9\}$ while mostly $N = 200$, as in the classic Arellano and Bond [1] study. This is done for both $\theta = 0$ (homoskedasticity) and $\theta = 1$ (substantial cross-sectional heteroskedasticity). Tables have a code which starts with the design parametrization, followed by the character u or f, indicating whether the table contains unfeasible or feasible results. Because of the many feasible variants not all results can be combined in just one table. Therefore, the f is followed by c, t, J or σ, where c indicates that the table just contains results on coefficient estimates, which are estimated bias, standard deviation (Stdv) and RMSE (root mean squared error; below often loosely addressed as precision); t refers to estimates of the actual rejection probabilities of tests on true coefficient values; J indicates that the table only contains results on Sargan-Hansen tests; and σ indicates that the table just contains results on estimating $\sigma_\varepsilon$ and $\sigma_\eta$. Next, after a bar (-), the earlier discussed code is given for how regressor $x_{it}$ is actually treated when selecting the instruments, followed by the type of instrument reduction.
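The summary measures reported in the c and t tables can be computed from the replications along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the standard deviation uses divisor $R$ (the number of replications) so that $RMSE^2 = bias^2 + Stdv^2$ holds exactly, and the critical value 1.96 corresponds to a two-sided test at the nominal 5% level based on the normal approximation.

```python
import math


def summarize_estimates(estimates, true_value):
    """Monte Carlo bias, standard deviation and RMSE of a coefficient estimator."""
    r = len(estimates)
    mean = sum(estimates) / r
    bias = mean - true_value
    stdv = math.sqrt(sum((e - mean) ** 2 for e in estimates) / r)
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / r)
    return {"bias": bias, "stdv": stdv, "rmse": rmse}


def rejection_frequency(t_statistics, critical_value=1.96):
    """Estimated actual type I error probability of a two-sided t-test on the true value."""
    return sum(abs(t) > critical_value for t in t_statistics) / len(t_statistics)


# Made-up numbers, purely to show the calling convention.
summary = summarize_estimates([0.45, 0.52, 0.48, 0.41, 0.50], true_value=0.5)
size = rejection_frequency([0.3, -2.4, 1.1, 2.0, -0.7])
print(summary, size)
```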

5.1. DGPs under Effect Stationarity

Here we focus on the case where BB is consistent and more efficient than AB, since $s = -50$ and $\phi = 1$.

5.1.1. Results for the Reference Parametrization P0

Table 3, with code P0u-XA, gives results for unfeasible GMM coefficient estimators, unfeasible single coefficient tests, and for unfeasible Sargan-Hansen tests for the reference parametrization P0 when $x_{it}$ is (correctly) treated as strictly exogenous and all available instruments are being used. Table 4 (P0fc-XA) presents a selection of feasible counterparts regarding the coefficient estimators. Under homoskedasticity we see that for $\hat{\gamma}_{ABu} = \hat{\gamma}_{AB1}$ its bias (which is negative), Stdv and thus its RMSE increase with $\gamma$ and decrease with $T$, whereas the bias of $\hat{\beta}_{ABu} = \hat{\beta}_{AB1}$ is moderate and its RMSE, although decreasing in $T$, is almost invariant with respect to $\beta$. The BBu coefficient estimates are superior indeed, the more so for larger $\gamma$ values (as is already well-known), but less so for $\beta$. As already conjectured in Section 3.6, under cross-sectional heteroskedasticity both ABu and BBu are substantially less precise than under homoskedasticity. However, modifying the instruments under cross-sectional heteroskedasticity as is done by MABu and MBBu yields considerable improvements in performance both in terms of bias and RMSE. In fact, the precision of the unfeasible modified estimators under heteroskedasticity comes very close to their counterparts under homoskedasticity.
The simulation results in Table 4 for feasible estimation do not contain the b variant of the weighting matrix⁹ because it performs so badly, whereas both the a and c variants yield RMSE values very close to their unfeasible counterparts, under homoskedasticity as well as heteroskedasticity. Although the best unfeasible results under heteroskedasticity are obtained by MBBu, this does not fully carry over to MBB, because for $T$ small and also for moderate $T$ and large $\gamma$, BB2c performs much better. The performance of MAB and AB2c is rather similar, whereas we established that their unfeasible variants differ a lot when $\gamma$ is large. Apparently, the modified estimators can be much more vulnerable when the variances of the error components, $\sigma_{\varepsilon,i}^2$ and $\sigma_\eta^2$, are unknown, probably because their estimates have to be inverted in (92) and (94).
From the type I error estimates for unfeasible single coefficient tests in Table 3 we see that the standard test procedures work pretty well for all techniques regarding $\beta$, but with respect to $\gamma$ ABu fails for larger $\gamma$. This gets even worse under heteroskedasticity, but less so for MABu. For BBu and MBBu the results are reasonable. Here the test seems to benefit from the smaller bias of BBu. For the feasible variants we find in Table 5 (P0ft-XA) that under homoskedasticity AB1 has a reasonable actual significance level for $\beta$, but for $\gamma$ only when it is small. The same holds for AB2c. Under heteroskedasticity AB2c overrejects, especially for $\gamma$ or $T$ large, but only mildly so for tests on $\beta$. Both AB2a and MAB overreject enormously. Employing the Windmeijer [29] correction mitigates the overrejection probabilities in many cases, but not in all. AB2cW has appropriate size for tests on $\beta$, but for tests on $\gamma$ the size increases both with $\gamma$ and with $T$ from 7% to 37% over the grid examined. Since the test based on ABu shows a similar pattern, it is self-evident that a correction which just takes the randomness of AB1 into account cannot be fully effective. Oddly enough the Windmeijer correction is occasionally more effective for the heavily oversized AB2a than for the less oversized AB2c. Under homoskedasticity both BB2c and BB2cW behave very reasonably, both for tests on $\beta$ and on $\gamma$. Under heteroskedasticity BB2cW is still very reasonable, but all other implementations fail in some instances, especially for tests on $\gamma$ when $\gamma$ or $T$ are large. The failure of BB1 under heteroskedasticity is self-evident, see (76).
Regarding the unfeasible $J$ tests Table 3 shows reasonable size properties under homoskedasticity, especially for $J_{BBu}$, but less so for the incremental test on effect stationarity when $\gamma$ is large. Under heteroskedasticity this problem is more serious, though less so for the unfeasible modified procedure. Heteroskedasticity and $\gamma$ large lead to underrejection of the $J_{ABu}$ test, especially when $T$ is large too. Turning now to the many variants of feasible $J$ tests, of which only some¹⁰ are presented in Table 6 (P0fJ-XA), we first focus on $J_{AB}$. Under homoskedasticity $J_{AB}^{(1,0)}$ behaves reasonably, though when (inappropriately) applied when $\theta = 1$ it rejects with high probability (thus detecting heteroskedasticity instead of instrument invalidity, probably due to underestimation of the variance of the still valid moment conditions). Of the $J_{AB}^{(1,1)}$ tests, which are only valid when $\theta = 0$, the c variant severely underrejects when $T = 9$ (when there is an abundance of instruments), but less so than the a version. Such severe underrejection under homoskedasticity had already been noted by Bowsher [14] when $T > 9$. We note an almost similar pattern for $J_{AB}^{(2,1)}$ and $J_{AB}^{(2,2)}$, which are asymptotically valid for any $\theta$. Test $J_{MAB}$ overrejects severely for $T = 3$ and underrejects otherwise. Turning now to feasible $J_{BB}$ tests we find that $J_{BB}^{(1,0)}$ underrejects when $\theta = 0$ and, like $J_{AB}^{(1,0)}$, rejects with high probability when $\theta = 1$. Both the a and c variants of test $J_{BB}^{(1,1)}$, like $J_{AB}^{(1,1)}$, have rejection probabilities that are not invariant with respect to $T$, $\gamma$ and $\theta$. The c variants seem the least vulnerable, and therefore also yield an almost reasonable incremental test $J_{ES}^{(1,1)}$, although it underrejects when $\theta = 0$ and overrejects when $\theta = 1$ for $\gamma = 0.8$.
For $J_{BB}^{(2,1)}$ and $J_{BB}^{(2,2)}$ too the c variant has rejection probabilities which vary the least with $T$, $\gamma$ and $\theta$, but they are systematically below the nominal significance level, which is also the case for the resulting incremental tests. Oddly enough, the incremental tests resulting from the a variants have type I error probabilities reasonably close to 5%, despite the serious underrejection of both the $J_{AB}$ and $J_{BB}$ tests from which they result.
From Table 7 it can be seen that in the base case P0 estimation of $\sigma_\varepsilon$ (which has true value 1) is pretty accurate for all techniques and $T$ and $\gamma$ values, but less so under heteroskedasticity when $T$ is small and $\gamma$ large. Estimation of $\sigma_\eta$ is much more problematic. Only when $\gamma$ is moderate, estimation bias is moderate too. The bias can exceed 100% when $\gamma$ is large and $T$ is small, and gets even worse under heteroskedasticity. Employing BB mitigates this bias.
When treating regressor $x_{it}$ as predetermined (P0-WA, not presented here), although it is strictly exogenous, fewer instruments are being used. Since the ones that are now omitted are most probably the strongest ones regarding $\Delta x_{it}$, it is no surprise that in the simulation results we note that especially the standard deviation of the $\beta$ coefficient suffers. Also the rejection probabilities of the various tests differ slightly between implementations WA and XA, but seemingly not in a very systematic way. When treating $x_{it}$ as endogenous (P0-EA) the precision of the estimators gets worse, with again no striking effects on the performance of test procedures under their respective null hypotheses. Upon comparing for P0 the instrument set A (and set C) with the one where $A^x$ ($C^x$) is replaced by $C1^x$ it has been found that the choice $C1^x$, which is quite popular in practice, often yields slightly less efficient estimates for $\beta$, but much less efficient estimates for $\gamma$.
When $x_{it}$ is again treated as strictly exogenous, but the number of instruments is reduced by collapsing the instruments stemming from both $y_{it}$ and $x_{it}$, then we note from Table 8 (P0fc-XC, just covering $\theta = 1$) a mixed picture regarding the coefficient estimates. Although any substantial bias always reduces by collapsing, standard errors always increase at the same time, leading either to an increase or a decrease in RMSE. Decreases occur for the AB estimators of $\gamma$, especially when $\gamma$ is large; for $\beta$ just increases occur. A noteworthy reduction in RMSE does show up for BB2a when $\gamma$ is large, $T = 9$ and $\theta = 1$, but then the RMSE of BB2c using all instruments is in fact smaller. However, Table 9 (P0ft-XC) shows that collapsing is certainly found to be very beneficial for the type I error probability of coefficient tests, especially in cases where collapsing yields substantially reduced coefficient bias. The AB tests benefit a lot from collapsing, especially the c variant, leaving only little room for further improvement by employing the Windmeijer correction. After collapsing AB1 works well under homoskedasticity, and also under heteroskedasticity provided robust standard errors are being used, where the c version is clearly superior to the a version. AB2c has appropriate type I error probabilities, except for testing $\gamma$ when it is $0.8$ at $T = 3$ and $\theta = 1$ (which is not repaired by a Windmeijer correction either), and is for most cases superior to AB2aW. After collapsing BB2a shows overrejection which is not completely repaired by BB2aW when $\theta = 1$. BB2c and BB2cW generally show lower rejection probabilities, with occasionally some underrejection. Tests based on MAB and MBB still heavily overreject. Table 10 (P0fJ-XC) shows that by collapsing the $J_{AB}$ and $J_{BB}$ tests suffer much less from underrejection when $T$ is larger than 3.
However, both the a and c versions of the $J^{(2,1)}$ and $J^{(2,2)}$ tests usually still underreject, mostly by about 1 or 2 percentage points. Good performance is shown by $J_{ESa}^{(2,1)}$ and $J_{ESc}^{(2,1)}$. Table 11 (P0fσ-XC) shows that collapsing reduces the bias in estimates of $\sigma_\eta$ substantially, although the bias is still huge when $\gamma$ is large and $T$ small, especially for AB and more so under heteroskedasticity.
When $x_{it}$ is still correctly treated as strictly exogenous but for the level instruments just a few lags or first differences are being used (XL0 ... XL3) for both $y_{it}$ and $x_{it}$, then we find the following. Regarding feasible AB and BB estimation collapsing (XC) always gives smaller RMSE values than XL0 and XL1 (which is much worse than XL0), but this is not the case for XL2 and XL3. Whereas XC yields smaller bias, XL2 and XL3 often reach smaller Stdv and RMSE. Especially regarding $\beta$, XL3 performs better than XL2. Probably due to its smaller bias, XC is more successful in mitigating size problems of coefficient tests than XL0 through XL3. The effects on $J$ tests are less clear-cut. Combining collapsing with restricting the lag length we find that XC2 and XC3 are in some aspects slightly worse but in others occasionally better than XC for P0. We also examined the hybrid instrumentation which seems popular amongst practitioners where $C^w$ is combined with $L1^x$ (see Table 1). Especially for $\gamma$ this leads to loss of estimator precision without any other clear advantages, so it does not outperform the XC results for P0. From examining P0-WC (and P0-EC) we find that in comparison to P0-WA (P0-EA) there is often some increase in RMSE, but the size control of especially the $t$-tests is much better.
Summarizing the results for P0 on feasible estimators and tests we note that when choosing between different possible instrument sets a trade-off has to be made between estimator precision and test size control. For both, some form of reduction of the instrument set is often but not always beneficial. No single method seems superior irrespective of the actual values of $\gamma$, $\beta$ and $T$. Using all instruments is not necessarily a bad choice; also XC, XL3 and XC3 often work well. To mitigate estimator bias and foster test size control while not sacrificing too much estimator precision, using collapsing (C) for all regressors seems a reasonable compromise, as far as P0 is concerned. Coefficient and $J$ tests based on the modified estimator using its simple feasible implementation examined here behave so poorly that in the remainder we no longer mention its results.

5.1.2. Results for Alternative Parametrizations

Next we examine a series of alternative parametrizations, where each time we change just one of the parameter values of one of the already examined cases. In P1 we increase $DEN_y^\eta$ from 1 to 4 (hence, substantially increasing the relative variance of the individual effects). We note that for P1-XA (not tabulated here) all estimators regarding $\gamma$ are more biased and dispersed than for P0-XA, but there is little or no effect on the $\beta$ estimates. For both $T$ and $\gamma$ large this leads to serious overrejection for the unfeasible coefficient tests regarding $\gamma$, in particular for ABu. Self-evidently, this carries over to the feasible tests and, although a Windmeijer correction has a mitigating effect, the overrejection often remains serious for both AB and BB based tests. Tests on $\beta$ based on AB behave reasonably, apart from the non-robustified AB1 and AB2a. For the latter a Windmeijer correction proves reasonably effective. When exploiting effect stationarity the BB2c implementation seems preferable. The unfeasible J tests show a similar though slightly more extreme pattern than for P0-XA. Among the feasible tests both serious underrejection and some overrejection occur. The $JAB^{(1,1)}$ test, which is invalid when $\theta = 1$, is not much worse than the valid tests. As far as the incremental tests are concerned, $JES_c^{(2,2)}$ behaves remarkably well.
In Table 12, Table 13, Table 14 and Table 15 (P1fj-XC, j = c,t,J,σ) we find that collapsing again leads to reduced bias and slightly deteriorated precision, though improved size control (here all unfeasible tests behave reasonably well). All feasible AB1R and AB2W tests have reasonable size control, apart from tests on $\gamma$ when $T$ is small and $\gamma$ large. These give actual significance levels close to 10%. BB2cW seems slightly better than BB2aW. The J tests using 1-step residuals only show some serious overrejection under heteroskedasticity, whereas $J^{(2,1)}$ and $J^{(2,2)}$ behave quite satisfactorily. The increase of $\sigma_\eta$ has an adverse effect on its estimate when using uncollapsed BB for $\gamma$ small, but collapsing substantially reduces the bias in the $\sigma_\eta$ estimates. For C3 reasonably similar results are obtained, but those for L3 are generally slightly less attractive.
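Statements about actual significance levels are estimates of rejection frequencies over the Monte Carlo replications. A minimal Python sketch of such an estimate with its simulation standard error, assuming a list of p-values obtained under a true null hypothesis (the function name, the nominal level default and the example p-values are hypothetical; the number of replications used in the study is not restated here):

```python
import math

def rejection_frequency(p_values, alpha=0.05):
    """Estimated actual size of a test and its Monte Carlo standard error."""
    n_reps = len(p_values)
    rejections = sum(1 for p in p_values if p < alpha)
    rate = rejections / n_reps
    # Binomial standard error of the estimated rejection probability.
    mc_se = math.sqrt(rate * (1.0 - rate) / n_reps)
    return rate, mc_se

# Hypothetical p-values from ten replications under the null hypothesis.
rate, mc_se = rejection_frequency(
    [0.01, 0.20, 0.03, 0.50, 0.80, 0.04, 0.60, 0.90, 0.30, 0.70]
)
```

An estimated rate of about 10% at a nominal 5% level, as mentioned above, signals overrejection only if the difference is large relative to this standard error.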
In P2 we increase $EVF_x$ from 0 to 0.6, while again having $IEF_x^\eta = 0$ (hence, $x_{it}$ is still uncorrelated with effect $\eta_i$ though correlated with effect $\lambda_i$, which determines any heteroskedasticity). This leads to increased $\beta$ values. Results for P2-XA show larger absolute values for the standard deviations of the $\beta$ estimates than for P0-XA, but they are almost the same in relative terms. The patterns in the rejection probabilities under the respective null hypotheses are hardly affected, and P2-XC again shows improved behavior of the test statistics due to reduced estimator bias, whereas the RMSE values have slightly increased. Under P2 the $\sigma_\eta$ estimates are more biased than under P0.
In P3 we change $IEF_x^\eta$ from 0 to 0.3, while keeping $EVF_x = 0.6$ (hence, now realizing dependence between regressor $x_{it}$ and the individual effect $\eta_i$). Comparing the results for P3-XA with those for P2-XA (which have the same $\beta$ values) we find that all patterns are pretty similar. Also P3-XC follows the P2-XC picture closely. Under P3 the $\sigma_\eta$ estimates are more biased than under P0.
P4 differs from P3 because $\kappa = 0.25$, so that the heteroskedasticity is now determined by $\eta_i$ too. This has a noteworthy effect on MBB estimation, a minor effect on JBB (and thus on JES) testing, and almost no effect on $\sigma_\eta$ estimation.
P5 differs from P0 just in having $\bar\rho_{x\varepsilon} = 0.3$, so $x_{it}$ is now endogenous with respect to $\varepsilon_{it}$. P5-EA uses all instruments available when correctly taking the endogeneity into account. This leads to very unsatisfactory results. The coefficient estimates of $\gamma$ have serious negative bias, and those for $\beta$ positive bias, whereas the standard deviations are slightly larger than for P0-EA, which in turn are substantially larger than for P0-XA. All coefficient tests are very seriously oversized, also after a Windmeijer correction, both for AB and BB. Tests JABu and JBBu show underrejection, whereas the matching JES tests show serious overrejection when $T$ is large, but the feasible 2-step variants are not all that bad. From Table 16, Table 17 and Table 18 (P5fj-EC, j = c,t,J) we see that most results which correctly handle the simultaneity of $x_{it}$ are still bad after collapsing, especially for $T$ small (where collapsing can only lead to a minor reduction of the instrument set), although not as bad as those for P5-EA and larger values of $T$. For P5-EC the rejection probabilities of the corrected coefficient tests are usually in the 10%–20% range, but those of the 2-step J tests are often close to 5%. Under P5 the estimates of $\sigma_\varepsilon$ and $\sigma_\eta$ are much more biased than under P0. Both AB and BB are inconsistent when treating $x_{it}$ either as predetermined or as exogenous. For P5-WA and P5-XA the coefficient bias is almost the same, but much more serious than for P5-EA. For the inconsistent estimators the bias does not reduce when collapsing the instruments. Because the inconsistent estimators have a much smaller standard deviation than the consistent estimators, practitioners should be warned never to select an estimator simply because of its attractive estimated standard error. The consistency of AB and BB should be tested with the Sargan-Hansen test.
In this study we did not examine the particular incremental test which focusses on the validity of the extra instruments when comparing E with W or E with X. Here we just examine the rejection probabilities of the overall overidentification J tests for case P5 using all instruments, and compare the rejection frequencies when treating $x_{it}$ correctly as endogenous, or incorrectly as either predetermined or exogenous. From Table 19 (P5fJ-jA, j = E,W,X) we find that size control for $J^{(2,2)}$ can be slightly better than for $J^{(2,1)}$. The detection of inconsistency by $J^{(2,1)}$ often has a higher probability when the null hypothesis is W than when it is X. The probability generally increases with $T$ and with $\gamma$, is often better for the c variant than for the a variant, and is slightly better for BB implementations than for AB implementations, whereas in general heteroskedasticity mitigates the rejection probability. In the situation where all instruments have been collapsed, where we already established that the J tests do have reasonable size control, we find the following. For $T = 3$ and $\gamma = 0.2$ the rejection probability of the JAB and JBB tests does not rise very much when $\bar\rho_{x\varepsilon}$ moves from 0 to 0.3, whereas for $T = 9$, $\bar\rho_{x\varepsilon} = 0.3$ this rejection probability is often larger than 0.7 when $\gamma \ge 0.5$ and often larger than 0.9 for $\gamma = 0.8$. Hence, only for particular $T$, $\gamma$ and $\theta$ parametrizations does the probability to detect inconsistency seem reasonable, whereas the major consequence of inconsistency, which is serious estimator bias, is relatively invariant regarding $T$, $\gamma$ and $\theta$.
Summarizing our results for effect stationary models we note the following. We established that finite sample inaccuracies of the asymptotic techniques are seriously aggravated either when $\sigma_\eta/(1-\gamma)$ is large relative to $\sigma_\varepsilon$ or under simultaneity. For both problems it helps to collapse instruments, and the first problem is mitigated and the second problem detected with higher probability by instrumenting according to W rather than X. Neglected simultaneity leads to seemingly accurate but seriously biased coefficient estimators, whereas asymptotically valid inference on simultaneous dynamic relationships is often not very accurate either. Even when the more efficient BB estimator is used with Windmeijer corrected standard errors, the bias in both $\gamma$ and $\beta$ is very substantial and test sizes are seriously distorted. Some further pilot simulations disclosed that $N$ should be very much larger than 200 in order to find much more reasonable asymptotic approximation errors.

5.2. Nonstationarity

Next we examine the effects of a value of $\phi$ different from unity. We will just consider setting $\phi = 0.5$ and perturbing $x_{i0}$ and $y_{i0}$ according to (123), so that their dependence on the effects is initially 50% away from stationarity and BB estimation is inconsistent. We indicate this in the parametrization code by changing P into Pϕ. Comparing the results for Pϕ0-XA with those for P0-XA, where $\phi = 1$ (effect stationarity), we note from Table 20 (Pϕ0fc-XA) a rather moderate positive bias in the BB estimators for both $\gamma$ and $\beta$ when both $T$ and $\gamma$ are small. Despite the inconsistency of BB the bias is very mild for larger $T$, and especially for larger $\gamma$ it is much smaller than for consistent AB. The pattern regarding $T$ can be explained, because convergence towards effect stationarity does occur as time proceeds. Since this convergence is faster for smaller $\gamma$, the good results for large $\gamma$ seem due to great strength of the first-differenced lagged instruments regarding the level equation. Since $\pi_\eta = 0$ here, $\Delta x_{i,t-1}$ is in fact a valid instrument too. Note that the RMSE of inconsistent BB1, BB2a and BB2c is always smaller than that for consistent AB1, AB2a and AB2c, except when $T$ and $\gamma$ are both small. With respect to the AB estimators we find little to no difference compared to the results under stationarity. Table 21 (Pϕ0ft-XA) shows that when $\gamma = 0.8$ the BB2cW coefficient test on $\gamma$ yields very mild overrejection, while AB2aW and AB2cW seriously overreject. For smaller values of $\gamma$ it is the other way around. After collapsing (not tabulated here) similar but more moderate patterns are found, due to the mitigated bias which again goes with slightly increased standard errors. Hence, for this case we find that one should perhaps not worry too much when applying BB even if effect stationarity does not strictly hold for the initial observations.
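The remark that convergence towards effect stationarity is faster for smaller $\gamma$ can be illustrated in a stylized setting. The Python sketch below assumes a pure autoregressive panel $y_{it} = \gamma y_{i,t-1} + \eta_i + \varepsilon_{it}$ in which the initial observation loads on the effect as $\phi\,\eta_i/(1-\gamma)$; this is a simplification introduced for illustration and not the initial condition (123) itself. Under that assumption the loading of $y_{it}$ on $\eta_i$ falls short of its stationary value $1/(1-\gamma)$ by the fraction $(1-\phi)\gamma^t$.

```python
def stationarity_gap(phi, gamma, t):
    """Relative shortfall of the effect loading from its stationary value
    after t periods: (1 - phi) * gamma**t (stylized AR(1) panel)."""
    return (1.0 - phi) * gamma ** t

def effect_loading(phi, gamma, t):
    """Loading of y_it on the individual effect, obtained by iterating
    loading_t = gamma * loading_{t-1} + 1 from phi / (1 - gamma)."""
    loading = phi / (1.0 - gamma)
    for _ in range(t):
        loading = gamma * loading + 1.0
    return loading

for gamma in (0.2, 0.5, 0.8):
    # Start 50% away from stationarity (phi = 0.5), look three periods ahead.
    print(gamma, stationarity_gap(0.5, gamma, 3))
```

In this stylized case the gap has virtually vanished after three periods for $\gamma = 0.2$, while about a quarter of the stationary loading is still missing for $\gamma = 0.8$.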
As it happens, we note from Table 22 (Pϕ0fJ-XA) that the rejection probabilities of the JES tests are relatively low when BB inference is more precise than AB inference, and relatively high when either $T$ or $\gamma$ is low for $\phi = 0.5$. This pattern is much more pronounced for the JES tests than for the JBB tests. However, it is also the case in Pϕ0 that collapsing mitigates this welcome quality of the JES tests to warn against unfavorable consequences of effect nonstationarity on BB inference.
From Pϕ1-XA, in which the individual effects are much more prominent, we find that $\phi = 0.5$ has curious effects on AB and BB results. For effect stationarity ($\phi = 1$) we already noted more bias for AB than under P0. For $\gamma$ large, this bias is even more serious when $\phi = 0.5$, despite the consistency of AB. For BB estimation the reduction of $\phi$ leads to much larger bias and much smaller stdv, with the effect that RMSE values for inconsistent BB are usually much worse than for AB, but are often slightly better (except for BB2c) when $\gamma = 0.8$. All BB coefficient tests for $\gamma$ have size close or equal to 1 under Pϕ1-XA, and the AB tests for $\gamma = 0.8$ overreject very seriously as well. Under Pϕ1-XC the bias of AB is reasonable except for $\gamma = 0.8$. The bias of BB has decreased but is still enormous, although its RMSE remains preferable when $\gamma = 0.8$. Especially regarding tests on $\gamma$, BB fails. For both the a and c versions the JES test has high rejection probability to detect $\phi \neq 1$, except when $\gamma$ is large. The relatively low rejection probability of JES tests obtained after collapsing when $\gamma = 0.8$ and $\phi = 0.5$ again indicates that despite its inconsistency BB has similar or smaller RMSE than AB for that specific case.
Next we consider the simultaneous model again. In case Pϕ5-EA estimator AB is consistent and BB again inconsistent. Nevertheless, for all $\gamma$ and $T$ values examined in Table 23 (Pϕ5fc-EA), AB has a more severe bias than BB, whereas BB has smaller stdv values at the same time and thus has smaller RMSE throughout. The size control of coefficient tests is worse for AB, but for BB it is appalling too, where BB2aW, with estimated type I error probabilities ranging from 5% to 70%, is often preferable to BB2cW. The 2-step JAB tests behave reasonably, whereas the JBB tests reject with probabilities in the 3%–38% range, and JES in the 3%–69% range. By collapsing the RMSE of AB generally reduces when $T \ge 6$ and for BB especially when $\gamma = 0.8$. BB has again smaller RMSE than AB. The rejection rates of the JBB and JES tests are substantially lower now, which seems bad because the invalid (first-differenced) instruments are less often detected, but this may nevertheless be appreciated because it induces one to prefer less inaccurate BB inference to AB inference. After collapsing the size distortions of BB2aW and BB2cW are less extreme too, now ranging from 5% to 33%, but the RMSE values for BB may suffer due to collapsing, especially when $\gamma$ and $T$ are small. The RMSE values for BB under Pϕ5-WA and Pϕ5-XA are usually much worse than those for AB under Pϕ5-EA. Hence, although the invalid instruments for the level equation are not necessarily a curse when endogeneity of $x_{it}$ is respected, they should not be used when they are invalid for two reasons ($\phi \neq 1$ and $\rho_{x\varepsilon} \neq 0$). That neither AB nor BB should be used in P5 under W and X will be indicated with highest probability under WC, and then this probability is larger than 0.8 for $JBB_a^{(2,1)}$ only when $T$ is high and for $JAB_a^{(2,1)}$ only when both $T$ and $\gamma$ are high.
Summarizing our findings regarding effect nonstationarity, we have established that although $\phi \neq 1$ renders BB estimators inconsistent, especially when $T$ is not small BB inference nevertheless often beats consistent AB, provided possible endogeneity of $x_{it}$ is respected. The JES test seems to have the remarkable property of being able to guide towards the technique with smallest RMSE instead of the technique exploiting the valid instruments. For further details we refer to the full set of Monte Carlo results.

6. Empirical Results

The above findings will now be employed in a re-analysis of the data and some of the techniques studied in Ziliak [35]. The main purpose of that article was to expose the downward bias in GMM as the number of moment conditions expands. This is done by estimating a static life-cycle labor-supply model for a ten-year balanced panel of males, and comparing for various implementations of 2SLS and GMM the coefficient estimates and their estimated standard errors when exploiting expanding sets of instruments. We find this approach rather naive for various reasons: (a) the difference between empirical coefficient estimates will at best provide a very poor proxy to any underlying difference in bias; (b) standard asymptotic variance estimates of IV estimators are known to be very poor representations of true estimator uncertainty;11 (c) the whole analysis is based on just one sample and possibly the model is seriously misspecified.12 The latter issue also undermines conclusions drawn in Ziliak [35] on overrejection by the J test, because it is of course unknown in which, if any, of his empirical models the null hypothesis is true. To avoid such criticism we designed the controlled experiments in the two foregoing sections on the effects of different sets of instruments on various relevant inference techniques. Now we will examine how these simulation results can be exploited to underpin actual inference from the data set used by Ziliak.
This data set originates from waves XII-XXI and the years 1979–1988 of the PSID. The subjects are $N = 532$ continuously married working men aged 22–51 in 1979. Ziliak [35] employs the static model13
$$\ln h_{it} = \beta \ln w_{it} + z_{it}'\gamma + \eta_i + \varepsilon_{it},$$
where $h_{it}$ is the observed annual hours of work, $w_{it}$ the hourly real wage rate, $z_{it}$ a vector of four characteristics (kids, disabled, age, age-squared), $\eta_i$ an individual effect and $\varepsilon_{it}$ the idiosyncratic error term. He assumes that $\ln w_{it}$ may be an endogenous regressor and that all variables included in $z_{it}$ are predetermined. The parameter of interest is $\beta$, and in the various static models examined its GMM estimates range from approximately 0.07 to 0.52, depending on the number of instruments employed.
After some experimentation we inferred that lagged reactions play a significant role in this relationship and that in fact a general second-order linear dynamic specification is required in order to pass the diagnostic tests which are provided by default in the Stata package xtabond2 (StataCorp LLC), see Roodman [43]. This model, also allowing for time-effects, is given by
$$\ln h_{it} = \sum_{l=1}^{2} \gamma_l \ln h_{i,t-l} + \sum_{l=0}^{2} \left( \beta_l^{w} \ln w_{i,t-l} + \beta_l^{k}\, \mathit{kids}_{i,t-l} + \beta_l^{d}\, \mathit{disab}_{i,t-l} \right) + \beta^{a}\, \mathit{age}_{i,t} + \beta^{aa}\, \mathit{age}_{i,t}^{2} + \tau_t + \eta_i + \varepsilon_{it}. \qquad (125)$$
We did not include lags of age and its square.14 Contrary to Ziliak, we will not treat variable $\mathit{age}$ as predetermined, since due to its very nature (no feedbacks from hours worked to age) it must be strictly exogenous. On the other hand, lagged or even immediate feedbacks from labor supply to the variables $\mathit{kids}$ and $\mathit{disab}$ seem well possible.
In the sequence of various model specifications and instrument set compositions embarked on below, we adopted the following methodological strategy. We start with a rather general initial dynamic model specification employing a relatively uncontroversial set of instruments, hence avoiding as much as possible the imposition of doubtful exclusion restrictions on (lagged) regressor variables as well as the exploitation of yet unconfirmed orthogonality conditions. This initial model is estimated by 1-step AB with heteroskedasticity robust standard errors, neglecting for the moment any coefficient t-tests, unless serial correlation tests and heteroskedasticity robust J tests show favorable results. As long as the latter is not the case, the model should be re-specified by adapting the functional form and/or including additional explanatories, either new ones or transformations of already included ones such as longer lags or interactions. When favorable serial correlation and robust J tests have been obtained, and when reconfirmed (especially in case evidence has been found indicating the presence of heteroskedasticity) by favorable autocorrelation and J tests after 2-step AB estimation, hopefully initial consistent estimates have been accomplished. Then, in next stages, the two further aims are: attaining increased efficiency and mitigating finite sample bias. These are pursued first by sequentially testing additional orthogonality conditions. Initially by testing whether variables treated as endogenous seem actually predetermined, and next by verifying whether predetermined variables seem in fact exogenous, possibly followed by testing the orthogonality conditions implied by effect stationarity. In this process the tested extra instruments are added to the already adopted set of instruments, provided incremental J tests are convincingly insignificant. 
Next, one could test coefficient restrictions (on the basis of robust 1-step AB standard errors in case of suspected heteroskedasticity, or using Windmeijer-corrected 2-step AB standard errors) and impose these restrictions when convincingly insignificant from both a statistical and economic point of view. During the whole process the effects on the various estimates and test statistics of collapsing the instrument set and/or removing instruments with long lags could be monitored and possibly induce not exploiting particular probably valid orthogonality conditions represented by apparently weak instruments.
For the present data set, the inclusion of second-order lags in the initial model specification yields $T = 7$ and $K = 20$ when estimating the first-differenced model (125), hence $NT = 3724$. Although no generally accepted rules of thumb exist yet on requirements regarding the number of degrees of freedom and the degree of overidentification for GMM to work well in the analysis of micro panel data sets, we chose to respect at any stage in the specification search the inequalities $L \le 10K$ and $L \le NT/20$, but also examined cases where $K < L < 2K$.
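The sample dimensions quoted here can be retraced with a few lines of arithmetic. The Python sketch below is only a bookkeeping aid: the function name is hypothetical, and the regressor count reflects this sketch's reading of the dynamic specification (two lags of hours; current and two lagged values of wage, kids and disab; age and age squared; one time dummy per usable period), which is an assumption rather than something spelled out in the text.

```python
def panel_dimensions(n_individuals, n_waves, max_lag):
    """Return (T, K, NT) for the first-differenced dynamic model."""
    # One wave is lost per lag and one more by first-differencing.
    T = n_waves - max_lag - 1
    # Lags of hours; current plus lagged wage, kids and disab; age, age squared.
    n_slopes = max_lag + 3 * (max_lag + 1) + 2
    # One time dummy per usable period (assumption of this sketch).
    K = n_slopes + T
    return T, K, n_individuals * T

T, K, NT = panel_dimensions(n_individuals=532, n_waves=10, max_lag=2)
print(T, K, NT)          # 7 20 3724
print(10 * K, NT / 20)   # 200 186.2
```

With `max_lag=1` the same function returns $K = 17$, which matches the regressor count reported further on for the specification without second-order lags.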
Table 24 presents some estimation and test results for model (125) obtained by employing different estimation methods and instrument sets. All results have been obtained by Stata/SE 14.0 with package xtabond2 (StataCorp LLC), abstaining from any finite sample corrections, and supplemented with code for calculating $\hat\sigma_\eta$, $\hat\sigma_\varepsilon$ and $JAB$ test variants. In column (1) 1-step Arellano-Bond GMM estimates are presented (omitting the results for the included time-effects) with heteroskedasticity robust standard errors (indicated by AB1R) using all level instruments that are valid when (with respect to $\varepsilon_{it}$) regressor $\ln h_{i,t-1}$ is predetermined, the regressors $\ln w_{it}$, $\mathit{kids}_{it}$ and $\mathit{disab}_{it}$ could be endogenous, and $\mathit{age}$ is exogenous (indicated by 1P3E1X). This yields $4 \times \sum_{h=2}^{8} h + 9 = 149$ instruments, because we instrumented both $\mathit{age}$ and $\mathit{age}^2$, like the seven time-dummies, just by themselves. For the AR and J tests given in the bottom lines the p-values are presented. Hence, in column (1) the (first-differenced) residuals do exhibit 1st order serial correlation (as they should) but no significant 2nd order problems emerge. We supplemented $JAB^{(1,0)}$ and $JAB_a^{(2,1)}$, as presented by xtabond2 (StataCorp LLC, see our footnote 3), with $JAB_a^{(1,1)}$ and $JAB_a^{(2,2)}$. The p-value of 0.000 for $JAB_a^{(1,0)}$ should be neglected, because we found convincing evidence of heteroskedasticity from an auxiliary LSDV regression (not presented) of the squared level residuals for the findings in column (1) on all regressors of model (125), except the current endogenous ones. Xtabond2 suggests now that we judge the adequacy of model and instruments on the basis of test $JAB_a^{(2,1)}$, hence on a hybrid test statistic involving both 1-step and 2-step residuals. Its p-value is high, and thus seems to approve the validity of the instruments. However, in some of our simulations this variant underrejects.
The purely 1-step based $JAB_a^{(1,1)}$ test is only valid under conditional homoskedasticity, so we may neglect its low p-value in this and all other columns.
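The instrument count of 149 can be reproduced by simple bookkeeping. The Python sketch below assumes that the differenced equation for period t uses all levels dated t-2 and earlier of each of the four GMM-instrumented variables (hours, wage, kids and disab), and that age, age squared and the time dummies each instrument themselves; the function name and argument layout are hypothetical.

```python
def count_level_instruments(n_waves, max_lag, n_gmm_vars, n_own_instruments):
    """Number of instruments for the first-differenced equations."""
    # Waves are indexed 0, ..., n_waves - 1; with max_lag lags plus
    # differencing, the usable equations are t = max_lag + 1, ..., n_waves - 1.
    # Levels dated 0, ..., t - 2 give t - 1 instruments for equation t.
    per_variable = sum(t - 1 for t in range(max_lag + 1, n_waves))
    return n_gmm_vars * per_variable + n_own_instruments

# Second-order dynamics: 7 time dummies plus age and age squared.
L = count_level_instruments(n_waves=10, max_lag=2, n_gmm_vars=4,
                            n_own_instruments=7 + 2)
print(L)  # 149
```

The same bookkeeping with `max_lag=1` and `n_own_instruments=8 + 2` gives 154, the number reported below for the specification without second-order lags.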
Because many regressors in column (1) have very low absolute t-values, this may undermine the finite sample performance of the $JAB$ tests. Therefore, in column (2) we examine removal of the time-effects from the regression, which in column (1) have absolute t-values between 0.34 and 1.15. In column (3) we remove the time-dummies from the set of instruments too. This has little effect. Because the exogeneity of the time-effects is self-evident, we decide to keep them in the instrument set, though exclude them from the regressors. Since we did not manage to get more satisfying results regarding the J tests by relaxing implicit restrictions (including interactions, generalizing the functional form), we adopt with some hesitance the specification and classification of the variables of column (2) as an acceptable starting point. In the table all coefficient estimates with a t-ratio above 2 are marked by a double asterisk, and by a single asterisk when between 1 and 2 (estimated standard errors are given between parentheses). The modest estimated values for the lagged dependent variable coefficients in combination with those of $\hat\sigma_\eta/\hat\sigma_\varepsilon$ suggest values of the $DEN_y^\eta$ concept such that the relatively unfavorable simulation results for case P1 (where $DEN_y^\eta = 4$) do not seem to apply here. Column (4) presents the Windmeijer corrected 2-step AB estimates. For many coefficients these suggest an improvement in estimator efficiency. From the simulations we learned that we should not overrate the qualities of 2-step estimation. Also note that some of the coefficient estimates deviate from their 1-step counterparts, which might be due to vulnerability to finite sample bias. This can also be seen from the bottom row of the table, which presents the estimate of the long-run wage elasticity of hours worked. This total multiplier is given by $TM^w = (\hat\beta_0^w + \hat\beta_1^w + \hat\beta_2^w)/(1 - \hat\gamma_1 - \hat\gamma_2)$. Column (4) suggests a lower elasticity than column (2).
Many of the static models estimated by Ziliak suggest even lower values for this elasticity (and forced equality of immediate and long-run elasticity, which is sharply rejected by all our models).
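The total multiplier is a simple function of the estimated wage and lagged-hours coefficients. A minimal Python sketch, with a hypothetical function name and made-up coefficient values that are not taken from Table 24:

```python
def long_run_multiplier(betas, gammas):
    """Sum of the distributed-lag coefficients divided by one minus the
    sum of the autoregressive coefficients."""
    denominator = 1.0 - sum(gammas)
    if denominator == 0.0:
        raise ValueError("long-run multiplier undefined: gammas sum to one")
    return sum(betas) / denominator

# Made-up values for illustration only.
tm_w = long_run_multiplier(betas=[0.30, 0.10, 0.05], gammas=[0.20, 0.05])
print(tm_w)  # approximately 0.6
```

In a static model all lag coefficients are zero, so the immediate and long-run elasticity coincide by construction; the dynamic specification allows them to differ.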
Before we proceed, we want to report that when estimating model (125) by AB1R without second-order lags (then $K = 17$ and $L = 154$) the p-values of the AR(1) and AR(2) tests are 0.000 and 0.754 respectively, whereas that of $JAB_a^{(2,1)}$ is 0.510. Hence, despite the significance of several coefficients of twice lagged variables in columns (1) through (3), these three tests do not detect the apparent dynamic underspecification; hence, they lack power.
Although quite a few slope coefficients in columns (1) through (3) have t-ratios with small absolute values, similar to the time-effects, we prefer not to proceed at this stage by imposing further coefficient restrictions on the model. Instead, we shall try to decrease the estimated standard errors and mitigate finite sample bias by examining whether the three regressors which we treated as endogenous could actually be classified such that additional and stronger instruments might be used. However, before we do that, just for illustrative purposes, we present again AB1 and AB2 results for the model specification and instrument set as used in column (2), but now with non-robustified AB1 in column (5) and AB2 without Windmeijer correction in column (6). For most coefficients column (5) suggests smaller standard errors than column (2), but given the detected heteroskedasticity we know these are deceitful inconsistent standard deviation estimates. Column (6) shows that not using the Windmeijer correction would incorrectly suggest that AB2 is substantially more efficient than (robust) AB1, which often it is not, as we already learned from our simulations. Note that the value of the serial correlation tests does not just depend on the (unaffected) residuals, but on the (affected) coefficient standard errors too. Therefore, we interpret the rejection by AR(2) in column (5) as due to size problems.
Next, a series of incremental $J_a^{(2,1)}$ tests (not presented in the table) has been performed to establish the actual classification of the three regressors treated as endogenous so far. Testing against 1P4X (which implies 42 extra instruments) yields a p-value below 0.005. So, we had better proceed step by step to assess whether some of these 42 instruments are nevertheless valid. Testing validity of the 7 extra instruments in case $\mathit{disab}_{i,t}$ is treated as predetermined yields a p-value of 0.029, so this variable seems truly endogenous. Doing the same for $\mathit{kids}_{i,t}$ gives 0.520. Next, testing whether the 7 extra instruments involving current values of $\mathit{kids}_{i,t}$ seem valid too yields a p-value of 0.398, and when testing against column (2) the 14 extra instruments yield a p-value of 0.490. Accepting exogeneity of the variable $\mathit{kids}_{i,t}$ and maintaining endogeneity of $\mathit{disab}_{i,t}$, we now focus on the classification of $\ln w_{it}$. Testing the extra 7 instruments when treating $\ln w_{it}$ as predetermined yields a p-value of 0.330. Testing jointly the 21 instruments additional to column (2), the p-value is 0.429. We decide to adopt the classification where variables $\mathit{age}_{i,t}$ and $\mathit{kids}_{i,t}$ are exogenous, $\mathit{disab}_{i,t}$ and $\ln w_{i,t}$ are endogenous, and self-evidently $\ln h_{i,t-1}$ is predetermined (all with respect to $\varepsilon_{i,t}$). The corresponding AB1R and AB2W estimates can be found in columns (7) and (8). Note that the extra instruments are especially beneficial for the standard errors of the $\beta_j^k$ coefficients. Again the $TM^w$ estimate is larger for 1-step than for 2-step estimation.
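The mechanics of such an incremental test can be sketched in its simplest form: the difference between the overall J statistic computed with and without the extra instruments is referred to a chi-square distribution whose degrees of freedom equal the number of extra instruments. The Python sketch below uses only the standard library; the function names and the numerical J values are hypothetical, and actual implementations differ in how the weighting matrices of the two statistics are chosen.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square(df) variate at x, via the
    power series of the regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def incremental_j_test(j_extended, df_extended, j_base, df_base):
    """Difference of two overall J statistics and its p-value."""
    stat = j_extended - j_base
    df = df_extended - df_base
    return stat, df, chi2_sf(stat, df)

# Hypothetical J statistics; 7 extra instruments are being tested.
stat, df, p_value = incremental_j_test(150.2, 143, 141.5, 136)
```

In finite samples the difference can turn out negative; the sketch then reports a p-value of one.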
In columns (9) and (10) we examine the effects on the results of column (7) of reducing the number of instruments; in column (9) by collapsing and in column (10) by discarding instruments lagged more than two periods. This leads to disturbing results. If the instruments used in column (7) are valid, those used in columns (9) and (10) cannot be invalid. Nevertheless, the p-value of test $JAB_a^{(2,1)}$ drops substantially. That the estimated coefficient standard errors have increased in columns (9) and (10) is understandable, but the substantial shifts in coefficient estimates are seriously uncomfortable. The negative $TM^w$ found after collapsing seems not very realistic. The main question now seems to be whether this is just caused by finite sample bias, or by inconsistency. In the latter case the results of all other columns must be inconsistent too.
Finally, we examine 2-step Blundell-Bond system estimation with Windmeijer correction. Testing validity of the 34 instruments used in column (11) additional to those used in column (8) yields a p-value for the $JES_a^{(2,1)}$ test of 0.016, whereas the $JAB_a^{(2,1)}$ based Hayashi-version (see our footnote 3) calculated by xtabond2 (StataCorp LLC) gives a p-value of 0.136. So, effect stationarity seems doubtful, although the five $\gamma$ and $\beta^w$ coefficients now all seem highly significant (with all further coefficients insignificant). The estimates of $TM^w$ and $\sigma_\eta$ deviate strongly from those of columns (1) through (8). Even more distorted BB2 results are obtained after collapsing. We find it hard to believe that this is all due to increased efficiency and reduced finite sample bias, and simply reject effect stationarity and tend to accept the results of columns (7) and (8). Or, should we declare all results in Table 24 uninterpretable simply because no model from the class examined here matches with the Ziliak data? It is hard to answer this question, simply because we learned from the simulations how vulnerable all employed tools are even in cases where the adopted model specification fully corresponds with the underlying DGP.
Hopefully the small sample bias is such that proper interpretation of the coefficients of column (7) is possible. Then we note that, although not statistically significant, there is a tendency for a positive change in either $\mathit{kids}$ or $\mathit{disab}$ to lead to an immediate drop in hours supplied, although this drop is mitigated for a substantial part after a few periods. Also, the older an individual gets, there is a tendency (again insignificant) to work fewer hours. The wage elasticity is positive with a larger value than was inferred by earlier (static) studies. However, given what we learned from the simulations, we should restrain ourselves when drawing far-reaching conclusions from the estimation and test results given in Table 24, simply because we established that for the currently available techniques for analysis of dynamic panel data models the bias of coefficient estimates can be substantial and the actual size of tests may deviate considerably from the nominal levels, whereas their actual power seems modest.

7. Major Findings

In social science the quantitative analysis of many highly relevant problems requires structural dynamic panel data methods. These allow the observed data to have at best a quasi-experimental nature, whereas the causal structure and the dynamic interactions in the presence of unobserved heterogeneity have yet to be unraveled. When the cross-section dimension of the sample is not very small, employing GMM techniques seems most appropriate in such circumstances. This is also practical since corresponding software packages are widely available. However, not too much is known yet about the actual accuracy in practical situations of the abundance of different, not always asymptotically equivalent, implementations of estimators and test procedures. This study aims to demarcate the areas in the parameter space where the asymptotic approximations to the properties of the relevant inference techniques in this context have either proven to be reliable beacons or are actually often misguiding marsh fires.
In this context we provide a rather rigorous treatment of many major variants of GMM implementations as well as for the inference techniques on testing the validity of particular orthogonality assumptions and restrictions on individual coefficient values. Special attention is given to the consequences of the joint presence in the model of time-constant and individual-constant unobserved effects, covariates that may be strictly exogenous, predetermined or endogenous, and disturbances that may show particular forms of heteroskedasticity. Also the implications regarding initial conditions for separate regressors with respect to individual effect stationarity are analyzed in great detail, and various popular options that aim to mitigate bias by reducing the number of exploited internal instruments are elucidated. In addition, as alternatives to those used in current standard software, less robust weighting matrices and additional variants of Sargan-Hansen test implementations are considered, as well as the effects of particular modifications of the instruments under heteroskedasticity.
Next, a simulation study is designed in which all the above variants and details are being parametrized and categorized, which leads to a data generating process involving 10 parameters, for which, under 6 different settings regarding sample size and initial conditions, 60 different grid points are examined. For each setting and various of the grid points 13 different choices regarding the set of instruments have been used to examine 12 different implementations of GMM coefficient estimates, giving rise to 24 different implementations of t-tests and 27 different implementations of Sargan-Hansen tests. From all this only a pragmatically selected subset of results is actually presented in this paper.
The major conclusion from the simulations is that, even when the cross-section sample size is several hundreds, the quality of this type of inference depends heavily on a great number of aspects of which many are usually beyond the control of the investigator, such as: magnitude of the time-dimension sample size, speed of dynamic adjustment, presence of any endogenous regressors, type and severity of heteroskedasticity, relative prominence of the individual effects and (non)stationarity of the effect impact on any of the explanatory variables. The quality of inference also depends seriously on choices made by the investigator, such as: type and severity of any reductions applied regarding the set of instruments, choice between (robust) 1-step or (corrected) 2-step estimation, employing a modified GMM estimator, the chosen degree of robustness of the adopted weighting matrix, the employed variant of coefficient tests and of (incremental) Sargan-Hansen tests in deciding on the endogeneity of regressors, the validity of instruments and on the (dynamic) specification of the relationship in general.
Our findings regarding the alternative approaches of modifying instruments and exploiting different weighting matrices are as follows for the examined case of cross-sectional heteroskedasticity. Although the unfeasible form of modification does yield very substantial reductions in both bias and variance, for the straightforward feasible implementation examined here the potential efficiency gains do not materialize. The robust weighting matrix, which also allows for possible time-series heteroskedasticity, often performs as well as (and sometimes even better than) a specially designed less robust version, although the latter occasionally demonstrates some benefits for incremental Sargan-Hansen tests.
Furthermore, we can report to practitioners: (a) when the effect-noise-ratio is large, the performance of all GMM inference deteriorates; (b) the same occurs in the presence of a regressor that is genuinely endogenous (or is needlessly treated as such); (c) in many settings the coefficient restrictions tests show serious size problems which usually can be mitigated by a Windmeijer correction, although for large γ or under simultaneity serious overrejection remains unless N is very much larger than 200; (d) the limited effectiveness of the Windmeijer correction is due to the fact that the positive or negative bias in coefficient estimates is often more serious than the negative bias in the variance estimate; (e) limiting to some degree the number of instruments usually reduces bias and therefore improves size properties of coefficient tests, though at the potential cost of power loss because efficiency usually suffers; (f) for the case of an autoregressive strictly exogenous regressor we noted that it is better to not just instrument it by itself, but also by some of its lags because this improves inference, especially regarding the lagged dependent variable coefficient; (g) to mitigate size problems of the overall Sargan-Hansen overidentification tests the set of instruments should be reduced, possibly by collapsing; under conditional heteroskedasticity one should employ the quadratic form in 2-step residuals, possibly in combination with a weighting matrix based on 1-step residuals, although occasionally the 2-step weighting matrix seems preferable; (h) collapsing also reduces size problems of the incremental Sargan-Hansen effect stationarity test; (i) except under simultaneity, the GMM estimator which exploits instruments which are invalid under effect nonstationarity (BB) may nevertheless perform better than the estimator abstaining from these instruments (AB); (j) the rejection probability of the incremental Sargan-Hansen test for effect stationarity is such that it tends to direct the researcher towards applying the most accurate estimator, even if this is inconsistent; (k) the estimate $\hat\sigma_\varepsilon$ is usually pretty accurate, which is certainly not always the case for $\hat\sigma_\eta$, although its quality improves for larger N and T, is better for BB than for AB and usually benefits from collapsing.
When re-analyzing a popular empirical data set in the light of the above simulation findings we note in particular that actual dynamic feedbacks may be much more subtle than those that can be captured by just including a lagged dependent variable regressor, which at present seems the most common approach to model dynamics in panels. In theory the omission of further lagged regressor variables should result in rejections by Sargan-Hansen test statistics, but their power suffers when many valid and some invalid orthogonality conditions are tested jointly instead of by deliberately chosen sequences of incremental tests or by direct variable addition tests. Hopefully tests for serial correlation, which we intentionally left out of this already overloaded study, provide extra help to practitioners in guiding them towards well-specified models. Our results demonstrate that, especially under particular unfavorable settings, there is an urgent need to develop more refined inference procedures for structural dynamic panel data models.

Supplementary Materials

The following is available online at www.mdpi.com/2225-1146/5/1/14/s1: the full set of all Monte Carlo results produced for this article.

Acknowledgments

Financial support from the Netherlands Organization for Scientific Research (NWO) grants “Statistical inference methods regarding effectivity of endogenous policy measures” and “Causal inference with panel data” is gratefully acknowledged. Furthermore, all three authors would like to thank the Division of Economics at Nanyang Technological University in Singapore, where substantial parts of this paper have been written. The paper benefited from constructive comments by two reviewers and an academic editor, all anonymous.

Author Contributions

Overall, all authors contributed equally to this project.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Corrected Variance Estimation for 2-Step GMM

Windmeijer [29] provides a correction to the standard expression for the estimated variance of the 2-step GMM estimator in general nonlinear models and next specializes his results for models with linear moment conditions and finally for linear (panel data) models. Here we apply his approach directly to the standard linear model of Section 2.1 where $\hat\beta^{(2)}$ is based on weighting matrix $Z'\hat\Omega^{(1)}Z$, where $\hat\Omega^{(1)}$ depends on $\hat u^{(1)}$ and thus on $\hat\beta^{(1)}$.
The nonlinear dependence of $\hat\beta^{(2)}$ on $\hat\beta^{(1)}$ can be made explicit by a linear approximation obtained by applying the well-known 'delta-method' to the vector function $f(\beta) = \{X'Z[Z'\Omega(\beta)Z]^{-1}Z'X\}^{-1}X'Z[Z'\Omega(\beta)Z]^{-1}Z'u$. Note that $\hat\beta^{(2)} = \beta_0 + f(\hat\beta^{(1)})$. Expanding the second term around $\beta_0$ yields
$$\hat\beta^{(2)} - \beta_0 \approx f(\beta_0) + \left.\frac{\partial f(\beta)}{\partial\beta'}\right|_{\beta=\beta_0}(\hat\beta^{(1)} - \beta_0), \tag{A1}$$
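where under sufficient regularity the omitted terms will be of small order. For $k = 1, \dots, K$ we find
$$\frac{\partial f(\beta)}{\partial\beta_k} = \frac{\partial\{X'Z[Z'\Omega(\beta)Z]^{-1}Z'X\}^{-1}}{\partial\beta_k}\,X'Z[Z'\Omega(\beta)Z]^{-1}Z'u + \{X'Z[Z'\Omega(\beta)Z]^{-1}Z'X\}^{-1}\,\frac{\partial\{X'Z[Z'\Omega(\beta)Z]^{-1}Z'u\}}{\partial\beta_k},$$
where
$$\frac{\partial\{X'Z[Z'\Omega(\beta)Z]^{-1}Z'X\}^{-1}}{\partial\beta_k} = -\{X'Z[Z'\Omega(\beta)Z]^{-1}Z'X\}^{-1}\,X'Z\,\frac{\partial[Z'\Omega(\beta)Z]^{-1}}{\partial\beta_k}\,Z'X \times \{X'Z[Z'\Omega(\beta)Z]^{-1}Z'X\}^{-1},$$
with
$$\frac{\partial[Z'\Omega(\beta)Z]^{-1}}{\partial\beta_k} = -[Z'\Omega(\beta)Z]^{-1}\,Z'\,\frac{\partial\Omega(\beta)}{\partial\beta_k}\,Z\,[Z'\Omega(\beta)Z]^{-1},$$
and
$$\frac{\partial\{X'Z[Z'\Omega(\beta)Z]^{-1}Z'u\}}{\partial\beta_k} = X'Z\,\frac{\partial[Z'\Omega(\beta)Z]^{-1}}{\partial\beta_k}\,Z'u.$$
In the latter we omit an extra term in $\partial u/\partial\beta_k = \partial(y - X\beta)/\partial\beta_k$ simply because we just want to extract the dependence of $\hat\beta^{(2)}$ on the operational weighting matrix.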
Using the short-hand notation $A(\beta) = Z[Z'\Omega(\beta)Z]^{-1}Z'$ and $\Omega_k(\beta) = \partial\Omega(\beta)/\partial\beta_k$ we can establish from the above that
$$\partial f(\beta)/\partial\beta_k = -[X'A(\beta)X]^{-1}X'A(\beta)\,\Omega_k(\beta)\,\{I_N - A(\beta)X[X'A(\beta)X]^{-1}X'\}\,A(\beta)u.$$
This is the k-th column of the matrix $F(\beta) = \partial f(\beta)/\partial\beta'$ in (A1). The latter can now be expressed as
$$\hat\beta^{(2)} - \beta_0 \approx \{[X'A(\beta_0)X]^{-1}X'A(\beta_0) + F(\beta_0)(X'P_ZX)^{-1}X'P_Z\}\,u.$$
Because $F(\beta_0) = O_p(N^{-1/2})$ the second term is of smaller order.
This approximation to the estimation errors of $\hat\beta^{(2)}$ can be used to obtain a finite sample corrected variance estimate of $\hat\beta^{(2)}$. This is relatively easy if one conditions on some value for $F(\beta_0)$, say $\hat F$. Windmeijer chooses for the k-th column of $\hat F$ the vector
$$-[X'A(\hat\beta^{(1)})X]^{-1}X'A(\hat\beta^{(1)})\,\Omega_k(\hat\beta^{(1)})\,\{I_N - A(\hat\beta^{(1)})X[X'A(\hat\beta^{(1)})X]^{-1}X'\}\,A(\hat\beta^{(1)})\,\hat u^{(2)}.$$
Taking $\hat u^{(2)}$ instead of the asymptotically equivalent $\hat u^{(1)}$ leads to substantial simplification, because $X'A(\hat\beta^{(1)})\hat u^{(2)} = X'Z[Z'\Omega(\hat\beta^{(1)})Z]^{-1}Z'(y - X\hat\beta^{(2)}) = 0$, giving
$$\hat F = (\hat F_{\cdot 1}, \dots, \hat F_{\cdot K}), \quad\text{with}\quad \hat F_{\cdot k} = -[X'A(\hat\beta^{(1)})X]^{-1}X'A(\hat\beta^{(1)})\,\Omega_k(\hat\beta^{(1)})\,A(\hat\beta^{(1)})\,\hat u^{(2)}.$$
Note that when $L = K$ we have $\hat F = O$, because $Z'\hat u^{(1)} = Z'\hat u^{(2)} = 0$.
This all then yields for $L > K$ the corrected variance estimator
$$\widehat{Var}_c(\hat\beta^{(2)}) = \widehat{Var}(\hat\beta^{(2)}) + \hat F\,\widehat{Var}(\hat\beta^{(2)}) + \widehat{Var}(\hat\beta^{(2)})\,\hat F' + \hat F\,\widehat{Var}(\hat\beta^{(1)})\,\hat F',$$
where
$$\widehat{Var}(\hat\beta^{(1)}) = \hat\sigma_u^2\,(X'P_ZX)^{-1}X'P_Z\,\hat\Omega\,P_ZX\,(X'P_ZX)^{-1}.$$
Note that in case $\Omega(\beta) = \mathrm{diag}(u_1^2, \dots, u_N^2)$ one has, because $\partial u_i/\partial\beta_k = -x_{ik}$, that $\Omega_k(\hat\beta^{(1)}) = -2\,\mathrm{diag}(\hat u_1^{(1)}x_{1k}, \dots, \hat u_N^{(1)}x_{Nk})$.
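The steps of this appendix can be sketched numerically. The Python fragment below is only an illustration under stated assumptions, not the code used for the simulations: it assumes a cross-section version of the linear model with $\Omega(\beta) = \mathrm{diag}(u_1^2, \dots, u_N^2)$, takes 2SLS as the 1-step estimator, absorbs the scale factor $\hat\sigma_u^2$ into $\hat\Omega = \mathrm{diag}\{(\hat u_i^{(1)})^2\}$, and uses $(X'AX)^{-1}$ as the uncorrected 2-step variance estimate. The function name `two_step_gmm_windmeijer` is hypothetical.

```python
import numpy as np


def two_step_gmm_windmeijer(y, X, Z):
    """2-step GMM with uncorrected and corrected variance estimates.

    Assumes Omega(beta) = diag(u_i^2), so Omega_k = -2 diag(u_i x_ik).
    y: (N,), X: (N, K) regressors, Z: (N, L) instruments with L >= K.
    """
    K = X.shape[1]
    ZX, Zy = Z.T @ X, Z.T @ y

    # 1-step estimator with weighting matrix Z'Z (2SLS) and its residuals
    ZZ_inv = np.linalg.inv(Z.T @ Z)
    H1 = np.linalg.inv(ZX.T @ ZZ_inv @ ZX)          # (X'P_Z X)^{-1}
    b1 = H1 @ ZX.T @ ZZ_inv @ Zy
    u1 = y - X @ b1

    # Z' Omega_hat Z with Omega_hat = diag(u1^2); robust 1-step variance
    W1 = (Z * (u1 ** 2)[:, None]).T @ Z
    var1 = H1 @ ZX.T @ ZZ_inv @ W1 @ ZZ_inv @ ZX @ H1

    # 2-step estimator based on the 1-step residuals
    W1_inv = np.linalg.inv(W1)
    var2 = np.linalg.inv(ZX.T @ W1_inv @ ZX)        # (X'A X)^{-1}
    b2 = var2 @ ZX.T @ W1_inv @ Zy
    u2 = y - X @ b2

    # F_hat: column k is -(X'AX)^{-1} X'A Omega_k A u2
    F = np.empty((K, K))
    Zu2 = Z.T @ u2
    for k in range(K):
        Z_Omega_k_Z = -2.0 * (Z * (u1 * X[:, k])[:, None]).T @ Z
        F[:, k] = -var2 @ ZX.T @ W1_inv @ Z_Omega_k_Z @ W1_inv @ Zu2

    var2_corrected = var2 + F @ var2 + var2 @ F.T + F @ var1 @ F.T
    return b2, var2, var2_corrected
```

In the exactly identified case ($L = K$) the vector $Z'\hat u^{(2)}$ is zero up to rounding, so the sketch returns a corrected variance that coincides with the uncorrected one, in line with the remark above that $\hat F = O$.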

Appendix B. Partialling Out and GMM

The IV/2SLS result on partialling out directly generalizes for the MGMM estimator, provided this uses all the (transformed) predetermined regressors as instruments. In standard GMM the equivalence of predetermined regressors and a block of the instruments gets lost. Using the notation of (10) and considering the partitioned model leading to (8), we easily find its counterpart
$$\hat\beta_{1,GMM} = (\hat X_1^{*\prime} M_{\hat X_2^*}\hat X_1^*)^{-1}\hat X_1^{*\prime} M_{\hat X_2^*}\,y^*,$$
where
$$\hat X^* = (\hat X_1^*, \hat X_2^*) = P_{Z^*}(X_1^*, X_2^*) = P_{(\Psi')^{-1}Z}(\Psi X_1, \Psi X_2).$$
In the special case of system (53) with instruments (55) we have $X_2 = Z_2 = (0', \iota_T')'$ and $Z_1'Z_2 = 0$, whereas under cross-sectional heteroskedasticity, due to $D\iota_T = 0$, the optimal weighting matrix is block-diagonal, hence $Z_1'\Omega Z_2 = 0$. Therefore $Z_1^{*\prime}Z_2^* = 0$ too, giving $P_{Z^*} = P_{Z_1^*} + P_{Z_2^*}$. Now we find $\hat X_1^* = P_{Z^*}X_1^* = (P_{Z_1^*} + P_{Z_2^*})X_1^*$ and $\hat X_2^* = (P_{Z_1^*} + P_{Z_2^*})\Psi Z_2 = P_{Z_2^*}\Psi Z_2 = (\Psi')^{-1}Z_2(Z_2'\Omega Z_2)^{-1}Z_2'Z_2 = cZ_2^*$, with c some scalar, because $Z_2$ has just one column. Therefore, $M_{\hat X_2^*}\hat X_1^* = M_{Z_2^*}(P_{Z_1^*} + P_{Z_2^*})X_1^* = P_{Z_1^*}X_1^*$. Thus, in this particular case (when using an appropriate weighting matrix), we find
$$\hat\beta_{1,GMM} = (\hat X_1^{*\prime} M_{\hat X_2^*}\hat X_1^*)^{-1}\hat X_1^{*\prime} M_{\hat X_2^*}\,y^* = (\hat X_1^{*\prime} P_{Z_1^*}X_1^*)^{-1}\hat X_1^{*\prime} P_{Z_1^*}\,y^*.$$
Due to the block of zeros in $Z_1$ this is just the GMM estimator of the model in first differences.
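The partialling-out result that this appendix starts from can be checked numerically in its plain IV/2SLS form. The sketch below assumes $\Psi = I$ (so no starred transformation is needed) and an intercept $X_2$ that is also used as an instrument; the function names are hypothetical. It compares the coefficient on $X_1$ from full 2SLS with $(\hat X_1' M_{\hat X_2}\hat X_1)^{-1}\hat X_1' M_{\hat X_2}y$.

```python
import numpy as np


def tsls(y, X, Z):
    """Plain 2SLS: least squares of y on X_hat = P_Z X."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


def tsls_partialled_out(y, X1, X2, Z):
    """Coefficients of X1 only, after partialling X2_hat out of X1_hat."""
    X1_hat = Z @ np.linalg.lstsq(Z, X1, rcond=None)[0]
    X2_hat = Z @ np.linalg.lstsq(Z, X2, rcond=None)[0]
    # M_{X2_hat} X1_hat: residuals of X1_hat after regressing on X2_hat
    MX1 = X1_hat - X2_hat @ np.linalg.lstsq(X2_hat, X1_hat, rcond=None)[0]
    # (X1_hat' M X1_hat)^{-1} X1_hat' M y, using that M is idempotent
    return np.linalg.solve(MX1.T @ X1_hat, MX1.T @ y)
```

Both routes give the same coefficient on $X_1$ up to rounding, which is the equivalence that the appendix extends to the weighted (starred) variables.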

Appendix C. Extracting Redundant Moment Conditions

Through linear transformation (footnote 15) we demonstrate that the sets of moment conditions for the equation in levels and for the equation in first-differences have a non-empty intersection. First we consider the moment conditions associated with the strictly exogenous regressors. For the equation in first differences these are $E(x_i^T\Delta\varepsilon_{it}) = E[\Delta\varepsilon_{it}(x_{i1}' \dots x_{iT}')'] = 0$, for $t = 2, \dots, T$. They can also be represented (footnote 16) by the combination $E[\Delta\varepsilon_{it}(\Delta x_{i2}' \dots \Delta x_{iT}')'] = 0$ and $E(x_{it}\Delta\varepsilon_{it}) = 0$. However, by a similar transformation (footnote 17; here of the disturbances instead of the instruments), the conditions for the equation in levels $E[\Delta x_{it}^{h}(\eta_i\iota_T + \varepsilon_i)] = 0$, where $h = 1, \dots, K_x$ (and again $t = 2, \dots, T$), can be represented by $E(\Delta x_{it}^{h}\tilde\varepsilon_i) = 0$ and $E[\Delta x_{it}^{h}(\eta_i + \varepsilon_{it})] = 0$. So, just the $K_x(T-1)$ orthogonality conditions $E[\Delta x_{it}(\eta_i + \varepsilon_{it})] = 0$ for $t = 2, \dots, T$ are additional due to effect stationarity of $K_x$ of the strictly exogenous regressors.
Similarly, the orthogonality conditions $E(w_i^{t-1}\Delta\varepsilon_{it}) = 0$, or $E(w_{is}\Delta\varepsilon_{it}) = 0$ for $s = 1, \dots, t-1$ with $t = 2, \dots, T$, can be represented by $E(w_{i,t-1}\Delta\varepsilon_{it}) = 0$ for $t = 2, \dots, T$ and $E(\Delta w_{is}\Delta\varepsilon_{it}) = 0$ for $t = 3, \dots, T$ and $s = 2, \dots, t-1$. On the other hand, the conditions $E[\Delta w_{it}(\eta_i + \varepsilon_{i,t+l})] = 0$ for $t > 1$ and $l \geq 0$ are actually $E[\Delta w_{is}(\eta_i + \varepsilon_{it})] = 0$ for $t = 2, \dots, T$ and $s = 2, \dots, t$, whereas these can be represented by $E(\Delta w_{is}\Delta\varepsilon_{it}) = 0$ for $t = 3, \dots, T$ and $s = 2, \dots, t-1$ and $E[\Delta w_{it}(\eta_i + \varepsilon_{it})] = 0$ for $t = 2, \dots, T$. Thus, only the $K_w(T-1)$ conditions $E[\Delta w_{it}(\eta_i + \varepsilon_{it})] = 0$ for $t = 2, \dots, T$ are additional.
Using the same logic, the orthogonality conditions $E(v_i^{t-2}\Delta\varepsilon_{it}) = 0$ for $t = 3, \dots, T$, which are actually $E(v_{is}\Delta\varepsilon_{it}) = 0$ for $t = 3, \dots, T$ and $s = 1, \dots, t-2$, can also be represented by $E(v_{i,t-2}\Delta\varepsilon_{it}) = 0$ for $t = 3, \dots, T$ and $E(\Delta v_{is}\Delta\varepsilon_{it}) = 0$ for $t = 4, \dots, T$ and $s = 2, \dots, t-2$. However, the conditions $E[\Delta v_{it}(\eta_i + \varepsilon_{i,t+1+l})] = 0$ for $t > 1$ and $l \geq 0$ are in fact $E[\Delta v_{is}(\eta_i + \varepsilon_{it})] = 0$ for $t = 3, \dots, T$ and $s = 2, \dots, t-1$, which can also be represented as $E(\Delta v_{is}\Delta\varepsilon_{it}) = 0$ for $t = 4, \dots, T$ and $s = 2, \dots, t-2$ and $E[\Delta v_{i,t-1}(\eta_i + \varepsilon_{it})] = 0$ for $t = 3, \dots, T$. Thus, we find that only the $K_v(T-2)$ conditions $E[\Delta v_{i,t-1}(\eta_i + \varepsilon_{it})] = 0$ for $t = 3, \dots, T$ are additional.
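The argument of this appendix rests on two checkable facts: the transformation matrix $C = (D', e_{T,t})'$ of footnotes 16 and 17 is nonsingular, and the number of level conditions exceeds the number of already implied differenced conditions by $T-1$ (predetermined regressor) or $T-2$ (endogenous regressor). The sketch below verifies both for a given $T$. It assumes that $D$ is the $(T-1) \times T$ first-difference matrix and $e_{T,t}$ the t-th unit vector of length $T$; the function names are hypothetical.

```python
import numpy as np


def first_difference_matrix(T):
    """(T-1) x T matrix D with rows (..., -1, 1, ...), so that D @ iota_T = 0."""
    D = np.zeros((T - 1, T))
    idx = np.arange(T - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D


def transformation_matrix(T, t):
    """C = (D', e_{T,t})': the T-1 difference rows plus the t-th unit row."""
    e = np.zeros((1, T))
    e[0, t - 1] = 1.0
    return np.vstack([first_difference_matrix(T), e])


def count_additional_conditions(T):
    """Per regressor: (additional conditions for w, additional conditions for v)."""
    # predetermined w: levels (t=2..T, s=2..t) versus differences (t=3..T, s=2..t-1)
    w_levels = sum(1 for t in range(2, T + 1) for s in range(2, t + 1))
    w_diffs = sum(1 for t in range(3, T + 1) for s in range(2, t))
    # endogenous v: levels (t=3..T, s=2..t-1) versus differences (t=4..T, s=2..t-2)
    v_levels = sum(1 for t in range(3, T + 1) for s in range(2, t))
    v_diffs = sum(1 for t in range(4, T + 1) for s in range(2, t - 1))
    return w_levels - w_diffs, v_levels - v_diffs
```

Because the null space of $D$ is spanned by $\iota_T$ and $e_{T,t}'\iota_T = 1$, the stacked matrix has full rank for every $t$, which is what makes the representations in footnote 15 equivalent.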

Appendix D. Derivations for (115)

The results for $V_\eta$ and $V_\lambda$ are obvious. Those for $V_\zeta^{(i)}$ and $V_\varepsilon^{(i)}$ are obtained as follows. We use the standard result that the variance of a general stationary ARMA(2,1) process
$$z_t = \frac{\psi(1 - \phi L)}{(1 - \gamma L)(1 - \xi L)}\,u_t,$$
where $u_t \sim IID(0, 1)$, is given by
$$Var(z_t) = \psi^2\,\frac{(1 + \gamma\xi)(1 + \phi^2) - 2\phi(\gamma + \xi)}{(1 - \gamma\xi)(1 - \gamma^2)(1 - \xi^2)}.$$
Because we can rewrite (using $\sigma_\varepsilon = 1$)
$$[\beta\rho_{v\varepsilon}\sigma_v + (1 - \xi L)]\,\omega_i^{1/2} = (1 + \beta\rho_{v\varepsilon}\sigma_v)\,\omega_i^{1/2}\left[1 - \frac{\xi}{1 + \beta\rho_{v\varepsilon}\sigma_v}L\right],$$
the result for $V_\varepsilon^{(i)}$ follows upon substituting $\psi = (1 + \beta\rho_{v\varepsilon}\sigma_v)\,\omega_i^{1/2}$ and $\phi = \xi/(1 + \beta\rho_{v\varepsilon}\sigma_v)$. For $V_\zeta^{(i)}$ simply take $\phi = 0$ and $\psi = \beta\sigma_v(1 - \rho_{v\varepsilon}^2)^{1/2}\omega_i^{1/2}$.
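The ARMA(2,1) variance formula used above can be checked against a direct summation of squared impulse responses. The following sketch is only a numerical cross-check with hypothetical function names; it uses that $z_t = \psi\sum_j c_j u_{t-j}$ with $c_0 = 1$, $c_1 = \gamma + \xi - \phi$ and $c_j = (\gamma + \xi)c_{j-1} - \gamma\xi c_{j-2}$ for $j \geq 2$, so that $Var(z_t) = \psi^2\sum_j c_j^2$.

```python
def arma21_variance(psi, phi, gamma, xi):
    """Closed-form variance of z_t = psi (1 - phi L) / ((1 - gamma L)(1 - xi L)) u_t."""
    num = (1 + gamma * xi) * (1 + phi ** 2) - 2 * phi * (gamma + xi)
    den = (1 - gamma * xi) * (1 - gamma ** 2) * (1 - xi ** 2)
    return psi ** 2 * num / den


def arma21_variance_by_summation(psi, phi, gamma, xi, n_terms=5000):
    """Same variance from the truncated MA(infinity) representation."""
    a1, a2 = gamma + xi, -gamma * xi      # AR(2) coefficients
    c_prev2, c_prev1 = 1.0, a1 - phi      # c_0 and c_1
    total = c_prev2 ** 2 + c_prev1 ** 2
    for _ in range(2, n_terms):
        c = a1 * c_prev1 + a2 * c_prev2
        total += c * c
        c_prev2, c_prev1 = c_prev1, c
    return psi ** 2 * total
```

A useful special case is $\phi = \gamma$: the common factor cancels and the closed form reduces to the AR(1) variance $\psi^2/(1 - \xi^2)$.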
  • 1. Many authors and the Stata xtabond2 package (StataCorp LLC) confusingly address the corrected 2-step variance as robust.
  • 2. This is the only variant considered in Windmeijer [29].
  • 3. Package xtabond2 (StataCorp LLC) for Stata always reports $J_{AB}^{(1,0)}$ after Arellano-Bond estimation, which is inappropriate unless there is conditional homoskedasticity. After requesting robust standard errors in 1-step estimation it also presents $J_{ABa}^{(2,1)}$. Requesting 2-step estimation also presents both $J_{AB}^{(1,0)}$ and $J_{ABa}^{(2,1)}$. Blundell-Bond estimation yields $J_{BB}^{(1,0)}$ and $J_{BB}^{(2,1)}$, although a version of $J_{BB}^{(1,0)}$ is reported that does not use weighting matrix $S^{(0)}(\hat\sigma_\eta^{2,s(1)}/\hat\sigma_\varepsilon^{2,s(1)})$, but $S^{(0)}(0)$, which is only valid under homoskedasticity and $\sigma_\eta^2 = 0$. Package xtabond2 (StataCorp LLC) always refers to overidentification tests after 1-step estimation as "Sargan test" and after 2-step estimation as "Hansen test".
  • 4. By specifying instruments in separate groups, xtabond2 (StataCorp LLC) presents for each separate group the corresponding incremental J test. However, this is not the version defined in (87), but an asymptotically equivalent one, suggested in Hayashi [34] (p. 220), which will never be negative.
  • 5. Proved in Arellano and Bover [3].
  • 6. If we were to strictly follow the notation of the earlier sections, the coefficient β should actually be called δ when $\rho_{v\varepsilon} \neq 0$.
  • 7. Note that $(1 - \xi L)^{-1} = 1 + \xi L + \xi^2 L^2 + \dots$ and therefore $(1 - \xi L)^{-1}\eta_i = \sum_{j=0}^{\infty}\xi^j\eta_i = \eta_i/(1 - \xi)$.
  • 8. Such control is not exercised in the simulation designs of Blundell et al. [26] and Bun and Sarafidis [27]. They do consider simultaneity, but its magnitude has not been mentioned and it is not kept constant over different designs.
  • 9. The b variant differs from a only for $T > 3$ and may then not be positive definite. For $T = 6, 9$ it proved to be so bad for both AB and BB that we discarded it completely from the presented tables.
  • 10. Below various results will be discussed in the text without referring to a table presented in the article, simply because we did not find it worthwhile to include it, respecting reasonable space limitations. However, the full set of all Monte Carlo results produced for this article is available as supplementary material at: https://www.mdpi.com/2225-1146/5/1/14/s1.
  • 11. See findings in Kiviet [40] and in many of its references.
  • 12. Baltagi et al. [41] study a similar life-cycle labor-supply model for physicians in Norway. They consider a dynamic model, and this rejects the static specification used by Ziliak [35].
  • 13. This static model is also used extensively for illustrative purposes in Cameron and Trivedi [42].
  • 14. If $age_{i,t} = age_{i,t-1} + 1$ then $age_{it}^2 = age_{i,t-1}^2 + 2\,age_{i,t-1} + 1$. Thus, including lags of $age_{it}$ and of $age_{it}^2$ in addition to their current values, either as regressors or as instruments, in combination with an intercept or time-dummies, leads to rank reduction. Although in this particular data set $age_{i,t} = age_{i,t-1} + 1$ does not hold $\forall\, i, t$, we abstained from including lags of age and its square.
  • 15. In this Appendix we repeatedly use the result that the p conditions $E(aCb) = 0$, where a is a random scalar, b a $p \times 1$ random vector and C a deterministic nonsingular $p \times p$ matrix, are equivalent with the p conditions $E(ab) = 0$, because $E(ab) = 0 \Leftrightarrow C\,E(ab) = 0 \Leftrightarrow E(aCb) = 0$.
  • 16. Here $C = (D', e_{T,t})' \otimes I_{K_x}$.
  • 17. Now $C = (D', e_{T,t})'$.

References

  1. M. Arellano, and S. Bond. “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations.” Rev. Econ. Stud. 58 (1991): 277–297.
  2. R. Blundell, and S. Bond. “Initial Conditions and Moment Restrictions in Dynamic Panel Data Models.” J. Econom. 87 (1998): 115–143.
  3. M. Arellano, and O. Bover. “Another Look at the Instrumental Variable Estimation of Error-Components Models.” J. Econom. 68 (1995): 29–51.
  4. J. Hahn, and G. Kuersteiner. “Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects When Both n and T Are Large.” Econometrica 70 (2002): 1639–1657.
  5. J. Alvarez, and M. Arellano. “The Time Series and Cross-Section Asymptotics of Dynamic Panel Data Estimators.” Econometrica 71 (2003): 1121–1159.
  6. J. Hahn, J. Hausman, and G. Kuersteiner. “Long Difference Instrumental Variables Estimation for Dynamic Panel Models with Fixed Effects.” J. Econom. 140 (2007): 574–617.
  7. J.F. Kiviet. “Judging Contending Estimators by Simulation: Tournaments in Dynamic Panel Data Models.” In The Refinement of Econometric Estimation and Test Procedures; Finite Sample and Asymptotic Analysis. Edited by G.D.A. Phillips and E. Tzavalis. Cambridge, UK: Cambridge University Press, 2007, pp. 282–318.
  8. H. Kruiniger. “Maximum Likelihood Estimation and Inference Methods for the Covariance Stationary Panel AR(1)/Unit Root Model.” J. Econom. 144 (2008): 447–464.
  9. R. Okui. “The Optimal Choice of Moments in Dynamic Panel Data Models.” J. Econom. 151 (2009): 1–16.
  10. D. Roodman. “A Note on the Theme of too Many Instruments.” Oxf. Bull. Econ. Stat. 71 (2009): 135–158.
  11. K. Hayakawa. “On the Effect of Mean-Nonstationarity in Dynamic Panel Data Models.” J. Econom. 153 (2009): 133–135.
  12. C. Han, and P.C.B. Phillips. “First Difference Maximum Likelihood and Dynamic Panel Estimation.” J. Econom. 175 (2013): 35–45.
  13. J.F. Kiviet. “On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models.” J. Econom. 68 (1995): 53–78.
  14. C.G. Bowsher. “On Testing Overidentifying Restrictions in Dynamic Panel Data Models.” Econ. Lett. 77 (2002): 211–220.
  15. C. Hsiao, M.H. Pesaran, and A.K. Tahmiscioglu. “Maximum Likelihood Estimation of Fixed Effects Dynamic Panel Data Models Covering Short Time Periods.” J. Econom. 109 (2002): 107–150.
  16. S. Bond, and F. Windmeijer. “Reliable Inference for GMM Estimators? Finite Sample Properties of Alternative Test Procedures in Linear Panel Data Models.” Econom. Rev. 24 (2005): 1–37.
  17. M.J.G. Bun, and M.A. Carree. “Bias-Corrected Estimation in Dynamic Panel Data Models.” J. Bus. Econ. Stat. 23 (2005): 200–210.
  18. M.J.G. Bun, and M.A. Carree. “Correction: Bias-Corrected Estimation in Dynamic Panel Data Models.” J. Bus. Econ. Stat. 23 (2005): 200–210.
  19. M.J.G. Bun, and J.F. Kiviet. “The Effects of Dynamic Feedbacks on LS and MM Estimator Accuracy in Panel Data Models.” J. Econom. 132 (2006): 409–444.
  20. C. Gouriéroux, P.C.B. Phillips, and J. Yu. “Indirect Inference for Dynamic Panel Models.” J. Econom. 157 (2010): 68–77.
  21. K. Hayakawa. “The Effects of Dynamic Feedbacks on LS and MM Estimator Accuracy in Panel Data Models: Some Additional Results.” J. Econom. 159 (2010): 202–208.
  22. G. Dhaene, and K. Jochmans. “Likelihood Inference in an Autoregression with Fixed Effects.” Econom. Theory 32 (2016): 1178–1215.
  23. M.J. Flannery, and K.W. Hankins. “Estimating Dynamic Panel Models in Corporate Finance.” J. Corp. Financ. 19 (2013): 1–19.
  24. G. Everaert. “Orthogonal to Backward Mean Transformation for Dynamic Panel Data Models.” Econom. J. 16 (2013): 179–221.
  25. S. Kripfganz, and C. Schwarz. Estimation of Linear Dynamic Panel Data Models with Time-Invariant Regressors. ECB Working Paper 1838; Frankfurt am Main, Germany: European Central Bank, 2015.
  26. R. Blundell, S. Bond, and F. Windmeijer. “Estimation in Dynamic Panel Data Models: Improving on the Performance of the Standard GMM Estimator.” Adv. Econom. 1 (2001): 53–91.
  27. M.J.G. Bun, and V. Sarafidis. “Dynamic Panel Data Models.” In The Oxford Handbook of Panel Data. Edited by B.H. Baltagi. Oxford, UK: Oxford University Press, 2015, pp. 76–110.
  28. M.N. Harris, W. Kostenko, L. Mátyás, and I. Timol. “The Robustness of Estimators for Dynamic Panel Data Models to Misspecification.” Singap. Econ. Rev. 54 (2009): 399–426.
  29. F. Windmeijer. “A Finite Sample Correction for the Variance of Linear Efficient Two-Step GMM Estimators.” J. Econom. 126 (2005): 25–51.
  30. M.J.G. Bun, and M.A. Carree. “Bias-Corrected Estimation in Dynamic Panel Data Models with Heteroscedasticity.” Econ. Lett. 92 (2006): 220–227.
  31. A. Juodis. “A Note on Bias-Corrected Estimation in Dynamic Panel Data Models.” Econ. Lett. 118 (2013): 435–438.
  32. E. Moral-Benito. “Likelihood-Based Estimation of Dynamic Panels with Predetermined Regressors.” J. Bus. Econ. Stat. 31 (2013): 451–472.
  33. J.F. Kiviet, and Q. Feng. Efficiency Gains by Modifying GMM Estimation in Linear Models under Heteroskedasticity. UvA-Econometrics Discussion Paper 2014/06; Amsterdam, The Netherlands: University of Amsterdam, Amsterdam School of Economics, 2016.
  34. F. Hayashi. Econometrics. Princeton, NJ, USA: Princeton University Press, 2000.
  35. J.P. Ziliak. “Efficient Estimation with Panel Data When Instruments are Predetermined: An Empirical Comparison of Moment-Condition Estimators.” J. Bus. Econ. Stat. 15 (1997): 419–431.
  36. M. Arellano. Panel Data Econometrics. Oxford, UK: Oxford University Press, 2003.
  37. D. Holtz-Eakin, W. Newey, and H.S. Rosen. “Estimating Vector Autoregressions with Panel Data.” Econometrica 56 (1988): 1371–1395.
  38. T.W. Anderson, and C. Hsiao. “Estimation of Dynamic Models with Error Components.” J. Am. Stat. Assoc. 76 (1981): 598–606.
  39. J.F. Kiviet. “Monte Carlo Simulation for Econometricians.” Found. Trends Econom. 5 (2011): 1–181.
  40. J.F. Kiviet. “Identification and Inference in a Simultaneous Equation under Alternative Information Sets and Sampling Schemes.” Econom. J. 16 (2013): S24–S59.
  41. B.H. Baltagi, E. Bratberg, and T.H. Holmås. “A Panel Data Study of Physicians’ Labor Supply: The Case of Norway.” Health Econ. 14 (2005): 1035–1045.
  42. A.C. Cameron, and P.K. Trivedi. Microeconometrics: Methods and Applications. Cambridge, UK: Cambridge University Press, 2005.
  43. D. Roodman. “How to do xtabond2: An Introduction to Difference and System GMM in Stata.” Stata J. 9 (2009): 86–136.
Table 1. Definition of labels for particular instrument matrix reductions.
| Strictly exogenous x | Predetermined w | Endogenous v |
|---|---|---|
| $A^x$: $Z_i^x$ | $A^w$: $Z_i^w$ | $A^v$: $Z_i^v$ |
| $L0^x$: $\mathrm{diag}(x_{i2}, \dots, x_{iT})$ | $L0^w$: $\mathrm{diag}(w_{i1}, \dots, w_{i,T-1})$ | $L0^v$: $[0, \mathrm{diag}(v_{i1}, \dots, v_{i,T-2})]$ |
| $L1^x$: $\mathrm{diag}(\Delta x_{i2}, \dots, \Delta x_{iT})$ | $L1^w$: $\mathrm{diag}(w_{i1}, \Delta w_{i2}, \dots, \Delta w_{i,T-1})$ | $L1^v$: $[0, \mathrm{diag}(v_{i1}, \Delta v_{i2}, \dots, \Delta v_{i,T-2})]$ |
| $L2^x$: $\mathrm{diag}(x_{i1}, \dots, x_{i,T-1})$, $L0^x$ | $L2^w$: $[0, \mathrm{diag}(w_{i1}, \dots, w_{i,T-2})]$, $L0^w$ | $L2^v$: $[0, 0, \mathrm{diag}(v_{i1}, \dots, v_{i,T-3})]$, $L0^v$ |
| $L3^x$: $[0, \mathrm{diag}(x_{i1}, \dots, x_{i,T-2})]$, $L2^x$ | $L3^w$: $[0, 0, \mathrm{diag}(w_{i1}, \dots, w_{i,T-3})]$, $L2^w$ | $L3^v$: $[0, 0, 0, \mathrm{diag}(v_{i1}, \dots, v_{i,T-4})]$, $L2^v$ |
| $C^x$: $Z_i^{*x}$ | $C^w$: $Z_i^{*w}$ | $C^v$: $Z_i^{*v}$ |
| $C0^x$: $(x_{i2}, \dots, x_{iT})$ | $C0^w$: $(w_{i1}, \dots, w_{i,T-1})$ | $C0^v$: $(0, v_{i1}, \dots, v_{i,T-2})$ |
| $C1^x$: $(\Delta x_{i2}, \dots, \Delta x_{iT})$ | $C1^w$: $(0, \Delta w_{i2}, \dots, \Delta w_{i,T-1})$ | $C1^v$: $(0, 0, \Delta v_{i2}, \dots, \Delta v_{i,T-2})$ |
| $C2^x$: $C0^x$, $(x_{i1}, \dots, x_{i,T-1})$ | $C2^w$: $C0^w$, $(0, w_{i1}, \dots, w_{i,T-2})$ | $C2^v$: $C0^v$, $(0, 0, v_{i1}, \dots, v_{i,T-3})$ |
| $C3^x$: $C2^x$, $(0, x_{i1}, \dots, x_{i,T-2})$ | $C3^w$: $C2^w$, $(0, 0, w_{i1}, \dots, w_{i,T-3})$ | $C3^v$: $C2^v$, $(0, 0, 0, v_{i1}, \dots, v_{i,T-4})$ |
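To make the difference between the block-diagonal (L) and the collapsed (C) reductions in Table 1 concrete, the sketch below builds the L0 and C0 blocks for a single strictly exogenous scalar regressor of one individual. It is an illustration under assumptions: the regressor is taken to be scalar, the collapsed block is stored as a column, and the function names are hypothetical.

```python
import numpy as np


def l0_instruments(x_i):
    """L0 block: diag(x_i2, ..., x_iT); x_i holds x_i1, ..., x_iT."""
    return np.diag(x_i[1:])


def c0_instruments(x_i):
    """Collapsed C0 block: the single column (x_i2, ..., x_iT)'."""
    return x_i[1:].reshape(-1, 1)


x_i = np.array([0.4, 1.1, -0.3, 0.8, 1.5])   # T = 5 observations, made up
L0 = l0_instruments(x_i)                      # 4 rows, 4 instrument columns
C0 = c0_instruments(x_i)                      # 4 rows, 1 instrument column
```

In this simple case collapsing amounts to summing the columns of the block-diagonal matrix, which reduces the number of instruments from $T-1$ to one per regressor.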
Table 2. Heteroskedasticity quantiles and moments of $\omega_i^{1/2}$ for different values of θ.
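| θ | $\omega_i$: $q_{0.01}$ | $\omega_i$: $q_{0.5}$ | $\omega_i$: $q_{0.99}$ | $\omega_i^{1/2}$: $q_{0.01}$ | $\omega_i^{1/2}$: $q_{0.5}$ | $\omega_i^{1/2}$: $q_{0.99}$ | $E(\omega_i^{1/2})$ | St.Dev.$(\omega_i^{1/2})$ |
|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.789 | 0.995 | 1.256 | 0.888 | 0.998 | 1.121 | 0.999 | 0.050 |
| 0.3 | 0.476 | 0.956 | 1.921 | 0.690 | 0.978 | 1.386 | 0.989 | 0.149 |
| 0.5 | 0.276 | 0.883 | 2.824 | 0.525 | 0.939 | 1.681 | 0.969 | 0.246 |
| 0.7 | 0.154 | 0.783 | 3.989 | 0.391 | 0.885 | 1.997 | 0.941 | 0.340 |
| 1.0 | 0.059 | 0.607 | 6.211 | 0.243 | 0.779 | 2.492 | 0.882 | 0.470 |
| 1.3 | 0.021 | 0.430 | 8.840 | 0.145 | 0.655 | 2.973 | 0.810 | 0.587 |
| 1.6 | 0.007 | 0.278 | 11.498 | 0.082 | 0.527 | 3.391 | 0.726 | 0.688 |
| 2.0 | 0.001 | 0.135 | 14.192 | 0.036 | 0.368 | 3.767 | 0.607 | 0.795 |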
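The entries of Table 2 can be reproduced, to the three decimals shown, if one assumes that $\omega_i$ is lognormal with unit mean, $\omega_i = \exp(-\theta^2/2 + \theta z_i)$ with $z_i \sim N(0,1)$. This distributional form is an assumption made here for illustration (the design itself is defined elsewhere in the paper), and the function name is hypothetical. Under it, $E(\omega_i^{1/2}) = \exp(-\theta^2/8)$ and $Var(\omega_i^{1/2}) = 1 - \exp(-\theta^2/4)$.

```python
from math import exp, sqrt
from statistics import NormalDist


def table2_row(theta):
    """Quantiles of omega and omega^(1/2), then mean and st.dev. of omega^(1/2).

    Assumes omega = exp(-theta^2/2 + theta*z), z ~ N(0,1), so E(omega) = 1.
    """
    z99 = NormalDist().inv_cdf(0.99)          # 99% standard normal quantile
    mu = -0.5 * theta ** 2
    q_omega = [exp(mu + theta * z) for z in (-z99, 0.0, z99)]
    q_root = [sqrt(q) for q in q_omega]       # quantiles carry over to the root
    mean_root = exp(-theta ** 2 / 8)
    sd_root = sqrt(1.0 - mean_root ** 2)      # Var = E(omega) - [E(omega^(1/2))]^2
    return q_omega + q_root + [mean_root, sd_root]
```

For θ = 1 the median of $\omega_i$ under this assumption is $\exp(-0.5) \approx 0.607$, matching the tabulated value.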
Table 3. P0u-XA *.
Unfeasible Coefficient Estimators
θ = 0 θ = 1
ρ ¯ x ε = 0 . 0 L ABuBBuABu BBu MABu MBBu
ABBB γ BiasStdvRMSEBiasStdvRMSEBiasStdvRMSE BiasStdvRMSE BiasStdvRMSE BiasStdvRMSE
T = 3 1116 0.20−0.0120.0580.060−0.0010.0490.049−0.0220.0800.083 −0.0030.0670.067 −0.0150.0660.068 −0.0020.0550.055
0.50−0.0220.0760.080−0.0030.0550.055−0.0420.1050.113 −0.0070.0750.075 −0.0290.0870.091 −0.0030.0610.061
0.80−0.0770.1340.155−0.0090.0670.068−0.1440.1820.232 −0.0180.0960.097 −0.0960.1500.178 −0.0060.0710.071
T = 6 5061 0.20−0.0090.0290.0300.0000.0260.026−0.0170.0400.044 0.0010.0360.036 −0.0100.0300.032 −0.0020.0290.029
0.50−0.0170.0340.0380.0000.0280.028−0.0300.0460.055 −0.0000.0380.038 −0.0200.0370.041 −0.0000.0310.031
0.80−0.0540.0520.075−0.0020.0320.032−0.0940.0700.117 −0.0050.0430.043 −0.0650.0570.087 0.0010.0340.034
T = 9 116133 0.20−0.0080.0210.0230.0010.0200.020−0.0150.0290.032 0.0010.0270.027 −0.0090.0220.024 −0.0020.0210.021
0.50−0.0140.0240.0270.0010.0200.020−0.0240.0310.040 0.0020.0270.027 −0.0160.0250.029 −0.0010.0220.022
0.80−0.0410.0330.053−0.0000.0220.022−0.0690.0430.081 −0.0010.0280.028 −0.0490.0360.061 0.0030.0230.023
ABBB β BiasStdvRMSEBiasStdvRMSEBiasStdvRMSE BiasStdvRMSE BiasStdvRMSE BiasStdvRMSE
T = 3 1116 1.430.0030.1000.1000.0010.0960.0960.0040.1480.148 0.0050.1360.136 0.0040.1000.100 0.0010.0970.097
0.930.0020.0990.0990.0030.0920.0920.0020.1460.146 0.0090.1310.131 0.0020.0990.099 0.0020.0940.094
0.31−0.0020.0970.0970.0060.0920.093−0.0050.1420.142 0.0120.1320.133 −0.0030.0970.097 0.0040.0930.093
T = 6 5061 1.430.0060.0540.055−0.0000.0530.0530.0110.0780.078 −0.0000.0740.074 0.0070.0550.055 0.0010.0540.054
0.930.0070.0530.053−0.0000.0510.0510.0120.0750.076 0.0010.0700.070 0.0080.0530.054 0.0000.0520.052
0.310.0040.0510.0510.0020.0480.0480.0060.0730.073 0.0040.0660.066 0.0050.0510.051 0.0000.0480.048
T = 9 116133 1.430.0070.0400.041−0.0010.0390.0390.0120.0560.057 −0.0010.0540.054 0.0080.0400.041 0.0010.0400.040
0.930.0090.0390.040−0.0010.0370.0370.0140.0540.056 −0.0010.0510.051 0.0100.0390.040 0.0010.0380.038
0.310.0060.0370.0370.0010.0340.0340.0100.0510.052 0.0020.0470.047 0.0080.0370.037 −0.0000.0350.035
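Unfeasible t-Test: Actual Significance Level ($\bar{\rho}_{x\varepsilon} = 0.0$)

$\theta = 0$

| T | L (AB) | L (BB) | $\gamma$ | ABu | BBu | $\beta$ | ABu | BBu |
|---|---|---|---|---|---|---|---|---|
| 3 | 11 | 16 | 0.20 | 0.058 | 0.051 | 1.43 | 0.048 | 0.050 |
|   |    |    | 0.50 | 0.061 | 0.053 | 0.93 | 0.047 | 0.050 |
|   |    |    | 0.80 | 0.089 | 0.056 | 0.31 | 0.042 | 0.049 |
| 6 | 50 | 61 | 0.20 | 0.059 | 0.041 | 1.43 | 0.050 | 0.048 |
|   |    |    | 0.50 | 0.074 | 0.044 | 0.93 | 0.052 | 0.047 |
|   |    |    | 0.80 | 0.172 | 0.052 | 0.31 | 0.050 | 0.049 |
| 9 | 116 | 133 | 0.20 | 0.071 | 0.043 | 1.43 | 0.049 | 0.047 |
|   |    |    | 0.50 | 0.095 | 0.047 | 0.93 | 0.053 | 0.048 |
|   |    |    | 0.80 | 0.246 | 0.053 | 0.31 | 0.055 | 0.050 |

$\theta = 1$

| T | L (AB) | L (BB) | $\gamma$ | ABu | BBu | MABu | MBBu | $\beta$ | ABu | BBu | MABu | MBBu |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 11 | 16 | 0.20 | 0.060 | 0.051 | 0.062 | 0.049 | 1.43 | 0.046 | 0.049 | 0.048 | 0.049 |
|   |    |    | 0.50 | 0.066 | 0.053 | 0.067 | 0.057 | 0.93 | 0.045 | 0.048 | 0.047 | 0.048 |
|   |    |    | 0.80 | 0.123 | 0.061 | 0.099 | 0.058 | 0.31 | 0.037 | 0.047 | 0.039 | 0.049 |
| 6 | 50 | 61 | 0.20 | 0.071 | 0.048 | 0.061 | 0.044 | 1.43 | 0.052 | 0.050 | 0.049 | 0.049 |
|   |    |    | 0.50 | 0.099 | 0.053 | 0.079 | 0.044 | 0.93 | 0.051 | 0.050 | 0.051 | 0.049 |
|   |    |    | 0.80 | 0.267 | 0.058 | 0.197 | 0.054 | 0.31 | 0.047 | 0.048 | 0.050 | 0.049 |
| 9 | 116 | 133 | 0.20 | 0.082 | 0.048 | 0.072 | 0.053 | 1.43 | 0.055 | 0.048 | 0.049 | 0.048 |
|   |    |    | 0.50 | 0.127 | 0.049 | 0.101 | 0.048 | 0.93 | 0.058 | 0.047 | 0.053 | 0.049 |
|   |    |    | 0.80 | 0.377 | 0.053 | 0.281 | 0.055 | 0.31 | 0.054 | 0.050 | 0.057 | 0.050 |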
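Unfeasible Sargan-Hansen Test: Rejection Probability ($\bar{\rho}_{x\varepsilon} = 0.0$)

$\theta = 0$

| T | df AB | df BB | df Inc | $\gamma$ | JABu | JBBu | JESu | JMABu | JMBBu | JESMu |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 9 | 13 | 4 | 0.20 | 0.048 | 0.047 | 0.048 | 0.048 | 0.047 | 0.048 |
|   |   |    |   | 0.50 | 0.049 | 0.049 | 0.048 | 0.049 | 0.049 | 0.048 |
|   |   |    |   | 0.80 | 0.039 | 0.051 | 0.063 | 0.039 | 0.051 | 0.063 |
| 6 | 48 | 58 | 10 | 0.20 | 0.045 | 0.048 | 0.048 | 0.045 | 0.048 | 0.048 |
|   |   |    |   | 0.50 | 0.043 | 0.045 | 0.049 | 0.043 | 0.045 | 0.049 |
|   |   |    |   | 0.80 | 0.036 | 0.043 | 0.075 | 0.036 | 0.043 | 0.075 |
| 9 | 114 | 130 | 16 | 0.20 | 0.048 | 0.053 | 0.048 | 0.048 | 0.053 | 0.048 |
|   |   |    |   | 0.50 | 0.046 | 0.051 | 0.053 | 0.046 | 0.051 | 0.053 |
|   |   |    |   | 0.80 | 0.036 | 0.049 | 0.086 | 0.036 | 0.049 | 0.086 |

$\theta = 1$

| T | df AB | df BB | df Inc | $\gamma$ | JABu | JBBu | JESu | JMABu | JMBBu | JESMu |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 9 | 13 | 4 | 0.20 | 0.049 | 0.047 | 0.045 | 0.050 | 0.050 | 0.049 |
|   |   |    |   | 0.50 | 0.047 | 0.048 | 0.050 | 0.049 | 0.051 | 0.051 |
|   |   |    |   | 0.80 | 0.033 | 0.048 | 0.075 | 0.038 | 0.050 | 0.067 |
| 6 | 48 | 58 | 10 | 0.20 | 0.048 | 0.048 | 0.050 | 0.045 | 0.048 | 0.050 |
|   |   |    |   | 0.50 | 0.043 | 0.048 | 0.057 | 0.042 | 0.047 | 0.052 |
|   |   |    |   | 0.80 | 0.030 | 0.047 | 0.103 | 0.035 | 0.043 | 0.077 |
| 9 | 114 | 130 | 16 | 0.20 | 0.049 | 0.054 | 0.053 | 0.048 | 0.052 | 0.051 |
|   |   |    |   | 0.50 | 0.048 | 0.052 | 0.059 | 0.045 | 0.051 | 0.055 |
|   |   |    |   | 0.80 | 0.034 | 0.050 | 0.118 | 0.037 | 0.049 | 0.095 |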
* $R = 10{,}000$ simulation replications. Design parameter values: $N = 200$, $SNR = 3$, $DEN_y = 1.0$, $EVF_x = 0.0$, $\bar{\rho}_{x\varepsilon} = 0.0$, $\xi = 0.8$, $\kappa = 0.00$, $\sigma_\varepsilon = 1$, $q = 1$, $\phi = 1.0$. These yield the DGP parameter values: $\pi_\lambda = 0.00$, $\pi_\eta = 0.00$, $\sigma_v = 0.60$, $\sigma_\eta = 1.0 \cdot (1 - \gamma)$, $\rho_{v\varepsilon} = 0.0$ (and $\bar{\rho}_{x\eta} = 0.00$, $\bar{\rho}_{x\lambda} = 0.00$).
Table 4. P0fc-XA *.
Feasible Coefficient Estimators for Arellano-Bond
θ = 0 θ = 1
ρ ¯ x ε = 0 . 0 AB1AB2aAB2cAB1AB2aAB2cMAB
L γ BiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSE
T = 3 110.20−0.0120.0580.060−0.0110.0610.062−0.0120.0590.061−0.0230.0860.089−0.0190.0810.083−0.0220.0810.084−0.0220.0830.086
0.50−0.0220.0760.080−0.0220.0790.082−0.0220.0780.081−0.0440.1120.121−0.0360.1060.112−0.0400.1060.114−0.0420.1090.117
0.80−0.0770.1340.155−0.0770.1410.161−0.0750.1370.156−0.1460.1940.243−0.1320.1880.230−0.1390.1840.231−0.1430.1900.238
T = 6 500.20−0.0090.0290.030−0.0090.0320.033−0.0090.0290.031−0.0190.0450.049−0.0160.0410.043−0.0170.0400.044−0.0140.0360.039
0.50−0.0170.0340.038−0.0170.0380.041−0.0170.0340.038−0.0350.0520.063−0.0280.0470.055−0.0300.0470.055−0.0260.0430.050
0.80−0.0540.0520.075−0.0550.0590.081−0.0530.0530.075−0.1050.0780.131−0.0910.0740.118−0.0940.0710.117−0.0870.0680.110
T = 9 1160.20−0.0080.0210.023−0.0080.0240.025−0.0080.0210.023−0.0170.0330.037−0.0150.0310.035−0.0140.0290.032−0.0110.0240.027
0.50−0.0140.0240.027−0.0140.0260.030−0.0140.0240.028−0.0280.0360.046−0.0260.0340.043−0.0240.0320.040−0.0190.0280.034
0.80−0.0410.0330.053−0.0420.0370.056−0.0410.0330.053−0.0780.0500.093−0.0740.0480.088−0.0700.0430.082−0.0610.0400.073
L β BiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSE
T = 3 111.430.0030.1000.1000.0040.1030.1030.0030.1010.1010.0060.1580.1580.0050.1450.1450.0050.1470.1470.0050.1480.148
0.930.0020.0990.0990.0030.1030.1030.0020.1000.1000.0040.1570.1570.0030.1440.1440.0020.1460.1460.0030.1470.147
0.31−0.0020.0970.097−0.0020.1010.101−0.0020.0990.099−0.0040.1520.152−0.0050.1410.141−0.0050.1430.143−0.0040.1420.142
T = 6 501.430.0060.0540.0550.0060.0600.0610.0060.0550.0550.0130.0870.0880.0100.0770.0770.0110.0780.0790.0090.0670.067
0.930.0070.0530.0530.0070.0590.0590.0070.0530.0540.0140.0850.0860.0110.0740.0750.0120.0760.0770.0100.0650.066
0.310.0040.0510.0510.0040.0570.0570.0040.0520.0520.0070.0820.0820.0050.0720.0720.0060.0730.0740.0050.0630.063
T = 9 1161.430.0070.0400.0410.0080.0450.0450.0070.0410.0410.0140.0650.0660.0130.0600.0610.0120.0560.0580.0090.0460.047
0.930.0090.0390.0400.0090.0430.0440.0090.0390.0400.0170.0620.0640.0160.0580.0600.0140.0540.0560.0120.0440.046
0.310.0060.0370.0370.0070.0410.0410.0060.0370.0370.0120.0590.0600.0110.0550.0560.0100.0510.0520.0090.0420.043
Feasible Coefficient Estimators for Blundell-Bond
θ = 0 θ = 1
ρ ¯ x ε = 0 . 0 BB1BB2aBB2cBB1BB2aBB2cMBB
L γ BiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSE
T = 3 160.20−0.0030.0490.050−0.0000.0510.051−0.0000.0510.051−0.0100.0730.074−0.0040.0670.068−0.0020.0700.070−0.0030.0720.072
0.50−0.0090.0570.058−0.0040.0580.058−0.0030.0570.057−0.0210.0850.087−0.0100.0780.078−0.0050.0790.0790.0020.0920.092
0.80−0.0290.0750.081−0.0130.0760.077−0.0100.0730.074−0.0550.1110.124−0.0310.1080.112−0.0140.1070.1080.0200.1550.157
T = 6 610.20−0.0020.0260.026−0.0010.0280.0280.0010.0270.027−0.0100.0410.042−0.0060.0370.0370.0000.0380.038−0.0030.0340.034
0.50−0.0080.0290.030−0.0030.0320.0320.0000.0300.030−0.0210.0450.050−0.0130.0400.043−0.0010.0400.040−0.0030.0390.039
0.80−0.0290.0380.048−0.0140.0390.041−0.0050.0350.035−0.0580.0560.081−0.0420.0520.067−0.0090.0470.0480.0070.0560.056
T = 9 1330.20−0.0020.0200.020−0.0010.0210.0210.0010.0200.020−0.0090.0310.033−0.0080.0300.0310.0010.0280.028−0.0030.0240.024
0.50−0.0080.0210.022−0.0060.0220.0230.0010.0210.021−0.0190.0330.038−0.0170.0310.0360.0010.0290.029−0.0040.0260.027
0.80−0.0270.0260.038−0.0210.0270.034−0.0030.0240.024−0.0530.0400.066−0.0490.0380.062−0.0060.0310.032−0.0040.0340.034
L β BiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSE
T = 3 161.430.0020.0960.0960.0020.1010.1010.0020.0970.0970.0070.1490.1490.0040.1380.1380.0080.1370.1370.0090.1390.139
0.930.0040.0940.0940.0040.0970.0970.0050.0930.0930.0090.1460.1460.0080.1330.1330.0140.1320.1330.0210.1360.137
0.310.0050.0940.0940.0070.0980.0980.0070.0940.0940.0100.1450.1460.0100.1340.1340.0150.1350.1350.0410.1570.162
T = 6 611.430.0020.0530.0530.0010.0590.059−0.0000.0540.0540.0070.0850.0850.0040.0750.0750.0000.0750.0750.0030.0650.065
0.930.0040.0520.0520.0020.0570.0570.0000.0510.0510.0110.0820.0830.0070.0720.0730.0030.0720.0720.0050.0630.063
0.310.0040.0500.0500.0040.0540.0540.0030.0480.0480.0090.0790.0790.0070.0700.0700.0060.0670.0680.0080.0600.060
T = 9 1331.430.0020.0390.0390.0020.0430.043−0.0010.0400.0400.0080.0640.0640.0070.0600.060−0.0010.0550.0550.0030.0460.046
0.930.0050.0380.0380.0040.0410.042−0.0000.0380.0380.0120.0610.0620.0110.0570.0580.0000.0520.0520.0030.0440.044
0.310.0060.0360.0360.0050.0390.0390.0020.0350.0350.0110.0570.0580.0100.0540.0550.0040.0470.0480.0040.0400.041
* $R = 10{,}000$ simulation replications. Design parameter values: $N = 200$, $SNR = 3$, $DEN_y = 1.0$, $EVF_x = 0.0$, $\bar{\rho}_{x\varepsilon} = 0.0$, $\xi = 0.8$, $\kappa = 0.00$, $\sigma_\varepsilon = 1$, $q = 1$, $\phi = 1.0$. These yield the DGP parameter values: $\pi_\lambda = 0.00$, $\pi_\eta = 0.00$, $\sigma_v = 0.60$, $\sigma_\eta = 1.0 \cdot (1 - \gamma)$, $\rho_{v\varepsilon} = 0.0$ (and $\bar{\rho}_{x\eta} = 0.00$, $\bar{\rho}_{x\lambda} = 0.00$).
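The Bias, Stdv and RMSE columns of the coefficient tables are linked by the usual decomposition RMSE² = Bias² + Stdv². The following is a minimal Python sketch, assuming that this decomposition applies to the reported Monte Carlo summaries; the function name `rmse_from_bias_stdv` is illustrative and does not come from the article. It checks two AB1 entries for γ = 0.80 at θ = 0 copied from Table 4. Agreement can only be expected up to the three-decimal rounding of the inputs.

```python
import math


def rmse_from_bias_stdv(bias: float, stdv: float) -> float:
    """Root mean squared error implied by a bias and a standard deviation."""
    return math.sqrt(bias ** 2 + stdv ** 2)


# (bias, stdv, reported RMSE) for AB1, gamma = 0.80, theta = 0, copied from Table 4.
reported = {
    "T = 3": (-0.077, 0.134, 0.155),
    "T = 6": (-0.054, 0.052, 0.075),
}

for label, (bias, stdv, rmse) in reported.items():
    implied = rmse_from_bias_stdv(bias, stdv)
    print(f"{label}: implied RMSE = {implied:.3f}, reported RMSE = {rmse:.3f}")
```

The same check can be run on any other Bias/Stdv/RMSE triple in the tables to spot transcription slips.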
Table 5. P0ft-XA *.
Feasible t-Test Arellano-Bond: Actual Significance Level
ρ ¯ x ε = 0 . 0 θ = 0 θ = 1
L γ AB1AB1aRAB1cRAB2aAB2aWAB2cAB2cWAB1AB1aRAB1cRAB2aAB2aWAB2cAB2cWMAB
T = 3 110.200.0620.0640.0570.0840.0620.0650.0590.2130.0880.0700.1340.0740.0820.0670.556
0.500.0720.0720.0650.0930.0710.0730.0690.2370.1040.0860.1500.0830.1010.0830.584
0.800.1200.1190.1140.1440.1040.1210.1140.3480.1900.1720.2480.1460.1920.1660.697
T = 6 500.200.0610.0630.0520.1590.0610.0580.0530.2460.0910.0670.3540.0770.0780.0660.201
0.500.0780.0780.0710.1820.0710.0760.0710.2990.1230.0980.3950.0960.1080.0910.241
0.800.1910.1830.1760.3170.1420.1820.1740.5470.3290.2850.6170.2340.2990.2670.465
T = 9 1160.200.0730.0740.0620.3240.0700.0690.0640.2720.1010.0750.6890.0950.0870.0750.154
0.500.0980.0960.0900.3580.0840.0960.0900.3440.1490.1170.7280.1390.1300.1150.198
0.800.2550.2420.2390.5520.1920.2460.2350.6530.4110.3670.8930.3760.3960.3660.460
L β AB1AB1aRAB1cRAB2aAB2aWAB2cAB2cWAB1AB1aRAB1cRAB2aAB2aWAB2cAB2cWMAB
T = 3 111.430.0520.0520.0510.0730.0530.0570.0520.2160.0700.0650.1170.0620.0710.0560.586
0.930.0510.0520.0510.0730.0530.0570.0510.2150.0730.0650.1180.0620.0720.0550.582
0.310.0500.0510.0500.0720.0520.0550.0500.2140.0680.0620.1160.0610.0720.0570.581
T = 6 501.430.0510.0550.0500.1450.0550.0560.0510.2220.0680.0590.3150.0600.0680.0570.210
0.930.0540.0540.0520.1450.0550.0580.0530.2250.0690.0610.3150.0620.0690.0580.216
0.310.0540.0540.0540.1440.0520.0590.0540.2320.0660.0640.3170.0580.0700.0580.229
T = 9 1161.430.0500.0520.0480.2960.0570.0540.0510.2320.0680.0570.6520.0670.0670.0550.138
0.930.0530.0550.0520.2980.0570.0580.0540.2410.0720.0640.6570.0710.0700.0590.145
0.310.0570.0560.0580.2970.0570.0610.0570.2410.0690.0690.6540.0660.0710.0610.153
Feasible t-Test Blundell-Bond: Actual Significance Level
ρ ¯ x ε = 0 . 0 θ = 0 θ = 1
L γ BB1BB1aRBB1cRBB2aBB2aWBB2cBB2cWBB1BB1aRBB1cRBB2aBB2aWBB2cBB2cWMBB
T = 3 160.200.0510.0560.0410.0930.0590.0580.0500.1960.0760.0510.1580.0640.0730.0580.494
0.500.0560.0570.0460.0980.0570.0620.0560.2090.0790.0560.1650.0670.0800.0650.503
0.800.0660.0650.0480.1030.0570.0550.0430.2510.1000.0710.2080.0730.0870.0650.595
T = 6 610.200.0420.0520.0340.1780.0560.0450.0380.2060.0740.0480.3890.0670.0590.0460.160
0.500.0530.0600.0440.1840.0540.0520.0420.2480.0920.0640.4130.0730.0690.0520.167
0.800.1160.1220.1000.2170.0610.0630.0460.4240.2180.1610.5620.1290.0790.0540.261
T = 9 1330.200.0490.0560.0410.3640.0540.0500.0420.2210.0760.0510.7350.0720.0570.0430.119
0.500.0630.0700.0560.3810.0580.0560.0440.2790.1080.0780.7670.1010.0630.0480.121
0.800.1730.1760.1560.5060.1090.0660.0490.5520.3130.2490.8940.2820.0740.0490.167
L β BB1BB1aRBB1cRBB2aBB2aWBB2cBB2cWBB1BB1aRBB1cRBB2aBB2aWBB2cBB2cWMBB
T = 3 161.430.0510.0540.0520.0830.0560.0580.0530.2100.0700.0610.1470.0640.0730.0580.571
0.930.0500.0530.0510.0840.0550.0570.0520.2140.0700.0610.1480.0640.0730.0580.564
0.310.0480.0510.0490.0870.0560.0590.0540.2170.0720.0620.1550.0650.0780.0640.626
T = 6 611.430.0480.0520.0470.1660.0550.0530.0490.2160.0690.0530.3710.0590.0640.0520.200
0.930.0510.0550.0500.1660.0530.0540.0490.2230.0690.0580.3760.0630.0640.0510.205
0.310.0530.0550.0530.1690.0550.0530.0500.2290.0680.0600.3880.0640.0640.0540.221
T = 9 1331.430.0470.0510.0460.3420.0520.0510.0470.2200.0670.0540.7170.0640.0600.0490.128
0.930.0500.0520.0490.3490.0530.0530.0490.2320.0700.0600.7180.0670.0610.0500.132
0.310.0550.0550.0550.3560.0560.0560.0510.2340.0700.0660.7300.0660.0630.0540.144
* $R = 10{,}000$ simulation replications. Design parameter values: $N = 200$, $SNR = 3$, $DEN_y = 1.0$, $EVF_x = 0.0$, $\bar{\rho}_{x\varepsilon} = 0.0$, $\xi = 0.8$, $\kappa = 0.00$, $\sigma_\varepsilon = 1$, $q = 1$, $\phi = 1.0$. These yield the DGP parameter values: $\pi_\lambda = 0.00$, $\pi_\eta = 0.00$, $\sigma_v = 0.60$, $\sigma_\eta = 1.0 \cdot (1 - \gamma)$, $\rho_{v\varepsilon} = 0.0$ (and $\bar{\rho}_{x\eta} = 0.00$, $\bar{\rho}_{x\lambda} = 0.00$).
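Rejection frequencies estimated from R = 10,000 replications carry simulation noise of their own. The sketch below is a rough guide only: it assumes independent replications and a nominal significance level of 5% (an assumption, since the nominal level is not restated in these tables), and uses the binomial standard error. The names `mc_standard_error` and `NOMINAL` are illustrative.

```python
import math


def mc_standard_error(p: float, replications: int) -> float:
    """Binomial standard error of a rejection frequency estimated from simulation."""
    return math.sqrt(p * (1.0 - p) / replications)


R = 10_000      # replications, as stated in the table footnotes
NOMINAL = 0.05  # assumed nominal significance level

se = mc_standard_error(NOMINAL, R)
lower = NOMINAL - 1.96 * se
upper = NOMINAL + 1.96 * se
print(f"standard error = {se:.4f}, approximate 95% band = [{lower:.4f}, {upper:.4f}]")
```

Under these assumptions, a Table 5 entry such as 0.072 lies well outside the band, whereas an entry such as 0.052 is compatible with accurate size.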
Table 6. P0fJ-XA *.
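Feasible Sargan-Hansen Test: Rejection Probability ($\bar{\rho}_{x\varepsilon} = 0.0$)

$\theta = 0$

| T | df AB | df BB | df Inc | $\gamma$ | $JAB_a^{(2,1)}$ | $JBB_a^{(2,1)}$ | $JES_a^{(2,1)}$ | $JAB_c^{(2,1)}$ | $JBB_c^{(2,1)}$ | $JES_c^{(2,1)}$ | JMAB | JMBB | JESM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 9 | 13 | 4 | 0.20 | 0.047 | 0.050 | 0.053 | 0.043 | 0.033 | 0.030 | 0.244 | 0.304 | 0.306 |
|   |   |    |   | 0.50 | 0.050 | 0.051 | 0.052 | 0.044 | 0.034 | 0.025 | 0.246 | 0.328 | 0.327 |
|   |   |    |   | 0.80 | 0.061 | 0.055 | 0.047 | 0.055 | 0.035 | 0.021 | 0.251 | 0.381 | 0.375 |
| 6 | 48 | 58 | 10 | 0.20 | 0.034 | 0.038 | 0.068 | 0.026 | 0.025 | 0.030 | 0.032 | 0.386 | 0.439 |
|   |   |    |   | 0.50 | 0.037 | 0.039 | 0.062 | 0.027 | 0.023 | 0.023 | 0.033 | 0.391 | 0.442 |
|   |   |    |   | 0.80 | 0.046 | 0.042 | 0.056 | 0.031 | 0.022 | 0.013 | 0.039 | 0.404 | 0.452 |
| 9 | 114 | 130 | 16 | 0.20 | 0.007 | 0.002 | 0.056 | 0.021 | 0.023 | 0.033 | 0.022 | 0.409 | 0.466 |
|   |   |    |   | 0.50 | 0.007 | 0.002 | 0.053 | 0.021 | 0.021 | 0.026 | 0.022 | 0.411 | 0.467 |
|   |   |    |   | 0.80 | 0.009 | 0.002 | 0.048 | 0.025 | 0.019 | 0.014 | 0.026 | 0.416 | 0.471 |

$\theta = 1$

| T | df AB | df BB | df Inc | $\gamma$ | $JAB_a^{(2,1)}$ | $JBB_a^{(2,1)}$ | $JES_a^{(2,1)}$ | $JAB_c^{(2,1)}$ | $JBB_c^{(2,1)}$ | $JES_c^{(2,1)}$ | JMAB | JMBB | JESM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 9 | 13 | 4 | 0.20 | 0.036 | 0.035 | 0.047 | 0.037 | 0.034 | 0.041 | 0.278 | 0.569 | 0.557 |
|   |   |    |   | 0.50 | 0.041 | 0.035 | 0.045 | 0.042 | 0.038 | 0.042 | 0.280 | 0.594 | 0.581 |
|   |   |    |   | 0.80 | 0.057 | 0.041 | 0.045 | 0.061 | 0.047 | 0.047 | 0.300 | 0.620 | 0.608 |
| 6 | 48 | 58 | 10 | 0.20 | 0.016 | 0.015 | 0.054 | 0.020 | 0.022 | 0.028 | 0.037 | 0.727 | 0.754 |
|   |   |    |   | 0.50 | 0.018 | 0.016 | 0.051 | 0.023 | 0.020 | 0.027 | 0.036 | 0.730 | 0.756 |
|   |   |    |   | 0.80 | 0.024 | 0.017 | 0.048 | 0.033 | 0.028 | 0.027 | 0.042 | 0.738 | 0.761 |
| 9 | 114 | 130 | 16 | 0.20 | 0.001 | 0.000 | 0.046 | 0.015 | 0.017 | 0.032 | 0.024 | 0.764 | 0.788 |
|   |   |    |   | 0.50 | 0.001 | 0.000 | 0.044 | 0.017 | 0.015 | 0.025 | 0.023 | 0.766 | 0.788 |
|   |   |    |   | 0.80 | 0.001 | 0.000 | 0.038 | 0.025 | 0.019 | 0.020 | 0.027 | 0.770 | 0.791 |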
* $R = 10{,}000$ simulation replications. Design parameter values: $N = 200$, $SNR = 3$, $DEN_y = 1.0$, $EVF_x = 0.0$, $\bar{\rho}_{x\varepsilon} = 0.0$, $\xi = 0.8$, $\kappa = 0.00$, $\sigma_\varepsilon = 1$, $q = 1$, $\phi = 1.0$. These yield the DGP parameter values: $\pi_\lambda = 0.00$, $\pi_\eta = 0.00$, $\sigma_v = 0.60$, $\sigma_\eta = 1.0 \cdot (1 - \gamma)$, $\rho_{v\varepsilon} = 0.0$ (and $\bar{\rho}_{x\eta} = 0.00$, $\bar{\rho}_{x\lambda} = 0.00$).
Table 7. P0fσ-XA *.
Standard Errors of Error Components $\eta_i$ and $\varepsilon_{it}$
θ = 0 θ = 1
ρ ¯ x ε = 0 . 0 Bias σ ^ η Bias σ ^ ε Bias σ ^ η Bias σ ^ ε
L γ σ η AB1AB2aAB2cAB1AB2aAB2cAB1AB2aAB2cMABAB1AB2aAB2cMAB
T = 3 110.200.800.0250.0240.025−0.007−0.006−0.0070.0530.0430.0480.049−0.016−0.013−0.015−0.015
0.500.500.0500.0490.049−0.011−0.011−0.0110.1060.0860.0950.099−0.024−0.020−0.022−0.023
0.800.200.2240.2280.223−0.033−0.033−0.0330.4130.3770.3900.402−0.063−0.056−0.060−0.061
T = 6 500.200.800.0130.0130.013−0.003−0.002−0.0030.0270.0220.0230.019−0.006−0.005−0.006−0.005
0.500.500.0270.0270.026−0.005−0.005−0.0050.0570.0460.0480.042−0.011−0.009−0.010−0.008
0.800.200.1270.1290.126−0.019−0.019−0.0190.2440.2140.2190.202−0.036−0.032−0.032−0.030
T = 9 1160.200.800.0100.0100.010−0.001−0.001−0.0010.0200.0190.0170.013−0.004−0.003−0.003−0.002
0.500.500.0190.0200.019−0.003−0.003−0.0030.0400.0370.0340.027−0.006−0.006−0.005−0.004
0.800.200.0920.0940.092−0.012−0.012−0.0120.1720.1620.1540.134−0.022−0.021−0.020−0.017
L γ σ η BB1BB2aBB2cBB1BB2aBB2cBB1BB2aBB2cMBBBB1BB2aBB2cMBB
T = 3 160.200.800.0080.0060.005−0.004−0.003−0.0030.0260.0150.0120.016−0.012−0.008−0.008−0.009
0.500.500.0210.0120.009−0.006−0.004−0.0040.0510.0280.0160.013−0.016−0.010−0.008−0.006
0.800.200.0900.0490.037−0.016−0.008−0.0060.1760.1140.0720.097−0.031−0.019−0.0110.007
T = 6 610.200.800.0030.002−0.000−0.001−0.001−0.0010.0140.0090.0000.005−0.005−0.004−0.002−0.003
0.500.500.0130.005−0.000−0.003−0.001−0.0010.0340.0220.0010.005−0.008−0.006−0.002−0.003
0.800.200.0690.0290.003−0.011−0.006−0.0020.1410.1020.011−0.022−0.023−0.017−0.0050.001
T = 9 1330.200.800.0020.002−0.001−0.001−0.001−0.0000.0110.010−0.0010.004−0.003−0.003−0.001−0.002
0.500.500.0100.008−0.002−0.002−0.001−0.0000.0270.024−0.0010.005−0.005−0.005−0.001−0.002
0.800.200.0610.0450.000−0.008−0.006−0.0010.1190.1100.004−0.001−0.017−0.015−0.003−0.002
* $R = 10{,}000$ simulation replications. Design parameter values: $N = 200$, $SNR = 3$, $DEN_y = 1.0$, $EVF_x = 0.0$, $\bar{\rho}_{x\varepsilon} = 0.0$, $\xi = 0.8$, $\kappa = 0.00$, $\sigma_\varepsilon = 1$, $q = 1$, $\phi = 1.0$. These yield the DGP parameter values: $\pi_\lambda = 0.00$, $\pi_\eta = 0.00$, $\sigma_v = 0.60$, $\sigma_\eta = 1.0 \cdot (1 - \gamma)$, $\rho_{v\varepsilon} = 0.0$ (and $\bar{\rho}_{x\eta} = 0.00$, $\bar{\rho}_{x\lambda} = 0.00$).
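The $\sigma_\eta$ column of Table 7 follows from the footnote's relation, read here as $\sigma_\eta = DEN_y \cdot (1 - \gamma)$. A minimal Python sketch of that mapping, with the illustrative function name `sigma_eta` (not from the article), reproduces the tabulated values for the three values of $\gamma$:

```python
def sigma_eta(den_y: float, gamma: float) -> float:
    """Standard deviation of the individual effects implied by DEN_y and gamma."""
    return den_y * (1.0 - gamma)


for gamma in (0.20, 0.50, 0.80):
    # DEN_y = 1.0 is the design value stated in the footnote of Table 7.
    print(f"gamma = {gamma:.2f}: sigma_eta = {sigma_eta(1.0, gamma):.2f}")
```

With $DEN_y = 4.0$ the same function gives 3.20, 2.00 and 0.80, which matches the $\sigma_\eta$ column of Table 15.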
Table 8. P0fc-XC *.
Feasible Coefficient Estimators for Arellano-Bond ( θ = 1 )
ρ ¯ x ε = 0 . 0 AB1AB2aAB2cMAB
L γ BiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSE
T = 3 60.20−0.0100.0900.090−0.0100.0860.086−0.0130.0870.088−0.0100.0860.086
0.50−0.0190.1180.120−0.0180.1130.114−0.0220.1140.116−0.0180.1130.115
0.80−0.0650.2170.227−0.0620.2070.216−0.0740.2060.219−0.0650.2100.220
T = 6 120.20−0.0060.0490.050−0.0040.0460.046−0.0060.0470.048−0.0040.0390.040
0.50−0.0110.0590.060−0.0070.0540.055−0.0100.0560.057−0.0070.0470.047
0.80−0.0350.0940.100−0.0260.0880.092−0.0330.0890.095−0.0240.0740.078
T = 9 180.20−0.0050.0370.037−0.0030.0330.034−0.0040.0350.035−0.0020.0270.027
0.50−0.0090.0420.043−0.0060.0380.039−0.0070.0400.041−0.0040.0310.031
0.80−0.0240.0610.066−0.0170.0560.059−0.0210.0580.062−0.0140.0450.047
L β BiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSE
T = 3 61.430.0040.1810.1810.0030.1720.1720.0020.1740.1740.0040.1590.159
0.930.0030.1800.1800.0020.1720.1720.0020.1740.1740.0040.1590.159
0.310.0010.1790.179−0.0010.1710.171−0.0010.1730.1730.0000.1580.158
T = 6 121.430.0040.1100.1110.0000.1020.1020.0030.1060.1060.0020.0790.079
0.930.0040.1090.1090.0000.1010.1010.0030.1050.1050.0020.0770.077
0.310.0010.1090.109−0.0010.1010.1010.0000.1050.1050.0010.0760.076
T = 9 181.430.0030.0840.0840.0000.0750.0750.0020.0800.0800.0010.0560.056
0.930.0030.0830.0830.0010.0740.0740.0020.0790.0790.0020.0550.055
0.310.0010.0820.082−0.0010.0730.0730.0000.0780.0780.0000.0530.053
Feasible Coefficient Estimators for Blundell-Bond ( θ = 1 )
ρ ¯ x ε = 0 . 0 BB1BB2aBB2cMBB
L γ BiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSE
T = 3 90.20−0.0050.0780.078−0.0020.0740.074−0.0040.0750.0750.0010.0760.076
0.50−0.0110.0920.093−0.0070.0860.086−0.0080.0870.0880.0100.1010.101
0.80−0.0370.1290.134−0.0240.1230.126−0.0250.1260.1280.0520.2040.210
T = 6 150.20−0.0040.0460.046−0.0010.0420.042−0.0020.0440.044−0.0010.0360.036
0.50−0.0070.0520.052−0.0020.0470.047−0.0030.0480.0490.0010.0410.041
0.80−0.0210.0710.074−0.0100.0630.063−0.0110.0650.0660.0220.0640.067
T = 9 210.20−0.0030.0350.035−0.0010.0310.031−0.0010.0330.033−0.0010.0260.026
0.50−0.0060.0380.039−0.0020.0340.034−0.0030.0360.036−0.0010.0280.028
0.80−0.0160.0510.053−0.0070.0440.045−0.0070.0470.0470.0030.0380.038
L β BiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSE
T = 3 91.430.0040.1700.1700.0010.1620.1620.0050.1630.1630.0070.1570.157
0.930.0060.1680.1680.0040.1580.1580.0090.1600.1600.0240.1540.156
0.310.0080.1720.1720.0070.1640.1640.0100.1650.1660.0520.1800.188
T = 6 151.430.0040.1070.1070.0020.0980.0980.0040.1020.1020.0020.0780.078
0.930.0050.1050.1050.0030.0950.0960.0060.1000.1000.0050.0760.076
0.310.0040.1060.1060.0030.0970.0970.0050.1010.1010.0140.0760.077
T = 9 211.430.0030.0820.0820.0020.0730.0730.0030.0780.0780.0010.0560.056
0.930.0040.0800.0800.0030.0710.0710.0050.0750.0750.0020.0540.054
0.310.0030.0790.0800.0030.0700.0700.0040.0750.0750.0050.0530.053
* $R = 10{,}000$ simulation replications. Design parameter values: $N = 200$, $SNR = 3$, $DEN_y = 1.0$, $EVF_x = 0.0$, $\bar{\rho}_{x\varepsilon} = 0.0$, $\xi = 0.8$, $\kappa = 0.00$, $\sigma_\varepsilon = 1$, $q = 1$, $\phi = 1.0$. These yield the DGP parameter values: $\pi_\lambda = 0.00$, $\pi_\eta = 0.00$, $\sigma_v = 0.60$, $\sigma_\eta = 1.0 \cdot (1 - \gamma)$, $\rho_{v\varepsilon} = 0.0$ (and $\bar{\rho}_{x\eta} = 0.00$, $\bar{\rho}_{x\lambda} = 0.00$).
Table 9. P0ft-XC *.
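Feasible t-Test: Actual Significance Level ($\theta = 1$, $\bar{\rho}_{x\varepsilon} = 0.0$)

| T | L (AB) | $\gamma$ | AB1 | AB1aR | AB1cR | AB2a | AB2aW | AB2c | AB2cW | L (BB) | $\gamma$ | BB1 | BB1aR | BB1cR | BB2a | BB2aW | BB2c | BB2cW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 6 | 0.20 | 0.192 | 0.073 | 0.051 | 0.097 | 0.071 | 0.059 | 0.055 | 9 | 0.20 | 0.196 | 0.070 | 0.046 | 0.110 | 0.066 | 0.058 | 0.051 |
|   |   | 0.50 | 0.198 | 0.079 | 0.058 | 0.100 | 0.072 | 0.066 | 0.061 |   | 0.50 | 0.201 | 0.071 | 0.046 | 0.109 | 0.067 | 0.064 | 0.056 |
|   |   | 0.80 | 0.230 | 0.110 | 0.091 | 0.132 | 0.094 | 0.102 | 0.095 |   | 0.80 | 0.218 | 0.079 | 0.046 | 0.131 | 0.071 | 0.063 | 0.055 |
| 6 | 12 | 0.20 | 0.208 | 0.070 | 0.047 | 0.120 | 0.062 | 0.048 | 0.046 | 15 | 0.20 | 0.203 | 0.066 | 0.043 | 0.129 | 0.060 | 0.048 | 0.046 |
|   |   | 0.50 | 0.206 | 0.069 | 0.049 | 0.121 | 0.061 | 0.050 | 0.048 |   | 0.50 | 0.203 | 0.067 | 0.043 | 0.127 | 0.058 | 0.048 | 0.045 |
|   |   | 0.80 | 0.239 | 0.090 | 0.067 | 0.139 | 0.071 | 0.072 | 0.069 |   | 0.80 | 0.225 | 0.081 | 0.050 | 0.135 | 0.062 | 0.049 | 0.045 |
| 9 | 18 | 0.20 | 0.216 | 0.070 | 0.047 | 0.148 | 0.064 | 0.051 | 0.048 | 21 | 0.20 | 0.208 | 0.067 | 0.047 | 0.153 | 0.063 | 0.050 | 0.048 |
|   |   | 0.50 | 0.216 | 0.071 | 0.049 | 0.144 | 0.064 | 0.054 | 0.053 |   | 0.50 | 0.209 | 0.069 | 0.047 | 0.155 | 0.059 | 0.049 | 0.047 |
|   |   | 0.80 | 0.233 | 0.090 | 0.068 | 0.165 | 0.068 | 0.068 | 0.066 |   | 0.80 | 0.225 | 0.078 | 0.054 | 0.160 | 0.058 | 0.049 | 0.045 |

| T | L (AB) | $\beta$ | AB1 | AB1aR | AB1cR | AB2a | AB2aW | AB2c | AB2cW | L (BB) | $\beta$ | BB1 | BB1aR | BB1cR | BB2a | BB2aW | BB2c | BB2cW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 6 | 1.43 | 0.212 | 0.069 | 0.063 | 0.091 | 0.068 | 0.068 | 0.061 | 9 | 1.43 | 0.210 | 0.068 | 0.062 | 0.104 | 0.066 | 0.069 | 0.062 |
|   |   | 0.93 | 0.211 | 0.066 | 0.061 | 0.089 | 0.067 | 0.066 | 0.059 |   | 0.93 | 0.213 | 0.068 | 0.061 | 0.105 | 0.068 | 0.068 | 0.062 |
|   |   | 0.31 | 0.206 | 0.066 | 0.059 | 0.089 | 0.064 | 0.061 | 0.054 |   | 0.31 | 0.215 | 0.070 | 0.062 | 0.111 | 0.073 | 0.072 | 0.064 |
| 6 | 12 | 1.43 | 0.215 | 0.064 | 0.054 | 0.114 | 0.062 | 0.058 | 0.055 | 15 | 1.43 | 0.214 | 0.065 | 0.056 | 0.124 | 0.063 | 0.062 | 0.058 |
|   |   | 0.93 | 0.217 | 0.065 | 0.054 | 0.114 | 0.062 | 0.056 | 0.055 |   | 0.93 | 0.216 | 0.066 | 0.056 | 0.125 | 0.064 | 0.062 | 0.057 |
|   |   | 0.31 | 0.214 | 0.065 | 0.052 | 0.116 | 0.064 | 0.055 | 0.053 |   | 0.31 | 0.217 | 0.068 | 0.054 | 0.128 | 0.065 | 0.060 | 0.058 |
| 9 | 18 | 1.43 | 0.221 | 0.064 | 0.051 | 0.135 | 0.058 | 0.055 | 0.054 | 21 | 1.43 | 0.220 | 0.065 | 0.052 | 0.147 | 0.059 | 0.056 | 0.054 |
|   |   | 0.93 | 0.222 | 0.063 | 0.052 | 0.138 | 0.059 | 0.054 | 0.053 |   | 0.93 | 0.219 | 0.063 | 0.052 | 0.149 | 0.059 | 0.055 | 0.053 |
|   |   | 0.31 | 0.219 | 0.063 | 0.052 | 0.139 | 0.057 | 0.053 | 0.051 |   | 0.31 | 0.221 | 0.064 | 0.053 | 0.150 | 0.057 | 0.056 | 0.054 |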
* $R = 10{,}000$ simulation replications. Design parameter values: $N = 200$, $SNR = 3$, $DEN_y = 1.0$, $EVF_x = 0.0$, $\bar{\rho}_{x\varepsilon} = 0.0$, $\xi = 0.8$, $\kappa = 0.00$, $\sigma_\varepsilon = 1$, $q = 1$, $\phi = 1.0$. These yield the DGP parameter values: $\pi_\lambda = 0.00$, $\pi_\eta = 0.00$, $\sigma_v = 0.60$, $\sigma_\eta = 1.0 \cdot (1 - \gamma)$, $\rho_{v\varepsilon} = 0.0$ (and $\bar{\rho}_{x\eta} = 0.00$, $\bar{\rho}_{x\lambda} = 0.00$).
Table 10. P0fJ-XC *.
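Feasible Sargan-Hansen Test: Rejection Probability ($\theta = 1$, $\bar{\rho}_{x\varepsilon} = 0.0$)

| T | df AB | df BB | df Inc | $\gamma$ | $JAB_a^{(2,1)}$ | $JBB_a^{(2,1)}$ | $JES_a^{(2,1)}$ | $JAB_c^{(2,1)}$ | $JBB_c^{(2,1)}$ | $JES_c^{(2,1)}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 4 | 6 | 2 | 0.20 | 0.037 | 0.037 | 0.045 | 0.042 | 0.040 | 0.044 |
|   |   |   |   | 0.50 | 0.040 | 0.037 | 0.042 | 0.045 | 0.042 | 0.042 |
|   |   |   |   | 0.80 | 0.048 | 0.040 | 0.041 | 0.054 | 0.048 | 0.046 |
| 6 | 10 | 12 | 2 | 0.20 | 0.032 | 0.030 | 0.048 | 0.036 | 0.035 | 0.042 |
|   |   |   |   | 0.50 | 0.031 | 0.029 | 0.046 | 0.037 | 0.037 | 0.044 |
|   |   |   |   | 0.80 | 0.037 | 0.035 | 0.047 | 0.040 | 0.040 | 0.053 |
| 9 | 16 | 18 | 2 | 0.20 | 0.027 | 0.027 | 0.047 | 0.033 | 0.034 | 0.050 |
|   |   |   |   | 0.50 | 0.029 | 0.028 | 0.046 | 0.034 | 0.032 | 0.049 |
|   |   |   |   | 0.80 | 0.032 | 0.030 | 0.051 | 0.035 | 0.036 | 0.054 |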
* $R = 10{,}000$ simulation replications. Design parameter values: $N = 200$, $SNR = 3$, $DEN_y = 1.0$, $EVF_x = 0.0$, $\bar{\rho}_{x\varepsilon} = 0.0$, $\xi = 0.8$, $\kappa = 0.00$, $\sigma_\varepsilon = 1$, $q = 1$, $\phi = 1.0$. These yield the DGP parameter values: $\pi_\lambda = 0.00$, $\pi_\eta = 0.00$, $\sigma_v = 0.60$, $\sigma_\eta = 1.0 \cdot (1 - \gamma)$, $\rho_{v\varepsilon} = 0.0$ (and $\bar{\rho}_{x\eta} = 0.00$, $\bar{\rho}_{x\lambda} = 0.00$).
Table 11. P0fσ-XC *.
Standard Errors of Error Components $\eta_i$ and $\varepsilon_{it}$
θ = 0 θ = 1
ρ ¯ x ε = 0 . 0 Bias σ ^ η Bias σ ^ ε Bias σ ^ η Bias σ ^ ε
L γ σ η AB1AB2aAB2cAB1AB2aAB2cAB1AB2aAB2cMABAB1AB2aAB2cMAB
T = 3 60.200.800.0170.0180.018−0.004−0.004−0.0040.0370.0350.0400.032−0.010−0.009−0.011−0.010
0.500.500.0290.0300.032−0.006−0.006−0.0060.0700.0630.0720.061−0.013−0.012−0.014−0.013
0.800.200.1520.1550.159−0.014−0.013−0.0150.3060.2840.3050.291−0.027−0.026−0.031−0.027
T = 6 120.200.800.0050.0050.005−0.001−0.001−0.0010.0130.0090.0120.007−0.003−0.002−0.003−0.002
0.500.500.0100.0100.010−0.001−0.001−0.0010.0250.0170.0220.013−0.004−0.003−0.004−0.003
0.800.200.0340.0340.036−0.005−0.004−0.0050.0970.0740.0910.059−0.011−0.008−0.011−0.009
T = 9 180.200.800.0030.0030.003−0.000−0.000−0.0000.0080.0060.0070.004−0.001−0.001−0.001−0.001
0.500.500.0060.0060.006−0.001−0.000−0.0010.0160.0100.0130.007−0.002−0.001−0.002−0.001
0.800.200.0150.0140.016−0.002−0.002−0.0030.0530.0340.0440.021−0.006−0.004−0.006−0.004
L γ σ η BB1BB2aBB2cBB1BB2aBB2cBB1BB2aBB2cMBBBB1BB2aBB2cMBB
T = 3 90.200.800.0080.0080.008−0.003−0.002−0.0030.0230.0170.0190.012−0.009−0.006−0.007−0.006
0.500.500.0160.0140.014−0.004−0.003−0.0040.0400.0270.0280.009−0.011−0.008−0.009−0.001
0.800.200.0730.0560.061−0.010−0.007−0.0080.1580.1200.1240.149−0.021−0.014−0.0150.027
T = 6 150.200.800.0030.0020.002−0.001−0.000−0.0010.0090.0040.0050.003−0.003−0.002−0.002−0.002
0.500.500.0060.0030.004−0.001−0.000−0.0010.0170.0070.0090.001−0.004−0.002−0.002−0.001
0.800.200.0170.0050.010−0.003−0.002−0.0020.0590.0260.031−0.045−0.008−0.004−0.0050.008
T = 9 210.200.800.0020.0010.001−0.000−0.000−0.0000.0060.0030.0030.002−0.001−0.001−0.001−0.001
0.500.500.0040.0020.003−0.000−0.000−0.0000.0110.0050.0050.003−0.002−0.001−0.001−0.001
0.800.200.008−0.0010.003−0.002−0.000−0.0010.0330.0090.011−0.018−0.005−0.002−0.0020.001
* $R = 10{,}000$ simulation replications. Design parameter values: $N = 200$, $SNR = 3$, $DEN_y = 1.0$, $EVF_x = 0.0$, $\bar{\rho}_{x\varepsilon} = 0.0$, $\xi = 0.8$, $\kappa = 0.00$, $\sigma_\varepsilon = 1$, $q = 1$, $\phi = 1.0$. These yield the DGP parameter values: $\pi_\lambda = 0.00$, $\pi_\eta = 0.00$, $\sigma_v = 0.60$, $\sigma_\eta = 1.0 \cdot (1 - \gamma)$, $\rho_{v\varepsilon} = 0.0$ (and $\bar{\rho}_{x\eta} = 0.00$, $\bar{\rho}_{x\lambda} = 0.00$).
Table 12. P1fc-XC *.
Feasible Coefficient Estimators for Arellano-Bond ( θ = 1 )
ρ ¯ x ε = 0 . 0 AB1AB2aAB2cMAB
L γ BiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSE
T = 3 60.20−0.0170.1340.135−0.0200.1270.129−0.0210.1260.128−0.0170.1320.134
0.50−0.0360.1750.179−0.0380.1670.171−0.0410.1640.169−0.0350.1720.176
0.80−0.1310.3030.330−0.1340.2980.326−0.1350.2840.315−0.1360.2980.328
T = 6 120.20−0.0090.0620.063−0.0070.0580.058−0.0080.0580.059−0.0060.0520.053
0.50−0.0170.0770.079−0.0140.0720.074−0.0160.0720.074−0.0110.0650.066
0.80−0.0610.1260.139−0.0530.1210.133−0.0560.1170.130−0.0490.1130.123
T = 9 180.20−0.0060.0440.044−0.0050.0400.040−0.0050.0410.042−0.0040.0350.035
0.50−0.0120.0520.053−0.0090.0480.049−0.0100.0490.050−0.0070.0410.042
0.80−0.0400.0810.090−0.0320.0770.084−0.0350.0760.083−0.0280.0700.075
L β BiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSE
T = 3 61.430.0050.1820.1820.0020.1740.1740.0030.1760.1760.0070.1610.161
0.930.0030.1810.1810.0000.1740.1740.0010.1750.1750.0060.1590.159
0.31−0.0020.1770.177−0.0060.1710.171−0.0040.1720.172−0.0020.1570.157
T = 6 121.430.0040.1100.1100.0020.1030.1030.0040.1060.1060.0030.0800.080
0.930.0040.1090.1090.0010.1020.1020.0040.1050.1050.0040.0780.078
0.310.0000.1080.108−0.0020.1020.102−0.0010.1050.1050.0020.0760.076
T = 9 181.430.0030.0840.0840.0010.0760.0760.0030.0810.0810.0020.0570.057
0.930.0040.0830.0830.0010.0750.0750.0030.0790.0790.0030.0550.055
0.310.0000.0820.082−0.0010.0740.074−0.0000.0780.0780.0010.0530.053
Feasible Coefficient Estimators for Blundell-Bond ( θ = 1 )
ρ ¯ x ε = 0 . 0 BB1BB2aBB2cMBB
L γ BiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSE
T = 3 90.200.0230.1250.1280.0180.1140.1160.0160.1140.1150.0210.1230.124
0.500.0110.1240.1250.0120.1240.1240.0100.1260.1260.0170.1290.130
0.80−0.0220.1420.144−0.0150.1480.149−0.0160.1530.1540.0310.1900.193
T = 6 150.200.0070.0610.0620.0040.0530.0540.0020.0550.0550.0020.0490.049
0.500.0020.0640.0640.0040.0610.0610.0020.0630.0630.0010.0560.056
0.80−0.0160.0780.080−0.0050.0750.075−0.0090.0780.0780.0080.0790.080
T = 9 210.200.0030.0440.0440.0020.0380.038−0.0000.0390.0390.0000.0330.033
0.50−0.0010.0460.0460.0020.0420.042−0.0010.0440.044−0.0010.0370.037
0.80−0.0140.0560.058−0.0040.0530.053−0.0070.0550.056−0.0020.0500.050
L β BiasStdvRMSEBiasStdvRMSEBiasStdvRMSEBiasStdvRMSE
T = 3 91.43−0.0100.1850.185−0.0000.1750.1750.0000.1740.174−0.0090.1710.171
0.93−0.0030.1760.176−0.0000.1700.1700.0000.1700.170−0.0010.1630.163
0.310.0040.1740.1740.0020.1670.1670.0030.1680.1680.0190.1670.168
T = 6 151.43−0.0020.1120.112−0.0000.1030.1030.0010.1060.106−0.0010.0820.082
0.930.0010.1080.1080.0010.1010.1010.0020.1040.1040.0000.0790.079
0.310.0030.1060.1070.0020.1000.1000.0030.1030.1030.0040.0770.077
T = 9 211.43−0.0010.0860.086−0.0000.0760.0760.0010.0810.081−0.0000.0580.058
0.930.0020.0820.0820.0010.0740.0740.0020.0780.0780.0000.0560.056
0.310.0020.0800.0800.0020.0730.0730.0020.0760.0760.0010.0540.054
* $R = 10{,}000$ simulation replications. Design parameter values: $N = 200$, $SNR = 3$, $DEN_y = 4.0$, $EVF_x = 0.0$, $\bar{\rho}_{x\varepsilon} = 0.0$, $\xi = 0.8$, $\kappa = 0.00$, $\sigma_\varepsilon = 1$, $q = 1$, $\phi = 1.0$. These yield the DGP parameter values: $\pi_\lambda = 0.00$, $\pi_\eta = 0.00$, $\sigma_v = 0.60$, $\sigma_\eta = 4.0 \cdot (1 - \gamma)$, $\rho_{v\varepsilon} = 0.0$ (and $\bar{\rho}_{x\eta} = 0.00$, $\bar{\rho}_{x\lambda} = 0.00$).
Table 13. P1ft-XC *.
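Feasible t-Test: Actual Significance Level ($\theta = 1$, $\bar{\rho}_{x\varepsilon} = 0.0$)

| T | L (AB) | $\gamma$ | AB1 | AB1aR | AB1cR | AB2a | AB2aW | AB2c | AB2cW | L (BB) | $\gamma$ | BB1 | BB1aR | BB1cR | BB2a | BB2aW | BB2c | BB2cW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 6 | 0.20 | 0.165 | 0.075 | 0.064 | 0.099 | 0.072 | 0.068 | 0.063 | 9 | 0.20 | 0.118 | 0.087 | 0.068 | 0.132 | 0.080 | 0.085 | 0.065 |
|   |   | 0.50 | 0.179 | 0.082 | 0.073 | 0.106 | 0.076 | 0.080 | 0.073 |   | 0.50 | 0.138 | 0.083 | 0.063 | 0.146 | 0.093 | 0.097 | 0.077 |
|   |   | 0.80 | 0.236 | 0.134 | 0.130 | 0.165 | 0.111 | 0.137 | 0.130 |   | 0.80 | 0.144 | 0.063 | 0.034 | 0.141 | 0.088 | 0.088 | 0.069 |
| 6 | 12 | 0.20 | 0.192 | 0.070 | 0.053 | 0.115 | 0.060 | 0.052 | 0.050 | 15 | 0.20 | 0.124 | 0.071 | 0.048 | 0.127 | 0.062 | 0.052 | 0.046 |
|   |   | 0.50 | 0.192 | 0.072 | 0.058 | 0.117 | 0.059 | 0.056 | 0.054 |   | 0.50 | 0.141 | 0.065 | 0.045 | 0.133 | 0.065 | 0.059 | 0.049 |
|   |   | 0.80 | 0.215 | 0.099 | 0.086 | 0.155 | 0.076 | 0.088 | 0.085 |   | 0.80 | 0.161 | 0.065 | 0.038 | 0.138 | 0.069 | 0.058 | 0.048 |
| 9 | 18 | 0.20 | 0.201 | 0.070 | 0.054 | 0.140 | 0.063 | 0.055 | 0.053 | 21 | 0.20 | 0.138 | 0.064 | 0.047 | 0.150 | 0.062 | 0.052 | 0.048 |
|   |   | 0.50 | 0.201 | 0.071 | 0.057 | 0.140 | 0.063 | 0.058 | 0.055 |   | 0.50 | 0.155 | 0.065 | 0.045 | 0.153 | 0.060 | 0.051 | 0.046 |
|   |   | 0.80 | 0.225 | 0.095 | 0.084 | 0.174 | 0.067 | 0.078 | 0.076 |   | 0.80 | 0.175 | 0.068 | 0.042 | 0.157 | 0.061 | 0.053 | 0.044 |

| T | L (AB) | $\beta$ | AB1 | AB1aR | AB1cR | AB2a | AB2aW | AB2c | AB2cW | L (BB) | $\beta$ | BB1 | BB1aR | BB1cR | BB2a | BB2aW | BB2c | BB2cW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 6 | 1.43 | 0.207 | 0.066 | 0.060 | 0.086 | 0.065 | 0.065 | 0.058 | 9 | 1.43 | 0.176 | 0.068 | 0.058 | 0.097 | 0.065 | 0.065 | 0.058 |
|   |   | 0.93 | 0.204 | 0.064 | 0.057 | 0.086 | 0.065 | 0.061 | 0.055 |   | 0.93 | 0.191 | 0.066 | 0.056 | 0.097 | 0.066 | 0.066 | 0.059 |
|   |   | 0.31 | 0.195 | 0.061 | 0.051 | 0.083 | 0.061 | 0.057 | 0.050 |   | 0.31 | 0.208 | 0.069 | 0.057 | 0.103 | 0.068 | 0.067 | 0.059 |
| 6 | 12 | 1.43 | 0.213 | 0.065 | 0.053 | 0.111 | 0.064 | 0.056 | 0.053 | 15 | 1.43 | 0.196 | 0.065 | 0.055 | 0.119 | 0.061 | 0.058 | 0.055 |
|   |   | 0.93 | 0.215 | 0.066 | 0.053 | 0.112 | 0.062 | 0.057 | 0.053 |   | 0.93 | 0.205 | 0.064 | 0.053 | 0.119 | 0.062 | 0.059 | 0.055 |
|   |   | 0.31 | 0.211 | 0.064 | 0.051 | 0.114 | 0.062 | 0.055 | 0.052 |   | 0.31 | 0.214 | 0.067 | 0.053 | 0.124 | 0.063 | 0.059 | 0.058 |
| 9 | 18 | 1.43 | 0.219 | 0.063 | 0.051 | 0.131 | 0.060 | 0.053 | 0.052 | 21 | 1.43 | 0.200 | 0.063 | 0.053 | 0.139 | 0.061 | 0.056 | 0.052 |
|   |   | 0.93 | 0.219 | 0.063 | 0.051 | 0.131 | 0.057 | 0.054 | 0.052 |   | 0.93 | 0.207 | 0.064 | 0.054 | 0.141 | 0.061 | 0.055 | 0.053 |
|   |   | 0.31 | 0.215 | 0.063 | 0.050 | 0.130 | 0.056 | 0.052 | 0.051 |   | 0.31 | 0.217 | 0.063 | 0.054 | 0.141 | 0.060 | 0.054 | 0.052 |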
* $R = 10{,}000$ simulation replications. Design parameter values: $N = 200$, $SNR = 3$, $DEN_y = 4.0$, $EVF_x = 0.0$, $\bar{\rho}_{x\varepsilon} = 0.0$, $\xi = 0.8$, $\kappa = 0.00$, $\sigma_\varepsilon = 1$, $q = 1$, $\phi = 1.0$. These yield the DGP parameter values: $\pi_\lambda = 0.00$, $\pi_\eta = 0.00$, $\sigma_v = 0.60$, $\sigma_\eta = 4.0 \cdot (1 - \gamma)$, $\rho_{v\varepsilon} = 0.0$ (and $\bar{\rho}_{x\eta} = 0.00$, $\bar{\rho}_{x\lambda} = 0.00$).
Table 14. P1fJ-XC *.
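Feasible Sargan-Hansen Test: Rejection Probability ($\theta = 1$, $\bar{\rho}_{x\varepsilon} = 0.0$)

| T | df AB | df BB | df Inc | $\gamma$ | $JAB_a^{(2,1)}$ | $JBB_a^{(2,1)}$ | $JES_a^{(2,1)}$ | $JAB_c^{(2,1)}$ | $JBB_c^{(2,1)}$ | $JES_c^{(2,1)}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 4 | 6 | 2 | 0.20 | 0.039 | 0.043 | 0.061 | 0.043 | 0.047 | 0.054 |
|   |   |   |   | 0.50 | 0.042 | 0.043 | 0.055 | 0.048 | 0.042 | 0.046 |
|   |   |   |   | 0.80 | 0.053 | 0.036 | 0.042 | 0.059 | 0.038 | 0.036 |
| 6 | 10 | 12 | 2 | 0.20 | 0.035 | 0.036 | 0.055 | 0.039 | 0.041 | 0.053 |
|   |   |   |   | 0.50 | 0.034 | 0.035 | 0.052 | 0.039 | 0.042 | 0.054 |
|   |   |   |   | 0.80 | 0.043 | 0.035 | 0.044 | 0.045 | 0.042 | 0.044 |
| 9 | 16 | 18 | 2 | 0.20 | 0.029 | 0.029 | 0.056 | 0.037 | 0.039 | 0.054 |
|   |   |   |   | 0.50 | 0.029 | 0.028 | 0.053 | 0.037 | 0.039 | 0.054 |
|   |   |   |   | 0.80 | 0.031 | 0.026 | 0.049 | 0.040 | 0.039 | 0.046 |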
* $R = 10{,}000$ simulation replications. Design parameter values: $N = 200$, $SNR = 3$, $DEN_y = 4.0$, $EVF_x = 0.0$, $\bar{\rho}_{x\varepsilon} = 0.0$, $\xi = 0.8$, $\kappa = 0.00$, $\sigma_\varepsilon = 1$, $q = 1$, $\phi = 1.0$. These yield the DGP parameter values: $\pi_\lambda = 0.00$, $\pi_\eta = 0.00$, $\sigma_v = 0.60$, $\sigma_\eta = 4.0 \cdot (1 - \gamma)$, $\rho_{v\varepsilon} = 0.0$ (and $\bar{\rho}_{x\eta} = 0.00$, $\bar{\rho}_{x\lambda} = 0.00$).
Table 15. P1fσ-XC *.
Standard Errors of Error Components $\eta_i$ and $\varepsilon_{it}$
θ = 0 θ = 1
ρ ¯ x ε = 0 . 0 Bias σ ^ η Bias σ ^ ε Bias σ ^ η Bias σ ^ ε
L γ σ η AB1AB2aAB2cAB1AB2aAB2cAB1AB2aAB2cMABAB1AB2aAB2cMAB
T = 3 60.203.200.0500.0530.053−0.002−0.002−0.0030.0810.0910.0960.078−0.006−0.007−0.008−0.006
0.502.000.1040.1070.108−0.006−0.006−0.0070.1770.1840.1930.168−0.012−0.013−0.015−0.012
0.800.800.4900.5100.498−0.025−0.025−0.0260.8230.8100.8020.826−0.040−0.041−0.044−0.042
T = 6 120.203.200.0200.0200.020−0.001−0.000−0.0010.0390.0310.0350.024−0.002−0.001−0.002−0.002
0.502.000.0380.0380.038−0.002−0.001−0.0020.0740.0610.0690.049−0.004−0.003−0.004−0.003
0.800.800.1430.1470.144−0.009−0.009−0.0090.2790.2410.2540.219−0.016−0.014−0.016−0.014
T = 9 180.203.200.0120.0130.012−0.0000.000−0.0000.0270.0190.0220.015−0.001−0.001−0.001−0.001
0.502.000.0230.0230.023−0.000−0.000−0.0000.0490.0360.0420.028−0.002−0.001−0.002−0.001
0.800.800.0870.0860.087−0.004−0.004−0.0040.1720.1390.1510.121−0.009−0.007−0.008−0.006
L γ σ η BB1BB2aBB2cBB1BB2aBB2cBB1BB2aBB2cMBBBB1BB2aBB2cMBB
T = 3 90.203.20−0.105−0.062−0.0580.0140.0070.006−0.089−0.068−0.060−0.0810.0090.0050.0040.008
0.502.00−0.064−0.054−0.0490.0080.0070.006−0.036−0.042−0.033−0.0600.0020.0030.0020.005
0.800.800.0370.0260.028−0.004−0.002−0.0020.1340.1020.1120.054−0.013−0.008−0.0090.016
T = 6 150.203.20−0.037−0.008−0.0030.0030.0010.000−0.027−0.015−0.006−0.0100.0010.000−0.001−0.000
0.502.00−0.024−0.013−0.0060.0020.0010.001−0.005−0.016−0.005−0.0050.0000.001−0.000−0.000
0.800.800.021−0.0010.013−0.0010.001−0.0010.0710.0200.036−0.039−0.006−0.001−0.0030.004
T = 9 210.203.20−0.020−0.0030.0020.0010.0000.000−0.012−0.0060.001−0.0000.0000.000−0.000−0.000
0.502.00−0.011−0.0060.0020.0010.0010.0000.004−0.0060.0030.004−0.0000.000−0.000−0.000
0.800.800.018−0.0010.013−0.0010.001−0.0000.0590.0150.0280.007−0.004−0.000−0.0010.000
* $R = 10{,}000$ simulation replications. Design parameter values: $N = 200$, $SNR = 3$, $DEN_y = 4.0$, $EVF_x = 0.0$, $\bar{\rho}_{x\varepsilon} = 0.0$, $\xi = 0.8$, $\kappa = 0.00$, $\sigma_\varepsilon = 1$, $q = 1$, $\phi = 1.0$. These yield the DGP parameter values: $\pi_\lambda = 0.00$, $\pi_\eta = 0.00$, $\sigma_v = 0.60$, $\sigma_\eta = 4.0 \cdot (1 - \gamma)$, $\rho_{v\varepsilon} = 0.0$ (and $\bar{\rho}_{x\eta} = 0.00$, $\bar{\rho}_{x\lambda} = 0.00$).
Table 16. P5fc-EC *.
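Feasible Coefficient Estimators for Arellano-Bond (θ = 1), ρ̄_xε = 0.3

| T | L | γ | AB1 Bias | AB1 Stdv | AB1 RMSE | AB2a Bias | AB2a Stdv | AB2a RMSE | AB2c Bias | AB2c Stdv | AB2c RMSE | MAB Bias | MAB Stdv | MAB RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 4 | 0.20 | −0.165 | 0.297 | 0.340 | −0.169 | 0.298 | 0.343 | −0.169 | 0.291 | 0.337 | −0.178 | 0.287 | 0.338 |
|  |  | 0.50 | −0.191 | 0.379 | 0.424 | −0.197 | 0.372 | 0.421 | −0.195 | 0.360 | 0.409 | −0.204 | 0.357 | 0.412 |
|  |  | 0.80 | −0.181 | 0.477 | 0.511 | −0.195 | 0.474 | 0.513 | −0.197 | 0.453 | 0.494 | −0.197 | 0.443 | 0.485 |
| T = 6 | 10 | 0.20 | −0.097 | 0.101 | 0.140 | −0.090 | 0.101 | 0.135 | −0.099 | 0.099 | 0.141 | −0.098 | 0.100 | 0.140 |
|  |  | 0.50 | −0.104 | 0.111 | 0.152 | −0.092 | 0.112 | 0.144 | −0.096 | 0.108 | 0.144 | −0.093 | 0.107 | 0.142 |
|  |  | 0.80 | −0.091 | 0.126 | 0.155 | −0.078 | 0.123 | 0.146 | −0.083 | 0.121 | 0.147 | −0.068 | 0.110 | 0.130 |
| T = 9 | 16 | 0.20 | −0.073 | 0.070 | 0.101 | −0.062 | 0.066 | 0.091 | −0.073 | 0.068 | 0.100 | −0.060 | 0.062 | 0.086 |
|  |  | 0.50 | −0.077 | 0.075 | 0.108 | −0.063 | 0.072 | 0.096 | −0.069 | 0.072 | 0.100 | −0.056 | 0.065 | 0.086 |
|  |  | 0.80 | −0.064 | 0.080 | 0.102 | −0.050 | 0.076 | 0.091 | −0.055 | 0.076 | 0.094 | −0.039 | 0.063 | 0.074 |

| T | L | β | AB1 Bias | AB1 Stdv | AB1 RMSE | AB2a Bias | AB2a Stdv | AB2a RMSE | AB2c Bias | AB2c Stdv | AB2c RMSE | MAB Bias | MAB Stdv | MAB RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 4 | 1.43 | 0.790 | 1.455 | 1.655 | 0.810 | 1.451 | 1.661 | 0.804 | 1.420 | 1.632 | 0.823 | 1.362 | 1.591 |
|  |  | 0.93 | 0.613 | 1.322 | 1.457 | 0.638 | 1.286 | 1.436 | 0.644 | 1.242 | 1.399 | 0.631 | 1.195 | 1.351 |
|  |  | 0.31 | 0.362 | 1.109 | 1.166 | 0.394 | 1.100 | 1.168 | 0.418 | 1.068 | 1.146 | 0.383 | 0.989 | 1.061 |
| T = 6 | 10 | 1.43 | 0.409 | 0.432 | 0.595 | 0.399 | 0.437 | 0.591 | 0.420 | 0.423 | 0.596 | 0.381 | 0.395 | 0.548 |
|  |  | 0.93 | 0.324 | 0.362 | 0.486 | 0.307 | 0.364 | 0.476 | 0.314 | 0.351 | 0.471 | 0.264 | 0.309 | 0.406 |
|  |  | 0.31 | 0.178 | 0.283 | 0.334 | 0.170 | 0.276 | 0.324 | 0.179 | 0.273 | 0.326 | 0.121 | 0.216 | 0.247 |
| T = 9 | 16 | 1.43 | 0.286 | 0.270 | 0.394 | 0.255 | 0.261 | 0.365 | 0.289 | 0.262 | 0.390 | 0.213 | 0.218 | 0.305 |
|  |  | 0.93 | 0.235 | 0.235 | 0.332 | 0.205 | 0.224 | 0.304 | 0.222 | 0.224 | 0.315 | 0.157 | 0.179 | 0.238 |
|  |  | 0.31 | 0.129 | 0.179 | 0.221 | 0.112 | 0.167 | 0.201 | 0.123 | 0.171 | 0.211 | 0.070 | 0.121 | 0.140 |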
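Feasible Coefficient Estimators for Blundell-Bond (θ = 1), ρ̄_xε = 0.3

| T | L | γ | BB1 Bias | BB1 Stdv | BB1 RMSE | BB2a Bias | BB2a Stdv | BB2a RMSE | BB2c Bias | BB2c Stdv | BB2c RMSE | MBB Bias | MBB Stdv | MBB RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 7 | 0.20 | −0.106 | 0.169 | 0.199 | −0.102 | 0.179 | 0.206 | −0.108 | 0.174 | 0.205 | −0.089 | 0.187 | 0.207 |
|  |  | 0.50 | −0.126 | 0.187 | 0.225 | −0.117 | 0.195 | 0.228 | −0.120 | 0.193 | 0.227 | −0.089 | 0.187 | 0.207 |
|  |  | 0.80 | −0.123 | 0.198 | 0.233 | −0.112 | 0.209 | 0.237 | −0.112 | 0.203 | 0.232 | −0.063 | 0.257 | 0.265 |
| T = 6 | 13 | 0.20 | −0.066 | 0.080 | 0.104 | −0.054 | 0.078 | 0.095 | −0.064 | 0.079 | 0.102 | −0.050 | 0.078 | 0.093 |
|  |  | 0.50 | −0.073 | 0.086 | 0.113 | −0.055 | 0.083 | 0.100 | −0.060 | 0.084 | 0.103 | −0.039 | 0.084 | 0.093 |
|  |  | 0.80 | −0.064 | 0.089 | 0.110 | −0.044 | 0.084 | 0.094 | −0.047 | 0.086 | 0.098 | −0.005 | 0.086 | 0.086 |
| T = 9 | 19 | 0.20 | −0.054 | 0.059 | 0.080 | −0.042 | 0.056 | 0.070 | −0.051 | 0.057 | 0.077 | −0.037 | 0.050 | 0.063 |
|  |  | 0.50 | −0.058 | 0.063 | 0.085 | −0.042 | 0.059 | 0.072 | −0.047 | 0.060 | 0.076 | −0.031 | 0.052 | 0.061 |
|  |  | 0.80 | −0.048 | 0.063 | 0.079 | −0.031 | 0.057 | 0.065 | −0.034 | 0.059 | 0.068 | −0.012 | 0.050 | 0.052 |

| T | L | β | BB1 Bias | BB1 Stdv | BB1 RMSE | BB2a Bias | BB2a Stdv | BB2a RMSE | BB2c Bias | BB2c Stdv | BB2c RMSE | MBB Bias | MBB Stdv | MBB RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 7 | 1.43 | 0.495 | 0.778 | 0.922 | 0.489 | 0.837 | 0.970 | 0.500 | 0.793 | 0.937 | 0.499 | 0.751 | 0.901 |
|  |  | 0.93 | 0.419 | 0.643 | 0.768 | 0.403 | 0.675 | 0.786 | 0.413 | 0.650 | 0.770 | 0.465 | 0.594 | 0.754 |
|  |  | 0.31 | 0.276 | 0.504 | 0.574 | 0.265 | 0.511 | 0.576 | 0.279 | 0.495 | 0.568 | 0.406 | 0.442 | 0.600 |
| T = 6 | 13 | 1.43 | 0.284 | 0.321 | 0.429 | 0.247 | 0.318 | 0.403 | 0.276 | 0.315 | 0.419 | 0.219 | 0.297 | 0.369 |
|  |  | 0.93 | 0.244 | 0.278 | 0.370 | 0.202 | 0.270 | 0.337 | 0.216 | 0.268 | 0.344 | 0.162 | 0.238 | 0.288 |
|  |  | 0.31 | 0.151 | 0.216 | 0.264 | 0.125 | 0.203 | 0.238 | 0.135 | 0.203 | 0.243 | 0.116 | 0.159 | 0.196 |
| T = 9 | 19 | 1.43 | 0.220 | 0.223 | 0.313 | 0.180 | 0.211 | 0.278 | 0.210 | 0.215 | 0.300 | 0.143 | 0.180 | 0.230 |
|  |  | 0.93 | 0.190 | 0.197 | 0.273 | 0.151 | 0.183 | 0.237 | 0.167 | 0.185 | 0.249 | 0.108 | 0.151 | 0.186 |
|  |  | 0.31 | 0.115 | 0.150 | 0.189 | 0.092 | 0.135 | 0.163 | 0.103 | 0.138 | 0.172 | 0.063 | 0.105 | 0.122 |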
* R = 10,000 simulation replications. Design parameter values: N = 200, SNR = 3, DEN_y = 1.0, EVF_x = 0.0, ρ̄_xε = 0.3, ξ = 0.8, κ = 0.00, σ_ε = 1, q = 1, ϕ = 1.0. These yield the DGP parameter values: π_λ = 0.00, π_η = 0.00, σ_v = 0.60, σ_η = 1.0 × (1 − γ), ρ_vε = 0.5 (and ρ̄_xη = 0.00, ρ̄_xλ = 0.00).
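The Bias, Stdv and RMSE columns of Table 16 appear to be linked by the usual identity RMSE² = Bias² + Stdv². A minimal Python sketch, assuming that identity (the helper name `rmse` is illustrative and not from the article), checks it against three rows copied from the table. Small gaps are expected because the printed figures are rounded to three decimals.

```python
import math

def rmse(bias: float, stdv: float) -> float:
    """Root mean squared error implied by a bias and a standard deviation."""
    return math.sqrt(bias ** 2 + stdv ** 2)

# (Bias, Stdv, reported RMSE) copied from Table 16:
# AB1 for gamma (T = 3, gamma = 0.20), AB1 for beta (T = 3, beta = 1.43),
# BB1 for gamma (T = 3, gamma = 0.20)
rows = [
    (-0.165, 0.297, 0.340),
    (0.790, 1.455, 1.655),
    (-0.106, 0.169, 0.199),
]

# Largest absolute difference between the implied and the reported RMSE
max_gap = max(abs(rmse(b, s) - r) for b, s, r in rows)
print(f"largest gap: {max_gap:.4f}")
```

The same check can be run on any other row to spot transcription errors in the table.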
Table 17. P5ft-EC *.
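Feasible t-Test: Actual Significance Level (θ = 1), ρ̄_xε = 0.3 (left block: Arellano-Bond; right block: Blundell-Bond)

| T | L | γ | AB1 | AB1aR | AB1cR | AB2a | AB2aW | AB2c | AB2cW | L | γ | BB1 | BB1aR | BB1cR | BB2a | BB2aW | BB2c | BB2cW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 4 | 0.20 | 0.252 | 0.155 | 0.118 | 0.175 | 0.164 | 0.134 | 0.134 | 7 | 0.20 | 0.263 | 0.149 | 0.113 | 0.198 | 0.147 | 0.146 | 0.129 |
|  |  | 0.50 | 0.283 | 0.173 | 0.140 | 0.195 | 0.175 | 0.161 | 0.154 |  | 0.50 | 0.301 | 0.166 | 0.142 | 0.210 | 0.150 | 0.165 | 0.134 |
|  |  | 0.80 | 0.274 | 0.170 | 0.146 | 0.191 | 0.166 | 0.168 | 0.158 |  | 0.80 | 0.297 | 0.145 | 0.135 | 0.188 | 0.127 | 0.147 | 0.111 |
| T = 6 | 10 | 0.20 | 0.420 | 0.250 | 0.196 | 0.316 | 0.203 | 0.225 | 0.211 | 13 | 0.20 | 0.359 | 0.195 | 0.140 | 0.247 | 0.141 | 0.159 | 0.143 |
|  |  | 0.50 | 0.404 | 0.231 | 0.204 | 0.289 | 0.177 | 0.203 | 0.187 |  | 0.50 | 0.367 | 0.193 | 0.163 | 0.236 | 0.129 | 0.149 | 0.130 |
|  |  | 0.80 | 0.340 | 0.173 | 0.157 | 0.220 | 0.127 | 0.154 | 0.147 |  | 0.80 | 0.332 | 0.153 | 0.130 | 0.185 | 0.098 | 0.109 | 0.098 |
| T = 9 | 16 | 0.20 | 0.453 | 0.258 | 0.218 | 0.359 | 0.201 | 0.237 | 0.224 | 19 | 0.20 | 0.399 | 0.215 | 0.165 | 0.294 | 0.149 | 0.169 | 0.157 |
|  |  | 0.50 | 0.440 | 0.245 | 0.215 | 0.327 | 0.177 | 0.206 | 0.194 |  | 0.50 | 0.405 | 0.209 | 0.176 | 0.279 | 0.135 | 0.152 | 0.140 |
|  |  | 0.80 | 0.353 | 0.175 | 0.161 | 0.247 | 0.118 | 0.145 | 0.140 |  | 0.80 | 0.339 | 0.155 | 0.138 | 0.216 | 0.098 | 0.102 | 0.096 |

| T | L | β | AB1 | AB1aR | AB1cR | AB2a | AB2aW | AB2c | AB2cW | L | β | BB1 | BB1aR | BB1cR | BB2a | BB2aW | BB2c | BB2cW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 4 | 1.43 | 0.253 | 0.154 | 0.114 | 0.175 | 0.167 | 0.132 | 0.128 | 7 | 1.43 | 0.265 | 0.151 | 0.130 | 0.205 | 0.152 | 0.162 | 0.136 |
|  |  | 0.93 | 0.274 | 0.173 | 0.135 | 0.196 | 0.183 | 0.155 | 0.147 |  | 0.93 | 0.304 | 0.180 | 0.164 | 0.223 | 0.160 | 0.185 | 0.155 |
|  |  | 0.31 | 0.259 | 0.154 | 0.130 | 0.176 | 0.159 | 0.148 | 0.139 |  | 0.31 | 0.288 | 0.158 | 0.151 | 0.195 | 0.140 | 0.163 | 0.139 |
| T = 6 | 10 | 1.43 | 0.429 | 0.255 | 0.217 | 0.336 | 0.219 | 0.243 | 0.227 | 13 | 1.43 | 0.381 | 0.215 | 0.179 | 0.280 | 0.165 | 0.194 | 0.168 |
|  |  | 0.93 | 0.413 | 0.234 | 0.207 | 0.305 | 0.195 | 0.218 | 0.202 |  | 0.93 | 0.387 | 0.213 | 0.192 | 0.266 | 0.157 | 0.185 | 0.168 |
|  |  | 0.31 | 0.319 | 0.154 | 0.136 | 0.213 | 0.138 | 0.149 | 0.140 |  | 0.31 | 0.330 | 0.165 | 0.146 | 0.226 | 0.135 | 0.146 | 0.139 |
| T = 9 | 16 | 1.43 | 0.463 | 0.270 | 0.238 | 0.375 | 0.215 | 0.262 | 0.248 | 19 | 1.43 | 0.423 | 0.239 | 0.203 | 0.323 | 0.173 | 0.206 | 0.192 |
|  |  | 0.93 | 0.443 | 0.244 | 0.226 | 0.350 | 0.193 | 0.226 | 0.214 |  | 0.93 | 0.425 | 0.228 | 0.211 | 0.318 | 0.166 | 0.196 | 0.186 |
|  |  | 0.31 | 0.348 | 0.161 | 0.147 | 0.249 | 0.129 | 0.149 | 0.145 |  | 0.31 | 0.355 | 0.167 | 0.153 | 0.262 | 0.135 | 0.150 | 0.145 |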
* R = 10,000 simulation replications. Design parameter values: N = 200, SNR = 3, DEN_y = 1.0, EVF_x = 0.0, ρ̄_xε = 0.3, ξ = 0.8, κ = 0.00, σ_ε = 1, q = 1. These yield the DGP parameter values: π_λ = 0.00, π_η = 0.00, σ_v = 0.60, σ_η = 1.0 × (1 − γ), ρ_vε = 0.5 (and ρ̄_xη = 0.00, ρ̄_xλ = 0.00).
Table 18. P5fJ-EC *.
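Feasible Sargan-Hansen Test: Rejection Probability (θ = 1), ρ̄_xε = 0.3

| T | df AB | df BB | df Inc | γ | JAB_a^(2,1) | JBB_a^(2,1) | JES_a^(2,1) | JAB_c^(2,1) | JBB_c^(2,1) | JES_c^(2,1) |
|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 2 | 4 | 2 | 0.20 | 0.036 | 0.042 | 0.068 | 0.032 | 0.045 | 0.072 |
|  |  |  |  | 0.50 | 0.044 | 0.050 | 0.074 | 0.038 | 0.055 | 0.077 |
|  |  |  |  | 0.80 | 0.067 | 0.057 | 0.069 | 0.061 | 0.064 | 0.074 |
| T = 6 | 8 | 10 | 2 | 0.20 | 0.065 | 0.064 | 0.072 | 0.056 | 0.057 | 0.074 |
|  |  |  |  | 0.50 | 0.071 | 0.066 | 0.067 | 0.066 | 0.063 | 0.070 |
|  |  |  |  | 0.80 | 0.060 | 0.058 | 0.064 | 0.063 | 0.062 | 0.065 |
| T = 9 | 14 | 16 | 2 | 0.20 | 0.063 | 0.058 | 0.063 | 0.053 | 0.058 | 0.072 |
|  |  |  |  | 0.50 | 0.063 | 0.056 | 0.066 | 0.060 | 0.062 | 0.072 |
|  |  |  |  | 0.80 | 0.053 | 0.048 | 0.061 | 0.053 | 0.056 | 0.066 |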
* R = 10,000 simulation replications. Design parameter values: N = 200, SNR = 3, DEN_y = 1.0, EVF_x = 0.0, ρ̄_xε = 0.3, ξ = 0.8, κ = 0.00, σ_ε = 1, q = 1, ϕ = 1.0. These yield the DGP parameter values: π_λ = 0.00, π_η = 0.00, σ_v = 0.60, σ_η = 1.0 × (1 − γ), ρ_vε = 0.5 (and ρ̄_xη = 0.00, ρ̄_xλ = 0.00).
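When reading the rejection probabilities in these tables it helps to know how much of a deviation can be attributed to simulation noise alone. The following Python sketch computes the binomial Monte Carlo standard error for R = 10,000 replications. The nominal level of 5% is an assumption made here for illustration (it is not stated in this part of the text), and the function name `mc_standard_error` is hypothetical.

```python
import math

def mc_standard_error(p: float, replications: int) -> float:
    """Binomial standard error of a rejection frequency estimated from simulation."""
    return math.sqrt(p * (1.0 - p) / replications)

R = 10_000       # number of replications, as stated in the table footnotes
nominal = 0.05   # ASSUMPTION: 5% nominal significance level

se = mc_standard_error(nominal, R)

# Approximate 95% band around the nominal level for a correctly sized test
band = (nominal - 1.96 * se, nominal + 1.96 * se)
print(f"se = {se:.4f}, band = ({band[0]:.4f}, {band[1]:.4f})")
```

Under these assumptions, estimated rejection frequencies well outside the band point to genuine size distortion rather than simulation error.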
Table 19. P5fJ-jA *, j = E,W,X.
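P5fJ-EA *. Feasible Sargan-Hansen Test: Rejection Probability (θ = 1), ρ̄_xε = 0.3

| T | df AB | df BB | df Inc | γ | JAB_a^(2,1) | JBB_a^(2,1) | JES_a^(2,1) | JAB_c^(2,1) | JBB_c^(2,1) | JES_c^(2,1) | JAB_a^(1,1) | JBB_a^(1,1) | JES_a^(1,1) | JAB_a^(2,2) | JBB_a^(2,2) | JES_a^(2,2) | JAB_c^(2,2) | JBB_c^(2,2) | JES_c^(2,2) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 4 | 7 | 3 | 0.20 | 0.052 | 0.047 | 0.059 | 0.039 | 0.040 | 0.053 | 0.074 | 0.074 | 0.079 | 0.052 | 0.048 | 0.059 | 0.043 | 0.040 | 0.053 |
|  |  |  |  | 0.50 | 0.065 | 0.057 | 0.064 | 0.043 | 0.048 | 0.063 | 0.088 | 0.094 | 0.091 | 0.066 | 0.057 | 0.066 | 0.047 | 0.045 | 0.057 |
|  |  |  |  | 0.80 | 0.083 | 0.065 | 0.068 | 0.058 | 0.055 | 0.069 | 0.111 | 0.108 | 0.107 | 0.088 | 0.066 | 0.070 | 0.061 | 0.047 | 0.062 |
| T = 6 | 28 | 37 | 9 | 0.20 | 0.065 | 0.056 | 0.062 | 0.042 | 0.048 | 0.057 | 0.105 | 0.107 | 0.091 | 0.072 | 0.064 | 0.067 | 0.045 | 0.044 | 0.049 |
|  |  |  |  | 0.50 | 0.074 | 0.062 | 0.062 | 0.055 | 0.059 | 0.063 | 0.116 | 0.123 | 0.099 | 0.077 | 0.069 | 0.070 | 0.053 | 0.044 | 0.047 |
|  |  |  |  | 0.80 | 0.081 | 0.059 | 0.057 | 0.071 | 0.065 | 0.062 | 0.128 | 0.128 | 0.103 | 0.083 | 0.068 | 0.067 | 0.065 | 0.042 | 0.041 |
| T = 9 | 70 | 85 | 15 | 0.20 | 0.030 | 0.018 | 0.049 | 0.046 | 0.054 | 0.060 | 0.055 | 0.050 | 0.074 | 0.040 | 0.033 | 0.065 | 0.048 | 0.049 | 0.046 |
|  |  |  |  | 0.50 | 0.032 | 0.021 | 0.048 | 0.060 | 0.070 | 0.073 | 0.055 | 0.055 | 0.082 | 0.042 | 0.036 | 0.067 | 0.053 | 0.050 | 0.048 |
|  |  |  |  | 0.80 | 0.032 | 0.016 | 0.043 | 0.070 | 0.078 | 0.067 | 0.059 | 0.054 | 0.086 | 0.042 | 0.033 | 0.065 | 0.058 | 0.044 | 0.041 |

P5fJ-XA *. Feasible Sargan-Hansen Test: Rejection Probability (θ = 1), ρ̄_xε = 0.3

| T | df AB | df BB | df Inc | γ | JAB_a^(2,1) | JBB_a^(2,1) | JES_a^(2,1) | JAB_c^(2,1) | JBB_c^(2,1) | JES_c^(2,1) |
|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 9 | 13 | 4 | 0.20 | 0.103 | 0.128 | 0.103 | 0.088 | 0.133 | 0.124 |
|  |  |  |  | 0.50 | 0.108 | 0.136 | 0.103 | 0.098 | 0.149 | 0.131 |
|  |  |  |  | 0.80 | 0.175 | 0.220 | 0.141 | 0.187 | 0.291 | 0.219 |
| T = 6 | 48 | 58 | 10 | 0.20 | 0.148 | 0.215 | 0.169 | 0.183 | 0.250 | 0.164 |
|  |  |  |  | 0.50 | 0.145 | 0.248 | 0.218 | 0.223 | 0.336 | 0.243 |
|  |  |  |  | 0.80 | 0.324 | 0.488 | 0.251 | 0.576 | 0.771 | 0.507 |
| T = 9 | 114 | 130 | 16 | 0.20 | 0.013 | 0.004 | 0.049 | 0.290 | 0.368 | 0.189 |
|  |  |  |  | 0.50 | 0.011 | 0.006 | 0.069 | 0.339 | 0.484 | 0.319 |
|  |  |  |  | 0.80 | 0.033 | 0.016 | 0.052 | 0.815 | 0.940 | 0.619 |

P5fJ-WA *. Feasible Sargan-Hansen Test: Rejection Probability (θ = 1), ρ̄_xε = 0.3

| T | df AB | df BB | df Inc | γ | JAB_a^(2,1) | JBB_a^(2,1) | JES_a^(2,1) | JAB_c^(2,1) | JBB_c^(2,1) | JES_c^(2,1) |
|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 6 | 10 | 4 | 0.20 | 0.110 | 0.130 | 0.099 | 0.077 | 0.111 | 0.112 |
|  |  |  |  | 0.50 | 0.120 | 0.155 | 0.115 | 0.095 | 0.142 | 0.133 |
|  |  |  |  | 0.80 | 0.182 | 0.252 | 0.178 | 0.171 | 0.278 | 0.235 |
| T = 6 | 33 | 43 | 10 | 0.20 | 0.217 | 0.316 | 0.189 | 0.184 | 0.270 | 0.199 |
|  |  |  |  | 0.50 | 0.233 | 0.382 | 0.237 | 0.244 | 0.395 | 0.293 |
|  |  |  |  | 0.80 | 0.465 | 0.670 | 0.319 | 0.580 | 0.824 | 0.580 |
| T = 9 | 78 | 94 | 16 | 0.20 | 0.141 | 0.138 | 0.093 | 0.303 | 0.416 | 0.255 |
|  |  |  |  | 0.50 | 0.138 | 0.171 | 0.125 | 0.382 | 0.589 | 0.411 |
|  |  |  |  | 0.80 | 0.349 | 0.401 | 0.122 | 0.835 | 0.968 | 0.731 |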
* R = 10,000 simulation replications. Design parameter values: N = 200, SNR = 3, DEN_y = 1.0, EVF_x = 0.0, ρ̄_xε = 0.3, ξ = 0.8, κ = 0.00, σ_ε = 1, q = 1, ϕ = 1.0. These yield the DGP parameter values: π_λ = 0.00, π_η = 0.00, σ_v = 0.60, σ_η = 1.0 × (1 − γ), ρ_vε = 0.5 (and ρ̄_xη = 0.00, ρ̄_xλ = 0.00).
Table 20. Pϕ0fc-XA *.
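Feasible Coefficient Estimators for Arellano-Bond (θ = 1), ρ̄_xε = 0.0

| T | L | γ | AB1 Bias | AB1 Stdv | AB1 RMSE | AB2a Bias | AB2a Stdv | AB2a RMSE | AB2c Bias | AB2c Stdv | AB2c RMSE | MAB Bias | MAB Stdv | MAB RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 11 | 0.20 | −0.024 | 0.089 | 0.092 | −0.020 | 0.084 | 0.086 | −0.023 | 0.084 | 0.087 | −0.023 | 0.086 | 0.089 |
|  |  | 0.50 | −0.044 | 0.116 | 0.124 | −0.038 | 0.110 | 0.117 | −0.043 | 0.110 | 0.118 | −0.043 | 0.113 | 0.121 |
|  |  | 0.80 | −0.146 | 0.199 | 0.246 | −0.136 | 0.193 | 0.236 | −0.144 | 0.189 | 0.237 | −0.143 | 0.194 | 0.241 |
| T = 6 | 50 | 0.20 | −0.020 | 0.046 | 0.050 | −0.016 | 0.041 | 0.044 | −0.017 | 0.041 | 0.045 | −0.015 | 0.037 | 0.040 |
|  |  | 0.50 | −0.035 | 0.053 | 0.064 | −0.030 | 0.049 | 0.057 | −0.031 | 0.048 | 0.057 | −0.027 | 0.044 | 0.052 |
|  |  | 0.80 | −0.105 | 0.080 | 0.132 | −0.093 | 0.076 | 0.120 | −0.098 | 0.072 | 0.122 | −0.089 | 0.070 | 0.113 |
| T = 9 | 116 | 0.20 | −0.017 | 0.034 | 0.038 | −0.015 | 0.032 | 0.035 | −0.015 | 0.029 | 0.033 | −0.011 | 0.025 | 0.027 |
|  |  | 0.50 | −0.029 | 0.037 | 0.047 | −0.026 | 0.035 | 0.044 | −0.025 | 0.032 | 0.041 | −0.020 | 0.028 | 0.035 |
|  |  | 0.80 | −0.079 | 0.050 | 0.094 | −0.075 | 0.049 | 0.089 | −0.073 | 0.044 | 0.085 | −0.064 | 0.042 | 0.076 |

| T | L | β | AB1 Bias | AB1 Stdv | AB1 RMSE | AB2a Bias | AB2a Stdv | AB2a RMSE | AB2c Bias | AB2c Stdv | AB2c RMSE | MAB Bias | MAB Stdv | MAB RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 11 | 1.43 | 0.006 | 0.158 | 0.158 | 0.005 | 0.145 | 0.145 | 0.005 | 0.147 | 0.147 | 0.005 | 0.148 | 0.148 |
|  |  | 0.93 | 0.004 | 0.157 | 0.157 | 0.003 | 0.144 | 0.144 | 0.003 | 0.146 | 0.146 | 0.004 | 0.147 | 0.147 |
|  |  | 0.31 | −0.004 | 0.152 | 0.152 | −0.006 | 0.140 | 0.140 | −0.005 | 0.142 | 0.143 | −0.004 | 0.142 | 0.142 |
| T = 6 | 50 | 1.43 | 0.013 | 0.087 | 0.088 | 0.010 | 0.077 | 0.077 | 0.011 | 0.078 | 0.079 | 0.009 | 0.067 | 0.068 |
|  |  | 0.93 | 0.014 | 0.085 | 0.086 | 0.011 | 0.074 | 0.075 | 0.012 | 0.076 | 0.077 | 0.011 | 0.065 | 0.066 |
|  |  | 0.31 | 0.007 | 0.082 | 0.082 | 0.005 | 0.072 | 0.072 | 0.006 | 0.073 | 0.074 | 0.006 | 0.063 | 0.063 |
| T = 9 | 116 | 1.43 | 0.014 | 0.065 | 0.067 | 0.013 | 0.060 | 0.062 | 0.012 | 0.057 | 0.058 | 0.010 | 0.046 | 0.047 |
|  |  | 0.93 | 0.017 | 0.062 | 0.065 | 0.016 | 0.058 | 0.060 | 0.015 | 0.054 | 0.056 | 0.012 | 0.044 | 0.046 |
|  |  | 0.31 | 0.012 | 0.059 | 0.060 | 0.011 | 0.055 | 0.056 | 0.011 | 0.051 | 0.052 | 0.009 | 0.042 | 0.043 |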
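Feasible Coefficient Estimators for Blundell-Bond (θ = 1), ρ̄_xε = 0.0

| T | L | γ | BB1 Bias | BB1 Stdv | BB1 RMSE | BB2a Bias | BB2a Stdv | BB2a RMSE | BB2c Bias | BB2c Stdv | BB2c RMSE | MBB Bias | MBB Stdv | MBB RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 16 | 0.20 | 0.036 | 0.073 | 0.082 | 0.056 | 0.075 | 0.094 | 0.061 | 0.074 | 0.096 | 0.060 | 0.078 | 0.099 |
|  |  | 0.50 | 0.011 | 0.084 | 0.084 | 0.036 | 0.079 | 0.087 | 0.039 | 0.078 | 0.087 | 0.062 | 0.101 | 0.119 |
|  |  | 0.80 | −0.040 | 0.108 | 0.116 | −0.010 | 0.103 | 0.104 | 0.005 | 0.101 | 0.101 | 0.067 | 0.162 | 0.176 |
| T = 6 | 61 | 0.20 | 0.009 | 0.041 | 0.042 | 0.016 | 0.038 | 0.042 | 0.036 | 0.040 | 0.054 | 0.028 | 0.037 | 0.047 |
|  |  | 0.50 | −0.006 | 0.045 | 0.045 | 0.009 | 0.042 | 0.043 | 0.032 | 0.041 | 0.052 | 0.041 | 0.042 | 0.059 |
|  |  | 0.80 | −0.049 | 0.055 | 0.074 | −0.028 | 0.051 | 0.058 | 0.007 | 0.045 | 0.045 | 0.040 | 0.055 | 0.068 |
| T = 9 | 133 | 0.20 | 0.002 | 0.031 | 0.031 | 0.003 | 0.030 | 0.030 | 0.023 | 0.029 | 0.037 | 0.014 | 0.025 | 0.029 |
|  |  | 0.50 | −0.011 | 0.033 | 0.034 | −0.007 | 0.031 | 0.032 | 0.025 | 0.029 | 0.038 | 0.023 | 0.028 | 0.036 |
|  |  | 0.80 | −0.048 | 0.039 | 0.062 | −0.042 | 0.038 | 0.056 | 0.007 | 0.030 | 0.031 | 0.021 | 0.034 | 0.040 |

| T | L | β | BB1 Bias | BB1 Stdv | BB1 RMSE | BB2a Bias | BB2a Stdv | BB2a RMSE | BB2c Bias | BB2c Stdv | BB2c RMSE | MBB Bias | MBB Stdv | MBB RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 16 | 1.43 | 0.038 | 0.153 | 0.158 | 0.064 | 0.149 | 0.162 | 0.058 | 0.146 | 0.158 | 0.043 | 0.144 | 0.150 |
|  |  | 0.93 | 0.029 | 0.149 | 0.152 | 0.047 | 0.139 | 0.147 | 0.048 | 0.138 | 0.146 | 0.049 | 0.141 | 0.150 |
|  |  | 0.31 | 0.013 | 0.147 | 0.147 | 0.016 | 0.136 | 0.137 | 0.019 | 0.136 | 0.138 | 0.048 | 0.156 | 0.164 |
| T = 6 | 61 | 1.43 | 0.008 | 0.086 | 0.086 | 0.010 | 0.078 | 0.079 | 0.004 | 0.078 | 0.078 | −0.001 | 0.067 | 0.067 |
|  |  | 0.93 | 0.014 | 0.083 | 0.084 | 0.015 | 0.074 | 0.076 | 0.011 | 0.074 | 0.074 | 0.003 | 0.064 | 0.065 |
|  |  | 0.31 | 0.010 | 0.079 | 0.080 | 0.009 | 0.070 | 0.071 | 0.009 | 0.068 | 0.069 | 0.010 | 0.061 | 0.062 |
| T = 9 | 133 | 1.43 | 0.005 | 0.064 | 0.064 | 0.005 | 0.061 | 0.061 | −0.006 | 0.056 | 0.057 | −0.004 | 0.047 | 0.047 |
|  |  | 0.93 | 0.012 | 0.061 | 0.062 | 0.011 | 0.058 | 0.059 | −0.000 | 0.053 | 0.053 | −0.003 | 0.045 | 0.045 |
|  |  | 0.31 | 0.011 | 0.057 | 0.059 | 0.011 | 0.054 | 0.055 | 0.005 | 0.048 | 0.048 | 0.003 | 0.041 | 0.041 |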
* R = 10,000 simulation replications. Design parameter values: N = 200, SNR = 3, DEN_y = 1.0, EVF_x = 0.0, ρ̄_xε = 0.0, ξ = 0.8, κ = 0.00, σ_ε = 1, q = 1, ϕ = 0.5. These yield the DGP parameter values: π_λ = 0.00, π_η = 0.00, σ_v = 0.60, σ_η = 1.0 × (1 − γ), ρ_vε = 0.0 (and ρ̄_xη = 0.00, ρ̄_xλ = 0.00).
Table 21. Pϕ0ft-XA *.
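Feasible t-Test: Actual Significance Level (θ = 1), ρ̄_xε = 0.0 (left block: Arellano-Bond; right block: Blundell-Bond)

| T | L | γ | AB1 | AB1aR | AB1cR | AB2a | AB2aW | AB2c | AB2cW | L | γ | BB1 | BB1aR | BB1cR | BB2a | BB2aW | BB2c | BB2cW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 11 | 0.20 | 0.225 | 0.088 | 0.073 | 0.140 | 0.075 | 0.086 | 0.069 | 16 | 0.20 | 0.257 | 0.118 | 0.087 | 0.322 | 0.165 | 0.213 | 0.187 |
|  |  | 0.50 | 0.250 | 0.103 | 0.088 | 0.159 | 0.087 | 0.106 | 0.086 |  | 0.50 | 0.219 | 0.083 | 0.061 | 0.245 | 0.120 | 0.135 | 0.122 |
|  |  | 0.80 | 0.353 | 0.191 | 0.170 | 0.254 | 0.147 | 0.196 | 0.171 |  | 0.80 | 0.240 | 0.090 | 0.060 | 0.196 | 0.071 | 0.084 | 0.069 |
| T = 6 | 50 | 0.20 | 0.250 | 0.092 | 0.067 | 0.361 | 0.079 | 0.077 | 0.065 | 61 | 0.20 | 0.219 | 0.080 | 0.049 | 0.445 | 0.090 | 0.205 | 0.156 |
|  |  | 0.50 | 0.307 | 0.125 | 0.101 | 0.411 | 0.099 | 0.114 | 0.096 |  | 0.50 | 0.212 | 0.068 | 0.045 | 0.435 | 0.074 | 0.184 | 0.151 |
|  |  | 0.80 | 0.549 | 0.322 | 0.282 | 0.624 | 0.236 | 0.312 | 0.277 |  | 0.80 | 0.382 | 0.176 | 0.127 | 0.499 | 0.088 | 0.085 | 0.066 |
| T = 9 | 116 | 0.20 | 0.274 | 0.101 | 0.074 | 0.693 | 0.093 | 0.088 | 0.077 | 133 | 0.20 | 0.210 | 0.067 | 0.044 | 0.730 | 0.064 | 0.164 | 0.124 |
|  |  | 0.50 | 0.349 | 0.148 | 0.117 | 0.734 | 0.138 | 0.134 | 0.116 |  | 0.50 | 0.233 | 0.075 | 0.054 | 0.729 | 0.067 | 0.182 | 0.140 |
|  |  | 0.80 | 0.655 | 0.404 | 0.363 | 0.893 | 0.376 | 0.412 | 0.379 |  | 0.80 | 0.511 | 0.267 | 0.209 | 0.866 | 0.217 | 0.085 | 0.066 |

| T | L | β | AB1 | AB1aR | AB1cR | AB2a | AB2aW | AB2c | AB2cW | L | β | BB1 | BB1aR | BB1cR | BB2a | BB2aW | BB2c | BB2cW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 11 | 1.43 | 0.215 | 0.070 | 0.064 | 0.117 | 0.063 | 0.071 | 0.056 | 16 | 1.43 | 0.230 | 0.083 | 0.073 | 0.214 | 0.106 | 0.106 | 0.096 |
|  |  | 0.93 | 0.216 | 0.072 | 0.064 | 0.118 | 0.063 | 0.071 | 0.054 |  | 0.93 | 0.226 | 0.077 | 0.067 | 0.190 | 0.089 | 0.097 | 0.085 |
|  |  | 0.31 | 0.215 | 0.069 | 0.063 | 0.117 | 0.061 | 0.072 | 0.056 |  | 0.31 | 0.220 | 0.073 | 0.062 | 0.163 | 0.069 | 0.083 | 0.068 |
| T = 6 | 50 | 1.43 | 0.221 | 0.068 | 0.059 | 0.315 | 0.060 | 0.068 | 0.058 | 61 | 1.43 | 0.219 | 0.070 | 0.054 | 0.387 | 0.065 | 0.070 | 0.059 |
|  |  | 0.93 | 0.227 | 0.070 | 0.061 | 0.314 | 0.064 | 0.069 | 0.059 |  | 0.93 | 0.227 | 0.070 | 0.058 | 0.395 | 0.070 | 0.072 | 0.060 |
|  |  | 0.31 | 0.232 | 0.066 | 0.064 | 0.315 | 0.057 | 0.069 | 0.058 |  | 0.31 | 0.230 | 0.069 | 0.060 | 0.395 | 0.066 | 0.069 | 0.058 |
| T = 9 | 116 | 1.43 | 0.232 | 0.067 | 0.056 | 0.654 | 0.068 | 0.065 | 0.056 | 133 | 1.43 | 0.221 | 0.065 | 0.052 | 0.719 | 0.063 | 0.066 | 0.054 |
|  |  | 0.93 | 0.241 | 0.073 | 0.065 | 0.657 | 0.071 | 0.070 | 0.060 |  | 0.93 | 0.229 | 0.069 | 0.059 | 0.721 | 0.067 | 0.066 | 0.055 |
|  |  | 0.31 | 0.242 | 0.068 | 0.069 | 0.657 | 0.066 | 0.072 | 0.061 |  | 0.31 | 0.236 | 0.070 | 0.065 | 0.734 | 0.067 | 0.067 | 0.057 |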
* R = 10,000 simulation replications. Design parameter values: N = 200, SNR = 3, DEN_y = 1.0, EVF_x = 0.0, ρ̄_xε = 0.0, ξ = 0.8, κ = 0.00, σ_ε = 1, q = 1, ϕ = 0.5. These yield the DGP parameter values: π_λ = 0.00, π_η = 0.00, σ_v = 0.60, σ_η = 1.0 × (1 − γ), ρ_vε = 0.0 (and ρ̄_xη = 0.00, ρ̄_xλ = 0.00).
Table 22. Pϕ0fJ-XA *.
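Feasible Sargan-Hansen Test: Rejection Probability (θ = 1), ρ̄_xε = 0.0

| T | df AB | df BB | df Inc | γ | JAB_a^(2,1) | JBB_a^(2,1) | JES_a^(2,1) | JAB_c^(2,1) | JBB_c^(2,1) | JES_c^(2,1) |
|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 9 | 13 | 4 | 0.20 | 0.036 | 0.289 | 0.505 | 0.037 | 0.240 | 0.412 |
|  |  |  |  | 0.50 | 0.039 | 0.101 | 0.169 | 0.041 | 0.101 | 0.146 |
|  |  |  |  | 0.80 | 0.056 | 0.048 | 0.060 | 0.060 | 0.057 | 0.063 |
| T = 6 | 48 | 58 | 10 | 0.20 | 0.017 | 0.224 | 0.660 | 0.020 | 0.192 | 0.596 |
|  |  |  |  | 0.50 | 0.019 | 0.082 | 0.327 | 0.022 | 0.071 | 0.201 |
|  |  |  |  | 0.80 | 0.023 | 0.024 | 0.074 | 0.032 | 0.035 | 0.045 |
| T = 9 | 114 | 130 | 16 | 0.20 | 0.001 | 0.004 | 0.350 | 0.016 | 0.134 | 0.642 |
|  |  |  |  | 0.50 | 0.001 | 0.002 | 0.219 | 0.016 | 0.056 | 0.264 |
|  |  |  |  | 0.80 | 0.000 | 0.001 | 0.068 | 0.024 | 0.025 | 0.039 |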
* R = 10,000 simulation replications. Design parameter values: N = 200, SNR = 3, DEN_y = 1.0, EVF_x = 0.0, ρ̄_xε = 0.0, ξ = 0.8, κ = 0.00, σ_ε = 1, q = 1, ϕ = 0.5. These yield the DGP parameter values: π_λ = 0.00, π_η = 0.00, σ_v = 0.60, σ_η = 1.0 × (1 − γ), ρ_vε = 0.0 (and ρ̄_xη = 0.00, ρ̄_xλ = 0.00).
Table 23. Pϕ5fc-EA *.
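Feasible Coefficient Estimators for Arellano-Bond (θ = 1), ρ̄_xε = 0.3

| T | L | γ | AB1 Bias | AB1 Stdv | AB1 RMSE | AB2a Bias | AB2a Stdv | AB2a RMSE | AB2c Bias | AB2c Stdv | AB2c RMSE | MAB Bias | MAB Stdv | MAB RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 6 | 0.20 | −0.169 | 0.193 | 0.256 | −0.167 | 0.191 | 0.254 | −0.182 | 0.184 | 0.259 | −0.177 | 0.191 | 0.260 |
|  |  | 0.50 | −0.241 | 0.251 | 0.349 | −0.243 | 0.251 | 0.350 | −0.253 | 0.247 | 0.354 | −0.250 | 0.247 | 0.352 |
|  |  | 0.80 | −0.292 | 0.335 | 0.444 | −0.311 | 0.332 | 0.455 | −0.312 | 0.327 | 0.452 | −0.298 | 0.328 | 0.443 |
| T = 6 | 30 | 0.20 | −0.144 | 0.070 | 0.160 | −0.135 | 0.069 | 0.151 | −0.149 | 0.065 | 0.163 | −0.145 | 0.063 | 0.158 |
|  |  | 0.50 | −0.187 | 0.087 | 0.206 | −0.178 | 0.087 | 0.198 | −0.183 | 0.082 | 0.200 | −0.197 | 0.083 | 0.214 |
|  |  | 0.80 | −0.203 | 0.109 | 0.230 | −0.191 | 0.110 | 0.221 | −0.196 | 0.102 | 0.221 | −0.192 | 0.105 | 0.219 |
| T = 9 | 72 | 0.20 | −0.142 | 0.048 | 0.150 | −0.134 | 0.047 | 0.142 | −0.144 | 0.043 | 0.150 | −0.125 | 0.039 | 0.131 |
|  |  | 0.50 | −0.169 | 0.057 | 0.179 | −0.163 | 0.056 | 0.172 | −0.163 | 0.052 | 0.171 | −0.159 | 0.049 | 0.167 |
|  |  | 0.80 | −0.165 | 0.068 | 0.179 | −0.157 | 0.067 | 0.171 | −0.157 | 0.062 | 0.169 | −0.150 | 0.062 | 0.162 |

| T | L | β | AB1 Bias | AB1 Stdv | AB1 RMSE | AB2a Bias | AB2a Stdv | AB2a RMSE | AB2c Bias | AB2c Stdv | AB2c RMSE | MAB Bias | MAB Stdv | MAB RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 6 | 1.43 | 0.677 | 0.816 | 1.060 | 0.685 | 0.804 | 1.056 | 0.719 | 0.778 | 1.060 | 0.682 | 0.788 | 1.043 |
|  |  | 0.93 | 0.660 | 0.741 | 0.992 | 0.668 | 0.728 | 0.988 | 0.690 | 0.723 | 0.999 | 0.663 | 0.717 | 0.977 |
|  |  | 0.31 | 0.534 | 0.695 | 0.877 | 0.554 | 0.669 | 0.868 | 0.568 | 0.672 | 0.880 | 0.533 | 0.675 | 0.860 |
| T = 6 | 30 | 1.43 | 0.515 | 0.257 | 0.576 | 0.495 | 0.255 | 0.557 | 0.529 | 0.240 | 0.581 | 0.484 | 0.227 | 0.535 |
|  |  | 0.93 | 0.508 | 0.240 | 0.562 | 0.489 | 0.239 | 0.544 | 0.498 | 0.228 | 0.547 | 0.489 | 0.216 | 0.534 |
|  |  | 0.31 | 0.368 | 0.214 | 0.425 | 0.347 | 0.208 | 0.404 | 0.357 | 0.201 | 0.410 | 0.312 | 0.181 | 0.361 |
| T = 9 | 72 | 1.43 | 0.485 | 0.166 | 0.512 | 0.462 | 0.161 | 0.490 | 0.489 | 0.148 | 0.511 | 0.409 | 0.132 | 0.429 |
|  |  | 0.93 | 0.472 | 0.157 | 0.498 | 0.454 | 0.153 | 0.479 | 0.456 | 0.142 | 0.478 | 0.413 | 0.128 | 0.433 |
|  |  | 0.31 | 0.327 | 0.135 | 0.354 | 0.307 | 0.128 | 0.332 | 0.312 | 0.122 | 0.335 | 0.258 | 0.104 | 0.279 |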
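Feasible Coefficient Estimators for Blundell-Bond (θ = 1), ρ̄_xε = 0.3

| T | L | γ | BB1 Bias | BB1 Stdv | BB1 RMSE | BB2a Bias | BB2a Stdv | BB2a RMSE | BB2c Bias | BB2c Stdv | BB2c RMSE | MBB Bias | MBB Stdv | MBB RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 10 | 0.20 | 0.030 | 0.134 | 0.137 | 0.067 | 0.152 | 0.166 | 0.064 | 0.153 | 0.166 | 0.110 | 0.190 | 0.219 |
|  |  | 0.50 | −0.048 | 0.147 | 0.155 | −0.014 | 0.157 | 0.158 | −0.013 | 0.160 | 0.160 | 0.038 | 0.200 | 0.204 |
|  |  | 0.80 | −0.124 | 0.160 | 0.202 | −0.103 | 0.172 | 0.201 | −0.092 | 0.169 | 0.192 | −0.044 | 0.208 | 0.213 |
| T = 6 | 40 | 0.20 | −0.048 | 0.062 | 0.079 | −0.024 | 0.065 | 0.069 | −0.004 | 0.071 | 0.071 | 0.004 | 0.075 | 0.075 |
|  |  | 0.50 | −0.080 | 0.065 | 0.103 | −0.040 | 0.068 | 0.079 | −0.015 | 0.068 | 0.069 | 0.017 | 0.077 | 0.079 |
|  |  | 0.80 | −0.102 | 0.066 | 0.122 | −0.069 | 0.064 | 0.094 | −0.041 | 0.060 | 0.072 | −0.003 | 0.069 | 0.069 |
| T = 9 | 88 | 0.20 | −0.083 | 0.045 | 0.095 | −0.072 | 0.044 | 0.085 | −0.050 | 0.047 | 0.069 | −0.053 | 0.044 | 0.069 |
|  |  | 0.50 | −0.104 | 0.047 | 0.114 | −0.086 | 0.047 | 0.098 | −0.042 | 0.047 | 0.063 | −0.034 | 0.050 | 0.060 |
|  |  | 0.80 | −0.104 | 0.047 | 0.114 | −0.086 | 0.045 | 0.097 | −0.038 | 0.039 | 0.055 | −0.018 | 0.044 | 0.047 |

| T | L | β | BB1 Bias | BB1 Stdv | BB1 RMSE | BB2a Bias | BB2a Stdv | BB2a RMSE | BB2c Bias | BB2c Stdv | BB2c RMSE | MBB Bias | MBB Stdv | MBB RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T = 3 | 10 | 1.43 | 0.169 | 0.582 | 0.606 | 0.113 | 0.660 | 0.670 | 0.118 | 0.636 | 0.647 | 0.109 | 0.629 | 0.638 |
|  |  | 0.93 | 0.327 | 0.492 | 0.591 | 0.295 | 0.523 | 0.601 | 0.289 | 0.520 | 0.594 | 0.335 | 0.484 | 0.589 |
|  |  | 0.31 | 0.342 | 0.404 | 0.530 | 0.332 | 0.420 | 0.536 | 0.316 | 0.409 | 0.517 | 0.434 | 0.362 | 0.565 |
| T = 6 | 40 | 1.43 | 0.291 | 0.216 | 0.362 | 0.238 | 0.227 | 0.329 | 0.192 | 0.232 | 0.301 | 0.152 | 0.229 | 0.275 |
|  |  | 0.93 | 0.326 | 0.193 | 0.378 | 0.256 | 0.198 | 0.323 | 0.205 | 0.190 | 0.280 | 0.149 | 0.195 | 0.246 |
|  |  | 0.31 | 0.276 | 0.159 | 0.319 | 0.228 | 0.149 | 0.273 | 0.194 | 0.139 | 0.239 | 0.179 | 0.131 | 0.222 |
| T = 9 | 88 | 1.43 | 0.351 | 0.149 | 0.381 | 0.324 | 0.145 | 0.355 | 0.274 | 0.146 | 0.310 | 0.246 | 0.135 | 0.281 |
|  |  | 0.93 | 0.358 | 0.136 | 0.383 | 0.323 | 0.134 | 0.350 | 0.240 | 0.129 | 0.272 | 0.195 | 0.127 | 0.233 |
|  |  | 0.31 | 0.272 | 0.112 | 0.294 | 0.245 | 0.105 | 0.266 | 0.183 | 0.092 | 0.205 | 0.147 | 0.087 | 0.171 |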
* R = 10,000 simulation replications. Design parameter values: N = 200, SNR = 3, DEN_y = 1.0, EVF_x = 0.0, ρ̄_xε = 0.3, ξ = 0.8, κ = 0.00, σ_ε = 1, q = 1, ϕ = 0.5. These yield the DGP parameter values: π_λ = 0.00, π_η = 0.00, σ_v = 0.60, σ_η = 1.0 × (1 − γ), ρ_vε = 0.5 (and ρ̄_xη = 0.00, ρ̄_xλ = 0.00).
Table 24. Empirical findings for the Ziliak data by Arellano-Bond estimation.
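|  | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) |
|---|---|---|---|---|---|---|---|---|---|---|---|
|  | AB1R | AB1R | AB1R | AB2W | AB1 | AB2 | AB1R | AB2W | AB1RC | AB1RL2 | BB2W |
|  | 1P3E1X | 1P3E1X | 1P3E1X | 1P3E1X | 1P3E1X | 1P3E1X | 1P2E2X | 1P2E2X | 1P2E2X | 1P2E2X | 1P2E2X |
| γ_1 | 0.207 ** | 0.208 ** | 0.190 ** | 0.200 ** | 0.208 ** | 0.200 ** | 0.211 ** | 0.202 ** | 0.251 ** | 0.081 | 0.305 ** |
|  | (0.068) | (0.069) | (0.073) | (0.063) | (0.025) | (0.015) | (0.070) | (0.065) | (0.108) | (0.142) | (0.052) |
| γ_2 | 0.070 ** | 0.069 ** | 0.056 ** | 0.078 ** | 0.069 ** | 0.078 ** | 0.071 ** | 0.082 ** | 0.058 ** | 0.122 ** | 0.159 ** |
|  | (0.030) | (0.029) | (0.028) | (0.029) | (0.023) | (0.011) | (0.030) | (0.029) | (0.029) | (0.045) | (0.037) |
| β_0^w | 0.629 ** | 0.625 ** | 0.617 ** | 0.438 ** | 0.625 ** | 0.438 ** | 0.588 ** | 0.429 ** | 0.040 | 0.518 ** | 0.383 ** |
|  | (0.204) | (0.202) | (0.209) | (0.181) | (0.095) | (0.053) | (0.197) | (0.167) | (0.210) | (0.282) | (0.143) |
| β_1^w | 0.001 | −0.019 | −0.013 | −0.032 | −0.019 | −0.032 | −0.048 | −0.054 | −0.038 | −0.210 * | −0.225 |
|  | (0.123) | (0.121) | (0.115) | (0.112) | (0.069) | (0.037) | (0.112) | (0.106) | (0.124) | (0.178) | (0.080) |
| β_2^w | −0.070 * | −0.080 * | −0.078 * | −0.058 * | −0.080 * | −0.058 ** | −0.098 * | −0.076 * | −0.090 * | 0.012 | −0.142 ** |
|  | (0.065) | (0.064) | (0.062) | (0.056) | (0.042) | (0.022) | (0.061) | (0.054) | (0.072) | (0.101) | (0.050) |
| β_0^k | −0.047 | −0.047 | −0.046 | 0.006 | −0.047 | 0.006 | −0.024 * | −0.014 * | −0.019 * | −0.027 * | −0.006 |
|  | (0.085) | (0.079) | (0.083) | (0.061) | (0.056) | (0.029) | (0.014) | (0.010) | (0.015) | (0.017) | (0.010) |
| β_1^k | 0.016 | 0.008 | 0.006 | −0.032 | 0.008 | −0.032 * | 0.009 | 0.003 | −0.011 | −0.089 * | 0.004 |
|  | (0.069) | (0.064) | (0.068) | (0.052) | (0.052) | (0.026) | (0.011) | (0.009) | (0.014) | (0.073) | (0.009) |
| β_2^k | 0.008 | 0.008 | 0.008 | 0.005 | 0.008 | 0.005 | 0.001 | 0.007 | 0.006 | 0.102 * | 0.003 |
|  | (0.016) | (0.015) | (0.015) | (0.012) | (0.014) | (0.008) | (0.012) | (0.009) | (0.014) | (0.070) | (0.008) |
| β_0^d | −0.154 * | −0.118 * | −0.112 * | −0.072 * | −0.118 * | −0.072 * | −0.112 * | −0.073 | 0.321 * | −0.038 | −0.046 |
|  | (0.092) | (0.090) | (0.090) | (0.071) | (0.084) | (0.038) | (0.086) | (0.077) | (0.251) | (0.211) | (0.062) |
| β_1^d | 0.015 | 0.017 | 0.020 | 0.015 | 0.017 | 0.015 | 0.006 | 0.003 | 0.043 | 0.124 | −0.004 |
|  | (0.048) | (0.047) | (0.047) | (0.044) | (0.040) | (0.018) | (0.046) | (0.043) | (0.083) | (0.197) | (0.036) |
| β_2^d | 0.069 ** | 0.072 ** | 0.071 ** | 0.053 ** | 0.072 ** | 0.053 ** | 0.065 * | 0.049 * | 0.070 * | 0.079 ** | 0.045 |
|  | (0.034) | (0.034) | (0.034) | (0.030) | (0.033) | (0.014) | (0.033) | (0.030) | (0.040) | (0.039) | (0.026) |
| β_a | −0.010 | 0.007 | 0.008 | 0.011 | 0.007 | 0.011 * | 0.008 | −0.001 | 0.024 | 0.002 | 0.003 |
|  | (0.023) | (0.019) | (0.019) | (0.017) | (0.017) | (0.011) | (0.015) | (0.013) | (0.021) | (0.020) | (0.006) |
| β_aa | −0.0000 | −0.0000 | −0.0001 | −0.0001 | −0.0000 | −0.0001 * | −0.0000 | 0.0000 | −0.0003 | −0.0000 | −0.0000 |
|  | (0.0002) | (0.0002) | (0.0002) | (0.0002) | (0.0002) | (0.0001) | (0.0002) | (0.0002) | (0.0003) | (0.0003) | (0.0001) |
| K | 20 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 |
| L | 149 | 149 | 142 | 149 | 149 | 149 | 163 | 163 | 43 | 51 | 197 |
| AR(1) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| AR(2) | 0.151 | 0.150 | 0.207 | 0.502 | 0.038 | 0.427 | 0.157 | 0.490 | 0.288 | 0.307 | 0.336 |
| JAB^(1,0) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.026 | 0.000 |
| JAB_a^(1,1) | 0.057 | 0.173 | 0.117 | 0.173 | 0.173 | 0.173 | 0.137 | 0.137 | 0.005 | 0.078 | 0.020 |
| JAB_a^(2,1) | 0.783 | 0.728 | 0.643 | 0.728 | 0.728 | 0.728 | 0.728 | 0.728 | 0.084 | 0.295 | 0.312 |
| JAB_a^(2,2) | 0.139 | 0.207 | 0.157 | 0.207 | 0.207 | 0.207 | 0.218 | 0.218 | 0.034 | 0.184 | 0.084 |
| σ̂_η | 0.235 | 0.234 | 0.237 | 0.172 | 0.234 | 0.172 | 0.203 | 0.155 | 0.155 | 0.179 | 0.068 |
| σ̂_ε | 0.246 | 0.246 | 0.244 | 0.237 | 0.246 | 0.237 | 0.243 | 0.236 | 0.243 | 0.245 | 0.242 |
| TM_w | 0.775 | 0.728 | 0.698 | 0.482 | 0.728 | 0.482 | 0.616 | 0.418 | −0.127 | 0.402 | 0.030 |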

Share and Cite

MDPI and ACS Style

Kiviet, J.; Pleus, M.; Poldermans, R. Accuracy and Efficiency of Various GMM Inference Techniques in Dynamic Micro Panel Data Models. Econometrics 2017, 5, 14. https://doi.org/10.3390/econometrics5010014

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
