Subset-Continuous-Updating GMM Estimators for Dynamic Panel Data Models

The two-step GMM estimators of Arellano and Bond (1991) and Blundell and Bond (1998) for dynamic panel data models have been widely used in empirical work; however, neither of them performs well in small samples with weak instruments. The continuous-updating GMM estimator proposed by Hansen, Heaton, and Yaron (1996) is in principle able to reduce the small-sample bias, but it involves high-dimensional optimizations when the number of regressors is large. This paper proposes a computationally feasible variation on these standard two-step GMM estimators by applying the idea of continuous-updating to the autoregressive parameter only, given the fact that the absolute value of the autoregressive parameter is less than unity as a necessary requirement for the data-generating process to be stationary. We show that our subset-continuous-updating method does not alter the asymptotic distribution of the two-step GMM estimators, and it therefore retains consistency. Our simulation results indicate that the subset-continuous-updating GMM estimators outperform their standard two-step counterparts in finite samples in terms of the estimation accuracy on the autoregressive parameter and the size of the Sargan-Hansen test.


Introduction
In recent decades, dynamic panel data models with unobserved individual-specific heterogeneity have been widely used to investigate the dynamics of economic activities. Several estimators have been suggested for estimating the model parameters. A standard estimation procedure is to first-difference the model, so as to eliminate the unobserved heterogeneity, and then base GMM estimation on the implied moment conditions, in which endogenous differences of the variables are instrumented by their lagged levels. This is the well-known Arellano-Bond, or first-difference (DIF), GMM estimator (see Arellano and Bond [1]). The DIF GMM estimator was found to be inefficient since it does not make use of all available moment conditions (see Ahn and Schmidt [2]); it also has very poor finite-sample properties in dynamic panel data models with highly persistent series and large variation in the fixed effects relative to the idiosyncratic errors (see Blundell and Bond [3]), since the instruments in those cases become less informative.
To improve the performance of the DIF GMM estimator, Blundell and Bond [3] propose exploiting extra moment conditions from the level equation that rely on certain restrictions on the initial observations, as suggested by Arellano and Bover [4]. The resulting system (SYS) GMM estimator has been shown to perform much better than the DIF GMM estimator in terms of finite-sample bias and mean squared error, as well as coefficient standard errors, since the instruments used for the level equation remain informative as the autoregressive coefficient approaches unity (see Blundell and Bond [3] and Blundell, Bond, and Windmeijer [5]). As a result, the SYS GMM estimator has been widely used for the estimation of production functions, demand for addictive goods, empirical growth models, etc. However, it was later pointed out (see Hayakawa [6] and Bun and Windmeijer [7]) that the weak-instruments problem remains in the SYS GMM estimator. Since an increase in the length of the panel leads to a quadratic increase in the number of instruments, the two-step DIF and SYS GMM estimators are both biased due to many weak moment conditions; see Newey and Windmeijer [8].
The work by Hansen, Heaton, and Yaron [9] suggests that the continuous-updating GMM estimator has smaller bias than the standard two-step GMM estimator. However, it involves high-dimensional optimizations when the number of regressors is large. Given the fact that the absolute value of the autoregressive parameter must be less than unity as a necessary requirement for the data-generating process to be stationary, we propose a computationally feasible variation on the two-step DIF and SYS GMM estimators, in which the idea of continuous-updating is applied solely to the autoregressive parameter; these two new estimators are denoted "SCUDIF" and "SCUSYS" below. Following the jackknife interpretation of the continuous-updating estimator in the work of Donald and Newey [10], we show that the subset-continuous-updating method that we propose in this paper does not alter the asymptotic distribution of the two-step GMM estimators, and it hence retains consistency. It is computationally advantageous relative to the continuous-updating estimator in that it replaces a relatively high-dimensional optimization over unbounded intervals by a one-dimensional optimization limited to the stationary domain (−1, 1) of the autoregressive parameter. We conduct Monte Carlo experiments and show that the proposed subset-continuous-updating versions of the DIF and SYS GMM estimators outperform their standard two-step counterparts in small samples in terms of the estimation accuracy on the autoregressive parameter and the rejection frequency of the Sargan-Hansen test.
The layout of the paper is as follows: Section 2 describes the model specification and our proposed subset-continuous-updating method; Section 3 describes the Monte Carlo experiments and presents the results; and Section 4 concludes the paper.

Subset-Continuous-Updating GMM Estimator
Consider a linear dynamic panel data model with dependent variable y_it, additional explanatory variables X_it = (x_it^1, ..., x_it^K), unobserved individual-specific fixed effects µ_i, and idiosyncratic errors ν_it:

y_it = θ y_{i,t−1} + X_it β + µ_i + ν_it,   (1)

for i = 1, ..., N and t = 2, ..., T, where N is large and T is small. Here, θ is the autoregressive parameter, and we make the familiar assumption in the literature that it satisfies |θ| < 1 to ensure the stationarity of the model; β is a K-dimensional column vector of the remaining coefficients. As Blundell, Bond, and Windmeijer [5] argue, this model specification is sufficient to cover most cases that researchers would encounter in linear dynamic panel applications. While our discussion applies to the general setup of dynamic panel data models in Equation (1), for expositional clarity we consider a special case with a single (K = 1) additional explanatory variable x_it:

y_it = θ y_{i,t−1} + β x_it + µ_i + ν_it,   (2)

for i = 1, ..., N and t = 2, ..., T. We also follow Blundell, Bond, and Windmeijer [5] in allowing for persistence and endogeneity in the explanatory variable x_it:

x_it = ρ x_{i,t−1} + τ µ_i + λ ν_it + e_it,   (3)

where ρ captures the persistence of x_it, and τ and λ determine the correlation of x_it with the individual effects µ_i and the idiosyncratic errors ν_it, respectively. We assume, at the outset, that µ_i, ν_it, and e_it have the following properties:

E(ν_it ν_is) = 0 and E(e_it e_is) = 0 for i = 1, ..., N and all t ≠ s,
E(ν_it e_is) = 0 for i = 1, ..., N and all t, s.
Furthermore, we impose mean-stationarity restrictions on the initial observations y_i1 and x_i1 (Equations (8) and (9)). Under these conditions, we consider both the DIF GMM estimator of Arellano and Bond [1] and the SYS GMM estimator of Blundell and Bond [3], which are derived from moment conditions of the form

E[g(w, θ_0, β_0)] = 0,

where w denotes the data, (θ_0, β_0) are the true parameters, and g = Z_d Δu for the DIF GMM estimator and g = Z_s p for the SYS GMM estimator.
In the above equations, Z_d is the m_d × N(T−2) matrix (Z_d1, Z_d2, ..., Z_dN) and Z_s is the m_s × 2N(T−2) matrix (Z_s1, Z_s2, ..., Z_sN), where the number of instruments for the DIF GMM estimator is m_d = (T−2)(T−1) and the instrument count for the SYS GMM estimator is m_s = m_d + 2(T−2) in the case of K = 1. Δu is the N(T−2) vector (Δu_1, Δu_2, ..., Δu_N) and p is the 2N(T−2) vector (p_1, p_2, ..., p_N). The matrix Z_di collects the lagged levels of y_it available as instruments for the first-difference equation, with the block of x instruments, Z_di^x, defined similarly. The instrument matrix for the system equations, Z_si, is block-diagonal with Z_di and Z_li on the main diagonal and zeros otherwise, where Z_li is the instrument matrix for the level equation, collecting the lagged differences of y_it, with Z_li^x defined similarly. In words, the DIF GMM estimator is obtained from moment conditions in which endogenous differences of the variables are instrumented by their lagged levels, and the SYS GMM estimator utilizes further moment conditions in which endogenous level variables are instrumented by their lagged differences. The validity of these moment conditions is tested by the Sargan-Hansen test of overidentifying restrictions (see Sargan [11] and Hansen [12]).
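Because the instrument counts above grow quadratically with T, the many-weak-instruments problem worsens in longer panels. This can be seen with a small sketch using the formulas from the text for K = 1 (the helper function name is ours, for illustration):

```python
# Instrument counts for the DIF and SYS GMM estimators in the K = 1 case,
# following the formulas m_d = (T-2)(T-1) and m_s = m_d + 2(T-2) from the text.
def instrument_counts(T):
    """Return (m_d, m_s) for a panel of length T with one extra regressor."""
    m_d = (T - 2) * (T - 1)   # DIF: lagged levels instrument the differences
    m_s = m_d + 2 * (T - 2)   # SYS: adds lagged differences for the level equation
    return m_d, m_s
```

For T = 4, 8, 12 this gives (6, 10), (42, 54), and (110, 130), respectively: tripling the panel length raises the SYS instrument count thirteen-fold.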
Let w_i (i = 1, ..., N) denote the i-th observation and g_i(θ, β) = g(w_i, θ, β). The sample first and second moments of g are given by:

ĝ(θ, β) = (1/N) Σ_{i=1}^{N} g_i(θ, β),
Ω̂(θ, β) = (1/N) Σ_{i=1}^{N} g_i(θ, β) g_i(θ, β)′.
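As a minimal numerical illustration, the two sample moments can be computed in a few lines of numpy, assuming the N moment contributions g_i(θ, β) are stacked as the rows of an N × m array (the array layout and function name are our assumptions):

```python
import numpy as np

def sample_moments(g):
    """Return (g_bar, omega): sample mean and second-moment matrix of the rows of g."""
    N = g.shape[0]
    g_bar = g.mean(axis=0)   # (1/N) * sum_i g_i
    omega = (g.T @ g) / N    # (1/N) * sum_i g_i g_i'
    return g_bar, omega
```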

Two-Step GMM Estimator:
The standard two-step GMM estimator is the solution to the following minimization problem:

min over (θ, β) of ĝ(θ, β)′ Ω̂(θ̃, β̃)⁻¹ ĝ(θ, β),   (16)

where (θ̃, β̃) is a preliminary estimator, e.g., the first-step estimator (see footnote 1). The first-order conditions are:

ĝ_θ′ Ω̂(θ̃, β̃)⁻¹ ĝ = 0   and   ĝ_β′ Ω̂(θ̃, β̃)⁻¹ ĝ = 0,

where ĝ = ĝ(θ, β), ĝ_θ = ∂ĝ(θ, β)/∂θ, and ĝ_β = ∂ĝ(θ, β)/∂β.

Footnote 1: To obtain a consistent first-step estimator, we use the weight matrix ((1/N) Σ_i Z_di′ H_d Z_di)⁻¹ in Equation (16) for the DIF GMM estimator and its system analogue for the SYS GMM estimator, as suggested by Arellano and Bond [1] and Blundell, Bond, and Windmeijer [5], respectively. Here H_d is a (T−2) square matrix that has twos on the main diagonal, minus ones on the first subdiagonals, and zeros otherwise, and H_s is the corresponding matrix for the system moment conditions given in Blundell, Bond, and Windmeijer [5].

Subset-Continuous-Updating GMM Estimator:
Motivated by the fact that the autoregressive parameter θ in a stationary dynamic panel data model lies in the bounded interval (−1, 1), we propose to apply the continuous-updating idea of Hansen, Heaton, and Yaron [9] solely to this bounded parameter θ. The subset-continuous-updating GMM estimator is obtained as the solution to the following minimization problem:

min over θ ∈ (−1, 1) and β of ĝ(θ, β)′ Ω̂(θ, β̃)⁻¹ ĝ(θ, β),   (19)

with the weight matrix continuously updated in θ only:

Ω̂(θ, β̃) = (1/N) Σ_{i=1}^{N} g_i(θ, β̃) g_i(θ, β̃)′,   (20)

where β̃ is a preliminary estimator (see footnote 2). The first-order condition with respect to θ, Equation (21), sets the total derivative of this objective to zero; it involves both the Jacobian term ĝ_θ + ĝ_β ∂β̂/∂θ and the derivative of Ω̂(θ, β̃)⁻¹ with respect to θ. Here ĝ = ĝ(θ, β), ĝ_θ = ∂ĝ(θ, β)/∂θ, ĝ_β = ∂ĝ(θ, β)/∂β, g_i = g(w_i, θ, β), g_θ,i = ∂g(w_i, θ, β)/∂θ, and g_β,i = ∂g(w_i, θ, β)/∂β. We call the solution to the above minimization problem a subset-continuous-updating estimator to reflect the fact that we apply the continuous-updating idea of Hansen, Heaton, and Yaron [9] to a subset of the model parameters. Given the linearity of the DIF and SYS GMM moment conditions, the minimization problem in Equation (19) yields a closed-form solution for β as a function of θ. The β estimator is of course consistent conditional on a consistent estimator of θ. The θ estimator is consistent according to the jackknife interpretation in the work of Donald and Newey [10]. More specifically, consider the regression of g_θ,i + g_β,i ∂β̂/∂θ on g_i, and let Ĉ denote the matrix of regression coefficients.
The vector of regression residuals is then

η_i = g_θ,i + g_β,i ∂β̂/∂θ − Ĉ g_i,   i = 1, ..., N,

where Ĉ is the matrix of coefficients from the regression just described.

Footnote 2: In the Monte Carlo experiments, we conduct a bounded optimization over θ ∈ (−1, 1), using the two-step DIF or SYS GMM estimator of θ as the starting value and iterating until a convergence criterion is met. We set both the step tolerance and the function tolerance to a relatively small number, 10⁻⁸.
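The bounded one-dimensional continuous-updating search over θ ∈ (−1, 1) can be sketched in a few lines. This is an illustrative sketch, not the paper's exact estimator: `moments(theta)` is a hypothetical user-supplied function returning the N × m matrix of moment contributions with β concentrated out, and the weight matrix is recomputed at every candidate θ.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def cu_objective(theta, moments):
    """CU-GMM objective in one parameter: g_bar' * Omega(theta)^{-1} * g_bar."""
    g = moments(theta)               # N x m matrix of moment contributions g_i(theta)
    g_bar = g.mean(axis=0)
    omega = (g.T @ g) / g.shape[0]   # weight matrix continuously updated in theta
    return g_bar @ np.linalg.solve(omega, g_bar)

def scu_estimate(moments):
    """Bounded one-dimensional search over the stationary interval (-1, 1)."""
    res = minimize_scalar(cu_objective, bounds=(-1 + 1e-6, 1 - 1e-6),
                          args=(moments,), method="bounded",
                          options={"xatol": 1e-8})
    return res.x
```

The `bounded` method confines the search to the stationary domain, mirroring the bounded optimization described in footnote 2, so no starting value outside (−1, 1) can ever be visited.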
By definition, the residual η_i is orthogonal to the regressor g_i, i.e.,

(1/N) Σ_{i=1}^{N} η_i g_i′ = 0.

Using this orthogonality, the first-order condition with respect to θ in Equation (21) can be rewritten in essentially the same form as the equation given by Donald and Newey [10] (at the bottom of page 240); this rewritten condition is Equation (27). Let A_i denote the term inside the parentheses in Equation (27). This term converges to the same limit for all i = 1, ..., N as N → ∞, since ĝ converges to zero in probability, i.e., ĝ = o_p(1). As Donald and Newey [10] point out, Equation (27) is simply a modification of the usual interpretation of a GMM estimator that allows "a linear combination coefficient for each observation, which excludes its own observation from the Jacobian of the moments." Given that every A_i has the same limit, this modification does not change the asymptotic distribution of the estimator, which implies that our subset-continuous-updating estimator retains consistency. While we use a scalar parameter β for expositional clarity here, our subset-continuous-updating method applies to a K-dimensional vector of parameters β without any extra computational burden, because continuous updating is imposed on the scalar parameter θ only; see Equation (20). Applied to the dynamic panel data model, the resulting SCUDIF and SCUSYS GMM estimators have exactly the same first-order asymptotic distributions as their two-step counterparts. They are computationally advantageous relative to the continuous-updating estimator in that they replace a (K + 1)-dimensional numerical optimization over the unbounded domain of (θ, β) with a one-dimensional optimization over the bounded domain θ ∈ (−1, 1). It is worth noting that our subset-continuous-updating estimator can be easily extended to the AR(2) case where necessary.
This extension leads to a two-dimensional, instead of one-dimensional, optimization, which is more computationally burdensome, but the optimization is at least still limited to the well-defined, bounded region of the parameter space corresponding to stationary dynamics. Per Box and Jenkins [13], this AR(2) stationary region is a triangle.
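The Box-Jenkins triangle can be expressed as three inequalities on the AR(2) coefficients (θ₁, θ₂). The helper below is a hypothetical illustration, not from the paper:

```python
# Stationarity (triangle) region for an AR(2) process, per Box and Jenkins:
# theta1 + theta2 < 1, theta2 - theta1 < 1, and |theta2| < 1.
def ar2_stationary(theta1, theta2):
    """True iff (theta1, theta2) lies inside the AR(2) stationarity triangle."""
    return theta1 + theta2 < 1 and theta2 - theta1 < 1 and abs(theta2) < 1
```

A two-dimensional optimization would simply be constrained to this triangle, just as the AR(1) search is constrained to (−1, 1).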

Monte Carlo Experiments
In this section, we conduct Monte Carlo experiments to compare the performance of our subset-continuous-updating estimators with the standard two-step estimators in finite samples. We consider the model and assumptions specified in Section 2. Without loss of generality, we consider one additional explanatory variable beyond the lagged dependent variable, i.e., we restrict K = 1. This restriction is made for expositional clarity here, as our proposed estimation method applies to models with multiple explanatory variables (K > 1) without any extra computational burden because the continuous-updating is imposed solely on the bounded scalar parameter θ. The model specification is described by Equations (2) and (3).
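Before turning to the design details, a simulation sketch may help. The snippet below generates panels from an AR(1) process consistent with the description of Equations (2) and (3) in Section 2, assuming the standard linear forms y_it = θ y_{i,t−1} + β x_it + µ_i + ν_it and x_it = ρ x_{i,t−1} + τ µ_i + λ ν_it + e_it; the function and its parameter names are ours, and it includes a 30-period burn-in as in the experiments described next.

```python
import numpy as np

def simulate_panel(N, T, theta, beta, rho, tau, lam,
                   sig_mu=1.0, sig_nu=1.0, sig_e=1.0, burn=30, seed=0):
    """Simulate (Y, X), each N x T, from the assumed AR(1) panel DGP."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, sig_mu, size=N)       # individual fixed effects
    y = np.zeros(N)
    x = np.zeros(N)
    Y = np.empty((N, T))
    X = np.empty((N, T))
    for t in range(burn + T):
        nu = rng.normal(0.0, sig_nu, size=N)   # idiosyncratic errors
        e = rng.normal(0.0, sig_e, size=N)
        x = rho * x + tau * mu + lam * nu + e  # persistent, endogenous regressor
        y = theta * y + beta * x + mu + nu     # dynamic outcome equation
        if t >= burn:                          # discard the burn-in periods
            Y[:, t - burn] = y
            X[:, t - burn] = x
    return Y, X
```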
For each Monte Carlo replication, µ_i, ν_it, and e_it are all drawn from normal distributions with zero means and standard deviations σ_µ, σ_ν, and σ_e, respectively. The initial observations are drawn from the mean-stationary distribution as in Equations (8) and (9) (see footnote 3). We then generate the data x_it and y_it and discard the first 30 observations before selecting our sample. The remaining parameters are held fixed across the various Monte Carlo simulations, at values taken from Blundell, Bond, and Windmeijer [5]. Following Blundell, Bond, and Windmeijer [5], we fix the sample size N at 500 and consider T = {4, 8}; we further consider an even longer panel, T = 12. The results are presented in Tables 1-3, respectively, where we compare our subset-continuous-updating estimators (denoted SCUDIF and SCUSYS) to the standard two-step estimators of Arellano and Bond [1] and Blundell and Bond [3] (denoted DIF and SYS) from five perspectives: (1) estimation accuracy, quantified by median absolute errors (MAE); (2) the sampling standard deviations across all simulation repetitions (SD); (3) the Windmeijer [14] corrected standard errors (SE) (see footnote 4); (4) the size of the two-tailed t-test of the null hypothesis that the parameter equals its true value at the 5% significance level; and (5) the rejection frequency of the 5% Sargan-Hansen test (denoted RF_SH). All results are obtained from 10,000 Monte Carlo repetitions. The following findings are worth noting.
Firstly, we find evidence that the standard two-step DIF and SYS GMM estimates of θ are sensitive to the variance ratio σ²_µ/σ²_ν, which is consistent with the results of Hayakawa [6] and Bun and Windmeijer [7]. In particular, for any given combination of θ, ρ, and λ, the DIF and SYS GMM estimates of the autoregressive parameter become more biased as the variance ratio increases. In most cases, the bias in the β estimates also increases with the variance ratio. For example, with (θ, ρ, λ) = (0.8, 0.8, −0.4) and T = 4, the MAEs of θ_DIF and θ_SYS both double when the variance ratio increases from 1/4 to 4, while the MAEs of β_DIF and β_SYS become twice and four-thirds as large, respectively. In contrast, while the variance ratio also affects the estimation accuracy of the SCUDIF and SCUSYS GMM estimators, their MAE values deteriorate less quickly than those of the standard two-step DIF and SYS GMM estimators.
Secondly, compared to the standard two-step DIF and SYS GMM estimators, our proposed subset-continuous-updating counterparts are noticeably less biased in estimating the autoregressive parameter, especially in the case of large variance ratios and relatively long panels, where the problem of too many weak instruments becomes more prominent. For example, with (θ, ρ, λ) = (0.8, 0.8, −0.4), σ²_µ/σ²_ν = 4, and T = 12, the MAE of θ_SCUDIF is two-thirds that of θ_DIF, and the MAE of θ_SCUSYS is only one-fourth that of θ_SYS. Summarizing the θ MAE results in Tables 1-3, the following matrix displays, for the large variance ratio σ²_µ/σ²_ν = 4, the frequencies with which our proposed subset-continuous-updating estimators have smaller MAE than the corresponding two-step estimators across the different (θ, ρ, λ) combinations:

Estimation accuracy on θ      T = 4     T = 8     T = 12
SCUDIF outperforms DIF        75.0%     100.0%    75.0%
SCUSYS outperforms SYS        87.5%     100.0%    100.0%

With regard to the β estimation, we do not observe any clear pattern in the relative performance of the two sets of estimators. For example, with (θ, ρ, λ) = (0.8, 0.8, −0.4), σ²_µ/σ²_ν = 4, and T = 12, both SCUDIF and SCUSYS improve on DIF and SYS, respectively, in estimating β. However, when x_it becomes less persistent, i.e., ρ = 0.5, SCUDIF still significantly outperforms DIF, but SCUSYS performs worse than SYS in terms of estimation accuracy on β.

Footnote 3: After we draw µ_i, ν_it, and e_it from the normal distribution, we set the initial values of x_it and y_it to their mean-stationary levels to ensure mean stationarity.

Footnote 4: The standard error estimates for the two-step DIF and SYS GMM estimators of θ and β are calculated as in Windmeijer [14]. The same applies to the SCUDIF and SCUSYS GMM estimators of β conditional on θ. The standard error estimates for the SCUDIF and SCUSYS GMM estimators of θ are instead obtained from the Hessian matrix of the one-dimensional optimization problem.
Turning to the size estimates for the t-tests, we find that none of the standard two-step or subset-continuous-updating estimators consistently yields well-sized t-tests for either θ or β. For the DIF and SYS tests, this is a bias problem: the Windmeijer [14] corrected standard errors are generally good estimates of the sampling standard deviations of these standard two-step estimators, but this is no guarantee of a well-sized t-test when the parameter estimates suffer substantial biases. For example, with (θ, ρ, λ) = (0.8, 0.5, −0.4) and σ²_µ/σ²_ν = 4, the empirical rejection frequencies of the t-tests of H_0: θ = 0.8 and H_0: β = 1 are both greater than 50% at every panel length considered. The SCUDIF and SCUSYS estimators are less biased, but their standard error estimates are somewhat downward biased; hence, these tests are also over-sized. These results suggest that bootstrap or other simulation-based methods are necessary for credible inference on the structural parameters with samples of the sizes considered here, regardless of which of these estimators one chooses. We note, however, that the more precise estimation delivered by the SCUDIF and SCUSYS estimators of θ remains advantageous when simulation-based inference is used.
Lastly and most importantly, the standard two-step estimators tend to over-reject in the Sargan-Hansen test, whereas our subset-continuous-updating counterparts are generally well-sized. Given that the endogenous variable x_it and the dependent variable y_it are both mean stationary, lagged levels and lagged differences of x_it and y_it are valid instruments for the first-difference equation and the level equation, respectively. Thus, presuming that the model is in all other respects well-specified, the null hypothesis of the Sargan-Hansen test is satisfied. For every panel length that we consider, T = {4, 8, 12}, the rejection frequency of the Sargan-Hansen test for our subset-continuous-updating estimators lies around 5%, whereas it sometimes exceeds 15% for the standard two-step estimators, for instance with θ = 0.8, σ²_µ/σ²_ν = 4, and T = {8, 12}. In the worst-case scenario, (θ, ρ, λ) = (0.8, 0.5, −0.1), σ²_µ/σ²_ν = 4, and T = 8, the rejection frequency is even higher than 25%. In other words, the Sargan-Hansen test associated with the standard two-step SYS GMM estimator has a 25 percent chance of incorrectly rejecting a true null hypothesis. In contrast, the 5% Sargan-Hansen test associated with the subset-continuous-updating SYS GMM estimator has a reasonable rejection frequency (size) of 6.2%.
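The Sargan-Hansen statistic referenced throughout takes the form J = N ĝ′ Ω̂⁻¹ ĝ evaluated at the estimated parameters, compared against a chi-square distribution whose degrees of freedom equal the number of overidentifying restrictions. The helper below is an illustrative sketch with hypothetical names, not the paper's code:

```python
import numpy as np
from scipy.stats import chi2

def sargan_hansen(g, n_params):
    """J statistic and p-value; g stacks the N moment contributions as rows."""
    N, m = g.shape
    g_bar = g.mean(axis=0)
    omega = (g.T @ g) / N
    J = N * (g_bar @ np.linalg.solve(omega, g_bar))  # N * g_bar' Omega^{-1} g_bar
    return J, chi2.sf(J, m - n_params)               # df = overidentifying restrictions
```

Under a true null, the p-value should reject at roughly the nominal rate; the over-rejection reported above for the two-step estimators reflects many weak moment conditions, not a failure of this formula.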
In conclusion, the subset-continuous-updating method that we propose in this paper is shown to improve the estimation accuracy on the autoregressive parameter and the size of the Sargan-Hansen test in a dynamic panel data model when the variance ratio becomes large. No extra computational burden is incurred when we apply this method to models with multiple explanatory variables.

Conclusions
The two-step GMM estimators of Arellano and Bond [1] and Blundell and Bond [3] for dynamic panel data models have been widely used in empirical work. However, neither performs well in small samples with weak instruments. The continuous-updating GMM estimator proposed by Hansen, Heaton, and Yaron [9] can in principle reduce the small-sample bias, but it involves high-dimensional optimization when the number of regressors is large. Given that the autoregressive parameter must be less than unity in absolute value for a dynamic panel data model to be stationary, we propose a computationally feasible variation on the standard two-step GMM estimators that applies continuous updating to the autoregressive parameter only. We show that our subset-continuous-updating method does not alter the asymptotic distribution of the two-step GMM estimators and hence retains consistency. According to our Monte Carlo simulation results, the subset-continuous-updating GMM estimators for dynamic panel data models outperform their standard two-step counterparts in finite samples in terms of estimation accuracy for the autoregressive parameter and the size of the Sargan-Hansen test.