1. Introduction
This paper proposes estimating linear dynamic panels by recentering the cross moments between the lagged dependent variables, which are endogenous, and the error term in the model, that is, by subtracting their nonzero expectations. The resulting estimator is accordingly named the recentered method of moments (RMM) estimator. These recentered moments are functions of model parameters and, together with moment conditions from other exogenous regressors, if any, form the basis for model estimation. Essentially, the approach is based on the idea that the best “instrument” for any of the endogenous lagged dependent variables is itself. As such, one does not need to search for IVs and can avoid the issues of weak instruments and many instruments that arise in the GMM framework. The approach is closely related to the bias-correction literature, but there is no correction procedure involved. In particular, Appendix B illustrates that the estimator proposed in this paper is numerically equivalent to the indirect inference (II) estimator introduced by Bao and Yu (2023). However, the two are motivated very differently. The strategy of Bao and Yu (2023) starts with a biased estimator, whereas this paper does not have a biased estimator to begin with and designs recentered moment conditions directly by expressing the cross moments between the endogenous lagged dependent variables and disturbances in terms of model parameters.
Furthermore, there are three major contributions in this paper that are absent elsewhere.
First, it allows for more general assumptions regarding the data-generating process (DGP). Specifically, the most general form of error heteroskedasticity, arising from cross-sectional units, time, or both, is considered. Breitung et al. (2022) focus primarily on the situation of cross-sectional heteroskedasticity. Alvarez and Arellano (2022) consider only time-series heteroskedasticity. Bao and Yu (2023) discuss a robust version of their II estimator that is valid under both forms of heteroskedasticity, though falling short of deriving its asymptotic distribution. They focus instead on the case when there is only time-series heteroskedasticity. Bun and Carree (2006) derive the asymptotic bias of the WG estimator when both forms of heteroskedasticity are present and propose bias correcting the WG estimator when T is fixed for the first-order DP (DP(1)) model. Juodis (2013) points out that the bias-correction procedure of Bun and Carree (2006) is, in fact, inconsistent and designs a consistent one for a higher-order DP. Note that both Bun and Carree (2006) and Juodis (2013) estimate the temporal heteroskedasticity parameters so that they can be plugged into the bias expression of the WG estimator for the purpose of bias correction. However, they do not provide insights on how to conduct asymptotic inference on the resulting bias-corrected estimator. In contrast, Alvarez and Arellano (2022) jointly estimate the temporal heteroskedasticity parameters and model parameters and also derive their joint asymptotic distribution.
Second, this paper explicitly includes the case when there is a unit root. If the time span is short, this may not matter, as the inference procedure is under the large-N asymptotic regime, where N is the number of cross-sectional units. Bun and Carree (2006), Juodis (2013), Alvarez and Arellano (2022), and Bao and Yu (2023) all consider short panels. But for long panels, the issue of a unit root cannot simply be ignored. Dhaene and Jochmans (2015) do not consider the unit-root case, and their jackknife method is developed under the rectangular-array asymptotic regime (namely, both N and T are large). Breitung et al. (2022) also present some results under the rectangular-array regime, but do not explicitly consider the unit-root case.
Third, asymptotic distribution results are derived that, in general, do not require both N and T to be large. The asymptotic distribution of the proposed RMM estimator under large T resembles the familiar ordinary least squares (OLS) result in traditional regression analysis, and its asymptotic variance achieves the efficiency bound under homoskedasticity. As in the time-series literature, the convergence rate of the estimator of the autoregressive parameters is different when there is a unit root under large T, but the standard t-test procedure carries through when one is conducting hypothesis testing. Under homoskedasticity, Han and Phillips (2010) report a unified asymptotic distribution result for their first difference least squares (FDLS) estimator for DP(1) when the unit-root case is allowed. In their setup, the fixed effects disappear under the unit-root case. Hayakawa (2009) derives the asymptotic properties of the IV estimator for higher-order DP models, with neither heteroskedasticity nor exogenous regressors present, when the panel is dynamically stable under large N and large T, though he remarks that they should also hold under other asymptotic regimes.
The plan of this paper is as follows. The next section introduces the model and notation. Section 3 presents the RMM estimator under the baseline set-up of homoskedasticity, though it is further shown that the estimator is robust to cross-sectional heteroskedasticity as well. An important message here is that when T is large, there is no asymptotic bias, standing in contrast to the consistent WG estimator that may possess an asymptotic bias. Section 4 introduces a robust estimator under cross-sectional and temporal heteroskedasticity. Section 5 illustrates the good finite-sample performance of the proposed estimator by Monte Carlo experiments. The last section concludes and discusses possible future research. Technical details, including lemmas that are used for the proofs of the main results in this paper, together with some extended discussions and additional simulation results, are provided in the appendices. Throughout, matrix/vector dimension subscripts are typically omitted unless confusion may arise. A subscript 0 is used to signify the true value of a parameter that is to be estimated.
2. Model, Notation, and Assumptions
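The $p$-th order linear dynamic panel model, DP($p$) for short, is
$$y_{it} = \alpha_i + \phi_1 y_{i,t-1} + \cdots + \phi_p y_{i,t-p} + x_{it}'\beta + u_{it} = \alpha_i + w_{it}'\theta + u_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T, \tag{1}$$
where the dependent variable $y_{it}$ is related to its lagged values, up to order $p$, fixed effects $\alpha_i$, the $k \times 1$ vector of exogenous variables $x_{it}$, and an idiosyncratic disturbance term $u_{it}$. The vector $w_{it} = (y_{i,t-1}, \ldots, y_{i,t-p}, x_{it}')'$ collects all the right-hand side variables and $\theta = (\phi', \beta')'$, where $\phi = (\phi_1, \ldots, \phi_p)'$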
. For each
i, let
,
,
,
,
, and
. Stacking over
i, one has
,
,
,
,
,
, and
. Let ⊗ denote the matrix Kronecker product operator and
be a
column vector of ones. Then, in matrix notation, (
1) can be written as
Suppose there exists a matrix
such that when pre-multiplying it to (
2), the fixed effects are wiped out, namely,
The
matrix
comes as a natural choice for
, where
, and
(
) denotes the identity matrix of size
N (
T). In this case, applying the OLS procedure to (
3) yields the WG estimator. For a given
, the regression model (
3) contains endogenous lagged dependent variables
, ⋯,
as regressors and
. The IV/GMM literature focuses on using various instruments for
(or its various differences). The best “instrument” for
is of course itself, though it violates the definition of instrument. Nevertheless, if one can explicitly analyze how
is correlated with
, subject to the transformation induced by
, then this piece of information can be used to estimate model parameters. This is essentially the idea of this paper. In the sequel,
is used (and thus,
), and the following assumptions are made. Assumptions about the idiosyncratic error term are deferred to the next two sections, where different forms of heteroskedasticity are discussed.
Assumption 1. The series of fixed effects , , is i.i.d. across individuals with finite moments up to the fourth order.
Assumption 2. The error terms and fixed effects are independent of each other for any , .
Assumption 3. The regressors , if present, are either fixed or random and converges (in probability) to a nonsingular matrix as . When they are fixed, . When they are random: (i) they are strictly exogenous with respect to error terms; (ii) with finite moments up to the fourth order; (iii) and , , , , , , .
Assumption 4. The initial values , , , are either fixed or random. When they are fixed, . When they are random: (i) with finite moments up to the fourth order; (ii) and , , , , , , ; (iii) , , , .
Without loss of generality, in what follows, the analysis in this paper is conditioning on
for ease of presentation. No distribution assumption is made, so long as some moment conditions hold. Assumption 3 does not rule out the possible correlation between
and the fixed effects, but rules out correlation among the products of them. For simplicity, the strictly exogenous regressors contained in
are bounded (in probability). For the estimation strategy to be introduced in the next section, the inclusion of explosive exogenous regressors can affect the convergence rate of the associated parameter estimators, but does not affect the convergence rate of those associated with the lagged dependent variables. Assumption 4 does not spell out how the initial values are generated, except that there are restrictions on how they may be correlated with the fixed effects and idiosyncratic errors. The panel under consideration is not restricted to be dynamically stable. The fixed effects in Assumption 1 are treated as randomly generated from some distribution. This assumption itself is not needed for the purpose of estimation, since the recentered moment conditions to be presented wipe out the fixed effects. However, it is used, together with Assumptions 2–4 and Assumption 5 or 6, to ensure that the (scaled) moments and the associated gradient have properly defined probability limits (see, for example, Lemma A5 in the
Appendix C), which in turn are used to establish the asymptotic distribution of the RMM estimator.
When there is a unit root, the asymptotic behavior of the RMM estimator to be introduced depends on the fixed effects, so for the sake of tractability, they are assumed to be deterministic, following Hahn and Kuersteiner (2002).
In the next two sections, the asymptotic properties of the RMM estimator are derived under homoskedasticity and heteroskedasticity, without or with a unit root. Under large N and finite T, the asymptotic distribution is of the same form, whereas under large T, subject to the appropriate scaling factors, the asymptotic distribution resembles the OLS result in traditional regression analysis.
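The within transformation described earlier in this section, built from the identity matrix of size N, the identity matrix of size T, and a column vector of ones, can be illustrated numerically. The following is a minimal sketch, assuming the data are stacked unit by unit (all T observations of unit 1, then unit 2, and so on); the function name `within_matrix` and the simulated fixed effects are hypothetical and serve only to show that the transformation wipes out any unit-specific constant and is idempotent.

```python
import numpy as np

def within_matrix(N, T):
    """Return I_N kron (I_T - 11'/T), which demeans each unit's T observations."""
    M_T = np.eye(T) - np.ones((T, T)) / T
    return np.kron(np.eye(N), M_T)

rng = np.random.default_rng(0)
N, T = 4, 5
alpha = rng.standard_normal(N)            # hypothetical fixed effects, one per unit
M = within_matrix(N, T)

# Stack the fixed effects unit by unit: (alpha_1, ..., alpha_1, alpha_2, ...)'.
fe_stacked = np.kron(alpha, np.ones(T))

# Pre-multiplying by M removes the fixed effects up to rounding error.
max_abs_remaining = float(np.max(np.abs(M @ fe_stacked)))
is_idempotent = bool(np.allclose(M @ M, M))
```

Because the transformation matrix is block diagonal, in practice one would demean each unit's series directly rather than form the full NT by NT matrix; the explicit Kronecker product is used here only to mirror the matrix notation.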
Note that, as established in
Bao and Yu (
2023), the vectors of lagged observations from model (
1) (at the true parameter vector
) for each cross-sectional unit can be written as
, where
is a
strictly lower triangular matrix with 1’s on the first sub-diagonal and 0’s elsewhere,
is a
vector,
,
, and
.
Then, stacking over index
i, one has, for
,
where
is an
vector collecting initial cross-sectional observations at time
. This representation of
is in terms of
,
,
, and initial conditions, so
essentially boils down to linear and quadratic forms in the random vector
. This facilitates the derivation of the expectation of
, which forms the basis of the moment conditions in this paper for estimating model parameters. When
is independent and identically distributed (i.i.d.) across
i and
t, moments of quadratic forms in
can be found in Bao and Ullah (2010), and in the presence of heteroskedasticity, results are provided in Appendix A.7 of Ullah (2004).
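To make the role of the lag matrix and of the nonzero cross moment concrete, consider the simplest special case: a DP(1) without exogenous regressors and with i.i.d. errors of variance sigma2. Writing L for the strictly lower triangular matrix with ones on the first sub-diagonal and M_T for the T by T demeaning matrix, the lagged vector of unit i is a linear function of the errors, the fixed effect, and the initial value, and a short calculation gives E[y_{i,-1}' M_T u_i] = sigma2 * tr(M_T (I - phi L)^{-1} L) when the errors are independent of the fixed effects and initial values. The sketch below checks this expression against a Monte Carlo average. It is an illustration under these simplifying assumptions, not the general moment condition of this paper, and all function names and parameter values are hypothetical.

```python
import numpy as np

def lag_matrix(T):
    """T x T strictly lower triangular matrix with ones on the first sub-diagonal."""
    return np.eye(T, k=-1)

def expected_cross_moment(phi, sigma2, T):
    """sigma2 * tr(M_T (I - phi L)^{-1} L): E[y_{i,-1}' M_T u_i] in a DP(1)."""
    L = lag_matrix(T)
    M_T = np.eye(T) - np.ones((T, T)) / T
    return float(sigma2 * np.trace(M_T @ np.linalg.solve(np.eye(T) - phi * L, L)))

def simulated_cross_moment(phi, sigma2, T, N, rng):
    """Monte Carlo average of y_{i,-1}' M_T u_i over N independent units."""
    alpha = rng.standard_normal(N)        # fixed effects (illustrative draw)
    y_prev = rng.standard_normal(N)       # initial values, independent of the errors
    u = np.sqrt(sigma2) * rng.standard_normal((N, T))
    y_lag = np.empty((N, T))
    for t in range(T):
        y_lag[:, t] = y_prev
        y_prev = alpha + phi * y_prev + u[:, t]
    # M_T is symmetric, so y_lag' M_T u equals (demeaned y_lag)' u.
    y_lag_demeaned = y_lag - y_lag.mean(axis=1, keepdims=True)
    return float(np.mean(np.sum(y_lag_demeaned * u, axis=1)))

rng = np.random.default_rng(12345)
analytic = expected_cross_moment(phi=0.5, sigma2=1.0, T=5)
simulated = simulated_cross_moment(phi=0.5, sigma2=1.0, T=5, N=200_000, rng=rng)
```

The analytic value is strictly negative for these parameter values, which is the source of the well-known downward bias of the WG estimator in short panels; subtracting this expectation from the sample cross moment is what recentering refers to.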
5. Monte Carlo Evidence
In this section, Monte Carlo simulations are conducted to assess the finite-sample performance of the proposed estimator. Bao and Yu (2023) provide simulation results for the first- and second-order models when the idiosyncratic errors are either homoskedastic or temporally heteroskedastic. Recall that under the baseline framework, the proposed estimator in this paper is also robust to cross-sectional heteroskedasticity. This section considers a third-order model and, to cover the full spectrum of possible heteroskedasticity, includes four scenarios: homoskedasticity (across i and t), cross-sectional heteroskedasticity (across i only), temporal heteroskedasticity (over t only), and double (both cross-sectional and temporal) heteroskedasticity.
The following DP(3) is used:
where
is i.i.d. (across
i) following a standard normal distribution,
is i.i.d. (across
i) uniformly on the interval
,
is i.i.d. (across
i and
t) following a standard normal distribution, and
is similarly defined. The initial value of
is set to be
, where
is i.i.d. (across
i) following a standard normal distribution. The initial observations on
are simulated as
if there is no unit root and
otherwise, where
follows a stationary zero-mean third-order autoregressive (AR(3)) process with coefficients
,
, and
, and its shock term is a unit-variance white noise.
Let
be i.i.d. (across
i and
t) with mean 0 and variance 1. The error term
is simulated as follows in the four different scenarios (homoskedasticity, cross-sectional heteroskedasticity, temporal heteroskedasticity, and double heteroskedasticity):
where, for example,
means that
is independently (across
i) drawn from a uniform distribution on the interval
, and if the realization is no smaller than 100, then it is redrawn from a chi-squared distribution with 10 degrees of freedom.
is simulated from a normal distribution, but results under non-normal distributions are available upon request and the conclusions in this section are largely consistent across all distributions.
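The four error scenarios can be generated from a common standardized shock by multiplying it with a unit-specific scale, a period-specific scale, or both. The sketch below shows only this structure. The scale values used here (a uniform draw on [0.5, 1.5] across units and a linear grid from 0.5 to 1.5 over time) are placeholders chosen for illustration and are not the design used in the experiments of this section; the function name and scenario labels are likewise hypothetical.

```python
import numpy as np

def simulate_errors(N, T, scenario, rng):
    """Return an N x T array of errors under one of four heteroskedasticity scenarios.

    scenario: "homo", "cross", "time", or "double".
    """
    if scenario not in {"homo", "cross", "time", "double"}:
        raise ValueError(f"unknown scenario: {scenario}")
    eps = rng.standard_normal((N, T))     # standardized shocks: mean 0, variance 1
    unit_scale = np.ones(N)
    time_scale = np.ones(T)
    if scenario in {"cross", "double"}:
        unit_scale = rng.uniform(0.5, 1.5, size=N)   # placeholder unit-level scales
    if scenario in {"time", "double"}:
        time_scale = np.linspace(0.5, 1.5, T)        # placeholder period-level scales
    return unit_scale[:, None] * time_scale[None, :] * eps

N, T = 50, 20
u_homo = simulate_errors(N, T, "homo", np.random.default_rng(7))
u_time = simulate_errors(N, T, "time", np.random.default_rng(7))
u_double = simulate_errors(N, T, "double", np.random.default_rng(7))
```

Drawing the standardized shocks first means that, for a fixed seed, the scenarios share the same underlying shocks and differ only in their scaling, which makes comparisons across scenarios less noisy.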
In practice, the data themselves rarely reveal clearly the absence or presence of heteroskedasticity, so estimation results from the estimator using the recentered moment conditions (7) and the robust one using (25), referred to as RMM and RMM
, respectively, are presented. When the empirical rejection rates of a two-sided t-test of the null hypothesis that the relevant parameter equals its true value are reported, the empirical sizes for WG and GMM from t-ratios using White-type robust standard errors clustered at the individual level, referred to as WG(h) and GMM(h), respectively, are also included. There are different choices for the GMM estimator, depending on which and how many instruments are used. Breitung et al. (2022) show that the one-step estimator of Arellano and Bond (1991) is very comparable to other popular choices (Ahn and Schmidt 1995; Blundell and Bond 1998), so the GMM estimator used in this section is the one-step one.
Different combinations of N and T are considered: (N, T) = (100, 10), (50, 20), (50, 50), (25, 40), (20, 50), and (10, 100). Regardless of N and T, one can always use the estimator in this paper, but in practice, in a situation like
,
, to conduct inference, there is typically no convincing evidence for one to favor one asymptotic regime over the other. So in what follows, in addition to the bias and root mean squared error (RMSE), all out of 10,000 simulations, also reported are the empirical sizes using standard errors constructed under different asymptotics. In particular, RMM(
N) and RMM
(
N) denote the empirical sizes from
t-ratios using the large-
N standard errors (see
Section 3.2 and
Section 4.1) for RMM and RMM
, respectively. Likewise, RMM(
T) refers to the empirical size from the
t-ratio when the large-
T standard errors from (17) (or (22)) are used for RMM. Furthermore, RMM
(
) means when the feasible variance (
32) (or its version under unit root) is used for RMM
, which is valid if both
N and
T are large. On the other hand, RMM
(
T) does not mean that a different
t-ratio is used and instead, it signals that, for the RMM
t-ratio when the feasible variance is used, the
approximation developed by
Hansen (
2007) under large
T is used for conducting hypothesis testing.
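The summary statistics reported in this section (bias, RMSE, and the empirical rejection rate of a two-sided t-test of the true parameter value) can be computed from the simulation output along the following lines. This is a minimal sketch: the function name `summarize_replications` is hypothetical, and the default critical value 1.96, corresponding to a 5% nominal level under a normal approximation, is an assumption; a different critical value would be supplied when a different reference distribution is used.

```python
import numpy as np

def summarize_replications(estimates, std_errors, true_value, critical_value=1.96):
    """Bias and RMSE (both multiplied by 100) and the empirical size of the t-test."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    errors = estimates - true_value
    t_ratios = errors / std_errors
    return {
        "bias_x100": 100.0 * float(np.mean(errors)),
        "rmse_x100": 100.0 * float(np.sqrt(np.mean(errors ** 2))),
        "size": float(np.mean(np.abs(t_ratios) > critical_value)),
    }

# Toy input with two replications, used only to show the call.
summary = summarize_replications(
    estimates=[0.4, 0.6], std_errors=[0.05, 0.2], true_value=0.5
)
```

In the toy call the two estimation errors offset each other, so the bias is zero while the RMSE is not, and only the first replication, whose t-ratio is -2, leads to a rejection.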
Included for comparison are the bias-corrected estimators of Bun and Carree (2006) (BC for short) and Juodis (2013) (BCJ for short). As Juodis (2013) points out, BC is consistent under homoskedasticity and cross-sectional heteroskedasticity, but invalid under temporal heteroskedasticity. BCJ, on the other hand, is robust to both forms of heteroskedasticity. Neither Bun and Carree (2006) nor Juodis (2013) derives the asymptotic distributions of their bias-corrected estimators, so only their bias and RMSE results are reported in this section. Also included is the half-panel jackknife (HPJ) estimator of Chudik et al. (2018), where errors can be heteroskedastic across both i and t.
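The idea behind the half-panel jackknife can be sketched for the simplest case of a DP(1) without exogenous regressors: the WG estimator is computed on the full panel and on the two half-panels obtained by splitting the time dimension, and the combination 2 * (full) - 0.5 * (first half + second half) removes the leading term of the bias. The code below is an illustration of this generic formula under a stationary simulated design; it is not the implementation used for the results in this section, and all function names, the burn-in length, and the parameter values are assumptions.

```python
import numpy as np

def wg_ar1(y):
    """WG estimate of the AR(1) coefficient from an N x (T+1) array (first column: initial value)."""
    y_lag, y_cur = y[:, :-1], y[:, 1:]
    y_lag_d = y_lag - y_lag.mean(axis=1, keepdims=True)
    y_cur_d = y_cur - y_cur.mean(axis=1, keepdims=True)
    return float(np.sum(y_lag_d * y_cur_d) / np.sum(y_lag_d ** 2))

def hpj_ar1(y):
    """Half-panel jackknife: 2 * full-sample WG minus the average of the two half-panel WGs."""
    T = y.shape[1] - 1
    if T % 2 != 0:
        raise ValueError("this sketch requires an even number of periods")
    h = T // 2
    full = wg_ar1(y)
    first_half = wg_ar1(y[:, : h + 1])    # periods 1..h, initial value y_0
    second_half = wg_ar1(y[:, h:])        # periods h+1..T, initial value y_h
    return 2.0 * full - 0.5 * (first_half + second_half)

def simulate_dp1(N, T, phi, rng, burn_in=50):
    """Simulate a stable DP(1) with standard normal fixed effects and errors."""
    alpha = rng.standard_normal(N)
    y = np.empty((N, burn_in + T + 1))
    y[:, 0] = alpha / (1.0 - phi)         # start at the unit-specific long-run mean
    for t in range(1, burn_in + T + 1):
        y[:, t] = alpha + phi * y[:, t - 1] + rng.standard_normal(N)
    return y[:, burn_in:]                 # keep the last T+1 observations

rng = np.random.default_rng(2024)
panel = simulate_dp1(N=2000, T=10, phi=0.5, rng=rng)
wg_estimate = wg_ar1(panel)
hpj_estimate = hpj_ar1(panel)
```

Because each half-panel has only half as many periods, its WG bias is roughly twice that of the full panel, which is why the weights 2 and 0.5 cancel the leading bias term; the price is a larger variance, consistent with the weaker performance of HPJ in short panels.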
In the experiments,
and three sets of parameter configurations of
are used:
,
, and
. The first configuration reflects a situation in which the degree of time-series correlation is relatively strong in the sense that the cumulative partial effect of a past shock, measured by powers of
, can be high. The second set corresponds to a case of zero past effect, and the last one is a unit-root case where the past effect never dies out. To save space, only results related to
under the first set of parameter configuration
are reported, whereas results under the other two parameter configurations are contained in
Appendix F.
Table 1 reports the bias and RMSE, both multiplied by 100, and the empirical rejection rate of the two-sided
t-test related to
when
is homoskedastic. One sees clearly the superb performance of both RMM and RMM
, with the smallest bias and lowest RMSE. GMM performs reasonably well, but its bias and RMSE peak at
. The WG estimator performs the worst when
T is small but improves as
T gets relatively larger, but is still more biased even when
compared with RMM or RMM
. BC and BCJ report larger bias and higher RMSE than GMM on many occasions, though there are also cases where they are much better than GMM. Note that BC and BCJ bias-correct the WG estimator, which is in fact consistent under large
T.
Table A1 in
Appendix F reveals that when there is a unit root, BC and BCJ give a much larger bias than WG, even when
T is relatively large. So, in this case, the action of bias-correction gives more biased estimates. HPJ is the most biased and possesses the highest RMSE among all the consistent estimators at
, namely, when the panel is relatively short. Its performance improves quickly as
T increases, though still quite a bit below RMM/RMM
.
Now, consider the size performance of the associated two-sided t-test from the different estimators. From Table 1, one sees severe size distortions of the WG-based inference when
T is relatively small, but its empirical size is close to the nominal size when
. The GMM-based inference performs really poorly. The
t-test based on HPJ severely over-rejects when
, but its size distortion goes down quickly when the panel has longer spans. The large-
N-based inferences from RMM and RMM
, namely, RMM(
N) and RMM
(
N), give empirical sizes close to
when
N is relatively large. On the other hand, RMM(
T) delivers a reasonably good size performance even when
T is relatively small. Results from RMM
(
) and RMM
(
N) are mixed: when they work, they may perform slightly worse than RMM(
N) and RMM(
T), but their performances get worse when
N is small and
T is large. Finally, the size result from RMM
(
T), namely, using the
approximation for the asymptotic distribution of the
t-ratio, is good in almost all cases. Recall that the
approximation is valid under homoskedasticity. These results are not surprising, given that homoskedastic errors are simulated and that, for the robust estimator, the feasible standard errors based on (
32) (or its version under unit root) require both
N and
T to be large. In fact, when
, the size distortion from RMM
(
) is the smallest compared with other
combinations with relatively smaller
N or
T.
Table 2 reports results under cross-sectional heteroskedasticity. Relative performances of these different estimators largely stay the same. Notably, BC and BCJ perform really poorly when
, though they are designed to bias-correct the WG estimator under large
N and fixed
T. In contrast to the homoskedastic case, HPJ gives a smaller bias and lower RMSE than RMM/RMM
when
, but usually performs worse in other cases. In terms of hypothesis testing, when
N is relatively large, both RMM(
N) and RMM
(
N) work well; when
T is relatively large, both RMM(
T) and RMM
(
T) provide empirical rejection rates close to the nominal size. Recall that the RMM estimator designed under homoskedasticity is also valid under cross-sectional heteroskedasticity, and so is the related inference procedure. There is pronounced size distortion from HPJ for the unit root case (see
Table A2 in
Appendix F), but its size performance is reasonably good under the other two parameter configurations when
T is not relatively small.
When there is time-series heteroskedasticity, the robust estimator usually dominates all the other estimators in terms of bias and RMSE, as demonstrated in Table 3 (and Table A3 in Appendix F). In terms of hypothesis testing, the large-
N RMM
t-ratio delivers very good size performance when
N is relatively large, whereas the
approximation (RMM
(
T)) provides good results when
T is relatively large. The large-
N-large-
T RMM
-based inference reports upward size distortions in almost all cases.
Finally,
Table 4 provides results under double heteroskedasticity, namely, when both cross-sectional heteroskedasticity and time-series heteroskedasticity are present. RMM
is the least biased, except when
, where HPJ is slightly better. On the other hand, RMM, which ignores temporal heteroskedasticity, reports very small bias on many occasions, especially under the second and third parameter configurations (see
Table A4 in
Appendix F). In terms of RMSE, BCJ and RMM
are comparable. In terms of the size performance of the associated
t-test, the story is very similar to that reported when there is temporal heteroskedasticity only, namely, under large
N, RMM
(
N) is most trustworthy and RMM
(
T) is the one under large
T. GMM has the most size distortions in almost all cases, and HPJ gives severe size distortions when there is a unit root.
Summarizing all the simulation results, one can learn the following. (i) When there is no heteroskedasticity, the proposed estimator can be safely used, either the one based on (7) or the robust one based on (25), regardless of N and T. When heteroskedasticity in the time dimension is present, the robust version performs best. (ii) Whether or not a unit root is present has no substantial impact on its performance. (iii) In terms of inference, when
N is relatively large or is of comparable size relative to
T, the large
N-based inference from RMM
has very good size performance, regardless of heteroskedasticity. When
T is relatively large, the large
T-based inference from RMM gives reliable inference under homoskedasticity and cross-sectional heteroskedasticity. (iv) When
T is large, the
approximation for the
t-ratio from RMM
with the feasible robust variance usually has good size performance, regardless of heteroskedasticity.