FDML versus GMM for Dynamic Panel Models with Roots Near Unity

Abstract: This paper evaluates the first-differenced maximum likelihood (FDML) and the continuously updating system generalized method of moments (CU-GMM) estimators of dynamic panel models when the data is close to non-stationary. This case is far from trivial, as a high degree of persistence is the norm rather than the exception in economic panels, particularly in financial management. While the CU-GMM is shown to have lower bias and higher power, it suffers from severe size distortions, which are exacerbated when the data approaches non-stationarity.


Introduction
Dynamic panel data models are of crucial importance in empirical econometrics, since their applications can be found in virtually all subfields of economics. For empirical finance research in particular, several of the most cited papers in recent years have applied the dynamic panel model (Beck et al. 2000; Midrigan and Xu 2014; Wintoki et al. 2012). In the financial management literature, the dynamic panel model has been used to answer, for instance, whether cash flow impacts innovation (Brown et al. 2009; Carpenter and Guariglia 2008), and how dividend smoothing differs between private and public firms (Michaely and Roberts 2012). The model is chiefly used when the number of cross-sections, N, is high, but the number of time periods, T, is relatively small.
The econometric techniques for parameter estimation in the dynamic panel model have traditionally been based on the generalized method of moments (GMM). Two GMM-based methods for dynamic panels have been particularly successful: the difference GMM estimator, which is due to Arellano and Bond (1991), and the system GMM of Arellano and Bover (1995) and Blundell and Bond (1998). A likelihood-based estimator, the first-differenced ML (FDML), was developed by Hsiao et al. (2002). However, the performance of both these classes of estimators in the situation in which the data is close to non-stationary remains relatively unexplored. For the GMM, the system estimator is usually preferred to the difference estimator (cf. Blundell and Bond 1998; Hayakawa and Pesaran 2015). This is because of the considerable increase in bias that results from using the difference GMM in lieu of the system GMM for highly persistent data. Additionally, Hansen et al. (1996) have made further contributions to GMM theory by introducing the so-called continuously updating GMM (CU-GMM), which can be shown to decrease the finite-sample bias of GMM estimators.¹ As a high degree of persistence is the norm rather than an exception in economic panels, this is a situation with potentially serious ramifications. The finite-sample properties of the FDML in the almost non-stationary setting have not been evaluated either. Failing to perform in the nearly non-stationary case would be considered a serious drawback of this relatively new method, potentially limiting its usability in practical situations.
In this paper, I compare the local-to-unit-root performance of the system CU-GMM, which was tailor-made for this setting, with the FDML. The results of the Monte Carlo simulations show that the FDML has higher absolute bias in the nearly non-stationary case, particularly for low values of T, compared to the continuously updating system GMM. Moreover, for low values of T, the power of the FDML is lower than that of the system CU-GMM. However, the CU-GMM estimator suffers from severe oversizing, which is exacerbated as N and T increase. This size distortion problem was not seen when applying the FDML.
This paper contributes to a rich literature on the CU-GMM and FDML estimators of dynamic panel models. Recent contributions on the CU-GMM include the work by Ashley and Sun (2016), who adjust the standard two-step estimator by applying continuous updating on the autoregressive parameter only, and by Kleibergen and Zhan (2021) on CU-GMM robustness tests under weak identification and potential misspecification. Further, a second strand in the literature focuses on the sensitivity of the CU-GMM to the choice of instruments. Newey (2004) gives standard errors for the CU-GMM that adjust for the number of overidentifying restrictions, whereas Donald et al. (2009) develop moment selection criteria for various GMM-based estimators, including the CU-GMM. Recent research on the FDML includes, for instance, extensions to the situation with arbitrary initial conditions (Kruiniger 2018b).
Despite its good theoretical properties, the FDML is used less often by practitioners than the GMM. What is more, recent research has shown that the FDML outperforms the GMM in terms of size, power, and bias in most finite-sample cases: Elhorst (2010) and Hsiao and Zhang (2015) provide evidence when comparing with the Arellano-Bond difference GMM, and Hayakawa and Pesaran (2015) extend this to include the system GMM, including the CU-adjusted estimator. However, none of these papers consider the situation with local-to-unit-root data,² for which the Arellano-Bond estimator is inconsistent due to weak instruments. The present paper aims to fill this gap.
The rest of the paper is structured as follows. Section 2 introduces the CU-GMM and FDML estimators for the dynamic panel model. Section 3 describes the Monte Carlo design. Section 4 presents the results of the simulation study. The paper concludes with Section 5.

Theory
The AR(1) dynamic panel data model can be described by
$$y_{it} = \alpha_i + \phi y_{i,t-1} + u_{it} \tag{1}$$
for individuals $i = 1, \ldots, N$ and time periods $t = 1, \ldots, T$, where $\alpha_i$ are the fixed effects, $\phi$ is the autoregressive (AR) parameter, and $u_{it}$ is the idiosyncratic error term. It is assumed that the error terms $u_{it}$ are independent and identically distributed with mean zero and variance $\sigma_u^2$. Using the fixed effects estimator to estimate (1) gives biased estimates of $\phi$ (Nickell 1981). For $T$ large, we can write $\operatorname{plim}_{N \to \infty} (\hat{\phi} - \phi) \approx -(1 + \phi)/(T - 1)$, meaning that when $\phi$ is near unity, the bias can be sizable.
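The size of this bias can be illustrated with a small simulation. The following is a minimal sketch and not part of the paper's Monte Carlo design: the function names, the N(0, 1) fixed effects, the unit error variance and the burn-in length are all assumptions made for illustration. It compares the bias of the within (fixed effects) estimator with the large-T approximation stated above.

```python
import random

def simulate_panel(n, t, phi, seed=0, burn_in=50):
    """Simulate y_it = alpha_i + phi*y_{i,t-1} + u_it; returns n lists of t+1 levels."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)   # assumption: N(0,1) fixed effects
        y = alpha / (1.0 - phi)       # start at the unit-specific mean
        series = []
        for s in range(burn_in + t + 1):
            y = alpha + phi * y + rng.gauss(0.0, 1.0)
            if s >= burn_in:
                series.append(y)
        panel.append(series)
    return panel

def within_estimator(panel):
    """Fixed effects estimate of phi: regress demeaned y_t on demeaned y_{t-1}."""
    num = den = 0.0
    for series in panel:
        lagged, current = series[:-1], series[1:]
        mean_lag = sum(lagged) / len(lagged)
        mean_cur = sum(current) / len(current)
        for x, y in zip(lagged, current):
            num += (x - mean_lag) * (y - mean_cur)
            den += (x - mean_lag) ** 2
    return num / den

def nickell_approx(phi, t):
    """Large-T approximation to the asymptotic bias of the within estimator."""
    return -(1.0 + phi) / (t - 1.0)

if __name__ == "__main__":
    phi, t = 0.5, 10
    estimate = within_estimator(simulate_panel(2000, t, phi, seed=1))
    print("simulated bias:", estimate - phi)
    print("approximation: ", nickell_approx(phi, t))
```

Because the approximation is stated for large T, the simulated and approximate figures should be expected to agree only roughly when T is small.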
The first to propose a consistent estimator of $\phi$ in (1) were Anderson and Hsiao (1981). However, the Anderson-Hsiao estimator is asymptotically inefficient for all values of φ, and the absolute bias and variance of the estimator both increase significantly when φ approaches unity (cf. Arellano and Bover 1995). These problems have caused the empirical importance of the Anderson-Hsiao estimator to decline significantly. Instead, this paper considers GMM and FDML, which dominate present-day empirical research.
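For reference, the Anderson-Hsiao idea can be sketched in a few lines. This is a hedged illustration only: it uses the variant that instruments the lagged difference with the level two periods back, and the function names and simulation settings (N(0, 1) fixed effects, unit error variance, burn-in) are assumptions made here, not taken from the paper.

```python
import random

def simulate_panel(n, t, phi, seed=0, burn_in=50):
    """Simulate y_it = alpha_i + phi*y_{i,t-1} + u_it; returns n lists of t+1 levels."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)   # assumption: N(0,1) fixed effects
        y = alpha / (1.0 - phi)       # start at the unit-specific mean
        series = []
        for s in range(burn_in + t + 1):
            y = alpha + phi * y + rng.gauss(0.0, 1.0)
            if s >= burn_in:
                series.append(y)
        panel.append(series)
    return panel

def anderson_hsiao(panel):
    """IV estimate of phi in the differenced equation, instrumenting
    the lagged difference with the level y_{i,t-2}."""
    num = den = 0.0
    for series in panel:
        for t in range(2, len(series)):
            dy_t = series[t] - series[t - 1]
            dy_lag = series[t - 1] - series[t - 2]
            num += series[t - 2] * dy_t     # instrument times dependent variable
            den += series[t - 2] * dy_lag   # instrument times regressor
    return num / den

if __name__ == "__main__":
    for phi in (0.5, 0.9):
        print(phi, anderson_hsiao(simulate_panel(3000, 8, phi, seed=2)))
```

Running the sketch for values of φ closer to one gives a feel for the loss of precision described above, although a single draw is of course no substitute for a proper Monte Carlo study.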

CU-GMM
The first estimation technique considered in this paper is the system CU-GMM. I focus on the CU version of the GMM because it has received relatively little attention in the literature compared to the one-step and two-step GMM estimators.
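Let $\pi_{it} = \alpha_i + u_{it}$. Then, Arellano and Bond (1991) show that for $t = 3, \ldots, T$, the moment conditions $E[y_{is} \Delta\pi_{it}] = 0$, $s = 1, \ldots, t-2$, where $\Delta$ is the difference operator, can be utilized. Let the instrument matrix be
$$Z_i = \begin{pmatrix} y_{i1} & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & y_{i1} & y_{i2} & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & y_{i1} & \cdots & y_{i,T-2} \end{pmatrix},$$
and let the vector of first-differenced errors be $\Delta\pi_i = (\Delta\pi_{i3}, \ldots, \Delta\pi_{iT})'$. Using this notation, the moment conditions can be written $E[Z_i' \Delta\pi_i] = 0$ for $i = 1, \ldots, N$. However, Blundell and Bond (1998) show that the Arellano-Bond estimator significantly underestimates $\phi$ in the local-to-unity case. In order to remedy this problem, it is possible to introduce additional moment conditions, namely $E[\pi_{it} \Delta y_{i,t-s}] = 0$ for $t = 3, \ldots, T$ and $i = 1, \ldots, N$. For $u_{it} \sim MA(0)$, it holds that $s = 1$, and if $u_{it} \sim MA(1)$, then $s = 2$. The joint moment conditions can be written in matrix form as
$$E\left[ \begin{pmatrix} Z_i & 0 \\ 0 & \operatorname{diag}(\Delta y_{i,3-s}, \ldots, \Delta y_{i,T-s}) \end{pmatrix}' \begin{pmatrix} \Delta\pi_i \\ \pi_i \end{pmatrix} \right] = 0,$$
where $\Delta\pi_i$ is as defined previously and $\pi_i = (\pi_{i3}, \ldots, \pi_{iT})'$, for $i = 1, \ldots, N$. Using these additional moment conditions, the Blundell-Bond (otherwise known as the system GMM) CU estimator is the solution to the optimization problem
$$\hat{\phi} = \arg\min_{\phi \in \Phi} \left( \frac{1}{N} \sum_{i=1}^{N} g_i(\phi) \right)' W_N \left( \frac{1}{N} \sum_{i=1}^{N} g_i(\phi) \right),$$
where $g_i(\phi)$ denotes the stacked vector of moment functions inside the expectation above, evaluated at $\phi$, $\Phi$ is the compact set of all possible parameters and $W_N$ is a positive semi-definite weight matrix, for which it holds that $W_N \xrightarrow{P} W$ by the law of large numbers. The CU technique allows for the weight matrix to be a function of $\phi$. Thus, instead of fixing $W_N$ in each stage of the estimation, it is altered as the candidate value of $\phi$ changes during the minimization process. This reduces the finite-sample bias of the estimator without altering its asymptotic properties (Hansen et al. 1996; Newey and Smith 2004; Pakes and Pollard 1989).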
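The continuous-updating idea can be made concrete with a deliberately reduced sketch. It is not the estimator used in the simulations: to keep the weight matrix 2 x 2, the instruments are collapsed (an assumption made here) into one moment for the differenced equation, with instrument y_{i,t-2}, and one for the level equation, with instrument Δy_{i,t-1} (the case s = 1). The minimization is a plain grid search, and all function names and simulation settings are hypothetical.

```python
import random

def simulate_panel(n, t, phi, seed=0, burn_in=50):
    """Simulate y_it = alpha_i + phi*y_{i,t-1} + u_it; returns n lists of t+1 levels."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)   # assumption: N(0,1) fixed effects
        y = alpha / (1.0 - phi)       # stationary start, needed for the level moment
        series = []
        for s in range(burn_in + t + 1):
            y = alpha + phi * y + rng.gauss(0.0, 1.0)
            if s >= burn_in:
                series.append(y)
        panel.append(series)
    return panel

def moment_parts(series):
    """Return (a, b) such that the two collapsed moments are g_i(phi) = a - phi*b."""
    a1 = b1 = a2 = b2 = 0.0
    for t in range(2, len(series)):
        dy_t = series[t] - series[t - 1]
        dy_lag = series[t - 1] - series[t - 2]
        a1 += series[t - 2] * dy_t      # differenced equation, instrument y_{t-2}
        b1 += series[t - 2] * dy_lag
        a2 += dy_lag * series[t]        # level equation, instrument dy_{t-1}
        b2 += dy_lag * series[t - 1]
    return (a1, a2), (b1, b2)

def cu_objective(phi, parts):
    """gbar(phi)' S(phi)^{-1} gbar(phi), with S re-estimated at every phi."""
    n = len(parts)
    g = [(a[0] - phi * b[0], a[1] - phi * b[1]) for a, b in parts]
    m1 = sum(x for x, _ in g) / n
    m2 = sum(y for _, y in g) / n
    s11 = sum((x - m1) ** 2 for x, _ in g) / n
    s22 = sum((y - m2) ** 2 for _, y in g) / n
    s12 = sum((x - m1) * (y - m2) for x, y in g) / n
    det = s11 * s22 - s12 ** 2
    # Quadratic form with the explicit inverse of the 2x2 covariance matrix.
    return (m1 * m1 * s22 - 2.0 * m1 * m2 * s12 + m2 * m2 * s11) / det

def cu_gmm(panel, lo=0.0, hi=1.1, step=0.005):
    """Grid-search minimizer of the continuously updated criterion."""
    parts = [moment_parts(s) for s in panel]
    grid = [lo + k * step for k in range(int(round((hi - lo) / step)) + 1)]
    return min(grid, key=lambda phi: cu_objective(phi, parts))

if __name__ == "__main__":
    print(cu_gmm(simulate_panel(1000, 8, 0.8, seed=3)))
```

The point of the sketch is the line that recomputes the moment covariance inside `cu_objective`: a two-step estimator would instead hold that matrix fixed at a first-stage estimate while minimizing.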
I will use $\hat{\phi}_{GMM}$ to denote the continuously updating system GMM estimator. Under stationarity, the following theorem regarding the limiting distribution of $\hat{\phi}_{GMM}$ holds.
The asymptotic bias is $O_p(\sqrt{T/N})$, which implies that as $N/T \to \infty$, the asymptotic bias disappears. Moreover, the limiting variance in (4) is equal to the Cramér-Rao lower bound (Hahn and Kuersteiner 2002). Hence, as $N/T \to \infty$, the GMM estimator is asymptotically efficient. However, if $|\phi| = 1$, the limiting distribution is no longer normal, as shown by Theorem 2.
Here, C denotes a standard Cauchy variate. However, it can be shown that normality can break down even for values of φ local to unity, although it has not been empirically tested how close to unity φ must be in order for the limiting distribution to become Cauchy (Phillips 2014).

FDML
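An alternative approach to estimating $\phi$ is by using FDML. In order to eliminate $\alpha_i$, take the first difference of (1) to obtain
$$\Delta y_{it} = \phi \Delta y_{i,t-1} + \Delta u_{it}.$$
For $t = 1$, the above expression is not well defined, since $\Delta y_{i1} = \phi \Delta y_{i0} + \Delta u_{i1}$ and $\Delta y_{i0}$ is not observable. However, by continuous substitution,
$$\Delta y_{i1} = \phi^m \Delta y_{i,-m+1} + \sum_{j=0}^{m-1} \phi^j \Delta u_{i,1-j}.$$
Now, the analysis will differ slightly depending on whether the process is stationary or not. Assume first that $|\phi| < 1$ and $m \to \infty$. Then, it holds that
$$E(\Delta y_{i1}) = 0, \quad V(\Delta y_{i1}) = \frac{2\sigma_u^2}{1 + \phi}, \quad \operatorname{Cov}(\Delta y_{i1}, \Delta u_{i2}) = -\sigma_u^2, \quad \operatorname{Cov}(\Delta y_{i1}, \Delta u_{it}) = 0 \text{ for } t = 3, \ldots, T.$$
Alternatively, if $|\phi| \geq 1$, the process is assumed to have started from a finite point $m$ periods before period 0. Assuming stationary increments, let $\Delta y_i = (\Delta y_{i1}, \ldots, \Delta y_{iT})'$, and
$$\Delta u_i^* = (\Delta y_{i1}, \Delta u_{i2}, \ldots, \Delta u_{iT})', \qquad \Omega = \operatorname{Cov}(\Delta u_i^*) = \sigma_u^2 \begin{pmatrix} \omega & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \ddots & \vdots \\ 0 & -1 & 2 & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{pmatrix},$$
where $\omega = (1/\sigma_u^2) V(\Delta y_{i1})$. This is equal to $2/(1 + \phi)$ if $|\phi| < 1$, and $c$ otherwise. To find the likelihood function of $\Delta y_i$, use that $\Delta u_i^*$ is a linear combination of $\Delta y_i$, and that the Jacobian of this transformation is equal to unity. Thus, the joint probability density functions (p.d.f.s) of $\Delta u_i^*$ and $\Delta y_i$ are equal. Then, assuming that the $u_{it}$ are independent normal, the joint p.d.f. of $\Delta y_i$ is equal to the likelihood function of (1), and is given by
$$L = (2\pi)^{-NT/2} |\Omega|^{-N/2} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{N} \Delta u_i^{*\prime} \Omega^{-1} \Delta u_i^* \right\},$$
with $\Delta u_{it} = \Delta y_{it} - \phi \Delta y_{i,t-1}$ for $t = 2, \ldots, T$. The corresponding log-likelihood is
$$\ln L = -\frac{NT}{2} \ln(2\pi) - \frac{N}{2} \ln|\Omega| - \frac{1}{2} \sum_{i=1}^{N} \Delta u_i^{*\prime} \Omega^{-1} \Delta u_i^*. \tag{10}$$
The two unknown elements of $\Omega$ are $\sigma_u^2$ and $\omega$. Proceeding from here, the FDML technique involves utilizing the Anderson-Hsiao estimator $\hat{\phi}_{AH}$ to find an initial estimate of $\phi$. Then, the variance $\sigma_u^2$ is estimated by $\hat{\sigma}_u^2$. In the stationary case, $\omega$ can be estimated by $2(1 + \hat{\phi}_{AH})^{-1}$; in the non-stationary case, a separate estimate $\hat{\omega}$ is used. Then, (10) is maximized numerically until convergence. Theorems 3 and 4 provide the asymptotic results for the FDML.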
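A hedged sketch of the likelihood computation may help fix ideas. It evaluates the Gaussian log-likelihood of the first differences, with a first-period variance ratio ω, 2 on the remaining diagonal and -1 on the off-diagonals of the scaled covariance matrix. It relies on the determinant identity |Ω| = σ_u^{2T}(1 + T(ω - 1)), which follows from expanding the tridiagonal determinant along its first row. The maximization shown is a simplification assumed here for brevity: ω is tied to φ through the stationary value 2/(1 + φ), σ_u² is concentrated out, and a grid search replaces the numerical optimizer started from Anderson-Hsiao estimates. All function names are hypothetical.

```python
import math
import random

def simulate_panel(n, t, phi, seed=0, burn_in=50):
    """Simulate y_it = alpha_i + phi*y_{i,t-1} + u_it; returns n lists of t+1 levels."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)   # assumption: N(0,1) fixed effects
        y = alpha / (1.0 - phi)
        series = []
        for s in range(burn_in + t + 1):
            y = alpha + phi * y + rng.gauss(0.0, 1.0)
            if s >= burn_in:
                series.append(y)
        panel.append(series)
    return panel

def solve_tridiagonal(omega, rhs):
    """Solve A x = rhs for A with diagonal (omega, 2, ..., 2) and off-diagonals -1."""
    n = len(rhs)
    diag = [omega] + [2.0] * (n - 1)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = -1.0 / diag[0], rhs[0] / diag[0]
    for k in range(1, n):               # forward sweep (Thomas algorithm)
        denom = diag[k] + cp[k - 1]
        cp[k] = -1.0 / denom
        dp[k] = (rhs[k] + dp[k - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for k in range(n - 2, -1, -1):      # back substitution
        x[k] = dp[k] - cp[k] * x[k + 1]
    return x

def quad_form(phi, omega, panel):
    """Sum over i of e_i' A^{-1} e_i, with e_i = (dy_1, dy_2 - phi*dy_1, ...)."""
    total = 0.0
    for series in panel:
        dy = [series[t] - series[t - 1] for t in range(1, len(series))]
        e = [dy[0]] + [dy[t] - phi * dy[t - 1] for t in range(1, len(dy))]
        total += sum(a * b for a, b in zip(e, solve_tridiagonal(omega, e)))
    return total

def fdml_loglik(phi, sigma2, omega, panel):
    """Gaussian log-likelihood of the first differences."""
    n, t = len(panel), len(panel[0]) - 1
    log_det = t * math.log(sigma2) + math.log(1.0 + t * (omega - 1.0))
    return (-0.5 * n * t * math.log(2.0 * math.pi) - 0.5 * n * log_det
            - 0.5 * quad_form(phi, omega, panel) / sigma2)

def fdml_stationary(panel, grid):
    """Grid-search maximizer under the stationary restriction omega = 2/(1+phi)."""
    n, t = len(panel), len(panel[0]) - 1
    def profile(phi):
        omega = 2.0 / (1.0 + phi)
        sigma2 = quad_form(phi, omega, panel) / (n * t)   # concentrated variance
        return (-0.5 * n * t * (math.log(2.0 * math.pi) + math.log(sigma2) + 1.0)
                - 0.5 * n * math.log(1.0 + t * (omega - 1.0)))
    return max(grid, key=profile)

if __name__ == "__main__":
    grid = [k / 50 for k in range(50)]
    print(fdml_stationary(simulate_panel(1000, 6, 0.5, seed=4), grid))
```

Treating ω as a free parameter, as the text describes, would add one dimension to the search but leave `fdml_loglik` unchanged.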
Hence, the FDML estimator is asymptotically unbiased, asymptotically normal and, for T large, asymptotically efficient. For the unit root case, Theorem 4 holds.

Proof. See Kruiniger (2008a).
The rate of convergence is $O_p(T\sqrt{N})$, which is faster than the rate of convergence in the stationary case, which is $O_p(\sqrt{NT})$ according to Theorem 3. Note that normality of the FDML does not break down in the unit root case, which is a clear contrast to the GMM.

Monte Carlo Setup
I now proceed to evaluate the performance of the CU-GMM and FDML in the close-to-non-stationary setting. The model of interest is
$$y_{it} = (1 - \phi_0)\alpha_i + \phi_0 y_{i,t-1} + u_{it}, \tag{15}$$
where $u_{it} \sim N(0, 1)$ or $u_{it} \sim N(0, 2)$, and the individual effects $\alpha_i$ are multiplied by $(1 - \phi_0)$.³ The AR parameter is varied according to $\phi_0 \in \{0.90, 0.95, 0.99\}$. The number of Monte Carlo replications is set to 1000.
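A data-generating routine along these lines might look as follows. This is only a sketch: the distribution of the individual effects and the treatment of initial observations are not recoverable from the text, so the N(0, 1) effects, the start at the unit mean and the burn-in period are assumptions, as is the function name. The scaling of the individual effects by (1 - φ₀) follows the parameterization described in note 3.

```python
import random

def generate_panel(n, t, phi0, error_var=1.0, seed=0, burn_in=100):
    """Draw y_it = (1 - phi0)*alpha_i + phi0*y_{i,t-1} + u_it for n units.

    Returns a list of n series, each holding t + 1 levels (y_i0, ..., y_iT).
    """
    rng = random.Random(seed)
    sd = error_var ** 0.5
    panel = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)   # assumption: effect distribution not given
        y = alpha                     # with this scaling the unit mean is alpha_i
        series = []
        for s in range(burn_in + t + 1):
            y = (1.0 - phi0) * alpha + phi0 * y + rng.gauss(0.0, sd)
            if s >= burn_in:          # discard the burn-in draws
                series.append(y)
        panel.append(series)
    return panel

if __name__ == "__main__":
    # One draw per design point; the study itself uses 1000 replications.
    for phi0 in (0.90, 0.95, 0.99):
        for error_var in (1.0, 2.0):
            sample = generate_panel(150, 30, phi0, error_var, seed=0)
            print(phi0, error_var, len(sample), len(sample[0]))
```

The panel dimensions in the usage loop are placeholders; the values of N and T would follow the grid reported in the tables.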

Results
Tables 1-3 present the mean and median biases, size, and power for the FDML and CU system GMM, using conventional standard errors in computing size and power. Table 1 corresponds to φ = 0.90, Table 2 to φ = 0.95, and Table 3 is for the case when φ = 0.99.
The absolute biases are generally larger when φ is closer to unity; this applies for both estimators. For φ = 0.90 and φ = 0.95, the FDML generally outperforms the GMM in terms of bias except when T is low. When φ = 0.99, the GMM estimator is better for virtually every combination of N and T; the exceptions are the mean bias for the combination of T = 50 and N = 500, as well as the median biases for T = 30 when N = 150 and 500. The absolute mean and median biases are close to zero for the GMM when φ = 0.99. The FDML performs relatively poorly when φ is this close to unity, especially when T < 20.
Considering that the Blundell-Bond version of the GMM is more or less tailor-made for the situation with close-to-non-stationary data, the bias results do not come as a major surprise. Also, due to the close relationship between $\hat{\phi}_{FDML}$ and the poor initial estimate obtained from the Anderson-Hsiao estimator, one would expect higher absolute FDML bias when φ is closer to unity, given the discussion in Section 2. This is confirmed by the Monte Carlo results. The mean and median biases are generally negative for the FDML, while for GMM, they tend to be negative for φ = 0.90 and φ = 0.95, and positive for φ = 0.99.
While the performance of the GMM is superior to the FDML in terms of bias, the size is considerably higher than 5% for the GMM, irrespective of the value of φ. Additionally, the size of the GMM estimator is increasing both with the AR parameter φ and with the number of time periods, T. This effect is not observed in the likelihood estimator. The results further show that the size problem for the GMM is exacerbated when φ is very close to one. For example, when φ = 0.99, the size is above 96% even for N = 150 and T = 30.
Hence, even under the restriction |φ| < 1, the value of φ is an important determinant of the size performance.
Regarding the power, it is considerably lower for the FDML, especially for small values of T. Additionally, the power of the FDML deteriorates as φ approaches unity. However, when T = 30 and 50, the power of both estimators is close to 100%.
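The quantities discussed in this section can be computed from replication output in a few lines. The following is a sketch under stated assumptions: size and power are taken to be rejection frequencies of a two-sided t-test at the 5% level with a normal critical value, and the alternative `phi_alt` used for power is a hypothetical argument, since the text does not state which alternative was tested. The function name and the numbers in the usage example are illustrative only.

```python
import statistics

def summarize(estimates, std_errors, phi0, phi_alt, crit=1.96):
    """Mean bias, median bias, empirical size and power over replications.

    phi0 is the true AR parameter; phi_alt is a false null value used for
    power (an assumption of this sketch).
    """
    biases = [est - phi0 for est in estimates]

    def rejection_rate(null_value):
        rejections = [abs((est - null_value) / se) > crit
                      for est, se in zip(estimates, std_errors)]
        return sum(rejections) / len(rejections)

    return {
        "mean_bias": statistics.mean(biases),
        "median_bias": statistics.median(biases),
        "size": rejection_rate(phi0),      # rejecting a true null
        "power": rejection_rate(phi_alt),  # rejecting a false null
    }

if __name__ == "__main__":
    print(summarize([0.88, 0.91, 0.95], [0.01, 0.02, 0.01],
                    phi0=0.9, phi_alt=0.8))
```

An estimator whose standard errors are too small relative to its sampling variability will show a `size` well above the nominal level even when its `mean_bias` is close to zero, which is the pattern reported for the CU-GMM above.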

Concluding Remarks
This paper compares the bias, size and power of the system CU-GMM and FDML estimators of the AR coefficient in the dynamic panel model when the process is highly persistent, that is, when the AR coefficient is local to unity. This setting is particularly important in empirical finance, as most financial data is persistent.
The main finding of the paper is the relatively large increase in absolute bias of the FDML as the value of the AR parameter approaches unity. Moreover, the absolute bias of the CU-GMM is lower for most combinations of N and T. However, the CU-GMM is shown to suffer from severe size distortions, implying the existence of a trade-off between precision and size. The size is increasing both with the value of the AR parameter and with the number of time periods T. This result is a further contribution of this paper. The power of the FDML is considerably lower than that of the GMM when T is small. However, the power of both estimators is shown to be close to 100% with N and T sufficiently large.
Overall, although the FDML has slightly higher bias than the CU-GMM when φ is close to unity, its high power and correct size make it a viable alternative to the hitherto dominating GMM-based methods in most empirical settings. Thus, the findings in this paper have broader implications for applications in the financial management field. In empirical research, FDML estimates could be reported in lieu of, or alongside, GMM estimates when using the AR(1) dynamic panel model. Additionally, the FDML allows practitioners to estimate the dynamic panel model when the number of time periods (T) is close to the number of cross-sectional units (N).⁴ This is a fairly common situation when using financial management data.
Finally, since dynamic panel models are highly useful in the financial sector, the results of this paper have implications for decision-makers within the sector. This is because precision in econometric estimates is crucial for making correct investment decisions. Specifically, finance practitioners specialized in global quantitative strategy may use the FDML as an alternative to the CU-GMM when, for example, monitoring for bubbles in equity prices, or for comparing capital structure and payout policy between firms.
A limitation of the study is that there are several other GMM-based estimators widely used by practitioners, for instance the one-step and two-step estimators. However, the continuously updating weight matrix is tailor-made for the situation with local-to-unit-root data, making it the obvious competitor to the FDML in the present setting.

¹ For additional examples of bias-reducing methods for GMM estimators of dynamic panel models, see Choi et al. (2010) or Mehic (2020).
² I consider processes for which the autoregressive parameter is greater than 0.9 to be highly persistent. The maximum values for the autoregressive parameter were 0.8 in Elhorst (2010) and Hsiao and Zhang (2015), and 0.9 in Hayakawa and Pesaran (2015).
³ The parameterization in (15), where the individual effects are multiplied by (1 − φ₀), is a standard approach in the literature when dealing with almost non-stationary data (Han and Phillips 2013; Bun et al. 2017). Without this correction, the individual effects would have too much of an impact on the results when the true value φ₀ is close to unity.