Article

Second-Order Least Squares Method for Dynamic Panel Data Models with Application

1
Department of Statistics, Cairo University, Giza 12613, Egypt
2
Department of Statistics, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
*
Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2021, 14(9), 410; https://doi.org/10.3390/jrfm14090410
Submission received: 9 August 2021 / Revised: 20 August 2021 / Accepted: 21 August 2021 / Published: 1 September 2021
(This article belongs to the Special Issue Predictive Modeling for Economic and Financial Data)

Abstract

Management of financial risks and sound decision making rely on accurate information and predictive models. Drawing useful information efficiently from big data with complex structures and building accurate models are therefore crucial tasks. The most commonly used methods for statistical inference in dynamic panel data models are based on a differencing transformation of the data. However, differencing may cause a substantial loss of information, so the subsequent analysis may fail to capture important features of the original level data. This point is demonstrated by a real data example in which we apply a semiparametrically efficient estimation method to the level data and reach a more favorable model. In particular, we study a second-order least squares approach that is based on the first two conditional moments of the response variable given the explanatory variables. This estimator is root-N consistent and its asymptotic variance attains a semiparametric efficiency lower bound. Monte Carlo simulations show that this estimator performs favorably in finite sample situations compared to the first-differenced GMM and the random effects pseudo ML estimators. We also propose a new diagnostic test, based on the proposed estimator, to check the working moments assumption. A real data application is presented to further demonstrate the use of this method.

1. Introduction

Modern technology and data collection techniques have produced huge amounts of data in business, economics, and many other fields. On the one hand, these data provide rich information to support decision making; on the other hand, their size and complex structure make statistical analysis very challenging. How to draw useful information from data efficiently and how to identify accurate predictive models are important issues for the management of financial risks and for decision making in general. A common data type is repeated measurements data, which are collected on a large number of units over a certain period of time. In business and economics, the data are typically collected at regular time points (calendar time) and are usually called panel data. Statistical methodologies for the analysis of longitudinal data have been extensively studied in statistics as well as in econometrics; however, the research follows two different directions.
In the statistical literature, mainstream research focuses on the likelihood or generalized estimating equations (GEE) approaches in linear and nonlinear mixed effects models (e.g., Fitzmaurice et al. 2009). In contrast, in the econometric literature, the emphasis is on the likelihood and generalized method of moments (GMM) approaches in dynamic panel data models (e.g., Arellano 2003; Baltagi 2008; Hsiao 2003, 2011). In particular, the GMM approach is usually based on a suitable linear transformation, such as first differencing, to eliminate the unobserved subject effects (Arellano and Bover 1995; Arellano and Bond 1991; Blundell and Bond 1998). However, the differencing operation may cause a substantial loss of information in the data and therefore lead to a loss of estimation efficiency and to inaccurate model identification. We will show through a real data example that a more favorable model can be reached by applying a semiparametrically efficient estimation method to the original level data. Moreover, the model specification can be checked by our proposed diagnostic test.
In this paper, we propose a moments-based approach that is theoretically less restrictive than the likelihood-based methods and is fairly efficient and computationally tractable. This approach requires only the specification of the first two conditional moments of the unobserved subject effect given the process initial value and covariates, and does not require any other initial conditions or distributional assumptions. The data generating process can be either stationary or nonstationary, and does not need to be transformed. The so-called second-order least squares (SLS) estimator is consistent and asymptotically normally distributed when the cross section size N is large and the time series length T is fixed. In addition, this estimator reaches a semiparametric efficiency lower bound. We also propose a diagnostic test based on the SLS estimator to check the conditional moments assumption. Our extensive simulation studies show that the proposed estimator and its variants perform very well in finite sample situations and better than the GMM and likelihood based estimators in most cases.
The SLS method was first proposed by Wang (2003, 2004) to estimate the nonlinear measurement error models. It was extended to the nonlinear longitudinal data models with homoscedastic errors by Wang (2007) and to the censored linear models by Abarin and Wang (2009). Wang and Leblanc (2008) showed that under a nonlinear (homoscedastic) cross-sectional data model, the SLSE is asymptotically more efficient than the OLSE (pseudo MLE) when the error term has nonzero third moment, and both estimators are equally efficient otherwise. Kim and Ma (2012) proved that the SLSE attains the optimal semiparametric efficiency bound in general. More recently, the SLS method has also been used in optimal design problems by several researchers, e.g., Gao and Zhou (2014), Bose and Mukerjee (2015), Gao and Zhou (2017), Yin and Zhou (2017).
This paper is organized as follows. In Section 2 and Section 3 we introduce the model and the semiparametric method for estimation and testing. In Section 4 we derive the asymptotically most efficient version of the proposed estimators. In Section 5 we carry out Monte Carlo simulations to study finite sample properties of the proposed estimators and compare them with other commonly used methods. In Section 6 we apply our method to a real data set to demonstrate its practical merits. Finally, conclusions and discussion are given in Section 7, and regularity conditions and mathematical proofs are given in Appendices A to F.

2. Model and SLS Estimation

Let $(y_i, x_i, \eta_i)$, $i = 1, 2, \ldots, N$, be independent and identically distributed random vectors, where $y_i = (y_{i0}, y_{i1}, \ldots, y_{iT})'$ and $x_i = (x_{i1}', x_{i2}', \ldots, x_{iT}')'$ are respectively the measurements of the response variable and of $p$ covariates taken for the $i$th subject over $T$ time periods. Suppose
$$y_{it} = \alpha_0 y_{i(t-1)} + \beta_0' x_{it} + \eta_i + \varepsilon_{it}, \quad t = 1, 2, \ldots, T, \tag{1}$$
where $\eta_i$ is the unobserved subject effect and the error term $\varepsilon_{it}$ satisfies $E(\varepsilon_{it} \mid \eta_i, y_{i0}, x_i) = 0$ and $E(\varepsilon_{it}\varepsilon_{is} \mid \eta_i, y_{i0}, x_i) = \sigma_0^2$ if $s = t$, and zero otherwise. In addition, we assume that the conditional moments $E(\eta_i^j \mid y_{i0}, x_i) = f_j(y_{i0}, x_i, \theta_0)$, $j = 1, 2$, are known up to an $\ell$-dimensional unknown parameter vector $\theta_0$. This assumption is more general than the unrestricted initial conditions used by Blundell and Bond (1998) and Alvarez and Arellano (2003) to derive the conditional GLS (CGLS) and the random effects ML (RML) estimators, respectively. Note that our semiparametric assumption on $\eta_i$ is not as restrictive as it appears, because the functional forms of $f_j(y_{i0}, x_i, \theta_0)$, $j = 1, 2$, can be specified naturally based on some diagnostic tools, as illustrated in Section 6. Moreover, any suggested specification can be tested using the test developed in Section 3. Also note that although we explicitly deal with an AR(1) model for the sake of simplicity of notation, our approach can be extended to more general AR($p$) models, as discussed later.
We propose to estimate the unknown parameters in model (1) based on the first two conditional moments of the response $y_{it}$ given its initial value $y_{i0}$ and covariates $x_i$. Specifically, let $\gamma_0 = (\alpha_0, \sigma_0^2, \beta_0', \theta_0')'$ and let $\Gamma \subset \mathbb{R}^{p+\ell+2}$ be the corresponding parameter space. By backward substitution in Equation (1), we obtain the reduced form equation
$$y_{it} = \alpha_0^t y_{i0} + a_t(\alpha_0)\,\eta_i + \beta_0' \tilde{x}_{it}(\alpha_0) + \tilde{\varepsilon}_{it}(\alpha_0), \quad t = 1, 2, \ldots, T, \tag{2}$$
where $a_t(\alpha) = \sum_{r=0}^{t-1} \alpha^r$, $\tilde{x}_{it}(\alpha) = \sum_{r=0}^{t-1} \alpha^r x_{i(t-r)}$ and $\tilde{\varepsilon}_{it}(\alpha_0) = \sum_{r=0}^{t-1} \alpha_0^r \varepsilon_{i(t-r)}$. Under this model, the first two conditional moments of $y_{it}$ are given by
$$\mu_{it}(\gamma_0) = E(y_{it} \mid y_{i0}, x_i) = \alpha_0^t y_{i0} + \beta_0' \tilde{x}_{it}(\alpha_0) + a_t(\alpha_0) f_1(y_{i0}, x_i, \theta_0), \tag{3}$$
$$\begin{aligned} \nu_{its}(\gamma_0) = E(y_{it} y_{is} \mid y_{i0}, x_i) ={}& \alpha_0^{t+s} y_{i0}^2 + a_t(\alpha_0) a_s(\alpha_0) f_2(y_{i0}, x_i, \theta_0) + \beta_0' \tilde{x}_{it}(\alpha_0) \tilde{x}_{is}'(\alpha_0) \beta_0 \\ &+ \sigma_0^2 c_{ts}(\alpha_0) + d_{ts}(\alpha_0)\, y_{i0} f_1(y_{i0}, x_i, \theta_0) \\ &+ y_{i0}\, \beta_0' w_{its}(\alpha_0) + f_1(y_{i0}, x_i, \theta_0)\, \beta_0' k_{its}(\alpha_0), \end{aligned} \tag{4}$$
where $c_{ts}(\alpha) = \alpha^{t-s} \sum_{r=0}^{s-1} \alpha^{2r}$, $d_{ts}(\alpha) = \alpha^t a_s(\alpha) + \alpha^s a_t(\alpha)$, $w_{its}(\alpha) = \alpha^t \tilde{x}_{is}(\alpha) + \alpha^s \tilde{x}_{it}(\alpha)$ and $k_{its}(\alpha) = a_t(\alpha) \tilde{x}_{is}(\alpha) + a_s(\alpha) \tilde{x}_{it}(\alpha)$, $t \ge s$. Further, let
$$h_i(\gamma) = \left( y_{it} - \mu_{it}(\gamma),\ 1 \le t \le T;\ \ y_{it} y_{is} - \nu_{its}(\gamma),\ 1 \le s \le t \le T \right)', \quad \gamma \in \Gamma.$$
Then the second-order least squares (SLS) estimator for $\gamma$ is defined by
$$\hat{\gamma}_N = \operatorname*{argmin}_{\gamma \in \Gamma} \frac{1}{N} \sum_{i=1}^{N} q_i(\gamma), \tag{5}$$
where $q_i(\gamma) = h_i'(\gamma) W_i h_i(\gamma)$ and $W_i$ is a nonnegative definite matrix whose elements are real measurable functions of $(y_{i0}, x_i)$. It follows from the standard M-estimation theory that the SLS estimator $\hat{\gamma}_N$ has the following asymptotic properties under the regularity conditions given in Appendix A.
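Theorem 1.
(1) Under Assumptions A1–A3, $\hat{\gamma}_N \xrightarrow{a.s.} \gamma_0$ as $N \to \infty$ and $T$ is fixed.
(2) Under Assumptions A1–A6, $\sqrt{N}(\hat{\gamma}_N - \gamma_0) \xrightarrow{d} N(0, A^{-1} B A^{-1})$ as $N \to \infty$ and $T$ is fixed, where
$$A = E\left[ \frac{\partial h_1'(\gamma_0)}{\partial \gamma} W_1 \frac{\partial h_1(\gamma_0)}{\partial \gamma'} \right], \tag{6}$$
$$B = E\left[ \frac{\partial h_1'(\gamma_0)}{\partial \gamma} W_1 h_1(\gamma_0) h_1'(\gamma_0) W_1 \frac{\partial h_1(\gamma_0)}{\partial \gamma'} \right]. \tag{7}$$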
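To make the construction of $h_i(\gamma)$ and of the SLS objective concrete, the following Python sketch covers the special case of model (1) without covariates. It assumes, purely for illustration, the working moments $f_1 = \theta_1 + \theta_2 y_{i0}$ and $f_2 = f_1^2 + \exp(\theta_3)$ and the identity weight $W_i = I$; the function names (`h`, `sls_objective`, and the helpers) are hypothetical and do not come from any package.

```python
import numpy as np

def a(t, alpha):
    """a_t(alpha) = sum_{r=0}^{t-1} alpha^r."""
    return sum(alpha ** r for r in range(t))

def c(t, s, alpha):
    """c_ts(alpha) = alpha^(t-s) * sum_{r=0}^{s-1} alpha^(2r), for t >= s."""
    return alpha ** (t - s) * sum(alpha ** (2 * r) for r in range(s))

def d(t, s, alpha):
    """d_ts(alpha) = alpha^t a_s(alpha) + alpha^s a_t(alpha)."""
    return alpha ** t * a(s, alpha) + alpha ** s * a(t, alpha)

def h(gamma, y):
    """Rows are h_i(gamma) for the no-covariate model.

    y     : (N, T+1) array, column 0 holds the initial value y_i0.
    gamma : (alpha, sigma2, theta1, theta2, theta3), with the ASSUMED
            working moments f1 = theta1 + theta2*y0, f2 = f1^2 + exp(theta3).
    """
    alpha, sigma2, th1, th2, th3 = gamma
    y0 = y[:, 0]
    T = y.shape[1] - 1
    f1 = th1 + th2 * y0
    f2 = f1 ** 2 + np.exp(th3)
    cols = []
    # First block: y_it - mu_it(gamma), t = 1..T  (Equation (3) with beta = 0)
    for t in range(1, T + 1):
        mu_t = alpha ** t * y0 + a(t, alpha) * f1
        cols.append(y[:, t] - mu_t)
    # Second block: y_it*y_is - nu_its(gamma), 1 <= s <= t <= T  (Equation (4))
    for t in range(1, T + 1):
        for s in range(1, t + 1):
            nu = (alpha ** (t + s) * y0 ** 2
                  + a(t, alpha) * a(s, alpha) * f2
                  + sigma2 * c(t, s, alpha)
                  + d(t, s, alpha) * y0 * f1)
            cols.append(y[:, t] * y[:, s] - nu)
    return np.column_stack(cols)

def sls_objective(gamma, y):
    """(1/N) sum_i h_i' W_i h_i with the identity weight W_i = I."""
    H = h(gamma, y)
    return np.mean(np.sum(H ** 2, axis=1))
```

Passing `sls_objective` to a generic numerical minimizer would give an identity-weighted SLS estimate; each row of `h(gamma, y)` has $T + T(T+1)/2 = T(T+3)/2$ entries.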

3. A Diagnostic Test

The estimation approach in the previous section is based on the correct specification of the first two conditional moments of the response variables. In this section we propose a test for these moments, called SW for convenience. It is designed to test the hypotheses
$$H_0: E\left[ h_i(\gamma_0) \mid y_{i0}, x_i \right] = 0 \quad \text{vs.} \quad H_a: E\left[ h_i(\gamma_0) \mid y_{i0}, x_i \right] \ne 0.$$
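Specifically, let $h(\gamma) = \frac{1}{N} \sum_{i=1}^{N} h_i(\gamma)$. Then the test statistic is defined as
$$SW = N\, h'(\hat{\gamma}_N)\, \hat{G}_N^{-1}\, h(\hat{\gamma}_N),$$
where $\hat{\gamma}_N$ is defined by Equation (5) and $\hat{G}_N$ is given by
$$\hat{G}_N = \frac{1}{N} \sum_{i=1}^{N} \bar{P}_i(\hat{\gamma}_N)\, h_i(\hat{\gamma}_N)\, h_i'(\hat{\gamma}_N)\, \bar{P}_i'(\hat{\gamma}_N)$$
with
$$\bar{P}_i(\hat{\gamma}_N) = I - \bar{D}_N(\hat{\gamma}_N)\, A_N^{-1}\, \frac{\partial h_i'(\hat{\gamma}_N)}{\partial \gamma} W_i,$$
$$A_N = \frac{1}{N} \sum_{i=1}^{N} \frac{\partial h_i'(\hat{\gamma}_N)}{\partial \gamma} W_i \frac{\partial h_i(\hat{\gamma}_N)}{\partial \gamma'},$$
and
$$\bar{D}_N(\hat{\gamma}_N) = \frac{1}{N} \sum_{i=1}^{N} \frac{\partial h_i(\hat{\gamma}_N)}{\partial \gamma'}.$$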
The asymptotic distribution of the test statistic SW is given below.
Theorem 2.
Assume that $E\left[ P_i(\gamma_0)\, h_i(\gamma_0)\, h_i'(\gamma_0)\, P_i'(\gamma_0) \right]$ has full rank $K = T(T+3)/2$. Then under Assumptions A1–A6 and $H_0$, $SW \xrightarrow{d} \chi^2_K$ as $N \to \infty$ and $T$ is fixed.
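As a rough illustration of how the SW statistic can be assembled from its ingredients, the sketch below takes the values $h_i(\hat{\gamma}_N)$, the Jacobians $\partial h_i(\hat{\gamma}_N)/\partial\gamma'$ and the weights $W_i$ as given arrays. How these arrays are obtained (analytically or by numerical differentiation) is left open; the function name `sw_statistic` and the array layout are assumptions made for this example.

```python
import numpy as np

def sw_statistic(H, J, W):
    """Assemble SW = N * h_bar' G^{-1} h_bar.

    H : (N, K) array, row i is h_i(gamma_hat).
    J : (N, K, d) array, J[i] is the Jacobian dh_i/dgamma' at gamma_hat.
    W : (N, K, K) array of weight matrices W_i.
    Returns the statistic and its degrees of freedom K.
    """
    N, K = H.shape
    h_bar = H.mean(axis=0)                                   # h(gamma_hat)
    D_bar = J.mean(axis=0)                                   # (K, d)
    A_N = np.einsum("nkd,nkl,nle->de", J, W, J) / N          # mean of J_i' W_i J_i
    JtW = np.einsum("nkd,nkl->ndl", J, W)                    # J_i' W_i, shape (N, d, K)
    M = D_bar @ np.linalg.inv(A_N)                           # D_bar A_N^{-1}
    P = np.eye(K)[None, :, :] - np.einsum("kd,ndl->nkl", M, JtW)  # P_bar_i
    Ph = np.einsum("nkl,nl->nk", P, H)                       # P_bar_i h_i
    G = Ph.T @ Ph / N                                        # G_hat
    sw = N * h_bar @ np.linalg.solve(G, h_bar)
    return float(sw), K
```

Under $H_0$ the returned value would be compared with a quantile of the $\chi^2_K$ distribution, where $K = T(T+3)/2$ is the number of columns of `H`.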

4. Optimal SLS Estimator

From Equations (6) and (7) we see that the asymptotic covariance matrix of $\hat{\gamma}_N$ depends on the weights $W_i$. It is therefore desirable to choose the optimal weight so that the asymptotic variance is minimized in the following sense.
Theorem 3.
Suppose $U_1 = E\left[ h_1(\gamma_0) h_1'(\gamma_0) \mid y_{10}, x_1 \right]$ is nonsingular with probability one, and Assumptions A2–A6 are satisfied with $W_1 = U_1^{-1}$. Then the SLS estimator $\hat{\gamma}_N^{o}$ obtained by taking $W_i = U_i^{-1}$, $i = 1, 2, \ldots, N$, has asymptotic covariance matrix
$$A_0^{-1} = \left( E\left[ \frac{\partial h_i'(\gamma_0)}{\partial \gamma} U_i^{-1} \frac{\partial h_i(\gamma_0)}{\partial \gamma'} \right] \right)^{-1}. \tag{9}$$
Furthermore, for any SLS estimator $\hat{\gamma}_N$, its asymptotic covariance matrix is such that $A^{-1} B A^{-1} - A_0^{-1}$ is non-negative definite.
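The ordering in Theorem 3 can be checked numerically in a toy setting. The sketch below assumes, as a simplification not made in the paper, that the Jacobian $\partial h_i/\partial\gamma'$ and the conditional second moment $U_i$ are the same matrices $J$ and $U$ for every unit, so that the expectations in (6), (7) and (9) reduce to single matrix products. All names and the random inputs are illustrative.

```python
import numpy as np

def sandwich(J, U, W):
    """(J'WJ)^{-1} J'W U W J (J'WJ)^{-1}: toy version of A^{-1} B A^{-1}."""
    A = J.T @ W @ J
    B = J.T @ W @ U @ W @ J
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv

rng = np.random.default_rng(0)
K, d = 6, 3
J = rng.normal(size=(K, d))                 # common Jacobian (assumption)
R = rng.normal(size=(K, K))
U = R @ R.T + np.eye(K)                     # positive definite second moment
S = rng.normal(size=(K, K))
W = S @ S.T + np.eye(K)                     # an arbitrary positive definite weight

V_any = sandwich(J, U, W)                   # covariance under the arbitrary weight
V_opt = sandwich(J, U, np.linalg.inv(U))    # reduces to (J' U^{-1} J)^{-1}

gap = V_any - V_opt
gap = (gap + gap.T) / 2                     # symmetrize against rounding error
gap_eigs = np.linalg.eigvalsh(gap)          # all eigenvalues should be >= 0
```

In this toy case the eigenvalues of the difference are nonnegative up to rounding, which mirrors the statement that $A^{-1} B A^{-1} - A_0^{-1}$ is non-negative definite.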
However, since the optimal weight $U_i^{-1}$ depends on $\gamma_0$ and on the conditional moments $E(\varepsilon_{it}^j \mid y_{i0}, x_i)$ and $E(\eta_i^j \mid y_{i0}, x_i)$, $j = 3, 4$, the optimal SLS estimator $\hat{\gamma}_N^{o}$ is not feasible. The corresponding feasible SLS estimator can be calculated by plugging consistent estimators of these unknown quantities into $U_i^{-1}$. This feasible optimal SLS (FOSLS) estimator is consistent under condition (A1) in Appendix A. Moreover, the FOSLS has the same asymptotic covariance matrix as the (infeasible) optimal SLS estimator (Newey and McFadden 1994, Th. 6.1).
We also suggest another version of the FOSLS, called FOSLS1, which may be more robust to any possible stochastic dependence between $\varepsilon_{it}$ and $\eta_i$, and does not require initial estimates for the third and fourth conditional moments of $\varepsilon_{it}$ and $\eta_i$. It is obtained by using the weight
$$\hat{W}_i = C'\!\left( y_{i0}, x_i, \hat{\theta}_N^{0}, \hat{\alpha}_N^{0} \right) \left[ \frac{1}{N} \sum_{i=1}^{N} h_i^{*}(\hat{\gamma}_N^{0})\, h_i^{*\prime}(\hat{\gamma}_N^{0}) \right]^{-1} C\!\left( y_{i0}, x_i, \hat{\theta}_N^{0}, \hat{\alpha}_N^{0} \right),$$
where $\hat{\gamma}_N^{0}$ is a preliminary consistent estimator of $\gamma_0$, and $C(y_{i0}, x_i, \theta, \alpha)$ is a transformation matrix that maps $h_i(\gamma)$ into
$$h_i^{*}(\gamma) = \left( u_{it}^{*},\ 1 \le t \le T;\ \ u_{it}^{*} u_{is}^{*} - \nu_{its}^{*}(\sigma^2, \theta),\ 1 \le s \le t \le T \right)',$$
where $u_{it}^{*} = y_{it} - \alpha y_{i(t-1)} - \beta' x_{it} - f_1(y_{i0}, x_i, \theta)$ and $\nu_{its}^{*}(\sigma^2, \theta) = f_2(y_{i0}, x_i, \theta) - f_1^2(y_{i0}, x_i, \theta) + \sigma^2 1\{s = t\}$. It can be shown that the FOSLS1 has the same asymptotic covariance matrix as the optimal SLS estimator if $\varepsilon_{it}$ and $\eta_i$ have constant second to fourth conditional central moments given the process initial value and covariates. It is worthwhile to note that the matrix $C$ and the vectors $h_i^{*}(\gamma)$ do not involve the moments in (3) and (4), which are calculated based on the reduced form (2). Therefore it is straightforward to generalize them to AR($p$) models.
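The quasi-differenced residuals $u_{it}^{*}$ and the vector $h_i^{*}(\gamma)$ are simple to compute directly from the level data. The following sketch does so for the no-covariate case, again under the illustrative working moments $f_1 = \theta_1 + \theta_2 y_{i0}$ and $f_2 - f_1^2 = \exp(\theta_3)$; the names `h_star` and `pooled_second_moment` are hypothetical.

```python
import numpy as np

def h_star(y, alpha, sigma2, theta):
    """Rows are h_i^*(gamma) for the no-covariate model.

    y     : (N, T+1) array, column 0 holds y_i0.
    theta : (theta1, theta2, theta3) with the ASSUMED working moments
            f1 = theta1 + theta2*y0 and f2 - f1^2 = exp(theta3).
    """
    th1, th2, th3 = theta
    y0 = y[:, 0]
    T = y.shape[1] - 1
    f1 = th1 + th2 * y0
    var_eta = np.exp(th3)                               # f2 - f1^2
    # u*_it = y_it - alpha*y_i(t-1) - f1, for t = 1..T (no covariates)
    u = y[:, 1:] - alpha * y[:, :-1] - f1[:, None]
    cols = [u[:, t] for t in range(T)]
    for t in range(T):
        for s in range(t + 1):
            nu_star = var_eta + sigma2 * (s == t)       # nu*_its(sigma2, theta)
            cols.append(u[:, t] * u[:, s] - nu_star)
    return np.column_stack(cols)

def pooled_second_moment(Hs):
    """(1/N) sum_i h_i^* h_i^*', the pooled matrix entering the FOSLS1 weight."""
    return Hs.T @ Hs / Hs.shape[0]
```

Evaluated at a preliminary consistent estimate, `pooled_second_moment(h_star(...))` gives the sample average that appears in the FOSLS1 weight; the transformation matrix $C$ is not constructed in this sketch.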
Some researchers have studied the semiparametric efficient estimators for various dynamic panel data models. Chamberlain (1992) derived the optimal instrumental variables for the first-difference equation of model (1) under the sequential conditional moment restrictions and showed that the GMM estimator based on these optimal instrumental variables attains the semiparametric efficiency lower bound. Hahn (1997) showed that the GMM estimator based on an increasing set of instruments as the sample size grows would achieve the semiparametric efficiency bound of Chamberlain (1992). More recently, Park et al. (2007) used the geometric approach of Bickel et al. (1993) to construct a semiparametric efficient estimator under the stationary model with normal error distribution.
A natural question is whether the optimal SLS estimator efficiently uses the information in the conditional moments (3) and (4). By Lemma 2 of Chamberlain (1987), the minimum bound of the asymptotic variance under $E\left[ h_i(\gamma_0) \mid y_{i0}, x_i \right] = 0$ is given by $A_0^{-1}$ in (9). Therefore, by Theorem 3, the optimal SLS estimator attains Chamberlain’s variance bound and is a semiparametric optimal estimator in this sense.
We conclude this section by comparing the asymptotic variance of the optimal SLS estimator with that of the RML estimator which is identical to the MLE conditional on the initial observation when the errors are normally distributed. Theoretically, the optimal SLSE is at least as efficient as the RMLE by (A14)–(A16) in Appendix F. Unfortunately it is difficult to evaluate the efficiency gain of the optimal SLSE analytically, hence we compare the asymptotic variances of the two estimators numerically. We considered a large number of scenarios with various data generating processes (stationary or nonstationary) and distributions for the error and unobserved random effects.
The percentage gain of efficiency in estimating $\alpha_0$ as a function of $T$ and $\alpha_0$ is shown in Figure 1, where the z-axis represents the percentage reduction in the variance of RMLE($\alpha_0$) achieved by using the optimal SLSE($\alpha_0$). Our numerical results show convincingly that the asymptotic variance of the optimal SLSE($\alpha_0$) is strictly less than that of the RMLE($\alpha_0$) except for the case $\mu_3(\varepsilon) = 0$ and $\mu_4(\varepsilon) = 3\sigma_0^4$ (which is true under the normal distribution), in which case both estimators have the same asymptotic variance.

5. Monte Carlo Simulation Studies

In this section we carry out extensive Monte Carlo simulations to examine the finite sample properties of the FOSLS and FOSLS1 estimators, and compare them with some other popular estimators in the literature, including the linear first differenced GMM (Arellano and Bond 1991), the marginal pseudo maximum likelihood (MPML) (Arellano 2003) and the random effects pseudo maximum likelihood (RML) (Alvarez and Arellano 2003) estimators. For the sake of comparison we adopt the model setup commonly used in this literature, i.e., model (1) with no covariates $x_{it}$ and the following specification:
$$E(\eta_i \mid y_{i0}) = \theta_{01} + \theta_{02}\, y_{i0}, \quad V(\eta_i \mid y_{i0}) = \exp(\theta_{03}), \tag{12}$$
$$E(\varepsilon_{it}^j \mid y_{i0}) = \mu_j(\varepsilon), \quad E\left[ (\eta_i - \theta_{01} - \theta_{02}\, y_{i0})^j \mid y_{i0} \right] = \mu_j(\eta), \quad j = 3, 4.$$
This model specification is also used by Blundell and Bond (1998) to derive conditional-type estimators such as the CGLS, see also Okui (2009).

5.1. Finite Sample Properties and Comparisons

Specifically, the data are generated according to $y_{i0} \overset{i.i.d.}{\sim} N\!\left( 0,\ 2 / [(1 - \alpha_0^2)(1 - \alpha_0)] \right)$, $\eta_i \mid y_{i0} \sim \theta_{01} + \theta_{02}\, y_{i0} + \sqrt{\exp(\theta_{03})}\, F_1$ and $\varepsilon_{it} \mid \eta_i, y_{i0} \sim \sqrt{\sigma_0^2}\, F_2$ with $\theta_{01} = 0$, $\theta_{02} = c\,(1 - \alpha_0^2)/2$ and $\theta_{03} = \log[(1 - \alpha_0)/2]$, where $c$, $F_1$, $F_2$ are chosen as follows to be close to the setups in the literature.
  • Normal stationary process: $c = 1$ and $F_1, F_2 \sim N(0, 1)$. Under this setup the MPML estimator is the true MLE computed using all data including the initial observations.
  • Nonnormal stationary process: $c = 1$ and $F_1, F_2 \sim (\chi^2_{(1)} - 1)/\sqrt{2}$.
  • Normal nonstationary process: $c = 20$ and $F_1, F_2 \sim N(0, 1)$. Under this setup the $y$ process is nonstationary in the first two moments and the MPML estimator is inconsistent.
  • Nonnormal nonstationary process: $c = 20$ and
    (a) $F_1 \sim N(0, 1)$, $F_2 \sim (\chi^2_{(1)} - 1)/\sqrt{2}$;
    (b) $F_1 \sim N(0, 1)$, $F_2 \sim \sqrt{3/5}\; t_{(5)}$;
    (c) $F_2 \sim N(0, 1)$, $F_1 \sim (\chi^2_{(1)} - 1)/\sqrt{2}$;
    (d) $F_2 \sim N(0, 1)$, $F_1 \sim \sqrt{3/5}\; t_{(5)}$.
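The data generating process listed above can be sketched in a few lines of Python. The value $\sigma_0^2 = 1$ used as the default below is an assumption (the text does not state it); with that value and $c = 1$ the process is stationary in its first two moments. The function names and the string labels for the distributions are illustrative.

```python
import numpy as np

def draw_F(kind, size, rng):
    """Standardized innovations (mean 0, variance 1)."""
    if kind == "normal":
        return rng.standard_normal(size)
    if kind == "chi2":                       # (chi2_(1) - 1) / sqrt(2)
        return (rng.chisquare(1, size) - 1.0) / np.sqrt(2.0)
    if kind == "t5":                         # sqrt(3/5) * t_(5)
        return np.sqrt(3.0 / 5.0) * rng.standard_t(5, size)
    raise ValueError(f"unknown distribution: {kind}")

def simulate_panel(N, T, alpha0, c=1.0, F1="normal", F2="normal",
                   sigma2=1.0, rng=None):
    """Return an (N, T+1) array with columns y_i0, ..., y_iT.

    sigma2 = 1.0 is an ASSUMED default, not a value taken from the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    var_y0 = 2.0 / ((1.0 - alpha0 ** 2) * (1.0 - alpha0))
    th1 = 0.0
    th2 = c * (1.0 - alpha0 ** 2) / 2.0
    th3 = np.log((1.0 - alpha0) / 2.0)
    y = np.empty((N, T + 1))
    y[:, 0] = rng.normal(0.0, np.sqrt(var_y0), N)
    # Subject effect drawn conditionally on the initial value
    eta = th1 + th2 * y[:, 0] + np.sqrt(np.exp(th3)) * draw_F(F1, N, rng)
    for t in range(1, T + 1):
        eps = np.sqrt(sigma2) * draw_F(F2, N, rng)
        y[:, t] = alpha0 * y[:, t - 1] + eta + eps
    return y
```

For example, `simulate_panel(300, 5, 0.5, c=20, F2="chi2")` would correspond to scenario (a) of the nonnormal nonstationary design with $N = 300$, $T = 5$ and $\alpha_0 = 0.5$.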
In all scenarios the parameter values are $\alpha_0 = 0.2, 0.5, 0.8$ and the sample sizes are $N = 30, 300$ and $T = 5, 10, 15$. In each simulation, 1000 Monte Carlo replications are carried out, and the median estimates and median absolute deviations (MAD) are calculated to assess the performance of the estimators. To compute the FOSLS estimators, we use the RML to obtain preliminary consistent estimates of $\gamma_0$, $\mu_j(\varepsilon)$ and $\mu_j(\eta)$, $j = 3, 4$, and then plug them into the optimal weight $U_i^{-1}$. As mentioned earlier, this does not affect the asymptotic properties of the second-step estimators.
To save space we only present the numerical results for the nonnormal-nonstationary (a) scenario in Table 1, where we use the MAD of the FOSLS estimator as reference and report the relative MADs for all other estimators. For the other scenarios we provide a summary and discussion of the results below.
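The summary measures used in the tables can be computed as in the sketch below. It assumes that the MAD is taken around the median of the Monte Carlo estimates (the text does not specify the centering), and the function names and the example numbers are purely illustrative.

```python
import numpy as np

def median_and_mad(estimates):
    """Median and median absolute deviation (around the median, by assumption)."""
    est = np.asarray(estimates, dtype=float)
    med = np.median(est)
    mad = np.median(np.abs(est - med))
    return med, mad

def relative_mad(estimates_by_method, reference="FOSLS"):
    """MAD of each method divided by the MAD of the reference method."""
    mads = {name: median_and_mad(est)[1]
            for name, est in estimates_by_method.items()}
    return {name: mads[name] / mads[reference] for name in mads}

# Toy usage with made-up numbers (not results from the paper)
toy = {"FOSLS": [0.0, 1.0, 2.0], "GMM": [0.0, 2.0, 4.0]}
ratios = relative_mad(toy)
```

A relative MAD above one indicates that the corresponding estimator is more dispersed than the reference estimator in that design.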
In the normal-stationary scenario, the MAD of the FOSLS is decreasing in $T$ and nondecreasing in $\alpha_0$. The fact that the FOSLS and RML have equal asymptotic variance under normal error components (see (A16)) appears clearly for $\alpha_0 = 0.2, 0.5$, while larger values of $N$ are required to see it for larger $\alpha_0$. As expected, the MPML has the smallest MAD in almost all cases because it is the most efficient estimator under normality. However, the gap between the MPML and FOSLS narrows as $\alpha_0$ decreases or $T$ increases. The first-difference GMM is generally inferior and the problem of weak instruments appears clearly for large $\alpha_0$, which is consistent with Blundell and Bond (1998). The FOSLS1 is not reliable for small $N$ because the numerical calculation of its weight matrix is unstable when $N < T(T+3)/2$. Moreover, the relative MADs of the FOSLS1 for $N = 300$ show that the estimator has a slower convergence rate than the FOSLS, as expected. The downward bias in the FOSLS vanishes quickly as $T$ increases.
In the nonnormal-stationary scenario, the results show that the FOSLS widely outperforms the other estimators, especially for small $T$. The relative MADs of the RML reveal that the true levels of variance reduction gained by the FOSLS (see Figure 1a) require $N$ to be larger than 300. The FOSLS competes well with the MPML for small $N$. Although the FOSLS1 is not reliable for a small $N/T$ ratio, it performs well for $N = 300$. The FOSLS has smaller bias for small $N$ than the other estimators.
In the normal-nonstationary scenario, the FOSLS, RML, and GMM compete very well for both small and large $N$. The close performance of the FOSLS and RML is due to the normality of $\varepsilon_{it}$. The improvement in the GMM performance is due to the nonstationarity of the $y_{it}$ process. The results show clearly how the MPML breaks down everywhere under this scenario, demonstrating the consequence of misspecifying a nonstationary process. Again, the FOSLS1 requires a large $N/T$ ratio to become stable, so it is not recommended in this scenario.
Finally, the results of the nonnormal-nonstationary scenario presented in Table 1 show the effect of the skewness of the $\varepsilon_{it}$ distribution on the performance of the FOSLS. The RML and GMM are less efficient than the FOSLS by at least 30% for $N = 30$, and by as much as 59% for $N = 300$. The relative MADs of the RML for large $N$ are consistent with Figure 1c. Although the true levels of variance reduction gained by the FOSLS require $N$ to be larger than 300, the gain of efficiency for small $N$ is much larger than the corresponding gain in the nonnormal-stationary scenario. The numerical results of the three remaining cases (b, c, d) in this scenario clearly show that the FOSLS($\alpha_0$) outperforms the other estimators when $\varepsilon_{it}$ has a nonnormal distribution.
We conclude this subsection by examining to what extent the asymptotic variance formula for the OSLS in (9) can be used to approximate its finite sample counterpart. To this end we use the asymptotic formula to calculate the standard deviation of the OSLS estimator for $\alpha_0$ and compare it with the sample RMSE calculated through Monte Carlo simulations for various values of $N$, $T$, and $\alpha_0$. Table 2 shows that both results are fairly close, and therefore the asymptotic formula can be safely used in further inference about $\alpha_0$, such as confidence intervals or hypothesis testing.

5.2. Robustness of FOSLS against Unit-Root

It is well known in the literature that the linear first-difference GMM is generally inferior when the autoregressive parameter $\alpha_0$ is close to one due to the problem of weak instruments; see, e.g., Blundell and Bond (1998). On the other hand, our simulation results in the previous subsection show that the performance of the FOSLS is not affected by large values of $\alpha_0$. Furthermore, it becomes closer to the MPML and sometimes is even better due to the efficiency gain in the case of nonnormal errors. Hence it is interesting to investigate the performance of the FOSLS when $\alpha_0$ is very close to one (unit root case) and see whether it breaks down like the GMM or is stable like the MPML.
To answer these questions we carry out a simulation using the nonnormal-stationary model defined before. The results in Table 3 show that the FOSLS is less biased and more efficient than the RML due to the skewness in the within group errors. Although the FOSLS has larger downward bias than the MPML, the difference vanishes quickly as $T$ and $\alpha_0$ increase. These results demonstrate that the FOSLS is robust in the case of a near unit root and is even more efficient than the MPML for $T = 6$ and $\alpha_0 = 0.99$. This may be due to nonnormality.

5.3. Performance in the Presence of Covariates

In the previous simulation studies we considered models without covariates in order to make our numerical results comparable with existing methods in the literature. Now we investigate if the finite sample performance of the FOSLS changes when the model includes covariates. A well-known work in this respect is Kiviet (1995) who compared some least squares and instrumental variable (IV) type estimators (among them the linear first-difference GMM of Arellano and Bond 1991).
Following Kiviet (1995, Section 5 and Appendix B) we consider model (1) with a covariate and $y_t$ being generated from the reduced form equation (here we omit the subscript $i$) $y_t = \beta \phi_t + \psi_t + \eta/(1 - \alpha)$, $t = 0, 1, \ldots, T$, where $\phi_t \sim AR(2)$ and $\psi_t \sim AR(1)$ are mutually independent stationary processes and both are independent of $\eta \sim N(0, \sigma_\eta^2)$. The orthogonality and normality assumptions of Kiviet (1995) imply $E(\eta \mid y_0, x_1, \ldots, x_T) = \theta_2 y_0 + \theta_4 x_1$ and $V(\eta \mid y_0, x_1, \ldots, x_T) = \exp(\theta_3)$, as in our notation in (12).
Kiviet (1995) reported simulation results for 14 different combinations (designs) of parameter values and sample sizes. We have simulated all 14 designs but only present the results for four designs to save space. Table 4 contains the bias, standard deviation and root mean squared error of the FOSLS, RML, and linear first-difference GMM (Arellano and Bond 1991) for $\alpha$ and $\beta$. As expected, the RML (which is the conditional MLE) is the best in all cases because the data are generated using the normal distribution. However, the FOSLS (which is asymptotically as efficient as the RML) is very close to the RML. These results show that in the presence of a strictly exogenous variable the FOSLS is a good alternative to the RML, which is the most efficient estimator under normality. The calculated summary statistics of the GMM are different from the corresponding values of GMM1 in Kiviet (1995), probably due to two errors in his equation B6, where $\xi_0$ and $\xi_1$ should be standardized. We have also done another simulation under the nonnormal-nonstationary setup and obtained results similar to those in Table 1. They also clearly show that the FOSLS outperforms the RML when the errors deviate from normality.

6. Application

In this section we use a real data example to demonstrate the practical usefulness of the SLS approach in comparison with the IV approach and to assess the practical gain of efficiency over the RML estimator. In particular, we use a data set published in Wooldridge (2010) and downloadable at http://mitpress.mit.edu/sites/default/files/titles/content/wooldridge/statafiles.zip (as of 28 April 2014). The dataset airfare.dta contains data on airfares, number of passengers, distance, and the market share of the largest carrier for each of the top 1149 city-pair markets within the contiguous 48 US states for the fourth quarters of 1997 through 2000. A detailed description of the data can be found at http://academic.reed.edu/economics/parker/s10/312/Asgns/proj3.html (as of 28 April 2014).
The main factors that influence airfares are the flight distance (ldist), the average number of passengers (lpassen), and market concentration (concen). All variables except concen are measured on a logarithmic scale. However, while the first factor is a time-invariant exogenous variable, the other two factors are clearly endogenous because they are also influenced by the airfare. Since the airlines usually set the current airfare by adjusting the previous year’s fare, a linear dynamic regression model with unobserved route heterogeneity and time dummies may be appropriate to measure the effect of the determinants. Indeed, from bivariate scatterplots one can clearly see the linear dependence between the current and previous airfares, lfare and lfare1 respectively, and a moderate positive linear association between lfare and ldist. However, although theoretically concen and lfare should be positively correlated, this is not clearly seen. A possible explanation is that the relation is masked by the negative correlation between concen and ldist, given the positive correlation between lfare and ldist.
In light of these considerations, it is reasonable to start with the following model, which includes time dummies $D99$ and $D00$ for the years 1999 and 2000, respectively:
$$\begin{aligned} \mathit{lfare}_{it} ={}& \alpha_0 + \alpha_1\, \mathit{lfare}_{i(t-1)} + \beta_1\, \mathit{ldist}_i + \beta_2\, \mathit{concen}_{it} + \beta_3\, \mathit{concen}_{i(t-1)} \\ &+ \beta_4\, \mathit{lpassen}_{it} + \beta_5\, \mathit{lpassen}_{i(t-1)} + \beta_6\, D99 + \beta_7\, D00 + \eta_i + \varepsilon_{it}. \end{aligned} \tag{14}$$
We first calculate the naive ordinary least squares (OLS) estimates of the proposed Equation (14), which are shown in the OLS column of Table 5. Although the OLS estimates are not consistent due to the correlation between $\eta_i$ and $\mathit{lfare}_{i(t-1)}$, the high value of $R^2 = 0.95$ reflects the strong explanatory power of the regressors. So we refit the model using the two stage linear first differenced GMM (GMM2) of Arellano and Bond (1991), with the lags of order at least two of (lfare, concen, lpassen) as instrumental variables to deal with the possible endogeneity of these variables. The estimates are calculated using STATA 13.0 and the results are shown in Table 5 under Model-I.
Wooldridge (2010, p. 373) used the GMM2 to fit a first order linear dynamic model with only concen (treated as strictly exogenous) and dummy variables, and obtained an estimated autoregressive parameter of 0.333, which is not far from ours. According to the reported value of the Sargan test (see Table 5), the GMM sequential moments based on the reduced form of model (14) are not rejected. However, the estimated elasticity of lfare with respect to concen is negative (p-value = 0.059). A possible explanation for this unexpected sign is the problem of weak instruments, which is likely to occur in identifying $\beta_2$ given the strong positive linear correlation between concen and concen1. This is similar to the situation where the first differenced GMM estimator is used to estimate the autoregressive parameter when it is close to one (Blundell and Bond 1998). This problem inflates the coefficient variance and leads to unreliable estimates. This is an example where the RE approach with the level data is preferred over the FE approach with differenced data, because the former does not depend on the instrumental variables and hence is able to give more reliable estimates.
However, since the sequential moments are correctly specified in Model-I, the GMM estimates can still be used to recover the within group errors $\varepsilon_{it}$ and the route effect $\eta_i$. We follow Arellano (2003, pp. 118–19) to estimate the time effect for year 1998 and subsequently $\eta_i$. Then realizations of $\varepsilon_{it}$ are obtained directly from Equation (14). Since bivariate scatterplots of $\hat{\eta}_i$ and the initial values (1997) of the variables (lfare0, concen0, lpassen0, and ldist0) show possible linear relationships between $\eta_i$ and these initial values, we fit the following auxiliary equation, in which all coefficients are significant at the 0.01 level:
$$\hat{\eta}_i = 5.507 + 0.641\, \mathit{lfare}_{i0} + 0.464\, \mathit{concen}_{i0} - 0.041\, \mathit{ldist}_{i0} + 0.286\, \mathit{lpassen}_{i0}. \tag{15}$$
Further, simple diagnostics show that the $\hat{\varepsilon}_{it}$ have constant variance across time and no significant autocorrelation, hence the assumptions on $\eta_i$ and $\varepsilon_{it}$ seem reasonable for model (14). These assumptions will be implicitly tested later using our SW test.
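An auxiliary equation of this kind is an ordinary least squares regression of the recovered route effects on the initial values. A minimal sketch is given below; the array names are hypothetical and the synthetic numbers in the usage lines are made up for the example, not taken from the airfare data.

```python
import numpy as np

def auxiliary_fit(eta_hat, initial_values):
    """OLS of eta_hat on an intercept and the columns of initial_values.

    eta_hat        : (N,) array of recovered subject effects.
    initial_values : (N, k) array of initial-period variables.
    Returns the coefficient vector (intercept first) and the residuals.
    """
    eta_hat = np.asarray(eta_hat, dtype=float)
    X = np.column_stack([np.ones(len(eta_hat)), initial_values])
    coef, *_ = np.linalg.lstsq(X, eta_hat, rcond=None)
    resid = eta_hat - X @ coef
    return coef, resid

# Synthetic usage: four hypothetical initial-value columns, noiseless response
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 4))
true_coef = np.array([1.0, 0.5, -0.2, 0.3, 0.1])
eta_syn = true_coef[0] + Z @ true_coef[1:]
coef_syn, resid_syn = auxiliary_fit(eta_syn, Z)
```

The residuals returned by such a fit play the role of the remaining random effect once the dependence on the initial values has been removed.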
Based on Equation (15) we use the ‘lme’ command in R to calculate the RML estimates of the following equation and the results are given in Table 5 under Model-I.
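$$\begin{aligned} \mathit{lfare}_{it} ={}& \theta_0 + \alpha_1\, \mathit{lfare}_{i(t-1)} + \beta_1\, \mathit{ldist}_i + \beta_2\, \mathit{concen}_{it} + \beta_3\, \mathit{concen}_{i(t-1)} + \beta_4\, \mathit{lpassen}_{it} + \beta_5\, \mathit{lpassen}_{i(t-1)} \\ &+ \beta_6\, D99 + \beta_7\, D00 + \theta_1\, \mathit{lfare}_{i0} + \theta_2\, \mathit{concen}_{i0} + \theta_3\, \mathit{lpassen}_{i0} + \eta_i^{*} + \varepsilon_{it}. \end{aligned} \tag{16}$$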
A normal QQ plot of the corresponding residuals $\hat{\varepsilon}_{it}$ shows a flat tail symmetric distribution that can be approximated by a Student t distribution with df = 5, indicating the possibility of efficiency gain by applying the FOSLS estimator. Hence we use the sample skewness and kurtosis of $\hat{\varepsilon}_{it}$ and $\hat{\eta}_i^*$ to calculate the optimal weight matrix for the FOSLS. To save space we report only the estimated coefficients of the main factors in Table 5, where the standard errors of GMM2 are computed using the robust formula, while the standard errors of FOSLS and RML are computed using Formulas (9) and (A14), respectively.
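The moment summaries mentioned above can be sketched as follows. This is a minimal illustration using the standard moment-based definitions of skewness and excess kurtosis. It does not reproduce how the authors assemble the optimal weight matrix from these quantities, and the function names are hypothetical. For reference, a Student t distribution with df > 4 has zero skewness and excess kurtosis 6/(df − 4), which gives 6 for df = 5.

```python
def sample_skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) from central sample moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


def student_t_excess_kurtosis(df):
    """Theoretical excess kurtosis of Student t; finite only for df > 4."""
    if df <= 4:
        raise ValueError("excess kurtosis is not finite for df <= 4")
    return 6.0 / (df - 4.0)


if __name__ == "__main__":
    residuals = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy data, not the airfare residuals
    print(sample_skewness_kurtosis(residuals))
    print(student_t_excess_kurtosis(5))
```

Comparing the sample excess kurtosis of the residuals with the theoretical value for a candidate df is one informal way to judge a Student t approximation.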
In Model-I, while the Sargan test does not reject the GMM sequential moments (p-value = 0.268), our SW test shows fairly strong evidence against the first and second moment specification given by Equations (3) and (4) (p-value = 0.013). This motivates us to check the reliability of the SW test by examining its empirical sampling distribution under (3) and (4) as follows. First, we use the RML estimates of Model-I to generate data from (16), drawing $\eta_i^*$ and $\varepsilon_{it}$ from Student t distributions with df = 6 and 5, respectively, to be as close as possible to the estimated random effects and residuals of Model-I. Second, we use the generated data to fit (16) using the RML followed by FOSLS and then calculate the SW statistic. We repeat these two steps 1000 times to obtain an approximation to the sampling distribution of the SW statistic under $H_0$. Figure 2 confirms that a sample size of 1149 routes is sufficient for the empirical sampling distribution to be close to the asymptotic $\chi^2_9$ distribution under $H_0$. Consequently, Model-I needs to be modified by adding or eliminating some variables to pass the SW test.
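The data-generation step of this check can be sketched as follows. This is a simplified illustration under stated assumptions: it keeps only the autoregressive term, the random effect and the error of (16), draws a hypothetical standard normal initial value, and uses arbitrary scale parameters. The refitting step (RML followed by FOSLS and the SW statistic) depends on the estimators defined earlier in the paper and is not shown. The names `draw_student_t` and `simulate_panel` are hypothetical.

```python
import math
import random


def draw_student_t(rng, df):
    """One Student t draw: standard normal over sqrt(chi-square / df)."""
    chi_square = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return rng.gauss(0.0, 1.0) / math.sqrt(chi_square / df)


def simulate_panel(n_units, n_periods, alpha, intercept=0.0,
                   scale_eta=1.0, scale_eps=1.0, seed=0):
    """Simulate y_it = intercept + alpha * y_i(t-1) + eta_i + eps_it."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        eta = scale_eta * draw_student_t(rng, 6)      # random effect, t with df = 6
        series = [rng.gauss(0.0, 1.0)]                # hypothetical initial value y_i0
        for _ in range(n_periods):
            eps = scale_eps * draw_student_t(rng, 5)  # within group error, t with df = 5
            series.append(intercept + alpha * series[-1] + eta + eps)
        panel.append(series)
    return panel


if __name__ == "__main__":
    # One replication; the procedure in the text repeats this 1000 times.
    data = simulate_panel(n_units=1149, n_periods=3, alpha=0.374, seed=1)
    print(len(data), len(data[0]))
```

Each replication would then be refitted and its SW statistic stored, so that the 1000 values can be compared with the asymptotic reference distribution.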
To proceed, we fit (16) again without concen0 because it is not of main interest and has weak correlation with $\hat{\eta}_i$. This leads to Model-II, which contains the same explanatory variables as Model-I and of course the same value of the Sargan test (6.41). On the other hand, the value of the SW statistic drops to 4.01, which is insignificant at any reasonable level of significance. This suggests that Model-II should be used for testing purposes provided that the signs of the FOSLS estimates are consistent with the economic theory. The robust standard errors of the FOSLS are computed by Theorem 1 and are reported in Table 6, according to which two variables, concen1 and lpassen1, are the candidates to be dropped from the model.
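The p-values attached to the SW statistics can be checked against the asymptotic $\chi^2_9$ reference distribution with a short sketch. It uses the classical closed-form series for the chi-square upper tail with integer degrees of freedom and only the Python standard library. The function name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r < df/2} (x/2)^r / r!
        term, total = 1.0, 0.0
        for r in range(df // 2):
            total += term
            term *= (x / 2.0) / (r + 1)
        return math.exp(-x / 2.0) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x).
    z = math.sqrt(x)
    normal_tail = math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, series = z, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * k + 1)
    return normal_tail + 2.0 * density * series


if __name__ == "__main__":
    print(chi2_sf(20.98, 9))  # SW statistic of Model-I in Table 5
    print(chi2_sf(4.01, 9))   # SW statistic of Model-II in Table 5
```

With 9 degrees of freedom, the two statistics give tail probabilities of roughly 0.013 and 0.911, in line with the p-values reported in Table 5.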
Model-III is obtained by dropping only concen1 from Model-II, but this new specification is rejected by both the Sargan and SW tests. Consequently, Model-IV is obtained by dropping only lpassen1 from Model-II. Interestingly, while the SW test rejects the Model-IV specification, the Sargan test does not reject the GMM sequential moments of this model. However, the GMM estimate of $\beta_2$ is negative, which contradicts the economic theory. It follows that Model-IV is not desirable and instead Model-II is preferred. It does include both concen1 and lpassen1, although they may be insignificant according to the robust standard errors. On the other hand, if the standard errors in Table 5 are used, then both concen1 and lpassen1 are significant according to FOSLS and only lpassen1 is significant according to RML. Since Model-III is rejected by the SW test, it follows that using the information inherent in the fourth moment through FOSLS is effective in keeping the variable concen1 in Model-II.

7. Conclusions and Discussion

We studied the SLS approach as an alternative to the commonly used random effects ML (RML) or differenced-GMM estimation for linear dynamic panel data models. Our approach is based on the first two conditional moments of the outcome process and does not postulate any distributional assumptions on the error components in the model. The asymptotic and finite sample properties and practical merits of the proposed estimators are thoroughly investigated.
This research reveals the following interesting new findings. First, differencing data and using instrumental variables may cause substantial loss of information and produce misleading relations, especially when the time varying variables are generated from an autoregressive process with high autocorrelation (which is common in economic data). In such a case the linear first-differenced GMM or other similar methods not only are unable to estimate the effect of the time invariant variables, but also, more harmfully, weakly identify the effect of the time variant explanatory variables. In contrast, the SLS approach makes use of the information inherent in the level data and therefore can improve the estimation precision and the goodness of fit considerably.
Second, the information in the sample skewness and kurtosis of the within group residuals can be utilized by our OSLS to gain more efficiency over the RML and consequently save important explanatory variables from being wrongly eliminated. In other words, by using the extra efficiency of the OSLS one can avoid falling into misspecification traps. Third, our newly proposed diagnostic test proved to be very useful not only in validating the working conditional moments but also for model selection purposes, while usual goodness of fit criteria such as RMSE, $R^2$ or AIC may be misleading when the model is incorrectly specified.
We have explicitly dealt with an AR(1) model for the simplicity of notation. However, our approach can be extended to more general AR(p) models. For example, it is straightforward to calculate the FOSLS1 in Section 4 using a generalized form of the transformation matrix C and the deviation form of the moments as in (10) and (11).
This work also raises some interesting points for future research. Although our diagnostic test provides a systematic built-in tool to validate the model conditional moments assumption, in practice it would be interesting to run some sensitivity analysis to investigate the consequences of possible non-rejected deviations from this assumption. It would also be interesting to study the properties of the SW test and extend it to check the working assumption on the third and fourth moments. Last but not least, other general methods such as GMM or equivalently estimating equations applied on level data could have similar asymptotic properties as our approach. However, it is important and worthwhile to compare their finite sample properties and practical implications in real data analysis. The current work provides a complete set of tools for the inference in linear dynamic models with level data which has not been studied so far in the literature.

Author Contributions

Conceptualization, M.S. and L.W.; methodology, M.S. and L.W.; software, M.S.; validation, M.S. and L.W.; formal analysis, M.S.; investigation, M.S.; data curation, M.S.; writing—original draft preparation, M.S.; writing—review and editing, L.W.; supervision, L.W.; funding acquisition, L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Natural Sciences and Engineering Research Council of Canada (NSERC) grant number 546719.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The airfare data used in the Application section are downloadable at http://mitpress.mit.edu/sites/default/files/titles/content/wooldridge/statafiles.zip (as of 28 April 2014).

Acknowledgments

The first author gratefully acknowledges financial support by the University of Manitoba Graduate Fellowship and Manitoba Graduate Scholarship.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Regularity Conditions

To establish the asymptotic properties of the SLS estimator $\hat{\gamma}_N$, the following regularity conditions are assumed, where $\|\cdot\|$ denotes the Euclidean norm.
Assumption A1.
The parameter space Γ is a compact subset of I R p + + 2 .
Assumption A2.
$f_j(y_{i0}, x_i, \theta)$, $j = 1, 2$, are Borel measurable functions of $(y_{i0}, x_i)$ for each $\theta$ in the corresponding parameter space $\Theta$, and are continuous functions of $\theta$ with probability one. Furthermore, for all $t$,
$$E\,\|W_1\| \Big( y_{10}^4 + \sup_{\Theta} f_2^2(y_{10}, x_1, \theta) + \|x_{1t}\|^4 + \eta_1^4 + \varepsilon_{1t}^4 + 1 \Big) < \infty.$$
Assumption A3.
$E\big[ (h_1(\gamma) - h_1(\gamma_0))' \, W_1 \, (h_1(\gamma) - h_1(\gamma_0)) \big] = 0$ if and only if $\gamma = \gamma_0$.
Assumption A4.
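With probability one, $f_j(y_{i0}, x_i, \theta)$, $j = 1, 2$, are twice continuously differentiable in $\mathrm{int}(\Theta)$. Furthermore, for $j = 1, 2$,
$$E\,\|W_1\| \big( y_{10}^2 + \|x_1\|^2 + 1 \big)^{2-j} \sup_{N(\theta_0)} \left( \left\| \frac{\partial f_j(y_{10}, x_1, \theta)}{\partial \theta} \right\|^2 + \left\| \frac{\partial^2 f_j(y_{10}, x_1, \theta)}{\partial \theta \, \partial \theta'} \right\|^2 \right) < \infty,$$
where $N(\theta_0) \subset \mathrm{int}(\Theta)$ is a closed neighborhood of $\theta_0$.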
Assumption A5.
The matrix
$$A = E\left[ \frac{\partial h_1'(\gamma_0)}{\partial \gamma} \, W_1 \, \frac{\partial h_1(\gamma_0)}{\partial \gamma'} \right]$$
in (6) is nonsingular.
Assumption A6.
It holds
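$$E\,\|W_1\|^2 \left( y_{10}^8 + \|x_1\|^8 + f_2^4(y_{10}, \theta_0) + \eta_1^8 + \varepsilon_{1t}^8 + \big( y_{10}^4 + \|x_1\|^4 + 1 \big) \left\| \frac{\partial f_1(y_{10}, x_1, \theta_0)}{\partial \theta} \right\|^4 + \left\| \frac{\partial f_2(y_{10}, x_1, \theta_0)}{\partial \theta} \right\|^4 + 1 \right) < \infty.$$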
Note that Assumptions A1–A6 are standard regularity conditions in the M-estimation literature. Assumption A3 is necessary and sufficient for parameter identification, while Assumptions A2, A4 and A6 are sufficient but not necessary.
Furthermore, for the FOSLSE to be consistent, it is sufficient that
$$E \sup_{\Gamma^*} \big\| U_1^{-1}(\gamma^*) \big\| \Big( y_{10}^4 + \|x_1\|^4 + \sup_{\Theta} f_2^2(y_{10}, x_1, \theta) + \eta_1^4 + \varepsilon_{1t}^4 + 1 \Big) < \infty, \tag{A1}$$
where $U_1^{-1}$ is as in Theorem 3, $\gamma^*$ is the vector containing all generic parameters in $U_1^{-1}$ including $\gamma$, and $\Gamma^*$ is the corresponding compact parameter space.
Although Assumptions A2–A6 and condition (A1) look complicated for the sake of generality, they can be simplified by specifying the functional forms of $E(\eta_i^j \mid y_{i0}, x_i)$, $j = 1, 2, 3, 4$, and $E(\varepsilon_{it}^j \mid y_{i0}, x_i)$, $j = 3, 4$. For example, under (12)–(13), $U_i^{-1}$ has a special structure so that Assumptions A2, A4, A6 and condition (A1) are implied by $E(y_{i0}^4) < \infty$.

Appendix B. Proof of Theorem 1 (1)

For the simplicity of notation, we use $f_j$, $y_t$, $\tilde{x}_t$, $a_t$, $c_{ts}$, $d_{ts}$, $w_{ts}$, $k_{ts}$ for $f_j(y_{10}, x_1, \theta)$, $y_{1t}$, $\tilde{x}_{1t}(\alpha)$, $a_t(\alpha)$, $c_{ts}(\alpha)$, $d_{ts}(\alpha)$, $w_{1ts}(\alpha)$, and $k_{1ts}(\alpha)$, respectively.
First, by Cauchy-Schwarz inequality we have
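$$\begin{aligned} \|h_1(\gamma)\|^2 \le {} & 2 \sum_{t=1}^{T} y_t^2 + 8 y_0^2 \sum_{t=1}^{T} \alpha^{2t} + 8 f_1^2 \sum_{t=1}^{T} a_t^2 + 4 \sum_{t=1}^{T} \beta' \tilde{x}_t \tilde{x}_t' \beta + 2 \sum_{1 \le s \le t \le T} y_t^2 y_s^2 + 16 y_0^4 \sum_{1 \le s \le t \le T} \alpha^{2(t+s)} \\ & + 16 \sum_{1 \le s \le t \le T} \big( f_2^2 a_t^2 a_s^2 + \sigma^4 c_{ts}^2 \big) + 16 y_0^2 f_1^2 \sum_{1 \le s \le t \le T} d_{ts}^2 + 8 \sum_{1 \le s \le t \le T} \big( \beta' \tilde{x}_t \tilde{x}_s' \beta \big)^2 \\ & + 16 y_0^2 \sum_{1 \le s \le t \le T} \big( \beta' w_{ts} \big)^2 + 16 f_1^2 \sum_{1 \le s \le t \le T} \big( \beta' k_{ts} \big)^2 , \end{aligned}$$
$E\,\|W_1\| (y_t y_s)^2 \le \sqrt{E\,\|W_1\| y_t^4 \; E\,\|W_1\| y_s^4}$ and $E\,\|W_1\| y_0^2 f_1^2 \le \sqrt{E\,\|W_1\| y_0^4 \; E\,\|W_1\| f_2^2}$. Therefore by Assumptions A1 and A2 we have $E \sup_{\Gamma} |q_1(\gamma)| \le E\,\|W_1\| \sup_{\Gamma} \|h_1(\gamma)\|^2 < \infty$. It follows from the uniform law of large numbers (ULLN; Amemiya 1985; Jennrich 1969) that
$$\sup_{\Gamma} \left| \frac{1}{N} \sum_{i=1}^{N} q_i(\gamma) - E q_1(\gamma) \right| \xrightarrow{a.s.} 0 \quad \text{as } N \to \infty \text{ and } T \text{ is fixed}. \tag{A2}$$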
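Second, since $E q_1(\gamma) = E\, h_1'(\gamma_0) W_1 h_1(\gamma_0) + E\, (h_1(\gamma) - h_1(\gamma_0))' W_1 (h_1(\gamma) - h_1(\gamma_0)) + 2 E\, (h_1(\gamma) - h_1(\gamma_0))' W_1 h_1(\gamma_0)$, where the last term is zero because $h_1(\gamma_0)$ has zero conditional mean given $(y_{10}, x_1)$, we have $E q_1(\gamma) \ge E q_1(\gamma_0)$ and, by Assumption A3, the equality holds if and only if $\gamma = \gamma_0$. Finally the result follows from (A2) and Lemma 1 of Wang and Leblanc (2008).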

Appendix C. Proof of Theorem 1 (2)

First, by the mean value theorem for random functions (Jennrich 1969), Assumptions A1–A4 guarantee that
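$$\mathbf{1}\{\hat{\gamma}_N \in N(\gamma_0)\} \left[ \sum_{i=1}^{N} s_i(\gamma_0) + \sum_{i=1}^{N} \bar{H}_i \, (\hat{\gamma}_N - \gamma_0) \right] = 0, \tag{A3}$$
where
$$s_i(\gamma) = \frac{\partial q_i(\gamma)}{\partial \gamma} = 2 \, \frac{\partial h_i'(\gamma)}{\partial \gamma} \, W_i \, h_i(\gamma),$$
matrix $\bar{H}_i$ has rows given by $\partial^2 q_i(\bar{\gamma}_{Nr}) / \partial \gamma^{(r)} \partial \gamma'$, $r = 1, 2, \ldots, \dim(\gamma)$, and $\bar{\gamma}_{Nr}$ are measurable mappings into $N(\gamma_0)$ lying on the segment joining $\hat{\gamma}_N$ and $\gamma_0$. By the triangle inequality we have
$$\left\| \frac{\partial^2 q_1(\gamma)}{\partial \gamma \, \partial \gamma'} \right\| \le 2 \left\| \frac{\partial h_1'(\gamma)}{\partial \gamma} W_1 \frac{\partial h_1(\gamma)}{\partial \gamma'} \right\| + 2 \left\| \left[ \frac{\partial^2 h_1'(\gamma)}{\partial \gamma^{(i)} \partial \gamma^{(j)}} W_1 h_1(\gamma) \right]_{i,j} \right\|, \tag{A4}$$
and further by the Cauchy-Schwarz inequality we have
$$E \sup_{N(\gamma_0)} \left\| \frac{\partial h_1'(\gamma)}{\partial \gamma} W_1 \frac{\partial h_1(\gamma)}{\partial \gamma'} \right\| \le E\,\|W_1\| \sup_{N(\gamma_0)} \left\| \frac{\partial h_1(\gamma)}{\partial \gamma'} \right\|^2 \tag{A5}$$
and the following inequalities for $1 \le s \le t \le T$,
$$\begin{aligned} \left( \frac{\partial \mu_{1t}(\gamma)}{\partial \alpha} \right)^2 & \le 4 t^2 \alpha^{2(t-1)} y_0^2 + 4 \left( \frac{\partial a_t}{\partial \alpha} \right)^2 f_2 + 2 \left( \beta' \frac{\partial \tilde{x}_t}{\partial \alpha} \right)^2 , \\ \left( \frac{\partial \nu_{1ts}(\gamma)}{\partial \alpha} \right)^2 & \le 8 (t+s)^2 \alpha^{2(t+s-1)} y_0^4 + 8 f_2^2 \left( a_s \frac{\partial a_t}{\partial \alpha} + a_t \frac{\partial a_s}{\partial \alpha} \right)^2 + 8 \sigma^4 \left( \frac{\partial c_{ts}}{\partial \alpha} \right)^2 + 8 y_0^2 f_2 \left( \frac{\partial d_{ts}}{\partial \alpha} \right)^2 \\ & \quad + 8 y_0^2 \left( \beta' \frac{\partial w_{ts}}{\partial \alpha} \right)^2 + 8 f_2 \left( \beta' \frac{\partial k_{ts}}{\partial \alpha} \right)^2 + 8 \left( \beta' \frac{\partial \tilde{x}_t}{\partial \alpha} \right)^2 \big( \beta' \tilde{x}_s \big)^2 + 8 \left( \beta' \frac{\partial \tilde{x}_s}{\partial \alpha} \right)^2 \big( \beta' \tilde{x}_t \big)^2 , \\ \left\| \frac{\partial \nu_{1ts}(\gamma)}{\partial \theta} \right\|^2 & \le 4 a_t^2 a_s^2 \left\| \frac{\partial f_2}{\partial \theta} \right\|^2 + 4 d_{ts}^2 y_0^2 \left\| \frac{\partial f_1}{\partial \theta} \right\|^2 + 2 \big( \beta' k_{ts} \big)^2 \left\| \frac{\partial f_1}{\partial \theta} \right\|^2 , \\ \left\| \frac{\partial \nu_{1ts}(\gamma)}{\partial \beta} \right\|^2 & \le 4 \big\| \tilde{x}_t \tilde{x}_s' \big\|^2 \|\beta\|^2 + 4 y_0^2 \|w_{ts}\|^2 + 4 f_2 \|k_{ts}\|^2 . \end{aligned}$$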
Hence it follows from Assumptions A2 and A4 that
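$$E\,\|W_1\| \sup_{N(\gamma_0)} \left\| \frac{\partial h_1(\gamma)}{\partial \gamma'} \right\|^2 < \infty .$$
Similarly by the Cauchy-Schwarz inequality and Assumptions A2 and A4 we have
$$E \sup_{N(\gamma_0)} \left\| \left[ \frac{\partial^2 h_1'(\gamma)}{\partial \gamma^{(i)} \partial \gamma^{(j)}} W_1 h_1(\gamma) \right]_{i,j} \right\| \le \sqrt{ E\,\|W_1\| \sup_{N(\gamma_0)} \|h_1(\gamma)\|^2 \;\; E\,\|W_1\| \sup_{N(\gamma_0)} \sum_{i,j \le \dim(\gamma)} \left\| \frac{\partial^2 h_1(\gamma)}{\partial \gamma^{(i)} \partial \gamma^{(j)}} \right\|^2 } < \infty . \tag{A6}$$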
Combining inequalities (A4), (A5), (A6), Assumptions A2, A4 and the ULLN we have
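$$\sup_{N(\gamma_0)} \left\| \frac{1}{N} \sum_{i=1}^{N} \frac{\partial^2 q_i(\gamma)}{\partial \gamma \, \partial \gamma'} - E \frac{\partial^2 q_1(\gamma)}{\partial \gamma \, \partial \gamma'} \right\| \xrightarrow{a.s.} 0 \quad \text{as } N \to \infty \text{ for fixed } T. \tag{A7}$$
It follows from Lemma 2 of Wang and Leblanc (2008) and the strong consistency of $\hat{\gamma}_N$ that
$$\frac{1}{N} \sum_{i=1}^{N} \bar{H}_i \xrightarrow{a.s.} E \frac{\partial^2 q_1(\gamma_0)}{\partial \gamma \, \partial \gamma'} = 2A \quad \text{as } N \to \infty \text{ for fixed } T. \tag{A8}$$
Further, by Assumption A6 and the central limit theorem (CLT) we have
$$\frac{1}{\sqrt{N}} \sum_{i=1}^{N} s_i(\gamma_0) \xrightarrow{d} N(0, 4B) \quad \text{as } N \to \infty \text{ for fixed } T,$$
where $B$ is given in (7). Hence by Slutsky's theorem and (A3), (A8) we have, for fixed $T$,
$$\sqrt{N} (\hat{\gamma}_N - \gamma_0) = -(2A)^{-1} \frac{1}{\sqrt{N}} \sum_{i=1}^{N} s_i(\gamma_0) + o_p(1). \tag{A9}$$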
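The final step from this expansion to the limiting distribution can be written out explicitly. The following is a short worked step that uses only the limits stated above (the score CLT with variance $4B$ and the Hessian limit $2A$, with $A$ symmetric); the sandwich form is presumably the one stated in Theorem 1, which is not reproduced in this appendix.

```latex
\sqrt{N}\,(\hat{\gamma}_N - \gamma_0)
  \xrightarrow{d} N\!\left(0,\; (2A)^{-1}\,(4B)\,(2A)^{-1}\right)
  = N\!\left(0,\; A^{-1} B A^{-1}\right)
  \quad \text{as } N \to \infty \text{ for fixed } T .
```

The factors of 2 and 4 cancel, so the asymptotic covariance depends on the weight $W_1$ only through $A$ and $B$.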

Appendix D. Proof of Theorem 2

First, by Theorem 1, Assumption A2 and the mean value theorem, for sufficiently large N,
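$$h(\hat{\gamma}_N) = \frac{1}{N} \sum_{i=1}^{N} h_i(\gamma_0) + \frac{\partial h(\tilde{\gamma}_N)}{\partial \gamma'} (\hat{\gamma}_N - \gamma_0), \tag{A10}$$
where $\|\tilde{\gamma}_N - \gamma_0\| \le \|\hat{\gamma}_N - \gamma_0\|$. By (A3), for sufficiently large $N$ such that $\hat{\gamma}_N \in N(\gamma_0)$,
$$\sum_{i=1}^{N} \bar{H}_i \, (\hat{\gamma}_N - \gamma_0) = -2 \sum_{i=1}^{N} \frac{\partial h_i'(\gamma_0)}{\partial \gamma} W_i h_i(\gamma_0). \tag{A11}$$
Further, by (A8),
$$\frac{1}{N} \sum_{i=1}^{N} \bar{H}_i = 2A + o_p(1) \tag{A12}$$
and, analogously to the proof of (A8), by Assumptions A2, A4 and the ULLN we can verify that
$$\frac{\partial h(\tilde{\gamma}_N)}{\partial \gamma'} = \frac{1}{N} \sum_{i=1}^{N} \frac{\partial h_i(\tilde{\gamma}_N)}{\partial \gamma'} = E \frac{\partial h_1(\gamma_0)}{\partial \gamma'} + o_p(1). \tag{A13}$$
Combining (A10)–(A13) we obtain
$$\begin{aligned} \sqrt{N} \, h(\hat{\gamma}_N) & = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} h_i(\gamma_0) + E \frac{\partial h_1(\gamma_0)}{\partial \gamma'} \sqrt{N} (\hat{\gamma}_N - \gamma_0) + o_p(1) \\ & = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} P_i(\gamma_0) h_i(\gamma_0) + o_p(1), \quad [\text{see Equation (A9)}] \end{aligned}$$
where
$$P_i(\gamma_0) = I - E \frac{\partial h_1(\gamma_0)}{\partial \gamma'} \, A^{-1} \frac{\partial h_i'(\gamma_0)}{\partial \gamma} W_i .$$
Since $E\, P_i(\gamma_0) h_i(\gamma_0) h_i'(\gamma_0) P_i'(\gamma_0)$ is nonsingular, the result follows from the CLT.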

Appendix E. Proof of Theorem 3

Let
$$R = U_1^{1/2} W_1 \frac{\partial h_1(\gamma_0)}{\partial \gamma'}, \qquad Q = U_1^{-1/2} \frac{\partial h_1(\gamma_0)}{\partial \gamma'} .$$
Then by the matrix form of the Cauchy-Schwarz inequality it is straightforward to see that $E(R'R) - E(R'Q) \, [E(Q'Q)]^{-1} E(Q'R)$ is nonnegative definite and is the zero matrix if $W_1 = U_1^{-1}$.
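To connect this inequality with the asymptotic covariance of the SLS estimator, the products can be expanded as below. This is a sketch under one assumption that is not restated in this appendix: that $B$ in (7) equals $E[(\partial h_1'/\partial\gamma)\, W_1 U_1 W_1\, (\partial h_1/\partial\gamma')]$, as is standard in the SLS literature. The identity $E(R'Q) = A$ follows directly from the definition of $A$ in Assumption A5.

```latex
\begin{aligned}
E(R'R) &= E\!\left[\frac{\partial h_1'(\gamma_0)}{\partial\gamma}\, W_1 U_1 W_1\,
          \frac{\partial h_1(\gamma_0)}{\partial\gamma'}\right] = B
          \quad \text{(assumed form of } B\text{)}, \\
E(R'Q) &= E\!\left[\frac{\partial h_1'(\gamma_0)}{\partial\gamma}\, W_1\,
          \frac{\partial h_1(\gamma_0)}{\partial\gamma'}\right] = A, \\
E(Q'Q) &= E\!\left[\frac{\partial h_1'(\gamma_0)}{\partial\gamma}\, U_1^{-1}\,
          \frac{\partial h_1(\gamma_0)}{\partial\gamma'}\right], \\
B - A\,[E(Q'Q)]^{-1} A \succeq 0
  &\;\Longrightarrow\;
  A^{-1} B A^{-1} \succeq [E(Q'Q)]^{-1}.
\end{aligned}
```

With $W_1 = U_1^{-1}$ all three expectations coincide, so $A^{-1} B A^{-1} = [E(Q'Q)]^{-1}$ and the lower bound is attained.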

Appendix F. Asymptotic Variance of the RMLE

Using the GMM setup the asymptotic variance of the RML estimator under model (1) is
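$$F_0 = E^{-1}(K' V K) \; E(K' V M V K) \; E^{-1}(K' V K), \tag{A14}$$
where
$$K = \begin{pmatrix} \dfrac{\partial \mu(\gamma_0)}{\partial \gamma'} \\[2ex] \dfrac{\partial \, \mathrm{vech}(S(\gamma_0))}{\partial \gamma'} \end{pmatrix},$$
$$V = \begin{pmatrix} S(\gamma_0)^{-1} & 0 \\ 0 & \frac{1}{2} L' \big( S(\gamma_0)^{-1} \otimes S(\gamma_0)^{-1} \big) L \end{pmatrix},$$
$$M = \begin{pmatrix} S(\gamma_0) & E\big( u_1 \, \mathrm{vech}'(u_1 u_1') \mid y_{10}, x_1 \big) \\ \cdot & E\big( \mathrm{vech}(u_1 u_1') \, \mathrm{vech}'(u_1 u_1') \mid y_{10}, x_1 \big) - \mathrm{vech}(S(\gamma_0)) \, \mathrm{vech}'(S(\gamma_0)) \end{pmatrix},$$
with $\mu(\gamma) = (\mu_{1t}(\gamma), 1 \le t \le T)'$, $\mathrm{vech}(S(\gamma)) = (a_t a_s (f_2 - f_1^2) + \sigma^2 c_{ts}, 1 \le s \le t \le T)'$ and $u_1 = (y_{1t} - \mu_{1t}(\gamma_0), 1 \le t \le T)'$. Here vec and vech are the usual operators that stack the columns and lower triangle columns, respectively, of a matrix to form a vector, and $L$ is the so-called selection matrix such that $\mathrm{vec}(S(\gamma)) = L \, \mathrm{vech}(S(\gamma))$. Using the same notation, the asymptotic variance of the optimal SLS estimator is given by
$$F_0^* = E^{-1}(K' M^{-1} K).$$
Again, by the matrix form of the Cauchy-Schwarz inequality, $F_0 - F_0^*$ is nonnegative definite and is the zero matrix if and only if
$$M V K = K \, E^{-1}(K' M^{-1} K) \, E(K' V K).$$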

References

  1. Abarin, Taraneh, and Liqun Wang. 2009. Second-order least squares estimation of censored regression models. Journal of Statistical Planning and Inference 139: 125–35.
  2. Alvarez, Javier, and Manuel Arellano. 2003. The time series and cross-section asymptotics of dynamic panel data estimators. Econometrica 71: 1121–59.
  3. Amemiya, Takeshi. 1985. Advanced Econometrics. Cambridge: Harvard University Press.
  4. Arellano, Manuel. 2003. Panel Data Econometrics. Oxford: Oxford University Press.
  5. Arellano, Manuel, and Olympia Bover. 1995. Another look at the instrumental variable estimation of error-components models. Journal of Econometrics 68: 29–51.
  6. Arellano, Manuel, and Stephen Bond. 1991. Some tests of specification for panel data: Monte carlo evidence and an application to employment equations. The Review of Economic Studies 58: 277–97.
  7. Baltagi, Badi. 2008. Econometric Analysis of Panel Data. New York: Wiley.
  8. Bickel, Peter J., Chris A. Klaassen, Ya’acov Ritov, and Jon A. Wellner. 1993. Efficient and Adaptive Estimation for Semiparametric Models. Baltimore: Johns Hopkins University Press.
  9. Blundell, Richard, and Stephen Bond. 1998. Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87: 115–43.
  10. Bose, Mausumi, and Rahul Mukerjee. 2015. Optimal design measures under asymmetric errors, with application to binary design points. Journal of Statistical Planning and Inference 159: 28–36.
  11. Chamberlain, Gary. 1987. Asymptotic efficiency in estimation with conditional moment restrictions. Journal of Econometrics 34: 305–34.
  12. Chamberlain, Gary. 1992. Comment: Sequential moment restrictions in panel data. Journal of Business & Economic Statistics 10: 20–26.
  13. Fitzmaurice, Garrett, Marie Davidian, Geert Verbeke, and Geert Molenberghs. 2009. Longitudinal Data Analysis. Boca Raton: CRC Press.
  14. Gao, Lucy L., and Julie Zhou. 2014. New optimal design criteria for regression models with asymmetric errors. Journal of Statistical Planning and Inference 149: 140–51.
  15. Gao, Lucy L., and Julie Zhou. 2017. D-optimal designs based on the second-order least squares estimator. Statistical Papers 58: 77–94.
  16. Hahn, Jinyong. 1997. Efficient estimation of panel data models with sequential moment restrictions. Journal of Econometrics 79: 1–21.
  17. Hsiao, Cheng. 2003. Analysis of Panel Data. Cambridge: Cambridge University Press.
  18. Hsiao, Cheng. 2011. Dynamic panel data models. Handbook of Empirical Economics and Finance 13: 373–96.
  19. Jennrich, Robert I. 1969. Asymptotic properties of non-linear least squares estimators. The Annals of Mathematical Statistics 40: 633–43.
  20. Kim, Mijeong, and Yanyuan Ma. 2012. The efficiency of the second-order nonlinear least squares estimator and its extension. Annals of the Institute of Statistical Mathematics 64: 751–64.
  21. Kiviet, Jan F. 1995. On bias, inconsistency, and efficiency of various estimators in dynamic panel data models. Journal of Econometrics 68: 53–78.
  22. Newey, Whitney K., and Daniel McFadden. 1994. Large sample estimation and hypothesis testing. Handbook of Econometrics 4: 2111–245.
  23. Okui, Ryo. 2009. The optimal choice of moments in dynamic panel data models. Journal of Econometrics 151: 1–16.
  24. Park, Byeong U, Robin C Sickles, and Léopold Simar. 2007. Semiparametric efficient estimation of dynamic panel data models. Journal of Econometrics 136: 281–301.
  25. Wang, Liqun. 2003. Estimation of nonlinear berkson-type measurement error models. Statistica Sinica 13: 1201–10.
  26. Wang, Liqun. 2004. Estimation of nonlinear models with berkson measurement errors. The Annals of Statistics 32: 2559–79.
  27. Wang, Liqun. 2007. A unified approach to estimation of nonlinear mixed effects and berkson measurement error models. Canadian Journal of Statistics 35: 233–48.
  28. Wang, Liqun, and Alexandre Leblanc. 2008. Second-order nonlinear least squares estimation. Annals of the Institute of Statistical Mathematics 60: 883–900.
  29. Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section and Panel Data. Cambridge: MIT Press.
  30. Yin, Yue, and Julie Zhou. 2017. Optimal designs for regression models using the second-order least squares estimator. Statistica Sinica 27: 1841–56.
Figure 1. Reduction (%) in the variance of RML($\alpha_0$) gained by the optimal SLS($\alpha_0$). (a) Stationary, $\varepsilon_{it} \mid \eta_i, y_{i0} \sim \chi^2_{(1)}$. (b) Stationary, $\varepsilon_{it} \mid \eta_i, y_{i0} \sim t_{(5)}$. (c) Nonstationary, $\varepsilon_{it} \mid \eta_i, y_{i0} \sim \chi^2_{(1)}$. (d) Nonstationary, $\varepsilon_{it} \mid \eta_i, y_{i0} \sim t_{(5)}$.
Figure 2. The empirical vs. asymptotic sampling distribution of SW by 1000 data cloning.
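Table 1. The MAD of FOSLS and the relative MAD of all other estimators (rows labeled MAD), and their medians (rows labeled Median), for the nonnormal-nonstationary scenario.

| T | N | Statistic | $\alpha_0 = 0.2$ | | | | | $\alpha_0 = 0.5$ | | | | | $\alpha_0 = 0.8$ | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | FOSLS | FOSLS1 | RML | GMM | MPML | FOSLS | FOSLS1 | RML | GMM | MPML | FOSLS | FOSLS1 | RML | GMM | MPML |
| 5 | 30 | MAD | 0.006 | 1.07 | 1.29 | 1.29 | 120.57 | 0.003 | 1.07 | 1.44 | 1.44 | 124.24 | 0.002 | 1.18 | 1.56 | 1.54 | 89.56 |
| | | Median | 0.200 | 0.20 | 0.20 | 0.20 | 0.90 | 0.500 | 0.50 | 0.50 | 0.50 | 0.94 | 0.800 | 0.80 | 0.80 | 0.80 | 0.96 |
| 5 | 300 | MAD | 0.002 | 0.93 | 1.44 | 1.44 | 394.89 | 0.001 | 0.95 | 1.48 | 1.47 | 413.21 | 0.001 | 0.89 | 1.44 | 1.44 | 255.54 |
| | | Median | 0.200 | 0.20 | 0.20 | 0.20 | 0.91 | 0.500 | 0.50 | 0.50 | 0.50 | 0.94 | 0.800 | 0.80 | 0.80 | 0.80 | 0.96 |
| 10 | 30 | MAD | 0.005 | 1.26 | 1.38 | 1.34 | 146.48 | 0.003 | 1.32 | 1.55 | 1.52 | 191.90 | 0.001 | 1.16 | 1.46 | 1.45 | 173.13 |
| | | Median | 0.200 | 0.20 | 0.20 | 0.20 | 0.95 | 0.500 | 0.50 | 0.50 | 0.50 | 0.97 | 0.800 | 0.80 | 0.80 | 0.80 | 0.99 |
| 10 | 300 | MAD | 0.001 | 0.95 | 1.44 | 1.47 | 468.00 | 0.001 | 0.95 | 1.47 | 1.46 | 554.61 | 0.000 | 1.01 | 1.53 | 1.53 | 590.03 |
| | | Median | 0.200 | 0.20 | 0.20 | 0.20 | 0.95 | 0.500 | 0.50 | 0.50 | 0.50 | 0.97 | 0.800 | 0.80 | 0.80 | 0.80 | 0.99 |
| 15 | 30 | MAD | 0.005 | 2.00 | 1.37 | 1.40 | 156.16 | 0.003 | 2.38 | 1.43 | 1.42 | 193.59 | 0.001 | 2.18 | 1.53 | 1.54 | 239.64 |
| | | Median | 0.200 | 0.20 | 0.20 | 0.20 | 0.97 | 0.500 | 0.50 | 0.50 | 0.50 | 0.98 | 0.800 | 0.80 | 0.80 | 0.80 | 0.99 |
| 15 | 300 | MAD | 0.001 | 1.06 | 1.50 | 1.52 | 497.46 | 0.001 | 1.10 | 1.56 | 1.57 | 641.36 | 0.000 | 1.18 | 1.59 | 1.60 | 748.73 |
| | | Median | 0.200 | 0.20 | 0.20 | 0.20 | 0.97 | 0.500 | 0.50 | 0.50 | 0.50 | 0.98 | 0.800 | 0.80 | 0.80 | 0.80 | 0.99 |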
Table 2. The RMSE of FOSLS calculated by simulation and by the asymptotic formula (9) for the nonnormal-nonstationary scenario.

| T | N | $\alpha_0 = 0.2$: Simulation | $\alpha_0 = 0.2$: Asymptotic formula | $\alpha_0 = 0.5$: Simulation | $\alpha_0 = 0.5$: Asymptotic formula | $\alpha_0 = 0.8$: Simulation | $\alpha_0 = 0.8$: Asymptotic formula |
|---|---|---|---|---|---|---|---|
| 5 | 30 | 0.00866 | 0.00815 | 0.00526 | 0.00501 | 0.00268 | 0.00288 |
| 5 | 300 | 0.00265 | 0.00258 | 0.00159 | 0.00158 | 0.00094 | 0.00091 |
| 15 | 30 | 0.00729 | 0.00703 | 0.00369 | 0.00364 | 0.00119 | 0.00122 |
| 15 | 300 | 0.00229 | 0.00222 | 0.00112 | 0.00115 | 0.00038 | 0.00039 |
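Table 3. The RMSE of FOSLS and the relative RMSE of all other estimators (rows labeled RMSE), and their means (rows labeled Mean), for the nonnormal near unit root scenario.

| T | N | Statistic | $\alpha_0 = 0.9$ | | | | $\alpha_0 = 0.95$ | | | | $\alpha_0 = 0.99$ | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | FOSLS | RML | GMM | MPML | FOSLS | RML | GMM | MPML | FOSLS | RML | GMM | MPML |
| 3 | 100 | RMSE | 0.216 | 1.23 | 4.94 | 0.45 | 0.240 | 1.13 | 5.04 | 0.38 | 0.123 | 2.36 | 10.28 | 0.66 |
| | | Mean | 0.82 | 0.71 | 0.30 | 0.88 | 0.87 | 0.75 | 0.12 | 0.92 | 0.95 | 0.77 | 0.08 | 0.95 |
| 3 | 300 | RMSE | 0.151 | 1.21 | 5.85 | 0.43 | 0.130 | 1.52 | 8.32 | 0.44 | 0.060 | 3.59 | 21.85 | 0.81 |
| | | Mean | 0.83 | 0.78 | 0.48 | 0.90 | 0.89 | 0.81 | 0.27 | 0.94 | 0.97 | 0.83 | 0.09 | 0.97 |
| 6 | 100 | RMSE | 0.086 | 1.27 | 5.27 | 0.67 | 0.094 | 1.32 | 6.21 | 0.49 | 0.037 | 3.75 | 17.10 | 1.07 |
| | | Mean | 0.87 | 0.83 | 0.52 | 0.90 | 0.91 | 0.86 | 0.43 | 0.94 | 0.98 | 0.88 | 0.43 | 0.97 |
| 6 | 300 | RMSE | 0.039 | 1.85 | 7.44 | 0.88 | 0.064 | 1.36 | 7.83 | 0.48 | 0.013 | 7.50 | 47.90 | 1.69 |
| | | Mean | 0.90 | 0.86 | 0.68 | 0.90 | 0.92 | 0.89 | 0.53 | 0.95 | 0.99 | 0.91 | 0.43 | 0.98 |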
Table 4. Bias, standard deviation and RMSE of estimators for the Kiviet (1995) model with a covariate, N = 100.

| Design | Estimator | Bias $\alpha$ | Bias $\beta$ | Std $\alpha$ | Std $\beta$ | RMSE $\alpha$ | RMSE $\beta$ |
|---|---|---|---|---|---|---|---|
| II | GMM | −0.029 | 0.002 | 0.388 | 0.130 | 0.396 | 0.131 |
| | RML | −0.005 | −0.006 | 0.106 | 0.093 | 0.107 | 0.094 |
| | FOSLS | −0.005 | −0.006 | 0.107 | 0.098 | 0.107 | 0.099 |
| VI | GMM | −0.060 | −0.026 | 0.304 | 0.065 | 0.313 | 0.066 |
| | RML | −0.021 | 0.023 | 0.059 | 0.046 | 0.059 | 0.047 |
| | FOSLS | −0.020 | 0.023 | 0.063 | 0.047 | 0.063 | 0.048 |
| X | GMM | −0.074 | −0.011 | 0.065 | 0.309 | 0.067 | 0.309 |
| | RML | −0.004 | 0.009 | 0.048 | 0.182 | 0.048 | 0.183 |
| | FOSLS | −0.005 | 0.009 | 0.048 | 0.183 | 0.048 | 0.184 |
| XII | GMM | −0.038 | −0.019 | 0.088 | 0.344 | 0.096 | 0.344 |
| | RML | 0.000 | 0.006 | 0.058 | 0.178 | 0.058 | 0.179 |
| | FOSLS | −0.030 | 0.003 | 0.058 | 0.180 | 0.058 | 0.180 |
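Table 5. The models fitted to airfare data, with coefficient estimates (rows labeled est.) and standard errors (rows labeled s.e.). The standard errors of FOSLS and RML are computed using (9) and (A14), respectively.

| Coef. | Row | OLS | Model-I GMM2 | Model-I RML | Model-I FOSLS | Model-II GMM2 | Model-II RML | Model-II FOSLS | Model-III GMM2 | Model-III RML | Model-III FOSLS | Model-IV GMM2 | Model-IV RML | Model-IV FOSLS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Const: $\theta_0$ | est. | 0.153 | | 0.256 | −0.122 | | 0.248 | 0.723 | | 0.243 | 0.786 | | 0.279 | 0.708 |
| | s.e. | | | 0.032 | 0.032 | | 0.026 | 0.025 | | 0.010 | 0.010 | | 0.051 | 0.050 |
| lfare1: $\alpha_1$ | est. | 0.929 | 0.216 | 0.374 | 0.536 | 0.216 | 0.377 | 0.461 | 0.157 | 0.373 | 0.817 | 0.169 | 0.147 | 0.823 |
| | s.e. | | 0.100 | 0.039 | 0.033 | 0.100 | 0.014 | 0.013 | 0.080 | 0.009 | 0.009 | 0.072 | 0.017 | 0.017 |
| ldist: $\beta_1$ | est. | 0.031 | | 0.049 | 0.066 | | 0.051 | 0.091 | | 0.052 | −0.007 | | 0.062 | −0.019 |
| | s.e. | | | 0.004 | 0.004 | | 0.003 | 0.003 | | 0.001 | 0.001 | | 0.006 | 0.006 |
| concen: $\beta_2$ | est. | 0.119 | −0.849 | 0.108 | 0.048 | −0.849 | 0.091 | 0.162 | −0.217 | 0.080 | 0.070 | −0.842 | 0.095 | 0.077 |
| | s.e. | | 0.449 | 0.018 | 0.018 | 0.449 | 0.011 | 0.011 | 0.167 | 0.003 | 0.003 | 0.389 | 0.020 | 0.019 |
| concen1: $\beta_3$ | est. | −0.076 | 0.325 | 0.013 | 0.040 | 0.325 | −0.015 | −0.032 | | | | 0.320 | 0.000 | −0.106 |
| | s.e. | | 0.215 | 0.022 | 0.022 | 0.215 | 0.011 | 0.011 | | | | 0.187 | 0.020 | 0.019 |
| lpassen: $\beta_4$ | est. | −0.374 | −0.404 | −0.367 | −0.556 | −0.404 | −0.367 | −0.108 | −0.450 | −0.367 | −0.094 | −0.382 | −0.333 | −0.084 |
| | s.e. | | 0.081 | 0.006 | 0.006 | 0.081 | 0.004 | 0.004 | 0.057 | 0.002 | 0.002 | 0.060 | 0.007 | 0.007 |
| lpassen1: $\beta_5$ | est. | 0.374 | 0.103 | 0.175 | 0.427 | 0.103 | 0.176 | −0.014 | 0.106 | 0.175 | 0.005 | | | |
| | s.e. | | 0.141 | 0.021 | 0.019 | 0.141 | 0.006 | 0.006 | 0.111 | 0.003 | 0.003 | | | |
| D99: $\beta_6$ | est. | 0.000 | 0.003 | 0.016 | 0.009 | 0.003 | 0.016 | 0.037 | 0.020 | 0.016 | −0.017 | 0.005 | 0.023 | −0.014 |
| | s.e. | | 0.013 | 0.003 | 0.003 | 0.013 | 0.002 | 0.002 | 0.006 | 0.001 | 0.001 | 0.012 | 0.003 | 0.003 |
| D00: $\beta_7$ | est. | 0.042 | 0.070 | 0.074 | 0.060 | 0.070 | 0.073 | 0.092 | 0.087 | 0.073 | −0.003 | 0.078 | 0.090 | 0.073 |
| | s.e. | | 0.015 | 0.003 | 0.003 | 0.015 | 0.003 | 0.002 | 0.011 | 0.001 | 0.001 | 0.012 | 0.003 | 0.003 |
| Diagnostic Statistics | | | | | | | | | | | | | | |
| SW/Sargan | | | 6.41 | | 20.98 | 6.41 | | 4.01 | 13.90 | | 335.80 | 8.16 | | 8545.16 |
| p-value | | | 0.268 | | 0.013 | 0.268 | | 0.911 | 0.031 | | 0.000 | 0.226 | | 0.000 |
| RMSE | | 0.09 | 5.12 | 4.02 | 3.23 | 5.12 | 4.04 | 2.66 | 5.12 | 4.07 | 0.69 | 5.12 | 6.01 | 0.74 |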
Table 6. Model-II robust standard errors of FOSLS calculated by Theorem 1.

| $\theta_0$ | $\alpha_1$ | $\beta_1$ | $\beta_2$ | $\beta_3$ | $\beta_4$ | $\beta_5$ | $\beta_6$ | $\beta_7$ |
|---|---|---|---|---|---|---|---|---|
| Const. | lfare1 | ldist | concen | concen1 | lpassen | lpassen1 | D99 | D00 |
| 0.061 | 0.041 | 0.007 | 0.034 | 0.034 | 0.026 | 0.023 | 0.004 | 0.005 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
