Second-Order Least Squares Method for Dynamic Panel Data Models with Application

Abstract: Management of financial risks and sound decision making rely on accurate information and predictive models. Drawing useful information efficiently from big data with complex structures and building accurate models are therefore crucial tasks. The most commonly used methods for statistical inference in dynamic panel data models are based on a differencing transformation of the data. However, differencing may cause a substantial loss of information, so the subsequent analysis may fail to capture important features of the original level data. We demonstrate this point with a real data example in which a semiparametrically efficient estimation method applied to the level data leads to a more favorable model. In particular, we study a second-order least squares approach based on the first two conditional moments of the response variable given the explanatory variables. The resulting estimator is root-N consistent and its asymptotic variance attains a semiparametric efficiency lower bound. Monte Carlo simulations show that this estimator performs favorably in finite sample situations compared with the first-differenced GMM and the random effects pseudo ML estimators. We also propose a new diagnostic test, based on the proposed estimator, to check the working moment assumptions. A real data application is presented to further demonstrate the use of this method.


Introduction
Modern technology and data collection techniques have produced huge amounts of data in business, economics, and many other fields. On the one hand, these data provide rich information to support decision making; on the other hand, their big size and complex structure make statistical analysis very challenging. How to draw useful information from data efficiently and how to identify accurate predictive models are therefore important issues for the management of financial risks and for decision making in general. A common data type is repeated measurements collected on a large number of units over a certain period of time. In business and economics, such data are typically collected at regular time points (calendar time) and are usually called panel data. Statistical methodologies for the analysis of longitudinal data have been extensively studied in both statistics and econometrics; however, the research follows two different directions.
In the statistical literature, mainstream research focuses on the likelihood and generalized estimating equations (GEE) approaches in linear and nonlinear mixed effects models (e.g., Fitzmaurice et al. 2009). In contrast, in the econometric literature, the emphasis is on the likelihood and generalized method of moments (GMM) approaches in dynamic panel data models (e.g., Arellano 2003; Baltagi 2008; Hsiao 2003, 2011). In particular, the GMM approach is usually based on a suitable linear transformation, such as first differencing, to eliminate the unobserved subject effects (Arellano and Bond 1991; Arellano and Bover 1995; Blundell and Bond 1998). However, the differencing operation may cause a substantial loss of information in the data and therefore lead to a loss of estimation efficiency and inaccurate model identification. We will show through a real data example that a more favorable model can be reached by applying a semiparametrically efficient estimation method to the original level data. Moreover, the model specification can be checked by our proposed diagnostic test.
In this paper, we propose a moments-based approach that is theoretically less restrictive than the likelihood-based methods and is fairly efficient and computationally tractable. This approach requires only the specification of the first two conditional moments of the unobserved subject effect given the process initial value and covariates, and does not require any other initial conditions or distributional assumptions. The data generating process can be either stationary or nonstationary, and does not need to be transformed. The so-called second-order least squares (SLS) estimator is consistent and asymptotically normally distributed when the cross section size N is large and the time series length T is fixed. In addition, this estimator reaches a semiparametric efficiency lower bound. We also propose a diagnostic test based on the SLS estimator to check the conditional moments assumption. Our extensive simulation studies show that the proposed estimator and its variants perform very well in finite sample situations and better than the GMM and likelihood based estimators in most cases.
The SLS method was first proposed by Wang (2003, 2004) to estimate nonlinear measurement error models. It was extended to nonlinear longitudinal data models with homoscedastic errors by Wang (2007) and to censored linear models by Abarin and Wang (2009). Wang and Leblanc (2008) showed that under a nonlinear (homoscedastic) cross-sectional data model, the SLSE is asymptotically more efficient than the OLSE (pseudo MLE) when the error term has nonzero third moment, and both estimators are equally efficient otherwise. Kim and Ma (2012) proved that the SLSE attains the optimal semiparametric efficiency bound in general. More recently, the SLS method has also been used in optimal design problems by several researchers, e.g., Gao and Zhou (2014, 2017), Bose and Mukerjee (2015), and Yin and Zhou (2017). This paper is organized as follows. In Sections 2 and 3 we introduce the model and the semiparametric method for estimation and testing. In Section 4 we derive the asymptotically most efficient version of the proposed estimators. In Section 5 we carry out Monte Carlo simulations to study finite sample properties of the proposed estimators and compare them with other commonly used methods. In Section 6 we apply our method to a real data set to demonstrate its practical merits. Finally, conclusions and discussion are in Section 7, and regularity conditions and mathematical proofs are in Appendices A–F.

Model and SLS Estimation
Let (y_i, x_i, η_i), i = 1, 2, . . . , N, be independent and identically distributed random vectors, where y_i = (y_i0, y_i1, . . . , y_iT) and x_i = (x_i1, x_i2, . . . , x_iT) are respectively the measurements of the response variable and p covariates taken on the ith subject over T time periods. Suppose

y_it = α_0 y_i(t−1) + x′_it β_0 + η_i + ε_it, t = 1, 2, . . . , T, (1)

where η_i is the unobserved subject effect and the error term ε_it satisfies E(ε_it | η_i, y_i0, x_i) = 0 and E(ε_it ε_is | η_i, y_i0, x_i) = σ²_0 for t = s and zero otherwise. In addition, we assume that the conditional moments E(η^j_i | y_i0, x_i) = f_j(y_i0, x_i, θ_0), j = 1, 2, are known up to an ℓ-dimensional unknown parameter vector θ_0. This assumption is more general than the unrestricted initial conditions used by Blundell and Bond (1998) and Alvarez and Arellano (2003) to derive the conditional GLS (CGLS) and the random effects ML (RML) estimators, respectively. Note that our semiparametric assumption on η_i is not as restrictive as it appears, because the functional forms of f_j(y_i0, x_i, θ_0), j = 1, 2, can be specified naturally based on some diagnostic tools, as illustrated in Section 6. Moreover, any suggested specification can be tested using the test developed in Section 3. Also note that, although we explicitly deal with an AR(1) model for simplicity of notation, our approach can be extended to more general AR(p) models, as discussed later.
We propose to estimate the unknown parameters in model (1) based on the first two conditional moments of the response y_it given its initial value y_i0 and the covariates x_i. Specifically, let γ_0 = (α_0, σ²_0, β′_0, θ′_0)′ and let Γ ⊂ R^(p+ℓ+2) be the corresponding parameter space. By backward substitution in Equation (1), we obtain the reduced form equation

y_it = a_t(α) y_i0 + Σ_{r=1}^{t} α^{t−r} (x′_ir β + η_i + ε_ir), t = 1, 2, . . . , T, (2)

where a_t(α) = α^t. Under this model, the first two conditional moments of y_it, namely E(y_it | y_i0, x_i) and E(y_it y_is | y_i0, x_i), t, s = 1, . . . , T, follow from (2) and the moment assumptions on ε_it and η_i; we refer to them as Equations (3) and (4). Let h_i(γ) collect the deviations of y_it and y_it y_is from these conditional moments. Then the second-order least squares (SLS) estimator for γ is defined by

γ̂_N = argmin_{γ ∈ Γ} Σ_{i=1}^{N} q_i(γ), (5)

where q_i(γ) = h′_i(γ) W_i h_i(γ) and W_i is a nonnegative definite matrix whose elements are real measurable functions of (y_i0, x_i). It follows from standard M-estimation theory that the SLS estimator γ̂_N has the following asymptotic properties under the regularity conditions given in Appendix A.
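The SLS principle can be illustrated with a minimal sketch on a simplified cross-sectional model y = βx + ε (the homoscedastic case studied by Wang and Leblanc 2008), rather than the full dynamic panel estimator above. The moment functions h_i stack the first- and second-moment deviations, and the objective uses identity weights W_i = I; all parameter values and the skewed error distribution are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Simplified cross-sectional illustration of the SLS principle (not the
# full dynamic panel estimator): y_i = beta*x_i + eps_i, with conditional
# moments E(y|x) = beta*x and E(y^2|x) = (beta*x)^2 + sigma2.
rng = np.random.default_rng(0)
N = 2000
x = rng.uniform(1.0, 3.0, N)
beta_true, sigma2_true = 1.5, 0.49
# centered chi-square errors: skewed, so SLS can beat OLS here
eps = (rng.chisquare(2, N) - 2.0) * np.sqrt(sigma2_true / 4.0)
y = beta_true * x + eps

def sls_objective(gamma):
    beta, sigma2 = gamma
    h1 = y - beta * x                   # first-moment deviation
    h2 = y**2 - (beta * x)**2 - sigma2  # second-moment deviation
    return np.sum(h1**2 + h2**2)        # W_i = identity

res = minimize(sls_objective, x0=[1.0, 1.0], method="Nelder-Mead")
beta_hat, sigma2_hat = res.x
```

Because E{h_i(γ_0) | x_i} = 0 at the true parameters, the population objective is minimized at γ_0, so the minimizer is consistent for (β, σ²) even though no distributional assumption on ε is used.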

A Diagnostic Test
The estimation approach in the previous section relies on the correct specification of the first two conditional moments of the response variables. In this section we propose a test for these moment conditions, called the SW test for convenience. It is designed to test the null hypothesis H_0: E{h_i(γ_0) | y_i0, x_i} = 0 for some γ_0 ∈ Γ, against the alternative that no such γ_0 exists. The test statistic SW is a quadratic form in the estimated moment functions evaluated at γ̂_N of Equation (5), where Ĝ_N is a consistent estimator of their asymptotic covariance matrix. The asymptotic distribution of the test statistic SW is given below.
Theorem 2. Under Assumptions A1–A6 and H_0, SW →_d χ²_K as N → ∞ with T fixed.
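A generic quadratic-form moment test of this shape can be sketched as follows; the exact construction of Ĝ_N in the paper is more involved, so the sample covariance used here is a stand-in assumption, and the simulated moment functions simply satisfy H_0 by construction.

```python
import numpy as np
from scipy.stats import chi2

# Illustrative quadratic-form moment test: under H0 the statistic is
# asymptotically chi-square with K degrees of freedom.
rng = np.random.default_rng(1)
N, K = 500, 4
# h: N x K matrix of estimated moment functions h_i(gamma_hat);
# under H0 each row has (conditional) mean zero.
h = rng.normal(size=(N, K))

h_bar = h.mean(axis=0)                 # sample moment vector
G = np.cov(h, rowvar=False)            # covariance estimate (stand-in for G_N)
SW = N * h_bar @ np.linalg.solve(G, h_bar)
p_value = chi2.sf(SW, df=K)            # reject H0 for small p-values
```

The degrees of freedom K equal the number of moment conditions being checked; large values of SW indicate misspecification of the working moments.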

Optimal SLS Estimator
From Equations (6) and (7) we see that the asymptotic covariance matrix of γ̂_N depends on the weights W_i. It is therefore desirable to choose the optimal weight so that the asymptotic variance is minimized in the following sense.
Theorem 3. Suppose U_1 = E{h_1(γ_0) h′_1(γ_0) | y_10, x_1} is nonsingular with probability one, and Assumptions A2–A6 are satisfied with W_1 = U_1^(−1). Then the SLS estimator γ̂^o_N obtained by taking W_i = U_i^(−1), i = 1, 2, . . . , N, has asymptotic covariance matrix A_0^(−1) given in (9). Furthermore, the asymptotic covariance matrix of any SLS estimator γ̂_N is bounded below by A_0^(−1). However, since the optimal weight U_i^(−1) depends on γ_0 and on the conditional moments E(ε^j_it | y_i0, x_i) and E(η^j_i | y_i0, x_i), j = 3, 4, the optimal SLS estimator γ̂^o_N is not feasible. A corresponding feasible SLS estimator can be calculated by plugging consistent estimators of these unknown quantities into U_i^(−1). This feasible optimal SLS (FOSLS) estimator is consistent under condition (A1) in Appendix A. Moreover, the FOSLS has the same asymptotic covariance matrix as the (infeasible) optimal SLS estimator (Newey and McFadden 1994, Theorem 6.1).
We also suggest another version of the FOSLS, called FOSLS1, which may be more robust to possible stochastic dependence between ε_it and η_i and does not require initial estimates of the third and fourth conditional moments of ε_it and η_i. It is obtained by using a weight built from a preliminary consistent estimator γ̃⁰_N of γ_0 and a transformation matrix C(y_i0, x_i, θ, α) that maps h_i(γ) into the deviation-form moment vector h*_i(γ). It can be shown that the FOSLS1 has the same asymptotic covariance matrix as the optimal SLS estimator if ε_it and η_i have constant second to fourth conditional central moments given the process initial value and covariates. It is worth noting that the matrix C and the vectors h*_i(γ) do not involve the moments in (3) and (4), which are calculated from the reduced form (2). Therefore it is straightforward to generalize them to AR(p) models.
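The two-step feasible weighting idea can be sketched on the same simplified cross-sectional model y = βx + ε used earlier (an assumption for illustration, not the paper's panel setting). Step 1 is identity-weight SLS; step 2 estimates the third and fourth residual moments and plugs them into the optimal weight W_i = U_i^(−1), where U_i = E[h_i h′_i | x_i] is worked out analytically for this simple model.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
N = 3000
x = rng.uniform(1.0, 3.0, N)
beta0, s2 = 1.5, 0.49
eps = (rng.chisquare(2, N) - 2.0) * np.sqrt(s2 / 4.0)  # skewed errors
y = beta0 * x + eps

def h(gamma):
    b, v = gamma
    # stacked first- and second-moment deviations, one 2-vector per unit
    return np.stack([y - b * x, y**2 - (b * x)**2 - v], axis=1)

# Step 1: preliminary identity-weight SLS estimate
step1 = minimize(lambda g: np.sum(h(g)**2), x0=[1.0, 1.0],
                 method="Nelder-Mead").x
b1, v1 = step1
r = y - b1 * x
mu3, mu4 = np.mean(r**3), np.mean(r**4)  # plug-in higher moments

# Step 2: U_i = E[h_i h_i' | x_i]; for this model, with c = 2*b*x_i,
#   U_i = [[v, c*v + mu3], [c*v + mu3, c^2*v + 2*c*mu3 + mu4 - v^2]]
c = 2.0 * b1 * x
U = np.empty((N, 2, 2))
U[:, 0, 0] = v1
U[:, 0, 1] = U[:, 1, 0] = c * v1 + mu3
U[:, 1, 1] = c**2 * v1 + 2.0 * c * mu3 + mu4 - v1**2
W = np.linalg.inv(U)                     # batched inverse: W_i = U_i^{-1}

def obj2(gamma):
    H = h(gamma)
    return np.einsum('ij,ijk,ik->', H, W, H)  # sum_i h_i' W_i h_i

step2 = minimize(obj2, x0=step1, method="Nelder-Mead").x
```

Replacing the unknown moments with consistent step-1 estimates does not change the asymptotic covariance of the second-step estimator, which is the point made by the Newey and McFadden result cited above.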
Several researchers have studied semiparametrically efficient estimators for various dynamic panel data models. Chamberlain (1992) derived the optimal instrumental variables for the first-difference equation of model (1) under sequential conditional moment restrictions and showed that the GMM estimator based on these optimal instrumental variables attains the semiparametric efficiency lower bound. Hahn (1997) showed that the GMM estimator based on a set of instruments that grows with the sample size achieves the semiparametric efficiency bound of Chamberlain (1992). More recently, Park et al. (2007) used the geometric approach of Bickel et al. (1993) to construct a semiparametrically efficient estimator under the stationary model with normal error distribution.
A natural question is whether the optimal SLS estimator efficiently uses the information in conditional moments (3) and (4). By Lemma 2 of Chamberlain (1987), the minimum bound of the asymptotic variance under E{h_i(γ_0) | y_i0, x_i} = 0 is given by A_0^(−1) in (9). Therefore, by Theorem 3 the optimal SLS estimator attains Chamberlain's variance bound and is semiparametrically optimal in this sense.
We conclude this section by comparing the asymptotic variance of the optimal SLS estimator with that of the RML estimator which is identical to the MLE conditional on the initial observation when the errors are normally distributed. Theoretically, the optimal SLSE is at least as efficient as the RMLE by (A14)-(A16) in Appendix F. Unfortunately it is difficult to evaluate the efficiency gain of the optimal SLSE analytically, hence we compare the asymptotic variances of the two estimators numerically. We considered a large number of scenarios with various data generating processes (stationary or nonstationary) and distributions for the error and unobserved random effects.
The percentage gain of efficiency in estimating α_0 as a function of T and α_0 is shown in Figure 1, where the z-axis represents the percentage reduction in the variance of the RMLE(α_0) achieved by using the optimal SLSE(α_0). Our numerical results show convincingly that the asymptotic variance of the optimal SLSE(α_0) is strictly less than that of the RMLE(α_0), except in the case μ_3(ε) = 0 and μ_4(ε) = 3σ⁴_0 (which holds under the normal distribution), in which both estimators have the same asymptotic variance.

Monte Carlo Simulation Studies
In this section we carry out extensive Monte Carlo simulations to examine the finite sample properties of the FOSLS and FOSLS1 estimators and compare them with some other popular estimators in the literature, including the linear first-differenced GMM (Arellano and Bond 1991), the marginal pseudo maximum likelihood (MPML) (Arellano 2003), and the random effects pseudo maximum likelihood (RML) (Alvarez and Arellano 2003) estimators. For the sake of comparison we adopt the model setup commonly used in this literature, i.e., model (1) with no covariates x_it but with the specification of the moments of η_i given in (12). This model specification is also used by Blundell and Bond (1998) to derive conditional-type estimators such as the CGLS; see also Okui (2009).

Finite Sample Properties and Comparisons
Specifically, the data are generated according to the above model, where the constant c and the distributions F_1, F_2 are chosen as follows to be close to the setups in the literature.
Normal stationary process: c = 1 and F 1 , F 2 ∼ N(0, 1). Under this setup the MPML estimator is the true MLE computed using all data including the initial observations.
Nonnormal stationary process: c = 1 and F_1, F_2 nonnormal. Normal nonstationary process: c = 20 and F_1, F_2 ∼ N(0, 1). Under this setup the y process is nonstationary in the first two moments and the MPML estimator is inconsistent.
Nonnormal nonstationary process: c = 20 and F_1, F_2 nonnormal; case (a) is reported below. In all scenarios the parameter values are α_0 = 0.2, 0.5, 0.8 and the sample sizes are N = 30, 300 and T = 5, 10, 15. In each simulation 1000 Monte Carlo replications are performed, and the median estimates and the median absolute deviation (MAD) are calculated to assess the performance of the estimators. To compute the FOSLS estimators, we use the RML to obtain preliminary consistent estimates of γ_0, μ_j(ε), and μ_j(η), j = 3, 4, and then plug them into the optimal weight U_i^(−1). As mentioned earlier, this does not affect the asymptotic properties of the second-step estimators.
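The evaluation protocol above (simulate the no-covariate AR(1) panel, then summarize each estimator by its median and MAD over replications) can be sketched as follows. The Anderson–Hsiao IV estimator stands in for the estimators compared in the paper, since the FOSLS itself is more involved; the near-stationary initialization is an illustrative assumption.

```python
import numpy as np

# Monte Carlo sketch: AR(1) panel with subject effect, no covariates.
rng = np.random.default_rng(2)
alpha0, N, T, reps = 0.5, 300, 10, 200

def simulate_panel():
    eta = rng.normal(size=N)
    y = np.empty((N, T + 1))
    y[:, 0] = eta / (1 - alpha0) + rng.normal(size=N)  # near-stationary start
    for t in range(1, T + 1):
        y[:, t] = alpha0 * y[:, t - 1] + eta + rng.normal(size=N)
    return y

def anderson_hsiao(y):
    # Delta y_t = alpha * Delta y_{t-1} + Delta eps_t, instrument y_{t-2}
    num = den = 0.0
    for t in range(2, T + 1):
        z = y[:, t - 2]
        num += np.sum(z * (y[:, t] - y[:, t - 1]))
        den += np.sum(z * (y[:, t - 1] - y[:, t - 2]))
    return num / den

est = np.array([anderson_hsiao(simulate_panel()) for _ in range(reps)])
med = np.median(est)                     # median estimate
mad = np.median(np.abs(est - med))       # median absolute deviation
```

Median and MAD are used instead of mean and standard deviation because they are robust to the occasional outlying replication that IV-type estimators can produce.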
To save space we present the numerical results only for the nonnormal-nonstationary scenario, case (a), in Table 1, where we use the MAD of the FOSLS estimator as the reference and report the relative MADs of all other estimators. For the other scenarios we provide a summary and discussion of the results below. In the normal-stationary scenario, the MAD of the FOSLS is decreasing in T and nondecreasing in α_0. That the FOSLS and RML have equal asymptotic variances under normal error components (see (A16)) appears clearly for α_0 = 0.2, 0.5, while larger values of N are required to see this for larger α_0. As expected, the MPML has the smallest MAD in almost all cases because it is the most efficient estimator under normality. However, the gap between the MPML and FOSLS narrows as α_0 decreases or T increases. The first-differenced GMM is generally inferior, and the problem of weak instruments appears clearly for large α_0, which is consistent with Blundell and Bond (1998). The FOSLS1 is not reliable for small N because the numerical calculation of its weight matrix is unstable when N < T(T + 3)/2. Moreover, the relative MADs of the FOSLS1 for N = 300 show that this estimator converges more slowly than the FOSLS, as expected. The downward bias of the FOSLS vanishes quickly as T increases.
In the nonnormal-stationary scenario, the results show a wide outperformance by the FOSLS, especially for small T. The relative MADs of the RML reveal that the full variance reduction gained by the FOSLS (see Figure 1a) requires N to be larger than 300. The FOSLS competes well with the MPML for small N. Although the FOSLS1 is not reliable for a small N/T ratio, it performs well for N = 300. The FOSLS also has smaller bias than the other estimators for small N.
In the normal-nonstationary scenario, the FOSLS, RML, and GMM compete very well for both small and large N. The close performance of the FOSLS and RML is due to the normality of ε_it, and the improvement in the GMM performance is due to the nonstationarity of the y_it process. The results clearly show how the MPML breaks down everywhere under this scenario, demonstrating the consequence of misspecifying a nonstationary process. Again, the FOSLS1 requires a large N/T ratio to be stable, so it is not recommended in this scenario.
Finally, the results of the nonnormal-nonstationary scenario presented in Table 1 show the effect of the skewness of the ε_it distribution on the performance of the FOSLS. The RML and GMM are less efficient than the FOSLS by at least 30% for N = 30, and by as much as 59% for N = 300. The relative MADs of the RML for large N are consistent with Figure 1c. Although the full variance reduction gained by the FOSLS requires N larger than 300, the efficiency gain for small N is much larger than the corresponding gain in the nonnormal-stationary scenario. The numerical results of the three remaining cases (b, c, d) in this scenario clearly show the outperformance of the FOSLS(α_0) when ε_it has a nonnormal distribution.
We conclude this subsection by examining to what extent the asymptotic variance formula for the OSLS in (9) can be used to approximate its finite sample counterpart. To this end, we use the asymptotic formula to calculate the standard deviation of the OSLS estimator of α_0 and compare it with the sample RMSE calculated through Monte Carlo simulations for various values of N, T, and α_0. Table 2 shows that the two are fairly close, and therefore the asymptotic formula can safely be used in further inference about α_0, such as confidence intervals or hypothesis testing.

Robustness of FOSLS Against Unit-Root
It is well known in the literature that the linear first-differenced GMM is generally inferior when the autoregressive parameter α_0 is close to one, due to the problem of weak instruments; see, e.g., Blundell and Bond (1998). On the other hand, our simulation results in the previous subsection show that the performance of the FOSLS is not affected by large values of α_0. Furthermore, it becomes closer to the MPML and is sometimes even better, owing to the efficiency gain in the case of nonnormal errors. Hence it is interesting to investigate the performance of the FOSLS when α_0 is very close to one (the unit root case) and see whether it breaks down like the GMM or remains stable like the MPML.
To answer these questions we carry out a simulation using the nonnormal-stationary model defined before. The results in Table 3 show that the FOSLS is less biased and more efficient than the RML, due to the skewness of the within-group errors. Although the FOSLS has a larger downward bias than the MPML, the difference vanishes quickly as T and α_0 increase. These results demonstrate that the FOSLS is robust in the near unit root case and is even more efficient than the MPML for T = 6 and α_0 = 0.99, which may be due to the nonnormality.

Performance in the Presence of Covariates
In the previous simulation studies we considered models without covariates in order to make our numerical results comparable with existing methods in the literature. We now investigate whether the finite sample performance of the FOSLS changes when the model includes covariates. A well-known work in this respect is Kiviet (1995), who compared some least squares and instrumental variable (IV) type estimators, among them the linear first-differenced GMM of Arellano and Bond (1991).
Following Kiviet (1995, Section 5 and Appendix B), we consider model (1) with a covariate and y_t generated from the reduced form equation (here we omit the subscript i) y_t = β φ_t + ψ_t + η/(1 − α), t = 0, 1, . . . , T, where φ_t ∼ AR(2) and ψ_t ∼ AR(1) are mutually independent stationary processes, both independent of η ∼ N(0, σ²_η). The orthogonality and normality assumptions of Kiviet (1995) imply E(η | y_0, x_1, . . . , x_T) = θ_2 y_0 + θ_4 x_1 and V(η | y_0, x_1, . . . , x_T) = exp(θ_3) in our notation in (12). Kiviet (1995) reported simulation results for 14 different combinations (designs) of parameter values and sample sizes. We have simulated all 14 designs but present the results for only four of them to save space. Table 4 contains the bias, standard deviation, and root mean squared error of the FOSLS, RML, and linear first-differenced GMM (Arellano and Bond 1991) estimators of α and β. As expected, the RML (which here is the conditional MLE) is the best in all cases because the data are generated using the normal distribution. However, the FOSLS (which is asymptotically as efficient as the RML) is very close to the RML. These results show that in the presence of a strictly exogenous variable the FOSLS is a good alternative to the most efficient RML under normality. The calculated summary statistics of the GMM differ from the corresponding values of GMM1 in Kiviet (1995), probably because of two errors in his Equation B6, where ξ_0 and ξ_1 should be standardized. We have also run another simulation under the nonnormal-nonstationary setup and obtained results similar to those in Table 1; they also clearly show the outperformance of the FOSLS compared with the RML due to the deviation from normality.
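A Kiviet-style data-generating process of this form can be sketched as follows; the AR coefficients, variances, and sample sizes here are illustrative assumptions, not Kiviet's actual designs.

```python
import numpy as np

# Sketch of the reduced-form DGP y_t = beta*phi_t + psi_t + eta/(1-alpha),
# with phi_t ~ AR(2) and psi_t ~ AR(1) mutually independent stationary
# processes, both independent of eta ~ N(0, sigma_eta^2).
rng = np.random.default_rng(3)
N, T = 100, 6
alpha, beta = 0.5, 1.0
rho1, rho2, lam = 0.6, 0.2, alpha    # illustrative AR coefficients

def ar2(n):
    # stationary AR(2) path with a burn-in period discarded
    x = np.zeros(n + 50)
    e = rng.normal(size=n + 50)
    for t in range(2, n + 50):
        x[t] = rho1 * x[t - 1] + rho2 * x[t - 2] + e[t]
    return x[50:]

y = np.empty((N, T + 1))
x = np.empty((N, T + 1))
for i in range(N):
    phi = ar2(T + 1)                 # covariate process
    psi = np.zeros(T + 1)
    e = rng.normal(size=T + 1)
    for t in range(1, T + 1):
        psi[t] = lam * psi[t - 1] + e[t]
    eta = rng.normal()               # subject effect
    x[i] = phi
    y[i] = beta * phi + psi + eta / (1 - alpha)
```

The η/(1 − α) term is the long-run contribution of the subject effect, which is what makes the level of y_t correlated with η and motivates conditioning on the initial observation y_0.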

Application
In this section we use a real data example to demonstrate the practical usefulness of the SLS approach in comparison with the IV approach and to assess the practical gain of efficiency over the RML estimator. In particular, we use a data set published in Wooldridge (2010) and downloadable at http://mitpress.mit.edu/sites/default/files/ titles/content/wooldridge/statafiles.zip (as of 28 April 2014). The dataset airfare.dta contains data on airfares, number of passengers, distance, and the market share of the largest carrier for each of the top 1149 city-pair markets within the contiguous 48 US states for the fourth quarters of 1997 through 2000. A detailed description of the data can be found at http://academic.reed.edu/economics/parker/s10/312/Asgns/proj3.html (as of 28 April 2014).
The main factors that influence airfares are the flight distance (ldist), the average number of passengers (lpassen), and the market concentration (concen). All variables except concen are measured on the logarithmic scale. However, while the first factor is a time-invariant exogenous variable, the other two are clearly endogenous because they are also influenced by the airfare. Since airlines usually set the current airfare by adjusting the previous year's fare, a linear dynamic regression model with unobserved route heterogeneity and time dummies may be appropriate to measure the effects of these determinants. Indeed, bivariate scatterplots clearly show a linear dependence between the current and previous airfares (lfare and lfare1, respectively) and a moderate positive linear association between lfare and ldist. However, although theoretically concen and lfare should be positively correlated, this is not clearly seen. A possible explanation is that the relation is masked by the negative correlation between concen and ldist, given the positive correlation between lfare and ldist.
In light of these considerations, it is reasonable to start with a model, Equation (14), that includes the time dummies D99 and D00 for the years 1999 and 2000, respectively. We first calculate the naive ordinary least squares (OLS) estimates of the proposed Equation (14), which are shown in the OLS column of Table 5. Although the OLSE is not consistent because of the correlation between η_i and lfare_i(t−1), the high value of R² = 0.95 reflects the strong explanatory power of the regressors. We therefore refit the model using the two-stage linear first-differenced GMM (GMM2) of Arellano and Bond (1991), with the lags of order at least two of (lfare, concen, lpassen) as instrumental variables to deal with the possible endogeneity of these variables. The estimates are calculated using STATA 13.0 and the results are shown in Table 5 under Model-I. Wooldridge (2010, p. 373) used the GMM2 to fit a first-order linear dynamic model with only concen (treated as strictly exogenous) and dummy variables and obtained an estimated autoregressive parameter of 0.333, which is not far from ours. According to the reported value of the Sargan test (see Table 5), the GMM sequential moments based on the reduced form of model (14) are not rejected. However, the estimated elasticity of lfare with respect to concen is negative (p-value = 0.059). A possible explanation for this unexpected sign is the problem of weak instruments, which is likely to occur in identifying β_2 given the strong positive linear correlation between concen and concen1. This is similar to the situation where the first-differenced GMM estimator is used to estimate the autoregressive parameter when it is close to one (Blundell and Bond 1998). This problem inflates the coefficient variance and leads to unreliable estimates.
This is an example where the RE approach with the level data is preferred over the FE approach with differenced data, because the former does not depend on the instrumental variables and hence is able to give more reliable estimates.
However, since the sequential moments are correctly specified in Model-I, the GMM estimates can still be used to recover the within-group errors ε_it and the route effect η_i. We follow Arellano (2003, pp. 118–19) to estimate the time effect for year 1998 and subsequently η_i; realizations of ε_it are then obtained directly from Equation (14). Since bivariate scatterplots of η̂_i against the initial (1997) values of the variables (lfare0, concen0, ldist0, and lpassen0) show possible linear relationships between η_i and these initial values, we fit the following auxiliary equation, in which all coefficients are significant at the 0.01 level of significance: η̂_i = −5.507 + 0.641 lfare_i0 + 0.464 concen_i0 − 0.041 ldist_i0 + 0.286 lpassen_i0.
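The auxiliary step (regressing the recovered route effects on the 1997 initial values to suggest a functional form for E(η | y_0, x)) amounts to an ordinary least squares fit, sketched below on synthetic data. The regressor distributions and noise level are invented for illustration; only the coefficient values mirror the fitted equation in the text.

```python
import numpy as np

# Synthetic stand-ins for the 1997 initial values of the airfare data.
rng = np.random.default_rng(4)
n = 1149                                  # number of city-pair routes
lfare0 = rng.normal(5.0, 0.4, n)
concen0 = rng.uniform(0.2, 1.0, n)
ldist0 = rng.normal(6.5, 0.8, n)
lpassen0 = rng.normal(6.0, 0.7, n)
# recovered effects generated from the fitted auxiliary equation plus noise
eta_hat = (-5.507 + 0.641 * lfare0 + 0.464 * concen0
           - 0.041 * ldist0 + 0.286 * lpassen0 + rng.normal(0.0, 0.1, n))

# OLS fit of eta_hat on an intercept and the four initial values
X = np.column_stack([np.ones(n), lfare0, concen0, ldist0, lpassen0])
coef, *_ = np.linalg.lstsq(X, eta_hat, rcond=None)
```

In the paper the same fit is run on the actual recovered η̂_i; significant coefficients here justify specifying E(η_i | y_i0, x_i) as linear in the initial values.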
Furthermore, simple diagnostics show that the ε̂_it have constant variance across time and no significant autocorrelation; hence the assumptions on η_i and ε_it seem reasonable for model (14). These assumptions will be implicitly tested using our SW test later.
Based on Equation (15), we use the 'lme' command in R to calculate the RML estimates of the following equation, with the results given in Table 5 under Model-I:

lfare_it = θ_0 + α_1 lfare_i(t−1) + β_1 ldist_i + β_2 concen_it + β_3 concen_i(t−1) + β_4 lpassen_it + β_5 lpassen_i(t−1) + β_6 D99 + β_7 D00 + θ_1 lfare_i0 + θ_2 concen_i0 + θ_3 lpassen_i0. (16)

A normal QQ plot of the corresponding residuals ε̂_it shows a flat-tailed symmetric distribution that can be approximated by a Student t distribution with df = 5, indicating the possibility of an efficiency gain from applying the FOSLS estimator. Hence we use the sample skewness and kurtosis of ε̂_it and η̂*_i to calculate the optimal weight matrix for the FOSLS. To save space we report only the estimated coefficients of the main factors in Table 5, where the standard errors of GMM2 are computed using the robust formula, while the standard errors of the FOSLS and RML are computed using Formulas (9) and (A14), respectively. In Model-I, while the Sargan test does not reject the GMM sequential moments (p-value = 0.268), our SW test shows fairly strong evidence against the first and second moment specification given by Equations (3) and (4) (p-value = 0.013). This motivates us to check the reliability of the SW test by examining its empirical sampling distribution under (3) and (4), as follows. First, we use the RML estimates of Model-I to generate data from (16), drawing η*_i and ε_it from Student t distributions with df = 6 and 5, respectively, to be as close as possible to the estimated residuals and random effects of Model-I. Second, we use the generated data to fit (16) by RML followed by FOSLS, and then calculate the SW statistic. We repeat these two steps 1000 times to obtain an approximation to the sampling distribution of the SW statistic under H_0. Figure 2 confirms that a sample size of 1149 routes is sufficient for the empirical sampling distribution to be close to the asymptotic χ² distribution under H_0.
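The null-distribution check described above (repeatedly generate data under H_0, recompute the statistic, and compare the empirical distribution with the chi-square reference) can be sketched as follows. A generic quadratic-form statistic stands in for the full RML-then-FOSLS refit, which is an assumption made to keep the sketch short.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
n, K, reps = 1149, 4, 1000             # routes, moment conditions, replications

stats = np.empty(reps)
for r in range(reps):
    h = rng.normal(size=(n, K))        # moment functions generated under H0
    hb = h.mean(axis=0)
    G = np.cov(h, rowvar=False)
    stats[r] = n * hb @ np.linalg.solve(G, hb)

# empirical 95th percentile should be near the chi2_K critical value
emp_q95 = np.quantile(stats, 0.95)
ref_q95 = chi2.ppf(0.95, df=K)
```

Close agreement between the empirical and reference quantiles indicates that the asymptotic chi-square approximation is adequate at the given sample size, which is the conclusion drawn from Figure 2.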
Consequently Model-I needs to be modified by adding or eliminating some variables to pass the SW test.
To proceed, we fit (16) again without concen0, because it is not of main interest and has a weak correlation with η̂_i. This leads to Model-II, which contains the same explanatory variables as Model-I and of course the same value of the Sargan test (6.41). On the other hand, the value of the SW statistic drops to 4.01, which is insignificant at any reasonable level of significance. This suggests that Model-II should be used for testing purposes, provided the signs of the FOSLS estimates are consistent with economic theory. The robust standard errors of the FOSLS are computed by Theorem 1 and are reported in Table 6, according to which the two variables concen1 and lpassen1 are candidates to be dropped from the model. Model-III is obtained by dropping only concen1 from Model-II, but this new specification is rejected by both the Sargan and SW tests. Model-IV is then obtained by dropping only lpassen1 from Model-II. Interestingly, while the SW test rejects the Model-IV specification, the Sargan test does not reject the GMM sequential moments of this model. However, the GMM estimate of β_2 is negative, which contradicts economic theory. It follows that Model-IV is not desirable and Model-II is preferred instead. It includes both concen1 and lpassen1, although they may be insignificant according to the robust standard errors. On the other hand, if the standard errors in Table 5 are used, then both concen1 and lpassen1 are significant according to the FOSLS, while only lpassen1 is according to the RML. Since Model-III is rejected by the SW test, it follows that using the information inherent in the fourth moment through the FOSLS is effective in keeping the variable concen1 in Model-II.

Conclusions and Discussion
We studied the SLS approach as an alternative to the commonly used random effects ML (RML) or differenced-GMM estimation for linear dynamic panel data models. Our approach is based on the first two conditional moments of the outcome process and does not postulate any distributional assumptions on the error components of the model. The asymptotic and finite sample properties and the practical merits of the proposed estimators are thoroughly investigated. This research reveals the following interesting new findings. First, differencing data and using instrumental variables may cause a substantial loss of information and produce misleading relations, especially when the time-varying variables are generated from autoregressive processes with high autocorrelation (which is common in economic data). In such a case, the linear first-differenced GMM and similar methods not only are unable to estimate the effects of time-invariant variables but also, more harmfully, only weakly identify the effects of the time-varying explanatory variables. In contrast, the SLS approach makes use of the information inherent in the level data and can therefore improve the estimation precision and the goodness of fit considerably.
Second, the information in the sample skewness and kurtosis of the within-group residuals can be utilized by our OSLS to gain efficiency over the RML and consequently save important explanatory variables from being wrongly eliminated. In other words, the extra efficiency of the OSLS helps one avoid falling into misspecification traps. Third, our newly proposed diagnostic test proved to be very useful not only in validating the working conditional moments but also for model selection purposes, while the usual goodness-of-fit criteria such as RMSE, $R^2$ or AIC may be misleading when the model is incorrectly specified.
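The within-group skewness and kurtosis that carry this extra information can be computed directly from a balanced panel. The sketch below uses standard moment formulas on demeaned residuals as an illustration; it is not the paper's exact test statistic, and the simulated panels are assumptions of the example.

```python
import numpy as np

def within_group_moments(y):
    """y: (N, T) balanced panel. Returns the sample skewness and excess
    kurtosis of the within-group (group-demeaned) residuals."""
    e = (y - y.mean(axis=1, keepdims=True)).ravel()  # within residuals
    m2 = np.mean(e ** 2)
    skew = np.mean(e ** 3) / m2 ** 1.5
    kurt = np.mean(e ** 4) / m2 ** 2 - 3.0           # excess kurtosis
    return skew, kurt

rng = np.random.default_rng(1)
# Normal idiosyncratic errors plus an individual effect eta_i:
y_normal = rng.normal(size=(500, 8)) + rng.normal(size=(500, 1))
s_norm, k_norm = within_group_moments(y_normal)      # both near zero

# Right-skewed chi-square(3) errors leave clearly nonzero skewness:
y_chi = rng.chisquare(3, size=(500, 8)) + rng.normal(size=(500, 1))
s_chi, k_chi = within_group_moments(y_chi)
```

Demeaning within groups removes the individual effect, so nonzero values of these statistics signal non-Gaussian idiosyncratic errors, which is exactly the situation in which the third- and fourth-moment information exploited by OSLS pays off.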
We have explicitly dealt with an AR(1) model for simplicity of notation. However, our approach can be extended to more general AR(p) models. For example, it is straightforward to calculate the FOSLS1 in Section 4 using a generalized form of the transformation matrix C and the deviation form of the moments as in (10) and (11).
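For readers who wish to experiment, a data-generating sketch for the AR(1) dynamic panel model $y_{it} = \gamma y_{i,t-1} + \beta x_{it} + \eta_i + \varepsilon_{it}$ is given below. The parameter values, the persistent AR(1) regressor (the high-autocorrelation case discussed above), and the rough stationary initialization are illustrative assumptions.

```python
import numpy as np

def simulate_dynamic_panel(N=1000, T=10, gamma=0.6, beta=1.0,
                           rho=0.9, seed=0):
    """Simulate y_it = gamma*y_{i,t-1} + beta*x_it + eta_i + eps_it,
    where x_it is an AR(1) with autocorrelation rho (the persistent
    regressor case where differenced-GMM identification is weak)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=N)                       # individual effects
    x = np.empty((N, T))
    x[:, 0] = rng.normal(size=N)
    for t in range(1, T):
        x[:, t] = rho * x[:, t - 1] + np.sqrt(1 - rho ** 2) * rng.normal(size=N)
    y = np.empty((N, T))
    prev = eta / (1 - gamma) + rng.normal(size=N)  # rough stationary start
    for t in range(T):
        y[:, t] = gamma * prev + beta * x[:, t] + eta + rng.normal(size=N)
        prev = y[:, t]
    return y, x, eta

y, x, eta = simulate_dynamic_panel()
```

With rho near one, first differences of x carry little variation, which is the mechanism behind the information loss from differencing described in the conclusions.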
This work also raises some interesting points for future research. Although our diagnostic test provides a systematic built-in tool to validate the model's conditional moment assumptions, in practice it would be interesting to run a sensitivity analysis to investigate the consequences of possible non-rejected deviations from these assumptions. It would also be interesting to study the properties of the SW test and extend it to check the working assumptions on the third and fourth moments. Last but not least, other general methods such as GMM, or equivalently estimating equations, applied to level data could have similar asymptotic properties to our approach. However, it is important and worthwhile to compare their finite sample properties and practical implications in real data analysis. The current work provides a complete set of tools for inference in linear dynamic models with level data, which has not been studied so far in the literature.

Acknowledgments:
The first author gratefully acknowledges financial support from the University of Manitoba Graduate Fellowship and the Manitoba Graduate Scholarship.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Regularity Conditions
To establish the asymptotic properties of the SLS estimator $\hat{\gamma}_N$, the following regularity conditions are assumed, where $\|\cdot\|$ denotes the Euclidean norm.
Assumption A1. The parameter space $\Gamma$ is a compact subset of $\mathbb{R}^{p+2}$.
Note that Assumptions A1–A6 are standard regularity conditions in the M-estimation literature. Assumption A3 is necessary and sufficient for parameter identification, while Assumptions A2, A4 and A6 are sufficient but not necessary.
Furthermore, for the FOSLSE to be consistent, it is sufficient that condition (A1) holds, where $U_1^{-1}$ is as in Theorem 3, $\gamma^*$ is the vector containing all generic parameters in $U_1^{-1}$, including $\gamma$, and $\Gamma^*$ is the corresponding compact parameter space. Although Assumptions A2–A6 and condition (A1) look complicated for the sake of generality, they can be simplified by specifying the functional forms of $E(\eta_i^j \mid y_{i0}, x_i)$, $j = 1, 2, 3, 4$, and $E(\varepsilon_{it}^j \mid y_{i0}, x_i)$, $j = 3, 4$. For example, under (12)–(13), $U_i^{-1}$ has a special structure, so that Assumptions A2, A4, A6 and condition (A1) are implied by $E(y_{i0}^4) < \infty$.
Hence it follows from Assumptions A2 and A4 that the required bound holds. Similarly, by the Cauchy–Schwarz inequality and Assumptions A2 and A4, we obtain the corresponding bound. Combining inequalities (A4), (A5), (A6), Assumptions A2 and A4, and the ULLN yields the desired uniform convergence. The consistency then follows from Lemma 2 of Wang and Leblanc (2008) and the strong consistency of $\hat{\gamma}_N$. Further, by Assumption A6 and the central limit theorem (CLT), the asymptotic normality result holds, where $B$ is given in (7). Hence, by Slutsky's theorem together with (A3) and (A8), we have, for fixed $T$, the stated limiting distribution.
Then, by the matrix form of the Cauchy–Schwarz inequality, it is straightforward to see that $E(R'R) - E(R'Q)\,[E(Q'Q)]^{-1}E(Q'R)$ is nonnegative definite and is the zero matrix if $W_1 = U_1^{-1}$.
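The matrix Cauchy–Schwarz inequality invoked here can be checked numerically on sample analogues: the matrix $E(R'R) - E(R'Q)[E(Q'Q)]^{-1}E(Q'R)$ is exactly the residual second-moment matrix from projecting $R$ onto $Q$, hence nonnegative definite. The random matrices below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 10000, 3, 4

# Rows of R are correlated with rows of Q plus independent noise.
Q = rng.normal(size=(n, q))
R = Q @ rng.normal(size=(q, p)) + rng.normal(size=(n, p))

# Sample analogues of E(R'R), E(R'Q), E(Q'Q).
Err = R.T @ R / n
Erq = R.T @ Q / n
Eqq = Q.T @ Q / n

# M = E(R'R) - E(R'Q) [E(Q'Q)]^{-1} E(Q'R): the Schur-complement form
# of the matrix Cauchy-Schwarz inequality. All eigenvalues should be
# nonnegative (up to floating-point error).
M = Err - Erq @ np.linalg.solve(Eqq, Erq.T)
eigvals = np.linalg.eigvalsh(M)
```

Algebraically, $M = R'(I - Q(Q'Q)^{-1}Q')R / n$ for the sample versions, a projection-residual Gram matrix, which makes the nonnegative definiteness transparent.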