Common Correlated Effects Estimation for Dynamic Heterogeneous Panels with Non-Stationary Multi-Factor Error Structures

In this paper, we consider the estimation of a dynamic panel data model with non-stationary multi-factor error structures. We adopt the common correlated effects (CCE) estimation and establish the asymptotic properties of the CCE and common correlated effects mean group (CCEMG) estimators as N and T tend to infinity. The results show that both the CCE and CCEMG estimators are consistent and that the CCEMG estimator is asymptotically normally distributed. The theoretical findings are supported in small samples by an extensive simulation study, which shows that the CCE estimators are robust to a wide variety of data-generating processes. The empirical findings suggest that CCE estimation is widely applicable to models with non-stationary factors. The proposed procedure is also illustrated by an empirical application to the U.S. cigar dataset.


Introduction
Recently, there has been increased interest in the analysis of panel data models with cross-sectionally dependent errors (also known as unobserved common factors or multifactor error structures), motivated by empirical applications in economics that involve common shocks, such as the global financial crisis; see Omay and Kan (2010); Bussiere et al. (2013); Eberhardt et al. (2013); and Chudik et al. (2017). Dependence across units violates the traditional assumption of independent and identically distributed errors, so conventional panel estimation methods (such as fixed effects estimation) can yield inconsistent estimates and misleading inference. Therefore, the econometrics literature has devoted much effort to estimation for panels with cross-sectional dependence; see, for example, Pesaran (2006); Bai (2009); Zaffaroni (2009); Greenaway-McGrevy et al. (2012); Kao et al. (2012); Chudik and Pesaran (2013); and Weidner (2015, 2017), among others. See also Chudik and Pesaran (2015b) for a survey of recent developments in large panel models with cross-sectional dependence.
Among these studies, a predominant approach to dealing with cross-sectionally dependent errors in panel models is the so-called common correlated effects (CCE) method proposed by Pesaran (2006) (see footnote 1). The basic idea of CCE estimation is to proxy the unobserved common factors using the cross-sectional averages of the observables in the regression. The method has several advantages: it can be computed by least squares applied to an auxiliary regression, and it does not require knowledge of the number of unobserved factors. The CCE method has been further developed and applied to different types of panel models. To name a few, Chudik and Pesaran (2015a) used the CCE approach to analyze dynamic heterogeneous panels with stationary unobserved common factors. Kapetanios et al. (2011) extended the CCE method to static panel data models with non-stationary multi-factor error structures. Westerlund et al. (2019) considered CCE for short panels, and Zhou and Zhang (2016) extended CCE to unbalanced panels.
Among the aforementioned works, there is a gap in the CCE estimation for dynamic panels with non-stationary unobservable factors. To fill this gap, in this paper we consider a linear dynamic heterogeneous panel data model with non-stationary unobserved common factors when both the cross-sectional and time dimensions of the dataset grow to infinity. Under these settings, we find that the CCE estimator of the individual coefficient is consistent, and the CCE mean group (CCEMG) estimator is consistent and has a normal limit distribution. The practical implication of this finding is that for inferential purposes of the CCE estimation, one does not necessarily need to test the stationarity of the unobserved common factors in the model. The finite sample properties are examined through Monte Carlo simulations and the simulation results confirm our theoretical findings in the paper. Moreover, the proposed procedure is illustrated by an empirical application, which analyzes the U.S. cigar dataset.
The rest of the paper is organized as follows. Section 2 sets up the basic model and introduces the CCE estimation of the dynamic heterogeneous panel data model with common factors. The asymptotics of the CCE estimation with non-stationary unobserved common factors are provided in Section 3. Monte Carlo simulation results and an empirical application are reported in Sections 4 and 5, respectively. Concluding remarks are made in Section 6. Proofs of the main results are provided in Appendix A.
Notation: The letter $K$ stands for a finite positive constant. All vectors are column vectors represented by bold lower case letters, and matrices are represented by bold capital letters. Let $\|A\| = [\mathrm{tr}(AA')]^{1/2}$ denote the Frobenius norm. $\|A\|_1 = \max_{1\le j\le n}\sum_{i=1}^{n}|a_{ij}|$ and $\|A\|_\infty = \max_{1\le i\le n}\sum_{j=1}^{n}|a_{ij}|$ denote the maximum absolute column and row sum matrix norms, respectively. $A^{+}$ denotes the Moore-Penrose inverse of $A$, and $\mathrm{rank}(A)$ and $\rho(A)$ denote the rank and the spectral radius of $A$, respectively.
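As a quick numerical illustration of this notation, the following minimal sketch computes each quantity with NumPy for a small square matrix. The function name `matrix_summaries` and the example matrix are hypothetical and serve only to make the definitions concrete.

```python
import numpy as np

def matrix_summaries(A):
    """Quantities from the Notation paragraph for a square matrix A (illustrative helper)."""
    A = np.asarray(A, dtype=float)
    return {
        # Frobenius norm: square root of tr(AA')
        "frobenius": float(np.sqrt(np.trace(A @ A.T))),
        # maximum absolute column sum
        "max_abs_col_sum": float(np.abs(A).sum(axis=0).max()),
        # maximum absolute row sum
        "max_abs_row_sum": float(np.abs(A).sum(axis=1).max()),
        "rank": int(np.linalg.matrix_rank(A)),
        # spectral radius: largest eigenvalue modulus (square matrices only)
        "spectral_radius": float(np.abs(np.linalg.eigvals(A)).max()),
        # Moore-Penrose inverse
        "pinv": np.linalg.pinv(A),
    }

A = np.array([[1.0, -2.0], [3.0, 4.0]])
s = matrix_summaries(A)
```

For a non-singular matrix such as this one, the Moore-Penrose inverse coincides with the ordinary inverse.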

The Model
We assume that the scalar dependent variable $y_{it}$ and the regressors $x_{it}$ are generated as follows (see footnote 2):

$$y_{it} = c_{yi} + \phi_i y_{i,t-1} + \beta_i' x_{it} + \gamma_i' f_t + \varepsilon_{it}, \qquad (1)$$

and

$$x_{it} = c_{xi} + \alpha_i y_{i,t-1} + \Gamma_i' f_t + u_{it}, \qquad (2)$$

for $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$, where $c_{yi}$ and $c_{xi}$ are individual fixed effects for unit $i$, $x_{it}$ is a $k \times 1$ vector of the regressors specific to cross-sectional unit $i$ at time $t$, $\varepsilon_{it}$ are the individual-specific (idiosyncratic) errors and $u_{it}$ are the individual-specific components of $x_{it}$, $\gamma_i$ and $\Gamma_i$ are $m \times 1$ and $m \times k$ factor loading matrices, and the $m \times 1$ vector $f_t$ represents unobserved common factors. In what follows, we maintain the restriction that model (1) is stationary, such that $0 < |\phi_i| < 1$ for $i = 1, 2, \ldots, N$. Models (1)-(2) have been widely studied in the literature; see, for instance, Pesaran (2006), Chudik and Pesaran (2015a), Westerlund et al. (2019), and the references therein. We follow these studies in considering the CCE estimation of $\phi_i$ and $\beta_i$, and re-examine the validity of the CCE estimation when $f_t$ is non-stationary.

CCE Estimation
Following Chudik and Pesaran (2015a), let $z_{it} = (y_{it}, x_{it}')'$; then (1) and (2) can be written compactly as (3), with the coefficient matrices $A_{0i}$ and $A_{1i}$ defined in (4). If the support of $\rho(A_i)$, where $A_i = A_{0i}^{-1}A_{1i}$, lies strictly inside the unit circle, then (3) can be rewritten in the distributed lag form (5), for $i = 1, 2, \ldots, N$.
Taking the cross-sectional average of (5) gives the corresponding equation for the cross-sectional averages, in which $L$ denotes the lag operator. Furthermore, if $\Lambda(L)$ is invertible (see Assumption 4 below) and the $(k+1) \times m$ matrix $C$ has full column rank, i.e., the rank condition holds, the factors can be recovered as in (7), where $B(L) = (C'C)^{-1}C'\Lambda^{-1}(L)$. This suggests that the contemporaneous and lagged values of $\bar z_t = (\bar y_t, \bar x_t')'$ can be used as observable proxies for the unobserved common factors $f_t$. Substituting the observed proxies of the unobserved common factors (7) into (1) yields the following augmented regression

$$y_{it} = c^{*}_{yi} + \phi_i y_{i,t-1} + \beta_i' x_{it} + \sum_{l=0}^{p_T} \delta_{il}' \bar z_{t-l} + w_{it}, \qquad (8)$$

where $p_T$ is the number of lags used to truncate the infinite polynomial distributed lag function $\delta_i(L)$ (see footnote 3), and $w_{it}$ is the composite error.

For notational simplicity, let $y_i = (y_{i,p_T+1}, y_{i,p_T+2}, \ldots, y_{iT})'$, $\Xi_i = (y_{i,-1}, X_i)$, with $y_{i,-1} = (y_{i,p_T}, y_{i,p_T+1}, \ldots, y_{i,T-1})'$ and $X_i = (x_{i,p_T+1}, x_{i,p_T+2}, \ldots, x_{iT})'$, and $w_i = (w_{i,p_T+1}, w_{i,p_T+2}, \ldots, w_{iT})'$; then the augmented regression (8) can be expressed in vector form as

$$y_i = \Xi_i \pi_i + \bar Q d_i + w_i, \qquad (9)$$

where $\pi_i = (\phi_i, \beta_i')'$ are the parameters of interest, $d_i = (c^{*}_{yi}, \delta_{i0}', \delta_{i1}', \ldots, \delta_{ip_T}')'$ are nuisance parameters, and the matrix $\bar Q$ is defined in (10) (see footnote 4).

Based on the cross-sectionally augmented regression model (9) and the formula for partitioned regression, the CCE estimator of the individual coefficients $\pi_i$ is given by

$$\hat\pi_i = (\Xi_i' M_q \Xi_i)^{-1} \Xi_i' M_q y_i, \qquad (11)$$

which is an ordinary least squares estimate, where $M_q = I_{T-p_T} - \bar Q(\bar Q'\bar Q)^{+}\bar Q'$ is an orthogonal projection matrix, with $I_{T-p_T}$ a $(T - p_T)$-dimensional identity matrix. In panel models with $N$ large, the primary parameters of interest are the means of the individual-specific coefficients, $E(\pi_i) = \pi$, which can be estimated by the common correlated effects mean group (CCEMG) estimator

$$\hat\pi_{MG} = \frac{1}{N}\sum_{i=1}^{N} \hat\pi_i. \qquad (12)$$
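The estimation steps described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `cce_estimates`, the array layout (`y` of shape N x T, `X` of shape N x T x k), and the choice to include the current value and `p` lags of the cross-sectional averages together with an intercept in the projection matrix are all assumptions made for this sketch.

```python
import numpy as np

def cce_estimates(y, X, p):
    """Individual CCE estimates and their mean group average (illustrative sketch).

    y : (N, T) array of the dependent variable
    X : (N, T, k) array of regressors
    p : number of lags of the cross-sectional averages used as factor proxies
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    N, T = y.shape
    k = X.shape[2]
    # Cross-sectional averages z_bar_t = (y_bar_t, x_bar_t')', shape (T, k + 1)
    zbar = np.column_stack([y.mean(axis=0), X.mean(axis=0)])
    start = max(p, 1)                      # need one lag of y and p lags of z_bar
    rows = np.arange(start, T)
    # Q: intercept plus current and lagged cross-sectional averages
    Q = np.column_stack([np.ones(rows.size)] + [zbar[rows - l] for l in range(p + 1)])
    # Q @ pinv(Q) equals Q (Q'Q)^+ Q', so M projects off the column space of Q
    M = np.eye(rows.size) - Q @ np.linalg.pinv(Q)
    pi_hat = np.empty((N, k + 1))
    for i in range(N):
        Xi = np.column_stack([y[i, rows - 1], X[i, rows]])   # (y_{i,t-1}, x_it')
        pi_hat[i] = np.linalg.solve(Xi.T @ M @ Xi, Xi.T @ M @ y[i, rows])
    return pi_hat, pi_hat.mean(axis=0)
```

In a noise-free design with homogeneous slopes and one lag of the averages, the factor is an exact linear combination of the columns of the projection matrix, so the sketch recovers the coefficients up to rounding error; with idiosyncratic errors and heterogeneous slopes the recovery is only approximate.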

Assumptions
When the unobserved common factors $f_t$ are stationary processes, Chudik and Pesaran (2015a) showed that the CCE estimator (11) of the individual coefficient is consistent, and the CCEMG estimator (12) is consistent and asymptotically normal. However, in practice, the common factors $f_t$ may follow a non-stationary process (see Bai and Ng 2004, 2010; Pesaran 2007; Pesaran et al. 2013, among others). In this scenario, the validity of the CCE estimators and their asymptotic properties need to be re-examined.
Following Kapetanios et al. (2011), we assume the unobserved common factors follow the multivariate unit root process

$$f_t = f_{t-1} + \varsigma_t. \qquad (13)$$

To derive the asymptotic properties of the CCE type estimators (11) and (12) when $f_t$ follows (13), we make the following assumptions.
Assumption 1. (Individual-specific errors). (i) The individual-specific errors $\varepsilon_{it}$ follow a linear stationary process with uniformly bounded positive variance, $\sup_i \sigma_i^2 < K$, for some constant $K$, and uniformly bounded fourth-order cumulants. $u_{it}$ follows a linear stationary process with absolutely summable autocovariances (uniformly in $i$), with covariance matrices $\Sigma_{u_i}$, which are non-singular and satisfy $\sup_i \|\Sigma_{u_i}\| < K$, and has uniformly bounded fourth-order cumulants. (ii) $\varepsilon_{it}$ are distributed independently of $u_{jt'}$ for all $i$, $j$, $t$, and $t'$. For each $i$, $e_{it} = (\varepsilon_{it}, u_{it}')'$ is a $(k+1) \times 1$ vector of $L_{2+\delta}$, $\delta > 0$, stationary near epoch dependent processes of size $2\delta/(2\delta + 4)$ on an α-mixing process of size $-(2+\delta)/\delta$, and for $i = 1, 2, \ldots, N$, $\mathrm{Var}(e_{it}) = \Sigma_{e_i}$, which is a non-singular matrix and satisfies $\sup_i \|\Sigma_{e_i}\| < K$.
Assumption 3. (Heterogeneous coefficients). The slope coefficients $\pi_i = (\phi_i, \beta_i')'$ follow the random coefficient model

$$\pi_i = \pi + \upsilon_{\pi i}, \qquad \upsilon_{\pi i} \sim \mathrm{IID}(0, \Sigma_\pi), \qquad (14)$$

where $\pi = E(\pi_i) = (\phi, \beta')'$, $\|\pi\| < K$, $\|\Sigma_\pi\| < K$, $\Sigma_\pi$ is a $(k+1) \times (k+1)$ symmetric nonnegative definite matrix, and the random deviations $\upsilon_{\pi i}$ are distributed independently of $\gamma_j$, $\Gamma_j$, $\varepsilon_{jt}$, $u_{jt}$ and $\varsigma_t$ for all $i$, $j$, and $t$. Furthermore, the support of $\phi_i$ lies strictly inside the unit circle, and $E\|c_i\| < K$, $E\|\alpha_i\| < K$ for all $i$, where $c_i = (c_{yi}, c_{xi}')'$.

Assumption 4. (Exogenous regressors). Regressors $x_{it}$ are either strictly exogenous and generated according to the canonical factor model (2) with $\alpha_i = 0$, or weakly exogenous and generated according to (2) with $\alpha_i$, for $i = 1, 2, \ldots, N$, IID across $i$ and independently distributed of $\upsilon_{\pi j}$, $\gamma_j$, $\Gamma_j$, $\varepsilon_{jt}$, $u_{jt}$ and $f_t$ for all $i$, $j$, and $t$. In the case where the regressors are weakly exogenous, we also assume: (i) The support of $\rho(A_i)$ lies strictly inside the unit circle, for $i = 1, 2, \ldots, N$, where $A_i = A_{0i}^{-1}A_{1i}$, with $A_{0i}$ and $A_{1i}$ defined in (4).
Assumption 7. $\varsigma_t$ in (13) is an $m \times 1$ vector of $L_{2+\delta}$, $\delta > 0$, stationary near epoch dependent processes of size $1/2$ on an α-mixing process of size $-(2+\delta)/\delta$, and is distributed independently of the idiosyncratic errors $\varepsilon_{it}$ and $u_{it}$ for all $i$ and $t$.
Several remarks can be made about these assumptions. Assumptions 1-3 are quite standard in the literature on (dynamic) panel models with cross-sectional dependence; see, for example, Pesaran (2006) and Kapetanios et al. (2011) and the references therein. Assumption 4 is also made in Chudik and Pesaran (2015a) and covers the exogeneity of the regressors and the stationarity conditions for dynamic panels. Assumption 5 is a common condition for the implementation of the CCE estimation (e.g., Pesaran (2006) and Chudik and Pesaran (2015a)), which implies that there are more included regressors than unobserved factors in the model. See Juodis et al. (2021) for a detailed discussion of the validity of the rank condition and the resulting asymptotics for the CCE estimation. Assumption 6 is a common assumption for the CCE estimation, and it is imposed for the partitioned regression in the augmented regression for dynamic panels (e.g., Chudik and Pesaran 2015a). Assumption 7 requires that the innovations of the unit root process $f_t$ are stationary.

Asymptotics
Under these assumptions, we can establish the asymptotic properties of CCE estimators (11) and (12) when f t is non-stationary. To begin with, we note that for the original model (1), it can be rewritten as in the vector form or more compactly as Using the CCE estimator (11) into (15), we havê which shows that the asymptotics ofπ i depends on the unobserved factors through Using the results in Lemma A2, A5, and A6 in Appendix A, we obtain and, thus,π when the rank condition (6) is satisfied. The above results are summarized in the following theorem, establishing the consistency of the CCE estimator of individual coefficients of interest.
Theorem 1. Consider the panel models (1) and (2), suppose Assumption 1-7 hold, then, as See the Appendix A for the proof.
Remark 1. The above theorem suggests that the CCE estimator of the individual slope coefficient is consistent even if the common factors are non-stationary. When the rank condition (6) is not satisfied, the CCE estimator of the individual slope coefficients would be inconsistent due to the correlation of x it and f t . See Juodis et al. (2021) for more discussions on the validity of the CCE estimator when the rank condition does not hold.
Next, we establish the asymptotic properties of the CCEMG estimator of the mean group coefficients, π = E(π i ). We have When the rank condition is satisfied, by (17), we have and, thus, Theorem 2. Consider the panel models (1) and (2), suppose Assumptions 1-7 hold, as (N, The asymptotic variance ofπ MG can be consistently estimated nonparametrically bŷ For the results in both Theorems 1 and 2, we find that, for models with non-stationary common factors, although the intermediate results needed for deriving the asymptotic properties of the common correlated effects estimators significantly differ from the stationary case, as in Chudik and Pesaran (2015a), the final results are surprisingly similar. This is in direct contrast to the usual phenomenon where distributional results of I(1) processes are radically different from those of I(0) processes.
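For inference on the mean group coefficients, a commonly used nonparametric variance estimator in the mean group literature is the sample covariance of the individual estimates around their average. The sketch below implements that standard form; it is an assumption that this matches the displayed estimator referred to above, and the function name `mean_group_inference` is hypothetical.

```python
import numpy as np

def mean_group_inference(pi_hat):
    """Mean group estimate, nonparametric covariance, and standard errors.

    pi_hat : (N, q) array whose i-th row is the individual estimate for unit i.
    Assumes the standard mean group form: the sample covariance of the
    individual estimates, divided by N to obtain the variance of the average.
    """
    pi_hat = np.asarray(pi_hat, dtype=float)
    N = pi_hat.shape[0]
    pi_mg = pi_hat.mean(axis=0)
    dev = pi_hat - pi_mg
    sigma_mg = dev.T @ dev / (N - 1)        # sample covariance across units
    se = np.sqrt(np.diag(sigma_mg) / N)     # standard errors of the mean group estimate
    return pi_mg, sigma_mg, se
```

Because this estimator uses only the cross-sectional dispersion of the individual estimates, it requires no knowledge of the factor structure or of whether the factors are stationary.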

Remark 2.
For the consistency of $\hat\pi_i$ and $\hat\pi_{MG}$, no restrictions on the relative expansion rates of $N$ and $T$ to infinity are required. However, the derivation of the asymptotic distribution of $\hat\pi_{MG}$ requires $N/T \to \phi$, $0 < \phi < \infty$, because of the time series bias that arises from the presence of lagged values of the dependent variable; the distributional result is therefore unsuitable for panels with $T$ small relative to $N$.
Including a lagged dependent variable as a regressor induces a time series bias of order $O(T^{-1})$ in the estimators. When $T$ is not large, this bias is non-negligible; hence, a bias correction should be considered. In the simulations below, we consider the Jackknife bias-correction method (see (21) below), which is used extensively in the relevant literature (e.g., Hahn and Newey 2004).

Monte Carlo Simulation
In this section, we investigate the finite sample properties of the CCEMG estimation for dynamic heterogeneous panels with non-stationary common factors. We consider the following data-generating processes 5 and for i = 1, 2, . . . , N, and t = −99, . . . , 0, . . . , The main purpose of this paper is to illustrate the validity of the CCEMG estimator in the case of non-stationary unobserved common factors; hence, for the unobserved common factors f t , we consider the following three different non-stationary DGPs: DGP 1. Two non-stationary unobserved common factors (m = 2), 2, for l = 1, 2, and t = −99, . . . , 0, . . . , T. DGP 2. One non-stationary unobserved common factor and a stationary common where σ f l = 1, for l = 1, 2, and t = −99, . . . , 0, . . . , T. For the above DGPs, the starting values are f l,−100 = 0, for l = 1, 2; the first 100 observations are discarded.
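The burn-in scheme described above (zero starting values at $t = -100$, with the first 100 observations discarded) can be sketched as follows for pure random-walk factors. Gaussian innovations, the default innovation standard deviation of one, and the function name `simulate_unit_root_factors` are assumptions made for illustration; the exact innovation processes of the three DGPs are not reproduced here.

```python
import numpy as np

def simulate_unit_root_factors(T, m=2, burn=100, sigma=1.0, rng=None):
    """Simulate m independent random-walk factors and drop the burn-in period.

    Each factor follows f_t = f_{t-1} + shock_t with a zero starting value.
    Gaussian shocks with standard deviation sigma are an illustrative assumption.
    Returns an array of shape (T, m).
    """
    rng = np.random.default_rng() if rng is None else rng
    shocks = rng.normal(0.0, sigma, size=(T + burn, m))
    paths = np.cumsum(shocks, axis=0)   # cumulative sums give the unit root paths
    return paths[burn:]                 # keep only the last T observations

f = simulate_unit_root_factors(50, m=2, rng=np.random.default_rng(1))
```

A stationary factor, as in the mixed design, could be generated analogously by replacing the cumulative sum with an autoregressive recursion.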
We consider all combinations of N = 50, 100, 200 and T = 50, 100, 150, 200. The number of replications is set at 2000. In what follows, we focus on the autoregressive coefficient $\phi$ (the cross-section mean of $\phi_i$) and on $\beta_0$ (the cross-section mean of $\beta_{0i}$). To save space, we do not report the results for $\beta_1$, since they are very similar to those for $\beta_0$; they are available upon request.
Two estimators are considered in the simulation. The first is the CCEMG estimator $\hat\pi_{MG}$ given in (12), in which the lag order $p_T$ is selected to satisfy $p_T^3/T \to \lambda$ as $T \to \infty$, for some $0 < \lambda < \infty$; specifically, we set $p_T = [T^{1/3}]$, which works well in our Monte Carlo design (see footnote 6). The second is the Jackknife bias-corrected CCEMG estimator, constructed as in (21), where $\hat\pi^{a}_{MG}$ is the CCEMG estimator calculated using the first two-thirds of the available time period, namely over the period $t = 1, 2, \ldots, [2T/3]$, and $\hat\pi^{b}_{MG}$ denotes the CCEMG estimator computed using the observations over the period $t = [T/3], [T/3]+1, \ldots, T$, where $[T/3]$ denotes the integer part of $T/3$. Note that a new strategy is applied to improve the performance of the Jackknife estimator: the whole time period is divided into three parts, the first estimator is computed from the first two-thirds of the available period, and the second from the last two-thirds. We find that, in our settings, this division strategy performs better than the half-panel Jackknife method discussed in Chudik and Pesaran (2015a).
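To make the logic of the two-thirds Jackknife concrete, the sketch below derives the combination weights under the assumption that the leading bias of an estimator based on T periods is B/T. A subsample covering a fraction r of the periods then has bias B/(rT), and the weights that cancel B are 1/(1 - r) on the full-sample estimator and r/(1 - r) on the average of the two subsample estimators. For r = 1/2 this gives the familiar half-panel form; for r = 2/3 it gives 3 and 2. These weights follow from the stated bias assumption and may differ from the exact expression in (21); the function names are hypothetical.

```python
import numpy as np

def two_thirds_windows(T):
    """Zero-based slices for t = 1..[2T/3] and t = [T/3]..T (T >= 3 assumed)."""
    first = slice(0, (2 * T) // 3)
    last = slice(T // 3 - 1, T)
    return first, last

def overlap_jackknife(full, sub_a, sub_b, frac=2.0 / 3.0):
    """Combine full-sample and subsample estimates to cancel a B/T bias term.

    full  : estimate from all T periods
    sub_a : estimate from the first  frac * T periods
    sub_b : estimate from the last   frac * T periods
    The weights assume the leading bias is proportional to 1/T.
    """
    a = 1.0 / (1.0 - frac)                       # weight on the full-sample estimate
    sub_mean = 0.5 * (np.asarray(sub_a) + np.asarray(sub_b))
    return a * np.asarray(full) - (a - 1.0) * sub_mean
```

With overlapping two-thirds windows, each subsample estimator is based on more time periods than in the half-panel split, which is one plausible reason for the better finite-sample behavior reported in the text.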
We used the statistical software MATLAB to conduct the Monte Carlo experiments; the simulation results are summarized in Tables 1-3 for DGPs 1-3, respectively.
From Table 1, we note that for the estimation of $\phi$, the CCEMG estimator performs well in terms of bias and RMSE: both fall steadily as T increases, which is in line with the consistency of the CCEMG estimator. However, it still suffers from time series bias when T is small. The Jackknife bias-corrected CCEMG estimator is quite effective at reducing this bias: when T is not large, its bias is substantially smaller than that of the original CCEMG estimator, and its RMSE decreases as either N or T increases. Similar findings can be observed for $\beta_0$.
In order to evaluate the robustness of the estimators, we report additional results in Tables 2 and 3 for DGPs with both stationary and non-stationary factors or with cointegrated factors. Similar to the case with non-stationary factors in DGP 1, we find that the CCEMG estimator still performs well regardless of the number of common factors and the type of non-stationarity. For the autoregressive coefficient $\phi$, it can be improved by the Jackknife bias correction, while the CCEMG estimator of the slope coefficient $\beta_0$ performs very well in almost all cases. Overall, the findings of our Monte Carlo simulations show that, if the parameter of interest is the mean coefficient of the regressors, $\beta_0$, the CCEMG estimator performs well even if N and T are not large. For the mean coefficient of the lagged dependent variable, $\phi$, the CCEMG estimator is still consistent, but it suffers from time series bias unless T is sufficiently large; the Jackknife bias-corrected CCEMG estimator helps to mitigate this bias.

Empirical Study
In this section, we illustrate our method by considering the U.S. Cigar dataset, which is frequently used in the literature on panel models (e.g., Baltagi and Li 2004;Bada and Liebl 2014). The panel contains the per capita cigarette consumption of N = 46 American states from 1963 to 1992 (T = 30) as well as data on the income per capita and cigarette prices; the dataset can be obtained from the R package phtt.
To test the cross-sectional dependence in the panel data, following Pesaran (2015) and Bailey et al. (2016), we compute the CD statistic and the α statistic for the variables of interest in Table 4. As can be seen from the table, the CD statistics are 101.519, 166.27, and 154.142 for consumption, income, and price, respectively; these are highly significant and reject the null hypothesis of weak cross-sectional dependence for all three variables. Additionally, the estimates of α together with their 95% confidence bands further confirm the above results. We therefore conclude that all three variables exhibit clear cross-sectional dependence.
To investigate the relationship between the per capita cigarette consumption and the income per capita as well as cigarette prices, following Baltagi and Li (2004), we consider the panel model (22), where $y_{it}$, $x_{1it}$, and $x_{2it}$ denote the per capita cigarette consumption, the income per capita, and the cigarette price for the $i$th state at time $t$, respectively, and the idiosyncratic error has the multi-factor structure (23). The proposed dynamic CCE approach is applied to estimate the coefficients in model (22), and the augmented equation to be estimated can be written as (24), where the number of lags is $p_T = [\sqrt[3]{T}] = 3$, and $\bar z_t = (\bar y_{t-1}, \bar x_{1t}, \bar x_{2t})'$. We focus on the CCEMG estimators, and the results are presented in Table 5.
The following conclusions can be drawn from Table 5. On the one hand, the income per capita has a positive effect on the per capita cigarette consumption, while an increase in the cigarette price restrains cigarette consumption to a certain extent, and both effects are significant. These results are consistent with the conclusions of Bada and Liebl (2014). On the other hand, the lagged dependent variable is highly significant, indicating that it is appropriate to use dynamic models for the per capita cigarette consumption. To illustrate the heterogeneous slopes across states, we display both the CCE and the CCEMG estimators in Figure 1, which clearly shows that the estimated coefficients vary from state to state, reflecting the heterogeneity among states. Moreover, to illustrate the potential non-stationarity of the unobservable common factors in (23), we use the method proposed by Bada and Kneip (2014) to select the number of unobservable common factors and to estimate the selected common factors. The results are given in Figure 2, where the top panel shows the estimated common factors and the bottom panel shows the estimated time-varying individual effects of the N = 46 states. As can be seen from the figure, five common factors have been selected, among which the first and second exhibit obvious trends and violate the stationarity condition.
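For reference, the CD statistic for a balanced panel is commonly computed as $\sqrt{2T/(N(N-1))}$ times the sum of all pairwise sample correlations across units. The sketch below implements that textbook form; the function name `cd_statistic` and the N x T array layout are assumptions, and the values in Table 4 are not reproduced here.

```python
import numpy as np

def cd_statistic(data):
    """CD statistic for a balanced panel stored as an (N, T) array.

    Uses the textbook balanced-panel form: sqrt(2T / (N(N-1))) times the sum of
    pairwise correlation coefficients over all pairs i < j.
    """
    data = np.asarray(data, dtype=float)
    N, T = data.shape
    corr = np.corrcoef(data)                 # N x N matrix of pairwise correlations
    upper = np.triu_indices(N, k=1)          # index pairs with i < j
    return float(np.sqrt(2.0 * T / (N * (N - 1))) * corr[upper].sum())
```

Under the null of weak cross-sectional dependence the statistic is compared with standard normal critical values, so values above 100, as reported in the text, point to very strong dependence.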

Conclusions
In this paper, we re-examined the CCE type estimator for dynamic heterogeneous panel regression models with non-stationary common factors. Asymptotic properties of CCE estimators are established when both N and T are large. It is shown that, under certain conditions, the main results of Pesaran (2006) and Chudik and Pesaran (2015a) hold for a dynamic panel with non-stationary factors. Monte Carlo simulations were conducted to investigate the finite sample properties of the CCE estimation for the panel with nonstationary factors. An empirical application to the U.S. cigarette consumption dataset shows that the real data may have cross-sectional dependence as well as dynamic and non-stationary common factors (at the same time). Based on the findings of this paper, together with the results by Pesaran (2006); Kapetanios et al. (2011), and Chudik and Pesaran (2015a), we can conclude that the CCE method can be widely used to deal with panel models with error cross-sectional dependence, regardless of whether the model is static or dynamic, and whether the unobservable common factors are stationary.
Author Contributions: These authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding:
Cao acknowledges the financial support from the National Natural Science Foundation of China (no. 11861014) and Guangxi Natural Science Foundation (no. 2020JJA110007 and no. 2020JJA110013).

Informed Consent Statement: Not applicable.
Data Availability Statement: Data used in the empirical application can be obtained from the R package phtt.

Acknowledgments:
We are grateful for the constructive comments from the guest editor as well as the two anonymous referees.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Useful Lemmas and Theoretical Derivations of Theorems
The appendix includes proofs of the theorems and lemmas used in the derivations of the main results in the paper.
Recall that Then, the following equalities hold, where matrixQ is given by (10).

Appendix A.1. Useful Lemmas
Now, let us turn to the lemmas, which are needed for the derivation of the results in the main paper.
Lemma A1. (a) If $A \in \mathbb{R}^{m \times n}_{r}$, $r > 0$, has a full-rank factorization $A = BC$, where $B \in \mathbb{R}^{m \times r}_{r}$ and $C \in \mathbb{R}^{r \times n}_{r}$, then $A^{+} = C^{+}B^{+}$.
(b) If $A \in \mathbb{R}^{m \times n}_{m}$, i.e., $A$ has full row rank, then $A^{+} = A'(AA')^{-1}$.
Using the properties of the Moore-Penrose inverse, Lemma A1 can be easily established by the MacDuffee Theorem of Ben-Israel and Greville (2003).

Appendix A.2. Theoretical Derivation of the Asymptotics of the CCE Estimators
Proof of Theorem 1. Sincê and using the results of Lemmas A5 and A6, we havê Noting that under our assumptions, T −1 Ξ i M q Ξ i tends to a fixed positive definite matrix. Since Ξ i = GΠ i + Ω i , then we have , the first part of (A10) in Lemma A4 implies that the first term is O p (T −1 ). Next, we establish T −1 Ω i ε i p → 0. Note that Ω i = e i Ψ ξi with e i = (ε i , U i ), i.e., Ω i contains the lags of ε it , as well as the contemporary and lags of u it , by Assumption 1, ε it is the series uncorrelated and independent of u it , then we have T −1 Ω i ε i p → 0; consequently, as (N, T, p T ) j → ∞ and p 2 T /T → 0. Then it is followed by the consistency ofπ i .
Proof of Theorem 2. Using the consistency ofπ i , and the definition of the mean group estimatorπ MG , we obtainπ By the assumption of the random coefficient model, π i = π + υ πi , it follows that Combining (A19) and (A20), we havê so we only need to show that 1 Using (A21) and (A22), we obtainπ MG p → π as desired. Next, we establish the asymptotic distribution ofπ MG . We have Using the result (A14) in Lemma A5, when the rank condition is satisfied, we have which, together with the assumption of γ i to be bounded, and the results of Lemma A2, A5 and Lemma A6, we obtain For the third term, by Lemmas A2 and A6, we have Using (A24) and (A25) in (A23), we can obtain By the random coefficient assumption, it now follows that and Σ MG can be consistently estimated nonparametrically bŷ

Appendix A.3. Proofs of Lemmas
Notation: All vectors are column vectors represented by bold lower case letters, and matrices are represented by bold capital letters. Let $\|A\| = [\mathrm{tr}(AA')]^{1/2}$ denote the Frobenius norm. $\|A\|_1 = \max_{1\le j\le n}\sum_{i=1}^{n}|a_{ij}|$ and $\|A\|_\infty = \max_{1\le i\le n}\sum_{j=1}^{n}|a_{ij}|$ denote the maximum absolute column and row sum matrix norms, respectively. $\lambda_{\min}(A)$ denotes the minimum eigenvalue of $A$, and $\lambda_{\max}(A)$ denotes the maximum eigenvalue of $A$. $A^{+}$ denotes the Moore-Penrose inverse of $A$, and $\mathrm{rk}(A)$ denotes the rank of $A$. We also let $K$ denote a generic finite constant, which does not depend on $N$ or $T$, and whose value may vary case by case.
Proof of Lemma A2. Since H = GP, wherē with Ψ(L) = Λ(L)C + O p (N −1/2 ), Λ(L) is invertible. If the rank condition (6) is satisfied, i.e., C is a full column rank matrix, then Λ(L)C has a full column rank; hence, Ψ (L) has full row rank asymptotically, which implies lim N→∞P is a full row rank matrix. Moreover, noting that when the rank condition holds, matrix G G is full rank, so we have where the third equality follows from Lemma A1(a) sinceP has the full row rank asymptotically, and the fourth equality is based on the result of Lemma A1(b).

Proof of Lemma A3. Denote
Note that z it = (y i,t , x i,t ) , so we can write y i,t−1 = S y z i,t−1 and x i,t = S x z it , where We also note that using (A27) into (A26), we have Consequently, we have or more concisely as Proof of Lemma A4. We consider (A7) firstly. Note that So we only need to consider T −1 (Ṽ Ṽ ), which is a (k + 1)(p T + 1) × (k + 1)(p T + 1) matrix. Since the elements of e zit are weakly cross-sectionally dependent, together with the random coefficient assumptions, we have E v t = O(N − 1 2 ) and E v t 2 = O(N −1 ). Con-sider the (s, r)th block element of T −1 (Ṽ Ṽ ), which can be written as T −1 (∑ T t=p T +1v t−sv t−r ), for s, r ∈ {0, 1, . . . , p T }. where the cross-product terms with finite means and variances. Hence, then we have which establishes (A7). Now, we establish (A8), as before, we consider T −1 ε iṼ here, and note that the lth We consider the first term and note that Under the assumption of the individual-specific error, we have cov(ε it , ε js ) = 0, i = j; is a serial uncorrelated covariance stationary process under Assumption 2. Combining these results yields Now, we consider the second term 1 T ∑ T t=p T +1 ε itū t−l , noting that ε it and u it are independently distributed stationary processes with zero means, it follows that which follows that Using (A30) and (A31) in (A29), we have Consequently, we have hence, the first part of (A8) is established. Similarly, the result for T −1 U iV * of the second part of (A8) is established.
For the first part of (A9), since G = (τ,F) andV * = 0,Ṽ , we consider which is a m(p T + 1) × (k + 1)(p T + 1) matrix. Without loss of generality, we consider the first block element, T −1 ∑ T t=p T +1 f tv t , and note that the lth row of that can be written as T −1 ∑ T t=p T +1 f tlv t , l = 1, 2, . . . , m. According to the assumption of f tl andv t (independently distributed processes), it easily follows that by the standard unit root asymptotic analysis result then the first part of (A9) is proven.
To establish the second part of (A9), recalling that , since the norm ofP is assumed to be bounded.
To establish the third part of (A9), noting that Ξ i = GΠ i + Ω i , and using triangle inequality and the submultiplicative property of matrix norm · ∞ , we have by (A8), the first and second parts of (A9), as well as the norm of Π i and Ψ ξi (L) are assumed to be bounded in probability uniformly over i.
To establish the first part of (A10), recalling G = (τ,F), consider the m(p T + 1) × 1 vector T −1F ε i , the element of T −1F ε i can be written as T −1 ∑ T t=p T +1 f t−l,s ε it , l = 0, 1, . . . , p T , s = 1, 2, . . . , m. Since by the assumption of f t−l,s and ε it (independently distributed processes), it easily follows that by the standard unit root asymptotic analysis result T −2 ∑ t ∑ t E( f t−l,s f t −l,s ) = O(1), which establishes that 1 T ∑ T t=p T +1 f t−l,s ε it converges to its limit at the desired rate of O p (1). It follows that T −1F ε i ∞ = O p (1); hence, the first part of (A10) is established. Moreover, the second part of (A10) can be proven similarly.
Recalling thatQ = GP +V * , the third part of (A10) is established because by (A8) and the first part of (A10). For the first part of (A11), note that G = (τ,F), we only need to consider the T −2F F , a m(p T + 1) × m(p T + 1) matrix. We consider the (s, r)th block element of T −2F F , which can be written as T −2 (∑ T t=p T +1 f t−s f t−r ), for s, r ∈ {0, 1, . . . , p T }. Without loss of generality, we consider the first block, T −2 ∑ T t=p T +1 f t f t , and the element of T −2 ∑ T t=p T +1 f t f t can be written as T −2 ∑ T t=p T +1 f tl f tl , l, l ∈ {1, 2, . . . , m}. By the standard unit root asymptotic analysis, we have 1 , which establishes the first part of (A11). The second part of (A11) is established by since the norm ofP is assumed bounded (and the above result).
To prove the third part of (A11), note thatQ = H +V * , by (A7), the second part of (A9), and the previous result in (A11), we have Noting thatQ = GP +V * , by the first part of (A9) and (A11), the first part of (A12) can be established.
To establish the second part of (A12), note that H = GP and Ξ i = GΠ i + Ω i , and recalling that by (A10) and the first part of (A11), as well as the assumption that the norm ofP, Π i , and Ψ ξi (L) is assumed bounded in probability uniformly over i. The third part of (A12) is proven straightforwardly sinceQ = H +V * , using (A8) and the second part of (A12).
Denote the OLS estimator of the multiple regression (A32) asΠ i = (G G) −1 G Ξ i . Since that Ξ i M g Ξ i = Ω i M g Ω i =Ω iΩ i , whereΩ i is the OLS residuals, i.e.,Ω i = Ξ i − GΠ i , and in the light of Assumption, T −1 (Ω i Ω i ) → Σ Ω i , we only need to show that T −1 (Ω iΩ i ) − T −1 (Ω i Ω i ) → 0. In fact, we can write and Since Λ(L) is invertible under the assumption, then (A34) can be rewritten as When the rank condition is satisfied, we have Note that Ξ i can be written as Ξ i = G 2i Π 2i + Ω i , where G 2i = (τ, F) and Π 2i = (c ξi , Ψ ξi C i ) , then Substituting (A36) into (A35), we obtain Moreover, from (A33), Λ(L)CF M qV = −V M qV , which directly follows under the assumption of Λ(L) is invertible and the rank condition is satisfied. Then, using this result in (A37), we have Since the norms of C i , Λ −1 (L) and Ψ ξi are assumed to be bounded, we need to establish the probability orders of V M qV /T and e i M qV /T . ForV M qV /T, since V is a (T − p T ) × (k + 1) submatrix ofV * , (A7) and (A9) implyV V /T = O p (N −1 ) and Q V /T ∞ = O p (N −1/2 ), which together with (A7), we obtain Similarly, by (A7)-(A9), Substituting the above two results into (A39) establishes the result.
Proof of Lemma A6. To prove (A15), we need to determine the order of probability of Substituting (A45)-(A47) into (A44), we have which completes the proof of (A16).

1 An alternative approach to dealing with cross-sectional dependence is the principal component analysis proposed by Bai (2009).
2 As in Pesaran (2006) and Kapetanios et al. (2011), observed factors, such as time effects, can also be included in model (1). For notational simplicity and for illustration purposes, we do not include such factors in model (1).
3 As Chudik and Pesaran (2015a) point out, the number of lags $p_T$ needs to be restricted. Letting $p_T^3/T \to \lambda$, $0 < \lambda < \infty$, ensures that, on the one hand, the number of lags is not too large, so that there are sufficient degrees of freedom for consistent estimation, and, on the other hand, the number of lags is not too small, so that the bias due to the truncation of infinite lag polynomials is sufficiently small.
4 We note that $\bar Q$ can be written as $\bar Q = (\tau, \bar Z)$, where $\tau = (1, 1, \ldots, 1)'$ is a $(T - p_T) \times 1$ vector of ones and $\bar Z$ is the $(T - p_T) \times (k+1)p_T$ matrix of observations on $\bar z_t$ for $t = p_T + 1, p_T + 2, \ldots, T$.
5 To illustrate the validity and robustness of the CCE estimator in the case of non-stationary common factors, the data-generating process and parameter settings are similar to those in Chudik and Pesaran (2015a), except for the unobserved common factors.
6 We also conducted additional Monte Carlo simulations for other settings, such as $p_T = [0.75\,T^{1/3}]$ and $p_T = [1.25\,T^{1/3}]$; the corresponding results are slightly worse than those for $p_T = [T^{1/3}]$ and are not reported to save space.