Panel Data Estimation for Correlated Random Coefficients Models

This paper considers methods of estimating a static correlated random coefficients model with panel data. We focus on comparing two approaches to estimating the unconditional mean of the coefficients: the group mean estimator and the generalized least squares estimator. For the group mean estimator, we show that it asymptotically achieves the Chamberlain (1992) semiparametric efficiency bound. For the generalized least squares estimator, we show that when T is large, a generalized least squares estimator that ignores the correlation between the individual coefficients and the regressors is asymptotically equivalent to the group mean estimator. In addition, we give conditions under which the standard within estimator of the mean of the coefficients is consistent. Moreover, with additional assumptions on the known correlation pattern, we derive the asymptotic properties of panel least squares estimators. Simulations are used to examine the finite sample performance of the different estimators.


Introduction
One useful way to reduce real-world detail in econometric modeling is through "suitable" aggregation of micro data. For aggregation not to distort the fundamental behavioral relationships between the micro data and the aggregate data, certain "homogeneity" conditions must hold between the micro units (e.g., Hsiao et al. 2005; Pesaran 2003; Stoker 1993; Theil 1954). However, the "homogeneity" assumption is often rejected by empirical investigators (e.g., Kuh 1963; Hsiao and Tahmiscioglu 1997). On the other hand, most policy makers are interested only in the average relationships in the population, not the individual relationships. The random coefficients formulation can be a useful tool to accommodate both the "heterogeneity" among micro units and policy makers' desire to find the average relationship (e.g., Hsiao et al. 1993).
Standard random coefficients models assume that the variation of the coefficients is independent of the variation of the regressors (e.g., Hsiao 1996; Hsiao and Pesaran 2008). In recent years, a great deal of attention has been devoted to the correlated random coefficients model. For instance, in the human capital literature, let the dependent variable y denote the logarithm of earnings and the explanatory variable x denote the years of schooling; the coefficient β then denotes the rate of return to schooling. It is possible that the return to schooling declines with the level of schooling. It is also plausible that there are unmeasured ability or motivational factors that affect the return to schooling and are also correlated with the level of schooling (e.g., Card 1995; Heckman and Vytlacil 1998; Heckman et al. 2006; Heckman et al. 2010). In particular, Heckman and Vytlacil (1998) propose an instrumental variable method for the population mean of the slope coefficients, but not the intercept, in the cross-sectional correlated random coefficients model. Their method requires the existence of instrumental variables for both the regressors and the random coefficients.
Many authors have worked on correlated random coefficients panel data models. For instance, Chamberlain (1992) showed how to apply his general result on the semiparametric efficiency bound to a random coefficients panel model as an example. The model considered in Chamberlain (1992) also allows for time varying parameters, and is thus more general than ours. However, the expression of the efficiency bound obtained using Chamberlain's (1992) formulas differs from the expression obtained by direct derivation. We show, in this paper, that they are, indeed, exactly the same. Due to the inclusion of the time varying parameters, Chamberlain (1992) requires that the number of time periods T be greater than the number of random coefficients K; otherwise, the information matrix of the time varying coefficients is singular. Graham and Powell (2012) further considered the situation where T = K, and proposed a novel "irregular" method that leads to consistent estimation. Their approach assumes the existence of panel data with two subpopulations, one consisting of units whose regressor values do not change across periods and the other of units whose regressor values do change. Arellano and Bonhomme (2012) discuss the identification of the distribution of the random coefficients conditional on the values of the regressors, extending the idea of Chamberlain (1992). Chernozhukov et al. (2013) consider more general nonseparable panel models that include the correlated random coefficients model as a special case.
In this paper, we consider the parametric identification and estimation of the unconditional mean of the random coefficients using panel data when the regularity conditions hold. Two approaches are considered: ignoring the correlations between the coefficients and the regressors, and explicitly modeling those correlations.
The rest of the paper is organized as follows. We discuss the estimation of the unconditional mean of the random coefficients with panel data in Sections 2 and 3. Section 2 considers the approach that does not explicitly model the pattern of correlations. Section 3 considers the approach with an explicit assumption about the correlations between the coefficients and the regressors. Section 4 provides Monte Carlo results on the finite sample behavior of the different estimators. Concluding remarks are in Section 5.

Panel Parametric Approaches without Explicit Assumption about the Correlations between Coefficients and Regressors
When only cross-sectional data are available, the identification conditions of average effects for a correlated random coefficients model require the existence of instrumental variables, which are very stringent and may not be satisfied for many data sets.However, when panel data are available, it is possible to obtain a consistent estimator of the population mean of random coefficients without the existence of instrumental variables.
Suppose there are T time series observations (y_it, x_it), t = 1, ..., T, for each individual i. Let y_i and x_i be the T × 1 vector and T × K matrix with typical rows y_it and x_it' = (x_it,1, ..., x_it,K), respectively, for i = 1, 2, ..., N. Also, let β_i = (β_i1, ..., β_iK)'. We have

y_i = x_i β_i + u_i, i = 1, ..., N. (1)

Let u_i = (u_i1, ..., u_iT)', and assume u_i is iid across i, with E(u_i | x_i) = 0 and E(u_i u_i' | x_i) = Σ_{x_i} (a T × T matrix). We assume that β_i is iid with mean β and variance Var(β_i) = ∆. Then we can write

β_i = β + α_i,

where E(α_i) = 0 and Var(α_i) = ∆. Substituting β_i = β + α_i into (1) yields

y_i = x_i β + v_i,

where v_i = x_i α_i + u_i. The standard random coefficients model assumes that α_i is a random draw from a population with E(α_i | x_i) = 0. Then E(v_i | x_i) = 0 and Var(v_i | x_i) = x_i ∆ x_i' + Σ_{x_i}. Therefore, a consistent estimator of β can be obtained by simply regressing Y on X, where Y and X stack the y_i and x_i and are of dimensions NT × 1 and NT × K, respectively. An efficient estimator of β can be obtained by applying the generalized least squares (GLS) estimator (or feasible GLS) (e.g., Hsiao 2003, chp. 6; Swamy 1970).
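As a quick numerical illustration of this uncorrelated baseline, the sketch below (a single regressor, K = 1, with hypothetical parameter values) simulates a random coefficients panel satisfying E(α_i | x_i) = 0 and checks that pooled least squares recovers the mean coefficient β:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 500, 5
beta_mean = 1.0  # the population mean beta we want to recover

# Uncorrelated case: alpha_i is drawn independently of x_i, so E(alpha_i | x_i) = 0
alpha = rng.normal(0.0, 0.5, size=N)    # beta_i = beta_mean + alpha_i
x = rng.normal(2.0, 1.0, size=(N, T))   # regressor, independent of alpha
u = rng.normal(0.0, 1.0, size=(N, T))   # idiosyncratic error
y = x * (beta_mean + alpha)[:, None] + u

# Pooled OLS of y on x over all N*T observations (no intercept since K = 1)
b_pooled = (x * y).sum() / (x * x).sum()
```

Because the composite error v_i = x_i α_i + u_i has conditional mean zero here, the pooled regression is consistent; GLS weighting would only improve efficiency.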
When E(α_i | x_i) = 0 is violated, which is very common in practice, there exist correlations between the coefficients and the regressors; this case is our main focus in the paper. We discuss different conditions and estimators in the following subsections.

Group Mean Estimator
In this subsection we impose the following mild conditional moment restriction:

E(u_i | x_i) = 0. (7)

Note that (7) is weaker than E(u_i | x_i, β_i) = 0, as we do not require that α_i and u_i be orthogonal to each other. Equation (7) implies the following unconditional moment condition:

E(β̂_i) = E[(x_i' x_i)^{-1} x_i' y_i] = E(β_i) = β. (9)

Moment condition (9) leads to the estimator of β given by

β̂_GM = N^{-1} Σ_{i=1}^N β̂_i, (10)

where β̂_i = (x_i' x_i)^{-1} x_i' y_i. Estimator (10) is the group mean (GM) estimator of Pesaran and Smith (1995) or Hsiao et al. (1999). Under certain regularity conditions, we show that the GM estimator achieves the semiparametric efficiency bound derived in Chamberlain (1992). Then

Ω = Var(β̂_i) = ∆ + E[(x_i' x_i)^{-1} x_i' Σ_{x_i} x_i (x_i' x_i)^{-1}] + 2 Cov(α_i, (x_i' x_i)^{-1} x_i' u_i). (12)

In particular, in the uncorrelated case, we impose the restriction that

E(u_i | x_i, α_i) = 0. (13)

Then the covariance term in (12) drops out. Moreover, if we further impose the conditional homoskedastic error assumption

Var(u_i | x_i) = σ_u^2 I_T, (14)

then Var(β̂_i) simplifies to

Ω = ∆ + σ_u^2 E[(x_i' x_i)^{-1}], (15)

where ∆ = Var(α_i).
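Computationally, the GM estimator is just unit-by-unit OLS followed by a simple average. The sketch below (K = 1, with a hypothetical design in which x_i is correlated with α_i) illustrates that the GM estimator remains consistent while pooled OLS does not:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 2000, 4

alpha = rng.normal(0.0, 0.5, size=N)                    # beta_i = 1 + alpha_i
x = rng.normal(2.0 + alpha[:, None], 1.0, size=(N, T))  # x correlated with alpha
u = rng.normal(size=(N, T))
y = x * (1.0 + alpha)[:, None] + u

# Group mean: beta_hat_i = (x_i' x_i)^{-1} x_i' y_i, then a simple average over i
beta_hat_i = (x * y).sum(axis=1) / (x * x).sum(axis=1)
b_gm = beta_hat_i.mean()

# Pooled OLS for contrast: inconsistent here because Cov(alpha_i, x_it) != 0
b_pooled = (x * y).sum() / (x * x).sum()
```

Each β̂_i is conditionally unbiased for β_i under E(u_i | x_i) = 0, so averaging over units recovers β regardless of how α_i and x_i are correlated.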
The following proposition describes the asymptotic behavior of β̂_GM.

Proposition 1. (i) The group mean estimator defined in (10) is √N-consistent and asymptotically normally distributed; specifically, we have

√N (β̂_GM − β) →_d N(0, Ω), (16)

where Ω is defined in (12). (ii) β̂_GM is semiparametrically efficient. (iii) If conditions (13) and (14) also hold, then the asymptotic variance Ω simplifies to (15).

Proposition 1(i) follows from the Lindeberg central limit theorem, since β̂_GM − β is an average of iid terms β̂_i − β with mean zero and finite variance Ω. (iii) follows from (i), (13) and (14) directly. We postpone the proof of (ii) to Appendix A.
Remark 1. Note that (16) holds without imposing any restriction on the correlations between x_i and α_i, or between u_i and α_i. The random coefficient α_i can be correlated with both x_i and u_i with arbitrary correlation patterns. Also, since x_i can contain a constant (an intercept), the conventional fixed effects model is included in the correlated random coefficients model as a special case.

Generalized Least Squares Estimator
In this subsection we consider a generalized least squares (GLS) estimator of β under the assumption that Cov(β_i, x_i) = 0, and compare the relative efficiency of the group mean estimator and the GLS estimator. Under the assumption that

E(α_i | x_i) = 0, (17)

together with (13) and (14), i.e., E(u_i | x_i, α_i) = 0 and Var(u_i | x_i) = σ_u^2 I_T, the best linear unbiased estimator (BLUE) of β is the generalized least squares estimator (e.g., Hsiao 2003, chp. 6)

β̂_GLS = Σ_{i=1}^N W_i β̂_i,

where W_i = [Σ_{j=1}^N (∆ + σ_u^2 (x_j' x_j)^{-1})^{-1}]^{-1} (∆ + σ_u^2 (x_i' x_i)^{-1})^{-1} is a positive definite weight matrix satisfying Σ_{i=1}^N W_i = I_K. Contrary to the group mean estimator (10), which takes the simple average of the individual least squares estimators β̂_i, the GLS estimator takes a weighted average of the β̂_i.
By the law of large numbers and central limit theorem arguments, β̂_GLS is √N-consistent and asymptotically normally distributed under (13), (14) and (17). Clearly, β̂_GLS is not feasible, since ∆ and σ_u^2 are unknown. A feasible GLS estimator is obtained by replacing ∆ and σ_u^2 with

σ̂_u^2 = [N(T − K)]^{-1} Σ_{i=1}^N y_i' (I_T − x_i (x_i' x_i)^{-1} x_i') y_i,

∆̂ = (N − 1)^{-1} Σ_{i=1}^N (β̂_i − β̂_GM)(β̂_i − β̂_GM)' − σ̂_u^2 N^{-1} Σ_{i=1}^N (x_i' x_i)^{-1},

where β̂_i is given in (10). The consistency of ∆̂ and σ̂_u^2 can be proved similarly to that of ∆̂* and σ̂_u^2 in Appendix A.
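A feasible GLS of the Swamy type can be sketched as follows for K = 1 (the design, with E(α_i | x_i) = 0, and the plug-in variance-component estimators are illustrative; truncating the variance estimate at zero is a common practical safeguard):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 1000, 5

alpha = rng.normal(0.0, 0.5, size=N)   # uncorrelated case: E(alpha_i | x_i) = 0
x = rng.normal(2.0, 1.0, size=(N, T))
u = rng.normal(size=(N, T))
y = x * (1.0 + alpha)[:, None] + u

sxx = (x * x).sum(axis=1)              # x_i' x_i (a scalar since K = 1)
beta_hat_i = (x * y).sum(axis=1) / sxx

# Plug-in variance components (Swamy-type; K = 1 so T - K = T - 1)
resid = y - x * beta_hat_i[:, None]
sigma2_u = (resid ** 2).sum() / (N * (T - 1))
delta = beta_hat_i.var(ddof=1) - sigma2_u * (1.0 / sxx).mean()
delta = max(delta, 0.0)                # truncate at zero if negative

# GLS: weight each beta_hat_i inversely to its variance delta + sigma2_u / sxx
w = 1.0 / (delta + sigma2_u / sxx)
b_gls = (w * beta_hat_i).sum() / w.sum()
```

Units whose regressors vary a lot (large x_i' x_i) have more precise β̂_i and hence receive more weight, which is exactly the source of the efficiency gain over the equal-weight GM estimator.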
Remark 2. Note that without condition (17), that E(α_i | x_i) = 0, β̂_GM is still a root-N consistent estimator of β, as shown in Proposition 1, while β̂_GLS becomes inconsistent when T is finite, because the weights W_i are correlated with the β̂_i. However, when T is large, noting that x_i' x_i / T = E(x_it x_it') + O_p(1/√T) under the strong mixing condition and the conditions given in Theorem 24.6 of Davidson (1994), the weight matrix W_i is close to the constant matrix N^{-1} I_K, since ∆ + σ_u^2 (x_i' x_i)^{-1} → ∆ as T → ∞. It is then easy to see that β̂_GLS is asymptotically equivalent to β̂_GM.

The next proposition compares the relative efficiency of β̂_GLS and β̂_GM by comparing their asymptotic variances.

Proposition 2. Assume that T is small (but still T ≥ K) and that conditions (13), (14) and (17) hold. Then Avar(β̂_GLS) ≤ Avar(β̂_GM).

The proof of Proposition 2 is given in Appendix A. Proposition 2 says that, under some additional assumptions, β̂_GLS is asymptotically more efficient than β̂_GM. This does not contradict Proposition 1(ii), because the result of Proposition 1 does not require any of the conditions (13), (14) and (17) to hold. With these additional conditions, β̂_GM is no longer a semiparametrically efficient estimator of β. However, these additional conditions, especially condition (17), are quite restrictive.
It was shown by Hsiao et al. (1999) that when T is large, β̂_GLS becomes a consistent estimator of β without the need for the restrictive condition (17).
In other words, if both N and T are large and N^{1/2}/T → 0 as N, T → ∞, then, contrary to the case where only cross-sectional data are available, one can ignore the issue of possible correlations between α_i and x_i (i.e., we allow E(α_i | x_i) ≠ 0) and simply treat the model as if β_i and x_i were uncorrelated, applying the conventional GLS (e.g., Hsiao 2003, eq. (6.2.6)).

Within Estimator
If T < K, neither the GM estimator nor the GLS estimator can be implemented. However, the standard within estimator can still yield a consistent estimator of β in certain cases. Let ȳ_i• = T^{-1} Σ_t y_it and x̄_i• = T^{-1} Σ_t x_it. The within estimator (or fixed effects estimator) first takes the deviation of each observation from its time series mean, then regresses (y_it − ȳ_i•) on (x_it − x̄_i•):

y_it − ȳ_i• = (x_it − x̄_i•)' β + [(x_it − x̄_i•)' α_i + (u_it − ū_i•)], (23)

where ū_i• = T^{-1} Σ_t u_it. The fixed effects (FE) estimator of β is the least squares estimator of (23).
In general, (24) is inconsistent, because the composite error in (23) contains (x_it − x̄_i•)' α_i. However, if the data generating process of x_it takes the form

x_it = μ_i + ε_it, (25)

where μ_i is iid with mean a and variance Σ_μ, and ε_it is iid across i and over t with

E(ε_it) = 0, Var(ε_it) = Σ_ε, and ε_it independent of α_i, (26)

then (24) is consistent. To see this, note that under (25) and (26), we have x_it − x̄_i• = ε_it − ε̄_i•, which is independent of α_i, so the within transformation removes the correlation between the regressors and α_i. If, in addition, the following conditional homoskedasticity assumption on (x_it − x̄_i•) holds:

E[(x_it − x̄_i•)(x_it − x̄_i•)' | α_i] = E[(x_it − x̄_i•)(x_it − x̄_i•)'], (29)

then the usual asymptotics apply as N → ∞. Therefore, we have

Proposition 4. Under (25) and (29), the conventional fixed effects estimator (24) is √N-consistent and asymptotically normally distributed as N → ∞. The asymptotic covariance matrix of (24) can be approximated using the Newey-West heteroskedasticity-autocorrelation consistent formula.

Condition (29) holds, for example, when (x_it, α_i) are jointly elliptically distributed (e.g., Fang and Zhang 1990; Gupta et al. 1993). Therefore, we have

Proposition 5. When (x_it, α_i) are jointly elliptically distributed, or the conditional homoskedasticity of (x_it − x̄_i•) in (29) holds, the FE estimator (24) is √N-consistent and asymptotically normally distributed.
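The consistency argument for the FE estimator under the components structure x_it = μ_i + ε_it can be checked numerically. In the sketch below (K = 1, hypothetical parameter values), the unit-level mean μ_i is strongly correlated with α_i, yet the within transformation leaves only ε_it, which is independent of α_i:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 4000, 5

alpha = rng.normal(0.0, 0.5, size=N)
mu = 2.0 + alpha                          # unit-level mean of x, correlated with alpha_i
x = mu[:, None] + rng.normal(size=(N, T)) # x_it = mu_i + eps_it, eps_it iid, independent of alpha
u = rng.normal(size=(N, T))
y = x * (1.0 + alpha)[:, None] + u

# Within transformation: deviations from unit time-series means remove mu_i
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_fe = (xd * yd).sum() / (xd * xd).sum()
```

If instead ε_it were correlated with α_i, the term (x_it − x̄_i•)' α_i would not average out and b_fe would be biased.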
Another case where the fixed effects estimator can be consistent is when (x_it, α_i) are jointly symmetrically distributed. Since x_it − x̄_i• has mean zero, (x_it − x̄_i•, α_i) is symmetrically distributed around (0, 0), and the bias term N^{-1} Σ_i Σ_t (x_it − x̄_i•) (x_it − x̄_i•)' α_i converges in probability to zero even though x_it has mean different from zero. We have

Proposition 6. Under (25), (26) and if (x_it, α_i) are symmetrically distributed, the fixed effects estimator (24) is √N-consistent and asymptotically normally distributed.
Wooldridge (2005) also discussed conditions for the validity of the fixed effects estimator. Although the conventional FE estimator (24) can yield a consistent estimator of β, if x_it contains time-invariant variables, the mean effects of those variables cannot be identified by the conventional fixed effects estimator. Moreover, the FE estimator makes use only of the within (group) variation. Since, in general, the between group variation is much larger than the within group variation, the FE estimator can also entail a loss of efficiency.

Panel Least Squares or Generalized Least Squares Estimator
If α_i is correlated with x_i, i.e., E(α_i | x_i) ≠ 0, we can re-write (1) as

y_i = x_i β + x_i E(α_i | x_i) + x_i ω_i + u_i, (31)

where ω_i = α_i − E(α_i | x_i) satisfies E(ω_i | x_i) = 0. Unless E(α_i | x_i) is linear in x_i, (31) is no longer a linear function of x_it. For instance, suppose

E(α_i | x_i) = a + B vec(x_i), (32)

as assumed by Mundlak (1978). Noting that E(α_i) = a + B E(vec(x_i)) = 0, which implies that a = −B E(vec(x_i)), (32) can be written as

E(α_i | x_i) = B [vec(x_i) − E(vec(x_i))]. (34)

Equation (31) then becomes

y_i = x_i β + x_i B [vec(x_i) − E(vec(x_i))] + x_i ω_i + u_i. (35)

Therefore, the least squares or the generalized least squares estimator of β in (35) is √N-consistent and asymptotically normally distributed provided the second moment matrix of the regressors is of full rank, where E(vec(x_i)) is replaced by its sample analogue based on x̄• = N^{-1} Σ_{i=1}^N x_i. Similar reasoning can be applied if E(α_i | x_i) is a higher order polynomial of x_i, say,

E(α_i | x_i) = a + B_1 vec(x_i) + B_2 vec(x_i ⊗ x_i). (38)

Then, from E(α_i) = 0, we have a = −B_1 E(vec(x_i)) − B_2 E(vec(x_i ⊗ x_i)). Substituting (38) into (31), we have

y_it = x_it' β + x_it' B_1 [vec(x_i) − E(vec(x_i))] + x_it' B_2 [vec(x_i ⊗ x_i) − E(vec(x_i ⊗ x_i))] + v_it, (39)

where v_it = x_it' ω_i + u_it and ω_i = α_i − E(α_i | x_i). By construction, v_it is uncorrelated with the regressors. Therefore, the least squares (LS) or the feasible generalized least squares (FGLS) estimator of (39) yields a √N-consistent and asymptotically normally distributed estimator of β when x̄• and N^{-1} Σ_{i=1}^N x_i ⊗ x_i are substituted for E(x_i) and E(x_i ⊗ x_i) in (39).
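For K = 1, a Mundlak-type correction can be implemented by adding the interaction of x_it with the demeaned unit average of x as an extra regressor; the coefficient on x_it then estimates β. A sketch (all parameter values hypothetical) under an exactly linear E(α_i | x_i):

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 2000, 5

x = rng.normal(2.0, 1.0, size=(N, T))
xbar = x.mean(axis=1)
# Mundlak-type correlation: E(alpha_i | x_i) is linear in the unit mean of x
alpha = 0.5 * (xbar - 2.0) + rng.normal(0.0, 0.2, size=N)
u = rng.normal(size=(N, T))
y = x * (1.0 + alpha)[:, None] + u

# Augmented least squares: regress y_it on x_it and x_it * (xbar_i - grand mean);
# the remaining error x_it * omega_i + u_it is uncorrelated with both regressors
z1 = x.ravel()
z2 = (x * (xbar - xbar.mean())[:, None]).ravel()
Z = np.column_stack([z1, z2])
coef, *_ = np.linalg.lstsq(Z, y.ravel(), rcond=None)
b_pls = coef[0]
```

The grand mean of x̄_i plays the role of the sample analogue of E(vec(x_i)) in this one-regressor special case.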
Next, we derive the asymptotic distribution of β̂_M,PLS, the estimator of β in (35). The feasible GLS type estimator β̂_M,PLS of β in (35) can be constructed with X and Y defined in (19), β̂_i given in (10), and B̂ the OLS estimator of B in (35).
We have the following proposition.

Proposition 7. Under conditions (13), (14) and (34), we obtain

√N (β̂_M,PLS − β) →_d N(0, Ω_M).

The proof, together with the expression for Ω_M, is given in Appendix A.
Remark 3. From the total variance decomposition, ∆ = Var(α_i) = Var(E(α_i | x_i)) + E[Var(α_i | x_i)] under the assumption that E(α_i | x_i) has a known functional form, for example as given by (34) or (38). Therefore, the asymptotic variance in Proposition 1(iii) can be rewritten as

Ω = Var(E(α_i | x_i)) + E[Var(α_i | x_i)] + σ_u^2 E[(x_i' x_i)^{-1}].

By contrast, from Proposition 7 with some algebraic operations, the difference between the two asymptotic variances involves a positive definite term. Compared with the group mean estimator, it is therefore not clear which estimator is more efficient.

Monte Carlo Studies
We consider several data generating designs for the correlated random coefficients model, where u_it is a random draw from the standard normal distribution and is independent of (x_it, α_i) in all simulation designs. The regressor x_it and the random coefficient α_i are correlated with each other and are generated according to the following sample designs. For example, Design 3: randomly draw α_i from a uniform distribution on (−0.75, 0.75), then generate x_it using w_it = 1 + χ²(5), where χ²(5) is a random draw from a chi-square distribution with five degrees of freedom.
Designs 1 and 2 generate β_i and x_it from jointly symmetric uniform distributions with means (1, 0) and (1, 2), respectively. Designs 3-8 generate β_i and x_it from correlated but non-symmetric distributions. Designs 9 and 10 yield nonlinearly correlated β_i and x_it.
We consider cases where T = 3 and 20, and N = 50, 100, and 200, respectively. We replicate each experiment two thousand times. The simulation results are consistent with the theoretical results. Because the results for T = 3 and T = 20 are similar, we only report the results for T = 3; the results for T = 20 are available upon request. Tables 1 and 2 provide the biases and mean squared errors of the estimators. As expected, when (α_i, x_it) are generated from a symmetric distribution with mean (0, 0) (Design 1), the LS estimator is unbiased. However, if (α_i, x_it) are generated from a symmetric distribution with mean (0, 2) (Design 2) or a nonsymmetric distribution (Design 3), the LS estimator yields biased estimates of β (= 1). All estimators other than LS work well when the correlations between the coefficients and regressors are linear; performance under nonlinear correlations (Designs 9 and 10) tells them apart. The GM estimator always enjoys the highest efficiency, which is consistent with our theory. The panel least squares estimators PLS1 and PLS2 pick up linear correlations well but fail to do so in the nonlinear cases; they also trade larger biases for smaller mean squared errors compared with the FE estimator. The GLS estimator that ignores the correlations between the coefficients and regressors works when there are no correlations or only linear correlations, but fails when the correlation is nonlinear and T is small, as can be seen from the results for Designs 9 and 10. The conventional FE estimator is nearly unbiased if α_i and x_it are symmetric (Designs 1 to 8), but it is not consistent when this condition is not satisfied, and it yields larger mean squared errors than the GM or GLS estimators. From Table 2 we see that for Design 9, only the GM estimator exhibits consistency, while the other estimators' mean squared errors do not decrease as the sample size N increases. Note that when α_i is generated from Gamma(1,1) and Beta(1,3), the mean of α_i is 1 and 0.25, respectively. Therefore, β in these cases is 2 and 1.25, respectively.
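A compact Monte Carlo in the spirit of these experiments (the design below is only loosely patterned on Design 3, with hypothetical parameters and T = 3) reproduces the qualitative ranking: GM and FE are nearly unbiased while pooled LS is not:

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, R = 200, 3, 300
bias = {"LS": [], "FE": [], "GM": []}

for _ in range(R):
    alpha = rng.uniform(-0.75, 0.75, size=N)   # beta_i = 1 + alpha_i
    # x correlated with alpha through its unit-level mean, plus chi-square noise
    x = (1.0 + alpha)[:, None] + rng.chisquare(5, size=(N, T)) / 5.0
    u = rng.normal(size=(N, T))
    y = x * (1.0 + alpha)[:, None] + u

    # Pooled least squares (LS)
    bias["LS"].append((x * y).sum() / (x * x).sum() - 1.0)
    # Within / fixed effects (FE)
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    bias["FE"].append((xd * yd).sum() / (xd * xd).sum() - 1.0)
    # Group mean (GM)
    bias["GM"].append(((x * y).sum(axis=1) / (x * x).sum(axis=1)).mean() - 1.0)

mean_bias = {k: float(np.mean(v)) for k, v in bias.items()}
```

FE is unbiased here because the noise around each unit's mean of x is independent of α_i, matching the components structure of Section 2.3.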

Concluding Remarks
Parameter heterogeneity among micro units is quite common, and a random coefficients model is a convenient way to take into account unobserved heterogeneity when pooling panel data (e.g., Hsiao and Tahmiscioglu 1997; Hsiao et al. 2005). However, as demonstrated by Card (1995), Heckman and Vytlacil (1998), and others, the parameter variation can often be correlated with the regressors. When only cross-sectional data are available, it was shown by Heckman and Vytlacil (1998) that consistent estimation of the mean of the coefficients requires very stringent conditions. In this paper, we show that when panel data are available, there is no need to find separate instruments for x_it and β_i. As long as the time series dimension T is no smaller than the number of regressors K, we can accommodate the correlations between the random coefficients α_i and the regressors x_i. In particular, the group mean estimator is consistent and achieves Chamberlain's semiparametric efficiency bound. We also give conditions under which the conventional fixed effects estimator and the generalized least squares estimator lead to consistent estimates of the mean coefficient vector β = E(β_i). The simulation results strongly support our theoretical analysis. In particular, our Monte Carlo studies show that the group mean estimator is, indeed, robust to a variety of patterns of correlation between the coefficients and regressors.
This completes the proof of Proposition 7. Moreover, let M*_i = (x_i' Σ_i^{-1} x_i)^{-1} = (x_i' (x_i ∆* x_i' + σ_u^2 I_T)^{-1} x_i)^{-1}. Similarly to the proof of Proposition 2(i) above, we have

Table 2. The Mean Squared Errors of the LS, FE, PLS1, PLS2, GM and GLS estimators.