On the Proper Computation of the Hausman Test Statistic in Standard Linear Panel Data Models: Some Clarifications and New Results

Abstract: We provide new analytical results for the implementation of the Hausman specification test statistic in a standard panel data model, comparing the version based on the estimators computed from the untransformed random effects model specification under Feasible Generalized Least Squares and the one computed from the quasi-demeaned model estimated by Ordinary Least Squares. We show that the quasi-demeaned model cannot provide a reliable magnitude when implementing the Hausman test in a finite sample setting, although it is the most common approach used to produce the test statistic in econometric software. The difference between the Hausman statistics computed under the two methods can be substantial and even lead to opposite conclusions for the test of orthogonality between the regressors and the individual-specific effects. Furthermore, this difference remains important even with large cross-sectional dimensions as it mainly depends on the within-between structure of the regressors and on the presence of a significant correlation between the individual effects and the covariates in the data. We propose to supplement the test outcomes that are provided in the main econometric software packages with some metrics to address the issue at hand.


Introduction
As is well known, the implementation of the Hausman specification test (Hausman 1978) might be affected, in practice and in a finite sample setting, by a non-positive definiteness (or indefiniteness) problem for the variance-covariance matrix corresponding to the difference between the efficient estimator and the consistent estimator.¹ This, in turn, can lead to a negative value of the test statistic, which makes it unreliable for interpreting the test outcome. This problem is usually mentioned in the context of models using instrumental variables (IV) (Baum et al. 2003, pp. 19-22; Staiger and Stock 1997, pp. 567-68) where the Hausman test is performed to assess the endogeneity of the regressors, given a set of instruments. In this case, one solution to ensure a symmetric positive definite (hereafter, SPD) covariance matrix is to use a common estimator for the variance of the (idiosyncratic) error term when comparing the Ordinary Least Squares (OLS) and the IV estimators (Hayashi 2000, pp. 220-33; Baum et al. 2003, pp. 19-22).
Yet, although the issue was pointed out as a warning by Hausman himself when he addressed the case of a static, balanced panel data model in his seminal presentation of the test (Hausman 1978, footnote 25, p. 1267), it has not been, to the best of our knowledge, further formally examined in this specific framework.² This is surprising, as one of the most widespread applications of the Hausman test is for assessing the relevance of the random effects (RE) versus fixed effects (FE) specification in a panel data model. From our point of view, this application deserves attention in its own right, and this article aims at filling this gap. Specifically, our contributions are twofold.
We first provide new and detailed analytical results for the implementation of the Hausman test in the case of a static and balanced panel data model with individual effects. In particular, we show that the test statistic is unreliable in a finite sample if the variance of the RE estimator is computed on the basis of the estimation of the quasi-demeaned model (denoted QDM in what follows)³ rather than the conventional and direct implementation of Feasible Generalized Least Squares (FGLS) on the RE panel data model. This result directly follows from the way in which standard errors are computed under the QDM approach, which can lead to a positive definiteness problem for the covariance matrix in the Hausman test statistic formula. To establish the unreliability, we perform a systematic analysis of the difference between the Hausman statistics computed under the two approaches as well as of the behavior of the statistic that uses the estimates based on the QDM regression framework. In particular, we show that the latter mainly depends on the within-between structure of the regressors and on the presence of a significant correlation between the individual effects and the covariates in the data.⁴
Second, based on a review of the main existing econometric software programs that deal with panel data models, we show that the vast majority of the related packages in those programs implement, by default, the Hausman test using the unreliable statistic. This leads us to assess different ways to supplement the test outcomes provided by those programs and/or to circumvent the reliability problem potentially raised by the use of the statistic involved.
The outline of the paper is as follows. First, in Section 2, we show how the two versions of the Hausman test statistic can produce diverging results for some well-known textbook examples in the context of panel data models. Then, in Section 3, we formalize the implications of the two approaches for the Hausman statistic and derive new analytical results regarding the comparison of the two versions of the statistic that follow from each of the two approaches. We also revisit textbook examples and provide some simulation results in the one-regressor case. Finally, in Section 4, we detail how the Hausman test is implemented in a variety of econometric software programs dealing with panel data models and discuss, on the basis of this review, some possible ways to implement this test in a reliable and robust manner with this software. The last section concludes.

Motivation
In this section, we illustrate the extent to which significant differences in the values of the Hausman test statistic may arise depending on the approach adopted to estimate the panel data model parameters and, more particularly, those pertaining to the variance components under the random effects (RE) specification.

Notation
All the case studies considered below are for a standard, linear, static, and balanced panel data model with individual effects (also called the one-way error component panel data model). Accordingly, we consider the following relationship:

$$y_{it} = \alpha + X_{it}'\beta + u_{it}, \qquad (1)$$

where $i = 1, ..., N$ denotes the cross-section dimension; $t = 1, ..., T$ the time-series dimension; $y_{it}$ the $it$-th observation of the dependent variable; $X_{it}$ a column vector of the $it$-th observation on $K$ explanatory variables; $\alpha$ an unknown scalar and $\beta$ a $(K \times 1)$ vector of unknown parameters, both to be estimated.⁵ The error term $u_{it}$ is assumed to take the following composite form:

$$u_{it} = \alpha_i^* + \varepsilon_{it}, \qquad (2)$$

where $\alpha_i^*$ denotes the (unobservable) individual-specific (or individual) effect, also called the individual component of $u_{it}$, and $\varepsilon_{it}$ denotes the idiosyncratic component of $u_{it}$. Furthermore, we assume that $\alpha_i^*$ and $\varepsilon_{it}$ are independent of each other and that $\alpha_i^* \sim \text{i.i.d.}(0, \sigma_\alpha^2)$ and $\varepsilon_{it} \sim \text{i.i.d.}(0, \sigma_\varepsilon^2)$.

As is well known, two alternative specifications are usually considered regarding the correlation between the individual effect $\alpha_i^*$ and the regressors contained in $X_{it}$. On the one hand, the random effects (RE) model assumes that this correlation is zero, ensuring that $X_{it}$ is strictly exogenous for $u_{it}$ in (1). On the other hand, the 'fixed effects' model allows for a non-zero correlation between the individual effect and the regressors. The Within transformation is then used to obtain an unbiased estimate for $\beta$.
The Hausman specification test (Hausman 1978) is widely used for testing the no-correlation assumption underlying the RE specification. In our setting, the test is based on the asymptotic properties of the RE and Within (or fixed effects) estimators of $\beta$. Both estimators are consistent under the null hypothesis (of no correlation) while, under the alternative, only the Within estimator is consistent as the RE estimator is (asymptotically) biased. Accordingly, the test statistic is built on a distance measure between the Within and RE estimators. In its implementable version, the statistic is written as:

$$HM_O = (\hat\beta_W - \hat\beta_{RE})'\,\big[\widehat{var}[\hat\beta_W] - \widehat{var}_j[\hat\beta_{RE}]\big]^{-1}\,(\hat\beta_W - \hat\beta_{RE}),$$

where $\hat\beta_W$ and $\widehat{var}[\hat\beta_W]$ denote the Within (or fixed effects) estimator of $\beta$ and a consistent estimator of its asymptotic covariance matrix, respectively; $\hat\beta_{RE}$ and $\widehat{var}_j[\hat\beta_{RE}]$ denote the RE estimator of $\beta$ and a consistent estimator of its asymptotic covariance matrix, respectively. The index $j = 1, 2$ indicates that two approaches can be considered to obtain a consistent estimator of the asymptotic covariance matrix, leading to two versions of the test statistic (see Section 3.3).
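As an illustration of the quadratic form above, the following minimal sketch computes the statistic from two coefficient vectors and their covariance matrix estimates. The function name `hausman_statistic` and the numerical inputs are hypothetical and chosen only for illustration; the sketch also reports whether the difference of the covariance matrices is positive definite, since the statistic is only guaranteed to be positive in that case.

```python
import numpy as np

def hausman_statistic(beta_w, beta_re, var_w, var_re):
    """Return q' [var_w - var_re]^{-1} q with q = beta_w - beta_re,
    together with a flag telling whether var_w - var_re is positive definite."""
    q = np.asarray(beta_w, dtype=float) - np.asarray(beta_re, dtype=float)
    v_diff = np.asarray(var_w, dtype=float) - np.asarray(var_re, dtype=float)
    stat = float(q @ np.linalg.solve(v_diff, q))
    # Symmetrize before the eigenvalue check to guard against rounding noise.
    eigvals = np.linalg.eigvalsh((v_diff + v_diff.T) / 2.0)
    return stat, bool(np.all(eigvals > 0))

# Hypothetical inputs (K = 2), for illustration only.
stat, is_spd = hausman_statistic(
    beta_w=[1.0, 2.0],
    beta_re=[0.9, 2.2],
    var_w=[[0.04, 0.0], [0.0, 0.09]],
    var_re=[[0.03, 0.0], [0.0, 0.05]],
)
print(stat, is_spd)
```

When the flag is false, the statistic can be negative and should not be compared with a chi-square critical value.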

Motivating Examples
We reproduce some outcomes of panel model estimations for well-known case study applications taken from the main textbooks in the field. In the following tables, Std.Err._1 and Std.Err._2 denote the two sets of standard errors for the parameters implied by the use of two different estimators for the covariance matrix of the RE estimator (detailed below). The values of the two related versions of the Hausman test statistic are denoted $HM_O^1$ and $HM_O^2$. Other variables and parameters are shown in the tables, the definitions and interpretations of which are left for discussion in Section 3, where we further comment on those results.

Motivating Example 1: Gasoline
Baltagi (2005) provides an interesting example of an important difference between the two versions of the Hausman test statistic in a study on the determinants of gasoline demand over the period 1960-1978 across 18 OECD countries.⁶ The following specification is adopted:

$$\log[Gas/Car]_{it} = \alpha + \beta_1 \log[Y/N]_{it} + \beta_2 \log[P_{MG}/P_{GDP}]_{it} + \beta_3 \log[Car/N]_{it} + u_{it},$$

where $Gas/Car$ is motor gasoline consumption per auto, $Y/N$ is real income per capita, $P_{MG}/P_{GDP}$ is real motor gasoline price and $Car/N$ denotes the stock of cars per capita.
Table 1 clearly shows that the two values of the Hausman statistic deviate strongly from each other, even if the null hypothesis is rejected in both cases. It is also interesting to observe that the two sets of standard errors remain close to each other.⁷
where cost is the total cost, in $1000; Q is output, measured in "revenue passenger miles" (index number); fuel price is the fuel price and load factor is a rate of capacity utilization: it is the average rate at which seats on the airline's planes are filled. The dataset consists of six firms observed yearly for 15 years (1970 to 1984).
In this case (Table 2), $HM_O^1 > HM_O^2$ while the two values are much closer and clearly lower than in the previous case. They both lead to the rejection of the null. Also, the two sets of standard errors are quite close.
The airline case provides further interesting outcomes when the specification is estimated with two covariates. We single out the regression with log[fuel price] and load factor as the two covariates. The corresponding estimation results are provided in Table 3.
This new set of results is interesting for the negative sign of the computed $HM_O^2$. Using the absolute value of the $HM_O^2$ statistic to perform the Hausman test (as some software packages do), the null hypothesis would not be rejected, contrary to the outcome implied by $HM_O^1$. Note at this point that these results are obtained for a sample distribution of the covariate's observations featuring a structure of the variance largely skewed in its within dimension.

Cornwell and Rupert (2008)'s study about the determinants of the returns to schooling. The dataset is a balanced panel of 595 observations on heads of households that runs over the period 1976-1982. Among the specifications examined by Greene, we consider one in which EXP denotes the number of years of full-time work experience; WKS, the number of weeks worked; OCC = 1 if the occupation is blue-collar, 0 if not; IND = 1 if the individual works in a manufacturing industry, 0 if not; SOUTH = 1 if the individual resides in the south, 0 if not; SMSA = 1 if the individual resides in a city, 0 if not; MS = 1 if the individual is married, 0 if not; UNION = 1 if the individual wage is set by a union contract, 0 if not; lastly, log[wage] denotes the log of the (yearly) wage.⁹
The estimation results are presented in Table 4. Here, even with a large sample (4165 observations), we observe significant differences between the two Hausman statistics (which are very large) as well as the two sets of the standard errors for the random effects model (the ratio of the latter that is provided by

The Two Versions of the Hausman Test Statistic
The two versions of the Hausman statistic refer to two possible approaches for estimating the covariance matrix of the estimator of the parameters of the model given in its RE specification (1) and (2).We first start with a formal and explicit presentation of these approaches since their implications for the computation of the Hausman test statistic have remained largely unnoticed.

The Original Hausman Test Specification in a Balanced Panel Data Model
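We rewrite model (1) and (2), stacking the observations over the time and cross-sectional dimensions:

$$y = \alpha\,\iota_{NT} + X\beta + u, \qquad (3)$$

where $y$ is the $(NT \times 1)$ vector for the observations of the dependent variable; $\iota_{NT}$ a vector of ones of dimension $NT$; $X$ is the $(NT \times K)$ matrix including the observations of the explanatory variables; and $u$ is the $(NT \times 1)$ vector for the composite error terms. The covariance matrix of $u$ is denoted by $\Omega$. Given the properties of $\alpha_i^*$ and $\varepsilon_{it}$, it takes the following form:

$$\Omega = \sigma_\varepsilon^2\,\Omega^*, \qquad \Omega^* = W + \psi^{-2}B, \qquad \psi^2 \equiv \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2},$$

where $B \equiv I_N \otimes (\iota_T\iota_T'/T)$ is the Between operator, $W \equiv I_{NT} - B$ the Within operator and $B_C \equiv B - \iota_{NT}\iota_{NT}'/NT$ the centered Between operator. $B$, $B_C$ and $W$ are symmetric and idempotent, so that $\Omega^{*-1} = W + \psi^2 B$ and $\Omega^{*-1/2} = W + \psi B$, and $\Omega_C^{*-1} \equiv W + \psi^2 B_C$ is defined with $B_C$ replacing $B$ in the previous formulas.

Assume first that the variance components (and thus $\Omega$, $\Omega^*$ and $\Omega_C^*$) are known. Then, it is well established that:

(1) The RE estimator of $\beta$ ($\hat\beta_{RE}$) corresponds to the Generalized Least Squares (GLS) estimator of $\beta$ in (3), noted $\hat\beta_{GLS}$, and is given by (4) with its covariance matrix by (5):

$$\hat\beta_{GLS} = (X'\Omega_C^{*-1}X)^{-1}X'\Omega_C^{*-1}y, \qquad (4)$$

$$var[\hat\beta_{GLS}] = \sigma_\varepsilon^2\,(X'\Omega_C^{*-1}X)^{-1}. \qquad (5)$$

(2) The Within (or fixed effects) estimator of $\beta$, $\hat\beta_W$, is given by (6) and its covariance matrix by (7):

$$\hat\beta_W = (X'WX)^{-1}X'Wy, \qquad (6)$$

$$var[\hat\beta_W] = \sigma_\varepsilon^2\,(X'WX)^{-1}. \qquad (7)$$

The GLS estimator $\hat\beta_{GLS}$ can alternatively be obtained using the quasi-demeaned model (also called the partial Within transformation model) built from (3) with the premultiplying factor¹⁰ $\Omega^{*-1/2}$:

$$y^* = \alpha\,\iota_{NT}^* + X^*\beta + u^*, \qquad (8)$$

with $y^* \equiv \Omega^{*-1/2}y$, $\iota_{NT}^* \equiv \Omega^{*-1/2}\iota_{NT}$, $X^* \equiv \Omega^{*-1/2}X$ and $u^* \equiv \Omega^{*-1/2}u$. It is easy to check that the OLS estimator of $\beta$ in (8), denoted as $\hat\beta_{OLS}$, corresponds to $\hat\beta_{GLS}$.¹¹ Also:

$$var[\hat\beta_{OLS}] = \sigma_\varepsilon^2\,(X^{*\prime}M_\iota X^*)^{-1} = \sigma_\varepsilon^2\,(X'\Omega_C^{*-1}X)^{-1}, \qquad (9)$$

where $M_\iota \equiv I_{NT} - \iota_{NT}\iota_{NT}'/NT$ is the centering matrix that accounts for the intercept in (8). Based on the former estimators, the Hausman specification test statistic initially proposed by Hausman (1978) is:

$$HM_O^* = \hat q'\,\big[var(\hat\beta_W) - var(\hat\beta_{RE})\big]^{-1}\,\hat q, \qquad (10)$$

with $\hat q \equiv \hat\beta_W - \hat\beta_{RE}$ and $var(z)$ denoting the (finite sample) exact covariance matrix of $z$.
Under the null hypothesis of no correlation, $HM_O^*$ is asymptotically distributed as a $\chi^2(K)$.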

Two Estimation Procedures
In practice, $\Omega$, $\Omega^*$ and $\Omega_C^*$ are unknown and replaced by consistent estimators (noted $\hat\Omega$, $\hat\Omega^*$ and $\hat\Omega_C^*$, respectively¹²). In this case, the Hausman test statistic is written as:

$$HM_O = \hat\beta_{\Delta WRE}'\,\big[\widehat{var}(\hat\beta_W) - \widehat{var}_j(\hat\beta_{RE})\big]^{-1}\,\hat\beta_{\Delta WRE},$$

where:
(1) $\hat\beta_{\Delta WRE} \equiv \hat\beta_W - \hat\beta_{RE}$ with, now, $\hat\beta_{RE} = \hat\beta_{FGLS}$, indicating that the RE estimator corresponds to the Feasible Generalized Least Squares (FGLS) estimator for $\beta$ ($\hat\beta_{FGLS}$), accounting for the use of $\hat\Omega_C^*$;
(2) $\widehat{var}(z)$ is a consistent estimator of the asymptotic covariance matrix of $z$ built upon the finite-sample estimator of the (exact) variance of $z$, with $j = 1, 2$ indicating that, for the RE estimator, two approaches are available to compute this matrix.
Under suitable conditions assumed to hold in what follows,¹³ $\hat\beta_{FGLS}$ and $\hat\beta_{GLS}$ are asymptotically equivalent and $HM_O$ is, as $HM_O^*$, asymptotically distributed as a $\chi^2(K)$.
We now discuss the choice of variance component estimators that are required to compute $HM_O$.
First, a consistent estimator for $var[\varepsilon_{it}] = \sigma_\varepsilon^2$ is given by $\hat\sigma_\varepsilon^{w2} = \hat u_W'\hat u_W/(NT - N - K)$, built from the OLS estimation associated with the Within (transformed) regression model, $\hat u_W$ denoting the $(NT \times 1)$ vector of the related residuals.
Second, to obtain $\widehat{var}[\hat\beta_{FGLS}]$, two approaches are possible.

Approach 1: The (Direct) FGLS Approach
Relying on the asymptotic equivalence between $\hat\beta_{FGLS}$ and $\hat\beta_{GLS}$, the computation of a consistent, asymptotic covariance matrix estimator for $var[\hat\beta_{RE}]$ can be considered directly from the expression (5) where $\hat\Omega_C^*$ is substituted for $\Omega_C^*$ and $\hat\sigma_\varepsilon^2$ for $\sigma_\varepsilon^2$, which yields:

$$\widehat{var}_1[\hat\beta_{RE}] = \hat\sigma_\varepsilon^2\,(X'\hat\Omega_C^{*-1}X)^{-1}. \qquad (11)$$

The estimator $\hat\sigma_\varepsilon^2$ in (11) should logically be chosen as the same estimator as the one entering into¹⁴ $\hat\psi^2$ for computing $\hat\Omega_C^*$, which also appears in (11). One usually relies on the estimator based on the 'fixed effects model' for this purpose, so that we set $\hat\sigma_\varepsilon^2 = \hat\sigma_\varepsilon^{w2}$ (Swamy-Arora approach). We use that correspondence in what follows.

Approach 2: The Quasi-Demeaning Approach
A consistent (asymptotic) covariance matrix estimator for $\hat\beta_{RE}$ can alternatively be obtained from the quasi-demeaned regression model (8) considered in its feasible version, with $\hat\Omega^{*-1/2}$ used as the premultiplying factor. We note this feasible version of the QDM model as the FQDM regression model. In this case, and relying again on the asymptotic equivalence between $\hat\beta_{GLS}$ and $\hat\beta_{FGLS}$, the resulting "plug-in" estimator can then be computed from the formula giving the covariance matrix for the OLS estimator of $\beta$ in (8), i.e., (9), where $\hat X^*$ substitutes for $X^*$ and $\hat\sigma_\varepsilon^{*2}$ for $\sigma_\varepsilon^2$:

$$\widehat{var}_2[\hat\beta_{RE}] = \hat\sigma_\varepsilon^{*2}\,(X'\hat\Omega_C^{*-1}X)^{-1}. \qquad (12)$$

In line with the quasi-demeaning approach, the computation of $\hat\sigma_\varepsilon^{*2}$ is, here, usually considered as a byproduct of the OLS estimation process at work for the FQDM regression, $\hat\sigma_\varepsilon^{*2} = \hat u^{**\prime}\hat u^{**}/(NT - (K+1))$, where $\hat u^{**}$ denotes the $NT$ vector of the OLS residuals in the FQDM regression model.
From (11) and (12), we observe that $\widehat{var}_2[\hat\beta_{RE}] = h\,\widehat{var}_1[\hat\beta_{RE}]$ with $h \equiv \hat\sigma_\varepsilon^{*2}/\hat\sigma_\varepsilon^{w2}$. Thus, the two approaches differ in providing two distinct estimators for the variance of the (RE) estimator insofar as they rely on two different estimators of the variance component $\sigma_\varepsilon^2$. This leads to what we call, in the following, a disturbance variance disconnect problem.¹⁵

Comparing the Two Versions
From the previous results, it follows that two possible expressions are available for an implementable version of the Hausman test statistic, depending on which estimator of the asymptotic covariance matrix for $\hat\beta_{RE}$ is chosen.

Two Statistics
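Using $\widehat{var}_1[\hat\beta_{RE}]$, we have:

$$HM_O^1 = \hat\beta_{\Delta WRE}'\,\big[\widehat{var}[\hat\beta_W] - \widehat{var}_1[\hat\beta_{RE}]\big]^{-1}\,\hat\beta_{\Delta WRE}, \qquad (13)$$

or, using $\widehat{var}_2[\hat\beta_{RE}]$, we have:

$$HM_O^2 = \hat\beta_{\Delta WRE}'\,\big[\widehat{var}[\hat\beta_W] - \widehat{var}_2[\hat\beta_{RE}]\big]^{-1}\,\hat\beta_{\Delta WRE}. \qquad (14)$$

Hausman (1978) originally proposed using $HM_O^1$, which he considered as the legitimate computational version of $HM_O^*$ (Hausman 1978, footnote 25, p. 1267):¹⁶ "Note that the elements of q and its standard errors are simply calculated given the estimates $\hat\beta_{FE}$ and of $\hat\beta_{FGLS}$ and their standard errors, making sure to adjust to use the fixed effects estimate of $\sigma_\varepsilon^2$". On the other hand, as we will see in Section 4, most software programs compute the statistic $HM_O^2$, which can equivalently be written as:

$$HM_O^2 = \hat\beta_{\Delta WRE}'\,\big[\widehat{var}[\hat\beta_W] - h\,\widehat{var}_1[\hat\beta_{RE}]\big]^{-1}\,\hat\beta_{\Delta WRE}. \qquad (15)$$

Comparing with (13), we note that $HM_O^2$ diverges from $HM_O^1$ as soon as $h \neq 1$.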
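The two versions of the statistic can be illustrated on simulated data. The sketch below is not the authors' code: it assumes a balanced panel sorted by individual and then by time, Swamy-Arora variance components with degrees of freedom $NT - N - K$ (Within) and $N - K - 1$ (Between), and a quasi-demeaned OLS regression with $NT - K - 1$ residual degrees of freedom. The function names and the data-generating process are hypothetical.

```python
import numpy as np

def group_means(a, N, T):
    """Replicate each individual's time mean T times (rows sorted by individual, then time)."""
    return np.repeat(a.reshape(N, T, -1).mean(axis=1), T, axis=0)

def hausman_two_versions(y, X, N, T):
    NT, K = X.shape
    y = y.reshape(NT, 1)
    Xb, yb = group_means(X, N, T), group_means(y, N, T)
    Xw, yw = X - Xb, y - yb                          # Within transformation
    Xbc, ybc = Xb - X.mean(axis=0), yb - y.mean()    # centered Between transformation

    A = Xw.T @ Xw
    Bc = Xbc.T @ Xbc
    beta_w = np.linalg.solve(A, Xw.T @ yw)
    beta_b = np.linalg.solve(Bc, Xbc.T @ ybc)
    rss_w = np.sum((yw - Xw @ beta_w) ** 2)
    rss_b = np.sum((ybc - Xbc @ beta_b) ** 2)

    # Swamy-Arora variance components (assumed degrees-of-freedom corrections).
    s2_w = rss_w / (NT - N - K)
    s2_1 = rss_b / (N - K - 1)
    psi2 = s2_w / s2_1
    theta = 1.0 - np.sqrt(psi2)

    # Feasible quasi-demeaned model estimated by OLS, intercept included.
    Z = np.hstack([np.full((NT, 1), 1.0 - theta), X - theta * Xb])
    ys = y - theta * yb
    coef, *_ = np.linalg.lstsq(Z, ys, rcond=None)
    s2_star = np.sum((ys - Z @ coef) ** 2) / (NT - K - 1)
    beta_re = coef[1:]

    A_inv = np.linalg.inv(A)
    C_inv = np.linalg.inv(A + psi2 * Bc)             # (X' Omega_C^{*-1} X)^{-1}
    q = (beta_w - beta_re).ravel()
    v1 = s2_w * A_inv - s2_w * C_inv                 # approach 1: common variance estimate
    v2 = s2_w * A_inv - s2_star * C_inv              # approach 2: OLS by-product variance
    return {"h": float(s2_star / s2_w),
            "HM1": float(q @ np.linalg.solve(v1, q)),
            "HM2": float(q @ np.linalg.solve(v2, q))}

# Hypothetical data: regressors correlated with the individual effect.
rng = np.random.default_rng(0)
N, T, K = 50, 5, 2
alpha = rng.normal(size=(N, 1, 1))
X3 = rng.normal(size=(N, T, K)) + 0.5 * alpha
y3 = 1.0 + X3 @ np.array([1.0, -1.0]) + alpha[:, :, 0] + rng.normal(size=(N, T))
out = hausman_two_versions(y3.reshape(-1), X3.reshape(N * T, K), N, T)
print(out)
```

The only difference between `v1` and `v2` is the residual variance that scales the RE covariance matrix, which is exactly the source of the gap between the two statistics.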

Main Results
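To go further into the comparison of the two versions of the Hausman test statistic, we rewrite their expressions as:

$$HM_O^1 = \frac{1}{\hat\sigma_\varepsilon^{w2}}\,\hat\beta_{\Delta WRE}'\,\Gamma^{-1}\,\hat\beta_{\Delta WRE}, \qquad (16)$$

$$HM_O^2 = \frac{1}{\hat\sigma_\varepsilon^{w2}}\,\hat\beta_{\Delta WRE}'\,\tilde\Gamma^{-1}\,\hat\beta_{\Delta WRE}, \qquad (17)$$

where we define $\Gamma \equiv (X_W'X_W)^{-1} - (X'\hat\Omega_C^{*-1}X)^{-1}$ and $\tilde\Gamma \equiv (X_W'X_W)^{-1} - h\,(X'\hat\Omega_C^{*-1}X)^{-1}$, with $X_W \equiv WX$, $X_{B_C} \equiv B_C X$ and $X'\hat\Omega_C^{*-1}X = X_W'X_W + \hat\psi^2 X_{B_C}'X_{B_C}$. From (16) and (17), the comparison between $HM_O^1$ and $HM_O^2$ is based on the one between $\Gamma$ and $\tilde\Gamma$. Note that $\tilde\Gamma = \Gamma - (h-1)(X'\hat\Omega_C^{*-1}X)^{-1}$, so that the two matrices coincide when $h = 1$. Finally, define $H^* \equiv (X_W'X_W)^{-1}(X'\hat\Omega_C^{*-1}X)$, $h^*_{min} \equiv \min(\sigma(H^*))$ and $h^*_{max} \equiv \max(\sigma(H^*))$, with $\sigma(H^*)$ denoting the spectrum of $H^*$ and with $1 < h^*_{min} < h^*_{max}$. We establish the following results (see Appendix A for details and related proofs):

1. $\Gamma$ is a symmetric positive definite (SPD) matrix. It follows that $HM_O^1$ is a positive-definite quadratic form.

2. $\tilde\Gamma$ can be either a symmetric positive definite (SPD) or a symmetric negative definite (SND) matrix, or even an indefinite matrix, depending on specific conditions holding for $h$. As a consequence, $HM_O^2$ can be of either sign (and even of indeterminate sign a priori) depending on the values taken by $h$. Specifically, we have:
(a) if $h < h^*_{min}$, $\tilde\Gamma$ is SPD and $HM_O^2 > 0$;
(b) if $h > h^*_{max}$, $\tilde\Gamma$ is SND and $HM_O^2 < 0$;
(c) if $h^*_{min} < h < h^*_{max}$, $\tilde\Gamma$ is indefinite. In this case, $HM_O^2$ can be of either sign, which is indeterminate a priori.

Based on those results, comparing $HM_O^2$ relative to $HM_O^1$ depends on whether $\tilde\Gamma$ is SPD or SND. We then have:

1. If $\tilde\Gamma$ is SPD, the relevant comparison relies upon the magnitudes of $HM_O^1$ and $HM_O^2$: $HM_O^2 \gtrless HM_O^1$ whenever $h \gtrless 1$.
2. If $\tilde\Gamma$ is SND, $HM_O^2 < 0 < HM_O^1$.

In establishing those results, we directly echo the discussions, mentioned in the introduction, about the positive definiteness of the variance-covariance matrix estimator in the expression of the Hausman test statistic. As we observe, whether this matrix is SPD or not (which translates into whether $\Gamma$ or $\tilde\Gamma$ is SPD or not) depends on the choice of the estimator for $\sigma_\varepsilon^2$, which itself hinges on the approach that is adopted to estimate the variance components associated with the RE model. In other terms, $\Gamma$ is, by construction, an SPD matrix, and this has to be related to the use of the same estimator for $\sigma_\varepsilon^2$, i.e., $\hat\sigma_\varepsilon^{w2}$, in the computation of the covariance matrix estimator. On the other hand, this is not necessarily the case for $\tilde\Gamma$, and this has to do with the fact that two different estimators have been considered for $\sigma_\varepsilon^2$, namely $\hat\sigma_\varepsilon^{w2}$ and $\hat\sigma_\varepsilon^{*2}$ ($h \neq 1$).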
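The role of $h$ relative to the spectrum of $H^*$ can be checked numerically. The sketch below takes $H^*$ to be $(X_W'X_W)^{-1}(X'\hat\Omega_C^{*-1}X)$, which is an assumption consistent with the results stated above, and classifies $(X_W'X_W)^{-1} - h\,(X'\hat\Omega_C^{*-1}X)^{-1}$ for several values of $h$. The function name and the two small matrices are hypothetical.

```python
import numpy as np

def classify_gamma_tilde(A, C, h):
    """A stands for X_W'X_W and C for X' Omega_C^{*-1} X (both SPD).
    Returns the definiteness of A^{-1} - h C^{-1} and the extreme eigenvalues of A^{-1} C."""
    gamma_tilde = np.linalg.inv(A) - h * np.linalg.inv(C)
    eig = np.linalg.eigvalsh((gamma_tilde + gamma_tilde.T) / 2.0)
    h_star = np.linalg.eigvals(np.linalg.solve(A, C)).real  # spectrum of H* (assumed definition)
    if np.all(eig > 0):
        kind = "SPD"
    elif np.all(eig < 0):
        kind = "SND"
    else:
        kind = "indefinite"
    return kind, float(h_star.min()), float(h_star.max())

# Hypothetical Within and (psi^2-weighted) Between cross-product matrices.
A = np.diag([4.0, 1.0])
C = A + np.diag([1.0, 3.0])
for h in (1.1, 2.0, 5.0):
    print(h, classify_gamma_tilde(A, C, h))
```

In this example the eigenvalues of $H^*$ are 1.25 and 4, so the three values of $h$ fall below, between, and above the two bounds, respectively.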

Back to the Case Studies
Based on $h$ and the proposed metrics for $H^*$, we can now analyze the mechanisms driving the various outcomes observed for the case studies selected in Section 2.2.
From Tables 1-4, in 3 cases out of 4, $h^*_{min} < h < h^*_{max}$. It follows that, in those cases, $\tilde\Gamma$ is an indefinite matrix. As a consequence, the sign of $HM_O^2$ is a priori indeterminate, as are the relative magnitudes of $HM_O^1$ and $HM_O^2$. The observed outcome depends on the specific value taken by $\hat\beta_{\Delta WRE}$ for the sample considered.
Conversely, in the two-covariate regression case drawn from Greene, where $HM_O^2$ is computed as a negative scalar, we logically have $h > h^*_{max}$, which is also what we observe.

What about h?
As we have shown, the value of $h$ is key for determining the outcome of the Hausman test if it is measured through $HM_O^2$. In this section, we analyze the main determinants for this ratio and provide an illustration through simulations in the single-regressor case.

Determinants
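The value of $h$ essentially derives from the comparison between $\hat\sigma_\varepsilon^{w2}$ and $\hat\sigma_\varepsilon^{*2}$ and therefore, in turn, from the two residual sums of squares that are associated with, respectively, the Within model ($\hat u_W'\hat u_W$) and the feasible quasi-demeaned model ($\hat u^{**\prime}\hat u^{**}$). We show (see Appendix B) that the following relationship holds between the two expressions:

$$\hat u^{**\prime}\hat u^{**} = \hat u_W'\hat u_W + \hat\psi^2\,\hat u_B'\hat u_B + \Delta^*, \qquad (18)$$

where $\hat u_B$ denotes the $NT$ vector of the OLS regression residuals for the Between (transformed) regression model and where $\Delta^*$ is defined as:

$$\Delta^* \equiv \hat\beta_{\Delta WRE}'\,\Gamma^{-1}\,\hat\beta_{\Delta WRE},$$

where $\Gamma^{-1}$ is defined as in (16) and can be written as:

$$\Gamma^{-1} = X_W'X_W + X_W'X_W\,(\hat\psi^2 X_{B_C}'X_{B_C})^{-1}\,X_W'X_W.$$

With the Swamy-Arora estimators of the variance components, for which $\hat u_W'\hat u_W = (NT - N - K)\,\hat\sigma_\varepsilon^{w2}$ and $\hat\psi^2\,\hat u_B'\hat u_B = (N - K - 1)\,\hat\sigma_\varepsilon^{w2}$, (18) can be rewritten as:

$$\hat u^{**\prime}\hat u^{**} = (NT - 2K - 1)\,\hat\sigma_\varepsilon^{w2} + \Delta^*,$$

from which it follows that:

$$h = \frac{\hat\sigma_\varepsilon^{*2}}{\hat\sigma_\varepsilon^{w2}} = 1 - \eta + \frac{\eta}{K}\,\frac{\Delta^*}{\hat\sigma_\varepsilon^{w2}},$$

with $\eta \equiv K/(NT - (K + 1))$. Furthermore, considering the definition of $HM_O^1$ given in (16), we have $\Delta^* = \hat\sigma_\varepsilon^{w2}\,HM_O^1$ and therefore:

$$h = 1 + \eta\left(\frac{HM_O^1}{K} - 1\right), \qquad (21)$$

so that $h \gtrless 1$ whenever $HM_O^1 \gtrless K$. Taking advantage of the relationship between $h$ and $HM_O^1$, we identify two categories of determinants for $HM_O^1$ and $h$ from (16) and (21):

• The first category is related to the structure of the data at hand, i.e., the Between and Within components of the (empirical) covariance matrix of the explanatory variables. They are captured by the matrices $(X_W'X_W)$ and $(X_{B_C}'X_{B_C})$, which influence the structure of $H^*$ (and therefore the magnitude of its eigenvalues).

• The second category is linked to the correlation between the individual effects $\alpha_i^*$ and the regressors contained in $X_{it}$, which determines the extent of the (asymptotic as well as finite sample) bias for $\hat\beta_{RE}$ (with respect to $\beta$). This affects the gap between $\hat\beta_W$ (which is unbiased) and $\hat\beta_{RE}$ and therefore the value of $\hat\beta_{\Delta WRE}$.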
These same factors in turn influence the determination of $HM_O^2$, the value of which is mostly linked to the comparison between $h$ and the eigenvalues of $H^*$.
With respect to the behavior of $HM_O^2$ compared to $HM_O^1$, two mechanisms are at work. Assume that $HM_O^1$ is large because of a significant distance between the RE and Within estimators. On the one hand, we could expect that $HM_O^2$ will also be large and therefore that both statistics will correctly lead to rejecting the null hypothesis. This is because the larger $HM_O^1$ is, the further above 1 $h$ will be and, in turn, the more likely it will be for the condition $1 < h$ to prevail, which leads to $HM_O^2 > HM_O^1$ as long as $h$ remains below $h^*_{min}$. On the other hand, the larger $HM_O^1$, the more likely it will actually be that $h > h^*_{min}$, and this is all the more the case when the structure of the covariance matrix is such that $h^*_{min}$ (or even $h^*_{max}$) is relatively small. In this case, the sign of $HM_O^2$ becomes indeterminate, which does not allow for a clear conclusion about the relative magnitude of the two test statistics and creates a possible divergence for the interpretation of the test.

Illustrations in the Single Regressor Case
To illustrate the role of the previous factors as well as to clarify their interpretation, we perform Monte-Carlo simulations on the behavior of the main magnitudes involved in the comparison between $HM_O^1$ and $HM_O^2$ in a single-regressor model ($K = 1$).

Preliminary Results
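In such a setting, $X = \{x\}$ and the various expressions for the main estimators and statistics simplify accordingly. We measure the total sample variance in the observations for $x$ with $s_x^2 \equiv x_T^2/NT$, where $x_T^2 \equiv x_W^2 + x_{B_C}^2$, and where $x_W^2 \equiv x_W'x_W$ and $x_{B_C}^2 \equiv x_{B_C}'x_{B_C}$ can be used as a measure of, respectively, the Within and the Between components (up to a factor $NT$) of the total (empirical) variance in the $NT$ observations contained in $x$. Finally, define $\theta_W \equiv x_W^2/x_T^2$, the share of the Within variance in the total variance. Then, substituting, we obtain:

$$h^* = 1 + \hat\psi^2\,\frac{1 - \theta_W}{\theta_W}, \qquad HM_O^1 = \frac{\hat\beta_{\Delta WRE}^2\,x_W^2}{\hat\sigma_\varepsilon^{w2}}\,\frac{h^*}{h^* - 1}, \qquad HM_O^2 = \frac{\hat\beta_{\Delta WRE}^2\,x_W^2}{\hat\sigma_\varepsilon^{w2}}\,\frac{h^*}{h^* - h},$$

with $h$ given by (21) for $K = 1$.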
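In the single-regressor case, the comparison between the two statistics reduces to scalar arithmetic. The sketch below assumes that the single eigenvalue of $H^*$ is $h^* = 1 + \hat\psi^2 (1 - \theta_W)/\theta_W$, which follows from taking $H^* = (x_W'x_W)^{-1}(x_W'x_W + \hat\psi^2 x_{B_C}'x_{B_C})$, and that the ratio of the two statistics is then $(h^* - 1)/(h^* - h)$. The function names and the numerical values are hypothetical.

```python
def h_star(theta_w, psi2):
    """Single eigenvalue of H* when K = 1 (assumed definition):
    1 + psi^2 * (1 - theta_w) / theta_w, with theta_w the Within share of the variance of x."""
    return 1.0 + psi2 * (1.0 - theta_w) / theta_w

def hm2_over_hm1(h, theta_w, psi2):
    """Ratio HM2 / HM1 = (h* - 1) / (h* - h) for K = 1; undefined when h equals h*."""
    hs = h_star(theta_w, psi2)
    return (hs - 1.0) / (hs - h)

# Mostly Within variation: h* is close to 1, so a moderate h already flips the sign.
print(h_star(0.9, 0.5), hm2_over_hm1(1.2, 0.9, 0.5))
# Mostly Between variation: h* is large and the same h only inflates the statistic.
print(h_star(0.1, 0.5), hm2_over_hm1(1.2, 0.1, 0.5))
```

The two calls illustrate why a regressor whose variance is concentrated in the Within dimension makes a negative second statistic more likely for a given $h$.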

Design of Simulations
We perform Monte-Carlo simulations on the behavior of the previous four quantities, $h^*$, $h$, $HM_O^1$ and $HM_O^2$. The details of the simulation design are presented in Appendix C. We generate several series of $y^a_{it}$ based on a model $y^a_{it} = \alpha + \beta\,x^a_{it} + u^a_{it}$, where we fix $\alpha = \beta = 1$ and we let the other parameters of the simulation vary: $N = (20, 40, 80)$, $T = (20, 40, 80)$, $s_x^2 = (0.01, 1, 100)$, $\theta_W = (0.1, 0.5, 0.9)$, $\sigma_u^2 = (0.01, 1, 100)$, $\rho_{xu} = (-0.99, -0.9, -0.5, 0, 0.5, 0.9, 0.99)$ and $\rho_u = (0.1, 0.5, 0.9)$. With the same notations as before, $\rho_{xu}$ is the correlation between $x$ and $u$ (on the cross-sectional dimension) and $\rho_u$ is the intra-class coefficient of the error term (share of the within variance in the total variance in $u$).¹⁷ Then, for each combination of the parameters, we perform 199 replications and compute the mean and the median of $HM_O^1$ and $HM_O^2$. To explore the main dimensions of variability of $HM_O^1$ and $HM_O^2$, we perform an ANCOVA (Table A2 in Appendix C) where we regress, respectively, each of the means and medians on the levels of the different parameters and the interactions that we found significant.
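The size of this design can be made explicit by enumerating the parameter grid. The sketch below only lists the combinations given in the text; the dictionary keys are hypothetical labels, and the data-generating step itself (described in Appendix C) is not reproduced.

```python
from itertools import product

# Parameter levels as listed in the text; key names are illustrative labels.
grid = {
    "N": (20, 40, 80),
    "T": (20, 40, 80),
    "s2_x": (0.01, 1, 100),
    "theta_W": (0.1, 0.5, 0.9),
    "sigma2_u": (0.01, 1, 100),
    "rho_xu": (-0.99, -0.9, -0.5, 0, 0.5, 0.9, 0.99),
    "rho_u": (0.1, 0.5, 0.9),
}

# One dictionary per combination of parameter levels.
designs = [dict(zip(grid, values)) for values in product(*grid.values())]
n_replications = 199

print(len(designs), len(designs) * n_replications)
```

Each element of `designs` would then be passed to a simulation routine that draws the replications and records the mean and median of the two statistics.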

Results of the Simulations
The results are similar whether we consider the mean or the median over the replications of the levels of $HM_O^1$ and $HM_O^2$.

In this case (see Table A3), $HM_O^2$ is computed as a negative scalar with $HM_O^1 > HM_O^2$. Note also the extremely large value of $\theta_W$ for the covariate, which partly drives the values taken by $h^*$ and $h$.

The Implementation of the Hausman Test in Standard Econometric Software Packages for Panel Data: A Brief Review and Discussion
In this section, we review how six well-known econometric software packages deal with the implementation of the Hausman test in a standard panel data model and provide some discussion.

• STATA commands for the estimation of random effects panel data models (xtreg with the re option) rely on the specification of the quasi-demeaned model in (8). As a consequence, the random effects parameter estimates as well as their "conventional" covariance matrix estimate are provided as standard outputs of the OLS regression performed on that model. In particular, the vce(conventional) option, which is the default, yields the (asymptotic) covariance matrix estimate based on the standard variance estimator for OLS regression. This corresponds to $\widehat{var}_2[\hat\beta_{FGLS}]$ with, accordingly, $\hat\sigma_\varepsilon^{*2}$ used for the residual variance estimate. The default version of the command for implementing the standard Hausman test (the hausman command) corresponds accordingly to $HM_O^2$.

• The R plm package developed by Croissant and Millo (2008) allows estimating a wide range of panel data models with R. Regarding the random effects specification, the estimation process can be implemented via the plm function, with its model argument set to "random". Croissant and Millo (2008) point out that it could have been possible to program the computation of the covariance matrix estimator for $\hat\beta_{FGLS}$ directly from the formula (11), "once the variance components have been estimated and hence the covariance matrix of errors". However, to limit the computational costs associated with the inversion of the $(NT \times NT)$ matrix $\hat\Omega$ or $\hat\Omega_C^*$ and the related memory limits to store it, plm resorts to the specification and estimation of the quasi-demeaned model (8). Then, the coefficients' covariance matrix estimator is readily calculated by applying the standard OLS formulas which, in the R language, go through the vcov() function.
The phtest function computes the Hausman test in plm. Its main arguments are the two panel model objects that underlie the comparison (e.g., model = "within" and model = "random"). The corresponding estimates of the asymptotic covariance matrices provided under both models are thus used to compute the Hausman statistic, which corresponds to $HM_O^2$.

• EViews estimates the random effects models using feasible GLS. The first step refers to the estimation of the covariance matrix for the composite error formed by the effects and the idiosyncratic disturbance. The EViews 9 User's Guide II notes: "Once the component variances have been estimated, we form an estimator of the composite residual covariance, and then GLS transform the dependent and regressor data". As for the computation of the FGLS estimate, EViews uses the quasi-demeaned model specification and proceeds on this basis. However, the calculation of the related coefficients' covariance matrix is based on the direct application of the formula (11) and thus corresponds to $\widehat{var}_1[\hat\beta_{FGLS}]$.

• The GAUSS Time Series MT 3.0 (TSMT) application provides a fixed effects and random effects models (TSCS) package that can be implemented through the tsmt library and the all-in-one tscsFit procedure. Another possibility is to use the pdlib GAUSS library and the randomEffects procedure in it. Both procedures implement the quasi-demeaning transformation on the original dataset and apply the standard OLS estimator on the transformed data so as to form the FGLS estimate. The covariance matrix estimate comes as a direct by-product of the OLS outcome, so that $\widehat{var}_2[\hat\beta_{FGLS}]$ is used. The Hausman test provided in the tscsFit procedure is implemented accordingly and corresponds to $HM_O^2$.

• SAS (SAS ETS 13.2) provides estimation methods for the standard fixed (Within), between, and random effects models in the balanced and unbalanced cases with the PANEL procedure toolbox. Standard panel data models are estimated using the PROC PANEL command with the MODEL statement specifying the regression model and the assumptions for the error structure. Specifically, FIXONE and RANONE must be used to specify the fixed effects and the random effects models, respectively (in the cross-sectional one-way case). In the latter case, various methods (but not the Swamy-Arora approach) are proposed to estimate, in the first stage, the variance components (through the VCOMP = option). It is explicitly indicated that the random effects FGLS estimates are then based, in the balanced case, on these variance components estimates through the quasi-demeaning approach, where 'the random effects β is then the result of simple OLS on the transformed data' (see SAS ETS 13.2 User Manual (2014), p. 1417).
The estimator for the asymptotic variance-covariance matrix is thus provided by $\widehat{var}_2[\hat\beta_{FGLS}]$. The Hausman statistic is automatically generated and reported as a conventional F statistic, with the statistic computed as $HM_O^2$.

Discussion
As the previous review indicates, in all but one of the packages discussed above, the Hausman test is, by default, implemented through the computation of HM 2 O with the quasi-demeaning estimator.The rationale for such a choice is computational.Indeed, the quasi-demeaning approach allows avoiding the inversion of the (NT × NT) matrix Ω or Ω * C , which can be computationally costly (in terms of time and rounding errors).Conversely, the quasi-demeaned model only requires partially demeaning the variables with θ ≡ 1 − ψ as the partial demeaning factor.Yet, as a counterpart of this standard OLS regression, σ * 2 ε is naturally chosen to compute the residual variance estimate and, in turn, yields to HM 2 O , which might, as we have seen, be an unreliable statistic for the Hausman test.In what follows, we explore some ways to circumvent the problems posed by the use of this statistic.
(1) First, it is possible to compute $HM^{1}_{O}$ and still rely on the quasi-demeaning approach to estimate the parameters of the RE model (which, as mentioned before, is the default case in the vast majority of available econometric software). This can be easily seen from the relationship (21) that we established between $h$ and $HM^{1}_{O}$. Once $h$ has been determined, which only requires the OLS residual sums of squares from the Within and the quasi-demeaned estimations, we can derive $HM^{1}_{O}$. Hence, the following procedure can be suggested, if required, to supplement the existing programs.

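1. Use the quasi-demeaning estimator to compute the RE estimator for $\beta$ and $\hat{\sigma}^{*2}_{\varepsilon}$.
2. Use the Within estimator to compute $\hat{\sigma}^{w2}_{\varepsilon}$ and form the ratio $h \equiv \hat{\sigma}^{*2}_{\varepsilon} / \hat{\sigma}^{w2}_{\varepsilon}$.
3. Rearranging (21), obtain $HM^{1}_{O}$ from $h$, and implement the Hausman test on the basis of $HM^{1}_{O}$.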
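The procedure above can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the simulated data, the function names `simulate_panel` and `hausman_from_h`, the Swamy-Arora variance components, and the degrees-of-freedom corrections ($NT - N - K$ for the Within residual variance, $N - K - 1$ for the Between one, $NT - K - 1$ for the quasi-demeaned one). Under these assumptions, our own algebra gives $HM^{1}_{O} = h\,(NT - K - 1) - (NT - 2K - 1)$; this closed form is offered as a cross-check and is not claimed to reproduce relationship (21) verbatim.

```python
import numpy as np

def simulate_panel(N, T, K, rho, seed):
    """Hypothetical balanced panel; rho ties the regressors to the individual effects."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=N)
    X = rng.normal(size=(N, T, K)) + rho * alpha[:, None, None]
    y = X.sum(axis=2) + alpha[:, None] + rng.normal(size=(N, T))
    return y, X  # shapes (N, T) and (N, T, K)

def hausman_from_h(y, X):
    N, T, K = X.shape
    yb, Xb = y.mean(axis=1), X.mean(axis=1)  # individual means

    # Within regression: slopes and residual variance.
    Xw = (X - Xb[:, None, :]).reshape(N * T, K)
    yw = (y - yb[:, None]).reshape(N * T)
    R = np.linalg.inv(Xw.T @ Xw)
    b_w = R @ (Xw.T @ yw)
    s2_w = np.sum((yw - Xw @ b_w) ** 2) / (N * T - N - K)

    # Between regression on the N individual means (assumed Swamy-Arora step).
    Zb = np.column_stack([np.ones(N), Xb])
    g_b = np.linalg.lstsq(Zb, yb, rcond=None)[0]
    s2_b = np.sum((yb - Zb @ g_b) ** 2) / (N - K - 1)
    psi = np.sqrt(s2_w / (T * s2_b))
    theta = 1.0 - psi

    # Step 1: OLS on the quasi-demeaned model (the intercept column becomes psi).
    Xs = (X - theta * Xb[:, None, :]).reshape(N * T, K)
    ys = (y - theta * yb[:, None]).reshape(N * T)
    Zs = np.column_stack([np.full(N * T, psi), Xs])
    ZZinv = np.linalg.inv(Zs.T @ Zs)
    g_re = ZZinv @ (Zs.T @ ys)
    s2_star = np.sum((ys - Zs @ g_re) ** 2) / (N * T - K - 1)
    M = ZZinv[1:, 1:]  # slope block of (Z*'Z*)^{-1}

    # Step 2: ratio of the two residual variance estimates.
    h = s2_star / s2_w

    # Direct statistics: common Within variance (HM1) versus mixed variances (HM2).
    q = b_w - g_re[1:]
    hm1 = q @ np.linalg.solve(s2_w * (R - M), q)
    hm2 = q @ np.linalg.solve(s2_w * R - s2_star * M, q)

    # Step 3: HM1 recovered from h alone (identity derived under the stated assumptions).
    hm1_from_h = h * (N * T - K - 1) - (N * T - 2 * K - 1)
    return {"h": h, "HM1": hm1, "HM2": hm2, "HM1_from_h": hm1_from_h}

if __name__ == "__main__":
    print(hausman_from_h(*simulate_panel(N=50, T=5, K=2, rho=0.5, seed=0)))
```

In this sketch only the two residual sums of squares enter `hm1_from_h`, which mirrors the point that no additional covariance matrix needs to be formed once $h$ is available.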
(2) Second, depending on the software package considered, some programming options can be used to fix the potential 'variance disconnect' problem associated with the use of the statistic $HM^{2}_{O}$. For example, in STATA, it is possible to use the sigmamore or the sigmaless option when implementing the Hausman test. As indicated in the STATA instructions: "sigmamore and sigmaless specify that the two covariance matrices used in the test be based on a common estimate of disturbance variance. sigmamore specifies that the covariance matrices be based on the estimated disturbance variance from the efficient estimator. sigmaless specifies that the covariance matrices be based on the estimated disturbance variance from the consistent estimator". Following Hausman's seminal approach would lead to choosing the sigmaless option, whereby the variance estimator based on the Within model would be used.¹⁸ This would ensure that the test is performed on the $HM^{1}_{O}$ statistic. Choosing sigmamore¹⁹ would imply considering a third test statistic, $HM^{3}_{O}$, where the common disturbance variance estimator would be based on the quasi-demeaned model. Observe that $HM^{1}_{O} > HM^{3}_{O}$ whenever $h < 1$. Thus, it would be more likely to favor (even unduly) the rejection of the null hypothesis on the basis of $HM^{3}_{O}$ when $h > 1$.

Some packages also offer the possibility to rely on an alternative expression for the Hausman statistic that does not involve the RE estimator, so that it is immune to the variance disconnect problem. This expression was initially proposed by Hausman and Taylor (1981) and is based on the difference between the Within and Between estimators, $\hat{q}^{**} \equiv \hat{\beta}_{W} - \hat{\beta}_{B}$. Hausman and Taylor (1981, pp. 1382-83) establish that the resulting version of the Hausman statistic is numerically identical to the one that is built upon $\hat{q}$ and used above, that is: $\hat{q}' \cdot [\widehat{\mathrm{var}}(\hat{q})]^{-1} \cdot \hat{q} = \hat{q}^{**\prime} \cdot [\widehat{\mathrm{var}}(\hat{q}^{**})]^{-1} \cdot \hat{q}^{**}$. Such a solution can notably be implemented in the R plm package using the phtest command and specifying as arguments model = within and model = between.
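The numerical identity between the two quadratic forms can be checked on simulated data. The sketch below is illustrative only: the data-generating process, the function names and the Swamy-Arora variance components (with a common Within disturbance variance in the Within-versus-RE form) are assumptions of ours, and the identity is only expected to hold when both forms use the same variance component estimates.

```python
import numpy as np

def simulate_panel(N, T, K, rho, seed):
    """Hypothetical balanced panel with effects correlated with the regressors."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=N)
    X = rng.normal(size=(N, T, K)) + rho * alpha[:, None, None]
    y = X.sum(axis=2) + alpha[:, None] + rng.normal(size=(N, T))
    return y, X

def hausman_two_ways(y, X):
    N, T, K = X.shape
    yb, Xb = y.mean(axis=1), X.mean(axis=1)

    # Within estimator and its residual variance.
    Xw = (X - Xb[:, None, :]).reshape(N * T, K)
    yw = (y - yb[:, None]).reshape(N * T)
    R = np.linalg.inv(Xw.T @ Xw)
    b_w = R @ (Xw.T @ yw)
    s2_w = np.sum((yw - Xw @ b_w) ** 2) / (N * T - N - K)

    # Between estimator on individual means and its residual variance.
    Zb = np.column_stack([np.ones(N), Xb])
    g_b = np.linalg.lstsq(Zb, yb, rcond=None)[0]
    b_b = g_b[1:]
    s2_b = np.sum((yb - Zb @ g_b) ** 2) / (N - K - 1)

    # RE estimator via the quasi-demeaned regression (assumed Swamy-Arora weights).
    psi = np.sqrt(s2_w / (T * s2_b))
    theta = 1.0 - psi
    Xs = (X - theta * Xb[:, None, :]).reshape(N * T, K)
    ys = (y - theta * yb[:, None]).reshape(N * T)
    Zs = np.column_stack([np.full(N * T, psi), Xs])
    ZZinv = np.linalg.inv(Zs.T @ Zs)
    b_re = (ZZinv @ (Zs.T @ ys))[1:]
    M = ZZinv[1:, 1:]

    # Form (a): Within minus RE, with the common Within disturbance variance.
    q = b_w - b_re
    hm_q = q @ np.linalg.solve(s2_w * (R - M), q)

    # Form (b): Within minus Between; the two estimators are uncorrelated,
    # so the variance of the difference is the sum of the two variances.
    d = b_w - b_b
    Xbc = Xb - Xb.mean(axis=0)
    V_d = s2_w * R + s2_b * np.linalg.inv(Xbc.T @ Xbc)
    hm_d = d @ np.linalg.solve(V_d, d)
    return hm_q, hm_d

if __name__ == "__main__":
    hm_q, hm_d = hausman_two_ways(*simulate_panel(N=50, T=5, K=2, rho=0.5, seed=0))
    print(hm_q, hm_d, np.isclose(hm_q, hm_d))
```

Form (b) never uses the quasi-demeaned residual variance, which is why it sidesteps the variance disconnect.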
(3) Finally, two other approaches that depart from the separate estimation of the FE and RE model parameters, which underlies the standard implementation of the Hausman test, can be emphasized. They have the advantage of solving the disturbance variance estimator disconnect problem while allowing, more generally, for a robust implementation of the Hausman test.²⁰
(3.1) The first of these approaches relies on implementing an auxiliary regression that was initially proposed by Hausman himself together with the presentation of the standard specification test (see also Mundlak 1978). In this regression, $\eta$ denotes a vector of standard random disturbances.
It can be shown that the standard Wald test statistic for testing whether $\gamma = 0$ in the previous regression framework is equivalent to the standard Hausman test statistic as expressed in terms of the difference between the Between and Within estimators (see Hausman and Taylor 1981 and above).²¹ Resorting to this auxiliary regression framework has two advantages. First, it involves only one covariance matrix estimator in the Wald test statistic formula, namely the one for $\gamma$, which is immune to the positive definiteness problem that can be encountered with the standard Hausman test statistic. Second, as underlined by Baltagi and Liu (2007), it can be made robust to heteroskedasticity of unknown form (see also Arellano 1993).²² Once the variables have been transformed to be included as regressors in the auxiliary regression framework, the latter can be implemented in a rather standard way in any of the econometric software packages reviewed above.
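A minimal sketch of such an auxiliary regression is given below. The specific form used here is an assumption: the quasi-demeaned dependent variable is regressed on the quasi-demeaned regressors augmented with the Within-transformed regressors, and a Wald statistic is computed for the coefficients on the latter. The simulated data, the function names and the Swamy-Arora weights are also illustrative choices, not taken from the packages reviewed above.

```python
import numpy as np

def simulate_panel(N, T, K, rho, seed):
    """Hypothetical balanced panel with effects correlated with the regressors."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=N)
    X = rng.normal(size=(N, T, K)) + rho * alpha[:, None, None]
    y = X.sum(axis=2) + alpha[:, None] + rng.normal(size=(N, T))
    return y, X

def auxiliary_wald(y, X):
    N, T, K = X.shape
    yb, Xb = y.mean(axis=1), X.mean(axis=1)

    # Within and Between estimators (used for the weights and for comparison).
    Xw = (X - Xb[:, None, :]).reshape(N * T, K)
    yw = (y - yb[:, None]).reshape(N * T)
    A_inv = np.linalg.inv(Xw.T @ Xw)
    b_w = A_inv @ (Xw.T @ yw)
    s2_w = np.sum((yw - Xw @ b_w) ** 2) / (N * T - N - K)
    Zb = np.column_stack([np.ones(N), Xb])
    g_b = np.linalg.lstsq(Zb, yb, rcond=None)[0]
    s2_b = np.sum((yb - Zb @ g_b) ** 2) / (N - K - 1)

    # Quasi-demeaning with assumed Swamy-Arora weights.
    psi = np.sqrt(s2_w / (T * s2_b))
    theta = 1.0 - psi
    Xs = (X - theta * Xb[:, None, :]).reshape(N * T, K)
    ys = (y - theta * yb[:, None]).reshape(N * T)

    # Augmented regression: [intercept, quasi-demeaned X, Within-transformed X].
    Z = np.column_stack([np.full(N * T, psi), Xs, Xw])
    ZZinv = np.linalg.inv(Z.T @ Z)
    coef = ZZinv @ (Z.T @ ys)
    resid = ys - Z @ coef
    s2 = resid @ resid / (N * T - 2 * K - 1)

    # Wald statistic for gamma = 0 (coefficients on the Within-transformed block).
    gamma = coef[K + 1:]
    V_gamma = s2 * ZZinv[K + 1:, K + 1:]
    wald = gamma @ np.linalg.solve(V_gamma, gamma)

    # Between-Within form of the Hausman statistic, for comparison.
    d = b_w - g_b[1:]
    Xbc = Xb - Xb.mean(axis=0)
    V_d = s2_w * A_inv + s2_b * np.linalg.inv(Xbc.T @ Xbc)
    hm_bw = d @ np.linalg.solve(V_d, d)
    return gamma, d, wald, hm_bw

if __name__ == "__main__":
    gamma, d, wald, hm_bw = auxiliary_wald(*simulate_panel(50, 5, 2, 0.5, 0))
    print(gamma, d, wald, hm_bw)
```

Only one covariance block, the one for $\gamma$, enters the statistic, so it cannot turn negative as long as that block is positive definite; a heteroskedasticity-robust variant would replace `V_gamma` with a sandwich estimator.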
(3.2) The second approach consists of implementing White (1982)'s reformulated Hausman specification test, which is based on the Maximum Likelihood (ML) estimation of the FE and RE model parameters. The related test statistic is built on the ML estimators related to the Within and RE regression frameworks; $S$ serves as the covariance matrix estimator and involves the information matrices for both estimators of $\beta$, as well as outer products of scores within and between the two models concerned. White (1982) shows that $S$ remains positive definite even under misspecification (including heteroskedasticity).
The last two procedures could even be recommended as a first choice when assessing the relevance of the RE model specification, as they allow for a globally robust implementation of the Hausman test in the context of panel data models (if only because they do not rely directly on the RE (FGLS) disturbance variance estimator).

Conclusions
In this paper, we provide new analytical results on the behavior of the Hausman statistic for the test of orthogonality between the regressors and the individual effects in a static and balanced panel data model. We compare the Hausman statistic computed with direct FGLS implementation and the Hausman statistic computed on the quasi-demeaned model. We show that the difference between the two depends upon several parameters, in particular the between-within structure of the regressors. We show, by means of a Monte Carlo simulation in the single-regressor case and a set of well-known textbook examples, that the difference can be substantial and that, in some cases, the Hausman statistic computed on the basis of the quasi-demeaned model can take strongly negative values. Therefore, despite its computational advantage, the quasi-demeaned model should not be used prima facie as the basis for computing the Hausman statistic. We suggest, if needed, supplementing the existing software instructions so as to be able to compute the relevant statistic in any case. Extensions can include deriving these analytical results for unbalanced panel data models, two-way component models and dynamic panel models.
We then deduce from Lemma 1 that $H^{*}$ can be diagonalised by a non-singular matrix $P$ and that the spectrum of $H^{*}$ is composed only of strictly positive elements.

• Let $R = (X_{W}' \cdot X_{W})^{-1}$ and $M = H^{*-1}$. By construction, $R$ and $(R \cdot M)$ are again two real symmetric positive matrices. We deduce from Lemma 1 that $H^{*-1}$ can be diagonalised by a non-singular matrix $P$. On the basis of the results above, $(P' \cdot \Gamma \cdot P)$ can then be written in diagonal form. Letting $x$ denote a non-null vector and setting $y \equiv P^{-1} \cdot x$, we obtain the spectral decomposition of $\Gamma$. Since we know that $\Gamma$ is SPD, we obtain from the latter decomposition that $\forall i$, $1 - 1/\lambda_{i}(H^{*}) > 0$.

Proceeding as above, we can write $(P' \cdot \tilde{\Gamma} \cdot P)$ in the same way. Letting $x$ denote a non-null vector and setting $y \equiv P^{-1} \cdot x$, we obtain the spectral decomposition of $\tilde{\Gamma}$. From this spectral decomposition, we conclude that:

- $\tilde{\Gamma}$ will be SPD if and only if $\forall i$, $1 - h/\lambda_{i}(H^{*}) > 0$, that is, if and only if $h < h^{*}_{\min}$ is fulfilled.
- $\tilde{\Gamma}$ will be SND if and only if $\forall i$, $1 - h/\lambda_{i}(H^{*}) < 0$, that is, if and only if $h > h^{*}_{\max}$ is fulfilled.
We now use the previous results to compare $HM^{1}_{O}$ and $HM^{2}_{O}$. For that purpose, we must distinguish according to whether $HM^{2}_{O}$ is (for sure) a positive or (for sure) a non-positive quadratic form. This, in turn, depends on whether $\tilde{\Gamma}$ is SPD or SND.

• If $\tilde{\Gamma}$ is SPD, the relevant comparison can be built on the magnitude $HM^{1}_{O} - HM^{2}_{O}$. Given the definition of $\Xi$, and since $\tilde{\Gamma}$ is SPD, its sign depends on whether $\tilde{\Gamma} - \Gamma$ is SPD or SND. It then follows that whether $\tilde{\Gamma} - \Gamma$ is SPD or SND depends on whether $h \lessgtr 1$.

• If $\tilde{\Gamma}$ is SND, the relevant comparison is between $HM^{1}_{O}$ and $|HM^{2}_{O}|$, and can be built on $HM^{1}_{O} - |HM^{2}_{O}|$. Given the definition of $\Xi^{*}$, and as $\tilde{\Gamma}^{*}$ is SPD (since $\tilde{\Gamma}$ is SND), its sign depends on whether $\Gamma - \tilde{\Gamma}^{*}$ is SND or SPD. Proceeding as for $\tilde{\Gamma}$ above, we can diagonalise $(P' \cdot (\Gamma + \tilde{\Gamma}) \cdot P)$. Denoting by $x$ a non-null vector and setting $y \equiv P^{-1} \cdot x$, we obtain the spectral decomposition of $(\Gamma + \tilde{\Gamma})$, from which the conclusion follows. Hence, we obtain the following table covering all cases:

correlation coefficient in the variance component literature). We also let the degree of correlation between the regressor and the (composite) error term on the cross-sectional dimension vary across the experiments (we call it $\rho_{xu}$). Accordingly, for a given value of $s^{2}_{x}$, $\theta_{W}$, $\sigma^{2}_{u}$, $\rho_{u}$ and $\rho_{xu}$, as well as for a given size of the sample ($N$ and $T$ given), we generate (with replications) one series for $u = \{u_{it}\}$, $i = 1, \dots, N$, $t = 1, \dots, T$, and one for $x = \{x_{it}\}$ (and consequently one series for $y = \{y_{it}\}$), on the basis of which the different estimation procedures and the computation of the statistics of interest can be implemented.

In particular, this issue is generally not addressed in the leading textbooks in panel data econometrics. An exception is Wooldridge (2010, chp. 10, pp. 289-90), but he merely mentions the possibility of obtaining a non-positive definite covariance matrix if different estimates of the error term variance are used, suggesting a way out, which we further discuss below.

Appendix C.3. ANCOVA Results
3 See Nerlove (1971) and Fuller and Battese (1973), for an original exposition of the transformed, quasi-demeaned model and the related approach.
4 Schreiber (2008) also considers a panel data framework as an illustration for the asymptotic results and highlights cases where the matrix is not SPD in a context where the error term variance estimates differ. He does not, however, focus on the comparison of the different estimation approaches in the random effects model and their implications for the computation of the Hausman test statistic, as we do.
5 We assume in what follows that there is no time-invariant regressor in $X_{it}$, so that it is possible to compute the β-estimator with the Within transformation of $X_{it}$ (see below).
6 See the initial study by Baltagi and Griffin (1973). The dataset is available at: https://www.wiley.com/legacy/wileychi/baltagi/datasets.html, accessed on 2 May 2023.
7 Interestingly, in an updated version of his textbook, Baltagi (2021) modified the presentation of the Gasoline case study compared to the one provided in 2005 and presented here. In this update, there is no more disconnection between the two versions of the statistic, and only one version is considered, $HM^{1}_{O}$. While, in both presentations, the estimations are drawn from the STATA software package, the second presentation benefits from the use of the sigmaless option command that fixes the computation of the estimator for the idiosyncratic component of the error term. See infra in Section 4.2.
8 The original study is from Greene (1999). The dataset is available at: http://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm, accessed on 2 May 2023.
9 The dataset is available at: http://pages.stern.nyu.edu/~wgreene/Text/Edition7/tablelist8new.htm, accessed on 2 May 2023.
12 Given the definition of those matrices, this replacement relies on the use of consistent estimators for the variance components, i.e., $\sigma^{2}_{\varepsilon}$, $\sigma^{2}_{\alpha}$ and/or $\sigma^{2}_{\alpha\varepsilon}$. For a discussion about these variance component estimators, see, among others, Amemiya (1971); Fuller and Battese (1974); Maddala (1971); Nerlove (1971); Swamy and Arora (1972); Wallace and Hussain (1969).
13 In particular, we assume that $\hat{\Omega}^{*}$ (resp. $\hat{\Omega}^{*}_{C}$) is a consistent estimator for $\Omega^{*}$ (resp. $\Omega^{*}_{C}$). See Wooldridge (2010) for a discussion on these conditions.
It can be easily checked that the two approaches give rise to the same (RE) estimator for the parameters, α and β.
16 The emphasis is added by us. The notation used by Hausman for the fixed-effects estimate of β ($\hat{\beta}_{FE}$) corresponds to our $\hat{\beta}_{W}$.
17 The intra-class coefficient drives the value of $\psi^{2}$.
18 Note that this estimator is also used to compute $\psi^{2}$ and in turn $\Omega^{*}_{C}$, which makes it fully, logically consistent with respect to the FGLS regression model framework.
19 This is, e.g., recommended by Cameron and Trivedi (2009), p. 360.
20 We thank both referees for having highlighted those approaches and suggested to account for them in this discussion subsection.
21 In this case, indeed, $\hat{\gamma} = \hat{q}^{**}$.
22 Baltagi and Liu (2007) show that the Hausman test can be obtained equivalently from other artificial regressions, involving the use of the set of Between-transformed regressor variables, $X_{B}$, or even the set of the initial regressor variables, $X$. They also discuss the case where the auxiliary regression can accommodate the presence of potentially endogenous regressors. With respect to the issue of weak instruments in this context, see also Staiger and Stock (1997).
$HM^{1}_{O}$ and $HM^{2}_{O}$ for each combination of parameters. The values of $HM^{1}_{O}$ and $HM^{2}_{O}$ depend significantly on $T$, $\theta_{W}$, and high absolute values of $\rho_{xu}$ and $\rho_{u}$. Conversely, the scale parameters $s^{2}_{x}$ and $\sigma^{2}_{u}$ do not have a significant impact on $HM^{1}_{O}$ and $HM^{2}_{O}$. The visual representation of the behavior of (the median) $HM^{1}_{O}$ and $HM^{2}_{O}$ for $N = 80$, $T = 80$, $s^{2}_{x} = \sigma^{2}_{u} = 1$ and $\rho_{u} = 0.5$ is in Figure 1. This illustrates the strong dependence of the values of the Hausman statistics on $\theta_{W}$ on the one hand and on the correlation between $x$ and $u$ on the other hand. In particular, for high absolute values of $\rho_{xu}$ and high values of $\theta_{W}$, $HM^{2}_{O}$ even yields negative values. This latter result is, for example, consistent with the regression outcomes obtained with log[fuel price] as a single covariate in the framework of motivating Example 2 [Airline].

Figure 1. Distribution of the median of $HM^{1}_{O}$ and $HM^{2}_{O}$, $N = 80$, $T = 80$, $s^{2}_{x} = \sigma^{2}_{u} = 1$ and $\rho_{u} = 0.5$.

Finally, looking more closely at the distribution of negative values for $HM^{2}_{O}$ (Figure A1 in Appendix C), we clearly see that they are an increasing function of $\rho_{u}$, $\theta_{W}$ and $\rho_{xu}$ (correlation between $x$ and $u$) in absolute value. While the empirical size of both tests remains around 5%, with little variation across the values of the parameters, the power of $HM^{2}_{O}$ consequently drops to 0 for the highest values of $\rho_{u}$, $\theta_{W}$ and $\rho_{xu}$ (Figure A2 in Appendix C).

10 A typical it-observation for $y^{*}$ is given by $y^{*}_{it} = y_{it} - \theta \cdot \bar{y}_{i.}$, where $\theta \equiv 1 - \psi$ and $\bar{y}_{i.}$ is the mean of $y_{it}$ over time for individual $i$. A similar transformation is applied to each of the components of $X$, hence the quasi-demeaning expression for the transformed model that is obtained in that way.
11 Indeed, $\hat{\beta}_{OLS} = (X^{**\prime} \cdot (I_{NT} - B_{NT}) \cdot X^{**})^{-1} \cdot (X^{**\prime} \cdot (I_{NT} - B_{NT}) \cdot y^{**})$, which is equivalent to (4) given the definition of $X^{**}$, $y^{**}$ and $\Omega^{*-1}_{C}$.

14 The estimator for $\sigma^{2}_{\alpha\varepsilon}$ is usually computed from the sum of squares of the OLS regression residuals, $(\hat{u}_{B}' \cdot \hat{u}_{B})$, for the Between (transformed) regression model, with $\hat{\sigma}^{2}_{\alpha\varepsilon} = \hat{u}_{B}' \cdot \hat{u}_{B} / (N - (K+1))$ (Swamy-Arora approach); see infra.
15

23 Take two symmetric matrices $A$ and $B$. We denote by $A \succ B$ the property according to which $A - B$ is a positive definite matrix (which we can also write as $A - B \succ 0$).
24 If $A$ and $B$ are two symmetric positive definite matrices and are non-singular, then $A \succ B \implies B^{-1} \succ A^{-1}$.
25 As a reminder, note that ( X * • X
Hausman test statistic based on the second measure, $HM^{2}_{O}$. Consequently, it is important to analyze how $HM^{2}_{O}$ behaves in relation to $HM^{1}_{O}$ in finite sample settings. For that purpose, define the ratio $h$ as $h \equiv \hat{\sigma}^{*2}_{\varepsilon} / \hat{\sigma}^{w2}_{\varepsilon}$. $HM^{2}_{O}$ can then be rewritten in terms of $h$. Table 5 provides an overview of all possible cases. A wide spectrum of outcomes can be obtained for the value of $HM^{2}_{O}$ depending on the value of $h$. Unreliable results for the test may arise, notably when $HM^{2}_{O}$ is negative.

Table 5. Review of all cases.
The procedure for the Hausman test then corresponds to $HM^{1}_{O}$.

• MATLAB provides estimation methods for the standard fixed (Within), between, and random effects models with the panel data toolbox. Panel data models are estimated using the panel(·) function with the options argument set to re for the random effects model specification. The random effects FGLS estimates are based on the quasi-demeaned model, and the asymptotic variance-covariance matrix for statistical inference is accordingly provided by $\widehat{\mathrm{var}}_{2}[\hat{\beta}_{FGLS}]$ (see Equation (18) in Alvarez et al. (2017)). Then, hausmantest computes the Hausman test; its input requires the output structures of the two estimations to be compared. Accordingly, the statistic that is computed corresponds to $HM^{2}_{O}$.

Table A1. Review of all cases.
h < h * min ( Γ is SPD and HM

Table A2. ANCOVA on simulation results.
Schreiber (2008), based on Holly (1982), examines cases in which this problem can arise even asymptotically, when the alternative hypothesis is true. 2