Consistency in Estimation and Model Selection of Dynamic Panel Data Models with Fixed Effects

Li, Guangjie

doi:10.3390/econometrics3030494


Cardiff Business School, Cardiff University, Aberconway Building, Colum Drive, Cardiff CF10 3EU, UK

Econometrics 2015, 3(3), 494-524; https://doi.org/10.3390/econometrics3030494

Submission received: 27 February 2015 / Accepted: 16 June 2015 / Published: 10 July 2015


Abstract

We examine the relationship between consistent parameter estimation and model selection for autoregressive panel data models with fixed effects. We find that the transformation of fixed effects proposed by Lancaster (2002) does not necessarily lead to consistent estimation of the common parameters when some true exogenous regressors are excluded. We propose a data-dependent way to specify the prior of the autoregressive coefficient and argue for comparing different model specifications before parameter estimation. The model selection properties of Bayes factors and the Bayesian information criterion (BIC) are investigated. When model uncertainty is substantial, we recommend the use of Bayesian model averaging to obtain point estimators with lower root mean squared errors (RMSE). We also study the implications of different levels of inclusion probabilities through simulations.

Keywords:

dynamic panel data model with fixed effects; incidental parameter problem; consistency in estimation; model selection; Bayesian model averaging

JEL classifications:

C52, C11, C13, C15

1. Introduction

For a panel linear regression model with lags of the dependent variable as regressors (a dynamic panel model) and agent-specific fixed effects, the maximum likelihood estimators (MLE) of the common parameters, whose number does not change with the sample size, are inconsistent when the number of time periods is small and fixed; see Nerlove [1] and Nickell [2]. This problem, known as the “incidental parameter problem”, has been reviewed by Lancaster [3]. A plethora of studies have sought consistent estimators for the common parameters in dynamic panel models. Among them there are two main approaches: one uses the generalized method of moments (GMM), see the overview in Hsiao [4]; the other is based on a modified profile or integrated likelihood, see e.g., the recent works by Bester and Hansen [5], Hahn and Kuersteiner [6], Arellano and Bonhomme [7], and Dhaene and Jochmans [8]. Researchers using these two approaches usually presume that the moment conditions or the parametric models are correctly specified, and the issue of model selection has attracted relatively little attention. Correct model specification is very important, since without it consistent parameter estimation cannot be achieved. Andrews and Lu [9] proposed model and moment selection criteria (MMSC) in the GMM context, based on J-test statistics, to address this issue. However, for dynamic panel models, GMM suffers from the weak instrument problem when the coefficient on the lagged dependent variable is close to 1. Hence MMSC is unlikely to work in such situations.

Lee and Phillips [10] used the bias-reducing prior of Arellano and Bonhomme [7] to develop an integrated likelihood information criterion for lag order selection in dynamic panel models. The prior in Arellano and Bonhomme [7] is designed to obtain first-order (in the time dimension) unbiased estimators.1 Lancaster [11] suggested a way to reparameterize the fixed effects that achieves consistent estimation (not just first-order bias reduction) in the panel. While Lee and Phillips [10] only considered stationary data in their application, we show that Lancaster's method can also handle non-stationary data. Unlike Lee and Phillips [10], our paper focuses on the selection of exogenous regressors rather than lag order selection. For the purpose of model comparison, proper priors must be used for parameters not common to all the models to avoid Bartlett's paradox when Bayes factors are used (see e.g., [12]). Dhaene and Jochmans [8] found that the modified profile likelihood with Lancaster's correction term can be infinite when the parameter support is unbounded, which implies that, in a Bayesian context, a prior ensuring a proper posterior distribution should be used. We develop a data-dependent proper prior to combine with Lancaster's reparameterization to calculate Bayes factors, and find that model selection based on Bayes factors is inconsistent only in very extreme situations, such as when the number of time periods is 2 or when the true value of the lag coefficient is less than $-1$. On the other hand, model selection based on the Bayesian information criterion (BIC) with the parameters evaluated at the biased MLE can be inconsistent under more common circumstances. From an empirical point of view, researchers are often confronted with a large number of possible regressors and hence many possible models. Model uncertainty leads to estimation risk, especially in small samples, since the estimates from a misspecified model could be far from the true parameter values and hence misleading. In our simulations, we find that Bayesian model averaging (BMA) can reduce such risk and produce point estimators with lower root mean squared errors (RMSE).

The plan of the paper is as follows. Section 2 summarizes the model and the posterior results, and discusses the estimation strategies. Section 3 gives our motivations for comparing different model specifications and shows the conditions under which our estimator is consistent when the model is misspecified. Section 4 presents the conditions under which Bayes factors and BIC are consistent in model selection. In Section 5, we carry out simulation studies to verify our claims, and Section 6 concludes.

2. The Model and the Estimation

Here we investigate the first-order autoregressive linear panel model with a fixed effect $f_i$,

$$y_{i,t} = f_i + y_{i,t-1} ρ + x_{i,t}' β + u_{i,t}, \qquad i = 1, \dots, N, \; t = 1, \dots, T, \tag{1}$$

where ρ is a scalar and $x_{i,t}$ is a $k \times 1$ vector of explanatory variables. Denote $u_i = (u_{i,1}, u_{i,2}, \dots, u_{i,T})'$, $X_i = (x_{i,1}, x_{i,2}, \dots, x_{i,T})'$ and $y_i = (y_{i,1}, y_{i,2}, \dots, y_{i,T})'$. We can rewrite Equation (1) in the vector form

$$y_i = f_i ι + y_{i-} ρ + X_i β + u_i, \tag{2}$$

where ι is a $T \times 1$ vector of ones. By repeated substitution, we can obtain

$$y_{i-} = f_i ζ_1 + y_{i,0} ζ_2 + C X_i β + C u_i, \tag{3}$$

where $y_{i-} = (y_{i,0}, y_{i,1}, \dots, y_{i,T-1})'$ and

$$ζ_1 = \begin{pmatrix} 0 \\ 1 \\ 1 + ρ \\ \vdots \\ 1 + ρ + ρ^2 + \dots + ρ^{T-2} \end{pmatrix}, \quad ζ_2 = \begin{pmatrix} 1 \\ ρ \\ ρ^2 \\ \vdots \\ ρ^{T-1} \end{pmatrix}, \quad C = \begin{pmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ ρ & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ ρ^{T-2} & ρ^{T-3} & \cdots & 1 & 0 \end{pmatrix}.$$
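Equation (3) can be checked numerically: iterating Equation (1) forward from $y_{i,0}$ should reproduce $f_i ζ_1 + y_{i,0} ζ_2 + C(X_i β + u_i)$ up to rounding. The pure-Python sketch below does this for a single unit; the function names are our own, and $X_i β$ is passed as a precomputed length-T list `xb` (an assumption made to avoid matrix libraries). Because the substitution is purely algebraic, no stationarity restriction is needed, so the check also works for ρ = 1.

```python
import random

# Illustrative sketch; helper names are hypothetical.
def lagged_by_recursion(f, y0, xb, u, rho):
    """y_{i-} = (y_{i,0}, ..., y_{i,T-1}) obtained by iterating Equation (1) forward."""
    lagged, y_prev = [], y0
    for t in range(len(u)):
        lagged.append(y_prev)
        y_prev = f + rho * y_prev + xb[t] + u[t]
    return lagged

def lagged_by_formula(f, y0, xb, u, rho):
    """Right-hand side of Equation (3): f*zeta1 + y0*zeta2 + C (X_i beta + u_i)."""
    T = len(u)
    zeta1 = [sum(rho ** j for j in range(t)) for t in range(T)]  # 0, 1, 1+rho, ...
    zeta2 = [rho ** t for t in range(T)]                          # 1, rho, rho^2, ...
    # Row t of C holds rho^(t-s-1) in column s < t and zeros elsewhere
    C = [[rho ** (t - s - 1) if s < t else 0.0 for s in range(T)] for t in range(T)]
    shock = [xb[s] + u[s] for s in range(T)]
    return [f * zeta1[t] + y0 * zeta2[t] + sum(C[t][s] * shock[s] for s in range(T))
            for t in range(T)]

rng = random.Random(0)
T, rho = 5, 1.0                                # unit root: no stationarity needed
f, y0 = rng.uniform(-4, 4), rng.uniform(-4, 4)
xb = [rng.gauss(0, 1) for _ in range(T)]       # stands in for X_i beta
u = [rng.gauss(0, 1) for _ in range(T)]
gap = max(abs(p - q) for p, q in zip(lagged_by_recursion(f, y0, xb, u, rho),
                                     lagged_by_formula(f, y0, xb, u, rho)))
print(gap)  # floating-point rounding only
```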

The following are the assumptions we use throughout the paper.

Assumption 1. $u_i \mid X_i, f_i, y_{i,0}, σ^2, ρ, β \sim \text{i.i.d.}\,(0, σ^2 I_T)$, where $I_T$ is the identity matrix of dimension $T \geq 2$.

Assumption 2. (a): $\{(X_i, f_i, y_{i,0})\}$ is a cross-sectionally independent sequence;

(b): $E(|y_{i,0}^2|^{1+δ}) < ∞$, $E(|f_i^2|^{1+δ}) < ∞$ and $E(|x_{i,t,h}^2|^{1+δ}) < ∞$ for some $δ > 0$ and all $i = 1, 2, \dots, N$, $t = 1, 2, \dots, T$ and $h = 1, 2, \dots, k$, where $x_{i,t,h}$ denotes the $h$th element of $x_{i,t}$;
(c): k and T are finite;
(d): $E\big(\frac{\sum_{i=1}^{N} X_i' H X_i}{N}\big)$ is finite and uniformly positive definite (see [13], p. 22), where $H = I_T - \frac{ι ι'}{T}$;
(e): For any finite value of ρ, the following expression is uniformly positive, i.e., for sufficiently large N,

$$\frac{1}{N} \sum_{i=1}^{N} \Big\{ E\big[(y_{i-} - C u_i)' H (y_{i-} - C u_i)\big] - E\big[(y_{i-} - C u_i)' H X_i\big]\, E\Big(\sum_{i=1}^{N} X_i' H X_i\Big)^{-1} E\Big[\sum_{i=1}^{N} X_i' H (y_{i-} - C u_i)\Big] \Big\} > 0. \tag{4}$$

Assumption 1 implies that $X_i$, $f_i$ and $y_{i,0}$ are strictly exogenous. In comparison with the i.i.d. regularity conditions in Lancaster [11], Assumption 2 (a)–(d) allow the distributions of $X_i$, $f_i$ and $y_{i,0}$ to be heterogeneous across cross-sectional units, with slightly stronger conditions on their moments so that the asymptotic results in the paper hold. Assumption 2 (e) is used to simplify the proofs of Proposition 4 and Lemma 10 in Appendix D. Its purpose is twofold: to prevent the (within-group) regression of $(f_i ζ_1 + y_{i,0} ζ_2 + C X_i β)$ on the fixed effects and $X_i$ from having a perfect fit asymptotically (i.e., an R-squared that tends to 1 as N increases), and to ensure that the true value of ρ is asymptotically a local mode of its marginal posterior (discussed later). When $β = 0$ (no exogenous regressors in the model), Assumption 2 (e) rules out $f_i = 0$. When $T \geq 3$, if Assumption 2 (e) is satisfied and $β \neq 0$, then, as shown in Appendix D (Equation (53) and its discussion), the following probability limit is also strictly positive,

$$\operatorname*{plim}_{N \to ∞} \frac{1}{N} \Big[\sum_{i=1}^{N} β' X_i' C' M_ζ C X_i β - \sum_{i=1}^{N} β' X_i' C' M_ζ X_i \Big(\sum_{i=1}^{N} X_i' M_ζ X_i\Big)^{-1} \sum_{i=1}^{N} X_i' M_ζ C X_i β\Big] > 0, \tag{5}$$

where $M_ζ = I_T - ζ(ζ'ζ)^{-1}ζ'$ and $ζ = (ζ_1, ζ_2)$. In practice, one could check Assumption 2 (e) by calculating the expression inside the plim in Equation (5), with ρ and β replaced by their consistent estimates. If the value of the expression decreases towards 0 as N grows,2 there would be cause for concern about Assumption 2 (e). We expect such cases to be very rare with real data.

The MLEs of the common parameters ρ, β and $σ^2$ are not consistent because of the presence of the incidental parameters $f_i$, whose number increases with N. For fixed T, the MLE of $f_i$ cannot be consistent. When the predetermined regressor $y_{i,t-1}$ is included, the MLE of ρ is correlated with that of $f_i$ and is therefore also inconsistent. To obtain consistent estimators of the common parameters, Lancaster [11] suggested reparameterizing the fixed effect as

$$f_i = g_i \exp[-ϕ(ρ)] - \frac{1}{T} ι' X_i β, \tag{6}$$

where $g_i$ is the new fixed effect, ι is a vector of ones and ϕ(ρ) is defined as

$$ϕ(ρ) = \frac{1}{T} \sum_{t=1}^{T-1} \frac{T-t}{t} ρ^t. \tag{7}$$
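Equation (7), together with its derivative h(ρ) (the series in Equation (22) of Section 3), can be transcribed directly. The sketch below is illustrative: the names `phi` and `h` are our own, `fractions.Fraction` is used so that small cases can be checked exactly, and a central difference confirms that h is the derivative of ϕ. The last line illustrates, on a grid, the fact used in Section 4 that h(ρ) is positive when T is even.

```python
from fractions import Fraction

# Illustrative sketch; function names are hypothetical.
def phi(rho, T):
    """Lancaster's correction term, Equation (7)."""
    return sum(Fraction(T - t, t) * rho ** t for t in range(1, T)) / T

def h(rho, T):
    """h(rho) = d phi / d rho, the series in Equation (22)."""
    return sum(Fraction(T - t, T) * rho ** (t - 1) for t in range(1, T))

print(phi(Fraction(1, 2), 3))   # 3/8
print(phi(0, 4))                # 0, i.e. phi(0) = 0
r, eps = 0.7, 1e-6
print(abs((phi(r + eps, 4) - phi(r - eps, 4)) / (2 * eps) - h(r, 4)) < 1e-6)  # True
print(all(h(x / 10, 4) > 0 for x in range(-50, 51)))  # True: h > 0 when T = 4
```

Passing a float ρ returns a float, so the same functions can be used inside numerical optimizers.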

We use the following prior for ρ, β, $σ^2$ and $g = (g_1, g_2, \dots, g_N)'$:

$$p(g, σ^2, ρ, β) = p(g_1) \cdots p(g_N)\, p(σ^2)\, p(ρ)\, p(β \mid σ^2) \propto \frac{1}{σ^2}\, I(ρ_L \leq ρ \leq ρ_U)\, p(β \mid σ^2, X). \tag{8}$$

In other words, a flat prior for g and Jeffreys' prior for $σ^2$ are used, and ρ is uniformly distributed over $[ρ_L, ρ_U]$. The specification of $ρ_L$ and $ρ_U$ is discussed in Proposition 4 below. The prior $p(β \mid σ^2)$ takes the form of the g-prior in [14]:

$$β \mid σ^2, X \sim N\Big(0,\; σ^2 \Big(η \sum_{i=1}^{N} X_i' H X_i\Big)^{-1}\Big), \tag{9}$$

where $X = (X_1, X_2, \dots, X_N)$. The strength of the prior depends on the value of η: the smaller η is, the less informative the prior. As discussed in Section 4, to ensure model selection consistency we can choose $0 < η(N) = O(N^{α})$ for $α < 0$. For our simulation studies below, we choose $η = \frac{1}{NT}$. The posterior results of the model are summarized below.

Proposition 3. The posterior distributions of the parameters in our model take the following forms:

$$g_i \mid y_i, y_{i,0}, σ^2, ρ \sim \text{i.i.d.}\; N\Big(e^{ϕ(ρ)} \frac{ι'(y_i - y_{i-} ρ)}{T},\; \frac{σ^2}{T} \exp[2ϕ(ρ)]\Big), \tag{10}$$

$$β \mid σ^2, ρ, Y, Y_0, X \sim N\Big(\frac{\big(\sum_{i=1}^{N} X_i' H X_i\big)^{-1} \sum_{i=1}^{N} X_i' H (y_i - y_{i-} ρ)}{η + 1},\; \frac{σ^2 \big(\sum_{i=1}^{N} X_i' H X_i\big)^{-1}}{η + 1}\Big), \tag{11}$$

$$σ^2 \mid ρ, Y, Y_0, X \sim IG\big(N(T-1),\; aρ^2 - 2bρ + c\big), \tag{12}$$

$$p(ρ \mid Y, Y_0, X) \propto I(ρ_L < ρ < ρ_U) \exp(N ψ(ρ)), \tag{13}$$

where $Y_0 = (y_{1,0}, y_{2,0}, \dots, y_{N,0})$, $Y = (y_1, y_2, \dots, y_N)$ and

$$ψ(ρ) = ϕ(ρ) - \frac{T-1}{2} \ln\Big(\frac{a}{N} ρ^2 - 2\frac{b}{N} ρ + \frac{c}{N}\Big), \tag{14}$$

$$a = \sum_{i=1}^{N} y_{i-}' H y_{i-} - \frac{1}{η+1} \sum_{i=1}^{N} y_{i-}' H X_i \Big(\sum_{i=1}^{N} X_i' H X_i\Big)^{-1} \sum_{i=1}^{N} X_i' H y_{i-}, \tag{15}$$

$$b = \sum_{i=1}^{N} y_{i-}' H y_i - \frac{1}{η+1} \sum_{i=1}^{N} y_{i-}' H X_i \Big(\sum_{i=1}^{N} X_i' H X_i\Big)^{-1} \sum_{i=1}^{N} X_i' H y_i, \tag{16}$$

$$c = \sum_{i=1}^{N} y_i' H y_i - \frac{1}{η+1} \sum_{i=1}^{N} y_i' H X_i \Big(\sum_{i=1}^{N} X_i' H X_i\Big)^{-1} \sum_{i=1}^{N} X_i' H y_i. \tag{17}$$

Here $IG(\cdot)$ denotes the inverted gamma distribution with $N(T-1)$ degrees of freedom and mean $\frac{aρ^2 - 2bρ + c}{N(T-1) - 2}$.

Note that a and c in Equations (15) and (17) are close to the sums of squared residuals (SSR) obtained by regressing, respectively, $y_{i-}$ and $y_i$ on the fixed effects and $X_i$.3
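With a single exogenous regressor (k = 1), the matrix inverses in Equations (11) and (15) to (17) reduce to scalar divisions, and applying H amounts to demeaning each unit's series over time. The pure-Python sketch below relies on that k = 1 simplification, which is our own, and its helper names are hypothetical. It computes a, b and c, the log kernel ψ(ρ) of Equation (14) and the conditional posterior mean of β in Equation (11). In the tiny example with η = 0, the quantity c − b²/a is the SSR of the within regression of $y_i$ on $y_{i-}$ and $X_i$. It is zero here because three observations leave no residual degrees of freedom once a unit mean and two slopes are fitted.

```python
import math

# Illustrative sketch for k = 1; helper names are hypothetical.
def within(v):
    """Apply H = I_T - iota iota'/T, i.e. subtract the unit's time mean."""
    m = sum(v) / len(v)
    return [e - m for e in v]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def abc_one_regressor(Y, Ylag, X, eta):
    """a, b, c of Equations (15)-(17) when X_i is a single column.

    Y, Ylag and X are lists of N length-T lists holding y_i, y_{i-} and X_i.
    Also returns the cross products needed for Equation (11).
    """
    yt = [within(v) for v in Y]
    lt = [within(v) for v in Ylag]
    xt = [within(v) for v in X]
    sxx = sum(dot(x, x) for x in xt)
    slx = sum(dot(l, x) for l, x in zip(lt, xt))
    syx = sum(dot(y, x) for y, x in zip(yt, xt))
    shrink = 1.0 / (eta + 1.0)
    a = sum(dot(l, l) for l in lt) - shrink * slx * slx / sxx
    b = sum(dot(l, y) for l, y in zip(lt, yt)) - shrink * slx * syx / sxx
    c = sum(dot(y, y) for y in yt) - shrink * syx * syx / sxx
    return a, b, c, sxx, slx, syx

def psi(rho, a, b, c, N, T):
    """Log kernel of the marginal posterior of rho, Equation (14)."""
    phi = sum((T - t) / t * rho ** t for t in range(1, T)) / T
    return phi - (T - 1) / 2 * math.log(a / N * rho ** 2 - 2 * b / N * rho + c / N)

def beta_post_mean(rho, sxx, slx, syx, eta):
    """Conditional posterior mean of beta, Equation (11), for k = 1."""
    return (syx - slx * rho) / sxx / (eta + 1.0)

a, b, c, sxx, slx, syx = abc_one_regressor([[1, 2, 4]], [[0, 1, 2]], [[1, 0, 0]], eta=0.0)
print(round(a, 10), round(b, 10), round(c, 10))  # 0.5 1.0 2.0
print(abs(c - b * b / a) < 1e-12)                 # True: perfect within fit here
```

Setting η > 0 shrinks the regression adjustment by 1/(η + 1), which is how the g-prior of Equation (9) enters a, b and c.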

The term ϕ(ρ) in Equation (14) comes from Lancaster's reparameterization; it corrects the local mode of the marginal posterior of ρ so that it is consistent. Dhaene and Jochmans [8] showed that the modified profile likelihood function with ϕ(ρ) can be infinite as $ρ \to ∞$. Analogously to their results, we show in Appendix D that the marginal posterior of ρ becomes infinite, and is hence improper, as $ρ \to ∞$ or, when T is odd, as $ρ \to -∞$. Lancaster [11] noted such behaviour of the marginal posterior of ρ in simulations, but did not discuss in detail how to specify the boundary points. In Proposition 4, we provide a data-dependent way to specify $ρ_L$ and $ρ_U$, which is necessary for model comparison to avoid Bartlett's paradox. First note that the probability limit of ψ(ρ) is

$$\underline{ψ}(ρ) = \operatorname*{plim}_{N \to ∞} ψ(ρ) = ϕ(ρ) - \frac{T-1}{2} \ln d(ρ), \tag{18}$$

where

$$d(ρ) = \operatorname*{plim}_{N \to ∞} \Big(\frac{a}{N} ρ^2 - 2\frac{b}{N} ρ + \frac{c}{N}\Big). \tag{19}$$

Proposition 4. If $X_i$ is the true set of exogenous regressors used to generate $y_i$, then under Assumptions 1 and 2 the marginal posterior of ρ in Equation (13) asymptotically has more than one stationary point satisfying $\underline{ψ}'(ρ) = 0$: three stationary points when T is odd and two when T is even, regardless of the true value of ρ. The local posterior mode, which is a consistent point estimator, is the stationary point nearest to the MLE; asymptotically it satisfies $\underline{ψ}''(ρ) < 0$, which the other stationary point(s) do not. $ρ_U$ can be specified as the stationary point to the right of the posterior mode. When T is odd, $ρ_L$ can be specified as the stationary point to the left of the posterior mode; when T is even, $ρ_L$ could be chosen as a function of N such that $ρ_L(N) < 0$ is sufficiently small (i.e., large in absolute value).4
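Proposition 4 suggests a simple numerical recipe for the prior bounds. The sketch below is our own illustration, not code from the paper, and its function names are hypothetical. It scans ψ′(ρ) on a fixed grid (the range [−10, 10] is an assumption and must contain the relevant stationary points) and refines each sign change by bisection. The mode is taken as the stationary point nearest the MLE b/a at which ψ′ changes from positive to negative. It then sets ρ_U, and for odd T also ρ_L, to the neighbouring stationary points; for even T the user supplies ρ_L (for example −N, as in Section 5). The example values a = 2, b = 1/6, c = 5/3 with T = 3 are the limits implied by Equations (27) to (30) for a correctly specified model with $\underline{a} = 2$, $σ^2 = 1$ and $\underline{ρ} = 0.5$, so the mode falls on 0.5.

```python
# Illustrative sketch of Proposition 4; function names and grid are assumptions.
def h(rho, T):
    """h(rho) = d phi / d rho, Equation (22)."""
    return sum((T - t) / T * rho ** (t - 1) for t in range(1, T))

def dpsi(rho, a, b, c, T):
    """psi'(rho) from Equation (14); unchanged if a, b, c are all divided by N."""
    return h(rho, T) - (T - 1) * (a * rho - b) / (a * rho ** 2 - 2 * b * rho + c)

def stationary_points(a, b, c, T, lo=-10.0, hi=10.0, n=20001):
    """Roots of psi' on [lo, hi]: grid scan for sign changes, then bisection."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    roots = []
    for x0, x1 in zip(grid, grid[1:]):
        f0, f1 = dpsi(x0, a, b, c, T), dpsi(x1, a, b, c, T)
        if f0 == 0.0:
            roots.append(x0)
        elif f0 * f1 < 0.0:
            left, right, f_left = x0, x1, f0
            for _ in range(100):
                mid = 0.5 * (left + right)
                f_mid = dpsi(mid, a, b, c, T)
                if f_left * f_mid <= 0.0:
                    right = mid
                else:
                    left, f_left = mid, f_mid
            roots.append(0.5 * (left + right))
    return roots

def boundary_points(a, b, c, T, rho_L_fallback):
    """(rho_L, mode, rho_U); rho_L_fallback is used when T is even (e.g. -N).

    Assumes rho_U lies inside the scanned grid.
    """
    roots = stationary_points(a, b, c, T)
    eps = 1e-6
    maxima = [r for r in roots if dpsi(r - eps, a, b, c, T) > 0 > dpsi(r + eps, a, b, c, T)]
    mode = min(maxima, key=lambda r: abs(r - b / a))
    rho_U = min(r for r in roots if r > mode)
    left = [r for r in roots if r < mode]
    rho_L = max(left) if (T % 2 == 1 and left) else rho_L_fallback
    return rho_L, mode, rho_U

print(boundary_points(2.0, 1 / 6, 5 / 3, T=3, rho_L_fallback=-100.0))
# approximately (-3.553, 0.5, 1.2196); the outer roots are (-7 -/+ sqrt(205)) / 6
```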

Choosing the boundary points as in Proposition 4 ensures that the marginal posterior of ρ is proper and that its support asymptotically contains the true value of ρ. Our $ρ_L$ and $ρ_U$ differ from the boundary points in the constrained maximization of Dhaene and Jochmans [8], who only considered parameter estimation. Our interval is wider than theirs, since we want to preserve the bell-shaped part of the posterior density curve for model comparison. Note also that when the true exogenous regressors are included, the local posterior mode exists regardless of the true value of ρ (even if it is 1), owing to Assumption 2 (e). Next we investigate the consequences of failing to include the true regressors.

3. Motivations and Methods to Compare Different Model Specifications

In empirical applications, researchers are often faced with many possible regressors, suggested by different economic theories, for inclusion in Equation (1). Different models are defined by the inclusion of different combinations of the exogenous regressors and by whether or not the lagged dependent variable is present. Proposition 5 below implies that there is no guarantee that the posterior mode in Equation (13) is a consistent estimator if some true regressors are excluded from the model.

Proposition 5. The posterior mode in Equation (13) is a consistent estimator of ρ if and only if either

$$h_2(β, \underline{ρ}) = h_3(β) = 0, \tag{20}$$

or

$$\frac{-(T-1)\, h_2(β, \underline{ρ})}{h_3(β)} = h(\underline{ρ}), \tag{21}$$

where

$$h(ρ) = \sum_{t=1}^{T-1} \frac{T-t}{T} ρ^{t-1} = \frac{dϕ(ρ)}{dρ} = \frac{1}{T} ι' ζ_1 = -\operatorname{trace}(C' H), \tag{22}$$

$$\begin{aligned} h_2(β, \underline{ρ}) &= \operatorname*{plim}_{N \to ∞} \frac{1}{N} \Big[\sum_{i=1}^{N} y_{i-}' H \underline{X}_i β - \sum_{i=1}^{N} y_{i-}' H X_i \Big(\sum_{i=1}^{N} X_i' H X_i\Big)^{-1} \sum_{i=1}^{N} X_i' H \underline{X}_i β\Big], \\ h_3(β) &= \operatorname*{plim}_{N \to ∞} \frac{1}{N} \Big[\sum_{i=1}^{N} β' \underline{X}_i' H \underline{X}_i β - \sum_{i=1}^{N} β' \underline{X}_i' H X_i \Big(\sum_{i=1}^{N} X_i' H X_i\Big)^{-1} \sum_{i=1}^{N} X_i' H \underline{X}_i β\Big]. \end{aligned} \tag{23}$$

Here $\underline{X}$ represents the regressors in the true model and X denotes the regressors we actually include in our candidate model, while $\underline{ρ}$ is the true value of ρ.
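The four expressions for h(ρ) in Equation (22) can be compared numerically. The pure-Python sketch below uses helper names of our own. For a few values of ρ and T it evaluates the series, a central difference of ϕ, ι′ζ₁/T and −trace(C′H), where trace(C′H) is computed as the sum of the elementwise products of C and H.

```python
# Illustrative check of Equation (22); helper names are hypothetical.
def phi(rho, T):
    return sum((T - t) / t * rho ** t for t in range(1, T)) / T

def h_series(rho, T):
    return sum((T - t) / T * rho ** (t - 1) for t in range(1, T))

def h_forms(rho, T, eps=1e-6):
    """Return the four expressions in Equation (22) for the same (rho, T)."""
    zeta1 = [sum(rho ** j for j in range(t)) for t in range(T)]
    C = [[rho ** (t - s - 1) if s < t else 0.0 for s in range(T)] for t in range(T)]
    H = [[(1.0 if t == s else 0.0) - 1.0 / T for s in range(T)] for t in range(T)]
    trace_CtH = sum(C[t][s] * H[t][s] for t in range(T) for s in range(T))
    return (h_series(rho, T),
            (phi(rho + eps, T) - phi(rho - eps, T)) / (2 * eps),
            sum(zeta1) / T,
            -trace_CtH)

for T in (2, 3, 4, 7):
    for rho in (-1.5, 0.0, 0.5, 1.0):
        forms = h_forms(rho, T)
        assert max(forms) - min(forms) < 1e-6, (T, rho, forms)
print("Equation (22) identities hold on the test grid")
```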

The values of $h_2(β, \underline{ρ})$ and $h_3(β)$ depend, apart from the values of β and $\underline{ρ}$, on how the true regressors and the included regressors are related. For $h_2(β, \underline{ρ}) = h_3(β) = 0$ to hold, it suffices that the true regressors $\underline{X}$ are a subset of X.5 When some true regressors are excluded, the model suffers from omitted variable bias unless Equation (21) holds. Given Assumptions 1 and 2, one example in which Equation (21) holds is the following: all the true regressors are covariance stationary and serially uncorrelated; the included regressors are uncorrelated with the true regressors; and $ζ_1' H \lim_{N \to ∞} \frac{1}{N} \sum_{i=1}^{N} E(f_i \underline{X}_i) β = ζ_2' H \lim_{N \to ∞} \frac{1}{N} \sum_{i=1}^{N} E(y_{i,0} \underline{X}_i) β = 0$. In this restrictive case, it is possible to estimate ρ consistently without including any of the true regressors.6

To avoid inconsistent estimation due to model misspecification, one could include all the potential regressors in the model. In finite samples, however, this could inflate the posterior variances of the coefficients on the true regressors if too many irrelevant regressors are included. The simulation studies in Section 5 reveal that while including all the regressors does not affect the estimation of ρ relative to other consistent approaches, it leads to substantially higher RMSE when estimating β. Hence appropriate procedures for model selection are desirable. In a Bayesian framework, one can evaluate different model specifications, denoted by $M_j$ below, by their posterior model probabilities, which can be calculated as

$$p(M_j \mid Y, Y_0, X_j) = \frac{p(M_j)\, p(Y \mid Y_0, X_j, M_j)}{p(Y \mid Y_0)} = \frac{p(M_j)\, p(Y \mid Y_0, X_j, M_j)}{\sum_{i=1}^{2^{K+1}} p(M_i)\, p(Y \mid Y_0, X_i, M_i)}, \tag{24}$$

where $X_j$ denotes the regressors included under $M_j$ and $p(Y \mid Y_0, X_j, M_j)$ is the marginal likelihood, obtained by integrating out ρ in Equation (13) or (43) in Appendix C. K is the number of potential exogenous regressors, so the total number of models is $2^{K+1}$.

Here $p(M_j)$ is the prior probability of model j. For finite samples, Ley and Steel [15] showed that the choice of prior model probabilities can affect the posterior results to a large extent when the number of potential regressors is large compared with the sample size. In what follows, we focus on the asymptotic behaviour of posterior model probabilities and assume that all the models are equally probable a priori. The posterior model probability $p(M_j \mid Y, Y_0, X_j)$ hence depends only on $p(Y \mid Y_0, X_j, M_j)$.
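Equation (24) is best evaluated on the log scale, because the marginal likelihoods of competing models can differ by many orders of magnitude. The sketch below uses helper names of our own. It enumerates the $2^{K+1}$ specifications and turns log marginal likelihoods into posterior model probabilities, with equal prior probabilities by default. It also forms a Bayesian model averaging (BMA) point estimate as the probability-weighted average of model-specific estimates. Treating a coefficient as 0 in models that exclude its regressor is a common BMA convention that we assume here. A model with no local posterior mode can be entered with a log marginal likelihood of −∞, which gives it probability 0, as the text in Section 4 prescribes. The log marginal likelihood values in the usage lines are made up for illustration.

```python
import math
from itertools import product

# Illustrative sketch; function names are hypothetical.
def enumerate_models(K):
    """All 2^(K+1) specifications: (include_lag, tuple of included regressor indices)."""
    return [(bool(flags[0]), tuple(k for k in range(K) if flags[k + 1]))
            for flags in product((0, 1), repeat=K + 1)]

def posterior_model_probs(log_ml, log_prior=None):
    """Equation (24) in log space; at least one log_ml entry must be finite.

    log_ml: log p(Y | Y0, X_j, M_j) for each model (-math.inf gives probability 0).
    log_prior: log p(M_j); equal prior probabilities if omitted.
    """
    if log_prior is None:
        log_prior = [0.0] * len(log_ml)
    scores = [lm + lp for lm, lp in zip(log_ml, log_prior)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def bma_estimate(estimates, probs):
    """Probability-weighted average of model-specific point estimates (0 if excluded)."""
    return sum(e * p for e, p in zip(estimates, probs))

models = enumerate_models(K=8)
print(len(models))  # 512, i.e. 2^(8+1)
# Hypothetical log marginal likelihoods for three models; the third has no local mode
probs = posterior_model_probs([-1203.4, -1201.1, -math.inf])
print([round(p, 3) for p in probs])  # [0.091, 0.909, 0.0]
```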

4. Consistency in Model Selection

In this section, we discuss the situations in which the posterior model probability of the true model tends to 1 as N tends to infinity. We also analyze whether the Bayesian information criterion (BIC) based on the biased MLE is consistent in model selection.

For static panel models, where the true value of ρ is zero and the lagged dependent variable is not included as a regressor, the analysis of Bayes factors is similar to that of Fernandez et al. [16]. In our context, we can ensure model selection consistency by setting η as a function of N with $0 < η(N) = O(N^{α})$ for $α < 0$. BIC, for its part, is consistent in model selection for static panel models.

Let us now consider the case where our candidate model ($M_1$) contains $y_{i-}$ and $X_{i1}$. $M_1$ is compared either with $M_0$, which has $X_{i0}$ but not $y_{i-}$, or with $M_2$, which has $X_{i2}$ and $y_{i-}$. Here $X_{ij}$ denotes the exogenous regressors included under model $M_j$ for $j = 0, 1, 2$, which satisfy Assumptions 1 and 2. $X_{i1}$ can be the same as or different from $X_{i0}$, while $X_{i2}$ is different from $X_{i1}$. The Bayes factors are, respectively,

$$\frac{p(Y \mid Y_0, X_1, M_1)}{p(Y \mid Y_0, X_0, M_0)} = \Big(\frac{η}{η+1}\Big)^{\frac{k_1 - k_0}{2}} \frac{\int_{ρ_{L_1}}^{ρ_{U_1}} \exp[N ψ(ρ \mid M_1)]\, dρ}{(ρ_{U_1} - ρ_{L_1}) \big(\frac{c_{|M_0}}{N}\big)^{-\frac{N(T-1)}{2}}}, \tag{25}$$

$$\frac{p(Y \mid Y_0, X_1, M_1)}{p(Y \mid Y_0, X_2, M_2)} = \frac{ρ_{U_2} - ρ_{L_2}}{ρ_{U_1} - ρ_{L_1}} \Big(\frac{η}{η+1}\Big)^{\frac{k_1 - k_2}{2}} \frac{\int_{ρ_{L_1}}^{ρ_{U_1}} \exp[N ψ(ρ \mid M_1)]\, dρ}{\int_{ρ_{L_2}}^{ρ_{U_2}} \exp[N ψ(ρ \mid M_2)]\, dρ}, \tag{26}$$
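Because exp[Nψ(ρ)] overflows or underflows quickly as N grows, it is safer to evaluate Equations (25) and (26) on the log scale. The sketch below uses helper names of our own, and its trapezoid rule on a fixed grid is an arbitrary choice of quadrature. It computes the log Bayes factors from a, b, c, the integration bounds of Proposition 4 and the model dimensions $k_j$. For Equation (25), only $c_{|M_0}$ is needed from the model without the lag.

```python
import math

# Illustrative sketch; function names and the quadrature rule are assumptions.
def psi(rho, a, b, c, N, T):
    """Equation (14)."""
    phi = sum((T - t) / t * rho ** t for t in range(1, T)) / T
    return phi - (T - 1) / 2 * math.log((a * rho ** 2 - 2 * b * rho + c) / N)

def log_integral_exp(f, lo, hi, n=4001):
    """log of the trapezoid approximation to int_lo^hi exp(f(x)) dx, via log-sum-exp."""
    step = (hi - lo) / (n - 1)
    vals = [f(lo + i * step) for i in range(n)]
    top = max(vals)
    total = sum(math.exp(v - top) for v in vals)
    total -= 0.5 * (math.exp(vals[0] - top) + math.exp(vals[-1] - top))
    return top + math.log(total * step)

def log_bf_lag_vs_nolag(N, T, eta, k1, k0, abc1, c0, rho_L1, rho_U1):
    """log of Equation (25): M1 (with lag) against M0 (without lag)."""
    a1, b1, c1 = abc1
    log_int = log_integral_exp(lambda r: N * psi(r, a1, b1, c1, N, T), rho_L1, rho_U1)
    return ((k1 - k0) / 2 * math.log(eta / (eta + 1)) + log_int
            - math.log(rho_U1 - rho_L1) + N * (T - 1) / 2 * math.log(c0 / N))

def log_bf_lag_vs_lag(N, T, eta, k1, k2, abc1, abc2, bounds1, bounds2):
    """log of Equation (26): M1 against M2, both including the lagged dependent variable."""
    log_int1 = log_integral_exp(lambda r: N * psi(r, *abc1, N, T), *bounds1)
    log_int2 = log_integral_exp(lambda r: N * psi(r, *abc2, N, T), *bounds2)
    return (math.log(bounds2[1] - bounds2[0]) - math.log(bounds1[1] - bounds1[0])
            + (k1 - k2) / 2 * math.log(eta / (eta + 1)) + log_int1 - log_int2)

abc = (2.0, 1 / 6, 5 / 3)      # illustrative values of a, b, c with T = 3
bounds = (-3.5529, 1.2196)     # stationary points bracketing the mode
print(log_bf_lag_vs_lag(50, 3, 1 / 150, 1, 1, abc, abc, bounds, bounds))  # 0.0
```

The term in η shows directly how each extra regressor is penalized by $\frac{1}{2}\ln\frac{η}{η+1}$ on the log scale.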

where $k_j$ denotes the number of columns of $X_{ij}$; $a_{|M_j}$, $b_{|M_j}$ and $c_{|M_j}$ in $ψ(ρ \mid M_j)$, defined in Equation (14), are calculated by replacing $X_i$ with $X_{ij}$ in Equations (15) to (17) for $j = 0, 1, 2$. When multiplied by $\frac{1}{N}$, they have the following probability limits under Assumptions 1 and 2 with $η(N) = O(N^{α})$ and $α < 0$:

$$\operatorname*{plim}_{N \to ∞} \frac{1}{N} a_{|M_j} = \underline{a}_{|M_j}, \tag{27}$$

$$\operatorname*{plim}_{N \to ∞} \frac{1}{N} b_{|M_j} = \underline{b}_{|M_j} = \underline{a}_{|M_j} (\underline{ρ} + γ_{|M_j}), \tag{28}$$

$$\operatorname*{plim}_{N \to ∞} \frac{1}{N} c_{|M_j} = \underline{c}_{|M_j} = \underline{ρ}^2 \underline{a}_{|M_j} + 2 \underline{a}_{|M_j} \underline{ρ} γ_{|M_j} + h_3(β_{|M_j} \mid M_j) + (T-1) σ^2, \tag{29}$$

$$γ_{|M_j} = \frac{h_2(β_{|M_j}, \underline{ρ} \mid M_j) - σ^2 h(\underline{ρ})}{\underline{a}_{|M_j}}. \tag{30}$$

Here $γ_{|M_j}$ stands for the Nickell bias of the MLE of ρ under $M_j$. The MLE bias thus comes from two sources: the incidental parameter part ($σ^2 h(\underline{ρ})$) and the model misspecification part ($h_2(β_{|M_j}, \underline{ρ} \mid M_j)$). Proposition 4 shows that when the model is correctly specified, the local posterior mode is a consistent estimator of ρ. In the simulation studies in Section 5, we find that when certain combinations of wrong exogenous regressors are included, the marginal posterior density of ρ can be either monotonically increasing or U-shaped, depending on the value of T, and has no local maximum. When we encounter such a wrong model, we assign it a posterior model probability of 0 and do not estimate it. In Propositions 6 and 7 below, we consider the cases where a local maximum of $\underline{ψ}(ρ \mid M_j)$ in Equation (18) exists, and show sufficient conditions under which the Bayes factors in Equations (25) and (26) lead to the selection of the true model asymptotically. Denote by $ρ^*_{|M_j}$ the local maximizer of $\underline{ψ}(ρ \mid M_j)$ in $(ρ_{L_j}, ρ_{U_j})$, with $\underline{ψ}''(ρ^*_{|M_j} \mid M_j) < 0$.

Proposition 6. When $M_1$ is the true model, i.e., $\underline{ρ} \neq 0$ and $X_{i1}$ is the set of true regressors used to generate Y, then as N increases the Bayes factor $\frac{p(Y \mid Y_0, X_1, M_1)}{p(Y \mid Y_0, X_0, M_0)}$ in Equation (25) tends to infinity if the following holds:

$$ϕ(\underline{ρ}) + \frac{T-1}{2} \ln \frac{\underline{c}_{|M_0}}{(T-1) σ^2} > 0. \tag{31}$$

When $M_0$ is the true model, i.e., $X_{i0}$ is the set of true regressors and $\underline{ρ} = 0$, then as N increases the Bayes factor $\frac{p(Y \mid Y_0, X_1, M_1)}{p(Y \mid Y_0, X_0, M_0)}$ in Equation (25) tends to 0 if either of the following is satisfied:

(a) we have

$$-ϕ(ρ^*_{|M_1}) + \frac{T-1}{2} \ln \frac{d(ρ^*_{|M_1} \mid M_1)}{(T-1) σ^2} > 0; \tag{32}$$

if Equation (21) is true under $M_1$, Equation (32) holds.

(b) Equation (20) is true under $M_1$. In this case, the left-hand side of Equation (32) equals 0.

Proposition 7. When $M_2$ is the true model and $M_1$ is the misspecified model, then as N increases the Bayes factor $\frac{p(Y \mid Y_0, X_1, M_1)}{p(Y \mid Y_0, X_2, M_2)}$ in Equation (26) tends to 0 if either of the following holds:

(a) we have

$$ϕ(\underline{ρ}) - ϕ(ρ^*_{|M_1}) + \frac{T-1}{2} \ln \frac{d(ρ^*_{|M_1} \mid M_1)}{(T-1) σ^2} > 0; \tag{33}$$

if Equation (21) is true under $M_1$, Equation (33) holds.

(b) Equation (20) is true under $M_1$. In this case, the left-hand side of Equation (33) equals 0.

Since both Equations (20) and (21) imply that the local posterior mode in Equation (13) is a consistent estimator of ρ, Propositions 6 and 7 show that if the posterior mode is consistent under the misspecified model, the misspecified model will not be chosen by the Bayes factor (model selection is consistent). In Appendix D, we show that $h(ρ)$ is positive over ℝ when T is even. This implies that ϕ(ρ) is an increasing function over ℝ. Note also that $ϕ(0) = 0$. Hence $ϕ(ρ) < 0$ for $ρ < 0$, and it is possible for Equation (31) to be violated when T is even and $\underline{ρ}$ is negative. As shown in the last paragraph of Appendix E, although Equation (31) could be violated in the extreme case of $T = 2$ and $\underline{ρ} < 0$, fortunately, apart from this extreme case, a violation of Equation (31) can only occur when $\underline{ρ} < -1$ for even T greater than 2, which may not be relevant for most economic applications, where $\underline{ρ} \in [-1, 1]$.

Note that Equation (32) is the special case of Equation (33) with $\underline{ρ} = 0$. When the posterior mode is not consistent under the misspecified model, it is difficult to state under what circumstances Equation (33) is or is not satisfied, since $ρ^*$ generally does not have a closed form. By construction, $ρ^*_{|M_1}$ is a local minimizer of the left-hand side of Equation (33). In our simulation studies in Section 5, we calculate Equation (32) or (33) under different settings whenever model selection errors based on Bayes factors occur. We did not find a single violation of either Equation (32) or (33) except in the cases where Equation (20) is true, that is, when the candidate model nests the true model. It appears that the left-hand side of Equation (33) could be interpreted as a measure of how close the candidate model is to nesting the true model. Note that with real data it is difficult to check Equation (20), but one can assess whether Equation (33) is violated by replacing $d(ρ \mid M_j)$ with $\frac{a_{|M_j}}{N} ρ^2 - 2\frac{b_{|M_j}}{N} ρ + \frac{c_{|M_j}}{N}$ and substituting consistent estimates for $σ^2$ and $\underline{ρ}$, e.g., those from the model including all the potential regressors.
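The empirical check described in the previous paragraph can be sketched as follows; the function name and argument layout are ours. It evaluates the left-hand side of Equation (33) with $d(ρ \mid M_1)$ replaced by its sample analogue, and with $σ^2$ and $\underline{ρ}$ replaced by consistent estimates supplied by the user. Passing $\hat{ρ} = 0$ gives the analogue of Equation (32). A value that is not clearly positive suggests that Equation (33) may be violated for the candidate model. In the usage line, the inputs are the population values of a correctly specified model (a = 2, b = 1/6, c = 5/3, T = 3, $σ^2 = 1$, $\underline{ρ} = ρ^* = 0.5$), for which the left-hand side is 0.

```python
import math

# Illustrative sketch; function names are hypothetical.
def phi(rho, T):
    """Equation (7)."""
    return sum((T - t) / t * rho ** t for t in range(1, T)) / T

def eq33_lhs(rho_star, a1, b1, c1, N, T, sigma2_hat, rho_hat):
    """Sample analogue of the left-hand side of Equation (33) for candidate model M1.

    rho_star: local posterior mode under M1; a1, b1, c1: Equations (15)-(17) under M1;
    sigma2_hat, rho_hat: consistent estimates, e.g. from the model with all regressors.
    """
    d_hat = a1 / N * rho_star ** 2 - 2 * b1 / N * rho_star + c1 / N
    return (phi(rho_hat, T) - phi(rho_star, T)
            + (T - 1) / 2 * math.log(d_hat / ((T - 1) * sigma2_hat)))

print(eq33_lhs(0.5, 2.0, 1 / 6, 5 / 3, N=1, T=3, sigma2_hat=1.0, rho_hat=0.5))
# approximately 0
```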

Proposition 8 below shows when BIC based on the biased MLE is consistent in model selection. BIC for the models with and without the lagged dependent variable is defined, respectively, as

$$\mathrm{BIC}_{\mathrm{lag}} = NT\Big[\ln\Big(\frac{c}{NT} - \frac{b^2}{a \times NT}\Big) + \ln 2π + 1\Big] + (1 + k + N) \ln(NT), \tag{34}$$

$$\mathrm{BIC}_{\mathrm{no\,lag}} = NT\Big[\ln \frac{c}{NT} + \ln 2π + 1\Big] + (k + N) \ln(NT), \tag{35}$$

where a, b and c are defined in Equations (15), (16) and (17), respectively, with $η = 0$, and k is the number of exogenous regressors included. The model with the smaller BIC value is preferred.
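Equations (34) and (35) translate directly into code; the function names below are ours. With η = 0, c − b²/a is the within-group SSR after fitting $\hat{ρ}_{MLE} = b/a$, so the two criteria differ only in the residual variance term and in the one extra parameter of the lag model. The numbers in the usage lines are arbitrary illustrative inputs.

```python
import math

# Illustrative sketch; function names are hypothetical. a, b, c use eta = 0.
def bic_lag(a, b, c, N, T, k):
    """Equation (34): model including the lagged dependent variable."""
    NT = N * T
    return (NT * (math.log(c / NT - b ** 2 / (a * NT)) + math.log(2 * math.pi) + 1)
            + (1 + k + N) * math.log(NT))

def bic_nolag(c, N, T, k):
    """Equation (35): model without the lagged dependent variable."""
    NT = N * T
    return NT * (math.log(c / NT) + math.log(2 * math.pi) + 1) + (k + N) * math.log(NT)

def mle_rho(a, b):
    """Biased (Nickell) MLE of rho, b / a with eta = 0."""
    return b / a

print(round(bic_lag(2.0, 1.0, 3.0, N=10, T=4, k=1), 4))  # 46.8781
print(round(bic_nolag(3.0, N=10, T=4, k=1), 4))          # 50.4821
```

When b = 0 the lag model has exactly the same fit, and its BIC exceeds that of the model without the lag by ln(NT), the penalty for the extra parameter.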

Proposition 8. For the comparison of the two models in Equation (25), when $M_0$ is the true model, BIC is consistent if the following is satisfied:

$$(T-1) σ^2 - \underline{c}_{|M_1} + \frac{\underline{b}_{|M_1}^2}{\underline{a}_{|M_1}} < 0. \tag{36}$$

However, if Equation (20) is true under $M_1$, the left-hand side of Equation (36) is greater than 0 and BIC is inconsistent.

When $M_1$ is the true model, BIC is consistent in model selection if the following condition is met:

$$\underline{c}_{|M_0} - \underline{c}_{|M_1} + \frac{\underline{b}_{|M_1}^2}{\underline{a}_{|M_1}} > 0. \tag{37}$$

If $X_{i0}$ is the same as $X_{i1}$ and the probability limit of $\hat{ρ}_{MLE}$ equals 0, the left-hand side of Equation (37) is 0 and BIC is inconsistent.

For the comparison of the two models in Equation (26), when $M_1$ is the true model, BIC is consistent in model selection if the following holds:

$$\underline{c}_{|M_2} - \underline{c}_{|M_1} - \frac{\underline{b}_{|M_2}^2}{\underline{a}_{|M_2}} + \frac{\underline{b}_{|M_1}^2}{\underline{a}_{|M_1}} > 0. \tag{38}$$

Conditions (36), (37) and (38) are sufficient but not necessary for BIC to be consistent in model selection, since BIC has a penalty term against over-parameterization (the last term in Equations (34) and (35)). Note that $\hat{ρ}_{MLE} = \frac{b}{a}$ and, from Equations (27) and (28), $\operatorname*{plim}_{N \to ∞} \hat{ρ}_{MLE} = \underline{ρ} + γ$, where γ is the Nickell bias. The violation of Equations (36) and (37) is related to the hypothesis test $H_0: ρ = 0$. When $\operatorname*{plim}_{N \to ∞} \hat{ρ}_{MLE} = γ$ (i.e., $\underline{ρ} = 0$) and $X_{i1} = X_{i0}$, the probability of making a type I error based on classical test statistics, such as the Wald or likelihood ratio (LR) statistics, tends to 1, and BIC chooses $M_1$ asymptotically, with the left-hand side of Equation (36) equal to $\underline{a}_{|M_1} γ_{|M_1}^2 > 0$. When $\operatorname*{plim}_{N \to ∞} \hat{ρ}_{MLE} = 0$ and $X_{i1} = X_{i0}$, the probability of making a type II error tends to 1 asymptotically, and BIC chooses $M_0$ even if $\underline{ρ} \neq 0$, with the left-hand side of Equation (37) equal to $\underline{a}_{|M_1} (\underline{ρ} + γ_{|M_1})^2 = 0$. When incidental parameters are present, Cox and Reid [17] suggest using the likelihood conditional on the MLE of the orthogonalized incidental parameters to construct LR statistics. In practice, if we find that $\hat{ρ}_{MLE}$ is close to 0 or to the estimated Nickell bias, we should be cautious about using BIC for model selection. For Equation (38), as shown in Appendix G, if $M_1$ is the true model and $X_{i2}$ nests $X_{i1}$, the left-hand side of Equation (38) is asymptotically less than or equal to 0, depending on whether $\underline{a}_{|M_2}$ is less than or equal to $\underline{a}_{|M_1}$. Although Equation (38) is violated when $\underline{a}_{|M_2} = \underline{a}_{|M_1}$, BIC can still favour $M_1$, since there are more parameters under $M_2$. However, if $\underline{a}_{|M_2} < \underline{a}_{|M_1}$, which could happen when $f_i$ is highly correlated with all the potential regressors, BIC will choose the wrong model $M_2$ asymptotically, as shown in Section 5.3. Given the SSR interpretation of a in Equation (15) (with $η = 0$), the practical implication of this result is that if BIC chooses the model with all the regressors included, which in finite samples always has a smaller $\frac{a}{N}$ than the other models, we should be cautious about this choice in applications.

5. Simulation Studies

In this section we use Monte Carlo simulation to verify the claims in Propositions 6, 7 and 8 and to investigate the impact of model uncertainty on point estimation. The number of simulations is 1000. We set $T = 4$, $σ^{2} = 1$, $η = \frac{1}{NT}$, $ρ_{L} = -N$ when $T$ is even, and the number of possible regressors ($K$) to 8. We select 4 of the 8 regressors to generate the dependent variable, with coefficient values 0.1, 0.3, 1 and 2, respectively. We draw $f_{i}$ and $y_{i,0}$ independently from $U[-4, 4]$. In each simulation, we calculate the posterior model probabilities and the BIC of all the models and evaluate the performance of the two criteria. Proposition 5 shows that the posterior mode is not a consistent estimator of $ρ$ when neither Equation (20) nor (21) holds, which is possible when the regressors are collinear and serially correlated. We therefore generate the potential regressors to be covariance stationary, serially correlated and correlated with each other. Appendix A gives the details of the data generating process (DGP). Three parameters control the properties of the regressors. The first is $σ_{X}^{2} = 5.33$, the variance of each regressor. It equals the variance of $f_{i}$ and $y_{i,0}$, since $\operatorname{Var}(U[-4, 4]) = 8^{2}/12 \approx 5.33$. The second is $s = 0.5$, the autocorrelation coefficient. The third is $λ = 1$, which lies between 0 and 1; the closer λ is to 0, the higher the correlation among the regressors. These settings apply to all subsequent simulation exercises unless otherwise stated. Appendix B reports robustness checks with other values of $σ_{X}^{2}$, $s$ and λ.

5.1. When Model Selection is Consistent

The model selection results for different values of $\underline{ρ} > -1$ are similar and are available upon request. If some true regressors are excluded, Equation (20) or (21) is violated more often under $\underline{ρ} = -1$ than when $\underline{ρ}$ takes positive values, so the results in Table 1 are based on $\underline{ρ} = -1$. The “ER” column shows the error rates of Bayes factors⁷, while the “ERBIC” column contains those of BIC. For $N = 40$, BIC performs better than Bayes factors. As the sample size increases, the error rates of the two criteria get closer and both decrease, which suggests that both criteria are consistent in model selection. One of the exogenous regressors in the true model has a coefficient of 0.1, which is close to 0. Models selected by Bayes factors often fail to pick up this regressor when $N = 40$. The column “nest” shows how often the model chosen by Bayes factors either omits only the regressor with coefficient 0.1 or is the same as the true model. In other words, it measures how often the true model nests the chosen model. Compare this column with “nestbic”. For $N \le 200$, the models chosen by Bayes factors are more often nested inside the true model, with the less important regressor excluded, than the models chosen by BIC; for $N = 500$ and $1000$ the two rates are almost equal. Column “ER11” shows the proportion of errors committed when the true model and the model chosen by Bayes factors both include $y_{i-}$ but have different exogenous regressors.⁸ All the errors made by Bayes factors and BIC are due to the inclusion of the wrong set of exogenous regressors rather than the omission of $y_{i-}$. Hence there is no need to check whether Equation (31) or (37) is violated. Whenever an ER11 or ERBIC11 error occurred, we checked whether Equation (33) or (38) was violated. In this and the following simulation exercises, we did not find any violation of Equation (33). Equation (38) is violated only with its left-hand side equal to 0, which happens when the chosen model ($M_{2}$) includes all the regressors of the true model ($M_{1}$). In this case BIC is still consistent, because of its penalty on the extra parameters of $M_{2}$. In other words, the ER11 and ERBIC11 errors of both criteria would vanish with larger sample sizes.

**Table 1.** Simulation results when both criteria are consistent in model selection.
N	ER	ERBIC	ER11	ERBIC11	Nest	Nestbic
40	0.834	0.762	1	1	0.799	0.704
100	0.543	0.510	1	1	0.829	0.777
200	0.300	0.299	1	1	0.862	0.830
500	0.122	0.110	1	1	0.902	0.907
1000	0.064	0.064	1	1	0.943	0.942

5.2. When Equation (31) is Violated for Bayes Factors

Section 4 noted that Equation (31) can be violated when $T$ is even and $\underline{ρ}$ is sufficiently negative. Under the settings in Appendix A, the left-hand side of Equation (31), viewed as a function of $\underline{ρ}$, often has a root between $-7.4$ and $-7.2$ when the true regressors are included.⁹ If $\underline{ρ}$ is less than this root, Equation (31) will be violated. In our next exercise, we set $\underline{ρ} = -7.4$ and run the simulations again. Table 2 reports the results. Bayes factors fail to select the true model in every one of the 1000 simulations, for all sample sizes, while the error rates of BIC gradually decrease with $N$. All the errors of Bayes factors are made when the chosen model does not contain $y_{i-}$ (see “ER10”) and Equation (31) is violated. Similar problems with Bayes factors arise when $T = 2$ and $-1 < \underline{ρ} < 0$, as explained in Appendix E. Table 3 shows the simulation results for this situation with $\underline{ρ} = -0.9$, $σ^{2} = 100$ and no $X_{i}$ in the true model, with all other settings unchanged. Bayes factors again show no sign of model selection consistency, and their errors are almost always associated with a violation of Equation (31). The “noreg” column shows how often, among the errors made by Bayes factors, the chosen model includes only the fixed effects and no other regressors. As the sample size increases, Bayes factors make more such errors, which BIC never commits.

**Table 2.** Simulation results when Equation (31) is violated with $\underline{ρ} = -7.4$.
N	ER	ERBIC	ER10	no(31)	Nestbic
40	1	0.787	1	1	0.688
100	1	0.552	1	1	0.756
200	1	0.300	1	1	0.832
500	1	0.139	1	1	0.886
1000	1	0.064	1	1	0.941

**Table 3.** Simulation results when Equation (31) is violated with $\underline{ρ} = -0.9$, $T = 2$, $σ^{2} = 100$ and no $X_{i}$ in the true model.
N	ER	ERBIC	ER10	no(31)	Noreg
40	0.910	0.696	0.936	0.998	0.621
100	0.873	0.559	0.956	1	0.751
200	0.854	0.440	0.977	1	0.833
500	0.858	0.360	0.986	1	0.893
1000	0.896	0.273	0.991	1	0.921

5.3. When Equation (36), (37) or (38) is Violated for BIC

Bayes factors perform poorly in model selection when Equation (31) is violated, but this happens only in rather extreme situations. Next we show that BIC can perform poorly under more common circumstances, which are more plausible in economic applications. As discussed in Proposition 8, if $\underline{ρ} = 0$, BIC could asymptotically prefer the model with the true exogenous regressor(s) and $y_{i-}$ to the true model. For the next simulation exercise, we therefore change $\underline{ρ}$ to 0. Table 4 reports the results. The error rates of Bayes factors now fall with $N$ (from 0.844 to 0.042), while BIC never identifies the true model (its error rate is 1 for every sample size). As expected, BIC always chooses a model with $y_{i-}$ (see “ER01BIC”), and the proportion of its errors that violate Equation (36) increases with the sample size. Column “cnestbic” shows how often the model chosen by BIC nests the true model when ER01BIC errors occur. The values in this column are only slightly smaller than those in “no(36)”. Most violations of Equation (36) therefore occur when the chosen model nests the true model.

BIC also performs poorly when Equation (37) is close to being violated. Proposition 8 states that if $\operatorname{plim}_{N \to ∞} \hat{ρ}_{\mathrm{MLE}} = 0$ under the true model, a candidate model with the same exogenous regressors as the true model will violate Equation (37). In our next experiment, the true model includes no exogenous regressors, and we set $\underline{ρ} = 0.0756$ to make $\hat{ρ}_{\mathrm{MLE}}$ close to 0. If the candidate model ($M_{0}$) has only fixed effects, the left-hand side of Equation (37) is close to, but slightly above, 0. Table 5 reports the simulation results. BIC error rates gradually increase towards 1 as the sample size grows. Column “noregbic” gives the proportion of BIC errors committed when the chosen model includes only fixed effects. The values in this column are identical to those in “ER10BIC” and also approach 1 as the sample size grows. The poor performance of BIC in this scenario can therefore be attributed to the near-violation of Equation (37).

**Table 4.** Simulation results when Equation (36) is violated with $\underline{ρ} = 0$ and $T = 4$.
N	ER	ERBIC	ER01BIC	no(36)	Cnestbic	Nest
40	0.844	1	1	0.243	0.230	0.777
100	0.572	1	1	0.524	0.511	0.823
200	0.290	1	1	0.777	0.760	0.864
500	0.104	1	1	0.941	0.925	0.924
1000	0.042	1	1	0.996	0.985	0.965

**Table 5.** Simulation results when Equation (37) is violated with $\underline{ρ} = 0.0756$ and no exogenous regressors are included in the true model.
N	ER	ERBIC	ER10	ER10BIC	Noreg	Noregbic
40	0.952	0.968	0.973	0.876	0.660	0.876
100	0.808	0.971	0.943	0.947	0.762	0.947
200	0.540	0.976	0.887	0.968	0.720	0.968
500	0.132	0.985	0.311	0.981	0.303	0.981
1000	0.056	0.991	0	0.990	0	0.990

Finally, we show a case in which Equation (38) is violated. Asymptotically, the left-hand side of Equation (38) depends on $\underline{a}_{|M_{1}}$ (calculated under the true model) and $\underline{a}_{|M_{2}}$ (calculated under the wrong candidate model). If $\underline{a}_{|M_{1}} > \underline{a}_{|M_{2}}$, BIC will be inconsistent. This could happen when $M_{2}$ nests $M_{1}$ and $f_{i}$ is highly correlated with all the potential exogenous regressors. In our next exercise, we set $T = 3$ and $\underline{ρ} = -1$, and generate $y_{i,0}$ and $f_{i}^{*}$ from $U[-1, 1]$. When generating $\tilde{x}_{i,t}$ in Equation (39), we set $s = -0.9$. $f_{i}$ is generated as $f_{i} = f_{i}^{*} + 10 \frac{1}{TK} \sum_{t=1}^{T} \sum_{h=1}^{K} x_{i,t,h}$. The true model includes no exogenous regressors, so any candidate model that includes $y_{i-}$ nests or equals the true model. Table 6 shows that BIC is not consistent. Its error rates increase once the sample size exceeds 200, and all its errors are of type ERBIC11. For every error made by BIC, Equation (38) is violated with $\underline{a}_{|M_{1}} > \underline{a}_{|M_{2}}$, although in a few cases $\underline{a}_{|M_{1}}$ is very close to $\underline{a}_{|M_{2}}$. The column headed $\frac{\underline{a}_{|M_{2}}}{\underline{a}_{|M_{1}}} < 0.999$ in Table 6 gives the proportion of errors in which $\underline{a}_{|M_{2}}$ is smaller than $\underline{a}_{|M_{1}}$ by more than 0.1% of its value. This proportion is above 0.96 for every sample size, so nearly all of the errors occur when $\underline{a}_{|M_{2}}$ is smaller by more than a tiny fraction. The column headed $E(\frac{\underline{a}_{|M_{2}}}{\underline{a}_{|M_{1}}})$ shows the sample average of $\frac{\underline{a}_{|M_{2}}}{\underline{a}_{|M_{1}}}$ over all the errors. This average decreases with the sample size, so in larger samples BIC tends to choose a model with a lower $\underline{a}$ than that of the true model. The simulation results are sensitive to the parameter settings. Suppose we change $T$ to 4 and keep the other settings the same. Then $\underline{a}_{|M_{2}}$ is virtually the same as $\underline{a}_{|M_{1}}$ among the BIC errors. BIC shows decreasing error rates, although these remain higher than those of Bayes factors for all sample sizes. In that case, $\underline{ρ}$ and $s$ must also be changed to make BIC inconsistent; the results are available upon request.

**Table 6.** Simulation results when Equation (38) is violated.
N	ER	ERBIC	ER11	ERBIC11	$\underline{a}_{|M_{2}} / \underline{a}_{|M_{1}} < 0.999$	$E(\underline{a}_{|M_{2}} / \underline{a}_{|M_{1}})$
40	0.209	0.439	1	1	0.968	0.878
100	0.119	0.352	1	1	0.977	0.866
200	0.072	0.336	1	1	0.973	0.858
500	0.050	0.406	1	1	0.990	0.811
1000	0.039	0.600	1	1	0.993	0.782

To sum up, Equations (36), (37) and (38) can be violated. BIC can therefore be inconsistent in model selection under more common circumstances than Bayes factors.

5.4. Point Estimation

The previous simulation results show a risk in simply selecting the model with the highest posterior model probability to provide the estimates of interest. Whichever criterion we use, there is a high chance that the selected model is not the true model, especially when $N$ is small. Next we investigate how model uncertainty affects point estimation. We set $\underline{ρ} = 1$ and the number of simulations to 2000, and evaluate the performance of different consistent point estimators.¹⁰

Table 7 shows the root mean squared errors (RMSE) for a cross-section sample size of $N = 40$. The column “True” gives the true values of $ρ$ and $β$. There are 8 potential regressors, 4 of which are not included in the true model and hence have coefficients equal to 0. Three columns correspond to different ways of forming the point estimate. The column “Top” uses the posterior mode of the model with the highest posterior model probability. The column “All” uses the model that includes all the potential regressors. The column “BMA” averages the posterior modes of the different models, with weights equal to their posterior model probabilities. To evaluate the importance of a regressor in the Bayesian context, we can calculate its inclusion probability. This is the sum of the posterior model probabilities of all the models that include the regressor. The columns headed by percentages apply an inclusion probability criterion. If the inclusion probability of a regressor is below the percentage in the column heading, we use zero as its point estimate; otherwise, we use the BMA estimate. In Table 7, the model with all the potential regressors has much higher RMSE than the other methods for every parameter except $ρ$. BMA has smaller RMSE than the top model for almost all the parameters.¹¹ Compared with the inclusion probability criteria, BMA tends to have lower RMSE for non-zero parameters but larger RMSE for parameters equal to 0. A higher inclusion probability threshold tends to give smaller RMSE when the true value of a parameter is 0 and higher RMSE for non-zero parameters. The last row of Table 7 gives the sum of the RMSE in each column, which measures the overall performance of each criterion. BMA and the various inclusion probability criteria all outperform the top model and the all-inclusive model. The sum of RMSE is smallest (0.643) when the inclusion probability threshold is set to 40% or 50%.
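The three estimators compared in Table 7 (top model aside) can be sketched in a few lines of Python. The sketch assumes that each candidate model is summarized by its posterior model probability and a dictionary of posterior modes for the regressors it includes. A regressor excluded from a model contributes 0 to that model's term in the average. The data layout, function names and toy numbers are hypothetical.

```python
def inclusion_probs(models, names):
    """Posterior inclusion probability: total probability of models containing the regressor."""
    return {n: sum(p for p, est in models if n in est) for n in names}


def bma_estimates(models, names):
    """Posterior-mode average; a regressor excluded from a model counts as 0 there."""
    return {n: sum(p * est.get(n, 0.0) for p, est in models) for n in names}


def thresholded_estimates(models, names, threshold):
    """BMA estimate if the inclusion probability is at least `threshold`, else 0."""
    pip = inclusion_probs(models, names)
    bma = bma_estimates(models, names)
    return {n: (bma[n] if pip[n] >= threshold else 0.0) for n in names}


def rmse(estimates, truth):
    """Root mean squared error of a list of estimates (e.g. across simulations)."""
    return (sum((e - truth) ** 2 for e in estimates) / len(estimates)) ** 0.5


# Toy posterior: two models with probabilities 0.6 and 0.4.
names = ["rho", "x1", "x2"]
models = [
    (0.6, {"rho": 1.00, "x1": 0.5}),
    (0.4, {"rho": 1.02}),
]
print(inclusion_probs(models, names))                   # {'rho': 1.0, 'x1': 0.6, 'x2': 0}
print(thresholded_estimates(models, names, 0.5)["x1"])  # 0.3, since 0.6 >= 0.5
print(thresholded_estimates(models, names, 0.7)["x1"])  # 0.0, since 0.6 < 0.7
```

Collecting the estimates from each simulation and passing them to `rmse` together with the true value gives one entry of a table like Table 7.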

**Table 7.** Root mean squared errors (RMSE) of point estimators when $N = 40$.
Parameter	True	Top	BMA	All	30%	40%	50%	60%	70%	80%
$\underline{ρ}$	1	0.017	0.017	0.017	0.017	0.017	0.017	0.017	0.017	0.017
β	0.1	0.110	0.090	4.954	0.096	0.099	0.101	0.102	0.103	0.103
	0.3	0.126	0.114	4.263	0.117	0.121	0.128	0.138	0.151	0.167
	0	0.072	0.066	3.567	0.062	0.060	0.058	0.056	0.049	0.046
	0	0.071	0.057	5.219	0.054	0.052	0.049	0.045	0.042	0.036
	0	0.051	0.044	2.254	0.039	0.036	0.030	0.027	0.023	0.019
	1	0.119	0.104	6.033	0.105	0.105	0.106	0.111	0.114	0.121
	0	0.057	0.053	3.777	0.049	0.047	0.041	0.038	0.033	0.025
	2	0.118	0.108	6.573	0.108	0.108	0.113	0.113	0.120	0.128
Sum		0.739	0.652	36.656	0.647	0.643	0.643	0.646	0.653	0.661

Table 8 gives more insight into inclusion probabilities. It reports the error rates of including a regressor whose true coefficient is 0, or excluding one whose true coefficient is non-zero, under different inclusion probability criteria, and compares them with those of the top model. As with the RMSE, higher inclusion probability thresholds tend to give larger error rates for non-zero parameters and smaller error rates for zero parameters. The last row shows the average error rate in each column. The highest average occurs under the 10% criterion, and most of its errors come from the zero parameters. In our setting, the prior inclusion probability of each regressor is 50%. A posterior inclusion probability of at least 50% therefore means that the data confirm or strengthen the prior. The average error rate of the top model criterion (0.127) is lower than those of all the inclusion probability criteria except 40% and 50%, and equal to that of the 60% criterion.
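The error rates in Table 8 count how often an inclusion rule includes a regressor whose true coefficient is 0, or excludes one whose true coefficient is non-zero. A minimal Python sketch follows. It assumes that the posterior inclusion probabilities of each regressor across simulations are stored in a list. The layout and the numbers are hypothetical.

```python
def inclusion_error_rate(pips, true_coef, threshold):
    """Share of simulations in which 'include if pip >= threshold' is wrong."""
    truly_in = true_coef != 0
    errors = sum(1 for p in pips if (p >= threshold) != truly_in)
    return errors / len(pips)


def average_error_rate(pip_table, true_coefs, threshold):
    """Average of the per-regressor error rates (cf. the 'Avg.' row of Table 8)."""
    rates = [inclusion_error_rate(pips, b, threshold)
             for pips, b in zip(pip_table, true_coefs)]
    return sum(rates) / len(rates)


# Hypothetical inclusion probabilities from four simulations for two regressors.
pip_table = [
    [0.45, 0.70, 0.55, 0.30],  # regressor with true coefficient 0.1
    [0.10, 0.52, 0.20, 0.05],  # regressor with true coefficient 0
]
true_coefs = [0.1, 0.0]
for threshold in (0.1, 0.5, 0.8):
    print(threshold, average_error_rate(pip_table, true_coefs, threshold))
```

In this toy example, a low threshold produces errors only on the zero coefficient and a high threshold only on the non-zero one. This is the trade-off visible across the columns of Table 8.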

**Table 8.** Error rates of excluding or including a regressor based on different criteria when $N = 40$.
Parameter	True	Top	10%	20%	30%	40%	50%	60%	70%	80%
$\underline{ρ}$	1	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
β	0.1	0.826	0.181	0.526	0.682	0.789	0.856	0.894	0.927	0.955
	0.3	0.119	0.004	0.019	0.046	0.070	0.110	0.150	0.205	0.270
	0	0.053	0.647	0.209	0.105	0.060	0.042	0.026	0.016	0.011
	0	0.052	0.638	0.217	0.114	0.073	0.042	0.023	0.013	0.008
	0	0.048	0.654	0.226	0.103	0.064	0.032	0.019	0.011	0.006
	1	0.003	0.000	0.001	0.001	0.001	0.002	0.003	0.004	0.006
	0	0.043	0.633	0.204	0.107	0.064	0.037	0.023	0.012	0.005
	2	0.001	0.000	0.000	0.000	0.000	0.001	0.001	0.001	0.002
Avg.		0.127	0.307	0.156	0.129	0.125	0.124	0.127	0.132	0.140

Table 9 presents the RMSE sums and average error rates for different sample sizes. BMA has smaller RMSE sums than the top model estimators for all sample sizes. The average error rate of the top model is generally close to the minimum over the various inclusion probability criteria. The minimum RMSE sums are usually attained at inclusion probabilities of 50% or above, while the minimum average error rates appear at around 50%. Under our simulation settings, it therefore seems sensible for point estimation to use a 50% inclusion probability to decide whether a regressor should be included.

**Table 9.** Sum of RMSE and averages of error rates.
	Sum of RMSE					Average Error Rates
N	40	100	200	500	1000	40	100	200	500	1000
Top	0.739	0.535	0.299	0.189	0.107	0.127	0.083	0.046	0.015	0.005
BMA	0.652	0.459	0.288	0.171	0.101	N.A.	N.A.	N.A.	N.A.	N.A.
10%	0.653	0.460	0.287	0.171	0.100	0.307	0.178	0.116	0.062	0.029
20%	0.653	0.460	0.286	0.170	0.099	0.156	0.102	0.063	0.030	0.012
30%	0.647	0.461	0.284	0.169	0.098	0.129	0.085	0.049	0.021	0.008
40%	0.643	0.461	0.285	0.169	0.097	0.125	0.081	0.046	0.017	0.006
50%	0.643	0.429	0.284	0.168	0.096	0.124	0.082	0.048	0.015	0.005
60%	0.646	0.437	0.278	0.169	0.096	0.127	0.087	0.051	0.015	0.004
70%	0.653	0.447	0.279	0.170	0.098	0.132	0.092	0.056	0.017	0.005
80%	0.661	0.437	0.279	0.161	0.100	0.140	0.099	0.063	0.021	0.007
90%	0.685	0.439	0.305	0.171	0.102	0.153	0.111	0.077	0.028	0.008

6. Conclusions

In this paper, we investigated consistent parameter estimation and model selection for the linear dynamic panel model. For estimation, we use the fixed effect reparameterization proposed by Lancaster [11] combined with our data dependent prior, and we calculate Bayes factors to compare different model specifications. We recommend that model selection precede parameter estimation. Lancaster's fixed effect transformation does not necessarily lead to consistent estimation when some true exogenous regressors are excluded. We have given the conditions under which Bayes factors or BIC lead to consistency in model selection. Bayes factors can be inconsistent in model selection when the number of time periods is 2, or when the true autoregressive coefficient is less than $-1$. Such situations are likely to be rare in most economic applications. BIC based on the biased MLE can be inconsistent in two cases: when the fixed effects are highly correlated with all the potential exogenous regressors, and when the true autoregressive coefficient is 0 or its MLE is close to 0. These situations are more likely to arise in practice.

When model uncertainty is substantial, e.g., with small sample sizes, we argue for the use of Bayesian model averaging. In our simulation exercises, it produced point estimators with smaller RMSE than the model with the highest posterior model probability. Inclusion probability criteria can help to reduce estimation risk and to decide which regressor(s) should be included. We recommend using 50% (the prior inclusion probability) as the threshold for including a regressor, which usually produced the smallest RMSE and average error rates in our simulation exercises. Promising directions for future research include extending Lancaster's reparameterization to higher order AR models and considering lag order selection along with regressor selection.

Acknowledgments

The author wishes to thank Roberto Leon Gonzalez for his patient and insightful guidance and Gary Koop for his long-term encouragement and helpful advice. The author is also very grateful to the editor, Kerry Patterson, and three anonymous referees, whose comments and suggestions led to considerable improvement of the paper. The author is responsible for all remaining errors.

Conflicts of Interest

The author declares no conflict of interest.

Appendix

A. The DGP of Exogenous Regressors

We first draw the $K \times 1$ vectors $x_{i,t}^{*}$ i.i.d. from $N(0, σ_{X}^{2} I_{K})$ and then generate $\tilde{x}_{i,t}$ as follows:

\tilde{x}_{i,t} = s \tilde{x}_{i,t-1} + \sqrt{1 - s^{2}}\, x_{i,t}^{*},

(39)

with $\tilde{x}_{i,0} = x_{i,0}^{*}$, where $s$ is the first order autocorrelation. Denote $\tilde{X}_{i} = (\tilde{x}_{i,1}, \dots, \tilde{x}_{i,T})'$, and let $\tilde{X}_{i,j}$ and $X_{i,h}$ be the $j$th and $h$th columns of $\tilde{X}_{i}$ and $X_{i}$, respectively, where

X_{i,h} = \sum_{j=1}^{K} q_{h,j} \tilde{X}_{i,j}, \quad h = 1, 2, \dots, K.

(40)

Define the $K \times 1$ vector $z = (\frac{1}{\sqrt{K}}, \dots, \frac{1}{\sqrt{K}})'$. The vector $q_{h} = (q_{h,1}, \dots, q_{h,K})'$, with $q_{h}' q_{h} = 1$, is drawn from the angular central Gaussian (ACG) distribution with parameter $z z' + λ (I - z z')$ (see [18]). $q_{h}$ can be viewed as an orientation (direction) in $ℝ^{K}$. If $λ = 1$, $q_{h}$ is uniformly distributed. The closer λ is to 0, the closer $q_{h}$ is to the orientation of $z$, and the more highly correlated the generated regressors are.¹² Under our data generating design, every element of $X_{i}$ has mean 0 and variance $σ_{X}^{2}$. The correlation coefficient of any two elements of $X_{i}$ is the same across $i$ and can be calculated as

\operatorname{corr}(X_{i,t,h}, X_{i,\tilde{t},\tilde{h}}) = s^{|t - \tilde{t}|} \sum_{j=1}^{K} q_{h,j} q_{\tilde{h},j}, \quad t, \tilde{t} = 1, 2, \dots, T, \quad h, \tilde{h} = 1, 2, \dots, K.

(41)
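The design in Equations (39) to (41) can be simulated with the Python standard library. The sketch below assumes that the directions $q_{h}$ are drawn once and shared by all individuals, which is consistent with the statement that the correlations are the same across $i$. It draws from the ACG distribution by normalizing a Gaussian vector with covariance $z z' + λ (I - z z')$. Because $z z'$ and $I - z z'$ are orthogonal projections, the square root of that covariance is $z z' + \sqrt{λ} (I - z z')$. Function names are hypothetical.

```python
import math
import random


def draw_acg_direction(K, lam, rng):
    """Unit vector from the ACG distribution with parameter z z' + lam (I - z z')."""
    z = [1.0 / math.sqrt(K)] * K
    w = [rng.gauss(0.0, 1.0) for _ in range(K)]
    zw = sum(zi * wi for zi, wi in zip(z, w))
    # g = Sigma^{1/2} w with Sigma^{1/2} = z z' + sqrt(lam) (I - z z'), so g ~ N(0, Sigma)
    g = [zi * zw + math.sqrt(lam) * (wi - zi * zw) for zi, wi in zip(z, w)]
    norm = math.sqrt(sum(gi * gi for gi in g))
    return [gi / norm for gi in g]  # g / ||g|| follows the ACG distribution


def simulate_regressors(T, Q, sigma2_x, s, rng):
    """T x K matrix X_i for one individual, following Equations (39) and (40)."""
    K = len(Q)
    sd = math.sqrt(sigma2_x)
    x_tilde = [rng.gauss(0.0, sd) for _ in range(K)]  # tilde x_{i,0} = x*_{i,0}
    X = []
    for _ in range(T):
        x_star = [rng.gauss(0.0, sd) for _ in range(K)]
        # Equation (39): stationary AR(1) in each column, variance sigma2_x
        x_tilde = [s * xt + math.sqrt(1.0 - s * s) * xs for xt, xs in zip(x_tilde, x_star)]
        # Equation (40): X_{i,t,h} = sum_j q_{h,j} * tilde x_{i,t,j}
        X.append([sum(q[j] * x_tilde[j] for j in range(K)) for q in Q])
    return X


def implied_corr(s, Q, t, t2, h, h2):
    """Equation (41): corr(X_{i,t,h}, X_{i,t2,h2})."""
    return s ** abs(t - t2) * sum(a * b for a, b in zip(Q[h], Q[h2]))


rng = random.Random(2015)
K, T = 8, 4
Q = [draw_acg_direction(K, 1.0, rng) for _ in range(K)]  # lambda = 1: uniform directions
X_i = simulate_regressors(T, Q, 5.33, 0.5, rng)
print(len(X_i), len(X_i[0]))             # 4 8
print(implied_corr(0.5, Q, 1, 3, 2, 2))  # 0.25 up to rounding, since q_h'q_h = 1
```

Setting `lam` close to 0 pushes every $q_{h}$ towards $\pm z$, so the columns of $X_{i}$ become nearly collinear. Replacing `Q` with the rows of the $K \times K$ identity matrix gives the orthogonal design used in Table 11.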

B. Properties of the Exogenous Regressors in the Simulation

Here we present some robustness checks of our simulation results under different settings. Apart from the conditions in Propositions 6 to 8, the model selection performance of Bayes factors and BIC is also sensitive to the properties of $X_{i}$, which depend on $σ_{X}^{2}$, $s$ and λ. We first reduce $σ_{X}^{2}$ to 1.33 to obtain the results in Table 10. The error rates of both criteria are higher for all sample sizes than those in Table 1, while the nest rates are all lower. A similar deterioration in model selection could also occur with an inflated error variance ($σ^{2}$). Hence model selection performance depends on the strength of the signal relative to the noise, as determined by their variances.

Next we show that the levels of serial correlation and collinearity in the regressors also affect model selection performance. Recall from Appendix A that $s$ is the first order autocorrelation and λ controls the level of collinearity. To generate regressors with no collinearity, we can set $q_{h}$ to be the $h$th column of the $K \times K$ identity matrix. We set $\underline{ρ} = 1$ and $N = 200$. Table 11 shows the error rates under different levels of serial correlation and collinearity. Cross-regressor correlation and positive serial correlation are both harmful for model selection. If the regressors are orthogonal to each other, Bayes factors and BIC have lower error rates than when collinearity is present. The highest error rates appear when $s = 0.9$, at every level of collinearity. Interestingly, negative serial correlation seems to improve model selection performance in most cases compared with positive or no serial correlation.

**Table 10.** Simulation results when $\underline{ρ} = -1$ and $σ_{X}^{2} = 1.33$.
N	ER	ERBIC	ER11	ERBIC11	Nest	Nestbic
40	0.962	0.958	1	1	0.501	0.497
100	0.886	0.872	1	1	0.736	0.716
200	0.767	0.747	1	1	0.827	0.812
500	0.500	0.470	1	1	0.869	0.856
1000	0.221	0.217	1	1	0.910	0.896

**Table 11.** Error rates under different levels of serial correlation and collinearity for $\underline{ρ} = 1$ and $N = 200$.
	Bayes Factors					BIC
λ \ s	–0.9	–0.5	0	0.5	0.9	–0.9	–0.5	0	0.5	0.9
orthogonal	0.045	0.039	0.061	0.085	0.569	0.098	0.081	0.089	0.127	0.543
$λ = 1$	0.135	0.154	0.174	0.276	0.779	0.148	0.178	0.186	0.278	0.762
$λ = 0.01$	0.638	0.661	0.696	0.803	0.972	0.600	0.627	0.678	0.794	0.966

C. Proof of Proposition 3 and Proposition 5

Here we use a derivation different from that of Lancaster [11]. In brief, we look for a correction function attached to the marginal posterior density of ρ such that the mode of the marginal posterior is a consistent estimator of ρ. We first reparameterize the fixed effect as

f_{i} = g_{i} r (ρ) - \frac{1}{T} ι^{'} X_{i} β

(42)

where $r(ρ)$ is a function of ρ to be determined below. The conditional posterior distribution $p(g_{i}, β, σ^{2} | ρ, Y, Y_{0})$ is derived with standard Bayesian techniques; see, e.g., [19], Chapter 10. The details are available upon request. Here we only show the result after $g_{i}$, β and $σ^{2}$ are integrated out.

\begin{aligned} p(ρ | Y, Y_{0})\, p(Y | Y_{0}) = {} & Γ\left[\frac{N(T - 1)}{2}\right] T^{-\frac{N}{2}} (2π)^{-\frac{N(T - 1)}{2}} \frac{I(ρ_{L} < ρ < ρ_{U})}{ρ_{U} - ρ_{L}} \left(\frac{η}{η + 1}\right)^{\frac{k}{2}} \\ & \times N^{-\frac{N(T - 1)}{2}} \left(\frac{a}{N} ρ^{2} - 2 \frac{b}{N} ρ + \frac{c}{N}\right)^{-\frac{N(T - 1)}{2}} r^{-N}(ρ). \end{aligned}

(43)

Taking log and differentiating both sides with respect to ρ produces

\frac{d \ln p(ρ | Y, Y_{0})}{dρ} = -\frac{N(T - 1)(aρ - b)}{aρ^{2} - 2bρ + c} - N \frac{d \ln r(ρ)}{dρ} .

By setting the above equal to 0, we can obtain

\frac{d \ln r(ρ)}{dρ} = -\frac{(T - 1)(\frac{a}{N} ρ - \frac{b}{N})}{\frac{a}{N} ρ^{2} - 2 \frac{b}{N} ρ + \frac{c}{N}} .

Suppose for now that the true regressors are included in the model. Taking the probability limit of the right-hand side using Equations (27), (28) and (29), and evaluating both sides at $\underline{ρ}$, gives

\frac{d \ln r(\underline{ρ})}{dρ} = -h(\underline{ρ}) .

(44)

Solving the above differential equation, we obtain¹³

r(ρ) = \exp[-ϕ(ρ)],

(45)

where $ϕ(ρ)$ is given in Equation (7). Replacing $r(ρ)$ with $\exp[-ϕ(ρ)]$ in Equation (43) and dropping the terms not involving ρ gives the result in Equation (13).

When some true regressors are excluded from the model, the differential Equation (44) now becomes

\frac{d \ln r(\underline{ρ})}{d\underline{ρ}} = \frac{(T - 1)[h_{2}(β, \underline{ρ}) - σ^{2} h(\underline{ρ})]}{h_{3}(β) + (T - 1) σ^{2}}

(46)

If the solution in Equation (45) is still valid, we should have

\frac{-(T - 1) h_{2}(β, \underline{ρ}) + (T - 1) σ^{2} h(\underline{ρ})}{h_{3}(β) + (T - 1) σ^{2}} = h(\underline{ρ}) .

Unless either $\frac{-(T - 1) h_{2}(β, \underline{ρ})}{h_{3}(β)} = h(\underline{ρ})$ or $h_{2}(β, \underline{ρ}) = h_{3}(β) = 0$ holds, Equation (45) is not a solution of Equation (46). In other words, the reparameterization of the fixed effect in Equation (6) does not, in general, lead to consistent estimation of ρ.

D. Proof of Proposition 4

To prove the claims in Proposition 4, we first need to prove Lemma 9 and Lemma 10.

Lemma 9. Let $T \geq 3$. When $T$ is odd, the polynomial $h(ρ)$ is strictly increasing over $(-∞, ∞)$ and has only one real root, which lies in $[-2, -1)$. When $T$ is even, the polynomial $h(ρ)$ is greater than 0 with no real roots; it is strictly decreasing for $ρ < -1$ and strictly increasing for $ρ > -1$, with $-1$ as its minimum point.

Proof. When $T = 3$, we have $h(ρ) = \frac{1}{3}(ρ + 2)$; when $T = 4$, we have $h(ρ) = \frac{1}{4}(ρ^{2} + 2ρ + 3)$. These two cases clearly satisfy the claims in Lemma 9. For $T > 4$, note that

h^{'} (ρ) = \frac{d h (ρ)}{d ρ} = \sum_{t = 1}^{T - 2} \frac{t (T - t - 1)}{T} ρ^{t - 1} = \frac{(T - 2) ρ^{T} - T ρ^{T - 1} + T ρ - (T - 2)}{T {(ρ - 1)}^{3}}

and the Sturm sequence of $T h'(ρ)(ρ - 1)^{3}$ is

\left\{ T h'(ρ)(ρ - 1) = \sum_{t=0}^{T-2} (2t - T + 2) ρ^{t},\ \frac{1}{(ρ - 1)^{2}} \frac{d\,[h'(ρ)(ρ - 1)^{3}]}{dρ} = \sum_{t=1}^{T-2} t ρ^{t-1},\ \sum_{t=1}^{T-3} (T - t - 2) ρ^{t-1},\ -(T - 2)^{2} \right\} .

(47)

Table 12 shows the signs of the Sturm sequence at $ρ = \pm ∞$ when $T$ is even. The number of sign changes falls by 2 as ρ moves from $-∞$ to $∞$. By Sturm's theorem, $T h'(ρ)(ρ - 1)^{3} = 0$ therefore has two distinct real roots in $(-∞, ∞)$. Clearly, $ρ = 1$ is one of them. Hence $h'(ρ)$ has only one real root in $(-∞, ∞)$, and that root is $ρ = -1$. To see this, note that for even $T$ the coefficients $t(T - t - 1)$ of $h'(ρ)$ are unchanged when $t$ is replaced by $T - 1 - t$, while the corresponding powers $(-1)^{t-1}$ have opposite signs, so $h'(-1) = 0$. Moreover, $h'(ρ) > 0$ for $ρ > -1$ and $h'(ρ) < 0$ for $ρ < -1$. Therefore $h(ρ)$ is strictly decreasing for $ρ < -1$ and strictly increasing for $ρ > -1$, with $ρ = -1$ as the minimum point. When $T$ is odd, Table 13 shows that the number of sign changes falls by only 1, which is accounted for by the root $ρ = 1$. Hence $h'(ρ)$ has no real roots and $h'(ρ) > 0$, so $h(ρ)$ is strictly increasing over the real line when $T$ is odd.


**Table 12.** Sturm sequence of $T h^{'} (ρ) {(ρ - 1)}^{3}$ when T is even and greater than 4.
ρ	$h^{'} (ρ)$	$\sum_{t = 0}^{T - 2} (2 t - T + 2) ρ^{t}$	$\sum_{t = 1}^{T - 2} t ρ^{t - 1}$	$\sum_{t = 1}^{T - 3} (T - t - 2) ρ^{t - 1}$	$- {(T - 2)}^{2}$
−∞	−	+	−	+	−
∞	+	+	+	+	−


**Table 13.** Sturm sequence of $T h^{'} (ρ) {(ρ - 1)}^{3}$ when T is odd and greater than 4.
ρ	$h^{'} (ρ)$	$\sum_{t = 0}^{T - 2} (2 t - T + 2) ρ^{t}$	$\sum_{t = 1}^{T - 2} t ρ^{t - 1}$	$\sum_{t = 1}^{T - 3} (T - t - 2) ρ^{t - 1}$	$- {(T - 2)}^{2}$
−∞	+	−	+	−	−
∞	+	+	+	+	−
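The sign-change counts in Tables 12 and 13 (and in Tables 14 and 15) can be reproduced mechanically. The following Python sketch is an illustration, not the author's code.

- It builds a Sturm sequence in exact rational arithmetic (`fractions.Fraction`).
- It counts distinct real roots as the number of sign changes at $ρ = -∞$ minus the number at $ρ = ∞$.
- The helper names (`sturm_sequence`, `distinct_real_roots`, `numerator_h_prime`, `numerator_h`) are hypothetical.
- It uses the standard remainder sequence, so its members need not coincide with those listed in Equations (47) and (48). The resulting root counts should nevertheless agree.

```python
from fractions import Fraction


def trim(p):
    """Drop zero leading coefficients (lists store the constant term first)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def derivative(p):
    return trim([k * p[k] for k in range(1, len(p))] or [Fraction(0)])


def remainder(num, den):
    """Remainder of the polynomial division num / den in exact arithmetic."""
    num = trim([Fraction(c) for c in num])
    den = trim([Fraction(c) for c in den])
    while len(num) >= len(den) and any(c != 0 for c in num):
        shift = len(num) - len(den)
        factor = num[-1] / den[-1]
        for k, c in enumerate(den):
            num[k + shift] -= factor * c
        num = trim(num)
    return num


def sturm_sequence(p):
    seq = [trim([Fraction(c) for c in p])]
    seq.append(derivative(seq[0]))
    while True:
        r = remainder(seq[-2], seq[-1])
        if all(c == 0 for c in r):
            return seq
        seq.append([-c for c in r])


def sign_at_infinity(p, positive):
    """Sign of p(rho) as rho -> +inf (positive=True) or rho -> -inf (positive=False)."""
    s = 1 if p[-1] > 0 else -1
    return s if positive or (len(p) - 1) % 2 == 0 else -s


def distinct_real_roots(p):
    """Sturm's theorem: sign changes at -inf minus sign changes at +inf."""
    seq = sturm_sequence(p)

    def changes(positive):
        signs = [sign_at_infinity(q, positive) for q in seq]
        return sum(1 for x, y in zip(signs, signs[1:]) if x != y)

    return changes(False) - changes(True)


def numerator_h_prime(T):
    """Coefficients of T h'(rho) (rho - 1)^3 = (T-2) rho^T - T rho^(T-1) + T rho - (T-2)."""
    p = [0] * (T + 1)
    p[0] -= T - 2
    p[1] += T
    p[T - 1] -= T
    p[T] += T - 2
    return p


def numerator_h(T):
    """Coefficients of T h(rho) (rho - 1)^2 = rho^T - T rho + T - 1."""
    p = [0] * (T + 1)
    p[0] += T - 1
    p[1] -= T
    p[T] += 1
    return p


for T in range(3, 9):
    print(T, distinct_real_roots(numerator_h_prime(T)), distinct_real_roots(numerator_h(T)))
```

For even T, this should report two distinct real roots of $T h'(ρ)(ρ - 1)^{3}$ (at $ρ = 1$ and $ρ = -1$) and one of $T h(ρ)(ρ - 1)^{2}$ (at $ρ = 1$). For odd T, it should report one and two, respectively.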

We can write

h(ρ) = \frac{ρ^{T} - T ρ + T - 1}{T (ρ - 1)^{2}}

and the Sturm sequence of $T h(ρ) (ρ - 1)^{2}$ is

\left\{ T h(ρ) (ρ - 1) = \sum_{t = 1}^{T - 1} ρ^{t} - (T - 1), \; \frac{1}{ρ - 1} \frac{d [h(ρ) (ρ - 1)^{2}]}{d ρ} = \sum_{t = 0}^{T - 2} ρ^{t}, \; T - 1 \right\} .

(48)

From Table 14, we can see that $h(ρ)$ has no real roots and hence is positive when T is even. Table 15 shows that $h(ρ)$ has only one real root when T is odd. For odd T, $h(-2) = \frac{(-2)^{T} + 3T - 1}{9T} \leq 0$ and $h(-1) = \frac{T - 1}{2T} > 0$. Since $h(ρ)$ is also strictly increasing when T is an odd number no less than 3, the real root of $h(ρ) = 0$ must lie in $[-2, -1)$.


**Table 14.** Sturm sequence of $T h (ρ) {(ρ - 1)}^{2}$ when T is even and greater than or equal to 3.
ρ	$h (ρ)$	$\sum_{t = 1}^{T - 1} ρ^{t} - T + 1$	$\sum_{t = 0}^{T - 2} ρ^{t}$	$T - 1$
−∞	+	−	+	+
∞	+	+	+	+


**Table 15.** Sturm sequence of $T h (ρ) {(ρ - 1)}^{2}$ when T is odd and greater than or equal to 3.
ρ	$h (ρ)$	$\sum_{t = 1}^{T - 1} ρ^{t} - T + 1$	$\sum_{t = 0}^{T - 2} ρ^{t}$	$T - 1$
−∞	−	+	−	+
∞	+	+	+	+

☐
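The sketch below is a numerical cross-check of Lemma 9, not a substitute for the proof. It uses the polynomial form $h(ρ) = \sum_{t=0}^{T-2} \frac{T - t - 1}{T} ρ^{t}$. This form follows from integrating the expression for $h'(ρ)$ given in the proof, with $h(0) = \frac{T - 1}{T}$, and it agrees with $h(ρ) = \frac{ρ^{T} - Tρ + T - 1}{T(ρ - 1)^{2}}$ away from $ρ = 1$. The sketch locates the real roots of $h$ and $h'$ with NumPy's `numpy.polynomial.polynomial` routines. The function names, and the tolerance used to classify a computed root as real, are illustrative choices.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def h_coeffs(T):
    """h(rho) = sum_{t=0}^{T-2} (T-t-1)/T * rho^t, constant term first."""
    return np.array([(T - t - 1) / T for t in range(T - 1)])


def h_closed_form(rho, T):
    return (rho**T - T * rho + T - 1) / (T * (rho - 1) ** 2)


def real_roots(coeffs, tol=1e-7):
    """Real roots of a polynomial; roots with |imaginary part| < tol count as real."""
    roots = np.asarray(P.polyroots(coeffs))
    return np.sort(roots[np.abs(roots.imag) < tol].real)


def check_lemma9(T):
    c = h_coeffs(T)
    dc = P.polyder(c)
    h_roots, dh_roots = real_roots(c), real_roots(dc)
    if T % 2 == 1:
        # odd T: a single real root of h in [-2, -1); h' has no real root and h'(0) > 0
        return bool(len(h_roots) == 1 and -2 - 1e-9 <= h_roots[0] < -1
                    and len(dh_roots) == 0 and P.polyval(0.0, dc) > 0)
    # even T: h has no real root; h' vanishes only at rho = -1
    return bool(len(h_roots) == 0 and len(dh_roots) == 1 and abs(dh_roots[0] + 1) < 1e-6)


print({T: check_lemma9(T) for T in range(3, 13)})
```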

Additionally, we need the following lemma to show that the true value of ρ is a local posterior mode asymptotically.

Lemma 10. Under Assumptions 1 and 2, we have $\operatorname*{plim}_{N \to ∞} \frac{a}{N} = \underline{a} > σ^{2} \operatorname{trace}(C' H C)$, where ρ in C is evaluated at its true value $\underline{ρ}$. We also have:

- $\operatorname{trace}(C' H C) \geq \frac{2 h^{2}(\underline{ρ})}{T - 1}$, where equality holds only for $T = 2$;
- $\operatorname{trace}(C' H C) \geq \frac{2 h^{2}(\underline{ρ})}{T - 1} + h'(\underline{ρ})$, where equality holds for $T = 2$ or $\underline{ρ} = 1$.

In other words, the following are true:

\underline{a} > \frac{2 σ^{2} h^{2}(\underline{ρ})}{T - 1},

(49)

\underline{a} > \frac{2 σ^{2} h^{2}(\underline{ρ})}{T - 1} + h'(\underline{ρ}) σ^{2} .

(50)

Proof. Substituting Equation (3) into the right-hand side of Equation (15) gives

\begin{matrix} a = \sum_{i = 1}^{N} {(f_{i} ζ_{1} + y_{i, 0} ζ_{2} + C X_{i} β + C u_{i})}^{'} H (f_{i} ζ_{1} + y_{i, 0} ζ_{2} + C X_{i} β + C u_{i}) \\ - & \frac{\sum_{i = 1}^{N} [{(f_{i} ζ_{1} + y_{i, 0} ζ_{2} + C X_{i} β + C u_{i})}^{'} H X_{i}] {(\sum_{i = 1}^{N} X_{i}^{'} H X_{i})}^{- 1} \sum_{i = 1}^{N} [X_{i}^{'} H (f_{i} ζ_{1} + y_{i, 0} ζ_{2} + C X_{i} β + C u_{i})]}{η + 1} . \end{matrix}

(51)

Since we assume $E(u_{i} | X_{i}, f_{i}, y_{i,0}) = 0$ and set $η = O(N^{α})$ with $α < 0$, $\frac{a}{N}$ is asymptotically equivalent to $\frac{\tilde{a}}{N}$, where $\tilde{a}$ is defined as

\begin{aligned} \tilde{a} = {} & \sum_{i = 1}^{N} u_{i}' C' H C u_{i} + \sum_{i = 1}^{N} (f_{i} ζ_{1} + y_{i,0} ζ_{2} + C X_{i} β)' H (f_{i} ζ_{1} + y_{i,0} ζ_{2} + C X_{i} β) \\ & - \sum_{i = 1}^{N} [(f_{i} ζ_{1} + y_{i,0} ζ_{2} + C X_{i} β)' H X_{i}] \Big(\sum_{i = 1}^{N} X_{i}' H X_{i}\Big)^{-1} \sum_{i = 1}^{N} [X_{i}' H (f_{i} ζ_{1} + y_{i,0} ζ_{2} + C X_{i} β)] \\ = {} & \sum_{i = 1}^{N} u_{i}' C' H C u_{i} + \sum_{i = 1}^{N} (y_{i\_} - C u_{i})' H (y_{i\_} - C u_{i}) \\ & - \sum_{i = 1}^{N} [(y_{i\_} - C u_{i})' H X_{i}] \Big(\sum_{i = 1}^{N} X_{i}' H X_{i}\Big)^{-1} \sum_{i = 1}^{N} [X_{i}' H (y_{i\_} - C u_{i})] . \end{aligned}

(52)

Note that $\frac{1}{N}(\tilde{a} - \sum_{i = 1}^{N} u_{i}' C' H C u_{i})$ is non-negative. It equals $\frac{1}{N}$ multiplied by the SSR obtained by regressing $f_{i} ζ_{1} + y_{i,0} ζ_{2} + C X_{i} β$ on fixed effects and $X_{i}$, i.e.,

f_{i} ζ_{1} + y_{i, 0} ζ_{2} + C X_{i} β = q_{i} ι + X_{i} ϑ + ε_{i},

(53)

where $q_{i}$ denotes the fixed effect scalar and $ε_{i} = (ε_{i1}, ε_{i2}, \dots, ε_{iT})'$ is the error term in the regression. Assumption 2 (e) rules out $\operatorname*{plim}_{N \to ∞} \frac{1}{N} \sum_{i = 1}^{N} ε_{i}' ε_{i} = 0$. Note that $(1 - ρ) ζ_{1} + ζ_{2} = ι$. When $T \geq 3$, pre-multiplying both sides of Equation (53) by $M_{ζ}$ yields

M_{ζ} C X_{i} β = M_{ζ} X_{i} ϑ + M_{ζ} ε_{i} .

(54)

Therefore one can use Equation (5) to check Assumption 2 (e). That assumption ensures that the $\frac{SSR}{N}$ from Equation (53) is strictly positive asymptotically, so that

\underline{a} > \operatorname*{plim}_{N \to ∞} \frac{1}{N} \sum_{i = 1}^{N} u_{i}' C' H C u_{i} = σ^{2} \operatorname{trace}(C' H C) .

(55)

Hence Equation (49) holds if $\operatorname{trace}(C' H C) \geq \frac{2 h^{2}(\underline{ρ})}{T - 1}$. Similarly, to prove Equation (50), one needs to show $\operatorname{trace}(C' H C) \geq \frac{2 h^{2}(\underline{ρ})}{T - 1} + h'(\underline{ρ})$.

The proofs of these two inequalities can be found in the proof of Lemma 3 in Dhaene and Jochmans [8]. Their notation maps onto ours as follows: $V_{0}^{LB} = \frac{\operatorname{trace}(C' H C)}{T - 1}$, $b_{0} = - \frac{h(\underline{ρ})}{T - 1}$ and $c_{0} = - \frac{h'(\underline{ρ})}{T - 1}$. ☐
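The two trace inequalities in Lemma 10 can be checked numerically on a grid of ρ values. The sketch below assumes a specific form for C: the $T \times T$ matrix that maps the errors into the lagged dependent variable. Its $(t, s)$ element is $ρ^{t - 1 - s}$ for $s < t$ (zero-based indices), with zeros elsewhere. This form is consistent with $-\operatorname{trace}(C'H) = h(ρ)$ (footnote 6) and with the series for $\operatorname{trace}(C'HC)$ used in Equation (67). It remains an assumption of the illustration rather than a quotation of the paper's definition. The function names are hypothetical.

```python
import numpy as np


def lag_map(rho, T):
    """ASSUMED form of C: row t (y_{i,t}, t = 0..T-1) loads on u_{i,s+1} with rho**(t-1-s), s < t."""
    C = np.zeros((T, T))
    for t in range(T):
        for s in range(t):
            C[t, s] = rho ** (t - 1 - s)
    return C


def within_matrix(T):
    """H = I_T - iota iota' / T."""
    return np.eye(T) - np.ones((T, T)) / T


def h(rho, T):
    return sum((T - t - 1) / T * rho**t for t in range(T - 1))


def h_prime(rho, T):
    return sum(t * (T - t - 1) / T * rho ** (t - 1) for t in range(1, T - 1))


def trace_CHC(rho, T):
    C = lag_map(rho, T)
    return np.trace(C.T @ within_matrix(T) @ C)


def trace_CHC_series(rho, T):
    """Series form implied by Eq. (67): sum (T-j-1) rho^(2j) - sum (sum_{i<=j} rho^i)^2 / T."""
    s1 = sum((T - j - 1) * rho ** (2 * j) for j in range(T - 1))
    s2 = sum(sum(rho**i for i in range(j + 1)) ** 2 for j in range(T - 1))
    return s1 - s2 / T


def lemma10_gaps(rho, T):
    """Return (trace - 2h^2/(T-1), trace - 2h^2/(T-1) - h'); both should be >= 0."""
    tr = trace_CHC(rho, T)
    base = 2 * h(rho, T) ** 2 / (T - 1)
    return tr - base, tr - base - h_prime(rho, T)


for T in (2, 3, 6):
    print(T, [round(float(g), 4) for g in lemma10_gaps(0.5, T)])
```

Both gaps should be non-negative on the grid. They vanish for $T = 2$, and the second gap also vanishes at $ρ = 1$, matching the equality cases stated in the lemma.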

From Equation (14), we can see that $ψ(ρ) = ϕ(ρ) - l(ρ)$ with $l(ρ) = \frac{T - 1}{2} \ln(\frac{a}{N} ρ^{2} - 2 \frac{b}{N} ρ + \frac{c}{N})$. The posterior mode is found at the intersection points of $h(ρ)$, i.e., $\frac{d ϕ(ρ)}{d ρ}$, and

l'(ρ) = \frac{(T - 1) (\frac{a}{N} ρ - \frac{b}{N})}{\frac{a}{N} ρ^{2} - 2 \frac{b}{N} ρ + \frac{c}{N}} .

(56)

Assume that the true exogenous regressors are included, so that $h_{2}(β, \underline{ρ}) = h_{3}(β) = 0$. Using Equations (27) to (30), we find that the probability limit of $l'(ρ)$, denoted $\underline{l}'(ρ)$, is

\underline{l}'(ρ) = \operatorname*{plim}_{N \to ∞} l'(ρ) = \frac{(T - 1) (ρ - \underline{ρ} - γ)}{(ρ - \underline{ρ} - γ)^{2} + \frac{σ^{2}}{\underline{a}^{2}} [\underline{a} (T - 1) - σ^{2} h^{2}(\underline{ρ})]} ,

(57)

where $γ = - \frac{σ^{2} h(\underline{ρ})}{\underline{a}}$, so that $\underline{ρ} + γ$ is the probability limit of the MLE $\hat{ρ} = \frac{b}{a}$.

From Equation (49) in Lemma 10, we know that $\frac{σ^{2}}{\underline{a}^{2}} [\underline{a} (T - 1) - σ^{2} h^{2}(\underline{ρ})] > \frac{σ^{4} h^{2}(\underline{ρ})}{\underline{a}^{2}} \geq 0$. Hence the denominator in Equation (57) is positive, and $\underline{l}'(ρ) \geq (<) \; 0$ for $ρ \geq (<) \; \underline{ρ} + γ$. Moreover,

\underline{l}''(ρ) = \operatorname*{plim}_{N \to ∞} l''(ρ) = -(T - 1) \frac{(ρ - \underline{ρ} - γ)^{2} - \frac{σ^{2}}{\underline{a}^{2}} [\underline{a} (T - 1) - σ^{2} h^{2}(\underline{ρ})]}{\left\{(ρ - \underline{ρ} - γ)^{2} + \frac{σ^{2}}{\underline{a}^{2}} [\underline{a} (T - 1) - σ^{2} h^{2}(\underline{ρ})]\right\}^{2}} .

(58)

The denominator above is positive. The polynomial in the numerator has two roots, $\underline{ρ} + γ \pm \sqrt{\frac{σ^{2}}{\underline{a}^{2}} [\underline{a} (T - 1) - σ^{2} h^{2}(\underline{ρ})]}$. Using Equation (49) in Lemma 10 again, we have

\underline{ρ} + γ + \sqrt{\frac{σ^{2}}{\underline{a}^{2}} [\underline{a} (T - 1) - σ^{2} h^{2}(\underline{ρ})]} > \underline{ρ} - \frac{σ^{2} h(\underline{ρ})}{\underline{a}} + \frac{σ^{2}}{\underline{a}} |h(\underline{ρ})|

and

\underline{ρ} + γ - \sqrt{\frac{σ^{2}}{\underline{a}^{2}} [\underline{a} (T - 1) - σ^{2} h^{2}(\underline{ρ})]} < \underline{ρ} - \frac{σ^{2} h(\underline{ρ})}{\underline{a}} - \frac{σ^{2}}{\underline{a}} |h(\underline{ρ})| .

Therefore the true value of ρ, i.e., $\underline{ρ}$, lies between the two roots of $\underline{l}''(ρ) = 0$, where $\underline{l}'(ρ)$ is increasing.¹⁴ When ρ is larger (smaller) than the bigger (smaller) root, $\underline{l}'(ρ)$ is decreasing.

Recall that $\underline{l}'(ρ) \geq (<) \; 0$ for $ρ \geq (<) \; \underline{ρ} + γ$. Because the numerator of Equation (57) is linear in ρ and its denominator is quadratic, $\lim_{ρ \to \pm ∞} \underline{l}'(ρ) = 0$. The limit is approached from above as $ρ \to ∞$ and from below as $ρ \to -∞$. Since $\underline{l}'(\underline{ρ}) = h(\underline{ρ})$, $h(ρ)$ intersects $\underline{l}'(ρ)$ at $\underline{ρ}$.

Define $\underline{ψ}(ρ) = ϕ(ρ) - \operatorname*{plim}_{N \to ∞} l(ρ)$. Evaluating its second-order derivative at the true value of ρ yields $\underline{ψ}''(\underline{ρ}) = h'(\underline{ρ}) - \frac{(T - 1) \underline{a} - 2 σ^{2} h^{2}(\underline{ρ})}{(T - 1) σ^{2}}$. Using Equation (50) in Lemma 10, we find $\underline{ψ}''(\underline{ρ}) < 0$. In other words, $\underline{ρ}$ is a local maximum of $\underline{ψ}(ρ)$.

When T is even, $h(ρ) > 0$ is increasing for $ρ > -1$, while $\underline{l}'(ρ)$ is decreasing beyond the bigger root of $\underline{l}''(ρ) = 0$. Hence $h(ρ)$ must intersect $\underline{l}'(ρ)$ again, at a point $ρ_{U}$ (see Figure 1) that is larger than both $\underline{ρ}$ and the bigger root of $\underline{l}''(ρ) = 0$. It follows that $\underline{ψ}(ρ)$ has a bell shape over $(-∞, ρ_{U}]$ when T is even. However, for $ρ > ρ_{U}$, $\underline{ψ}(ρ)$ is increasing and $\lim_{ρ \to ∞} \underline{ψ}(ρ) = ∞$. To ensure that the marginal posterior of ρ is proper, we have to restrict the range of ρ in our estimation.

Similarly, when T is odd, $h(ρ)$ is strictly increasing, so it must also intersect $\underline{l}'(ρ)$ to the left of $\underline{ρ}$. There are then three intersection points, as shown in Figure 2. Since $\underline{ψ}'(ρ) < 0$ for $ρ < ρ_{L}$, $\underline{ψ}(ρ)$ is decreasing there and $\lim_{ρ \to -∞} \underline{ψ}(ρ) = ∞$ when T is odd.

Choosing $ρ_{L}$ and $ρ_{U}$ in the way described by Proposition 4 ensures that $\underline{ψ}(ρ)$ evaluated at the end points is smaller than $\underline{ψ}(\underline{ρ})$, and that the marginal posterior density of ρ is proper.
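The intersections of $h(ρ)$ and $\underline{l}'(ρ)$ shown in Figures 1 and 2 can be located numerically. The Python sketch below is illustrative only and rests on these assumptions:

- **Data generating process.** It assumes $y_{i,t} = ρ y_{i,t-1} + f_{i} + u_{i,t}$ without exogenous regressors. Under this process, $f_{i}$ and $y_{i,0}$ enter the lagged dependent variable with loadings $\sum_{s=0}^{t-1} ρ^{s}$ and $ρ^{t}$, respectively.
- **Computing $\underline{a}$.** It evaluates $\underline{a}$ for the settings listed in the figure captions.
- **Root finding.** It finds the roots of $h(ρ) - \underline{l}'(ρ)$ from Equation (57), with $γ = -σ^{2} h(\underline{ρ}) / \underline{a}$, by a grid scan followed by bisection.
- **Names.** The function names `a_bar`, `l_prime_bar` and `intersections` are hypothetical.

```python
import numpy as np


def h(rho, T):
    return sum((T - t - 1) / T * rho**t for t in range(T - 1))


def a_bar(rho0, T, sigma2, sigma2_f, sigma2_y0):
    """plim a/N with no exogenous regressors, under the ASSUMED DGP
    y_it = rho0 * y_i,t-1 + f_i + u_it and E(f_i) = E(y_i0) = E(f_i y_i0) = 0."""
    g_f = np.array([sum(rho0**s for s in range(k)) for k in range(T)], dtype=float)
    g_0 = np.array([rho0**k for k in range(T)], dtype=float)
    C = np.array([[rho0 ** (t - 1 - s) if s < t else 0.0 for s in range(T)]
                  for t in range(T)], dtype=float)
    H = np.eye(T) - np.ones((T, T)) / T
    return (sigma2_f * g_f @ H @ g_f + sigma2_y0 * g_0 @ H @ g_0
            + sigma2 * np.trace(C.T @ H @ C))


def l_prime_bar(rho, rho0, T, sigma2, a):
    """Eq. (57) with gamma = -sigma^2 h(rho0) / a."""
    gamma = -sigma2 * h(rho0, T) / a
    K = sigma2 / a**2 * (a * (T - 1) - sigma2 * h(rho0, T) ** 2)
    x = rho - rho0 - gamma
    return (T - 1) * x / (x**2 + K)


def intersections(rho0, T, sigma2=1.0, sigma2_f=1.0, sigma2_y0=0.1,
                  lo=-10.0, hi=10.0, n=20_000):
    """Roots of h(rho) - l_prime_bar(rho) on [lo, hi]: grid scan, then bisection."""
    a = a_bar(rho0, T, sigma2, sigma2_f, sigma2_y0)

    def g(r):
        return h(r, T) - l_prime_bar(r, rho0, T, sigma2, a)

    grid = np.linspace(lo, hi, n)
    vals = g(grid)
    roots = []
    for i in np.nonzero(vals[:-1] * vals[1:] < 0)[0]:
        left, right, g_left = grid[i], grid[i + 1], vals[i]
        for _ in range(60):
            mid = 0.5 * (left + right)
            g_mid = g(mid)
            if g_left * g_mid <= 0:
                right = mid
            else:
                left, g_left = mid, g_mid
        roots.append(0.5 * (left + right))
    return roots


print("T = 6:", np.round(intersections(1.0, 6), 3))  # Figure 1 setting
print("T = 5:", np.round(intersections(1.0, 5), 3))  # Figure 2 setting
```

For the Figure 1 setting ($T = 6$), this should give two intersection points: one at $\underline{ρ} = 1$ and one at some $ρ_{U} > 1$. For the Figure 2 setting ($T = 5$), it should give three, the leftmost of which lies below $-1$.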

Figure 1. Intersection points of $h(ρ)$ and $\underline{l}'(ρ)$ when $T = 6$, $\underline{ρ} = 1$, $σ^{2} = 1$, $σ_{f}^{2} = 1$, $σ_{y_{0}}^{2} = 0.1$, $E(f_{i}) = E(y_{i,0}) = E(f_{i} y_{i,0}) = 0$ and there are no exogenous regressors.


Figure 2. Intersection points of $h(ρ)$ and $\underline{l}'(ρ)$ when $T = 5$, $\underline{ρ} = 1$, $σ^{2} = 1$, $σ_{f}^{2} = 1$, $σ_{y_{0}}^{2} = 0.1$, $E(f_{i}) = E(y_{i,0}) = E(f_{i} y_{i,0}) = 0$ and there are no exogenous regressors.


E. Proof of Proposition 6

To prove Propositions 6 and 7, we essentially need to simplify the integrals that appear in the Bayes factors. One way to do so is Laplace's method, the details of which can be found in [20,21]. Assume that $ψ'(ρ) = 0$ has only one solution $ρ^{*}$ in $(ρ_{L}, ρ_{U})$, with $ψ''(ρ^{*}) < 0$. The integral appearing in the Bayes factor can then be written as

\int_{ρ_{L}}^{ρ_{U}} \exp[N ψ(ρ)] \, d ρ = \sqrt{\frac{2 π}{N |ψ''(ρ^{*})|}} \exp[N ψ(ρ^{*})] \left(1 + O\left(\frac{1}{N}\right)\right)

(59)
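The accuracy of Equation (59) can be illustrated on a toy integrand. In the sketch below, `psi` is a hypothetical smooth function with a unique interior mode; it is not the ψ of the paper. The sketch compares the leading Laplace term with a fine-grid trapezoidal value of the same integral. The relative error should shrink roughly in proportion to $1/N$, in line with the $O(1/N)$ remainder.

```python
import math

import numpy as np


def psi(rho):
    """Toy log-integrand (hypothetical, not the paper's psi): unique interior mode at 0.2."""
    x = rho - 0.2
    return -0.5 * x**2 - 0.25 * x**4


RHO_STAR, PSI2_AT_MODE = 0.2, -1.0  # mode and psi''(mode) of the toy psi


def laplace(N):
    """Leading term of Eq. (59): sqrt(2 pi / (N |psi''|)) * exp(N psi(rho*))."""
    return math.sqrt(2 * math.pi / (N * abs(PSI2_AT_MODE))) * math.exp(N * psi(RHO_STAR))


def trapezoid(N, lo=-2.0, hi=2.0, n=400_001):
    """Reference value of the integral of exp(N psi(rho)) over [lo, hi] on a fine grid."""
    grid = np.linspace(lo, hi, n)
    vals = np.exp(N * psi(grid))
    step = grid[1] - grid[0]
    return step * (vals.sum() - 0.5 * (vals[0] + vals[-1]))


rel_err = {N: float(laplace(N) / trapezoid(N) - 1) for N in (50, 500)}
print(rel_err)
```

The relative error should fall by roughly a factor of ten when N increases from 50 to 500.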

Building on Equation (18), the first- and second-order derivatives of $\underline{ψ}(ρ)$ are

\underline{ψ}'(ρ) = h(ρ) - \frac{(T - 1) (ρ - \underline{ρ} - γ)}{ρ^{2} - 2 ρ (\underline{ρ} + γ) + \underline{ρ}^{2} + 2 \underline{ρ} γ + \frac{(T - 1) σ^{2} + h_{3}(β)}{\underline{a}}},

(60)

\underline{ψ}''(ρ) = h'(ρ) - \frac{(T - 1) \left[ρ^{2} - 2 ρ (\underline{ρ} + γ) + \underline{ρ}^{2} + 2 \underline{ρ} γ + \frac{(T - 1) σ^{2} + h_{3}(β)}{\underline{a}} - 2 (ρ - \underline{ρ} - γ)^{2}\right]}{\left[ρ^{2} - 2 ρ (\underline{ρ} + γ) + \underline{ρ}^{2} + 2 \underline{ρ} γ + \frac{(T - 1) σ^{2} + h_{3}(β)}{\underline{a}}\right]^{2}} .

(61)

If the chosen set of regressors leads to consistent estimation of ρ, i.e., either Equation (20) or (21) is satisfied, evaluating Equations (18), (60) and (61) at $\underline{ρ}$ gives

\begin{aligned} \underline{ψ}(\underline{ρ}) & = ϕ(\underline{ρ}) - \frac{T - 1}{2} \ln [(T - 1) σ^{2} + h_{3}(β)], \\ \underline{ψ}'(\underline{ρ}) & = 0, \\ \underline{ψ}''(\underline{ρ}) & = h'(\underline{ρ}) - \frac{\underline{a} (T - 1)}{(T - 1) σ^{2} + h_{3}(β)} + \frac{2 h^{2}(\underline{ρ})}{T - 1} . \end{aligned}

The Bayes factor in Equation (25) is

\frac{p (Y | Y_{0}, M_{1})}{p (Y | Y_{0}, M_{0})} = {(\frac{η}{η + 1})}^{\frac{k_{1} - k_{0}}{2}} \sqrt{\frac{2 π}{N |ψ^{''} (ρ_{| M_{1}}^{*} | M_{1})|}} \frac{exp [N ψ (ρ_{| M_{1}}^{*} | M_{1})]}{(ρ_{U_{1}} - ρ_{L_{1}}) {(\frac{c_{| M_{0}}}{N})}^{- \frac{N (T - 1)}{2}}} (1 + O (\frac{1}{N})) .

(62)

Asymptotically, the analysis of the Bayes factor should not be affected by two replacements. The first replaces $ψ(ρ^{*}_{|M_{1}} | M_{1})$, $ψ''(ρ^{*}_{|M_{1}} | M_{1})$ and $\frac{c_{|M_{0}}}{N}$ by their probability limits. The second replaces $\frac{η}{η + 1}$ by $O(N^{α})$ with $α < 0$ (our prior choice for η). Define

ξ_{10} = \frac{O(N^{\frac{α (k_{1} - k_{0})}{2}})}{ρ_{U_{1}} - ρ_{L_{1}}} \sqrt{\frac{2 π}{N |\underline{ψ}''(ρ^{*}_{|M_{1}} | M_{1})|}} \exp\left[N \left(\underline{ψ}(ρ^{*}_{|M_{1}} | M_{1}) + \frac{T - 1}{2} \ln \underline{c}_{|M_{0}}\right)\right],

(63)

which should have the same asymptotic behaviour as Equation (62). If $X_{i1}$ is the true set of regressors generating Y (so that $h_{2}(β, \underline{ρ} | M_{1}) = h_{3}(β | M_{1}) = 0$ and $ρ^{*}_{|M_{1}} = \underline{ρ}$), $ξ_{10}$ can be written as

ξ_{10} = \frac{O(N^{\frac{α (k_{1} - k_{0})}{2}})}{ρ_{U_{1}} - ρ_{L_{1}}} \sqrt{\frac{2 π}{N \left|h'(\underline{ρ}) - \frac{\underline{a}_{|M_{1}}}{σ^{2}} + \frac{2 h^{2}(\underline{ρ})}{T - 1}\right|}} \exp\left\{N ϕ(\underline{ρ}) + \frac{N (T - 1)}{2} \ln \left[\frac{\underline{c}_{|M_{0}}}{(T - 1) σ^{2}}\right]\right\} .

(64)

So we can guarantee that $\frac{p(Y | Y_{0}, M_{1})}{p(Y | Y_{0}, M_{0})}$ tends to infinity given $\underline{ρ} \neq 0$, as long as Equation (31) holds.

Now let us consider the case where the true model is $M_{0}$ in Equation (25), i.e., $\underline{ρ} = 0$ and $X_{i0}$ is the set of true regressors. Then $ξ_{10}$ takes the following form,

\begin{matrix} ξ_{10} = & \frac{{[(T - 1) σ^{2}]}^{\frac{N (T - 1)}{2}}}{ρ_{U_{1}} - ρ_{L_{1}}} O (N^{\frac{α (k_{1} - k_{0})}{2}}) \sqrt{\frac{2 π}{N |{\underset{̲}{ψ}}^{''} (ρ_{| M_{1}}^{*} | M_{1})|}} exp [N ψ (ρ_{| M_{1}}^{*} | M_{1})] \\ = & O (N^{\frac{α (k_{1} - k_{0})}{2}}) \sqrt{\frac{2 π}{N |{\underset{̲}{ψ}}^{''} (ρ_{| M_{1}}^{*} | M_{1})|}} \frac{exp [N ϕ (ρ_{| M_{1}}^{*}) + \frac{N (T - 1)}{2} ln \frac{(T - 1) σ^{2}}{d (ρ_{| M_{1}}^{*} | M_{1})}]}{ρ_{U_{1}} - ρ_{L_{1}}} . \end{matrix}

(65)

If Equation (32) holds, then the Bayes factor in Equation (25) will tend to 0 for large sample sizes. If $M_{1}$ is misspecified but ρ can be consistently estimated, i.e., $ρ^{*}_{|M_{1}} = 0$, Equation (65) simplifies to

ξ_{10} = O(N^{\frac{α (k_{1} - k_{0})}{2}}) \sqrt{\frac{2 π}{N |\underline{ψ}''(0 | M_{1})|}} \frac{\exp\left\{\frac{N (T - 1)}{2} \ln \left[\frac{(T - 1) σ^{2}}{(T - 1) σ^{2} + h_{3}(β | M_{1})}\right]\right\}}{ρ_{U_{1}} - ρ_{L_{1}}} .

(66)

If Equation (21) holds, some true regressors must be excluded from $M_{1}$, and hence Equation (32) is true with $h_{3}(β | M_{1}) > 0$. If Equation (20) holds, we have $h_{3}(β | M_{1}) = 0$ and $k_{1} \geq k_{0}$. In both cases, the Bayes factor will tend to 0 as N tends to infinity.

Finally, we show when Equation (31) can be violated. If $M_{0}$ includes only the true exogenous regressors, we should have $h_{2}(β, \underline{ρ} | M_{0}) = h_{3}(β | M_{0}) = 0$ and $\underline{a}_{|M_{0}} > σ^{2} \operatorname{trace}(C' H C)$, as in Equation (55). The following should then be true:

\begin{aligned} & ϕ(\underline{ρ}) + \frac{T - 1}{2} \ln \left[\frac{\underline{a}_{|M_{0}} \underline{ρ}^{2} - 2 \underline{ρ} σ^{2} h(\underline{ρ})}{(T - 1) σ^{2}} + 1\right] > υ(\underline{ρ}) \\ & = ϕ(\underline{ρ}) + \frac{T - 1}{2} \ln \left[1 + \frac{\underline{ρ}^{2} \sum_{j = 0}^{T - 2} (T - j - 1) \underline{ρ}^{2 j}}{T - 1} - \frac{\underline{ρ}^{2} \sum_{j = 0}^{T - 2} (\sum_{i = 0}^{j} \underline{ρ}^{i})^{2}}{T (T - 1)} - \frac{2 \underline{ρ} h(\underline{ρ})}{T - 1}\right] . \end{aligned}

(67)

When $T = 2$, the right-hand side of Equation (67), $υ(\underline{ρ})$, becomes $\frac{\underline{ρ}}{2} + \frac{1}{2} \ln(1 - \underline{ρ} + \frac{\underline{ρ}^{2}}{2})$. Its derivative, $\frac{\underline{ρ}^{2} / 4}{1 - \underline{ρ} + \underline{ρ}^{2} / 2}$, is non-negative, so it is an increasing function with $υ(0) = 0$. Hence $υ(\underline{ρ})$ is less than 0 when $\underline{ρ} < 0$. The left-hand side of Equation (67) can be negative for $\underline{ρ} < 0$ if $σ^{2}$ is much larger than $E(f_{i}^{2})$ and $E(y_{i,0}^{2})$.¹⁵

When T is an odd number greater than or equal to 3, $υ(ρ)$ is positive for all real $ρ \neq 0$ (with $υ(0) = 0$). When T is even and greater than 2, $υ(ρ)$ is positive for $ρ \in (-1, 0) \cup (0, ∞)$ and has a root less than $-1$. If $\underline{ρ}$ is less than this root, $υ(\underline{ρ})$ will be negative. By direct calculation, we find that as T increases, the root of $υ(ρ)$ gets closer to $-1$ from the left.

To sum up, Equation (31) holds for $\underline{ρ} \in (-1, ∞)$ whenever T is an integer greater than or equal to 3 and $h_{2}(β, \underline{ρ} | M_{0}) = h_{3}(β | M_{0}) = 0$.
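The behaviour of $υ(ρ)$ described above can be explored numerically. The sketch below evaluates the right-hand side of Equation (67). Since ϕ is defined outside this appendix, the sketch assumes $ϕ(ρ) = \int_{0}^{ρ} h(s) \, ds$, which reproduces $ϕ(ρ) = ρ/2$ in the $T = 2$ case above. Under that assumption, the sketch:

- checks the closed form for $T = 2$;
- checks that $υ(0) = 0$;
- uses bisection to locate the root below $-1$ for even T.

The function names are hypothetical.

```python
import math


def h(rho, T):
    return sum((T - t - 1) / T * rho**t for t in range(T - 1))


def phi(rho, T):
    """ASSUMPTION: phi(rho) = integral of h from 0 to rho (gives phi = rho/2 for T = 2)."""
    return sum((T - t - 1) / (T * (t + 1)) * rho ** (t + 1) for t in range(T - 1))


def trace_CHC(rho, T):
    s1 = sum((T - j - 1) * rho ** (2 * j) for j in range(T - 1))
    s2 = sum(sum(rho**i for i in range(j + 1)) ** 2 for j in range(T - 1))
    return s1 - s2 / T


def upsilon(rho, T):
    """Right-hand side of Eq. (67); the log argument is positive by Lemma 10."""
    inside = 1 + rho**2 * trace_CHC(rho, T) / (T - 1) - 2 * rho * h(rho, T) / (T - 1)
    return phi(rho, T) + (T - 1) / 2 * math.log(inside)


def root_below_minus_one(T, lo=-50.0, hi=-1.0, iters=200):
    """Bisection for a root of upsilon on [lo, hi] (even T; needs a sign change)."""
    assert upsilon(lo, T) < 0 < upsilon(hi, T)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if upsilon(mid, T) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for T in (4, 6, 8):
    print(T, round(root_below_minus_one(T), 4))
```

The printed roots for $T = 4, 6, 8$ can be used to inspect how the root below $-1$ moves as T grows.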

F. Proof of Proposition 7

By Laplace's method, we can write Equation (26) as

\begin{matrix} \frac{p (Y | Y_{0}, X_{1}, M_{1})}{p (Y | Y_{0}, X_{2}, M_{2})} = & \frac{ρ_{U_{2}} - ρ_{L_{2}}}{ρ_{U_{1}} - ρ_{L_{1}}} {(\frac{η}{η + 1})}^{\frac{k_{1} - k_{2}}{2}} \sqrt{\frac{ψ^{''} (ρ_{| M_{2}}^{*} | M_{2})}{ψ^{''} (ρ_{| M_{1}}^{*} | M_{1})}} \\ exp [N (ψ (ρ_{| M_{1}}^{*} | M_{1}) - ψ (ρ_{| M_{2}}^{*} | M_{2}))] (1 + O (\frac{1}{N})) . \end{matrix}

(68)

Suppose the true model is $M_{2}$. As in the previous section, we drop the factor $\frac{ρ_{U_{2}} - ρ_{L_{2}}}{ρ_{U_{1}} - ρ_{L_{1}}} \sqrt{\left|\frac{\underline{ψ}''(ρ^{*}_{|M_{2}} | M_{2})}{\underline{ψ}''(\underline{ρ} | M_{1})}\right|} = O(1)$ and obtain the corresponding $ξ_{12}$,

ξ_{12} = O (N^{\frac{α (k_{1} - k_{2})}{2}}) exp \{N [ϕ (ρ_{| M_{1}}^{*}) - ϕ (\underset{̲}{ρ}) + \frac{T - 1}{2} ln \frac{d (\underset{̲}{ρ} | M_{2})}{d (ρ_{| M_{1}}^{*} | M_{1})}]\} .

(69)

Note that $d(\underline{ρ} | M_{2}) = (T - 1) σ^{2}$. So if Equation (33) is satisfied, the Bayes factor is consistent in model selection. If $M_{1}$, despite being misspecified, can still lead to consistent estimation of ρ, $ξ_{12}$ becomes

ξ_{12} = O (N^{\frac{α (k_{1} - k_{2})}{2}}) {\{\frac{(T - 1) σ^{2}}{(T - 1) σ^{2} + h_{3} (β | M_{1})}\}}^{\frac{N (T - 1)}{2}} .

(70)

If Equation (21) holds, we will have $h_{3}(β | M_{1}) > 0$ and hence Equation (33). If Equation (20) holds, $M_{1}$ will nest $M_{2}$ ($k_{1} > k_{2}$) with $h_{3}(β | M_{1}) = 0$. In both cases, $\frac{p(Y | Y_{0}, X_{1}, M_{1})}{p(Y | Y_{0}, X_{2}, M_{2})}$ will tend to 0.

G. Proof of Proposition 8

The likelihood function takes the following form,

p(Y | θ, Y_{0}) = (2 π)^{- \frac{N T}{2}} (σ^{2})^{- \frac{N T}{2}} \prod_{i = 1}^{N} \exp\left\{- \frac{1}{2 σ^{2}} [y_{i} - y_{i\_} ρ - ι f_{i} - X_{i} β]' [y_{i} - y_{i\_} ρ - ι f_{i} - X_{i} β]\right\} .

(71)

Taking the log of the likelihood function and solving the first-order conditions, we obtain the following maximum likelihood estimators:

\hat{σ}^{2} = \frac{1}{N T} \sum_{i = 1}^{N} [y_{i} - y_{i\_} \hat{ρ} - ι \hat{f}_{i} - X_{i} \hat{β}]' [y_{i} - y_{i\_} \hat{ρ} - ι \hat{f}_{i} - X_{i} \hat{β}],

(72)

\hat{f}_{i} = \frac{ι' (y_{i} - y_{i\_} \hat{ρ} - X_{i} \hat{β})}{T},

(73)

\hat{β} = \left(\sum_{i = 1}^{N} X_{i}' H X_{i}\right)^{-1} \sum_{i = 1}^{N} X_{i}' H (y_{i} - y_{i\_} \hat{ρ}),

(74)

\hat{ρ} = \frac{b}{a} .

(75)
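The sketch below illustrates Equations (72) to (75) and the bias term γ. It simulates a panel without exogenous regressors from the assumed process $y_{i,t} = ρ y_{i,t-1} + f_{i} + u_{i,t}$. This process is an illustrative choice that reuses the variance settings of Figures 1 and 2; it is not the paper's simulation design. The sketch then:

1. computes $a$, $b$ and $c$ with $η = 0$;
2. compares $\hat{ρ} = b/a$ with $\underline{ρ} + γ$, where $γ = -σ^{2} h(\underline{ρ}) / \underline{a}$ and $\underline{a}$ is estimated by $a/N$.

With a moderate T, the MLE should stay well below the true value even for large N. The function names are hypothetical.

```python
import numpy as np


def simulate_panel(N, T, rho, sigma2=1.0, sigma2_f=1.0, sigma2_y0=0.1, seed=0):
    """ASSUMED DGP for illustration: y_it = rho*y_i,t-1 + f_i + u_it, no exogenous regressors."""
    rng = np.random.default_rng(seed)
    f = rng.normal(0.0, np.sqrt(sigma2_f), N)
    y = np.empty((N, T + 1))
    y[:, 0] = rng.normal(0.0, np.sqrt(sigma2_y0), N)
    u = rng.normal(0.0, np.sqrt(sigma2), (N, T))
    for t in range(1, T + 1):
        y[:, t] = rho * y[:, t - 1] + f + u[:, t - 1]
    return y[:, 1:], y[:, :-1]  # y_i and its first lag, both N x T


def demean(Z):
    """Row-wise within transformation, i.e. H z_i with H = I_T - iota iota'/T."""
    return Z - Z.mean(axis=1, keepdims=True)


def fixed_effects_mle(Y, Ylag):
    """MLE of rho with f_i profiled out and no exogenous regressors: rho_hat = b / a (Eq. (75))."""
    a = np.sum(demean(Ylag) * Ylag)  # sum_i y_{i-}' H y_{i-}
    b = np.sum(demean(Ylag) * Y)     # sum_i y_{i-}' H y_i
    c = np.sum(demean(Y) * Y)        # sum_i y_i' H y_i
    return b / a, a, b, c


def h(rho, T):
    return sum((T - t - 1) / T * rho**t for t in range(T - 1))


N, T, rho0, sigma2 = 20_000, 5, 0.5, 1.0
Y, Ylag = simulate_panel(N, T, rho0, sigma2)
rho_hat, a, b, c = fixed_effects_mle(Y, Ylag)
gamma = -sigma2 * h(rho0, T) / (a / N)  # gamma = -sigma^2 h(rho) / a_bar, a_bar estimated by a/N
sigma2_hat = (c - b**2 / a) / (N * T)   # Eq. (72) with f_i and rho profiled out
print(round(rho_hat, 3), round(rho0 + gamma, 3), round(sigma2_hat, 3))
```

Because $H$ is idempotent, $\sum_{i} y_{i\_}' H y_{i}$ can be computed as the sum of elementwise products of the demeaned lag and the raw series. This is what `fixed_effects_mle` does, avoiding explicit $T \times T$ matrices.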

Substitute Equations (72) to (75) into the log of (71), multiply by $-2$, and add the appropriate penalty (the number of parameters multiplied by the natural log of the sample size). This yields the BIC equations in (34) and (35). A smaller BIC value indicates evidence in favour of the model.

Let us now look at the case of Equation (25). When $X_{i1}$ are the true regressors generating $Y_{i}$, the BIC difference between $M_{0}$ and $M_{1}$ is

BIC_{|M_{0}} - BIC_{|M_{1}} = N T \left[\ln \frac{c_{|M_{0}}}{N} - \ln \left(\frac{c_{|M_{1}}}{N} - \frac{(b_{|M_{1}} / N)^{2}}{a_{|M_{1}} / N}\right)\right] + (k_{0} - k_{1} - 1) \ln(N T) .

(76)

Asymptotically speaking, replacing $\frac{a}{N}$, $\frac{b}{N}$ and $\frac{c}{N}$ by $\underline{a}$, $\underline{b}$ and $\underline{c}$ (defined in Equations (27) to (30), respectively) should not affect the analysis. Define $ω_{01}$ as

\begin{matrix} ω_{01} = & N T ln \frac{{\underset{̲}{c}}_{| M_{0}}}{{\underset{̲}{c}}_{| M_{1}} - \frac{{\underset{̲}{b}}_{| M_{1}}^{2}}{{\underset{̲}{a}}_{| M_{1}}}} + (k_{0} - k_{1} - 1) ln (N T) \\ = & N T ln \frac{(T - 1) σ^{2} + h_{3} (β | M_{0}) + {\underset{̲}{a}}_{| M_{0}} {\underset{̲}{ρ}}^{2} + 2 \underset{̲}{ρ} h_{2} (β, \underset{̲}{ρ} | M_{0}) - 2 \underset{̲}{ρ} σ^{2} h (\underset{̲}{ρ})}{(T - 1) σ^{2} + h_{3} (β | M_{1}) - \frac{{[h_{2} (β, 0 | M_{1}) - σ^{2} h (\underset{̲}{ρ})]}^{2}}{{\underset{̲}{a}}_{| M_{1}}}} \\ + (k_{0} - k_{1} - 1) ln (N T) . \end{matrix}

(77)

If $M_{1}$ is the true model, BIC selects it consistently only if $ω_{01} > 0$. In this case $h_{3}(β | M_{1}) = h_{2}(β, 0 | M_{1}) = 0$, so the numerator inside the natural log of the first term must be larger than the denominator. That is, we should have Equation (37) stated in Proposition 8, or

h_{3}(β | M_{0}) + \underline{a}_{|M_{0}} \underline{ρ}^{2} + 2 \underline{ρ} h_{2}(β, \underline{ρ} | M_{0}) - 2 \underline{ρ} σ^{2} h(\underline{ρ}) + \frac{σ^{4} h^{2}(\underline{ρ})}{\underline{a}_{|M_{1}}} > 0 .

(78)

If $X_{i0}$ is the same as $X_{i1}$, we will have $\underline{a}_{|M_{0}} = \underline{a}_{|M_{1}} = \underline{a}$, $k_{1} = k_{0}$ and $h_{2}(β, \underline{ρ} | M_{0}) = h_{3}(β | M_{0}) = 0$. The left-hand side of Equation (78) then becomes $\underline{a} (\underline{ρ} - \frac{σ^{2} h(\underline{ρ})}{\underline{a}})^{2}$.

Now suppose $\underline{ρ} - \frac{σ^{2} h(\underline{ρ})}{\underline{a}} = 0$, i.e., $\underline{ρ} + γ = \operatorname*{plim}_{N \to ∞} \hat{ρ}_{MLE} = 0$. Then the first term of $ω_{01}$ vanishes, leaving $-\ln(N T)$, so $BIC_{|M_{0}} - BIC_{|M_{1}} < 0$ asymptotically. This means we will prefer $M_{0}$ over $M_{1}$ even if $\underline{ρ} \neq 0$. In a situation like this, model selection is not consistent.

The problem with BIC also arises when $M_{0}$ is the true model with $\underline{ρ} = 0$. Now $ω_{01}$ is

ω_{01} = N T \ln \frac{(T - 1) σ^{2}}{\underline{c}_{|M_{1}} - \frac{\underline{b}_{|M_{1}}^{2}}{\underline{a}_{|M_{1}}}} + (k_{0} - k_{1} - 1) \ln(N T),

(79)

To have $ω_{01} < 0$, we should have Equation (36) in Proposition 8, or

\frac{[h_{2}(β, 0 | M_{1}) - \frac{(T - 1) σ^{2}}{T}]^{2}}{\underline{a}_{|M_{1}}} - h_{3}(β | M_{1}) < 0 .

(80)

If $h_{2}(β, 0 | M_{1}) = h_{3}(β | M_{1}) = 0$, we will have $ω_{01} > 0$, which implies inconsistency in model selection. For the case of Equation (26), the corresponding $ω_{21}$ is

ω_{21} = N T \ln \frac{\underline{c}_{|M_{2}} - \frac{\underline{b}_{|M_{2}}^{2}}{\underline{a}_{|M_{2}}}}{\underline{c}_{|M_{1}} - \frac{\underline{b}_{|M_{1}}^{2}}{\underline{a}_{|M_{1}}}} + (k_{2} - k_{1}) \ln(N T) .

(81)

BIC will be consistent if we have Equation (38) stated in Proposition 8, or

\underline{a}_{|M_{1}} \underline{a}_{|M_{2}} h_{3}(β | M_{2}) + \underline{a}_{|M_{2}} σ^{4} h^{2}(\underline{ρ}) - \underline{a}_{|M_{1}} [h_{2}(β, \underline{ρ} | M_{2}) - σ^{2} h(\underline{ρ})]^{2} > 0 .

(82)

If $X_{i2}$ nests $X_{i1}$, the left-hand side of Equation (82) simplifies to $(\underline{a}_{|M_{2}} - \underline{a}_{|M_{1}}) σ^{4} h^{2}(\underline{ρ})$ with $\underline{a}_{|M_{2}} \leq \underline{a}_{|M_{1}}$. When $\underline{a}_{|M_{2}} = \underline{a}_{|M_{1}}$, $ω_{21}$ becomes $(k_{2} - k_{1}) \ln(N T)$ with $k_{2} > k_{1}$, and BIC is therefore consistent. But if $\underline{a}_{|M_{2}} < \underline{a}_{|M_{1}}$, $ω_{21}$ in Equation (81) will be dominated asymptotically by its first term, which is negative, and hence BIC is inconsistent.
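The two BIC failures discussed for Equation (25) can be reproduced from the probability limits alone. The sketch below assumes $y_{i,t} = ρ y_{i,t-1} + f_{i} + u_{i,t}$ with no exogenous regressors, so $X_{i0} = X_{i1}$ and $k_{0} = k_{1} = 0$. It computes three quantities, consistent with Equation (77) when $h_{2} = h_{3} = 0$:

- $\underline{a}$;
- $\underline{b} = \underline{a}\underline{ρ} - σ^{2} h(\underline{ρ})$;
- $\underline{c} = (T - 1)σ^{2} + \underline{a}\underline{ρ}^{2} - 2\underline{ρ}σ^{2}h(\underline{ρ})$.

It then evaluates the asymptotic BIC difference $ω_{01}$ in two cases:

- **Case (i).** It searches for a nonzero $\underline{ρ}$ with $\underline{ρ} + γ = 0$. There, $ω_{01}$ collapses to $-\ln(NT) < 0$, so BIC favours $M_{0}$ even though $\underline{ρ} \neq 0$.
- **Case (ii).** It sets $\underline{ρ} = 0$. There, $ω_{01} > 0$, so BIC favours $M_{1}$ even though $M_{0}$ is true.

The function names and parameter values are illustrative.

```python
import math

import numpy as np


def h(rho, T):
    return sum((T - t - 1) / T * rho**t for t in range(T - 1))


def a_bar(rho0, T, sigma2=1.0, sigma2_f=1.0, sigma2_y0=0.1):
    """plim a/N without exogenous regressors, under the ASSUMED DGP
    y_it = rho0 * y_i,t-1 + f_i + u_it with zero means and E(f_i y_i0) = 0."""
    g_f = np.array([sum(rho0**s for s in range(k)) for k in range(T)], dtype=float)
    g_0 = np.array([rho0**k for k in range(T)], dtype=float)
    C = np.array([[rho0 ** (t - 1 - s) if s < t else 0.0 for s in range(T)]
                  for t in range(T)], dtype=float)
    H = np.eye(T) - np.ones((T, T)) / T
    return (sigma2_f * g_f @ H @ g_f + sigma2_y0 * g_0 @ H @ g_0
            + sigma2 * np.trace(C.T @ H @ C))


def plims(rho0, T, sigma2=1.0):
    """(a_bar, b_bar, c_bar) when the true exogenous regressors (here: none) are included."""
    a = a_bar(rho0, T, sigma2)
    b = a * rho0 - sigma2 * h(rho0, T)
    c = (T - 1) * sigma2 + a * rho0**2 - 2 * rho0 * sigma2 * h(rho0, T)
    return a, b, c


def omega01(rho0, T, N, sigma2=1.0):
    """Asymptotic BIC_{M0} - BIC_{M1} (Eq. (77)) with k0 = k1 = 0; positive favours M1."""
    a, b, c = plims(rho0, T, sigma2)
    return N * T * math.log(c / (c - b**2 / a)) - math.log(N * T)


def rho_with_zero_mle_limit(T, lo=0.0, hi=0.5, iters=100):
    """Bisection for rho0 with rho0 + gamma = rho0 - h(rho0) / a_bar(rho0) = 0 (sigma^2 = 1)."""
    def f(r):
        return r - h(r, T) / a_bar(r, T)

    assert f(lo) < 0 < f(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)


T, N = 5, 10_000
rho_star = rho_with_zero_mle_limit(T)
print(rho_star, omega01(rho_star, T, N), omega01(0.0, T, N))
```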

References

1. M. Nerlove. “Experimental Evidence on the Estimation of Dynamic Economic Relations from a Time Series of Cross-Sections.” Econ. Stud. Quart. 18 (1968): 42–74.
2. S. Nickell. “Biases in Dynamic Models with Fixed Effects.” Econometrica 49 (1981): 1417–1426.
3. T. Lancaster. “The incidental parameter problem since 1948.” J. Econom. 95 (2000): 391–413.
4. C. Hsiao. Analysis of Panel Data, 2nd ed. Cambridge, UK: Cambridge University Press, 2003.
5. C. Bester, and C. Hansen. “A Penalty Function Approach to Bias Reduction in Nonlinear Panel Models with Fixed Effects.” J. Bus. Econ. Stat. 27 (2009): 131–148.
6. J. Hahn, and G. Kuersteiner. “Bias reduction for dynamic nonlinear panel models with fixed effects.” Econ. Theory 27 (2011): 1152–1191.
7. M. Arellano, and S. Bonhomme. “Robust Priors in Nonlinear Panel Data Models.” Econometrica 77 (2009): 489–536.
8. G. Dhaene, and K. Jochmans. Likelihood Inference in an Autoregression with Fixed Effects. Discussion Paper, Sciences Po. Cambridge, UK: Cambridge University Press, 2013.
9. D. Andrews, and B. Lu. “Consistent model and moment selection procedures for GMM estimation with application to dynamic panel data models.” J. Econom. 101 (2001): 123–164.
10. Y. Lee, and P.C. Phillips. “Model Selection in the Presence of Incidental Parameters.” J. Econ., 2014.
11. T. Lancaster. “Orthogonal Parameters and Panel Data.” Rev. Econ. Stud. 69 (2002): 647–666.
12. D.J. Poirier. Intermediate Statistics and Econometrics: A Comparative Approach. Cambridge, MA, USA: MIT Press, 1995.
13. H. White. Asymptotic Theory for Econometricians. Upper Saddle River, NJ, USA: Prentice Hall, 2001.
14. A. Zellner. “On Assessing Prior Distributions and Bayesian Regression Analysis with G-prior Distribution.” In Bayesian Inference and Decision Techniques: Essays in Honour of Bruno de Finetti. Edited by P.K. Goel and A. Zellner. Amsterdam, The Netherlands: North-Holland, 1986, pp. 233–243.
15. E. Ley, and M.F. Steel. “On the Effect of Prior Assumptions in Bayesian Model Averaging with Applications to Growth Regression.” J. Appl. Econom. 24 (2009): 651–674.
16. C. Fernandez, E. Ley, and M.F. Steel. “Benchmark Priors for Bayesian Model Averaging.” J. Econom. 100 (2001): 381–427.
17. D.R. Cox, and N. Reid. “Parameter Orthogonality and Approximate Conditional Inference.” J. R. Stat. Soc. Ser. B 49 (1987): 1–39.
18. Y. Chikuse. Statistics on Special Manifolds. Berlin, Germany; Heidelberg, Germany: Springer, 2003.
19. G. Koop, D.J. Poirier, and J.L. Tobias. Bayesian Econometric Methods. Cambridge, UK: Cambridge University Press, 2007.
20. L. Tierney, and J. Kadane. “Accurate Approximations for Posterior Moments and Marginal Densities.” J. Am. Stat. Assoc. 81 (1986): 82–86.
21. R.E. Kass, L. Tierney, and J.B. Kadane. “The Validity of Posterior Expansions Based on Laplace’s Method.” In Bayesian and Likelihood Methods in Statistics and Econometrics: Essays in Honor of George A. Barnard. Edited by S. Geisser, J.S. Hodges, S.J. Press and A. Zellner. Amsterdam, The Netherlands: North-Holland, 1990.

¹They treat the bias as resulting from the finite number of time periods (finite sample bias) and remove the first-order bias term in the Taylor expansion.
²One could withhold some sample while calculating the expression to see how its value changes with N.
³They are SSR when η = 0.
⁴When T is even, $\lim_{ρ \to ∞} \underset{̲}{ψ} (ρ) = 0$ . For model comparison, we need a proper prior for ρ and hence $ρ_{L}$ must be finite for finite N. In practice, we can choose $ρ_{L}$ to ensure that $[ρ_{L}, ρ_{U}]$ contains a unique posterior mode when the true set of regressors are included. In the subsequent simulations, we choose $ρ_{L} = - N$ when T is even.
⁵Note that $h_{3} (β)$ is the probability limit of $\frac{1}{N}$ times the SSR obtained by regressing ${\underset{̲}{X}}_{i} β$ on fixed effects and $X_{i}$ .
⁶Due to the assumptions in the example, $\frac{- (T - 1) h_{2}(β, \underline{ρ})}{h_{3}(β)} = \frac{- (T - 1) \operatorname{trace}\{C' H \sum_{i = 1}^{N} [Var(\underline{X}_{i} β) + H E(\underline{X}_{i} β) E(\underline{X}_{i} β)']\}}{\operatorname{trace}\{H \sum_{i = 1}^{N} [Var(\underline{X}_{i} β) + H E(\underline{X}_{i} β) E(\underline{X}_{i} β)']\}}$. Since $\frac{1}{N} \sum_{i = 1}^{N} Var(\underline{X}_{i} β)$ is proportional to $I_{T}$ and $H E(\underline{X}_{i} β) E(\underline{X}_{i} β)' = 0$, it follows that $\frac{- (T - 1) h_{2}(β, \underline{ρ})}{h_{3}(β)} = - \operatorname{trace}(C' H) = h(\underline{ρ})$.
⁷That is how often the model with the highest posterior model probability is not the true model.
⁸In the subsequent discussion, we use ER10 to denote the proportion of errors made when the true model includes $y_{i\_}$ while the chosen model does not, and ER01 for the case where the true model does not include $y_{i\_}$ while the chosen model does. Note that either ER10 + ER11 = 1 or ER01 + ER00 = 1. The notations for BIC are defined similarly.
⁹Since the correlation among different regressors is random in our simulation, ${\underset{̲}{a}}_{M_{0}}$ and the root are also random.
¹⁰The absolute value of Nickell bias under $\underset{̲}{ρ} = 1$ is bigger than when $\underset{̲}{ρ} = - 1$ . None of the conditions for model selection consistency are found to be violated.
¹¹We have also obtained the finite sample biases of the different point estimators (available upon request), which show that the top model point estimators are generally less biased than those based on other criteria; hence the higher top model RMSE should be due to larger estimator variances.
¹²When $λ = 1$ ($λ = 0.01$), we find by simulation that the 2.5%, 50% and 97.5% quantiles of $|q_{h}' z|$ are around $0.012$ ($0.12$), $0.26$ ($0.94$) and $0.73$ ($0.99$), respectively.
¹³Strictly speaking, the right hand side should be multiplied by an arbitrary constant not involving ρ.
¹⁴Note that $\underline{a}$ and $σ^{2}$ are positive. When $h(\underline{ρ}) \geq 0$, we have $\underline{ρ} + γ + \sqrt{\frac{σ^{2}}{\underline{a}^{2}} [\underline{a}(T-1) - σ^{2} h^{2}(\underline{ρ})]} > \underline{ρ}$ and $\underline{ρ} + γ - \sqrt{\frac{σ^{2}}{\underline{a}^{2}} [\underline{a}(T-1) - σ^{2} h^{2}(\underline{ρ})]} < \underline{ρ} - \frac{2 σ^{2} h(\underline{ρ})}{\underline{a}} \leq \underline{ρ}$; when $h(\underline{ρ}) < 0$, we have $\underline{ρ} + γ + \sqrt{\frac{σ^{2}}{\underline{a}^{2}} [\underline{a}(T-1) - σ^{2} h^{2}(\underline{ρ})]} > \underline{ρ} - \frac{2 σ^{2} h(\underline{ρ})}{\underline{a}} > \underline{ρ}$ and $\underline{ρ} + γ - \sqrt{\frac{σ^{2}}{\underline{a}^{2}} [\underline{a}(T-1) - σ^{2} h^{2}(\underline{ρ})]} < \underline{ρ}$.
¹⁵When $T = 2$, $E(f_{i} y_{i,0}) = 0$ and the true model contains no exogenous regressors, $ϕ(\underline{ρ}) + \frac{T-1}{2} \ln\left[\frac{\underline{a}_{|M_{0}} \underline{ρ}^{2} - 2 \underline{ρ} σ^{2} h(\underline{ρ})}{(T-1) σ^{2}} + 1\right]$ is equal to $\frac{\underline{ρ}}{2} + \frac{1}{2} \ln\left(1 - \underline{ρ} + \frac{\left[\frac{E(f_{i}^{2})}{2} + E(y_{i,0}^{2}) \left(\frac{\underline{ρ}^{2}}{2} - \underline{ρ} + \frac{1}{2}\right) + \frac{σ^{2}}{2}\right] \underline{ρ}^{2}}{σ^{2}}\right)$, which approaches $υ(\underline{ρ})$ as $\frac{E(f_{i}^{2})}{σ^{2}}$ and $\frac{E(y_{i,0}^{2})}{σ^{2}}$ get smaller.
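To see the convergence described in footnote 15, the sketch below evaluates the $T = 2$ expression as a function of $\underline{ρ}$ and the two ratios $E(f_{i}^{2})/σ^{2}$ and $E(y_{i,0}^{2})/σ^{2}$, and compares it with the value obtained by setting both ratios to zero. The footnote states only that the expression approaches $υ(\underline{ρ})$; the sketch computes that zero-ratio limit directly and assumes nothing further about $υ$. Function and argument names are hypothetical.

```python
import math


def t2_expression(rho: float, f_ratio: float, y0_ratio: float) -> float:
    """Right-hand side of footnote 15 for T = 2.

    f_ratio = E(f_i^2) / sigma^2 and y0_ratio = E(y_{i,0}^2) / sigma^2 (both >= 0).
    Dividing the bracket by sigma^2 gives f_ratio/2 + y0_ratio*(rho^2/2 - rho + 1/2) + 1/2.
    """
    bracket = f_ratio / 2 + y0_ratio * (rho ** 2 / 2 - rho + 0.5) + 0.5
    # bracket >= 1/2, so the log argument is at least ((rho - 1)^2 + 1) / 2 > 0.
    return rho / 2 + 0.5 * math.log(1 - rho + bracket * rho ** 2)


def limit_expression(rho: float) -> float:
    """Value of the expression with both ratios set to zero."""
    return rho / 2 + 0.5 * math.log(1 - rho + rho ** 2 / 2)


for rho in (-2.0, -0.5, 0.5, 1.0):
    print(rho, t2_expression(rho, 1.0, 1.0), t2_expression(rho, 0.01, 0.01), limit_expression(rho))
```

For nonnegative ratios and $\underline{ρ} \neq 0$, the expression shrinks towards the zero-ratio value as both ratios decrease, which is the sense in which it approaches $υ(\underline{ρ})$.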

© 2015 by the author; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

