Asymptotic Theory for a Parameter Dimension-Split Estimation in Time Series Analysis for Multinomial Data

Sutradhar, Brajendra C.; Rao, R. Prabhakar

doi:10.3390/math14122068

by Brajendra C. Sutradhar ¹,* and R. Prabhakar Rao ²

¹ Department of Mathematics and Statistics, Memorial University, St. John’s, NL A1C 5S7, Canada

² Department of Economics, Sri Sathya Sai Institute of Higher Learning, Prasanthi Nilayam, Anantapur 515134, India

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(12), 2068; https://doi.org/10.3390/math14122068

Submission received: 13 May 2026 / Revised: 1 June 2026 / Accepted: 5 June 2026 / Published: 10 June 2026

(This article belongs to the Section D1: Probability and Statistics)


Abstract

The parameter space in a regression model for multinomial time series data contains the regression parameters, which explain the effects of the time-dependent covariates, and the dynamic dependence (or category transition) parameters, which explain the influence of past responses on the multinomial response at a given time. The estimation of the regression parameters can be adversely affected when the parameter space is high-dimensional, especially because of the transition parameters. In this paper, we propose a parameter dimension-split approach in which a conditional generalized quasi-likelihood (CGQL) estimating function is first developed for the dynamic dependence parameters in terms of the unknown regression parameters; this function is then exploited to develop an observed information matrix-based maximum likelihood (ML) estimating equation for the main regression parameters. More specifically, the split approach allows the joint likelihood function of the regression and dynamic dependence parameters to be written as a likelihood function of the regression parameters only, by replacing the dynamic dependence parameters with their CGQL estimates obtained in the first step. As the time series length is generally large in practice, we verify that the proposed CGQL and ML estimators are asymptotically reliable, that is, consistent for the respective parameters.

Keywords:

asymptotic properties (consistency); categorical (multinomial) responses over time; dimension-reduction estimation approach; dynamic dependence and regression parameters; generalized quasi-likelihood and a modified maximum likelihood estimation; time-dependent covariates

MSC:

62A01; 62M05; 62J02; 62F10

1. Introduction

Binary time series analysis is an important research topic in economics and statistics, among other areas. For example, the economic status (profit/loss) of a pharmaceutical industry may be recorded over the years along with certain exogenous explanatory covariates, such as type of industry, yearly advertising cost, and other research and development expenditures. It is likely that the binary profit status of an industry in a given year is correlated with its profit status in previous years. It is of interest to know both (i) the effects of the time-dependent covariates, and (ii) the dynamic relationship among the responses over the years. Because the binary status may depend on certain latent variables, binary time series data have been modeled and analyzed in a variety of ways over the last four decades, primarily in the fields of statistics and econometrics. Readers may refer to existing studies ([1,2,3,4,5,6,7,8,9,10,11,12]) on various binary time series models. Most of the models considered in these studies have either binary probit or logit forms. By the same token, multinomial time series analysis is also an important research topic. In this case, as an extension of the binary response variable, one deals with a categorical response variable with more than two categories. For example, it may be more appropriate to classify the profit status of the industries into several categories, such as heavy loss, moderate loss, no loss, moderate profit, or healthy profit, and then examine the effects of exogenous covariates on such categorical responses collected over a long period of time. The correlations among categories over time are also of interest. This type of multinomial time series data has been analyzed mainly in the statistics literature by a few authors such as refs. [13,14,15,16,17,18]. As far as the dynamic relationship is concerned, studies by refs. [16,18], for example, have considered a multinomial dynamic logit (MDL) model as a generalization of the binary logit time series model.

As far as the BDL (binary dynamic logits) model is concerned, it is constructed as follows. Let

{y_{t}^{*}, t = 1, \dots, T}

be a sequence of a latent variable, and a binary response

y_{t}

is observed at time t using the relationship

y_{t} = \{\begin{matrix} 1 & if y_{t}^{*} > 0 \\ 0 & otherwise . \end{matrix}

(1)

Also, let

x_{t} = {(x_{t 1}, \dots, x_{t ℓ}, \dots, x_{t, p + 1})}^{'}

denote the

(p + 1)

-dimensional exogenous explanatory covariate vector and

β = {(β_{0}, β_{1}, \dots, β_{p})}^{'}

denote the effect of

x_{t}

on the binary response

y_{t} .

Next, suppose that the latent variable

y_{1}^{*}

in (1) follows a logistic distribution

f_{L} (y_{1}^{*})

([19]) with mean

g_{1}^{*} = x_{1}^{'} β

and variance

\frac{π^{2}}{3}

, whereas for

t = 2, \dots, T, y_{t}^{*}

follows the same logistic distribution

f_{L} (\cdot)

but with mean

g_{t}^{*} = x_{t}^{'} β + γ y_{t - 1}

and variance

\frac{π^{2}}{3},

γ

being a lag 1 dynamic dependence parameter. It then follows from (1) that

\begin{matrix} P r (y_{1} = 1) & = & \int_{- \infty}^{x_{1}^{'} β} f_{L} (y_{1}^{*}) d y_{1}^{*} = \exp (x_{1}^{'} β) / [1 + \exp (x_{1}^{'} β)] \\ P r (y_{t} = 1 | y_{t - 1}) & = & \int_{- \infty}^{x_{t}^{'} β + γ y_{t - 1}} f_{L} (y_{t}^{*}) d y_{t}^{*}, for t = 2, \dots, T \\ = & \exp (x_{t}^{'} β + γ y_{t - 1}) / [1 + \exp (x_{t}^{'} β + γ y_{t - 1})], \end{matrix}

(2)

(see also [6], p. 422). To understand the recursive mean

E [Y_{t}],

variance

var [Y_{t}],

and lag ℓ correlations between

Y_{t}

and

Y_{t - ℓ},

produced by this BDL model (2), it is of interest to estimate

β

and

γ .

Note that there is no range restriction for these parameters.
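As an illustration of the BDL model (2), the following minimal Python sketch evaluates the conditional probability and draws a series sequentially. The function names (`bdl_prob`, `simulate_bdl`), the covariate design, and the parameter values are hypothetical choices made for this example only; they are not taken from the cited studies.

```python
import math
import random

def bdl_prob(x_t, beta, gamma, y_prev=None):
    """P(y_t = 1 | y_{t-1}) under the BDL model (2); y_prev=None gives the t = 1 margin."""
    lin = sum(xj * bj for xj, bj in zip(x_t, beta))
    if y_prev is not None:
        lin += gamma * y_prev
    return 1.0 / (1.0 + math.exp(-lin))   # equals exp(lin) / (1 + exp(lin))

def simulate_bdl(X, beta, gamma, seed=0):
    """Draw y_1, ..., y_T one at a time, feeding each response into the next step."""
    rng = random.Random(seed)
    y, y_prev = [], None
    for x_t in X:
        p = bdl_prob(x_t, beta, gamma, y_prev)
        y_prev = 1 if rng.random() < p else 0
        y.append(y_prev)
    return y

# Hypothetical inputs: an intercept plus one time-varying covariate.
T = 200
X = [(1.0, math.sin(0.1 * t)) for t in range(1, T + 1)]
beta, gamma = (-0.5, 1.0), 0.8
y = simulate_bdl(X, beta, gamma)
```

With a positive γ, as assumed here, a response of 1 at time t − 1 raises the probability of a 1 at time t, which is the lag 1 dependence that γ is meant to capture.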

The above mentioned MDL model is a generalization of the BDL model given by (2). For a discussion on this generalization, see, for example, the studies by refs. [16,17,18,20]. More specifically, the MDL (multinomial dynamic logits) model is constructed as follows: Let

y_{t} = {(y_{t 1}, \dots, y_{t c}, \dots, y_{t, C - 1})}^{'}

denote the

(C - 1)

-dimensional multinomial response variable and for

c = 1, \dots, C - 1,

y_{t}^{(c)} = {(y_{t 1}^{(c)}, \dots, y_{t c}^{(c)}, \dots, y_{t, C - 1}^{(c)})}^{'} = {(01_{c - 1}^{'}, 1, 01_{C - 1 - c}^{'})}^{'} \equiv δ_{t c}

(3)

indicates that the multinomial response recorded at time t belongs to the cth category. For

c = C,

one writes

y_{t}^{(C)} = δ_{t C} = 01_{C - 1} .

Here, and also in (3), for a scalar constant

c_{0}

, we use

c_{0} 1_{C}

for simplicity to represent

c_{0} \otimes 1_{C},

with ⊗ being the well-known Kronecker or direct product. This notation will also be used throughout the rest of the paper when needed. At any time point

t,

let

β_{c} = {(β_{c 0}, β_{c 1}, \dots, β_{c p})}^{'}

denote the effect of

x_{t}

on

y_{t}^{(c)}

for

c = 1, \dots, C - 1 .

Also, let

π_{(1) c}

denote the marginal multinomial probability at time

t = 1

for the observation

y_{t}

to be in the cth category; and for

t = 2, \dots, T,

let the transitional probability from the gth

(g = 1, \dots, C)

category at time

t - 1

to the cth category at time

t,

be denoted by

η_{t | t - 1}^{(c)} (g) .

As an extension of the BDL model (2), one may then write the MDL model as

\begin{matrix} P [y_{1} = y_{1}^{(c)}] = π_{(1) c} & = & \{\begin{matrix} \frac{\exp (x_{1}^{'} β_{c})}{1 + \sum_{g = 1}^{C - 1} \exp (x_{1}^{'} β_{g})} & for c = 1, \dots, C - 1 \\ \frac{1}{1 + \sum_{g = 1}^{C - 1} \exp (x_{1}^{'} β_{g})} & for c = C \end{matrix} \\ P (Y_{t} = y_{t}^{(c)} | Y_{t - 1} = y_{t - 1}^{(g)}) & = & η_{t | t - 1}^{(c)} (g) for t = 2, \dots, T \\ = & \{\begin{matrix} \frac{\exp [x_{t}^{'} β_{c} + γ_{c}^{'} y_{t - 1}^{(g)}]}{1 + \sum_{v = 1}^{C - 1} \exp [x_{t}^{'} β_{v} + γ_{v}^{'} y_{t - 1}^{(g)}]}, & for c = 1, \dots, C - 1 \\ \frac{1}{1 + \sum_{v = 1}^{C - 1} \exp [x_{t}^{'} β_{v} + γ_{v}^{'} y_{t - 1}^{(g)}]}, & for c = C, \end{matrix} \end{matrix}

(4)

where

γ_{c} = {(γ_{c 1}, \dots, γ_{c v}, \dots, γ_{c, C - 1})}^{'}

denotes the dynamic dependence parameters.

Note that for further notational convenience, one may re-express the transitional probabilities in (4) as

\begin{matrix} η_{t | t - 1}^{(c)} (g) = \{\begin{matrix} \frac{\exp [x_{t}^{'} β_{c} + γ_{c}^{'} δ_{(t - 1) g}]}{1 + \sum_{v = 1}^{C - 1} \exp [x_{t}^{'} β_{v} + γ_{v}^{'} δ_{(t - 1) g}]}, & for c = 1, \dots, C - 1 \\ \frac{1}{1 + \sum_{v = 1}^{C - 1} \exp [x_{t}^{'} β_{v} + γ_{v}^{'} δ_{(t - 1) g}]}, & for c = C, \end{matrix} \end{matrix}

(5)

where for

t = 2, \dots, T,

δ_{(t - 1) g}

through (3) has the formula

δ_{(t - 1) g} = \{\begin{matrix} {[01_{g - 1}^{'}, 1, 01_{C - 1 - g}^{'}]}^{'} & for g = 1, \dots, C - 1 \\ 01_{C - 1} & for g = C . \end{matrix}

Note that in (5), the category g occurred at time

t - 1 .

Thus the category g depends on time

t - 1,

and

δ_{(t - 1) g} \equiv δ_{g_{t - 1}} .

However, for simplicity, we use g for

g_{t - 1} .

For convenience, we use

β^{*} = {(β_{1}^{'}, \dots, β_{c}^{'}, \dots, β_{C - 1}^{'})}^{'} : (p + 1) (C - 1) \times 1

to represent all regression effects involved in the marginal and conditional probabilities given in (4) and (5). Similarly, we use

γ^{*} = {(γ_{1}^{'}, \dots, γ_{c}^{'}, \dots, γ_{C - 1}^{'})}^{'} : {(C - 1)}^{2} \times 1

to represent all dynamic dependence parameters involved in the transitional probabilities (4) or (5).
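The transitional probabilities in (5) and the dimensions of β* and γ* can be made concrete with a short sketch. The function name `mdl_transition_probs` and all numerical values below are hypothetical and chosen only for illustration, with C = 3 categories and p = 1 covariate plus an intercept.

```python
import math

def mdl_transition_probs(x_t, beta, gamma, g):
    """Return [eta^(1)(g), ..., eta^(C)(g)] from (5).

    beta  : list of C-1 vectors beta_c, each of length p+1
    gamma : list of C-1 vectors gamma_c, each of length C-1
    g     : category occupied at time t-1 (1, ..., C)
    """
    C = len(beta) + 1
    # delta_{(t-1)g}: unit vector for g < C, zero vector for the reference category C
    delta = [1.0 if (v + 1) == g else 0.0 for v in range(C - 1)]
    lin = [
        sum(a * b for a, b in zip(x_t, beta[c]))
        + sum(a * b for a, b in zip(gamma[c], delta))
        for c in range(C - 1)
    ]
    denom = 1.0 + sum(math.exp(v) for v in lin)
    return [math.exp(v) / denom for v in lin] + [1.0 / denom]

# Hypothetical values: C = 3, x_t = (1, x_t1)'.
beta = [[0.2, -0.4], [-0.1, 0.3]]
gamma = [[0.5, -0.2], [0.1, 0.7]]
x_t = [1.0, 0.6]
probs_by_g = {g: mdl_transition_probs(x_t, beta, gamma, g) for g in (1, 2, 3)}
dim_beta = len(beta) * len(beta[0])      # (p + 1)(C - 1)
dim_gamma = len(gamma) * len(gamma[0])   # (C - 1)^2
```

Each entry of `probs_by_g` is one column of transition probabilities, so it sums to one. The count `dim_gamma` grows quadratically in C, which is the source of the dimension problem discussed in this paper.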

We remark that, as explained in Section 2, the estimation of the aforementioned regression and dynamic dependence parameters through the existing joint likelihood approach (for both sets of parameters) may be adversely affected when the dimension of these parameter vectors is large. As a remedy, in this paper we provide a dimension-split approach in which a new likelihood function is constructed for the main regression parameters only, by replacing the high-dimensional dynamic dependence parameters in the joint likelihood function with their conditional estimates, which are obtained first using a conditional generalized quasi-likelihood (CGQL) approach, conditional on the unknown main regression parameters.

Turning back to the estimation importance, notice that the unconditional means

E [Y_{t}],

variances

var [Y_{t}],

and pair-wise correlations

corr [Y_{u}, Y_{t}], u < t,

computed by exploiting the model (4) will involve

β^{*}

and

γ^{*} .

Thus, to understand these basic properties of the multinomial time series, it is of importance to obtain consistent estimates for

β^{*}

and

γ^{*},

at least asymptotically, that is, when

T \to \infty .

As far as the formulas are concerned, the means as the functions of

β^{*}

and

γ^{*}

have the recursive relationship given by

\begin{matrix} E [Y_{t}] = η_{(t | t - 1)} (C) + [η_{(t | t - 1), M} - η_{(t | t - 1)} (C) 1_{C - 1}^{'}] E [Y_{t - 1}] \\ = & {\tilde{π}}_{(t)} (β^{*}, γ^{*}) = {({\tilde{π}}_{(t) 1}, \dots, {\tilde{π}}_{(t) c}, \dots, {\tilde{π}}_{(t) (C - 1)})}^{'}, for t = 2, \dots, T - 1, \end{matrix}

(6)

where the expectation at the initial time

t = 1

has the formula

\begin{matrix} E [Y_{1}] & = & {[π_{(1) 1}, \dots, π_{(1) c}, \dots, π_{(1) (C - 1)}]}^{'} \\ = & π_{(1)} (β^{*}), \end{matrix}

where

π_{(1) c}

is given by (4) for all

c = 1, \dots, C .

In (6),

η_{(t | t - 1)} (C)

is the

(C - 1)

-dimensional vector of conditional probabilities, given by

η_{(t | t - 1)} (C) = {[η_{t | t - 1}^{(1)} (C), \dots, η_{t | t - 1}^{(c)} (C) \dots, η_{t | t - 1}^{(C - 1)} (C)]}^{'},

(7)

and

η_{(t | t - 1), M}

is the

(C - 1) \times (C - 1)

matrix of conditional probabilities given by

η_{(t | t - 1), M} = (η_{t | t - 1}^{(c)} (g))

(8)

where

η_{t | t - 1}^{(c)} (g)

is the

(c, g)

-th element of the matrix for

c = 1, \dots, C - 1;

g = 1, \dots, C - 1 .

Furthermore, similar to (6), the variances and covariances also have the recursive relationships and they are given by

\begin{matrix} var [Y_{t}] & = & diag [{\tilde{π}}_{(t) 1} (β^{*}, γ^{*}), \dots, {\tilde{π}}_{(t) c} (β^{*}, γ^{*}), \dots, {\tilde{π}}_{(t) (C - 1)} (β^{*}, γ^{*})] \\ - & {\tilde{π}}_{(t)} (β^{*}, γ^{*}) {\tilde{π}}_{(t)}^{'} (β^{*}, γ^{*}) \\ = & (cov (Y_{t c}, Y_{t k})) = ({\tilde{σ}}_{(t t) c k} (β^{*}, γ^{*})), c, k = 1, \dots, C - 1 \\ = & {\tilde{Σ}}_{(t t)} (β^{*}, γ^{*}), for t = 1, \dots, T \end{matrix}

(9)

\begin{matrix} cov [Y_{u}, Y_{t}] & = & Π_{s = u + 1}^{t} [η_{(s | s - 1), M} - η_{(s | s - 1)} (C) 1_{C - 1}^{'}] var [Y_{u}], for u < t, t = 2, \dots, T \\ = & (cov (Y_{u c}, Y_{t k})) = ({\tilde{σ}}_{(u t) c k} (β^{*}, γ^{*})), c, k = 1, \dots, C - 1 \\ = & {\tilde{Σ}}_{(u t)} (β^{*}, γ^{*}) . \end{matrix}

(10)
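The recursion (6) for the means and the variance formula (9) can be sketched numerically as follows. This is an illustrative check under assumed inputs, not part of the paper's method: the helper names and parameter values are hypothetical, NumPy is assumed to be available, and the recursion is compared against direct forward propagation of the full C-state distribution.

```python
import numpy as np

def transition_matrix(x_t, B, G):
    """(C-1) x C matrix whose g-th column is eta_{t|t-1}(g), g = 1, ..., C, from (5)."""
    C = B.shape[0] + 1
    deltas = np.hstack([np.eye(C - 1), np.zeros((C - 1, 1))])  # columns delta_{(t-1)g}
    lin = (B @ x_t)[:, None] + G @ deltas
    e = np.exp(lin)
    return e / (1.0 + e.sum(axis=0))

def recursive_means(X, B, G):
    """E[Y_t], t = 1, ..., T, via the recursion (6)."""
    e1 = np.exp(B @ X[0])
    means = [e1 / (1.0 + e1.sum())]                  # E[Y_1] = pi_(1)(beta*)
    for x_t in X[1:]:
        P = transition_matrix(x_t, B, G)
        eta_M, eta_C = P[:, :-1], P[:, -1]
        A = eta_M - np.outer(eta_C, np.ones(len(eta_C)))
        means.append(eta_C + A @ means[-1])
    return np.array(means)

def marginal_variance(mean_t):
    """var[Y_t] from (9): diag(mean) - mean mean'."""
    return np.diag(mean_t) - np.outer(mean_t, mean_t)

# Hypothetical parameter values: C = 3, intercept plus one covariate, T = 6.
B = np.array([[0.2, -0.4], [-0.1, 0.3]])     # rows beta_c'
G = np.array([[0.5, -0.2], [0.1, 0.7]])      # rows gamma_c'
X = np.column_stack([np.ones(6), np.linspace(-1.0, 1.0, 6)])
means = recursive_means(X, B, G)

# Cross-check: propagate the full C-state distribution forward directly.
full = np.append(means[0], 1.0 - means[0].sum())
for x_t in X[1:]:
    P = transition_matrix(x_t, B, G)
    full_C = np.vstack([P, 1.0 - P.sum(axis=0)])     # C x C, columns sum to one
    full = full_C @ full
```

The first C − 1 entries of `full` agree with the last row of `means`, because the recursion (6) is the forward propagation with the probability of category C eliminated through the sum-to-one constraint.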

As far as the inference is concerned, some authors such as [18] used a partial likelihood approach for the estimation of the regression and the dynamic dependence parameters

(β^{*}, γ^{*})

under the assumption that the observable covariates

{x_{t}, t = 1, \dots, T}

are random, and perhaps dependent on lagged values of the response variable. This approach is equivalent to the so-called conditional likelihood approach, where the likelihood is obtained by conditioning on the history of both covariates and responses. Consequently, one may easily obtain the information matrix conditional on the whole process history ([16], Equation (17), p. 364). Notice, however, that the information matrix conditional on the entire history is not the same as the information matrix conditional on the covariate process; but, in many cases, it is the same as the observed information matrix, for example, when the covariate distribution does not involve lagged values of the response variable. Among others, the study by ref. [18] (Equations (17)–(19)) used a joint likelihood approach (see also [20]) to estimate the parameters

β^{*}

and

γ^{*},

where the likelihood estimating equation was solved by using a Fisher information matrix-based Newton iterative procedure. These authors found through an intensive simulation study that, with a minimal dimension of the parameter space, both approaches work quite well even with relatively short series; however, when higher dimensions of the parameter space are considered, both methods require longer series to achieve acceptable levels of accuracy, and the approach based on the observed information matrix to compute the standard errors of the parameter estimators performs relatively worse than the approach based on the Fisher information matrix. Obtaining the Fisher information matrix, though, can be computationally involved.

Note that because both the observed and Fisher information matrix-based estimating equations produce similarly efficient estimates for the parameters when the dimension of the parameter space is minimal, and also because the computation of the Fisher information can be complex, the main objective of this paper is to use an observed information matrix-based estimation approach with minimal dimensions for the parameter space. More specifically, we first develop a consistent estimating function for the dynamic dependence parameter

γ^{*}

as a function of the unknown regression effects

β^{*} .

We denote this function as

{\hat{γ}}^{*} (β^{*}) .

We then estimate the regression parameters

β^{*}

by exploiting the likelihood function

L (β^{*}, {\hat{γ}}^{*} (β^{*})),

say, instead of the joint likelihood function

L (β^{*}, γ^{*}) .

Thus, in this approach, the estimation of

β^{*}

will not depend on the dimension of the dynamic dependence parameters

γ^{*} .

This reduced dimension-based estimation approach using the observed information matrix is further elucidated in Section 2. The asymptotic theory for this dimension-reduction approach is discussed in Section 3. Finally, concluding remarks are given in Section 4.

2. Estimation of Parameters: A Dimension-Reduction Approach

Because

β^{*} = {(β_{1}^{'}, \dots, β_{C - 1}^{'})}^{'} : (C - 1) (p + 1) \times 1,

and

γ^{*} = {(γ_{1}^{'}, \dots, γ_{C - 1}^{'})}^{'} : {(C - 1)}^{2} \times 1,

the dimension of the parameter space depends on both p and

C .

Customarily, the joint estimation of

β^{*}

and

γ^{*},

that is,

θ = {({β^{*}}^{'}, {γ^{*}}^{'})}^{'}

, is performed by maximizing the likelihood function with regard to

θ,

where the likelihood function under the model (4) has the form

\begin{matrix} L (θ) = L (β^{*}, γ^{*}) & = & Π_{c = 1}^{C} {[π_{(1) c} (β^{*}; x_{1})]}^{y_{1 c}} \\ \times & Π_{t = 2}^{T} Π_{c = 1}^{C} Π_{g = 1}^{C} {[η_{t | t - 1}^{(c | g)} (β^{*}, γ^{*}; x_{t})]}^{y_{t c}}, \end{matrix}

(11)

where

π_{(1) c} (β^{*})

and

η_{t | t - 1}^{(c | g)} (β^{*}, γ^{*})

are marginal (at

t = 1

) and transitional probabilities, respectively. The solution of the log likelihood estimating equation, namely

\frac{\partial \log L (θ)}{\partial θ} = 0

, may be obtained by solving the Hessian or Fisher information matrix-based iterative equations. More specifically, the Hessian matrix-based iterative equation has the form

\hat{θ} (r + 1) = \hat{θ} (r) + {[{\{- \frac{\partial^{2} \log L (θ)}{\partial θ \partial θ^{'}}\}}^{- 1} \frac{\partial \log L (θ)}{\partial θ}]}_{| θ = \hat{θ} (r)}

(12)

(see [16]), and the Fisher information matrix-based equation has the form

\hat{θ} (r + 1) = \hat{θ} (r) + {[{\{- E \frac{\partial^{2} \log L (θ)}{\partial θ \partial θ^{'}}\}}^{- 1} \frac{\partial \log L (θ)}{\partial θ}]}_{| θ = \hat{θ} (r)}

(13)

(see [18]). Under the assumption that the Hessian matrix is positive semi-definite, some authors, such as [14] (Equations (4.1) and (4.4)), studied the asymptotic properties (as

T \to \infty

) of the likelihood estimator obtained from (12). When the dimension of the parameter (

θ

) space is large, however, the authors of [18] found that this type of Hessian matrix-based likelihood estimate of

θ

performs relatively worse for moderately large

T,

as compared to the Fisher information matrix-based estimate (13). However, obtaining the Fisher information matrix is algebraically involved. When the dimension of the parameter space was small, the Hessian and Fisher information matrix-based estimates were found to perform almost identically. For this reason, in this paper, we consider a dimension-splitting (or dimension-reduction) approach for the parameter space, so that the Hessian matrix-based likelihood estimates can be used even when T is only moderately large. More specifically, we develop a conditional generalized quasi-likelihood (CGQL) estimating function for

γ^{*}

as a function of unknown

β^{*} .

This estimating function may be denoted by

{\hat{γ}}^{*} (β^{*}) .

We then use this estimating function and construct a modified likelihood function for

β^{*}

as

L (β^{*}, {\hat{γ}}^{*} (β^{*})),

so that the estimation of

β^{*}

does not depend on the dimension of

γ^{*} .

Also, this approach will provide a different asymptotic theory than the one in [14] for the estimator of the main regression parameter

β^{*} .

The CGQL cum-modified maximum likelihood estimation (MMLE) approach is provided in Section 2.1 and Section 2.2 below, and the asymptotic theory for the estimates is given in Section 3.

2.1. CGQL Estimating Function for Dynamic Dependence Parameters $(γ^{*})$ as a Function of Unknown $β^{*}$

Notice from (4) that conditional on

y_{t - 1},

the response vector

y_{t} = {(y_{t 1}, \dots, y_{t c}, \dots, y_{t, C - 1})}^{'}

follows a multinomial distribution with

(C - 1)

-dimensional mean vector

\begin{matrix} E [Y_{t} | y_{t - 1}] = η_{t | t - 1} (β^{*}, γ^{*}), \end{matrix}

(14)

and

(C - 1) \times (C - 1)

covariance matrix

\begin{matrix} cov [Y_{t} | y_{t - 1}] & = & D_{η_{t | t - 1}} - η_{t | t - 1} (β^{*}, γ^{*}) η_{t | t - 1}^{'} (β^{*}, γ^{*}) \\ = & Σ_{t | t - 1} (β^{*}, γ^{*}), (say), \end{matrix}

(15)

where

η_{t | t - 1} (β^{*}, γ^{*}) = {[η_{t | t - 1}^{(1)} (β^{*}, γ^{*}), \dots, η_{t | t - 1}^{(c)} (β^{*}, γ^{*}), \dots, η_{t | t - 1}^{(C - 1)} (β^{*}, γ^{*})]}^{'} : (C - 1) \times 1,

with

η_{t | t - 1}^{(c)} (β^{*}, γ^{*}) = \exp [x_{t}^{'} β_{c} + γ_{c}^{'} y_{t - 1}] / [1 + \sum_{v = 1}^{C - 1} \exp [x_{t}^{'} β_{v} + γ_{v}^{'} y_{t - 1}]],

for

c = 1, \dots, C - 1 .

The diagonal matrix in (15) has the form

D_{η_{t | t - 1}} = diag [η_{t | t - 1}^{(1)} (β^{*}, γ^{*}), \dots, η_{t | t - 1}^{(c)} (β^{*}, γ^{*}), \dots, η_{t | t - 1}^{(C - 1)} (β^{*}, γ^{*})] .

In notation, for

t = 2, \dots, T,

we write this multinomial distribution of

Y_{t}

as

Y_{t} | y_{t - 1} \sim Mult (η_{t | t - 1} (β^{*}, γ^{*}), Σ_{t | t - 1} (β^{*}, γ^{*})) .

(16)

Now, to develop a CGQL estimating function

{\hat{γ}}^{*} (β^{*})

for

γ^{*},

as a generalization of the QL approach of [21], we follow the GQL approach in [22] and exploit the conditional mean vector

η_{t | t - 1} (β^{*}, γ^{*})

and the conditional covariance matrix

Σ_{t | t - 1} (β^{*}, γ^{*}) .

For this purpose, suppose that the following two assumptions hold.

Assumption 1.

Consider the conditional probability function

η_{t | t - 1}^{(c)} (β^{*}, γ^{*}) = \exp [x_{t}^{'} β_{c} + γ_{c}^{'} y_{t - 1}] / [1 + \sum_{v = 1}^{C - 1} \exp [x_{t}^{'} β_{v} + γ_{v}^{'} y_{t - 1}]]

defined following (15), for a categorical observation to be in the c-th category at time t, which weights

γ_{c} = {(γ_{c 1}, \dots, γ_{c (C - 1)})}^{'}

with all possible

γ_{h} = {(γ_{h 1}, \dots, γ_{h (C - 1)})}^{'}, h = 1, \dots, C - 1

that could occur at time

(t - 1) .

We assume that this weighted probability function is differentiable with respect to the dynamic dependence parameters, that is,

\frac{\partial η_{t | t - 1}^{(c)} (β^{*}, γ^{*})}{\partial γ_{h}}

exists for all

c, h = 1, \dots, C - 1 .

Assumption 2.

The second-order derivative matrix

\frac{\partial η_{t | t - 1}^{'} (β^{*}, γ^{*})}{\partial γ^{*}} \frac{\partial η_{t | t - 1} (β^{*}, γ^{*})}{\partial {γ^{*}}^{'}}

is bounded and positive definite.

Proposition 1.

When the above two assumptions hold, the CGQL estimator

{\hat{γ}}_{C G Q L}^{*} (β^{*})

for

γ^{*}

may be obtained by using the iterative equation

\begin{matrix} {\hat{γ}}_{C G Q L}^{*} (β^{*}) (r + 1) = {\hat{γ}}_{C G Q L}^{*} (β^{*}) (r) \\ + & [{\{\sum_{t = 2}^{T} \frac{\partial η_{t | t - 1}^{'} (β^{*}, γ^{*})}{\partial γ^{*}} {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) \frac{\partial η_{t | t - 1} (β^{*}, γ^{*})}{\partial {γ^{*}}^{'}}\}}^{- 1} \\ \times & {\{\sum_{t = 2}^{T} \frac{\partial η_{t | t - 1}^{'} (β^{*}, γ^{*})}{\partial γ^{*}} {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) [y_{t} - η_{t | t - 1} (β^{*}, γ^{*})]\}]}_{| γ^{*} = {\hat{γ}}_{C G Q L}^{*} (r)}, \end{matrix}

(17)

Proof of Proposition 1.

This proposition follows from the fact that under the model (4), one may write the GQL estimating equation for

γ^{*}

as

\sum_{t = 2}^{T} \frac{\partial η_{t | t - 1}^{'} (β^{*}, γ^{*})}{\partial γ^{*}} {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) [y_{t} - η_{t | t - 1} (β^{*}, γ^{*})] = 0

(18)

□

Lemma 1.

In Assumption 2, the

{(C - 1)}^{2} \times (C - 1)

derivative matrix

\frac{\partial η_{t | t - 1}^{'} (β^{*}, γ^{*})}{\partial γ^{*}}

has the computational formula

\begin{matrix} \frac{\partial η_{t | t - 1}^{'} (β^{*}, γ^{*})}{\partial γ^{*}} = [\frac{\partial η_{t | t - 1}^{(1)} (β^{*}, γ^{*})}{\partial γ^{*}}, \dots, \frac{\partial η_{t | t - 1}^{(c)} (β^{*}, γ^{*})}{\partial γ^{*}}, \dots, \frac{\partial η_{t | t - 1}^{(C - 1)} (β^{*}, γ^{*})}{\partial γ^{*}}] \\ = & (\begin{matrix} η_{t | t - 1}^{(1)} (δ_{(t - 1) 1} - η_{t | t - 1}) & \dots & η_{t | t - 1}^{(C - 1)} (δ_{(t - 1) (C - 1)} - η_{t | t - 1}) \end{matrix}) \otimes y_{t - 1} \\ = & η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1}, \end{matrix}

(19)

where

η_{(t | t - 1), M}^{*} (β^{*}, γ^{*})

denotes the matrix constructed by using the

(C - 1)

-dimensional column vectors

[η_{t | t - 1}^{(c)} (δ_{(t - 1) c} - η_{t | t - 1})]

for all

c = 1, \dots, C - 1 .

Proof.

Because

η_{t | t - 1}^{(c)} (β^{*}, γ^{*}) = \exp [x_{t}^{'} β_{c} + γ_{c}^{'} y_{t - 1}] / [1 + \sum_{v = 1}^{C - 1} \exp [x_{t}^{'} β_{v} + γ_{v}^{'} y_{t - 1}]],

for

c = 1, \dots, C - 1,

by (5), it then follows that

\begin{matrix} \frac{\partial η_{t | t - 1}^{(c)}}{\partial γ_{h}} & = & \{\begin{matrix} y_{t - 1} η_{t | t - 1}^{(c)} [1 - η_{t | t - 1}^{(c)}] & for h = c; h, c = 1, \dots, C - 1 \\ - y_{t - 1} η_{t | t - 1}^{(c)} η_{t | t - 1}^{(h)} & for h \neq c; h, c = 1, \dots, C - 1 . \end{matrix} \end{matrix}

(20)

Next, because

γ^{*} = {(γ_{1}^{'}, \dots, γ_{h}^{'}, \dots, γ_{C - 1}^{'})}^{'},

one obtains

\begin{matrix} \frac{\partial η_{t | t - 1}^{(c)}}{\partial γ^{*}} & = & (\begin{matrix} - η_{t | t - 1}^{(1)} η_{t | t - 1}^{(c)} \\ ⋮ \\ η_{t | t - 1}^{(c)} [1 - η_{t | t - 1}^{(c)}] \\ ⋮ \\ - η_{t | t - 1}^{(C - 1)} η_{t | t - 1}^{(c)} \end{matrix}) \otimes y_{t - 1} : (C - 1) (C - 1) \times 1 \\ = & [η_{t | t - 1}^{(c)} (δ_{(t - 1) c} - η_{t | t - 1})] \otimes y_{t - 1}, \end{matrix}

(21)

with

δ_{(t - 1) c} = \{\begin{matrix} {[01_{c - 1}^{'}, 1, 01_{C - 1 - c}^{'}]}^{'} & for c = 1, \dots, C - 1 \\ 01_{C - 1} & for c = C . \end{matrix}

The lemma, i.e., Equation (19), then follows from (21). □
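The derivative formula (19) can be checked numerically against central finite differences. The sketch below is only a sanity check under assumed inputs: the helper names, the parameter values, and the choice of previous category are hypothetical, and NumPy is assumed to be available. Here γ* is stored as a matrix `G` whose rows are the γ_c', so that `G.ravel()` follows the stacking order of γ*.

```python
import numpy as np

def eta_vec(x_t, y_prev, B, G):
    """Conditional probabilities eta^(c), c = 1, ..., C-1, given y_{t-1}."""
    e = np.exp(B @ x_t + G @ y_prev)
    return e / (1.0 + e.sum())

def d_eta_d_gamma(x_t, y_prev, B, G):
    """Analytic (C-1)^2 x (C-1) matrix of (19): eta*_M (x) y_{t-1}."""
    eta = eta_vec(x_t, y_prev, B, G)
    eta_star_M = np.diag(eta) - np.outer(eta, eta)   # columns eta^(c) (delta_c - eta)
    return np.kron(eta_star_M, y_prev[:, None])

def numeric_d_eta_d_gamma(x_t, y_prev, B, G, h=1e-6):
    """Central finite differences, one element of gamma* at a time."""
    K = G.shape[0]
    out = np.zeros((K * K, K))
    for k in range(K * K):
        step = np.zeros(K * K)
        step[k] = h
        up = eta_vec(x_t, y_prev, B, G + step.reshape(K, K))
        dn = eta_vec(x_t, y_prev, B, G - step.reshape(K, K))
        out[k] = (up - dn) / (2 * h)
    return out

# Hypothetical values: C = 3, previous response in category 2.
B = np.array([[0.2, -0.4], [-0.1, 0.3]])
G = np.array([[0.5, -0.2], [0.1, 0.7]])
x_t = np.array([1.0, 0.6])
y_prev = np.array([0.0, 1.0])
analytic = d_eta_d_gamma(x_t, y_prev, B, G)
numeric = numeric_d_eta_d_gamma(x_t, y_prev, B, G)
```

Rows of the result that correspond to categories other than the one observed at time t − 1 are zero, since the matching element of y_{t − 1} is zero.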

Note that for the asymptotic studies to be discussed in Section 3, it is convenient to use (19) in (17) and re-express the iterative Equation (17) for

γ^{*}

as

\begin{matrix} {\hat{γ}}_{C G Q L}^{*} (β^{*}) (r + 1) = {\hat{γ}}_{C G Q L}^{*} (β^{*}) (r) \\ + & [{\{\sum_{t = 2}^{T} [(η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1}) {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) {(η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1})}^{'}]\}}^{- 1} \\ \times & {\{\sum_{t = 2}^{T} (η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1}) {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) [y_{t} - η_{t | t - 1} (β^{*}, γ^{*})]\}]}_{| γ^{*} = {\hat{γ}}_{C G Q L}^{*} (r)} . \end{matrix}

(22)

Let

{\hat{γ}}_{C G Q L}^{*} (β^{*})

denote the moment estimating function of

γ^{*}

obtained via (22).
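The iteration (22) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the simulated data, the starting value γ* = 0, and the stopping rule are all assumptions made here, and β* is simply held at a fixed hypothetical value to mimic conditioning on the regression parameters.

```python
import numpy as np

def eta_and_sigma(x_t, y_prev, B, G):
    """Conditional mean (14) and covariance (15) of Y_t given y_{t-1}."""
    e = np.exp(B @ x_t + G @ y_prev)
    eta = e / (1.0 + e.sum())
    return eta, np.diag(eta) - np.outer(eta, eta)

def simulate_mdl(X, B, G, rng):
    """Draw one-hot (C-1)-vectors y_1, ..., y_T; category C is the zero vector."""
    K = B.shape[0]                       # K = C - 1
    Y, y_prev = [], np.zeros(K)          # zero start: no lag term at t = 1
    for x_t in X:
        eta, _ = eta_and_sigma(x_t, y_prev, B, G)
        cat = rng.choice(K + 1, p=np.append(eta, 1.0 - eta.sum()))
        y_prev = np.eye(K + 1)[cat][:K]
        Y.append(y_prev)
    return np.array(Y)

def cgql_gamma(X, Y, B, n_iter=50, tol=1e-8):
    """Iterate (22) for gamma* with beta* (the rows of B) held fixed."""
    K = B.shape[0]
    G = np.zeros((K, K))                 # rows gamma_c'; G.ravel() is gamma*
    for _ in range(n_iter):
        info, score = np.zeros((K * K, K * K)), np.zeros(K * K)
        for t in range(1, len(X)):
            eta, Sig = eta_and_sigma(X[t], Y[t - 1], B, G)
            D = np.kron(Sig, Y[t - 1][:, None])   # eta*_M (x) y_{t-1}, see (19)
            W = D @ np.linalg.inv(Sig)
            info += W @ D.T
            score += W @ (Y[t] - eta)
        if np.linalg.norm(score) < tol:           # estimating function (18) is ~ 0
            break
        G = G + np.linalg.solve(info, score).reshape(K, K)
    return G, score

# Hypothetical setup: C = 3, intercept plus one covariate, T = 1500.
rng = np.random.default_rng(1)
T = 1500
X = np.column_stack([np.ones(T), rng.normal(size=T)])
B_true = np.array([[0.2, -0.4], [-0.1, 0.3]])
G_true = np.array([[0.8, -0.5], [0.3, 0.6]])
Y = simulate_mdl(X, B_true, G_true, rng)
G_hat, score = cgql_gamma(X, Y, B_true)
```

The bracketed sum being inverted is block-structured by the category occupied at time t − 1, so it is invertible only if each of the first C − 1 categories appears at least once as a previous state; the simulated series here is assumed long enough for that to hold.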

2.2. Modified Maximum Likelihood (MML) Estimation for $β^{*}$ Using Observed Information

Notice that because

γ^{*}

can be estimated by

{\hat{γ}}_{C G Q L}^{*} (β^{*})

using the estimating function given in (22), one is not concerned about the dimension of

γ^{*}

for

β^{*}

estimation. Thus in a reduced dimension setup, we may estimate

β^{*}

by exploiting the modified likelihood function for

β^{*}

, which is obtained as follows by replacing

γ^{*}

with

{\hat{γ}}_{C G Q L}^{*} (β^{*})

in the joint likelihood function

L (β^{*}, γ^{*}),

for

β^{*}

and

γ^{*} .

More specifically, the modified likelihood function for

β^{*},

by (4), may be written as

\begin{matrix} L (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) & = & Π_{c = 1}^{C} {[π_{(1) c} (β^{*}; x_{1})]}^{y_{1 c}} \\ \times & Π_{t = 2}^{T} Π_{c = 1}^{C} Π_{g = 1}^{C} {[{\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}); x_{t})]}^{y_{t c}}, \end{matrix}

(23)

where

{\tilde{η}}_{t | t - 1}^{(c | g)} (\cdot)

denotes the partially estimated dynamic probability function obtained from the true dynamic probability function

η_{t | t - 1}^{(c | g)} (\cdot)

defined in (4), by replacing

γ^{*}

with

{\hat{γ}}_{C G Q L}^{*} (β^{*}) .

Thus,

\begin{matrix} {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}); x_{t}) \\ = & \{\begin{matrix} \frac{\exp [x_{t}^{'} β_{c} + {\hat{γ}}_{c, C G Q L}^{'} (β^{*}) δ_{(t - 1) g}]}{1 + \sum_{v = 1}^{C - 1} \exp [x_{t}^{'} β_{v} + {\hat{γ}}_{v, C G Q L}^{'} (β^{*}) δ_{(t - 1) g}]}, & for c = 1, \dots, C - 1 \\ \frac{1}{1 + \sum_{v = 1}^{C - 1} \exp [x_{t}^{'} β_{v} + {\hat{γ}}_{v, C G Q L}^{'} (β^{*}) δ_{(t - 1) g}]}, & for c = C . \end{matrix} \end{matrix}

(24)
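To make the plug-in construction in (23) and (24) concrete, the sketch below works through the binary special case (C = 2), where γ* is a scalar. In that case the CGQL equation reduces to Σ_t y_{t−1}(y_t − p_t) = 0, because the derivative and the variance of p_t cancel. Everything here is an illustrative assumption: the function names, the simulated series, and the trial value of β are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loglik(beta, gamma, X, y):
    """Log of the joint likelihood for C = 2 (the BDL special case of (11))."""
    lag = np.concatenate([[0.0], y[:-1]])             # no lag term at t = 1
    p = sigmoid(X @ beta + gamma * lag)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def gamma_hat(beta, X, y, n_iter=50, tol=1e-10):
    """Scalar version of the CGQL iteration (22): solve for gamma given beta."""
    gamma, lag = 0.0, y[:-1]
    for _ in range(n_iter):
        p = sigmoid(X[1:] @ beta + gamma * lag)
        score = np.sum(lag * (y[1:] - p))
        if abs(score) < tol:
            break
        gamma += score / np.sum(lag**2 * p * (1 - p))
    return gamma

def modified_loglik(beta, X, y):
    """log L(beta, gamma_hat(beta)): a function of beta only, as in (23)."""
    return loglik(beta, gamma_hat(beta, X, y), X, y)

# Hypothetical data: T = 400, intercept plus one covariate.
rng = np.random.default_rng(7)
T = 400
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta_true, gamma_true = np.array([-0.3, 0.8]), 1.0
y = np.zeros(T)
for t in range(T):
    lag_t = y[t - 1] if t > 0 else 0.0
    y[t] = float(rng.random() < sigmoid(X[t] @ beta_true + gamma_true * lag_t))

beta_trial = np.array([0.0, 0.5])
value = modified_loglik(beta_trial, X, y)
```

In this logit setting the CGQL root also maximizes the log likelihood in γ for the given β, so `modified_loglik(beta_trial, X, y)` is never smaller than `loglik(beta_trial, g, X, y)` for any other scalar `g`. The outer maximization over β then involves only p + 1 unknowns.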

We remark that as

{\hat{γ}}_{C G Q L}^{*} (β^{*})

from (22) has an implicit functional form, the construction of the likelihood estimating equation for

β^{*}

encounters a computational problem because of the difficulty in obtaining

\frac{\partial {\hat{γ}}_{C G Q L}^{*} (β^{*})}{\partial β^{*}}

from an implicit function. However, the likelihood estimating equation for

β^{*}

involving

\frac{\partial {\hat{γ}}_{C G Q L}^{*} (β^{*})}{\partial β^{*}}

may be computed as follows:

Likelihood estimating equation for $β^{*}$ : We follow (23) and write this estimating equation for

β^{*}

as

\begin{matrix} \frac{\partial \log L (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}} = \sum_{c = 1}^{C} \frac{y_{1 c}}{π_{(1) c} (β^{*})} \frac{\partial π_{(1) c} (β^{*})}{\partial β^{*}} \\ + & \sum_{t = 2}^{T} \sum_{g = 1}^{C} \sum_{c = 1}^{C} [\frac{y_{t c}}{{\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))} \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}}] = 0, \end{matrix}

(25)

where

π_{(1) c} (β^{*})

and

{\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))

are given by (4) and (24), respectively. Their derivatives with respect to

β^{*}

needed for (25) are given in the following lemmas.

Lemma 2.

Computation of

\frac{\partial π_{(1) c} (β^{*})}{\partial β^{*}} .

This derivative has the formula

\frac{\partial π_{(1) c} (β^{*})}{\partial β^{*}} = [π_{(1) c} (β^{*}) (δ_{(1) c} - π_{(1)} (β^{*}))] \otimes x_{1} : (C - 1) (p + 1) \times 1,

(26)

where

δ_{(1) c} = \{\begin{matrix} {[01_{c - 1}^{'}, 1, 01_{C - 1 - c}^{'}]}^{'} & for c = 1, \dots, C - 1 \\ 01_{C - 1} & for c = C . \end{matrix}

Proof.

The proof is obvious because

π_{(1)} (β^{*}) = {(π_{(1) 1} (β^{*}), \dots, π_{(1) c} (β^{*}), \dots, π_{(1) (C - 1)} (β^{*}))}^{'},

with

π_{(1) c} (β^{*}) = \{\begin{matrix} \frac{\exp (x_{1}^{'} β_{c})}{1 + \sum_{g = 1}^{C - 1} \exp (x_{1}^{'} β_{g})} & for c = 1, \dots, C - 1; \\ \frac{1}{1 + \sum_{g = 1}^{C - 1} \exp (x_{1}^{'} β_{g})} & for c = C, \end{matrix}

by (4). □
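For completeness, the differentiation behind (26) can be written out block by block. The display below is a worked sketch that uses only the definition of π_{(1) c} in (4); the shorthand D for the common denominator is introduced here for convenience.

```latex
% With D = 1 + \sum_{g=1}^{C-1} \exp(x_1' \beta_g) and h, c = 1, \dots, C-1:
\frac{\partial \pi_{(1)c}}{\partial \beta_h}
  = \frac{\partial}{\partial \beta_h}\,\frac{\exp(x_1'\beta_c)}{D}
  = \begin{cases}
      \pi_{(1)c}\,[1 - \pi_{(1)c}]\, x_1, & h = c, \\
      -\,\pi_{(1)c}\,\pi_{(1)h}\, x_1,    & h \neq c,
    \end{cases}
\qquad
\frac{\partial \pi_{(1)C}}{\partial \beta_h} = -\,\pi_{(1)C}\,\pi_{(1)h}\, x_1 .
% Stacking the blocks over h = 1, \dots, C-1 in the order of \beta^{*}:
\frac{\partial \pi_{(1)c}}{\partial \beta^{*}}
  = \bigl[\pi_{(1)c}\,(\delta_{(1)c} - \pi_{(1)})\bigr] \otimes x_1 ,
\qquad \delta_{(1)C} = 0\,1_{C-1} .
```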

Lemma 3.

Computation of

\frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}} : (p + 1) (C - 1) \times 1 .

The computation of this derivative matrix requires the formula for the derivative matrix

\frac{\partial [{\hat{γ}}_{c, C G Q L}^{'} (β^{*})]}{\partial β^{*}} : (p + 1) (C - 1) \times (C - 1),

for all

c = 1, \dots, C - 1,

which can be derived from the formula for the derivative matrix

\frac{\partial [{\hat{γ}}_{C G Q L}^{*^{'}} (β^{*})]}{\partial β^{*}} : (p + 1) (C - 1) \times {(C - 1)}^{2}

as follows:

\begin{matrix} \frac{\partial [{\hat{γ}}_{C G Q L}^{*^{'}} (β^{*})]}{\partial β^{*}} \\ = & - [\{\sum_{t = 2}^{T} (η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes x_{t}) {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) (η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1}^{'})\} \\ \times & {\{\sum_{t = 2}^{T} [(η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1}) {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) {(η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1})}^{'}]\}}^{- 1}] \\ = & [\frac{\partial [{\hat{γ}}_{1, C G Q L}^{'} (β^{*})]}{\partial β^{*}}, \dots, \frac{\partial [{\hat{γ}}_{c, C G Q L}^{'} (β^{*})]}{\partial β^{*}}, \\ \dots, \frac{\partial [{\hat{γ}}_{(C - 1), C G Q L}^{'} (β^{*})]}{\partial β^{*}}] : (p + 1) (C - 1) \times {(C - 1)}^{2} . \end{matrix}

(27)

Proof.

Because the CGQL estimating function for

γ^{*}

, that is,

{\hat{γ}}_{C G Q L}^{*} (β^{*}),

is obtained from (22) at its final iteration stage, the estimating function has the form

\begin{matrix} {\hat{γ}}_{C G Q L}^{*} (β^{*}) = γ^{*} \\ + & [{\{\sum_{t = 2}^{T} [(η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1}) {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) {(η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1})}^{'}]\}}^{- 1} \\ \times & \{\sum_{t = 2}^{T} (η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1}) {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) [y_{t} - η_{t | t - 1} (β^{*}, γ^{*})]\}] . \end{matrix}

(28)

The lemma now follows, first because the

β^{*}

involved in the first derivative as well as in the inverse covariance matrix

{Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*})

is treated to be known from the previous, i.e., from the second-last iteration, and next because by a similar operation as (19), the derivative of the second term in (28) follows from the formula

\begin{matrix} \frac{\partial η_{t | t - 1}^{'} (β^{*}, γ^{*})}{\partial β^{*}} = [\frac{\partial η_{t | t - 1}^{(1)} (β^{*}, γ^{*})}{\partial β^{*}}, \dots, \frac{\partial η_{t | t - 1}^{(c)} (β^{*}, γ^{*})}{\partial β^{*}}, \dots, \frac{\partial η_{t | t - 1}^{(C - 1)} (β^{*}, γ^{*})}{\partial β^{*}}] \\ = & (\begin{matrix} η_{t | t - 1}^{(1)} (δ_{(t - 1) 1} - η_{t | t - 1}) & \dots & η_{t | t - 1}^{(C - 1)} (δ_{(t - 1) (C - 1)} - η_{t | t - 1}) \end{matrix}) \otimes x_{t} \\ = & η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes x_{t}, (C - 1) (p + 1) \times (C - 1) . \end{matrix}

(29)

□
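The derivative formula (29) can be checked numerically. The sketch below assumes the conditional multinomial logit form used in (31), reads $η_{(t | t - 1), M}^{*}$ off (29) as the matrix whose cth column is $η_{t | t - 1}^{(c)} (δ_{c} - η_{t | t - 1})$, that is, $diag (η) - η η^{'}$, and compares $η_{(t | t - 1), M}^{*} \otimes x_{t}$ with a finite-difference Jacobian. All numerical values and function names are hypothetical.

```python
import numpy as np

def cond_probs(beta, gamma, x_t, y_prev):
    """eta_{t|t-1}: first C-1 conditional probabilities given the lag-1 response."""
    scores = np.exp(beta @ x_t + gamma @ y_prev)
    return scores / (1.0 + scores.sum())

def d_eta_d_beta(beta, gamma, x_t, y_prev):
    """Closed form eta_M kron x_t, with eta_M = diag(eta) - eta eta'."""
    eta = cond_probs(beta, gamma, x_t, y_prev)
    eta_M = np.diag(eta) - np.outer(eta, eta)
    return np.kron(eta_M, x_t[:, None])      # shape ((C-1)(p+1), C-1)

def numeric_jacobian(beta, gamma, x_t, y_prev, eps=1e-6):
    """Row j holds the central difference of eta with respect to the jth element of beta*."""
    flat = beta.ravel()
    jac = np.zeros((flat.size, beta.shape[0]))
    for j in range(flat.size):
        step = np.zeros_like(flat)
        step[j] = eps
        up = cond_probs((flat + step).reshape(beta.shape), gamma, x_t, y_prev)
        dn = cond_probs((flat - step).reshape(beta.shape), gamma, x_t, y_prev)
        jac[j] = (up - dn) / (2 * eps)
    return jac

# Hypothetical values: C = 3, p + 1 = 2, previous response in category 1.
beta = np.array([[0.3, -0.5], [-0.2, 0.4]])
gamma = np.array([[0.8, -0.4], [-0.5, 0.6]])   # row c holds gamma_c'
x_t = np.array([1.0, -0.6])
y_prev = np.array([1.0, 0.0])

closed = d_eta_d_beta(beta, gamma, x_t, y_prev)
err = np.abs(closed - numeric_jacobian(beta, gamma, x_t, y_prev)).max()
print("shape:", closed.shape, "largest discrepancy:", err)
```

Here $γ^{*}$ is held fixed, so the check covers only the partial derivative in (29), not the additional dependence of ${\hat{γ}}_{C G Q L}^{*} (β^{*})$ on $β^{*}$ handled in Lemma 4.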

Lemma 4.

Computation of

\frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}} : (p + 1) (C - 1) \times 1

(continued). This derivative has the formula given by

\begin{matrix} \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}} & = & {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) [(δ_{(t) c} \otimes x_{t}) + \frac{\partial [{\hat{γ}}_{c, C G Q L}^{'} (β^{*})]}{\partial β^{*}} y_{t - 1}^{(g)}] \\ - & {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) \sum_{ν}^{C - 1} {\tilde{η}}_{t | t - 1}^{(ν | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) \\ \times & [(δ_{(t) ν} \otimes x_{t}) + \frac{\partial [{\hat{γ}}_{ν, C G Q L}^{'} (β^{*})]}{\partial β^{*}} y_{t - 1}^{(g)}], \end{matrix}

(30)

where, for example,

\frac{\partial [{\hat{γ}}_{c, C G Q L}^{'} (β^{*})]}{\partial β^{*}}

is the

(p + 1) (C - 1) \times (C - 1)

-dimensional cth component matrix in (27) for all

c = 1, \dots, C - 1,

and

\frac{\partial [{\hat{γ}}_{C, C G Q L}^{'} (β^{*})]}{\partial β^{*}} = 0,

δ_{(t) C} = 0 \, 1_{C - 1}

without any loss of generality.

Proof.

Re-express the formula for

{\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}); x_{t})

in (24) as

\begin{matrix} {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}); x_{t}) \\ = & \{\begin{matrix} \frac{N_{c}}{D}, & for c = 1, \dots, C - 1 \\ \frac{1}{D}, & for c = C, \end{matrix} \end{matrix}

(31)

where

N_{c} = \exp [x_{t}^{'} β_{c} + {\hat{γ}}_{c, C G Q L}^{'} (β^{*}) y_{t - 1}^{(g)}],

and

D = 1 + \sum_{ν = 1}^{C - 1} \exp [x_{t}^{'} β_{ν} + {\hat{γ}}_{ν, C G Q L}^{'} (β^{*}) y_{t - 1}^{(g)}] .

It then follows that

\begin{matrix} \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}} = {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) [\frac{\partial}{\partial β^{*}} \{x_{t}^{'} β_{c} + {\hat{γ}}_{c, C G Q L}^{'} (β^{*}) y_{t - 1}^{(g)}\} \\ - \frac{1}{D} \sum_{ν = 1}^{C - 1} [\exp (x_{t}^{'} β_{ν} + {\hat{γ}}_{ν, C G Q L}^{'} (β^{*}) y_{t - 1}^{(g)}) \frac{\partial}{\partial β^{*}} \{x_{t}^{'} β_{ν} + {\hat{γ}}_{ν, C G Q L}^{'} (β^{*}) y_{t - 1}^{(g)}\}]] . \end{matrix}

(32)

The formula in (30) follows from (32) because

{\tilde{η}}_{t | t - 1}^{(ν | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) = \frac{1}{D} \exp (x_{t}^{'} β_{ν} + {\hat{γ}}_{ν, C G Q L}^{'} (β^{*}) y_{t - 1}^{(g)}), and \frac{\partial}{\partial β^{*}} \{x_{t}^{'} β_{ν}\} = (δ_{(t) ν} \otimes x_{t}) .

□
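To illustrate the chain rule behind (30), the following sketch replaces ${\hat{γ}}_{c, C G Q L} (β^{*})$ by a hypothetical linear stand-in $a_{c} + G_{c}^{'} β^{*}$, so that its derivative matrix is exactly $G_{c}$, and then compares the closed form (30) with finite differences. The stand-in is an assumption made purely so that the derivative is known; it is not the CGQL solution itself.

```python
import numpy as np

C1, P1 = 2, 2                                   # C - 1 and p + 1 (hypothetical sizes)
rng = np.random.default_rng(0)
A = rng.normal(scale=0.3, size=(C1, C1))        # row c holds a_c
G = rng.normal(scale=0.3, size=(C1, C1 * P1, C1))  # G[c] plays the role of d gamma_hat_c' / d beta*

def gamma_hat(beta_flat):
    """Hypothetical stand-in: gamma_hat_c(beta*) = a_c + G_c' beta*."""
    return A + np.einsum("cjk,j->ck", G, beta_flat)

def eta_tilde(beta_flat, x_t, y_prev):
    """N_c / D from (31) for c = 1, ..., C-1."""
    beta = beta_flat.reshape(C1, P1)
    scores = np.exp(beta @ x_t + gamma_hat(beta_flat) @ y_prev)
    return scores / (1.0 + scores.sum())

def grad_eta_tilde(beta_flat, x_t, y_prev, c):
    """Closed form (30) for category c (0-based index, c < C - 1)."""
    eta = eta_tilde(beta_flat, x_t, y_prev)
    eye = np.eye(C1)
    # inner[nu] = (delta_nu kron x_t) + G_nu y_prev
    inner = np.stack([np.kron(eye[nu], x_t) + G[nu] @ y_prev for nu in range(C1)])
    return eta[c] * (inner[c] - eta @ inner)

def numeric_grad(beta_flat, x_t, y_prev, c, eps=1e-6):
    grad = np.zeros_like(beta_flat)
    for j in range(beta_flat.size):
        step = np.zeros_like(beta_flat)
        step[j] = eps
        up = eta_tilde(beta_flat + step, x_t, y_prev)[c]
        dn = eta_tilde(beta_flat - step, x_t, y_prev)[c]
        grad[j] = (up - dn) / (2 * eps)
    return grad

beta_flat = np.array([0.3, -0.5, -0.2, 0.4])
x_t = np.array([1.0, 0.9])
y_prev = np.array([0.0, 1.0])                   # previous response in category 2

err = max(
    np.abs(grad_eta_tilde(beta_flat, x_t, y_prev, c)
           - numeric_grad(beta_flat, x_t, y_prev, c)).max()
    for c in range(C1)
)
print("largest discrepancy:", err)
```

The term `G[nu] @ y_prev` is what distinguishes (30) from the fixed-$γ^{*}$ derivative in (29); setting `G` to zero recovers the latter.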

Simplified likelihood estimating equation for $β^{*}$: Notice that by using the derivative formulas from (26) and (32), one may reduce the likelihood estimating Equation (25) to

\begin{matrix} \frac{\partial L o g L (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}} = \sum_{c = 1}^{C} y_{1 c} [δ_{(1) c} - π_{(1)} (β^{*})] \otimes x_{1} \\ + & \sum_{t = 2}^{T} \sum_{g = 1}^{C} \sum_{c = 1}^{C} y_{t c} [\{(δ_{(t) c} \otimes x_{t}) + \frac{\partial [{\hat{γ}}_{c, C G Q L}^{'} (β^{*})]}{\partial β^{*}} y_{t - 1}^{(g)}\} \\ - & \sum_{ν}^{C - 1} {\tilde{η}}_{t | t - 1}^{(ν | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) \{(δ_{(t) ν} \otimes x_{t}) + \frac{\partial [{\hat{γ}}_{ν, C G Q L}^{'} (β^{*})]}{\partial β^{*}} y_{t - 1}^{(g)}\}] = 0, \end{matrix}

(33)

which is easily computable as the formulas for

\frac{\partial [{\hat{γ}}_{c, C G Q L}^{'} (β^{*})]}{\partial β^{*}}

for all

c = 1, \dots, C - 1

involved in this reduced form are available from (27). For

c = C

one uses

\frac{\partial [{\hat{γ}}_{C, C G Q L}^{'} (β^{*})]}{\partial β^{*}} = 0,

and

δ_{(t) C} = 0 \, 1_{C - 1} .

Lemma 5.

The likelihood Equation (33) for

β^{*}

may be solved by using the iterative equation

\begin{matrix} {\hat{β}}^{*} (r + 1) = {\hat{β}}^{*} (r) \\ - & {[{\{\frac{\partial^{2} L o g L (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*} \partial {β^{*}}^{'}}\}}^{- 1} \frac{\partial L o g L (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}}]}_{β^{*} = {\hat{β}}^{*} (r)}, \end{matrix}

(34)

where, under the assumption that

β^{*}

involved in the derivative

\frac{\partial {\hat{γ}}_{c, C G Q L}^{'} (β^{*})}{\partial β^{*}}

in (30) or (33) for

c = 1, \dots, C - 1,

is known from the previous iteration, the second-order derivative in (34), by (33), has the formula

\begin{matrix} \frac{\partial^{2} L o g L (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*} \partial {β^{*}}^{'}} \\ = & - \sum_{c = 1}^{C} y_{1 c} [\frac{\partial π_{(1)} (β^{*})}{\partial {β^{*}}^{'}}] \otimes x_{1} \\ - & \sum_{t = 2}^{T} \sum_{g = 1}^{C} \sum_{c = 1}^{C} y_{t c} [\sum_{ν}^{C - 1} \{(δ_{(t) ν} \otimes x_{t}) + \frac{\partial [{\hat{γ}}_{ν, C G Q L}^{'} (β^{*})]}{\partial β^{*}} y_{t - 1}^{(g)}\} \frac{\partial {\tilde{η}}_{t | t - 1}^{(ν | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial {β^{*}}^{'}}], \end{matrix}

(35)

where

π_{(1)} (β^{*}) = {[π_{(1) 1}, \dots, π_{(1) c}, \dots, π_{(1) (C - 1)}]}^{'}

and

\frac{\partial π_{(1) c} (β^{*})}{\partial β^{*}} = [π_{(1) c} (β^{*}) (δ_{(1) c} - π_{(1)} (β^{*}))] \otimes x_{1} : (C - 1) (p + 1) \times 1,

by Lemma 2; and

\frac{\partial {\tilde{η}}_{t | t - 1}^{(ν | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial {β^{*}}^{'}}

has the same formula as in (30).
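A Newton–Raphson iteration of this kind can be sketched in a deliberately simplified setting. The code below drops the dynamic dependence term (that is, it takes ${\hat{γ}}_{C G Q L}^{*} \equiv 0$, an assumption made only to keep the example short), so that the score is $\sum_{t} (y_{t} - π_{t}) \otimes x_{t}$ and the Hessian is $- \sum_{t} (diag (π_{t}) - π_{t} π_{t}^{'}) \otimes x_{t} x_{t}^{'}$. It uses the standard update, current value minus inverse Hessian times score; the simulated data and true values are hypothetical.

```python
import numpy as np

def probs(beta, X):
    """Row t holds the first C-1 category probabilities at time t."""
    scores = np.exp(X @ beta.T)
    return scores / (1.0 + scores.sum(axis=1, keepdims=True))

def score_and_hessian(beta, X, Y):
    """Score and Hessian of the multinomial logit log likelihood in beta*."""
    P = probs(beta, X)
    k = beta.size
    score = np.zeros(k)
    hess = np.zeros((k, k))
    for x, y, p in zip(X, Y, P):
        score += np.kron(y - p, x)
        hess -= np.kron(np.diag(p) - np.outer(p, p), np.outer(x, x))
    return score, hess

def newton_raphson(X, Y, n_iter=25, tol=1e-10):
    beta = np.zeros((Y.shape[1], X.shape[1]))
    for _ in range(n_iter):
        score, hess = score_and_hessian(beta, X, Y)
        step = np.linalg.solve(hess, score)
        beta = beta - step.reshape(beta.shape)   # beta(r+1) = beta(r) - H^{-1} s
        if np.abs(step).max() < tol:
            break
    return beta

# Simulated data with hypothetical true values (C = 3, p + 1 = 2).
rng = np.random.default_rng(1)
T = 2000
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta_true = np.array([[0.5, -1.0], [-0.3, 0.8]])
P_true = probs(beta_true, X)
full = np.column_stack([P_true, 1.0 - P_true.sum(axis=1)])
cats = np.array([rng.choice(3, p=row) for row in full])
Y = np.eye(3)[cats][:, :2]                       # drop the reference category

beta_hat = newton_raphson(X, Y)
final_score, _ = score_and_hessian(beta_hat, X, Y)
print(beta_hat)
print("largest score component at the solution:", np.abs(final_score).max())
```

In the setting of Lemma 5 the score and Hessian would instead be (33) and (35), with the derivative of ${\hat{γ}}_{c, C G Q L}^{'} (β^{*})$ held at its value from the previous iteration; only the update rule carries over unchanged from this sketch.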

3. Asymptotics

3.1. Consistency of ${\hat{γ}}_{C G Q L}^{*} (β^{*})$ for Dynamic Dependence Parameter $γ^{*}$

For this purpose, we will derive the asymptotic (as

T \to \infty

) distribution of

{\hat{γ}}_{C G Q L}^{*} (β^{*})

(solution of moment Equation (22)) as in Theorem 1 given below. Recall that the CGQL estimating function for

γ^{*},

i.e., the left-hand side of (18) [see also (19)] has the formula

\begin{matrix} f_{T} (β^{*}, γ^{*}) = \sum_{t = 2}^{T} \frac{\partial η_{t | t - 1}^{'} (β^{*}, γ^{*})}{\partial γ^{*}} {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) [y_{t} - η_{t | t - 1} (β^{*}, γ^{*})] \\ = & \sum_{t = 2}^{T} [η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1}] {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) [y_{t} - η_{t | t - 1} (β^{*}, γ^{*})] \end{matrix}

(36)

so that

{\hat{γ}}_{C G Q L}^{*} (β^{*})

satisfies

\begin{matrix} f_{T} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) = \sum_{t = 2}^{T} [{\tilde{η}}_{(t | t - 1), M}^{*} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) \otimes y_{t - 1}] \\ \times & {\tilde{Σ}}^{- 1}_{t | t - 1} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) [y_{t} - {\tilde{η}}_{t | t - 1} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))] = 0 . \end{matrix}

(37)

Notice that conditional on the past history,

y_{t}

in (36) for

t = 2, \dots, T,

depends on only the lag 1 response

y_{t - 1} .

Thus, conditional on

y_{t - 1}

, all

y_{t}

may be treated to be independent. Consequently,

f_{T} (β^{*}, γ^{*})

in (36) is, conditionally, a sum of

T - 1

independent quantities. Furthermore, as shown in (16),

Y_{t} | y_{t - 1} \sim Mult (η_{t | t - 1} (β^{*}, γ^{*}), Σ_{t | t - 1} (β^{*}, γ^{*})) .

Next, for true

γ^{*},

using (36), we write

\begin{matrix} {\bar{f}}_{T} (β^{*}, γ^{*}) & = & \frac{1}{T - 1} \sum_{t = 2}^{T} f_{t} (β^{*}, γ^{*}) \\ = & \frac{1}{T - 1} \sum_{t = 2}^{T} [η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1}] {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) [y_{t} - η_{t | t - 1} (β^{*}, γ^{*})] . \end{matrix}

(38)

Because

Y_{t} | y_{t - 1}

has the aforementioned multinomial distribution (see also (16)), it then follows that

\begin{matrix} E [{\bar{f}}_{T} (β^{*}, γ^{*})] & = & 0 \\ cov [{\bar{f}}_{T} (β^{*}, γ^{*})] & = & \frac{1}{{(T - 1)}^{2}} \sum_{t = 2}^{T} [η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1}] \\ \times & {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) {[η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1}]}^{'} \end{matrix}

(39)

\begin{matrix} = & \frac{1}{{(T - 1)}^{2}} V_{T}^{*} (β^{*}, γ^{*}), (say) . \end{matrix}

(40)

We now derive the asymptotic distribution of

{\hat{γ}}_{C G Q L}^{*} (β^{*})

as in the following theorem.

Theorem 1.

We assume that

f_{t} (\cdot)

in (38) satisfy the Lindeberg condition, that is,

\lim_{T \to \infty} {V^{*}}_{T}^{- 1} \sum_{t = 2}^{T} \sum_{(f_{t}^{'} {V^{*}}_{T}^{- 1} f_{t}) > ϵ} f_{t} f_{t}^{'} g (f_{t}) = 0

(41)

for all

ϵ > 0, g (\cdot)

being the probability distribution of

f_{t} (\cdot) .

Then

\begin{matrix} \lim_{T \to \infty} {\hat{γ}}_{C G Q L}^{*} (β^{*}) \\ \to & N (γ^{*}, \{E_{Y_{t - 1}} \sum_{t = 2}^{T} [(η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1}) {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) \\ \times & {(η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1})}^{'}]\}^{- 1} E_{Y_{t - 1}} [V_{T}^{*} (β^{*}, γ^{*})] \\ \times & {\{E_{Y_{t - 1}} \sum_{t = 2}^{T} [(η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1}) {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) {(η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1})}^{'}]\}}^{- 1}) . \end{matrix}

(42)

Proof.

Let

\begin{matrix} Z_{T} & = & {[cov [{\bar{f}}_{T} (β^{*}, γ^{*})]]}^{- \frac{1}{2}} {\bar{f}}_{T} (β^{*}, γ^{*}) \\ = & (T - 1) {[V_{T}^{*} (β^{*}, γ^{*})]}^{- \frac{1}{2}} {\bar{f}}_{T} (β^{*}, γ^{*}), \end{matrix}

(43)

where

{\bar{f}}_{T} (β^{*}, γ^{*}) = \frac{1}{T - 1} \sum_{t = 2}^{T} f_{t} (β^{*}, γ^{*})

as in (38). Here

{f_{t} (\cdot)}

s are not identically distributed because, by (16),

Y_{t} | y_{t - 1} \sim Mult (η_{t | t - 1} (β^{*}, γ^{*}), Σ_{t | t - 1} (β^{*}, γ^{*})),

(44)

where the mean vectors and covariance matrices are different at different time points

t .

However, because

{f_{t} (\cdot)}

s satisfy the Lindeberg condition (41), it then follows from the Lindeberg–Feller central limit theorem ([6], Theorem 3.3.6, [23], Theorem 2.2) that

Z_{T}

in (43) has the following limiting distribution:

\lim_{T \to \infty} Z_{T} \to N (0, I_{{(C - 1)}^{2}}) .

(45)

Next, because

{\hat{γ}}_{C G Q L}^{*} (β^{*})

obtained by (22) is a solution of (18), it satisfies (37), i.e.,

\begin{matrix} \sum_{t = 2}^{T} f_{t} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) \\ = & \sum_{t = 2}^{T} [{\tilde{η}}_{(t | t - 1), M}^{*} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) \otimes y_{t - 1}] \\ \times & {\tilde{Σ}}^{- 1}_{t | t - 1} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) [y_{t} - {\tilde{η}}_{t | t - 1} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))] = 0, \end{matrix}

(46)

which, by first-order Taylor’s series expansion, produces

\sum_{t = 2}^{T} f_{t} (β^{*}, γ^{*}) + ({\hat{γ}}_{C G Q L}^{*} (β^{*}) - γ^{*}) \sum_{t = 2}^{T} \frac{\partial f_{t} (β^{*}, γ^{*})}{\partial {γ^{*}}^{'}} ≃ 0 .

(47)

Thus,

\begin{matrix} [{\hat{γ}}_{C G Q L}^{*} (β^{*}) - γ^{*}] ≃ - {[\sum_{t = 2}^{T} \frac{\partial f_{t} (β^{*}, γ^{*})}{\partial {γ^{*}}^{'}}]}^{- 1} \sum_{t = 2}^{T} f_{t} (β^{*}, γ^{*}) \\ \to & - \{- E_{Y_{t - 1}} \sum_{t = 2}^{T} [(η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1}) {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) \\ \times & {(η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1})}^{'}]\}^{- 1} E_{Y_{t - 1}} \sum_{t = 2}^{T} f_{t} (β^{*}, γ^{*}) \\ = & {\{E_{Y_{t - 1}} \sum_{t = 2}^{T} [(η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1}) {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) {(η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1})}^{'}]\}}^{- 1} \\ \times & E_{Y_{t - 1}} [{[V_{T}^{*} (β^{*}, γ^{*})]}^{\frac{1}{2}} {[V_{T}^{*} (β^{*}, γ^{*})]}^{- \frac{1}{2}} (T - 1) {\bar{f}}_{T} (β^{*}, γ^{*})] \\ = & {\{E_{Y_{t - 1}} \sum_{t = 2}^{T} [(η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1}) {Σ^{- 1}}_{t | t - 1} (β^{*}, γ^{*}) {(η_{(t | t - 1), M}^{*} (β^{*}, γ^{*}) \otimes y_{t - 1})}^{'}]\}}^{- 1} \\ \times & {[E_{Y_{t - 1}} V_{T}^{*} (β^{*}, γ^{*})]}^{\frac{1}{2}} Z_{T} \\ = & {[E_{Y_{t - 1}} V_{T}^{*} (β^{*}, γ^{*})]}^{- \frac{1}{2}} Z_{T} \end{matrix}

(48)

by (40) and (43). The theorem, i.e., (42), follows from (48) because the limiting distribution

(q (\cdot))

of

Z_{T}

is normal, that is,

\lim_{T \to \infty} q (Z_{T}) \to N (0, I_{{(C - 1)}^{2}})

by (45). Furthermore, because the quantity in the right-hand side of (48) can be re-expressed by (40) as

E_{Y_{t - 1}} {[{(T - 1)}^{2} cov ({\bar{f}}_{T} (β^{*}, γ^{*}) | y_{t - 1})]}^{- \frac{1}{2}} Z_{T},

it then follows under a mild regularity condition (i.e., by assuming that the covariance

E_{Y_{t - 1}} cov ({\bar{f}}_{T} (β^{*}, γ^{*}) | y_{t - 1})

conditional on the past history is finite) that

\lim_{T \to \infty} [{\hat{γ}}_{C G Q L}^{*} (β^{*}) - γ^{*}] \to 0,

showing that

{\hat{γ}}_{C G Q L}^{*} (β^{*})

is a mean squared error consistent function for

γ^{*},

for any

β^{*} .

□
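The consistency statement can be illustrated by a small simulation. The sketch below generates a lag-1 dependent multinomial series from the conditional logit form in (31), treats $β^{*}$ as known, and runs the iteration (28) for $γ^{*}$. It assumes that $Σ_{t | t - 1}$ is the usual multinomial covariance $diag (η) - η η^{'}$ and that $η_{(t | t - 1), M}^{*}$ coincides with it (as (29) suggests); the series length, seed, starting rule at t = 1, and true values are all hypothetical.

```python
import numpy as np

def cond_probs(beta, gamma, x_t, y_prev):
    scores = np.exp(beta @ x_t + gamma @ y_prev)
    return scores / (1.0 + scores.sum())

def simulate(beta, gamma, X, rng):
    """Rows of Y are indicator vectors of length C-1 (all zeros for category C)."""
    C1 = beta.shape[0]
    Y = np.zeros((X.shape[0], C1))
    y_prev = np.zeros(C1)                       # t = 1 uses the baseline probabilities
    for t, x_t in enumerate(X):
        eta = cond_probs(beta, gamma, x_t, y_prev)
        cat = rng.choice(C1 + 1, p=np.append(eta, 1.0 - eta.sum()))
        if cat < C1:
            Y[t, cat] = 1.0
        y_prev = Y[t]
    return Y

def cgql_gamma(beta, X, Y, n_iter=50, tol=1e-10):
    """Iterate (28) for gamma* with beta* held fixed."""
    C1 = beta.shape[0]
    gamma = np.zeros((C1, C1))
    for _ in range(n_iter):
        lhs = np.zeros((C1 * C1, C1 * C1))
        rhs = np.zeros(C1 * C1)
        for t in range(1, X.shape[0]):
            eta = cond_probs(beta, gamma, X[t], Y[t - 1])
            sigma = np.diag(eta) - np.outer(eta, eta)   # assumed Sigma_{t|t-1}
            D = np.kron(sigma, Y[t - 1][:, None])       # eta_M kron y_{t-1}
            W = D @ np.linalg.inv(sigma)
            lhs += W @ D.T
            rhs += W @ (Y[t] - eta)
        step = np.linalg.solve(lhs, rhs)
        gamma = gamma + step.reshape(C1, C1)
        if np.abs(step).max() < tol:
            break
    return gamma

rng = np.random.default_rng(2)
T = 3000
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta_true = np.array([[0.3, -0.5], [-0.2, 0.4]])
gamma_true = np.array([[0.8, -0.4], [-0.5, 0.6]])    # row c holds gamma_c'

Y = simulate(beta_true, gamma_true, X, rng)
gamma_hat = cgql_gamma(beta_true, X, Y)
print(gamma_hat)
print("largest absolute error:", np.abs(gamma_hat - gamma_true).max())
```

Re-running with a larger `T` would be expected to shrink the error on average, in line with Theorem 1, although any single simulated series can deviate from that trend.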

We remark that a Lindeberg condition similar to (41) was examined earlier in [14]. For details, we refer to ([14], Assumption N(iii), p. 89) for the Lindeberg condition, and to ([14], second paragraph of the proof of Theorem 1, p. 93) for its proof.

3.2. Consistency of MML Estimator ${\hat{β}}^{*}$ for $β^{*}$

Let

h (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))

denote the likelihood estimating function, i.e., the left-hand side of the likelihood estimating Equation (25) [see also (33)] for

β^{*} .

We further express this function as

h (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) = [h_{1} (β^{*}) + h_{T, 2} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))],

(49)

where

\begin{matrix} h_{1} (β^{*}) & = & \sum_{c = 1}^{C} \frac{y_{1 c}}{π_{(1) c} (β^{*})} \frac{\partial π_{(1) c} (β^{*})}{\partial β^{*}}, and \\ h_{T, 2} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) & = & \sum_{t = 2}^{T} \sum_{g = 1}^{C} \sum_{c = 1}^{C} [\frac{y_{t c}}{{\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))} \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}}] . \end{matrix}

(50)

The results from the following Lemma 6 will be used in Theorem 2 below to derive the asymptotic (as

T \to \infty

) distribution of

{\hat{β}}^{*}

(solution of likelihood estimating Equation (25)).

Lemma 6.

Let

{\bar{h}}_{T} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) = \frac{1}{T} h (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) .

This

(p + 1)

-dimensional mean vector function has the expectation and the

(p + 1) \times (p + 1)

conditional (on lag 1 response) covariance matrix as

\begin{matrix} E [{\bar{h}}_{T} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))] = \frac{1}{T} E [h (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))] = 0, and \\ cov ({\bar{h}}_{T} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))) \\ = & \frac{1}{T^{2}} [\{\sum_{c = 1}^{C} \frac{1 - π_{(1) c} (β^{*})}{π_{(1) c} (β^{*})} \frac{\partial π_{(1) c} (β^{*})}{\partial β^{*}} \frac{\partial π_{(1) c} (β^{*})}{\partial {β^{*}}^{'}} - \sum_{u \neq c}^{C} \frac{\partial π_{(1) c} (β^{*})}{\partial β^{*}} \frac{\partial π_{(1) u} (β^{*})}{\partial {β^{*}}^{'}}\} \\ + & \sum_{t = 2}^{T} \sum_{g = 1}^{C} \{(\sum_{c = 1}^{C} \frac{1 - {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{{\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))} \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}} \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial {β^{*}}^{'}}) \\ - & (\sum_{c \neq u}^{C} \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}} \frac{\partial {\tilde{η}}_{t | t - 1}^{(u | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial {β^{*}}^{'}})\}], \end{matrix}

(51)

respectively.

Proof.

Because

π_{(1) C} (β^{*}) = 1 - \sum_{c = 1}^{C - 1} π_{(1) c} (β^{*})

and

{\tilde{η}}_{t | t - 1}^{(C | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) = 1 - \sum_{c = 1}^{C - 1} {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})),

it follows from (50) that

\begin{matrix} E [h_{1} (β^{*})] & = & \sum_{c = 1}^{C} \frac{E (Y_{1 c})}{π_{(1) c} (β^{*})} \frac{\partial π_{(1) c} (β^{*})}{\partial β^{*}} \\ = & \sum_{c = 1}^{C} \frac{\partial π_{(1) c} (β^{*})}{\partial β^{*}} = 0 and \end{matrix}

(52)

\begin{matrix} E [h_{T, 2} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))] & = & \sum_{t = 2}^{T} \sum_{g = 1}^{C} \sum_{c = 1}^{C} [\frac{E (Y_{t c} | y_{(t - 1), g})}{{\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))} \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}}] \\ = & \sum_{t = 2}^{T} \sum_{g = 1}^{C} \sum_{c = 1}^{C} [\frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}}] = 0, \end{matrix}

(53)

Next because

h (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) = [h_{1} (β^{*}) + h_{T, 2} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))]

as stated in (49), one writes

\begin{matrix} E [{\bar{h}}_{T} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))] & = & E [\frac{1}{T} h (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))] \\ = & \frac{1}{T} E [h_{1} (β^{*})] + \frac{1}{T} E [h_{T, 2} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))] . \end{matrix}

Hence by applying (52) and (53) we obtain

E [{\bar{h}}_{T} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))] = 0

as stated in the lemma.

Next we compute the

(p + 1) \times (p + 1)

conditional (on the history up to time

t - 1

(

H_{t - 1}

)) covariance matrix of the mean vector function

{\bar{h}}_{T} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))

as

\begin{matrix} cov ({\bar{h}}_{T} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))) = \frac{1}{T^{2}} [cov (h_{1} (β^{*})) + cov (h_{T, 2} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})))] \\ = & \frac{1}{T^{2}} [\{\sum_{c = 1}^{C} \frac{var (Y_{1 c})}{π_{(1) c}^{2} (β^{*})} \frac{\partial π_{(1) c} (β^{*})}{\partial β^{*}} \frac{\partial π_{(1) c} (β^{*})}{\partial {β^{*}}^{'}} + \sum_{u \neq c}^{C} \frac{cov (Y_{1 c}, Y_{1 u})}{π_{(1) c} (β^{*}) π_{(1) u} (β^{*})} \frac{\partial π_{(1) c} (β^{*})}{\partial β^{*}} \frac{\partial π_{(1) u} (β^{*})}{\partial {β^{*}}^{'}}\} \\ + & \sum_{t = 2}^{T} \sum_{g = 1}^{C} \{\sum_{c = 1}^{C} \frac{var (Y_{t c} | H_{t - 1})}{{[{\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))]}^{2}} \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}} \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial {β^{*}}^{'}}\} \\ + & \{\sum_{c \neq u}^{C} \frac{cov ((Y_{t c}, Y_{t u}) | H_{t - 1})}{[{\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))] [{\tilde{η}}_{t | t - 1}^{(u | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))]} \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}} \frac{\partial {\tilde{η}}_{t | t - 1}^{(u | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial {β^{*}}^{'}}\}], \end{matrix}

(54)

yielding the second part of the lemma because

\begin{matrix} var (Y_{1 c}) & = & π_{(1) c} (β^{*}) [1 - π_{(1) c} (β^{*})] \\ cov (Y_{1 c}, Y_{1 u}) & = & - π_{(1) c} (β^{*}) π_{(1) u} (β^{*}) \\ var (Y_{t c} | H_{t - 1}) & = & {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) [1 - {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))], t = 2, \dots, T \\ cov ((Y_{t c}, Y_{t u}) | H_{t - 1}) & = & - {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) {\tilde{η}}_{t | t - 1}^{(u | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})), t = 2, \dots, T . \end{matrix}

□
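The t = 1 block of the covariance in (51) can be verified by direct enumeration, because $Y_{1}$ takes only C possible values. The sketch below, with hypothetical parameter values, evaluates that block as written, compares it with the covariance of $h_{1}$ computed from the C outcomes, and also confirms that $E [h_{1}] = 0$ as in (52). The double sum over $u \neq c$ is read as running over ordered pairs, which is an interpretation of the notation.

```python
import numpy as np

def probs_full(beta, x1):
    """All C category probabilities at t = 1, reference category last."""
    scores = np.exp(beta @ x1)
    first = scores / (1.0 + scores.sum())
    return np.append(first, 1.0 - first.sum())

def grads_full(beta, x1):
    """Row c holds d pi_c / d beta* for c = 1, ..., C; the last row is minus the sum of the others."""
    pi = probs_full(beta, x1)[:-1]
    eye = np.eye(pi.size)
    rows = [np.kron(pi[c] * (eye[c] - pi), x1) for c in range(pi.size)]
    rows.append(-np.sum(rows, axis=0))
    return np.array(rows)

# Hypothetical values: C = 3, p + 1 = 2.
beta = np.array([[0.4, -0.3], [-0.2, 0.5]])
x1 = np.array([1.0, 0.7])
pi = probs_full(beta, x1)
d = grads_full(beta, x1)
C = pi.size

# First block of (51), without the 1/T^2 factor.
cov_51 = sum((1 - pi[c]) / pi[c] * np.outer(d[c], d[c]) for c in range(C))
cov_51 = cov_51 - sum(
    np.outer(d[c], d[u]) for c in range(C) for u in range(C) if u != c
)

# Direct enumeration: Y_1 in category k (probability pi_k) gives h_1 = d_k / pi_k.
h = d / pi[:, None]
mean_h = pi @ h
cov_enum = sum(pi[k] * np.outer(h[k], h[k]) for k in range(C)) - np.outer(mean_h, mean_h)

# Equivalent compact form: sum_c (1 / pi_c) d_c d_c'.
cov_compact = sum(np.outer(d[c], d[c]) / pi[c] for c in range(C))

print("E[h_1]:", mean_h)
print("max |cov_51 - cov_enum|:", np.abs(cov_51 - cov_enum).max())
```

The compact form agrees with the other two because the C derivative vectors sum to zero; it is also the negative of the t = 1 term in the expected second derivative (61).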

Lemma 7.

The second-order derivative matrix

\frac{\partial^{2} π_{(1) c} (β^{*})}{\partial β^{*} \partial {β^{*}}^{'}}

has the formula

\begin{matrix} \frac{\partial^{2} π_{(1) c} (β^{*})}{\partial β^{*} \partial {β^{*}}^{'}} \\ = & (\begin{matrix} [- π_{(1) 1} π_{(1) c} \{{(δ_{(1) c} + δ_{(1) 1} - 2 π_{(1)})}^{'} \otimes x_{1}^{'}\}] \otimes x_{1} \\ [- π_{(1) 2} π_{(1) c} \{{(δ_{(1) c} + δ_{(1) 2} - 2 π_{(1)})}^{'} \otimes x_{1}^{'}\}] \otimes x_{1} \\ ⋮ \\ [π_{(1) c} (1 - 2 π_{(1) c}) \{{(δ_{(1) c} - π_{(1)})}^{'} \otimes x_{1}^{'}\}] \otimes x_{1} \\ ⋮ \\ [- π_{(1) (C - 1)} π_{(1) c} \{{(δ_{(1) c} + δ_{(1) (C - 1)} - 2 π_{(1)})}^{'} \otimes x_{1}^{'}\}] \otimes x_{1} \end{matrix}), \end{matrix}

(55)

with

π_{(1)} = {[π_{(1) 1}, \dots, π_{(1) c}, \dots, π_{(1) (C - 1)}]}^{'};

and the second-order derivative matrix

\frac{\partial^{2} {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*} \partial {β^{*}}^{'}}

has an approximate formula

\begin{matrix} \frac{\partial^{2} {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*} \partial {β^{*}}^{'}} \\ \approx & [(δ_{(t) c} \otimes x_{t}) + \frac{\partial [{\hat{γ}}_{c, C G Q L}^{'} (β^{*})]}{\partial β^{*}} y_{t - 1}^{(g)}] \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β))}{\partial {β^{*}}^{'}} \\ - & \sum_{ν}^{C - 1} [{\tilde{η}}_{t | t - 1}^{(ν | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) \{(δ_{(t) ν} \otimes x_{t}) + \frac{\partial [{\hat{γ}}_{ν, C G Q L}^{'} (β^{*})]}{\partial β^{*}} y_{t - 1}^{(g)}\}] \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial {β^{*}}^{'}} \\ - & {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*})) \sum_{ν}^{C - 1} [\{(δ_{(t) ν} \otimes x_{t}) + \frac{\partial [{\hat{γ}}_{ν, C G Q L}^{'} (β^{*})]}{\partial β^{*}} y_{t - 1}^{(g)}\} \\ \times & \frac{\partial {\tilde{η}}_{t | t - 1}^{(ν | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial {β^{*}}^{'}}] . \end{matrix}

(56)

Proof.

To derive the formula in (55), we first recall from Lemma 2 or more specifically from (26) that

\frac{\partial π_{(1) c}}{\partial β^{*}} = [π_{(1) c} (δ_{(1) c} - π_{(1)})] \otimes x_{1} : (C - 1) (p + 1) \times 1 .

A further derivative then produces

\begin{matrix} \frac{\partial^{2} π_{(1) c} (β^{*})}{\partial β^{*} \partial {β^{*}}^{'}} = \frac{\partial}{\partial {β^{*}}^{'}} [\{π_{(1) c} (δ_{(1) c} - π_{(1)})\} \otimes x_{1}] \\ = & \frac{\partial}{\partial {β^{*}}^{'}} [(\begin{matrix} - π_{(1) 1} π_{(1) c} \\ ⋮ \\ π_{(1) c} [1 - π_{(1) c}] \\ ⋮ \\ - π_{(1) (C - 1)} π_{(1) c} \end{matrix}) \otimes x_{1}], \end{matrix}

(57)

which yields the formula in (55) after some algebraic calculations.
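The block structure of (55) can be checked against finite differences of the first derivative. In the sketch below, each row block k is built as $[\cdot] \otimes x_{1}$ with the bracketed row vector taken from (55); the parameter values and function names are hypothetical.

```python
import numpy as np

def probs(beta, x1):
    scores = np.exp(beta @ x1)
    return scores / (1.0 + scores.sum())

def grad_pi_c(beta, x1, c):
    """First derivative [pi_c (delta_c - pi)] kron x1."""
    pi = probs(beta, x1)
    return np.kron(pi[c] * (np.eye(pi.size)[c] - pi), x1)

def hess_pi_c(beta, x1, c):
    """Second-derivative matrix assembled block row by block row as in (55)."""
    pi = probs(beta, x1)
    eye = np.eye(pi.size)
    blocks = []
    for k in range(pi.size):
        if k == c:
            row = pi[c] * (1 - 2 * pi[c]) * np.kron(eye[c] - pi, x1)
        else:
            row = -pi[k] * pi[c] * np.kron(eye[c] + eye[k] - 2 * pi, x1)
        blocks.append(np.outer(x1, row))        # [row vector] kron x1
    return np.vstack(blocks)

def numeric_hess(beta, x1, c, eps=1e-5):
    """Column j is the central difference of the gradient in the jth element of beta*."""
    flat = beta.ravel()
    H = np.zeros((flat.size, flat.size))
    for j in range(flat.size):
        step = np.zeros_like(flat)
        step[j] = eps
        up = grad_pi_c((flat + step).reshape(beta.shape), x1, c)
        dn = grad_pi_c((flat - step).reshape(beta.shape), x1, c)
        H[:, j] = (up - dn) / (2 * eps)
    return H

# Hypothetical values: C = 4, p + 1 = 2.
beta = np.array([[0.4, -0.3], [-0.2, 0.5], [0.1, 0.2]])
x1 = np.array([1.0, 0.7])

hess_err = max(
    np.abs(hess_pi_c(beta, x1, c) - numeric_hess(beta, x1, c)).max()
    for c in range(beta.shape[0])
)
print("largest discrepancy:", hess_err)
```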

Computing the exact derivative matrix

\frac{\partial^{2} {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*} \partial {β^{*}}^{'}}

is algebraically complicated. The approximate Formula (56) follows from (30) under the assumption that

β^{*}

involved in the derivative

\frac{\partial {\hat{γ}}_{c, C G Q L}^{'} (β^{*})}{\partial β^{*}}

in (30) for

c = 1, \dots, C - 1

is known from a previous iteration.

□

We now provide the asymptotic distribution of

{\hat{β}}^{*}

(solution of (25) or (33)) as in the following theorem.

Theorem 2.

Denote the covariance matrix of

{\bar{h}}_{T} (β^{*})

computed in Lemma 6 by

cov ({\bar{h}}_{T} (β^{*})) = \frac{1}{T^{2}} P_{T} (β^{*}) .

Next, assume that

(h_{1} (β^{*}) + h_{T, 2} (β^{*}))

in (49) satisfies the Lindeberg condition, that is,

\lim_{T \to \infty} P_{T}^{- 1} \sum_{{{(h_{1} + h_{T, 2})}^{'} P_{T}^{- 1} (h_{1} + h_{T, 2})} > ϵ} {(h_{1} + h_{T, 2}) {(h_{1} + h_{T, 2})}^{'} g (h_{1} + h_{T, 2})} = 0

(58)

for all

ϵ > 0, g (\cdot)

being the probability distribution of

(h_{1} + h_{T, 2}) .

Then, the limiting distribution of

{\hat{β}}^{*}

(say,

q ({\hat{β}}^{*}

)) is normal, and is given by

\begin{matrix} \lim_{T \to \infty} q ({\hat{β}}^{*}) & \to & N (β^{*}, {[E_{y} \frac{\partial (h_{1} + h_{T, 2})}{\partial {β^{*}}^{'}}]}^{- 1} P_{T} (β^{*}) \\ \times & {[E_{y} \frac{\partial (h_{1} + h_{T, 2})}{\partial {β^{*}}^{'}}]}^{- 1}) . \end{matrix}

(59)

Proof.

The proof of this theorem is similar to that of Theorem 1. The difference lies in the forms of the functions

f_{t} (\cdot)

in Theorem 1 and

(h_{1} (β^{*}) + h_{T, 2} (β^{*}))

in the present theorem. Thus, by a similar justification as in (48), under some mild regularity condition, it follows that

\lim_{T \to \infty} {\hat{β}}^{*} \to β^{*},

showing that

{\hat{β}}^{*}

is consistent for

β^{*} .

Details are omitted.

Note that to compute the expected function in (59), that is,

E_{y} \frac{\partial (h_{1} + h_{T, 2})}{\partial {β^{*}}^{'}},

one may first assume that

β^{*}

involved in the first-order derivatives in (50) is known from the previous iteration, and then compute an approximate second-order derivative as

\begin{matrix} \frac{\partial h_{1} (β^{*})}{\partial {β^{*}}^{'}} & = & - \sum_{c = 1}^{C} \frac{y_{1 c}}{π_{(1) c}^{2} (β^{*})} \frac{\partial π_{(1) c} (β^{*})}{\partial β^{*}} \frac{\partial π_{(1) c} (β^{*})}{\partial {β^{*}}^{'}}, and \\ \frac{\partial h_{T, 2} (β^{*})}{\partial {β^{*}}^{'}} & = & - \sum_{t = 2}^{T} \sum_{g = 1}^{C} \sum_{c = 1}^{C} [\frac{y_{t c}}{{[{\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))]}^{2}} \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}} \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial {β^{*}}^{'}}], \end{matrix}

(60)

yielding the expectation as

\begin{matrix} E_{y} \frac{\partial (h_{1} + h_{T, 2})}{\partial {β^{*}}^{'}} = - [\sum_{c = 1}^{C} \frac{1}{π_{(1) c} (β^{*})} \frac{\partial π_{(1) c} (β^{*})}{\partial β^{*}} \frac{\partial π_{(1) c} (β^{*})}{\partial {β^{*}}^{'}} \\ + & \sum_{t = 2}^{T} \sum_{g = 1}^{C} \sum_{c = 1}^{C} \{\frac{1}{{\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))} \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial β^{*}} \frac{\partial {\tilde{η}}_{t | t - 1}^{(c | g)} (β^{*}, {\hat{γ}}_{C G Q L}^{*} (β^{*}))}{\partial {β^{*}}^{'}}\}] . \end{matrix}

(61)

□

4. Concluding Remarks

There have been a considerable number of studies on inference for multinomial distributions in a panel data setup, where a small number of repeated multinomial responses are collected from a large number of independent individuals. Dimension reduction of the parameter space has also been considered in that setting. We, however, did not include such panel data methods in this paper because inference differs substantially between the panel data and time series setups: in a time series setup, a large number of repeated multinomial responses are taken from a single unit or individual. Returning to the time series setup, we have discussed, following [18] (see also [16,17]), that joint estimation over a large multinomial parameter space involving regression and dynamic dependence parameters may adversely affect inference. As a possible remedy, we have offered a parameter-split or dimension-reduction estimation approach. The asymptotic properties, such as the consistency of the estimator of the dynamic dependence function (for any unknown regression parameters) and then of the estimator of the main regression parameters, are studied in detail.

Author Contributions

Conceptualization, B.C.S. and R.P.R.; Methodology, B.C.S. and R.P.R.; Writing—original draft, B.C.S. and R.P.R.; Writing—review and editing, B.C.S. and R.P.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

The authors thank two reviewers for their comments that helped to improve the paper.

Conflicts of Interest

There are no conflicts of interest.

References

1. Jacobs, P.A.; Lewis, P.A.W. Discrete Time Series Generated by Mixtures I: Correlational and Runs Properties. J. R. Stat. Soc. 1978, 40, 94–105.
2. Jacobs, P.A.; Lewis, P.A.W. Discrete Time Series Generated by Mixtures II: Asymptotic Properties. J. R. Stat. Soc. 1978, 40, 222–228.
3. Jacobs, P.A.; Lewis, P.A.W. Stationary Discrete Autoregressive-moving Average Generated by Mixtures. J. Time Ser. Anal. 1983, 4, 19–36.
4. Keenan, D.M. A Time Series Analysis of Binary Data. J. Am. Stat. Assoc. 1982, 77, 816–821.
5. Manski, C.F. Semiparametric analysis of discrete response: Asymptotic properties of the maximum score estimator. J. Econom. 1985, 27, 313–333.
6. Amemiya, T. Advanced Econometrics; Harvard University Press: Cambridge, MA, USA, 1985.
7. Tong, H. Nonlinear Time Series: A Dynamical System Approach; Oxford Statistical Science Series, 6; Oxford University Press: New York, NY, USA, 1990.
8. Horowitz, J. A smoothed maximum score estimator for the binary response model. Econometrica 1992, 60, 505–531.
9. Park, J.Y.; Phillips, P.C.B. Non-stationary binary choice. Econometrica 2000, 68, 1249–1280.
10. Moon, H.R. Maximum score estimation of a nonstationary binary choice model. J. Econom. 2004, 120, 385–403.
11. Jiang, W.; Tanner, M.A. Risk minimization for the series binary choice with variable selection. Econom. Theory 2010, 26, 1437–1452.
12. De Jong, R.M.; Woutersen, T. Dynamic time series binary choice. Econom. Theory 2011, 27, 673–702.
13. Fahrmeir, L.; Kaufmann, H. Regression models for non-stationary categorical time series. J. Time Ser. Anal. 1987, 8, 147–160.
14. Kaufmann, H. Regression models for nonstationary categorical time series: Asymptotic estimation theory. Ann. Stat. 1987, 15, 79–98.
15. Fokianos, K.; Kedem, B. Prediction and classification of non-stationary categorical time series. J. Multivar. Anal. 1998, 67, 277–296.
16. Fokianos, K.; Kedem, B. Regression theory for categorical time series. Stat. Sci. 2003, 18, 357–376.
17. Fokianos, K.; Kedem, B. Partial likelihood inference for time series following generalized linear models. J. Time Ser. Anal. 2004, 25, 173–197.
18. Loredo-Osti, J.C.; Sutradhar, B.C. Estimation of regression and dynamic dependence parameters for non-stationary multinomial time series. J. Time Ser. Anal. 2012, 33, 458–467.
19. Johnson, N.L.; Kotz, S. Continuous Univariate Distributions-2; John Wiley and Sons: Hoboken, NJ, USA, 1970.
20. Sutradhar, B.C.; Rao, R.P. Regression models for ordinal categorical time series data. In Advances in Time Series Methods and Applications; Li, W.K., Stanford, D.A., Yu, H., Eds.; Fields Institute Communications; Springer: New York, NY, USA, 2016; Volume 78, pp. 179–194.
21. Wedderburn, R.W.M. Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika 1974, 61, 439–447.
22. Mallick, T.S.; Sutradhar, B.C. GQL versus conditional GQL inferences for non-stationary time series of counts with overdispersion. J. Time Ser. Anal. 2008, 29, 402–420.
23. McDonald, D.R. The local limit theorem: A historical perspective. J. Iran. Stat. Soc. 2005, 4, 73–86.
