Article

Asymptotic Theory for a Parameter Dimension-Split Estimation in Time Series Analysis for Multinomial Data

by Brajendra C. Sutradhar 1,* and R. Prabhakar Rao 2
1 Department of Mathematics and Statistics, Memorial University, St. John’s, NL A1C 5S7, Canada
2 Department of Economics, Sri Sathya Sai Institute of Higher Learning, Prasanthi Nilayam, Anantapur 515134, India
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(12), 2068; https://doi.org/10.3390/math14122068
Submission received: 13 May 2026 / Revised: 1 June 2026 / Accepted: 5 June 2026 / Published: 10 June 2026
(This article belongs to the Section D1: Probability and Statistics)

Abstract

The parameter space in a regression model for multinomial time series data contains the regression parameters, which explain the effects of the time-dependent covariates, and the dynamic dependence (or category transition) parameters, which explain the influence of past responses on the multinomial response at a given time. The estimation of the regression parameters can be adversely affected when the dimension of the parameter space is large, especially because of the transition parameters. In this paper, we propose a parameter dimension-split approach in which a conditional generalized quasi-likelihood (CGQL) estimating function is first developed for the dynamic dependence parameters in terms of the unknown regression parameters; this function is then exploited in the next step to develop an observed information matrix-based maximum likelihood (ML) estimating equation for the main regression parameters. More specifically, the split approach allows the joint likelihood function of the regression and dynamic dependence parameters to be written as a likelihood function of the regression parameters only, by replacing the dynamic dependence parameters with their CGQL estimates obtained in the first step. As the time series length is generally large in practice, we verify that the proposed CGQL and ML estimators are asymptotically reliable, that is, consistent for the respective parameters.

1. Introduction

Binary time series analysis is an important research topic in economics and statistics, among other areas. For example, the economic status (profit/loss) of a pharmaceutical industry may be recorded over the years along with certain exogenous explanatory covariates, such as type of industry, yearly advertising cost, and other research and development expenditures. It is likely that the binary profit status of an industry in a given year is correlated with its profit status in previous years. It is of interest to know both (i) the effects of the time-dependent covariates and (ii) the dynamic relationship among the responses over the years. Because the binary status may depend on certain latent variables, binary time series data have been modeled and analyzed in a variety of ways over the last four decades, primarily in statistics and econometrics. Readers may refer to existing studies ([1,2,3,4,5,6,7,8,9,10,11,12]) on various binary time series models. Most of the models considered in these studies have either binary probit or logit forms. By the same token, multinomial time series analysis is also an important research topic. In this case, as an extension of the binary response variable, one deals with a categorical response variable with more than two categories. For example, it may be more appropriate to classify the profit status of the industries into several categories, such as heavy loss, moderate loss, no loss, moderate profit, or healthy profit, and then examine the effects of exogenous covariates on such categorical responses collected over a long period of time. The correlations among categories over time are also of interest. This type of multinomial time series data has been analyzed, mainly in the statistics literature, by a few authors (refs. [13,14,15,16,17,18]). As far as the dynamic relationship is concerned, the studies in refs. [16,18], for example, have considered a multinomial dynamic logit (MDL) model as a generalization of the binary logit time series model.
The BDL (binary dynamic logit) model is constructed as follows. Let $\{y_t^*, t = 1, \ldots, T\}$ be a sequence of latent variables, and suppose that a binary response $y_t$ is observed at time $t$ through the relationship
$$y_t = \begin{cases} 1 & \text{if } y_t^* > 0 \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$
Also, let $x_t = (x_{t1}, \ldots, x_{t,p+1})'$ denote the $(p+1)$-dimensional exogenous explanatory covariate vector and $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$ denote the effect of $x_t$ on the binary response $y_t$. Next, suppose that the latent variable $y_1^*$ in (1) follows a logistic distribution $f_L(y_1^*)$ ([19]) with mean $g_1^* = x_1'\beta$ and variance $\pi^2/3$, whereas for $t = 2, \ldots, T$, $y_t^*$ follows the same logistic distribution $f_L(\cdot)$ but with mean $g_t^* = x_t'\beta + \gamma y_{t-1}$ and variance $\pi^2/3$, $\gamma$ being a lag 1 dynamic dependence parameter. It then follows from (1) that
$$\begin{aligned} Pr(y_1 = 1) &= \int_{-x_1'\beta}^{\infty} f_L(y_1^*)\, dy_1^* = \frac{\exp(x_1'\beta)}{1 + \exp(x_1'\beta)}, \\ Pr(y_t = 1 \mid y_{t-1}) &= \int_{-(x_t'\beta + \gamma y_{t-1})}^{\infty} f_L(y_t^*)\, dy_t^*, \quad \text{for } t = 2, \ldots, T, \\ &= \frac{\exp(x_t'\beta + \gamma y_{t-1})}{1 + \exp(x_t'\beta + \gamma y_{t-1})} \end{aligned} \tag{2}$$
(see also [6], p. 422). To understand the recursive mean $E[Y_t]$, the variance $\mathrm{var}[Y_t]$, and the lag correlations between $Y_u$ and $Y_t$ ($u < t$) produced by this BDL model (2), it is of interest to estimate $\beta$ and $\gamma$. Note that there is no range restriction for these parameters.
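As an illustration of the BDL model (2), the following minimal sketch evaluates the conditional success probability and simulates a binary series from it. The function names `bdl_prob` and `simulate_bdl`, and the convention that `y_prev=None` marks the initial time point, are assumptions made for this example only and do not come from the paper.

```python
import numpy as np

def bdl_prob(x_t, beta, gamma, y_prev=None):
    """P(y_t = 1 | y_{t-1}) under the BDL model (2).

    y_prev=None (hypothetical convention) gives the marginal probability at t = 1.
    """
    lin = float(np.dot(x_t, beta))
    if y_prev is not None:
        lin += gamma * y_prev          # lag 1 dynamic dependence term
    return 1.0 / (1.0 + np.exp(-lin))  # equals exp(lin) / (1 + exp(lin))

def simulate_bdl(X, beta, gamma, rng):
    """Simulate y_1, ..., y_T given a (T, p+1) covariate matrix X."""
    T = X.shape[0]
    y = np.zeros(T, dtype=int)
    prev = None
    for t in range(T):
        p = bdl_prob(X[t], beta, gamma, prev)
        y[t] = int(rng.random() < p)
        prev = y[t]
    return y

# Usage: an intercept-only series with positive lag dependence.
y = simulate_bdl(np.ones((200, 1)), np.array([0.2]), 0.8, np.random.default_rng(0))
```

The logistic form is written as $1/(1+e^{-\text{lin}})$, which is algebraically the same as the ratio in (2).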
The above-mentioned MDL model is a generalization of the BDL model given by (2). For a discussion of this generalization, see, for example, refs. [16,17,18,20]. More specifically, the MDL (multinomial dynamic logit) model is constructed as follows. Let $y_t = (y_{t1}, \ldots, y_{tc}, \ldots, y_{t,C-1})'$ denote the $(C-1)$-dimensional multinomial response variable, so that for $c = 1, \ldots, C-1$,
$$y_t^{(c)} = (y_{t1}^{(c)}, \ldots, y_{tc}^{(c)}, \ldots, y_{t,C-1}^{(c)})' = (01_{c-1}', 1, 01_{C-1-c}')' \equiv \delta_{tc} \tag{3}$$
indicates that the multinomial response recorded at time $t$ belongs to the $c$th category. For $c = C$, one writes $y_t^{(C)} = \delta_{tC} = 01_{C-1}$. Here, and also in (3), for a scalar constant $c_0$, we use $c_0 1_C$ for simplicity to represent $c_0 \otimes 1_C$, with $\otimes$ being the well-known Kronecker or direct product. This notation will also be used throughout the rest of the paper when needed. At any time point $t$, let $\beta_c = (\beta_{c0}, \beta_{c1}, \ldots, \beta_{cp})'$ denote the effect of $x_t$ on $y_t^{(c)}$ for $c = 1, \ldots, C-1$. Also, let $\pi_{(1)c}$ denote the marginal multinomial probability at time $t = 1$ for the observation $y_1$ to be in the $c$th category; and for $t = 2, \ldots, T$, let the transitional probability from the $g$th ($g = 1, \ldots, C$) category at time $t-1$ to the $c$th category at time $t$ be denoted by $\eta_{t|t-1}^{(c)}(g)$. As an extension of the BDL model (2), one may then write the MDL model as
$$\begin{aligned} P[y_1 = y_1^{(c)}] = \pi_{(1)c} &= \begin{cases} \dfrac{\exp(x_1'\beta_c)}{1 + \sum_{g=1}^{C-1} \exp(x_1'\beta_g)} & \text{for } c = 1, \ldots, C-1 \\[2ex] \dfrac{1}{1 + \sum_{g=1}^{C-1} \exp(x_1'\beta_g)} & \text{for } c = C, \end{cases} \\ P[Y_t = y_t^{(c)} \mid Y_{t-1} = y_{t-1}^{(g)}] &= \eta_{t|t-1}^{(c)}(g), \quad \text{for } t = 2, \ldots, T, \\ &= \begin{cases} \dfrac{\exp(x_t'\beta_c + \gamma_c' y_{t-1}^{(g)})}{1 + \sum_{v=1}^{C-1} \exp(x_t'\beta_v + \gamma_v' y_{t-1}^{(g)})} & \text{for } c = 1, \ldots, C-1 \\[2ex] \dfrac{1}{1 + \sum_{v=1}^{C-1} \exp(x_t'\beta_v + \gamma_v' y_{t-1}^{(g)})} & \text{for } c = C, \end{cases} \end{aligned} \tag{4}$$
where $\gamma_c = (\gamma_{c1}, \ldots, \gamma_{cv}, \ldots, \gamma_{c,C-1})'$ denotes the dynamic dependence parameters.
Note that for further notational convenience, one may re-express the transitional probabilities in (4) as
$$\eta_{t|t-1}^{(c)}(g) = \begin{cases} \dfrac{\exp(x_t'\beta_c + \gamma_c' \delta_{(t-1)g})}{1 + \sum_{v=1}^{C-1} \exp(x_t'\beta_v + \gamma_v' \delta_{(t-1)g})} & \text{for } c = 1, \ldots, C-1 \\[2ex] \dfrac{1}{1 + \sum_{v=1}^{C-1} \exp(x_t'\beta_v + \gamma_v' \delta_{(t-1)g})} & \text{for } c = C, \end{cases} \tag{5}$$
where for $t = 2, \ldots, T$, $\delta_{(t-1)g}$, in the notation of (3), has the formula
$$\delta_{(t-1)g} = \begin{cases} [01_{g-1}', 1, 01_{C-1-g}']' & \text{for } g = 1, \ldots, C-1 \\ 01_{C-1} & \text{for } g = C. \end{cases}$$
Note that in (5), the category $g$ occurred at time $t-1$. Thus the category $g$ depends on time $t-1$, and $\delta_{(t-1)g} \equiv \delta_{g_{t-1}}$. However, for simplicity, we use $g$ for $g_{t-1}$. For convenience, we use $\beta^* = (\beta_1', \ldots, \beta_c', \ldots, \beta_{C-1}')' : (p+1)(C-1) \times 1$ to represent all regression effects involved in the marginal and conditional probabilities given in (4) and (5). Similarly, we use $\gamma^* = (\gamma_1', \ldots, \gamma_c', \ldots, \gamma_{C-1}')' : (C-1)^2 \times 1$ to represent all dynamic dependence parameters involved in the transitional probabilities (4) or (5).
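To make the transitional probabilities in (5) concrete, the sketch below evaluates them for a given previous category $g$. The names `delta` and `mdl_transition`, and the storage of the regression and dependence parameters as matrices `B` (rows $\beta_c'$) and `G` (rows $\gamma_c'$), are illustrative assumptions rather than anything prescribed by the paper.

```python
import numpy as np

def delta(g, C):
    """Indicator vector of length C-1 for category g (1-based); zeros for g = C."""
    d = np.zeros(C - 1)
    if g < C:
        d[g - 1] = 1.0
    return d

def mdl_transition(x_t, B, G, g):
    """Probabilities of categories 1..C at time t given category g at time t-1.

    B: (C-1, p+1) array whose rows are beta_c' (assumed layout).
    G: (C-1, C-1) array whose rows are gamma_c' (assumed layout).
    """
    C = B.shape[0] + 1
    lin = B @ x_t + G @ delta(g, C)     # x_t' beta_c + gamma_c' delta_{(t-1)g}
    w = np.exp(lin)
    return np.append(w, 1.0) / (1.0 + w.sum())

# Usage: C = 3 categories, intercept plus one covariate.
x = np.array([1.0, 0.5])
B = np.array([[0.2, -0.1], [0.0, 0.3]])
G = np.array([[1.0, -1.0], [0.5, 0.0]])
probs = mdl_transition(x, B, G, g=1)
```

When the previous category is the reference category $C$, the lag term vanishes and the function returns probabilities of the same form as the marginal probabilities in (4).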
We remark that, as explained below and more specifically in Section 2, the estimation of the aforementioned regression and dynamic dependence parameters through the existing joint likelihood approach (for both sets of parameters) may be adversely affected when the dimension of these parameter vectors is large. As a remedy, in this paper we provide a dimension-split approach in which a new likelihood function is constructed for the main regression parameters only, by replacing the high-dimensional dynamic dependence parameters in the joint likelihood function with their conditional estimates. These estimates are obtained first, using a conditional generalized quasi-likelihood (CGQL) approach, conditional on the unknown main regression parameters.
Turning back to the importance of estimation, notice that the unconditional means $E[Y_t]$, variances $\mathrm{var}[Y_t]$, and pairwise correlations $\mathrm{corr}[Y_u, Y_t]$, $u < t$, computed by exploiting the model (4), involve $\beta^*$ and $\gamma^*$. Thus, to understand these basic properties of the multinomial time series, it is important to obtain consistent estimates for $\beta^*$ and $\gamma^*$, at least asymptotically, that is, when $T \to \infty$. As far as the formulas are concerned, the means, as functions of $\beta^*$ and $\gamma^*$, satisfy the recursive relationship
$$\begin{aligned} E[Y_t] &= \eta_{(t|t-1)}(C) + \left[\eta_{(t|t-1),M} - \eta_{(t|t-1)}(C)\, 1_{C-1}'\right] E[Y_{t-1}] \\ &= \tilde{\pi}_{(t)}(\beta^*, \gamma^*) = (\tilde{\pi}_{(t)1}, \ldots, \tilde{\pi}_{(t)c}, \ldots, \tilde{\pi}_{(t)(C-1)})', \quad \text{for } t = 2, \ldots, T, \end{aligned} \tag{6}$$
where the expectation at the initial time $t = 1$ has the formula
$$E[Y_1] = [\pi_{(1)1}, \ldots, \pi_{(1)c}, \ldots, \pi_{(1)(C-1)}]' = \pi_{(1)}(\beta^*),$$
where $\pi_{(1)c}$ is given by (4) for all $c = 1, \ldots, C$. In (6), $\eta_{(t|t-1)}(C)$ is the $(C-1)$-dimensional vector of conditional probabilities, given by
$$\eta_{(t|t-1)}(C) = [\eta_{t|t-1}^{(1)}(C), \ldots, \eta_{t|t-1}^{(c)}(C), \ldots, \eta_{t|t-1}^{(C-1)}(C)]',$$
and $\eta_{(t|t-1),M}$ is the $(C-1) \times (C-1)$ matrix of conditional probabilities given by
$$\eta_{(t|t-1),M} = \left(\eta_{t|t-1}^{(c)}(g)\right),$$
where $\eta_{t|t-1}^{(c)}(g)$ is the $(c, g)$-th element of the matrix for $c = 1, \ldots, C-1$; $g = 1, \ldots, C-1$. Furthermore, similar to (6), the variances and covariances also satisfy recursive relationships, given by
$$\begin{aligned} \mathrm{var}[Y_t] &= \mathrm{diag}[\tilde{\pi}_{(t)1}(\beta^*, \gamma^*), \ldots, \tilde{\pi}_{(t)c}(\beta^*, \gamma^*), \ldots, \tilde{\pi}_{(t)(C-1)}(\beta^*, \gamma^*)] - \tilde{\pi}_{(t)}(\beta^*, \gamma^*)\, \tilde{\pi}_{(t)}'(\beta^*, \gamma^*) \\ &= (\mathrm{cov}(Y_{tc}, Y_{tk})) = (\tilde{\sigma}_{(tt)ck}(\beta^*, \gamma^*)), \quad c, k = 1, \ldots, C-1 \\ &= \tilde{\Sigma}_{(tt)}(\beta^*, \gamma^*), \quad \text{for } t = 1, \ldots, T, \end{aligned}$$
$$\begin{aligned} \mathrm{cov}[Y_u, Y_t] &= \prod_{s=u+1}^{t} \left[\eta_{(s|s-1),M} - \eta_{(s|s-1)}(C)\, 1_{C-1}'\right] \mathrm{var}[Y_u], \quad \text{for } u < t,\ t = 2, \ldots, T \\ &= (\mathrm{cov}(Y_{uc}, Y_{tk})) = (\tilde{\sigma}_{(ut)ck}(\beta^*, \gamma^*)), \quad c, k = 1, \ldots, C-1 \\ &= \tilde{\Sigma}_{(ut)}(\beta^*, \gamma^*). \end{aligned}$$
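The mean recursion (6) can be checked numerically. The sketch below computes $E[Y_t]$ by the recursion and is meant to agree with propagating the full $C$-dimensional category distribution through the one-step transition matrix. The helper names `transition_matrix` and `marginal_means` and the matrix layouts of `B` and `G` (rows $\beta_c'$ and $\gamma_c'$) are assumptions made for illustration.

```python
import numpy as np

def transition_matrix(x_t, B, G):
    """(C, C) matrix P with P[c, g] = P(category c+1 at t | category g+1 at t-1)."""
    k = B.shape[0]                       # k = C - 1
    P = np.empty((k + 1, k + 1))
    for g in range(k + 1):
        d = np.zeros(k)
        if g < k:
            d[g] = 1.0                   # indicator of the previous category
        w = np.exp(B @ x_t + G @ d)
        P[:, g] = np.append(w, 1.0) / (1.0 + w.sum())
    return P

def marginal_means(X, B, G):
    """Rows are E[Y_t], t = 1..T, computed with the recursion in (6)."""
    T, k = X.shape[0], B.shape[0]
    w = np.exp(B @ X[0])
    means = [w / (1.0 + w.sum())]        # E[Y_1] = pi_(1)(beta*)
    for t in range(1, T):
        P = transition_matrix(X[t], B, G)
        eta_C = P[:k, k]                 # eta_(t|t-1)(C)
        eta_M = P[:k, :k]                # eta_(t|t-1),M
        A = eta_M - np.outer(eta_C, np.ones(k))
        means.append(eta_C + A @ means[-1])
    return np.array(means)

# Usage with arbitrary illustrative parameter values.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6), rng.normal(size=6)])
B = rng.normal(size=(2, 2))
G = rng.normal(size=(2, 2))
m = marginal_means(X, B, G)
```

The recursion is the law of total probability with the probability of the reference category written as one minus the sum of the other $C-1$ probabilities.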
As far as inference is concerned, some authors, such as [18], used a partial likelihood approach for the estimation of the regression and dynamic dependence parameters $(\beta^*, \gamma^*)$ under the assumption that the observable covariates $\{x_t, t = 1, \ldots, T\}$ are random and perhaps dependent on lagged values of the response variable. This approach is equivalent to the so-called conditional likelihood approach, where the likelihood is obtained by conditioning on the history of both covariates and responses. Consequently, one may easily obtain the information matrix conditional on the whole process history ([16], Equation (17), p. 364). Notice, however, that the information matrix conditional on the entire history is not the same as the information matrix conditional on the covariate process; in many cases, though, it is the same as the observed information matrix, for example, when the covariate distribution does not involve lagged values of the response variable. Among others, the study in ref. [18] (Equations (17)–(19)) used a joint likelihood approach (see also [20]) to estimate the parameters $\beta^*$ and $\gamma^*$, where the likelihood estimating equation was solved by Newton’s iterative procedure based on the Fisher information matrix. These authors found through an intensive simulation study that, with a minimal dimension of the parameter space, both approaches work quite well even with relatively short series. When higher dimensions of the parameter space are considered, however, both methods require longer series to achieve acceptable levels of accuracy, and the approach that uses the observed information matrix to compute the standard errors of the parameter estimators performs worse than the approach based on the Fisher information matrix. But obtaining the Fisher information matrix can be computationally involved.
Because the observed and Fisher information matrix-based estimating equations produce similarly efficient estimates when the dimension of the parameter space is minimal, and because the computation of the Fisher information can be complex, the main objective of this paper is to use an observed information matrix-based estimation approach with minimal dimensions for the parameter space. More specifically, we first develop a consistent estimating function for the dynamic dependence parameter $\gamma^*$ as a function of the unknown regression effects $\beta^*$. We denote this function by $\hat{\gamma}^*(\beta^*)$. We then estimate the regression parameters $\beta^*$ by exploiting the likelihood function $L(\beta^*, \hat{\gamma}^*(\beta^*))$, say, instead of the joint likelihood function $L(\beta^*, \gamma^*)$. Thus, in this approach, the estimation of $\beta^*$ does not depend on the dimension of the dynamic dependence parameters $\gamma^*$. This reduced dimension-based estimation approach using the observed information matrix is further elucidated in Section 2. The asymptotic theory for this dimension-reduction approach is discussed in Section 3. Finally, concluding remarks are given in Section 4.

2. Estimation of Parameters: A Dimension-Reduction Approach

Because $\beta^* = (\beta_1', \ldots, \beta_{C-1}')' : (C-1)(p+1) \times 1$ and $\gamma^* = (\gamma_1', \ldots, \gamma_{C-1}')' : (C-1)^2 \times 1$, the dimension of the parameter space depends on both $p$ and $C$. Customarily, the joint estimation of $\beta^*$ and $\gamma^*$, that is, of $\theta = (\beta^{*\prime}, \gamma^{*\prime})'$, is performed by maximizing the likelihood function with respect to $\theta$, where the likelihood function under the model (4) has the form
$$L(\theta) = L(\beta^*, \gamma^*) = \prod_{c=1}^{C} [\pi_{(1)c}(\beta^*; x_1)]^{y_{1c}} \times \prod_{t=2}^{T} \prod_{c=1}^{C} \prod_{g=1}^{C} [\eta_{t|t-1}^{(c|g)}(\beta^*, \gamma^*; x_t)]^{y_{tc}},$$
where $\pi_{(1)c}(\beta^*)$ and $\eta_{t|t-1}^{(c|g)}(\beta^*, \gamma^*)$ are the marginal (at $t = 1$) and transitional probabilities, respectively. The solution of the log-likelihood estimating equation, namely $\partial \operatorname{Log} L(\theta) / \partial \theta = 0$, may be obtained by solving the Hessian or Fisher information matrix-based iterative equations. More specifically, the Hessian matrix-based iterative equation has the form
$$\hat{\theta}(r+1) = \hat{\theta}(r) + \left[ -\frac{\partial^2 \operatorname{Log} L(\theta)}{\partial \theta\, \partial \theta'} \right]^{-1} \left. \frac{\partial \operatorname{Log} L(\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}(r)} \tag{12}$$
(see [16]), and the Fisher information matrix-based equation has the form
$$\hat{\theta}(r+1) = \hat{\theta}(r) + \left[ -E\, \frac{\partial^2 \operatorname{Log} L(\theta)}{\partial \theta\, \partial \theta'} \right]^{-1} \left. \frac{\partial \operatorname{Log} L(\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}(r)} \tag{13}$$
(see [18]). Under the assumption that the Hessian matrix is positive semi-definite, some authors, such as [14] (Equations (4.1) and (4.4)), studied the asymptotic properties (as $T \to \infty$) of the likelihood estimator obtained from (12). When the dimension of the parameter ($\theta$) space is large, however, the authors of [18] found that this type of Hessian matrix-based likelihood estimate of $\theta$ performs relatively worse for moderately large $T$ than the Fisher information matrix-based estimate (13). But obtaining the Fisher information matrix is algebraically involved. When the dimension of the parameter space was small, the Hessian and Fisher information matrix-based estimates were found to perform almost identically. For this reason, in this paper, we consider a dimension-splitting or reduction approach for the parameter space, so that the Hessian matrix-based likelihood estimates can be used even when $T$ is only moderately large. More specifically, we develop a conditional generalized quasi-likelihood (CGQL) estimating function for $\gamma^*$ as a function of the unknown $\beta^*$. This estimating function may be denoted by $\hat{\gamma}^*(\beta^*)$. We then use this estimating function to construct a modified likelihood function for $\beta^*$, namely $L(\beta^*, \hat{\gamma}^*(\beta^*))$, so that the estimation of $\beta^*$ does not depend on the dimension of $\gamma^*$. This approach also leads to an asymptotic theory for the estimator of the main regression parameter $\beta^*$ that differs from the one in [14]. The CGQL-cum-modified maximum likelihood estimation (MMLE) approach is provided in Section 2.1 and Section 2.2 below, and the asymptotic theory for the estimates is given in Section 3.
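For readers who want to experiment with the joint likelihood $L(\beta^*, \gamma^*)$ above, the following sketch evaluates its logarithm for an observed series. It is a minimal illustration under stated assumptions: the function name `mdl_loglik` is hypothetical, responses are stored as rows of $(C-1)$-dimensional indicators (all zeros for category $C$), and at each time $t \geq 2$ only the transitional probability for the realised previous category contributes.

```python
import numpy as np

def mdl_loglik(Y, X, B, G):
    """Log-likelihood of an MDL series under model (4).

    Y: (T, C-1) indicator rows, all zeros meaning category C (assumed layout).
    X: (T, p+1) covariates; B: (C-1, p+1) rows beta_c'; G: (C-1, C-1) rows gamma_c'.
    """
    T, k = Y.shape
    ll = 0.0
    prev = np.zeros(k)                      # no lagged response enters at t = 1
    for t in range(T):
        lin = B @ X[t] + G @ prev           # linear predictors for categories 1..C-1
        log_denom = np.log1p(np.exp(lin).sum())
        # Observed category contributes lin_c - log_denom (lin = 0 for category C).
        ll += Y[t] @ lin - log_denom
        prev = Y[t]
    return ll

# Usage: with all parameters zero, every category has probability 1/C.
Y = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
X = np.column_stack([np.ones(5), np.arange(5.0)])
value = mdl_loglik(Y, X, np.zeros((2, 2)), np.zeros((2, 2)))
```

A function of this kind could be passed to a generic numerical optimizer, but the iterations (12) and (13) in the text instead use analytic derivatives.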

2.1. CGQL Estimating Function for Dynamic Dependence Parameters ($\gamma^*$) as a Function of Unknown $\beta^*$

Notice from (4) that, conditional on $y_{t-1}$, the response vector $y_t = (y_{t1}, \ldots, y_{tc}, \ldots, y_{t,C-1})'$ follows a multinomial distribution with the $(C-1)$-dimensional mean vector
$$E[Y_t \mid y_{t-1}] = \eta_{t|t-1}(\beta^*, \gamma^*), \tag{14}$$
and the $(C-1) \times (C-1)$ covariance matrix
$$\mathrm{cov}[Y_t \mid y_{t-1}] = D_{\eta_{t|t-1}} - \eta_{t|t-1}(\beta^*, \gamma^*)\, \eta_{t|t-1}'(\beta^*, \gamma^*) = \Sigma_{t|t-1}(\beta^*, \gamma^*), \ \text{(say)}, \tag{15}$$
where
$$\eta_{t|t-1}(\beta^*, \gamma^*) = [\eta_{t|t-1}^{(1)}(\beta^*, \gamma^*), \ldots, \eta_{t|t-1}^{(c)}(\beta^*, \gamma^*), \ldots, \eta_{t|t-1}^{(C-1)}(\beta^*, \gamma^*)]' : (C-1) \times 1,$$
with $\eta_{t|t-1}^{(c)}(\beta^*, \gamma^*) = \exp(x_t'\beta_c + \gamma_c' y_{t-1}) / [1 + \sum_{v=1}^{C-1} \exp(x_t'\beta_v + \gamma_v' y_{t-1})]$, for $c = 1, \ldots, C-1$. The diagonal matrix in (15) has the form
$$D_{\eta_{t|t-1}} = \mathrm{diag}[\eta_{t|t-1}^{(1)}(\beta^*, \gamma^*), \ldots, \eta_{t|t-1}^{(c)}(\beta^*, \gamma^*), \ldots, \eta_{t|t-1}^{(C-1)}(\beta^*, \gamma^*)].$$
In notation, for $t = 2, \ldots, T$, we write this multinomial distribution of $Y_t$ as
$$Y_t \mid y_{t-1} \sim \mathrm{Mult}\left(\eta_{t|t-1}(\beta^*, \gamma^*),\ \Sigma_{t|t-1}(\beta^*, \gamma^*)\right). \tag{16}$$
Now, to develop a CGQL estimating function $\hat{\gamma}^*(\beta^*)$ for $\gamma^*$, as a generalization of the QL approach of [21], we follow the GQL approach in [22] and exploit the conditional mean vector $\eta_{t|t-1}(\beta^*, \gamma^*)$ and the conditional covariance matrix $\Sigma_{t|t-1}(\beta^*, \gamma^*)$. For this purpose, suppose that the following two assumptions hold.
Assumption 1.
Consider the conditional probability function
$$\eta_{t|t-1}^{(c)}(\beta^*, \gamma^*) = \frac{\exp(x_t'\beta_c + \gamma_c' y_{t-1})}{1 + \sum_{v=1}^{C-1} \exp(x_t'\beta_v + \gamma_v' y_{t-1})}$$
defined in (15) for a categorical observation to be in the $c$-th category at time $t$, which weights $\gamma_c = (\gamma_{c1}, \ldots, \gamma_{c(C-1)})'$ with all possible $\gamma_h = (\gamma_{h1}, \ldots, \gamma_{h(C-1)})'$, $h = 1, \ldots, C-1$, that could occur at time $t-1$. We assume that this weighted probability function is continuous, that is, $\partial \eta_{t|t-1}^{(c)}(\beta^*, \gamma^*) / \partial \gamma_h$ exists for all $c, h = 1, \ldots, C-1$.
Assumption 2.
The second-order derivative matrix $\dfrac{\partial \eta_{t|t-1}'(\beta^*, \gamma^*)}{\partial \gamma^*} \dfrac{\partial \eta_{t|t-1}(\beta^*, \gamma^*)}{\partial \gamma^{*\prime}}$ is bounded and positive definite.
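Proposition 1.
When the above two assumptions hold, the CGQL estimator $\hat{\gamma}^*_{CGQL}(\beta^*)$ for $\gamma^*$ may be obtained by using the iterative equation
$$\begin{aligned} \hat{\gamma}^*_{CGQL}(\beta^*)(r+1) = \hat{\gamma}^*_{CGQL}(\beta^*)(r) &+ \left[ \sum_{t=2}^{T} \frac{\partial \eta_{t|t-1}'(\beta^*, \gamma^*)}{\partial \gamma^*}\, \Sigma^{-1}_{t|t-1}(\beta^*, \gamma^*)\, \frac{\partial \eta_{t|t-1}(\beta^*, \gamma^*)}{\partial \gamma^{*\prime}} \right]^{-1} \\ &\times \left. \left[ \sum_{t=2}^{T} \frac{\partial \eta_{t|t-1}'(\beta^*, \gamma^*)}{\partial \gamma^*}\, \Sigma^{-1}_{t|t-1}(\beta^*, \gamma^*)\, [y_t - \eta_{t|t-1}(\beta^*, \gamma^*)] \right] \right|_{\gamma^* = \hat{\gamma}^*_{CGQL}(r)}. \end{aligned} \tag{17}$$
Proof of Proposition 1.
This proposition follows from the fact that, under the model (4), one may write the GQL estimating equation for $\gamma^*$ as
$$\sum_{t=2}^{T} \frac{\partial \eta_{t|t-1}'(\beta^*, \gamma^*)}{\partial \gamma^*}\, \Sigma^{-1}_{t|t-1}(\beta^*, \gamma^*)\, [y_t - \eta_{t|t-1}(\beta^*, \gamma^*)] = 0. \tag{18}$$
 □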
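Lemma 1.
In Assumption 2, the $(C-1)^2 \times (C-1)$ derivative matrix $\partial \eta_{t|t-1}'(\beta^*, \gamma^*) / \partial \gamma^*$ has the computational formula
$$\begin{aligned} \frac{\partial \eta_{t|t-1}'(\beta^*, \gamma^*)}{\partial \gamma^*} &= \left[ \frac{\partial \eta_{t|t-1}^{(1)}(\beta^*, \gamma^*)}{\partial \gamma^*}, \ldots, \frac{\partial \eta_{t|t-1}^{(c)}(\beta^*, \gamma^*)}{\partial \gamma^*}, \ldots, \frac{\partial \eta_{t|t-1}^{(C-1)}(\beta^*, \gamma^*)}{\partial \gamma^*} \right] \\ &= \left[ \eta_{t|t-1}^{(1)} (\delta_{(t-1)1} - \eta_{t|t-1}), \ \ldots, \ \eta_{t|t-1}^{(C-1)} (\delta_{(t-1)(C-1)} - \eta_{t|t-1}) \right] \otimes y_{t-1} \\ &= \eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1}, \end{aligned} \tag{19}$$
where $\eta^*_{(t|t-1),M}(\beta^*, \gamma^*)$ denotes the matrix constructed by using the $(C-1)$-dimensional column vectors $\eta_{t|t-1}^{(c)} (\delta_{(t-1)c} - \eta_{t|t-1})$ for all $c = 1, \ldots, C-1$.
Proof.
Because $\eta_{t|t-1}^{(c)}(\beta^*, \gamma^*) = \exp(x_t'\beta_c + \gamma_c' y_{t-1}) / [1 + \sum_{v=1}^{C-1} \exp(x_t'\beta_v + \gamma_v' y_{t-1})]$, for $c = 1, \ldots, C-1$, by (5), it follows that
$$\frac{\partial \eta_{t|t-1}^{(c)}}{\partial \gamma_h} = \begin{cases} y_{t-1}\, \eta_{t|t-1}^{(c)} [1 - \eta_{t|t-1}^{(c)}] & \text{for } h = c;\ h, c = 1, \ldots, C-1 \\ -y_{t-1}\, \eta_{t|t-1}^{(c)}\, \eta_{t|t-1}^{(h)} & \text{for } h \neq c;\ h, c = 1, \ldots, C-1. \end{cases}$$
Next, because $\gamma^* = (\gamma_1', \ldots, \gamma_h', \ldots, \gamma_{C-1}')'$, one obtains
$$\begin{aligned} \frac{\partial \eta_{t|t-1}^{(c)}}{\partial \gamma^*} &= \begin{pmatrix} -\eta_{t|t-1}^{(1)} \eta_{t|t-1}^{(c)} \\ \vdots \\ \eta_{t|t-1}^{(c)} [1 - \eta_{t|t-1}^{(c)}] \\ \vdots \\ -\eta_{t|t-1}^{(C-1)} \eta_{t|t-1}^{(c)} \end{pmatrix} \otimes y_{t-1} : (C-1)(C-1) \times 1 \\ &= \left[ \eta_{t|t-1}^{(c)} (\delta_{(t-1)c} - \eta_{t|t-1}) \right] \otimes y_{t-1}, \end{aligned} \tag{21}$$
with
$$\delta_{(t-1)c} = \begin{cases} [01_{c-1}', 1, 01_{C-1-c}']' & \text{for } c = 1, \ldots, C-1 \\ 01_{C-1} & \text{for } c = C. \end{cases}$$
The lemma, i.e., Equation (19), then follows from (21). □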
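The derivative formula of Lemma 1 can be sanity-checked against central finite differences. The sketch below is only a numerical illustration: the function names are hypothetical, and $\gamma^*$ is assumed to be stored as a matrix `G` with rows $\gamma_c'$, so that stacking its rows gives $\gamma^*$.

```python
import numpy as np

def eta(x_t, B, G, y_prev):
    """Conditional mean vector eta_{t|t-1} of length C-1."""
    w = np.exp(B @ x_t + G @ y_prev)
    return w / (1.0 + w.sum())

def d_eta_d_gamma(x_t, B, G, y_prev):
    """Analytic (C-1)^2 x (C-1) matrix of (19): column c is [eta_c (delta_c - eta)] kron y_{t-1}."""
    e = eta(x_t, B, G, y_prev)
    M = np.diag(e) - np.outer(e, e)          # column c equals eta_c (delta_c - eta)
    return np.kron(M, y_prev.reshape(-1, 1))

def numeric_d_eta_d_gamma(x_t, B, G, y_prev, eps=1e-6):
    """Central finite differences with respect to the row-stacked entries of G."""
    J = np.zeros((G.size, G.shape[0]))
    for k in range(G.size):
        E = np.zeros(G.size)
        E[k] = eps
        E = E.reshape(G.shape)
        J[k] = (eta(x_t, B, G + E, y_prev) - eta(x_t, B, G - E, y_prev)) / (2 * eps)
    return J

# Usage with arbitrary illustrative values and previous category 2 (C = 3).
rng = np.random.default_rng(0)
x = np.array([1.0, -0.3])
B = rng.normal(size=(2, 2))
G = rng.normal(size=(2, 2))
y_prev = np.array([0.0, 1.0])
gap = np.max(np.abs(d_eta_d_gamma(x, B, G, y_prev) - numeric_d_eta_d_gamma(x, B, G, y_prev)))
```

If the previous response is in the reference category $C$, then $y_{t-1}$ is the zero vector and the derivative matrix vanishes, so such time points carry no information about $\gamma^*$.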
Note that for the asymptotic studies to be discussed in Section 3, it is convenient to use (19) in (17) and re-express the iterative Equation (17) for $\gamma^*$ as
$$\begin{aligned} \hat{\gamma}^*_{CGQL}(\beta^*)(r+1) = \hat{\gamma}^*_{CGQL}(\beta^*)(r) &+ \left[ \sum_{t=2}^{T} (\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1})\, \Sigma^{-1}_{t|t-1}(\beta^*, \gamma^*)\, (\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1})' \right]^{-1} \\ &\times \left. \left[ \sum_{t=2}^{T} (\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1})\, \Sigma^{-1}_{t|t-1}(\beta^*, \gamma^*)\, [y_t - \eta_{t|t-1}(\beta^*, \gamma^*)] \right] \right|_{\gamma^* = \hat{\gamma}^*(r)}. \end{aligned} \tag{22}$$
Let $\hat{\gamma}^*_{CGQL}(\beta^*)$ denote the moment estimating function of $\gamma^*$ obtained via (22).
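The iteration (22) can be sketched in a few lines for a fixed value of $\beta^*$. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `cond_mean`, `cgql_terms`, and `cgql_gamma` are hypothetical, responses are stored as $(C-1)$-dimensional indicator rows, the iteration starts from $\gamma^* = 0$, and every non-reference category is assumed to occur at least once as a lagged response (otherwise the matrix being inverted is singular).

```python
import numpy as np

def cond_mean(x_t, B, G, y_prev):
    """eta_{t|t-1}: conditional probabilities of categories 1..C-1."""
    w = np.exp(B @ x_t + G @ y_prev)
    return w / (1.0 + w.sum())

def cgql_terms(Y, X, B, G):
    """Estimating function (18) and the matrix inverted in (22), at the current G."""
    k = Y.shape[1]
    score = np.zeros(k * k)
    info = np.zeros((k * k, k * k))
    for t in range(1, Y.shape[0]):
        e = cond_mean(X[t], B, G, Y[t - 1])
        S = np.diag(e) - np.outer(e, e)              # Sigma_{t|t-1}
        D = np.kron(S, Y[t - 1].reshape(-1, 1))      # eta*_M kron y_{t-1}, as in (19)
        W = D @ np.linalg.inv(S)
        score += W @ (Y[t] - e)
        info += W @ D.T
    return score, info

def cgql_gamma(Y, X, B, n_iter=50, tol=1e-10):
    """Iterate (22) for gamma* with beta* (rows of B) held fixed."""
    k = Y.shape[1]
    G = np.zeros((k, k))                             # assumed starting value
    for _ in range(n_iter):
        score, info = cgql_terms(Y, X, B, G)
        step = np.linalg.solve(info, score)
        G = G + step.reshape(k, k)                   # row c of G holds gamma_c'
        if np.max(np.abs(step)) < tol:
            break
    return G
```

Calling `cgql_gamma(Y, X, B)` for different values of `B` traces out the estimating function $\hat{\gamma}^*_{CGQL}(\beta^*)$ numerically, which is the object that the next subsection substitutes into the likelihood.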

2.2. Modified Maximum Likelihood (MML) Estimation for $\beta^*$ Using Observed Information

Because $\gamma^*$ can be estimated by $\hat{\gamma}^*_{CGQL}(\beta^*)$ using the estimating function given in (22), the dimension of $\gamma^*$ is of no concern for the estimation of $\beta^*$. Thus, in a reduced-dimension setup, we may estimate $\beta^*$ by exploiting the modified likelihood function for $\beta^*$, which is obtained by replacing $\gamma^*$ with $\hat{\gamma}^*_{CGQL}(\beta^*)$ in the joint likelihood function $L(\beta^*, \gamma^*)$ for $\beta^*$ and $\gamma^*$. More specifically, the modified likelihood function for $\beta^*$, by (4), may be written as
$$L(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*)) = \prod_{c=1}^{C} [\pi_{(1)c}(\beta^*; x_1)]^{y_{1c}} \times \prod_{t=2}^{T} \prod_{c=1}^{C} \prod_{g=1}^{C} [\tilde{\eta}_{t|t-1}^{(c|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*); x_t)]^{y_{tc}}, \tag{23}$$
where $\tilde{\eta}_{t|t-1}^{(c|g)}(\cdot)$ denotes the partially estimated dynamic probability function obtained from the true dynamic probability function $\eta_{t|t-1}^{(c|g)}(\cdot)$ defined in (4), by replacing $\gamma^*$ with $\hat{\gamma}^*_{CGQL}(\beta^*)$. Thus,
$$\tilde{\eta}_{t|t-1}^{(c|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*); x_t) = \begin{cases} \dfrac{\exp(x_t'\beta_c + \hat{\gamma}_{c,CGQL}'(\beta^*)\, \delta_{(t-1)g})}{1 + \sum_{v=1}^{C-1} \exp(x_t'\beta_v + \hat{\gamma}_{v,CGQL}'(\beta^*)\, \delta_{(t-1)g})} & \text{for } c = 1, \ldots, C-1 \\[2ex] \dfrac{1}{1 + \sum_{v=1}^{C-1} \exp(x_t'\beta_v + \hat{\gamma}_{v,CGQL}'(\beta^*)\, \delta_{(t-1)g})} & \text{for } c = C. \end{cases} \tag{24}$$
We remark that, because $\hat{\gamma}^*_{CGQL}(\beta^*)$ from (22) has an implicit functional form, the construction of the likelihood estimating equation for $\beta^*$ encounters a computational problem: it is difficult to obtain $\partial \hat{\gamma}^*_{CGQL}(\beta^*) / \partial \beta^*$ from an implicit function. However, the likelihood estimating equation for $\beta^*$ involving $\partial \hat{\gamma}^*_{CGQL}(\beta^*) / \partial \beta^*$ may be computed as follows.
Likelihood estimating equation for $\beta^*$: We follow (23) and write this estimating equation for $\beta^*$ as
$$\begin{aligned} \frac{\partial \operatorname{Log} L(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*))}{\partial \beta^*} &= \sum_{c=1}^{C} \frac{y_{1c}}{\pi_{(1)c}(\beta^*)} \frac{\partial \pi_{(1)c}(\beta^*)}{\partial \beta^*} \\ &\quad + \sum_{t=2}^{T} \sum_{g=1}^{C} \sum_{c=1}^{C} \frac{y_{tc}}{\tilde{\eta}_{t|t-1}^{(c|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*))} \frac{\partial \tilde{\eta}_{t|t-1}^{(c|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*))}{\partial \beta^*} = 0, \end{aligned} \tag{25}$$
where $\pi_{(1)c}(\beta^*)$ and $\tilde{\eta}_{t|t-1}^{(c|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*))$ are given by (4) and (24), respectively. Their derivatives with respect to $\beta^*$ needed for (25) are given in the following lemmas.
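Lemma 2.
Computation of $\partial \pi_{(1)c}(\beta^*) / \partial \beta^*$. This derivative has the formula
$$\frac{\partial \pi_{(1)c}(\beta^*)}{\partial \beta^*} = \left[ \pi_{(1)c}(\beta^*) (\delta_{(1)c} - \pi_{(1)}(\beta^*)) \right] \otimes x_1 : (C-1)(p+1) \times 1, \tag{26}$$
where
$$\delta_{(1)c} = \begin{cases} [01_{c-1}', 1, 01_{C-1-c}']' & \text{for } c = 1, \ldots, C-1 \\ 01_{C-1} & \text{for } c = C. \end{cases}$$
Proof.
The proof is immediate because $\pi_{(1)}(\beta^*) = (\pi_{(1)1}(\beta^*), \ldots, \pi_{(1)c}(\beta^*), \ldots, \pi_{(1)(C-1)}(\beta^*))'$, with
$$\pi_{(1)c}(\beta^*) = \begin{cases} \dfrac{\exp(x_1'\beta_c)}{1 + \sum_{g=1}^{C-1} \exp(x_1'\beta_g)} & \text{for } c = 1, \ldots, C-1; \\[2ex] \dfrac{1}{1 + \sum_{g=1}^{C-1} \exp(x_1'\beta_g)} & \text{for } c = C, \end{cases}$$
by (4). □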
Lemma 3.
Computation of $\partial \tilde{\eta}_{t|t-1}^{(c|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*)) / \partial \beta^* : (p+1)(C-1) \times 1$. The computation of this derivative requires the formula for the derivative matrix $\partial [\hat{\gamma}_{c,CGQL}'(\beta^*)] / \partial \beta^* : (p+1)(C-1) \times (C-1)$, for all $c = 1, \ldots, C-1$, which can be derived from the formula for the derivative matrix $\partial [\hat{\gamma}^{*\prime}_{CGQL}(\beta^*)] / \partial \beta^* : (p+1)(C-1) \times (C-1)^2$ as follows:
$$\begin{aligned} \frac{\partial [\hat{\gamma}^{*\prime}_{CGQL}(\beta^*)]}{\partial \beta^*} &= -\left[ \sum_{t=2}^{T} (\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes x_t)\, \Sigma^{-1}_{t|t-1}(\beta^*, \gamma^*)\, (\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1})' \right] \\ &\quad \times \left[ \sum_{t=2}^{T} (\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1})\, \Sigma^{-1}_{t|t-1}(\beta^*, \gamma^*)\, (\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1})' \right]^{-1} \\ &= \left[ \frac{\partial [\hat{\gamma}_{1,CGQL}'(\beta^*)]}{\partial \beta^*}, \ldots, \frac{\partial [\hat{\gamma}_{c,CGQL}'(\beta^*)]}{\partial \beta^*}, \ldots, \frac{\partial [\hat{\gamma}_{(C-1),CGQL}'(\beta^*)]}{\partial \beta^*} \right] : (p+1)(C-1) \times (C-1)^2. \end{aligned} \tag{27}$$
Proof.
Because the CGQL estimating function for $\gamma^*$, that is, $\hat{\gamma}^*_{CGQL}(\beta^*)$, is obtained from (22) at its final iteration stage, the estimating function has the form
$$\begin{aligned} \hat{\gamma}^*_{CGQL}(\beta^*) = \gamma^* &+ \left[ \sum_{t=2}^{T} (\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1})\, \Sigma^{-1}_{t|t-1}(\beta^*, \gamma^*)\, (\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1})' \right]^{-1} \\ &\times \sum_{t=2}^{T} (\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1})\, \Sigma^{-1}_{t|t-1}(\beta^*, \gamma^*)\, [y_t - \eta_{t|t-1}(\beta^*, \gamma^*)]. \end{aligned} \tag{28}$$
The lemma now follows, first because the $\beta^*$ involved in the first derivative as well as in the inverse covariance matrix $\Sigma^{-1}_{t|t-1}(\beta^*, \gamma^*)$ is treated as known from the previous (i.e., second-last) iteration, and next because, by an operation similar to (19), the derivative of the second term in (28) follows from the formula
$$\begin{aligned} \frac{\partial \eta_{t|t-1}'(\beta^*, \gamma^*)}{\partial \beta^*} &= \left[ \frac{\partial \eta_{t|t-1}^{(1)}(\beta^*, \gamma^*)}{\partial \beta^*}, \ldots, \frac{\partial \eta_{t|t-1}^{(c)}(\beta^*, \gamma^*)}{\partial \beta^*}, \ldots, \frac{\partial \eta_{t|t-1}^{(C-1)}(\beta^*, \gamma^*)}{\partial \beta^*} \right] \\ &= \left[ \eta_{t|t-1}^{(1)} (\delta_{(t-1)1} - \eta_{t|t-1}), \ \ldots, \ \eta_{t|t-1}^{(C-1)} (\delta_{(t-1)(C-1)} - \eta_{t|t-1}) \right] \otimes x_t \\ &= \eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes x_t : (C-1)(p+1) \times (C-1). \end{aligned}$$
 □
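Lemma 4.
Computation of $\partial \tilde{\eta}_{t|t-1}^{(c|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*)) / \partial \beta^* : (p+1)(C-1) \times 1$ (continued). This derivative has the formula given by
$$\begin{aligned} \frac{\partial \tilde{\eta}_{t|t-1}^{(c|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*))}{\partial \beta^*} &= \tilde{\eta}_{t|t-1}^{(c|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*)) \left[ \delta_{(t)c} \otimes x_t + \frac{\partial [\hat{\gamma}_{c,CGQL}'(\beta^*)]}{\partial \beta^*}\, y_{t-1}^{(g)} \right] \\ &\quad - \tilde{\eta}_{t|t-1}^{(c|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*)) \sum_{\nu=1}^{C-1} \tilde{\eta}_{t|t-1}^{(\nu|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*)) \left[ \delta_{(t)\nu} \otimes x_t + \frac{\partial [\hat{\gamma}_{\nu,CGQL}'(\beta^*)]}{\partial \beta^*}\, y_{t-1}^{(g)} \right], \end{aligned} \tag{30}$$
where, for example, $\partial [\hat{\gamma}_{c,CGQL}'(\beta^*)] / \partial \beta^*$ is the $(p+1)(C-1) \times (C-1)$-dimensional $c$th component matrix in (27) for all $c = 1, \ldots, C-1$, and $\partial [\hat{\gamma}_{C,CGQL}'(\beta^*)] / \partial \beta^* = 0$, $\delta_{(t)C} = 01_{C-1}$, without any loss of generality.
Proof.
Re-express the formula for $\tilde{\eta}_{t|t-1}^{(c|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*); x_t)$ in (24) as
$$\tilde{\eta}_{t|t-1}^{(c|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*); x_t) = \begin{cases} \dfrac{N_c}{D} & \text{for } c = 1, \ldots, C-1 \\[1.5ex] \dfrac{1}{D} & \text{for } c = C, \end{cases}$$
where $N_c = \exp(x_t'\beta_c + \hat{\gamma}_{c,CGQL}'(\beta^*)\, y_{t-1}^{(g)})$ and $D = 1 + \sum_{\nu=1}^{C-1} \exp(x_t'\beta_\nu + \hat{\gamma}_{\nu,CGQL}'(\beta^*)\, y_{t-1}^{(g)})$. It then follows that
$$\begin{aligned} \frac{\partial \tilde{\eta}_{t|t-1}^{(c|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*))}{\partial \beta^*} = \tilde{\eta}_{t|t-1}^{(c|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*)) \Bigg[ &\frac{\partial}{\partial \beta^*} \left( x_t'\beta_c + \hat{\gamma}_{c,CGQL}'(\beta^*)\, y_{t-1}^{(g)} \right) \\ &- \frac{1}{D} \sum_{\nu=1}^{C-1} \exp\left( x_t'\beta_\nu + \hat{\gamma}_{\nu,CGQL}'(\beta^*)\, y_{t-1}^{(g)} \right) \frac{\partial}{\partial \beta^*} \left( x_t'\beta_\nu + \hat{\gamma}_{\nu,CGQL}'(\beta^*)\, y_{t-1}^{(g)} \right) \Bigg]. \end{aligned} \tag{32}$$
The formula in (30) follows from (32) because
$$\tilde{\eta}_{t|t-1}^{(\nu|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*)) = \frac{1}{D} \exp\left( x_t'\beta_\nu + \hat{\gamma}_{\nu,CGQL}'(\beta^*)\, y_{t-1}^{(g)} \right), \quad \text{and} \quad \frac{\partial}{\partial \beta^*} (x_t'\beta_\nu) = \delta_{(t)\nu} \otimes x_t.$$
 □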
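Simplified likelihood estimating equation for $\beta^*$: Notice that by using the derivative formulas from (26) and (32), one may reduce the likelihood estimating Equation (25) to
$$\begin{aligned} \frac{\partial \operatorname{Log} L(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*))}{\partial \beta^*} &= \sum_{c=1}^{C} y_{1c} \left[ \delta_{(1)c} - \pi_{(1)}(\beta^*) \right] \otimes x_1 \\ &\quad + \sum_{t=2}^{T} \sum_{g=1}^{C} \sum_{c=1}^{C} y_{tc} \Bigg\{ \left[ \delta_{(t)c} \otimes x_t + \frac{\partial [\hat{\gamma}_{c,CGQL}'(\beta^*)]}{\partial \beta^*}\, y_{t-1}^{(g)} \right] \\ &\qquad - \sum_{\nu=1}^{C-1} \tilde{\eta}_{t|t-1}^{(\nu|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*)) \left[ \delta_{(t)\nu} \otimes x_t + \frac{\partial [\hat{\gamma}_{\nu,CGQL}'(\beta^*)]}{\partial \beta^*}\, y_{t-1}^{(g)} \right] \Bigg\} = 0, \end{aligned} \tag{33}$$
which is easily computable because the formulas for $\partial [\hat{\gamma}_{c,CGQL}'(\beta^*)] / \partial \beta^*$, for all $c = 1, \ldots, C-1$, involved in this reduced form are available from (27). For $c = C$, one uses $\partial [\hat{\gamma}_{C,CGQL}'(\beta^*)] / \partial \beta^* = 0$ and $\delta_{(t)C} = 01_{C-1}$.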
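Lemma 5.
The likelihood Equation (33) for $\beta^*$ may be solved by using the iterative equation
$$\hat{\beta}^*(r+1) = \hat{\beta}^*(r) + \left[ -\frac{\partial^2 \operatorname{Log} L(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*))}{\partial \beta^*\, \partial \beta^{*\prime}} \right]^{-1} \left. \frac{\partial \operatorname{Log} L(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*))}{\partial \beta^*} \right|_{\beta^* = \hat{\beta}^*(r)}, \tag{34}$$
where, under the assumption that the $\beta^*$ involved in the derivative $\partial \hat{\gamma}_{c,CGQL}'(\beta^*) / \partial \beta^*$ in (30) or (33), for $c = 1, \ldots, C-1$, is known from the previous iteration, the second-order derivative in (34), by (33), has the formula
$$\begin{aligned} \frac{\partial^2 \operatorname{Log} L(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*))}{\partial \beta^*\, \partial \beta^{*\prime}} &= -\sum_{c=1}^{C} y_{1c}\, \frac{\partial \pi_{(1)}(\beta^*)}{\partial \beta^{*\prime}} \otimes x_1 \\ &\quad - \sum_{t=2}^{T} \sum_{g=1}^{C} \sum_{c=1}^{C} y_{tc} \sum_{\nu=1}^{C-1} \left[ \delta_{(t)\nu} \otimes x_t + \frac{\partial [\hat{\gamma}_{\nu,CGQL}'(\beta^*)]}{\partial \beta^*}\, y_{t-1}^{(g)} \right] \frac{\partial \tilde{\eta}_{t|t-1}^{(\nu|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*))}{\partial \beta^{*\prime}}, \end{aligned}$$
where $\pi_{(1)}(\beta^*) = [\pi_{(1)1}, \ldots, \pi_{(1)c}, \ldots, \pi_{(1)(C-1)}]'$ and
$$\frac{\partial \pi_{(1)c}(\beta^*)}{\partial \beta^*} = \left[ \pi_{(1)c}(\beta^*) (\delta_{(1)c} - \pi_{(1)}(\beta^*)) \right] \otimes x_1 : (C-1)(p+1) \times 1,$$
by Lemma 2; and $\partial \tilde{\eta}_{t|t-1}^{(\nu|g)}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*)) / \partial \beta^*$ has the same formula as in (30).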

3. Asymptotics

3.1. Consistency of $\hat{\gamma}^*_{CGQL}(\beta^*)$ for Dynamic Dependence Parameter $\gamma^*$

To establish this consistency, we derive the asymptotic (as $T \to \infty$) distribution of $\hat{\gamma}^*_{CGQL}(\beta^*)$ (the solution of the moment Equation (22)), as stated in Theorem 1 below. Recall that the CGQL estimating function for $\gamma^*$, i.e., the left-hand side of (18) [see also (19)], has the formula
$$\begin{aligned} f_T(\beta^*, \gamma^*) &= \sum_{t=2}^{T} \frac{\partial \eta_{t|t-1}'(\beta^*, \gamma^*)}{\partial \gamma^*}\, \Sigma^{-1}_{t|t-1}(\beta^*, \gamma^*)\, [y_t - \eta_{t|t-1}(\beta^*, \gamma^*)] \\ &= \sum_{t=2}^{T} [\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1}]\, \Sigma^{-1}_{t|t-1}(\beta^*, \gamma^*)\, [y_t - \eta_{t|t-1}(\beta^*, \gamma^*)], \end{aligned} \tag{36}$$
so that $\hat{\gamma}^*_{CGQL}(\beta^*)$ satisfies
$$\begin{aligned} f_T(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*)) = \sum_{t=2}^{T} &[\tilde{\eta}^*_{(t|t-1),M}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*)) \otimes y_{t-1}] \\ &\times \tilde{\Sigma}^{-1}_{t|t-1}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*))\, [y_t - \tilde{\eta}_{t|t-1}(\beta^*, \hat{\gamma}^*_{CGQL}(\beta^*))] = 0. \end{aligned} \tag{37}$$
Notice that, conditional on the past history, $y_t$ in (36), for $t = 2, \ldots, T$, depends only on the lag 1 response $y_{t-1}$. Thus, conditional on $y_{t-1}$, all $y_t$ may be treated as independent. Consequently, $f_T(\beta^*, \gamma^*)$ in (36) is, conditionally, a sum of $T-1$ independent quantities. Furthermore, as shown in (16),
$$Y_t \mid y_{t-1} \sim \mathrm{Mult}\left(\eta_{t|t-1}(\beta^*, \gamma^*),\ \Sigma_{t|t-1}(\beta^*, \gamma^*)\right).$$
Next, for the true $\gamma^*$, using (36), we write
$$\bar{f}_T(\beta^*, \gamma^*) = \frac{1}{T-1} \sum_{t=2}^{T} f_t(\beta^*, \gamma^*) = \frac{1}{T-1} \sum_{t=2}^{T} [\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1}]\, \Sigma^{-1}_{t|t-1}(\beta^*, \gamma^*)\, [y_t - \eta_{t|t-1}(\beta^*, \gamma^*)]. \tag{38}$$
Because $Y_t \mid y_{t-1}$ has the aforementioned multinomial distribution (see also (16)), it then follows that
$$\begin{aligned} E[\bar{f}_T(\beta^*, \gamma^*)] &= 0, \\ \mathrm{cov}[\bar{f}_T(\beta^*, \gamma^*)] &= \frac{1}{(T-1)^2} \sum_{t=2}^{T} \left(\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1}\right) \Sigma^{-1}_{t|t-1}(\beta^*, \gamma^*) \left(\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1}\right)' \\ &= \frac{1}{(T-1)^2}\, V_T^*(\beta^*, \gamma^*), \ \text{(say)}. \end{aligned}$$
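We now derive the asymptotic distribution of $\hat{\gamma}^*_{CGQL}(\beta^*)$ in the following theorem.
Theorem 1.
We assume that the $f_t(\cdot)$ in (38) satisfy the Lindeberg condition, that is,
$$\lim_{T \to \infty} V_T^{*-1} \sum_{t=2}^{T} \int_{(f_t' V_T^{*-1} f_t) > \epsilon} f_t f_t'\, g(f_t)\, df_t = 0 \tag{41}$$
for all $\epsilon > 0$, $g(\cdot)$ being the probability distribution of $f_t(\cdot)$. Then
$$\begin{aligned} \lim_{T \to \infty} \hat{\gamma}^*_{CGQL}(\beta^*) \sim N\Bigg( \gamma^*,\ &\left\{ E_{Y_{t-1}} \left[ \sum_{t=2}^{T} (\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1})\, \Sigma^{-1}_{t|t-1}(\beta^*, \gamma^*)\, (\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1})' \right] \right\}^{-1} E_{Y_{t-1}}[V_T^*(\beta^*, \gamma^*)] \\ &\times \left\{ E_{Y_{t-1}} \left[ \sum_{t=2}^{T} (\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1})\, \Sigma^{-1}_{t|t-1}(\beta^*, \gamma^*)\, (\eta^*_{(t|t-1),M}(\beta^*, \gamma^*) \otimes y_{t-1})' \right] \right\}^{-1} \Bigg). \end{aligned}$$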
Proof. 
Let
$$Z_{T} = \big\{\mathrm{cov}[\bar{f}_{T}(\beta^{*},\gamma^{*})]\big\}^{-\frac{1}{2}} \bar{f}_{T}(\beta^{*},\gamma^{*}) = (T-1)\,[V^{*}_{T}(\beta^{*},\gamma^{*})]^{-\frac{1}{2}} \bar{f}_{T}(\beta^{*},\gamma^{*}), \tag{43}$$
where $\bar{f}_{T}(\beta^{*},\gamma^{*}) = \frac{1}{T-1} \sum_{t=2}^{T} f_{t}(\beta^{*},\gamma^{*})$ as in (38). Here the $\{f_{t}(\cdot)\}$ are not identically distributed because, by (16),
$$Y_{t} \mid y_{t-1} \sim \mathrm{Mult}\big(\eta_{t|t-1}(\beta^{*},\gamma^{*}),\, \Sigma_{t|t-1}(\beta^{*},\gamma^{*})\big), \tag{44}$$
where the mean vectors and covariance matrices are different at different time points $t$. However, because the $\{f_{t}(\cdot)\}$ satisfy the Lindeberg condition (41), it then follows from the Lindeberg–Feller central limit theorem ([6], Theorem 3.3.6, [23], Theorem 2.2) that $Z_{T}$ in (43) has the following limiting distribution:
$$\lim_{T \to \infty} Z_{T} \rightarrow N(0, I_{(C-1)^{2}}). \tag{45}$$
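Next, because $\hat{\gamma}^{*}_{CGQL}(\beta^{*})$ obtained by (22) is a solution of (18), it satisfies (37), i.e.,
$$\begin{aligned}
\sum_{t=2}^{T} f_{t}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*})) = \sum_{t=2}^{T} & [\tilde{\eta}^{*}_{(t|t-1),M}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*})) \otimes y_{t-1}] \\
& \times \tilde{\Sigma}^{-1}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))\, [y_{t} - \tilde{\eta}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))] = 0,
\end{aligned} \tag{46}$$
which, by first-order Taylor's series expansion, produces
$$\sum_{t=2}^{T} f_{t}(\beta^{*},\gamma^{*}) + (\hat{\gamma}^{*}_{CGQL}(\beta^{*}) - \gamma^{*}) \sum_{t=2}^{T} \frac{\partial f_{t}(\beta^{*},\gamma^{*})}{\partial \gamma^{*}} \approx 0. \tag{47}$$
Thus,
$$\begin{aligned}
[\hat{\gamma}^{*}_{CGQL}(\beta^{*}) - \gamma^{*}] &\approx -\Big[\sum_{t=2}^{T} \frac{\partial f_{t}(\beta^{*},\gamma^{*})}{\partial \gamma^{*}}\Big]^{-1} \sum_{t=2}^{T} f_{t}(\beta^{*},\gamma^{*}) \\
&\approx \Big\{E_{Y_{t-1}} \sum_{t=2}^{T} (\eta^{*}_{(t|t-1),M}(\beta^{*},\gamma^{*}) \otimes y_{t-1})\, \Sigma^{-1}_{t|t-1}(\beta^{*},\gamma^{*}) \times (\eta^{*}_{(t|t-1),M}(\beta^{*},\gamma^{*}) \otimes y_{t-1})^{\top}\Big\}^{-1} E_{Y_{t-1}} \sum_{t=2}^{T} f_{t}(\beta^{*},\gamma^{*}) \\
&= \Big\{E_{Y_{t-1}} \sum_{t=2}^{T} (\eta^{*}_{(t|t-1),M}(\beta^{*},\gamma^{*}) \otimes y_{t-1})\, \Sigma^{-1}_{t|t-1}(\beta^{*},\gamma^{*})\, (\eta^{*}_{(t|t-1),M}(\beta^{*},\gamma^{*}) \otimes y_{t-1})^{\top}\Big\}^{-1} \\
&\qquad \times E_{Y_{t-1}} [V^{*}_{T}(\beta^{*},\gamma^{*})]^{\frac{1}{2}}\, [V^{*}_{T}(\beta^{*},\gamma^{*})]^{-\frac{1}{2}}\, (T-1)\, \bar{f}_{T}(\beta^{*},\gamma^{*}) \\
&= \Big\{E_{Y_{t-1}} \sum_{t=2}^{T} (\eta^{*}_{(t|t-1),M}(\beta^{*},\gamma^{*}) \otimes y_{t-1})\, \Sigma^{-1}_{t|t-1}(\beta^{*},\gamma^{*})\, (\eta^{*}_{(t|t-1),M}(\beta^{*},\gamma^{*}) \otimes y_{t-1})^{\top}\Big\}^{-1} \times [E_{Y_{t-1}} V^{*}_{T}(\beta^{*},\gamma^{*})]^{\frac{1}{2}} Z_{T} \\
&= [E_{Y_{t-1}} V^{*}_{T}(\beta^{*},\gamma^{*})]^{-\frac{1}{2}} Z_{T}
\end{aligned} \tag{48}$$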
by (40) and (43). The theorem, i.e., (42), follows from (48) because the limiting distribution ($q(\cdot)$) of $Z_{T}$ is normal, that is, $\lim_{T \to \infty} q(Z_{T}) \rightarrow N(0, I_{(C-1)^{2}})$ by (45). Furthermore, because the quantity in the right-hand side of (48) can be re-expressed by (40) as
$$\Big[E_{Y_{t-1}} (T-1)^{2}\, \mathrm{cov}(\bar{f}_{T}(\beta^{*},\gamma^{*}) \mid y_{t-1})\Big]^{-\frac{1}{2}} Z_{T},$$
it then follows under a mild regularity condition (i.e., by assuming that the covariance $E_{Y_{t-1}} \mathrm{cov}(\bar{f}_{T}(\beta^{*},\gamma^{*}) \mid y_{t-1})$ conditional on the past history is finite) that
$$\lim_{T \to \infty} [\hat{\gamma}^{*}_{CGQL}(\beta^{*}) - \gamma^{*}] \rightarrow 0,$$
showing that $\hat{\gamma}^{*}_{CGQL}(\beta^{*})$ is a mean squared error consistent function for $\gamma^{*}$, for any $\beta^{*}$.  □
We remark that a Lindeberg’s condition similar to (41) was examined earlier in [14]. For details on this, we refer to ([14], Assumption N(iii), p. 89) for Lindeberg’s condition, and ([14], second paragraph under proof of Theorem 1, p. 93) for its proof.
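The central limit step in the proof of Theorem 1 can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: it uses a hypothetical lag-1 multinomial logit form for the conditional probabilities (the paper's model (16) is not restated here), a scalar covariate `x_t = sin(0.1 t)`, and the simple residual sum $\sum_t (y_t - \eta_{t|t-1})$ in place of the full CGQL function $f_T$. All function and variable names are introduced for this example.

```python
import numpy as np

def transition_probs(x_t, y_prev, beta, gamma):
    """Conditional probabilities of categories 1..C-1 given the lag-1 response.

    Assumed form (hypothetical, for illustration only): the linear predictor
    for category c is beta[c] * x_t + gamma[c] @ y_prev.
    """
    expo = np.exp(beta * x_t + gamma @ y_prev)
    return expo / (1.0 + expo.sum())

def standardized_sum(T, beta, gamma, rng):
    """Simulate one series and return V_T^{-1/2} * sum_t (y_t - eta_t)."""
    C1 = beta.size                    # C - 1 free categories
    y_prev = np.zeros(C1)             # start in the reference category C
    total = np.zeros(C1)
    V = np.zeros((C1, C1))
    for t in range(2, T + 1):
        x_t = np.sin(0.1 * t)         # time-dependent covariate, so terms are not i.i.d.
        eta = transition_probs(x_t, y_prev, beta, gamma)
        # draw the category: count how many cumulative thresholds lie below u
        cat = int((rng.random() > np.cumsum(eta)).sum())
        y_t = np.zeros(C1)
        if cat < C1:                  # cat == C1 means the reference category
            y_t[cat] = 1.0
        total += y_t - eta
        V += np.diag(eta) - np.outer(eta, eta)   # conditional multinomial covariance
        y_prev = y_t
    w, Q = np.linalg.eigh(V)          # symmetric inverse square root of V
    return Q @ np.diag(w ** -0.5) @ Q.T @ total

rng = np.random.default_rng(0)
beta = np.array([0.5, -0.3])
gamma = np.array([[0.4, -0.2], [0.1, 0.3]])
Z = np.array([standardized_sum(150, beta, gamma, rng) for _ in range(400)])
print("mean of Z_T:", Z.mean(axis=0))
print("covariance of Z_T:")
print(np.cov(Z.T))
```

If the normal approximation is reasonable at this series length, the printed mean should be close to the zero vector and the printed covariance close to the identity matrix, mirroring (45) in this simplified setting.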

3.2. Consistency of MML Estimator $\hat{\beta}^{*}$ for $\beta^{*}$

Let $h(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))$ denote the likelihood estimating function, i.e., the left-hand side of the likelihood estimating Equation (25) [see also (33)] for $\beta^{*}$. We further express this function as
$$h(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*})) = h_{1}(\beta^{*}) + h_{T,2}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*})), \tag{49}$$
where
$$\begin{aligned}
h_{1}(\beta^{*}) &= \sum_{c=1}^{C} \frac{y_{1c}}{\pi_{(1)c}(\beta^{*})} \frac{\partial \pi_{(1)c}(\beta^{*})}{\partial \beta^{*}}, \quad \text{and} \\
h_{T,2}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*})) &= \sum_{t=2}^{T} \sum_{g=1}^{C} \sum_{c=1}^{C} \frac{y_{tc}}{\tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))} \frac{\partial \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*}}.
\end{aligned} \tag{50}$$
The results from the following Lemma 6 will be used in Theorem 2 below to derive the asymptotic (as $T \to \infty$) distribution of $\hat{\beta}^{*}$ (solution of likelihood estimating Equation (25)).
Lemma 6.
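Let $\bar{h}_{T}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*})) = \frac{1}{T} h(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))$. This $(p+1)$-dimensional mean vector function has the expectation and the $(p+1) \times (p+1)$ conditional (on lag 1 response) covariance matrix as
$$\begin{aligned}
E[\bar{h}_{T}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))] &= \frac{1}{T} E[h(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))] = 0, \quad \text{and} \\
\mathrm{cov}(\bar{h}_{T}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))) &= \frac{1}{T^{2}} \Bigg[ \sum_{c=1}^{C} \Bigg\{ \frac{1 - \pi_{(1)c}(\beta^{*})}{\pi_{(1)c}(\beta^{*})} \frac{\partial \pi_{(1)c}(\beta^{*})}{\partial \beta^{*}} \frac{\partial \pi_{(1)c}(\beta^{*})}{\partial \beta^{*\top}} - \sum_{u \neq c}^{C} \frac{\partial \pi_{(1)c}(\beta^{*})}{\partial \beta^{*}} \frac{\partial \pi_{(1)u}(\beta^{*})}{\partial \beta^{*\top}} \Bigg\} \\
&\quad + \sum_{t=2}^{T} \sum_{g=1}^{C} \sum_{c=1}^{C} \Bigg\{ \frac{1 - \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))} \frac{\partial \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*}} \frac{\partial \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*\top}} \\
&\qquad\qquad - \sum_{u \neq c}^{C} \frac{\partial \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*}} \frac{\partial \tilde{\eta}^{(u|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*\top}} \Bigg\} \Bigg],
\end{aligned} \tag{51}$$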
respectively.
Proof. 
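Because $\pi_{(1)C}(\beta^{*}) = 1 - \sum_{c=1}^{C-1} \pi_{(1)c}(\beta^{*})$ and $\tilde{\eta}^{(C|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*})) = 1 - \sum_{c=1}^{C-1} \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))$, it follows from (50) that
$$E[h_{1}(\beta^{*})] = \sum_{c=1}^{C} \frac{E(Y_{1c})}{\pi_{(1)c}(\beta^{*})} \frac{\partial \pi_{(1)c}(\beta^{*})}{\partial \beta^{*}} = \sum_{c=1}^{C} \frac{\partial \pi_{(1)c}(\beta^{*})}{\partial \beta^{*}} = 0 \quad \text{and} \tag{52}$$
$$\begin{aligned}
E[h_{T,2}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))] &= \sum_{t=2}^{T} \sum_{g=1}^{C} \sum_{c=1}^{C} \frac{E(Y_{tc} \mid y_{(t-1),g})}{\tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))} \frac{\partial \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*}} \\
&= \sum_{t=2}^{T} \sum_{g=1}^{C} \sum_{c=1}^{C} \frac{\partial \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*}} = 0.
\end{aligned} \tag{53}$$
Next, because
$$h(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*})) = h_{1}(\beta^{*}) + h_{T,2}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))$$
as stated in (49), one writes
$$E\,\bar{h}_{T}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*})) = E\Big[\frac{1}{T} h(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))\Big] = \frac{1}{T} E\,h_{1}(\beta^{*}) + \frac{1}{T} E\,h_{T,2}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*})).$$
Hence, by applying (52) and (53), we obtain
$$E\,\bar{h}_{T}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*})) = 0$$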
as stated in the lemma.
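Next we compute the $(p+1) \times (p+1)$ conditional (on the history up to time $t-1$, $H_{t-1}$) covariance matrix of the mean vector function $\bar{h}_{T}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))$ as
$$\begin{aligned}
\mathrm{cov}(\bar{h}_{T}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))) &= \frac{1}{T^{2}} \Big[ \mathrm{cov}(h_{1}(\beta^{*})) + \mathrm{cov}(h_{T,2}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))) \Big] \\
&= \frac{1}{T^{2}} \Bigg[ \sum_{c=1}^{C} \Bigg\{ \frac{\mathrm{var}(Y_{1c})}{\pi^{2}_{(1)c}(\beta^{*})} \frac{\partial \pi_{(1)c}(\beta^{*})}{\partial \beta^{*}} \frac{\partial \pi_{(1)c}(\beta^{*})}{\partial \beta^{*\top}} + \sum_{u \neq c}^{C} \frac{\mathrm{cov}(Y_{1c}, Y_{1u})}{\pi_{(1)c}(\beta^{*})\, \pi_{(1)u}(\beta^{*})} \frac{\partial \pi_{(1)c}(\beta^{*})}{\partial \beta^{*}} \frac{\partial \pi_{(1)u}(\beta^{*})}{\partial \beta^{*\top}} \Bigg\} \\
&\quad + \sum_{t=2}^{T} \sum_{g=1}^{C} \sum_{c=1}^{C} \Bigg\{ \frac{\mathrm{var}(Y_{tc} \mid H_{t-1})}{[\tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))]^{2}} \frac{\partial \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*}} \frac{\partial \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*\top}} \\
&\qquad + \sum_{c \neq u}^{C} \frac{\mathrm{cov}((Y_{tc}, Y_{tu}) \mid H_{t-1})}{[\tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))]\,[\tilde{\eta}^{(u|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))]} \frac{\partial \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*}} \frac{\partial \tilde{\eta}^{(u|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*\top}} \Bigg\} \Bigg],
\end{aligned}$$
yielding the second part of the lemma because
$$\begin{aligned}
\mathrm{var}(Y_{1c}) &= \pi_{(1)c}(\beta^{*})\,[1 - \pi_{(1)c}(\beta^{*})] \\
\mathrm{cov}(Y_{1c}, Y_{1u}) &= -\pi_{(1)c}(\beta^{*})\, \pi_{(1)u}(\beta^{*}) \\
\mathrm{var}(Y_{tc} \mid H_{t-1}) &= \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))\,[1 - \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))], \quad t = 2, \ldots, T \\
\mathrm{cov}((Y_{tc}, Y_{tu}) \mid H_{t-1}) &= -\tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))\, \tilde{\eta}^{(u|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*})), \quad t = 2, \ldots, T.
\end{aligned}$$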
 □
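The $t = 1$ part of Lemma 6 can be checked numerically. The following sketch assumes a baseline-category logit for $\pi_{(1)c}(\beta^{*})$, which is the form consistent with the gradient $\pi_{(1)c}(\delta_{(1)c} - \pi_{(1)}) \otimes x_1$ quoted from Lemma 2; the number of categories, the covariate vector, and all names below are hypothetical choices made for illustration. It enumerates the $C$ possible outcomes of $y_1$, computes the exact mean and covariance of $h_1(\beta^{*})$, and compares the covariance with the expression in the lemma.

```python
import numpy as np

def probs_and_grads(beta_star, x, C):
    """Return pi (length C) and d pi_c / d beta* (one row per category)."""
    B = beta_star.reshape(C - 1, x.size)          # row c holds beta_c
    expo = np.exp(B @ x)
    pi_free = expo / (1.0 + expo.sum())           # pi_1, ..., pi_{C-1}
    pi = np.append(pi_free, 1.0 - pi_free.sum())  # add the reference category C
    grads = np.zeros((C, beta_star.size))
    for c in range(C - 1):
        delta = np.zeros(C - 1)
        delta[c] = 1.0
        grads[c] = pi_free[c] * np.kron(delta - pi_free, x)  # pi_c (delta_c - pi) kron x
    grads[C - 1] = -grads[:C - 1].sum(axis=0)     # the C probabilities sum to one
    return pi, grads

C, x = 4, np.array([1.0, 0.7, -1.2])              # illustrative values only
rng = np.random.default_rng(1)
beta_star = rng.normal(size=(C - 1) * x.size)
pi, D = probs_and_grads(beta_star, x, C)

scores = D / pi[:, None]                          # row c: h_1 when y_1 is in category c
mean_score = pi @ scores                          # exact E[h_1]
exact_cov = (scores * pi[:, None]).T @ scores     # exact E[h_1 h_1^T]

lemma_cov = np.zeros_like(exact_cov)              # t = 1 term of the lemma
for c in range(C):
    lemma_cov += (1 - pi[c]) / pi[c] * np.outer(D[c], D[c])
    for u in range(C):
        if u != c:
            lemma_cov -= np.outer(D[c], D[u])

fisher = sum(np.outer(D[c], D[c]) / pi[c] for c in range(C))
print(np.abs(mean_score).max(), np.abs(exact_cov - lemma_cov).max())
print(np.abs(lemma_cov - fisher).max())
```

Under these assumptions all three printed discrepancies should be at rounding-error level. The last comparison reflects that the cross-product terms cancel because the derivative vectors sum to zero, so the $t = 1$ covariance equals $\sum_{c} \pi_{(1)c}^{-1}\, \partial \pi_{(1)c} / \partial \beta^{*}\; \partial \pi_{(1)c} / \partial \beta^{*\top}$.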
Lemma 7.
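The second-order derivative matrix $\frac{\partial^{2} \pi_{(1)c}(\beta^{*})}{\partial \beta^{*} \partial \beta^{*\top}}$ has the formula
$$\frac{\partial^{2} \pi_{(1)c}(\beta^{*})}{\partial \beta^{*} \partial \beta^{*\top}} = \begin{pmatrix}
-\pi_{(1)1}\, \pi_{(1)c}\, \big[(\delta_{(1)c} + \delta_{(1)1} - 2\pi_{(1)})^{\top} \otimes x_{1} x_{1}^{\top}\big] \\
-\pi_{(1)2}\, \pi_{(1)c}\, \big[(\delta_{(1)c} + \delta_{(1)2} - 2\pi_{(1)})^{\top} \otimes x_{1} x_{1}^{\top}\big] \\
\vdots \\
\pi_{(1)c}\,(1 - 2\pi_{(1)c})\, \big[(\delta_{(1)c} - \pi_{(1)})^{\top} \otimes x_{1} x_{1}^{\top}\big] \\
\vdots \\
-\pi_{(1)(C-1)}\, \pi_{(1)c}\, \big[(\delta_{(1)c} + \delta_{(1)(C-1)} - 2\pi_{(1)})^{\top} \otimes x_{1} x_{1}^{\top}\big]
\end{pmatrix}, \tag{55}$$
with $\pi_{(1)} = [\pi_{(1)1}, \ldots, \pi_{(1)c}, \ldots, \pi_{(1)(C-1)}]^{\top}$; and the second-order derivative matrix $\frac{\partial^{2} \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*} \partial \beta^{*\top}}$ has an approximate formula
$$\begin{aligned}
\frac{\partial^{2} \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*} \partial \beta^{*\top}} \approx{} & \Big\{\delta_{(t)c} \otimes x_{t} + \frac{\partial [\hat{\gamma}_{c,CGQL}(\beta^{*})]^{\top}}{\partial \beta^{*}}\, y^{(g)}_{t-1}\Big\} \frac{\partial \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*\top}} \\
& - \sum_{\nu=1}^{C-1} \tilde{\eta}^{(\nu|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*})) \Big\{\delta_{(t)\nu} \otimes x_{t} + \frac{\partial [\hat{\gamma}_{\nu,CGQL}(\beta^{*})]^{\top}}{\partial \beta^{*}}\, y^{(g)}_{t-1}\Big\} \frac{\partial \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*\top}} \\
& - \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*})) \sum_{\nu=1}^{C-1} \Big\{\delta_{(t)\nu} \otimes x_{t} + \frac{\partial [\hat{\gamma}_{\nu,CGQL}(\beta^{*})]^{\top}}{\partial \beta^{*}}\, y^{(g)}_{t-1}\Big\} \times \frac{\partial \tilde{\eta}^{(\nu|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*\top}}.
\end{aligned} \tag{56}$$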
Proof. 
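To derive the formula in (55), we first recall from Lemma 2, or more specifically from (26), that
$$\frac{\partial \pi_{(1)c}}{\partial \beta^{*}} = \pi_{(1)c}\,(\delta_{(1)c} - \pi_{(1)}) \otimes x_{1} : (C-1)(p+1) \times 1.$$
A further derivative then produces
$$\frac{\partial^{2} \pi_{(1)c}(\beta^{*})}{\partial \beta^{*} \partial \beta^{*\top}} = \frac{\partial}{\partial \beta^{*\top}} \Big[\pi_{(1)c}\,(\delta_{(1)c} - \pi_{(1)}) \otimes x_{1}\Big] = \frac{\partial}{\partial \beta^{*\top}} \left[\begin{pmatrix} -\pi_{(1)1}\, \pi_{(1)c} \\ \vdots \\ \pi_{(1)c}\,[1 - \pi_{(1)c}] \\ \vdots \\ -\pi_{(1)(C-1)}\, \pi_{(1)c} \end{pmatrix} \otimes x_{1}\right],$$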
which yields the formula in (55) after some algebraic calculations.
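The algebra leading to (55) can be checked by finite differences. The sketch below again assumes a baseline-category logit for $\pi_{(1)c}$ (the form implied by the first-derivative formula recalled from Lemma 2), builds the block matrix of (55) row block by row block, and compares it with a central-difference Jacobian of the analytic gradient. The dimensions, covariate values, and function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def pi_free(beta_star, x, C):
    """Probabilities pi_1..pi_{C-1} under an assumed baseline-category logit."""
    B = beta_star.reshape(C - 1, x.size)
    expo = np.exp(B @ x)
    return expo / (1.0 + expo.sum())

def grad_pi_c(beta_star, x, C, c):
    """First derivative: pi_c (delta_c - pi) kron x."""
    p = pi_free(beta_star, x, C)
    return p[c] * np.kron(np.eye(C - 1)[c] - p, x)

def hess_pi_c(beta_star, x, C, c):
    """Second derivative assembled block by block as in (55)."""
    p = pi_free(beta_star, x, C)
    I = np.eye(C - 1)
    xx = np.outer(x, x)
    blocks = []
    for k in range(C - 1):
        if k == c:
            row_vec = (I[c] - p)[None, :]             # (delta_c - pi)^T
            blocks.append(p[c] * (1 - 2 * p[c]) * np.kron(row_vec, xx))
        else:
            row_vec = (I[c] + I[k] - 2 * p)[None, :]  # (delta_c + delta_k - 2 pi)^T
            blocks.append(-p[k] * p[c] * np.kron(row_vec, xx))
    return np.vstack(blocks)

def numeric_hess(beta_star, x, C, c, eps=1e-6):
    """Central-difference Jacobian of the analytic gradient."""
    n = beta_star.size
    H = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        H[:, j] = (grad_pi_c(beta_star + step, x, C, c)
                   - grad_pi_c(beta_star - step, x, C, c)) / (2 * eps)
    return H

C, x = 4, np.array([1.0, 0.5, -0.8])                  # illustrative values only
beta_star = np.random.default_rng(2).normal(size=(C - 1) * x.size)
max_err = max(
    np.abs(hess_pi_c(beta_star, x, C, c) - numeric_hess(beta_star, x, C, c)).max()
    for c in range(C - 1)
)
print("largest discrepancy:", max_err)
```

A discrepancy at the level of finite-difference error for every $c$ would support the block structure of (55) under the assumed logit form.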
Computing the exact derivative matrix $\frac{\partial^{2} \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*} \partial \beta^{*\top}}$ is algebraically complicated. The approximate Formula (56) follows from (30) under the assumption that $\beta^{*}$ involved in the derivative $\frac{\partial [\hat{\gamma}_{c,CGQL}(\beta^{*})]^{\top}}{\partial \beta^{*}}$ in (30) for $c = 1, \ldots, C-1$ is known from a previous iteration. □
We now provide the asymptotic distribution of $\hat{\beta}^{*}$ (solution of (25) or (33)) as in the following theorem.
Theorem 2.
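Denote the covariance matrix of $\bar{h}_{T}(\beta^{*})$ computed in Lemma 6 by $\mathrm{cov}(\bar{h}_{T}(\beta^{*})) = \frac{1}{T^{2}} P_{T}(\beta^{*})$. Next, assume that $(h_{1}(\beta^{*}) + h_{T,2}(\beta^{*}))$ in (49) satisfies the Lindeberg condition, that is,
$$\lim_{T \to \infty} P^{-1}_{T} \sum_{\{(h_{1} + h_{T,2})^{\top} P^{-1}_{T} (h_{1} + h_{T,2})\} > \epsilon} \big\{(h_{1} + h_{T,2})(h_{1} + h_{T,2})^{\top}\, g(h_{1} + h_{T,2})\big\} = 0 \tag{58}$$
for all $\epsilon > 0$, $g(\cdot)$ being the probability distribution of $(h_{1} + h_{T,2})$. Then, the limiting distribution of $\hat{\beta}^{*}$ (say, $q(\hat{\beta}^{*})$) is normal, and is given by
$$\lim_{T \to \infty} q(\hat{\beta}^{*}) \rightarrow N\Big(\beta^{*},\; \Big\{E_{y} \frac{\partial (h_{1} + h_{T,2})}{\partial \beta^{*\top}}\Big\}^{-1} P_{T}(\beta^{*}) \times \Big\{E_{y} \frac{\partial (h_{1} + h_{T,2})}{\partial \beta^{*\top}}\Big\}^{-1}\Big). \tag{59}$$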
Proof. 
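The proof of this theorem is similar to that of Theorem 1. The difference lies in the forms of the functions $f_{t}(\cdot)$ in Theorem 1 and $(h_{1}(\beta^{*}) + h_{T,2}(\beta^{*}))$ in the present theorem. Thus, by a similar justification as in (48), under some mild regularity condition, it follows that
$$\lim_{T \to \infty} \hat{\beta}^{*} \rightarrow \beta^{*},$$
showing that $\hat{\beta}^{*}$ is consistent for $\beta^{*}$. Details are omitted.
Note that to compute the expected function in (59), that is, $E_{y} \frac{\partial (h_{1} + h_{T,2})}{\partial \beta^{*\top}}$, one may first assume that $\beta^{*}$ involved in the first-order derivatives in (50) is known from the previous iteration, and then compute an approximate second-order derivative as
$$\begin{aligned}
\frac{\partial h_{1}(\beta^{*})}{\partial \beta^{*\top}} &= -\sum_{c=1}^{C} \frac{y_{1c}}{\pi^{2}_{(1)c}(\beta^{*})} \frac{\partial \pi_{(1)c}(\beta^{*})}{\partial \beta^{*}} \frac{\partial \pi_{(1)c}(\beta^{*})}{\partial \beta^{*\top}}, \quad \text{and} \\
\frac{\partial h_{T,2}(\beta^{*})}{\partial \beta^{*\top}} &= -\sum_{t=2}^{T} \sum_{g=1}^{C} \sum_{c=1}^{C} \frac{y_{tc}}{[\tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))]^{2}} \frac{\partial \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*}} \frac{\partial \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*\top}},
\end{aligned}$$
yielding the expectation as
$$E_{y} \frac{\partial (h_{1} + h_{T,2})}{\partial \beta^{*\top}} = -\Bigg[\sum_{c=1}^{C} \frac{1}{\pi_{(1)c}(\beta^{*})} \frac{\partial \pi_{(1)c}(\beta^{*})}{\partial \beta^{*}} \frac{\partial \pi_{(1)c}(\beta^{*})}{\partial \beta^{*\top}} + \sum_{t=2}^{T} \sum_{g=1}^{C} \sum_{c=1}^{C} \frac{1}{\tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))} \frac{\partial \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*}} \frac{\partial \tilde{\eta}^{(c|g)}_{t|t-1}(\beta^{*},\hat{\gamma}^{*}_{CGQL}(\beta^{*}))}{\partial \beta^{*\top}}\Bigg].$$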
 □
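As a hedged consistency check on the covariance in (59), the $t = 1$ part of $P_{T}(\beta^{*})$ from Lemma 6 can be compared with the $t = 1$ part of the approximate expected derivative. This is a sketch that takes the approximate second-order derivative at face value; the shorthand $D_{c} = \partial \pi_{(1)c}(\beta^{*}) / \partial \beta^{*}$ is introduced here for brevity and is not the paper's notation. Because $\sum_{c=1}^{C} D_{c} = 0$, as used in (52),
$$\sum_{c=1}^{C} \Big\{\frac{1 - \pi_{(1)c}}{\pi_{(1)c}} D_{c} D_{c}^{\top} - \sum_{u \neq c}^{C} D_{c} D_{u}^{\top}\Big\} = \sum_{c=1}^{C} \frac{1}{\pi_{(1)c}} D_{c} D_{c}^{\top} - \Big(\sum_{c=1}^{C} D_{c}\Big)\Big(\sum_{u=1}^{C} D_{u}\Big)^{\top} = \sum_{c=1}^{C} \frac{1}{\pi_{(1)c}} D_{c} D_{c}^{\top}.$$
The same argument would apply to each $(t, g)$ term with $\tilde{\eta}^{(c|g)}_{t|t-1}$ in place of $\pi_{(1)c}$, using the zero-sum property behind (53). Under these approximations, $P_{T}(\beta^{*})$ would coincide, up to sign, with $E_{y}\, \partial (h_{1} + h_{T,2}) / \partial \beta^{*\top}$, so the sandwich form of the covariance in (59) would reduce to $P^{-1}_{T}(\beta^{*})$.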

4. Concluding Remarks

There has been a considerable number of studies on the inferences for multinomial distributions in a panel data setup, where a small number of repeated multinomial responses are collected from a large number of independent individuals. The dimension reduction of parameter space is also considered. We, however, did not include such panel data methods for discussion in this paper as the inferences are quite different under the panel data and time series setups, because in a time series setup a large number of repeated multinomial responses are taken from one unit or individual only. Returning to the time series setup, we have discussed in the paper following [18] (see also [16,17]) that there may be negative inference effects when joint estimation is performed for a large multinomial parameter space involving regression and dynamic dependence parameters. As a possible remedy, we have offered a parameter split or dimension-reduction estimation approach. The asymptotic properties, such as the consistency of the estimators of the dynamic dependence function (for any unknown regression parameters) and then for the main regression parameters, are studied in detail.

Author Contributions

Conceptualization, B.C.S. and R.P.R.; Methodology, B.C.S. and R.P.R.; Writing—original draft, B.C.S. and R.P.R.; Writing—review and editing, B.C.S. and R.P.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

The authors thank two reviewers for their comments that helped to improve the paper.

Conflicts of Interest

There are no conflicts of interest.

References

  1. Jacobs, P.A.; Lewis, P.A.W. Discrete Time Series Generated by Mixtures I: Correlational and Runs Properties. J. R. Stat. Soc. 1978, 40, 94–105.
  2. Jacobs, P.A.; Lewis, P.A.W. Discrete Time Series Generated by Mixtures II: Asymptotic Properties. J. R. Stat. Soc. 1978, 40, 222–228.
  3. Jacobs, P.A.; Lewis, P.A.W. Stationary Discrete Autoregressive-moving Average Generated by Mixtures. J. Time Ser. Anal. 1983, 4, 19–36.
  4. Keenan, D.M. A Time Series Analysis of Binary Data. J. Am. Stat. Assoc. 1982, 77, 816–821.
  5. Manski, C.F. Semiparametric analysis of discrete response: Asymptotic properties of the maximum score estimator. J. Econom. 1985, 27, 313–333.
  6. Amemiya, T. Advanced Econometrics; Harvard University Press: Cambridge, MA, USA, 1985.
  7. Tong, H. Nonlinear Time Series: A Dynamical System Approach; Oxford Statistical Science Series, 6; Oxford University Press: New York, NY, USA, 1990.
  8. Horowitz, J. A smoothed maximum score estimator for the binary response model. Econometrica 1992, 60, 505–531.
  9. Park, J.Y.; Phillips, P.C.B. Non-stationary binary choice. Econometrica 2000, 68, 1249–1280.
  10. Moon, H.R. Maximum score estimation of a nonstationary binary choice model. J. Econom. 2004, 120, 385–403.
  11. Jiang, W.; Tanner, M.A. Risk minimization for the series binary choice with variable selection. Econom. Theory 2010, 26, 1437–1452.
  12. De Jong, R.M.; Woutersen, T. Dynamic time series binary choice. Econom. Theory 2011, 27, 673–702.
  13. Fahrmeir, L.; Kaufmann, H. Regression models for non-stationary categorical time series. J. Time Ser. Anal. 1987, 8, 147–160.
  14. Kaufmann, H. Regression models for nonstationary categorical time series: Asymptotic estimation theory. Ann. Stat. 1987, 15, 79–98.
  15. Fokianos, K.; Kedem, B. Prediction and classification of non-stationary categorical time series. J. Multivar. Anal. 1998, 67, 277–296.
  16. Fokianos, K.; Kedem, B. Regression theory for categorical time series. Stat. Sci. 2003, 18, 357–376.
  17. Fokianos, K.; Kedem, B. Partial likelihood inference for time series following generalized linear models. J. Time Ser. Anal. 2004, 25, 173–197.
  18. Loredo-Osti, J.C.; Sutradhar, B.C. Estimation of regression and dynamic dependence parameters for non-stationary multinomial time series. J. Time Ser. Anal. 2012, 33, 458–467.
  19. Johnson, N.L.; Kotz, S. Continuous Univariate Distributions-2; John Wiley and Sons: Hoboken, NJ, USA, 1970.
  20. Sutradhar, B.C.; Rao, R.P. Regression models for ordinal categorical time series data. In Advances in Time Series Methods and Applications; Li, W.K., Stanford, D.A., Yu, H., Eds.; Fields Institute Communications; Springer: New York, NY, USA, 2016; Volume 78, pp. 179–194.
  21. Wedderburn, R.W.M. Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika 1974, 61, 439–447.
  22. Mallick, T.S.; Sutradhar, B.C. GQL versus conditional GQL inferences for non-stationary time series of counts with overdispersion. J. Time Ser. Anal. 2008, 29, 402–420.
  23. McDonald, D.R. The local limit theorem: A historical perspective. J. Iran. Stat. Soc. 2005, 4, 73–86.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
