1. Introduction
Binary time series analysis is an important research topic in economics and statistics, among other areas. For example, the economic status (profit/loss) of a pharmaceutical industry may be recorded over the years along with certain exogenous explanatory covariates, such as type of industry, yearly advertising cost, and other research and development expenditures. It is likely that the binary profit status of an industry in a given year is correlated with the status of profits from previous years. It is of interest to know both (i) the effects of the time-dependent covariates, and (ii) the dynamic relationship among the responses over the years. Because obtaining the binary status may depend on certain latent variables, the binary time series data have been modeled and analyzed in a variety of ways over the last four decades, primarily in the fields of statistics and econometrics. Readers may refer to existing studies ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]) on various binary time series models. More specifically, most of the models considered in these studies have either binary probit or logit forms. By the same token, the multinomial time series analysis is also an important research topic. In this case, as an extension to the binary response variable, one deals with a categorical response variable with more than two categories. For example, it may be more appropriate to classify the profit status of the industries into several categories, such as heavy loss, moderate loss, no loss, moderate profit, or healthy profit, and then examine the effects of exogenous covariates on such categorical responses collected over a long period of time. The correlations among categories over time are also of interest. This type of multinomial time series data has been analyzed mainly in statistics literature by a few authors such as refs. [13, 14, 15, 16, 17, 18]. As far as the dynamic relationship is concerned, studies by refs. [16, 18], for example, have considered a multinomial dynamic logit (MDL) model as a generalization of the binary logit time series model.
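As far as the BDL (binary dynamic logits) model is concerned, it is constructed as follows. Let $\{y_t^*,\ t = 1, \ldots, T\}$ be a sequence of latent variables, and suppose that a binary response $y_t$ is observed at time $t$ using the relationship
$$y_t = \begin{cases} 1 & \text{if } y_t^* > 0 \\ 0 & \text{otherwise.} \end{cases} \qquad (1)$$
Also, let $x_t$ denote the $p$-dimensional exogenous explanatory covariate vector and $\beta$ denote the effect of $x_t$ on the binary response $y_t$. Next, suppose that the latent variable $y_1^*$ in (1) follows a logistic distribution ([19]) with mean $x_1'\beta$ and variance $\pi^2/3$, whereas for $t = 2, \ldots, T$, $y_t^*$ follows the same logistic distribution but with mean $x_t'\beta + \gamma y_{t-1}$ and variance $\pi^2/3$, $\gamma$ being a lag 1 dynamic dependence parameter. It then follows from (1) that
$$P(y_1 = 1) = \frac{\exp(x_1'\beta)}{1 + \exp(x_1'\beta)}, \qquad P(y_t = 1 \mid y_{t-1}) = \frac{\exp(x_t'\beta + \gamma y_{t-1})}{1 + \exp(x_t'\beta + \gamma y_{t-1})}, \quad t = 2, \ldots, T, \qquad (2)$$
(see also [6], p. 422). To understand the recursive mean $E(Y_t)$, variance $\mathrm{var}(Y_t)$, and lag $\ell$ correlations between $y_t$ and $y_{t+\ell}$ produced by this BDL model (2), it is of interest to estimate $\beta$ and $\gamma$. Note that there is no range restriction for these parameters.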
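To make the BDL construction concrete, the following minimal Python sketch evaluates the conditional success probability, simulates a series, and propagates the unconditional mean through the law of total expectation. The function names (`bdl_prob`, `simulate_bdl`, `bdl_means`) and the convention that the first time point carries no lagged term are assumptions made for illustration; they are not taken from the cited studies.

```python
import math
import random

def bdl_prob(x_t, beta, gamma, y_prev=None):
    """P(y_t = 1 | y_{t-1} = y_prev) with logit x_t'beta + gamma * y_prev.

    y_prev=None marks the first time point, which has no lagged response
    (an assumed convention for this sketch).
    """
    eta = sum(x * b for x, b in zip(x_t, beta))
    if y_prev is not None:
        eta += gamma * y_prev
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_bdl(X, beta, gamma, seed=0):
    """Draw one binary series; X is a list of covariate vectors, one per time point."""
    rng = random.Random(seed)
    y, prev = [], None
    for x_t in X:
        prev = 1 if rng.random() < bdl_prob(x_t, beta, gamma, prev) else 0
        y.append(prev)
    return y

def bdl_means(X, beta, gamma):
    """Unconditional means E(Y_t) via mu_t = p_t(0) + mu_{t-1} * (p_t(1) - p_t(0))."""
    means = []
    for t, x_t in enumerate(X):
        if t == 0:
            means.append(bdl_prob(x_t, beta, gamma))
        else:
            p0 = bdl_prob(x_t, beta, gamma, 0)  # previous response was 0
            p1 = bdl_prob(x_t, beta, gamma, 1)  # previous response was 1
            means.append(p0 + means[-1] * (p1 - p0))
    return means
```

Because neither parameter is range-restricted, any real values can be passed for `beta` and `gamma`; a positive `gamma` raises the success probability after a success and a negative one lowers it.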
The above-mentioned MDL model is a generalization of the BDL model given by (2). For a discussion on this generalization, see, for example, the studies by refs. [16, 17, 18, 20]. More specifically, the MDL (multinomial dynamic logits) model is constructed as follows: Let
denote the
-dimensional multinomial response variable and for
indicates that the multinomial response recorded at time t belongs to the
cth category. For
one writes
Here, and also in (3), for a scalar constant
, we use
for simplicity to represent
with ⊗ being the well-known Kronecker or direct product. This notation will also be used throughout the rest of the paper when needed. At any time point
let
denote the effect of
on
for
Also, let
denote the marginal multinomial probability at time
for the observation
to be in the
cth category; and for
let the transitional probability from the
gth
category at time
to the
cth category at time
be denoted by
As an extension of the BDL model (2), one may then write the MDL model as
where
denotes the dynamic dependence parameters.
Note that for further notational convenience, one may re-express the transitional probabilities in (4) as
where for
through (1) has the formula
Note that in (5), the category
g occurred at time
Thus the category
g depends on time
and
However, for simplicity, we use
g for
For convenience, we use
to represent all regression effects involved in the marginal and conditional probabilities given in (4) and (5). Similarly, we use
to represent all dynamic dependence parameters involved in the transitional probabilities (4) or (5).
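As a rough illustration of how marginal and transitional probabilities of a multinomial dynamic logit type can be evaluated, the sketch below assumes J categories with the last one used as the reference category, one regression vector per non-reference category, and one dynamic dependence vector of length J-1 per non-reference category. The function name `mdl_probs`, the argument names, and this particular parametrization are assumptions made for the example and may differ from the notation of (4) and (5).

```python
import math

def mdl_probs(x_t, betas, gammas, g=None):
    """Category probabilities at time t under an assumed MDL parametrization.

    betas  : list of J-1 regression vectors (one per non-reference category)
    gammas : list of J-1 dynamic dependence vectors, each of length J-1
    g      : index (0..J-1) of the category occupied at time t-1, where
             J-1 is the reference category; None gives the marginal
             probabilities at the first time point.
    Returns a list of J probabilities, reference category last.
    """
    n_free = len(betas)
    scores = []
    for c in range(n_free):
        eta = sum(x * b for x, b in zip(x_t, betas[c]))
        # The lagged indicator vector has a single 1 in position g, so the
        # inner product with gammas[c] just picks out element g. For the
        # reference category the indicator vector is all zeros.
        if g is not None and g < n_free:
            eta += gammas[c][g]
        scores.append(math.exp(eta))
    denom = 1.0 + sum(scores)
    return [s / denom for s in scores] + [1.0 / denom]
```

Calling `mdl_probs` once for every possible previous category `g` yields the rows of a transition matrix at time t.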
We remark that, as we explain below and more specifically in Section 2, the estimation of the aforementioned regression and dynamic dependence parameters through the existing joint likelihood approach (for both parameters) may be negatively affected when the dimension of these parameter vectors is large. As a remedy, in this paper we provide a dimension-split approach in which a new likelihood function is constructed only for the main regression parameters. This is done by replacing the large dimensional dynamic dependence parameters in the joint likelihood function with their conditional estimates, which are obtained first using a conditional generalized quasi-likelihood (CGQL) approach, conditional on the unknown main regression parameters.
Turning back to the importance of estimation, notice that the unconditional means, variances, and pair-wise correlations of the responses computed by exploiting the model (4) will involve $\beta$ and $\gamma$. Thus, to understand these basic properties of the multinomial time series, it is of importance to obtain consistent estimates for $\beta$ and $\gamma$, at least asymptotically, that is, when $T \to \infty$. As far as the formulas are concerned, the means as functions of $\beta$ and $\gamma$ have the recursive relationship given by
where the expectation at the initial time
has the formula
where
is given by (4) for all
In (6),
is the
-dimensional vector of conditional probabilities, given by
and
is the
matrix of conditional probabilities given by
where
is the
-th element of the matrix for
Furthermore, similar to (6), the variances and covariances also have the recursive relationships and they are given by
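The recursive structure of the means can be illustrated numerically: by the law of total probability, the marginal category probabilities at time t are the previous marginals propagated through the transition probabilities, and the covariance matrix of a multinomial indicator vector with probability vector pi is diag(pi) - pi pi'. The sketch below works with full J-dimensional probability vectors rather than the reduced-dimension vectors of the text, and the names `propagate_marginals` and `indicator_cov` are hypothetical.

```python
def propagate_marginals(pi1, transitions):
    """Marginal category probabilities over time.

    pi1         : list of J marginal probabilities at the first time point
    transitions : list of J x J row-stochastic matrices; entry [g][c] of the
                  k-th matrix is P(category c at step k+2 | category g at step k+1)
    Returns a list of probability vectors, one per time point.
    """
    out = [list(pi1)]
    for P in transitions:
        prev = out[-1]
        J = len(prev)
        # Law of total probability over the previous category g.
        out.append([sum(prev[g] * P[g][c] for g in range(J)) for c in range(J)])
    return out

def indicator_cov(pi):
    """Covariance matrix diag(pi) - pi pi' of a multinomial indicator vector."""
    J = len(pi)
    return [[(pi[a] if a == b else 0.0) - pi[a] * pi[b] for b in range(J)]
            for a in range(J)]
```

For example, starting from marginals (0.5, 0.5) and a transition matrix with rows (0.9, 0.1) and (0.2, 0.8), the next marginals are (0.55, 0.45).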
As far as the inference is concerned, some authors such as [18] used a partial likelihood approach for the estimation of the regression and the dynamic dependence parameters $\beta$ and $\gamma$ under the assumption that the observable covariates $x_t$ are random, and perhaps dependent on lagged values of the response variable. This approach is equivalent to the so-called conditional likelihood approach where the likelihood is obtained by conditioning on the history of both covariates and responses. Consequently, one may easily obtain the information matrix conditional on the whole process history ([16], Equation (17), p. 364). Notice, however, that the information matrix conditional on the entire history is not the same as the information matrix conditional on the covariate process; but, in many cases, it is the same as the observed information matrix, for example, when the covariate distribution does not involve lagged values of the response variable. Among others, the study by ref. [18] (Equations (17)–(19)) used a joint likelihood approach (see also [20]) to estimate the parameters $\beta$ and $\gamma$, where the likelihood estimating equation was solved by Newton's iterative procedure based on the Fisher information matrix. These authors found through an intensive simulation study that with a minimal dimension of the parameter space, both approaches work quite well even with relatively short series; however, when higher dimensions of the parameter space are considered, although both methods require longer series to achieve acceptable levels of accuracy, the approach based on the observed information matrix to compute the standard errors of the parameter estimators performs relatively worse than the approach based on the Fisher information matrix. However, obtaining the Fisher information matrix can be computationally involved.
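For intuition about the observed-information route, the following sketch runs a joint Newton iteration for a binary dynamic logit model with a single scalar covariate. The observed information is the negative Hessian of the log-likelihood, which for a logit model equals the sum of p(1-p) z z' with z = (x_t, y_{t-1}) evaluated at the observed lagged responses. This is a binary, two-parameter analogue written for illustration only; the function name `fit_bdl_newton` and the choice of zero starting values are assumptions, and the code is not the procedure of any cited study.

```python
import math

def fit_bdl_newton(x, y, iters=25, tol=1e-10):
    """Joint Newton-Raphson for (beta, gamma) using the observed information.

    x : list of scalar covariates, y : list of 0/1 responses (same length).
    The first time point is given a lagged response of 0 (assumed convention).
    """
    beta, gamma = 0.0, 0.0
    for _ in range(iters):
        s_b = s_g = 0.0                # score components
        i_bb = i_bg = i_gg = 0.0       # observed information entries
        for t in range(len(y)):
            lag = y[t - 1] if t > 0 else 0
            p = 1.0 / (1.0 + math.exp(-(beta * x[t] + gamma * lag)))
            r, w = y[t] - p, p * (1.0 - p)
            s_b += r * x[t]
            s_g += r * lag
            i_bb += w * x[t] * x[t]
            i_bg += w * x[t] * lag
            i_gg += w * lag * lag
        det = i_bb * i_gg - i_bg * i_bg
        if det <= 0.0:
            raise ValueError("observed information is not positive definite")
        # Newton step: inverse of the 2 x 2 information matrix times the score.
        d_b = (i_gg * s_b - i_bg * s_g) / det
        d_g = (i_bb * s_g - i_bg * s_b) / det
        beta, gamma = beta + d_b, gamma + d_g
        if max(abs(d_b), abs(d_g)) < tol:
            break
    return beta, gamma
```

With only two parameters the linear solve is trivial; the concern raised in the text is what happens to iterations of this kind as the number of parameters grows relative to the series length.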
Note that because both the observed and Fisher information matrix-based estimating equations produce similarly efficient estimates for the parameters when the dimension of the parameter space is minimal, and also because the computation of the Fisher information can be complex, the main objective of this paper is to use an observed information matrix-based estimation approach with minimal dimensions for the parameter space. More specifically, we first develop a consistent estimating function for the dynamic dependence parameter $\gamma$ as a function of the unknown regression effects $\beta$. We denote this function as $\hat{\gamma}(\beta)$. We then estimate the regression parameters $\beta$ by exploiting the likelihood function $L(\beta, \hat{\gamma}(\beta))$, say, instead of the joint likelihood function $L(\beta, \gamma)$. Thus, in this approach, the estimation of $\beta$ will not depend on the dimension of the dynamic dependence parameters $\gamma$. This reduced dimension-based estimation approach using the observed information matrix is further elucidated in Section 2. The asymptotic theory for this dimension-reduction approach is discussed in Section 3. Finally, concluding remarks are given in Section 4.
2. Estimation of Parameters: A Dimension-Reduction Approach
Because
and
the dimension of the parameter space depends on both
p and
Customarily, the joint estimation of
and
that is,
, is performed by maximizing the likelihood function with regard to
where the likelihood function under the model (4) has the form
where
and
are marginal (at
) and transitional probabilities, respectively. The solution of the log likelihood estimating equation, namely
, may be obtained by solving the Hessian or Fisher information matrix-based iterative equations. More specifically, the Hessian matrix-based iterative equation has the form
(see [
16]), and the Fisher information matrix-based equation has the form
(see [
18]). Under the assumption that the Hessian matrix is positive semi-definite, some authors, such as [
14] (Equations (4.1) and (4.4)), studied the asymptotic properties (as
) of the likelihood estimator obtained from (12). When the dimension of the parameter (
) space is large, however, the authors of [
18] found that this type of Hessian matrix-based likelihood estimate of
performs relatively worse for moderately large
as compared to the Fisher information matrix-based estimate (13). However, obtaining the Fisher information matrix is algebraically involved. When the dimension of the parameter space was small, both Hessian and Fisher information matrix-based estimates were found to work almost the same. For this reason, in this paper, we consider a dimension-splitting or reduction approach for the parameter space, so that the Hessian matrix-based likelihood estimates can be used even when $T$ is not infinitely large. More specifically, we develop a conditional generalized quasi-likelihood (CGQL) estimating function for $\gamma$ as a function of unknown $\beta$. This estimating function may be denoted by $\hat{\gamma}(\beta)$. We then use this estimating function and construct a modified likelihood function for $\beta$ as $L(\beta, \hat{\gamma}(\beta))$, so that the estimation of $\beta$ does not depend on the dimension of $\gamma$. Also, this approach will provide a different asymptotic theory than the one in [14] for the estimator of the main regression parameter $\beta$. The CGQL-cum-modified maximum likelihood estimation (MMLE) approach is provided in Section 2.1 and Section 2.2 below, and the asymptotic theory for the estimates is given in Section 3.
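The two-stage idea described above can be sketched in the simplest binary setting with one scalar covariate. For a fixed regression value, the dynamic dependence parameter is obtained from the quasi-likelihood estimating equation sum over t of y_{t-1}(y_t - p_t) = 0, and the regression parameter is then chosen to maximize the likelihood with that conditional estimate plugged in. This is only an analogue under stated assumptions: bisection and a grid search are swapped in for the iterative equations of the paper, the multinomial structure is reduced to a binary one, and all function names are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def gamma_hat(beta, x, y, lo=-20.0, hi=20.0, steps=60):
    """Solve sum_{t>=2} y_{t-1} (y_t - p_t(beta, gamma)) = 0 for gamma, beta fixed.

    The left-hand side is decreasing in gamma, so bisection on [lo, hi] is used.
    """
    def g(gamma):
        return sum(y[t] - sigmoid(beta * x[t] + gamma)
                   for t in range(1, len(y)) if y[t - 1] == 1)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def modified_loglik(beta, x, y):
    """Log-likelihood in beta alone, with gamma replaced by gamma_hat(beta)."""
    gam = gamma_hat(beta, x, y)
    ll = 0.0
    for t in range(len(y)):
        lag = y[t - 1] if t > 0 else 0
        eta = beta * x[t] + gam * lag
        ll += log_sigmoid(eta) if y[t] == 1 else log_sigmoid(-eta)
    return ll

def fit_beta_grid(x, y, grid):
    """Pick the grid value of beta that maximizes the modified log-likelihood."""
    return max(grid, key=lambda b: modified_loglik(b, x, y))
```

A call such as `fit_beta_grid(x, y, [-2 + 0.01 * k for k in range(401)])` searches over the regression parameter only, which is the sense in which the dimension of the dynamic dependence parameter no longer enters the outer optimization.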
2.1. CGQL Estimating Function for Dynamic Dependence Parameters $\gamma$ as a Function of Unknown $\beta$
Notice from (4) that conditional on
the response vector
follows a multinomial distribution with
-dimensional mean vector
and
covariance matrix
where
with
for
The diagonal matrix in (15) has the form
In notation, for
we write this multinomial distribution of
as
Now, to develop a CGQL estimating function
for
as a generalization of the QL approach of [
21], we follow the GQL approach in [
22] and exploit the conditional mean vector
and the conditional covariance matrix
For this purpose, suppose that the following two assumptions hold.
Assumption 1. Consider the conditional probability function defined in (15) for a categorical observation to be in the c-th category at time t which weights with all possible that could occur at time We assume that this weighted probability function is continuous, that is, exists for all
Assumption 2. The second-order derivative matrix is bounded and positive definite.
Proposition 1. When the above two assumptions hold, the CGQL estimator for $\gamma$ may be obtained by using the iterative equation
Proof of Proposition 1. This proposition follows from the fact that under the model (4), one may write the GQL estimating equation for
as
□
Lemma 1. In Assumption 2, the derivative matrix has the computational formula
where denotes the matrix constructed by using the -dimensional column vectors for all
Proof. Because
for
by (5), it then follows that
Next, because
one obtains
with
The lemma, i.e., Equation (19), then follows from (21). □
Note that for the asymptotic studies to be discussed in Section 3, it is convenient to use (19) in (17) and re-express the iterative Equation (17) for $\gamma$ as
Let $\hat{\gamma}(\beta)$ denote the moment estimating function of $\gamma$ obtained via (22).
2.2. Modified Maximum Likelihood (MML) Estimation for $\beta$ Using Observed Information
Notice that because $\gamma$ can be estimated by $\hat{\gamma}(\beta)$ using the estimating function given in (22), one is not concerned about the dimension of $\gamma$ for the estimation of $\beta$. Thus, in a reduced dimension setup, we may estimate $\beta$ by exploiting the modified likelihood function for $\beta$, which is obtained by replacing $\gamma$ with $\hat{\gamma}(\beta)$ in the joint likelihood function $L(\beta, \gamma)$ for $\beta$ and $\gamma$. More specifically, the modified likelihood function for $\beta$, by (4), may be written as
where
denotes the partially estimated dynamic probability function obtained from the true dynamic probability function
defined in (4), by replacing
with
Thus,
We remark that, as $\hat{\gamma}(\beta)$ from (22) has an implicit functional form, the construction of the likelihood estimating equation for $\beta$ encounters a computational problem because of the difficulty in obtaining the derivative of $\hat{\gamma}(\beta)$ with respect to $\beta$ from an implicit function. However, the likelihood estimating equation for $\beta$ involving $\hat{\gamma}(\beta)$ may be computed as follows:
Likelihood estimating equation for $\beta$: We follow (23) and write this estimating equation for $\beta$ as
where
and
are given by (4) and (23), respectively. Their derivatives with respect to
needed for (24) are given in the following two Lemmas.
Lemma 2. Computation of This derivative has the formula
where
Proof. The proof is obvious because
with
by (4). □
Lemma 3. Computation of The computation of this derivative matrix requires the formula for the derivative matrix for all which can be derived from the formula for the derivative matrix as follows:
Proof. Because the CGQL estimating function for
, that is,
is obtained from (22) at its final iteration stage, the estimating function has the form
The lemma now follows, first because the
involved in the first derivative as well as in the inverse covariance matrix
is treated to be known from the previous, i.e., from the second-last iteration, and next because by a similar operation as (19), the derivative of the second term in (28) follows from the formula
□
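The difficulty with differentiating an implicitly defined estimating function can be illustrated in the binary setting with one scalar covariate. There, the conditional estimate of the dynamic dependence parameter solves g(beta, gamma) = sum over t of y_{t-1}(y_t - p_t) = 0, and the implicit function theorem gives its derivative with respect to beta as minus the ratio of the partial derivatives of g. The sketch below is not the formula of Lemma 3; it is a hedged binary analogue with hypothetical function names, using bisection to solve the estimating equation.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gamma_hat(beta, x, y, lo=-20.0, hi=20.0, steps=60):
    """Root in gamma of sum_{t>=2} y_{t-1} (y_t - p_t(beta, gamma)), by bisection."""
    def g(gamma):
        return sum(y[t] - sigmoid(beta * x[t] + gamma)
                   for t in range(1, len(y)) if y[t - 1] == 1)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dgamma_dbeta(beta, x, y):
    """Derivative of gamma_hat(beta) from the implicit function theorem.

    With w_t = p_t (1 - p_t) over time points whose lagged response is 1:
    dg/dbeta = -sum w_t x_t and dg/dgamma = -sum w_t, so that
    d gamma_hat / d beta = -(sum w_t x_t) / (sum w_t).
    """
    gam = gamma_hat(beta, x, y)
    num = den = 0.0
    for t in range(1, len(y)):
        if y[t - 1] == 1:
            p = sigmoid(beta * x[t] + gam)
            w = p * (1.0 - p)
            num += w * x[t]
            den += w
    return -num / den
```

A central finite difference of `gamma_hat` provides an independent check of the analytic derivative.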
Lemma 4. Computation of (continued). This derivative has the formula given by
where, for example, is the -dimensional cth component matrix in (27) for all and without any loss of generality.
Proof. Re-express the formula for
in (24) as
where
and
It then follows that
The formula in (30) follows from (32) because
□
Simplified likelihood estimating equation for $\beta$: Notice that by using the derivative formulas from (26) and (32), one may reduce the likelihood estimating Equation (25) to
which is easily computable as the formulas for
for all
involved in this reduced form are available from (27). For
one uses
and
Lemma 5. The likelihood Equation (33) for $\beta$ may be obtained by using the iterative equation
where, under the assumption that involved in the derivative in (30) or (33) for is known from the previous iteration, the second-order derivative in (34), by (33), has the formula
where and by Lemma 2; and has the same formula as in (30).