## 1. Introduction

In many applications, the available data are not of a quantitative nature (e.g., real numbers or counts) but consist of observations from a given finite set of categories. In the present article, we are concerned with data about political goals in Germany, fear states in the stock market, and phrases in a bird's song. For stochastic modeling, we use a categorical random variable $X$, i.e., a qualitative random variable taking one of a finite number of categories, say $m+1$ categories with some $m \in \mathbb{N}$. If these categories are unordered, $X$ is said to be a nominal random variable, whereas an ordinal random variable requires a natural order of the categories (Agresti 2002). To simplify notation, we always assume the possible outcomes to be arranged in a certain order (either lexicographical or natural), i.e., we denote the range (state space) as $\mathcal{S} = \{s_0, s_1, \dots, s_m\}$. The stochastic properties of $X$ can be determined from the vector of marginal probabilities $\mathit{p} = (p_0, \dots, p_m)^\top \in [0;1]^{m+1}$, where $p_i = P(X = s_i)$ (probability mass function, PMF). We abbreviate $s_k(\mathit{p}) := \sum_{j=0}^{m} p_j^k$ for $k \in \mathbb{N}$, where $s_1(\mathit{p}) = 1$ has to hold. The subscripts "$0, 1, \dots, m$" are used for $\mathcal{S}$ and $\mathit{p}$ to emphasize that only $m$ of the probabilities can be chosen freely because of the constraint $p_0 = 1 - p_1 - \dots - p_m$.

Well-established dispersion measures for quantitative data, such as the variance or the interquartile range, cannot be applied to qualitative data. For a categorical random variable $X$, one commonly defines dispersion with respect to the uncertainty in predicting the outcome of $X$ (Kvålseth 2011b; Rao 1982; Weiß and Göb 2008). This uncertainty is maximal for the uniform distribution $\mathit{p}_{\mathrm{uni}} = (\frac{1}{m+1}, \dots, \frac{1}{m+1})^\top$ on $\mathcal{S}$ (a reasonable prediction is impossible if all states are equally probable, thus maximal dispersion), whereas it is minimal for a one-point distribution $\mathit{p}_{\mathrm{one}}$ (i.e., all probability mass concentrates on one category, so a perfect prediction is possible). Obviously, categorical dispersion is just the opposite concept to the concentration of a categorical distribution. To measure the dispersion of the categorical random variable $X$, the most common approach is to use either the (normalized) Gini index (also index of qualitative variation, IQV) (Kvålseth 1995; Rao 1982), defined as
$$
\nu_{\mathrm{G}} \;=\; \frac{m+1}{m}\,\big(1 - s_2(\mathit{p})\big), \tag{1}
$$
or the (normalized) entropy (Blyth 1959; Shannon 1948), given by
$$
\nu_{\mathrm{En}} \;=\; \frac{-1}{\ln(m+1)}\,\sum_{j=0}^{m} p_j \ln p_j. \tag{2}
$$
Both measures are minimized by a one-point distribution $\mathit{p}_{\mathrm{one}}$ and maximized by the uniform distribution $\mathit{p}_{\mathrm{uni}}$ on $\mathcal{S}$. While nominal dispersion is always expressed with respect to these extreme cases, it has to be mentioned that there is an alternative scenario of maximal ordinal variation, namely the extreme two-point distribution; however, this is not considered further here.
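For illustration, the measures in Equations (1) and (2) are easily computed from a given probability vector. The following Python sketch (an illustration of ours, not part of the formal development) uses the usual convention $0 \ln 0 := 0$:

```python
import numpy as np

def gini_index(p):
    """Normalized Gini index (IQV): (m+1)/m * (1 - s_2(p))."""
    p = np.asarray(p, dtype=float)
    m = p.size - 1
    return (m + 1) / m * (1.0 - np.sum(p ** 2))

def entropy_index(p):
    """Normalized entropy: -(sum_j p_j ln p_j) / ln(m+1), with 0 ln 0 := 0."""
    p = np.asarray(p, dtype=float)
    pos = p[p > 0]                      # drop zero probabilities (0 ln 0 := 0)
    return -np.sum(pos * np.log(pos)) / np.log(p.size)
```

Both functions return 1 for the uniform distribution and 0 for a one-point distribution, in line with the normalization.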

If considering a (stationary) categorical process $(X_t)_{\mathbb{Z}}$ instead of a single random variable, then not only marginal properties are relevant but also information about the serial dependence structure (Weiß 2018). The (signed) autocorrelation function (ACF), as commonly applied in the case of real-valued processes, cannot be used for categorical data. However, one may use a type of Cohen's $\kappa$ instead (Cohen 1960). A $\kappa$-measure of signed serial dependence in categorical time series is given by (see Weiß 2011, 2013; Weiß and Göb 2008):
$$
\kappa(h) \;=\; \frac{\sum_{j=0}^{m} \big(p_{jj}(h) - p_j^2\big)}{1 - s_2(\mathit{p})}. \tag{3}
$$
Equation (3) is based on the lagged bivariate probabilities $p_{ij}(h) = P(X_t = s_i,\, X_{t-h} = s_j)$ for $i,j = 0, \dots, m$. We have $\kappa(h) = 0$ in the case of serial independence at lag $h$, and the strongest degree of positive (negative) dependence is indicated if all $p_{ii}(h) = p_i$ ($p_{ii}(h) = 0$), i.e., if the event $X_{t-h} = s_i$ is necessarily followed by $X_t = s_i$ ($X_t \ne s_i$).
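The quantities entering Equation (3) are estimated from a time series $x_1, \dots, x_n$ (categories coded as $0, \dots, m$) by relative frequencies; a minimal Python sketch with illustrative helper names of ours:

```python
import numpy as np

def lagged_biv_freqs(x, h, m):
    """hat p_ij(h): relative frequency of (x_t = i, x_{t-h} = j) among the n-h lagged pairs."""
    x = np.asarray(x)
    p = np.zeros((m + 1, m + 1))
    for t in range(h, len(x)):
        p[x[t], x[t - h]] += 1.0
    return p / (len(x) - h)

def cohens_kappa_hat(x, h, m):
    """Sample Cohen's kappa(h): sum_j (hat p_jj(h) - hat p_j^2) / (1 - hat s_2)."""
    x = np.asarray(x)
    p_hat = np.bincount(x, minlength=m + 1) / len(x)
    s2 = np.sum(p_hat ** 2)
    return (np.trace(lagged_biv_freqs(x, h, m)) - s2) / (1.0 - s2)
```

For a perfectly periodic series such as $0, 1, 0, 1, \dots$, one obtains $\widehat{\kappa}(2) = 1$, the strongest degree of positive dependence at lag 2.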

Motivated by a mobility index discussed by Shorrocks (1978), a simplified type of $\kappa$-measure, referred to as the modified $\kappa$, was defined by Weiß (2011, 2013):
$$
\kappa^{*}(h) \;=\; \frac{1}{m}\,\Big(\sum_{j=0}^{m} \frac{p_{jj}(h)}{p_j} \;-\; 1\Big). \tag{4}
$$

Except for the fact that the lower bound of its range differs from the one in Equation (3) (note that this lower bound is free of distributional parameters), $\kappa^*(h)$ has the same properties as stated before for $\kappa(h)$. The computation of $\kappa^*(h)$ is simpler than that of $\kappa(h)$ and, in particular, its sample version $\widehat{\kappa}^*(h)$ has a simpler asymptotic normal distribution; see Section 5 for details. Unfortunately, $\kappa^*(h)$ is not defined if just one of the $p_i$ equals 0, whereas $\kappa(h)$ is well defined for any marginal distribution that is not a one-point distribution. This issue may happen quite frequently for the sample version $\widehat{\kappa}^*(h)$ if the given time series is short (a possible workaround is to replace all summands with $p_i = 0$ by 0). For this reason, $\kappa^*(h)$ and $\widehat{\kappa}^*(h)$ appear to be of limited practical use for quantifying signed serial dependence. It should be noted that a similar "zero problem" occurs with the entropy $\nu_{\mathrm{En}}$ in Equation (2), and, actually, we work out a further relation between $\nu_{\mathrm{En}}$ and $\widehat{\kappa}^*(h)$ below.

In the recent work by Lad et al. (2015), the extropy was introduced as a complementary dual to the entropy. Its normalized version is given by
$$
\nu_{\mathrm{Ex}} \;=\; \frac{-1}{m \ln\frac{m+1}{m}}\,\sum_{j=0}^{m} (1 - p_j)\,\ln(1 - p_j). \tag{5}
$$
Here, the zero problem obviously happens only if one of the $p_i$ equals 1 (i.e., in the case of a one-point distribution). Similar to the Gini index in Equation (1) and the entropy in Equation (2), the extropy takes its minimal (maximal) value 0 (1) for $\mathit{p} = \mathit{p}_{\mathrm{one}}$ ($\mathit{p} = \mathit{p}_{\mathrm{uni}}$); thus, Equation (5) also constitutes a normalized measure of nominal variation. In Section 2, we analyze its properties in comparison to the Gini index and the entropy. In particular, we focus on the respective sample versions $\widehat{\nu}_{\mathrm{Ex}}$, $\widehat{\nu}_{\mathrm{G}}$ and $\widehat{\nu}_{\mathrm{En}}$ (see Section 3). To be able to do statistical inference based on $\widehat{\nu}_{\mathrm{Ex}}$, $\widehat{\nu}_{\mathrm{G}}$ and $\widehat{\nu}_{\mathrm{En}}$, knowledge about their distribution is required. Up to now, only the asymptotic distribution of $\widehat{\nu}_{\mathrm{G}}$ and (in part) of $\widehat{\nu}_{\mathrm{En}}$ has been derived; in Section 3, comprehensive results for all considered dispersion measures are provided. These asymptotic distributions are then used as approximations to the true sample distributions of $\widehat{\nu}_{\mathrm{Ex}}$, $\widehat{\nu}_{\mathrm{G}}$ and $\widehat{\nu}_{\mathrm{En}}$, which is further investigated with simulations and a real application (see Section 4).
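A corresponding Python sketch for the normalized extropy (illustrative code of ours; the normalizing constant $m \ln\frac{m+1}{m}$ is chosen such that the uniform distribution yields the value 1, and terms with $p_j = 1$ contribute 0):

```python
import numpy as np

def extropy_index(p):
    """Normalized extropy: -(sum_j (1-p_j) ln(1-p_j)) / (m ln((m+1)/m))."""
    p = np.asarray(p, dtype=float)
    m = p.size - 1
    q = 1.0 - p[p < 1.0]               # terms with p_j = 1 contribute 0
    return -np.sum(q * np.log(q)) / (m * np.log((m + 1) / m))
```

As for the Gini index and the entropy, the value 1 is attained at the uniform distribution and 0 at a one-point distribution.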

The second part of this paper is dedicated to the analysis of serial dependence. As a novel competitor to the measures in Equations (3) and (4), a new type of modified $\kappa$ is proposed, namely
$$
\kappa^{\star}(h) \;=\; \sum_{j=0}^{m} \frac{p_{jj}(h) - p_j^2}{1 - p_j}. \tag{6}
$$
Again, this constitutes a measure of signed serial dependence, which shares the aforementioned (in)dependence properties with $\kappa(h)$ and $\kappa^*(h)$. In contrast to $\kappa^*(h)$, however, the newly proposed $\kappa^{\star}(h)$ does not have a division-by-zero problem: except for the case of a one-point distribution, $\kappa^{\star}(h)$ is well defined. Note that, in Section 3.2, it turns out that $\kappa^{\star}(h)$ is related to $\nu_{\mathrm{Ex}}$ in the same sense as $\kappa(h)$ is related to $\nu_{\mathrm{G}}$ and $\kappa^*(h)$ to $\nu_{\mathrm{En}}$. In Section 5, we analyze the sample version of $\kappa^{\star}(h)$ in comparison to those of $\kappa(h)$ and $\kappa^*(h)$, and we derive its asymptotic distribution under the null hypothesis of serial independence. This allows us to test for significant dependence in categorical time series. The performance of this $\widehat{\kappa}^{\star}$-test, in comparison to the tests based on $\widehat{\kappa}$ and $\widehat{\kappa}^{*}$, is analyzed in Section 6, where two further real applications are also presented. Finally, we conclude in Section 7.
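Sample versions of the two modified measures can be sketched analogously (illustrative code of ours): note how $\widehat{\kappa}^*(h)$ divides by each $\widehat{p}_j$, which is the source of its zero problem, whereas $\widehat{\kappa}^{\star}(h)$ divides by $1 - \widehat{p}_j$:

```python
import numpy as np

def _freqs(x, h, m):
    """Marginal relative frequencies and the diagonal hat p_jj(h)."""
    x = np.asarray(x)
    p_hat = np.bincount(x, minlength=m + 1) / len(x)
    p_biv = np.zeros((m + 1, m + 1))
    for t in range(h, len(x)):
        p_biv[x[t], x[t - h]] += 1.0
    return p_hat, np.diag(p_biv / (len(x) - h))

def kappa_star_hat(x, h, m):
    """Modified kappa, Equation (4); undefined if some hat p_j = 0."""
    p_hat, p_diag = _freqs(x, h, m)
    return (np.sum(p_diag / p_hat) - 1.0) / m

def kappa_newstar_hat(x, h, m):
    """New modified kappa, Equation (6); defined unless hat p is a one-point distribution."""
    p_hat, p_diag = _freqs(x, h, m)
    return np.sum((p_diag - p_hat ** 2) / (1.0 - p_hat))
```

For the alternating series $0, 1, 0, 1, \dots$ at lag $h = 2$, both estimators equal 1; if a category never occurs in the sample, $\widehat{\kappa}^*(h)$ involves a division by zero, while $\widehat{\kappa}^{\star}(h)$ remains finite.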

## 2. Extropy, Entropy and Gini Index

As the extropy, entropy and Gini index all serve the same task, it is interesting to know their relations and differences. An important practical issue is the "$0 \ln 0$"-problem mentioned above, which never occurs for the Gini index, occurs for the extropy only in the case of a (deterministic) one-point distribution, and occurs for the entropy as soon as just one $p_i = 0$.

Lad et al. (2015) further compared the non-normalized versions of extropy and entropy, and they showed that the former is never larger than the latter. Actually, using the inequality $\ln(1+x) > x/(1+x/2)$ for $x > 0$ from Love (1980), it follows that
$$
-\sum_{j=0}^{m} p_j \ln p_j \;>\; 1 - s_2(\mathit{p}) \tag{7}
$$
(see Appendix B.1 for further details).

Things change, however, if considering the normalized versions $\nu_{\mathrm{Ex}}$, $\nu_{\mathrm{En}}$ and $\nu_{\mathrm{G}}$. For illustration, assume an underlying Lambda distribution $L_m(\lambda)$ with $\lambda \in (0;1)$, defined by the probability vector $\mathit{p}_{m;\lambda} = (1 - \lambda + \frac{\lambda}{m+1}, \frac{\lambda}{m+1}, \dots, \frac{\lambda}{m+1})^\top$ (Kvålseth 2011a). Note that $\lambda \to 0$ leads to a one-point distribution, whereas $\lambda \to 1$ leads to the uniform distribution; actually, $L_m(\lambda)$ can be understood as a mixture of these boundary cases. For $L_m(\lambda)$, the Gini index satisfies $\nu_{\mathrm{G}} = \lambda(2-\lambda)$ for all $m \in \mathbb{N}$ (see Kvålseth (2011a)). In addition, the extropy $\nu_{\mathrm{Ex}}$ has rather stable values for varying $m$ (see Figure 1a), whereas the entropy values in Figure 1b change greatly. This complicates the interpretation of the actual level of normalized entropy.
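The closed-form value $\nu_{\mathrm{G}} = \lambda(2-\lambda)$ is easily verified numerically; a small Python check (illustrative code of ours):

```python
import numpy as np

def lambda_dist(m, lam):
    """Probability vector of L_m(lambda): (1-lam) * one-point + lam * uniform."""
    p = np.full(m + 1, lam / (m + 1))
    p[0] += 1.0 - lam
    return p

def gini_index(p):
    m = len(p) - 1
    return (m + 1) / m * (1.0 - np.sum(np.asarray(p) ** 2))

# nu_G = lam * (2 - lam), independently of m
checks = [abs(gini_index(lambda_dist(m, lam)) - lam * (2.0 - lam))
          for m in (2, 5, 10) for lam in (0.3, 0.7)]
```

All differences vanish up to floating-point error, confirming that $\nu_{\mathrm{G}}$ depends on $L_m(\lambda)$ only through $\lambda$.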

Finally, the example $m = 10$ plotted in Figure 1c shows that, in contrast to Equation (7), there is no fixed order between the normalized entropy $\nu_{\mathrm{En}}$ and the Gini index $\nu_{\mathrm{G}}$. In this and many further numerical experiments, however, it could be observed that the inequalities $\nu_{\mathrm{Ex}} \ge \nu_{\mathrm{En}}$ and $\nu_{\mathrm{Ex}} \ge \nu_{\mathrm{G}}$ hold. These inequalities are formulated as a general conjecture here.
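Such numerical experiments are easily reproduced. The following Python sketch (our setup; the Dirichlet sampler is just one way to draw from the probability simplex) counts violations of the conjectured inequalities:

```python
import numpy as np

rng = np.random.default_rng(12345)

def nu_all(p):
    """Normalized extropy, entropy and Gini index of a probability vector."""
    m = p.size - 1
    nu_g = (m + 1) / m * (1 - np.sum(p ** 2))
    pos = p[p > 0]
    nu_en = -np.sum(pos * np.log(pos)) / np.log(m + 1)
    q = 1 - p[p < 1]
    nu_ex = -np.sum(q * np.log(q)) / (m * np.log((m + 1) / m))
    return nu_ex, nu_en, nu_g

violations = 0
for m in (1, 3, 10):
    for _ in range(1000):
        nu_ex, nu_en, nu_g = nu_all(rng.dirichlet(np.ones(m + 1)))
        violations += (nu_ex < nu_en - 1e-9) or (nu_ex < nu_g - 1e-9)
```

In all our runs, `violations` remained 0, in line with the conjecture (which such experiments can support but never prove).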

From now on, we turn towards the sample versions $\widehat{\nu}_{\mathrm{Ex}}, \widehat{\nu}_{\mathrm{En}}, \widehat{\nu}_{\mathrm{G}}$ of $\nu_{\mathrm{Ex}}, \nu_{\mathrm{En}}, \nu_{\mathrm{G}}$. These are obtained by replacing the probabilities $p_i, \mathit{p}$ by the respective estimates $\widehat{p}_i, \widehat{\mathit{p}}$, which are computed as relative frequencies from the given sample data $x_1, \dots, x_n$. As detailed in Section 3, $x_1, \dots, x_n$ are assumed to be time series data, but we also consider the case of independent and identically distributed (i.i.d.) data.
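The plug-in step amounts to estimating $\widehat{\mathit{p}}$ by relative frequencies and evaluating the chosen measure at $\widehat{\mathit{p}}$; a Python sketch (with the Gini index as an example, toy data of ours):

```python
import numpy as np

def rel_freqs(x, m):
    """hat p_i = relative frequency of category i among x_1, ..., x_n."""
    return np.bincount(np.asarray(x), minlength=m + 1) / len(x)

def gini_index(p):
    m = len(p) - 1
    return (m + 1) / m * (1.0 - np.sum(np.asarray(p) ** 2))

x = [0, 2, 2, 1, 0, 2]              # toy sample, categories coded 0, 1, 2
p_hat = rel_freqs(x, 2)             # hat p = (1/3, 1/6, 1/2)
nu_g_hat = gini_index(p_hat)        # plug-in estimate of nu_G
```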

## 5. Measures of Signed Serial Dependence

Having discussed the analysis of marginal properties of a categorical time series, we now turn to the analysis of serial dependence. In Section 1, two known measures of signed serial dependence, Cohen's $\kappa(h)$ in Equation (3) and its modification $\kappa^*(h)$ in Equation (4), were briefly surveyed, and, in Section 3.2, we established a connection to $\nu_{\mathrm{G}}$ and $\nu_{\mathrm{En}}$, respectively. Motivated by the zero problem of $\kappa^*(h)$, a new type of modified $\kappa$ was proposed in Equation (6), the measure $\kappa^{\star}(h)$, which turned out to be related to $\nu_{\mathrm{Ex}}$.

If replacing the (bivariate) probabilities in Equations (3), (4) and (6) by the respective (bivariate) relative frequencies computed from $x_1, \dots, x_n$, we end up with sample versions of these dependence measures. Knowledge of their asymptotic distribution is particularly relevant for the i.i.d. case, because this allows us to test for significant serial dependence in the given time series. As shown by Weiß (2011, 2013), $\widehat{\kappa}(h)$ then has an asymptotic normal distribution, with approximate mean and variance as given in Equation (16). The sample version of $\kappa^*(h)$ has a simpler asymptotic normal distribution, with approximate mean and variance as given in Equation (17) (Weiß 2011, 2013), but it suffers from the aforementioned zero problem, especially for short time series.

Thus, it remains to derive the asymptotics of the novel $\widehat{\kappa}^{\star}(h)$ under the null hypothesis of an i.i.d. sample $X_1, \dots, X_n$. The starting point is an extension of the limiting result in Equation (8). Under appropriate mixing assumptions (see Section 3), Weiß (2013) derived the joint asymptotic distribution of all univariate and equal-bivariate relative frequencies, i.e., of all $\sqrt{n}\,(\widehat{p}_i - p_i)$ and $\sqrt{n}\,(\widehat{p}_{jj}(h) - p_{jj}(h))$, which is the $2\,(m+1)$-dimensional normal distribution $N(\mathbf{0}, \mathbf{\Sigma}^{(h)})$. The covariance matrix $\mathbf{\Sigma}^{(h)}$ consists of four blocks, with entries indexed by $i,j = 0, \dots, m$. This rather complex general result simplifies greatly for special cases such as an NDARMA DGP (Weiß 2013) and, in particular, for an i.i.d. DGP.

Now, the asymptotic properties of $\widehat{\kappa}^{\star}(h)$ can be derived; see Appendix B.3 for details. $\sqrt{n}\,(\widehat{\kappa}^{\star}(h) - \kappa^{\star}(h))$ is asymptotically normally distributed, and its mean and variance can be approximated by plugging Equation (18) into the respective formulas; in the i.i.d. case, the expressions simplify to those given in Equation (21). Comparing Equations (16), (17) and (21), we see that all three measures have the same asymptotic bias $-1/n$, but their asymptotic variances generally differ. An exception to the latter statement occurs in the case of a uniform distribution, where the asymptotic variances coincide as well (see Appendix B.3).
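The negative asymptotic bias $-1/n$ can be visualized by simulation. The following Python sketch (our own Monte Carlo setup, with an i.i.d. uniform marginal on four categories) estimates the mean of $\widehat{\kappa}(h)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, h, mp1 = 25, 30000, 1, 4               # mp1 = m + 1 categories

X = rng.integers(0, mp1, size=(reps, n))        # i.i.d. uniform series, one per row
match = (X[:, h:] == X[:, :-h]).mean(axis=1)    # sum_j hat p_jj(h)
p_hat = (X[:, :, None] == np.arange(mp1)).mean(axis=1)
s2 = np.sum(p_hat ** 2, axis=1)                 # hat s_2(p)
kappa_hat = (match - s2) / (1.0 - s2)           # sample Cohen's kappa per series

bias = kappa_hat.mean()                          # Monte Carlo estimate of E[kappa_hat(h)]
```

In our runs, the Monte Carlo mean is clearly negative and of the order $-1/n$, which has to be accounted for when calibrating tests for serial dependence.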