1. Introduction
Long-range dependence (LRD) refers to a phenomenon where the correlation of a stationary process decays so slowly with the time lag that the correlation function is no longer summable. This phenomenon was first observed by Hurst [1,2], and it has since been observed in many fields such as economics, hydrology, internet traffic, and queueing networks [3,4,5,6]. In a second-order stationary process, LRD can be measured by the Hurst index $H$ [7,8]. Note that $H \in (0, 1)$, and if $H \in (1/2, 1)$, the process possesses a long-memory property.
Among the well-known stochastic processes that are stationary and possess long-range dependence are fractional Gaussian noise (FGN) [9] and fractional autoregressive integrated moving average (FARIMA) processes [10,11].
Fractional Gaussian noise $\{X_t\}_{t \in \mathbb{Z}}$ is a mean-zero, stationary Gaussian process with covariance function:
$$\gamma(h) = \frac{\sigma^2}{2}\left(|h + 1|^{2H} - 2|h|^{2H} + |h - 1|^{2H}\right),$$
where $H \in (0, 1)$ is the Hurst parameter. The covariance function obeys the power law with exponent $2H - 2$ for large lag,
$$\gamma(h) \sim \sigma^2 H (2H - 1)\, |h|^{2H - 2} \quad \text{as } h \to \infty.$$
If $H \in (1/2, 1)$, then the covariance function decreases slowly with the power law, and $\sum_{h} \gamma(h) = \infty$, i.e., it has the long-memory property.
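As a quick numerical sanity check of this power-law decay, consider the following minimal Python sketch (the helper name fgn_autocovariance is ours, and $\sigma^2 = 1$ is assumed):

```python
import numpy as np

def fgn_autocovariance(h, H, sigma2=1.0):
    """Autocovariance of fractional Gaussian noise at integer lag h."""
    h = np.abs(h).astype(float)
    return 0.5 * sigma2 * ((h + 1)**(2*H) - 2 * h**(2*H) + np.abs(h - 1)**(2*H))

H = 0.8
lags = np.arange(1, 100001)
gamma = fgn_autocovariance(lags, H)

# gamma(h) / (H(2H-1) h^{2H-2}) -> 1 as h grows, confirming the exponent 2H-2.
print(gamma[-1] / (H * (2*H - 1) * lags[-1]**(2*H - 2)))
# For H > 1/2 the partial sums keep growing (~ n^{2H-1}): no summability.
print(gamma[:1000].sum(), gamma.sum())
```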
A FARIMA($p$, $d$, $q$) process $\{X_t\}$ is the solution of:
$$\Phi(B)(1 - B)^d X_t = \Theta(B)\,\varepsilon_t,$$
where $p$ and $q$ are non-negative integers, $d$ is real, $B$ is the backward-shift operator, $B X_t = X_{t-1}$, and the fractional-differencing operator $(1 - B)^d$, autoregressive operator $\Phi(B)$, and moving-average operator $\Theta(B)$ are, respectively,
$$(1 - B)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-B)^k, \qquad \Phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p, \qquad \Theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q,$$
where $\{\varepsilon_t\}$ is the white-noise process, which consists of i.i.d. random variables with finite second moment. Here, the parameter $d$ governs the long-term dependence structure and, through its relation to the Hurst index, $H = d + 1/2$, the range $d \in (0, 1/2)$ corresponds to the long-range dependence in the FARIMA process.
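To make the role of $d$ concrete, here is a minimal simulation sketch of a FARIMA(0, $d$, 0) path via a truncated MA($\infty$) representation (the function name farima_0d0 and the truncation scheme are our illustrative choices, not part of the formal development):

```python
import numpy as np

def farima_0d0(n, d, burn=5000, seed=None):
    """Approximate FARIMA(0, d, 0): X_t = (1-B)^{-d} eps_t, using the
    MA(inf) weights psi_k = Gamma(k+d) / (Gamma(d) Gamma(k+1)),
    computed by the recursion psi_k = psi_{k-1} * (k - 1 + d) / k."""
    rng = np.random.default_rng(seed)
    m = n + burn
    psi = np.ones(m)
    for k in range(1, m):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    eps = rng.standard_normal(2 * m)
    x = np.convolve(eps, psi, mode="valid")  # length m + 1
    return x[-n:]

x = farima_0d0(10_000, d=0.3)  # H = d + 1/2 = 0.8: long-range dependent
```

For $d \in (0, 1/2)$ the weights decay like $k^{d-1}$, slowly enough that the resulting series inherits the non-summable correlations described above.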
Another class of stationary processes that can possess long-range dependence comes from countable-state Markov processes [12]. In a stationary, positive recurrent, irreducible, aperiodic Markov chain, the indicator sequence of visits to a certain state is long-range dependent if and only if the return time to that state has an infinite second moment, and this is possible only when the Markov chain has an infinite state space. Moreover, if the return time of one state has an infinite second moment, then so does the return time of every other state, and all states have the same rate of dependence; that is, the indicator sequence of each state is long-range dependent with the same Hurst index.
In this paper, we develop a discrete-time, finite-state stationary process that can possess long-range dependence. We define a stationary process $\{X_i\}_{i \in \mathbb{N}}$ where the number of possible outcomes of $X_i$ is finite, $P(X_i = k) = p_k$ for any $i \in \mathbb{N}$, and, for $i \neq j$,
$$\operatorname{Cov}\left(\mathbb{1}_{\{X_i = k\}}, \mathbb{1}_{\{X_j = k\}}\right) \propto |i - j|^{2H_k - 2} \tag{1.1}$$
for any state $k$ and some constants $H_k \in (0, 1)$. This leads to:
$$\operatorname{Var}\left(\sum_{i=1}^{n} \mathbb{1}_{\{X_i = k\}}\right) \sim c_k'\, n^{2H_k}, \tag{1.2}$$
where $c_k' > 0$ is a constant. If $H_k \in (1/2, 1)$, (1.2) implies that, as $n \to \infty$, the variance of $\sum_{i=1}^{n} \mathbb{1}_{\{X_i = k\}}$ diverges with the rate of $n^{2H_k}$, and the process is said to have long memory with Hurst parameter $H_k$. Furthermore, from (1.1), the process $\{\mathbb{1}_{\{X_i = k\}}\}_{i \in \mathbb{N}}$ is long-range dependent if $H_k \in (1/2, 1)$, since its covariance function is then not summable. In particular, if $H_k \neq H_{k'}$, then the states $k$ and $k'$ produce different levels of dependence. For example, if $H_k > 1/2 \geq H_{k'}$, then the state $k$ produces a long-memory counting process, whereas the state $k'$ produces a short-memory process.
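The Hurst parameter of each state's counting process can be read off a variance-time plot, since the variance of the block sums grows like $n^{2H_k}$. The sketch below is the standard aggregated-variance estimator, given here only as an illustration (not a construction from this paper):

```python
import numpy as np

def variance_time_hurst(indicator, block_sizes):
    """Estimate H from the growth rate Var(block sum) ~ n^{2H}."""
    variances = []
    for n in block_sizes:
        m = len(indicator) // n
        block_sums = indicator[: m * n].reshape(m, n).sum(axis=1)
        variances.append(block_sums.var())
    slope, _ = np.polyfit(np.log(block_sizes), np.log(variances), 1)
    return slope / 2.0  # log Var vs log n has slope 2H

# For an i.i.d. indicator sequence the estimate should be close to 1/2.
rng = np.random.default_rng(0)
iid = (rng.random(2**18) < 0.3).astype(float)
print(variance_time_hurst(iid, block_sizes=2 ** np.arange(4, 12)))
```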
A possible application of our stochastic process is to model the over-dispersed multinomial distribution. In the multinomial distribution, there are $n$ trials, each trial results in one of finitely many outcomes, and the outcomes of the trials are independent and identically distributed. When applying the multinomial model to real data, it is often observed that the variance is larger than the model assumes, which is called over-dispersion; it is due to the violation of the assumption that the trials are independent and identically distributed [13,14]. Several ways to model an over-dispersed multinomial distribution have been proposed [15,16,17,18].
Our stochastic process provides a new method to model an over-dispersed multinomial distribution by introducing dependency among the trials. In particular, the variance of the number of occurrences of a certain outcome among $n$ trials is asymptotically proportional to a fractional power of $n$, from which we define:
$$\mathbf{Y}_n := \left(\sum_{i=1}^{n} \mathbb{1}_{\{X_i = 1\}},\ \ldots,\ \sum_{i=1}^{n} \mathbb{1}_{\{X_i = m\}}\right)$$
and call the distribution of $\mathbf{Y}_n$ the fractional multinomial distribution.
The work in this paper is an extension of the earlier work on the generalized Bernoulli process [19], and the process in this paper reduces to the generalized Bernoulli process if there are only two possible outcomes of $X_i$, e.g., $X_i \in \{0, 1\}$.
In Section 2, a finite-state stationary process that can possess long-range dependence is developed. In Section 3, the properties of our model are investigated with regard to the tail behavior and moments of the inter-arrival time of a certain state $k$, and the conditional probability of observing a state $k$ given the past observations of the process. In Section 4, the fractional multinomial distribution is defined, followed by the conclusions in Section 5. Some proofs of propositions and theorems are in Section 6.
Throughout this paper, for any set $A$, $|A|$ denotes the number of elements in the set, and for the empty set, we define $|\emptyset| = 0$.
2. Finite-State Stationary Process with Long-Range Dependence
We define the stationary process $\{X_i\}_{i \in \mathbb{N}}$, where the set of possible outcomes of $X_i$ is finite, $X_i \in \{0, 1, \ldots, m\}$, and the probability that we observe a state $k$ at time $i$ is $P(X_i = k) = p_k$ for $k \in \{0, 1, \ldots, m\}$, with $\sum_{k=0}^{m} p_k = 1$.
For any set, define the operator:
If define and if
Let be vectors of length and We are now ready to define the following operators.
Definition 1. Let be pairwise disjoint, and Define,and, For ease of notation, we denote
and
by
respectively. Note that if
For any pairwise disjoint sets
if
then
is a well-defined stationary process with the following probabilities:
In particular, if the stationary process with the probability above is well defined, then, for $i \neq j$, we have:
As a result, for $k = 1, \ldots, m$:
Note that the indicator processes $\{\mathbb{1}_{\{X_i = 1\}}\}_{i \in \mathbb{N}}, \ldots, \{\mathbb{1}_{\{X_i = m\}}\}_{i \in \mathbb{N}}$ are $m$ generalized Bernoulli processes with Hurst parameters $H_1, \ldots, H_m$, respectively (see [19]). However, they are not independent. Therefore, the process possesses long-range dependence if $\max_{k} H_k > 1/2$.
All the results that appear in this paper are valid regardless of how the finite state space of $X_i$ is defined. More specifically, given that:
for any pairwise disjoint sets, we can define the probabilities (4)–(6) with any finite state space in the following way.
Note that the only difference is that the space $\{0, 1, \ldots, m\}$ is replaced by the general state space. As a result, we can obtain the same results as (7)–(10), except that $\{0, 1, \ldots, m\}$ is replaced accordingly, and we get:
In a similar way, all the results in this paper can be easily transferred to any finite state space. For the sake of simplicity, we assume $X_i \in \{0, 1, \ldots, m\}$ without loss of generality.
Now, we will give a restriction on the parameter values which will make the probabilities in (4)–(6) non-negative for any pairwise disjoint sets; therefore, the process is well defined with the probability (4)–(6).
ASSUMPTIONS:
(A.1) for
(A.2) For any
,
For the rest of the paper, it is assumed that Assumptions (A.1) and (A.2) hold.
Remark 1. (a). (11) holds if,since,is maximized when as it was seen in Lemma 2.1 of [19]. (b). If with in (11), then we have:and this, together with (11), implies that for any set This means that for any by (3).
(c). From (12),
(d). If there are only two states, (11) reduces to (2.7) in Lemma 2.1 in [19]. Now, we are ready to show that the process is well defined with the probability (4)–(6).
Proposition 1. For any disjoint sets, the probability (4)–(6) is well defined. The next theorem shows that the stochastic process defined with the probability (4)–(6) is stationary, and it has long-range dependence if $\max_k H_k > 1/2$. Furthermore, the indicator sequence of each state is stationary, and it has long-range dependence if its Hurst exponent is greater than 1/2.
Theorem 1. $\{X_i\}_{i \in \mathbb{N}}$ is a stationary process with the following properties.
Proof. By Proposition 1, $\{X_i\}$ is a well-defined stationary process with the probability (4)–(6). The other results follow by (7)–(10). □
3. Tail Behavior of Inter-Arrival Time and Other Properties
For $k \in \{1, \ldots, m\}$, $\{\mathbb{1}_{\{X_i = k\}}\}_{i \in \mathbb{N}}$ is a stationary process in which the event $\{X_i = k\}$ is recurrent, persistent, and aperiodic (here, we follow the terminology and definitions in [20]). We define a random variable $T_k^{(i)}$ as the inter-arrival time of the $i$-th observation of the state $k$ from the previous observation of $k$. Since $\{\mathbb{1}_{\{X_i = k\}}\}$ is a GBP with parameters $p_k, H_k, c_k$, the random variables $T_k^{(i)}$, $i \in \mathbb{N}$, are i.i.d. (see page 9 of [21]). Therefore, we will denote the inter-arrival time between two consecutive observations of $k$ by $T_k$.
The next lemma is directly obtained from Theorem 3.6 in [21].
Lemma 1. The inter-arrival time $T_k$ for state $k$ satisfies the following.
i. $T_k$ has a mean of $1/p_k$. It has an infinite second moment if $H_k \in (1/2, 1)$.
ii. $P(T_k > t) \sim \ell_k(t)\, t^{2H_k - 3}$ as $t \to \infty$, where $\ell_k$ is a slowly varying function that depends on the parameters of the process.
The first result (i) in Lemma 1 is similar to Lemma 1 in [22]. However, here we have a finite-state stationary process, whereas a countable-state-space Markov chain was assumed in [22]. Now, we investigate the conditional probabilities and the uniqueness of our process.
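As a Monte Carlo sanity check on the tail behavior in Lemma 1 (using the power-law form as reconstructed above, which should be treated as our reading of [21] rather than a quotation), the exponent can be recovered from a log-log survival plot; here Pareto samples stand in for $T_k$:

```python
import numpy as np

rng = np.random.default_rng(1)
H = 0.8
alpha = 3 - 2 * H            # tail exponent; alpha < 2 means E[T^2] = infinity
t = rng.pareto(alpha, size=500_000) + 1.0   # P(T > s) = s^{-alpha}, s >= 1

s = np.logspace(0.3, 2.5, 20)
survival = np.array([(t > v).mean() for v in s])
slope, _ = np.polyfit(np.log(s), np.log(survival), 1)
print(slope)                 # close to -(3 - 2H) = -1.4
```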
Theorem 2. Let disjoint subsets of past time points and a current time be given such that the conditioning event has positive probability; then the conditional probability satisfies the following: if there has been no interruption of “0” after the last observation of “ℓ”, then the chance to observe “ℓ” depends only on the distance between the current time and the last time “ℓ” was observed, regardless of how the other states appeared in the past. This can be considered a generalized Markov property. Moreover, this chance of observing “ℓ” decreases as the distance increases, following the power law with exponent $2H_\ell - 2$.
Proof. The result follows from the fact that:
since there is no “0” between the last observation of “ℓ” and the current time. □
In a countable-state-space Markov chain, long-range dependence is possible only when the chain has an infinite state space; additionally, if it is a stationary, positive recurrent, irreducible, aperiodic Markov chain, then each state must have the same long-term memory, i.e., the indicator sequences of all states have the same Hurst exponent [22]. By relaxing the Markov property, long-range dependence was made possible in a finite-state stationary process, also with different Hurst parameters for different states.
Theorem 3. Let be disjoint subsets of For such that , and such that and the conditional probability satisfies the following:
Theorem 4. A stationary process with (4)–(6) is the unique stationary process that satisfies the following conditions:
ii. for and any, for some constants;
iii. for any sets; and
iv. for, there is a function such that, for disjoint subsets such that and (these can be the empty set).
Proof. Let $\{X_i'\}_{i \in \mathbb{N}}$ be a stationary process that satisfies i–iv. By i,
which results in:
Therefore, by ii,
Furthermore, by applying iii and iv to $\{X_i'\}$,
This implies that $\{X_i'\}$ satisfies (4)–(6). □
4. Fractional Multinomial Distribution
In this section, we define a fractional multinomial distribution that can serve as an over-dispersed multinomial distribution.
Note that $\sum_{i=1}^{n} \mathbb{1}_{\{X_i = k\}}$ has mean $np_k$ for $k = 1, \ldots, m$. Further, as $n \to \infty$, for $H_k \in (1/2, 1)$,
$$\operatorname{Var}\left(\sum_{i=1}^{n} \mathbb{1}_{\{X_i = k\}}\right) \sim c_k'\, n^{2H_k},$$
where $c_k'$ is a positive constant that depends on $p_k$, $c_k$, and $H_k$. It also has the following covariance for $k \neq k'$:
We define $\mathbf{Y}_n := \big(\sum_{i=1}^{n} \mathbb{1}_{\{X_i = 1\}}, \ldots, \sum_{i=1}^{n} \mathbb{1}_{\{X_i = m\}}\big)$ for a fixed $n$, and call its distribution the fractional multinomial distribution with parameters $n$, $p_1, \ldots, p_m$, $H_1, \ldots, H_m$, and $c_1, \ldots, c_m$. If $c_k = 0$ for all $k$, then $\mathbf{Y}_n$ follows a multinomial distribution with parameters $n$ and $p_1, \ldots, p_m$.
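For reference, the independent case fixes the benchmark moments against which over-dispersion is measured: under the multinomial model, $\operatorname{Var} = n p_k (1 - p_k)$ and $\operatorname{Cov} = -n p_k p_{k'}$. A minimal empirical check (parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Independent trials: counts over n trials are multinomial.
n, p = 1000, np.array([0.5, 0.3, 0.2])
samples = rng.multinomial(n, p, size=200_000)
print(np.cov(samples.T))                   # empirical covariance matrix
print(n * (np.diag(p) - np.outer(p, p)))   # theoretical counterpart
```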
If $H_k \in (1/2, 1)$ for some $k$, then $\mathbf{Y}_n$ can serve as an over-dispersed multinomial random vector with:
$$\operatorname{Var}\left(\sum_{i=1}^{n} \mathbb{1}_{\{X_i = k\}}\right) = n p_k (1 - p_k)\,\big(1 + \delta_{n,k}\big),$$
where the over-dispersion parameter $\delta_{n,k}$ is as follows:
for $H_k \in (1/2, 1)$, and:
for $H_k \in (0, 1/2]$, where $\delta_{n,k}$ converges as $n \to \infty$. If $H_k \in (0, 1/2]$, the over-dispersion parameter $\delta_{n,k}$ remains stable as $n$ increases, whereas if $H_k \in (1/2, 1)$, the over-dispersion parameter increases with the rate of a fractional power of $n$, namely $n^{2H_k - 1}$.
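A small numeric illustration of the two regimes, under the assumed form $\delta_{n,k} \approx C\, n^{2H_k - 1}$ with an arbitrary placeholder constant $C$:

```python
p, H, C = 0.3, 0.8, 0.5  # illustrative values; C is a placeholder constant
for n in (10**2, 10**3, 10**4):
    var_multinomial = n * p * (1 - p)           # independent benchmark
    over_dispersion = 1 + C * n ** (2 * H - 1)  # grows like n^{2H-1} = n^{0.6}
    print(n, round(over_dispersion, 1))
```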
6. Proofs
Lemma 2. For any that satisfies for ,
iii. For any ,
Proof. Results i and ii were proved in Lemma 5.2 in [19]. For iii, define weights such that,
Then,
which is a weighted average. □
To ease our notation, we will denote:
by,
and,
where, if
,
and if
,
The corresponding quantities are also defined in a similar way.
Lemma 3. For any disjoint sets ,
Proof. i.
where
are the two closest elements to
among
such that if
then
if
then
if
then
and if
then
. Therefore,
By (11), and the result is derived.
ii. Since,
it is sufficient to show that:
Note that:
which is non-increasing as set
increases for
. That is,
for any sets
Therefore,
by
iii of Lemma 2. By
i of Lemma 2 combined with the fact that:
from (11), the result is derived. □
Note that for any disjoint sets
Proof of Proposition 1. We will show by mathematical induction that $(X_1, \ldots, X_n)$ is a random vector with probability (4)–(6) for any $n$ and any disjoint sets. For $n = 1$, it is trivial. For $n = 2$, it is proved by Lemma 3. Let us assume that $(X_1, \ldots, X_n)$ is a random vector with probability (4)–(6). We will prove that $(X_1, \ldots, X_{n+1})$ is then also a random vector with probability (4)–(6).
Without loss of generality, fix a set
To prove that
is a random vector with probability (4)–(6), we need to show that
for any pairwise disjoint sets,
such that
If
or 1, then the result follows from the definition of
and Lemma 3, respectively. Therefore, we assume that
and
Let
We will first show that for any such sets,
(13) is equivalent to
For fixed
define the following vectors of length
Since
is a random vector with (4)–(6),
and it can be written as:
Note that:
and:
where
. Therefore, by (14)–(16),
(17) can also be derived by the definition of
without using probability for
. In the same way, using the definition of
Note that, for
since we have:
by (16), and:
The last inequality is due to the fact that:
and for any set
C such that
or
by (11). More specifically,
where
are the two closest elements to
among
. That is,
are the two closest elements to
such that if
then
and if
then
which is non-increasing as
j increases since
Therefore,
is non-increasing as
j increases. Also, for fixed
such that
or
,
by the fact that
is non-decreasing as the set
A increases.
Combining the above facts with (17) and (18), and by
i of Lemma 2,
Therefore,
which proves (13) and,
□
Proof of Theorem 3. a. Let
Note that:
Since,
is non-decreasing as
j increases, and by (19) and (20):
the result follows by
ii of Lemma 2.
For fixed
such that
,
and,
is non-increasing as
j increases. Therefore, the result follows by
i of Lemma 2. □