Article

Some Generalized Entropy Ergodic Theorems for Nonhomogeneous Hidden Markov Models

School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2024, 12(4), 605; https://doi.org/10.3390/math12040605
Submission received: 27 January 2024 / Revised: 15 February 2024 / Accepted: 16 February 2024 / Published: 18 February 2024

Abstract

Entropy measures the randomness or uncertainty of a stochastic process, and the entropy rate refers to the limit of the time average of entropy. The generalized entropy rate in the form of delayed averages can overcome the redundancy of initial information while ensuring stationarity; therefore, it has better practical value. A Hidden Markov Model (HMM) consists of two stochastic processes: an observable process and a hidden Markov chain whose states cannot be observed. The entropy rate is an important characteristic of HMMs. The transition matrix of a homogeneous HMM is unique, while a Nonhomogeneous Hidden Markov Model (NHMM) requires the transition matrices to depend on time variables. From the perspective of model structure, NHMMs are novel extensions of homogeneous HMMs. In this paper, the concepts of the generalized entropy rate and NHMMs are defined and fully explained, a strong limit theorem and limit properties of a norm are presented, and generalized entropy ergodic theorems with almost sure convergence for NHMMs are obtained. These results provide concise formulas for the computation and estimation of the generalized entropy rate for NHMMs.

1. Introduction

Shannon introduced entropy, originally a thermodynamic quantity, into information theory to measure the uncertainty of random phenomena [1]. Shannon's entropy has since been successfully applied in many engineering fields, including vibration-signal feature extraction [2], chaotic image encryption [3], and groundwater quality evaluation [4]. In order to accurately compute the average uncertainty of a stochastic process, the existence and properties of the entropy rate must be established [5,6]. Many approaches have been adopted to improve the theoretical integrity of the entropy rate. One of the most famous results is the Shannon–McMillan–Breiman theorem, also known as the entropy ergodic theorem or the asymptotic equipartition property (AEP), which reflects the almost sure (a.s.) convergence of the entropy rate to a constant. Liu and Yang [7] proposed an extension of the Shannon–McMillan–Breiman theorem and some limit properties for nonhomogeneous Markov chains. Yang [8] proved the AEP for a nonhomogeneous Markov information source. Ordentlich and Weissman [9] used the Blackwell measure to compute the entropy rate. The entropy rate of Hidden Markov Models (HMMs) has been expressed in terms of upper and lower bounds [10]. However, even if the state space of an HMM is finite, a finite closed-form expression of the entropy rate does not exist in general, mainly because the set of predictive features is generically infinite. To address this problem, Jurgens and Crutchfield [11] evolved the mixed state according to an iterated function system and sequentially sampled the entropy of the place-dependent probability distribution at each step. Using an arbitrarily long word, the mean of these entropies converges to the entropy rate.
The entropy rate is defined through the average of random variables over the entire process. In practice, we often encounter time series that appear to be “locally stationary”, so it is natural to average over a window of the recent past. The generalized entropy rate in the form of delayed averages can overcome the redundancy of initial information while ensuring stationarity and, therefore, has better practical value. Essentially, generalized entropies are nonnegative functions defined on probability distributions that satisfy continuity, maximality, and expansibility. Delayed averages of random variables were first discussed by Zygmund [12]. Using the limiting behavior of delayed averages, Chow [13] proposed necessary and sufficient conditions for the Borel summability of independent identically distributed random variables. Lai [14] studied analogues of the law of the iterated logarithm for delayed averages of independent random variables. On this basis, Gut and Stadtmüller [15] studied the strong law of large numbers for delayed averages of random fields. Wang [16] discussed limit theorems of delayed averages for row-wise conditionally independent stochastic arrays and a class of asymptotic properties of moving averages for Markov chains in Markovian environments. These studies show that the theory of limits of delayed averages rests on a solid foundation and applies naturally to the entropy rate.
Combining the generalized entropy rate and nonhomogeneous Markov chains, Wang and Yang [17,18] studied generalized entropy ergodic theorems with a.s. and $L_1$ convergence for time-nonhomogeneous Markov chains, and they obtained generalized entropy ergodic theorems for non-null stationary processes using Markov approximation. Shi et al. [19] studied the generalized AEP of higher-order nonhomogeneous Markov information sources by establishing several strong deviation theorems. The entropy rate is an important characteristic of HMMs and plays an important role in applications such as communication decoding, compression, and sorting; therefore, theoretical research on the entropy rate of HMMs is very necessary. Classical HMMs were first introduced by Baum and Petrie, and they have been widely applied in various fields, including speech recognition, facial expression recognition, gene prediction, gesture recognition, musical composition, bioinformatics, and big data ranking [20,21,22,23]. The power of these models is that they can be implemented and simulated very efficiently. A homogeneous HMM contains two stochastic processes: the observed process is assumed to be conditionally temporally independent given the hidden process, and the hidden process is assumed to evolve according to a first-order Markov chain. A Nonhomogeneous Hidden Markov Model (NHMM) extends this idea by allowing the transition matrices of the hidden states to depend on a set of observed covariates; that is, the transition matrix of a homogeneous HMM is unique, while an NHMM requires the transition matrices to depend on time variables. From the perspective of model structure, NHMMs are novel extensions of homogeneous HMMs. In the last ten years, new theories on NHMMs have emerged. Yang et al. [24] stated the law of large numbers for countable NHMMs. Zhang et al. [25] studied stability analysis and controller design for a family of nonhomogeneous hidden semi-Markov jump systems with limited information on sojourn-time probability density functions. Shahzadi et al. [26] proposed a class of nonhomogeneous hidden semi-Markov models for modelling partially observed processes that do not necessarily behave in a stationary and memoryless manner.
Although there have been fruitful achievements in the two fields of generalized entropy ergodic theorems and NHMMs, research on generalized entropy ergodic theorems for NHMMs is limited. Therefore, we consider extending the application scenarios of the generalized entropy rate. Motivated by the above work, the main focus of this paper is to obtain a strong limit theorem for delayed averages of real-valued functions and generalized entropy ergodic theorems with almost sure convergence for NHMMs. These results provide a general idea for the relevant theoretical proofs and concise formulas for the computation and estimation of the generalized entropy rate for NHMMs, and they lay the necessary mathematical and theoretical foundation for the reliability of model applications. The rest of this paper is organized as follows: in Section 2, a detailed description of NHMMs and related definitions are introduced; Section 3 presents some limit properties that are used in Section 4; in Section 4, the main results and their proofs are given; Section 5 summarizes the main content and discusses the significance of the research.

2. Preliminaries

This section provides preliminaries for the subsequent important contents. Continuing with the introduction, we state basic concepts and properties of NHMMs, point out the relationship between NHMMs and homogeneous HMMs, and define the generalized entropy, the generalized entropy rate, and other commonly used symbols on this basis.
Firstly, we give the definition and properties of NHMMs. Let $X = \{X_n, n \ge 0\}$ be a nonhomogeneous Markov chain defined on the probability space $(\Omega, \mathcal{F}, P)$ taking values in a finite state space $S = \{s_1, s_2, \ldots, s_N\}$, and let $Y = \{Y_n, n \ge 0\}$ be a stochastic process defined on the same probability space taking values in a finite state space $L = \{l_1, l_2, \ldots, l_M\}$. $\{X, Y\} = \{(X_n, Y_n), n \ge 0\}$ is called an NHMM if and only if it satisfies the following conditions:
  • The initial distribution of the nonhomogeneous Markov chain $X = \{X_n, n \ge 0\}$ is
    $$q^{(0)} = (q_0(s_1), q_0(s_2), \ldots, q_0(s_N)), \quad q_0(s_i) = P(X_0 = s_i), \quad 1 \le i \le N,$$
    and the transition matrices are
    $$Q_n = [q_n(s_i; s_j)],$$
    where
    $$q_n(s_i; s_j) = P(X_n = s_j \mid X_{n-1} = s_i), \quad n \ge 1, \; s_i, s_j \in S, \; 1 \le i, j \le N.$$
  • For any $n \ge 0$,
    $$P\{(Y_0, Y_1, \ldots, Y_n) = (y_0, y_1, \ldots, y_n) \mid X\} = \prod_{i=0}^{n} P\{Y_i = y_i \mid X_i\}, \quad \text{a.s.},$$
    where $(y_0, y_1, \ldots, y_n)$ are the realizations of $(Y_0, Y_1, \ldots, Y_n)$.
If for any $n$, $Q_n = Q$, where $Q$ is a stochastic matrix, and the conditional probabilities $P(Y_n = y_n \mid X_n = x_n)$ do not depend on $n$, where $x_n$ and $y_n$ are realizations of $X_n$ and $Y_n$, then $X = \{X_n, n \ge 0\}$ is a homogeneous Markov chain and $\{X, Y\} = \{(X_n, Y_n), n \ge 0\}$ is a homogeneous HMM.
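To make the two-layer structure concrete, the following Python sketch (illustrative only; the matrices in `transition_matrix` and `emission_matrix` are hypothetical and not taken from this paper) simulates a finite NHMM by drawing the hidden chain from time-dependent transition matrices $Q_n$ and then drawing each observation from the emission probabilities $P(Y_n = \cdot \mid X_n)$.

```python
import numpy as np

def transition_matrix(n, N=2):
    """Hypothetical time-dependent transition matrix Q_n (each row sums to 1)."""
    eps = 0.3 / (n + 1)              # perturbation that fades as n grows
    Q = np.full((N, N), 1.0 / N)
    Q[0, 0] += eps
    Q[0, 1] -= eps
    return Q

def emission_matrix():
    """Hypothetical emission probabilities P(Y_n = l | X_n = s)."""
    return np.array([[0.6, 0.3, 0.1],
                     [0.1, 0.3, 0.6]])

def simulate_nhmm(T, q0, rng):
    """Draw one realization (X_0, ..., X_T, Y_0, ..., Y_T) of the NHMM."""
    P = emission_matrix()
    N, M = P.shape
    x = np.empty(T + 1, dtype=int)
    y = np.empty(T + 1, dtype=int)
    x[0] = rng.choice(N, p=q0)                     # initial distribution q^(0)
    y[0] = rng.choice(M, p=P[x[0]])
    for n in range(1, T + 1):
        Qn = transition_matrix(n, N)
        x[n] = rng.choice(N, p=Qn[x[n - 1]])       # hidden nonhomogeneous Markov chain
        y[n] = rng.choice(M, p=P[x[n]])            # Y_n depends only on X_n
    return x, y

rng = np.random.default_rng(0)
x, y = simulate_nhmm(T=10, q0=np.array([0.5, 0.5]), rng=rng)
print(x)
print(y)
```

The conditional independence of the observations given the hidden path, stated in the second condition above, is reflected in the fact that each $Y_n$ is drawn using $X_n$ alone.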
Next, we list some properties of NHMMs, which are also equivalent definitions of the model and play a role in proving the theorems. For any $0 \le m < k$, let $X_m^k = (X_m, X_{m+1}, \ldots, X_k)$ and $Y_m^k = (Y_m, Y_{m+1}, \ldots, Y_k)$, and let $x_m^k$ and $y_m^k$ be the realizations of $X_m^k$ and $Y_m^k$, respectively. In addition, for any $m, k \ge 0$, let $X_{m,k} = (X_m, X_{m+1}, \ldots, X_{m+k})$ and $Y_{m,k} = (Y_m, Y_{m+1}, \ldots, Y_{m+k})$.
  • $\{X, Y\} = \{(X_n, Y_n), n \ge 0\}$ is an NHMM if and only if for any $n$,
    $$P\{Y_0^n = y_0^n \mid X_0^n = x_0^n\} = \prod_{i=0}^{n} P\{Y_i = y_i \mid X_i = x_i\}.$$
  • $\{X, Y\} = \{(X_n, Y_n), n \ge 0\}$ is an NHMM if and only if for any $n$,
    $$P\{X_0^n = x_0^n, Y_0^n = y_0^n\} = P\{X_0 = x_0\}\, P\{Y_0 = y_0 \mid X_0 = x_0\} \prod_{i=1}^{n} P\{X_i = x_i \mid X_{i-1} = x_{i-1}\}\, P\{Y_i = y_i \mid X_i = x_i\}.$$
  • $\{X, Y\} = \{(X_n, Y_n), n \ge 0\}$ is an NHMM if and only if for any $n$,
    $$P\{X_n = x_n \mid X_0^{n-1} = x_0^{n-1}, Y_0^{n-1} = y_0^{n-1}\} = P\{X_n = x_n \mid X_{n-1} = x_{n-1}\}$$
    and
    $$P\{Y_n = y_n \mid X_0^n = x_0^n, Y_0^{n-1} = y_0^{n-1}\} = P\{Y_n = y_n \mid X_n = x_n\}.$$
By Equations (7) and (8), we have
$$P\{X_{n+1} = x_{n+1}, Y_{n+1} = y_{n+1} \mid X_0^n = x_0^n, Y_0^n = y_0^n\} = P\{X_{n+1} = x_{n+1}, Y_{n+1} = y_{n+1} \mid X_n = x_n\}.$$
In the following text, probability measures are widely used. Therefore, for any $k \ge 1$, denote
$$p_k(x_k; y_k) = P(Y_k = y_k \mid X_k = x_k),$$
$$p_k(x_{k-1}; x_k, y_k) = P(Y_k = y_k, X_k = x_k \mid X_{k-1} = x_{k-1}),$$
$$p(x_0, y_0, \ldots, x_k, y_k) = P(X_0 = x_0, Y_0 = y_0, \ldots, X_k = x_k, Y_k = y_k),$$
and
$$p(x_0, y_0, \ldots, x_{k-1}, y_{k-1}; x_k, y_k) = P(X_k = x_k, Y_k = y_k \mid X_0 = x_0, Y_0 = y_0, \ldots, X_{k-1} = x_{k-1}, Y_{k-1} = y_{k-1}).$$
A delayed average is a widely used tool for predicting future data in time series analysis, and we define the entropy and the entropy rate in the form of delayed averages. Let $(f(n))_{n=0}^{\infty}$ be a sequence of non-negative integers such that $f(n) \to \infty$ as $n \to \infty$.
For simplicity, we use the natural logarithm throughout, so the generalized entropy is measured in nats. According to information theory, the entropy of an NHMM $\{X, Y\} = \{(X_n, Y_n), n \ge 0\}$ is defined as
$$H(X_0, Y_0, X_1, Y_1, \ldots, X_n, Y_n) = E[-\log P(X_0, Y_0, X_1, Y_1, \ldots, X_n, Y_n)],$$
where $E[\cdot]$ represents the expectation, and the entropy rate of the NHMM $\{X, Y\} = \{(X_n, Y_n), n \ge 0\}$ is defined as
$$\lim_{n \to \infty} \frac{1}{n} H(X_0, Y_0, X_1, Y_1, \ldots, X_n, Y_n).$$
Combining the concept of the generalized entropy with the properties of NHMMs, we have
$$
\begin{aligned}
H(X_{n,f(n)}, Y_{n,f(n)}) &= H(X_n, Y_n, \ldots, X_{n+f(n)}, Y_{n+f(n)}) \\
&= H(X_n, Y_n) + \sum_{k=n+1}^{n+f(n)} H(X_k, Y_k \mid X_n, Y_n, \ldots, X_{k-1}, Y_{k-1}) \\
&= H(X_n, Y_n) - \sum_{k=n+1}^{n+f(n)} \sum_{x_n, \ldots, x_k \in S} \; \sum_{y_n, \ldots, y_k \in L} p(x_n, y_n, \ldots, x_k, y_k) \log p(x_n, y_n, \ldots, x_{k-1}, y_{k-1}; x_k, y_k) \\
&= H(X_n, Y_n) - \sum_{k=n+1}^{n+f(n)} \sum_{x_{k-1} \in S} q_{k-1}(x_{k-1}) \sum_{x_k \in S} \sum_{y_k \in L} p_k(x_{k-1}; x_k, y_k) \log p_k(x_{k-1}; x_k, y_k) \\
&= H(X_n, Y_n) - \sum_{k=n+1}^{n+f(n)} \sum_{x_{k-1} \in S} q_{k-1}(x_{k-1}) \sum_{x_k \in S} \sum_{y_k \in L} q_k(x_{k-1}; x_k)\, p_k(x_k; y_k) \log \big[ q_k(x_{k-1}; x_k)\, p_k(x_k; y_k) \big] \\
&= H(X_n, Y_n) + \sum_{k=n+1}^{n+f(n)} H(X_k, Y_k \mid X_{k-1}),
\end{aligned}
$$
where $X_{n,f(n)}$ and $Y_{n,f(n)}$ denote $(X_n, X_{n+1}, \ldots, X_{n+f(n)})$ and $(Y_n, Y_{n+1}, \ldots, Y_{n+f(n)})$, respectively, and we use the distribution
$$q^{(k)} = (q_k(s_1), q_k(s_2), \ldots, q_k(s_N)), \quad q_k(s_i) = P(X_k = s_i), \quad 1 \le i \le N,$$
which is the same below.
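As a numerical illustration of this chain-rule decomposition (a sketch under hypothetical matrices, with a time-homogeneous emission matrix for simplicity; it is not code from the paper), one can propagate the marginal distribution $q^{(k)}$ and accumulate $H(X_n, Y_n)$ plus the conditional entropies $H(X_k, Y_k \mid X_{k-1})$:

```python
import numpy as np

def xlogx(p):
    """Elementwise p * log(p), with the convention 0 * log 0 = 0."""
    return np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)

def Q_of(k):
    """Hypothetical transition matrix Q_k of the hidden chain."""
    eps = 0.3 / k
    return np.array([[0.5 + eps, 0.5 - eps],
                     [0.4,       0.6      ]])

P = np.array([[0.6, 0.3, 0.1],       # hypothetical emission matrix p(s; l)
              [0.1, 0.3, 0.6]])
q0 = np.array([0.5, 0.5])            # initial distribution of X_0

def block_entropy(n, f_n):
    """H(X_{n,f(n)}, Y_{n,f(n)}) computed as H(X_n, Y_n) + sum of H(X_k, Y_k | X_{k-1})."""
    q = q0.copy()
    for k in range(1, n + 1):                      # propagate q^(0) to q^(n)
        q = q @ Q_of(k)
    H = -np.sum(xlogx(q[:, None] * P))             # H(X_n, Y_n)
    for k in range(n + 1, n + f_n + 1):
        pk = Q_of(k)[:, :, None] * P[None, :, :]   # p_k(t; s, l) = q_k(t; s) p(s; l)
        H += q @ (-np.sum(xlogx(pk), axis=(1, 2))) # add H(X_k, Y_k | X_{k-1})
        q = q @ Q_of(k)                            # advance to q^(k)
    return H

print(block_entropy(n=5, f_n=50) / 50)             # generalized entropy per step
```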
To prepare for the following text, we introduce concepts that may frequently appear.
Definition 1. 
Let $\{X, Y\} = \{(X_n, Y_n), n \ge 0\}$ be an NHMM defined as above. Define the generalized entropy density as
$$F_{n,f(n)}(\omega) = -\frac{1}{f(n)} \log P(X_{n,f(n)}, Y_{n,f(n)}) = -\frac{1}{f(n)} \left\{ \log P(X_n, Y_n) + \sum_{k=n+1}^{n+f(n)} \log P(X_k, Y_k \mid X_{k-1}, Y_{k-1}) \right\},$$
where $\omega \in \Omega$.
In the following text, we prove that the limit of the entropy density is the entropy rate.
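For a single observed realization, the generalized entropy density of Definition 1 can be evaluated directly from the factorization $P(X_k, Y_k \mid X_{k-1}, Y_{k-1}) = q_k(X_{k-1}; X_k)\, p_k(X_k; Y_k)$. The following sketch assumes the hypothetical `Q_of`, emission matrix `P`, and simulated path from the earlier snippets; it is an illustration, not the authors' code.

```python
import numpy as np

def entropy_density(x, y, n, f_n, q0, Q_of, P):
    """F_{n,f(n)}(omega) = -(1/f(n)) log P(X_{n,f(n)}, Y_{n,f(n)}) along one sample path.

    x, y : integer arrays containing a realization of the NHMM (length >= n + f_n + 1)
    Q_of : Q_of(k) returns the (hypothetical) transition matrix Q_k
    P    : emission matrix with entries P[s, l] = p(s; l)
    """
    q = q0.copy()
    for k in range(1, n + 1):                      # marginal distribution of X_n
        q = q @ Q_of(k)
    logp = np.log(q[x[n]] * P[x[n], y[n]])         # log P(X_n, Y_n)
    for k in range(n + 1, n + f_n + 1):
        logp += np.log(Q_of(k)[x[k - 1], x[k]] * P[x[k], y[k]])
    return -logp / f_n
```

Theorems 1 and 2 below describe the almost sure limit of this path quantity.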
Definition 2 
([21]). Let $X = \{X_n, n \ge 0\}$ be a homogeneous Markov chain, and let $Q$ be its transition matrix. $Q$ is called strongly ergodic if there exists a probability distribution $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$ on $S$ satisfying
$$\sup_{q^{(0)}} \left\| q^{(0)} Q^n - \pi \right\| \to 0, \quad \text{as } n \to \infty,$$
where $q^{(0)}$ is a starting vector. Obviously, Equation (19) implies $\pi Q = \pi$, and $\pi$ is called the stationary distribution determined by $Q$.
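Since the stationary distribution $\pi$ determined by $Q$ appears in all of the limit formulas that follow, a standard way to compute it numerically (a generic linear-algebra sketch, not specific to this paper) is to solve $\pi Q = \pi$ together with $\sum_i \pi_i = 1$:

```python
import numpy as np

def stationary_distribution(Q):
    """Solve pi Q = pi with sum(pi) = 1 for a stochastic matrix Q."""
    N = Q.shape[0]
    A = np.vstack([Q.T - np.eye(N), np.ones((1, N))])   # (Q^T - I) pi^T = 0 and 1^T pi = 1
    b = np.append(np.zeros(N), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

Q = np.array([[0.7, 0.3],
              [0.4, 0.6]])
print(stationary_distribution(Q))    # approximately [0.5714, 0.4286]
```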
Definition 3. 
Let $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N)$ be a vector; then the norm of $\alpha$ is defined by
$$\|\alpha\| = \sum_{i=1}^{N} |\alpha_i|.$$
Let $A = (a_{ij})_{N \times N}$ be a square matrix; then the norm of $A$ is defined by
$$\|A\| = \max_{i} \sum_{j=1}^{N} |a_{ij}|.$$
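In code, these two norms (a minimal sketch) are the entrywise $\ell_1$ norm for vectors and the maximum absolute row sum for matrices:

```python
import numpy as np

def vec_norm(alpha):
    """||alpha|| = sum_i |alpha_i|."""
    return np.sum(np.abs(alpha))

def mat_norm(A):
    """||A|| = max_i sum_j |a_ij| (maximum absolute row sum)."""
    return np.max(np.sum(np.abs(A), axis=1))
```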

3. Some Limit Properties

In this section, we give some lemmas that are used to prove the main conclusions. These lemmas include a strong limit theorem for NHMMs and limit properties of a norm. Lemma 1 provides a strong limit theorem for bounded functions of NHMMs. Lemma 2 gives the convergence of bounded functions of real sequences under averaging. Lemma 3 points out that, under the condition that the transition matrices of the NHMM converge in the sense of delayed averages, the occupation frequencies of the Markov chain converge to the stationary distribution of the irreducible limiting matrix. Lemma 4 establishes the convergence of averages of products of the transition matrices. Lemma 5 establishes the relationship between the convergence of vector sequences under averaging and the convergence of subsequences.
Lemma 1. 
Let $\{X, Y\} = \{(X_n, Y_n), n \ge 0\}$ be an NHMM, and let $\{g_n(\cdot, \cdot, \cdot)\}$ be a sequence of bounded real-valued functions defined on $S^2 \times L$. If for any $\epsilon > 0$,
$$\sum_{n=1}^{\infty} \exp\{-\epsilon f(n)\} < \infty,$$
and there exists a positive number $\gamma$ such that for any $s \in S$, $l \in L$,
$$\limsup_{n \to \infty} \max_{t \in S} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} g_k^2(t, s, l)\, p_k(t; s, l) \exp\{\gamma |g_k(t, s, l)|\} = C_{\gamma}(s, l) < \infty,$$
then
$$\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \big\{ g_k(X_{k-1}, X_k, Y_k) - E[g_k(X_{k-1}, X_k, Y_k) \mid X_{k-1}] \big\} = 0, \quad \text{a.s.},$$
where $E[\cdot \mid \cdot]$ represents conditional expectation.
Proof. 
Let $u$ be a nonzero real number, and define
$$\Lambda_{n,f(n)}(u, \omega) = \frac{\exp\left\{ u \sum_{k=n+1}^{n+f(n)} g_k(X_{k-1}, X_k, Y_k) \right\}}{\prod_{k=n+1}^{n+f(n)} E\left[ e^{u g_k(X_{k-1}, X_k, Y_k)} \mid X_{k-1} \right]}, \quad n = 1, 2, \ldots,$$
where $\omega \in \Omega$ and $E[\cdot \mid \cdot]$ represents conditional expectation. By the properties of conditional expectations, we have
$$
\begin{aligned}
E[\Lambda_{n,f(n)}(u, \omega)] &= E\big[ E[\Lambda_{n,f(n)}(u, \omega) \mid X_{0, n+f(n)-1}] \big] \\
&= E\left[ E\left[ \Lambda_{n,f(n)-1}(u, \omega)\, \frac{e^{u g_{n+f(n)}(X_{n+f(n)-1}, X_{n+f(n)}, Y_{n+f(n)})}}{E\big[ e^{u g_{n+f(n)}(X_{n+f(n)-1}, X_{n+f(n)}, Y_{n+f(n)})} \mid X_{n+f(n)-1} \big]} \,\Big|\, X_{0, n+f(n)-1} \right] \right] \\
&= E\left[ \Lambda_{n,f(n)-1}(u, \omega)\, \frac{E\big[ e^{u g_{n+f(n)}(X_{n+f(n)-1}, X_{n+f(n)}, Y_{n+f(n)})} \mid X_{n+f(n)-1} \big]}{E\big[ e^{u g_{n+f(n)}(X_{n+f(n)-1}, X_{n+f(n)}, Y_{n+f(n)})} \mid X_{n+f(n)-1} \big]} \right] \\
&= E[\Lambda_{n,f(n)-1}(u, \omega)] = \cdots = E[\Lambda_{n,1}(u, \omega)] = 1.
\end{aligned}
$$
For any $\epsilon > 0$, by Markov's inequality and Equation (26), we have
$$\sum_{n=1}^{\infty} P\left[ \frac{1}{f(n)} \log \Lambda_{n,f(n)}(u, \omega) \ge \epsilon \right] = \sum_{n=1}^{\infty} P\big[ \Lambda_{n,f(n)}(u, \omega) \ge \exp(\epsilon f(n)) \big] \le \sum_{n=1}^{\infty} \exp\{-\epsilon f(n)\} < \infty.$$
Combining the Borel–Cantelli Lemma and the arbitrariness of $\epsilon$, we have
$$\limsup_{n \to \infty} \frac{1}{f(n)} \log \Lambda_{n,f(n)}(u, \omega) \le 0. \quad \text{a.s.}$$
Expanding Equation (28), we have
$$\limsup_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \big\{ u\, g_k(X_{k-1}, X_k, Y_k) - \log E[ e^{u g_k(X_{k-1}, X_k, Y_k)} \mid X_{k-1} ] \big\} \le 0. \quad \text{a.s.}$$
Let $0 < u < \gamma$. Using the inequalities $\log x \le x - 1$ ($x > 0$) and $0 \le e^x - 1 - x \le \frac{1}{2} x^2 e^{|x|}$ ($x \in \mathbb{R}$), we have
$$
\begin{aligned}
&\limsup_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \big\{ g_k(X_{k-1}, X_k, Y_k) - E[g_k(X_{k-1}, X_k, Y_k) \mid X_{k-1}] \big\} \\
&\quad \le \limsup_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \left\{ \frac{1}{u} \log E[ e^{u g_k(X_{k-1}, X_k, Y_k)} \mid X_{k-1} ] - E[g_k(X_{k-1}, X_k, Y_k) \mid X_{k-1}] \right\} \\
&\quad \le \limsup_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \frac{E\big[ e^{u g_k(X_{k-1}, X_k, Y_k)} - 1 - u\, g_k(X_{k-1}, X_k, Y_k) \mid X_{k-1} \big]}{u} \\
&\quad \le \limsup_{n \to \infty} \frac{u}{2 f(n)} \sum_{k=n+1}^{n+f(n)} E\big[ g_k^2(X_{k-1}, X_k, Y_k)\, e^{u |g_k(X_{k-1}, X_k, Y_k)|} \mid X_{k-1} \big] \\
&\quad \le \frac{u}{2} \sum_{s \in S} \sum_{l \in L} C_{\gamma}(s, l) < \infty. \quad \text{a.s.}
\end{aligned}
$$
Letting $u \to 0^+$ in Equation (30), we obtain
$$\limsup_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \big\{ g_k(X_{k-1}, X_k, Y_k) - E[g_k(X_{k-1}, X_k, Y_k) \mid X_{k-1}] \big\} \le 0. \quad \text{a.s.}$$
Similarly, letting $-\gamma < u < 0$ and then $u \to 0^-$, we have
$$\liminf_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \big\{ g_k(X_{k-1}, X_k, Y_k) - E[g_k(X_{k-1}, X_k, Y_k) \mid X_{k-1}] \big\} \ge 0. \quad \text{a.s.}$$
Equation (24) follows immediately from inequalities (31) and (32). □
The key to proving Lemma 1 is the construction of the likelihood ratio. By approximating the upper and lower limits, an almost sure convergence result for bounded functions is obtained. This lemma is used to prove Theorem 1.
Lemma 2 
([17]). Let $h(z)$ be a bounded function defined on a real interval $D$ and $\{z_n\}_{n=0}^{\infty}$ a sequence in $D$. If $\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} |z_k - z| = 0$, $h(z)$ is continuous at the point $z$, and Equation (22) holds, then
$$\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} |h(z_k) - h(z)| = 0.$$
Lemma 3 
([17]). Let $Q$ be an irreducible transition matrix. If Equation (22) holds, and for any $t, s \in S$,
$$\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} |q_k(s; t) - q(s; t)| = 0,$$
then
$$\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n}^{n+f(n)-1} 1_{\{s_i\}}(X_k) = \pi_i \quad \text{a.s.}$$
holds for any $s_i \in S$, $1 \le i \le N$, where $1_{\{s_i\}}(X_k)$ is an indicator function and $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$ is the unique stationary distribution determined by $Q$.
These two lemmas serve as support for Theorem 2. It should be emphasized that $Q$ in Lemma 3 is irreducible, so this lemma still holds for ergodic matrices.
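As a rough numerical check of Lemma 3 (a simulation sketch with hypothetical matrices, not reproduced from the paper), one can compare the delayed-window occupation frequencies of a simulated nonhomogeneous chain with the stationary distribution of the limiting matrix $Q$:

```python
import numpy as np

rng = np.random.default_rng(1)
Q = np.array([[0.7, 0.3],
              [0.4, 0.6]])                 # limiting transition matrix
pi = np.array([4 / 7, 3 / 7])              # its stationary distribution

def Q_of(k):
    """Hypothetical Q_k whose delayed averages converge to Q, as Lemma 3 requires."""
    eps = 0.2 / k
    return np.array([[0.7 + eps, 0.3 - eps],
                     [0.4,       0.6      ]])

n, f_n = 1000, 5000                        # delayed window X_n, ..., X_{n+f(n)-1}
x = 0
counts = np.zeros(2)
for k in range(1, n + f_n):
    x = rng.choice(2, p=Q_of(k)[x])        # one step of the nonhomogeneous chain
    if k >= n:
        counts[x] += 1
print(counts / f_n, pi)                    # the frequencies should be close to pi
```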
Lemma 4. 
Let $X = \{X_n, n \ge 0\}$ be a nonhomogeneous Markov chain with transition matrices $\{Q_n, n \ge 1\}$, and let $Q$ be a periodic strongly ergodic stochastic matrix. Assume that $c = (c_1, c_2, \ldots, c_N)$ is a left eigenvector of $Q$, the unique solution of the equation $cQ = c$. Let $B$ be a constant matrix each of whose rows is $c$. If Equation (22) holds and
$$\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \|Q_k - Q\| = 0,$$
then
$$\lim_{n \to \infty} \left\| \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} Q^{(m, m+k-1)} - B \right\| = 0$$
holds for any $m \in \mathbb{N}$, where $Q^{(m, m+k-1)} = Q_m Q_{m+1} \cdots Q_{m+k-1}$, $Q^{(m, m)} = Q_m$, and $Q_0 = I$ ($I$ is the identity matrix).
The proof of Lemma 4 is similar to that of Theorem 1 of [27]. It should be emphasized that the matrices appearing in Lemma 4 are composed of constants, so the norm can be calculated. Lemma 4 is one of the prerequisites for Theorem 3.
Lemma 5. 
Let $\{\beta_n\}_{n=1}^{\infty}$ and $\beta$ be column vectors with real entries. If Equation (22) holds and
$$\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \|\beta_k - \beta\| = 0,$$
then there exists a subsequence $\{\beta_{n_k}\}_{k=1}^{\infty}$ of $\{\beta_n\}_{n=1}^{\infty}$ such that
$$\lim_{k \to \infty} \|\beta_{n_k} - \beta\| = 0.$$
Proof. 
We construct an inequality: there exists $m \in \mathbb{N}$ such that
$$\frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \|\beta_{m+k} - \beta\| \le \frac{m + f(n)}{f(n)} \cdot \frac{1}{m + f(n)} \sum_{k=n+1}^{n+m+f(n)} \|\beta_k - \beta\|.$$
From Equation (40), it can be concluded that there exists $m \in \mathbb{N}$ such that
$$\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \|\beta_{m+k} - \beta\| = 0$$
holds. Choose a positive sequence $\{b_k\} \downarrow 0$. By Equation (41), there exist $u, v \in \mathbb{N}$ such that
$$\frac{1}{v} \sum_{k=u+1}^{u+v} \|\beta_{m+k} - \beta\| \le b_1.$$
Therefore, there exists $n_1$ with $u + m < n_1 \le u + v + m$ such that $\|\beta_{n_1} - \beta\| \le b_1$. By Equation (40), we have
$$\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \|\beta_{n_1 + k} - \beta\| = 0.$$
Similarly, there exists $n_2 > n_1$ such that $\|\beta_{n_2} - \beta\| \le b_2$. Continuing in this way, we obtain a subsequence $\{\beta_{n_k}\}_{k=1}^{\infty}$ of $\{\beta_n\}_{n=1}^{\infty}$ such that $\|\beta_{n_k} - \beta\| \le b_k$, $k = 1, 2, 3, \ldots$. Equation (39) follows immediately. □
The key to the proof of Lemma 5 is the construction of inequality (40). According to the properties of a convergent sequence, even if the sequence is not monotonic, there exists $m \in \mathbb{N}$ such that inequality (40) holds. Lemma 5 is also one of the prerequisites for Theorem 3.

4. Generalized Entropy Ergodic Theorems

In this section, we give the main results and their proofs. Generalized entropy ergodic theorems with almost sure convergence for NHMMs are presented. These results provide concise formulas for the computation and estimation of the generalized entropy rate for NHMMs.
Theorem 1. 
Let $\{X, Y\} = \{(X_n, Y_n), n \ge 0\}$ be an NHMM, and set $g_k(t, s, l) = -\log p_k(t; s, l)$. If for any $\epsilon > 0$,
$$\sum_{n=1}^{\infty} \exp\{-\epsilon f(n)\} < \infty,$$
and there exists a positive number $\gamma$ such that for any $s \in S$, $l \in L$,
$$\limsup_{n \to \infty} \max_{t \in S} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} g_k^2(t, s, l)\, p_k(t; s, l) \exp\{\gamma |g_k(t, s, l)|\} = C_{\gamma}(s, l) < \infty,$$
then
$$\lim_{n \to \infty} F_{n,f(n)}(\omega) = -\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \sum_{s \in S} \sum_{l \in L} p_k(X_{k-1}; s, l) \log p_k(X_{k-1}; s, l). \quad \text{a.s.}$$
Proof. 
Set $g_k(X_{k-1}, s, l) = -\log p_k(X_{k-1}; s, l)$ and $\gamma = \frac{1}{2}$ in Lemma 1. Using the inequality $(\log x)^2 \sqrt{x} \le 16 e^{-2}$ ($0 < x \le 1$), we can conclude that for any $s \in S$, $l \in L$,
$$C_{1/2}(s, l) = \limsup_{n \to \infty} \max_{t \in S} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} [\log p_k(t; s, l)]^2\, p_k(t; s, l) \exp\left\{ \tfrac{1}{2} |\log p_k(t; s, l)| \right\} \le 16 e^{-2}.$$
It is not hard to verify that
$$
\begin{aligned}
&\frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \Big\{ g_k(X_{k-1}, X_k, Y_k) - \sum_{s \in S} \sum_{l \in L} g_k(X_{k-1}, s, l)\, p_k(X_{k-1}; s, l) \Big\} \\
&\quad = \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \Big\{ -\log p_k(X_{k-1}; X_k, Y_k) + \sum_{s \in S} \sum_{l \in L} p_k(X_{k-1}; s, l) \log p_k(X_{k-1}; s, l) \Big\} \\
&\quad = \frac{1}{f(n)} \log P(X_n, Y_n) + F_{n,f(n)}(\omega) + \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \sum_{s \in S} \sum_{l \in L} p_k(X_{k-1}; s, l) \log p_k(X_{k-1}; s, l).
\end{aligned}
$$
The probabilities involved fall within the interval $(0, 1]$, so their logarithms are bounded; moreover, an infinitesimal multiplied by a bounded quantity is still infinitesimal, so the term coming from the initial distribution vanishes. Hence the conclusion can be deduced immediately from Lemma 1. □
Theorem 2. 
Let $\{X, Y\} = \{(X_n, Y_n), n \ge 0\}$ be an NHMM, and set $g_k(t, s, l) = -\log p_k(t; s, l)$. Assume that for any $\epsilon > 0$,
$$\sum_{n=1}^{\infty} \exp\{-\epsilon f(n)\} < \infty,$$
and there exists a positive number $\gamma$ such that for any $s \in S$, $l \in L$,
$$\limsup_{n \to \infty} \max_{t \in S} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} g_k^2(t, s, l)\, p_k(t; s, l) \exp\{\gamma |g_k(t, s, l)|\} = C_{\gamma}(s, l) < \infty.$$
If for any $(t, s, l) \in S^2 \times L$,
$$\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} |p_k(t; s, l) - p(t; s, l)| = \lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} |q_k(t; s)\, p_k(s; l) - q(t; s)\, p(s; l)| = 0$$
holds, where $q(t; s)$ denotes the entries of an irreducible transition matrix $Q$ and $p(t; s, l) = q(t; s)\, p(s; l)$, then
$$\lim_{n \to \infty} F_{n,f(n)}(\omega) = -\sum_{t \in S} \pi(t) \sum_{s \in S} \sum_{l \in L} p(t; s, l) \log p(t; s, l), \quad \text{a.s.},$$
where $\pi(t)$, $t \in S$, are the components of $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$, the stationary distribution of $Q$.
Proof. 
Set $h(z) = z \log z$ in Lemma 2; then, for any $(t, s, l) \in S^2 \times L$,
$$\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} |p_k(t; s, l) \log p_k(t; s, l) - p(t; s, l) \log p(t; s, l)| = 0.$$
Using the triangle inequality, we have
$$
\begin{aligned}
&\left| F_{n,f(n)}(\omega) + \sum_{t \in S} \pi(t) \sum_{s \in S} \sum_{l \in L} p(t; s, l) \log p(t; s, l) \right| \\
&\quad \le \left| F_{n,f(n)}(\omega) + \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \sum_{t \in S} \sum_{s \in S} \sum_{l \in L} 1_{\{t\}}(X_{k-1})\, p_k(t; s, l) \log p_k(t; s, l) \right| \\
&\qquad + \left| \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \sum_{t \in S} \sum_{s \in S} \sum_{l \in L} 1_{\{t\}}(X_{k-1}) \big[ p_k(t; s, l) \log p_k(t; s, l) - p(t; s, l) \log p(t; s, l) \big] \right| \\
&\qquad + \left| \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \sum_{t \in S} \big[ \pi(t) - 1_{\{t\}}(X_{k-1}) \big] \sum_{s \in S} \sum_{l \in L} p(t; s, l) \log p(t; s, l) \right| \\
&\quad \le \left| F_{n,f(n)}(\omega) + \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \sum_{t \in S} \sum_{s \in S} \sum_{l \in L} 1_{\{t\}}(X_{k-1})\, p_k(t; s, l) \log p_k(t; s, l) \right| \\
&\qquad + \sum_{t \in S} \sum_{s \in S} \sum_{l \in L} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \big| p_k(t; s, l) \log p_k(t; s, l) - p(t; s, l) \log p(t; s, l) \big| \\
&\qquad + \sum_{t \in S} \left| \pi(t) - \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} 1_{\{t\}}(X_{k-1}) \right| \left| \sum_{s \in S} \sum_{l \in L} p(t; s, l) \log p(t; s, l) \right|,
\end{aligned}
$$
where $\pi(t)$, $t \in S$, are the components of $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$, the stationary distribution of $Q$. The first term on the right-hand side tends to zero by Theorem 1, the second by Equation (53), and the third by Lemma 3; hence, Equation (52) follows from Equation (54). □
The proof of Theorem 1 relies mainly on the construction of inequalities, while the proof of Theorem 2 uses the properties of norms. The two theorems show that the limit of the generalized entropy density is the generalized entropy rate.
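A rough Monte Carlo check of Theorem 2 (again a sketch with hypothetical matrices, not the authors' code) simulates one long path of an NHMM whose transition matrices converge to an irreducible $Q$, evaluates the generalized entropy density over the window $k = n+1, \ldots, n+f(n)$, and compares it with the limit $-\sum_t \pi(t) \sum_{s,l} p(t; s, l) \log p(t; s, l)$:

```python
import numpy as np

rng = np.random.default_rng(2)
Q = np.array([[0.7, 0.3], [0.4, 0.6]])            # limiting transition matrix q(t; s)
P = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])  # emission matrix p(s; l)
pi = np.array([4 / 7, 3 / 7])                     # stationary distribution of Q

def Q_of(k):
    """Hypothetical Q_k whose delayed averages converge to Q."""
    eps = 0.2 / k
    return np.array([[0.7 + eps, 0.3 - eps], [0.4, 0.6]])

n, f_n = 200, 20000
x = rng.choice(2, p=[0.5, 0.5])                   # X_0
logp = 0.0
for k in range(1, n + f_n + 1):
    x_new = rng.choice(2, p=Q_of(k)[x])           # X_k given X_{k-1}
    y_new = rng.choice(3, p=P[x_new])             # Y_k given X_k
    if k >= n + 1:                                # window k = n+1, ..., n+f(n)
        logp += np.log(Q_of(k)[x, x_new] * P[x_new, y_new])
    x = x_new

F = -logp / f_n          # generalized entropy density, omitting the vanishing initial term
QP = Q[:, :, None] * P[None, :, :]                # p(t; s, l) = q(t; s) p(s; l)
limit = -np.sum(pi[:, None, None] * QP * np.log(QP))
print(F, limit)          # the two values should be close for a long window
```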
Theorem 3. 
Let $\{X, Y\} = \{(X_n, Y_n), n \ge 0\}$ be an NHMM, let $Q = (q(t; s))_{N \times N}$ be another transition matrix, and assume that $Q$ is periodic and strongly ergodic. Let
$$\beta_n(t) = -\sum_{s \in S} \sum_{l \in L} q_n(t; s)\, p_n(s; l) \log \big[ q_n(t; s)\, p_n(s; l) \big],$$
$$\beta(t) = -\sum_{s \in S} \sum_{l \in L} q(t; s)\, p(s; l) \log \big[ q(t; s)\, p(s; l) \big],$$
where $\beta_n(t)$, $\beta(t)$, $t \in S$, are the elements of the column vectors $\beta_n$ and $\beta$, respectively, and assume that $\{\beta_n, n \ge 1\}$ are bounded. If Equation (22) holds,
$$\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \|Q_k - Q\| = 0,$$
and
$$\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \|\beta_k - \beta\| = 0,$$
then the generalized entropy rate of $\{X, Y\}$ exists, and
$$\lim_{n \to \infty} \frac{1}{f(n)} H(X_n, Y_n, \ldots, X_{n+f(n)}, Y_{n+f(n)}) = \pi \beta, \quad \text{a.s.},$$
where $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$ is the unique stationary distribution determined by $Q$.
Proof. 
Let $q^{(k-1)}$ be the row vector with elements $P(X_{k-1} = t)$, $t \in S$. Hence, using the definition and properties of $\beta_n$, we have
$$H(X_k, Y_k \mid X_{k-1}) = q^{(k-1)} \beta_k.$$
Take $B$ to be the constant matrix each of whose rows equals $\pi$. Note that $\pi = q^{(0)} B$, where $q^{(0)}$ is the initial distribution of the Markov chain. Since
$$\frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \left\| q^{(k-1)} - \pi \right\| = \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \left\| q^{(0)} Q^{(0, k-1)} - q^{(0)} B \right\| \le \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \left\| Q^{(0, k-1)} - B \right\|,$$
where $Q^{(0, k-1)} = Q_1 Q_2 \cdots Q_{k-1}$ and $Q^{(0, 0)} = I$ ($I$ is the identity matrix), by Equations (57) and (61) and Lemma 4, we have
$$\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \left\| q^{(k-1)} - \pi \right\| = 0.$$
By Equation (58) and Lemma 5, there exists a subsequence $\{\beta_{n_k}\}_{k=1}^{\infty}$ of $\{\beta_n\}_{n=1}^{\infty}$ such that
$$\lim_{k \to \infty} \|\beta_{n_k} - \beta\| = 0.$$
Hence, $\beta$ is finite. By Equation (62) and the properties of entropy, we have
$$
\begin{aligned}
&\left| \frac{1}{f(n)} H(X_n, Y_n, \ldots, X_{n+f(n)}, Y_{n+f(n)}) - \pi \beta \right| \\
&\quad = \left| \frac{1}{f(n)} H(X_n, Y_n) + \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} H(X_k, Y_k \mid X_{k-1}) - \pi \beta \right| \\
&\quad \le \left| \frac{1}{f(n)} H(X_n, Y_n) \right| + \left| \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} q^{(k-1)} \beta_k - \pi \beta \right| \\
&\quad \le \left| \frac{1}{f(n)} H(X_n, Y_n) \right| + \left| \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} q^{(k-1)} \beta_k - \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} q^{(k-1)} \beta \right| + \left| \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} q^{(k-1)} \beta - \pi \beta \right| \\
&\quad \le \frac{1}{f(n)} |H(X_n, Y_n)| + \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \|\beta_k - \beta\| + \|\beta\| \, \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \left\| q^{(k-1)} - \pi \right\| \\
&\quad \to 0 \quad \text{as } n \to \infty.
\end{aligned}
$$
This completes the proof of Theorem 3. □
Theorem 3 gives a method to compute the generalized entropy rate of an NHMM under some mild conditions, and when the model degenerates into a homogeneous HMM or a nonhomogeneous Markov chain, the corresponding results still hold. The following two corollaries are existing results, which indirectly confirm the correctness of this theorem.
Corollary 1. 
Let $\{X, Y\} = \{(X_n, Y_n), n \ge 0\}$ be an NHMM with a periodic and strongly ergodic transition matrix $Q = (q(t; s))_{N \times N}$ and an emission probability matrix $P = (p(s; l))_{N \times M}$, where $t, s \in S$, $l \in L$. Let
$$\beta(t) = -\sum_{s \in S} \sum_{l \in L} q(t; s)\, p(s; l) \log \big[ q(t; s)\, p(s; l) \big],$$
where $\beta(t)$, $t \in S$, are the elements of the column vector $\beta$. Assume that $\beta$ is bounded. If Equation (22) holds, then the generalized entropy rate of $\{X, Y\}$ exists, and
$$\lim_{n \to \infty} \frac{1}{f(n)} H(X_n, Y_n, \ldots, X_{n+f(n)}, Y_{n+f(n)}) = \pi \beta, \quad \text{a.s.},$$
where $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$ is the unique stationary distribution determined by $Q$.
Corollary 2. 
Let $X = \{X_n, n \ge 0\}$ be a nonhomogeneous Markov chain, let $Q = (q(t; s))_{N \times N}$ be another transition matrix, and assume that $Q$ is periodic and strongly ergodic, where $t, s \in S$. Let
$$\beta_n(t) = -\sum_{s \in S} q_n(t; s) \log q_n(t; s),$$
$$\beta(t) = -\sum_{s \in S} q(t; s) \log q(t; s),$$
where $\beta_n(t)$, $\beta(t)$, $t \in S$, are the elements of the column vectors $\beta_n$ and $\beta$, respectively. Assume that $\{\beta_n, n \ge 1\}$ are bounded. If Equation (22) holds,
$$\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \|Q_k - Q\| = 0,$$
and
$$\lim_{n \to \infty} \frac{1}{f(n)} \sum_{k=n+1}^{n+f(n)} \|\beta_k - \beta\| = 0,$$
then the generalized entropy rate of $X$ exists, and
$$\lim_{n \to \infty} \frac{1}{f(n)} H(X_n, \ldots, X_{n+f(n)}) = \pi \beta, \quad \text{a.s.},$$
where $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$ is the unique stationary distribution determined by $Q$.
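For a concrete feel of these limit formulas, the following sketch (hypothetical matrices, not the authors' code) evaluates $\pi \beta$ for the homogeneous HMM of Corollary 1 and compares it with the delayed-average block entropy $\frac{1}{f(n)} H(X_n, Y_n, \ldots, X_{n+f(n)}, Y_{n+f(n)})$ obtained from the chain-rule decomposition of Section 2:

```python
import numpy as np

def xlogx(p):
    """Elementwise p * log(p), with 0 * log 0 = 0."""
    return np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)

Q = np.array([[0.7, 0.3],          # hypothetical homogeneous transition matrix q(t; s)
              [0.4, 0.6]])
P = np.array([[0.6, 0.3, 0.1],     # hypothetical emission matrix p(s; l)
              [0.1, 0.3, 0.6]])
pi = np.array([4 / 7, 3 / 7])      # stationary distribution of Q

# beta(t) = -sum_{s,l} q(t; s) p(s; l) log[q(t; s) p(s; l)]   (Corollary 1)
beta = -np.sum(xlogx(Q[:, :, None] * P[None, :, :]), axis=(1, 2))
print("pi . beta =", pi @ beta)

# delayed-average block entropy (1/f(n)) H(X_n, Y_n, ..., X_{n+f(n)}, Y_{n+f(n)})
n, f_n = 100, 10000
q = np.array([0.5, 0.5])           # initial distribution of X_0
for k in range(1, n + 1):
    q = q @ Q                      # marginal distribution of X_n
H = -np.sum(xlogx(q[:, None] * P))                 # H(X_n, Y_n)
for k in range(n + 1, n + f_n + 1):
    H += q @ beta                                  # H(X_k, Y_k | X_{k-1}) = q^{(k-1)} beta
    q = q @ Q
print("block entropy per step =", H / f_n)         # approaches pi . beta
```

In this homogeneous setting the marginal $q^{(k-1)}$ converges to $\pi$, so the averaged conditional entropies converge to $\pi \beta$, in agreement with Corollary 1.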

5. Conclusions

Entropy ergodic theorems reflect the almost sure convergence of the entropy rate to a constant. The generalized entropy rate in the form of delayed averages can overcome the redundancy of initial information while ensuring stationarity and, therefore, has better practical value. An NHMM extends a homogeneous HMM by allowing the transition matrices of the hidden states to depend on time or on a set of observed covariates. Although there have been fruitful achievements in the two fields of generalized entropy ergodic theorems and NHMMs, research on generalized entropy ergodic theorems for NHMMs is limited. Therefore, we consider extending the application scenarios of the generalized entropy rate. In this paper, we give the basic concepts of NHMMs and the entropy rate in the form of delayed averages, list some lemmas including a strong limit theorem and limit properties of a norm, and prove some generalized entropy ergodic theorems with almost sure convergence for NHMMs. These results, which generalize previous findings of [17,27], provide concise formulas for the computation and estimation of the generalized entropy rate in the form of delayed averages. The theoretical analysis establishes the existence of the entropy rate and its limit formula, but its numerical calculation remains complex and challenging. Specifically, this process requires integration with respect to a certain measure that depends on the parameters of the HMM. Many methods have been proposed for this, such as approximate formulas, series expansions, and statistical calculation methods. However, approximate formulas are not applicable to arbitrary parameters and their accuracy is difficult to estimate, the convergence conditions of the series are difficult to prove, and the sample size required for statistical convergence is difficult to estimate. Therefore, approximating with upper and lower bounds is a reliable method: its accuracy can be estimated, and its requirements on the HMM parameters are relatively lenient. In the future, we will conduct numerical case studies based on the theoretical analysis above.

Author Contributions

Conceptualization, Q.Y. and L.C.; methodology, Q.Y. and W.C.; writing—original draft preparation, Q.Y. and L.C.; writing—review and editing, T.M.; funding acquisition, Q.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant number KYCX21_0358).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
HMM: Hidden Markov Model
NHMM: Nonhomogeneous Hidden Markov Model
AEP: Asymptotic Equipartition Property
a.s.: Almost surely

References

  1. Shannon, C. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  2. Bafroui, H.H.; Ohadi, A. Application of wavelet energy and Shannon entropy for feature extraction in gearbox fault detection under varying speed conditions. Neurocomputing 2014, 133, 437–445. [Google Scholar] [CrossRef]
  3. Li, C.; Lin, D.; Feng, B.; Lv, J.; Hao, F. Cryptanalysis of a chaotic image encryption algorithm based on information entropy. IEEE Access 2018, 6, 75834–75842. [Google Scholar] [CrossRef]
  4. Hasan, M.S.U.; Rai, A.K. Groundwater quality assessment in the Lower Ganga Basin using entropy information theory and GIS. J. Clean. Prod. 2020, 274, 123077. [Google Scholar] [CrossRef]
  5. Jacquet, P.; Seroussi, G.; Szpankowski, W. On the entropy of a hidden Markov process. Theor. Comput. Sci. 2008, 395, 203–219. [Google Scholar] [CrossRef] [PubMed]
  6. Feutrill, A.; Roughan, M. A review of Shannon and differential entropy rate estimation. Entropy 2021, 23, 1046. [Google Scholar] [CrossRef] [PubMed]
  7. Liu, W.; Yang, W.G. An extension of Shannon-McMillan theorem and some limit properties for nonhomogeneous Markov chains. Stoch. Proc. Appl. 1996, 61, 129–145. [Google Scholar]
  8. Yang, W.G. The Asymptotic Equipartition Property for a Nonhomogeneous Markov Information Source. Probab. Eng. Inform. Sci. 1998, 12, 509–518. [Google Scholar] [CrossRef]
  9. Ordentlich, E.; Weissman, T. On the optimality of symbol by symbol filtering and denoising. IEEE Trans. Inform. Theory 2006, 52, 19–40. [Google Scholar] [CrossRef]
  10. Travers, N.F. Exponential bounds for convergence of entropy rate approximations in hidden Markov models satisfying a path-mergeability condition. Stoch. Proc. Appl. 2014, 124, 4149–4170. [Google Scholar] [CrossRef]
  11. Jurgens, A.M.; Crutchfield, J.P. Shannon entropy rate of hidden Markov processes. J. Stat. Phys. 2021, 183, 32. [Google Scholar] [CrossRef]
  12. Zygmund, A. Trigonometric Series 1; Cambridge University Press: Cambridge, UK, 1959. [Google Scholar]
  13. Chow, Y.S. Delayed sums and Borel summability for independent, identically distributed random variables. Bull. Inst. Math. Acad. Sin. 1972, 1, 286–291. [Google Scholar]
  14. Lai, T.L. Limit theorems for delayed sums. Ann. Probab. 1974, 2, 432–440. [Google Scholar] [CrossRef]
  15. Gut, A.; Stadtmüller, U. On the strong law of large numbers for delayed sums and random fields. Acta Math. Hung. 2010, 129, 182–203. [Google Scholar] [CrossRef]
  16. Wang, Z.Z. A kind of asymptotic properties of moving averages for Markov chains in Markovian environments. Commun. Stat.-Theory Methods 2017, 46, 10926–10940. [Google Scholar]
  17. Wang, Z.Z.; Yang, W.G. The generalized entropy ergodic theorem for nonhomogeneous Markov chains. J. Theor. Probab. 2016, 29, 761–775. [Google Scholar] [CrossRef]
  18. Wang, Z.Z.; Yang, W.G. Markov approximation and the generalized entropy ergodic theorem for non-null stationary process. Proc. Indian-Math. Sci. 2020, 130, 1–13. [Google Scholar] [CrossRef]
  19. Shi, Z.; Zhu, X. The generalized AEP for higher order nonhomogeneous Markov information source. Res. Sq. 2023. [Google Scholar] [CrossRef]
  20. Baum, L.E.; Petrie, T. Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat. 1966, 37, 1554–1563. [Google Scholar] [CrossRef]
  21. Isaacson, D.; Madsen, R. Markov Chains Theory and Applications; Wiley: New York, NY, USA, 1976. [Google Scholar]
  22. Mor, B.; Garhwal, S.; Kumar, A. A systematic review of hidden Markov models and their applications. Arch. Comput. Methods Eng. 2021, 28, 1429–1448. [Google Scholar] [CrossRef]
  23. Bacha, R.E.; Zin, T.T. A Markov Chain Approach to Big Data Ranking Systems. In Proceedings of the 2017 IEEE 6th Global Conference on Consumer Electronics (GCCE), Nagoya, Japan, 24–27 October 2017; pp. 1–2. [Google Scholar]
  24. Yang, G.Q.; Yang, W.G.; Wu, X.T. The strong laws of large numbers for countable nonhomogeneous hidden Markov models. Commun. Stat.-Theory Methods 2017, 46, 8808–8819. [Google Scholar] [CrossRef]
  25. Zhang, L.; Cai, B.; Tan, T.; Shi, Y. Stabilization of non-homogeneous hidden semi-Markov jump systems with limited sojourn-time information. Automatica 2020, 117, 108963. [Google Scholar] [CrossRef]
  26. Shahzadi, A.; Wang, T.; Bebbington, M.; Parry, M. Inhomogeneous hidden semi-Markov models for incompletely observed point processes. Ann. Inst. Stat. Math. 2023, 75, 253–280. [Google Scholar] [CrossRef]
  27. Yang, W.G. Convergence in the Cesàro sense and strong law of large numbers for countable nonhomogeneous Markov chains. Linear Algebra Appl. 2002, 354, 275–286. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
