A Metric Based on the Efficient Determination Criterion

This paper extends the concept of metrics based on the Bayesian information criterion (BIC) to achieve strongly consistent estimation of partition Markov models (PMMs). We introduce a set of metrics drawn from the family of model selection criteria known as efficient determination criteria (EDC). This generalization extends the range of options available in the BIC for penalizing the number of model parameters. We formally specify the relationship that determines how the EDC works when selecting a model based on a threshold associated with the metric. Furthermore, we expand the penalty options within the EDC, identifying the penalty ln(ln(n)) as a viable choice that maintains the strongly consistent estimation of a PMM. To demonstrate the utility of these new metrics, we apply them to the modeling of three DNA sequences of dengue virus type 3, endemic in Brazil in 2023.


Introduction
This article embarks on an exploration of the efficient determination criterion (EDC), introduced in [1], with a particular emphasis on formulating an EDC-based metric. Our endeavor is bolstered by the existence of a metric based on the Bayesian information criterion (BIC), proposed in [2] and designed to provide consistent estimation of partition Markov models [2]. Our aim is to extend the scope of the BIC-based metric, thereby broadening the array of algorithms available for identifying partition Markov models.
To achieve our goal, we provide a theoretical framework delineating the operational principles underlying the BIC/EDC and the BIC-based metric, and we conduct a brief survey of the current research landscape within this domain to provide context for our approach.
Let (X_t) be a discrete-time Markov chain of order o on a finite and discrete alphabet ∆, with o < ∞; let us call Ω = ∆^o the state space. Denote the string a_k a_{k+1} ... a_m by a_k^m, where a_i ∈ ∆, k ≤ i ≤ m. For each a ∈ ∆ and s ∈ Ω, the transition probability from the state s to a is

P(a|s) = Prob(X_t = a | X_{t−o}^{t−1} = s).    (1)

Given the previous notation, we appeal to a model for (X_t) which allows a more efficient estimation of the transition probabilities introduced by Equation (1); see [2].
Definition 1. Let (X_t) be a discrete-time Markov chain of order o on a finite and discrete alphabet ∆, o < ∞. Two states s, r ∈ Ω = ∆^o are equivalent (denoted by s ∼_p r) if P(a|s) = P(a|r) ∀a ∈ ∆.
For any s ∈ Ω, the equivalence class of s is given by the set of states {r ∈ Ω : r ∼_p s}.
The previous notion allows the definition of a Markov chain with minimal partition P, that is, one which follows the equivalence relationship.

Definition 2. Let (X_t) be a discrete-time Markov chain of order o on a finite and discrete alphabet ∆, o < ∞, and let P = {Γ_1, Γ_2, ..., Γ_|P|} be a partition of Ω = ∆^o; (X_t) is a Markov chain with minimal partition P if P is defined by the relationship ∼_p introduced in Definition 1.
As previously indicated, the objective of this model is to allow a more efficient estimation of the probabilities introduced by Equation (1). This is achieved in the most efficient way possible by identifying the parts of the minimal partition (Definition 2), so that all the states contained in each part can be used to estimate a single probability per part,

P(a|Γ) = P(a|s), for any s ∈ Γ, a ∈ ∆.    (2)

To identify the partition P introduced in Definition 2, a strategy must be implemented, as shown below.
Given a sample x_1^n of size n from the stochastic process (X_t) under the assumptions of Definition 2, for a state s ∈ Ω and an element of the alphabet a ∈ ∆, denote by N_n(s, a) the number of occurrences of s followed by a in the sample x_1^n, and by N_n(s) = Σ_{a∈∆} N_n(s, a) the number of occurrences of s in the sample x_1^n. Also, given a partition P of Ω and a part Γ of P, denote the number of occurrences of elements of Γ followed by a by N_n(Γ, a) = Σ_{s∈Γ} N_n(s, a), and the accumulated number of the values N_n(s) for s ∈ Γ by N_n(Γ) = Σ_{s∈Γ} N_n(s). Note that N_n(Γ, a) and N_n(Γ) can be computed for any partition P of Ω, not only for the partition introduced by Definition 2. These counts allow the estimation of the probabilities (Equation (2)), subject to a modification of the likelihood function of the sample. The likelihood of the sample is

L(x_1^n; P) = ∏_{Γ∈P} ∏_{a∈∆} P(a|Γ)^{N_n(Γ, a)},    (3)

then, the maximum of the modified log-likelihood is

ML(x_1^n; P) = Σ_{Γ∈P} Σ_{a∈∆} N_n(Γ, a) ln( N_n(Γ, a) / N_n(Γ) ).    (4)

And

P̂(a|Γ) = N_n(Γ, a) / N_n(Γ)    (5)

is the maximum likelihood estimator of P(a|Γ) given in Equation (2).
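For concreteness, the counts N_n(s, a), N_n(Γ, a) and the maximum likelihood estimator of P(a|Γ) can be computed directly from a sample; the sketch below is a minimal illustration (the function names are ours, not from [2]):

```python
from collections import Counter

def state_counts(x, o):
    """N_n(s, a): occurrences of state s (a length-o string) followed by symbol a."""
    counts = Counter()
    for t in range(o, len(x)):
        counts[(x[t - o:t], x[t])] += 1
    return counts

def part_counts(counts, part, alphabet):
    """N_n(Γ, a) and N_n(Γ) for a part Γ, given as a set of states."""
    n_gamma_a = {a: sum(counts[(s, a)] for s in part) for a in alphabet}
    n_gamma = sum(n_gamma_a.values())
    return n_gamma_a, n_gamma

def mle(counts, part, alphabet):
    """Maximum likelihood estimator of P(a|Γ): N_n(Γ, a) / N_n(Γ)."""
    n_gamma_a, n_gamma = part_counts(counts, part, alphabet)
    return {a: n_gamma_a[a] / n_gamma for a in alphabet}

# toy usage: a binary sequence with memory o = 2, pooling the part {01, 10}
x = "0110100110010110"
c = state_counts(x, 2)
p = mle(c, {"01", "10"}, "01")
```

Pooling the counts of all states in a part before normalizing is exactly what makes the estimation more efficient: every state in Γ contributes to a single estimated distribution.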
As shown in [2], under the assumptions of Definition 2, the partition P can be retrieved with strong consistency using the Bayesian information criterion (BIC), defined as

BIC(x_1^n, P) = Σ_{Γ∈P} Σ_{a∈∆} N_n(Γ, a) ln( N_n(Γ, a) / N_n(Γ) ) − α (|∆| − 1)|P| ln(n),    (6)

with α > 0 a constant value. The BIC thus takes into consideration the maximum of the modified log-likelihood term penalized by α (|∆| − 1)|P| ln(n), where (|∆| − 1)|P| is the number of probabilities to be estimated.
In practice, candidate partitions according to Definition 2 are compared, and the partition with the higher BIC value is considered more suitable. Also, in [2], a metric based on the BIC criterion is introduced, along with clustering algorithms used to obtain P; the metric is defined below. To achieve consistent estimation, such a metric operates on partitions of the state space that follow certain rules; the metric is then able to refine partitions until it identifies the one cited in Definition 2. The partitions to which we will apply the metric are made up of members (parts) formed by states sharing all their transition probabilities. The following definition formalizes the concept.

Definition 3. Let (X_t) be a Markov chain of order o, with finite and discrete alphabet ∆, o < ∞, and state space Ω = ∆^o. Set a partition of Ω, P = {Γ_1, ..., Γ_|P|}:
i. given a part Γ of P, Γ is a good part if ∀a ∈ ∆, P(a|s) = P(a|r), ∀r, s ∈ Γ, r ≠ s;
ii. P is a good partition of Ω if every Γ ∈ P satisfies i.
Under the validity of Definition 3-i, the probabilities introduced by Equation (2) are well defined, since all the elements of the good part Γ of P share the transition probabilities. Note that the partition identified by Definition 2 verifies Definition 3-ii, but the reciprocal is naturally not valid. A straightforward example of a good partition is the one in which every state is isolated.
The following introduces a notion used to estimate the minimal partition (Definition 2). This criterion operates on good parts (Definition 3-i).

Definition 4. Let (X_t) be a Markov chain of order o, with finite and discrete alphabet ∆, o < ∞, and state space Ω = ∆^o; x_1^n is a sample of the process, P = {Γ_1, ..., Γ_|P|} is a good partition of Ω, and 1 ≤ i, j ≤ |P|, i ≠ j. With α a constant and positive value,

d_P(i, j) = Σ_{a∈∆} [ N_n(Γ_i, a) ln P̂(a|Γ_i) + N_n(Γ_j, a) ln P̂(a|Γ_j) − N_n(Γ_i ∪ Γ_j, a) ln P̂(a|Γ_i ∪ Γ_j) ] / ( α (|∆| − 1) ln(n) ).

In [2], it is proved that d_P of Definition 4 is a metric, meaning that, for Γ_l ∈ P, l ∈ {i, j, k}:
i. d_P(i, j) ≥ 0, with equality if and only if N_n(Γ_i, a)/N_n(Γ_i) = N_n(Γ_j, a)/N_n(Γ_j) ∀a ∈ ∆;
ii. d_P(i, j) = d_P(j, i);
iii. d_P(i, j) ≤ d_P(i, k) + d_P(k, j).

As a consequence of property i, the ability of d_P to operate adequately depends on the accuracy of the maximum likelihood estimation of the transition probabilities P(a|Γ_i) and P(a|Γ_j), ∀a ∈ ∆. That is, when the estimators N_n(Γ_i, a)/N_n(Γ_i) and N_n(Γ_j, a)/N_n(Γ_j), ∀a ∈ ∆, are near each other and the sample size n is large enough, we have evidence of proximity between P(a|Γ_i) and P(a|Γ_j), ∀a ∈ ∆; such a finding indicates that the elements of both parts should be placed together.
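For intuition, the BIC-based distance between two good parts can be computed directly from the counts N_n(Γ, a). The sketch below follows our reading of Definition 4 (the normalization constant should be checked against [2]; the function names are ours):

```python
import math

def log_lik(n_gamma_a):
    """Modified log-likelihood of one part: sum_a N(Γ,a) ln(N(Γ,a)/N(Γ))."""
    n_gamma = sum(n_gamma_a.values())
    return sum(c * math.log(c / n_gamma) for c in n_gamma_a.values() if c > 0)

def d_metric(n_i, n_j, alpha, n, alphabet_size):
    """BIC-based distance between parts Γ_i and Γ_j, given their counts N(Γ,a)."""
    union = {a: n_i.get(a, 0) + n_j.get(a, 0) for a in set(n_i) | set(n_j)}
    num = log_lik(n_i) + log_lik(n_j) - log_lik(union)
    return num / (alpha * (alphabet_size - 1) * math.log(n))

# two parts with identical empirical transition distributions have distance 0
d0 = d_metric({"a": 10, "c": 10}, {"a": 5, "c": 5},
              alpha=2, n=1000, alphabet_size=4)
```

The numerator is the loss in the modified log-likelihood caused by merging the two parts; it is zero exactly when the two empirical transition distributions coincide, matching property i above.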
Partition Markov models, commonly referred to as those delineated by Definition 2, have found application in diverse realms. For instance, they have been employed in data compression in conjunction with Huffman coding, as exemplified in [3]. Across these investigations, the utilization of the BIC-based metric d_P has proven indispensable. Also, in [2], this metric was pivotal for modeling the behavior of internet users: the partition Markov model identifies the chances of a user visiting a certain internet site in their next step, based on their history, and identifies equivalent histories in the sense introduced by Definition 2.
Since the support of d_P is the BIC criterion, the question arises whether there is a broader criterion than the BIC capable of maintaining strong consistency in the estimation of P. The next section shows that such a criterion exists (a generalization of the BIC), as proved in [4]. The subsequent question that we propose to answer is whether such a generalization of the BIC allows the creation of a metric that generalizes the one introduced in [2].
The next section (Section 2) addresses the problem by introducing the efficient determination criterion and presenting how this criterion is linked to a metric, together with a cut-off point that enables the practical use of the EDC-based metric for sufficiently large values of n. Section 3 shows an application in which different fits of the model (Definition 2) are compared, inferred by the variants of the efficient determination criterion recommended in Section 2. This article ends with the Conclusions (Section 4), in which we highlight the main contributions, and the Bibliography section.

Efficient Determination Criterion
Ref. [1] proposes a criterion generalizing the BIC criterion: the efficient determination criterion (EDC). In that paper, the proposal is to introduce a sequence {w_n}_{n≥1} in place of {ln(n)}_{n≥1}; see Equation (6). The generalization also offers more options in the penalty term of Equation (6): instead of the number of parameters, a function γ(·) acting on the number of parameters is introduced; this function is strictly increasing in the number of parameters. Under the assumptions of Definition 2, the criterion is formulated as follows:

EDC(x_1^n, P) = Σ_{Γ∈P} Σ_{a∈∆} N_n(Γ, a) ln( N_n(Γ, a) / N_n(Γ) ) − α γ( (|∆| − 1)|P| ) w_n,    (8)

with α > 0 a constant value, γ(·) a strictly increasing function, and {w_n} a sequence of positive numbers depending on n. As with the BIC, candidate partitions according to Definition 2 are compared, and the higher the EDC, the more suitable the partition. Note that if we choose γ(·) as the identity function, γ(x) = x, and w_n = ln(n), then Equation (6) is recovered; clearly, the EDC criterion is a generalization of the BIC criterion.
Ref. [4] proves that the EDC criterion provides a strongly consistent way to estimate the partition P of Definition 2 if

w_n / n → 0 and w_n / ln(ln(n)) → ∞, as n → ∞.    (9)

Note that if we take w_n = n^a for a ∈ (0, 1), the conditions given in Equation (9) are valid. We can also use w_n = a ln(n) for a > 0, or w_n = n^a ln(n) for a ∈ (0, 1). Figure 1 shows penalty functions w_n verifying Equation (9). We see in the figure that these functions are positioned between n and ln(ln(n)); between n and ln(ln(n)) is also the w_n related to the BIC criterion (w_n = ln(n)).
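A minimal sketch of how the EDC of Equation (8) can be evaluated for a candidate partition, with a pluggable penalty w_n and function γ (all names and defaults are ours, for illustration only):

```python
import math

def edc(parts_counts, w_n, alpha=2.0, gamma=lambda k: k, alphabet_size=4):
    """EDC(x_1^n, P): modified max log-likelihood minus alpha * gamma(k) * w_n,
    where k = (|∆| - 1)|P| is the number of probabilities to estimate.
    parts_counts: one dict {symbol: N_n(Γ, a)} per part Γ of the partition P."""
    ml = 0.0
    for n_gamma_a in parts_counts:
        n_gamma = sum(n_gamma_a.values())
        ml += sum(c * math.log(c / n_gamma) for c in n_gamma_a.values() if c > 0)
    k = (alphabet_size - 1) * len(parts_counts)
    return ml - alpha * gamma(k) * w_n

# the candidate penalties discussed in the text
penalties = {
    "n^(1/2)":   lambda n: n ** 0.5,
    "n^(1/3)":   lambda n: n ** (1 / 3),
    "BIC ln(n)": lambda n: math.log(n),
    "ln(ln(n))": lambda n: math.log(math.log(n)),
}
```

For a fixed partition, a lighter penalty (e.g. ln(ln(n))) yields a larger criterion value than a heavier one (e.g. n^(1/2)), which is why lighter penalties tolerate partitions with more parts.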
Clearly, the penalty ln(ln(n)) does not verify the second condition of Equation (9), but according to [5] it is an optimal penalty term for estimating the order of a Markov chain. With such inspiration in mind, the following proposition guarantees that ln(ln(n)) can also be used to obtain a consistent estimate of P. To state the proposition, we introduce the notion of relative entropy.

Definition 5. Given two probability distributions P(·) and Q(·) on ∆, the relative entropy between them is D(P(·)||Q(·)) = Σ_{a∈∆} P(a) ln( P(a)/Q(a) ), with the convention 0 ln 0 = 0.

Proposition 1. Let (X_t) be a Markov chain of order o, with finite and discrete alphabet ∆, o < ∞, and state space Ω = ∆^o; x_1^n is a sample of the process, P = {Γ_1, ..., Γ_|P|} is a partition of Ω, and P(·|Γ) is the probability given by Equation (2) related to a good part Γ (Definition 3-i). For any δ > 0 there exists κ > 0 (depending on P(·|·)) such that, eventually, almost surely as n → ∞,

D( N_n(Γ, ·)/N_n(Γ) || P(·|Γ) ) < δ ln(ln(n)) / N_n(Γ),

for every good part Γ with N_n(Γ) ≥ 1 and o < κ ln(ln(n)).
Proof. From the proof of Corollary 2 of [6] (on page 1621), we obtain that for any ϵ > 0 there is κ > 0 (depending on P(·|·)) such that, eventually, almost surely as n → ∞,

D( N_n(s, ·)/N_n(s) || P(·|s) ) < ϵ ln(ln(n)) / N_n(s),    (10)

for all s ∈ Ω with N_n(s) ≥ 1 and o < κ ln(ln(n)).
Consider δ > 0 and set ϵ = δ/|∆|^{2o} in Equation (10). Because Γ is a good part of P, for s ∈ Γ we have P(a|Γ) = P(a|s), and following Equations (3), (4) and (7), the bound of the proposition is obtained. □

The next results show that, despite ln(ln(n)) violating the second condition imposed by Equation (9), the EDC (with w_n = ln(ln(n))) provides a consistent estimate of the minimal partition.

Theorem 1. Let (X_t) be a Markov chain of order o, with finite and discrete alphabet ∆, o < ∞, and state space Ω = ∆^o; x_1^n is a sample of the process, P = {Γ_1, ..., Γ_|P|} is a partition of Ω, and suppose that there exist i and j, i ≠ j, such that Γ_i and Γ_j follow Definition 3-i. Then, P(a|Γ_i) = P(a|Γ_j) ∀a ∈ ∆ if, and only if, eventually, almost surely as n → ∞, EDC(x_1^n, P) < EDC(x_1^n, P_ij), where EDC(x_1^n, P) is defined by Equation (8) with w_n = ln(ln(n)), and EDC(x_1^n, P_ij) is given by Equation (8) (with w_n = ln(ln(n))) over the partition P_ij, obtained from P by merging Γ_i and Γ_j into a single part.

Proof. The proof is a variant of the one presented in [2], Theorem 1. The implication ⇐ is direct from that proof, considering (i) ln(ln(n))/n → 0 instead of ln(n)/n → 0, as n → ∞, and (ii) that γ(·) is an increasing function. For ⇒, we have P(a|Γ_i) = P(a|Γ_j), ∀a ∈ ∆, and we want to prove that EDC(x_1^n, P) − EDC(x_1^n, P_ij) < 0. Again, following the steps of that proof, we obtain that EDC(x_1^n, P) − EDC(x_1^n, P_ij) is bounded above by an expression involving the relative entropy D(P(·)||Q(·)) given by Definition 5. For each Γ ∈ {Γ_i, Γ_j}, N_n(Γ, ·)/N_n(Γ) and P(·|Γ) are probabilities on ∆; then, Equation (11) follows from Lemma 6.3 in [7]. On the other hand, since each Γ ∈ {Γ_i, Γ_j} is a good part by hypothesis, from Proposition 1, for any δ > 0 and large enough n, Equation (12) follows. Then, set c_0 = γ((|∆| − 1)|P|) − γ((|∆| − 1)(|P| − 1)), which is > 0 since γ(·) is a strictly increasing function. For any δ > 0 and large enough n, the difference is negative, where p = min{P(a|Γ) : a ∈ ∆, Γ ∈ {Γ_i, Γ_j}}. In particular, taking δ < p c_0 / (2|∆|), for a large enough n, EDC(x_1^n, P) − EDC(x_1^n, P_ij) < 0. □
As a result of the previous theorem, it turns out that it is possible to guarantee that the EDC with the penalty term w_n = ln(ln(n)) allows the consistent estimation of the minimal partition. As a consequence, we have:

Corollary 1. Let (X_t) be a Markov chain of order o, with finite and discrete alphabet ∆, o < ∞, and state space Ω = ∆^o; x_1^n is a sample of the process. Let Ψ be the set of all the partitions of Ω. Define

P*_n = argmax_{P∈Ψ} EDC(x_1^n, P),

where EDC(x_1^n, P) is defined by Equation (8), with w_n = ln(ln(n)). Then, eventually, almost surely as n → ∞, P* = P*_n, where P* is the partition of Ω following Definition 2.
Proof. Follow the same steps as the proof of Theorem 3 of [2]. It is enough to replace the BIC criterion with the EDC criterion (Equation (8)) with w_n = ln(ln(n)), and to apply our Theorem 1 in place of Theorem 1 and Corollary 1 of [2]. □
Corollary 1 complements the results of [4], showing that the minimal partition (Definition 2) is consistently recovered by the EDC (Equation (8)) when it is formulated with a strictly increasing function γ and w_n following Equation (9), or with w_n = ln(ln(n)).
In order to generalize the BIC-based metric d_P, given by Definition 4, the following notion is introduced.

Definition 6. Let (X_t) be a Markov chain of order o, with finite and discrete alphabet ∆, o < ∞, and state space Ω = ∆^o; x_1^n is a sample of the process, P = {Γ_1, ..., Γ_|P|} is a good partition of Ω, and 1 ≤ i, j ≤ |P|, i ≠ j. With α a constant and positive value, γ(·) a strictly increasing function, and {w_n} a sequence of positive numbers depending on n,

δ_P(i, j) = Σ_{a∈∆} [ N_n(Γ_i, a) ln P̂(a|Γ_i) + N_n(Γ_j, a) ln P̂(a|Γ_j) − N_n(Γ_i ∪ Γ_j, a) ln P̂(a|Γ_i ∪ Γ_j) ] / [ α ( γ((|∆| − 1)|P|) − γ((|∆| − 1)(|P| − 1)) ) w_n ].
It is evident that if we take γ as the identity function and w_n = ln(n), then δ_P coincides with the BIC-based metric d_P of Definition 4. The next result shows the relationship between the EDC criterion and the notion introduced in Definition 6.
Theorem 2. Let (X_t) be a Markov chain of order o, with finite and discrete alphabet ∆, o < ∞, and Ω = ∆^o; x_1^n is a sample of the process. Let P = {Γ_1, ..., Γ_|P|} be a good partition of Ω, and 1 ≤ i, j ≤ |P|, i ≠ j. Then,

δ_P(i, j) < 1 if, and only if, EDC(x_1^n, P) < EDC(x_1^n, P_ij),

where δ_P(i, j) is given by Definition 6, EDC(x_1^n, P) is defined by Equation (8), and EDC(x_1^n, P_ij) is given by Equation (8) over the partition P_ij, obtained from P by merging Γ_i and Γ_j into a single part.

Remark 1. In order to guarantee the consistent estimation of the partition given by Definition 2, we note that Theorem 2 must be used for a large enough n and with weights w_n following Equation (9) or w_n = ln(ln(n)).
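Theorem 2 turns each candidate merge into a threshold test on the metric: merging Γ_i and Γ_j increases the EDC exactly when δ_P(i, j) < 1. A hedged sketch of this decision step, using our reconstruction of Definition 6 (function names and the exact normalization are ours):

```python
import math

def log_lik(n_gamma_a):
    """Modified log-likelihood contribution of one part."""
    n_gamma = sum(n_gamma_a.values())
    return sum(c * math.log(c / n_gamma) for c in n_gamma_a.values() if c > 0)

def delta(n_i, n_j, size_p, w_n, alpha=2.0, gamma=lambda k: k, A=4):
    """delta_P(i, j): likelihood loss of the merge over the penalty saving,
    where size_p = |P| and A = |∆|."""
    union = {a: n_i.get(a, 0) + n_j.get(a, 0) for a in set(n_i) | set(n_j)}
    num = log_lik(n_i) + log_lik(n_j) - log_lik(union)
    den = alpha * (gamma((A - 1) * size_p) - gamma((A - 1) * (size_p - 1))) * w_n
    return num / den

# nearly identical empirical distributions: the test indicates a merge
n_i = {"a": 40, "c": 38, "g": 41, "t": 39}
n_j = {"a": 42, "c": 37, "g": 40, "t": 41}
w_n = math.log(math.log(10000))            # the ln(ln(n)) penalty of Corollary 1
merge = delta(n_i, n_j, size_p=5, w_n=w_n) < 1
```

The threshold 1 is the point at which the likelihood lost by pooling the two parts exactly offsets the penalty saved by dropping one part, which is the content of Theorem 2.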
The following theorem characterizes the notion given by Definition 6 as being a metric.
Theorem 3. Let (X_t) be a Markov chain of order o over a finite and discrete alphabet ∆, o < ∞, Ω = ∆^o the state space, and x_1^n a sample of the Markov process. If P = {Γ_1, ..., Γ_|P|} is a good partition of Ω, for each n and for any i, j, k ∈ {1, 2, ..., |P|}, given δ_P as in Definition 6:
i. δ_P(i, j) ≥ 0, with equality if and only if N_n(Γ_i, a)/N_n(Γ_i) = N_n(Γ_j, a)/N_n(Γ_j) ∀a ∈ ∆;
ii. δ_P(i, j) = δ_P(j, i);
iii. δ_P(i, j) ≤ δ_P(i, k) + δ_P(k, j).

Application

In this section, we model three complete DNA sequences of dengue virus type 3 (DENV-3), endemic in Brazil and available in early 2023 (https://www.ncbi.nlm.nih.gov/, accessed on 10 March 2024). We then proceed to compare the models derived from these sequences by applying the metric (Definition 6) and employing the agglomerative algorithm. Our analysis focuses on observing the variations in partition composition and probability magnitudes as we change the penalization term w_n.
According to [10], the genesis of the initial autochthonous case of DENV-3 (GIII-American-I lineage) in Brazil dates back to December 2000, specifically within Rio de Janeiro. Over the course of the 2000s, multiple incursions of this lineage were documented from the Caribbean into Brazil, and the northern and southeastern regions of Brazil swiftly emerged as the epicenters of dissemination. The advent of this lineage precipitated a significant dengue outbreak in Rio de Janeiro in 2002, followed by subsequent outbreaks in diverse locales.
However, since 2010, publicly available data indicate a downward tendency in the prevalence of DENV-3; it has represented a mere fraction (<1%) of the total dengue cases in Brazil, with scant confirmed instances reported. Consequently, the transmission of DENV-3 has not been substantiated in recent years, pointing to a potential extinction of the DENV-3 (GIII-American-I lineage) within Brazil. The resurgence of DENV-3 is a real challenge in Brazil, since the population is not expected to have immunity, given how long this virus has been absent from the region.
Table 1 shows the GenBank accession numbers, collection dates, and origins of the three sequences, introduced by [10]. The records correspond to three complete genetic sequences in FASTA format (alphabet ∆ = {a,c,g,t}) of DENV-3, which is already native to Brazil. We assume that each of these sequences is a sample of a process that meets Definition 2, and we proceed to fit the model (Definition 2) using the metric (Definition 6) and the agglomerative algorithm. For this, we take into account the alphabet ∆ = {a,c,g,t}, with cardinal |∆| = 4, in which the sequences take their values. In Table 2, we show the frequencies of each element of the alphabet. Considering that min{10,697, 10,511, 10,553} = 10,511 and log_|∆|(10,511) = 6.68, with integer part equal to 6, we adopt o = 3, since 3 < 6 and the elements of the genetic alphabet ∆ are organized in multiples of 3.
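The arithmetic behind the choice of the memory o can be checked in a few lines (sequence lengths as reported in the text):

```python
import math

# sequence lengths reported in the text for the three GenBank records
seq_lengths = {"OQ706226": 10697, "OQ706227": 10511, "OQ706228": 10553}

n_min = min(seq_lengths.values())   # 10,511
upper = math.log(n_min, 4)          # log_|∆|(10,511) with |∆| = 4, about 6.68
max_order = int(upper)              # integer part: 6

# o = 3 is admissible (3 < 6) and matches the codon length of the
# genetic alphabet {a, c, g, t}
o = 3
```

The bound keeps the state space ∆^o small enough that every one of the |∆|^o = 64 states can be observed often enough in the shortest sequence to estimate its transition probabilities.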
We fit four scenarios for each of the three sequences OQ706226, OQ706227, and OQ706228, each scenario governed by a different penalty w_n. All of them are considered in Definition 6, with ∆ = {a,c,g,t}, o = 3, α = 2 (see [11]), and γ the identity function. For each penalty, we identify, using the metric (Definition 6), the estimate of the partition given by Definition 2, and then determine the transition probabilities of each part for each element of ∆. We denote by Γ_i^v the part i estimated for sequence v, where v can be A, B, C, corresponding to OQ706226, OQ706227, and OQ706228, respectively. Tables 3 and 4 record the results for the three sequences with penalty w_n = n^{1/2}; Tables 5 and 6 report the results with penalty w_n = n^{1/3}; and Tables 7 and 8 show the results using the usual BIC penalty (w_n = ln(n)). Finally, Tables 9-11 report the results with the penalty w_n = ln(ln(n)) (see Corollary 1): Table 9 for sequence OQ706226, Table 10 for sequence OQ706227, and Table 11 for sequence OQ706228. Tables 4, 6, and 8 show, in bold, the highest transition probability per part (Equation (2), estimated by Equation (5)), from top to bottom for the sequences A (GenBank OQ706226), B (GenBank OQ706227), and C (GenBank OQ706228), with the full estimated partitions displayed in Tables 3, 5, and 7, respectively. We observe from Tables 3, 5, 7, and 9 (right)-11 (right) that as the penalty w_n is reduced (that is, as w_n approaches the lower limit ln(ln(n))), the model is allowed to acquire more parameters, in this case, more parts.

Given a penalization w_n, the three sequences, A (GenBank OQ706226), B (GenBank OQ706227), and C (GenBank OQ706228), show a similar number of parts. More specifically, for the penalty w_n = n^{1/2}, the behavior of the three sequences is represented by two parts (Table 3); for w_n = n^{1/3}, it is described by four parts (Table 5). For w_n = ln(n), OQ706226 is modeled by five parts while the other two are modeled by six parts (see Table 7). For the penalty w_n = ln(ln(n)), OQ706226 is modeled by a partition with 13 parts (see Table 9, right) while the other two sequences are modeled by 14 parts; see Tables 10 (right) and 11 (right). The formal determination of whether the identified models, under each penalty, exhibit significant differences lies beyond the scope of this application; however, we acknowledge it as an open question worthy of further exploration.
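For reference, the agglomerative strategy used throughout can be sketched as follows: start from the finest good partition (each state isolated) and repeatedly merge the closest pair of parts while the distance stays below the threshold of 1. This is a schematic only; the distance follows our reading of Definition 6 with γ the identity, and all names are ours:

```python
import math

def log_lik(n_gamma_a):
    n_gamma = sum(n_gamma_a.values())
    return sum(c * math.log(c / n_gamma) for c in n_gamma_a.values() if c > 0)

def agglomerate(counts_by_state, w_n, alpha=2.0, A=4):
    """Greedy agglomeration. counts_by_state maps each state s to its dict
    {symbol: N_n(s, a)}; parts are kept as (list of states, pooled counts)."""
    parts = [([s], dict(c)) for s, c in counts_by_state.items()]
    while len(parts) > 1:
        best = None
        for i in range(len(parts)):
            for j in range(i + 1, len(parts)):
                ci, cj = parts[i][1], parts[j][1]
                union = {a: ci.get(a, 0) + cj.get(a, 0)
                         for a in set(ci) | set(cj)}
                num = log_lik(ci) + log_lik(cj) - log_lik(union)
                d = num / (alpha * (A - 1) * w_n)  # delta with identity gamma
                if best is None or d < best[0]:
                    best = (d, i, j, union)
        d, i, j, union = best
        if d >= 1:   # no merge would increase the criterion (Theorem 2)
            break
        states = parts[i][0] + parts[j][0]
        parts = [p for k, p in enumerate(parts) if k not in (i, j)]
        parts.append((states, union))
    return [p[0] for p in parts]
```

Each iteration merges the pair with the smallest distance; the procedure stops when every remaining pair is at distance at least 1, i.e., when no merge would increase the criterion.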
The following observation applies to all three sequences. Observing the magnitudes of the transition probabilities, marked in bold in Tables 6, 8, and 9 (left)-11 (left), we note a predominant number of parts whose prevalence is the transition to element a of the alphabet {a,c,g,t}, followed by parts that indicate a prevalence for element g. As for Table 4, which reports the most penalized case (w_n = n^{1/2}), one part is recorded with prevalence for a and another with prevalence for g, which is natural, since the model has only two parts. As Tables 9-11 show, under the penalty w_n = ln(ln(n)), the three sequences share the same part {agc,ggt,gag,ata}, with a prevalence for the element t of the alphabet ∆, of lower magnitude than those previously mentioned.

Conclusions
The main objective of this paper, developed in Section 2, is to introduce a new notion based on Equation (8), as given in Definition 6. This concept is used to identify the minimal partition of a Markov chain (Definition 2). Theorem 3 proves that the concept in Definition 6 constitutes a metric. Furthermore, Theorem 2 establishes the relationship between this new metric and the operation of the EDC criterion, showing that, in an iterative process, selecting a partition with a higher EDC value is equivalent to using the value 1 as a threshold for the metric. In this way, we achieve our main goal of proposing an EDC-based metric to estimate the minimal partition.
Our results add to those of [4] in the search to characterize penalty terms that can be used in the EDC criterion to obtain consistent estimation of the minimal partition. Ref. [4] demonstrates that the EDC, under certain conditions on the term w_n (Equation (9)), provides a strongly consistent estimate of the minimal partition, as defined in Definition 2. Building on the results from [5], we conjectured that using w_n = ln(ln(n)) might preserve strong consistency, even though this term does not satisfy the second condition imposed by Equation (9). We confirm in Theorem 1 and Corollary 1 that strong consistency is indeed achieved using the EDC with the penalization term w_n = ln(ln(n)).
We conclude the article with an application demonstrating the effect of the metric introduced in Definition 6 on estimating the minimal partition (Definition 2), using the various penalty terms discussed in Remark 1. For this purpose, we analyze three dengue virus type 3 sequences, native to Brazil and collected in 2023, in FASTA format. The application shows that relaxing the penalty results in higher cardinalities for the estimated partition. We also identify which parts (collections of states) of the dengue sequences have a greater or lesser preference for transitioning to the next element (a, c, g, or t) of the alphabet ∆ = {a, c, g, t}.
As expected, for each fixed penalty, the models identified for the three sequences exhibit similar features, which is natural given that the sequences share the same collection date and region of origin.

Table 7. In bold, the highest probability per part. Minimal partition (Definition 2) estimated by Definition 6. From top to bottom, for the sequences A (GenBank OQ706226), B (GenBank OQ706227), and C (GenBank OQ706228).