First- and Second-Order Hypothesis Testing for Mixed Memoryless Sources

We investigate the first- and second-order optimum achievable exponents in the simple hypothesis testing problem. The optimum achievable exponent for the type II error probability, under the constraint that the type I error probability is allowed asymptotically up to ε, is called the ε-optimum exponent. In this paper, we first give the second-order ε-optimum exponent in the case where the null hypothesis is a mixed memoryless source and the alternative hypothesis is a stationary memoryless source. We then generalize this setting to the case where the alternative hypothesis is also a mixed memoryless source, and address the first-order ε-optimum exponent in this setting. In addition, we discuss an extension of our results to more general settings, such as hypothesis testing with mixed general sources, as well as a relationship with the general compound hypothesis testing problem.


Introduction
Let X = {X^n}_{n=1}^∞ and X̄ = {X̄^n}_{n=1}^∞ be two general sources (cf. Han [1]), where the term general source denotes a sequence of random variables X^n (respectively, X̄^n) indexed by block length n; each component of X^n (respectively, X̄^n) takes values in an alphabet X, and the distribution may vary with n.
We consider the hypothesis testing problem with null hypothesis X, alternative hypothesis X̄, and acceptance region A_n ⊂ X^n. The probabilities of type I error and type II error are defined, respectively, as
μ_n := Pr{X^n ∉ A_n}, λ_n := Pr{X̄^n ∈ A_n}.
We focus mainly on how to determine the ε-optimum exponent, defined as the supremum of achievable exponents R for the type II error probability λ_n ≈ e^{-nR} under the constraint that the type I error probability is allowed asymptotically up to a constant ε (0 ≤ ε < 1). The classical but fundamental result in this setting is the so-called Stein's lemma [2], which gives the ε-optimum exponent in the case where both the null and alternative hypotheses are stationary memoryless sources. The lemma shows that the ε-optimum exponent is given by D(P_X||P_X̄), the divergence between the stationary memoryless sources X and X̄. Chen [3] has generalized this lemma to the case where both X and X̄ are general sources, and established the general formula for the ε-optimum exponent in terms of divergence spectra. The ε-optimum exponent derived in [3] is called in this paper the first-order ε-optimum exponent.
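As a small numerical illustration of Stein's exponent, the following sketch computes D(P_X||P_X̄) for a pair of binary memoryless sources and the resulting asymptotic type II bound e^{-nD}; the distributions and function names are hypothetical, chosen only for illustration.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P||Q) in nats for finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical binary sources: null hypothesis P_X, alternative P_Xbar.
P_X = [0.5, 0.5]
P_Xbar = [0.8, 0.2]

D = kl_divergence(P_X, P_Xbar)
print(D)  # the epsilon-optimum exponent in Stein's lemma

# Stein's lemma: the optimal type II error behaves as e^{-nD} at block length n.
n = 100
print(math.exp(-n * D))
```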
On the other hand, second-order asymptotics have also been investigated in several contexts of information theory [4][5][6][7][8][9] to analyze the finer asymptotic behavior of the form λ_n ≈ e^{-nR-√n S}. Strassen [4] first introduced the notion of the second-order ε-optimum achievable exponent in the hypothesis testing problem in the case where both X and X̄ are stationary memoryless sources. The results in [4] have also revealed that the asymptotic normality of the divergence density rate (or likelihood ratio rate) plays an important role in computing the second-order ε-optimum exponent.
In this paper, on the other hand, we investigate hypothesis testing for mixed memoryless sources. The class of mixed sources is quite important, because every stationary source can be regarded as a mixed source consisting of stationary ergodic sources. The analysis for mixed sources is therefore primitive but fundamental, and we first focus on the case where the null hypothesis is a mixed memoryless source and the alternative hypothesis is a stationary memoryless source. In this direction, Han [1] first derived the single-letter formula for the first-order ε-optimum exponent with mixed memoryless source X and stationary memoryless source X̄. The first main result of this paper is to establish the single-letter second-order ε-optimum exponent in the same setting by invoking the relevant asymptotic normality. This result is a substantial generalization of that of Strassen [4]. Second, we generalize this setting to the case where both the null and alternative hypotheses are mixed memoryless sources X and X̄, and establish the single-letter first-order ε-optimum exponent.
It should be emphasized that our results are valid for mixed memoryless sources with general mixtures, in the sense that the mixing weight for the component sources may be an arbitrary probability measure. For the case of mixed general sources with finite discrete mixtures, we reveal a deep relationship with the compound hypothesis testing problem, which is important from both theoretical and practical points of view. We show that the first-order 0-optimum (respectively, exponentially r-optimum) exponent for mixed general hypothesis testing coincides with the 0-optimum (respectively, exponentially r-optimum) exponent for compound general hypothesis testing.
The present paper is organized as follows. In Section 2, we fix the problem setting and review the general formula (Theorem 1) for the first-order ε-optimum exponent. This is used to prove Theorem 5, which establishes a first-order single-letter formula for hypothesis testing in the case where both the null and alternative hypotheses are mixed memoryless. Moreover, we give the general formula (Theorem 2) for the second-order ε-optimum exponent, which is used to prove Theorem 4, establishing a second-order single-letter formula for hypothesis testing in the case where the null hypothesis is mixed memoryless and the alternative hypothesis is stationary memoryless. In Section 3, we establish the single-letter second-order ε-optimum exponent in the case with mixed memoryless source X and stationary memoryless source X̄ (cf. Theorem 4). Furthermore, in Section 4, we consider the case where both the null and alternative hypotheses are mixed memoryless sources, and derive the single-letter first-order ε-optimum exponent (cf. Theorem 5). Section 5 is devoted to an extension from mixed memoryless sources to mixed general sources. Finally, in Section 6, we define the optimum exponent for the compound general hypothesis testing problem and discuss its relationship with hypothesis testing for mixed general sources. We conclude the paper in Section 7.

General Formulas for ε-Hypothesis Testing
In this section, we first review the first-order general formula and then give the second-order general formula. Throughout this paper, the following lemmas play an important role, where P_Z denotes the probability distribution of a random variable Z.

Lemma 1 ([1] (Lemma 4.1.1)). For any t > 0, define the acceptance region as
A_n = { x ∈ X^n : (1/n) log ( P_{X^n}(x) / P_{X̄^n}(x) ) ≥ t };
then, it holds that
λ_n ≤ e^{-nt}.

Lemma 2 ([1] (Lemma 4.1.2)). For any t > 0 and any A_n, it holds that
μ_n + e^{nt} λ_n ≥ Pr{ (1/n) log ( P_{X^n}(X^n) / P_{X̄^n}(X^n) ) ≤ t }.

Proofs of these lemmas are found in [1]. We define the first- and second-order ε-optimum exponents as follows.
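Lemmas 1 and 2 can be checked numerically by brute force for short blocks. The sketch below (memoryless binary sources with hypothetical parameters; the function names are ours) builds the acceptance region of Lemma 1 by exhaustive enumeration and verifies the bound λ_n ≤ e^{-nt}.

```python
import itertools
import math

def product_prob(x, p):
    """Probability of sequence x under an i.i.d. source with marginal p."""
    prob = 1.0
    for sym in x:
        prob *= p[sym]
    return prob

def lemma1_errors(p, q, n, t):
    """A_n = {x : (1/n) log P^n(x)/Q^n(x) >= t}; returns (mu_n, lambda_n)."""
    mu, lam = 0.0, 0.0
    for x in itertools.product(range(len(p)), repeat=n):
        px, qx = product_prob(x, p), product_prob(x, q)
        if (1.0 / n) * math.log(px / qx) >= t:
            lam += qx  # type II error: alternative source lands inside A_n
        else:
            mu += px   # type I error: null source lands outside A_n
    return mu, lam

p, q = [0.5, 0.5], [0.8, 0.2]
n, t = 8, 0.1
mu_n, lam_n = lemma1_errors(p, q, n, t)
print(mu_n, lam_n, math.exp(-n * t))
assert lam_n <= math.exp(-n * t)  # the bound of Lemma 1
```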

Definition 1.
Rate R is said to be ε-achievable if there exists an acceptance region A_n such that
lim sup_{n→∞} μ_n ≤ ε and lim inf_{n→∞} (1/n) log (1/λ_n) ≥ R.

Definition 2 (First-order ε-optimum exponent).
B_ε(X||X̄) := sup{R | R is ε-achievable}.   (5)
The right-hand side of Equation (5) specifies the asymptotic behavior of the form λ_n ≈ e^{-nR}. Chen [3] has derived the general limiting formula for B_ε(X||X̄) as follows, which is utilized to establish Theorem 5 in Section 4.

Theorem 1 (Chen [3]). For 0 ≤ ε < 1,
B_ε(X||X̄) = sup{ R | lim sup_{n→∞} Pr{ (1/n) log ( P_{X^n}(X^n) / P_{X̄^n}(X^n) ) ≤ R } ≤ ε }.
Definition 3. Rate S is said to be (ε, R)-achievable if there exists an acceptance region A_n such that
lim sup_{n→∞} μ_n ≤ ε and lim inf_{n→∞} (1/√n) log ( 1 / (λ_n e^{nR}) ) ≥ S.

Definition 4 (Second-order (ε, R)-optimum exponent).
B_ε(R|X||X̄) := sup{S | S is (ε, R)-achievable}.   (9)
The right-hand side of Equation (9) specifies the asymptotic behavior of the form λ_n ≈ e^{-nR-√n S}.
The general limiting formula for B_ε(R|X||X̄) is given as follows; it is the second-order counterpart of Theorem 1 and is utilized to establish Theorem 4 in Section 3.2, which gives a second-order single-letter formula for hypothesis testing in the case where the null hypothesis is mixed memoryless and the alternative hypothesis is stationary memoryless.
Theorem 2. For 0 ≤ ε < 1 and any R,
B_ε(R|X||X̄) = sup{S | K(R, S) ≤ ε},
where
K(R, S) := lim sup_{n→∞} Pr{ (1/n) log ( P_{X^n}(X^n) / P_{X̄^n}(X^n) ) ≤ R + S/√n }.
Proof. See Appendix A.

First-Order ε-Optimum Exponent
In the previous section, we have demonstrated the "limiting" formulas for general hypothesis testing. In this and subsequent sections, we consider special but insightful cases and compute the optimum exponents in single-letter forms.
Let Θ be an arbitrary probability space with general probability measure w(θ) (θ ∈ Θ). Then, the hypothesis testing problem to be considered in this section is stated as follows:

•
The null hypothesis is a mixed stationary memoryless source
P_{X^n}(x) = ∫_Θ P_{X^n_θ}(x) dw(θ) (∀x ∈ X^n),   (13)
where X^n_θ is a stationary memoryless source for each θ ∈ Θ,
P_{X^n_θ}(x) = ∏_{i=1}^n P_{X_θ}(x_i) for x = (x_1, ..., x_n),   (14)
with generic random variable X_θ (θ ∈ Θ) taking values in X.

•
The alternative hypothesis is a stationary memoryless source X̄ = {X̄^n}_{n=1}^∞ with generic random variable X̄ taking values in X, that is, P_{X̄^n}(x) = ∏_{i=1}^n P_{X̄}(x_i). We assume X to be a finite alphabet hereafter.
To investigate this special case, we first introduce an expurgated parameter set on the basis of types, where the type T of a sequence x ∈ X^n is the empirical distribution of x, that is, T = (N(a|x)/n)_{a∈X}, with N(a|x) denoting the number of indices i such that x_i = a (i = 1, 2, ..., n).
Let T_1, T_2, ..., T_{N_n} denote all possible types of sequences of length n. Then, it is well known that
N_n ≤ (n + 1)^{|X|}.
Now, for each x ∈ X^n, we define the set Θ(x) ⊆ Θ of parameters θ for which P_{X^n_θ}(x) is within a subexponential factor of the mixture probability P_{X^n}(x). Since P_{X^n_θ} is an i.i.d. source for each θ ∈ Θ, the set Θ(x) depends only on the type T_k of the sequence x, and therefore we may write Θ(T_k) instead of Θ(x). Moreover, we define the "expurgated" set Θ*_n as the union of Θ(T_k) over all types of length n. Then, we have the following lemma:

Lemma 3 (Han [1]). Let X = {X^n}_{n=1}^∞ denote the mixed memoryless source defined in Equation (13); then the expurgated set Θ*_n asymptotically carries all of the mixing weight w.

Next, we introduce two basic "decomposition" lemmas.
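The type-counting bound used here is easy to verify numerically. The sketch below computes the type of a sequence and checks that the exact number of types, C(n+|X|-1, |X|-1), never exceeds the polynomial bound (n+1)^{|X|}; the function names are ours, for illustration.

```python
from collections import Counter
from math import comb

def type_of(x, alphabet):
    """Empirical distribution (type) of the sequence x over the given alphabet."""
    n = len(x)
    counts = Counter(x)
    return tuple(counts[a] / n for a in alphabet)

print(type_of("abbab", "ab"))  # empirical frequencies of 'a' and 'b'

# The number of types of length n over an alphabet of size k is C(n+k-1, k-1),
# which is upper-bounded by the polynomial (n+1)^k.
for n in (10, 100, 1000):
    k = 3
    N_n = comb(n + k - 1, k - 1)
    assert N_n <= (n + 1) ** k
```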
Lemma 4 (Upper Decomposition Lemma). Let X = {X^n}_{n=1}^∞ be a mixed memoryless source and X̄ = {X̄^n}_{n=1}^∞ be an arbitrary general source. Then, for any θ ∈ Θ*_n and any real z_n, it holds that
Pr{ (1/n) log ( P_{X^n}(X^n_θ) / P_{X̄^n}(X^n_θ) ) ≤ z_n } ≤ Pr{ (1/n) log ( P_{X^n_θ}(X^n_θ) / P_{X̄^n}(X^n_θ) ) ≤ z_n + n^{-3/4} }.   (20)
Proof. See Appendix B.
Lemma 5 (Lower Decomposition Lemma). Let X = {X^n}_{n=1}^∞ be a mixed memoryless source and X̄ = {X̄^n}_{n=1}^∞ be an arbitrary general source. Then, for any θ ∈ Θ, any real z_n, and any γ > 0, it holds that
Pr{ (1/n) log ( P_{X^n_θ}(X^n_θ) / P_{X̄^n}(X^n_θ) ) ≤ z_n − γ/√n } ≤ Pr{ (1/n) log ( P_{X^n}(X^n_θ) / P_{X̄^n}(X^n_θ) ) ≤ z_n } + e^{−√n γ}.
Proof. See Appendix C.
These Lemmas 3-5 are used later to establish Theorems 3-5. First, Theorem 3, concerning the first-order ε-optimum exponent for mixed memoryless sources, was given earlier as follows:

Theorem 3 (First-order ε-optimum exponent: Han [1]). For 0 ≤ ε < 1,
B_ε(X||X̄) = sup{ R | ∫_{{θ : D(P_{X_θ}||P_{X̄}) < R}} dw(θ) ≤ ε },
where D(P_X||P_X̄) denotes the Kullback-Leibler divergence between P_X and P_X̄.

Remark 1.
If Θ is a singleton, the above formula reduces to
B_ε(X||X̄) = D(P_X||P_X̄) (0 ≤ ε < 1),
which is nothing but Stein's lemma [2].
This can be verified as follows. Define the two quantities β_ε and β̄_ε. Then, clearly β̄_ε ≤ β_ε. Here, we assume that β̄_ε < β_ε to derive a contradiction. From this assumption, there exists a constant γ > 0 satisfying β̄_ε + 2γ < β_ε. On the other hand, from the definition of β_ε, the corresponding inequality holds for any η > 0. Thus, setting η < γ leads to a contradiction, where the last inequality is due to the definition of β̄_ε.

Second-Order ε-Optimum Exponent
Next, we establish the second-order ε-optimum exponent for mixed sources, which is the first main result of this paper (Theorem 4); the key quantity in its statement is defined in Equation (32). Proof. See Appendix D.
Here, let us consider the following canonical equation for S. In view of Equations (33) and (34), this equation always has a solution S = S(ε). It should be noted that if
∫_{{θ : D(P_{X_θ}||P_{X̄}) = B_ε(X||X̄)}} dw(θ) = 0
holds, the solution is not unique, and in that case S(ε) = +∞. By using the solution S(ε), it is not difficult to check that Theorem 4 with R = B_ε(X||X̄) can be expressed as Equation (35). The canonical equation is a useful expression for the second-order ε-optimum rate [7,10-12]. Equation (35) is the hypothesis testing counterpart of these results.
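For a finite discrete mixture, a canonical equation of this kind can be solved numerically. The sketch below is ours and rests on an assumed finite-mixture form, Σ_{θ: D_θ < R} w_θ + Σ_{θ: D_θ = R} w_θ Φ(S/√V_θ) = ε, where Φ is the standard normal CDF, D_θ is the divergence of component θ from the alternative, and V_θ is the variance of the divergence density; the weights and divergences below are hypothetical, and distinct D_θ values are assumed.

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def Phi_inv(p):
    """Inverse normal CDF by bisection (sufficient for a sketch)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve_canonical(w, D, V, eps):
    """Solve the assumed finite-mixture canonical equation for (R, S).

    R is the largest value with sum_{theta: D_theta < R} w_theta <= eps;
    S then satisfies  below + w_R * Phi(S / sqrt(V_R)) = eps  at the boundary.
    """
    below = 0.0
    for i in sorted(range(len(D)), key=lambda j: D[j]):
        if below + w[i] > eps:
            S = math.sqrt(V[i]) * Phi_inv((eps - below) / w[i])
            return D[i], S
        below += w[i]
    return max(D), float("inf")  # boundary carries no weight: S(eps) = +inf

# Three hypothetical components: weights, divergences, divergence variances.
w = [0.3, 0.5, 0.2]
D = [0.10, 0.25, 0.40]
V = [0.02, 0.05, 0.04]
R, S = solve_canonical(w, D, V, 0.5)
print(R, S)  # S < 0 here: less than half of the boundary weight is consumed
```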

Mixed Memoryless Alternative Hypothesis
In this section, we consider the case where not only the null hypothesis but also the alternative hypothesis is a mixed memoryless source, and establish the single-letter formula for the first-order ε-optimum exponent, thereby generalizing Theorem 3.
Let {P_{X̄_σ}}_{σ∈Σ} be a family of probability distributions on X, where Σ is a probability space with probability measure v(σ). We assume here that Σ is a compact space and that P_{X̄_σ} is continuous as a function of σ ∈ Σ.
The hypothesis testing problem considered in this section is stated as follows: • The null hypothesis is a mixed memoryless source X = {X n } ∞ n=1 as defined by Equations (13) and (14) in Section 3.1.

•
The alternative hypothesis is another mixed memoryless source X̄ = {X̄^n}_{n=1}^∞, where P_{X̄^n}(x) = ∫_Σ P_{X̄^n_σ}(x) dv(σ) with P_{X̄^n_σ}(x) = ∏_{i=1}^n P_{X̄_σ}(x_i). Let us now consider, for each P ∈ P(X) (the set of probability distributions on X), the following equation with respect to σ ∈ Σ:
D(P||P_{X̄_σ}) = v-ess.inf_{σ'∈Σ} D(P||P_{X̄_{σ'}}),
with v-ess.inf f_σ := sup{β | Pr{f_σ < β} = 0} (the essential infimum of f_σ with respect to v(σ)), where "Pr" is measured with respect to the probability measure v(σ).
Since the solution σ of this equation depends on P, we may write σ = σ(P) (σ(·) : P(X) → Σ). Notice here that D(P||P_σ) is continuous in (P, P_σ), and since we have assumed that Σ is compact and P_σ is continuous in σ, such a function σ(P) indeed exists. Now, to avoid technical subtleties, we assume that the function σ(P) can be chosen to be continuous. For example, if Σ is a closed convex subset of P(X), then it is not difficult to verify that the function σ(P) is uniquely determined and continuous (or even differentiable), which follows from the strict convexity of D(P||Q) in (P, Q). Another simple example is the case where Σ is a countable set.
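When Σ is a finite set, the v-essential infimum is simply a minimum, so σ(P) can be computed as an arg-min over the family. A minimal sketch, with hypothetical component distributions and function names of our choosing:

```python
import math

def kl(p, q):
    """D(P||Q) in nats for finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def sigma_of(P, family):
    """For finite Sigma, sigma(P) attains min over sigma of D(P || Pbar_sigma)."""
    return min(family, key=lambda s: kl(P, family[s]))

# Hypothetical finite family of alternative components {Pbar_sigma}.
family = {"s1": [0.7, 0.3], "s2": [0.4, 0.6]}
P = [0.5, 0.5]
print(sigma_of(P, family))  # the component closest to P in divergence
```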
Hereafter, for simplicity, we write P_θ, P^n_θ (respectively, P_σ, P^n_σ) instead of P_{X_θ}, P_{X^n_θ} (respectively, P_{X̄_σ}, P_{X̄^n_σ}). We then have the second main result of this paper:

Theorem 5 (First-order ε-optimum exponent). For 0 ≤ ε < 1,
B_ε(X||X̄) = sup{ R | ∫_{{θ : D(P_θ||P_{σ(P_θ)}) < R}} dw(θ) ≤ ε }.   (40)

Remark 5. In the case where Σ is a singleton, the above theorem coincides with Theorem 3, so this theorem is a direct generalization of Theorem 3. Moreover, when both Θ and Σ are singletons, the theorem coincides with Stein's lemma (see Remark 1).

Remark 6.
Remark 2 is also valid for this theorem; that is, B_ε(X||X̄) admits the analogous alternative expression.

Proof of Theorem 5. To show the theorem, let T^n_{θ,ν} ⊆ X^n be the set of ν-typical sequences with respect to P_{X_θ}; that is, let T^n_{θ,ν} be the set of all x = (x_1, x_2, ..., x_n) ∈ X^n such that
∑_{a∈X} | N(a|x)/n − P_{X_θ}(a) | ≤ ν,
where N(a|x) is the number of indices i such that x_i = a, and ν > 0 is an arbitrary constant. Then, it is well known that
lim_{n→∞} Pr{ X^n_θ ∈ T^n_{θ,ν} } = 1.
In the sequel, we use upper and lower bounds on the relevant probability in the form of Equations (45) and (46) for each x ∈ T^n_{θ,ν}, where δ_θ(ν) satisfies δ_θ(ν) → 0 as ν → 0, and τ > 0 and c^m_τ(P_θ) > 0 are constants independent of n. Proofs of Equations (45) and (46) appear in Appendix E.
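The typical-set property just stated can be checked exactly for a binary source, where the probability of T^n_{θ,ν} is a binomial tail sum. A numerical sketch with hypothetical parameters (function names are ours):

```python
from math import comb

def is_typical(counts, n, P, nu):
    """Membership in the nu-typical set: sum_a |N(a|x)/n - P(a)| <= nu."""
    return sum(abs(c / n - p) for c, p in zip(counts, P)) <= nu

def prob_typical_binary(P, n, nu):
    """Exact Pr{X^n in T_nu} for an i.i.d. binary source, via the binomial law."""
    p = P[0]
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n + 1)
               if is_typical((k, n - k), n, P, nu))

P, nu = (0.3, 0.7), 0.1
print(prob_typical_binary(P, 50, nu), prob_typical_binary(P, 500, nu))
# the typical set captures probability tending to 1 as n grows (law of large numbers)
```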
We then prove the theorem by using Equations (45) and (46) as follows. In view of Theorem 1 and Remark 6, it suffices to show the two inequalities (47) and (48).

Proof of Equation (47):
Similar to the derivation of Equation (A23) with Lemma 4, we obtain the corresponding upper bound. From the definition of the ν-typical set and Equation (45), we also obtain the associated bound for any θ ∈ Θ. Here, we define the two sets Θ_1 and Θ_2. Then, from the definition of Θ_2, there exists a small constant γ > 0 satisfying the required inequality, where we have used the relations n^{-3/4} < γ and δ_θ(ν) < γ for sufficiently large n and sufficiently small ν > 0.

Proof of Equation (48):
Similar to the derivation of Equation (A32) with Lemma 5, we obtain the corresponding lower bound. From the definition of the ν-typical set and Equation (46), we also obtain a bound on lim inf_{n→∞} Pr{ (1/n) log ( P_{X^n_θ}(X^n_θ) / P_{X̄^n}(X^n_θ) ) ≤ · } for any θ ∈ Θ.
We also partition the parameter space Θ into two sets.
Then, for θ ∈ Θ_1, if we take ν > 0 and τ > 0 sufficiently small, there exists a constant η > 0 satisfying the required inequality. Thus, again invoking the weak law of large numbers, we obtain for all θ ∈ Θ_1 the corresponding bound on lim inf_{n→∞} Pr{ (1/n) log ( P_{X^n_θ}(X^n_θ) / P^n_{σ(P_θ)}(X^n_θ) ) ≤ · }. Summarizing, we obtain the desired lim sup bound, which completes the proof of Equation (48).

Remark 7.
Theorem 3 is a special case of Theorem 5 when Σ is a singleton.
To illustrate the significance of Theorem 5, let us now consider the special case with ε = 0. Then, by virtue of Theorem 5, we have the following simplified result:

Corollary 1. In the special case of ε = 0, Equation (66) holds.

Proof. The formula (40) can be written in this case as Equation (67). Let the quantity in Equation (68) be given; this implies Equation (69). Conversely, let the quantity in Equation (70) be given; this implies Equation (71). As a consequence, Equation (66) follows from Equations (67), (69) and (71).

Remark 8.
One may wonder whether it is possible to deal with the second-order ε-optimum problem using the same arguments as developed above for the first-order ε-optimum problem with mixed memoryless sources X and X̄. To do so, however, it seems that we need some novel techniques, which remain to be studied.

Hypothesis Testing with Mixed General Sources
We have so far investigated ε-hypothesis testing for mixed memoryless sources. In this section, we deal with more general settings, such as hypothesis testing with mixed general sources, which inherits the crux of the analysis for mixed memoryless sources (cf. Theorem 5). This leads us to a primitive but insightful "general" observation.
To do so, we consider the case where both the null hypothesis X and the alternative hypothesis X̄ are finite mixtures of general sources as follows:
• The null hypothesis is a mixed general source X = {X^n}_{n=1}^∞ consisting of K general (not necessarily memoryless) sources X_i = {X^n_i}_{n=1}^∞ (i = 1, 2, ..., K); that is,
P_{X^n}(x) = ∑_{i=1}^K α_i P_{X^n_i}(x) (∀x ∈ X^n),
where α_i > 0 (i = 1, 2, ..., K) and ∑_{i=1}^K α_i = 1.
• The alternative hypothesis is another mixed general source X̄ = {X̄^n}_{n=1}^∞ consisting of L general (not necessarily memoryless) sources X̄_j = {X̄^n_j}_{n=1}^∞ (j = 1, 2, ..., L); that is,
P_{X̄^n}(x) = ∑_{j=1}^L β_j P_{X̄^n_j}(x) (∀x ∈ X^n),
where β_j > 0 (j = 1, 2, ..., L) and ∑_{j=1}^L β_j = 1.
In this general setting, it is hard to derive a compact formula for the first-order ε-optimum exponent (with 0 < ε < 1). Instead, we can obtain the following theorem in the special case of ε = 0.

Theorem 6.
B_0(X||X̄) = min_{1≤i≤K, 1≤j≤L} B_0(X_i||X̄_j). In particular, if X_i and X̄_j are all stationary memoryless sources specified by X_i (i = 1, 2, ..., K) and X̄_j (j = 1, 2, ..., L), respectively, then
B_0(X||X̄) = min_{1≤i≤K, 1≤j≤L} D(P_{X_i}||P_{X̄_j}),
which is a special case of Corollary 1.

Proof. See Appendix F.
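In the memoryless case, the pairwise-minimum formula of Theorem 6 is directly computable. A small numerical sketch with hypothetical component distributions:

```python
import math

def kl(p, q):
    """D(P||Q) in nats for finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical memoryless components of the two finite mixtures.
nulls = [[0.5, 0.5], [0.6, 0.4]]   # P_{X_i}, i = 1, ..., K
alts = [[0.9, 0.1], [0.2, 0.8]]    # P_{Xbar_j}, j = 1, ..., L

# Theorem 6, memoryless case: the 0-optimum exponent is the smallest
# pairwise divergence between a null component and an alternative component.
B0 = min(kl(p, q) for p in nulls for q in alts)
print(B0)
```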
Furthermore, we can also consider the following exponentially r-optimum exponent in hypothesis testing with the two mixed general sources X and X̄ defined above.

Definition 5.
Let r > 0 be any fixed constant. Rate R is said to be exponentially r-achievable if there exists an acceptance region A_n such that
lim inf_{n→∞} (1/n) log (1/μ_n) ≥ r and lim inf_{n→∞} (1/n) log (1/λ_n) ≥ R.

Definition 6 (First-order exponentially r-optimum exponent).
B_e(r|X||X̄) := sup{R | R is exponentially r-achievable}.

Hypothesis Testing with Compound General Sources
In this section, let us consider the compound hypothesis testing problem with finitely many null hypotheses X_i = {X^n_i}_{n=1}^∞ (i = 1, 2, ..., K) and finitely many alternative hypotheses X̄_j = {X̄^n_j}_{n=1}^∞ (j = 1, 2, ..., L), where X_i and X̄_j are general sources. As is well known, this problem is expected to have a primitive but "general" relationship, at the structural level, to the testing problem with mixed hypotheses.
Specifically, compound hypothesis testing is the problem in which a pair of general sources (X_i, X̄_j) occurs as a (null hypothesis, alternative hypothesis) pair, and the tester does not know which pair (X_i, X̄_j) is actually in effect. This means that the acceptance region A_n cannot depend on i or j. The type I error probabilities of the compound hypothesis testing are given by
μ^{(i)}_n := Pr{X^n_i ∉ A_n}
for each general null hypothesis X_i. The type II error probabilities are likewise given by
λ^{(j)}_n := Pr{X̄^n_j ∈ A_n}
for each general alternative hypothesis X̄_j. Then, the following achievability is of our interest.
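A shared acceptance region of this kind can be illustrated by brute force for short memoryless blocks. The sketch below uses the ad hoc region A_n = {x : ∃i, ∀j : (1/n) log P_i^n(x)/Q_j^n(x) ≥ t} — our choice for illustration, not the optimal test — and checks the union-bound consequence max_j λ^{(j)}_n ≤ K e^{-nt}.

```python
import itertools
import math

def product_prob(x, p):
    """Probability of sequence x under an i.i.d. source with marginal p."""
    prob = 1.0
    for sym in x:
        prob *= p[sym]
    return prob

def compound_errors(nulls, alts, n, t):
    """Worst-case type I / type II errors of one shared acceptance region."""
    accept = [x for x in itertools.product(range(2), repeat=n)
              if any(all((1.0 / n) * math.log(product_prob(x, p) / product_prob(x, q)) >= t
                         for q in alts)
                     for p in nulls)]
    mus = [1.0 - sum(product_prob(x, p) for x in accept) for p in nulls]
    lams = [sum(product_prob(x, q) for x in accept) for q in alts]
    return max(mus), max(lams)

nulls = [[0.5, 0.5], [0.6, 0.4]]   # candidate null hypotheses (hypothetical)
alts = [[0.9, 0.1], [0.2, 0.8]]    # candidate alternatives (hypothetical)
n, t = 10, 0.05
mu_worst, lam_worst = compound_errors(nulls, alts, n, t)
print(mu_worst, lam_worst)
assert lam_worst <= len(nulls) * math.exp(-n * t)  # union bound over i
```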

Definition 7.
Rate R is said to be 0-achievable for the compound hypothesis testing if there exists an acceptance region A_n such that
lim_{n→∞} μ^{(i)}_n = 0 and lim inf_{n→∞} (1/n) log (1/λ^{(j)}_n) ≥ R
for all i = 1, 2, ..., K and j = 1, 2, ..., L.
Proof. See Appendix G.
From Theorems 6 and 8, we immediately obtain the first-order 0-optimum exponent for the compound hypothesis testing. Assuming that α_i > 0 and β_j > 0 hold for all i = 1, 2, ..., K and j = 1, 2, ..., L, we have Equation (86). In particular, if X_i and X̄_j are all stationary memoryless sources specified by X_i and X̄_j, respectively, Equation (86) reduces to
min_{1≤i≤K, 1≤j≤L} D(P_{X_i}||P_{X̄_j}).

Remark 9. Similar to Definition 5, we can define the exponentially r-optimum exponent also for the compound hypothesis testing problem as follows.

Definition 9.
Let r > 0 be any fixed constant. Rate R is said to be exponentially r-achievable for the compound hypothesis testing if there exists an acceptance region A_n such that
lim inf_{n→∞} (1/n) log (1/μ^{(i)}_n) ≥ r and lim inf_{n→∞} (1/n) log (1/λ^{(j)}_n) ≥ R
for all i = 1, 2, ..., K and j = 1, 2, ..., L.
Then, using an argument similar to the proof of Theorem 8, the following theorem can be shown. Let α_i > 0 and β_j > 0 hold for all i = 1, 2, ..., K and j = 1, 2, ..., L; then the exponentially r-optimum exponent for the compound hypothesis testing coincides with B_e(r|X||X̄) for the mixed general sources of Equations (72) and (73) (cf. Definitions 5 and 6).

Concluding Remarks
Thus far, we have investigated the first- and second-order ε-optimum exponents in the hypothesis testing problem. First, we studied the second-order ε-optimum problem with a mixed memoryless null hypothesis and a stationary memoryless alternative hypothesis. As shown in the analysis of the second-order ε-optimum exponent, the key property we use is the asymptotic normality of the divergence density rate for each of the component sources. We also observe that the canonical representation, first introduced in [11], remains effective for expressing the second-order ε-optimum exponent for mixed memoryless sources in the hypothesis testing problem.
The first-order ε-optimum exponent in the case with mixed memoryless null and alternative hypotheses has also been established. One may wonder whether the same approach can be applied to derive the second-order ε-optimum exponent in this setting. Notice that one of our key techniques for deriving the first-order ε-optimum exponent is an expansion of P_x around P_θ. A more careful evaluation of this expansion would be needed to compute the second-order ε-optimum exponent; this remains as future work. Our final goal is the problem of hypothesis testing in which both the null and alternative hypotheses are general stationary sources. This paper characterizes the first- and second-order performance of hypothesis testing for mixed memoryless sources as a simple but crucial step toward this goal.
Finally, the relationship between the first-order 0-optimum (respectively, exponentially r-optimum) exponent in the hypothesis testing with mixed general sources and the 0-optimum (respectively, exponentially r-optimum) exponent in the compound hypothesis testing has also been demonstrated.
Appendix A. Proof of Theorem 2

(1) Direct Part: Define the acceptance region A_n via the likelihood ratio threshold. Then, from Lemma 1 with t = R + S/√n, we have an upper bound for the type II error probability λ_n, from which the lim inf bound (A3) follows. We next evaluate the type I error probability μ_n, obtaining the lim sup bound (A5), because S = S_0 − γ by definition. Hence, from Equations (A3) and (A5), S = S_0 − γ is (ε, R)-achievable. Since γ > 0 is arbitrary, the direct part has been proved.
(2) Converse Part: Suppose that S is (ε, R)-achievable. Then, there exists an acceptance region A_n such that lim sup_{n→∞} μ_n ≤ ε and lim inf_{n→∞} (1/√n) log ( 1 / (λ_n e^{nR}) ) ≥ S.
We fix this acceptance region A_n. The second inequality means that, for any γ > 0, Equation (A7) holds for sufficiently large n. On the other hand, from Lemma 2 with t = R + (S − 2γ)/√n, the corresponding inequality holds. Substituting Equation (A7) into this inequality, we obtain the bound for sufficiently large n, and hence the lim sup bound. Here, from Equation (A6), we obtain the stated relation, which yields the desired inequality. Since γ > 0 is arbitrary, the proof of the converse part is completed.

Appendix B. Proof of Lemma 4
Since P_{X^n_θ}(x) ≤ e^{n^{1/4}} P_{X^n}(x) holds for all θ ∈ Θ*_n, we have, for any z_n,
Pr{ (1/n) log P_{X^n}(X^n_θ) ≤ z_n } ≤ Pr{ (1/n) log P_{X^n_θ}(X^n_θ) ≤ z_n + n^{-3/4} }.
By using this inequality with z_n + (1/n) log P_{X̄^n}(X^n_θ) in place of z_n, we obtain Equation (20), which completes the proof.

Appendix C. Proof of Lemma 5
For γ > 0, we define the set
D_n := { x ∈ X^n : P_{X^n_θ}(x) ≤ e^{−√n γ} P_{X^n}(x) }
for θ ∈ Θ. Then, it holds that
Pr{X^n_θ ∈ D_n} = ∑_{x∈D_n} P_{X^n_θ}(x) ≤ e^{−√n γ} ∑_{x∈D_n} P_{X^n}(x) ≤ e^{−√n γ}.
Thus, for any real number z_n, it holds that
Pr{ (1/n) log P_{X^n_θ}(X^n_θ) ≤ z_n − γ/√n }
≤ Pr{ (1/n) log P_{X^n_θ}(X^n_θ) ≤ z_n − γ/√n, X^n_θ ∉ D_n } + Pr{X^n_θ ∈ D_n}
≤ Pr{ (1/n) log P_{X^n}(X^n_θ) ≤ z_n } + e^{−√n γ}.   (A17)
Hence, we obtain the inequality
Pr{ (1/n) log P_{X^n}(X^n_θ) ≤ z_n } ≥ Pr{ (1/n) log P_{X^n_θ}(X^n_θ) ≤ z_n − γ/√n } − e^{−√n γ},
from which, with z_n + (1/n) log P_{X̄^n}(X^n_θ) in place of z_n, it follows that the statement of Lemma 5 holds for all θ ∈ Θ. This completes the proof.

Appendix D. Proof of Theorem 4
Setting the relevant quantity as in Equation (A20), it suffices, in view of Theorem 2, to show the two inequalities (A21) and (A22):

Proof of Equation (A21):
By the definitions of X and X̄, the quantity lim sup_{n→∞} Pr{ (1/n) log ( P_{X^n}(X^n) / P_{X̄^n}(X^n) ) ≤ R + S/√n } can be bounded as in Equation (A23), where the second equality and the second inequality are due to Lemmas 3 and 4, respectively, and the last inequality follows from the reverse Fatou lemma.

Proof of Equation (A22):
By the definitions of X and X̄, and Lemma 5 with z_n = R + S/√n, the quantity lim sup_{n→∞} Pr{ (1/n) log ( P_{X^n}(X^n) / P_{X̄^n}(X^n) ) ≤ R + S/√n } can be bounded as in Equation (A32) for any γ > 0, where the last inequality is due to Fatou's lemma. We also partition the parameter space Θ into three sets as in Equations (A24)-(A26). Then, similarly to the derivation of Equations (A29) and (A30), we obtain the corresponding bounds on lim inf_{n→∞} Pr{ (1/n) log ( P_{X^n_θ}(X^n_θ) / P_{X̄^n}(X^n_θ) ) ≤ · }. Thus, the right-hand side of Equation (A32) can be rewritten as Equation (A34). Substituting Equation (A34) into Equation (A32) and noting that γ > 0 is arbitrary, we obtain Equation (A22).
Appendix E. Proofs of Equations (45) and (46)

Let P_x denote the type of x ∈ T^n_{θ,ν}. Then, noting that Equation (A36) holds, a(x) can be written as Equation (A37). Here, it is important to notice that D(P||P_σ) is continuous in (P, P_σ) and hence, owing to the assumption, D(P||P_{σ(Q)}) is continuous in Q ∈ P(X). Thus, expanding D(P||P_{σ(P_x)}) in P_x around P_θ leads to Equation (A39) with some δ_θ(ν) such that δ_θ(ν) → 0 as ν → 0, because ∑_{a∈X} |P_θ(a) − P_x(a)| ≤ ν for x ∈ T^n_{θ,ν}. Then, with P_x in place of P in Equation (A39) and in view of Equation (A36), for each x ∈ T^n_{θ,ν} we have the upper bound (A40), from which Equation (45) follows for each x ∈ T^n_{θ,ν}. This completes the proof of Equation (45).

Appendix F. Proof of Theorem 6
First, we prove the inequality
B_0(X||X̄) ≥ min_{1≤i≤K, 1≤j≤L} B_0(X_i||X̄_j).   (A52)
To do so, we arbitrarily fix R_{ij} (1 ≤ i ≤ K, 1 ≤ j ≤ L) so that Equation (A53) is satisfied. Then, by the definition of B_0(X_i||X̄_j), there exists an acceptance region A^{(ij)}_n for each pair (i, j). Using these regions, we define the acceptance region A_n. Then, we have a bound on μ_n from which, together with Equation (A54), we obtain lim_{n→∞} μ_n = 0.
Similarly, we have a bound on λ_n from which, together with Equation (A55), we obtain, for any small γ > 0, the corresponding lim inf bound. Since the R_{ij} are arbitrary as long as Equation (A53) is satisfied, we obtain Equation (A52). Next, we prove the reverse inequality.