1. Introduction
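Let $(\mathcal{X}, \mathcal{B})$ be a Polish space with its Borel $\sigma$-algebra and let $X_1, \dots, X_n$ be i.i.d. $\mathcal{X}$-valued observations. We consider the simple hypotheses
$$H_0 : X_i \sim P \quad \text{versus} \quad H_1 : X_i \sim Q,$$
where $P$ and $Q$ are probability measures on $(\mathcal{X}, \mathcal{B})$ dominated by a reference measure $\mu$. Without loss of generality, one may take $\mu = (P + Q)/2$ and write $p = dP/d\mu$ and $q = dQ/d\mu$. In the unweighted setting, the optimal sum of type-I and type-II error probabilities is characterized by
$$\inf_{d} \big[ P^{\otimes n}(d = 1) + Q^{\otimes n}(d = 0) \big],$$
where the infimum runs over decision rules $d : \mathcal{X}^n \to \{0, 1\}$, and can be written as
$$\int_{\mathcal{X}^n} \min\{p^{\otimes n}, q^{\otimes n}\} \, d\mu^{\otimes n}.$$
In the standard (unweighted) Bayesian setting, the decay rate of the optimal total error probability is governed by the Chernoff information [1, 2]:
$$C(P, Q) = -\min_{\alpha \in [0, 1]} \log \rho_{\alpha}(P, Q), \qquad \rho_{\alpha}(P, Q) = \int p^{\alpha} q^{1 - \alpha} \, d\mu.$$
Here $\rho_{\alpha}$ is usually called the $\alpha$-skewed Bhattacharyya affinity coefficient, and $\rho_{1/2}$ is the affinity coefficient. In view of Hölder's inequality, $0 \le \rho_{\alpha} \le 1$.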
Chernoff also introduced an asymptotic efficiency notion for comparing two experimental designs: $n$ observations on one test are equivalent (i.e., they give asymptotically the same total loss as $n \to \infty$) to a fixed multiple of $n$ observations on another test; see [1].
The paper studies a context-sensitive (weighted) analogue of this criterion and the logarithmic asymptotics of the optimal total loss as $n \to \infty$, in the framework of [3, 4]. In the weighted setting, a nonnegative weight function $\varphi$ reweights the loss of a wrong decision according to the realised sample. Thus, $\varphi$ acts as a context factor that changes the relevance of different observations for the statistical task.
Weights of this form arise naturally whenever observations are not equally informative for the inference task. Two canonical mechanisms produce such $\varphi$. In importance-type reweighting, samples drawn under a proposal density $g$ are used to perform inference with respect to a target $h$, and the Radon–Nikodym factor $h/g$ enters the loss as a strictly positive (non-indicator) tilt; this is the mechanism underlying the context-sensitive framework of [3, 4].
The second mechanism arises in applications where the informational value of an observation depends on the underlying channel state. A canonical example, directly relevant to multiple hypothesis testing of transmission regimes, is a mobile communication channel modulated by a multi-zone coverage process (e.g., strong/weak/outage) along the receiver trajectory: samples acquired in outage carry little information about the regime and are weighted accordingly. Such reliability-weighted aggregation in multi-state channels was studied within multi-valued frameworks in [5, 6].
Under the standard assumption that the modulating state at time $i$ is determined by $X_i$ alone, the resulting weight is a strictly positive bounded function $\varphi : \mathcal{X} \to (0, \infty)$ and extends multiplicatively to $\varphi(x_1, \dots, x_n) = \prod_{i=1}^{n} \varphi(x_i)$. The weighted Chernoff information $C_{\varphi}(P, Q)$ then quantifies the effective discrimination rate under channel-dependent reliability and reduces to the classical rate $C(P, Q)$ in the limit $\varphi \equiv 1$. Further parametric instances (Gaussian, Poisson, exponential) are worked out in Section 4.
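Throughout we assume that the weight is compatible with the i.i.d. structure and factorises across observations; by abuse of notation, $\varphi$ denotes both the one-step weight and its product extension.
Assumption 1 (Factorised weight). The weight function satisfies
$$\varphi(x_1, \dots, x_n) = \prod_{i=1}^{n} \varphi(x_i).$$
Assumption 1 is the key single-letter hypothesis. It yields the weighted affinities
$$\rho_{\alpha, \varphi}(P^{\otimes n}, Q^{\otimes n}) = \big( \rho_{\alpha, \varphi}(P, Q) \big)^{n}, \qquad \rho_{\alpha, \varphi}(P, Q) = \int \varphi \, p^{\alpha} q^{1 - \alpha} \, d\mu,$$
hence an additive logarithmic rate. For one observation and equal priors, the weighted Bayes risk equals
$$\tfrac{1}{2} \int \varphi \, \min\{p, q\} \, d\mu.$$
Since $\min\{a, b\} \le a^{\alpha} b^{1 - \alpha}$ for every $a, b \ge 0$ and $\alpha \in [0, 1]$,
$$\int \varphi \, \min\{p, q\} \, d\mu \le \rho_{\alpha, \varphi}(P, Q),$$
and therefore
$$\int \varphi \, \min\{p, q\} \, d\mu \le \min_{\alpha \in [0, 1]} \rho_{\alpha, \varphi}(P, Q) = e^{-C_{\varphi}(P, Q)},$$
where $C_{\varphi}(P, Q) = -\min_{\alpha \in [0, 1]} \log \rho_{\alpha, \varphi}(P, Q)$ (see Definition 2). Under Assumption 1, the same bound factorises over $n$ observations and yields the exponential scale $e^{-n C_{\varphi}(P, Q)}$. Theorem 1 shows that this scale is exact on the logarithmic level.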
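The factorisation and the resulting upper bound can be checked by brute force on a small discrete model. The sketch below is only an illustration: the three-point densities `P`, `Q` and the weight `PHI` are hypothetical, and the affinity is taken as $\sum_x \varphi(x) p(x)^{\alpha} q(x)^{1-\alpha}$, which is an assumed convention for the role of $\alpha$.

```python
import itertools
import math

# Hypothetical three-point model (not from the paper): densities P, Q with
# respect to counting measure and a strictly positive one-step weight PHI.
P = [0.5, 0.3, 0.2]
Q = [0.2, 0.3, 0.5]
PHI = [1.5, 1.0, 0.4]


def rho(alpha):
    """One-step weighted affinity sum_x phi(x) p(x)^alpha q(x)^(1 - alpha)."""
    return sum(w * a**alpha * b ** (1 - alpha) for w, a, b in zip(PHI, P, Q))


def product_terms(n):
    """Yield (phi(x^n), p^n(x^n), q^n(x^n)) over all sequences x^n of length n."""
    for xs in itertools.product(range(len(P)), repeat=n):
        yield (
            math.prod(PHI[x] for x in xs),
            math.prod(P[x] for x in xs),
            math.prod(Q[x] for x in xs),
        )


def rho_n(alpha, n):
    """Affinity of the n-fold product model under the factorised weight."""
    return sum(w * a**alpha * b ** (1 - alpha) for w, a, b in product_terms(n))


def optimal_total_loss(n):
    """Sum of phi * min(p^n, q^n): the pointwise-optimal total weighted loss."""
    return sum(w * min(a, b) for w, a, b in product_terms(n))


# Grid approximation of the minimum of rho over alpha in [0, 1].
rho_min = min(rho(k / 1000) for k in range(1001))
chernoff = -math.log(rho_min)

for n in range(1, 6):
    loss = optimal_total_loss(n)
    # The loss stays below exp(-n * chernoff) = rho_min ** n for every n.
    print(n, loss, math.exp(-n * chernoff), loss <= rho_min**n)
```

The enumeration grows as $3^n$, so this is a sanity check for small $n$ only; it does not verify that the exponential scale is attained, which is the content of Theorem 1.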
1.1. Main Result and Contributions
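Let $L_n^{*}$ denote the optimal total context-sensitive loss (sum of weighted type-I and type-II losses, minimised over decision rules) for $n$ i.i.d. observations under Assumption 1. Our main theorem (Theorem 1) proves the single-letter logarithmic asymptotic
$$\lim_{n \to \infty} \frac{1}{n} \log L_n^{*} = -C_{\varphi}(P, Q), \tag{4}$$
where the rate is the weighted Chernoff information
$$C_{\varphi}(P, Q) = -\min_{\alpha \in [0, 1]} \log \int \varphi \, p^{\alpha} q^{1 - \alpha} \, d\mu. \tag{5}$$
For $\varphi \equiv 1$, (5) reduces to the classical Chernoff information.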
We also extend the exponent characterisation to a finite family of simple hypotheses: the optimal $M$-ary rate is the minimum pairwise weighted Chernoff information (cf. [7] in the unweighted case). A central technical device is an exponential-family representation of the weighted geometric mixtures $\varphi \, p^{\alpha} q^{1 - \alpha}$. This embeds the mixtures into a likelihood-ratio exponential family and identifies the exponent through the corresponding log-normaliser. We further derive concentration bounds for tilted weighted log-likelihood ratios and closed-form expressions for $C_{\varphi}$ in several parametric models; see Section 4.
1.2. Contributions
Items (N1)–(N4) below indicate new results; items (A1)–(A3) summarise definitions, geometric context, and tools adopted from the existing literature.
- (N1) (New.) Theorem 1 establishes the logarithmic asymptotic (4) for the optimal weighted total loss under the factorised weight of Assumption 1, with rate given by the weighted Chernoff information (5).
- (N2) (New.) The exponential-family representation of the weighted geometric mixtures $\varphi \, p^{\alpha} q^{1 - \alpha}$ (Section 3.2) and the resulting uniqueness of the optimal skewing parameter $\alpha^{*}$.
- (N3) (New.) Concentration bounds for the tilted weighted log-likelihood and the finite-$n$ tail bound of Theorem 2 (Section 3.4).
- (N4) (New.) Closed-form expressions for $C_{\varphi}$ in the Gaussian, Poisson, and exponential models (Section 4), and the $M$-ary extension showing that the optimal rate equals the minimum pairwise weighted Chernoff information.
- (A1) (Adapted definitions.) The definitions of the weighted Bhattacharyya affinities and the weighted Chernoff information generalise the classical unweighted quantities of [1, 2] and follow the context-sensitive framework of [3, 4]; their asymptotic and information-geometric consequences developed below are new.
- (A2) (Geometric context.) The information-geometric identities of Section 3.3 are derived in the spirit of the Chentsov–Amari–Nielsen framework [8, 9, 10, 11] but are stated and proved for the tilted log-normaliser $\log \rho_{\alpha, \varphi}$; the unweighted limit $\varphi \equiv 1$ recovers the classical statements of [11, 12].
- (A3) (Standard tool.) The concentration argument uses the Azuma–Hoeffding/McDiarmid inequality [13, 14]; the novelty lies in its application to the tilted weighted log-likelihood.
1.3. Related Work
The exponential theory of testing errors goes back to Chernoff [1] and Hoeffding [2]. The context-sensitive framework and the weighted information quantities used here were developed in [3, 4]. The information-geometric viewpoint on Chernoff information originates with Chentsov [9]; the dually flat structure of exponential and mixture families and the associated $\alpha$-divergences are developed in [8, 10], and the Chernoff point is characterised as the intersection of an exponential geodesic with the Kullback–Leibler bisector in [11]. For $\varphi \equiv 1$, the likelihood-ratio exponential family description is given in [12]; the present paper extends this picture to the tilted integrand $\varphi \, p^{\alpha} q^{1 - \alpha}$. The minimum-pairwise principle for multiple testing is due to [7]. Weighting mechanisms for covariate-dependent relevance have also been studied outside the asymptotic error-exponent framework, e.g., adaptive-kernel conditional-independence testing [15].
1.4. Structure of the Paper
Section 2 introduces the weighted Bhattacharyya affinities and the weighted Chernoff information. Section 3 proves the main asymptotic result (4) and develops the exponential-family and information-geometric identities. Section 3.4 studies the tilted weighted log-likelihood and derives finite-$n$ concentration bounds. Section 4 examines Gaussian, Poisson, and exponential models and includes the $M$-ary extension. Auxiliary computations are collected in the appendices.
4. Examples and Applications
The identities of Section 3 reduce the computation of $\alpha^{*}$ and $C_{\varphi}$ to the single-letter weighted affinity $\rho_{\alpha, \varphi}$ followed by optimisation over $\alpha \in [0, 1]$. We work this out for Gaussian, Poisson, and exponential families, highlighting how the context weight $\varphi$ modifies the classical formulas. When $\varphi \equiv 1$, the expressions reduce to the standard unweighted Bhattacharyya and Chernoff quantities; more involved non-exponential-family computations (such as the Cauchy location–scale family) are deferred to Appendix A.
4.1. A Numerical Illustration
This subsection illustrates the behaviour of $\alpha^{*}$ and $C_{\varphi}$ under a non-trivial factorised weight and serves as a direct numerical verification of the Bregman identities of Section 3.3: for the model below the affinity $\rho_{\alpha, \varphi}$ is available in closed form, and we check that closed-form evaluation and direct numerical integration agree to machine precision.
Consider the asymmetric Gaussian hypotheses
with a non-indicator factorised weight
At
, one has
and the unweighted Chernoff information is recovered. For
, the weight concentrates near
; in particular, (
54) is not an indicator-type weight, so the weighted problem does not reduce to the unweighted Chernoff information on a restricted domain.
The asymmetry and the centring of at the mean are essential for the illustration. In a fully symmetric configuration (, , ), the problem is invariant under , so the optimum is pinned to for every and the effect of the weight on the Chernoff compromise is invisible. Asymmetric hypotheses are precisely where the weighted formalism is operationally distinct from the classical one, and it is this distinction that the numerics below is designed to expose.
Writing
,
, and
, a direct Gaussian integration yields
and maximising (
55) over
gives
and
.
Table 1 reports their values at three selected values of the weight parameter. Direct numerical integration of $\rho_{\alpha, \varphi}$ from its definition agrees with (55) to machine precision on all tabulated entries, which confirms the Bregman identities of Section 3.3 numerically for this example.
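A check of this kind can be sketched as follows. The model here is a stand-in and not the one behind Table 1: the means, the standard deviations, and the Gaussian-shaped weight $\varphi_{\kappa}(x) = \exp(-\kappa (x - \mu_0)^2 / 2)$ are all hypothetical choices, and the affinity uses the assumed convention $\int \varphi \, p^{\alpha} q^{1-\alpha} \, dx$. The closed form in `rho_closed` is obtained by completing the square for these assumed ingredients.

```python
import math

# Hypothetical parameters (not the paper's): H0 = N(MU0, S0^2), H1 = N(MU1, S1^2),
# weight phi(x) = exp(-kappa * (x - MU0)^2 / 2), which is identically 1 at kappa = 0.
MU0, S0, MU1, S1 = 0.0, 1.0, 2.0, 1.5


def normal_pdf(x, mu, s):
    return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))


def rho_closed(alpha, kappa):
    """Closed form of the weighted affinity, by completing the square."""
    lam = alpha / S0**2 + (1 - alpha) / S1**2 + kappa  # total precision
    eta = alpha * MU0 / S0**2 + (1 - alpha) * MU1 / S1**2 + kappa * MU0
    const = alpha * MU0**2 / S0**2 + (1 - alpha) * MU1**2 / S1**2 + kappa * MU0**2
    scale = S0**alpha * S1 ** (1 - alpha) * math.sqrt(lam)
    return math.exp(0.5 * (eta**2 / lam - const)) / scale


def rho_numeric(alpha, kappa, lo=-30.0, hi=30.0, n=20000):
    """Composite Simpson rule applied to the defining integral."""
    def f(x):
        weight = math.exp(-0.5 * kappa * (x - MU0) ** 2)
        return weight * normal_pdf(x, MU0, S0) ** alpha * normal_pdf(x, MU1, S1) ** (1 - alpha)

    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def chernoff(kappa, grid=2000):
    """Grid search for the maximiser of -log rho over alpha in [0, 1]."""
    best = max(range(grid + 1), key=lambda k: -math.log(rho_closed(k / grid, kappa)))
    alpha = best / grid
    return alpha, -math.log(rho_closed(alpha, kappa))


for kappa in (0.0, 0.5, 1.0):
    alpha_star, c_phi = chernoff(kappa)
    gap = abs(rho_closed(alpha_star, kappa) - rho_numeric(alpha_star, kappa))
    print(kappa, alpha_star, c_phi, gap)
```

Because this stand-in weight is bounded by 1 and decreases pointwise in `kappa`, the affinity decreases and the resulting rate increases with `kappa`; the direction in which the maximiser moves depends on the chosen parameters.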
The monotone growth of
and the leftward shift of
illustrate a qualitative conclusion of
Section 3: localising
near
preferentially retains observations that are more probable under
and thereby increases the effective discrimination rate, while simultaneously moving the optimal tilting towards the
side. The classical unweighted limit is recovered at
.
In the language of hypothesis testing,
is the parameter that balances the exponential rates of the type-I and type-II losses at the Bayes optimum: the type-I exponent equals
and the type-II exponent equals
(cf.
Section 3). A leftward shift of
therefore corresponds to reallocating the available exponential budget towards faster decay of the type-II loss at the expense of the type-I loss, which is the optimal response to a weight that concentrates mass in the region where
is more plausible.
Data and Code Availability
A Jupyter/Colab notebook reproducing Table 1 and Figures 1–3, together with the direct-integration verification of (55), is archived on Zenodo [16] and mirrored on GitHub.
4.2. Gaussian Models
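Throughout this subsection, the reference measure is the Lebesgue measure on $\mathbb{R}^{d}$. We compute the weighted Bhattacharyya coefficient $\rho_{\alpha, \varphi}$ together with the weighted Bhattacharyya distance $B_{\alpha, \varphi} = -\log \rho_{\alpha, \varphi}$ and the weighted Chernoff information $C_{\varphi} = \max_{\alpha \in [0, 1]} B_{\alpha, \varphi}$ (Definition 2). Note that, unlike the unweighted case, $\rho_{\alpha, \varphi}$ is not restricted to $[0, 1]$ and $B_{\alpha, \varphi}$ (hence also $C_{\varphi}$) may take negative values.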
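Example 1 (Gaussian weighted Bhattacharyya coefficient with exponential weight). Let $P = \mathcal{N}(\mu_0, \Sigma_0)$ and $Q = \mathcal{N}(\mu_1, \Sigma_1)$ on $\mathbb{R}^{d}$, where $\mu_0, \mu_1 \in \mathbb{R}^{d}$ and $\Sigma_0, \Sigma_1 \succ 0$, and let $\varphi(x) = e^{\gamma^{\top} x}$ for some $\gamma \in \mathbb{R}^{d}$. Denote by $p, q$ the corresponding densities. For $\alpha \in [0, 1]$ define
$$\Sigma_{\alpha} = \big( \alpha \Sigma_0^{-1} + (1 - \alpha) \Sigma_1^{-1} \big)^{-1}, \qquad m_{\alpha} = \Sigma_{\alpha} \big( \alpha \Sigma_0^{-1} \mu_0 + (1 - \alpha) \Sigma_1^{-1} \mu_1 \big),$$
$$J_{\alpha} = \alpha \, \mu_0^{\top} \Sigma_0^{-1} \mu_0 + (1 - \alpha) \, \mu_1^{\top} \Sigma_1^{-1} \mu_1 - m_{\alpha}^{\top} \Sigma_{\alpha}^{-1} m_{\alpha}.$$
Then
$$\rho_{\alpha, \varphi} = \int \varphi \, p^{\alpha} q^{1 - \alpha} \, dx = \frac{|\Sigma_{\alpha}|^{1/2}}{|\Sigma_0|^{\alpha/2} \, |\Sigma_1|^{(1 - \alpha)/2}} \exp\Big( -\tfrac{1}{2} J_{\alpha} + \gamma^{\top} m_{\alpha} + \tfrac{1}{2} \gamma^{\top} \Sigma_{\alpha} \gamma \Big).$$
Consequently,
$$B_{\alpha, \varphi} = \tfrac{1}{2} J_{\alpha} + \tfrac{1}{2} \log \frac{|\Sigma_0|^{\alpha} \, |\Sigma_1|^{1 - \alpha}}{|\Sigma_{\alpha}|} - \gamma^{\top} m_{\alpha} - \tfrac{1}{2} \gamma^{\top} \Sigma_{\alpha} \gamma. \tag{57}$$
In particular, setting $\gamma = 0$ (i.e., $\varphi \equiv 1$) reduces (57) to the classical (unweighted) Gaussian Bhattacharyya distance; see, e.g., [12].
Corollary 3 (Common covariance). In Example 1, assume $\Sigma_0 = \Sigma_1 = \Sigma$ and keep the exponential weight $\varphi(x) = e^{\gamma^{\top} x}$. Set $\Delta = \mu_0 - \mu_1$, $\delta^{2} = \Delta^{\top} \Sigma^{-1} \Delta$, and $m_{\alpha} = \alpha \mu_0 + (1 - \alpha) \mu_1$. Then, for any $\alpha \in [0, 1]$,
$$\rho_{\alpha, \varphi} = \exp\Big( -\tfrac{1}{2} \alpha (1 - \alpha) \delta^{2} + \gamma^{\top} m_{\alpha} + \tfrac{1}{2} \gamma^{\top} \Sigma \gamma \Big),$$
and therefore
$$B_{\alpha, \varphi} = \tfrac{1}{2} \alpha (1 - \alpha) \delta^{2} - \gamma^{\top} m_{\alpha} - \tfrac{1}{2} \gamma^{\top} \Sigma \gamma. \tag{59}$$
If $\Delta \ne 0$ and the unconstrained maximiser
$$\alpha^{\circ} = \frac{1}{2} - \frac{\gamma^{\top} \Delta}{\delta^{2}}$$
belongs to $[0, 1]$, then $\alpha^{*} = \alpha^{\circ}$; otherwise the maximum over $[0, 1]$ is attained at the nearest boundary point in $\{0, 1\}$. In all cases, $C_{\varphi} = B_{\alpha^{*}, \varphi}$. In particular, for $\gamma = 0$ (i.e., $\varphi \equiv 1$) we recover $\alpha^{*} = 1/2$ and $C_{\varphi} = \delta^{2}/8$.
Proof. The expression for $\rho_{\alpha, \varphi}$ follows by simplifying Example 1 under $\Sigma_0 = \Sigma_1 = \Sigma$, which makes the determinant prefactor equal to 1 and yields a Gaussian MGF term $\exp(\gamma^{\top} m_{\alpha} + \tfrac{1}{2} \gamma^{\top} \Sigma \gamma)$. The maximiser follows by differentiating (59) in $\alpha$. □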
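The common-covariance case is easy to check numerically in one dimension. The following sketch assumes the weight $\varphi(x) = e^{\gamma x}$ and the convention $\rho_{\alpha,\varphi} = \int \varphi \, p^{\alpha} q^{1-\alpha} \, dx$; the parameter values are hypothetical, and the closed forms coded here are the ones implied by those assumptions rather than a transcription of the paper's displays.

```python
import math

# Hypothetical one-dimensional parameters: P = N(MU0, SIGMA^2), Q = N(MU1, SIGMA^2).
MU0, MU1, SIGMA = 0.0, 1.0, 1.0
DELTA2 = (MU0 - MU1) ** 2 / SIGMA**2  # squared Mahalanobis distance


def bhattacharyya(alpha, gamma):
    """Weighted Bhattacharyya distance -log rho under the weight exp(gamma * x)."""
    m_alpha = alpha * MU0 + (1 - alpha) * MU1
    return 0.5 * alpha * (1 - alpha) * DELTA2 - gamma * m_alpha - 0.5 * gamma**2 * SIGMA**2


def rho_numeric(alpha, gamma, lo=-30.0, hi=30.0, n=40000):
    """Composite Simpson rule applied to the defining integral."""
    def pdf(x, mu):
        return math.exp(-0.5 * ((x - mu) / SIGMA) ** 2) / (SIGMA * math.sqrt(2 * math.pi))

    def f(x):
        return math.exp(gamma * x) * pdf(x, MU0) ** alpha * pdf(x, MU1) ** (1 - alpha)

    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def alpha_star(gamma):
    """Stationary point of the concave quadratic, projected onto [0, 1]."""
    unconstrained = 0.5 - gamma * (MU0 - MU1) / DELTA2
    return min(1.0, max(0.0, unconstrained))


for gamma in (0.0, 0.2, 1.0):
    a = alpha_star(gamma)
    print(gamma, a, bhattacharyya(a, gamma), -math.log(rho_numeric(a, gamma)))
```

With these values the stationary point equals $1/2 + \gamma$, so a tilt of size $\gamma \ge 1/2$ already pins the maximiser at the boundary.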
Choosing an exponential weight $\varphi(x) = e^{\gamma^{\top} x}$ corresponds to exponential tilting: for a Gaussian $\mathcal{N}(m, \Sigma)$, this tilting keeps the covariance and shifts the mean to $m + \Sigma \gamma$ (with normalisation factor $\exp(\gamma^{\top} m + \tfrac{1}{2} \gamma^{\top} \Sigma \gamma)$), which is why the weighted affinities remain available in closed form. In particular, the optimal Chernoff parameter is no longer forced to be $1/2$ and, as (59) shows, sufficiently strong tilting can push the maximiser to the boundary $\{0, 1\}$.
4.3. Poisson Models
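Example 2 (Poisson model with exponential weight). Let $\mathcal{X} = \mathbb{N}_0 = \{0, 1, 2, \dots\}$ and let $\mu$ be the counting measure on $\mathbb{N}_0$. Fix two hypotheses $H_0 : X \sim \mathrm{Poisson}(\lambda_0)$ and $H_1 : X \sim \mathrm{Poisson}(\lambda_1)$ with $\lambda_0, \lambda_1 > 0$, and write $p(k) = e^{-\lambda_0} \lambda_0^{k} / k!$ and $q(k) = e^{-\lambda_1} \lambda_1^{k} / k!$. Throughout this subsection we work under the standing assumption of Section 2, namely, that the observations $X_1, \dots, X_n$ are i.i.d. (distributed as $P$ under $H_0$ and as $Q$ under $H_1$), and that the weight $\varphi$ factorises across observations (Assumption 1). We still consider the exponential weight $\varphi(k) = e^{\gamma k}$, $\gamma \in \mathbb{R}$. (For $\gamma = 0$ we recover the unweighted case $\varphi \equiv 1$.) Equivalently, setting $\theta = e^{\gamma} > 0$, the weight takes the form $\varphi(k) = \theta^{k}$; this reparameterisation is convenient in applications where $\theta$ models a per-event discount factor, while $\gamma$ is the natural parameter for the exponential-family calculations below. The two parameterisations are equivalent. For $\alpha \in [0, 1]$, set
$$\lambda_{\alpha} = \lambda_0^{\alpha} \lambda_1^{1 - \alpha}.$$
(a) Weighted Bhattacharyya coefficient and Chernoff arc. A direct summation gives
$$\rho_{\alpha, \varphi} = \sum_{k \ge 0} e^{\gamma k} \, p(k)^{\alpha} q(k)^{1 - \alpha} = \exp\big( e^{\gamma} \lambda_{\alpha} - \alpha \lambda_0 - (1 - \alpha) \lambda_1 \big). \tag{60}$$
Hence, by Definition 1,
$$B_{\alpha, \varphi} = -\log \rho_{\alpha, \varphi} = \alpha \lambda_0 + (1 - \alpha) \lambda_1 - e^{\gamma} \lambda_{\alpha}. \tag{61}$$
Moreover, the normalised weighted geometric mixture (Chernoff-tilted density) from (18) takes the form
$$p_{\alpha, \varphi}(k) = \frac{\varphi(k) \, p(k)^{\alpha} q(k)^{1 - \alpha}}{\rho_{\alpha, \varphi}} = e^{-e^{\gamma} \lambda_{\alpha}} \frac{(e^{\gamma} \lambda_{\alpha})^{k}}{k!},$$
that is, a Poisson law with mean $e^{\gamma} \lambda_{\alpha}$.
(b) Optimal Chernoff parameter. If $\lambda_0 \ne \lambda_1$, then $B_{\alpha, \varphi}$ is strictly concave on $[0, 1]$ since
$$\frac{\partial^{2}}{\partial \alpha^{2}} B_{\alpha, \varphi} = -e^{\gamma} \lambda_{\alpha} \Big( \log \frac{\lambda_0}{\lambda_1} \Big)^{2} < 0;$$
hence, the maximiser in Definition 2 is unique. Differentiating (61) yields the critical point condition
$$\lambda_0 - \lambda_1 = e^{\gamma} \lambda_{\alpha} \log \frac{\lambda_0}{\lambda_1}.$$
Equivalently, the (unconstrained) critical point satisfies
$$\lambda_{\alpha} = \frac{\lambda_0 - \lambda_1}{e^{\gamma} \log(\lambda_0 / \lambda_1)}.$$
In contrast to the unweighted case ($\gamma = 0$), the context tilt $\gamma$ may push the optimal Chernoff parameter to the boundary $\{0, 1\}$.
Thus, the unconstrained maximiser is
$$\alpha^{\circ} = \frac{1}{\log(\lambda_0 / \lambda_1)} \Big[ \log \frac{\lambda_0 - \lambda_1}{\lambda_1 \log(\lambda_0 / \lambda_1)} - \gamma \Big],$$
and the maximiser on $[0, 1]$ is $\alpha^{*}$, where
$$\alpha^{*} = \min\{1, \max\{0, \alpha^{\circ}\}\}.$$
Finally, $C_{\varphi} = B_{\alpha^{*}, \varphi}$. If $\lambda_0 = \lambda_1 = \lambda$, then $\rho_{\alpha, \varphi}$ does not depend on $\alpha$ and $C_{\varphi} = B_{\alpha, \varphi} = \lambda (1 - e^{\gamma})$ for any $\alpha \in [0, 1]$.
Derivation of (60). For $k \in \mathbb{N}_0$,
$$p(k)^{\alpha} q(k)^{1 - \alpha} = e^{-\alpha \lambda_0 - (1 - \alpha) \lambda_1} \frac{\lambda_{\alpha}^{k}}{k!},$$
so multiplying by $e^{\gamma k}$ and summing over $k$ gives
$$\rho_{\alpha, \varphi} = e^{-\alpha \lambda_0 - (1 - \alpha) \lambda_1} \sum_{k \ge 0} \frac{(e^{\gamma} \lambda_{\alpha})^{k}}{k!} = \exp\big( e^{\gamma} \lambda_{\alpha} - \alpha \lambda_0 - (1 - \alpha) \lambda_1 \big).$$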
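The summation and the critical-point calculation for the Poisson model can be checked directly. This sketch assumes the weight $e^{\gamma k}$ and the convention $\rho_{\alpha,\varphi} = \sum_k \varphi(k) p(k)^{\alpha} q(k)^{1-\alpha}$; the rates and the tilt are hypothetical values chosen for illustration.

```python
import math

# Hypothetical parameters (not from the paper): Poisson rates and tilt.
LAM0, LAM1, GAMMA = 2.0, 5.0, 0.1


def log_pmf(k, lam):
    """Logarithm of the Poisson(lam) probability mass at k."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def rho_sum(alpha, terms=200):
    """Truncated direct summation of e^{gamma k} p(k)^alpha q(k)^(1 - alpha)."""
    return sum(
        math.exp(GAMMA * k + alpha * log_pmf(k, LAM0) + (1 - alpha) * log_pmf(k, LAM1))
        for k in range(terms)
    )


def rho_closed(alpha):
    """Closed form exp(e^gamma * lam_alpha - alpha*lam0 - (1 - alpha)*lam1)."""
    lam_alpha = LAM0**alpha * LAM1 ** (1 - alpha)
    return math.exp(math.exp(GAMMA) * lam_alpha - alpha * LAM0 - (1 - alpha) * LAM1)


def alpha_unconstrained():
    """Root of the derivative of -log rho_closed in alpha (needs LAM0 != LAM1)."""
    log_ratio = math.log(LAM0 / LAM1)
    return (math.log((LAM0 - LAM1) / (LAM1 * log_ratio)) - GAMMA) / log_ratio


def alpha_star():
    """Projection of the unconstrained root onto [0, 1]."""
    return min(1.0, max(0.0, alpha_unconstrained()))


a = alpha_star()
print("alpha* =", a, " C_phi =", -math.log(rho_closed(a)))
print("closed vs sum at alpha = 0.4:", rho_closed(0.4), rho_sum(0.4))
```

Working with log-probabilities in `rho_sum` avoids overflow in the factorial; 200 terms are far more than needed for rates of this size.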
4.4. Exponential Models
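Example 3 (Exponential model with exponential weight). Let $\mathcal{X} = [0, \infty)$ with Lebesgue measure. Fix two hypotheses $H_0 : X \sim \mathrm{Exp}(\lambda_0)$ and $H_1 : X \sim \mathrm{Exp}(\lambda_1)$ with rates $\lambda_0, \lambda_1 > 0$, and write
$$p(x) = \lambda_0 e^{-\lambda_0 x}, \qquad q(x) = \lambda_1 e^{-\lambda_1 x}, \qquad x \ge 0.$$
Consider the exponential weight $\varphi(x) = e^{\gamma x}$ with
$$\gamma < \min\{\lambda_0, \lambda_1\},$$
so that $\rho_{\alpha, \varphi} < \infty$ for all $\alpha \in [0, 1]$. For $\alpha \in [0, 1]$ set
$$\bar{\lambda}_{\alpha} = \alpha \lambda_0 + (1 - \alpha) \lambda_1.$$
(a) Weighted Bhattacharyya coefficient and Chernoff arc. A direct computation gives
$$\rho_{\alpha, \varphi} = \int_0^{\infty} e^{\gamma x} \, p(x)^{\alpha} q(x)^{1 - \alpha} \, dx = \frac{\lambda_0^{\alpha} \lambda_1^{1 - \alpha}}{\bar{\lambda}_{\alpha} - \gamma}.$$
Hence
$$B_{\alpha, \varphi} = \log(\bar{\lambda}_{\alpha} - \gamma) - \alpha \log \lambda_0 - (1 - \alpha) \log \lambda_1.$$
Moreover, the Chernoff-tilted density from (18) is again exponential:
$$p_{\alpha, \varphi}(x) = (\bar{\lambda}_{\alpha} - \gamma) \, e^{-(\bar{\lambda}_{\alpha} - \gamma) x}, \qquad x \ge 0.$$
(b) Optimal Chernoff parameter. If $\lambda_0 \ne \lambda_1$, then $B_{\alpha, \varphi}$ is strictly concave on $[0, 1]$; hence, the maximiser in Definition 2 is unique. Differentiating yields the critical point condition
$$\frac{\lambda_0 - \lambda_1}{\bar{\lambda}_{\alpha} - \gamma} = \log \frac{\lambda_0}{\lambda_1}.$$
Equivalently, the (unconstrained) critical point satisfies
$$\bar{\lambda}_{\alpha} - \gamma = \frac{\lambda_0 - \lambda_1}{\log(\lambda_0 / \lambda_1)},$$
so that
$$\alpha^{\circ} = \frac{1}{\lambda_0 - \lambda_1} \Big[ \frac{\lambda_0 - \lambda_1}{\log(\lambda_0 / \lambda_1)} + \gamma - \lambda_1 \Big].$$
The maximiser on $[0, 1]$ is $\alpha^{*} = \min\{1, \max\{0, \alpha^{\circ}\}\}$ (projection onto $[0, 1]$), and $C_{\varphi} = B_{\alpha^{*}, \varphi}$. If $\lambda_0 = \lambda_1 = \lambda$, then $B_{\alpha, \varphi} = \log(1 - \gamma / \lambda)$ does not depend on $\alpha$, so any $\alpha \in [0, 1]$ is optimal and $C_{\varphi} = \log(1 - \gamma / \lambda)$. Setting $\gamma = 0$ (i.e., $\varphi \equiv 1$) recovers the classical unweighted expressions.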
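A numerical check of the exponential model can be sketched along the same lines. It assumes the weight $e^{\gamma x}$ with $\gamma < \min\{\lambda_0, \lambda_1\}$ and the convention $\rho_{\alpha,\varphi} = \int \varphi \, p^{\alpha} q^{1-\alpha} \, dx$; the rates and the tilt below are hypothetical.

```python
import math

# Hypothetical parameters; integrability requires GAMMA < min(LAM0, LAM1).
LAM0, LAM1, GAMMA = 1.0, 3.0, 0.5


def rho_closed(alpha):
    """Closed form lam0^alpha * lam1^(1 - alpha) / (mixed rate - gamma)."""
    rate = alpha * LAM0 + (1 - alpha) * LAM1 - GAMMA
    return LAM0**alpha * LAM1 ** (1 - alpha) / rate


def rho_numeric(alpha, upper=80.0, n=40000):
    """Composite Simpson rule for the defining integral, truncated at `upper`."""
    def f(x):
        p = LAM0 * math.exp(-LAM0 * x)
        q = LAM1 * math.exp(-LAM1 * x)
        return math.exp(GAMMA * x) * p**alpha * q ** (1 - alpha)

    h = upper / n
    total = f(0.0) + f(upper)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3


def alpha_unconstrained():
    """Root of the derivative of -log rho_closed in alpha (needs LAM0 != LAM1)."""
    log_mean = (LAM0 - LAM1) / math.log(LAM0 / LAM1)  # logarithmic mean of the rates
    return (log_mean + GAMMA - LAM1) / (LAM0 - LAM1)


def alpha_star():
    """Projection of the unconstrained root onto [0, 1]."""
    return min(1.0, max(0.0, alpha_unconstrained()))


a = alpha_star()
print("alpha* =", a, " C_phi =", -math.log(rho_closed(a)))
print("closed vs numeric at alpha = 0.3:", rho_closed(0.3), rho_numeric(0.3))
```

The truncation point `upper` is chosen so that the neglected tail is below double precision for these rates; a larger tilt closer to the smaller rate would require a longer integration range.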
Additional Example (Baseline, Non-Exponential Family)
Appendix A contains a closed-form illustration for the Cauchy location–scale family. Since the Cauchy family is not an exponential family, this example complements the main text by showing that, even in the unweighted baseline case $\varphi \equiv 1$, the Bhattacharyya coefficient (in particular $\rho_{1/2}$) and the Chernoff information may involve special functions (complete elliptic integrals). For nontrivial weights $\varphi \not\equiv 1$, the symmetry $\rho_{\alpha} = \rho_{1 - \alpha}$ (hence $\alpha^{*} = 1/2$) typically fails and a comparable closed form is not available, so we keep the baseline Cauchy computation in the appendix.
4.5. Extension to M-ary Hypothesis Testing
We now record the finite-$M$ analogue of Theorem 1. The key observation is that the optimal $M$-ary pointwise loss is squeezed between pairwise minima (Lemma 3), and each pairwise term has logarithmic rate given by the corresponding weighted Chernoff information. Hence the overall $M$-ary rate is determined by the closest pair in terms of $C_{\varphi}$.
Fix an integer $M \ge 2$ and let $P_1, \dots, P_M$ be probability measures on $(\mathcal{X}, \mathcal{B})$ dominated by $\mu$, with strictly positive densities $p_1, \dots, p_M$. Let $X_1, \dots, X_n$ be i.i.d. under each hypothesis $H_i$. Assume that the weight function $\varphi$ factorises as in Assumption 1.
Assume moreover that for every $i \ne j$ and every $\alpha \in [0, 1]$,
$$0 < \int \varphi \, p_i^{\alpha} p_j^{1 - \alpha} \, d\mu < \infty, \tag{62}$$
so that all pairwise weighted Chernoff information values $C_{\varphi}(P_i, P_j)$ are well-defined and finite.
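A (deterministic) $M$-ary decision rule is a measurable map $d : \mathcal{X}^{n} \to \{1, \dots, M\}$. Define the context-sensitive loss under $H_i$ by
$$L_{i, n}(d) = \int_{\mathcal{X}^{n}} \varphi(x^{n}) \, \mathbf{1}\{d(x^{n}) \ne i\} \, p_i^{n}(x^{n}) \, d\mu^{\otimes n}(x^{n}),$$
and the total loss
$$L_n(d) = \sum_{i = 1}^{M} L_{i, n}(d).$$
Proposition 6 (Pointwise form of the optimal $M$-ary total loss). For each $n \ge 1$,
$$L_n^{*} = \inf_{d} L_n(d) = \int_{\mathcal{X}^{n}} \varphi(x^{n}) \Big[ \sum_{i = 1}^{M} p_i^{n}(x^{n}) - \max_{1 \le i \le M} p_i^{n}(x^{n}) \Big] d\mu^{\otimes n}(x^{n}), \tag{63}$$
where $p_i^{n}(x^{n}) = \prod_{k = 1}^{n} p_i(x_k)$. Moreover, an optimal rule is given by the maximum-likelihood classifier
$$d^{*}(x^{n}) \in \arg\max_{1 \le i \le M} p_i^{n}(x^{n})$$
(with any measurable tie-breaking).
Proof. Fix $d$. Using $\sum_{i} \mathbf{1}\{d(x^{n}) \ne i\} \, p_i^{n}(x^{n}) = \sum_{i} p_i^{n}(x^{n}) - p_{d(x^{n})}^{n}(x^{n})$ pointwise, we obtain
$$L_n(d) = \int_{\mathcal{X}^{n}} \varphi(x^{n}) \Big[ \sum_{i = 1}^{M} p_i^{n}(x^{n}) - p_{d(x^{n})}^{n}(x^{n}) \Big] d\mu^{\otimes n}(x^{n}).$$
Minimisation over $d$ is therefore pointwise in $x^{n}$ and is achieved by selecting an index maximising $p_i^{n}(x^{n})$, yielding (63). □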
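Lemma 3 (Pairwise minima). For any non-negative numbers $a_1, \dots, a_M$,
$$\max_{i < j} \min\{a_i, a_j\} \le \sum_{i = 1}^{M} a_i - \max_{1 \le i \le M} a_i \le \sum_{i < j} \min\{a_i, a_j\}. \tag{64}$$
Consequently, defining $L_{ij, n}^{*} = \int \varphi \, \min\{p_i^{n}, p_j^{n}\} \, d\mu^{\otimes n}$ for $i \ne j$, we have the sandwich inequality
$$\max_{i < j} L_{ij, n}^{*} \le L_n^{*} \le \sum_{i < j} L_{ij, n}^{*}. \tag{65}$$
Proof. Let $a_{(1)} \ge a_{(2)} \ge \dots \ge a_{(M)}$ be the decreasing rearrangement of $(a_i)$. Then $\sum_{i} a_i - \max_{i} a_i = \sum_{k = 2}^{M} a_{(k)} \ge a_{(2)}$. Moreover, $\max_{i < j} \min\{a_i, a_j\} = a_{(2)}$, proving the left inequality in (64).
For the right inequality, let $i^{*} \in \arg\max_{i} a_i$. Then
$$\sum_{i \ne i^{*}} a_i = \sum_{i \ne i^{*}} \min\{a_i, a_{i^{*}}\} \le \sum_{i < j} \min\{a_i, a_j\}.$$
Applying (64) pointwise to $a_i = p_i^{n}(x^{n})$, multiplying by $\varphi(x^{n})$ and integrating yields (65). □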
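The pointwise inequality is elementary, and a randomised check makes both sides concrete. The sketch below assumes the sandwich has the form "largest pairwise minimum ≤ sum minus maximum ≤ sum of pairwise minima"; the helper name `sandwich` is introduced here for illustration only.

```python
import itertools
import random


def sandwich(values):
    """Return (lower, middle, upper) for the pairwise-minima sandwich."""
    ordered = sorted(values, reverse=True)
    middle = sum(ordered[1:])  # sum of all values minus the largest one
    pair_mins = [min(a, b) for a, b in itertools.combinations(values, 2)]
    return max(pair_mins), middle, sum(pair_mins)


random.seed(0)
for _ in range(1000):
    m = random.randint(2, 6)
    vals = [random.random() for _ in range(m)]
    lower, middle, upper = sandwich(vals)
    # Small tolerance guards against floating-point summation order.
    assert lower <= middle + 1e-12 and middle <= upper + 1e-12

print(sandwich([0.5, 0.2, 0.3]))
```

For $M = 2$ all three quantities coincide, which is the binary case of Theorem 1; the gap between the outer terms is at most a factor $\binom{M}{2}$, which is why it does not affect the exponent.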
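Theorem 3 ($M$-ary exponent equals the minimum pairwise weighted Chernoff information). For $i \ne j$, let $C_{\varphi}(P_i, P_j)$ be the weighted Chernoff information as in Definition 2, and suppose that (62) holds true. Set
$$C_{\min} = \min_{i < j} C_{\varphi}(P_i, P_j).$$
Then the optimal $M$-ary total loss satisfies
$$\lim_{n \to \infty} \frac{1}{n} \log L_n^{*} = -C_{\min}, \tag{66}$$
or equivalently, $L_n^{*} = \exp(-n C_{\min} + o(n))$.
Proof. Fix $i \ne j$ and consider the binary testing problem between $P_i$ and $P_j$ with the same factorised weight $\varphi$. The optimal binary total loss equals $L_{ij, n}^{*} = \int \varphi \, \min\{p_i^{n}, p_j^{n}\} \, d\mu^{\otimes n}$, and by Theorem 1 applied to the pair $(P_i, P_j)$,
$$\lim_{n \to \infty} \frac{1}{n} \log L_{ij, n}^{*} = -C_{\varphi}(P_i, P_j).$$
Since the number of pairs is finite, for every $\varepsilon > 0$ letting $n$ be sufficiently large yields, simultaneously for all pairs $i < j$,
$$L_{ij, n}^{*} \ge e^{-n (C_{\varphi}(P_i, P_j) + \varepsilon)} \quad \text{and} \quad L_{ij, n}^{*} \le e^{-n (C_{\varphi}(P_i, P_j) - \varepsilon)}.$$
Now use the sandwich inequality (65). Let $(i_0, j_0)$ attain the minimum $C_{\min}$. From the lower bound,
$$L_n^{*} \ge L_{i_0 j_0, n}^{*} \ge e^{-n (C_{\min} + \varepsilon)}.$$
From the upper bound,
$$L_n^{*} \le \sum_{i < j} L_{ij, n}^{*} \le \binom{M}{2} e^{-n (C_{\min} - \varepsilon)}.$$
Taking $\frac{1}{n} \log$ of both bounds and letting $n \to \infty$ and then $\varepsilon \downarrow 0$ yields (66). □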
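Remark 8 (Nonzero priors do not change the exponent). Let $\pi_i > 0$, $\sum_{i = 1}^{M} \pi_i = 1$, and consider the Bayesian weighted total loss $L_n^{\pi}(d) = \sum_{i = 1}^{M} \pi_i L_{i, n}(d)$ with optimum $L_n^{\pi, *} = \inf_{d} L_n^{\pi}(d)$. Then, the exponent remains $C_{\min}$. Indeed, writing $\pi_{\min} = \min_i \pi_i$ and $\pi_{\max} = \max_i \pi_i$, for any $d$,
$$\pi_{\min} L_n(d) \le L_n^{\pi}(d) \le \pi_{\max} L_n(d),$$
and taking the infimum over $d$ gives $\pi_{\min} L_n^{*} \le L_n^{\pi, *} \le \pi_{\max} L_n^{*}$. Hence, $\frac{1}{n} \log L_n^{\pi, *}$ and $\frac{1}{n} \log L_n^{*}$ have the same limit $-C_{\min}$.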