Article

Weighted Chernoff Information and Optimal Loss Exponent in Context-Sensitive Hypothesis Testing

by Mark Kelbert 1,2,* and El’mira Yu. Kalimulina 3,4
1 Laboratory of Stochastic Analysis and Its Applications, Department of Statistics and Data Analysis, National Research University Higher School of Economics, 101000 Moscow, Russia
2 Department of Mathematics, Swansea University, Swansea SA2 8PP, UK
3 Institute for Information Transmission Problems, Russian Academy of Sciences (IITP RAS), 127051 Moscow, Russia
4 Faculty of Mechanics and Mathematics, Lomonosov Moscow State University, 119991 Moscow, Russia
* Author to whom correspondence should be addressed.
Entropy 2026, 28(5), 536; https://doi.org/10.3390/e28050536
Submission received: 20 March 2026 / Revised: 30 April 2026 / Accepted: 2 May 2026 / Published: 8 May 2026
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

We study binary hypothesis testing for i.i.d. observations under a multiplicative context weight. For the optimal weighted total loss, defined as the sum of weighted type-I and type-II losses, we prove the logarithmic asymptotic $L_n^{*} = \exp\{-n D_C^w(P,Q) + o(n)\}$, $n \to \infty$, where $D_C^w$ is the weighted Chernoff information. The single-letter form of the exponent relies on a structural assumption that the weight factorises across observations, $\varphi(x_1^n) = \prod_{i=1}^n \varphi(x_i)$; this restriction is essential for the single-letter representation and should be distinguished from the weaker qualitative description “multiplicative context weight”. The proof embeds the weighted geometric mixtures $\varphi\, p^{\alpha} q^{1-\alpha}$ into a likelihood-ratio exponential family and identifies the rate through its log-normaliser. We also derive concentration bounds for the tilted weighted log-likelihood, obtain closed forms for Gaussian, Poisson, and exponential models, and extend the exponent characterisation to finitely many hypotheses.

1. Introduction

Let $\mathcal{X}$ be a Polish space with its Borel $\sigma$-algebra and let $X_1^n = (X_1, \dots, X_n)$ be i.i.d. $\mathcal{X}$-valued observations. We consider the simple hypotheses
$$H_0 : X_1^n \sim P^{\otimes n} \quad \text{versus} \quad H_1 : X_1^n \sim Q^{\otimes n},$$
where $P$ and $Q$ are probability measures on $\mathcal{X}$ dominated by a reference measure $\mu$. Without loss of generality, one may take $\mu = \tfrac{1}{2}(P + Q)$ and write $p = \frac{dP}{d\mu}$ and $q = \frac{dQ}{d\mu}$. In the unweighted setting, the optimal sum of type-I and type-II error probabilities is characterised by $\mathrm{TV}(P^{\otimes n}, Q^{\otimes n})$ and can be written as
$$\int_{\mathcal{X}^n} \min\{p(x_1^n), q(x_1^n)\}\, d\mu^{\otimes n}(x_1^n), \qquad p(x_1^n) = \prod_{i=1}^n p(x_i), \quad q(x_1^n) = \prod_{i=1}^n q(x_i).$$
In the standard (unweighted) Bayesian setting, the decay rate of the optimal total error probability is governed by the Chernoff information [1,2]:
$$\rho_\alpha(p,q) := \int_{\mathcal{X}} p(x)^{\alpha} q(x)^{1-\alpha}\, d\mu(x), \quad \alpha \in [0,1], \qquad \rho(p,q) := \inf_{\alpha \in [0,1]} \rho_\alpha(p,q), \qquad D_C(P,Q) := -\ln \rho(p,q) = \max_{\alpha \in [0,1]} \big[-\ln \rho_\alpha(p,q)\big].$$
Here $\rho_\alpha$ is usually called the $\alpha$-skewed Bhattacharyya affinity coefficient, and $\rho(p,q) = \inf_{\alpha \in [0,1]} \rho_\alpha(p,q)$ is the affinity coefficient. In view of Hölder’s inequality, $\rho_\alpha(p,q) \in [0,1]$.
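As a small numerical illustration of the classical definition, the following sketch evaluates $\rho_\alpha$ and $D_C$ for two densities on a three-point alphabet with counting reference measure. The densities `p`, `q` and the helper names `affinity` and `chernoff_information` are hypothetical choices made for this example only; the minimiser is located by a plain ternary search, which relies on the convexity of $\alpha \mapsto \rho_\alpha(p,q)$.

```python
import math

def affinity(p, q, alpha):
    """Skewed Bhattacharyya affinity: sum_x p(x)^alpha * q(x)^(1 - alpha)."""
    return sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))

def chernoff_information(p, q, iters=200):
    """Ternary search for the minimiser of the convex map alpha -> rho_alpha on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if affinity(p, q, m1) < affinity(p, q, m2):
            hi = m2
        else:
            lo = m1
    alpha_star = 0.5 * (lo + hi)
    return -math.log(affinity(p, q, alpha_star)), alpha_star

# Hypothetical toy densities (q is p read backwards, so alpha* = 1/2 by symmetry).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
d_c, alpha_star = chernoff_information(p, q)
print(f"alpha* = {alpha_star:.4f}, D_C = {d_c:.5f}")
```

For this symmetric pair the affinity at $\alpha = 1/2$ is $0.3 + 2\sqrt{0.1}$, which gives an independent check of the search.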
Chernoff also introduced an asymptotic efficiency notion for comparing two experimental designs: $e = \frac{\ln \rho_1}{\ln \rho_2}$ is such that $n$ observations on one test are equivalent (i.e., they give asymptotically the same total loss as $n \to \infty$) to $e\,n$ observations on another test; see [1].
The paper studies a context-sensitive (weighted) analogue of this criterion and the logarithmic asymptotics of the optimal total loss as $n \to \infty$, in the framework of [3,4]. In the weighted setting, a nonnegative weight function $\varphi(x_1^n)$ reweights the loss of a wrong decision according to the realised sample. Thus, $\varphi$ acts as a context factor that changes the relevance of different observations for the statistical task.
Weights of this form arise naturally whenever observations are not equally informative for the inference task. Two canonical mechanisms produce such φ . In importance-type reweighting, samples drawn under a proposal density g are used to perform inference with respect to a target h, and the Radon–Nikodym factor φ ( x ) = h ( x ) / g ( x ) enters the loss as a strictly positive (non-indicator) tilt; this is the mechanism underlying the context-sensitive framework of [3,4].
In applications, the informational value of an observation often depends on the underlying channel state. A canonical example, directly relevant to multiple hypothesis testing of transmission regimes, is a mobile communication channel modulated by a multi-zone coverage process (e.g., strong/weak/outage) along the receiver trajectory: samples acquired in outage carry little information about the regime and are weighted accordingly. Such reliability-weighted aggregation in multi-state channels was studied within multi-valued frameworks in [5,6].
Under the standard assumption that the modulating state at time $i$ is determined by $X_i$ alone, the resulting weight is a strictly positive bounded function $\varphi(x)$ and extends multiplicatively to $X_1^n$. The weighted Chernoff information $D_C^w(P,Q)$ then quantifies the effective discrimination rate under channel-dependent reliability and reduces to the classical rate $D_C(P,Q)$ in the limit $\varphi \equiv 1$. Further parametric instances (Gaussian, Poisson, exponential) are worked out in Section 4.
Throughout we assume that the weight is compatible with the i.i.d. structure and factorises across observations; by abuse of notation, φ denotes both the one-step weight and its product extension.
Assumption 1 
(Factorised weight). The weight function $\varphi(x_1^n)$ satisfies
$$\varphi(x_1^n) = \prod_{i=1}^n \varphi(x_i), \qquad \varphi \ge 0.$$
Assumption 1 is the key single-letter hypothesis. It yields the weighted affinities
$$\rho_\alpha^w(p,q) = \int_{\mathcal{X}} \varphi(x)\, p(x)^{\alpha} q(x)^{1-\alpha}\, d\mu(x),$$
hence an additive logarithmic rate. For one observation and equal priors, the weighted Bayes risk equals
$$\frac{1}{2} \int_{\mathcal{X}} \varphi(x) \min\{p(x), q(x)\}\, d\mu(x).$$
Since $\min\{a,b\} \le a^{\alpha} b^{1-\alpha}$ for every $\alpha \in [0,1]$,
$$\int_{\mathcal{X}} \varphi(x) \min\{p(x), q(x)\}\, d\mu(x) \le \rho_\alpha^w(p,q),$$
and therefore
$$\int_{\mathcal{X}} \varphi(x) \min\{p(x), q(x)\}\, d\mu(x) \le \exp\{-D_C^w(P,Q)\},$$
where $D_C^w(P,Q) := \max_{\alpha \in [0,1]} \big[-\ln \rho_\alpha^w(p,q)\big]$ (see Definition 2). Under Assumption 1, the same bound factorises over $n$ observations and yields the exponential scale $\exp\{-n D_C^w(P,Q)\}$. Theorem 1 shows that this scale is exact on the logarithmic level.
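The one-observation bound can be checked numerically on a toy example. The sketch below assumes a three-point alphabet with counting measure and a hypothetical weight `phi`; it evaluates the weighted integral of $\min\{p,q\}$, scans $\rho_\alpha^w$ on a grid of $\alpha$ values, and reads off a grid approximation of $D_C^w$. None of these numbers come from the paper.

```python
import math

def weighted_affinity(phi, p, q, alpha):
    """rho^w_alpha = sum_x phi(x) * p(x)^alpha * q(x)^(1 - alpha)."""
    return sum(f * pi ** alpha * qi ** (1.0 - alpha) for f, pi, qi in zip(phi, p, q))

# Hypothetical weight and densities, chosen only for illustration.
phi = [1.0, 0.5, 0.8]
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

# Left-hand side of the bound: weighted integral of min{p, q}.
weighted_min = sum(f * min(pi, qi) for f, pi, qi in zip(phi, p, q))

# Right-hand side on a grid of skewing parameters.
grid = [k / 1000.0 for k in range(1001)]
rho_w = [weighted_affinity(phi, p, q, a) for a in grid]
rho_w_min = min(rho_w)
d_c_w = -math.log(rho_w_min)  # grid approximation of the weighted Chernoff information

print(f"weighted min integral = {weighted_min:.4f}")
print(f"min_alpha rho^w_alpha = {rho_w_min:.6f}, D_C^w ~ {d_c_w:.6f}")
```

For these particular values the stationarity condition reduces to $6.25^{\alpha} = 2$, so the exact minimum of $\rho_\alpha^w$ is $0.15 + 0.4\sqrt{2}$; the grid value should agree with it to several digits.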

1.1. Main Result and Contributions

Let $L_n^{*}$ denote the optimal total context-sensitive loss (sum of weighted type-I and type-II losses, minimised over decision rules) for $n$ i.i.d. observations under Assumption 1. Our main theorem (Theorem 1) proves the single-letter logarithmic asymptotic
$$L_n^{*} = \exp\{-n D_C^w(P,Q) + o(n)\}, \qquad n \to \infty, \tag{4}$$
where the rate is the weighted Chernoff information
$$D_C^w(P,Q) = \max_{\alpha \in [0,1]} \Big[-\ln \int_{\mathcal{X}} \varphi(x)\, p(x)^{\alpha} q(x)^{1-\alpha}\, d\mu(x)\Big]. \tag{5}$$
For $\varphi \equiv 1$, (5) reduces to the classical Chernoff information.
We also extend the exponent characterisation to a finite family of simple hypotheses: the optimal $M$-ary rate is the minimum pairwise weighted Chernoff information (cf. [7] in the unweighted case). A central technical device is an exponential-family representation of the weighted geometric mixtures $\alpha \mapsto \varphi\, p^{\alpha} q^{1-\alpha}$. This embeds the mixtures into a likelihood-ratio exponential family and identifies the exponent through the corresponding log-normaliser. We further derive concentration bounds for tilted weighted log-likelihood ratios and closed-form expressions for $D_C^w$ in several parametric models; see Section 4.

1.2. Contributions

Items (N1)–(N4) below indicate new results; items (A1)–(A3) summarise definitions, geometric context, and tools adopted from the existing literature.
(N1)
(New.) Theorem 1 establishes the logarithmic asymptotic (4) for the optimal weighted total loss under the factorised weight of Assumption 1, with rate given by the weighted Chernoff information (5).
(N2)
(New.) The exponential-family representation of the weighted geometric mixtures $\alpha \mapsto \varphi\, p^{\alpha} q^{1-\alpha}$ (Section 3.2) and the resulting uniqueness of the optimal skewing parameter $\alpha^{*}$.
(N3)
(New.) Concentration bounds for the tilted weighted log-likelihood and the finite-$n$ tail bound of Theorem 2 (Section 3.4).
(N4)
(New.) Closed-form expressions for $D_C^w$ in the Gaussian, Poisson, and exponential models (Section 4), and the $M$-ary extension showing that the optimal rate equals the minimum pairwise weighted Chernoff information.
(A1)
(Adapted definitions.) The definitions of the weighted Bhattacharyya affinities and the weighted Chernoff information generalise the classical unweighted quantities of [1,2] and follow the context-sensitive framework of [3,4]; their asymptotic and information-geometric consequences developed below are new.
(A2)
(Geometric context.) The information-geometric identities of Section 3.3 are derived in the spirit of the Chentsov–Amari–Nielsen framework [8,9,10,11] but are stated and proved for the tilted log-normaliser $\hat{F}(\theta) = \ln \int \varphi(x)\, e^{\theta^{T} t(x) + k(x)}\, d\mu(x)$; the unweighted limit $\varphi \equiv 1$ recovers the classical statements of [11,12].
(A3)
(Standard tool.) The concentration argument uses the Azuma–Hoeffding/McDiarmid inequality [13,14]; the novelty lies in its application to the tilted weighted log-likelihood.

1.3. Related Work

The exponential theory of testing errors goes back to Chernoff [1] and Hoeffding [2]. The context-sensitive framework and the weighted information quantities used here were developed in [3,4]. The information-geometric viewpoint on Chernoff information originates with Chentsov [9]; the dually flat structure of exponential and mixture families and the associated $\alpha$-divergences are developed in [8,10], and the Chernoff point is characterised as the intersection of an exponential geodesic with the Kullback–Leibler bisector in [11]. For $\varphi \equiv 1$, the likelihood-ratio exponential family description is given in [12]; the present paper extends this picture to the tilted integrand $\varphi\, p^{\alpha} q^{1-\alpha}$. The minimum-pairwise principle for multiple testing is due to [7]. Weighting mechanisms for covariate-dependent relevance have also been studied outside the asymptotic error-exponent framework, e.g., adaptive-kernel conditional-independence testing [15].

1.4. Structure of the Paper

Section 2 introduces the weighted Bhattacharyya affinities and the weighted Chernoff information. Section 3 proves the main asymptotic result (4) and develops the exponential-family and information-geometric identities. Section 3.4 studies the tilted weighted log-likelihood and derives finite-n concentration bounds. Section 4 examines Gaussian, Poisson, and exponential models and includes the M-ary extension. Auxiliary computations are collected in the appendices.

2. Problem Set-Up and Weighted Divergences

2.1. Context-Sensitive Losses and Weighted Total Variation

We keep the binary i.i.d. model from Section 1 and work under Assumption 1. In particular,
$$p(x_1^n) = \prod_{i=1}^n p(x_i), \qquad q(x_1^n) = \prod_{i=1}^n q(x_i), \qquad \varphi(x_1^n) = \prod_{i=1}^n \varphi(x_i).$$
Define the $\varphi$-tilted (reweighted) densities
$$p^{*}(x) := \frac{\varphi(x) p(x)}{E_\varphi(p)}, \qquad q^{*}(x) := \frac{\varphi(x) q(x)}{E_\varphi(q)}, \qquad E_\varphi(p) := \int \varphi(x) p(x)\, d\mu(x),$$
(and similarly for $E_\varphi(q)$). Throughout this section, we assume that $E_\varphi(p), E_\varphi(q) \in (0,\infty)$. (Equivalently, $\rho_0^w(p,q), \rho_1^w(p,q) \in (0,\infty)$.) Then $p^{*}, q^{*}$ are probability densities and, under $\varphi(x_1^n) = \prod_{i=1}^n \varphi(x_i)$, we have $\varphi(x_1^n)\, p(x_1^n) = E_\varphi(p)^n\, (p^{*})^{\otimes n}(x_1^n)$ (and similarly for $q$).
Under Assumption 1, we have
$$\int_{\mathcal{X}^n} \varphi(x_1^n)\, p(x_1^n)\, d\mu^{\otimes n} = (E_\varphi(p))^n, \qquad \int_{\mathcal{X}^n} \varphi(x_1^n)\, q(x_1^n)\, d\mu^{\otimes n} = (E_\varphi(q))^n.$$
Let $\mathcal{D}$ denote the class of (possibly randomised) decision rules $D : \mathcal{X}^n \to [0,1]$, where $D(x_1^n)$ is the probability of deciding in favour of $H_1$ after observing $x_1^n$. (Deterministic rules correspond to $D \in \{0,1\}$.)
For $D \in \mathcal{D}$, define the context-sensitive type-I and type-II losses by
$$\alpha_\varphi(D) := E_{P^{\otimes n}}\big[\varphi(X_1^n)\, D(X_1^n)\big] = \int_{\mathcal{X}^n} \varphi(x_1^n)\, D(x_1^n)\, p(x_1^n)\, d\mu^{\otimes n}(x_1^n), \tag{6}$$
$$\beta_\varphi(D) := E_{Q^{\otimes n}}\big[\varphi(X_1^n)\, (1 - D(X_1^n))\big] = \int_{\mathcal{X}^n} \varphi(x_1^n)\, (1 - D(x_1^n))\, q(x_1^n)\, d\mu^{\otimes n}(x_1^n), \tag{7}$$
and the corresponding total loss
$$L_n(D) := \alpha_\varphi(D) + \beta_\varphi(D), \qquad L_n^{*} := \inf_{D \in \mathcal{D}} L_n(D).$$
Proposition 1 
(Pointwise form of the optimal total loss). For each $n \ge 1$,
$$L_n^{*} = \int_{\mathcal{X}^n} \varphi(x_1^n) \min\{p(x_1^n), q(x_1^n)\}\, d\mu^{\otimes n}(x_1^n). \tag{8}$$
Moreover, an optimal (deterministic) decision rule is given by the likelihood-ratio test
$$D_n^{*}(x_1^n) = \mathbf{1}\{q(x_1^n) \ge p(x_1^n)\}$$
(with any measurable tie-breaking on $\{p = q\}$).
Proof. 
Fix $x_1^n$. The integrand in $L_n(D)$ equals
$$\varphi(x_1^n)\big[p(x_1^n)\, D(x_1^n) + q(x_1^n)\, (1 - D(x_1^n))\big] = \varphi(x_1^n)\big[q(x_1^n) + D(x_1^n)\, (p(x_1^n) - q(x_1^n))\big].$$
Minimising pointwise over $D(x_1^n) \in [0,1]$ yields $D_n^{*}(x_1^n) = 1$ when $p(x_1^n) \le q(x_1^n)$ and $D_n^{*}(x_1^n) = 0$ when $p(x_1^n) > q(x_1^n)$, giving (8). □
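Proposition 1 can be sanity-checked by brute force on a very small sample space. The sketch below assumes a three-point alphabet, $n = 2$, and hypothetical values for `phi`, `p`, `q`; it enumerates all $2^9$ deterministic rules (which suffice because the loss is linear in $D$), and compares the smallest total loss with the pointwise formula and with the loss of the likelihood-ratio rule.

```python
import itertools
import math

phi = [1.0, 0.5, 0.8]   # hypothetical one-step weight
p = [0.5, 0.3, 0.2]     # hypothetical density under H0
q = [0.2, 0.3, 0.5]     # hypothetical density under H1
n = 2
points = list(itertools.product(range(3), repeat=n))

def product_over(values, x):
    """Product extension v(x_1^n) = prod_i v(x_i)."""
    return math.prod(values[i] for i in x)

def total_loss(rule):
    """Weighted type-I plus type-II loss; rule[j] = 1 means 'decide H1' at points[j]."""
    loss = 0.0
    for x, d in zip(points, rule):
        w = product_over(phi, x)
        loss += w * (product_over(p, x) * d + product_over(q, x) * (1 - d))
    return loss

# Exhaustive minimum over all deterministic rules on the 9 sample points.
brute_force = min(total_loss(rule)
                  for rule in itertools.product((0, 1), repeat=len(points)))

# Pointwise formula: sum of phi * min{p, q} over the sample space.
pointwise = sum(product_over(phi, x) * min(product_over(p, x), product_over(q, x))
                for x in points)

# Likelihood-ratio rule: decide H1 whenever q >= p.
lrt_rule = tuple(1 if product_over(q, x) >= product_over(p, x) else 0 for x in points)
lrt_loss = total_loss(lrt_rule)

print(brute_force, pointwise, lrt_loss)
```

The three printed numbers should coincide up to floating-point rounding.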
We also use the weighted total variation distance
$$\mathrm{TV}_\varphi(P^{\otimes n}, Q^{\otimes n}) := \frac{1}{2} \int_{\mathcal{X}^n} \varphi(x_1^n)\, \big|p(x_1^n) - q(x_1^n)\big|\, d\mu^{\otimes n}(x_1^n). \tag{9}$$
Remark 1. 
For $\varphi \equiv 1$, this reduces to the usual total variation distance. If $\varphi$ vanishes on a non-negligible set, $\mathrm{TV}_\varphi$ is, in general, a pseudo-distance; this is sufficient for our purposes since it characterises the weighted losses.
Using $\min\{a,b\} = \tfrac{1}{2}(a + b - |a - b|)$ in (8) and the definition of $\mathrm{TV}_\varphi$ yields
$$L_n^{*} = \frac{1}{2}\Big[(E_\varphi(p))^n + (E_\varphi(q))^n\Big] - \mathrm{TV}_\varphi(P^{\otimes n}, Q^{\otimes n}). \tag{10}$$
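The identity between the optimal loss and the weighted total variation is easy to confirm by exact enumeration for small $n$. The following sketch uses the same hypothetical three-point example (`phi`, `p`, `q` are illustrative values, not taken from the paper) and compares both sides for $n = 1, \dots, 5$.

```python
import itertools
import math

phi = [1.0, 0.5, 0.8]   # hypothetical one-step weight
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

def identity_sides(n):
    """Return (L_n^*, 0.5 * (E_phi(p)^n + E_phi(q)^n) - TV_phi) by enumerating X^n."""
    optimal_loss = 0.0
    weighted_tv = 0.0
    for x in itertools.product(range(len(p)), repeat=n):
        w = math.prod(phi[i] for i in x)
        pn = math.prod(p[i] for i in x)
        qn = math.prod(q[i] for i in x)
        optimal_loss += w * min(pn, qn)
        weighted_tv += 0.5 * w * abs(pn - qn)
    e_p = sum(f * pi for f, pi in zip(phi, p))
    e_q = sum(f * qi for f, qi in zip(phi, q))
    return optimal_loss, 0.5 * (e_p ** n + e_q ** n) - weighted_tv

results = {n: identity_sides(n) for n in range(1, 6)}
for n, (lhs, rhs) in results.items():
    print(n, lhs, rhs)
```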

2.2. Weighted Affinities and Chernoff Information

We introduce the weighted Bhattacharyya affinities and the weighted Chernoff information. Assume that $\rho_\alpha^w(p,q) \in (0,\infty)$ for all $\alpha \in [0,1]$.
Definition 1 
(Weighted Bhattacharyya coefficient and distance). For $\alpha \in [0,1]$ define the weighted $\alpha$-skewed Bhattacharyya affinity coefficient
$$\rho_\alpha^w(p,q) := \int_{\mathcal{X}} \varphi(x)\, p(x)^{\alpha} q(x)^{1-\alpha}\, d\mu(x), \tag{11}$$
and the corresponding weighted Bhattacharyya distance
$$D_{B,\alpha}^w(p,q) := -\ln \rho_\alpha^w(p,q). \tag{12}$$
Definition 2 
(Weighted Chernoff information). The weighted Chernoff information divergence between $P$ and $Q$ is
$$D_C^w(P,Q) = \max_{\alpha \in [0,1]} \Big[-\ln \int_{\mathcal{X}} \varphi(x)\, p(x)^{\alpha} q(x)^{1-\alpha}\, d\mu(x)\Big] = \max_{\alpha \in [0,1]} D_{B,\alpha}^w(p,q). \tag{13}$$
A maximiser $\alpha^{*} = \alpha^{*}(p,q)$ in (13) is called the optimal Chernoff parameter.
Remark 2. 
The weighted Chernoff information is symmetric: $D_C^w(P,Q) = D_C^w(Q,P)$, since $\rho_\alpha^w(p,q) = \rho_{1-\alpha}^w(q,p)$. In general, however, $D_C^w$ does not satisfy the triangle inequality and is therefore a divergence rather than a metric.
Remark 3. 
Under Assumption 1, for every $\alpha \in [0,1]$,
$$\int_{\mathcal{X}^n} \varphi(x_1^n)\, p(x_1^n)^{\alpha} q(x_1^n)^{1-\alpha}\, d\mu^{\otimes n}(x_1^n) = \rho_\alpha^w(p,q)^n.$$
Consequently, the weighted Bhattacharyya distances are additive in $n$ and the corresponding Chernoff exponent is of single-letter form.
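The factorisation in Remark 3 can be verified directly on a finite alphabet. The sketch below, with hypothetical `phi`, `p`, `q` on three points, computes the $n$-fold weighted affinity by enumerating $\mathcal{X}^n$ and compares it with the $n$-th power of the single-letter quantity for a few values of $\alpha$ and $n$.

```python
import itertools
import math

phi = [1.0, 0.5, 0.8]   # hypothetical one-step weight
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

def rho_w(alpha):
    """Single-letter weighted affinity."""
    return sum(f * pi ** alpha * qi ** (1.0 - alpha) for f, pi, qi in zip(phi, p, q))

def rho_w_product(alpha, n):
    """n-letter weighted affinity computed by brute-force enumeration of X^n."""
    total = 0.0
    for x in itertools.product(range(len(p)), repeat=n):
        w = math.prod(phi[i] for i in x)
        pn = math.prod(p[i] for i in x)
        qn = math.prod(q[i] for i in x)
        total += w * pn ** alpha * qn ** (1.0 - alpha)
    return total

# Largest relative discrepancy between the n-letter affinity and rho_w(alpha)**n.
max_rel_gap = max(
    abs(rho_w_product(a, n) - rho_w(a) ** n) / rho_w(a) ** n
    for a in (0.0, 0.25, 0.5, 1.0)
    for n in (1, 2, 3, 4)
)
print(max_rel_gap)
```

Additivity of the weighted Bhattacharyya distance in $n$ follows by taking $-\ln$ of both sides.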

3. Asymptotics and Information-Geometric Identities

Before stating the main theorem, we separate the two optimisations that appear throughout this section and should not be conflated. The total loss $L_n(D) = \alpha_\varphi(D) + \beta_\varphi(D)$ is a non-negative functional of the decision rule and is minimised over $D \in \mathcal{D}$, giving the optimal loss $L_n^{*} = \inf_{D \in \mathcal{D}} L_n(D)$. The map $\alpha \mapsto -\ln \rho_\alpha^w(p,q)$ is a concave functional of the skewing parameter (non-negative when, for instance, $\varphi \le 1$, but possibly negative for weights exceeding one) and is maximised over $\alpha \in [0,1]$, giving the weighted Chernoff information $D_C^w(P,Q) = \sup_{\alpha \in [0,1]} [-\ln \rho_\alpha^w(p,q)]$. Theorem 1 below connects the two via the single-letter asymptotic $L_n^{*} = \exp\{-n D_C^w(P,Q) + o(n)\}$.

3.1. Asymptotics of the Optimal Sum of Losses

Recall from Section 2 that
$$L_n^{*} := \inf_{D \in \mathcal{D}} \big[\alpha_\varphi(D) + \beta_\varphi(D)\big],$$
and that Proposition 1 yields (8). The next theorem identifies its exact logarithmic asymptotic rate with the weighted Chernoff information from Definition 2.
Theorem 1 
(Optimal sum of context-sensitive losses). Consider the binary hypotheses $H_0 : X_1^n \sim P^{\otimes n}$ versus $H_1 : X_1^n \sim Q^{\otimes n}$, under Assumption 1. Assume also
$$\sup_{\alpha \in [0,1]} \int \varphi(x)\, \Big|\ln \frac{p(x)}{q(x)}\Big|\, p(x)^{\alpha} q(x)^{1-\alpha}\, d\mu(x) < \infty. \tag{14}$$
Let $\mathcal{D}$ be the class of (possibly randomised) decision rules $D : \mathcal{X}^n \to [0,1]$, and let $\alpha_\varphi(D)$, $\beta_\varphi(D)$ be defined by (6) and (7). Assume that $p, q > 0$ $\mu$-a.e. and that $\rho_\alpha^w(p,q) \in (0,\infty)$ for all $\alpha \in [0,1]$.
Then, as $n \to \infty$,
$$L_n^{*} = \exp\{-n D_C^w(P,Q) + o(n)\}. \tag{15}$$
Equivalently,
$$\lim_{n \to \infty} \frac{1}{n} \ln L_n^{*} = -D_C^w(P,Q).$$
Proof. 
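By Proposition 1,
$$L_n^{*} = \int_{\mathcal{X}^n} \varphi(x_1^n) \min\{p(x_1^n), q(x_1^n)\}\, d\mu^{\otimes n}(x_1^n).$$
For any $\alpha \in [0,1]$, $\min\{a,b\} \le a^{\alpha} b^{1-\alpha}$; hence, by factorisation,
$$L_n^{*} \le \rho_\alpha^w(p,q)^n.$$
Taking the infimum over $\alpha \in [0,1]$ gives
$$\limsup_{n \to \infty} \frac{1}{n} \ln L_n^{*} \le -D_C^w(P,Q).$$
Now fix $\alpha^{*} \in \arg\min_{\alpha \in [0,1]} \rho_\alpha^w(p,q)$ and define
$$r_{\alpha^{*}}(x) := \frac{\varphi(x)\, p(x)^{\alpha^{*}} q(x)^{1-\alpha^{*}}}{\rho_{\alpha^{*}}^w(p,q)}, \qquad S_n := \sum_{i=1}^n \ln \frac{p(X_i)}{q(X_i)}.$$
A direct change of measure yields
$$L_n^{*} = \rho_{\alpha^{*}}^w(p,q)^n\; E_{r_{\alpha^{*}}^{\otimes n}}\Big[e^{-\alpha^{*} S_n}\, \mathbf{1}\{S_n > 0\} + e^{(1-\alpha^{*}) S_n}\, \mathbf{1}\{S_n \le 0\}\Big].$$
The bracket is bounded above by 1.
Let $F(\alpha) := \ln \rho_\alpha^w(p,q)$; it is easy to check that $F$ is a convex function. Under the regularity assumption (14),
$$F'(\alpha) = E_{r_\alpha}\Big[\ln \frac{p(X)}{q(X)}\Big].$$
If $\alpha^{*} \in (0,1)$, then $F'(\alpha^{*}) = 0$. Hence, by the law of large numbers, $S_n / n \to 0$ in $r_{\alpha^{*}}^{\otimes n}$-probability. Therefore, for every $\varepsilon > 0$,
$$E_{r_{\alpha^{*}}^{\otimes n}}\Big[e^{(1-\alpha^{*}) S_n}\, \mathbf{1}\{S_n \le 0\} + e^{-\alpha^{*} S_n}\, \mathbf{1}\{S_n > 0\}\Big] \ge e^{-\varepsilon n}\, r_{\alpha^{*}}^{\otimes n}\big(|S_n| \le \varepsilon n\big) = e^{-\varepsilon n}\,(1 + o(1)).$$
Since $\varepsilon > 0$ is arbitrary, the expectation is $\exp\{o(n)\}$. Combined with the upper bound by 1, this implies
$$L_n^{*} = \exp\{-n D_C^w(P,Q) + o(n)\}.$$
In the boundary case $\alpha^{*} = 0$, the mean value $m$ of $\ln\frac{p(X)}{q(X)}$ under $r_{\alpha^{*}}$ satisfies $m > 0$, and $\mathbf{1}(S_n > 0)\, e^{-\alpha S_n}\big|_{\alpha = 0} \to 1$ a.s. as $n \to \infty$ by the strong LLN. Similarly, in the case $\alpha^{*} = 1$, we have the mean value $m < 0$, and $\mathbf{1}(S_n \le 0)\, e^{(1-\alpha) S_n}\big|_{\alpha = 1} \to 1$ a.s. as $n \to \infty$. This completes the proof. □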
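The logarithmic asymptotic can be observed numerically on a finite alphabet, where the optimal loss is computable exactly by summing over types (empirical counts). The sketch below is only an illustration under assumed toy values of `phi`, `p`, `q`; it evaluates $-\frac{1}{n}\ln L_n^{*}$ for a few $n$ in log-space and compares it with a grid approximation of $D_C^w$. The helper name `log_optimal_loss` is hypothetical.

```python
import math

phi = [1.0, 0.5, 0.8]   # hypothetical strictly positive one-step weight
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
log_phi = [math.log(v) for v in phi]
log_p = [math.log(v) for v in p]
log_q = [math.log(v) for v in q]

def log_optimal_loss(n):
    """ln L_n^* via exact summation over types (k0, k1, k2) with k0 + k1 + k2 = n."""
    terms = []
    for k0 in range(n + 1):
        for k1 in range(n + 1 - k0):
            ks = (k0, k1, n - k0 - k1)
            # log of the multinomial coefficient: number of sequences with this type
            log_count = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in ks)
            lw = sum(k * v for k, v in zip(ks, log_phi))
            lp = sum(k * v for k, v in zip(ks, log_p))
            lq = sum(k * v for k, v in zip(ks, log_q))
            terms.append(log_count + lw + min(lp, lq))
    m = max(terms)
    return m + math.log(sum(math.exp(v - m) for v in terms))  # log-sum-exp

def rho_w(alpha):
    return sum(f * pi ** alpha * qi ** (1.0 - alpha) for f, pi, qi in zip(phi, p, q))

# Grid approximation of the weighted Chernoff information.
d_c_w = -math.log(min(rho_w(k / 10000.0) for k in range(10001)))

rates = {n: -log_optimal_loss(n) / n for n in (20, 100, 400)}
for n, rate in rates.items():
    print(f"n = {n:4d}: -ln(L_n*)/n = {rate:.5f}   (D_C^w ~ {d_c_w:.5f})")
```

The empirical rates stay above $D_C^w$, in line with the upper bound $L_n^{*} \le \rho_{\alpha}^w(p,q)^n$, and the gap shrinks as $n$ grows, which is the $o(n)$ correction in the exponent.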
Corollary 1 
(Asymptotics of the weighted total variation). Under the assumptions of Theorem 1, the weighted total variation satisfies
$$\mathrm{TV}_\varphi(P^{\otimes n}, Q^{\otimes n}) = \frac{1}{2}\Big[(E_\varphi(p))^n + (E_\varphi(q))^n\Big] - \exp\{-n D_C^w(P,Q) + o(n)\}, \qquad n \to \infty,$$
where $E_\varphi(r) = \int_{\mathcal{X}} \varphi(x)\, r(x)\, d\mu(x)$.
Proof. 
Combine the identity (10) with (15). □
Remark 4. 
The weighted α-skewed Bhattacharyya distance (12) appears in many papers, see e.g., [12]. Definition 2 shows that the weighted Chernoff information divergence is the maximally skewed weighted Bhattacharyya distance.

3.2. Exponential-Family Representation and Uniqueness of α

In order to develop an effective computational procedure and to connect the weighted Chernoff information to information geometry, we embed the weighted geometric mixtures of $p$ and $q$ into a one-parameter likelihood-ratio exponential family. For $\alpha \in [0,1]$ define
$$Z_{pq}(\alpha) := \int_{\mathcal{X}} \varphi(x)\, p(x)^{\alpha} q(x)^{1-\alpha}\, d\mu(x) = \rho_\alpha^w(p,q),$$
and the corresponding family of normalised densities
$$\mathcal{E}_{pq} = \Big\{ (pq)_\alpha(x) := \frac{\varphi(x)\, p(x)^{\alpha} q(x)^{1-\alpha}}{Z_{pq}(\alpha)} \;:\; \alpha \in [0,1] \Big\}. \tag{18}$$
By assumption $Z_{pq}(\alpha) \in (0,\infty)$, so $(pq)_\alpha$ is well-defined as a probability density w.r.t. $\mu$.
Set $t(x) := \ln \frac{p(x)}{q(x)}$ and $k_{pq}(x) := \ln \varphi(x) + \ln q(x)$. Then, $(pq)_\alpha$ admits the exponential-family form
$$(pq)_\alpha(x) = \exp\{\alpha\, t(x) - F_{pq}(\alpha) + k_{pq}(x)\},$$
$$F_{pq}(\alpha) := \ln Z_{pq}(\alpha) = -D_{B,\alpha}^w(p,q).$$
In particular, $t(X)$ is a sufficient statistic for the family $\mathcal{E}_{pq}$.
The log-normaliser $F_{pq}$ is convex on $[0,1]$; if $\ln \frac{p}{q}$ is not $\mu$-a.e. constant on $\{\varphi > 0\}$, then $F_{pq}$ is strictly convex and the maximiser $\alpha^{*}$ in (13) is unique. By Hölder’s inequality,
$$Z_{pq}(\alpha) = \int_{\mathcal{X}} (\varphi p)^{\alpha} (\varphi q)^{1-\alpha}\, d\mu \le (E_\varphi(p))^{\alpha}\, (E_\varphi(q))^{1-\alpha}.$$
Finally, note that $Z_{pq}(1) = E_\varphi(p)$ and $Z_{pq}(0) = E_\varphi(q)$; hence,
$$(pq)_1(x) = \frac{\varphi(x) p(x)}{E_\varphi(p)}, \qquad (pq)_0(x) = \frac{\varphi(x) q(x)}{E_\varphi(q)},$$
so $\mathcal{E}_{pq}$ is an exponential arc between the tilted versions of $P$ and $Q$. By this definition, the following identities hold:
$$D_C^w(p,q) = D_{B,\alpha^{*}(p,q)}^w(p,q) = D_{B,\alpha^{*}(q,p)}^w(q,p) = D_C^w(q,p).$$
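The exponential arc lends itself to a simple computational procedure: since $F_{pq}$ is convex and its derivative is the mean of $t = \ln(p/q)$ under $(pq)_\alpha$, an interior optimal parameter is a root of that mean. The sketch below implements this by bisection on a hypothetical three-point example (the values of `phi`, `p`, `q` and the function names are assumptions for illustration) and also checks the endpoint values and the Hölder bound.

```python
import math

phi = [1.0, 0.5, 0.8]   # hypothetical one-step weight
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
t = [math.log(pi / qi) for pi, qi in zip(p, q)]  # sufficient statistic ln(p/q)

def Z(alpha):
    """Normaliser Z_pq(alpha) = rho^w_alpha(p, q)."""
    return sum(f * pi ** alpha * qi ** (1.0 - alpha) for f, pi, qi in zip(phi, p, q))

def F(alpha):
    """Log-normaliser F_pq(alpha) = ln Z_pq(alpha)."""
    return math.log(Z(alpha))

def tilted_density(alpha):
    """Normalised weighted geometric mixture (pq)_alpha."""
    z = Z(alpha)
    return [f * pi ** alpha * qi ** (1.0 - alpha) / z for f, pi, qi in zip(phi, p, q)]

def F_prime(alpha):
    """F'(alpha): mean of t under (pq)_alpha."""
    return sum(r * ti for r, ti in zip(tilted_density(alpha), t))

# Bisection for F'(alpha*) = 0; valid here because F'(0) < 0 < F'(1).
lo, hi = 0.0, 1.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if F_prime(mid) < 0.0:
        lo = mid
    else:
        hi = mid
alpha_star = 0.5 * (lo + hi)

# Hoelder bound Z(alpha) <= E_phi(p)^alpha * E_phi(q)^(1 - alpha) on a grid.
hoelder_ok = all(
    Z(a) <= Z(1.0) ** a * Z(0.0) ** (1.0 - a) + 1e-12
    for a in (k / 100.0 for k in range(101))
)
print(f"alpha* = {alpha_star:.6f}, D_C^w = {-F(alpha_star):.6f}, Hoelder ok: {hoelder_ok}")
```

If $F'$ does not change sign on $[0,1]$, the optimum sits at an endpoint and the bisection step should be skipped; the sketch does not handle that boundary case.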

3.3. Weighted Bregman Divergence and Information-Geometric Identities

3.3.1. Weighted KL Divergence and Weighted Bregman Divergence

This subsection collects information-geometric identities useful for analysing $\rho_\alpha^w$ and for computing the optimal Chernoff parameter $\alpha^{*}$. We follow [3] for weighted Bregman divergences.
Let $\mathcal{E} = \{p_\theta : \theta \in \Theta \subset \mathbb{R}^d\}$ be a regular exponential family of densities (with respect to $\mu$),
$$p_\theta(x) = \exp\{\theta^{T} t(x) - F(\theta) + k(x)\}, \qquad x \in \mathcal{X}. \tag{22}$$
For a density $r$, set $E_\varphi(r) := \int_{\mathcal{X}} \varphi(x)\, r(x)\, d\mu(x)$ and write $E_\varphi(\theta) := E_\varphi(p_\theta)$.
Assume $E_\varphi(\theta) \in (0,\infty)$ for $\theta \in \Theta$ and define the tilted log-normaliser
$$\hat{F}(\theta) := \ln \int_{\mathcal{X}} \varphi(x)\, e^{\theta^{T} t(x) + k(x)}\, d\mu(x) = F(\theta) + \ln E_\varphi(\theta). \tag{23}$$
Equivalently, the tilted density is
$$p_\theta^{*}(x) = \frac{\varphi(x)\, p_\theta(x)}{E_\varphi(\theta)}. \tag{24}$$
Definition 3 
(Weighted Kullback–Leibler divergence). For densities $p, q$ on $\mathcal{X}$ define
$$D_{\mathrm{KL}}^w(p \,\|\, q) := \int_{\mathcal{X}} \varphi(x)\, p(x) \ln \frac{p(x)}{q(x)}\, d\mu(x), \tag{25}$$
whenever the integral is well defined in $(-\infty, \infty]$.
Definition 4 
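(Weighted Bregman divergence). The weighted Bregman divergence associated with $(F, \hat{F})$ is
$$B_{\varphi,F}^w(\theta_1, \theta_2) := e^{\hat{F}(\theta_2) - F(\theta_2)}\Big[F(\theta_1) - F(\theta_2) - (\theta_1 - \theta_2)^{T} \nabla \hat{F}(\theta_2)\Big] = E_\varphi(\theta_2)\Big[F(\theta_1) - F(\theta_2) - (\theta_1 - \theta_2)^{T} \nabla \hat{F}(\theta_2)\Big]. \tag{26}$$
Proposition 2 
(Weighted KL as weighted Bregman divergence). For a regular exponential family $\mathcal{E} = \{P_\theta, \theta \in \Theta\}$, assume that the integral in (25) is well-defined in $(-\infty, \infty]$. Then for any $\theta_1, \theta_2 \in \Theta$,
$$D_{\mathrm{KL}}^w(p_{\theta_1} \,\|\, p_{\theta_2}) = B_{\varphi,F}^w(\theta_2, \theta_1). \tag{27}$$
Proof. 
This identity is stated in [3] (Proposition 4.1); we give a short derivation for completeness. By (22),
$$\ln \frac{p_{\theta_1}(x)}{p_{\theta_2}(x)} = (\theta_1 - \theta_2)^{T} t(x) - \big(F(\theta_1) - F(\theta_2)\big).$$
Substituting this into (25) yields
$$D_{\mathrm{KL}}^w(p_{\theta_1} \,\|\, p_{\theta_2}) = (\theta_1 - \theta_2)^{T} \int_{\mathcal{X}} \varphi(x)\, t(x)\, p_{\theta_1}(x)\, d\mu(x) - \big(F(\theta_1) - F(\theta_2)\big)\, E_\varphi(\theta_1).$$
Using (23) and differentiation under the integral sign (regularity of $\mathcal{E}$),
$$\nabla \hat{F}(\theta_1) = \frac{\int_{\mathcal{X}} \varphi(x)\, t(x)\, e^{\theta_1^{T} t(x) + k(x)}\, d\mu(x)}{\int_{\mathcal{X}} \varphi(x)\, e^{\theta_1^{T} t(x) + k(x)}\, d\mu(x)} = \frac{\int_{\mathcal{X}} \varphi(x)\, t(x)\, p_{\theta_1}(x)\, d\mu(x)}{E_\varphi(\theta_1)}.$$
Hence, $\int \varphi\, t\, p_{\theta_1}\, d\mu = E_\varphi(\theta_1)\, \nabla \hat{F}(\theta_1)$, and therefore
$$D_{\mathrm{KL}}^w(p_{\theta_1} \,\|\, p_{\theta_2}) = E_\varphi(\theta_1)\Big[F(\theta_2) - F(\theta_1) - (\theta_2 - \theta_1)^{T} \nabla \hat{F}(\theta_1)\Big] = B_{\varphi,F}^w(\theta_2, \theta_1),$$
which proves (27). □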
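Proposition 2 can be checked numerically in the simplest exponential family. The sketch below assumes the Bernoulli family on $\{0,1\}$ with $t(x) = x$, $k \equiv 0$, $F(\theta) = \ln(1 + e^{\theta})$, and a hypothetical weight with $\varphi(0) = a$, $\varphi(1) = b$, so that $\hat{F}(\theta) = \ln(a + b e^{\theta})$. All names and numerical values are illustrative assumptions.

```python
import math

a, b = 1.0, 0.8   # hypothetical weights phi(0), phi(1)

def F(theta):
    """Log-normaliser of the Bernoulli family with natural parameter theta."""
    return math.log(1.0 + math.exp(theta))

def F_hat(theta):
    """Tilted log-normaliser: ln sum_x phi(x) * exp(theta * x)."""
    return math.log(a + b * math.exp(theta))

def F_hat_prime(theta):
    """Derivative of the tilted log-normaliser."""
    return b * math.exp(theta) / (a + b * math.exp(theta))

def E_phi(theta):
    """E_phi(theta) = exp(F_hat(theta) - F(theta))."""
    return math.exp(F_hat(theta) - F(theta))

def density(theta, x):
    return math.exp(theta * x - F(theta))

def weighted_kl(theta1, theta2):
    """Weighted KL divergence: sum_x phi(x) p1(x) ln(p1(x) / p2(x))."""
    phi = (a, b)
    return sum(
        phi[x] * density(theta1, x) * math.log(density(theta1, x) / density(theta2, x))
        for x in (0, 1)
    )

def weighted_bregman(theta1, theta2):
    """Weighted Bregman divergence B^w(theta1, theta2) in the one-parameter case."""
    return E_phi(theta2) * (F(theta1) - F(theta2) - (theta1 - theta2) * F_hat_prime(theta2))

pairs = [(1.0, -1.0), (0.3, 2.0), (-0.7, 0.4)]
gaps = [abs(weighted_kl(t1, t2) - weighted_bregman(t2, t1)) for t1, t2 in pairs]
print(gaps)
```

Note the swapped argument order on the Bregman side, exactly as in the statement of the proposition.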
Proposition 3 
(Primal–dual identities for (weighted) Bregman divergences). Let $F$ be a log-normaliser of a regular exponential family and let $F^{*}$ denote its Legendre transform. Write $\theta^{*} = \nabla F(\theta)$ and $\theta = \nabla F^{*}(\theta^{*})$.
(a) Weighted one-parameter identity. Assume $d = 1$ (one-parameter case) and let $\theta_i^{*} := F'(\theta_i)$. Then, the following weighted analogue of the classical Bregman duality holds:
B φ , F w ( θ 1 , θ 2 ) = B φ , F w ( θ 2 , θ 1 ) ( θ 1 θ 2 ) ln E φ ( θ 2 ) + ( θ 2 θ 1 ) ln E φ ( θ 1 ) ,
where $B_{\varphi,F}^w$ is as in Definition 4 and $B_{\varphi,F^{*}}^w$ is defined analogously (with $F$ replaced by $F^{*}$ and with the convention $E_\varphi^{*}(\theta^{*}) := E_\varphi(\theta)$ under $\theta^{*} = F'(\theta)$).
(b) Classical identity. For any $d \ge 1$, the (unweighted) Bregman divergence admits the standard Legendre representation
$$B_F(\theta_0, \theta_1) = F(\theta_0) + F^{*}(\theta_1^{*}) - \theta_0^{T} \theta_1^{*}, \qquad \theta_1^{*} = \nabla F(\theta_1),$$
where $B_F(\theta_0, \theta_1) = F(\theta_0) - F(\theta_1) - (\theta_0 - \theta_1)^{T} \nabla F(\theta_1)$ is the usual (unweighted) Bregman divergence.
Proof. 
Part (a) is a weighted extension of the classical duality $B_F(\theta_1, \theta_2) = B_{F^{*}}(\theta_2^{*}, \theta_1^{*})$ and follows by combining the weighted representation (26) with Legendre relations; see also [3]. Part (b) is standard. □

3.3.2. Weighted Chernoff/Bhattacharyya Quantities Inside an Exponential Family

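Let $p_{\theta_1}, p_{\theta_2} \in \mathcal{E}$ and $\theta_\alpha := \alpha \theta_1 + (1-\alpha) \theta_2$. A direct calculation yields
$$\ln \rho_\alpha^w(p_{\theta_1}, p_{\theta_2}) = \ln \int_{\mathcal{X}} \varphi(x)\, p_{\theta_1}(x)^{\alpha} p_{\theta_2}(x)^{1-\alpha}\, d\mu(x) = \hat{F}(\theta_\alpha) - \alpha F(\theta_1) - (1-\alpha) F(\theta_2).$$
Consequently,
$$D_{B,\alpha}^w(p_{\theta_1}, p_{\theta_2}) = \alpha F(\theta_1) + (1-\alpha) F(\theta_2) - \hat{F}(\theta_\alpha) = U_{F,\alpha}(\theta_1, \theta_2) - \ln E_\varphi(\theta_\alpha), \tag{31}$$
where $U_{F,\alpha}(\theta_1, \theta_2) := \alpha F(\theta_1) + (1-\alpha) F(\theta_2) - F(\theta_\alpha)$ is the (unweighted) Jensen/Burbea–Rao divergence induced by $F$. In particular, when $\varphi \equiv 1$ we have $\hat{F} \equiv F$ and $D_{B,\alpha}^w(p_{\theta_1}, p_{\theta_2}) = U_{F,\alpha}(\theta_1, \theta_2)$.
Remark 5 
(Geometric mixtures and tilting by $\varphi$). When $\varphi \equiv 1$, we have $\hat{F} \equiv F$ and the normalised geometric mixture of $p_{\theta_1}^{\alpha} p_{\theta_2}^{1-\alpha}$ belongs to the same exponential family, namely, it equals $p_{\theta_\alpha}$ with $\theta_\alpha = \alpha \theta_1 + (1-\alpha) \theta_2$. For general $\varphi$, normalising $\varphi\, p_{\theta_1}^{\alpha} p_{\theta_2}^{1-\alpha}$ gives the tilted density $p_{\theta_\alpha}^{*}$ of (24).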
Proposition 4 
(Optimal Chernoff parameter in an exponential family). Assume that $\hat{F}$ is strictly convex on the segment $[\theta_1, \theta_2]$ and that the maximiser $\alpha^{*} \in (0,1)$ exists. Then, $\alpha^{*}$ is unique and satisfies
$$(\theta_1 - \theta_2)^{T} \nabla \hat{F}(\theta_{\alpha^{*}}) = F(\theta_1) - F(\theta_2), \qquad \theta_{\alpha^{*}} = \alpha^{*} \theta_1 + (1-\alpha^{*}) \theta_2, \tag{32}$$
with $\nabla \hat{F}(\theta) = E_{p_\theta^{*}}[t(X)]$.
Proof. 
Differentiate (31) with respect to $\alpha$ and use $\frac{d}{d\alpha} \theta_\alpha = \theta_1 - \theta_2$. Strict convexity of $\hat{F}$ on $[\theta_1, \theta_2]$ implies strict concavity of $\alpha \mapsto D_{B,\alpha}^w(p_{\theta_1}, p_{\theta_2})$, hence uniqueness. □
Proposition 5 
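(Chernoff information as a Jensen-type divergence and a Bregman bisector). Let $p_{\theta_1}, p_{\theta_2} \in \mathcal{E}$ and assume that the maximiser $\alpha^{*} \in (0,1)$ in Definition 2 exists and is unique. Set $\theta_{\alpha^{*}} = \alpha^{*} \theta_1 + (1-\alpha^{*}) \theta_2$. Then
$$D_C^w(p_{\theta_1}, p_{\theta_2}) = D_{B,\alpha^{*}}^w(p_{\theta_1}, p_{\theta_2}) = \alpha^{*} F(\theta_1) + (1-\alpha^{*}) F(\theta_2) - \hat{F}(\theta_{\alpha^{*}}). \tag{33}$$
Moreover, $\theta_{\alpha^{*}}$ is characterised by the weighted Bregman bisector condition
$$B_{\varphi,F}^w(\theta_1, \theta_{\alpha^{*}}) = B_{\varphi,F}^w(\theta_2, \theta_{\alpha^{*}}), \tag{34}$$
and the common value recovers the Chernoff information as
$$D_C^w(p_{\theta_1}, p_{\theta_2}) = \frac{1}{E_\varphi(\theta_{\alpha^{*}})}\, B_{\varphi,F}^w(\theta_1, \theta_{\alpha^{*}}) - \ln E_\varphi(\theta_{\alpha^{*}}) = \frac{1}{E_\varphi(\theta_{\alpha^{*}})}\, B_{\varphi,F}^w(\theta_2, \theta_{\alpha^{*}}) - \ln E_\varphi(\theta_{\alpha^{*}}). \tag{35}$$
In the special case $\varphi \equiv 1$, we have $\hat{F} \equiv F$ and $E_\varphi(\theta) \equiv 1$, so that (33) reduces to the classical Jensen divergence induced by $F$ and (35) becomes $D_C(p_{\theta_1}, p_{\theta_2}) = B_F(\theta_1, \theta_{\alpha^{*}}) = B_F(\theta_2, \theta_{\alpha^{*}})$.
Proof. 
Equation (33) is (31) at $\alpha = \alpha^{*}$. For (34), expand $B_{\varphi,F}^w(\theta_i, \theta_{\alpha^{*}}) = E_\varphi(\theta_{\alpha^{*}})\big[F(\theta_i) - F(\theta_{\alpha^{*}}) - (\theta_i - \theta_{\alpha^{*}})^{T} \nabla \hat{F}(\theta_{\alpha^{*}})\big]$ and use (32) to see that the difference vanishes. Finally, substituting $\theta_1 - \theta_{\alpha^{*}} = (1-\alpha^{*})(\theta_1 - \theta_2)$ into $B_{\varphi,F}^w(\theta_1, \theta_{\alpha^{*}}) / E_\varphi(\theta_{\alpha^{*}})$ and using (32) gives
$$\frac{1}{E_\varphi(\theta_{\alpha^{*}})}\, B_{\varphi,F}^w(\theta_1, \theta_{\alpha^{*}}) = \alpha^{*} F(\theta_1) + (1-\alpha^{*}) F(\theta_2) - F(\theta_{\alpha^{*}}).$$
Since $\hat{F}(\theta_{\alpha^{*}}) = F(\theta_{\alpha^{*}}) + \ln E_\varphi(\theta_{\alpha^{*}})$, this yields (35). □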
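Propositions 4 and 5 translate into a short numerical recipe: solve the stationarity equation for $\alpha^{*}$, then confirm the bisector condition and the Bregman form of $D_C^w$. The sketch below does this for the Bernoulli family with $F(\theta) = \ln(1 + e^{\theta})$ and a hypothetical weight $\varphi(0) = a$, $\varphi(1) = b$; the parameter values $\theta_1 = 1$, $\theta_2 = -1$ and all function names are assumptions made for the example.

```python
import math

a, b = 1.0, 0.8                 # hypothetical weights phi(0), phi(1)
theta1, theta2 = 1.0, -1.0      # hypothetical natural parameters

def F(theta):
    return math.log(1.0 + math.exp(theta))

def F_hat(theta):
    return math.log(a + b * math.exp(theta))

def F_hat_prime(theta):
    return b * math.exp(theta) / (a + b * math.exp(theta))

def E_phi(theta):
    return math.exp(F_hat(theta) - F(theta))

def theta_at(alpha):
    return alpha * theta1 + (1.0 - alpha) * theta2

def bhattacharyya(alpha):
    """Weighted Bhattacharyya distance inside the family."""
    return alpha * F(theta1) + (1.0 - alpha) * F(theta2) - F_hat(theta_at(alpha))

def stationarity(alpha):
    """Derivative of the weighted Bhattacharyya distance with respect to alpha."""
    return F(theta1) - F(theta2) - (theta1 - theta2) * F_hat_prime(theta_at(alpha))

def weighted_bregman(t1, t2):
    return E_phi(t2) * (F(t1) - F(t2) - (t1 - t2) * F_hat_prime(t2))

# Bisection: the derivative is decreasing, positive at 0 and negative at 1 here.
lo, hi = 0.0, 1.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if stationarity(mid) > 0.0:
        lo = mid
    else:
        hi = mid
alpha_star = 0.5 * (lo + hi)
theta_star = theta_at(alpha_star)

d_c_w = bhattacharyya(alpha_star)
b1 = weighted_bregman(theta1, theta_star)
b2 = weighted_bregman(theta2, theta_star)
via_bregman = b1 / E_phi(theta_star) - math.log(E_phi(theta_star))
grid_max = max(bhattacharyya(k / 1000.0) for k in range(1001))

print(f"alpha* = {alpha_star:.6f}, D_C^w = {d_c_w:.6f}")
print(f"bisector gap = {abs(b1 - b2):.2e}, Bregman form = {via_bregman:.6f}")
```

In this toy case the stationarity equation can be solved by hand: it requires $\hat{F}'(\theta_{\alpha^{*}}) = 1/2$, that is, $\theta_{\alpha^{*}} = \ln(a/b)$, which gives an independent check on the bisection output.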

3.3.3. Derivative and Weighted KL

Recall $F_{pq}(\alpha) = \ln Z_{pq}(\alpha) = \ln \rho_\alpha^w(p,q)$. In view of (14), the differentiation under the integral sign is justified, and we have
$$F_{pq}'(\alpha) = E_{(pq)_\alpha}\Big[\ln \frac{p(X)}{q(X)}\Big],$$
where $(pq)_\alpha$ is the Chernoff-tilted density from (18). In particular,
$$F_{pq}'(1) = \frac{1}{E_\varphi(p)} \int_{\mathcal{X}} \varphi(x)\, p(x) \ln \frac{p(x)}{q(x)}\, d\mu(x) = \frac{1}{E_\varphi(p)}\, D_{\mathrm{KL}}^w(p \,\|\, q).$$
Analogously, $F_{pq}'(0) = -\frac{1}{E_\varphi(q)}\, D_{\mathrm{KL}}^w(q \,\|\, p)$.

3.3.4. Chernoff–KL

Lemma 1. 
Let $\alpha^{*}$ be a maximiser in Definition 2 and assume that $\alpha^{*} \in (0,1)$ (so that $F_{pq}'(\alpha^{*}) = 0$ below). Set $F_{pq}(\alpha) := \ln \rho_\alpha^w(p,q)$, $r_\alpha := (pq)_\alpha$. Then
$$D_C^w(p,q) = D_{\mathrm{KL}}(r_{\alpha^{*}} \,\|\, r_1) - \ln E_\varphi(p) = D_{\mathrm{KL}}(r_{\alpha^{*}} \,\|\, r_0) - \ln E_\varphi(q),$$
where $r_1(x) = \varphi(x) p(x) / E_\varphi(p)$ and $r_0(x) = \varphi(x) q(x) / E_\varphi(q)$.
Here, $D_{\mathrm{KL}}$ denotes the standard (unweighted) Kullback–Leibler divergence.
Proof. 
A direct computation yields, for $\alpha \in [0,1]$,
$$D_{\mathrm{KL}}(r_\alpha \,\|\, r_1) = -(1-\alpha)\, F_{pq}'(\alpha) - F_{pq}(\alpha) + F_{pq}(1), \qquad D_{\mathrm{KL}}(r_\alpha \,\|\, r_0) = \alpha\, F_{pq}'(\alpha) - F_{pq}(\alpha) + F_{pq}(0).$$
At $\alpha = \alpha^{*}$, we have $F_{pq}'(\alpha^{*}) = 0$. Since $F_{pq}(1) = \ln E_\varphi(p)$ and $F_{pq}(0) = \ln E_\varphi(q)$, the claim follows from $D_C^w(p,q) = -F_{pq}(\alpha^{*})$. □
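Lemma 1 can be illustrated on the same kind of toy example. The sketch below assumes a three-point alphabet with hypothetical `phi`, `p`, `q`; for these particular values the stationarity condition $F_{pq}'(\alpha) = 0$ reduces to $6.25^{\alpha} = 2$, so $\alpha^{*}$ is available in closed form. It then compares $D_C^w$ with the two Kullback–Leibler expressions.

```python
import math

phi = [1.0, 0.5, 0.8]   # hypothetical one-step weight
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

def Z(alpha):
    return sum(f * pi ** alpha * qi ** (1.0 - alpha) for f, pi, qi in zip(phi, p, q))

def r(alpha):
    """Chernoff-tilted density r_alpha = phi * p^alpha * q^(1 - alpha) / Z(alpha)."""
    z = Z(alpha)
    return [f * pi ** alpha * qi ** (1.0 - alpha) / z for f, pi, qi in zip(phi, p, q)]

def kl(u, v):
    """Standard (unweighted) Kullback-Leibler divergence between discrete densities."""
    return sum(ui * math.log(ui / vi) for ui, vi in zip(u, v))

# Closed form valid only for this toy example (interior optimum).
alpha_star = math.log(2.0) / math.log(6.25)

d_c_w = -math.log(Z(alpha_star))
e_p, e_q = Z(1.0), Z(0.0)                      # E_phi(p) and E_phi(q)
via_r1 = kl(r(alpha_star), r(1.0)) - math.log(e_p)
via_r0 = kl(r(alpha_star), r(0.0)) - math.log(e_q)

print(f"D_C^w = {d_c_w:.6f}, via r_1 = {via_r1:.6f}, via r_0 = {via_r0:.6f}")
```

All three printed values should agree up to rounding; the agreement breaks down if `alpha_star` is replaced by a non-stationary value, which is a quick way to see where the interior assumption is used.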
Corollary 2 
(Chernoff information as a Bregman divergence on the Chernoff arc). Let $F_{pq}(\alpha) := \ln \rho_\alpha^w(p,q)$ and define the one-dimensional Bregman divergence
$$B_{F_{pq}}(a, b) := F_{pq}(a) - F_{pq}(b) - (a - b)\, F_{pq}'(b).$$
Assume that the maximiser $\alpha^{*} \in (0,1)$ in Definition 2 is interior, so that $F_{pq}'(\alpha^{*}) = 0$. Then
$$D_C^w(p,q) = B_{F_{pq}}(1, \alpha^{*}) - \ln E_\varphi(p) = B_{F_{pq}}(0, \alpha^{*}) - \ln E_\varphi(q). \tag{40}$$
Equivalently,
$$B_{F_{pq}}(1, \alpha^{*}) = D_{\mathrm{KL}}(r_{\alpha^{*}} \,\|\, r_1) = D_C^w(p,q) + \ln E_\varphi(p), \qquad B_{F_{pq}}(0, \alpha^{*}) = D_{\mathrm{KL}}(r_{\alpha^{*}} \,\|\, r_0) = D_C^w(p,q) + \ln E_\varphi(q),$$
with $r_\alpha = (pq)_\alpha$, $r_1 = \varphi p / E_\varphi(p)$ and $r_0 = \varphi q / E_\varphi(q)$.
Proof. 
In the Chernoff exponential family $\{r_\alpha\}$ with log-normaliser $F_{pq}$, the KL–Bregman identity gives $D_{\mathrm{KL}}(r_\alpha \,\|\, r_1) = B_{F_{pq}}(1, \alpha)$ and $D_{\mathrm{KL}}(r_\alpha \,\|\, r_0) = B_{F_{pq}}(0, \alpha)$. Since $F_{pq}'(\alpha^{*}) = 0$ and $F_{pq}(1) = \ln E_\varphi(p)$, $F_{pq}(0) = \ln E_\varphi(q)$, we obtain
$$B_{F_{pq}}(1, \alpha^{*}) = -F_{pq}(\alpha^{*}) + F_{pq}(1) = D_C^w(p,q) + \ln E_\varphi(p),$$
and similarly for 0. Substituting this into Lemma 1 yields (40). □
Remark 6 
(One-parameter case). Assume $d = 1$, $\theta_1 \ne \theta_2$, and that the maximiser $\alpha^{*} \in (0,1)$ in Definition 2 is interior. Assume moreover that $\hat{F}'$ is strictly increasing on $\Theta$ and set $\hat{G} = (\hat{F}')^{-1}$. Then (32) yields
$$\alpha^{*} = \frac{1}{\theta_1 - \theta_2}\Big[\hat{G}\Big(\frac{F(\theta_1) - F(\theta_2)}{\theta_1 - \theta_2}\Big) - \theta_2\Big]. \tag{41}$$
When $\varphi \equiv 1$, we have $\hat{F} \equiv F$ and (41) reduces to the classical formula.
To illustrate the general identities above, we provide explicit expressions for $D_{B,\alpha}^w$ and $D_C^w$ in several parametric settings (Gaussian, Poisson, and exponential); see Section 4.
Remark 7. 
The Bregman representation (40) identifies $D_C^w(P,Q)$ with a Bregman divergence on the tilted log-normaliser $\hat{F}$; geometrically, the optimiser $\alpha^{*}$ marks the intersection of the exponential geodesic of the tilted family $\{\varphi\, p^{\alpha} q^{1-\alpha}\}_{\alpha \in [0,1]}$ with the weighted Kullback–Leibler bisector, generalising the unweighted characterisation of [11,12]. In the exponential-family setting, the computation of $D_C^w$ therefore reduces to $\hat{F}$ and its gradient: once $\hat{F}$ is available in closed form, $\alpha^{*}$ is determined by (41) in the one-parameter case and by a monotone equation in the natural-parameter space in general, without evaluating $\rho_\alpha^w$ for each $\alpha$. The examples of Section 4 exemplify this reduction.

3.4. Tilted Weighted Likelihood and Concentration Bounds

Although the optimal rule in the context-sensitive problem is still the usual likelihood-ratio test q / p (cf. Section 3), it is convenient to work with the tilted ratio q / p . The factor φ cancels pointwise in q / p and enters only through the normalisation constants E φ ( p ) and E φ ( q ) . We record two consequences: a large-deviation representation for L / n via the cumulant generating functions ψ P , ψ Q and their Legendre transforms and a finite-n concentration bound based on a martingale argument.
For the tilted distributions, the log-likelihood takes the form
L ( X 1 n ) = L ( X 1 , , X n ) = i = 1 n ln q ( X i ) p ( X i ) = i = 1 n ln q ( X i ) p ( X i ) n ln E φ ( q ) + n ln E φ ( p ) .
Here E φ ( p ) = φ ( x ) p ( x ) d μ ( x ) .
In particular, since
ln q ( x ) p ( x ) = ln q ( x ) p ( x ) + ln E φ ( p ) ln E φ ( q ) ,
we may equivalently rewrite likelihood-ratio threshold rules in terms of L . For example,
i = 1 n ln q ( X i ) p ( X i ) 0 L ( X 1 n ) n ln E φ ( p ) ln E φ ( q ) .
Thus, L is the usual log-likelihood ratio, shifted by a constant determined by the context weight φ .
The log of the moment generating function of the tilted log-ratio $\ln(q^{*}/p^{*})$ and its Legendre transform take the form
$$\psi_P(\alpha) = \ln E_P\, e^{\alpha\ln\frac{q^{*}(X)}{p^{*}(X)}} = \ln\int_{\mathcal X} q(x)^{\alpha}p(x)^{1-\alpha}\,d\mu(x) - \alpha\ln E_\varphi(q) + \alpha\ln E_\varphi(p),\qquad I_P(r) = \sup_{\alpha}\bigl[\alpha r - \psi_P(\alpha)\bigr],$$
where $\alpha$ ranges over the set $\{\alpha\in\mathbb{R} : \psi_P(\alpha) < \infty\}$ (and similarly for $\psi_Q$).
Similarly,
$$\psi_Q(\alpha) := \ln E_Q\, e^{\alpha\ln\frac{q^{*}(X)}{p^{*}(X)}} = \ln E_P\, e^{\ln\frac{q(X)}{p(X)} + \alpha\ln\frac{q^{*}(X)}{p^{*}(X)}} = \ln\int_{\mathcal X} q(x)^{\alpha+1}p(x)^{-\alpha}\,d\mu(x) - \alpha\ln E_\varphi(q) + \alpha\ln E_\varphi(p).$$
Since $\psi_Q(\alpha) = \psi_P(\alpha+1) - \ln E_\varphi(p) + \ln E_\varphi(q)$, this implies the relation between the Legendre transforms
$$I_Q(r) = I_P(r) - r + \ln E_\varphi(p) - \ln E_\varphi(q).$$
In particular, $I_P(0)$ may be treated as a natural weighted version of the Chernoff divergence between $q$ and $p$:
$$I_P(0) = \sup_{\alpha}\Bigl[-\ln\int_{\mathcal X} q(x)^{\alpha}p(x)^{1-\alpha}\,d\mu(x) + \alpha\ln E_\varphi(q) - \alpha\ln E_\varphi(p)\Bigr] =: \hat D_C^w(q,p).$$
Interpretation. The value $I_P(0)$ is the Chernoff–Cramér exponent controlling the tail event $\{L^{*}/n \ge 0\}$ under $P$, i.e. (under standard regularity assumptions), $P\bigl(L^{*}(X_1^n)\ge 0\bigr) \approx e^{-nI_P(0)}$. In the unweighted case $\varphi\equiv 1$, we have $E_\varphi(p) = E_\varphi(q) = 1$ and $I_P(0)$ reduces to the classical Chernoff information between $p$ and $q$. We also stress that $\hat D_C^w(q,p) = I_P(0)$ is a tilted-likelihood exponent and is distinct from the weighted Chernoff information $D_C^w(P,Q)$ from Definition 2, which governs the optimal sum of context-sensitive losses.
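The relation between the two rate functions can be checked numerically. The following is a minimal sketch under assumptions that are not part of the text: a Poisson pair $P = \mathrm{Poi}(\lambda_1)$, $Q = \mathrm{Poi}(\lambda_2)$, the weight $\varphi(k) = e^{\gamma k}$ (so that $\ln E_\varphi(p) = \lambda_1(e^{\gamma}-1)$), a truncated sum over $k$, and a ternary search on a bounded $\alpha$-interval. All function names are hypothetical.

```python
import math

def log_pois(k, lam):
    """Log of the Poisson(lam) probability mass at k."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def log_weight_mean(lam, gamma):
    """ln E_phi(p) for p = Poi(lam) and phi(k) = exp(gamma * k)."""
    return lam * (math.exp(gamma) - 1.0)

def psi(alpha, lam_ref, lam1, lam2, gamma, kmax=400):
    """ln E exp(alpha * ln(q*/p*)), expectation under Poi(lam_ref).

    Evaluated by a truncated log-sum-exp over k = 0, ..., kmax - 1
    (truncation level is an assumption suited to the rates used below).
    """
    c = log_weight_mean(lam1, gamma) - log_weight_mean(lam2, gamma)
    terms = [
        log_pois(k, lam_ref)
        + alpha * (log_pois(k, lam2) - log_pois(k, lam1) + c)
        for k in range(kmax)
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def rate(r, lam_ref, lam1, lam2, gamma, lo=-2.0, hi=3.0):
    """Legendre transform sup_alpha [alpha * r - psi(alpha)] on [lo, hi].

    The objective is concave in alpha, so ternary search applies.
    """
    def objective(a):
        return a * r - psi(a, lam_ref, lam1, lam2, gamma)

    for _ in range(100):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))

lam1, lam2, gamma, r = 2.0, 5.0, 0.2, 0.0
shift = log_weight_mean(lam1, gamma) - log_weight_mean(lam2, gamma)
I_P = rate(r, lam1, lam1, lam2, gamma)   # expectation under P
I_Q = rate(r, lam2, lam1, lam2, gamma)   # expectation under Q
print("I_P(0) =", I_P)
print("I_Q(0) =", I_Q, " vs  I_P(0) - r + ln E_phi(p) - ln E_phi(q) =", I_P - r + shift)
```

The two printed values of $I_Q(0)$ should agree up to numerical error; setting `gamma = 0.0` should return the classical Chernoff information of the Poisson pair for $I_P(0)$.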

Non-Asymptotic Concentration via a Doob Martingale

The rate functions $I_P$ and $I_Q$ capture the exponential scale of deviations of $L^{*}/n$ as $n\to\infty$. To obtain explicit finite-$n$ bounds, we now apply a standard martingale method to $L^{*}$ under $Q$.
Consider the filtration $\mathcal{F}_k = \sigma(X_1,\dots,X_k)$ and define the random variables $\{U_k,\ k = 0,\dots,n\}$ by
$$U_k = E_Q\bigl[L^{*}(X_1,\dots,X_n)\,\big|\,\mathcal{F}_k\bigr] = \sum_{j=1}^{k}\ln\frac{q(X_j)}{p(X_j)} + (n-k)\,D_{\mathrm{KL}}(Q\|P) - n\ln E_\varphi(q) + n\ln E_\varphi(p),$$
where $D_{\mathrm{KL}}(Q\|P) = \int q(x)\ln\frac{q(x)}{p(x)}\,d\mu(x)$ stands for the (unweighted) Kullback–Leibler divergence of $Q$ and $P$. Then
$$U_k - U_{k-1} = \ln\frac{q(X_k)}{p(X_k)} - D_{\mathrm{KL}}(Q\|P),\qquad k = 1,\dots,n.$$
Observe that $\{U_k,\mathcal{F}_k\}$ is a martingale w.r.t. $Q$, with $U_n = L^{*}(X_1,\dots,X_n)$ and $U_0 = E_Q L^{*}(X_1,\dots,X_n)$.
Assume now that $d < \infty$ and $\sigma^2 < \infty$, where $d = \sup_{x\in\mathcal X}\bigl|\ln\frac{q(x)}{p(x)} - D_{\mathrm{KL}}(Q\|P)\bigr|$ and
$$\sigma^2 = E_Q\bigl[(U_k - U_{k-1})^2\,\big|\,\mathcal{F}_{k-1}\bigr] = \int_{\mathcal X} q(x)\Bigl(\ln\frac{q(x)}{p(x)} - D_{\mathrm{KL}}(Q\|P)\Bigr)^{2}d\mu(x).$$
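In view of a refined Azuma–Hoeffding/McDiarmid inequality [13,14],
$$P_Q\Bigl(L^{*}(X_1,\dots,X_n) > \bigl(D_{\mathrm{KL}}(Q\|P)+\beta\bigr)n - \bigl(\ln E_\varphi(q) - \ln E_\varphi(p)\bigr)n\Bigr) \le \exp\Bigl\{-n\,D\Bigl(\frac{\delta+\gamma}{1+\delta}\,\Big\|\,\frac{\delta}{1+\delta}\Bigr)\Bigr\},$$
where $\delta = \sigma^2/d^2$, $\gamma = \beta/d$, $\beta' = \beta - \ln E_\varphi(q) + \ln E_\varphi(p)$ (so that the threshold equals $(D_{\mathrm{KL}}(Q\|P)+\beta')n$), and $D(p\|q) = p\ln\frac{p}{q} + (1-p)\ln\frac{1-p}{1-q}$ stands for the Kullback–Leibler divergence between the two Bernoulli distributions $(p,1-p)$ and $(q,1-q)$.
We use the following modified version of the Azuma–Hoeffding inequality; see [14].
Lemma 2. 
Let $\{U_k,\mathcal{F}_k\}$ be a discrete-time real-valued martingale. Assume that, for some constants $d,\sigma > 0$, the following two requirements are satisfied a.s. for every $k\in\{1,\dots,n\}$:
$$|U_k - U_{k-1}| \le d,\qquad \operatorname{Var}\bigl[U_k - U_{k-1}\,\big|\,\mathcal{F}_{k-1}\bigr] \le \sigma^2.$$
Then, for every $\beta\ge 0$,
$$P\bigl(|U_n - U_0| \ge \beta n\bigr) \le 2\exp\Bigl\{-n\,D\Bigl(\frac{\delta+\gamma}{1+\delta}\,\Big\|\,\frac{\delta}{1+\delta}\Bigr)\Bigr\},$$
where $\delta = \sigma^2/d^2$ is the variance ratio and $\gamma = \beta/d$ is the normalised deviation. (Note that $\delta\in[0,1]$ automatically whenever $|U_k - U_{k-1}|\le d$ a.s.; for $\gamma > 1$ the probability on the left-hand side vanishes.)
In particular, the one-sided bound $P(U_n - U_0 \ge \beta n) \le \exp\{-n\,D(\cdot\,\|\,\cdot)\}$ holds with the same arguments of $D$.
Theorem 2. 
Set
$$\beta' = \beta - \ln E_\varphi(p) + \ln E_\varphi(q),\qquad \gamma = \frac{\beta'}{d}.$$
Under the conditions $d < \infty$, $\sigma^2 < \infty$, and $\beta'\ge 0$,
$$P_Q\Bigl(L^{*}(X_1,\dots,X_n) \ge \bigl(D_{\mathrm{KL}}(Q\|P)+\beta\bigr)n\Bigr) \le \exp\Bigl\{-n\,D\Bigl(\frac{\delta+\gamma}{1+\delta}\,\Big\|\,\frac{\delta}{1+\delta}\Bigr)\Bigr\}.$$
Lemma 2 is quoted from [14]. Theorem 2 is its direct application to the tilted log-likelihood ratio (42): since $U_n - U_0 = L^{*} - n D_{\mathrm{KL}}(Q\|P) + n\ln E_\varphi(q) - n\ln E_\varphi(p)$, the event in Theorem 2 coincides with $\{U_n - U_0 \ge \beta' n\}$. The only dependence on the context weight $\varphi$ is through $E_\varphi(p)$ and $E_\varphi(q)$.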
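To give a feel for the size of the one-sided bound, the sketch below evaluates $\exp\{-nD(\cdot\|\cdot)\}$ for a martingale with increments bounded by $d$ and conditional variance at most $\sigma^2$, and compares it with the plain Azuma–Hoeffding bound $\exp\{-n\beta^2/(2d^2)\}$. It is an illustration only: the function names and the numerical values are hypothetical, and the arguments of the Bernoulli divergence are written in terms of the variance ratio $\sigma^2/d^2$ and the normalised deviation $\beta/d$.

```python
import math

def bernoulli_kl(a, b):
    """D(a || b) between Bernoulli(a) and Bernoulli(b), with 0 * ln 0 = 0."""
    total = 0.0
    if a > 0.0:
        total += a * math.log(a / b)
    if a < 1.0:
        total += (1.0 - a) * math.log((1.0 - a) / (1.0 - b))
    return total

def refined_azuma_bound(n, d, sigma2, beta):
    """One-sided bound on P(U_n - U_0 >= beta * n).

    Assumes |U_k - U_{k-1}| <= d and conditional variance <= sigma2 <= d^2.
    """
    var_ratio = sigma2 / (d * d)   # sigma^2 / d^2, lies in (0, 1]
    dev = beta / d                 # beta / d, normalised deviation
    if dev > 1.0:
        return 0.0                 # n increments of size <= d cannot exceed n * d
    first = (dev + var_ratio) / (1.0 + var_ratio)
    second = var_ratio / (1.0 + var_ratio)
    return math.exp(-n * bernoulli_kl(first, second))

def azuma_hoeffding_bound(n, d, beta):
    """Classical one-sided Azuma-Hoeffding bound, ignoring the variance."""
    return math.exp(-n * beta * beta / (2.0 * d * d))

n, d, sigma2, beta = 100, 2.0, 1.0, 0.5
print("refined bound:", refined_azuma_bound(n, d, sigma2, beta))
print("Azuma-Hoeffding:", azuma_hoeffding_bound(n, d, beta))
```

For Theorem 2 one would call `refined_azuma_bound` with `beta` replaced by the shifted level $\beta'$, which is where the constants $E_\varphi(p)$ and $E_\varphi(q)$ enter.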

4. Examples and Applications

The identities of Section 3 reduce the computation of $D_{B,\alpha}^w$ and $D_C^w$ to the single-letter weighted affinity
$$\rho_\alpha^w(p,q) = \int\varphi(x)\,p(x)^{\alpha}q(x)^{1-\alpha}\,d\mu(x),$$
followed by optimisation over $\alpha\in[0,1]$. We work this out for Gaussian, Poisson, and exponential families, highlighting how the context weight $\varphi$ modifies the classical formulas. When $\varphi\equiv 1$, the expressions reduce to the standard unweighted Bhattacharyya and Chernoff quantities; more involved non-exponential-family computations (such as the Cauchy location–scale family) are deferred to Appendix A.

4.1. A Numerical Illustration

This subsection illustrates the behaviour of $\alpha^*$ and $D_C^w$ under a non-trivial factorised weight and serves as a direct numerical verification of the Bregman identities of Section 3.3: for the model below the affinity $\rho_\alpha^w$ is available in closed form, and we check that closed-form evaluation and direct numerical integration agree to machine precision.
Consider the asymmetric Gaussian hypotheses
$$H_0: P = \mathcal{N}(\mu_0,\sigma_0^2),\qquad H_1: Q = \mathcal{N}(\mu_1,\sigma_1^2),\qquad (\mu_0,\sigma_0^2) = (0,1),\quad (\mu_1,\sigma_1^2) = (3,2),$$
with a non-indicator factorised weight
$$\varphi(x) = \exp\bigl\{-\beta(x-x_0)^2\bigr\},\qquad x_0 = 0,\quad \beta\ge 0. \tag{54}$$
At $\beta = 0$, one has $\varphi\equiv 1$ and the unweighted Chernoff information is recovered. For $\beta > 0$, the weight concentrates near $x_0 = \mu_0$; in particular, (54) is not an indicator-type weight, so the weighted problem does not reduce to the unweighted Chernoff information on a restricted domain.
The asymmetry $\sigma_0^2\ne\sigma_1^2$ and the centring of $\varphi$ at the $H_0$ mean are essential for the illustration. In a fully symmetric configuration ($\sigma_0 = \sigma_1$, $\mu_1 = -\mu_0$, $x_0 = 0$), the problem is invariant under $\alpha\mapsto 1-\alpha$, so the optimum is pinned to $\alpha^* = 1/2$ for every $\beta$ and the effect of the weight on the Chernoff compromise is invisible. Asymmetric hypotheses are precisely where the weighted formalism is operationally distinct from the classical one, and it is this distinction that the numerics below are designed to expose.
Writing $A(\alpha) = \alpha/\sigma_0^2 + (1-\alpha)/\sigma_1^2 + 2\beta$, $B(\alpha) = \alpha\mu_0/\sigma_0^2 + (1-\alpha)\mu_1/\sigma_1^2 + 2\beta x_0$, and $C(\alpha) = \alpha\mu_0^2/\sigma_0^2 + (1-\alpha)\mu_1^2/\sigma_1^2 + 2\beta x_0^2$, a direct Gaussian integration yields
$$\ln\rho_\alpha^w(p,q) = \frac12\ln\frac{2\pi}{A(\alpha)} - \frac{\alpha}{2}\ln(2\pi\sigma_0^2) - \frac{1-\alpha}{2}\ln(2\pi\sigma_1^2) - \frac12\Bigl(C(\alpha) - \frac{B(\alpha)^2}{A(\alpha)}\Bigr), \tag{55}$$
and maximising $-\ln\rho_\alpha^w(p,q)$, with $\ln\rho_\alpha^w$ given by (55), over $\alpha\in[0,1]$ gives $\alpha^*(\beta)$ and $D_C^w(P,Q)(\beta)$. Table 1 reports their values at three selected $\beta$. Direct numerical integration of $\rho_\alpha^w$ from its definition agrees with (55) to machine precision on all tabulated entries, which confirms the Bregman identities of Section 3.3 numerically for this example.
The monotone growth of $\beta\mapsto D_C^w$ and the leftward shift of $\alpha^*(\beta)$ illustrate a qualitative conclusion of Section 3: localising $\varphi$ near $\mu_0$ preferentially retains observations that are more probable under $H_0$ and thereby increases the effective discrimination rate, while simultaneously moving the optimal tilting towards the $H_0$ side. The classical unweighted limit is recovered at $\beta = 0$.
In the language of hypothesis testing, $\alpha^*$ is the parameter that balances the exponential rates of the type-I and type-II losses at the Bayes optimum: the type-I exponent equals $\alpha^* D_C^w$ and the type-II exponent equals $(1-\alpha^*)D_C^w$ (cf. Section 3). A leftward shift of $\alpha^*$ therefore corresponds to reallocating the available exponential budget towards faster decay of the type-II loss at the expense of the type-I loss, which is the optimal response to a weight that concentrates mass in the region where $H_0$ is more plausible.
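The computation behind Table 1 can be sketched in a few lines. The code below is not the authors' notebook; it is a standard-library illustration that evaluates the closed form (55), cross-checks it against trapezoidal integration of $\varphi\,p^{\alpha}q^{1-\alpha}$, and maximises $-\ln\rho_\alpha^w$ over $[0,1]$ by ternary search (the map is concave in $\alpha$). The function names, the integration window and the step count are assumptions.

```python
import math

MU0, VAR0 = 0.0, 1.0   # H0: N(0, 1)
MU1, VAR1 = 3.0, 2.0   # H1: N(3, 2); the second parameter is the variance
X0 = 0.0               # centre of the weight exp(-beta * (x - X0)^2)

def log_rho(alpha, beta):
    """Closed form (55) for ln rho_alpha^w."""
    A = alpha / VAR0 + (1 - alpha) / VAR1 + 2 * beta
    B = alpha * MU0 / VAR0 + (1 - alpha) * MU1 / VAR1 + 2 * beta * X0
    C = alpha * MU0 ** 2 / VAR0 + (1 - alpha) * MU1 ** 2 / VAR1 + 2 * beta * X0 ** 2
    return (0.5 * math.log(2 * math.pi / A)
            - 0.5 * alpha * math.log(2 * math.pi * VAR0)
            - 0.5 * (1 - alpha) * math.log(2 * math.pi * VAR1)
            - 0.5 * (C - B * B / A))

def log_rho_numeric(alpha, beta, lo=-30.0, hi=30.0, steps=20000):
    """Trapezoidal integration of phi * p^alpha * q^(1 - alpha) on [lo, hi]."""
    def integrand(x):
        log_p = -0.5 * math.log(2 * math.pi * VAR0) - (x - MU0) ** 2 / (2 * VAR0)
        log_q = -0.5 * math.log(2 * math.pi * VAR1) - (x - MU1) ** 2 / (2 * VAR1)
        return math.exp(-beta * (x - X0) ** 2 + alpha * log_p + (1 - alpha) * log_q)

    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + i * h) for i in range(1, steps))
    return math.log(total * h)

def chernoff(beta):
    """Return (alpha*, D_C^w) by maximising -ln rho_alpha^w over [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if -log_rho(m1, beta) < -log_rho(m2, beta):
            lo = m1
        else:
            hi = m2
    alpha_star = 0.5 * (lo + hi)
    return alpha_star, -log_rho(alpha_star, beta)

for beta in (0.0, 1 / 16, 1 / 4):
    alpha_star, d_c = chernoff(beta)
    print(f"beta = {beta:.4f}  alpha* = {alpha_star:.4f}  D_C^w = {d_c:.4f}")
```

The three printed rows should reproduce the entries of Table 1 to the displayed precision.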

Data and Code Availability

A Jupyter/Colab notebook reproducing Table 1 and Figures 1–3, together with the direct-integration verification of (55), is archived on Zenodo [16] and mirrored on GitHub.

4.2. Gaussian Models

Throughout this subsection, the reference measure is the Lebesgue measure on $\mathbb{R}^d$. We compute the weighted Bhattacharyya coefficient
$$\rho_\alpha^w(P,Q) = \int_{\mathbb{R}^d}\varphi(x)\,p(x)^{\alpha}q(x)^{1-\alpha}\,dx,\qquad \alpha\in[0,1],$$
together with the weighted Bhattacharyya distance $D_{B,\alpha}^w(P,Q) := -\ln\rho_\alpha^w(P,Q)$ and the weighted Chernoff information $D_C^w(P,Q) = \max_{\alpha\in[0,1]}D_{B,\alpha}^w(P,Q)$ (Definition 2). Note that, unlike the unweighted case, $\rho_\alpha^w$ is not restricted to $[0,1]$ and $D_{B,\alpha}^w$ (hence also $D_C^w$) may take negative values.
Example 1 
(Gaussian weighted Bhattacharyya coefficient with exponential weight). Let $P = \mathcal{N}(\mu_1,\Sigma_1)$ and $Q = \mathcal{N}(\mu_2,\Sigma_2)$ on $\mathbb{R}^d$, where $\Sigma_1\succ 0$ and $\Sigma_2\succ 0$, and let $\varphi(x) = e^{\gamma^{T}x}$ for some $\gamma\in\mathbb{R}^d$. Denote by $p, q$ the corresponding densities. For $\alpha\in[0,1]$ define
$$\Sigma_\alpha := \bigl(\alpha\Sigma_1^{-1} + (1-\alpha)\Sigma_2^{-1}\bigr)^{-1},\qquad \tilde\mu_\alpha := \Sigma_\alpha\bigl(\alpha\Sigma_1^{-1}\mu_1 + (1-\alpha)\Sigma_2^{-1}\mu_2 + \gamma\bigr).$$
Then
$$\rho_\alpha^w(P,Q) = \int_{\mathbb{R}^d}e^{\gamma^{T}x}p(x)^{\alpha}q(x)^{1-\alpha}\,dx = \frac{|\Sigma_\alpha|^{1/2}}{|\Sigma_1|^{\alpha/2}|\Sigma_2|^{(1-\alpha)/2}}\exp\Bigl\{-\frac12\Bigl[\alpha\mu_1^{T}\Sigma_1^{-1}\mu_1 + (1-\alpha)\mu_2^{T}\Sigma_2^{-1}\mu_2 - \tilde\mu_\alpha^{T}\Sigma_\alpha^{-1}\tilde\mu_\alpha\Bigr]\Bigr\}.$$
Consequently,
$$D_{B,\alpha}^w(P,Q) = \frac12\Bigl[\alpha\mu_1^{T}\Sigma_1^{-1}\mu_1 + (1-\alpha)\mu_2^{T}\Sigma_2^{-1}\mu_2 - \tilde\mu_\alpha^{T}\Sigma_\alpha^{-1}\tilde\mu_\alpha + \ln\frac{|\Sigma_1|^{\alpha}|\Sigma_2|^{1-\alpha}}{|\Sigma_\alpha|}\Bigr]. \tag{57}$$
In particular, setting $\gamma = 0$ (i.e., $\varphi\equiv 1$) reduces (57) to the classical (unweighted) Gaussian Bhattacharyya distance; see, e.g., [12].
Corollary 3 
(Common covariance). In Example 1, assume $\Sigma_1 = \Sigma_2 = \Sigma\succ 0$ and keep the exponential weight $\varphi(x) = e^{\gamma^{T}x}$. Set $\delta := \mu_1 - \mu_2$, $\|v\|_{\Sigma^{-1}}^{2} := v^{T}\Sigma^{-1}v$, and $\mu_\alpha := \alpha\mu_1 + (1-\alpha)\mu_2$. Then, for any $\alpha\in[0,1]$,
$$\rho_\alpha^w(P,Q) = \int_{\mathbb{R}^d}e^{\gamma^{T}x}p(x)^{\alpha}q(x)^{1-\alpha}\,dx = \exp\Bigl\{-\frac{\alpha(1-\alpha)}{2}\|\delta\|_{\Sigma^{-1}}^{2} + \gamma^{T}\mu_\alpha + \frac12\gamma^{T}\Sigma\gamma\Bigr\},$$
and therefore
$$D_{B,\alpha}^w(P,Q) = -\ln\rho_\alpha^w(P,Q) = \frac{\alpha(1-\alpha)}{2}\|\delta\|_{\Sigma^{-1}}^{2} - \gamma^{T}\mu_\alpha - \frac12\gamma^{T}\Sigma\gamma. \tag{59}$$
If $\mu_1\ne\mu_2$ and the unconstrained maximiser
$$\tilde\alpha = \frac12 - \frac{\gamma^{T}\delta}{\|\delta\|_{\Sigma^{-1}}^{2}}$$
belongs to $(0,1)$, then $\alpha^* = \tilde\alpha$; otherwise the maximum over $\alpha\in[0,1]$ is attained at the nearest boundary point $\alpha^*\in\{0,1\}$. In all cases,
$$D_C^w(P,Q) = \max_{\alpha\in[0,1]}D_{B,\alpha}^w(P,Q) = D_{B,\alpha^*}^w(P,Q).$$
In particular, for $\gamma = 0$ (i.e., $\varphi\equiv 1$) we recover $\alpha^* = 1/2$ and $D_C(P,Q) = \|\delta\|_{\Sigma^{-1}}^{2}/8$.
Proof. 
The expression for $\rho_\alpha^w$ follows by simplifying Example 1 under $\Sigma_1 = \Sigma_2 = \Sigma$, which makes the determinant prefactor equal to 1 and yields a Gaussian MGF term $\exp(\gamma^{T}\mu_\alpha + \frac12\gamma^{T}\Sigma\gamma)$. The maximiser follows by differentiating (59) in $\alpha$, which gives $\frac{1-2\alpha}{2}\|\delta\|_{\Sigma^{-1}}^{2} - \gamma^{T}\delta = 0$. □
Choosing an exponential weight $\varphi(x) = e^{\gamma^{T}x}$ corresponds to exponential tilting: for a Gaussian $X\sim\mathcal{N}(\mu,\Sigma)$, this tilting keeps the covariance and shifts the mean to $\mu + \Sigma\gamma$ (with normalisation factor $\exp(\gamma^{T}\mu + \frac12\gamma^{T}\Sigma\gamma)$), which is why the weighted affinities remain available in closed form. In particular, the optimal Chernoff parameter is no longer forced to be $\alpha^* = 1/2$ and, as (59) shows, sufficiently strong tilting can push the maximiser to the boundary $\alpha^*\in\{0,1\}$.
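As an illustration of Corollary 3, the sketch below treats the one-dimensional case ($d = 1$, so $\Sigma$ is a scalar variance) and returns the projected maximiser together with $D_C^w$. The restriction to $d = 1$, the function names and the parameter values are assumptions made for brevity.

```python
def d_b_common_var(alpha, mu1, mu2, var, gamma):
    """Weighted Bhattacharyya distance (59) for N(mu1, var) vs N(mu2, var), d = 1."""
    delta = mu1 - mu2
    mu_alpha = alpha * mu1 + (1 - alpha) * mu2
    return (0.5 * alpha * (1 - alpha) * delta * delta / var
            - gamma * mu_alpha
            - 0.5 * gamma * gamma * var)

def gaussian_weighted_chernoff(mu1, mu2, var, gamma):
    """Return (alpha*, D_C^w) for the weight exp(gamma * x), assuming mu1 != mu2."""
    delta = mu1 - mu2
    # Unconstrained maximiser 1/2 - gamma * delta / (delta^2 / var).
    alpha_tilde = 0.5 - gamma * var / delta
    # (59) is a concave quadratic in alpha, so clipping to [0, 1] is optimal.
    alpha_star = min(1.0, max(0.0, alpha_tilde))
    return alpha_star, d_b_common_var(alpha_star, mu1, mu2, var, gamma)

for gamma in (0.0, 0.5, 2.0):
    alpha_star, d_c = gaussian_weighted_chernoff(0.0, 3.0, 1.0, gamma)
    print(f"gamma = {gamma:.1f}  alpha* = {alpha_star:.4f}  D_C^w = {d_c:.4f}")
```

With these values the strongest tilt places the maximiser on the boundary and produces a negative $D_C^w$, in line with the remark that the weighted quantities need not be non-negative.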

4.3. Poisson Models

Example 2 
(Poisson model with exponential weight). Let $\mathcal{X} = \mathbb{N}_0 = \{0,1,2,\dots\}$ and let $\mu$ be the counting measure on $\mathcal{X}$. Fix two hypotheses $P = \mathrm{Poi}(\lambda_1)$ and $Q = \mathrm{Poi}(\lambda_2)$ with $\lambda_1,\lambda_2 > 0$, and write $p = p_{\lambda_1}$ and $q = q_{\lambda_2}$. Throughout this subsection we work under the standing assumption of Section 2, namely, that the observations $X_1,\dots,X_n$ are i.i.d. (distributed as $P$ under $H_0$ and as $Q$ under $H_1$), and that the weight $\varphi$ factorises across observations (Assumption 1). We again consider the exponential weight $\varphi_\gamma(k) = e^{\gamma k}$, $\gamma\in\mathbb{R}$. (For $\gamma = 0$ we recover the unweighted case $\varphi\equiv 1$.) Equivalently, setting $\varepsilon := e^{\gamma} > 0$, the weight takes the form $\varphi(k) = \varepsilon^{k}$; this reparameterisation is convenient in applications where $\varepsilon\in(0,1)$ models a per-event discount factor, while $\gamma\in\mathbb{R}$ is the natural parameter for the exponential-family calculations below.
For $\alpha\in[0,1]$, set
$$\lambda_\alpha := \lambda_1^{\alpha}\lambda_2^{1-\alpha}.$$
(a) Weighted Bhattacharyya coefficient and Chernoff arc.   A direct summation gives
$$\rho_\alpha^w(P,Q) = \sum_{k=0}^{\infty}\varphi_\gamma(k)\,p(k)^{\alpha}q(k)^{1-\alpha} = \exp\bigl\{-\alpha\lambda_1 - (1-\alpha)\lambda_2 + e^{\gamma}\lambda_\alpha\bigr\}. \tag{60}$$
Hence, by Definition 1,
$$D_{B,\alpha}^w(P,Q) = -\ln\rho_\alpha^w(P,Q) = \alpha\lambda_1 + (1-\alpha)\lambda_2 - e^{\gamma}\lambda_\alpha. \tag{61}$$
Moreover, the normalised weighted geometric mixture (Chernoff-tilted density) from (18) takes the form
$$(pq)_\alpha(k) = \frac{\varphi_\gamma(k)\,p(k)^{\alpha}q(k)^{1-\alpha}}{\rho_\alpha^w(P,Q)} = \exp\{-e^{\gamma}\lambda_\alpha\}\frac{(e^{\gamma}\lambda_\alpha)^{k}}{k!} = \mathrm{Poi}(e^{\gamma}\lambda_\alpha).$$
(b) Optimal Chernoff parameter.   If $\lambda_1\ne\lambda_2$, then $\alpha\mapsto D_{B,\alpha}^w(P,Q)$ is strictly concave on $[0,1]$ since
$$\frac{d^2}{d\alpha^2}D_{B,\alpha}^w(P,Q) = -e^{\gamma}\lambda_\alpha\Bigl(\ln\frac{\lambda_1}{\lambda_2}\Bigr)^{2} < 0;$$
hence, the maximiser $\alpha^*$ in Definition 2 is unique. Differentiating (61) yields the critical point condition
$$0 = \frac{d}{d\alpha}D_{B,\alpha}^w(P,Q) = \lambda_1 - \lambda_2 - e^{\gamma}\lambda_\alpha\ln\frac{\lambda_1}{\lambda_2}.$$
Equivalently, the (unconstrained) critical point $\alpha = \tilde\alpha$ satisfies
$$\lambda_{\tilde\alpha} = e^{-\gamma}L(\lambda_1,\lambda_2),\qquad L(\lambda_1,\lambda_2) := \frac{\lambda_1 - \lambda_2}{\ln\lambda_1 - \ln\lambda_2}.$$
In contrast to the unweighted case ($\gamma = 0$), the context tilt $\gamma$ may push the optimal Chernoff parameter to the boundary $\alpha^*\in\{0,1\}$.
Thus, the unconstrained maximiser is
$$\tilde\alpha = \frac{\ln L(\lambda_1,\lambda_2) - \gamma - \ln\lambda_2}{\ln\lambda_1 - \ln\lambda_2},$$
and the maximiser on $[0,1]$ is $\alpha^* = \Pi_{[0,1]}(\tilde\alpha)$, where
$$\Pi_{[0,1]}(a) := \min\{1,\max\{0,a\}\}.$$
Finally,
$$D_C^w(P,Q) = \max_{\alpha\in[0,1]}D_{B,\alpha}^w(P,Q) = D_{B,\alpha^*}^w(P,Q).$$
If $\lambda_1 = \lambda_2$, then $\rho_\alpha^w(P,Q)$ does not depend on $\alpha$ and $D_C^w(P,Q) = D_{B,\alpha}^w(P,Q)$ for any $\alpha\in[0,1]$.
Derivation of (60). For $k\in\mathbb{N}_0$,
$$p(k)^{\alpha}q(k)^{1-\alpha} = \exp\{-\alpha\lambda_1 - (1-\alpha)\lambda_2\}\frac{\lambda_\alpha^{k}}{k!},$$
so multiplying by $e^{\gamma k}$ and summing over $k$ gives
$$\rho_\alpha^w(P,Q) = \exp\{-\alpha\lambda_1 - (1-\alpha)\lambda_2\}\sum_{k\ge 0}\frac{(e^{\gamma}\lambda_\alpha)^{k}}{k!} = \exp\{-\alpha\lambda_1 - (1-\alpha)\lambda_2 + e^{\gamma}\lambda_\alpha\}.$$
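The Poisson formulas lend themselves to a direct numerical check. The sketch below computes $\tilde\alpha$, its projection onto $[0,1]$ and $D_C^w$ from the closed forms, and compares (61) with a truncated summation of the defining series. The function names, the truncation level and the parameter values are illustrative assumptions.

```python
import math

def d_b(alpha, lam1, lam2, gamma):
    """Closed form (61) for the weighted Bhattacharyya distance."""
    lam_alpha = lam1 ** alpha * lam2 ** (1 - alpha)
    return alpha * lam1 + (1 - alpha) * lam2 - math.exp(gamma) * lam_alpha

def d_b_by_summation(alpha, lam1, lam2, gamma, kmax=300):
    """-ln of the truncated series sum_k e^{gamma k} p(k)^alpha q(k)^(1-alpha)."""
    def log_pois(k, lam):
        return -lam + k * math.log(lam) - math.lgamma(k + 1)

    terms = [gamma * k + alpha * log_pois(k, lam1) + (1 - alpha) * log_pois(k, lam2)
             for k in range(kmax)]
    m = max(terms)
    return -(m + math.log(sum(math.exp(t - m) for t in terms)))

def poisson_weighted_chernoff(lam1, lam2, gamma):
    """Return (alpha*, D_C^w) for Poi(lam1) vs Poi(lam2) with weight exp(gamma * k)."""
    if lam1 == lam2:
        return 0.5, d_b(0.5, lam1, lam2, gamma)   # any alpha is optimal
    log_ratio = math.log(lam1) - math.log(lam2)
    log_mean = (lam1 - lam2) / log_ratio          # L(lam1, lam2)
    alpha_tilde = (math.log(log_mean) - gamma - math.log(lam2)) / log_ratio
    alpha_star = min(1.0, max(0.0, alpha_tilde))  # projection onto [0, 1]
    return alpha_star, d_b(alpha_star, lam1, lam2, gamma)

for gamma in (0.0, 0.3, -0.5):
    alpha_star, d_c = poisson_weighted_chernoff(2.0, 5.0, gamma)
    print(f"gamma = {gamma:+.1f}  alpha* = {alpha_star:.4f}  D_C^w = {d_c:.4f}")
```

For the last value of `gamma` the unconstrained critical point falls outside $[0,1]$, so the projection returns a boundary point, which is the situation described in part (b).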

4.4. Exponential Models

Example 3 
(Exponential model with exponential weight). Let $\mathcal{X} = \mathbb{R}_+ = [0,\infty)$ with Lebesgue measure. Fix two hypotheses $P = \mathrm{Exp}(\lambda_1)$ and $Q = \mathrm{Exp}(\lambda_2)$ with rates $\lambda_1,\lambda_2 > 0$, and write
$$p(x) = \lambda_1 e^{-\lambda_1 x}\,\mathbf{1}\{x\ge 0\},\qquad q(x) = \lambda_2 e^{-\lambda_2 x}\,\mathbf{1}\{x\ge 0\}.$$
Consider the exponential weight $\varphi_\gamma(x) = e^{\gamma x}$ with
$$\gamma < \min\{\lambda_1,\lambda_2\},$$
so that $\rho_\alpha^w(P,Q)\in(0,\infty)$ for all $\alpha\in[0,1]$. For $\alpha\in[0,1]$ set
$$\lambda_\alpha := \alpha\lambda_1 + (1-\alpha)\lambda_2.$$
(a) Weighted Bhattacharyya coefficient and Chernoff arc.   A direct computation gives
$$\rho_\alpha^w(P,Q) = \int_0^{\infty}e^{\gamma x}p(x)^{\alpha}q(x)^{1-\alpha}\,dx = \frac{\lambda_1^{\alpha}\lambda_2^{1-\alpha}}{\lambda_\alpha - \gamma}.$$
Hence
$$D_{B,\alpha}^w(P,Q) = -\ln\rho_\alpha^w(P,Q) = \ln(\lambda_\alpha - \gamma) - \alpha\ln\lambda_1 - (1-\alpha)\ln\lambda_2.$$
Moreover, the Chernoff-tilted density $(pq)_\alpha$ from (18) is again exponential:
$$(pq)_\alpha(x) = \frac{e^{\gamma x}p(x)^{\alpha}q(x)^{1-\alpha}}{\rho_\alpha^w(P,Q)} = (\lambda_\alpha - \gamma)\,e^{-(\lambda_\alpha - \gamma)x}\,\mathbf{1}\{x\ge 0\} = \mathrm{Exp}(\lambda_\alpha - \gamma).$$
(b) Optimal Chernoff parameter.   If $\lambda_1\ne\lambda_2$, then $\alpha\mapsto D_{B,\alpha}^w(P,Q)$ is strictly concave on $[0,1]$; hence, the maximiser $\alpha^*$ in Definition 2 is unique. Differentiating yields the critical point condition
$$\frac{\lambda_1 - \lambda_2}{\lambda_\alpha - \gamma} = \ln\frac{\lambda_1}{\lambda_2}.$$
Equivalently, the (unconstrained) critical point $\alpha = \tilde\alpha$ satisfies
$$\lambda_{\tilde\alpha} - \gamma = L(\lambda_1,\lambda_2),\qquad L(\lambda_1,\lambda_2) := \frac{\lambda_1 - \lambda_2}{\ln\lambda_1 - \ln\lambda_2},$$
so that
$$\tilde\alpha = \frac{\gamma + L(\lambda_1,\lambda_2) - \lambda_2}{\lambda_1 - \lambda_2}.$$
The maximiser on $[0,1]$ is $\alpha^* = \Pi_{[0,1]}(\tilde\alpha)$ (projection onto $[0,1]$), and
$$D_C^w(P,Q) = \max_{\alpha\in[0,1]}D_{B,\alpha}^w(P,Q) = D_{B,\alpha^*}^w(P,Q).$$
If $\lambda_1 = \lambda_2 = \lambda$, then $\rho_\alpha^w(P,Q) = \lambda/(\lambda - \gamma)$ does not depend on $\alpha$, so any $\alpha\in[0,1]$ is optimal and $D_C^w(P,Q) = \ln(\lambda - \gamma) - \ln\lambda$. Setting $\gamma = 0$ (i.e., $\varphi\equiv 1$) recovers the classical unweighted expressions.
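The exponential case can be checked in the same way. The sketch below evaluates $\tilde\alpha$, the projection and $D_C^w$ from the closed forms of Example 3 and compares $D_{B,\alpha}^w$ with a trapezoidal integration of $e^{\gamma x}p^{\alpha}q^{1-\alpha}$ on a truncated half-line. Function names, the truncation point and the step count are assumptions.

```python
import math

def d_b(alpha, lam1, lam2, gamma):
    """Closed-form weighted Bhattacharyya distance for Exp(lam1) vs Exp(lam2)."""
    lam_alpha = alpha * lam1 + (1 - alpha) * lam2
    return (math.log(lam_alpha - gamma)
            - alpha * math.log(lam1) - (1 - alpha) * math.log(lam2))

def d_b_numeric(alpha, lam1, lam2, gamma, steps=200000):
    """-ln of the trapezoidal integral of e^{gamma x} p^alpha q^(1-alpha) on [0, upper]."""
    decay = alpha * lam1 + (1 - alpha) * lam2 - gamma
    upper = 50.0 / decay            # the integrand is below e^{-50} beyond this point
    h = upper / steps

    def integrand(x):
        p = lam1 * math.exp(-lam1 * x)
        q = lam2 * math.exp(-lam2 * x)
        return math.exp(gamma * x) * p ** alpha * q ** (1 - alpha)

    total = 0.5 * (integrand(0.0) + integrand(upper))
    total += sum(integrand(i * h) for i in range(1, steps))
    return -math.log(total * h)

def exp_weighted_chernoff(lam1, lam2, gamma):
    """Return (alpha*, D_C^w) for the weight exp(gamma * x), gamma < min(lam1, lam2)."""
    if gamma >= min(lam1, lam2):
        raise ValueError("gamma must be smaller than both rates")
    if lam1 == lam2:
        return 0.5, d_b(0.5, lam1, lam2, gamma)   # any alpha is optimal
    log_mean = (lam1 - lam2) / (math.log(lam1) - math.log(lam2))
    alpha_tilde = (gamma + log_mean - lam2) / (lam1 - lam2)
    alpha_star = min(1.0, max(0.0, alpha_tilde))  # projection onto [0, 1]
    return alpha_star, d_b(alpha_star, lam1, lam2, gamma)

for gamma in (0.0, 0.5, -1.5):
    alpha_star, d_c = exp_weighted_chernoff(1.0, 3.0, gamma)
    print(f"gamma = {gamma:+.1f}  alpha* = {alpha_star:.4f}  D_C^w = {d_c:.4f}")
```

As in the Poisson example, a sufficiently negative `gamma` moves the unconstrained critical point outside $[0,1]$ and the projection selects a boundary value.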

Additional Example (Baseline, Non-Exponential Family)

Appendix A contains a closed-form illustration for the Cauchy location–scale family. Since the Cauchy family is not an exponential family, this example complements the main text by showing that, even in the unweighted baseline case $\varphi\equiv 1$, the Bhattacharyya coefficient (in particular $\rho_{1/2}$) and the Chernoff information may involve special functions (complete elliptic integrals). For nontrivial weights $\varphi$, the symmetry $\rho_\alpha = \rho_{1-\alpha}$ (hence $\alpha^* = 1/2$) typically fails and a comparable closed form is not available, so we keep the baseline Cauchy computation in the appendix.

4.5. Extension to M-ary Hypothesis Testing

We now record the finite-$M$ analogue of Theorem 1. The key observation is that the optimal $M$-ary pointwise loss is squeezed between pairwise minima (Lemma 3), and each pairwise term has logarithmic rate given by the corresponding weighted Chernoff information. Hence the overall $M$-ary rate is determined by the closest pair in terms of $D_C^w$.
Fix an integer $M\ge 2$ and let $P_1,\dots,P_M$ be probability measures on $\mathcal{X}$ dominated by $\mu$, with strictly positive densities $p_1,\dots,p_M$. Let $X_1^n = (X_1,\dots,X_n)$ be i.i.d. under each hypothesis $H_i: X_1^n\sim P_i^{\otimes n}$. Assume that the weight function factorises as in Assumption 1.
Assume moreover that for every $1\le i<j\le M$ and every $\alpha\in[0,1]$,
$$\rho_\alpha^w(p_i,p_j) = \int_{\mathcal X}\varphi(x)\,p_i(x)^{\alpha}p_j(x)^{1-\alpha}\,d\mu(x)\in(0,\infty),$$
so that all pairwise weighted Chernoff information values are well-defined, and that the inequality
$$\max_{i\ne j}\ \sup_{\alpha\in[0,1]}\int\varphi(x)\,\Bigl|\ln\frac{p_i(x)}{p_j(x)}\Bigr|\,p_i(x)^{\alpha}p_j(x)^{1-\alpha}\,d\mu(x) < \infty \tag{62}$$
holds true.
A (deterministic) $M$-ary decision rule is a measurable map $\delta_n:\mathcal{X}^n\to\{1,\dots,M\}$. Define the context-sensitive loss under $H_i$ by
$$\mathcal{L}_{i,n}(\delta_n) := E_{P_i^{\otimes n}}\bigl[\varphi(X_1^n)\,\mathbf{1}\{\delta_n(X_1^n)\ne i\}\bigr],$$
and the total loss
$$\mathcal{L}_{n,M}(\delta_n) := \sum_{i=1}^{M}\mathcal{L}_{i,n}(\delta_n),\qquad \mathcal{L}_{n,M}^{*} := \inf_{\delta_n}\mathcal{L}_{n,M}(\delta_n).$$
Proposition 6 
(Pointwise form of the optimal M-ary total loss). For each $n\ge 1$,
$$\mathcal{L}_{n,M}^{*} = \int_{\mathcal{X}^n}\varphi(x_1^n)\Bigl[\sum_{i=1}^{M}p_i(x_1^n) - \max_{1\le j\le M}p_j(x_1^n)\Bigr]d\mu^{\otimes n}(x_1^n), \tag{63}$$
where $p_i(x_1^n) = \prod_{k=1}^{n}p_i(x_k)$. Moreover, an optimal rule is given by the maximum-likelihood classifier
$$\delta_n^{*}(x_1^n)\in\arg\max_{1\le j\le M}p_j(x_1^n)$$
(with any measurable tie-breaking).
Proof. 
Fix $\delta_n$. Using $\sum_{i=1}^{M}\mathbf{1}\{\delta_n\ne i\}\,p_i = \sum_{i=1}^{M}p_i - p_{\delta_n}$ pointwise, we obtain
$$\mathcal{L}_{n,M}(\delta_n) = \int_{\mathcal{X}^n}\varphi(x_1^n)\Bigl[\sum_{i=1}^{M}p_i(x_1^n) - p_{\delta_n(x_1^n)}(x_1^n)\Bigr]d\mu^{\otimes n}(x_1^n).$$
Minimisation over $\delta_n$ is therefore pointwise in $x_1^n$ and is achieved by selecting an index maximising $p_j(x_1^n)$, yielding (63). □
Lemma 3 
(Pairwise minima). For any non-negative numbers $a_1,\dots,a_M$,
$$\max_{1\le i<j\le M}\min(a_i,a_j) \le \sum_{k=1}^{M}a_k - \max_{1\le k\le M}a_k \le \sum_{1\le i<j\le M}\min(a_i,a_j). \tag{64}$$
Consequently, defining for $i<j$
$$I_n^{i,j} := \int_{\mathcal{X}^n}\varphi(x_1^n)\min\{p_i(x_1^n),p_j(x_1^n)\}\,d\mu^{\otimes n}(x_1^n),$$
we have the sandwich inequality
$$\max_{i<j}I_n^{i,j} \le \mathcal{L}_{n,M}^{*} \le \sum_{i<j}I_n^{i,j}. \tag{65}$$
Proof. 
Let $a_{(1)}\ge a_{(2)}\ge\dots\ge a_{(M)}$ be the decreasing rearrangement of $(a_1,\dots,a_M)$. Then $\sum_k a_k - \max_k a_k = \sum_{r=2}^{M}a_{(r)}\ge a_{(2)}$. Moreover, $\max_{i<j}\min(a_i,a_j) = a_{(2)}$, proving the left inequality in (64).
For the right inequality, let $k^*\in\arg\max_k a_k$. Then
$$\sum_{1\le i<j\le M}\min(a_i,a_j) \ge \sum_{k\ne k^*}\min(a_k,a_{k^*}) = \sum_{k\ne k^*}a_k = \sum_{k=1}^{M}a_k - \max_k a_k.$$
Applying (64) pointwise to $a_i = p_i(x_1^n)$, multiplying by $\varphi(x_1^n)$ and integrating yields (65). □
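The elementary inequality (64) is easy to test on random inputs. The sketch below is only a sanity check with a hypothetical helper name; it computes the three quantities in (64) and verifies their ordering for randomly drawn non-negative vectors.

```python
import itertools
import random

def pairwise_sandwich(a):
    """Return (max of pairwise minima, sum minus max, sum of pairwise minima)."""
    pair_mins = [min(x, y) for x, y in itertools.combinations(a, 2)]
    lower = max(pair_mins)
    middle = sum(a) - max(a)
    upper = sum(pair_mins)
    return lower, middle, upper

rng = random.Random(0)
for _ in range(1000):
    values = [rng.random() for _ in range(rng.randint(2, 6))]
    lower, middle, upper = pairwise_sandwich(values)
    # Small slack absorbs floating-point rounding in the sums.
    assert lower <= middle + 1e-12 <= upper + 2e-12

print(pairwise_sandwich([3, 1, 2]))
```

For $M = 2$ the three quantities coincide, which is consistent with the binary case where the optimal total loss equals $I_n^{1,2}$.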
Theorem 3 
(M-ary exponent equals the minimum pairwise weighted Chernoff information). For $1\le i<j\le M$, let $D_C^w(P_i,P_j)$ be the weighted Chernoff information as in Definition 2, and assume that (62) holds. Set
$$C_M^w := \min_{1\le i<j\le M}D_C^w(P_i,P_j).$$
Then the optimal $M$-ary total loss satisfies
$$\mathcal{L}_{n,M}^{*} = \exp\{-nC_M^w + o(n)\},\qquad n\to\infty,$$
or equivalently,
$$\lim_{n\to\infty}-\frac{1}{n}\ln\mathcal{L}_{n,M}^{*} = C_M^w.$$
Proof. 
Fix $1\le i<j\le M$ and consider the binary testing problem between $P_i$ and $P_j$ with the same factorised weight $\varphi$. The optimal binary total loss equals
$$I_n^{i,j} = \int_{\mathcal{X}^n}\varphi(x_1^n)\min\{p_i(x_1^n),p_j(x_1^n)\}\,d\mu^{\otimes n}(x_1^n),$$
and by Theorem 1 applied to the pair $(P_i,P_j)$,
$$I_n^{i,j} = \exp\{-nD_C^w(P_i,P_j) + o_{i,j}(n)\}.$$
Since the number of pairs is finite, letting $r_n := \max_{i<j}|o_{i,j}(n)|$ yields $r_n = o(n)$ and
$$I_n^{i,j} = \exp\{-nD_C^w(P_i,P_j) + O(r_n)\}\quad\text{uniformly over } i<j.$$
Now use the sandwich inequality (65). Let $(i^*,j^*)$ attain the minimum $C_M^w = D_C^w(P_{i^*},P_{j^*})$. From the lower bound,
$$\mathcal{L}_{n,M}^{*} \ge I_n^{i^*,j^*} = \exp\{-nC_M^w + O(r_n)\}.$$
From the upper bound,
$$\mathcal{L}_{n,M}^{*} \le \sum_{i<j}I_n^{i,j} \le \binom{M}{2}\exp\{-nC_M^w + O(r_n)\}.$$
Taking $-\frac{1}{n}\ln(\cdot)$ and letting $n\to\infty$ yields (66). □
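To make Theorem 3 concrete, the sketch below computes $C_M^w$ for a small family of Poisson hypotheses with the weight $\varphi(k) = e^{\gamma k}$, reusing the closed form of Example 2 for each pair. The choice of family, the rates and the function names are assumptions made for illustration; the rates are taken to be pairwise distinct.

```python
import itertools
import math

def poisson_chernoff(lam1, lam2, gamma):
    """Weighted Chernoff information for Poi(lam1) vs Poi(lam2), lam1 != lam2."""
    log_ratio = math.log(lam1) - math.log(lam2)
    log_mean = (lam1 - lam2) / log_ratio              # L(lam1, lam2)
    alpha = (math.log(log_mean) - gamma - math.log(lam2)) / log_ratio
    alpha = min(1.0, max(0.0, alpha))                 # projection onto [0, 1]
    lam_alpha = lam1 ** alpha * lam2 ** (1 - alpha)
    return alpha * lam1 + (1 - alpha) * lam2 - math.exp(gamma) * lam_alpha

def mary_exponent(rates, gamma):
    """Return (C_M^w, closest pair): the minimum pairwise weighted Chernoff information."""
    return min((poisson_chernoff(a, b, gamma), (a, b))
               for a, b in itertools.combinations(rates, 2))

rates = [2.0, 5.0, 9.0]
for gamma in (0.0, 0.1):
    exponent, pair = mary_exponent(rates, gamma)
    print(f"gamma = {gamma:.1f}  C_M^w = {exponent:.4f}  closest pair = {pair}")
```

Only the closest pair matters for the exponent; the remaining pairs affect the loss through the sub-exponential factor $\binom{M}{2}$ at most.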
Remark 8 
(Nonzero priors do not change the exponent). Let $w_1,\dots,w_M > 0$, $\sum_i w_i = 1$, and consider the Bayesian weighted total loss $\mathcal{L}_{n,M}^{(w)}(\delta_n) = \sum_{i=1}^{M}w_i\,\mathcal{L}_{i,n}(\delta_n)$ with optimum $\mathcal{L}_{n,M}^{(w),*} := \inf_{\delta_n}\mathcal{L}_{n,M}^{(w)}(\delta_n)$. Then the exponent remains $C_M^w$. Indeed, writing $w_{\min} := \min_i w_i$ and $w_{\max} := \max_i w_i$, for any $\delta_n$,
$$w_{\min}\,\mathcal{L}_{n,M}(\delta_n) \le \mathcal{L}_{n,M}^{(w)}(\delta_n) \le w_{\max}\,\mathcal{L}_{n,M}(\delta_n),$$
and taking the infimum over $\delta_n$ gives $w_{\min}\,\mathcal{L}_{n,M}^{*} \le \mathcal{L}_{n,M}^{(w),*} \le w_{\max}\,\mathcal{L}_{n,M}^{*}$. Hence, $-\frac{1}{n}\ln\mathcal{L}_{n,M}^{(w),*}$ and $-\frac{1}{n}\ln\mathcal{L}_{n,M}^{*}$ have the same limit $C_M^w$.

5. Conclusions

We studied context-sensitive simple hypothesis testing under a multiplicative weight and proved that the optimal total loss admits the single-letter logarithmic asymptotic
$$\mathcal{L}_n^{*} = \exp\{-nD_C^w(P,Q) + o(n)\}.$$
The rate is the weighted Chernoff information. The main structural ingredient is an exponential-family embedding of the weighted geometric mixtures $\varphi\,p^{\alpha}q^{1-\alpha}$, which yields the characterisation of the optimal Chernoff parameter through the log-normaliser and leads to weighted information-geometric identities. We also derived finite-$n$ concentration bounds for the tilted weighted log-likelihood, obtained explicit formulas in Gaussian, Poisson, and exponential models, and extended the logarithmic asymptotic to finitely many hypotheses through the minimum pairwise weighted Chernoff information.

Open Problems

The single-letter representation (4) rests on Assumption 1 (factorised weights), on the integrability of the tilted log-normaliser $\hat F$, and on i.i.d. sampling of simple hypotheses. Relaxing these assumptions defines several natural open directions.
(a) Non-factorised weights. Replace Assumption 1 by a weight of the form $\varphi(x_1^n) = \psi\bigl(\frac{1}{n}\sum_{i=1}^{n}h(x_i)\bigr)$ or by a pairwise-interaction weight; the single-letter rate is then expected to be replaced by a variational formula over the space of probability measures, in the spirit of Sanov/Gibbs conditioning.
(b) Integrability. Weights with heavy tails in the sufficient statistic $t(x)$ may violate the finiteness of $\hat F$ near $\alpha\theta_1 + (1-\alpha)\theta_2$; the boundary cases $\alpha^*\in\{0,1\}$ (cf. Proposition 5 and [3]) call for a systematic treatment via truncation or a change in base measure.
(c) Dependent observations. A weighted counterpart of the Gärtner–Ellis theorem for stationary ergodic sequences, along the lines of the weighted extensions in [4].
(d) Composite hypotheses. A weighted analogue of the generalised likelihood-ratio test and its exponent, with sup/inf characterisations in terms of $D_C^w$ over the composite parameter sets.
(e) Information geometry of weighted manifolds. Extending the Chentsov–Amari framework [8,9,10,11,12] to weighted statistical manifolds; the Fisher metric and the dually flat structure depend on symmetries that $\varphi$ breaks in a controlled way.

Author Contributions

Conceptualization, methodology, and supervision, M.K.; formal analysis and validation, M.K. and E.Y.K.; software, visualization, data curation, and project administration, E.Y.K.; writing–original draft preparation, M.K. and E.Y.K.; writing–review and editing, E.Y.K.; funding acquisition, M.K. and E.Y.K. All authors have read and agreed to the published version of the manuscript.

Funding

The work by MK was carried out in the framework of a research project HSE-BR-2025-039 implemented as part of the Basic Research Program at HSE University. The second author (E.Yu. Kalimulina) was supported by the Ministry of Science and Higher Education of the Russian Federation under the state assignment (project FFNU-2025-0029).

Data Availability Statement

The Python/Jupyter notebook reproducing Table 1 and Figures 1–3, together with the direct numerical-integration verification of the Gaussian formula in Section 4.1, is openly available on Zenodo [16]. No new experimental datasets were generated.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Cauchy Location–Scale Family

We consider the univariate Cauchy location–scale family
$$p_{l,s}(x) = \frac{s}{\pi\bigl(s^2 + (x-l)^2\bigr)},\qquad x\in\mathbb{R},\quad l\in\mathbb{R},\quad s>0.$$
In this appendix, we set $\varphi\equiv 1$, so the weighted quantities from Section 2 coincide with their classical counterparts (e.g., $\rho_\alpha^w = \rho_\alpha$, $D_{B,\alpha}^w = D_{B,\alpha}$, $D_C^w = D_C$). This example provides a closed-form illustration outside the exponential-family setting.
Although the main body of the paper develops weighted Chernoff/Bhattacharyya quantities, we include the Cauchy location–scale family as an unweighted baseline closed-form benchmark outside the exponential-family setting (note the appearance of complete elliptic integrals even in the classical case). This appendix is not used in the proofs of the weighted results; it serves as a sanity check and illustrates the analytic complexity of $\rho_\alpha$ beyond exponential families. Nontrivial weights must satisfy the finiteness conditions from Section 2; for heavy-tailed Cauchy laws, common exponential tilts $\varphi(x) = e^{\gamma x}$ violate these conditions for any $\gamma\ne 0$, and in general, a nonconstant weight also breaks the symmetry leading to $\alpha^* = 1/2$.

Appendix A.1. KL Divergence (Closed Form)

Proposition A1 
(KL divergence between two Cauchy laws). Let $P = \mathrm{Cauchy}(l_1,s_1)$ and $Q = \mathrm{Cauchy}(l_2,s_2)$ with $s_1,s_2>0$. Then
$$D_{\mathrm{KL}}(P\|Q) = \int_{\mathbb{R}}p_{l_1,s_1}(x)\ln\frac{p_{l_1,s_1}(x)}{p_{l_2,s_2}(x)}\,dx = \ln\frac{(s_1+s_2)^2 + (l_1-l_2)^2}{4s_1s_2}.$$

Appendix A.2. Chernoff Parameter and Bhattacharyya Coefficient

Recall the (unweighted) $\alpha$-Bhattacharyya coefficient and distance
$$\rho_\alpha(P,Q) = \int_{\mathbb{R}}p_{l_1,s_1}(x)^{\alpha}p_{l_2,s_2}(x)^{1-\alpha}\,dx,\qquad D_{B,\alpha}(P,Q) = -\ln\rho_\alpha(P,Q),\qquad \alpha\in[0,1].$$
Proposition A2 
(Chernoff parameter for Cauchy). For the Cauchy location–scale family (with $\varphi\equiv 1$), one has the symmetry
$$\rho_\alpha(P,Q) = \rho_{1-\alpha}(P,Q),\qquad \alpha\in[0,1]; \tag{A2}$$
equivalently, $D_{B,\alpha}(P,Q) = D_{B,1-\alpha}(P,Q)$. Consequently, if $P\ne Q$, then the Chernoff maximiser is unique and equals
$$\alpha^* = \frac12,\qquad\text{and hence}\qquad D_C(P,Q) = \max_{\alpha\in[0,1]}D_{B,\alpha}(P,Q) = D_{B,1/2}(P,Q).$$
If $P = Q$, then $D_{B,\alpha}(P,Q)\equiv 0$ and every $\alpha\in[0,1]$ is a maximiser.
Proof. 
Set $F(\alpha) := \ln\rho_\alpha(P,Q)$. By Hölder’s inequality, $F$ is convex on $[0,1]$, and it is strictly convex unless $p_{l_1,s_1}/p_{l_2,s_2}$ is $\mu$-a.e. constant (equivalently, unless $P = Q$). Hence $D_{B,\alpha}(P,Q) = -F(\alpha)$ is concave (strictly concave when $P\ne Q$). For Cauchy laws, the symmetry (A2) holds; see, e.g., [17] for an invariance-based proof. Therefore, $D_{B,\alpha}$ is symmetric about $1/2$, and concavity implies that it attains its maximum at $\alpha = 1/2$. Strict concavity yields uniqueness when $P\ne Q$. □

Appendix A.3. Closed Form for ρ1/2 and DC

Lemma A1 
(A standard elliptic-integral identity). Let $a,b>0$ and $d\in\mathbb{R}$. Define the complete elliptic integral of the first kind
$$K(m) := \int_0^{\pi/2}\frac{du}{\sqrt{1 - m\sin^2 u}},\qquad m\in[0,1).$$
Then
$$\int_{-\infty}^{\infty}\frac{dx}{\sqrt{(x^2+a^2)\bigl((x-d)^2+b^2\bigr)}} = \frac{4}{\sqrt{(a+b)^2+d^2}}\,K\Bigl(\frac{(a-b)^2+d^2}{(a+b)^2+d^2}\Bigr).$$
Proposition A3 
(Bhattacharyya coefficient for Cauchy, closed form). Let $P = \mathrm{Cauchy}(l_1,s_1)$ and $Q = \mathrm{Cauchy}(l_2,s_2)$ with $s_1,s_2>0$, and set $\delta := l_1 - l_2$. Then
$$\rho_{1/2}(P,Q) = \int_{\mathbb{R}}\sqrt{p_{l_1,s_1}(x)\,p_{l_2,s_2}(x)}\,dx = \frac{4\sqrt{s_1s_2}}{\pi\sqrt{(s_1+s_2)^2+\delta^2}}\,K\Bigl(\frac{(s_1-s_2)^2+\delta^2}{(s_1+s_2)^2+\delta^2}\Bigr). \tag{A4}$$
Consequently,
$$D_C(P,Q) = D_{B,1/2}(P,Q) = -\ln\rho_{1/2}(P,Q), \tag{A5}$$
with $\rho_{1/2}(P,Q)$ given by (A4).
Proof. 
By definition,
$$\sqrt{p_{l_1,s_1}(x)\,p_{l_2,s_2}(x)} = \frac{\sqrt{s_1s_2}}{\pi\sqrt{\bigl((x-l_1)^2+s_1^2\bigr)\bigl((x-l_2)^2+s_2^2\bigr)}}.$$
Shift $x\mapsto x + l_2$ to obtain
$$\rho_{1/2}(P,Q) = \frac{\sqrt{s_1s_2}}{\pi}\int_{-\infty}^{\infty}\frac{dx}{\sqrt{(x^2+s_2^2)\bigl((x-\delta)^2+s_1^2\bigr)}}.$$
Apply Lemma A1 with $a = s_2$, $b = s_1$, $d = \delta$ to get (A4). Finally, Proposition A2 yields $D_C(P,Q) = D_{B,1/2}(P,Q)$, hence (A5). □
Remark A1 
(Useful special cases). If $l_1 = l_2$ (common location), then $\delta = 0$ and (A4) reduces to
$$\rho_{1/2}(P,Q) = \frac{4\sqrt{s_1s_2}}{\pi(s_1+s_2)}\,K\Bigl(\Bigl(\frac{s_1-s_2}{s_1+s_2}\Bigr)^{2}\Bigr).$$
If $s_1 = s_2 = s$ (common scale), then
$$\rho_{1/2}(P,Q) = \frac{4s}{\pi\sqrt{4s^2+\delta^2}}\,K\Bigl(\frac{\delta^2}{4s^2+\delta^2}\Bigr).$$
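The closed form (A4) can be compared with direct quadrature. The sketch below is an illustration with hypothetical function names: it evaluates $K(m)$ through the arithmetic-geometric mean, using the classical identity $K(m) = \pi/\bigl(2\,\mathrm{AGM}(1,\sqrt{1-m})\bigr)$, and integrates $\sqrt{p_{l_1,s_1}p_{l_2,s_2}}$ after the substitution $x = \tan\theta$, which turns the integrand into a smooth $\pi$-periodic function so that the midpoint rule is accurate.

```python
import math

def ellip_k(m):
    """Complete elliptic integral of the first kind K(m), m in [0, 1), via the AGM."""
    a, b = 1.0, math.sqrt(1.0 - m)
    for _ in range(40):                      # quadratic convergence; 40 steps is ample
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)

def rho_half_closed(l1, s1, l2, s2):
    """Closed form (A4) for the Bhattacharyya coefficient of two Cauchy laws."""
    delta2 = (l1 - l2) ** 2
    denom = (s1 + s2) ** 2 + delta2
    m = ((s1 - s2) ** 2 + delta2) / denom
    return 4.0 * math.sqrt(s1 * s2) / (math.pi * math.sqrt(denom)) * ellip_k(m)

def rho_half_numeric(l1, s1, l2, s2, steps=20000):
    """Midpoint rule for the integral of sqrt(p1 * p2) after x = tan(theta)."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = -0.5 * math.pi + (i + 0.5) * h
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        # ((x - l)^2 + s^2) * cos^2(theta) with x = tan(theta):
        f1 = (sin_t - l1 * cos_t) ** 2 + (s1 * cos_t) ** 2
        f2 = (sin_t - l2 * cos_t) ** 2 + (s2 * cos_t) ** 2
        total += math.sqrt(s1 * s2) / (math.pi * math.sqrt(f1 * f2))
    return total * h

closed = rho_half_closed(0.0, 1.0, 2.0, 3.0)
numeric = rho_half_numeric(0.0, 1.0, 2.0, 3.0)
print("closed form:", closed, " quadrature:", numeric)
print("D_C =", -math.log(closed))
```

The same helpers can be used to check the two special cases of Remark A1 by setting `l1 == l2` or `s1 == s2`.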
Remark A2 
(Weighted case). For a general weight $\varphi$, one has
$$\rho_{1/2}^w(P,Q) = \int_{\mathbb{R}}\varphi(x)\sqrt{p_{l_1,s_1}(x)\,p_{l_2,s_2}(x)}\,dx.$$
The symmetry (A2) (and hence $\alpha^* = 1/2$) typically fails unless $\varphi$ is compatible with the invariance argument used in [17]. We therefore use the Cauchy model mainly as a closed-form baseline example at $\varphi\equiv 1$.

References

  1. Chernoff, H. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Stat. 1952, 23, 493–507.
  2. Hoeffding, W. Asymptotically optimal tests for multinomial distributions. Ann. Math. Stat. 1965, 36, 369–401.
  3. Kelbert, M.; Suhov, Y. Context-sensitive hypothesis-testing and exponential families. Statistics 2025, 59, 845–878.
  4. Kelbert, M.; Suhov, Y. On basic context-dependent concepts of Information Theory and Statistics. Theory Probab. Appl. 2026, 70, 563–583.
  5. Kalimulina, E.Y. Application of multi-valued logic models in traffic aggregation problems in mobile networks. In Proceedings of the 2021 IEEE 15th International Conference on Application of Information and Communication Technologies (AICT), Baku, Azerbaijan, 13–15 October 2021; pp. 1–6.
  6. Esin, A.A.; Kalimulina, E.Y. Markov-modulated queueing network for mobile traffic aggregation with threshold-controlled buffers. Math. Model. Numer. Simul. Appl. 2026, 6, 4.
  7. Nielsen, F. Hypothesis testing, information divergence and computational geometry. In Geometric Science of Information; Nielsen, F., Barbaresco, F., Eds.; GSI 2013, Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 8085, pp. 241–248.
  8. Amari, S.-I.; Nagaoka, H. Methods of Information Geometry; Translations of Mathematical Monographs; American Mathematical Society: Providence, RI, USA, 2000; Volume 191.
  9. Chentsov, N.N. Statistical Decision Rules and Optimal Inference; Translations of Mathematical Monographs; American Mathematical Society: Providence, RI, USA, 1982; Volume 53.
  10. Amari, S.-I. Information Geometry and Its Applications; Applied Mathematical Sciences; Springer: Tokyo, Japan, 2016; Volume 194.
  11. Nielsen, F. An information-geometric characterization of Chernoff information. IEEE Signal Process. Lett. 2013, 20, 269–272.
  12. Nielsen, F. Revisiting Chernoff information with likelihood ratio exponential families. Entropy 2022, 24, 1400.
  13. Azuma, K. Weighted sums of certain dependent random variables. Tôhoku Math. J. 1967, 19, 357–367.
  14. McDiarmid, C. On the method of bounded differences. In Surveys in Combinatorics; Cambridge University Press: Cambridge, UK, 1989; Volume 141, pp. 148–188.
  15. Ren, Y.; Zhang, J.; Xia, Y.; Wang, R.; Xie, F.; Guan, J.; Zhang, H.; Zhou, S. Regression-based conditional independence test with adaptive kernels. Artif. Intell. 2025, 347, 104391.
  16. Kalimulina, E.Y. Weighted Chernoff Information–Numerical Illustration. Software, companion to the present paper, version 1.0.0. Zenodo 2026.
  17. Nielsen, F.; Okamura, K. On f-divergences between Cauchy distributions. IEEE Trans. Inf. Theory 2023, 69, 3150–3170.
Figure 1. The map $\alpha\mapsto -\ln\rho_\alpha^w(p,q)$ for the Gaussian hypotheses $\mathcal{N}(0,1)$, $\mathcal{N}(3,2)$ with weight (54), for $\beta\in\{0, 1/16, 1/4\}$. The optimum $\alpha^*$ is marked by a bullet on each curve and shifts to the left as $\beta$ increases.
Figure 2. Optimal skewing parameter $\alpha^*(\beta)$ for the Gaussian example. The dashed line marks the unweighted value $\alpha^*(0)$.
Figure 3. Weighted Chernoff information $\beta\mapsto D_C^w(P,Q)$. The dashed line marks the classical value $D_C$ recovered at $\beta = 0$.
Table 1. Optimal skewing parameter and weighted Chernoff information for the Gaussian example (54), obtained by maximising $-\ln\rho_\alpha^w$ with $\ln\rho_\alpha^w$ given by (55).

| $\beta$ | $\alpha^*(\beta)$ | $D_C^w(P,Q)$ |
|---|---|---|
| 0 (unweighted) | 0.4153 | 0.8018 |
| 1/16 | 0.3355 | 0.9827 |
| 1/4 | 0.0963 | 1.4935 |

Kelbert, M.; Kalimulina, E.Y. Weighted Chernoff Information and Optimal Loss Exponent in Context-Sensitive Hypothesis Testing. Entropy 2026, 28, 536. https://doi.org/10.3390/e28050536
