Article

Some Theoretical Foundations of Bare-Simulation Optimization of Some Directed Distances between Fuzzy Sets Respectively Basic Belief Assignments

by Michel Broniatowski ¹ and Wolfgang Stummer ²,*

¹ Laboratoire de Probabilités, Statistique et Modélisation, Sorbonne Université, 4 Place Jussieu, 75252 Paris, France
² Department of Mathematics, Friedrich-Alexander-Universität Erlangen–Nürnberg, Cauerstrasse 11, 91058 Erlangen, Germany
* Author to whom correspondence should be addressed.
Entropy 2024, 26(4), 312; https://doi.org/10.3390/e26040312
Submission received: 17 January 2024 / Revised: 14 March 2024 / Accepted: 22 March 2024 / Published: 1 April 2024

Abstract

It is well known that in information theory, as well as in the adjacent fields of statistics, machine learning and artificial intelligence, it is essential to quantify the dissimilarity between objects of uncertain/imprecise/inexact/vague information; correspondingly, constrained optimization is of great importance, too. In view of this, we define the dissimilarity-measure-natured generalized φ–divergences between fuzzy sets, ν–rung orthopair fuzzy sets, extended representation type ν–rung orthopair fuzzy sets as well as between those fuzzy set types and vectors. For those, we present how to tackle corresponding constrained minimization problems by appropriately applying our recently developed dimension-free bare (pure) simulation method. An analogous program is carried out by defining and optimizing generalized φ–divergences between (rescaled) basic belief assignments as well as between (rescaled) basic belief assignments and vectors.

1. Introduction

Directed distances, also known as divergences, $D(Q,P)$ between probability vectors (i.e., vectors of probability frequencies) $Q, P$ are widely used in statistics as well as in the adjacent research fields of information theory, artificial intelligence and machine learning. Prominent examples are the Kullback–Leibler information distance/divergence (also known as relative entropy), the Hellinger distance, the Jensen–Shannon divergence and Pearson's chi-square distance/divergence; these are special cases of the often-used wider class of the so-called Csiszár–Ali–Silvey–Morimoto (CASM) $\varphi$–divergences $D_{\varphi}(Q,P)$ (cf. [1,2,3]). For comprehensive overviews on CASM $\varphi$–divergences, the reader is referred to, e.g., the insightful books [4,5,6,7,8,9], the survey articles [10,11,12,13], and the references therein. It is well known that the optimization of such CASM $\varphi$–divergences plays an important role in obtaining estimators (e.g., the omnipresent maximum likelihood estimation method can be equivalently seen as a minimum Kullback–Leibler information distance estimation method), as well as in quantifying the model adequacy in the course of a model-search (model-selection) procedure; for the latter, see, e.g., [14,15,16,17].
In the literature, one can also find a substantial number of special cases of CASM $\varphi$–divergences $D_{\varphi}(Q,P)$ between statistical quantities $Q, P$ other than probability frequencies; see, e.g., [18] for a corresponding recent survey. Moreover, there also exist special cases of CASM $\varphi$–divergences $D_{\varphi}(B,A)$ between other basic objects $B, A$ for the quantification of uncertain/imprecise/inexact/vague information, such as fuzzy sets (cf. [19]) and basic belief assignments from Dempster–Shafer evidence theory (cf. [20,21]). Indeed, as far as the former are concerned, for instance, ref. [22] employ (a variant of) the Kullback–Leibler information distance between two fuzzy sets $B$ and $A$ (which they call fuzzy expected information for discrimination in favor of $B$ against $A$), ref. [23] investigate the Jensen–Shannon divergence between two intuitionistic fuzzy sets $B$ and $A$ (which they call symmetric information measure between $B$ and $A$), whereas [24] deal with the Jensen–Shannon divergence between two extended representation type (i.e., hesitancy-degree including) Pythagorean fuzzy sets $B$ and $A$. As far as CASM $\varphi$–divergences $D_{\varphi}(B,A)$ between basic belief assignments (BBAs) $B, A$ are concerned, for instance, refs. [25,26] employ the Jensen–Shannon divergence for multi-sensor data fusion, whereas [27] use the Hellinger distance for characterizing the degree of conflict between BBAs.
In view of the above-mentioned illuminations, the main goals of this paper are as follows:
(M1) to define dissimilarity-quantifying generalized CASM φ–divergences between fuzzy sets, between ν–rung orthopair fuzzy sets in the sense of [28] (including intuitionistic and Pythagorean fuzzy sets), between extended representation type ν–rung orthopair fuzzy sets, between those fuzzy set types and vectors, between (rescaled) basic belief assignments as well as between (rescaled) basic belief assignments and vectors;
(M2) to present how one can tackle corresponding constrained minimization problems by appropriately applying our recently developed dimension-free bare (pure) simulation method of [29].
This agenda is carried out as follows: in Section 2, we recall the basic definitions and properties of generalized CASM φ–divergences between vectors. Section 3 explains the basic principles of our above-mentioned bare-simulation optimization method of [29] (where, for the sake of brevity, we focus on the minimal values and not on the corresponding minimizers). In Section 4, we achieve the main goals (M1) and (M2) for the above-mentioned types of fuzzy sets, whereas Section 5 is concerned with (M1) and (M2) for (rescaled) basic belief assignments. The conclusions are discussed in the final Section 6.

2. Divergences between Vectors

As usual, a divergence is a function $D: \mathbb{R}^K \times \mathbb{R}^K \to [0,\infty]$ with the following properties: $D(Q,P) \geq 0$ for $K$–dimensional vectors $Q, P \in \mathbb{R}^K$, and $D(Q,P) = 0$ iff $Q = P$. Since, in general, $D(Q,P) \neq D(P,Q)$ and the triangle inequality is not satisfied, $D(Q,P)$ can be interpreted as a directed distance from $Q$ to $P$; accordingly, divergences $D$ can be connected to dissimilarity quantification and to geometric issues in various different ways, see, e.g., the detailed discussion in Section 1.5 of [18]. Typically, a divergence $D$ is generated by some function $\varphi$. For the latter, here we fundamentally require the following:
(G1) $\varphi: \,]-\infty,\infty[\, \to [0,\infty]$ is lower semicontinuous and convex;
(G2) $\varphi(1) = 0$;
(G3) the effective domain $dom(\varphi) := \{t \in \mathbb{R} : \varphi(t) < \infty\}$ has interior $int(dom(\varphi))$ of the form $int(dom(\varphi)) = \,]a,b[$ for some $-\infty \leq a < 1 < b \leq \infty$ (notice that (G3) follows from (G1), (G2) and the requirement that $int(dom(\varphi))$ is non-empty);
(G4') $\varphi$ is strictly convex in a neighborhood $]t_{-}^{sc}, t_{+}^{sc}[\, \subseteq \,]a,b[$ of one ($t_{-}^{sc} < 1 < t_{+}^{sc}$).
Furthermore, we set $\varphi(a) := \lim_{t \downarrow a} \varphi(t)$ and $\varphi(b) := \lim_{t \uparrow b} \varphi(t)$. The class of all functions $\varphi$ with (G1), (G2), (G3) and (G4') will be denoted by $\widetilde{\Upsilon}(]a,b[)$. For $\varphi \in \widetilde{\Upsilon}(]a,b[)$, $P := (p_1,\ldots,p_K) \in \mathbb{R}_{>0}^K := \{R := (r_1,\ldots,r_K) \in \mathbb{R}^K : r_i > 0 \text{ for all } i = 1,\ldots,K\}$ and $Q := (q_1,\ldots,q_K) \in \Omega \subset \mathbb{R}^K$, we define as directed distance the generalized Csiszár–Ali–Silvey–Morimoto $\varphi$–divergence (cf. [1,2,3,30,31]), in short, the generalized $\varphi$–divergence

$$D_{\varphi}(Q,P) := \sum_{k=1}^{K} p_k \cdot \varphi\!\left(\frac{q_k}{p_k}\right) \geq 0, \tag{1}$$

where for finiteness reasons one often even has $Q \in \Omega \subset \mathbb{R}_{\geq 0}^K := \{R := (r_1,\ldots,r_K) \in \mathbb{R}^K : r_i \geq 0 \text{ for all } i = 1,\ldots,K\}$; for a comprehensive technical treatment, see, e.g., [32] (for instance, if $]a,b[\, = \,]0,\infty[$, then one can include zeros in (1) by the "conventions" $p_k \cdot \varphi(\frac{0}{p_k}) := p_k \cdot \lim_{t \downarrow 0} \varphi(t)$ for $p_k > 0$, $0 \cdot \varphi(\frac{q_k}{0}) := q_k \cdot \frac{0}{q_k} \cdot \varphi(\frac{q_k}{0}) := q_k \cdot \lim_{t \downarrow 0} t \cdot \varphi(\frac{1}{t})$ for $q_k > 0$, and $0 \cdot \varphi(\frac{0}{0}) := 0$). Comprehensive overviews on these important (generalized) $\varphi$–divergences are given in, e.g., [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18], and the references therein.
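As a minimal illustration of definition (1), the following Python sketch evaluates $D_{\varphi}(Q,P)$ for a generator passed in as a function. The helper names `phi_1` and `generalized_phi_divergence` are hypothetical; the sketch assumes strictly positive $P$ and treats $q_k = 0$ only through the continuous extension of $\varphi_1(t) = t\cdot\log t + 1 - t$ at zero, which is the $p_k$-convention stated above.

```python
import math
from typing import Callable, Sequence


def phi_1(t: float) -> float:
    """phi_1(t) = t*log(t) + 1 - t on [0, inf[, and +inf for t < 0."""
    if t < 0:
        return math.inf
    if t == 0:
        return 1.0  # continuous extension: t*log(t) + 1 - t -> 1 as t -> 0
    return t * math.log(t) + 1.0 - t


def generalized_phi_divergence(q: Sequence[float], p: Sequence[float],
                               phi: Callable[[float], float]) -> float:
    """D_phi(Q, P) = sum_k p_k * phi(q_k / p_k), cf. (1); all p_k must be > 0."""
    if len(q) != len(p):
        raise ValueError("Q and P must have the same dimension K")
    if any(pk <= 0 for pk in p):
        raise ValueError("P must have strictly positive entries")
    return sum(pk * phi(qk / pk) for qk, pk in zip(q, p))


# Neither vector needs to sum to one.
P = [0.2, 0.5, 1.3]
Q = [0.4, 0.0, 0.6]
d = generalized_phi_divergence(Q, P, phi_1)
# Same value via the explicit form: sum q*log(q/p) - sum q + sum p, with 0*log(0) := 0.
closed = sum(qk * math.log(qk / pk) for qk, pk in zip(Q, P) if qk > 0) - sum(Q) + sum(P)
print(d, closed)
```

Replacing `phi_1` by any other generator from $\widetilde{\Upsilon}(]a,b[)$ only changes the function argument; generators with $a > 0$ would additionally need an explicit rule for ratios outside $]a,b[$.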

3. Optimization of Generalized φ–Divergences via the Bare Simulation Solution Method

In the following we deal with
Problem 1.
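For pregiven $\varphi \in \widetilde{\Upsilon}(]a,b[)$, positive-entries vector $P := (p_1,\ldots,p_K) \in \mathbb{R}_{>0}^K$ (or from some subset thereof), and subset $\Omega \subset \mathbb{R}^K$ with regularity properties
$$cl(\Omega) = cl\big(int(\Omega)\big), \qquad int(\Omega) \neq \emptyset, \tag{2}$$
find
$$\inf_{Q \in \Omega} D_{\varphi}(Q,P), \tag{3}$$
provided that
$$\inf_{Q \in \Omega} D_{\varphi}(Q,P) < \infty. \tag{4}$$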
Remark 1.
(a) When $\Omega$ is not closed but merely satisfies (2), then the infimum in (3) may not be reached in $\Omega$ although it is finite; if, additionally, $\Omega$ is a closed set, then a minimizer $Q^{*} \in \Omega$ exists. In the special setup where $\Omega$ is a closed convex set and $int(\Omega) \neq \emptyset$, (2) is satisfied and the minimum in (3) is attained at a minimizer $Q^{*} \in \Omega$, which is even unique. When $\Omega$ is open and satisfies (2), then the infimum in (3) exists, but is generally reached at some generalized projection of $P$ on $\Omega$ (see [33] for the Kullback–Leibler divergence case of probability measures, which extends to any generalized $\varphi$–divergence in our framework). However, in this paper, we only deal with finding the infimum/minimum in (3) (rather than a corresponding minimizer).
(b) Our approach is predestined for non- or semiparametric models. For instance, (2) is valid for appropriate tubular neighborhoods of parametric models or for more general non-parametric settings, such as, e.g., shape constraints.
(c) Without further mention, the regularity condition (2) is supposed to hold in the full topology.
According to our work [29], the above-mentioned Problem 1 can be solved by a new dimension-free precise bare simulation (BS) method, which is explained in the following. We first suppose
Condition 1.
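With $M_P := \sum_{i=1}^{K} p_i > 0$, the divergence generator $\varphi$ in (3) is such that its multiple $\widetilde{\varphi} := M_P \cdot \varphi$ satisfies (G1) to (G4') (i.e., $\widetilde{\varphi} \in \widetilde{\Upsilon}(]a,b[)$) and additionally there holds the representation
$$\widetilde{\varphi}(t) = \sup_{z \in \mathbb{R}} \left( z \cdot t - \log \int_{\mathbb{R}} e^{z \cdot y} \, d\widetilde{\zeta}(y) \right), \qquad t \in \mathbb{R}, \tag{5}$$
for some probability measure $\widetilde{\zeta}$ on the real line $\mathbb{R}$ such that the function $z \mapsto \int_{\mathbb{R}} e^{z \cdot y} \, d\widetilde{\zeta}(y)$ is finite on some open interval containing zero (notice that the latter implies that $\int_{\mathbb{R}} y \, d\widetilde{\zeta}(y) = 1$ and that $\widetilde{\zeta}$ has light tails).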
A detailed discussion on the representability (5) can be found in [34]. By means of this cornerstone Condition 1, we construct in [29] a sequence $(\xi_n^{\widetilde{W}})_{n \in \mathbb{N}}$ of $\mathbb{R}^K$–valued random variables/random vectors (on an auxiliary probability space $(\mathcal{X}, \mathcal{A}, \Pi)$) as follows: for any $n \in \mathbb{N}$ and any $k \in \{1,\ldots,K\}$, let $n_k := \lfloor n \cdot \widetilde{p}_k \rfloor$, where $\lfloor x \rfloor$ denotes the integer part of $x$ and $\widetilde{p}_k := p_k / M_P > 0$. Thus, one has $\lim_{n \to \infty} \frac{n_k}{n} = \widetilde{p}_k$. Moreover, we assume that $n \in \mathbb{N}$ is large enough, namely, $n \geq \max_{k \in \{1,\ldots,K\}} \frac{1}{\widetilde{p}_k}$, and decompose the set $\{1,\ldots,n\}$ of all integers from 1 to $n$ into the following disjoint blocks: $I_1^{(n)} := \{1,\ldots,n_1\}$, $I_2^{(n)} := \{n_1+1,\ldots,n_1+n_2\}$, and so on until the last block $I_K^{(n)} := \{\sum_{k=1}^{K-1} n_k + 1, \ldots, n\}$, which, therefore, contains all integers from $n_1 + \ldots + n_{K-1} + 1$ to $n$. Clearly, $I_k^{(n)}$ has $n_k \geq 1$ elements (i.e., $card(I_k^{(n)}) = n_k$, where $card(A)$ denotes the number of elements in a set $A$) for all $k \in \{1,\ldots,K-1\}$, and the last block $I_K^{(n)}$ has $n - \sum_{k=1}^{K-1} n_k \geq 1$ elements, which, anyhow, satisfies $\lim_{n \to \infty} card(I_K^{(n)})/n = \widetilde{p}_K$. Furthermore, consider a vector $\widetilde{W} := (\widetilde{W}_1,\ldots,\widetilde{W}_n)$, where the $\widetilde{W}_i$'s are i.i.d. copies of the random variable $\widetilde{W}$ whose distribution is associated with the multiple divergence generator $\widetilde{\varphi}$ through (5), in the sense that $\Pi[\widetilde{W} \in \cdot\,] = \widetilde{\zeta}[\,\cdot\,]$. We group the $\widetilde{W}_i$'s according to the above-mentioned blocks and sum them up blockwise, in order to build the following $K$–component random vector
$$\xi_n^{\widetilde{W}} := \left( \frac{1}{n} \sum_{i \in I_1^{(n)}} \widetilde{W}_i, \ldots, \frac{1}{n} \sum_{i \in I_K^{(n)}} \widetilde{W}_i \right). \tag{6}$$
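The block construction behind (6) is purely combinatorial and can be sketched in a few lines of Python. The names `make_blocks` and `xi_n` are hypothetical, and the i.i.d. draws $\widetilde{W}_1,\ldots,\widetilde{W}_n$ are assumed to be supplied by the caller, since their distribution depends on $\varphi$ through (5).

```python
import math
from typing import List, Sequence


def make_blocks(p: Sequence[float], n: int) -> List[range]:
    """Consecutive blocks I_1^(n), ..., I_K^(n) of {1, ..., n} with
    card(I_k^(n)) = floor(n * p_k / M_P) for k < K; the last block takes the rest."""
    m_p = sum(p)
    p_tilde = [pk / m_p for pk in p]
    if n < max(1.0 / pt for pt in p_tilde):
        raise ValueError("n must satisfy n >= max_k 1/p~_k")
    sizes = [math.floor(n * pt) for pt in p_tilde[:-1]]
    sizes.append(n - sum(sizes))
    blocks, start = [], 1
    for size in sizes:
        blocks.append(range(start, start + size))
        start += size
    return blocks


def xi_n(w: Sequence[float], blocks: List[range]) -> List[float]:
    """K-component vector (6): blockwise sums of W~_1, ..., W~_n divided by n.
    w[i - 1] holds W~_i, because the blocks use the 1-based indices of the text."""
    n = len(w)
    return [sum(w[i - 1] for i in block) / n for block in blocks]


blocks = make_blocks([1.0, 2.0, 1.0], n=10)  # M_P = 4, p~ = (0.25, 0.5, 0.25)
print([len(b) for b in blocks])              # [2, 5, 3]
print(xi_n([1.0] * 10, blocks))              # [0.2, 0.5, 0.3]
```

With all $\widetilde{W}_i$ equal to their common mean one, (6) returns $(n_1/n, \ldots)$, which approaches $(\widetilde{p}_1,\ldots,\widetilde{p}_K)$ as $n$ grows; this indicates where the simulated vectors concentrate.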
For such a context, in [29], we obtain the following solution of Problem 1:
Theorem 1.
Under Condition 1, there holds the “bare-simulation (BS) minimizability”
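$$\inf_{Q \in \Omega} D_{\varphi}(Q,P) = -\lim_{n \to \infty} \frac{1}{n} \log \Pi\left[\xi_n^{\widetilde{W}} \in \Omega\right]$$
for any $\Omega \subset \mathbb{R}^K$ with regularity properties (2) and finiteness property (4).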
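Theorem 1 can be checked numerically in a toy case in which the probability on the right-hand side is available in closed form. The Python sketch below assumes $K = 2$, $P = (0.5, 0.5)$ (so $M_P = 1$), $\varphi = \varphi_2$ (i.e., $\widetilde{c} = 1$ in (8) below) and the half-space $\Omega = \{Q : q_1 \geq 0.7\}$, which satisfies (2); by (9) below, $\inf_{Q\in\Omega} D_{\varphi_2}(Q,P) = (0.7-0.5)^2/(2\cdot 0.5) = 0.04$. Under the simulation distribution $Normal(1, \frac{1}{M_P\cdot\widetilde{c}})$ listed in (DIS5) below, the first component of (6) is exactly normal, so $-\frac{1}{n}\log\Pi[\xi_n^{\widetilde{W}}\in\Omega]$ can be evaluated without any simulation. The function name and the toy setting are our own illustrative choices.

```python
import math


def bs_rate_exact(n: int, threshold: float = 0.7) -> float:
    """-(1/n) * log Pi[xi_n in Omega] in the toy setting described above.

    The first component of (6) is (1/n) * (sum of n_1 i.i.d. Normal(1, 1)),
    i.e. Normal(n_1/n, n_1/n^2), with n_1 = floor(n * 0.5)."""
    n1 = math.floor(n * 0.5)
    mean, sd = n1 / n, math.sqrt(n1) / n
    z = (threshold - mean) / sd
    prob = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail Pi[Z >= z] of N(0, 1)
    return -math.log(prob) / n


for n in (100, 1_000, 10_000):
    print(n, bs_rate_exact(n))  # decreases towards the minimal value 0.04
```

For much larger $n$ the normal tail underflows in double precision, so an asymptotic expression for the logarithm of the tail would be needed there.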
Theorem 1 provides our principle for the approximation of the solution of the deterministic optimization problem (3). Indeed, by replacing the involved limit by its finite counterpart, we deduce for given large n
$$\inf_{Q \in \Omega} D_{\varphi}(Q,P) \approx -\frac{1}{n} \log \Pi\left[\xi_n^{\widetilde{W}} \in \Omega\right]; \tag{7}$$
it remains to estimate the probability on the right-hand side of (7). This can be done either by a naive estimator of the frequency of those replications of $\xi_n^{\widetilde{W}}$ which hit $\Omega$, or, more efficiently, by some improved estimator; see [29] for details, where we give concrete construction methods as well as numerous solved cases. Of the latter, for the sake of brevity, we mention only two important special cases. The first one deals with the class of power-divergence generators $\varphi := \widetilde{c} \cdot \varphi_{\gamma}: \mathbb{R} \to [0,\infty]$ (with arbitrary multiplier $\widetilde{c} \in \,]0,\infty[$) defined by
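$$\widetilde{c}\cdot\varphi_{\gamma}(t) := \begin{cases}
\widetilde{c}\cdot\frac{t^{\gamma}-\gamma\cdot t+\gamma-1}{\gamma\cdot(\gamma-1)}, & \text{if } \gamma\in\,]-\infty,0[ \text{ and } t\in\,]0,\infty[,\\
\widetilde{c}\cdot(-\log t+t-1), & \text{if } \gamma=0 \text{ and } t\in\,]0,\infty[,\\
\widetilde{c}\cdot\frac{t^{\gamma}-\gamma\cdot t+\gamma-1}{\gamma\cdot(\gamma-1)}, & \text{if } \gamma\in\,]0,1[ \text{ and } t\in[0,\infty[,\\
\widetilde{c}\cdot(t\cdot\log t+1-t), & \text{if } \gamma=1 \text{ and } t\in[0,\infty[,\\
\widetilde{c}\cdot\frac{(t-1)^{2}}{2}, & \text{if } \gamma=2 \text{ and } t\in\,]-\infty,\infty[,\\
\widetilde{c}\cdot\left(\frac{t^{\gamma}-\gamma\cdot t+\gamma-1}{\gamma\cdot(\gamma-1)}\cdot\mathbf{1}_{]0,\infty[}(t)+\left(\frac{1}{\gamma}-\frac{t}{\gamma-1}\right)\cdot\mathbf{1}_{]-\infty,0]}(t)\right), & \text{if } \gamma\in\,]2,\infty[ \text{ and } t\in\,]-\infty,\infty[,\\
\infty, & \text{else},
\end{cases} \tag{8}$$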
which, by (1), generate (the vector-valued form of) the generalized power divergences given by
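$$D_{\widetilde{c}\cdot\varphi_{\gamma}}(Q,P) := \begin{cases}
\widetilde{c}\cdot\left(\sum_{k=1}^{K}\frac{(q_k)^{\gamma}\cdot(p_k)^{1-\gamma}}{\gamma\cdot(\gamma-1)}-\frac{1}{\gamma-1}\cdot\sum_{k=1}^{K}q_k+\frac{1}{\gamma}\cdot\sum_{k=1}^{K}p_k\right), & \text{if } \gamma\in\,]-\infty,0[,\ P\in\mathbb{R}_{>0}^K \text{ and } Q\in\mathbb{R}_{>0}^K,\\
\widetilde{c}\cdot\left(\sum_{k=1}^{K}p_k\cdot\log\frac{p_k}{q_k}+\sum_{k=1}^{K}q_k-\sum_{k=1}^{K}p_k\right), & \text{if } \gamma=0,\ P\in\mathbb{R}_{>0}^K \text{ and } Q\in\mathbb{R}_{>0}^K,\\
\widetilde{c}\cdot\left(\sum_{k=1}^{K}\frac{(q_k)^{\gamma}\cdot(p_k)^{1-\gamma}}{\gamma\cdot(\gamma-1)}-\frac{1}{\gamma-1}\cdot\sum_{k=1}^{K}q_k+\frac{1}{\gamma}\cdot\sum_{k=1}^{K}p_k\right), & \text{if } \gamma\in\,]0,1[,\ P\in\mathbb{R}_{>0}^K \text{ and } Q\in\mathbb{R}_{\geq 0}^K,\\
\widetilde{c}\cdot\left(\sum_{k=1}^{K}q_k\cdot\log\frac{q_k}{p_k}-\sum_{k=1}^{K}q_k+\sum_{k=1}^{K}p_k\right), & \text{if } \gamma=1,\ P\in\mathbb{R}_{>0}^K \text{ and } Q\in\mathbb{R}_{\geq 0}^K,\\
\widetilde{c}\cdot\sum_{k=1}^{K}\frac{(q_k-p_k)^{2}}{2\cdot p_k}, & \text{if } \gamma=2,\ P\in\mathbb{R}_{>0}^K \text{ and } Q\in\mathbb{R}^K,\\
\widetilde{c}\cdot\left(\sum_{k=1}^{K}\frac{(q_k)^{\gamma}\cdot(p_k)^{1-\gamma}}{\gamma\cdot(\gamma-1)}\cdot\mathbf{1}_{[0,\infty[}(q_k)-\frac{1}{\gamma-1}\cdot\sum_{k=1}^{K}q_k+\frac{1}{\gamma}\cdot\sum_{k=1}^{K}p_k\right), & \text{if } \gamma\in\,]2,\infty[,\ P\in\mathbb{R}_{>0}^K \text{ and } Q\in\mathbb{R}^K,\\
\infty, & \text{else};
\end{cases} \tag{9}$$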
for a corresponding detailed literature embedding (including applications and transformations), the reader is referred to [29]. For any fixed $M_P := \sum_{i=1}^{K} p_i \in \,]0,\infty[$, Condition 1 is satisfied for $\varphi := \widetilde{c}\cdot\varphi_{\gamma}$ for all $\widetilde{c} \in \,]0,\infty[$ and all $\gamma \in \mathbb{R}\setminus\,]1,2[$, and thus the BS-minimizability of Theorem 1 can be applied. Notice that the case $\gamma \in \,]1,2[$ has to be left out for technical reasons. The corresponding crucial simulation distributions $\Pi[\widetilde{W} \in \cdot\,] = \widetilde{\zeta}[\,\cdot\,]$ (cf. (5)) are given by the following:
(DIS1) a tilted stable distribution on $[0,\infty[$ for the case $\gamma \in \,]-\infty,0[$;
(DIS2) the "$Gamma(M_P\cdot\widetilde{c}, M_P\cdot\widetilde{c})$ distribution" for $\gamma = 0$;
(DIS3) the "$CompoundPoisson\left(\frac{M_P\cdot\widetilde{c}}{\gamma}\right)$-$Gamma\left(\frac{M_P\cdot\widetilde{c}}{1-\gamma}, \frac{\gamma}{1-\gamma}\right)$ distribution" for $\gamma \in \,]0,1[$;
(DIS4) the "$\frac{1}{M_P\cdot\widetilde{c}}$–fold of $Poisson(M_P\cdot\widetilde{c})$ distribution" for $\gamma = 1$;
(DIS5) the "$Normal\left(1, \frac{1}{M_P\cdot\widetilde{c}}\right)$ distribution" for $\gamma = 2$;
(DIS6) a distorted stable distribution on $]-\infty,\infty[$ for $\gamma \in \,]2,\infty[$.
For details, see our paper [29].
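A naive Monte Carlo version of (7) simply counts how often independent replications of (6) fall into $\Omega$. The Python sketch below is written for a generic sampler of $\widetilde{W}$ and is illustrated with the toy setting $K = 2$, $P = (0.5, 0.5)$, $\varphi = \varphi_2$ and $\Omega = \{Q : q_1 \geq 0.7\}$ (minimal value $0.04$ by (9)), using the normal distribution of (DIS5); a sampler for the Gamma distribution of (DIS2) is included as well. The function names, seed and sample sizes are our assumptions, and the improved estimators of [29] are not reproduced here. At moderate $n$ the naive value lies noticeably above the minimum because of the finite-$n$ gap between the two sides of (7), while much larger $n$ makes the event $\{\xi_n^{\widetilde{W}} \in \Omega\}$ too rare for plain frequency counting.

```python
import math
import random
from typing import Callable, List, Sequence


def naive_bs_estimate(p: Sequence[float], n: int, reps: int,
                      in_omega: Callable[[List[float]], bool],
                      sample_w: Callable[[random.Random], float],
                      seed: int = 0) -> float:
    """Naive estimate of the right-hand side of (7):
    -(1/n) * log(fraction of replications of xi_n from (6) that hit Omega)."""
    rng = random.Random(seed)
    m_p = sum(p)
    sizes = [math.floor(n * pk / m_p) for pk in p[:-1]]
    sizes.append(n - sum(sizes))  # the last block takes the remaining indices
    hits = 0
    for _ in range(reps):
        xi = [sum(sample_w(rng) for _ in range(size)) / n for size in sizes]
        if in_omega(xi):
            hits += 1
    if hits == 0:
        raise RuntimeError("no replication hit Omega; increase reps or decrease n")
    return -math.log(hits / reps) / n


def normal_sampler(m_p: float, c: float) -> Callable[[random.Random], float]:
    """(DIS5), gamma = 2: Normal(1, 1/(M_P*c~)); random.gauss expects a standard deviation."""
    sd = 1.0 / math.sqrt(m_p * c)
    return lambda rng: rng.gauss(1.0, sd)


def gamma_sampler(m_p: float, c: float) -> Callable[[random.Random], float]:
    """(DIS2), gamma = 0: Gamma with shape = rate = M_P*c~ (mean one);
    random.gammavariate expects (shape, scale)."""
    shape = m_p * c
    return lambda rng: rng.gammavariate(shape, 1.0 / shape)


P = [0.5, 0.5]
est = naive_bs_estimate(P, n=50, reps=20_000,
                        in_omega=lambda xi: xi[0] >= 0.7,
                        sample_w=normal_sampler(sum(P), 1.0))
print(est)  # lies clearly above the exact minimum 0.04 at this n
```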
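The second important special case to be mentioned here deals with
$$\varphi_{snKL,\widetilde{c}}(t) := \begin{cases}
\widetilde{c}\cdot\left(t\cdot\log t+(t+1)\cdot\log\frac{2}{t+1}\right)\in[0,\infty[, & \text{if } t\in\,]0,\infty[,\\
\widetilde{c}\cdot\log 2, & \text{if } t=0,\\
\infty, & \text{if } t\in\,]-\infty,0[,
\end{cases} \tag{10}$$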
which leads to
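$$D_{\varphi_{snKL,\widetilde{c}}}(Q,P) = \widetilde{c}\cdot\left(\sum_{k=1}^{K} q_k\cdot\log\frac{2\cdot q_k}{q_k+p_k}+\sum_{k=1}^{K} p_k\cdot\log\frac{2\cdot p_k}{q_k+p_k}\right), \quad \text{if } P\in\mathbb{R}_{>0}^K,\ Q\in\mathbb{R}_{\geq 0}^K. \tag{11}$$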
For any fixed $M_P := \sum_{i=1}^{K} p_i \in \,]0,\infty[$, Condition 1 is satisfied for $\varphi := \varphi_{snKL,\widetilde{c}}$ for all $\widetilde{c} \in \,]0,\infty[$, and thus the BS-minimizability of Theorem 1 can be applied. The corresponding crucial simulation distribution $\Pi[\widetilde{W} \in \cdot\,] = \widetilde{\zeta}[\,\cdot\,]$ (cf. (5)) is given by the "$\frac{1}{M_P\cdot\widetilde{c}}$–fold of $NegativeBinomial(M_P\cdot\widetilde{c}, \frac{1}{2})$" (cf. [29]). For the special subcase $\widetilde{c} = 1$, we derive
$$D_{\varphi_{snKL,1}}(Q,P) = D_{\varphi_1}\!\left(Q, \tfrac{Q+P}{2}\right) + D_{\varphi_1}\!\left(P, \tfrac{Q+P}{2}\right), \quad \text{if } P\in\mathbb{R}_{>0}^K,\ Q\in\mathbb{R}_{\geq 0}^K \text{ and } \sum_{i=1}^{K} p_i = \sum_{i=1}^{K} q_i, \tag{12}$$
which means that in such a situation the divergence (11) can be rewritten as a sum of two generalized Kullback–Leibler divergences (cf. (9)). For the important special setup that $\sum_{i=1}^{K} p_i = \sum_{i=1}^{K} q_i = 1$, and thus both $P = \mathbb{P}$ and $Q = \mathbb{Q}$ are probability vectors, the divergence $D_{\varphi_{snKL,1}}(\mathbb{Q},\mathbb{P})$ in (12) is the well-known (cf. [35,36,37,38,39,40,41]) Jensen–Shannon divergence (also called symmetrized and normalized Kullback–Leibler divergence, symmetrized and normalized relative entropy, and capacitor discrimination).
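The closed forms (9) and (11), as well as the decomposition (12), can be cross-checked against definition (1) numerically. The Python sketch below does this for strictly positive vectors only, so that the branches of (8) and (10) with $t > 0$ suffice; the function names, the multiplier $\widetilde{c} = 2$ and the test vectors are hypothetical choices of ours.

```python
import math
from typing import Callable, Sequence


def d_phi(q: Sequence[float], p: Sequence[float], phi: Callable[[float], float]) -> float:
    """Definition (1)."""
    return sum(pk * phi(qk / pk) for qk, pk in zip(q, p))


def phi_gamma(t: float, g: float, c: float = 1.0) -> float:
    """c~ * phi_gamma(t) from (8), restricted to t > 0; gamma in ]1,2[ is excluded."""
    if t <= 0 or 1 < g < 2:
        raise ValueError("sketch covers t > 0 and gamma outside ]1,2[ only")
    if g == 0:
        return c * (-math.log(t) + t - 1.0)
    if g == 1:
        return c * (t * math.log(t) + 1.0 - t)
    return c * (t ** g - g * t + g - 1.0) / (g * (g - 1.0))


def power_divergence(q: Sequence[float], p: Sequence[float], g: float, c: float = 1.0) -> float:
    """Closed form (9) for strictly positive Q and P."""
    sq, sp = sum(q), sum(p)
    if g == 0:
        return c * (sum(pk * math.log(pk / qk) for qk, pk in zip(q, p)) + sq - sp)
    if g == 1:
        return c * (sum(qk * math.log(qk / pk) for qk, pk in zip(q, p)) - sq + sp)
    s = sum(qk ** g * pk ** (1.0 - g) for qk, pk in zip(q, p))
    return c * (s / (g * (g - 1.0)) - sq / (g - 1.0) + sp / g)


def phi_snkl(t: float, c: float = 1.0) -> float:
    """(10) restricted to t > 0."""
    return c * (t * math.log(t) + (t + 1.0) * math.log(2.0 / (t + 1.0)))


def snkl_divergence(q: Sequence[float], p: Sequence[float], c: float = 1.0) -> float:
    """Closed form (11) for strictly positive Q and P."""
    return c * sum(qk * math.log(2 * qk / (qk + pk)) + pk * math.log(2 * pk / (qk + pk))
                   for qk, pk in zip(q, p))


def phi_1(t: float) -> float:
    return phi_gamma(t, 1.0)


Q, P = [0.4, 1.1, 0.7], [0.5, 0.9, 1.2]  # unequal totals are allowed in (9) and (11)
for g in (-1.5, 0.0, 0.5, 1.0, 2.0, 3.0):
    print(g, d_phi(Q, P, lambda t: phi_gamma(t, g, 2.0)), power_divergence(Q, P, g, 2.0))
print(d_phi(Q, P, phi_snkl), snkl_divergence(Q, P))

# Decomposition (12) needs equal totals; here both vectors sum to one.
Q1, P1 = [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]
M = [(a + b) / 2 for a, b in zip(Q1, P1)]
print(snkl_divergence(Q1, P1), d_phi(Q1, M, phi_1) + d_phi(P1, M, phi_1))
```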
For further examples, the reader is referred to our paper [29]. There, we also derive bare-simulation optimization versions for constraint sets $\Omega$ in a strictly positive multiple $A > 0$ of the probability simplex, as explained in the following. First, we denote by $\mathbb{S}^K := \{\mathbb{Q} := (q_1,\ldots,q_K) \in \mathbb{R}_{\geq 0}^K : \sum_{i=1}^{K} q_i = 1\}$ the simplex of probability vectors (probability simplex), and $\mathbb{S}_{>0}^K := \{\mathbb{Q} := (q_1,\ldots,q_K) \in \mathbb{R}_{>0}^K : \sum_{i=1}^{K} q_i = 1\}$. For better emphasis (as already done above), for elements of these two sets we use the symbols $\mathbb{Q}, \mathbb{P}$ instead of $Q, P$, etc., but for their components we still use our notation $q_k, p_k$. Moreover, subsets of $\mathbb{S}^K$ or $\mathbb{S}_{>0}^K$ will be denoted by $\boldsymbol{\Omega}$ instead of $\Omega$, etc. As indicated above, in the following we deal with constraint sets of the form $A \cdot \boldsymbol{\Omega}$ for some arbitrary $A \in \,]0,\infty[$, which automatically satisfy $int(A \cdot \boldsymbol{\Omega}) = \emptyset$ in the full topology, and thus the regularity condition (2) is violated (cf. Remark 1 (c)). Therefore, we need an adaptation of the above-mentioned method. In more detail, we deal with
Problem 2.
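For pregiven $\varphi \in \widetilde{\Upsilon}(]a,b[)$, positive-components vector $P := (p_1,\ldots,p_K) \in \mathbb{R}_{>0}^K$, and subset $A \cdot \boldsymbol{\Omega} \subset A \cdot \mathbb{S}^K$ with regularity properties, in the relative topology (!!),
$$cl(A\cdot\boldsymbol{\Omega}) = cl\big(int(A\cdot\boldsymbol{\Omega})\big), \qquad int(A\cdot\boldsymbol{\Omega}) \neq \emptyset, \tag{13}$$
find
$$\inf_{Q \in A\cdot\boldsymbol{\Omega}} D_{\varphi}(Q,P), \tag{14}$$
provided that
$$\inf_{Q \in A\cdot\boldsymbol{\Omega}} D_{\varphi}(Q,P) < \infty$$
and that the divergence generator $\varphi$ additionally satisfies Condition 1.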
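For the directed distance minimization Problem 2, we proceed (with the same notations) as above and construct the following $K$–component random vector (instead of $\xi_n^{\widetilde{W}}$ in (6))
$$\xi_n^{w\widetilde{W}} := \begin{cases}
\left(\frac{\sum_{i\in I_1^{(n)}}\widetilde{W}_i}{\sum_{k=1}^{K}\sum_{i\in I_k^{(n)}}\widetilde{W}_i}, \ldots, \frac{\sum_{i\in I_K^{(n)}}\widetilde{W}_i}{\sum_{k=1}^{K}\sum_{i\in I_k^{(n)}}\widetilde{W}_i}\right), & \text{if } \sum_{j=1}^{n}\widetilde{W}_j \neq 0,\\
(\infty,\ldots,\infty) =: \boldsymbol{\infty}, & \text{if } \sum_{j=1}^{n}\widetilde{W}_j = 0.
\end{cases} \tag{15}$$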
By construction, in the case $\sum_{j=1}^{n}\widetilde{W}_j \neq 0$, the $K$ random components of (15) automatically sum up to one; however, since (depending on $\varphi$) the $\widetilde{W}_i$'s may take both positive and negative values, these random components may be negative with a probability strictly greater than zero (respectively, non-negative with a probability strictly less than one). Nevertheless, $\Pi[\xi_n^{w\widetilde{W}} \in \mathbb{S}_{>0}^K] > 0$, since all the (identically distributed) random variables $\widetilde{W}_i$ have an expectation of one (as a consequence of the assumed representability (5)); in the case $\Pi[\widetilde{W}_1 > 0] = 1$, one even has $\Pi[\xi_n^{w\widetilde{W}} \in \mathbb{S}_{>0}^K] = 1$. Summing up, the probability $\Pi[\xi_n^{w\widetilde{W}} \in \boldsymbol{\Omega}]$ is strictly positive (so that its logarithm is finite), at least for large $n$, whenever $\inf_{Q \in A\cdot\boldsymbol{\Omega}} D_{\varphi}(Q,P)$ is finite.
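One replication of the self-normalized vector (15) can be sketched in Python as follows. The helper `xi_w` is hypothetical and takes a generic sampler for $\widetilde{W}$. With the positive Gamma weights of (DIS2) ($\gamma = 0$), all components are strictly positive, in line with the remark on $\Pi[\widetilde{W}_1 > 0] = 1$, whereas the normal weights of (DIS5) ($\gamma = 2$) can produce negative components; the parameter values ($M_P\cdot\widetilde{c} = 4$, respectively $0.1$) are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence


def xi_w(p: Sequence[float], n: int, rng: random.Random,
         sample_w: Callable[[random.Random], float]) -> List[float]:
    """One replication of (15): block sums of W~_1, ..., W~_n divided by their total."""
    m_p = sum(p)
    sizes = [math.floor(n * pk / m_p) for pk in p[:-1]]
    sizes.append(n - sum(sizes))
    block_sums = [sum(sample_w(rng) for _ in range(s)) for s in sizes]
    total = sum(block_sums)
    if total == 0:
        return [math.inf] * len(p)  # the vector (inf, ..., inf) of (15)
    return [b / total for b in block_sums]


rng = random.Random(7)
P = [1.0, 2.0, 1.0]                                      # M_P = 4
gamma_w = lambda r: r.gammavariate(4.0, 1.0 / 4.0)       # (DIS2) with M_P * c~ = 4
normal_w = lambda r: r.gauss(1.0, 1.0 / math.sqrt(0.1))  # (DIS5) with M_P * c~ = 0.1

v = xi_w(P, 40, rng, gamma_w)
print(v, sum(v))  # positive components summing to one (up to rounding)
neg = sum(any(x < 0 for x in xi_w(P, 4, rng, normal_w)) for _ in range(2000))
print(neg)        # number of replications with at least one negative component
```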
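As mentioned right after (9) above, the required representability (5) is satisfied for all (multiples of) the generators $\varphi(\cdot) := \widetilde{c}\cdot\varphi_{\gamma}(\cdot)$ of (8) with $\widetilde{c} \in \,]0,\infty[$ and $\gamma \in \mathbb{R}\setminus\,]1,2[$ (cf. [29]). Within this context, for arbitrary constants $\breve{A} > 0$ and $\breve{c} > 0$, we define the auxiliary functions
$$H_{\gamma,\breve{c},\breve{A}}(z) := \begin{cases}
\frac{\breve{c}}{\gamma\cdot(\gamma-1)}\cdot\left(\breve{A}^{\gamma}\cdot\left(1-\frac{\gamma}{\breve{c}}\cdot z\right)^{1-\gamma}-1-\gamma\cdot(\breve{A}-1)\right), & \text{if } \gamma\in\,]-\infty,0[\,\cup\,]0,1[\,\cup\,[2,\infty[ \text{ and } z\in\mathbb{R} \text{ such that } \gamma\cdot z<\breve{c},\\
z-\breve{c}\cdot\left(1-\breve{A}+\log\breve{A}\right), & \text{if } \gamma=0 \text{ and } z\in\mathbb{R},\\
\breve{c}\cdot\left(1-\breve{A}-\breve{A}\cdot\left(\log\left(1-\frac{z}{\breve{c}}\right)-\log\breve{A}\right)\right), & \text{if } \gamma=1 \text{ and } z\in\,]-\infty,\breve{c}[,
\end{cases} \tag{16}$$
by means of which we obtained (with slight rescaling) in [29] the following solution of Problem 2: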
Theorem 2.
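Let $P \in \mathbb{R}_{>0}^K$ with $M_P := \sum_{i=1}^{K} p_i$, $\widetilde{c} \in \,]0,\infty[$, $\gamma \in \mathbb{R}\setminus\,]1,2[$ and $A \in \,]0,\infty[$ be arbitrary but fixed. Moreover, let $\breve{c} := M_P\cdot\widetilde{c}$, $\breve{A} := \frac{A}{M_P}$, and let $(\widetilde{W}_i)_{i\in\mathbb{N}}$ be a family of independent and identically distributed $\mathbb{R}$–valued random variables with probability distribution $\widetilde{\zeta}[\,\cdot\,] := \Pi[\widetilde{W}_1 \in \cdot\,]$ being connected with the divergence generator $\varphi := \widetilde{c}\cdot\varphi_{\gamma}(\cdot)$ via the representability (5) (cf. (DIS1)–(DIS6)). Then there holds the "bare-simulation (BS) minimizability"
$$\inf_{Q \in A\cdot\boldsymbol{\Omega}} D_{\widetilde{c}\cdot\varphi_{\gamma}}(Q,P) = H_{\gamma,\breve{c},\breve{A}}\left(-\lim_{n\to\infty}\frac{1}{n}\log\Pi\left[\xi_n^{w\widetilde{W}} \in \boldsymbol{\Omega}\right]\right) \tag{17}$$
for all sets $A\cdot\boldsymbol{\Omega} \subset \widetilde{\mathcal{M}}_{\gamma}$ satisfying the regularity properties (13) in the relative topology. Here, $\widetilde{\mathcal{M}}_{\gamma} := A\cdot\mathbb{S}_{>0}^K$ for $\gamma \in \,]-\infty,0]$, respectively, $\widetilde{\mathcal{M}}_{\gamma} := A\cdot\mathbb{S}^K$ for $\gamma \in \,]0,1]\,\cup\,[2,\infty[$.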
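The auxiliary function (16) is elementary to evaluate; a Python sketch follows, in which `h_aux` and `phi_gamma` are hypothetical names and the constants $M_P = 2$, $\widetilde{c} = 1.5$, $A = 3$ are arbitrary. As a consistency check of our own: if $P/M_P$ lies in the interior of $\boldsymbol{\Omega}$, the probability in (17) does not decay exponentially, so its argument is $z = 0$, and indeed $H_{\gamma,\breve{c},\breve{A}}(0) = \breve{c}\cdot\varphi_{\gamma}(\breve{A}) = D_{\widetilde{c}\cdot\varphi_{\gamma}}(\frac{A}{M_P}\cdot P, P)$, the smallest power divergence between $P$ and any vector of total mass $A$ (by Jensen's inequality). The branches for $\gamma = 0$ and $\gamma = 1$ should moreover be limits of the general branch, which the last two lines probe numerically.

```python
import math


def h_aux(z: float, g: float, c_b: float, a_b: float) -> float:
    """H_{gamma, c_breve, A_breve}(z) from (16); gamma in ]1,2[ is not covered."""
    if g == 0:
        return z - c_b * (1.0 - a_b + math.log(a_b))
    if g == 1:
        if z >= c_b:
            raise ValueError("gamma = 1 requires z < c_breve")
        return c_b * (1.0 - a_b - a_b * (math.log(1.0 - z / c_b) - math.log(a_b)))
    if 1 < g < 2:
        raise ValueError("gamma in ]1,2[ is excluded")
    if g * z >= c_b:
        raise ValueError("requires gamma * z < c_breve")
    return c_b / (g * (g - 1.0)) * (a_b ** g * (1.0 - g * z / c_b) ** (1.0 - g)
                                    - 1.0 - g * (a_b - 1.0))


def phi_gamma(t: float, g: float) -> float:
    """phi_gamma(t) from (8) with c~ = 1, for t > 0."""
    if g == 0:
        return -math.log(t) + t - 1.0
    if g == 1:
        return t * math.log(t) + 1.0 - t
    return (t ** g - g * t + g - 1.0) / (g * (g - 1.0))


# Hypothetical constants: M_P = 2, c~ = 1.5, A = 3, hence c_breve = 3 and A_breve = 1.5
c_b, a_b = 2.0 * 1.5, 3.0 / 2.0
for g in (-1.0, 0.0, 0.5, 1.0, 2.0, 3.0):
    print(g, h_aux(0.0, g, c_b, a_b), c_b * phi_gamma(a_b, g))  # both columns agree
print(h_aux(0.3, 1e-7, c_b, a_b), h_aux(0.3, 0.0, c_b, a_b))      # gamma -> 0
print(h_aux(0.3, 1 - 1e-7, c_b, a_b), h_aux(0.3, 1.0, c_b, a_b))  # gamma -> 1
```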
Analogously to (7), Theorem 2 provides our principle for the approximation of the solution of the deterministic optimization problem (14). Indeed, by replacing the involved limit by its finite counterpart, we deduce for given large n
$$\inf_{Q \in A\cdot\boldsymbol{\Omega}} D_{\widetilde{c}\cdot\varphi_{\gamma}}(Q,P) \approx H_{\gamma,\breve{c},\breve{A}}\left(-\frac{1}{n}\log\Pi\left[\xi_n^{w\widetilde{W}} \in \boldsymbol{\Omega}\right]\right);$$
the probability in the latter can be estimated either by a naive estimator of the frequency of those replications of $\xi_n^{w\widetilde{W}}$ which hit $\boldsymbol{\Omega}$, or, more efficiently, by some improved estimator; see [29] for details, where (for the case $A = 1$, with straightforward adaptation to $A \neq 1$) we give concrete construction methods as well as numerous solved cases.
By means of straightforward deterministic transformations, Theorem 2 carries over to the BS optimizability of, e.g., the following important quantities
$$R_{\gamma}(Q,P) := \frac{\log\left(\sum_{k=1}^{K}(q_k)^{\gamma}\cdot(p_k)^{1-\gamma}\right)}{\gamma\cdot(\gamma-1)} = \frac{\log\left((1-\gamma)\cdot M_P+\gamma\cdot A+\gamma\cdot(\gamma-1)\cdot D_{\varphi_{\gamma}}(Q,P)\right)}{\gamma\cdot(\gamma-1)}, \quad \text{if } \gamma\in\,]-\infty,0[\,\cup\,]0,1[\,\cup\,[2,\infty[, \tag{18}$$
(provided that all involved power divergences are finite), which are thus (by monotonicity and continuity) BS-minimizable on $\Omega = A\cdot\boldsymbol{\Omega}$ for all $\gamma \in \,]-\infty,0[\,\cup\,]0,1[\,\cup\,[2,\infty[$:
$$\inf_{Q\in A\cdot\boldsymbol{\Omega}} R_{\gamma}(Q,P) = \lim_{n\to\infty}\frac{\log\left((1-\gamma)\cdot M_P+\gamma\cdot A+\gamma\cdot(\gamma-1)\cdot H_{\gamma,\breve{c},\breve{A}}\left(-\frac{1}{n}\log\Pi\left[\xi_n^{w\widetilde{W}}\in\boldsymbol{\Omega}\right]\right)\right)}{\gamma\cdot(\gamma-1)} \tag{19}$$
for all sets $A\cdot\boldsymbol{\Omega} \subset \widetilde{\mathcal{M}}_{\gamma}$ satisfying the regularity properties (13) in the relative topology; here, the simulation distribution $\Pi[\widetilde{W}\in\cdot\,] = \widetilde{\zeta}[\,\cdot\,]$ (cf. (5)) is given by (DIS1), (DIS3), (DIS5) and (DIS6), respectively. The special subcase $A = 1$, $M_P = 1$ in (18) (and thus, $Q, P$ are probability vectors $\mathbb{Q}, \mathbb{P}$) corresponds to the prominent Rényi divergences/distances [42] (in the scaling of, e.g., [4] and in probability vector form); see, e.g., [43] for a comprehensive study of their properties. Notice that $R_{\gamma}(Q,P)$ may become strictly negative, e.g., in the case that $(1-\gamma)\cdot M_P+\gamma\cdot A \in \,]0,1[$; however, in this case the variant
$$R_{\gamma}^{nn}(Q,P) := R_{\gamma}(Q,P) - \frac{\log\left((1-\gamma)\cdot M_P+\gamma\cdot A\right)}{\gamma\cdot(\gamma-1)} \tag{20}$$
always stays non-negative and leads basically to the "same" optimization
$$\inf_{Q\in A\cdot\boldsymbol{\Omega}} R_{\gamma}^{nn}(Q,P) = \inf_{Q\in A\cdot\boldsymbol{\Omega}} R_{\gamma}(Q,P) - \frac{\log\left((1-\gamma)\cdot M_P+\gamma\cdot A\right)}{\gamma\cdot(\gamma-1)}. \tag{21}$$
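The transformation (18) and the shift (20) can be checked numerically. In the Python sketch below (hypothetical function names, $\widetilde{c} = 1$, strictly positive entries), `renyi_direct` uses the first expression in (18) and `renyi_via_power` the second one, with $A := \sum_k q_k$ and $M_P := \sum_k p_k$. The second example, with $\gamma = 2$, $M_P = 1$ and $A = 0.6$, has $(1-\gamma)\cdot M_P+\gamma\cdot A = 0.2 \in\,]0,1[$ and exhibits a strictly negative $R_{\gamma}$ together with a non-negative $R_{\gamma}^{nn}$.

```python
import math
from typing import Sequence


def power_divergence(q: Sequence[float], p: Sequence[float], g: float) -> float:
    """Closed form (9) with c~ = 1, for gamma not in {0, 1} and positive entries."""
    s = sum(qk ** g * pk ** (1.0 - g) for qk, pk in zip(q, p))
    return s / (g * (g - 1.0)) - sum(q) / (g - 1.0) + sum(p) / g


def renyi_direct(q: Sequence[float], p: Sequence[float], g: float) -> float:
    """First expression in (18)."""
    return math.log(sum(qk ** g * pk ** (1.0 - g) for qk, pk in zip(q, p))) / (g * (g - 1.0))


def renyi_via_power(q: Sequence[float], p: Sequence[float], g: float) -> float:
    """Second expression in (18), with A = sum(q) and M_P = sum(p)."""
    a, m_p = sum(q), sum(p)
    inner = (1.0 - g) * m_p + g * a + g * (g - 1.0) * power_divergence(q, p, g)
    return math.log(inner) / (g * (g - 1.0))


def renyi_nn(q: Sequence[float], p: Sequence[float], g: float) -> float:
    """Shifted variant (20)."""
    a, m_p = sum(q), sum(p)
    return renyi_direct(q, p, g) - math.log((1.0 - g) * m_p + g * a) / (g * (g - 1.0))


Q, P = [0.4, 1.1, 0.7], [0.5, 0.9, 1.2]
for g in (-0.5, 0.5, 2.0, 3.0):
    print(g, renyi_direct(Q, P, g), renyi_via_power(Q, P, g))

Q2, P2 = [0.3, 0.3], [0.5, 0.5]  # A = 0.6, M_P = 1
print(renyi_direct(Q2, P2, 2.0), renyi_nn(Q2, P2, 2.0))  # negative, then non-negative
```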
For the cases $\gamma = 1$ and $\gamma = 0$, important transformations are the modified Kullback–Leibler information (modified relative entropy)
$$I(Q,P) := \sum_{k=1}^{K} q_k\cdot\log\frac{q_k}{p_k} = D_{\varphi_1}(Q,P) + A - M_P$$
and the modified reverse Kullback–Leibler information (modified reverse relative entropy)
$$\widetilde{I}(Q,P) := \sum_{k=1}^{K} p_k\cdot\log\frac{p_k}{q_k} = D_{\varphi_0}(Q,P) + M_P - A;$$
notice that $I(Q,P)$ can become negative if $A < M_P$, and $\widetilde{I}(Q,P)$ can become negative if $A > M_P$ (see [30] for counterexamples). Nevertheless, in general, we obtain immediately from $\gamma = 1$ in Theorem 2 that
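$$\inf_{Q\in A\cdot\boldsymbol{\Omega}} I(Q,P) = A - M_P + H_{1,\breve{c},\breve{A}}\left(-\lim_{n\to\infty}\frac{1}{n}\log\Pi\left[\xi_n^{w\widetilde{W}}\in\boldsymbol{\Omega}\right]\right) \tag{23}$$
for all sets $A\cdot\boldsymbol{\Omega} \subset \widetilde{\mathcal{M}}_1$ satisfying the regularity properties (13) in the relative topology; here, the simulation distribution $\Pi[\widetilde{W}\in\cdot\,] = \widetilde{\zeta}[\,\cdot\,]$ (cf. (5)) is given by (DIS4). Moreover, by employing $\gamma = 0$ in Theorem 2, we deduce
$$\inf_{Q\in A\cdot\boldsymbol{\Omega}} \widetilde{I}(Q,P) = M_P - A + H_{0,\breve{c},\breve{A}}\left(-\lim_{n\to\infty}\frac{1}{n}\log\Pi\left[\xi_n^{w\widetilde{W}}\in\boldsymbol{\Omega}\right]\right) \tag{24}$$
for all sets $A\cdot\boldsymbol{\Omega} \subset \widetilde{\mathcal{M}}_0$ satisfying the regularity properties (13) in the relative topology; here, the simulation distribution $\Pi[\widetilde{W}\in\cdot\,] = \widetilde{\zeta}[\,\cdot\,]$ (cf. (5)) is given by (DIS2).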
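The two identities that express $I$ and $\widetilde{I}$ through $D_{\varphi_1}$ and $D_{\varphi_0}$, and the possible negativity of $I$ and $\widetilde{I}$ (while the divergences stay non-negative), can be reproduced with a few lines of Python; the function names and example vectors are hypothetical, with $A := \sum_k q_k$ and $M_P := \sum_k p_k$.

```python
import math
from typing import Callable, Sequence


def d_phi(q: Sequence[float], p: Sequence[float], phi: Callable[[float], float]) -> float:
    """Definition (1) for strictly positive entries."""
    return sum(pk * phi(qk / pk) for qk, pk in zip(q, p))


def phi_1(t: float) -> float:
    return t * math.log(t) + 1.0 - t


def phi_0(t: float) -> float:
    return -math.log(t) + t - 1.0


def mod_kl(q: Sequence[float], p: Sequence[float]) -> float:
    """I(Q, P) = sum_k q_k * log(q_k / p_k)."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p))


def mod_reverse_kl(q: Sequence[float], p: Sequence[float]) -> float:
    """I~(Q, P) = sum_k p_k * log(p_k / q_k)."""
    return sum(pk * math.log(pk / qk) for qk, pk in zip(q, p))


Q, P = [0.3, 0.9, 0.2], [0.5, 0.4, 0.6]
A, M_P = sum(Q), sum(P)
print(mod_kl(Q, P), d_phi(Q, P, phi_1) + A - M_P)          # I = D_{phi_1} + A - M_P
print(mod_reverse_kl(Q, P), d_phi(Q, P, phi_0) + M_P - A)  # I~ = D_{phi_0} + M_P - A

P2 = [0.5, 0.5]                        # M_P = 1
print(mod_kl([0.1, 0.1], P2))          # negative, since A = 0.2 < M_P
print(mod_reverse_kl([1.0, 1.0], P2))  # negative, since A = 2 > M_P
```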
Remark 2.
By taking the special case $P := \mathbb{P}^{unif} := (\frac{1}{K},\ldots,\frac{1}{K})$ to be the probability vector of frequencies of the uniform distribution on $\{1,\ldots,K\}$ (and thus $M_P = 1$) in the above formulas (16) to (24), we can deduce many bare-simulation solutions of constrained optimization of various different versions of entropies $\mathbb{Q} \mapsto \mathcal{E}(\mathbb{Q})$; for details, see Sections VIII and XII in [29].
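From (19), (21), (23) and (24), we deduce the approximations (for large $n \in \mathbb{N}$)
$$\inf_{Q\in A\cdot\boldsymbol{\Omega}} R_{\gamma}(Q,P) \approx \frac{\log\left((1-\gamma)\cdot M_P+\gamma\cdot A+\gamma\cdot(\gamma-1)\cdot H_{\gamma,\breve{c},\breve{A}}\left(-\frac{1}{n}\log\Pi\left[\xi_n^{w\widetilde{W}}\in\boldsymbol{\Omega}\right]\right)\right)}{\gamma\cdot(\gamma-1)}, \tag{25}$$
$$\inf_{Q\in A\cdot\boldsymbol{\Omega}} R_{\gamma}^{nn}(Q,P) \approx \frac{\log\left((1-\gamma)\cdot M_P+\gamma\cdot A+\gamma\cdot(\gamma-1)\cdot H_{\gamma,\breve{c},\breve{A}}\left(-\frac{1}{n}\log\Pi\left[\xi_n^{w\widetilde{W}}\in\boldsymbol{\Omega}\right]\right)\right)}{\gamma\cdot(\gamma-1)} - \frac{\log\left((1-\gamma)\cdot M_P+\gamma\cdot A\right)}{\gamma\cdot(\gamma-1)}, \tag{26}$$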
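$$\inf_{Q\in A\cdot\boldsymbol{\Omega}} I(Q,P) \approx A - M_P + H_{1,\breve{c},\breve{A}}\left(-\frac{1}{n}\log\Pi\left[\xi_n^{w\widetilde{W}}\in\boldsymbol{\Omega}\right]\right), \tag{27}$$
$$\inf_{Q\in A\cdot\boldsymbol{\Omega}} \widetilde{I}(Q,P) \approx M_P - A + H_{0,\breve{c},\breve{A}}\left(-\frac{1}{n}\log\Pi\left[\xi_n^{w\widetilde{W}}\in\boldsymbol{\Omega}\right]\right), \tag{28}$$
where, for the involved probability $\Pi[\xi_n^{w\widetilde{W}} \in \boldsymbol{\Omega}]$, one can again employ either a naive estimator of the frequency of those replications of $\xi_n^{w\widetilde{W}}$ which hit $\boldsymbol{\Omega}$, or an improved estimator; see [29] for details.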

4. Minimization Problems with Fuzzy Sets

Our above-mentioned BS framework can be applied to fuzzy sets (cf. [19]), which describe imprecise/inexact/vague information, and to optimization problems on divergences between those. Indeed, let $\mathcal{Y} = \{d_1,\ldots,d_K\}$ be a finite set (called the universe (of discourse)), $C \subset \mathcal{Y}$, and $M_C: \mathcal{Y} \to [0,1]$ be a corresponding membership function, where $M_C(d_k)$ represents the degree/grade of membership of the element $d_k$ to the set $C$; accordingly, the object $C^{*} := \{\langle x, M_C(x)\rangle \,|\, x \in \mathcal{Y}\}$ is called a fuzzy set in $\mathcal{Y}$ (or fuzzy subset of $\mathcal{Y}$). Moreover, if $A \subset \mathcal{Y}$ and $B \subset \mathcal{Y}$ are two unequal sets, then the corresponding membership functions $M_A$ and $M_B$ should be unequal. Furthermore, we model the vector of membership degrees to $C$ by $P^C := (p_k^C)_{k=1,\ldots,K} := (M_C(d_k))_{k=1,\ldots,K}$, which satisfies the key constraint $0 \leq p_k^C \leq 1$ for all $k \in \{1,\ldots,K\}$ and, consequently, the aggregated key constraint $0 \leq \sum_{k=1}^{K} p_k^C \leq K$ (as a side remark, $\sum_{k=1}^{K} M_C(d_k)$ is called the power of the fuzzy set $C^{*}$). For divergence generators $\varphi$ in $\widetilde{\Upsilon}(]a,b[)$ with (say) $0 \leq a < 1 < b$ and for two sets $A, B \subset \mathcal{Y}$, we can apply (1) to the corresponding membership functions and define the generalized $\varphi$–divergence $D_{\varphi}(B^{*},A^{*})$ between the fuzzy sets $B^{*}$ and $A^{*}$ (on the same universe $\mathcal{Y}$) as (cf. [44])
$$D_{\varphi}(B^{*},A^{*}) := D_{\varphi}(P^B,P^A) = \sum_{k=1}^{K} p_k^A\cdot\varphi\!\left(\frac{p_k^B}{p_k^A}\right) = \sum_{k=1}^{K} M_A(d_k)\cdot\varphi\!\left(\frac{M_B(d_k)}{M_A(d_k)}\right) \geq 0 \tag{29}$$
(depending on $\varphi$, zero degree values may have to be excluded for finiteness). For instance, we can take $\varphi(t) := \varphi_1(t) := t\cdot\log t + 1 - t \in [0,\infty[$ for $t \in [0,\infty[$ (cf. (8)), or a multiple $\widetilde{c}\cdot\varphi_1$ thereof, to end up with a generalized Kullback–Leibler divergence (generalized relative entropy) between $B^{*}$ and $A^{*}$ given by (cf. (9))
$$D_{\widetilde{c}\cdot\varphi_1}(B^{*},A^{*}) = D_{\widetilde{c}\cdot\varphi_1}(P^B,P^A) = \widetilde{c}\cdot\left(\sum_{k=1}^{K} M_B(d_k)\cdot\log\frac{M_B(d_k)}{M_A(d_k)} - \sum_{k=1}^{K} M_B(d_k) + \sum_{k=1}^{K} M_A(d_k)\right) \geq 0, \quad \text{if } M_A(d_k) > 0 \text{ and } M_B(d_k) \geq 0 \text{ for all } k = 1,\ldots,K;$$
this contrasts with the choice $\varphi(t) := \breve{\varphi}(t) := t\cdot\log t \in [-\frac{1}{e},\infty[$ of [22], which yields
$$D_{\breve{\varphi}}(B^{*},A^{*}) = I(P^B,P^A) = \sum_{k=1}^{K} M_B(d_k)\cdot\log\frac{M_B(d_k)}{M_A(d_k)}, \quad \text{if } M_A(d_k) > 0 \text{ and } M_B(d_k) \geq 0 \text{ for all } k = 1,\ldots,K,$$
which they call fuzzy expected information for discrimination in favor of $B$ against $A$, and which may become negative (cf. the discussion after (22), and [30] in a more general context). Returning to the general case (29), as a special case of the above-mentioned BS concepts, we can tackle optimization problems of the type
$$\inf_{B^{*}\in\Omega^{*}} D_{\varphi}(B^{*},A^{*}) := \inf_{P^B\in\Omega} D_{\varphi}(P^B,P^A),$$
where $\Omega^{*}$ is a collection of fuzzy sets (on the same universe $\mathcal{Y}$) whose membership degree vectors form the set $\Omega$ satisfying (2) and (4). Because of the inequality-type key constraint
$$0 \leq M_B(d_k) \leq 1 \quad \text{for all } k \in \{1,\ldots,K\},$$
which is incorporated into $\Omega$, and which implies that $\Omega$ is contained in the $K$–dimensional unit hypercube and, in particular, $0 \leq \sum_{k=1}^{K} p_k^B \leq K$, Theorem 1 (and thus, (7)) applies correspondingly (e.g., to the generalized power divergences $D_{\widetilde{c}\cdot\varphi_{\gamma}}(B^{*},A^{*}) = D_{\widetilde{c}\cdot\varphi_{\gamma}}(P^B,P^A)$ given by (9) and the generalized Jensen–Shannon divergence $D_{\varphi_{snKL,\widetilde{c}}}(B^{*},A^{*}) = D_{\varphi_{snKL,\widetilde{c}}}(P^B,P^A)$ given by (11)), unless there is a more restrictive constraint that violates (2), such as, e.g., $\sum_{k=1}^{K} p_k^B = c$ with $c \leq K$, for which Theorem 2 (and hence, (17)) as well as its consequences (19), (21), (23), (24) (and thus, (25)–(28)) can be employed.
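For fuzzy sets, (29) only requires the two membership vectors. The following Python sketch (hypothetical names; $\widetilde{c} = 1$; membership degrees of $A^{*}$ assumed strictly positive) evaluates the generalized Kullback–Leibler divergence $D_{\varphi_1}(B^{*},A^{*})$ and the fuzzy expected information $D_{\breve{\varphi}}(B^{*},A^{*})$ of [22], and shows on a three-element universe that only the latter can become negative.

```python
import math
from typing import Sequence


def check_memberships(m: Sequence[float]) -> None:
    """Key constraint 0 <= M_C(d_k) <= 1 for all k."""
    if not all(0.0 <= x <= 1.0 for x in m):
        raise ValueError("membership degrees must lie in [0, 1]")


def fuzzy_gen_kl(mb: Sequence[float], ma: Sequence[float]) -> float:
    """D_{phi_1}(B*, A*), i.e. (29) with phi = phi_1; requires M_A(d_k) > 0."""
    check_memberships(mb)
    check_memberships(ma)
    s = sum(b * math.log(b / a) for b, a in zip(mb, ma) if b > 0)  # 0 * log(0) := 0
    return s - sum(mb) + sum(ma)


def fuzzy_expected_information(mb: Sequence[float], ma: Sequence[float]) -> float:
    """D_{phi_breve}(B*, A*) with phi_breve(t) = t * log(t); requires M_A(d_k) > 0."""
    check_memberships(mb)
    check_memberships(ma)
    return sum(b * math.log(b / a) for b, a in zip(mb, ma) if b > 0)


M_A = [0.9, 0.9, 0.9]  # universe {d_1, d_2, d_3}
M_B = [0.1, 0.1, 0.1]
print(fuzzy_gen_kl(M_B, M_A))                # non-negative
print(fuzzy_expected_information(M_B, M_A))  # negative
```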
The above-mentioned considerations can be extended to the recent concept of $\nu$–rung orthopair fuzzy sets (cf. [28]) and divergences between those. Indeed, for $C \subset \mathcal{Y}$, besides a membership function $M_C: \mathcal{Y} \to [0,1]$, one additionally models a nonmembership function $N_C: \mathcal{Y} \to [0,1]$, where $N_C(d_k)$ represents the degree/grade of nonmembership of the element $d_k$ to the set $C$. Moreover, if $A \subset \mathcal{Y}$ and $B \subset \mathcal{Y}$ are unequal sets, then the corresponding nonmembership functions $N_A$ and $N_B$ should be unequal. For fixed $\nu \in [1,\infty[$, the key constraint
$$0 \leq M_C(d_k)^{\nu} + N_C(d_k)^{\nu} \leq 1 \quad \text{for all } k \in \{1,\ldots,K\} \tag{30}$$
is required to be satisfied, too. Accordingly, the object $C^{**} := \{\langle x, M_C(x), N_C(x)\rangle \,|\, x \in \mathcal{Y}\}$ is called a $\nu$–rung orthopair fuzzy set in $\mathcal{Y}$ (or … subset of $\mathcal{Y}$). The object $C^{**}$ is called an intuitionistic fuzzy set in $\mathcal{Y}$ (cf. [45]) in the case $\nu = 1$, and a Pythagorean fuzzy set in $\mathcal{Y}$ (cf. [46,47]) in the case $\nu = 2$. For the choice $\nu = 1$ together with $N_C(x) := 1 - M_C(x)$, the object $C^{**}$ can be regarded as an extended representation of the fuzzy set $C^{*}$ in $\mathcal{Y}$.
For any $\nu$–rung orthopair fuzzy set $C^{**}$ in $\mathcal{Y}$, we model the corresponding vector of concatenated membership and nonmembership degrees to $C$ by $P^C := (p_k^C)_{k=1,\ldots,2K} := (M_C(d_1),\ldots,M_C(d_K), N_C(d_1),\ldots,N_C(d_K))$, which, due to (30), satisfies the aggregated key constraint
$$0 \leq \sum_{k=1}^{2K} (p_k^C)^{\nu} \leq K; \tag{31}$$
in other words, $P^C$ lies (within the $2K$–dimensional Euclidean space) in the intersection of the first/positive orthant with the $\nu$–norm ball centered at the origin and with radius $K^{1/\nu}$. Analogously to (29), we can define the generalized $\varphi$–divergence $D_{\varphi}(B^{**},A^{**})$ between the $\nu$–rung orthopair fuzzy sets $B^{**}$ and $A^{**}$ (on the same universe $\mathcal{Y}$) as (cf. [44])
$$D_{\varphi}(B^{**},A^{**}) := D_{\varphi}(P^B,P^A) = \sum_{k=1}^{2K} p_k^A\cdot\varphi\!\left(\frac{p_k^B}{p_k^A}\right) = \sum_{k=1}^{K}\left(M_A(d_k)\cdot\varphi\!\left(\frac{M_B(d_k)}{M_A(d_k)}\right) + N_A(d_k)\cdot\varphi\!\left(\frac{N_B(d_k)}{N_A(d_k)}\right)\right) \geq 0, \tag{32}$$
respectively, as its variant (cf. [44])
$$D_{\varphi}^{var}(B^{**},A^{**}) := D_{\varphi}\big((P^B)^{\nu},(P^A)^{\nu}\big) := \sum_{k=1}^{2K} (p_k^A)^{\nu}\cdot\varphi\!\left(\frac{(p_k^B)^{\nu}}{(p_k^A)^{\nu}}\right) = \sum_{k=1}^{K}\left(M_A(d_k)^{\nu}\cdot\varphi\!\left(\frac{M_B(d_k)^{\nu}}{M_A(d_k)^{\nu}}\right) + N_A(d_k)^{\nu}\cdot\varphi\!\left(\frac{N_B(d_k)^{\nu}}{N_A(d_k)^{\nu}}\right)\right) \geq 0. \tag{33}$$
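Definitions (30) to (33) translate directly into code. The Python sketch below (hypothetical names; $\varphi = \varphi_1$; all degrees of $A^{**}$ assumed strictly positive) checks the key constraint (30), builds the concatenated vectors $P^A, P^B$ and evaluates both $D_{\varphi}(B^{**},A^{**})$ from (32) and the variant $D_{\varphi}^{var}(B^{**},A^{**})$ from (33). The example degrees form Pythagorean ($\nu = 2$) fuzzy sets that are not intuitionistic ($\nu = 1$).

```python
import math
from typing import Callable, List, Sequence


def phi_1(t: float) -> float:
    return 1.0 if t == 0 else t * math.log(t) + 1.0 - t


def is_orthopair(m: Sequence[float], nm: Sequence[float], nu: float) -> bool:
    """Key constraint (30), together with 0 <= M_C(d_k), N_C(d_k) <= 1."""
    return all(0 <= a <= 1 and 0 <= b <= 1 and a ** nu + b ** nu <= 1 for a, b in zip(m, nm))


def concat(m: Sequence[float], nm: Sequence[float]) -> List[float]:
    """P^C = (M_C(d_1), ..., M_C(d_K), N_C(d_1), ..., N_C(d_K))."""
    return list(m) + list(nm)


def d_phi(q: Sequence[float], p: Sequence[float], phi: Callable[[float], float]) -> float:
    return sum(pk * phi(qk / pk) for qk, pk in zip(q, p))


def d_phi_var(q: Sequence[float], p: Sequence[float],
              phi: Callable[[float], float], nu: float) -> float:
    """Variant (33): apply (1) to the componentwise nu-th powers."""
    return d_phi([x ** nu for x in q], [x ** nu for x in p], phi)


M_A, N_A = [0.8, 0.6], [0.5, 0.7]
M_B, N_B = [0.7, 0.3], [0.6, 0.9]
nu = 2.0
print(is_orthopair(M_A, N_A, nu), is_orthopair(M_A, N_A, 1.0))  # True False
PA, PB = concat(M_A, N_A), concat(M_B, N_B)
print(sum(x ** nu for x in PA) <= len(M_A))                      # aggregated constraint (31)
print(d_phi(PB, PA, phi_1), d_phi_var(PB, PA, phi_1, nu))
```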
For the special choice φ ( t ) : = φ s n K L , 1 ( t ) (cf. (10)) and ν = 1 , the definition (32) leads to the generalized Jensen–Shannon divergence between B * * and A * * given by
D φ s n K L , 1 ( B * * , A * * ) : = D φ s n K L , 1 ( P B , P A ) = k = 1 K { M B ( d k ) · log 2 M B ( d k ) M B ( d k ) + M A ( d k ) + M A ( d k ) · log 2 M A ( d k ) M B ( d k ) + M A ( d k ) + N B ( d k ) · log 2 N B ( d k ) N B ( d k ) + N A ( d k ) + N A ( d k ) · log 2 N A ( d k ) N B ( d k ) + N A ( d k ) } 0 , if M A ( d k ) > 0 , N A ( d k ) > 0 , M B ( d k ) 0 , N B ( d k ) 0 , for all k { 1 , , K } ;
this coincides with the symmetric information measure between $B^{**}$ and $A^{**}$ of [23]. For the special choice $\varphi(t) := \varphi_1(t) := t\cdot\log t + 1 - t \in [0,\infty[$ with $t\in[0,\infty[$ (cf. (8)), $\nu = 1$, $N_A(x) := 1 - M_A(x)$, $N_B(x) := 1 - M_B(x)$, and thus (31) turning into $\sum_{k=1}^{2K}(p_k^{A})^{\nu} = \sum_{k=1}^{2K}(p_k^{B})^{\nu} = K$, one can straightforwardly see that the resulting generalized Kullback–Leibler divergence (generalized relative entropy) between $B^{**}$ and $A^{**}$, given by
$$0 \le D_{\varphi_1}(B^{**},A^{**}) = D_{\varphi_1}(P^{B},P^{A}) = \sum_{k=1}^{K} M_B(d_k)\cdot\log\frac{M_B(d_k)}{M_A(d_k)} - \sum_{k=1}^{K} M_B(d_k) + \sum_{k=1}^{K} M_A(d_k) + \sum_{k=1}^{K} N_B(d_k)\cdot\log\frac{N_B(d_k)}{N_A(d_k)} - \sum_{k=1}^{K} N_B(d_k) + \sum_{k=1}^{K} N_A(d_k) = \sum_{k=1}^{K} M_B(d_k)\cdot\log\frac{M_B(d_k)}{M_A(d_k)} + \sum_{k=1}^{K} N_B(d_k)\cdot\log\frac{N_B(d_k)}{N_A(d_k)}, \quad \text{if } M_A(d_k)>0,\ N_A(d_k)>0 \text{ and } M_B(d_k)\ge 0,\ N_B(d_k)\ge 0 \text{ for all } k = 1,\ldots,K,$$
coincides with $D_{\breve{\varphi}}(B^{**},A^{**})$, where $\breve{\varphi}(t) := t\cdot\log t$; the latter divergence was used, e.g., in [22] under the name average fuzzy information for discrimination in favor of B against A.
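Both reductions above (the Jensen–Shannon form for $\nu = 1$, and the collapse of $D_{\varphi_1}$ to $D_{\breve{\varphi}}$ when $N(x) = 1 - M(x)$) can be checked numerically. This sketch assumes that $\varphi_{snKL,1}(t) = t\log\frac{2t}{1+t} + \log\frac{2}{1+t}$; this is our reading of (10), chosen because its perspective $q\,\varphi(p/q)$ reproduces the summands of the displayed Jensen–Shannon formula. The sample membership degrees are hypothetical.

```python
import math

def phi_div(q, p, phi):
    """D_phi(Q, P) = sum_k p_k * phi(q_k / p_k); assumes every p_k > 0."""
    return sum(pk * phi(qk / pk) for qk, pk in zip(q, p))

def phi_1(t):
    """Kullback-Leibler type generator t*log(t) + 1 - t, with 0*log(0) := 0."""
    return (t * math.log(t) if t > 0 else 0.0) + 1.0 - t

def phi_breve(t):
    """Generator t*log(t), with 0*log(0) := 0."""
    return t * math.log(t) if t > 0 else 0.0

def phi_snkl1(t):
    """Assumed Jensen-Shannon generator t*log(2t/(1+t)) + log(2/(1+t))."""
    first = t * math.log(2.0 * t / (1.0 + t)) if t > 0 else 0.0
    return first + math.log(2.0 / (1.0 + t))

def xlog(x, y):
    """x*log(x/y), with 0*log(0/y) := 0."""
    return x * math.log(x / y) if x > 0 else 0.0

# Hypothetical classical fuzzy sets (nu = 1, N = 1 - M) on K = 3 elements.
m_a = [0.2, 0.7, 0.5]
m_b = [0.4, 0.7, 0.1]
p_a = m_a + [1.0 - x for x in m_a]
p_b = m_b + [1.0 - x for x in m_b]

js_phi = phi_div(p_b, p_a, phi_snkl1)
js_explicit = sum(xlog(b, (a + b) / 2.0) + xlog(a, (a + b) / 2.0)
                  for b, a in zip(p_b, p_a))
kl_phi1 = phi_div(p_b, p_a, phi_1)
kl_breve = phi_div(p_b, p_a, phi_breve)
kl_explicit = sum(xlog(b, a) for b, a in zip(p_b, p_a))
```

The terms $-\sum p_k^{B} + \sum p_k^{A}$ cancel in `kl_phi1` because both concatenated vectors sum to $K = 3$, which is why it agrees with `kl_breve`.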
Returning to the general context, in terms of the divergences (32) and (33) we can tackle, as a special case of the above-mentioned BS concepts, optimization problems of the type
$$\inf_{B^{**}\in\Omega^{**}} D_{\varphi}(B^{**},A^{**}) := \inf_{P^{B}\in\Omega} D_{\varphi}(P^{B},P^{A}) \quad\text{respectively}\quad \inf_{B^{**}\in\Omega^{**}} D_{\varphi}^{var}(B^{**},A^{**}) := \inf_{P^{B}\in\Omega} D_{\varphi}\big((P^{B})^{\nu},(P^{A})^{\nu}\big),$$
where $\Omega^{**}$ is a collection of $\nu$–rung orthopair fuzzy sets whose concatenated membership–nonmembership degree vectors form the set $\Omega$ satisfying (2) and (4) as well as (30) for $C := B$. Because of the latter, Theorem 1 (and thus, (7)) applies correspondingly (e.g., to the generalized power divergences $D_{\tilde{c}\cdot\varphi_{\gamma}}(B^{**},A^{**}) = D_{\tilde{c}\cdot\varphi_{\gamma}}(P^{B},P^{A})$ given by (9) and the generalized Jensen–Shannon divergence $D_{\varphi_{snKL,\tilde{c}}}(B^{**},A^{**}) = D_{\varphi_{snKL,\tilde{c}}}(P^{B},P^{A})$ given by (11)), unless there is a more restrictive constraint that violates (2), such as, e.g., $\sum_{k=1}^{2K} p_k^{B} = c$ with $c \le K$, in which case Theorem 2 (and thus, (17)) as well as its consequences (19), (21), (23), (24) (and thus, (25)–(28)) can be employed; such a situation appears, e.g., in the case $\nu = 1$ together with $N_B(x) := 1 - M_B(x)$, which leads to $c = K$.
For the $\nu$–rung orthopair fuzzy sets $C^{**}$ in $\mathcal{Y}$, we can also further “flexibilize” our divergences by additionally incorporating the hesitancy degree of the element $d_k$ to $C$, which is defined as
$$H_C(d_k) := \big(1 - M_C(d_k)^{\nu} - N_C(d_k)^{\nu}\big)^{1/\nu} \in [0,1]$$
(cf. [28]), which implies the key constraint
$$H_C(d_k)^{\nu} + M_C(d_k)^{\nu} + N_C(d_k)^{\nu} = 1 \quad\text{for all } k\in\{1,\ldots,K\}. \tag{34}$$
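The hesitancy degree and the key constraint (34) translate directly into code. The following sketch (with hypothetical helper names and degrees) computes $H_C(d_k)$, rejects degree pairs that are not $\nu$–rung orthopair for the chosen $\nu$, and checks both (34) and the aggregated constraint $\sum_{k=1}^{3K}(p_k^{C})^{\nu} = K$ that follows from it.

```python
import math
from typing import List, Sequence

def hesitancy(m: float, n: float, nu: float, tol: float = 1e-12) -> float:
    """H_C(d_k) = (1 - M_C(d_k)^nu - N_C(d_k)^nu)^(1/nu), a value in [0, 1]."""
    s = 1.0 - m ** nu - n ** nu
    if s < -tol:
        raise ValueError("(M, N) violates M^nu + N^nu <= 1 for this nu")
    return max(s, 0.0) ** (1.0 / nu)

def extended_vector(memb: Sequence[float], nonmemb: Sequence[float],
                    nu: float) -> List[float]:
    """P^C = (M_C(d_1..d_K), N_C(d_1..d_K), H_C(d_1..d_K)), of length 3K."""
    hes = [hesitancy(m, n, nu) for m, n in zip(memb, nonmemb)]
    return list(memb) + list(nonmemb) + hes

# Hypothetical degrees: the pair (0.6, 0.7) is admissible for nu = 3, not for nu = 1.
M_C, N_C = [0.6, 0.2, 0.9], [0.7, 0.5, 0.1]
NU = 3
P_C = extended_vector(M_C, N_C, NU)
```

The same pair $(0.6, 0.7)$ fails for $\nu = 1$ because $0.6 + 0.7 > 1$, which illustrates why larger $\nu$ admits more degree pairs.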
Accordingly, the object $C^{***} := \{\langle x, M_C(x), N_C(x), H_C(x)\rangle \mid x\in\mathcal{Y}\}$ can be regarded as an extended representation of the $\nu$–rung orthopair fuzzy set $C^{**}$ in $\mathcal{Y}$. For $C^{***}$, we model the corresponding vector of concatenated membership, nonmembership and hesitancy degrees to $C$ by
$$P^{C} := (p_k^{C})_{k=1,\ldots,3K} := \big(M_C(d_1),\ldots,M_C(d_K),N_C(d_1),\ldots,N_C(d_K),H_C(d_1),\ldots,H_C(d_K)\big),$$
which, due to (34), satisfies the aggregated key constraint
$$\sum_{k=1}^{3K}\big(p_k^{C}\big)^{\nu} = K;$$
in other words, $P^{C}$ lies (within the $3K$-dimensional Euclidean space) in the intersection of the first/positive orthant with the $\nu$-norm sphere centered at the origin with radius $K^{1/\nu}$. Analogously to (32) and (33), we can define the generalized φ–divergence $D_{\varphi}(B^{***},A^{***})$ between the extended representation type $\nu$–rung orthopair fuzzy sets $B^{***}$ and $A^{***}$ (on the same universe $\mathcal{Y}$) as (cf. [44])
$$D_{\varphi}(B^{***},A^{***}) := D_{\varphi}\big((P^{B})^{\nu},(P^{A})^{\nu}\big) := \sum_{k=1}^{3K}(p_k^{A})^{\nu}\cdot\varphi\Big(\frac{(p_k^{B})^{\nu}}{(p_k^{A})^{\nu}}\Big) = \sum_{k=1}^{K}\Big\{M_A(d_k)^{\nu}\cdot\varphi\Big(\frac{M_B(d_k)^{\nu}}{M_A(d_k)^{\nu}}\Big) + N_A(d_k)^{\nu}\cdot\varphi\Big(\frac{N_B(d_k)^{\nu}}{N_A(d_k)^{\nu}}\Big) + H_A(d_k)^{\nu}\cdot\varphi\Big(\frac{H_B(d_k)^{\nu}}{H_A(d_k)^{\nu}}\Big)\Big\} \ge 0; \tag{35}$$
for technical reasons, we do not deal with its variant
$$D_{\varphi}^{var}(B^{***},A^{***}) := D_{\varphi}(P^{B},P^{A}) = \sum_{k=1}^{3K} p_k^{A}\cdot\varphi\Big(\frac{p_k^{B}}{p_k^{A}}\Big) = \sum_{k=1}^{K}\Big[M_A(d_k)\cdot\varphi\Big(\frac{M_B(d_k)}{M_A(d_k)}\Big) + N_A(d_k)\cdot\varphi\Big(\frac{N_B(d_k)}{N_A(d_k)}\Big) + H_A(d_k)\cdot\varphi\Big(\frac{H_B(d_k)}{H_A(d_k)}\Big)\Big] \ge 0.$$
For instance, by taking the special choice $\nu = 2$ and $\varphi(t) := \varphi_{snKL,1}(t)$ (cf. (10)) in (35), we arrive at the Jensen–Shannon divergence between $B^{***}$ and $A^{***}$ of the form $D_{\varphi_{snKL,1}}(B^{***},A^{***}) := D_{\varphi_{snKL,1}}\big((P^{B})^{2},(P^{A})^{2}\big)$, which, by virtue of (35) and (11), coincides with the (squared) Pythagorean fuzzy set Jensen–Shannon divergence measure between $B^{***}$ and $A^{***}$ of [24].
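A minimal sketch of (35) for the Pythagorean case $\nu = 2$ follows. It assumes our reading $\varphi_{snKL,1}(t) = t\log\frac{2t}{1+t} + \log\frac{2}{1+t}$ of (10) and strictly positive squared degrees of $A^{***}$; the degree values and helper names are hypothetical. It also illustrates that this particular Jensen–Shannon instance is symmetric in its two arguments, a property of this generator rather than of φ–divergences in general.

```python
import math

def hesitancy(m, n, nu):
    """H(d_k) = (1 - M^nu - N^nu)^(1/nu); assumes M^nu + N^nu <= 1."""
    return max(1.0 - m ** nu - n ** nu, 0.0) ** (1.0 / nu)

def extended_powered(memb, nonmemb, nu):
    """Componentwise nu-th powers (P^C)^nu of P^C = (M, N, H), length 3K."""
    hes = [hesitancy(m, n, nu) for m, n in zip(memb, nonmemb)]
    return [x ** nu for x in list(memb) + list(nonmemb) + hes]

def phi_snkl1(t):
    """Assumed Jensen-Shannon generator t*log(2t/(1+t)) + log(2/(1+t))."""
    first = t * math.log(2.0 * t / (1.0 + t)) if t > 0 else 0.0
    return first + math.log(2.0 / (1.0 + t))

def d_phi_extended(mb, nb, ma, na, phi, nu):
    """Divergence (35): D_phi((P^B)^nu, (P^A)^nu); assumes every (p_k^A)^nu > 0."""
    pb = extended_powered(mb, nb, nu)
    pa = extended_powered(ma, na, nu)
    return sum(a * phi(b / a) for b, a in zip(pb, pa))

# Hypothetical Pythagorean fuzzy sets (nu = 2) on K = 2 elements.
M_A, N_A = [0.5, 0.3], [0.6, 0.8]
M_B, N_B = [0.7, 0.2], [0.4, 0.9]
js_ab = d_phi_extended(M_B, N_B, M_A, N_A, phi_snkl1, 2)
js_ba = d_phi_extended(M_A, N_A, M_B, N_B, phi_snkl1, 2)
```

For $\nu = 2$, each triple $(M^2, N^2, H^2)$ sums to one, so both squared vectors have total mass $K = 2$.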
To continue with the general context, as a particular application of the above-mentioned BS concepts, we can tackle the general optimization problems of the generalized power divergence type
$$\inf_{B^{***}\in\Omega^{***}} D_{\tilde{c}\cdot\varphi_{\gamma}}(B^{***},A^{***}) := \inf_{P^{B}\in\Omega} D_{\tilde{c}\cdot\varphi_{\gamma}}\big((P^{B})^{\nu},(P^{A})^{\nu}\big), \qquad \gamma\in\mathbb{R}\setminus\,]1,2[,\ \tilde{c}\in\,]0,\infty[, \tag{36}$$
where $\Omega^{***}$ is a collection of extended representation type $\nu$–rung orthopair fuzzy sets whose concatenated membership–nonmembership–hesitancy degree vectors form the set $\Omega$, satisfying (34) for $C := B$ (for each member) as well as the regularity properties (13) in the relative topology. Thus, for the minimization of (36) we can apply our Theorem 2 (and, consequently, (17)) by choosing there (with a slight abuse of notation) $A = M_{P} = K$. Of course, we can also apply our BS optimization method to the corresponding Renyi divergences $R_{\gamma}(B^{***},A^{***}) := R_{\gamma}\big((P^{B})^{\nu},(P^{A})^{\nu}\big)$ (via (19)), $R_{\gamma}^{nn}(B^{***},A^{***}) := R_{\gamma}^{nn}\big((P^{B})^{\nu},(P^{A})^{\nu}\big)$ (via (19) and (21)) as well as $I(B^{***},A^{***}) := I\big((P^{B})^{\nu},(P^{A})^{\nu}\big)$ (via (23)), $\tilde{I}(B^{***},A^{***}) := \tilde{I}\big((P^{B})^{\nu},(P^{A})^{\nu}\big)$ (via (24)), and employ the correspondingly applied approximations (25)–(28). For instance, by applying (20) with $A = M_{P} = K$ (with a slight abuse of notation), we arrive at the non-negative γ–order Renyi divergence between extended representation type $\nu$–rung orthopair fuzzy sets given by
$$0 \le R_{\gamma}^{nn}(B^{***},A^{***}) := R_{\gamma}^{nn}\big((P^{B})^{\nu},(P^{A})^{\nu}\big) := \frac{1}{\gamma\cdot(\gamma-1)}\cdot\Big[\log\Big(\sum_{k=1}^{3K}\big((p_k^{B})^{\nu}\big)^{\gamma}\cdot\big((p_k^{A})^{\nu}\big)^{1-\gamma}\Big) - \log(K)\Big] = \frac{1}{\gamma\cdot(\gamma-1)}\cdot\Big[\log\Big(\sum_{k=1}^{K}\Big\{\big(M_B(d_k)^{\nu}\big)^{\gamma}\cdot\big(M_A(d_k)^{\nu}\big)^{1-\gamma} + \big(N_B(d_k)^{\nu}\big)^{\gamma}\cdot\big(N_A(d_k)^{\nu}\big)^{1-\gamma} + \big(H_B(d_k)^{\nu}\big)^{\gamma}\cdot\big(H_A(d_k)^{\nu}\big)^{1-\gamma}\Big\}\Big) - \log(K)\Big], \qquad \gamma\in\,]-\infty,0[\,\cup\,]0,1[\,\cup\,[2,\infty[; \tag{37}$$
depending on $\gamma$, zero degree values may have to be excluded for finiteness. As a side remark, let us mention that our divergence (37) contrasts with the recent (first) divergence of [48], who basically uses a different scaling, the product $\prod_{k=1}^{K}$ instead of the sum $\sum_{k=1}^{K}$, as well as $\frac{(p_k^{A})^{\nu}+(p_k^{B})^{\nu}}{2}$ instead of $(p_k^{A})^{\nu}$. By appropriately applying (19) and (21), we can tackle, with our BS method and for $\gamma\in\,]-\infty,0[\,\cup\,]0,1[\,\cup\,[2,\infty[$, the minimization problem $\inf_{B^{***}\in\Omega^{***}} R_{\gamma}^{nn}(B^{***},A^{***})$, where $\Omega^{***}$ is a collection of extended representation type $\nu$–rung orthopair fuzzy sets $B^{***}$ whose concatenated membership–nonmembership–hesitancy degree vectors form the set $\Omega$ satisfying (34) for $C := B$ (for each member) as well as the regularity properties (13) in the relative topology.
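The non-negativity in (37) rests on $(P^{B})^{\nu}$ and $(P^{A})^{\nu}$ both having total mass $K$ (by (34)): Jensen's inequality applied to $t\mapsto t^{\gamma}$ with weights $(p_k^{A})^{\nu}/K$ bounds the inner sum by $K$ from above for $\gamma\in\,]0,1[$ (concave case) and from below otherwise (convex case), which matches the sign of $\gamma(\gamma-1)$. A small numerical sketch of (37) follows, with hypothetical degrees and helper names, and with all degree values strictly positive to avoid the finiteness issues just mentioned.

```python
import math

def hesitancy(m, n, nu):
    """H(d_k) = (1 - M^nu - N^nu)^(1/nu); assumes M^nu + N^nu <= 1."""
    return max(1.0 - m ** nu - n ** nu, 0.0) ** (1.0 / nu)

def extended_powered(memb, nonmemb, nu):
    """(P^C)^nu = (M^nu, N^nu, H^nu), a vector of length 3K with total mass K."""
    hes = [hesitancy(m, n, nu) for m, n in zip(memb, nonmemb)]
    return [x ** nu for x in list(memb) + list(nonmemb) + hes]

def renyi_nn(pb_nu, pa_nu, gamma, k):
    """Non-negative gamma-order Renyi divergence (37) for vectors of total mass k."""
    if gamma == 0.0 or gamma == 1.0 or 1.0 < gamma < 2.0:
        raise ValueError("gamma must lie in ]-inf,0[ U ]0,1[ U [2,inf[")
    s = sum(b ** gamma * a ** (1.0 - gamma) for b, a in zip(pb_nu, pa_nu))
    return (math.log(s) - math.log(k)) / (gamma * (gamma - 1.0))

# Hypothetical 3-rung orthopair fuzzy sets on K = 2 elements, all degrees positive.
NU, K = 3, 2
PB = extended_powered([0.8, 0.3], [0.5, 0.6], NU)
PA = extended_powered([0.6, 0.5], [0.7, 0.4], NU)
values = {g: renyi_nn(PB, PA, g, K) for g in (-1.0, 0.5, 2.0, 3.0)}
```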
We can also apply our BS optimization method to “crossover cases” $D_{\varphi}(B^{*},P) := D_{\varphi}(P^{B},P)$ and $D_{\varphi}(P,A^{*}) := D_{\varphi}(P,P^{A})$ (instead of (29)), $D_{\varphi}(B^{**},P) := D_{\varphi}(P^{B},P)$ and $D_{\varphi}(P,A^{**}) := D_{\varphi}(P,P^{A})$ (instead of (32)), $D_{\varphi}^{var}(B^{**},P) := D_{\varphi}\big((P^{B})^{\nu},P\big)$ and $D_{\varphi}^{var}(P,A^{**}) := D_{\varphi}\big(P,(P^{A})^{\nu}\big)$ (instead of (33)), $D_{\varphi}(B^{***},P) := D_{\varphi}\big((P^{B})^{\nu},P\big)$ and $D_{\varphi}(P,A^{***}) := D_{\varphi}\big(P,(P^{A})^{\nu}\big)$ (instead of (35)), $R_{\gamma}(B^{***},P) := R_{\gamma}\big((P^{B})^{\nu},P\big)$ and $R_{\gamma}(P,A^{***}) := R_{\gamma}\big(P,(P^{A})^{\nu}\big)$, $R_{\gamma}^{nn}(B^{***},P) := R_{\gamma}^{nn}\big((P^{B})^{\nu},P\big)$ and $R_{\gamma}^{nn}(P,A^{***}) := R_{\gamma}^{nn}\big(P,(P^{A})^{\nu}\big)$, $I(B^{***},P) := I\big((P^{B})^{\nu},P\big)$ and $I(P,A^{***}) := I\big(P,(P^{A})^{\nu}\big)$, $\tilde{I}(B^{***},P) := \tilde{I}\big((P^{B})^{\nu},P\big)$ and $\tilde{I}(P,A^{***}) := \tilde{I}\big(P,(P^{A})^{\nu}\big)$, $D_{\tilde{c}\cdot\varphi_{\gamma}}(B^{***},P) := D_{\tilde{c}\cdot\varphi_{\gamma}}\big((P^{B})^{\nu},P\big)$ and $D_{\tilde{c}\cdot\varphi_{\gamma}}(P,A^{***}) := D_{\tilde{c}\cdot\varphi_{\gamma}}\big(P,(P^{A})^{\nu}\big)$, where $P\in\mathbb{R}^{dim}$ (respectively, $\mathbb{R}_{\ge 0}^{dim}$, $\mathbb{R}_{>0}^{dim}$, $A\cdot\mathbb{S}_{\ge 0}^{dim}$ or $A\cdot\mathbb{S}_{>0}^{dim}$) is a general vector (not necessarily induced by a fuzzy set) having the same dimension $dim$ (namely, $K$, $2K$ or $3K$) as the fuzzy-set-induced vector to be compared with. For instance, if we apply Remark 2 to $P := unif := (\frac{1}{3K},\ldots,\frac{1}{3K})$ and $Q := (P^{B})^{\nu}$ and employ the corresponding straightforward application of the general results of Sections VIII and XII of [29], then we end up (via the appropriately applied Theorem 2) with the BS optimization results of $D_{\tilde{c}\cdot\varphi_{\gamma}}(B^{***},unif)$, $R_{\gamma}(B^{***},unif)$, $I(B^{***},unif)$ and $\tilde{I}(B^{***},unif)$, which can be deterministically transformed into the BS optimization results of various different versions of entropies $E(B^{***}) := E\big((P^{B})^{\nu}\big)$ of $\nu$–rung orthopair fuzzy sets $B^{***}$, where $E(\cdot)$ is any entropy in Section VIII of [29].
As a final remark in this section, let us mention that we can carry over the above-mentioned definitions and optimizations to (classical, intuitionistic, Pythagorean and $\nu$–rung orthopair) L–fuzzy sets, where the range of the membership functions, nonmembership functions and hesitancy functions is an appropriately chosen lattice L (rather than $L = [0,1]$); for the sake of brevity, the details are omitted here.

5. Minimization Problems with Basic Belief Assignments

Our BS framework also covers basic belief assignments from Dempster–Shafer evidence theory (cf. [20,21]), which describe imprecise/inexact/vague information, as well as optimization problems on the divergences between those. Indeed, let $\mathcal{Y} = \{d_1,\ldots,d_K\}$ be a finite set (called the frame of discernment) of mutually exclusive and collectively exhaustive events $d_k$. The corresponding power set of $\mathcal{Y}$ is denoted by $2^{\mathcal{Y}}$ and has $2^K$ elements; we enumerate it as $2^{\mathcal{Y}} := \{A_1,\ldots,A_{2^K}\}$, where for convenience we set $A_1 := \emptyset$. A mapping $M: 2^{\mathcal{Y}}\to[0,1]$ is called a basic belief assignment (BBA) (sometimes alternatively called basic probability assignment (BPA)) if it satisfies the two conditions
$$M(\emptyset) = 0 \quad\text{and}\quad \sum_{A\in 2^{\mathcal{Y}}} M(A) = 1. \tag{38}$$
Here, the belief mass $M(A)$ reflects, e.g., the trust degree of evidence to proposition $A\in 2^{\mathcal{Y}}$. From this, one can build the belief function $Bel: 2^{\mathcal{Y}}\to[0,1]$ by $Bel(A) := \sum_{B:\, B\subseteq A} M(B)$ and the plausibility function $Pl: 2^{\mathcal{Y}}\to[0,1]$ by $Pl(A) := \sum_{B:\, B\cap A\neq\emptyset} M(B)$. Moreover, we model the $2^K$-dimensional vector of ($M$-based) BBA values (vector of ($M$-based) belief masses) by $P^{M} := (p_k^{M})_{k=1,\ldots,2^K} := (M(A_k))_{k=1,\ldots,2^K}$, which satisfies the key constraint $0\le p_k^{M}\le 1$ for all $k\in\{1,\ldots,2^K\}$ and, by virtue of (38), the aggregated key constraint $\sum_{k=1}^{2^K} p_k^{M} = 1$. Hence, $P^{M}$ lies formally in the $2^K$-dimensional simplex $\mathbb{S}^{2^K}$ (but generally not in the corresponding probability-vector-describing simplex $\mathbb{S}^{K}$).
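To make the objects of this paragraph concrete, here is a small Python sketch. It assumes that subsets of the frame are represented as frozensets and that the power set is enumerated by increasing cardinality, so that $A_1 = \emptyset$; the helper names and the example BBA are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable, List

Subset = FrozenSet[Hashable]

def power_set(frame: Iterable[Hashable]) -> List[Subset]:
    """Enumerate 2^Y as A_1, ..., A_{2^K}, with A_1 = the empty set."""
    items = list(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_bba(m: Dict[Subset, float], tol: float = 1e-12) -> bool:
    """Conditions (38): M(empty set) = 0, values in [0, 1], total mass 1."""
    return (abs(m.get(frozenset(), 0.0)) <= tol
            and all(0.0 <= v <= 1.0 for v in m.values())
            and abs(sum(m.values()) - 1.0) <= tol)

def bel(m: Dict[Subset, float], a: Subset) -> float:
    """Bel(A) = sum of M(B) over all subsets B of A."""
    return sum(v for b, v in m.items() if b <= a)

def pl(m: Dict[Subset, float], a: Subset) -> float:
    """Pl(A) = sum of M(B) over all B having nonempty intersection with A."""
    return sum(v for b, v in m.items() if b & a)

def bba_vector(m: Dict[Subset, float], frame: Iterable[Hashable]) -> List[float]:
    """P^M = (M(A_1), ..., M(A_{2^K})); subsets not listed in m get mass 0."""
    return [m.get(a, 0.0) for a in power_set(frame)]

# Hypothetical BBA on the frame {a, b, c} (K = 3, so P^M has 2^3 = 8 entries).
FRAME = ["a", "b", "c"]
M = {frozenset("a"): 0.5, frozenset("ab"): 0.3, frozenset("abc"): 0.2}
P_M = bba_vector(M, FRAME)
```

For this BBA, $Bel(\{a,b\}) = 0.5 + 0.3$ and $Pl(\{b\}) = 0.3 + 0.2$, and $Bel(A) \le Pl(A)$ holds for every nonempty $A$.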
For divergence generators $\varphi$ in $\widetilde{\Upsilon}(]a,b[)$ with (say) $0\le a<1<b$ and for two BBAs $M_1$, $M_2$ on the same frame of discernment $\mathcal{Y}$, we can apply (1) to the corresponding vectors of BBA values and define the generalized φ–divergence $D_{\varphi}(M_2,M_1)$ between the BBAs $M_2$ and $M_1$ (in short, Belief–φ–divergence) as
$$D_{\varphi}(M_2,M_1) := D_{\varphi}(P^{M_2},P^{M_1}) = \sum_{k=1}^{2^K} p_k^{M_1}\cdot\varphi\Big(\frac{p_k^{M_2}}{p_k^{M_1}}\Big) = \sum_{k=1}^{2^K} M_1(A_k)\cdot\varphi\Big(\frac{M_2(A_k)}{M_1(A_k)}\Big) \ge 0. \tag{39}$$
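A minimal sketch of (39), operating on vectors $P^{M}$ indexed as $A_1 = \emptyset, A_2, \ldots$, follows. It assumes the convention $0\cdot\varphi(0/0) := 0$ for entries where both masses vanish (such as $A_1 = \emptyset$) and rejects entries with $M_1(A_k) = 0 < M_2(A_k)$, in line with the finiteness caveat for (39). It further assumes the power-divergence generator $\varphi_{\gamma}(t) = \frac{t^{\gamma}-\gamma t+\gamma-1}{\gamma(\gamma-1)}$ as our reading of (8). With $\gamma = 1/2$ this gives $\varphi_{1/2}(t) = 2(\sqrt{t}-1)^2$, so (39) becomes $2\sum_k\big(\sqrt{p_k^{M_2}}-\sqrt{p_k^{M_1}}\big)^2$, i.e., 4 times the squared Hellinger distance when the latter is normalized with the factor $1/2$. The BBA vectors are hypothetical.

```python
import math

def phi_gamma(gamma):
    """Assumed power-divergence generator (t^g - g*t + g - 1) / (g*(g - 1)), g not in {0, 1}."""
    def phi(t):
        return (t ** gamma - gamma * t + gamma - 1.0) / (gamma * (gamma - 1.0))
    return phi

def belief_phi_divergence(p2, p1, phi):
    """D_phi(M_2, M_1) of (39) for BBA vectors p2 = P^{M_2} and p1 = P^{M_1}."""
    total = 0.0
    for q, p in zip(p2, p1):
        if p == 0.0 and q == 0.0:
            continue  # convention 0 * phi(0/0) := 0, e.g. for A_1 = empty set
        if p == 0.0:
            raise ValueError("M_1(A_k) = 0 < M_2(A_k): excluded for finiteness")
        total += p * phi(q / p)
    return total

# Hypothetical BBA vectors over 2^3 = 8 subsets (first entry: the empty set).
P_M1 = [0.0, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.1]
P_M2 = [0.0, 0.3, 0.0, 0.1, 0.1, 0.2, 0.1, 0.2]
d_half = belief_phi_divergence(P_M2, P_M1, phi_gamma(0.5))
hellinger_sq = 0.5 * sum((math.sqrt(q) - math.sqrt(p)) ** 2
                         for q, p in zip(P_M2, P_M1))
```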
This definition (39) of the Belief–φ–divergence was first given in our paper [44]; later on, [49] used formally the same definition under the different (but deficient) assumptions that $\varphi$ is only convex and satisfies $\varphi(1) = 0$, which leads to a violation of the basic divergence properties (take, e.g., $\varphi(t) = 0$ for all $t$, which implies $D_{\varphi}(P^{M_2},P^{M_1}) = 0$ even if $P^{M_2}\neq P^{M_1}$; such an effect is excluded in our setup due to our assumption (G4’) being part of $\widetilde{\Upsilon}(]a,b[)$). As a technical remark, let us mention that (as already indicated in Section 2), depending on $\varphi$, zero belief masses may have to be excluded for finiteness in (39). For instance, we can take in (39) the special case $\varphi(t) := \varphi_{snKL,1}(t)$ (cf. (10)) to end up with the Belief-Jensen–Shannon divergence of [25,26], who applies it to multi-sensor data fusion. As another special case, we can take $\varphi(t) := \varphi_{1/2}(t)$ (cf. (8)) to end up with 4 times the squared Hellinger distance of BBAs of [27], who use it for characterizing the degree of conflict between BBAs. To continue with the general context, as a particular application of the above-mentioned BS concepts, we can tackle general optimization problems of the type
$$\inf_{M_2\in\Omega_{BBA}} D_{\tilde{c}\cdot\varphi_{\gamma}}(M_2,M_1) := \inf_{P^{M_2}\in\Omega} D_{\tilde{c}\cdot\varphi_{\gamma}}(P^{M_2},P^{M_1}), \qquad \gamma\in\mathbb{R}\setminus\,]1,2[,\ \tilde{c}\in\,]0,\infty[, \tag{40}$$
where $\Omega_{BBA}$ is a collection of BBAs whose vectors of BBA values form the set $\Omega\subset\mathbb{S}^{2^K}$ satisfying the regularity properties (13) in the relative topology; hence, we can apply our Theorem 2 (and thus, (17)) for the minimization of (40), by taking $2^K$ instead of $K$ as well as (with a slight abuse of notation) $A = M_{P^{M_1}} = 1$. Of course, we can also apply our BS optimization method to the corresponding Renyi divergences $R_{\gamma}(M_2,M_1) := R_{\gamma}(P^{M_2},P^{M_1})$ (via (19)), $R_{\gamma}^{nn}(M_2,M_1) := R_{\gamma}^{nn}(P^{M_2},P^{M_1})$ (via (19) and (21)) as well as $I(M_2,M_1) := I(P^{M_2},P^{M_1})$ (via (23)), $\tilde{I}(M_2,M_1) := \tilde{I}(P^{M_2},P^{M_1})$ (via (24)), and employ the correspondingly applied approximations (25)–(28). For instance, by applying (18) and (19), we arrive at
$$R_{\gamma}(M_2,M_1) := R_{\gamma}(P^{M_2},P^{M_1}) = \frac{\log\sum_{k=1}^{2^K}(M_2(A_k))^{\gamma}\cdot(M_1(A_k))^{1-\gamma}}{\gamma\cdot(\gamma-1)}, \quad\text{if } \gamma\in\,]-\infty,0[\,\cup\,]0,1[\,\cup\,[2,\infty[,$$
and
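$$\inf_{M_2\in\Omega_{BBA}} R_{\gamma}(M_2,M_1) = \inf_{P^{M_2}\in\Omega} R_{\gamma}(P^{M_2},P^{M_1}) = \lim_{n\to\infty}\frac{\log\Big(1+\gamma\cdot(\gamma-1)\cdot H_{\gamma,\breve{c},\breve{A}}\Big(-\frac{1}{n}\log\xi_n^{w\widetilde{W}}[\Omega]\Big)\Big)}{\gamma\cdot(\gamma-1)},$$
where $\xi_n^{w\widetilde{W}}$ is chosen according to Theorem 2 with $2^K$ instead of $K$ as well as $A = M_{P^{M_1}} = 1$.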
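The Renyi formula above can be evaluated directly. In the elementary case where both vectors are probability vectors and $\tilde{c} = 1$ (our simplifying assumption; the general transformation $H_{\gamma,\breve{c},\breve{A}}$ of (19) is not reproduced here), summing $p_k\varphi_{\gamma}(q_k/p_k)$ with the assumed power generator $\varphi_{\gamma}(t) = \frac{t^{\gamma}-\gamma t+\gamma-1}{\gamma(\gamma-1)}$ gives $D_{\varphi_{\gamma}} = \frac{\sum_k q_k^{\gamma}p_k^{1-\gamma}-1}{\gamma(\gamma-1)}$, hence $R_{\gamma} = \frac{\log(1+\gamma(\gamma-1)D_{\varphi_{\gamma}})}{\gamma(\gamma-1)}$. The following sketch checks this numerically on hypothetical BBA vectors, skipping the empty-set entry where both masses are zero.

```python
import math

def renyi_bba(p2, p1, gamma):
    """R_gamma(M_2, M_1) for BBA vectors; entries where both masses are 0 are skipped."""
    if gamma == 0.0 or gamma == 1.0 or 1.0 < gamma < 2.0:
        raise ValueError("gamma must lie in ]-inf,0[ U ]0,1[ U [2,inf[")
    s = sum(q ** gamma * p ** (1.0 - gamma)
            for q, p in zip(p2, p1) if not (q == 0.0 and p == 0.0))
    return math.log(s) / (gamma * (gamma - 1.0))

def power_divergence(p2, p1, gamma):
    """D_{phi_gamma}(P^{M_2}, P^{M_1}) with the assumed power generator and tilde c = 1.

    Assumes P^{M_1} vanishes only where P^{M_2} does (0 * phi(0/0) := 0).
    """
    g = gamma
    return sum(p * ((q / p) ** g - g * (q / p) + g - 1.0) / (g * (g - 1.0))
               for q, p in zip(p2, p1) if p > 0.0)

# Hypothetical BBA vectors over 8 subsets (empty set first), positive elsewhere.
P_M1 = [0.0, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.1]
P_M2 = [0.0, 0.3, 0.05, 0.1, 0.1, 0.2, 0.1, 0.15]
```

Negative $\gamma$ requires $M_2(A_k) > 0$ wherever $M_1(A_k) > 0$, which is why `P_M2` is positive on all nonempty subsets here.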
We can also apply our BS optimization method to the divergences between rescaled BBAs. For instance, let $\breve{M}(A) := \frac{M(A)}{2^{|A|}-1}$ ($A\in 2^{\mathcal{Y}}$) with the convention that $\frac{0}{0} := 0$, and denote the corresponding vector by $P^{\breve{M}} := (p_k^{\breve{M}})_{k=1,\ldots,2^K} := (\breve{M}(A_k))_{k=1,\ldots,2^K}$. Accordingly, we define the generalized φ–divergence $D_{\varphi}(\breve{M}_2,\breve{M}_1)$ between the rescaled BBAs $\breve{M}_2$ and $\breve{M}_1$ (in short, rescaled Belief–φ–divergence) as (cf. [44])
$$D_{\varphi}(\breve{M}_2,\breve{M}_1) := D_{\varphi}(P^{\breve{M}_2},P^{\breve{M}_1}) = \sum_{k=1}^{2^K} p_k^{\breve{M}_1}\cdot\varphi\Big(\frac{p_k^{\breve{M}_2}}{p_k^{\breve{M}_1}}\Big) = \sum_{k=2}^{2^K}\breve{M}_1(A_k)\cdot\varphi\Big(\frac{\breve{M}_2(A_k)}{\breve{M}_1(A_k)}\Big) = \sum_{k=2}^{2^K}\frac{M_1(A_k)}{2^{|A_k|}-1}\cdot\varphi\Big(\frac{M_2(A_k)}{M_1(A_k)}\Big) \ge 0, \tag{41}$$
where for $A_1 := \emptyset$ we have used the convention that $0\cdot\varphi(\frac{0}{0}) := 0$ (depending on $\varphi$, other zero rescaled belief masses may have to be excluded for finiteness). For the corresponding minimization problem
$$\inf_{\breve{M}_2\in\Omega_{resBBA}} D_{\varphi}(\breve{M}_2,\breve{M}_1) := \inf_{P^{\breve{M}_2}\in\Omega} D_{\varphi}(P^{\breve{M}_2},P^{\breve{M}_1}),$$
where $\Omega_{resBBA}$ is a collection of rescaled BBAs whose vectors of rescaled BBA values form the set $\Omega$ satisfying (2) and (4), we can apply Theorem 1 and (7) (with $2^K$ instead of $K$), unless there is a more restrictive constraint that violates (2), such as, e.g., $\sum_{k=1}^{2^K} p_k^{\breve{M}_2} = A$, for which Theorem 2 (and thus, (17)) can be employed.
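A minimal sketch of the rescaling $\breve{M}(A) = M(A)/(2^{|A|}-1)$ and of (41) follows, with hypothetical helper names and example masses. Because $\breve{M}_1(A_k)$ and $\breve{M}_2(A_k)$ are divided by the same factor $2^{|A_k|}-1$, the ratio inside $\varphi$ is unchanged, which is exactly the last equality in (41). The sketch also shows that the rescaled vector typically no longer sums to one, which is why the set $\Omega$ here is not confined to a simplex.

```python
import math
from itertools import combinations

def power_set(frame):
    """Enumerate 2^Y by increasing cardinality, so A_1 is the empty set."""
    items = list(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def rescale(p_m, subsets):
    """breve M(A) = M(A) / (2^|A| - 1), with 0/0 := 0 for A = empty set."""
    return [0.0 if len(a) == 0 else v / (2 ** len(a) - 1)
            for v, a in zip(p_m, subsets)]

def phi_1(t):
    """Example generator t*log(t) + 1 - t, with 0*log(0) := 0."""
    return (t * math.log(t) if t > 0 else 0.0) + 1.0 - t

def phi_div_conv(q_vec, p_vec, phi):
    """sum_k p_k * phi(q_k / p_k) with 0 * phi(0/0) := 0; assumes p_k > 0 whenever q_k > 0."""
    return sum(p * phi(q / p) for q, p in zip(q_vec, p_vec) if p > 0.0)

# Hypothetical BBAs on the frame {a, b, c}, listed in power_set order.
SUBSETS = power_set("abc")
P_M1 = [0.0, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.1]
P_M2 = [0.0, 0.3, 0.0, 0.1, 0.1, 0.2, 0.1, 0.2]
lhs = phi_div_conv(rescale(P_M2, SUBSETS), rescale(P_M1, SUBSETS), phi_1)
rhs = sum(p / (2 ** len(a) - 1) * phi_1(q / p)
          for q, p, a in zip(P_M2, P_M1, SUBSETS) if len(a) > 0)
```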
We can also apply our BS optimization method to “crossover cases” $D_{\varphi}(M_2,P) := D_{\varphi}(P^{M_2},P)$ and $D_{\varphi}(P,M_1) := D_{\varphi}(P,P^{M_1})$ (instead of (39)), $D_{\varphi}(\breve{M}_2,P) := D_{\varphi}(P^{\breve{M}_2},P)$ and $D_{\varphi}(P,\breve{M}_1) := D_{\varphi}(P,P^{\breve{M}_1})$ (instead of (41)), $R_{\gamma}(M_2,P) := R_{\gamma}(P^{M_2},P)$ and $R_{\gamma}(P,M_1) := R_{\gamma}(P,P^{M_1})$, $R_{\gamma}^{nn}(M_2,P) := R_{\gamma}^{nn}(P^{M_2},P)$ and $R_{\gamma}^{nn}(P,M_1) := R_{\gamma}^{nn}(P,P^{M_1})$, $I(M_2,P) := I(P^{M_2},P)$ and $I(P,M_1) := I(P,P^{M_1})$, $\tilde{I}(M_2,P) := \tilde{I}(P^{M_2},P)$ and $\tilde{I}(P,M_1) := \tilde{I}(P,P^{M_1})$, $D_{\tilde{c}\cdot\varphi_{\gamma}}(M_2,P) := D_{\tilde{c}\cdot\varphi_{\gamma}}(P^{M_2},P)$ and $D_{\tilde{c}\cdot\varphi_{\gamma}}(P,M_1) := D_{\tilde{c}\cdot\varphi_{\gamma}}(P,P^{M_1})$, where $P\in\mathbb{R}^{2^K}$ (respectively, $\mathbb{R}_{\ge 0}^{2^K}$, $\mathbb{R}_{>0}^{2^K}$, $A\cdot\mathbb{S}_{\ge 0}^{2^K}$ or $A\cdot\mathbb{S}_{>0}^{2^K}$) is a general vector (not necessarily induced by a (rescaled) BBA) having the same dimension (namely, $2^K$) as the (rescaled) BBA-induced vector to be compared with. For instance, if we apply Remark 2 to $P := unif := (\frac{1}{2^K},\ldots,\frac{1}{2^K})$ and $Q := P^{M_2}$ and employ the corresponding straightforward application of the general results of Sections VIII and XII of [29], then we end up (via the appropriately applied Theorem 2) with the BS optimization results of $D_{\tilde{c}\cdot\varphi_{\gamma}}(M_2,unif)$, $R_{\gamma}(M_2,unif)$, $I(M_2,unif)$ and $\tilde{I}(M_2,unif)$, which can be deterministically transformed into the BS optimization results of various different versions of entropies $E(M_2) := E(P^{M_2})$ of BBAs $M_2$, where $E(\cdot)$ is any entropy in Section VIII of [29].
To give a more explicit example concerning the preceding paragraph, our BS method of Theorem 2 and (17) (with $2^K$ instead of $K$ as well as $A = M_{P^{M_1}} = 1$) applies to the crossover case
$$\inf_{M\in\Omega_{BBA}} D_{\tilde{c}\cdot\varphi_{\gamma}}(M,P) := \inf_{P^{M}\in\Omega} D_{\tilde{c}\cdot\varphi_{\gamma}}(P^{M},P), \qquad \gamma\in\mathbb{R}\setminus\,]1,2[,\ \tilde{c}\in\,]0,\infty[,$$
where $P^{M}$ is a vector of $M$-based BBA values, $\Omega_{BBA}$ is as above, and $P$ is a vector whose components do not necessarily sum to one. For instance, for the special choice $\gamma = 1$, i.e., $\varphi(t) := \varphi_1(t) := t\cdot\log t + 1 - t \in [0,\infty[$ (cf. (8)), $P^{M} := (p_k^{M})_{k=1,\ldots,2^K} := (M(A_k))_{k=1,\ldots,2^K}$, $P := (p_k)_{k=1,\ldots,2^K}$ with $p_k := 2^{|A_k|}-1$ employing the cardinality $|A_k|$ of $A_k$, and the usual convention $0\cdot\log(\frac{0}{0}) := 0$, we end up with (cf. (9))
$$D_{\varphi_1}(M,P) = D_{\varphi_1}(P^{M},P) = \sum_{k=2}^{2^K} M(A_k)\cdot\log\frac{M(A_k)}{2^{|A_k|}-1} - 1 + \sum_{k=2}^{2^K}\big(2^{|A_k|}-1\big) =: -E_{DE}(M) - 1 + \sum_{k=2}^{2^K}\big(2^{|A_k|}-1\big),$$
where $E_{DE}(M) := -\sum_{k=2}^{2^K} M(A_k)\cdot\log\frac{M(A_k)}{2^{|A_k|}-1} \ge 0$ is nothing but (a multiple of) Deng’s entropy of the BBA $M$ (cf. [50]; see also, e.g., [51]). Of course, we can also deal with the optimization of corresponding Renyi divergences $\inf_{M\in\Omega_{BBA}} R_{\gamma}(M,P) = \inf_{P^{M}\in\Omega} R_{\gamma}(P^{M},P)$ by applying (18) and (19). For the “reverse-argument” crossover case $\inf_{M\in\Omega_{BBA}} D_{\varphi}(P,M) := \inf_{P\in\Omega} D_{\varphi}(P,P^{M})$, one can apply Theorem 1 (and thus, (7)), unless there is a more restrictive constraint that violates (2), such as, e.g., $\sum_{k=1}^{2^K} p_k = A$, for which Theorem 2 (and hence, (17)) as well as its consequences (19), (21), (23), (24) (and thus, (25)–(28)) can be employed.
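The identity for $D_{\varphi_1}(M,P)$ can be checked numerically. The sketch below (hypothetical helper names and BBA) uses natural logarithms, so $E_{DE}(M)$ equals Deng's entropy in bits multiplied by $\log 2$, which is the sense of "a multiple of" above. It also uses the elementary count $\sum_{k=2}^{2^K}(2^{|A_k|}-1) = \sum_{j=0}^{K}\binom{K}{j}(2^{j}-1) = 3^{K}-2^{K}$.

```python
import math
from itertools import combinations

def power_set(frame):
    """Enumerate 2^Y by increasing cardinality, so A_1 is the empty set."""
    items = list(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def deng_entropy_nat(m):
    """E_DE(M) = -sum over nonempty A with M(A) > 0 of M(A) * log(M(A) / (2^|A| - 1))."""
    return -sum(v * math.log(v / (2 ** len(a) - 1))
                for a, v in m.items() if len(a) > 0 and v > 0)

def kl_crossover(m, frame):
    """D_phi1(P^M, P) with p_k = 2^|A_k| - 1 and phi_1(t) = t*log(t) + 1 - t."""
    total = 0.0
    for a in power_set(frame):
        q, p = m.get(a, 0.0), 2 ** len(a) - 1
        if p == 0:
            continue  # A_1 = empty set: M(empty) = 0, convention 0 * log(0/0) := 0
        t = q / p
        phi = (t * math.log(t) + 1.0 - t) if q > 0 else 1.0  # phi_1(0) = 1
        total += p * phi
    return total

# Hypothetical BBA on the frame {a, b, c}, so K = 3.
FRAME = "abc"
M = {frozenset("a"): 0.5, frozenset("ab"): 0.3, frozenset("abc"): 0.2}
K = len(FRAME)
d = kl_crossover(M, FRAME)
```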

6. Conclusions

In this paper, we have quantified, in terms of generalized φ–divergences, the dissimilarity between fuzzy sets and some of their variants such as $\nu$–rung orthopair fuzzy sets and extended representation type $\nu$–rung orthopair fuzzy sets; crossover cases of generalized φ–divergences between fuzzy set types and general vectors have been treated, too. Moreover, we have presented how to tackle corresponding constrained minimization problems by appropriately applying our recently developed dimension-free bare (pure) simulation method of [29]. Afterwards, we have carried out an analogous program by defining generalized φ–divergences between (rescaled) basic belief assignments as well as between (rescaled) basic belief assignments and vectors, and by tackling corresponding constrained minimization problems.
As an outlook for future work, it is interesting to study ways of efficiently simulating the corresponding minimizers (in addition to the minima addressed here). This ongoing research will appear in a future paper.

Author Contributions

Both authors have equally contributed to all aspects of this paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors would like to thank the reviewers for their helpful suggestions for improving the readability of the paper. W. Stummer is grateful to the Sorbonne Université Paris for its multiple partial financial support and especially to the LPSM for its multiple great hospitality. M. Broniatowski thanks the FAU Erlangen-Nürnberg very much for its partial financial support and hospitality.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Csiszár, I. Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Publ. Math. Inst. Hungar. Acad. Sci. 1963, A-8, 85–108.
  2. Ali, M.S.; Silvey, D. A general class of coefficients of divergence of one distribution from another. J. Roy. Statist. Soc. B 1966, 28, 131–140.
  3. Morimoto, T. Markov processes and the H-theorem. J. Phys. Soc. Jpn. 1963, 18, 328–331.
  4. Liese, F.; Vajda, I. Convex Statistical Distances; Teubner: Leipzig, Germany, 1987.
  5. Read, T.R.C.; Cressie, N.A.C. Goodness-of-Fit Statistics for Discrete Multivariate Data; Springer: New York, NY, USA, 1988.
  6. Vajda, I. Theory of Statistical Inference and Information; Kluwer: Dordrecht, The Netherlands, 1989.
  7. Csiszár, I.; Shields, P.C. Information Theory and Statistics: A Tutorial; Now Publishers: Hanover, NH, USA, 2004.
  8. Pardo, L. Statistical Inference Based on Divergence Measures; Chapman & Hall/CRC: Boca Raton, FL, USA, 2006.
  9. Liese, F.; Miescke, K.J. Statistical Decision Theory: Estimation, Testing, and Selection; Springer: New York, NY, USA, 2008.
  10. Liese, F.; Vajda, I. On divergences and informations in statistics and information theory. IEEE Trans. Inf. Theory 2006, 52, 4394–4412.
  11. Vajda, I.; van der Meulen, E.C. Goodness-of-fit criteria based on observations quantized by hypothetical and empirical percentiles. In Handbook of Fitting Statistical Distributions with R; Karian, Z.A., Dudewicz, E.J., Eds.; CRC: Heidelberg, Germany, 2010; pp. 917–994.
  12. Reid, M.D.; Williamson, R.C. Information, divergence and risk for binary experiments. J. Mach. Learn. Res. 2011, 12, 731–817.
  13. Basseville, M. Divergence measures for statistical data processing—An annotated bibliography. Signal Process. 2013, 93, 621–633.
  14. Lindsay, B.G. Statistical distances as loss functions in assessing model adequacy. In The Nature of Scientific Evidence; Taper, M.P., Lele, S.R., Eds.; The University of Chicago Press: Chicago, IL, USA, 2004; pp. 439–487.
  15. Lindsay, B.G.; Markatou, M.; Ray, S.; Yang, K.; Chen, S.-C. Quadratic distances on probabilities: A unified foundation. Ann. Statist. 2008, 36, 983–1006.
  16. Markatou, M.; Sofikitou, E. Non-quadratic distances in model assessment. Entropy 2018, 20, 464.
  17. Markatou, M.; Chen, Y. Statistical distances and the construction of evidence functions for model adequacy. Front. Ecol. Evol. 2019, 7, 447.
  18. Broniatowski, M.; Stummer, W. A unifying framework for some directed distances in statistics. In Geometry and Statistics; Nielsen, F., Rao, A.S.R.S., Rao, C.R., Eds.; Handbook of Statistics; Academic Press: Cambridge, MA, USA, 2022; Volume 46, pp. 145–223.
  19. Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353.
  20. Dempster, A.P. Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Stat. 1967, 38, 325–339.
  21. Shafer, G. A Mathematical Theory of Evidence; Princeton University Press: Princeton, NJ, USA, 1976.
  22. Bhandari, D.; Pal, N.R. Some new information measures for fuzzy sets. Inf. Sci. 1993, 67, 209–228.
  23. Vlachos, I.K.; Sergiadis, G.D. Intuitionistic fuzzy information—Applications to pattern recognition. Pattern Recogn. Lett. 2007, 28, 197–206.
  24. Xiao, F.; Ding, W. Divergence measure of Pythagorean fuzzy sets and its application in medical diagnosis. Appl. Soft. Comput. J. 2019, 79, 254–267.
  25. Xiao, F. Multi-sensor data fusion based on the belief divergence measure of evidences and the belief entropy. Inf. Fusion 2019, 46, 23–32.
  26. Xiao, F. A new divergence measure for belief functions in D-S evidence theory for multisensor data fusion. Inf. Sci. 2020, 514, 462–483.
  27. Li, J.; Xie, B.; Jin, Y.; Hu, Z.; Zhou, L. Weighted conflict evidence combination method based on hellinger distance and the belief entropy. IEEE Access 2020, 8, 225507–225521.
  28. Yager, R.R. Generalized orthopair fuzzy sets. IEEE Trans. Fuzzy Syst. 2017, 25, 1222–1230.
  29. Broniatowski, M.; Stummer, W. A precise bare simulation approach to the minimization of some distances. I. Foundations. IEEE Trans. Inf. Theory 2023, 69, 3062–3120.
  30. Stummer, W.; Vajda, I. On divergences of finite measures and their applicability in statistics and information theory. Statistics 2010, 44, 169–187.
  31. Broniatowski, M.; Keziou, A. Minimization of ϕ-divergences on sets of signed measures. Stud. Scient. Math. Hungar. 2006, 43, 403–442.
  32. Broniatowski, M.; Stummer, W. Some universal insights on divergences for statistics, machine learning and artificial intelligence. In Geometric Structures of Information; Nielsen, F., Ed.; Springer Nature: Cham, Switzerland, 2019; pp. 149–211.
  33. Csiszár, I. Sanov property, generalized I-projection and a conditional limit theorem. Ann. Probab. 1984, 12, 768–793.
  34. Broniatowski, M.; Stummer, W. On a cornerstone condition of bare-simulation distance/divergence optimization. In Geometric Science of Information GSI 2023; Nielsen, F., Barbaresco, F., Eds.; Lecture Notes in Computer Science; Springer Nature: Cham, Switzerland, 2023; Volume 14071, pp. 105–116.
  35. Burbea, C.; Rao, C.R. On the convexity of some divergence measures based on entropy functions. IEEE Trans. Inf. Theory 1982, 28, 489–495.
  36. Lin, J. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 1991, 37, 145–151.
  37. Pardo, M.C.; Vajda, I. About distances of discrete distributions satisfying the data processing theorem of information theory. IEEE Trans. Inf. Theory 1997, 43, 1288–1293.
  38. Topsoe, F. Some inequalities for information divergence and related measures of discrimination. IEEE Trans. Inf. Theory 2000, 46, 1602–1609.
  39. Endres, D.M.; Schindelin, J.E. A new metric for probability distributions. IEEE Trans. Inf. Theory 1993, 40, 1858–1860.
  40. Vajda, I. On metric divergences of probability measures. Kybernetika 2009, 45, 885–900.
  41. Sason, I. Tight bounds for symmetric divergence measures and a new inequality relating f-divergences. In Proceedings of the 2015 IEEE Information Theory Workshop (ITW), Jerusalem, Israel, 26 April 2015–1 May 2015; 5p.
  42. Renyi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability; Neyman, J., Ed.; University of California Press: Berkeley, CA, USA, 1961; Volume 1, pp. 547–561.
  43. van Erven, T.; Harremoes, P. Renyi divergence and Kullback–Leibler divergence. IEEE Trans. Inf. Theory 2014, 60, 3797–3820.
  44. Broniatowski, M.; Stummer, W. A precise bare simulation approach to the minimization of some distances. Foundations. arXiv 2021, arXiv:2107.01693v1. Correction in arXiv 2022, arXiv:2107.01693v2.
  45. Atanassov, K.T. Intuitionistic fuzzy sets. Fuzzy Sets Syst. 1986, 20, 87–96.
  46. Yager, R.R. Pythagorean fuzzy subsets. In Proceedings of the 2013 Joint IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), Edmonton, AB, Canada, 24–28 June 2013; pp. 57–61.
  47. Yager, R.R.; Abbasov, A.M. Pythagorean membership grades, complex numbers, and decision making. Int. J. Intell. Syst. 2013, 28, 436–452.
  48. Verma, R. Multiple attribute group decision-making based on order-α divergence and entropy measures under q-rung orthopair fuzzy environment. Int. J. Intell. Syst. 2020, 35, 718–750.
  49. Huang, J.; Song, X.; Xiao, F.; Cao, Z.; Lin, C.-T. Belief f–divergence for EEG complexity evaluation. Inf. Sci. 2023, 643, 119189.
  50. Deng, Y. Deng entropy. Chaos Solitons Fract. 2016, 91, 549–553.
  51. Kang, B.; Deng, Y. The maximum Deng entropy. IEEE Access 2020, 7, 120758–120765.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Broniatowski, M.; Stummer, W. Some Theoretical Foundations of Bare-Simulation Optimization of Some Directed Distances between Fuzzy Sets Respectively Basic Belief Assignments. Entropy 2024, 26, 312. https://doi.org/10.3390/e26040312
