1. Introduction
The representation and quantification of uncertainty are central issues in information theory, artificial intelligence, and decision-making under incomplete or imprecise knowledge. Classical probability theory (PT) provides a rigorous and widely accepted framework for uncertainty modeling; however, it requires precise probability assignments that are often unavailable in real-world applications. This limitation has motivated the development of more general models of uncertainty, commonly referred to as theories based on imprecise probabilities [
1].
Among these frameworks, evidence theory (ET), also known as Dempster–Shafer theory [
2,
3], has been extensively employed to manage uncertainty-based information in practical applications such as medical diagnosis [
4], statistical classification [
5], target identification [
6], and face recognition [
7]. Evidence theory extends PT by introducing the concept of a
basic probability assignment (BPA), which generalizes the notion of a probability distribution. Each BPA induces an associated belief function and a plausibility function, where the belief (respectively, plausibility) of a set represents the minimum (respectively, maximum) degree of support that the available evidence provides for that set.
Quantifying the uncertainty represented by a BPA is a fundamental problem in evidence theory. To this end, numerous uncertainty measures have been proposed, most of which are inspired by Shannon entropy [
8], the standard measure of uncertainty in PT. However, extending Shannon entropy to evidence theory is non-trivial, as ET accounts for additional types of uncertainty not present in classical probability models.
As pointed out by Yager [
9], two different types of uncertainty arise in evidence theory:
conflict, which occurs when information supports disjoint sets, and
non-specificity, which arises when information is assigned to sets with cardinality greater than one. Consequently, any uncertainty measure in ET must be able to jointly capture both conflict and non-specificity in a coherent manner.
Klir and Wierman [
10] analyzed the mathematical properties that uncertainty measures in evidence theory should satisfy; this study was later extended by Abellán and Masegosa [
11], who also introduced behavioral requirements for such measures. Among all proposals to date, the maximum entropy defined on the closed and convex set of probability distributions (credal set) compatible with a BPA [
10] is the only uncertainty measure in evidence theory that satisfies all crucial mathematical properties and behavioral requirements simultaneously.
The Maximum Entropy Principle [
12] is applicable in various domains, being principally related to information theory and applications [
13,
14,
15]. It states that, in the absence of complete information, one should prioritize the most unbiased distribution by maximizing uncertainty, thereby ensuring that no groundless assumptions are introduced into the model. In the fields of evidence theory and uncertainty quantification [
16,
17,
18,
19,
20], this principle and its associated uncertainty measures are essential for constructing belief distributions that reflect incomplete or ambiguous information.
Despite its strong axiomatic foundations, the practical computation of maximum entropy in evidence theory remains challenging. The algorithm proposed in [
21] (also presented in [
19]) involves solving constrained optimization problems whose complexity grows exponentially with the size of the frame of discernment. Consequently, numerous alternative uncertainty measures with lower computational complexity have been proposed in recent years. However, these alternatives often fail to satisfy all the required axiomatic properties and behavioral conditions [
22,
23,
24], contributing to a lack of consensus regarding uncertainty measures in ET. In summary, while maximum entropy exhibits optimal axiomatic behavior, it entails a high computational cost; conversely, alternative measures are computationally efficient but theoretically weaker.
On the other hand, maximum entropy has demonstrated excellent performance in practical applications, particularly in data mining and related fields. For specific classes of belief functions, its computation becomes straightforward and can be executed efficiently, as illustrated in [
15,
25,
26].
An approximation of the maximum entropy of the credal set associated with a BPA was proposed in [
27]. This approach computes the maximum entropy over the credal set consistent with belief intervals for singleton elements, where the lower and upper bounds correspond to belief and plausibility values, respectively. Although this measure satisfies all crucial mathematical properties and behavioral requirements, utilizing belief intervals for singletons instead of the full BPA may lead to information loss. Indeed, the credal set associated with a BPA is always contained within the credal set that is compatible with the corresponding belief intervals [
27], resulting in an uncertainty measure that may overestimate the level of uncertainty represented by the original BPA. However, this method offers the advantage of bypassing the enumeration of all subsets of the frame of discernment. A key question remains regarding whether this computational advantage persists across all possible scenarios.
The aim of this paper is to provide a comparative study of two algorithms for computing maximum entropy in the theory of evidence: the Meyerowitz et al. algorithm [
19,
21], which operates directly on belief functions; and the maximum entropy algorithm based on reachable probability intervals [
18,
27], derived from evidential constraints. The comparison addresses both theoretical aspects and numerical behavior, highlighting the advantages and limitations of each approach under various uncertainty scenarios.
The remainder of this paper is organized as follows.
Section 2 introduces the basic concepts of evidence theory, reviews the main uncertainty measures proposed in this framework, and describes the algorithms for computing maximum entropy from a BPA and from belief intervals for singletons.
Section 3 presents the comparative study via numerical examples and experiments in which millions of different belief functions are randomly generated. Finally, concluding remarks and directions for future work are provided in
Section 4.
2. Background
Let $X$ be a finite set of possible alternatives, also known as the frame of discernment. Let $2^X$ denote the power set of $X$.
2.1. Theory of Evidence
Evidence theory (ET), also known as Dempster–Shafer theory [
2,
3], is based on the concept of a basic probability assignment (BPA). A BPA is a mapping $m: 2^X \to [0,1]$ satisfying $m(\emptyset) = 0$ and $\sum_{A \in 2^X} m(A) = 1$.
If $A \subseteq X$ satisfies $m(A) > 0$, then $A$ is said to be a focal element of $m$.
A given BPA $m$ on $X$ has an associated belief function $Bel$ and a plausibility function $Pl$. These functions are defined as follows:
$$Bel(A) = \sum_{B \subseteq A} m(B), \qquad Pl(A) = \sum_{B \cap A \neq \emptyset} m(B), \qquad \forall A \subseteq X.$$
It should be noted that $Bel(A) \le Pl(A)$ for each $A \subseteq X$. The interval $[Bel(A), Pl(A)]$ is referred to as the belief interval of $A$. Furthermore, $Pl(A) = 1 - Bel(\overline{A})$, where $\overline{A}$ denotes the complement of $A$. Consequently, $Bel$ and $Pl$ are considered dual or conjugate functions. Either function is sufficient to represent uncertainty-based information in ET; for this purpose, $Bel$ is more commonly utilized.
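These definitions translate directly into code. The following C++ sketch is a minimal illustration of the two summations above, not part of the implementation used later in this paper: subsets of the frame are encoded as bitmasks, and the `Bpa` type, helper names, and example mass values are ours.

```cpp
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

// A BPA on X = {x_0, ..., x_{n-1}}: subsets are bitmasks, and only
// focal elements (m(A) > 0) are stored.
struct Bpa {
    int n;                                           // |X|
    std::vector<std::pair<uint32_t, double>> focal;  // (bitmask A, m(A))
};

// Bel(A) = sum of m(B) over all focal B contained in A.
double bel(const Bpa& m, uint32_t a) {
    double s = 0.0;
    for (auto [b, mass] : m.focal)
        if ((b & ~a) == 0) s += mass;   // B is a subset of A
    return s;
}

// Pl(A) = sum of m(B) over all focal B intersecting A.
double pl(const Bpa& m, uint32_t a) {
    double s = 0.0;
    for (auto [b, mass] : m.focal)
        if ((b & a) != 0) s += mass;    // B intersects A
    return s;
}

int main() {
    // Hypothetical BPA on X = {a, b, c}: m({a}) = 0.5, m({a,b}) = 0.3, m(X) = 0.2.
    Bpa m{3, {{0b001, 0.5}, {0b011, 0.3}, {0b111, 0.2}}};
    uint32_t A = 0b011;  // A = {a, b}
    std::cout << bel(m, A) << " " << pl(m, A) << "\n";  // prints 0.8 and 1
}
```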
For a given BPA $m$ on $X$, the set of compatible probability distributions (which corresponds to a closed and convex set, also known as a credal set) is defined as follows:
$$\mathcal{P}(m) = \{\, p \in \mathcal{P}(X) \mid Bel(A) \le p(A)\ \ \forall A \subseteq X \,\}, \qquad p(A) = \sum_{x \in A} p(x),$$
where $\mathcal{P}(X)$ is the set of all probability distributions on $X$.
2.2. Uncertainty Measures in Evidence Theory
Shannon entropy [
8] is the standard uncertainty measure in probability theory. Given a probability distribution $p$ defined on a finite set $X$, Shannon entropy is defined as follows:
$$S(p) = -\sum_{x \in X} p(x) \log_2 p(x).$$
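As a small illustration (in C++, matching the language of the experiments in Section 3; the helper name is ours), Shannon entropy can be evaluated from a probability vector as follows, using the convention $0 \log_2 0 = 0$:

```cpp
#include <cmath>
#include <vector>

// Shannon entropy S(p) = -sum_x p(x) log2 p(x), with 0*log2(0) taken as 0.
double shannonEntropy(const std::vector<double>& p) {
    double s = 0.0;
    for (double v : p)
        if (v > 0.0) s -= v * std::log2(v);
    return s;
}
```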
The type of uncertainty quantified by
S is usually referred to as
conflict, which is the only form of uncertainty present in classical probability theory. Shannon entropy satisfies a well-known set of desirable axiomatic properties [
8,
10].
In possibility theory, uncertainty is commonly quantified by the Hartley measure [
28], defined for a finite non-empty set $A$ as follows:
$$H(A) = \log_2 |A|.$$
This measure captures
non-specificity, the sole type of uncertainty considered in possibility theory.
As noted by Yager [
9], conflict and non-specificity coexist in ET. Conflict arises when information supports mutually exclusive subsets, whereas non-specificity appears when information is assigned to sets with cardinality greater than one. A generalization of the Hartley measure to evidence theory was introduced by Dubois and Prade [
29]:
$$GH(m) = \sum_{A \subseteq X} m(A) \log_2 |A|.$$
$GH$ attains its minimum value (zero) when $m$ is a probability distribution, and its maximum value ($\log_2 |X|$) when $m(X) = 1$. It constitutes an appropriate measure of non-specificity in ET and can be naturally extended to more general uncertainty frameworks [
18].
Numerous attempts have been made to generalize Shannon entropy to evidence theory; however, most proposals fail to satisfy the essential mathematical and behavioral requirements for this framework. A total uncertainty measure capable of jointly capturing conflict and non-specificity was proposed by Harmanec and Klir [
19]. This measure, denoted by $S^*$, is defined as the maximum Shannon entropy over the credal set $\mathcal{P}(m)$ associated with a BPA $m$:
$$S^*(m) = \max_{p \in \mathcal{P}(m)} S(p).$$
To date, this is the only measure that satisfies all necessary mathematical properties and behavioral requirements for uncertainty measures in evidence theory [
19,
26].
Despite its strong foundations, computing $S^*$ is computationally demanding. Algorithms proposed in the literature [
19,
21,
30,
31] involve solving nonlinear optimization problems with exponential complexity. Consequently, several alternative measures with lower computational costs have been proposed.
One well-known alternative is Deng entropy [
22,
32,
33,
34], defined as follows:
$$E_d(m) = -\sum_{A \subseteq X} m(A) \log_2 \frac{m(A)}{2^{|A|} - 1}.$$
In this formulation, the expression can be decomposed into two components: one capturing non-specificity, $\sum_{A} m(A) \log_2 (2^{|A|} - 1)$, and the other quantifying conflict, $-\sum_{A} m(A) \log_2 m(A)$. However, Deng entropy violates several essential mathematical properties and exhibits problematic behavior in various scenarios [
23]. Similarly, Pan [
35] introduced an uncertainty measure based on the plausibility transformation:
$$S_{PT}(m) = -\sum_{x \in X} PT(x) \log_2 PT(x),$$
where $K = \sum_{y \in X} Pl(\{y\})$, and $PT(x) = Pl(\{x\})/K$ is the plausibility transformation value. As shown in [
27],
$S_{PT}$ also fails to satisfy all required mathematical properties.
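Both alternatives are computationally light. The sketch below (reusing the hypothetical `Bpa` type and `pl()` helper from Section 2.1; an illustration, not the measures' reference implementations) computes Deng entropy and the entropy of the plausibility transformation:

```cpp
#include <cmath>
#include <vector>

// Deng entropy: E_d(m) = -sum_A m(A) * log2( m(A) / (2^|A| - 1) ).
double dengEntropy(const Bpa& m) {
    double e = 0.0;
    for (auto [a, mass] : m.focal) {
        int card = __builtin_popcount(a);  // |A|
        e -= mass * std::log2(mass / (std::exp2(card) - 1.0));
    }
    return e;
}

// Shannon entropy of the plausibility transformation PT(x) = Pl({x}) / K.
double ptEntropy(const Bpa& m) {
    std::vector<double> pt(m.n);
    double k = 0.0;
    for (int x = 0; x < m.n; ++x) {
        pt[x] = pl(m, 1u << x);  // Pl of the singleton {x}
        k += pt[x];
    }
    double e = 0.0;
    for (double v : pt)
        if (v > 0.0) e -= (v / k) * std::log2(v / k);
    return e;
}
```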
Let us now consider the set of belief intervals for singleton elements associated with a BPA
m:
$$\mathcal{I}(m) = \{\, [Bel(\{x\}), Pl(\{x\})] \mid x \in X \,\}.$$
Zhao et al. [
36] proposed a measure combining Deng entropy with these intervals. Nevertheless, it does not satisfy all crucial mathematical properties required in ET [
27].
Let $\mathcal{P}_{\mathcal{I}}(m)$ denote the credal set consistent with the belief intervals for singleton elements:
$$\mathcal{P}_{\mathcal{I}}(m) = \{\, p \in \mathcal{P}(X) \mid Bel(\{x\}) \le p(x) \le Pl(\{x\})\ \ \forall x \in X \,\}.$$
In [27], a new uncertainty measure was proposed as the maximum entropy over $\mathcal{P}_{\mathcal{I}}(m)$:
$$S^*_{\mathcal{I}}(m) = \max_{p \in \mathcal{P}_{\mathcal{I}}(m)} S(p).$$
This measure satisfies all essential requirements [
27]. However, since $\mathcal{P}(m) \subseteq \mathcal{P}_{\mathcal{I}}(m)$, representing uncertainty through singleton belief intervals may result in information loss. Consequently, $S^*_{\mathcal{I}}$ may indicate a higher level of uncertainty than the original BPA. Its primary advantage lies in the significant reduction of computational complexity, albeit at the cost of potential information loss.
2.3. Maximum Entropy from a Belief Function
The maximum entropy associated with a belief function is obtained by solving a constrained optimization problem over the credal set induced by a basic probability assignment. An exact procedure for this task was proposed by Meyerowitz et al. [
21] and subsequently by Harmanec and Klir [
19]. The algorithm iteratively constructs the probability distribution that maximizes Shannon entropy while satisfying the evidential constraints. Algorithm 1 calculates the maximum entropy of a BPA
m with associated
belief function $Bel$. The procedure is described as follows:
| Algorithm 1 Algorithm to attain the maximum entropy from a Belief function |
1. $X \leftarrow$ current frame of discernment; $Bel \leftarrow$ associated belief function
2. while $X \neq \emptyset$ and $Bel(X) > 0$ do
3. select a non-empty subset $A \subseteq X$ maximizing $Bel(A)/|A|$; if multiple subsets satisfy the condition, select the one with maximum cardinality
4. for each $x \in A$ do assign probability $Bel(A)/|A|$ to $x$ end for
5. for each $B \subseteq X \setminus A$ do $Bel(B) \leftarrow Bel(B \cup A) - Bel(A)$ end for
6. $X \leftarrow X \setminus A$
7. end while
8. if $X \neq \emptyset$ then for each $x \in X$ do assign probability 0 to $x$ end for end if
The resulting probability distribution maximizes Shannon entropy under the constraints induced by the original belief function. Although exact, this algorithm requires evaluating all subsets of the frame of discernment at each iteration, resulting in exponential computational complexity.
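To make the procedure concrete, the following C++ sketch traces Algorithm 1 over the bitmask representation of the earlier sketches (the `Bpa` type and `bel()` helper). It is our illustration of the published procedure, not the implementation benchmarked in Section 3; the submask enumeration makes the exponential cost visible.

```cpp
#include <cstdint>
#include <vector>

// Distribution maximizing Shannon entropy over the credal set of m,
// following the Meyerowitz et al. procedure. Assumes small n (< 32).
std::vector<double> maxEntropyDistribution(const Bpa& m) {
    int n = m.n;
    uint32_t X = (1u << n) - 1;               // current frame as a bitmask
    std::vector<double> belTab(1u << n);
    for (uint32_t a = 0; a <= X; ++a) belTab[a] = bel(m, a);

    std::vector<double> p(n, 0.0);            // untouched elements keep p = 0
    while (X != 0 && belTab[X] > 0.0) {
        // Select non-empty A within X maximizing Bel(A)/|A|; ties -> largest |A|.
        uint32_t best = 0;
        double bestRatio = -1.0;
        for (uint32_t a = X; a != 0; a = (a - 1) & X) {  // all non-empty subsets of X
            int card = __builtin_popcount(a);
            double r = belTab[a] / card;
            if (r > bestRatio ||
                (r == bestRatio && card > __builtin_popcount(best))) {
                bestRatio = r;
                best = a;
            }
        }
        int cardA = __builtin_popcount(best);
        for (int x = 0; x < n; ++x)           // p(x) = Bel(A)/|A| on A
            if (best & (1u << x)) p[x] = belTab[best] / cardA;

        uint32_t newX = X & ~best;            // X <- X \ A
        // Bel(B) <- Bel(B u A) - Bel(A) for every B in the new frame.
        belTab[0] = 0.0;
        for (uint32_t b = newX; b != 0; b = (b - 1) & newX)
            belTab[b] = belTab[b | best] - belTab[best];
        X = newX;
    }
    return p;
}
```

The maximum entropy itself is then `shannonEntropy(maxEntropyDistribution(m))`, using the helper from Section 2.2.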
2.4. Maximum Entropy from Reachable Probability Intervals
An alternative approach to computing maximum entropy is based on the reachable probability intervals [
1] derived from the belief and plausibility values of singleton elements. Let
$L = \{[l_i, u_i] \mid i = 1, \dots, n\}$ denote the set of such intervals, where $l_i = Bel(\{x_i\})$ and $u_i = Pl(\{x_i\})$.
This algorithm builds upon the work in [
18] to obtain the maximum entropy on the credal set associated with a reachable set of probability intervals. We introduce the following notation:
$\min_S(p)$: the minimum value of the probability distribution $p$ among the components whose indices belong to the set $S$.
$\mathrm{sec}_S(p)$: the second smallest value of the probability distribution $p$ among the components in $S$. If no such value exists, $\mathrm{sec}_S(p) = +\infty$.
$n_{\min}$: the number of indices in $S$ that attain the minimum value of the probability distribution $p$.
$\delta$: the minimum value among the real numbers $\mathrm{sec}_S(p) - \min_S(p)$ and $\left(1 - \sum_{i=1}^{n} p_i\right)/n_{\min}$.
The following procedure (Algorithm 2) yields the probability distribution
$p$ that attains the maximum entropy on $\mathcal{P}_{\mathcal{I}}(m)$:
| Algorithm 2 Algorithm to attain the maximum entropy from reachable probability intervals |
1. for $i = 1$ to $n$ do $p_i \leftarrow l_i$ end for
2. $S \leftarrow \{1, \dots, n\}$
3. while $\sum_{i=1}^{n} p_i < 1$ do
4. for $i \in S$ do if $p_i = u_i$ then $S \leftarrow S \setminus \{i\}$ end if end for
5. compute $\min_S(p)$, $\mathrm{sec}_S(p)$, $n_{\min}$, and $\delta$
6. for $i \in S$ with $p_i = \min_S(p)$ do
7. if $p_i + \delta \le u_i$ then $p_i \leftarrow p_i + \delta$
8. else $p_i \leftarrow u_i$ end if
9. end for
10. end while
Unlike the belief-function-based algorithm, this approach exhibits polynomial computational complexity relative to the number of singleton elements.
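The following C++ sketch reflects our reading of Algorithm 2: it starts at the lower bounds and repeatedly raises the currently minimal components by $\delta$, capping each at its upper bound. It assumes the intervals are reachable (so that $\sum_i l_i \le 1 \le \sum_i u_i$ and the loop terminates) and is not the benchmarked implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Max-entropy distribution on the credal set of reachable intervals [l_i, u_i].
std::vector<double> maxEntropyFromIntervals(const std::vector<double>& l,
                                            const std::vector<double>& u) {
    const std::size_t n = l.size();
    std::vector<double> p = l;            // step 1: start at the lower bounds
    std::vector<bool> inS(n, true);       // step 2: S = {1, ..., n}
    double total = 0.0;
    for (double v : p) total += v;

    const double eps = 1e-12;             // tolerance for float comparisons
    while (total < 1.0 - eps) {
        // Step 4: drop indices already at their upper bound.
        for (std::size_t i = 0; i < n; ++i)
            if (inS[i] && p[i] >= u[i] - eps) inS[i] = false;

        // Step 5: min_S(p), sec_S(p), n_min and delta.
        double mn = std::numeric_limits<double>::infinity();
        for (std::size_t i = 0; i < n; ++i)
            if (inS[i]) mn = std::min(mn, p[i]);
        double sec = std::numeric_limits<double>::infinity();
        std::size_t nMin = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (!inS[i]) continue;
            if (p[i] <= mn + eps) ++nMin;
            else sec = std::min(sec, p[i]);
        }
        if (nMin == 0) break;             // intervals not reachable; give up
        double delta = std::min(sec - mn, (1.0 - total) / nMin);

        // Steps 6-8: raise the minimal components, capping at u_i.
        for (std::size_t i = 0; i < n; ++i) {
            if (!inS[i] || p[i] > mn + eps) continue;
            double np = std::min(p[i] + delta, u[i]);
            total += np - p[i];
            p[i] = np;
        }
    }
    return p;
}
```

For a BPA $m$, one would call it with $l_i = Bel(\{x_i\})$ and $u_i = Pl(\{x_i\})$ and evaluate the Shannon entropy of the returned vector.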
2.5. Discussion
The two algorithms analyzed in this work address the same conceptual objective—computing maximum entropy within the framework of evidence theory—but rely on fundamentally different representations of uncertainty. Meyerowitz et al.’s algorithm operates directly on belief functions and computes the exact maximum entropy associated with the credal set induced by a BPA. In contrast, the interval-based algorithm computes maximum entropy over a larger credal set defined solely by the belief and plausibility values of singleton elements.
From a theoretical perspective, the Meyerowitz et al. algorithm preserves the full informational content of the belief function, providing an exact characterization of total uncertainty. However, this precision entails high computational complexity that grows exponentially with the size of the frame of discernment. Consequently, its applicability is primarily limited to problems of moderate dimensionality.
The interval-based algorithm significantly reduces computational overhead by restricting the optimization problem to singleton constraints. Although this approach may result in information loss (since the credal set induced by singleton intervals contains the one associated with the original belief function), the experimental results demonstrate that the resulting maximum entropy values are often remarkably close or even identical to those obtained by the exact algorithm. This suggests that, in many practical scenarios, the loss of specificity introduced by interval representations has a negligible impact on the total uncertainty measure.
Contrary to claims in previous studies, our analysis indicates that the discrepancy between the two approaches is not as pronounced as might be expected. The interval-based method provides a reliable approximation of maximum entropy while offering substantial computational advantages. These findings underscore the importance of balancing representational accuracy with computational feasibility when selecting algorithms for evidence-based models.
3. Comparison of Algorithms
Our principal aim in this work is to compare the performance of the two maximum entropy algorithms.
We introduce several examples and apply the respective algorithms to them in order to compare them. The objective of this comparison is to examine how both algorithms behave in specific situations within ET. To do this, we apply Meyerowitz et al.'s algorithm, specific to the theory of evidence, alongside the algorithm adapted to reachable intervals, which takes only the singletons into account. We remark that the calculation of the values of $Bel$, $Pl$, and $Bel(A)/|A|$ is based on the expressions in Section 2.
The first two examples are free of conflict, while the following two have conflict, allowing us to see how this situation could affect the effectiveness of the algorithms. To simplify, Algorithm-1 is the algorithm of Meyerowitz et al., and Algorithm-2 is the algorithm based on the reachable probability intervals of the singletons.
Example 1. Let us start with the set , whose mass function m is given by . We see that subsets , and of X are mutually disjoint, so there is no conflict. With this, we will proceed to apply the algorithms presented above. - 1.
Algorithm-1.
We first construct Table 1 with the values of the belief function for each subset and the value of $Bel(A)/|A|$. First iteration.
We observe that the maximum value of $Bel(A)/|A|$ is attained for . With this, we have , and we can modify the belief function accordingly, so that we obtain . Now, we take and , performing a new iteration.
Second iteration.
We start from , so we have the values seen in Table 2:
Table 2.
Values of $Bel(A)$ and $Bel(A)/|A|$ in the second iteration.
| A | Bel(A) | Bel(A)/\|A\| |
|---|---|---|
| | 0.40 | 0.20 |
| | 0.35 | 0.175 |
| | 0.40 | 0.1 |
| | 0.40 | 0.1 |
| | 0.35 | 0.11 |
| | 0.35 | 0.11 |
| | 0.75 | 0.1875 |
In this case, the maximum of $Bel(A)/|A|$ is reached at , so we assign . Moving to the next step of the algorithm, , and so a new iteration begins. Third iteration.
Given that , the only possible non-zero value is that shown in Table 3:
Table 3.
Values of $Bel(A)$ and $Bel(A)/|A|$ in the third iteration.
| A | Bel(A) | Bel(A)/\|A\| |
|---|---|---|
| | 0.35 | 0.175 |
Thus, the maximum of $Bel(A)/|A|$ is . Hence, , and we obtain the new , with , so we can proceed to calculate the maximum entropy.
- 2.
Algorithm-2
To apply this algorithm, we need to calculate the probability intervals of the singletons. Calculating the belief and plausibility functions associated with each element of , we obtain the values shown in Table 4: Therefore, we start with the following set of probability intervals, where is the probability vector for which we will calculate the maximum entropy. First iteration.
To begin, we initialize with , where 1 corresponds to singleton a, 2 to singleton b, etc. We construct the vector with the lower bound values for each : and we see that . We check whether for some , and we see that it holds for , so we remove it from S, obtaining . We calculate the following values: - *
,
- *
,
- *
,
- *
,
hence, using the algorithm, we carry out the assignment , updating for each , so we have .
Second iteration.
We start from , with , so the algorithm ends, and we proceed to calculate the maximum entropy associated with the given distribution:
Example 2. We consider the set with a given mass function m defined by - 1.
Algorithm-1
We have the values of the belief function and $Bel(A)/|A|$ for each subset , as shown in Table 5: First iteration.
We observe that the maximum of $Bel(A)/|A|$ is reached for . Thus, for the elements of this set, we have and . With this, we proceed to update the values of the belief function as follows: We update , verifying that , and apply the algorithm again.
Second iteration.
We start from with the values in Table 6:
Table 6.
Values of $Bel(A)$ and $Bel(A)/|A|$ in the second iteration.
| A | Bel(A) | Bel(A)/\|A\| |
|---|---|---|
| | 0.10 | 0.10 |
| | 0.15 | 0.15 |
| | 0.10 | 0.05 |
| | 0.25 | 0.125 |
| | 0.10 | 0.05 |
| | 0.15 | 0.075 |
| | 0.35 | 0.175 |
| | 0.15 | 0.075 |
| | 0.25 | 0.08 |
| | 0.45 | 0.15 |
| | 0.25 | 0.08 |
| | 0.50 | 0.1 |
| | 0.60 | 0.15 |
In this case, the maximum of $Bel(A)/|A|$ is , a value that corresponds to the set . Thus, and we can update the belief function: So, our new set is , whose associated belief function is ; therefore, we begin another iteration.
Third iteration.
We have the values shown in Table 7:
Table 7.
Values of $Bel(A)$ and $Bel(A)/|A|$ in the third iteration.
| A | Bel(A) | Bel(A)/\|A\| |
|---|---|---|
| | 0.10 | 0.10 |
| | 0.15 | 0.15 |
| | 0.25 | 0.125 |
From these, we can see that $Bel(A)/|A|$ is maximized if , so we assign . With this, we update the value of the function : We now have with , so the algorithm is applied again.
Fourth iteration.
Since we start from the set , we maximize on this same set, so we assign . By updating both the set X and the value of its belief function, we obtain and , at which point the algorithm terminates, and we can proceed to obtain the value of the maximum entropy:
- 2.
Algorithm-2
First, we will transform the data given by the mass function into reachable intervals. The values of the belief function and the plausibility function associated with each element of are as shown in Table 8: Thus, we initialize the table to correctly apply the algorithm: First iteration.
We assign , and vector is given by , which satisfies . We look for indices i such that , with . We have and , so S becomes . Therefore, - *
,
- *
,
- *
,
- *
.
Hence, we apply the assignment , with , . Thus, , and our vector becomes
, before applying the algorithm again.
Second iteration.
We start with , where . The algorithm finishes, and we calculate the associated maximum entropy as follows:
Unlike the two previous examples, we will now study two examples in which conflict appears.
Example 3. Given a set , we define the mass function m as - 1.
Algorithm-1
The non-zero values of the belief function associated with the subsets of are shown in Table 9: First iteration.
We identify that the maximum of $Bel(A)/|A|$ is attained at the set . Since the chosen set coincides with X, we assign probabilities to all its elements: Thus, for each , we have , and the new frame is ∅. Therefore, we can proceed to calculate the value of the maximum entropy associated with this distribution as follows:
- 2.
Algorithm-2
For each element of we have its associated values, expressed in Table 10: We can initialize the following: First iteration.
Let us assign , so for each we have the vector: It is true that , so we proceed to check if holds for some i. It does not hold for any , so . We calculate the following:
- *
,
- *
,
- *
,
- *
, so we move to step 8.
We now assign , where
Second iteration.
The new vector is given by: , verifying . Furthermore, no satisfies , so S is not modified. Thus, - *
,
- *
,
- *
,
- *
.
Hence, we assign
Third iteration.
The vector updated with the values from the previous iteration is as follows: , verifying , and thus terminating the algorithm. We have found that the probability vector of maximum entropy is , and we proceed to obtain the value of the maximum entropy:
Example 4. We define the following mass function m on the set : - 1.
Algorithm-1
The values of $Bel(A)$ and $Bel(A)/|A|$ for each are shown in Table 11: First iteration.
As in the previous example, we observe that the set that maximizes $Bel(A)/|A|$ is , and we thus assign probabilities to all elements in the following form: We then proceed to take the set , with , so we move directly to calculating the maximum entropy:
- 2.
Algorithm-2
Table 12 shows the reachable interval associated with each element of :
First iteration.
We assign , , so that: . Furthermore, no satisfies , so S remains the same. We now obtain the following values:
- *
,
- *
,
- *
,
- *
.
Since , we perform the assignment presented in step 8, and proceed to the next iteration.
Second iteration.
We now start from the vector: , such that . Again, no for ; then - *
;
- *
;
- *
;
- *
, so we go to step 7.
We assign , and the algorithm is applied again.
Third iteration.
Since we have , we can verify that , thus terminating the algorithm. Therefore, the probabilities with maximum entropy for the given intervals are , and we calculate the associated maximum entropy:
Having seen these examples, we can highlight some differences between the algorithms. To do this, we will use
Table 13:
From this first numerical comparison, we can make the following comments.
In the examples where conflict appears, the number of iterations used by Algorithm-1 is lower than the number used by Algorithm-2. Furthermore, the maximum entropy coincides for both algorithms. Therefore, we might conclude that, for cases with conflict, Algorithm-1 is more efficient with respect to the number of iterations needed, whereas, in the absence of conflict, the opposite appears to hold.
In the following subsection, we show what happens when we carry out extensive experimentation with a very large number of different BPAs.
Experimentation and Computational Analysis
To evaluate the computational efficiency of both algorithms, we conducted a series of experiments generating Basic Probability Assignments (BPAs) on frames of discernment with sizes , , and . Both algorithms were implemented in the C++ programming language and executed on a system equipped with an Intel Core i5 1.8 GHz CPU and 8 GB of RAM.
To assess performance across different evidential structures, we randomly generated one million BPAs with conflict (C) and one million without conflict (NC). For conflicting cases (C), all possible subsets could serve as focal elements; values were generated in the range
and subsequently normalized. For the non-conflicting cases (NC), we considered combinations of disjoint sets. For
and
, all possible disjoint combinations were explored, while for
, focal sets were restricted to cardinalities between 2 and 8 to maximize mass distribution. The results are summarized in
Table 14.
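For orientation, a generator along the following lines produces random conflicting BPAs. This is our C++ sketch of the setup described above (reusing the hypothetical `Bpa` type from Section 2); the exact generator used in the experiments may differ.

```cpp
#include <cstdint>
#include <random>

// Random BPA with conflict on a frame of size n: draw k focal subsets
// uniformly from all non-empty subsets, then normalize the masses.
// Duplicate subset draws simply accumulate mass, which is harmless for
// the Bel/Pl summations used here.
Bpa randomBpaWithConflict(int n, int k, std::mt19937& rng) {
    std::uniform_int_distribution<uint32_t> subset(1, (1u << n) - 1);
    std::uniform_real_distribution<double> mass(0.0, 1.0);
    Bpa m{n, {}};
    double total = 0.0;
    for (int i = 0; i < k; ++i) {
        double v = mass(rng);
        m.focal.push_back({subset(rng), v});  // any non-empty subset may be focal
        total += v;
    }
    for (auto& fv : m.focal) fv.second /= total;  // normalize to a valid BPA
    return m;
}
```

The non-conflicting (NC) case additionally requires the drawn focal sets to be pairwise disjoint, which the sketch above does not enforce.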
The experimental results align with the theoretical expectations derived from the numerical examples. In scenarios involving conflict, where BPAs typically possess a larger number of focal elements, Algorithm-1 consistently outperforms Algorithm-2. This is noteworthy because a higher count of focal elements often presents a computational challenge. The relative underperformance of Algorithm-2 in these cases is attributed to the high overhead of calls to auxiliary functions. For , Algorithm-1 shows an improvement of approximately over Algorithm-2, a disparity that becomes even more pronounced at , reaching a performance gain of .
In contrast, in conflict-free scenarios, the performance hierarchy is partially reversed. Although the differences are negligible for , Algorithm-2 shows an improvement of approximately for . In these cases, the number of focal sets is restricted by the disjointness constraint, which benefits the iterative structure of Algorithm-2.
For , where the number of possible focal sets increases exponentially ($2^n - 1$), the behavioral patterns persist. In conflicting scenarios, Algorithm-1’s superiority suggests that its efficiency scales better with the number of elements in the universal set. In contrast, in non-conflict scenarios, both algorithms exhibit similar performance, with Algorithm-2 maintaining a slight advantage (below ). This convergence suggests that, as n increases, the computational burden of calculating the $Bel$ and $Pl$ functions becomes the dominant factor in terms of overall complexity.
Theoretical analysis confirms these observations. Algorithm-1 solves a constrained maximization problem over the full credal set, leading to an exponential time and space complexity of $O(2^n)$. Although Algorithm-2 employs a constructive strategy that appears polynomial in $n$, its execution time remains inherently linked to the number of focal sets. When this number is large, the frequency of calls to auxiliary functions results in an overall exponential complexity comparable to that of Algorithm-1.
In conclusion, our experimental study suggests a pragmatic approach to algorithm selection: Algorithm-2 is preferable for belief structures where conflict is absent or minimal. However, for general applications where conflict is likely—a more common occurrence in real-world data—the classical Algorithm-1 remains the more robust and efficient choice.
4. Conclusions and Future Work
The computational cost associated with the algorithm of Meyerowitz et al. has traditionally been the primary drawback to using maximum entropy as a measure to quantify uncertainty and information within Evidence Theory (ET).
This paper critically analyzes an assertion frequently found in the recent literature [
27]:
“The exact computation of maximum entropy in Evidence Theory, as performed by the algorithm of Meyerowitz et al. (Algorithm-1), is characterized by exponential computational complexity with respect to the size of the frame of discernment. In contrast, the interval-based formulation (Algorithm-2) reduces the problem to a polynomial-time optimization at the cost of a controlled loss of information. Since Algorithm-2 typically yields maximum entropy values close to those obtained with Algorithm-1, it constitutes a preferable alternative in practical applications, particularly for large frames of discernment.”
Our analysis reveals that when evaluating these algorithms, it is essential to distinguish between scenarios where conflict is absent and those where it is present. Theoretically, Algorithm-1 is expected to perform worse in the presence of conflict, as conflict typically increases the number of focal elements. Under such conditions, it has been generally assumed that Algorithm-2 would be more efficient. However, our experimental results demonstrate that this expected behavior does not consistently occur. This discrepancy can be attributed to the extensive number of function calls required by Algorithm-2, which effectively offsets its theoretical computational advantages.
As future work, we intend to investigate alternative procedures for computing maximum entropy in ET that offer superior computational efficiency, even if this entails obtaining approximate rather than exact values, as is currently the case with Algorithm-2.
Overall, the findings of this study challenge the common assumption that transforming a belief function into an equivalent representation based on reachable probability intervals necessarily facilitates the computation of maximum entropy.