A New Hybrid Possibilistic-Probabilistic Decision-Making Scheme for Classification

Basel Solaiman; Didier Guériot; Shaban Almouahed; Bassem Alsahwa; Éloi Bossé

doi:10.3390/e23010067

,

and

Image & Information Processing Department (iTi), IMT-Atlantique, Technopôle Brest Iroise CS 83818, 29238 Brest, France

^*

Author to whom correspondence should be addressed.

Entropy2021, 23(1), 67;https://doi.org/10.3390/e23010067

This article belongs to the Special Issue Information Entropy Algorithms for Image, Video, and Signal Processing

Version Notes

Order Reprints

Abstract

Uncertainty is at the heart of decision-making processes in most real-world applications. Uncertainty can be broadly categorized into two types: aleatory and epistemic. Aleatory uncertainty describes the variability in the physical system where sensors provide information (hard) of a probabilistic type. Epistemic uncertainty appears when the information is incomplete or vague such as judgments or human expert appreciations in linguistic form. Linguistic information (soft) typically introduces a possibilistic type of uncertainty. This paper is concerned with the problem of classification where the available information, concerning the observed features, may be of a probabilistic nature for some features, and of a possibilistic nature for some others. In this configuration, most encountered studies transform one of the two information types into the other form, and then apply either classical Bayesian-based or possibilistic-based decision-making criteria. In this paper, a new hybrid decision-making scheme is proposed for classification when hard and soft information sources are present. A new Possibilistic Maximum Likelihood (PML) criterion is introduced to improve classification rates compared to a classical approach using only information from hard sources. The proposed PML allows to jointly exploit both probabilistic and possibilistic sources within the same probabilistic decision-making framework, without imposing to convert the possibilistic sources into probabilistic ones, and vice versa.

Keywords:

possibility theory; possibilistic decision rule; possibilistic maximum likelihood; pattern classification; uncertainty; Bayesian decision; maximum a posteriori; image processing

1. Introduction

Uncertainty can be categorized into two main kinds [1]: aleatory or randomness uncertainty, aka statistical uncertainty, due to the variability or the natural randomness in a process and epistemic uncertainty, aka systematic uncertainty, which is the scientific uncertainty in the model of the process. It is due to limited data and knowledge. Epistemic uncertainty calls for alternative methods of representation, propagation, and interpretation of uncertainty than just probability. Since the beginning of the 60 s, following fruitful cross-fertilization, a convergence is emerging between physics, engineering, mathematics, and the cognitive sciences to provide new techniques and models that shows a trend of inspiration from human brain mechanism towards a unified theory to represent knowledge, belief and uncertainty [2,3,4,5,6,7,8,9].

Uncertainty is a natural and unavoidable part in real-world applications. When observing a “real-world situation”, decision making is the process of selecting among several alternatives or decisions.

The problem here is to assign a label or a class to measurements or other types of observations (data) from sensors or other sources to which the observations are assumed to belong. This is a typical classification process.

As shown in Figure 1, the general classification process can be formulated as follows. An input set of observations o (

o \in Ψ

) is “observed” using a sensor (or a set of sensors) delivering a feature vector

x \in Θ

(

Θ

is called the features set). This feature vector x is then injected into the decision-making system or labelling system in order to recognize the most likely decision (hypothesis, alternative, class) from a given exhaustive set

Ω = {C_{m} . m = 1, \dots, M}

of M exclusive decisions [10].

Figure 1. Structure of a multisource classification system (Source: [11]).

The development of both classification algorithms and decision-making criteria are governed by several factors mainly depending on the nature of the feature vector, the nature of the imperfection attached to the observed features as well as the available knowledge characterizing each decision. Several global constraints also drive the conception of the global classification process: the “physical” nature and quality of the measures delivered by the sensors, the categories discrimination capacity of the computed features, the nature and the quality of the available knowledge used for the development of the decision-making system.

However, in much of the literature, the decision-making system is performed by the application of two successive functionalities: the soft labeling and the hard decision (selection) functionalities. The labeling functionality [12] uses the available a priori knowledge in order to perform a mapping

ℓ

between the features set

Θ

and the decisions set

Ω (ℓ : Θ ⟶ Ω)

. For each feature vector

x \in Θ

, a soft decision label vector

ℓ (x) = [ℓ_{C_{1}} (x), \dots, ℓ_{C_{m}} (x), \dots, ℓ_{M} (x)] \in {[0, 1]}^{M}

is determined in the light of the available knowledge where

ℓ_{C_{M}} (x)

measures the degree of belief or support, that we have in the occurrence of the decision

C_{m}

. For instance, if the available knowledge allows probabilistic computations, the soft decision label vector is given through

ℓ_{C_{M}} (x) = P r {C_{M} | x}

[13] where

P r {C_{M} | x}

represents the a posteriori probability of the decision

C_{M}

given the observed feature vector

x \in Θ

[14]. When the available knowledge is expressed in terms of ambiguous information, the possibility theory formalism (developed by L. Zadeh [7] and D. Dubois et al. [15,16,17]) can be used. The soft decision label vector

ℓ (x)

is then expressed with an a posteriori possibility distribution

π_{x}

defined on the decisions set

Ω

. In this case,

ℓ_{C_{m}} (x) = π_{x} (C_{m})

where

π_{x} (C_{m})

represents the possibility degree for the decision

C_{m}

to occur, given the observed feature vector

x \in Θ

.

The second functionality performed by the decision-making process is called the hard decision or the selection functionality. As the ultimate goal of most classification applications is to select one and only one class (associated with the observations “o” for which the feature vector

x \in Θ

is extracted) out of the classes set

Ω

, then a mapping has to be applied in order to transform the soft decision label vector

ℓ (x)

into a hard decision label vector for which one and only one decision is selected. The goal is then to make a choice according to an optimality criterion.

In this paper, we propose a new criterion for decision-making process in classification systems called possibilistic maximum likelihood (PML). This criterion is framed within the possibility theory, but it uses corresponding notions from Bayesian decision-making. The main motivation being the development of PML is for multisource information fusion where an object or a pattern may be observed through several channels and where the available information, concerning the observed features, may be of a probabilistic nature for some features, and of an epistemic nature for some others.

In the presence of both types of information sources, most encountered studies transform one of the two information types into the other form, and then apply either the classical Bayesian or possibilistic decision-making criteria. With the PML decision-making approach, the Bayesian decision-making framework is adopted. The epistemic knowledge is integrated into the decision-making process by defining possibilistic loss values instead of the usually used zero-one loss values. A set of possibilistic loss values is proposed and evaluated in the context of pixel-based image classification where a synthetic scene, composed of several thematic classes, is randomly generated using two types of probabilistic sensors: a Gaussian and a Rayleigh sensor, complemented by an expert type of information source. Results obtained with the proposed PML criterion show that the classification recognition rates approach the optimal case, being, when all the available information is expressed in terms of probabilistic knowledge.

When the sources of information can be modelled by probability theory, the Baysesian approach has sufficient decision-making tools to fuse that information and performs classification. However, in the case where the knowledge available for the decision-making process is ill-defined in the sense that it is totally or partially expressed in terms of ambiguous information representing limitations in feature values, or, encoding linguistic expert’s knowledge about the relationship between the feature values and different potential decisions, new mathematical tools (i.e., PML) need to be developed. This type of available knowledge can be represented as a conditional possibilistic soft decision label vector

ℓ (x)

defined on the decisions set

Ω

such that,

ℓ_{C_{m}} (x) = π_{x} (C_{m}) = π (C_{m} | x)

where

π (C_{m} | x)

represents the possibility degree for the decision C_m to occur, given the observed feature vector

x \in Θ

and the underlying observations o.

Possibility theory constitutes the natural framework allowing to tackle this type of information imperfection (called the epistemic uncertainty type) when one and only one decision (hard decision) must be selected from the exhaustive decisions set

Ω

, with incomplete, ill-defined or ambiguous available knowledge thus encoded as a possibility distribution over

Ω

. This paper proposes a joint decision-making criterion which allows to integrate such extra possibilistic knowledge within a probabilistic decision-making framework taking into account both types of information: possibilistic and probabilistic. In spite of the fact that possibility theory deals with uncertainty, which means that a unique but unknown elementary decision is to occur, and the ultimate goal is to determine this decision, there are relatively few studies that tackle that decision-making issue [18,19,20,21,22,23,24,25]. We must however mention the considerable contributions of Dubois and Prade [26] on possibility theory as well as on clarification on the various semantics of fuzzy sets [27,28,29,30]. Denoeux et al. [31,32,33] contributed as well significantly on that topic but they consider epistemic uncertainty as a higher order uncertainty upon probabilistic models such as in imprecise probabilities of Walley [34,35] and fuzzy sets type-2 [36,37,38] which is not being the case in this current paper.

The paper is organized this way. A brief recall of the Bayesian decision-making criteria, and of possibility theory is given in Section 2 and Section 3. Three major possibilistic decision making criteria, i.e., maximum possibility, maximum necessity measure and confidence index maximization, are being detailed in Section 4. The PML criterion is presented in Section 5 followed by its evaluation in Section 6. Paper closes with conclusion in Section 7.

2. Hard Decision in the Bayesian Framework

In the Bayesian classification framework, the most widely used hard decision is based on minimizing an overall decision risk function [14]. Assuming

o \in Ψ

is the pattern for which the feature vector

x \in Θ

is observed, let

λ_{m, n}

denotes a “predefined” conditional loss or penalty, incurred for deciding that the observed pattern o is associated with the decision C_n, whereas the true decision (class or category) for o is

C_{m} (n, m ϵ {1, \dots, M})

. Therefore, the probabilistic expected loss

R (C_{n} | x)

, also called the Conditional risk, associated with the decision

C_{n}

given the observed feature vector

x \in Θ

, is given by:

R (C_{n} | x) = E {λ_{m, n}} = \sum_{m = 1}^{M} λ_{m, n} P r {C_{m} | x}

(1)

where

E {\cdot}

stands for the mathematical expectation. Bayes decision criterion consists in minimizing the overall risk R, also called Bayes risk, as defined in (2), by computing the conditional risk for all decisions and then, selecting the decision C_n for which

R (C_{n} | x)

is minimum:

R (C_{n} | x) = E_{x} {R (C_{n} | x)} = \int R (C_{n} | x) P r {x} d x

(2)

Therefore, the minimum-risk Bayes decision criterion is based on the selection of the decision C_n which gives the smallest risk

R (C_{n} | x)

. This rule can thus be formulated as follows:

D e c i s i o n [x (p)] = \arg \min_{n = 1, \dots, M} (\sum_{m = 1}^{M} λ_{m, n} P r {C_{m} | x})

(3)

If

P r {C_{m}}

denotes the a priori probability of the decision

C_{m}

and

P r {x | C_{m}}

, the likelihood function of the measured feature vector x, given the decision C_m, then using Bayes’ rule, the minimum-risk Bayes decision criterion (3) can be rewritten as:

D e c i s i o n [x (p)] = \arg \min_{n = 1, \dots, M} {(\sum_{m = 1}^{M} P r {C_{m}} P r {x | C_{m}})}_{}

(4)

In the two-category decision case, i.e.,

Ω = {C_{1}, C_{2}}

, it can be easily shown that the minimum-risk Bayes decision criterion, simply called Bayes criterion, can be expressed as in (5):

L R = \frac{P r {x | C_{1}}}{P r {x | C_{2}}} \begin{matrix} \overset{D e c i s i o n [x (p)] = C_{1}}{\overset{︷}{>}} \\ \underset{D e c i s i o n [x (p)] = C_{2}}{\underset{︸}{<}} \end{matrix} \underset{η}{\underset{︸}{\begin{matrix} \frac{λ_{2, 2} - λ_{2, 1}}{λ_{1, 1} - λ_{1, 2}} & \cdot & \frac{P r {C_{2}}}{P r {C_{1}}} \end{matrix}}}

(5)

In other words, this decision criterion consists of comparing the likelihood ratio (LR)

P r {x | C_{1}} / P r {x | C_{2}}

to a threshold η independent of the observed feature vector x. The binary cost, or zero-one loss, assignment is commonly used in classification problems. This rule, expressed in (6), gives

λ_{m, n}

no cost for a correct decision (when the true pattern class/decision

C_{m}

is identical to the decided class/decision

C_{n}

) and a unit cost for a wrong decision (when the true class/decision

C_{m}

is different from the decided class/decision

C_{n}

).

λ_{m, n} = {\begin{matrix} \begin{matrix} 0 & if & C_{m} = C_{n} \end{matrix} \\ \begin{matrix} 1 & if & C_{m} \neq C_{n} \end{matrix} \end{matrix}

(6)

It should be noticed that this binary cost assignment considers all errors as equally costly. It also leads to express the conditional risk as:

R (C_{n} | x) = \sum_{m = 1}^{M} λ_{m, n} P r {C_{m} | x} = 1 - P r {C_{n} | x}

(7)

A decision minimizing the conditional risk

R (C_{n} | x)

becomes a decision maximizing the a posteriori probability

P r {C_{n} | x}

. As shown in (8), this version of the Bayes criterion is called the maximum a posteriori criterion (MAP) since it seeks to determine the decision maximizing the a posteriori probability value. It is also obvious that this decision process corresponds to the minimum-error decision rule which leads to the best recognition rate that a decision criterion can achieve:

D e c i s i o n_{M A P} [x (p)] = \arg \max_{n = 1, \dots M} P r {C_{n} | x}

(8)

When the decisions a priori probabilities

P r {C_{m}}

and the likelihood functions

P r {x | C_{m}}

are not available, or simply difficult to obtain, the Minmax Probabilistic Criterion (MPC) can be an interesting alternative to the minimum-risk Bayes decision criterion [39]. As expressed in (9), this hard decision criterion consists in selecting the decision that minimizes the maximum decision cost:

D e c i s i o n_{M P C} [x (p)] = \arg \min_{n = 1, \dots, M} [\max_{m = 1, \dots, M} {λ_{m, n} P r {C_{m} | x}}]

(9)

3. Brief Review of Possibility Theory

Possibility theory is a relatively new theory devoted to handle uncertainty in the context where the available knowledge is only expressed in an ambiguous form. This theory was first introduced by Zadeh in 1978 as an extension of fuzzy sets and fuzzy logic theory, to express the intrinsic fuzziness of natural languages as well as uncertain information [7]. It is well established that probabilistic reasoning, based on the use of a probability measure, constitutes the optimal approach dealing with uncertainty. In the case where the available knowledge is ambiguous and encoded by a membership function, i.e., a fuzzy set, defined over the decisions set, the possibility theory transforms the membership function into a possibility distribution

π

. Then the realization of each event (subset of the decisions set) is bounded by a possibilistic interval defined though a possibility,

Π

, and a necessity, N, measures [16]. The use of these two dual measures in possibility theory makes the main difference compared with the probability theory. Besides, possibility theory is not additive in terms of beliefs combination, and makes sense on ordinal structures [17]. In the following subsections, the basic concepts of a possibility distribution and the dual possibilistic measures (possibility and necessity measures) will be presented. The possibilistic decision rules will be detailed in Section 4. Full details can be found in [11].

3.1. Possibility Distribution

Let

Ω = {C_{1}, C_{2}, \dots, C_{M}}

be a finite and exhaustive set of M mutually exclusive elementary decisions (e.g., decisions, thematic classes, hypothesis, etc.). Exclusiveness means that one and only one decision may occur at one time, whereas exhaustiveness states that the occurring decision certainly belongs to

Ω

. Possibility theory is based on the notion of possibility distribution denoted by

π

, which maps elementary decisions from

Ω

to the interval [0, 1], thus encoding “our” state of knowledge or belief, on the possible occurrence of each class

C_{m} \in Ω

. The value

π (C_{m})

represents to what extent it is possible for

C_{m}

to be the unique occurring decision. In this context, two extreme cases of knowledge are given:

■: Complete knowledge: $\exists! C_{m} \in Ω, π (C_{m}) = 1$ and $π (C_{n}) = 0, \forall C_{n} \in Ω, C_{n} \neq C_{m}$ .
■: Complete ignorance: $\forall C_{m} \in Ω, π (C_{m}) = 1$ (all elements from $Ω$ are considered as totally possible). $π (\cdot)$ is called a normal possibility distribution if it exists at least one element $C_{m_{0}}$ from $Ω$ such that $π (C_{m_{0}}) = 1$ .

3.2. Possibility and Necessity Measures

Based on the possibility distribution concept, two dual set measures, possibility,

Π

, and a necessity, N, measures are derived. For every subset (or event)

A \subseteq Ω

, these measures are defined by:

Π (A) = \max_{C_{m} \in A} [π (C_{m})]

(10a)

N (A) = 1 - Π (A^{c}) = \min_{C_{m} \notin A^{c}} [1 - π (C_{m})]

(10b)

where

A^{c}

denotes the complement of the event A (i.e.,

A \cup A^{c} = Ω with A \cap A^{c} = \emptyset

).

The possibility measure

Π (A)

estimates the level of consistency about event A occurrence, given the available knowledge encoded by the possibility distribution

π

. Thus,

Π (A) = 0

means that A is an impossible event while

Π (A) = 1

means that the event A is totally possible. The necessity measure N(A) evaluates the level of certainty about event A occurrence, involved by possibility distribution

π

.

N (A) = 0

means that the certainty about the occurrence of A is null. On the contrary,

N (A) = 1

means that the occurrence of A is totally certain. In a classification problem, where each decision

C_{m}

refers to a given class or category, the case where all events A are composed of a single decision

(A_{m} = {C_{m}}, m = 1, \dots, M)

, is of particular interest. In this case, the possibility

Π (\cdot)

, and the necessity N(

\cdot

), measures are reduced to:

Π (A_{m}) = Π ({C_{m}}) = π (C_{m})

(11a)

N (A_{m}) = N ({C_{m}}) = 1 - Π ({C_{m}}^{c}) = 1 - \max_{n \neq m} π (C_{n})

(11b)

4. Decision-Making in the Possibility Theory Framework

In this section, we will investigate existing possibilistic decision-making rules. Two families of rules can be distinguished: rules based on the direct use of the information encapsulated in the possibility distribution, and rules based on the use of uncertainty measures associated with this possibility distribution. Let

Ω = {C_{1}, C_{2}, \dots, C_{M}}

be a finite and exhaustive set of M mutually exclusive elementary decisions. Given an observed pattern

o \in Ψ

for which the feature vector

x \in Θ

is observed, let

π_{x} (C_{m})

denotes the a posteriori possibility distribution

π (C_{m} | x)

defined on

Ω

. The possibility,

Π_{x} ({C_{m}})

, and necessity,

N_{x} ({C_{m}})

, measures are obtained as expressed in Equation (11), using the possibility distribution

π_{x} (C_{m})

.

4.1. Decision Rule Based on the Maximum of Possibility

The decision rule based on the maximum of possibility is certainly the most widely used in possibilistic classification—decision-making applications. Indeed, as shown in (12), this rule is based on the selection of the elementary decision

C_{m_{0}} \in Ω

having the highest possibility degree of occurrence

Π_{x} ({C_{m_{0}}})

:

D e c i s i o n [x (p)] = C_{m_{0}} if and only if m_{0} = \arg \max_{n = 1, \dots M} Π_{x} ({C_{m}})

(12)

A “first” mathematical justification of this “intuitive” possibilistic decision-making rule can be derived from the Minmax Probabilistic Criterion (MPC), Equation (9), using a binary cost assignment rule. Indeed, ‘converting’ the a posteriori possibility distributions

π_{x} (\cdot)

into a posteriori probability distributions

P r {\cdot | x}

is assumed to respect the three following constraints [30]: (a) the consistency principle, (b) the preference ordering preservation, and (c) the least commitment principle. The preference ordering preservation, on which we focus the attention here, means that if decision

C_{m_{1}}

is preferred to decision

C_{m_{2}}

, i.e.,

π_{x} (C_{m_{1}}) > π_{x} (C_{m_{2}})

, then the a posteriori probability distribution

P r {\cdot | x}

obtained from

π_{x} (\cdot)

should satisfy

P r {C_{m_{1}} | x} > P r {C_{m_{2}} | x}

. Equation (13) sums up this preference ordering preservation constraint:

π_{x} (C_{m_{1}}) > π_{x} (C_{m_{2}}) ⟺ P r {C_{m_{1}} | x} > P r {C_{m_{2}} | x}

(13)

Therefore, selecting the decision maximizing the a posteriori probability or selecting the decision maximizing the a posteriori possibility decision is identical: using the MPC associated with the binary cost assignment rule or using the maximum possibility decision rule led to an identical result as expressed in (14).

D e c i s i o n [x (p)] = C_{m_{0}} iff m_{0} = \arg \max_{m = 1, \dots M} P r {C_{m} | x} = \max_{m = 1, \dots M} π_{x} (C_{m})

(14)

This decision-making criterion is called the Naive Bayes style possibilistic criterion Refs. [40,41,42] and most ongoing efforts are oriented into the computation of the a posteriori possibility values using numerical data [43]. An extensive study of properties and equivalence between possibilistic and probability approaches is presented in [20]. Notice that this decision rule, strongly inspired from probabilistic decision reasoning, does not provide a hard decision mechanism when several elementary decisions have the same maximum possibility measure.

4.2. Decision Rule Based on Maximizing the Necessity Measure

It is worthwhile to notice that the a posteriori measures of possibility

Π_{x}

and necessity

N_{x}

coming from a normal a posteriori possibility distribution

π_{x} (\cdot)

, constitute a bracketing for the a posteriori probability distribution

P r {\cdot | x}

[17]:

N_{x} ({C_{m}}) = 1 - \max_{\begin{matrix} n \neq m \\ n = 1 . \dots, M \end{matrix}} π_{x} (C_{n}) \leq P r {C_{m} | x} \leq Π_{x} ({C_{m}}) = π_{x} (C_{m})

(15)

Therefore, the maximum possibility decision criterion can be considered as an optimistic decision criterion as it maximizes the upper bound of the a posteriori probability distribution. On the contrary, a pessimistic decision criterion based on maximizing the a posteriori necessity measure can be considered as a maximization of the lower bound of the a posteriori probability distribution. Equation (16) expresses this pessimistic decision criterion:

D e c i s i o n [x (p)] = C_{m_{0}} iff m_{0} = \arg \max_{n = 1, \dots M} [N_{x} ({C_{m}})]

(16)

The question that we must raise concerns the “links” between the optimistic and the pessimistic decision criteria. Let us consider the a posteriori possibility distribution

π_{x} (\cdot)

for which

C_{m_{1}} (resp . C_{m_{2}})

is the “winning decision” obtained using the maximum possibility (resp. necessity measure) decision criteria as given in (17):

π_{x} (C_{m_{1}}) = \max_{m} π_{x} (C_{m}) a n d N_{x} ({C_{m_{2}}}) = \max_{m} N_{x} ({C_{m}})

(17)

The following important question can be formulated as follows: “Is the winning decision

C_{m_{1}}

(according to the maximum possibility criterion) is the same as the winning decision

C_{m_{2}}

according to maximum necessity measure criterion?”

First, notice that if several elementary decisions share the same maximum possibility value

υ = π_{x} (C_{m_{1}})

, then, the necessity measure becomes a useless decision criterion since:

N_{x} ({C_{m}}) = 1 - \max_{k \neq m} π_{x} (C_{k}) = 1 - υ for all the elementary decisions .

Now, suppose that only one decision

C_{m_{1}}

assumes the maximum possibility value

υ = π_{x} (C_{m_{1}})

, it is important to raise the question whether the decision

C_{m_{1}}

will (or will not) be the decision assuming the maximum necessity measure value. Let us note v’, the possibility value for the “second best” decision according to the possibility value criterion. As

C_{m_{1}}

is the unique decision having the maximum possibility value

υ

, we have

υ^{'} < υ

. Therefore, as shown in (18), the necessity measure value

N_{x} ({C_{m}})

only gets maximum for the decision

C_{m_{1}}

since

1 - υ^{'} > 1 - υ

.

N_{x} ({C_{m}}) = 1 - \max_{k \neq m} π_{x} (C_{k}) = {\begin{matrix} \begin{matrix} 1 - υ^{'} if & m = \end{matrix} m_{1} \\ \begin{matrix} 1 - υ if & m \neq \end{matrix} m_{1} \end{matrix}

(18)

As a conclusion, when the maximum necessity measure criterion is useful for application (i.e., only one elementary decision assumes the maximum possibility value), then, both decision criteria (maximum possibility and maximum necessity) produce the same winning decision. In order to illustrate the difference between the maximum possibility and the maximum necessity measure criteria, Figure 2 presents an illustrative example.

Figure 2. Comparative example of the maximum possibility and maximum necessity measures decision criteria using four a posteriori possibility distributions.

In Figure 2 example, four different a posteriori possibility distributions

π_{1}, π_{2}, π_{3}, π_{4}

, all defined on a five elementary decisions set

Ω = {C_{1}, C_{2}, C_{3}, C_{4}, C_{5}}

are considered. The necessity measures

N_{k} ({C_{m}})

have been computed from the corresponding possibility distribution

π_{k}

. The underlined values indicate which decisions result from the maximum possibility decision criterion as well as the maximum necessity measure decision criterion, for the four possibility distributions

π_{k}

. Note that the necessity measure assumes at most two values whatever the considered possibility distribution. When the a posteriori possibility distribution has one and only one decision having the highest possibility degree, then both decision rules produce the same winning decision. This is the case of the normal possibility distribution

π_{1}

as well as the subnormal possibility distribution

π_{3}

, indicated as cases (a) and (c) in Figure 2.

When several elementary decisions share the same highest possibility degree, then the maximum possibility decision criterion can randomly select one of these potential winning decisions. In this case, the maximum necessity measure decision criterion will affect a single necessity measure degree to all elementary decisions from

Ω

, and thus, it will be impossible to select any of the potential winning decisions. This behavior can be observed with a normal possibility distribution

π_{2}

as well as with a subnormal possibility distribution (like

π_{4}

), cases (b) and (d) in Figure 2. This example clearly shows the weakness of the decisional capacity of the maximum necessity measure decision criterion when compared to the maximum possibility decision criterion.

4.3. Decision Rule Based on Maximizing the Confidence Index

Other possibilistic decision rules based on the use of uncertainty measures are also encountered in literature. The most frequently used criterion (proposed by Kikuchi et al. [44]) is based on the maximization of the confidence index Ind defined as a combination of the possibility and the necessity measures for each event

A \subseteq Ω

, given a possibility distribution

π (\cdot)

:

I n d : 2^{Ω} ⟶ [- 1, + 1]

A ⟶ I n d (A) = Π (A) + N (A) - 1, \forall A \subseteq Ω

(19)

where

2^{Ω}

denotes the power set of

Ω

, i.e., the set of all subsets from

Ω

.

For an event A, this index ranges from −1 to +1:

-: Ind(A) = −1, iff $Π (A) = N (A) = 0$ (the occurrence of A is totally impossible and uncertain);
-: Ind(A) = +1, iff $Π (A) = N (A) = 1$ (the occurrence of A is totally possible and certain).

Restricting the application of this measure to events

A_{m}

having only one decision

A_{m} = {C_{m}}

shows that Ind(

A_{m}

) measures the difference between the possibility measure of the event

A_{m}

(which is identical to the possibility degree of the decision

C_{m}

) and the highest possibility degree of all decisions contained in

{(A_{m})}^{c}

(the complement of

A_{m}

in

Ω

):

I n d (A_{m}) = Π (A_{m}) + N (A_{m}) - 1 = π (C_{m}) - \max_{m \neq n} π (C_{n})

(20)

Therefore, if

A_{m_{0}} = {C_{m_{0}}}

is the only event having the highest possibility measure value

π (C_{m_{0}})

, then,

A_{m_{0}}

will be the unique event having a positive confidence index value, whereas all other events will have negative values, as illustrated in Figure 3 where we assume

π (C_{m_{0}}) > π (C_{m}), \forall m \neq m_{0}

, and,

C_{m_{1}}

refers to the decision having the second highest possibility degree.

Figure 3. Confidence indices associated with different decisions (

A_{m_{0}}

: event having the highest possibility degree,

A_{m_{1}}

: event with the second highest possibility degree). (Source: [11]).

In a classification decision-making problem, the decision criterion associated with this index can be formulated as follows:

D e c i s i o n_{I n d} [x (p)] = A_{m_{0}} iff I n d (A_{m_{0}}) = \max_{m = 1, \dots M} [I n d (A_{m})]

(21)

The main difference between the maximum possibility and the maximum confidence index decision criteria lies in the fact that the maximum possibility decision criterion is only based on the maximum possibility degree whereas the maximum confidence index decision criterion is based on the difference between the two highest possibility degrees associated with the elementary decisions. As already mentioned, it is important to notice that the event

A_{m_{0}} = {C_{m_{0}}}

having the highest possibilistic value, will be the unique event producing a positive confidence index measuring the difference with the second highest possibility degree. All other events

A_{m} = {C_{m}}, \forall m \neq m_{0}

, will produce negative confidence indices.

When several decisions share the same highest possibility degree, their confidence index (the highest one) will be null. This shows the real capacity of this uncertainty measure for the decision-making process. However, this criterion brings the same resulting decisions as the two former ones.

5. Possibilistic Maximum Likelihood (PML) Decision Criterion

In the formulation of the Bayesian classification approach, all information sources are assumed to have probabilistic uncertainty where the available knowledge describing this uncertainty is expressed, estimated or evaluated in terms of probability distributions. In the possibilistic classification framework, the information sources are assumed to suffer from possibilistic (or epistemic) uncertainty where the available knowledge describing this uncertainty is expressed in terms of possibility distributions. In this section, the Bayesian pattern recognition framework is generalized in order to integrate both probabilistic and epistemic sources of knowledge. A joint probabilistic—possibilistic decision criterion called Possibilistic Maximum Likelihood (PML) is proposed to handle both types of uncertainties.

5.1. Sources with Probabilistic and Possibilistic Types of Uncertainties

In some situations, an object from the observation space is observed through several feature sets. This is the case, for instance, in multi-sensor environment for classification applications. In such situations, the information available for the description of the feature vectors may be of different natures: probabilistic, epistemic, etc. Yager [24,45,46] addresses the same sort of problems: multi-source uncertain information fusion in the case when the information can be both from hard sensors of a probabilistic type and from soft knowledge-expert linguistic source of a possibilistic type. He uses t-norms (‘and’ operations) to combine possibility and probability measures. As will be explained below, Yager’s product of possibilities and probabilities coincides with our ‘decision variables’ optimized through the proposed PML approach.

Let us consider the example illustrated in Figure 4, where each pattern o (from the patterns set

Ψ

) is “observed” through two channels. Source 1 (resp. source 2) measures a sub-feature vector

x_{1} \in Θ_{1} (resp . x_{2} \in Θ_{2})

. Therefore, the resulting feature vector

x (o)

is obtained as the concatenation of the two sub-feature vectors:

x (o) = [x_{1} x_{2}]

. In this configuration, the available information in the sub-feature vector x₁ (resp. x₂) undergoes probabilistic (resp. epistemic) uncertainties and is encoded as an a posteriori probability soft decision label vector

ℓ_{C_{n}}^{1} = P r {C_{n} | x_{1}}, n = 1, 2, \dots, M

(resp. a posteriori possibility soft decision label vector

ℓ_{C_{n}}^{2} = π_{x_{2}} (C_{n}), n = 1, 2, \dots, M)

.

Figure 4. Multi-source information context for pattern classification.

As an example, in a remote sensing system, Source 1 may be considered as a multispectral imaging system, where all potential a posteriori probability distributions,

P r {C_{n} | x_{1}}, n = 1, 2, \dots, M

, are assumed to be known and well established. The second sensor, Source 2, could be a radar imaging system where the available information concerning the different thematic classes is expressed by an expert using ambiguous linguistic variables like: “the thematic class

C_{n}

is observed as “Bright”, “Slightly Dark”, etc. in the sub-feature set

Θ_{2}

”. Each linguistic variable can be used to generate an a posteriori possibility distribution associated with each thematic class

π_{x_{2}} (C_{n}), n = 1, 2, \dots, M

.

5.2. Possibilistic Maximum Likelihood (PML) Decision Criterion: A New Hybrid Criterion

In the Bayesian decision framework, detailed in previous sections, the binary cost assignment approach suffers from two constraints. On one hand, all errors are considered as equally costly: the penalty (or cost) of misclassifying an observed pattern o as being associated with a decision

C_{n}

whereas the true decision for “o” is

C_{m}

is the same (unit loss). This situation does not reflect real applications constraints. For instance, deciding that an examined patient is healthy whereas he suffers a cancer is much more serious than the other way around. On the other hand, the loss function values

λ_{m, n}, \forall m, n \in {1, 2, \dots, M}

are static (or, predefined) and do not depend on the feature vectors of the observed patterns. The possibilistic maximum likelihood (PML) criterion, proposed in this paper, is based on the use of the epistemic source of information (the a posteriori possibility distribution, defined on the sub-feature space

Θ_{2}

) in order to define possibilistic loss values and to inject, afterwards, these values into the Bayesian decision criterion.

Assume that, for each object

o \in Ψ

, the observed feature vector is given by

x (o) = [x_{1} x_{2}] \in Θ_{1} \times Θ_{2}

, and denote

P r {\cdot | x_{1}} (resp . π_{x_{2}} (\cdot)

) as encoding the a posteriori probability (resp. possibility) soft decision label vectors defined over the sub-feature set

Θ_{1}

(resp.

Θ_{2}

). The proposed PML criterion relies on the use of loss values

λ_{m, n}

ranging from −1 (i.e., no loss) to +1 (i.e., maximum loss), and

λ_{m, n}

refers to the risk of choosing

C_{n}

whereas the real decision for the considered pattern is

C_{m}

. Depending on the epistemic information available through Source 2, the proposed loss values are given by:

λ_{m, n} = {\begin{matrix} \begin{matrix} \begin{matrix} \max_{k \neq m} π_{x_{2}} (C_{k}) = & Π_{x_{2}} ({C_{m}}^{c}) \end{matrix} & \forall m \neq n \end{matrix} \\ \begin{matrix} \max_{k \neq m} π_{x_{2}} (C_{k}) - π_{x_{2}} (C_{m}) = - I n d_{x_{2}} ({C_{m}}) & if m = n \end{matrix} \end{matrix}

(22)

In the case of a wrong decision, the decision penalty values, i.e.,

λ_{m, n}

where

\neq m

, are considered as positive loss values ranging in the interval:

0 \leq

λ_{m, n}

=

\max_{k \neq m} π_{x_{2}} (C_{k}) \leq 1 .

Thus, the wrong decision unit cost in the framework of binary-cost assignment, is “softened”, in this possibilistic approach, and assumes its maximum value, i.e., unit cost, only when the wrong decision

C_{n}

has a total possibility degree of occurrence.

When a correct decision

C_{m}

is selected, the zero-loss value (used by the binary cost assignment approach) is substituted by

λ_{m, n} = \max_{k \neq m} π_{x_{2}} (C_{k}) - π_{x_{2}} (C_{m}) = - I n d_{x_{2}} ({C_{m}})

. If the occurrence possibility degree

π_{x_{2}} (C_{m})

of the true decision

C_{m}

, is the highest degree

π_{x_{2}} (C_{m}) > \max_{k} π_{x_{2}} (C_{k})

, then the resulting loss value

λ_{m, n}

becomes negative. The smallest penalty value is reached, i.e.,

λ_{m, n} = - 1

, when

π_{x_{2}} (C_{m}) = 1

(i.e., true decision

C_{m}

has a total possibility degree of occurrence), with a null possibility degree of occurrence for all the remaining decisions (leading to

\max_{m \neq k} π_{x_{2}} (C_{k}) = 0

). Two special cases are present:

(1): If the true decision $C_{m}$ shares the same maximum possibility value with, at least one different wrong decision $C_{m}$ , then, the correct decision $C_{m}$ loss value becomes null $λ_{m, n} = \max_{k \neq m} π_{x_{2}} (C_{k}) - π_{x_{2}} (C_{m}) = 0$ ;
(2): If the true decision $C_{m}$ does not produce the maximum occurrence possibility degree, i.e., $π_{x_{2}} (C_{m}) < \max_{k \neq m} π_{x_{2}} (C_{k})$ , then the loss value $λ_{m, n}$ is positive and will increase the conditional risk, associated with the true decision $C_{m}$ .

Using the proposed possibilistic loss values, the conditional risk

R (C_{k} | x_{1})

of choosing decision C_k can thus be computed as follows:

R (C_{k} | x) = - I n d_{x_{2}} ({C_{k}}) \cdot P r {C_{k} | x_{1}} + \sum_{\begin{matrix} i = 1 \\ i \neq k \end{matrix}}^{M} Π ({C_{m}}^{c}) \cdot P r {C_{i} | x_{1}}

(23)

As already mentioned, Bayes decision criterion computes the conditional risk for all decisions, then, selects the decision

C_{n}

for which

R (C_{n} | x)

is minimum. Based on Equation (23), and to select the minimum conditional risk decision, the comparison of conditional risks related to two decisions

C_{k}

and

C_{p}

, can be straightforward performed leading to:

R (C_{k} | x_{1}) \leq R (C_{p} | x_{1}) \Leftrightarrow π_{x_{2}} (C_{k}) \cdot P r {C_{k} | x_{1}} \geq π_{x_{2}} (C_{k}) \cdot P r {C_{p} | x_{1}}

(24)

Therefore, the application of the PML criterion, for the selection of the minimum conditional risk decision (out of M potential elementary decisions) can be simply formulated by the following decision rule:

D e c i s i o n [x (p)] = \arg \max_{n = 1, \dots M} π_{x_{2}} (C_{n}) \cdot P r {C_{n} | x_{1}}

(25)

This “intuitive” decision criterion allows the joint use both probabilistic and epistemic sources of information in the very same Bayesian minimum risk framework. As an example, the application of the proposed possibilistic loss values in the two-class decision case, where

Ω = {C_{1}, C_{2}}

, leads to the following loss matrix [

λ

]:

[λ] = [\begin{matrix} λ_{1, 1} & λ_{1, 2} \\ λ_{2, 1} & λ_{2, 2} \end{matrix}] = [\begin{matrix} π_{x_{2}} (C_{2}) - π_{x_{2}} (C_{1}) & π_{x_{2}} (C_{2}) \\ π_{x_{2}} (C_{1}) & π_{x_{2}} (C_{1}) - π_{x_{2}} (C_{2}) \end{matrix}]

(26)

The use of this loss matrix [

λ

] into the minimum-risk Bayes decision approach (as defined in (5)), leads to express the PML decision as follows:

\frac{P r {x_{1} | C_{1}}}{P r {x_{1} | C_{2}}} \begin{matrix} \overset{D e c i s i o n [x (p)] = C_{1}}{\overset{︷}{>}} \\ \underset{D e c i s i o n [x (p)] = C_{2}}{\underset{︸}{<}} \end{matrix} \begin{matrix} \frac{π_{x_{2}} (C_{2})}{π_{x_{2}} (C_{1})} & \cdot & \frac{P r {C_{2}}}{P r {C_{1}}} \end{matrix}

(27)

Notice that when the proposed possibilistic loss values are considered, then the PML induces a “weighting adjustment” of the a priori probabilities where the weighting factors are simply the a posteriori possibility degrees issued from the possibilistic Source 2. In the case of equal a priori probabilities,

P r {C_{2}} = P r {C_{1}}

, this decision criterion turns to an intuitive form using jointly probabilistic and epistemic sources of information, in the Bayesian minimum risk framework as shown by:

π_{x_{2}} (C_{1}) \cdot P r {x_{1} | C_{1}} \begin{matrix} \overset{D e c i s i o n [x (p)] = C_{1}}{\overset{︷}{>}} \\ \underset{D e c i s i o n [x (p)] = C_{2}}{\underset{︸}{<}} \end{matrix} π_{x_{2}} (C_{2}) \cdot P r {x_{1} | C_{2}}

(28)

It is worthwhile to notice that when the two following conditions prevail:

when the available probabilistic information (issued from source 1) is non-informative; and,
when the only meaningful and available information is reduced to the epistemic expert information on the sub-feature vector issued from source 2; then, the proposed PML criterion is simply reduced to the maximum possibility decision criterion:

$π_{x_{2}} (C_{1}) \begin{matrix} \overset{D e c i s i o n [x (p)] = C_{1}}{\overset{︷}{>}} \\ \underset{D e c i s i o n [x (p)] = C_{2}}{\underset{︸}{<}} \end{matrix} π_{x_{2}} (C_{2})$

(29)

This raises a fundamental interpretation of the maximum possibility decision criterion as being a very special case of the possibilistic Bayesian decision making process under the total ignorance assumption of the probabilistic source of information.

5.3. PML Decision Criterion Behavior

Let

S_{1}

denotes a probabilistic source of information measuring a sub-feature vector

x_{1} \in Θ_{1}

and attributing to each elementary decision

C_{m}, m = 1, 2, \dots, M

, an a posteriori probability soft decision label

P r {C_{m} | x_{1}}

. Under the assumption of equal a priori probabilities and using the binary-cost assignment, the application of the maximum a posteriori criterion (MAP), Equation (8), turns to be the “optimal” criterion ensuring the minimum-error decision rate.

Assume that an additional possibilistic source of information,

S_{2}

, (measuring a sub-feature vector

x_{2} \in Θ_{2})

is available, see Figure 4. Based on the use of the sub-feature vector

x_{2} \in Θ_{2}

,

S_{2}

attributes to each elementary decision,

C_{m}

, an a posteriori possibility soft decision label

π_{x_{2}} (C_{m}), m = 1, 2, \dots, M

. To obtain a hard decision, the application of the maximum of possibility decision criterion, Equation (12), is considered.

In the previous section, we have proposed the possibilistic maximum likelihood, PML, decision criterion, Equation (25), as a hybrid decision criterion allowing the coupled use of both sources of information,

S_{1}

and

S_{2}

, by considering the possibilistic information issued from

S_{2}

, i.e.,

π_{x_{2}} (C_{m}), m = 1, 2, \dots, M

, for the definition of the loss values in the framework of the minimum-risk Bayes decision criterion (instead of the use of the binary-cost assignment approach). In this section, we will briefly discuss, from a descriptive point of view through an illustrative example, the “decisional behavior” of the PML criterion when compared to the decisions obtained with the “individual” application of the MAP and the maximum of possibility decision criteria.

First, it is worthwhile to notice that the “decision variable” to be maximized by the PML criterion is simply the direct product

υ (C_{m}) = π_{x_{2}} (C_{m}) \cdot P r {C_{m} | x_{1}}, m = 1, 2, \dots, M

, which is a T-norm fusion operator (considering both probabilistic and possibilistic information as two “similar” measures of the degree of truthfulness related to the occurrence of different elementary decisions, see also p.101 of Yager [24]). This also means that both sources of information,

S_{1}

and

S_{2}

, are considered as having the same informative level. It is also important to notice that the PML criterion, as a decision fusion operator merging decisional information from both sources,

S_{1}

and

S_{2}

, constitutes a coherent decision fusion criterion in the sense that:

-: when both sources $S_{1}$ and $S_{2}$ are in full agreement (i.e., leading to the same decision $C_{m_{0}}$ ), then, the decision obtained by the application of the PML criterion will be the same as $C_{m_{0}}$ ;
-: when one of the two sources $S_{1}$ and $S_{2}$ suffers from total ignorance (i.e., producing equal a posteriori probabilities, for $S_{1}$ and equal a posteriori possibilities, for $S_{2}$ ), then the PML criterion will “duplicate” the same elementary decision as the one proposed by the remaining reliable source of information;
-: when the two sources $S_{1}$ and $S_{2}$ lack decisional agreement, then, the decision obtained by the application of the PML criterion will be the most “plausible” elementary decision that may be different from individual decisions resulting from the MAP (resp. maximum possibility) criterion using the sub-feature vector $x_{1} \in Θ_{1}$ (resp. sub-feature vector $x_{2} \in Θ_{2}$ ).

This decision fusion coherence is illustrated through the examples given in Table 1. The decisions set is formed by five elementary decisions, i.e.,

Ω = {C_{1}, C_{2}, C_{3}, C_{4}, C_{5}}

, and we assume that, given the observed feature

x_{1} \in Θ_{1}

, the probabilistic source

S_{1}

produces the following a posteriori probability distribution:

P r {\cdot | x_{1}}

= [0.1 0.4 0.1 0.3 0.1]. Each example fits in one sub-array which presents

S_{1}

and

S_{2}

specific configuration (

P r {\cdot | x_{1}}

,

π_{x_{2}} (\cdot)

and

υ = P r {\cdot | x_{1}} \cdot π_{x_{2}} (\cdot))

with the resulting decision for each decision parameter. The cases presented in Table 1 are explained as the following:

Table 1. PML decision making behavior for several cases.

Case 1: when both sources $S_{1}$ and $S_{2}$ agree, with a winning decision $C_{2}$ , the PML criterion maintains this agreement and obtains the same decision, $C_{2}$ .
Cases 2 and 3: it shows that when one of the two sources presents a total ignorance, then the PML criterion “duplicates” the same elementary decision as the one offered by the remaining reliable source of information.
Cases 4, 5 and 6: when sources $S_{1}$ and $S_{2}$ lack agreement (i.e., dissonant sources), then, the resulting decision obtained through the application of the PML criterion is the most reasonable decision. That may not necessarily be one of the winning decisions offered by the two sources (this is specifically shown in case 6).

6. Experimental and Validation Results

In this section, the proposed PML decision-making criterion is evaluated in a pixel-based image classification context. A synthetic scene composed of five thematic classes

Ω = {C_{1}, C_{2}, C_{3}, C_{4}, C_{5}}

is assumed to be observed through two independent imaging sensors. Sensor

S_{1}

(resp. sensor

S_{2}

) provides an image

I_{1}

(resp.

I_{2}

) of the simulated scene. The two considered sensors are assumed to be statistically independent. Without loss of generalization, pixels from both images

I_{1}

and

I_{2}

are assumed to have the same spatial resolution, thus, they represent the same observed spatial cell or object o. The value of the pixel

I_{1} (i, j)

(resp.

I_{2} (i, j)

) provides the observed feature

x_{1} (resp . x_{2})

delivered by the first (resp. second) sensor. According to sensors characteristics, the measured feature

x_{1} (resp . x_{2})

follows a Gaussian

N (m_{C}, σ_{C}^{2})

(resp. Rayleigh

ℛ (σ_{C}^{2})

) probability distribution with related parameters

m_{C}, σ_{C}^{2}

depending on the thematic class “C” of the observed object.

Figure 5 depicts the experimental simulated images

I_{1}

(resp.

I_{2}

) assumed to be delivered at the output of the two sensors. Figure 6 shows the possibility distributions encoding expert’s information, for the five thematic classes. Parameter values considered for each thematic class are given in the same figure. This configuration of classes’ parameters is considered as a reasonable configuration that may be encountered when real data is observed. Nevertheless, other configurations have been generated and the obtained results are in full accordance with those obtained by the considered configuration.

Figure 5. Two-sensors simulated images representing a scene of five thematic classes. Pixels from

I_{1}

(resp.

I_{2}

) are generated using Gaussian (resp. Rayleigh) density functions.

Figure 6. Triangular-shaped possibility distributions encoding expert’s knowledge, for the five thematic classes and the two sensors.

In addition to the previously mentioned probabilistic information, we assume that each thematic class is described, by an expert, using the “simplest” linguistic variable “Close to

v_{S, C_{k}}

” where

v_{S, C_{k}}

denotes the thematic class

C_{k}

feature mean value, observed through sensor

S_{s}

. Therefore, the only information given by the expert is

v_{1, C_{k}} = m_{C_{K}}

for sensor

S_{1}

(underlying Gaussian distributions) and

v_{2, C_{k}} = σ_{C_{K}} \sqrt{π / 2}

for sensor

S_{2}

(underlying Rayleigh distributions). For each sensor

S_{s}

and thematic class

C_{k}

, a standard triangular possibility distribution is considered to encode this epistemic knowledge with the summit positioned at the mean value

v_{S, C_{k}}

and the support covering the whole range of the features set. It is clearly seen that the possibility distributions (considered as encoding the expert’s knowledge), represent a weak knowledge which is less informative than the initial, or even estimated, probabilistic density functions.

To evaluate the efficiency of the proposed possibilistic maximum likelihood decision making criterion, the adopted procedure consists, first, on the random generation of 1000 statistical realizations of the two synthetic Gaussian and Rayleigh images (with the five considered thematic classes) representing the analyzed scene. Second, the following average pixel-based recognition rates are evaluated:

-: $τ (P r_{C_{m}}^{G})$ : Minimum-risk Bayes average pixel-based decision recognition rate, Equation (8), using zero-one loss assignment, for each thematic class $C_{m}, m = 1, 2, \dots, 5,$ based on the use of sensor $S_{1}$ Gaussian feature $x_{1}$ only.
-: $τ (π_{C_{m}}^{G})$ : Maximum possibility average pixel-based decision recognition rate, Equation (12), exploiting the epistemic expert knowledge for the description of each considered thematic class $C_{m}, m = 1, 2, \dots, 5,$ in the features set $Θ_{1}$ only.
-: $τ (P r_{C_{m}}^{G} \cdot π_{C_{m}}^{R})$ : Possibilistic maximum likelihood average pixel-based decision recognition rate, Equation (24), jointly exploiting the epistemic expert knowledge for the description of each considered thematic class $C_{m}, m = 1, 2, \dots, 5,$ in the features set $Θ_{2}$ (sensor $S_{2}$ ), and the Gaussian probabilistic knowledge for the description of the same thematic class in the features set $Θ_{1}$ (sensor $S_{1}$ ).
-: $τ (P r_{C_{m}}^{R})$ : Minimum-risk Bayes average pixel-based decision recognition rate, Equation (8), using zero-one loss assignment, for each thematic class $C_{m}, m = 1, 2, \dots, 5,$ based on the use of sensor $S_{2}$ Rayleigh feature $x_{2}$ only.
-: $τ (π_{C_{m}}^{G})$ : Maximum possibility average pixel-based decision recognition rate, Equation (12), exploiting the epistemic expert knowledge for the description of each considered thematic class $C_{m}, m = 1, 2, \dots, 5,$ in the features set $Θ_{1}$ (sensor $S_{1}$ ).
-: $τ (P r_{C_{m}}^{R} \cdot π_{C_{m}}^{G})$ : Possibilistic maximum likelihood average pixel-based decision recognition rate, Equation (24), jointly exploiting the epistemic expert knowledge for the description of each thematic class $C_{m}, m = 1, 2, \dots, 5,$ in sensor $S_{1}$ features set $Θ_{1}$ , and the Rayleigh probabilistic knowledge for the description of $C_{m}, m = 1, 2, \dots, 5,$ in sensor $S_{2}$ features set $Θ_{2}$ .
-: $τ (P r_{C_{m}}^{G} \cdot P r_{C_{m}}^{R})$ : Minimum-risk Bayes average pixel-based decision recognition rate, Equation (8), using zero-one loss assignment, for each thematic class $C_{m}, m = 1, 2, \dots, 5,$ and based on the joint use of both sensors $S_{1}$ (associated with Gaussian probabilistic knowledge in the features set $Θ_{1}$ ) and S₂ (associated with Rayleigh probabilistic knowledge in the features set $Θ_{2}$ . Sensors $S_{1}$ and $S_{2}$ are considered as being statistically independent. This criterion as well as all the criteria above have been calculated for the example in Table 2.

Table 2. PML decision average pixel-based recognition rates for the five thematic classes using various configurations of knowledge sources.

The obtained average recognition rates are summarized in Table 2 (last row). As expected, at the global scene level, the average recognition rates when a probabilistic information source is used (for modelling the observed features) are higher than those obtained by the use of epistemic knowledge (i.e.,

τ (P r^{G}) \geq τ (π^{G})

, and

τ (P r^{R}) \geq τ (π^{R})

). Nevertheless, at the thematic classes’ level, this property does not hold for some classes. This is mainly due to the fact that for “sharp classes” probability density functions, i.e., small variance, (for instance, thematic classes

C_{2}

and

C_{4}

), the possibility distributions shape used to encode the expert knowledge (i.e., a wide-based triangular possibility shape) may bias each class influence, leading to a better recognition rate to the detriment of other neighboring classes (for instance, class

C_{3}

). In this case, this leads to obtain

τ (π_{C_{2}}^{G}) > τ (P r_{C_{2}}^{G})

and

τ (π_{C_{4}}^{G}) > τ (P r_{C_{4}}^{G})

.

Poorer recognition performances of the maximum possibility decision criterion clearly come from the “weak epistemic knowledge” produced by the expert (indicating just the mean values) compared to the “strong probabilistic” knowledge involved by full probability density functions (resulting from either a priori information or the densities estimation using some learning data). The most interesting and promising result can be witnessed in terms of recognition rate improvement when the epistemic knowledge is jointly used with the probabilistic one as proposed by the PML decision criterion. Indeed, Table 2 (columns 4 and 7, bold numbers) shows that for all classes

C_{m}, m = 1, 2, \dots, 5,

we have

τ (P r_{C_{m}}^{G} \cdot π_{C_{m}}^{R}) \geq τ (P r_{C_{m}}^{G})

and

τ (P r_{C_{m}}^{R} \cdot π_{C_{m}}^{G}) \geq τ (P r_{C_{m}}^{R})

.

It is worthwhile to notice (columns 4 and 7, bold numbers) that the level of performance improvement depends on the “informative” capacity of the “additional” knowledge source. For instance, embedding the Gaussian source of knowledge (in terms of epistemic knowledge form) into the decisional process based on the probabilistic Rayleigh source of knowledge, improves much more the performance level than the reverse (i.e., embedding epistemic Rayleigh source of knowledge into the decisional process based on the probabilistic Gaussian source of knowledge):

τ (P r_{C_{m}}^{G} \cdot π_{C_{m}}^{R}) \approx

τ (P r_{C_{m}}^{G})

whereas

τ (P r_{C_{m}}^{R} \cdot π_{C_{m}}^{G}) > τ (P r_{C_{m}}^{R})

.

Finally, it is important to notice that the PML decision performances are lower-upper bounds delimited as follows:

\begin{matrix} τ (P r_{C_{m}}^{G}) \leq τ (P r_{C_{m}}^{G} \cdot π_{C_{m}}^{R}) \leq τ (P r_{C_{m}}^{G} \cdot P r_{C_{m}}^{R}) \\ τ (P r_{C_{m}}^{R}) \leq τ (P r_{C_{m}}^{R} \cdot π_{C_{m}}^{G}) \leq τ (P r_{C_{m}}^{R} \cdot P r_{C_{m}}^{G}) \end{matrix}

(30)

Given the fact that the two sources S₁ and S₂ are assumed to be statistically independent, then the joint probability distribution of the augmented feature vector

x = [x_{1} x_{2}]

is the direct product of marginal ones. This simply means that the upper bounds given in Equation (30) constitute the optimal recognition rate (obtained by considering both probabilistic sources of knowledge). Therefore, the PML criterion improves the performances of the use of a “single” probabilistic source of knowledge, and approaches for some thematic classes the optimal recognition rate upper bound (last column of Table 2).

7. Conclusions

In this paper, a new criterion for decision-making process in classification systems is proposed. After a brief recall of the Bayesian decision-making criteria, three major possibilistic decision making criteria, i.e., maximum possibility, maximum necessity measure and confidence index maximization, have been detailed. It was clearly shown that the three considered decision criteria lead to, at best, the maximum possibility decision criterion. However, the maximum possibility criterion has no physical justification. A new criterion called, possibilistic maximum likelihood (PML) framed within the possibility theory, but using notions from Bayesian decision-making, has been presented and its behavior evaluated. The main motivation being the development of such criterion is for multisource information fusion where a pattern may be observed through several channels and where the available knowledge, concerning the observed features, may be of a probabilistic nature for some features, and of an epistemic nature for some others.

In this configuration, most encountered studies transform one of the two knowledge types into the other form, and then apply either the classical Bayesian or possibilistic decision-making criteria. In this paper, we have proposed a new approach called the Possibilistic maximum likelihood (PML) decision-making approach, where the Bayesian decision-making framework is adopted and where the epistemic knowledge is integrated into the decision-making process by defining possibilistic loss values instead of the usually used zero-one loss values.

A set of possibilistic loss values is proposed and evaluated in the context of pixel-based image classification where a synthetic scene, composed of several thematic classes, was randomly generated using two types of sensors: a Gaussian and a Rayleigh sensor. The evaluation of the proposed PML criterion has clearly shown the interest of the application of PML; where the obtained recognition rates approach the optimal rates (i.e., where all the available knowledge is expressed in terms of probabilistic knowledge). Moreover, the proposed PML decision criterion offers a physical interpretation of the maximum possibility decision criterion as a special case of the possibilistic Bayesian decision-making process when all the available probabilistic information indicates equal decisions probabilities.

Author Contributions

Conceptualization, B.S., D.G., S.A. and B.A.; Methodology, B.S., D.G., B.A., and É.B.; Software, S.A. and B.A.; Supervision, B.S. and É.B.; Writing—original draft, B.S.; Writing—review & editing, and É.B. All authors have read and agreed to the final published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Dubois, D. Uncertainty theories: A unified view. In SIPTA School 08—UEE 08; SIPTA: Montpellier, France, 2008. [Google Scholar]
Klir, G.J.; Wierman, M.J. Uncertainty-Based Information: Elements of Generalized Information Theory; Physica-Verlag HD: Heidelberg, Germany, 1999. [Google Scholar]
Denoeux, T. 40 years of Dempster-Shafer theory. Int. J. Approx. Reason. 2016, 79, 1–6. [Google Scholar] [CrossRef]
Yager, R.R.; Liu, L. (Eds.) Classic Works of the Dempster-Shafer Theory of Belief Functions; Springer: Berlin, Germany, 2008; Volume 219. [Google Scholar]
Shafer, G. A Mathematical Theory of Evidence; Princeton University Press: Princeton, NJ, USA, 1976; Volume 42. [Google Scholar]
Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353. [Google Scholar] [CrossRef]
Zadeh, L. Fuzzy Sets as the Basis for a Theory of Possibility. Fuzzy Sets Syst. 1978, 1, 3–28. [Google Scholar] [CrossRef]
Zadeh, L.A. Generalized theory of uncertainty (GTU)—principal concepts and ideas. Comput. Stat. Data Anal. 2006, 51, 15–46. [Google Scholar] [CrossRef]
Dubois, D.; Prade, H. The legacy of 50 years of fuzzy sets: A discussion. Fuzzy Sets Syst. 2015, 281, 21–31. [Google Scholar] [CrossRef]
Duda, R.O.; Hart, P.E.; Stork, D.G. Pattern Classification; John Wiley & Sons: Hoboken, NJ, USA, 2012. [Google Scholar]
Solaiman, B.; Bossé, É. Possibility Theory for the Design of Information Fusion Systems; Springer: Berlin, Germany, 2019. [Google Scholar]
Frélicot, C. On unifying probabilistic/fuzzy and possibilistic rejection-based classifiers. In Advances in Pattern Recognition; Springer: Berlin, Germany, 1998; pp. 736–745. [Google Scholar]
Solaiman, B.; Bossé, E.; Pigeon, L.; Guériot, D.; Florea, M.C. A conceptual definition of a holonic processing framework to support the design of information fusion systems. Inf. Fusion 2015, 21, 85–99. [Google Scholar] [CrossRef]
Tou, J.T.; Gonzalez, R.C. Pattern Recognition Principles; Addison-Wesley: Boston, MA, USA, 1974. [Google Scholar]
Dubois, D.; Prade, H. Possibility Theory: An Approach to Computerized Processing of Uncertainty; Plenum Press: New York, NY, USA, 1988. [Google Scholar]
Dubois, D.J. Fuzzy Sets and Systems: Theory and Applications; Academic Press: Cambridge, MA, USA, 1980; Volume 144. [Google Scholar]
Dubois, D.; Prade, H. When upper probabilities are possibility measures. Fuzzy Sets Syst. 1992, 49, 65–74. [Google Scholar] [CrossRef]
Aliev, R.; Pedrycz, W.; Fazlollahi, B.; Huseynov, O.H.; Alizadeh, A.; Guirimov, B. Fuzzy logic-based generalized decision theory with imperfect information. Inf. Sci. 2012, 189, 18–42. [Google Scholar] [CrossRef]
Buntao, N.; Kreinovich, V. How to Combine Probabilistic and Possibilistic (Expert) Knowledge: Uniqueness of Reconstruction in Yager’s (Product) Approach. Int. J. Innov. Manag. Inf. Prod. (IJIMIP) 2011, 2, 1–8. [Google Scholar]
Coletti, G.; Petturiti, D.; Vantaggi, B. Possibilistic and probabilistic likelihood functions and their extensions: Common features and specific characteristics. Fuzzy Sets Syst. 2014, 250, 25–51. [Google Scholar] [CrossRef]
Fargier, H.; Amor, N.B.; Guezguez, W. On the complexity of decision making in possibilistic decision trees. arXiv 2012, arXiv:1202.3718. [Google Scholar]
Guo, P. Possibilistic Decision-Making Approaches. In The 2007 International Conference on Intelligent Systems and Knowledge Engineering; Atlantis Press: Beijing, China, 2007; pp. 684–688. [Google Scholar]
Weng, P. Qualitative decision making under possibilistic uncertainty: Toward more discriminating criteria. arXiv 2012, arXiv:1207.1425. [Google Scholar]
Yager, R.R. A measure based approach to the fusion of possibilistic and probabilistic uncertainty. Fuzzy Optim. Decis. Mak. 2011, 10, 91–113. [Google Scholar] [CrossRef]
Yager, R.R. On the fusion of possibilistic and probabilistic information in biometric decision-making. In Proceedings of the 2011 IEEE Workshop on Computational Intelligence in Biometrics and Identity Management (CIBIM), Paris, France, 11–15 April 2011; IEEE: Piscataway Township, NJ, USA, 2011; pp. 109–114. [Google Scholar]
Bouyssou, D.; Dubois, D.; Prade, H.; Pirlot, M. Decision Making Process: Concepts and Methods; John Wiley & Sons: Hoboken, NJ, USA, 2013. [Google Scholar]
Dubois, D.; Foulloy, L.; Mauris, G.; Prade, H. Probability-possibility transformations, triangular fuzzy sets, and probabilistic inequalities. Reliab. Comput. 2004, 10, 273–297. [Google Scholar] [CrossRef]
Dubois, D.; Prade, H. Possibility theory: Qualitative and quantitative aspects. In Quantified Representation of Uncertainty and Imprecision; Springer: Berlin, Germany, 1998; pp. 169–226. [Google Scholar]
Dubois, D.; Prade, H. The three semantics of fuzzy sets. Fuzzy Sets Syst. 1997, 90, 141–150. [Google Scholar] [CrossRef]
Dubois, D.; Prade, H.; Sandri, S. On possibility/probability transformations. In Fuzzy Logic; Springer: Berlin, Germany, 1993; pp. 103–112. [Google Scholar]
Denœux, T. Modeling vague beliefs using fuzzy-valued belief structures. Fuzzy Sets Syst. 2000, 116, 167–199. [Google Scholar] [CrossRef]
Denœux, T. Maximum likelihood estimation from fuzzy data using the EM algorithm. Fuzzy Sets Syst. 2011, 183, 72–91. [Google Scholar] [CrossRef]
Denoeux, T. Maximum likelihood estimation from uncertain data in the belief function framework. Knowl. Data Eng. 2013, 25, 119–130. [Google Scholar] [CrossRef]
Walley, P. Statistical Reasoning With Imprecise Probabilities; Chapman and Hall: London, UK, 1991. [Google Scholar]
Walley, P. Towards a unified theory of imprecise probability. Int. J. Approx. Reason. 2000, 24, 125–148. [Google Scholar] [CrossRef]
Linda, O.; Manic, M.; Alves-Foss, J.; Vollmer, T. Towards resilient critical infrastructures: Application of Type-2 Fuzzy Logic in embedded network security cyber sensor. In Proceedings of the 2011 4th International Symposium on Resilient Control Systems (ISRCS), Boise, ID, USA, 9–11 August 2011; IEEE: Piscataway Township, NJ, USA, 2011; pp. 26–32. [Google Scholar]
Mendel, J.M.; John, R.I.B. Type-2 fuzzy sets made simple. Fuzzy Syst. 2002, 10, 117–127. [Google Scholar] [CrossRef]
Ozen, T.; Garibaldi, J.M. Effect of type-2 fuzzy membership function shape on modelling variation in human decision making. In Proceedings of the IEEE International Conference on Fuzzy Systems, Budapest, Hungary, 25–29 July 2004. [Google Scholar]
Luce, R.D.; Raiffa, H. Games and Decisions; Wiley: New York, NY, USA, 1957. [Google Scholar]
Haouari, B.; Amor, N.B.; Elouedi, Z.; Mellouli, K. Naïve possibilistic network classifiers. Fuzzy Sets Syst. 2009, 160, 3224–3238. [Google Scholar] [CrossRef]
Benferhat, S.; Tabia, K. An efficient algorithm for naive possibilistic classifiers with uncertain inputs. In Scalable Uncertainty Management; Springer: Berlin, Germany, 2008; pp. 63–77. [Google Scholar]
Bounhas, M.; Hamed, M.G.; Prade, H.; Serrurier, M.; Mellouli, K. Naive possibilistic classifiers for imprecise or uncertain numerical data. Fuzzy Sets Syst. 2014, 239, 137–156. [Google Scholar] [CrossRef]
Bounhas, M.; Mellouli, K.; Prade, H.; Serrurier, M. Possibilistic classifiers for numerical data. Soft Comput. 2013, 17, 733–751. [Google Scholar] [CrossRef]
Kikuchi, S.; Perincherry, V. Handling Uncertainty in Large Scale Systems with Certainty and Integrity. Available online: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.132.7069 (accessed on 30 December 2020).
Yager, R.R. Set measure directed multi-source information fusion. Fuzzy Syst. 2011, 19, 1031–1039. [Google Scholar] [CrossRef]
Yager, R.R. Hard and soft information fusion using measures. In Proceedings of the 2010 IEEE International Conference on Intelligent Systems and Knowledge Engineering, Hangzhou, China, 15–16 November 2010. [Google Scholar]

Figure 1. Structure of a multisource classification system (Source: [11]).

Figure 2. Comparative example of the maximum possibility and maximum necessity measures decision criteria using four a posteriori possibility distributions.

Figure 3. Confidence indices associated with different decisions (

A_{m_{0}}

: event having the highest possibility degree,

A_{m_{1}}

: event with the second highest possibility degree). (Source: [11]).

Figure 4. Multi-source information context for pattern classification.

Figure 5. Two-sensors simulated images representing a scene of five thematic classes. Pixels from

I_{1}

(resp.

I_{2}

) are generated using Gaussian (resp. Rayleigh) density functions.

Figure 6. Triangular-shaped possibility distributions encoding expert’s knowledge, for the five thematic classes and the two sensors.

Table 1. PML decision making behavior for several cases.

	Case 1			Case 2			Case 3
	$P r {\cdot \| x_{1}}$	$π_{x_{2}} (\cdot)$	$υ (\cdot)$	$P r {\cdot \| x_{1}}$	$π_{x_{2}} (\cdot)$	$υ (\cdot)$	$P r {\cdot \| x_{1}}$	$π_{x_{2}} (\cdot)$	$υ (\cdot)$
C₁	0.1	0.2	0.02	0.1	1.0	0.1	0.2	0.8	0.16
C₂	0.4	0.7	0.28	0.4	1.0	0.4	0.2	0.2	0.04
C₃	0.1	0.3	0.03	0.1	1.0	0.1	0.2	0.1	0.02
C₄	0.3	0.7	0.21	0.3	1.0	0.3	0.2	0.1	0.02
C₅	0.1	0.1	0.10	0.1	1.0	0.1	0.2	0.2	0.04
	Case 4			Case 5			Case 6
	$P r {\cdot \| x_{1}}$	$π_{x_{2}} (\cdot)$	$υ (\cdot)$	$P r {\cdot \| x_{1}}$	$π_{x_{2}} (\cdot)$	$υ (\cdot)$	$P r {\cdot \| x_{1}}$	$π_{x_{2}} (\cdot)$	$υ (\cdot)$
C₁	0.1	1.0	0.1	0.1	0.8	0.08	0.1	0.8	0.08
C₂	0.4	0.2	0.08	0.4	0.2	0.08	0.4	0.3	0.12
C₃	0.1	0.0	0.0	0.1	0.1	0.01	0.1	0.4	0.04
C₄	0.3	0.0	0.0	0.3	0.3	0.03	0.35	0.5	0.16
C₅	0.1	0.0	0.0	0.1	0.1	0.02	0.1	0.5	0.05

Table 2. PML decision average pixel-based recognition rates for the five thematic classes using various configurations of knowledge sources.

Knowledge Sources	Source S₁: Probabilistic (G: Gaussian) Source S₂: Epistemic (Expert)			Source S₂: Probabilistic (R: Rayleigh) Source S₁: Epistemic (Expert)			Both Sources Are Probabilistic
Knowledge Sources	$S_{1}$	$S_{2}$	$S_{1} \underset{PML}{\underset{︸}{\oplus}} S_{2}$	$S_{2}$	$S_{1}$	$S_{1} \underset{PML}{\underset{︸}{\oplus}} S_{2}$	$S_{1} \oplus S_{2}$
Criterion	$τ (P r_{C_{m}}^{G})$	$τ (π_{C_{m}}^{G})$	$τ (P r_{C_{m}}^{G} \cdot π_{C_{m}}^{R})$	$τ (P r_{C_{m}}^{R})$	$τ (π_{C_{m}}^{G})$	$τ (P r_{C_{m}}^{R} \cdot π_{C_{m}}^{G})$	$τ (P r_{C_{m}}^{G} \cdot P r_{C_{m}}^{R})$
C₁	0.96	0.33	0.96	0.34	0.96	0.64	0.96
C₂	0.90	0.32	0.91	0.49	0.98	0.68	0.92
C₃	0.78	0.91	0.81	0.89	0.67	0.90	0.91
C₄	0.94	0.25	0.95	0.32	0.96	0.61	0.97
C₅	0.83	0.72	0.86	0.58	0.75	0.71	0.94
Average Recognition Rate	0.88	0.51	0.90	0.52	0.86	0.71	0.94

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

A New Hybrid Possibilistic-Probabilistic Decision-Making Scheme for Classification

Abstract

1. Introduction

2. Hard Decision in the Bayesian Framework

3. Brief Review of Possibility Theory

3.1. Possibility Distribution

3.2. Possibility and Necessity Measures

4. Decision-Making in the Possibility Theory Framework

4.1. Decision Rule Based on the Maximum of Possibility

4.2. Decision Rule Based on Maximizing the Necessity Measure

4.3. Decision Rule Based on Maximizing the Confidence Index

5. Possibilistic Maximum Likelihood (PML) Decision Criterion

5.1. Sources with Probabilistic and Possibilistic Types of Uncertainties

5.2. Possibilistic Maximum Likelihood (PML) Decision Criterion: A New Hybrid Criterion

5.3. PML Decision Criterion Behavior

6. Experimental and Validation Results

7. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics