A New Hybrid Possibilistic-Probabilistic Decision-Making Scheme for Classification

Uncertainty is at the heart of decision-making processes in most real-world applications. Uncertainty can be broadly categorized into two types: aleatory and epistemic. Aleatory uncertainty describes the variability in the physical system where sensors provide information (hard) of a probabilistic type. Epistemic uncertainty appears when the information is incomplete or vague, such as judgments or human expert appreciations in linguistic form. Linguistic information (soft) typically introduces a possibilistic type of uncertainty. This paper is concerned with the problem of classification where the available information concerning the observed features may be of a probabilistic nature for some features and of a possibilistic nature for others. In this configuration, most encountered studies transform one of the two information types into the other form and then apply either classical Bayesian-based or possibilistic-based decision-making criteria. In this paper, a new hybrid decision-making scheme is proposed for classification when hard and soft information sources are present. A new Possibilistic Maximum Likelihood (PML) criterion is introduced to improve classification rates compared to a classical approach using only information from hard sources. The proposed PML makes it possible to jointly exploit both probabilistic and possibilistic sources within the same probabilistic decision-making framework, without requiring the conversion of possibilistic sources into probabilistic ones, or vice versa.


Introduction
Uncertainty can be categorized into two main kinds [1]: aleatory (or randomness) uncertainty, aka statistical uncertainty, due to the variability or the natural randomness in a process; and epistemic uncertainty, aka systematic uncertainty, which is the scientific uncertainty in the model of the process, due to limited data and knowledge. Epistemic uncertainty calls for alternative methods of representation, propagation, and interpretation of uncertainty beyond probability alone. Since the beginning of the 1960s, following fruitful cross-fertilization, a convergence has been emerging between physics, engineering, mathematics, and the cognitive sciences to provide new techniques and models, showing a trend of inspiration from human brain mechanisms towards a unified theory to represent knowledge, belief and uncertainty [2][3][4][5][6][7][8][9].
Uncertainty is a natural and unavoidable part of real-world applications. When observing a "real-world situation", decision making is the process of selecting among several alternatives or decisions.
The problem here is to assign to measurements, or other types of observations (data) coming from sensors or other sources, a label or a class to which the observations are assumed to belong. This is a typical classification process.
As shown in Figure 1, the general classification process can be formulated as follows. An input set of observations o (o ∈ Ψ) is "observed" using a sensor (or a set of sensors) delivering a feature vector x ∈ Θ.
Figure 1. Structure of a multisource classification system (Source: [11]).
The development of both classification algorithms and decision-making criteria are governed by several factors mainly depending on the nature of the feature vector, the nature of the imperfection attached to the observed features as well as the available knowledge characterizing each decision. Several global constraints also drive the conception of the global classification process: the "physical" nature and quality of the measures delivered by the sensors, the categories discrimination capacity of the computed features, the nature and the quality of the available knowledge used for the development of the decision-making system.
However, in much of the literature, the decision-making system is performed by the application of two successive functionalities: the soft labeling and the hard decision (selection) functionalities. The labeling functionality [12] uses the available a priori knowledge in order to perform a mapping ℓ between the features set Θ and the decisions set Ω (ℓ : Θ → Ω). For each feature vector x ∈ Θ, a soft decision label vector ℓ(x) = [ℓ_{C_1}(x), . . . , ℓ_{C_m}(x), . . . , ℓ_{C_M}(x)] ∈ [0, 1]^M is determined in the light of the available knowledge, where ℓ_{C_m}(x) measures the degree of belief, or support, that we have in the occurrence of the decision C_m. For instance, if the available knowledge allows probabilistic computations, the soft decision label vector is given through ℓ_{C_m}(x) = Pr{C_m|x} [13], where Pr{C_m|x} represents the a posteriori probability of the decision C_m given the observed feature vector x ∈ Θ [14]. When the available knowledge is expressed in terms of ambiguous information, the possibility theory formalism (developed by L. Zadeh [7] and D. Dubois et al. [15][16][17]) can be used. The soft decision label vector ℓ(x) is then expressed with an a posteriori possibility distribution π_x defined on the decisions set Ω. In this case, ℓ_{C_m}(x) = π_x(C_m), where π_x(C_m) represents the possibility degree for the decision C_m to occur, given the observed feature vector x ∈ Θ.
The second functionality performed by the decision-making process is called the hard decision or the selection functionality. As the ultimate goal of most classification applications is to select one and only one class (associated with the observations "o" for which the feature vector x ∈ Θ is extracted) out of the classes set Ω, then a mapping has to be applied in order to transform the soft decision label vector (x) into a hard decision label vector for which one and only one decision is selected. The goal is then to make a choice according to an optimality criterion.
In this paper, we propose a new criterion for the decision-making process in classification systems, called the possibilistic maximum likelihood (PML). This criterion is framed within possibility theory, but it uses corresponding notions from Bayesian decision-making. The main motivation behind the development of PML is multisource information fusion, where an object or a pattern may be observed through several channels and where the available information concerning the observed features may be of a probabilistic nature for some features, and of an epistemic nature for some others.
In the presence of both types of information sources, most encountered studies transform one of the two information types into the other form, and then apply either the classical Bayesian or possibilistic decision-making criteria. With the PML decision-making approach, the Bayesian decision-making framework is adopted. The epistemic knowledge is integrated into the decision-making process by defining possibilistic loss values instead of the usually used zero-one loss values. A set of possibilistic loss values is proposed and evaluated in the context of pixel-based image classification, where a synthetic scene, composed of several thematic classes, is randomly generated using two types of probabilistic sensors, a Gaussian and a Rayleigh sensor, complemented by an expert type of information source. Results obtained with the proposed PML criterion show that the classification recognition rates approach the optimal case, i.e., the case where all the available information is expressed in terms of probabilistic knowledge.
When the sources of information can be modelled by probability theory, the Bayesian approach has sufficient decision-making tools to fuse that information and perform classification. However, in the case where the knowledge available for the decision-making process is ill-defined, in the sense that it is totally or partially expressed in terms of ambiguous information representing limitations in feature values, or encoding linguistic experts' knowledge about the relationship between the feature values and the different potential decisions, new mathematical tools (i.e., PML) need to be developed. This type of available knowledge can be represented as a conditional possibilistic soft decision label vector ℓ(x) defined on the decisions set Ω such that ℓ_{C_m}(x) = π_x(C_m) = π(C_m|x), where π(C_m|x) represents the possibility degree for the decision C_m to occur, given the observed feature vector x ∈ Θ and the underlying observations o.
Possibility theory constitutes the natural framework for tackling this type of information imperfection (called the epistemic uncertainty type) when one and only one decision (hard decision) must be selected from the exhaustive decisions set Ω, with incomplete, ill-defined or ambiguous available knowledge thus encoded as a possibility distribution over Ω. This paper proposes a joint decision-making criterion which makes it possible to integrate such extra possibilistic knowledge within a probabilistic decision-making framework, taking into account both types of information: possibilistic and probabilistic. In spite of the fact that possibility theory deals with uncertainty, which means that a unique but unknown elementary decision is to occur and the ultimate goal is to determine this decision, there are relatively few studies that tackle this decision-making issue [18][19][20][21][22][23][24][25]. We must, however, mention the considerable contributions of Dubois and Prade [26] on possibility theory, as well as their clarification of the various semantics of fuzzy sets [27][28][29][30]. Denoeux et al. [31][32][33] also contributed significantly to this topic, but they consider epistemic uncertainty as a higher-order uncertainty upon probabilistic models, such as in the imprecise probabilities of Walley [34,35] and type-2 fuzzy sets [36][37][38], which is not the case in the current paper.
The paper is organized as follows. A brief recall of the Bayesian decision-making criteria and of possibility theory is given in Sections 2 and 3, respectively. Three major possibilistic decision-making criteria, i.e., maximum possibility, maximum necessity measure and confidence index maximization, are detailed in Section 4. The PML criterion is presented in Section 5, followed by its evaluation in Section 6. The paper closes with a conclusion in Section 7.

Hard Decision in the Bayesian Framework
In the Bayesian classification framework, the most widely used hard decision is based on minimizing an overall decision risk function [14]. Assuming o ∈ Ψ is the pattern for which the feature vector x ∈ Θ is observed, let λ_{m,n} denote a "predefined" conditional loss, or penalty, incurred for deciding that the observed pattern o is associated with the decision C_n, whereas the true decision (class or category) for o is C_m (n, m ∈ {1, . . . , M}). Therefore, the probabilistic expected loss R(C_n|x), also called the conditional risk, associated with the decision C_n given the observed feature vector x ∈ Θ, is given by:

R(C_n|x) = E{λ_{m,n}|x} = Σ_{m=1,...,M} λ_{m,n} Pr{C_m|x} (1)

where E{·} stands for the mathematical expectation. The Bayes decision criterion consists in minimizing the overall risk R, also called the Bayes risk, defined in (2), by computing the conditional risk for all decisions and then selecting the decision C_n for which R(C_n|x) is minimum:

R = E{R(C_n|x)} (2)

Therefore, the minimum-risk Bayes decision criterion is based on the selection of the decision C_n which gives the smallest risk R(C_n|x). This rule can thus be formulated as follows:

Decide C_{n0} such that n0 = argmin_{n=1,...,M} R(C_n|x) (3)

If Pr{C_m} denotes the a priori probability of the decision C_m and Pr{x|C_m} the likelihood function of the measured feature vector x given the decision C_m, then, using Bayes' rule, the minimum-risk Bayes decision criterion (3) can be rewritten as:

Decide C_{n0} such that n0 = argmin_{n=1,...,M} Σ_{m=1,...,M} λ_{m,n} Pr{x|C_m} Pr{C_m} (4)

In the two-category decision case, i.e., Ω = {C_1, C_2}, it can easily be shown that the minimum-risk Bayes decision criterion, simply called the Bayes criterion, can be expressed as in (5):

Decide C_1 if Pr{x|C_1}/Pr{x|C_2} > η, and C_2 otherwise, with η = [(λ_{2,1} − λ_{2,2}) Pr{C_2}] / [(λ_{1,2} − λ_{1,1}) Pr{C_1}] (5)

In other words, this decision criterion consists of comparing the likelihood ratio (LR) Pr{x|C_1}/Pr{x|C_2} to a threshold η independent of the observed feature vector x. The binary cost, or zero-one loss, assignment is commonly used in classification problems.
This rule, expressed in (6), assigns no cost to a correct decision (when the true pattern class/decision C_m is identical to the decided class/decision C_n) and a unit cost to a wrong decision (when the true class/decision C_m is different from the decided class/decision C_n):

λ_{m,n} = 0 if m = n, and λ_{m,n} = 1 if m ≠ n (6)
It should be noticed that this binary cost assignment considers all errors as equally costly. It also leads to express the conditional risk as:

R(C_n|x) = Σ_{m≠n} Pr{C_m|x} = 1 − Pr{C_n|x} (7)

A decision minimizing the conditional risk R(C_n|x) thus becomes a decision maximizing the a posteriori probability Pr{C_n|x}. As shown in (8), this version of the Bayes criterion is called the maximum a posteriori (MAP) criterion, since it seeks to determine the decision maximizing the a posteriori probability value:

Decide C_{n0} such that n0 = argmax_{n=1,...,M} Pr{C_n|x} (8)

It is also obvious that this decision process corresponds to the minimum-error decision rule, which leads to the best recognition rate that a decision criterion can achieve. When the decisions' a priori probabilities Pr{C_m} and the likelihood functions Pr{x|C_m} are not available, or simply difficult to obtain, the Minmax Probabilistic Criterion (MPC) can be an interesting alternative to the minimum-risk Bayes decision criterion [39].
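As a minimal numerical sketch (ours, not from the paper), the following fragment computes the conditional risks for a toy posterior and checks that, under the zero-one loss of (6), the minimum-risk decision coincides with the MAP decision:

```python
def bayes_decision(posteriors, loss):
    """Minimum-risk Bayes decision: select the class index n minimizing
    the conditional risk R(C_n|x) = sum_m loss[m][n] * Pr{C_m|x}.

    posteriors : list of length M, posteriors[m] = Pr{C_m|x}
    loss       : M x M matrix, loss[m][n] = lambda_{m,n}
    """
    M = len(posteriors)
    risks = [sum(loss[m][n] * posteriors[m] for m in range(M)) for n in range(M)]
    return min(range(M), key=risks.__getitem__)

# Toy posterior over M = 3 classes (illustrative values only).
post = [0.2, 0.5, 0.3]

# Zero-one loss (6): the minimum-risk decision reduces to the MAP rule.
zero_one = [[0 if m == n else 1 for n in range(3)] for m in range(3)]
assert bayes_decision(post, zero_one) == max(range(3), key=post.__getitem__)
```

With the zero-one loss, the computed risks are simply 1 − Pr{C_n|x}, so minimizing the risk and maximizing the posterior select the same class.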

Brief Review of Possibility Theory
Possibility theory is a relatively recent theory devoted to handling uncertainty in contexts where the available knowledge is only expressed in an ambiguous form. This theory was first introduced by Zadeh in 1978 as an extension of fuzzy sets and fuzzy logic theory, to express the intrinsic fuzziness of natural languages as well as uncertain information [7]. It is well established that probabilistic reasoning, based on the use of a probability measure, constitutes the optimal approach for dealing with uncertainty. In the case where the available knowledge is ambiguous and encoded by a membership function, i.e., a fuzzy set, defined over the decisions set, possibility theory transforms the membership function into a possibility distribution π. The realization of each event (subset of the decisions set) is then bounded by a possibilistic interval defined through a possibility measure, Π, and a necessity measure, N [16]. The use of these two dual measures is the main difference between possibility theory and probability theory. Besides, possibility theory is not additive in terms of belief combination, and makes sense on ordinal structures [17]. In the following subsections, the basic concepts of a possibility distribution and the dual possibilistic measures (possibility and necessity measures) are presented. The possibilistic decision rules are detailed in Section 4. Full details can be found in [11].

Possibility Distribution
Let Ω = {C_1, C_2, . . . , C_M} be a finite and exhaustive set of M mutually exclusive elementary decisions (e.g., decisions, thematic classes, hypotheses, etc.). Exclusiveness means that one and only one decision may occur at a time, whereas exhaustiveness states that the occurring decision certainly belongs to Ω. Possibility theory is based on the notion of a possibility distribution, denoted by π, which maps elementary decisions from Ω to the interval [0, 1], thus encoding "our" state of knowledge, or belief, on the possible occurrence of each class C_m ∈ Ω. The value π(C_m) represents to what extent it is possible for C_m to be the unique occurring decision. In this context, two extreme cases of knowledge are given:
Complete knowledge: ∃! C_m ∈ Ω such that π(C_m) = 1 and π(C_n) = 0, ∀ C_n ∈ Ω, C_n ≠ C_m.
Complete ignorance: ∀ C_m ∈ Ω, π(C_m) = 1 (all elements from Ω are considered as totally possible).
π(·) is called a normal possibility distribution if there exists at least one element C_{m0} in Ω such that π(C_{m0}) = 1.
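The two extreme states of knowledge and the normality condition can be illustrated with a small sketch (ours, for illustration only); note that, unlike probabilities, possibility degrees need not sum to one:

```python
def is_normal(pi):
    """pi is a normal possibility distribution if some decision has degree 1."""
    return max(pi.values()) == 1.0

# Complete knowledge: exactly one decision is fully possible, all others impossible.
complete = {"C1": 1.0, "C2": 0.0, "C3": 0.0}
# Complete ignorance: every decision is considered totally possible.
ignorance = {c: 1.0 for c in ("C1", "C2", "C3")}

assert is_normal(complete) and is_normal(ignorance)
assert sum(ignorance.values()) > 1.0  # possibility degrees need not sum to 1
```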

Possibility and Necessity Measures
Based on the possibility distribution concept, two dual set measures, the possibility, Π, and the necessity, N, measures, are derived. For every subset (or event) A ⊆ Ω, these measures are defined by:

Π(A) = max_{C_m ∈ A} π(C_m) and N(A) = 1 − Π(A^c) = min_{C_m ∉ A} [1 − π(C_m)] (10)

where A^c denotes the complement of A in Ω. Π(A) = 1 means that the occurrence of A is totally possible, whereas N(A) = 1 means that the occurrence of A is totally certain. In a classification problem, where each decision C_m refers to a given class or category, the case where all events A are composed of a single decision (A_m = {C_m}, m = 1, . . . , M) is of particular interest. In this case, the possibility, Π(·), and the necessity, N(·), measures are reduced to:

Π({C_m}) = π(C_m) and N({C_m}) = 1 − max_{n≠m} π(C_n) (11)
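A short sketch (ours) of the two dual measures; the singleton forms of Equation (11) follow directly from the set definitions:

```python
def possibility(pi, A):
    """Possibility measure: Pi(A) = max of pi(C_m) over C_m in A."""
    return max(pi[c] for c in A)

def necessity(pi, A):
    """Necessity measure: N(A) = 1 - Pi(complement of A)."""
    complement = set(pi) - set(A)
    return 1.0 - possibility(pi, complement) if complement else 1.0

pi = {"C1": 1.0, "C2": 0.6, "C3": 0.2}
# Singleton events (Eq. (11)): Pi({C1}) = pi(C1), N({C1}) = 1 - max of the others.
assert possibility(pi, {"C1"}) == 1.0
assert abs(necessity(pi, {"C1"}) - 0.4) < 1e-9
```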

Decision-Making in the Possibility Theory Framework
In this section, we investigate existing possibilistic decision-making rules. Two families of rules can be distinguished: rules based on the direct use of the information encapsulated in the possibility distribution, and rules based on the use of uncertainty measures associated with this possibility distribution. Let Ω = {C_1, C_2, . . . , C_M} be a finite and exhaustive set of M mutually exclusive elementary decisions. Given a pattern o ∈ Ψ for which the feature vector x ∈ Θ is observed, let π_x(C_m) denote the a posteriori possibility distribution π(C_m|x) defined on Ω. The possibility, Π_x({C_m}), and necessity, N_x({C_m}), measures are obtained as expressed in Equation (11), using the possibility distribution π_x(C_m).

Decision Rule Based on the Maximum of Possibility
The decision rule based on the maximum of possibility is certainly the most widely used in possibilistic classification decision-making applications. Indeed, as shown in (12), this rule is based on the selection of the elementary decision C_{m0} ∈ Ω having the highest possibility degree of occurrence Π_x({C_{m0}}):

Decide C_{m0} such that m0 = argmax_{m=1,...,M} Π_x({C_m}) = argmax_{m=1,...,M} π_x(C_m) (12)

A first mathematical justification of this "intuitive" possibilistic decision-making rule can be derived from the Minmax Probabilistic Criterion (MPC), Equation (9), using a binary cost assignment rule. Indeed, "converting" the a posteriori possibility distribution π_x(·) into an a posteriori probability distribution Pr{·|x} is assumed to respect the three following constraints [30]: (a) the consistency principle, (b) the preference ordering preservation, and (c) the least commitment principle. The preference ordering preservation, on which we focus the attention here, means that if decision C_{m1} is preferred to decision C_{m2}, i.e., π_x(C_{m1}) > π_x(C_{m2}), then the a posteriori probability distribution Pr{·|x} obtained from π_x(·) should satisfy Pr{C_{m1}|x} > Pr{C_{m2}|x}. Equation (13) sums up this preference ordering preservation constraint:

π_x(C_{m1}) > π_x(C_{m2}) ⟹ Pr{C_{m1}|x} > Pr{C_{m2}|x} (13)

Therefore, selecting the decision maximizing the a posteriori probability and selecting the decision maximizing the a posteriori possibility are identical: using the MPC associated with the binary cost assignment rule or using the maximum possibility decision rule leads to an identical result, as expressed in (14):

argmax_{m=1,...,M} Pr{C_m|x} = argmax_{m=1,...,M} π_x(C_m) (14)
This decision-making criterion is called the naive Bayes style possibilistic criterion [40][41][42], and most ongoing efforts are oriented towards the computation of the a posteriori possibility values using numerical data [43]. An extensive study of the properties of, and equivalences between, possibilistic and probabilistic approaches is presented in [20]. Notice that this decision rule, strongly inspired by probabilistic decision reasoning, does not provide any indication of the confidence that can be granted to the selected decision.
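For illustration (ours), the maximum possibility rule of (12) reduces to an argmax over the a posteriori possibility distribution:

```python
def max_possibility_decision(pi):
    """Maximum possibility rule (Eq. (12)): select the elementary decision
    with the highest a posteriori possibility degree pi_x(C_m)."""
    return max(pi, key=pi.get)

# A posteriori possibility distribution over the decision set (toy values).
pi_x = {"C1": 0.3, "C2": 1.0, "C3": 0.7}
assert max_possibility_decision(pi_x) == "C2"
```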

Decision Rule Based on Maximizing the Necessity Measure
It is worthwhile to notice that the a posteriori measures of possibility Π_x and necessity N_x, coming from a normal a posteriori possibility distribution π_x(·), constitute a bracketing of the a posteriori probability distribution Pr{·|x} [17]:

N_x({C_m}) ≤ Pr{C_m|x} ≤ Π_x({C_m}), m = 1, . . . , M (15)

Therefore, the maximum possibility decision criterion can be considered as an optimistic decision criterion, as it maximizes the upper bound of the a posteriori probability distribution. On the contrary, a pessimistic decision criterion based on maximizing the a posteriori necessity measure can be considered as a maximization of the lower bound of the a posteriori probability distribution. Equation (16) expresses this pessimistic decision criterion:

Decide C_{m0} such that m0 = argmax_{m=1,...,M} N_x({C_m}) (16)

The question that we must raise concerns the "links" between the optimistic and the pessimistic decision criteria. Let us consider the a posteriori possibility distribution π_x(·) for which C_{m1} (resp. C_{m2}) is the "winning decision" obtained using the maximum possibility (resp. necessity measure) decision criterion, as given in (17):

C_{m1} = argmax_{C_m ∈ Ω} π_x(C_m) and C_{m2} = argmax_{C_m ∈ Ω} N_x({C_m}) (17)

The following important question can then be formulated: is the winning decision C_{m1} (according to the maximum possibility criterion) the same as the winning decision C_{m2} (according to the maximum necessity measure criterion)? First, notice that if several elementary decisions share the same maximum possibility value υ = π_x(C_{m1}), then the necessity measure becomes a useless decision criterion, since N_x({C_m}) = 1 − max_{n≠m} π_x(C_n) = 1 − υ for every C_m ∈ Ω: all decisions receive the same necessity value. Now, suppose that only one decision C_{m1} assumes the maximum possibility value υ = π_x(C_{m1}); it is important to raise the question whether the decision C_{m1} will (or will not) be the decision assuming the maximum necessity measure value. Let us denote by υ′ the possibility value of the "second best" decision according to the possibility value criterion. As C_{m1} is the unique decision having the maximum possibility value υ, we have υ′ < υ.
Therefore, as shown in (18), the necessity measure value N_x({C_m}) reaches its maximum only for the decision C_{m1}:

N_x({C_{m1}}) = 1 − υ′ > N_x({C_m}) = 1 − υ, ∀ m ≠ m1 (18)

As a conclusion, when the maximum necessity measure criterion is useful for application (i.e., only one elementary decision assumes the maximum possibility value), both decision criteria (maximum possibility and maximum necessity) produce the same winning decision. In order to illustrate the difference between the maximum possibility and the maximum necessity measure criteria, Figure 2 presents an illustrative example.
In the example of Figure 2, four different a posteriori possibility distributions π_1, π_2, π_3, π_4, all defined on a set of five elementary decisions Ω = {C_1, C_2, C_3, C_4, C_5}, are considered. The necessity measures N_k({C_m}) have been computed from the corresponding possibility distributions π_k. The underlined values indicate which decisions result from the maximum possibility decision criterion as well as from the maximum necessity measure decision criterion, for the four possibility distributions π_k. Note that the necessity measure assumes at most two values whatever the considered possibility distribution. When the a posteriori possibility distribution has one and only one decision having the highest possibility degree, then both decision rules produce the same winning decision. This is the case of the normal possibility distribution π_1 as well as of the subnormal possibility distribution π_3, indicated as cases (a) and (c) in Figure 2.
When several elementary decisions share the same highest possibility degree, the maximum possibility decision criterion can randomly select one of these potential winning decisions. In this case, the maximum necessity measure decision criterion will assign a single necessity degree to all elementary decisions from Ω, and thus it will be impossible to select any of the potential winning decisions. This behavior can be observed with a normal possibility distribution such as π_2, as well as with a subnormal possibility distribution such as π_4, cases (b) and (d) in Figure 2. This example clearly shows the weakness of the decisional capacity of the maximum necessity measure decision criterion when compared to the maximum possibility decision criterion.
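The tie behavior described above can be checked numerically (our sketch): with a unique maximum both criteria agree, while tied maxima make the necessity measure constant over Ω:

```python
def necessity_singleton(pi, m):
    """N({C_m}) = 1 - max over n != m of pi(C_n) (Eq. (11))."""
    return 1.0 - max(v for c, v in pi.items() if c != m)

# Unique maximum: both criteria agree on the winner.
pi_unique = {"C1": 1.0, "C2": 0.5, "C3": 0.25}
winner_pos = max(pi_unique, key=pi_unique.get)
winner_nec = max(pi_unique, key=lambda c: necessity_singleton(pi_unique, c))
assert winner_pos == winner_nec == "C1"

# Tie on the maximum possibility degree: the necessity measure becomes
# constant over the decision set and cannot discriminate.
pi_tie = {"C1": 1.0, "C2": 1.0, "C3": 0.25}
necessities = {necessity_singleton(pi_tie, c) for c in pi_tie}
assert necessities == {0.0}
```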


Decision Rule Based on Maximizing the Confidence Index
Other possibilistic decision rules, based on the use of uncertainty measures, are also encountered in the literature. The most frequently used criterion (proposed by Kikuchi et al. [44]) is based on the maximization of the confidence index Ind, defined as a combination of the possibility and the necessity measures for each event A ⊆ Ω, given a possibility distribution π(·):

Ind(A) = Π(A) + N(A) − 1, A ∈ 2^Ω (19)

where 2^Ω denotes the power set of Ω, i.e., the set of all subsets of Ω. For a singleton event A_m = {C_m}, Equation (11) gives Ind(A_m) = π(C_m) − max_{n≠m} π(C_n). Therefore, if A_{m0} = {C_{m0}} is the only event having the highest possibility measure value π(C_{m0}), then A_{m0} will be the unique event having a positive confidence index value, whereas all other events will have negative values, as illustrated in Figure 3, where we assume π(C_{m0}) > π(C_m), ∀m ≠ m0, and C_{m1} refers to the decision having the second highest possibility degree. In a classification decision-making problem, the decision criterion associated with this index can be formulated as follows:

Decide C_{m0} such that m0 = argmax_{m=1,...,M} Ind({C_m}) (20)

The main difference between the maximum possibility and the maximum confidence index decision criteria lies in the fact that the maximum possibility decision criterion is only based on the maximum possibility degree, whereas the maximum confidence index decision criterion is based on the difference between the two highest possibility degrees associated with the elementary decisions. As already mentioned, it is important to notice that the event A_{m0} = {C_{m0}} having the highest possibility value will be the unique event producing a positive confidence index, measuring the difference with the second highest possibility degree. All other events A_m = {C_m}, ∀m ≠ m0, will produce negative confidence indices.
When several decisions share the same highest possibility degree, their confidence index (the highest one) will be null. This illustrates the actual decisional capacity of this uncertainty measure for the decision-making process. However, this criterion yields the same resulting decisions as the two former ones.
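A small sketch (ours) of the singleton confidence index, written directly as the gap between a decision's possibility degree and the best of the others:

```python
def confidence_index(pi, m):
    """Singleton confidence index: Ind({C_m}) = pi(C_m) - max over n != m
    of pi(C_n). It is positive only for a unique maximum, null for tied
    maxima, and negative otherwise."""
    return pi[m] - max(v for c, v in pi.items() if c != m)

pi = {"C1": 1.0, "C2": 0.75, "C3": 0.25}
ind = {c: confidence_index(pi, c) for c in pi}
# Only the unique best decision gets a positive index, equal to the gap
# between the two highest possibility degrees.
assert ind["C1"] == 0.25 and ind["C2"] < 0 and ind["C3"] < 0
assert max(ind, key=ind.get) == max(pi, key=pi.get)
```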

Possibilistic Maximum Likelihood (PML) Decision Criterion
In the formulation of the Bayesian classification approach, all information sources are assumed to have probabilistic uncertainty where the available knowledge describing this uncertainty is expressed, estimated or evaluated in terms of probability distributions. In the possibilistic classification framework, the information sources are assumed to suffer from possibilistic (or epistemic) uncertainty where the available knowledge describing this uncertainty is expressed in terms of possibility distributions. In this section, the Bayesian pattern recognition framework is generalized in order to integrate both probabilistic and epistemic sources of knowledge. A joint probabilistic-possibilistic decision criterion called Possibilistic Maximum Likelihood (PML) is proposed to handle both types of uncertainties.

Sources with Probabilistic and Possibilistic Types of Uncertainties
In some situations, an object from the observation space is observed through several feature sets. This is the case, for instance, in a multi-sensor environment for classification applications. In such situations, the information available for the description of the feature vectors may be of different natures: probabilistic, epistemic, etc. Yager [24,45,46] addresses the same sort of problem: multi-source uncertain information fusion in the case when the information can come both from hard sensors of a probabilistic type and from soft knowledge (expert linguistic) sources of a possibilistic type. He uses t-norms ('and' operations) to combine possibility and probability measures. As will be explained below, Yager's product of possibilities and probabilities coincides with our 'decision variables' optimized through the proposed PML approach.
Let us consider the example illustrated in Figure 4, where each pattern o (from the patterns set Ψ) is "observed" through two channels. Source 1 (resp. Source 2) measures a sub-feature vector x_1 ∈ Θ_1 (resp. x_2 ∈ Θ_2). Therefore, the resulting feature vector x(o) is obtained as the concatenation of the two sub-feature vectors: x(o) = [x_1 x_2]. In this configuration, the available information in the sub-feature vector x_1 (resp. x_2) undergoes probabilistic (resp. epistemic) uncertainties and is encoded as an a posteriori probability soft decision label vector ℓ¹_{C_n} = Pr{C_n|x_1}, n = 1, 2, . . . , M (resp. an a posteriori possibility soft decision label vector ℓ²_{C_n} = π_{x_2}(C_n), n = 1, 2, . . . , M).
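Since Yager's product of possibilities and probabilities coincides with the decision variables optimized by the proposed PML approach, a hybrid decision variable can be sketched as follows (our illustration with toy values, not the paper's exact implementation):

```python
def hybrid_decision(post_src1, poss_src2):
    """Hybrid decision variable combining a probabilistic source and a
    possibilistic source through a product t-norm (Yager-style combination):
    d_n = Pr{C_n|x1} * pi_{x2}(C_n). A sketch, not the paper's exact PML."""
    d = [p * q for p, q in zip(post_src1, poss_src2)]
    return max(range(len(d)), key=d.__getitem__)

# Source 1 (hard sensor): a posteriori probabilities over M = 3 classes.
post = [0.45, 0.40, 0.15]
# Source 2 (expert): a posteriori possibility degrees for the same classes.
poss = [0.30, 1.00, 0.80]

# The expert's possibilistic knowledge overturns the purely probabilistic
# MAP choice (class 0) in favour of class 1.
assert max(range(3), key=post.__getitem__) == 0
assert hybrid_decision(post, poss) == 1
```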
As an example, in a remote sensing system, Source 1 may be considered as a multispectral imaging system, where all potential a posteriori probability distributions, Pr{C_n | x_1}, n = 1, 2, . . . , M, are assumed to be known and well established. The second sensor, Source 2, could be a radar imaging system where the available information concerning the different thematic classes is expressed by an expert using ambiguous linguistic variables such as: "the thematic class C_n is observed as 'Bright', 'Slightly Dark', etc., in the sub-feature set Θ_2". Each linguistic variable can be used to generate an a posteriori possibility distribution π_x2(C_n), n = 1, 2, . . . , M, associated with each thematic class.
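To make the two label types concrete, the short Python sketch below (with hypothetical values for M = 3 classes) contrasts the normalization conventions of the two soft decision label vectors: the probabilistic labels are additive and sum to one, while a normalized possibility distribution only requires degrees in [0, 1] with at least one degree equal to one.

```python
import numpy as np

# Source 1 (hard, probabilistic): a posteriori probability soft labels.
# Hypothetical values: Pr{C_n | x1}, n = 1..3.
label_prob = np.array([0.6, 0.3, 0.1])

# Source 2 (soft, possibilistic): a posteriori possibility soft labels.
# Hypothetical values: pi_x2(C_n), n = 1..3.
label_poss = np.array([1.0, 0.7, 0.2])

# Probabilistic normalization: degrees are additive and sum to 1.
print(np.isclose(label_prob.sum(), 1.0))   # True

# Possibilistic normalization: max degree equals 1, so several classes
# may simultaneously be (almost) fully possible.
print(label_poss.max() == 1.0)             # True
```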

Possibilistic Maximum Likelihood (PML) Decision Criterion: A New Hybrid Criterion
In the Bayesian decision framework, detailed in previous sections, the binary cost assignment approach suffers from two constraints. On one hand, all errors are considered equally costly: the penalty (or cost) of misclassifying an observed pattern o as being associated with a decision C_n, whereas the true decision for o is C_m, is always the same (unit loss). This does not reflect the constraints of real applications. For instance, deciding that an examined patient is healthy whereas they suffer from cancer is much more serious than the other way around. On the other hand, the loss function values λ_{m,n}, ∀m, n ∈ {1, 2, . . . , M}, are static (i.e., predefined) and do not depend on the feature vectors of the observed patterns. The possibilistic maximum likelihood (PML) criterion, proposed in this paper, is based on the use of the epistemic source of information (the a posteriori possibility distribution, defined on the sub-feature space Θ_2) in order to define possibilistic loss values and, afterwards, to inject these values into the Bayesian decision criterion.
Assume that, for each object o ∈ Ψ, the observed feature vector is given by x(o) = [x_1 x_2] ∈ Θ_1 × Θ_2, and denote by Pr{· | x_1} (resp. π_x2(·)) the a posteriori probability (resp. possibility) soft decision label vector defined over the sub-feature set Θ_1 (resp. Θ_2). The proposed PML criterion relies on the use of loss values λ_{m,n} ranging from −1 (minimum loss, acting as a reward) to +1 (maximum loss), where λ_{m,n} refers to the risk of choosing C_n whereas the real decision for the considered pattern is C_m. Depending on the epistemic information available through Source 2, the proposed loss values are given as follows. In the case of a wrong decision (n ≠ m), the decision penalty values are positive loss values ranging in the interval 0 ≤ λ_{m,n} = max_{k≠m} π_x2(C_k) ≤ 1. Thus, the wrong-decision unit cost of the binary-cost assignment framework is "softened" in this possibilistic approach, and assumes its maximum value, i.e., unit cost, only when a wrong decision has a total possibility degree of occurrence.
When the correct decision C_m is selected (n = m), the zero-loss value used by the binary-cost assignment approach is substituted by λ_{m,m} = max_{k≠m} π_x2(C_k) − π_x2(C_m). If the occurrence possibility degree π_x2(C_m) of the true decision C_m is the highest degree, i.e., π_x2(C_m) > max_{k≠m} π_x2(C_k), then the resulting loss value λ_{m,m} becomes negative. The smallest penalty value, λ_{m,m} = −1, is reached when π_x2(C_m) = 1 (i.e., the true decision C_m has a total possibility degree of occurrence) and all the remaining decisions have a null possibility degree of occurrence (leading to max_{k≠m} π_x2(C_k) = 0). Two special cases are present: (1) if the true decision C_m shares the same maximum possibility value with at least one wrong decision, then the correct-decision loss value becomes null: λ_{m,m} = max_{k≠m} π_x2(C_k) − π_x2(C_m) = 0; (2) if the true decision C_m does not produce the maximum occurrence possibility degree, i.e., π_x2(C_m) < max_{k≠m} π_x2(C_k), then the loss value λ_{m,m} is positive and will increase the conditional risk associated with the true decision C_m.
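These loss values can be assembled into a loss matrix in a few lines. The Python sketch below is a minimal illustration (not the authors' code), assuming the loss definitions λ_{m,n} = max_{k≠m} π_x2(C_k) for n ≠ m and λ_{m,m} = max_{k≠m} π_x2(C_k) − π_x2(C_m):

```python
import numpy as np

def possibilistic_loss_matrix(pi):
    """Build lam[m, n]: the loss of deciding C_n when the true class is C_m,
    from the a posteriori possibility degrees pi[k] = pi_x2(C_k)."""
    pi = np.asarray(pi, dtype=float)
    M = len(pi)
    lam = np.empty((M, M))
    for m in range(M):
        max_other = np.max(np.delete(pi, m))   # max_{k != m} pi[k]
        lam[m, :] = max_other                  # wrong decisions: positive loss
        lam[m, m] = max_other - pi[m]          # correct decision: may be negative
    return lam

# Total possibility on C_1 and none elsewhere: the correct decision C_1
# receives the smallest penalty (-1), while wrongly deciding C_1 when
# C_2 is true costs the full unit loss.
lam = possibilistic_loss_matrix([1.0, 0.0, 0.0])
print(lam[0, 0])   # -1.0
print(lam[1, 0])   #  1.0
```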
Using the proposed possibilistic loss values, the conditional risk R(C_n | x) of choosing decision C_n can thus be computed as:

R(C_n | x) = Σ_{m=1..M} λ_{m,n} · Pr{C_m | x_1}.

As already mentioned, the Bayes decision criterion computes the conditional risk for all decisions and then selects the decision C_n for which R(C_n | x) is minimum. Based on Equation (23), the comparison of the conditional risks related to two decisions C_k and C_p can be straightforwardly performed, leading to:

R(C_k | x) ≤ R(C_p | x) ⇔ π_x2(C_k) · Pr{C_k | x_1} ≥ π_x2(C_p) · Pr{C_p | x_1}.

Therefore, the application of the PML criterion for the selection of the minimum conditional risk decision (out of M potential elementary decisions) can be simply formulated by the following decision rule: decide C_n* with n* = argmax_{n=1..M} π_x2(C_n) · Pr{C_n | x_1}. This "intuitive" decision criterion allows the joint use of both probabilistic and epistemic sources of information within the very same Bayesian minimum-risk framework. As an example, the application of the proposed possibilistic loss values in the two-class decision case, where Ω = {C_1, C_2}, leads to the following loss matrix [λ]:

[λ] = [ π_x2(C_2) − π_x2(C_1)    π_x2(C_2) ;
        π_x2(C_1)                π_x2(C_1) − π_x2(C_2) ].

The use of this loss matrix [λ] in the minimum-risk Bayes decision approach (as defined in (5)) leads to expressing the PML decision as follows: decide C_1 if π_x2(C_1) · p(x_1 | C_1) · Pr{C_1} ≥ π_x2(C_2) · p(x_1 | C_2) · Pr{C_2}, and C_2 otherwise. Notice that, with the proposed possibilistic loss values, the PML induces a "weighting adjustment" of the a priori probabilities, where the weighting factors are simply the a posteriori possibility degrees issued from the possibilistic Source 2.
In the case of equal a priori probabilities, Pr{C_2} = Pr{C_1}, this decision criterion turns into an intuitive form jointly using probabilistic and epistemic sources of information in the Bayesian minimum-risk framework: decide C_1 if π_x2(C_1) · Pr{C_1 | x_1} ≥ π_x2(C_2) · Pr{C_2 | x_1}, and C_2 otherwise. It is worthwhile to notice that when the two following conditions prevail:
• the available probabilistic information (issued from Source 1) is non-informative; and
• the only meaningful and available information is reduced to the epistemic expert information on the sub-feature vector issued from Source 2;
then the proposed PML criterion simply reduces to the maximum possibility decision criterion: decide C_n* with n* = argmax_n π_x2(C_n). This raises a fundamental interpretation of the maximum possibility decision criterion as a very special case of the possibilistic Bayesian decision-making process under the total ignorance assumption of the probabilistic source of information.
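The reduction of the minimum-risk decision to the product rule can be checked numerically: with loss values λ_{m,n} = max_{k≠m} π_x2(C_k) − δ_{mn}·π_x2(C_m), the term Σ_m max_{k≠m} π_x2(C_k)·Pr{C_m | x_1} is common to all decisions, so minimizing the conditional risk amounts to maximizing π_x2(C_n)·Pr{C_n | x_1}. The Python sketch below (an illustrative verification, not the authors' code) tests this on random soft labels, including the non-informative-probability case:

```python
import numpy as np

def loss_matrix(pi):
    # lam[m, n] = max_{k != m} pi[k], with pi[m] subtracted on the diagonal.
    pi = np.asarray(pi, dtype=float)
    M = len(pi)
    lam = np.empty((M, M))
    for m in range(M):
        lam[m, :] = np.max(np.delete(pi, m))
        lam[m, m] -= pi[m]
    return lam

def conditional_risks(prob, pi):
    # R(C_n | x) = sum_m lam[m, n] * Pr{C_m | x1}
    return loss_matrix(pi).T @ np.asarray(prob, dtype=float)

rng = np.random.default_rng(0)
for _ in range(200):
    prob = rng.dirichlet(np.ones(5))     # random a posteriori probabilities
    pi = rng.random(5)
    pi[rng.integers(5)] = 1.0            # normalized possibility distribution
    # Minimum-risk PML decision coincides with the product decision variable...
    assert np.argmin(conditional_risks(prob, pi)) == np.argmax(pi * prob)
    # ...and with uniform (non-informative) probabilities it reduces to the
    # maximum possibility criterion.
    assert np.argmin(conditional_risks(np.full(5, 0.2), pi)) == np.argmax(pi)
```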

PML Decision Criterion Behavior
Let S_1 denote a probabilistic source of information measuring a sub-feature vector x_1 ∈ Θ_1 and attributing to each elementary decision C_m, m = 1, 2, . . . , M, an a posteriori probability soft decision label Pr{C_m | x_1}. Under the assumption of equal a priori probabilities and using the binary-cost assignment, the application of the maximum a posteriori (MAP) criterion, Equation (8), turns out to be the "optimal" criterion ensuring the minimum-error decision rate.
Assume that an additional possibilistic source of information, S_2 (measuring a sub-feature vector x_2 ∈ Θ_2), is available; see Figure 4. Based on the use of the sub-feature vector x_2 ∈ Θ_2, S_2 attributes to each elementary decision C_m an a posteriori possibility soft decision label π_x2(C_m), m = 1, 2, . . . , M. To obtain a hard decision, the application of the maximum possibility decision criterion, Equation (12), is considered.
In the previous section, we have proposed the possibilistic maximum likelihood (PML) decision criterion, Equation (25), as a hybrid decision criterion allowing the coupled use of both sources of information, S_1 and S_2, by considering the possibilistic information issued from S_2, i.e., π_x2(C_m), m = 1, 2, . . . , M, for the definition of the loss values in the framework of the minimum-risk Bayes decision criterion (instead of the binary-cost assignment approach). In this section, we briefly discuss, from a descriptive point of view and through an illustrative example, the "decisional behavior" of the PML criterion compared to the decisions obtained with the "individual" application of the MAP and the maximum possibility decision criteria.
First, it is worthwhile to notice that the "decision variable" to be maximized by the PML criterion is simply the direct product υ(C_m) = π_x2(C_m)·Pr{C_m | x_1}, m = 1, 2, . . . , M, which is a T-norm fusion operator (considering both probabilistic and possibilistic information as two "similar" measures of the degree of truthfulness related to the occurrence of the different elementary decisions; see also p. 101 of Yager [24]). This also means that both sources of information, S_1 and S_2, are considered as having the same informative level. It is also important to notice that the PML criterion, as a decision fusion operator merging decisional information from both sources S_1 and S_2, constitutes a coherent decision fusion criterion in the sense that:
- when both sources S_1 and S_2 are in full agreement (i.e., leading to the same decision C_m0), the decision obtained by the application of the PML criterion will be the same, C_m0;
- when one of the two sources S_1 and S_2 suffers from total ignorance (i.e., producing equal a posteriori probabilities, for S_1, or equal a posteriori possibilities, for S_2), the PML criterion will "duplicate" the same elementary decision as the one proposed by the remaining reliable source of information;
- when the two sources S_1 and S_2 lack decisional agreement, the decision obtained by the application of the PML criterion will be the most "plausible" elementary decision, which may differ from the individual decisions resulting from the MAP (resp. maximum possibility) criterion using the sub-feature vector x_1 ∈ Θ_1 (resp. x_2 ∈ Θ_2).
This decision fusion coherence is illustrated through the examples given in Table 1.
The decisions set is formed by five elementary decisions, Ω = {C_1, C_2, C_3, C_4, C_5}, and we assume that, given the observed feature x_1 ∈ Θ_1, the probabilistic source S_1 produces the following a posteriori probability distribution: Pr{· | x_1} = [0.1 0.4 0.1 0.3 0.1]. Each example fits in one sub-array, which presents the specific configuration of S_1 and S_2 (Pr{· | x_1}, π_x2(·), and υ = Pr{· | x_1}·π_x2(·)) together with the resulting decision for each decision parameter. The cases presented in Table 1 are explained as follows. Case 1: when both sources S_1 and S_2 agree, with a winning decision C_2, the PML criterion maintains this agreement and obtains the same decision, C_2. Cases 2 and 3: when one of the two sources presents total ignorance, the PML criterion "duplicates" the same elementary decision as the one offered by the remaining reliable source of information. Cases 4, 5 and 6: when sources S_1 and S_2 lack agreement (i.e., dissonant sources), the resulting decision obtained through the application of the PML criterion is the most reasonable one, which is not necessarily one of the winning decisions offered by the two sources (this is specifically shown in Case 6).
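These three behaviors can be reproduced with the product decision variable υ. In the Python sketch below, the a posteriori probabilities are those of the example above, while the possibility vectors are hypothetical stand-ins for the Table 1 entries (the table itself is not reproduced here); indices are 0-based, so C_2 is index 1:

```python
import numpy as np

prob = np.array([0.1, 0.4, 0.1, 0.3, 0.1])   # Pr{. | x1}: S1's winner is C2

def pml_decision(prob, poss):
    """0-based index of the decision maximizing v(C_n) = pi_x2(C_n) * Pr{C_n | x1}."""
    return int(np.argmax(np.asarray(poss) * np.asarray(prob)))

# Agreement: S2 also favors C2 -> PML keeps C2.
print(pml_decision(prob, [0.2, 1.0, 0.2, 0.5, 0.2]))             # 1

# Total ignorance of S2 (all degrees equal) -> PML duplicates S1's choice, C2.
print(pml_decision(prob, [1.0, 1.0, 1.0, 1.0, 1.0]))             # 1

# Total ignorance of S1 (uniform probabilities) -> PML duplicates S2's choice, C3.
print(pml_decision(np.full(5, 0.2), [0.1, 0.2, 1.0, 0.2, 0.1]))  # 2

# Dissonance: S1 favors C2 (0.4) while S2 favors C4 -> PML retains the most
# plausible decision, C4, since 0.3 * 1.0 > 0.4 * 0.5.
print(pml_decision(prob, [0.1, 0.5, 0.1, 1.0, 0.1]))             # 3
```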

Experimental and Validation Results
In this section, the proposed PML decision-making criterion is evaluated in a pixel-based image classification context. A synthetic scene composed of five thematic classes Ω = {C_1, C_2, C_3, C_4, C_5} is assumed to be observed through two independent imaging sensors. Sensor S_1 (resp. S_2) provides an image I_1 (resp. I_2) of the simulated scene. The two considered sensors are assumed to be statistically independent. Without loss of generality, pixels from both images I_1 and I_2 are assumed to have the same spatial resolution; thus, they represent the same observed spatial cell, or object, o. The value of the pixel I_1(i, j) (resp. I_2(i, j)) provides the observed feature x_1 (resp. x_2) delivered by the first (resp. second) sensor. According to the sensors' characteristics, the measured feature x_1 (resp. x_2) follows a Gaussian N(m_C, σ²_C) (resp. Rayleigh R(σ²_C)) probability distribution, with parameters m_C, σ²_C depending on the thematic class C of the observed object. Figure 5 depicts the simulated images I_1 and I_2 assumed to be delivered at the output of the two sensors. Figure 6 shows the possibility distributions encoding the expert's information for the five thematic classes; the parameter values considered for each thematic class are given in the same figure. This configuration of class parameters is considered a reasonable one that may be encountered when real data is observed. Nevertheless, other configurations have been generated, and the obtained results are in full accordance with those obtained with the considered configuration.
In addition to the previously mentioned probabilistic information, we assume that each thematic class is described, by an expert, using the "simplest" linguistic variable "Close to v_{s,C_k}", where v_{s,C_k} denotes the mean feature value of the thematic class C_k observed through sensor S_s. Therefore, the only information given by the expert is v_{1,C_k} = m_{C_k} for sensor S_1 (underlying Gaussian distributions) and v_{2,C_k} = σ_{C_k}·√(π/2) for sensor S_2 (underlying Rayleigh distributions, whose mean value is σ·√(π/2)). For each sensor S_s and thematic class C_k, a standard triangular possibility distribution is considered to encode this epistemic knowledge, with the summit positioned at the mean value v_{s,C_k} and the support covering the whole range of the feature set. It is clearly seen that the possibility distributions (considered as encoding the expert's knowledge) represent weak knowledge that is less informative than the initial, or even estimated, probability density functions.
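A triangular possibility distribution of this kind is straightforward to encode. The Python sketch below assumes, for illustration only, a gray-level feature support of [0, 255] (the actual supports and class parameters used in the paper are those shown in Figure 6):

```python
import numpy as np

def triangular_possibility(x, summit, support=(0.0, 255.0)):
    """Possibility degree of the linguistic variable 'Close to summit':
    1 at the class mean value, decreasing linearly to 0 at the edges
    of the feature support."""
    lo, hi = support
    x = np.asarray(x, dtype=float)
    deg = np.where(x <= summit,
                   (x - lo) / (summit - lo),
                   (hi - x) / (hi - summit))
    return np.clip(deg, 0.0, 1.0)

print(triangular_possibility(100.0, summit=100.0))   # 1.0 at the summit
print(triangular_possibility(50.0, summit=100.0))    # 0.5 halfway to the lower edge
```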
To evaluate the efficiency of the proposed possibilistic maximum likelihood decision-making criterion, the adopted procedure consists, first, of the random generation of 1000 statistical realizations of the two synthetic Gaussian and Rayleigh images (with the five considered thematic classes) representing the analyzed scene.
Figure 5. Two-sensor simulated images representing a scene of five thematic classes. Pixels from I_1 (resp. I_2) are generated using Gaussian (resp. Rayleigh) density functions.
Second, the following average pixel-based recognition rates are evaluated:
- τ_PrG(C_m): minimum-risk Bayes average pixel-based recognition rate, Equation (8), using zero-one loss assignment, for each thematic class C_m, m = 1, 2, . . . , 5, based on the use of the sensor S_1 Gaussian feature x_1 only.
- τ_πG(C_m): maximum possibility average pixel-based recognition rate, Equation (12), exploiting the epistemic expert knowledge for the description of each considered thematic class C_m, m = 1, 2, . . . , 5, in the feature set Θ_1 (sensor S_1) only.
- τ_PrG·πR(C_m): possibilistic maximum likelihood average pixel-based recognition rate, Equation (24), jointly exploiting the epistemic expert knowledge for the description of each thematic class C_m, m = 1, 2, . . . , 5, in the feature set Θ_2 (sensor S_2), and the Gaussian probabilistic knowledge for the description of the same thematic class in the feature set Θ_1 (sensor S_1).
- τ_PrR(C_m): minimum-risk Bayes average pixel-based recognition rate, Equation (8), using zero-one loss assignment, for each thematic class C_m, m = 1, 2, . . . , 5, based on the use of the sensor S_2 Rayleigh feature x_2 only.
- τ_πR(C_m): maximum possibility average pixel-based recognition rate, Equation (12), exploiting the epistemic expert knowledge for the description of each considered thematic class C_m, m = 1, 2, . . . , 5, in the feature set Θ_2 (sensor S_2) only.
- τ_PrR·πG(C_m): possibilistic maximum likelihood average pixel-based recognition rate, Equation (24), jointly exploiting the epistemic expert knowledge for the description of each thematic class C_m, m = 1, 2, . . . , 5, in the sensor S_1 feature set Θ_1, and the Rayleigh probabilistic knowledge for the description of C_m in the sensor S_2 feature set Θ_2.
- τ_PrG·PrR(C_m): minimum-risk Bayes average pixel-based recognition rate, Equation (8), using zero-one loss assignment, for each thematic class C_m, m = 1, 2, . . . , 5, based on the joint use of both sensors S_1 (associated with Gaussian probabilistic knowledge in the feature set Θ_1) and S_2 (associated with Rayleigh probabilistic knowledge in the feature set Θ_2).
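The evaluation procedure can be sketched as a small Monte Carlo experiment. The Python code below is an illustrative reimplementation, not the authors' code: the class parameters are hypothetical (the paper's values are given in Figure 6), the S_2 feature support is assumed to be [0, 600], and each class's Rayleigh scale is chosen so that its mean matches the class mean value given to the expert. It estimates two of the rates above, τ_PrG (MAP on S_1 alone, equal priors) and τ_PrG·πR (the PML product):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for the five thematic classes.
means  = np.array([40.0, 80.0, 120.0, 160.0, 200.0])   # Gaussian means m_C
stds   = np.array([15.0, 8.0, 15.0, 8.0, 15.0])        # Gaussian std deviations
scales = means / np.sqrt(np.pi / 2.0)                  # Rayleigh scale: mean = scale*sqrt(pi/2)

LO, HI = 0.0, 600.0   # assumed support of the S2 (Rayleigh) feature set

def tri_poss(x, summit):
    # Triangular possibility 'Close to summit' over the whole S2 support.
    d = np.where(x <= summit, (x - LO) / (summit - LO), (HI - x) / (HI - summit))
    return np.clip(d, 0.0, 1.0)

def gauss_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def recognition_rates(n_pix=20000):
    """Per-class rates for MAP on S1 alone (tau_PrG) and for the PML product
    combining S1's Gaussian likelihoods with S2's triangular possibilities."""
    tau_map, tau_pml = np.zeros(5), np.zeros(5)
    for c in range(5):
        x1 = rng.normal(means[c], stds[c], n_pix)   # sensor S1 (Gaussian)
        x2 = rng.rayleigh(scales[c], n_pix)         # sensor S2 (Rayleigh)
        # Likelihoods and S2-side possibilities under each class hypothesis.
        pg = np.stack([gauss_pdf(x1, means[k], stds[k]) for k in range(5)])
        pi = np.stack([tri_poss(x2, means[k]) for k in range(5)])
        tau_map[c] = np.mean(np.argmax(pg, axis=0) == c)       # equal priors
        tau_pml[c] = np.mean(np.argmax(pi * pg, axis=0) == c)  # PML product
    return tau_map, tau_pml

tau_map, tau_pml = recognition_rates()
print(np.round(tau_map, 3), np.round(tau_pml, 3))
```

Averaging over 1000 scene realizations, as done in the paper, amounts to repeating `recognition_rates` and averaging the returned vectors.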
Sensors S_1 and S_2 are considered as being statistically independent. This criterion, as well as all the criteria above, has been calculated for the example in Table 2. The obtained average recognition rates are summarized in Table 2 (last row). As expected, at the global scene level, the average recognition rates obtained when a probabilistic information source is used (for modelling the observed features) are higher than those obtained with epistemic knowledge (i.e., τ_PrG ≥ τ_πG and τ_PrR ≥ τ_πR). Nevertheless, at the thematic class level, this property does not hold for some classes. This is mainly due to the fact that, for classes with "sharp" probability density functions, i.e., small variance (for instance, thematic classes C_2 and C_4), the shape of the possibility distributions used to encode the expert knowledge (i.e., a wide-based triangular possibility shape) may bias each class's influence, leading to a better recognition rate to the detriment of neighboring classes (for instance, class C_3). In this case, this leads to τ_πG(C_2) > τ_PrG(C_2) and τ_πG(C_4) > τ_PrG(C_4). The poorer recognition performance of the maximum possibility decision criterion clearly comes from the "weak epistemic knowledge" produced by the expert (indicating just the mean values) compared to the "strong probabilistic" knowledge carried by full probability density functions (resulting either from a priori information or from density estimation using some learning data). The most interesting and promising result is the recognition rate improvement obtained when the epistemic knowledge is jointly used with the probabilistic one, as proposed by the PML decision criterion. Indeed, Table 2 (columns 4 and 7, bold numbers) shows that for all classes C_m, m = 1, 2, . . . , 5, we have τ_PrG·πR(C_m) ≥ τ_PrG(C_m) and τ_PrR·πG(C_m) ≥ τ_PrR(C_m).
It is worthwhile to notice (columns 4 and 7, bold numbers) that the level of performance improvement depends on the "informative" capacity of the "additional" knowledge source. For instance, embedding the Gaussian source of knowledge (in epistemic form) into the decisional process based on the probabilistic Rayleigh source of knowledge improves the performance level much more than the reverse (i.e., embedding the epistemic Rayleigh source of knowledge into the decisional process based on the probabilistic Gaussian source of knowledge): τ_PrG·πR(C_m) ≈ τ_PrG(C_m), whereas τ_PrR·πG(C_m) > τ_PrR(C_m). Finally, it is important to notice that the PML decision performances are bounded as follows: τ_PrG(C_m) ≤ τ_PrG·πR(C_m) ≤ τ_PrG·PrR(C_m) and τ_PrR(C_m) ≤ τ_PrR·πG(C_m) ≤ τ_PrG·PrR(C_m). Given that the two sources S_1 and S_2 are assumed to be statistically independent, the joint probability distribution of the augmented feature vector x = [x_1 x_2] is the direct product of the marginal ones. This simply means that the upper bound given in Equation (30) constitutes the optimal recognition rate (obtained by considering both probabilistic sources of knowledge). Therefore, the PML criterion improves on the performance obtained with a "single" probabilistic source of knowledge and, for some thematic classes, approaches the optimal recognition-rate upper bound (last column of Table 2).

Conclusions
In this paper, a new criterion for the decision-making process in classification systems is proposed. After a brief recall of the Bayesian decision-making criteria, three major possibilistic decision-making criteria, i.e., maximum possibility, maximum necessity measure, and confidence index maximization, have been detailed. It was clearly shown that the three considered decision criteria lead, at best, to the maximum possibility decision criterion. However, the maximum possibility criterion has no physical justification. A new criterion, called possibilistic maximum likelihood (PML), framed within possibility theory but using notions from Bayesian decision-making, has been presented and its behavior evaluated. The main motivation behind the development of such a criterion is multi-source information fusion, where a pattern may be observed through several channels and where the available knowledge concerning the observed features may be of a probabilistic nature for some features and of an epistemic nature for others.
In this configuration, most encountered studies transform one of the two knowledge types into the other form and then apply either the classical Bayesian or the possibilistic decision-making criteria. In this paper, we have proposed a new approach, called the possibilistic maximum likelihood (PML) decision-making approach, in which the Bayesian decision-making framework is adopted and the epistemic knowledge is integrated into the decision-making process by defining possibilistic loss values instead of the usually used zero-one loss values.
A set of possibilistic loss values is proposed and evaluated in the context of pixel-based image classification, where a synthetic scene composed of several thematic classes was randomly generated using two types of sensors: a Gaussian and a Rayleigh sensor. The evaluation of the proposed PML criterion has clearly shown the benefit of its application: the obtained recognition rates approach the optimal rates (i.e., those obtained when all the available knowledge is expressed in probabilistic terms). Moreover, the proposed PML decision criterion offers a physical interpretation of the maximum possibility decision criterion as a special case of the possibilistic Bayesian decision-making process when all the available probabilistic information indicates equal decision probabilities.