Paraconsistent Probabilities: Consistency, Contradictions and Bayes' Theorem

This paper represents the first steps towards constructing a paraconsistent theory of probability based on the Logics of Formal Inconsistency (LFIs). We show that LFIs encode very naturally an extension of the notion of probability able to express sophisticated probabilistic reasoning under contradictions, employing appropriate notions of conditional probability and paraconsistent updating via a version of Bayes' theorem for conditionalization. We argue that the dissimilarity between the notions of inconsistency and contradiction, one of the pillars of LFIs, plays a central role in our extended notion of probability. Some critical historical and conceptual points about probability theory are also reviewed.
Quod facile est in re, id probabile est in mente.
As is widely recognized, paraconsistency is the investigation of logic systems endowed with a negation ¬ such that not every contradiction of the form p and ¬p entails everything. In other terms, a paraconsistent logic does not suffer from trivialism, in the sense that a contradiction does not necessarily explode or trivialize the deductive machinery of the system. In strict terms, even an irrelevant contradiction in traditional logic obliges a reasoner who follows such a logic to derive anything from a pair of contradictory statements {α, ¬α}, as a result of the so-called Principle of Explosion (PEx): α, ¬α ⊢ β, for any β.
On the other hand, a paraconsistent logician, by using a more cautious way of reasoning, is free of the burden of PEx and could pause to investigate the causes of the contradiction, instead of injudiciously deriving unwanted consequences from it. Common sense, however, recognizes that some contradictions may be intolerable, and those would destroy the very act of reasoning (that is, lead to trivialization). This amounts to admitting that not all contradictions are equivalent.
The Logics of Formal Inconsistency (LFIs), a family of paraconsistent logics designed to express the notion of consistency (and inconsistency, as well) within the object language by employing a connective • (reading •α as "α is consistent"), realize such an intuition.

As defended in [2], LFIs can be regarded as theories of logical consequence, epistemic in character, that tell us how to make sensible inferences in the presence of contradictions. From the purely mathematical viewpoint, LFIs are subsystems of classical logic, although they extend classical logic in the sense that classicality may be recovered in the presence of consistency: contradictions involving consistent sentences will lead to explosive triviality. Consistency in the LFIs is not regarded as synonymous with freedom from contradiction (as happens with the traditional notion of the consistency of a theory T, where consistency means that there is no sentence α such that T ⊢ α and T ⊢ ¬α, where ⊢ is a specified consequence relation in the language of T). The usual notion of consistency, totally depending on negation, is perhaps sufficient for certain mathematical purposes, but not for the whole enterprise of reasoning, as argued in [3]. In the LFIs, however, the notions of consistency and non-contradiction are not coincident, nor are the notions of inconsistency and contradiction the same. For more details, conceptual motivations and the main results about LFIs, the reader is referred to [4,5].
The distinguishing feature of the LFIs is that the principle of explosion is not valid in general; this law is not abolished, however, but restricted to consistent sentences. Therefore, a contradictory theory may be non-trivial unless the contradiction refers to something consistent.
More than three decades ago, J. N. Williams [6] argued that it is a mistake to suppose that inconsistency is the same as contradiction. The LFIs fully formalize this intuition, and starting from this perspective, it is possible to build a number of logical systems with different assumptions that not only encode classical reasoning, but also (at the price of adding new principles) converge to classical logic. We have chosen a particular logic endowed with adequate principles to deal with our paraconsistent probability measures, without obscuring the fact that several other logics would give rise to specific (weaker or stronger) notions of paraconsistent probabilistic measures.

Ci, a Logic of Formal Inconsistency
The system Ci is a member of the hierarchy of the LFIs with some features that make it reasonably close to classical logic; it is appropriate, in this way, to define a generalized notion of probability strong enough to enjoy useful properties. Consider the following stock of propositional axioms and rules:
Definition 1. Let Σ be a propositional signature closed under the unary connectives {•, ¬} and under the binary connectives {→, ∧, ∨}. The logic Ci (over Σ) is defined by the following Hilbert calculus:

Modus Ponens (MP): from α and α → β, infer β.
As investigated in detail in [7], Ci can be extended to the first-order logic QCi (over a convenient extension of Σ) by adding appropriate axioms and rules.
It is worth noting that axioms Ax1 to Ax9 plus Modus Ponens (MP) define a Hilbert calculus for positive propositional classical logic (see [4]), and therefore, all of the laws concerning positive logic (such as the distribution of ∧ over ∨, etc.) are valid.
It is instructive to show, as an example, that the distribution of conjunction over disjunction holds just as in classical logic (not a surprise, since positive classical logic is incorporated into our paraconsistent logic). However, as the validity of such laws may raise some doubts, we provide a quick proof of them. The symbol ⊢_Ci stands for the derivability relation of Ci (when there is no danger of confusion, we shall drop the index in ⊢_Ci in order to simplify notation), and α ≡ β means α ⊢_Ci β and β ⊢_Ci α:
Theorem 1. Distributing conjunctions and disjunctions:
Proof. We just prove the first item (the second item is analogous). From right to left, axioms Ax4 and Ax5, plus Ax8, easily show the intended implication. From left to right, just notice that:
As proven in [4], the logic Ci cannot be semantically characterized by finite matrices, but it can be characterized in terms of valuations over {0, 1} (also called bivaluations):
Definition 2. Let L be the collection of sentences of Ci. A function v : L → {0, 1} is a valuation for Ci if it satisfies the following clauses:
The semantical consequence relation w.r.t. bivaluations for Ci is defined as expected: Γ ⊨_Ci ϕ iff, for every valuation v for Ci, if v(γ) = 1 for every γ ∈ Γ, then v(ϕ) = 1.
As shown in [4], a strong (classical) negation can be defined in Ci as ∼_β α = α → ⊥_β, where ⊥_β = (β ∧ (¬β ∧ •β)) is a bottom formula (that is, ⊥_β ⊢_Ci ψ for every ψ) for any sentence β. In order to simplify matters, a privileged β will be chosen, and the subscript will be omitted in ⊥_β and ∼_β from now on.

Theorem 3. Properties of strong negation:
The strong negation ∼ satisfies the following properties in Ci:
Proof. Detailed proofs can be found in [4].
Additionally, several metatheorems, such as the deduction metatheorem, can be proven in Ci. Now, by defining 'α is inconsistent' as ¬•α (the negation of the consistency of α), axioms Ax12 and Ax13 (which permit adding and eliminating double negations) convey the meaning that 'α is not inconsistent if and only if it is consistent'. Of course, by the very definition and the same axioms on double negations, it also holds that 'α is not consistent if and only if it is inconsistent'. Some other relevant results in Ci hold as follows.
Theorem 4. Properties of consistency:
Proof. Again, detailed proofs can be found in [4].
Theorem 5. The following are bottom particles in Ci:
Proof. We refer the reader once again to [4] for the proofs.

Consistency, Inconsistency and Paraconsistent Probability
As hinted in the previous section (see [2] for details), the formal notion of consistency considered here does not necessarily depend on negation. Indeed, the logical machinery of the LFIs shows that consistency may be conceived of as a primitive concept, whose meaning can be thought of as "conclusively established as true (or false)" by extra-logical means, depending on the subject matter. Consistency, in this sense, is a notion independent of model-theoretical and proof-theoretical methods and is closer to the idea of regularity, or something contrary to change (an elaboration on this view is offered in [8]).
However, consistency is also connected to complying with the laws of traditional probability, as put by F. Ramsey (see [9]), who argues that degrees of belief should satisfy the probability axioms and holds that this is connected to a notion of consistency or coherence. In this way, the notion of consistency (at least for degrees of belief) can be regarded as the satisfaction of the probability axioms.
This paper aims to investigate deeper relationships between logic and probability, emphasizing a new way to define a paraconsistent theory of probability. A previous approach has been developed in [10], where variations of paraconsistent Bayesianism based on a four-valued paraconsistent logic are discussed. A still earlier attempt has been briefly sketched in [11]. An entirely different view is taken in [12], where probabilistic semantics is given for a couple of many-valued paraconsistent logics. The connections between non-classical logics and probability are not restricted to paraconsistent logic; see, e.g., [13] for the case of infinite-valued logic and in particular [14] for a broader and more philosophical perspective.
Probability functions are usually defined for a σ-algebra of subsets of a given universe set Ω, but it is also natural to define probability functions directly for sentences in the object language.We will refer to them, respectively, as probability on sets versus probability on sentences (see the discussion in Section 5).
Although these two approaches are equivalent in classical logic in view of the representation theorems for Boolean algebras, this is not so for probability based on other logics, since the algebraic kinship may be lost for non-classical logics or be much less immediate.Furthermore, in algebraic terms, probability functions in set-theoretical settings are required to satisfy countable additivity, but since propositional language is compact, for probability on sentences, it suffices to require finite additivity.
Our first definition of paraconsistent probability will be directly concerned with probability on sentences, with the primary intention to emphasize the role of a new, more cautious logic behind the probabilistic reasoning, and the effects of logical machinery on a corresponding version of Bayes' rule. In Section 5, however, a notion of paraconsistent probability spaces (concerning probability on sets) will be presented and discussed.
Definition 3. A probability function for the language L of a logic L, or an L-probability function, is a function P : L → R satisfying the following conditions, where ⊢_L stands for the syntactic derivability relation of L:
1. Non-negativity: 0 ≤ P(ϕ) ≤ 1 for all ϕ ∈ L
It should be remarked that the same meta-axioms, taking ⊢_L appropriately as the classical, intuitionistic or paraconsistent derivability relation (in the present case, ⊢_Ci), define, respectively, the classical, the intuitionistic and the paraconsistent probability measures. The intuitionistic case is treated in [15]. This is clear evidence that the concept of probability can be regarded as entirely logic dependent and that the choice of the underlying logic is a matter of interest and convenience. However, once a choice is made, the consequences are radically different, as we intend to illustrate below.
Two events α and β are said to be independent if P(α ∧ β) = P(α) · P(β). Two events can be independent concerning one probability measure and dependent concerning another. Some immediate consequences of such axioms are the following:
Theorem 6. Regularity of L-probability measures.
Proof. (1) Immediate, in view of the axioms of anti-tautologicity and comparison above; (2) immediate, in view of the axiom of comparison above.
As a consequence of Theorem 6,

for any probability function P.
Two sentences α and β are said to be logically incompatible if α, β ⊢ ϕ for any ϕ (or equivalently, if α ∧ β acts as a bottom particle). Some simple calculation rules follow:
Theorem 7. Calculation rules of Ci-probability measures. Let P be a Ci-probability measure; then:
Proof. Only Items (1) and (2) will be proven (the rest is routine): (1) Since α and β are logically incompatible, α ∧ β acts as a bottom particle, and the result is immediate by Theorem 6 and finite additivity; (2) Use finite additivity in the sentences •α ∨ (α ∧ ¬α) and •α ∧ (α ∧ ¬α).
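The hint given for Item (2) can be made concrete with a small numerical sketch. It rests on assumptions that are not spelled out in the text as it stands: that finite additivity has the usual inclusion-exclusion form P(ϕ ∨ ψ) = P(ϕ) + P(ψ) − P(ϕ ∧ ψ), that α ∨ ¬α and •α ∨ (α ∧ ¬α) are theorems of Ci (so both receive probability 1), and that •α ∧ (α ∧ ¬α) is a bottom particle (so it receives probability 0). The function name and the input values are hypothetical.

```python
def contradiction_and_consistency(p_alpha, p_not_alpha):
    """Return (P(alpha AND NOT alpha), P(consistency of alpha)).

    Assumptions (labeled, not taken verbatim from the paper):
      * finite additivity: P(x OR y) = P(x) + P(y) - P(x AND y)
      * P(alpha OR NOT alpha) = 1 (excluded middle assumed derivable in Ci)
      * P(•alpha OR (alpha AND NOT alpha)) = 1
      * P(•alpha AND alpha AND NOT alpha) = 0 (bottom particle)
    """
    # Additivity applied to alpha OR NOT alpha gives the mass of the contradiction.
    p_contradiction = p_alpha + p_not_alpha - 1.0
    if not 0.0 <= p_contradiction <= min(p_alpha, p_not_alpha):
        raise ValueError("inputs are not coherent under the stated assumptions")
    # Additivity applied to •alpha OR (alpha AND NOT alpha) gives P(•alpha).
    p_consistency = 1.0 - p_contradiction
    return p_contradiction, p_consistency


# Hypothetical degrees of belief in alpha and in its negation.
p_contra, p_cons = contradiction_and_consistency(0.7, 0.4)
print(round(p_contra, 3), round(p_cons, 3))
```

Under these assumptions, the probability assigned to a contradiction and the probability assigned to the consistency of the same sentence always add up to 1, and the classical case P(α) + P(¬α) = 1 is recovered exactly when the contradiction has probability 0.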
Probabilities are sometimes seen as generalized truth values.In this way, in the classical case, the so-called probabilistic semantics may be thought of as extending the bivaluations v : L → {0, 1} of classical propositional logic with the probability functions ranging on the real unit interval [0, 1].The other way round, bivaluations can be regarded as degenerate probability functions; in this sense, classical logic is to be regarded as a special case of probability logic.We show below that an analogous property holds for Ci and for the above defined notion of paraconsistent probability.
Probabilistic semantics aims to interpret logic systems (viz., logic entailment) with no appeal to truth conditions. In this way, it differs from standard truth-valued semantics: probabilistic semantics is much more general, notwithstanding both being equivalent in the classical case. What happens is that each semantics expresses logical truth and logical consequence in its own way, and we show in this section that, quite surprisingly, the equivalence between such two distinct semantics also holds for the paraconsistent probability theory based on Ci.
Define ⊨_P as a probabilistic semantic relation whose meaning is: Γ ⊨_P ϕ if and only if, for every probability function P, if P(ψ) = 1 for every ψ ∈ Γ, then P(ϕ) = 1. It can be shown that Ci is (strongly) sound and complete with respect to such probabilistic semantics:
Theorem 8. Completeness of Ci with respect to probabilistic semantics: Γ ⊢ ϕ if and only if Γ ⊨_P ϕ.
Proof. The left-to-right direction follows directly from the axioms of probability, namely, tautologicity and comparison, plus the compactness property of Ci proofs. For the right-to-left direction, notice that if Γ ⊨_P ϕ, then in particular, this holds for the probability functions P_2 such that P_2 : L → {0, 1}. It suffices, then, by appealing to Theorem 2, to show that the mappings P_2 satisfy all of the conditions for bivaluations of Definition 2:
On the other hand, if P_2(α) = 1 and P_2(β) = 1, since again by comparison P_2(α) ≤ P_2(α ∨ β), then P_2(α ∨ β) = 1, and the result follows by finite additivity, i.e., P_2(α ∧ β) = P_2(α) + P_2(β) − P_2(α ∨ β) = 1.
Analogous to the previous item, mutatis mutandis.
Probabilistic semantics may be seen as an alternative to truth-valued semantics, with the intention to explain semantic notions, such as truth and consequence, in terms of probability functions. It has been shown in [16] that a semantics for standard logic (regarding soundness and completeness) can be provided without any appeal to model-theoretic or proof-theoretic concepts. The idea was later extended in [17], proving that a probabilistic semantics can be given to any extension of classical propositional logic. However, Theorem 8 is somewhat surprising, because the logic Ci is not an extension of classical propositional logic; quite the contrary: although its language extends the language of classical logic, deductively, it is a contraction. Our (paraconsistent) probabilistic semantics for Ci shows, therefore, that an alternative to truth-valued semantics in terms of non-standard probability functions can be provided even in cases of contractions of classical logic.

Conditional Probabilities and Paraconsistent Updating
Perhaps the most interesting use of probability in paraconsistent logic is to help the so-called Bayesian epistemology or the formal representation of belief degrees in philosophy.The well-known Bayes' rule permits one to update probabilities as new information is acquired and, in the paraconsistent case, even when such new information involves some degree of contradictoriness.
We define, as usual, the conditional probability of α given β for P(β) ≠ 0 as:
P(α/β) = P(α ∧ β)/P(β)
The traditional Bayes' theorem for conditionalization says, for P(β) ≠ 0:
P(α/β) = P(β/α) · P(α)/P(β)
As usual, P(α) here denotes the prior probability, i.e., the probability of α before β has been observed. P(α/β) denotes the posterior probability, i.e., the probability of α after β is observed. P(β/α) is the likelihood, or the probability of observing β given α, and P(β) is called the marginal likelihood or "model evidence".
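As a minimal sketch of the two classical definitions just recalled, the following functions compute a conditional probability and a posterior, both of which are only defined when the conditioning probability is non-zero. The function names and the numbers in the usage lines are hypothetical and only serve as illustration.

```python
def conditional(p_joint, p_condition):
    """P(alpha/beta) = P(alpha AND beta) / P(beta), defined only for P(beta) != 0."""
    if p_condition == 0:
        raise ValueError("conditional probability is undefined when P(beta) = 0")
    return p_joint / p_condition


def bayes_posterior(likelihood, prior, marginal):
    """Classical Bayes: P(alpha/beta) = P(beta/alpha) * P(alpha) / P(beta)."""
    # P(beta/alpha) * P(alpha) is just the joint probability P(alpha AND beta).
    return conditional(likelihood * prior, marginal)


# Hypothetical values: likelihood 0.9, prior 0.2, marginal likelihood 0.3.
posterior = bayes_posterior(0.9, 0.2, 0.3)
print(round(posterior, 3))
```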
It is convenient to show at this point a simple, yet pivotal generalization of the classical theorem of total probability:
Theorem 9. P(β) = P(β ∧ α) + P(β ∧ ¬α) − P(β ∧ α ∧ ¬α).
Proof. By finite additivity, P((β ∧ α) ∨ (β ∧ ¬α)) = P(β ∧ α) + P(β ∧ ¬α) − P(β ∧ α ∧ ¬α). On the other hand, by the axiom of tautologicity (Definition 3), P(α ∨ ¬α) = 1, and since β ≡ β ∧ (α ∨ ¬α) ≡ (β ∧ α) ∨ (β ∧ ¬α) by Theorem 1, it follows that P(β) = P(β ∧ α) + P(β ∧ ¬α) − P(β ∧ α ∧ ¬α).
Theorem 10. Paraconsistent Bayes' Conditionalization Rule (PBCR): If P(α ∧ ¬α) ≠ 0, then:
P(α/β) = P(β/α) · P(α) / [P(β/α) · P(α) + P(β/¬α) · P(¬α) − P(β/α ∧ ¬α) · P(α ∧ ¬α)]
Proof. First notice that P(α ∧ ¬α) ≠ 0 entails that P(α) ≠ 0 and P(¬α) ≠ 0 in view of Definition 3 and Ax4, so the quotient is well defined. Suppose we have two contradictory hypotheses, α and ¬α, and wish to compute the probability of α based on evidence β. Since the definition of conditional probability gives P(α/β) = P(β/α) · P(α)/P(β), it remains to compute the marginal likelihood P(β) depending on P(α) and P(¬α). This follows immediately by Theorem 9, dividing and multiplying each term by, respectively, P(α), P(¬α) and P(α ∧ ¬α) (which are not zero).
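The way the conditionalization rule is obtained from Theorem 9 can be sketched in a few lines. The sketch assumes that Theorem 9 has the form P(β) = P(β ∧ α) + P(β ∧ ¬α) − P(β ∧ α ∧ ¬α), which is the form in which it is used in Example 1, and that each joint term is rewritten as a likelihood times the probability of the corresponding hypothesis. The function names and all numerical values are hypothetical.

```python
def marginal_with_overlap(lik_a, p_a, lik_not_a, p_not_a, lik_both, p_both):
    """P(beta) under the assumed form of Theorem 9.

    lik_a     = P(beta / alpha)
    lik_not_a = P(beta / NOT alpha)
    lik_both  = P(beta / alpha AND NOT alpha)
    p_both    = P(alpha AND NOT alpha), required to be non-zero
    """
    if p_both == 0 or p_a == 0 or p_not_a == 0:
        raise ValueError("the rule requires non-zero P(alpha), P(NOT alpha), P(alpha AND NOT alpha)")
    # Each product is a joint probability; the overlap term is subtracted once.
    return lik_a * p_a + lik_not_a * p_not_a - lik_both * p_both


def paraconsistent_posterior(lik_a, p_a, lik_not_a, p_not_a, lik_both, p_both):
    """P(alpha / beta): likelihood times prior over the decomposed marginal."""
    marginal = marginal_with_overlap(lik_a, p_a, lik_not_a, p_not_a, lik_both, p_both)
    return lik_a * p_a / marginal


# Hypothetical inputs: P(alpha) = 0.6, P(NOT alpha) = 0.5, P(alpha AND NOT alpha) = 0.1.
post = paraconsistent_posterior(0.5, 0.6, 0.4, 0.5, 0.5, 0.1)
print(round(post, 4))
```

When the overlap term is dropped (that is, when the contradiction carries no probability mass), the denominator is the classical total-probability expansion, which is the sense in which the rule generalizes classical conditionalization.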
It is clear that this rule generalizes the classical conditionalization rule, as it reduces to the classical case if P(α ∧ ¬α) = 0 or if α is consistent: indeed, in the last case, the contradiction α ∧ ¬α receives probability 0 as well.
As a slogan, we could summarize PBCR as saying: "Posterior probability is proportional to likelihood times prior probability and inversely proportional to the marginal likelihood analyzed in terms of its components". It is possible, however, to formulate other kinds of conditionalization rules by combining the notions of conditional probability, contradictoriness, consistency and inconsistency.
Example 1.As an example, suppose that a doping test for an illegal drug is such that it is 98% accurate in the case of a regular user of that drug (i.e., it produces a positive result, showing "doping", with probability 0.98 in the case that the tested individual often uses the drug), and 90% accurate in the case of a non-user of the drug (i.e., it produces a negative result, showing "no doping", with probability 0.9 in the case that the tested individual has never used the drug or does not often use the drug).
Suppose, additionally, that: (i) it is known that 10% of the entire population of all athletes often uses this drug; (ii) that 95% of the entire population of all athletes does not often use the drug or has never used it; and (iii) that the test produces a positive result, showing "doping", with probability 0.11 for the whole population, independent of the tested individual.
Let us use the following abbreviations mnemonically: D: the event that the drug test has declared "doping" (positive) for an individual; C: the event that the drug test has declared "clear" or "no doping" (negative) for an individual; A: the event that the person tested often uses the drug; ¬A: the event that the person tested does not often use the drug or has never used it.
Suppose someone has been tested, and the test is positive ("doping"). What is the probability that the tested individual regularly uses this illegal drug, that is, what is P(A/D)?
By applying the paraconsistent Bayes' rule (Theorem 10) with α = A and β = D, all of the values are known, with the exception of P(D/A ∧ ¬A). Since P(D/A ∧ ¬A) = P(D ∧ A ∧ ¬A)/P(A ∧ ¬A), it remains to compute P(D ∧ A ∧ ¬A). It follows directly from Theorem 9 that P(D ∧ A ∧ ¬A) = P(D ∧ A) + P(D ∧ ¬A) − P(D) = P(D/A) · P(A) + P(D/¬A) · P(¬A) − P(D) = 0.083. Therefore, by plugging in all of the values, it follows that P(A/D) = 35.5%.
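The intermediate value 0.083 can be checked directly from the data of Example 1. The sketch below only reproduces that one step; it assumes that P(D/¬A) is the complement of the 90% accuracy stated for non-users, and the variable names are hypothetical.

```python
# Data stated in Example 1.
p_a = 0.10              # P(A): share of athletes who often use the drug
p_not_a = 0.95          # P(NOT A): share who do not often use it or never did
p_d = 0.11              # P(D): overall probability of a positive result
p_d_given_a = 0.98      # P(D/A): positive result for a regular user

# Assumption: a 90% chance of "no doping" for non-users leaves 10% for "doping".
p_d_given_not_a = 1.0 - 0.90

# Joint probabilities obtained from the definition of conditional probability.
p_d_and_a = p_d_given_a * p_a
p_d_and_not_a = p_d_given_not_a * p_not_a

# Rearranged form of Theorem 9 as used in the example.
p_d_and_a_and_not_a = p_d_and_a + p_d_and_not_a - p_d
print(round(p_d_and_a_and_not_a, 3))
```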
The probability P(¬A/D) that a tested individual never uses the drug or just uses it sporadically, given that the test has been positive, analogously computed via the paraconsistent Bayes' rule, is 34.4%.
As shown below, this example suggests that the paraconsistent Bayes' conditionalization rule is more sensitive to the test parameters than traditional conditionalization.The following tables compare the paraconsistent results with the results obtained by trying to remove the contradiction involving the events A (the event that the person tested often uses the drug) and ¬A (the event that the person tested does not often use the drug or has never used it), that is by trying to make them "classical".The two tables refer to two kinds of tests: a less reliable test, with 10% of false positives (Table 1), and a more reliable test, with 2% of false positives (Table 2).
Since A and ¬A overlap by 5%, we might think, thus, about reviewing the values, by removing the overlap according to three hypothetical scenarios: an alarming scenario, by lowering the value of ¬A by 5%; a happy scenario, by lowering the value of A by 5%; and a cautious scenario, by dividing the surplus equally between A and ¬A and computing the probability P(A/D) that the tested individual regularly uses this illegal drug.

[Table 1: P(A/D) for the less reliable test under the alarming, cautious and happy scenarios.]
Using paraconsistent probabilities, however, one obtains a lower value concerning this less reliable test, namely, P(A/D) = 35.5%, tending towards the "happy" hypothetical scenario where fewer people use the drug.

[Table 2: P(A/D) for the more reliable test under the alarming, cautious and happy scenarios.]
In this second, more reliable test, by computing the result directly via paraconsistent probabilities, one obtains a higher value, namely P(A/D) = 79%, tending towards the "cautious" hypothetical scenario.
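The three hypothetical "classical" scenarios can be evaluated with a short sketch. It makes assumptions that should be read as such: each scenario is taken to be computed with the classical Bayes rule, using the law of total probability for the denominator; the sensitivity is kept at 0.98 for both tests; and the scenario priors are obtained by removing the 5% overlap as described (from ¬A, from A, or split equally). The names below are hypothetical, and the figures produced are not claimed to be the ones in the original tables.

```python
def classical_posterior(p_a, p_not_a, sensitivity, false_positive_rate):
    """Classical P(A/D) with P(D) expanded by the law of total probability."""
    true_positives = sensitivity * p_a
    false_positives = false_positive_rate * p_not_a
    return true_positives / (true_positives + false_positives)


# Priors (P(A), P(NOT A)) after removing the 5% overlap in three ways.
SCENARIOS = {
    "alarming": (0.10, 0.90),    # overlap removed from NOT A
    "cautious": (0.075, 0.925),  # overlap split equally
    "happy": (0.05, 0.95),       # overlap removed from A
}


def scenario_table(sensitivity, false_positive_rate):
    """Map each scenario name to its classical posterior P(A/D)."""
    return {
        name: classical_posterior(p_a, p_not_a, sensitivity, false_positive_rate)
        for name, (p_a, p_not_a) in SCENARIOS.items()
    }


less_reliable = scenario_table(0.98, 0.10)  # 10% false positives
more_reliable = scenario_table(0.98, 0.02)  # 2% false positives

for label, table in (("less reliable", less_reliable), ("more reliable", more_reliable)):
    print(label, {name: round(100 * value, 1) for name, value in table.items()})
```

Under these assumptions, the happy scenario of the less reliable test and the cautious scenario of the more reliable test come out closest to the paraconsistent values of 35.5% and 79% quoted in the text.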
The values P(D/A) and P(C/¬A) are known, respectively, as sensitivity and specificity, and the positive likelihood ratio is defined as sensitivity/(1 − specificity) (similarly, the negative likelihood ratio is defined as (1 − sensitivity)/specificity). Defining such measures for paraconsistent probabilities would help to assess their meaning, but this kind of approach is not the intention of this paper.
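For the classical measures just mentioned, a minimal sketch with the figures of Example 1 (sensitivity 0.98, specificity 0.90) is given below. It only covers the classical likelihood ratios; no paraconsistent counterpart is defined here, and the function name is hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for a diagnostic test."""
    if specificity in (0.0, 1.0):
        raise ValueError("ratios are undefined for specificity 0 or 1")
    positive_lr = sensitivity / (1.0 - specificity)
    negative_lr = (1.0 - sensitivity) / specificity
    return positive_lr, negative_lr


# Sensitivity P(D/A) and specificity P(C/NOT A) as given in Example 1.
lr_pos, lr_neg = likelihood_ratios(0.98, 0.90)
print(round(lr_pos, 2), round(lr_neg, 4))
```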
These simple examples suggest the following interpretation about Bayesian paraconsistent updates: When the test is less reliable, paraconsistent probabilities tend toward cautious optimism: values tend to reflect the most favorable outcome. On the other hand, as the test gets more reliable, paraconsistent probabilities tend toward cautious realism, in the sense of favoring more realistic expectations of undesirable outcomes. This notwithstanding, the test is cautious in all cases.
Such numerical results are very much connected to the philosophical profile of paraconsistent logics, which naturally endorses cautious reasoning about contradictions: when you find a contradiction, it is better to carefully analyze its causes, instead of risking throwing the baby out with the bath water.
It might seem that our comparisons above are futile, since, as an unwary reader could think, standard probabilistic merging techniques could be directly applied to define a consensus of belief functions (as studied, e.g., in [19] and further extended in several ways). This is, however, highly debatable, as discussed in [20] by one of the founders of Dempster-Shafer theory, who argues against global consensus methods and in favor of direct reassessment of the items of evidence that contribute to a belief function.
Besides such a debate of whether or not a consensus of belief functions can be directly computed, it is not obvious at all how to understand beliefs in terms of probability theory. As put by J. Y. Halpern and R. Fagin in [21], page 3: "If we view belief as generalized probability, then it makes sense to update beliefs but not combine them. If we view beliefs as a representation of evidence, then it makes sense to combine them, but not update them. This suggests that the rule of combination is appropriate only when we view beliefs as representations of evidence." The belief functions defined by our probability functions, of course, are generalized probabilities, and we align with Halpern and Fagin, among others, in that it makes sense to update beliefs instead of combining them.
Although this first example is just a suggestive sample of what can be done with a robust calculus of probabilities based on a well-founded paraconsistent logic, it is convenient to recall that false positives themselves can be regarded as contradictory when false positive results are more probable than true positive results; this occurs when the incidence of a certain condition (for instance, a disease) in a population is below the false positive rate. In this case, the traditional Bayes' conditionalization rule already incorporates a mechanism for rationally handling some types of contradictory data without falling into trivialization. Indeed, in [22], the paraconsistent paradigm is invoked as a tool to evaluate the sensitivity of the traditional Full Bayesian Significance Test (FBST) value regarding changes in the prior or reference density. That paper argues that such an intuitive measure of inconsistency can be made rigorous in the context of paraconsistent logic and bilattice structures. Our intention is different, as we start from a logical-paraconsistent approach, but both views can be, in principle, combined.
Furthermore, in [23], a conviction measure and a loss function are defined, intended to be used for evaluating financial operations strategies.A classification tree learning algorithm is thus defined over such a conviction measure, with the advantage of outputting more cautious decisions.It is not implausible that our generalization of Bayes' updating procedure could be used with similar applications in mind, but this is a subject of further investigation.

Paraconsistent Probability Spaces
As is historically recognized (see, e.g., [24]), two main competing schools of probability emerged in Europe in the 17th century, leading to different methods of statistical inference and estimation: frequentist (based on the laws of large numbers and concerned with stochastic laws of chance) and epistemological (referring to credence or reasonable degrees of belief, somehow connected to modern Bayesianism). However, there is also a second competition, represented by the theories of logical probability (in contemporary times, Keynes/Carnap) versus measure-theoretical approaches to probability (Kolmogorov). These two pairs of competing traditions are not unrelated, but it is not settled how all this is related to the debate between logic and probability (see below).
On the other hand, the reference to probability on sets is more recent.Kolmogorov introduced in his classic book in German of 1933, translated as [25], what he called the 'elementary theory of probability', which nowadays is widely used by mathematicians, statisticians and engineers and connected to measure theory.Probability on sentences is more common among philosophers and logicians.
Probability can, alternatively, be based on game theory rather than measure theory, as developed in [26], viewing probability as a perfect information game between two players. The so-called axiom of continuity, considered not to be well motivated, is not assumed in [26]. As put in [27]: "Countable additivity for probability has always been controversial. Émile Borel, who introduced it, and Andrei Kolmogorov, who confirmed its role in measure-theoretic probability, were both ambivalent about it. They saw no conceptual argument for requiring probabilities to be countably additive. It is merely mathematically convenient to assume they are. As Kolmogorov explained in his Grundbegriffe, countable additivity has no meaning for empirical experience, which is always finite, but it is mathematically useful. We can elaborate Kolmogorov's explanation by pointing out that infinities enter into applied mathematics not as representation but as simplification."
Probability on sets (more properly called probability functions) assigns probabilities directly to sets.Although events, statements or predicates in many cases can be expressed as sets, these two accounts are not the same, although they are inherently equivalent in the classical scenario.In a paper that surveys and discusses K. Popper's contributions to the theory of probability, H. Leblanc in [28] recalls that Popper reacted against the dependency of probability theory on the Boolean algebra of sets and proposed his notion of absolute probability motivated by this criticism.
The equivalence between probability on sentences and probability on sets is far from obvious under the paraconsistent paradigm, and our proposal would not be complete if we could not offer a similar approach from the paraconsistent perspective.
Definition 4. A paraconsistent probability space is a structure ⟨Ω, Σ, Π, P_µ⟩ where:
1. Ω is the sample space composed of all possible outcomes;
2. Σ ⊆ ℘(Ω) is a set of events, such that Σ is a σ -algebra, i.e.:
(a) ∅ ∈ Σ, Ω ∈ Σ and Π ∈ Σ;
(b) Σ is closed under ∪, ∩ and countable unions;
(c) Σ is closed under the following two binary operations:
i. X : Σ → Σ, such that X ⊆ X , where X is the usual complement.
ii. X : Σ → Σ, such that X ∩ X = X
(d) Π is the set of all consistent outcomes;
3. The map P_µ : Σ → [0, 1] is a probability measure satisfying the following conditions:
The point of assigning a probability interpretation to logic systems is how legitimate it is to regard probability as attached to logical statements, instead of directly to events. This duality causes perplexity to some, since statisticians and engineers may find this passage to logic confusing or unnecessary and would prefer to deal with events directly (in this case, the notion of a random variable is essential). Philosophers and logicians, on the other hand, may find it natural to attach probabilities to statements, not to events (and in this case, the notion of consequence relation is essential).
A (classical) probability space is just a particular case of paraconsistent probability space Ω, Σ, P where Π is empty and the operation X : Σ → Σ is the identity on Σ (and consequently X = X).
In the classical case, a definite link between probability on events and probability on sentences (in the cases of finite additivity) can be defined so that the probability measures of information in both cases are the same. For the cases that demand infinite additivity, an infinitary propositional language or a first-order logic is required (see [29], pages 401-404). For de Finetti (see [30]), as much as for Borel and Kolmogorov, infinite additivity is just a question of mathematical convenience, not strictly justified by the concept of probability: "Its success owes much to the mathematical convenience of making the calculus of probability merely a translation of modern measure theory. [...] No-one has given a real justification of countable additivity (other than just taking it as a 'natural extension' of finite additivity)."
In the cases where Σ is closed under ∪, ∩ and finite unions, a σ -algebra is referred to as a σ-algebra.
The connection between probability on events and probability on sentences in the classical case is granted through algebraic maneuvers à la Lindenbaum algebra, by showing that probability measures on sentences turn out to be individual cases of probability measures on events. The argument is essentially the following: denote the set of classical bivaluations in its language L as Val, and assign to each sentence ϕ ∈ L a subset of Val, defined by [[ϕ]] = {v ∈ Val | v(ϕ) = 1}. It can be inductively proven that [[ϕ ∧ ψ]] = [[ϕ]] ∩ [[ψ]], [[ϕ ∨ ψ]] = [[ϕ]] ∪ [[ψ]] and [[¬ϕ]] = Val \ [[ϕ]]. Therefore, the universe A = {[[ϕ]] ∈ ℘(Val) | ϕ ∈ L} acquires a σ-algebra structure. Now, given any probability distribution P : L → [0, 1] as in Definition 3, one defines a probability measure over sets P_µ : A → [0, 1], as in Definition 4, by P_µ([[ϕ]]) = P(ϕ), for any ϕ ∈ L. Clearly, P_µ is well-defined in view of Theorem 6, Item (2).
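For the classical case, the correspondence between sentences and sets of bivaluations can be sketched for a language with two atoms. The sketch reads the correspondence in the direction that is convenient for computation: it starts from a hypothetical weight on each bivaluation and recovers P(ϕ) as the measure of [[ϕ]]. The atom names, the weights and the encoding of sentences as Python functions are all assumptions made for illustration.

```python
from itertools import product

ATOMS = ("p", "q")
# All classical bivaluations of the two atoms: (0,0), (0,1), (1,0), (1,1).
VALUATIONS = [dict(zip(ATOMS, bits)) for bits in product((0, 1), repeat=len(ATOMS))]
# Hypothetical weights, one per bivaluation, summing to 1.
WEIGHTS = [0.1, 0.2, 0.3, 0.4]


def extension(sentence):
    """[[sentence]]: indices of the bivaluations that satisfy the sentence."""
    return frozenset(i for i, v in enumerate(VALUATIONS) if sentence(v))


def measure(event):
    """P_mu on sets of bivaluations: add up the weights of the members."""
    return sum(WEIGHTS[i] for i in event)


def prob(sentence):
    """P on sentences, defined through P_mu([[sentence]])."""
    return measure(extension(sentence))


# Sentences encoded as predicates on a bivaluation.
p = lambda v: v["p"] == 1
q = lambda v: v["q"] == 1
p_and_q = lambda v: p(v) and q(v)
p_or_q = lambda v: p(v) or q(v)
not_p = lambda v: not p(v)

# Conjunction and disjunction correspond to intersection and union of extensions.
print(extension(p_and_q) == extension(p) & extension(q))
print(extension(p_or_q) == extension(p) | extension(q))
print(round(prob(p_or_q), 3), round(prob(p) + prob(q) - prob(p_and_q), 3))
```

Because the measure is additive on disjoint sets of bivaluations, finite additivity on sentences comes for free in this classical setting; it is this step that needs separate care in the paraconsistent case.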
The converse can be easily established in the case of finite probability spaces ⟨Ω, P_µ⟩ representing the outcomes of a certain stochastic experiment Ω = {ω_1, …, ω_n} (say, throwing dice). In such cases, one can define a corresponding probabilistic logic by setting atomic variables α_1, …, α_n, where α_i means that ω_i is the outcome of the experiment, together with sentences ρ_i saying that ω_i occurs independently, and defining P_L(ρ_i) = P({ω_i}) for i = 1, …, n and P_L(λ) = 0 for any other sentences in these atomic variables. It is easy to see that ⟨L, P_L⟩ is a propositional probability model. Some more details can be found, for the classical case, in [31], Chapter 4.7.
A concrete example of a paraconsistent probability space is given by the following structure:
Example 2. ⟨Ω, Σ, Π, P_µ⟩ where:
1. Ω is any set (representing all possible outcomes);
2. Σ ⊆ ℘(Ω) is the set of all events, such that Σ is a σ-algebra, i.e.:
In this case, Π represents the consistent events (from the point of view of the logic Ci). When Π = Σ, this kind of paraconsistent probability space turns out to be a classical probability space. The structure ⟨Ω, Σ, Π⟩ is a particular case of a paraconsistent algebra of sets, as investigated in [32], and the results therein (particularly Theorems 4 and 5) can be adapted to give a precise connection between the concepts of paraconsistent probability on events and probability on sentences in the Ci logic (again, in the cases of finite additivity).
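The converse construction for finite spaces described above can be sketched for a fair die. Two points are assumptions of this sketch rather than statements from the paper: the sentence ρ_i is read as the state description "α_i holds and every other α_j fails", and compound sentences are given the sum of the masses of the state descriptions that satisfy them (the text only fixes the values on the ρ_i). The names `OMEGA`, `rho`, `weight` and `p_l` are hypothetical.

```python
from fractions import Fraction
from itertools import product

OMEGA = (1, 2, 3, 4, 5, 6)                 # outcomes of one throw of a die
P = {w: Fraction(1, 6) for w in OMEGA}     # assumed fair die
N = len(OMEGA)


def rho(i):
    """Assumed reading of rho_i: alpha_i is true and all other atoms are false."""
    return lambda v: all(v[j] == (j == i) for j in range(N))


def weight(v):
    """Mass of a valuation: P({omega_i}) if exactly alpha_i is true, else 0."""
    if sum(v) == 1:
        return P[OMEGA[v.index(True)]]
    return Fraction(0)


def p_l(sentence):
    """Probability of a sentence, given as a predicate over tuples of bools."""
    valuations = product((False, True), repeat=N)
    return sum((weight(v) for v in valuations if sentence(v)), Fraction(0))


# "The outcome is even": alpha_2 or alpha_4 or alpha_6 (indices 1, 3, 5).
even = lambda v: v[1] or v[3] or v[5]
# Two distinct outcomes at once receives no mass.
one_and_two = lambda v: v[0] and v[1]
```

With these definitions, `p_l(rho(i))` returns the probability of the i-th outcome and `p_l(even)` returns one half, so the sentence-level assignment mirrors the event-level one on this toy space.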
This kind of structure, of interest for mathematical statistics, would permit dealing with paraconsistent probability distributions in the way that is familiar for applications.
The main differences between 'paraconsistent probability logics' and 'paraconsistent probability spaces' are connected to the fact that the former are concerned with the transmission of probability through valid inferences, while the latter concentrate on measures of sets, the distribution of probabilities and their consequences, permitting one to treat random variables, expectation, the central limit theorem, etc. Of course, as in the classical case, philosophical issues concern probabilistic logics, not spaces.
Probability spaces and probability logic, as we have seen, make for equivalent views in the classical case (at least for the finite situations), and a similar proof can be adapted for our paraconsistent probability logic and paraconsistent probability spaces (keeping in mind that in both cases, the definitions reflect the inherent nature of our underlying logic Ci). A deeper investigation of paraconsistent probability spaces, in the direction of a more sophisticated treatment involving random variables, measure theory and similar topics, is however beyond the scope of the present paper.

Discussion: From Paraconsistent Probability to Paraconsistent Possibility
Possibility theory is a theory of uncertainty used in areas such as artificial intelligence, non-monotonic reasoning, belief revision and similar domains to express uncertain knowledge in scenarios of incomplete information. It is regarded as a kind of imprecise probability theory, at times seen as a generalization of probability, sometimes as its rival (cf., e.g., [33]). Possibility theory uses a pair of dual set functions (possibility and necessity measures) instead of only one measure, in this way differing from probability.
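The duality between possibility and necessity measures mentioned above can be made concrete with a minimal sketch of the standard, classical construction: a possibility measure is the maximum of a possibility distribution over an event, and necessity is one minus the possibility of the complement. The outcomes and degrees below are invented for illustration, and the sketch does not attempt the paraconsistent extension.

```python
from fractions import Fraction

# Hypothetical possibility distribution; at least one outcome is fully possible.
PI = {
    "sunny": Fraction(1),
    "cloudy": Fraction(7, 10),
    "rainy": Fraction(2, 10),
}
OMEGA = frozenset(PI)


def possibility(event):
    """Possibility of an event: the largest degree among its outcomes."""
    return max((PI[w] for w in event), default=Fraction(0))


def necessity(event):
    """Necessity of an event: 1 minus the possibility of its complement."""
    return 1 - possibility(OMEGA - frozenset(event))
```

Unlike a probability measure, `possibility` is maxitive rather than additive: the possibility of a union is the maximum of the possibilities of its parts, and an event and its complement may both be fully possible.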
As a formalism to model and reason with uncertain information, possibility theory is also connected to the Dempster-Shafer theory of evidence (see [34]). Probability, possibility and other credal calculi are alternative formalisms with distinct motivations, and sometimes it may be important to combine different belief measures to better solve certain problems, as explained in [35].
Possibility theory can be easily and naturally extended over LFIs, by defining possibility and necessity functions (as, e.g., in [36] and especially in [37], where the notion of degrees of support is investigated as a generalization of degrees of belief) simulating our paraconsistent probability measures over LFIs. A deeper study of the meaning and applications of paraconsistent possibility theory, as well as of its inter-relations with paraconsistent probability, is postponed to further work.

Summary, Comments and Conclusions
We have reviewed some basic points about paraconsistency, characterizing a logic of formal inconsistency as a paraconsistent logic endowed with a notion of consistency • and a negation ¬, which is free from trivialism. This means that a contradiction expressed by means of the negation ¬ does not necessarily trivialize the underlying consequence relation, although consistent contradictions do explode. We then defined a measure of probability for one such logic, the system Ci, taking advantage of the underlying notion of consistency and attempting the first steps towards paraconsistent Bayesian updating.
One of the most important topics in statistics is how to ensure reliable uncertain inferences. If we agree with this, the notion of probability and its connection to logic are of fundamental importance. Of course, it is also possible to propose a modal paraconsistent approach to probability based on paraconsistent modalities (see [38]), following the lines of [39], but this has to wait for a better motivation.
How can we interpret paraconsistent probabilities? One possible viewpoint is to interpret paraconsistent probabilities as degrees of belief that a rational agent attaches to events, in such a way that such degrees respect the following principles: necessary events (for instance, tautologies) get the maximum degree; impossible events (for instance, bottom particles) get the lowest degree; probabilities respect logical consequence; and finite additivity is guaranteed (that is, conjunction and disjunction retain their classical interpretation). The last condition seems to be less obvious, but the Dutch book argument provides, at least in the classical case, a line of justification for keeping finite additivity. A Dutch book is a sequence of bets, each regarded by the agent as fair, that in the long run guarantees the agent's loss. De Finetti proved in [40] that a Dutch book can be constructed against an agent whose degrees of belief do not respect finitely additive probability.
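A small numerical sketch may help to see how a Dutch book exploits a failure of finite additivity in the classical case. The credences and the book below are invented for illustration: A and B are assumed to be incompatible, and the agent's degree of belief in A ∨ B exceeds the sum of the degrees for A and B. Each bet pays 1 if its sentence is true and is priced at the agent's credence, so the agent regards every individual bet as fair.

```python
from fractions import Fraction

# Hypothetical credences violating additivity: 3/10 + 3/10 != 7/10.
CREDENCE = {
    "A": Fraction(3, 10),
    "B": Fraction(3, 10),
    "A_or_B": Fraction(7, 10),
}

# A and B are assumed incompatible, so no world makes both true.
WORLDS = [
    {"A": True, "B": False},
    {"A": False, "B": True},
    {"A": False, "B": False},
]

# +1: the agent buys the bet; -1: the agent sells it.
BOOK = {"A_or_B": +1, "A": -1, "B": -1}


def truth(sentence, world):
    """Classical truth value of a sentence in a world."""
    if sentence == "A_or_B":
        return world["A"] or world["B"]
    return world[sentence]


def agent_payoff(world):
    """Net gain of the agent over the whole book in the given world."""
    total = Fraction(0)
    for sentence, side in BOOK.items():
        price = CREDENCE[sentence]
        payout = Fraction(1) if truth(sentence, world) else Fraction(0)
        total += side * (payout - price)
    return total


PAYOFFS = [agent_payoff(w) for w in WORLDS]
```

Every entry of `PAYOFFS` is negative and equal to the additivity gap of one tenth, so the agent loses whichever world obtains. Whether the same book goes through in a paraconsistent setting depends on the generalized results discussed next; the sketch covers the classical case only.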
J. B. Paris in [41] has shown that a generalization of the standard Dutch book theorem (his Theorem 5) applies to several non-classical logics, among them paraconsistent logics where the meaning of conjunction and disjunction satisfies the clause for finite additivity in Definition 3. This generalization encompasses our notion of paraconsistent probability based on the logic Ci as well as paraconsistent probability theories based on several other LFIs. More recently, it has been shown in [42] that the Dutch book argument can be further extended to the domain of MV-algebras, providing a logical characterization of coherence for imprecise probability.
It should be taken into account that our underlying logic Ci enlarges the classical scenarios in significant ways: for instance, even if impossible events should be assigned degree zero by a rational agent, such events are not necessarily unlikely, nor is a contradiction an impossible event (although a consistent contradiction, as commented above, is impossible).
The question of how intimate the relationship between logic and probability is has been a controversial issue for more than three centuries. For instance, F. Ramsey considered in [9] the theory of probability as a branch of logic, in which we find echoes of Leibniz and his defense of a "new kind of logic" devoted to questions of jurisprudence. Leibniz believed that mathematical calculations could help the legal standpoint, but he lacked the relevant combinatorial tools. Boole also presented a calculus of probabilities, based on his logical calculus, developed in the earlier part of his book The Laws of Thought. All this led to a tradition of regarding probabilities as a generalization of two-valued logic, as R. Jeffrey argues in [43].
Perhaps the two most relevant questions concerning the relationship between logic and probability are: (i) whether the laws of probability should be classified as laws of logic; and (ii) how logic and probability could be combined to refine reasoning. The interpretation of probability has been a hotly disputed question. If we see consequence relations as preserving degrees of probability instead of truth (as in standard logic), then we gain an interpretation that sees probability as part of logic. We do not intend to enter into this thorny issue in this paper. What concerns us here is that, independent of the chosen interpretation, logic and probability can be combined in a mathematically and philosophically well-reasoned way.
In this respect, considering that probability theory differs from classical logic in various aspects, that paraconsistent logic differs from the classical stance as well, and that both are tolerant of contradictions, inexactness, and so on, their combination offers a new and exciting reasoning paradigm. Some specific problems connected to probabilities in non-classical logics are pointed out by J. R. G. Williams in [14]: "Studying axiomatizations of non-classical probabilities is an open-ended task. Can we extend the results of Paris, Mundici et al., and get more general sense of what set of axioms are sufficient to characterize expectations of truth value? A major obstacle here is the appeal throughout to the additivity principle (P3) and its variants, which is the only one to turn essentially on the behavior of particular connectives. Is there a way of capturing its content in terms of logical relations between sentences rather such hardwired constraints?"
Our axiomatics does not suffer from such problems and naturally generalizes the additivity principle. Williams also warns (on p. 17) that the generalization of conditional probability in some cases does not guarantee that P(α|α) = 1. Again, this is not a problem in our setting. Yet another possible source of problems, the fact that P(α ∨ β) = P(α) + P(β) should hold for all logically incompatible α and β, is treated in our Theorem 7.
As concerns axiomatizations of non-classical probabilities, some other LFIs are natural candidates to be used in connection with probabilities.One of the most promising is the three-valued paraconsistent logic LFI1 (see [44]): it is maximal with respect to classical logic (thus, in a certain sense closer to classical logic), enjoys some forms of De Morgan laws, and, above all, is algebraizable.This task is beyond the scope of this paper and will be postponed to future work.
De Finetti provocatively said, in [30], that probability does not exist, meaning that probability exists only subjectively, within individual minds, a view with which Leibniz would certainly agree. For de Finetti, probability does not need to exist as a property of the outside world and, thus, is a purely epistemic concept. In this sense, paraconsistent probability as attached to the logics of formal inconsistency can be seen as a measure of the persistence of belief under contradictions within individual minds and has epistemological roots with strong connections to the notion of evidence, as defended in [2].
Many arguments maintain that credences should conform to the probability calculus, but since an agent can of course hold contradictory credences without losing its status as ideally rational (and, hopefully, while remaining immune to a Dutch book argument), its rationality should be coherent with a paraconsistent probability calculus. It seems to us that what we have in hand is a promising tool to enlarge the notion of probability.