## 1. Introduction

**symmetries and asymmetries** as desirable properties; Crupi et al. [8] and Greco et al. [15] suggested **normalization** (for measures between −1 and 1) as a desirable property; Greco et al. [16] proposed **monotonicity** as a desirable property. We can find that only measures F (proposed by Kemeny and Oppenheim) and Z among popular confirmation measures possess these desirable properties. Measure Z was proposed by Crupi et al. [8] as the normalization of some other confirmation measures. It is also called the certainty factor proposed by Shortliffe and Buchanan [7].

- to distinguish channel confirmation measures that are compatible with the likelihood ratio and prediction confirmation measures that can be used to assess probability predictions,
- to use a prediction confirmation measure c* to eliminate the Raven Paradox, and
- to explain that confirmation and falsification may be compatible.

- Confirmation and statistical learning support each other, so that confirmation measures can be used not only to assess major premises but also to make probability predictions.

- It clarifies that we cannot use one confirmation measure for two different tasks: (1) to assess (communication) channels, such as medical tests as testing means, and (2) to assess probability predictions, such as to assess “Ravens are black”.
- It provides measure c* that manifests the Nicod criterion and hence provides a new method to clarify the Raven Paradox.

## 2. Background

#### 2.1. Statistical Probability, Logical Probability, Shannon’s Channel, and Semantic Channel

P(y_{j}|x_{i}) (j = 0, 1, …, n; i = 0, 1, …, m) or a group of transition probability functions (so called by Shannon [21]), P(y_{j}|x) (j = 0, 1, …, n), where y_{j} is a constant, and x is a variable.

Let T(θ_{j}|x) be the truth function of y_{j}, where θ_{j} is a model or a set of model parameters, by which we construct T(θ_{j}|x). The θ_{j} is also explained as a fuzzy sub-set of the domain of x [17]. For example, y_{j} = “x is young”. Its truth function may be

$T(\theta_j|x) = \exp[-(x - 20)^2/25],$

For y_{k} = “x is elderly”, its truth function may be a logistic function:

$T(\theta_k|x) = 1/\{1 + \exp[-0.2(x - 65)]\},$

Given P(y_{j}|x), we can make the probability prediction P(x|y_{j}) by

$P(x|y_j) = P(x)P(y_j|x)/P(y_j),$

Given T(θ_{j}|x), we can also make a probability prediction or produce a likelihood function by

T(θ_{j}) is the logical probability of y_{j}. There is

P(x|θ_{j}) = P(x|y_{j}) as T(θ_{j}|x)∝P(y_{j}|x). Since the maximum of T(θ_{j}|x) is 1, letting P(x|θ_{j}) = P(x|y_{j}), we can obtain the optimized truth function [17]:

$T^*(\theta_j|x) = [P(x|y_j)/P(x)]/\max[P(x|y_j)/P(x)] = P(y_j|x)/\max[P(y_j|x)],$
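The truth functions and the optimized truth function above can be sketched numerically. The following is a minimal illustration, not the author's implementation. It assumes that the semantic prediction takes the form P(x|θ_{j}) = P(x)T(θ_{j}|x)/T(θ_{j}) with T(θ_{j}) = Σ_{i} P(x_{i})T(θ_{j}|x_{i}); the function names, the three-point age domain, and the probabilities are hypothetical.

```python
import math

def truth_young(x):
    """T(theta_j|x) = exp[-(x - 20)^2 / 25] for y_j = "x is young"."""
    return math.exp(-((x - 20) ** 2) / 25)

def truth_elderly(x):
    """Logistic truth function 1 / {1 + exp[-0.2 (x - 65)]} for "x is elderly"."""
    return 1.0 / (1.0 + math.exp(-0.2 * (x - 65)))

def optimized_truth(p_y_given_x):
    """T*(theta_j|x) = P(y_j|x) / max_x P(y_j|x); the maximum becomes 1."""
    peak = max(p_y_given_x.values())
    return {x: p / peak for x, p in p_y_given_x.items()}

def semantic_bayes(prior, truth):
    """Assumed form: P(x|theta_j) = P(x) T(theta_j|x) / T(theta_j)."""
    logical_prob = sum(prior[x] * truth[x] for x in prior)  # T(theta_j)
    return {x: prior[x] * truth[x] / logical_prob for x in prior}

# Hypothetical three-point domain of ages, for illustration only.
prior = {20: 0.5, 40: 0.3, 70: 0.2}          # P(x)
p_y_given_x = {20: 0.8, 40: 0.2, 70: 0.0}    # P(y_j|x)

t_star = optimized_truth(p_y_given_x)
prediction = semantic_bayes(prior, t_star)   # P(x|theta_j)
```

Under these assumptions, the prediction obtained from the optimized truth function coincides with the classical Bayes prediction P(x|y_{j}), which is the sense in which T*(θ_{j}|x)∝P(y_{j}|x) matches the Shannon channel.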

#### 2.2. To Review Popular Confirmation Measures

We use h_{1} to denote a hypothesis, h_{0} to denote its negation, and h to denote one of them. We use e_{1}, another hypothesis, as the evidence of h_{1}, e_{0} as its negation, and e as one of them. We use c(e, h) to represent a confirmation measure, which means the degree of inductive support. Note that c(e, h) here is used as in [8], where e is on the left, and h is on the right.

- $Z(h_1, e_1) = \begin{cases} [P(h_1|e_1) - P(h_1)]/P(h_0), & \text{as } P(h_1|e_1) \ge P(h_1), \\ [P(h_1|e_1) - P(h_1)]/P(h_1), & \text{otherwise}, \end{cases}$
- F(e_{1}, h_{1}) = [P(e_{1}|h_{1}) − P(e_{1}|h_{0})]/[P(e_{1}|h_{1}) + P(e_{1}|h_{0})] (Kemeny and Oppenheim, 1952 [12]).

P(h_{1}) and P(h_{1}|e_{1}) in D, R, and C, are logical probabilities. Some authors explain that probabilities they use, such as P(e_{1}|h_{1}) in F, are statistical probabilities.

- Hypothesis 1: h_{1}(x) = “x is elderly”, where x is a variable for an age and h_{1}(x) is a predicate. An instance x = 70 may be the evidence, and the truth value T(θ_{1}|70) of proposition h_{1}(70) should be 1. If x = 50, the (uncertain) truth value should be less, such as 0.5. Let e_{1} = “x ≥ 60”; true e_{1} may also be the evidence that supports h_{1} so that T(θ_{1}|e_{1}) > T(θ_{1}).
- Hypothesis 2: h_{1}(x) = “If age x ≥ 60, then x is elderly”, which is a hypothetical judgment, a major premise, or a rule. Note that x = 70 or x ≥ 60 is only the evidence of the consequent “x is elderly” instead of the evidence of the rule. The rule’s evidence should be a sample with many examples.
- Hypothesis 3: e_{1}→h_{1} = “If age x ≥ 60, then x is elderly”, which is the same as Hypothesis 2. The difference is that e_{1} = “x ≥ 60”; h_{1} = “x is elderly”. The evidence is a sample with many examples like {(e_{1}, h_{1}), (e_{1}, h_{0}), …}, or a sampling distribution P(e, h), where P means statistical probability.

- Understanding 1: The h is the major premise to be confirmed, and e is the evidence that supports h; h and e are so used by Eells and Fitelson [14].
- Understanding 2: The e and h are those in rule e→h as used by Kemeny and Oppenheim [12]. The e is only the evidence that supports consequent h instead of the major premise e→h (see Section 2.3 for further analysis).

(e_{1}, h_{1}), (e_{0}, h_{1}), (e_{1}, h_{0}), and (e_{0}, h_{0}) as the evidence to confirm a rule and to use the four examples’ numbers a, b, c, and d (see Table 1) to construct confirmation measures. The following statements are based on this common view.

(e_{1}, h_{1}). For example, e_{1} = “raven” (“raven” is a label or the abbreviation of “x is a raven”) and h_{1} = “black”; a is the number of black ravens. Similarly, b is the number of black non-raven things; c is the number of non-black ravens; d is the number of non-black and non-raven things.

- Hypothesis Symmetry (HS): c(e_{1}→h_{1}) = −c(e_{1}→h_{0}) (two consequents are opposite),
- Evidence Symmetry (ES): c(e_{1}→h_{1}) = −c(e_{0}→h_{1}) (two antecedents are opposite),
- Commutativity Symmetry (CS): c(e_{1}→h_{1}) = c(h_{1}→e_{1}), and
- Total Symmetry (TS): c(e_{1}→h_{1}) = c(e_{0}→h_{0}).

e_{1} = “positive” (e.g., “x is positive”, where x is a specimen), e_{0} = “negative”, h_{1} = “infected” (e.g., “x is infected”), and h_{0} = “uninfected”. Then the positive likelihood ratio is LR^{+} = P(e_{1}|h_{1})/P(e_{1}|h_{0}), which indicates the reliability of the rule e_{1}→h_{1}. Measures L and F have a one-to-one correspondence with LR^{+}:

$L(e_1 \to h_1) = \log LR^+;$

$F(e_1, h_1) = (LR^+ - 1)/(LR^+ + 1).$
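The correspondence between LR^{+}, L, and F can be checked with a few lines of code. This is a minimal sketch; the function names are illustrative, and the example values (sensitivity 0.5, specificity 0.95) are the ones used for the medical test later in the paper. It assumes specificity < 1 so that LR^{+} is finite.

```python
import math

def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = P(e1|h1) / P(e1|h0) = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)

def measure_L(lr, base=2):
    """L = log LR+ (base 2 is an illustrative choice)."""
    return math.log(lr, base)

def measure_F(lr):
    """F = (LR+ - 1) / (LR+ + 1), which always lies between -1 and 1."""
    return (lr - 1.0) / (lr + 1.0)

lr = positive_likelihood_ratio(0.5, 0.95)   # about 10
L_value = measure_L(lr)                     # about 3.32
F_value = measure_F(lr)                     # about 0.818
```

Both L and F are monotonically increasing functions of LR^{+}, so they rank rules identically; F has the advantage of being normalized.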

#### 2.3. To Distinguish a Major Premise’s Evidence and Its Consequent’s Evidence

#### 2.4. Incremental Confirmation or Absolute Confirmation

h_{1} = “x is elderly” is 0.2; the evidence is one or several people with age(s) x > 60; the conditional logical probability of h_{1} is 0.9. With measure D, the degree of confirmation is 0.9 − 0.2 = 0.7, which is very large and irrelevant to the prior knowledge.

#### 2.5. The Semantic Channel and the Degree of Belief of Medical Tests

h_{1} denotes an infected specimen (or person), h_{0} denotes an uninfected specimen, e_{1} is positive, and e_{0} is negative. We can treat e_{1} as a prediction “h is infected” and e_{0} as a prediction “h is uninfected”. In other words, h is a true label or true statement, and e is a prediction or selected label. The x is the observed feature of h; E_{0} and E_{1} are two sub-sets of the domain of x. If x is in E_{1}, then e_{1} is selected; if x is in E_{0}, then e_{0} is selected.

P(x|h_{0}) and P(x|h_{1}) and the magnitudes of four conditional probabilities (with four colors).

P(e_{1}|h_{1}) is called sensitivity [18], and P(e_{0}|h_{0}) is called specificity. They determine a Shannon channel, which is denoted by P(e|h), as shown in Table 2.

e_{1}(h) as the combination of believable and unbelievable parts (see Figure 4). The truth function of the believable part is T(E_{1}|h)∈{0,1}. The unbelievable part is a tautology, whose truth function is always 1. Then we have the truth functions of predicates e_{1}(h) and e_{0}(h):

$T(\theta_{e1}|h) = b_1' + b_1 T(E_1|h); \quad T(\theta_{e0}|h) = b_0' + b_0 T(E_0|h).$

b_{1}’ is the proportion of the unbelievable part, and also the truth value for the counter-instance h_{0}.

_{1}is

_{j}) is also the predicted probability of h according to T(θ_{e1}|h) or the semantic meaning of e_{1}.

#### 2.6. Semantic Information Formulas and the Nicod–Fisher Criterion

y_{j} about x_{i} is defined with the log-normalized-likelihood:

T(θ_{j}|x_{i}) is the truth value of proposition y_{j}(x_{i}) and T(θ_{j}) is the logical probability of y_{j}. If T(θ_{j}|x) is always 1, then this semantic information formula becomes Carnap and Bar-Hillel’s semantic information formula [30].

y_{j} = “x is about x_{j}.” We can express the truth functions of y_{j} by

$T(\theta_j|x) = \exp[-(x - x_j)^2/(2\sigma^2)].$

I(x_{i}; θ_{j}), we have generalized Kullback–Leibler information or relative cross-entropy:

P(x|y_{j}) is the sampling distribution, and P(x|θ_{j}) is the likelihood function. If P(x|θ_{j}) is equal to P(x|y_{j}), then I(X; θ_{j}) reaches its maximum and becomes the relative entropy or the Kullback–Leibler divergence.

e_{1} about h becomes

P(h_{i}|e_{1}) is the conditional probability from a sample.

Let **D** be a sample {(h(t), e(t))|t = 1 to N; h(t)∈{h_{0}, h_{1}}; e(t)∈{e_{0}, e_{1}}}, which includes two sub-samples or conditional samples **H**_{0} with label e_{0} and **H**_{1} with label e_{1}. When N data points in **D** come from independent and identically distributed (IID) random variables, we have the log-likelihood

N_{1i} is the number of examples (h_{i}, e_{1}) in **D**; N_{1} is the size of **H**_{1}. H(h|θ_{e1}) is the cross-entropy. If P(h|θ_{e1}) = P(h|e_{1}), then the cross-entropy becomes the Shannon entropy. Meanwhile, the cross-entropy reaches its minimum, and the likelihood reaches its maximum.

A positive example (e_{1}, h_{1}) increases the average log-likelihood L(θ_{e1})/N_{1}; a counterexample (e_{1}, h_{0}) decreases it; examples (e_{0}, h_{0}) and (e_{0}, h_{1}) with e_{0} are irrelevant to it.
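The effect of each kind of example on the average log-likelihood can be illustrated numerically. The sketch below assumes that the log-likelihood has the usual form L(θ_{e1}) = Σ_{i} N_{1i} log P(h_{i}|θ_{e1}), and that the rule e_{1}→h_{1} predicts h_{1} with a fixed probability greater than 0.5 (0.9 here); the counts and the helper names are hypothetical.

```python
import math

def average_log_likelihood(counts, p_h1):
    """L(theta_e1)/N_1 for a fixed prediction P(h1|theta_e1) = p_h1.

    Only examples labelled e1 enter the sum (assumed form of the
    log-likelihood); examples labelled e0 are never read.
    """
    n11 = counts[("e1", "h1")]   # positive examples
    n10 = counts[("e1", "h0")]   # counterexamples
    return (n11 * math.log(p_h1) + n10 * math.log(1.0 - p_h1)) / (n11 + n10)

def with_extra(counts, key):
    """Return a copy of counts with one more example of the given kind."""
    updated = dict(counts)
    updated[key] += 1
    return updated

# Hypothetical sample: a = d = 20, b = c = 10.
sample = {("e1", "h1"): 20, ("e0", "h1"): 10, ("e1", "h0"): 10, ("e0", "h0"): 20}
P_H1 = 0.9  # hypothetical fixed prediction, assumed > 0.5

baseline = average_log_likelihood(sample, P_H1)
after_positive = average_log_likelihood(with_extra(sample, ("e1", "h1")), P_H1)
after_counter = average_log_likelihood(with_extra(sample, ("e1", "h0")), P_H1)
after_e0 = average_log_likelihood(with_extra(sample, ("e0", "h0")), P_H1)
```

With these numbers, one more positive example raises the average, one more counterexample lowers it, and an example with e_{0} leaves it unchanged.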

A positive example (e_{1}, h_{1}) supports rule e_{1}→h_{1}; a counterexample (e_{1}, h_{0}) undermines e_{1}→h_{1}. No reference exactly indicates if Nicod affirmed that (e_{0}, h_{0}) and (e_{0}, h_{1}) are irrelevant to e_{1}→h_{1}. If Nicod did not affirm this, we can add this affirmation to the criterion and call the resulting criterion the Nicod–Fisher criterion, since Fisher proposed the maximum likelihood estimation. From now on, we use the Nicod–Fisher criterion to replace the Nicod criterion.

#### 2.7. Selecting Hypotheses and Confirming Rules: Two Tasks from the View of Statistical Learning

- Induction. It is similar to label learning. For uncertain hypotheses, label learning is to train a likelihood function P(x|θ_{j}) or a truth function T(θ_{j}|x) by a sampling distribution [17]. The logistic function often used for binary classifications may be treated as a truth function.
- Hypothesis selection. It is like classification according to different criteria.
- Confirmation. It is similar to reliability analysis. The classical methods are to provide likelihood ratios and correct rates (including false rates, as those in Table 8).

e_{j} is used to predict h_{j}:

I(h_{i}; θ_{ej}) (i, j = 0, 1), we can optimize the classifier [17]:

e_{j} = “x is h_{j}” according to x. To tell information receivers how reliable the rule e_{j}→h_{j} is, we need the likelihood ratio LR to indicate how good the channel is or need the correct rate to indicate how good the probability prediction is. Confirmation is similar. We need to provide a confirmation measure similar to LR, such as F, and a confirmation measure similar to the correct rate. The difference is that the confirmation measures should change between −1 and 1.

## 3. Two Novel Confirmation Measures

#### 3.1. To Derive Channel Confirmation Measure b*

e_{1} about h is

Letting dI(h; θ_{e1})/db_{1}’ = 0, we can obtain the optimized b_{1}’:

P(h_{1}|e_{1})/P(h_{1}) ≥ P(h_{0}|e_{1})/P(h_{0}). The b’* can be called a disconfirmation measure. Multiplying both the numerator and the denominator by P(e_{1}), the above formula becomes:

$b_1'^* = P(e_1|h_0)/P(e_1|h_1) = (1 - \text{specificity})/\text{sensitivity} = 1/LR^+.$

When T(θ_{e1}|h)∝P(e_{1}|h), the average semantic information reaches its maximum. Using T*(θ_{e1}|h)∝P(e_{1}|h), we can directly obtain

$b_1^* = 1 - b_1'^* = [P(e_1|h_1) - P(e_1|h_0)]/P(e_1|h_1)$

e_{1}→h_{1}. Considering P(e_{1}|h_{1}) < P(e_{1}|h_{0}), we have

$b_1^* = b_1'^* - 1 = [P(e_1|h_1) - P(e_1|h_0)]/P(e_1|h_0).$

b_{1}* possesses HS or Consequent Symmetry.

$b^*(e_1 \to h_0) = -b^*(e_1 \to h_1)$ and $b^*(e_0 \to h_1) = -b^*(e_0 \to h_0).$

Given b_{1}* > 0 and P(h), we obtain

$P(h_1|\theta_{e1}) = P(h_1)/[P(h_1) + b_1'^* P(h_0)] = P(h_1)/[1 - b_1^* P(h_0)].$

If b_{1}* = 0, then P(h_{1}|θ_{e1}) = P(h_{1}). If b_{1}* < 0, then we can make use of HS or Consequent Symmetry to obtain b_{10}* = b_{1}*(e_{1}→h_{0}) = |b_{1}*(e_{1}→h_{1})| = |b_{1}*|. Then we have

$P(h_0|\theta_{e1}) = P(h_0)/[P(h_0) + b_{10}'^* P(h_1)] = P(h_0)/[1 - b_{10}^* P(h_1)].$

We can obtain b_{1}* = 2F_{1}/(1 + F_{1}) from F_{1} = F(e_{1}→h_{1}) for the probability prediction P(h_{1}|θ_{e1}), but the calculation of probability predictions with F_{1} is a little complicated.

The increment of F(e_{1}→h_{1}) caused by Δd = 1 is 0.348 − 0.333 = 0.015, whereas the increment caused by Δa = 1 is 0.340 − 0.333 = 0.007. The former is greater than the latter, which means that a piece of white chalk can support “Ravens are black” better than a black raven. Hence measure F does not accord with the Nicod–Fisher criterion. Measures b* and Z do not either.

P(h|θ_{e1}) is related to prior probability P(h), whereas b* and F are independent of P(h).

#### 3.2. To Derive Prediction Confirmation Measure c*

sensitivity P(e_{1}|h_{1}) = 0.5 and specificity P(e_{0}|h_{0}) = 0.95. We can calculate b_{1}’* = 0.1 and b_{1}* = 0.9. When the prior probability P(h_{1}) of the infection changes, predicted probability P(h_{1}|θ_{e1}) (see Equation (35)) changes with the prior probability, as shown in Table 4. We can obtain the same results using the classical Bayes’ formula (see Equation (5)).

P(h|θ_{e1}) as the combination of a believable part with proportion c_{1} and an unbelievable part with proportion c_{1}’, as shown in Figure 6. We call c_{1} the degree of belief of the rule e_{1}→h_{1} as a prediction.

When P(h|θ_{e1}) = P(h|e_{1}), c_{1} becomes c_{1}*. The degree of disconfirmation for predictions is

$c'^*(e_1 \to h_1) = P(h_0|e_1)/P(h_1|e_1)$, if $P(h_0|e_1) \le P(h_1|e_1)$;

$c'^*(e_1 \to h_1) = P(h_1|e_1)/P(h_0|e_1)$, if $P(h_1|e_1) \le P(h_0|e_1)$.

CR_{1} = P(h_{1}|θ_{e1}) = P(h_{1}|e_{1}) is the correct rate of rule e_{1}→h_{1}. This correct rate means that the probability of h_{1} we predict as x∈E_{1} is CR_{1}. Multiplying both the numerator and denominator of Equation (38) by P(e_{1}), we obtain

$c^*(e_1 \to h_0) = -c^*(e_1 \to h_1)$ and $c^*(e_0 \to h_1) = -c^*(e_0 \to h_0).$

P(h_{0}) and P(h_{1}), which are different. If P(h_{0}) = P(h_{1}) = 0.5, then prediction confirmation measure c* is equal to channel confirmation measure b*.

For P(h_{1}|θ_{e1}) = 0.77 in Table 4, we have c_{1}* = (0.77 − 0.23)/0.77 = 0.701. We can also use c* for probability predictions. When c_{1}* > 0, according to Equation (39), we have the correct rate of rule e_{1}→h_{1}:

If c_{1}* = 0.701, then CR_{1} = 1/(2 − 0.701) = 0.77. If c*(e_{1}→h_{1}) = 0, then CR_{1} = 0.5. If c*(e_{1}→h_{1}) < 0, we may make use of HS to have c_{10}* = c*(e_{1}→h_{0}) = |c_{1}*|, and then make probability prediction:

c_{F}* is also convenient for probability predictions when P(h) is certain. There is

c*(e_{1}→h_{1}) and c_{F}*(e_{1}→h_{1}) possess all the above-mentioned desirable properties.
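The relation between c* and the correct rate can be sketched as follows. The function names are illustrative; the inverse mapping CR_{1} = 1/(2 − c_{1}*) for c_{1}* ≥ 0 is the one used in the worked example above, and the branch for negative c* is derived here from Hypothesis Symmetry under that same definition.

```python
def c_star(p_h1_given_e1):
    """Prediction confirmation measure c*(e1->h1) from P(h1|e1)."""
    p1 = p_h1_given_e1
    p0 = 1.0 - p1
    return (p1 - p0) / max(p1, p0)

def correct_rate(c):
    """Recover CR1 = P(h1|e1) from c*(e1->h1)."""
    if c >= 0:
        return 1.0 / (2.0 - c)
    # c < 0: c10* = |c*| gives P(h0|e1) = 1/(2 - |c*|); take the complement.
    return 1.0 - 1.0 / (2.0 + c)

c1 = c_star(0.77)        # about 0.701
cr1 = correct_rate(c1)   # back to 0.77
```

The two functions are inverses of each other on (0, 1), so c* carries the same information as the correct rate while being normalized to the interval from −1 to 1.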

#### 3.3. Converse Channel/Prediction Confirmation Measures b*(h→e) and c*(h→e)

- Bayesian confirmation measures with P(h|e) for e→h,
- Likelihoodist confirmation measures with P(e|h) for e→h,
- converse Bayesian confirmation measures with P(h|e) for h→e, and
- converse Likelihoodist confirmation measures with P(e|h) for h→e.

- channel confirmation measure b*(e→h),
- prediction confirmation measure c*(e→h),
- converse channel confirmation measure b*(h→e), and
- converse prediction confirmation measure c*(h→e).

c*(h_{1}→e_{1}). The positive examples’ proportion and the counterexamples’ proportion can be found in the upper part of Figure 7. Then we have

The correct rate reflected by c*(h_{1}→e_{1}) is sensitivity or true positive rate P(e_{1}|h_{1}). The correct rate reflected by c*(h_{0}→e_{0}) is specificity or true negative rate P(e_{0}|h_{0}).

b*(h_{1}→e_{1}). Now the source is P(e) instead of P(h). We may swap e_{1} with h_{1} in b*(e_{1}→h_{1}) or swap b with c in f(a, b, c, d) to obtain

#### 3.4. Eight Confirmation Formulas for Different Antecedents and Consequents

b*(e_{1}→h_{1}) or c*(e_{1}→h_{1}) means that the test shows positive for more uninfected people.

b*(h_{1}→e_{1}) or c*(h_{1}→e_{1}) means that the test shows negative for more infected people. Underreports are more serious problems.

b_{F}* = F, and measure c* becomes measure c_{F}*. For example,

$c_F^*(e_1 \to h_1) = (a - c)/(a + c).$
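The eight confirmation formulas can be computed directly from the four counts a, b, c, and d, where a counts (e_{1}, h_{1}), b counts (e_{0}, h_{1}), c counts (e_{1}, h_{0}), and d counts (e_{0}, h_{0}). The following sketch uses illustrative names and assumes that every marginal count (a + b, c + d, a + c, b + d) is positive.

```python
def _normalized_difference(positive, negative):
    """(positive - negative) / max(positive, negative), or 0 if both are 0."""
    top = max(positive, negative)
    return 0.0 if top == 0 else (positive - negative) / top

def confirmation_measures(a, b, c, d):
    """Eight degrees of confirmation from the counts a, b, c, d."""
    return {
        # channel measures, based on P(e|h)
        "b*(e1->h1)": _normalized_difference(a / (a + b), c / (c + d)),
        "b*(e0->h0)": _normalized_difference(d / (c + d), b / (a + b)),
        # prediction measures, based on P(h|e)
        "c*(e1->h1)": _normalized_difference(a, c),
        "c*(e0->h0)": _normalized_difference(d, b),
        # converse channel measures, based on P(h|e)
        "b*(h1->e1)": _normalized_difference(a / (a + c), b / (b + d)),
        "b*(h0->e0)": _normalized_difference(d / (b + d), c / (a + c)),
        # converse prediction measures, based on P(e|h)
        "c*(h1->e1)": _normalized_difference(a, b),
        "c*(h0->e0)": _normalized_difference(d, c),
    }

example_2 = confirmation_measures(200, 0, 720, 80)
```

For a = 200, b = 0, c = 720, d = 80, the channel measure b*(e_{1}→h_{1}) is 0.1 while the prediction measure c*(e_{1}→h_{1}) is about −0.722: the same sample can weakly confirm the channel and disconfirm the prediction.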

#### 3.5. Relationship Between Measures b* and F

b_{F}* as follows:

b_{F}*. Measure b* has all the above-mentioned desirable properties, as does measure F. The differences are that measure b* has a greater absolute value than measure F, and that measure b* can be used for probability predictions more conveniently (see Equation (35)).
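The relationship between b* and F can be verified numerically. For F ≥ 0 the conversion b* = 2F/(1 + F) is the one stated in Section 3.1; the single expression 2F/(1 + |F|) used below, which also covers F < 0, is derived here from the definitions of the two measures and should be read as an assumption of this sketch. The function names are illustrative.

```python
def F_from_rates(p_e1_given_h1, p_e1_given_h0):
    """F = [P(e1|h1) - P(e1|h0)] / [P(e1|h1) + P(e1|h0)]."""
    return (p_e1_given_h1 - p_e1_given_h0) / (p_e1_given_h1 + p_e1_given_h0)

def b_star_from_rates(p_e1_given_h1, p_e1_given_h0):
    """b* = [P(e1|h1) - P(e1|h0)] / max[P(e1|h1), P(e1|h0)]."""
    return (p_e1_given_h1 - p_e1_given_h0) / max(p_e1_given_h1, p_e1_given_h0)

def b_star_from_F(F):
    """Convert F to b*; equals 2F/(1 + F) for F >= 0."""
    return 2.0 * F / (1.0 + abs(F))

# (P(e1|h1), P(e1|h0)) pairs: confirming, disconfirming, and weakly confirming.
pairs = [(0.5, 0.05), (0.05, 0.5), (1.0, 0.9)]
converted = [b_star_from_F(F_from_rates(p1, p0)) for p1, p0 in pairs]
direct = [b_star_from_rates(p1, p0) for p1, p0 in pairs]
```

Since 2/(1 + |F|) ≥ 1, the absolute value of b* is never smaller than that of F, and the two measures always share the same sign.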

#### 3.6. Relationships between Prediction Confirmation Measures and Some Medical Test’s Indexes

False Discovery Rate P(h_{0}|e_{1}) is also the misreporting rate of rule e_{1}→h_{1}; False Negative Rate P(e_{0}|h_{1}) is also the underreporting rate of rule h_{1}→e_{1}.

## 4. Results

#### 4.1. Using Three Examples to Compare Various Confirmation Measures

In Example 1, b*(e_{1}→h_{1}) = (0.1 − 0.01)/0.1 = 0.9, which is very large. In Example 2, b*(e_{1}→h_{1}) = (1 − 0.9)/1 = 0.1, which is very small. The two examples indicate that having fewer counterexamples matters more to b* than having more positive examples. Measures F, c*, and c_{F}* also possess this characteristic, which is compatible with the Logicality requirement [15]. However, most confirmation measures do not possess this characteristic.

P(h_{1}) = 0.2 and n = 1000 and then calculated the degrees of confirmation with different confirmation measures for the above two examples, as shown in Table 9, where the base of log for R and L is 2. Table 9 also includes Example 3 (Ex. 3), in which P(h_{1}) is 0.01. Example 3 reveals the difference between Z and b* (or F).

c*(e_{1}→h_{1}) are negative. The negative values should be reasonable for assessing probability predictions when counterexamples are more than positive examples.

P(h_{0}) = 0.99 >> P(h_{1}) = 0.01, measure Z is very different from measures F and b* (see blue numbers) because F and b* are independent of P(h) unlike Z.

#### 4.2. Using Measures b* to Explain Why And How CT is also Used to Test COVID-19

$b_1^* = [P(e_1|h_1) - P(e_1|h_0)]/P(e_1|h_1) = [0.5 - (1 - 0.95)]/0.5 = 0.9;$

$b_0^* = [P(e_0|h_0) - P(e_0|h_1)]/P(e_0|h_0) = [0.95 - (1 - 0.5)]/0.95 = 0.47.$
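The same calculation can be repeated for both tests and used to combine their results. The sketch below takes the sensitivities and specificities of NAT and CT from Table 10. The combination rule (believe the test result whose b* is larger) is an interpretation of Table 11 rather than a rule stated explicitly in the text, and the function names are illustrative.

```python
# (sensitivity, specificity) of each test, as listed in Table 10
SENS_SPEC = {"NAT": (0.5, 0.95), "CT": (0.8, 0.75)}

def channel_b_stars(sensitivity, specificity):
    """Return (b0*, b1*) = (b*(e0->h0), b*(e1->h1)) for one test."""
    b1 = (sensitivity - (1 - specificity)) / max(sensitivity, 1 - specificity)
    b0 = (specificity - (1 - sensitivity)) / max(specificity, 1 - sensitivity)
    return b0, b1

def final_diagnosis(results):
    """Combine test results, given as {test name: True if positive}.

    Assumed rule: the result backed by the larger b* wins.
    """
    best_b, verdict = float("-inf"), None
    for test, positive in results.items():
        b0, b1 = channel_b_stars(*SENS_SPEC[test])
        b = b1 if positive else b0
        if b > best_b:
            best_b, verdict = b, positive
    return verdict

nat_b0, nat_b1 = channel_b_stars(*SENS_SPEC["NAT"])   # about 0.47 and 0.9
ct_b0, ct_b1 = channel_b_stars(*SENS_SPEC["CT"])      # about 0.73 and 0.69
changed = final_diagnosis({"NAT": False, "CT": True})
```

Under this rule a NAT-negative result is overturned by a CT-positive result, because b_{1}*(CT) ≈ 0.69 exceeds b_{0}*(NAT) ≈ 0.47, whereas a NAT-positive result (b_{1}* = 0.9) is never overturned by CT.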

With P(h_{1}) = 0.25, the author calculated the various degrees of confirmation with different confirmation measures for the same sensitivities and specificities, as shown in Table 12.

If we change P(h_{1}) from 0.1 to 0.6, we will find that measure M is also not consistent with the improved diagnosis. If we believe a test-positive or test-negative when its degree of confirmation is greater than 0.2, then D is also undesirable, and only measures F and b* satisfy our requirements.

#### 4.3. How Various Confirmation Measures are Affected by Increments Δa and Δd

P(h_{1}) and P(e_{1}) more than increasing P(h_{1}|e_{1}) and P(e_{1}|h_{1}). The causes for other measures except c* are similar.
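The increments discussed in this section can be reproduced with a short calculation. The sketch below evaluates F(e_{1}→h_{1}) = (ad − bc)/(ad + bc + 2ac) and c*(e_{1}→h_{1}) = (a − c)/max(a, c) at a = d = 20, b = c = 10, and then with one more black raven (Δa = 1) or one more non-black non-raven thing (Δd = 1). The function names are illustrative.

```python
def measure_F(a, b, c, d):
    """F(e1->h1) = (ad - bc) / (ad + bc + 2ac)."""
    return (a * d - b * c) / (a * d + b * c + 2 * a * c)

def measure_c_star(a, b, c, d):
    """c*(e1->h1) = (a - c) / max(a, c); b and d are not used."""
    return (a - c) / max(a, c)

def increments(measure, a, b, c, d):
    """Return (increment caused by one more a, increment caused by one more d)."""
    base = measure(a, b, c, d)
    return measure(a + 1, b, c, d) - base, measure(a, b, c, d + 1) - base

dF_a, dF_d = increments(measure_F, 20, 10, 10, 20)        # about 0.007 and 0.015
dc_a, dc_d = increments(measure_c_star, 20, 10, 10, 20)   # about 0.024 and 0
```

F reacts more strongly to Δd than to Δa, whereas c* increases with a and ignores d altogether, which is the behavior the Nicod–Fisher criterion requires.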

## 5. Discussions

#### 5.1. To Clarify the Raven Paradox

F(e_{1}→h_{1}) and F(h_{0}→e_{0}) is that their counterexamples are the same, yet their positive examples are different. When d increases to d + Δd, F(e_{1}→h_{1}) and F(h_{0}→e_{0}) unequally increase. Therefore,

- though measure F denies the Equivalence Condition, it still affirms that Δd affects both F(e_{1}→h_{1}) and F(h_{0}→e_{0});
- measure F does not accord with the Nicod–Fisher criterion.

_{1}→h_{1}) can evidently increase with a and slightly increase with d. After comparing different confirmation measures, Fitelson and Hawthorne [28] believe that the likelihood ratio may be used to explain that a black raven can confirm “Ravens are black” more strongly than a non-black non-raven thing.

Since c*(e_{1}→h_{1}) = (a − c)/(a∨c) and c*(h_{0}→e_{0}) = (d − c)/(d∨c), the Equivalence Condition does not hold, and measure c* accords with the Nicod–Fisher criterion very well. Hence, the Raven Paradox does not exist anymore according to measure c*.

#### 5.2. About Incremental Confirmation and Absolute Confirmation

_{1}→h_{1}) increases from 0.1667 to 0.1669; c*(e_{1}→h_{1}) increases from 0.5 to 0.5025. The increments are about 1/10 of those in Table 13. Therefore, the increment of the degree of confirmation brought about by a new example is closely related to the number of old examples or our prior knowledge.

- the sample size n is big enough;
- each example is selected independently;
- examples are representative.

#### 5.3. Is Hypothesis Symmetry or Consequent Symmetry desirable?

#### 5.4. About Bayesian Confirmation and Likelihoodist Confirmation

#### 5.5. About the Certainty Factor for Probabilistic Expert Systems

b*(e_{1}→h_{1}) is related to the believable part of the truth function of predicate e_{1}(h). It is similar to CF(h_{1}→e_{1}). The differences are that b*(e_{1}→h_{1}) is independent of P(h) whereas CF(h_{1}→e_{1}) is related to P(h); b*(e_{1}→h_{1}) is compatible with statistical probability theory whereas CF(h_{1}→e_{1}) is not.

#### 5.6. How Confirmation Measures F, b*, and c* are Compatible with Popper’s Falsification Thought

P(e_{1}|h_{1}) (for b*) or P(h_{1}|e_{1}) (for c*). In Example 2 of Table 9, although the proportion of positive examples is large, the proportion of counterexamples is not small, so that the degree of confirmation is very small. This example shows that to raise the degree of confirmation, it is not sufficient to increase the posterior probability. It is necessary and sufficient to decrease the relative proportion of counterexamples.

_{1}→h_{1}) is what they need.

## 6. Conclusions

## Supplementary Materials

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. Carnap, R. Logical Foundations of Probability, 2nd ed.; University of Chicago Press: Chicago, IL, USA, 1962.
2. Popper, K. Conjectures and Refutations, 1st ed.; Routledge: London, UK; New York, NY, USA, 2002.
3. Hempel, C.G. Studies in the Logic of Confirmation. Mind **1945**, 54, 1–26, 97–121.
4. Nicod, J. Le Problème Logique De L’induction; Alcan: Paris, France, 1924; p. 219, (Engl. Transl. The logical problem of induction. In Foundations of Geometry and Induction; Routledge: London, UK, 2000.).
5. Mortimer, H. The Logic of Induction; Prentice Hall: Paramus, NJ, USA, 1988.
6. Horwich, P. Probability and Evidence; Cambridge University Press: Cambridge, UK, 1982.
7. Shortliffe, E.H.; Buchanan, B.G. A model of inexact reasoning in medicine. Math. Biosci. **1975**, 23, 351–379.
8. Crupi, V.; Tentori, K.; Gonzalez, M. On Bayesian measures of evidential support: Theoretical and empirical issues. Philos. Sci. **2007**, 74, 229–252.
9. Christensen, D. Measuring confirmation. J. Philos. **1999**, 96, 437–461.
10. Nozick, R. Philosophical Explanations; Clarendon: Oxford, UK, 1981.
11. Good, I.J. The best explicatum for weight of evidence. J. Stat. Comput. Simul. **1984**, 19, 294–299.
12. Kemeny, J.; Oppenheim, P. Degrees of factual support. Philos. Sci. **1952**, 19, 307–324.
13. Fitelson, B. Studies in Bayesian Confirmation Theory. Ph.D. Thesis, University of Wisconsin, Madison, WI, USA, 2001.
14. Eells, E.; Fitelson, B. Symmetries and asymmetries in evidential support. Philos. Stud. **2002**, 107, 129–142.
15. Greco, S.; Slowiński, R.; Szczęch, I. Properties of rule interestingness measures and alternative approaches to normalization of measures. Inf. Sci. **2012**, 216, 1–16.
16. Greco, S.; Pawlak, Z.; Slowiński, R. Can Bayesian confirmation measures be useful for rough set decision rules? Eng. Appl. Artif. Intell. **2004**, 17, 345–361.
17. Lu, C. Semantic information G theory and Logical Bayesian Inference for machine learning. Information **2019**, 10, 261.
18. Sensitivity and specificity. Wikipedia the Free Encyclopedia. Available online: https://en.wikipedia.org/wiki/Sensitivity_and_specificity (accessed on 27 February 2020).
19. Greco, S.; Slowiński, R.; Szczęch, I. Measures of rule interestingness in various perspectives of confirmation. Inf. Sci. **2016**, 346–347, 216–235.
20. Lu, C. A generalization of Shannon’s information theory. Int. J. Gen. Syst. **1999**, 28, 453–490.
21. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. **1948**, 27, 379–429, 623–656.
22. Tarski, A. The semantic conception of truth and the foundations of semantics. Philos. Phenomenol. Res. **1944**, 4, 341–376.
23. Davidson, D. Truth and meaning. Synthese **1967**, 17, 304–323.
24. Tentori, K.; Crupi, V.; Bonini, N.; Osherson, D. Comparison of confirmation measures. Cognition **2007**, 103, 107–119.
25. Glass, D.H. Entailment and symmetry in confirmation measures of interestingness. Inf. Sci. **2014**, 279, 552–559.
26. Susmaga, R.; Szczęch, I. Selected group-theoretic aspects of confirmation measure symmetries. Inf. Sci. **2016**, 346–347, 424–441.
27. Thornbury, I.R.; Fryback, D.G.; Edwards, W. Likelihood ratios as a measure of the diagnostic usefulness of excretory urogram information. Radiology **1975**, 114, 561–565.
28. Fitelson, B.; Hawthorne, J. How Bayesian confirmation theory handles the paradox of the ravens. In The Place of Probability in Science; Eells, E., Fetzer, J., Eds.; Springer: Dordrecht, The Netherlands, 2010; pp. 247–276.
29. Huber, F. What Is the Point of Confirmation? Philos. Sci. **2005**, 72, 1146–1159.
30. Carnap, R.; Bar-Hillel, Y. An Outline of a Theory of Semantic Information; Technical Report No. 247; Research Lab. of Electronics, MIT: Cambridge, MA, USA, 1952.
31. Crupi, V.; Tentori, K. State of the field: Measuring information and confirmation. Stud. Hist. Philos. Sci. **2014**, 47, 81–90.
32. Lu, C. Semantic channel and Shannon channel mutually match and iterate for tests and estimations with maximum mutual information and maximum likelihood. In Proceedings of the 2018 IEEE International Conference on Big Data and Smart Computing, Shanghai, China, 15 January 2018; IEEE Computer Society Press Room: Washington, DC, USA, 2018; pp. 15–18.
33. Available online: http://news.cctv.com/2020/02/13/ARTIHIHFAHyTYO6NEovYRMNh200213.shtml (accessed on 13 February 2020).
34. Wang, S.; Kang, B.; Ma, J.; Zeng, X.; Xiao, M.; Guo, J.; Cai, M.; Yang, J.; Li, Y.; Meng, X.; et al. A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-19). medRxiv **2020**.
35. Scheffler, I.; Goodman, N.J. Selective confirmation and the ravens: A reply to Foster. J. Philos. **1972**, 69, 78–83.
36. Heckerman, D.E.; Shortliffe, E.H. From certainty factors to belief networks. Artif. Intell. Med. **1992**, 4, 35–52.

**Figure 3.** The relationship between two feature distributions and four conditional probabilities for the Shannon channel of the medical test.

**Figure 4.** Truth function T(θ_{e1}|h) includes the believable part with proportion b_{1} and the unbelievable part with proportion b_{1}’ (b_{1}’ = 1 − |b_{1}|).

**Figure 6.** Likelihood function P(h|θ_{e1}) may be regarded as a believable part plus an unbelievable part.

**Figure 7.** The numbers of positive examples and counterexamples for c*(e_{0}→h_{0}) (see the left side) and c*(e_{1}→h_{1}) (see the right side).

**Figure 9.** How the proportions of positive examples and counterexamples affect b*(e_{1}→h_{1}). (**a**) Example 1: positive examples’ proportion is P(e_{1}|h_{1}) = 0.1, and counterexamples’ proportion is P(e_{1}|h_{0}) = 0.01. (**b**) Example 2: positive examples’ proportion is P(e_{1}|h_{1}) = 1, and counterexamples’ proportion is P(e_{1}|h_{0}) = 0.9.

**Figure 10.** Using both NAT and CT to diagnose the infection of COVID-19 with the help of confirmation measure b*.

**Table 1.**

| | e_{0} | e_{1} |
|---|---|---|
| h_{1} | b | a |
| h_{0} | d | c |

**Table 2.**

| | Negative e_{0} | Positive e_{1} |
|---|---|---|
| Infected h_{1} | P(e_{0}\|h_{1}) = 1 − sensitivity | P(e_{1}\|h_{1}) = sensitivity |
| Uninfected h_{0} | P(e_{0}\|h_{0}) = specificity | P(e_{1}\|h_{0}) = 1 − specificity |

**Table 3.**

| | e_{0} (Negative) | e_{1} (Positive) |
|---|---|---|
| h_{1} (infected) | T(θ_{e0}\|h_{1}) = b_{0}’ | T(θ_{e1}\|h_{1}) = 1 |
| h_{0} (uninfected) | T(θ_{e0}\|h_{0}) = 1 | T(θ_{e1}\|h_{0}) = b_{1}’ |

**Table 4.** Predictive probability P(h_{1}|θ_{e1}) changes with prior probability P(h_{1}) as b_{1}* = 0.9.

| | Common People | Risky Group | High-Risk Group |
|---|---|---|---|
| P(h_{1}) | 0.001 | 0.1 | 0.25 |
| P(h_{1}\|θ_{e1}) | 0.002 | 0.19 | 0.77 |

**Table 5.**

| | e_{0} (Negative) | e_{1} (Positive) |
|---|---|---|
| h_{1} (infected) | P(e_{0}\|h_{1}) = b/(a + b) | P(e_{1}\|h_{1}) = a/(a + b) |
| h_{0} (uninfected) | P(e_{0}\|h_{0}) = d/(c + d) | P(e_{1}\|h_{0}) = c/(c + d) |
| h_{1} (infected) | P(h_{1}\|e_{0}) = b/(b + d) | P(h_{1}\|e_{1}) = a/(a + c) |
| h_{0} (uninfected) | P(h_{0}\|e_{0}) = d/(b + d) | P(h_{0}\|e_{1}) = c/(a + c) |

**Table 6.**

| | b*(e→h) (for Channels, Refer to Figure 3) | c*(e→h) (for Predictions, Refer to Figure 7) |
|---|---|---|
| e_{1}→h_{1} | $\frac{P(e_1 \mid h_1)-P(e_1 \mid h_0)}{P(e_1 \mid h_1)\vee P(e_1 \mid h_0)}=\frac{ad-bc}{a(c+d)\vee c(a+b)}$ | $\frac{P(h_1 \mid e_1)-P(h_0 \mid e_1)}{P(h_1 \mid e_1)\vee P(h_0 \mid e_1)}=\frac{a-c}{a\vee c}$ |
| e_{0}→h_{0} | $\frac{P(e_0 \mid h_0)-P(e_0 \mid h_1)}{P(e_0 \mid h_0)\vee P(e_0 \mid h_1)}=\frac{ad-bc}{d(a+b)\vee b(c+d)}$ | $\frac{P(h_0 \mid e_0)-P(h_1 \mid e_0)}{P(h_0 \mid e_0)\vee P(h_1 \mid e_0)}=\frac{d-b}{d\vee b}$ |

**Table 7.**

| | b*(h→e) (for Converse Channels) | c*(h→e) (for Converse Predictions, Refer to Figure 7) |
|---|---|---|
| h_{1}→e_{1} | $\frac{P(h_1 \mid e_1)-P(h_1 \mid e_0)}{P(h_1 \mid e_1)\vee P(h_1 \mid e_0)}=\frac{ad-bc}{a(b+d)\vee b(a+c)}$ | $\frac{P(e_1 \mid h_1)-P(e_0 \mid h_1)}{P(e_1 \mid h_1)\vee P(e_0 \mid h_1)}=\frac{a-b}{a\vee b}$ |
| h_{0}→e_{0} | $\frac{P(h_0 \mid e_0)-P(h_0 \mid e_1)}{P(h_0 \mid e_0)\vee P(h_0 \mid e_1)}=\frac{ad-bc}{d(a+c)\vee c(b+d)}$ | $\frac{P(e_0 \mid h_0)-P(e_1 \mid h_0)}{P(e_0 \mid h_0)\vee P(e_1 \mid h_0)}=\frac{d-c}{d\vee c}$ |

**Table 8.** PCMs (Prediction Confirmation Measures) are related to different correct rates and false rates in the medical test [18].

| PCM | Correct Rate Positively Related to c* | False Rate Negatively Related to c* |
|---|---|---|
| c*(e_{1}→h_{1}) | P(h_{1}\|e_{1}): PPV (Positive Predictive Value) | P(h_{0}\|e_{1}): FDR (False Discovery Rate) |
| c*(e_{0}→h_{0}) | P(h_{0}\|e_{0}): NPV (Negative Predictive Value) | P(h_{1}\|e_{0}): FOR (False Omission Rate) |
| c*(h_{1}→e_{1}) | P(e_{1}\|h_{1}): Sensitivity or TPR (True Positive Rate) | P(e_{0}\|h_{1}): FNR (False Negative Rate) |
| c*(h_{0}→e_{0}) | P(e_{0}\|h_{0}): Specificity or TNR (True Negative Rate) | P(e_{1}\|h_{0}): FPR (False Positive Rate) |

**Table 9.**

| Ex. | a, b, c, d | D | M | R | C | Z | S | N | L | F | b* | c* |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 20, 180, 8, 792 | 0.514 | 0.072 | 1.84 | 0.014 | 0.643 | 0.529 | 0.09 | 3.32 | 0.818 | 0.9 | 0.8 |
| 2 | 200, 0, 720, 80 | 0.017 | 0.08 | 0.12 | 0.016 | 0.022 | 0.217 | 0.1 | 0.152 | 0.053 | 0.1 | −0.722 |
| 3 | 10, 0, 90, 900 | 0.09 | 0.9 | 3.32 | 0.009 | 0.091 | 0.1 | 0.091 | 3.46 | 0.833 | 0.91 | −0.9 |

**Table 10.**

| | Sensitivity | Specificity |
|---|---|---|
| NAT | 0.5 | 0.95 |
| CT | 0.8 | 0.75 |

**Table 11.**

| | NAT-Negative, b_{0}* = 0.47 | NAT-Positive, b_{1}* = 0.9 |
|---|---|---|
| CT-positive, b_{1}* = 0.69 | Final positive (changed) | Final positive |
| CT-negative, b_{0}* = 0.73 | Final negative | Final positive |

**Table 12.**

| | D | M | Z | S | C | N | F | b* | c* |
|---|---|---|---|---|---|---|---|---|---|
| c(NAT−) | 0.10 | 0.11 | 0.40 | 0.62 | 0.08 | 0.45 | 0.31 | 0.47 | 0.83 |
| c(NAT+) | 0.52 | 0.34 | 0.69 | 0.62 | 0.08 | 0.45 | 0.82 | 0.90 | 0.70 |
| c(CT−) | 0.17 | 0.14 | 0.67 | 0.43 | 0.10 | 0.55 | 0.58 | 0.73 | 0.91 |
| c(CT+) | 0.27 | 0.41 | 0.36 | 0.43 | 0.10 | 0.55 | 0.52 | 0.69 | 0.06 |
| c(CT+) > c(NAT−) | | | No | No | | | | | No |
| c(NAT+) > c(CT−) | | | | | No | No | | | No |

**Table 13.**

| | f(a, b, c, d) | a = d = 20, b = c = 10 | Δa = 1, Δd = 0 | Δd = 1, Δa = 0 | Δf/Δa − Δf/Δd |
|---|---|---|---|---|---|
| D(e_{1}→h_{1}) | a/(a + c) − (a + b)/n | 0.167 | 0.169 | 0.175 | −0.006 |
| M(e_{1}→h_{1}) | a/(a + b) − (a + c)/n | 0.167 | 0.169 | 0.175 | −0.006 |
| C(e_{1}→h_{1}) | a/n − (a + c)(a + b)/n^{2} | 0.083 | 0.086 | 0.086 | 0 |
| Z(e_{1}→h_{1}) | D(e_{1}→h_{1})/[(c + d)/n] | 0.333 | 0.344 | 0.344 | 0 |
| S(e_{1}→h_{1}) | a/(a + c) − b/(b + d) | 0.333 | 0.344 | 0.344 | 0 |
| N(e_{1}→h_{1}) | a/(a + b) − c/(c + d) | 0.333 | 0.344 | 0.344 | 0 |
| F(e_{1}→h_{1}) | (ad − bc)/(ad + bc + 2ac) | 0.333 | 0.340 | 0.348 | −0.007 |
| LR^{+} | [a/(a + b)]/[c/(c + d)] | 2 | 2.03 | 2.07 | −0.034 |
| c*(e_{1}→h_{1}) | (a − c)/max(a, c) | 0.5 | 0.524 | 0.5 | 0.024 > 0 |

**Table 14.**

| | HS or Consequent Symmetry | ES or Antecedent Symmetry |
|---|---|---|
| Misunderstood HS | c(e, h) = −c(e, −h) | c(h, e) = −c(−h, e) |
| Misunderstood ES | c(h, e) = −c(h, −e) | c(e, h) = −c(−e, h) |

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).