# The P–T Probability Framework for Semantic Communication, Falsification, Confirmation, and Bayesian Reasoning

## Abstract


## 1. Introduction

#### 1.1. Are There Different Probabilities or One Probability with Different Interpretations?

“Ever since its birth, probability has been characterized by a peculiar duality of meaning. As described by Hacking: probability is ‘Janus faced. On the one side it is statistical, concerning itself with the stochastic laws of chance processes. On the other side it is epistemological, dedicated to assessing the reasonable degree of belief in propositions quite devoid of statistical background’”.

**Example 1.** Consider five age labels: y_1 = “child”, y_2 = “youth”, y_3 = “middle aged”, y_4 = “elder”, and y_5 = “adult”. Notice that some youths and all middle-aged people and elders are also adults. Suppose that ten thousand people go through a door. For every person denoted by x, entrance guards judge if x is adult, or if “x is adult” is true. If 7000 people are judged to be adults, then the logical probability of y_5 = “x is adult” is 7000/10,000 = 0.7. If the task of the entrance guards is instead to select one of the five labels for every person, there may be only 1000 people who are labeled “adult”. The statistical probability of “adult” should then be 1000/10,000 = 0.1.

Why is the statistical probability of y_5 less than its logical probability? The reason is that the other 6000 adults are labeled “youth”, “middle aged”, or “elder”. In other words, a person can increase only one of the five labels’ selected probabilities by 1/10,000, whereas a person can increase two or more labels’ logical probabilities by 1/10,000. For example, a 20-year-old man increases the logical probabilities of both “youth” and “adult” by 1/10,000.

This example shows the following:

- A hypothesis or label y_j has two probabilities: a logical probability and a statistical (or selected) probability. If we use P(y_j) to represent its statistical probability, we cannot also use P(y_j) for its logical probability.
- Statistical probabilities are normalized (their sum is 1), whereas logical probabilities are not. The logical probability of a hypothesis is bigger than its statistical probability in general.
- For a given age x, such as x = 30, the sum of the truth values of the five labels may be bigger than 1.
- The logical probability of “adult” is related to the population age distribution P(x), which is a statistical probability distribution. Clearly, the logical probabilities of “adult” obtained from the door of a school and the door of a hospital must be different.
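As a numerical sketch of this example, the two kinds of probability can be tallied directly. Only the totals 7000 (judged adult) and 1000 (labeled “adult”) come from the text; the per-label counts below are hypothetical:

```python
# Numerical sketch of Example 1. Only the totals 7000 (judged adult) and
# 1000 (labeled "adult") come from the example; other counts are hypothetical.
n_people = 10_000

# Each person receives exactly ONE of the five labels:
selected = {"child": 1500, "youth": 2500, "middle aged": 3000,
            "elder": 2000, "adult": 1000}

# Statistical (selected) probabilities P(y_j) are normalized:
P = {label: k / n_people for label, k in selected.items()}

# "x is adult" is also true of all middle-aged people and elders, and
# (hypothetically) of 1000 of the people labeled "youth":
judged_adult = selected["middle aged"] + selected["elder"] + selected["adult"] + 1000
T_adult = judged_adult / n_people    # logical probability of y_5

assert abs(sum(P.values()) - 1.0) < 1e-12  # statistical probabilities sum to 1
assert T_adult == 0.7 and P["adult"] == 0.1
assert T_adult > P["adult"]                # the logical probability is the bigger one
```

The logical probability exceeds the statistical one exactly because one person can make several labels true while selecting only one of them.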

The logical probability of y_j calculated above accords with Reichenbach’s frequentist definition of logical probability [9]. However, Reichenbach did not distinguish logical probability from statistical probability.

#### 1.2. A Problem with the Extension of Logic: Can a Probability Function Be a Truth Function?

Let predicate y_j(x) = “x has property a_j”, and let set A_j include all x that have property a_j. Then, the characteristic function of A_j is the truth function of y_j(x). For example, x represents an age or an x-year-old person, and people with age ≥ 18 are defined as adults. Then, the set that includes all x with property age ≥ 18 is A_j = [18, ∞). Its characteristic function is also the truth function of the predicate “x is adult” (see Figure 1).

Suppose that the number of 60-year-old people is N_60, and N_60* people among the N_60 people are judged to be elderly. Then, the truth value of the proposition “a 60-year-old person is elderly” is N_60*/N_60. This truth value is about 0.8. If x = 80, the truth value should be 1.

One might think that the conditional probability function P(y_j|x), with the variable x as the condition, should be a truth function (e.g., a fuzzy truth function). Shannon calls P(y_j|x) the Transition Probability Function (TPF) ([3], p. 11). In the following, we use an example in natural language to explain why TPFs are not truth functions and how the human brain thinks using conceptual extensions (or denotations), which can be represented by truth functions.

**Example 2.** Suppose we know the population age prior distribution P(x) and the posterior distributions P(x|“adult”) and P(x|“elder”), where the extension of “adult” is crisp and that of “elder” is fuzzy. Please solve for the truth functions or the extensions of the labels “adult” and “elder”.

In this case, we do not know the statistical probability P(y_j), and we cannot get the TPF P(y_j|x) either: since P(y_j|x) = P(x|y_j)P(y_j)/P(x) according to Bayes’ theorem, without P(y_j), we cannot obtain P(y_j|x). However, the human brain can estimate the extension of y_j from P(x) and P(x|y_j) without P(y_j) (see Figure 1). With the extension of a label, even if P(x) is changed, the human brain can still predict the posterior distribution and classify people with different labels.

- The existing probability theories lack the methods used by the human brain (1) to find the extension of a label (an inductive method) and (2) to use the extension as the condition for reasoning or prediction (a deductive method).
- The TPF and the truth function are different, because a TPF is related to how many labels are used, whereas a truth function is not.
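Example 2 can be sketched numerically. Assuming a synthetic exponential prior over ages and a crisp extension [18, ∞) for “adult”, the extension is recoverable from P(x) and P(x|“adult”) alone, with no need for P(“adult”):

```python
import numpy as np

# Sketch of Example 2: recover the extension of "adult" from P(x) and
# P(x|"adult") only. The prior below is synthetic.
ages = np.arange(100)

# Hypothetical prior age distribution P(x):
P_x = np.exp(-ages / 30.0)
P_x /= P_x.sum()

# Posterior P(x|"adult"): the prior restricted to ages >= 18, renormalized:
P_x_given_adult = np.where(ages >= 18, P_x, 0.0)
P_x_given_adult /= P_x_given_adult.sum()

# Truth function estimate: T("adult"|x) proportional to P(x|"adult")/P(x),
# scaled so that its maximum is 1:
ratio = np.divide(P_x_given_adult, P_x)
T_adult = ratio / ratio.max()

assert np.allclose(T_adult[ages >= 18], 1.0)  # crisp extension [18, oo) recovered
assert np.allclose(T_adult[ages < 18], 0.0)
```

This is exactly the inductive step the human brain seems to perform: the label’s extension, once learned, survives any later change of P(x).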

#### 1.3. Distinguishing (Fuzzy) Truth Values and Logical Probabilities—Using Zadeh’s Fuzzy Set Theory

Since the membership grade of x_i in fuzzy set θ_j is the truth value of the proposition y_j(x_i) = “x_i is in θ_j”, the membership function of a fuzzy set is equivalent to the truth function of a predicate y_j(x). The probability of a fuzzy event proposed by Zadeh [20] is just the logical probability of the predicate “x is in θ_j”. Zadeh [21] holds that fuzzy set theory and probability theory are complementary rather than competitive; fuzzy sets can enrich our concepts of probability.

#### 1.4. Can We Use Sampling Distributions to Optimize Truth Functions or Membership Functions?

In statistical learning, P(x|y_j) represents a sampling distribution. The core method of statistical learning is to optimize a likelihood function P(x|θ_j) (θ_j is a model or a set of parameters) with a sampling distribution P(x|y_j). We also want to use sampling distributions to optimize truth functions or membership functions so that we can connect statistics and logic.

#### 1.5. Purpose, Methods, and Structure of This Paper

- using the statistical probability framework, i.e., the P probability framework, adopted by Shannon for electrocommunication as the foundation, adding Kolmogorov’s axioms (for logical probability) and Zadeh’s membership functions (as truth functions) to the framework,
- setting up the relationship between truth functions and likelihood functions by a new Bayes Theorem, called Bayes’ Theorem III, so that we can optimize truth functions (in logic) with sampling distributions (in statistics), and
- using the P–T probability framework and the semantic information Formulas (1) to express verisimilitude and testing severity and (2) to derive two practical confirmation measures and several new formulas to enrich Bayesian reasoning (including fuzzy syllogisms).

## 2. The P–T Probability Framework

#### 2.1. The Probability Framework Adopted by Shannon for Electrocommunication

**Definition 1.**

- X is a discrete random variable taking a value x ϵ U, where U is the universe {x_1, x_2, …, x_m}; P(x_i) = P(X = x_i) is the limit of the relative frequency of the event X = x_i. In the following applications, x represents an instance or a sample point.
- Y is a discrete random variable taking a value y ϵ V = {y_1, y_2, …, y_n}; P(y_j) = P(Y = y_j). In the following applications, y represents a label, hypothesis, or predicate.
- P(y_j|x) = P(Y = y_j|X = x) is a Transition Probability Function (TPF) (named by Shannon [3]).

A group of TPFs, P(y_j|x) with j = 1, 2, …, n, forms a Shannon channel (see Equation (6)).

#### 2.2. The P–T Probability Framework for Semantic Communication

**Definition 2.**

- The y_j is a label or a hypothesis, y_j(x_i) is a proposition, and y_j(x) is a propositional function. We also call y_j or y_j(x) a predicate. The θ_j is a fuzzy subset of universe U, which is used to explain the semantic meaning of a propositional function y_j(x) = “x ϵ θ_j” = “x belongs to θ_j” = “x is in θ_j”. The θ_j is also treated as a model or a set of model parameters.
- A probability that is defined with “=”, such as P(y_j) = P(Y = y_j), is a statistical probability. A probability that is defined with “ϵ”, such as P(X ϵ θ_j), is a logical probability. To distinguish P(Y = y_j) and P(X ϵ θ_j), we define T(y_j) = T(θ_j) = P(X ϵ θ_j) as the logical probability of y_j.
- T(y_j|x) = T(θ_j|x) = P(x ϵ θ_j) = P(X ϵ θ_j|X = x) is the truth function of y_j and the membership function of θ_j. It varies between 0 and 1, and its maximum is 1.

The logical probability T(θ_j) is equivalent to P(“X ϵ θ_j” is true) = P(y_j is true). According to Davidson’s truth-condition semantics [32], the truth function of y_j ascertains the (formal) semantic meaning of y_j. A group of truth functions, T(θ_j|x) with j = 1, 2, …, n, forms a semantic channel.

Let 2^U be the power set (Borel set), including all possible subsets of U, and let Θ denote the random variable taking a value θ ϵ {θ_1, θ_2, …, θ_|2^U|}. Then, Kolmogorov’s probability P(θ_j) is the probability of the event Θ = θ_j. As Θ = θ_j is equivalent to X ϵ θ_j, we have P(Θ = θ_j) = P(X ϵ θ_j) = T(θ_j). If all sets in 2^U are crisp, then T(θ_j) becomes Kolmogorov’s probability.

Statistical probability P(y_j) is equal to logical probability T(θ_j) for every j only when the following two conditions are tenable:

- The universe of θ only contains some subsets of 2^U that form a partition of U, which means any two subsets in the universe are disjoint.
- The y_j is always correctly selected.

When the two conditions are tenable, P(y_j) = T(θ_j) for every j. Many researchers do not distinguish statistical probability and logical probability, because they suppose the above two conditions are always tenable. However, in real life, the two conditions are not tenable in general. When we use Kolmogorov’s probability system for many applications without distinguishing statistical and logical probabilities, the two conditions are necessary. However, the latter condition is rarely mentioned.

Carnap and Bar-Hillel [33] define logical probability differently from T(θ_j). Suppose there are three atomic propositions a, b, and c. The logical probability of a minimum term, such as $ab\overline{c}$ (a and b and not c), is 1/8; the logical probability of $b\overline{c}=ab\overline{c}\vee \overline{a}b\overline{c}$ is 1/4. This logical probability defined by Carnap and Bar-Hillel is irrelevant to the prior probability distribution P(x). However, the logical probability T(θ_j) is related to P(x). A smaller logical probability requires not only a smaller extension, but also rarer instances. For example, “x is 20 years old” has a smaller logical probability than “x is young”, but “x is over 100 years old”, though with a larger extension, has a still smaller logical probability than “x is 20 years old”, because people over 100 are rare.

The T(θ_j) is the logical probability of the predicate y_j(x) itself, without a quantifier (such as ∀x). It is the average truth value. Why do we define the logical probability in this way? One reason is that this mathematical definition conforms to the literal definition that logical probability is the probability in which a hypothesis is judged to be true. Another reason is that this definition is useful for extending Bayes’ Theorem.

#### 2.3. Three Bayes’ Theorems

**Bayes’ Theorem I.** Assume that A, B ϵ 2^U; A^c and B^c are the two complementary sets of A and B. Two symmetrical formulas express this theorem:

T(B|A) = T(A|B)T(B)/T(A), with T(A) = T(A|B)T(B) + T(A|B^c)T(B^c),

T(A|B) = T(B|A)T(A)/T(B), with T(B) = T(B|A)T(A) + T(B|A^c)T(A^c).

**Bayes’ Theorem II.** This is the familiar Bayes’ theorem, between two statistical probabilities:

P(x|y_j) = P(y_j|x)P(x)/P(y_j), P(y_j|x) = P(x|y_j)P(y_j)/P(x).

**Bayes’ Theorem III.** This theorem holds between a statistical probability and a logical probability. Two asymmetrical formulas express it:

P(x|θ_j) = P(x)T(θ_j|x)/T(θ_j), with T(θ_j) = ∑_i P(x_i)T(θ_j|x_i), (8)

T(θ_j|x) = T(θ_j)P(x|θ_j)/P(x), with T(θ_j) = 1/max(P(x|θ_j)/P(x)). (9)

The T(θ_j) in Equation (8) is the horizontally normalizing constant (which makes the sum of P(x|θ_j) be 1), whereas T(θ_j) in Equation (9) is the longitudinally normalizing constant (which makes the maximum of T(θ_j|x) be 1).

We call P(x|y_j) from Equation (6) the Bayes prediction, P(x|θ_j) from Equation (8) the semantic Bayes prediction, and Equation (8) the semantic Bayes formula. In [27], Dubois and Prade mention a formula similar to Equation (8), proposed earlier by S. F. Thomas and M. R. Civanlar.
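The two normalizations of Bayes’ Theorem III can be checked with a small numerical sketch (the flat prior and the logistic truth function below are synthetic):

```python
import numpy as np

# Numerical sketch of Bayes' Theorem III: converting between a likelihood
# function P(x|theta) (sums to 1) and a truth function T(theta|x) (max is 1).
x = np.arange(100)                       # ages
P_x = np.full(100, 0.01)                 # a flat prior P(x), for simplicity

T = 1.0 / (1.0 + np.exp(-(x - 60)))      # a fuzzy truth function for "elder"

# Equation-(8) direction: P(x|theta) = P(x)T(theta|x)/T(theta),
# with T(theta) = sum of P(x)T(theta|x) (horizontal normalizer):
T_theta = (P_x * T).sum()
P_x_theta = P_x * T / T_theta

# Equation-(9) direction: recover the truth function from the likelihood,
# dividing by max[P(x|theta)/P(x)] (longitudinal normalizer):
ratio = P_x_theta / P_x
T_back = ratio / ratio.max()

assert abs(P_x_theta.sum() - 1.0) < 1e-12   # likelihood is horizontally normalized
assert abs(T_back.max() - 1.0) < 1e-12      # truth function's maximum is 1
assert np.allclose(T_back, T / T.max())     # the round trip recovers T(theta|x)
```

The round trip shows why the two formulas are asymmetrical: each direction uses a different normalizing constant, both written T(θ_j).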

#### 2.4. The Matching Relation between Statistical Probabilities and Logical Probabilities

Let **D** be a sample {(x(t), y(t))|t = 1 to N; x(t) ϵ U; y(t) ϵ V}, where (x(t), y(t)) is an example, and the use of each label is almost reasonable. All examples with label y_j in **D** form a sub-sample denoted by **D**_j.

If **D**_j is big enough, we can obtain a smooth sampling distribution P(x|y_j) from **D**_j. According to Fisher’s maximum likelihood estimation, when P(x|y_j) = P(x|θ_j), we have the maximum likelihood between **D**_j and θ_j. Therefore, we set the matching relation between P and T by

P*(x|θ_j) = P(x|y_j), j = 1, 2, …, n,

where P*(x|θ_j) is the optimized likelihood function. Then, we have the optimized truth functions:

T*(θ_j|x) = [P*(x|θ_j)/P(x)]/max(P*(x|θ_j)/P(x)) = [P(x|y_j)/P(x)]/max(P(x|y_j)/P(x)), j = 1, 2, …, n. (12)

Since P(x|y_j)/P(x) = P(y_j|x)/P(y_j) and P(y_j) cancels in the normalization, Equation (12) becomes

T*(θ_j|x) = P(y_j|x)/max(P(y_j|x)), j = 1, 2, …, n, (13)

where P(y_j|x) indicates the using rule of the label y_j; hence, the above Formulas (12) and (13) reflect Wittgenstein’s thought: meaning lies in uses ([35], p. 80).

With T*(θ_j|x), we can make new probability predictions P(x|θ_j) using Bayes’ Theorem III when P(x) is changed.
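A minimal sketch of Equation (13) and of reusing the learned truth function under a changed prior (the using rule `label_prob` below is hypothetical):

```python
import numpy as np

# Sketch: optimize a truth function from a label's using rule via Equation (13),
# T*(theta_j|x) = P(y_j|x)/max P(y_j|x), then reuse it under a changed P(x).
ages = np.arange(100)

def label_prob(x):
    """Hypothetical using rule of the label "elder": P("elder"|x)."""
    return np.clip((x - 50) / 40.0, 0.0, 0.8)   # speakers are never certain -> max 0.8

P_elder_given_x = label_prob(ages)
T_elder = P_elder_given_x / P_elder_given_x.max()       # Equation (13)

# The truth function still works when P(x) changes (Bayes' Theorem III):
P_x_new = np.exp(-((ages - 40) ** 2) / (2 * 15.0 ** 2))
P_x_new /= P_x_new.sum()
P_pred = P_x_new * T_elder / (P_x_new * T_elder).sum()  # semantic Bayes prediction

assert T_elder.max() == 1.0                  # longitudinally normalized
assert abs(P_pred.sum() - 1.0) < 1e-12       # a proper posterior prediction
```

Note how the max-normalization removes P(y_j) from the using rule, which is why no statistical probability of the label is needed.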

Compared with the Dempster–Shafer theory, P(y_j) is the mass, and T(θ_j) is like the belief. Suppose that V has a subset V_j = {y_j1, y_j2, …}, in which every label is not contradictory with y_j. We define PL(V_j) = ∑_k P(y_jk), which is like the plausibility. We also have P(y_j) ≤ T(θ_j) ≤ PL(V_j) and P(y_j|x) ≤ T(θ_j|x) ≤ PL(V_j|x).

Suppose that in sample **D**, P(x) is flat, which means there are N/m examples for every x. Then, the area under P(y_j|x) is the number of examples in **D**_j. Suppose that the example (x_j*, y_j) is the most frequent among examples (x, y_j) with different x in **D**_j, and the number of (x_j*, y_j) is N_j*. Then, we divide all examples with y_j into N_j* rows. Every row can be treated as a set S_k. It is easy to prove [37] that the truth function obtained from Equation (13) is the same as the membership function obtained from the statistics of a random set.

The above method requires **D**_j to be big enough. Otherwise, P(y_j|x) is not smooth, and hence P(x|θ_j) is meaningless. In such cases, we need to use the maximum likelihood criterion or the maximum semantic information criterion to optimize truth functions (see Section 3.2).

#### 2.5. The Logical Probability and the Truth Function of a GPS Pointer or a Color Sense

Consider the Gaussian truth function

T(θ_j|x) = exp[−|x − x_j|^2/(2σ^2)]

of a hypothesis y_j = “x is about x_j”, where x_j is a reading, x is the actual value, and σ is the standard deviation. For a GPS device, x_j is the position (a vector) pointed to by y_j, x is the actual position, and σ is the Root Mean Square (RMS) error, which denotes the accuracy of the GPS device.

**Example 3.** A GPS device points to a position x_j on a map, while the actual position x may deviate from x_j.

One might ask whether the GPS pointer provides a likelihood function P(x|θ_j) or a TPF P(y_j|x). It is not a likelihood function, because the most possible position is not necessarily the x_j pointed to by y_j. It is also not the TPF P(y_j|x), because we cannot know its maximum. It is reasonable to think that the GPS pointer provides a truth function.

Using the truth function and Bayes’ Theorem III, we can obtain the semantic Bayes prediction P(x|θ_j), according to which the position with the star is the most possible position. Most people can make the same prediction without using any mathematical formula. It seems that human brains automatically use a similar method: predicting according to the fuzzy extension of y_j and the prior knowledge P(x).

For a color sense, the truth function of y_j is also the similarity function or the confusion probability function between x and x_j; the logical probability T(θ_j) is the confusion probability of other colors that are confused with x_j by our eyes.

More generally, we can explain the membership grade of x in θ_j with the confusion probability. Suppose that there exists a Plato’s idea x_j for every fuzzy set θ_j. Then, the membership function of θ_j, or the truth function of y_j, is also the confusion probability function between x and x_j [14].

## 3. The P–T Probability Framework for Semantic Communication, Statistical Learning, and Constraint Control

#### 3.1. From Shannon’s Information Measure to the Semantic Information Measure

When Y takes a certain value y_j, the Shannon mutual information I(X; Y) becomes the Kullback–Leibler (KL) divergence:

I(X; y_j) = ∑_i P(x_i|y_j)log[P(x_i|y_j)/P(x_i)].

Further, when X takes a certain value x_i, I(X; y_j) becomes

I(x_i; y_j) = log[P(x_i|y_j)/P(x_i)].

Using the likelihood function P(x|θ_j) to replace the posterior distribution P(x|y_j), we have (the amount of) semantic information conveyed by y_j about x_i:

I(x_i; θ_j) = log[P(x_i|θ_j)/P(x_i)] = log[T(θ_j|x_i)/T(θ_j)].

If the truth value of y_j(x_i) is always 1, then the above formula becomes Carnap and Bar-Hillel’s semantic information formula [33].

Averaging I(x_i; θ_j) over different x_i, we have the average semantic information

I(X; θ_j) = ∑_i P(x_i|y_j)log[T(θ_j|x_i)/T(θ_j)], (19)

where P(x_i|y_j) (i = 1, 2, …) is the sampling distribution. This formula can be used to optimize truth functions.

Averaging I(X; θ_j) over different y_j, we have the semantic mutual information:

I(X; Θ) = ∑_j P(y_j)∑_i P(x_i|y_j)log[T(θ_j|x_i)/T(θ_j)].

Both I(X; θ_j) and I(X; Θ) can be used as criteria of classifications. We also call I(X; θ_j) the generalized Kullback–Leibler (KL) information and I(X; Θ) the generalized mutual information.
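The measures above can be sketched numerically (base-2 logarithms; the truth function and sampling distribution below are synthetic):

```python
import numpy as np

# Sketch of the semantic information measures of this section (bits).
ages = np.arange(100)
P_x = np.full(100, 0.01)                            # a flat prior, for simplicity

T = np.exp(-((ages - 65) ** 2) / (2 * 10.0 ** 2))   # truth function of "about 65"
T_theta = (P_x * T).sum()                           # logical probability T(theta)

def semantic_info(i):
    """I(x_i; theta) = log T(theta|x_i)/T(theta)."""
    return np.log2(T[i] / T_theta)

# Generalized KL information: average over a sampling distribution P(x|y_j).
P_x_given_y = np.exp(-((ages - 65) ** 2) / (2 * 8.0 ** 2))
P_x_given_y /= P_x_given_y.sum()
I_avg = (P_x_given_y * np.log2(T / T_theta)).sum()

assert semantic_info(65) > 0    # a true, precise prediction conveys information
assert semantic_info(20) < 0    # a badly wrong prediction conveys negative information
assert I_avg > 0                # on average, this hypothesis is informative
```

Because the sampling distribution sits where the truth function is large, the average semantic information is positive; shifting the sample away from x_j would drive it negative.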

In the rate-distortion function R(D), d(x_i, y_j) is the distortion between x_i and y_j, and R is the minimum mutual information for given D. R(D) will be further introduced in relation to random events’ control in Section 3.3.

Replacing the distortion d(x_i, y_j) with I(x_i; θ_j), I developed another fidelity evaluation function R(G) [12,15], where G is the lower limit of the semantic mutual information I(X; Θ), and R(G) is the minimum Shannon mutual information for given G. G/R(G) indicates the communication efficiency, whose upper limit is 1, reached when P(x|θ_j) = P(x|y_j) for all j. R(G) is called the rate-verisimilitude function (this verisimilitude will be further discussed in Section 4.3). The rate-verisimilitude function is useful for data compression according to visual discrimination [15] and for the convergence proofs of mixture models and maximum mutual information classifications [12].

#### 3.2. Optimizing Truth Functions and Classifications for Natural Language

From a truth function T(θ_j|x) and the prior probability distribution P(x), we can produce a likelihood function; a truth function can thus also be treated as a predictive model. Additionally, a truth function as a predictive model has the advantage that it still works when P(x) is changed.

When **D**_j is not big enough, and hence P(x|y_j) is unsmooth, we cannot use Equation (12) or (13) to obtain a smooth truth function. In this case, we can use the generalized KL formula to get an optimized continuous truth function:

T*(θ_j|x) = argmax_{T(θ_j|x)} I(X; θ_j).

For example, suppose we do not know the form T(θ_elder|x) of the truth function. In this case, we can assume that T(θ_elder|x) is a logistic function: T(θ_elder|x) = 1/[1 + exp(−u(x − v))], where u and v are two parameters to be optimized. If we know P(y_j|x) without knowing P(x), we may assume that P(x) is constant to obtain the sampling distribution P(x|y_j) and the logical probability T(θ_elder) [12].
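The logistic-parameter optimization described above can be sketched with a simple grid search (the sample, the grid, and the flat prior are all hypothetical):

```python
import numpy as np

# Sketch: with a small sample, assume a logistic truth function
# T(theta_elder|x) = 1/(1 + exp(-u(x - v))) and choose (u, v) by maximizing
# the generalized KL information. All numbers here are hypothetical.
rng = np.random.default_rng(1)
ages = np.arange(100)
P_x = np.full(100, 0.01)                                      # flat prior

sample = rng.normal(72, 8, size=40).clip(0, 99).astype(int)   # small "elder" sample
P_x_given_y = np.bincount(sample, minlength=100) / len(sample)

def gkl(u, v):
    """Generalized KL information I(X; theta) for logistic parameters (u, v)."""
    z = np.clip(u * (ages - v), -50, 50)       # avoid exp overflow/underflow
    T = 1.0 / (1.0 + np.exp(-z))
    T_theta = (P_x * T).sum()                  # logical probability
    return (P_x_given_y * np.log2(T / T_theta)).sum()

grid = [(u, v) for u in np.linspace(0.1, 2.0, 20) for v in np.linspace(40, 90, 51)]
u_best, v_best = max(grid, key=lambda p: gkl(*p))

assert gkl(u_best, v_best) > 0
assert 45 < v_best < 86      # a sensible division parameter for "elder"
```

A grid search stands in here for any numerical optimizer; the criterion, not the optimizer, is the point.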

Logical Bayesian Inference [12] has two steps: (1) obtain the optimized truth function T*(θ_j|x) from the sampling distributions P(x|y_j) and P(x), and (2) make the semantic Bayes prediction P(x|θ_j) using T*(θ_j|x) and P(x). Logical Bayesian Inference is different from Bayesian Inference [5]: the former uses the prior P(x), whereas the latter uses the prior P(θ).

For binary classification, we need two probability functions P(θ_1|x) and P(θ_0|x) or two truth functions T(θ_1|x) and T(θ_0|x) with parameters. However, multi-label learning is difficult [40], because it is impossible to design n TPFs with parameters. Nevertheless, using the P–T probability framework, multi-label learning is also easy, because every label’s learning is independent [12].

After obtaining truth functions, we can classify instances with a classifier y_j = f(x). For instance, we can classify people of different ages into classes with the labels “child”, “youth”, “adult”, “middle aged”, and “elder”. Using the maximum semantic information criterion, the classifier is

y_j = f(x) = argmax_j log[T(θ_j|x)/T(θ_j)].

When P(x) changes, e.g., as the population ages, T*(θ_elder|x) and the division point of “elder” will automatically increase [12].
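The maximum-semantic-information classifier can be sketched for two toy labels (the truth functions below are hypothetical):

```python
import numpy as np

# Sketch of the classifier y = f(x) = argmax_j log[T(theta_j|x)/T(theta_j)],
# for two hypothetical labels over ages 0..99 and a flat prior.
ages = np.arange(100)
P_x = np.full(100, 0.01)

T_adult = (ages >= 18).astype(float)                  # crisp "adult"
T_elder = 1.0 / (1.0 + np.exp(-0.3 * (ages - 65)))    # fuzzy "elder"
labels = {"adult": T_adult, "elder": T_elder}

def classify(x):
    scores = {}
    for name, T in labels.items():
        T_theta = (P_x * T).sum()                     # logical probability
        with np.errstate(divide="ignore"):
            scores[name] = np.log2(T[x] / T_theta)    # I(x; theta_j)
    return max(scores, key=scores.get)

assert classify(30) == "adult"   # "elder" would be nearly false here
assert classify(85) == "elder"   # "elder" is truer AND less probable, so it wins
```

At age 85, both labels are true, but “elder” wins because its smaller logical probability makes it more informative.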

#### 3.3. Truth Functions Used as Distribution Constraint Functions for Random Events’ Control

If P(x|y_j) is the posterior distribution of x after a control action y_j, then the KL divergence I(X; y_j) is the control amount (in bits), which reflects the complexity of the control. If the ideal posterior distribution is P(x|θ_j), then the effective control amount is

I_c(X; θ_j) = ∑_i P(x_i|θ_j)log[P(x_i|y_j)/P(x_i)],

where P(x_i|θ_j) is on the left of “log” instead of on the right. For the generalized KL information I(X; θ_j), when the prediction P(x|θ_j) approaches the fact P(x|y_j), I(X; θ_j) approaches its maximum. In contrast, for the effective control amount I_c(X; θ_j), as the fact P(x|y_j) approaches the ideality P(x|θ_j), I_c(X; θ_j) approaches its maximum. For an inadequate P(x|y_j), I_c(X; θ_j) may be negative. P(x|y_j) may also have parameters.

A truth function T(θ_j|x) used as a Distribution Constraint Function (DCF) means that there should be

P(x|y_j) ≤ 1 − P(x|θ_j) = 1 − P(x)T(θ_j|x)/T(θ_j), for T(θ_j|x) < 1.

If θ_j is a crisp set, this condition means that x cannot be outside of θ_j. If θ_j is fuzzy, it means that x outside of θ_j should be limited. There are many distributions P(x|y_j) that meet the above condition, but only one needs the minimum KL information I(X; y_j). For example, assuming that x_j makes T(θ_j|x_j) = 1, if P(x|y_j) = 1 for x = x_j and P(x|y_j) = 0 for x ≠ x_j, then P(x|y_j) meets the above condition. However, this P(x|y_j) needs information I(X; y_j) that is not the minimum.

The constraint condition of the rate-distortion function is that the average of the distortion d(x_i, y_j) = (x_i − y_j)^2 is less than a given value C, which means the constraint sets possess the same magnitude. Unlike the constraint condition of R(C), the constraint condition of R(Θ) is that the constraint sets are fuzzy and possess different magnitudes. I have concluded [14,15]:

- For given DCFs T(θ_j|x) (j = 1, 2, …, n) and P(x), when P(x|y_j) = P(x|θ_j) = P(x)T(θ_j|x)/T(θ_j), the KL divergence I(X; y_j) and Shannon’s mutual information I(X; Y) reach their minima, and the effective control amount I_c(X; y_j) reaches its maximum. If every set θ_j is crisp, I(X; y_j) = −logT(θ_j) and I(X; Y) = −∑_j P(y_j)logT(θ_j).
- A rate-distortion function R(D) is equivalent to a rate-tolerance function R(Θ), and a semantic mutual information formula can express it with truth functions or DCFs (see Appendix B for details). However, an R(Θ) function may not be equivalent to an R(D) function; hence, R(D) is a special case of R(Θ).
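The crisp-set case of the first conclusion, I(X; y_j) = −logT(θ_j), can be verified numerically (the prior and the constraint set below are synthetic):

```python
import numpy as np

# Numerical check: when the control achieves P(x|y_j) = P(x)T(theta_j|x)/T(theta_j)
# for a crisp theta_j, the control amount is I(X; y_j) = -log T(theta_j).
x = np.arange(100)
P_x = np.exp(-x / 25.0)
P_x /= P_x.sum()                                 # a hypothetical prior

T = ((x >= 20) & (x < 40)).astype(float)         # crisp constraint set [20, 40)
T_theta = (P_x * T).sum()                        # logical probability T(theta_j)
P_post = P_x * T / T_theta                       # minimum-information control

mask = T > 0                                     # avoid log(0) outside the set
I_control = (P_post[mask] * np.log2(P_post[mask] / P_x[mask])).sum()

assert abs(I_control - (-np.log2(T_theta))) < 1e-9   # I(X; y_j) = -log T(theta_j)
```

Intuitively, confining X to a set of prior mass T(θ_j) costs exactly −logT(θ_j) bits, and any sharper confinement would cost more.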

Let d_ij = d(x_i, y_j) be the distortion or the loss when we use y_j to represent x_i, and let D be the upper limit of the average distortion. For given P(x), we can obtain the minimum Shannon mutual information I(X; Y), i.e., R(D). The parameterization of R(D) ([43], p. 32) includes two formulas:

λ_i = ∑_j P(y_j)exp(sd_ij), (26)

P(y_j|x_i) = P(y_j)exp(sd_ij)/λ_i, (27)

where s ≤ 0, and exp(sd_ij) is 1 as s = 0. An often-used distortion function is d(x_i, y_j) = (y_j − x_i)^2. For this distortion function, exp(sd_ij) is a Gaussian function (without the coefficient). Therefore, exp(sd_ij) can be treated as a truth function or a DCF T(θ_xi|y), where θ_xi is a fuzzy set on V instead of U; λ_i can be treated as the logical probability T(θ_xi) of x_i. Now, we can find that Equation (27) is actually a semantic Bayes formula (in Bayes’ Theorem III). An R(D) function can thus be expressed by the semantic mutual information formula with truth functions, i.e., it is equal to an R(Θ) function (see Appendix B for details).

A similar formula exists in statistical mechanics, where P(x_i|T) is the Boltzmann distribution [44]

P(x_i|T) = exp[−e_i/(kT)]/Z,

where P(x_i|T) is the probability of a particle in the ith state with energy e_i, or the density of particles in the ith state with energy e_i; T is the absolute temperature, k is the Boltzmann constant, and Z is the partition function.

Suppose e_i is the ith energy, G_i is the number of states with e_i, and G is the total number of all states. Then, P(x_i) = G_i/G is the prior distribution. Hence, the above formula becomes

P(x_i|T) = P(x_i)exp[−e_i/(kT)]/Z′, (29)

where exp[−e_i/(kT)] can be treated as a truth function or a DCF, Z′ as a logical probability, and Equation (29) as a semantic Bayes formula in Bayes’ Theorem III.

## 4. How the P–T Probability Framework and the G Theory Support Popper’s Thought

#### 4.1. How Popper’s Thought about Scientific Progresses is Supported by the Semantic Information Measure

“The amount of empirical information conveyed by a theory, or its empirical content, increases with its degree of falsifiability.”(p. 96)

“The logical probability of a statement is complementary to its degree of falsifiability: it increases with decreasing degree of falsifiability.”(p. 102)

Following this thought, the information conveyed by a hypothesis p can be measured by

I_p = log(1/m_p),

where m_p is its logical probability. However, this formula is irrelevant to the instance that may or may not make p true. Therefore, it can only indicate how severe the test is, not how well p survives the test.

“It characterizes as preferable the theory which tells us more; that is to say, the theory which contains the greater amount of experimental information or content; which is logically stronger; which has greater explanatory and predictive power; and which can therefore be more severely tested by comparing predicted facts with observations. In short, we prefer an interesting, daring, and highly informative theory to a trivial one.” ([45], p. 294)

Assume that y_j is a GPS pointer or a hypothesis “x is about x_j” with the Gaussian truth function exp[−(x − x_j)^2/(2σ^2)] (see Figure 5). Then, we have the amount of semantic information:

I(x_i; θ_j) = log[P(x_i|θ_j)/P(x_i)] = log[T(θ_j|x_i)/T(θ_j)]. (32)

Small P(x_i) means that x_i is unexpected; large P(x_i|θ_j) means that the prediction is correct; log[P(x_i|θ_j)/P(x_i)] indicates how severely and how well y_j is tested by x_i. Large T(θ_j|x_i) means that y_j is true or close to the truth; small T(θ_j) means that y_j is precise. Hence, log[T(θ_j|x_i)/T(θ_j)] indicates the verisimilitude of x_j reflecting x_i. Unexpectedness, correctness, testability, truth, precision, verisimilitude, and deviation are all reconciled in this formula.
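The tradeoff encoded in this formula can be sketched numerically: a smaller σ (a more precise, more falsifiable hypothesis) gains more information when roughly right and loses more when wrong (all numbers below are synthetic):

```python
import numpy as np

# Sketch of I(x_i; theta_j) = log T(theta_j|x_i)/T(theta_j) for a Gaussian
# truth function "x is about 100" under a flat prior.
x = np.arange(200)
P_x = np.full(200, 1 / 200)
x_j = 100.0

def info(x_i, sigma):
    T = np.exp(-((x - x_j) ** 2) / (2 * sigma ** 2))
    T_theta = (P_x * T).sum()          # logical probability: smaller sigma -> smaller
    return np.log2(T[x_i] / T_theta)

# When the prediction is accurate (x_i = 100), precision pays:
assert info(100, 5) > info(100, 20) > 0
# When the deviation is large (x_i = 140), the precise hypothesis is punished:
assert info(140, 5) < info(140, 20)
assert info(140, 5) < 0
```

This is Popper’s point in miniature: the daring hypothesis risks more and, when it survives the test, conveys more.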

#### 4.2. How the Semantic Information Measure Supports Popper’s Falsification Thought

The semantic information measure I(x_i; θ_j) reflects this thought. Popper affirms that a counterexample can falsify a universal hypothesis. The generalized KL information (Equation (19)) supports this point of view. The truth function of a universal hypothesis only takes the value 0 or 1. If there is an instance x_i that makes T(θ_j|x_i) = 0, then I(x_i; θ_j) is −∞. The average information I(X; θ_j) is also −∞, which falsifies the universal hypothesis.

- Popper claims that scientific knowledge grows by repeating conjectures and refutations. Repeating conjectures should include adding auxiliary hypotheses.
- Falsification is not the aim. Falsifiability is only the demarcation criterion between scientific and non-scientific theories. The aim of science is to predict empirical facts with more information. Scientists hold a scientific theory depending on whether it can convey more information than other theories. Therefore, being falsified does not mean being given up.

When counterexamples exist, the −∞ information can be avoided by:

- increasing the fuzziness or decreasing the predictive precision of the hypothesis to a proper level and
- reducing the degree of belief in a rule or a major premise.

#### 4.3. For Verisimilitude: To Reconcile the Content Approach and the Likeness Approach

In the semantic information formula, the truth function T(θ_j|x) is also the confusion probability function; it reflects the likeness between x and x_j. The x_i (or X = x_i) is the consequence, and the distance between x_i and x_j in the feature space reflects the likeness. The log[1/T(θ_j)] represents the testing severity and the potential information content. Using Equation (32), we can easily explain an often-mentioned example: why “the sun has 9 satellites” (8 is true) has higher verisimilitude than “the sun has 100 satellites” [52].

We use **x** = (h, r, w) to denote weather, where h is temperature, r is rainfall, and w is wind speed. Let **x**_j = (h_j, r_j, w_j) be the predicted weather and **x**_i = (h_i, r_i, w_i) be the actual weather (the consequence). The prediction is y_j = “**x** is about **x**_j”. For simplicity, we assume that h, r, and w are independent. The Gaussian truth function may be:

T(θ_j|**x**) = exp[−(h − h_j)^2/(2σ_h^2) − (r − r_j)^2/(2σ_r^2) − (w − w_j)^2/(2σ_w^2)].

This truth function reflects the likeness between **x** and **x**_j. If the consequence is **x**_i, then the truth value T(θ_j|**x**_i) of the proposition y_j(**x**_i) is the likeness. Additionally, the information I(**x**_i; θ_j) = log[T(θ_j|**x**_i)/T(θ_j)] is the verisimilitude, which has almost all the desirable properties for which the three approaches are used.

The verisimilitude I(**x**_i; θ_j) is also related to the prior probability distribution P(**x**). A correct prediction of unusual weather has much higher verisimilitude than a correct prediction of common weather.
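This prior-dependence can be sketched with a one-dimensional weather variable (temperature only, for brevity; all distributions below are synthetic):

```python
import numpy as np

# Sketch: verisimilitude I(x_i; theta_j) = log[T(theta_j|x_i)/T(theta_j)]
# depends on the prior P(x). A correct forecast of rare heat earns more
# verisimilitude than an equally correct forecast of common weather.
h = np.arange(0, 50)                                  # temperature grid
P_h = np.exp(-((h - 20) ** 2) / (2 * 6.0 ** 2))       # common weather near 20 degrees
P_h /= P_h.sum()

def verisimilitude(h_pred, h_actual, sigma=2.0):
    T = np.exp(-((h - h_pred) ** 2) / (2 * sigma ** 2))   # truth function of forecast
    return np.log2(T[h_actual] / (P_h * T).sum())

common = verisimilitude(20, 20)    # correct forecast of common weather
unusual = verisimilitude(42, 42)   # equally correct forecast of rare heat

assert unusual > common > 0
```

Both forecasts are exactly right, yet the rare-heat forecast has the smaller logical probability and hence the higher verisimilitude.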

## 5. The P–T Probability Framework and the G Theory Are Used for Confirmation

#### 5.1. The Purpose of Confirmation: Optimizing the Degrees of Belief in Major Premises for Uncertain Syllogisms

- The task of confirmation:

- Only major premises, such as “if the medical test is positive, then the tested person is infected” and “if x is a raven, then x is black”, need confirmation. The degrees of confirmation are between −1 and 1. A proposition, such as “Tom is elderly”, or a predicate, such as “x is elderly” (x is one of the given people), needs no confirmation. The truth function of the predicate reflects the semantic meaning of “elderly” and is determined by the definition or the idiomatic usage of “elderly”. The degree of belief in a proposition is a truth value, and that in a predicate is a logical probability. The truth value and the logical probability are between 0 and 1 instead of −1 and 1.

- The purpose of confirmation:

- The purpose of confirmation is not only to assess hypotheses (major premises), but also to make probability predictions or uncertain syllogisms. A syllogism needs a major premise. However, as pointed out by Hume and Popper, it is impossible to obtain an absolutely right major premise for an infinite universe by induction. Nevertheless, it is possible to optimize the degree of belief in a major premise by the proportions of positive examples and counterexamples. The optimized degree of belief is the degree of confirmation. Using a degree of confirmation, we can make an uncertain or fuzzy syllogism. Therefore, confirmation is an important link in scientific reasoning according to experience.

- The method of confirmation:

- I do not directly define a confirmation measure, as most researchers do. I derive the confirmation measures by optimizing the degree of belief in a major premise with the maximum semantic information criterion or the maximum likelihood criterion. This method is also the method of statistical learning, where the evidence is a sample.

Suppose h_1 denotes an infected specimen (or person), h_0 denotes an uninfected specimen, e_1 is the positive result, and e_0 is the negative result. We can treat e_1 as a prediction “h is infected” and e_0 as a prediction “h is uninfected”. The x is the observed feature of h; E_1 and E_0 are two subsets of the universe of x. If x is in E_1, we select e_1; if x is in E_0, we select e_0. For binary signal detection, we use “0” or “1” in the destination to predict 0 or 1 in the source according to the received analog signal x.

The two major premises are “if e_1 then h_1”, denoted by e_1→h_1, and “if e_0 then h_0”, denoted by e_0→h_0. A confirmation measure is denoted by c(e→h).

There are four kinds of examples: (e_1, h_1), (e_0, h_1), (e_1, h_0), and (e_0, h_0). Then, we can use the four examples’ numbers a, b, c, and d (see Table 1) to construct confirmation measures, where a is the number of examples (e_1, h_1); b, c, and d are defined in like manner. An absolute confirmation measure can be expressed as a function f(a, b, c, d).

#### 5.2. Channel Confirmation Measure b* for Assessing a Classification as a Channel

We treat the truth function of predicate e_{1}(h) as the combination of a believable part and an unbelievable part (see Figure 7). The truth function of the believable part of e_{1} is T(E_{1}|h) ϵ {0,1}. There are T(E_{1}|h_{1}) = T(E_{0}|h_{0}) = 1 and T(E_{1}|h_{0}) = T(E_{0}|h_{1}) = 0. The unbelievable part is a tautology, whose truth function is always 1. Then, we have the truth functions of predicates e_{1}(h) and e_{0}(h):

T(θ_{e1}|h) = b_{1}′ + b_{1}T(E_{1}|h);

T(θ_{e0}|h) = b_{0}′ + b_{0}T(E_{0}|h).

The b_{1} is the proportion of the believable part, and b_{1}′ = 1 − |b_{1}| is the proportion of the unbelievable part and also the truth value of e_{1}(h_{0}), where h_{0} is a counter-instance. The b_{1}′ may be regarded as the degree of disbelief in the major premise e_{1}→h_{1}.

With T(θ_{e1}|h), we can make the probability prediction P(h|θ_{e1}) = P(h)T(θ_{e1}|h)/T(θ_{e1}). According to the generalized KL formula (Equation (19)), when P(h|θ_{e1}) = P(h|e_{1}) or T*(θ_{e1}|h)∝P(e_{1}|h), the average semantic information I(h; θ_{e1}) reaches its maximum. Letting P(h_{1}|θ_{e1}) = P(h_{1}|e_{1}), we derive (see Appendix D for details):

b_{1}* = 1 − b_{1}′* = [P(e_{1}|h_{1}) − P(e_{1}|h_{0})]/P(e_{1}|h_{1}).

When P(h_{1}|e_{1}) < P(h_{0}|e_{1}), we have

b_{1}* = b_{1}′* − 1 = [P(e_{1}|h_{1}) − P(e_{1}|h_{0})]/P(e_{1}|h_{0}).

Note that b_{1}* = 1 − 1/LR^{+}, where LR^{+} = P(e_{1}|h_{1})/P(e_{1}|h_{0}) is the positive likelihood ratio. Eells and Fitelson [54] proposed Hypothesis Symmetry: c(e_{1}→h_{1}) = −c(e_{1}→h_{0}). As we also have c(h_{1}→e_{1}) = −c(h_{1}→e_{0}), where the two consequents are opposite, I called this symmetry Consequent Symmetry [58]. Measure b_{1}* possesses this symmetry. Making use of it, we obtain b*(e_{1}→h_{0}) = −b*(e_{1}→h_{1}) and b*(e_{0}→h_{1}) = −b*(e_{0}→h_{0}).

With b_{1}* > 0 and P(h), we have

P(h_{1}|θ_{e1}) = P(h_{1})/[P(h_{1}) + b_{1}′*P(h_{0})] = P(h_{1})/[1 − b_{1}*P(h_{0})].

If b_{1}* = 0, then P(h_{1}|θ_{e1}) = P(h_{1}). If b_{1}* < 0, then we can make use of Consequent Symmetry to make the probability prediction [58]. So far, it is still problematic to use b*, F, or another measure to assess how good a probability prediction is or to clarify the Raven Paradox.

#### 5.3. Prediction Confirmation Measure c* for Clarifying the Raven Paradox

A problem with measure b* is that b_{1}* > 0 does not mean P(h_{1}|θ_{e1}) > P(h_{0}|θ_{e1}). Most other confirmation measures have similar problems [58].

We treat the likelihood function P(h|θ_{e1}) as the combination of a believable part with proportion c_{1} and an unbelievable part with proportion c_{1}′, as shown in Figure 8. We call c_{1} the degree of belief in rule e_{1}→h_{1} as a prediction.

When P(h|θ_{e1}) = P(h|e_{1}), c_{1} becomes c_{1}*. Then, we derive the prediction confirmation measure

c_{1}* = [P(h_{1}|e_{1}) − P(h_{0}|e_{1})]/max(P(h_{1}|e_{1}), P(h_{0}|e_{1})), (44)

where c_{1} = P(h_{1}|θ_{e1}) = P(h_{1}|e_{1}) is the correct rate of rule e_{1}→h_{1}. Multiplying both the numerator and denominator of Equation (44) by P(e_{1}), we obtain

c*(e_{1}→h_{1}) = (a − c)/max(a, c). (45)

Measure c*(e_{1}→h_{1}) also possesses Consequent Symmetry. Making use of this symmetry, we can obtain c*(e_{1}→h_{0}) = −c*(e_{1}→h_{1}) and c*(e_{0}→h_{1}) = −c*(e_{0}→h_{0}).

When c_{1}* > 0, according to Equation (45), we have the correct rate of rule e_{1}→h_{1}: P(h_{1}|θ_{e1}) = 1/(2 − c_{1}*). When c*(e_{1}→h_{1}) < 0, we may make use of Consequent Symmetry to make the probability prediction. However, when P(h) is changed, we should still use b* with P(h) for probability predictions.

The difference between F(e_{1}→h_{1}) and F(h_{0}→e_{0}) is that their counterexamples are the same (c = 1), yet their positive examples are different. When d increases to d + Δd, F(e_{1}→h_{1}) = (ad − bc)/(ad + bc + 2ac) and F(h_{0}→e_{0}) = (ad − bc)/(ad + bc + 2dc) unequally increase. Therefore, though measure F denies the Equivalence Condition, it still affirms that Δd affects both F(e_{1}→h_{1}) and F(h_{0}→e_{0}), and hence, measure F does not accord with the Nicod Criterion.

For example, if a = 6 and c = 1, then c*(e_{1}→h_{1}) = (6 − 1)/6 = 5/6.

Measure LR(e_{1}→h_{1}) can evidently increase with a and slightly increase with d. For example, Fitelson and Hawthorne [62] believe that measure LR may be used to explain that a black raven confirms "ravens are black" more strongly than a non-black non-raven thing does. Is it true?

Since c*(e_{1}→h_{1}) = (a − c)/max(a, c) and c*(h_{0}→e_{0}) = (d − c)/max(d, c), the Equivalence Condition does not hold, and measure c* accords with the Nicod Criterion very well. Therefore, the Raven Paradox no longer exists according to measure c*.

#### 5.4. How Confirmation Measures F, b*, and c* Are Compatible with Popper's Falsification Thought

For those who accept Popper's falsification thought, a confirmation measure that is sensitive to counterexamples, such as c*(e_{1}→h_{1}), is what they need.

## 6. Induction, Reasoning, Fuzzy Syllogisms, and Fuzzy Logic

#### 6.1. Viewing Induction from the New Perspective

From the new perspective, there are three kinds of induction:

- induction for probability predictions: to optimize likelihood functions with sampling distributions;
- induction for the semantic meanings or the extensions of labels: to optimize truth functions with sampling distributions; and
- induction for the degrees of confirmation of major premises: to optimize the degrees of belief in major premises with the proportions of positive examples and counterexamples after classifications.

#### 6.2. The Different Forms of Bayesian Reasoning as Syllogisms

When T(θ_{j}|x)∝P(y_{j}|x) or T(θ_{ej}|h)∝P(e_{j}|h), the consequences are the same as those from classical statistics.

When b_{1}* = 1 or c_{1}* = 1, for given e_{1}, the consequence is P(h_{1}) = 1 and P(h_{0}) = 0.

The syllogism with b*(e_{1}→h_{1}) is the generalization of a classical syllogism. The fuzzy syllogism is:

- The major premise is b*(e_{1}→h_{1}) = b_{1}*;
- The minor premise is e_{1} with P(h);
- The consequence is P(h_{1}|θ_{e1}) = P(h_{1})/[P(h_{1}) + (1 − b_{1}*)P(h_{0})] = P(h_{1})/[1 − b_{1}*P(h_{0})].

When b_{1}* = 1, if the minor premise becomes "x is in E_{2}, and E_{2} is included in E_{1}", then this syllogism becomes Barbara (AAA-1) [63], and the consequence is P(h_{1}|θ_{e1}) = 1. Hence, we can use the above fuzzy syllogism as the fuzzy Barbara.

For given h_{0}, we can only use a converse confirmation measure b*(h_{0}→e_{0}) or c*(h_{0}→e_{0}) as the major premise to obtain the consequence (see [58] for details).

#### 6.3. Fuzzy Logic: Expectations and Problems

Let the complement of A be A^{c}, and let the logical expression of three truth functions be f(a(x), b(x), c(x)) with the three operators ∧, ∨, and ¯ (the operators ∩ and ∧ can be omitted). There are 2^{8} = 256 different expressions with A, B, and C. This fuzzy logic does not obey the complement laws (A∩A^{c} ≠ ϕ and A∪A^{c} ≠ U), since A and A^{c} are negatively correlated.

## 7. Discussions

#### 7.1. How the P–T Probability Framework Has Been Tested by Its Applications to Theories

Using the truth function T(θ_{j}|x) and the prior distribution P(x), we can produce the likelihood function P(x|θ_{j}), and we can train T(θ_{j}|x) and P(x|θ_{j}) with sampling distributions P(x|y_{j}). Then, we can let a machine reason like the human brain with the extensions of concepts.

The semantic information conveyed by y_{j} about x_{i} is I(x_{i}; θ_{j}) = log[T(θ_{j}|x_{i})/T(θ_{j})]. We can also use the likelihood function to express predictive information, because I(x_{i}; θ_{j}) = log[T(θ_{j}|x_{i})/T(θ_{j})] = log[P(x_{i}|θ_{j})/P(x_{i})]. When we calculate the average semantic information I(X; θ_{j}) and I(X; Θ), we also need statistical probabilities, such as P(x|y_{j}), to express sampling distributions.
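The equality of the two expressions for semantic information can be verified numerically; the distributions below are hypothetical:

```python
import math

# Hypothetical discrete example: a prior P(x) and a truth function T(theta_j|x)
P_x = [0.5, 0.3, 0.2]
T_x = [1.0, 0.5, 0.2]                    # longitudinally normalized: max = 1

T_theta = sum(p * t for p, t in zip(P_x, T_x))            # logical probability T(theta_j)
P_x_theta = [p * t / T_theta for p, t in zip(P_x, T_x)]   # semantic Bayes prediction

# I(x_i; theta_j) = log2[T(theta_j|x_i)/T(theta_j)] = log2[P(x_i|theta_j)/P(x_i)]
info_truth = [math.log2(t / T_theta) for t in T_x]
info_likel = [math.log2(q / p) for q, p in zip(P_x_theta, P_x)]
```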

We can use the semantic information measure I(X; θ_{j}) as verisimilitude and testing severity, both of which can be mutually converted. With statistical probabilities, we can express how sampling distributions test hypotheses. The framework also supports:

- two types of reasoning with Bayes’ Theorem III,
- Logical Bayesian Inference from sampling distributions to optimized truth functions, and
- fuzzy syllogisms with the degrees of confirmation of major premises.

#### 7.2. How to Extend Logic to Probability Theory?

We may use R_{i} to represent a red ball on the ith draw and $\overline{R}$_{i} = W_{i} to represent a white ball on the ith draw. However, if the balls have more colors, this interpretation does not sound so good. In this case, the frequentist interpretation is simpler.

The logical probability of a label depends on the prior distribution P(x). Suppose that for one population the prior distribution is P_{1}(x); when the people include high school students and soldiers, the prior distribution is P_{2}(x). T_{1}(A) is obtained from P_{1}(x) and T(A|x), T_{2}(A) from P_{2}(x) and T(A|x), and so on. T_{1}(A∩B) should be much smaller than T_{2}(A∩B), even if T_{1}(A) = T_{2}(A) and T_{1}(B) = T_{2}(B).

We can use confirmation measures b*(e_{1}→h_{1}) and c*(e_{1}→h_{1}), which may be negative, to represent the extended major premises. Most of the extended syllogisms in Table 4 are related to Bayes' formulas. The measure c*(p→q) is a function of P(q|p) (see Equation (44)); they are compatible. It should also be reasonable to use P(q|p) (p and q are two predicates) as the measure for assessing a fuzzy major premise. However, P(q|p) and P(p => q) are different. We can prove that P(q|p) ≤ P(p => q) (see Appendix F).

#### 7.3. Comparing the Truth Function with Fisher’s Inverse Probability Function

It is called Likelihood Inference to use the likelihood function P(x|θ_{j}) as the inference tool (where θ_{j} is a constant). It is called Bayesian Inference to use the Bayesian posterior P(θ|**D**) as the inference tool (where P(θ|**D**) means the parameters' posterior distribution for a given data set or sample). It is called Logical Bayesian Inference to use the truth function T(θ_{j}|x) as the inference tool [12].

Fisher regarded P(θ_{j}|x) as an inverse probability [5]. As x is a variable, we had better call P(θ_{j}|x) the Inverse Probability Function (IPF). According to Bayes' Theorem II, there are

P(θ_{j}|x) = P(θ_{j})P(x|θ_{j})/P(x),

P(x|θ_{j}) = P(x)P(θ_{j}|x)/P(θ_{j}).

With P(θ_{j}|x), we can make good use of the prior knowledge P(x). When P(x) is changed, P(θ_{j}|x) can still be used for probability predictions. However, why did Fisher and other researchers give up P(θ_{j}|x) as the inference tool?

For binary classifications, it is easy to construct P(θ_{j}|x), j = 1, 2, with parameters. For instance, we can use a pair of logistic functions as the IPFs. Unfortunately, when n > 2, it is hard to construct P(θ_{j}|x), j = 1, 2, …, n, because there is the normalization limitation ∑_{j}P(θ_{j}|x) = 1 for every x. That is why a multi-class or multi-label classification is often converted into several binary classifications [40].
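A sketch of such a pair of logistic IPFs for a binary classification; the parameters w and x0 are hypothetical:

```python
import math

def logistic(x, w=1.0, x0=0.0):
    """A logistic function with slope w and midpoint x0."""
    return 1.0 / (1.0 + math.exp(-w * (x - x0)))

# A pair of IPFs that satisfies the per-x normalization for n = 2:
def ipf_theta1(x):
    return logistic(x, w=2.0, x0=5.0)

def ipf_theta0(x):
    return 1.0 - ipf_theta1(x)
```

For n > 2, enforcing the sum-to-1 constraint across all n functions at every x is what makes the construction hard.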

Using P(θ_{j}|x) and P(y_{j}|x) as predictive models also has a serious disadvantage. In many cases, we can only know P(x) and P(x|y_{j}) without knowing P(θ_{j}) or P(y_{j}), so that we cannot obtain P(y_{j}|x) or P(θ_{j}|x). Nevertheless, we can get the truth function T*(θ_{j}|x) in these cases. There is no normalization limitation, and hence it is easy to construct a group of truth functions and train them with P(x) and P(x|y_{j}), j = 1, 2, …, n, without P(y_{j}) or P(θ_{j}).
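A minimal sketch of training an optimized truth function from P(x) and P(x|y_{j}) alone, following T*(θ_{j}|x) = [P(x|y_{j})/P(x)]/max[P(x|y_{j})/P(x)]; the distributions are hypothetical:

```python
def train_truth_function(P_x_given_y, P_x):
    """Logical Bayesian Inference: T*(theta_j|x) = [P(x|y_j)/P(x)] / max[P(x|y_j)/P(x)].
    Needs only P(x) and P(x|y_j); there is no sum-to-1 limit across labels."""
    ratio = [q / p for q, p in zip(P_x_given_y, P_x)]
    m = max(ratio)
    return [r / m for r in ratio]

# Hypothetical discrete distributions over three values of x
T_star = train_truth_function([0.2, 0.3, 0.5], [0.5, 0.3, 0.2])
```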

- We can use an optimized truth function T*(θ
_{j}|x) to make probability prediction for different P(x) as well as we use P(y_{j}|x) or P(θ_{j}|x). - We can train a truth function with parameters by a sample with a small size as well as we train a likelihood function.
- The truth function can indicate the semantic meaning of a hypothesis or the extension of a label.It is also the membership function, which is suitable for classification.
- To train a truth function T(θ
_{j}|x), we only need P(x) and P(x|y_{j}), without needing P(y_{j}) or P(θ_{j}). - Letting T*(θ
_{j}|x)∝P(y_{j}|x), we can bridge statistics and logic.

#### 7.4. Answers to Some Questions

#### 7.5. Some Issues That Need Further Studies

- The human brain thinks with the extensions (or denotations) of concepts more than with interdependencies. A truth function indicates the (fuzzy) extension of a label and reflects the semantic meaning of the label; Bayes' Theorem III expresses the reasoning with the extension.
- The new confirmation methods and the fuzzy syllogisms can express the induction and the reasoning with degrees of belief that the human brain uses, and the reasoning is compatible with statistical reasoning.
- The Boltzmann distribution has been applied to the Boltzmann machine [71] for machine learning. With the help of the semantic Bayes formula and the semantic information methods, we can better understand this distribution and the Regularized Least Square criterion related to information.

This paper has extended the classical syllogism with the major premise "if e = e_{1} then h = h_{1}" to some effective fuzzy syllogisms. It will be complicated to extend more syllogisms [63]; the extension needs further study.

## 8. Conclusions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A. The Proof of Bayes’ Theorem III

Since P(x, θ_{j}) = P(X = x, X ϵ θ_{j}) = P(x|θ_{j})T(θ_{j}) = T(θ_{j}|x)P(x), there are

T(θ_{j}|x) = T(θ_{j})P(x|θ_{j})/P(x) and P(x|θ_{j}) = P(x)T(θ_{j}|x)/T(θ_{j}).

As P(x|θ_{j}) is horizontally normalized, T(θ_{j}) in Equation (8) is ∑_{i}P(x_{i})T(θ_{j}|x_{i}). As T(θ_{j}|x) is longitudinally normalized (its maximum is 1), we have

1 = max[T(θ_{j})P(x|θ_{j})/P(x)] = T(θ_{j})max[P(x|θ_{j})/P(x)],

and hence T(θ_{j}) = 1/max[P(x|θ_{j})/P(x)].

## Appendix B. An R(D) Function Is Equal to an R(Θ) Function with Truth Functions

Letting T(θ_{xi}|y) = exp(sd_{ij}), we have

## Appendix C. The Relationship between Information and Thermodynamic Entropy [14]

where T_{j} is the temperature of the jth area (y_{j}), and N_{j} is the number of particles in the jth area. We now consider the minimum mutual information R(Θ) for given distribution constraint functions T(θ_{j}|x) = exp[−e_{i}/(kT_{j})] (j = 1, 2, …). The logical probability of y_{j} is T(θ_{j}) = Z_{j}/G, and the statistical probability is P(y_{j}) = N_{j}/N. From Appendix B and the above equation for S, we derive the relationship between R(Θ) and the thermodynamic entropy S, where ē_{j} = E_{j}/N_{j} is the average energy of a particle in the jth area.

## Appendix D. The Derivation of b_{1}*

We let P(h_{1}|θ_{e1}) = P(h_{1}|e_{1}). From this equality and T(θ_{e1}|h) = b_{1}′ + b_{1}T(E_{1}|h), we obtain b_{1}′* = P(e_{1}|h_{0})/P(e_{1}|h_{1}) for P(h_{1}|e_{1}) ≥ P(h_{0}|e_{1}). Hence,

b_{1}* = 1 − b_{1}′* = [P(e_{1}|h_{1}) − P(e_{1}|h_{0})]/P(e_{1}|h_{1}).

## Appendix E. Illustrating the Fuzzy Logic in the Decoding Model of Color Vision

## Appendix F. To Prove P(q|p) ≤ P(p => q)

## References

- Galavotti, M.C. The Interpretation of Probability: Still an Open Issue? Philosophies **2017**, 2, 20.
- Hájek, A. Interpretations of probability. In The Stanford Encyclopedia of Philosophy (Fall 2019 Edition); Zalta, E.N., Ed.; Available online: https://plato.stanford.edu/archives/fall2019/entries/probability-interpret/ (accessed on 17 June 2020).
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
- Fisher, R.A. On the mathematical foundations of theoretical statistics. Philos. Trans. R. Soc. **1922**, 222, 309–368.
- Fienberg, S.E. When did Bayesian Inference become "Bayesian"? Bayesian Anal. **2006**, 1, 1–40.
- Popper, K.R. The propensity interpretation of the calculus of probability and the quantum theory. In The Colston Papers No 9; Körner, S., Ed.; Butterworth Scientific Publications: London, UK, 1957; pp. 65–70.
- Popper, K. Logik der Forschung: Zur Erkenntnistheorie der modernen Naturwissenschaft; Springer: Vienna, Austria, 1935; English translation: The Logic of Scientific Discovery; Routledge Classics: London, UK; New York, NY, USA, 2002.
- Carnap, R. The two concepts of probability: The problem of probability. Philos. Phenomenol. Res. **1945**, 5, 513–532.
- Reichenbach, H. The Theory of Probability; University of California Press: Berkeley, CA, USA, 1949.
- Kolmogorov, A.N. Grundbegriffe der Wahrscheinlichkeitrechnung; Ergebnisse der Mathematik (1933); translated as Foundations of Probability; Chelsea Publishing Company: New York, NY, USA, 1950.
- Greco, S.; Slowiński, R.; Szczech, I. Measures of rule interestingness in various perspectives of confirmation. Inf. Sci. **2016**, 346, 216–235.
- Lu, C. Semantic Information G Theory and Logical Bayesian Inference for Machine Learning. Information **2019**, 10, 261. Available online: https://www.mdpi.com/2078-2489/10/8/261 (accessed on 10 September 2020).
- Lu, C. Shannon equations' reform and applications. BUSEFAL **1990**, 44, 45–52. Available online: https://www.listic.univ-smb.fr/production-scientifique/revue-busefal/version-electronique/ebusefal-44/ (accessed on 5 March 2019).
- Lu, C. A Generalized Information Theory; China Science and Technology University Press: Hefei, China, 1993; ISBN 7-312-00501-2. (In Chinese)
- Lu, C. A generalization of Shannon's information theory. Int. J. Gen. Syst. **1999**, 28, 453–490.
- Jaynes, E.T. Probability Theory: The Logic of Science; Bretthorst, G.L., Ed.; Cambridge University Press: Cambridge, UK, 2003.
- Dubois, D.; Prade, H. Possibility theory, probability theory and multiple-valued logics: A clarification. Ann. Math. Artif. Intell. **2001**, 32, 35–66.
- Carnap, R. Logical Foundations of Probability, 2nd ed.; University of Chicago Press: Chicago, IL, USA, 1962.
- Zadeh, L.A. Fuzzy sets. Inf. Control **1965**, 8, 338–353.
- Zadeh, L.A. Probability measures of fuzzy events. J. Math. Anal. Appl. **1968**, 23, 421–427.
- Zadeh, L.A. Fuzzy set theory and probability theory: What is the relationship? In International Encyclopedia of Statistical Science; Lovric, M., Ed.; Springer: Berlin/Heidelberg, Germany, 2011.
- Wang, P.Z. From the fuzzy statistics to the falling random subsets. In Advances in Fuzzy Sets, Possibility Theory and Applications; Wang, P.P., Ed.; Plenum Press: New York, NY, USA, 1983; pp. 81–96.
- Wang, P.Z. Fuzzy Sets and Falling Shadows of Random Sets; Beijing Normal University Press: Beijing, China, 1985. (In Chinese)
- Keynes, J.M. A Treatise on Probability; Macmillan and Co.: London, UK, 1921.
- Demey, L.; Kooi, B.; Sack, J. Logic and probability. In The Stanford Encyclopedia of Philosophy (Summer 2019 Edition); Zalta, E.N., Ed.; Available online: https://plato.stanford.edu/archives/sum2019/entries/logic-probability/ (accessed on 17 June 2020).
- Adams, E.W. A Primer of Probability Logic; CSLI Publications: Stanford, CA, USA, 1998.
- Dubois, D.; Prade, H. Fuzzy sets and probability: Misunderstandings, bridges and gaps. In Proceedings of the 1993 Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, USA, 28 March–1 April 1993; Volume 2, pp. 1059–1068.
- Coletti, G.; Scozzafava, R. Conditional probability, fuzzy sets, and possibility: A unifying view. Fuzzy Sets Syst. **2004**, 144, 227–249.
- Gao, Q.; Gao, X.; Hu, Y. A uniform definition of fuzzy set theory and the fundamentals of probability theory. J. Dalian Univ. Technol. **2006**, 46, 141–150. (In Chinese)
- von Mises, R. Probability, Statistics and Truth, 2nd ed.; George Allen and Unwin Ltd.: London, UK, 1957.
- Tarski, A. The semantic conception of truth: And the foundations of semantics. Philos. Phenomenol. Res. **1944**, 4, 341–376.
- Davidson, D. Truth and meaning. Synthese **1967**, 17, 304–323.
- Carnap, R.; Bar-Hillel, Y. An Outline of a Theory of Semantic Information; Technical Report No. 247; Research Lab. of Electronics, MIT: Cambridge, MA, USA, 1952.
- Bayes, T.; Price, R. An essay towards solving a problem in the doctrine of chances. Philos. Trans. R. Soc. Lond. **1763**, 53, 370–418.
- Wittgenstein, L. Philosophical Investigations; Basil Blackwell Ltd.: Oxford, UK, 1958.
- Wikipedia contributors. Dempster–Shafer theory. In Wikipedia, The Free Encyclopedia; 29 May 2020; Available online: https://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory (accessed on 18 June 2020).
- Lu, C. The third kind of Bayes' Theorem links membership functions to likelihood functions and sampling distributions. In Cognitive Systems and Signal Processing; Sun, F., Liu, H., Hu, D., Eds.; ICCSIP 2018, Communications in Computer and Information Science, Vol. 1006; Springer: Singapore, 2019; pp. 268–280.
- Floridi, L. Semantic conceptions of information. In Stanford Encyclopedia of Philosophy; Stanford University: Stanford, CA, USA, 2005; Available online: http://seop.illc.uva.nl/entries/information-semantic/ (accessed on 17 June 2020).
- Shannon, C.E. Coding theorems for a discrete source with a fidelity criterion. IRE Nat. Conv. Rec. **1959**, 4, 142–163.
- Zhang, M.L.; Li, Y.K.; Liu, X.Y.; Geng, X. Binary relevance for multi-label learning: An overview. Front. Comput. Sci. **2018**, 12, 191–202.
- Lu, C. GPS information and rate-tolerance and its relationships with rate distortion and complexity distortions. J. Chengdu Univ. Inf. Technol. **2012**, 6, 27–32. (In Chinese)
- Sow, D.M. Complexity distortion theory. IEEE Trans. Inf. Theory **2003**, 49, 604–609.
- Berger, T. Rate Distortion Theory; Prentice-Hall: Englewood Cliffs, NJ, USA, 1971.
- Wikipedia contributors. Boltzmann distribution. In Wikipedia, The Free Encyclopedia; 5 August 2020; Available online: https://en.wikipedia.org/wiki/Boltzmann_distribution (accessed on 12 August 2020).
- Popper, K. Conjectures and Refutations, 1st ed.; Routledge: London, UK; New York, NY, USA, 2002.
- Zhong, Y.X. A theory of semantic information. China Commun. **2017**, 14, 1–17.
- Klir, G. Generalized information theory. Fuzzy Sets Syst. **1991**, 40, 127–142.
- Lakatos, I. Falsification and the methodology of scientific research programmes. In Can Theories Be Refuted? Synthese Library; Harding, S.G., Ed.; Springer: Dordrecht, The Netherlands, 1976; Volume 81.
- Popper, K. Realism and the Aim of Science; Bartley, W.W., III, Ed.; Routledge: New York, NY, USA, 1983.
- Lakatos, I. Popper on demarcation and induction. In The Philosophy of Karl Popper; Schilpp, P.A., Ed.; Open Court: La Salle, IL, USA, 1974; pp. 241–273.
- Tichý, P. On Popper's definitions of verisimilitude. Br. J. Philos. Sci. **1974**, 25, 155–160.
- Oddie, G. Truthlikeness. In The Stanford Encyclopedia of Philosophy (Winter 2016 Edition); Zalta, E.N., Ed.; Available online: https://plato.stanford.edu/archives/win2016/entries/truthlikeness/ (accessed on 18 May 2020).
- Zwart, S.D.; Franssen, M. An impossibility theorem for verisimilitude. Synthese **2007**, 158, 75–92.
- Eells, E.; Fitelson, B. Symmetries and asymmetries in evidential support. Philos. Stud. **2002**, 107, 129–142.
- Kemeny, J.; Oppenheim, P. Degrees of factual support. Philos. Sci. **1952**, 19, 307–324.
- Huber, F. What is the point of confirmation? Philos. Sci. **2005**, 72, 1146–1159.
- Hawthorne, J. Inductive logic. In The Stanford Encyclopedia of Philosophy (Spring 2018 Edition); Zalta, E.N., Ed.; Available online: https://plato.stanford.edu/archives/spr2018/entries/logic-inductive/ (accessed on 18 May 2020).
- Lu, C. Channels' confirmation and predictions' confirmation: From the medical test to the Raven Paradox. Entropy **2020**, 22, 384. Available online: https://www.mdpi.com/1099-4300/22/4/384 (accessed on 10 September 2020).
- Hempel, C.G. Studies in the logic of confirmation. Mind **1945**, 54, 1–26.
- Nicod, J. Le Problème Logique de l'Induction; Alcan: Paris, France, 1924; p. 219. English translation: The logical problem of induction. In Foundations of Geometry and Induction; Routledge: London, UK, 2000.
- Scheffler, I.; Goodman, N.J. Selective confirmation and the ravens: A reply to Foster. J. Philos. **1972**, 69, 78–83.
- Fitelson, B.; Hawthorne, J. How Bayesian confirmation theory handles the paradox of the ravens. In The Place of Probability in Science; Eells, E., Fetzer, J., Eds.; Springer: Dordrecht, The Netherlands, 2010; pp. 247–276.
- Wikipedia contributors. Syllogism. In Wikipedia, The Free Encyclopedia; Available online: https://en.wikipedia.org/w/index.php?title=Syllogism&oldid=958696904 (accessed on 20 May 2020).
- Wang, P.Z.; Zhang, H.M.; Ma, X.W.; Xu, W. Fuzzy set-operations represented by falling shadow theory. In Fuzzy Engineering toward Human Friendly Systems, Proceedings of the International Fuzzy Engineering Symposium '91, Yokohama, Japan, 13–15 November 1991; IOS Press: Amsterdam, The Netherlands, 1991; Volume 1, pp. 82–90.
- Lu, C. Decoding model of color vision and verifications. Acta Opt. Sin. **1989**, 9, 158–163. (In Chinese)
- Lu, C. B-fuzzy quasi-Boolean algebra and a generalized mutual entropy formula. Fuzzy Syst. Math. **1991**, 5, 76–80. (In Chinese)
- CIELAB. Symmetric Colour Vision Model, CIELAB and Colour Information Technology. 2006. Available online: http://130.149.60.45/~farbmetrik/A/FI06E.PDF (accessed on 10 September 2020).
- Lu, C. Explaining color evolution, color blindness, and color recognition by the decoding model of color vision. In Proceedings of the 11th IFIP TC 12 International Conference, IIP 2020, Hangzhou, China, 3–6 July 2020; Shi, Z., Vadera, S., Chang, E., Eds.; Springer: Cham, Switzerland, 2020; pp. 287–298. Available online: https://www.springer.com/gp/book/9783030469306 (accessed on 10 September 2020).
- Guo, S.Z. Principle of Fuzzy Mathematical Analysis Based on Structured Element; Northeast University Press: Shenyang, China, 2004. (In Chinese)
- Froese, T.; Taguchi, S. The problem of meaning in AI and robotics: Still with us after all these years. Philosophies **2019**, 4, 14.
- Wikipedia contributors. Boltzmann machine. In Wikipedia, The Free Encyclopedia; Available online: https://en.wikipedia.org/wiki/Boltzmann_machine (accessed on 10 August 2020).

**Figure 1.**Solving the truth functions of “adult” and “elder” using prior and posterior probability distributions. The human brain can guess T(“adult”|x) (cyan dashed line) or the extension of “adult” from P(x|”adult”) (dark green line) and P(x) (dark blue dashdotted line) and estimate T(“elder”|x) (purple thin dashed line) or the extension of “elder” from P(x|”elder”) (grey thin line) and P(x). Equations (12) and (22) are new formulas for calculating T(“adult”|x) and T(“elder”|x).

**Figure 2.** The Shannon channel and the semantic channel. (**a**) The Shannon channel described by the P probability frame; (**b**) the semantic channel described by the P–T probability frame. A membership function ascertains the semantic meaning of y_{j}. A fuzzy set θ_{j} may be overlapped or included by another.

**Figure 3.** The optimized truth function (dashed line) is the same as the membership function obtained from the statistics of a random set [37]. S_{k} is a set-value (a thin line); N_{j} is the number of set-values.

**Figure 4.**Illustrating a GPS device’s positioning with a deviation. The round point is the pointed position, the star is the most possible position, and the dashed line is the predicted probability distribution. Source: author.

**Figure 5.** Semantic information conveyed by y_{j} about x_{i} can represent the verisimilitude of y_{j} reflecting x_{i}. When the real x is x_{i}, the truth value is T(θ_{j}|x_{i}) = 0.8; the information I(x_{i}; θ_{j}) is log(0.8/0.35) = 1.19 bits. If x exceeds a certain range, the information is negative.

**Figure 6.** Illustrating the medical test and the binary classification for explaining confirmation. If x is in E_{1}, we use e_{1} as the prediction "h = h_{1}"; if x is in E_{0}, we use e_{0} as the prediction "h = h_{0}".

**Figure 7.** Truth function T(θ_{e1}|h) includes the believable part with proportion b_{1} and the unbelievable part with proportion b_{1}′ (b_{1}′ = 1 − |b_{1}|).

**Figure 8.** Likelihood function P(h|θ_{e1}) may be regarded as a believable part plus an unbelievable part.

**Figure 9.**Using a sample to examine different confirmation measures for the Raven Paradox. Source: author.

| | e_{0} | e_{1} |
| --- | --- | --- |
| h_{1} | b | a |
| h_{0} | d | c |

| | e_{0} (Negative) | e_{1} (Positive) |
| --- | --- | --- |
| h_{1} (Infected) | P(e_{0}\|h_{1}) = b/(a + b) | P(e_{1}\|h_{1}) = a/(a + b) |
| h_{0} (Uninfected) | P(e_{0}\|h_{0}) = d/(c + d) | P(e_{1}\|h_{0}) = c/(c + d) |

| | e_{0} (Negative) | e_{1} (Positive) |
| --- | --- | --- |
| h_{1} (infected) | T(θ_{e0}\|h_{1}) = b_{0}′ | T(θ_{e1}\|h_{1}) = 1 |
| h_{0} (uninfected) | T(θ_{e0}\|h_{0}) = 1 | T(θ_{e1}\|h_{0}) = b_{1}′ |

| Reasoning between/with | Major Premise or Model | Minor Premise or Evidence | Consequence | Interpretation |
| --- | --- | --- | --- | --- |
| Between two instances | P(y_{j}\|x) | x_{i} (X = x_{i}) | P(y_{j}\|x_{i}) | Conditional SP |
| | | y_{j}, P(x) | P(x\|y_{j}) = P(x)P(y_{j}\|x)/P(y_{j}), P(y_{j}) = ∑_{i}P(y_{j}\|x_{i})P(x_{i}) | Bayes' Theorem II (Bayes' prediction) |
| Between two sets | T(θ_{2}\|θ) | y_{1} (is true) | T(θ_{2}\|θ_{1}) | Conditional LP |
| | | y_{2}, T(θ) | T(θ\|θ_{2}) = T(θ_{2}\|θ)T(θ)/T(θ_{2}), T(θ_{2}) = T(θ_{2}\|θ)T(θ) + T(θ_{2}\|θ′)T(θ′) | Bayes' Theorem I (θ′ is the complement of θ) |
| Between an instance and a set (or model) | T(θ_{j}\|x) | X = x_{i} or P(x) | T(θ_{j}\|x_{i}) or T(θ_{j}) = ∑_{i}T(θ_{j}\|x_{i})P(x_{i}) | Truth value and logical probability |
| | | y_{j} is true, P(x) | P(x\|θ_{j}) = P(x)T(θ_{j}\|x)/T(θ_{j}), T(θ_{j}) = ∑_{i}T(θ_{j}\|x_{i})P(x_{i}) | The semantic Bayes prediction in Bayes' Theorem III |
| | P(x\|θ_{j}) | x_{i} or D_{j} | P(x_{i}\|θ_{j}) or P(D_{j}\|θ_{j}) | Likelihood |
| | | P(x) | T(θ_{j}\|x) = [P(x\|θ_{j})/P(x)]/max[P(x\|θ_{j})/P(x)] | Inference in Bayes' Theorem III |
| Induction: with sampling distributions to train predictive models | P(x\|θ_{j}) | P(x\|y_{j}) | P*(x\|θ_{j}) (optimized P(x\|θ_{j}) with P(x\|y_{j})) | Likelihood Inference |
| | P(x\|θ) and P(θ) | P(x\|y) or D | P(θ\|D) = P(θ)P(D\|θ)/∑_{j}P(θ_{j})P(D\|θ_{j}) | Bayesian Inference |
| | T(θ_{j}\|x) | P(x\|y_{j}) and P(x) | T*(θ_{j}\|x) = [P(x\|y_{j})/P(x)]/max[P(x\|y_{j})/P(x)] = P(y_{j}\|x)/max[P(y_{j}\|x)] | Logical Bayesian Inference |
| With degree of channel confirmation | b_{1}* = b*(e_{1}→h_{1}) > 0 | h or P(h) | T(θ_{e1}\|h_{1}) = 1, T(θ_{e1}\|h_{0}) = 1 − \|b_{1}*\|; T(θ_{e1}) = P(h_{1}) + (1 − b_{1}*)P(h_{0}) | Truth values and logical probability |
| | | e_{1} (e_{1} is true), P(h) | P(h_{1}\|θ_{e1}) = P(h_{1})/T(θ_{e1}), P(h_{0}\|θ_{e1}) = (1 − b_{1}*)P(h_{0})/T(θ_{e1}) | A fuzzy syllogism with b* |
| With degree of prediction confirmation | c_{1}* = c*(e_{1}→h_{1}) > 0 | e_{1} (e_{1} is true) | P(h_{1}\|θ_{e1}) = 1/(2 − c_{1}*), P(h_{0}\|θ_{e1}) = (1 − c_{1}*)/(2 − c_{1}*) | A fuzzy syllogism with c* |

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lu, C. The P–T Probability Framework for Semantic Communication, Falsification, Confirmation, and Bayesian Reasoning. *Philosophies* **2020**, *5*, 25.
https://doi.org/10.3390/philosophies5040025
