1. A Long Story Made Short
Probability, taken as a quantitative notion whose value ranges in the interval between 0 and 1, emerged around the middle of the 17th century thanks to the work of two leading French mathematicians: Blaise Pascal and Pierre Fermat. According to a well-known anecdote: “a problem about games of chance proposed to an austere Jansenist by a man of the world was the origin of the calculus of probabilities”. The ‘man of the world’ was the French gentleman Chevalier de Méré, a conspicuous figure at the court of Louis XIV, who asked Pascal—the ‘austere Jansenist’—for the solution to some questions regarding gambling, such as how many dice tosses are needed to have a fair chance of obtaining a double-six, or how the players should divide the stakes if a game is interrupted. Pascal involved Fermat in the study of such problems, and the two entertained a correspondence that is considered the cradle of modern probability.
To be sure, probability existed long before Pascal and Fermat, and the kind of problems they studied had already been addressed by a number of illustrious mathematicians and scientists. What was new in the work of Pascal and Fermat was their systematic approach, which led to a notion of probability that could be detached from the realm of games of chance and applied to all sorts of problems. In other words, they paved the way to the study of probability as a general model for reasoning under uncertainty.
Ever since its birth, probability has been characterized by a peculiar duality of meaning. As described by Hacking: probability is “Janus faced. On the one side it is statistical, concerning itself with the stochastic laws of chance processes. On the other side it is epistemological, dedicated to assessing the reasonable degrees of belief in propositions quite devoid of statistical background” [5] (p. 12). The first testimony to the twofold meaning of probability is the work of Pascal, who studied random events like games of chance, but also applied probability to God’s existence in his famous wager. The wager argues that betting on God’s existence is always advisable, because it maximizes expected utility: if a probability of ½ is assigned to God’s existence, betting on it is the thing to do because the expected gain is two lives instead of one; but betting is preferable even if the probability of God’s existence is estimated very low, because this is balanced by the infinite expected value of eternal life. Pascal’s wager also suggests that considerations of utility and bets have been associated with probability since the beginning of its history. The same holds for the notion of mathematical expectation, which results from the combination of probability and utility. Mathematical expectation is the focus of the work of another pioneer of the study of probability, namely the multifarious Dutch scientist Christiaan Huygens, inventor of the pendulum clock and proponent of the wave theory of light.
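In modern decision-theoretic notation, the core of the argument can be sketched as follows (a reconstruction, not Pascal’s own formulation): if p > 0 is the probability assigned to God’s existence and the prize of a successful wager is an infinite good, then

E(bet on God) = p · ∞ + (1 − p) · (finite loss) = ∞,   E(bet against) = a finite quantity,

so betting on God’s existence has the greater expected value for every positive p, however small.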
By the turn of the 18th century, probability had progressed enormously, and had considerably widened its scope of application, as facilitated by the combinatorial calculus and studies in statistics and demography. A decisive step forward was made by the Swiss scientists belonging to the Bernoulli family, including Jakob, the author of Ars conjectandi (1713), which contained the first limit theorem, namely the ‘weak law of large numbers’, opening the analysis of direct probability—the probability assigned to a sample taken from a population whose law is known.
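In modern notation, Bernoulli’s theorem can be stated as follows (a standard reconstruction, not the wording of the Ars conjectandi): if a trial succeeds with fixed probability p and S_n denotes the number of successes in n independent repetitions, then for every ε > 0

P( | S_n / n − p | > ε ) → 0 as n → ∞,

that is, the observed relative frequency of success falls, with probability approaching 1, as close to p as desired once the sample is large enough.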
Other Bernoullis worth mentioning are Nikolaus and Daniel, who achieved key results in the analysis of mathematical expectation and started the study of distributions of errors of observation, which reached its peak in the first half of the 19th century with Carl Friedrich Gauss. After Jakob, the study of direct probability was carried on by a number of authors, including Abraham De Moivre, Pierre Simon de Laplace, and Siméon Denis Poisson, up to the 20th century mathematicians Émile Borel and Francesco Cantelli, and the Russians Pafnuty Chebyshev, Andrej Markov, Alexandr Lyapunov, and Andrej Kolmogorov.
An important result was obtained in the second half of the 18th century by Thomas Bayes, who started the study of inverse probability, namely the probability assigned to a hypothesis on the basis of a corpus of evidence. Inverse probability is also called the ‘probability of causes’ because it makes it possible to estimate the probabilities of the causes underlying some observed event. Bayes’ rule provides a tool for combining inductive reasoning with probability, whose import was only grasped much later. Today, Bayes’ method is considered the cornerstone of statistical inference by the supporters of the Bayesian School.
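In present-day notation the rule reads (a standard formulation, not Bayes’ own):

P(H | E) = P(E | H) · P(H) / P(E),

where H is a hypothesis about the ‘cause’ and E the observed evidence: the inverse probability P(H | E) is obtained from the direct probability P(E | H) together with the prior probability P(H).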
The 18th century saw a rapid growth of probability. Nicolas Condorcet opened a new research field called ‘social mathematics’, leading to the application of probability to morals and politics. He exerted an influence on a number of authors, including Adolphe Quetelet, who made progress in the study of statistical distributions. This field progressed enormously due to the work of Francis Galton, and later of Karl Pearson, Raphael Weldon, and others who shaped modern statistics by developing the analysis of correlation and regression, and methods for assessing statistical hypotheses against experimental data through significance tests. Other branches of modern statistics were started by Ronald Fisher, Jerzy Neyman, and Egon Pearson, who extended the method of tests to a comparison between two alternative hypotheses.
In parallel, probability entered natural science not only to master the errors of measurement, but as a constituent of physics. Starting in 1827 with Robert Brown’s work on the motion of particles suspended in fluid, the use of probability to characterize complex physical phenomena progressed rapidly, leading to the kinetic theory of gases and thermodynamics developed by James Clerk Maxwell, Ludwig Boltzmann, and Josiah W. Gibbs. By 1905–1906, Marian von Smoluchowski and Albert Einstein brought the study of Brownian motion to completion. In the same years, the analysis of radiation led Einstein and others such as Max Planck, Erwin Schrödinger, Louis de Broglie, Paul Dirac, Werner Heisenberg, Max Born, and Niels Bohr to formulate quantum mechanics, making probability a basic ingredient of the description of matter.
2. Probability and Its Interpretations
The twofold meaning of probability—epistemic and empirical—is the origin of the ongoing debate on its interpretation. The tenet that probability should take a single meaning became predominant with Laplace’s work, after a long period in which the two meanings of probability peacefully coexisted in the work of most authors.
In 1933 Andrej Kolmogorov gave a famous axiomatization of probability, which is still dominant through its embedding in measure theory. While spelling out the mathematical properties of probability in an axiomatic system, Kolmogorov drew a separation between its mathematics and its interpretation. The calculus tells how to calculate final (or posterior) probabilities on the basis of initial (or prior) ones—for instance the probability of obtaining two 3s when throwing two dice, given that each face of a die has a 1/6 probability, or the probability of seeing an ace when drawing from a deck of 52 cards, knowing that every card has a 1/52 probability.
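In measure-theoretic notation, the core of the axiomatization can be summarized as follows (a compressed sketch, not Kolmogorov’s original 1933 wording):

P(A) ≥ 0 for every event A,   P(Ω) = 1 for the sure event Ω,   P(A ∪ B) = P(A) + P(B) for mutually exclusive A and B,

with additivity extended to countably many events in the full system. Together with the usual definition of independence, P(A ∩ B) = P(A) · P(B), the final probabilities of the examples just mentioned follow from the initial ones: the probability of two 3s is 1/6 × 1/6 = 1/36, and that of drawing an ace is 4 × 1/52 = 1/13.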
The calculation of initial probabilities—in our example: 1/6 assigned to each face of the dice, and 1/52 assigned to drawing one card from a deck—does not fall under the calculus, and can be obtained by means of different methods. The adoption of one particular method for the sake of calculating initial probabilities is a matter of controversy among the upholders of different interpretations.
3. The Classical Theory
The so-called ‘classical theory’ is usually associated with the French mathematician, physicist and astronomer Pierre Simon de Laplace (1749–1827). Known for his superb systematization of Newton’s mechanics—contained in his Traité de mécanique céleste (1798–1825)—Laplace took Newtonian mechanics as the pillar on which the entire edifice of human knowledge should be made to rest. The fundamental component of Laplace’s conception is determinism, seen as the key ingredient of Newtonian theory. In this perspective, probability acquires an epistemic meaning, resulting from the intrinsic limitation of human knowledge. This thesis is spelled out in two famous essays devoted to probability: Théorie analytique des probabilités (1812) and Essai philosophique sur les probabilités (1814).
Determinism is the doctrine that causality is the overall rule governing the universe, whose history is a long causal chain where every state is determined by the preceding one, and in turn determines the subsequent one, in ways that are described by the laws of mechanics. Causality and predictability are inextricably linked within the deterministic view embraced by Laplace, who strongly believes in the strength of this epistemological paradigm. The human mind is incapable of grasping all the connections of the causal network underpinning the universe, but one can conceive of a superior intelligence, able to grasp every detail of it: “An intelligence that, at a given instant, could comprehend all the forces by which nature is animated and the respective situation of the beings that make it up, if moreover, it were vast enough to submit these data to analysis, would encompass in the same formula the movements of the greatest bodies of the universe and those of the lightest atoms. For such an intelligence nothing would be uncertain, and the future, like the past, would be open to its eyes” [10] (p. 2). Even though such comprehensive knowledge will remain out of reach, man can broaden his knowledge by using probability, which provides an epistemic tool of the utmost importance, both in science and everyday life.
Laplace holds that probability is “relative in part to [...] ignorance and in part to our knowledge” [10] (p. 3). Made necessary by a lack of complete knowledge, probability applies to the available information, and is always relative to it. In order to evaluate probability, Laplace suggests that one ought to focus on the possibilities which are open with respect to the occurrence of an event, and states that the value of probability corresponds to the ratio of the number of favorable cases to that of all possible cases. This is known as the classical definition of probability. It is grounded on the judgment that, in the absence of information that would give reason to believe otherwise, the possibilities open to an event are equally possible, according to the precept known as the ‘principle of insufficient reason’, or ‘principle of indifference’. For the sake of determining probability values, equally possible cases are then taken as equally probable, leading to equiprobability, that is, a uniform distribution of priors.
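In symbols, the classical definition amounts to (a standard rendering):

P(A) = number of cases favorable to A / number of all equally possible cases,

so that, for example, the probability of an even outcome with a fair die is 3/6 = 1/2.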
Laplace gave great impetus to the study of inverse probability in a Bayesian fashion, by means of the method called Laplace’s rule, also known as the rule of succession. According to this rule, the probability of a given event is to be inferred from the number of cases in which the same event has been observed to happen: if m is the number of observed positive cases, and n that of negative cases, the probability that the next case observed is positive equals (m + 1)/(m + n + 2). If no negative cases have been observed, the formula reduces to (m + 1)/(m + 2). Laplace’s rule applies to two alternatives, such as the occurrence and non-occurrence of some event, and assumes a uniform distribution of priors and independence of trials.
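A small worked instance of the rule (with illustrative numbers not taken from the text): after m = 9 positive and n = 1 negative observations, the probability that the next case is positive is (9 + 1)/(9 + 1 + 2) = 10/12 = 5/6; had all ten observed cases been positive, it would be (10 + 1)/(10 + 2) = 11/12.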
The uniform distribution, together with the principle of indifference on which it rests, provoked hot debate. To be sure, Laplace recommends exercising some caution when using his method; should there be reasons to believe that the examined cases are not equally possible, “one must first determine their respective possibilities, the apposite appreciation of which is one of the most delicate points in the theory of chances” [10] (p. 6). The determination of the different chances characterizing, say, the two sides of a biased coin would be a case for counting frequencies. As a matter of fact, non-uniform prior distributions are allowed by Laplace, albeit regarded as unnecessary because, as observed by Stephen Stigler, “the analysis for uniform prior distributions was already sufficiently general to encompass all cases, at least for the large sample problems Laplace had in mind” [11] (p. 135). The point is that very large samples make priors mostly irrelevant.
Biased coins are not the only challenge faced by the classical approach. The most serious difficulties occur in relation to problems involving an infinite number of possibilities, in which case the classical theory can lead to incompatible probability values. Such problems are known as ‘Bertrand’s paradoxes’, after the French mathematician Joseph Bertrand. In view of these and other difficulties, a number of authors turned to the frequentist interpretation of probability.
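The best known of these paradoxes asks for the probability that a chord drawn ‘at random’ in a circle is longer than the side of the inscribed equilateral triangle; depending on how the uniform distribution over the infinitely many possible chords is specified, the classical recipe gives (a standard illustration):

P = 1/2 (midpoint of the chord chosen uniformly along a radius), P = 1/3 (endpoints chosen uniformly on the circumference), P = 1/4 (midpoint chosen uniformly inside the circle).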
4. The Frequency Theory
According to the frequency interpretation, probability is defined as the limit of the relative frequency of a given attribute, observed in the initial part of an indefinitely long sequence of repeatable events, such as the observations (or measurements) obtained by experimentation. The basic assumption underlying this definition is that the experiments generating frequencies can be reproduced in identical conditions, and generate independent results.
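In symbols (a standard rendering of the definition just given): if n(A) is the number of occurrences of the attribute A among the first n elements of the sequence, then

P(A) = lim (n → ∞) n(A)/n,

provided such a limit exists.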
The frequency approach started in the 19th century thanks to two Cambridge mathematicians, namely Robert Leslie Ellis and John Venn, and reached its climax with Richard von Mises. As stated in the first pages of von Mises’ Probability, Statistics and Truth, probability, once defined in terms of frequency, has to do with mass phenomena, or, in the author’s terminology, with collectives. In von Mises’ words: “a collective is a mass phenomenon or a repetitive event, or, simply, a long sequence of observations for which there are sufficient reasons to believe that the relative frequency of the observed attribute would tend to a fixed limit if the observations were indefinitely continued. This limit will be called the probability of the attribute considered within the given collective” [12] (p. 15). The existence of a collective is taken to be a necessary condition for probability, in the sense that without a collective there cannot be meaningful probability assignments. As pointed out in the passage quoted above, in order to qualify as a collective a sequence of observations must possess two features: it must be apt to be indefinitely prolonged, and it must exhibit frequencies tending to a limit.
In addition, a collective must be random. Randomness is defined in an operational fashion as insensitivity to place selection, where place selection essentially amounts to random sampling. Insensitivity to place selection obtains when the limiting values of the relative frequencies in a collective are not affected by any of the selections that can be made on it: the limiting values of the relative frequencies observed in the sub-sequences (samples) obtained by place selection equal those of the original sequence. In other words: “the limiting values of the relative frequencies in a collective must be independent of all possible place selections” [12] (p. 25). This randomness condition is also called by von Mises the ‘principle of the impossibility of a gambling system’, because it excludes contriving a system leading to a sure win in any hypothetical game of chance. The failure of all attempts to devise a gambling system is meant to secure an empirical foundation for the notion of a collective.
Von Mises re-states the theory of probability in terms of collectives, by means of four operations that he calls selection, mixing, partition, and combination. This procedure serves the purpose of spelling out an empirical notion of probability liable to be operationally reduced to a measurable quantity. The obvious objection to the operational character of this theory is that infinite sequences can never be observed. Von Mises answers that probability as an idealized limit can be compared to other limiting notions used in science, such as velocity or density. He believes that “the results of a theory based on the notion of the infinite collective can be applied to finite sequences of observations in a way which is not logically definable, but is nevertheless sufficiently exact in practice” [12] (p. 85).
Since probability can only refer to collectives, under von Mises’ approach it makes no sense to talk of the probability of single occurrences of events, like the death of a particular person, or the behavior of a single gas molecule. To stress this feature of the frequency theory, von Mises says that talking of the probability of single events “has no meaning” [12] (p. 11).
This gives rise to the so-called single case problem affecting frequentism. Also debatable are the basic assumptions underpinning the frequency theory, that is, the independence of the outcomes of observations, and the absolute similarity of the experimental conditions. Nonetheless, after von Mises’ work, frequentism became so popular with physicists and natural scientists as to become the official interpretation of probability in science, and was also accepted by ‘orthodox’ statisticians.
The appeal exercised by frequentism lies in its empirical character and its objective flavor. This derives from the conviction, deeply ingrained in the frequency outlook, that there exist correct probability values that are generally unknown, but can be approached by estimates based on the frequencies observed within larger and larger samples.
In an attempt to widen its scope of application, Hans Reichenbach developed a version of frequentism more flexible than von Mises’. First of all, the author defines a weaker notion of randomness, relative to a restricted domain of selections “not defined by mathematical rules, but by reference to physical (or psychological) occurrences” [14] (p. 150). This bears some similarity to the notion—familiar to statisticians—of ‘pseudo-randomness’, which is commonly adopted by researchers when sampling at random by means of methods like the so-called tables of random numbers. The samples so obtained are random for all practical purposes, although someone who knew how the sequence of random numbers was generated would be able to predict it. In Reichenbach’s words: “random sequences are characterized by the peculiarity that a person who does not know the attribute of the elements is unable to construct a mathematical selection by which he would, on an average, select more hits that would correspond to the frequency of the major sequence. [...] this might be called a psychological randomness” [14] (p. 150).
Furthermore, Reichenbach weakens the notion of limit by introducing that of practical limit “for sequences that, in dimensions accessible to human observation, converge sufficiently and remain within the interval of convergence” [14] (p. 347). Such a move is inspired by the conviction that “it is with sequences having a practical limit that all actual statistics are concerned” [14] (p. 348).
In the conviction that the theory of probability should allow for single case assignments, because it would otherwise fail to accomplish its main task, namely that of making it possible to predict uncertain events, Reichenbach made an attempt to solve the single case problem. The crucial notion in this connection is that of a posit, namely “a statement with which we deal as true, although the truth value is unknown” [14] (p. 373), whose role is to connect the probability of a sequence to the probability of a single case belonging to it. The key idea is that a posit regarding a single occurrence of an event receives a weight from the probabilities attached to the reference class to which the event in question has been assigned. The reference class must obey a criterion of homogeneity, namely it should include as many cases as possible similar to the one under consideration, while excluding dissimilar ones. Similarity is to be taken relative to the properties that are deemed relevant, and homogeneity is obtained through successive partitions of the reference class by means of statistically relevant properties. A homogeneous reference class cannot be further partitioned in this way.
Reichenbach’s proposal is not free from difficulties. A major problem arises in connection with the requirement of homogeneity, because one can never be sure that all relevant properties with respect to a given phenomenon have been taken into account and included in the reference class. Aware of this, Reichenbach claims that “the statement about a single case is not uttered by us with any pretense of its being a true statement; it is uttered in the form of a posit, or as we may also say [...] in the form of a wager. The frequency within the corresponding class determines, for the single case, the weight of the posit or wager. [...] we stand in a similar way before every future event, whether it is a job we are expecting to get, the result of a physical experiment, the sun’s rising tomorrow, or the next world-war. All our posits concerning these events figure within our list of expectations with a predictional value, a weight, determined by their probability” [15] (pp. 314–315).
Reichenbach regards scientific knowledge as the result of a continuous interplay between experiencing frequencies and predicting probabilities, as reflected by the title of his book Experience and Prediction. Such an interplay is made possible by the method of posits, resting on a distinction between primitive knowledge, where prior probabilities are unknown, and advanced knowledge, characterized by the fact that prior probabilities are available. Reichenbach calls prior probabilities anticipative (or blind) posits, whose measure is entrusted to the frequentist canon, which he calls the inductive rule. Within advanced knowledge, the probability calculus makes it possible to obtain appraised posits on the basis of priors. The method of posits, starting with blind posits and proceeding to appraised posits that become part of a complex system, is the core of induction, which has the property of being a self-correcting procedure, responsible for “the overwhelming success of scientific method” [15] (p. 364).
A pivotal role within this procedure is assigned to Bayes’ rule, regarded as the proper tool for the confirmation of scientific hypotheses. By holding that prior probabilities must be determined by means of frequencies alone, Reichenbach qualifies as an objective Bayesian. One can say that for Reichenbach Bayes’ rule, combined with priors calculated on the basis of frequencies, is the cornerstone on which the whole edifice of knowledge rests.
5. The Propensity Theory
The propensity theory of probability was put forward by Karl Raimund Popper in two papers of the late Fifties, namely “The Propensity Interpretation of the Calculus of Probability, and the Quantum Theory” (1957) and “The Propensity Interpretation of Probability” (1959), to solve the single case problem faced by frequentism, with specific reference to quantum mechanics, where probability is commonly referred to single electrons, collisions, and things of that sort. Popper’s idea is that probability should be taken as a dispositional property of the experimental set-up, or the generating conditions of experiments, liable to be reproduced over and over again to form a sequence. According to Popper, such a move “allows us to interpret the probability of a singular event as a property of the singular event itself” [16] (p. 37). Singular events are not associated with particular objects, like particles or dice, but rather with the experimental arrangement (or set-up) in which experiments take place: “every experimental arrangement is liable to produce, if we repeat the experiment very often, a sequence with frequencies which depend upon this particular experimental arrangement. These virtual frequencies [...] characterize the disposition, or the propensity, of the experimental arrangement to give rise to certain characteristic frequencies when the experiment is often repeated” [16] (p. 67).
Propensities are the object of conjectural statements, which, according to Popper’s falsificationist epistemology, should be testable. In order to guarantee the testability of propensity attributions, Popper distinguishes between probability statements expressing propensities and statistical statements, and claims that the former express conjectured frequencies pertaining to virtual sequences of experiments, while the latter express relative frequencies observed in actual sequences of experiments. Probability statements denoting propensities can be tested by means of statistical statements, by comparing the conjectured frequencies against the ones observed in sequences of performed experiments. When referred to repeatable experiments, propensities can be measured by means of observed frequencies; otherwise, they are estimated “speculatively” [19] (p. 17).
In the eighties Popper resumed the notion of propensity to make it the focus of a metaphysical view contemplating all sorts of probabilistic tendencies operating in the world. At that point, Popper regarded the propensity theory not only as the solution to the single case problem, but also as the core of an objective view of probability underpinning an indeterministic world view. In Popper’s essay A World of Propensities (1990), the propensity interpretation is endowed with a “cosmological significance”, based on the claim that “we live in a world of propensities, and [...] this fact makes our world both more interesting and more homely than the world as seen by earlier states of the sciences” [19] (p. 9). In other words, the propensity theory is meant to embrace in the same picture probabilistic tendencies of all kinds, ranging from physics and biology to the motives of human action.
The propensity theory is not without difficulties. In the first place, it faces a reference class problem akin to that affecting frequentism. While single case attributions made within the frequency outlook need to be based on a reference class containing all relevant properties, propensity attributions require the complete description of the experimental arrangement; therefore, the problem of identifying a complete set of information is circumvented, not solved. Furthermore, as pointed out by Paul Humphreys, the dispositional character of propensities, defined as tendencies to produce certain outcomes, confers on them a peculiar asymmetry that goes in the opposite direction from that characterizing inverse probability, making the propensity theory inapplicable to Bayes’ rule. This persuaded a number of authors, including Wesley Salmon, to adopt the notion of propensity to represent probabilistic causal tendencies, rather than probabilities. That said, it should be added that Popper’s propensity theory exercised a considerable influence upon philosophers of science, and is the object of ongoing debate. Versions of it are embraced by a number of authors including Hugh Mellor, Ronald Giere, David Miller, Donald Gillies, and many others.
6. The Logical Interpretation
According to the logical interpretation, probability is a logical relation between two propositions, one of which describes a body of evidence, the other a hypothesis. So conceived, probability is part of logic, and partakes in the objectivity characterizing logic. The logical nature of probability goes hand in hand with its rational character, so that probability theory can be seen as the theory of reasonable degrees of belief.
Anticipated by Leibniz, the logical interpretation was embraced by the Czech mathematician and logician Bernard Bolzano, and championed by the British authors Augustus De Morgan, George Boole, William Stanley Jevons, William Ernest Johnson, and John Maynard Keynes, best known for his contribution to economic theory.
The most conspicuous version of logicism is due to Rudolf Carnap, who made probability the object of inductive logic—also called the logic of confirmation—meant as a tool for making the best probability estimates based on the given evidence, and as a basis for rational decision. A fundamental feature of logicism is the tenet that a given body of evidence supports a unique probability assignment. In other words, logicists contend that there exist correct probability values, which can in principle be known and measured.
In his monumental Logical Foundations of Probability, first published in 1950 and re-published in 1962 with changes and additions, Carnap develops inductive logic as an axiomatic system, formalized within a first-order predicate calculus with identity, which applies to the measures of confirmation as defined on the semantic content of statements. The object of inductive logic is what Carnap calls probability1, or degree of confirmation, to be distinguished from probability2, or probability as frequency. According to the author, probability1 “has its place in inductive logic and hence in the methodology of science”, probability2 “in mathematical statistics and its applications” [25] (p. 5). Within scientific methodology, probability1 is assigned a twofold role, being both a method of confirmation and a device for estimating relative frequencies. In The Continuum of Inductive Methods, Carnap shows that there is a complete correspondence between the two meanings of probability1, in the sense that there is a one-to-one correspondence between the confirmation functions and the estimate functions, and that these functions form a continuum. The methods belonging to the continuum are characterized by a blend of a priori and empirical components: at one end of it stands the ‘straight rule’, corresponding to the frequentist canon, according to which priors are calculated in a purely empirical fashion by means of observed frequencies, while the other end of the continuum is occupied by a function that Carnap calls c+, which corresponds to the classical (Laplacian) way of assigning equal probability to priors on a priori grounds. In the middle of the continuum lies the function called c*, characterized by symmetry, corresponding to the property usually called exchangeability. Events belonging to a sequence of observations are exchangeable if the probability of h successes in n events is the same, for whatever permutation of the n events, and for every n and h ≤ n. Exchangeability allows learning from experience faster than independence, which makes it preferable whenever the available evidence is scant. For this reason, Carnap recommends using the function c*, especially in connection with predictive inference, namely the inference from an observed to an unobserved sample, deemed “the most important kind of inductive inference” [24] (p. 568).
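In modern notation, for events recorded as 1 (success) or 0 (failure), exchangeability amounts to the requirement (a standard formulation) that

P(X_1 = x_1, ..., X_n = x_n) depends only on n and on h = x_1 + ... + x_n,

that is, the joint probability is invariant under every permutation of the trials, so that the order in which successes and failures occur is irrelevant.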
Part of Carnap’s account is a requirement of total evidence, stating that “in the application of inductive logic to a given knowledge situation, the total evidence available must be taken as basis for determining the degree of confirmation” [24] (p. 211). In other words: the estimation of probability must take into account all relevant information. Such a requisite is on a par with the requirement of homogeneity imposed by Reichenbach on the reference class, or the need to refer propensity attributions to the complete description of the generating conditions, and looks equally problematic. All such requirements, made necessary by the desire to safeguard the objective character of probability evaluations, clash with the fact that in practice one can hardly ever be sure of having considered all relevant evidence. This problem does not arise in connection with the subjective theory, to be examined in the next section.
7. The Subjective Interpretation
The subjective interpretation takes probability to be the degree of belief held by a given agent regarding the occurrence of an uncertain event, based on the information available. In order to measure the degree of belief, which is assumed as a primitive notion, an operational definition must be produced. A well-known method for doing so is the betting scheme, according to which the degree of belief corresponds to the odds at which an agent would be ready to bet on the occurrence of an event: the probability of the given event equals the price a player regards as fair for a bet yielding a given gain in case it occurs. The method of bets, which dates back to the 17th century, suffers from various difficulties, like the diminishing marginal utility of money, and the fact that people can take different attitudes towards betting, depending on their aversion or proclivity to risk. In order to circumvent such difficulties other methods have been proposed; for instance, Ramsey adopted the notion of personal preference, determined on the basis of an individual’s expectation of obtaining certain goods, not necessarily of a monetary kind.
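For instance (an illustrative figure, not taken from the text): an agent who regards 0.25 euros as the fair price for a bet that pays 1 euro if event E occurs, and nothing otherwise, thereby reveals a degree of belief

P(E) = 0.25 / 1 = 0.25,

the betting quotient being the price of the bet divided by the amount it returns if the event occurs.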
A crucial role within the subjective view is played by the notion of coherence. Put in terms of betting, coherence ensures that the chosen betting ratios do not expose the agent to a sure loss—what is known in the literature as a Dutch book. Frank Plumpton Ramsey was the first to state, in a famous paper written in 1926 called “Truth and Probability”, that coherent degrees of belief satisfy the laws of additive probability. This makes coherence the only condition to be met by degrees of belief. It follows that insofar as a set of degrees of belief is coherent there is no further demand of rationality to be imposed, and it is perfectly admissible that two people, on the basis of a given body of information, hold different probability evaluations.
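A minimal illustration of incoherence (with made-up numbers): suppose an agent’s betting quotients on an event E and on its negation are both 0.6, each for a payout of 1. A bookmaker who sells the agent both bets collects 0.6 + 0.6 = 1.2, while exactly one of the two bets pays out 1 whatever happens, so the agent loses 0.2 for certain: a Dutch book. Coherence excludes this by requiring P(E) + P(not-E) = 1, in conformity with the additivity law.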
The subjective theory reached its climax with the Italian mathematician Bruno de Finetti, who took a decisive step towards a mature subjectivism by showing that the adoption of Bayes’ rule taken in conjunction with exchangeability leads to a convergence between degrees of belief and observed frequencies. This result, often called ‘de Finetti’s representation theorem’, ensures the applicability of subjective probability to statistical inference. An uncompromising Bayesian, de Finetti regards the shift from prior to posterior probabilities as the cornerstone of statistical inference, and gives it a subjective interpretation, in the sense that moving from prior to posterior probabilities always involves personal judgment.
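For exchangeable events recorded as 0 or 1, the theorem can be stated as follows (a standard modern formulation, not de Finetti’s original notation):

P(X_1 = x_1, ..., X_n = x_n) = ∫_0^1 θ^h (1 − θ)^(n−h) dF(θ),   with h = x_1 + ... + x_n,

where F is a prior distribution over the unknown limiting frequency θ; conditioning on the observed outcomes by Bayes’ rule then drives the predictive probabilities towards the observed relative frequencies as the sample grows.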
Notably, in this perspective the idea of a self-correcting procedure, dear to the upholders of frequentism, does not make sense; changing opinion in the light of new evidence just means making a new judgment. Part of this standpoint is the tenet that there are no correct or rational probability assignments; in de Finetti’s words, “the subjective theory [...] does not contend that the opinions about probability are uniquely determined and justifiable. Probability does not correspond to a self-proclaimed ‘rational’ belief, but to the effective personal belief of anyone” [29] (p. 218). This marks a sharp contrast between subjectivism and logicism, whose supporters, as we saw, maintain that a given body of evidence supports only one correct (rational) probability assignment.
In order to measure degrees of belief, de Finetti, in the course of his long and productive career, adopted a number of different methods including, in addition to the betting scheme, the qualitative relation ‘at least as probable as’, and the so-called penalty methods, which can also serve the purpose of improving probability evaluations. As a matter of fact, de Finetti, who strongly opposed the idea that probability is an objective notion, took very seriously the problem of the objectivity of probability evaluations, and made a significant contribution to it, partly in collaboration with Leonard Jimmie Savage. De Finetti’s often misunderstood claim that “Probability does not exist”, printed in capital letters in the Preface to the English edition of his Theory of Probability, is the expression of the author’s anti-realistic attitude, implying a rejection of the objective, metaphysical notion of probability, and should not be taken as an indication that subjectivism is an ‘anything goes’ approach. Far from embracing an anarchist program according to which probability can take whatever value you like, provided that coherence is satisfied, de Finetti emphasized that the evaluation of probability should take into account all available factual evidence, including frequencies and symmetries. In addition, subjective elements come into play. In the author’s words: “Every probability evaluation essentially depends on two components: (1) the objective component, consisting of the evidence of known data and facts; and (2) the subjective component, consisting of the opinion concerning unknown facts based on known evidence” [33] (p. 7). The subjective component is considered a prerequisite for the appraisal of objective elements, made necessary by the fact that the collection and exploitation of factual evidence involves a number of subjective elements, like the decision to take as relevant to the problem under consideration certain elements instead of others. Moreover, in many situations the competence and experience of the evaluator can influence probability judgments in various ways.
For de Finetti, the evaluation of probability should not be confused with its definition, for these are different concepts, not to be mixed up. The confusion between them is for de Finetti the ‘original sin’ affecting all other interpretations of probability, whose supporters pick a unique criterion (frequency, or symmetry), and ground on it both the definition and the evaluation of probability. This leads to a ‘rigid’ attitude according to which, after defining by whatever rule the probability of an event, one determines a function in a univocal way. By contrast, the subjective standpoint is deemed ‘elastic’, in the sense that it does not commit the choice of one particular function to a single rule or method: “the subjective theory [...] does not contend that the opinions about probability are uniquely determined and justifiable. Probability does not correspond to a self-proclaimed ‘rational’ belief, but to the effective personal belief of anyone” [29] (p. 218). While regarding the evaluation of probability as a complex procedure, resulting from the concurrence of objective and subjective elements and largely determined by the context, de Finetti rejects objectivism, but retains objectivity, namely the idea that one should aim at defining ‘good probability appraisers’—to borrow an expression coined by I. J. Good—namely methods allowing for successful predictions.
8. Still an Open Issue?
In the preceding sections a number of different views regarding the interpretation of probability have been examined. Can any of these interpretations be considered predominant? This question does not allow for a straightforward answer, and the issue is still a matter of ongoing debate.
From a philosophical point of view, Laplace’s theory is outdated because of its ties with determinism. By contrast, as already observed, the method for calculating initial probabilities embodied in it can be applied to a wide array of problems, although it cannot be generalized to all possible situations.
The frequency interpretation has long been considered the natural candidate to account for the notion of probability occurring in the natural sciences. However, while frequentism matches the uses of probability in areas like population genetics and statistical mechanics, the difficulties of applying it to single events make it unsuitable for other areas of science, like quantum mechanics. Its everyday application also appears problematic, because identifying the proper reference class is a delicate matter that admits of no univocal solution. Nonetheless, frequentism underpins the methodology of ‘orthodox’ statistics, and in that capacity is widely accepted.
After Popper, the propensity interpretation gained increasing popularity, but elicited various objections. As we have seen, it seems unsuitable for interpreting inverse probabilities; besides, it faces a reference class problem analogous to that affecting frequentism, giving the impression that the problems of frequentism are displaced rather than removed. All the more so, since propensity attributions represent conjectures to be tested against observed frequencies, making the applicability of the whole theory rest ultimately on frequencies.
As regards epistemic theories, logicism counts more followers among philosophers of science and logicians than among scientists. This is partly due to the awkward formalism adopted by Carnap, who is undoubtedly the major representative of that trend. The literature on confirmation has increasingly focused on Bayesian methodology. To be sure, Carnap’s functions also belong to the Bayesian family, but a number of authors, including Richard Jeffrey and Brian Skyrms, have found it hard to defend the claim that a given body of evidence warrants a unique probability evaluation, and have moved towards a more pluralistic approach, closer to subjectivism.
Subjectivism is strictly related to Bayesianism, so much so that there is a tendency to conflate the two and speak of a ‘Bayesian interpretation’ of probability, meaning the subjective theory. Granted that subjectivism is the most widespread approach among Bayesian statisticians, the expression ‘Bayesian interpretation’ is to say the least misleading, because one can be a Bayesian without being a subjectivist—as witnessed, among others, by Reichenbach’s work. As a matter of fact, recent debate on Bayesianism has paid great attention to the problem of suggesting ‘objective’ criteria for the choice of initial probabilities, giving rise to a specific line of research labelled ‘objective Bayesianism’. Work in this direction tends to transpose to the framework of Bayesianism the fundamental divergence between logicism and subjectivism, which essentially amounts to the tenet, shared by logicists like Carnap but not by subjectivists, that a degree of belief should be univocally determined by a given body of evidence. It is in that connection that the influence of logicism on contemporary debates looks tangible.
The subjective theory is quite popular within economics and the social sciences, mostly due to the fact that in such areas, personal opinions and expectations enter directly into the information used for forecasting, model building, and making hypotheses. In the realm of the natural sciences, where there is a strong demand for ‘objective’ probability, subjectivism is still regarded with suspicion, and subjective probability is often misconceived as expressing the personal feelings of those who evaluate it under no obligation to take into account the empirical evidence.
To be sure, the debate on the interpretation of probability is much richer than suggested by this paper, especially because in recent years a number of new perspectives have been put forward, resulting in original combinations of elements belonging to one or the other of the traditional trends. Nevertheless, the issue is still open, and nothing suggests that it will be settled soon. As a matter of fact, among researchers operating in various disciplines there is a widespread tendency to adopt a given notion of probability out of habit, in accordance with some long-standing tradition established in certain fields. In other words, probability is often used uncritically, ignoring the assumptions underlying its main interpretations. For that reason, investigating the foundations of probability, its meaning, and the assumptions made by each of its interpretations still looks like a useful exercise.