To Be or to Have Been Lucky, That Is the Question

Is it possible to measure the dispersion of ex ante chances (i.e., chances "before the event") among people, be it in gambling, health, or social opportunities? We explore this question and provide some tools, including a statistical test, to evidence the actual dispersion of ex ante chances in various areas, with a focus on chronic diseases. Using the principle of maximum entropy, we derive the distribution of the risk of becoming ill in the global population as well as in the population of affected people. We find that affected people either are at very low risk, like the overwhelming majority of the population, but were nevertheless unlucky to become ill, or are at extremely high risk and were bound to become ill.


Introduction
"That evening he was lucky": what do we mean by this? It is even weirder when we say: "the luck turned". Does this mean that we could be visited by fortune? Or that some people are luckier than others on certain days? Of course, we cannot rule out the fact that some people may bias the chances of success simply by cheating. Yet, is there any way to assess the dispersion of chances among gamblers (or just the fraction of cheaters)?
This kind of question belongs to the field of probability calculus, which aims at determining the relative likelihoods of events (for a nice historical introduction to probability theory, see [1]). Probability calculus started during the summer of 1654 with the correspondence between Pascal and Fermat, precisely on elementary problems of gambling [2]. Symmetry arguments are at the heart of this calculus: for example, for an unbiased coin, the two results, heads or tails, are a priori equivalent and therefore have the same probability of occurrence of 1/2. This is why it is not anecdotal that Pascal wanted to give his treatise the "astonishing" title "Geometry of Chance". Another illustration of the power of symmetry arguments is the tour de force of Maxwell, who managed to calculate the velocity distribution of particles in idealized gases [3]. At the time when he derived what has since been called the Maxwell-Boltzmann distribution, there was no possibility to measure it. It took almost 60 years before Otto Stern could achieve the first experimental verification of this distribution [4], around the same time when he confirmed with Walther Gerlach the existence of the electron spin [5], for which he won the Nobel Prize in 1944. The agreement between theoretical and experimental distributions was surprisingly good. Since its invention in the middle of the 17th century, probability calculus has accompanied most if not all new fields of science, especially since the beginning of the 20th century with the rise of genetics and quantum physics, up to the most recent developments of quantum cognition [6], not to mention its countless applications in finance and economics.
In probability theory, events are usually associated with measurable random variables. For example, in the heads or tails game, heads may be associated with 1 and tails with 0. Then, for a given number N of draws, one can count the number of times heads comes up. This number k is between 0 and N, and the ratio k/N is the frequency of heads. If the coin is unbiased, this frequency fluctuates around 1/2 when the game (N draws per game) is played many times. Importantly, the frequency is observed ex post, i.e., after the game is played; the mean frequency is used as a measure of the probability of getting heads. This is the usual way of assessing probabilities in the frequentist perspective of statistics. Remember that assessing probabilities for anticipating the outcome of future events is the very purpose of statistics. However, it is not always possible to deduce probabilities from frequency measurements. For example, suppose that each coin is tossed only once. Can we still assess the dispersion of chances among gamblers?
Dispersion of chances is far from being limited to gamblers. Disease risk is another area where people may be, and actually are, unequal for genetic or environmental reasons. In this case, the result of a "draw" is whether or not you have a disease D. The "game" is then limited to one "draw" per person. Of course, the mean probability of becoming ill can still be observed. Yet, can we assess the dispersion of disease risks? And if so, how? As a last emblematic example, we mention social opportunities. Measuring inequality of opportunity is a crucial issue with considerable political stakes, though it is extremely difficult to assess. On this last point, we postpone the in-depth study of the measure of unequal opportunities to future work.
In all these examples, be it gambling, disease, or social opportunity, the ex ante chances are themselves random variables that can neither be deduced from frequency measurements nor induced by symmetry arguments. They are hidden variables. Nevertheless, we argue here that the probability distribution function (pdf) of the ex ante chances can be assessed, and we propose some tools to (i) first test the existence of some dispersion of chances in the population; (ii) then infer the pdf of the ex ante chances; and (iii) explore more specifically the relevance and consequences of those tools in the field of chronic diseases, i.e., diseases that occur at various ages and persist throughout life [7]. Importantly, we do not assume any hypothetical functional form for the pdf of chances and then infer its parameters by Bayesian inference, as is usually done. Here, we first test the inequality of chances in the population, then infer the functional form of the pdf by means of the principle of maximum entropy.

A Simple Draw Is Not Enough
Let us first assume that there is a sample of n people tossing a coin and that each of them has a probability p_i to win (hence, 1 − p_i to lose). In an unbiased game, all the p_i are identical and equal to 1/2. Imagine that some gamblers are luckier, others less fortunate, so that some p_i are greater than 1/2, others less than 1/2. This means that the p_i are random variables drawn from a probability distribution f(p) that is different from δ(p − 1/2), where δ is the Dirac delta function. Let Φ and Σ² be the mean and variance of f(p). Let us assume now that each individual plays N times. The result of each draw j of the individual i is a random variable X_i^j, either 1 in case of success or 0 in case of failure. This is a Bernoulli process: for each i, the random variables X_i^j are i.i.d. (independent, identically distributed, i.e., the probability of success p_i is the same for the N draws of i). Let us define S_i = ∑_{j=1}^N X_i^j as the score over N draws. It is the number of times the individual i has won. Given the risk p_i, S_i is a random variable that follows a binomial distribution B(N, p_i). The mean and the variance of S_i for a given risk p_i are

E[S_i | p_i] = N p_i and Var[S_i | p_i] = N p_i (1 − p_i).

Once every individual has played N times, we obtain an estimation of the distribution of the n random variables S_i as a histogram over the N + 1 values k = 0, 1, 2, . . . , N. These random variables S_i are independent but non-identically distributed, as the p_i differ from one individual to another.
Just as the p_i are drawn from the distribution f(p), the S_i are the realizations of a random variable S (which takes the N + 1 discrete values k = 0, 1, 2, . . . , N). The underlying distribution is no longer only on the random variable S, but on the joint probability of S and p. Thus, the marginal probability distribution function of S is given as follows:

P_N[S = k] = C_N^k E[p^k (1 − p)^(N−k)],  (1)

where E[·] is the expected value of · with the probability distribution of p, f(p); and C_N^k = N!/(k!(N−k)!) is the binomial coefficient "N choose k", i.e., the number of k-combinations of N. The mean of S is

E_N[S] = E[E_N[S|p]] = E[N p] = N Φ,  (2)

where E_N[·] is the expected value of · with the probability distribution of S, P_N(S); and E_N[S|p] = N p is the conditional expected value of S for a given underlying probability p, i.e., for the Bernoulli distribution. The last equality holds since Φ is the mean of the distribution f(p). The variance of S is

Var_N[S] = N Φ(1 − Φ) + N(N − 1) Σ².  (3)

Now, we recall the first two moments of f(p), given its mean Φ and its variance Σ²:

E[p] = Φ and E[p²] = Φ² + Σ².  (4)

Equation (3) gives the variance of the score S as a function of the variance Σ² of f(p). In the following, for the sake of clarity, we will refer to Σ² as the dispersion of chances.
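The moment relations above can be checked numerically for any discrete chance distribution f(p). The following sketch (the function names are ours, for illustration only) computes the exact score distribution of Equation (1) and verifies Equations (2) and (3):

```python
from math import comb

def score_pmf(N, support):
    """P_N[S = k] = C(N,k) * E[p^k (1-p)^(N-k)] for a discrete chance
    distribution f(p) given as a list of (p, weight) pairs."""
    return [sum(w * comb(N, k) * p**k * (1 - p)**(N - k) for p, w in support)
            for k in range(N + 1)]

def mean_var(pmf):
    """Mean and variance of a pmf over k = 0, 1, ..., N."""
    mean = sum(k * q for k, q in enumerate(pmf))
    return mean, sum(k * k * q for k, q in enumerate(pmf)) - mean**2

# Two-point chance distribution: half the gamblers at p = 0.1, half at p = 0.5.
support = [(0.1, 0.5), (0.5, 0.5)]
phi = sum(p * w for p, w in support)                    # mean Phi = 0.3
sigma2 = sum(p * p * w for p, w in support) - phi**2    # dispersion Sigma^2 = 0.04
N = 3
mean, var = mean_var(score_pmf(N, support))
# Mean N*Phi and variance N*Phi*(1-Phi) + N*(N-1)*Sigma^2:
assert abs(mean - N * phi) < 1e-12
assert abs(var - (N * phi * (1 - phi) + N * (N - 1) * sigma2)) < 1e-12
```

The same code reproduces the two extreme cases used below: a population with no dispersion, and one where every chance is 0 or 1.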
Note that in the limit N → ∞, the probability distribution of the random variable S/N converges to the distribution f(p).
We simulated two populations of n gamblers, each drawing N times. Both populations have the same mean chance of gain Φ = 1/2. However, in the first population, the chance distribution is f_0(p) = δ(p − 1/2), where there is no dispersion of chances, i.e., Σ² = 0. In contrast, in the second population, the chance distribution is f_1(p) = [δ(p) + δ(p − 1)]/2, where the dispersion is maximal, i.e., Σ² = 1/4. The histograms are plotted in Figure 1 for a number n of gamblers ranging from 10 to 100 and a number N of draws ranging from 1 to 4. Equation (3) shows that if N = 1, the variance Var_1[S] = Φ(1 − Φ) does not depend on the dispersion of chances Σ². As a matter of fact, when N = 1, the gains are either 0 or 1, so that the histogram of gains has only two bins, one at 0, the other at 1. The mean of gains is Φ and the variance is Φ(1 − Φ). Neither the mean nor the variance depends on the dispersion of chances Σ². Moreover, according to Equation (1), the histogram of gains itself depends only on the mean of the distribution f(p): P_1[S = 0] = 1 − Φ and P_1[S = 1] = Φ.

Figure 1. In each case, the histogram of success is plotted for increasing values of the number N of draws (N = 1, 2, 3, 4) and for two numbers n of gamblers: n = 10 (in blue) and 100 (in orange). For N = 1, note that the histogram for f_0 is similar to the histogram for f_1, and both histograms converge to the same limit as n goes to infinity. On the contrary, for each N ≥ 2, the histograms for f_0 and f_1 diverge as n increases.
The histogram of gains, therefore, cannot provide information on the dispersion of chances, as shown in Figure 1 for the case N = 1, where the histograms for f_0 and f_1 are indistinguishable. This means that a simple draw is not enough to extract the variance of f(p) from the histogram of gains; multiple draws are necessary, but are they sufficient?
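A minimal simulation illustrates this indistinguishability (the function names and the seeding convention are ours, not from the paper). With N = 1, both populations produce 0/1 scores with mean close to 1/2; with N = 2, the maximally dispersed population can only score 0 or 2:

```python
import random

def simulate_scores(n, N, draw_chance, seed=0):
    """Each of n gamblers draws a personal chance p = draw_chance(rng),
    then plays N Bernoulli(p) rounds; returns the list of scores S_i."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        p = draw_chance(rng)
        scores.append(sum(rng.random() < p for _ in range(N)))
    return scores

f0 = lambda rng: 0.5                      # no dispersion: f_0(p) = delta(p - 1/2)
f1 = lambda rng: rng.choice([0.0, 1.0])   # maximal dispersion: Sigma^2 = 1/4

# For N = 1 both populations yield scores in {0, 1}; for N = 2 the
# dispersed population never scores 1:
assert set(simulate_scores(1000, 2, f1)) == {0, 2}
```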

A Statistical Test of the Dispersion of Chances
We then note in Figure 1 that the histogram of gains for two draws (N = 2) has three bins, one at 0, the second at 1 and the third at 2, with the following values:

P_2[S = 0] = 1 − 2Φ + Φ² + Σ² and P_2[S = 1] = 2(Φ − Φ² − Σ²),  (5)

P_2[S = 2] = Φ² + Σ².  (6)

Hence, the histogram of gains now depends on (and only on) both the mean and the variance of f(p). Note that Equation (5) shows that P_2[S = 1] is maximal when Φ = 1/2. For three or more draws, we would also have access to higher-order moments of f(p). Nevertheless, the minimum condition for the presence of a dispersion of chances is that the variance of f(p) is non-zero. We therefore propose to design a statistical test able to discriminate between the two following hypotheses:

Null hypothesis H_0: everybody has the same probability Φ of gain. This means that f(p) = δ(p − Φ), i.e., Σ² = 0.

Alternative hypothesis H_1: f has the same mean Φ, but there is some dispersion of chances among the population, so that some people are luckier than others; hence, f has a non-zero dispersion Σ².
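The three-bin histogram above follows directly from the second moment E[p²] = Φ² + Σ². A short sketch (names ours) makes this dependence explicit:

```python
def p2_pmf(phi, sigma2):
    """Three-bin histogram for N = 2 draws: it depends only on the mean
    phi and dispersion sigma2 of the chance distribution f(p)."""
    m2 = phi**2 + sigma2            # second moment E[p^2]
    return [1 - 2 * phi + m2,       # P_2[S = 0]
            2 * (phi - m2),         # P_2[S = 1]
            m2]                     # P_2[S = 2]

assert p2_pmf(0.5, 0.0) == [0.25, 0.5, 0.25]    # f_0: fair coin, no dispersion
assert p2_pmf(0.5, 0.25) == [0.5, 0.0, 0.5]     # f_1: maximal dispersion
```

The middle bin shrinks as Σ² grows, which is what the test below exploits.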
According to H_0, the mean of the score over N draws is NΦ and its variance is

Var_N[S] = NΦ(1 − Φ).

Figure 2 shows how the variance of S varies as a function of the number of draws N for the two typical distributions of mean 1/2: f_0 with zero dispersion and f_1 with maximal dispersion 1/4. The distribution f_0 (resp. f_1) illustrates the case of a variance growing linearly (resp. quadratically) with N.
A relevant statistical test is needed to discriminate between the two hypotheses H_0 and H_1, or at least to reject the null hypothesis H_0. Moreover, in the remainder of this paper, we are particularly interested in the case N = 2. It is then necessary to reformulate our hypotheses, because it becomes difficult to discriminate the quadratic behavior from the linear behavior with only three points. Therefore, we rephrase our hypothesis test, based on the fact that the number of draws is limited to N = 2.

Null hypothesis H_0: the variance of S reads Var_2[S] = 2Φ(1 − Φ).

To estimate the variance of S from a sample of n individuals, the unbiased variance estimator is used:

V_n = (1/(n − 1)) ∑_{i=1}^n (S_i − S̄)², where S̄ = (1/n) ∑_{i=1}^n S_i.

The estimation of the variance of S, V_n, from a sample of finite size n is subject to statistical fluctuations. Thus, our hypotheses become:

Null hypothesis H_0: V_n − 2Φ(1 − Φ) is compatible with 0 considering the error bars, i.e., the standard deviation of V_n.
Alternative hypothesis H_1: V_n − 2Φ(1 − Φ) is significantly greater than 0 considering the standard deviation of V_n. Under a dispersion Σ², the variance of V_n reads (see Appendix A)

Var_Σ²[V_n] = { n [2Ψ(1 − 2Ψ) + 14(2Φ − 1)² Σ² − 4Σ⁴] + 8(Ψ + Σ²)² } / (n − 1)²,  (7)

where Ψ = Φ(1 − Φ). Its asymptotic expression for n ≫ 1 reads

Var_Σ²[V_n] ≈ [2Ψ(1 − 2Ψ) + 14(2Φ − 1)² Σ² − 4Σ⁴] / n.  (8)

Figure 3 compares the expression of the variance Var_Σ²[V_n] (black dashed line) obtained in Equation (7) and its asymptotic expression (grey dashed line) in Equation (8) with simulations (blue dots) and shows good agreement. It can be noted that the distribution of V_n tends towards a normal distribution as n grows. Now, we wish to estimate the probability of having obtained a value as high as V_n under the null hypothesis H_0, i.e., the p-value. Since V_n follows a normal distribution, the p-value can be expressed as follows:

p-value = (1/2) [1 − erf(z/√2)] = (1/2) erfc(z/√2), with z = (V_n − E_0[V_n]) / √(Var_0[V_n]),  (9)

where erf and erfc are, respectively, the error function and the complementary error function. By posing E_0[V_n] and Var_0[V_n] as the mean and the variance of V_n under the null hypothesis H_0, i.e., Σ² = 0, we have

E_0[V_n] = 2Ψ and Var_0[V_n] = [2nΨ(1 − 2Ψ) + 8Ψ²] / (n − 1)².

Within the limit of large sample sizes n ≫ 1, one can write

z ≈ (V_n − 2Ψ) √( n / [2Ψ(1 − 2Ψ)] ),

using, again, Ψ = Φ(1 − Φ). In the context of Figure 2 restricted to the case N = 2 and n = 100, the estimated variance V_n for the distribution f_1 (of mean 1/2 and dispersion 1/4) leads to z = 20, i.e., a p-value of 10⁻⁸⁷. This allows us to reject the null hypothesis in this case.
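The whole test fits in a few lines. The sketch below (names ours) uses the large-n expressions for E_0[V_n] and Var_0[V_n], with Φ estimated from the sample as S̄/2, and assumes 0 < Φ̂ < 1:

```python
from math import sqrt, erfc

def dispersion_z_test(scores):
    """One-sided z-test of H0 (no dispersion of chances) for N = 2 draws
    per individual. Returns (V_n, z, p_value), using the asymptotic
    E_0[V_n] = 2*Psi and Var_0[V_n] ~ 2*Psi*(1 - 2*Psi)/n with
    Psi = Phi*(1 - Phi)."""
    n = len(scores)
    mean = sum(scores) / n
    vn = sum((s - mean) ** 2 for s in scores) / (n - 1)  # unbiased estimator
    phi = mean / 2                                       # Phi estimated from the sample
    psi = phi * (1 - phi)
    z = (vn - 2 * psi) / sqrt(2 * psi * (1 - 2 * psi) / n)
    return vn, z, erfc(z / sqrt(2)) / 2                  # statistic and p-value

# Maximally dispersed population: half the scores 0, half 2.
vn, z, p = dispersion_z_test([0] * 200 + [2] * 200)
assert z > 10 and p < 1e-20
```

With a binomial sample of the same mean, z stays of order 1 and the p-value is large, so H_0 is not rejected.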

Dispersion of Disease Risks for Twins
Inequality in disease risk is a major public health issue [8,9]. Of course, part of this inequality is known to depend on genetic and environmental factors. At the turn of the 2000s, a new approach called genome-wide association studies (GWAS) was designed to characterize the genetic predisposition to a chronic disease [10]. GWAS are designed to identify, in particular, the genes involved in a given disease and, among these genes, the variants most at risk, i.e., the DNA sequences of a given gene that are more represented in the people affected by the disease. Such variants characterize the genetic predisposition to the disease. The mean frequency with which an individual becomes ill in a given population, specified by genetic and environmental factors, can then be measured. As usual, this frequency can be used as a measure of the probability of becoming ill. However, can we assess the dispersion of disease risk, if it exists at all, in this specific population? More generally, is there any way to assess the dispersion of risk in a more objective manner, without any a priori assumption on presumed risk factors? Here, providential help comes from the existence of twins. Identical twins, also called monozygotic (MZ) twins, have the same genome, shared the same fetal environment and, generally, share the same living conditions. Therefore, they are most likely to also share the same probability of becoming ill, whatever the disease. Identical twins are, therefore, like a player betting twice. This is closely related to the gambling question addressed above for N = 2 (two draws). Indeed, as both twins have the same probability p of having disease D, the status, healthy or ill, of each of the two twins is equivalent, respectively, to the outcome, loss or gain, of each of the two draws by one and the same gambler. In this situation, the probability p is called a risk. Let f(p) be the probability distribution function of the risk of having disease D in the population.
We define the random variable S as above, i.e., S = 0 if both twins are healthy, S = 1 if only one of the two twins is ill and S = 2 if both twins are ill. The mean and variance of S are given by Equations (2) and (3), respectively; hence, for N = 2,

E_2[S] = 2Φ and Var_2[S] = 2Φ(1 − Φ) + 2Σ².

Then, if V_n is significantly greater than S̄(1 − S̄/2), which amounts to carrying out the hypothesis test presented in the above section (with Φ estimated by S̄/2), we can conclude that there is some dispersion of the disease risk. As we will see below, the dispersion is in fact unusually large. However, before that, let us calculate the twin concordance rate of the disease D. In genetics, the twin concordance rate is the probability τ that a twin is affected given that his/her co-twin is affected:

τ = P_2[S = 2] / P_1[S = 1] = P_2[S = 2] / Φ.

Note that τ is equal to the probandwise concordance rate, which is known to best assess the twin concordance rate [11].
Using Equations (4) and (6), we can also reformulate the concordance rate of twins in terms of the moments of the distribution f(p):

τ = (Φ² + Σ²) / Φ.

Note that we can generalize the concordance rate to an N-tuple, i.e., the probability that a twin is affected given that his/her N − 1 co-twins are affected:

τ_N = P_N[S = N] / P_{N−1}[S = N − 1].  (10)

Using Equations (6) and (10), we obtain

τ_N = E[p^N] / E[p^(N−1)].

The twin concordance rate can also be computed using the probability density function f_a(p) restricted to the population of affected people. Let f(X, p) be the joint probability for an individual to have a risk p ∈ [0, 1] and to be in the state X ∈ {0, 1}. According to Bayes' theorem, also known as the theorem of the probability of causes since it was independently rediscovered by Laplace [12], we write

f(p|X = 1) = f(X = 1|p) f(p) / P[X = 1].

Then, f(p|X = 1) is the distribution of the risk p in the population of affected people. Now, by definition, we have f(X = 1|p) = p and, by noting that P[X = 1] = P_1[S = 1] = Φ, we obtain the following expression of the risk distribution function among affected people:

f_a(p) = p f(p) / Φ.  (11)

Note that f_a(p) is the so-called "size-biased law" of the risk p of becoming ill. Size-biased laws are found in many contexts, notably rare events [13], Poisson point processes [14] or familial risk of disease [15].
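The size-biasing f_a(p) = p f(p)/Φ is easy to demonstrate on a discrete toy risk distribution (the function name and the numbers are ours, chosen for illustration): even a tiny high-risk minority dominates the affected population.

```python
def size_biased(support):
    """Size-biased law f_a(p) = p * f(p) / Phi for a discrete risk
    distribution given as (p, weight) pairs; returns the weights of
    each risk level among affected people."""
    phi = sum(p * w for p, w in support)
    return [(p, p * w / phi) for p, w in support]

# Toy U-type risk profile: 99% of people at risk 0.1%, 1% at risk 50%.
f = [(0.001, 0.99), (0.5, 0.01)]
fa = size_biased(f)
# Among the affected, the rare high-risk group dominates:
assert fa[1][1] > 0.8
```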
The mean risk in the affected population is then

E_a[p] = ∫₀¹ p f_a(p) dp,

where E_a[·] is the expected value of · among affected people, with the probability distribution f_a(p). Using Equation (11), we obtain

E_a[p] = E[p²] / Φ = (Φ² + Σ²) / Φ = τ,

which proves that the mean risk in the affected population is equal to the twin concordance rate.
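This relation can be inverted: given the prevalence Φ and the concordance rate τ, the dispersion follows immediately as Σ² = Φτ − Φ². A one-line sketch (the function name is ours), checked against the Crohn's disease figures used later (Φ = 0.25%, τ = 38.65%):

```python
from math import sqrt

def dispersion_from_twins(phi, tau):
    """Since tau = E_a[p] = (Phi^2 + Sigma^2) / Phi, the dispersion is
    Sigma^2 = Phi * tau - Phi^2."""
    return phi * tau - phi ** 2

sigma2 = dispersion_from_twins(0.0025, 0.3865)
assert abs(sqrt(sigma2) - 0.031) < 1e-3   # Sigma ~ 0.031 >> Phi
```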
We proceed now to evaluate the functional form of the distribution f(p). Using the prevalence and the twin concordance rate of the disease D, we have access to, and only to, the mean Φ and dispersion Σ² of f(p). The principle of maximum entropy then provides us with the least arbitrary distribution [16,17]. Dowson and Wragg proved [18] that in the class P of absolutely continuous probability distributions on [0, 1] with given first and second moments (i.e., given mean and variance), there exists a distribution in P which maximizes the entropy

H[f] = − ∫₀¹ f(p) ln f(p) dp,  (13)

and the corresponding density function f(p) on [0, 1] is a truncated normal distribution f(p; m, s, 0, 1), which may be either bell-shaped or U-type. Dowson and Wragg show that when Φ ≪ 1 and Σ > Φ, which is usual for most if not all chronic diseases (unpublished results), the distribution f(p; m, s, 0, 1) is U-type (see Appendix B). This distribution, which will be simply denoted f(p; m, s) in the following, can then be written

f(p; m, s) = exp[(p − m)²/(2s²)] / { √(π/2) s [erfi(m/(√2 s)) + erfi((1 − m)/(√2 s))] }.

The imaginary error function erfi(x) can be expressed using the Dawson function D(x):

erfi(x) = (2/√π) e^(x²) D(x).

Therefore, f(p; m, s) can finally be written

f(p; m, s) = exp[(p − m)²/(2s²)] / { √2 s [e^(m²/(2s²)) D(m/(√2 s)) + e^((1−m)²/(2s²)) D((1 − m)/(√2 s))] }.

It is straightforward to express Φ and Σ² in terms of the parameters m and s:

Φ = m + s² [f(1; m, s) − f(0; m, s)],  (14)

Σ² = (m − Φ)Φ + s² [f(1; m, s) − 1].  (15)

Inverting this system of equations to obtain the risk distribution function of the disease D in terms of Φ and Σ² is a bit trickier and requires a numerical solver. In the next section, we show the outcome of this general formalism for one specific chronic disease, namely Crohn's disease.
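The moments of the U-type density can also be obtained by direct numerical integration, which replaces the closed-form erfi/Dawson normalization and provides a check of the solver. The sketch below (names ours; midpoint rule, with the exponent shifted to avoid overflow) verifies the uniform limit s → ∞, for which Φ → 1/2 and Σ² → 1/12:

```python
from math import exp

def u_density_moments(m, s, steps=200_000):
    """Mean Phi and dispersion Sigma^2 of the U-type maximum-entropy
    density f(p; m, s) proportional to exp((p - m)^2 / (2 s^2)) on [0, 1],
    computed by midpoint-rule integration."""
    h = 1.0 / steps
    shift = max(m, 1 - m) ** 2 / (2 * s * s)   # largest exponent (at p = 0 or 1)
    z = e1 = e2 = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        w = exp((p - m) ** 2 / (2 * s * s) - shift)
        z += w
        e1 += p * w
        e2 += p * p * w
    phi = e1 / z
    return phi, e2 / z - phi * phi

# Large s: the density flattens to uniform on [0, 1].
phi, sigma2 = u_density_moments(0.5, 100.0)
assert abs(phi - 0.5) < 1e-5 and abs(sigma2 - 1 / 12) < 1e-5
```

A root finder applied to this forward map (m, s) → (Φ, Σ²) then inverts Equations (14) and (15).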

Application to Crohn's Disease (CD)
Crohn's disease (CD) is one of the most well-documented chronic diseases, particularly in the field of genetics [19]. Its prevalence Φ and twin concordance rate τ are [20]:

Φ = 0.25% and τ = 38.65%.

Then, the twin relative risk is

τ/Φ ≈ 155.

The dispersion of the risk of being affected is, therefore, huge for CD: Σ² = Φτ − Φ² ≈ 9.6 · 10⁻⁴, i.e., Σ ≈ 0.031 ≫ Φ. It is now necessary to calculate the p-value according to Equation (9) in order to be able to reject (or not) our null hypothesis H_0. To do this, we first need to estimate the number of twin pairs n, which remains unknown in the Swedish study [20]. Nevertheless, the number of twin pairs with at least one affected twin is known and equal to n_1 + n_2 = 31.5, where n_1 = 24 and n_2 = 7.5 are the numbers of discordant and concordant twin pairs, respectively [20]. We can reconstruct the sample size n that would have been needed to obtain n_1 and n_2, with probabilities P_2[S = 1] and P_2[S = 2]:

P_2[S = 1] + P_2[S = 2] = (n_1 + n_2)/n.

By using Equations (5) and (6), we obtain the following sample size:

n = (n_1 + n_2) / (2Φ − Φ² − Σ²) ≈ 7800.

Equation (9) is used by calculating z within the limit of large sample sizes n ≫ 1. This results in z ≈ 2.4, which allows us to reject the null hypothesis H_0 with the p-value ≈ 8 · 10⁻³.
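These back-of-the-envelope numbers can be reproduced in a few lines. The sketch below (variable names ours) uses the large-n expression for z and takes V_n at its expected value under the observed dispersion, i.e., V_n − 2Ψ = 2Σ²:

```python
from math import sqrt, erfc

phi, tau = 0.0025, 0.3865            # CD prevalence and concordance rate [20]
sigma2 = phi * tau - phi ** 2        # Sigma^2 = Phi*tau - Phi^2
# Twin pairs with at least one affected twin: n1 + n2 = 31.5, so the
# implied sample size solves n * (P_2[S=1] + P_2[S=2]) = 31.5.
n = 31.5 / (2 * phi - (phi ** 2 + sigma2))
psi = phi * (1 - phi)
z = 2 * sigma2 / sqrt(2 * psi * (1 - 2 * psi) / n)   # V_n - 2*Psi = 2*Sigma^2
p_value = erfc(z / sqrt(2)) / 2

assert 7000 < n < 8500       # n ~ 7800
assert abs(z - 2.4) < 0.1    # z ~ 2.4
```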
It is then legitimate to calculate the parameters m and s of the truncated normal distribution f(p; m, s, 0, 1), which maximizes the entropy H[f] given the mean Φ and the dispersion Σ². Solving the system of Equations (14) and (15) for Φ = 0.0025 and Σ = 0.031 gives m ≈ 0.505 and s ≈ 0.0278. Both probability distribution functions f(p; m, s) and f_a(p; m, s) = p f(p; m, s)/Φ for CD are plotted in Figure 4a and shown enlarged in Figure 4b. Quite remarkably, the probability density function f_a(p; m, s) in the population of affected people has two narrow peaks, one close to p = 0 and the other one close to p = 1. This means that there are two quite separate categories of people who become ill: in the left peak (close to p = 0), people are at very low risk but still have been unlucky to become ill, whereas in the right peak (close to p = 1), people are at extremely high risk, hence are unlucky a priori, and indeed were bound to become ill. Not having any luck (to become ill because of high risk) or to have been unlucky (to become ill despite low risk), that is the question! Finally, we note that concordant twins are very likely to be in the right peak, whereas discordant twins are in the left one. Indeed, when two MZ twins have their common risk p in the left peak, their probability of being concordant is extremely low, of the order of the mean of p² restricted to the left peak of f_a(p), which is of the order of 10⁻⁵. On the contrary, when two MZ twins have their common risk p in the right peak, their probability of being concordant is extremely high, of the order of 0.997. Interestingly enough, the fraction of people in the right peak (area under the curve) is 38.52%, quite similar to the (probandwise) twin concordance rate of 38.65% [20]. This strongly suggests that concordant twins for a given disease both have a strong predisposition for this disease, whereas discordant twins both have no particular predisposition.

Conclusions
Assessing the inequality of chances in a given population is a critical problem with several ramifications, notably in health and social opportunity. Starting with the simple heads or tails game, we have shown that, although hidden variables such as the ex ante chances of gamblers (possibly cheating) cannot be assessed individually, their distribution can actually be assessed whenever multiple draws are available. For this purpose, we have proposed a hypothesis test to evidence the inequality of chances in a given population, and then inferred the functional form of the probability distribution function of the ex ante chances by means of the principle of maximum entropy, which gives the least arbitrary distribution given the mean and variance of the probability distribution function.
We applied this methodology to chronic diseases and found that the distribution of the risk of becoming ill is usually a U-type truncated normal distribution. We have computed the parameters of this U-type distribution in the case of Crohn's disease using the prevalence and the twin concordance rate of this pathology. Moreover, we have found that the risk distribution function among affected people is bimodal with two narrow peaks, one corresponding to people with no identifiable risk factor and the other one to people genetically or environmentally destined to become ill. An interesting consequence is that concordant twins for a given disease both have a strong predisposition for that disease, while discordant twins both have no particular predisposition.
One should still not over-interpret the results, as they rely only on estimates of the prevalence and the twin concordance rate of the disease. The inferred distribution can be thought of as the best possible interpretation, in terms of a distribution, of the available information. Nevertheless, maximizing the entropy of the risk distribution function leads to significantly different conclusions than more arbitrary distributions such as, for example, beta distributions [21].
Twins provide a unique means to play twice at the lottery of diseases. Of course, twins are all the more relevant to assess ex ante chances as they share the same environmental factors. In the same vein, "social twins", or more generally "social clones", would be of great help in assessing inequality of opportunity. However, controlling the environment of such social clones would be rather challenging, as the issue of choice comes into play: people with the same opportunities may make choices that change their lives. Assessing the inequality of opportunities is, therefore, one of the most delicate, almost completely open, issues.
Pascal could never complete his treatise "Geometry of Chance". This never-ending treatise is still being written, as evidenced in this Special Issue.

Acknowledgments: This work has benefited from fruitful discussions with Anne Dumay, Jean-Pierre Hugot and Alberto Romagnoni. We thank Bastien Mallein and Laurent Tournier for their careful reading of the manuscript and helpful comments.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Computing the Variance of V n
To estimate the variance of S from a sample of n individuals, the unbiased variance estimator is used:

V_n = (1/(n − 1)) ∑_{i=1}^n (S_i − S̄)²,

where S̄ is the mean estimator

S̄ = (1/n) ∑_{i=1}^n S_i.

We first recall the following properties of S̄:

E[S̄] = E[S] = m and Var[S̄] = Var[S]/n.

To simplify the calculation, we consider sufficiently large samples (typically n > 30) so that the distribution of S̄ tends towards the normal distribution N(E[S̄], Var[S̄]) of mean E[S̄] = m and variance Var[S̄] = Var[S]/n, according to the central limit theorem. The variable S̄ is then treated as independent of the S_i, which has, as an immediate effect, a null covariance: Cov[∑_{i=1}^n (S_i − m)², (S̄ − m)²] = 0. Writing (n − 1)V_n = ∑_{i=1}^n (S_i − m)² − n(S̄ − m)², we obtain

Var[V_n] = (1/(n − 1)²) { n Var[(S − m)²] + n² Var[(S̄ − m)²] }.

Then, Var[(S − m)²] and Var[(S̄ − m)²] remain to be determined. Let us start with the latter, which is simpler. Since S̄ − m is normal with zero mean and variance Var[S]/n,

Var[(S̄ − m)²] = 2 (Var[S]/n)².

Now all that remains is to determine Var[(S − m)²]. This term requires expressing the moments of S as a function of the moments (up to the 4th order) of the distribution f. First, let us make the variance explicit: for N = 2,

Var[S] = 2Φ(1 − Φ) + 2Σ².
Then, we calculate the ℓ-th moments of S (for ℓ = 2, 3, 4). For N = 2, since S takes only the values 0, 1, 2,

E[S^ℓ] = 2Φ + (2^ℓ − 2) E[p²], for ℓ = 2, 3, 4.

In general, we would need to know the higher-order moments of the distribution f to go further. However, we are only interested here in the case N = 2, where some welcome simplifications arise: the higher-order moments of the distribution f do not contribute to the moments of S. We obtain

Var[(S̄ − m)²] = (8/n²) [Φ(1 − Φ) + Σ²]²,

Var[(S − m)²] = 2Φ(1 − Φ)(1 − 2Φ + 2Φ²) + 14(2Φ − 1)² Σ² − 4Σ⁴.

Then, we obtain the following expression:

Var[V_n] = { n [2Φ(1 − Φ)(1 − 2Φ + 2Φ²) + 14(2Φ − 1)² Σ² − 4Σ⁴] + 8[Φ(1 − Φ) + Σ²]² } / (n − 1)².

Finally, we can simplify further by posing Ψ = Φ(1 − Φ), noting that 1 − 2Φ + 2Φ² = 1 − 2Ψ and (2Φ − 1)² = 1 − 4Ψ:

Var[V_n] = { n [2Ψ(1 − 2Ψ) + 14(1 − 4Ψ) Σ² − 4Σ⁴] + 8(Ψ + Σ²)² } / (n − 1)².
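As a check, the Σ² = 0 limit of this expression, Var_0[V_n] ≈ 2Ψ(1 − 2Ψ)/n for large n, can be compared with a Monte Carlo estimate under H_0 (everyone has the same chance; the function name and parameters are ours):

```python
import random

def var_vn_mc(phi, n, trials=2000, seed=1):
    """Monte Carlo estimate of Var_0[V_n] under H0: each of n individuals
    plays N = 2 Bernoulli(phi) rounds, V_n is the unbiased sample variance
    of the scores, and the variance of V_n is estimated over many trials."""
    rng = random.Random(seed)
    vals = []
    for _ in range(trials):
        scores = [(rng.random() < phi) + (rng.random() < phi) for _ in range(n)]
        mean = sum(scores) / n
        vals.append(sum((s - mean) ** 2 for s in scores) / (n - 1))
    mu = sum(vals) / trials
    return sum((v - mu) ** 2 for v in vals) / (trials - 1)

phi, n = 0.3, 100
psi = phi * (1 - phi)
theory = 2 * psi * (1 - 2 * psi) / n    # asymptotic Var_0[V_n]
assert 0.8 < var_vn_mc(phi, n) / theory < 1.25
```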