<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Information</journal-id>
<journal-title>Information</journal-title>
<issn pub-type="epub">2078-2489</issn>
<publisher>
<publisher-name>Molecular Diversity Preservation International (MDPI)</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3390/info2020277</article-id>
<article-id pub-id-type="publisher-id">information-02-00277</article-id>
<article-categories>
<subj-group>
<subject>Article</subject></subj-group></article-categories>
<title-group>
<article-title>Spencer-Brown <italic>vs.</italic> Probability and Statistics: Entropy's Testimony on Subjective and Objective Randomness</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Stern</surname><given-names>Julio Michael</given-names></name></contrib>
<aff id="af1-information-02-00277">Department of Applied Mathematics, Institute of Mathematics and Statistics, University of Sao Paulo, Rua do Matao 1010, Cidade Universitaria, 05508-090, Sao Paulo, Brazil; E-Mail: <email>jmstern@hotmail.com</email>; Fax: +55-11-3819-3922</aff></contrib-group>
<pub-date pub-type="collection">
<year>2011</year></pub-date>
<pub-date pub-type="epub">
<day>04</day>
<month>04</month>
<year>2011</year></pub-date>
<volume>2</volume>
<issue>2</issue>
<fpage>277</fpage>
<lpage>301</lpage>
<history>
<date date-type="received">
<day>08</day>
<month>02</month>
<year>2011</year></date>
<date date-type="rev-recd">
<day>22</day>
<month>03</month>
<year>2011</year></date>
<date date-type="accepted">
<day>23</day>
<month>03</month>
<year>2011</year></date></history>
<permissions>
<copyright-statement>© 2011 by the author; licensee MDPI, Basel, Switzerland.</copyright-statement>
<copyright-year>2011</copyright-year>
<license>
<p>This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p></license></permissions>
<abstract>
<p>This article analyzes the role of entropy in Bayesian statistics, focusing on its use as a tool for detection, recognition and validation of eigen-solutions. “Objects as eigen-solutions” is a key metaphor of the cognitive constructivism epistemological framework developed by the philosopher Heinz von Foerster. Special attention is given to some objections to the concepts of probability, statistics and randomization posed by George Spencer-Brown, a figure of great influence in the field of radical constructivism.</p></abstract>
<kwd-group>
<kwd>Bayesian statistics</kwd>
<kwd>cognitive constructivism</kwd>
<kwd>eigen-solutions</kwd>
<kwd>maximum entropy</kwd>
<kwd>objective-subjective complementarity</kwd>
<kwd>randomization</kwd>
<kwd>subjective randomness</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>In several already published articles, I defend the use of Bayesian Statistics in the epistemological framework of cognitive constructivism. In particular, I show how the FBST—The Full Bayesian Significance Test for precise hypotheses—can be used as a tool for detection, recognition and validation of eigen-solutions, see [<xref ref-type="bibr" rid="b1-information-02-00277">1</xref>–<xref ref-type="bibr" rid="b12-information-02-00277">12</xref>]. “Objects as eigen-solutions” is a key metaphor of cognitive constructivism as developed by the Austrian-American philosopher Heinz von Foerster, see [<xref ref-type="bibr" rid="b13-information-02-00277">13</xref>]. For some recent applications in empirical science, see [<xref ref-type="bibr" rid="b14-information-02-00277">14</xref>–<xref ref-type="bibr" rid="b20-information-02-00277">20</xref>].</p>
<p>In Statistics, especially in the design of statistical experiments, randomization plays a role at the very core of objective-subjective complementarity, a concept of great significance in the epistemological framework of cognitive constructivism as well as in the theory of Bayesian statistics. The pivotal role of randomization in a well-designed statistical experiment is that of a decoupling operation used to sever illegitimate functional links, thus avoiding spurious associations, breaking false influences, separating confounding variables, <italic>etc.</italic>, see [<xref ref-type="bibr" rid="b10-information-02-00277">10</xref>] and [<xref ref-type="bibr" rid="b21-information-02-00277">21</xref>].</p>
<p>The use of randomization in Statistics is an original idea of Charles Sanders Peirce and Joseph Jastrow, see [<xref ref-type="bibr" rid="b22-information-02-00277">22</xref>,<xref ref-type="bibr" rid="b23-information-02-00277">23</xref>]. Randomization is now a standard requirement for many scientific studies. In [<xref ref-type="bibr" rid="b8-information-02-00277">8</xref>] and [<xref ref-type="bibr" rid="b10-information-02-00277">10</xref>] I consider the position of C. S. Peirce as a forerunner of cognitive constructivism, based on the importance, relevance and coherence of his philosophical and scientific work. Among his several contributions, the introduction of randomization in statistical design indubitably stands out. In future articles, I hope to further expand the analysis of the role of Bayesian statistics in cognitive constructivism and provide other interesting applications.</p>
<p>I shall herein analyze some objections to the concepts of probability, statistics and randomization posed by George Spencer-Brown, a figure of great influence in the field of radical constructivism. Abstinence from statistical analysis and related quantitative methods may, at first glance, look like an idyllic fantasy island where many beautiful dreams come true. However, in my personal opinion, this position threatens to exile the cognitive constructivism epistemological framework to a limbo of powerless theories. In this article, entropy is presented as a cornerstone concept for the precise analysis and a key idea for the correct understanding of several important topics in probability and statistics. This understanding should help to clear the way for establishing Bayesian statistics as a preferred tool for scientific inference in mainstream cognitive constructivism.</p>
<p>In what follows, Section 2 corresponds to the first part of this article's title and elaborates upon “the case of Spencer-Brown <italic>vs.</italic> probability and statistics”. Corresponding to the second part of the title, Section 3 provides “the testimony of entropy on subjective randomness”. Section 4 gives “the testimony of entropy on objective randomness”, presenting several mathematical definitions, theorems and algorithms. In this article, entropy-based informational analysis is the key used to “solve” all the probability paradoxes and objections to statistical science posed by Spencer-Brown. Section 4 is completely self-contained. Hence, a reader preferring to be exposed first to intuitions and motivations can read the sections of this article in the order they are presented; meanwhile, a reader seeking a more axiomatic approach can start with Section 4. Section 5 presents our final conclusions.</p></sec>
<sec>
<label>2.</label>
<title>Spencer-Brown, Probability and Statistics</title>
<p>In [<xref ref-type="bibr" rid="b24-information-02-00277">24</xref>–<xref ref-type="bibr" rid="b26-information-02-00277">26</xref>], Spencer-Brown analyzed some apparent paradoxes involving the concept of randomness, and concluded that the language of probability and statistics was inappropriate for the practice of scientific inference. In subsequent work, [<xref ref-type="bibr" rid="b27-information-02-00277">27</xref>], he reformulates classical logic using only a generalized <italic>nor</italic> operator (marked <italic>not-or</italic>, unmarked <italic>or</italic>), that he represents, in the manner of Charles Sanders Peirce or John Venn, by a graphical boundary or distinction mark, see [<xref ref-type="bibr" rid="b28-information-02-00277">28</xref>–<xref ref-type="bibr" rid="b34-information-02-00277">34</xref>].</p>
<p>Making (or arbitrating) distinctions is, according to Spencer-Brown, the basic (if not the only) operation of human knowledge, an idea that has either influenced or been directly explored by several authors in the radical constructivist movement. The following quotations, from [<xref ref-type="bibr" rid="b26-information-02-00277">26</xref>] p. 23, p. 66 and p. 105, are typical arguments used by Spencer-Brown in his rejection of probability and statistics:
<disp-quote>
<p>Retroactive reclassification of observations is one of the scientist's most important tools, and we shall meet it again when we consider statistical arguments. (p. 23)</p>
<p>We have found so far that the concept of probability used in statistical science is meaningless in its own terms; but we have found also that, however meaningful it might have been, its meaningfulness would nevertheless have remained fruitless because of the impossibility of gaining information from experimental results, however significant. This final paradox, in some ways the most beautiful, I shall call the Experimental Paradox (p. 66).</p>
<p>The essence of randomness has been taken to be absence of pattern. But what has not hitherto been faced is that the absence of one pattern logically demands the presence of another. It is a mathematical contradiction to say that a series has no pattern; the most we can say is that it has no pattern that anyone is likely to look for. The concept of randomness bears meaning only in relation to the observer: If two observers habitually look for different kinds of pattern they are bound to disagree upon the series which they call random (p. 105).</p></disp-quote></p>
<p>Several authors concur, at least in part, with my opinion about Spencer-Brown's technical analysis of probability and statistics, see [<xref ref-type="bibr" rid="b35-information-02-00277">35</xref>–<xref ref-type="bibr" rid="b39-information-02-00277">39</xref>]. In Section 3, I carefully explain why I disagree with it. In some of my arguments, which are based on information theory and the notion of entropy, I dissent from Spencer-Brown's interpretation of measures of order-disorder in sequential signals. In [<xref ref-type="bibr" rid="b40-information-02-00277">40</xref>–<xref ref-type="bibr" rid="b44-information-02-00277">44</xref>], some of the basic concepts in this area are reviewed with a minimum of mathematics. For more advanced developments see [<xref ref-type="bibr" rid="b45-information-02-00277">45</xref>–<xref ref-type="bibr" rid="b47-information-02-00277">47</xref>].</p>
<p>I also disapprove of some of Spencer-Brown's proposed methodologies to detect “relevant” event sequences, that is, his criteria to “mark distinct patterns” in empirical observations. My objections have a lot in common with the standard caveats against <italic>ex post facto</italic> “fishing expeditions” for interesting outcomes, or simple <italic>post hoc</italic> “sub-group analysis” in experimental data banks. This kind of retroactive or retrospective data analysis is considered a questionable statistical practice, and is pointed to as the culprit of many misconceived studies, misleading arguments and mistaken conclusions. The literature on statistical methodology for clinical trials has been particularly keen on warning against this kind of practice. See [<xref ref-type="bibr" rid="b48-information-02-00277">48</xref>,<xref ref-type="bibr" rid="b49-information-02-00277">49</xref>] for two interesting papers addressing this specific issue and published in high-impact medical journals less than a year before I wrote this text. When consulting for pharmaceutical companies or advising in the design of statistical experiments, I often find it useful to quote Conan Doyle's Sherlock Holmes, in The Adventure of Wisteria Lodge:
<disp-quote>
<p>Still, it is an error to argue in front of your data. You find yourself insensibly twisting them around to fit your theories.</p></disp-quote></p>
<p>Finally, I am also skeptical about the intention behind some applications of Spencer-Brown's research program, including the use of extrasensory empathic perception for coded message communication, exercises on object manipulation using paranormal powers, <italic>etc</italic>. Unable to reconcile his psychic research program with statistical science, Spencer-Brown had no regrets in disqualifying the latter, as he clearly stated in the prestigious scientific journal <italic>Nature</italic>, see pp. 594–595 of [<xref ref-type="bibr" rid="b25-information-02-00277">25</xref>]:
<disp-quote>
<p>[On telepathy:] Taking the psychical research data (that is, the residuum when fraud and incompetence are excluded), I tried to show that these now threw more doubt upon existing pre-suppositions in the theory of probability than in the theory of communication.</p>
<p>[On psychokinesis:] If such an ‘agency’ could thus ‘upset’ a process of randomizing, then all our conclusions drawn through the statistical tests of significance would be equally affected, including the conclusions about the ‘psychokinesis’ experiments themselves. (How are the target numbers for the die throws to be randomly chosen? By more die throws?) To speak of an ‘agency’ which can ‘upset’ any process of randomization in an uncontrollable manner is logically equivalent to speaking of an inadequacy in the theoretical model for empirical randomness, like the luminiferous ether of an earlier controversy, becomes, with the obsolescence of the calculus in which it occurs, a superfluous term.</p></disp-quote></p>
<p>Spencer-Brown's conclusions in [<xref ref-type="bibr" rid="b24-information-02-00277">24</xref>–<xref ref-type="bibr" rid="b26-information-02-00277">26</xref>], including his analysis of probability, were considered to be controversial (if not unreasonable or extravagant) even by his own colleagues at the Society for Psychical Research, see [<xref ref-type="bibr" rid="b50-information-02-00277">50</xref>,<xref ref-type="bibr" rid="b51-information-02-00277">51</xref>]. It seems that current research in this area, although not free (or afraid) of criticism, has abandoned the path of naïve confrontation with statistical science, see [<xref ref-type="bibr" rid="b52-information-02-00277">52</xref>,<xref ref-type="bibr" rid="b53-information-02-00277">53</xref>]. For additional comments, see [<xref ref-type="bibr" rid="b54-information-02-00277">54</xref>–<xref ref-type="bibr" rid="b57-information-02-00277">57</xref>].</p>
<p>Curiously, Charles Sanders Peirce and his student Joseph Jastrow, who introduced the idea of randomization in statistical trials, also struggled with some of the very same dilemmas faced by Spencer-Brown, namely, the eventual detection of distinct patterns or seemingly ordered (sub)strings in a long random sequence. Peirce and Jastrow did not have at their disposal the heavy mathematical artillery I have quoted in the previous paragraphs. Nevertheless, as experienced explorers who are not easily lured, when traveling in desert sands, by the mirage of a misplaced oasis, these intrepid pioneers were able to avoid the conceptual pitfalls that led Spencer-Brown so far astray. For more details see [<xref ref-type="bibr" rid="b10-information-02-00277">10</xref>], [<xref ref-type="bibr" rid="b22-information-02-00277">22</xref>,<xref ref-type="bibr" rid="b23-information-02-00277">23</xref>] and [<xref ref-type="bibr" rid="b58-information-02-00277">58</xref>–<xref ref-type="bibr" rid="b60-information-02-00277">60</xref>].</p>
<p>As stated in the introduction, the cognitive constructivist framework can be supported by the FBST, a non-decision theoretic formalism drawn from Bayesian statistics, see [<xref ref-type="bibr" rid="b1-information-02-00277">1</xref>] and [<xref ref-type="bibr" rid="b3-information-02-00277">3</xref>–<xref ref-type="bibr" rid="b5-information-02-00277">5</xref>]. The FBST was conceived as a tool for validating objective knowledge of eigen-solutions and, as such, can be easily integrated into the epistemological framework of cognitive constructivism in scientific research practice. Given our distinct views of cognitive constructivism, it is not at all surprising that I have come to conclusions concerning the use of probability and statistics, and also concerning the relation between probability and logic, that are fundamentally different from those of Spencer-Brown.</p></sec>
<sec>
<label>3.</label>
<title>Pseudo, Quasi and Subjective Randomness</title>
<p>The focus of the present section is the properties of “natural” and “artificial” random sequences. The implementation of probabilistic algorithms requires good random number generators (RNGs). These algorithms include: numerical integration methods such as Monte Carlo or Markov Chain Monte Carlo (MCMC); evolutionary computing and stochastic optimization methods such as genetic programming and simulated annealing; and also, of course, the efficient implementation of randomization methods.</p>
<p>The most basic random number generator replicates i.i.d. (independent and identically distributed) random variables uniformly distributed in the unit interval, [0, 1]. From this basic uniform generator one gets a uniform generator in the <italic>d</italic>-dimensional unit box, [0, 1]<italic><sup>d</sup></italic>, and, from the latter, non-linear generators for many other multivariate distributions, see [<xref ref-type="bibr" rid="b61-information-02-00277">61</xref>,<xref ref-type="bibr" rid="b62-information-02-00277">62</xref>].</p>
<p>Historically, the technology of random number generators was developed in the context of Monte Carlo methods. The nature of Monte Carlo algorithms makes them very sensitive to correlations, auto-correlations and other statistical properties of the random number generator used in their implementation. Hence, in this context, the statistical properties of “natural” and “artificial” random sequences came under close scrutiny. For the aforementioned historical and technological reasons, Monte Carlo methods are frequently used as a benchmark for testing the properties of these generators. Hence, although Monte Carlo methods proper lie outside the scope of this article, we shall keep them as a standard application benchmark in our discussions.</p>
<p>The clever ideas and also the caveats of engineering good random number generators are at the core of many paradoxes found by Spencer-Brown. The objective of this section is to explain the basic ideas behind these generators and, in so doing, avoid the conceptual traps and pitfalls that took Spencer-Brown's analyses so far off course.</p>
<sec>
<label>3.1.</label>
<title>Random and Pseudo-Random Number Generators</title>
<p>The concept of randomness is usually applied to a variable or a process (to be generated or observed) involving some uncertainty. The following definition is presented on p. 10 of [<xref ref-type="bibr" rid="b61-information-02-00277">61</xref>]:
<disp-quote>
<p>A random event is an event which has a chance of happening, and probability is a numerical measure of that chance.</p></disp-quote></p>
<p>Monte Carlo, and several other probabilistic algorithms, require a random number generator. With the last definition in mind, engineering devices based on sophisticated physical processes have been built in the hope of offering a source of “true” random numbers. However, these special devices were cumbersome, expensive, neither portable nor easily available, and often unreliable. Moreover, practitioners soon realized that simple deterministic sequences could successfully be used to emulate a random generator, as stated in the following quotes (our emphasis) on p. 26 of [<xref ref-type="bibr" rid="b61-information-02-00277">61</xref>] and p. 15 of [<xref ref-type="bibr" rid="b62-information-02-00277">62</xref>]:
<disp-quote>
<p>For electronic digital computers it is most convenient to calculate a sequence of numbers one at a time as required, by a completely specified rule which is, however, so devised that no <italic><bold>reasonable</bold></italic> statistical test will detect any significant departure from randomness. Such a sequence is called <italic>pseudorandom</italic>. The great advantage of a specified rule is that the sequence can be exactly reproduced for purposes of computational checking.</p>
<p>A sequence of <italic>pseudorandom</italic> numbers (<italic>U<sub>i</sub></italic>) is a deterministic sequence of numbers in [0, 1] having the same <italic><bold>relevant</bold></italic> statistical properties as a sequence of random numbers.</p></disp-quote></p>
<p>Many deterministic random emulators used today are Linear Congruential Pseudo-Random Generators (LCPRG), as in the following example:
<disp-formula id="FD1">
<mml:math id="mm1" display="block">
<mml:semantics id="sm1">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>a</mml:mi>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>mod</mml:mo>
<mml:mi>m</mml:mi></mml:mrow></mml:semantics></mml:math></disp-formula>where the multiplier <italic>a</italic>, the increment <italic>c</italic> and the modulus <italic>m</italic> should obey the conditions: (i) <italic>c</italic> and <italic>m</italic> are relatively prime; (ii) <italic>a</italic> − 1 is divisible by all prime factors of <italic>m</italic>; (iii) <italic>a</italic> − 1 is a multiple of 4 if <italic>m</italic> is a multiple of 4. Under these conditions the sequence attains the full period <italic>m</italic>. LCPRGs are fast and easy to implement if <italic>m</italic> is taken as the computer's word range, 2<italic><sup>s</sup></italic>, where <italic>s</italic> is the computer's word size, typically <italic>s</italic> = 32 or <italic>s</italic> = 64. The LCPRG's starting point, <italic>x</italic><sub>0</sub>, is called the seed. Given the same seed, the LCPRG will reproduce the same sequence, a very convenient feature for tracing, debugging and verifying application programs.</p>
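<p>The recurrence and the three conditions on <italic>a</italic>, <italic>c</italic> and <italic>m</italic> can be illustrated with a minimal Python sketch. The toy parameters <italic>a</italic> = 5, <italic>c</italic> = 3, <italic>m</italic> = 16 and all function names are hypothetical choices made only for this illustration; they are not taken from the cited references, and a modulus this small is of no practical use.</p>
<preformat preformat-type="code">
```python
from math import gcd


def prime_factors(n):
    """Return the set of prime factors of n by trial division."""
    factors, p = set(), 2
    while n >= p * p:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def has_full_period(a, c, m):
    """Check conditions (i)-(iii) on multiplier a, increment c, modulus m."""
    cond_i = gcd(c, m) == 1                                   # (i)
    cond_ii = all((a - 1) % p == 0 for p in prime_factors(m))  # (ii)
    cond_iii = (m % 4 != 0) or ((a - 1) % 4 == 0)             # (iii)
    return cond_i and cond_ii and cond_iii


def lcprg(seed, a, c, m):
    """Yield x_{i+1} = (a * x_i + c) mod m, starting from the seed x_0."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def period(seed, a, c, m):
    """Number of steps until the sequence returns to its seed, if it does."""
    x = seed
    for steps in range(1, m + 1):
        x = (a * x + c) % m
        if x == seed:
            return steps
    return None  # the seed does not lie on a cycle


# Toy parameters (illustrative assumptions, not recommended values).
print(has_full_period(5, 3, 16), period(7, 5, 3, 16))
print(has_full_period(3, 3, 16), period(7, 3, 3, 16))
```
</preformat>
<p>With these toy values, the choice <italic>a</italic> = 5 satisfies all three conditions and the sequence started at seed 7 visits all 16 residues before returning to 7, whereas <italic>a</italic> = 3 violates condition (iii) and the sequence closes after only 8 steps.</p>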
<p>However, LCPRGs are not a universal solution. For example, it is trivial to devise some statistics whose behaviour will be far from random, see [<xref ref-type="bibr" rid="b63-information-02-00277">63</xref>]. This is where the importance of the words <bold>reasonable</bold> and <bold>relevant</bold> in the last quotations becomes clear: for most practical applications these statistics are irrelevant. LCPRGs can also exhibit very long range auto-correlations and, unfortunately, these are more likely to affect the long simulated time series required in some special applications. The composition of several LCPRGs by periodic seed refresh may mitigate some of these difficulties, see [<xref ref-type="bibr" rid="b62-information-02-00277">62</xref>]. LCPRGs are also not appropriate for some special applications in cryptography, see [<xref ref-type="bibr" rid="b64-information-02-00277">64</xref>]. Current state-of-the-art generators are given in [<xref ref-type="bibr" rid="b65-information-02-00277">65</xref>,<xref ref-type="bibr" rid="b66-information-02-00277">66</xref>].</p></sec>
<sec>
<label>3.2.</label>
<title>Chance is Lumpy—Quasi-Random Generators</title>
<p>“<italic>Chance is Lumpy</italic>” is Robert Abelson's First Law of Statistics, stated on p. XV of [<xref ref-type="bibr" rid="b67-information-02-00277">67</xref>]. The probabilistic expectation is a linear operator, that is, <italic>E</italic>(<italic>Ax</italic> + <italic>b</italic>) = <italic>AE</italic>(<italic>x</italic>) + <italic>b</italic>, where <italic>x</italic> is a random vector and <italic>A</italic> and <italic>b</italic> are a fixed (non-random) matrix and vector. The covariance operator is defined as Cov(<italic>x</italic>) = <italic>E</italic>((<italic>x</italic> − <italic>E</italic>(<italic>x</italic>)) ⊗ (<italic>x</italic> − <italic>E</italic>(<italic>x</italic>))). Hence, Cov(<italic>Ax</italic> + <italic>b</italic>) = <italic>A</italic>Cov(<italic>x</italic>)<italic>A</italic>′. Therefore, given <italic>n</italic> i.i.d. scalar variables, <italic>x<sub>i</sub></italic>, with Var(<italic>x<sub>i</sub></italic>) = <italic>σ</italic><sup>2</sup>, the variance of their mean, <italic>m</italic> = (1<italic>/n</italic>)<bold>1</bold>′<italic>x</italic> (notice the simplified vector notation <bold>1</bold> = [1, 1, …, 1]), is given by
<disp-formula id="FD2">
<mml:math id="mm2" display="block">
<mml:semantics id="sm2">
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>n</mml:mi></mml:mfrac>
<mml:msup>
<mml:mn mathvariant="bold">1</mml:mn>
<mml:mo>′</mml:mo></mml:msup>
<mml:mspace width="0.2em"/>
<mml:mtext>diag</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>σ</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mn mathvariant="bold">1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>n</mml:mi></mml:mfrac>
<mml:mn mathvariant="bold">1</mml:mn>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>n</mml:mi></mml:mfrac></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>n</mml:mi></mml:mfrac></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mo>…</mml:mo></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>n</mml:mi></mml:mfrac></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow>
<mml:mspace width="0.5em"/>
<mml:mrow>
<mml:mo stretchy="true">[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mi>σ</mml:mi>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mn>0</mml:mn></mml:mtd>
<mml:mtd>
<mml:mo>…</mml:mo></mml:mtd>
<mml:mtd>
<mml:mn>0</mml:mn></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>0</mml:mn></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mi>σ</mml:mi>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mo>…</mml:mo></mml:mtd>
<mml:mtd>
<mml:mn>0</mml:mn></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>⋮</mml:mo></mml:mtd>
<mml:mtd>
<mml:mo>⋮</mml:mo></mml:mtd>
<mml:mtd>
<mml:mo>⋱</mml:mo></mml:mtd>
<mml:mtd>
<mml:mo>⋮</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>0</mml:mn></mml:mtd>
<mml:mtd>
<mml:mn>0</mml:mn></mml:mtd>
<mml:mtd>
<mml:mo>⋯</mml:mo></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mi>σ</mml:mi>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow>
<mml:mspace width="0.5em"/>
<mml:mrow>
<mml:mo stretchy="true">[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>n</mml:mi></mml:mfrac></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>n</mml:mi></mml:mfrac></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>⋮</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>n</mml:mi></mml:mfrac></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mi>σ</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo stretchy="false">/</mml:mo>
<mml:mi>n</mml:mi></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>Hence, the standard deviation of the mean is <italic>σ</italic>/√<italic>n</italic>, that is, mean values of i.i.d. random variables converge to their expected values at a rate of 
<inline-formula>
<mml:math id="mm3" display="inline">
<mml:semantics id="sm3">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">/</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msqrt></mml:mrow></mml:semantics></mml:math></inline-formula>.</p>
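<p>The <italic>σ</italic><sup>2</sup>/<italic>n</italic> law for the variance of the mean can be checked empirically. The following Python sketch is only an illustration under stated assumptions: it uses uniform variables on [0, 1], whose variance is 1/12, and the function name, the seed and the number of replications are arbitrary hypothetical choices.</p>
<preformat preformat-type="code">
```python
import random
import statistics


def variance_of_mean(n, replications, rng):
    """Empirical variance of the mean of n uniform [0, 1] draws."""
    means = [
        sum(rng.random() for _ in range(n)) / n
        for _ in range(replications)
    ]
    return statistics.variance(means)


rng = random.Random(12345)  # fixed seed, so the run is reproducible
sigma2 = 1.0 / 12.0         # variance of a uniform variable on [0, 1]
for n in (25, 100, 400):
    observed = variance_of_mean(n, 2000, rng)
    # The observed value should be close to the theoretical sigma2 / n.
    print(n, observed, sigma2 / n)
```
</preformat>
<p>Each fourfold increase in <italic>n</italic> should reduce the observed variance by a factor close to four, that is, halve the standard deviation of the mean.</p>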
<p>Quasi-random sequences are deterministic sequences built not to emulate random sequences, as pseudo-random sequences do, but to achieve faster convergence rates. For <italic>d</italic>-dimensional quasi-random sequences, an appropriate measure of fluctuation, called discrepancy, only grows at a rate of log(<italic>n</italic>)<italic><sup>d</sup></italic>, hence growing much more slowly than 
<inline-formula>
<mml:math id="mm4" display="inline">
<mml:semantics id="sm4">
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msqrt></mml:mrow></mml:semantics></mml:math></inline-formula>. Therefore, the convergence rate corresponding to quasi-random sequences, log(<italic>n</italic>)<italic><sup>d</sup></italic>/<italic>n</italic>, is much faster than the one corresponding to (pseudo) random sequences, 
<inline-formula>
<mml:math id="mm5" display="inline">
<mml:semantics id="sm5">
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msqrt>
<mml:mo stretchy="false">/</mml:mo>
<mml:mi>n</mml:mi></mml:mrow></mml:semantics></mml:math></inline-formula>. <xref ref-type="fig" rid="f1-information-02-00277">Figure 1</xref> allows the visual comparison of typical (pseudo) random (left) and quasi-random (right) sequences in [0, 1]<sup>2</sup>. By visual inspection we see that the points of the quasi-random sequence are more “homogeneously scattered”, that is, they do not “clump together”, as the points of the (pseudo) random sequence often do.</p>
<p>Let us consider an axis-parallel rectangle in the unit box,
<disp-formula id="FD3">
<mml:math id="mm6" display="block">
<mml:semantics id="sm6">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">[</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>×</mml:mo>
<mml:mo stretchy="false">[</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>×</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>×</mml:mo>
<mml:mo stretchy="false">[</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>d</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>d</mml:mi></mml:msub>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>⊆</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">]</mml:mo></mml:mrow>
<mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>The discrepancy of the sequence <italic>s</italic><sub>1:</sub><italic><sub>n</sub></italic> in box <italic>R</italic>, and the overall discrepancy of the sequence are defined as
<disp-formula id="FD4">
<mml:math id="mm7" display="block">
<mml:semantics id="sm7">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>:</mml:mo>
<mml:mi>n</mml:mi></mml:mrow></mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>n</mml:mi>
<mml:mtext>Vol</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>R</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>:</mml:mo>
<mml:mi>n</mml:mi></mml:mrow></mml:msub>
<mml:mo>∩</mml:mo>
<mml:mi>R</mml:mi></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>:</mml:mo>
<mml:mi>n</mml:mi></mml:mrow></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>sup</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mo>⊆</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">]</mml:mo></mml:mrow>
<mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:munder>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>:</mml:mo>
<mml:mi>n</mml:mi></mml:mrow></mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
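<p>As a minimal sketch of the first of these definitions, the discrepancy of a point set in a single box can be computed by direct counting. The Python function below is illustrative only: the name <monospace>box_discrepancy</monospace>, its arguments and the four-point grid are assumptions introduced here, not part of the original text, and the supremum over all boxes that defines the overall discrepancy is not attempted.</p>
<preformat>
```python
from math import prod

def box_discrepancy(points, lower, upper):
    """Return D(s, R) = n*Vol(R) - |s ∩ R| for the box R = [lower_1, upper_1] x ... x [lower_d, upper_d]."""
    n = len(points)
    volume = prod(b - a for a, b in zip(lower, upper))
    # Count the points whose every coordinate lies inside the closed box.
    inside = sum(
        1 for point in points
        if all(b >= x >= a for x, a, b in zip(point, lower, upper))
    )
    return n * volume - inside

# Hypothetical example: a regular 2 x 2 grid in the unit square.
grid = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
```
</preformat>
<p>For this grid, the box [0, 0.5] × [0, 0.5] has volume 1/4 and contains exactly one of the four points, so its discrepancy is zero, whereas a small corner box such as [0, 0.2] × [0, 0.2] contains no point and has positive discrepancy <italic>n</italic> Vol(<italic>R</italic>).</p>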
<p>It is possible to prove that the discrepancy of the Halton-Hammersley set, defined next, is of order <italic>O</italic>(log(<italic>n</italic>)<italic><sup>d</sup></italic><sup>−1</sup>), see chapter 2 of [<xref ref-type="bibr" rid="b68-information-02-00277">68</xref>].</p>
<p>Halton-Hammersley sets: Given <italic>d</italic> − 1 distinct prime numbers, <italic>p</italic>(1), <italic>p</italic>(2), … <italic>p</italic>(<italic>d</italic> − 1), the <italic>i</italic>-th point, <italic>x<sup>i</sup></italic>, in the Halton-Hammersley set, {<italic>x</italic><sup>1</sup>, <italic>x</italic><sup>2</sup>, … <italic>x<sup>n</sup></italic>}, is
<disp-formula id="Fd5">
<mml:math id="mm8" display="block">
<mml:semantics id="sm8">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msup>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">/</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">]</mml:mo></mml:mrow>
<mml:mo>′</mml:mo></mml:msup>
<mml:mo>,</mml:mo>
<mml:mtext>for</mml:mtext>
<mml:mspace width="0.2em"/>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>:</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mtext>where</mml:mtext>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>p</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>p</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mn>3</mml:mn></mml:msup>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>3</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>0</mml:mn></mml:msub></mml:mrow>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>1</mml:mn></mml:msub></mml:mrow>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>2</mml:mn></mml:msub></mml:mrow>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mn>3</mml:mn></mml:msup></mml:mrow></mml:mfrac>
<mml:mo>+</mml:mo>
<mml:mo>…</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>That is, the (<italic>k</italic> + 1)-th coordinate of <italic>x<sup>i</sup></italic>, 
<inline-formula>
<mml:math id="mm9" display="inline">
<mml:semantics id="sm9">
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>i</mml:mi></mml:msubsup>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></inline-formula>, is obtained by the bit (or digit) reversal of <italic>i</italic> written in <italic>p</italic>(<italic>k</italic>)-adic or base <italic>p</italic>(<italic>k</italic>) notation.</p>
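<p>The digit-reversal map can be sketched in a few lines of Python. This is an illustrative sketch, not the author's code: the function names <monospace>radical_inverse</monospace> and <monospace>hammersley</monospace> are assumptions, and the index range <italic>i</italic> = 1, …, <italic>n</italic> is chosen here to match the set {<italic>x</italic><sup>1</sup>, … <italic>x<sup>n</sup></italic>}.</p>
<preformat>
```python
def radical_inverse(i, base):
    """Return r_base(i): the base-`base` digits a_0, a_1, ... of i mirrored about the radix point."""
    value, scale = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)  # peel off the lowest digit a_k
        value += digit * scale      # add a_k / base**(k+1)
        scale /= base
    return value

def hammersley(n, primes):
    """Return n points [i/n, r_p(1)(i), ..., r_p(d-1)(i)], i = 1..n (index range is an assumption)."""
    return [[i / n] + [radical_inverse(i, p) for p in primes] for i in range(1, n + 1)]

# Base-2 digit reversal of i = 1, ..., 8.
binary_values = [radical_inverse(i, 2) for i in range(1, 9)]
```
</preformat>
<p>With base 2, for instance, <italic>i</italic> = 6 is written 110, its reversed digits give 0.011, and the function returns 0.375. Calling <monospace>hammersley(n, [2])</monospace> pairs each of these values with <italic>i</italic>/<italic>n</italic> to give points in the unit square.</p>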
<p>The Halton-Hammersley set is a generalization of the van der Corput set, built in the two-dimensional unit square, <italic>d</italic> = 2, using the first prime number, <italic>p</italic> = 2. The following example, from p. 33 of [<xref ref-type="bibr" rid="b61-information-02-00277">61</xref>] and p. 117 of [<xref ref-type="bibr" rid="b69-information-02-00277">69</xref>], builds the 8-point van der Corput set, expressed in binary and decimal notation.</p>
<preformat>
<monospace>function x = corput(n,b)
% size n, base b van der Corput set
m = floor(log(n)/log(b));   % index of the highest base-b digit of n
u = 1:n; D = [];
for i = 0:m
    d = rem(u,b);           % current lowest base-b digit of each index
    u = (u-d)/b;            % shift the remaining digits down
    D = [D; d];             % row i+1 holds digit a_i
end
x = ((1./b').^(1:(m+1)))*D; % x(i) = sum over k of a_k / b^(k+1)</monospace>
</preformat>
<table-wrap id="t1-information-02-00277" position="anchor">
<table frame="hsides" rules="groups">
<thead>
<tr>
<th colspan="2" align="center" valign="top">Decimal</th>
<th colspan="2" align="center" valign="top">Binary</th></tr>
<tr>
<th colspan="4" valign="bottom">
<hr/></th></tr>
<tr>
<th align="left" valign="middle"><italic>i</italic></th>
<th align="left" valign="middle"><italic>r</italic><sub>2</sub>(<italic>i</italic>)</th>
<th align="left" valign="middle"><italic>i</italic></th>
<th align="left" valign="middle"><italic>r</italic><sub>2</sub>(<italic>i</italic>)</th></tr></thead>
<tbody>
<tr>
<td align="left" valign="top">1</td>
<td align="left" valign="top">0.5</td>
<td align="left" valign="top">1</td>
<td align="left" valign="top">0.1</td></tr>
<tr>
<td align="left" valign="top">2</td>
<td align="left" valign="top">0.25</td>
<td align="left" valign="top">10</td>
<td align="left" valign="top">0.01</td></tr>
<tr>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.75</td>
<td align="left" valign="top">11</td>
<td align="left" valign="top">0.11</td></tr>
<tr>
<td align="left" valign="top">4</td>
<td align="left" valign="top">0.125</td>
<td align="left" valign="top">100</td>
<td align="left" valign="top">0.001</td></tr>
<tr>
<td align="left" valign="top">5</td>
<td align="left" valign="top">0.625</td>
<td align="left" valign="top">101</td>
<td align="left" valign="top">0.101</td></tr>
<tr>
<td align="left" valign="top">6</td>
<td align="left" valign="top">0.375</td>
<td align="left" valign="top">110</td>
<td align="left" valign="top">0.011</td></tr>
<tr>
<td align="left" valign="top">7</td>
<td align="left" valign="top">0.875</td>
<td align="left" valign="top">111</td>
<td align="left" valign="top">0.111</td></tr>
<tr>
<td align="left" valign="top">8</td>
<td align="left" valign="top">0.0625</td>
<td align="left" valign="top">1000</td>
<td align="left" valign="top">0.0001</td></tr></tbody></table></table-wrap>
<p>Quasi-random sequences, also known as low-discrepancy sequences, can substitute for pseudo-random sequences in some applications of Monte Carlo methods, achieving higher accuracy with less computational effort, see [<xref ref-type="bibr" rid="b70-information-02-00277">70</xref>–<xref ref-type="bibr" rid="b72-information-02-00277">72</xref>]. Nevertheless, since by design the points of a quasi-random sequence tend to avoid each other, strong (negative) correlations are expected to appear. In this way, the very property that makes quasi-random sequences so helpful can ultimately limit their applicability. Some of these problems are discussed on p. 766 of [<xref ref-type="bibr" rid="b73-information-02-00277">73</xref>]:
<disp-quote>
<p>First, quasi-Monte Carlo methods are valid for integration problems, but may not be directly applicable to simulations, due to the correlations between the points of a quasi-random sequence. … A second limitation: the improved accuracy of quasi-Monte Carlo methods is generally lost for problems of high dimension or problems in which the integrand is not smooth.</p></disp-quote></p></sec>
<sec>
<label>3.3.</label>
<title>Subjective Randomness and Its Paradoxes</title>
<p>When asked to look at patterns like those in <xref ref-type="fig" rid="f1-information-02-00277">Figure 1</xref>, many subjects perceive the quasi-random set as “more random” than the (pseudo) random set. How can this paradox be explained? This question has been the topic of many psychological studies in the field of subjective randomness. The following quotation is from one of these studies, p. 306 of [<xref ref-type="bibr" rid="b36-information-02-00277">36</xref>]; the emphasis is ours:
<disp-quote>
<p>One major source of confusion is the fact that randomness involves two distinct ideas: <bold>process</bold> and <bold>pattern</bold>, [<xref ref-type="bibr" rid="b74-information-02-00277">74</xref>]. It is natural to think of randomness as a process that generates unpredictable outcomes (stochastic process according to [<xref ref-type="bibr" rid="b75-information-02-00277">75</xref>]). Randomness of a <bold>process</bold> refers to the <bold>unpredictability</bold> of the individual event in the series [<xref ref-type="bibr" rid="b76-information-02-00277">76</xref>,<xref ref-type="bibr" rid="b77-information-02-00277">77</xref>]. This is what Spencer Brown [<xref ref-type="bibr" rid="b26-information-02-00277">26</xref>] calls <bold>primary randomness</bold>. However, one usually determines the randomness of the process by means of its output, which is supposed to be <bold>patternless</bold>. This kind of randomness refers, by definition, to a sequence. It is labeled <bold>secondary randomness</bold> by Spencer Brown. It requires that all symbol types, as well as all ordered pairs (digrams), ordered triplets (trigrams)… n-grams in the sequence be equiprobable. This definition could be valid for any n only in infinite sequences, and it may be approximated in finite sequences only up to ns much smaller than the sequence's length. The entropy measure of randomness is based on this definition, see chapter 1 and 2 of [<xref ref-type="bibr" rid="b41-information-02-00277">41</xref>].</p>
<p>These two aspects of randomness are closely related. We ordinarily expect outcomes generated by a random process to be patternless. Most of them are. Conversely, a sequence whose order is random supports the hypothesis that it was generated by a random mechanism, whereas sequences whose order is not random cast doubt on the random nature of the generating process.</p></disp-quote></p>
<p>Spencer-Brown was intrigued by the apparent incompatibility of the notions of primary and secondary randomness. The apparent collision of these two notions generates several interesting paradoxes, leading Spencer-Brown to question the applicability of the concept of randomness in particular and of probability and statistical analysis in general, see [<xref ref-type="bibr" rid="b24-information-02-00277">24</xref>–<xref ref-type="bibr" rid="b26-information-02-00277">26</xref>], and also [<xref ref-type="bibr" rid="b35-information-02-00277">35</xref>], [<xref ref-type="bibr" rid="b38-information-02-00277">38</xref>,<xref ref-type="bibr" rid="b39-information-02-00277">39</xref>], [<xref ref-type="bibr" rid="b54-information-02-00277">54</xref>–<xref ref-type="bibr" rid="b57-information-02-00277">57</xref>] and [<xref ref-type="bibr" rid="b78-information-02-00277">78</xref>]. In fact, several subsequent psychological studies were able to confirm that, for many subjects, the intuitive or common-sense perceptions of primary and secondary randomness are quite discrepant. However, a careful mathematical analysis makes it possible to reconcile the two notions of randomness. These are the topics discussed in this section.</p>
<p>The relation between the joint and conditional entropy for a pair of random variables, see Section 4,
<disp-formula id="FD6">
<mml:math id="mm10" display="block">
<mml:semantics id="sm10">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>H</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mi>H</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>H</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mi>H</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>motivates the definition of first, second and higher order entropies, defined over the distribution of words of size <italic>m</italic> in a string of letters from an alphabet of size <italic>a</italic>.
<disp-formula id="FD7">
<mml:math id="mm11" display="block">
<mml:semantics id="sm11">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mi>j</mml:mi></mml:munder>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo></mml:mrow></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mn>3</mml:mn></mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>…</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>It is possible to use these entropy measures to assess the disorder or lack of pattern in a given finite sequence, using the empirical probability distributions of single letters, pairs, triplets, <italic>etc</italic>. However, in order to have a significant empirical distribution of <italic>m</italic>-plets, every possible <italic>m</italic>-plet must be well represented in the sequence. Since there are <italic>a<sup>m</sup></italic> possible words and only about <italic>n</italic> word positions, the word size, <italic>m</italic>, must be very small relative to the log-size of the sequence, that is, <italic>m</italic> ≪ log<italic><sub>a</sub></italic>(<italic>n</italic>).</p>
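<p>The empirical version of these entropies can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the procedure used in the cited studies: the function name <monospace>conditional_entropy</monospace> is hypothetical, words are counted over overlapping windows, logarithms are taken in base 2, and the standard sign convention <italic>H</italic> = −∑ <italic>p</italic> log <italic>p</italic> is used so that the values are non-negative.</p>
<preformat>
```python
from collections import Counter
from math import log2

def conditional_entropy(seq, m):
    """Empirical m-th order entropy (bits): uncertainty of a letter given the m-1 letters before it."""
    # Count every overlapping word of length m in the sequence.
    words = Counter(tuple(seq[k:k + m]) for k in range(len(seq) - m + 1))
    # Marginal counts of the (m-1)-letter contexts, taken from the same words.
    contexts = Counter()
    for word, count in words.items():
        contexts[word[:-1]] += count
    total = sum(words.values())
    # -sum p(context, letter) * log p(letter | context)
    return -sum(c / total * log2(c / contexts[w[:-1]]) for w, c in words.items())

alternating = "0101010101"
h1 = conditional_entropy(alternating, 1)
h2 = conditional_entropy(alternating, 2)
```
</preformat>
<p>For the strictly alternating string above, the first order entropy is 1 bit, because both letters are equally frequent, while the second order entropy is 0, because each letter is fully determined by its predecessor. The two measures can therefore disagree sharply about how disordered the same sequence is.</p>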
<p>In the article [<xref ref-type="bibr" rid="b36-information-02-00277">36</xref>], <xref ref-type="fig" rid="f2-information-02-00277">Figure 2</xref> displays the typical perceived or apparent randomness of Boolean (0-1) bit sequences, represented as black-and-white pixels in linear arrays, <italic>versus</italic> the second order entropy of the same strings, see also [<xref ref-type="bibr" rid="b41-information-02-00277">41</xref>]. Clearly, there is a remarkable bias of the apparent randomness relative to the entropic measure.</p>
<p>This effect is known as the <italic>gambler's fallacy</italic> when betting on <italic>cool spots</italic>. It consists of expecting the random sequence to “compensate” for finite average fluctuations from expected values. This effect is also described on p. 303 of [<xref ref-type="bibr" rid="b36-information-02-00277">36</xref>]:
<disp-quote>
<p>When people invent superfluous explanations because they perceive patterns in random phenomena, they commit what is known in statistical parlance as Type I error. The other way of going awry, known as Type II error, occurs when one dismisses stimuli showing some regularity as random. The numerous randomization studies in which participants generated too many alternations and viewed this output as random, as well as the judgments of over alternating sets as maximally random in the perception studies, were all instances of type II error in research results.</p></disp-quote></p>
<p>It is known that other gamblers exhibit the opposite behavior, preferring to bet on <italic>hot spots</italic>, expecting the same fluctuations to occur repeatedly. These effects are the consequence of a perceived coupling, by a negative or positive correlation or other measure of association, between non-overlapping segments that are in fact supposed to be decoupled, uncorrelated or have no association, that is, to be independent. For a statistical analysis, see [<xref ref-type="bibr" rid="b58-information-02-00277">58</xref>,<xref ref-type="bibr" rid="b59-information-02-00277">59</xref>]. A possible psychological explanation of the gambler's fallacy is given by the constructivist theory of Jean Piaget, see [<xref ref-type="bibr" rid="b79-information-02-00277">79</xref>], as quoted on p. 316 of [<xref ref-type="bibr" rid="b36-information-02-00277">36</xref>], in which any “lump” in the sequence is (mis)perceived as non-random order:
<disp-quote>
<p>In analogy to Piaget's operations, which are conceived as internalized actions, perceived randomness might emerge from hypothetical action, that is, from a thought experiment in which one describes, predicts, or abbreviates the sequence. The harder the task in such a thought experiment, the more random the sequence is judged to be.</p></disp-quote></p>
<p>The same hierarchical decomposition scheme used for higher order conditional entropy measures can be adapted to measure the disorder or patternlessness of a sequence, relative to a given subject's model of “computer” or generation mechanism. In the case of a discrete string, this generation model could be, for example, a deterministic or probabilistic Turing machine, a fixed or variable length Markov chain, <italic>etc.</italic> It is assumed that the model is regulated by a code, program or vector parameter, <italic>θ</italic>, and outputs a data vector or observed string, <italic>x.</italic> The hierarchical complexity measure of such a model emulates the Bayesian prior and conditional likelihood decomposition, <italic>H</italic>(<italic>p</italic>(<italic>θ</italic>, <italic>x</italic>)) = <italic>H</italic>(<italic>p</italic>(<italic>θ</italic>)) + <italic>H</italic>(<italic>p</italic>(<italic>x</italic> | <italic>θ</italic>)), that is, the total complexity is given by the complexity of the program plus the complexity of the output given the program. This is the starting point for several complexity models, like Andrey Kolmogorov, Ray Solomonoff and Gregory Chaitin's computational complexity models, Jorma Rissanen's Minimum Description Length (MDL), and Chris Wallace and David Boulton's Minimum Message Length (MML). All these alternative complexity models can also be used to successfully reconcile the notions of primary and secondary randomness, showing that they are asymptotically equivalent, see [<xref ref-type="bibr" rid="b80-information-02-00277">80</xref>–<xref ref-type="bibr" rid="b85-information-02-00277">85</xref>].</p></sec></sec>
<sec>
<label>4.</label>
<title>Entropy and Its Use in Mathematical Statistics</title>
<p>Entropy is the cornerstone concept of the preceding section, used as a central idea in the understanding of order and disorder in stochastic processes. Entropy is the key that allowed us to unlock the mysteries and solve the paradoxes of subjective randomness, making it possible to reconcile the notions of unpredictability of a stochastic process and patternlessness of randomly generated sequences. Similar entropy-based arguments reappear, in more abstract, subtle or intricate forms, in the analysis of technical aspects of Bayesian statistics like, for example, the use of prior and posterior distributions and the interpretation of their informational content. This section gives a short review covering the definition of entropy, its main properties, and some of its most important uses in mathematical statistics.</p>
<p>The origins of the entropy concept lie in the fields of Thermodynamics and Statistical Physics, but its applications have extended far and wide to many other phenomena, physical or not. The entropy of a probability distribution, <italic>H</italic>(<italic>p</italic>(<italic>x</italic>)), is a measure of uncertainty (or impurity, confusion) in a system whose states, <italic>x</italic> ∈ <italic>χ</italic>, have <italic>p</italic>(<italic>x</italic>) as probability distribution. We follow closely the presentation in the following references. For the basic concepts, see [<xref ref-type="bibr" rid="b42-information-02-00277">42</xref>] and [<xref ref-type="bibr" rid="b86-information-02-00277">86</xref>–<xref ref-type="bibr" rid="b89-information-02-00277">89</xref>]. For maximum entropy (MaxEnt) characterizations, see [<xref ref-type="bibr" rid="b45-information-02-00277">45</xref>] and [<xref ref-type="bibr" rid="b90-information-02-00277">90</xref>]. For numerical optimization methods for MaxEnt problems, see [<xref ref-type="bibr" rid="b91-information-02-00277">91</xref>–<xref ref-type="bibr" rid="b95-information-02-00277">95</xref>]. For posterior asymptotic convergence, see [<xref ref-type="bibr" rid="b96-information-02-00277">96</xref>]. For a detailed analysis of the connection between MaxEnt optimization and Bayesian statistics' formalisms, that is, for a deeper view of the relation between MaxEnt and Bayes' rule updates, see [<xref ref-type="bibr" rid="b97-information-02-00277">97</xref>].</p>
<sec>
<label>4.1.</label>
<title>Convexity</title>
<p>This section introduces the notion of convexity, a concept at the heart of the definition of entropy and generalized directed divergences. Convexity arguments are also needed to prove, in the following sections, important properties of entropy and its generalizations. In this section we use the following notations: <bold>0</bold> and <bold>1</bold> are the vectors of zeros (the origin) and of ones, of appropriate dimension. Subscripts are used as an element index in a vector or as a row index in a matrix, and superscripts are used as an index for distinct vectors or as a column index in a matrix.</p>
<sec>
<title>Definition</title>
<p>A region <italic>S</italic> ⊆ <italic>R<sup>n</sup></italic> is Convex iff, for any two points, <italic>x</italic><sup>1</sup>, <italic>x</italic><sup>2</sup> ∈ <italic>S</italic>, and weights 0 ≤ <italic>l</italic><sub>1</sub>, <italic>l</italic><sub>2</sub> ≤ 1 | <italic>l</italic><sub>1</sub> + <italic>l</italic><sub>2</sub> = 1, the convex combination of these two points remains in <italic>S</italic>, <italic>i.e. l</italic><sub>1</sub><italic>x</italic><sup>1</sup> + <italic>l</italic><sub>2</sub><italic>x</italic><sup>2</sup> ∈ <italic>S</italic>.</p></sec>
<sec>
<title>Theorem</title>
<p>Finite Convex Combination: A region <italic>S</italic> ⊆ <italic>R<sup>n</sup></italic> is Convex iff any (finite) convex combination of its points remains in the region, <italic>i.e.</italic>, ∀ <bold>0</bold> ≤ <italic>l</italic> ≤ <bold>1</bold> | <bold>1</bold>′<italic>l</italic> = 1, <italic>X</italic> = [<italic>x</italic><sup>1</sup>, <italic>x</italic><sup>2</sup>, … <italic>x<sup>m</sup></italic>], <italic>x<sup>j</sup></italic> ∈ <italic>S</italic>,
<disp-formula id="FD8">
<mml:math id="mm12" display="block">
<mml:semantics id="sm12">
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mspace width="0.2em"/>
<mml:mi>l</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
<mml:mn>1</mml:mn></mml:msubsup></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mo>…</mml:mo></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
<mml:mi>m</mml:mi></mml:msubsup></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
<mml:mn>1</mml:mn></mml:msubsup></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
<mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mo>…</mml:mo></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
<mml:mi>m</mml:mi></mml:msubsup></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>⋮</mml:mo></mml:mtd>
<mml:mtd>
<mml:mo>⋮</mml:mo></mml:mtd>
<mml:mtd>
<mml:mo>⋱</mml:mo></mml:mtd>
<mml:mtd>
<mml:mo>⋮</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>n</mml:mi>
<mml:mn>1</mml:mn></mml:msubsup></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>n</mml:mi>
<mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mo>…</mml:mo></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>m</mml:mi></mml:msubsup></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow>
<mml:mspace width="0.5em"/>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>⋮</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow>
<mml:mo>∈</mml:mo>
<mml:mi>S</mml:mi></mml:mrow></mml:semantics></mml:math></disp-formula></p></sec>
<sec>
<title>Proof</title>
<p>By induction on the number of points, <italic>m</italic>.</p></sec>
<sec>
<title>Definition</title>
<p>The Epigraph of the function <italic>φ</italic> : <italic>R<sup>n</sup></italic> → <italic>R</italic> is the region of <italic>R</italic><sup><italic>n</italic>+1</sup> “above the graph” of <italic>φ</italic>, <italic>i.e.</italic>,
<disp-formula id="FD9">
<mml:math id="mm13" display="block">
<mml:semantics id="sm13">
<mml:mrow>
<mml:mtext>Epi</mml:mtext>
<mml:mspace width="0.2em"/>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mspace width="0.2em"/>
<mml:mi>x</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msup>
<mml:mspace width="0.2em"/>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>≥</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mspace width="0.2em"/>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mspace width="0.1em"/>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">]</mml:mo></mml:mrow>
<mml:mo>′</mml:mo></mml:msup>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow>
<mml:mspace width="0.1em"/>
<mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula></p></sec>
<sec>
<title>Definition</title>
<p>A function <italic>φ</italic> is convex iff its epigraph is convex. A function <italic>φ</italic> is concave iff −<italic>φ</italic> is convex.</p></sec>
<sec>
<title>Theorem</title>
<p>A twice differentiable function, <italic>φ</italic> : <italic>R</italic> → <italic>R</italic>, with non-negative second derivative is convex.</p></sec>
<sec>
<title>Proof</title>
<p>Consider <italic>x</italic><sup>0</sup> = <italic>l</italic><sub>1</sub><italic>x</italic><sup>1</sup> + <italic>l</italic><sub>2</sub><italic>x</italic><sup>2</sup>, and the Taylor expansion around <italic>x</italic><sup>0</sup>,
<disp-formula id="FD10">
<mml:math id="mm14" display="block">
<mml:semantics id="sm14">
<mml:mrow>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mn>0</mml:mn></mml:msup>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:msup>
<mml:mi>φ</mml:mi>
<mml:mo>′</mml:mo></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mn>0</mml:mn></mml:msup>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>−</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mn>0</mml:mn></mml:msup>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">/</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:msup>
<mml:mi>φ</mml:mi>
<mml:mo>″</mml:mo></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mo>∗</mml:mo></mml:msup>
<mml:mo stretchy="false">)</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>−</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mn>0</mml:mn></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
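<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:semantics></mml:math></disp-formula>where <italic>x</italic>* is an appropriate intermediate point between <italic>x</italic> and <italic>x</italic><sup>0</sup>. If <italic>φ</italic>″(<italic>x</italic>*) ≥ 0, the last term is non-negative. Now, making <italic>x</italic> = <italic>x</italic><sup>1</sup> and <italic>x</italic> = <italic>x</italic><sup>2</sup>, and noting that <italic>x</italic><sup>1</sup> − <italic>x</italic><sup>0</sup> = <italic>l</italic><sub>2</sub>(<italic>x</italic><sup>1</sup> − <italic>x</italic><sup>2</sup>) and <italic>x</italic><sup>2</sup> − <italic>x</italic><sup>0</sup> = <italic>l</italic><sub>1</sub>(<italic>x</italic><sup>2</sup> − <italic>x</italic><sup>1</sup>), we have, respectively, that <italic>φ</italic>(<italic>x</italic><sup>1</sup>) ≥ <italic>φ</italic>(<italic>x</italic><sup>0</sup>) + <italic>φ</italic>′(<italic>x</italic><sup>0</sup>)<italic>l</italic><sub>2</sub>(<italic>x</italic><sup>1</sup> − <italic>x</italic><sup>2</sup>) and <italic>φ</italic>(<italic>x</italic><sup>2</sup>) ≥ <italic>φ</italic>(<italic>x</italic><sup>0</sup>) + <italic>φ</italic>′(<italic>x</italic><sup>0</sup>)<italic>l</italic><sub>1</sub>(<italic>x</italic><sup>2</sup> − <italic>x</italic><sup>1</sup>). Multiplying the first inequality by <italic>l</italic><sub>1</sub>, the second by <italic>l</italic><sub>2</sub>, and adding them, the first-order terms cancel and we obtain <italic>l</italic><sub>1</sub><italic>φ</italic>(<italic>x</italic><sup>1</sup>) + <italic>l</italic><sub>2</sub><italic>φ</italic>(<italic>x</italic><sup>2</sup>) ≥ <italic>φ</italic>(<italic>x</italic><sup>0</sup>), the desired result.</p></sec>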
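<sec>
<title>Example</title>
<p>As an informal numerical illustration of the theorem above (a sketch, not part of the original argument), the code below takes <italic>φ</italic>(<italic>t</italic>) = <italic>t</italic> ln(<italic>t</italic>), whose second derivative 1/<italic>t</italic> is positive for <italic>t</italic> &gt; 0, and checks the convex combination inequality <italic>l</italic><sub>1</sub><italic>φ</italic>(<italic>x</italic><sup>1</sup>) + <italic>l</italic><sub>2</sub><italic>φ</italic>(<italic>x</italic><sup>2</sup>) ≥ <italic>φ</italic>(<italic>x</italic><sup>0</sup>) at randomly drawn points. The helper names <monospace>phi</monospace>, <monospace>convexity_gap</monospace> and <monospace>smallest_gap</monospace>, as well as the sampling interval and the seed, are assumptions introduced only for this illustration.</p>
<preformat>
```python
import math
import random


def phi(t):
    """phi(t) = t ln(t), convex on t > 0 because phi''(t) = 1/t > 0."""
    return t * math.log(t)


def convexity_gap(f, x1, x2, l1):
    """Return l1*f(x1) + l2*f(x2) - f(l1*x1 + l2*x2), with l2 = 1 - l1.

    For a convex f this gap is non-negative for every l1 in [0, 1].
    """
    l2 = 1.0 - l1
    x0 = l1 * x1 + l2 * x2
    return l1 * f(x1) + l2 * f(x2) - f(x0)


def smallest_gap(f, trials=1000, seed=0):
    """Smallest gap found over random points in (0.01, 10) and random weights."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        x1 = rng.uniform(0.01, 10.0)
        x2 = rng.uniform(0.01, 10.0)
        l1 = rng.random()
        gaps.append(convexity_gap(f, x1, x2, l1))
    return min(gaps)


if __name__ == "__main__":
    # Expected to be non-negative, up to floating-point rounding.
    print(smallest_gap(phi))
```
</preformat>
<p>A random search of this kind can only fail to find a counterexample; it does not replace the proof.</p></sec>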
<sec>
<title>Theorem</title>
<p>Jensen Inequality: If <italic>φ</italic> is a convex function,
<disp-formula id="Fd11">
<mml:math id="mm15" display="block">
<mml:semantics id="sm15">
<mml:mrow>
<mml:mtext>E</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>≥</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext>E</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>For discrete distributions the Jensen inequality is a special case of the finite convex combination theorem. Arguments of Analysis allow us to extend the result to continuous distributions.</p></sec></sec>
<sec>
<label>4.2.</label>
<title>Boltzmann-Gibbs-Shannon Entropy</title>
<p>If <italic>H</italic>(<italic>p</italic>(<italic>x</italic>)) is to be a measure of uncertainty, it is reasonable that it should satisfy the following list of requirements. For the sake of simplicity, we present several aspects of the theory in finite spaces.</p>
<list list-type="order">
<list-item>
<p>If the system has <italic>n</italic> possible states, <italic>x</italic><sub>1</sub>, … <italic>x<sub>n</sub></italic>, the entropy of the system with a given distribution, <italic>p<sub>i</sub></italic>≡ <italic>p</italic>(<italic>x<sub>i</sub></italic>), is a function
<disp-formula id="FD12">
<mml:math id="mm16" display="block">
<mml:semantics id="sm16">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula></p></list-item>
<list-item>
<p><italic>H</italic> is a continuous function.</p></list-item>
<list-item>
<p><italic>H</italic> is a function symmetric in its arguments.</p></list-item>
<list-item>
<p>The entropy is unchanged if an impossible state is added to the system, <italic>i.e.</italic>,
<disp-formula id="FD13">
<mml:math id="mm17" display="block">
<mml:semantics id="sm17">
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula></p></list-item>
<list-item>
<p>The system's entropy is minimal and null when the system is fully determined, <italic>i.e.</italic>,
<disp-formula id="FD14">
<mml:math id="mm18" display="block">
<mml:semantics id="sm18">
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn></mml:mrow></mml:semantics></mml:math></disp-formula></p></list-item>
<list-item>
<p>The system's entropy is maximal when all states are equally probable, <italic>i.e.</italic>,
<disp-formula id="FD15">
<mml:math id="mm19" display="block">
<mml:semantics id="sm19">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>n</mml:mi></mml:mfrac>
<mml:mn mathvariant="bold">1</mml:mn>
<mml:mo stretchy="false">}</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>arg</mml:mo>
<mml:mo>max</mml:mo>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:semantics></mml:math></disp-formula></p></list-item>
<list-item>
<p>A system's maximal entropy increases with the number of states, <italic>i.e.</italic>,
<disp-formula id="FD16">
<mml:math id="mm20" display="block">
<mml:semantics id="sm20">
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mspace width="0.2em"/>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:mfrac>
<mml:mn mathvariant="bold">1</mml:mn></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mo>&gt;</mml:mo>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mrow>
<mml:mspace width="0.2em"/>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>n</mml:mi></mml:mfrac>
<mml:mn mathvariant="bold">1</mml:mn></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula></p></list-item>
<list-item>
<p>Entropy is an extensive quantity: given two independent systems, with distributions <italic>p</italic> and <italic>q</italic>, the entropy of the composite system is additive, <italic>i.e.</italic>,
<disp-formula id="FD17">
<mml:math id="mm21" display="block">
<mml:semantics id="sm21">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>m</mml:mi></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>m</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p></list-item></list>
<p>The Boltzmann-Gibbs-Shannon measure of entropy,
<disp-formula id="FD18">
<mml:math id="mm22" display="block">
<mml:semantics id="sm22">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>n</mml:mi></mml:msubsup>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>log</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mtext>E</mml:mtext>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>log</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo></mml:mrow></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>log</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>≡</mml:mo>
<mml:mn>0</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula>satisfies requirements (1) to (8), and is the most commonly used measure of entropy. In Physics it is usual to take the natural (Napierian) logarithm, with base <italic>e</italic>, while in Computer Science it is usual to take base 2 and in Engineering base 10. The opposite of the entropy, <italic>I</italic>(<italic>p</italic>) = −<italic>H</italic>(<italic>p</italic>), the Negentropy, is a measure of the Information available about the system.</p>
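<p>The requirements listed above can be spot-checked numerically. The following is a minimal sketch, assuming distributions are given as plain lists of probabilities; the function names <monospace>entropy</monospace> and <monospace>product_distribution</monospace>, the <monospace>base</monospace> parameter and the sample distributions are hypothetical choices made for this illustration only.</p>
<preformat>
```python
import math


def entropy(p, base=math.e):
    """Boltzmann-Gibbs-Shannon entropy H(p) = -sum_i p_i log(p_i), with 0 log(0) = 0."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)


def product_distribution(p, q):
    """Joint distribution r[i, j] = p[i] * q[j] of two independent systems, flattened."""
    return [pi * qj for pi in p for qj in q]


if __name__ == "__main__":
    p = [0.5, 0.25, 0.25]
    q = [0.9, 0.1]
    # Entropy of p measured in bits (base 2).
    print(entropy(p, base=2))
    # Requirement 4: adding an impossible state leaves the entropy unchanged.
    print(entropy(p + [0.0]) - entropy(p))
    # Requirement 5: a fully determined system has null entropy.
    print(entropy([0.0, 1.0, 0.0]))
    # Requirement 6: the uniform distribution has entropy at least that of p.
    print(entropy([1 / 3] * 3) - entropy(p))
    # Requirement 7: the maximal entropy grows with the number of states.
    print(entropy([1 / 4] * 4) - entropy([1 / 3] * 3))
    # Requirement 8: entropies of independent systems add up.
    print(entropy(product_distribution(p, q)) - entropy(p) - entropy(q))
```
</preformat>
<p>Changing <monospace>base</monospace> only rescales the entropy by a constant factor, so the qualitative checks do not depend on the unit chosen.</p>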
<p>For the Boltzmann-Gibbs-Shannon entropy we can extend requirement 8, and compute the composite Negentropy even without independence:
<disp-formula id="FD19">
<mml:math id="mm23" display="block">
<mml:semantics id="sm23">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>m</mml:mi></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mo>=</mml:mo>
<mml:mspace width="0.2em"/>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>m</mml:mi></mml:mrow></mml:msubsup>
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mo>log</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mo>=</mml:mo>
<mml:mspace width="0.2em"/>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>m</mml:mi></mml:mrow></mml:msubsup>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>Pr</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>log</mml:mo>
<mml:mspace width="0.1em"/>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mtext mathvariant="normal">Pr</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:mspace width="0.2em"/>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>n</mml:mi></mml:msubsup>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>log</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>m</mml:mi></mml:msubsup>
<mml:mrow>
<mml:mo>Pr</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>n</mml:mi></mml:msubsup>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>m</mml:mi></mml:msubsup>
<mml:mrow>
<mml:mtext mathvariant="normal">Pr</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>log</mml:mo>
<mml:mspace width="0.1em"/>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext>Pr</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:mspace width="0.2em"/>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>n</mml:mi></mml:msubsup>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>m</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mi>i</mml:mi></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mrow>
<mml:mtext>where</mml:mtext>
<mml:mo>,</mml:mo></mml:mrow></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msubsup>
<mml:mi>q</mml:mi>
<mml:mi>j</mml:mi>
<mml:mi>i</mml:mi></mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mtext>Pr</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>If we add this last identity as item number 9 in the list of requirements, we have a characterization of Boltzmann-Gibbs-Shannon entropy, see [<xref ref-type="bibr" rid="b87-information-02-00277">87</xref>–<xref ref-type="bibr" rid="b89-information-02-00277">89</xref>].</p>
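<p>The composite Negentropy identity can be checked on a small joint distribution that is not a product of its marginals. The sketch below is only illustrative: the joint table is stored as a list of rows <italic>r</italic><sub><italic>i</italic>,<italic>j</italic></sub> = <italic>p<sub>i</sub></italic> Pr(<italic>j</italic> | <italic>i</italic>), and the names <monospace>negentropy</monospace> and <monospace>composite_negentropy</monospace>, together with the sample table, are assumptions introduced here.</p>
<preformat>
```python
import math


def negentropy(p):
    """I(p) = sum_i p_i log(p_i), with 0 log(0) = 0 (the opposite of the entropy)."""
    return sum(pi * math.log(pi) for pi in p if pi > 0)


def composite_negentropy(r):
    """Return both sides of the identity for a joint table r[i][j] = p_i Pr(j | i).

    Left side: I_nm(r), computed directly from the joint table.
    Right side: I_n(p) + sum_i p_i I_m(q^i), with q^i_j = Pr(j | i).
    """
    p = [sum(row) for row in r]                 # marginal distribution p_i
    left = negentropy([rij for row in r for rij in row])
    right = negentropy(p)
    for pi, row in zip(p, r):
        if pi > 0:
            qi = [rij / pi for rij in row]      # conditional distribution q^i
            right += pi * negentropy(qi)
    return left, right


if __name__ == "__main__":
    # A joint distribution whose rows are not proportional to each other,
    # so the two coordinates are dependent.
    r = [[0.30, 0.10],
         [0.05, 0.25],
         [0.20, 0.10]]
    print(composite_negentropy(r))
```
</preformat>
<p>The two returned values should agree up to rounding; when every conditional distribution <italic>q<sup>i</sup></italic> equals the same <italic>q</italic>, the right-hand side reduces to the additive form of requirement 8.</p>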
<p>Like many important concepts, this measure of entropy was discovered and re-discovered several times in different contexts, and sometimes the uniqueness and identity of the concept were not immediately recognized. A well-known anecdote recounts the answer given by von Neumann when Shannon asked him what to call a “newly” discovered concept in Information Theory. As reported by Shannon on p. 180 of [<xref ref-type="bibr" rid="b98-information-02-00277">98</xref>]:
<disp-quote>
<p>“My greatest concern was what to call it. I thought of calling it information, but the word was overly used, so I decided to call it uncertainty. When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.”</p></disp-quote></p></sec>
<sec>
<label>4.3.</label>
<title>Csiszar's Divergence</title>
<p>In order to check that requirement (6) is satisfied, we can use (with <italic>q</italic> ∝ 1) the following lemma:</p>
<sec>
<title>Lemma</title>
<p>Shannon Inequality.</p>
<p>If <italic>p</italic> and <italic>q</italic> are two distributions over a system with <italic>n</italic> possible states, and <italic>q<sub>i</sub></italic>≠ 0, then the Information of <italic>p</italic> Relative to <italic>q</italic>, <italic>I<sub>n</sub></italic>(<italic>p</italic>, <italic>q</italic>), is positive, except if <italic>p</italic> = <italic>q</italic>, when it is null,
<disp-formula id="FD20">
<mml:math id="mm24" display="block">
<mml:semantics id="sm24">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>≡</mml:mo>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>n</mml:mi></mml:msubsup>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>log</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mo>,</mml:mo></mml:mrow></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>≥</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>⇒</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi>q</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p></sec>
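<sec>
<title>Example</title>
<p>Before the proof, the lemma can be illustrated numerically. The sketch below is a hedged illustration, assuming distributions given as lists with <italic>q<sub>i</sub></italic> ≠ 0; the names <monospace>relative_information</monospace> and <monospace>entropy</monospace> and the sample distributions are hypothetical. It also shows the special case of a uniform <italic>q</italic>, for which <italic>I<sub>n</sub></italic>(<italic>p</italic>, <italic>q</italic>) = log(<italic>n</italic>) − <italic>H<sub>n</sub></italic>(<italic>p</italic>), so that the lemma bounds the entropy by log(<italic>n</italic>), as stated in requirement (6).</p>
<preformat>
```python
import math


def relative_information(p, q):
    """I(p, q) = sum_i p_i log(p_i / q_i), assuming q_i != 0 and 0 log(0) = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def entropy(p):
    """H(p) = -sum_i p_i log(p_i), natural logarithm, with 0 log(0) = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]
    q = [0.2, 0.5, 0.3]
    print(relative_information(p, q))    # positive, since p differs from q
    print(relative_information(q, p))    # also positive; the measure is not symmetric
    print(relative_information(p, p))    # null when the two distributions coincide
    # Uniform q: I(p, q) equals log(n) - H(p).
    n = len(p)
    u = [1.0 / n] * n
    print(relative_information(p, u), math.log(n) - entropy(p))
```
</preformat></sec>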
<sec>
<title>Proof</title>
<p>By Jensen inequality, if <italic>φ</italic> is a convex function,
<disp-formula id="FD21">
<mml:math id="mm25" display="block">
<mml:semantics id="sm25">
<mml:mrow>
<mml:mtext>E</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>≥</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext>E</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>Taking
<disp-formula id="FD22">
<mml:math id="mm26" display="block">
<mml:semantics id="sm26">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>t</mml:mi>
<mml:mspace width="0.2em"/>
<mml:mo>ln</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mtext>and</mml:mtext>
<mml:mspace width="0.2em"/>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mtext>E</mml:mtext>
<mml:mi>q</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>n</mml:mi></mml:msubsup>
<mml:mrow>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>log</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>≥</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>log</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
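<p>The function <italic>φ</italic>(<italic>t</italic>) = <italic>t</italic> ln(<italic>t</italic>) is convex for <italic>t</italic> &gt; 0, since <italic>φ</italic>″(<italic>t</italic>) = 1/<italic>t</italic> &gt; 0, so Jensen's inequality applies with the expectation taken with respect to <italic>q</italic>. Shannon's inequality motivates the use of the Relative Information as a measure of (non symmetric) “distance” between distributions. In Statistics this measure is known as the Kullback-Leibler distance. The denominations Directed Divergence or Cross Information are used in Engineering. The proof of Shannon's inequality motivates the following generalization of divergence:</p></sec>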
<sec>
<title>Definition</title>
<p>Csiszar's <italic>φ</italic>-divergence.</p>
<p>Given a convex function <italic>φ</italic>,
<disp-formula id="FD23">
<mml:math id="mm27" display="block">
<mml:semantics id="sm27">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>φ</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mo>=</mml:mo>
<mml:mspace width="0.2em"/>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>n</mml:mi></mml:msubsup>
<mml:mrow>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mi>φ</mml:mi>
<mml:mrow>
<mml:mspace width="0.2em"/>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mi>φ</mml:mi>
<mml:mrow>
<mml:mspace width="0.2em"/>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mn>0</mml:mn>
<mml:mn>0</mml:mn></mml:mfrac></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mspace width="0.2em"/>
<mml:mo>=</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mn>0</mml:mn>
<mml:mspace width="0.1em"/>
<mml:mo>,</mml:mo>
<mml:mn>0</mml:mn>
<mml:mi>φ</mml:mi>
<mml:mrow>
<mml:mspace width="0.2em"/>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mi>c</mml:mi>
<mml:mn>0</mml:mn></mml:mfrac></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi>c</mml:mi>
<mml:munder>
<mml:mrow>
<mml:mo>lim</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>→</mml:mo>
<mml:mo>∞</mml:mo></mml:mrow></mml:munder>
<mml:mfrac>
<mml:mrow>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mi>t</mml:mi></mml:mfrac></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>For example, we can define the quadratic and the absolute divergence as
<disp-formula id="FD24">
<mml:math id="mm28" display="block">
<mml:semantics id="sm28">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mi>χ</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mo>=</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac>
<mml:mo>,</mml:mo></mml:mrow></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtext>for</mml:mtext>
<mml:mspace width="0.2em"/>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>b</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mo>=</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">|</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow>
<mml:mo stretchy="false">|</mml:mo></mml:mrow>
<mml:mo>,</mml:mo></mml:mrow></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtext>for</mml:mtext>
<mml:mspace width="0.2em"/>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p></sec></sec>
<sec>
<label>4.4.</label>
<title>Maximum Entropy under Constraints</title>
<p>This section analyzes solution techniques for some problems formulated as entropy maximization. The results obtained in this section are needed to derive some fundamental principles of Bayesian statistics, presented in the following sections. This section also presents the Bregman algorithm for solving constrained MaxEnt problems on finite distributions. The analysis of small problems (far from asymptotic conditions) poses many interesting questions in the study of subjective randomness, an area so far neglected in the literature.</p>
<p>Given a prior distribution, <italic>q</italic>, we would like to find a vector <italic>p</italic> that minimizes the Relative Information <italic>I<sub>n</sub></italic>(<italic>p</italic>,<italic>q</italic>), where <italic>p</italic> is constrained to be a probability distribution, and possibly subject to additional constraints on the expectations of functions taking values on the system's states; that is, we want
<disp-formula id="FD25">
<mml:math id="mm29" display="block">
<mml:semantics id="sm29">
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>∗</mml:mo>
<mml:mo>}</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>arg</mml:mo>
<mml:mo>min</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>≥</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>|</mml:mo>
<mml:msup>
<mml:mn mathvariant="bold">1</mml:mn>
<mml:mo>′</mml:mo></mml:msup>
<mml:mi>p</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mspace width="0.2em"/>
<mml:mtext>and</mml:mtext>
<mml:mspace width="0.2em"/>
<mml:mi>A</mml:mi>
<mml:mi>p</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>×</mml:mo>
<mml:mi>n</mml:mi></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p><italic>p</italic>* is the Minimum Information or Maximum Entropy (MaxEnt) distribution, relative to <italic>q</italic>, given the constraints {<italic>A</italic>, <italic>b</italic>}. We can write the probability normalization constraint as a generic linear constraint, including <bold>1</bold> and 1 as the <italic>m</italic>-th (or 0-th) rows of matrix <italic>A</italic> and vector <italic>b</italic>. In so doing, we do not need to keep any distinction between the normalization and the other constraints. In this article, the operators ⊙ and ⊘ indicate the point (element) wise product and division between matrices of the same dimension.</p>
<p>The Lagrangian function of this optimization problem, and its derivatives are:
<disp-formula id="FD26">
<mml:math id="mm30" display="block">
<mml:semantics id="sm30">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mi>p</mml:mi>
<mml:mo>′</mml:mo></mml:msup>
<mml:mo>log</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>⊘</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:mo>′</mml:mo></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi>A</mml:mi>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>L</mml:mi></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac>
<mml:mo>=</mml:mo>
<mml:mo>log</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">/</mml:mo>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:mo>′</mml:mo></mml:msup>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mi>i</mml:mi></mml:msup>
<mml:mo>,</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>L</mml:mi></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mfrac>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mi>p</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>Equating the <italic>n</italic> + <italic>m</italic> derivatives to zero, we have a system with <italic>n</italic> + <italic>m</italic> unknowns and equations, giving viability and optimality conditions (VOCs) for the problem:
<disp-formula id="FD27">
<mml:math id="mm31" display="block">
<mml:semantics id="sm31">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mtext>exp</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:mo>′</mml:mo></mml:msup>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mi>i</mml:mi></mml:msup>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mspace width="0.2em"/>
<mml:mtext>or</mml:mtext>
<mml:mspace width="0.2em"/>
<mml:mi>p</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>⊙</mml:mo>
<mml:mo>exp</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:mo>′</mml:mo></mml:msup>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>′</mml:mo></mml:msup>
<mml:mo>−</mml:mo>
<mml:mn mathvariant="bold">1</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mi>p</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>≥</mml:mo>
<mml:mn>0</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>We can further eliminate the unknown probabilities, <italic>p<sub>i</sub></italic>, writing the VOCs only in terms of <italic>w</italic>, the dual variables (Lagrange multipliers),
<disp-formula id="FD28">
<mml:math id="mm32" display="block">
<mml:semantics id="sm32">
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>≡</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mspace width="0.2em"/>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>⊙</mml:mo>
<mml:mo>exp</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:mo>′</mml:mo></mml:msup>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>′</mml:mo></mml:msup>
<mml:mo>−</mml:mo>
<mml:mn mathvariant="bold">1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>The last form of the VOCs motivates the use of iterative algorithms of Gauss-Seidel type, solving the problem by cyclic iteration. In this type of algorithm, one cyclically “fits” one equation of the system, for the current value of the other variables. For a detailed analysis of this type of algorithm, see [<xref ref-type="bibr" rid="b91-information-02-00277">91</xref>–<xref ref-type="bibr" rid="b95-information-02-00277">95</xref>] and [<xref ref-type="bibr" rid="b99-information-02-00277">99</xref>].</p>
<sec>
<title>Bregman Algorithm</title>
<p>Initialization: Take <italic>t</italic> = 0, <italic>w<sup>t</sup></italic> ∈ <italic>R<sup>m</sup></italic>, and
<disp-formula id="FD29">
<mml:math id="mm33" display="block">
<mml:semantics id="sm33">
<mml:mrow>
<mml:msubsup>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi></mml:msubsup>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>exp</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mo>′</mml:mo></mml:msup></mml:msup>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mi>i</mml:mi></mml:msup>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>Iteration step: for <italic>t</italic> = 1, 2, …, take
<disp-formula id="FD30">
<mml:math id="mm34" display="block">
<mml:semantics id="sm34">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>mod</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtext>and</mml:mtext></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi>ν</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>ν</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtext>where</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula>
<disp-formula id="FD31">
<mml:math id="mm35" display="block">
<mml:semantics id="sm35">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msup>
<mml:mspace width="0.3em"/>
<mml:mo>=</mml:mo>
<mml:mspace width="0.3em"/>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>w</mml:mi>
<mml:mn>1</mml:mn>
<mml:mi>t</mml:mi></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:msubsup>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>t</mml:mi></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>w</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>t</mml:mi></mml:msubsup>
<mml:mo>+</mml:mo>
<mml:mi>ν</mml:mi>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>t</mml:mi></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:msubsup>
<mml:mi>w</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>t</mml:mi></mml:msubsup></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow></mml:mrow>
<mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msubsup>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msubsup>
<mml:mspace width="0.3em"/>
<mml:mo>=</mml:mo>
<mml:mspace width="0.3em"/>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>exp</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>′</mml:mo></mml:mrow></mml:msup>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mi>i</mml:mi></mml:msup>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mspace width="0.3em"/>
<mml:mo>=</mml:mo>
<mml:mspace width="0.3em"/>
<mml:msubsup>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi></mml:msubsup>
<mml:mo>exp</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>ν</mml:mi>
<mml:msubsup>
<mml:mi>A</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>i</mml:mi></mml:msubsup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>ν</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mspace width="0.3em"/>
<mml:mo>=</mml:mo>
<mml:mspace width="0.3em"/>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:msup>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msup>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
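<p>As an illustration only, the iteration above can be sketched in Python with NumPy. The function name bregman_maxent, the use of Newton's method to solve φ(ν) = 0, the fixed step counts, and the six-sided-die test problem are assumptions made for this sketch and are not taken from the article. Newton's method is adequate here because the constraint rows are positive; for rows of mixed sign a bracketing root finder may be a safer choice.</p>
<preformat><![CDATA[
```python
import numpy as np

def bregman_maxent(A, b, q, n_steps=4000, newton_iters=50):
    """Cyclic (Gauss-Seidel style) fit of A p = b with p = q * exp(A'w - 1)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    q = np.asarray(q, dtype=float)
    m = A.shape[0]
    w = np.zeros(m)                      # initial dual variables w^0
    p = q * np.exp(w @ A - 1.0)          # p^0 from the initialization formula
    for t in range(1, n_steps + 1):
        k = t % m                        # constraint fitted at this step
        a = A[k]
        nu = 0.0
        for _ in range(newton_iters):    # solve phi(nu) = 0 (Newton: an assumption)
            scaled = p * np.exp(nu * a)
            phi = a @ scaled - b[k]      # phi(nu) = A_k p^{t+1} - b_k
            dphi = (a * a) @ scaled      # derivative of phi, nonnegative
            step = phi / dphi
            nu -= step
            if abs(step) < 1e-15:
                break
        w[k] += nu                       # only the k-th multiplier changes
        p = p * np.exp(nu * a)           # multiplicative update of p
    return p, w

# Hypothetical test problem (not from the article): a six-sided die with a
# uniform reference q, a normalization row, and a prescribed mean of 4.5.
faces = np.arange(1, 7)
A = np.vstack([np.ones(6), faces])
b = np.array([1.0, 4.5])
q = np.full(6, 1.0 / 6.0)
p_star, w_star = bregman_maxent(A, b, q)
```
]]></preformat>
<p>In this sketch the normalization is handled as row 0 of <italic>A</italic>, so no constraint receives special treatment. The returned <italic>p</italic> has the exponential form <italic>q<sub>i</sub></italic> exp(<italic>w</italic>′<italic>A<sup>i</sup></italic> − 1) by construction, since each step only rescales the current <italic>p</italic> by exp(<italic>νA<sub>k</sub></italic>).</p>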
<p>From our discussion of Entropy optimization under linear constraints, it should be clear that the maximum relative entropy distribution for a system under constraints on the expectation of functions taking values on the system's states,
<disp-formula id="FD32">
<mml:math id="mm36" display="block">
<mml:semantics id="sm36">
<mml:mrow>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msub>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>∫</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>d</mml:mi>
<mml:mi>x</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>(including the normalization constraint, <italic>a</italic><sub>0</sub> = <bold>1</bold>, <italic>b</italic><sub>0</sub> = 1) has the form
<disp-formula id="FD33">
<mml:math id="mm37" display="block">
<mml:semantics id="sm37">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>exp</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>Notice that we took <italic>θ</italic><sub>0</sub> = −(<italic>w</italic><sub>0</sub> − 1), <italic>θ<sub>k</sub></italic> = −<italic>w<sub>k</sub></italic>, and we have also indexed the state <italic>i</italic> by the variable <italic>x</italic>, so as to write the last equation in the standard form used in the statistical literature.</p>
<p>Several distributions commonly used in Statistics can be interpreted as MaxEnt densities (relative to the uniform distribution, if not otherwise stated) given some constraints over the expected value of state functions. For example:</p>
<p>The Normal distribution:
<disp-formula id="FD34">
<mml:math id="mm38" display="block">
<mml:semantics id="sm38">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>β</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>R</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>exp</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>−</mml:mo>
<mml:mfrac>
<mml:mi>n</mml:mi>
<mml:mn>2</mml:mn></mml:mfrac>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi>β</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>′</mml:mo></mml:msup>
<mml:mi>R</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi>β</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula> is characterized as the distribution of maximum entropy on <italic>R<sup>n</sup></italic>, given the expected values of its first and second moments, <italic>i.e.</italic>, mean vector <italic>β</italic> and inverse covariance or precision matrix <italic>R</italic>.</p>
<p>The Wishart distribution:
<disp-formula id="FD35">
<mml:math id="mm39" display="block">
<mml:semantics id="sm39">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>ν</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>ν</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>exp</mml:mo>
<mml:mspace width="0.3em"/>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>ν</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mn>2</mml:mn></mml:mfrac>
<mml:mo>log</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext>det</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mrow>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula> is characterized as the distribution of maximum entropy in the support <italic>S</italic> &gt; 0, given the expected value of the elements and log-determinant of matrix <italic>S</italic>. That is, writing Γ′ for the digamma function,
<disp-formula id="FD36">
<mml:math id="mm40" display="block">
<mml:semantics id="sm40">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mtext>E</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mspace width="0.3em"/>
<mml:mo>,</mml:mo>
<mml:mspace width="0.2em"/></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtext>E</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>log</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>det</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>d</mml:mi></mml:msubsup>
<mml:mrow>
<mml:msup>
<mml:mtext>Γ</mml:mtext>
<mml:mo>′</mml:mo></mml:msup>
<mml:mrow>
<mml:mspace width="0.2em"/>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>ν</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mn>2</mml:mn></mml:mfrac></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>The Dirichlet distribution:
<disp-formula id="FD37">
<mml:math id="mm41" display="block">
<mml:semantics id="sm41">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>exp</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>m</mml:mi></mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>log</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula> is characterized as the distribution of maximum entropy in the simplex support, <italic>x</italic> ≥ 0 | <bold>1</bold>′<italic>x</italic> = 1, given the expected values of the log-coordinates, E(log(<italic>x<sub>k</sub></italic>)). In this parameterization, E(<italic>x<sub>k</sub></italic>) = <italic>θ<sub>k</sub></italic>.</p></sec>
<sec>
<title>Jeffrey's Rule</title>
<p>Richard Jeffrey considered the problem of updating an old probability distribution, <italic>q</italic>, to a new distribution, <italic>p</italic>, given new constraints on the probabilities of a partition, that is,
<disp-formula id="FD38">
<mml:math id="mm42" display="block">
<mml:semantics id="sm42">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:msub>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>α</mml:mi>
<mml:mi>k</mml:mi></mml:msub></mml:mrow>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mo>∑</mml:mo>
<mml:mi>k</mml:mi></mml:msub>
<mml:mrow>
<mml:msub>
<mml:mi>α</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo></mml:mrow></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>∪</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>∪</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>m</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>{</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>}</mml:mo>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>l</mml:mi></mml:msub>
<mml:mo>∩</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>∅</mml:mo>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>≠</mml:mo>
<mml:mi>k</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>His solution to this problem, known as <italic>Jeffrey's rule</italic>, coincides with the minimum information divergence distribution, relative to <italic>q</italic>, given the new constraints. This solution can be expressed analytically as
<disp-formula id="FD39">
<mml:math id="mm43" display="block">
<mml:semantics id="sm43">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>α</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">/</mml:mo>
<mml:msub>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:msub>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula></p></sec></sec>
<sec>
<label>4.5.</label>
<title>Fisher's Metric and Jeffreys' Prior</title>
<p>In this section the Fisher Information Matrix is defined and used to obtain the geometrically invariant Jeffreys' prior distributions. These distributions also have interesting asymptotic properties concerning the representation of vague or no information. The properties of Fisher's metric discussed in this section are also needed to establish further asymptotic results in the next section.</p>
<p>The Fisher Information Matrix, <italic>J</italic>(<italic>θ</italic>), is defined as minus the expected Hessian of the log-likelihood. Under appropriate regularity conditions, the <italic>information geometry</italic> is defined by the metric in the parameter space given by the Fisher information matrix, that is, the geometric length of a curve is computed by integrating the form <italic>dl</italic><sup>2</sup> = <italic>dθ</italic>′ <italic>J</italic>(<italic>θ</italic>)<italic>dθ</italic>.</p>
<sec>
<title>Lemma</title>
<p>The Fisher information matrix can also be written as the covariance matrix of the gradient of the same log-likelihood, <italic>i.e.</italic>,
<disp-formula id="FD40">
<mml:math id="mm44" display="block">
<mml:semantics id="sm44">
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>≡</mml:mo>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mtext>E</mml:mtext>
<mml:mi>χ</mml:mi></mml:msub>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mo>∂</mml:mo>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mtext>E</mml:mtext>
<mml:mi>χ</mml:mi></mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula></p></sec>
<sec>
<title>Proof</title>
<p>
<disp-formula id="FD41">
<mml:math id="mm45" display="block">
<mml:semantics id="sm45">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mo>∫</mml:mo>
<mml:mi>χ</mml:mi></mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>d</mml:mi>
<mml:mi>x</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>⇒</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mo>∫</mml:mo>
<mml:mi>χ</mml:mi></mml:msub>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac>
<mml:mi>d</mml:mi>
<mml:mi>x</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>⇒</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mo>∫</mml:mo>
<mml:mi>χ</mml:mi></mml:msub>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac>
<mml:mfrac>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac>
<mml:mi>d</mml:mi>
<mml:mi>x</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mo>∫</mml:mo>
<mml:mi>χ</mml:mi></mml:msub>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>d</mml:mi>
<mml:mi>x</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula> differentiating again with respect to the parameter,
<disp-formula id="FD42">
<mml:math id="mm46" display="block">
<mml:semantics id="sm46">
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mo>∫</mml:mo>
<mml:mi>χ</mml:mi></mml:msub>
<mml:mrow>
<mml:mspace width="0.2em"/>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mo>∂</mml:mo>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow>
<mml:mspace width="0.2em"/>
<mml:mi>d</mml:mi>
<mml:mi>x</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn></mml:mrow></mml:semantics></mml:math></disp-formula> observing that the second term can be written as
<disp-formula id="FD43">
<mml:math id="mm47" display="block">
<mml:semantics id="sm47">
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mo>∫</mml:mo>
<mml:mi>χ</mml:mi></mml:msub>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac>
<mml:mfrac>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac>
<mml:mi>d</mml:mi>
<mml:mi>x</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mo>∫</mml:mo>
<mml:mi>χ</mml:mi></mml:msub>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>d</mml:mi>
<mml:mi>x</mml:mi></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula> we obtain the lemma.</p>
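<p>The lemma can be checked numerically on a simple model. The following Python sketch uses a single Bernoulli observation, a choice made here for illustration and not discussed in the article; the function name bernoulli_fisher_two_ways is likewise hypothetical. It evaluates minus the expected second derivative of the log-likelihood and the expected squared score by direct summation over the two outcomes.</p>
<preformat><![CDATA[
```python
def bernoulli_fisher_two_ways(theta):
    """Return (minus expected Hessian, expected squared score) for Bernoulli(theta)."""
    minus_hessian = 0.0
    squared_score = 0.0
    for x in (0, 1):
        prob = theta if x == 1 else 1.0 - theta
        # log p(x|theta) = x log(theta) + (1 - x) log(1 - theta)
        score = x / theta - (1 - x) / (1.0 - theta)             # first derivative
        hessian = -x / theta**2 - (1 - x) / (1.0 - theta)**2    # second derivative
        minus_hessian -= hessian * prob
        squared_score += score * score * prob
    return minus_hessian, squared_score

theta = 0.3
j_hessian, j_score = bernoulli_fisher_two_ways(theta)
closed_form = 1.0 / (theta * (1.0 - theta))   # known Fisher information of the model
```
]]></preformat>
<p>For this model both expressions reduce to 1/(<italic>θ</italic>(1 − <italic>θ</italic>)), so the two computed values should agree with each other and with the closed form up to rounding.</p>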
<p>Harold Jeffreys used the Fisher metric to define a class of prior distributions, proportional to the square root of the determinant of the information matrix,
<disp-formula id="FD44">
<mml:math id="mm48" display="block">
<mml:semantics id="sm48">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∝</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow></mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">/</mml:mo>
<mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:semantics></mml:math></disp-formula></p>
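<p>As a small worked case, again assuming a single Bernoulli observation (an example chosen for this sketch, not taken from the article), the Fisher information is 1/(<italic>θ</italic>(1 − <italic>θ</italic>)) and the prior above is proportional to <italic>θ</italic><sup>−1/2</sup>(1 − <italic>θ</italic>)<sup>−1/2</sup>. Its normalizing constant over (0, 1) is π, which gives the Beta(1/2, 1/2) density. The helper names below are hypothetical, and the midpoint rule is used only because the integrand is smooth on the chosen interval.</p>
<preformat><![CDATA[
```python
import math

def fisher_information(theta):
    """Fisher information of a single Bernoulli(theta) observation."""
    return 1.0 / (theta * (1.0 - theta))

def jeffreys_density(theta):
    """Normalized prior |J(theta)|^(1/2) / pi, i.e. the Beta(1/2, 1/2) density."""
    return math.sqrt(fisher_information(theta)) / math.pi

def midpoint_integral(f, lo, hi, panels=20000):
    """Plain midpoint rule, adequate for a smooth integrand on [lo, hi]."""
    h = (hi - lo) / panels
    return h * sum(f(lo + (j + 0.5) * h) for j in range(panels))

# Prior probability assigned to the central interval [0.25, 0.75].
mass_middle = midpoint_integral(jeffreys_density, 0.25, 0.75)
```
]]></preformat>
<p>The antiderivative of this density is (2/π) arcsin(√<italic>θ</italic>), so the computed mass can be compared with the exact value 1/3. The prior places the remaining two thirds of its mass on the outer half of the parameter range.</p>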
<p>Lemma: Jeffreys' priors are geometric objects in the sense of being invariant under a continuous and differentiable change of coordinates in the parameter space, <italic>η</italic> = <italic>f</italic>(<italic>θ</italic>). The proof follows pp. 41–54 of [<xref ref-type="bibr" rid="b100-information-02-00277">100</xref>]:</p></sec>
<sec>
<title>Proof</title>
<p>
<disp-formula id="FD45">
<mml:math id="mm49" display="block">
<mml:semantics id="sm49">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mspace width="0.1em"/>
<mml:mo>=</mml:mo>
<mml:mspace width="0.1em"/>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>η</mml:mi></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow>
<mml:mspace width="0.3em"/>
<mml:mi>J</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>η</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mspace width="0.3em"/>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>η</mml:mi></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow></mml:mrow>
<mml:mo>′</mml:mo></mml:msup>
<mml:mspace width="0.3em"/>
<mml:mo>,</mml:mo>
<mml:mspace width="0.3em"/>
<mml:mtext>hence</mml:mtext></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mspace width="0.1em"/>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow></mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">/</mml:mo>
<mml:mn>2</mml:mn></mml:mrow></mml:msup>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>η</mml:mi></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow>
<mml:mspace width="0.3em"/>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mspace width="0.1em"/>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>η</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow></mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">/</mml:mo>
<mml:mn>2</mml:mn></mml:mrow></mml:msup>
<mml:mo>,</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mtext>and</mml:mtext></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow></mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">/</mml:mo>
<mml:mn>2</mml:mn></mml:mrow></mml:msup>
<mml:mi>d</mml:mi>
<mml:mi>θ</mml:mi>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mspace width="0.2em"/>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>η</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow></mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">/</mml:mo>
<mml:mn>2</mml:mn></mml:mrow></mml:msup>
<mml:mspace width="0.2em"/>
<mml:mi>d</mml:mi>
<mml:mi>η</mml:mi>
<mml:mo>.</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mtext>Q.E.D.</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
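<p>The invariance stated in the lemma can be checked numerically in a simple case. The following is a minimal sketch, assuming a single Bernoulli observation and the log-odds reparametrization <italic>η</italic> = log(<italic>θ</italic>/(1 − <italic>θ</italic>)); both the model and the function names are illustrative choices, not taken from the cited proof.</p>
<preformat>
```python
import math

def fisher_theta(theta):
    """Fisher information of one Bernoulli trial in the probability parameter theta."""
    return 1.0 / (theta * (1.0 - theta))

def fisher_eta(eta):
    """Fisher information of the same trial in the log-odds parameter eta."""
    theta = 1.0 / (1.0 + math.exp(-eta))
    return theta * (1.0 - theta)

def jacobian(theta):
    """d eta / d theta for eta = log(theta / (1 - theta))."""
    return 1.0 / (theta * (1.0 - theta))

def transformed_density(theta):
    """|d eta / d theta| * |J(eta)|^(1/2), the right-hand side of the lemma."""
    eta = math.log(theta / (1.0 - theta))
    return abs(jacobian(theta)) * math.sqrt(fisher_eta(eta))

# Compare |J(theta)|^(1/2) with the transformed expression at a few points.
for theta in (0.1, 0.35, 0.8):
    lhs = math.sqrt(fisher_theta(theta))
    rhs = transformed_density(theta)
    print(theta, lhs, rhs)
```
</preformat>
<p>Both columns should agree up to rounding, which is the second line of the proof specialized to one dimension.</p>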
<p>Example: For the multinomial distribution,
<disp-formula id="FD46">
<mml:math id="mm50" display="block">
<mml:semantics id="sm50">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>!</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mo>∏</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>m</mml:mi></mml:msubsup>
<mml:mrow>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msubsup></mml:mrow></mml:mrow>
<mml:mo stretchy="false">/</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mo>∏</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>m</mml:mi></mml:msubsup>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>!</mml:mo>
<mml:mspace width="0.3em"/></mml:mrow>
<mml:mo>,</mml:mo>
<mml:mspace width="0.2em"/>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>m</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msubsup>
<mml:mrow>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mspace width="0.2em"/>
<mml:mo>,</mml:mo>
<mml:mspace width="0.2em"/>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>m</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>−</mml:mo>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msubsup>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>m</mml:mi></mml:msubsup>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>log</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mo>∂</mml:mo>
<mml:mn>2</mml:mn></mml:msup>
<mml:mi>L</mml:mi></mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>∂</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mi>i</mml:mi>
<mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mfrac>
<mml:mo>−</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>m</mml:mi></mml:msub></mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mi>m</mml:mi>
<mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mfrac>
<mml:mo>,</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mo>∂</mml:mo>
<mml:mn>2</mml:mn></mml:msup>
<mml:mi>L</mml:mi></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>∂</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mfrac>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>m</mml:mi></mml:msub></mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mi>m</mml:mi>
<mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mfrac>
<mml:mspace width="0.2em"/>
<mml:mo>,</mml:mo>
<mml:mspace width="0.3em"/>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>…</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mtext>E</mml:mtext>
<mml:mi>χ</mml:mi></mml:msub>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mo>∂</mml:mo>
<mml:mn>2</mml:mn></mml:msup>
<mml:mi>L</mml:mi></mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>∂</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:mfrac>
<mml:mo>,</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mtext>E</mml:mtext>
<mml:mi>χ</mml:mi></mml:msub>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mo>∂</mml:mo>
<mml:mn>2</mml:mn></mml:msup>
<mml:mi>L</mml:mi></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>∂</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mfrac>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>m</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msup>
<mml:mspace width="0.3em"/>
<mml:mo>,</mml:mo>
<mml:mspace width="0.3em"/>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∝</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>m</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">/</mml:mo>
<mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∝</mml:mo>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">/</mml:mo>
<mml:mn>2</mml:mn></mml:mrow></mml:msubsup>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mn>2</mml:mn>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">/</mml:mo>
<mml:mn>2</mml:mn></mml:mrow></mml:msubsup>
<mml:mo>…</mml:mo>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mi>m</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>m</mml:mi></mml:msub>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">/</mml:mo>
<mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
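<p>The determinant in the multinomial example can be verified numerically. The sketch below, using hypothetical values for <italic>θ</italic> and <italic>n</italic>, builds the (<italic>m</italic> − 1) × (<italic>m</italic> − 1) expected information matrix from the entries displayed above and compares its determinant with <italic>n</italic><sup><italic>m</italic>−1</sup>/(<italic>θ</italic><sub>1</sub><italic>θ</italic><sub>2</sub> … <italic>θ<sub>m</sub></italic>). The factor <italic>n</italic><sup><italic>m</italic>−1</sup> does not depend on <italic>θ</italic>, so it is absorbed by the proportionality sign in the prior.</p>
<preformat>
```python
import math

def fisher_matrix(theta, n=1):
    """Expected information matrix for the free parameters theta[0..m-2].

    theta is the full probability vector (length m, summing to one); the
    entries follow the expectations displayed above.
    """
    m = len(theta)
    last = theta[m - 1]
    return [[n / last + (n / theta[i] if i == j else 0.0)
             for j in range(m - 1)] for i in range(m - 1)]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
    return det

theta = [0.2, 0.3, 0.1, 0.4]   # hypothetical probabilities, m = 4
n = 5                          # hypothetical number of trials
numeric = determinant(fisher_matrix(theta, n))
closed_form = n ** (len(theta) - 1) / math.prod(theta)
print(numeric, closed_form)
```
</preformat>
<p>The two printed values should coincide up to rounding error. Plain Gaussian elimination is used only to keep the sketch free of external libraries.</p>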
<p>In general, Jeffreys' priors are not minimally informative in any sense. However, on pp. 41–54 of [<xref ref-type="bibr" rid="b100-information-02-00277">100</xref>], Zellner gives the following argument (attributed to Lindley) to present Jeffreys' priors as “knowing little” in the sense of being asymptotically minimally informative. The following equations give several definitions related to the concept of information gain, which is expressed as the prior average information associated with an observation minus the prior information measure: <italic>I</italic>(<italic>θ</italic>), the information measure of <italic>p</italic>(<italic>x</italic> | <italic>θ</italic>); <italic>A</italic>, the prior average information associated with an observation; <italic>G</italic>, the information gain; and <italic>G<sub>a</sub></italic>, the asymptotic information gain:
<disp-formula id="FD47">
<mml:math id="mm51" display="block">
<mml:semantics id="sm51">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>∫</mml:mo>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>d</mml:mi>
<mml:mi>x</mml:mi>
<mml:mo>;</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>∫</mml:mo>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>d</mml:mi>
<mml:mi>θ</mml:mi></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo>−</mml:mo>
<mml:mrow>
<mml:mo>∫</mml:mo>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>d</mml:mi>
<mml:mi>θ</mml:mi>
<mml:mo>;</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>a</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>∫</mml:mo>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>log</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>J</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>|</mml:mo></mml:mrow></mml:msqrt>
<mml:mi>d</mml:mi>
<mml:mi>θ</mml:mi>
<mml:mo>−</mml:mo>
<mml:mrow>
<mml:mo>∫</mml:mo>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>d</mml:mi>
<mml:mi>θ</mml:mi></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>Although Jeffreys' priors in general do not maximize the information gain, <italic>G</italic>, the asymptotic convergence results presented in the next section imply that Jeffreys' priors maximize the asymptotic information gain, <italic>G<sub>a</sub></italic>. For further details and generalizations, see [<xref ref-type="bibr" rid="b101-information-02-00277">101</xref>–<xref ref-type="bibr" rid="b110-information-02-00277">110</xref>].</p>
<p>Comparing the several versions of noninformative priors in the multinomial example, one can say that Jeffreys' prior “discounts” half an observation of each kind, while the maxent prior discounts one full observation, and the flat prior discounts none. Similarly, slightly different versions of uninformative priors for the multivariate normal distribution are shown in [<xref ref-type="bibr" rid="b106-information-02-00277">106</xref>]. This situation leads to the possible criticism stated by Berger on p. 89 of [<xref ref-type="bibr" rid="b104-information-02-00277">104</xref>]:
<disp-quote>
<p>Perhaps the most embarrassing feature of noninformative priors, however, is simply that there are often so many of them.</p></disp-quote></p>
<p>One response to this criticism, to which Berger explicitly subscribes on p. 90 of [<xref ref-type="bibr" rid="b104-information-02-00277">104</xref>], is that
<disp-quote>
<p>It is rare for the choice of a noninformative prior to markedly affect the answer… so that any reasonable noninformative prior can be used. Indeed, if the choice of noninformative prior does have a pronounced effect on the answer, then one is probably in a situation where it is crucial to involve subjective prior information.</p></disp-quote></p>
<p>The robustness of the inference procedures to variations in the form of the uninformative prior can be tested using sensitivity analysis, as discussed in Section 4.7 of [<xref ref-type="bibr" rid="b104-information-02-00277">104</xref>]. For alternative approaches to robustness and sensitivity analysis based on paraconsistent logic, see [<xref ref-type="bibr" rid="b4-information-02-00277">4</xref>,<xref ref-type="bibr" rid="b5-information-02-00277">5</xref>].</p></sec></sec>
<sec>
<label>4.6.</label>
<title>Posterior Asymptotic Convergence</title>
<p>Posterior convergence constitutes the principal mechanism enabling information acquisition or learning in Bayesian statistics. Arguments based on relative information, <italic>I</italic>(<italic>p</italic>, <italic>q</italic>), can be used to prove fundamental results concerning posterior distribution asymptotic convergence. This section presents two of these fundamental results, following Appendix B of [<xref ref-type="bibr" rid="b96-information-02-00277">96</xref>].</p>
<sec>
<title>Theorem</title>
<p>Posterior Consistency for Discrete Parameters:</p>
<p>Consider a model where <italic>f</italic>(<italic>θ</italic>) is the prior in a discrete parameter space, Θ = {<italic>θ</italic><sup>1</sup>, <italic>θ</italic><sup>2</sup>, …}, <italic>X</italic> = [<italic>x</italic><sup>1</sup>, … <italic>x<sup>n</sup></italic>] is a series of observations, and the posterior is given by
<disp-formula id="FD48">
<mml:math id="mm52" display="block">
<mml:semantics id="sm52">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mi>k</mml:mi></mml:msup>
<mml:mo>|</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∝</mml:mo>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mi>k</mml:mi></mml:msup>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>|</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mi>k</mml:mi></mml:msup>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mi>k</mml:mi></mml:msup>
<mml:mo stretchy="false">)</mml:mo>
<mml:msubsup>
<mml:mo>∏</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>n</mml:mi></mml:msubsup>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msup>
<mml:mo>|</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mi>k</mml:mi></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>Further, assume that this model has a unique vector parameter, <italic>θ</italic><sup>0</sup>, giving the best approximation for the “true” predictive distribution <italic>g</italic>(<italic>x</italic>), in the sense that it minimizes the relative information
<disp-formula id="FD49">
<mml:math id="mm53" display="block">
<mml:semantics id="sm53">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mn>0</mml:mn></mml:msup>
<mml:mo>}</mml:mo>
<mml:mspace width="0.3em"/>
<mml:mo>=</mml:mo>
<mml:mspace width="0.3em"/>
<mml:mo>arg</mml:mo>
<mml:munder>
<mml:mo>min</mml:mo>
<mml:mi>k</mml:mi></mml:munder>
<mml:mi>I</mml:mi>
<mml:mspace width="0.1em"/>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>|</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mi>k</mml:mi></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mspace width="0.2em"/>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>|</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mi>k</mml:mi></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mspace width="0.3em"/>
<mml:mo>=</mml:mo>
<mml:mspace width="0.3em"/>
<mml:mrow>
<mml:msub>
<mml:mo>∫</mml:mo>
<mml:mi>χ</mml:mi></mml:msub>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>log</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>|</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mi>k</mml:mi></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow>
<mml:mo>)</mml:mo>
<mml:mspace width="0.3em"/></mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>x</mml:mi>
<mml:mspace width="0.3em"/>
<mml:mo>=</mml:mo>
<mml:mspace width="0.3em"/>
<mml:msub>
<mml:mtext>E</mml:mtext>
<mml:mi>χ</mml:mi></mml:msub>
<mml:mo>log</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>|</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mi>k</mml:mi></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>Then,
<disp-formula id="FD50">
<mml:math id="mm54" display="block">
<mml:semantics id="sm54">
<mml:mrow>
<mml:munder>
<mml:mrow>
<mml:mo>lim</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>→</mml:mo>
<mml:mo>∞</mml:mo>
<mml:mspace width="0.2em"/></mml:mrow></mml:munder>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mi>k</mml:mi></mml:msup>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>δ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mi>k</mml:mi></mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mn>0</mml:mn></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula></p></sec>
<sec>
<title>Heuristic Argument</title>
<p>Consider the logarithmic coefficient
<disp-formula id="FD51">
<mml:math id="mm55" display="block">
<mml:semantics id="sm55">
<mml:mrow>
<mml:mo>log</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mi>k</mml:mi></mml:msup>
<mml:mo>|</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mn>0</mml:mn></mml:msup>
<mml:mo>|</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mo>=</mml:mo>
<mml:mo>log</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mi>k</mml:mi></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mn>0</mml:mn></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>n</mml:mi></mml:msubsup>
<mml:mrow>
<mml:mo>log</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msup>
<mml:mo>|</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mi>k</mml:mi></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msup>
<mml:mo>|</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mn>0</mml:mn></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>The first term is a constant, and the second term is a sum whose terms all have negative expected value (relative to <italic>x</italic>, for <italic>k</italic> ≠ 0), namely <italic>I</italic>(<italic>g</italic>(<italic>x</italic>), <italic>p</italic>(<italic>x</italic>|<italic>θ</italic><sup>0</sup>)) − <italic>I</italic>(<italic>g</italic>(<italic>x</italic>), <italic>p</italic>(<italic>x</italic>|<italic>θ<sup>k</sup></italic>)), since, by our hypotheses, <italic>θ</italic><sup>0</sup> is the unique argument that minimizes <italic>I</italic>(<italic>g</italic>(<italic>x</italic>), <italic>p</italic>(<italic>x</italic>|<italic>θ<sup>k</sup></italic>)). Hence (for <italic>k</italic> ≠ 0), the right-hand side goes to minus infinity as <italic>n</italic> increases. Therefore, on the left-hand side, <italic>f</italic>(<italic>θ<sup>k</sup></italic> | <italic>X</italic>) must go to zero. Since the total probability adds to one, <italic>f</italic>(<italic>θ</italic><sup>0</sup> | <italic>X</italic>) must go to one, Q.E.D.</p>
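<p>The heuristic argument can be illustrated by simulation. The sketch below assumes a Bernoulli model with a hypothetical three-point parameter space and a “true” success probability that lies outside the model; all numerical values and function names are illustrative. It prints the posterior for growing <italic>n</italic>, together with the grid point that minimizes the relative information.</p>
<preformat>
```python
import math
import random

def kl_bernoulli(g, theta):
    """Relative information I(g, p(.|theta)) between two Bernoulli laws."""
    return g * math.log(g / theta) + (1 - g) * math.log((1 - g) / (1 - theta))

def posterior(prior, thetas, data):
    """Posterior over a finite grid, computed in log space for stability."""
    ones = sum(data)
    zeros = len(data) - ones
    logs = [math.log(f) + ones * math.log(t) + zeros * math.log(1 - t)
            for f, t in zip(prior, thetas)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

thetas = [0.2, 0.5, 0.7]      # hypothetical discrete parameter space
prior = [0.6, 0.1, 0.3]       # any prior with positive mass on every point
g = 0.55                      # "true" success probability, not in the model
rng = random.Random(0)        # fixed seed so the run is reproducible
data = [1 if g > rng.random() else 0 for _ in range(5000)]

best = min(thetas, key=lambda t: kl_bernoulli(g, t))
for n in (10, 100, 1000, 5000):
    print(n, posterior(prior, thetas, data[:n]))
print("minimizer of relative information:", best)
```
</preformat>
<p>With these values the minimizer is 0.5, and the posterior mass at that point should approach one as <italic>n</italic> grows, even though the prior gives it the smallest weight.</p>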
<p>We can extend this result to continuous parameter spaces, assuming several regularity conditions, like continuity, differentiability, and having the argument <italic>θ</italic><sup>0</sup> as an interior point of Θ with the appropriate topology. In such a context, we can state that, given a pre-established small neighborhood around <italic>θ</italic><sup>0</sup>, like <italic>C</italic>(<italic>θ</italic><sup>0</sup>, <italic>ε</italic>), the cube of side size <italic>ε</italic> centered at <italic>θ</italic><sup>0</sup>, this neighborhood concentrates almost all the mass of <italic>f</italic>(<italic>θ</italic> | <italic>X</italic>) as the number of observations grows to infinity. Under the same regularity conditions, the Maximum a Posteriori (MAP) estimator is also a consistent estimator, <italic>i.e.</italic>, <italic>θ̂</italic> → <italic>θ</italic><sup>0</sup>.</p>
<p>The next result shows the convergence in distribution of the posterior to a Normal distribution. For that, we need the Fisher information matrix identity from the last section.</p></sec>
<sec>
<title>Theorem</title>
<p>Posterior Normal Approximation:</p>
<p>The posterior distribution converges to a Normal distribution with mean <italic>θ</italic><sup>0</sup> and precision <italic>nJ</italic>(<italic>θ</italic><sup>0</sup>).</p></sec>
<sec>
<title>Proof (heuristic)</title>
<p>We only have to write the second order log-posterior Taylor expansion centered at <italic>θ̂</italic>,
<disp-formula id="FD52">
<mml:math id="mm56" display="block">
<mml:semantics id="sm56">
<mml:mrow>
<mml:mo>log</mml:mo>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>log</mml:mo>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo stretchy="true">^</mml:mo></mml:mover>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mo>log</mml:mo>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo stretchy="true">^</mml:mo></mml:mover>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo>−</mml:mo>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo stretchy="true">^</mml:mo></mml:mover>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn></mml:mfrac>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>θ</mml:mi>
<mml:mo>−</mml:mo>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo stretchy="true">^</mml:mo></mml:mover>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>′</mml:mo></mml:msup>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mo>∂</mml:mo>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>log</mml:mo>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo stretchy="true">^</mml:mo></mml:mover>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo>−</mml:mo>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo stretchy="true">^</mml:mo></mml:mover>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mi>𝒪</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo>−</mml:mo>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo stretchy="true">^</mml:mo></mml:mover>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mn>3</mml:mn></mml:msup></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>The term of order zero is a constant. The linear term is null, for <italic>θ̂</italic> is the MAP estimator at an interior point of Θ. The Hessian in the quadratic term is
<disp-formula id="FD53">
<mml:math id="mm57" display="block">
<mml:semantics id="sm57">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo stretchy="true">^</mml:mo></mml:mover>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mo>∂</mml:mo>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>log</mml:mo>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo stretchy="true">^</mml:mo></mml:mover>
<mml:mo>|</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mo>∂</mml:mo>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>log</mml:mo>
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo stretchy="true">^</mml:mo></mml:mover>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>n</mml:mi></mml:msubsup>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mo>∂</mml:mo>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>log</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msup>
<mml:mo>|</mml:mo>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo stretchy="true">^</mml:mo></mml:mover>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:msup>
<mml:mi>θ</mml:mi>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula></p>
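<p>The behavior of this Hessian can be inspected numerically. The following sketch assumes a Bernoulli model with a flat prior (so the prior term of the Hessian vanishes) and a hypothetical true parameter <italic>θ</italic><sup>0</sup> = 0.3; it compares the Hessian at the MAP estimate with −<italic>nJ</italic>(<italic>θ</italic><sup>0</sup>), the negative of the precision stated in the theorem.</p>
<preformat>
```python
import math
import random

def hessian_log_posterior(theta, successes, failures):
    """Second derivative of the Bernoulli log-likelihood; with a flat prior
    the prior term of the Hessian vanishes."""
    return -successes / theta ** 2 - failures / (1.0 - theta) ** 2

def fisher_information(theta):
    """J(theta) for a single Bernoulli observation."""
    return 1.0 / (theta * (1.0 - theta))

theta0 = 0.3                  # hypothetical true parameter
rng = random.Random(1)        # fixed seed so the run is reproducible
draws = [1 if theta0 > rng.random() else 0 for _ in range(20000)]

ratios = []
for n in (100, 1000, 20000):
    s = sum(draws[:n])
    theta_hat = s / n         # MAP estimate under the flat prior
    h = hessian_log_posterior(theta_hat, s, n - s)
    # Ratio of the observed curvature to n * J(theta0); it should approach one.
    ratios.append(-h / (n * fisher_information(theta0)))
    print(n, theta_hat, h, ratios[-1])
```
</preformat>
<p>The Hessian is negative and grows roughly linearly with <italic>n</italic>, and the ratio in the last column should settle near one as the MAP estimate approaches <italic>θ</italic><sup>0</sup>.</p>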
<p>The Hessian is negative definite, by the regularity conditions, and because <italic>θ̂</italic> is the MAP estimator. The first term is constant, and the second is the sum of <italic>n</italic> i.i.d. random variables. On the other hand, we have already shown that the MAP estimator, and also all the posterior mass, concentrates around <italic>θ</italic><sup>0</sup>. We also see that the Hessian grows (on average) linearly with <italic>n</italic>, and that the higher order terms cannot grow super-linearly. Also, for a given <italic>n</italic> and <italic>θ</italic> → <italic>θ̂</italic>, the quadratic term dominates all higher order terms. Hence, the quadratic approximation of the log-posterior is increasingly precise, Q.E.D.</p></sec></sec></sec>
<sec>
<label>5.</label>
<title>Final Remarks</title>
<p>The objections raised by Spencer-Brown against probability and statistics, analyzed in Sections 1 and 2, are somewhat simplistic and stereotypical, possibly explaining why they had little influence outside a close circle of admirers, most of them related to the radical constructivism movement. However, arguments very similar to those used to demystify Spencer-Brown's misconceptions and elucidate his misunderstandings reappear in more subtle or abstract forms in the analysis of far more technical matters, for example, the use and interpretation of prior and posterior distributions in Bayesian statistics.</p>
<p>In this article, entropy is presented as a cornerstone concept for the precise analysis and a key idea for the correct understanding of several important topics in probability and statistics. This understanding should help to clear the way for establishing Bayesian statistics as a preferred tool for scientific inference in mainstream cognitive constructivism.</p></sec></body>
<back>
<sec sec-type="display-objects">
<title>Figures and Table</title>
<fig id="f1-information-02-00277" position="float">
<label>Figure 1.</label>
<caption>
<p>(Pseudo-)random and quasi-random point sets on the unit box.</p></caption>
<graphic xlink:href="information-02-00277f1.gif"/></fig>
<fig id="f2-information-02-00277" position="float">
<label>Figure 2.</label>
<caption>
<p>EN, <italic>H</italic><sub>2</sub>-entropy <italic>vs.</italic> AR, apparent randomness. Probability of black-white pixel alternation.</p></caption>
<graphic xlink:href="information-02-00277f2.gif"/></fig></sec>
<ack>
<p>The author is grateful for the support of the Department of Applied Mathematics of the Institute of Mathematics and Statistics of the University of São Paulo, FAPESP—Fundação de Amparo à Pesquisa do Estado de São Paulo, and CNPq—Conselho Nacional de Desenvolvimento Científico e Tecnológico (grant PQ-306318-2008-3). The author is also grateful for the helpful discussions with several of his professional colleagues, including Carlos Alberto de Braganca Pereira, Fernando Bonassi, Luis Esteves, Marcelo de Souza Lauretto, Rafael Bassi Stern, Sergio Wechsler and Wagner Borges.</p></ack>
<ref-list>
<title>References</title>
<ref id="b1-information-02-00277"><label>1.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Borges</surname><given-names>W.</given-names></name><name><surname>Stern</surname><given-names>J.M.</given-names></name></person-group><article-title>The rules of logic composition for the Bayesian epistemic e-values</article-title><source>Log. J. IGPL</source><year>2007</year><volume>15</volume><fpage>401</fpage><lpage>420</lpage><pub-id pub-id-type="doi">10.1093/jigpal/jzm032</pub-id></citation></ref>
<ref id="b2-information-02-00277"><label>2.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pereira</surname><given-names>C.A.B.</given-names></name><name><surname>Stern</surname><given-names>J.M.</given-names></name></person-group><article-title>Evidence and credibility: Full Bayesian significance test for precise hypotheses</article-title><source>Entropy</source><year>1999</year><volume>1</volume><fpage>69</fpage><lpage>80</lpage><pub-id pub-id-type="doi">10.3390/e1040069</pub-id></citation></ref>
<ref id="b3-information-02-00277"><label>3.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pereira</surname><given-names>C.A.B.</given-names></name><name><surname>Wechsler</surname><given-names>S.</given-names></name><name><surname>Stern</surname><given-names>J.M.</given-names></name></person-group><article-title>Can a significance test be genuinely Bayesian?</article-title><source>Bayesian Anal.</source><year>2008</year><volume>3</volume><fpage>79</fpage><lpage>100</lpage><pub-id pub-id-type="doi">10.1214/08-BA303</pub-id></citation></ref>
<ref id="b4-information-02-00277"><label>4.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Stern</surname><given-names>J.M.</given-names></name></person-group><article-title>Significance Tests, Belief Calculi, and Burden of Proof in Legal and Scientific Discourse</article-title><source>Laptec-2003, Frontiers in Artificial Intelligence and Its Applications</source><publisher-name>IOS Press</publisher-name><publisher-loc>Amsterdam, The Netherlands</publisher-loc><year>2003</year><volume>101</volume><fpage>139</fpage><lpage>147</lpage></citation></ref>
<ref id="b5-information-02-00277"><label>5.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Stern</surname><given-names>J.M.</given-names></name></person-group><article-title>Paraconsistent Sensitivity Analysis for Bayesian Significance Tests</article-title><comment>SBIA'04</comment><source>Lecture Notes in Artificial Intelligence</source><person-group person-group-type="editor"><name><surname>Goebel</surname><given-names>R.</given-names></name><name><surname>Siekmann</surname><given-names>J.</given-names></name><name><surname>Wahlster</surname><given-names>W.</given-names></name></person-group><publisher-name>Springer</publisher-name><publisher-loc>Heidelberg, Germany</publisher-loc><year>2004</year><volume>3171</volume><fpage>134</fpage><lpage>143</lpage></citation></ref>
<ref id="b6-information-02-00277"><label>6.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Stern</surname><given-names>J.M.</given-names></name></person-group><source>Language, Metaphor and Metaphysics: The Subjective Side of Science</source><comment>Technical Report MAC-IME-USP-06-09</comment><publisher-name>Department of Statistical Science, University College</publisher-name><publisher-loc>London, UK</publisher-loc><year>2006</year></citation></ref>
<ref id="b7-information-02-00277"><label>7.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stern</surname><given-names>J.M.</given-names></name></person-group><article-title>Cognitive constructivism, eigen-solutions, and sharp statistical hypotheses</article-title><source>Cybern. Hum. Knowing</source><year>2007</year><volume>14</volume><fpage>9</fpage><lpage>36</lpage></citation></ref>
<ref id="b8-information-02-00277"><label>8.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stern</surname><given-names>J.M.</given-names></name></person-group><article-title>Language and the self-reference paradox</article-title><source>Cybern. Hum. Knowing</source><year>2007</year><volume>14</volume><fpage>71</fpage><lpage>92</lpage></citation></ref>
<ref id="b9-information-02-00277"><label>9.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Stern</surname><given-names>J.M.</given-names></name></person-group><source>Complex Structures, Modularity and Stochastic Evolution</source><comment>Technical Report IME-USP-MAP-07-01</comment><publisher-name>University of Sao Paulo</publisher-name><publisher-loc>Sao Paulo, Brazil</publisher-loc><year>2007</year></citation></ref>
<ref id="b10-information-02-00277"><label>10.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stern</surname><given-names>J.M.</given-names></name></person-group><article-title>Decoupling, sparsity, randomization, and objective Bayesian inference</article-title><source>Cybern. Hum. Knowing</source><year>2008</year><volume>15</volume><fpage>49</fpage><lpage>68</lpage></citation></ref>
<ref id="b11-information-02-00277"><label>11.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Stern</surname><given-names>J.M.</given-names></name></person-group><article-title>Cognitive Constructivism and the Epistemic Significance of Sharp Statistical Hypotheses</article-title><conf-name>Presented at MaxEnt 2008, The 28th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering</conf-name><conf-loc>Boraceia, Sao Paulo, Brazil</conf-loc><year>2008</year></citation></ref>
<ref id="b12-information-02-00277"><label>12.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Stern</surname><given-names>J.M.</given-names></name></person-group><article-title>The Living Intelligent Universe</article-title><conf-name>Proceedings of MBR09-The International Conference on Model-Based Reasoning in Science and Technology</conf-name><conf-loc>Unicamp, Brazil</conf-loc><year>2009</year></citation></ref>
<ref id="b13-information-02-00277"><label>13.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>von Foerster</surname><given-names>H.</given-names></name></person-group><source>Understanding Understanding: Essays on Cybernetics and Cognition</source><publisher-name>Springer Verlag</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>2003</year></citation></ref>
<ref id="b14-information-02-00277"><label>14.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Bernardo</surname><given-names>G.G.</given-names></name><name><surname>Lauretto</surname><given-names>M.S.</given-names></name><name><surname>Stern</surname><given-names>J.M.</given-names></name></person-group><article-title>The Full Bayesian Significance Test for Symmetry in Contingency Tables</article-title><conf-name>Proceedings of the 30th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering</conf-name><conf-loc>Chamonix, France</conf-loc><conf-date>July 4–9, 2010</conf-date></citation></ref>
<ref id="b15-information-02-00277"><label>15.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Chakrabarty</surname><given-names>D.</given-names></name></person-group><article-title>CHASSIS-Inverse Modelling of Relaxed Dynamical Systems</article-title><conf-name>Proceedings of the 18th World IMACS MODSIM Congress</conf-name><conf-loc>Cairns, Australia</conf-loc><conf-date>13–17 July 2009</conf-date></citation></ref>
<ref id="b16-information-02-00277"><label>16.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname><given-names>R.</given-names></name><name><surname>Chakrabarty</surname><given-names>D.</given-names></name><name><surname>O'Sullivan</surname><given-names>E.</given-names></name><name><surname>Raychaudhury</surname><given-names>S.</given-names></name></person-group><article-title>Comparing X-ray and dynamical mass profiles in the early-type galaxy NGC 4636</article-title><source>Astrophys. J.</source><year>2009</year><volume>706</volume><fpage>980</fpage><lpage>994</lpage><pub-id pub-id-type="doi">10.1088/0004-637X/706/2/980</pub-id></citation></ref>
<ref id="b17-information-02-00277"><label>17.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Loschi</surname><given-names>R.H.</given-names></name><name><surname>Monteiro</surname><given-names>J.V.D.</given-names></name><name><surname>Rocha</surname><given-names>G.H.M.A.</given-names></name><name><surname>Mayrink</surname><given-names>V.D.</given-names></name></person-group><article-title>Testing and estimating the non-disjunction fraction in meiosis I using reference priors</article-title><source>Biom. J.</source><year>2007</year><volume>49</volume><fpage>824</fpage><lpage>839</lpage><pub-id pub-id-type="doi">10.1002/bimj.200710364</pub-id><pub-id pub-id-type="pmid">17726717</pub-id></citation></ref>
<ref id="b18-information-02-00277"><label>18.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Madruga</surname><given-names>M.R.</given-names></name><name><surname>Esteves</surname><given-names>L.G.</given-names></name><name><surname>Wechsler</surname><given-names>S.</given-names></name></person-group><article-title>On the Bayesianity of Pereira-Stern tests</article-title><source>Test</source><year>2001</year><volume>10</volume><fpage>291</fpage><lpage>299</lpage><pub-id pub-id-type="doi">10.1007/BF02595698</pub-id></citation></ref>
<ref id="b19-information-02-00277"><label>19.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rifo</surname><given-names>L.L.</given-names></name><name><surname>Torres</surname><given-names>S.</given-names></name></person-group><article-title>Full Bayesian analysis for a class of jump-diffusion models</article-title><source>Comm. Stat. Theor. Meth.</source><year>2009</year><volume>38</volume><fpage>1262</fpage><lpage>1271</lpage><pub-id pub-id-type="doi">10.1080/03610920802395694</pub-id></citation></ref>
<ref id="b20-information-02-00277"><label>20.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rodrigues</surname><given-names>J.</given-names></name></person-group><article-title>Full Bayesian significance test for zero-inflated distributions</article-title><source>Comm. Stat. Theor. Meth.</source><year>2006</year><volume>35</volume><fpage>299</fpage><lpage>307</lpage><pub-id pub-id-type="doi">10.1080/03610920500439984</pub-id></citation></ref>
<ref id="b21-information-02-00277"><label>21.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Colla</surname><given-names>E.</given-names></name><name><surname>Stern</surname><given-names>J.M.</given-names></name></person-group><article-title>Sparse factorization methods for inference in Bayesian networks</article-title><source>AIP Conf. Proc.</source><year>2008</year><volume>1073</volume><fpage>136</fpage><lpage>143</lpage></citation></ref>
<ref id="b22-information-02-00277"><label>22.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hacking</surname><given-names>I.</given-names></name></person-group><article-title>Telepathy: Origins of Randomization in Experimental Design</article-title><source>Isis</source><year>1988</year><volume>79</volume><fpage>427</fpage><lpage>451</lpage><pub-id pub-id-type="doi">10.1086/354775</pub-id></citation></ref>
<ref id="b23-information-02-00277"><label>23.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Peirce</surname><given-names>C.S.</given-names></name><name><surname>Jastrow</surname><given-names>J.</given-names></name></person-group><article-title>On small differences of sensation</article-title><source>Memoirs of the National Academy of Sciences</source><publisher-name>National Academies Press</publisher-name><publisher-loc>Washington, DC, USA</publisher-loc><year>1884</year><volume>3</volume><fpage>75</fpage><lpage>83</lpage></citation></ref>
<ref id="b24-information-02-00277"><label>24.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spencer-Brown</surname><given-names>G.</given-names></name></person-group><article-title>Statistical significance in psychical research</article-title><source>Nature</source><year>1953</year><volume>172</volume><fpage>154</fpage><lpage>156</lpage><pub-id pub-id-type="doi">10.1038/172154a0</pub-id><pub-id pub-id-type="pmid">13072613</pub-id></citation></ref>
<ref id="b25-information-02-00277"><label>25.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spencer-Brown</surname><given-names>G.</given-names></name></person-group><article-title>Answer to Soal <italic>et al.</italic></article-title><source>Nature</source><year>1953</year><volume>172</volume><fpage>594</fpage><lpage>595</lpage></citation></ref>
<ref id="b26-information-02-00277"><label>26.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Spencer-Brown</surname><given-names>G.</given-names></name></person-group><source>Probability and Scientific Inference</source><publisher-name>Longmans Green</publisher-name><publisher-loc>London, UK</publisher-loc><year>1957</year></citation></ref>
<ref id="b27-information-02-00277"><label>27.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Spencer-Brown</surname><given-names>G.</given-names></name></person-group><source>Laws of Form</source><publisher-name>Allen and Unwin</publisher-name><publisher-loc>London, UK</publisher-loc><year>1969</year></citation></ref>
<ref id="b28-information-02-00277"><label>28.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Carnielli</surname><given-names>W.</given-names></name></person-group><article-title>Formal Polynomials and the Laws of Form</article-title><source>Dimensions of Logical Concepts</source><person-group person-group-type="editor"><name><surname>Béziau</surname><given-names>J.Y.</given-names></name><name><surname>Costa-Leite</surname><given-names>A.</given-names></name></person-group><publisher-name>UNICAMP</publisher-name><publisher-loc>Campinas, Brazil</publisher-loc><year>2009</year></citation></ref>
<ref id="b29-information-02-00277"><label>29.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Edwards</surname><given-names>A.W.F.</given-names></name></person-group><source>Cogwheels of the Mind: The Story of Venn Diagrams</source><publisher-name>The Johns Hopkins University Press</publisher-name><publisher-loc>Baltimore, MD, USA</publisher-loc><year>2004</year></citation></ref>
<ref id="b30-information-02-00277"><label>30.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kauffman</surname><given-names>L.H.</given-names></name></person-group><article-title>The mathematics of Charles Sanders Peirce</article-title><source>Cybern. Hum. Knowing</source><year>2001</year><volume>8</volume><fpage>79</fpage><lpage>110</lpage></citation></ref>
<ref id="b31-information-02-00277"><label>31.</label><citation citation-type="web"><person-group person-group-type="author"><name><surname>Kauffman</surname><given-names>L.H.</given-names></name></person-group><article-title>Laws of Form: An Exploration in Mathematics and Foundations, 2006</article-title><comment>Available at: <ext-link xlink:href="http://www.math.uic.edu/kauffman/Laws.pdf" ext-link-type="uri">http://www.math.uic.edu/kauffman/Laws.pdf</ext-link> (accessed on 1 April 2011)</comment></citation></ref>
<ref id="b32-information-02-00277"><label>32.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meguire</surname><given-names>P.</given-names></name></person-group><article-title>Discovering boundary algebra: A simple notation for Boolean algebra and the truth functions</article-title><source>Int. J. Gen. Sys.</source><year>2003</year><volume>32</volume><fpage>25</fpage><lpage>87</lpage><pub-id pub-id-type="doi">10.1080/0308107031000075690</pub-id></citation></ref>
<ref id="b33-information-02-00277"><label>33.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Peirce</surname><given-names>C.S.</given-names></name></person-group><article-title>A Boolean Algebra with One Constant</article-title><source>Collected Papers of Charles Sanders Peirce</source><person-group person-group-type="editor"><name><surname>Hartshorne</surname><given-names>C.</given-names></name><name><surname>Weiss</surname><given-names>P.</given-names></name><name><surname>Burks</surname><given-names>A.</given-names></name></person-group><publisher-name>InteLex</publisher-name><publisher-loc>Charlottesville, VA, USA</publisher-loc><year>1992</year></citation></ref>
<ref id="b34-information-02-00277"><label>34.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sheffer</surname><given-names>H.M.</given-names></name></person-group><article-title>A set of five independent postulates for Boolean algebras, with application to logical constants</article-title><source>Trans. Amer. Math. Soc.</source><year>1913</year><volume>14</volume><fpage>481</fpage><lpage>488</lpage><pub-id pub-id-type="doi">10.1090/S0002-9947-1913-1500960-1</pub-id></citation></ref>
<ref id="b35-information-02-00277"><label>35.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Flew</surname><given-names>A.</given-names></name></person-group><article-title>Probability and scientific inference by G. Spencer-Brown (review)</article-title><source>Phil. Q.</source><year>1959</year><volume>9</volume><fpage>380</fpage><lpage>381</lpage><pub-id pub-id-type="doi">10.2307/2216376</pub-id></citation></ref>
<ref id="b36-information-02-00277"><label>36.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Falk</surname><given-names>R.</given-names></name><name><surname>Konold</surname><given-names>C.</given-names></name></person-group><article-title>Making sense of randomness: Implicit encoding as a basis for judgment</article-title><source>Psychol. Rev.</source><year>1997</year><volume>104</volume><fpage>301</fpage><lpage>318</lpage><pub-id pub-id-type="doi">10.1037/0033-295X.104.2.301</pub-id></citation></ref>
<ref id="b37-information-02-00277"><label>37.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Falk</surname><given-names>R.</given-names></name><name><surname>Konold</surname><given-names>C.</given-names></name></person-group><article-title>Subjective randomness</article-title><source>Encyclopedia of Statistical Sciences</source><edition>2nd ed.</edition><publisher-name>Wiley</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>2005</year><volume>13</volume><fpage>8397</fpage><lpage>8403</lpage></citation></ref>
<ref id="b38-information-02-00277"><label>38.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Good</surname><given-names>I.J.</given-names></name></person-group><article-title>Probability and scientific inference by G. Spencer-Brown (review)</article-title><source>Br. J. Philos. Sci.</source><year>1958</year><volume>9</volume><fpage>251</fpage><lpage>255</lpage></citation></ref>
<ref id="b39-information-02-00277"><label>39.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mundle</surname><given-names>C.W.K.</given-names></name></person-group><article-title>Probability and scientific inference by G. Spencer-Brown (review)</article-title><source>Philosophy</source><year>1959</year><volume>34</volume><fpage>150</fpage><lpage>154</lpage><pub-id pub-id-type="doi">10.1017/S0031819100047483</pub-id></citation></ref>
<ref id="b40-information-02-00277"><label>40.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Atkins</surname><given-names>P.W.</given-names></name></person-group><source>The Second Law</source><publisher-name>The Scientific American Books</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1984</year></citation></ref>
<ref id="b41-information-02-00277"><label>41.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Attneave</surname><given-names>F.</given-names></name></person-group><source>Applications of Information Theory to Psychology: A Summary of Basic Concepts, Methods, and Results</source><publisher-name>Holt, Rinehart and Winston</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1959</year></citation></ref>
<ref id="b42-information-02-00277"><label>42.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Dugdale</surname><given-names>J.S.</given-names></name></person-group><source>Entropy and Its Physical Meaning</source><publisher-name>Taylor and Francis</publisher-name><publisher-loc>London, UK</publisher-loc><year>1996</year></citation></ref>
<ref id="b43-information-02-00277"><label>43.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Krippendorff</surname><given-names>K.</given-names></name></person-group><source>Information Theory: Structural Models for Qualitative Data (Quantitative Applications in the Social Sciences V.62.)</source><publisher-name>Sage</publisher-name><publisher-loc>Beverly Hills, CA, USA</publisher-loc><year>1986</year></citation></ref>
<ref id="b44-information-02-00277"><label>44.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Tarasov</surname><given-names>L.</given-names></name></person-group><source>The World Is Built on Probability</source><publisher-name>MIR</publisher-name><publisher-loc>Moscow, Russia</publisher-loc><year>1988</year></citation></ref>
<ref id="b45-information-02-00277"><label>45.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Kapur</surname><given-names>J.N.</given-names></name></person-group><source>Maximum Entropy Models in Science and Engineering</source><publisher-name>John Wiley</publisher-name><publisher-loc>New Delhi, India</publisher-loc><year>1989</year></citation></ref>
<ref id="b46-information-02-00277"><label>46.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Rissanen</surname><given-names>J.</given-names></name></person-group><source>Stochastic Complexity in Statistical Inquiry</source><publisher-name>World Scientific</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1989</year></citation></ref>
<ref id="b47-information-02-00277"><label>47.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Wallace</surname><given-names>C.S.</given-names></name></person-group><source>Statistical and Inductive Inference by Minimum Message Length</source><publisher-name>Springer</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>2005</year></citation></ref>
<ref id="b48-information-02-00277"><label>48.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tribble</surname><given-names>C.G.</given-names></name></person-group><article-title>Industry-sponsored negative trials and the potential pitfalls of post hoc analysis</article-title><source>Arch. Surg.</source><year>2008</year><volume>143</volume><fpage>933</fpage><lpage>934</lpage><pub-id pub-id-type="doi">10.1001/archsurg.143.10.933</pub-id><pub-id pub-id-type="pmid">18936369</pub-id></citation></ref>
<ref id="b49-information-02-00277"><label>49.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname><given-names>R.</given-names></name><name><surname>Lagakos</surname><given-names>S.W.</given-names></name><name><surname>Ware</surname><given-names>J.H.</given-names></name><name><surname>Hunter</surname><given-names>D.J.</given-names></name><name><surname>Drazen</surname><given-names>J.M.</given-names></name></person-group><article-title>Statistics in medicine-reporting of subgroup analyses in clinical trials</article-title><source>New Engl. J. Med.</source><year>2007</year><volume>357</volume><fpage>2189</fpage><lpage>2194</lpage><pub-id pub-id-type="doi">10.1056/NEJMsr077003</pub-id><pub-id pub-id-type="pmid">18032770</pub-id></citation></ref>
<ref id="b50-information-02-00277"><label>50.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scott</surname><given-names>C.G.</given-names></name></person-group><article-title>Spencer-Brown and probability: A critique</article-title><source>J. Soc. Psych. Res.</source><year>1958</year><volume>39</volume><fpage>217</fpage><lpage>234</lpage></citation></ref>
<ref id="b51-information-02-00277"><label>51.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Soal</surname><given-names>S.G.</given-names></name><name><surname>Stratton</surname><given-names>F.J.</given-names></name><name><surname>Thouless</surname><given-names>R.H.</given-names></name></person-group><article-title>Statistical significance in psychical research</article-title><source>Nature</source><year>1953</year><volume>172</volume><fpage>594</fpage></citation></ref>
<ref id="b52-information-02-00277"><label>52.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Atmanspacher</surname><given-names>H.</given-names></name></person-group><article-title>Non-physicalist physical approaches. Guest editorial</article-title><source>Mind Matter</source><year>2005</year><volume>3</volume><fpage>3</fpage><lpage>6</lpage></citation></ref>
<ref id="b53-information-02-00277"><label>53.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ehm</surname><given-names>W.</given-names></name></person-group><article-title>Meta-analysis of mind-matter experiments: A statistical modeling perspective</article-title><source>Mind Matter</source><year>2005</year><volume>3</volume><fpage>85</fpage><lpage>132</lpage></citation></ref>
<ref id="b54-information-02-00277"><label>54.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Hennig</surname><given-names>C.</given-names></name></person-group><source>Falsification of Propensity Models by Statistical Tests and the Goodness-of-Fit Paradox</source><comment>Technical Report no. 304</comment><publisher-name>Department of Statistical Science, University College</publisher-name><publisher-loc>London, UK</publisher-loc><year>2006</year></citation></ref>
<ref id="b55-information-02-00277"><label>55.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kaptchuk</surname><given-names>T.J.</given-names></name><name><surname>Kerr</surname><given-names>C.E.</given-names></name></person-group><article-title>Commentary: Unbiased divination, unbiased evidence, and the patulin clinical trial</article-title><source>Int. J. Epidemiol.</source><year>2004</year><volume>33</volume><fpage>247</fpage><lpage>251</lpage><pub-id pub-id-type="doi">10.1093/ije/dyh047</pub-id><pub-id pub-id-type="pmid">15082621</pub-id></citation></ref>
<ref id="b56-information-02-00277"><label>56.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Utts</surname><given-names>J.</given-names></name></person-group><article-title>Replication and meta-analysis in parapsychology</article-title><source>Stat. Sci.</source><year>1991</year><volume>6</volume><fpage>363</fpage><lpage>403</lpage><pub-id pub-id-type="doi">10.1214/ss/1177011577</pub-id></citation></ref>
<ref id="b57-information-02-00277"><label>57.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wassermann</surname><given-names>G.D.</given-names></name></person-group><article-title>Some comments on the methods and statements in parapsychology and other sciences</article-title><source>Br. J. Philos. Sci.</source><year>1955</year><volume>6</volume><fpage>122</fpage><lpage>140</lpage></citation></ref>
<ref id="b58-information-02-00277"><label>58.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bonassi</surname><given-names>F.V.</given-names></name><name><surname>Stern</surname><given-names>R.B.</given-names></name><name><surname>Wechsler</surname><given-names>S.</given-names></name></person-group><article-title>The gambler's fallacy: A Bayesian approach</article-title><source>AIP Conf. Proc.</source><year>2008</year><volume>1073</volume><fpage>8</fpage><lpage>15</lpage></citation></ref>
<ref id="b59-information-02-00277"><label>59.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bonassi</surname><given-names>F.V.</given-names></name><name><surname>Nishimura</surname><given-names>R.</given-names></name><name><surname>Stern</surname><given-names>R.B.</given-names></name></person-group><article-title>In defense of randomization: A subjectivist Bayesian approach</article-title><source>AIP Conf. Proc.</source><year>2009</year><volume>1193</volume><fpage>32</fpage><lpage>39</lpage></citation></ref>
<ref id="b60-information-02-00277"><label>60.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dehue</surname><given-names>T.</given-names></name></person-group><article-title>Deception, efficiency, and random groups: Psychology and the gradual origination of the random group design</article-title><source>Isis</source><year>1997</year><volume>88</volume><fpage>653</fpage><lpage>673</lpage><pub-id pub-id-type="doi">10.1086/383850</pub-id><pub-id pub-id-type="pmid">9519574</pub-id></citation></ref>
<ref id="b61-information-02-00277"><label>61.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Hammersley</surname><given-names>J.M.</given-names></name><name><surname>Handscomb</surname><given-names>D.C.</given-names></name></person-group><source>Monte Carlo Methods</source><publisher-name>Chapman and Hall</publisher-name><publisher-loc>London, UK</publisher-loc><year>1964</year></citation></ref>
<ref id="b62-information-02-00277"><label>62.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Ripley</surname><given-names>B.D.</given-names></name></person-group><source>Stochastic Simulation</source><publisher-name>Wiley</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1987</year></citation></ref>
<ref id="b63-information-02-00277"><label>63.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marsaglia</surname><given-names>G.</given-names></name></person-group><article-title>Random numbers fall mainly in the planes</article-title><source>Proc. Natl. Acad. Sci.</source><year>1968</year><volume>61</volume><fpage>25</fpage><lpage>28</lpage><pub-id pub-id-type="doi">10.1073/pnas.61.1.25</pub-id><pub-id pub-id-type="pmid">16591687</pub-id></citation></ref>
<ref id="b64-information-02-00277"><label>64.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Boyar</surname><given-names>J.</given-names></name></person-group><article-title>Inferring sequences produced by pseudo-random number generators</article-title><source>J. ACM</source><year>1989</year><volume>36</volume><fpage>129</fpage><lpage>141</lpage><pub-id pub-id-type="doi">10.1145/58562.59305</pub-id></citation></ref>
<ref id="b65-information-02-00277"><label>65.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Matsumoto</surname><given-names>M.</given-names></name><name><surname>Nishimura</surname><given-names>T.</given-names></name></person-group><article-title>Mersenne twister: A 623-dimensionally equidistributed uniform pseudorandom number generator</article-title><source>ACM Trans. Model. Comput. Simul.</source><year>1998</year><volume>8</volume><fpage>3</fpage><lpage>30</lpage><pub-id pub-id-type="doi">10.1145/272991.272995</pub-id></citation></ref>
<ref id="b66-information-02-00277"><label>66.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Matsumoto</surname><given-names>M.</given-names></name><name><surname>Kurita</surname><given-names>Y.</given-names></name></person-group><article-title>Twisted GFSR generators</article-title><source>ACM Trans. Model. Comput. Simul.</source><year>1992</year><volume>2</volume><fpage>179</fpage><lpage>194</lpage><pub-id pub-id-type="doi">10.1145/146382.146383</pub-id></citation></ref>
<ref id="b67-information-02-00277"><label>67.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Abelson</surname><given-names>R.P.</given-names></name></person-group><source>Statistics as Principled Argument</source><publisher-name>LEA</publisher-name><publisher-loc>Hillsdale, NJ, USA</publisher-loc><year>1995</year></citation></ref>
<ref id="b68-information-02-00277"><label>68.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Matoušek</surname><given-names>J.</given-names></name></person-group><source>Geometric Discrepancy</source><publisher-name>Springer</publisher-name><publisher-loc>Berlin, Germany</publisher-loc><year>1991</year></citation></ref>
<ref id="b69-information-02-00277"><label>69.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Günther</surname><given-names>M.</given-names></name><name><surname>Jüngel</surname><given-names>A.</given-names></name></person-group><source>Finanzderivate mit MATLAB. Mathematische Modellierung und Numerische Simulation</source><publisher-name>Vieweg Verlag</publisher-name><publisher-loc>Wiesbaden, Germany</publisher-loc><year>2003</year><fpage>117</fpage></citation></ref>
<ref id="b70-information-02-00277"><label>70.</label><citation citation-type="thesis"><person-group person-group-type="author"><name><surname>Merkel</surname><given-names>R.</given-names></name></person-group><article-title>Analysis and Enhancements of Adaptive Random Testing</article-title><source>Ph.D. Thesis</source><publisher-name>Swinburne University of Technology in Melbourne</publisher-name><publisher-loc>Melbourne, Australia</publisher-loc><year>2005</year></citation></ref>
<ref id="b71-information-02-00277"><label>71.</label><citation citation-type="thesis"><person-group person-group-type="author"><name><surname>Ökten</surname><given-names>G.</given-names></name></person-group><article-title>Contributions to the Theory of Monte Carlo and Quasi-Monte Carlo Methods</article-title><source>Ph.D. Thesis</source><publisher-name>Claremont Graduate University</publisher-name><publisher-loc>Claremont, CA, USA</publisher-loc><year>1999</year></citation></ref>
<ref id="b72-information-02-00277"><label>72.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sen</surname><given-names>S.K.</given-names></name><name><surname>Samanta</surname><given-names>T.</given-names></name><name><surname>Reese</surname><given-names>A.</given-names></name></person-group><article-title>Quasi versus pseudo random generators: Discrepancy, complexity and integration-error based comparison</article-title><source>Int. J. Innov. Comput. Inform. Control</source><year>2006</year><volume>2</volume><fpage>621</fpage><lpage>651</lpage></citation></ref>
<ref id="b73-information-02-00277"><label>73.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Morokoff</surname><given-names>W.J.</given-names></name></person-group><article-title>Generating quasi-random paths for stochastic processes</article-title><source>SIAM Rev.</source><year>1998</year><volume>40</volume><fpage>765</fpage><lpage>788</lpage><pub-id pub-id-type="doi">10.1137/S0036144597317959</pub-id></citation></ref>
<ref id="b74-information-02-00277"><label>74.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Zabell</surname><given-names>S.L.</given-names></name></person-group><article-title>The Quest for Randomness and its Statistical Applications</article-title><source>Statistics for the Twenty-First Century</source><person-group person-group-type="editor"><name><surname>Gordon</surname><given-names>E.</given-names></name><name><surname>Gordon</surname><given-names>S.</given-names></name></person-group><publisher-name>Mathematical Association of America</publisher-name><publisher-loc>Washington, DC, USA</publisher-loc><year>1992</year></citation></ref>
<ref id="b75-information-02-00277"><label>75.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Gell-Mann</surname><given-names>M.</given-names></name></person-group><source>The Quark and the Jaguar: Adventures in the Simple and the Complex</source><publisher-name>W. H. Freeman</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1994</year></citation></ref>
<ref id="b76-information-02-00277"><label>76.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lopes</surname><given-names>L.L.</given-names></name></person-group><article-title>Doing the Impossible: a note on induction and the experience of randomness</article-title><source>J. Exp. Psychol. Learn. Mem. Cognit.</source><year>1982</year><volume>8</volume><fpage>626</fpage><lpage>636</lpage><pub-id pub-id-type="doi">10.1037/0278-7393.8.6.626</pub-id></citation></ref>
<ref id="b77-information-02-00277"><label>77.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lopes</surname><given-names>L.L.</given-names></name><name><surname>Oden</surname><given-names>G.C.</given-names></name></person-group><article-title>Distinguishing between random and nonrandom events</article-title><source>J. Exp. Psychol. Learn. Mem. Cognit.</source><year>1987</year><volume>13</volume><fpage>392</fpage><lpage>400</lpage><pub-id pub-id-type="doi">10.1037/0278-7393.13.3.392</pub-id></citation></ref>
<ref id="b78-information-02-00277"><label>78.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tversky</surname><given-names>A.</given-names></name><name><surname>Kahneman</surname><given-names>D.</given-names></name></person-group><article-title>Belief in the law of small numbers</article-title><source>Psychol. Bull.</source><year>1971</year><volume>76</volume><fpage>105</fpage><lpage>110</lpage><pub-id pub-id-type="doi">10.1037/h0031322</pub-id></citation></ref>
<ref id="b79-information-02-00277"><label>79.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Piaget</surname><given-names>J.</given-names></name><name><surname>Inhelder</surname><given-names>B.</given-names></name></person-group><source>The Origin of the Idea of Chance in Children</source><person-group person-group-type="editor"><name><surname>Leake</surname><given-names>L.</given-names></name><name><surname>Burrell</surname><given-names>E.</given-names></name><name><surname>Fishbein</surname><given-names>H.D.</given-names></name></person-group><publisher-name>Norton</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1975</year></citation></ref>
<ref id="b80-information-02-00277"><label>80.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chaitin</surname><given-names>G.J.</given-names></name></person-group><article-title>Randomness and mathematical proof</article-title><source>Sci. Amer.</source><year>1975</year><volume>232</volume><fpage>47</fpage><lpage>52</lpage></citation></ref>
<ref id="b81-information-02-00277"><label>81.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chaitin</surname><given-names>G.J.</given-names></name></person-group><article-title>Randomness in arithmetic</article-title><source>Sci. Amer.</source><year>1988</year><volume>259</volume><fpage>80</fpage><lpage>85</lpage><pub-id pub-id-type="doi">10.1038/scientificamerican0788-80</pub-id></citation></ref>
<ref id="b82-information-02-00277"><label>82.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kac</surname><given-names>M.</given-names></name></person-group><article-title>What is random?</article-title><source>Amer. Sci.</source><year>1983</year><volume>71</volume><fpage>405</fpage><lpage>406</lpage></citation></ref>
<ref id="b83-information-02-00277"><label>83.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kolmogorov</surname><given-names>A.N.</given-names></name></person-group><article-title>Three approaches to the quantitative definition of information</article-title><source>Probl. Inform. Transm.</source><year>1965</year><volume>1</volume><fpage>1</fpage><lpage>7</lpage></citation></ref>
<ref id="b84-information-02-00277"><label>84.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Martin-Löf</surname><given-names>P.</given-names></name></person-group><article-title>The definition of random sequences</article-title><source>Inform. Contr.</source><year>1966</year><volume>9</volume><fpage>602</fpage><lpage>619</lpage><pub-id pub-id-type="doi">10.1016/S0019-9958(66)80018-9</pub-id></citation></ref>
<ref id="b85-information-02-00277"><label>85.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Martin-Löf</surname><given-names>P.</given-names></name></person-group><article-title>Algorithms and randomness</article-title><source>Rev. Int. Statist. Inst.</source><year>1969</year><volume>37</volume><fpage>265</fpage><lpage>272</lpage><pub-id pub-id-type="doi">10.2307/1402117</pub-id></citation></ref>
<ref id="b86-information-02-00277"><label>86.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Csiszar</surname><given-names>I.</given-names></name></person-group><article-title>Information Measures</article-title><conf-name>Proceedings of the 7th Prague Conference on Information Theory</conf-name><conf-loc>Prague, Czech Republic</conf-loc><year>1974</year><volume>2</volume><fpage>73</fpage><lpage>86</lpage></citation></ref>
<ref id="b87-information-02-00277"><label>87.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Khinchin</surname><given-names>A.I.</given-names></name></person-group><source>Mathematical Foundations of Information Theory</source><publisher-name>Dover</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1953</year></citation></ref>
<ref id="b88-information-02-00277"><label>88.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Renyi</surname><given-names>A.</given-names></name></person-group><article-title>On Measures of Entropy and Information</article-title><conf-name>Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability</conf-name><conf-loc>Statistical Laboratory of the University of California, Berkeley</conf-loc><conf-date>June 20–July 30, 1960</conf-date><publisher-name>University of California Press</publisher-name><publisher-loc>Berkeley, CA, USA</publisher-loc><year>1961</year><volume>VI</volume><fpage>547</fpage><lpage>561</lpage></citation></ref>
<ref id="b89-information-02-00277"><label>89.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Renyi</surname><given-names>A.</given-names></name></person-group><source>Probability Theory</source><publisher-name>North-Holland</publisher-name><publisher-loc>Amsterdam, the Netherlands</publisher-loc><year>1970</year></citation></ref>
<ref id="b90-information-02-00277"><label>90.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Gokhale</surname><given-names>D.V.</given-names></name></person-group><article-title>Maximum Entropy Characterization of Some Distributions</article-title><source>Statistical Distributions in Scientific Work</source><person-group person-group-type="editor"><name><surname>Patil</surname><given-names>G.P.</given-names></name><name><surname>Kotz</surname><given-names>S.</given-names></name><name><surname>Ord</surname><given-names>J.K.</given-names></name></person-group><publisher-name>Springer</publisher-name><publisher-loc>Berlin, Germany</publisher-loc><year>1975</year><volume>3</volume><fpage>299</fpage><lpage>304</lpage></citation></ref>
<ref id="b91-information-02-00277"><label>91.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Censor</surname><given-names>Y.</given-names></name><name><surname>Zenios</surname><given-names>S.</given-names></name></person-group><source>Introduction to Methods of Parallel Optimization</source><publisher-name>IMPA</publisher-name><publisher-loc>Rio de Janeiro, Brazil</publisher-loc><year>1994</year></citation></ref>
<ref id="b92-information-02-00277"><label>92.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Censor</surname><given-names>Y.</given-names></name><name><surname>Zenios</surname><given-names>S.A.</given-names></name></person-group><source>Parallel Optimization: Theory, Algorithms, and Applications</source><publisher-name>Oxford University Press</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1997</year></citation></ref>
<ref id="b93-information-02-00277"><label>93.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Elfving</surname><given-names>T.</given-names></name></person-group><article-title>On some methods for entropy maximization and matrix scaling</article-title><source>Linear Algebra Appl.</source><year>1980</year><volume>34</volume><fpage>321</fpage><lpage>339</lpage><pub-id pub-id-type="doi">10.1016/0024-3795(80)90171-8</pub-id></citation></ref>
<ref id="b94-information-02-00277"><label>94.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Fang</surname><given-names>S.C.</given-names></name><name><surname>Rajasekera</surname><given-names>J.R.</given-names></name><name><surname>Tsao</surname><given-names>H.S.J.</given-names></name></person-group><source>Entropy Optimization and Mathematical Programming</source><publisher-name>Kluwer</publisher-name><publisher-loc>Dordrecht, The Netherlands</publisher-loc><year>1997</year></citation></ref>
<ref id="b95-information-02-00277"><label>95.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Iusem</surname><given-names>A.N.</given-names></name><name><surname>De Pierro</surname><given-names>A.R.</given-names></name></person-group><article-title>Convergence results for an accelerated nonlinear Cimmino algorithm</article-title><source>Numer. Math.</source><year>1986</year><volume>46</volume><fpage>367</fpage><lpage>378</lpage></citation></ref>
<ref id="b96-information-02-00277"><label>96.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Gelman</surname><given-names>A.</given-names></name><name><surname>Carlin</surname><given-names>J.B.</given-names></name><name><surname>Stern</surname><given-names>H.S.</given-names></name><name><surname>Rubin</surname><given-names>D.B.</given-names></name></person-group><source>Bayesian Data Analysis</source><edition>2nd ed</edition><publisher-name>Chapman and Hall/CRC</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>2003</year></citation></ref>
<ref id="b97-information-02-00277"><label>97.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Caticha</surname><given-names>A.</given-names></name></person-group><article-title>Lectures on Probability, Entropy and Statistical Physics</article-title><conf-name>Presented at MaxEnt 2008, The 28th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering</conf-name><conf-loc>Boracéia, São Paulo, Brazil</conf-loc><year>2008</year></citation></ref>
<ref id="b98-information-02-00277"><label>98.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tribus</surname><given-names>M.</given-names></name><name><surname>McIrvine</surname><given-names>E.C.</given-names></name></person-group><article-title>Energy and information</article-title><source>Sci. Amer.</source><year>1971</year><volume>224</volume><fpage>178</fpage><lpage>184</lpage></citation></ref>
<ref id="b99-information-02-00277"><label>99.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Garcia</surname><given-names>M.V.P.</given-names></name><name><surname>Humes</surname><given-names>C.</given-names></name><name><surname>Stern</surname><given-names>J.M.</given-names></name></person-group><article-title>Generalized line criterion for Gauss-Seidel method</article-title><source>J. Comput. Appl. Math.</source><year>2002</year><volume>22</volume><fpage>91</fpage><lpage>97</lpage></citation></ref>
<ref id="b100-information-02-00277"><label>100.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Zellner</surname><given-names>A.</given-names></name></person-group><source>Introduction to Bayesian Inference in Econometrics</source><publisher-name>Wiley</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1971</year></citation></ref>
<ref id="b101-information-02-00277"><label>101.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Amari</surname><given-names>S.I.</given-names></name><name><surname>Barndorff-Nielsen</surname><given-names>O.E.</given-names></name><name><surname>Kass</surname><given-names>R.E.</given-names></name><name><surname>Lauritzen</surname><given-names>S.L.</given-names></name><name><surname>Rao</surname><given-names>C.R.</given-names></name></person-group><article-title>Differential Geometry in Statistical Inference</article-title><source>IMS Lecture Notes Monograph</source><publisher-name>Institute of Mathematical Statistics</publisher-name><publisher-loc>Hayward, CA, USA</publisher-loc><year>1987</year><volume>10</volume></citation></ref>
<ref id="b102-information-02-00277"><label>102.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Amari</surname><given-names>S.I.</given-names></name></person-group><source>Methods of Information Geometry</source><publisher-name>American Mathematical Society</publisher-name><publisher-loc>Providence, RI, USA</publisher-loc><year>2007</year></citation></ref>
<ref id="b103-information-02-00277"><label>103.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Berger</surname><given-names>J.O.</given-names></name><name><surname>Bernardo</surname><given-names>J.M.</given-names></name></person-group><article-title>On the Development of Reference Priors</article-title><source>Bayesian Statistics 4</source><person-group person-group-type="editor"><name><surname>Bernardo</surname><given-names>J.M.</given-names></name><name><surname>Berger</surname><given-names>J.O.</given-names></name><name><surname>Lindley</surname><given-names>D.V.</given-names></name><name><surname>Smith</surname><given-names>A.F.M.</given-names></name></person-group><publisher-name>Oxford University Press</publisher-name><publisher-loc>Oxford, UK</publisher-loc><year>1992</year><fpage>35</fpage><lpage>60</lpage></citation></ref>
<ref id="b104-information-02-00277"><label>104.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Berger</surname><given-names>J.O.</given-names></name></person-group><source>Statistical Decision Theory and Bayesian Analysis</source><edition>2nd ed</edition><publisher-name>Springer</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1993</year></citation></ref>
<ref id="b105-information-02-00277"><label>105.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Bernardo</surname><given-names>J.M.</given-names></name><name><surname>Smith</surname><given-names>A.F.M.</given-names></name></person-group><source>Bayesian Theory</source><publisher-name>Wiley</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>2000</year></citation></ref>
<ref id="b106-information-02-00277"><label>106.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>DeGroot</surname><given-names>M.H.</given-names></name></person-group><source>Optimal Statistical Decisions</source><publisher-name>McGraw-Hill</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1970</year></citation></ref>
<ref id="b107-information-02-00277"><label>107.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Hartigan</surname><given-names>J.A.</given-names></name></person-group><source>Bayes Theory</source><publisher-name>Springer</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1983</year></citation></ref>
<ref id="b108-information-02-00277"><label>108.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Jeffreys</surname><given-names>H.</given-names></name></person-group><source>Theory of Probability</source><edition>3rd ed.</edition><publisher-name>Clarendon Press</publisher-name><publisher-loc>Oxford, UK</publisher-loc><year>1961</year></citation></ref>
<ref id="b109-information-02-00277"><label>109.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scholl</surname><given-names>H.</given-names></name></person-group><article-title>Shannon optimal priors on independent identically distributed statistical experiments converge weakly to Jeffreys' prior</article-title><source>Test</source><year>1998</year><volume>7</volume><fpage>75</fpage><lpage>94</lpage><pub-id pub-id-type="doi">10.1007/BF02565103</pub-id></citation></ref>
<ref id="b110-information-02-00277"><label>110.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhu</surname><given-names>H.</given-names></name></person-group><source>Information Geometry, Bayesian Inference, Ideal Estimates and Error Decomposition</source><publisher-name>Santa Fe Institute</publisher-name><publisher-loc>Santa Fe, NM, USA</publisher-loc><year>1998</year></citation></ref></ref-list></back></article>
