This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

It has been proposed that the general function of the brain is inference, which corresponds quantitatively to the minimization of uncertainty (or the maximization of information). However, there has been a lack of clarity about exactly what this means. Efforts to quantify information have been in agreement that it depends on probabilities (through Shannon entropy), but there has long been a dispute about the definition of probabilities themselves. The “frequentist” view is that probabilities are (or can be) essentially equivalent to frequencies, and that they are therefore properties of a physical system, independent of any observer of the system. E.T. Jaynes developed the alternate “Bayesian” definition, in which probabilities are always conditional on a state of knowledge through the rules of logic, as expressed in the maximum entropy principle. In doing so, Jaynes and others provided the objective means for deriving probabilities, as well as a unified account of information and logic (knowledge and reason). However, neuroscience literature virtually never specifies any definition of probability, nor does it acknowledge any dispute concerning the definition. Although there has recently been tremendous interest in Bayesian approaches to the brain, even in the Bayesian literature it is common to find probabilities that are purported to come directly and unconditionally from frequencies. As a result, scientists have mistakenly attributed their own information to the neural systems they study. Here I argue that the adoption of a strictly Jaynesian approach will prevent such errors and will provide us with the philosophical and mathematical framework that is needed to understand the general function of the brain. Accordingly, our challenge becomes the identification of the biophysical basis of Jaynesian information and logic. 
I begin to address this issue by suggesting how we might identify a probability distribution over states of one physical system (an “object”) conditional only on the biophysical state of another physical system (an “observer”). The primary purpose in doing so is not to characterize information and inference in exquisite, quantitative detail, but to be as clear and precise as possible about what it means to perform inference and how the biophysics of the brain could achieve this goal.

It is almost universally agreed that the nervous system is specialized for processing information. But for most people, that statement would seem too vague to be meaningful. While everyone has some intuitive notion of the meaning of “information”, for most people, including neuroscientists, the concept is too poorly defined to provide any deep insight into the function of the nervous system. I myself had both a bachelor’s degree and a Ph.D. in neuroscience before I discovered, to my surprise, that there exists a precise quantitative definition of information. Although Claude Shannon gave this definition in 1948 [

There are several remarkable facts that suggest serious shortcomings in our basic approach to understanding the brain and information. First, despite its considerable contributions with respect to engineering, information theory still has not found its way into biology and medical textbooks after 60 years. Even standard neuroscience textbooks, including the authoritative text of Kandel and colleagues [

I propose here that these difficulties can be attributed in large part to confusion about the nature of information. Scientists have often mistaken their own knowledge about physical systems for inherent properties of the systems themselves. One consequence of this confusion has been that the concept of information has been divorced from the biophysical substrates that constitute the nervous system. Here I suggest that, properly understood, the concept of information can be grounded in biophysics and it can offer insight into neural function.

The concepts of probability and information are of fundamental importance to the computational goal of the nervous system. Since at least as far back as von Helmholtz (1896) [

According to Bayesian accounts, the function of the brain is to infer or predict the state of the world for the purpose of selecting motor outputs (“decision-making” in a broad sense of the term). I have proposed that the only fundamental problem in making decisions is uncertainty (lack of information) about the world [

As scientists who view the goal of the brain as inference, our goal is to try to use our knowledge to infer the information and inference of a brain. Thus we try to “take the brain’s point of view”, a challenge that has been approached through a variety of different methods [

Are the information and probabilities used in neuroscience properties of the environment (an observed object), the neural system under investigation, or the scientist? The frequentist view is that probabilities are a property of a physical system (or object) that generates frequency distributions, and they exist independent of any observer of that system. The physical system could correspond to a neural system or anything else. Here I argue in favor of the view of Jaynes (and Laplace, shown at left) that probabilities are always conditional on the information of an observer about an object. I presume that the observer’s information must be in a physical form inside the observer. There could be many observers, but in neuroscience, the observer of interest would usually correspond either to the scientist (who observes a neural entity as well as its environment from a “third-person” perspective), or to the neural entity under investigation (for example, a brain, neuron or ion channel, which observes its environment from its “first-person” perspective). The arrows indicate the typical direction of information flow. The distinction between “observer” and “object” is merely one of perspective, and does not imply any fundamental distinction between the qualities of the two.

Two fundamentally distinct definitions of probability have been proposed, “Jaynesian” (better known as “Bayesian”) and “frequentist” (summarized in

Two definitions of probability. These can be illustrated by the following scenario: There are four mutually exclusive possibilities, A, B, C, and D. What is the probability of each?

Definition | Frequentist | Jaynesian |
---|---|---|
Answer | Probabilities are unknown and undefined because there are no frequencies | 1/4 = 0.25 |
Meaning | Probabilities come from measuring frequencies | Probabilities describe information (from any source) |
Derivation | A collection of | Logic (common sense) |
Attribution | Probabilities are a property of a physical system | Probabilities are a property of an observer of a physical system |

Thomas Bayes was the first to explicitly and formally introduce the concept that probabilities can be conditional on information in what is now known as Bayes’s Theorem (BT) (he died in 1761, and his theorem was published posthumously). Like others before him, he simply used his own intuition to derive probabilities from information. Pierre-Simon Laplace (shown in

Jaynesian theory ultimately seeks to provide a unified account of information (knowledge) and logic (reason), and thus its relevance extends far beyond formal applications of probability theory. The Jaynesian definition of probability has been advanced by many authors over the years (e.g., [

At the risk of oversimplification, the approach of Jaynes can be summarized by the statement:

Logic is universal and indisputable, whereas information is localized in space and time to an observer, and it typically differs over time and between observers.

The probability of a particular proposition (or state) is always entirely conditional on a particular set of knowledge or information. This conditionality is the defining feature of Jaynesian probabilities. The rules relating knowledge to probability are essentially just “common sense” or logic, as embodied within the principle of maximum entropy. The maximum entropy principle requires that we fully acknowledge our ignorance by considering all possibilities equally probable unless we have evidence to the contrary. For example, if the only information available is that “there are four possible outcomes”, then the probability distribution that describes that information is “flat” since entropy is maximized when the probabilities are all equal. Since by definition the sum of the probabilities must equal one (which is merely a trivial but useful convention), the probability of each outcome is 0.25. In contrast to this contrived example, we often have information that does not constrain the number of possible states, but does constrain the location and scale. If a state of knowledge consists only of the location and scale (such as mean and variance), with no bounds on the state space, that knowledge is described by a Gaussian probability distribution, which has the maximum entropy given this knowledge. In some cases our knowledge derives almost entirely from observing the past frequency of an event, in which case the probability distribution that best describes our knowledge may closely resemble the observed frequency distribution. Thus knowledge derived from measurement of frequency distributions is treated just like any other knowledge.
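The two cases above can be checked numerically. The following is a minimal sketch, not a general entropy maximizer: the `skewed` and `certain` distributions are hypothetical alternatives chosen for comparison, and the continuous case uses the standard closed-form differential entropies of three distributions that share the same standard deviation `sigma`.

```python
import math

def entropy_bits(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# The only information: "there are four possible outcomes" (A, B, C, D).
flat = [0.25, 0.25, 0.25, 0.25]
# Hypothetical alternatives that claim more than that information justifies.
skewed = [0.4, 0.3, 0.2, 0.1]
certain = [1.0, 0.0, 0.0, 0.0]

for name, dist in [("flat", flat), ("skewed", skewed), ("certain", certain)]:
    print(name, round(entropy_bits(dist), 3))

# Continuous case: differential entropy (in nats) at a fixed standard deviation.
def gaussian_entropy_nats(sigma):
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def laplace_entropy_nats(sigma):
    scale = sigma / math.sqrt(2)      # Laplace variance is 2 * scale**2
    return 1 + math.log(2 * scale)

def uniform_entropy_nats(sigma):
    width = sigma * math.sqrt(12)     # uniform variance is width**2 / 12
    return math.log(width)

for f in (gaussian_entropy_nats, laplace_entropy_nats, uniform_entropy_nats):
    print(f.__name__, round(f(1.0), 4))
```

Among the discrete candidates the flat distribution has the largest entropy, and among the three continuous families compared here the Gaussian has the largest entropy for a given variance. A comparison of three families illustrates the claim but does not prove it.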

The bridge between information and a probability distribution is the principle of maximum entropy. The probability distribution that correctly describes a state of knowledge is the one with maximum entropy, and thus mathematical methods to find a probability distribution should seek to maximize the entropy

The entropy (H(x|_{i}_{i}
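The quantity being maximized is the Shannon entropy. In assumed notation (the symbols $x_i$ for the possible states and $I$ for the information on which the distribution is conditional are illustrative choices), the standard discrete form is:

```latex
H(x \mid I) = -\sum_{i} p(x_i \mid I)\,\log p(x_i \mid I)
```

With a base-2 logarithm the entropy is measured in bits; with the natural logarithm, in nats.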

A probability distribution is conditional on information, and it fully and quantitatively characterizes the information upon which it is conditional (with respect to a particular state space). Having derived a probability distribution, we can also measure the

Information and entropy (uncertainty) have a fairly simple and intuitive relationship: The more information, the narrower the probability distribution, and the less the entropy. The information determines the probabilities, and the entropy, through the maximum entropy principle. If one were to quantify information, it would simply be the difference between the entropy
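One way to make this relationship concrete is to compute the drop in entropy when information narrows a distribution. The sketch below assumes a flat reference distribution over four states and a hypothetical observer who has learned that two of the states are impossible. Both the choice of reference and the numbers are illustrative assumptions.

```python
import math

def entropy_bits(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def information_bits(reference, informed):
    """Information measured as the reduction in entropy relative to a reference."""
    return entropy_bits(reference) - entropy_bits(informed)

reference = [0.25, 0.25, 0.25, 0.25]   # knows only that there are four states
informed = [0.5, 0.5, 0.0, 0.0]        # hypothetical: has also learned "not C, not D"

print(information_bits(reference, informed))
```

Ruling out half of four equally probable states reduces the entropy from two bits to one, so the information gained relative to the reference is one bit. The value depends on the reference distribution chosen.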

In Equation 2,

The relativistic nature of Shannon information (Equation 2) can cause confusion and limit its general utility as a measure of information. Intuitively, we would like to quantify the amount of information in such a way that entropy corresponds directly to an absence of information, similar to the way in which a vacuum corresponds to an absence of matter. We would like to use a single and universal scale, allowing us to measure it in an absolute sense, in isolation and without consideration of any other information. This would allow us to quantify all of an observer’s information (

We could quantify information on an absolute scale using Equation 2 if we can identify a probability distribution corresponding to a state

The maximum entropy principle of Jaynes is a formal expression of logic, and it is the most fundamental principle within probability theory, since it defines probabilities and enables their derivation. All probabilities should be maximum entropy distributions (MEDs). In principle, all probabilities could be derived through entropy maximization, and thus the maximum entropy principle is sufficient to encapsulate all of probability theory. However, except for the simplest states of information, deriving MEDs directly is challenging if not impossible as a practical matter. Fortunately there is a rule known as Bayes’s Theorem (BT) that allows us to manipulate probabilities without the trouble of directly maximizing entropy (in a formal, mathematical sense). BT takes two or more MEDs, each based on distinct information, and finds a single MED for all the information. Like all of probability theory, BT is simply an expression of logic acting on information. It is an equality derived from the decomposition of a joint probability into a product of its components.
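The decomposition mentioned above can be written out in two lines. The notation is assumed for illustration: $x$ is the state being inferred, $y$ is newly acquired information, and $I$ is the prior information on which every term is conditional.

```latex
\begin{aligned}
p(x, y \mid I) &= p(x \mid y, I)\, p(y \mid I) = p(y \mid x, I)\, p(x \mid I) \\
\Rightarrow\quad p(x \mid y, I) &= \frac{p(y \mid x, I)\, p(x \mid I)}{p(y \mid I)}
\end{aligned}
```

The joint probability can be factored in either order, and equating the two factorizations gives the theorem, provided $p(y \mid I) > 0$.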

Verbally, it is typically described as stating that the posterior probability distribution equals the product of the prior distribution (
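For a discrete state space, the verbal description amounts to multiplying the prior by the likelihood term by term and normalizing. The following is a minimal sketch under assumed inputs: the function name `bayes_update` and the likelihood values are hypothetical, chosen only to show the mechanics.

```python
def bayes_update(prior, likelihood):
    """Posterior over discrete states: normalized product of prior and likelihood."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)  # probability of the observation under the prior
    if evidence == 0:
        raise ValueError("the observation has zero probability under the prior")
    return [u / evidence for u in unnormalized]

# Flat prior over four states (only "four possibilities" is known).
prior = [0.25, 0.25, 0.25, 0.25]
# Hypothetical likelihood of one observation under each state.
likelihood = [0.8, 0.1, 0.1, 0.0]

posterior = bayes_update(prior, likelihood)
print([round(p, 3) for p in posterior])
```

With a flat prior the posterior is simply the normalized likelihood, so the prior contributes nothing beyond the list of possible states.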

The general method of Jaynesian probability theory can be summarized as a sequence of steps that apply to the formal derivation and manipulation of probabilities (

There is a further stipulation that is particularly critical for understanding the brain and the physical basis of information and inference. In addition to precisely stating the information, we must also specify where we believe that information to be, and if possible, its putative physical basis. “Where” must at least specify whether it is possessed by a scientist, a neural system under investigation, or whether it is in the environment external to the neural system (

The General Methodological Sequence of Jaynes.

Several criticisms have been made of Jaynesian probabilities. Extensive counterarguments have been given elsewhere (e.g., [

Jaynesian probabilities have been criticized as being “subjective”, and thus ambiguous and not appropriate for science. The simple characterization of Jaynesian probabilities as “subjective” can be misleading. As expressions of logic, Jaynesian probabilities are objective properties of information without any ambiguity. Two observers with the same information will apply the same probabilities, and two observers with different information will apply different probabilities. The information is subjective in the sense that it tends to differ across observers, in accord with reality. It is the ability of Jaynesian probabilities to objectively describe subjective information that makes them so useful in understanding brain function.
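The point that probabilities follow objectively from whatever information an observer holds can be illustrated with the four-outcome scenario. In this minimal sketch the observers and the extra knowledge "not D" are hypothetical. It also assumes that each observer's only information is which states remain possible, in which case the maximum entropy assignment is flat over those states.

```python
from fractions import Fraction

STATES = ["A", "B", "C", "D"]

def flat_over(allowed):
    """Maximum entropy assignment when the only information is which states remain possible."""
    allowed = set(allowed)
    share = Fraction(1, len(allowed))
    return {s: (share if s in allowed else Fraction(0)) for s in STATES}

observer_1 = flat_over(["A", "B", "C", "D"])  # knows only the four possibilities
observer_2 = flat_over(["A", "B", "C"])       # hypothetical: has also learned "not D"
observer_3 = flat_over(["A", "B", "C", "D"])  # same information as observer 1

print(observer_1 == observer_3)  # same information: do the probabilities agree?
print(observer_1 == observer_2)  # different information: do they agree?
print(observer_2)
```

The function has no free choices, which is the sense in which the probabilities are objective. Only the input (the information) varies between observers.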

A second criticism of Jaynesian probabilities is that it is not always clear how they should be calculated. Although this is undoubtedly true, it is not a valid criticism of Jaynesian methods. To derive Jaynesian probability distributions, one must first be able to specify precisely what information is relevant, and then perform the mathematics. In the case of brain research, each of these might be difficult or even impossible as a practical matter, but there is no reason to think that it cannot be done as a matter of principle. Furthermore, there is reason to think this may not be so challenging with respect to the neural basis of inference (see

A third criticism stems from confusion of the subjective aspect of Jaynesian probabilities with our conscious attempts at quantifying the strength of our beliefs. Jaynesian probabilities are not simply a number that a person verbally reports based upon introspection. A person typically struggles to state the probability that one of their beliefs is true. This may be in part for the same reason that scientists and experts on probability theory struggle to rationally calculate probabilities in cases in which a great diversity of information is relevant. Of perhaps greater importance is the fact that, although human behavior is routinely based upon perceptions of what is probable and what is not, to be asked to verbally state (or consciously perceive) a probability is highly unusual and seldom important. To solve any particular problem, the brain must have information about some relevant aspect of the world (with a corresponding state space). If the brain is then asked to verbally specify a probability, the relevant aspect of the world (and its state space) now corresponds to the abstract notion of “probabilities” themselves, and the problem facing the brain has radically changed. The brain may have substantial information about some aspect of the world, but may have little information about the probabilities. In other words, information about “

A fourth criticism is not aimed at Jaynesian probabilities themselves, but rather questions their utility in understanding brain function on the grounds that behavior is not rational (optimal in its use of information). Although a full discussion of this issue is beyond the scope of the present work, several important points should be noted.

Humans are capable of reason, and there are numerous instances in which brain function is at least semi-rational.

To assess whether a brain is rational, we first must know what relevant information is in the brain. We should not confuse ignorance with lack of reason.

We know for a fact that the brain does not rationally integrate all of its information at all times. However, logical integration of smaller amounts of information, perhaps at the level of single neurons, is certainly conceivable.

Although the application of Jaynesian principles to the brain is often viewed as prescriptive, specifying how the brain ought to function, my view is that Jaynesian principles are better viewed as descriptive (see

Like any other intellectual endeavor, the frequentist approach to probabilities relies on the application of reason to information. It may be understood as a poorly formulated and incomplete implementation of Jaynesian principles. The most general point to be made here is that those maintaining a frequentist view are in reality basing their probabilities on their own information. Frequentist probabilities are not actually properties of an observed system, as frequentists claim.

Some specific faults with the frequentist definition of probability are summarized below. Criticisms 1–5 have all been made previously by Jaynes and others (e.g., [

The frequentist definition contradicts intuition, and is of limited use, because it completely fails to account for information that is obviously relevant to probabilities. The simple example given above is the statement that “there are

In those cases in which frequency distributions are available, it is unclear over what finite range or period they ought to be measured in order to derive probabilities. It is common to assume that the world is “stationary” and then to extrapolate an observed frequency to its limit as one imagines an infinite set of observations. But the real world is seldom if ever known to be stationary, and the actual data is always finite. To imagine otherwise is wishful thinking. Without any real, unique, and well-defined frequency distribution, the concept of a “true” probability, which we try to estimate, is just fantasy.

Because measurement of a frequency distribution over a finite period is never sufficient to fully specify a probability distribution, derivation of a probability distribution requires incorporation of additional information that is

Empirical evidence has shown that given the same information, there are many cases in which frequentist methods make less accurate predictions than Jaynesian methods [

A common perception is that while some probabilities come directly from frequencies, others are conditional on information, and therefore the frequentist definition is applicable in some situations and the Jaynesian definition in others. But Jaynesian methods have no problem incorporating information derived from frequency distributions. In those cases in which frequentist methods succeed, they give the same probabilities that could be derived from Jaynesian methods. At their best, frequentist methods are a special case of the more general Jaynesian methods. Thus there is no virtue or advantage to the frequentist definition of probability.
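A classic illustration of how frequency data can be treated as one source of information among others is Laplace's rule of succession. The sketch below compares the raw observed frequency with the probability of success on the next trial under a flat prior over the unknown rate. The flat prior and the trial counts are assumptions made for illustration, not something prescribed by the text.

```python
def observed_frequency(successes, trials):
    """Raw relative frequency of success in the observed data."""
    return successes / trials

def rule_of_succession(successes, trials):
    """Laplace's rule: probability of success on the next trial, given a flat prior on the rate."""
    return (successes + 1) / (trials + 2)

# Hypothetical data sets of increasing size.
for successes, trials in [(3, 3), (30, 30), (300, 1000)]:
    freq = observed_frequency(successes, trials)
    prob = rule_of_succession(successes, trials)
    print(f"{successes}/{trials}: frequency={freq:.4f}, probability={prob:.4f}")
```

After three successes in three trials the frequency is 1, which would assert certainty, whereas the rule of succession gives 4/5. With a thousand trials the two numbers nearly coincide, consistent with frequentist results appearing as a special case when the data dominate.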

The most glaring faults of frequentist probabilities are not evident in conventional applications of probability theory in which scientists are always the observer, but arise solely when an observer (such as a scientist) wishes to understand how another observer (such as a brain) can perform inference.

Frequentist methods are narrow in their objective and do not address inference in general.

Frequentist methods do not incorporate logic in a formal sense, and thus cannot help in understanding its neural basis.

The most severe fault is the misattribution of knowledge. When frequentist approaches have been used to study the neural basis of inference [

Despite these serious flaws with frequentism, I have found one small and not so serious piece of evidence in its favor. Before discussing any of the relevant issues in my lectures to graduate and undergraduate students, I simply present the same questions and answers presented above and in

Although Jaynes referred to his own work as “Bayesian”, as opposed to “frequentist”, there are reasons to favor the term “Jaynesian” for many aspects of contemporary probability theory. The maximum entropy principle of Jaynes is the most primitive and fundamental principle within probability theory. It alone

Within the field of “Bayesian brain theory”, “Bayesian” has indeed been used to denote any use of BT. In fact, BT has often been used with frequentist rather than Jaynesian probabilities (see

The frequentist view of probability may be very slowly falling out of favor, but it still exerts a dominant influence within neuroscience. In 2004, a prominent neuroscientist wrote in a letter to the author: “How can probabilities of external events be conditional on the internal information an animal has, unless we assume telekinesis?” Another vivid example of the frequentist perspective comes from a 2010 “Review” in Nature Reviews Neuroscience [

The prevalence of the frequentist view in neuroscience is not immediately obvious because it is routinely used without any acknowledgment that more than one definition of probability exists. Even books that make extensive use of probabilities and quantify information do not clearly state a definition of probability or mention that there is more than one definition [

A natural consequence of frequentist probabilities is the need to imagine “probabilities of probabilities”, with the former probabilities being conditional on an observer and the latter being unconditional properties of an object (observed system). Similarly, distinctions have been made between the neural representation of “known and unknown probabilities” (e.g., [

Additional evidence of a partially frequentist perspective comes from the use of Bayes’s Theorem (BT). Within books [

The frequentist perspective and its terminology have a powerful influence on how we view natural processes, including those in the brain. It is routine to refer to neural processes as “random” or “stochastic” or “probabilistic”. These are all adjectives that are properly used to describe an observed system, not the observer (“noisy” is sometimes used in this manner as well). Thus the strong implication of this language is that these are intrinsic properties of these physical systems, rather than merely a description of our own ignorance and consequent inability to make accurate predictions. The notion of randomness is another example of the misattribution of information and probabilities. The apparent randomness of neural activity has provided the uncertainty in purportedly “Bayesian” models of how neurons can perform inference (see

Consider the output of each of four systems in the case of a known and invariant input (

1. An ion channel is either open or closed.

2. A vesicle containing neurotransmitter is either released or not released from a presynaptic terminal.

3. A neuron either generates an action potential or it does not.

4. In a “tail-flick assay”, a rat either removes its tail from a hotplate or it does not.

In each of these cases, the input can be invariant (over time or “trials”), and yet the output is variable. Thus the input does not fully determine the output. Cases 1 and 2, on a more microscopic scale, are routinely referred to as “stochastic” or “random” processes, and case 3, at the “cellular level”, is sometimes referred to in this manner. Case 4 concerning behavior is usually not said to be “random”, and most neuroscientists would probably agree that no behavior is a “truly random process”. Are these cases fundamentally distinct from one another? What is a “random process”, and how would one recognize it?

Illustration of neural systems that are sometimes said to be “random, stochastic, noisy, or probabilistic.”

If input does not fully determine the output of a physical system, is the system partially “random”? The answer, with respect to each of the four cases above, is “no” according to the view of contemporary physics. In each case, the system is believed to exist in a state that is “hidden” from the observation of the scientist at the time of the input. These hidden states act “behind the scenes” to determine the observed output of the system for any given input. The apparent randomness of the system is merely the product of the ignorance of the scientist. According to present scientific dogma, it is only at the very small scale of quantum physics that physical systems are “truly random”. One could question, as Jaynes did, how physicists could ever be confident that their inability to predict the behavior of any system, including systems on a very small scale, is not merely due to their own ignorance [

It may be useful to consider these neural systems in more detail. Ion channels are the most microscopic of the four systems listed above. Until the invention of the patch-clamp method of electrophysiological recording, it was not possible for us to observe the electrical behavior of single channels. Now we know that many single channels have just two conductance states, open or closed, and we cannot predict with high accuracy the conductance state of a single channel at any moment. However, our best models of ion channels consist of numerous open and closed states (mostly closed states), which are indistinguishable in patch-clamp recordings (

The case of action potential generation is better understood and is generally thought to be more directly relevant to neural inference. Action potentials can be recorded with electrodes that are intracellular and measure voltage across the neuron’s membrane, or with extracellular electrodes that are sensitive to the time of the action potential but not to the membrane voltage. If extracellular electrodes are used, then it is observed that the neuron’s spike output varies even when its input (“stimulus”) is the same. The neuron’s output (firing rate) has thus been modeled as a Poisson “random” process. However, the neuron’s output does not appear nearly so “random” when one uses an intracellular electrode, in which case action potentials are always preceded by a depolarization towards a threshold. According to well-known models of neuronal membrane biophysics (such as Hodgkin-Huxley models), there is nothing random about the generation of action potentials [

It is obvious that the same system cannot be “random” when observed with one technique and “deterministic” when observed with another technique. It is also clear that neuroscience would have suffered a tremendous setback if early pioneers of extracellular recording had concluded that action potential generation was substantially random, and that we should therefore be satisfied merely to characterize Poisson spike statistics without seeking greater knowledge of underlying mechanisms. With such a view, Hodgkin and Huxley might not have attempted to construct a biophysical and deterministic model of action potential generation. The entire scientific enterprise is based on our belief that Nature is full of information that we can acquire through careful observation. Were we to abandon this belief in favor of the view that a particular system is “random”, the implication would be that the system has no further information to yield and that we should therefore not invest our time in searching for a deeper understanding of it.
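The dependence of apparent randomness on what the observer can measure can be shown with a toy model. The sketch below uses a leaky integrate-and-fire neuron as a deliberately simplified stand-in for Hodgkin-Huxley dynamics. All parameter values are hypothetical, and the code contains no random numbers. The membrane voltage at stimulus onset plays the role of the hidden state that an extracellular electrode cannot see.

```python
def spike_count(initial_voltage, input_current, steps=60, dt=1.0,
                tau=20.0, threshold=1.0, reset=0.0):
    """Deterministic leaky integrate-and-fire neuron: count spikes in a fixed window."""
    v = initial_voltage
    spikes = 0
    for _ in range(steps):
        v += dt * (-v + input_current) / tau  # leak toward the input level
        if v >= threshold:                    # threshold crossing produces a spike
            spikes += 1
            v = reset
    return spikes

stimulus = 1.2  # identical input on every "trial"
# Hidden state at stimulus onset, invisible to an extracellular observer.
for hidden_voltage in (0.0, 0.9):
    print(hidden_voltage, spike_count(hidden_voltage, stimulus))

# Repeating a trial with the same hidden state always gives the same output.
print(spike_count(0.9, stimulus) == spike_count(0.9, stimulus))
```

An observer who records only spike counts sees the same stimulus produce different outputs across trials, while an observer who also knows the initial voltage can predict each count exactly. The variability lies in the first observer's information, not in the model.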

How then are we to use the words “random, stochastic, and probabilistic”? The use of these words to describe physical systems is at best misleading, and at worst, pathological and harmful to science. However, it is perfectly clear and appropriate to state that a model of a physical system has a “random” or “stochastic” component. This does not imply that the scientist necessarily believes any physical system to be “truly random”. The random component typically corresponds to a component of the model that is not sufficiently understood, or not sufficiently important, for the scientist to specify in detail. The scientist has complete knowledge of the model, which he or she created. The model naturally corresponds to a hypothesis about a physical system, based upon the scientist’s knowledge, and the random components of the model correspond to our ignorance about the physical system. As emphasized repeatedly here, we should not confuse our knowledge of a physical system with the physical system itself (which we believe to have an independent existence).

In much of neuroscience it is of limited practical consequence whether an event, such as the release of a vesicle, is inherently random or merely unpredictable for the scientist. However, biological models of Bayesian inference have relied on this apparent randomness to derive probability distributions that are purported to be conditional on neural activity but are derived simply from measuring frequency distributions of firing rate across time (repeated “trials”) (e.g., [

The value of Jaynesian probability theory is in providing the foundational framework for understanding information and logic. Jaynes did not address the physical or neurobiological basis of information and logic. In another article in this same issue of Information, Phillips also explores the work of Jaynes in relation to brain function and he comes to conclusions that are similar in several respects to my own [

The advantages of Jaynesian probability theory have been well documented (

In studying a physical system, a scientist is an observer, and the physical system is the object of inference. Although we necessarily observe brains (and people) as objects, we also believe that they themselves are observers (

Scientists ideally try to work from a common, shared body of knowledge, and to thereby describe nature from the common perspective of a unified Science. To the extent that two rational scientists share the same information, they will naturally agree on the probabilities. Indeed, the sharing of information is why the Jaynesian versus frequentist debate sometimes appears to be of only slight consequence with respect to the actual probabilities. Frequentist probabilities tend to be very similar (numerically) to third-person Jaynesian probabilities. However, we can expect that the information of a brain or neuron is likely to be very different from that of a scientist about that same aspect of the world, in which case the probabilities will be very different.

The distinction between neurocentric and xenocentric is not at all restricted to probabilities. Whether in science or our personal lives, we must always observe from a particular perspective. In psychology and cognitive neuroscience, this same issue was once actively debated. B.F. Skinner was the most prominent advocate of a xenocentric approach to behavior (“behaviorism”), in which the input-output relationship of an animal or human is studied as an object like any other physical system. Although this approach can be useful, the input-output relationships are hopelessly complex. Indeed, we cannot even adequately understand the function of a single ion channel without consideration of its many hidden internal states (

Within neuroscience today, the perspective taken depends almost entirely on whether the system being studied is viewed by the scientist as “high” or “low”. In studying “high-level”, “cognitive” systems that are presumed closer in function to the conscious experience of humans (e.g., prefrontal cortex), a neurocentric perspective is adopted. But in considering parts of the nervous system that are viewed as sensory or motor, or in considering any brain region in a “lower” animal, biologists have seldom taken a neurocentric approach. In studies at the cellular or molecular level, xenocentric input-output analyses in the spirit of Skinner are the absolute rule. Even when the goal of “taking the neuron’s point of view” has been explicitly stated by the authors, the actual probability distributions have been conditional on the information of the scientists about a neuron’s inputs and outputs. This is true of the work of Rieke and colleagues [

I believe that one virtue of Jaynesian theory in contemporary neuroscience is to counter what I see as an excessive emphasis on BT (Equation 3). Whereas the mainstream view seems to be that to “be Bayesian” is synonymous with the “performance” of BT, BT is not an absolutely essential aspect of probability theory (

First, these models presume that the brain needs to “know the probabilities” and that a probability is the answer to an inference problem and should thus correspond to a neuronal output. This implies that a brain could have information and yet be incapable of inference because of its inability to do the math. But although the brain needs to know about aspects of the real world, it is not at all clear that it needs to “know about the probabilities” of aspects of the real world.

Second, if the brain does in fact need to calculate probabilities, then the first calculations must identify maximum entropy distributions that are conditional on the brain’s information. BT is useless until probabilities are available. Thus it is at best premature to focus on the math of BT rather than the more fundamental precursor of entropy maximization. And if maximization of entropy can somehow be achieved readily, even for complex sets of information, then BT is unnecessary.
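
The ordering of these two steps can be sketched in generic notation. The symbols $I$ (the available information), $f_k$ and $F_k$ (constraint functions and their known expectation values), $\lambda_k$ (Lagrange multipliers), and $H$ and $D$ (hypothesis and data) are assumptions chosen for this illustration and may differ from the notation of Equation 3.

```latex
% Step 1: entropy maximization assigns probabilities from the information I,
% expressed here as constraints on expectation values.
\[
\max_{p}\; -\sum_i p_i \ln p_i
\quad \text{subject to} \quad
\sum_i p_i = 1, \qquad \sum_i p_i f_k(x_i) = F_k
\]
\[
\Longrightarrow\quad
p(x_i \mid I) = \frac{1}{Z(\lambda)} \exp\Big(-\sum_k \lambda_k f_k(x_i)\Big),
\qquad
Z(\lambda) = \sum_i \exp\Big(-\sum_k \lambda_k f_k(x_i)\Big)
\]

% Step 2: Bayes's theorem only transforms probabilities that already exist.
\[
p(H \mid D, I) = \frac{p(D \mid H, I)\, p(H \mid I)}{p(D \mid I)}
\]
```

Every term on the right-hand side of the second expression must already have been assigned before it can be evaluated, which is the sense in which BT presupposes a step like the first.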

Third, it has been proposed that neurons perform BT, taking a prior and one or more likelihood functions as inputs, and generating an output that represents the posterior [

The alternate view favored here is that to function well the nervous system needs to have the right information in the right place at the right time, but that it has no need to perform any of the mathematics of probability theory [

If Jaynesian logic is viewed as prescriptive, then this would naturally call for experimental tests to verify how well, if at all, the brain follows Jaynesian principles. The results may be expected to vary on a case by case basis, with the brain being rational in some respects but not in others. By contrast, if Jaynesian probabilities are viewed as purely descriptive, then the issue of empirical verification does not arise. The probabilities merely describe the information in the brain, whether the brain functions well or is pathological [

In virtually all applications of probability theory, the probabilities and information have no known physical basis. Jaynesian probabilities are conditional on information, and we can presume that the information must be within some physical substrate, but we cannot say precisely what the substrate is or how the probabilities are conditional on that physical reality. What we would like is to be able to identify a probability distribution over states of one physical system (“the object”) conditional exclusively on the information inherent within the state of another physical system (“the observer”). To the best of my knowledge, this “physical inference” has never been described in a precise quantitative manner for any physical system, either in physics or neuroscience.

Although Jaynes was a physicist, he did little to directly address the physical basis of information or logic. Like Laplace before him, he developed probability theory to allow us to characterize the information of scientists about the world, and he primarily applied probabilities towards problems in physics. Among his greatest achievements in this regard was his simultaneous introduction in 1957 of the maximum entropy principle and its application to the field of statistical mechanics (SM) [

Jaynes demonstrated that a small and simple state of information is sufficient to derive probability distributions, and in some cases it can be extremely powerful in making accurate predictions about the world. This suggests that the physical basis of inference may be an approachable problem, at least in cases in which the observer possesses only a small amount of information. Mathematically, the derivation of probabilities is simpler for simpler sets of information. Furthermore, when we consider physical inference, it may be that we typically need to consider only small and relatively simple states of information.
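
As a minimal numerical sketch of how little information is needed to derive a distribution, consider a hypothetical observer that knows only the six possible faces of a die and its average value. The function names, the bisection bounds, and the example mean of 4.5 are assumptions made for this illustration; they are not taken from the text above.

```python
import math


def maxent_given_mean(values, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Maximum entropy distribution over `values` with a fixed mean.

    The solution has the form p_i proportional to exp(lam * x_i).
    The multiplier lam is found by bisection, using the fact that
    the mean increases monotonically with lam.
    """

    def dist(lam):
        exps = [lam * x for x in values]
        m = max(exps)  # subtract the largest exponent for numerical stability
        weights = [math.exp(e - m) for e in exps]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(p * x for p, x in zip(dist(lam), values))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist(0.5 * (lo + hi))


def entropy_bits(p):
    """Shannon entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


faces = [1, 2, 3, 4, 5, 6]

# Knowing only that the mean is 3.5 adds nothing beyond the list of faces,
# so the result is the uniform distribution.
uniform = maxent_given_mean(faces, 3.5)

# Knowing that the mean is 4.5 shifts probability towards the higher faces.
skewed = maxent_given_mean(faces, 4.5)

print([round(p, 4) for p in uniform], round(entropy_bits(uniform), 4))
print([round(p, 4) for p in skewed], round(entropy_bits(skewed), 4))
```

The single constraint on the mean is the observer's entire state of information in this sketch, yet it fixes a complete probability distribution, and the more informative constraint yields the lower entropy.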

When we think of a brain, or a computer, we imagine a large and complex set of information of the sort that in practice would overwhelm a team of the best probability theorists. This information is akin to a vast library (e.g., Google’s servers), but a library does not consist of integrated information. To perform logic and inference, the information must be

We can presume that the knowledge of an observer about the external world corresponds to “knowledge of self”. An observer knows with certainty its own state, including its configuration, temperature, mass,

To understand how a single molecule could “perform inference”, it is not essential to understand proteins or biophysics or even Boltzmann’s distribution. The simple point is that at each moment in time, a molecular sensor “observes” a single “sample” of some external quantity (external to the neuron), such as neurotransmitter concentration. As Jaynes demonstrated, in SM and numerous other cases, that is all the information that is necessary to derive a probability distribution and “perform” inference. By observing that sample, the probability distribution conditional on the information of the molecular sensor (and thus the neuron) is narrower than it would be were the sensor not there to observe the sample (see

If a single molecule is capable of performing inference and reducing uncertainty, then one could reasonably argue that this is trivial. Every cell in the body is full of such molecular sensors, and this certainly does not in itself provide clear insight into the structure of the nervous system. However, some sensors, and some synaptic inputs, provide more information than others depending on the statistical structure of the neuron’s environment, and this forms an objective basis for understanding which inputs a neuron should select and what sorts of learning rules it should implement [

Whereas it is commonly implied that our lack of understanding of brain function should be attributed primarily to its complexity, and to its many mechanistic properties that we have not yet characterized, the alternative view presented here is that confusion about the nature of information has been a more fundamental problem. A major source of that confusion has been the longstanding frequentist notion that the probabilities and information that scientists have characterized are properties of the physical systems that scientists observe, when in fact these probabilities describe the information of the scientists themselves about those systems. Jaynes referred to this as the “mind projection fallacy” [

I would like to thank Bill Phillips for discussions and comments on the manuscript, Karl Friston and Peter Freed for helpful comments on earlier versions of the manuscript, and Chang Sub Kim for helping me to better understand statistical mechanics. This work was supported by the World Class University (WCU) program of the National Research Foundation of Korea funded by the Ministry of Education, Science and Technology (grant number R32-2008-000-10218-0).