Article

Information Processing in the Brain as Optimal Entropy Transport: A Theoretical Approach

by Carlos Islas 1,†, Pablo Padilla 2,*,† and Marco Antonio Prado 1,2,†
1 Universidad Autónoma de la Ciudad de México, Doctor García Diego núm. 168, Cuauhtémoc, Ciudad de México 06720, Mexico
2 IIMAS, Universidad Nacional Autónoma de México, Circuito Escolar, Cd. Universitaria, Ciudad de México 04510, Mexico
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Entropy 2020, 22(11), 1231; https://doi.org/10.3390/e22111231
Submission received: 9 September 2020 / Revised: 14 October 2020 / Accepted: 15 October 2020 / Published: 29 October 2020
(This article belongs to the Section Entropy and Biology)

Abstract: We consider brain activity from an information-theoretic perspective. We analyze information processing in the brain, considering the optimality of Shannon entropy transport using the Monge–Kantorovich framework. It is proposed that some of these processes satisfy an optimal transport of informational entropy condition. This optimality condition allows us to derive an equation of the Monge–Ampère type for the information flow that accounts for the branching structure of neurons via the linearization of this equation. Based on this fact, we discuss a version of Murray’s law in this context.

1. Introduction

The brain, as the organ responsible for processing information in the body, has been subjected to evolutionary pressure. “The business of the brain is computation, and the task it faces is monumental. It must take sensory inputs from the external world, translate this information into a computationally accessible form, then use the results to make decisions about which course of action is appropriate given the most probable state of the world” [1]. It is therefore arguable that information processing has been optimized, at least to some extent, by natural selection [2,3]. This is a rather abstract claim that should ultimately be contrasted with experiments (in this respect, see [4,5]). We explore this question in more detail in subsequent work, but a plausible connection can already be established with experimental and theoretical results via fMRI, in which different measures of cost have been proposed. The brain as an informational system is the subject of active research (see, for instance, [6,7,8]). For example, “behavior analysis has adopted these tools [Information theory] as a novel means of measuring the interrelations between behavior, stimuli, and contingent outcomes” [1]. In this context, it was shown in [1] that informational measures of contingency appear to be a reasonable basis for predicting behavior. We also refer the reader to the issue devoted to information theory in neuroscience [9] for a recent account of this perspective. In the work just mentioned, several papers investigate optimization principles (for instance, the maximum entropy principle [10] or the free energy principle; see [11,12]) as tools for understanding inference, coding, and other brain functionalities. Information theory can thus help bridge gaps between disciplines, for example, psychology, neurobiology, physics, mathematics, and computer science. In the present paper, we adopt a related setting and mathematically formalize information processing in the brain within the framework of optimal transport. The rationale for this is consistent with the view that some essential brain functionalities, such as inference and the coordination of tasks (e.g., auditory and motor activities), involve the transportation of information and that such processes should be efficient, have been subjected to evolutionary pressure, and, as a consequence, are (pseudo) optimal. As was already pointed out, our theoretical proposal should be contrasted with experimental results.
It should be observed from the very beginning that information processing and transport are of an intrinsically spatiotemporal nature. Therefore, our proposal should include these two features. In doing so, we expect the spatial part of the optimization to give rise to spatial patterns, for instance, network-like or branching hierarchical components, as well as to temporal structure, such as periodic or synchronized patterns. However, in this paper, we consider only the spatial part, except for a few general remarks. Since our proposal is an attempt to establish a methodological framework to study informational entropy transport in the brain, we deal with spatial aspects first, leaving the study of dynamical aspects for subsequent work. Furthermore, in order to simplify the problem, we consider only the one-dimensional case in the mathematical formalism. The geometry of the brain certainly plays an essential role in all these processes [13], however, and we extrapolate some of the results we obtain to two and three dimensions.
Now, we provide an overview of the paper. In Section 2, we present a general framework for information processing in the brain as an optimal transport of entropy, as well as some mathematical results in the context of the Monge–Kantorovich problem. The main idea is that what is transported is not some sort of physical mass but informational entropy. In order to provide the mathematical results, and for the sake of completeness, we adapt the material on the existence of a solution in the optimal mass transportation case, as presented by Bonnotte [14], to the informational case in Section 3, which concludes with a derivation of the Monge–Ampère equation in the one-dimensional case. We begin Section 4 by recalling the linearization of the Monge–Ampère equation around the square of the distance function, which involves the Laplacian. Then, we argue that adding a nonlinear term is justified by the physiological nature of transmission along neurons. The resulting model is a semilinear elliptic equation. At the end of that section, we relate the qualitative features of the solutions to the branching structure of neural networks in the brain; in other words, we show that the optimal transport of informational entropy is consistent with the geometric structure of neural branching. In Section 5, we elaborate on the relationship between the branching structure of neurons and Murray’s law [15], which provides the optimal ratio of the parent to the daughter branch sections, as well as the optimal bifurcation angle. We propose a modified version of Murray’s law when the underlying transport network carries information instead of a fluid. The last section is devoted to concluding remarks, further research, and open questions.

2. The Monge–Kantorovich Problem

We present a general overview of the results on optimal transportation theory needed in this work. For a complete exposition, see [14,16,17] or [18]; for the sake of completeness, we include a general discussion, giving appropriate references in place of proofs. Our presentation closely follows [14] and some parts of [16].
The original Monge–Kantorovich problem was formulated in the context of mass transport (Monge) or budget allocation (Kantorovich). In a later section, we will adapt the setting to include the transport of informational entropy.
Monge’s problem: Given two probability measures $\mu$ and $\nu$ on $\mathbb{R}^n$ and a cost function $c : \mathbb{R}^n \times \mathbb{R}^n \to [0, \infty]$, the problem of Monge can be stated as follows:
$$\text{Find } T : \mathbb{R}^n \to \mathbb{R}^n \text{ such that } \nu = T_{\#}\mu \text{ and } \int_{\mathbb{R}^n} c(x, T(x))\,d\mu(x) \text{ is minimal.} \tag{1}$$
The condition $\nu = T_{\#}\mu$ means that $T$ transports $\mu$ onto $\nu$; that is, $\nu$ is the push-forward of $\mu$ by $T$: for any $\xi$,
$$\int_{\mathbb{R}^n} \xi(y)\,d\nu(y) = \int_{\mathbb{R}^n} \xi(T(x))\,d\mu(x).$$
Monge–Kantorovich problem: Monge’s problem might have no solution; hence, it is better to consider the following generalization proposed by Leonid Kantorovich: instead of looking for a map, find a measure
$$\pi \in \Pi(\mu, \nu) \text{ such that } \int_{\mathbb{R}^n \times \mathbb{R}^n} c(x, y)\,d\pi(x, y) \text{ is minimal,} \tag{2}$$
where $\Pi(\mu, \nu)$ stands for the set of all transport plans between $\mu$ and $\nu$, i.e., the probability measures on $\mathbb{R}^n \times \mathbb{R}^n$ with marginals $\mu$ and $\nu$. This problem genuinely extends Monge’s problem: any transport map $T$ sending $\mu$ onto $\nu$ yields a measure $\pi \in \Pi(\mu, \nu)$, given by $\pi = (\mathrm{Id}, T)_{\#}\mu$, i.e., the only measure $\pi$ on $\mathbb{R}^n \times \mathbb{R}^n$ such that
$$\forall \xi \in C_b, \quad \int_{\mathbb{R}^n \times \mathbb{R}^n} \xi(x, y)\,d\pi(x, y) = \int_{\mathbb{R}^n} \xi(x, T(x))\,d\mu(x),$$
and the associated costs of transportation are the same. In this version, it is not difficult to show that there is always a solution ([14] or [16]).
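Since the Kantorovich formulation is a linear program over transport plans, a small discrete instance can be solved directly. The following sketch is ours, for illustration only; the grid, the densities, and the use of scipy.optimize.linprog are assumptions, not part of the original formulation. It minimizes $\sum_{ij} c_{ij}\pi_{ij}$ subject to the two marginal constraints.

```python
import numpy as np
from scipy.optimize import linprog

# Discrete Monge-Kantorovich problem: minimize sum_ij c_ij * pi_ij
# subject to sum_j pi_ij = mu_i, sum_i pi_ij = nu_j, and pi_ij >= 0.
n = 6
x = np.linspace(0.0, 1.0, n)
mu = np.ones(n) / n                          # uniform source measure
nu = np.exp(-(x - 0.7) ** 2 / 0.02)          # peaked target measure
nu /= nu.sum()
C = 0.5 * (x[:, None] - x[None, :]) ** 2     # quadratic cost c(x, y)

A_eq, b_eq = [], np.concatenate([mu, nu])
for i in range(n):                           # row marginal constraints
    row = np.zeros((n, n)); row[i, :] = 1.0; A_eq.append(row.ravel())
for j in range(n):                           # column marginal constraints
    col = np.zeros((n, n)); col[:, j] = 1.0; A_eq.append(col.ravel())

res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, None))
plan = res.x.reshape(n, n)                   # an optimal transport plan pi
print("optimal cost:", res.fun)
```

Any transport map $T$ induces a feasible plan $\pi = (\mathrm{Id}, T)_{\#}\mu$, which is why the Kantorovich minimum never exceeds the Monge cost.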

2.1. Dual Formulation

There is a duality between the Monge–Kantorovich problem (2) and the following problem:
$$\text{Find } \psi \in L^1(\mu),\ \phi \in L^1(\nu) \text{ such that } \psi(x) + \phi(y) \le c(x, y) \text{ and } \int_X \psi(x)\,d\mu + \int_Y \phi(y)\,d\nu \text{ is maximal.} \tag{3}$$
It seems natural to look for a solution of this problem among the pairs $(\psi, \phi)$ that satisfy:
$$\phi(y) = \inf_x \{c(x, y) - \psi(x)\} \quad \text{and} \quad \psi(x) = \inf_y \{c(x, y) - \phi(y)\}.$$
We will write $\phi(y) = \psi^c(y)$ and $\psi(x) = \phi^c(x)$.
Definition 1.
A function ψ is said to be c-concave if $\psi = \phi^c$ for some function ϕ. In that case, $\psi^c$ and $\phi^c$ are called the c-transforms of ψ and ϕ, respectively. We also say that $(\psi, \psi^c)$ is an admissible pair and that ψ, $\psi^c$ are admissible potentials.
Then, the problem becomes:
$$\text{Find } \psi \in L^1(\mu) \text{ such that } \int_X \psi(x)\,d\mu + \int_Y \psi^c(y)\,d\nu \text{ is maximal.}$$
The function ψ is called a Kantorovich potential between μ and ν.
The following proposition, known as the Kantorovich duality principle, relates the Monge–Kantorovich problem (2) with (3).
Proposition 1
(Kantorovich duality principle). Let μ and ν be Borel probability measures on $X$ and $Y \subset \mathbb{R}^n$, respectively. If the cost function $c : \mathbb{R}^n \times \mathbb{R}^n \to [0, \infty]$ is lower semi-continuous and:
$$\int_Y \int_X c(x, y)\,d\mu(x)\,d\nu(y) < \infty,$$
then there is a Borel map $\psi : \mathbb{R}^n \to \mathbb{R}$ that is c-concave and optimal for (3). Moreover, the resulting maximum is equal to the minimum of the Monge–Kantorovich problem (2); i.e.,
$$\min_{\pi \in \Pi(\mu, \nu)} \int_{\mathbb{R}^n \times \mathbb{R}^n} c(x, y)\,d\pi(x, y) = \max_{\phi \in L^1(\mu)} \left[ \int_X \phi(x)\,d\mu(x) + \int_Y \phi^c(y)\,d\nu(y) \right], \tag{4}$$
or:
$$\min_{\pi \in \Pi(\mu, \nu)} I[\pi] = \max_{(\phi, \phi^c) \in \Phi} J(\phi, \phi^c),$$
where,
$$\Phi = \left\{ (\phi, \psi) \in L^1(\mu) \times L^1(\nu) \;\middle|\; \phi(x) + \psi(y) \le c(x, y) \text{ for } \mu\text{-a.e. } x \in X \text{ and } \nu\text{-a.e. } y \in Y \right\}.$$
If $\pi \in \Pi(\mu, \nu)$ is optimal, then $\phi(x) + \phi^c(y) = c(x, y)$ π-almost everywhere.
Proof. 
A proof of this result can be found in [16]. □

2.2. Solution in the Real Line: Optimal Transportation Case

For the rest of this section, we only consider the one-dimensional case, as discussed in the Introduction.
Let $X$ and $Y$ be two bounded smooth open sets in $\mathbb{R}$ and $\mu(dx)$, $\nu(dy)$ probability measures on $X$ and $Y$, respectively, with $\mu(dx) = f\,dx$, $\nu(dy) = g\,dy$, $f = 0$ in $\mathbb{R} \setminus X$, and $g = 0$ in $\mathbb{R} \setminus Y$. Let $F : \mathbb{R} \to [0, 1]$ and $G : \mathbb{R} \to [0, 1]$ be the cumulative distributions of μ and ν, respectively, defined by $F(x) = \mu((-\infty, x])$ and $G(y) = \nu((-\infty, y])$.
Proposition 2.
Let $h \in C^1(\mathbb{R})$ be a non-negative, strictly convex function. Let μ and ν be Borel probability measures on $\mathbb{R}$ such that:
$$\int_Y \int_X h(x - y)\,d\mu(x)\,d\nu(y) < \infty. \tag{6}$$
If μ has no atom and F and G stand for the cumulative distributions of μ and ν, respectively, then:
$$T = G^{-1} \circ F$$
solves Monge’s problem (1) for the cost:
$$c(x, y) = h(x - y). \tag{7}$$
If π is the induced transport plan, that is:
$$\pi = (\mathrm{Id}, T)_{\#}\mu,$$
defined by $T_{\#}\mu(E) = \mu(T^{-1}(E))$ for $E \subset Y$, then π is optimal for the Monge–Kantorovich problem (2).
Proof. 
A proof of Proposition 2 can be found in [14]. □
In order to obtain the previous result, one has to consider the functional:
$$\int_{X \times Y} c(x, y)\,d\pi(x, y),$$
where $c : X \times Y \to \mathbb{R}$ is some given cost function and $\Pi(\mu, \nu)$ stands for the set of all transport plans between μ and ν, meaning the probability measures on $\mathbb{R} \times \mathbb{R}$ with marginals μ and ν; i.e.,
$$\Pi(\mu, \nu) = \left\{ \pi \in \mathcal{P}(X \times Y) \;\middle|\; \int_Y d\pi(x, y) = d\mu(x),\ \int_X d\pi(x, y) = d\nu(y) \right\}, \tag{8}$$
or more rigorously:
$$\Pi(\mu, \nu) = \left\{ \pi \in \mathcal{P}(X \times Y) \;\middle|\; \pi(A \times Y) = \mu(A),\ \pi(X \times B) = \nu(B),\ \forall A \subset X,\ B \subset Y \right\}.$$
Our goal is to obtain a result similar to Proposition 2 when entropy transportation is considered instead of mass transportation. This is the content of the next section.
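Proposition 2 translates into a direct numerical recipe in one dimension: build the two cumulative distributions and compose $G^{-1}$ with $F$. A minimal sketch (ours; the Gaussian densities below are illustrative assumptions):

```python
import numpy as np

# 1D optimal map T = G^{-1} o F between two densities f and g,
# following Proposition 2 (quantile coupling).
xs = np.linspace(-4.0, 4.0, 1000)
dx = xs[1] - xs[0]
f = np.exp(-xs ** 2 / 2.0);         f /= f.sum() * dx   # source density
g = np.exp(-(xs - 1.0) ** 2 / 0.5); g /= g.sum() * dx   # target density

F = np.cumsum(f) * dx            # cumulative distribution of mu
G = np.cumsum(g) * dx            # cumulative distribution of nu

# Invert G by interpolation: T(x) = G^{-1}(F(x))
T = np.interp(F, G, xs)
```

Because both $F$ and $G$ are non-decreasing, so is $T$, which is exactly the monotonicity exploited in the proofs below.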

3. Solution in the Real Line: Optimal Entropy Transportation Case

As was pointed out in [1], “the fundamental measure in information theory is the entropy. It corresponds to the uncertainty associated with a signal. “Entropy” and “information” are used interchangeably, because the uncertainty about an outcome before it is observed corresponds to the information gained by observing it” (for a review, see [1,19] or [20]). In that context, we will prove the existence of an optimal entropy transport for a cost function $c(x, y)$ satisfying (6) and (7), similar to the optimal transportation case discussed in the last section. We will also find the Monge–Ampère equation for the quadratic cost $|x - y|^2/2$ for this optimal entropy transport. A few words are in order regarding the choice of c. From the mathematical perspective, it considerably simplifies the analysis. As a matter of fact, the optimal transportation problem has not been solved for the general case of nonquadratic costs. On the other hand, from the physiological point of view, it is natural to assume that the energy required to send a signal from one point of the brain to another can be taken as a monotone function of the distance.
Let μ and ν be probability measures defined as above, take $X = Y = \Omega \subset \mathbb{R}$, and let the entropy be characterized by Shannon’s proposal:
$$-\rho(x) \ln(\rho(x)), \tag{9}$$
where $x \in \Omega$ and ρ, $\tilde{\rho}$ are the distribution densities of μ and ν, respectively, as in the formulation of the optimal transportation problem; i.e.,
$$\mu = \rho\,dx, \qquad \nu = \tilde{\rho}\,dy,$$
satisfying (8). We wish (9) to be related to the probability measure μ on Ω. It is natural to think of the process as passing from the state characterized by (9) to the one characterized by $-\tilde{\rho}(y) \ln(\tilde{\rho}(y))$, where $y \in \Omega$, which we wish to relate to the probability measure ν on Ω. Similarly, $\rho(x) \ln(\rho(x))$ and $\tilde{\rho}(y) \ln(\tilde{\rho}(y))$ will be the marginals of $\rho(x, y) \ln(\rho(x, y))$.
As a first concrete proposal, we consider the following functional:
$$\int_{\Omega \times \Omega} c(x, y)\,\rho(x, y) \ln(\rho(x, y))\,dx\,dy,$$
where c is a spatial cost function. Notice that we have dropped the minus sign, so looking for maximal entropy is equivalent to minimizing the previous expression. The problem is then to find the optimal entropy transport strategy between x and y (analogous to Monge’s problem). There is, however, a standard difficulty: in the continuous case, the entropy can be negative, whereas in the discrete case (i.e., for discrete probability functions), the entropy is always positive. We therefore consider the absolute value of the entropy defined in (9). More precisely, define the following:
$$|\rho \ln \rho(x)| = (\rho \ln \rho)^{+}(x) + (\rho \ln \rho)^{-}(x),$$
with:
$$(\rho \ln \rho)^{+}(x) = \max(\rho \ln \rho, 0) \quad \text{and} \quad (\rho \ln \rho)^{-}(x) = -\min(\rho \ln \rho, 0).$$
It is natural to assume that:
$$\int_{\Omega} |\rho \ln \rho(x)|\,dx = K,$$
for some constant $K \in \mathbb{R}^{+} \setminus \{0\}$, and let:
$$d\mu = \frac{1}{K} |\rho(x) \ln \rho(x)|\,dx, \tag{11}$$
with:
$$d\mu^{+} = \frac{1}{K} (\rho \ln \rho)^{+}(x)\,dx \quad \text{and} \quad d\mu^{-} = \frac{1}{K} (\rho \ln \rho)^{-}(x)\,dx.$$
Observation 1.
If $\rho(x, y)$ is the distribution density of $\pi(x, y)$ on $\Omega \times \Omega$, then (11) is a well-defined Borel probability measure on Ω, and:
$$d\nu = \frac{1}{K} |\tilde{\rho}(y) \ln \tilde{\rho}(y)|\,dy \tag{12}$$
is also a well-defined Borel probability measure on Ω. Then, we can define:
$$\hat{\Pi}(\mu, \nu) = \left\{ \pi \in \mathcal{P}(\Omega \times \Omega) \;\middle|\; \int_{Y = \Omega} d\pi(x, y) = d\mu(x),\ \int_{X = \Omega} d\pi(x, y) = d\nu(y) \right\}, \tag{13}$$
with μ and ν given by (11) and (12), or more rigorously:
$$\hat{\Pi}(\mu, \nu) = \left\{ \pi \in \mathcal{P}(\Omega \times \Omega) \;\middle|\; \pi(A \times \Omega) = \mu(A),\ \pi(\Omega \times B) = \nu(B),\ \forall A, B \subset \Omega \right\}.$$
We can take the cumulative distributions of μ and ν, respectively, as $F(x) = \mu((-\infty, x])$ and $G(y) = \nu((-\infty, y])$, with μ and ν given as before. By definition, they are non-decreasing.
Now, we are in a position to state the problem analogous to that of Monge (1), namely:
$$\text{Find a map } T : \mathbb{R} \to \mathbb{R} \text{ such that } \nu = T_{\#}\mu, \text{ with } \mu \text{ and } \nu \text{ given by (11) and (12), and } \int_{\Omega} c(x, T(x))\,d\mu(x) \text{ is minimal,} \tag{14}$$
and the problem analogous to the Monge–Kantorovich problem:
$$\text{Find a measure } \pi \in \hat{\Pi}(\mu, \nu) \text{ given by (13), with } \mu \text{ and } \nu \text{ given by (11) and (12), such that } \int_{\Omega \times \Omega} c(x, y)\,d\pi(x, y) \text{ is minimal.} \tag{15}$$
Next, we present the proposition equivalent to Proposition 2 for the measures given by (11) and (12), namely:
Proposition 3.
Let $h \in C^1(\mathbb{R})$ be a non-negative, strictly convex function. Let μ and ν be the Borel probability measures on $\mathbb{R}$ given by (11) and (12), respectively. Suppose that:
$$\int_{\Omega} \int_{\Omega} h(x - y)\,d\mu(x)\,d\nu(y) < \infty$$
for $\Omega \subset \mathbb{R}$. If μ has no atom and F and G represent the cumulative distribution functions of μ and ν, respectively, then:
$$T = G^{-1} \circ F$$
solves Problem (14) for the cost:
$$c(x, y) = h(x - y).$$
If π is the induced entropy transport plan, that is:
$$\pi = (\mathrm{Id}, T)_{\#}\mu,$$
defined by $T_{\#}\mu(E) = \mu(T^{-1}(E))$ for $E \subset \Omega$, then π is optimal for Problem (15).
Proof. 
The proof of this result is adapted from [14] and for the sake of completeness is given in Appendix A. □
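The same quantile construction applies once the densities are replaced by the normalized entropy densities $|\rho \ln \rho|/K$ of (11) and (12). A sketch under illustrative assumptions (both densities below stay strictly below one, so $\rho \ln \rho$ never vanishes and the absolute value causes no difficulty):

```python
import numpy as np

# Entropy transport map T = G^{-1} o F for the measures (11) and (12),
# with d(mu) proportional to |rho ln rho| dx.
xs = np.linspace(-4.0, 4.0, 1000)
dx = xs[1] - xs[0]
rho  = np.exp(-xs ** 2 / 2.0);         rho  /= rho.sum() * dx
rhot = np.exp(-(xs - 0.5) ** 2 / 1.5); rhot /= rhot.sum() * dx

a = np.abs(rho * np.log(rho));   a /= a.sum() * dx   # entropy density of mu, Eq. (11)
b = np.abs(rhot * np.log(rhot)); b /= b.sum() * dx   # entropy density of nu, Eq. (12)

F = np.cumsum(a) * dx                                # CDF of mu
G = np.cumsum(b) * dx                                # CDF of nu
T = np.interp(F, G, xs)                              # optimal entropy transport map
```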
Our next goal is to deduce the Monge–Ampère equation for this case. In order to do that, we will need an analog of the Kantorovich duality principle (Proposition 1) for the measures given by (11) and (12), namely:
Proposition 4.
Let μ and ν be the Borel probability measures on $\Omega \subset \mathbb{R}$ given by (11) and (12), respectively. If the cost function:
$$c : \mathbb{R} \times \mathbb{R} \to [0, \infty)$$
is lower semi-continuous and:
$$\int_{\Omega} \int_{\Omega} c(x, y)\,d\mu(x)\,d\nu(y) < \infty,$$
then there is a Borel map $\psi : \mathbb{R} \to \mathbb{R}$ that is c-concave and optimal for (3). Moreover, the resulting maximum is equal to the minimum of Problem (15); i.e.,
$$\min_{\pi \in \hat{\Pi}(\mu, \nu)} \int_{\mathbb{R} \times \mathbb{R}} c(x, y)\,d\pi(x, y) = \max_{\psi \in L^1(\mu)} \left[ \int_{\Omega} \psi(x)\,d\mu(x) + \int_{\Omega} \psi^c(y)\,d\nu(y) \right]$$
or:
$$\min_{\pi \in \hat{\Pi}(\mu, \nu)} I[\pi] = \max_{(\psi, \psi^c) \in \Phi} J(\psi, \psi^c),$$
where,
$$\Phi = \left\{ (\phi, \psi) \in L^1(\mu) \times L^1(\nu) \;\middle|\; \phi(x) + \psi(y) \le c(x, y) \text{ for } \mu\text{-a.e. } x \in \Omega \text{ and } \nu\text{-a.e. } y \in \Omega \right\}, \tag{18}$$
and if $\pi \in \hat{\Pi}(\mu, \nu)$ given by (13) is optimal, then $\psi(x) + \psi^c(y) = c(x, y)$ π-almost everywhere.
Proof. 
A proof of this result can be found in Appendix A. □
If we take $c(x, y) = |x - y|^2/2$, we wish T to be expressed as $T = \nabla\varphi$ for some convex function φ, and then to be able to find the corresponding Monge–Ampère equation related to the measures μ and ν given by (11) and (12). This fact is guaranteed by Brenier’s theorem. The details of its proof, adapted to our case, are important and can be found in Appendix A.
Theorem 1
(Brenier). Let μ and ν be the Borel probability measures on $\Omega \subset \mathbb{R}$ given by (11) and (12), respectively, and with finite second-order moments; that is, such that:
$$\int_{\Omega} |x|^2\,d\mu(x) < \infty \quad \text{and} \quad \int_{\Omega} |y|^2\,d\nu(y) < \infty.$$
Then, if μ is absolutely continuous on Ω, there exists a unique $T : \mathbb{R} \to \mathbb{R}$ such that $\nu = T_{\#}\mu$ and:
$$\int_{\Omega} |x - T(x)|^2\,d\mu(x) = \min_{\gamma \in \hat{\Pi}(\mu, \nu)} \int |x - y|^2\,d\gamma(x, y),$$
with $\hat{\Pi}(\mu, \nu)$ given by (13). Moreover, there is only one optimal transport plan γ, which is necessarily $(\mathrm{Id}, T)_{\#}\mu$, and T is the gradient of a convex function φ, which is therefore unique up to an additive constant. There is also a unique (up to an additive constant) Kantorovich potential ψ, which is locally Lipschitz and linked to φ through the relation:
$$\varphi(x) = \frac{1}{2}|x|^2 - \psi(x).$$
Proof. 
See Appendix A. □
Observation 2.
Observe that Theorem 1 holds in the general case on $\mathbb{R}^n$.

The Monge–Ampère Equation

Let:
$$d\mu(x) = \frac{1}{K} |\rho(x) \ln[\rho(x)]|\,dx \quad \text{and} \quad d\nu(y) = \frac{1}{K} |\tilde{\rho}(y) \ln[\tilde{\rho}(y)]|\,dy$$
be two probability measures, absolutely continuous with respect to the Lebesgue measure. By Theorem 1, there exists a unique gradient of a convex function φ such that:
$$\int_{\Omega} \zeta(y)\,|\tilde{\rho}(y) \ln[\tilde{\rho}(y)]|\,dy = \int_{\Omega} \zeta(\nabla\varphi(x))\,|\rho(x) \ln[\rho(x)]|\,dx, \tag{20}$$
for all test functions $\zeta \in C_b(\mathbb{R})$. Since φ is strictly convex, $\nabla\varphi$ is $C^0$ and one-to-one. Hence, taking $y = \nabla\varphi(x)$, we get:
$$\int_{\Omega} \zeta(y)\,|\tilde{\rho}(y) \ln[\tilde{\rho}(y)]|\,dy = \int_{\Omega} \zeta(\nabla\varphi(x))\,|\tilde{\rho}(\nabla\varphi(x)) \ln[\tilde{\rho}(\nabla\varphi(x))]| \det D^2\varphi(x)\,dx. \tag{21}$$
From (20) and (21), we get:
$$|\rho(x) \ln[\rho(x)]| = |\tilde{\rho}(\nabla\varphi(x)) \ln[\tilde{\rho}(\nabla\varphi(x))]| \det D^2\varphi(x), \tag{22}$$
and the Monge–Ampère equation:
$$\det D^2\varphi(x) = \frac{|\rho(x) \ln[\rho(x)]|}{|\tilde{\rho}(\nabla\varphi(x)) \ln[\tilde{\rho}(\nabla\varphi(x))]|}, \tag{23}$$
corresponding to this case.
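In one dimension, $\det D^2\varphi = \varphi''$ and $\nabla\varphi = \varphi' = T$, so (23) reduces to $T'(x) = |\rho\ln\rho|(x)\,/\,|\tilde\rho\ln\tilde\rho|(T(x))$ (after normalization by K). This can be checked numerically against the map built in the sketch after Proposition 3; the following is ours, under the same illustrative densities:

```python
import numpy as np

# 1D check of Eq. (23): with phi' = T and det D^2 phi = T', the equation
# reads T'(x) = a(x) / b(T(x)), where a and b are the normalized entropy
# densities of mu and nu.
xs = np.linspace(-4.0, 4.0, 2000); dx = xs[1] - xs[0]
rho  = np.exp(-xs ** 2 / 2.0);         rho  /= rho.sum() * dx
rhot = np.exp(-(xs - 0.5) ** 2 / 1.5); rhot /= rhot.sum() * dx
a = np.abs(rho * np.log(rho));   a /= a.sum() * dx
b = np.abs(rhot * np.log(rhot)); b /= b.sum() * dx

T   = np.interp(np.cumsum(a) * dx, np.cumsum(b) * dx, xs)  # T = G^{-1} o F
Tp  = np.gradient(T, xs)                                   # phi''(x)
rhs = a / np.interp(T, xs, b)                              # right-hand side of (23)
print(np.abs(Tp - rhs)[200:-200].max())  # small away from the tails,
                                         # up to discretization error
```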

4. Neural Branching Structure and the Linearization of the Monge–Ampère Equation

As was pointed out in the Introduction, the purpose of this section is to propose a model for the branching structure of the neurons, which is consistent with the process of information transport previously introduced. The basic idea is as follows. If we consider that information transport is optimized in some brain processes, we consequently have (as discussed in the previous section) an associated Monge–Ampère equation for the transport plan potential. Besides, it is natural to consider a cost that is close to some power of the distance function, since physiological cost can be taken to depend on the distance traveled by the corresponding signal, as is usually assumed in transport networks. For technical reasons, and in order to be able to adapt results well known in the literature, we take this cost function to be close to:
$$\frac{|x - y|^2}{2}.$$
From a qualitative perspective, this choice should not change the results much, as long as the cost function remains convex (see [16]). If this is the case, we can then compute the linearization of the Monge–Ampère equation around this quadratic cost function. The resulting equation is a linear elliptic equation. We argue that a self-activating mechanism should be incorporated in the form of a nonlinear term (as a result of the excitable nature of the transport of electric impulses along axons). In this way, we end up with a semilinear equation that can be used to explain branching processes in biological networks (see [21]). More specifically, the solution to this equation could be associated with the concentration of a morphogen, e.g., a growth factor, and if such a concentration is above a certain threshold, a branching mechanism is triggered. It is then consistent to look for solutions that are close to the quadratic cost function and therefore to linearize around it.
In order to linearize the Monge–Ampère equation, we assume that φ is very close to $|x|^2/2$, so $|\rho(x) \ln[\rho(x)]|$ is very close to $|\tilde{\rho}(\nabla\varphi(x)) \ln[\tilde{\rho}(\nabla\varphi(x))]|$. In that case, following [16,22,23], we set:
$$\varphi(x) \approx \varphi_{\varepsilon}(x) = \frac{|x|^2}{2} + \varepsilon \eta + O(\varepsilon^2) \tag{24}$$
and:
$$|\tilde{\rho}(\nabla\varphi(x)) \ln[\tilde{\rho}(\nabla\varphi(x))]| \approx |\tilde{\rho}(\nabla\varphi(x)) \ln[\tilde{\rho}(\nabla\varphi(x))]|_{\varepsilon} = (1 + \varepsilon h + O(\varepsilon^2))\,|\rho(x) \ln[\rho(x)]|, \tag{25}$$
with $\eta, h \in L^1(\mu)$ and $\varepsilon \ll 1$. We leave the details of this computation for Appendix B. Substituting (24) and (25) in the Monge–Ampère Equation (23), we get the linearized equation:
$$L\eta = h, \tag{26}$$
with:
$$L = \Delta + \nabla \log |\rho(x) \ln \rho(x)| \cdot \nabla.$$
Then, the Laplacian plus a transport term can be seen as the linearized version of the Monge–Ampère equation for our proposal. We notice that the main mechanism responsible for the flow of information along the axons is the propagation of electrical impulses. This is well known to be an excitable process that involves, among others, a self-activating component, as for instance in the standard Hodgkin–Huxley or FitzHugh–Nagumo models. If we include this in the linearization previously obtained (Equation (26)), we get:
$$L\phi = \Delta\phi + \nabla \log |\rho(x) \ln \rho(x)| \cdot \nabla\phi + F(\phi),$$
where F is a function describing the self-activating mechanism and can typically be taken as a power of ϕ:
$$F(\phi) = \phi^p,$$
with $p > 1$. Solutions of this type of equation have been studied by many authors since the pioneering work of Ni and Takagi ([24] and the references therein), since they appear in many different contexts. These solutions typically exhibit concentration phenomena that can be responsible for branching structures.
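To make the linearized problem concrete, the operator L of (26) can be discretized by centered finite differences and $L\eta = h$ solved as a linear system; in a full simulation of the semilinear model, the term $F(\phi) = \phi^p$ would then be handled iteratively or by time stepping. A minimal sketch (ours; the density ρ and the right-hand side h below are illustrative assumptions):

```python
import numpy as np

# Finite-difference sketch of L eta = h, Eq. (26), on (0, 1) with
# homogeneous Dirichlet conditions; L = d^2/dx^2 + (log|rho ln rho|)' d/dx.
n = 200
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
rho = 0.3 + 0.4 * np.sin(np.pi * x)            # illustrative density, 0 < rho < 1
drift = np.gradient(np.log(np.abs(rho * np.log(rho))), x)

A = np.zeros((n, n))
for i in range(1, n - 1):                      # interior three-point stencil
    A[i, i - 1] = 1.0 / dx**2 - drift[i] / (2.0 * dx)
    A[i, i]     = -2.0 / dx**2
    A[i, i + 1] = 1.0 / dx**2 + drift[i] / (2.0 * dx)
A[0, 0] = A[-1, -1] = 1.0                      # eta(0) = eta(1) = 0

h = np.ones(n); h[0] = h[-1] = 0.0             # sample right-hand side
eta = np.linalg.solve(A, h)
```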
Indeed, if one assumes that the concentration of a solution to the previous equation is correlated with a growth factor morphogen, then a branch will stem out of the main branch. This and similar models have been proposed using reaction-diffusion systems following Turing’s original proposal for pattern formation ([25] or [26]); in particular, for branching structures in plants ([27,28] or [29]), lungs [21], and other vascular systems ([30,31]). Figure 1 and Figure 2 show numerical simulations for a particular case of the linearization of the Monge–Ampère equation. Growth is induced by the concentration of the solution, and it can be seen that the process gives rise to lateral branches.

5. Murray’s Law and Neural Branching

In the previous section, we argued that neuronal branching is compatible with reaction-diffusion processes. On the other hand, we deduced the corresponding equations by considering the transport of information along neural networks. The question of whether there is some connection with Murray’s law arises naturally. Recall that Murray’s law refers to a transport network ([32,33]).
In his original paper [32], Murray obtained from optimization considerations a relationship among the different parameters associated with a branching transport network, which was later generalized in [15]. In what follows, we deduce it from scratch for the sake of completeness (we refer the reader to [15,34,35] for further details). The total power required for the flow to overcome the viscous drag is described by:
$$W_t = \frac{8 \mu L f^2}{\pi r^4} + m \pi L r^2, \tag{27}$$
where μ is the dynamic viscosity of the fluid, L is the vessel length, f is the volumetric flow rate, m is an all-encompassing metabolic coefficient that includes the chemical cost of keeping the blood constituents fresh and functional and the general cost owing to the weight of the blood and the vessel, and r is the vessel radius. For our purposes, we modify Equation (27) as follows:
$$W_t = \frac{a f^2}{r^2} + b r^{\alpha}, \tag{28}$$
where the first term corresponds to the power required for an electrical impulse to propagate along the axon. Notice in particular that the factor of $r^2$ in the denominator follows from the fact that electrical resistance is inversely proportional to the cross-sectional area of the conducting material. On the other hand, the second term is proportional to a power α of the radius and describes the fact that the metabolic cost can vary depending on the type of neuron with which we are dealing. For instance, the degree of myelination of the axon could determine the effective cost associated with information transport. The minimum power is found by differentiating with respect to r and equating to zero:
$$\frac{dW_t}{dr} = -\frac{2 a f^2}{r^3} + \alpha b r^{\alpha - 1}.$$
With this, the optimal radius satisfies:
$$r^{2 + \alpha} = \frac{2a}{\alpha b} f^2,$$
and the optimal relation between volumetric flow rate and vessel radius, such that the power requirement is minimized, is obtained with:
$$f = k\,r^{1 + \alpha/2},$$
where $k = \sqrt{\alpha b / (2a)}$.
Using the construction of [33], if $r_0$ is the radius of the main branch, $r_1$ and $r_2$ those of the lateral branches, and x and y are the angles between the lateral branches and the main branch, we obtain a generalized version of Murray’s law:
$$f_0 = f_1 + f_2 = k\,r_0^{1 + \alpha/2} = k\left( r_1^{1 + \alpha/2} + r_2^{1 + \alpha/2} \right);$$
thus, we get the general law:
$$r_0^{1 + \alpha/2} = r_1^{1 + \alpha/2} + r_2^{1 + \alpha/2} \tag{31}$$
for $\alpha \in \mathbb{R}$. Using these relations, we obtain three general equations associated with the branching angles (see Figure 3) x, y, and $x + y$:
$$\cos(x) = \frac{r_0^4 + r_1^4 - \left( r_0^{1 + \alpha/2} - r_1^{1 + \alpha/2} \right)^{\frac{8}{2 + \alpha}}}{2 r_0^2 r_1^2}, \tag{32}$$
$$\cos(y) = \frac{r_0^4 + r_2^4 - \left( r_0^{1 + \alpha/2} - r_2^{1 + \alpha/2} \right)^{\frac{8}{2 + \alpha}}}{2 r_0^2 r_2^2}, \tag{33}$$
$$\cos(x + y) = \frac{\left( r_1^{1 + \alpha/2} + r_2^{1 + \alpha/2} \right)^{\frac{8}{2 + \alpha}} - r_1^4 - r_2^4}{2 r_1^2 r_2^2}, \tag{34}$$
which correspond to a different generalized Murray’s law for each value of $\alpha \in \mathbb{R}$.
Observation 3.
If $\alpha = 2$, then by (32)–(34) we get $\cos(x + y) = 1$ and hence $x + y = 0$, which is consistent with the fact that $\cos x = 1$ (so $x = 0$) and $\cos y = 1$ (so $y = 0$). This case corresponds to no branching. In terms of the cost, this would imply that it is more efficient for the network not to bifurcate.
If $\alpha = 4$, then $\cos(x) = \frac{r_0^4 + r_1^4 - (r_0^3 - r_1^3)^{4/3}}{2 r_0^2 r_1^2}$, $\cos(y) = \frac{r_0^4 + r_2^4 - (r_0^3 - r_2^3)^{4/3}}{2 r_0^2 r_2^2}$, and $\cos(x + y) = \frac{(r_1^3 + r_2^3)^{4/3} - r_1^4 - r_2^4}{2 r_1^2 r_2^2}$, which corresponds to Murray’s original proposal [33], stating that the angle at the bifurcation of an artery should not be less than 75° (74.9° to be more exact). This is consistent with the numerical and experimental results in [36].
If $\alpha = 6$, we get $\cos(x + y) = 0$ and then $x + y = \frac{\pi}{2}$. On the other hand, $\cos(x) = r_1^2 / r_0^2$, so $\cos(x) > 0$ since $r_0, r_1 \neq 0$; this implies that $\cos(x) \neq 0$ and $x \neq \frac{\pi}{2}$. Similarly, $y \neq \frac{\pi}{2}$. We obtain $x + y = \frac{\pi}{2}$ with $x, y \in (0, \frac{\pi}{2})$ in this case. In other words, for $\alpha = 6$, the angle between the bifurcated branches is $\pi/2$, but orthogonal branching is ruled out.
We conclude then that the relevant values for our purposes lie in $\alpha \in [2, 6]$. It would be interesting to contrast these possible scenarios with experimental data for different kinds of nervous tissue. To our knowledge, no systematic experimental study of branching angles has been carried out.
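The three regimes of Observation 3 are easy to reproduce numerically from (31)–(34); the following sketch (ours; the radii are illustrative) recovers total branching angles of 0°, roughly 75°, and 90° for α = 2, 4, 6, respectively.

```python
import numpy as np

# Branching angles from the generalized Murray law, Eqs. (31)-(34).
def branching_angles(r1, r2, alpha):
    e = 1.0 + alpha / 2.0                      # exponent 1 + alpha/2
    q = 8.0 / (2.0 + alpha)                    # exponent 8/(2 + alpha)
    r0 = (r1**e + r2**e) ** (1.0 / e)          # parent radius, Eq. (31)
    cx = (r0**4 + r1**4 - (r0**e - r1**e)**q) / (2.0 * r0**2 * r1**2)
    cy = (r0**4 + r2**4 - (r0**e - r2**e)**q) / (2.0 * r0**2 * r2**2)
    cx, cy = np.clip(cx, -1.0, 1.0), np.clip(cy, -1.0, 1.0)  # guard rounding
    return np.degrees(np.arccos(cx)), np.degrees(np.arccos(cy))

for alpha in (2.0, 4.0, 6.0):                  # the cases of Observation 3
    x_deg, y_deg = branching_angles(1.0, 1.0, alpha)
    print(alpha, x_deg + y_deg)                # 0, ~74.9, 90 degrees
```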

6. Conclusions

We proposed that information flow in some brain processes can be analyzed in the framework of optimal transportation theory. A Monge–Ampère equation was obtained for the optimal transportation plan potential in the one-dimensional case. Extrapolating to higher dimensions, the corresponding linearization around a quadratic distance cost was derived and shown to be consistent with the branching structure of the nervous system. Finally, a generalized version of Murray’s law was derived assuming different cost functions, depending on a parameter related to the metabolic maintenance term. Future work includes a detailed comparison of the methodological proposal with experimental data. In particular, it would be interesting to carry out the program proposed here in a concrete cognitive experiment. Possible concrete experiments to compare with can be found in [2,3,5]. Here, we outline a simple procedure with fMRI data in which a direct connection with optimal transport theory can be tested. Consider the brain activity map for the resting state given by standard fMRI. Once normalized, this map will provide the initial probability density and the entropy to be transported. In fact, another possible methodology for measuring entropy with fMRI can be found in [37]. Later on, the subject is asked to perform a simple motor task, for instance, moving the right hand. The corresponding density after the task is done can then be registered as before and will provide the final density in the optimal transport problem. Some intermediate densities should be determined as well. This information will provide a transport plan that can be compared with the mathematical solution of the problem. Correspondingly, the branching structures and their bifurcation angles and radii should be compared with experimental results as well. In principle, Murray’s law should be consistent with the Monge–Ampère equation and its linearization, and it should be possible to derive it from them. A precise relationship between the maximum entropy principle and optimal entropy transport should also be clarified.
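As a sketch of the procedure just outlined (the activity arrays are hypothetical placeholders, not real fMRI data, and the function name is ours):

```python
import numpy as np

# Hypothetical pipeline sketch: turn two activity profiles (rest and task,
# flattened along one coordinate) into entropy densities, then compute the
# 1D entropy transport map between them.
def entropy_transport_map(act_rest, act_task, xs):
    dx = xs[1] - xs[0]
    cdfs = []
    for act in (act_rest, act_task):
        p = np.clip(act, 1e-12, None)          # avoid log(0)
        p = p / (p.sum() * dx)                 # normalized density
        e = np.abs(p * np.log(p))              # |rho ln rho|
        cdfs.append(np.cumsum(e / (e.sum() * dx)) * dx)
    F, G = cdfs
    return np.interp(F, G, xs)                 # T = G^{-1} o F
```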

Author Contributions

All authors contributed equally to the manuscript (conceptualization, methodology, software, validation, formal analysis, investigation, resources, writing–original draft preparation, writing–review and editing, visualization, supervision, and project administration) and read and approved the final version. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

M.A.P. was supported by Universidad Autónoma de la Ciudad de México, sabbatical approval UACM/CAS/010/19. This work was supported by the Departamento de Matemáticas y Mecánica (MyM) of the Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas (IIMAS) of the Universidad Nacional Autónoma de México (UNAM). M.A.P. would like to thank P.P. and IIMAS-UNAM for the support during his sabbatical leave. The authors would also like to thank Jorge Castillo-Medina at the Universidad Autónoma de Guerrero in Acapulco for his kind permission to use Figure 1 and Figure 2.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Relevant Theorems and Some Proofs

Theorem A1
(The continuity and support theorem). Let μ be a probability measure on $\mathbb{R}$ with cumulative distribution function F. The following properties are equivalent:
a. The function F is strictly increasing on the interval $\{x \in \mathbb{R} \mid 0 < F(x) < 1\}$.
b. The inverse function $F^{-1}$ is continuous.
c. The inverse measure $\mu^{-1}$ is non-atomic.
d. The support of μ, given by:
$$\mathrm{supp}(\mu) = \overline{\{A \in \Sigma \mid \mu(A) > 0\}},$$
with Σ a σ-algebra defined on $\mathbb{R}$, is a closed interval of the real line, finite or not.
Proof. 
The proof of this result can be found in [38], Appendix A. □
Proposition A1.
Let $h \in C^1(\mathbb{R})$ be a non-negative, strictly convex function. Let μ and ν be the Borel probability measures on $\mathbb{R}$ given by (11) and (12), respectively. Suppose that:
$$\int_{\Omega} \int_{\Omega} h(x - y)\,d\mu(x)\,d\nu(y) < \infty \tag{A1}$$
for $\Omega \subset \mathbb{R}$. If μ has no atom and F and G represent the cumulative distribution functions of μ and ν, respectively, then:
$$T = G^{-1} \circ F$$
solves Problem (14) for the cost:
$$c(x, y) = h(x - y).$$
If π is the induced entropy transport plan, that is:
$$\pi = (\mathrm{Id}, T)_{\#}\mu,$$
defined by $T_{\#}\mu(E) = \mu(T^{-1}(E))$ for $E \subset \Omega$, then π is optimal for Problem (15).
Proof. 
• T is well defined:
The only problem we might have with the definition of $T = G^{-1} \circ F$ could be when $F(x) = 0$. However, if $F(x) = 0$, then:
$$T(x) = G^{-1}(F(x)) = G^{-1}(0) = \min\{y \in [-\infty, \infty] \mid 0 \le G(y)\} = -\infty,$$
but if for some $a \in \mathbb{R}$ we have $F(a) = 0$, then $\mu((-\infty, a]) = 0$, which means that $a = -\infty$, and T is well defined, as desired.
• $\nu = T_{\#}\mu$:
Let F and G be defined as in Observation 1. Then, $T = G^{-1} \circ F$ is non-decreasing, since F and G are non-decreasing. Then:
$$T^{-1}((-\infty, y]) = \left\{ x \in [-\infty, \infty] \mid T(x) \le y \right\} = \left\{ x \in [-\infty, \infty] \mid G^{-1}(F(x)) \le y \right\} = \left\{ x \in [-\infty, \infty] \mid F(x) \le G(y) \right\}.$$
Since T is non-decreasing, $T^{-1}((-\infty, y])$ is an interval.
Claim 1. Since μ has no atom, F is increasing and continuous, and then $T^{-1}((-\infty, y])$ is a closed interval.
If μ has no atom, let $A \subset \Omega$ and $\alpha \in \mathbb{R}^{+}$ be such that $\mu(A) = \alpha$. Then, given $\beta \in \mathbb{R}^{+}$ such that $0 < \beta \le \alpha$, there exists $B \subset A$ such that $\mu(B) = \beta$. Hence, for every $\gamma \in \mathbb{R}$ such that $0 < \gamma \le \alpha$, there exists $x_{\gamma} \in A$ such that:
$$\mu((-\infty, x_{\gamma}]) = \gamma;$$
then $0 < F(x_{\gamma}) < F(x_{\alpha}) \le 1$ for all γ, α such that $x_{\gamma} < x_{\alpha}$, and so F is increasing and continuous; Theorem A1 implies that $F^{-1}$ is continuous, and then:
$$\{x \in \mathbb{R} \mid 0 \le F(x) \le 1\} = \mathrm{supp}(\mu)$$
is closed. In particular, $\{x \in \mathbb{R} \mid 0 \le F(x) \le G(y)\}$ is closed. We then conclude that $T^{-1}((-\infty, y])$ is a closed interval, as desired.
We have proven Claim 1.
Now, if $x = \sup\{x \in \mathbb{R} \mid F(x) \le G(y)\}$, then $F(x) = G(y)$, and we have:
$$\mu\left( T^{-1}((-\infty, y]) \right) = \mu((-\infty, x]) = F(x) = G(y) = \nu((-\infty, y]),$$
and then $\nu = T_{\#}\mu$, as desired.
• $T = G^{-1} \circ F$ is optimal:
Observe that (μ, ν) given by (11) and (12) satisfy:
$$\int_{Y = \Omega} d\pi(x, y) = d\mu(x) \quad \text{and} \quad \int_{X = \Omega} d\pi(x, y) = d\nu(y),$$
where:
$$d\pi(x, y) = \frac{1}{K} |\rho(x, y) \ln(\rho(x, y))|\,dx\,dy$$
and:
$$d\mu(x) = \frac{1}{K} |\rho(x) \ln(\rho(x))|\,dx;$$
then $\pi(x, y) \in \hat{\Pi}(\mu, \nu)$ as given by (13).
Now, observe that:
$$\int_{\Omega} h(x - T(x)) \frac{1}{K} |\rho(x) \ln(\rho(x))|\,dx = \int_{\Omega} h(x - T(x))\,d\mu;$$
since T and $h'$ are non-decreasing,
$$h'(u - T(u)) \le h'(u - T(x))$$
for $u \ge x$. Then:
$$\int_x^y h'(u - T(u))\,du \le \int_x^y h'(u - T(x))\,du = h(y - T(x)) - h(x - T(x)).$$
On the other hand,
$$h'(u - T(u)) \ge h'(u - T(x))$$
for $u \le x$, and then:
$$\int_x^y h'(u - T(u))\,du = -\int_y^x h'(u - T(u))\,du \le -\int_y^x h'(u - T(x))\,du = -[h(x - T(x)) - h(y - T(x))] = h(y - T(x)) - h(x - T(x)).$$
As a consequence, in either case, we have:
$$\int_x^y h'(u - T(u))\,du \le h(y - T(x)) - h(x - T(x)). \tag{A2}$$
Set:
$$\psi(y) = \int_0^y h'(u - T(u))\,du;$$
then (A2) becomes:
$$\psi(y) - \psi(x) \le h(y - T(x)) - h(x - T(x)),$$
which implies:
$$\psi^c(T(x)) = \inf_y \{h(y - T(x)) - \psi(y)\} = h(x - T(x)) - \psi(x);$$
as a consequence, ψ is c-concave, according to Definition 1.
Hypothesis (A1) implies the existence of $x_0, y_0 \in \Omega$ such that:
$$\int_{\Omega} h(x - y_0)\,d\mu(x) < \infty \quad \text{and} \quad \int_{\Omega} h(x_0 - y)\,d\nu(y) < \infty. \tag{A3}$$
Claim 2: $\psi \in L^1(\mu)$ and $\psi^c \in L^1(\nu)$. Observe that:
$$h(x - y_0) - \psi^c(y_0) = h(x - y_0) - \inf_y \{h(y - y_0) - \psi(y)\} \ge h(x - y_0) - \left( h(x - y_0) - \psi(x) \right) = \psi(x), \tag{A4}$$
and:
$$h(x_0 - T(x)) - \psi(x_0) \ge \inf_y \{h(y - T(x)) - \psi(y)\} = h(x - T(x)) - \psi(x) = \psi^c(T(x)); \tag{A5}$$
hence, since $\psi(x) + \psi^c(T(x)) = h(x - T(x)) \ge 0$,
$$\psi(x) \ge -\psi^c(T(x)) \ge -h(x_0 - T(x)) + \psi(x_0);$$
then, by (A4) and (A5),
$$h(x - y_0) - \psi^c(y_0) \ge \psi(x) \ge -h(x_0 - T(x)) + \psi(x_0); \tag{A6}$$
now, by Hypothesis (A3) and since $\nu = T_{\#}\mu$:
$$\int_{\Omega} h(x_0 - T(x))\,d\mu = \int_{\Omega} h(x_0 - y)\,d\nu < \infty; \tag{A7}$$
therefore, integrating (A6) with respect to μ, we get:
$$\int_{\Omega} \left[ h(x - y_0) - \psi^c(y_0) \right] d\mu(x) \ge \int_{\Omega} \psi(x)\,d\mu(x) \ge \int_{\Omega} \left[ -h(x_0 - T(x)) + \psi(x_0) \right] d\mu(x);$$
as a consequence:
$$\int_{\Omega} |\psi(x)|\,d\mu(x) < \infty,$$
and we can conclude that $\psi \in L^1(\mu)$.
Similarly:
$$h(x_0 - T(x)) - \psi(x_0) \ge \psi^c(T(x)) \ge -\psi(x) \ge -h(x - y_0) + \psi^c(y_0)$$
by (A4) and (A5); using (A3) and (A7), if we integrate with respect to μ,
$$\int_{\Omega} \left[ h(x_0 - T(x)) - \psi(x_0) \right] d\mu(x) \ge \int_{\Omega} \psi^c(T(x))\,d\mu(x) \ge \int_{\Omega} \left[ -h(x - y_0) + \psi^c(y_0) \right] d\mu(x);$$
therefore:
$$\int_{\Omega} |\psi^c(T(x))|\,d\mu(x) = \int_{\Omega} |\psi^c(y)|\,d\nu(y) < \infty.$$
Hence, $\psi^c \in L^1(\nu)$. We have proven Claim 2.
Integrating $\psi(x) + \psi^c(T(x)) = h(x - T(x)) = c(x, T(x))$ with respect to μ, we get:
$$\int_{\Omega} \psi(x)\,d\mu(x) + \int_{\Omega} \psi^c(T(x))\,d\mu(x) = \int_{\Omega} h(x - T(x))\,d\mu(x),$$
and then:
$$\int_{\Omega} \psi(x)\,d\mu(x) + \int_{\Omega} \psi^c(y)\,d\nu(y) = \int_{\Omega} c(x, T(x))\,d\mu(x).$$
Finally, observe that $\psi(x) + \psi^c(y) \le c(x, y)$ for every $(x, y) \in \Omega \times \Omega$; if $\pi_1 \in \hat{\Pi}(\mu, \nu)$ is another entropy transport plan, the associated total entropy transport cost is greater, by the definition of ψ and $\psi^c$; the equality holds only for the optimal entropy transport plan π and $T = G^{-1} \circ F$; hence, T solves Problem (14), and $\pi = (\mathrm{Id}, T)_{\#}\mu$ solves Problem (15).
We have proven the proposition.
 □
Proposition A2.
Let μ and ν be the Borel probability measures on $\Omega \subset \mathbb{R}$ given by (11) and (12), respectively. If the cost function:
$$c : \mathbb{R} \times \mathbb{R} \to [0, \infty)$$
is lower semi-continuous and:
$$\int_{\Omega} \int_{\Omega} c(x, y)\,d\mu(x)\,d\nu(y) < \infty,$$
then there is a Borel map $\psi : \mathbb{R} \to \mathbb{R}$ that is c-concave and optimal for (3). Moreover, the resulting maximum is equal to the minimum of Problem (15); i.e.,
$$\min_{\pi \in \hat{\Pi}(\mu, \nu)} \int_{\mathbb{R} \times \mathbb{R}} c(x, y)\,d\pi(x, y) = \max_{\psi \in L^1(\mu)} \left[ \int_{\Omega} \psi(x)\,d\mu(x) + \int_{\Omega} \psi^c(y)\,d\nu(y) \right] \tag{A8}$$
or:
$$\min_{\pi \in \hat{\Pi}(\mu, \nu)} I[\pi] = \max_{(\psi, \psi^c) \in \Phi} J(\psi, \psi^c),$$
where,
$$\Phi = \left\{ (\phi, \psi) \in L^1(\mu) \times L^1(\nu) \;\middle|\; \phi(x) + \psi(y) \le c(x, y) \text{ for } \mu\text{-a.e. } x \in \Omega \text{ and } \nu\text{-a.e. } y \in \Omega \right\}, \tag{A9}$$
and if $\pi \in \hat{\Pi}(\mu, \nu)$ given by (13) is optimal, then $\psi(x) + \psi^c(y) = c(x, y)$ π-almost everywhere.
Proof. 
The existence of a maximizing pair and Relation (A8) have been proven in Proposition A1. Now, choosing an optimal $\pi \in \hat{\Pi}(\mu, \nu)$, we have:
$$\int_{\Omega \times \Omega} \left[ c(x, y) - \psi(x) - \psi^c(y) \right] d\pi(x, y) = \int_{\Omega \times \Omega} c(x, y)\,d\pi(x, y) - \int_{\Omega} \psi(x)\,d\mu(x) - \int_{\Omega} \psi^c(y)\,d\nu(y) = 0,$$
and since $c(x, y) - \psi(x) - \psi^c(y) \ge 0$ on $\Omega \times \Omega$, it must vanish for π-a.e. $(x, y) \in \Omega \times \Omega$.
The proof is completed. □
Theorem A2
(Brenier). Let μ and ν be the Borel probability measures on $\Omega \subset \mathbb{R}$ given by (11) and (12), respectively, and with finite second-order moments; that is, such that:
$$\int_{\Omega} |x|^2\,d\mu(x) < \infty \quad \text{and} \quad \int_{\Omega} |y|^2\,d\nu(y) < \infty. \tag{A10}$$
Then, if μ is absolutely continuous on Ω, there exists a unique $T : \mathbb{R} \to \mathbb{R}$ such that $\nu = T_{\#}\mu$ and:
$$\int_{\Omega} |x - T(x)|^2\,d\mu(x) = \min_{\gamma \in \hat{\Pi}(\mu, \nu)} \int |x - y|^2\,d\gamma(x, y),$$
with $\hat{\Pi}(\mu, \nu)$ given by (13). Moreover, there is only one optimal transport plan γ, which is necessarily $(\mathrm{Id}, T)_{\#}\mu$, and T is the gradient of a convex function φ, which is therefore unique up to an additive constant. There is also a unique (up to an additive constant) Kantorovich potential ψ, which is locally Lipschitz and linked to φ through the relation:
$$\varphi(x) = \frac{1}{2}|x|^2 - \psi(x).$$
Proof. 
Claim 1: $\pi = (\mathrm{Id}, \nabla\varphi)_{\#}\mu$, and it is optimal:
Set $c(x, y) = \frac{|x - y|^2}{2}$. Proposition A1 guarantees the existence of a pair $(\psi, \psi^c) \in \Phi$, as given by (A9), such that $\psi(x) + \psi^c(y) \le \frac{|x - y|^2}{2}$ for all $(x, y) \in \Omega \times \Omega$; then:
$$\psi(x) + \psi^c(y) \le \frac{x^2}{2} - xy + \frac{y^2}{2},$$
and then:
$$xy \le \left( \frac{x^2}{2} - \psi(x) \right) + \left( \frac{y^2}{2} - \psi^c(y) \right).$$
Set:
$$\varphi(x) = \frac{x^2}{2} - \psi(x) \quad \text{and} \quad \varphi^c(y) = \frac{y^2}{2} - \psi^c(y).$$
Hypothesis (A10) means:
$$M = \int_{\Omega} \frac{|x|^2}{2}\,d\mu(x) + \int_{\Omega} \frac{|y|^2}{2}\,d\nu(y) < \infty;$$
then:
$$I[\pi] = \int_{\Omega \times \Omega} \frac{|x - y|^2}{2}\,d\pi(x, y) \le \int_{\Omega \times \Omega} \left( |x|^2 + |y|^2 \right) d\pi(x, y) = 2M;$$
hence, $I[\pi]$ is always finite on $\hat{\Pi}(\mu, \nu)$ given by (13). As a consequence,
$$\inf_{\hat{\Pi}(\mu, \nu)} I[\pi] = M - \sup\left\{ \int_{\Omega \times \Omega} xy\,d\pi(x, y) \;\middle|\; \pi \in \hat{\Pi}(\mu, \nu) \right\} = M - \inf_{\hat{\Phi}} \left( \int_{\Omega} \varphi(x)\,d\mu(x) + \int_{\Omega} \varphi^c(y)\,d\nu(y) \right) = \sup_{\Phi} J(\psi, \psi^c),$$
where $J(\psi, \psi^c) = \int_{\Omega} \psi(x)\,d\mu(x) + \int_{\Omega} \psi^c(y)\,d\nu(y)$, Φ is given by (18), and:
$$\hat{\Phi} = \left\{ (\varphi, \psi) \in L^1(\mu) \times L^1(\nu) \text{ with values in } \mathbb{R} \cup \{\infty\} \;\middle|\; xy \le \varphi(x) + \psi(y) \text{ for } \mu\text{-a.e. } x \in \Omega \text{ and } \nu\text{-a.e. } y \in \Omega \right\}.$$
Then, the Kantorovich duality principle (4) becomes:
$$\sup\left\{ \int_{\Omega \times \Omega} xy\,d\pi(x, y) \;\middle|\; \pi \in \hat{\Pi}(\mu, \nu) \right\} = \inf_{\hat{\Phi}} J(\varphi, \varphi^c). \tag{A13}$$
Now, $xy \le \varphi(x) + \varphi^c(y)$ since $(\varphi, \varphi^c) \in \hat{\Phi}$; hence:
$$\varphi^c(y) \ge \sup_x \{xy - \varphi(x)\} \equiv \varphi^*(y).$$
$\varphi^*(y)$ is called the Legendre transform of $\varphi(x)$ and satisfies:
$$\varphi(x) + \varphi^*(y) \ge xy. \tag{A14}$$
It can be proven that $\inf_{\hat{\Phi}} J = J(\varphi, \varphi^*)$ (see [16], Theorem 2.9); now, Proposition A1 guarantees the existence of an optimal transport plan π. As a consequence, Relation (A13) implies:
$$\int_{\Omega} \varphi(x)\,d\mu(x) + \int_{\Omega} \varphi^*(y)\,d\nu(y) = \int_{\Omega \times \Omega} xy\,d\pi(x, y), \quad \text{i.e.,} \quad \int_{\Omega \times \Omega} \left[ \varphi(x) + \varphi^*(y) \right] d\pi(x, y) = \int_{\Omega \times \Omega} xy\,d\pi(x, y),$$
and then:
$$\int_{\Omega \times \Omega} \left[ \varphi(x) + \varphi^*(y) - xy \right] d\pi(x, y) = 0.$$
However, $\varphi(x) + \varphi^*(y) - xy \ge 0$ by the definition of the Legendre transform. Then, for π-a.e. $(x, y) \in \Omega \times \Omega$, we have:
$$\varphi(x) + \varphi^*(y) - xy = 0 \;\Rightarrow\; xy - \varphi(x) = \varphi^*(y) \ge zy - \varphi(z)\ \forall z \in \mathbb{R} \;\Rightarrow\; \varphi(z) \ge \varphi(x) + y(z - x) \;\Rightarrow\; y \in \partial\varphi(x)$$
(by symmetry, we can also conclude that $x \in \partial\varphi^*(y)$). Now, since φ is convex, it can be proven that it is locally Lipschitz and continuous; Rademacher’s theorem (see [39], Section 3.1.2) then implies that φ is differentiable μ-a.e. on $\mathbb{R}$. As a consequence:
$$\partial\varphi(x) = \{\nabla\varphi(x)\}.$$
One can prove that $y = \nabla\varphi(x)$ (see [16]). Therefore, $\pi = (\mathrm{Id}, T)_{\#}\mu = (\mathrm{Id}, \nabla\varphi)_{\#}\mu$, and it is optimal.
We have proven Claim 1.
Claim 2: $\pi = (\mathrm{Id}, \nabla\varphi)_{\#}\mu$ is the only transport plan:
Let $\hat{\varphi}$ be another convex function such that $\nu = \nabla\hat{\varphi}_{\#}\mu$.
We want to prove $\nabla\varphi(x) = \nabla\hat{\varphi}(x)$ μ-a.e. $x \in \Omega$.
By Claim 1, $(\mathrm{Id}, \nabla\hat{\varphi})_{\#}\mu$ is an optimal transport plan, and the pair $(\hat{\varphi}, \hat{\varphi}^*)$ is optimal for the dual problem, just like $(\varphi, \varphi^*)$. Therefore:
$$\int_{\Omega} \hat{\varphi}(x)\,d\mu(x) + \int_{\Omega} \hat{\varphi}^*(y)\,d\nu(y) = \int_{\Omega} \varphi(x)\,d\mu(x) + \int_{\Omega} \varphi^*(y)\,d\nu(y).$$
Let π be the optimal transport plan associated with φ. Then:
$$\int_{\Omega \times \Omega} \left[ \hat{\varphi}(x) + \hat{\varphi}^*(y) \right] d\pi(x, y) = \int_{\Omega \times \Omega} \left[ \varphi(x) + \varphi^*(y) \right] d\pi(x, y) = \int_{\Omega \times \Omega} xy\,d\pi(x, y);$$
hence:
$$\int_{\Omega} \left[ \hat{\varphi}(x) + \hat{\varphi}^*(\nabla\varphi(x)) \right] d\mu(x) = \int_{\Omega} x\,\nabla\varphi(x)\,d\mu(x),$$
and:
$$\int_{\Omega} \left[ \hat{\varphi}(x) + \hat{\varphi}^*(\nabla\varphi(x)) - x\,\nabla\varphi(x) \right] d\mu(x) = 0;$$
since $\hat{\varphi}(x) + \hat{\varphi}^*(\nabla\varphi(x)) - x\,\nabla\varphi(x) \ge 0$, we conclude:
$$\hat{\varphi}(x) + \hat{\varphi}^*(\nabla\varphi(x)) - x\,\nabla\varphi(x) = 0,$$
μ-a.e. on Ω. Hence, by (A14):
$$\nabla\varphi(x) \in \partial\hat{\varphi}(x), \quad \mu\text{-a.e. } x \in \Omega.$$
Now, since $\hat{\varphi}$ is differentiable μ-a.e. on Ω:
$$\nabla\varphi(x) = \nabla\hat{\varphi}(x).$$
Claim 2 has been proven.
Finally, we have proven not only the uniqueness of the solution of Problem (15) (analogous to the Monge–Kantorovich problem), but also the uniqueness of the gradient of a convex function φ such that $\nabla\varphi_{\#}\mu = \nu$ and:
$$\min_{\gamma \in \hat{\Pi}(\mu, \nu)} \int_{\Omega \times \Omega} |x - y|^2\,d\gamma(x, y) = \int_{\Omega} |x - \nabla\varphi(x)|^2\,d\mu(x) = \int_{\Omega} |x - T(x)|^2\,d\mu(x).$$
We have proven the theorem.
 □

Appendix B. Linearization of the Monge–Ampère Equation

Assume that φ is very close to $\frac{|x|^2}{2}$, so $|\rho(x) \ln[\rho(x)]|$ is very close to $|\tilde{\rho}(\nabla\varphi(x)) \ln[\tilde{\rho}(\nabla\varphi(x))]|$. In that case, following [16,22,23], set:
$$\varphi(x) \approx \varphi_{\varepsilon}(x) = \frac{|x|^2}{2} + \varepsilon \eta + O(\varepsilon^2),$$
and:
$$|\tilde{\rho}(\nabla\varphi(x)) \ln[\tilde{\rho}(\nabla\varphi(x))]| \approx |\tilde{\rho}(\nabla\varphi(x)) \ln[\tilde{\rho}(\nabla\varphi(x))]|_{\varepsilon} = (1 + \varepsilon h + O(\varepsilon^2))\,|\rho(x) \ln[\rho(x)]|,$$
with $\eta, h \in L^1(\mu)$. Then, by the Neumann series:
$$\frac{|\rho(x) \ln[\rho(x)]|}{|\tilde{\rho}(\nabla\varphi(x)) \ln[\tilde{\rho}(\nabla\varphi(x))]|} = \frac{|\rho(x) \ln[\rho(x)]|}{(1 + \varepsilon h + O(\varepsilon^2))\,|\rho(x) \ln[\rho(x)]|} = 1 - \varepsilon h + O(\varepsilon^2). \tag{A16}$$
On the other hand, it is easy to see that:
$$\det D^2\varphi(x) = \det\left( D^2 \frac{|x|^2}{2} + \varepsilon D^2\eta + O(\varepsilon^2) \right) = \det D^2 \frac{|x|^2}{2} + \varepsilon\,\mathrm{trace}(H D^2\eta) + O(\varepsilon^2).$$
Definition A1.
The term $\mathrm{trace}(H D^2\eta)$ is called the linearization of the Monge–Ampère equation, where $H = (H)_{ij}$ is the matrix of cofactors of $D^2 \frac{|x|^2}{2} = I$.
Then:
$$\det D^2\varphi(x) = 1 + \varepsilon\,\mathrm{trace}(D^2\eta) + O(\varepsilon^2). \tag{A17}$$
Substituting (A16) and (A17) into the Monge–Ampère Equation (22) (approximating $\tilde{\rho}$ by ρ in the composed factor, with the correction factor $(1 - \varepsilon h)$ supplied by (A16)) and omitting higher-order terms, we get:
$$(1 + \varepsilon \Delta\eta)\,|\rho(\nabla\varphi(x)) \ln[\rho(\nabla\varphi(x))]| = (1 - \varepsilon h)\,|\rho(x) \ln[\rho(x)]|, \tag{A18}$$
and:
$$|\rho(\nabla\varphi(x)) \ln[\rho(\nabla\varphi(x))]| = |\rho \ln \rho|(\nabla\varphi(x)) = |\rho \ln \rho|\left( \nabla\left( \frac{|x|^2}{2} + \varepsilon \eta \right) \right) = |\rho \ln \rho|(x + \varepsilon \nabla\eta) = |\rho(x) \ln \rho(x)| + \varepsilon \nabla\eta \cdot \nabla |\rho(x) \ln \rho(x)|,$$
by using the first-order Taylor expansion. As a consequence, (A18) becomes:
$$(1 + \varepsilon \Delta\eta)\left[ |\rho(x) \ln \rho(x)| + \varepsilon \nabla\eta \cdot \nabla |\rho(x) \ln \rho(x)| \right] = (1 - \varepsilon h)\,|\rho(x) \ln \rho(x)|;$$
then:
$$1 - \varepsilon h = (1 + \varepsilon \Delta\eta)\left( 1 + \varepsilon \frac{\nabla\eta \cdot \nabla |\rho(x) \ln \rho(x)|}{|\rho(x) \ln \rho(x)|} \right) = (1 + \varepsilon \Delta\eta)\left( 1 + \varepsilon \nabla\eta \cdot \nabla \log |\rho(x) \ln \rho(x)| \right),$$
and then:
$$1 - \varepsilon h = 1 + \varepsilon \nabla\eta \cdot \nabla \log |\rho(x) \ln \rho(x)| + \varepsilon \Delta\eta + \varepsilon^2 \Delta\eta\,\nabla\eta \cdot \nabla \log |\rho(x) \ln \rho(x)|;$$
omitting again the order-two terms and absorbing the sign into h (which is simply a datum of the expansion), we get:
$$h = \nabla\eta \cdot \nabla \log |\rho(x) \ln \rho(x)| + \Delta\eta,$$
or simply:
$$L\eta = h,$$
with:
$$L = \Delta + \nabla \log |\rho(x) \ln \rho(x)| \cdot \nabla.$$
We have thus shown that a Laplace-type operator (the Laplacian plus a transport term) can be seen as the linearized version of the Monge–Ampère equation for our proposal.
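The determinant expansion (A17) can be verified symbolically; a small check in two dimensions (ours, using sympy with a generic perturbation η):

```python
import sympy as sp

# Verify det D^2(|x|^2/2 + eps*eta) = 1 + eps*trace(D^2 eta) + O(eps^2) in 2D.
x, y, eps = sp.symbols('x y epsilon')
eta = sp.Function('eta')(x, y)
phi = (x**2 + y**2) / 2 + eps * eta
H = sp.hessian(phi, (x, y))                       # D^2 phi_eps
first_order = H.det().expand().coeff(eps, 1)      # coefficient of eps
laplacian = sp.diff(eta, x, 2) + sp.diff(eta, y, 2)
print(sp.simplify(first_order - laplacian))       # prints 0
```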

References

  1. Jensen, G.; Ward, R.D.; Balsam, P.D. Information: Theory, brain, and behavior. J. Exp. Anal. Behav. 2013, 100, 408–431.
  2. Pregowska, A.; Szczepanski, J.; Wajnryb, E. Temporal code versus rate code for binary Information Sources. Neurocomputing 2016, 216, 756–762.
  3. Pregowska, A.; Szczepanski, J.; Wajnryb, E. How Far can Neural Correlations Reduce Uncertainty? Comparison of Information Transmission Rates for Markov and Bernoulli Processes. Int. J. Neural Syst. 2019, 29, 1950003.
  4. Harris, J.J.; Jolivet, R.; Engl, E.; Attwell, D. Energy-Efficient Information Transfer by Visual Pathway Synapses. Curr. Biol. 2015, 25, 3151–3160.
  5. Harris, J.J.; Engl, E.; Attwell, D.; Jolivet, R.B. Energy-efficient information transfer at thalamocortical synapses. PLoS Comput. Biol. 2019, 15, 1–27.
  6. Keshmiri, S. Entropy and the Brain: An Overview. Entropy 2020, 22, 917.
  7. Salmasi, M.; Stemmler, M.; Glasauer, S.; Loebel, A. Synaptic Information Transmission in a Two-State Model of Short-Term Facilitation. Entropy 2019, 21, 756.
  8. Crumiller, M.; Knight, B.; Kaplan, E. The Measurement of Information Transmitted by a Neural Population: Promises and Challenges. Entropy 2013, 15, 3507–3527.
  9. Panzeri, S.; Piasini, E. Information Theory in Neuroscience. Entropy 2019, 21, 62.
  10. Isomura, T. A Measure of Information Available for Inference. Entropy 2018, 20, 512.
  11. Friston, K. The free-energy principle: A unified brain theory? Nat. Rev. Neurosci. 2010, 11, 127–138.
  12. Ramstead, M.J.D.; Badcock, P.B.; Friston, K. Answering Schrödinger’s question: A free-energy formulation. Phys. Life Rev. 2016, 24, 1–16.
  13. Luczak, A. Measuring neuronal branching patterns using model-based approach. Front. Comput. Neurosci. 2010, 4, 135.
  14. Bonnotte, N. Unidimensional and Evolution Methods for Optimal Transportation. Ph.D. Thesis, Scuola Normale Superiore di Pisa and Université Paris-Sud XI, Orsay, France, 2013.
  15. Stephenson, D.; Patronis, A.; Holland, D.M.; Lockerby, D.A. Generalizing Murray’s law: An optimization principle for fluidic networks of arbitrary shape and scale. J. Appl. Phys. 2015, 118, 174302.
  16. Villani, C. Topics in Optimal Transportation, 1st ed.; Graduate Studies in Mathematics; American Mathematical Society: Providence, RI, USA, 2003; Volume 58.
  17. Villani, C. Optimal Transport: Old and New, 1st ed.; Grundlehren der Mathematischen Wissenschaften: A Series of Comprehensive Studies in Mathematics; Springer: Berlin/Heidelberg, Germany, 2009; Volume 338.
  18. Evans, L.C. Partial Differential Equations and Monge–Kantorovich Mass Transfer. Curr. Dev. Math. 1997, 1997, 65–126.
  19. Shannon, C. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
  20. Applebaum, D. Probability and Information: An Integrated Approach, 2nd ed.; Cambridge University Press: Cambridge, UK, 2010.
  21. Alarcón, T.; Castillo, J.; García-Ponce, B.; Padilla, P. Growth rate and shape as possible control mechanisms for the selection of mode development in optimal biological branching processes. Eur. Phys. J. Spec. Top. 2016, 225, 2581–2589.
  22. Gutiérrez, C.E.; Caffarelli, L.A. Properties of the solutions of the linearized Monge–Ampère equation. Am. J. Math. 1997, 119, 423–465.
  23. Gutiérrez, C.E. The Monge–Ampère Equation, 2nd ed.; Progress in Nonlinear Differential Equations and Their Applications; Birkhäuser: Basel, Switzerland, 2016; Volume 89.
  24. Ni, W.M. Diffusion, cross-diffusion, and their spike-layer steady states. Not. AMS 1998, 45, 9–18.
  25. Turing, A.M. The Chemical Basis of Morphogenesis. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 1952, 237, 37–72.
  26. Zhu, X.; Yang, H. Turing Instability-Driven Biofabrication of Branching Tissue Structures: A Dynamic Simulation and Analysis Based on the Reaction–Diffusion Mechanism. Micromachines 2018, 9, 109.
  27. Meinhardt, H.; Koch, A.J.; Bernasconi, G. Models of pattern formation applied to plant development. In Symmetry in Plants; Series in Mathematical Biology and Medicine, Volume 4; World Scientific: Singapore, 1998; pp. 723–758.
  28. Cortes-Poza, Y.; Padilla-Longoria, P.; Alvarez-Buylla, E. Spatial dynamics of floral organ formation. J. Theor. Biol. 2018, 454, 30–40.
  29. Barrio, R.A.; Romero-Arias, J.R.; Noguez, M.A.; Azpeitia, E.; Ortiz-Gutiérrez, E.; Hernández-Hernández, V.; Cortes-Poza, Y.; Álvarez-Buylla, E. Cell Patterns Emerge from Coupled Chemical and Physical Fields with Cell Proliferation Dynamics: The Arabidopsis thaliana Root as a Study System. PLoS Comput. Biol. 2013, 9, e1003026.
  30. Serini, G.; Ambrosi, D.; Giraudo, E.; Gamba, A.; Preziosi, L.; Bussolino, F. Modeling the early stages of vascular network assembly. EMBO J. 2003, 22, 1771–1779.
  31. Köhn, A.; de Back, W.; Starruß, J.; Mattiotti, A.; Deutsch, A.; Perez-Pomares, J.M.; Herrero, M.A. Early Embryonic Vascular Patterning by Matrix-Mediated Paracrine Signalling: A Mathematical Model Study. PLoS ONE 2011, 6, e24175.
  32. Murray, C. The Physiological Principle of Minimum Work: I. The Vascular System and the Cost of Blood Volume. Proc. Natl. Acad. Sci. USA 1926, 12, 207–214.
  33. Murray, C. The Physiological Principle of Minimum Work Applied to the Angle of Branching of Arteries. J. Gen. Physiol. 1926, 9, 835–841.
  34. McCulloh, K.A.; Sperry, J.S.; Adler, F.R. Water transport in plants obeys Murray’s law. Nature 2003, 421, 939–942.
  35. Zheng, X.; Shen, G.; Wang, C.; Li, Y.; Dunphy, D.; Hasan, T.; Brinker, C.J.; Su, B.L. Bio-inspired Murray materials for mass transfer and activity. Nat. Commun. 2017, 8, 14921.
  36. Özdemir, H.I. The structural properties of carotid arteries in carotid artery diseases—A retrospective computed tomography angiography study. Pol. J. Radiol. 2020, 85, e82–e89.
  37. Wang, Z.; Li, Y.; Childress, A.R.; Detre, J.A. Brain Entropy Mapping Using fMRI. PLoS ONE 2014, 9, e89948.
  38. Bobkov, S.; Ledoux, M. One-Dimensional Empirical Measures, Order Statistics, and Kantorovich Transport Distances, 1st ed.; Memoirs of the American Mathematical Society; American Mathematical Society: Providence, RI, USA, 2016; Volume 261.
  39. Evans, L.C.; Gariepy, R.F. Measure Theory and Fine Properties of Functions, Revised ed.; Textbooks in Mathematics; CRC Press: Boca Raton, FL, USA, 2015.
Figure 1. Numerical solution of the linearization of the Monge–Ampère equation including growth. Branching occurs when the concentration of the solution (the morphogen) rises above a certain threshold (the color code follows standard heat maps: red, high; blue, low). This simulation was provided by Jorge Castillo-Medina and developed in COMSOL. For more details, the reader is referred to [21] and the references therein.
Figure 2. A simulation similar to that of the previous figure with a different growth rate. Notice the different branching structure. Simulation also performed by J. Castillo; see [21].
Figure 3. Schematic geometric configuration when branching occurs on a plane. Illustration after [33] by the authors.

