The Information Geometry of Space-time

The method of maximum entropy is used to model curved physical space in terms of points defined with a finite resolution. Such a blurred space is automatically endowed with a metric given by information geometry. The corresponding space-time is such that the geometry of any embedded spacelike surface is given by its information geometry. The dynamics of blurred space, its geometrodynamics, is constructed by requiring that as space undergoes the deformations associated with evolution in local time, it sweeps a four-dimensional space-time. This reproduces Einstein's equations for vacuum gravity. We conclude with brief comments on some of the peculiar properties of blurred space: there is a minimum length and blurred points have a finite volume; there is a relativistic "blur dilation"; and the volume of space is a measure of its entropy.


Introduction
The problem of reconciling quantum theory (QT) and general relativity (GR) has most commonly been addressed by preserving the framework of QT essentially unchanged while modifying the structure and dynamics of space-time. This is not unreasonable. Einstein's equation, $G_{\mu\nu} = 8\pi G\,T_{\mu\nu}$, relates geometry on the left to matter on the right. Since our best theories for the matter right-hand side are QTs, it is natural to try to construct a theory in which the geometrical left-hand side is also of quantum mechanical origin.¹ Further thought, however, shows that this move carries a considerable risk, particularly because the old process of quantization involves ad hoc rules which, however successful in the past, have led to conceptual difficulties that would immediately spread and also infect the gravitational field. One example is the old quantum measurement problem and its closely related cousin, the problem of macroscopic superpositions. Do quantum superpositions of space-times even make sense? In what direction would the future be? Another example is the cosmological constant problem. Does the zero point energy of quantum fields gravitate? Why does it not give rise to unacceptably large space-time curvatures? Considerations such as these suggest that the issue of whether and how to quantize gravity hinges on a deeper understanding of the foundations of QT and also on a deeper understanding of GR and of geometry itself: what, after all, is distance? Why are QT and GR framed in such different languages? Recent developments indicate that they might be closer than previously thought: the link is entropy.
* Presented at MaxEnt 2019, the 39th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (June 30-July 5, 2019, Garching bei München, Germany).
¹ For an introduction to the extensive literature on canonical quantization of gravity, loop quantum gravity, string theory, and causal sets see e.g., [1][2].
Indeed, in the entropic dynamics approach [3]-[6] QT is derived as an application of entropic methods of inference [7] with a central role assigned to concepts of information geometry. And, on the GR side, the link between gravity and entropy has been recognized from the early work of Bekenstein and Hawking and further reaffirmed in more recent thermodynamic approaches to GR [16]-[21].
In a previous paper [22] we used the method of maximum entropy to construct a model of physical space in which points are blurred; they are defined with a finite resolution. Such a blurred space is a statistical manifold and therefore it is automatically endowed with a Riemannian metric given by information geometry. Our goal here is to further close the gap between QT and GR by formulating the corresponding Lorentzian geometry of space-time.
The extension from space to space-time is not just a simple matter of applying information geometry to four dimensions rather than three. The problem is that information geometry leads to metrics that are positive definite (statistical manifolds are inevitably Riemannian) and therefore cannot reproduce the light-cone structure of space-time. Some additional ingredient is needed. We do not model space-time as a statistical manifold. Instead, space-time is modelled as a four-dimensional manifold such that the geometry of all space-like embedded surfaces is given by information geometry. We find that in the limit of a flat space-time our model coincides with a stochastic model of space-time proposed long ago by Ingraham, who followed a very different line of argument [23].
Blurred space is a curious hybrid: some features are typical of discrete spaces while other features are typical of continuous manifolds. For example, there is a minimum length and blurred points have a finite volume. The volume of a region of space is a measure of the number of points within it, and it is also a measure of its bulk entropy. Under Lorentz transformations the minimum length suffers a dilation which is more analogous to the relativistic time dilation than to the familiar length contraction.
The dynamics of blurred space, its geometrodynamics, is constructed by requiring that as three-dimensional space undergoes the deformations associated with time evolution it sweeps a four-dimensional space-time. As shown in a remarkable paper by Hojman, Kuchař, and Teitelboim [25] in the context of the familiar sharp space-time, this requirement is sufficient to determine the dynamics. Exactly the same argument can be deployed here. The result is that in the absence of matter the geometrodynamics of four-dimensional blurred space-time is given by Einstein's equations. The coupling of gravity to matter will not be addressed in this work.

The information geometry of blurred space
To set the stage we recall the model of blurred space as a smooth three-dimensional manifold X the points of which are defined with a finite resolution [22]. It is noteworthy that, unlike the very rough space-time foams expected in some models of quantum gravity, one expects blurred space to be very smooth because irregularities at scales smaller than the local uncertainty are suppressed. Blurriness is implemented as follows: when we say that a test particle is located at $x \in X$ (with coordinates $x^a$, $a = 1, 2, 3$) it turns out that it is actually located at some unknown neighboring $x'$. The probability that $x'$ lies within $d^3x'$ is $p(x'|x)\,d^3x'$.
Since to each point $x \in X$ one associates a distribution $p(x'|x)$, the space X is a statistical manifold automatically endowed with a metric. Indeed, when points are blurred one cannot fully distinguish the point at $x$ described by the distribution $p(x'|x)$ from another point at $x + dx$ described by $p(x'|x + dx)$. The quantitative measure of distinguishability [7][10] is the information distance,
$$d\ell^2 = g_{ab}\,dx^a dx^b , \tag{1}$$
where the metric tensor $g_{ab}$, the information metric, is given by
$$g_{ab}(x) = \int dx'\, p(x'|x)\,\partial_a \log p(x'|x)\,\partial_b \log p(x'|x) . \tag{2}$$
(We adopt the standard notation $\partial_a = \partial/\partial x^a$ and $dx' = d^3x'$.) Thus, in a blurred space distance is distinguishability. In Section 4 we will briefly address the physical/geometrical interpretation of $d\ell$. For now we merely state [22] that $d\ell$ measures the distance between two neighboring points in units of the local uncertainty defined by the distribution $p(x'|x)$, that is, information length is measured in units of the local blur.
In order to completely define the information geometry of X, which will allow us to introduce notions of parallel transport, curvature, and so on, one must specify a connection or covariant derivative $\nabla$. The natural choice is the Levi-Civita connection, defined so that $\nabla_a g_{bc} = 0$. Indeed, as argued in [26], the Levi-Civita connection is to be preferred because, unlike the other α-connections [10], it does not require imposing any additional structure on the Hilbert space of functions $p^{1/2}$.
The next step is to use the method of maximum entropy to assign the blur distribution $p(x'|x)$. The challenge is to identify the constraints that capture the physically relevant information. One might be tempted to consider imposing constraints on the expected values $\langle x'^a - x^a \rangle$ and $\langle (x'^a - x^a)(x'^b - x^b) \rangle$, but this does not work because in a curved space neither of these constraints is covariant. This technical difficulty is evaded by maximizing entropy on the flat space $T_P$ that is tangent to X at P and then using the exponential map (see [22]) to "project" the distribution from the flat $T_P$ to the curved space X. It is important to emphasize that the validity of this construction rests on the assumption that the normal neighborhood of every point $x$ (the region about $x$ where the exponential map is 1-1) is sufficiently large. The assumption is justified provided the scale of the blur is much smaller than the scale over which curvature effects are appreciable.
Consider a point $P \in X$ with generic coordinates $x^a$ and a positive definite tensor field $\gamma^{ab}(x)$. The components of $y \in T_P$ are $y^a$. The distribution $\hat{p}(y|P)$ on $T_P$ is assigned on the basis of information about the expectation $\langle y^a \rangle_P$ and the variance-covariance matrix $\langle y^a y^b \rangle_P$,
$$\langle y^a \rangle_P = 0 \quad\text{and}\quad \langle y^a y^b \rangle_P = \gamma^{ab}(P) . \tag{3}$$
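On X it is always possible to transform to new coordinates
$$x^i = X^i(x^a) \tag{4}$$
such that
$$\gamma^{ij}(P) = \delta^{ij} \quad\text{and}\quad \partial_k \gamma^{ij}(P) = 0 , \tag{5}$$
where $i, j, \ldots = 1, 2, 3$. If $\gamma^{ab}$ were a metric tensor the new coordinates would be called Riemann Normal Coordinates at P (RNC$_P$). The new components of $y$ are
$$y^i = X^i_a\, y^a \quad\text{where}\quad X^i_a = \frac{\partial X^i}{\partial x^a} , \tag{6}$$
and the constraints (3) take the simpler form,
$$\langle y^i \rangle_P = 0 \quad\text{and}\quad \langle y^i y^j \rangle_P = \delta^{ij} . \tag{7}$$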
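We can now maximize the entropy relative to the measure $\hat{q}(y)$ subject to (7) and normalization. Since $T_P$ is flat we can take $\hat{q}(y)$ to be constant and we may ignore it. The result in RNC$_P$ is
$$\hat{p}(y^i|P) = \frac{1}{(2\pi)^{3/2}} \exp\left[-\frac{1}{2}\,\delta_{ij}\, y^i y^j\right] . \tag{8}$$
Using the inverse of eq. (6) we can transform back to the original coordinates $y^a$,
$$y^a = X^a_i\, y^i \quad\text{and}\quad \gamma_{ab} = X^i_a X^j_b\, \delta_{ij} . \tag{9}$$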
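The resulting distribution is also Gaussian,
$$\hat{p}(y^a|P) = \frac{(\det \gamma_{ab})^{1/2}}{(2\pi)^{3/2}} \exp\left[-\frac{1}{2}\,\gamma_{ab}\, y^a y^b\right] , \tag{10}$$
and the matrix $\gamma_{ab}$ of Lagrange multipliers turns out to be the inverse of the correlation matrix $\gamma^{ab}$,
$$\gamma_{ab}\,\gamma^{bc} = \delta_a^c . \tag{11}$$
Next we use the exponential map to project the $y^i$ coordinates on the flat $T_P$ to the RNC$_P$ coordinates on the curved X,
$$x'^i = x^i(P) + y^i . \tag{12}$$
The corresponding distribution $p(x'^i|P)$ induced on X by $\hat{p}(y^i|P)$ on $T_P$ is
$$p(x'^i|P)\, d^3x'^i = \hat{p}(y^i|P)\, d^3y^i \tag{13}$$
or
$$p(x'^i|x^i) = \frac{1}{(2\pi)^{3/2}} \exp\left[-\frac{1}{2}\,\delta_{ij}\,(x'^i - x^i)(x'^j - x^j)\right] . \tag{14}$$
Thus, in RNC$_P$ the distribution $p(x'^i|x^i)$ retains the Gaussian form. We can now invert (4) and transform back to the original generic frame of coordinates $x^a$ and define $p(x'^a|x^a)$ by
$$p(x'^a|x^a)\, d^3x'^a = p(x'^i|x^i)\, d^3x'^i , \tag{15}$$
which is an identity between scalars and holds in all coordinate systems. In the $x^a$ coordinates the distribution $p(x'^a|x^a)$ will not, in general, be Gaussian,
$$p(x'^a|x^a) = \frac{1}{(2\pi)^{3/2}} \left|\det \frac{\partial x'^i}{\partial x'^a}\right| \exp\left[-\frac{1}{2}\,\delta_{ij}\,(x'^i - x^i)(x'^j - x^j)\right] . \tag{16}$$
Finally we substitute (16) into (2) to calculate the information metric $g_{ab}$. (The integral is easily handled in RNC$_P$.) The result is deceptively simple,
$$g_{ab}(P) = \gamma_{ab}(P) . \tag{17}$$
The main result of [22] was to show that the metric $g_{ab}$ of a blurred space is a statistical concept that measures the "degree of distinguishability" between neighboring points. The metric is given by the Lagrange multipliers $\gamma_{ab}$ associated to the covariance tensor $\gamma^{ab}$ that describes the blurriness of space.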
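The statement that the information metric equals the inverse of the blur covariance can be checked numerically in the simplest setting. The sketch below is only an illustration under stated assumptions: it works in flat space, where the blur distribution is an exact Gaussian location family, and it estimates the information metric by Monte Carlo. The function name `fisher_metric_gaussian` and the sample covariance are hypothetical choices, not part of the original construction.

```python
import numpy as np

def fisher_metric_gaussian(cov, n_samples=200_000, seed=0):
    """Monte Carlo estimate of g_ab = E[d_a log p * d_b log p]
    for the Gaussian location family p(x'|x) = N(x, cov).
    Assumption: flat space, so the Gaussian form is exact."""
    rng = np.random.default_rng(seed)
    dim = cov.shape[0]
    inv_cov = np.linalg.inv(cov)
    # y = x' - x is drawn from the blur distribution
    y = rng.multivariate_normal(np.zeros(dim), cov, size=n_samples)
    # score: d/dx^a log p(x'|x) = (inv_cov)_{ab} y^b
    score = y @ inv_cov
    # average of the outer product of the score with itself
    return score.T @ score / n_samples

# Hypothetical covariance tensor gamma^{ab} (symmetric, positive definite)
gamma_upper = np.array([[2.0, 0.3, 0.0],
                        [0.3, 1.0, 0.2],
                        [0.0, 0.2, 0.5]])
g_estimate = fisher_metric_gaussian(gamma_upper)
gamma_lower = np.linalg.inv(gamma_upper)  # Lagrange multipliers gamma_{ab}
max_deviation = np.max(np.abs(g_estimate - gamma_lower))
```

Up to sampling noise, `g_estimate` should agree with `gamma_lower`, which is the flat-space content of the result $g_{ab} = \gamma_{ab}$.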

Space-time and the geometrodynamics of pure gravity
The constraint that determines the dynamics is the requirement that blurred space be a three-dimensional spacelike "surface" embedded in four-dimensional space-time. As shown in [25], the reason this condition is so constraining is that when evolving from an initial to a final surface every intermediate surface must also be embeddable in the same space-time and, furthermore, the sequence of intermediate surfaces (the path or foliation) is not unique. Such a "foliation invariance", which amounts to the local relativity of simultaneity, is a requirement of consistency: if there are two alternative paths to evolve from an initial to a final state, then the two paths must lead to the same result.
Space-time is foliated by a sequence of space-like surfaces $\{\Sigma\}$. Points on the surface $\Sigma$ are labeled by coordinates $x^a$ ($a = 1, 2, 3$) and space-time events are labeled by space-time coordinates $X^\mu$ ($\mu = 0, 1, 2, 3$). The embedding of $\Sigma$ within space-time is defined by four functions $X^\mu = X^\mu(x)$. An infinitesimal deformation of $\Sigma$ to a neighboring $\Sigma'$ is specified by $X^\mu(x) \to X^\mu(x) + \delta X^\mu(x)$. The deformation vector $\delta X^\mu(x)$ is decomposed into normal and tangential components,
$$\delta X^\mu = \delta\xi^\perp n^\mu + \delta\xi^a X^\mu_a , \tag{18}$$
where $n^\mu$ is the unit normal to the surface and the three vectors $X^\mu_a = \partial X^\mu/\partial x^a$ are tangent to the coordinate lines $x^a$ ($n_\mu n^\mu = -1$, $n_\mu X^\mu_a = 0$).
We assume a phase space endowed with a symplectic structure: the basic dynamical variables are the surface metric $g_{ab}(x)$ and its canonically conjugate momentum $\pi^{ab}(x)$. This leads to a Hamiltonian dynamics where the super-Hamiltonian $H_\perp(x)[g, \pi]$ and the super-momentum $H_a(x)[g, \pi]$ generate normal and tangential deformations respectively. In order for the dynamics to be consistent with the kinematics of deformations the Poisson brackets of $H_\perp$ and $H_a$ must obey two sets of conditions [27][28].
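First, they must close in the same way as the "group" of deformations, that is, they must provide a representation of the "algebra" of deformations,
$$\{H_\perp(x), H_\perp(x')\} = \left(g^{ab}(x) H_b(x) + g^{ab}(x') H_b(x')\right) \partial_a \delta(x, x') , \tag{19}$$
$$\{H_a(x), H_\perp(x')\} = H_\perp(x)\, \partial_a \delta(x, x') , \tag{20}$$
$$\{H_a(x), H_b(x')\} = H_a(x')\, \partial_b \delta(x, x') + H_b(x)\, \partial_a \delta(x, x') . \tag{21}$$
And second, the initial values of the variables $g_{ab}$ and $\pi^{ab}$ must be restricted to obey the weak constraints
$$H_\perp(x) \approx 0 \quad\text{and}\quad H_a(x) \approx 0 . \tag{22}$$
A remarkable feature of the resulting dynamics is that once the constraints (22) are imposed on one initial surface $\Sigma$ they will be satisfied automatically on all subsequent surfaces. As shown in [25], the generators that satisfy (19)-(21) are
$$H_\perp = 2\kappa\, G_{abcd}\, \pi^{ab} \pi^{cd} - \frac{1}{2\kappa}\, g^{1/2} (R - 2\Lambda) , \tag{23}$$
$$H_a = -2\, g_{ac} \nabla_b \pi^{bc} , \tag{24}$$
$$G_{abcd} = \frac{1}{2}\, g^{-1/2} \left(g_{ac} g_{bd} + g_{ad} g_{bc} - g_{ab} g_{cd}\right) , \tag{25}$$
where $R$ is the scalar curvature of the surface metric, and $\kappa$ and $\Lambda$ are constants which, once the coupling to matter is introduced, can be related to Newton's constant $G = c^4 \kappa/8\pi$ and to the cosmological constant $\Lambda$. Equations (22)-(25) are known to be equivalent to Einstein's equations in vacuum.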
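As a small illustration of how the supermetric $G_{abcd}$ enters the super-Hamiltonian, the sketch below builds $G_{abcd}$ from a surface metric at a single point and evaluates the kinetic term $2\kappa\, G_{abcd}\pi^{ab}\pi^{cd}$. For symmetric $\pi^{ab}$ this should reduce to $2\kappa\, g^{-1/2}(\pi^{ab}\pi_{ab} - \tfrac{1}{2}\pi^2)$, which is what the comparison checks. The function names, the random test data, and the choice $\kappa = 1$ are illustrative assumptions only.

```python
import numpy as np

def dewitt_supermetric(g):
    """G_abcd = (1/2) g^{-1/2} (g_ac g_bd + g_ad g_bc - g_ab g_cd)."""
    sqrt_det = np.sqrt(np.linalg.det(g))
    return (0.5 / sqrt_det) * (np.einsum('ac,bd->abcd', g, g)
                               + np.einsum('ad,bc->abcd', g, g)
                               - np.einsum('ab,cd->abcd', g, g))

def kinetic_term(g, pi, kappa=1.0):
    """2 kappa G_abcd pi^ab pi^cd, contracted directly."""
    G = dewitt_supermetric(g)
    return 2.0 * kappa * np.einsum('abcd,ab,cd->', G, pi, pi)

def kinetic_term_traces(g, pi, kappa=1.0):
    """Same quantity written as 2 kappa g^{-1/2} (pi^ab pi_ab - pi^2 / 2)."""
    sqrt_det = np.sqrt(np.linalg.det(g))
    pi_mixed = g @ pi                              # pi_a^c = g_ab pi^bc
    pi_sq = np.trace(pi_mixed @ pi_mixed)          # pi^ab pi_ab
    pi_tr = np.trace(pi_mixed)                     # pi = g_ab pi^ab
    return 2.0 * kappa * (pi_sq - 0.5 * pi_tr**2) / sqrt_det

# Hypothetical test data: a positive definite metric and a symmetric momentum
rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3))
g_test = A @ A.T + 3.0 * np.eye(3)
B = rng.normal(size=(3, 3))
pi_test = 0.5 * (B + B.T)

direct = kinetic_term(g_test, pi_test)
via_traces = kinetic_term_traces(g_test, pi_test)
```

The two evaluations should agree to rounding error; the curvature term of $H_\perp$ is not included here because it requires derivatives of the metric over the surface.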
To summarize: (a) Space-time is constructed so that the geometry of any embedded spacelike surface is given by information geometry. (b) The geometrodynamics of blurred space is given by Einstein's equations. These are the main conclusions of this paper.

Discussion
Dimensionless distance? - As with any information geometry, the distance $d\ell$ given in eqs. (1)-(2) turns out to be dimensionless. The interpretation [22] is that an information distance is a distance measured in units of the local uncertainty, the blur. To make this explicit we write the distribution (14) that describes a blurred point in RNC$_P$ in the form
$$p(x'^i|x^i) = \frac{1}{(2\pi \ell_0^2)^{3/2}} \exp\left[-\frac{1}{2\ell_0^2}\,\delta_{ij}\,(x'^i - x^i)(x'^j - x^j)\right] , \tag{26}$$
so that the information distance between two neighboring points is
$$d\ell^2 = \frac{1}{\ell_0^2}\,\delta_{ij}\, dx^i dx^j . \tag{27}$$
Since the blur $\ell_0$ is the only unit of length available to us (there are no external rulers) it follows that $\ell_0 = 1$, but it is nevertheless useful to write our equations showing $\ell_0$ explicitly. In (26) the two points $x$ and $x'$ are meant to be simultaneous.
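Minimum length - To explore the geometry of blurred space it helps to distinguish the abstract "mathematical" points that are sharply defined by the coordinates $x$ from the more "physical" blurred points. We shall call them c-points and b-points respectively. In RNC$_P$ the distance $\Delta\ell$ between two c-points located at $x$ and at $x + \Delta x$ is given by (27). To find the corresponding distance $\Delta\lambda$ between two b-points located at $x$ and at $x + \Delta x$ we recall that when we say a test particle is at $x$ it is actually located at $x' = x + y$, so that
$$\Delta\lambda^2 = \frac{1}{\ell_0^2}\,\delta_{ij}\,(\Delta x^i + \Delta y^i)(\Delta x^j + \Delta y^j) ,$$
where $\Delta y^i$ is the difference between the displacements $y^i$ of the two b-points. Taking the expectation over $y$ with the probability (26), using $\langle y^i \rangle = 0$ and $\langle y^i y^j \rangle = \ell_0^2\, \delta^{ij}$ for each b-point, we find
$$\langle \Delta\lambda^2 \rangle = \Delta\ell^2 + 6 .$$
We see that even as $\Delta x \to 0$ and the two b-points coincide we still expect a minimum rms distance of $\sqrt{6}\,\ell_0$.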
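The minimum rms distance can be illustrated with a short Monte Carlo sketch. It assumes flat RNC coordinates and, as a labeled assumption, that the displacements $y$ of the two b-points are independent Gaussians of width $\ell_0$ in each direction. The function name `mean_sq_b_distance` and the sample separations are hypothetical.

```python
import numpy as np

def mean_sq_b_distance(dx, ell0=1.0, n_samples=400_000, seed=1):
    """Monte Carlo estimate of <Delta lambda^2>, in units of ell0^2, for two
    b-points whose c-point separation is dx (a length-3 array).
    Assumption: the two blurs are independent isotropic Gaussians."""
    rng = np.random.default_rng(seed)
    y1 = rng.normal(0.0, ell0, size=(n_samples, 3))  # blur of the first b-point
    y2 = rng.normal(0.0, ell0, size=(n_samples, 3))  # blur of the second b-point
    separation = np.asarray(dx) + y2 - y1            # actual separation x2' - x1'
    return np.mean(np.sum(separation**2, axis=1)) / ell0**2

# Coincident b-points: the estimate should be close to 6
coincident = mean_sq_b_distance(np.zeros(3))
# Separated b-points with Delta ell^2 = 9: the estimate should be close to 9 + 6
separated = mean_sq_b_distance(np.array([3.0, 0.0, 0.0]))
rms_minimum = np.sqrt(coincident)  # compare with sqrt(6)
```

Each b-point contributes $3\ell_0^2$ to the mean squared separation, one $\ell_0^2$ per spatial direction, which is where the factor of 6 comes from.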
Blur dilation - The size of the blur of space is a length but it does not behave as the length of a rod. When referred to a moving frame it does not undergo a Lorentz contraction. It is more analogous to time dilation. Just as a clock marks time by ticking along the time axis, so are lengths measured by ticking off units of $\ell_0$ along them. By the principle of relativity all inertial observers measure the same blur in their own rest frames, the proper blur $\ell_0$. Relative to another inertial frame the blur is dilated to $\gamma \ell_0$, where $\gamma$ is the usual relativistic factor. This implies the proper blur $\ell_0$ is indeed the minimum attainable.
The volume of a blurred point: is space continuous or discrete? - A b-point is smeared over the whole of space but we can still define a useful measure of its volume by adding all volume elements $g^{1/2}(x')\,d^3x'$ weighted by the scalar density $p(x'|x)/g^{1/2}(x')$. Since $p(x'|x)$ is normalized, the result is
$$\int d^3x'\, g^{1/2}(x')\, \frac{p(x'|x)}{g^{1/2}(x')} = 1 .$$
Therefore in $\ell_0$ units a blurred point has unit volume. This means that we can measure the volume of a finite region of space by counting the number of b-points it contains. It also means that the number of distinguishable b-points within a region of finite volume is finite, which is a property one would normally associate to discrete spaces. In this sense blurred space is both continuous and discrete. (See also [24].)
The entropy of space - The statistical state of blurred space is the joint distribution of all the $y_x$ variables associated to every b-point $x$. We assume that the $y_x$ variables at different $x$s are independent, and therefore their joint distribution is a product,
$$\hat{P}[y] = \prod_x \hat{p}(y_x|x) . \tag{30}$$
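From (11) and (17) the distribution $\hat{p}(y^a_x|x)$ in the tangent space $T_x$ is Gaussian,
$$\hat{p}(y^a_x|x) = \frac{(\det g_{ab}(x))^{1/2}}{(2\pi)^{3/2}} \exp\left[-\frac{1}{2}\, g_{ab}(x)\, y^a_x y^b_x\right] ,$$
which shows explicitly how the information metric $g_{ab}$ determines the statistical state of space. Next we calculate the total entropy of space,
$$S[g] = -\int Dy\, \hat{P}[y] \log \frac{\hat{P}[y]}{\hat{Q}[y]} ,$$
relative to the uniform distribution
$$\hat{Q}[y] = \prod_x g^{1/2}(x) ,$$
which is independent of $y$, a constant. Since the $y$'s in eq. (30) are independent variables the entropy is additive, $S[g] = \sum_x S(x)$, and we only need to calculate the entropy $S(x)$ associated to a b-point at a generic location $x$,
$$S(x) = -\int d^3y\, \hat{p}(y|x) \log \frac{\hat{p}(y|x)}{g^{1/2}(x)} = \frac{3}{2} \log(2\pi e) = s_0 .$$
Thus, the entropy per b-point is a numerical constant $s_0$ and the entropy of any region $R$ of space, $S_R[g]$, is just its volume,
$$S_R[g] = \sum_{x \in R} S(x) = s_0 \int_R d^3x\, g^{1/2}(x) .$$
Thus, the entropy of a region of space is proportional to the number of b-points within it and is proportional to its volume.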
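The claim that the entropy per b-point does not depend on the metric can be illustrated numerically. The sketch below assumes that the tangent-space distribution is a Gaussian with precision matrix $g_{ab}$ and, as a further labeled assumption, that the reference measure is the $y$-independent constant $g^{1/2}$; under these assumptions the relative entropy should equal $\tfrac{3}{2}\log(2\pi e)$ for any positive definite $g_{ab}$. The function name and the two sample metrics are hypothetical.

```python
import numpy as np

def entropy_per_b_point(g, n_samples=200_000, seed=2):
    """Monte Carlo estimate of S = -E[log(p_hat(y) / sqrt(det g))] where
    p_hat is a zero-mean Gaussian with precision matrix g (covariance g^{-1}).
    Assumption: the reference measure is the constant sqrt(det g)."""
    rng = np.random.default_rng(seed)
    cov = np.linalg.inv(g)
    y = rng.multivariate_normal(np.zeros(3), cov, size=n_samples)
    log_sqrt_g = 0.5 * np.log(np.linalg.det(g))
    quad = np.einsum('na,ab,nb->n', y, g, y)          # g_ab y^a y^b per sample
    log_p = log_sqrt_g - 1.5 * np.log(2.0 * np.pi) - 0.5 * quad
    return -np.mean(log_p - log_sqrt_g)

s0_exact = 1.5 * np.log(2.0 * np.pi * np.e)

# Two hypothetical metrics with very different determinants
g_small = np.diag([0.5, 1.0, 2.0])
g_large = np.array([[9.0, 1.0, 0.0],
                    [1.0, 4.0, 0.5],
                    [0.0, 0.5, 7.0]])
s_small = entropy_per_b_point(g_small)
s_large = entropy_per_b_point(g_large)
```

Both estimates should agree with `s0_exact` up to sampling noise, because the factor $(\det g)^{1/2}$ in the Gaussian normalization cancels against the reference measure.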
Canonical quantization of gravity? - The picture of space as a smooth blurred statistical manifold stands in sharp contrast to ideas inspired by various models of quantized gravity in which the short distance structure of space is dominated by extreme fluctuations. From our perspective it is not surprising that attempts to quantize gravity by imposing commutation relations on the metric tensor $g_{ab}$ have not been successful. The information geometry approach suggests a reason why: quantizing the Lagrange multipliers $g_{ab} = \gamma_{ab}$ would be just as misguided as formulating a quantum theory of fluids by imposing commutation relations on Lagrange multipliers such as temperature, pressure, or chemical potential, which define the thermodynamic macrostate.
Physical consequences of a minimum length? - A minimum length will eliminate the short wavelength divergences in QFT. This in turn will most likely illuminate our understanding of the cosmological constant and affect the scale dependence of running coupling constants. One also expects that QFT effects that are mediated by short wavelength excitations should be suppressed. For example, the lifetime of the proton ought to be longer than predicted by grand-unified theories formulated in Minkowski space-time. The nonlocality implicit in a minimum length might lead to possible violations of CPT symmetry with new insights into matter-antimatter asymmetry. Of particular interest would be early universe cosmology, where inflation might amplify minimum-length effects, possibly making them observable.