A Transport–Information Geometric Formulation of Cosmic Structure Formation: A Unified Dual-Affine Perspective

Takeuchi, Tsutomu T.

doi:10.3390/sym18060992

by Tsutomu T. Takeuchi ¹,²

¹ Division of Particle and Astrophysical Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8602, Japan

² The Research Center for Statistical Machine Learning, The Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan

Symmetry 2026, 18(6), 992; https://doi.org/10.3390/sym18060992 (registering DOI)

Submission received: 19 April 2026 / Revised: 30 May 2026 / Accepted: 3 June 2026 / Published: 9 June 2026

(This article belongs to the Special Issue Symmetries in Galaxies: Structure, Motion, and Evolution of Galaxies)


Abstract

Cosmic large-scale structure formation is commonly described in terms of the evolution of density fluctuations and correlation statistics. However, such approaches primarily characterize amplitude variations and do not directly capture the spatial rearrangement of mass distributions. Recent developments based on optimal transport theory have introduced a complementary perspective, in which structure formation is understood as a transport process in the space of probability measures equipped with Wasserstein geometry. In this work, we extend this framework by introducing transport–information geometry, which unifies transport geometry with information geometry. Within this formulation, cosmological states are represented as elements of the product space of probability measures and statistical manifolds, allowing gravitational mass transport and generative deformations associated with galaxy formation to be treated in a unified manner. Using entropic optimal transport, we demonstrate that Wasserstein geometry and Kullback–Leibler-based information geometry are connected within a single mathematical structure, leading to a geometric interpretation of cosmological evolution as a coupled transport–information process endowed with a dual-affine structure. In this picture, gravitational evolution corresponds to generative deformation associated with e-geometry, while observational processes, including finite sampling and survey selection, are described as mixing and projection in m-geometry. This dual-affine cosmology provides a unified framework in which gravitational transport, galaxy bias, observational effects, and nonlinear multi-stream structures are consistently incorporated. The resulting formulation offers a systematic basis for cosmological inference, data analysis, and stochastic descriptions of structure formation.

Keywords: galaxy evolution; galaxy formation; large-scale structure; structure formation; optimal transport; information geometry

1. Introduction

Cosmic large-scale structure formation is commonly understood as the amplification of initial density fluctuations driven by gravitational instability. In this framework, the statistical properties of the density field ρ (x), in particular two-point statistics such as the correlation function and the power spectrum, have been widely used as primary descriptors. The effects of galaxy formation are typically incorporated through bias models, and observational data are given as finite galaxy catalogs μ_{N}. In this way, cosmological structure formation is described as a hierarchical generative process,

\begin{matrix} ρ_{lin} \to ρ_{NL} \to λ_{gal} \to μ_{N}, \end{matrix}

(1)

where λ_{gal} is the galaxy intensity field. This chain contains several conceptually distinct operations. The transition from ρ_{lin} to ρ_{NL} involves the geometric rearrangement of matter by gravitational instability, whereas the transition from ρ_{NL} to λ_{gal} involves probabilistic galaxy formation and bias, and the final transition to μ_{N} involves finite sampling and observational selection. Existing approaches often emphasize one of these aspects separately. Transport-based formulations describe the rearrangement of matter distributions, while statistical approaches focus on probabilistic generation, galaxy bias, and cosmological inference. Conventional summary statistics mainly characterize the amplitude structure of density fluctuations and do not directly describe the spatial rearrangement of mass distributions. In particular, correlation functions and power spectra compress information into amplitude-based statistics and do not explicitly retain the geometric deformation or transport history of the mass distribution. A unified framework that treats mass transport, stochastic galaxy formation, and observational projection within a single geometric structure is therefore still needed.

To address this limitation, it is natural to seek a geometric description that explicitly tracks how mass distributions are rearranged during structure formation. Wasserstein geometry based on optimal transport theory has therefore recently attracted attention [1,2,3]. Optimal transport methods have already found important applications in cosmology. In particular, transport-based reconstruction techniques have been used to infer primordial matter distributions from observed galaxy catalogs and to characterize nonlinear mass rearrangement during structure formation [4]. More recently, entropic optimal transport, Schrödinger bridges, and diffusion-based generative formulations have attracted increasing attention in probabilistic modeling and inverse problems. In optimal transport, the distance between two probability distributions is defined as the minimal cost required to rearrange mass from one distribution to another, providing a natural geometric structure on the space of probability measures. From this perspective, cosmological structure formation can be understood not only as a change in density amplitudes but also as a transport process of mass distributions. Indeed, the distance between the initial density field ρ_{lin} and the observed galaxy catalog μ_{N} can be characterized by the quadratic Wasserstein distance W_{2} (ρ_{lin}, μ_{N}), and its statistical expectation can be decomposed into contributions from gravitational transport, galaxy formation bias, and finite sampling. This result indicates that cosmological structure formation can be reinterpreted as a geometric problem of mass rearrangement rather than purely a statistical problem.

However, the cosmological generative process cannot be described solely by mass transport. Galaxy formation involves stochastic processes, and observed catalogs are finite samples, so the generative process inevitably includes information loss and coarse-graining. In addition, observational effects such as survey windows and selection functions, mixing induced by multi-stream structures, and fluctuations due to finite volume cannot be captured by simple transport models. A complete geometric description of cosmological structure formation must therefore incorporate not only transport geometry but also the statistical structures associated with probabilistic generation, statistical mixing, and information loss.

A natural mathematical framework for describing such statistical aspects is information geometry [5,6]. In information geometry, families of probability distributions are treated as statistical manifolds endowed with geometric structures based on the Kullback–Leibler divergence. In particular, exponential-family structures associated with probabilistic generation and convex structures associated with mixture distributions appear as dual-affine geometry. Within this framework, generative processes are described by e-geometry, while observational and mixing processes are described by m-geometry. This dual structure reflects the fact that probabilistic generation and statistical observation correspond to distinct geometric operations. From a cosmological perspective, this distinction is particularly natural. The evolution from primordial fluctuations to nonlinear structures may be viewed as a generative process that continuously deforms probability distributions, whereas observational sampling, survey masks, selection functions, and finite-volume effects act as mixing and projection operations. This suggests that the dual-affine structure of information geometry provides a natural language for separating physical evolution from observational transformations.

As a concrete representation in information geometry, probability distributions can be embedded via their log-likelihood functions. Specifically, a distribution p (x) is mapped to

\begin{matrix} ℓ (x) = ln p (x), \end{matrix}

(2)

which defines a representation in a function space. In this representation, distances between distributions can be evaluated through the variance of differences in ℓ, and the Kullback–Leibler divergence is locally approximated by the squared Euclidean distance. This construction maps the infinite-dimensional space of probability distributions to a space of log-density functions and allows projection onto finite-dimensional representations through evaluation at a finite set of sample points. In this sense, it provides a natural coordinate system that connects theoretical probability distributions with observational data. In the present context, this log-density representation can be viewed as a transformation

\begin{matrix} ρ \mapsto ln ρ, \end{matrix}

(3)

which plays a key role in connecting density transport described by Wasserstein geometry with statistical generation described by information geometry.
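The local relation between the Kullback–Leibler divergence and distances in log-density coordinates can be checked numerically. The following is a minimal sketch, not taken from the paper: the discrete distribution `p`, the perturbation direction `h`, and all function names are assumptions chosen for illustration. For a nearby distribution q ∝ p exp(εh), it compares KL(p ∥ q) with one half of the variance, under p, of the log-density difference ℓ_p − ℓ_q; the two should agree to leading order in ε.

```python
import math

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def kl(p, q):
    # Kullback-Leibler divergence KL(p || q) for discrete distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def half_var_logdiff(p, q):
    # one half of the variance, under p, of l_p - l_q with l = ln(density)
    d = [math.log(pi) - math.log(qi) for pi, qi in zip(p, q)]
    mean = sum(pi * di for pi, di in zip(p, d))
    return 0.5 * sum(pi * (di - mean) ** 2 for pi, di in zip(p, d))

def compare(eps):
    # hypothetical base distribution and perturbation direction
    p = normalize([1.0, 2.0, 3.0, 4.0])
    h = [0.3, -0.1, 0.2, -0.2]
    q = normalize([pi * math.exp(eps * hi) for pi, hi in zip(p, h)])
    return kl(p, q), half_var_logdiff(p, q)

for eps in (0.1, 0.01):
    print(eps, compare(eps))
```

In this sketch both quantities scale as ε², which is the sense in which the divergence behaves locally like a squared distance.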

Despite the rapid development of optimal transport cosmology, Schrödinger bridge formulations, Bayesian reconstruction methods, and information-geometric approaches, these frameworks have largely been developed independently. Transport-based approaches primarily focus on the geometric rearrangement of matter distributions, whereas information-geometric approaches focus on probabilistic generation, statistical inference, and observational uncertainty. As a result, the relationship between transport geometry and information geometry has remained largely implicit in cosmological applications.

The central idea of the present work is to bridge this gap by introducing a transport–information manifold,

\begin{matrix} (ρ, θ) \in P_{2} (Ω) \times M_{cosmo}, \end{matrix}

(4)

where ρ denotes the mass distribution, θ represents the parameters of a cosmological generative model, and Ω denotes the spatial domain of the mass distribution. Within this formulation, gravitational transport, stochastic galaxy formation, and observational effects are interpreted as different geometric operations acting on the same state space. This construction extends existing transport-based descriptions of cosmology by incorporating information-geometric degrees of freedom associated with probabilistic generation, statistical mixing, and observational projection.

Motivated by these considerations, the aim of this work is to integrate optimal transport theory and information geometry and to formulate the cosmological generative process within a unified geometric framework that incorporates both transport and information. In particular, by employing entropic optimal transport, we show that Wasserstein transport geometry and Kullback–Leibler information geometry are naturally connected within a single variational principle. Within this framework, gravitational evolution is interpreted as transport in e-geometry associated with generative deformation, while galaxy formation and observational processes are described as mixing and projection in m-geometry.

Accordingly, cosmological evolution can be represented as a combination of three complementary operations:

\begin{matrix} \begin{matrix} transport \\ + \\ statistical generation \\ + \\ observational projection . \end{matrix} \end{matrix}

(5)

The proposed framework is intended to extend rather than replace standard cosmological dynamics. In appropriate limits, the transport sector recovers the continuity equation and the potential-flow dynamics underlying the Zel’dovich approximation, while the information-geometric sector provides a complementary description of stochastic generation, information diffusion, coarse graining, and observational inference. Furthermore, the entropic regularization parameter ε acquires a physical interpretation as a measure of stochasticity and information diffusion in the cosmological generative process. These connections will be developed explicitly in the later sections.

The principal contributions of this work may be summarized as follows:

1. Introduction of a transport–information manifold that combines Wasserstein geometry and information geometry into a unified cosmological state space.
2. Identification of entropic optimal transport as a geometric bridge between transport geometry and information geometry.
3. Formulation of cosmological evolution as a dual-affine process consisting of transport, probabilistic generation, and observational projection.
4. Geometric interpretation of observational effects, including finite sampling, survey selection, coarse graining, and stochasticity, as manifestations of m-geometry.

The central claim of this paper is that cosmological structure formation can be understood as a geometric process consisting of mass transport, probabilistic generation, and observational projection. Within this framework, gravitational transport, galaxy bias, finite sampling, observational effects, and multi-stream mixing are reorganized within a unified transport–information geometry. The overall structure of the transport–information framework developed in this work is summarized in Figure 1.

The structure of this paper is as follows. Section 2 formulates galaxy distributions as finite point processes and establishes the statistical structure of cosmological data together with the foundation of dual-affine geometry. Section 3 introduces entropic optimal transport and clarifies the relationship between transport and information geometry, showing that they can be described in a unified way through exponential couplings. Section 4 defines the transport–information manifold and constructs the geometric structure of the cosmological state space. Section 5 introduces an action principle and dynamics on this space, formulating cosmological evolution as geometric motion governed by coupled transport and information terms. Section 6 formulates observational processes as m-affine projections and establishes the framework of dual-affine cosmology by decomposing the generative process into geometric transformation and mixing. Section 7 discusses the cosmological implications of the framework, including the reformulation of inference as an m-projection problem, the interpretation of structure formation as dual-affine dynamics, and the geometric separation of physical and observational effects together with scale-dependent structure. Section 8 demonstrates how the transport–information framework recovers standard cosmological dynamics in the deterministic limit and illustrates the complete generative chain through a one-dimensional Zel’dovich toy model, including galaxy bias, finite sampling, stochastic transport, and the physical interpretation of the entropic scale. Section 9 outlines possible extensions and connections of transport–information cosmology, including non-Gaussian statistics, unified response theory, observational pipelines, and links to information geometry, statistical learning, and modern generative inference methods. 
Finally, Section 10 summarizes the main results and emphasizes the unified interpretation of cosmic structure formation as a coupled transport–information process on the transport–information manifold.

2. Dual-Affine Structure of Cosmological Generation

Galaxy survey data obtained from observations or simulations are finite sets of galaxy positions rather than continuous density fields. As introduced in Equation (1), the cosmological generative chain is written as

\begin{matrix} ρ_{lin} \to ρ_{NL} \to λ_{gal} \to μ_{N}, \end{matrix}

(6)

where the first arrow represents gravitational mass transport, the second probabilistic galaxy formation/bias, and the third finite sampling and observational selection (see also Figure 1). The aim of this section is to identify the dual-affine structure naturally associated with these distinct operations.

A natural framework that provides geometric structure on the space of probability measures is optimal transport theory. For probability measures ρ, μ \in P_{2} (R^{3}), the quadratic Wasserstein distance is defined as

\begin{matrix} W_{2}^{2} (ρ, μ) = inf_{π \in Π (ρ, μ)} \int {∥ x - y ∥}^{2} d π (x, y), \end{matrix}

(7)

where Π (ρ, μ) denotes the set of couplings with marginals ρ and μ. This distance can be interpreted as the minimal cost required to rearrange mass from ρ to μ, and it endows the space of probability measures with a nonlinear Riemannian geometric structure. In particular, geodesics in this space are given by continuous deformations induced by transport maps, allowing the time evolution of density fields to be understood as geometric motion.
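As a small illustration of Equation (7), the following sketch evaluates the squared quadratic Wasserstein distance between two equal-size point sets in one dimension. It relies on the standard fact that, for the quadratic cost on the real line, the optimal coupling pairs points in sorted order; the restriction to one dimension, the equal weights, and the function name `w2_squared_1d` are simplifying assumptions made here, whereas the text works in R^3.

```python
def w2_squared_1d(xs, ys):
    # Squared W2 between two empirical measures with equal numbers of
    # equally weighted points on the real line: sort both sets and pair
    # them in order (the monotone coupling is optimal for quadratic cost).
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(xs), sorted(ys))
    return sum((x - y) ** 2 for x, y in pairs) / len(xs)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [x + 0.5 for x in xs]       # rigid shift by 0.5
print(w2_squared_1d(xs, ys))     # 0.25, the squared shift
```

A rigid shift costs exactly the squared displacement, which reflects the interpretation of the distance as a rearrangement cost rather than an amplitude difference.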

This interpretation is directly connected to standard Newtonian structure formation. In the single-stream regime, the matter distribution evolves according to the continuity equation

\begin{matrix} \frac{\partial ρ}{\partial t} + \nabla \cdot (ρ v) = 0, \end{matrix}

(8)

where v is the peculiar velocity field. When the flow is potential, v = \nabla ϕ, Equation (8) is precisely the kinematic structure underlying the Benamou–Brenier dynamical formulation of optimal transport. In the Zel’dovich approximation, the Lagrangian map is written as

\begin{matrix} x (q, t) = q + D (t) Ψ_{0} (q), Ψ_{0} (q) = \nabla Φ (q), \end{matrix}

(9)

so that the evolution is represented by a potential transport map. The gravitational potential is related to the matter density contrast through the Poisson equation,

\begin{matrix} \nabla^{2} Φ = δ_{lin}, \end{matrix}

(10)

up to the conventional cosmological normalization. Thus, the transport sector of the present framework is not a replacement for standard cosmological dynamics but a geometric reformulation of the mass-conserving part of gravitational evolution.
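The mass-conserving character of the Lagrangian map in Equation (9) can be made concrete in one dimension. The sketch below is an illustration under stated assumptions, not the paper's model: the displacement Ψ_0(q) = −sin q on a periodic interval, the parameter name `growth` standing for D(t), and the uniform initial density are all hypothetical choices. Before shell crossing (here `growth` < 1) the map is monotone and the density follows from ρ(x) dx = ρ_0 dq.

```python
import math

def zeldovich_map(q, growth):
    # x(q) = q + D * Psi_0(q), with the hypothetical choice Psi_0(q) = -sin(q)
    return q - growth * math.sin(q)

def density(q, growth, rho0=1.0):
    # mass conservation rho(x) dx = rho0 dq; valid while dx/dq > 0
    jacobian = 1.0 - growth * math.cos(q)
    return rho0 / abs(jacobian)

n = 400
growth = 0.5
qs = [2.0 * math.pi * i / n for i in range(n + 1)]
xs = [zeldovich_map(q, growth) for q in qs]

# single-stream regime: the Lagrangian map preserves the ordering of particles
single_stream = all(x1 > x0 for x0, x1 in zip(xs, xs[1:]))

# Eulerian mass: rho evaluated at cell midpoints (in q) times mapped cell width
mass = sum(
    density(0.5 * (q0 + q1), growth) * (x1 - x0)
    for q0, q1, x0, x1 in zip(qs, qs[1:], xs, xs[1:])
)
print(single_stream, mass, 2.0 * math.pi)
```

The total Eulerian mass matches the Lagrangian mass 2π up to discretization error, while the density is enhanced where the flow converges.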

Furthermore, in Wasserstein space a notion of curvature can be introduced through the convexity of the entropy functional

\begin{matrix} H (ρ) = \int ρ (x) ln ρ (x) d x \end{matrix}

(11)

along geodesics. This structure is related to curvature lower bounds in the sense of Lott–Sturm–Villani and provides a geometric characterization of the nonlinearity of density evolution. In this sense, Wasserstein geometry offers a natural geometric framework for describing the transport and deformation of density fields.

However, mass-preserving transport alone cannot describe the probabilistic and observational stages of the chain. In realistic cosmological processes, stochasticity in galaxy formation and information loss due to finite observations play essential roles, and thus a more general geometric structure is required. The transition from ρ_{NL} to λ_{gal} involves bias, halo occupation, baryonic physics, and selection effects, while λ_{gal} \to μ_{N} involves sampling noise and finite survey geometry. These are probabilistic transformations and projections rather than mass-preserving transport maps.

A natural framework for describing such statistical structures is information geometry, e.g., [5,6,7]. In information geometry, families of probability distributions are treated as statistical manifolds equipped with the Fisher metric and dual-affine connections based on the Kullback–Leibler divergence. Under this structure, statistical manifolds possess two dual-affine structures: the e-connection associated with exponential families and the m-connection associated with mixture families.

Suppose that a family of probability distributions p (x) is given by the exponential family

\begin{matrix} p (x | θ) = exp (θ^{i} F_{i} (x) - ψ (θ)), \end{matrix}

(12)

where θ^{i} are natural parameters, F_{i} (x) denote generic sufficient statistics of the model family, and ψ (θ) is the log-partition function. The natural parameters θ provide e-affine coordinates on the statistical manifold, and the linear interpolation

\begin{matrix} θ (t) = (1 - t) θ_{0} + t θ_{1} \end{matrix}

(13)

defines an e-geodesic corresponding to generative deformation within the exponential family.

On the other hand, the expectation coordinates

\begin{matrix} η_{i} = E [F_{i} (x)] \end{matrix}

(14)

provide m-affine coordinates. In this coordinate system, the linear interpolation

\begin{matrix} η (t) = (1 - t) η_{0} + t η_{1} \end{matrix}

(15)

defines an m-geodesic corresponding to mixture operations of probability distributions. Thus, the dual-affine structure of information geometry can be understood as a geometric unification of two fundamental probabilistic operations: generative transformation and statistical mixing.
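The difference between the two interpolations in Equations (13) and (15) can be seen on a finite sample space. The sketch below is illustrative and uses names that are not in the paper: for categorical distributions, interpolating the expectation coordinates is an ordinary mixture, whereas interpolating the natural parameters amounts to a normalized geometric mean, so that log-odds vary linearly in t.

```python
import math

def m_geodesic(p0, p1, t):
    # linear interpolation in expectation (mixture) coordinates
    return [(1.0 - t) * a + t * b for a, b in zip(p0, p1)]

def e_geodesic(p0, p1, t):
    # linear interpolation of log-probabilities, followed by normalization
    w = [math.exp((1.0 - t) * math.log(a) + t * math.log(b))
         for a, b in zip(p0, p1)]
    z = sum(w)
    return [x / z for x in w]

# hypothetical endpoint distributions on three outcomes
p0 = [0.7, 0.2, 0.1]
p1 = [0.1, 0.3, 0.6]
mid_m = m_geodesic(p0, p1, 0.5)
mid_e = e_geodesic(p0, p1, 0.5)
print(mid_m)
print(mid_e)
```

Both paths share their endpoints but differ in between, which is the elementary content of the statement that generation (multiplicative deformation) and mixing (additive averaging) are distinct geometric operations.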

This dual structure admits a natural interpretation in the context of cosmological generation. Gravitational instability can be viewed as a generative deformation from the initial density field to the late-time density field. In the Lagrangian description, the particle displacement field Ψ defines the mapping

\begin{matrix} x (q, t) = q + Ψ (q, t), \end{matrix}

(16)

which transports the mass distribution. Such generative deformations share structural similarities with parameter-driven distribution transformations and can be interpreted as operations in e-geometry. The geometric interpretation of large-scale structure formation through Lagrangian displacements originates in the classical work of Zel’dovich and subsequent developments in relativistic cosmology [8].

In contrast, observed galaxy catalogs are obtained as finite samples from a Poisson or Cox process based on an intensity field λ_{gal} (x). Observational operations therefore involve statistical sampling with averaging and mixing, which naturally correspond to m-geometry. In this sense, observed galaxy distributions can be understood as m-geometric projections of the underlying continuous distributions. Thus, the e-geometric direction describes generative deformation, while the m-geometric direction describes mixing, averaging, sampling, and projection. This distinction provides a geometric language for separating physical evolution from observational transformations.
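The sampling step from an intensity field to a finite catalog can be sketched as a one-dimensional inhomogeneous Poisson process. Everything specific below is an assumption made for illustration: the intensity `lambda_gal`, the cell-based sampling scheme, the fixed random seed, and the function names. The sketch draws an independent Poisson count in each small cell and places the points uniformly within the cell.

```python
import math
import random

def sample_poisson(mean, rng):
    # Knuth's multiplication method; adequate for the small means used here
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_catalog(intensity, x_min, x_max, n_cells, rng):
    # piecewise-constant approximation of the intensity on n_cells cells
    dx = (x_max - x_min) / n_cells
    points = []
    for i in range(n_cells):
        left = x_min + i * dx
        count = sample_poisson(intensity(left + 0.5 * dx) * dx, rng)
        points.extend(left + dx * rng.random() for _ in range(count))
    return points

def lambda_gal(x):
    # hypothetical intensity: mean level 100 modulated by one Fourier mode
    return 100.0 * (1.0 + 0.8 * math.cos(2.0 * math.pi * x))

rng = random.Random(0)
catalog = sample_catalog(lambda_gal, 0.0, 1.0, 200, rng)
inner = sum(1 for x in catalog if 0.25 <= x < 0.75)
print(len(catalog), inner, len(catalog) - inner)
```

The catalog size fluctuates around the integrated intensity, and the points cluster where the intensity is high, which is the finite-sampling step λ_{gal} → μ_{N} in its simplest form.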

This correspondence becomes more concrete when considered through entropic optimal transport. For a coupling π, consider the variational problem

\begin{matrix} D_{ε} (ρ, μ) = inf_{π \in Π (ρ, μ)} [\int c (x, y) d π (x, y) + ε KL (π ∥ ρ \otimes μ)] . \end{matrix}

(17)

The optimal coupling has an exponential structure and forms an e-geometric family. On the other hand, the space of all couplings Π (ρ, μ) is a convex subset of the probability simplex and thus carries an m-geometric structure. Therefore, entropic optimal transport provides a concrete realization in which transport geometry and information geometry coexist within a single structure.

The parameter ε controls the balance between deterministic transport and statistical spreading. In the limit ε \to 0, the coupling concentrates on the minimum-cost transport plan, whereas finite ε statistically mixes multiple transport paths. Thus, entropic optimal transport already realizes a dual structure: deterministic rearrangement is encoded by the transport cost, while stochastic mixing is encoded by the entropy term.

From this perspective, cosmological large-scale structure formation can be understood as a combination of two geometric operations,

\begin{matrix} \begin{matrix} e - geometry : & generative evolution and deterministic transport; \\ m - geometry : & observational mixing, sampling, and coarse graining . \end{matrix} \end{matrix}

(18)

That is, gravitational evolution governs generative transformations, while observational sampling introduces mixing and coarse-graining. This dual-affine structure provides a geometric expression of the intrinsic asymmetry of cosmological generation processes. The next section develops this correspondence explicitly by showing how entropic optimal transport produces exponential couplings and how these couplings connect Wasserstein geometry with Kullback–Leibler information geometry.

3. Entropic Optimal Transport and Exponential Couplings

In the previous section, we saw that the cosmological generation process can be decomposed into e-geometry as generative transformation and m-geometry as observational mixing from the viewpoint of dual-affine structure. However, at that stage information geometry and optimal transport theory remain parallel frameworks, and a concrete mathematical mechanism connecting them has not yet been made explicit. The purpose of this section is to provide such a bridge.

Entropic optimal transport provides the concrete bridge between the transport-dominated description of matter evolution and the probabilistic description required for observational cosmology. Unlike classical optimal transport, it incorporates stochastic spreading through an entropy term and therefore naturally accommodates finite sampling, observational selection, and multi-stream mixing.

To this end, we introduce entropic optimal transport, summarized in Figure 1. The coupling that appears in entropic regularization admits an exponential representation, and the space of such couplings naturally possesses an e-flat structure in the sense of information geometry. On the other hand, the space of all couplings with prescribed marginals is a convex subset of the probability simplex and thus has an m-flat structure. Therefore, entropic optimal transport provides a natural setting in which dual-affine structure and transport geometry intersect. Furthermore, when considering its dynamical counterpart in terms of the Schrödinger bridge, this structure extends beyond static distributional distances and can be understood as the geometry of stochastic generative processes. This point is essential for the geometric formulation of cosmological generation pursued in this work.

An important aspect to emphasize is that this connection is not merely formal but is realized at the level of a variational principle. Classical optimal transport is formulated as the minimization of a transport cost, whereas information geometry is based on the Kullback–Leibler divergence. Entropic optimal transport combines these two quantities within a single functional to be minimized, so that transport geometry and information geometry are not externally coupled but instead emerge as intrinsic aspects of a unified structure. In this sense, entropic optimal transport can be regarded as a concrete implementation of transport–information geometry.

3.1. Discrete Formulation and Sinkhorn Scaling

In cosmological applications, observed galaxy catalogs are finite samples and are often represented as discrete measures. We therefore consider

\begin{matrix} ρ = \sum_{i} a_{i} δ_{D} (x - x_{i}), \quad μ = \sum_{j} b_{j} δ_{D} (y - y_{j}) . \end{matrix}

(19)

In this case, the coupling is represented by a non-negative matrix π_{i j} whose row sums and column sums satisfy

\begin{matrix} \sum_{j} π_{i j} = a_{i}, \quad \sum_{i} π_{i j} = b_{j} . \end{matrix}

(20)

The entropic regularization problem, subject to the marginal constraints of Equation (20), then becomes

\begin{matrix} min_{π_{i j} \geq 0} [\sum_{i, j} c_{i j} π_{i j} + ε \sum_{i, j} π_{i j} ln \frac{π_{i j}}{a_{i} b_{j}}] \end{matrix}

(21)

where c_{i j} = {| x_{i} - y_{j} |}^{2}. Applying the method of Lagrange multipliers, one finds that the optimal coupling takes the form

\begin{matrix} π_{i j} = u_{i} K_{i j} v_{j} \end{matrix}

(22)

where

\begin{matrix} K_{i j} = exp (- \frac{c_{i j}}{ε}) \end{matrix}

(23)

and u_{i}, v_{j} are scaling factors determined so as to satisfy the marginal constraints. This is the basic form of Sinkhorn scaling [9]. Thus, for finite galaxy catalogs the coupling matrix π_{i j} is not merely a numerical approximation but the directly observable transport object. Furthermore, since

\begin{matrix} ln π_{i j} = ln u_{i} + ln v_{j} - \frac{c_{i j}}{ε} \end{matrix}

(24)

the optimal coupling becomes affine after taking the logarithm. This means that the family of couplings has an exponential representation.
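The scaling form of Equations (22) to (24) can be sketched as a short alternating iteration. The code below is a minimal pure-Python illustration, not the paper's implementation: the point positions, weights, value of ε, iteration count, and function name are assumptions, and no numerical safeguards (such as log-domain updates for small ε) are included.

```python
import math

def sinkhorn(a, b, cost, eps, n_iter=500):
    # Alternately rescale rows and columns of K = exp(-cost / eps) so that
    # pi_ij = u_i * K_ij * v_j matches the prescribed marginals a and b.
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    pi = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    return pi, u, v

# hypothetical one-dimensional point sets with equal weights
x = [0.0, 1.0, 2.0]
y = [0.5, 1.5, 2.5]
a = [1.0 / 3.0] * 3
b = [1.0 / 3.0] * 3
cost = [[(xi - yj) ** 2 for yj in y] for xi in x]
eps = 1.0

pi, u, v = sinkhorn(a, b, cost, eps)
row_sums = [sum(row) for row in pi]
col_sums = [sum(pi[i][j] for i in range(3)) for j in range(3)]
print(row_sums, col_sums)
```

Because ln π_{i j} is a sum of a row term, a column term, and −c_{i j}/ε, any log cross-ratio such as ln π_{00} + ln π_{11} − ln π_{01} − ln π_{10} depends only on the cost and on ε, which gives a simple consistency check of the affine structure.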

An important point here is that this exponential structure becomes explicitly visible only after discretization. In the continuous setting, the exponential representation exists implicitly as a structure in function space. In the discrete setting, however, π_{i j} is given as a finite-dimensional vector, so that its logarithm is seen directly to possess a linear structure. In other words, ln π_{i j} separates into a sum of scalar functions, and the contributions from the row and column directions decompose additively. This decomposition corresponds to the natural parameter representation in information geometry, and ln u_{i} and ln v_{j} can be interpreted as dual parameters associated with the marginal distributions.

Moreover, this structure corresponds directly to the dual-affine structure of the coupling space. The space of couplings π_{i j} has an m-flat structure as a convex subset of the probability simplex, while the family defined by the exponential representation forms an e-flat submanifold. From this point of view, Sinkhorn scaling can be understood as an iterative procedure that searches for an e-flat family within an m-flat space so as to satisfy the given marginal constraints. The algorithm can therefore be interpreted not merely as a numerical method, but as implementing a geometric projection along dual-affine structure.

The role of ε is also clear in this discrete setting. From the definition of K_{i j}, it follows that ε sets the cost scale above which pairings are exponentially suppressed. When ε is small, π_{i j} concentrates on minimum-cost paths and reproduces the singular structure of classical optimal transport. When ε is finite, by contrast, π_{i j} becomes spread out and multiple transport paths are statistically mixed. This behavior shows concretely how the competition between transport and information appears at the discrete level.
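The two limits of ε can be followed in closed form for the smallest nontrivial case. The sketch below assumes a hypothetical setup that is not in the paper: two source points and two target points at the same positions 0 and d, with weights 1/2 each, so that c_{11} = c_{22} = 0 and c_{12} = c_{21} = d². By symmetry the scaling factors are equal, and the marginal constraints give π_{11} = π_{22} = 1/(2(1 + k)) and π_{12} = π_{21} = k/(2(1 + k)) with k = exp(−d²/ε).

```python
import math

def coupling_2x2(d, eps):
    # Entropic coupling for the symmetric two-point problem described above.
    k = math.exp(-d * d / eps)
    stay = 1.0 / (2.0 * (1.0 + k))   # mass kept at the same position
    swap = k / (2.0 * (1.0 + k))     # mass exchanged between positions
    return [[stay, swap], [swap, stay]]

for eps in (1e-2, 1.0, 1e4):
    print(eps, coupling_2x2(1.0, eps))
```

For small ε the coupling approaches the deterministic plan with all mass on the diagonal, while for large ε it approaches the independent coupling with 1/4 in every entry.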

This discrete structure also connects naturally to the log-density representation introduced later in this paper. More precisely, the logarithm of π_{i j} can be treated as a point in a finite-dimensional Euclidean space, and comparison between distributions can then be described in terms of distances in that space. This viewpoint is isomorphic to the embedding of probability distributions by log-likelihood vectors and provides a natural geometric representation for finite-sample data. Discrete entropic OT therefore provides a concrete setting that connects transport geometry, information geometry, and finite-dimensional representation through log-density coordinates.

3.2. Exponential Family Structure of Entropic Couplings

The exponential representation of entropic couplings provides the canonical realization of an e-flat structure. By rewriting the expression above slightly, the optimal coupling can be written as

\begin{matrix} π_{i j} = exp (α_{i} + β_{j} - \frac{c_{i j}}{ε}) \end{matrix}

(25)

where

\begin{matrix} α_{i} = ln u_{i}, β_{j} = ln v_{j} \end{matrix}

(26)

This is clearly of exponential-family form. That is, the family of entropic couplings can be regarded as an e-flat family described by the natural parameters (α, β). This point is consistent with the viewpoint of information geometry, which emphasizes the correspondence between exponential representations and dually flat structures [5,7].

In classical optimal transport, the coupling is merely a constrained optimization variable. Once entropic regularization is introduced, however, the coupling appears as an exponential family and becomes an object of information geometry. An important point here is that this exponential-family structure cannot be separated from the presence of the transport cost. If

c_{i j}

were absent,

π_{i j}

would appear only as a deformation of the independent coupling. Conversely, if only the cost term were present and there were no entropic regularization, then an exponential representation would not generally appear. Therefore, the very structure

\begin{matrix} α_{i} + β_{j} - \frac{c_{i j}}{ε} \end{matrix}

(27)

arises as the result of the competition between transport cost and information regularization. This means that an entropic coupling is a fundamental object that simultaneously carries both transport and information. From the viewpoint of the present paper, it is a local model of transport–information geometry.

In particular, linear interpolation along the natural parameters

(α, β)

defines an e-geodesic:

\begin{matrix} α (t) = (1 - t) α^{(0)} + t α^{(1)}, β (t) = (1 - t) β^{(0)} + t β^{(1)} \end{matrix}

(28)

The corresponding coupling is given by

\begin{matrix} π^{(t)}_{i j} = exp (α_{i} (t) + β_{j} (t) - \frac{c_{i j}}{ε}) . \end{matrix}

(29)

Thus, deformations of entropic couplings follow e-geometry as generative deformations. Here, “generative” means that when one coupling is transformed into another the distribution is not additively mixed but multiplicatively transformed through a linear deformation of the log-density. This multiplicative deformation corresponds closely to the character of e-geometry discussed in the previous section.
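The multiplicative character of this deformation can be made explicit in a small numerical sketch. Assuming randomly chosen natural parameters and a random cost matrix (all hypothetical values), the code below checks that linear interpolation of (α, β) as in Equation (28) yields the elementwise weighted geometric mean of the two endpoint couplings. The helper name `coupling` is an illustrative choice. The sketch does not renormalize the intermediate coupling, whose total mass is generally not conserved along the interpolation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, eps = 3, 3, 0.7                     # assumed sizes and regularization strength
cost = rng.random((n, m))                 # hypothetical cost matrix c_ij

def coupling(alpha, beta):
    """Entropic coupling of Equation (25) for natural parameters (alpha, beta)."""
    return np.exp(alpha[:, None] + beta[None, :] - cost / eps)

# Two hypothetical endpoints in natural-parameter space.
alpha0, beta0 = rng.normal(size=n), rng.normal(size=m)
alpha1, beta1 = rng.normal(size=n), rng.normal(size=m)
pi0, pi1 = coupling(alpha0, beta0), coupling(alpha1, beta1)

t = 0.3
alpha_t = (1 - t) * alpha0 + t * alpha1   # Equation (28)
beta_t = (1 - t) * beta0 + t * beta1
pi_t = coupling(alpha_t, beta_t)          # Equation (29)

# Linear interpolation of the log-density is a weighted geometric mean of the densities.
geometric = pi0 ** (1 - t) * pi1 ** t
print(np.max(np.abs(pi_t - geometric)))
```

The cost term cancels consistently because it enters both endpoints with the same coefficient, so only the row and column parameters are deformed.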

Moreover, this exponential representation makes the linear structure in log-density space explicit. Indeed, since

\begin{matrix} ln π_{i j} = α_{i} + β_{j} - \frac{c_{i j}}{ε} \end{matrix}

(30)

the log-coupling has an affine structure in which the contributions from the row and column directions are separated. This decomposition agrees with the natural-parameter representation in information geometry, and

α_{i}

and

β_{j}

can be interpreted as components of dual coordinates. Accordingly, an entropic coupling can be understood as an object that possesses a Euclidean linear structure in log-density space.
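The separation of row and column contributions in Equation (30) can be verified directly. The sketch below assumes plain Sinkhorn scaling with K_{i j} = exp(- c_{i j} / ε); the random cost matrix, the marginals, and the helper names `sinkhorn_potentials` and `mixed_diff` are hypothetical. It shows that a mixed second difference of ln π_{i j} removes α_{i} and β_{j}, leaving only the cost contribution divided by ε.

```python
import numpy as np

def sinkhorn_potentials(cost, r, c, eps, n_iter=500):
    """Return (alpha, beta) = (ln u, ln v) from Sinkhorn scaling of K = exp(-cost / eps)."""
    K = np.exp(-cost / eps)
    u = np.ones_like(r)
    v = np.ones_like(c)
    for _ in range(n_iter):
        u = r / (K @ v)
        v = c / (K.T @ u)
    return np.log(u), np.log(v)

rng = np.random.default_rng(0)
cost = rng.random((3, 4))                    # hypothetical cost matrix
r = np.array([0.2, 0.3, 0.5])                # assumed row marginal
c = np.array([0.1, 0.2, 0.3, 0.4])           # assumed column marginal
eps = 0.5

alpha, beta = sinkhorn_potentials(cost, r, c, eps)
log_pi = alpha[:, None] + beta[None, :] - cost / eps   # Equation (30)
pi = np.exp(log_pi)

def mixed_diff(a, i, k, j, l):
    """Mixed second difference a_ij - a_il - a_kj + a_kl."""
    return a[i, j] - a[i, l] - a[k, j] + a[k, l]

# The row and column parameters cancel; only the cost survives.
lhs = mixed_diff(log_pi, 0, 1, 0, 1)
rhs = -mixed_diff(cost, 0, 1, 0, 1) / eps
print(lhs, rhs)
```

In other words, the marginal constraints fix only the additive parts α_{i} and β_{j}, while the interaction structure of the log-coupling is inherited entirely from the cost.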

This viewpoint connects directly to the embedding of probability distributions by log-likelihood functions. That is, when a distribution

p (x)

is written as

ℓ (x) = ln p (x)

, the KL divergence between neighboring distributions is locally approximated, to second order, by half the expected squared difference of ℓ. Similarly, in the discrete coupling case,

ln π_{i j}

is treated as a finite-dimensional vector and its difference defines a Euclidean distance. Therefore, the exponential structure in entropic OT provides a natural coordinate system that connects the infinite-dimensional geometry of probability-distribution space with finite-dimensional data representations.

From a cosmological perspective, this result implies that transport couplings can be represented in a coordinate system that is simultaneously adapted to mass rearrangement and probabilistic generation. The logarithmic representation therefore provides a common language through which optimal transport, information geometry, and finite observational data can be described within a single framework.

3.3. The m-Flat Structure of the Coupling Simplex

On the other hand, the full coupling space

Π (ρ, μ)

is a convex subset of the probability simplex. Accordingly, the convex combination of two couplings

π^{(0)}

and

π^{(1)}

,

\begin{matrix} π^{(t)} = (1 - t) π^{(0)} + t π^{(1)}, \end{matrix}

(31)

is again a coupling, and this defines an m-geodesic as a linear interpolation in expectation coordinates. That is, the entire coupling space is m-flat with respect to mixture operations. This fact may appear obvious at first sight, but it is important from the viewpoint of information geometry, because the convex structure that appears here is precisely the essence of m-geometry [5,7]. Interpolations obtained by linearly combining the distributions themselves correspond to additive mixing, averaging, and coarse-graining, and they therefore represent exactly the aspects of observational generation and finite sampling.
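A two-by-two example makes the closure of the coupling set under mixtures concrete. The following sketch uses assumed marginals and two hand-built couplings; the log-linear interpolation at the end is included only as a contrast and is not a construction taken from the paper.

```python
import numpy as np

r = np.array([0.3, 0.7])     # row marginal (assumed values)
c = np.array([0.4, 0.6])     # column marginal (assumed values)

pi0 = np.outer(r, c)                                      # independent coupling
pi1 = pi0 + 0.1 * np.array([[1.0, -1.0], [-1.0, 1.0]])    # same marginals, more correlated

t = 0.5
pi_m = (1.0 - t) * pi0 + t * pi1      # m-geodesic, Equation (31)

# Contrast: normalized log-linear interpolation between the same two couplings.
pi_e = pi0 ** (1.0 - t) * pi1 ** t
pi_e /= pi_e.sum()

print(pi_m.sum(axis=1), pi_m.sum(axis=0))   # marginals of the mixture
print(pi_e.sum(axis=1), pi_e.sum(axis=0))   # marginals of the log-linear interpolant
```

The mixture reproduces both marginals exactly, while the log-linear interpolant between the same endpoints does not, which illustrates why the marginal-constrained set is naturally described in m-affine terms.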

Thus, in entropic optimal transport the full coupling space is m-flat while the family of optimal entropic couplings within it possesses an e-flat structure. This relationship is completely parallel to the basic situation in information geometry. In other words, the space of entropic OT has the structure

\begin{matrix} \text{m-flat simplex} \supset \text{e-flat exponential family} . \end{matrix}

(32)

This inclusion is not merely a formal analogy. The m-flat simplex is the space of all possible couplings, and from within it a special exponential family is selected according to the balance between transport cost and information content. Therefore, entropic regularization is an operation that brings a layer of e-geometry, in the form of an exponential family, onto the stage of m-geometry given by the probability simplex. In this sense, dual-affine structure is intrinsic to the essence of entropic OT.

Furthermore, this double structure connects naturally to the description of observational data. For galaxy catalogs given as finite samples the coupling matrix

π_{i j}

itself becomes a basic statistical quantity. In this setting, m-geometry corresponds to the averaging and mixing of observational data, while e-geometry corresponds to deformations of the generative model. Accordingly, the dual-affine structure in coupling space provides a framework that unifies the theoretical description and the observational description of cosmological generation processes in a single geometric language. In this sense, the entropic optimal transport introduced in this section is a basic building block that concretely implements transport–information geometry.

3.4. Schrödinger Bridge and Stochastic Transport

The importance of the Schrödinger bridge formulation lies in the fact that it transforms entropic optimal transport from a static comparison of distributions into a dynamical theory of stochastic evolution. From the viewpoint of cosmology, this is essential because the observed large-scale structure is not simply related to the initial density field through a deterministic transport map. Rather, the evolution includes stochasticity arising from nonlinear dynamics, galaxy formation, and observational processes. The Schrödinger bridge therefore provides a natural dynamical framework in which transport and uncertainty can be treated simultaneously.

It is well known that entropy-regularized optimal transport is equivalent to the Schrödinger bridge problem [10,11]. In this problem, given a reference stochastic process R, one considers the set of probability measures P on path space with fixed initial and terminal distributions, and one seeks the process that minimizes

\begin{matrix} inf_{P} KL (P ∥ R), \end{matrix}

(33)

where

\begin{matrix} KL (P ∥ R) = E_{P} [ln \frac{P}{R}] \end{matrix}

(34)

is the Kullback–Leibler divergence on path space. The constraints in this variational problem are

\begin{matrix} P (X_{0} \in d^{3} x) = ρ_{0} (x) d^{3} x, P (X_{1} \in d^{3} y) = ρ_{1} (y) d^{3} y . \end{matrix}

(35)

If the reference process R is a diffusion process

\begin{matrix} d X_{t} = \sqrt{2 ε} d W_{t} \end{matrix}

(36)

then, by the Girsanov transformation, for any diffusion process with drift

b (X_{t}, t)

one obtains

\begin{matrix} KL (P ∥ R) = \frac{1}{4 ε} E_{P} [\int_{0}^{1} {| b (X_{t}, t) |}^{2} d t] + const . \end{matrix}

(37)

Thus, the Schrödinger bridge problem is equivalent to minimizing the action

\begin{matrix} inf_{b, ρ} \int_{0}^{1} \int \frac{{| b (x, t) |}^{2}}{4 ε} ρ (x, t) d^{3} x d t, \end{matrix}

(38)

where

ρ (x, t)

satisfies the Fokker–Planck equation

\begin{matrix} \frac{\partial ρ}{\partial t} + \nabla \cdot (ρ b) = ε Δ ρ . \end{matrix}

(39)

This equation provides a direct connection between entropic transport and standard cosmological dynamics. When

ε \to 0

the diffusion term disappears and the equation reduces to the continuity equation of pressureless matter evolution. The resulting dynamics coincide with the mass-conserving transport picture that underlies the Zel’dovich approximation and the large-scale limit of cold-dark-matter evolution. Entropic optimal transport may therefore be viewed as a stochastic extension of classical transport dynamics.
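The role of the diffusion term can be illustrated with a simple particle simulation. The sketch below integrates d X_{t} = b d t + \sqrt{2 ε} d W_{t} by the Euler–Maruyama method in one dimension, assuming a constant drift b and a standard normal initial distribution; these choices, the parameter values, and the function name `simulate` are illustrative assumptions. For this setup the mean at t = 1 shifts by b and the variance grows by 2 ε, while ε = 0 leaves the variance unchanged, corresponding to pure transport.

```python
import numpy as np

def simulate(b, eps, n_paths=200_000, n_steps=50, seed=0):
    """Euler-Maruyama integration of dX = b dt + sqrt(2 eps) dW on the interval [0, 1]."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = rng.normal(0.0, 1.0, size=n_paths)        # X_0 ~ N(0, 1), an assumed initial state
    for _ in range(n_steps):
        noise = rng.normal(size=n_paths)
        x = x + b * dt + np.sqrt(2.0 * eps * dt) * noise
    return x

x_diff = simulate(b=0.5, eps=0.25)   # drift plus diffusion
x_det = simulate(b=0.5, eps=0.0)     # transport only

print(x_diff.mean(), x_diff.var())   # close to 0.5 and 1 + 2 * 0.25
print(x_det.mean(), x_det.var())     # close to 0.5 and 1
```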

This formulation is closely related to the dynamic optimal transport of Benamou and Brenier [12]:

\begin{matrix} inf_{ρ, v} \int_{0}^{1} \int ρ {| v |}^{2} d^{3} x d t . \end{matrix}

(40)

Setting

v = b / (2 \sqrt{ε})

shows that the Schrödinger bridge action coincides with the kinetic energy functional of the transport velocity field

v

. However, while classical optimal transport satisfies the continuity equation

\begin{matrix} \frac{\partial ρ}{\partial t} + \nabla \cdot (ρ v) = 0 \end{matrix}

(41)

the Schrödinger bridge includes the additional diffusion term

ε Δ ρ

. Consequently, entropic optimal transport corresponds to the drift–diffusion equation

\begin{matrix} \frac{\partial ρ}{\partial t} = - \nabla \cdot (ρ v) + ε Δ ρ . \end{matrix}

(42)

The first term represents deterministic mass transport, while the second term represents information diffusion induced by entropy regularization. In this sense, the diffusion term can be interpreted as an intrinsic manifestation of m-geometric mixing already at the level of dynamical evolution.

This correspondence clarifies the geometric meaning of the entropic parameter

ε

. In classical optimal transport, the dynamics are entirely determined by the transport velocity field. In contrast, the Schrödinger bridge introduces an additional information-diffusion component. The competition between these two contributions controls the transition between deterministic and stochastic evolution. The variational interpretation of diffusion equations as gradient flows in Wasserstein space is closely related to the seminal JKO formulation [13].

The exponential-family structure of the optimal coupling

\begin{matrix} π (x, y) = exp (α (x) + β (y) - \frac{c (x, y)}{ε}) \end{matrix}

(43)

is the static representation associated with the endpoint distributions of this stochastic process. In this sense, entropic optimal transport is not merely a theory of static couplings but is a dynamical description of generative processes in terms of stochastic transport. The Schrödinger bridge thereby lifts the dual-affine structure of static entropic couplings into the time domain.

This decomposition admits a direct physical interpretation in the cosmological context. The advective term

- \nabla \cdot (ρ v)

corresponds to the deterministic gravitational mass transport realized in the Zel’dovich approximation or N-body dynamics, namely the standard cold-dark-matter picture of mass-conserving transport. In contrast, the diffusion term

ε Δ ρ

provides an effective description of intrinsic stochasticity.

Concretely, this term encapsulates three physically distinct effects that arise at small scales or in observational processes: (i) the intrinsic stochasticity of galaxy formation, such as scatter in halo occupation distributions, bursty star formation histories, and stochastic baryonic feedback from supernovae and AGN; (ii) velocity dispersion arising from multi-stream regions and shell-crossing in the nonlinear regime, which induces Fokker–Planck-type corrections; and (iii) observational and numerical coarse-graining effects such as finite sampling, shot noise, survey window functions, and smoothing scales in simulations.

Thus,

ε

is not a mere mathematical regularization parameter but acquires a direct physical meaning: it quantifies the strength of stochasticity and information diffusion in the cosmological generative process. Importantly,

ε

is expected to be scale-dependent. At large, linear scales, gravitational evolution is coherent and

ε \approx 0

, recovering the classical optimal transport limit

ε \to 0

and the continuity equation of mass-preserving transport. At small, nonlinear, or galaxy scales, where nonlinear structure formation, galaxy formation physics, and observational effects dominate,

ε

takes finite values and the diffusion term becomes essential.

From a practical perspective,

ε

may be regarded as an effective, scale-dependent parameter characterizing unresolved stochasticity. In principle, it may be estimated empirically by comparing deterministic transport models with observed galaxy catalogs.

This scale dependence provides a natural connection to the scale-dependent structure discussed in later sections. From the viewpoint of transport–information geometry, the transition from large-scale deterministic transport to small-scale stochastic dynamics is described as a continuous deformation controlled by

ε

.

3.5. Log-Density Representation and Observational Coordinates

The dynamical formulation developed above describes evolution in the space of probability distributions. To connect this structure to observational data, it is necessary to introduce an explicit coordinate representation. The log-density representation provides such a coordinate system and establishes a direct connection between transport geometry, information geometry, and finite observational data.

For this purpose, a natural construction is the embedding via the log-density. For a probability distribution

p (x)

, as introduced in Section 1, the log-density representation is given by

\begin{matrix} ℓ (x) = ln p (x) . \end{matrix}

(44)

Since the Kullback–Leibler divergence between distributions is locally approximated by the squared difference of ℓ, the space of distributions acquires a locally Euclidean structure in the log-density space.

This construction is particularly transparent for discrete data. Given a finite set of sample points

{x_{s}}_{s = 1}^{N}

, define

\begin{matrix} ℓ = (ln p (x_{1}), \dots, ln p (x_{N})) \end{matrix}

(45)

so that the probability distribution is represented as a finite-dimensional vector. In this representation, differences between distributions are evaluated as differences between these vectors, and statistical comparisons are implemented as geometry in Euclidean space. Thus, the log-density representation provides a natural coordinate system that connects the information geometry of distribution space with the geometry of observational data.
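As a minimal numerical sketch of this finite-dimensional representation, consider two nearby Gaussian distributions evaluated on a common set of sample points drawn from the first one. The distributions, the sample size, and the helper name `log_normal_pdf` are assumptions made for illustration. When the sample points are drawn from p, the squared Euclidean distance between the two log-density vectors, divided by 2 N, approximates the KL divergence between the distributions.

```python
import numpy as np

def log_normal_pdf(x, mean, sigma):
    """Log-density of a one-dimensional normal distribution."""
    return -0.5 * ((x - mean) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
N = 100_000
x = rng.normal(0.0, 1.0, size=N)             # sample points x_s drawn from p = N(0, 1)

ell_p = log_normal_pdf(x, 0.0, 1.0)          # log-density vector of p, Equation (45)
ell_q = log_normal_pdf(x, 0.05, 1.0)         # same points, nearby q = N(0.05, 1)

sq_dist = float(np.sum((ell_p - ell_q) ** 2))   # squared Euclidean distance
kl_estimate = sq_dist / (2 * N)                 # second-order estimate of KL(p || q)
kl_exact = 0.05 ** 2 / 2.0                      # closed form for equal-variance Gaussians

print(kl_estimate, kl_exact)
```

The agreement holds only for nearby distributions; for widely separated distributions the quadratic approximation is no longer adequate.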

This viewpoint is directly consistent with the structure of entropic coupling introduced above. Indeed, the coupling satisfies

\begin{matrix} ln π (x, y) = α (x) + β (y) - \frac{c (x, y)}{ε}, \end{matrix}

(46)

which exhibits an additive decomposition in log-density space. Therefore, the exponential structure in entropic optimal transport can be understood as a linear structure in the log-density space. This implies that transport geometry and information geometry are unified through the common coordinate system of log-density.

An additional important aspect is that this log-density representation is compatible with observational operations. Galaxy catalogs given as finite samples can be interpreted as evaluation points of an underlying probability distribution, and their statistical properties can thus be described as finite-dimensional vectors such as ℓ. In this sense, the log-density embedding provides a natural mapping between theoretical distributions and observational data. Therefore, the structure of entropic optimal transport developed here is not merely an abstract geometry but provides a geometric framework that can be implemented directly in correspondence with observational data.

3.6. Interpretation for Cosmological Generation

The discussion above allows the cosmological generation chain introduced in Section 1 to be reinterpreted geometrically. The key observation is that transport and information are not competing descriptions, but complementary aspects of the same generative process.

From a cosmological perspective, this dual-affine structure admits a natural physical interpretation. Gravitational evolution corresponds to a generative deformation from the initial density field to the late-time density field, and it is associated with e-geometry as an exponential family of transformations. In contrast, the finite catalog obtained after galaxy formation is intrinsically the result of sampling and averaging, and it corresponds to m-geometry as the formation of a mixed state. In particular, the observed catalog

μ_{N}

is a finite-sample measure whose generation is related to convex mixtures of couplings and coarse graining, while the exponential representation of the entropic coupling captures the generative deformation driven by gravitational evolution.

Returning to the generation chain in Equation (1), the preceding results show that its deterministic and stochastic components correspond respectively to transport and information-geometric directions. The early stage

ρ_{lin} \to λ_{gal}

is governed by generative models constructing the distribution; here, the natural interpolation is given by linear interpolation in log-density or natural parameters, and e-geometry is dominant. In the final stage

λ_{gal} \to μ_{N}

, the distribution is averaged and projected through finite sampling and the observational window; here, convex combinations are natural and m-geometry is dominant. Consequently, the entire process is endowed with the dual-affine structure

\begin{matrix} \text{generative evolution} + \text{observational mixing} . \end{matrix}

(47)

Incorporating the stochastic transport structure introduced in the previous subsection further refines this picture. Even at the level of the continuous density field, the generative process is described by the drift–diffusion dynamics

\begin{matrix} \frac{\partial ρ}{\partial t} = - \nabla \cdot (ρ v) + ε Δ ρ . \end{matrix}

(48)

The first term corresponds to e-geometric generative deformation, namely deterministic gravitational transport, while the second term already introduces an m-type mixing effect at the density level.

Thus, the separation between e-geometry and m-geometry is not strictly tied to different stages of the generative chain but represents two interacting aspects of a unified stochastic process. The deterministic component corresponds to transport geodesics, whereas the diffusion term induces stochastic mixing. Consequently, m-geometry appears both intrinsically through stochastic evolution and externally through observational projection.

In this sense, entropic optimal transport is not merely an interpolation between the Wasserstein distance and the Kullback–Leibler divergence. Rather, it reveals the e-geometry and m-geometry inherent in the cosmological generative process and provides a fundamental geometric framework for treating transport and information simultaneously. From this perspective, dual-affine cosmology is not introduced as an abstract mathematical reformulation. Rather, it emerges naturally once transport, stochasticity, and observation are treated within a common probabilistic framework.

Entropic optimal transport therefore serves as a local model for implementing dual-affine cosmology and provides the concrete prototype of the transport–information manifold to be introduced in the following sections. Based on this structure, we define in the next section the transport–information manifold of cosmological state space and study the geometric dynamics on it.

3.7. Illustrative Gaussian Transport Example

To illustrate the operational meaning of transport–information geometry, we consider a simple one-dimensional Gaussian transport problem. The purpose of this example is not to model realistic cosmological structure formation but rather to demonstrate explicitly how transport and information components appear within a computable setting. A more cosmologically relevant toy model is presented later in Section 8. Let

\begin{matrix} ρ_{0} (x) = \frac{1}{\sqrt{2 π} σ_{0}} exp (- \frac{x^{2}}{2 σ_{0}^{2}}) \end{matrix}

(49)

and

\begin{matrix} ρ_{1} (x) = \frac{1}{\sqrt{2 π} σ_{1}} exp (- \frac{{(x - d)}^{2}}{2 σ_{1}^{2}}) \end{matrix}

(50)

represent the initial and final states. The parameter d specifies a coherent displacement of the density distribution, while the change in variance represents a deformation of the density profile.

In the transport-dominated limit

ε \to 0

, the interpolation follows the deterministic optimal-transport geodesic connecting the two distributions. For finite values of

ε

, the entropy regularization introduces an additional diffusion component and the intermediate distributions become broader. The resulting evolution may be interpreted as the competition between coherent mass transport and stochastic information diffusion.
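The transport-dominated limit of this example admits a closed form that is easy to reproduce. The sketch below uses the standard result that, in one dimension, the quadratic Wasserstein geodesic between two Gaussians is Gaussian with linearly interpolated mean and standard deviation, and that the squared distance is d^{2} + (σ_{1} - σ_{0})^{2}. The numerical values of σ_{0}, σ_{1}, and d and the function names are illustrative assumptions. The finite-ε broadening is not reproduced here.

```python
import numpy as np

sigma0, sigma1, d = 1.0, 0.5, 3.0     # illustrative values, not taken from the paper

def geodesic_params(t):
    """Mean and standard deviation along the W2 geodesic between the two Gaussians."""
    return t * d, (1.0 - t) * sigma0 + t * sigma1

def w2_squared(m_a, s_a, m_b, s_b):
    """Closed-form squared W2 distance between N(m_a, s_a^2) and N(m_b, s_b^2) in 1D."""
    return (m_a - m_b) ** 2 + (s_a - s_b) ** 2

# Monotone transport map x -> d + (sigma1 / sigma0) x applied to samples of rho_0.
rng = np.random.default_rng(0)
z = rng.normal(size=200_000)
x0 = sigma0 * z
x1 = d + (sigma1 / sigma0) * x0

t = 0.25
x_t = (1.0 - t) * x0 + t * x1          # displacement interpolation of the samples
m_t, s_t = geodesic_params(t)

w2_sq_total = w2_squared(0.0, sigma0, d, sigma1)
w2_sq_partial = w2_squared(0.0, sigma0, m_t, s_t)

print(m_t, s_t, x_t.mean(), x_t.std())
print(w2_sq_total, np.mean((x1 - x0) ** 2))
```

The partial distance satisfies W_{2} (ρ_{0}, ρ_{t}) = t W_{2} (ρ_{0}, ρ_{1}), which is the constant-speed property of the geodesic.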

Figure 2 presents this behavior together with its dual-affine interpretation. Here,

W_{2}

denotes the quadratic Wasserstein distance. The upper row shows the explicit density evolution, while the lower schematic summarizes the correspondence between the transport sector governed by Wasserstein geometry and the information sector governed by KL/Fisher geometry. The central entropic OT/Schrödinger bridge component acts as the bridge connecting these two descriptions. Although intentionally simple, this example demonstrates that the proposed transport–information framework yields concrete computable trajectories and observable density deformations. In this sense, the framework is not merely a geometric reformulation but provides an operational description of the interplay between transport and information in stochastic generative processes.

4. Transport–Information Manifold of Cosmological States

In the previous sections, we have seen that transport geometry and information geometry are unified within a single framework through entropic optimal transport. In this section, we apply this viewpoint to cosmological generative processes and construct a geometric state space that simultaneously incorporates mass transport and statistical generation. We refer to this space as the transport–information manifold.

The construction proposed here should be viewed as an effective geometric description of cosmological generation rather than a complete axiomatic formulation. The aim is to identify the minimal geometric structure required to describe transport, probabilistic generation, and observational projection within a unified framework. Questions concerning the full infinite-dimensional differential-geometric and functional-analytic foundations of the resulting state space remain important subjects for future investigation. In this sense, the manifold defined in this section is based on the geometry of stochastic processes arising from the Schrödinger bridge formulation developed in the previous section. Throughout this paper,

θ

collectively denotes the parameters governing galaxy formation, bias, selection, and observational generation.

4.1. Cosmological State Space

The cosmological generative process expressed in Equation (1) involves two distinct degrees of freedom. The first is the geometric rearrangement of the mass distribution, and the second is the deformation of the probabilistic generative model. The former is described by transport geometry on the space of probability measures

P_{2} (R^{3})

, while the latter is described by information geometry on a statistical manifold

M_{cosmo}

. This naturally leads to a representation of cosmological states as pairs of transport and generative degrees of freedom. Therefore, the cosmological state space is defined as their direct product,

\begin{matrix} G_{cosmo} = P_{2} (R^{3}) \times M_{cosmo} . \end{matrix}

(51)

Equation (51) should be interpreted as an effective product structure. The Wasserstein component represents the evolving matter distribution, while the information-geometric component represents the family of generative and observational models defined on top of it. The direct-product construction provides the simplest state space capable of accommodating both aspects simultaneously.

A point in this space is denoted by

(ρ, θ)

, where

ρ

represents the mass density distribution and

θ

denotes the parameters of the galaxy formation model. This definition implies that cosmological states are not described by a single density field or a single statistical model but rather as a pair consisting of a transport state and a generative state. Specifically,

ρ

represents the matter distribution rearranged by gravitational evolution, while

θ

represents the statistical degrees of freedom characterizing galaxy formation, bias, selection effects, and the observational probabilities defined on top of the matter distribution. In this sense,

G_{cosmo}

is the minimal state space that simultaneously accommodates the physical and statistical aspects of cosmological generative processes.

Moreover, this product structure is naturally consistent with the dual-affine structure discussed in the previous section. Transport on

P_{2} (R^{3})

corresponds to generative deformation and is associated with e-geometry, while the family of probability distributions on

M_{cosmo}

carries m-geometry corresponding to mixing and averaging. Thus, a cosmological state can be understood as a point possessing degrees of freedom at two levels: spatial configuration and statistical generation. Accordingly,

G_{cosmo}

is not merely a product space but serves as the fundamental stage on which the interaction between transport and information is realized.

This construction is also natural in relation to observational data. Observed galaxy catalogs appear on the

ρ

side as finite-sample measures, while their generative probabilities are characterized by

θ

. Therefore, actual observations are given as projections of

(ρ, θ)

, and the geometric structure on this space is directly reflected in the properties of observables. In this sense,

G_{cosmo}

is both a theoretical and an observational state space. Observational data may be viewed as the image of a projection operator,

\begin{matrix} Π_{m} : G_{cosmo} \to D, \end{matrix}

(52)

where

D

denotes the space of finite observational catalogs. The observed catalog is then represented as

\begin{matrix} μ_{N} = Π_{m} (ρ, θ) . \end{matrix}

(53)

This viewpoint also clarifies the relationship between theoretical cosmology and observational cosmology. Theoretical evolution acts primarily on the transport component

ρ

, whereas observational inference acts primarily on the statistical component

θ

. The observed catalog is therefore naturally interpreted as a projection from the transport–information manifold to a finite-dimensional observational representation.
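A toy version of the projection in Equations (52) and (53) can be sketched as finite sampling followed by selection. Everything specific in the code below is hypothetical: the one-dimensional Gaussian density standing in for ρ, the two-component parameter θ (a survey window half-width and a completeness fraction), and the function name `project_to_catalog`. It is intended only to show the structure of the map from (ρ, θ) to a finite catalog, not the selection model of any actual survey.

```python
import numpy as np

def project_to_catalog(sample_rho, selection_prob, n_draws, seed=0):
    """Toy projection: draw points from rho, then keep each with a selection probability."""
    rng = np.random.default_rng(seed)
    x = sample_rho(rng, n_draws)                    # finite sampling of the density
    keep = rng.random(n_draws) < selection_prob(x)  # survey selection
    return x[keep]

# Hypothetical state (rho, theta): rho = N(0, 1), theta = (window half-width, completeness).
theta_window, theta_completeness = 1.0, 0.8

def sample_rho(rng, n):
    return rng.normal(0.0, 1.0, size=n)

def selection_prob(x):
    return np.where(np.abs(x) < theta_window, theta_completeness, 0.0)

n_draws = 100_000
catalog = project_to_catalog(sample_rho, selection_prob, n_draws)
fraction = catalog.size / n_draws

print(catalog.size, fraction)
```

In this sketch the retained fraction is close to the completeness multiplied by the probability mass of ρ inside the window, so the catalog carries information about both components of the state.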

4.2. Metric Structure

Having defined the cosmological state space, the next step is to specify a geometric notion of distance. The metric should quantify both the cost of rearranging matter distributions and the cost of modifying the underlying generative model. The natural choice is therefore a combination of the Wasserstein metric and the Fisher metric.

The geometry of the transport component is given by the Wasserstein metric. In the formulation of Otto [14], the norm of a tangent vector

v

at a density

ρ

is defined as

\begin{matrix} {∥ v ∥}_{ρ}^{2} = \int ρ (x) {| v (x) |}^{2} d x . \end{matrix}

(54)

This provides a Riemannian formulation of the space of probability measures and represents the kinetic energy associated with the rearrangement of mass distributions. On the other hand, the statistical manifold

M_{cosmo}

is equipped with the Fisher metric

\begin{matrix} g_{i j} (θ) = E_{θ} [\partial_{i} ln p (x | θ) \partial_{j} ln p (x | θ)] . \end{matrix}

(55)

The Fisher metric measures the local distinguishability of a family of probability distributions and plays a fundamental role in information geometry (see, e.g., [5,7]). Therefore, a natural metric on the transport–information manifold is given by

\begin{matrix} ∥ (\dot{ρ}, \dot{θ}) ∥^{2} = \int ρ {| v |}^{2} d x + g_{i j} (θ) {\dot{θ}}^{i} {\dot{θ}}^{j} . \end{matrix}

(56)
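The two contributions to Equation (56) can be evaluated in a low-dimensional toy setting. The sketch below assumes a one-dimensional density ρ = N(0, 1) with a uniform velocity field, and uses a Gaussian family p (x | θ) with θ = (μ, σ) as a stand-in for the statistical model; these choices and the function names are hypothetical and serve only to illustrate the formula. The Fisher metric of Equation (55) is computed by numerical quadrature and can be compared with the known closed form diag (1 / σ^{2}, 2 / σ^{2}) for this family.

```python
import numpy as np

def fisher_gaussian_numeric(mu, sigma, n_grid=20001, half_width=12.0):
    """Quadrature estimate of Equation (55) for p(x | theta) = N(mu, sigma^2), theta = (mu, sigma)."""
    x = np.linspace(mu - half_width * sigma, mu + half_width * sigma, n_grid)
    dx = x[1] - x[0]
    p = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    score = np.stack([
        (x - mu) / sigma ** 2,                        # d ln p / d mu
        ((x - mu) ** 2 - sigma ** 2) / sigma ** 3,    # d ln p / d sigma
    ])
    return (score[:, None, :] * score[None, :, :] * p).sum(axis=-1) * dx

def squared_norm(rho, v, dx, g, theta_dot):
    """Equation (56): transport kinetic term plus Fisher quadratic form."""
    transport = float(np.sum(rho * v ** 2) * dx)
    information = float(theta_dot @ g @ theta_dot)
    return transport + information

# Information sector: assumed Gaussian model with sigma = 2.
g = fisher_gaussian_numeric(mu=0.0, sigma=2.0)

# Transport sector: assumed density N(0, 1) moved by a uniform velocity v = 0.3.
x = np.linspace(-12.0, 12.0, 20001)
dx = x[1] - x[0]
rho = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)
v = np.full_like(x, 0.3)

theta_dot = np.array([0.1, 0.2])
total = squared_norm(rho, v, dx, g, theta_dot)
print(g)
print(total)
```

Because the metric is a direct sum, the two terms are computed independently and simply added, which is the sense in which the product metric provides the simplest coupling between the transport and information sectors.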

The direct-product structure induces a corresponding decomposition of the tangent space,

\begin{matrix} T_{(ρ, θ)} G_{cosmo} = T_{ρ} P_{2} (R^{3}) \oplus T_{θ} M_{cosmo} . \end{matrix}

(57)

A tangent vector therefore takes the form

\begin{matrix} (\dot{ρ}, \dot{θ}), \end{matrix}

(58)

combining a transport deformation of the density field with a deformation of the generative model. The metric introduced above acts independently on these two sectors and provides the simplest geometric coupling between transport and information. The corresponding geodesic length functional is

\begin{matrix} L = \int \sqrt{\int ρ {| v |}^{2} d x + g_{i j} (θ) {\dot{θ}}^{i} {\dot{θ}}^{j}} d t . \end{matrix}

(59)

For the direct-product metric, geodesics decompose into Wasserstein geodesics on

P_{2} (R^{3})

and Fisher geodesics on

M_{cosmo} .

In Equation (56),

v

is the transport velocity field satisfying the continuity equation

\begin{matrix} \frac{\partial ρ}{\partial t} + \nabla \cdot (ρ v) = 0 . \end{matrix}

(60)

The appearance of the continuity equation is important because it establishes an explicit connection with standard cosmological dynamics. In the limit of pressureless matter and potential flow, Equation (60) coincides with the kinematic equation underlying the Zel’dovich approximation and large-scale cold-dark-matter evolution. Thus, the transport component of the transport–information manifold reproduces the standard transport picture of structure formation.

Mathematically, the metric combines an infinite-dimensional Wasserstein component with a finite-dimensional Fisher component. The present work adopts this structure as an effective geometric model motivated by the cosmological generative process. A rigorous treatment of the resulting infinite-dimensional product geometry, including questions of completeness, geodesic existence, and curvature, lies beyond the scope of the present paper and will be addressed elsewhere.

Furthermore, this metric structure is consistent with the entropic optimal transport introduced in the previous section. The Wasserstein component provides the transport cost, while the Fisher component provides the information distance, and the two are coupled within a single variational principle under entropy regularization. From this perspective, the above metric can be understood as the local limit of that variational structure. Therefore, the transport–information manifold realizes the geometric structure of entropic optimal transport at the level of the state space.

4.3. Dual-Affine Connections

The statistical manifold

M_{cosmo}

possesses a dual-affine structure. That is, there exist the e-connection

\nabla^{(e)}

and the m-connection

\nabla^{(m)}

. As seen in Section 2, for exponential families the natural parameters

θ

serve as e-affine coordinates, while expectation parameters serve as m-affine coordinates. This dually flat structure provides a unified framework that geometrically distinguishes and relates two fundamental operations: generative deformation and mixing deformation [5,7]. In this setting, deformations along the e-connection are given by linear transformations of the log-density while deformations along the m-connection are given by convex combinations of distributions. Thus, the dual-affine structure provides a framework that locally linearizes the two operations of generation and averaging of distributions. The e-connection corresponds to generative deformations whereas the m-connection corresponds to observational mixing and projection.

A similar dual structure also appears in the space of couplings. As seen in Section 3, the entropic coupling

\begin{matrix} π (x, y) = exp (α (x) + β (y) - \frac{c (x, y)}{ε}) \end{matrix}

(61)

is e-flat as an exponential family, while the full space of couplings is m-flat as a probability simplex. Therefore, the transport–information manifold has the structure

\begin{matrix} (P_{2}, g_{W}) \times (M_{cosmo}, g_{F}, \nabla^{(e)}, \nabla^{(m)}) . \end{matrix}

(62)

The important point here is that the dual-affine structure is not only an internal structure of the statistical manifold but also appears within the transport problem through entropic coupling. As a result, mass transport and probabilistic generation are not merely two juxtaposed theories but can be understood as different manifestations of the same underlying dual geometric principle.
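The exponential-family form of the entropic coupling in Equation (61) can be checked numerically on a small discrete problem. The sketch below uses the standard Sinkhorn iteration as an illustrative solver; the marginals, the quadratic cost on three points, and the value of ε are hypothetical, and the text does not prescribe this particular algorithm.

```python
import math

def sinkhorn(mu, nu, cost, eps, n_iter=500):
    """Return the entropic coupling pi_ij = exp(alpha_i + beta_j - c_ij / eps)."""
    n, m = len(mu), len(nu)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [mu[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    alpha = [math.log(ui) for ui in u]
    beta = [math.log(vj) for vj in v]
    pi = [[math.exp(alpha[i] + beta[j] - cost[i][j] / eps) for j in range(m)]
          for i in range(n)]
    return pi, alpha, beta

# Hypothetical marginals on three points with quadratic cost.
xs = [0.0, 1.0, 2.0]
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
cost = [[(x - y) ** 2 for y in xs] for x in xs]
pi, alpha, beta = sinkhorn(mu, nu, cost, eps=1.0)
row_sums = [sum(row) for row in pi]
col_sums = [sum(pi[i][j] for i in range(3)) for j in range(3)]
```

The marginal constraints are linear in the coupling (the m-flat side), while the solution is log-linear in the potentials α and β (the e-flat side); in particular, log cross-ratios of the coupling depend only on the cost and on ε.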

Furthermore, this structure is consistent with the log-density representation introduced in the previous section. The log-density naturally appears as an e-coordinate, and comparisons between distributions are described in a Euclidean manner in this coordinate system. On the other hand, averaging and finite sampling in observations are implemented as m-structures. Thus, the dual-affine structure plays the role of connecting theoretical generative processes and observational projections within a unified geometric language. In this sense, the transport–information manifold extends the dually flat structure of information geometry to the cosmological state space.

The formulation should be understood locally in the same sense as standard information geometry. The e- and m-connections provide local affine descriptions of generative and mixture directions, while the full cosmological dynamics may involve nonlinear curvature, singularities, and observational degeneracies. This local geometric viewpoint is sufficient for the present purpose, because our aim is to identify the differential-geometric structure underlying transport, generation, and projection rather than to construct a complete global atlas of the full cosmological state space.

4.4. Geometric Trajectories and Zel’dovich Constraint Manifold

The cosmological generative process can be described as a curve on the transport–information manifold

\begin{matrix} t \mapsto (ρ_{t}, θ_{t}) . \end{matrix}

(63)

The transport component is governed by the continuity Equation (60). On the other hand, the parameter evolution is described as a deformation of the generative model

\begin{matrix} {\dot{θ}}^{i} = V^{i} (θ) . \end{matrix}

(64)

Therefore, the cosmological generative process can be understood as a geometric trajectory of

(ρ_{t}, θ_{t})

. In this setting, the transport and information components are not independent but form coupled dynamics. That is, changes in the mass distribution affect the generative model, and deformations of the generative model alter the structure of the observed distribution. This interaction is the source of nonlinearity in cosmological structure formation.

This formulation is designed to include the standard description of structure formation as a limiting case. In the pressureless single-stream regime, the matter distribution satisfies the continuity equation together with gravitational acceleration generated by the Newtonian potential. In co-moving coordinates, the potential is related to the density contrast by the Poisson equation

\begin{matrix} \nabla^{2} Φ = 4 π G a^{2} {\bar{ρ}}_{m} δ \end{matrix}

(65)

up to conventional normalization. Thus, the transport component of the present framework is anchored in the usual Newtonian limit of cosmological perturbation theory.

In the early stage of structure formation, the Zel’dovich approximation

\begin{matrix} x (q, t) = q + D (t) \nabla Φ (q) \end{matrix}

(66)

holds [15]. Here, q denotes the Lagrangian coordinate,

D (t)

is the linear growth factor, and

Φ (q)

is the initial displacement potential. In this case, the velocity field is given by

\begin{matrix} v (x, t) = \dot{D} (t) \nabla Φ (q) . \end{matrix}

(67)

Thus, the transport is restricted to potential flow

\begin{matrix} v = \nabla ϕ . \end{matrix}

(68)

This interpretation is closely related to the geometric viewpoint of classical mechanics developed by Arnold [16], in which physical evolution is represented as geodesic motion on a configuration manifold. In the present case, the configuration space is replaced by the Wasserstein space of probability measures, and the Zel’dovich approximation appears as a particular constrained geodesic flow.

The density evolution induced by Equation (66) is determined by mass conservation,

\begin{matrix} ρ (x, t) d^{3} x = ρ_{0} (q) d^{3} q, \end{matrix}

(69)

or equivalently

\begin{matrix} ρ (x, t) = \frac{ρ_{0} (q)}{det (\partial x / \partial q)} . \end{matrix}

(70)

For the Zel’dovich map, the Jacobian is

\begin{matrix} det (\frac{\partial x}{\partial q}) = det [δ_{i j} + D (t) \partial_{i} \partial_{j} Φ (q)] . \end{matrix}

(71)

In the linear regime, this gives

\begin{matrix} δ (x, t) ≃ - D (t) \nabla^{2} Φ (q), \end{matrix}

(72)

which is the standard linear relation between the displacement potential and the density contrast. Therefore, the Zel’dovich approximation appears in the present framework as the potential-flow, mass-conserving limit of the transport sector.
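The relation between Equations (70)–(72) can be made concrete in one dimension. The sketch below assumes a uniform initial density and a hypothetical displacement potential Φ(q) = A cos(kq)/k², neither of which is specified in the text, and compares the exact density contrast from the Jacobian with its linearized form.

```python
import math

def zeldovich_density_contrast(q, growth, amplitude=1.0, k=1.0):
    """Exact and linearized density contrast for a 1D Zel'dovich map.

    Assumes the hypothetical potential Phi(q) = amplitude * cos(k q) / k**2,
    so that Phi''(q) = -amplitude * cos(k q), and a uniform initial density.
    """
    phi_second = -amplitude * math.cos(k * q)
    jacobian = 1.0 + growth * phi_second      # Eq. (71) in one dimension
    exact = 1.0 / jacobian - 1.0              # Eq. (70) with constant rho_0
    linear = -growth * phi_second             # Eq. (72)
    return exact, linear, jacobian

# Small growth factor: the two expressions agree to second order in D.
exact_small, linear_small, jac_small = zeldovich_density_contrast(0.0, 0.01)
# Larger growth factor: the linear relation underestimates the overdensity.
exact_large, linear_large, jac_large = zeldovich_density_contrast(0.0, 0.5)
```

The Jacobian must remain positive for this single-stream expression to apply; for this potential it vanishes at q = 0 when D·A = 1, which marks shell crossing.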

Under this constraint, the Wasserstein action

\begin{matrix} \int_{0}^{1} \int ρ {| v |}^{2} d x d t \end{matrix}

(73)

becomes a variational problem on the space of potentials. Therefore, the Zel’dovich approximation can be understood as motion on a constrained submanifold

\begin{matrix} Z \subset G_{cosmo} \end{matrix}

(74)

within the transport–information manifold. This interpretation is also closely related to transport-based reconstruction approaches in cosmology [4]. In such approaches, the large-scale matter distribution is reconstructed by identifying optimal transport maps connecting the primordial and observed states. The present framework extends this viewpoint by embedding the transport dynamics into the larger transport–information manifold and by incorporating probabilistic generative degrees of freedom. This interpretation is important, because the Zel’dovich approximation is not merely an approximate solution but defines a particular geodesic submanifold in transport geometry.
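The geodesic character of the Zel'dovich flow can be illustrated in one dimension, where the optimal map for quadratic cost is the monotone rearrangement. The sketch below assumes a hypothetical potential Φ(q) = A cos(q), equal-mass particles, and a time variable rescaled so that D grows linearly from 0 to its final value over t ∈ [0, 1]; under these assumptions the kinetic action of Equation (73) along the Zel'dovich path coincides with the squared Wasserstein distance between the initial and final particle sets, as long as shell crossing has not occurred.

```python
import math

def w2_squared_1d(xs, ys):
    """Squared 2-Wasserstein distance between two equal-weight 1D particle sets."""
    xs, ys = sorted(xs), sorted(ys)
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def zeldovich_positions(qs, growth, amplitude=1.0):
    """x = q + D * dPhi/dq for the hypothetical potential Phi(q) = amplitude * cos(q)."""
    return [q - growth * amplitude * math.sin(q) for q in qs]

n = 200
qs = [2.0 * math.pi * (i + 0.5) / n for i in range(n)]
d_final = 0.5                      # below shell crossing (D * amplitude < 1)
xs = zeldovich_positions(qs, d_final)

# Action of Eq. (73) along D(t) = d_final * t: every particle moves with the
# constant velocity d_final * dPhi/dq, so the time integral is trivial.
action = sum((d_final * math.sin(q)) ** 2 for q in qs) / n
distance = w2_squared_1d(qs, xs)
```

After shell crossing the map is no longer monotone, the sorted matching differs from the Lagrangian labeling, and the two quantities separate.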

More explicitly, the submanifold

Z

may be characterized by the restrictions

\begin{matrix} Z = \{(ρ, θ) \in G_{cosmo} | x (q, t) = q + D (t) \nabla Φ (q), \nabla^{2} Φ \propto δ_{lin}\} . \end{matrix}

(75)

The first condition imposes potential transport, while the second identifies the displacement potential with the initial density contrast through the Poisson equation. Thus,

Z

represents the sector of the transport–information manifold corresponding to standard linear structure formation.

Furthermore, this constraint also has implications for the information-geometric degrees of freedom. That is, under the constraint of potential flow, the admissible transport paths are restricted, and at the same time the corresponding deformations of the generative model are also constrained. In this sense,

Z

is not merely a constraint on transport but functions as a geometric constraint on the entire transport–information manifold.

For example, if the growth factor depends on cosmological parameters,

D = D (t; θ)

, then variations in

θ

alter the admissible transport path through Equation (66). Conversely, the observed deformation of the density field constrains the allowed values of the generative parameters. This provides a concrete mechanism by which transport geometry and information geometry become coupled even in the linear or quasi-linear regime.

Thus, cosmological structure formation can be understood as a geometric motion on the transport–information manifold, with the Zel’dovich constraint manifold appearing as its low-order approximation. The geometric role of transport dynamics is closely related to the variational formulation of diffusion processes developed by Jordan, Kinderlehrer, and Otto [13]. From this perspective, deterministic transport and stochastic evolution can both be understood as geometric flows on the space of probability measures. The standard continuity equation, the Poisson equation, and the Zel’dovich approximation are therefore not external to the present framework but arise as limiting structures within the transport sector of the transport–information manifold.

Beyond this constrained regime, nonlinear evolution, shell crossing, galaxy formation stochasticity, and observational projection move the system away from

Z

. The role of the full transport–information manifold is precisely to describe such departures from the deterministic linear transport picture by adding information-geometric and stochastic degrees of freedom. In this sense, the present framework extends the standard perturbative description without replacing it.

In the next section, we introduce an action principle and dynamics naturally defined on this state space, and we describe cosmological evolution as coupled dynamics of transport geometry and information geometry.

5. Cosmological Dynamics on the Transport–Information Manifold

In the previous sections we defined the cosmological state space as the transport–information manifold

\begin{matrix} G_{cosmo} = P_{2} (R^{3}) \times M_{cosmo} . \end{matrix}

(76)

The first factor carries the Wasserstein transport geometry of mass distributions, while the second carries the information geometry of the generative model. What remains is to specify the natural dynamics on this manifold. In this section we introduce an action principle governing cosmological evolution and clarify its variational, Hamiltonian, curvature, and gradient-flow structures. This formulation shows that structure formation is not merely the time evolution of a density field but also geometric dynamics on the transport–information manifold, where mass transport and statistical generation interact as coupled geometric processes.

5.1. Transport–Information Action

A tangent vector on

G_{cosmo}

is given by

(\dot{ρ}, \dot{θ})

, where

\dot{ρ}

corresponds to a transport velocity field

v (x, t)

satisfying the continuity equation and

\dot{θ}

is a tangent vector on the statistical manifold

M_{cosmo}

. The Riemannian metric on the product manifold is the direct sum

\begin{matrix} G = g_{W} + λ g_{F}, \end{matrix}

(77)

with

λ > 0

a constant controlling the relative weight between transport and information contributions. The squared norm of the tangent vector is then

\begin{matrix} ∥ (\dot{ρ}, \dot{θ}) ∥_{G}^{2} = \int ρ (x) {| v (x, t) |}^{2} d^{3} x + λ g_{i j} (θ) {\dot{θ}}^{i} {\dot{θ}}^{j} . \end{matrix}

(78)

The first term is the kinetic energy associated with mass rearrangement (Wasserstein geometry), while the second is the informational kinetic energy associated with deformation of the generative model (Fisher geometry). The present formulation therefore extends rather than replaces the conventional transport description of structure formation.

The natural action for the motion of cosmological states is therefore

\begin{matrix} A [(ρ, θ)] = \frac{1}{2} \int_{t_{0}}^{t_{1}} [\int ρ (x, t) {| v (x, t) |}^{2} d^{3} x + λ g_{i j} (θ (t)) {\dot{θ}}^{i} (t) {\dot{θ}}^{j} (t)] d t . \end{matrix}

(79)

Cosmological evolution is described by an extremal curve of this action. This construction is consistent with the entropic optimal transport framework of Section 3: the transport term recovers the kinetic energy of the Benamou–Brenier formulation, while the information term corresponds to the Fisher-metric distance. The parameter

λ

thus plays the role of a geometric coupling constant between mass transport and statistical generation.

The role of

λ

is distinct from that of the entropic parameter

ε

introduced in Section 3. While

ε

controls the strength of stochastic diffusion or mixing in the transport process,

λ

controls the relative geometric weight assigned to deformations of the statistical model. Thus,

ε

measures stochasticity in the transport process, whereas

λ

measures the coupling between transport geometry and information geometry at the level of the state-space metric.

5.2. Hamiltonian Formulation

The action can be cast in Hamiltonian form. Introducing the velocity potential

ϕ

via

v = \nabla ϕ

, the transport-sector Hamiltonian is

\begin{matrix} H_{W} [ρ, ϕ] = \frac{1}{2} \int ρ (x) {| \nabla ϕ (x) |}^{2} d^{3} x . \end{matrix}

(80)

The transport-sector Hamiltonian recovers the kinetic part of pressureless matter evolution in the Newtonian limit.

Adding the information-geometric sector yields the full transport–information Hamiltonian

\begin{matrix} H_{T I} [ρ, ϕ, θ, π_{θ}] = \frac{1}{2} \int ρ (x) {| \nabla ϕ (x) |}^{2} d^{3} x + \frac{1}{2 λ} g^{i j} (θ) π_{i} π_{j}, \end{matrix}

(81)

where

π_{i} = λ g_{i j} (θ) {\dot{θ}}^{j}

is the momentum conjugate to

θ^{i}

.

The canonical equations separate into transport and information sectors:

\begin{matrix} \frac{\partial ρ}{\partial t} & = - \nabla \cdot (ρ \nabla ϕ), \end{matrix}

(82)

\begin{matrix} \frac{\partial ϕ}{\partial t} & = - \frac{1}{2} {| \nabla ϕ |}^{2} \end{matrix}

(83)

and

\begin{matrix} {\dot{θ}}^{i} & = \frac{1}{λ} g^{i j} π_{j}, \end{matrix}

(84)

\begin{matrix} {\dot{π}}_{i} & = - \frac{1}{2 λ} \partial_{i} g^{j k} (θ) π_{j} π_{k}, \end{matrix}

(85)

which is equivalent to the geodesic equation on the Fisher manifold

\begin{matrix} {\ddot{θ}}^{k} + Γ_{i j}^{k} (θ) {\dot{θ}}^{i} {\dot{θ}}^{j} = 0 . \end{matrix}

(86)

Thus, cosmological evolution on

G_{cosmo}

is the simultaneous evolution of Wasserstein transport dynamics and Fisher geodesic dynamics. Because the Hamiltonian is a direct sum, the two sectors decouple at this level; they become coupled once quantities in the transport sector depend on θ, as with the growth factor D(t; θ) discussed in Section 4.4. This Hamiltonian structure is also consistent with the stochastic description in Section 3: the transport part recovers the kinetic term of the Schrödinger bridge, while the information sector supplies the additional statistical degrees of freedom.

The resulting picture may therefore be interpreted as a geometric extension of standard cosmological dynamics. The Wasserstein sector recovers the transport of matter, while the information sector introduces additional degrees of freedom associated with galaxy formation, bias, stochasticity, and observational uncertainty.
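To illustrate the information-sector dynamics, the geodesic Equation (86) can be integrated for a simple statistical model. The sketch below assumes, purely for illustration, that the generative model is the one-dimensional Gaussian family N(μ, σ²), whose Fisher metric in the coordinates (μ, σ) is diag(1/σ², 2/σ²); the cosmological manifold of the text is not specified at this level of detail. The Christoffel symbols are written out by hand, and a classical fourth-order Runge–Kutta step is used as the integrator.

```python
import math

def geodesic_rhs(state):
    """Right-hand side of Eq. (86) for the Gaussian family N(mu, sigma^2).

    Fisher metric in coordinates (mu, sigma): g = diag(1 / sigma^2, 2 / sigma^2).
    """
    mu, sigma, v_mu, v_sigma = state
    a_mu = 2.0 * v_mu * v_sigma / sigma
    a_sigma = -v_mu ** 2 / (2.0 * sigma) + v_sigma ** 2 / sigma
    return (v_mu, v_sigma, a_mu, a_sigma)

def rk4_step(state, dt):
    """One classical Runge-Kutta step for the first-order system above."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(state, k1, 0.5 * dt))
    k3 = geodesic_rhs(shifted(state, k2, 0.5 * dt))
    k4 = geodesic_rhs(shifted(state, k3, dt))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def information_energy(state, lam=1.0):
    """(lam / 2) * g_ij * dtheta^i * dtheta^j, the Fisher term of the action (79)."""
    _, sigma, v_mu, v_sigma = state
    return 0.5 * lam * (v_mu ** 2 + 2.0 * v_sigma ** 2) / sigma ** 2

# Start at (mu, sigma) = (0, 1) with unit velocity in the mu direction.
state = (0.0, 1.0, 1.0, 0.0)
energy_start = information_energy(state)
for _ in range(1000):
    state = rk4_step(state, 1.0e-3)
energy_end = information_energy(state)
```

The informational kinetic energy is conserved along the trajectory, as expected for geodesic motion, and the path bends toward smaller σ even though the initial velocity points purely along μ, which reflects the nonzero Christoffel symbols of the Fisher metric.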

5.3. Curvature, Gradient Flows, and Zel’dovich Dynamics

The Hamiltonian description captures the conservative (geodesic) aspect of the dynamics. Realistic cosmological evolution, however, also involves coarse-graining, stochasticity, and dissipation. These effects are naturally incorporated through the curvature and gradient-flow structures on the same manifold.

The statistical manifold

M_{cosmo}

generally carries nonzero curvature defined by the Fisher metric. This curvature measures the nonlinearity of the generative model: in nearly linear regimes (e.g., log-linear bias) the manifold is locally flat, while strong galaxy bias or complex selection functions induce significant curvature. Physically, large curvature implies that small changes in model parameters

θ

produce highly nonlinear changes in the observed distribution, thereby governing the “stiffness” of cosmological inference. The curvature also enters the geodesic equation through the Christoffel symbols

Γ_{i j}^{k}

, directly affecting the time evolution of the generative model. Curvature in

M_{cosmo}

therefore measures the degree to which cosmological inference departs from a locally linear parameter-estimation problem.

To incorporate self-gravitating cosmological dynamics, we augment the transport kinetic term by a gravitational potential-energy functional. This corresponds to embedding the usual Vlasov–Poisson or pressureless self-gravitating system into the transport–information framework. For this, we introduce the total energy functional

\begin{matrix} F [ρ, θ] = F_{grav} [ρ] + F_{info} [θ], \end{matrix}

(87)

where

F_{grav}

is the gravitational potential energy and

F_{info}

is the information potential. This construction extends the JKO viewpoint [13] by coupling Wasserstein and information-geometric flows.

The composite gradient flow on the transport–information manifold is

\begin{matrix} \frac{\partial ρ}{\partial t} & = \nabla \cdot (ρ \nabla \frac{δ F}{δ ρ}), \end{matrix}

(88)

\begin{matrix} {\dot{θ}}^{i} & = - g^{i j} (θ) \partial_{j} F_{info} . \end{matrix}

(89)
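The parameter flow in Equation (89) is a natural-gradient flow. As a minimal sketch, assume a Gaussian model N(μ, σ²) and take the information potential to be the Kullback–Leibler divergence to a fixed target Gaussian; both choices are hypothetical and serve only to make the flow explicit. An explicit Euler discretization is used.

```python
import math

def kl_to_target(mu, sigma, mu0, sigma0):
    """Hypothetical F_info: KL( N(mu, sigma^2) || N(mu0, sigma0^2) )."""
    return (math.log(sigma0 / sigma)
            + (sigma ** 2 + (mu - mu0) ** 2) / (2.0 * sigma0 ** 2) - 0.5)

def natural_gradient_step(mu, sigma, mu0, sigma0, dt):
    """Explicit Euler step of Eq. (89) with the Gaussian Fisher metric."""
    grad_mu = (mu - mu0) / sigma0 ** 2
    grad_sigma = -1.0 / sigma + sigma / sigma0 ** 2
    # Inverse Fisher metric in (mu, sigma): g^{-1} = diag(sigma^2, sigma^2 / 2).
    return (mu - dt * sigma ** 2 * grad_mu,
            sigma - dt * 0.5 * sigma ** 2 * grad_sigma)

mu, sigma = 0.0, 1.0          # initial model parameters
mu0, sigma0 = 2.0, 0.5        # hypothetical target
f_start = kl_to_target(mu, sigma, mu0, sigma0)
for _ in range(2000):
    mu, sigma = natural_gradient_step(mu, sigma, mu0, sigma0, dt=0.01)
f_end = kl_to_target(mu, sigma, mu0, sigma0)
```

The inverse metric rescales the Euclidean gradient, so the flow is invariant under reparametrization of the model in the continuous-time limit.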

When combined with the entropic regularization of Section 3, the density equation acquires the diffusion term

ε Δ ρ

, yielding the drift–diffusion dynamics

\begin{matrix} \frac{\partial ρ}{\partial t} = \nabla \cdot (ρ \nabla \frac{δ F}{δ ρ}) + ε Δ ρ . \end{matrix}

(90)

The first term is deterministic transport driven by gravity and the generative model, while the second encodes the stochasticity and coarse-graining already discussed in Section 3. The competition between these terms, controlled by the scale-dependent parameter

ε

, determines the statistical properties of structure formation. For finite

ε

, the evolution acquires an information-diffusion component associated with unresolved physics, multi-stream dynamics, galaxy formation stochasticity, and observational coarse graining.
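The structure of the drift–diffusion Equation (90) can be explored with a simple conservative scheme. The sketch below makes a strong simplifying assumption: the variational derivative δF/δρ is replaced by a fixed, hypothetical external potential V(x) on a periodic one-dimensional grid, rather than a self-consistent gravitational potential. Under that assumption the stationary state is proportional to exp(−V/ε), which provides a check on the balance between drift and diffusion.

```python
import math

def drift_diffusion_step(rho, potential, eps, dx, dt):
    """One explicit, conservative update of Eq. (90) on a periodic 1D grid.

    The flux through the interface between cells i and i+1 is
    J = -rho * dV/dx - eps * drho/dx, with rho averaged across the interface.
    """
    n = len(rho)
    flux = []
    for i in range(n):
        j = (i + 1) % n
        rho_face = 0.5 * (rho[i] + rho[j])
        flux.append(-rho_face * (potential[j] - potential[i]) / dx
                    - eps * (rho[j] - rho[i]) / dx)
    # flux[i - 1] wraps around for i = 0, which enforces periodicity.
    return [rho[i] - dt * (flux[i] - flux[i - 1]) / dx for i in range(n)]

n = 64
dx = 2.0 * math.pi / n
xs = [(i + 0.5) * dx for i in range(n)]
potential = [0.5 * math.cos(x) for x in xs]   # hypothetical stand-in for dF/drho
eps = 0.5
rho = [1.0 / (2.0 * math.pi)] * n             # uniform initial density
mass_start = sum(rho) * dx
for _ in range(10000):
    rho = drift_diffusion_step(rho, potential, eps, dx, dt=2.0e-3)
mass_end = sum(rho) * dx

# Stationary state expected for a fixed potential: rho proportional to exp(-V / eps).
gibbs = [math.exp(-v / eps) for v in potential]
norm = sum(gibbs) * dx
gibbs = [g / norm for g in gibbs]
max_rel_dev = max(abs(r - g) / g for r, g in zip(rho, gibbs))
```

Because the update is written in flux form, total mass is conserved up to rounding error; smaller ε sharpens the concentration of mass near the potential minimum, whereas larger ε smooths it.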

Finally, the Zel’dovich approximation emerges as a distinguished constrained solution within this framework. Consider the Lagrangian mapping

x (q, t) = q + D (t) Ψ_{0} (q)

with

Ψ_{0} = \nabla Φ

. The corresponding velocity field satisfies

v = \nabla ϕ

, and the transport kinetic energy reduces to a quadratic form in the initial displacement. This trajectory minimizes the transport-sector action and lies on a constrained submanifold

Z \subset G_{cosmo}

. Since the growth factor

D (t; θ)

depends on the cosmological parameters

θ

, the Zel’dovich solution couples the transport and information degrees of freedom. Thus, the Zel’dovich approximation is not merely an early-time approximation but a reference geodesic on a special constrained manifold within the full nonlinear transport–information dynamics.

This result provides an explicit connection between the transport–information manifold and standard cosmological perturbation theory. The Zel’dovich approximation appears not as an external approximation imposed on the formalism but as a distinguished geodesic solution embedded within the transport sector itself.

Taken together, the action principle, Hamiltonian, curvature, and gradient-flow structures provide a unified geometric description of cosmological evolution as coupled transport and information dynamics on

G_{cosmo}

. In the following section we show how observational processes are incorporated as m-affine projections on this manifold, completing the dual-affine cosmology framework.

6. Observation as m-Projection and Curvature Deformation

In the previous sections we saw that the cosmological generative process has the hierarchical structure

\begin{matrix} ρ_{lin} ⟶ ρ_{NL} ⟶ λ_{gal} ⟶ μ_{N} . \end{matrix}

(91)

The first part of this chain describes the physical and probabilistic generation of the underlying distribution, whereas the final step describes the production of a finite observational catalog. In the language developed above, the former is naturally associated with e-geometric generative deformation, while the latter is associated with m-geometric mixing, projection, and coarse graining. In this section we reformulate the observational stage from the viewpoint of information geometry. In particular, we show that observation is realized as an m-affine projection from the transport–information manifold to the space of finite catalogs. This formulation makes it possible to geometrically separate the underlying generative process from its observed manifestation. This observational branch of the generative process is shown schematically in Figure 1.

6.1. Observation as an m-Affine Projection

A cosmological state is specified by

\begin{matrix} (ρ, θ) \in G_{cosmo} = P_{2} (R^{3}) \times M_{cosmo}, \end{matrix}

(92)

where

ρ

is the mass distribution and

θ

parametrizes the generative model. During the generative stage these quantities evolve along the transport–information dynamics described in the preceding sections. The intensity field of galaxies may be regarded schematically as a functional of the state,

\begin{matrix} λ_{gal} = Λ_{gal} [ρ, θ], \end{matrix}

(93)

where

Λ_{gal}

encodes galaxy bias, selection at the level of the generative model, halo occupation, and other astrophysical effects. The observed catalog

μ_{N}

, however, is not identical to

λ_{gal}

. It is obtained after observational selection, restriction to the survey volume, and finite sampling. We therefore write the observational process as

\begin{matrix} μ_{N} = Π_{m} [λ_{gal}; W, S, N], \end{matrix}

(94)

where W denotes the survey window, S denotes the selection function, and N denotes the finite number of observed objects.

The map

Π_{m}

is an m-affine map in the information-geometric sense. For any convex combination of intensity fields,

\begin{matrix} λ = \sum_{a} w_{a} λ_{a}, w_{a} \geq 0, \sum_{a} w_{a} = 1, \end{matrix}

(95)

one has

\begin{matrix} Π_{m} [λ] = \sum_{a} w_{a} Π_{m} [λ_{a}] . \end{matrix}

(96)

This linearity in mixture coordinates is the defining feature of an m-affine operation. Explicitly, the observational map can be represented schematically as

\begin{matrix} λ_{gal} (x) ⟼ S (x) I_{W} (x) λ_{gal} (x) ⟼ μ_{N}, \end{matrix}

(97)

where

I_{W} (x)

is the indicator function of the survey window. The first arrow represents selection and masking, while the second arrow represents finite sampling from the selected intensity field. Both operations are linear in measure or intensity and therefore belong to m-geometry.

Equivalently, if the selected intensity is denoted by

\begin{matrix} λ_{obs} (x) = S (x) I_{W} (x) λ_{gal} (x), \end{matrix}

(98)

then the catalog may be modeled as a Poisson or Cox realization,

\begin{matrix} μ_{N} = \sum_{n = 1}^{N} δ_{D} (x - x_{n}), x_{n} \sim λ_{obs} (x) . \end{matrix}

(99)

This expression makes explicit that the observed catalog is a random finite measure generated from an underlying intensity field. The observational map

Π_{m}

therefore contains both deterministic linear operations, such as masking and selection, and stochastic sampling operations. In this sense, observation is not merely a loss of resolution but a well-defined geometric operation on the statistical side of the transport–information manifold.
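The two stages of the observational map in Equation (97) can be sketched on a one-dimensional grid of cells. The intensity values, selection function, and window below are hypothetical. The deterministic stage (selection and masking) is checked for linearity under mixtures as in Equation (96); the sampling stage is illustrated by drawing a fixed number of objects with probability proportional to the selected intensity, which is affine only at the level of the expected measure.

```python
import random

def select_and_mask(intensity, selection, window):
    """Deterministic part of Eq. (97): lambda_obs = S * I_W * lambda_gal per cell."""
    return [s * w * lam for lam, s, w in zip(intensity, selection, window)]

def sample_catalog(intensity_obs, n_objects, rng):
    """Stochastic part: draw N cell indices with probability proportional to lambda_obs."""
    cells = range(len(intensity_obs))
    return rng.choices(cells, weights=intensity_obs, k=n_objects)

# Hypothetical intensity fields, selection function, and survey mask.
lam_a = [1.0, 2.0, 4.0, 2.0, 1.0, 0.5]
lam_b = [3.0, 1.0, 0.5, 0.5, 1.0, 3.0]
selection = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
window = [0, 1, 1, 1, 1, 0]

# Mixture first, then observe ...
weight = 0.3
mixed = [weight * a + (1 - weight) * b for a, b in zip(lam_a, lam_b)]
lhs = select_and_mask(mixed, selection, window)
# ... versus observe first, then mix.
rhs = [weight * a + (1 - weight) * b
       for a, b in zip(select_and_mask(lam_a, selection, window),
                       select_and_mask(lam_b, selection, window))]

rng = random.Random(0)
catalog = sample_catalog(lhs, n_objects=1000, rng=rng)
```

Cells outside the window carry zero weight and never appear in the catalog, so the mask acts as a hard projection while the selection function acts as a smooth reweighting.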

As a natural consequence, any local geometric quantity inferred from the observed catalog need not coincide with the corresponding quantity of the underlying transport process. For example, the curvature measured from the observed local measure

μ_{x, obs}^{(R)}

differs in general from the transport curvature at the generative stage:

\begin{matrix} K_{obs} (x; R) \neq K_{tr} (x; R), \end{matrix}

(100)

where

K_{tr}

is the curvature associated with the underlying transport process.

In the present work,

K (x; R)

denotes a local effective curvature observable associated with the probability measure in a neighborhood of scale R around the point x. Motivated by the entropy-based curvature framework developed in [17], such observables may be constructed from the local second variation of entropy in Wasserstein space. Under a local quadratic approximation, the effective curvature is related to derivatives of the coarse-grained log-density, schematically

\begin{matrix} K (x; R) \sim \nabla^{2} ln ρ_{R} (x), \end{matrix}

(101)

where

ρ_{R}

denotes the density field smoothed on scale R. The precise estimator is not fixed in the present paper, and K is used as a generic curvature observable characterizing the local geometric response of the measure. The observed curvature is thus an m-projected version of the generative curvature. This statement should not be understood as introducing a new physical curvature independent of the matter distribution. Rather,

K_{obs}

is reconstructed from the finite observed measure after selection, masking, and sampling have acted. The difference between

K_{obs}

and

K_{tr}

quantifies the deformation induced by observational projection.
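Because the estimator of K is left open in the text, the following is only one possible realization of Equation (101): a central second difference of the logarithm of a Gaussian-smoothed density on a one-dimensional grid. For a Gaussian density of variance s², smoothing on scale R gives a Gaussian of variance s² + R², so the expected value is −1/(s² + R²); this provides a simple check of how coarse-graining alone changes the measured curvature.

```python
import math

def gaussian_smooth(values, dx, scale):
    """Convolve a gridded 1D field with a normalized Gaussian kernel of width `scale`."""
    half = round(4 * scale / dx)              # truncate the kernel at four widths
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k * dx / scale) ** 2) for k in offsets]
    norm = sum(kernel)
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, wgt in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:                    # cells beyond the grid contribute zero
                acc += wgt * values[j]
        out.append(acc / norm)
    return out

def log_density_curvature(rho, dx, i):
    """Central second difference of ln(rho) at grid index i, as in Eq. (101)."""
    return (math.log(rho[i + 1]) - 2.0 * math.log(rho[i])
            + math.log(rho[i - 1])) / dx ** 2

dx = 0.05
xs = [-10.0 + i * dx for i in range(401)]
rho = [math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) for x in xs]   # s = 1
center = 200                                                            # x = 0
k_raw = log_density_curvature(rho, dx, center)
k_smooth = log_density_curvature(gaussian_smooth(rho, dx, 0.5), dx, center)
```

Here the unsmoothed curvature is close to −1, while smoothing with R = 0.5 moves it toward −0.8, a purely scale-dependent deformation of the kind collected into the discretization term.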

6.2. Decomposition of Curvature Deformation

Any m-affine transformation of the measure induces a corresponding deformation of curvature. We therefore decompose the observed curvature as

\begin{matrix} K_{obs} (x; R) = K_{tr} (x; R) + δ K_{obs} (x; R), \end{matrix}

(102)

where the observational distortion is further decomposed into four geometrically distinct contributions:

\begin{matrix} δ K_{obs} = δ K_{samp} + δ K_{sel} + δ K_{mix} + δ K_{disc} . \end{matrix}

(103)

Here,

δ K_{samp}

: distortion due to finite sampling and shot noise,

δ K_{sel}

: distortion due to selection functions and survey windows,

δ K_{mix}

: distortion due to multi-stream structure and statistical mixing,

δ K_{disc}

: distortion due to discretization and reference-measure construction.

Although the decomposition is not strictly unique or orthogonal, it provides a minimal and systematic classification of observational effects according to their m-geometric origin. This decomposition allows us to treat observational distortions as separate m-geometric responses to the underlying e-geometric generative process. It also supplies the starting point for cosmological inference: reconstructing the true transport curvature K_{tr} reduces to modeling and subtracting the m-geometric distortion δ K_{obs}.

A concrete estimator is obtained by inverting the m-projection:

\begin{matrix} {\hat{K}}_{tr} (x; R) = K_{obs} (x; R) - {\hat{δ K}}_{obs} (x; R), \end{matrix}

(104)

where

{\hat{δ K}}_{obs}

is a model- or data-driven estimate of the total observational distortion. The accuracy of this estimator naturally depends on how faithfully each component of

δ K_{obs}

is characterized.
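A toy version of the estimator in Equation (104) can be built for the finite-sampling term alone. The sketch below assumes a Gaussian underlying density and a hypothetical Gaussian-fit curvature estimator, K = −1/(sample variance); the distortion is estimated by Monte Carlo from a fiducial model and then subtracted from an independent set of mock observations. None of these choices is prescribed by the text.

```python
import random

def sample_variance(sample):
    """Unbiased sample variance."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)

def observed_curvature(sample):
    """Hypothetical Gaussian-fit estimator: K = d^2 ln(rho) / dx^2 = -1 / variance."""
    return -1.0 / sample_variance(sample)

def mean_observed_curvature(n_objects, n_trials, rng, sigma=1.0):
    """Monte Carlo average of K_obs over catalogs of n_objects points each."""
    total = 0.0
    for _ in range(n_trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n_objects)]
        total += observed_curvature(sample)
    return total / n_trials

k_true = -1.0                                    # transport curvature for sigma = 1

# Distortion estimated from fiducial simulations (first random stream).
model_rng = random.Random(1)
delta_k_samp = mean_observed_curvature(20, 10000, model_rng) - k_true

# Independent mock observations (second random stream), then Eq. (104).
data_rng = random.Random(2)
k_obs = mean_observed_curvature(20, 10000, data_rng)
k_corrected = k_obs - delta_k_samp
```

For this particular estimator the bias is known analytically, −2/(N − 3) in units of 1/σ² for N Gaussian samples, so the Monte Carlo estimate can be cross-checked; in realistic settings the distortion model would need to include selection, mixing, and discretization terms as well.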

6.3. Canonical Examples of Observational Deformation

The four distortion terms introduced above appear explicitly in realistic observational processes. Their significance is that they arise from different m-geometric operations acting on the underlying generative distribution.

Finite sampling from a Poisson point process produces fluctuations in the empirical measure

\begin{matrix} μ_{N}^{(R)} (x) = \frac{1}{N_{R} (x)} \sum_{i \in B_{R} (x)} δ_{D} (x - x_{i}) \end{matrix}

(105)

that manifest as

δ K_{samp}

. Since the number of objects inside a finite volume fluctuates stochastically, the measured curvature differs from its ensemble expectation. This effect is particularly prominent in low-density regions and at small smoothing scales.

The selection function and survey window transform the intensity field according to

\begin{matrix} λ_{obs} (x) = S (x) I_{W} (x) λ (x) . \end{matrix}

(106)

Because this operation corresponds to a linear reweighting of the underlying measure, it belongs naturally to m-geometry. The resulting curvature distortion is represented by

δ K_{sel}

. This contribution primarily affects large-scale structures and survey-scale gradients.

Nonlinear multi-streaming introduces an additional source of observational deformation. In this regime the effective distribution becomes a statistical mixture,

\begin{matrix} p (x) = \sum_{a} w_{a} p_{a} (x), \end{matrix}

(107)

which is a convex combination in mixture space. Since mixture operations are the defining structure of m-geometry, the corresponding curvature deformation is naturally identified with

δ K_{mix}

. This contribution is expected to be most important near density peaks, cluster environments, and shell-crossing regions.

Finally, any practical curvature measurement requires discretization, smoothing, kernel estimation, or reference-measure construction. These procedures inevitably introduce additional coarse-graining effects that are collected into

δ K_{disc}

. Although such effects are often treated as technical details, they possess a clear geometric interpretation as observational deformations acting on the measured distribution.

The importance of this classification is that the four distortion terms originate from different physical and observational mechanisms, yet all can be represented within the same m-affine framework. Consequently, observational effects need not be treated as a heterogeneous collection of corrections. Instead, they become geometrically unified as different manifestations of observational projection. This viewpoint provides a systematic language for comparing survey effects, finite-sampling effects, and nonlinear mixing effects within a common framework.

In this way, all observational distortions are uniformly described within m-affine geometry. The decomposition in Equations (102) and (103) therefore provides a clear geometric principle for classifying the physical content of observational processes.

This decomposition also suggests a natural hierarchy of correction procedures. Large-scale survey effects may be treated through

δ K_{sel}

, finite-sampling effects through

δ K_{samp}

, nonlinear dynamical effects through

δ K_{mix}

, and numerical or estimator-dependent effects through

δ K_{disc}

. The decomposition therefore provides a geometric roadmap for observational calibration and reconstruction.

6.4. Hierarchy of Estimators and Response Structure

The inference problem discussed in Section 6.2 is not restricted to the estimation of a single scalar quantity but extends naturally to higher-order statistical structures. This extension follows from the interpretation of curvature as a response quantity to the generative process.

To make this notion precise, consider a perturbation of the generative distribution. Let

\begin{matrix} λ \mapsto λ + δ λ \end{matrix}

(108)

be a small deformation of the intensity field. For any observable functional

O [λ]

, we define the response hierarchy through functional derivatives

\begin{matrix} R_{1} (x) & = \frac{δ O}{δ λ (x)}, \end{matrix}

(109)

\begin{matrix} R_{2} (x, y) & = \frac{δ^{2} O}{δ λ (x) δ λ (y)} \end{matrix}

(110)

and higher-order responses analogously.

Here,

R_{1}

represents the linear response to perturbations of the generative distribution, while

R_{2}

represents the quadratic response. Higher-order response functions describe increasingly nonlinear sensitivity to the underlying generative process. Within this framework, the transport curvature

K_{tr}

can be interpreted as a second-order response to deformations of the distribution. That is, curvature measures the nonlinear sensitivity of observables to variations in the underlying generative structure. Curvature estimation can therefore be understood as part of a more general response theory.

This interpretation provides a direct connection between curvature statistics and inference. While first-order responses characterize changes in the mean structure of a distribution, second-order responses characterize its local geometric stability. Curvature thus plays a role analogous to a susceptibility or Hessian in conventional response theory. This response structure is also consistent with the dual-affine geometry discussed in the previous sections. Observables are constructed as m-geometric responses to generative deformations occurring in e-geometry. In this setting, first-order responses appear as changes in average properties, whereas higher-order responses appear as changes in distributional shape, covariance structure, and curvature.
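A discrete analogue of the response hierarchy in Equations (109) and (110) can be computed by finite differences. The sketch below uses a hypothetical observable, the entropy-like functional O[λ] = −Σ λ ln λ on a three-cell grid, for which the first response is −(ln λ + 1) and the second response is diagonal with entries −1/λ; the observable and step sizes are illustrative choices only.

```python
import math

def entropy(lam):
    """Example observable O[lambda] = -sum_i lambda_i * ln(lambda_i) on a grid."""
    return -sum(v * math.log(v) for v in lam)

def first_response(observable, lam, i, h=1e-5):
    """R_1 at cell i by central differences, a discrete analogue of Eq. (109)."""
    up = list(lam)
    up[i] += h
    down = list(lam)
    down[i] -= h
    return (observable(up) - observable(down)) / (2.0 * h)

def second_response(observable, lam, i, j, h=1e-4):
    """R_2 at cells (i, j) by central differences, a discrete analogue of Eq. (110)."""
    def shifted(di, dj):
        out = list(lam)
        out[i] += di
        out[j] += dj
        return observable(out)
    return (shifted(h, h) - shifted(h, -h)
            - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)

lam = [0.5, 1.0, 2.0]                             # hypothetical intensity values
r1 = first_response(entropy, lam, 1)              # linear response at cell 1
r2_diag = second_response(entropy, lam, 2, 2)     # curvature-like response at cell 2
r2_cross = second_response(entropy, lam, 0, 1)    # vanishes for a local functional
```

For a local functional such as this one, the second response has no off-diagonal part; nonlocal observables, such as smoothed curvature estimators, would instead produce a response kernel with finite spatial extent.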

In practice, the observable curvature

K_{obs}

itself can be regarded as a functional of the observed measure, and its variation under perturbations of the underlying intensity field defines the empirical response hierarchy. Accordingly, curvature estimation should be understood not merely as the reconstruction of a geometric quantity but as the reconstruction of the response structure associated with the generative process. In this sense, transport–information geometry provides a unified foundation for cosmological data analysis. It simultaneously describes the generative process, observational projection, curvature deformation, and response structure within a common geometric language.

In the next section, we discuss how these theoretical structures lead to a broader reinterpretation of cosmological inference, structure formation, and observational cosmology within the transport–information framework.

7. Implications for Cosmological Inference and Structure Formation

In the previous sections, we formulated cosmological evolution as geometric dynamics on the transport–information manifold and showed that observation corresponds to an m-affine projection. Within this framework, the generative process and the observational process are both deformations of the same probability distribution, yet they are clearly separated as distinct geometric operations. In this section, we discuss the implications of this unified picture for cosmological inference and our understanding of structure formation. In particular, we reinterpret conventional analyses from the viewpoint of transport–information geometry and highlight how the dual-affine structure naturally resolves several conceptual difficulties in observational cosmology.

7.1. Reinterpretation of Cosmological Inference

Conventional cosmological inference has typically treated the observed catalog

μ_{N}

as a direct realization of a probability distribution, from which cosmological parameters

θ

are estimated through its statistical properties. However, the framework developed here shows that

μ_{N}

is not the generative distribution

λ_{gal}

itself but rather its m-affine projection:

\begin{matrix} μ_{N} = Π_{m} [λ_{gal}] . \end{matrix}

(111)

This implies that standard inference implicitly works with a projected quantity.

This observation clarifies the geometric structure underlying modern cosmological reconstruction methods. Approaches such as Bayesian large-scale-structure inference, forward modeling, and simulation-based cosmology do not directly infer cosmological parameters from the observed catalog. Rather, they attempt to reconstruct a latent generative state from a projected observational realization. In the present framework, this procedure corresponds naturally to the inversion of an m-projection.

From this perspective, cosmological inference is naturally reformulated as an inverse problem of recovering the generative state

(ρ, θ)

from its m-projected observable. Because the mapping belongs to m-geometry, inference corresponds to inverting the projection and reconstructing the underlying e-geometric structure. Modern Bayesian reconstruction methods, e.g., [18], provide a natural observational counterpart of the m-projection viewpoint. This reinterpretation clarifies the essence of the problem: rather than directly estimating

θ

from raw data, one first corrects for the distortions induced by the m-projection and then evaluates the parameters of the generative model. This structure is fully consistent with the curvature decomposition and the estimator

{\hat{K}}_{tr}

introduced in Section 6. Accordingly, the present framework provides a theoretical foundation that unifies observational corrections with parameter inference.

Furthermore, uncertainty in inference is now understood geometrically: observational noise and finite-sample fluctuations appear as well-defined distortions in m-geometry, which can be quantified through distances and curvature in the space of probability distributions. Thus, cosmological inference is both a statistical estimation problem and a geometric inverse problem.

7.2. Structure Formation as Dual-Affine Dynamics

The central claim of this paper is that cosmological structure formation is dynamics endowed with a dual-affine structure, that is, the coupling of

\begin{matrix} \text{e-geometry: generative evolution} + \text{m-geometry: observational mixing} . \end{matrix}

(112)

Gravitational evolution corresponds to generative deformation in e-geometry: the density field

ρ

evolves through transport maps, which are expressed as linear transformations in log-density or natural-parameter space. Galaxy formation and the growth of structure are therefore understood as exponential-family deformations.

Observation, on the other hand, corresponds to mixing operations in m-geometry. Finite sampling, selection functions, and survey windows all act as convex combinations or linear mappings of probability distributions, so that the observed structure is the generative structure acted upon by an m-projection. These two aspects are intrinsically asymmetric: the generative process is multiplicative in the density (additive in log-density), while the observational process is additive in the density itself. This asymmetry is a principal source of difficulty in cosmological data analysis, and the dual-affine structure accommodates both operations within one framework.

On the transport–information manifold, the full dynamics are described by the joint evolution

\begin{matrix} t \mapsto (ρ_{t}, θ_{t}), \end{matrix}

(113)

where e-geometry governs the generative degrees of freedom and m-geometry governs the observational degrees of freedom.

As nonlinear evolution proceeds, the Lagrangian map loses injectivity and multi-stream regions appear. In the present framework, such multi-streaming is not an exceptional breakdown of the fluid approximation but a natural consequence of mixing in m-geometry: multiple streams are expressed as convex combinations of probability measures. Thus, structure formation is understood not merely as the evolution of a density field but as a unified geometric dynamics in the space of probability distributions.

7.3. Separation of Physical and Observational Effects

Thanks to the dual-affine structure, it becomes possible to geometrically separate physical information from observational distortions in cosmological data. In conventional analyses, observational effects have often been treated as empirical correction terms. In the present framework, however, they are formulated as independent deformations belonging to m-geometry. This separation is both conceptually clean and practically useful.

Concretely, the observed curvature is decomposed as

\begin{matrix} K_{obs} = K_{tr} + δ K_{obs}, \end{matrix}

(114)

where

K_{tr}

is the generative curvature arising from transport geometry and

δ K_{obs}

collects the distortions induced by the m-projection. The physically meaningful quantity is

K_{tr}

, while the observable

K_{obs}

is its deformed version under m-geometry. Inference therefore reduces to reconstructing

K_{tr}

by correcting each component of

δ K_{obs}

in turn, as formulated in Section 6.

An important advantage is that this separation is model-independent and geometric: independently of the specific galaxy formation model or observational setup, the distinction between e-geometry and m-geometry itself provides a universal framework. This common language enables consistent comparisons across different datasets and analysis methods.

7.4. Scale Dependence and Effective Theory

Another natural consequence of the transport–information geometric description is the transparent incorporation of scale dependence. The curvature

K (x; R)

is inherently scale-dependent, and its variation reflects the different regimes of structure formation. At small scales, nonlinear effects and discretization dominate, while at large scales, averaged structures dominate.

This scale dependence can be understood within an effective-theory picture. The curvature observed at a given scale R is the result of m-geometric averaging (coarse graining) over smaller-scale degrees of freedom:

\begin{matrix} K_{eff} (x; R) = C_{R} [K_{tr}] (x), \end{matrix}

(115)

where

C_{R}

denotes a coarse-graining operator acting at scale R. This averaging operation corresponds exactly to mixing in m-geometry and is fully consistent with the gradient-flow formulation and the diffusion term

ε Δ ρ

introduced earlier.

In particular, the diffusion term in the Schrödinger bridge can be interpreted as the effective averaging of small-scale structures. Consequently, changing the smoothing scale R is equivalent to varying the effective strength of information diffusion. This viewpoint also clarifies the geometric meaning of smoothing and binning procedures commonly used in data analysis: they are not merely technical conveniences but m-geometric coarse-graining operations. Within the present framework, results obtained at different scales can therefore be compared and connected in a consistent theoretical manner.

Accordingly, transport–information geometry provides a natural foundation for understanding cosmological structure formation as a scale-dependent effective theory, in which small-scale nonlinearities and large-scale averaged structures are described within a single geometric picture.
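As a minimal numerical illustration of Equation (115), the sketch below takes the coarse-graining operator to be a periodic Gaussian filter of width R. The kernel choice, the function name gaussian_coarse_grain, and the single-mode test field are assumptions made for illustration only; the text does not fix a specific form of the operator. For a single Fourier mode of wavenumber k, such a filter multiplies the amplitude by exp(−k²R²/2). Because the heat kernel of a diffusion term εΔρ is a Gaussian of variance 2εt, a smoothing scale R corresponds to a diffusion time with R² = 2εt in this simplified picture.

```python
import math

def gaussian_coarse_grain(field, L, R):
    """Periodic Gaussian smoothing of a field sampled on a uniform grid of length L."""
    M = len(field)
    dx = L / M
    # Kernel at the shortest periodic separation, normalized to unit sum so that
    # the operation is a convex combination (mixing) of field values.
    kernel = [math.exp(-0.5 * (min(m, M - m) * dx / R) ** 2) for m in range(M)]
    norm = sum(kernel)
    kernel = [w / norm for w in kernel]
    return [
        sum(kernel[(i - j) % M] * field[j] for j in range(M))
        for i in range(M)
    ]

# Assumed test field: a single cosine mode standing in for the curvature field.
L, M, n_mode, R = 1.0, 256, 4, 0.02
k = 2.0 * math.pi * n_mode / L
field = [math.cos(k * i * L / M) for i in range(M)]

smoothed = gaussian_coarse_grain(field, L, R)
measured_damping = smoothed[0] / field[0]          # amplitude ratio at x = 0
predicted_damping = math.exp(-0.5 * (k * R) ** 2)  # Gaussian-filter prediction
```

Comparing measured_damping with predicted_damping for several values of R gives a quick check that larger smoothing scales suppress small-scale structure in the expected way, while a constant field is left unchanged.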

8. Concrete Cosmological Limit and a One-Dimensional Toy Model

The purpose of this section is to make explicit how the transport–information formulation reduces to standard cosmological dynamics in an appropriate limit and how the full generative chain can be implemented in a simple toy model. This also clarifies that the present framework is not a replacement of the standard gravitational description but an extension in which deterministic transport, stochastic galaxy formation, and observational sampling are treated within a single geometric structure.

8.1. Standard Cosmological Limit

We first recall the transport part of the theory in the limit where information diffusion is absent. In the Schrödinger bridge formulation, the density evolves according to the drift–diffusion equation

\begin{matrix} \frac{\partial ρ}{\partial t} = - \nabla \cdot (ρ v) + ε Δ ρ . \end{matrix}

(116)

The parameter

ε

controls the strength of entropic spreading. In the deterministic limit,

\begin{matrix} ε \to 0, \end{matrix}

(117)

Equation (116) reduces to

\begin{matrix} \frac{\partial ρ}{\partial t} + \nabla \cdot (ρ v) = 0 . \end{matrix}

(118)

This is the standard continuity equation for a pressureless matter distribution. Therefore, the Wasserstein transport sector of the present theory contains the mass-conserving part of standard cosmological dynamics as its deterministic limit.
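The reduction from Equation (116) to Equation (118) can be illustrated with a small one-dimensional, periodic finite-volume sketch. The scheme (first-order upwind advection plus central diffusion in flux form), the prescribed velocity field, and all numerical values are assumptions chosen for illustration; in particular the velocity is not solved self-consistently from gravity. Because the update is written as a difference of interface fluxes, the total mass is conserved for any ε, and setting ε = 0 recovers a discrete form of the continuity equation.

```python
import math

def drift_diffusion_step(rho, v, eps, dx, dt):
    """One explicit flux-form step of d(rho)/dt = -d(rho v)/dx + eps d2(rho)/dx2."""
    M = len(rho)
    flux = []
    for i in range(M):
        ip = (i + 1) % M                       # periodic neighbor
        v_face = 0.5 * (v[i] + v[ip])          # velocity at interface i + 1/2
        upwind = rho[i] if v_face >= 0.0 else rho[ip]
        # Advective flux plus diffusive flux -eps * d(rho)/dx at the interface.
        flux.append(v_face * upwind - eps * (rho[ip] - rho[i]) / dx)
    # flux[i - 1] with i = 0 wraps to the last interface, as periodicity requires.
    return [rho[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(M)]

def evolve(rho, v, eps, dx, dt, steps):
    """Apply the step repeatedly with a fixed velocity field."""
    for _ in range(steps):
        rho = drift_diffusion_step(rho, v, eps, dx, dt)
    return rho

# Assumed illustrative setup (not taken from the paper).
M, L = 100, 1.0
dx, dt, steps = L / M, 0.01, 100
xs = [(i + 0.5) * dx for i in range(M)]
rho0 = [1.0 + 0.5 * math.cos(2.0 * math.pi * x) for x in xs]
v = [0.1 * math.sin(2.0 * math.pi * x) for x in xs]

rho_det = evolve(rho0, v, 0.0, dx, dt, steps)     # eps = 0: Eq. (118)
rho_ent = evolve(rho0, v, 1e-3, dx, dt, steps)    # finite eps: Eq. (116)

mass0 = sum(rho0) * dx
mass_det = sum(rho_det) * dx
mass_ent = sum(rho_ent) * dx
```

The time step here satisfies both the advective and the diffusive stability limits of the explicit scheme; other parameter choices would need the same check.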

In the single-stream regime, the velocity field is potential,

\begin{matrix} v = \nabla ϕ, \end{matrix}

(119)

and the density evolution can be represented by a Lagrangian transport map

\begin{matrix} x = T_{t} (q) = q + Ψ (q, t), \end{matrix}

(120)

where

q

is the initial Lagrangian coordinate and

x

is the Eulerian coordinate. Mass conservation gives

\begin{matrix} ρ (x, t) d^{3} x = ρ_{0} (q) d^{3} q . \end{matrix}

(121)

Equivalently,

\begin{matrix} ρ (x, t) = \frac{ρ_{0} (q)}{det (\partial x_{i} / \partial q_{j})} . \end{matrix}

(122)

Thus, the density evolution is expressed as the pushforward

\begin{matrix} ρ_{t} = {(T_{t})}_{\#} ρ_{0} . \end{matrix}

(123)

This is precisely the geometric structure described by optimal transport.

The Zel’dovich approximation is obtained by choosing the displacement field as

\begin{matrix} Ψ (q, t) = D (t) s (q), \quad s (q) = - \nabla_{q} Φ (q), \end{matrix}

(124)

where

D (t)

is the linear growth factor and

Φ

is the initial displacement potential. The corresponding map is

\begin{matrix} x = q - D (t) \nabla_{q} Φ (q) . \end{matrix}

(125)

The initial density contrast is related to the potential by the Poisson-type relation

\begin{matrix} δ_{lin} (q) = \nabla_{q}^{2} Φ (q) \end{matrix}

(126)

up to the conventional cosmological normalization. Therefore, the chain

\begin{matrix} Φ ⟶ Ψ ⟶ ρ_{t} \end{matrix}

(127)

is the standard gravitational construction of large-scale structure in the single-stream regime. In the present framework, this chain is interpreted as a deterministic geodesic-like motion in the Wasserstein sector of the transport–information manifold.

8.2. One-Dimensional Zel’dovich Transport

We now give a one-dimensional toy model in which all steps can be written explicitly. Let the Lagrangian coordinate be

q \in [0, L]

with periodic boundary conditions. We take the initial displacement potential to be

\begin{matrix} Φ (q) = - \frac{A}{k^{2}} cos (k q), \quad k = \frac{2 π n}{L}, \end{matrix}

(128)

where A controls the amplitude of the initial perturbation and the integer n counts the wavelengths that fit in the periodic box. Then

\begin{matrix} δ_{lin} (q) = \frac{d^{2} Φ}{d q^{2}} = A cos (k q) . \end{matrix}

(129)

The Zel’dovich displacement is

\begin{matrix} Ψ (q, t) = - D (t) \frac{d Φ}{d q} = - D (t) \frac{A}{k} sin (k q), \end{matrix}

(130)

and hence the Eulerian position is

\begin{matrix} x (q, t) = q - D (t) \frac{A}{k} sin (k q) . \end{matrix}

(131)

The Jacobian of the map is

\begin{matrix} J (q, t) = \frac{d x}{d q} = 1 - D (t) A cos (k q) . \end{matrix}

(132)

For a uniform initial density

\bar{ρ}

, mass conservation gives

\begin{matrix} ρ_{Z} (x (q, t), t) = \frac{\bar{ρ}}{J (q, t)} = \frac{\bar{ρ}}{1 - D (t) A cos (k q)} . \end{matrix}

(133)

The corresponding density contrast is

\begin{matrix} 1 + δ_{Z} (x (q, t), t) = \frac{1}{1 - D (t) A cos (k q)} . \end{matrix}

(134)

Expanding this expression for

D (t) A ≪ 1

, we obtain

\begin{matrix} δ_{Z} (x (q, t), t) = D (t) A cos (k q) + D^{2} (t) A^{2} {cos}^{2} (k q) + O (A^{3}) . \end{matrix}

(135)

Thus, the linear density contrast is recovered at first order, while nonlinear mode coupling appears at higher order. Shell crossing occurs when

\begin{matrix} J (q, t) = 0, \end{matrix}

(136)

or equivalently

\begin{matrix} D (t) A cos (k q) = 1 . \end{matrix}

(137)

Before shell crossing, the map

q \mapsto x

is one-to-one and the evolution is a deterministic transport map. After shell crossing, the map becomes multi-valued in Eulerian space and the deterministic Monge transport description must be replaced by a probabilistic coupling. This is precisely where entropic optimal transport and the Schrödinger bridge provide a natural extension.
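The one-dimensional construction in Equations (128)–(137) can be checked numerically. The sketch below is a minimal illustration in which the function names and all parameter values are assumptions, and D stands for the growth factor D(t) at a fixed time. Since the cosine is bounded by unity, Equation (137) is first satisfied at cos(kq) = 1 when D(t)A = 1 (for A > 0), so the map stays one-to-one as long as D(t)A < 1.

```python
import math

def x_of_q(q, D, A, k):
    """Eulerian position x(q, t) of Eq. (131); D stands for D(t)."""
    return q - D * (A / k) * math.sin(k * q)

def jacobian(q, D, A, k):
    """Jacobian J = dx/dq of Eq. (132)."""
    return 1.0 - D * A * math.cos(k * q)

def density_ratio(q, D, A, k):
    """rho_Z / rho_bar = 1 / J of Eq. (133); only valid before shell crossing."""
    J = jacobian(q, D, A, k)
    if J <= 0.0:
        raise ValueError("J <= 0: shell crossing, single-stream density undefined")
    return 1.0 / J

# Assumed illustrative values (not from the paper).
L, n, A, D = 1.0, 1, 0.5, 1.0
k = 2.0 * math.pi * n / L
M = 2000                      # number of Lagrangian cells
dq = L / M

# Mass in Eulerian space: rho at the cell midpoint times the Eulerian cell width.
mass = sum(
    density_ratio((i + 0.5) * dq, D, A, k)
    * (x_of_q((i + 1) * dq, D, A, k) - x_of_q(i * dq, D, A, k))
    for i in range(M)
)

# The map is one-to-one while every Jacobian is positive, i.e. D * A < 1 here.
single_stream = all(jacobian(i * dq, D, A, k) > 0.0 for i in range(M))

# First shell crossing for A > 0: cos(kq) = 1 and D = 1 / A.
D_cross = 1.0 / A

# Second-order expansion of Eq. (135), checked at a smaller amplitude D*A = 0.1.
u_values = [0.1 * math.cos(k * i * dq) for i in range(M)]
max_expansion_error = max(abs(1.0 / (1.0 - u) - 1.0 - (u + u * u)) for u in u_values)
```

The quantity mass should stay close to the box length L (in units of the mean density), which is the discrete statement of Equation (121).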

8.3. Biased Galaxy Formation as an Information-Geometric Deformation

The matter density obtained above is not directly observed as a galaxy catalog. We therefore introduce a simple biased galaxy intensity field. Let

\begin{matrix} λ_{gal} (x) = \bar{n} {[\frac{ρ_{Z} (x, t)}{\bar{ρ}}]}^{α} = \bar{n} {[1 + δ_{Z} (x, t)]}^{α}, \end{matrix}

(138)

where

\bar{n}

is the mean galaxy number density and

α

is a bias parameter. In the weakly nonlinear regime,

\begin{matrix} λ_{gal} (x) = \bar{n} [1 + α δ_{Z} (x, t) + \frac{α (α - 1)}{2} δ_{Z}^{2} (x, t) + O (δ_{Z}^{3})] . \end{matrix}

(139)

Thus, the usual linear bias coefficient is

\begin{matrix} b_{1} = α, \end{matrix}

(140)

while the quadratic bias coefficient is

\begin{matrix} b_{2} = α (α - 1) . \end{matrix}

(141)

This shows explicitly how a galaxy-formation model appears as a deformation of the matter distribution.

The information-geometric interpretation becomes clear by taking the logarithm:

\begin{matrix} ln λ_{gal} (x) = ln \bar{n} + α ln [1 + δ_{Z} (x, t)] . \end{matrix}

(142)

The parameter

α

is a natural parameter controlling the log-density response of the galaxy intensity field. Therefore, the transformation

\begin{matrix} ρ_{Z} ⟶ λ_{gal} \end{matrix}

(143)

is naturally interpreted as an e-geometric deformation in the statistical model of galaxy formation.

For small density contrast,

\begin{matrix} ln λ_{gal} (x) = ln \bar{n} + α δ_{Z} (x, t) - \frac{α}{2} δ_{Z}^{2} (x, t) + O (δ_{Z}^{3}) . \end{matrix}

(144)

This expression shows explicitly that nonlinear galaxy bias is represented as nonlinear curvature in log-density space.
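The expansions in Equations (139) and (144) can be verified directly against the power-law model of Equation (138). The following sketch uses assumed values of the mean density and of α, and hypothetical function names; it only compares the exact expressions with their second-order truncations, whose errors should scale as the cube of the density contrast.

```python
import math

def galaxy_intensity(delta, nbar, alpha):
    """Power-law biased intensity of Eq. (138): nbar * (1 + delta)**alpha."""
    return nbar * (1.0 + delta) ** alpha

def intensity_second_order(delta, nbar, alpha):
    """Truncation of Eq. (139) with b1 = alpha and b2 = alpha * (alpha - 1)."""
    b1, b2 = alpha, alpha * (alpha - 1.0)
    return nbar * (1.0 + b1 * delta + 0.5 * b2 * delta ** 2)

def log_intensity_second_order(delta, nbar, alpha):
    """Truncation of Eq. (144) in log-intensity."""
    return math.log(nbar) + alpha * delta - 0.5 * alpha * delta ** 2

# Assumed illustrative values.
nbar, alpha = 100.0, 1.8

errors = []
for delta in (0.01, 0.05, 0.1):
    exact = galaxy_intensity(delta, nbar, alpha)
    rel_err = (exact - intensity_second_order(delta, nbar, alpha)) / nbar
    log_err = math.log(exact) - log_intensity_second_order(delta, nbar, alpha)
    errors.append((delta, rel_err, log_err))
```

In log-intensity the dependence on α is exactly linear, which is the sense in which α acts as a natural parameter; the quadratic term in Equation (144) comes entirely from expanding ln(1 + δ).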

8.4. Poisson Sampling and the Observed Empirical Measure

The observed catalog is obtained by finite sampling from the intensity field. In the simplest model, galaxies are generated by an inhomogeneous Poisson point process with intensity

λ_{gal} (x)

. The probability of observing N galaxies at positions

\{x_{i}\}_{i = 1}^{N}

is

\begin{matrix} P (\{x_{i}\}_{i = 1}^{N} | λ_{gal}) = exp [- \int_{0}^{L} λ_{gal} (x) d x] \prod_{i = 1}^{N} λ_{gal} (x_{i}) . \end{matrix}

(145)

The observed empirical measure is

\begin{matrix} μ_{N} (x) = \frac{1}{N} \sum_{i = 1}^{N} δ_{D} (x - x_{i}) . \end{matrix}

(146)

The expectation value of the empirical measure is

\begin{matrix} E [μ_{N} (d x)] = \frac{λ_{gal} (x)}{\int_{0}^{L} λ_{gal} (x^{'}) d x^{'}} d x . \end{matrix}

(147)

For a test function f, the empirical average is

\begin{matrix} \hat{f} = \int f (x) μ_{N} (d x) = \frac{1}{N} \sum_{i = 1}^{N} f (x_{i}), \end{matrix}

(148)

and its expectation is

\begin{matrix} E [\hat{f}] = \frac{\int_{0}^{L} f (x) λ_{gal} (x) d x}{\int_{0}^{L} λ_{gal} (x) d x} . \end{matrix}

(149)

Thus, the observed statistic is obtained by an m-affine averaging operation applied to the galaxy intensity distribution.

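For a fixed number of galaxies N, the positions are independent draws from the normalized intensity, and the variance of the empirical statistic is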

\begin{matrix} Var (\hat{f}) = \frac{1}{N} [\frac{\int_{0}^{L} f^{2} (x) λ_{gal} (x) d x}{\int_{0}^{L} λ_{gal} (x) d x} - {(\frac{\int_{0}^{L} f (x) λ_{gal} (x) d x}{\int_{0}^{L} λ_{gal} (x) d x})}^{2}] . \end{matrix}

(150)

This equation makes explicit how finite sampling appears as an information-geometric projection error. The factor

1 / N

is the familiar shot-noise scaling, while the weight

λ_{gal}

shows that the noise is controlled by the generative distribution itself.
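Equations (148)–(150) can be illustrated by drawing a catalog from an inhomogeneous Poisson process and comparing the empirical average with its intensity-weighted expectation. The sketch below uses the standard thinning method; the stand-in intensity, the test function, the seed, and the function names are all assumptions and do not reproduce the galaxy intensity of Equation (138). The variance estimate treats the realized N as fixed.

```python
import math
import random

def sample_poisson_process(lam, lam_max, L, rng):
    """Inhomogeneous Poisson process on [0, L) by thinning (needs lam(x) <= lam_max)."""
    points = []
    x = rng.expovariate(lam_max)               # first homogeneous candidate
    while x < L:
        if rng.random() < lam(x) / lam_max:    # keep with probability lam(x)/lam_max
            points.append(x)
        x += rng.expovariate(lam_max)          # exponential gap to next candidate
    return points

def intensity_average(g, lam, L, M=4000):
    """Midpoint-rule value of int g*lam dx / int lam dx, the right side of Eq. (149)."""
    xs = [(i + 0.5) * L / M for i in range(M)]
    return sum(g(x) * lam(x) for x in xs) / sum(lam(x) for x in xs)

# Assumed stand-in intensity and test function.
L, nbar = 1.0, 2000.0
k = 2.0 * math.pi / L
lam = lambda x: nbar * (1.0 + 0.5 * math.cos(k * x))
f = lambda x: math.cos(k * x)

rng = random.Random(0)
catalog = sample_poisson_process(lam, 1.5 * nbar, L, rng)
N = len(catalog)

f_hat = sum(f(x) for x in catalog) / N                       # Eq. (148)
f_mean = intensity_average(f, lam, L)                        # Eq. (149)
f_var = (intensity_average(lambda x: f(x) ** 2, lam, L) - f_mean ** 2) / N  # Eq. (150)
```

Repeating the draw with different seeds and comparing the scatter of f_hat with the square root of f_var gives a direct check of the 1/N shot-noise scaling.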

8.5. The Full Generative Chain in the Toy Model

Combining the above steps, the toy model realizes the full cosmological generative chain:

\begin{matrix} Φ (q) ⟶ x (q, t) ⟶ ρ_{Z} (x, t) ⟶ λ_{gal} (x) ⟶ μ_{N} . \end{matrix}

(151)

Explicitly, the sequence is

\begin{matrix} Φ (q) & = - \frac{A}{k^{2}} cos (k q), \end{matrix}

(152)

\begin{matrix} x (q, t) & = q - D (t) \frac{A}{k} sin (k q), \end{matrix}

(153)

\begin{matrix} ρ_{Z} (x (q, t), t) & = \frac{\bar{ρ}}{1 - D (t) A cos (k q)}, \end{matrix}

(154)

\begin{matrix} λ_{gal} (x (q, t)) & = \bar{n} {[1 - D (t) A cos (k q)]}^{- α}, \end{matrix}

(155)

\begin{matrix} μ_{N} (x) & = \frac{1}{N} \sum_{i = 1}^{N} δ_{D} (x - x_{i}) . \end{matrix}

(156)

This chain shows explicitly the separation of three operations:

\begin{matrix} \text{transport:} \quad Φ \to x (q, t) \to ρ_{Z}, \end{matrix}

(157)

\begin{matrix} \text{statistical generation:} \quad ρ_{Z} \to λ_{gal}, \end{matrix}

(158)

\begin{matrix} \text{observational projection:} \quad λ_{gal} \to μ_{N} . \end{matrix}

(159)

This example also clarifies the role of the entropic parameter

ε

. Before shell crossing and in the absence of stochasticity, the deterministic limit

ε \to 0

is appropriate and the transport is described by the map

x (q, t)

. However, near shell crossing the Jacobian

J (q, t)

approaches zero and the deterministic map becomes singular. In that regime, the transport relation is better represented by a coupling than by a map:

\begin{matrix} π_{ε} (q, x) = exp [α_{0} (q) + β_{0} (x) - \frac{{| x - x (q, t) |}^{2}}{ε}] . \end{matrix}

(160)

For

ε \to 0

, this coupling concentrates on the deterministic graph

x = x (q, t)

:

\begin{matrix} π_{ε} (q, x) ⟶ ρ_{0} (q) δ_{D} (x - x (q, t)) . \end{matrix}

(161)

For finite

ε

, the coupling spreads around the deterministic map and represents stochastic transport, unresolved multi-streaming, velocity dispersion, coarse graining, or observational uncertainty. Thus,

ε

measures the departure from deterministic transport and controls the transition from a Monge map to a probabilistic coupling.
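The concentration described by Equations (160) and (161) can be made concrete with a simplified kernel. The sketch below is not a full entropic optimal transport solution: it normalizes the Gaussian kernel only along x for a fixed q, which fixes the first potential and sets the second to zero, whereas a genuine Schrödinger bridge would also enforce the Eulerian marginal, for example through Sinkhorn-type iterations. Under this assumption the conditional distribution of x given q is a Gaussian centered on the deterministic map with standard deviation (ε/2)^{1/2}. Function names and parameter values are hypothetical.

```python
import math

def conditional_moments(q, eps, D, A, k, L, M=4000):
    """Mean and std of x under the row-normalized kernel exp(-|x - x(q,t)|^2 / eps)."""
    x_map = q - D * (A / k) * math.sin(k * q)          # deterministic map, Eq. (131)
    xs = [(j + 0.5) * L / M for j in range(M)]
    weights = [math.exp(-((x - x_map) ** 2) / eps) for x in xs]
    total = sum(weights)                               # row normalization (beta_0 = 0)
    mean = sum(w * x for w, x in zip(weights, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs)) / total
    return x_map, mean, math.sqrt(var)

# Assumed illustrative values; q is chosen away from the box edges because this
# sketch ignores periodic wrapping.
L, D, A, q = 1.0, 1.0, 0.5, 0.3
k = 2.0 * math.pi / L
results = {eps: conditional_moments(q, eps, D, A, k, L) for eps in (1e-3, 1e-4)}
```

As ε decreases, the returned spread shrinks toward zero while the mean stays on the graph of the map, which is the discrete counterpart of the limit in Equation (161).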

Figure 3 summarizes the complete generative chain discussed above. Starting from a smoothed Gaussian random field, the Zel’dovich approximation produces a deterministic transport map and the corresponding nonlinear matter density field through mass conservation. A biased galaxy intensity field is then generated from the transported density, and the final observed catalog is obtained through Poisson sampling.
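The chain in Equation (151) can also be run forward and then inverted for the bias parameter, which gives one simple reading of inference as recovering a generative parameter from a sampled catalog. The sketch below is an assumption-laden toy: it bins the catalog into the Eulerian images of equal Lagrangian cells, where the expected count is the intensity of Equation (155) times the Jacobian of Equation (132), treats the mean density, growth factor, and amplitude as known, and estimates α by a grid search over a binned Poisson log-likelihood. It is valid only before shell crossing, and none of the names or values come from the paper.

```python
import math
import random

def jacobian(q, D, A, k):
    """Jacobian J = dx/dq of Eq. (132)."""
    return 1.0 - D * A * math.cos(k * q)

def expected_counts(alpha, nbar, D, A, k, L, M):
    """Mean galaxy count in the Eulerian image of each Lagrangian cell.

    lambda_gal dx = nbar * J**(-alpha) * J dq, evaluated at the cell midpoint.
    """
    dq = L / M
    return [nbar * dq * jacobian((i + 0.5) * dq, D, A, k) ** (1.0 - alpha)
            for i in range(M)]

def poisson_draw(mu, rng):
    """Knuth's product-of-uniforms method; adequate for the modest means used here."""
    limit, count, p = math.exp(-mu), 0, rng.random()
    while p > limit:
        count += 1
        p *= rng.random()
    return count

def log_likelihood(alpha, counts, nbar, D, A, k, L):
    """Binned Poisson log-likelihood, dropping the alpha-independent log(N_i!) term."""
    mus = expected_counts(alpha, nbar, D, A, k, L, len(counts))
    return sum(n * math.log(mu) - mu for n, mu in zip(counts, mus))

# Assumed illustrative values.
L, M, nbar, D, A, alpha_true = 1.0, 200, 4000.0, 1.0, 0.5, 1.8
k = 2.0 * math.pi / L
rng = random.Random(1)

# Forward chain: Phi -> x(q, t) -> rho_Z -> lambda_gal -> binned catalog.
counts = [poisson_draw(mu, rng)
          for mu in expected_counts(alpha_true, nbar, D, A, k, L, M)]

# Inverse step: grid search for the bias parameter alpha.
grid = [0.5 + 0.01 * j for j in range(251)]
alpha_hat = max(grid, key=lambda a: log_likelihood(a, counts, nbar, D, A, k, L))
```

A grid search is used only for transparency; any one-dimensional optimizer would serve, and a realistic analysis would also have to infer the transport state rather than assume it.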

8.6. Physical Interpretation of the Entropic Scale

The toy model discussed above clarifies the deterministic transport component of the transport–information framework. However, one of the central quantities appearing throughout the present formulation is the entropic scale parameter

ε

. In conventional entropy-regularized optimal transport,

ε

is often introduced primarily as a computational regularization parameter. Within the present framework, however, its role is considerably broader. The parameter

ε

controls the balance between deterministic transport and stochastic mixing and therefore determines the relative importance of transport geometry and information geometry in the cosmological generative process.

This interpretation becomes apparent in the Schrödinger bridge formulation,

\begin{matrix} \frac{\partial ρ}{\partial t} = - \nabla \cdot (ρ v) + ε Δ ρ, \end{matrix}

(162)

where the first term describes deterministic mass transport and the second term describes information diffusion. Dimensional analysis immediately shows that

\begin{matrix} [ε] = L^{2} T^{- 1}, \end{matrix}

(163)

which is identical to the dimension of a diffusion coefficient. Thus, at the most basic level,

ε

measures the strength of effective stochastic diffusion in the cosmological generative process.

An important point is that this diffusion should not be interpreted as a fundamental microscopic diffusion of dark matter particles. Rather, it represents an effective description of unresolved stochasticity. Several physically distinct mechanisms contribute to such stochasticity. A useful schematic decomposition is

\begin{matrix} ε = ε_{gal} + ε_{ms} + ε_{obs}, \end{matrix}

(164)

where

ε_{gal}

represents stochasticity associated with galaxy formation,

ε_{ms}

represents stochasticity induced by multistream dynamics and shell crossing, and

ε_{obs}

represents observational and numerical coarse graining.

The first contribution,

ε_{gal}

, includes scatter in halo occupation statistics, bursty star formation, stochastic feedback processes, and other baryonic effects that cannot be represented by deterministic transport alone. The second contribution,

ε_{ms}

, arises from the breakdown of the single-stream approximation. After shell crossing, multiple Lagrangian trajectories contribute to the same Eulerian position, producing velocity dispersion and effective mixing. The third contribution,

ε_{obs}

, encodes finite sampling, survey masks, smoothing kernels, finite spatial resolution, and other observational effects that transform a continuous distribution into a finite catalog.

The effective entropic scale is therefore expected to depend on the physical scale under consideration. Denoting this dependence explicitly by

ε (R)

, one expects

\begin{matrix} ε (R) \to 0 \quad (R \to \infty), \end{matrix}

(165)

because large-scale evolution is well approximated by coherent gravitational transport. In contrast, at nonlinear and galaxy scales,

\begin{matrix} ε (R) > 0, \end{matrix}

(166)

and stochastic processes become increasingly important. From this viewpoint, the scale dependence of

ε

provides a geometric characterization of the transition from deterministic large-scale structure formation to stochastic small-scale dynamics.

The geometric significance of

ε

is perhaps even more fundamental. In the limit

\begin{matrix} ε \to 0 \end{matrix}

(167)

the transport plan collapses onto a deterministic Monge map and the dynamics are governed almost entirely by Wasserstein transport geometry. For finite

ε

, the coupling acquires an information-geometric component and multiple transport paths contribute simultaneously. Thus,

ε

controls the transition between a purely transport-dominated description and a transport–information description.

9. Outlook: Extensions and Connections of Transport–Information Cosmology

In the previous sections, we showed that the cosmological generative process can be formulated as dynamics on the transport–information manifold endowed with a dual-affine structure and that observation can be understood as an m-affine projection. This structure couples the transport geometry of mass distributions with the information geometry of generative models, and it provides a principle for decomposing deformations of probability distributions into distinct geometric operations. The concrete toy model in the previous section illustrates how this framework recovers the standard Zel’dovich description in the deterministic limit while extending it to biased galaxy generation and finite sampling. In this section, we discuss how the same viewpoint connects to non-Gaussian statistics, perturbation theory, observational summary statistics, and broader applications in cosmological inference.

9.1. Extension to Non-Gaussian and Higher-Order Structures

The formulation developed in this paper is not restricted to Gaussian statistics. The Gaussian approximation appears only as a local approximation around a particular point on the transport–information manifold. More generally, nonlinear structure formation, galaxy bias, and observational projection generate non-Gaussian probability distributions. In the present framework, these effects are not treated as exceptional corrections to a Gaussian theory but as geometric deformations of the underlying probability measure.

A useful way to understand this extension is through the exponential representation of the entropic coupling,

\begin{matrix} π (x, y) = exp [α (x) + β (y) - \frac{c (x, y)}{ε}] . \end{matrix}

(168)

This expression shows that the coupling belongs to an exponential family whose natural parameters are determined by the balance between transport cost and information regularization. Departures from Gaussianity are then encoded in nonlinear responses of the potentials

α

and

β

, and hence appear as nonlinear deformations of the coupling itself.

This hierarchy of nonlinear responses gives a natural geometric interpretation of conventional cosmological statistics. The power spectrum and two-point correlation function describe second-order fluctuations of the density field. The bispectrum represents the leading nonlinear mode-coupling response, and the trispectrum and higher polyspectra encode higher-order departures from Gaussianity. Thus, in the present framework,

\begin{matrix} P (k), ξ (r), B (k_{1}, k_{2}, k_{3}), T (k_{1}, k_{2}, k_{3}, k_{4}), \dots \end{matrix}

(169)

are not independent statistical objects but different levels of a common response hierarchy on the transport–information manifold.

This interpretation also clarifies the role of primordial non-Gaussianity. For example, local-type non-Gaussianity is usually written schematically as

\begin{matrix} Φ (x) = ϕ_{G} (x) + f_{NL} [ϕ_{G}^{2} (x) - \langle ϕ_{G}^{2} \rangle], \end{matrix}

(170)

where

ϕ_{G}

is a Gaussian random field. In the present language,

f_{NL}

parametrizes a displacement away from the Gaussian submanifold in the space of probability measures. Subsequent gravitational evolution transports this initial deformation, while galaxy formation and observation project it through information-geometric transformations. Therefore, primordial non-Gaussianity, nonlinear gravitational mode coupling, and observational mixing can be described as successive geometric deformations of the same probability measure.
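The way the local transformation of Equation (170) moves a distribution off the Gaussian family can be seen already at the level of one-point statistics. The sketch below draws independent Gaussian values, applies the quadratic transformation, and compares the sample third moment with the exact one-point result 6 f σ⁴ + 8 f³ σ⁶ for a Gaussian of variance σ². The parameter values are toy assumptions chosen so that the effect is visible in a modest sample; they are not realistic cosmological amplitudes, and a one-point moment is only a stand-in for the field-level bispectrum.

```python
import random

def local_nongaussian_samples(f_nl, sigma, n, rng):
    """Draw Phi = phi + f_nl * (phi**2 - <phi**2>) with phi ~ N(0, sigma**2), Eq. (170)."""
    out = []
    for _ in range(n):
        phi = rng.gauss(0.0, sigma)
        out.append(phi + f_nl * (phi * phi - sigma * sigma))
    return out

def third_moment(values):
    """Sample third central moment."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 3 for v in values) / len(values)

# Assumed toy values.
f_nl, sigma, n = 0.1, 1.0, 200_000
rng = random.Random(2)
samples = local_nongaussian_samples(f_nl, sigma, n, rng)
gaussian_samples = local_nongaussian_samples(0.0, sigma, n, rng)

measured = third_moment(samples)
gaussian_measured = third_moment(gaussian_samples)
# Exact third moment for this one-point model: 6 f sigma^4 + 8 f^3 sigma^6.
predicted = 6.0 * f_nl * sigma ** 4 + 8.0 * f_nl ** 3 * sigma ** 6
```

Setting the nonlinearity parameter to zero returns the samples to the Gaussian family, for which the third moment vanishes up to sampling noise.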

The same viewpoint applies to non-Gaussianity generated dynamically by nonlinear structure formation. In standard perturbation theory, the density contrast is expanded as

\begin{matrix} δ (k, t) = \sum_{n = 1}^{\infty} D^{n} (t) δ^{(n)} (k), \end{matrix}

(171)

where

\begin{matrix} δ^{(n)} (k) = \int \prod_{a = 1}^{n} \frac{d^{3} k_{a}}{{(2 π)}^{3}} {(2 π)}^{3} δ_{D} (k - \sum_{a = 1}^{n} k_{a}) F_{n} (k_{1}, \dots, k_{n}) \prod_{a = 1}^{n} δ_{lin} (k_{a}) . \end{matrix}

(172)

The kernels

F_{n}

describe nonlinear mode coupling generated by gravitational evolution. From the transport–information viewpoint, these kernels can be interpreted as local coordinate expressions of nonlinear transport on the Wasserstein component of the state space. The subsequent transformation from matter density to galaxy intensity and then to a finite catalog adds information-geometric and observational response terms on top of this transport response.

Thus, non-Gaussian statistics arise from at least three geometrically distinct sources: nonlinear transport of the matter density, nonlinear galaxy formation as deformation of the generative model, and observational mixing through m-projection. The present framework does not replace the perturbative expansion in Equation (171). Rather, it provides a geometric interpretation of such expansions by identifying the tangent directions and response tensors to which the perturbative kernels correspond. In this sense, standard perturbation theory appears as a local coordinate expansion of the transport sector, while galaxy bias and observational effects extend the expansion to the full transport–information manifold.

9.2. Unified Response Theory of Cosmological Statistics

The response structure introduced in this paper redefines cosmological statistics as measurements of deformations on the transport–information manifold. In conventional analyses, the correlation function, power spectrum, bispectrum, and other summary statistics are often treated as separate descriptors of the density field. In the present formulation, they are instead organized as responses of observables to variations in the transport state and in the generative model.

Let an observable be written as

\begin{matrix} O = O [ρ, θ], \end{matrix}

(173)

where

ρ

denotes the transported matter distribution and

θ

represents parameters of the generative model, including bias, selection, and sampling effects. A first variation of the observable is decomposed as

\begin{matrix} δ O = \int \frac{δ O}{δ ρ (x)} δ ρ (x) d^{3} x + \frac{\partial O}{\partial θ^{i}} δ θ^{i} . \end{matrix}

(174)

The first term is the response to spatial rearrangement of the mass distribution and belongs to the transport sector. The second term is the response to deformation of the generative model and belongs to the information-geometric sector. This decomposition reflects the direct-sum structure of tangent vectors on the product state space.

Higher-order responses are obtained by iterating the same variation. For example, the second variation contains

\begin{matrix} δ^{2} O = \int \frac{δ^{2} O}{δ ρ (x) δ ρ (y)} δ ρ (x) δ ρ (y) d^{3} x d^{3} y + 2 \int \frac{\partial}{\partial θ^{i}} (\frac{δ O}{δ ρ (x)}) δ ρ (x) δ θ^{i} d^{3} x + \frac{\partial^{2} O}{\partial θ^{i} \partial θ^{j}} δ θ^{i} δ θ^{j} + \dots . \end{matrix}

(175)

The three displayed terms correspond respectively to pure transport response, mixed transport–information response, and pure information-geometric response. This structure makes explicit how physical density evolution, galaxy formation, and observational selection contribute separately to a measured statistic.
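As a worked illustration of Equations (174) and (175), consider a pointwise observable built from the power-law bias model of Equation (138), written in terms of the density ratio u = ρ_Z/ρ̄ and the single model parameter θ = α. This local choice is an assumption made only to keep the derivatives explicit; a general observable is a nonlocal functional of the density. The first and second derivatives are

\begin{matrix} O (u, α) = \bar{n} u^{α}, \quad \frac{\partial O}{\partial u} = \bar{n} α u^{α - 1}, \quad \frac{\partial O}{\partial α} = \bar{n} u^{α} ln u, \end{matrix}

\begin{matrix} \frac{\partial^{2} O}{\partial u^{2}} = \bar{n} α (α - 1) u^{α - 2}, \quad \frac{\partial^{2} O}{\partial u \partial α} = \bar{n} u^{α - 1} (1 + α ln u), \quad \frac{\partial^{2} O}{\partial α^{2}} = \bar{n} u^{α} {(ln u)}^{2} . \end{matrix}

Evaluated at the homogeneous state u = 1, the first variation reduces to δO = n̄ α δu, and the second variation becomes

\begin{matrix} δ^{2} O = \bar{n} [α (α - 1) {(δ u)}^{2} + 2 δ u δ α] . \end{matrix}

In this example the pure transport term carries the quadratic bias coefficient of Equation (141), the mixed transport–information term survives with unit coefficient, and the pure information-geometric term vanishes because ln u = 0 at u = 1.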

Furthermore, this response structure is consistent with the decomposition of observed curvature. Namely, the relation

\begin{matrix} K_{obs} = K_{tr} + δ K_{bias} + δ K_{obs} \end{matrix}

(176)

is understood as the separation between curvature generated by transport, deformation induced by galaxy formation, and distortion caused by observational projection and finite sampling. Thus, observed curvature is not identified directly with physical curvature but is decomposed into transport, generative, and observational contributions.

The resulting picture provides a unified response theory of cosmological statistics. Linear theory corresponds to the first-order response around a homogeneous or Gaussian reference state. Standard perturbation theory corresponds to higher-order transport response in local coordinates. Galaxy bias corresponds to information-geometric deformation of the generative model. Survey masks, smoothing, selection functions, and finite sampling correspond to m-projections. Consequently, conventional observables such as

ξ (r)

,

P (k)

, the bispectrum, curvature statistics, and catalog-level estimators are reorganized as different projections of the same underlying transport–information response structure.

9.3. Implications for Observational Pipelines and Survey Analysis

The present theory also has direct consequences for the implementation of observational data analysis. In particular, data processing in cosmological surveys has conventionally been designed as an accumulation of statistical procedures, whereas in the present framework these procedures are uniformly understood as a chain of projections and coarse-graining operations based on m-geometry. That is, an observational pipeline is described as a composition of operations that map a distribution generated in transport geometry into a form that is observable in information-geometric terms.

Conceptually, an observational pipeline is represented as the mapping

\begin{matrix} ρ_{tr} ⟶ λ_{gal} ⟶ μ_{N} ⟶ \hat{O} . \end{matrix}

(177)

Here, $ρ_{tr}$ is the matter distribution obtained through transport geometry, $λ_{gal}$ is the galaxy intensity field determined by the generative model, $μ_{N}$ is the observed measure as a finite sample, and $\hat{O}$ is the estimated observable. From the viewpoint of the present theory, the first half of this transformation belongs primarily to generative deformation in e-geometry, while the second half is described as mixing and projection in m-geometry. Therefore, the entire observational pipeline is understood as a noncommutative composition of e-deformations and m-projections.

This decomposition clarifies the role of each processing stage. Specifically, $ρ_{tr} \to λ_{gal}$ carries the nonlinearity of the generative model, $λ_{gal} \to μ_{N}$ carries sampling and finite-sample realization, and $μ_{N} \to \hat{O}$ carries statistical estimation and coarse graining. All of these are described in a unified way as deformations of probability distributions, but their geometric properties are distinct.
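The noncommutativity of e-deformations and m-projections can be checked numerically. The following is a minimal sketch, assuming a one-dimensional periodic grid, an exponential tilt (`e_tilt`) as a stand-in for a generative e-deformation, and a circular Gaussian smoothing (`m_smooth`) as a stand-in for an m-type mixing. The function names, the tilt strength `strength`, and the test density are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def normalize(p):
    """Rescale a non-negative array so that it sums to one."""
    return p / p.sum()

def e_tilt(p, field, strength=2.0):
    """Exponential tilt p -> p * exp(strength * field) / Z (affine in log p)."""
    return normalize(p * np.exp(strength * field))

def m_smooth(p, width=5.0):
    """Circular Gaussian smoothing: a mixture of shifted copies of p (linear in p)."""
    n = p.size
    idx = np.arange(n)
    lag = np.minimum(idx, n - idx)                      # circular distance in cells
    kernel = normalize(np.exp(-0.5 * (lag / width) ** 2))
    return np.real(np.fft.ifft(np.fft.fft(p) * np.fft.fft(kernel)))

# Hypothetical test density on a periodic grid.
x = np.linspace(0.0, 1.0, 128, endpoint=False)
rho = normalize(1.0 + 0.8 * np.sin(2.0 * np.pi * x))
field = np.log(rho * rho.size)                          # log of the density relative to its mean

tilt_then_mix = m_smooth(e_tilt(rho, field))
mix_then_tilt = e_tilt(m_smooth(rho), field)

# L1 distance between the two orderings; nonzero means the operations do not commute.
gap = np.abs(tilt_then_mix - mix_then_tilt).sum()
print(f"L1 gap between orderings: {gap:.4f}")
```

Both orderings return normalized distributions, but they differ because smoothing damps the harmonics generated by the tilt in one ordering and not in the other.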

This structure provides a design principle for data analysis. Each processing step is classified as a concrete transformation in m-geometry, and its effect is evaluated in terms of deformations of curvature and response. For example, smoothing attenuates curvature as a local mixing operation, while binning coarse-grains response as information loss associated with discretization. Weighting and selection functions appear as rescalings of the distribution, and all of these are uniformly treated as m-affine transformations.
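A minimal numerical sketch of these statements, under stated assumptions: smoothing is modeled as a periodic three-point mixture and binning as summation over neighbouring cells, both of which are linear maps on distributions. The Kullback–Leibler divergence between two random test distributions is used here as a simple proxy for distinguishability; it is not the paper's curvature measure, and the helper names `kl`, `rebin`, and `smooth` are hypothetical.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence sum_i p_i log(p_i / q_i), assuming q > 0 where p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def rebin(p, factor):
    """Coarse-grain by summing groups of `factor` neighbouring cells."""
    return p.reshape(-1, factor).sum(axis=1)

def smooth(p, weights=(0.25, 0.5, 0.25)):
    """Periodic three-point smoothing, i.e. a mixture of shifted copies of p."""
    w_left, w_mid, w_right = weights
    return w_left * np.roll(p, 1) + w_mid * p + w_right * np.roll(p, -1)

rng = np.random.default_rng(1)
p = rng.random(64)
p /= p.sum()
q = rng.random(64)
q /= q.sum()

# m-affinity: both operations map the mixture t*p + (1-t)*q to the mixture of the images.
t = 0.3
mix = t * p + (1 - t) * q
gap_smooth = np.abs(smooth(mix) - (t * smooth(p) + (1 - t) * smooth(q))).max()
gap_rebin = np.abs(rebin(mix, 8) - (t * rebin(p, 8) + (1 - t) * rebin(q, 8))).max()

# Information loss: distinguishability cannot increase under smoothing or binning.
kl_fine = kl(p, q)
kl_smooth = kl(smooth(p), smooth(q))
kl_binned = kl(rebin(p, 8), rebin(q, 8))
print(kl_fine, kl_smooth, kl_binned)
```

The decrease of the divergence under both maps is an instance of the data-processing inequality, which holds for any stochastic map and is therefore independent of the particular kernel or bin width chosen here.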

This viewpoint also suggests a systematic way to compare different surveys. Suppose that two surveys observe the same underlying transport-generated distribution but with different projection operators $Π_{1}$ and $Π_{2}$. Their observed catalogs may be written as

\begin{matrix} μ_{N}^{(1)} = Π_{1} (ρ_{tr}, θ), μ_{N}^{(2)} = Π_{2} (ρ_{tr}, θ) . \end{matrix}

(178)

The difference between the surveys is then attributed to the difference between the corresponding m-projections rather than to a different underlying transport process. Survey cross-calibration can therefore be formulated as the problem of identifying and correcting the projection operators relative to a common transport geometry.
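As an illustration of this cross-calibration idea, the sketch below assumes that each projection operator reduces to a known multiplicative selection function followed by Poisson sampling on a common grid. The intensity `lam`, the selection functions `sel_1` and `sel_2`, and the inverse-selection estimator are hypothetical choices for this example only; real survey projections also include masks, smoothing, and discretization.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 50, endpoint=False)

# Shared underlying intensity (expected counts per cell before selection).
lam = 400.0 * (1.0 + 0.5 * np.cos(2.0 * np.pi * x))

# Two hypothetical selection functions, both strictly positive on the grid.
sel_1 = 0.9 - 0.5 * x
sel_2 = 0.3 + 0.6 * x**2

# Observed catalogs: Poisson realizations of the selected intensity.
cat_1 = rng.poisson(sel_1 * lam)
cat_2 = rng.poisson(sel_2 * lam)

# Inverse-selection weighting maps both catalogs back to the common intensity.
lam_hat_1 = cat_1 / sel_1
lam_hat_2 = cat_2 / sel_2

raw_gap = np.abs(cat_1 - cat_2).sum() / lam.sum()
corrected_gap = np.abs(lam_hat_1 - lam_hat_2).sum() / lam.sum()
print(f"raw gap {raw_gap:.3f}, corrected gap {corrected_gap:.3f}")
```

After correction, the residual difference between the two surveys is set by shot noise rather than by the selection functions, which is the sense in which the discrepancy is attributed to the projections and not to the underlying distribution.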

Furthermore, this viewpoint organizes the treatment of observational and systematic errors geometrically. Noise and missing data are represented as random mixtures of distributions, and the resulting degradation is quantified as a distortion of the response structure. Accordingly, error correction is formulated as an inverse problem in m-geometry, and it is understood as an inverse projection from the observed distribution to the generative distribution.

The present theory also provides a clear guideline for the design of estimators. Observables should be constructed not merely as statistical summaries but as quantities that reflect geometric features of the distribution. In particular, estimators based on curvature and response simultaneously capture the structure in transport geometry and the distinguishability in information geometry, and therefore combine physical interpretability with statistical stability. In this sense, transport–information geometry does not prescribe a single new estimator but provides a classification principle for existing and future estimators according to the geometric operations to which they are sensitive.

Accordingly, transport–information geometry redefines cosmological data analysis as a chain of geometric operations. Through this redefinition, an observational pipeline is no longer a collection of empirical methods but can be designed as a systematic structure that integrates transport geometry and information geometry.

9.4. Connections to Information Geometry and Statistical Learning

The framework developed in this paper is not specific to cosmology but is deeply connected to the geometry of probability distributions in general. In particular, the exponential coupling in entropic optimal transport shares the same structure as exponential families in information geometry and demonstrates that transport geometry and information geometry are unified through a dual-affine structure.

This framework is also fundamentally consistent with statistical learning theory. Generative models and energy-based models represent distributions by directly learning the log-density function. In this setting, learning can be interpreted as the optimization of distances and curvature in distribution space, closely corresponding to geodesics and gradient flows in information geometry. Therefore, transport–information geometry provides the geometric foundation for these learning algorithms.

Further, stochastic processes based on the Schrödinger bridge can be understood as dynamical extensions of generative models. In this framework, the time evolution of distributions is described as a coupling of transport and diffusion, which is the dynamical realization of entropic optimal transport. This structure closely parallels that of diffusion models and other stochastic generative models, indicating that distribution generation and transport are governed by a common variational principle.

More explicitly, score-based generative models evolve samples by a stochastic differential equation whose drift is determined by the score $\nabla \ln p_{t}$. In the present notation, this score is a vector field in log-density space and therefore belongs naturally to the information-geometric description. At the same time, the movement of probability mass through the stochastic process is a transport operation. Thus, diffusion-based generative models provide computational realizations of the same transport–information coupling that appears in entropic optimal transport and Schrödinger bridge theory.
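The role of the score as a drift can be illustrated with a deliberately simplified sketch. It assumes a time-independent Gaussian target, so the score is available in closed form, and uses unadjusted Langevin dynamics rather than the time-dependent score of a trained diffusion model. The function names, step size, and target parameters are hypothetical.

```python
import numpy as np

def score_gaussian(x, mean=2.0, std=0.5):
    """Score d/dx ln p(x) of the Gaussian target N(mean, std**2)."""
    return -(x - mean) / std**2

def langevin(score, x0, step=2e-3, n_steps=2000, seed=0):
    """Euler-Maruyama discretization of dX = score(X) dt + sqrt(2) dW."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + step * score(x) + np.sqrt(2.0 * step) * noise
    return x

# Start all particles at the origin and let the score transport them to the target.
samples = langevin(score_gaussian, x0=np.zeros(20000))
print(samples.mean(), samples.std())
```

The drift term moves probability mass toward the target, which is the transport aspect, while the drift itself is the gradient of a log-density, which is the information-geometric aspect. The finite step size introduces a small bias in the stationary variance, so the sample moments match the target only approximately.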

This connection is particularly relevant for cosmological inference. In Bayesian reconstruction of the large-scale structure, one seeks a posterior distribution over initial conditions or latent density fields given an observed catalog. In the present framework, this problem can be interpreted as inference on the transport–information manifold: the transport component reconstructs the mass rearrangement, while the information component accounts for bias, stochastic galaxy formation, selection effects, and sampling. Consequently, transport-based reconstruction, likelihood-free inference, score-based generative modeling, and Schrödinger bridge methods can be understood as complementary computational implementations of the same geometric structure.

From these perspectives, the present theory provides a natural bridge between cosmology and machine learning. Cosmological data analysis is reformulated as geometric inference on probability distributions, while generative models in machine learning are interpreted as computational realizations of this geometry. This correspondence enables the development of new analysis methods and learning algorithms based on transport geometry and information geometry. The extensions discussed in this section illustrate the broad applicability of transport–information geometry beyond the specific context of cosmology. This general framework, which unifies the generation, transport, and observation of probability distributions through a dual-affine structure, opens promising avenues for further theoretical and practical developments in various fields.

10. Conclusions

In this paper, we have reformulated cosmological large-scale structure formation as a hierarchical generative process and expressed it within the unified framework of transport–information geometry. The central idea is to define the cosmological state space as the product manifold

\begin{matrix} G_{cosmo} = P_{2} (R^{3}) \times M_{cosmo}, \end{matrix}

(179)

where the first factor carries the Wasserstein transport geometry of mass distributions and the second carries the information geometry of generative models equipped with the Fisher metric and dual-affine connections.

On this state space, cosmological evolution is described as geometric dynamics governed by the action

\begin{matrix} A [(ρ, θ)] = \frac{1}{2} \int_{t_{0}}^{t_{1}} \left[ \int ρ (x, t) {| v (x, t) |}^{2} d^{3} x + λ g_{i j} (θ (t)) {\dot{θ}}^{i} (t) {\dot{θ}}^{j} (t) \right] d t . \end{matrix}

(180)

The first term represents the transport kinetic energy associated with mass rearrangement, while the second term represents the information-theoretic kinetic energy associated with deformation of the generative model. Accordingly, gravitational evolution and galaxy-formation modeling are described as coupled dynamics of Wasserstein transport and information-geometric deformation, while observational processes are incorporated as m-affine projections on the same geometric stage.
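As a minimal worked example of this action, assume (hypothetically, and in one spatial dimension for simplicity) that a unit-mass distribution translates rigidly with constant velocity $v_{0}$, and that the generative model is a one-parameter Gaussian family with fixed width $σ$ whose mean $θ$ moves linearly from $θ_{0}$ to $θ_{1}$ over the unit time interval $[0, 1]$. None of these choices is taken from the paper. The Fisher metric of this family is $g = 1/σ^{2}$, so that

```latex
\begin{aligned}
\int ρ(x,t)\,|v(x,t)|^{2}\,dx &= v_{0}^{2}, \qquad
g(θ)\,\dot{θ}^{2} = \frac{(θ_{1}-θ_{0})^{2}}{σ^{2}}, \\
A &= \frac{1}{2}\int_{0}^{1}\left[ v_{0}^{2} + λ\,\frac{(θ_{1}-θ_{0})^{2}}{σ^{2}} \right] dt
   = \frac{1}{2}\left[ v_{0}^{2} + λ\,\frac{(θ_{1}-θ_{0})^{2}}{σ^{2}} \right].
\end{aligned}
```

In this simple case the coupling $λ$ sets the relative cost of moving mass versus moving model parameters, and a narrower model (smaller $σ$) makes a given parameter shift more expensive because the corresponding distributions are more distinguishable.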

A key ingredient of the present formulation is entropic optimal transport, which provides a variational bridge between transport geometry and information geometry. In particular, the optimal coupling takes the exponential form

\begin{matrix} π (x, y) = \exp \left( α (x) + β (y) - \frac{c (x, y)}{ε} \right), \end{matrix}

(181)

which exhibits an e-flat structure, while the full space of couplings forms an m-flat probability simplex. This makes the dual-affine structure explicit and allows the cosmological generative process to be understood as an alternation between generative deformations in e-geometry and mixing transformations in m-geometry.
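The exponential form of the coupling can be verified in a discrete setting with Sinkhorn scaling. The sketch below is illustrative only: it assumes a small one-dimensional grid, a quadratic cost, two Gaussian-shaped marginals, and a fairly large entropic scale so that the plain (non-log-domain) iteration is numerically safe. The function name `sinkhorn` and all parameter values are hypothetical.

```python
import numpy as np

def sinkhorn(a, b, cost, eps, n_iter=2000):
    """Entropic optimal transport by alternating marginal scalings."""
    K = np.exp(-cost / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                     # enforce the row marginal
        v = b / (K.T @ u)                   # enforce the column marginal
    plan = u[:, None] * K * v[None, :]
    return plan, np.log(u), np.log(v)

x = np.linspace(0.0, 1.0, 50)
a = np.exp(-0.5 * ((x - 0.3) / 0.10) ** 2)
a /= a.sum()
b = np.exp(-0.5 * ((x - 0.7) / 0.15) ** 2)
b /= b.sum()
cost = (x[:, None] - x[None, :]) ** 2
eps = 0.2

plan, alpha, beta = sinkhorn(a, b, cost, eps)

# Discrete analogue of the exponential coupling: log pi_ij = alpha_i + beta_j - c_ij / eps.
log_plan = alpha[:, None] + beta[None, :] - cost / eps
print(np.abs(plan.sum(axis=1) - a).max(), np.abs(plan.sum(axis=0) - b).max())
```

The log-coupling is affine in the potentials `alpha` and `beta`, which is the e-flat structure, while the marginal constraints are linear in the plan, which is the m-flat structure. For small `eps` a log-domain implementation would be needed to avoid underflow.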

The physical meaning of the entropic scale $ε$ was also clarified. Rather than being merely a numerical regularization parameter, $ε$ acts as an effective measure of stochasticity and information diffusion. In the deterministic limit $ε \to 0$, the framework reduces to mass-conserving transport and recovers the standard continuity equation and the Zel’dovich approximation. For finite $ε$, stochastic transport, multistream mixing, galaxy-formation scatter, and observational coarse graining are incorporated within a unified geometric description.

Observation is then naturally formulated as an m-affine projection, where the observed catalog is obtained through selection functions, survey windows, and finite sampling. Under this formulation, observed curvature is not identical to the true transport curvature, but is understood as the m-projected image of the generative curvature. This leads to the geometric decomposition

\begin{matrix} K_{obs} = K_{tr} + δ K_{samp} + δ K_{sel} + δ K_{mix} + δ K_{disc}, \end{matrix}

(182)

which allows a clear separation between quantities belonging to the physical generative process and those arising from observational projection.

To demonstrate the concrete realization of the framework, we constructed a one-dimensional toy model based on a smoothed Gaussian random field and the Zel’dovich approximation. The resulting generative chain $δ_{lin} ⟶ x (q) ⟶ ρ_{Z} ⟶ λ_{gal} ⟶ μ_{N}$ explicitly illustrates the successive roles of deterministic transport, information-geometric deformation associated with galaxy formation, and observational projection through finite sampling. This example shows that the present framework is not merely an abstract reformulation but a direct extension of standard cosmological structure formation.
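A chain of this kind can be sketched in a few lines. The version below is a hedged illustration and not the paper's implementation: it assumes a periodic unit box, a Gaussian-smoothed white-noise field for the linear density contrast, the one-dimensional Zel’dovich map with displacement gradient equal to minus the density contrast, a power-law bias as a stand-in for the galaxy-formation step, and Poisson sampling. The function name `toy_chain` and all parameter values are hypothetical.

```python
import numpy as np

def toy_chain(n=512, n_bins=64, box=1.0, smooth=0.02, amp=0.5,
              bias=1.5, nbar=2000.0, seed=0):
    """delta_lin -> x(q) -> rho_Z -> lambda_gal -> mu_N on a periodic 1D box."""
    rng = np.random.default_rng(seed)
    q = (np.arange(n) + 0.5) * box / n                  # Lagrangian positions
    k = 2.0 * np.pi * np.fft.rfftfreq(n, d=box / n)     # angular wavenumbers

    # delta_lin: white noise with a Gaussian filter, rescaled to rms = amp.
    delta_k = np.fft.rfft(rng.standard_normal(n)) * np.exp(-0.5 * (k * smooth) ** 2)
    delta_k[0] = 0.0
    delta = np.fft.irfft(delta_k, n)
    scale = amp / delta.std()
    delta, delta_k = delta * scale, delta_k * scale

    # x(q) = q + psi(q) with d(psi)/dq = -delta_lin (1D Zel'dovich map).
    psi_k = np.zeros_like(delta_k)
    psi_k[1:] = -delta_k[1:] / (1j * k[1:])
    x = (q + np.fft.irfft(psi_k, n)) % box

    # rho_Z: mass-conserving binned density, normalized to unit mean.
    counts, _ = np.histogram(x, bins=n_bins, range=(0.0, box))
    rho = counts / counts.mean()

    # lambda_gal: assumed power-law bias, expressed as expected galaxies per bin.
    lam = rho ** bias
    lam *= nbar / lam.sum()

    # mu_N: Poisson realization of the galaxy intensity.
    catalog = rng.poisson(lam)
    return {"delta": delta, "x": x, "rho": rho, "lam": lam, "catalog": catalog}

chain = toy_chain()
print(chain["catalog"].sum(), "galaxies drawn for an expected", chain["lam"].sum())
```

The three stages are kept as separate steps so that each can be replaced independently, for example by a different bias model or by a selection function applied before sampling.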

The conclusions of this work can be summarized as follows. Cosmological large-scale structure formation is a dual-affine process consisting of mass transport, statistical generation, and observational mixing, and its natural stage is the transport–information manifold. Within this framework, gravitational evolution is unified as generative transport in e-geometry, while observation is unified as projection in m-geometry. Thus, cosmology can be compactly expressed as the coupling of transport and information.

The framework furthermore provides a common geometric interpretation of several concepts that are usually discussed separately, including nonlinear perturbative evolution, non-Gaussian statistics, galaxy bias, finite sampling, survey selection, Bayesian reconstruction, and modern generative inference methods. In this picture, conventional perturbation theory appears as a local coordinate expansion of transport dynamics, while higher-order statistics and response functions correspond to geometric response tensors on the transport–information manifold.

The significance of this formulation lies not merely in introducing a new mathematical language to cosmology. Rather, the essential contribution is the reorganization of gravitational evolution, galaxy bias, finite sampling, observational selection, and inference problems—traditionally treated separately—under a single geometric principle given by the coupling of transport geometry and information geometry. Through this reorganization, cosmological structure formation is understood as dynamics of probability distributions, and observational data analysis is formulated as its geometric inverse problem.

From this perspective, transport–information geometry provides a common framework connecting optimal transport cosmology, Schrödinger bridge dynamics, information geometry, Bayesian reconstruction, and modern diffusion-based generative methods. This unification suggests that transport and information are not competing descriptions of cosmological evolution but complementary aspects of a single probabilistic geometric structure.

Therefore, transport–information geometry is positioned not only as a geometric framework for cosmological structure formation but also as a general language for the generation, transport, and observation of probability distributions.

Funding

This work was supported by JSPS Grant-in-Aid for Scientific Research (24H00247). This work was also supported in part by the Collaboration Funding of the Institute of Statistical Mathematics “Machine-Learning-Based Cosmogony: From Structure Formation to Galaxy Evolution”.

Data Availability Statement

No new data were created in this study.

Acknowledgments

We sincerely thank Shiro Ikeda for providing the ideas that motivated this work.

Conflicts of Interest

The author declares no conflicts of interest.

Figure 1. Schematic illustration of the dual-affine generative process in transport–information geometry. Here, $θ$ denotes the generative-model parameters defined on the information manifold. The cosmological state is described by the coupled evolution of the mass distribution $ρ_{t}$ in transport space and the generative parameters $θ_{t}$ on the information manifold. The solid trajectories represent e-geometric generative deformation, corresponding to deterministic transport and model evolution. The diffuse spreading around the transport trajectory represents intrinsic stochastic mixing, modeled as diffusion $ε Δ ρ_{t}$. The final projection to the observed catalog $μ_{N}$ represents m-geometric mixing and projection, incorporating sampling, selection, and coarse-graining effects. This diagram emphasizes that m-geometry appears both intrinsically in the stochastic dynamics and externally in the observational process.

Figure 2. Illustrative Gaussian toy model demonstrating the transport–information framework: (a) Initial and final Gaussian densities, representing the endpoints of a transport process. (b) Deterministic displacement interpolation corresponding to the transport-dominated limit $ε \to 0$, which follows an optimal-transport geodesic. (c) Entropy-regularized evolution for finite $ε$, where the density acquires an information-diffusion component. Larger values of $ε$ produce broader intermediate distributions, illustrating the increasing role of stochastic mixing. (d) Dual-affine interpretation of the toy model. The upper level represents the cosmological generative chain from the initial density to the observed catalog, while the lower level shows the corresponding transport and information sectors. The central entropic optimal transport (OT)/Schrödinger bridge block connects the Wasserstein transport geometry and the KL/Fisher information geometry. Although highly simplified, this example demonstrates that the proposed framework is operational and produces explicitly computable transport and diffusion trajectories rather than remaining a purely abstract geometric construction.

Figure 3. One-dimensional realization of the cosmological generative chain using a smoothed Gaussian random field and the Zel’dovich approximation. Panels (a–e) show the successive transformations $δ_{lin} \to x (q) \to ρ_{Z} \to λ_{gal} \to μ_{N}$. Panel (a) shows the initial density contrast field, panel (b) the Zel’dovich transport map, panel (c) the transported matter density, panel (d) the biased galaxy intensity field, and panel (e) a Poisson realization of the resulting galaxy catalog. In panel (d), the dashed curve shows the underlying matter density field for comparison. The figure illustrates the separation between deterministic mass transport, information-geometric deformation associated with galaxy formation, and observational projection through finite sampling.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Takeuchi, T.T. A Transport–Information Geometric Formulation of Cosmic Structure Formation: A Unified Dual-Affine Perspective. Symmetry 2026, 18, 992. https://doi.org/10.3390/sym18060992
