Diffeological Statistical Models, the Fisher Metric and Probabilistic Mappings

Hông Vân Lê

doi:10.3390/math8020167

Institute of Mathematics, Czech Academy of Sciences, Zitna 25, 11567 Praha 1, Czech Republic

Mathematics 2020, 8(2), 167; https://doi.org/10.3390/math8020167

This article belongs to the Special Issue Geometry and Topology in Statistics

Abstract

We introduce the notion of a

C^{k}

-diffeological statistical model, which allows us to apply the theory of diffeological spaces to (possibly singular) statistical models. In particular, we introduce a class of almost 2-integrable

C^{k}

-diffeological statistical models that encompasses all known statistical models for which the Fisher metric is defined. This class contains a statistical model which does not appear in the Ay–Jost–Lê–Schwachhöfer theory of parametrized measure models. Then, we show that, for any positive integer k , the class of almost 2-integrable

C^{k}

-diffeological statistical models is preserved under probabilistic mappings. Furthermore, the monotonicity theorem for the Fisher metric also holds for this class. As a consequence, the Fisher metric on an almost 2-integrable

C^{k}

-diffeological statistical model

P \subset P (X)

is preserved under any probabilistic mapping

T : X ⇝ Y

that is sufficient w.r.t. P. Finally, we extend the Cramér–Rao inequality to the class of 2-integrable

C^{k}

-diffeological statistical models.

Keywords:

statistical model; diffeology; the Fisher metric; probabilistic mapping; Cramér-Rao inequality

1. Introduction

In mathematical statistics, the notion of a statistical model and the notion of a parameterized statistical model are of central importance [1]. For a measurable space

X

, let us denote by

P (X)

the space of all probability measures on

X

. According to currently accepted theories, see e.g., [1] and the references therein, a statistical model is a subset

P_{X} \subset P (X)

and a parameterized statistical model is a parameter set

Θ

, together with a mapping

p : Θ \to P (X)

. The image

p (Θ) \subset P (X)

is a statistical model endowed with the parameterization

p : Θ \to p (Θ)

. If the parameter set

Θ

is a smooth manifold, then we can study a statistical model

p (Θ)

, endowed with a parameterization

p : Θ \to p (Θ) \subset P (X)

, by applying differential geometric techniques to

Θ

and to the smooth mappings

p : Θ \to P (X)

.

This idea lies at the heart of the field of information geometry, a branch of mathematical statistics in which (parameterized) statistical models are studied using techniques of differential geometry [2,3,4,5]. In the book “Information Geometry” by Ay, Jost, Lê, and Schwachhöfer, a parameterized statistical model is a triple

(M, X, p)

where M is a Banach manifold,

X

is a measurable space, and

i \circ p : M \overset{p}{\to} P (X) \overset{i}{\to} S (X)

is a

C^{1}

-map. Here

S (X)

is the Banach space of all signed finite measures on

X

endowed with the total variation norm

{∥ \cdot ∥}_{T V}

and i is the natural inclusion. We would like to emphasize that the concept of a parameterized statistical model introduced in [5,6,7] encompasses statistical models endowed with the structure of a finite dimensional manifold [2,3,8], or with the structure of an infinite dimensional Banach manifold [9]. The theory of parameterized measure models, moreover, allows us to study singular statistical models

P_{X}

using differential geometric techniques, if

P_{X}

is endowed with a parameterization by a Banach manifold.

In this study, inspired by the theory of diffeological spaces founded by Souriau and developed further by many people, we shall generalize the concept of a parameterized statistical model to the concept of a

C^{k}

-diffeological statistical model

P \subset P (X)

, which, by definition, is a subset in

P (X)

endowed with a compatible

C^{k}

-diffeology. We shall show that the concept of a

C^{k}

-diffeological statistical model is more flexible than the concept of a parameterized statistical model. In particular, the image

p (M)

of any parameterized statistical model

(M, X, p)

has a natural compatible

C^{1}

-diffeology. Moreover, for any

k \in N^{+} \cup \infty

, any subset in

P (X)

can be provided with a compatible

C^{k}

-diffeology (and hence it has a structure of a

C^{k}

-diffeological statistical model).

Furthermore, not every subset in

P (X)

can be written as

p (M)

for some parameterized statistical model

(M, X, p)

. Hence the class of

C^{1}

-diffeological statistical models is larger than the class of statistical models parameterized by Banach manifolds as in the Ay–Jost–Lê–Schwachhöfer theory. We also conceptually extend many results of the Ay–Jost–Lê–Schwachhöfer theory, concerning the differential geometry of parameterized statistical models and their application to statistics, to the class of

C^{k}

-diffeological statistical models, using the theory of probabilistic mappings, developed in a recent work by Jost, Lê, Luu and Tran [10].

Our paper is organized as follows. In the second section we introduce the notions of

C^{k}

-diffeological statistical models, almost 2-integrable

C^{k}

-diffeological statistical models, and 2-integrable

C^{k}

-diffeological statistical models. In the third section we recall the notion of probabilistic mappings and related results in [10] and prove that the class of (almost 2-integrable/resp. 2-integrable)

C^{k}

-diffeological statistical models is preserved under probabilistic mappings (Theorem 1). Then we extend the monotonicity of the Fisher metric on 2-integrable parameterized statistical models to the class of almost 2-integrable

C^{k}

-diffeological statistical models (Theorem 2). In the last section, we prove a diffeological version of the Cramér–Rao inequality (Theorem 3) which extends previously known versions of the Cramér–Rao inequality in [5,11]. We conclude our paper with a discussion on some future directions and open questions.

2. Almost 2-Integrable Diffeological Statistical Models

Given a statistical model,

P \subset P (X)

, which we also denote by

P_{X}

, it is known that

P_{X}

is endowed with a natural geometric structure induced from the Banach space

(S (X), ∥ \cdot ∥_{T V})

.

Definition 1.

(cf. [5], Definition 3.2, p. 141) (1) Let

(V, ∥ \cdot ∥)

be a Banach space,

X \overset{i}{↪} V

be an arbitrary subset, where i denotes the inclusion, and

x_{0} \in X

. Then

v \in V

is called a tangent vector of X at

x_{0}

, if there is a

C^{1}

-map

c : R \to X

, i.e., the composition

i \circ c : R \to V

is a

C^{1}

-map, such that

c (0) = x_{0}

and

\dot{c} (0) = v

.

(2) The tangent (double) cone

C_{x} X

at a point

x \in X

is defined as the subset of the tangent space

T_{x} V = V

that consists of tangent vectors of X at x. The tangent space

T_{x} X

is the linear hull of the tangent cone

C_{x} X

.

(3) The tangent cone fibration

C X

(resp. the tangent fibration

T X

) is the union

\cup_{x \in X} C_{x} X

(resp.

\cup_{x \in X} T_{x} X

), which is a subset of

V \times V

and, therefore, it is endowed with the induced topology from

V \times V

.

Remark 1.

(1) The notion of a tangent cone in Definition 1 occurs in a similar fashion in the theory of singular spaces, see e.g., [12], §3, [13], §3, [14], p. 166.

(2) Definition 1 differs from [5], Definition 3.1, in that, in Definition 1, the domain of a

C^{1}

-curve c is

R

and in [5] the domain of a

C^{1}

-curve c is

(- ε, ε)

. Since

(- ε, ε)

is diffeomorphic to

R

, the two choices of the domain of c are equivalent.

Example 1.

Let us consider a mixture family

P_{X}

of probability measures

p_{η} μ_{0}

on

X

that are dominated by

μ_{0} \in P (X)

, where the density functions,

p_{η}

, are of the following form

p_{η} (x) : = g^{1} (x) η_{1} + g^{2} (x) η_{2} + g^{3} (x) (1 - η_{1} - η_{2}) for x \in X .

(1)

Here

g^{i}

, for

i = 1, 2, 3

, are nonnegative functions on

X

, such that

E_{μ_{0}} (g^{i}) = 1

and

η = (η_{1}, η_{2}) \in D_{b} \subset R^{2}

is a parameter, which will be specified as follows. Let us divide the square

D = [0, 1] \times [0, 1] \subset R^{2}

into smaller squares and color them in black and white as with a chessboard. Let

D_{b}

be the closure of the subset of D colored in black. If η is an interior point of

D_{b}

, then

C_{p_{η}} P_{X} = R^{2}

. If η is a boundary point of

D_{b}

, then

C_{p_{η}} P_{X} = R

. If η is a corner point of

D_{b}

, then

C_{p_{η}} P_{X}

consists of two intersecting lines.

Let $P_{X}$ be a statistical model. Then it is known that any $v \in C_{ξ} P_{X}$ is dominated by $ξ$ . Hence the logarithmic representation of v

$log v : = d v / d ξ$

(2)

is an element of $L^{1} (X, ξ)$ . The set ${log v | v \in C_{ξ} P_{X}}$ is a subset in $L^{1} (X, ξ)$ . We denote it by $log (C_{ξ} P_{X})$ and will call it the logarithmic representation of $C_{ξ} P_{X}$ .

Next we want to put a Riemannian metric on a statistical model $P_{X}$ , i.e., to put a positive quadratic form $g$ on each tangent space $T_{ξ} P_{X} \subset L^{1} (X, ξ)$ . The space $L^{1} (X, ξ)$ does not have a natural metric, but its subspace $L^{2} (X, ξ)$ is a Hilbert space.

Definition 2.

A statistical model

P_{X}

will be called almost 2-integrable, if

log (C_{ξ} P_{X}) \subset L^{2} (X, ξ)

(3)

for all

ξ \in P_{X}

. In this case we define the Fisher metric

g

on

P_{X}

as follows. For each

v, w \in C_{ξ} P_{X}

g_{ξ} (v, w) : = {⟨ log v, log w ⟩}_{L^{2} (X, ξ)} = \int_{X} log v \cdot log w d ξ .

(4)

Since

T_{ξ} P_{X}

is the linear hull of

C_{ξ} P_{X}

, Formula (4) extends uniquely to a positive quadratic form on

T_{ξ} P_{X}

, which is called the Fisher metric.
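To make Formula (4) concrete, the following minimal sketch evaluates the Fisher metric on a finite sample space, where a probability measure with full support is a vector of positive weights, a tangent vector is a signed measure of total mass zero, and the logarithmic representation is the pointwise quotient. The finite setting, the function names and the numerical values are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction


def log_representation(v, xi):
    """Density dv/dxi of a signed measure v w.r.t. a probability vector xi.

    Assumes xi has full support, so v is automatically dominated by xi.
    """
    return [vi / xi_i for vi, xi_i in zip(v, xi)]


def fisher_metric(v, w, xi):
    """g_xi(v, w) = integral of (dv/dxi) * (dw/dxi) d xi, the finite form of (4)."""
    log_v = log_representation(v, xi)
    log_w = log_representation(w, xi)
    return sum(a * b * p for a, b, p in zip(log_v, log_w, xi))


# A point xi of the simplex over three outcomes and two tangent vectors
# (signed measures of total mass zero). All values are hypothetical.
xi = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
v = [Fraction(1), Fraction(-1), Fraction(0)]
w = [Fraction(0), Fraction(1), Fraction(-1)]

g_vv = fisher_metric(v, v, xi)
g_vw = fisher_metric(v, w, xi)
print(g_vv, g_vw)
```

On a finite space the formula reduces to the sum of v(x) w(x) / ξ(x) over x, which is why exact rational arithmetic suffices here.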

Example 2.

Let us reconsider Example 1. Recall that our statistical model

P_{X}

is parameterized by a map

p : D_{b} \to S (X), η \mapsto p_{η} \cdot μ_{0},

which is the restriction of the affine map

L : R^{2} \to S (X)

, defined by the same formula. Hence, any tangent vector

\tilde{v} \in T_{η} P_{X}

can be written as

\tilde{v} = d p (v)

where

v \in T_{η} D_{b}

. For

v = (v_{1}, v_{2}) \in T_{η} D_{b}

, we have

d p (v) = [(g^{1} - g^{3}) v_{1} + (g^{2} - g^{3}) v_{2}] μ_{0}

. If

g^{i} (x) > 0

for all

x \in X

and

i = 1, 2, 3

, then

p_{η} (x) > 0

for all

x \in X

and all

η \in D_{b}

. Therefore

log d p {(v)}_{| p (η)} = \frac{d p (v)}{d (p_{η} μ_{0})} = \frac{(g^{1} - g^{3}) v_{1} + (g^{2} - g^{3}) v_{2}}{p_{η}} \in L^{1} (X, p (η)) .

Hence

P_{X}

is almost 2-integrable, if

\frac{g^{1} - g^{3}}{\sqrt{p_{η}}}, \frac{g^{2} - g^{3}}{\sqrt{p_{η}}} \in L^{2} (X, μ_{0}) \forall η \in D_{b} .

In this case we have

g_{| p (η)} (d p (v), d p (w)) = {⟨ log d p (v), log d p (w) ⟩}_{L^{2} (X, p (η))} .

(5)
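The computation in Example 2 can be sketched numerically on a finite sample space. The four-point space, the uniform reference measure and the particular functions g^1, g^2, g^3 below are hypothetical choices satisfying the stated constraints (positivity and unit mean); the chessboard set D_b is not modelled, and η is simply assumed to be one of its interior points.

```python
from fractions import Fraction as F

# Hypothetical data: X = {0, 1, 2, 3}, mu0 uniform, each g^i > 0 with E_{mu0}(g^i) = 1.
mu0 = [F(1, 4)] * 4
g1 = [F(2), F(1), F(1, 2), F(1, 2)]
g2 = [F(1, 2), F(2), F(1), F(1, 2)]
g3 = [F(1), F(1), F(1), F(1)]


def density(eta):
    """p_eta = g^1 eta_1 + g^2 eta_2 + g^3 (1 - eta_1 - eta_2)."""
    e1, e2 = eta
    return [a * e1 + b * e2 + c * (1 - e1 - e2) for a, b, c in zip(g1, g2, g3)]


def dp(v):
    """Density of dp(v) w.r.t. mu0: (g^1 - g^3) v_1 + (g^2 - g^3) v_2."""
    v1, v2 = v
    return [(a - c) * v1 + (b - c) * v2 for a, b, c in zip(g1, g2, g3)]


def fisher(eta, v, w):
    """Formula (5): integral of (dp(v)/p_eta) * (dp(w)/p_eta) w.r.t. p_eta * mu0."""
    p = density(eta)
    return sum(dv * dw / px * m for dv, dw, px, m in zip(dp(v), dp(w), p, mu0))


eta = (F(1, 4), F(1, 4))          # assumed to be an interior point of D_b
e1, e2 = (1, 0), (0, 1)
gram = [[fisher(eta, a, b) for b in (e1, e2)] for a in (e1, e2)]
print(gram)
```

The resulting 2 × 2 matrix is the Fisher metric in the coordinates (η_1, η_2) at the chosen point.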

Next we shall introduce the notion of a

C^{k}

-diffeological statistical model.

Definition 3.

For

k \in N^{+} \cup \infty

and a nonempty set X, a

C^{k}

-diffeology of X is a set

D

of mappings

p : U \to X

, where U is an open domain in

R^{n}

, and n runs over nonnegative integers, such that the three following axioms are satisfied.

D1. Covering. The set

D

contains the constant mappings

x : r \mapsto x

, defined on

R^{n}

, for all

x \in X

and for all

n \in N

.

D2. Locality. Let

p : U \to X

be a mapping. If for every point

r \in U

there exists an open neighborhood V of r, such that

p_{| V}

belongs to

D

then the map

p

belongs to

D

.

D3. Smooth compatibility. For every element

p : U \to X

of

D

, for every real domain V, for every

ψ \in C^{k} (V, U)

,

p \circ ψ

belongs to

D

.

A

C^{k}

-diffeological space is a nonempty set equipped with a

C^{k}

-diffeology

D

. Elements

p : U \to X

of

D

will be called

C^{k}

-maps from U to X.

A statistical model

P_{X}

endowed with a

C^{k}

-diffeology

D_{X}

will be called a

C^{k}

-diffeological statistical model, if for any map

p : U \to P_{X}

in

D_{X}

the composition

i \circ p : U \to S (X)

is a

C^{k}

-map.

Remark 2.

(1) In [14], Iglesias-Zemmour considered only

C^{\infty}

-diffeologies. The notion of a

C^{k}

-diffeology, as given in Definition 3 is a straightforward adaptation of the concept of a smooth diffeology, as given in [14], §1.5.

(2) As

(S (X), ∥ \cdot ∥_{T V})

is a Banach space, by [15], Lemma 3.11, p. 30, a compatible

C^{\infty}

-diffeology on a statistical model

P_{X}

is defined by smooth maps

c : R \to P_{X}

.

(3) Given a

C^{k}

-diffeological statistical model

(P_{X}, D_{X})

and

ξ \in P_{X}

, the tangent cone

C_{ξ} (P_{X}, D_{X})

is the subset of

C_{ξ} P_{X}

that consists of the tangent vectors

\dot{c} (0)

of

C^{k}

-curves

c : R \to P_{X}

in

D_{X}

, such that

c (0) = ξ

. Similarly, the tangent space

T_{ξ} (P_{X}, D_{X})

is the linear hull of

C_{ξ} (P_{X}, D_{X})

.

(4) Let

(P_{X}, D_{X})

be a

C^{k}

-diffeological statistical model and V a locally convex vector space. A map

φ : P_{X} \to V

is called Gateaux-differentiable on

(P_{X}, D_{X})

if for any

C^{k}

-curve

c : R \to P_{X}

in

D_{X}

the composition

φ \circ c : R \to V

is differentiable. We recommend [15] for differential calculus on locally convex vector spaces.

Example 3.

(1) Let

(M, X, p)

be a parameterized statistical model. Then

(p (M), D_{X})

is a

C^{1}

-diffeological statistical model where

D_{X}

consists of all

C^{1}

-maps

q : R^{n} \supset U \to p (M)

, such that there exists a

C^{1}

-map

ψ^{M} : U \to M

and

q = p \circ ψ^{M}

.

(2) Let

P_{X}

be a statistical model. Then

P_{X}

can be endowed with a structure of a

C^{k}

-diffeological statistical model for any

k \in N^{+} \cup \infty

, where its diffeology

D_{X}^{(k)}

consists of all mappings

p : U \to P_{X}

, such that the composition

i \circ p : U \to S (X)

is of the class

C^{k}

, where U is any open domain in

R^{n}

for

n \in N

.

(3) Let

X

be the closed interval

[0, 1]

. Let

P_{X} : = \{ f \cdot μ_{0} \}

, where

f \in C^{\infty} (X)

, such that

\int_{X} f d μ_{0} = 1

and

f (x) > 0

for all

x \in X

. We claim that there does not exist a parameterized statistical model

(M, X, p)

, such that

P_{X} = p (M)

. Assume the opposite, i.e., there is a

C^{1}

-map

p : M \to S (X)

, such that

p (M) = P_{X}

. Then for any

m \in M

we have

d p (T_{m} (M)) = T_{p (m)} P_{X} = {f \in C^{\infty} (X) | \int_{X} f d μ_{0} = 0}

. However, this is not the case, as it is known that the space

C^{\infty} ([0, 1])

cannot be the image of a linear bounded map from a Banach space M to

L^{1} ([0, 1])

, see e.g., [16], p. 1434.

Definition 4.

A

C^{k}

-diffeological statistical model

(P_{X}, D_{X})

will be called almost 2-integrable, if

log (C_{ξ} (P_{X}, D_{X})) \subset L^{2} (X, ξ)

for all

ξ \in P_{X}

.

An almost 2-integrable

C^{k}

-diffeological statistical model

(P_{X}, D_{X})

will be called 2-integrable, if for any

C^{k}

-map

p : U \to P_{X}

in

D_{X}

, the function

v \mapsto {| d p (v) |}_{g}

is continuous on

T U

.

Example 4.

(1) By [5], Theorem 3.2, p. 155, a parameterized statistical model

(M, X, p)

is 2-integrable, if and only if

(p (M), p_{*} (D_{M}))

is a 2-integrable

C^{1}

-diffeological statistical model.

(2) The

C^{1}

-diffeological statistical model

(P_{X}, D_{X}^{(1)})

in Example 3(3) is 2-integrable, though there is no parameterized statistical model

(M, X, p)

such that

p (M) = P_{X}

.

(3) Let

X

be a measurable space and λ be a σ-finite measure. In [17], p. 274, Friedrich considered a family

P (λ) : = {μ \in P (X) | μ ≪ λ}

that is endowed with the following diffeology

D (λ)

. A curve

c : R \to P (λ)

is a

C^{1}

-curve, if

log \dot{c} (t) \in L^{2} (X, c (t)) .

Hence

(P (λ), D (λ))

is an almost 2-integrable

C^{1}

-diffeological statistical model.

Remark 3.

The axiomatics of Espaces différentiels, which became later the diffeological spaces, were introduced by J.-M. Souriau in the beginning of the nineteen-eighties [18]. Diffeology is a variant of the theory of differentiable spaces, introduced and developed a few years before by K.T. Chen [19]. As I have worked with a different theory of smooth structures on singular spaces [12,13], I appreciate the elegance of the theory of diffeology for its consistent and simple treatment of smooth structures on (possibly infinite dimensional) singular spaces. The best source for diffeology is the monograph by P. Iglesias-Zemmour [14].

3. Probabilistic Mappings

In 1962, Lawvere proposed a categorical approach to probability theory, where morphisms are Markov kernels, and most importantly, he supplied the space

P (X)

with a natural

σ

-algebra

Σ_{w}

, making the notion of Markov kernels and hence many constructions in probability theory and mathematical statistics functorial.

Let us recall the definition of

Σ_{w}

. Given a measurable space

X

, let

F_{s} (X)

denote the linear space of simple functions on

X

. Recall that

S (X)

is the space of all signed finite measures on

X

. There is a natural homomorphism

I : F_{s} (X) \to S^{*} (X) : = H o m (S (X), R), f \mapsto I_{f}

, defined by integration:

I_{f} (μ) : = \int_{X} f d μ

for

f \in F_{s} (X)

and

μ \in S (X)

. Following Lawvere [20], we define

Σ_{w}

to be the smallest

σ

-algebra on

S (X)

, such that

I_{f}

is measurable for all

f \in F_{s} (X)

. Let

M (X)

denote the space of all finite nonnegative measures on

X

. We also denote by

Σ_{w}

, the restriction of

Σ_{w}

to

M (X)

,

M^{*} (X) : = M (X) ∖ {0}

, and

P (X)

.

For a topological space $X$ we shall consider the natural Borel $σ$ -algebra $B (X)$ . Then, every continuous function is measurable w.r.t. $B (X)$ . If $X$ is, moreover, a metric space, then $B (X)$ is the smallest $σ$ -algebra making any continuous function measurable ([21], Lemma 2.13).

Let $C_{b} (X)$ be the space of bounded continuous functions on a topological space $X$ . We denote by $τ_{v}$ the smallest topology on $S (X)$ such that, for any $f \in C_{b} (X)$ , the map $I_{f} : (S (X), τ_{v}) \to R$ is continuous. We also denote by $τ_{v}$ the restriction of $τ_{v}$ to $M (X)$ and $P (X)$ , which is also called the weak topology that generates the weak convergence of probability measures. It is known that $(P (X), τ_{v})$ is separable and metrizable if and only if $X$ is ([21], Theorem 3.1.4, p. 104). If $X$ is separable and metrizable, then the Borel $σ$ -algebra on $P (X)$ generated by $τ_{v}$ coincides with $Σ_{w}$ .

Definition 5.

([10], Definition 2.4) A probabilistic mapping (or an arrow) from a measurable space

X

to a measurable space

Y

is a measurable mapping from

X

to

(P (Y), Σ_{w})

.

We shall denote by

\bar{T} : X \to (P (Y), Σ_{w})

the measurable mapping defining/generating a probabilistic mapping

T : X ⇝ Y

. Similarly, for a measurable mapping

p : X \to P (Y)

we shall denote by

\underset{̲}{p} : X ⇝ Y

the generated probabilistic mapping. Note that a probabilistic mapping is denoted by a curved arrow and a measurable mapping by a straight arrow.

Example 5.

([10], Example 2.6) (1) Assume that

X

is separable and metrizable. Then the identity mapping

I d_{P} : (P (X), τ_{v}) \to (P (X), τ_{v})

is continuous, and hence measurable w.r.t. the Borel σ-algebra

Σ_{w} = B (τ_{v})

. Consequently,

I d_{P}

generates a probabilistic mapping

e v : (P (X), B (τ_{v})) ⇝ (X, B (X))

and we write

\bar{e v} = I d_{P}

. Similarly, for any measurable space

X

, we also have an arrow (a probabilistic mapping)

e v : (P (X), Σ_{w}) ⇝ X

generated by the measurable mapping

\bar{e v} = I d_{P}

.

(2) Let

δ_{x}

denote the Dirac measure concentrated at x. It is known that the map

δ : X \to (P (X), Σ_{w}), x \mapsto δ (x) : = δ_{x}

, is measurable [22]. If

X

is a topological space, then the map

δ : X \to (P (X), τ_{v})

is continuous, as the composition

I_{f} \circ δ : X \to R

is continuous for any

f \in C_{b} (X)

. Hence, if

κ : X \to Y

is a measurable mapping between measurable spaces (resp. a continuous mapping between separable metrizable spaces), then the map

\bar{κ} : X \overset{δ \circ κ}{\to} P (Y)

is a measurable mapping (resp. a continuous mapping). We regard κ as a probabilistic mapping defined by

δ \circ κ : X \to P (Y)

. In particular, the identity mapping

I d : X \to X

of a measurable space

X

is a probabilistic mapping generated by

δ : X \to P (X)

. Graphically speaking, any straight arrow (a measurable mapping)

κ : X \to Y

between measurable spaces can be seen as a curved arrow (a probabilistic mapping).

Given a probabilistic mapping

T : X ⇝ Y

, we define a linear map

S_{*} (T) : S (X) \to S (Y)

, called a Markov morphism, as follows [2], Lemma 5.9, p. 72,

S_{*} (T) (μ) (B) : = \int_{X} \bar{T} (x) (B) d μ (x)

(6)

for any

μ \in S (X)

and

B \in Σ_{Y}

.
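For finite measurable spaces, Formula (6) becomes a matrix-vector product: a probabilistic mapping is a row-stochastic matrix whose row x is the probability measure T̄(x) on Y. The sketch below illustrates this under that finiteness assumption; the kernel, the measures and the function names are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical kernel from X = {0, 1, 2} to Y = {0, 1}; row x is the measure T(x) on Y.
kernel = [
    [F(1), F(0)],
    [F(1, 2), F(1, 2)],
    [F(0), F(1)],
]


def push_forward(kernel, mu):
    """S_*(T)(mu)(y) = sum_x T(x)(y) * mu(x), the finite form of Formula (6)."""
    return [sum(kernel[x][y] * mu[x] for x in range(len(mu)))
            for y in range(len(kernel[0]))]


def tv_norm(mu):
    """Total variation norm of a signed measure on a finite space."""
    return sum(abs(m) for m in mu)


mu = [F(1, 2), F(1, 4), F(1, 4)]   # a probability measure on X
nu = [F(1), F(-2), F(1)]           # a signed measure on X

pushed_mu = push_forward(kernel, mu)
pushed_nu = push_forward(kernel, nu)
print(pushed_mu, tv_norm(nu), tv_norm(pushed_nu))
```

In this example the image of the probability measure is again a probability measure, and the total variation norm of the signed measure does not increase, in line with the boundedness of S_*(T).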

Proposition 1.

Assume that

T : X ⇝ Y

is a probabilistic mapping.

(1) Then T induces a linear bounded map

S_{*} (T) : S (X) \to S (Y)

w.r.t. the total variation norm

∥ \cdot ∥_{T V}

. The restriction

M_{*} (T)

of

S_{*} (T)

to

M (X)

(resp.

P_{*} (T)

of

S_{*} (T)

to

P (X)

) maps

M (X)

to

M (Y)

(resp.

P (X)

to

P (Y)

).

(2) Probabilistic mappings are morphisms in the category of measurable spaces; i.e., for any probabilistic mappings

T_{1} : X ⇝ Y

and

T_{2} : Y ⇝ Z

, we have

M_{*} (T_{2} \circ T_{1}) = M_{*} (T_{2}) \circ M_{*} (T_{1}), P_{*} (T_{2} \circ T_{1}) = P_{*} (T_{2}) \circ P_{*} (T_{1}) .

(7)

(3)

M_{*}

and

P_{*}

are faithful functors.

(4) If

ν ≪ μ \in M^{*} (X)

then

M_{*} (T) (ν) ≪ M_{*} (T) (μ)

.

Remark 4.

The first assertion of Proposition 1 is due to Chentsov [2], Lemma 5.9, p. 72. The second assertion has been proven in [10], Theorem 2.14 (1), extending Giry’s result in [22]. The third assertion has been proven in [10]. The last assertion of Proposition 1 is due to Morse–Sacksteder [23], Proposition 5.1.

We also denote by

T_{*}

the map

S_{*} (T)

, if no confusion can arise.

Given a probabilistic mapping

T : X ⇝ Y

and a

C^{k}

-diffeological statistical model

(P_{X}, D_{X})

, we define a

C^{k}

-diffeological space

(T_{*} (P_{X}), T_{*} (D_{X}))

as the image of

D_{X}

by T_{*} [14], §1.43, p. 24. In other words, a mapping

p : U \to T_{*} (P_{X})

belongs to

T_{*} (D_{X})

if and only if it satisfies the following condition. For every

r \in U

there exists an open neighborhood

V \subset U

of r, such that either

p_{| V}

is a constant mapping, or there exists a mapping

q : V \to P_{X}

in

D_{X}

, such that

p_{| V} = T_{*} \circ q

.

Theorem 1.

Let

T : X ⇝ Y

be a probabilistic mapping and

(P_{X}, D_{X})

be a

C^{k}

-diffeological statistical model.

(1) Then

(T_{*} (P_{X}), T_{*} (D_{X}))

is a

C^{k}

-diffeological statistical model.

(2) If

(P_{X}, D_{X})

is an almost 2-integrable

C^{k}

-diffeological statistical model, then

(T_{*} (P_{X}), T_{*} (D_{X}))

is also an almost 2-integrable

C^{k}

-diffeological statistical model.

(3) If

(P_{X}, D_{X})

is a 2-integrable

C^{k}

-diffeological statistical model, then

(T_{*} (P_{X}), T_{*} (D_{X}))

is also a 2-integrable

C^{k}

-diffeological statistical model.

Proof.

(1) The first assertion is straightforward, since

T_{*} : S (X) \to S (Y)

is a linear bounded map by Proposition 1(1).

(2) Assume that

(P_{X}, D_{X})

is an almost 2-integrable

C^{k}

-statistical model and

v \in C_{ξ} (P_{X}, D_{X})

. Then there exists a

C^{k}

-map

c : R \to P_{X}

in

D_{X}

, such that

c (0) = ξ and {\frac{d}{d t}}_{| t = 0} c (t) = v

. Since

T_{*} : S (X) \to S (Y)

is a bounded linear map,

{\frac{d}{d t}}_{| t = 0} T_{*} \circ c = T_{*} (v) .

By the monotonicity theorem [5], Corollary 5.1, p. 260, we have

∥ \frac{d T_{*} v}{d T_{*} ξ} ∥_{L^{2} (Y, T_{*} ξ)} \leq ∥ \frac{d v}{d ξ} ∥_{L^{2} (X, ξ)} .

(8)

This proves that

(T_{*} (P_{X}), T_{*} (D_{X}))

is almost 2-integrable.

(3) Assume that

(P_{X}, D_{X})

is a 2-integrable

C^{k}

-diffeological statistical model. Let

c : R \to T_{*} (P_{X})

be an element in

T_{*} (D_{X})

. Then

c = T_{*} \circ c^{'}

, where

c^{'} : R \to P_{X}

is an element of

D_{X}

, i.e.,

i \circ c^{'} : R \to S (X)

is of class

C^{k}

and

(R, X, c^{'})

is a parameterized 2-integrable statistical model. By [5], Theorem 5.4, p. 264,

(R, Y, T_{*} \circ c^{'})

is a 2-integrable parameterized statistical model. Combined with the first assertion of Theorem 1 this proves the last assertion of Theorem 1. □

Denote by

L (X)

, the space of bounded measurable functions on a measurable space

X

. Given a probabilistic mapping

T : X ⇝ Y

, we define a linear map

T^{*} : L (Y) \to L (X)

, as follows [10], (2.2),

T^{*} (f) (x) : = I_{f} (\bar{T} (x)) = \int_{Y} f d \bar{T} (x),

(9)

which coincides with the classical formula (5.1) in [2], p. 66, for the transformation of a bounded measurable f under a Markov morphism (i.e., a probabilistic mapping) T. In particular, if

κ : X \to Y

is a measurable mapping, then we have

κ^{*} (f) (x) = f (κ (x))

, since

\bar{κ} = δ \circ κ

.
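On finite spaces, Formula (9) is again a matrix-vector product, now applied to functions rather than measures. The following sketch, with a hypothetical kernel and hypothetical values, checks the standard duality between T^* and T_* (the integral of T^*(f) against μ equals the integral of f against T_*(μ)) and the special case κ^*(f) = f ∘ κ for a measurable mapping κ.

```python
from fractions import Fraction as F

# Hypothetical kernel from X = {0, 1, 2} to Y = {0, 1}; row x is the measure T(x) on Y.
kernel = [[F(1), F(0)], [F(1, 2), F(1, 2)], [F(0), F(1)]]


def pull_back(kernel, f):
    """T^*(f)(x) = integral of f d T(x), the finite form of Formula (9)."""
    return [sum(row[y] * f[y] for y in range(len(f))) for row in kernel]


def push_forward(kernel, mu):
    """T_*(mu)(y) = sum_x T(x)(y) * mu(x)."""
    return [sum(kernel[x][y] * mu[x] for x in range(len(mu)))
            for y in range(len(kernel[0]))]


def integrate(f, mu):
    return sum(a * b for a, b in zip(f, mu))


def dirac_kernel(kappa, n_out):
    """Kernel delta o kappa generated by a map kappa : X -> Y."""
    return [[F(1) if y == k else F(0) for y in range(n_out)] for k in kappa]


f = [F(3), F(-1)]                      # a bounded function on Y
mu = [F(1, 2), F(1, 4), F(1, 4)]       # a probability measure on X

lhs = integrate(pull_back(kernel, f), mu)
rhs = integrate(f, push_forward(kernel, mu))

kappa = [0, 0, 1]                      # a deterministic statistic X -> Y
kappa_pull_back = pull_back(dirac_kernel(kappa, 2), f)
print(lhs, rhs, kappa_pull_back)
```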

Definition 6.

([10], Definition 2.22, cf. [23]) Let

P_{X} \subset P (X)

and

P_{Y} \subset P (Y)

. A probabilistic mapping

T : X ⇝ Y

will be called sufficient for

P_{X}

if there exists a probabilistic mapping

\underset{̲}{p} : Y ⇝ X

, such that for all

μ \in P_{X}

and

h \in L (X)

we have

T_{*} (h μ) = {\underset{̲}{p}}^{*} (h) T_{*} (μ), i . e ., {\underset{̲}{p}}^{*} (h) = \frac{d T_{*} (h μ)}{d T_{*} (μ)} \in L^{1} (Y, T_{*} (μ)) .

(10)

In this case we shall call the measurable mapping

p : Y \to P (X)

defining the probabilistic mapping

\underset{̲}{p} : Y ⇝ X

a conditional mapping for T.

Example 6.

Assume that

κ : X ⇝ Y

is a measurable mapping (i.e., a statistic) which is a probabilistic mapping sufficient for

P_{X} \subset P (X)

. Let

p : Y \to P (X), y \mapsto p_{y},

be a conditional mapping for κ. By (9),

{\underset{̲}{p}}^{*} (1_{A}) (y) = p_{y} (A)

, and we rewrite (10) as follows

p_{y} (A) = \frac{d κ_{*} (1_{A} μ)}{d κ_{*} μ} \in L^{1} (Y, κ_{*} (μ)) .

(11)

The RHS of (11) is the conditional measure of μ applied to A w.r.t. the measurable mapping κ. The equality (11) implies that this conditional measure is regular and independent of μ. Thus the notion of sufficiency of a measurable mapping κ for

P_{X}

coincides with the classical notion of sufficiency of κ for

P_{X}

, see e.g., [2], p. 28, [24], Definition 2.8, p. 85. We also note that the equality in (11) is understood as equivalence class in

L^{1} (Y, κ_{*} (μ))

and hence every statistic

κ^{'}

that coincides with a sufficient statistic κ except on a zero μ-measure set, for all

μ \in P_{X}

, is also a sufficient statistic for

P_{X}

.

Example 7.

(cf. [2], Lemma 2.8, p. 28) Assume that

μ \in P (X)

has a regular conditional distribution w.r.t. a statistic

κ : X \to Y

; i.e., there exists a measurable mapping

p : Y \to P (X), y \mapsto p_{y},

such that

E_{μ}^{σ (κ)} (1_{A} | y) = p_{y} (A)

(12)

for any

A \in Σ_{X}

and

y \in Y

. Let Θ be a set and

P : = {ν_{θ} \in P (X) | θ \in Θ}

be a parameterized family of probability measures dominated by μ. If there exists a function

h : Y \times Θ \to R

such that for all

θ \in Θ

, we have

ν_{θ} = h (κ (x), θ) μ,

(13)

then κ is sufficient for P, since, for any

θ \in Θ

,

p^{*} (1_{A}) = \frac{d κ_{*} (1_{A} ν_{θ})}{d κ_{*} ν_{θ}}

does not depend on θ. Condition (13) is the Fisher–Neyman sufficiency condition for a family of dominated measures.
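The factorization argument of Example 7 can be checked by hand on a finite sample space. In the sketch below, the statistic κ, the dominating measure μ and the factor h are hypothetical; the point is only that the conditional distribution of ν_θ on each fibre of κ turns out to be the same for every θ.

```python
from fractions import Fraction as F

kappa = [0, 0, 1, 1]                         # statistic kappa : X = {0,1,2,3} -> Y = {0,1}
mu = [F(1, 8), F(3, 8), F(1, 4), F(1, 4)]    # dominating probability measure on X


def h(y, theta):
    """Hypothetical factor h(y, theta), chosen so that nu_theta has total mass 1."""
    return 2 * theta if y == 0 else 2 * (1 - theta)


def nu(theta):
    """nu_theta = h(kappa(x), theta) * mu, as in condition (13)."""
    return [h(kappa[x], theta) * mu[x] for x in range(len(mu))]


def conditional(measure, y):
    """Conditional distribution of `measure` on the fibre kappa^{-1}(y)."""
    mass = sum(m for x, m in enumerate(measure) if kappa[x] == y)
    return [m / mass if kappa[x] == y else F(0) for x, m in enumerate(measure)]


thetas = [F(1, 5), F(1, 2), F(9, 10)]
conditionals = [[conditional(nu(t), y) for y in (0, 1)] for t in thetas]
print(conditionals[0])
```

Since h is constant along each fibre of κ, it cancels in the quotient defining the conditional distribution, which is the finite-space content of the sufficiency claim.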

Example 8.

Let

κ : X \to Y

be a measurable 1-1 mapping. Then for any statistical model

P_{X} \subset P (X)

, the statistic κ is sufficient w.r.t.

P_{X}

, since, for any

A \in Σ_{X}

and any

μ \in P_{X}

, we have

\frac{d κ_{*} (1_{A} μ)}{d κ_{*} μ} = {(κ^{- 1})}^{*} (1_{A}) \in L^{1} (Y, κ_{*} (μ)) .

Next, we shall show that probabilistic mappings do not increase the Fisher metrics on almost 2-integrable

C^{k}

-diffeological statistical models. Thus the Fisher metric serves as an “information quantity” of almost 2-integrable

C^{k}

-diffeological statistical models.

Theorem 2.

Let

T : X ⇝ Y

be a probabilistic mapping and

(P_{X}, D_{X})

an almost 2-integrable

C^{k}

-diffeological statistical model. Then for any

μ \in P_{X}

and any

v \in T_{μ} (P_{X}, D_{X})

, we have

g_{μ} (v, v) \geq g_{T_{*} μ} (T_{*} v, T_{*} v)

with the equality, if T is sufficient w.r.t.

P_{X}

.

Proof.

The monotonicity assertion of Theorem 2 follows from (8). The second assertion of Theorem 2 follows from the first assertion, taking into account Theorem 2.8.2 in [10], which states the existence of a probabilistic mapping

p : Y ⇝ X

, such that

p_{*} (T_{*} (P_{X})) = P_{X}

, and therefore

p_{*} (T_{*} (D_{X})) = D_{X}

. □
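A finite-space sketch can illustrate both halves of Theorem 2. It assumes finite sample spaces, a hypothetical noisy kernel, and a hypothetical injective statistic (which is sufficient by Example 8); none of the numerical data comes from the paper.

```python
from fractions import Fraction as F


def push_forward(kernel, mu):
    """T_*(mu)(y) = sum_x T(x)(y) * mu(x) for a kernel given as a row-stochastic matrix."""
    return [sum(kernel[x][y] * mu[x] for x in range(len(mu)))
            for y in range(len(kernel[0]))]


def fisher(v, w, xi):
    """g_xi(v, w) = sum_x v(x) w(x) / xi(x).

    Points with xi(x) = 0 are skipped: tangent vectors are dominated by xi,
    so they vanish there as well.
    """
    return sum(a * b / p for a, b, p in zip(v, w, xi) if p != 0)


xi = [F(1, 2), F(1, 4), F(1, 4)]     # base point in P(X), X = {0, 1, 2}
v = [F(1), F(-1), F(0)]              # tangent vector: signed measure of total mass 0

# A genuinely random kernel from X to Y = {0, 1}.
noisy = [[F(1), F(0)], [F(1, 2), F(1, 2)], [F(0), F(1)]]
# An injective statistic kappa : X -> {0, 1, 2, 3}, written as a Dirac kernel.
kappa = [2, 0, 3]
injective = [[F(1) if y == kappa[x] else F(0) for y in range(4)] for x in range(3)]

g_before = fisher(v, v, xi)
g_noisy = fisher(push_forward(noisy, v), push_forward(noisy, v),
                 push_forward(noisy, xi))
g_injective = fisher(push_forward(injective, v), push_forward(injective, v),
                     push_forward(injective, xi))
print(g_before, g_noisy, g_injective)
```

The noisy kernel strictly decreases the Fisher metric of this tangent vector, while the injective statistic preserves it.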

Let us apply Theorem 2 to Example 4(3), originally from [17]. In [17], Satz 1, p. 274, Friedrich considered the group

G (X, Σ_{X}, λ)

of all measurable 1-1 mappings

Φ : X \to X

, such that

Φ_{*} (λ) ≪ λ

. Clearly

Φ_{*} (P (λ)) \subset P (λ)

. Example 8 says that

Φ

is a sufficient statistic w.r.t.

P (λ)

. Hence Theorem 2 implies the following

Corollary 1.

([17], Satz 1) The group

G (X, Σ_{X}, λ)

acts isometrically on

P (λ)

.

Remark 5.

Theorem 2 extends the Monotonicity Theorem [5], Theorem 5.5, p. 265, for 2-integrable parameterized statistical models. (As we remark in Section 5, Theorem 2 can be easily extended to the case of almost l-integrable

C^{k}

-diffeological measure models.)

4. The Cramér–Rao Inequality for 2-Integrable Diffeological Statistical Models

In this section we shall prove a version of the Cramér–Rao inequality for estimators with values in a 2-integrable

C^{k}

-diffeological statistical model.

Definition 7.

Let

P_{X} \subset P (X)

be a statistical model. An estimator is a map

\hat{σ} : X \to P_{X}

.

Assume that V is a locally convex topological vector space. Then we denote, by

M a p (P_{X}, V)

, the space of all mappings

φ : P_{X} \to V

and by

V^{'}

, the topological dual of V. It is usually easier to estimate only a “coordinate”

φ (ξ)

of a probability measure

ξ \in P_{X}

, which determines

ξ

uniquely, if

φ

is an embedding.

Definition 8.

Let

P_{X}

be a statistical model and

φ \in M a p (P_{X}, V)

. A φ-estimator

{\hat{σ}}_{φ}

is a composition

φ \circ \hat{σ} : X \overset{\hat{σ}}{\to} P_{X} \overset{φ}{\to} V

.

Example 9.

Assume that

k : X \times X \to R

is a symmetric and positive definite kernel function and let V be the associated RKHS. For any

x \in X

, we denote by

k_{x}

, the function on

X

defined by

k_{x} (y) : = k (x, y)

, for any

y \in X

. Then

k_{x}

is an element of V. Let

P_{X} = P (X)

. Then we define the kernel mean embedding

φ : P (X) \to V

as follows [25]

φ (ξ) : = \int_{X} k_{x} d ξ (x),

where the integral should be understood as a Bochner integral.
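For a finitely supported measure the Bochner integral in Example 9 is a finite sum of the functions k_x, which makes the kernel mean embedding easy to sketch. The Gaussian kernel on the real line, the support points and the weights below are illustrative assumptions; inner products in V are computed from the reproducing property ⟨k_x, k_y⟩ = k(x, y).

```python
import math


def k(x, y):
    """Gaussian kernel on the real line (an assumed, symmetric positive definite choice)."""
    return math.exp(-0.5 * (x - y) ** 2)


def mean_embedding(points, weights):
    """Return phi(xi) as a function on X for xi = sum_i weights[i] * delta_{points[i]}."""
    def phi_xi(y):
        return sum(w * k(x, y) for x, w in zip(points, weights))
    return phi_xi


def rkhs_inner(points_a, weights_a, points_b, weights_b):
    """<phi(xi_a), phi(xi_b)>_V via the reproducing property <k_x, k_y> = k(x, y)."""
    return sum(wa * wb * k(xa, xb)
               for xa, wa in zip(points_a, weights_a)
               for xb, wb in zip(points_b, weights_b))


points = [0.0, 1.0, 2.0]
xi = [0.5, 0.25, 0.25]       # hypothetical probability weights
eta = [0.2, 0.3, 0.5]

phi_xi = mean_embedding(points, xi)

# Squared distance between the two embeddings in V.
dist2 = (rkhs_inner(points, xi, points, xi)
         - 2 * rkhs_inner(points, xi, points, eta)
         + rkhs_inner(points, eta, points, eta))
print(phi_xi(1.0), dist2)
```

The squared distance between two embeddings is positive for distinct measures on this support and vanishes when the two measures coincide.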

Remark 6.

(1) In classical statistics (see e.g., [26], §13, p. 51, [27], p. 4, [8], §4, p. 82, [5], Definition 5.1, p. 277) one considers only the parameter estimations for parameterized statistical models. In this case, an estimator is a map from

X

to the parameter set Θ of a statistical model

p (Θ) \subset P (X)

. Usually one assumes that the parameterization

p : Θ \to p (Θ)

is 1-1, hence, a parameter estimation is equivalent to a nonparametric estimation in the sense of Definition 7. Note that the ultimate aim of a statistical experiment is to estimate the probability measure generating the observable of the experiment. In general, we can only assume that the unknown generating probability measure belongs to a statistical model

P_{X} \subset P (X)

. In this case, we need to use non-parametric estimation; see e.g., [28], p. 1. Note that, by Example 3,

P_{X}

has a natural structure of a

C^{1}

-diffeological statistical model.

(2) The notion of a φ-estimation occurs in classical statistics in similar fashion; see e.g., [26], p. 52, where the author called similar estimators substitution estimators, and in [29], Definition 1.2, p. 4, where the authors consider estimands, which are versions of φ-estimators for a parameter estimation problem, see [5], p. 279.

For

φ \in M a p (P_{X}, V)

and

l \in V^{'}

we denote by

φ^{l}

the composition

l \circ φ

. Then we set

L_{φ}^{2} (X, P_{X}) : = \{ \hat{σ} : X \to P_{X} \mid φ^{l} \circ \hat{σ} \in L_{ξ}^{2} (X) \text{ for all } ξ \in P_{X} \text{ and } l \in V^{'} \} .

For

\hat{σ} \in L_{φ}^{2} (X, P_{X})

we define the

φ

-mean value of

\hat{σ}

, denoted by

φ_{\hat{σ}} : P_{X} \to V^{''}

, as follows (cf. [5], (5.54), p. 279)

φ_{\hat{σ}} (ξ) (l) : = E_{ξ} (φ^{l} \circ \hat{σ}) \quad \text{for } ξ \in P_{X} \text{ and } l \in V^{'} .

Let us identify V with a subspace in

V^{''}

via the canonical pairing.

The difference

b_{\hat{σ}}^{φ} : = φ_{\hat{σ}} - φ \in M a p (P_{X}, V^{''})

will be called the bias of the

φ

-estimator

{\hat{σ}}_{φ}

.

For all

ξ \in P_{X}

we define a quadratic function

M S E_{ξ}^{φ} [\hat{σ}]

on

V^{'}

, which is called the mean square error quadratic function at

ξ

, by setting for

l, h \in V^{'}

(cf. [5], (5.56), p. 279)

M S E_{ξ}^{φ} [\hat{σ}] (l, h) : = E_{ξ} [(φ^{l} \circ \hat{σ} (x) - φ^{l} (ξ)) \cdot (φ^{h} \circ \hat{σ} (x) - φ^{h} (ξ))] .

(14)

Similarly, the variance quadratic function of the

φ

-estimator

φ \circ \hat{σ}

at

ξ \in P_{X}

is the quadratic form

V_{ξ}^{φ} [\hat{σ}]

on

V^{'}

, such that, for all

l, h \in V^{'}

we have (cf. [5], (5.57), p. 279)

V_{ξ}^{φ} [\hat{σ}] (l, h) = E_{ξ} [(φ^{l} \circ \hat{σ} (x) - E_{ξ} (φ^{l} \circ \hat{σ} (x))) \cdot (φ^{h} \circ \hat{σ} (x) - E_{ξ} (φ^{h} \circ \hat{σ} (x)))] .

It is known (see [5], (5.58), p. 279) that

M S E_{ξ}^{φ} [\hat{σ}] (l, h) = V_{ξ}^{φ} [\hat{σ}] (l, h) + ⟨ b_{\hat{σ}}^{φ} (ξ), l ⟩ \cdot ⟨ b_{\hat{σ}}^{φ} (ξ), h ⟩ .

(15)
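Identity (15) can be checked exactly on a finite sample space. The sketch below makes several assumptions that are not in the paper: the sample space is {0, ..., n}, the model is the binomial family, φ sends Binomial(n, p) to the real number p (so V is the real line and l = h is the identity), and the estimator is the shrinkage rule x ↦ (x + 1)/(n + 2). All function names are hypothetical.

```python
from math import comb


def binomial_pmf(n, p):
    """Probabilities of Binomial(n, p) on the sample space {0, ..., n}."""
    return [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]


def expectation(pmf, f):
    """Expected value of f under the finitely supported measure pmf."""
    return sum(w * f(x) for x, w in enumerate(pmf))


def mse_variance_bias(n, p, estimate):
    """Return (MSE, variance, bias) of a real-valued estimate of p at Binomial(n, p)."""
    pmf = binomial_pmf(n, p)
    mean = expectation(pmf, estimate)
    mse = expectation(pmf, lambda x: (estimate(x) - p) ** 2)
    var = expectation(pmf, lambda x: (estimate(x) - mean) ** 2)
    return mse, var, mean - p


def shrunk(x, n=10):
    """A deliberately biased estimate of p (an assumed example)."""
    return (x + 1) / (n + 2)


mse, var, bias = mse_variance_bias(10, 0.3, shrunk)
# With l = h, identity (15) reads: mse == var + bias ** 2 (up to rounding).
decomposition_gap = mse - (var + bias ** 2)
```

In this example the bias is nonzero, so the mean square error is strictly larger than the variance, and `decomposition_gap` vanishes up to floating-point rounding.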

Remark 7.

Assume that V is a real Hilbert space with a scalar product

⟨ \cdot, \cdot ⟩

and the associated norm

∥ \cdot ∥

. Then the scalar product defines a canonical isomorphism

V = V^{'}, v (w) : = ⟨ v, w ⟩

, for all

v, w \in V

. For

\hat{σ} \in L_{φ}^{2} (X, P_{X})

, the mean square error

M S E_{ξ}^{φ} (\hat{σ})

of the φ-estimator

φ \circ \hat{σ}

is defined by

M S E_{ξ}^{φ} (\hat{σ}) : = E_{ξ} (∥ φ \circ \hat{σ} - φ (ξ) ∥^{2}) .

(16)

The RHS of (16) is well-defined, since

\hat{σ} \in L_{φ}^{2} (X, P_{X})

, and therefore

⟨ φ \circ \hat{σ} (x), φ \circ \hat{σ} (x) ⟩ \in L^{1} (X, ξ) \quad \text{and} \quad ⟨ φ \circ \hat{σ} (x), φ (ξ) ⟩ \in L^{2} (X, ξ) .

Similarly, we define the variance of a φ-estimator

φ \circ \hat{σ}

at ξ as follows

V_{ξ}^{φ} (\hat{σ}) : = E_{ξ} (∥ φ \circ \hat{σ} - E_{ξ} (φ \circ \hat{σ}) ∥^{2}) .

If V has a countable basis of orthonormal vectors

v_{1}, v_{2}, \dots

, then we have

M S E_{ξ}^{φ} (\hat{σ}) = \sum_{i = 1}^{\infty} M S E_{ξ}^{φ} [\hat{σ}] (v_{i}, v_{i}),

(17)

V_{ξ}^{φ} (\hat{σ}) = \sum_{i = 1}^{\infty} V_{ξ}^{φ} [\hat{σ}] (v_{i}, v_{i}) .

(18)

Now, we assume that

(P_{X}, D_{X})

is an almost 2-integrable

C^{k}

-diffeological statistical model. For any

ξ \in P_{X}

, let

T_{ξ}^{g} (P_{X}, D_{X})

be the completion of

T_{ξ} (P_{X}, D_{X})

w.r.t. the Fisher metric

g

. Since

T_{ξ}^{g} (P_{X}, D_{X})

is a Hilbert space, the map

L_{g} : T_{ξ}^{g} (P_{X}, D_{X}) \to {(T_{ξ}^{g} (P_{X}, D_{X}))}^{'}, L_{g} (v) (w) : = {⟨ v, w ⟩}_{g},

is an isomorphism. Then we define the inverse

g^{- 1}

of the Fisher metric

g

on

{(T_{ξ}^{g} (P_{X}, D_{X}))}^{'}

as follows

{⟨ L_{g} v, L_{g} w ⟩}_{g^{- 1}} : = {⟨ v, w ⟩}_{g} .

(19)
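In finite dimensions, definition (19) says that the inverse metric on covectors is given by the inverse of the Gram matrix of g. A minimal sketch, assuming a two-dimensional tangent space with a hypothetical positive definite Gram matrix `G` (none of these names come from the paper):

```python
def mat_vec(m, v):
    """Product of a 2x2 matrix with a vector of length 2."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def dot(a, b):
    """Euclidean pairing of two vectors of length 2."""
    return a[0] * b[0] + a[1] * b[1]


def inverse_2x2(m):
    """Inverse of an invertible 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


G = [[2.0, 1.0], [1.0, 3.0]]  # assumed Gram matrix of the metric g
G_inv = inverse_2x2(G)


def metric(v, w):
    """<v, w>_g for tangent vectors v, w."""
    return dot(v, mat_vec(G, w))


def lower(v):
    """L_g(v): the covector w -> <v, w>_g, stored by its coefficients."""
    return mat_vec(G, v)


def inverse_metric(a, b):
    """<a, b>_{g^{-1}} for covectors a, b."""
    return dot(a, mat_vec(G_inv, b))


v, w = [1.0, -2.0], [0.5, 4.0]
lhs = inverse_metric(lower(v), lower(w))  # left-hand side of (19)
rhs = metric(v, w)                        # right-hand side of (19)
```

Both sides agree because applying G, then the inverse of G, then pairing with G again collapses to a single pairing with G.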

Definition 9.

(cf. [5], Definition 5.18, p. 281) Assume that

\hat{σ} \in L_{φ}^{2} (X, P_{X})

. We shall call

\hat{σ}

a φ-regular estimator, if for all

l \in V^{'}

the function

ξ \mapsto ∥ φ^{l} \circ \hat{σ} ∥_{L^{2} (X, ξ)}

is locally bounded, i.e., for all

ξ_{0} \in P_{X}

\limsup_{ξ \to ξ_{0}} ∥ φ^{l} \circ \hat{σ} ∥_{L^{2} (X, ξ)} < \infty .

Proposition 2.

Assume that

(P_{X}, D_{X})

is a 2-integrable

C^{k}

-diffeological statistical model, V is a topological vector space,

φ \in M a p (P_{X}, V)

and

\hat{σ} : X \to P_{X}

is a φ-regular estimator. Then the

V^{''}

-valued function

φ_{\hat{σ}}

is Gateaux-differentiable on

(P_{X}, D_{X})

. Furthermore, for any

l \in V^{'}

, the differential

d φ_{\hat{σ}}^{l} (ξ)

extends to an element in

{(T_{ξ}^{g} (P_{X}, D_{X}))}^{'}

for all

ξ \in P_{X}

.

Proof.

Assume that a map

c : R \to P_{X}

belongs to

D_{X}

. Then

(R, X, c)

is a 2-integrable parametrized statistical model. By Lemma 5.2 in [5], p. 282, the composition

φ_{\hat{σ}} \circ c

is differentiable. This proves the first assertion of Proposition 2.

Next, we shall show that

d φ_{\hat{σ}}^{l} (ξ)

extends to an element in

{(T_{ξ}^{g} (P_{X}, D_{X}))}^{'}

for all

ξ \in P_{X}

. Let

X \in C_{ξ} (P_{X}, D_{X})

and

c : R \to P_{X}

be a

C^{k}

-curve, such that

c (0) = ξ

and

\dot{c} (0) = X

. By Lemma 5.3 [5], p. 284, we have

\partial_{X} (φ_{\hat{σ}}^{l}) = \int_{X} (φ^{l} \circ \hat{σ} (x) - E_{ξ} (φ^{l} \circ \hat{σ})) \cdot \log X \, d ξ (x),

(20)

where

φ^{l} \circ \hat{σ} (x) - E_{ξ} (φ^{l} \circ \hat{σ}) \in L^{2} (X, ξ)

. Denote by

Π_{ξ} : L^{2} (X, ξ) \cdot ξ \to T_{ξ}^{g} P_{X}

, the orthogonal projection. Set

{grad}_{g} (φ_{\hat{σ}}^{l}) : = Π_{ξ} [(φ^{l} \circ \hat{σ} (x) - E_{ξ} (φ^{l} \circ \hat{σ})) \cdot ξ] \in T_{ξ}^{g} P_{X} .

(21)

Then we rewrite (20), as follows

\partial_{X} (φ_{\hat{σ}}^{l}) = {⟨ {grad}_{g} (φ_{\hat{σ}}^{l}), X ⟩}_{g} .

Hence

d φ_{\hat{σ}}^{l}

is the restriction of

L_{g} ({grad}_{g} (φ_{\hat{σ}}^{l})) \in {(T_{ξ}^{g} (P_{X}, D_{X}))}^{'}

. This completes the proof of Proposition 2. □
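Formula (20) can be sanity-checked numerically on a one-parameter curve. The sketch below assumes, purely for illustration, the curve p ↦ Binomial(n, p) on the sample space {0, ..., n}, for which the logarithmic representation of the tangent vector is the usual score function; the real-valued function f plays the role of the composition of l, φ and the estimator. The function names are hypothetical.

```python
from math import comb


def binomial_pmf(n, p):
    """Probabilities of Binomial(n, p) on the sample space {0, ..., n}."""
    return [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]


def score(n, p, x):
    """Derivative in p of the log-probability of x under Binomial(n, p)."""
    return x / p - (n - x) / (1 - p)


def derivative_via_score(n, p, f):
    """Right-hand side of (20): integral of (f - E f) times the score."""
    pmf = binomial_pmf(n, p)
    mean = sum(w * f(x) for x, w in enumerate(pmf))
    return sum((f(x) - mean) * score(n, p, x) * w for x, w in enumerate(pmf))


def derivative_via_difference(n, p, f, step=1e-6):
    """Left-hand side of (20), approximated by a central finite difference."""

    def mean_at(q):
        return sum(w * f(x) for x, w in enumerate(binomial_pmf(n, q)))

    return (mean_at(p + step) - mean_at(p - step)) / (2 * step)


def square(x):
    """An assumed test function f(x) = x^2."""
    return x * x


by_score = derivative_via_score(10, 0.3, square)
by_difference = derivative_via_difference(10, 0.3, square)
```

For this choice the expected value of f is n p (1 - p) + n² p², whose derivative at n = 10, p = 0.3 equals 64, and both computations reproduce that value up to rounding and discretization error.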

For any

ξ \in P_{X}

, we denote

{(g_{\hat{σ}}^{φ})}^{- 1} (ξ)

to be the following quadratic form on

V^{'}

:

{(g_{\hat{σ}}^{φ})}^{- 1} (ξ) (l, k) : = {⟨ d φ_{\hat{σ}}^{l}, d φ_{\hat{σ}}^{k} ⟩}_{g^{- 1}} (ξ) = {⟨ {grad}_{g} (φ_{\hat{σ}}^{l}), {grad}_{g} (φ_{\hat{σ}}^{k}) ⟩}_{g} .

(22)

Theorem 3

(Diffeological Cramér–Rao inequality). Let

(P_{X}, D_{X})

be a 2-integrable

C^{k}

-diffeological statistical model, φ, a V-valued function on

P_{X}

and

\hat{σ} \in L_{φ}^{2} (X, P_{X})

, a φ-regular estimator. Then the difference

V_{ξ}^{φ} [\hat{σ}] - {(g_{\hat{σ}}^{φ})}^{- 1} (ξ)

is a positive semi-definite quadratic form on

V^{'}

for any

ξ \in P_{X}

.

Proof.

To prove Theorem 3 it suffices to show that for any

l \in V^{'}

we have

E_{ξ} {(φ^{l} \circ \hat{σ} - E_{ξ} (φ^{l} \circ \hat{σ}))}^{2} \geq ∥ {grad}_{g} (φ_{\hat{σ}}^{l}) ∥_{g}^{2} .

(23)

Clearly (23) follows from (21), since an orthogonal projection does not increase the norm. This completes the proof of Theorem 3. □

Theorem 3 is an extension of the general Cramér–Rao inequality [11], Theorem 2, see also [5], Theorem 5.7, p. 286.
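In the simplest case Theorem 3 reduces to the classical scalar inequality, which can be verified exactly on a finite sample space. The sketch below rests on assumptions that are not in the paper: the model is the binomial family on {0, ..., n}, the tangent space at each point is spanned by the score function, and V is the real line, so the squared norm of the projected gradient in (23) equals the squared covariance with the score divided by the Fisher information. The name `cramer_rao_terms` is hypothetical.

```python
from math import comb


def cramer_rao_terms(n, p, f):
    """Return (variance of f, Cramér-Rao lower bound) at Binomial(n, p).

    The bound is the squared norm of the projection of f - E f onto the
    line spanned by the score, i.e. Cov(f, score)^2 / Fisher information.
    """
    pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
    score = [x / p - (n - x) / (1 - p) for x in range(n + 1)]
    mean = sum(w * f(x) for x, w in enumerate(pmf))
    variance = sum(w * (f(x) - mean) ** 2 for x, w in enumerate(pmf))
    fisher = sum(w * s * s for w, s in zip(pmf, score))
    covariance = sum(
        w * (f(x) - mean) * s for (x, w), s in zip(enumerate(pmf), score)
    )
    return variance, covariance ** 2 / fisher


# The relative frequency x / n attains the bound; x^2 does not.
var_lin, bound_lin = cramer_rao_terms(10, 0.3, lambda x: x / 10)
var_sq, bound_sq = cramer_rao_terms(10, 0.3, lambda x: x * x)
```

The relative frequency is an affine function of the score, so its centered version already lies in the tangent line and the projection loses nothing; for x² the projection discards a nonzero orthogonal component, which is why the inequality is strict there.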

5. Discussion

The notion of a k-integrable parametrized measure model (introduced in [6,7], see also [5]) can be extended in the same way to the notion of an almost k-integrable diffeological measure model.

(1) There are two main differences between parameterized statistical models and

C^{k}

-diffeological statistical models. First, the parameter space of a parameterized statistical model is a single smooth Banach manifold, and parameter spaces for a

C^{k}

-diffeological statistical model can be different but compatible. Secondly, parameter spaces for a

C^{k}

-diffeological statistical model are finite dimensional. If

k = \infty

, this assumption is well-motivated [14], see also Remark 2 (2).

(2) It would be interesting to apply the theory of

C^{k}

-diffeological statistical models to stochastic processes. It is known that Banach manifolds are not suitable for many questions of global analysis, see e.g., [15], p. 1, and therefore, the theory of parameterized measure models might have limited applications to stochastic processes. On the other hand, there are many open questions in the theory of

C^{\infty}

-diffeological spaces, e.g., we do not know under which conditions we can define the Levi–Civita connection on a Riemannian

C^{\infty}

-diffeological space. Furthermore, the theory of

C^{k}

-diffeological spaces with

k \neq \infty

has not been considered before.

(3) The variational calculus founded by Leibniz and Newton is a cornerstone of differential geometry and modern analysis. In our opinion, it is best expressed in the language of diffeological spaces that declare which mappings into a diffeological space are smooth. This language is a counterpart to the language of ringed spaces in algebraic geometry that declares which functions are algebraic.

Funding

This research was funded by the Institutional Research Plan RVO:67985840 and by the Grant Agency of Czech Republic, grant number GAČR-18-01953J.

Acknowledgments

The author would like to thank Patrick Iglesias-Zemmour for a stimulating discussion on diffeology, Lorenz Schwachhöfer for helpful comments on an early version of this paper and Tat Dat To for the suggestion to consider Friedrich’s examples in [17]. A part of this paper was completed during the Workshop “Information Geometry” in Toulouse 14–18 October 2019. The author would like to thank the organizers, and especially Stephane Puechmorel, for their invitation and hospitality during the workshop. The author is grateful to the anonymous referees for their critical comments and suggestions, which helped her to significantly improve the exposition of this paper.

Conflicts of Interest

The author declares no conflict of interest.

References

1. McCullagh, P. What is a statistical model? Ann. Stat. 2002, 30, 1225–1310.
2. Chentsov, N. Statistical Decision Rules and Optimal Inference; Nauka: Moscow, Russia, 1972; English translation in: Translation of Math. Monograph vol. 53, Amer. Math. Soc.: Providence, RI, USA, 1982.
3. Amari, S. Differential-Geometric Methods in Statistics; Lecture Notes in Statistics 28; Springer: Heidelberg, Germany, 1985.
4. Amari, S. Information Geometry and Its Applications; Applied Mathematical Sciences; Springer: Berlin, Germany, 2016; Volume 194.
5. Ay, N.; Jost, J.; Lê, H.V.; Schwachhöfer, L. Information Geometry; Springer Nature: Cham, Switzerland, 2017.
6. Ay, N.; Jost, J.; Lê, H.V.; Schwachhöfer, L. Information geometry and sufficient statistics. Probab. Theory Relat. Fields 2015, 162, 327–364.
7. Ay, N.; Jost, J.; Lê, H.V.; Schwachhöfer, L. Parameterized measure models. Bernoulli 2018, 24, 1692–1725.
8. Amari, S.; Nagaoka, H. Methods of Information Geometry; Translations of Mathematical Monographs 191; Amer. Math. Soc.: Providence, RI, USA, 2000.
9. Pistone, G.; Sempi, C. An infinite-dimensional structure on the space of all the probability measures equivalent to a given one. Ann. Stat. 1995, 23, 1543–1561.
10. Jost, J.; Lê, H.V.; Luu, D.H.; Tran, T.D. Probabilistic mappings and Bayesian nonparametrics. arXiv 2019, arXiv:1905.11448.
11. Lê, H.V.; Jost, J.; Schwachhöfer, L. The Cramér-Rao Inequality on Singular Statistical Models. In Proceedings of the Conference “Geometric Science of Information”, GSI 2017, Paris, France, 7–9 November 2017; LNCS; Springer Nature: Cham, Switzerland, 2017; Volume 10589, pp. 552–560.
12. Lê, H.V.; Somberg, P.; Vanžura, J. Smooth structures on pseudomanifolds with isolated conical singularities. Acta Math. Vietnam. 2013, 38, 33–54.
13. Lê, H.V.; Somberg, P.; Vanžura, J. Poisson smooth structures on stratified symplectic spaces. In The Springer Proceedings in Mathematics & Statistics “Mathematics in the 21st Century, 6th World Conference”, Lahore, March 2013; Springer: Basel, Switzerland, 2015; Volume 98, Chapter 7; pp. 181–204.
14. Iglesias-Zemmour, P. Diffeology; Amer. Math. Soc.: Providence, RI, USA, 2013.
15. Kriegl, A.; Michor, P.W. The Convenient Setting of Global Analysis; Amer. Math. Soc.: Providence, RI, USA, 1997.
16. Grabiner, S. Range of products of operators. Can. J. Math. 1974, XXVI, 1430–1441.
17. Friedrich, T. Die Fisher-Information und symplektische Strukturen. Math. Nachr. 1991, 153, 273–296.
18. Souriau, J.-M. Groupes différentiels. In Lecture Notes in Mathematics, Vol. 836; Springer: Berlin, Germany, 1980; pp. 91–128.
19. Chen, K.T. Iterated path integrals. Bull. Am. Math. Soc. 1977, 83, 831–879.
20. Lawvere, W.F. The Category of Probabilistic Mappings. 1962. Unpublished. Available online: https://ncatlab.org/nlab/files/lawvereprobability1962.pdf (accessed on 19 December 2019).
21. Bogachev, V.I. Weak Convergence of Measures; Mathematical Surveys and Monographs; Amer. Math. Soc.: Providence, RI, USA, 2018; Volume 234.
22. Giry, M. A categorical approach to probability theory. In Categorical Aspects of Topology and Analysis; Banaschewski, B., Ed.; Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1982; Volume 915, pp. 68–85.
23. Morse, N.; Sacksteder, R. Statistical isomorphism. Ann. Math. Stat. 1966, 37, 203–214.
24. Schervish, M.J. Theory of Statistics, 2nd ed.; Springer: New York, NY, USA, 1997.
25. Muandet, K.; Fukumizu, K.; Sriperumbudur, B.; Schölkopf, B. Kernel Mean Embedding of Distributions: A Review and Beyond. Found. Trends Mach. Learn. 2017, 10, 1–141.
26. Borovkov, A.A. Mathematical Statistics; Gordon and Breach Science Publishers: Amsterdam, The Netherlands, 1998.
27. Ibragimov, I.A.; Has’minskii, R.Z. Statistical Estimation: Asymptotic Theory; Springer: New York, NY, USA, 1981.
28. Tsybakov, A.B. Introduction to Nonparametric Estimation; Springer Science+Business Media: New York, NY, USA, 2009.
29. Lehmann, E.L.; Casella, G. Theory of Point Estimation, 2nd ed.; Springer: New York, NY, USA, 1998.

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
