Abstract
We introduce the notion of a $C^k$-diffeological statistical model, which allows us to apply the theory of diffeological spaces to (possibly singular) statistical models. In particular, we introduce a class of almost 2-integrable $C^k$-diffeological statistical models that encompasses all known statistical models for which the Fisher metric is defined. This class contains a statistical model which does not appear in the Ay–Jost–Lê–Schwachhöfer theory of parametrized measure models. Then, we show that, for any positive integer $k$, the class of almost 2-integrable $C^k$-diffeological statistical models is preserved under probabilistic mappings. Furthermore, the monotonicity theorem for the Fisher metric also holds for this class. As a consequence, the Fisher metric on an almost 2-integrable $C^k$-diffeological statistical model $P \subset \mathcal{P}(\mathcal{X})$ is preserved under any probabilistic mapping $T: \mathcal{X} \rightsquigarrow \mathcal{Y}$ that is sufficient w.r.t. $P$. Finally, we extend the Cramér–Rao inequality to the class of 2-integrable $C^k$-diffeological statistical models.
1. Introduction
In mathematical statistics, the notion of a statistical model and the notion of a parameterized statistical model are of central importance []. For a measurable space $\mathcal{X}$, let us denote by $\mathcal{P}(\mathcal{X})$ the space of all probability measures on $\mathcal{X}$. According to currently accepted theories, see e.g., [] and the references therein, a statistical model is a subset $P_\mathcal{X} \subset \mathcal{P}(\mathcal{X})$, and a parameterized statistical model is a parameter set $\Theta$ together with a mapping $\mathbf{p}: \Theta \to \mathcal{P}(\mathcal{X})$. The image $\mathbf{p}(\Theta)$ is a statistical model endowed with the parameterization $\mathbf{p}$. If the parameter set $\Theta$ is a smooth manifold, then we can study a statistical model $\mathbf{p}(\Theta)$, endowed with a parameterization $\mathbf{p}$, by applying differential geometric techniques to $\Theta$ and to the smooth mapping $\mathbf{p}$.
This idea lies at the heart of information geometry, a field of mathematical statistics in which we study (parameterized) statistical models using techniques of differential geometry [,,,]. In the book “Information Geometry” by Ay, Jost, Lê, and Schwachhöfer, a parameterized statistical model is a triple $(M, \mathcal{X}, \mathbf{p})$, where $M$ is a Banach manifold, $\mathcal{X}$ is a measurable space, and $\mathbf{p}: M \to \mathcal{P}(\mathcal{X})$ is a map such that $i \circ \mathbf{p}: M \to \mathcal{S}(\mathcal{X})$ is a $C^1$-map. Here $\mathcal{S}(\mathcal{X})$ is the Banach space of all signed finite measures on $\mathcal{X}$ endowed with the total variation norm, and $i$ is the natural inclusion. We would like to emphasize that the concept of a parameterized statistical model introduced in [,,] encompasses statistical models endowed with the structure of a finite dimensional manifold [,,], or with the structure of an infinite dimensional Banach manifold []. Moreover, the theory of parameterized measure models allows us to study singular statistical models $P_\mathcal{X} \subset \mathcal{P}(\mathcal{X})$ using differential geometric techniques, if $P_\mathcal{X}$ is endowed with a parameterization by a Banach manifold.
In this study, inspired by the theory of diffeological spaces founded by Souriau and developed further by many others, we generalize the concept of a parameterized statistical model to the concept of a $C^k$-diffeological statistical model, which, by definition, is a subset of $\mathcal{P}(\mathcal{X})$ endowed with a compatible $C^k$-diffeology. We show that the concept of a $C^k$-diffeological statistical model is more flexible than the concept of a parameterized statistical model. In particular, the image $\mathbf{p}(M)$ of any parameterized statistical model $(M, \mathcal{X}, \mathbf{p})$ has a natural compatible $C^1$-diffeology. Moreover, for any $k$, any subset of $\mathcal{P}(\mathcal{X})$ can be provided with a compatible $C^k$-diffeology (and hence it has the structure of a $C^k$-diffeological statistical model).
Furthermore, not every subset of $\mathcal{P}(\mathcal{X})$ can be written as $\mathbf{p}(M)$ for some parameterized statistical model $(M, \mathcal{X}, \mathbf{p})$. Hence the class of $C^k$-diffeological statistical models is larger than the class of statistical models parameterized by Banach manifolds, as in the Ay–Jost–Lê–Schwachhöfer theory. Using the theory of probabilistic mappings developed in a recent work by Jost, Lê, Luu and Tran [], we also extend to the class of $C^k$-diffeological statistical models many results of the Ay–Jost–Lê–Schwachhöfer theory concerning the differential geometry of parameterized statistical models and their applications to statistics.
Our paper is organized as follows. In the second section we introduce the notions of $C^k$-diffeological statistical models, almost 2-integrable $C^k$-diffeological statistical models, and 2-integrable $C^k$-diffeological statistical models. In the third section we recall the notion of probabilistic mappings and related results in [] and prove that the class of (almost 2-integrable, resp. 2-integrable) $C^k$-diffeological statistical models is preserved under probabilistic mappings (Theorem 1). Then we extend the monotonicity of the Fisher metric on 2-integrable parameterized statistical models to the class of almost 2-integrable $C^k$-diffeological statistical models (Theorem 2). In the fourth section, we prove a diffeological version of the Cramér–Rao inequality (Theorem 3), which extends previously known versions of the Cramér–Rao inequality in [,]. We conclude our paper with a discussion of some future directions and open questions.
2. Almost 2-Integrable Diffeological Statistical Models
Given a statistical model $P_\mathcal{X} \subset \mathcal{P}(\mathcal{X})$, which we also denote by $P$, it is known that $P$ is endowed with a natural geometric structure induced from the Banach space $(\mathcal{S}(\mathcal{X}), \|\cdot\|_{TV})$.
Definition 1.
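(cf. [], Definition 3.2, p. 141) (1) Let $V$ be a Banach space, $X \overset{i}{\hookrightarrow} V$ an arbitrary subset, where $i$ denotes the inclusion, and $x_0 \in X$. Then $v \in V$ is called a tangent vector of $X$ at $x_0$, if there is a $C^1$-map $c: \mathbb{R} \to X$, i.e., the composition $i \circ c: \mathbb{R} \to V$ is a $C^1$-map, such that $c(0) = x_0$ and $\dot c(0) = v$.
(2) The tangent (double) cone $C_x X$ at a point $x \in X$ is defined as the subset of the tangent space $T_x V = V$ that consists of the tangent vectors of $X$ at $x$. The tangent space $T_x X$ is the linear hull of the tangent cone $C_x X$.
(3) The tangent cone fibration $CX$ (resp. the tangent fibration $TX$) is the union $\bigcup_{x \in X} \{x\} \times C_x X$ (resp. $\bigcup_{x \in X} \{x\} \times T_x X$), which is a subset of $V \times V$ and is therefore endowed with the topology induced from $V \times V$.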
Remark 1.
(1) The notion of a tangent cone in Definition 1 occurs in a similar fashion in the theory of singular spaces, see e.g., [], §3, [], §3, [], p. 166.
(2) Definition 1 differs from [], Definition 3.1, in that in Definition 1 the domain of a $C^1$-curve $c$ is $\mathbb{R}$, whereas in [] the domain of a $C^1$-curve $c$ is an interval $(-\varepsilon, \varepsilon)$. Since $(-\varepsilon, \varepsilon)$ is diffeomorphic to $\mathbb{R}$, the two choices of the domain of $c$ are equivalent.
Example 1.
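Let us consider a mixture family $P_\mathcal{X}$ of probability measures $p_\eta \mu_0$ on $\mathcal{X}$ that are dominated by $\mu_0 \in \mathcal{P}(\mathcal{X})$, where the density functions $p_\eta$ are of the following form
Here $g_i$, for $i = 1, 2, 3$, are nonnegative functions on $\mathcal{X}$ such that $\mathbb{E}_{\mu_0}(g_i) = 1$, and $\eta = (\eta_1, \eta_2) \in D_b \subset \mathbb{R}^2$ is a parameter, which will be specified as follows. Let us divide the square $D$ into smaller squares and color them black and white like a chessboard. Let $D_b$ be the closure of the subset of $D$ colored black. If $\eta$ is an interior point of $D_b$, then $C_\eta D_b = \mathbb{R}^2$. If $\eta$ is a boundary point of $D_b$, then $C_\eta D_b = \mathbb{R}$. If $\eta$ is a corner point of $D_b$, then $C_\eta D_b$ consists of two intersecting lines.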
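- Let $P_\mathcal{X}$ be a statistical model. Then it is known that any $v \in C_\xi P_\mathcal{X}$ is dominated by $\xi$. Hence the logarithmic representation of $v$, $\log v := dv/d\xi$, is an element of $L^1(\mathcal{X}, \xi)$. The set $\{\log v : v \in C_\xi P_\mathcal{X}\}$ is a subset of $L^1(\mathcal{X}, \xi)$. We denote it by $\log(C_\xi P_\mathcal{X})$ and call it the logarithmic representation of $C_\xi P_\mathcal{X}$.
- Next we want to put a Riemannian metric on a statistical model $P_\mathcal{X}$, i.e., to put a positive quadratic form $\mathfrak{g}$ on each tangent space $T_\xi P_\mathcal{X}$. The space $L^1(\mathcal{X}, \xi)$ does not have a natural metric, but its subspace $L^2(\mathcal{X}, \xi)$ is a Hilbert space.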
Definition 2.
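A statistical model $P_\mathcal{X}$ will be called almost 2-integrable, if
$$\log(C_\xi P_\mathcal{X}) \subset L^2(\mathcal{X}, \xi)$$
for all $\xi \in P_\mathcal{X}$. In this case we define the Fisher metric $\mathfrak{g}$ on $P_\mathcal{X}$ as follows. For each $v, w \in C_\xi P_\mathcal{X}$,
$$\mathfrak{g}_\xi(v, w) := \langle \log v, \log w \rangle_{L^2(\mathcal{X}, \xi)} = \int_\mathcal{X} \log v \cdot \log w \, d\xi. \qquad (4)$$
Since $T_\xi P_\mathcal{X}$ is the linear hull of $C_\xi P_\mathcal{X}$, Formula (4) extends uniquely to a positive quadratic form on $T_\xi P_\mathcal{X}$, which is called the Fisher metric.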
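To make Definition 2 concrete, here is a minimal Python sketch on a finite sample space, where every measure is a weight vector, $\log v = dv/d\xi$ is a pointwise ratio, and almost 2-integrability is automatic. The densities `g1`, `g2`, `g3`, the point $\eta = (0.3, 0.3)$ and the affine form of the mixture are hypothetical choices, only loosely modelled on the mixture family of Example 1, with the counting measure playing the role of $\mu_0$; the helper names are illustrative.

```python
import numpy as np


def log_representation(v, xi):
    """Return log v := dv/dxi for a signed measure v dominated by xi.

    Both measures are weight vectors on the same finite set.
    """
    v = np.asarray(v, dtype=float)
    xi = np.asarray(xi, dtype=float)
    if np.any((xi == 0) & (v != 0)):
        raise ValueError("v is not dominated by xi")
    out = np.zeros_like(v)
    support = xi > 0
    out[support] = v[support] / xi[support]
    return out


def fisher_metric(v, w, xi):
    """g_xi(v, w) = <log v, log w>_{L^2(xi)} = sum_x (dv/dxi)(dw/dxi) xi(x)."""
    xi = np.asarray(xi, dtype=float)
    return float(np.sum(log_representation(v, xi) * log_representation(w, xi) * xi))


# Hypothetical three-component mixture on a 4-point space (mu_0 = counting measure).
g1 = np.array([0.4, 0.3, 0.2, 0.1])
g2 = np.array([0.1, 0.4, 0.4, 0.1])
g3 = np.array([0.25, 0.25, 0.25, 0.25])


def mixture(eta):
    """Assumed parameterization p_eta = eta_1 g1 + eta_2 g2 + (1 - eta_1 - eta_2) g3."""
    return eta[0] * g1 + eta[1] * g2 + (1.0 - eta[0] - eta[1]) * g3


xi = mixture((0.3, 0.3))
# Partial derivatives of the affine parameterization span the tangent space at xi.
d1, d2 = g1 - g3, g2 - g3
G = np.array([[fisher_metric(a, b, xi) for b in (d1, d2)] for a in (d1, d2)])
print(G)
```

The matrix `G` is the Gram matrix of the Fisher metric in the basis of partial derivatives; since `d1` and `d2` are linearly independent and $\xi$ has full support, it is symmetric and positive definite.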
Example 2.
Let us reconsider Example 1. Recall that our statistical model is parameterized by a map
which is the restriction of the affine map , defined by the same formula. Hence, any tangent vector can be written as where . For , we have . If for all and , then for all and all . Therefore
Hence is almost 2-integrable, if
In this case we have
Next we introduce the notion of a $C^k$-diffeological statistical model.
Definition 3.
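For $k \in \mathbb{N}^+ \cup \{\infty\}$ and a nonempty set $X$, a $C^k$-diffeology of $X$ is a set $\mathcal{D}$ of mappings $\varphi: U \to X$, where $U$ is an open domain in $\mathbb{R}^n$ and $n$ runs over the nonnegative integers, such that the following three axioms are satisfied.
D1. Covering. The set $\mathcal{D}$ contains the constant mappings $\mathbf{x}: r \mapsto x$, defined on $\mathbb{R}^n$, for all $x \in X$ and all $n \in \mathbb{N}$.
D2. Locality. Let $\varphi: U \to X$ be a mapping. If for every point $r \in U$ there exists an open neighborhood $V$ of $r$ such that $\varphi|V$ belongs to $\mathcal{D}$, then the map $\varphi$ belongs to $\mathcal{D}$.
D3. Smooth compatibility. For every element $\varphi: U \to X$ of $\mathcal{D}$, for every real domain $V$, and for every $F \in C^k(V, U)$, the composition $\varphi \circ F$ belongs to $\mathcal{D}$.
A $C^k$-diffeological space is a nonempty set $X$ equipped with a $C^k$-diffeology $\mathcal{D}$. Elements $\varphi: U \to X$ of $\mathcal{D}$ will be called $C^k$-maps from $U$ to $X$.
A statistical model $P_\mathcal{X}$ endowed with a $C^k$-diffeology $\mathcal{D}$ will be called a $C^k$-diffeological statistical model, if for any map $\varphi: U \to P_\mathcal{X}$ in $\mathcal{D}$ the composition $i \circ \varphi: U \to \mathcal{S}(\mathcal{X})$ is a $C^k$-map.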
Remark 2.
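(1) In [], Iglesias-Zemmour considered only $C^\infty$-diffeologies. The notion of a $C^k$-diffeology, as given in Definition 3, is a straightforward adaptation of the concept of a smooth diffeology, as given in [], §1.5.
(2) As $\mathcal{S}(\mathcal{X})$ is a Banach space, by [], Lemma 3.11, p. 30, a compatible $C^\infty$-diffeology on a statistical model $P_\mathcal{X}$ is defined by smooth maps $c: \mathbb{R} \to P_\mathcal{X}$.
(3) Given a $C^k$-diffeological statistical model $(P_\mathcal{X}, \mathcal{D})$ and $\xi \in P_\mathcal{X}$, the tangent cone $C_\xi(P_\mathcal{X}, \mathcal{D})$ is the subset of $\mathcal{S}(\mathcal{X})$ that consists of the tangent vectors $\dot c(0)$ of $C^k$-curves $c: \mathbb{R} \to P_\mathcal{X}$ in $\mathcal{D}$ such that $c(0) = \xi$. Similarly, the tangent space $T_\xi(P_\mathcal{X}, \mathcal{D})$ is the linear hull of $C_\xi(P_\mathcal{X}, \mathcal{D})$.
(4) Let $(P_\mathcal{X}, \mathcal{D})$ be a $C^k$-diffeological statistical model and $V$ a locally convex vector space. A map $f: P_\mathcal{X} \to V$ is called Gateaux-differentiable on $(P_\mathcal{X}, \mathcal{D})$ if for any $C^k$-curve $c: \mathbb{R} \to P_\mathcal{X}$ in $\mathcal{D}$ the composition $f \circ c: \mathbb{R} \to V$ is differentiable. We refer to [] for differential calculus on locally convex vector spaces.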
Example 3.
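(1) Let $(M, \mathcal{X}, \mathbf{p})$ be a parametrized statistical model. Then $(\mathbf{p}(M), \mathcal{D})$ is a $C^1$-diffeological statistical model, where $\mathcal{D}$ consists of all $C^1$-maps $\varphi: U \to \mathbf{p}(M)$ such that there exists a $C^1$-map $\tilde\varphi: U \to M$ with $\varphi = \mathbf{p} \circ \tilde\varphi$.
(2) Let $P_\mathcal{X}$ be a statistical model. Then $P_\mathcal{X}$ can be endowed with the structure of a $C^k$-diffeological statistical model for any $k$, where its diffeology $\mathcal{D}_k$ consists of all mappings $\varphi: U \to P_\mathcal{X}$ such that the composition $i \circ \varphi: U \to \mathcal{S}(\mathcal{X})$ is of class $C^k$, where $U$ is any open domain in $\mathbb{R}^n$ for $n \in \mathbb{N}$.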
(3) Let be the closed interval . Let , where , such that and for all . We claim that there does not exist a parameterized statistical model , such that . Assume the opposite, i.e., there is a -map , such that . Then for any we have . However, this is not the case, as it is known that the space cannot be the image of a linear bounded map from a Banach space M to , see e.g., [], p. 1434.
Definition 4.
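A $C^k$-diffeological statistical model $(P_\mathcal{X}, \mathcal{D})$ will be called almost 2-integrable, if $\log(C_\xi(P_\mathcal{X}, \mathcal{D})) \subset L^2(\mathcal{X}, \xi)$ for all $\xi \in P_\mathcal{X}$.
An almost 2-integrable $C^k$-diffeological statistical model $(P_\mathcal{X}, \mathcal{D})$ will be called 2-integrable, if for any $C^k$-map $\varphi: U \to P_\mathcal{X}$ in $\mathcal{D}$, the function $v \mapsto \mathfrak{g}(d\varphi(v), d\varphi(v))$ is continuous on $TU$.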
Example 4.
(1) By [], Theorem 3.2, p. 155, a parameterized statistical model $(M, \mathcal{X}, \mathbf{p})$ is 2-integrable if and only if $(\mathbf{p}(M), \mathcal{D})$ is a 2-integrable $C^1$-diffeological statistical model.
(2) The $C^k$-diffeological statistical model in Example 3(3) is 2-integrable, although there is no parameterized statistical model $(M, \mathcal{X}, \mathbf{p})$ whose image $\mathbf{p}(M)$ is this model.
(3) Let be a measurable space and λ be a σ-finite measure. In [], p. 274, Friedrich considered a family that is endowed with the following diffeology . A curve is a -curve, if
Hence is an almost 2-integrable -diffeological statistical model.
Remark 3.
The axiomatics of Espaces différentiels, which became later the diffeological spaces, were introduced by J.-M. Souriau in the beginning of the nineteen-eighties []. Diffeology is a variant of the theory of differentiable spaces, introduced and developed a few years before by K.T. Chen []. As I have worked with a different theory of smooth structures on singular spaces [,], I appreciate the elegance of the theory of diffeology for its consistent and simple treatment of smooth structures on (possibly infinite dimensional) singular spaces. The best source for diffeology is the monograph by P. Iglesias-Zemmour [].
3. Probabilistic Mappings
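In 1962, Lawvere proposed a categorical approach to probability theory, where morphisms are Markov kernels, and, most importantly, he supplied the space $\mathcal{P}(\mathcal{X})$ with a natural σ-algebra $\Sigma_w$, making the notion of Markov kernels, and hence many constructions in probability theory and mathematical statistics, functorial.
Let us recall the definition of $\Sigma_w$. Given a measurable space $\mathcal{X}$, let $\mathcal{F}_s(\mathcal{X})$ denote the linear space of simple functions on $\mathcal{X}$. Recall that $\mathcal{S}(\mathcal{X})$ is the space of all signed finite measures on $\mathcal{X}$. There is a natural homomorphism $I: \mathcal{F}_s(\mathcal{X}) \to \mathcal{S}(\mathcal{X})^*$, defined by integration: $I(f)(\mu) := \int_\mathcal{X} f \, d\mu$ for $f \in \mathcal{F}_s(\mathcal{X})$ and $\mu \in \mathcal{S}(\mathcal{X})$. Following Lawvere [], we define $\Sigma_w$ to be the smallest σ-algebra on $\mathcal{S}(\mathcal{X})$ such that $I(f)$ is measurable for all $f \in \mathcal{F}_s(\mathcal{X})$. Let $\mathcal{M}(\mathcal{X})$ denote the space of all finite nonnegative measures on $\mathcal{X}$. We also denote by $\Sigma_w$ the restriction of $\Sigma_w$ to $\mathcal{M}(\mathcal{X})$, $\mathcal{M}^*(\mathcal{X}) := \mathcal{M}(\mathcal{X}) \setminus \{0\}$, and $\mathcal{P}(\mathcal{X})$.
- For a topological space $\mathcal{X}$ we consider the natural Borel σ-algebra $\mathcal{B}(\mathcal{X})$. Then every continuous function $f: \mathcal{X} \to \mathbb{R}$ is measurable w.r.t. $\mathcal{B}(\mathcal{X})$. If $\mathcal{X}$ is, moreover, a metric space, then $\mathcal{B}(\mathcal{X})$ is the smallest σ-algebra making every continuous function measurable ([], Lemma 2.13).
- Let $C_b(\mathcal{X})$ be the space of bounded continuous functions on a topological space $\mathcal{X}$. We denote by $\tau_v$ the smallest topology on $\mathcal{S}(\mathcal{X})$ such that for any $f \in C_b(\mathcal{X})$ the map $I_f: \mu \mapsto \int_\mathcal{X} f \, d\mu$ is continuous. We also denote by $\tau_v$ the restriction of $\tau_v$ to $\mathcal{M}(\mathcal{X})$ and $\mathcal{P}(\mathcal{X})$, which is also called the weak topology; it generates the weak convergence of probability measures. It is known that $(\mathcal{P}(\mathcal{X}), \tau_v)$ is separable and metrizable if, and only if, $\mathcal{X}$ is ([], Theorem 3.1.4, p. 104). If $\mathcal{X}$ is separable and metrizable, then the Borel σ-algebra on $\mathcal{P}(\mathcal{X})$ generated by $\tau_v$ coincides with $\Sigma_w$.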
Definition 5.
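([], Definition 2.4) A probabilistic mapping (or an arrow) from a measurable space $\mathcal{X}$ to a measurable space $\mathcal{Y}$ is a measurable mapping from $\mathcal{X}$ to $(\mathcal{P}(\mathcal{Y}), \Sigma_w)$.
We denote by $\overline{T}: \mathcal{X} \to (\mathcal{P}(\mathcal{Y}), \Sigma_w)$ the measurable mapping defining (generating) a probabilistic mapping $T: \mathcal{X} \rightsquigarrow \mathcal{Y}$. Similarly, for a measurable mapping $\mathbf{p}: \mathcal{X} \to \mathcal{P}(\mathcal{Y})$ we denote by $\underline{\mathbf{p}}: \mathcal{X} \rightsquigarrow \mathcal{Y}$ the generated probabilistic mapping. Note that a probabilistic mapping is denoted by a curved arrow and a measurable mapping by a straight arrow.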
Example 5.
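([], Example 2.6) (1) Assume that $\mathcal{X}$ is separable and metrizable. Then the identity mapping $\mathrm{Id}_\mathcal{P}: (\mathcal{P}(\mathcal{X}), \tau_v) \to (\mathcal{P}(\mathcal{X}), \tau_v)$ is continuous, and hence measurable w.r.t. the Borel σ-algebra $\Sigma_w = \mathcal{B}(\tau_v)$. Consequently, $\mathrm{Id}_\mathcal{P}$ generates a probabilistic mapping $\mathrm{ev}: (\mathcal{P}(\mathcal{X}), \mathcal{B}(\tau_v)) \rightsquigarrow (\mathcal{X}, \mathcal{B}(\mathcal{X}))$, and we write $\mathrm{ev} = \underline{\mathrm{Id}_\mathcal{P}}$. Similarly, for any measurable space $\mathcal{X}$, we also have an arrow (a probabilistic mapping) $\mathrm{ev}: (\mathcal{P}(\mathcal{X}), \Sigma_w) \rightsquigarrow \mathcal{X}$ generated by the measurable mapping $\mathrm{Id}_\mathcal{P}$.
(2) Let $\delta_x$ denote the Dirac measure concentrated at $x$. It is known that the map $\delta: (\mathcal{X}, \Sigma_\mathcal{X}) \to (\mathcal{P}(\mathcal{X}), \Sigma_w)$, $x \mapsto \delta_x$, is measurable []. If $\mathcal{X}$ is a topological space, then the map $\delta: \mathcal{X} \to (\mathcal{P}(\mathcal{X}), \tau_v)$ is continuous, as the composition $I_f \circ \delta = f$ is continuous for any $f \in C_b(\mathcal{X})$. Hence, if $\kappa: \mathcal{X} \to \mathcal{Y}$ is a measurable mapping between measurable spaces (resp. a continuous mapping between separable metrizable spaces), then the map $\delta \circ \kappa: \mathcal{X} \to \mathcal{P}(\mathcal{Y})$ is a measurable mapping (resp. a continuous mapping). We regard $\kappa$ as a probabilistic mapping defined by $\delta \circ \kappa$. In particular, the identity mapping $\mathrm{Id}: \mathcal{X} \to \mathcal{X}$ of a measurable space is a probabilistic mapping generated by $\delta$. Graphically speaking, any straight arrow (a measurable mapping) between measurable spaces can be seen as a curved arrow (a probabilistic mapping).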
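Given a probabilistic mapping $T: \mathcal{X} \rightsquigarrow \mathcal{Y}$, we define a linear map $S_*(T): \mathcal{S}(\mathcal{X}) \to \mathcal{S}(\mathcal{Y})$, called the Markov morphism, as follows ([], Lemma 5.9, p. 72):
$$S_*(T)(\mu)(B) := \int_\mathcal{X} \overline{T}(x)(B) \, d\mu(x)$$
for any $\mu \in \mathcal{S}(\mathcal{X})$ and $B \in \Sigma_\mathcal{Y}$.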
Proposition 1.
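Assume that $T: \mathcal{X} \rightsquigarrow \mathcal{Y}$ is a probabilistic mapping.
(1) Then $T$ induces a linear bounded map $S_*(T): \mathcal{S}(\mathcal{X}) \to \mathcal{S}(\mathcal{Y})$ w.r.t. the total variation norm $\|\cdot\|_{TV}$. The restriction $M_*(T)$ of $S_*(T)$ to $\mathcal{M}(\mathcal{X})$ (resp. $P_*(T)$ of $S_*(T)$ to $\mathcal{P}(\mathcal{X})$) maps $\mathcal{M}(\mathcal{X})$ to $\mathcal{M}(\mathcal{Y})$ (resp. $\mathcal{P}(\mathcal{X})$ to $\mathcal{P}(\mathcal{Y})$).
(2) Probabilistic mappings are morphisms in the category of measurable spaces; i.e., for any probabilistic mappings $T_1: \mathcal{X} \rightsquigarrow \mathcal{Y}$ and $T_2: \mathcal{Y} \rightsquigarrow \mathcal{Z}$, we have
$$M_*(T_2 \circ T_1) = M_*(T_2) \circ M_*(T_1), \qquad P_*(T_2 \circ T_1) = P_*(T_2) \circ P_*(T_1).$$
(3) $M_*$ and $P_*$ are faithful functors.
(4) If $\nu \ll \mu \in \mathcal{M}(\mathcal{X})$, then $M_*(T)(\nu) \ll M_*(T)(\mu)$.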
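For finite sets $\mathcal{X}$ and $\mathcal{Y}$, a probabilistic mapping $T$ is just a row-stochastic matrix $K$ with $K[x, y] = \overline{T}(x)(\{y\})$, and the Markov morphism acts by $\mu \mapsto \mu K$. The sketch below assumes this finite setting, uses hypothetical helper names and random kernels, and checks numerically assertions (1) and (2) of Proposition 1 together with the Dirac kernel of Example 5(2).

```python
import numpy as np


def markov_morphism(K, mu):
    """T_*(mu)({y}) = sum_x K[x, y] * mu({x}) for a signed measure mu on X."""
    return np.asarray(mu, dtype=float) @ K


def tv_norm(mu):
    """Total variation norm |mu|(X) of a signed measure on a finite set."""
    return float(np.abs(mu).sum())


def dirac_kernel(kappa, n_y):
    """Kernel x -> delta_{kappa(x)} generated by a map kappa: X -> Y."""
    K = np.zeros((len(kappa), n_y))
    K[np.arange(len(kappa)), kappa] = 1.0
    return K


rng = np.random.default_rng(0)
K1 = rng.random((3, 4))
K1 /= K1.sum(axis=1, keepdims=True)  # T1: X ~> Y with |X| = 3, |Y| = 4
K2 = rng.random((4, 2))
K2 /= K2.sum(axis=1, keepdims=True)  # T2: Y ~> Z with |Z| = 2

mu = np.array([0.5, -0.2, 0.1])  # a signed finite measure on X
nu = np.array([0.0, 0.7, 0.3])   # a probability measure on X

# The composite kernel T2 o T1 is the matrix product K1 @ K2 (Chapman-Kolmogorov).
composite = markov_morphism(K1 @ K2, mu)
iterated = markov_morphism(K2, markov_morphism(K1, mu))
```

The total variation bound $\|T_*\mu\|_{TV} \le \|\mu\|_{TV}$ holds here because each row of $K$ sums to one.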
Remark 4.
The first assertion of Proposition 1 is due to Chentsov [], Lemma 5.9, p. 72. The second assertion has been proven in [], Theorem 2.14 (1), extending Giry’s result in []. The third assertion has been proven in []. The last assertion of Proposition 1 is due to Morse–Sacksteder [], Proposition 5.1.
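We also denote by $T_*$ the map $S_*(T)$, if no confusion can arise.
Given a probabilistic mapping $T: \mathcal{X} \rightsquigarrow \mathcal{Y}$ and a $C^k$-diffeological statistical model $(P_\mathcal{X}, \mathcal{D})$, we define a $C^k$-diffeological space $(T_*(P_\mathcal{X}), T_*(\mathcal{D}))$ as the image of $(P_\mathcal{X}, \mathcal{D})$ by $T_*$ ([], §1.43, p. 24). In other words, a mapping $\psi: U \to T_*(P_\mathcal{X})$ belongs to $T_*(\mathcal{D})$ if and only if it satisfies the following condition: for every $r \in U$ there exists an open neighborhood $V \subset U$ of $r$ such that either $\psi|V$ is a constant mapping, or there exists a mapping $\varphi: V \to P_\mathcal{X}$ in $\mathcal{D}$ such that $\psi|V = T_* \circ \varphi$.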
Theorem 1.
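Let $T: \mathcal{X} \rightsquigarrow \mathcal{Y}$ be a probabilistic mapping and $(P_\mathcal{X}, \mathcal{D})$ a $C^k$-diffeological statistical model.
(1) Then $(T_*(P_\mathcal{X}), T_*(\mathcal{D}))$ is a $C^k$-diffeological statistical model.
(2) If $(P_\mathcal{X}, \mathcal{D})$ is an almost 2-integrable $C^k$-diffeological statistical model, then $(T_*(P_\mathcal{X}), T_*(\mathcal{D}))$ is also an almost 2-integrable $C^k$-diffeological statistical model.
(3) If $(P_\mathcal{X}, \mathcal{D})$ is a 2-integrable $C^k$-diffeological statistical model, then $(T_*(P_\mathcal{X}), T_*(\mathcal{D}))$ is also a 2-integrable $C^k$-diffeological statistical model.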
Proof.
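(1) The first assertion is straightforward, since $T_*: \mathcal{S}(\mathcal{X}) \to \mathcal{S}(\mathcal{Y})$ is a bounded linear map by Proposition 1(1).
(2) Assume that $(P_\mathcal{X}, \mathcal{D})$ is an almost 2-integrable $C^k$-diffeological statistical model, and let $w$ be a tangent vector of $(T_*(P_\mathcal{X}), T_*(\mathcal{D}))$ at a point $\eta$. If the curve defining $w$ is locally constant, then $w = 0$ and there is nothing to prove. Otherwise there exist $\xi \in P_\mathcal{X}$ with $T_*\xi = \eta$ and a $C^k$-curve $c: \mathbb{R} \to P_\mathcal{X}$ in $\mathcal{D}$ with $c(0) = \xi$ such that $w = \frac{d}{dt}\big|_{t=0} T_*(c(t))$. Since $T_*$ is a bounded linear map,
$$w = T_*(\dot c(0)).$$
By the monotonicity theorem [], Corollary 5.1, p. 260, we have
$$\|\log w\|_{L^2(\mathcal{Y}, \eta)} \le \|\log \dot c(0)\|_{L^2(\mathcal{X}, \xi)} < \infty. \qquad (8)$$
This proves that $(T_*(P_\mathcal{X}), T_*(\mathcal{D}))$ is almost 2-integrable.
(3) Assume that $(P_\mathcal{X}, \mathcal{D})$ is a 2-integrable $C^k$-diffeological statistical model. Let $\psi: U \to T_*(P_\mathcal{X})$ be an element of $T_*(\mathcal{D})$. Then, locally, either $\psi$ is constant or $\psi = T_* \circ \varphi$, where $\varphi$ is an element of $\mathcal{D}$, i.e., $\varphi$ is of class $C^k$ and $(U, \mathcal{X}, \varphi)$ is a 2-integrable parameterized statistical model. By [], Theorem 5.4, p. 264, $(U, \mathcal{Y}, T_* \circ \varphi)$ is a 2-integrable parameterized statistical model. Combined with the first assertion of Theorem 1, this proves the last assertion of Theorem 1. □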
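Denote by $L(\mathcal{X})$ the space of bounded measurable functions on a measurable space $\mathcal{X}$. Given a probabilistic mapping $T: \mathcal{X} \rightsquigarrow \mathcal{Y}$, we define a linear map $T^*: L(\mathcal{Y}) \to L(\mathcal{X})$ as follows ([], (2.2)):
$$T^*(f)(x) := \int_\mathcal{Y} f \, d\overline{T}(x), \qquad (9)$$
which coincides with the classical formula (5.1) in [], p. 66, for the transformation of a bounded measurable function $f$ under a Markov morphism (i.e., a probabilistic mapping) $T$. In particular, if $\kappa: \mathcal{X} \to \mathcal{Y}$ is a measurable mapping, then we have $\kappa^*(f)(x) = f(\kappa(x))$, since $\overline{\kappa}(x) = \delta_{\kappa(x)}$.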
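In the finite setting (an assumption made only for illustration), $T^*$ acts on a function $f$ on $\mathcal{Y}$ by the matrix-vector product $Kf$, and the pullback is dual to the Markov morphism: $\int_\mathcal{Y} f \, d(T_*\mu) = \int_\mathcal{X} T^*(f) \, d\mu$. The kernel, function and measure below are arbitrary test values, and `pullback` is a hypothetical helper name.

```python
import numpy as np


def pullback(K, f):
    """T^*(f)(x) = integral of f against Tbar(x), i.e. (K @ f)[x] on finite sets."""
    return K @ np.asarray(f, dtype=float)


K = np.array([[0.2, 0.8, 0.0],
              [0.5, 0.25, 0.25]])  # T: X ~> Y with |X| = 2, |Y| = 3
f = np.array([1.0, -2.0, 4.0])     # a bounded function on Y
mu = np.array([0.3, 0.7])          # a probability measure on X

# Duality between T_* (acting on measures) and T^* (acting on functions).
lhs = float((mu @ K) @ f)          # integral of f w.r.t. T_* mu
rhs = float(mu @ pullback(K, f))   # integral of T^* f w.r.t. mu

# A deterministic statistic kappa: x -> kappa[x] has kernel rows delta_{kappa(x)},
# so its pullback is composition: kappa^* f = f o kappa.
kappa = np.array([2, 0])
K_kappa = np.eye(3)[kappa]
```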
Definition 6.
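([], Definition 2.22, cf. []) Let $T: \mathcal{X} \rightsquigarrow \mathcal{Y}$ be a probabilistic mapping and $P_\mathcal{X} \subset \mathcal{P}(\mathcal{X})$. The probabilistic mapping $T$ will be called sufficient for $P_\mathcal{X}$ if there exists a probabilistic mapping $p: \mathcal{Y} \rightsquigarrow \mathcal{X}$ such that for all $\mu \in P_\mathcal{X}$ and $h \in L(\mathcal{X})$ we have
$$T_*(h\mu) = p^*(h) \, T_*(\mu). \qquad (10)$$
In this case we shall call the measurable mapping $\overline{p}: \mathcal{Y} \to \mathcal{P}(\mathcal{X})$ defining the probabilistic mapping $p$ a conditional mapping for $T$.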
Example 6.
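Assume that $\kappa: \mathcal{X} \to \mathcal{Y}$ is a measurable mapping (i.e., a statistic) which is a probabilistic mapping sufficient for $P_\mathcal{X}$. Let $\overline{p}: \mathcal{Y} \to \mathcal{P}(\mathcal{X})$ be a conditional mapping for $\kappa$. By (9), $p^*(1_A)(y) = \overline{p}(y)(A)$ for $A \in \Sigma_\mathcal{X}$, and, taking $h = 1_A$, we rewrite (10) as follows:
$$\overline{p}(\cdot)(A) = \frac{d\kappa_*(1_A \mu)}{d\kappa_*\mu}. \qquad (11)$$
The RHS of (11) is the conditional measure of $\mu$ applied to $A$ w.r.t. the measurable mapping $\kappa$. The equality (11) implies that this conditional measure is regular and independent of $\mu$. Thus the notion of sufficiency of a measurable mapping $\kappa$ for $P_\mathcal{X}$ coincides with the classical notion of sufficiency of $\kappa$ for $P_\mathcal{X}$; see e.g., [], p. 28, [], Definition 2.8, p. 85. We also note that the equality in (11) is understood as an equality of equivalence classes in $L^1(\mathcal{Y}, \kappa_*\mu)$; hence every statistic that coincides with a sufficient statistic $\kappa$ outside a set of $\mu$-measure zero, for all $\mu \in P_\mathcal{X}$, is also a sufficient statistic for $P_\mathcal{X}$.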
Example 7.
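(cf. [], Lemma 2.8, p. 28) Assume that $\mu \in \mathcal{P}(\mathcal{X})$ has a regular conditional distribution w.r.t. a statistic $\kappa: \mathcal{X} \to \mathcal{Y}$; i.e., there exists a measurable mapping $\overline{p}: \mathcal{Y} \to \mathcal{P}(\mathcal{X})$ such that
$$\mu(A \cap \kappa^{-1}(B)) = \int_B \overline{p}(y)(A) \, d\kappa_*\mu(y) \qquad (12)$$
for any $A \in \Sigma_\mathcal{X}$ and $B \in \Sigma_\mathcal{Y}$. Let $\Theta$ be a set and $P = \{\nu_\theta = p_\theta \mu \mid \theta \in \Theta\}$ be a parameterized family of probability measures dominated by $\mu$. If there exists a function $h: \mathcal{Y} \times \Theta \to \mathbb{R}$ such that for all $\theta \in \Theta$ and $x \in \mathcal{X}$ we have
$$p_\theta(x) = h(\kappa(x), \theta), \qquad (13)$$
then $\kappa$ is sufficient for $P$, since, for any $A \in \Sigma_\mathcal{X}$,
$$\frac{d\kappa_*(1_A \nu_\theta)}{d\kappa_*\nu_\theta} = \overline{p}(\cdot)(A)$$
does not depend on $\theta$. Condition (13) is the Fisher–Neyman sufficiency condition for a family of dominated measures.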
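A classical instance of the Fisher–Neyman condition is $n$ i.i.d. Bernoulli trials with the statistic $\kappa(x) = x_1 + \cdots + x_n$: the density w.r.t. the counting measure is $\theta^{\kappa(x)}(1-\theta)^{n-\kappa(x)}$, a function of $\kappa(x)$ and $\theta$ only. The following sketch (the Bernoulli model and the helper names are illustrative choices, not taken from the text) computes the conditional distribution given $\kappa$ with exact rational arithmetic and confirms that it does not depend on $\theta$.

```python
import itertools
from fractions import Fraction


def bernoulli_prob(x, theta):
    """Probability of the 0/1 sequence x under i.i.d. Bernoulli(theta)."""
    s = sum(x)
    return theta ** s * (1 - theta) ** (len(x) - s)


def conditional_given_sum(n, theta):
    """Return P_theta(X = x | kappa(X) = kappa(x)) for kappa(x) = sum(x)."""
    seqs = list(itertools.product((0, 1), repeat=n))
    cond = {}
    for s in range(n + 1):
        fibre = [x for x in seqs if sum(x) == s]  # the level set kappa^{-1}(s)
        total = sum(bernoulli_prob(x, theta) for x in fibre)
        for x in fibre:
            cond[x] = bernoulli_prob(x, theta) / total
    return cond


c_low = conditional_given_sum(4, Fraction(1, 5))
c_high = conditional_given_sum(4, Fraction(2, 3))
print(c_low == c_high)  # prints True: the conditional law is free of theta
```

Each conditional probability equals $1/\binom{n}{s}$ on the fibre $\{\kappa = s\}$, which is exactly the $\theta$-independent conditional mapping required in (11).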
Example 8.
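Let $\kappa: \mathcal{X} \to \mathcal{Y}$ be a measurable 1-1 mapping. Then for any statistical model $P_\mathcal{X}$, the statistic $\kappa$ is sufficient w.r.t. $P_\mathcal{X}$, since, for any $\mu \in P_\mathcal{X}$ and any $A \in \Sigma_\mathcal{X}$, we have
$$\kappa_*(1_A \mu) = 1_{\kappa(A)} \, \kappa_*\mu.$$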
Next, we show that probabilistic mappings do not increase the Fisher metric on almost 2-integrable $C^k$-diffeological statistical models. Thus the Fisher metric serves as an “information quantity” of almost 2-integrable $C^k$-diffeological statistical models.
Theorem 2.
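Let $T: \mathcal{X} \rightsquigarrow \mathcal{Y}$ be a probabilistic mapping and $(P_\mathcal{X}, \mathcal{D})$ an almost 2-integrable $C^k$-diffeological statistical model. Then for any $\xi \in P_\mathcal{X}$ and any $v \in T_\xi(P_\mathcal{X}, \mathcal{D})$, we have
$$\mathfrak{g}_\xi(v, v) \ge \mathfrak{g}_{T_*\xi}(T_*v, T_*v),$$
with equality if $T$ is sufficient w.r.t. $P_\mathcal{X}$.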
Proof.
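The monotonicity assertion of Theorem 2 follows from (8). The second assertion of Theorem 2 follows from the first one, taking into account Theorem 2.8.2 in [], which states the existence of a probabilistic mapping $p: \mathcal{Y} \rightsquigarrow \mathcal{X}$ such that $p_*(T_*\xi) = \xi$ for all $\xi \in P_\mathcal{X}$, and therefore $p_*(T_*v) = v$ for all $v \in T_\xi(P_\mathcal{X}, \mathcal{D})$. Applying the first assertion to $T$ and then to $p$ gives $\mathfrak{g}_\xi(v, v) \ge \mathfrak{g}_{T_*\xi}(T_*v, T_*v) \ge \mathfrak{g}_\xi(v, v)$. □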
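On a finite sample space with $\xi$ of full support, the Fisher metric reduces to $\mathfrak{g}_\xi(v, v) = \sum_x v(x)^2/\xi(x)$, so Theorem 2 can be observed numerically. The sketch below uses a hypothetical point, tangent vector and Markov kernel; it shows the metric shrinking under a generic kernel and preserved under a bijection, which is sufficient by Example 8.

```python
import numpy as np


def fisher(v, xi):
    """Fisher metric g_xi(v, v) = sum_x v(x)^2 / xi(x) on a finite space, xi > 0."""
    return float(np.sum(v ** 2 / xi))


xi = np.array([0.1, 0.2, 0.3, 0.4])      # a point of the model
v = np.array([0.05, -0.1, 0.2, -0.15])   # a tangent vector (entries sum to 0)

# A generic Markov kernel T: X ~> Y with |Y| = 2 (rows sum to 1).
K = np.array([[0.9, 0.1],
              [0.6, 0.4],
              [0.3, 0.7],
              [0.2, 0.8]])
g_before = fisher(v, xi)
g_after = fisher(v @ K, xi @ K)  # metric of T_* v at T_* xi

# A bijection of the sample space (a permutation kernel) is sufficient.
perm = np.eye(4)[[2, 0, 3, 1]]
g_perm = fisher(v @ perm, xi @ perm)
```

Here `g_after` is much smaller than `g_before`, since merging outcomes discards information, while `g_perm` agrees with `g_before` up to rounding.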
Let us apply Theorem 2 to Example 4(3), originally from []. In [], Satz 1, p. 274, Friedrich considered the group of all measurable 1-1 mappings , such that . Clearly . Example 8 says that is a sufficient statistic w.r.t. . Hence Theorem 2 implies the following
Corollary 1.
([], Satz 1) The group acts isometrically on .
Remark 5.
Theorem 2 extends the Monotonicity Theorem [], Theorem 5.5, p. 265, for 2-integrable parameterized statistical models. (As we remark in Section 5, Theorem 2 can easily be extended to the case of almost $l$-integrable $C^k$-diffeological measure models.)
4. The Cramér–Rao Inequality for 2-Integrable Diffeological Statistical Models
In this section we prove a version of the Cramér–Rao inequality for estimators with values in a 2-integrable $C^k$-diffeological statistical model.
Definition 7.
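Let $P_\mathcal{X}$ be a statistical model. An estimator is a map $\hat\sigma: \mathcal{X} \to P_\mathcal{X}$.
Assume that $V$ is a locally convex topological vector space. Then we denote by $\mathrm{Map}(P_\mathcal{X}, V)$ the space of all mappings $\varphi: P_\mathcal{X} \to V$, and by $V'$ the topological dual of $V$. It is usually easier to estimate only a “coordinate” $\varphi(\xi)$ of a probability measure $\xi \in P_\mathcal{X}$, which determines $\xi$ uniquely if $\varphi$ is an embedding.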
Definition 8.
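Let $P_\mathcal{X}$ be a statistical model and $\varphi \in \mathrm{Map}(P_\mathcal{X}, V)$. A φ-estimator is a composition $\varphi \circ \hat\sigma: \mathcal{X} \to V$.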
Example 9.
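Assume that $K: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a symmetric and positive definite kernel function and let $V$ be the associated RKHS. For any $x \in \mathcal{X}$, we denote by $K_x$ the function on $\mathcal{X}$ defined by $K_x(y) := K(x, y)$ for any $y \in \mathcal{X}$. Then $K_x$ is an element of $V$. Let $P_\mathcal{X} \subset \mathcal{P}(\mathcal{X})$ be a statistical model. Then we define the kernel mean embedding $m: P_\mathcal{X} \to V$ as follows []:
$$m(\xi) := \int_\mathcal{X} K_x \, d\xi(x),$$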
where the integral should be understood as a Bochner integral.
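Writing $K$ for the kernel, for a discrete measure $\xi = \sum_i w_i \delta_{x_i}$ the Bochner integral defining the kernel mean embedding reduces to the finite sum $\sum_i w_i K_{x_i}$, and inner products of embeddings follow from the reproducing property $\langle K_x, K_y \rangle_V = K(x, y)$. The sketch below assumes a Gaussian kernel on $\mathbb{R}$ (our choice; the text does not fix a kernel) and computes the squared RKHS distance between two embedded measures; the function names are illustrative.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gram matrix K(a_i, b_j) = exp(-|a_i - b_j|^2 / (2 bandwidth^2)) for 1-D arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-d ** 2 / (2 * bandwidth ** 2))


def embedding_inner(xs, wx, ys, wy, bandwidth=1.0):
    """<m(xi), m(eta)>_V for xi = sum_i wx_i delta_{xs_i} and eta = sum_j wy_j delta_{ys_j}."""
    return float(wx @ gaussian_kernel(xs, ys, bandwidth) @ wy)


def mmd2(xs, wx, ys, wy, bandwidth=1.0):
    """Squared RKHS distance |m(xi) - m(eta)|_V^2, expanded via the reproducing property."""
    return (embedding_inner(xs, wx, xs, wx, bandwidth)
            - 2 * embedding_inner(xs, wx, ys, wy, bandwidth)
            + embedding_inner(ys, wy, ys, wy, bandwidth))


xs, wx = np.array([0.0, 1.0]), np.array([0.5, 0.5])
ys, wy = np.array([0.0, 2.0]), np.array([0.25, 0.75])
print(mmd2(xs, wx, ys, wy))
```

Because the Gaussian kernel is characteristic, distinct probability measures get distinct embeddings, so the printed distance is strictly positive; for a general kernel $m$ need not be injective.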
Remark 6.
(1) In classical statistics (see e.g., [], §13, p. 51, [], p. 4, [], §4, p. 82, [], Definition 5.1, p. 277) one considers only parameter estimation for parameterized statistical models. In this case, an estimator is a map from $\mathcal{X}$ to the parameter set $\Theta$ of a statistical model $\mathbf{p}: \Theta \to \mathcal{P}(\mathcal{X})$. Usually one assumes that the parameterization is 1-1; hence, parameter estimation is equivalent to nonparametric estimation in the sense of Definition 7. Note that the ultimate aim of a statistical experiment is to estimate the probability measure generating the observable of the experiment. In general, we can only assume that the unknown generating probability measure belongs to a statistical model $P_\mathcal{X}$. In this case, we need to use nonparametric estimation; see e.g., [], p. 1. Note that, by Example 3(2), $P_\mathcal{X}$ has a natural structure of a $C^k$-diffeological statistical model.
(2) The notion of a φ-estimator occurs in classical statistics in a similar fashion; see e.g., [], p. 52, where the author calls similar estimators substitution estimators, and [], Definition 1.2, p. 4, where the authors consider estimands, which are versions of φ-estimators for a parameter estimation problem; see [], p. 279.
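For $l \in V'$ and $\varphi \in \mathrm{Map}(P_\mathcal{X}, V)$ we denote by $\varphi^l$ the composition $l \circ \varphi$. Then we set
$$L^2_{P_\mathcal{X}}(\mathcal{X}) := \{\hat\sigma: \mathcal{X} \to P_\mathcal{X} \mid \varphi^l \circ \hat\sigma \in L^2(\mathcal{X}, \xi) \text{ for all } \xi \in P_\mathcal{X} \text{ and all } l \in V'\}.$$
For $\hat\sigma \in L^2_{P_\mathcal{X}}(\mathcal{X})$ we define the $\varphi$-mean value of $\hat\sigma$, denoted by $\varphi_{\hat\sigma}: P_\mathcal{X} \to V''$, as follows (cf. [], (5.54), p. 279):
$$\varphi_{\hat\sigma}(\xi)(l) := \mathbb{E}_\xi(\varphi^l \circ \hat\sigma) \quad \text{for } \xi \in P_\mathcal{X}, \ l \in V'.$$
Let us identify $V$ with a subspace of $V''$ via the canonical pairing.
The difference $b^\varphi_{\hat\sigma} := \varphi_{\hat\sigma} - \varphi \in \mathrm{Map}(P_\mathcal{X}, V'')$ will be called the bias of the $\varphi$-estimator $\varphi \circ \hat\sigma$.
For all $\xi \in P_\mathcal{X}$ we define a quadratic function $MSE^\varphi_\xi[\hat\sigma]$ on $V'$, which is called the mean square error quadratic function at $\xi$, by setting for $l, k \in V'$ (cf. [], (5.56), p. 279)
$$MSE^\varphi_\xi[\hat\sigma](l, k) := \mathbb{E}_\xi\big[(\varphi^l \circ \hat\sigma - \varphi^l(\xi)) \, (\varphi^k \circ \hat\sigma - \varphi^k(\xi))\big].$$
Similarly, we define the variance quadratic function of the $\varphi$-estimator $\varphi \circ \hat\sigma$ at $\xi$ as the quadratic form $V^\varphi_\xi[\hat\sigma]$ on $V'$ such that, for all $l, k \in V'$, we have (cf. [], (5.57), p. 279)
$$V^\varphi_\xi[\hat\sigma](l, k) := \mathbb{E}_\xi\big[(\varphi^l \circ \hat\sigma - \mathbb{E}_\xi(\varphi^l \circ \hat\sigma)) \, (\varphi^k \circ \hat\sigma - \mathbb{E}_\xi(\varphi^k \circ \hat\sigma))\big].$$
Then it is known that ([], (5.58), p. 279)
$$MSE^\varphi_\xi[\hat\sigma](l, k) = V^\varphi_\xi[\hat\sigma](l, k) + \langle b^\varphi_{\hat\sigma}(\xi), l \rangle \, \langle b^\varphi_{\hat\sigma}(\xi), k \rangle.$$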
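The decomposition of the mean square error quadratic function into variance plus the product of biases can be checked on a finite sample space with $V = \mathbb{R}^2$, identifying $V'$ with $\mathbb{R}^2$ via the dot product. In the sketch below, the measure $\xi$, the values of the $\varphi$-estimator and the target $\varphi(\xi)$ are random or hypothetical placeholders, and `mse_form`/`var_form` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(1)
xi = rng.random(5)
xi /= xi.sum()                     # a probability measure on a 5-point space
phi_hat = rng.normal(size=(5, 2))  # values phi(sigma_hat(x)) of the phi-estimator in V = R^2
phi_xi = np.array([0.3, -0.1])     # the target phi(xi) (hypothetical)


def mse_form(l, k):
    """E_xi[(l . (phi_hat - phi_xi)) (k . (phi_hat - phi_xi))]."""
    a, b = (phi_hat - phi_xi) @ l, (phi_hat - phi_xi) @ k
    return float(xi @ (a * b))


def var_form(l, k):
    """E_xi[(l . (phi_hat - mean)) (k . (phi_hat - mean))], with mean = E_xi(phi_hat)."""
    mean = xi @ phi_hat
    a, b = (phi_hat - mean) @ l, (phi_hat - mean) @ k
    return float(xi @ (a * b))


bias = xi @ phi_hat - phi_xi  # phi-mean value minus phi(xi)
e1, e2 = np.eye(2)
# In a Hilbert space the scalar MSE is the trace over an orthonormal basis.
scalar_mse = float(xi @ np.sum((phi_hat - phi_xi) ** 2, axis=1))
```

The last line illustrates the Hilbert-space case: summing the quadratic function over an orthonormal basis recovers the scalar mean square error $\mathbb{E}_\xi|\varphi\circ\hat\sigma - \varphi(\xi)|^2$.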
Remark 7.
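Assume that $V$ is a real Hilbert space with a scalar product $\langle \cdot, \cdot \rangle$ and the associated norm $|\cdot|$. Then the scalar product defines a canonical isomorphism $V \to V'$, $v \mapsto v^\flat$, where $v^\flat(w) := \langle v, w \rangle$ for all $v, w \in V$. For $\hat\sigma \in L^2_{P_\mathcal{X}}(\mathcal{X})$, the mean square error $MSE^\varphi_\xi(\hat\sigma)$ of the φ-estimator $\varphi \circ \hat\sigma$ is defined by
$$MSE^\varphi_\xi(\hat\sigma) := \mathbb{E}_\xi\big(|\varphi \circ \hat\sigma - \varphi(\xi)|^2\big). \qquad (16)$$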
The RHS of (16) is well-defined, since , and therefore
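Similarly, we define the variance of a φ-estimator $\varphi \circ \hat\sigma$ at $\xi$ as follows:
$$V^\varphi_\xi(\hat\sigma) := \mathbb{E}_\xi\big(|\varphi \circ \hat\sigma - \mathbb{E}_\xi(\varphi \circ \hat\sigma)|^2\big).$$
If $V$ has a countable basis of orthonormal vectors $v_1, v_2, \ldots$, then we have
$$MSE^\varphi_\xi(\hat\sigma) = \sum_{i=1}^{\infty} MSE^\varphi_\xi[\hat\sigma](v_i^\flat, v_i^\flat), \qquad V^\varphi_\xi(\hat\sigma) = \sum_{i=1}^{\infty} V^\varphi_\xi[\hat\sigma](v_i^\flat, v_i^\flat).$$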
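Now, we assume that $(P_\mathcal{X}, \mathcal{D})$ is an almost 2-integrable $C^k$-diffeological statistical model. For any $\xi \in P_\mathcal{X}$, let $T^\mathfrak{g}_\xi P_\mathcal{X}$ be the completion of $T_\xi(P_\mathcal{X}, \mathcal{D})$ w.r.t. the Fisher metric $\mathfrak{g}$. Since $T^\mathfrak{g}_\xi P_\mathcal{X}$ is a Hilbert space, the map
$$L_\mathfrak{g}: T^\mathfrak{g}_\xi P_\mathcal{X} \to (T^\mathfrak{g}_\xi P_\mathcal{X})', \qquad L_\mathfrak{g}(v)(w) := \langle v, w \rangle_\mathfrak{g},$$
is an isomorphism. Then we define the inverse $\mathfrak{g}^{-1}$ of the Fisher metric on $(T^\mathfrak{g}_\xi P_\mathcal{X})'$ as follows:
$$\langle L_\mathfrak{g} v, L_\mathfrak{g} w \rangle_{\mathfrak{g}^{-1}} := \langle v, w \rangle_\mathfrak{g}.$$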
Definition 9.
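(cf. [], Definition 5.18, p. 281) Assume that $\hat\sigma \in L^2_{P_\mathcal{X}}(\mathcal{X})$. We shall call $\hat\sigma$ a φ-regular estimator, if for all $l \in V'$ the function $\xi \mapsto \|\varphi^l \circ \hat\sigma\|_{L^2(\mathcal{X}, \xi)}$ is locally bounded, i.e., for all $\xi_0 \in P_\mathcal{X}$
$$\limsup_{\xi \to \xi_0} \|\varphi^l \circ \hat\sigma\|_{L^2(\mathcal{X}, \xi)} < \infty.$$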
Proposition 2.
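Assume that $(P_\mathcal{X}, \mathcal{D})$ is a 2-integrable $C^k$-diffeological statistical model, $V$ is a topological vector space, $\varphi \in \mathrm{Map}(P_\mathcal{X}, V)$, and $\hat\sigma \in L^2_{P_\mathcal{X}}(\mathcal{X})$ is a φ-regular estimator. Then the $V''$-valued function $\varphi_{\hat\sigma}$ is Gateaux-differentiable on $(P_\mathcal{X}, \mathcal{D})$. Furthermore, for any $l \in V'$, the differential $d\varphi^l_{\hat\sigma}(\xi)$ of $\varphi^l_{\hat\sigma} := \varphi_{\hat\sigma}(\cdot)(l)$ extends to an element of $(T^\mathfrak{g}_\xi P_\mathcal{X})'$ for all $\xi \in P_\mathcal{X}$.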
Proof.
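Assume that a map $\psi: U \to P_\mathcal{X}$ belongs to $\mathcal{D}$. Then $(U, \mathcal{X}, \psi)$ is a 2-integrable parametrized statistical model. By Lemma 5.2 in [], p. 282, the composition $\varphi^l_{\hat\sigma} \circ \psi: U \to \mathbb{R}$ is differentiable for every $l \in V'$. This proves the first assertion of Proposition 2.
Next, we show that $d\varphi^l_{\hat\sigma}(\xi)$ extends to an element of $(T^\mathfrak{g}_\xi P_\mathcal{X})'$ for all $\xi \in P_\mathcal{X}$. Let $\xi \in P_\mathcal{X}$ and let $c: \mathbb{R} \to P_\mathcal{X}$ be a $C^k$-curve in $\mathcal{D}$ such that $c(0) = \xi$ and $\dot c(0) = v$. By Lemma 5.3 [], p. 284, we have
$$\partial_v \varphi^l_{\hat\sigma} = \int_\mathcal{X} \big(\varphi^l \circ \hat\sigma - \mathbb{E}_\xi(\varphi^l \circ \hat\sigma)\big) \, \log v \, d\xi, \qquad (20)$$
where $v \in C_\xi(P_\mathcal{X}, \mathcal{D})$. Denote by $\Pi_\xi: L^2(\mathcal{X}, \xi) \to \overline{\log T_\xi(P_\mathcal{X}, \mathcal{D})}$ the orthogonal projection onto the closure of the logarithmic representation of $T_\xi(P_\mathcal{X}, \mathcal{D})$. Set
$$\nabla \varphi^l_{\hat\sigma}(\xi) := \Pi_\xi\big(\varphi^l \circ \hat\sigma - \mathbb{E}_\xi(\varphi^l \circ \hat\sigma)\big).$$
Then we rewrite (20) as follows:
$$\partial_v \varphi^l_{\hat\sigma} = \langle \nabla \varphi^l_{\hat\sigma}(\xi), \log v \rangle_{L^2(\mathcal{X}, \xi)}.$$
Hence $d\varphi^l_{\hat\sigma}(\xi)$ is the restriction to $T_\xi(P_\mathcal{X}, \mathcal{D})$ of the continuous linear functional $w \mapsto \langle \nabla \varphi^l_{\hat\sigma}(\xi), \log w \rangle_{L^2(\mathcal{X}, \xi)}$ on $T^\mathfrak{g}_\xi P_\mathcal{X}$. This completes the proof of Proposition 2. □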
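For any $\xi \in P_\mathcal{X}$, we denote by $(\mathfrak{g}^\varphi_{\hat\sigma})^{-1}(\xi)$ the following quadratic form on $V'$:
$$(\mathfrak{g}^\varphi_{\hat\sigma})^{-1}(\xi)(l, k) := \langle d\varphi^l_{\hat\sigma}(\xi), d\varphi^k_{\hat\sigma}(\xi) \rangle_{\mathfrak{g}^{-1}}.$$
Theorem 3
(Diffeological Cramér–Rao inequality). Let $(P_\mathcal{X}, \mathcal{D})$ be a 2-integrable $C^k$-diffeological statistical model, $\varphi$ a $V$-valued function on $P_\mathcal{X}$, and $\hat\sigma \in L^2_{P_\mathcal{X}}(\mathcal{X})$ a φ-regular estimator. Then the difference $V^\varphi_\xi[\hat\sigma] - (\mathfrak{g}^\varphi_{\hat\sigma})^{-1}(\xi)$ is a positive semi-definite quadratic form on $V'$ for any $\xi \in P_\mathcal{X}$.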
Proof.
Theorem 3 is an extension of the general Cramér–Rao inequality [], Theorem 2, see also [], Theorem 5.7, p. 286.
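As a sanity check of Theorem 3 in the simplest situation, take the binomial model $\xi_\theta = \mathrm{Bin}(n, \theta)$ on $\{0, \ldots, n\}$, whose tangent space at $\xi_\theta$ is spanned by $v = d\xi_\theta/d\theta$. For a real-valued φ-estimator $t$, the inverse Fisher metric evaluated on the differential of the φ-mean value is then the classical bound $(\frac{d}{d\theta}\mathbb{E}_\theta t)^2 / \mathfrak{g}(v, v)$. The sketch below (the model, the two estimators and the function names are illustrative assumptions) shows that the gap vanishes for the sample mean, which is affine in the score, and is positive for its square.

```python
from math import comb

import numpy as np


def binom_pmf(n, theta):
    """Weights of Bin(n, theta) on {0, ..., n}."""
    x = np.arange(n + 1)
    coeffs = np.array([comb(n, k) for k in range(n + 1)], dtype=float)
    return coeffs * theta ** x * (1 - theta) ** (n - x)


def binom_tangent(n, theta):
    """Tangent vector v = d/dtheta xi_theta as a signed measure on {0, ..., n}."""
    x = np.arange(n + 1)
    return binom_pmf(n, theta) * (x / theta - (n - x) / (1 - theta))


def cramer_rao_gap(t, n, theta):
    """Return (variance - bound, variance, bound, Fisher metric g(v, v))."""
    xi = binom_pmf(n, theta)
    v = binom_tangent(n, theta)
    fisher = float(np.sum(v ** 2 / xi))  # g_xi(v, v)
    mean = float(xi @ t)
    var = float(xi @ (t - mean) ** 2)
    dmean = float(v @ t)                 # d/dtheta E_theta[t]
    bound = dmean ** 2 / fisher          # inverse Fisher metric on the differential
    return var - bound, var, bound, fisher


x = np.arange(11)
gap_mean, _, _, fisher_val = cramer_rao_gap(x / 10, 10, 0.3)
gap_square, _, _, _ = cramer_rao_gap((x / 10) ** 2, 10, 0.3)
```

For the sample mean the gap is zero up to rounding and the Fisher metric equals $n/(\theta(1-\theta))$; for the squared mean the variance strictly exceeds the bound.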
5. Discussion
The extension of the notion of a $k$-integrable parametrized measure model (as introduced in [,], see also []) to the notion of an almost $k$-integrable diffeological measure model is straightforward.
(1) There are two main differences between parameterized statistical models and $C^k$-diffeological statistical models. First, the parameter space of a parameterized statistical model is a single smooth Banach manifold, whereas the parameter spaces of a $C^k$-diffeological statistical model can be different but compatible. Second, the parameter spaces of a $C^k$-diffeological statistical model are finite dimensional. If $k = \infty$, this assumption is well-motivated [], see also Remark 2 (2).
(2) It would be interesting to apply the theory of $C^k$-diffeological statistical models to stochastic processes. It is known that Banach manifolds are not suitable for many questions of global analysis, see e.g., [], p. 1; therefore, the theory of parameterized measure models might have limited applications to stochastic processes. On the other hand, there are many open questions in the theory of $C^k$-diffeological spaces; e.g., we do not know under which conditions we can define the Levi–Civita connection on a Riemannian $C^k$-diffeological space. Furthermore, the theory of $C^k$-diffeological spaces with $k < \infty$ has not been considered before.
(3) The variational calculus founded by Leibniz and Newton is a cornerstone of differential geometry and modern analysis. In our opinion, it is best expressed in the language of diffeological spaces that declare which mappings into a diffeological space are smooth. This language is a counterpart to the language of ringed spaces in algebraic geometry that declares which functions are algebraic.
Funding
This research was funded by the Institutional Research Plan RVO:67985840 and by the Grant Agency of Czech Republic, grant number GAČR-18-01953J.
Acknowledgments
The author would like to thank Patrick Iglesias-Zemmour for a stimulating discussion on diffeology, Lorenz Schwachhöfer for helpful comments on an early version of this paper and Tat Dat To for the suggestion to consider Friedrich’s examples in []. A part of this paper was completed during the Workshop “Information Geometry” in Toulouse 14–18 October 2019. The author would like to thank the organizers, and especially Stephane Puechmorel, for their invitation and hospitality during the workshop. The author is grateful to the anonymous referees for their critical comments and suggestions, which helped her to significantly improve the exposition of this paper.
Conflicts of Interest
The author declares no conflict of interest.
References
- McCullagh, P. What is a statistical model? Ann. Stat. 2002, 30, 1225–1310.
- Chentsov, N. Statistical Decision Rules and Optimal Inference; Nauka: Moscow, Russia, 1972; English translation in: Translation of Math. Monograph vol. 53, Amer. Math. Soc.: Providence, RI, USA, 1982.
- Amari, S. Differential-Geometric Methods in Statistics; Lecture Notes in Statistics 28; Springer: Heidelberg, Germany, 1985.
- Amari, S. Information Geometry and Its Applications; Applied Mathematical Sciences; Springer: Berlin, Germany, 2016; Volume 194.
- Ay, N.; Jost, J.; Lê, H.V.; Schwachhöfer, L. Information Geometry; Springer Nature: Cham, Switzerland, 2017.
- Ay, N.; Jost, J.; Lê, H.V.; Schwachhöfer, L. Information geometry and sufficient statistics. Probab. Theory Relat. Fields 2015, 162, 327–364.
- Ay, N.; Jost, J.; Lê, H.V.; Schwachhöfer, L. Parameterized measure models. Bernoulli 2018, 24, 1692–1725.
- Amari, S.; Nagaoka, H. Methods of Information Geometry; Translations of Mathematical Monographs 191; Amer. Math. Soc.: Providence, RI, USA, 2000.
- Pistone, G.; Sempi, C. An infinite-dimensional structure on the space of all the probability measures equivalent to a given one. Ann. Stat. 1995, 23, 1543–1561.
- Jost, J.; Lê, H.V.; Luu, D.H.; Tran, T.D. Probabilistic mappings and Bayesian nonparametrics. arXiv 2019, arXiv:1905.11448.
- Lê, H.V.; Jost, J.; Schwachhöfer, L. The Cramér-Rao Inequality on Singular Statistical Models. In Proceedings of the Conference “Geometric Science of Information”, GSI 2017, Paris, France, 7–9 November 2017; LNCS. Springer Nature: Cham, Switzerland, 2017; Volume 10589, pp. 552–560.
- Lê, H.V.; Somberg, P.; Vanžura, J. Smooth structures on pseudomanifolds with isolated conical singularities. Acta Math. Vietnam. 2013, 38, 33–54.
- Lê, H.V.; Somberg, P.; Vanžura, J. Poisson smooth structures on stratified symplectic spaces. In The Springer Proceedings in Mathematics & Statistics “Mathematics in the 21st Century, 6th World Conference”, Lahore, March 2013; Springer: Basel, Switzerland, 2015; Volume 98, Chapter 7; pp. 181–204.
- Iglesias-Zemmour, P. Diffeology; Amer. Math. Soc.: Providence, RI, USA, 2013.
- Kriegl, A.; Michor, P.W. The Convenient Setting of Global Analysis; Amer. Math. Soc.: Providence, RI, USA, 1997.
- Grabiner, S. Range of products of operators. Can. J. Math. 1974, XXVI, 1430–1441.
- Friedrich, T. Die Fisher-Information und symplektische Strukturen. Math. Nachr. 1991, 153, 273–296.
- Souriau, J.-M. Groupes différentiels. In Lecture Notes in Mathematics, Vol. 836; Springer: Berlin, Germany, 1980; pp. 91–128.
- Chen, K.T. Iterated path integrals. Bull. Am. Math. Soc. 1977, 83, 831–879.
- Lawvere, W.F. The Category of Probabilistic Mappings. 1962. Unpublished. Available online: https://ncatlab.org/nlab/files/lawvereprobability1962.pdf (accessed on 19 December 2019).
- Bogachev, V.I. Weak Convergence of Measures; Mathematical Surveys and Monographs; Amer. Math. Soc.: Providence, RI, USA, 2018; Volume 234.
- Giry, M. A categorical approach to probability theory. In Categorical Aspects of Topology and Analysis; Banaschewski, B., Ed.; Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1982; Volume 915, pp. 68–85.
- Morse, N.; Sacksteder, R. Statistical isomorphism. Ann. Math. Stat. 1966, 37, 203–214.
- Schervish, M.J. Theory of Statistics, 2nd ed.; Springer: New York, NY, USA, 1997.
- Muandet, K.; Fukumizu, K.; Sriperumbudur, B.; Schölkopf, B. Kernel Mean Embedding of Distributions: A Review and Beyond. Found. Trends Mach. Learn. 2017, 10, 1–141.
- Borovkov, A.A. Mathematical Statistics; Gordon and Breach Science Publishers: Amsterdam, The Netherlands, 1998.
- Ibragimov, I.A.; Has’minskii, R.Z. Statistical Estimation: Asymptotic Theory; Springer: New York, NY, USA, 1981.
- Tsybakov, A.B. Introduction to Nonparametric Estimation; Springer Science+Business Media: New York, NY, USA, 2009.
- Lehmann, E.L.; Casella, G. Theory of Point Estimation, 2nd ed.; Springer: New York, NY, USA, 1998.
© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).