Abstract
Analogous to Kolmogorov’s theorem for the existence of stochastic processes describing random functions, we consider theorems for the existence of stochastic processes describing random measures as limits of inverse measure systems. Specifically, given a coherent inverse system of random (bounded/signed/positive/probability) histograms on refining partitions, we study conditions for the existence and uniqueness of a corresponding random inverse limit, a Radon probability measure on the space of (bounded/signed/positive/probability) measures. Depending on the topology (vague/tight/weak/total-variational) and Kingman’s notion of complete randomness, the limiting random measure is in one of four phases, distinguished by their degrees of concentration (support/domination/discreteness). The results are applied to the well-known Dirichlet and Pólya-tree families of random probability measures and to a new Gaussian family of signed inverse limit measures. In these three families, examples of all four phases occur, and we describe the corresponding conditions on the defining parameters.
Keywords:
random Radon measure; stochastic process (existence); stochastic integral; phase structure
MSC:
28C20; 60B11; 60G07; 60G15; 60G57
1. Introduction
Underpinning theories of random functions is Kolmogorov’s theorem for the existence of stochastic processes. Given a (e.g., Polish) domain $\mathscr{X}$ for real-valued functions $f:\mathscr{X}\to\mathbb{R}$, we depart from the collection of projections $f\mapsto f_S$ (where $S$ is any finite subset of $\mathscr{X}$) and, for every $S$, we provide a probability distribution $\Pi_S$ for the following projection:
$$f_S = \bigl(f(x) : x\in S\bigr)\in\mathbb{R}^S.$$
Consistency among projections dictates the necessary condition that $S\subset S'$ implies that $\Pi_S$ is marginal to $\Pi_{S'}$. Kolmogorov’s theorem says that this consistency condition is also sufficient: if the chosen $\Pi_S$ are consistent, there exists a probability distribution $\Pi$ for a random function $f$ such that the projected random $f_S$ are distributed according to $\Pi_S$ for all finite $S\subset\mathscr{X}$.
Kolmogorov’s theorem has advantages and disadvantages: On the one hand, mere consistency is enough without further conditions, making existence remarkably easy to prove. This enhances applicability greatly, as does the simplicity of the approach: properties of derive from those of the (user-defined) , and calculations for random are feasible because they take place in finite-dimensional probability spaces. On the other hand, the limiting is a Borel measure for the product topology on , which dissociates from some interesting other (e.g., metrizable) topologies, while the indistinct nature of forces the imposition of extra conditions on the choices of the to induce properties like (-almost-sure) continuity, differentiability, or integrability.
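The consistency condition behind Kolmogorov’s theorem can be illustrated with a centred Gaussian family, for which marginalization onto a subset of indices amounts to selecting the matching rows and columns of the covariance matrix. The following is a minimal sketch under assumptions that are not part of the text: the squared-exponential kernel, the index sets and the helper names `kernel`, `fdd_covariance` and `marginal_covariance` are all hypothetical choices made for illustration.

```python
import math

def kernel(x, y, length=1.0):
    """Squared-exponential covariance; a hypothetical choice for illustration."""
    return math.exp(-0.5 * ((x - y) / length) ** 2)

def fdd_covariance(points):
    """Covariance matrix of the centred Gaussian distribution on the finite index set."""
    return [[kernel(x, y) for y in points] for x in points]

def marginal_covariance(points, cov, subset):
    """Marginalise a centred Gaussian onto `subset` by keeping matching rows and columns."""
    idx = [points.index(s) for s in subset]
    return [[cov[i][j] for j in idx] for i in idx]

S_large = [0.0, 0.5, 1.0, 2.0]   # larger finite index set
S_small = [0.5, 2.0]             # subset of S_large

cov_large = fdd_covariance(S_large)
cov_small_direct = fdd_covariance(S_small)
cov_small_marginal = marginal_covariance(S_large, cov_large, S_small)

# Consistency: the marginal of the larger distribution equals the one specified directly.
consistent = cov_small_direct == cov_small_marginal
```

Since a centred Gaussian distribution is fixed by its covariance, agreement of the two matrices is exactly the marginal consistency required of the finite-dimensional distributions in this example.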
Probability measures on spaces of measures are (less common, but) also of interest in various parts of science. For example, in non-parametric statistics [], particularly of the Bayesian type [,], probability distributions on spaces of probability measures play a central role; machine learning shares much with statistics, and random (probability) measures also feature prominently there.
In those disciplines, the most straightforward approach is often to define random measures by the mapping of random functions: it is commonplace, for example, to normalize an integrable positive random function in order to define a random probability density function. But it remains attractive to think about constructions of random measures that take place directly on the space of measures in its full generality. In this sense, based on the Poisson-type family of completely random measures [,], a well-developed theory of point processes (e.g., the family of Dirichlet random probability measures) exists (for an overview, see [,]) and is applied widely. Further examples exist, but a comprehensive mathematical theory for the construction and study of probability measures on measure spaces is lacking to date.
Ideally, such a theory of random measures would be based on an existence theorem like Kolmogorov’s theorem for random functions. In this paper, we formulate several new existence theorems of this kind. For a directed set of finite, measurable partitions of , we choose distributions and define, for all restrictions of to ,
These projections are called random histograms in what follows, and their consistency follows from the additivity of : if refines , then, for any ,
when . Under which conditions does such a system of random histograms have a (unique) histogram limit ? (By which we mean a probability measure on the space of measures with -restrictions that match the .)
This question is, of course, not new: both Bayesian non-parametric statistics and stochastic analysis have formulated a wide variety of conditions for existence, more or less independently. First explorations of the subject in stochastic analysis date back to the studies of [,]. The authors of these studies formulated the classical Bochner–Kolmogorov conditions for the existence of a random distribution function on . Other approaches based on inner regularity are considered in [,,] and discussed comprehensively in [,]. Definitions of measure-theoretic inverse limits are presented in [,,,]. Limits of random histogram systems in the Bayesian non-parametric literature were first discussed in [], which introduced the Pólya-tree family of histogram systems. Many further developments were based on Kingman’s completely random measures, most prominently in the form of the Dirichlet process [,]. For overviews of these and further developments on non-parametric Bayesian priors, see [,]. Regarding existence, most noteworthy is [], which formulated the so-called Mean-measure condition for the existence of a limit for a system of random probability histograms : Orbanz requires that there exists a Borel probability measure G on with histogram projections that match histogram expectations: for all partitions and all .
In this paper we prove and apply several existence theorems for limits of random histogram systems and analyze the variety of ways in which the corresponding random measures manifest in theory and examples. After introductory remarks and a discussion of the Bourbaki–Prokhorov–Schwartz theorem in Section 2, we consider spaces of probability measures with the tight, weak or total-variational topology, and we derive conditions that guarantee the existence and uniqueness of a limiting Radon probability measure in those cases in Section 3 and Section 4. As it turns out, the manifestations of their respective random probability measures are quite different: a limit that is Radon for the weak or total-variational topology is supported by the subset of of measures dominated by G, while a that is Radon for the tight topology is supported by the subset of of measures with support contained in that of G. Combined with the Poisson-like manifestation of completely random measures, in Section 5, we distinguish four phases for random probability histogram limits: absolutely-continuous, fixed-atomic, continuous-singular and random-atomic. The results are applied to known examples like the Dirichlet (in Section 6) and Pólya-tree (in Section 7) families, and we re-derive and sharpen some of the existing results.
In Section 8, we consider spaces of signed measures with the vague, tight and weak topologies and derive conditions that guarantee the existence and uniqueness of Radon histogram limits . The generalization to signed measures accommodates a new family of Gaussian probability distributions, defined as limits of random histogram systems of the form,
where N denotes a multivariate normal distribution with (suitably defined) expectation and covariance . All four phases found for probability histogram limits are also realized in this setting, so Gaussian histogram limits exist with the same wide range of diffuse and point-like manifestations. We argue that Gaussian histogram limits based on Green’s functions for the harmonic operator generalize the well-known two-dimensional Gaussian free field [] to higher dimensions, suggesting a potential role in four-dimensional Euclidean quantum field theory.
To conclude, we emphasize the constructive nature of the existence theorems provided: random histogram systems not only define but also approximate random measures. The approximative property has two large advantages, one computational and one analytic: Firstly, histogram systems consist of finite-dimensional probability distributions, which we can simulate. The Dirichlet process, for example, derives much of its immense popularity from its ease of numerical implementation and use, and this considerable advantage extends to all histogram methods. The second advantage lies in mathematical accessibility. The analyses of example histogram limits in Section 6, Section 7 and Section 8 are possible only because calculations with finite-dimensional random histograms are feasible, and limits of the results correspond to properties of the infinite-dimensional histogram limits.
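To make the computational remark concrete, the following minimal sketch simulates one random histogram on a fine partition and coarsens it by summation. It rests on assumptions that are not taken from the text: the histogram is Dirichlet-distributed with parameters equal to the base-measure masses of the cells (the standard finite-dimensional form of the Dirichlet family), the partitions are 8 and 4 equal cells of the unit interval, and the names `sample_dirichlet` and `coarsen` are hypothetical.

```python
import random

def sample_dirichlet(params, rng):
    """Draw one Dirichlet vector by normalising independent Gamma(a, 1) variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(gammas)
    return [g / total for g in gammas]

def coarsen(hist, blocks):
    """Sum fine-cell values over each block of fine-cell indices."""
    return [sum(hist[i] for i in block) for block in blocks]

rng = random.Random(0)

# Fine partition: 8 equal cells; base measure of total mass 4, spread uniformly (assumption).
fine_params = [4.0 / 8] * 8
# Coarse partition: 4 cells, each the union of two adjacent fine cells.
blocks = [[0, 1], [2, 3], [4, 5], [6, 7]]

fine_hist = sample_dirichlet(fine_params, rng)     # random histogram on the fine partition
coarse_hist = coarsen(fine_hist, blocks)           # induced histogram on the coarse partition
coarse_params = coarsen(fine_params, blocks)       # base-measure masses of the coarse cells
```

By the aggregation property of the Dirichlet distribution, `coarse_hist` is again Dirichlet-distributed, with parameters `coarse_params`, so sampling on the fine partition and summing is consistent with sampling on the coarse partition directly.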
2. Limits of Random Histogram Systems
The existence theorems that follow in Section 3 and Section 4 require some (tedious but necessary) bookkeeping of partitions and corresponding notation (Section 2.1), as well as a preparatory discussion of the relevant existence theorem, the Bourbaki–Prokhorov–Schwartz theorem (Section 2.2).
2.1. Inverse Systems of Random Histograms
We start by introducing directed sets of partitions and the associated histogram systems.
2.1.1. Measures, Partitions and Histograms
Let be a Hausdorff topological space (further specification, e.g., Polishness, compactness, etc., follows below). For most purposes, can be thought of as a space like : for statisticians, this space plays the role of the sample space, while, for probabilists and physicists, represents Euclidean space-time. The space has a Borel -algebra that we denote by . We consider a collection of partitions of , consisting of finite numbers of non-empty Borel sets. (The maximal such collection, denoted , contains all finite partitions into non-empty Borel sets.) We order partially by the refinement of partitions (if and refines , write ) and assume throughout that forms a directed set for ordering by refinement. Naturally, implies inclusion for the generated -algebras . With the notation for the cardinality of , let denote the index set . Furthermore, we associate with a finite, discrete space and the mapping , such that for all . For all , we also define such that (and as the identity on for all ), and we define to be such that for all .
Consider (or any of the discrete spaces ): define (or ) to be the linear space of all bounded, continuous (or ), and let (or ) denote the space of all bounded, signed Radon measures on (or on ). For , we say that dominates (notation ) if implies for all . Define the bilinear form (or ). For any , let denote the (unique pair of) positive measures such that , and define , noting that the total variational norm equals . The bilinear form places in dual correspondence with (or with ). We refer to the resulting topology on as the tight topology, denoted as . Let denote the positive cone in and let denote the space of all Radon probability measures on (or equivalent with ). If is a Polish space, then so are and (see [], Ch. IX, § 5, No. 4, Proposition 10). Alternatively, we view and as normed spaces, with the supremum norm (or ) and total-variational norm (or )), respectively. We refer to the corresponding norm topology on as the total-variational topology, denoted as . Below we also consider and in duality with the space of all bounded Borel-measurable , based on the same bilinear form . We refer to the corresponding topology on as the weak topology, denoted . Clearly, the total-variational topology refines the weak topology and the weak topology refines the tight topology.
We specialize to probability measures in most of this section and exclusively in Section 3, Section 4, Section 5, Section 6 and Section 7 and generalize to signed measures only in Section 8. Summarizing the most basic requirements for , and , we give the following definition.
Definition 1.
We say that , and satisfy the minimal conditions if
- (i.) and are Hausdorff topological spaces;
- (ii.) is a directed set of finite partitions of in terms of non-empty, Borel-measurable sets;
- (iii.) for any and all , is Borel-measurable.
With regard to the third requirement, it is noted that if the space is a Polish space, the mappings , () are measurable with respect to the Borel σ-algebra for the tight topology on .
For any Borel probability measure on , there exists a mapping on , that takes a finite, measurable partition of into the (α-)histogram associated with P. Note that , so any can be represented by an element of the simplex (and we shall interchange these two perspectives freely in what follows). Consider such that refines . By finite additivity of the measure P, we have, for every ,
so the histograms and are related through the summation of probabilities for components that are unified when partitions coarsen. Clearly, any probability measure P defines a collection of probability histograms related through (1), which, conversely, are enough to reconstruct P if is rich enough (as per Carathéodory’s extension). To give these observations regarding histograms formal expression, we make the following definitions. For every , there exists a projection mapping ,
that maps a probability distribution to its -histogram. Based on (1), for all such that , there is a transition mapping ,
that maps -histograms to -histograms. Then is the identity for any . Also, for any , we have,
and, for all ,
Together with the fact that forms a directed set, the following property implies that is rich enough for histograms to fix measures on all of the Borel σ-algebra.
Definition 2.
A set of partitions of a Hausdorff topological space is said to resolve if the σ-algebra generated by the union of all sets A in all partitions in is the Borel σ-algebra, i.e., if .
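Returning to the transition mappings between histograms on nested partitions, the summation relation can be made concrete on a finite space. The sketch below is an illustration under assumptions that are not in the text: partitions are represented as lists of `frozenset` cells of a six-point space, and `refines` and `transition` are hypothetical helper names. It checks that coarsening in two steps agrees with coarsening in one step, and that the transition from a partition to itself is the identity.

```python
from fractions import Fraction

def refines(beta, alpha):
    """True if every cell of beta lies inside some cell of alpha."""
    return all(any(b <= a for a in alpha) for b in beta)

def transition(alpha, beta, hist_beta):
    """Map a beta-histogram to the alpha-histogram by summing over cells of beta inside each cell of alpha."""
    assert refines(beta, alpha)
    return [
        sum((p for b, p in zip(beta, hist_beta) if b <= a), Fraction(0))
        for a in alpha
    ]

# Three nested partitions of {0, ..., 5}: gamma refines beta, beta refines alpha.
alpha = [frozenset({0, 1, 2, 3}), frozenset({4, 5})]
beta = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]
gamma = [frozenset({0}), frozenset({1}), frozenset({2, 3}), frozenset({4}), frozenset({5})]

hist_gamma = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10),
              Fraction(1, 10), Fraction(3, 10)]

via_beta = transition(alpha, beta, transition(beta, gamma, hist_gamma))
direct = transition(alpha, gamma, hist_gamma)
```

Exact rational arithmetic is used so that the comparison of `via_beta` and `direct` is not affected by floating-point rounding.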
To formulate necessary conditions below, we also need a construction of partitions in terms of a topological basis for .
Definition 3.
Let be a topological basis for . We say that a partition α (or collection of partitions ) is generated by the basis if for (any and) any , A is the union of a finite number of subsets obtained through a finite number of intersections of with U or , .
Example 1.
In a topological space with a countable basis , we may construct a sequence of refining partitions based on an enumeration of : start with ; for all , intersect all sets in with and and then define to consist of all such non-empty intersections. The resulting is a fully ordered set, and resolves .
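The construction of Example 1 can be sketched on a finite discrete space, where every step is explicit. The six-point space, the particular enumeration of basis sets and the helper name `refine_by` are assumptions made for illustration only.

```python
def refine_by(partition, U, space):
    """Intersect every cell with U and with its complement, keeping the non-empty pieces."""
    complement = space - U
    pieces = []
    for cell in partition:
        for part in (cell & U, cell & complement):
            if part:
                pieces.append(part)
    return pieces

space = frozenset(range(6))
# Hypothetical enumeration U_1, U_2, ... of basis sets on the six-point space.
basis = [frozenset({0, 1, 2}), frozenset({2, 3}), frozenset({0}),
         frozenset({4}), frozenset({1}), frozenset({3})]

partitions = [[space]]          # start with the trivial partition
for U in basis:
    partitions.append(refine_by(partitions[-1], U, space))
```

In this toy case the final partition consists of all singletons, so the cells of the sequence generate every subset of the space, which is the finite counterpart of resolving the space.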
2.1.2. Domination, Histogram Densities and Total Variation
In dominated families of probability measures, the convergence of histogram systems coincides with the martingale convergence of Radon–Nikodym densities (see also Appendix A1.6 of []). Due to the monotonicity of the relation , is a directed filtration. Furthermore, if resolves , the limit of the filtration (which has the union of all , , as a generating ring) is equal to the Borel σ-algebra on .
Let be given and assume that , so that P has a (Q-almost-everywhere unique) Radon–Nikodym density with respect to Q. Consider the -measurable functions , defined by,
for Q-almost-all (in particular, if , for some , the corresponding term proportional to is (Q-almost-everywhere equal to 0 and therefore) not included in the sum). We may define for every the Q-dominated probability measure :
for all , where it is noted that for all , (“”, in a slightly abusive but natural notation that we introduce in Remark 1).
Lemma 1.
Let , and satisfy the minimal conditions and assume that resolves . Then, for any and any dominating , converges to P in total variation.
Proof.
The Radon–Nikodym density function is Q-almost-everywhere equal to the -measurable conditional expectation and, as such, forms a non-negative, uniformly integrable Doob martingale relative to the filtration . Since resolves , Doob’s martingale convergence guarantees that in . The assertion now follows from the fact that for Q-dominated probability measures P and , the total variational norm of their difference is proportional to the -norm of the difference between densities,
for all . □
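The convergence asserted in Lemma 1 can be observed numerically in a discretized setting. The sketch below is an illustration under assumptions not taken from the text: the sample space is replaced by a grid of 256 points in the unit interval, the dominating measure is uniform on the grid, the density is the hypothetical choice p(x) = 2x, and the partitions are dyadic. The histogram density on a partition is the cell-wise average of the density, and the error is measured as the L1-distance between densities.

```python
N = 2 ** 8                                   # grid points standing in for the sample space
q = [1.0 / N] * N                            # dominating measure: uniform on the grid
grid = [(i + 0.5) / N for i in range(N)]     # cell midpoints
dens = [2.0 * x for x in grid]               # hypothetical density p(x) = 2x

def histogram_density(dens, q, level):
    """Cell-wise average of the density on the dyadic partition with 2**level cells."""
    cell = len(dens) // 2 ** level
    out = []
    for start in range(0, len(dens), cell):
        weights = q[start:start + cell]
        mass_p = sum(d * w for d, w in zip(dens[start:start + cell], weights))
        mass_q = sum(weights)
        out.extend([mass_p / mass_q] * cell)
    return out

def l1_distance(f, g, q):
    """L1-distance between two densities with respect to the dominating measure."""
    return sum(abs(a - b) * w for a, b, w in zip(f, g, q))

# Errors for dyadic levels 0 (trivial partition) up to 8 (finest grid partition).
errors = [l1_distance(dens, histogram_density(dens, q, k), q) for k in range(9)]
```

In this example the error halves with every dyadic refinement and vanishes once the partition reaches the grid resolution.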
The above martingale convergence of densities has implications for the total-variational norm that we shall appeal to in Section 3 and Section 8.
Proposition 1.
Let μ be a bounded, signed, Borel measure on . The mapping is monotone-increasing. If resolves , then the total-variational norm for μ equals,
Proof.
If , and refines , then,
Let a signed measure and be given. According to the Hahn–Jordan decomposition, there exists a such that, for any , , we have and, for any , , we have . Moreover, . Since is directed and the union of all , () generates a generating ring for , and there exist an and a with , so that,
proving the assertion. □
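Proposition 1 can be checked by hand on a finite space. In the following minimal sketch, the signed measure on a 16-point space, the dyadic partitions and the helper name `histogram_norm` are assumptions made for illustration. The sum of absolute cell masses does not decrease under refinement and reaches the total variation once the partition separates the positive and negative parts.

```python
def histogram_norm(mu, level):
    """Sum of |mu(A)| over the cells A of the dyadic partition with 2**level cells."""
    cell = len(mu) // 2 ** level
    return sum(abs(sum(mu[s:s + cell])) for s in range(0, len(mu), cell))

# Hypothetical signed measure: point masses 1, 2, -3, -4, 5, 6, -7, -8, ...
mu = [(-1) ** (i // 2) * (i + 1) for i in range(16)]

norms = [histogram_norm(mu, k) for k in range(5)]   # levels 0..4
total_variation = sum(abs(m) for m in mu)           # |mu| of the whole space
```

Here `norms` stays at 16 for the three coarsest partitions, because positive and negative masses cancel inside each cell, and jumps to the total variation 136 as soon as the cells are small enough to lie inside the positive or the negative set of the Hahn–Jordan decomposition.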
The quantities , used to control weak compactness in Section 3, are also suprema of their histogram versions.
Lemma 2.
Assume that resolves . For any such that and any , we have,
Proof.
Write the Radon–Nikodym derivative of P with respect to Q as . Let be given. If , then and for any . If , it follows from the convexity of and Jensen’s inequality that
which implies that, for any ,
and that the mapping is monotone-increasing. Based on Proposition 1, we then find,
Note that we have for every ,
An appeal to Lemma 1 then proves the assertion. □
2.1.3. Random Histogram Systems and Coherence
Regarding random elements (e.g., random elements of a Bayesian statistical model), we can project P onto its random histograms, as formalized in the following proposition, which introduces the notion of coherence.
Proposition 2.
Let , and satisfy the minimal conditions and let Π denote a Borel probability distribution on describing a random element P. Then, for every ,
induces a random histogram with probability distribution on . If , then and are coherent, i.e., the distribution of follows from that of through summation, as in Equation (1).
Proof.
By assumption, for every and every , is Borel-measurable. Accordingly, is a Borel probability distribution on . Coherence (Equation (1)) is a consequence of Equation (4). □
Our main question may be paraphrased as the converse of the above proposition: suppose that we provide distributions for random histograms for all . Under which conditions does a collection of (probability) histogram distributions define a random (probability) measure (uniquely)? According to Proposition 2, coherence is necessary.
Definition 4.
Let , and satisfy the minimal conditions. For every , let be a distribution for a random histogram , as in Equation (8). Assume that the resulting system of random histograms has the following property: if , then the distribution follows from through summation, as in Equation (3), i.e.,
Then, we refer to as a coherent (inverse) system of random histogram distributions. If there exists a unique Radon probability distribution Π on with projections for all , then Π is called its random histogram limit.
For later reference, we define mean measures for Borel probability distributions on .
Definition 5.
Let , and satisfy the minimal conditions. Consider with a Borel probability measure Π. The mean measure G under Π is defined pointwise,
for every Borel set A in . Its restrictions to the sub-σ-algebras are denoted as .
To see that G is a well-defined probability measure, note that the σ-additivity of G is guaranteed by monotone convergence. Also note that the restrictions are mean measures for the distributions on : for any ,
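The relation between mean measures and histogram expectations can be illustrated by simulation. The sketch below makes assumptions that are not part of the text: histograms on a four-cell partition are Dirichlet-distributed with hypothetical parameters, the coarse partition merges cells pairwise, and `sample_dirichlet` and `mean_histogram` are hypothetical helper names. It estimates the mean histogram by Monte Carlo and checks that taking means commutes with coarsening.

```python
import random

def sample_dirichlet(params, rng):
    """Draw one Dirichlet vector by normalising independent Gamma(a, 1) variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(gammas)
    return [g / total for g in gammas]

def mean_histogram(samples):
    """Component-wise average of a list of histograms."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

rng = random.Random(1)
params = [0.5, 1.0, 1.5, 1.0]                  # hypothetical Dirichlet parameters
base = [a / sum(params) for a in params]       # exact mean histogram of the Dirichlet law

samples = [sample_dirichlet(params, rng) for _ in range(20000)]
g_fine = mean_histogram(samples)               # Monte Carlo estimate of the mean histogram

# Coarsen to two cells: mean of coarsened histograms versus coarsened mean histogram.
g_coarse_from_samples = mean_histogram([[s[0] + s[1], s[2] + s[3]] for s in samples])
g_coarse_from_mean = [g_fine[0] + g_fine[1], g_fine[2] + g_fine[3]]
```

The estimate `g_fine` approximates `base` up to Monte Carlo error, while the two coarse means agree up to floating-point rounding, reflecting the linearity of expectation.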
Remark 1.
In the above, we abuse notation slightly: for any probability measure in , the domain is rather than . So when we mean to refer to , we shall often use the more natural notation instead.
2.2. The Bourbaki–Prokhorov–Schwartz Theorem
The conditions we derive in subsequent sections are based on a theorem from [] (referred to as Prokhorov’s theorem in []), which says that the existence of a limiting positive Radon measure in inverse systems of positive measures is equivalent to a form of inner regularity that holds for all projections simultaneously. This leads to the characterization of those inverse systems that consistently define Radon probability measures on with various topologies.
To discuss the Bourbaki–Prokhorov–Schwartz Theorem, we first have to generalize somewhat: let be a directed set and assume that , are Hausdorff topological spaces and that for any , there exist continuous, surjective transition mappings . Together, they form an inverse system of Hausdorff spaces (see [], Ch. I, § 4, No. 4; Ch. I, § 2, No. 3, Prop. 4), denoted as . If T denotes a Hausdorff topological space, a family of (projection) mappings , is said to be coherent if for all , , and it is said to be separating if for all , , there exists an such that .
Theorem 1
(Bourbaki–Prokhorov–Schwartz). Let be an inverse system of Hausdorff topological spaces indexed by , T a Hausdorff topological space and a coherent and separating family of continuous mappings. Let be a coherent inverse system of positive measures on . There exists a bounded, positive Radon measure μ on T projecting to for all , if and only if, for every , there is a compact such that for all ,
When (10) holds, the measure μ is uniquely determined and for every compact set L in T.
Proof.
See Theorem 1 of [], Ch. IX, § 4, No. 2. □
If all conditions of Theorem 1 are met, but the system of functions is not separating, then a measure exists but may not be unique.
Bourbaki [] continues with application to a proof of existence of the Wiener measure and Kolmogorov’s perspective and the definition of so-called promeasures (also commonly known as cylinder set measures), which can be compared with the coherent histogram systems we define below: for a locally convex space E, [], Ch. IX, § 6 considers the collection of all linear subspaces V of finite co-dimension in E with continuous projections (and canonical for ) to introduce as the inverse system of finite-dimensional quotients. A coherent system of positive measures on the finite-dimensional spaces , , is called a promeasure on E. It is noted that [], Ch. IX, § 6, No. 8–10 formulates a sufficient condition (Minlos’s Theorem, [], Ch. IX, § 6, No. 10, Theorem 2, based on []), but it appears difficult to apply unless E is a (barrelled) nuclear space.
In subsequent sections, we apply Theorem 1 directly to spaces of (bounded/signed/positive/probability) measures with various topologies, limiting the inverse system of finite-dimensional quotients and promeasures to inverse systems of partitions and random histograms. Let us prepare for the discussion by referring to some specifications pertaining to the situation where , and satisfy the minimal conditions and , , and , in the form of the following proposition.
Proposition 3.
Let , and satisfy the minimal conditions. For all , the mappings are continuous and surjective, and forms an inverse system of compact Hausdorff topological spaces, with a non-empty, compact, Hausdorff inverse limit N.
Proof.
Let be given. For any , the mapping is an element of . Because is surjective, the induced mapping is a bounded linear operator (with the norm equal to one). The transpose mapping is defined by,
for all and . The linear mapping is bounded (with the norm less than or equal to one) and surjective. Note that if we express as a vector in ,
in accordance with (3). Finally, it is noted that inverse limits of non-empty, compact spaces are non-empty and compact (see [], Ch. I, § 9, No. 6, Prop. 8). □
The space N consists of finitely additive probability set functions on the -algebra generated by the partitions in . Existence theorems for inverse limit probability measures on associated inverse limit spaces like N have been studied extensively: Bochner’s theorem [] and Choksi’s theorem [] give relatively mild sufficient conditions for the existence of a limiting probability measure on N for inverse systems of Radon probability spaces (see also [,,]). But, although (with the weak topology of Section 3) is homeomorphic to a subspace of N, it has proven difficult to formulate an additional condition to specify that is concentrated on the image of in N (see, however, [,,] and the correct proof of the Mean-measure condition in []). One of the strengths of the Bourbaki–Prokhorov–Schwartz theorem is that is projected directly onto the spaces , without detour via the inverse limit N. In this way, Theorem 1 avoids the (attractive but misleading) suggestion that a probability distribution on N is an easy way to get ‘close to’ the desired distribution on . By insisting only on continuous projections , Theorem 1 focusses on inner regularity as the central issue. (Compare [], Theorem 21 and [], Ch. IX, § 4, No. 2, Theorem 1).
3. Random Histogram Limits with the Weak Topology
Again, let , and satisfy the minimal conditions, and fix the topology on to be the weak topology, defined as the subspace topology that inherits from with the weak topology. The compactness of a subset of is characterized by the Dunford–Pettis–Grothendieck theorem, as described in Appendix A.
3.1. Support and Approximation of Weak Histogram Limits
Before we apply Theorem 1 to define Radon probability measures on with the weak topology (and total-variational topology), let us consider some consequences, that is, necessary conditions for the existence of a random histogram limit. First, we characterize the support of Borel probability measures; next, we consider approximations of random P by random .
3.1.1. Support and Domination
The following lemma is immediate but central enough to emphasize.
Lemma 3.
Let , and satisfy the minimal conditions. Consider with a Borel probability measure Π. For any Borel set A in , implies that .
Proof.
Let a Borel set A in be given and assume that . If the Borel set in has probability , then, by σ-additivity, for some , the Borel set has probability . This would imply that , contradicting the assumption. □
Domination by the mean measure plays a role in the following proposition concerning the support of weakly-Borel probability measures on .
Proposition 4.
Let , and satisfy the minimal conditions. Consider with the weak topology and a Borel probability distribution Π. Let G be the mean measure under Π. Then, is closed in and,
Moreover, if is such that for all measurable partitions , lies in the support of in , then P lies in the weak support of Π.
Proof.
If is not dominated by G, then there exists a Borel set A such that . Consequently, for small enough , the weakly open neighborhood does not meet , so is weakly closed. According to Lemma 3, , so U receives -mass zero, implying that .
Regarding the last assertion, it is noted that since with the weak topology is homeomorphic to a subspace of the inverse limit N of Proposition 3, the collection of sets (where is any topological basis for ) in forms a basis for the weak topology. Consequently, for any weak neighborhood U of , there exists an and a such that , and,
by assumption. □
Recall the open problem regarding the construction of random elements from specific dominated families, as posed in [] (p. 42):
Indeed, it appears to be an open problem to find simple sufficient conditions, analogous to Corollary 9.3.VI, for the realizations of a random measure to be [almost-surely] absolutely continuous with respect to a given measure.
(Ref. [] (Corollary 9.3.VI) formulates a condition for a system of random histograms with a limit to be almost-surely non-atomic.) Proposition 4 says that, given some probability measure G, we should look for coherent systems of random histograms with the projections as their mean histograms and a limit that is a weakly-Borel probability measure on . In Section 3.2, we provide a relatively simple necessary and sufficient condition for a coherent random histogram system to have a unique weakly-Radon random histogram limit .
3.1.2. Approximation by Weakly Convergent Histograms
Next, we consider the way in which histogram systems with a weak limit approximate (random) probabilities . Let be a Radon probability measure on with the weak topology, with mean measure G. For every and every , let denote the collection of all Borel sets B in that are approximated by elements of the σ-algebra to within G-measure :
Note that for any and any B, there exists an such that (see Theorem 4.4 in []).
Proposition 5.
Let , and satisfy the minimal conditions. If Π is a weakly-Radon probability measure on with mean measure G, then, for every , there exists a partition and an , such that for all ,
Proof.
Let be given. By inner regularity, there exists a weakly compact H in such that . For every , there exists an , such that for all Borel sets A in , implies that for all , cf. (A1). In particular, if , then, for some , , implying that for all , . □
This observation is important from a computational perspective: the practitioner chooses an approximating partition to perform computations with histograms and would like to be able to control the accuracy of their approximations for the P in terms of their restrictions to . They have control over the probability measures and, as a result, control over the mean measures . Accordingly, they can choose a level of refinement (expressed by a choice for some partition ), making approximations in the G-measure by the -histogram. The Radon property ensures that the approximation in the G-measure carries over to approximation in the P-measure, uniformly in P, with arbitrarily high -probability, depending on the degree of approximation in the level that is chosen for actual computations. Such a guarantee concerning degrees of approximation is not automatic if is a Radon measure for the tight topology of Section 4.
3.2. Existence of Weak Histogram Limits
Let denote a set of finite Borel-measurable partitions of , directed for ordering by refinement. If we equip with the weak topology, Theorem 1 takes the following form.
Theorem 2.
Let , and satisfy the minimal conditions. Assume that resolves and consider with the weak topology. Let be a coherent system of Borel probability measures on the inverse system . There exists a unique weakly-Radon probability measure Π on projecting to for all , if and only if, there is a such that, for every there is a such that,
for all .
Proof.
According to Proposition 3, forms an inverse system of Hausdorff topological spaces. For all , is continuous with respect to the weak topology. If and , then there exists a set B in the -algebra generated by the such that , which cannot be the case unless, for some , the histogram projections and differ. Combining this with Equation (4), we can conclude that forms a coherent and separating family of continuous mappings .
The assertion now follows from Theorem 1 if we can show that condition (10) holds. To that end, let be given and define . Given some monotone decreasing sequence such that , let be positive constants such that,
for every . Define,
Let be given, choose such that and define . Since resolves , Proposition 1 and Lemma 2 say that for all P and, hence,
Conclude that H is relatively compact with respect to the weak topology, cf. the Dunford–Pettis–Grothendieck condition. For the compact closure of H in and any , we have (by monotonicity of for any ),
which shows that condition (10) of Theorem 1 is satisfied. Conclude that there exists a unique Radon probability measure on that projects to for all .
Conversely, let be a weakly-Radon probability measure on . According to Proposition 4, the weak support of is dominated by the mean measure G. Again appealing to Lemma 2, we see that for every , all and all ,
Since is weakly-Radon, for every , there exists a constant such that,
verifying that condition (12) holds. □
At first sight, condition (12) may appear technical and inaccessible. It is noted, however, that in practice one has considerable control: one may choose (large enough to resolve but otherwise) as small as possible and a histogram system as simple as possible in order to enable the verification of condition (12) in a manageable form. Moreover, all subsequent calculations involve only probability distributions on finite-dimensional simplices, enhancing feasibility greatly.
Regarding the measure Q, we simplify by appeal to a necessary condition: it is clear that if Theorem 2 holds, then the support of the probability measure is dominated by the mean measure G, cf. Proposition 4. So, when looking for a candidate-dominating measure Q to verify condition (12), we can turn to the mean measures of the histogram distributions : if we show either that the are the histograms associated with a mean measure G (the Mean-measure condition []) or that condition (16) below holds (cf. Definition 5), then G can play the role of Q.
Based on those two remarks, condition (12) can be rewritten as follows:
For some , -histograms equal the mean measures for all and, for every there is a such that,
for all .
3.3. Existence of Total-Variational Histogram Limits
Let be a Hausdorff topological space and let be a subset of dominated by a probability measure . In this subsection, we distinguish from , represented by the same set, denoted as when equipped with the weak topology and when equipped with the total-variational topology. The identity mapping is a continuous bijection. Write and for the associated Borel -algebras.
Proposition 6.
If is separable and is a dominated subset of , then is separable and .
Proof.
Let Q denote the probability measure that dominates . By the separability of , the Banach space of Q-integrable, real-valued functions on , is separable, and so is its subspace of Radon–Nikodym densities . Since and are homeomorphic, is separable too. It can be shown [,] that, then, the total-variational norm is measurable with respect to the minimal -algebra for the measurability of the mappings , , which is contained in . Accordingly, . Since the total-variational topology refines the weak topology, also, . □
As a consequence, any Borel probability measure on (viewed as a -additive set-function with as its domain) gives rise to a Borel probability measure on (viewed as a -additive set function with as its domain).
Corollary 1.
Let be separable and, together with and , satisfy the minimal conditions. Consider with the total-variational topology. Assume that resolves . Let be a coherent system of Borel probability measures on the inverse system . If condition (12) holds, there exists a unique -Radon probability measure Π on projecting to for all .
Proof.
Under the stated conditions, Theorem 2 asserts the existence of a Radon probability measure on with the weak topology and Proposition 4 guarantees that is dominated by the mean measure G. Because is separable, , so is a Borel probability measure on . Uniqueness follows from the uniqueness of the weak histogram limit. By the Radon–Nikodym theorem, is homeomorphic (isometrically, even, cf. (7)) to a closed subset of the Polish space and, therefore, a Radon space, so that is a Radon measure. □
So, remarkably, the existence of a total-variational random histogram limit does not impose stricter conditions than the existence of a weak random histogram limit; moreover, not even inner regularity is lost in the transition from to . From the perspective of Theorem 5, this amplification is inconsequential, but events and statements involving the total-variational norm are very common and the measurability of total-variational balls is crucial for many applications (for example, in large-sample limits of posterior distributions on metric spaces in non-parametric statistics [,]).
4. Random Histogram Limits with the Tight Topology
The existence question of a limit for coherent random histogram systems has been studied extensively with the tight topology for : a rich body of literature has grown from Kingman’s original work on completely random measures [], with an emphasis on limits with almost-surely purely atomic realizations [,]. Here, we revisit the existence problem without restricting to point processes and derive a necessary and sufficient condition in Section 4.1 based on Theorem 1. In Section 4.2 we consider the support of tight random histogram limits.
4.1. Existence of Tight Histogram Limits
Let be a Polish space with topology . We are interested in the construction of Radon probability measures on with the tight topology. In comparison with the construction of Section 3.2, the assertion is weaker since the weak topology refines the tight topology. Accordingly, compactness as in condition (10) constitutes a less stringent restriction, while the continuity requirement of histogram projections becomes harder to satisfy.
Indeed, when one tries to reproduce the initial steps in the proof of Section 3.2 with the tight topology, a disappointment awaits: if with the standard topology, for example, then for any partition of into two or more subsets, the projection mapping of Equation (2) is not continuous: superficially, it appears that Theorem 1 cannot be applied.
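The failure of continuity can be made concrete with a small numerical sketch. It assumes the real line with the two-set partition into (-inf, 0] and (0, inf), and the point masses at 1/n, which converge tightly to the point mass at 0. These choices, the test function and the helper names `integrate` and `histogram` are illustrative assumptions, not part of the construction above.

```python
import math

def integrate(f, atoms):
    """Integral of f against a purely atomic measure given as {location: mass}."""
    return sum(mass * f(x) for x, mass in atoms.items())

def histogram(atoms):
    """Masses of the two cells (-inf, 0] and (0, inf) of the assumed partition."""
    left = sum(m for x, m in atoms.items() if x <= 0)
    right = sum(m for x, m in atoms.items() if x > 0)
    return (left, right)

f = math.atan                      # one bounded, continuous test function
limit = {0.0: 1.0}                 # point mass at 0
sequence = [{1.0 / n: 1.0} for n in range(1, 2001)]   # point masses at 1/n

# Integrals of f converge to the integral against the limit ...
gaps = [abs(integrate(f, mu) - integrate(f, limit)) for mu in sequence]
# ... but the histograms stay at (0, 1) and never approach (1, 0).
hists = [histogram(mu) for mu in sequence]

print(gaps[-1], hists[-1], histogram(limit))
```

Only one test function is checked here, so the sketch illustrates tight convergence rather than proving it; the point is that the histogram of the limit differs from every histogram along the sequence.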
In order to correct this, we refine to a zero-dimensional version of , rendering projection mappings continuous for a collection of partitions that is large enough to be separating and resolving. While this leaves the Borel -algebra unchanged, the transition to does complicate the nature of tight compactness in . A counterexample at the end of this subsection shows that this complication corresponds directly to the precise way in which a coherent system of random histograms can fail to have a tight limit.
4.1.1. Zero-Dimensional Refinements of Polish Spaces
With a countable basis for the topology , define the topological sub-basis,
for a topology on the set in which each basis element is clopen; denote the resulting topological space by .
Proposition 7.
The space is zero-dimensional and the identity mapping is continuous. If and are two bases for , the corresponding spaces and are homeomorphic. If is Polish, then is also Polish.
Proof.
The sub-basis gives rise to a basis consisting of clopen sets, so is zero-dimensional, and the identity i is continuous because refines . Because any contains a from the basis and vice versa, the identity mapping on is continuous from to and also from to . Assuming that is Polish, the countable product space is Polish (see [] (Proposition 3.3)) and has a diagonal that is a closed subspace, homeomorphic to . Enumerate the basis sets in and define to be the refinement of with made clopen (e.g., is the topological sum of and , etc.). The canonical set-theoretic identification is continuous. The spaces are all Polish as the topological sums of and (which are Polish). The product space is Polish and the map is a continuous bijection. Then, is Polish and homeomorphic to . □
Proposition 8.
Let be a Polish space. The Borel sets on and are equal and any set function μ is a (bounded/signed/positive/probability) Borel measure on if and only if μ is a (bounded/signed/positive/probability) Borel measure on .
Proof.
Note that the Borel -algebra on generated by the basis is identical to the -algebra generated by and its complements, which form the sub-basis for . Conclude that and have the same Borel sets. Boundedness, signedness or positivity, being a probability measure and countable additivity, are then identical as properties of set functions on the Borel -algebra. □
Proposition 8 implies the existence of a bijective mapping with the following properties.
Proposition 9.
The mapping is a continuous bijection. If is Polish, and are Polish and is Borel-measurable.
Proof.
Any bounded and continuous is also bounded and continuous when viewed as , so there exists a linear, injective mapping of norm one. Transposing yields a bounded, injective, linear mapping of norm one (see [], Ch. II, § 6, No. 4, Proposition 5 and [], Ch. IV, § 1, No. 3, Proposition 8). As noted earlier, if is a Polish space, then so is (ref. [], Ch. IX, § 5, No. 4, Proposition 10), so, based on Proposition 7, both and are Polish spaces. According to Theorems 12.4 and 14.12 (Souslin’s theorem) in [], if are standard Borel spaces and is a Borel-measurable injection, then its inverse on is also Borel-measurable. Applied to , this proves the last assertion. □
For all , define the mappings ,
that take any bounded, signed Radon measure on into its -histogram.
Proposition 10.
Let α be a partition of generated by a basis and let denote the associated zero-dimensional version of . The mapping is continuous for the tight topology.
Proof.
For any partition generated by the basis (cf. Definition 3), any is clopen in . For any clopen A, is a bounded, continuous function on . Therefore, is continuous with respect to the tight topology and so is . □
Compactness in has a more stringent meaning than in the original space : indeed, according to Brouwer’s theorem, any compact is a union of a subspace homeomorphic to the Cantor set with isolated points. For example, with in its standard topology, is not compact in the space .
4.1.2. Tight Histogram Limits with Zero-Dimensional Compacta
Tight compactness in is characterized in Prokhorov’s theorem (see Appendix A), which says roughly that a norm-bounded set H of measures is relatively tightly compact if inner regularity is a property that holds uniformly in H.
We are now in a position to apply Theorem 1 to (or rather, ) with the tight topology. (Note: mention of the Radon property in the statement of Theorem 3 is accurate but strictly speaking redundant since is a Radon space.)
Theorem 3.
Let be a Polish space with a directed set of partitions that resolves , generated by a basis that gives rise to a zero-dimensional . Consider with the tight topology. Let be a coherent system of Borel probability measures on the inverse system . There exists a unique Radon probability measure Π on projecting to for all , if and only if, for every , there is a compact such that,
for all .
Proof.
Under the condition of the theorem, , and satisfy the minimal conditions. Like before (see Proposition 3), forms an inverse system of compact Hausdorff topological spaces. As in the proof of Theorem 1, is a coherent and separating family of mappings on and, from Proposition 10, we can conclude that the are also continuous.
To show that condition (10) holds, let be given and define . Given some decreasing sequence such that , , let be compact subsets of such that,
for every and every . Define,
Let be given, choose such that and choose . Since the Borel sets decrease as the level of refinement of the partition increases, and since resolves , , and by monotonicity. Conclude that H is relatively compact with respect to the tight topology, according to (A2). For the compact closure of H in and any , we have (by monotonicity of for any K),
which shows that condition (10) of Theorem 1 is satisfied. Conclude that there exists a unique Radon probability measure on that projects to for all . The continuous mapping of Proposition 9 serves to define , a Radon probability measure on , and still projects to for all .
Conversely, since is Polish, cf. Proposition 9, , and are Polish spaces and the mapping is Borel-measurable. Therefore, the mapping defines a Borel probability measure on , which is Radon because is a Radon space. So, according to Prokhorov’s theorem, for every , there exists a compact in such that,
With , for every , we have and, accordingly,
for any , which implies the converse. □
Let us paraphrase: to have a coherent inverse system of probability measures for histograms defining a limit that is a Radon probability measure on for the tight topology, we look for compacta in a zero-dimensional version of that capture most of the mass of the projected measures with high -probability, uniformly in .
4.1.3. Tight Histogram Limits with Ordinary Compacta
In certain histogram systems (like those that define Dirichlet process distributions), there is an easy way to prove the Mean-measure condition (see the proof of Theorem 6). In histogram systems where this condition is less accessible or not accessible at all (like those that define the Pólya-tree distributions), zero-dimensional compacta in the space are unwieldy, so we also provide a re-formulation of Theorem 3 that relies only on compacta in .
To avoid mention of the zero-dimensional space , we re-construct compacta from decreasing sequences of compacta in . Let be a directed set of partitions generated by a basis. For every , consider the topological space obtained from by declaring all sets clopen, i.e., is the topological sum of all the partition sets with their subspace topologies. Note that the set-theoretic identity mapping on is continuous as a mapping . (See also the proof of Proposition 7.)
Lemma 4.
A subset is compact in if and only if is compact in for all . Conversely, given compact subsets of for all , the subset is compact in .
Proof.
Consider the product space . The diagonal is a closed subset of , homeomorphic to , and the mappings are the canonical projections , applied after the homeomorphism . A compact in has compact images in all , . Conversely, if H is a subset of such that is compact in for all , then is compact in by Tychonov’s theorem, and so is the closed subspace . Set-theoretically, for all , implying that is homeomorphic to H, so H is compact. Given compact subsets of for all , the subset is compact as a subset of any , (), so is compact as a subset of . □
Corollary 2.
Let be a Polish space with a countable basis and a well-ordered sequence of partitions generated by the basis that resolves . Consider with the tight topology. Let be a coherent system of Borel probability measures on the inverse system . If, for all , all and all , there is a compact in such that,
for all such that , then there exists a unique Radon probability measure Π on projecting to for all .
Proof.
Enumerate the partitions in , and let be given. To find a compact subset of to satisfy property (16), we construct a decreasing sequence of non-empty, compact sets in the spaces , () by induction and take the intersection. For now, assume that . According to condition (17), there exists a compact set in such that,
for all . Make the induction assumption that for given , there is a compact in with,
for all . Fix . To combine masses back at a later stage, choose for all , such that . For any , there exists a compact , such that,
for all . The intersection is not only compact in but also in . Then, for any , if does not lie in any of the subsets on the left-hand sides of inequalities (18) and (19), then,
and the -probability of that event is lower-bounded by,
completing the induction step. Define , which is compact in by Lemma 4, and,
for all , showing that condition (16) is satisfied, and the assertion follows from Theorem 3. Coming back to the assumption that , if consists of more than one set, then the induction argument is started from a (finite) partition that coincides with some -stage in the proof provided above. □
The requirements that Corollary 2 places on and are more specific than those of Theorem 3, but not necessarily more restrictive: all Polish spaces have countable bases, and well-ordered partition systems , generated by some countable basis , can all be derived as subsequences of the generic situation, cf. Example 1.
4.1.4. Coherent Random Histogram Systems Without Limit
To conclude, we consider the cases in which condition (16) does not hold. We start with a counterexample that illustrates concretely how the failure of condition (16) is related to the ‘leaking away’ of probability mass in the limit of refining .
Example 2.
Consider with a basis defined by all open intervals with rational midpoints and rational radii. Consider a triangular array defined by of values in , such that for every , we have , ; for every , ; and for every , . Defining to be of the form,
one verifies that the are generated by the basis and refines for any . Assuming that the set is dense in , the resulting partitions collectively generate the Borel σ-algebra. (For later reference, we indicate the possibility to choose , , to define partitions on .)
The simplest example of a coherent histogram system that does not satisfy condition (16) is constructed as follows. Choose some , and define histogram distributions for all for the probability vectors satisfying,
that is, some non-zero fraction of the total probability mass in the n-th histogram is concentrated in the ‘outside’ sets with -probability one. As and , coherence of the histogram system is maintained. Assuming that , any compact K in fails to meet the ‘outside’ sets, and , for large enough n, which invalidates condition (16).
To summarize the above example, the problem occurs because the shifts a non-zero amount of mass towards without limitations as n grows. For any presumed limit measure on , this would mean that for all compact sets K in , . This shows that property (16) cannot be satisfied, and no such limit exists as a probability distribution on .
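The mechanism can be checked on a deliberately simplified, deterministic system. The sketch below does not reproduce the triangular array of Example 2: it assumes integer cell boundaries, a leaked fraction `EPS` of 1/10 and degenerate (non-random) histograms, all of which are hypothetical choices made for illustration. It verifies coherence under merging and shows that, for a compact interval [-M, M], the cells that miss the interval carry mass `EPS` as soon as n exceeds M.

```python
from fractions import Fraction

EPS = Fraction(1, 10)  # assumed leaked fraction (hypothetical choice)

def cells(n):
    """Partition n of the real line: two outside rays and unit cells [k, k+1) in between."""
    inner = [(k, k + 1) for k in range(-n, n)]
    return [(None, -n)] + inner + [(n, None)]   # None marks an unbounded end

def histogram(n):
    """Deterministic histogram (n >= 1): EPS/2 on each outside ray, the rest on [0, 1)."""
    h = {}
    for c in cells(n):
        if c[0] is None or c[1] is None:
            h[c] = EPS / 2
        elif c == (0, 1):
            h[c] = 1 - EPS
        else:
            h[c] = Fraction(0)
    return h

def project(h_fine, n):
    """Merge the cells of a finer histogram into the cells of partition n."""
    coarse = {c: Fraction(0) for c in cells(n)}
    for (a, b), mass in h_fine.items():
        if b is not None and b <= -n:
            coarse[(None, -n)] += mass      # absorbed by the left outside ray
        elif a is not None and a >= n:
            coarse[(n, None)] += mass       # absorbed by the right outside ray
        else:
            coarse[(a, b)] += mass          # inner cell is unchanged
    return coarse

def mass_missing_compact(n, M):
    """Total mass of the cells of partition n that do not meet the compact [-M, M]."""
    total = Fraction(0)
    for (a, b), mass in histogram(n).items():
        meets = (a is None or a <= M) and (b is None or b > -M)
        if not meets:
            total += mass
    return total

coherent = all(project(histogram(n + 1), n) == histogram(n) for n in range(1, 8))
print(coherent, [mass_missing_compact(n, 3) for n in range(1, 8)])
```

Exact rational arithmetic is used so that the coherence check is an equality test rather than a floating-point comparison.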
The non-compactness of appears essential in the above example; however, the next example shows that the situation is more complicated: mass can ‘leak away’ not just to points at infinity but at any boundary between partition sets.
Example 3.
In Example 2, take equal to the compact subset , define the points for all , and consider partitions
so that . Now define the histogram distributions for the probability vectors by,
and the distribution of equal to that of . Again, we have a coherent system of histogram distributions and, like in Example 2, probability mass is shifted up against the boundary points at 0 and 1, but no limiting distribution Π on exists with the as histogram projections.
In fact, probability mass does not even have to disappear at points of the boundary of : if we make this example on part of a refining system of partitions of , with some fraction of the total probability mass in (a random histogram system on) the complements , the construction on will continue to make some non-zero fraction of the mass ‘leak away’ across the boundary of , which lies in the interior of .
The concluding remark in Example 3 is close to the generic situation: if we partition into intervals, boundaries between partition sets create the potential for coherent random histogram systems that make probability mass disappear there in the limit. If we generalize to higher dimensions, it becomes clear that mass does not necessarily disappear at specific points: it may be concentrated in any decreasing sequence of partition sets with an empty limit. This shows in which way a (histogram-specific) form of -additivity makes a re-appearance.
These counterexamples highlight the significance of condition (16): requiring the existence of a compact K in would prevent Example 2 but not Example 3. In order to prevent ‘leakage’ of the latter type, we have to impose the stronger requirement of the existence of a compact in , which keeps probability mass away from all potential points of ‘leakage’ simultaneously.
In case condition (16) cannot be satisfied, as in Examples 2 and 3, it is possible to consider the compactification of , for example, the Stone–Čech compactification . With the canonical extension of partitions of to the space , condition (16) is satisfied trivially. The limiting probability measure on may not be unique (because the projections onto the spaces are not necessarily separating). Moreover, in applications, the added points in the closed subset lack interpretation.
4.2. Support in the Tight Topology
Below, Theorem 3 is used to characterize the support of histogram limit measures on with the tight topology for a Polish space . As it turns out, the appropriate relation for the mean measure is the inclusion of supports. This assertion is already known in the literature (see, for example, Theorem 4.15 in []), but the proof given here is not. In the formulation of the following proposition, let G denote the mean measure of Definition 5.
Proposition 11.
Let be a Polish space. Consider with the tight topology and a Borel probability distribution Π. Let G be the mean measure under Π. Then, is closed in and,
Moreover, if is such that for all partitions , lies in the support of in , then P lies in the tight support of Π.
Proof.
If P is such that , there exists an and, by complete regularity of , a continuous with on and . While , the open neighborhood of x for which receives non-zero P-probability, and we see that . So, if Q lies in the tight neighborhood of P (for some ), and, accordingly, , from which it follows that is closed. Moreover, by Markov’s inequality and Fubini’s Theorem,
Conclude that P has a tight neighborhood of -mass zero, which means that P does not lie in the tight support of .
Regarding the last assertion, it is noted that since with the tight topology is the continuous image of a subset of the inverse limit N of Proposition 3, the collection of sets (where is any basis for , e.g., total-variational balls) in forms a basis for the tight topology. Consequently, for any tight neighborhood U of , there exists an and a such that , and,
by assumption. □
5. Phase Structure of Probability Histogram Limits
In this section we combine the two main existence theorems of the preceding sections with the general theory of completely random measures [] to describe the various ways in which random histogram limits manifest. In Section 5.1 we review completely random point processes [] and show in Section 5.2 how combination leads to the conclusion that random histogram limits occur in one of four distinct phases: continuous-singular or dominated, each either purely atomic or not (see Theorem 5 below). The phase of a random histogram limit depends on the topology on and on independence within random histogram distributions. In Section 7 and Section 8, we demonstrate that both in the Pólya-tree family and in the Gaussian family of histogram limits, changes in the defining parameters of their histogram systems can cause the limit to transition from one phase to another.
5.1. Completely Random Measures
In [,], so-called completely random measures are defined as positive random measures that assign stochastically independent random masses to disjoint measurable subsets of the underlying space , and it is shown that (the random part of) a completely random measure is a purely atomic measure with -probability one. (Note, we say that a positive measure is purely atomic if the collection D of points for which (so-called atoms) contains all -mass, i.e., ; we say that is non-atomic, if .) Below, we give the briefest of introductions to completely random measures (following [] (Chapters 9 and 10)), and relate the results to the existence theorems of Section 3 and Section 4.
Definition 6.
Let be a Polish space. A random positive Radon measure ν on , distributed according to , is called a completely random measure, if, for any finite collection of disjoint measurable sets , the measures are independent.
Any (random) positive Radon measure decomposes as a sum of a (random) purely atomic measure and a (random) non-atomic measure in a unique way [] (Proposition 9.3.IV),
and, for a random positive Radon measure to be almost-surely non-atomic, it is necessary and sufficient that for any , there is a finite Borel measurable partition of such that for all finer finite Borel-measurable partitions , . In the case of a completely random measure , this implies that is -almost-surely equal to some fixed (that is, non-random) non-atomic measure [] (Proposition 10.1.II). As a consequence, the atomic part of any completely random measure can be fixed or random, while the non-atomic part is always non-random.
Definition 7.
For any random positive Radon measure on and any , define the cumulant by,
Fubini’s theorem implies that if is a completely random measure, for any , the cumulant is a positive Borel measure. The theorem below says that the atomic part of a completely random measure decomposes into a sum of random atoms at fixed points in and a sum of random atoms at random points in .
Theorem 4
([]). Let be a completely random measure with cumulant measures for . If all are σ-finite, then with -probability one, ν satisfies the decomposition,
where is a non-random, non-atomic, σ-finite measure on ; is a purely atomic measure supported on a fixed, countable subset where and are independent if , ; and is a random purely atomic measure that is independent of .
As it turns out, -finiteness of is equivalent to the existence of a countable cover of such that , for every . Furthermore, the set of fixed atoms D is the set of atoms of , and the -additivity of implies the countability of D. The random purely atomic measure is realized with the help of a Poisson point process N on (cf. [] (Proposition 9.1.III-(v))), as follows:
with an intensity measure that may be unbounded on sets of the form , , but satisfies,
One may also start by choosing the intensity measure and define through (22). In the case that , the existence of an almost-surely strictly positive and finite completely random measure is equivalent to the requirement that the Lévy intensity associated with the random jumps assigns infinite mass to [] (see p. 563).
The measure appears in as the t-linear contribution. So, completely random measures with cumulant measures without -terms () and without fixed atoms () are characterized as purely atomic with random locations, purely ; similarly, completely random measures with and Poisson intensity measure are characterized as purely atomic with fixed locations, purely . Based on complete randomness, a wider class of random measures can be characterized in which the almost-sure atomic nature is preserved [].
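As a minimal sketch of the Poisson construction of the random purely atomic part, the code below simulates a compound-Poisson random measure on [0, 1). The intensity is an assumption made for simplicity: a finite measure of the form rate × Lebesgue × Exponential, whereas the text allows intensities that are unbounded near zero jump size. The function names, the rate and the jump distribution are all hypothetical.

```python
import random

def poisson(rng, lam):
    """Poisson(lam) count: number of unit-rate exponential arrivals before time lam."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def sample_atoms(rng, rate=5.0, mean_jump=1.0):
    """Atoms (location, mass) of one realization on [0, 1).

    Assumed intensity: rate * Lebesgue(dx) * Exponential(mean mean_jump)(dz),
    a finite measure chosen only to keep the simulation elementary.
    """
    n = poisson(rng, rate)
    return [(rng.random(), rng.expovariate(1.0 / mean_jump)) for _ in range(n)]

def mass(atoms, a, b):
    """Mass that the purely atomic measure assigns to [a, b)."""
    return sum(z for x, z in atoms if a <= x < b)

rng = random.Random(0)
draws = [sample_atoms(rng) for _ in range(20000)]
left = [mass(d, 0.0, 0.5) for d in draws]    # mass of [0, 1/2)
right = [mass(d, 0.5, 1.0) for d in draws]   # mass of [1/2, 1)

mean_l = sum(left) / len(left)
mean_r = sum(right) / len(right)
# Empirical covariance of the masses of two disjoint sets; near zero under independence.
cov = sum((l - mean_l) * (r - mean_r) for l, r in zip(left, right)) / len(left)
print(mean_l, mean_r, cov)
```

With a finite intensity the realization has no atoms at all with probability exp(-rate), so this simplified measure is not almost-surely strictly positive; that is in line with the remark above that strict positivity requires an intensity of infinite mass.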
Complete randomness imposes a purely atomic nature on random probability measures too, after normalization: if a given positive random measure satisfies with -probability one, then defines a random probability measure called a normalized completely random measure, and P inherits the purely atomic nature of . The histogram distributions for follow from the distributions through,
where . We say that the random histograms are independent up to normalization.
5.2. Phases of Probability Histogram Limits
Combining the conclusions of Section 3 and Section 4 with the presence or absence of complete randomness, we arrive at the following theorem. (Given a random probability measure, let G denote the associated mean measure.)
Theorem 5
(Phases of random histogram limits). Let , and satisfy the minimal conditions. Let be a directed set of finite, Borel-measurable partitions that resolve , with a coherent system of Borel histogram probability measures on the inverse system .
- (i.) (absolutely-continuous) If condition (12) is satisfied, the histogram limit describes a random element P of , distributed according to a weakly-Radon probability measure Π, such that Π-almost-surely, for all measurable , implies : The random element P can be identified isometrically with a random positive Radon–Nikodym density function p in of norm one, and we can write, for all ,
- (ii.) (fixed-atomic) If condition (12) is satisfied and the describe normalized completely random histograms, cf. (23), the histogram limit is a normalized version of the sum ν of a fixed non-atomic measure and a random purely atomic measure supported on the fixed, countable set . For all ,

Assume, in addition, that is a Polish space and that is a directed set of finite partitions generated by a basis that resolves .

- (iii.) (continuous-singular) If condition (16) is satisfied, the histogram limit describes a random element P of , distributed according to a tightly-Radon probability measure Π, such that Π-almost-surely, for all open , implies :
- (iv.) (random-atomic) If condition (16) is satisfied and the describe normalized completely random histograms, cf. (23), the histogram limit is a normalized version of the sum ν of a fixed non-atomic measure , a random purely atomic measure supported on the fixed, countable set , and a random purely atomic measure . For all ,
Proof.
In cases (i.) and (iii.), the theorem states the assertions of Theorems 2 and 3; in cases (ii.) and (iv.), these assertions are combined with those of Theorem 4, specific to normalized completely random measures (where it is observed that the set of atoms of (for any ) is equal to the set of atoms of G). □
In qualitative terms, we may describe the phase structure of random histogram limits as follows: The most general, least constrained type of limit above is that of the continuous-singular phase. According to (20), any continuous-singular random P decomposes into a random atomic component and a random non-atomic component. The random component of any completely random case manifests as purely atomic (the random-atomic phase), with independent, randomly sized point masses at fixed locations and independent random locations. Many examples of (normalized) completely random families are known, including the well-known Dirichlet family (which is discussed in Section 6) and a sub-family of Gaussian histogram systems (see Section 8).
The random non-atomic component of a histogram limit in the continuous-singular phase is novel and more interesting: it is implied by the above that dependence in histogram distributions is required to induce a random non-atomic continuous-singular component. To illustrate the nature of such a component, we may think, for example, of and a with G equal to the Lebesgue measure, describing a random Stieltjes function , from a class that is everywhere continuous but not everywhere (or even nowhere) differentiable (e.g., the so-called Cantor distribution). Such distributions are non-atomic but cannot be identified with random Radon–Nikodym density functions. The Gaussian histogram systems of Section 8 are in the non-atomic continuous-singular phase generically, and only Gaussian histogram systems with diagonal covariance matrices are in the random-atomic phase.
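To make the Cantor example tangible, the following sketch computes the histograms of the (fixed, non-random) Cantor distribution on triadic partitions of the unit interval. It only illustrates what a single non-atomic, continuous-singular realization looks like at the level of histograms; the indexing of cells by ternary digits and the function names are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import product

def cantor_histogram(n):
    """Masses of the Cantor distribution on the 3**n triadic cells of [0, 1].

    A cell is indexed by its first n ternary digits. It carries mass 2**-n when
    no digit equals 1 (it survives n middle-third removals) and 0 otherwise.
    """
    return {digits: (Fraction(1, 2**n) if 1 not in digits else Fraction(0))
            for digits in product(range(3), repeat=n)}

def summary(n):
    """Largest cell mass, and Lebesgue measure of the union of charged cells."""
    h = cantor_histogram(n)
    charged = [d for d, m in h.items() if m > 0]
    return max(h.values()), Fraction(len(charged), 3**n)

for n in (1, 4, 8):
    largest, lebesgue = summary(n)
    print(n, largest, float(lebesgue))
```

The largest cell mass is 2^-n, so no atom can survive refinement, while the charged cells have total Lebesgue measure (2/3)^n, which tends to zero: all mass sits on a Lebesgue-null set, so there is no density with respect to Lebesgue measure.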
In the absolutely-continuous phase, the histogram distributions are such that the histogram probabilities may be larger than their means , but not to such a degree that (-averages of) proportions between and grow unbounded in the limit. This is borne out by the formulation of property (13), and also serves to interpret later bounds (e.g., (26)). The upper bound on the proportions between and induces domination with -probability one. Extending the above example with , the absolutely continuous phase describes a random Stieltjes function that is everywhere differentiable and can be identified with a random Radon–Nikodym density function with respect to G. In Section 8, we discuss Gaussian random histogram limits in the absolutely-continuous phase. If we specify that an absolutely-continuous random histogram limit is also normalized completely random, then the limit is in the fixed-atomic phase: combining the resulting purely atomic character of the random component with domination by G, we find only random point masses at the fixed locations of the atoms of G. The sub-family of Dirichlet process distributions with countably supported base measures are in the fixed-atomic phase.
The distinction between the random-atomic and fixed-atomic phases provides an alternative explanation for the decomposition of the random, purely atomic component in Kingman’s theorem: based on the above and the Radon–Nikodym theorem, we explain this by the fact that any random probability measure decomposes uniquely into a random component dominated by its mean measure G and a random component that is mutually singular with respect to G (but still with support inside the support of G).
6. Existence and Phases of Dirichlet Histogram Limits
The best-known family of histogram limits is the Dirichlet family; its definition is based most conveniently on the observation that if are independent and distributed according to Gamma distributions , then is distributed according to . (Below, we use the convention that is a single atom of mass one located at zero.)
Definition 8.
Let ν be a non-zero, bounded, positive Borel measure on a Polish space and define, for every Borel-measurable partition α,
where . The histogram distributions on are those of the normalized positive random elements , where , () and . Together, the distributions are coherent and form the Dirichlet histogram system with base measure ν.
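The normalized-Gamma construction and its coherence under coarsening can be checked by simulation. In the sketch below, the base-measure values on four cells, the grouping into two coarser cells and all function names are hypothetical. It compares the first coordinate of a merged fine histogram with the first coordinate of a histogram sampled directly on the coarser partition; by Gamma additivity, both should follow the same Beta distribution.

```python
import random

def gamma_mass(rng, shape):
    """Gamma(shape, 1) draw, with the convention that shape 0 gives an atom at zero."""
    return rng.gammavariate(shape, 1.0) if shape > 0 else 0.0

def dirichlet_histogram(rng, base_masses):
    """Normalize independent Gamma masses, one per partition cell."""
    g = [gamma_mass(rng, a) for a in base_masses]
    total = sum(g)
    return [x / total for x in g]

def merge(values, groups):
    """Project values on a fine partition onto a coarser one (groups of cell indices)."""
    return [sum(values[i] for i in grp) for grp in groups]

def mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(1)
fine_base = [0.5, 1.5, 0.0, 2.0]          # hypothetical values of nu on four cells
groups = [(0, 1), (2, 3)]                 # coarser partition as unions of fine cells
coarse_base = merge(fine_base, groups)    # nu on the coarse cells: [2.0, 2.0]

n = 20000
from_fine = [merge(dirichlet_histogram(rng, fine_base), groups)[0] for _ in range(n)]
direct = [dirichlet_histogram(rng, coarse_base)[0] for _ in range(n)]

# Both samples should look like Beta(2, 2): mean 1/2 and variance 1/20.
print(mean_var(from_fine), mean_var(direct))
```

Matching the first two moments is of course only a sanity check, not a proof of equality in distribution.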
It is clear that the Gamma process, defined by the positive random vectors , is completely random and that Dirichlet histogram systems are normalized completely random. Limits of Dirichlet histogram systems therefore describe random probability measures in one of the two atomic phases.
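The normalization in Definition 8 can be sketched in a few lines of Python. This is a minimal illustration, not the paper's construction: the four-cell partition, the values in `fine_masses` (standing in for the ν-masses of the cells) and the helper names are all hypothetical. It draws independent Gamma variables, normalizes them, and checks by simulation that merging cells gives a histogram whose mean matches the normalized base measure on the coarser partition.

```python
import random

def dirichlet_histogram(masses, rng):
    """Sample a Dirichlet histogram by normalizing independent Gamma draws.

    masses[i] plays the role of nu(A_i) for the i-th cell of a partition;
    all masses are assumed strictly positive here.
    """
    gammas = [rng.gammavariate(m, 1.0) for m in masses]
    total = sum(gammas)
    return [g / total for g in gammas]

def coarsen(hist, groups):
    """Merge cells of a fine histogram; groups lists the fine indices per coarse cell."""
    return [sum(hist[i] for i in g) for g in groups]

rng = random.Random(0)
fine_masses = [0.5, 1.5, 1.0, 2.0]      # hypothetical nu-masses on a 4-cell partition
groups = [[0, 1], [2, 3]]               # the coarser 2-cell partition
n = 20000
mean_coarse = [0.0, 0.0]
for _ in range(n):
    p = coarsen(dirichlet_histogram(fine_masses, rng), groups)
    mean_coarse = [a + b / n for a, b in zip(mean_coarse, p)]
```

With these masses the coarse cells carry ν-mass 2 and 3 out of a total of 5, so `mean_coarse` should come out close to (0.4, 0.6), up to Monte Carlo error.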
A second immediate observation is that coherence of the histogram system could have been guaranteed based on parametrization in terms of a finitely additive base measure . The well-known Mean-measure condition [] requires to be countably additive to guarantee the existence of a unique histogram limit with respect to the tight topology on . We come back to the Mean-measure condition below.
6.1. Tight Limits of Dirichlet Histogram Systems
The following theorem is the (by now classical, see []) existence result for Dirichlet histogram limits, with a new proof in terms of condition (16).
Theorem 6.
Let be a Polish space, endow with the tight topology and let ν be a non-zero, bounded, positive Borel measure on . There exists a unique Radon probability measure on projecting to the Dirichlet histogram distributions (24), describing a random probability measure in the random atomic phase.
Proof.
Let be a countable basis for and let be a refining sequence of partitions, generated by , that resolves . By assumption, there exist distributions for the random histograms , (). As said, the coherence of the inverse system follows from finite additivity of the measure .
To prove condition (16), let be given. According to Proposition 8, defines a bounded positive Borel measure on and, according to Proposition 7, is Polish, so is a Radon measure on . Hence, there exists a compact in such that,
Let be given. By Markov’s inequality and the fact that under ,
for any , we have,
by Markov’s inequality, the fact that the are proportional to and the fact that . Conclude that there exists a unique histogram limit , a Radon probability measure on with the tight topology. Because the histogram system is normalized completely random, the limiting random element P is in the random-atomic phase. □
To conclude, two remarks are in order: Firstly, coming back to the Mean-measure condition, it is noted that the above proof relies on being not just finitely, but countably additive, to imply the Radon property. Secondly, we note that restriction to with partitions generated by the basis may be confusing since the most common definition of the Dirichlet histogram system involves all Borel-measurable partitions, . We argue that this distinction expresses the difference between the roles that plays in Theorem 6 and Proposition 2: to define , we are restricted to directed sets of a special form, while, after proving existence, we may use histograms associated with all .
6.2. Weak Limits of Dirichlet Histogram Systems
Whether is a Radon measure with respect to the weak topology as well depends on the base measure . To make a preliminary assessment, note that, given and , for any ,
Based on (25), we see that for every ,
Now, let be given. Due to the bound (26), for any and any , Markov’s inequality gives,
for all . Since , as refines (unless is finite), this shows that the most obvious upper bound to imply uniform integrability does not lead to a useful argument. However, we show the following.
Theorem 7.
Let , and satisfy the minimal conditions and consider with the weak topology. Let ν be a non-zero, bounded, positive, purely atomic measure on . Then, there exists a unique Radon probability measure on with the weak topology, projecting to for all . In that case, describes a normalized completely random measure in the fixed-atomic phase.
Proof.
First, consider a countable set D with the discrete topology (which is a Polish space), with a bounded, positive Borel measure on D. According to Theorem 6, the Dirichlet histogram system with base measure has a Radon histogram limit on with the tight topology. Since any bounded is continuous, the tight and weak topologies are equal. Therefore is also Radon with respect to the weak topology on by default.
Now, let be Polish and let D denote the set . Let denote the set of all finite partitions of D and let denote the set of all finite, Borel-measurable partitions of . Define to contain all partitions that combine a partition from and a partition from , to partition the whole space . Note that resolves , and is directed and co-final in . For any , the Dirichlet histogram distribution is such that for the (-measurable) subset ,
So, with -probability one. The projections of onto give rise to a Dirichlet histogram system with base measure , the restriction of to subsets of D. As argued above, the limit is a Radon probability measure on with the weak topology. The space is weak-to-weak homeomorphic to the weakly closed subspace M of all such that , through the mapping , for all . Conclude that the histogram system based on partitions in has a histogram limit that is Radon on with the weak topology. □
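The role of the atoms in the argument above can be illustrated with a small sketch. It assumes a hypothetical purely atomic base measure on [0, 1) with two atoms (the dictionary `atoms`), and it implements the convention stated earlier that a Gamma distribution with shape zero is the point mass at zero. Cells that contain no atom then receive probability zero in every draw, so all random mass sits on cells that meet the atom set.

```python
import random

def gamma_or_zero(shape, rng):
    # Convention from the text: Gamma with shape zero is the point mass at zero.
    return rng.gammavariate(shape, 1.0) if shape > 0 else 0.0

def dirichlet_histogram(masses, rng):
    draws = [gamma_or_zero(m, rng) for m in masses]
    total = sum(draws)
    return [d / total for d in draws]

# Hypothetical purely atomic base measure on [0, 1): atoms at 1/4 and 3/4.
atoms = {0.25: 1.0, 0.75: 2.0}

def cell_masses(edges):
    """nu-mass of each half-open cell [edges[i], edges[i+1])."""
    return [sum(w for x, w in atoms.items() if lo <= x < hi)
            for lo, hi in zip(edges, edges[1:])]

rng = random.Random(1)
edges = [0.0, 0.2, 0.3, 0.7, 0.8, 1.0]
masses = cell_masses(edges)             # only the 2nd and 4th cells contain an atom
hist = dirichlet_histogram(masses, rng)
```

Refining `edges` further leaves the picture unchanged: only the cells containing 1/4 or 3/4 ever carry mass, which is the finite-partition shadow of the fixed-atomic phase.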
7. Existence and Phases of Pólya-Tree Histogram Limits
Here, we give only a very brief introduction to Pólya-tree distributions; for much more, see [,,,] and the overviews in [,].
The Pólya-tree distribution is defined through a sequence of refining partitions of a Polish space (usually or the interval ), where, in each step, every set in the preceding partition is split into two subsets. To describe the resulting tree of refinements, we define the following: For every , we denote by the set of all binary sequences of length m (and we denote the empty binary sequence formally as , forming the only element of the set denoted ). We also define the set of all finite binary sequences (including the empty one). For any two binary sequences , , we write for the concatenation in . In particular, for any , () in appends a zero (one) to . Also note that for all . We write out as and use the notation for the projections onto the first binary digits. We also define for any with , with the last digit flipped: .
We use to organize a refining sequence of partitions, , , , etc., into a dyadic tree, defining and, for all ,
Mostly, we shall look at refinement through intersection with basis sets and their complements, i.e., for every , either or equals for some element U in a basis for . Note that in the case of a countable basis , iterative application of the above construction gives rise to a countable that resolves .
Example 4.
A typical example is a dyadic tree of partitions of (or ), constructed by iteratively bisecting every interval at the mid-point. This leads to a sequence of refining partitions , , consisting of intervals of the forms where and , , which is generated by a basis and which resolves . (In case , we add to every partition the singleton ).
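The bookkeeping with binary sequences can be made concrete as follows. This is a sketch under an assumed indexing convention (a sequence is read as the binary expansion of the left end-point, most significant digit first) on the unit interval; the function names are illustrative and not taken from the paper.

```python
from fractions import Fraction
from itertools import product

def binary_sequences(m):
    """All binary sequences of length m, as tuples; m = 0 gives the empty sequence."""
    return list(product((0, 1), repeat=m))

def flip_last(eps):
    """The sequence eps with its last digit flipped (eps must be non-empty)."""
    return eps[:-1] + (1 - eps[-1],)

def dyadic_interval(eps):
    """End-points of the half-open interval [k / 2^m, (k + 1) / 2^m) indexed by eps."""
    m = len(eps)
    k = sum(d << (m - 1 - i) for i, d in enumerate(eps))
    return Fraction(k, 2 ** m), Fraction(k + 1, 2 ** m)

level2 = [dyadic_interval(e) for e in binary_sequences(2)]
```

Appending a digit to a sequence bisects the corresponding interval at its mid-point: for example, `dyadic_interval((1,))` is [1/2, 1), and the sequences `(1, 0)` and `(1, 1)` index its left and right halves.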
To arrive at random histogram distributions for the Pólya-tree, we define for every a so-called splitting variable (and ), taking values in such that
- (i.) for any such that , is independent of ;
- (ii.) for every , there exist such that has a distribution.
(In case , we assign a separately chosen, fixed probability to with -probability one for all . As a default, we choose .)
Remark 2.
Here and below, we extend the usual family of Beta-distributions somewhat: we consider and and define for all , for all and .
The splitting variables are interpreted as random fractions that determine how much of the probability mass of goes to and how much remains for , in accordance with (27):
Consequently, for every , , the random probability for can be written as a product of independent fractions:
which fixes the histogram probability measures on for all ,
By construction, the are such that the refinement and coarsening of partitions (corresponding to relations of type (1)) are accommodated coherently.
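The product construction can be sketched as follows. The parametrization is an assumption made for illustration: each set indexed by a binary sequence carries a positive parameter `alpha(eps)`, and the fraction of mass passed to the child with a zero appended is Beta-distributed with the parameters of the two children. The particular choice `len(eps) ** 2` is hypothetical.

```python
import random
from itertools import product

def sample_polya_tree(depth, alpha, rng):
    """Sample cell probabilities for all binary sequences up to length `depth`.

    alpha(eps) returns the positive Beta parameter attached to the set indexed
    by eps; the fraction of that set's parent mass going to the child eps + (0,)
    is drawn from Beta(alpha(eps + (0,)), alpha(eps + (1,))).
    """
    prob = {(): 1.0}
    for m in range(depth):
        for eps in product((0, 1), repeat=m):
            v = rng.betavariate(alpha(eps + (0,)), alpha(eps + (1,)))
            prob[eps + (0,)] = prob[eps] * v
            prob[eps + (1,)] = prob[eps] * (1.0 - v)
    return prob

rng = random.Random(2)
prob = sample_polya_tree(4, lambda eps: float(len(eps) ** 2), rng)
```

Because each parent's mass is split into a fraction and its complement, the probabilities of the two children always add up to that of the parent, which is the coherence between successive partitions noted above.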
For later reference, we note the first two moments of the random variables : for every and every , the mean measure equals,
by independence of the variables and expectations of the -distributions. Expressed in terms of the parameters , the second moment of takes the form,
based on the independence of the , the variances of the corresponding -distributions and Equation (29).
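Since both moments factorize over the levels of the tree, they can be evaluated exactly. The sketch below assumes the same illustrative parametrization as before (a parameter `alpha(eps)` per binary sequence, integer-valued here so that exact rational arithmetic applies) and uses the standard first and second moments of the Beta distribution; it is not a transcription of the displayed formulas.

```python
from fractions import Fraction

def beta_moments(a, b):
    """First and second moments of a Beta(a, b) variable, for integers a, b > 0."""
    mean = Fraction(a, a + b)
    second = Fraction(a * (a + 1), (a + b) * (a + b + 1))
    return mean, second

def cell_moments(eps, alpha):
    """Mean and second moment of the cell probability as products over the digits of eps."""
    mean, second = Fraction(1), Fraction(1)
    for k in range(1, len(eps) + 1):
        prefix = eps[:k]
        sibling = prefix[:-1] + (1 - prefix[-1],)
        m1, m2 = beta_moments(alpha(prefix), alpha(sibling))
        mean *= m1
        second *= m2
    return mean, second

# Hypothetical homogeneous choice: alpha depends only on the level, here alpha = level.
mean, second = cell_moments((0, 1, 1), lambda eps: len(eps))
```

For this choice every splitting variable is symmetric, so the mean of a cell at level 3 is 1/8, while the second moment is the product 1/3 · 3/10 · 2/7 = 1/35.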
To have a sub-class of relatively simple examples, we define so-called homogeneous Pólya-tree systems.
Definition 9.
Let denote a dyadic tree of partitions of (or ), as in Example 4. A Pólya-tree system is called homogeneous if we choose for all and set for all .
Accordingly, in a homogeneous Pólya-tree system, splitting variables are distributed symmetrically around , and the mean measure G for any homogeneous Pólya-tree system with a limit is Lebesgue measure.
7.1. Tight Limits of Pólya-Tree Histogram Systems
First, the general case of the Pólya-tree histogram system is analyzed with Corollary 2: here, the particulars of the partition play a role in the formulation of the condition, so we have to be specific regarding and its partitioning. In this subsection, we specify that (or ), with a dyadic tree of partitions. We use the following notation: for all , and .
Theorem 8.
Let and let be the dyadic tree of Example 4. Let be a coherent inverse system of Pólya-tree measures (with parameter ) on the inverse system . Then, there exists a unique probability measure on that is Radon with respect to the tight topology and projects to the Pólya-tree histograms parametrized by , if and only if,
for every , and the resulting random element P of is in the continuous-singular phase.
Proof.
Given that , the partition consists of intervals of the forms , where and , , which is generated by a basis for the standard topology on . The well-ordered set of partitions resolves . For the given , we consider . Let also be given. If , with -probability one and any compact satisfies property (17). Assuming that , we write for certain fixed , like above, and consider the sequence of half-open intervals in , defined by,
Assuming that (30) holds, choose large enough such that,
Note that for all , , while, for any , . Defining K to be the closure of , by Markov’s inequality, we have,
which shows that property (17) holds.
Conversely, suppose that there exists a , such that,
Then,
while the sequence decreases to ⌀. Hence, the mean measures do not define a measure (on the ring that is formed by the union of all , ), which precludes the existence of a Borel probability measure on with the tight topology (if existed, would define a Borel mean measure). □
Remark 3.
The above applies to examples with as well, but, in that case, we have to require, in addition to (30), that,
because aside from the open, left-sided boundaries of half-open intervals , there are directions towards where mass can ‘leak away’ in the limit.
Example 5.
It is well known [] that a Pólya-tree histogram system with defining parameters that satisfy,
for all coincides with a Dirichlet histogram system (not on all of , but on a smaller set of dyadic partitions that resolves , generated by a basis). Accordingly, such Dirichlet–Pólya-tree histogram systems have limits that are Radon probability measures on with the tight topology, and the resulting random element P of is in the random-atomic phase.
In the example below, we make a choice for the parameters that gives rise to a coherent histogram system without a tight limit. This choice is not singular by construction in the sense that parameters either grow very large or vanish in the limit: for all , we have . To introduce the example, we define the following function on .
Definition 10.
In the standard construction of Cantor space as a subspace of by successive deletions of open mid-sections of intervals, we define the Cantor mid-point function x that parametrizes the set of all mid-points of deleted intervals in terms of finite binary sequences: maps to the midpoint of the interval that is deleted in the m-th transition in the construction of the set : for example, in , , in , , , , in , etc.
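A sketch of such a mid-point function, for the middle-thirds construction on the unit interval, is given below. The indexing is an assumption: the empty sequence stands for the whole interval, digit 0 selects the left retained third and digit 1 the right one, so a sequence of length m indexes one of the intervals remaining after m deletion steps.

```python
from fractions import Fraction

def cantor_midpoint(eps):
    """Mid-point of the open middle third deleted from the interval indexed by eps.

    eps is a finite binary sequence (tuple of 0/1); digit 0 selects the left
    retained third and digit 1 the right one. The empty sequence indexes [0, 1].
    """
    left = sum(Fraction(2 * d, 3 ** (k + 1)) for k, d in enumerate(eps))
    length = Fraction(1, 3 ** len(eps))
    return left + length / 2

midpoints = [cantor_midpoint(e)
             for e in [(), (0,), (1,), (0, 0), (0, 1), (1, 0), (1, 1)]]
```

Under this convention the first deletion has mid-point 1/2, the second deletions have mid-points 1/6 and 5/6, and the third have mid-points 1/18, 5/18, 13/18 and 17/18.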
Example 6.
Take with a dyadic tree of partitions as defined in Example 2, and, for all , ,
Note that
It is noted that and
Similarly,
Since for all ,
Conclude that,
which implies that the Pólya-tree random histograms defined in (33) form a coherent system that does not lead to a limiting probability measure on with the tight topology.
7.2. Weak Limits of Pólya-Tree Histogram Systems
Second, we formulate a sufficient condition for the parameters such that the corresponding Pólya-tree histogram system has a limit that is a Radon probability measure on with the weak topology. Based on this condition, it is demonstrated that homogeneous Pólya-tree systems with give rise to such weak histogram limits. This rate of growth is lower than that required in the sufficient condition of [], which is elaborated upon in [,,] and re-visited in [].
Theorem 9.
Let be a second countable metrizable space with countable basis , with a corresponding dyadic tree of partitions , , generated by the basis. Let be a coherent inverse system of Pólya-tree measures (with parameter ) on the inverse system . Assume also that condition (16) holds. Then, there exists a unique Radon probability measure Π on with the weak topology, projecting to for all if,
The resulting random element P of is in the absolutely-continuous phase.
Proof.
Condition (16) implies the existence of a tightly-Borel probability measure on and a corresponding mean measure , which serves as our choice of Q in the proof for property (12). Let be given. For any and every , Markov’s inequality gives,
for all , where K denotes the value of the supremum in Condition (34). Consequently, condition (12) is satisfied and Theorem 2 asserts that there exists a unique weakly-Radon probability measure on that projects to for all . □
Corollary 3.
Assume the conditions of Theorem 9 and let a sequence , () be given. If the grows like m or faster, , there exists a unique Radon probability measure Π on with the weak topology, projecting to the associated homogeneous Pólya-tree histogram system.
Proof.
Note that the sufficient condition of [] (see also []) suggests that the absolute continuity of homogeneous Pólya-tree limits sets in when grows as or faster; here, it is shown that absolute continuity is already obtained with that grows more slowly, like , or faster.
8. Existence and Phases of Gaussian Histogram Limits
Most known examples of random histogram systems with a limit are of the (normalized) completely random type []. The reason for the preference for systems with independent components is coherence, cf. (1) or (9), which is analyzed most conveniently with infinite divisibility, requiring independence between summands. The consequence is that most known histogram limits are in one of the atomic phases of Theorem 5. In this section we introduce the family of Gaussian random measures, random signed measures on the space with components that display dependence generically, manifesting in one of the non-atomic phases of Theorem 5.
8.1. Random Histogram Limits with Signed Measures
To arrive at a proof of existence for Gaussian histogram limits, we have to generalize the approaches of Section 3.2 and Section 4.1. Consider the case of a locally compact Polish space . The most natural generalization of our random histogram question calls for construction of Radon probability measures on , the space of all signed (and potentially unbounded) Radon measures on , with the vague topology (see [], Ch. III, § 1, No. 9), rather than with the tight topology as in Section 4. However, to make histogram projections continuous, transition to a zero-dimensional refinement (as in Section 4.1.1) is still a necessary step, which does not combine well with the vague topology (test functions for the vague topology on remain continuous when viewed as , but the compactness of their supports in is lost in general).
However, since with the vague topology is the inverse limit of the spaces for compact , we may also limit attention to compact subsets initially and then use Theorem 1 (with a directed set of compact labeling a coherent inverse system of histogram limits ) to define a limiting Radon probability measure on with the vague topology.
We defer proof of existence for such a ‘vague inverse limit of histogram limits on compacta’ to future work and focus here on the case where itself is a compact Polish space. Then, and the vague and tight topologies coincide. Although is not compact in general, with the tight topology still stands in continuous bijective correspondence with (as in (the first part of) Proposition 9), and the histogram projections of the form (15) are continuous. This enables the use of Theorem 1 to prove the existence of histogram limits that are Radon probability measures on with the vague/tight topology.
Theorem 10.
Let be a compact Polish space and let be a directed set of partitions that resolves , generated by a basis that gives rise to a zero-dimensional . Consider with the tight topology and a coherent random histogram system . If,
- (i.) for every , there is a constant such that for all ,
- (ii.) and, for every there is a compact such that for all ,
then there exists a unique Radon probability distribution Π on projecting to for all .
Proof.
According to Proposition 8, the Borel sets on and , as well as (bounded) signed Borel set functions and measures, are the same. The first part of Proposition 9 remains true (the set-theoretic identity mapping is a tight-to-tight continuous bijection), but the second part fails because and are not necessarily Polish spaces (see Remark 4). As before, the mappings form a coherent and separating family of continuous mappings on . Trivially, the linear spaces are finite-dimensional normed spaces (the vague, tight and total-variational topologies are all equivalent) and the mappings are surjective and continuous, as in Proposition 3. Accordingly, forms an inverse system.
To show that condition (10) holds, let be given and let be a constant such that property (35) is satisfied for every . Define,
since , cf. Proposition 1.
Let be a compact subset of . For any ,
with as in the proof of Theorem 3 and the corresponding compacta in cf. property (36), we also define,
Following steps analogous to those in the proof of Theorem 3, one then finds that satisfies Prokhorov’s conditions (see (A2)), so that the closure forms a compact subset of and, for any , we have (by monotonicity of for any , as in the proofs of Theorems 2 and 3),
which shows that condition (10) of Theorem 1 is satisfied. Conclude that there exists a unique Radon probability measure on that projects to for all . The continuous mapping serves to define , a Radon probability measure on , and still projects to for all . □
As it stands, property (36) is somewhat unwieldy due to the occurrence of compacta in . Analogous to Corollary 2, we also provide a version that refers only to compacta in . For brevity’s sake, we omit the proof (which follows precisely the same steps as the proof of Corollary 2): if, for all , all and all , there is a , compact in , such that,
for all such that , then property (36) is satisfied.
Remark 4.
Regarding the pair of properties (35) and (36), we remark that, unlike earlier applications of Theorem 1, our conditions are sufficient but (perhaps) not necessary for the existence of a histogram limit: note that is not necessarily complete and not a Polish space generically (see [], Ch. III, § 1, No. 9, Proposition 14), so that Borel measurability of the inverse mapping can no longer be guaranteed. Accordingly, not every Radon probability measure on can be extended to a Borel probability measure on canonically, and there may exist coherent histogram systems with a tight inverse limit Π on , for which the stated conditions do not hold.
8.2. Existence of Tight Gaussian Histogram Limits
For the definition of Gaussian histogram systems, location and covariance parameters are defined in a way comparable to that of the base measure of the Dirichlet family.
Definition 11.
Let be a compact Polish space, let λ be a signed Radon measure on . For any Borel-measurable partition α, let denote the α-histogram projection of λ, in . Let Σ be a signed, symmetric Radon measure on (symmetric meaning that for all ). Assume that for every , the -matrix , with entries,
() is positive semi-definite. We refer to λ as the center measure, Σ as the covariance measure and as Gaussian parameters.
The measure may be viewed equivalently as a linear mapping that takes continuous functions on into signed Radon measures on , or as a symmetric bi-linear form. To give examples, we may turn to the theory of reproducing kernel Hilbert spaces.
Example 7.
For , let be a compact subset of , and consider a so-called positive-definite symmetric kernel function ; cf. the Moore–Aronszajn theorem. Every such kernel function is the reproducing kernel for a unique Hilbert space of functions on . We use k to define,
and note that such a is a covariance measure in the sense of Definition 11. Mercer’s theorem formulates the associated spectral theory, with viewed as a (compact, self-adjoint, positive) integral operator . Indeed, we may define a kernel by choice of a countable orthonormal subset of continuous functions , and non-negative , to define .
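A finite-rank instance of this construction can be checked numerically. The sketch assumes that the covariance measure assigns to a product of two cells the double integral of the kernel over that product, takes the unit interval as the compact set, and uses the hypothetical orthonormal cosine functions e_i(x) = √2 cos(iπx) with hypothetical non-negative weights; none of these choices come from the paper.

```python
import math

def basis_integral(i, a, b):
    """Integral over [a, b) of e_i(x) = sqrt(2) * cos(i * pi * x), i >= 1."""
    return math.sqrt(2) * (math.sin(i * math.pi * b) - math.sin(i * math.pi * a)) / (i * math.pi)

def covariance_matrix(edges, weights):
    """Entry (j, l) is the sum over i of weights[i-1] * int_{A_j} e_i * int_{A_l} e_i."""
    cells = list(zip(edges, edges[1:]))
    ints = [[basis_integral(i + 1, a, b) for (a, b) in cells] for i in range(len(weights))]
    n = len(cells)
    return [[sum(w * row[j] * row[l] for w, row in zip(weights, ints))
             for l in range(n)] for j in range(n)]

def quadratic_form(sigma, v):
    """v^T sigma v, which should be non-negative for a covariance matrix."""
    return sum(v[j] * sigma[j][l] * v[l] for j in range(len(v)) for l in range(len(v)))

edges = [0.0, 0.25, 0.5, 0.75, 1.0]
weights = [1.0, 0.5, 0.25]              # hypothetical non-negative Mercer weights
sigma = covariance_matrix(edges, weights)
```

The quadratic form is a weighted sum of squares of the integrals of the basis functions against the step function with values `v`, which is why non-negative weights yield positive semi-definite matrices for every partition.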
To extend the previous example for general covariance measures , note that if consists of partitions generated by a basis for , then, for any continuous ,
Polarization then defines a positive semi-definite bilinear form on . Within , there is a linear space of functions f that are -almost-surely equal to zero ( if ; if .), and the quotient space is a real pre-Hilbert space (see [] (Definition 7.5)), with Hilbert space completion denoted as , which generalizes the Moore–Aronszajn Hilbert spaces associated with reproducing kernel functions.
Definition 12.
Given Gaussian parameters , we define a Gaussian histogram system as follows: for all , we choose normal probability distributions for random signed histograms , as follows:
where denotes the multivariate normal distribution on with expectation and covariance matrix . When , we speak of a centered Gaussian histogram distribution, denoted as .
For partitions , where refines , let be as in (3), the mapping that expresses finite additivity. Below, we show that for any and any Gaussian parameters , the above histogram distributions define a coherent system, referred to as the Gaussian histogram system associated with the parameters .
The inclusion of a center measure does not affect the existence of Gaussian histogram limits: for all and all Borel sets B in ,
and, hence, exists if and only if exists. The existence of histogram limits therefore only concerns the -parameter. The existence conditions of Theorem 10 can be dominated by uniform bounds on mixed second moments of the absolute histogram components , which we denote by,
where is a constant that depends only on the correlation coefficient between and (see [], p. 933).
Corollary 4.
Let be a compact Polish space and let be a directed set of partitions generated by a basis, as in Example 1. Let Σ be a covariance measure on . Consider with the tight topology and the centered Gaussian histogram system . If the covariance measure Σ is such that,
- (i.)
- (ii.) and, for any open that decrease to ⌀,
then there exists a unique Radon probability distribution on projecting to for all .
Proof.
To use Theorem 10, we first verify the coherence of Gaussian histogram systems. (We do this first step of the proof generically, that is, with .) If , , and we write , then,
for . This can be expressed in terms of a linear mapping such that,
(where denotes the matrix transpose of ). Recall that for any finite , any linear and a random variable distributed , the random variable is distributed . So has the same distribution as for ,
for all . This verifies the coherence of the histogram system .
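The covariance part of this coherence argument can be verified exactly on a small example. The sketch assumes a hypothetical covariance measure on the unit square with kernel k(x, y) = 1 + xy, so that a product of two intervals receives the double integral of k, and it uses exact rational arithmetic; the 0/1 matrix `S` encodes which fine cells make up each coarse cell.

```python
from fractions import Fraction as F

def sigma_rect(cell_a, cell_b):
    """Sigma(A x B) for the hypothetical kernel k(x, y) = 1 + x * y on [0, 1]."""
    (a, b), (c, d) = cell_a, cell_b
    return (b - a) * (d - c) + (b * b - a * a) * (d * d - c * c) / 4

def cov_matrix(cells):
    return [[sigma_rect(p, q) for q in cells] for p in cells]

def merge_matrix(groups, n_fine):
    """0/1 matrix S with S[i][j] = 1 when fine cell j lies in coarse cell i."""
    return [[1 if j in g else 0 for j in range(n_fine)] for g in groups]

def push_forward(S, sigma):
    """Covariance S * sigma * S^T of the linear image of a Gaussian vector under S."""
    rows, n = len(S), len(sigma)
    return [[sum(S[i][j] * sigma[j][l] * S[k][l] for j in range(n) for l in range(n))
             for k in range(rows)] for i in range(rows)]

fine = [(F(0), F(1, 4)), (F(1, 4), F(1, 2)), (F(1, 2), F(3, 4)), (F(3, 4), F(1))]
coarse = [(F(0), F(1, 2)), (F(1, 2), F(1))]
S = merge_matrix([[0, 1], [2, 3]], len(fine))
coherent = push_forward(S, cov_matrix(fine)) == cov_matrix(coarse)
```

Here `coherent` evaluates to `True`: summing the fine covariance entries over the cells that make up each coarse cell reproduces the coarse covariance matrix, which is finite additivity of the covariance measure in each argument.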
In the rest of the proof, we assume that the histogram system is centered: . To show that property (35) holds, we use Chebyshev’s inequality to upper-bound its left-hand side, for every and :
Assuming Condition (40) and choosing M large enough, we see that property (35) is satisfied for every and all .
Let and be given. Because is generated by a basis, A is the intersection of an (open) finite intersection U of basis sets and a (closed) finite intersection C of complements of basis sets. Because is a Polish space, U is , i.e., U is equal to a countable union of closed sets . Then, the closed sets increase to A as . In the open complements of in A, there exists a decreasing sequence of basis elements , and we may define closed sets . The open sets decrease to ⌀ and the sets are closed as subsets of and therefore compact. By assumption (see Example 1), is such that all sets in the basis occur as elements of for some . So, for every and every , there exists a such that the decomposition is such that .
In the above proof, we satisfy property (37) by roughly following the proof of Theorem 8, but with a different construction of the compacta , which is more generic and based on Example 1. In applications, control over the choice of allows for convenient constructions. Where the proof of Theorem 8 depends on the Carathéodory-like condition that for (specific) sequences in the generating ring that decrease to ⌀, here, the second-absolute-moment set functions of Condition (41) are required to go to zero.
Based on Condition (40), we briefly come back to the space and indicate how it is related to the covariance structure of a centered Gaussian histogram limit . To this end, we define the real-valued stochastic integrals for and consider the linear space of real-valued random variables that they span. Assuming integrability, on L, we define the bilinear form,
With , the quotient space with as an inner product is a real pre-Hilbert space, with Hilbert space completion denoted by . The following proposition involves histogram approximations of continuous functions: for any and any partition generated by a basis, let , be real numbers such that and let , noting that, for all , as refines within an that resolves .
Proposition 12.
Proof.
We first show that the (semi-definite) inner-product spaces and L are isometrically isomorphic. Let be continuous and denote their supremum norms by . By the continuity of ,
and the last step holds by Lebesgue’s dominated convergence, based on the facts that and . Using the definition of , we may then write,
To complete the argument, note that,
and the right-hand side is monotone-increasing in . By monotone convergence,
so that Condition (40) asserts the integrability of the function . So, using again the continuity of and dominated convergence,
This implies that for any , if and only if . Consequently, the pre-Hilbert spaces and are in isometrically isomorphic correspondence, and so are their (unique) Hilbert space completions. □
To conclude this subsection, we briefly consider two statistical perspectives: the frequentist question of estimation of the covariance measure and the Bayesian question of using Gaussian histogram limits to define priors on spaces of probability measures.
Remark 5.
Consider the statistical estimation of the covariance measure Σ from observed data. Assume independent and identically distributed observations , each distributed marginally according to the Gaussian histogram limit for some fixed covariance measure Σ. The idealized, direct question of how to estimate Σ from the observed sample is difficult because the data is of a functional nature: in any practical way, observing points in a space of measures amounts to the observation of some approximation or projection. In the present context, we interpret the data points as histograms: for some sample size and some , we observe independent and identically distributed . The question of estimating is then the textbook question of estimating the covariance matrix based on an independent and identically distributed sample from a multivariate normal distribution. This is a smooth parametric estimation problem, with best-regular estimators displaying -convergence, optimal asymptotic covariance and optimal Wald-type confidence sets.
If the functional data is expressed through any (that is, if the statistician can choose the partition before seeing the data ), it is important to note that, like the approximation of probability densities in Lemma 1, it is possible to approximate Σ by histograms in total variation:
(as ). In such a setting, we may refine partitions as the sample size n increases, ideally in such a way that the estimation error,
(as ) and the histogram approximation error are of comparable order.
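For a fixed partition, the textbook estimation problem mentioned above can be sketched as follows. The two-cell partition, the factor `L` and the resulting covariance matrix are hypothetical; since the histogram system is centered, the estimator used is the average of outer products, without subtracting a sample mean.

```python
import random

def sample_histogram(L, rng):
    """Draw Phi = L z with z standard normal, so that Cov(Phi) = L L^T."""
    z = [rng.gauss(0.0, 1.0) for _ in L[0]]
    return [sum(l * zk for l, zk in zip(row, z)) for row in L]

def sample_covariance(data):
    """Estimate the covariance of centered data: (1/n) * sum of outer products."""
    n, d = len(data), len(data[0])
    return [[sum(x[i] * x[j] for x in data) / n for j in range(d)] for i in range(d)]

rng = random.Random(3)
L = [[1.0, 0.0], [0.5, 0.5]]            # hypothetical factor; Sigma = L L^T
true_sigma = [[1.0, 0.5], [0.5, 0.5]]
data = [sample_histogram(L, rng) for _ in range(20000)]
estimate = sample_covariance(data)
max_error = max(abs(estimate[i][j] - true_sigma[i][j])
                for i in range(2) for j in range(2))
```

The entry-wise error shrinks at the parametric rate as the sample grows; in the refinement scheme described above one would balance this error against the histogram approximation error of the chosen partition.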
Remark 6.
To illustrate the statistical possibilities also from the Bayesian perspective, consider the following family of non-parametric priors for measure spaces: since Gaussian histogram limits describe random signed measures, there is no direct Bayesian interpretation for a Gaussian histogram limit as a prior on a statistical model. However, a Gaussian random measure Φ can be conditioned to be positive, and if,
the conditioned random element can be normalized to a random probability measure, analogous to the (Bayesian) normalization of positive completely random measures (e.g., the Gamma process of Example 8). The resulting class of normalized conditionally positive Gaussian priors enables the novel option of describing continuous-singular random probability measures with histograms, rather than only random discrete probability measures (e.g., the Dirichlet random measures of Section 6).
8.3. Existence of Weak Gaussian Histogram Limits
In Section 8.1, we saw that for the existence of tight histogram limits, condition (35) (which is close to necessary, cf. Remark 4) says that the limiting random has a norm that is a tight real-valued random variable. We shall see in the present subsection that for the existence of weak Gaussian histogram limits, it is sufficient that is an integrable random variable.
Let be a compact Hausdorff space, fix the topology on to be the weak topology and choose a directed set of finite partitions in non-empty Borel sets. Then, , and satisfy the minimal conditions of Definition 1. The compactness of a subset of is still characterized by the Dunford–Pettis–Grothendieck condition, but with P replaced by the positive measure : a subset H of is relatively compact in the weak topology if and only if for some positive, bounded measure ,
as . The proof of the existence theorem for histogram limits that are Radon with respect to the weak topology does not differ substantially from that of Theorem 2, so we omit explicit statement.
Theorem 11.
Let be a compact Hausdorff space, consider with the weak topology and choose a directed set of finite partitions in non-empty Borel sets that resolves . Let be a coherent system of Borel probability measures on the inverse system . There exists a unique weakly-Radon probability measure Π on projecting to for all , if and only if, there is a such that for every , there is a such that,
for all .
Given a Radon probability measure on , the role of the mean measure G of Definition 5 as the dominating measure is taken over by the positive measure,
for , and implies (as in Lemma 3). Proposition 4 can be adapted to the signed case as well: is closed in and,
Below, we apply Theorem 11 to Gaussian histogram systems. To prepare, it is noted that for all and all ,
Clearly, the resulting set functions are not the -restrictions of the measure Q (contrary to the cases of positive or probability histogram limits, where all are restrictions of the mean measure G). For every and all , , and, hence, . If we assume that resolves , the positive measures (and the total-variational norms ) increase to (and the total-variational norm ).
Corollary 5.
Let be a compact Hausdorff space and let be a directed set of partitions generated by a basis that resolves . Let Σ be a covariance measure on . Consider with the weak topology and the centered Gaussian histogram system . If,
then there exists a unique weakly Radon probability distribution on projecting to for all .
Proof.
Two remarks are in order: first, we relate Condition (44) for the existence of weak Gaussian limits to Condition (40) for the existence of tight Gaussian limits by the Cauchy–Schwarz inequality:
showing that (44) implies (40). Second, we note that Corollary 1 stays valid in the signed case, so if Condition (44) holds, there exists a unique Radon probability measure on with the total-variational topology, projecting to for all .
Example 8.
Let be a compact subset of . Consider Example 7 with a kernel function that is constant: for some . Let consist of partitions α with the property that for all , there exists a translation vector x in such that the Lebesgue measure of is zero. Then, for every α, is the -matrix with all entries equal to,
(where for any ). Clearly, the corresponding covariance measure Σ gives rise to positive semi-definite covariance matrices , with random histogram components that are highly dependent: in fact, the linear space of all with components that sum to zero forms the kernel of :
and a centered multivariate normal distribution is supported on the range of its covariance matrix. This means that lies on the diagonal of with probability one:
Note that
so, according to Corollary 5, there exists a weakly-Radon probability measure on projecting to for all .
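The degenerate structure of Example 8 can be checked numerically. The following is a minimal sketch, not the construction used in the paper: it assumes that the constant kernel gives a covariance of product form (constant times the product of cell lengths), takes the unit interval cut into equal cells, and uses hypothetical names (`constant_kernel_covariance`, `sample_centered_gaussian`) chosen only for illustration.

```python
import numpy as np

def constant_kernel_covariance(n_cells, c, total_length=1.0):
    """Covariance matrix of a histogram on n_cells equal-length cells for the
    constant kernel k(x, y) = c, ASSUMING Sigma(A x B) = c * |A| * |B|."""
    cell = total_length / n_cells
    return np.full((n_cells, n_cells), c * cell * cell)

def sample_centered_gaussian(cov, rng):
    """Draw one sample from N(0, cov) for a possibly singular covariance,
    using the eigendecomposition instead of a Cholesky factor."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off
    z = rng.standard_normal(cov.shape[0])
    return eigvecs @ (np.sqrt(eigvals) * z)

rng = np.random.default_rng(0)
cov = constant_kernel_covariance(n_cells=8, c=2.0)
phi = sample_centered_gaussian(cov, rng)

# A vector whose components sum to zero is annihilated by the covariance.
v = np.array([1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
kernel_residual = np.abs(cov @ v).max()

# Only one eigenvalue is non-zero, so the sample has equal components
# up to floating-point error.
n_positive_eigenvalues = int(np.sum(np.linalg.eigvalsh(cov) > 1e-10))
spread = phi.max() - phi.min()
```

In this sketch the sampled histogram is a single random multiple of the all-ones vector, which is the "diagonal" behaviour described in the example.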
Additionally, a moment’s thought shows that the above example serves as a bound for a host of examples based on the reproducing kernels of Example 7.
Corollary 6.
Let be a compact Hausdorff space and let be a directed set of partitions generated by a basis that resolves . Let Σ be a covariance measure based on a bounded kernel function . Then, there exists a unique weakly-Radon probability distribution on projecting to for all .
Proof.
Assume first that is as in Example 8. Since, for some and all , for all and all , . Summability then follows as in Inequality (46), proving the existence of a weakly-Radon histogram limit . For any Borel-measurable partition of , the weakly continuous mapping induces the Gaussian histogram distribution on . The uniqueness of the limit proves the assertion. □
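To make coherence under coarsening concrete for a bounded kernel, the sketch below approximates the covariance matrices of a Gaussian histogram system on the unit interval. It assumes that the covariance of two cells is the double integral of the kernel over their product (approximated here by a midpoint rule), and the squared-exponential kernel and all function names are hypothetical choices made for illustration only.

```python
import numpy as np

def kernel_covariance(kernel, n_cells):
    """Midpoint-rule approximation of Sigma(A_i x A_j), ASSUMED to be the
    double integral of k(x, y) over equal cells A_i, A_j of [0, 1]."""
    h = 1.0 / n_cells
    mid = (np.arange(n_cells) + 0.5) * h
    return kernel(mid[:, None], mid[None, :]) * h * h

def coarsening_matrix(n_fine):
    """0/1 matrix that merges neighbouring pairs of cells (n_fine even)."""
    m = np.zeros((n_fine // 2, n_fine))
    m[np.arange(n_fine) // 2, np.arange(n_fine)] = 1.0
    return m

def gaussian_kernel(x, y):
    # Hypothetical bounded, positive-definite kernel with sup k = 1.
    return np.exp(-0.5 * ((x - y) / 0.2) ** 2)

cov_fine = kernel_covariance(gaussian_kernel, 16)
merge = coarsening_matrix(16)

# Covariance of the coarsened histogram: push the fine covariance forward.
cov_coarse = merge @ cov_fine @ merge.T

# The same matrix obtained by summing 2 x 2 blocks of the fine covariance.
block_sums = cov_fine.reshape(8, 2, 8, 2).sum(axis=(1, 3))

# For a kernel bounded by C, the sum over cells of sqrt(Sigma(A x A)) is at
# most sqrt(C) times the total length; here k(x, x) = 1, so the sum is 1.
diag_sum = np.sqrt(np.diag(cov_fine)).sum()
```

The agreement of `cov_coarse` with `block_sums` is the matrix form of coherence: coarsening a fine Gaussian histogram yields the Gaussian distribution that the coarse partition would receive directly. The quantity `diag_sum` stays bounded under refinement for any bounded kernel; whether it coincides with the paper's summability condition is not asserted here.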
Over the last two decades there has been considerable interest in the so-called Gaussian free field (see, e.g., refs. [,]); below, we use Green’s functions for the harmonic operators in as covariance kernels to define Gaussian histogram systems in the closure of a non-empty, bounded, open subset of .
Example 9.
We consider the existence question for the Gaussian free field first in : the Green’s function is of the form,
where is harmonic in x for every y. Define,
(where we choose f such that is symmetric and positive-definite). Choose a directed set of partitions generated by a basis that resolves . Based on Corollary 6, we see immediately that the associated centered Gaussian histogram system with histogram distributions has a limit that is a weakly Radon probability measure on . Then, , (), is a multiple of the Lebesgue measure due to translation invariance and,
implying that the random element is of the form,
for all , where ϕ is a random Radon–Nikodym density function in , the Banach space of Lebesgue integrable functions on .
For , the situation changes drastically (see Figure 1 and Figure 2): Green’s functions are unbounded and display singular behavior in neighborhoods of the diagonal, namely, and for . To apply Corollary 6, we modify for small length scales to regularize the singular behavior near the diagonal (e.g., for some small , replace by , which replaces the pole for with an upper bound ). The modified are bounded kernel functions, and Corollary 6 guarantees that for every , there exists a weak histogram limit . One may hope that the limit for as (which may exist only as a tightly Radon probability measure) describes the so-called Gaussian free field in d Euclidean dimensions. In light of earlier explorations (see, for example, ref. [] for a detailed overview of (mostly) the case), it is also possible that the limit exists only if we embed the space of Radon measures on , in spaces of distributions on . The limiting probability distribution would then describe a random generalized function of the type discussed in [] (Ch. IX, §6, No. 10) and [].
Figure 1.
A sample from a random histogram on a 64 × 64 (lower-right) partitioned square patch of two-dimensional Euclidean space-time with the Green’s function for the Laplacian to define the covariance measure and its 32 × 32 (lower-left), 16 × 16 (upper-right) and 8 × 8 (upper-left) coarsened histograms. Coherence of the histogram system says that the distributions of the random 8 × 8, 16 × 16 and 32 × 32 histograms must equal the distributions implied by the coarsening of the random 64 × 64 histogram. The histogram limit is the random object obtained by infinite refinement.
Figure 2.
Samples from Gaussian random histograms on a 64 × 64 partitioned two-dimensional square slice of Euclidean space-time, with the Green’s function for the Laplacian to define the covariance measure, in two dimensions (upper-left), three dimensions (upper-right) and four dimensions (lower-left), alongside a sample with the Yukawa potential of a massive scalar boson field in four dimensions (lower-right).
8.4. Completely Random Gaussian Histogram Limits
The class of (centered) Gaussian histogram limits has a non-empty intersection with the class of completely random measures, characterized by covariance measures that place all mass on the diagonal. Completely random Gaussian histogram limits exist in the fixed or random atomic phase.
Definition 13.
Let Σ be a covariance measure on . We say that Σ is diagonal if for all , .
Note that with a diagonal , the set function defined by for all is a positive Radon measure on , and the Hilbert space is isometrically isomorphic to , the usual space of -square-integrable functions on .
A diagonal covariance measure leaves histogram components independent and leads to a completely random limit, which is of a fixed- or random-atomic nature also in the case of a signed completely random measure. If a Gaussian histogram system with a diagonal covariance measure has a tight limit and distributes its mass in a uniformly asymptotically negligible way (see (Section XVII.7) of []), infinite divisibility of the distribution of the random variable is implied.
Corollary 7.
Let be a compact Polish space and let be a directed set of partitions that resolves and is generated by a basis. Assume that Σ is a diagonal covariance measure. Then, the centered Gaussian histogram system has a unique tightly-Radon probability distribution on projecting to for all . If, in addition,
then the total-variational norm has a probability distribution that is infinitely divisible.
Proof.
For a diagonal covariance measure , any and any , the restrictions of cumulant measures of Definition 7 to the -algebras are given by,
for any . By the Carathéodory extension, all cumulants are therefore finite positive measures and, hence, by [] (Theorem 4.4), there exists a tight completely random limit described by a marked Poisson process [] (Ch. 9–10) (with marks in rather than ), of the form (21).
With a diagonal covariance measure , the components of are independent random variables for every . Therefore, the norms of the random histograms ,
are sums of independent terms in a triangular array, and uniform asymptotic negligibility, as assumed in (47), is sufficient for tight convergence to an infinitely divisible limiting probability distribution [] (Section XVII.7). Since the total-variational norm is the monotone limit of the norms (Proposition 1), its probability distribution is tight and infinitely divisible. □
Example 10.
Let , let consist of partitions in half-open intervals, generated by a basis and collectively fine enough to resolve . Consider a centered Gaussian histogram system with diagonal covariance measure Σ defined by choosing τ equal to the Lebesgue measure. With sets of the form (and the singleton , for which we set for all ), the histogram system,
describes the independent, normally distributed increments of Brownian motion started from , so the (random) Stieltjes function for the measure Φ,
is a version of the sample path of Brownian motion on . Since resolves , the Lebesgue measures of all intervals go to zero as α increases in , and Condition (47) is satisfied. (By extension, if we replace normal distributions by stable distributions in this construction appropriately, infinite divisibility preserves coherence, and the existence of Φ, cf. Theorem 10, implies random Stieltjes functions corresponding to right-continuous versions of Lévy sample paths on .)
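The histogram system of Example 10 is easy to simulate on a fixed partition. The sketch below assumes the unit interval, equal half-open cells, and τ equal to the Lebesgue measure; the function names are hypothetical. It samples independent centered normal components with variance equal to the cell length, coarsens by adding neighbouring cells, and accumulates the components into the Stieltjes function on the grid.

```python
import numpy as np

def sample_increment_histogram(n_cells, rng):
    """Independent N(0, |A|) components on n_cells equal half-open
    intervals of [0, 1] (tau = Lebesgue measure, an assumption)."""
    cell = 1.0 / n_cells
    return rng.normal(loc=0.0, scale=np.sqrt(cell), size=n_cells)

def coarsen(hist):
    """Merge neighbouring pairs of cells by adding their histogram values."""
    return hist.reshape(-1, 2).sum(axis=1)

def stieltjes_path(hist):
    """Grid values of t -> Phi([0, t]) at the cell end-points,
    starting from the value 0 at t = 0."""
    return np.concatenate(([0.0], np.cumsum(hist)))

rng = np.random.default_rng(1)
fine = sample_increment_histogram(64, rng)
coarse = coarsen(fine)

path_fine = stieltjes_path(fine)      # 65 grid values
path_coarse = stieltjes_path(coarse)  # 33 grid values
```

Each merged cell is a sum of two independent normals with variance 1/64, so it is normal with variance 1/32, the Lebesgue measure of the merged interval. The coarsened histogram therefore has the distribution that a direct 32-cell sample would have, and the coarse path agrees with the fine path at every second grid point.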
Weak histogram limits with diagonal covariance measures display a limitation similar to that of Theorem 7. To appreciate the problem, note that for a diagonal covariance measure with positive dominated by the Lebesgue measure, Condition (44) cannot be satisfied. Purely atomic measures , however, lead to Gaussian histogram systems with weak limits.
Corollary 7 and Example 8 form two extremes: in diagonal cases, Gaussian histogram limits manifest in the fixed- or random-atomic phase, while, for covariance measures that spread their mass more homogeneously over , dependence introduces a degree of smoothness, a situation that we have seen in its most extreme form in (the highly non-diagonal) Example 8. Gaussian histogram limits for other covariance measures are somewhere in between: depending on the degree to which -mass is located away from the diagonal, corresponding to the degree of dependence between Gaussian histogram components, the histogram limit may manifest in close-to-atomic (i.e., highly concentrated) or smooth/closer-to-constant form.
To demonstrate the explanatory value of the phase structure of Gaussian histogram limits described above, the last example, which is analyzed more comprehensively in forthcoming work, suggests the applicability of Gaussian histogram limits in (Euclidean) quantum field theory [].
Example 11.
Let be given and consider for some constant , and Σ diagonal with,
for all . The space plays the role of d-dimensional Euclidean ‘momentum space’ (and the constant Λ that makes compact is known as the UV cutoff scale in physics). The kernel defining Σ is interpreted as the (unregularized, Euclidean) ‘propagator’ of the massless scalar field (roughly, the Green’s function for the Laplace operator, which is represented by the convolution kernel in momentum space). The diagonal Gaussian histogram limit exists, cf. Corollary 7.
We point out the following consequence of the phase of this Gaussian histogram limit. In this case, ‘quantization’, a description of the field in terms of particles, emerges as a consequence of complete randomness: the Gaussian histogram limit is in the random-atomic phase and manifests as a random sum of discrete point masses in Euclidean momentum space. Such configurations have an immediate physical interpretation, as states describing (off-shell) particles, point-like quanta of momentum. It is noted that the emergence of quantization is not a feature of (second-quantized) quantum field theory: in the physical theory of quantum fields, particles are axiomatic and introduced by hand with the formal introduction of a Fock space to describe quantum states of the field (see the classical work [] (p. 106)).
Funding
This research received no external funding.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the author.
Acknowledgments
The author thanks Jan van Mill, Harm de With and Georg Meyl for numerous insightful discussions at various stages in the development of this work. The author also wishes to thank the University Torino and University Bocconi, Milano, for their hospitality.
Conflicts of Interest
The author declares no conflicts of interest.
Appendix A. Topologies on Measure Spaces
The locally convex topologies that play a role in vector spaces of measures are discussed here in some detail, with special attention to compactness criteria and subspaces of probability measures. In the main text, two topologies feature centrally, the tight and weak topologies (with a smaller role for the total variational topology). While we mention a minimal set of definitions necessary in the main text in Section 2.1, here, we consider the weak topology in some more detail. We assume that the reader is familiar with the much-better-known total-variational and tight topologies, and we mention Prokhorov’s condition for tight compactness for reference in the main text.
From a measure-theoretic perspective, the canonical duality for a vector space of bounded measures is formulated with a space of bounded measurable functions. Let be a measurable space and denote its vector space of bounded measures by (a measure is bounded if ). Let denote the vector space of bounded measurable functions on . For and , defines a bilinear form and a sub-basis of sets for the weak topology . The weak topology is the initial topology for the mappings , , a locally convex topology with semi-norms . A net (sequence, filter) , (), converges weakly to if for every , . The space with the weak topology is, in general, not metrizable, not complete and not separable, making it rather inaccessible compared to the tight and total-variational topologies.
The Dunford–Pettis–Grothendieck theorem characterizes weak compactness for subspaces of (see, for example, Appendix 8, Theorem 6 of []): a subset H of is relatively compact in the weak topology if for some ,
as (to make this condition also necessary, choose Q from the positive elements of the L-space generated by , see []). Therefore, dominated subspaces play a central role when considering inner regularity with the weak topology. For any subset H of dominated by some and any , there is a unique such that by the Radon–Nikodym theorem. In that case, duality between and the vector space of (Q-almost-everywhere equivalence classes of) bounded measurable functions on corresponds to the better-known weak-star topology on the continuous dual of the normed space . Weak(-star) compactness is characterized by the Dunford–Pettis theorem (see []), which says that a Q-dominated is relatively compact in the weak topology if and only if the subset of Radon–Nikodym densities in is uniformly integrable, i.e.,
as . Finally, we may characterize the relative compactness of a subset H of in the weak topology by the condition that there exists a such that for every , there exists a , such that,
for every . These three characterizations of weak compactness are equivalent.
While tight and total-variational convergence are widely used in statistics, the weak topology is much less prominent. This is unfortunate, given that in frequentist statistics, the weak topology characterizes the testability of hypotheses and the estimability of parameters, cf. an underappreciated theorem of Le Cam and Schwartz that gives necessary and sufficient conditions for asymptotically consistent estimation []. The following theorem is the Le Cam–Schwartz theorem, in a form specific to the existence of asymptotically consistent hypothesis tests, in the case where H is a weakly compact subset of .
Theorem A1.
For all , let denote independent, identically distributed observations with for some P from H, a weakly compact subset of . For mutually exclusive hypotheses , there exists a sequence of measurable such that,
for all , if and only if there exists a sequence of -uniformly continuous functions such that , as for every .
The interpretation of this theorem is that data-based distinction between two subsets in a statistical model can be made with asymptotic certainty if and only if allow for limiting separation with a sequence of -uniformly continuous functions. This result appears technical and inaccessible, but, for weakly compact H, the tight and weak topologies are equal and the above condition may be replaced by the requirement that both B and V are -sets for the tight topology! The Le Cam–Schwartz theorem demonstrates the centrality of the weak topology in (asymptotic) frequentist statistics.
The total-variational topology is the strong topology associated with the weak topology . is a norm on and the associated norm topology is complete. (Banach spaces like are considered in detail in [], for example.) Since,
the total-variational distance between two probability measures can be written as the -difference between densities, cf. (7), for any such that . On a subset that is dominated by a bounded positive measure Q, the Radon–Nikodym theorem induces a one-to-one mapping , which is isometric by (7).
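On a finite sample space, the identity between the total-variational distance and the L1-distance between densities can be verified directly. The sketch below uses hypothetical names and assumes the convention in which the total-variational norm of a signed measure is the sum of the absolute values of its point masses (no factor 1/2); it also checks that the result does not depend on the choice of dominating measure.

```python
import numpy as np

def total_variation_norm(p, q, dominating):
    """||P - Q|| computed as the L1(dominating)-distance between the
    densities dP/d(dominating) and dQ/d(dominating) on a finite space.
    Convention (an assumption): no factor 1/2."""
    dp = p / dominating
    dq = q / dominating
    return np.sum(np.abs(dp - dq) * dominating)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

# Dominating measure 1: counting measure on the three points.
tv_counting = total_variation_norm(p, q, np.ones(3))

# Dominating measure 2: the mixture (P + Q) / 2.
tv_mixture = total_variation_norm(p, q, 0.5 * (p + q))
```

Both choices give the same value, 0.6, in line with the statement that the distance can be written with respect to any measure that dominates both P and Q.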
If the -algebra is the Borel -algebra on associated with a (Hausdorff completely regular) topology on , weaker dual topologies are also natural: in particular, we consider the vector space of bounded Radon measures and the vector space of bounded continuous functions on . For and , defines a sub-basis of sets for the tight topology . The tight topology is the initial topology for the mappings , , a locally convex topology with semi-norms . A net (sequence, filter) , (), converges tightly to if for every , . If is a Polish space, the cone of all positive, bounded Radon measures and the subspace of probability measures with the tight topology are Polish spaces too. (Note that is not necessarily a Polish space!)
Prokhorov’s Theorem characterizes tight compactness for subspaces of [,]: let be a Hausdorff completely regular space; a subset H of is relatively tightly compact if and only if and, for every , there exists a compact such that,
References
- Le Cam, L. Asymptotic Methods in Statistical Decision Theory; Springer: New York, NY, USA, 1986.
- Ghosal, S.; van der Vaart, A.W. Fundamentals of Nonparametric Bayesian Inference; Cambridge Series in Statistical and Probabilistic Mathematics; Cambridge University Press: Cambridge, UK, 2017.
- Kleijn, B.J.K. Frequentist validity of Bayesian limits. Ann. Statist. 2021, 49, 182–202.
- Kingman, J.F.C. Completely random measures. Pac. J. Math. 1967, 21, 59–78.
- Kingman, J.F.C. Random discrete distributions. J. R. Stat. Soc. Ser. B (Methodol.) 1975, 37, 1–15.
- Daley, D.J.; Vere-Jones, D. An Introduction to the Theory of Point Processes: Volume I: Elementary Theory and Methods; Probability and Its Applications; Springer: New York, NY, USA, 2003.
- Daley, D.J.; Vere-Jones, D. An Introduction to the Theory of Point Processes: Volume II: General Theory and Structure; Probability and Its Applications; Springer: New York, NY, USA, 2007.
- Bochner, S. Harmonic Analysis and the Theory of Probability; Courier Corporation: North Chelmsford, MA, USA, 1955.
- Choksi, J.R. Inverse limits of measure spaces. Proc. Lond. Math. Soc. 1958, 3, 321–342.
- Metivier, M. Limites projectives de measures, martingales, applications. Ann. Mat. 1963, 63, 225–352.
- Schwartz, L. Radon Measures on Arbitrary Topological Spaces and Cylindrical Measures; Studies in mathematics; Tata Institute of Fundamental Research: Mumbai, India, 1973.
- Bourbaki, N. Integration II: Chapters 7–9; Actualités scientifiques et industrielles; Springer: Berlin/Heidelberg, Germany, 2004.
- Rao, M.M. Foundations of Stochastic Analysis; Academic Press: Cambridge, MA, USA, 1981.
- Bogachev, V.I. Measure Theory; Springer: New York, NY, USA, 2007; Volumes I–II.
- Mallory, D.; Sion, M. Limits of inverse systems of measures. Ann. Inst. Fourier 1971, 21, 25–57.
- Rao, M.M. Projective limits of probability spaces. J. Multivar. Anal. 1971, 1, 28–57.
- Pinter, M. The existence of an inverse limit of an inverse system of measure spaces—A purely measurable case. Acta Math. Hungar. 2010, 126, 65–77.
- Beznea, L.; Cîmpean, I. On Bochner-Kolmogorov Theorem. In Séminaire de Probabilités XLVI; Donati-Martin, C., Lejay, A., Rouault, A., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 61–70.
- Kraft, C.H. A Class of Distribution Function Processes Which Have Derivatives. J. Appl. Probab. 1964, 1, 385–388.
- Ferguson, T.S. A Bayesian analysis of some nonparametric problems. Ann. Statist. 1973, 1, 209–230.
- Ferguson, T.S. Prior distributions on spaces of probability measures. Ann. Statist. 1974, 2, 615–629.
- De Blasi, P.; Favaro, S.; Lijoi, A.; Mena, R.H.; Pruenster, I.; Ruggiero, M. Are Gibbs-type priors the most natural generalization of the Dirichlet process? IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 212–229.
- Orbanz, P. Projective limit random probabilities on Polish spaces. Electron. J. Stat. 2011, 5, 1354–1373.
- Werner, W.; Powell, E. Lecture notes on the Gaussian Free Field. arXiv 2020, arXiv:2004.04720.
- Bourbaki, N. General Topology: Chapters 1–4; Elements of Mathematics; Springer: Berlin/Heidelberg, Germany, 2013.
- Minlos, R.A. Generalized random processes and their extension to a measure. Sel. Transl. Math. Stat. Prob. 1963, 3, 291–313.
- Ghosh, J.K.; Ramamoorthi, R.V. Bayesian Nonparametrics; Springer Series in Statistics; Springer: Berlin/Heidelberg, Germany, 2003.
- Kingman, J.F.C.; Taylor, S.J. Introduction to Measure and Probability; Cambridge University Press: Cambridge, UK, 1966.
- Pfanzagl, J. On the Existence of Product Measurable Densities. Sankhya 1969, 31, 13–18.
- Strasser, H. Mathematical Theory of Statistics; De Gruyter: Berlin, Germany; New York, NY, USA, 1985.
- Kechris, A.S. Classical Descriptive Set Theory; Graduate texts in mathematics; Springer: Berlin/Heidelberg, Germany, 1995.
- Bourbaki, N. Topological Vector Spaces: Chapters 1–5; Springer: Berlin/Heidelberg, Germany, 1987.
- Regazzini, E.; Lijoi, A.; Prünster, I. Distributional results for means of normalized random measures with independent increments. Ann. Stat. 2003, 31, 560–585.
- James, L.F. A simple proof of the almost sure discreteness of a class of random measures. Stat. Probab. Lett. 2003, 65, 363–368.
- Mauldin, R.D.; Sudderth, W.D.; Williams, S.C. Polya Trees and Random Distributions. Ann. Statist. 1992, 20, 1203–1221.
- Lavine, M. Some Aspects of Polya Tree Distributions for Statistical Modelling. Ann. Statist. 1992, 20, 1222–1235.
- Lavine, M. More Aspects of Polya Tree Distributions for Statistical Modelling. Ann. Statist. 1994, 22, 1161–1176.
- Bourbaki, N. Integration I: Chapters 1–6; Springer: Berlin/Heidelberg, Germany, 2003.
- Trèves, F. Topological Vector Spaces, Distributions and Kernels; Pure and applied mathematics; Academic Press: New York, NY, USA; London, UK, 1967.
- Kan, R.; Robotti, C. On moments of folded and truncated multivariate normal distributions. J. Comput. Graph. Stat. 2017, 26, 930–934.
- Sheffield, S. Gaussian free fields for mathematicians. Probab. Theory Relat. Fields 2007, 139, 521–541.
- Gel’fand, I.M.; Vilenkin, N.Y. Generalized Functions: Applications of Harmonic Analysis; AMS Chelsea Publishing: Providence, RI, USA, 1964; Volume 4.
- Feller, V. An Introduction to Probability Theory and Its Applications, 2nd ed.; Wiley Series in Probability and Statistics; John Wiley and Sons, Inc.: Hoboken, NJ, USA, 1991; Volume II.
- Hellmund, G. Completely random signed measures. Stat. Probab. Lett. 2009, 79, 894–898.
- Zuber, J.B.; Itzykson, C. Quantum Field Theory; Dover Books on Physics; Dover Publications: Mineola, NY, USA, 2012.
- Diestel, J. Uniform Integrability: An Introduction; Dipartimento di Scienze Matematiche, Università Degli Studi di Trieste: Trieste, Italy, 1991.
- Le Cam, L.; Schwartz, L. A Necessary and Sufficient Condition for the Existence of Consistent Estimates. Ann. Math. Statist. 1960, 31, 140–150.
- Dunford, N.; Schwartz, J. Linear Operators, Part 1: General Theory; Wiley Classics Library; Wiley: Hoboken, NJ, USA, 1988.
- Prokhorov, Y.V. Convergence of Random Processes and Limit Theorems in Probability Theory. Theory Probab. Appl. 1956, 1, 157–214.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).