Existence and Phase Structure of Random Inverse Limit Measures

B. J. K. Kleijn

doi:10.3390/math13142309

Korteweg-de Vries Institute for Mathematics, University of Amsterdam, Science Park 105–107, 1098 XG Amsterdam, The Netherlands

Mathematics 2025, 13(14), 2309; https://doi.org/10.3390/math13142309

This article belongs to the Section D1: Probability and Statistics

Abstract

Analogous to Kolmogorov’s theorem for the existence of stochastic processes describing random functions, we consider theorems for the existence of stochastic processes describing random measures as limits of inverse measure systems. Specifically, given a coherent inverse system of random (bounded/signed/positive/probability) histograms on refining partitions, we study conditions for the existence and uniqueness of a corresponding random inverse limit, a Radon probability measure on the space of (bounded/signed/positive/probability) measures. Depending on the topology (vague/tight/weak/total-variational) and Kingman’s notion of complete randomness, the limiting random measure is in one of four phases, distinguished by their degrees of concentration (support/domination/discreteness). The results are applied in the well-known Dirichlet and Polya tree families of random probability measures and a new Gaussian family of signed inverse limit measures. In these three families, examples of all four phases occur, and we describe the corresponding conditions of defining parameters.

Keywords:

random Radon measure; stochastic process (existence); stochastic integral; phase structure

MSC:

28C20; 60B11; 60G07; 60G15; 60G57

1. Introduction

Underpinning theories of random functions is Kolmogorov’s theorem for the existence of stochastic processes. Given a (e.g., Polish) domain

X

for real valued functions f, we depart from the collection of projections

f \mapsto f_{S} = (f (s) : s \in S)

(where S is any finite subset of

X

) and, for every S, we provide a probability distribution

Π_{S}

for the following projection:

f_{S} : = (f (s) : s \in S) \sim Π_{S} .

Consistency among projections dictates the necessary condition where

S^{'} \subset S

implies that

Π_{S^{'}}

is marginal to

Π_{S}

. Kolmogorov’s theorem says that this consistency condition is also sufficient: if the chosen

Π_{S}

are consistent, there exists a probability distribution

Π

for a random function f such that the projected random

f_{S}

are distributed according to

Π_{S}

for all finite

S \subset X

.

Kolmogorov’s theorem has advantages and disadvantages: On the one hand, mere consistency is enough without further conditions, making existence remarkably easy to prove. This enhances applicability greatly, as does the simplicity of the approach: properties of

Π

derive from those of the (user-defined)

Π_{S}

, and calculations for random

f_{S}

are feasible because they take place in finite-dimensional probability spaces. On the other hand, the limiting

Π

is a Borel measure for the product topology on

R^{X}

, which dissociates

Π

from some interesting other (e.g., metrizable) topologies, while the indistinct nature of

R^{X}

forces the imposition of extra conditions on the choices of the

Π_{S}

to induce properties like (

Π

-almost-sure) continuity, differentiability, or integrability.

Probability measures on spaces of measures are (less common, but) also of interest in various parts of science. For example, in non-parametric statistics [], particularly of the Bayesian type [,], probability distributions on spaces of probability measures play a central role; machine learning shares much with statistics, and random (probability) measures also feature prominently there.

In those disciplines, the most straightforward approach is often to define random measures by the mapping of random functions: it is commonplace, for example, to normalize an integrable positive random function in order to define a random probability density function. But it remains attractive to think about constructions of random measures that take place directly on the space of measures in its full generality. In this sense, based on the Poisson-type family of completely random measures [,], a well-developed theory of point processes (e.g., the family of Dirichlet random probability measures) exists (for an overview, see [,]) and is applied widely. Further examples exist, but a comprehensive mathematical theory for the construction and study of probability measures on measure spaces is lacking to date.

Ideally, such a theory of random measures would be based on an existence theorem like Kolmogorov’s theorem for random functions. In this paper, we formulate several new existence theorems of this kind. For a directed set

A

of finite, measurable partitions

α

of

R

, we choose distributions

Π_{α}

and define, for all restrictions of

μ

to

α

,

μ_{α} : = (μ (A) : A \in α) \sim Π_{α} .

These projections

μ_{α}

are called random histograms in what follows, and their consistency follows from the additivity of

μ

: if

β

refines

α

, then, for any

A \in α

,

(\sum_{B \subset A} μ_{β} (B) : A \in α) \sim Π_{α}

when

μ_{β} \sim Π_{β}

. Under which conditions does such a system of random histograms have a (unique) histogram limit

Π

? (By which we mean a probability measure

Π

on the space of measures with

α

-restrictions that match the

Π_{α}

.)

This question is, of course, not new: both Bayesian non-parametric statistics and stochastic analysis have formulated a wide variety of conditions for existence, more or less independently. First explorations of the subject in stochastic analysis date back to the studies of [,]. The authors of these studies formulated the classical Bochner–Kolmogorov conditions for the existence of a random distribution function on

R

. Other approaches based on inner regularity are considered in [,,] and discussed comprehensively in [,]. Definitions of measure-theoretic inverse limits are presented in [,,,]. Limits of random histogram systems in the Bayesian non-parametric literature were first discussed in [], which introduced the Pólya-tree family of histogram systems. Many further developments were based on Kingman’s completely random measures, most prominently in the form of the Dirichlet process [,]. For overviews of these and further developments on non-parametric Bayesian priors, see [,]. Regarding existence, most noteworthy is [], which formulated the so-called Mean-measure condition for the existence of a limit for a system of random probability histograms

P_{α} \sim Π_{α}

: Orbanz requires that there exists a Borel probability measure G on

X

with histogram projections

G_{α}

that match histogram expectations:

G_{α} (A) = \int P_{α} (A) d Π_{α} (P_{α})

for all partitions

α

and all

A \in α

.

In this paper we prove and apply several existence theorems for limits of random histogram systems and analyze the variety of ways in which the corresponding random measures manifest in theory and examples. After introductory remarks and a discussion of the Bourbaki–Prokhorov–Schwartz theorem in Section 2, we consider spaces of probability measures

M^{1} (X)

with the tight, weak or total-variational topology, and we derive conditions that guarantee the existence and uniqueness of a limiting Radon probability measure

Π

in those cases in Section 3 and Section 4. As it turns out, the manifestations of their respective random probability measures are quite different: a limit

Π

that is Radon for the weak or total-variational topology is supported by the subset of

M^{1} (X)

of measures dominated by G, while a

Π

that is Radon for the tight topology is supported by the subset of

M^{1} (X)

of measures with support contained in that of G. Combined with the Poisson-like manifestation of completely random measures, in Section 5, we distinguish four phases for random probability histogram limits: absolutely-continuous, fixed-atomic, continuous-singular and random-atomic. The results are applied to known examples like the Dirichlet (in Section 6) and Pólya-tree (in Section 7) families, and we re-derive and sharpen some of the existing results.

In Section 8, we consider spaces of signed measures with the vague, tight and weak topologies and derive conditions that guarantee the existence and uniqueness of Radon histogram limits

Π

. The generalization to signed measures accommodates a new family of Gaussian probability distributions, defined as limits of random histogram systems of the form,

(Φ_{α} (A) : A \in α) \sim N (λ_{α}, Σ_{α}),

where N denotes a multivariate normal distribution with (suitably defined) expectation

λ_{α}

and covariance

Σ_{α}

. All four phases found for probability histogram limits are also realized in this setting, so Gaussian histogram limits exist with the same wide range of diffuse and point-like manifestations. We argue that Gaussian histogram limits based on Green’s functions for the harmonic operator generalize the well-known two-dimensional Gaussian free field [] to higher dimensions, suggesting a potential role in four-dimensional Euclidean quantum field theory.

To conclude, we emphasize the constructive nature of the existence theorems provided: random histogram systems not only define but also approximate random measures. The approximative property has two large advantages, one computational and one analytic: Firstly, histogram systems consist of finite-dimensional probability distributions, which we can simulate. The Dirichlet process, for example, derives much of its immense popularity from its ease of numerical implementation and use, and this considerable advantage extends to all histogram methods. The second advantage lies in mathematical accessibility. The analyses of example histogram limits in Section 6, Section 7 and Section 8 are possible only because calculations with finite-dimensional random histograms are feasible, and limits of the results correspond to properties of the infinite-dimensional histogram limits.

2. Limits of Random Histogram Systems

The existence theorems that follow in Section 3 and Section 4 require some (tedious but necessary) bookkeeping of partitions and corresponding notation (Section 2.1), as well as a preparatory discussion of the relevant existence theorem, the Bourbaki–Prokhorov–Schwartz theorem (Section 2.2).

2.1. Inverse Systems of Random Histograms

We start by introducing directed sets of partitions and the associated histogram systems.

2.1.1. Measures, Partitions and Histograms

Let

X

be a Hausdorff topological space (further specification, e.g., Polishness, compactness, etc., follows below). For most purposes,

X

can be thought of as a space like

R^{d}

: for statisticians, this space plays the role of the sample space, while, for probabilists and physicists,

X

represents Euclidean space-time. The space

X

has a Borel

σ

-algebra that we denote by

B

. We consider a collection

A

of partitions

α

of

X

, consisting of finite numbers of non-empty Borel sets. (The maximal such collection, denoted

A_{0}

, contains all finite partitions into non-empty Borel sets.) We order

A

partially by the refinement of partitions (if

α, β \in A

and

β

refines

α

, write

α \leq β

) and assume throughout that

A

forms a directed set for ordering by refinement. Naturally,

α \leq β

implies inclusion for the generated

σ

-algebras

σ (α) \subset σ (β) \subset B

. With the notation

| α |

for the cardinality of

α

, let

I (α)

denote the index set

{1, \dots, | α |}

. Furthermore, we associate with

α

a finite, discrete space

X_{α} = {e_{1}, \dots, e_{| α |}}

and the mapping

φ_{α} : X \to X_{α}

, such that

φ_{α} (x) = e_{i}

for all

x \in A_{i}

. For all

α \leq β

, we also define

φ_{α β} : X_{β} \to X_{α}

such that

φ_{α} = φ_{α β} \circ φ_{β}

(and

φ_{α α}

as the identity on

X_{α}

for all

α

), and we define

J_{α β} (i) \subset I (β)

to be such that

A_{i} = \cup_{j \in J_{α β} (i)} B_{j}

for all

i \in I (α)

.

Consider

X

(or any of the discrete spaces

X_{α}

): define

C_{b} (X)

(or

C (X_{α})

) to be the linear space of all bounded, continuous

f : X \to R

(or

f_{α} : X_{α} \to R

), and let

M_{b} (X)

(or

M (X_{α})

) denote the space of all bounded, signed Radon measures

μ

on

X

(or

μ_{α}

on

X_{α}

). For

μ, ν \in M_{b} (X)

, we say that

μ

dominates

ν

(notation

ν ≪ μ

) if

μ (B) = 0

implies

ν (B) = 0

for all

B \in B

. Define the bilinear form

⟨ μ, f ⟩ = \int f d μ

(or

{⟨ μ_{α}, f_{α} ⟩}_{α} = \int f_{α} d μ_{α}

). For any

μ \in M_{b} (X)

, let

μ_{+}, μ_{-}

denote the (unique pair of) positive measures such that

μ = μ_{+} - μ_{-}

, and define

| μ | = μ_{+} + μ_{-}

, noting that the total variational norm

∥ μ ∥

equals

⟨ | μ |, 1 ⟩

. The bilinear form

⟨ \cdot, \cdot ⟩

places

C_{b} (X)

in dual correspondence with

M_{b} (X)

(or

C (X_{α})

with

M (X_{α})

). We refer to the resulting topology on

M_{b} (X)

as the tight topology, denoted as

T_{C}

. Let

M_{+} (X)

denote the positive cone in

M_{b} (X)

and let

M^{1} (X) \subset M_{+} (X)

denote the space of all Radon probability measures on

X

(or, analogously, on

X_{α}

). If

X

is a Polish space, then so are

M_{+} (X)

and

M^{1} (X)

(see [], Ch. IX, § 5, No. 4, Proposition 10). Alternatively, we view

C_{b} (X)

and

M_{b} (X)

as normed spaces, with the supremum norm

{∥ \cdot ∥}_{\infty, X}

(or

{∥ \cdot ∥}_{\infty, α}

) and total-variational norm

{∥ \cdot ∥}_{1, X}

(or

{∥ \cdot ∥}_{1, α}

), respectively. We refer to the corresponding norm topology on

M_{b} (X)

as the total-variational topology, denoted as

T_{T V}

. Below we also consider

M_{b} (X)

and

M^{1} (X)

in duality with the space of all bounded Borel-measurable

h : X \to R

, based on the same bilinear form

⟨ μ, h ⟩

. We refer to the corresponding topology on

M_{b} (X)

as the weak topology, denoted

T_{1}

. Clearly, the total-variational topology refines the weak topology and the weak topology refines the tight topology.

We specialize to probability measures in most of this section and exclusively in Section 3, Section 4, Section 5, Section 6 and Section 7 and generalize to signed measures only in Section 8. Summarizing the most basic requirements for

X

,

A

and

M^{1} (X)

, we give the following definition.

Definition 1.

We say that

X

,

A

and

M^{1} (X)

satisfy the minimal conditions if

(i.): $X$ and $M^{1} (X)$ are Hausdorff topological spaces;
(ii.): $A$ is a directed set of finite partitions of $X$ in terms of non-empty, Borel-measurable sets;
(iii.): for any $α \in A$ and all $A \in α$ , $M^{1} (X) \to [0, 1] : P \mapsto P (A)$ is Borel-measurable.

With regard to the third requirement, it is noted that if the space

X

is a Polish space, the mappings

P \mapsto P (G)

, (

G \in B

) are measurable with respect to the Borel

σ

-algebra for the tight topology on

M^{1} (X)

.

For any Borel probability measure

P \in M^{1} (X)

on

X

, there exists a mapping on

A

,

α \mapsto P_{α} = (P (A_{1}), \dots, P (A_{| α |}))

that takes a finite, measurable partition

α

of

X

into the (α-)histogram associated with P. Note that

P (A_{1}) + \dots + P (A_{| α |}) = P (X) = 1

, so any

P_{α} \in M^{1} (X_{α})

can be represented by an element of the simplex

S_{| α |} = {p \in R^{| α |} : p_{i} \geq 0, Σ_{i} p_{i} = 1}

(and we shall interchange these two perspectives freely in what follows). Consider

α, β \in A

such that

β

refines

α

. By finite additivity of the measure P, we have, for every

A \in α

,

P (A) = P (\cup_{B \subset A} B) = \sum {P (B) : B \in β, B \subset A},

(1)

so the histograms

P_{α}

and

P_{β}

are related through the summation of probabilities for components that are unified when partitions coarsen. Clearly, any probability measure P defines a collection of probability histograms related through (1), which, conversely, are enough to reconstruct P if

A

is rich enough (as per Carathéodory’s extension). To give these observations regarding histograms formal expression, we make the following definitions. For every

α \in A

, there exists a projection mapping

φ_{* α} : M^{1} (X) \to M^{1} (X_{α})

,

φ_{* α} (P) = (P (A_{1}), \dots, P (A_{| α |})),

(2)

that maps a probability distribution to its

α

-histogram. Based on (1), for all

α, β \in A

such that

α \leq β

, there is a transition mapping

φ_{* α β} : M^{1} (X_{β}) \to M^{1} (X_{α})

,

φ_{* α β} (P_{β}) = (\sum_{B \subset A_{1}} P_{β} (B), \dots, \sum_{B \subset A_{| α |}} P_{β} (B)),

(3)

that maps

β

-histograms to

α

-histograms. Then

φ_{* α α}

is the identity for any

α \in A

. Also, for any

α \leq β \leq γ

, we have,

φ_{* α γ} = φ_{* α β} \circ φ_{* β γ},

and, for all

α \leq β

,

φ_{* α} = φ_{* α β} \circ φ_{* β} .

(4)
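The relations (2), (3) and (4) can be checked by hand on a small example. The following Python sketch is only an illustration under stated assumptions: the six-point space, the three partitions and the probability vector are hypothetical choices, and the function names `histogram` and `transition` are ours, standing in for the mappings in (2) and (3).

```python
from fractions import Fraction

def histogram(P, alpha):
    """alpha-histogram of P, i.e. the vector (P(A_1), ..., P(A_|alpha|)) of (2)."""
    return [sum(P[x] for x in A) for A in alpha]

def transition(alpha, beta, P_beta):
    """Map a beta-histogram to an alpha-histogram by summation, as in (3).

    Assumes that beta refines alpha, so every B in beta lies in exactly one A.
    """
    return [sum(p for B, p in zip(beta, P_beta) if B <= A) for A in alpha]

# Hypothetical toy setting: X = {0, ..., 5} with point masses proportional to 1, ..., 6.
X = range(6)
P = {x: Fraction(x + 1, 21) for x in X}

# Three partitions with alpha <= beta <= gamma (gamma consists of singletons).
alpha = [frozenset({0, 1, 2}), frozenset({3, 4, 5})]
beta = [frozenset({0}), frozenset({1, 2}), frozenset({3, 4}), frozenset({5})]
gamma = [frozenset({x}) for x in X]

P_alpha, P_beta, P_gamma = (histogram(P, a) for a in (alpha, beta, gamma))

# Relation (4): projecting P directly agrees with projecting via beta.
projection_commutes = transition(alpha, beta, P_beta) == P_alpha

# Composition rule: the transition from gamma to alpha factors through beta.
composition_holds = transition(alpha, gamma, P_gamma) == transition(
    alpha, beta, transition(beta, gamma, P_gamma)
)
```

Exact rational arithmetic is used so that both identities hold as equalities rather than up to rounding.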

Together with the fact that

A

forms a directed set, the following property implies that

A

is rich enough for histograms to fix measures on all of the Borel

σ

-algebra.

Definition 2.

A set

A

of partitions of a Hausdorff topological space

X

is said to resolve

X

if the σ-algebra generated by the union of all sets A in all partitions in

A

is the Borel σ-algebra, i.e., if

σ ({A : α \in A, A \in α}) = B

.

To formulate necessary conditions below, we also need a construction of partitions in terms of a topological basis for

X

.

Definition 3.

Let

U

be a topological basis for

X

. We say that a partition α (or collection of partitions

A

) is generated by the basis

U

if for (any

α \in A

and) any

A \in α

, A is the union of a finite number of subsets obtained through a finite number of intersections of

X

with U or

X ∖ U

,

(U \in U)

.

Example 1.

In a topological space

X

with a countable basis

U

, we may construct a sequence of refining partitions based on an enumeration of

U

: start with

α_{0} = {X}

; for all

n \geq 1

, intersect all sets in

α_{n - 1}

with

U_{n}

and

X ∖ U_{n}

and then define

α_{n}

to consist of all such non-empty intersections. The resulting

A = {α_{n} : n \geq 1}

is a totally ordered set, and

A

resolves

X

.
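The construction of Example 1 is easy to carry out mechanically. The sketch below is a hedged illustration on a hypothetical finite set with a hypothetical short list of sets playing the role of the enumerated basis; the names `refine` and `refines` are ours. It only demonstrates the refinement step and the resulting chain, not the resolving property.

```python
def refine(partition, U, X):
    """Intersect every cell with U and with its complement in X; keep non-empty pieces."""
    complement = X - U
    cells = []
    for A in partition:
        for piece in (A & U, A & complement):
            if piece:
                cells.append(piece)
    return cells

def refines(beta, alpha):
    """True if every cell of beta is contained in some cell of alpha."""
    return all(any(B <= A for A in alpha) for B in beta)

# Hypothetical toy data: X = {0, ..., 7} and three sets U_1, U_2, U_3.
X = frozenset(range(8))
basis = [frozenset({0, 1, 2, 3}), frozenset({2, 3, 4, 5}), frozenset({1, 2})]

chain = [[X]]  # alpha_0 = {X}
for U in basis:
    chain.append(refine(chain[-1], U, X))

sizes = [len(alpha) for alpha in chain]
```

Each step at most doubles the number of cells, and consecutive partitions are ordered by refinement by construction.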

2.1.2. Domination, Histogram Densities and Total Variation

In dominated families of probability measures, the convergence of histogram systems coincides with the martingale convergence of Radon–Nikodym densities (see also Appendix A1.6 of []). Due to the monotonicity of the relation

α \mapsto σ (α)

,

F = {σ (α) : α \in A}

is a directed filtration. Furthermore, if

A

resolves

X

, the limit of the filtration

F

(which has the union of all

σ (α)

,

(α \in A)

, as a generating ring) is equal to the Borel

σ

-algebra

B

on

X

.

Let

P, Q \in M^{1} (X)

be given and assume that

P ≪ Q

, so that P has a (Q-almost-everywhere unique) Radon–Nikodym density

p : X \to [0, \infty)

with respect to Q. Consider the

σ (α)

-measurable functions

p_{α} : X \to [0, \infty)

, defined by,

p_{α} (x) = \sum_{A \in α} (\frac{1}{Q (A)} \int_{A} p (y) d Q (y)) 1_{A} (x),

(5)

for Q-almost-all

x \in X

(in particular, if

Q (A) = 0

, for some

A \in α

, the corresponding term proportional to

1_{A}

is (Q-almost-everywhere equal to 0 and therefore) not included in the sum). We may define for every

α \in A

the Q-dominated probability measure

P_{Q, α} : B \to [0, 1]

:

P_{Q, α} (B) = \int_{B} p_{α} (x) d Q (x),

(6)

for all

B \in B

, where it is noted that for all

A \in α

,

P_{Q, α} (A) = P (A)

(“

= P_{α} (A)

”, in a slightly abusive but natural notation that we introduce in Remark 1).

Lemma 1.

Let

X

,

A

and

M^{1} (X)

satisfy the minimal conditions and assume that

A

resolves

X

. Then, for any

P \in M^{1} (X)

and any dominating

Q \in M^{1} (X)

,

P_{Q, α}

converges to P in total variation.

Proof.

The Radon–Nikodym density function

p_{α}

is Q-almost-everywhere equal to the

σ (α)

-measurable conditional expectation

E [p | σ (α)] : X \to [0, \infty)

and, as such,

p_{α}

forms a non-negative, uniformly integrable Doob martingale relative to the filtration

F

. Since

A

resolves

X

, Doob’s martingale convergence guarantees that

{lim}_{α} p_{α} = p

in

L^{1} (X, B, Q)

. The assertion now follows from the fact that for Q-dominated probability measures P and

P_{Q, α}

, the total variational norm of their difference is proportional to the

L^{1} (Q)

-norm of the difference between densities,

∥ P - P_{Q, α} ∥_{1, X} = \frac{1}{2} \int |p (x) - p_{α} (x)| d Q (x),

(7)

for all

α \in A

. □
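A finite analogue of Lemma 1 can be computed exactly. In the sketch below, all choices are assumptions made for illustration: a 16-point space, uniform Q, a P with linearly increasing point masses, and dyadic partitions. The function evaluates the right-hand side of (7) for the histogram density of (5).

```python
from fractions import Fraction

N = 16
Q = [Fraction(1, N)] * N                      # dominating measure (uniform)
P = [Fraction(x + 1, 136) for x in range(N)]  # 1 + 2 + ... + 16 = 136

def dyadic(k):
    """Partition of {0, ..., N-1} into 2**k consecutive blocks of equal size."""
    size = N // 2**k
    return [range(i, i + size) for i in range(0, N, size)]

def tv_to_histogram_measure(P, Q, alpha):
    """Half the L1(Q)-distance between p = dP/dQ and p_alpha, as in (7)."""
    total = Fraction(0)
    for A in alpha:
        PA = sum(P[x] for x in A)
        QA = sum(Q[x] for x in A)
        for x in A:
            p_x = P[x] / Q[x]        # density p at x
            p_alpha_x = PA / QA      # block average, as in (5)
            total += abs(p_x - p_alpha_x) * Q[x]
    return total / 2

distances = [tv_to_histogram_measure(P, Q, dyadic(k)) for k in range(5)]
```

In this toy case the distance halves with each dyadic refinement and vanishes once the partition consists of singletons; on an infinite space the lemma asserts convergence to zero only in the limit along the directed set.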

The above martingale convergence of densities has implications for the total-variational norm that we shall appeal to in Section 3 and Section 8.

Proposition 1.

Let μ be a bounded, signed, Borel measure on

X

. The mapping

α \mapsto ∥ μ_{α} ∥_{1, X_{α}}

is monotone-increasing. If

A

resolves

X

, then the total-variational norm for μ equals,

{∥ μ ∥}_{1, X} = sup_{α \in A} ∥ μ_{α} ∥_{1, X_{α}} = sup_{α \in A} \sum_{A \in α} | μ_{α} (A) | .

Proof.

If

α, β \in A

, and

β

refines

α

, then,

∥ μ_{α} ∥_{1, X_{α}} = \sum_{A \in α} | μ_{α} (A) | \leq \sum_{B \in β} | μ_{β} (B) | = ∥ μ_{β} ∥_{1, X_{β}} .

Let a signed measure

μ : B \to R

and

ϵ > 0

be given. According to the Hahn–Jordan decomposition, there exists an

A_{+} \in B

such that, for any

A \in B

,

A \subset A_{+}

, we have

μ (A) \geq 0

and, for any

A \in B

,

A \subset X ∖ A_{+}

, we have

μ (A) \leq 0

. Moreover,

{∥ μ ∥}_{1, X} = μ (A_{+}) - μ (X ∖ A_{+})

. Since

A

is directed and the union of all

σ (α) \subset B

, (

α \in A

) forms a generating ring for

B

, there exist an

α \in A

and an

A_{α, +} \in σ (α)

with

| μ | ((A_{+} ∖ A_{α, +}) \cup (A_{α, +} ∖ A_{+})) < \frac{1}{2} ϵ

, so that,

{∥ μ ∥}_{1, X} \leq μ (A_{α, +}) - μ (X ∖ A_{α, +}) + ϵ \leq {∥ μ_{α} ∥}_{1, X_{α}} + ϵ,

proving the assertion. □
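The monotonicity in Proposition 1 is visible already for a signed measure on a few points. The following sketch assumes a hypothetical signed measure on eight points and dyadic partitions; `histogram_norm` is our name for the quantity \sum_{A \in α} | μ_{α} (A) |.

```python
def dyadic(k, n=8):
    """Partition of {0, ..., n-1} into 2**k consecutive blocks of equal size."""
    size = n // 2**k
    return [range(i, i + size) for i in range(0, n, size)]

def histogram_norm(mu, alpha):
    """Sum over the cells A of alpha of |mu(A)|."""
    return sum(abs(sum(mu[x] for x in A)) for A in alpha)

# Hypothetical signed measure given by its point masses.
mu = [3, -1, -2, 4, 1, -5, 2, -2]

norms = [histogram_norm(mu, dyadic(k)) for k in range(4)]
total_variation = sum(abs(m) for m in mu)
```

Coarse partitions allow positive and negative mass to cancel within a cell, which is why the histogram norms can only grow under refinement.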

The quantities

∥ P - P \land L Q ∥

, used to control weak compactness in Section 3, are also suprema of their histogram versions.

Lemma 2.

Assume that

A

resolves

X

. For any

P, Q \in M^{1} (X)

such that

P ≪ Q

and any

L > 0

, we have,

{∥ P - P \land L Q ∥}_{1, X} = sup_{α \in A} {∥ P_{α} - P_{α} \land L Q_{α} ∥}_{1, X_{α}} .

Proof.

Write the Radon–Nikodym derivative of P with respect to Q as

p = d P / d Q

. Let

B \in B

be given. If

Q (B) = 0

, then

P (B) = 0

and

(P - P \land L Q) (B) = 0

for any

L > 0

. If

Q (B) > 0

, it follows from the convexity of

x \mapsto {(x)}_{+}

and Jensen’s inequality that

\begin{matrix} (P - P \land L Q) (B) & = Q (B) (\frac{1}{Q (B)} \int 1_{B} (x) {(p (x) - L)}_{+} d Q (x)) \\ & \geq {(\int 1_{B} (x) (p (x) - L) d Q (x))}_{+} = P (B) - P (B) \land L Q (B), \end{matrix}

which implies that, for any

α \in A

,

∥ P_{α} - {(P \land L Q)}_{α} ∥_{1, X_{α}} \geq {∥ P_{α} - P_{α} \land L Q_{α} ∥}_{1, X_{α}},

and that the mapping

α \mapsto ∥ P_{α} - P_{α} \land L Q_{α} ∥

is monotone-increasing. Based on Proposition 1, we then find,

{∥ P - P \land L Q ∥}_{1, X} = sup_{α \in A} ∥ P_{α} - {(P \land L Q)}_{α} ∥_{1, X_{α}} \geq sup_{α \in A} {∥ P_{α} - P_{α} \land L Q_{α} ∥}_{1, X_{α}} .

Note that we have for every

α \in A

,

\begin{matrix} & | {∥ P - P \land L Q ∥}_{1, X} - {∥ P_{α} - P_{α} \land L Q_{α} ∥}_{1, X_{α}} | \\ & \leq {∥ (P - P \land L Q) - (P_{Q, α} - P_{Q, α} \land L Q) ∥}_{1, X} \\ & = \frac{1}{2} \int | {(p (x) - L)}_{+} - {(p_{α} (x) - L)}_{+} | d Q (x) \\ & = \frac{1}{2} \int | (p (x) - L) 1_{{p (x) > L}} - (p_{α} (x) - L) 1_{{p_{α} (x) > L}} | d Q (x) \\ & \leq \frac{1}{2} \int ( 1_{{p (x) > L, p_{α} (x) > L}} | p (x) - p_{α} (x) | + 1_{{p (x) > L, p_{α} (x) \leq L}} | p (x) - L | \\ & \quad + 1_{{p (x) \leq L, p_{α} (x) > L}} | p_{α} (x) - L | ) d Q (x) \\ & \leq \frac{1}{2} \int | p (x) - p_{α} (x) | d Q (x) = {∥ P - P_{Q, α} ∥}_{1, X} . \end{matrix}

An appeal to Lemma 1 then proves the assertion. □

2.1.3. Random Histogram Systems and Coherence

Regarding random elements

P \in M^{1} (X)

(e.g., random elements of a Bayesian statistical model), we can project P onto its random histograms, as formalized in the following proposition, which introduces the notion of coherence.

Proposition 2.

Let

X

,

A

and

M^{1} (X)

satisfy the minimal conditions and let Π denote a Borel probability distribution on

M^{1} (X)

describing a random element P. Then, for every

α \in A_{0}

,

P_{α} : = φ_{* α} (P) = (P (A_{1}), \dots, P (A_{| α |})) \sim Π_{α},

(8)

induces a random histogram

P_{α}

with probability distribution

Π_{α}

on

M^{1} (X_{α})

. If

α \leq β

, then

P_{α}

and

P_{β}

are coherent, i.e., the distribution of

P_{α}

follows from that of

P_{β}

through summation, as in Equation (1).

Proof.

By assumption, for every

α \in A

and every

A \in α

,

M^{1} (X) \to [0, 1] : P \mapsto P (A)

is Borel-measurable. Accordingly,

Π_{α} = Π \circ φ_{* α}^{- 1}

is a Borel probability distribution on

M^{1} (X_{α})

. Coherence (Equation (1)) is a consequence of Equation (4). □

Our main question may be paraphrased as the converse of the above proposition: suppose that we provide distributions

Π_{α}

for random histograms

P_{α}

for all

α \in A

. Under which conditions does a collection of (probability) histogram distributions define a random (probability) measure (uniquely)? According to Proposition 2, coherence is necessary.

Definition 4.

Let

X

,

A

and

M^{1} (X)

satisfy the minimal conditions. For every

α \in A

, let

Π_{α}

be a distribution for a random histogram

P_{α}

, as in Equation (8). Assume that the resulting system of random histograms has the following property: if

α \leq β

, then the distribution

Π_{α}

follows from

Π_{β}

through summation, as in Equation (3), i.e.,

Π_{α} = Π_{β} \circ φ_{* α β}^{- 1} .

(9)

Then, we refer to

(Π_{α}, φ_{* α β})

as a coherent (inverse) system of random histogram distributions. If there exists a unique Radon probability distribution Π on

M^{1} (X)

with projections

Π_{α}

for all

α \in A

, then Π is called its random histogram limit.
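Coherence in the sense of (9) can be probed by simulation for Dirichlet-distributed histograms, for which summing components again gives a Dirichlet distribution with summed parameters (the well-known aggregation property). The sketch below is a Monte Carlo moment check, not a proof; the parameter values, the two partitions and the function names are hypothetical.

```python
import random

def sample_dirichlet(params, rng):
    """Draw from Dirichlet(params) by normalizing independent Gamma variables."""
    g = [rng.gammavariate(a, 1.0) for a in params]
    s = sum(g)
    return [x / s for x in g]

def push_forward(blocks, v):
    """Sum the fine cells inside each coarse cell, as in (3)."""
    return [sum(v[j] for j in J) for J in blocks]

rng = random.Random(0)
beta_params = [1.0, 2.0, 3.0, 4.0]  # hypothetical Dirichlet parameters on beta
blocks = [(0, 1), (2, 3)]           # index sets J(i): alpha merges cells pairwise
alpha_params = push_forward(blocks, beta_params)

# Sample P_beta, project to alpha, and record the mass of the first alpha-cell.
n = 20000
first = [push_forward(blocks, sample_dirichlet(beta_params, rng))[0] for _ in range(n)]
emp_mean = sum(first) / n
emp_var = sum((x - emp_mean) ** 2 for x in first) / n

# Under Dirichlet(alpha_params), the first component is Beta(a, b) distributed.
a, b = alpha_params
exact_mean = a / (a + b)
exact_var = a * b / ((a + b) ** 2 * (a + b + 1))
```

Agreement of the first two moments is consistent with (9) for this pair of partitions; the empirical mean also approximates the mean of the first cell's mass under the histogram distribution on alpha.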

For later reference, we define mean measures for Borel probability distributions on

M^{1} (X)

.

Definition 5.

Let

X

,

A

and

M^{1} (X)

satisfy the minimal conditions. Consider

M^{1} (X)

with a Borel probability measure Π. The mean measure G under Π is defined pointwise,

G (A) = \int_{M^{1} (X)} P (A) d Π (P),

for every Borel set A in

X

. Its restrictions to the sub-σ-algebras

σ (α)

are denoted as

G_{α}

.

To see that G is a well-defined probability measure, note that the

σ

-additivity of G is guaranteed by monotone convergence. Also note that the restrictions

G_{α} : = G |_{σ (α)}

are mean measures for the distributions

Π_{α}

on

M^{1} (X_{α})

: for any

A \in α

,

G (A) = \int_{M^{1} (X)} P (A) d Π (P) = \int_{M^{1} (X)} P_{α} (A) d Π (P) = \int_{M^{1} (X_{α})} P_{α} (A) d Π_{α} (P_{α}) = G_{α} (A) .

Remark 1.

In the above, we abuse notation slightly: for any probability measure in

M^{1} (X_{α})

, the domain is

σ (X_{α})

rather than

σ (α) \subset B

. So when we mean to refer to

φ_{* α} (P) ({e_{i}})

, we shall often use the more natural notation

P_{α} (A_{i})

instead.

2.2. The Bourbaki–Prokhorov–Schwartz Theorem

The conditions we derive in subsequent sections are based on a theorem from [] (referred to as Prokhorov’s theorem in []), which says that the existence of a limiting positive Radon measure in inverse systems of positive measures is equivalent to a form of inner regularity that holds for all projections simultaneously. This leads to the characterization of those inverse systems

(Π_{α}, φ_{* α β})

that consistently define Radon probability measures

Π

on

M^{1} (X)

with various topologies.

To discuss the Bourbaki–Prokhorov–Schwartz Theorem, we first have to generalize somewhat: let

A

be a directed set and assume that

Y_{α}

,

α \in A

are Hausdorff topological spaces and that for any

α \leq β

, there exist continuous, surjective transition mappings

ψ_{α β} : Y_{β} \to Y_{α}

. Together, they form an inverse system of Hausdorff spaces (see [], Ch. I, § 4, No. 4; Ch. I, § 2, No. 3, Prop. 4), denoted as

(Y_{α}, ψ_{α β})

. If T denotes a Hausdorff topological space, a family of (projection) mappings

ψ_{α} : T \to Y_{α}

,

α \in A

is said to be coherent if for all

α \leq β

,

ψ_{α} = ψ_{α β} \circ ψ_{β}

, and it is said to be separating if for all

x, y \in T

,

x \neq y

, there exists an

α \in A

such that

ψ_{α} (x) \neq ψ_{α} (y)

.

Theorem 1

(Bourbaki–Prokhorov–Schwartz). Let

(Y_{α}, ψ_{α β})

be an inverse system of Hausdorff topological spaces indexed by

α \in I

, T a Hausdorff topological space and

ψ_{α} : T \to Y_{α}

a coherent and separating family of continuous mappings. Let

(μ_{α}, ψ_{α β})

be a coherent inverse system of positive measures on

(Y_{α}, ψ_{α β})

. There exists a bounded, positive Radon measure μ on T projecting to

μ_{α}

for all

α \in I

, if and only if, for every

ϵ > 0

, there is a compact

H \subset T

such that for all

α \in I

,

μ_{α} (Y_{α} ∖ ψ_{α} (H)) \leq ϵ .

(10)

When

(10)

holds, the measure μ is uniquely determined and

μ (L) = inf \{μ_{α} (ψ_{α} (L)) : α \in I\}

for every compact set L in T.

Proof.

See Theorem 1 of [], Ch. IX, § 4, No. 2. □

If all conditions of Theorem 1 are met, but the system of functions

ψ_{α}

is not separating, then a measure

μ

exists but may not be unique.

Bourbaki [] continues with an application to the existence of the Wiener measure and to Kolmogorov’s perspective, and with the definition of so-called promeasures (also commonly known as cylinder set measures), which can be compared with the coherent histogram systems of Definition 4: for a locally convex space E, [], Ch. IX, § 6 considers the collection of all linear subspaces V of finite co-dimension in E with continuous projections

p_{V} : E \to E / V

(and canonical

p_{V W} : E / W \to E / V

for

W \subset V

) to introduce

(E / V, p_{V W})

as the inverse system of finite-dimensional quotients. A coherent system of positive measures

Π_{V}

on the finite-dimensional spaces

E / V

,

(Π_{V}, p_{V W})

, is called a promeasure on E. It is noted that [], Ch. IX, § 6, No. 8–10 formulates a sufficient condition (Minlos’s Theorem, [], Ch. IX, § 6, No. 10, Theorem 2, based on []), but it appears difficult to apply unless E is a (barrelled) nuclear space.

In subsequent sections, we apply Theorem 1 directly to spaces of (bounded/signed/positive/probability) measures with various topologies, limiting the inverse system of finite-dimensional quotients and promeasures to inverse systems of partitions and random histograms. We prepare for the discussion by recording some properties of the situation where

X

,

A

and

M^{1} (X)

satisfy the minimal conditions and

I = A

,

T = M^{1} (X)

,

Y_{α} = M^{1} (X_{α})

and

ψ_{α β} = φ_{* α β}

, in the form of the following proposition.

Proposition 3.

Let

X

,

A

and

M^{1} (X)

satisfy the minimal conditions. For all

α \leq β

, the mappings

φ_{* α β} : M^{1} (X_{β}) \to M^{1} (X_{α})

are continuous and surjective, and

(M^{1} (X_{α}), φ_{* α β})

forms an inverse system of compact Hausdorff topological spaces, with a non-empty, compact, Hausdorff inverse limit N.

Proof.

Let

α \leq β

be given. For any

g \in C (X_{α})

, the mapping

g \circ φ_{α β} : X_{β} \to R

is an element of

C (X_{β})

. Because

φ_{α β}

is surjective, the induced mapping

φ_{α β}^{*} : C (X_{α}) \to C (X_{β})

is a bounded linear operator (with the norm equal to one). The transpose mapping

φ_{* α β} : M (X_{β}) \to M (X_{α})

is defined by,

{⟨ φ_{* α β} (μ_{β}), g ⟩}_{α} = {⟨ μ_{β}, φ_{α β}^{*} (g) ⟩}_{β} = {⟨ μ_{β}, g \circ φ_{α β} ⟩}_{β},

(11)

for all

μ_{β} \in M (X_{β})

and

g \in C (X_{α})

. The linear mapping

φ_{* α β}

is bounded (with the norm less than or equal to one) and surjective. Note that if we express

μ_{β}

as a vector

(μ_{β, 1}, \dots, μ_{β, | β |})

in

R^{| β |}

,

{⟨ μ_{β}, g \circ φ_{α β} ⟩}_{β} = \sum_{j \in I (β)} μ_{β, j} g (φ_{α β} (e_{j})) = \sum_{i \in I (α)} (\sum_{j \in J_{α β} (i)} μ_{β, j}) g (e_{i}) = \sum_{i \in I (α)} φ_{* α β} {(μ_{β})}_{i} g (e_{i}),

in accordance with (3). Finally, it is noted that inverse limits of non-empty, compact spaces are non-empty and compact (see [], Ch. I, § 9, No. 6, Prop. 8). □
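The identity in the last display can be checked by hand on a small example. The following Python sketch is only an illustration under assumed data: the four-cell partition β, the two-cell partition α, the list `cell_map` (which encodes the sets J_{α β}(i)) and the function g are all hypothetical choices, not objects from the text.

```python
from fractions import Fraction

def coarsen(mu_beta, cell_map, n_alpha):
    """Push a beta-histogram forward to the coarser partition alpha.

    cell_map[j] is the index i of the alpha-cell containing beta-cell j,
    so entry i of the result sums mu_beta[j] over j in J_{alpha beta}(i).
    """
    mu_alpha = [Fraction(0)] * n_alpha
    for j, mass in enumerate(mu_beta):
        mu_alpha[cell_map[j]] += mass
    return mu_alpha

def pair(mu, g):
    """Pairing <mu, g> = sum_i mu_i g(e_i) on a finite space."""
    return sum(m * v for m, v in zip(mu, g))

# Hypothetical example: beta has four cells, alpha merges them as {0, 1} and {2, 3}.
mu_beta = [Fraction(1, 8), Fraction(3, 8), Fraction(1, 4), Fraction(1, 4)]
cell_map = [0, 0, 1, 1]
g = [Fraction(2), Fraction(-1)]           # a function on X_alpha
g_pullback = [g[i] for i in cell_map]     # g composed with phi_{alpha beta}, on X_beta

mu_alpha = coarsen(mu_beta, cell_map, 2)
lhs = pair(mu_alpha, g)                   # <phi_{* alpha beta}(mu_beta), g>_alpha
rhs = pair(mu_beta, g_pullback)           # <mu_beta, g o phi_{alpha beta}>_beta
print(mu_alpha, lhs, rhs)
```

Exact rational arithmetic is used so that both pairings can be compared without rounding; total mass is preserved by the coarsening, so probability histograms map to probability histograms.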

The space N consists of finitely additive probability set functions on the

σ

-algebra generated by the partitions in

A

. Existence theorems for inverse limit probability measures on associated inverse limit spaces like N have been studied extensively: Bochner’s theorem [] and Choksi’s theorem [] give relatively mild sufficient conditions for the existence of a limiting probability measure

Π

on N for inverse systems of Radon probability spaces (see also [,,]). But, although

M^{1} (X)

(with the weak topology of Section 3) is homeomorphic to a subspace of N, it has proven difficult to formulate an additional condition to specify that

Π

is concentrated on the image of

M^{1} (X)

in N (see, however, [,,] and the correct proof of the Mean-measure condition in []). One of the strengths of the Bourbaki–Prokhorov–Schwartz theorem is that

T = M^{1} (X)

is projected directly onto the spaces

Y_{α} = M^{1} (X_{α})

, without detour via the inverse limit N. In this way, Theorem 1 avoids the (attractive but misleading) suggestion that a probability distribution

Π

on N is an easy way to get ‘close to’ the desired distribution on

M^{1} (X)

. By insisting only on continuous projections

T \to Y_{α}

, Theorem 1 focusses on inner regularity as the central issue. (Compare [], Theorem 21 and [], Ch. IX, § 4, No. 2, Theorem 1).

3. Random Histogram Limits with the Weak Topology

Again, let

X

,

A

and

M^{1} (X)

satisfy the minimal conditions, and fix the topology on

M^{1} (X)

to be the weak topology, defined as the subspace topology that

M^{1} (X)

inherits from

M_{b} (X)

with the weak topology. The compactness of a subset of

M^{1} (X)

is characterized by the Dunford–Pettis–Grothendieck theorem, as described in Appendix A.

3.1. Support and Approximation of Weak Histogram Limits

Before we apply Theorem 1 to define Radon probability measures on

M^{1} (X)

with the weak topology (and total-variational topology), let us consider some consequences, that is, necessary conditions for the existence of a random histogram limit. First, we characterize the support of Borel probability measures; next, we consider approximations of random P by random

P_{α}

.

3.1.1. Support and Domination

The following lemma is immediate but central enough to emphasize.

Lemma 3.

Let

X

,

A

and

M^{1} (X)

satisfy the minimal conditions. Consider

M^{1} (X)

with a Borel probability measure Π and let G be the mean measure under Π. For any Borel set A in

X

,

G (A) = 0

implies that

Π ({P \in M^{1} (X) : P (A) > 0}) = 0

.

Proof.

Let a Borel set A in

X

be given and assume that

G (A) = 0

. If the Borel set

B = {P \in M^{1} (X) : P (A) > 0}

in

M^{1} (X)

has probability

Π (B) > 0

, then, by

σ

-additivity, for some

ϵ > 0

, the Borel set

B^{'} = {P \in M^{1} (X) : P (A) > ϵ}

has probability

Π (B^{'}) > 0

. This would imply that

G (A) = \int P (A) d Π (P) \geq \int_{B^{'}} P (A) d Π (P) \geq ϵ Π (B^{'}) > 0

, contradicting the assumption. □
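The argument is a Markov-type inequality, G(A) ≥ ϵ Π({P : P(A) > ϵ}), and it can be traced on a finitely supported Π. The sketch below assumes a hypothetical Π that charges three probability histograms on four cells; the weights and histograms are illustrative only.

```python
from fractions import Fraction as F

# Hypothetical Pi: weights on three probability histograms over four cells.
weights = [F(1, 2), F(1, 3), F(1, 6)]
measures = [
    [F(1, 2), F(1, 2), F(0), F(0)],
    [F(1, 4), F(1, 4), F(1, 2), F(0)],
    [F(1), F(0), F(0), F(0)],
]

def mean_measure(weights, measures):
    """G(A) = integral of P(A) dPi(P), cell by cell."""
    n = len(measures[0])
    return [sum(w * p[i] for w, p in zip(weights, measures)) for i in range(n)]

def pi_mass_above(cell, eps):
    """Pi({P : P(cell) > eps})."""
    return sum(w for w, p in zip(weights, measures) if p[cell] > eps)

G = mean_measure(weights, measures)
null_cells = [i for i, g in enumerate(G) if g == 0]
print(G, null_cells, [pi_mass_above(i, F(0)) for i in null_cells])
```

In this example the last cell is the only G-null cell, and no histogram charged by Π gives it positive mass, as the lemma requires.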

Domination by the mean measure plays a role in the following proposition concerning the support of weakly-Borel probability measures on

M^{1} (X)

.

Proposition 4.

Let

X

,

A

and

M^{1} (X)

satisfy the minimal conditions. Consider

M^{1} (X)

with the weak topology and a Borel probability distribution Π. Let G be the mean measure under Π. Then,

{P \in M^{1} (X) : P ≪ G}

is closed in

M^{1} (X)

and,

{supp}_{T_{1}} (Π) \subset {P \in M^{1} (X) : P ≪ G} .

Moreover, if

P \in M^{1} (X)

is such that for all measurable partitions

α \in A

,

P_{α}

lies in the support of

Π_{α}

in

M^{1} (X_{α})

, then P lies in the weak support of Π.

Proof.

If

P \in M^{1} (X)

is not dominated by G, then there exists a Borel set A such that

P (A) > 0 = G (A)

. Consequently, for small enough

ϵ^{'} > 0

, the weakly open neighborhood

U = {Q \in M^{1} (X) : | Q (A) - P (A) | < ϵ^{'}}

does not meet

{Q \in M^{1} (X) : Q ≪ G}

, so

{Q \in M^{1} (X) : Q ≪ G}

is weakly closed. According to Lemma 3,

Π (U) \leq Π ({Q \in M^{1} (X) : Q (A) > 0}) = 0

, so U receives

Π

-mass zero, implying that

P \notin {supp}_{T_{1}} (Π)

.

Regarding the last assertion, it is noted that since

M^{1} (X)

with the weak topology is homeomorphic to a subspace of the inverse limit N of Proposition 3, the collection of sets

{φ_{* α}^{- 1} (V) : α \in A, V \in U_{α}}

(where

U_{α}

is any topological basis for

M^{1} (X_{α})

) in

M^{1} (X)

forms a basis for the weak topology. Consequently, for any weak neighborhood U of

P \in M^{1} (X)

, there exists an

α \in A

and a

V \in U_{α}

such that

φ_{* α}^{- 1} (V) \subset U

, and,

Π (U) \geq Π (φ_{* α}^{- 1} (V)) = Π_{α} (V) > 0,

by assumption. □

Recall the open problem regarding the construction of random elements from specific dominated families, as posed in [] (p. 42):

Indeed, it appears to be an open problem to find simple sufficient conditions, analogous to Corollary 9.3.VI, for the realizations of a random measure to be [almost-surely] absolutely continuous with respect to a given measure.

(Ref. [] (Corollary 9.3.VI) formulates a condition for a system of random histograms with a limit to be almost-surely non-atomic.) Proposition 4 says that, given some probability measure G, we should look for coherent systems of random histograms with the projections $G_{α}$ as their mean histograms and a limit $Π$ that is a weakly-Borel probability measure on $M^{1} (X)$. In Section 3.2, we provide a relatively simple necessary and sufficient condition for a coherent random histogram system to have a unique weakly-Radon random histogram limit $Π$.

3.1.2. Approximation by Weakly Convergent Histograms

Next, we consider the way in which histogram systems with a weak limit approximate (random) probabilities

P (A)

. Let

Π

be a Radon probability measure on

M^{1} (X)

with the weak topology, with mean measure G. For every

η > 0

and every

α \in A

, let

C_{G, α} (η)

denote the collection of all Borel sets B in

X

that are approximated by elements of the

σ

-algebra

σ (α)

to within G-measure

η

:

C_{G, α} (η) = \{B \in B : inf {G ((B ∖ C) \cup (C ∖ B)) : C \in σ (α)} < η\} .

Note that for any

η > 0

and any B, there exists an

α

such that

B \in C_{G, α} (η)

(see Theorem 4.4 in []).
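As a concrete illustration of the collections C_{G, α}(η), the following Python sketch computes the infimum in the definition exactly in one simple case. The setting is an assumption made for the example only: X = [0, 1), G is Lebesgue measure, α_n is the partition into 2^n dyadic cells, and B = [0, t). Since each cell is either included in C or not, the infimum is a sum of cell-wise minima.

```python
from fractions import Fraction as F

def best_approximation_error(t, n):
    """inf over C in sigma(alpha_n) of G(B symmetric-difference C), for
    B = [0, t), G = Lebesgue measure on [0, 1) and alpha_n the partition
    into 2**n dyadic cells [k / 2**n, (k + 1) / 2**n).
    """
    width = F(1, 2 ** n)
    error = F(0)
    for k in range(2 ** n):
        lo, hi = k * width, (k + 1) * width
        inside = max(F(0), min(t, hi) - lo)   # G-mass of the part of B in this cell
        # Including the cell in C costs width - inside; excluding it costs inside.
        error += min(inside, width - inside)
    return error

t = F(1, 3)
errors = [best_approximation_error(t, n) for n in range(1, 8)]
print(errors)
```

In this example the error at level n is at most 2^{-(n+1)}, so B lies in C_{G, α_n}(η) as soon as 2^{-(n+1)} < η.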

Proposition 5.

Let

X

,

A

and

M^{1} (X)

satisfy the minimal conditions. If Π is a weakly-Radon probability measure on

M^{1} (X)

with mean measure G, then, for every

δ, ϵ > 0

, there exists a partition

α \in A

and an

η > 0

, such that for all

B \in C_{G, α} (η)

,

Π (\{P \in M^{1} (X) : inf {P ((B ∖ C) \cup (C ∖ B)) : C \in σ (α)} > δ\}) < ϵ .

Proof.

Let

ϵ > 0

be given. By inner regularity, there exists a weakly compact H in

M^{1} (X)

such that

Π (H) > 1 - ϵ

. For every

δ > 0

, there exists an

η > 0

, such that for all Borel sets A in

X

,

G (A) < η

implies that

P (A) < δ

for all

P \in H

, cf. (A1). In particular, if

B \in C_{G, α} (η)

, then, for some

C \in σ (α)

,

G ((B ∖ C) \cup (C ∖ B)) < η

, implying that for all

P \in H

,

P ((B ∖ C) \cup (C ∖ B)) < δ

. □

This observation is important from a computational perspective: the practitioner chooses an approximating partition

α

to perform computations with histograms and would like to be able to control the accuracy of their approximations for the P in terms of their restrictions to

σ (α)

. They have control over the probability measures

Π_{α}

and, as a result, control over the mean measures

G_{α}

. Accordingly, they can choose a level of refinement (expressed by a choice for some partition

α

), making approximations in the G-measure by the

α

-histogram. The Radon property ensures that the approximation in the G-measure carries over to approximation in the P-measure, uniformly in P, with arbitrarily high

Π

-probability, depending on the degree of approximation in the level

α

that is chosen for actual computations. Such a guarantee concerning degrees of approximation is not automatic if

Π

is a Radon measure for the tight topology of Section 4.

3.2. Existence of Weak Histogram Limits

Let

A

denote a set of finite Borel-measurable partitions of

X

, directed for ordering by refinement. If we equip

T = M^{1} (X)

with the weak topology, Theorem 1 takes the following form.

Theorem 2.

Let

X

,

A

and

M^{1} (X)

satisfy the minimal conditions. Assume that

A

resolves

X

and consider

M^{1} (X)

with the weak topology. Let

(Π_{α}, φ_{* α β})

be a coherent system of Borel probability measures on the inverse system

(M^{1} (X_{α}), φ_{* α β})

. There exists a unique weakly-Radon probability measure Π on

M^{1} (X)

projecting to

Π_{α}

for all

α \in A

, if and only if, there is a

Q \in M^{1} (X)

such that, for every

ϵ, δ > 0

there is a

L > 0

such that,

Π_{α} ({P_{α} \in M^{1} (X_{α}) : ∥ P_{α} - P_{α} \land L Q_{α} ∥_{1, X_{α}} > δ}) < ϵ,

(12)

for all

α \in A

.

Proof.

According to Proposition 3,

(M^{1} (X_{α}), φ_{* α β})

forms an inverse system of Hausdorff topological spaces. For all

α \in A

,

φ_{* α} : M^{1} (X) \to M^{1} (X_{α})

is continuous with respect to the weak topology. If

P, Q \in M^{1} (X)

and

P \neq Q

, then there exists a set B in the

σ

-algebra generated by the

σ (α)

such that

P (B) \neq Q (B)

, which cannot be the case unless, for some

α \in A

, the histogram projections

φ_{* α} (P)

and

φ_{* α} (Q)

differ. Combining this with Equation (4), we can conclude that

(φ_{* α}, φ_{* α β})

forms a coherent and separating family of continuous mappings

M^{1} (X) \to M^{1} (X_{α})

.

The assertion now follows from Theorem 1 if we can show that condition (10) holds. To that end, let

ϵ > 0

be given and define

ϵ_{n} = 2^{- n} ϵ

. Given some monotone decreasing sequence

(δ_{n})

such that

δ_{n} > 0

,

δ_{n} \to 0

, let

L_{n}

be positive constants such that,

Π_{α} (\{P_{α} \in M^{1} (X_{α}) : {∥ P_{α} - P_{α} \land L_{n} Q_{α} ∥}_{1, X_{α}} > δ_{n}\}) < ϵ_{n},

for every

α \in A

. Define,

H = ⋂_{n \geq 1, α \in A} \{P \in M^{1} (X) : {∥ P_{α} - P_{α} \land L_{n} Q_{α} ∥}_{1, X_{α}} \leq δ_{n}\} .

Let

δ > 0

be given, choose

n \geq 1

such that

δ_{n} < δ

and define

L = L_{n}

. Since

A

resolves

X

, Proposition 1 and Lemma 2 say that

{∥ P - P \land L Q ∥}_{1, X} = sup {∥ P_{α} - P_{α} \land L Q_{α} ∥_{1, X_{α}} : α \in A}

for all P and, hence,

{sup {∥ P - P \land L Q ∥}_{1, X} : P \in H} = sup {∥ P_{α} - P_{α} \land L Q_{α} ∥_{1, X_{α}} : P \in H, α \in A} \leq δ .

Conclude that H is relatively compact with respect to the weak topology, cf. the Dunford–Pettis–Grothendieck condition. For the compact closure

\bar{H}

of H in

M^{1} (X)

and any

α

, we have (by monotony of

α \mapsto ∥ P_{α} - P_{α} \land L Q_{α} ∥_{1, X_{α}}

for any

L > 0

),

\begin{matrix} Π_{α} (M^{1} (X_{α}) ∖ φ_{* α} (\bar{H})) & \leq Π_{α} (M^{1} (X_{α}) ∖ φ_{* α} (H)) \\ & \leq Π_{α} (M^{1} (X_{α}) ∖ ⋂_{n \geq 1} φ_{* α} (\{P \in M^{1} (X) : {∥ P_{α} - P_{α} \land L_{n} Q_{α} ∥}_{1, X_{α}} \leq δ_{n}\})) \\ & = Π_{α} (⋃_{n \geq 1} \{P_{α} \in M^{1} (X_{α}) : {∥ P_{α} - P_{α} \land L_{n} Q_{α} ∥}_{1, X_{α}} > δ_{n}\}) \\ & \leq \sum_{n \geq 1} Π_{α} (\{P_{α} \in M^{1} (X_{α}) : {∥ P_{α} - P_{α} \land L_{n} Q_{α} ∥}_{1, X_{α}} > δ_{n}\}) < ϵ, \end{matrix}

which shows that condition (10) of Theorem 1 is satisfied. Conclude that there exists a unique Radon probability measure

Π

on

M^{1} (X)

that projects to

Π_{α}

for all

α \in A

.

Conversely, let

Π

be a weakly-Radon probability measure on

M^{1} (X)

. According to Proposition 4, the weak support of

Π

is dominated by the mean measure G. Again appealing to Lemma 2, we see that for every

δ > 0

, all

L > 0

and all

α \in A

,

\begin{matrix} Π (\{P \in M^{1} (X) : {∥ P - P \land L G ∥}_{1, X} > δ\}) & = Π (\{P \in M^{1} (X) : sup_{β \in A} {∥ P_{β} - P_{β} \land L G_{β} ∥}_{1, X_{β}} > δ\}) \\ & \geq Π (\{P \in M^{1} (X) : {∥ P_{α} - P_{α} \land L G_{α} ∥}_{1, X_{α}} > δ\}) \\ & = Π_{α} (\{P_{α} \in M^{1} (X_{α}) : {∥ P_{α} - P_{α} \land L G_{α} ∥}_{1, X_{α}} > δ\}) . \end{matrix}

Since

Π

is weakly-Radon, for every

δ, ϵ > 0

, there exists a constant

L > 0

such that,

Π (\{P \in M^{1} (X) : {∥ P - P \land L G ∥}_{1, X} > δ\}) < ϵ,

verifying that condition (12) holds. □
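Both directions of the proof rely on the fact that the excess mass of P_α over L Q_α can only grow under refinement, with supremum equal to the excess of P over L Q. A small Python sketch of this behaviour, under assumptions chosen for the example only (X = [0, 1), Q is Lebesgue measure, P has density 2x, and α_n are the dyadic partitions), reads as follows. On a finite space the norm of P_α − P_α ∧ L Q_α is the sum of the positive parts of P_α(A) − L Q_α(A) over the cells A.

```python
from fractions import Fraction as F

def excess(n, L):
    """Sum over dyadic cells A of level n of (P(A) - L Q(A))_+, with
    P([a, b)) = b**2 - a**2 (density 2x) and Q Lebesgue measure on [0, 1)."""
    w = F(1, 2 ** n)
    total = F(0)
    for k in range(2 ** n):
        a, b = k * w, (k + 1) * w
        total += max(F(0), (b * b - a * a) - L * w)
    return total

L = F(6, 5)
levels = [excess(n, L) for n in range(7)]
limit = (1 - L / 2) ** 2      # integral of (2x - L)_+ over [0, 1), valid for L <= 2
print(levels, limit)
```

The values are non-decreasing in n and bounded by the limiting value 4/25, which is the mass by which P exceeds L Q in this example.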

At first sight, condition (12) may appear technical and inaccessible. It is noted, however, that in practice one has considerable control: one may choose

A

(large enough to resolve

X

but otherwise) as small as possible and a histogram system

(Π_{α} : α \in A)

as simple as possible in order to enable the verification of condition (12) in a manageable form. Moreover, all subsequent calculations involve only probability distributions on finite-dimensional simplices, enhancing feasibility greatly.

Regarding the measure Q, we simplify by appeal to a necessary condition: it is clear that if Theorem 2 holds, then the support of the probability measure

Π

is dominated by the mean measure G, cf. Proposition 4. So, when looking for a candidate-dominating measure Q to verify condition (12), we can turn to the mean measures

G_{α}

of the histogram distributions

Π_{α}

: if we show either that the

G_{α}

are the histograms associated with a mean measure G (the Mean-measure condition []) or that condition (16) below holds (cf. Definition 5), then G can play the role of Q.

Based on those two remarks, condition (12) can be rewritten as follows:

For some

G \in M^{1} (X)

,

α

-histograms

φ_{* α} (G)

equal the mean measures

G_{α}

for all

α \in A

and, for every

ϵ, δ > 0

there is a

L > 0

such that,

Π_{α} (\{P_{α} \in M^{1} (X_{α}) : \sum_{A \in α} {(P_{α} (A) - L G_{α} (A))}_{+} > δ\}) < ϵ,

(13)

for all

α \in A

.

This form of the condition forms the starting point for the examples of Section 7 and Section 8.
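Since (13) involves only distributions on finite-dimensional simplices, its left-hand side can be estimated by simulation. The sketch below is a rough Monte Carlo illustration under assumptions that are not taken from the text: X = [0, 1) with dyadic partitions, a uniform mean histogram G_α, and Π_α a Dirichlet distribution with parameters G_α(A) (total mass one). The function names, sample size and seed are hypothetical.

```python
import random

def dirichlet(params, rng):
    """Sample a Dirichlet vector by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]

def excess_mass(hist, mean_hist, L):
    """Sum over cells A of (P_alpha(A) - L * G_alpha(A))_+ ."""
    return sum(max(0.0, p - L * g) for p, g in zip(hist, mean_hist))

def exceedance_probability(level, L, delta, samples, rng, total_mass=1.0):
    """Monte Carlo estimate of the left-hand side of (13) at a dyadic level."""
    cells = 2 ** level
    mean_hist = [1.0 / cells] * cells               # G_alpha: uniform mean histogram
    params = [total_mass * g for g in mean_hist]    # Dirichlet parameters (assumed)
    hits = sum(
        excess_mass(dirichlet(params, rng), mean_hist, L) > delta
        for _ in range(samples)
    )
    return hits / samples

rng = random.Random(1)
estimates = [exceedance_probability(n, L=4.0, delta=0.5, samples=200, rng=rng)
             for n in (0, 2, 6, 10)]
print(estimates)
```

For this particular choice the estimate is zero at coarse levels and close to one at fine levels for fixed L, so the bound cannot hold uniformly in α; this is consistent with the well-known almost-sure discreteness of Dirichlet process realizations. Other parameter choices may behave differently.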

3.3. Existence of Total-Variational Histogram Limits

Let

X

be a Hausdorff topological space and let

P

be a subset of

M^{1} (X)

dominated by a probability measure

Q \in M^{1} (X)

. In this subsection, we distinguish

P

from

P_{b}

, represented by the same set, denoted as

P

when equipped with the weak topology and

P_{b}

when equipped with the total-variational topology. The identity mapping

i : P_{b} \to P

is a continuous bijection. Write

B (P)

and

B (P_{b})

for the associated Borel

σ

-algebras.

Proposition 6.

If

X

is separable and

P

is a dominated subset of

M^{1} (X)

, then

P_{b}

is separable and

B (P) = B (P_{b})

.

Proof.

Let Q denote the probability measure that dominates

P

. By the separability of

X

, the Banach space

L^{1} (X, B, Q)

of Q-integrable, real-valued functions on

X

, is separable, and so is its subspace of Radon–Nikodym densities

P_{b}^{'} = {d P / d Q : P \in P}

. Since

P_{b}

and

P_{b}^{'}

are homeomorphic,

P_{b}

is separable too. It can then be shown [,] that the total-variational norm is measurable with respect to the minimal

σ

-algebra for the measurability of the mappings

P \mapsto P (A)

,

A \in B

, which is contained in

B (P)

. Accordingly,

B (P_{b}) \subset B (P)

. Since the total-variational topology refines the weak topology, also,

B (P) \subset B (P_{b})

. □

As a consequence, any Borel probability measure

Π

on

P

(viewed as a

σ

-additive set-function with

B (P)

as its domain) gives rise to a Borel probability measure

Π_{b}

on

P_{b}

(viewed as a

σ

-additive set function with

B (P_{b})

as its domain).

Corollary 1.

Let

X

be separable and, together with

A

and

M^{1} (X)

, satisfy the minimal conditions. Consider

M^{1} (X)

with the total-variational topology. Assume that

A

resolves

X

. Let

(Π_{α}, φ_{* α β})

be a coherent system of Borel probability measures on the inverse system

(M^{1} (X_{α}), φ_{* α β})

. If condition (12) holds, there exists a unique

T_{T V}

-Radon probability measure Π on

M^{1} (X)

projecting to

Π_{α}

for all

α \in A

.

Proof.

Under the stated conditions, Theorem 2 asserts the existence of a Radon probability measure

Π

on

M^{1} (X)

with the weak topology and Proposition 4 guarantees that

P = {supp}_{T_{1}} (Π)

is dominated by the mean measure G. Because

X

is separable,

B (P) = B (P_{b})

, so

Π

is a Borel probability measure on

P_{b}

. Uniqueness follows from the uniqueness of the weak histogram limit. By the Radon–Nikodym theorem,

P_{b}

is homeomorphic (isometrically, even, cf. (7)) to a closed subset of the Polish space

L^{1} (X, B, G)

and, therefore, a Radon space, so that

Π

is a Radon measure. □

So, remarkably, the existence of a total-variational random histogram limit does not impose stricter conditions than the existence of a weak random histogram limit; moreover, not even inner regularity is lost in the transition from

P

to

P_{b}

. From the perspective of Theorem 5, this amplification is inconsequential, but events and statements involving the total-variational norm are very common and the measurability of total-variational balls is crucial for many applications (for example, in large-sample limits of posterior distributions on metric spaces in non-parametric statistics [,]).

4. Random Histogram Limits with the Tight Topology

The existence question of a limit for coherent random histogram systems has been studied extensively with the tight topology for

M^{1} (X)

: a rich body of literature has grown from Kingman’s original work on completely random measures [], with an emphasis on limits with almost-surely purely atomic realizations [,]. Here, we revisit the existence problem without restricting to point processes and derive a necessary and sufficient condition in Section 4.1 based on Theorem 1. In Section 4.2 we consider the support of tight random histogram limits.

4.1. Existence of Tight Histogram Limits

Let

X

be a Polish space with topology

T

. We are interested in the construction of Radon probability measures on

M^{1} (X)

with the tight topology. In comparison with the construction of Section 3.2, the assertion is weaker since the weak topology refines the tight topology. Accordingly, compactness as in condition (10) constitutes a less stringent restriction, while the continuity requirement of histogram projections becomes harder to satisfy.

Indeed, when one tries to reproduce the initial steps in the proof of Section 3.2 with the tight topology, a disappointment awaits: if

X = R^{d}

with the standard topology, for example, then for any partition

α

of

X

into two or more subsets, the projection mapping

φ_{* α} : M^{1} (X) \to M^{1} (X_{α})

of Equation (2) is not continuous: superficially, it appears that Theorem 1 cannot be applied.
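The failure of continuity is visible already for point masses. The following sketch assumes X = R, the partition into A_1 = (−∞, 0] and A_2 = (0, ∞), and the sequence of Dirac measures at 1/n; these choices are illustrative. Integrals of a bounded continuous function against the Dirac measure at 1/n converge to its value at 0 (here checked for one test function, the arctangent), while the histograms do not converge to the histogram of the limit.

```python
import math

def histogram_of_dirac(x):
    """(P(A_1), P(A_2)) for P = delta_x, A_1 = (-inf, 0], A_2 = (0, inf)."""
    return (1.0, 0.0) if x <= 0 else (0.0, 1.0)

f = math.atan                                   # a bounded continuous test function
points = [1.0 / n for n in range(1, 10001)]     # x_n = 1/n tends to 0

# Integrating f against delta_x gives f(x); the gap to f(0) shrinks with n.
gaps = [abs(f(x) - f(0.0)) for x in points]

hists = {histogram_of_dirac(x) for x in points}  # histograms along the sequence
limit_hist = histogram_of_dirac(0.0)             # histogram of the tight limit
print(gaps[-1], hists, limit_hist)
```

Every measure in the sequence has histogram (0, 1), whereas the limit has histogram (1, 0), so the projection onto this two-cell partition is not continuous for the tight topology.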

In order to correct this, we refine to a zero-dimensional version

\hat{X}

of

X

, rendering projection mappings continuous for a collection of partitions that is large enough to be separating and resolving. While this leaves the Borel

σ

-algebra unchanged, the transition to

\hat{X}

does complicate the nature of tight compactness in

M^{1} (\hat{X})

. A counterexample at the end of this subsection shows that this complication corresponds directly to the precise way in which a coherent system of random histograms can fail to have a tight limit.

4.1.1. Zero-Dimensional Refinements of Polish Spaces

With a countable basis

U

for the topology

T

, define the topological sub-basis,

S = {U, X ∖ U : U \in U},

(14)

for a topology

\hat{T}

on the set

X

in which each basis element

U \in U

is clopen; denote the resulting topological space by

\hat{X}

.

Proposition 7.

The space

\hat{X}

is zero-dimensional and the identity mapping

i : \hat{X} \to X

is continuous. If

U_{1}

and

U_{2}

are two bases for

X

, the corresponding spaces

{\hat{X}}_{1}

and

{\hat{X}}_{2}

are homeomorphic. If

X

is Polish, then

\hat{X}

is also Polish.

Proof.

The sub-basis

S

gives rise to a basis consisting of clopen sets, so

\hat{X}

is zero-dimensional, and the identity i is continuous because

\hat{T}

refines

T

. Because any

U_{2} \in U_{2}

contains a

U_{1}

from the basis

U_{1}

and vice versa, the identity mapping on

X

is continuous from

{\hat{X}}_{1}

to

{\hat{X}}_{2}

and also from

{\hat{X}}_{2}

to

{\hat{X}}_{1}

. Assuming that

X

is Polish, the countable product space

X^{N} = \prod_{n \geq 1} X

is Polish (see [] (Proposition 3.3)) and has a diagonal

Δ = {(x, x, \dots) \in \prod_{n \geq 1} X : x \in X}

that is a closed subspace, homeomorphic to

X

. Enumerate the basis sets in

U = {U_{i} : i \geq 1}

and define

Y_{n}

to be the refinement of

X

with

U_{1}, \dots, U_{n}

made clopen (e.g.,

Y_{1}

is the topological sum of

U_{1}

and

X ∖ U_{1}

, etc.). The canonical set-theoretic identification

i_{n} : Y_{n} \to X

is continuous. The spaces

Y_{n}

are all Polish as the topological sums of

U_{n}

and

X ∖ U_{n}

(which are Polish). The product space

\prod_{n \geq 1} Y_{n}

is Polish and the map

j : \prod_{n \geq 1} Y_{n} \to X^{N}

is a continuous bijection. Then,

j^{- 1} (Δ)

is Polish and homeomorphic to

\hat{X}

. □

Proposition 8.

Let

X

be a Polish space. The Borel sets on

X

and

\hat{X}

are equal and any set function μ is a (bounded/signed/positive/probability) Borel measure on

X

if and only if μ is a (bounded/signed/positive/probability) Borel measure on

\hat{X}

.

Proof.

Note that the Borel

σ

-algebra on

X

generated by the basis

U

is identical to the

σ

-algebra generated by

U

and its complements, which form the sub-basis for

\hat{X}

. Conclude that

X

and

\hat{X}

have the same Borel sets. Boundedness, signedness or positivity, being a probability measure and countable additivity, are then identical as properties of set functions

μ

on the Borel

σ

-algebra. □

Proposition 8 implies the existence of a bijective mapping

i_{*}

with the following properties.

Proposition 9.

The mapping

i_{*} : M_{b} (\hat{X}) \to M_{b} (X)

is a continuous bijection. If

X

is Polish,

M^{1} (X)

and

M^{1} (\hat{X})

are Polish and

i_{*}^{- 1} : M^{1} (X) \to M^{1} (\hat{X})

is Borel-measurable.

Proof.

Any bounded and continuous

f : X \to R

is also bounded and continuous when viewed as

f : \hat{X} \to R

, so there exists a linear, injective mapping

j_{*} : C_{b} (X) \to C_{b} (\hat{X})

of norm one. Its transpose is a bounded, injective, linear mapping

i_{*} = j_{*}^{t} : M_{b} (\hat{X}) \to M_{b} (X)

of norm one (see [], Ch. II, § 6, No. 4, Proposition 5 and [], Ch. IV, § 1, No. 3, Proposition 8). As noted earlier, if

X

is a Polish space, then so is

M^{1} (X)

(ref. [], Ch. IX, § 5, No. 4, Proposition 10), so, based on Proposition 7, both

M^{1} (X)

and

M^{1} (\hat{X})

are Polish spaces. According to Theorems 12.4 and 14.12 (Souslin’s theorem) in [], if

X, Y

are standard Borel spaces and

f : X \to Y

is a Borel-measurable injection, then its inverse on

f (X)

is also Borel-measurable. Applied to

i_{*}

, this proves the last assertion. □

For all

α \in A

, define the mappings

{\hat{φ}}_{* α} : M_{b} (\hat{X}) \to M (X_{α})

,

{\hat{φ}}_{* α} (μ) = (μ (A_{1}), \dots, μ (A_{| α |})),

(15)

that take any bounded, signed Radon measure

μ

on

\hat{X}

into its

α

-histogram.

Proposition 10.

Let α be a partition of

X

generated by a basis

U

and let

\hat{X}

denote the associated zero-dimensional version of

X

. The mapping

{\hat{φ}}_{* α} : M_{b} (\hat{X}) \to M (X_{α})

is continuous for the tight topology.

Proof.

For any partition

α

generated by the basis

U

(cf. Definition 3), any

A \in α

is clopen in

\hat{X}

. For any clopen A,

1_{A}

is a bounded, continuous function on

\hat{X}

. Therefore,

M^{1} (\hat{X}) \to R : P \mapsto P (A)

is continuous with respect to the tight topology and so is

{\hat{φ}}_{* α}

. □

Compactness in

\hat{X}

has a more stringent meaning than in the original space

X

: indeed, according to Brouwer’s theorem, any compact

\hat{K} \subset \hat{X}

is a union of a subspace homeomorphic to the Cantor set with isolated points. For example, with

X = R

in its standard topology,

[0, 1]

is not compact in the space

\hat{X}

.

4.1.2. Tight Histogram Limits with Zero-Dimensional Compacta

Tight compactness in

M_{b} (X)

is characterized in Prokhorov’s theorem (see Appendix A), which says roughly that a norm-bounded set H of measures is relatively tightly compact if inner regularity is a property that holds uniformly in H.

We are now in a position to apply Theorem 1 to

M^{1} (X)

(or rather,

M^{1} (\hat{X})

) with the tight topology. (Note: mention of the Radon property in the statement of Theorem 3 is accurate but strictly speaking redundant since

M^{1} (X)

is a Radon space.)

Theorem 3.

Let

X

be a Polish space with a directed set

A

of partitions that resolves

X

, generated by a basis that gives rise to a zero-dimensional

\hat{X}

. Consider

M^{1} (X)

with the tight topology. Let

(Π_{α}, φ_{* α β})

be a coherent system of Borel probability measures on the inverse system

(M^{1} (X_{α}), φ_{* α β})

. There exists a unique Radon probability measure Π on

M^{1} (X)

projecting to

Π_{α}

for all

α \in A

, if and only if, for every

ϵ, δ > 0

, there is a compact

\hat{K} \subset \hat{X}

such that,

Π_{α} ({P_{α} \in M^{1} (X_{α}) : P_{α} (φ_{α} (\hat{K})) < 1 - δ}) < ϵ,

(16)

for all

α \in A

.

Proof.

Under the condition of the theorem,

\hat{X}

,

A

and

M^{1} (\hat{X})

satisfy the minimal conditions. Like before (see Proposition 3),

(M^{1} (X_{α}), φ_{* α β})

forms an inverse system of compact Hausdorff topological spaces. As in the proof of Theorem 2,

({\hat{φ}}_{* α}, φ_{* α β})

is a coherent and separating family of mappings on

M^{1} (\hat{X})

and, from Proposition 10, we can conclude that the

{\hat{φ}}_{* α}

are also continuous.

To show that condition (10) holds, let

ϵ > 0

be given and define

ϵ_{n} = 2^{- n} ϵ

. Given some decreasing sequence

(δ_{n})

such that

δ_{n} > 0

,

δ_{n} \to 0

, let

{\hat{K}}_{n}

be compact subsets of

\hat{X}

such that,

Π_{α} ({P_{α} \in M^{1} (X_{α}) : P_{α} (φ_{α} ({\hat{K}}_{n})) < 1 - δ_{n}}) < ϵ_{n},

for every

α \in A

and every

n \geq 1

. Define,

H = ⋂_{n \geq 1, α \in A} \{P \in M^{1} (X) : P_{α} (φ_{α} ({\hat{K}}_{n})) \geq 1 - δ_{n}\} .

Let

δ > 0

be given, choose

n \geq 1

such that

δ_{n} < δ

and choose

\hat{L} = {\hat{K}}_{n}

. Since the Borel sets

L_{α} = (φ_{α}^{- 1} \circ φ_{α}) (\hat{L})

decrease as the level of refinement of the partition

α

increases, and since

A

resolves

X

,

\hat{L} = \cap_{α \in A} L_{α}

, and

P (\hat{L}) = inf {P_{α} (φ_{α} (\hat{L})) : α \in A}

by monotony. Conclude that H is relatively compact with respect to the tight topology, according to (A2). For the compact closure

\bar{H}

of H in

M^{1} (\hat{X})

and any

α

, we have (by monotony of

α \mapsto P_{α} (φ_{α} (K))

for any K),

\begin{matrix} Π_{α} (M^{1} (X_{α}) ∖ {\hat{φ}}_{* α} (\bar{H})) & \leq Π_{α} (M^{1} (X_{α}) ∖ {\hat{φ}}_{* α} (H)) \\ & \leq Π_{α} (M^{1} (X_{α}) ∖ ⋂_{n \geq 1} {\hat{φ}}_{* α} (\{P \in M^{1} (X) : P_{α} (φ_{α} ({\hat{K}}_{n})) \geq 1 - δ_{n}\})) \\ & \leq Π_{α} (⋃_{n \geq 1} \{P_{α} \in M^{1} (X_{α}) : P_{α} (φ_{α} ({\hat{K}}_{n})) < 1 - δ_{n}\}) \\ & \leq \sum_{n \geq 1} Π_{α} (\{P_{α} \in M^{1} (X_{α}) : P_{α} (φ_{α} ({\hat{K}}_{n})) < 1 - δ_{n}\}) < ϵ, \end{matrix}

which shows that condition (10) of Theorem 1 is satisfied. Conclude that there exists a unique Radon probability measure

\hat{Π}

on

M^{1} (\hat{X})

that projects to

Π_{α}

for all

α \in A

. The continuous mapping

i_{*} : M^{1} (\hat{X}) \to M^{1} (X)

of Proposition 9 serves to define

Π = \hat{Π} \circ i_{*}^{- 1}

, a Radon probability measure on

M^{1} (X)

, and

Π

still projects to

Π_{α}

for all

α \in A

.

Conversely, since

X

is Polish, cf. Proposition 9,

\hat{X}

,

M^{1} (\hat{X})

and

M^{1} (X)

are Polish spaces and the mapping

i_{*}^{- 1} : M^{1} (X) \to M^{1} (\hat{X})

is Borel-measurable. Therefore, the mapping

\hat{Π} = Π \circ i_{*}

defines a Borel probability measure on

M^{1} (\hat{X})

, which is Radon because

M^{1} (\hat{X})

is a Radon space. So, according to Prokhorov’s theorem, for every

δ, ϵ > 0

, there exists a compact

\hat{K}

in

\hat{X}

such that,

Π ({P \in M^{1} (\hat{X}) : P (\hat{K}) < 1 - δ}) < ϵ .

With

{\hat{K}}_{α} = (φ_{α}^{- 1} \circ φ_{α}) (\hat{K}) \subset \hat{X}

, for every

α \in A

, we have

\hat{K} \subset {\hat{K}}_{α}

and, accordingly,

\begin{matrix} Π (\{P \in M^{1} (\hat{X}) : P (\hat{K}) < 1 - δ\}) & \geq Π (\{P \in M^{1} (\hat{X}) : P ({\hat{K}}_{α}) < 1 - δ\}) \\ & = Π_{α} (\{P_{α} \in M^{1} (X_{α}) : P_{α} (φ_{α} (\hat{K})) < 1 - δ\}), \end{matrix}

for any

δ > 0

, which implies the converse. □

Let us paraphrase: to have a coherent inverse system of probability measures for histograms defining a limit

Π

that is a Radon probability measure on

M^{1} (X)

for the tight topology, we look for compacta

\hat{K}

in a zero-dimensional version of

X

that capture most of the mass of the projected measures

P_{α}

with high

Π_{α}

-probability, uniformly in

α \in A

.
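To make this paraphrase concrete, the sketch below estimates the left-hand side of (16) by simulation in a deliberately simple setting, all of which is assumed for the example: X is the set of non-negative integers with the discrete topology (already zero-dimensional, so compacta are the finite sets), α_n = {{0}, …, {n − 1}, {n, n + 1, …}}, the mean measure is geometric with G({k}) = 2^{-(k+1)}, and Π_{α_n} is Dirichlet with parameters G(A) for A in α_n. With K = {0, …, m − 1}, Markov's inequality gives the bound G(X ∖ K)/δ for every n.

```python
import random

def dirichlet(params, rng):
    """Sample a Dirichlet vector by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]

def cell_masses(n):
    """G-masses of alpha_n = {{0}, ..., {n-1}, {n, n+1, ...}} for the
    geometric mean measure G({k}) = 2**-(k+1) (an assumed example)."""
    return [2.0 ** -(k + 1) for k in range(n)] + [2.0 ** -n]

def captured_mass(hist, n, m):
    """P_alpha(phi_alpha(K)) for K = {0, ..., m-1}: the total mass of the
    alpha_n-cells that meet K."""
    if m > n:              # the tail cell {n, n+1, ...} meets K, so all cells do
        return 1.0
    return sum(hist[:m])

def failure_probability(n, m, delta, samples, rng):
    """Monte Carlo estimate of the left-hand side of (16) at alpha_n."""
    params = cell_masses(n)            # Dirichlet parameters, total mass one (assumed)
    bad = 0
    for _ in range(samples):
        hist = dirichlet(params, rng)
        if captured_mass(hist, n, m) < 1.0 - delta:
            bad += 1
    return bad / samples

rng = random.Random(0)
m, delta = 10, 0.1
estimates = {n: failure_probability(n, m, delta, 4000, rng) for n in (3, 10, 14)}
markov_bound = 2.0 ** -m / delta       # G(X \ K) / delta, independent of n
print(estimates, markov_bound)
```

In this example a single finite set K controls the histograms at every level of refinement, which is the uniformity in α that condition (16) asks for.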

4.1.3. Tight Histogram Limits with Ordinary Compacta

In certain histogram systems (like those that define Dirichlet process distributions), there is an easy way to prove the Mean-measure condition (see the proof of Theorem 6). In histogram systems where this condition is less accessible or not accessible at all (like those that define the Pólya-tree distributions), zero-dimensional compacta in the space

\hat{X}

are unwieldy, so we also provide a re-formulation of Theorem 3 that relies only on compacta in

X

.

To avoid mention of the zero-dimensional space

\hat{X}

, we re-construct compacta

\hat{K}

from decreasing sequences of compacta in

X

. Let

A

be a directed set of partitions generated by a basis. For every

α \in A

, consider the topological space

Y_{α}

obtained from

X

by declaring all sets

A \in α

clopen, i.e.,

Y_{α}

is the topological sum of all the partition sets

A \in α

with their subspace topologies. Note that the set-theoretic identity mapping on

X

is continuous as a mapping

π_{α} : \hat{X} \to Y_{α}

. (See also the proof of Proposition 7.)

Lemma 4.

A subset

\hat{K}

is compact in

\hat{X}

if and only if

K_{α} = π_{α} (\hat{K})

is compact in

Y_{α}

for all

α \in A

. Conversely, given compact subsets

K_{α}

of

Y_{α}

for all

α \in A

, the subset

\cap_{α} K_{α}

is compact in

\hat{X}

.

Proof.

Consider the product space

Y = \prod_{α \in A} Y_{α}

. The diagonal

Δ = {(x, x, x, \dots) \in Y : x \in X}

is a closed subset of

Y

, homeomorphic to

\hat{X}

, and the mappings

π_{α}

are the canonical projections

Y \to Y_{α}

, applied after the homeomorphism

\hat{X} \to Δ

. A compact

\hat{K}

in

\hat{X}

has compact images

π_{α} (\hat{K})

in all

Y_{α}

,

α \in A

. Conversely, if H is a subset of

\hat{X}

such that

π_{α} (H)

is compact in

Y_{α}

for all

α \in A

, then

B = \prod_{α \in A} π_{α} (H)

is compact in

Y

by Tychonov’s theorem, and so is the closed subspace

Δ \cap B

. Set-theoretically,

π_{α} (H) = H

for all

α \in A

, implying that

Δ \cap B

is homeomorphic to H, so H is compact. Given compact subsets

K_{α}

of

Y_{α}

for all

α \in A

, the subset

\hat{K} = \cap_{α} K_{α}

is compact as a subset of any

Y_{α}

, (

α \in A

), so

\hat{K}

is compact as a subset of

\hat{X}

. □

Corollary 2.

Let

X

be a Polish space with a countable basis

U

and a well-ordered sequence

A

of partitions generated by the basis that resolves

X

. Consider

M^{1} (X)

with the tight topology. Let

(Π_{α}, φ_{* α β})

be a coherent system of Borel probability measures on the inverse system

(M^{1} (X_{α}), φ_{* α β})

. If, for all

α \in A

, all

A \in α

and all

ϵ, δ > 0

, there is a

K \subset A

compact in

X

such that,

Π_{β} ({P_{β} \in M^{1} (X_{β}) : P_{β} (φ_{β} (K)) < P_{β} (φ_{β} (A)) - δ}) < ϵ,

(17)

for all

β \in A

such that

α \leq β

, then there exists a unique Radon probability measure Π on

M^{1} (X)

projecting to

Π_{α}

for all

α \in A

.

Proof.

Enumerate the partitions in

A

,

A = {α_{n} : n \geq 0}

and let

δ, ϵ > 0

be given. To find a compact subset

\hat{K}

of

\hat{X}

to satisfy property (16), we construct a decreasing sequence of non-empty, compact sets in the spaces

Y_{α_{n}}

, (

n \geq 0

) by induction and take the intersection. For now, assume that

α_{0} = {X}

. According to condition (17), there exists a compact set

K_{0}

in

Y_{α_{0}} = X

such that,

Π_{α_{n}} (\{P_{α_{n}} \in M^{1} (X_{α_{n}}) : P_{α_{n}} (φ_{α_{n}} (K_{0})) < 1 - \frac{1}{2} δ\}) < \frac{1}{2} ϵ,

for all

n \geq 0

. Make the induction assumption that for given

n \geq 0

, there is a compact

K_{n}

in

Y_{α_{n}}

with,

Π_{α_{m}} (\{P_{α_{m}} \in M^{1} (X_{α_{m}}) : P_{α_{m}} (φ_{α_{m}} (K_{n})) < 1 - \frac{2^{n + 1} - 1}{2^{n + 1}} δ\}) < \frac{2^{n + 1} - 1}{2^{n + 1}} ϵ,

(18)

for all

m \geq n

. Fix

m \geq n + 1

. To combine masses at a later stage, choose

0 < λ_{i} < 1

for all

1 \leq i \leq | α_{n + 1} |

, such that

\sum_{i} λ_{i} = 1

. For any

1 \leq i \leq | α_{n + 1} |

, there exists a compact

K_{i} \subset A_{i}

, such that,

Π_{α_{m}} (\{P_{α_{m}} \in M^{1} (X_{α_{m}}) : P_{α_{m}} (φ_{α_{m}} (A_{i})) - P_{α_{m}} (φ_{α_{m}} (K_{i})) > 2^{- (n + 2)} λ_{i} δ\}) < 2^{- (n + 2)} λ_{i} ϵ,

(19)

for all

m \geq n + 1

. The intersection

K_{n + 1} = K_{n} \cap (\cup_{i} K_{i})

is not only compact in

Y_{α_{n}}

but also in

Y_{α_{n + 1}}

. Then, for any

m \geq n

, if

P_{α_{m}}

does not lie in any of the

M^{1} (X_{α_{m}})

subsets on the left-hand sides of inequalities (18) and (19), then,

P_{α_{m}} (φ_{α_{m}} (K_{n + 1})) \geq 1 - \frac{2^{n + 2} - 1}{2^{n + 2}} δ,

and the

Π_{α_{m}}

-probability of that event is lower-bounded by,

Π_{α_{m}} ({P_{α_{m}} \in M^{1} (X_{α_{m}}) : P_{α_{m}} (φ_{α_{m}} (K_{n + 1})) \geq 1 - \frac{2^{n + 2} - 1}{2^{n + 2}} δ}) \geq 1 - \frac{2^{n + 1} - 1}{2^{n + 1}} ϵ - \sum_{i} 2^{- (n + 2)} λ_{i} ϵ = 1 - \frac{2^{n + 2} - 1}{2^{n + 2}} ϵ,

completing the induction step. Define

\hat{K} = \cap_{n \geq 1} K_{n}

, which is compact in

\hat{X}

by Lemma 4, and,

Π_{α_{n}} (\{P_{α_{n}} \in M^{1} (X_{α_{n}}) : P_{α_{n}} ({\hat{φ}}_{α_{n}} (\hat{K})) < 1 - δ\}) < ϵ,

for all

n \geq 0

, showing that condition (16) is satisfied, and the assertion follows from Theorem 3. Coming back to the assumption that

α_{0} = {X}

, if

α_{0}

consists of more than one set, then the induction argument is started from a (finite) partition

α_{0}

that coincides with some

α_{n}

-stage in the proof provided above. □
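The bookkeeping of the δ- and ϵ-budgets in the induction step above can be checked with exact arithmetic. The sketch below is only an illustration: the helper names `budget` and `induction_step` and the example weights are hypothetical and do not occur in the text.

```python
from fractions import Fraction

def budget(n):
    """Fraction of delta (or epsilon) used up after stage n: (2^(n+1) - 1) / 2^(n+1)."""
    return Fraction(2 ** (n + 1) - 1, 2 ** (n + 1))

def induction_step(n, lambdas):
    """Add the stage-(n+1) allowances 2^-(n+2) * lambda_i to the stage-n budget."""
    assert sum(lambdas) == 1
    extra = sum(Fraction(1, 2 ** (n + 2)) * lam for lam in lambdas)
    return budget(n) + extra

# Hypothetical weights lambda_i for a partition alpha_{n+1} with three sets.
lambdas = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
for n in range(5):
    # The induction step reproduces the budget of the next stage exactly.
    assert induction_step(n, lambdas) == budget(n + 1)
    print(n, budget(n), "->", induction_step(n, lambdas))
```

Since `budget(n)` stays strictly below one for every n, the total amounts spent never exceed δ and ϵ, which is what the final bound for the intersection of the compact sets requires.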

The requirements that Corollary 2 places on

X

and

A

are more specific than those of Theorem 3, but not necessarily more restrictive: all Polish spaces have countable bases, and well-ordered partition systems

A

, generated by some countable basis

U

, can all be derived as subsequences of the generic situation, cf. Example 1.

4.1.4. Coherent Random Histogram Systems Without Limit

To conclude, we consider the cases in which condition (16) does not hold. We start with a counterexample that illustrates concretely how the failure of condition (16) is related to the ‘leaking away’ of probability mass in the limit of refining

α

.

Example 2.

Consider

X = R

with a basis

U

consisting of all open intervals with rational midpoints and rational radii. Consider a triangular array

{q_{n, m} : n \geq 1, 1 \leq m \leq M_{n} = 2^{n} - 1}

of values in

Q

, such that for every

n \geq 1

, we have

q_{n + 1, 1} \leq q_{n, 1}

,

q_{n + 1, M_{n + 1}} \geq q_{n, M_{n}}

; for every

1 \leq m \leq M_{n}

,

q_{n + 1, 2 m} = q_{n, m}

; and for every

m \leq M_{n} - 1

,

q_{n, m} < q_{n + 1, 2 m + 1} < q_{n, m + 1}

. Defining

α_{n}

to be of the form,

α_{n} = \{A_{n, 1} = (- \infty, q_{n, 1}], A_{n, 2} = (q_{n, 1}, q_{n, 2}], \dots, A_{n, | α_{n} |} = (q_{n, M_{n}}, \infty)\},

one verifies that the

α_{n}

are generated by the basis

U

and

α_{n + 1}

refines

α_{n}

for any

n \geq 1

. Assuming that the set

\cup {q_{n, m} : n \geq 1, 1 \leq m \leq M_{n}}

is dense in

R

, the resulting partitions collectively generate the Borel σ-algebra. (For later reference, we indicate the possibility to choose

q_{n, 1} = 0

,

q_{n, M_{n}} = 1

, to define partitions on

(0, 1]

.)

The simplest example of a coherent histogram system that does not satisfy condition (16) is constructed as follows. Choose some

δ > 0

,

N \geq 1

and define histogram distributions

Π_{α_{n}}

for all

n \geq N

for the probability vectors

(P_{α_{n}} (A_{n, m}) : 1 \leq m \leq | α_{n} |)

satisfying,

Π_{α_{n}} (P_{α_{n}} (A_{n, 1}) + P_{α_{n}} (A_{n, | α_{n} |}) = δ) = 1,

that is, some non-zero fraction of the total probability mass in the n-th histogram is concentrated in the ‘outside’ sets

A_{n, 1}, A_{n, | α_{n} |}

with

Π_{α_{n}}

-probability one. As

A_{n + 1, 1} \subset A_{n, 1}

and

A_{n + 1, | α_{n + 1} |} \subset A_{n, | α_{n} |}

, coherence of the histogram system

(Π_{α_{n}}, φ_{* α_{n} α_{m}})

is maintained. Assuming that

- q_{n, 1}, q_{n, M_{n}} \to \infty

, any compact K in

X = R

fails to meet the ‘outside’ sets,

K \cap A_{n, 1} = ⌀

and

K \cap A_{n, | α_{n} |} = ⌀

, for large enough n, which invalidates condition (16).

To summarize the above example, the problem occurs because the

Π_{α_{n}}

shift a non-zero amount of mass towards

\pm \infty

without limitations as n grows. For any presumed limit measure

Π

on

M^{1} (X)

, this would mean that for all compact sets K in

R

,

Π (P (K) \leq 1 - δ) = 1

. This shows that property (16) cannot be satisfied, and no such limit

Π

exists as a probability distribution on

M^{1} (X)

.
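The mechanism of Example 2 can be made concrete with a small deterministic sketch. It assumes the hypothetical choice of outermost cut points -n and n at stage n (any divergent choice would do) and the compact set K = [-c, c]; the function names are illustrative only.

```python
def outer_cuts(n):
    """Hypothetical outermost cut points (q_{n,1}, q_{n,M_n}) = (-n, n), diverging with n."""
    return -float(n), float(n)

def max_mass_on_compact(c, n, delta):
    """Upper bound on P(K) for K = [-c, c], given that the two outer sets
    of the stage-n partition carry total mass delta with probability one."""
    q_low, q_high = outer_cuts(n)
    meets_left = -c <= q_low      # K meets (-inf, q_low]
    meets_right = c > q_high      # K meets (q_high, inf)
    if meets_left or meets_right:
        return 1.0                # stage n puts no constraint on P(K)
    return 1.0 - delta            # K lies in the inner sets, which carry mass 1 - delta

delta, c = 0.1, 3.0
for n in (1, 2, 3, 4, 10):
    print(n, max_mass_on_compact(c, n, delta))
```

For every fixed c the bound drops to 1 - δ as soon as n exceeds c, so no compact subset of the real line can carry more than 1 - δ of the mass in the limit.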

The non-compactness of

X

appears essential in the above example; however, the next example shows that the situation is more complicated: mass can ‘leak away’ not just to points at infinity but at any boundary between partition sets.

Example 3.

In Example 2, take

X

equal to the compact subset

[0, 1]

, define the points

s_{n, m} = \frac{1}{2} + \frac{1}{π} arctan (q_{n, m})

for all

n \geq 1

,

1 \leq m \leq M_{n}

and consider partitions

β_{n} = \{B_{n, 0} = \{0\}, B_{n, 1} = (0, s_{n, 0}], B_{n, 2} = (s_{n, 0}, s_{n, 1}], \dots, B_{n, | α_{n} | - 1} = (s_{n, M_{n}}, 1), B_{n, | β_{n} |} = \{1\}\},

so that

| β_{n} | = | α_{n} | + 2

. Now define the histogram distributions

Π_{β_{n}}

for the probability vectors

(P_{β_{n}} (B_{n, m}) : 1 \leq m \leq | β_{n} |)

by,

Π_{β_{n}} (P_{β_{n}} (B_{n, 0}) = 0) = Π_{β_{n}} (P_{β_{n}} (B_{n, | β_{n} |}) = 0) = 1,

and the distribution of

(P_{β_{n}} (B_{n, m}) : 2 \leq m \leq | β_{n} | - 1)

equal to that of

(P_{α_{n}} (A_{n, m}) : 1 \leq m \leq | α_{n} |)

. Again, we have a coherent system of histogram distributions and, like in Example 2, probability mass is shifted up against the boundary points at 0 and 1, but no limiting distribution Π on

M^{1} ([0, 1])

exists with the

Π_{β_{n}}

as histogram projections.

In fact, probability mass does not even have to disappear at points of the boundary of

X

: if we make this example on

[0, 1]

part of a refining system of partitions of

R

, with some fraction of the total probability mass in (a random histogram system on) the complements

(- \infty, 0) \cup (1, \infty)

, the construction on

{0} \cup (0, 1) \cup {1}

will continue to make some non-zero fraction of the mass ‘leak away’ across the boundary of

(0, 1)

, which lies in the interior of

X

.

The concluding remark in Example 3 is close to the generic situation: if we partition

R

into intervals, boundaries between partition sets create the potential for coherent random histogram systems that make probability mass disappear there in the limit. If we generalize to higher dimensions, it becomes clear that mass does not necessarily disappear at specific points: it may be concentrated in any decreasing sequence of partition sets with an empty limit. This shows in which way a (histogram-specific) form of

σ

-additivity makes a re-appearance.

These counterexamples highlight the significance of condition (16): requiring the existence of a compact K in

X

would prevent the leakage of Example 2 but not that of Example 3. In order to prevent ‘leakage’ of the latter type, we have to impose the stronger requirement of the existence of a compact

\hat{K}

in

\hat{X}

, which keeps probability mass away from all potential points of ‘leakage’ simultaneously.

If condition (16) cannot be satisfied, as in Examples 2 and 3, it is possible to consider a compactification of

\hat{X}

, for example, the Stone–Čech compactification

β \hat{X}

. With the canonical extension of partitions of

X

to the space

β \hat{X}

, condition (16) is satisfied trivially. The limiting probability measure

Π

on

M^{1} (β \hat{X})

may not be unique (because the projections onto the spaces

M^{1} (X_{α})

are not necessarily separating). Moreover, in applications, the added points in the closed subset

β \hat{X} ∖ \hat{X}

lack interpretation.

4.2. Support in the Tight Topology

Below, Theorem 3 is used to characterize the support of histogram limit measures

Π

on

M^{1} (X)

with the tight topology for a Polish space

X

. As it turns out, the appropriate relation for the mean measure is the inclusion of supports. This assertion is already known in the literature (see, for example, Theorem 4.15 in []), but the proof given here is not. In the formulation of the following proposition, let G denote the mean measure of Definition 5.

Proposition 11.

Let

X

be a Polish space. Consider

M^{1} (X)

with the tight topology and a Borel probability distribution Π. Let G be the mean measure under Π. Then,

{P \in M^{1} (X) : supp (P) \subset supp (G)}

is closed in

M^{1} (X)

and,

{supp}_{T_{C}} (Π) \subset {P \in M^{1} (X) : supp (P) \subset supp (G)} .

Moreover, if

P \in M^{1} (X)

is such that for all partitions

α \in A

,

P_{α}

lies in the support of

Π_{α}

in

M^{1} (X_{α})

, then P lies in the tight support of Π.

Proof.

If P is such that

supp (P) ⊄ supp (G)

, there exists an

x \in supp (P) ∖ supp (G)

and, by complete regularity of

X

, a continuous

f : X \to [0, 1]

with

f = 0

on

supp (G)

and

f (x) = 1

. While

⟨ G, f ⟩ = 0

, the open neighborhood of x for which

f > \frac{1}{2}

receives non-zero P-probability, and we see that

⟨ P, f ⟩ > 0

. So, if Q lies in the tight neighborhood

{Q \in M^{1} (X) : | ⟨ (P - Q), f ⟩ | < \frac{1}{2} ϵ}

of P (for some

0 < ϵ < ⟨ P, f ⟩

),

⟨ Q, f ⟩ > 0

and, accordingly,

supp (Q) ⊄ supp (G)

, from which it follows that

{P \in M^{1} (X) : supp (P) \subset supp (G)}

is closed. Moreover, by Markov’s inequality and Fubini’s Theorem,

\begin{aligned} Π (\{Q \in M^{1} (X) : | ⟨ (P - Q), f ⟩ | < \frac{1}{2} ϵ\}) &\leq Π (\{Q \in M^{1} (X) : ⟨ Q, f ⟩ > \frac{1}{2} ϵ\}) \\ &\leq \frac{2}{ϵ} \int ⟨ Q, f ⟩ d Π (Q) = \frac{2 ⟨ G, f ⟩}{ϵ} = 0 . \end{aligned}

Conclude that P has a tight neighborhood of

Π

-mass zero, which means that P does not lie in the tight support of

Π

.

Regarding the last assertion, it is noted that since

M^{1} (X)

with the tight topology is the continuous image of a subset of the inverse limit N of Proposition 3, the collection of sets

{φ_{* α}^{- 1} (V) : α \in A, V \in U_{α}}

(where

U_{α}

is any basis for

M^{1} (X_{α})

, e.g., total-variational balls) in

M^{1} (X)

forms a basis for the tight topology. Consequently, for any tight neighborhood U of

P \in M^{1} (X)

, there exists an

α \in A

and a

V \in U_{α}

such that

φ_{* α}^{- 1} (V) \subset U

, and,

Π (U) \geq Π (φ_{* α}^{- 1} (V)) = Π_{α} (V) > 0,

by assumption. □

5. Phase Structure of Probability Histogram Limits

In this section we combine the two main existence theorems of the preceding sections with the general theory of completely random measures [] to describe the various ways in which random histogram limits manifest. In Section 5.1 we review completely random point processes [], and in Section 5.2 we show how this combination leads to the conclusion that random histogram limits occur in one of four distinct phases: continuous-singular or dominated, each either purely atomic or not (see Theorem 5 below). The phase of a random histogram limit depends on the topology on

M^{1} (X)

and on independence within random histogram distributions. In Section 7 and Section 8, we demonstrate that both in the Pólya-tree family and in the Gaussian family of histogram limits, changes in the defining parameters of their histogram systems can cause the limit to transition from one phase to another.

5.1. Completely Random Measures

In [,], so-called completely random measures are defined as positive random measures

ν \sim Π^{'}

that assign stochastically independent random masses to disjoint measurable subsets of the underlying space

X

, and it is shown that (the random part of) a completely random measure is a purely atomic measure with

Π^{'}

-probability one. (Note, we say that a positive measure

ν

is purely atomic if the collection D of points

x \in X

for which

ν ({x}) > 0

(so-called atoms) contains all

ν

-mass, i.e.,

ν (D) = ν (X)

; we say that

ν

is non-atomic, if

D = ⌀

.) Below, we give the briefest of introductions to completely random measures (following [] (Chapters 9 and 10)), and relate the results to the existence theorems of Section 3 and Section 4.

Definition 6.

Let

(X, B)

be a Polish space. A random positive Radon measure ν on

X

, distributed according to

Π^{'}

, is called a completely random measure, if, for any finite collection of disjoint measurable sets

A_{1}, \dots, A_{n} \in B

, the random masses

ν (A_{1}), \dots, ν (A_{n})

are independent.

Any (random) positive Radon measure

ν \sim Π^{'}

decomposes as a sum of a (random) purely atomic measure

ν_{d}

and a (random) non-atomic measure

ν_{n}

in a unique way [] (Proposition 9.3.IV),

Π^{'} (ν = ν_{n} + ν_{d}) = 1,

(20)

and, for a random positive Radon measure to be almost-surely non-atomic, it is necessary and sufficient that for any

ϵ, δ > 0

, there is a finite Borel measurable partition

α

of

X

such that for all finer finite Borel-measurable partitions

β

,

Π^{'} (max {ν (B) : B \in β} > δ) < ϵ

. In the case of a completely random measure

ν

, this implies that

ν

is

Π^{'}

-almost-surely equal to some fixed (that is, non-random) non-atomic measure

ν_{n}

[] (Proposition 10.1.II). As a consequence, the atomic part of any completely random measure can be fixed or random, while the non-atomic part is always non-random.

Definition 7.

For any random positive Radon measure

ν \sim Π^{'}

on

(X, B)

and any

t > 0

, define the cumulant

λ_{t} : B \to [0, \infty]

by,

λ_{t} (A) = log \int e^{t ν (A)} d Π^{'} (ν) .

Fubini’s theorem implies that if

ν \sim Π^{'}

is a completely random measure, for any

t > 0

, the cumulant

λ_{t}

is a positive Borel measure. The theorem below says that the atomic part of a completely random measure decomposes into a sum of random atoms at fixed points in

X

and a sum

ν_{r}

of random atoms at random points in

X

.

Theorem 4

([]). Let

ν \sim Π^{'}

be a completely random measure with cumulant measures

λ_{t}

for

t > 0

. If all

λ_{t}

are σ-finite, then with

Π^{'}

-probability one, ν satisfies the decomposition,

ν = ν_{n} + ν_{f} + ν_{r},

(21)

where

ν_{n}

is a non-random, non-atomic, σ-finite measure on

(X, B)

;

ν_{f}

is a purely atomic measure supported on a fixed, countable subset

D \subset X

where

ν_{f} ({x})

and

ν_{f} ({x^{'}})

are independent if

x, x^{'} \in D

,

x \neq x^{'}

; and

ν_{r}

is a random purely atomic measure that is independent of

ν_{f}

.

As it turns out,

σ

-finiteness of

λ_{t}

is equivalent to the existence of a countable cover

C_{1}, C_{2}, \dots \in B

of

X

such that

Π^{'} (ν (C_{i}) < \infty) > 0

, for every

i \geq 1

. Furthermore, the set of fixed atoms D is the set of atoms of

λ_{t}

, and the

σ

-additivity of

λ_{t}

implies the countability of D. The random purely atomic measure

ν_{r}

is realized with the help of a Poisson point process N on

X \times (0, \infty)

(cf. [] (Proposition 9.1.III-(v))), as follows:

ν_{r} (A) = \int y N (A \times d y),

(22)

with an intensity measure

μ

that may be unbounded on sets of the form

A \times (0, ϵ)

,

(ϵ > 0)

, but satisfies,

\int min {1, y} d μ (A \times d y) < \infty .

One may also start by choosing the intensity measure

μ

and define

ν_{r}

through (22). In the case that

X = [0, \infty)

, the existence of an almost-surely strictly positive and finite completely random measure is equivalent to the requirement that the Lévy intensity associated with the random jumps assigns infinite mass to

(0, \infty)

[] (see p. 563).
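The realization (22) can be sketched in the simplest case of a finite intensity measure. The following is a minimal illustration under assumptions that are not part of the text: the underlying space is taken to be [0, 1], and the intensity is taken to be a constant rate times the Lebesgue measure in the location and a standard exponential distribution in the jump size. With a finite intensity the number of atoms is almost surely finite, so this sketch does not capture the case in which the intensity is unbounded near zero.

```python
import random

def sample_poisson_points(rate, rng):
    """Points (x, y) of a Poisson process on [0, 1] x (0, inf) with the
    (assumed, finite) intensity rate * dx * exp(-y) dy."""
    points = []
    t = rng.expovariate(rate)          # arrival times of a rate-`rate` Poisson process
    while t <= 1.0:
        points.append((t, rng.expovariate(1.0)))  # location x = t, jump size y ~ Exp(1)
        t += rng.expovariate(rate)
    return points

def nu_r(points, a, b):
    """nu_r([a, b)): sum of the jump sizes y over points with location x in [a, b)."""
    return sum(y for x, y in points if a <= x < b)

rng = random.Random(0)
points = sample_poisson_points(rate=20.0, rng=rng)
left = nu_r(points, 0.0, 0.5)
right = nu_r(points, 0.5, 2.0)         # all locations lie in [0, 1]
whole = nu_r(points, 0.0, 2.0)
print(len(points), left, right, whole)
```

The masses of the two disjoint halves add up to the mass of the whole interval, and they are built from disjoint sets of Poisson points, which is the source of their independence.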

The measure

ν_{n}

appears in

λ_{t}

as the t-linear contribution:

λ_{t} (A) = t ν_{n} (A) + \dots

So, completely random measures with cumulant measures without

t ν_{n}

-terms (

ν_{n} = 0

) and without fixed atoms (

ν_{f} = 0

) are characterized as purely atomic with random locations, purely

ν_{r}

; similarly, completely random measures with

ν_{n} = 0

and Poisson intensity measure

μ = 0

are characterized as purely atomic with fixed locations, purely

ν_{f}

. Based on complete randomness, a wider class of random measures can be characterized in which the almost-sure atomic nature is preserved [].

Complete randomness imposes a purely atomic nature on random probability measures too, after normalization: if a given positive random measure

ν \sim Π^{'}

satisfies

0 < ν (X) < \infty

with

Π^{'}

-probability one, then

P = ν / ν (X) \sim Π

defines a random probability measure called a normalized completely random measure, and P inherits the purely atomic nature of

ν

. The histogram distributions

Π_{α}

for

P_{α}

follow from the distributions

Π_{α}^{'}

through,

(P_{α} (A_{α, 1}), \dots, P_{α} (A_{α, | α |})) = \frac{1}{ν_{α} (X)} (ν_{α} (A_{α, 1}), \dots, ν_{α} (A_{α, | α |})) \sim Π_{α},

(23)

where

ν_{α} (X) = ν_{α} (A_{α, 1}) + \dots + ν_{α} (A_{α, | α |})

. We say that the random histograms

P_{α}

are independent up to normalization.

5.2. Phases of Probability Histogram Limits

Combining the conclusions of Section 3 and Section 4 with the presence or absence of complete randomness, we arrive at the following theorem. (Given a random probability measure, let G denote the associated mean measure.)

Theorem 5

(Phases of random histogram limits). Let

X

,

A

and

M^{1} (X)

satisfy the minimal conditions. Let

A

be a directed set of finite, Borel-measurable partitions that resolve

X

, with a coherent system of Borel histogram probability measures

(Π_{α}, φ_{* α β})

on the inverse system

(M^{1} (X_{α}), φ_{* α β})

.

(i.): (absolutely-continuous)
If condition (12) is satisfied, the histogram limit describes a random element P of $M^{1} (X)$, distributed according to a weakly-Radon probability measure Π, such that Π-almost-surely, for all measurable $B \in B$, $G (B) = 0$ implies $P (B) = 0$:

$Π ({P \in M^{1} (X) : P ≪ G}) = 1 .$

The random element P can be identified isometrically with a random positive Radon–Nikodym density function p in $L^{1} (X, B, G)$ of norm one, and we can write, for all $B \in B$,

$P (B) = \int_{B} p (x) d G (x);$

(ii.): (fixed-atomic)
If condition (12) is satisfied and the $Π_{α}$ describe normalized completely random histograms, cf. (23), the histogram limit $P \sim Π$ is a normalized version of the sum ν of a fixed non-atomic measure $ν_{n} ≪ G$ and a random purely atomic measure $ν_{f}$ supported on the fixed, countable set $D = {x \in X : G ({x}) > 0}$. For all $B \in B$,

$P (B) = \frac{1}{ν (X)} (ν_{n} (B) + ν_{f} (B)) .$

Assume, in addition, that $X$ is a Polish space and that $A$ is a directed set of finite partitions generated by a basis that resolves $X$.

(iii.): (continuous-singular)
If condition (16) is satisfied, the histogram limit describes a random element P of $M^{1} (X)$, distributed according to a tightly-Radon probability measure Π, such that Π-almost-surely, for all open $U \subset X$, $G (U) = 0$ implies $P (U) = 0$:

$Π ({P \in M^{1} (X) : supp (P) \subset supp (G)}) = 1;$

(iv.): (random-atomic)
If condition (16) is satisfied and the $Π_{α}$ describe normalized completely random histograms, cf. (23), the histogram limit $P \sim Π$ is a normalized version of the sum ν of a fixed non-atomic measure $ν_{n} ≪ G$, a random purely atomic measure $ν_{f}$ supported on the fixed, countable set $D = {x \in X : G ({x}) > 0}$, and a random purely atomic measure $ν_{r}$. For all $B \in B$,

$P (B) = \frac{1}{ν (X)} (ν_{n} (B) + ν_{f} (B) + ν_{r} (B)) .$

Proof.

In cases (i.) and (iii.), the theorem states the assertions of Theorems 2 and 3; in cases (ii.) and (iv.), these assertions are combined with those of Theorem 4, specific to normalized completely random measures (where it is observed that the set of atoms of

λ_{t}

(for any

t > 0

) is equal to the set of atoms of G). □

In qualitative terms, we may describe the phase structure of random histogram limits as follows: The most general, least constrained type of limit above is that of the continuous-singular phase. According to (20), any continuous-singular random P decomposes into a random atomic component and a random non-atomic component. The random component of any completely random case manifests as purely atomic (the random-atomic phase), with independent, randomly sized point masses at fixed locations and independent random locations. Many examples of (normalized) completely random families are known, including the well-known Dirichlet family (which is discussed in Section 6) and a sub-family of Gaussian histogram systems (see Section 8).

The random non-atomic component of a histogram limit in the continuous-singular phase is novel and more interesting: it is implied by the above that dependence in histogram distributions is required to induce a random non-atomic continuous-singular component. To illustrate the nature of such a component, we may think, for example, of

X = [0, 1]

and a

Π

with G equal to the Lebesgue measure, describing a random Stieltjes function

F : [0, 1] \to [0, 1]

, from a class that is everywhere continuous but not everywhere (or even nowhere) differentiable (e.g., the so-called Cantor distribution). Such distributions are non-atomic but cannot be identified with random Radon–Nikodym density functions. The Gaussian histogram systems of Section 8 are in the non-atomic continuous-singular phase generically, and only Gaussian histogram systems with diagonal covariance matrices are in the random-atomic phase.
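To make the Cantor example tangible, the sketch below computes the histograms of the (fixed, non-random) Cantor distribution on the triadic partitions of [0, 1]. It is meant only to illustrate what a non-atomic, continuous-singular realization looks like at the level of histograms; the function name and the choice of triadic partitions are assumptions made for the illustration and are not part of the construction in the text.

```python
from fractions import Fraction

def cantor_histogram(n):
    """Cells of the level-n triadic partition of [0, 1] that carry Cantor mass,
    as (left endpoint, Cantor mass) pairs; all other cells carry mass zero."""
    cells = [(Fraction(0), Fraction(1))]
    for level in range(1, n + 1):
        width = Fraction(1, 3 ** level)
        # Each charged cell passes half its mass to its left and its right third.
        cells = [(left + shift * width, mass / 2)
                 for left, mass in cells for shift in (0, 2)]
    return cells

for n in (1, 5, 10):
    cells = cantor_histogram(n)
    max_cell_mass = max(mass for _, mass in cells)          # equals 2^-n
    lebesgue_of_carrier = len(cells) * Fraction(1, 3 ** n)  # equals (2/3)^n
    print(n, float(max_cell_mass), float(lebesgue_of_carrier))
```

The largest cell mass tends to zero, so no atoms form, while all mass sits on cells whose total Lebesgue measure also tends to zero, so the ratios of histogram probabilities to Lebesgue cell masses grow without bound and no density with respect to the Lebesgue measure exists.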

In the absolutely-continuous phase, the histogram distributions are such that the histogram probabilities

P_{α} (A)

may be larger than their means

G_{α} (A)

, but not to such a degree that (

G_{α}

-averages of) proportions between

P_{α}

and

G_{α}

grow unbounded in the limit. This is borne out by the formulation of property (13), and also serves to interpret later bounds (e.g., (26)). The upper bound on the proportions between

P_{α} (A)

and

G_{α} (A)

induces domination

P ≪ G

with

Π

-probability one. Extending the above example with

X = [0, 1]

, the absolutely continuous phase describes a random Stieltjes function

F : [0, 1] \to [0, 1]

that is everywhere differentiable and can be identified with a random Radon–Nikodym density function with respect to G. In Section 8, we discuss Gaussian random histogram limits in the absolutely-continuous phase. If we specify that an absolutely-continuous random histogram limit is also normalized completely random, then the limit is in the fixed-atomic phase: combining the resulting purely atomic character of the random component with domination by G, we find only random point masses at the fixed locations of the atoms of G. The sub-family of Dirichlet process distributions with countably supported base measures are in the fixed-atomic phase.

The distinction between the random-atomic and fixed-atomic phases provides an alternative explanation for the decomposition

ν_{f} + ν_{r}

of the random, purely atomic component in Kingman’s theorem: based on the above and the Radon–Nikodym theorem, we explain this by the fact that any random probability measure decomposes uniquely into a random component dominated by its mean measure G and a random component that is mutually singular with respect to G (but still with support inside the support of G).

6. Existence and Phases of Dirichlet Histogram Limits

The best-known family of histogram limits is the Dirichlet family; its definition is based most conveniently on the observation that if

Z_{1}, \dots, Z_{n}

are independent and distributed according to Gamma distributions

Γ (ν_{1}, 1), \dots, Γ (ν_{n}, 1)

, then

Z_{1} + \dots + Z_{n}

is distributed according to

Γ (ν_{1} + \dots + ν_{n}, 1)

. (Below, we use the convention that

Γ (0, 1)

is a single atom of mass one located at zero.)

Definition 8.

Let ν be a non-zero, bounded, positive Borel measure on a Polish space

X

and define, for every Borel-measurable partition α,

(P (A) : A \in α) \sim {Dir}_{ν_{α}},

(24)

where

ν_{α} = (ν (A) : A \in α)

. The histogram distributions

{Dir}_{ν_{α}}

on

M^{1} (X_{α})

are those of the normalized positive random elements

(Z_{1} / S, \dots, Z_{| α |} / S)

, where

Z_{i} \sim Γ (ν (A_{i}), 1)

, (

i \in I (α)

) and

S = \sum_{i} Z_{i}

. Together, the distributions

({Dir}_{ν_{α}}, α \in A_{0})

are coherent and form the Dirichlet histogram system with base measure ν.

It is clear that the Gamma process, defined by the positive random vectors

(Z_{1}, \dots, Z_{n}) \sim \prod_{i} Γ (ν (A_{i}), 1)

, is completely random and that Dirichlet histogram systems are normalized completely random. Limits of Dirichlet histogram systems therefore describe random probability measures in one of the two atomic phases.
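The construction of Definition 8 is easy to simulate. The sketch below draws Dirichlet histograms as normalized independent Gamma variables and compares, for one coarse cell, the mean obtained by coarsening fine histograms with the mean obtained by sampling on the coarse partition directly. The base-measure masses, the helper names and the sample size are hypothetical choices, and agreement of means is only a weak sanity check of coherence, not a proof.

```python
import random

def dirichlet_histogram(nu_cells, rng):
    """One draw of (P(A) : A in alpha) as normalized independent Gamma(nu(A), 1) variables.
    Cells with nu(A) = 0 get mass zero, following the convention for Gamma(0, 1)."""
    z = [rng.gammavariate(a, 1.0) if a > 0 else 0.0 for a in nu_cells]
    s = sum(z)
    return [zi / s for zi in z]

def coarsen(hist, blocks):
    """Push a histogram forward to a coarser partition; `blocks` lists, for each
    coarse cell, the indices of the fine cells whose union it is."""
    return [sum(hist[i] for i in block) for block in blocks]

rng = random.Random(1)
nu_fine = [0.5, 1.0, 0.0, 1.5]          # hypothetical base-measure masses nu(A_i)
blocks = [[0, 1], [2, 3]]               # coarse cells: A_1 u A_2 and A_3 u A_4
nu_coarse = coarsen(nu_fine, blocks)    # finite additivity of nu gives [1.5, 1.5]

n_draws = 20000
mean_from_fine = sum(coarsen(dirichlet_histogram(nu_fine, rng), blocks)[0]
                     for _ in range(n_draws)) / n_draws
mean_direct = sum(dirichlet_histogram(nu_coarse, rng)[0]
                  for _ in range(n_draws)) / n_draws
# Both estimates should be close to nu(A_1 u A_2) / nu(X) = 0.5.
print(mean_from_fine, mean_direct)
```

The only property of the base measure used by `coarsen` is finite additivity, in line with the remark that coherence of the histogram system does not require countable additivity.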

A second immediate observation is that coherence of the histogram system could have been guaranteed based on parametrization in terms of a finitely additive base measure

ν

. The well-known Mean-measure condition [] requires

ν

to be countably additive to guarantee the existence of a unique histogram limit with respect to the tight topology on

M^{1} (X)

. We come back to the Mean-measure condition below.

6.1. Tight Limits of Dirichlet Histogram Systems

The following theorem is the (by now classical, see []) existence result for Dirichlet histogram limits, with a new proof in terms of condition (16).

Theorem 6.

Let

X

be a Polish space, endow

M^{1} (X)

with the tight topology and let ν be a non-zero, bounded, positive Borel measure on

X

. There exists a unique Radon probability measure

{Dir}_{ν}

on

M^{1} (X)

projecting to the Dirichlet histogram distributions (24), describing a random probability measure in the random-atomic phase.

Proof.

Let

U

be a countable basis for

X

and let

A

be a refining sequence of partitions, generated by

U

, that resolves

X

. By assumption, there exist distributions

{Dir}_{ν_{α}}

for the random histograms

P_{α} \in M^{1} (X_{α})

, (

α \in A

). As said, the coherence of the inverse system

({Dir}_{ν_{α}}, φ_{* α β})

follows from finite additivity of the measure

ν

.

To prove condition (16), let

ϵ > 0, δ > 0

be given. According to Proposition 8,

ν

defines a bounded positive Borel measure on

\hat{X}

and, according to Proposition 7,

\hat{X}

is Polish, so

ν

is a Radon measure on

\hat{X}

. Hence, there exists a compact

\hat{K}

in

\hat{X}

such that,

ν (\hat{X} ∖ \hat{K}) < δ ϵ ν (\hat{X}) .

Let

α

be given. By Markov’s inequality and the fact that under

{Dir}_{ν_{α}}

,

P_{α} (A) \sim Beta (ν (A), ν (\hat{X}) - ν (A)),

(25)

for any

A \in σ (α)

, we have,

\begin{aligned} {Dir}_{ν_{α}} (\{P_{α} \in M^{1} (X_{α}) : P_{α} ({\hat{φ}}_{α} (\hat{K})) < 1 - δ\}) &= {Dir}_{ν_{α}} (\{P_{α} \in M^{1} (X_{α}) : P_{α} (X_{α}) - P_{α} ({\hat{φ}}_{α} (\hat{K})) > δ\}) \\ &\leq \frac{1}{δ} \int_{M^{1} (X_{α})} P_{α} (X_{α} ∖ {\hat{φ}}_{α} (\hat{K})) d {Dir}_{ν_{α}} (P_{α}) \\ &= \frac{1}{δ} \frac{ν (\hat{X} ∖ ({\hat{φ}}_{α}^{- 1} \circ {\hat{φ}}_{α}) (\hat{K}))}{ν (\hat{X})} \leq \frac{1}{δ} \frac{ν (\hat{X} ∖ \hat{K})}{ν (\hat{X})} < ϵ, \end{aligned}

where we also use the fact that the

G_{α}

are proportional to

ν_{α}

and the fact that

\hat{K} \subset ({\hat{φ}}_{α}^{- 1} \circ {\hat{φ}}_{α}) (\hat{K})

. Conclude that there exists a unique histogram limit

{Dir}_{ν}

, a Radon probability measure on

M^{1} (X)

with the tight topology. Because the histogram system is normalized completely random, the limiting random element P is in the random-atomic phase. □
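The Markov step in the proof above can be checked numerically for a single set. The sketch below uses the Beta marginal (25) for the mass of a set A playing the role of the complement of the compact set, and compares a Monte Carlo estimate of the tail probability with the Markov bound. The numerical values of ν(X), ν(A) and δ are hypothetical.

```python
import random

def tail_frequency(nu_out, nu_total, delta, n_draws, rng):
    """Monte Carlo estimate of Dir(P(A) > delta) for a set A with nu(A) = nu_out,
    using P(A) ~ Beta(nu(A), nu(X) - nu(A)) as in (25)."""
    hits = sum(rng.betavariate(nu_out, nu_total - nu_out) > delta
               for _ in range(n_draws))
    return hits / n_draws

def markov_bound(nu_out, nu_total, delta):
    """Markov bound E[P(A)] / delta = nu(A) / (delta * nu(X))."""
    return nu_out / (delta * nu_total)

rng = random.Random(2)
nu_total, nu_out, delta = 2.0, 0.05, 0.2   # hypothetical nu(X), nu(A) and delta
estimate = tail_frequency(nu_out, nu_total, delta, 50000, rng)
bound = markov_bound(nu_out, nu_total, delta)
print(estimate, bound)
```

The bound depends on the partition only through ν(A), which is why a single compact set chosen from the Radon property of ν controls all histogram distributions at once.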

To conclude, two remarks are in order: Firstly, coming back to the Mean-measure condition, it is noted that the above proof relies on

ν

being not just finitely, but countably additive, to imply the Radon property. Secondly, we note that restriction to

A

with partitions generated by the basis may be confusing since the most common definition of the Dirichlet histogram system involves all Borel-measurable partitions,

A_{0}

. We argue that this distinction expresses the difference between the roles that

A

plays in Theorem 6 and Proposition 2: to define

{Dir}_{ν}

, we are restricted to directed sets

A

of a special form, while, after proving existence, we may use histograms associated with all

α \in A_{0}

.

6.2. Weak Limits of Dirichlet Histogram Systems

Whether

{Dir}_{ν}

is a Radon measure with respect to the weak topology as well depends on the base measure

ν

. To make a preliminary assessment, note that, given

α \in A

and

L > 0

, for any

P_{α}, Q_{α} \in M^{1} (X_{α})

,

\begin{aligned} ∥ P_{α} - P_{α} \land L Q_{α} ∥_{1, X_{α}} &= \sum \{P_{α} (A_{i}) : i \in I (α), P_{α} (A_{i}) > L Q_{α} (A_{i})\} \\ &\leq \frac{1}{L} \sum \{{(\frac{P_{α} (A_{i})}{Q_{α} (A_{i})})}^{2} Q_{α} (A_{i}) : i \in I (α), P_{α} (A_{i}) > L Q_{α} (A_{i})\} \leq \frac{1}{L} \sum_{i \in I (α)} \frac{P_{α} {(A_{i})}^{2}}{Q_{α} (A_{i})} . \end{aligned}

(26)

Based on (25), we see that for every

A \in α

,

\int_{M^{1} (X_{α})} P_{α} {(A)}^{2} d {Dir}_{ν_{α}} (P_{α}) = \frac{ν {(A)}^{2} + ν (A)}{ν {(X)}^{2} + ν (X)} .
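The second-moment formula above is the standard second moment a(a + 1) / (s(s + 1)) of a Beta(a, s - a) distribution with a = ν(A) and s = ν(X). The sketch below evaluates it exactly and also computes what happens when each second moment is divided by the mean ν(A)/ν(X) and the results are summed over the cells of a partition. The equal-mass partitions and the helper names are hypothetical choices made for the illustration.

```python
from fractions import Fraction

def second_moment(nu_a, nu_x):
    """E[P(A)^2] for P(A) ~ Beta(nu(A), nu(X) - nu(A)), i.e. a(a + 1) / (s(s + 1))."""
    return (nu_a ** 2 + nu_a) / (nu_x ** 2 + nu_x)

def normalized_second_moment_sum(nu_cells):
    """Sum over cells of E[P(A)^2] / G(A), with mean measure G(A) = nu(A) / nu(X)."""
    nu_x = sum(nu_cells)
    return sum(second_moment(a, nu_x) / (a / nu_x) for a in nu_cells if a > 0)

nu_x = Fraction(2)
for size in (2, 8, 32, 128):
    cells = [nu_x / size] * size                 # hypothetical equal-mass partition
    closed_form = (nu_x + size) / (nu_x + 1)     # (nu(X) + |alpha|) / (nu(X) + 1)
    assert normalized_second_moment_sum(cells) == closed_form
    print(size, float(closed_form))
```

Each cell with positive mass contributes (ν(A) + 1) / (ν(X) + 1), so the sum grows linearly with the number of cells and cannot be bounded uniformly over refining partitions.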

Now, let

δ > 0

be given. Due to the bound (26), for any

L > 0

and any

α \in A

, Markov’s inequality gives,

\begin{aligned} Π_{α} (\{P_{α} \in M^{1} (X_{α}) : ∥ P_{α} - P_{α} \land L G_{α} ∥_{1, X_{α}} > δ\}) &\leq Π_{α} (\{P_{α} \in M^{1} (X_{α}) : \frac{1}{L} \sum_{i \in I (α)} \frac{P_{α} {(A_{i})}^{2}}{G_{α} (A_{i})} > δ\}) \\ &\leq \frac{1}{L δ} \sum_{i \in I (α)} \frac{1}{G_{α} (A_{i})} \int_{M^{1} (X_{α})} P_{α} {(A_{i})}^{2} d Π_{α} (P_{α}) \\ &= \frac{1}{L δ} \sum_{i \in I (α)} \frac{ν (A_{i}) + 1}{ν (X) + 1} = \frac{1}{L δ} \frac{ν (X) + | α |}{ν (X) + 1}, \end{aligned}

for all

α \in A

. Since

| α | \to \infty

, as

α \in A

refines (unless

X

is finite), this shows that the most obvious upper bound to imply uniform integrability does not lead to a useful argument. However, we show the following.

Theorem 7.

Let

X

,

A = A_{0}

and

M^{1} (X)

satisfy the minimal conditions and consider

M^{1} (X)

with the weak topology. Let ν be a non-zero, bounded, positive, purely atomic measure on

X

. Then, there exists a unique Radon probability measure

{Dir}_{ν}

on

M^{1} (X)

with the weak topology, projecting to

{Dir}_{ν_{α}}

for all

α \in A_{0}

. In that case,

{Dir}_{ν}

describes a normalized completely random measure in the fixed-atomic phase.

Proof.

First, consider a countable set D with the discrete topology (which is a Polish space), with a bounded, positive Borel measure

ν_{D}

on D. According to Theorem 6, the Dirichlet histogram system with base measure

ν_{D}

has a Radon histogram limit

{Dir}_{ν_{D}}

on

M^{1} (D)

with the tight topology. Since any bounded

f : D \to R

is continuous, the tight and weak topologies are equal. Therefore

{Dir}_{ν_{D}}

is also Radon with respect to the weak topology on

M^{1} (D)

by default.

Now, let

X

be Polish and let D denote the set

{x \in X : ν ({x}) > 0}

. Let

A_{D}

denote the set of all finite partitions of D and let

A_{X ∖ D}

denote the set of all finite, Borel-measurable partitions of

X ∖ D

. Define

A

to contain all partitions

α

that combine a partition

α_{D}

from

A_{D}

and a partition

α_{X ∖ D}

from

A_{X ∖ D}

, to partition the whole space

X

. Note that

A

resolves

X

, and

A

is directed and co-final in

A_{0}

. For any

α = (α_{D}, α_{X ∖ D}) \in A

, the Dirichlet histogram distribution

{Dir}_{ν_{α}}

is such that for the (

σ (α)

-measurable) subset

X ∖ D

,

{Dir}_{ν_{α}} ({P_{α} \in M^{1} (X_{α}) : P_{α} (X ∖ D) = 0}) = 1 .

So,

P_{α} (D) = 1

with

{Dir}_{ν_{α}}

-probability one. The projections of

(P (A) : A \in α)

onto

(P (A) : A \in α_{D})

give rise to a Dirichlet histogram system with base measure

ν_{D}

, the restriction of

ν

to subsets of D. As argued above, the limit

{Dir}_{ν_{D}}

is a Radon probability measure on

M^{1} (D)

with the weak topology. The space

M^{1} (D)

is weak-to-weak homeomorphic to the weakly closed subspace M of all

P \in M^{1} (X)

such that

P ≪ ν

, through the mapping

ϕ : M^{1} (D) \to M^{1} (X)

,

ϕ (P) (B) = P (B \cap D)

for all

B \in B

. Conclude that the histogram system based on partitions in

A

has a histogram limit

{Dir}_{ν}

that is Radon on

M^{1} (X)

with the weak topology. □

7. Existence and Phases of Pólya-Tree Histogram Limits

Here, we give only a very brief introduction to Pólya-tree distributions; for much more, see [,,,] and the overviews in [,].

The Pólya-tree distribution is defined through a sequence of refining partitions of a Polish space

X

(usually

R

or the interval

[0, 1]

), where, in each step, every set in the preceding partition is split into two subsets. To describe the resulting tree of refinements, we define the following: For every

m \geq 0

, we denote by

E_{m}

the set of all binary sequences

ε

of length m (and we denote the empty binary sequence formally as

ε_{⌀}

, forming the only element of the set denoted

E_{0}

). We also define the set

E = \cup_{m \geq 0} E_{m}

of all finite binary sequences (including the empty one). For any two binary sequences

ε \in E_{m}

,

ε^{'} \in E_{m^{'}}

, we write

ε ε^{'}

for the concatenation in

E_{m + m^{'}}

. In particular, for any

ε \in E_{m}

,

ε 0

(

ε 1

) in

E_{m + 1}

appends a zero (one) to

ε

. Also note that

ε_{⌀} ε = ε ε_{⌀} = ε

for all

ε \in E

. We write out

ε \in E_{m}

as

ε = e_{1} \dots e_{m}

and use the notation

ε_{l} : = e_{1} \dots e_{l} \in E_{l}

for the projections onto the first

1 \leq l \leq m

binary digits. We also define for any

ε \in E_{m}

with

e_{m} \in {0, 1}

,

ε

with the last digit flipped:

\hat{ε} = ε_{m - 1} (\neg e_{m})

.

We use

E

to organize a refining sequence

A = {α_{n} : n \geq 0}

of partitions,

α_{0} = {X}

,

α_{1} = {A_{0}, A_{1}}

,

α_{2} = {A_{00}, A_{01}, A_{10}, A_{11}}

, etc., into a dyadic tree, defining

α_{n} = {A_{ε} : ε \in E_{n}}

and, for all

ε \in E

,

A_{ε} = A_{ε 0} \cup A_{ε 1} .

(27)

Mostly, we shall look at refinement through intersection with basis sets and their complements, i.e., for every

ε \in E

, either

A_{ε 0}

or

A_{ε 1}

equals

A_{ε} \cap U

for some element U in a basis

U

for

X

. Note that in the case of a countable basis

U

, iterative application of the above construction gives rise to a countable

A = {α_{m} : m \geq 1}

that resolves

X

.

Example 4.

A typical example is a dyadic tree of partitions of

X = (0, 1]

(or

[0, 1]

), constructed by iteratively bisecting every interval at the mid-point. This leads to a sequence of refining partitions

α_{m}

,

m \geq 0

, consisting of

2^{m}

intervals of the form

(l, u]

where

l = u - 2^{- m}

and

u = 2^{- m} k

,

k = 1, 2, \dots, 2^{m}

, which is generated by a basis and which resolves

(0, 1]

. (In case

X = [0, 1]

, we add to every partition the singleton

{0}

).
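As an illustration of Example 4, the sketch below enumerates the dyadic intervals of (0, 1] indexed by binary strings and uses exact rational arithmetic. It assumes the labeling convention that appending a zero selects the left half and appending a one selects the right half; the function names are hypothetical.

```python
from fractions import Fraction

def dyadic_interval(eps):
    """Return (l, u) with A_eps = (l, u] for a binary string eps, bisecting at mid-points."""
    lower, upper = Fraction(0), Fraction(1)
    for digit in eps:
        mid = (lower + upper) / 2
        if digit == "0":
            upper = mid  # assumed convention: A_{eps 0} is the left half
        else:
            lower = mid  # assumed convention: A_{eps 1} is the right half
    return lower, upper

def partition(m):
    """All 2^m intervals of alpha_m, keyed by their binary labels of length m."""
    labels = [format(k, f"0{m}b") if m > 0 else "" for k in range(2 ** m)]
    return {eps: dyadic_interval(eps) for eps in labels}

alpha_2 = partition(2)
for eps, (l, u) in alpha_2.items():
    print(eps, f"({l}, {u}]")
```

Each interval in `partition(m)` has length 2^{-m}, and the two children of any interval share the parent's mid-point as a common endpoint, in line with relation (27).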

To arrive at random histogram distributions for the Pólya-tree, we define for every

ε \in E

a so-called splitting variable

V_{ε 0}

(and

V_{ε 1} = 1 - V_{ε 0}

), taking values in

[0, 1]

such that

(i.): for any $ε, ε^{'} \in E$ such that $ε \neq ε^{'}$ , $V_{ε 0}$ is independent of $V_{ε^{'} 0}$ ;
(ii.): for every $ε \in E$ , there exist $β_{ε 0}, β_{ε 1} > 0$ such that $V_{ε 0}$ has a $Beta (β_{ε 0}, β_{ε 1})$ distribution.

(In case

X = [0, 1]

, we assign a separately chosen, fixed probability

0 \leq p_{0} \leq 1

to

{0}

with

Π_{α}

-probability one for all

α \in A

. As a default, we choose

p_{0} = 0

.)

Remark 2.

Here and below, we extend the usual family of Beta-distributions somewhat: we consider

β_{ε 0} = \infty

and

β_{ε 1} = \infty

and define

Beta (\infty, β_{ε 1}) = δ_{1}

for all

0 < β_{ε 1} < \infty

,

Beta (β_{ε 0}, \infty) = δ_{0}

for all

0 < β_{ε 0} < \infty

and

Beta (\infty, \infty) = δ_{1 / 2}

.

The splitting variables

V_{ε 0}

are interpreted as random fractions that determine how much of the probability mass of

A_{ε}

goes to

A_{ε 0}

and how much remains for

A_{ε 1}

, in accordance with (27):

V_{ε 0} = P (A_{ε 0} | A_{ε}), V_{ε 1} = P (A_{ε 1} | A_{ε}) .

Consequently, for every

m \geq 1

,

ε = e_{1} \dots e_{m} \in E_{m}

, the random probability for

A_{ε}

can be written as a product of independent fractions:

P (A_{ε}) = V_{e_{1}} V_{e_{1} e_{2}} \dots V_{e_{1} \dots e_{m}} = \prod_{l = 1}^{m} V_{e_{1} \dots e_{l}},

which fixes the histogram probability measures

Π_{α_{m}}

on

X_{α_{m}}

for all

m \geq 1

,

(P (A_{ε}) : ε \in E_{m}) \sim Π_{α_{m}} .

(28)

By construction, the

Π_{α_{m}}

are such that the refinement and coarsening of partitions (corresponding to relations of type (1)) are accommodated coherently.
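The product construction can be sketched in a few lines. The code below is an illustrative sampler, assuming finite Beta parameters stored in a dictionary keyed by binary strings (the degenerate cases of Remark 2 are not handled); the uniform choice of parameters and all names are hypothetical.

```python
import random
from itertools import product

def all_labels(max_len):
    """Binary strings of length 1, ..., max_len."""
    return ["".join(bits) for l in range(1, max_len + 1) for bits in product("01", repeat=l)]

def sample_polya_tree_histogram(beta, m, rng):
    """Sample (P(A_eps) : eps in E_m) as products of independent splitting variables."""
    probs = {"": 1.0}
    for _ in range(m):
        next_probs = {}
        for eps, mass in probs.items():
            # V_{eps 0} ~ Beta(beta_{eps 0}, beta_{eps 1}) and V_{eps 1} = 1 - V_{eps 0}
            v0 = rng.betavariate(beta[eps + "0"], beta[eps + "1"])
            next_probs[eps + "0"] = mass * v0
            next_probs[eps + "1"] = mass * (1.0 - v0)
        probs = next_probs
    return probs

def coarsen(hist):
    """Project a level-m histogram to level m-1 by adding the masses of sibling sets."""
    coarse = {}
    for eps, mass in hist.items():
        coarse[eps[:-1]] = coarse.get(eps[:-1], 0.0) + mass
    return coarse

beta = {eps: 1.0 for eps in all_labels(3)}  # hypothetical parameters
hist = sample_polya_tree_histogram(beta, 3, random.Random(1))
print(hist)
```

Coarsening a sampled histogram reproduces the masses of the parent sets, which is the sample-level counterpart of the coherence noted above.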

For later reference, we note the first two moments of the random variables

P (A_{ε})

: for every

m \geq 1

and every

ε \in E_{m}

, the mean measure equals,

G (A_{ε}) : = \int_{M^{1} (X_{α_{m}})} P_{α_{m}} (A_{ε}) d Π_{α_{m}} (P_{α_{m}}) = \int \prod_{l = 1}^{m} V_{e_{1} \dots e_{l}} d Π_{α_{m}} = \prod_{l = 1}^{m} \frac{β_{ε_{l - 1} e_{l}}}{β_{ε_{l - 1} 0} + β_{ε_{l - 1} 1}},

(29)

by independence of the variables

V_{ε 0}

and expectations of the

Beta

-distributions. Expressed in terms of the parameters

β

, the second moment of

P (A_{ε})

takes the form,

\begin{matrix} \int_{M^{1} (X_{α_{m}})} P_{α_{m}} {(A_{ε})}^{2} & d Π_{α_{m}} (P_{α_{m}}) = \int \prod_{l = 1}^{m} V_{e_{1} \dots e_{l}}^{2} d Π_{α_{m}} \\ = \prod_{l = 1}^{m} (\frac{β_{ε_{l - 1} 0} β_{ε_{l - 1} 1}}{{(β_{ε_{l - 1} 0} + β_{ε_{l - 1} 1})}^{2} (β_{ε_{l - 1} 0} + β_{ε_{l - 1} 1} + 1)} + \frac{β_{ε_{l - 1} e_{l}}^{2}}{{(β_{ε_{l - 1} 0} + β_{ε_{l - 1} 1})}^{2}}), \end{matrix}

based on the independence of the

V_{ε 0}

, the variances of the corresponding

Beta

-distributions and Equation (29).
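Both moment formulas are finite products and can be evaluated exactly. The following sketch assumes integer (or rational) parameters stored in a dictionary keyed by binary strings; the particular parameter choice and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def mean_mass(beta, eps):
    """G(A_eps) as in (29): product of Beta means along the path to eps."""
    result = Fraction(1)
    for l in range(1, len(eps) + 1):
        parent = eps[: l - 1]
        total = Fraction(beta[parent + "0"]) + Fraction(beta[parent + "1"])
        result *= Fraction(beta[eps[:l]]) / total
    return result

def second_moment(beta, eps):
    """Second moment of P(A_eps): product of (variance + squared mean) of each factor."""
    result = Fraction(1)
    for l in range(1, len(eps) + 1):
        parent = eps[: l - 1]
        b0, b1 = Fraction(beta[parent + "0"]), Fraction(beta[parent + "1"])
        be, s = Fraction(beta[eps[:l]]), b0 + b1
        result *= b0 * b1 / (s ** 2 * (s + 1)) + be ** 2 / s ** 2
    return result

labels = ["".join(bits) for l in range(1, 4) for bits in product("01", repeat=l)]
beta = {eps: 1 + eps.count("1") for eps in labels}  # hypothetical parameters
level_3 = ["".join(bits) for bits in product("01", repeat=3)]
total_mean = sum(mean_mass(beta, eps) for eps in level_3)
print(total_mean, second_moment(beta, "010"))
```

The means over any level add up to one, as they must for a mean probability measure, and each second moment dominates the square of the corresponding mean.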

To have a sub-class of relatively simple examples, we define so-called homogeneous Pólya-tree systems.

Definition 9.

Let

A

denote the dyadic tree of partitions of

X = (0, 1]

(or

[0, 1]

), as in Example 4. A Pólya-tree system is called homogeneous if we choose

β_{m} > 0

for all

m \geq 1

and set

β_{ε} = β_{m}

for all

ε \in E_{m}

.

Accordingly, in a homogeneous Pólya-tree system, splitting variables are distributed symmetrically around

v = \frac{1}{2}

, and the mean measure G for any homogeneous Pólya-tree system with a limit is Lebesgue measure.

7.1. Tight Limits of Pólya-Tree Histogram Systems

First, the general case of the Pólya-tree histogram system is analyzed with Corollary 2: here, the particulars of the partition play a role in the formulation of the condition, so we have to be specific regarding

X

and its partitioning. In this subsection, we specify that

X = (0, 1]

(or

R

), with a dyadic tree of partitions. We use the following notation: for all

m \geq 0

,

o_{m} = 0 \dots 0 \in E_{m}

and

ι_{m} = 1 \dots 1 \in E_{m}

.

Theorem 8.

Let

X = (0, 1]

and let

A = {α_{m} : m \geq 0}

be the dyadic tree of Example 4. Let

(Π_{α_{m}}, φ_{*, α_{m} α_{n}})

be a coherent inverse system of Pólya-tree measures (with parameter

β = {β_{ε} : ε \in E}

) on the inverse system

(M^{1} (X_{α_{m}}), φ_{* α_{m} α_{n}})

. Then, there exists a unique probability measure on

M^{1} (X)

that is Radon with respect to the tight topology and projects to the Pólya-tree histograms parametrized by

{(β_{ε 0}, β_{ε 1}) : ε \in E}

, if and only if,

\prod_{m \geq 0} \frac{β_{ε o_{m} 0}}{β_{ε o_{m} 0} + β_{ε o_{m} 1}} = 0,

(30)

for every

ε \in E

, and the resulting random element P of

M^{1} (X)

is in the continuous-singular phase.

Proof.

For any

m \geq 1

, the partition

α_{m}

consists of

2^{m}

intervals of the form

(a, b]

, where

a = b - 2^{- m}

and

b = 2^{- m} k

,

k = 1, 2, \dots, 2^{m}

, which is generated by a basis for the standard topology on

(0, 1]

. The well-ordered set of partitions

A = {α_{m} : m \geq 1}

resolves

(0, 1]

. For the given

ε \in E_{m}

, we consider

A_{ε} \in α_{m}

. Let also

δ, η > 0

be given. If

G_{α_{m}} (A_{ε}) = 0

,

P_{α_{m}} (A_{ε}) = 0

with

Π_{α_{m}}

-probability one and any compact

K \subset A_{ε}

satisfies property (17). Assuming that

G_{α_{m}} (A_{ε}) > 0

, we write

A_{ε} = (a, b]

for certain fixed

a, b

, as above, and consider the sequence of half-open intervals

{(I_{ε, l})}_{l \geq m}

in

X

, defined by,

I_{ε, l} = A_{ε} ∖ A_{ε o_{l - m}} = (a + 2^{- l}, b] .

Assuming that (30) holds, choose

l \geq m

large enough such that,

\prod_{k = 0}^{l - 1} \frac{β_{ε o_{k} 0}}{β_{ε o_{k} 0} + β_{ε o_{k} 1}} < \frac{δ η}{G_{α_{m}} (A_{ε})} .

Note that for all

m \leq k \leq l

,

φ_{α_{k}} (I_{ε, l}) = φ_{α_{k}} (A_{ε})

, while, for any

l^{'} \geq l

,

φ_{α_{l^{'}}} (I_{ε, l}) = φ_{α_{l^{'}}} (A_{ε}) ∖ φ_{α_{l^{'}}} (A_{ε o_{l - m}})

. Defining K to be the closure of

I_{ε, l}

, by Markov’s inequality, we have,

\begin{matrix} Π_{α_{l^{'}}} (\{ & P_{α_{l^{'}}} \in M^{1} (X_{α_{l^{'}}}) : P_{α_{l^{'}}} (K) < P_{α_{l^{'}}} (φ_{α_{l^{'}}} (A_{ε})) - δ\}) \\ \leq Π_{α_{l^{'}}} (\{P_{α_{l^{'}}} \in M^{1} (X_{α_{l^{'}}}) : P_{α_{l^{'}}} (I_{ε, l}) < P_{α_{l^{'}}} (φ_{α_{l^{'}}} (A_{ε})) - δ\}) \\ = Π_{α_{l}} (\{P_{α_{l}} \in M^{1} (X_{α_{l}}) : P_{α_{l}} (A_{ε o_{l - m}}) > δ\}) \leq \frac{1}{δ} \int_{M^{1} (X_{α_{l}})} P_{α_{l}} (A_{ε o_{l - m}}) d Π_{α_{l}} (P_{α_{l}}) \\ = \frac{G_{α_{m}} (A_{ε})}{δ} \prod_{k = 0}^{l - 1} \frac{β_{ε o_{k} 0}}{β_{ε o_{k} 0} + β_{ε o_{k} 1}} < η, \end{matrix}

which shows that property (17) holds.

Conversely, suppose that there exists an

ε \in E

, such that,

\prod_{m \geq 0} \frac{β_{ε o_{m} 0}}{β_{ε o_{m} 0} + β_{ε o_{m} 1}} > 0 .

Then,

lim_{m \to \infty} \int P_{α_{m}} (A_{ε o_{m}}) d Π_{α_{m}} (P_{α_{m}}) > 0,

while the sequence

{(A_{ε o_{m}})}_{m \geq 0}

decreases to ⌀. Hence, the mean measures

G_{α}

do not define a measure (on the ring that is formed by the union of all

σ (α)

,

α \in A

), which precludes the existence of a Borel probability measure

Π

on

M^{1} (X)

with the tight topology (if

Π

existed,

B \mapsto \int P (B) d Π

would define a Borel mean measure). □

Remark 3.

The above applies to examples with

X = R

as well, but, in that case, we have to require, in addition to (30), that,

\prod_{m \geq 0} \frac{β_{ι_{m} 1}}{β_{ι_{m} 0} + β_{ι_{m} 1}} = 0,

(31)

because aside from the open, left-sided boundaries of half-open intervals

A \in α

, there are directions towards

\pm \infty

where mass can ‘leak away’ in the limit.

Example 5.

It is well known [] that a Pólya-tree histogram system with defining parameters

{(β_{ε 0}, β_{ε 1}) : ε \in E}

that satisfy,

β_{ε} = β_{ε 0} + β_{ε 1},

(32)

for all

ε \in E

coincides with a Dirichlet histogram system (not on all of

A_{0}

, but on a smaller set of dyadic partitions

A

that resolves

X

, generated by a basis). Accordingly, such Dirichlet–Pólya-tree histogram systems have limits that are Radon probability measures on

M^{1} (X)

with the tight topology, and the resulting random element P of

M^{1} (X)

is in the random-atomic phase.

In the example below, we make a choice for the parameters

{(β_{ε 0}, β_{ε 1}) : ε \in E}

that gives rise to a coherent histogram system without a tight limit. This choice is not singular by construction in the sense that parameters either grow very large or vanish in the limit: for all

ε \in E

, we have

β_{ε 0}^{2} + β_{ε 1}^{2} = 1

. To introduce the example, we define the following function on

E

.

Definition 10.

In the standard construction of Cantor space as a subspace

C

of

[0, 1]

by successive deletions of open mid-sections of intervals, we define the Cantor mid-point function x that parametrizes the set of all mid-points of deleted intervals in terms of finite binary sequences:

x : E \to [0, 1]

maps

ε \in E_{m}

to the midpoint of the interval that is deleted in the m-th transition in the construction of the set

C

: for example,

x (ε_{⌀}) = 1 / 2

in

E_{0}

,

x (0) = 1 / 6

,

x (1) = 5 / 6

in

E_{1}

,

x (00) = 1 / 18

,

x (01) = 5 / 18

,

x (10) = 13 / 18

,

x (11) = 17 / 18

in

E_{2}

, etc.
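The Cantor mid-point function of Definition 10 admits a short recursive description: each binary digit selects the left or right retained third of the current interval, and the value is the mid-point of the interval reached. The sketch below assumes this reading of the definition and uses exact rational arithmetic; the function name is hypothetical.

```python
from fractions import Fraction

def cantor_midpoint(eps):
    """Mid-point x(eps) of the open middle third deleted from the interval labeled by eps."""
    lower, upper = Fraction(0), Fraction(1)
    for digit in eps:
        third = (upper - lower) / 3
        if digit == "0":
            upper = lower + third  # keep the left third
        else:
            lower = upper - third  # keep the right third
    # The deleted middle third of [lower, upper] is centered at the interval's mid-point.
    return (lower + upper) / 2

for eps in ["", "0", "1", "00", "01", "10", "11"]:
    print(repr(eps), cantor_midpoint(eps))
```

The printed values reproduce the examples listed in the definition.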

Example 6.

Take

X = R

with a dyadic tree of partitions as defined in Example 2, and, for all

m \geq 0

,

ε \in E_{m}

,

β_{ε 0} = cos (\frac{1}{2} π x (ε)), β_{ε 1} = sin (\frac{1}{2} π x (ε)) .

(33)

Note that

\begin{matrix} \prod_{m \geq 0} & \frac{β_{o_{m} 0}}{β_{o_{m} 0} + β_{o_{m} 1}} = \prod_{m \geq 0} \frac{cos (\frac{1}{2} π x (o_{m}))}{cos (\frac{1}{2} π x (o_{m})) + sin (\frac{1}{2} π x (o_{m}))} \\ = \prod_{m \geq 0} {(1 + tan (\frac{1}{2} π x (o_{m})))}^{- 1} = exp (- \sum_{m \geq 0} log (1 + tan (\frac{1}{2} π x (o_{m})))) . \end{matrix}

Observe that

x (o_{m}) = 1 / 2 {(1 / 3)}^{m}

and

\sum_{m \geq 0} log (1 + tan (\frac{1}{2} π x (o_{m}))) \approx \sum_{m \geq 0} tan (\frac{1}{2} π x (o_{m})) \approx \frac{π}{2} \sum_{m \geq 0} x (o_{m}) = \frac{3 π}{8} < \infty .

Similarly,

\begin{matrix} \prod_{m \geq 0} & \frac{β_{ι_{m} 1}}{β_{ι_{m} 0} + β_{ι_{m} 1}} = \prod_{m \geq 0} \frac{sin (\frac{1}{2} π x (ι_{m}))}{cos (\frac{1}{2} π x (ι_{m})) + sin (\frac{1}{2} π x (ι_{m}))} \\ = \prod_{m \geq 0} {(1 + 1 / tan (\frac{1}{2} π x (ι_{m})))}^{- 1} = exp (- \sum_{m \geq 0} log (1 + 1 / tan (\frac{1}{2} π x (ι_{m})))) . \end{matrix}

Since

x (ι_{m}) = 1 - x (o_{m})

for all

m \geq 0

,

1 / tan (\frac{1}{2} π x (ι_{m})) = 1 / tan (\frac{1}{2} π (1 - x (o_{m}))) = tan (\frac{1}{2} π x (o_{m})) .

Conclude that,

\prod_{m \geq 0} \frac{β_{o_{m} 0}}{β_{o_{m} 0} + β_{o_{m} 1}} = \prod_{m \geq 0} \frac{β_{ι_{m} 1}}{β_{ι_{m} 0} + β_{ι_{m} 1}} > 0,

which implies that the Pólya-tree random histograms defined in (33) form a coherent system that does not lead to a limiting probability measure on

M^{1} (R)

with the tight topology.
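The positivity of the two infinite products in Example 6 can be illustrated numerically. The sketch below evaluates partial products in floating-point arithmetic, using the closed form for x(o_m) and the symmetry x(ι_m) = 1 - x(o_m) stated in the example; the truncation levels and function names are assumptions made for illustration.

```python
import math

def x_zeros(m):
    """x(o_m) = (1/2) (1/3)^m, as noted in the example."""
    return 0.5 * (1.0 / 3.0) ** m

def partial_product_zeros(n_terms):
    """Partial product of beta_{o_m 0} / (beta_{o_m 0} + beta_{o_m 1}) for m < n_terms."""
    prod = 1.0
    for m in range(n_terms):
        angle = 0.5 * math.pi * x_zeros(m)
        prod *= math.cos(angle) / (math.cos(angle) + math.sin(angle))
    return prod

def partial_product_ones(n_terms):
    """Partial product of beta_{iota_m 1} / (beta_{iota_m 0} + beta_{iota_m 1}) for m < n_terms."""
    prod = 1.0
    for m in range(n_terms):
        angle = 0.5 * math.pi * (1.0 - x_zeros(m))
        prod *= math.sin(angle) / (math.cos(angle) + math.sin(angle))
    return prod

p_zeros = partial_product_zeros(40)
p_ones = partial_product_ones(40)
print(p_zeros, p_ones)
```

In this sketch the partial products stabilize at a strictly positive value after a handful of factors, and the two products agree up to rounding, consistent with the conclusion of the example.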

7.2. Weak Limits of Pólya-Tree Histogram Systems

Second, we formulate a sufficient condition for the parameters

{(β_{ε 0}, β_{ε 1}) : ε \in E}

such that the corresponding Pólya-tree histogram system has a limit

Π

that is a Radon probability measure on

M^{1} (X)

with the weak topology. Based on this condition, it is demonstrated that homogeneous Pólya-tree systems with

β_{m}^{- 1} = O (m^{- 1})

give rise to such weak histogram limits. This rate of growth is lower than that required in the sufficient condition of [], which is elaborated upon in [,,] and re-visited in [].

Theorem 9.

Let

X

be a second countable metrizable space with countable basis

U

, with a corresponding dyadic tree

A

of partitions

α_{m}

,

m \geq 1

, generated by the basis. Let

(Π_{α_{m}}, φ_{*, α_{m} α_{n}})

be a coherent inverse system of Pólya-tree measures (with parameter

β = {β_{ε} : ε \in E}

) on the inverse system

(M^{1} (X_{α_{m}}), φ_{* α_{m} α_{n}})

. Assume also that condition (16) holds. Then, there exists a unique Radon probability measure Π on

M^{1} (X)

with the weak topology, projecting to

Π_{α_{m}}

for all

m \geq 1

if,

sup_{m \geq 1} \sum_{ε \in E_{m}} \prod_{l = 1}^{m} \frac{1}{β_{ε_{l - 1} 0} + β_{ε_{l - 1} 1}} (\frac{β_{\hat{ε}}}{β_{ε_{l - 1} 0} + β_{ε_{l - 1} 1} + 1} + β_{ε}) < \infty .

(34)

The resulting random element P of

M^{1} (X)

is in the absolutely-continuous phase.

Proof.

Condition (16) implies the existence of a tightly-Borel probability measure

Π^{'}

on

M^{1} (X)

and a corresponding mean measure

G \in M^{1} (X)

, which serves as our choice of Q in the proof for property (12). Let

δ > 0

be given. For any

L > 0

and every

m \geq 1

, Markov’s inequality gives,

\begin{matrix} Π_{α_{m}} ({ & P_{α_{m}} \in M^{1} (X_{α_{m}}) : ∥ P_{α_{m}} - P_{α_{m}} \land L G_{α_{m}} ∥_{1, X_{α_{m}}} > δ}) \\ \leq Π_{α_{m}} (\{P_{α_{m}} \in M^{1} (X_{α_{m}}) : \frac{1}{L} \sum_{i \in I (α_{m})} \frac{P_{α_{m}} {(A_{i})}^{2}}{G_{α_{m}} (A_{i})} > δ\}) \\ \leq \frac{1}{L δ} \sum_{i \in I (α_{m})} \frac{1}{G_{α_{m}} (A_{i})} \int_{M^{1} (X_{α_{m}})} P_{α_{m}} {(A_{i})}^{2} d Π_{α_{m}} (P_{α_{m}}) \\ = \frac{1}{L δ} \sum_{ε \in E_{m}} \prod_{l = 1}^{m} (\frac{β_{ε_{l - 1} 0} β_{ε_{l - 1} 1}}{β_{ε_{l - 1} e_{l}}} \frac{1}{(β_{ε_{l - 1} 0} + β_{ε_{l - 1} 1}) (β_{ε_{l - 1} 0} + β_{ε_{l - 1} 1} + 1)} + \frac{β_{ε_{l - 1} e_{l}}}{β_{ε_{l - 1} 0} + β_{ε_{l - 1} 1}}) \\ < \frac{K}{L δ}, \end{matrix}

for all

m \geq 1

, where K denotes the value of the supremum in Condition (34). Consequently, condition (12) is satisfied and Theorem 2 asserts that there exists a unique weakly-Radon probability measure

Π

on

M^{1} (X)

that projects to

Π_{α_{m}}

for all

m \geq 1

. □

Corollary 3.

Assume the conditions of Theorem 9 and let a sequence

β_{m} > 0

, (

m \geq 1

be given. If

β_{m}

grows like m or faster, i.e.,

β_{m}^{- 1} = O (m^{- 1})

, there exists a unique Radon probability measure Π on

M^{1} (X)

with the weak topology, projecting to the associated homogeneous Pólya-tree histogram system.

Proof.

Substituting

β_{ε} = β_{m}

in Condition (34), we find, for every

m \geq 1

,

\sum_{ε \in E_{m}} \prod_{l = 1}^{m} \frac{1}{β_{ε_{l - 1} 0} + β_{ε_{l - 1} 1}} (\frac{β_{\hat{ε}}}{β_{ε_{l - 1} 0} + β_{ε_{l - 1} 1} + 1} + β_{ε}) = {(\frac{1}{2 β_{m} + 1} + 1)}^{m},

which behaves like

exp (m / (2 β_{m} + 1))

in the limit

m \to \infty

. Since

m / β_{m} = O (1)

by assumption, the right-hand side stays bounded and Property (34) is satisfied. □

Note that the sufficient condition of [] (see also []) suggests that the absolute continuity of homogeneous Pólya-tree limits sets in when

β_{m}

grows as

O (m^{2})

or faster; here, it is shown that absolute continuity is already obtained with

β_{m}

that grows more slowly, like

O (m)

, or faster.

8. Existence and Phases of Gaussian Histogram Limits

Most known examples of random histogram systems with a limit are of the (normalized) completely random type []. The reason for the preference for systems with independent components is coherence, cf. (1) or (9), which is analyzed most conveniently with infinite divisibility, requiring independence between summands. The consequence is that most known histogram limits are in one of the atomic phases of Theorem 5. In this section we introduce the family of Gaussian random measures, random signed measures on the space

X

with components that display dependence generically, manifesting in one of the non-atomic phases of Theorem 5.

8.1. Random Histogram Limits with Signed Measures

To arrive at a proof of existence for Gaussian histogram limits, we have to generalize the approaches of Section 3.2 and Section 4.1. Consider the case of a locally compact Polish space

X

. The most natural generalization of our random histogram question calls for construction of Radon probability measures on

M (X)

, the space of all signed (and potentially unbounded) Radon measures on

X

, with the vague topology (see [], Ch. III, § 1, No. 9), rather than

M^{1} (X)

with the tight topology as in Section 4. However, to make histogram projections continuous, transition to a zero-dimensional refinement

\hat{X}

(as in Section 4.1.1) is still a necessary step, which does not combine well with the vague topology (test functions

f : X \to R

for the vague topology on

M (X)

remain continuous when viewed as

f : \hat{X} \to R

, but the compactness of their supports in

\hat{X}

is lost in general).

However, since

M (X)

with the vague topology is the inverse limit of the spaces

M (K)

for compact

K \subset X

, we may also limit attention to compact subsets

K \subset X

initially and then use Theorem 1 (with a directed set of compact

K \subset X

labeling a coherent inverse system of histogram limits

Π_{K}

) to define a limiting Radon probability measure

Π

on

M (X)

with the vague topology.

We defer proof of existence for such a ‘vague inverse limit of histogram limits on compacta’ to future work and focus here on the case where

X

itself is a compact Polish space. Then,

M (X) = M_{b} (X)

and the vague and tight topologies coincide. Although

\hat{X}

is not compact in general,

M_{b} (\hat{X})

with the tight topology still stands in continuous bijective correspondence with

M (X)

(as in (the first part of) Proposition 9), and the histogram projections

{\hat{φ}}_{* α} : M_{b} (\hat{X}) \to M (X_{α})

of the form (15) are continuous. This enables the use of Theorem 1 to prove the existence of histogram limits

Π

that are Radon probability measures on

M (X)

with the vague/tight topology.

Theorem 10.

Let

X

be a compact Polish space and let

A

be a directed set of partitions that resolves

X

, generated by a basis that gives rise to a zero-dimensional

\hat{X}

. Consider

M (X)

with the tight topology and a coherent random histogram system

(Π_{α}, φ_{* α β})

. If,

(i.): for every $ϵ > 0$ , there is a constant $M > 0$ such that for all $α \in A$ ,

$Π_{α} ({Φ_{α} \in M (X_{α}) : {∥ Φ_{α} ∥}_{1, α} > M}) < ϵ;$

(35)
(ii.): and, for every $ϵ, δ > 0$ there is a compact $\hat{K} \subset \hat{X}$ such that for all $α \in A$ ,

$Π_{α} ({Φ_{α} \in M (X_{α}) : | Φ_{α} | (φ_{α} (\hat{K})) < | Φ_{α} | (X_{α}) - δ}) < ϵ,$

(36)

then there exists a unique Radon probability distribution Π on

M (X)

projecting to

Π_{α}

for all

α \in A

.

Proof.

According to Proposition 8, the Borel sets on

X

and

\hat{X}

, as well as (bounded) signed Borel set functions and measures, are the same. The first part of Proposition 9 remains true (the set-theoretic identity mapping

i_{*} : M_{b} (\hat{X}) \to M (X)

is a tight-to-tight continuous bijection), but the second part fails because

M (X)

and

M_{b} (\hat{X})

are not necessarily Polish spaces (see Remark 4). Like before, the mappings

({\hat{φ}}_{* α}, φ_{* α β})

form a coherent and separating family of continuous mappings on

M_{b} (\hat{X})

. Trivially, the linear spaces

M (X_{α})

are finite-dimensional normed spaces (the vague, tight and total-variational topologies are all equivalent) and the mappings

φ_{* α β}

are surjective and continuous, like in Proposition 3. Accordingly,

(M (X_{α}), φ_{* α β})

forms an inverse system.

To show that condition (10) holds, let

ϵ > 0

be given and let

M > 0

be a constant such that property (35) is satisfied for every

α \in A

. Define,

H_{1} = ⋂ \{Φ \in M (\hat{X}) : {∥Φ_{α}∥}_{1, α} \leq M, α \in A\} = {Φ \in M (\hat{X}) : {∥Φ∥}_{1, X} \leq M},

since

{∥ Φ ∥}_{1, X} = {sup}_{α \in A} {∥ Φ_{α} ∥}_{1, α}

, cf. Proposition 1.

Let

\hat{K}

be a compact subset of

\hat{X}

. For any

Φ \in H_{1}

,

| Φ | (\hat{K}) \leq | Φ | (\hat{X}) = | Φ | (X) = {∥ Φ ∥}_{1, X} = sup_{α \in A} {∥ Φ_{α} ∥}_{1, α} \leq M .

With

(ϵ_{n}), (δ_{n})

as in the proof of Theorem 3 and the corresponding compacta

({\hat{K}}_{n})

in

\hat{X}

cf. property (36), we also define,

H_{2} = ⋂ \{Φ \in M (\hat{X}) : | Φ_{α} | (φ_{α} ({\hat{K}}_{n})) \geq | Φ_{α} | (X_{α}) - δ_{n}, n \geq 1, α \in A\} .

Following steps analogous to those in the proof of Theorem 3, one then finds that

H = H_{1} \cap H_{2}

satisfies Prokhorov’s conditions (see (A2)), so that the closure

\bar{H}

forms a compact subset of

M (\hat{X})

and, for any

α

, we have (by monotonicity of

α \mapsto | Φ_{α} | (B)

for any

B \in σ (α)

, like in the proofs of Theorems 2 and 3),

\begin{matrix} Π_{α} (M (X_{α} & ) ∖ {\hat{φ}}_{* α} (\bar{H})) \leq Π_{α} (\{Φ \in M (\hat{X}) : {∥Φ_{α}∥}_{1, α} > M\}) \\ + \sum_{n \geq 1} Π_{α} (\{Φ_{α} \in M (X_{α}) : | Φ_{α} | (φ_{α} ({\hat{K}}_{n})) < | Φ_{α} | (X_{α}) - δ_{n}\}) < 2 ϵ, \end{matrix}

which shows that condition (10) of Theorem 1 is satisfied. Conclude that there exists a unique Radon probability measure

\hat{Π}

on

M (\hat{X})

that projects to

Π_{α}

for all

α \in A

. The continuous mapping

i_{*} : M (\hat{X}) \to M (X)

serves to define

Π = \hat{Π} \circ i_{*}^{- 1}

, a Radon probability measure on

M (X)

, and

Π

still projects to

Π_{α}

for all

α \in A

. □

As it stands, property (36) is somewhat unwieldy due to the occurrence of compacta in

\hat{X}

. Analogous to Corollary 2, we also provide a version that refers only to compacta in

X

. For brevity’s sake, we omit the proof (which follows precisely the same steps as the proof of Corollary 2): if, for all

α \in A

, all

A \in α

and all

ϵ, δ > 0

, there is a

K \subset A

, compact in

X

, such that,

Π_{β} ({Φ_{β} \in M (X_{β}) : | Φ_{β} | (φ_{β} (K)) < | Φ_{β} | (φ_{β} (A)) - δ}) < ϵ,

(37)

for all

β \in A

such that

α \leq β

, then property (36) is satisfied.

Remark 4.

Regarding the pair of properties (35) and (36), we remark that, unlike earlier applications of Theorem 1, our conditions are sufficient but (perhaps) not necessary for the existence of a histogram limit: note that

M (X)

is not necessarily complete and not a Polish space generically (see [], Ch. III, § 1, No. 9, Proposition 14), so that Borel measurability of the inverse mapping

i_{*}^{- 1}

can no longer be guaranteed. Accordingly, not every Radon probability measure on

M (X)

can be extended to a Borel probability measure on

M (\hat{X})

canonically, and there may exist coherent histogram systems with a tight inverse limit Π on

M (X)

, for which the stated conditions do not hold.

8.2. Existence of Tight Gaussian Histogram Limits

For the definition of Gaussian histogram systems, location and covariance parameters are defined in a way comparable to that of the base measure of the Dirichlet family.

Definition 11.

Let

X

be a compact Polish space and let λ be a signed Radon measure on

X

. For any Borel-measurable partition α, let

λ_{α}

denote the α-histogram projection of λ,

λ_{α} = (λ (A_{1}), \dots, λ (A_{| α |}))

in

M (X_{α})

. Let Σ be a signed, symmetric Radon measure on

X \times X

(symmetric meaning that

Σ (A \times B) = Σ (B \times A)

for all

A, B \in B

). Assume that for every

α \in A

, the

| α | \times | α |

-matrix

Σ_{α}

, with entries,

Σ_{α, i j} = Σ (A_{i} \times A_{j}),

(

1 \leq i, j \leq | α |

) is positive semi-definite. We refer to λ as the center measure, Σ as the covariance measure and

(λ, Σ)

as Gaussian parameters.

The measure

Σ

may be viewed equivalently as a linear mapping that takes continuous functions on

X

into signed Radon measures on

X

, or as a symmetric bi-linear form. To give examples, we may turn to the theory of reproducing kernel Hilbert spaces.

Example 7.

For

d \geq 1

, let

X

be a compact subset of

R^{d}

, and consider a so-called positive-definite symmetric kernel function

k : X \times X \to R

; cf. the Moore–Aronszajn theorem. Every such kernel function is the reproducing kernel for a unique Hilbert space of functions on

X

. We use k to define,

Σ_{k} (A \times B) = \int_{A \times B} k (x, y) d x d y,

(38)

and note that such a

Σ_{k}

is a covariance measure in the sense of Definition 11. Mercer’s theorem formulates the associated spectral theory, with

Σ_{k}

viewed as a (compact, self-adjoint, positive) integral operator

L^{2} (X) \to L^{2} (X)

. Indeed, we may define a kernel by choice of a countable orthonormal subset of continuous functions

{ϕ_{i} : i \in I}

, and non-negative

{λ_{i} : i \in I}

, to define

k (x, y) = \sum_{i \in I} λ_{i} ϕ_{i} (x) ϕ_{i} (y)

.

To extend the previous example for general covariance measures

Σ

, note that if

A

consists of partitions generated by a basis for

X

, then, for any continuous

f : X \to R

,

{⟨ f, f ⟩}_{Σ} : = \int_{X \times X} f (x) f (y) d Σ (x, y) \geq 0 .

Polarization then defines a positive semi-definite bilinear form on

C (X) \times C (X)

. Within

C (X)

, there is a linear space

Q_{Σ}

of functions f that are

Σ

-almost-surely equal to zero (

f \sim 0

if

{⟨ f, f ⟩}_{Σ} = 0

;

f \sim g

if

f - g \sim 0

.), and the quotient space

C (X) / Q_{Σ}

is a real pre-Hilbert space (see [] (Definition 7.5)), with Hilbert space completion denoted as

L^{2} (Σ)

, which generalizes the Moore–Aronszajn Hilbert spaces associated with reproducing kernel functions.

Definition 12.

Given Gaussian parameters

(λ, Σ)

, we define a Gaussian histogram system as follows: for all

α \in A

, we choose normal probability distributions

Π_{λ, Σ, α}

for random signed histograms

Φ_{α} \in M (X_{α})

, as follows:

(Φ_{α} (A_{1}), \dots, Φ_{α} (A_{| α |})) \sim Π_{λ, Σ, α} = N (λ_{α}, Σ_{α}),

where

N (λ_{α}, Σ_{α})

denotes the multivariate normal distribution on

R^{| α |}

with expectation

λ_{α}

and covariance matrix

Σ_{α}

. When

λ = 0

, we speak of a centered Gaussian histogram distribution, denoted as

Π_{Σ, α} = N (0, Σ_{α})

.

For partitions

α, β \in A

, where

β

refines

α

, let

φ_{* α β}

be as in (3), the mapping that expresses finite additivity. Below, we show that for any

A

and any Gaussian parameters

(λ, Σ)

, the above histogram distributions define a coherent system, referred to as the Gaussian histogram system

(Π_{λ, Σ, α}, φ_{* α β})

associated with the parameters

(λ, Σ)

.

The inclusion of a center measure

λ

does not affect the existence of Gaussian histogram limits: for all

α \in A

and all Borel sets B in

M (X_{α})

,

Π_{λ, Σ, α} (\{Φ_{α} \in M (X_{α}) : Φ_{α} \in B\}) = Π_{Σ, α} (\{Φ_{α} \in M (X_{α}) : Φ_{α} \in B - λ_{α}\}),

and, hence,

Π_{λ, Σ}

exists if and only if

Π_{Σ}

exists. The existence of histogram limits therefore only concerns the

Σ

-parameter. The existence conditions of Theorem 10 can be dominated by uniform bounds on mixed second moments of the absolute histogram components

| Φ_{α} (A) |

, which we denote by,

\begin{matrix} T (A, B) & = \int_{M (X_{α})} |Φ_{α} (A) Φ_{α} (B)| d Π_{Σ, α} (Φ_{α}) \\ = C Σ (A \times B) + \frac{2}{π} \sqrt{Σ (A \times A) Σ (B \times B) - Σ {(A \times B)}^{2}}, \end{matrix}

(39)

where

0 \leq C \leq 1

is a constant that depends only on the correlation coefficient between

Φ_{α} (A)

and

Φ_{α} (B)

(see [], p. 933).

Corollary 4.

Let

X

be a compact Polish space and let

A

be a directed set of partitions generated by a basis, as in Example 1. Let Σ be a covariance measure on

X \times X

. Consider

M (X)

with the tight topology and the centered Gaussian histogram system

(Π_{Σ, α}, φ_{* α β})

. If the covariance measure Σ is such that,

(i.): $sup_{α \in A} \sum_{A, B \in α} T (A, B) < \infty;$

(40)
(ii.): and, for any open $U_{n} \in \cup {σ (α) : α \in A}$ that decrease to ⌀,

$lim_{n \to \infty} sup_{α \in A} \sum {T (A, B) : A, B \in α, A, B \subset U_{n}} = 0,$

(41)

then there exists a unique Radon probability distribution

Π_{Σ}

on

M (X)

projecting to

Π_{Σ, α}

for all

α \in A

.

Proof.

To use Theorem 10, we first verify the coherence of Gaussian histogram systems. (We do this first step of the proof generically, that is, with

λ \neq 0

.) If

α, β \in A

,

α \leq β

, and we write

A_{i} = \cup_{k \in J_{α β} (i)} B_{k}

, then,

{(λ_{α})}_{i} = \sum_{k \in J_{α β} (i)} {(λ_{β})}_{k}, {(Σ_{α})}_{i j} = \sum_{k \in J_{α β} (i)} \sum_{l \in J_{α β} (j)} {(Σ_{β})}_{k l},

for

1 \leq i, j \leq | α |

. This can be expressed in terms of a linear mapping

P_{α β} : R^{| β |} \to R^{| α |}

such that,

λ_{α} = P_{α β} λ_{β}, Σ_{α} = P_{α β} Σ_{β} {(P_{α β})}^{t},

(where

{(P_{α β})}^{t}

denotes the matrix transpose of

P_{α β}

). Recall that for any finite

d, d^{'} \geq 1

, any linear

P : R^{d} \to R^{d^{'}}

and a random variable

Φ

distributed

N (λ, Σ)

, the random variable

P Φ

is distributed

N (P λ, P Σ P^{t})

. So

Φ_{α} \sim Π_{λ, Σ, α}

has the same distribution as

φ_{* α β} (Φ_{β})

for

Φ_{β} \sim Π_{λ, Σ, β}

, since

φ_{* α β} {(Φ_{β})}_{i} = \sum_{j \in J_{α β} (i)} {(Φ_{β})}_{j} = {(P_{α β} Φ_{β})}_{i},

for all

1 \leq i \leq | α |

. This verifies the coherence of the histogram system

(Π_{λ, Σ, α}, φ_{* α β})

.
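The coherence relations between the parameters of a fine and a coarse histogram can be made concrete on a small example. The following sketch is illustrative only: it represents a coarsening by the index sets J(i), builds the corresponding 0/1 matrix P, and pushes a center vector and a covariance matrix forward. The function names and the list-of-lists matrix representation are assumptions of this sketch.

```python
def coarsening_matrix(groups, n_fine):
    """0/1 matrix P with P[i][k] = 1 iff fine cell k lies in coarse cell i.

    groups[i] lists the fine indices J(i) whose union is the coarse cell A_i.
    """
    P = [[0.0] * n_fine for _ in groups]
    for i, J in enumerate(groups):
        for k in J:
            P[i][k] = 1.0
    return P


def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def push_forward(lam_fine, sigma_fine, groups):
    """Return (P lam, P Sigma P^t) for the coarsening described by groups."""
    P = coarsening_matrix(groups, len(lam_fine))
    lam_coarse = [sum(p * l for p, l in zip(row, lam_fine)) for row in P]
    sigma_coarse = matmul(matmul(P, sigma_fine), transpose(P))
    return lam_coarse, sigma_coarse
```

Coarsening in two steps gives the same parameters as coarsening in one step, which is the finite-dimensional content of coherence: the entries of the coarse covariance matrix are block sums of the fine one.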

In the rest of the proof, we assume that the histogram system is centered:

λ = 0

. To show that property (35) holds, we use Chebyshev’s inequality to upper-bound its left-hand side, for every

α \in A

and

M > 0

:

\begin{matrix} Π_{α} (\{Φ_{α} \in M (X_{α}) : {∥ Φ_{α} ∥}_{1, α} > M\}) & = Π_{Σ, α} (\{Φ_{α} \in M (X_{α}) : \sum_{A \in α} | Φ_{α} (A) | > M\}) \\ & = Π_{Σ, α} (\{Φ_{α} \in M (X_{α}) : \sum_{A, B \in α} | Φ_{α} (A) Φ_{α} (B) | > M^{2}\}) \\ & \leq \frac{1}{M^{2}} \int_{M (X_{α})} \sum_{A, B \in α} |Φ_{α} (A) Φ_{α} (B)| d Π_{Σ, α} (Φ_{α}) . \end{matrix}

Assuming Condition (40) and choosing M large enough, we see that property (35) is satisfied for every

ϵ > 0

and all

α \in A

. Let

α \in A

and

A \in α

be given. Because

α

is generated by a basis, A is the intersection

U \cap C

of an (open) finite intersection U of basis sets and a (closed) finite intersection C of complements of basis sets. Because

X

is a Polish space, U is

F_{σ}

, i.e., U is equal to a countable union of closed sets

U = \cup_{m \geq 1} C_{m}

. Then, the closed sets

C_{n}^{'} = C \cap (\cup_{1 \leq m \leq n} C_{m})

increase to A as

n \to \infty

. In the open complements of

C_{n}^{'}

in A, there exists a decreasing sequence of basis elements

U_{n}

, and we may define closed sets

K_{n} = A ∖ U_{n}

. The open sets

U_{n}

decrease to ⌀ and the sets

K_{n}

are closed as subsets of

X

and therefore compact. By assumption (see Example 1),

A

is such that all sets in the basis occur as elements of

σ (α)

for some

α \in A

. So, for every

n \geq 1

and every

α \in A

, there exists a

β \geq α

for which the two sets in the decomposition

A = U_{n} \cup K_{n}

satisfy

U_{n}, K_{n} \in σ (β)

.

Let

ϵ, δ > 0

be given. Based on Assumption (41), choose

n \geq 1

large enough such that,

sup_{γ \geq α} \sum {T (B, C) : B, C \in γ, B, C \subset U_{n}} < δ^{2} ϵ .

Let

γ \geq α

be given. Since

φ_{γ} (A) ∖ φ_{γ} (K_{n}) \subseteq φ_{γ} (U_{n})

,

\begin{matrix} Π_{Σ, γ} (\{Φ_{γ} \in M (X_{γ}) : | Φ_{γ} | (φ_{γ} (K_{n})) < | Φ_{γ} | (φ_{γ} (A)) - δ\}) & \leq Π_{Σ, γ} (\{Φ_{γ} \in M (X_{γ}) : | Φ_{γ} | (φ_{γ} (U_{n})) > δ\}) \\ & = Π_{Σ, γ} (\{Φ_{γ} \in M (X_{γ}) : \sum \{| Φ_{γ} (B) Φ_{γ} (C) | : B, C \in γ, B, C \subset U_{n}\} > δ^{2}\}) \\ & \leq \frac{1}{δ^{2}} \int_{M (X_{γ})} \sum_{B, C \subset U_{n}} |Φ_{γ} (B) Φ_{γ} (C)| d Π_{Σ, γ} (Φ_{γ}) = \frac{1}{δ^{2}} \sum_{B, C \subset U_{n}} T (B, C) < ϵ, \end{matrix}

showing that property (37) holds. □

In the above proof, we satisfy property (37) by roughly following the proof of Theorem 8, but with a different construction of the compacta

K_{n}

, which is more generic and based on Example 1. In applications, control over the choice of

α

allows for convenient constructions. Where the proof of Theorem 8 depends on the Carathéodory-like condition that

G_{α} (φ_{α} (U_{n})) \to 0

for (specific) sequences

(U_{n})

in the generating ring that decrease to ⌀, here, the second-absolute-moment set functions of Condition (41) are required to go to zero.

Based on Condition (40), we briefly come back to the space

L^{2} (Σ)

and indicate how it is related to the covariance structure of a centered Gaussian histogram limit

Π_{Σ}

. To this end, we define the real-valued stochastic integrals

Φ_{f} : = \int_{X} f d Φ

for

f \in C (X)

and consider the linear space of real-valued random variables

L = {Φ_{f} : f \in C (X)}

that they span. Assuming integrability, on L, we define the bilinear form,

{⟨ Φ_{f}, Φ_{g} ⟩}_{L} : = \int_{M (X)} Φ_{f} Φ_{g} d Π_{Σ} (Φ) .

With

Q_{L} = {Φ_{f} \in L : {⟨ Φ_{f}, Φ_{f} ⟩}_{L} = 0}

, the quotient space

L / Q_{L}

with

{⟨ \cdot, \cdot ⟩}_{L}

as an inner product is a real pre-Hilbert space, with Hilbert space completion denoted by

L^{2} (Π_{Σ})

. The following proposition involves histogram approximations of continuous functions: for any

f \in C (X)

and any partition

α

generated by a basis, let

f_{α} (A)

,

(A \in α)

be real numbers such that

inf {f (x) : x \in A} \leq f_{α} (A) \leq sup {f (x) : x \in A}

and let

f_{α} (x) = \sum_{A \in α} f_{α} (A) 1_{A} (x)

, noting that, for all

x \in X

,

f_{α} (x) \to f (x)

as

α

refines within an

A

that resolves

X

.

Proposition 12.

Let

X

be a compact Polish space and let

A

be a directed set of partitions generated by a basis, as in Example 1. Let Σ be a covariance measure for which Conditions (40) and (41) hold. Then, for all

f, g \in C (X)

,

{⟨ f, g ⟩}_{Σ} = lim_{α} \sum_{A, B \in α} f_{α} (A) g_{α} (B) Σ (A \times B) = {⟨ Φ_{f}, Φ_{g} ⟩}_{L},

and the mapping

C (X) \to L : f \mapsto Φ_{f}

extends to an isometric isomorphism of Hilbert spaces

L^{2} (Σ) \to L^{2} (Π_{Σ})

.

Proof.

We first show that the (semi-definite) inner-product spaces

C (X)

and L are isometrically isomorphic. Let

f, g : X \to R

be continuous and denote their supremum norms by

{∥ f ∥}_{\infty, X}, {∥ g ∥}_{\infty, X} < \infty

. By the continuity of

R^{2} \to R : (x, y) \mapsto x y

,

\begin{matrix} {⟨ f, g ⟩}_{Σ} & = \int_{X^{2}} f (x) g (y) d Σ (x, y) \\ = \int_{X^{2}} (lim_{α} \sum_{A \in α} f_{α} (A) 1_{A} (x)) (lim_{β} \sum_{B \in β} g_{β} (B) 1_{B} (y)) d Σ (x, y) \\ = \int_{X^{2}} lim_{γ} \sum_{A, B \in γ} f_{γ} (A) g_{γ} (B) 1_{A} (x) 1_{B} (y) d Σ (x, y) \\ = lim_{γ} \sum_{A, B \in γ} f_{γ} (A) g_{γ} (B) Σ (A \times B), \end{matrix}

and the last step holds by Lebesgue’s dominated convergence, based on the facts that

{∥ f ∥}_{\infty, X}, {∥ g ∥}_{\infty, X} < \infty

and

Σ (X \times X) < \infty

. Using the definition of

Φ_{α}

, we may then write,

\begin{matrix} {⟨ f, g ⟩}_{Σ} & = lim_{α} \sum_{A, B \in α} \int_{M (X_{α})} f_{α} (A) Φ_{α} (A) g_{α} (B) Φ_{α} (B) d Π_{Σ, α} (Φ_{α}) \\ = lim_{α} \int_{M (X)} \sum_{A, B \in α} f_{α} (A) Φ (A) g_{α} (B) Φ (B) d Π_{Σ} (Φ) . \end{matrix}

To complete the argument, note that,

|\sum_{A, B \in α} f_{α} (A) Φ (A) g_{α} (B) Φ (B)| \leq {∥ f ∥}_{\infty, X} {∥ g ∥}_{\infty, X} \sum_{A, B \in α} |Φ (A) Φ (B)|,

and the right-hand side is monotone-increasing in

α

. By monotone convergence,

lim_{α \in A} \sum_{A, B \in α} \int_{M (X)} |Φ (A) Φ (B)| d Π_{Σ} (Φ) = \int_{M (X)} lim_{α \in A} \sum_{A, B \in α} |Φ (A) Φ (B)| d Π_{Σ} (Φ),

so that Condition (40) asserts the integrability of the function

Φ \mapsto {lim}_{α} \sum_{A, B \in α} | Φ (A) Φ (B) |

. So, using again the continuity of

R^{2} \to R : (x, y) \mapsto x y

and dominated convergence,

\begin{matrix} {⟨ f, g ⟩}_{Σ} & = \int_{M (X)} lim_{α} \sum_{A \in α} f_{α} (A) Φ (A) \sum_{B \in α} g_{α} (B) Φ (B) d Π_{Σ} (Φ) \\ = \int_{M (X)} (lim_{α} \sum_{A \in α} f_{α} (A) Φ (A)) (lim_{β} \sum_{B \in β} g_{β} (B) Φ (B)) d Π_{Σ} (Φ) \\ = \int_{M (X)} Φ_{f} Φ_{g} d Π_{Σ} (Φ) . \end{matrix}

This implies that for any

f \in C (X)

,

{⟨ f, f ⟩}_{Σ} = 0

if and only if

{⟨ Φ_{f}, Φ_{f} ⟩}_{L} = 0

. Consequently, the pre-Hilbert spaces

C (X) / Q_{Σ}

and

L / Q_{L}

are in isometrically isomorphic correspondence, and so are their (unique) Hilbert space completions. □
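The histogram approximation of the inner product in Proposition 12 can be illustrated numerically. The sketch below is a rough illustration under stated assumptions: X = [0, 1], uniform partitions, midpoints as the values f_α(A), and a covariance measure given by a kernel k as in Example 7, with Σ(A × B) approximated by the midpoint rule. None of these choices come from the proposition itself.

```python
def histogram_inner_product(f, g, kernel, n):
    """Approximate <f, g>_Sigma on X = [0, 1] by sum_{A,B} f_a(A) g_a(B) Sigma(A x B).

    Uses the uniform partition into n intervals, midpoints as the values f_a(A),
    and the midpoint rule Sigma(A x B) ~ kernel(x_A, x_B) / n**2 (an approximation).
    """
    mids = [(i + 0.5) / n for i in range(n)]
    cell = 1.0 / (n * n)
    return sum(f(x) * g(y) * kernel(x, y) * cell for x in mids for y in mids)
```

For the constant kernel k = 2 with f(x) = x and g(x) = x², the exact value of the inner product is 2 · (1/2) · (1/3) = 1/3, and the histogram sums approach it as the partition is refined.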

To conclude this subsection, we briefly consider two statistical perspectives: the frequentist question of estimation of the covariance measure and the Bayesian question of using Gaussian histogram limits to define priors on spaces of probability measures.

Remark 5.

Consider the statistical estimation of the covariance measure Σ from observed data. Assume independent and identically distributed observations

Φ_{1}, Φ_{2}, \dots

, each distributed marginally according to the Gaussian histogram limit

Π_{Σ}

for some fixed covariance measure Σ. The idealized, direct question of how to estimate Σ from the observed sample is difficult because the data is of a functional nature: in any practical way, observing points in a space of measures amounts to the observation of some approximation or projection. In the present context, we interpret the data points as histograms: for some sample size

n \geq 1

and some

α \in A

, we observe independent and identically distributed

Φ_{α, 1}, \dots, Φ_{α, n}

. The question of estimating

Σ_{α}

is then the textbook question of estimating the covariance matrix based on an independent and identically distributed sample from a multivariate normal distribution. This is a smooth parametric estimation problem, with best-regular estimators

{\hat{Σ}}_{α, n}

displaying

n^{- 1 / 2}

-convergence, optimal asymptotic covariance and optimal Wald-type confidence sets.

If the functional data

Φ_{1}, Φ_{2}, \dots

is expressed through any

α \in A

(that is, if the statistician can choose the partition before seeing the data

Φ_{α, 1}, \dots, Φ_{α, n}

), it is important to note that, like the approximation of probability densities in Lemma 1, it is possible to approximate Σ by histograms in total variation:

{∥ Σ_{α} - Σ ∥}_{1, X \times X} \to 0,

(as

α

refines in

A

). In such a setting, we may refine partitions

α_{n} \leq α_{n + 1}

as the sample size n increases, ideally in such a way that the estimation error,

∥ {\hat{Σ}}_{α_{n}, n} - Σ_{α_{n}} ∥_{1, X_{α_{n}} \times X_{α_{n}}} \to 0,

(as

n \to \infty

) and the histogram approximation error

{∥ Σ_{α_{n}} - Σ ∥}_{1, X \times X}

are of comparable order.
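For a fixed partition, the estimation step described in this remark is elementary. The sketch below is a minimal illustration, assuming centered histograms (so that no sample mean is subtracted) and using the maximum-likelihood estimator of the covariance matrix. The function names and the entrywise distance used to measure estimation error are assumptions of this sketch.

```python
def sample_covariance(samples):
    """Maximum-likelihood estimate of Sigma_alpha for centered histograms.

    samples is a list of n histograms, each a list of |alpha| real components.
    Because the system is centered, no sample mean is subtracted.
    """
    n = len(samples)
    d = len(samples[0])
    return [[sum(s[i] * s[j] for s in samples) / n for j in range(d)] for i in range(d)]


def total_variation_distance(S, T):
    """Entrywise l1 distance between two matrices.

    On the finite space X_alpha x X_alpha this is the total-variational norm
    of the difference of the two (signed) histogram measures.
    """
    return sum(abs(a - b) for rs, rt in zip(S, T) for a, b in zip(rs, rt))
```

In a refinement scheme, one would evaluate this distance along a sequence of partitions and compare it with the histogram approximation error, which requires knowledge of Σ and is therefore not computed here.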

Remark 6.

To illustrate the statistical possibilities also from the Bayesian perspective, consider the following family of non-parametric priors for measure spaces: since Gaussian histogram limits describe random signed measures, there is no direct Bayesian interpretation for a Gaussian histogram limit as a prior on a statistical model. However, a Gaussian random measure Φ can be conditioned to be positive, and if,

Π_{Σ} (0 < {∥ Φ ∥}_{1, X} < \infty | Φ \geq 0) = 1,

the conditioned random element can be normalized to a random probability measure, analogous to the (Bayesian) normalization of positive completely random measures (e.g., the Gamma process of Example 8). The resulting class of normalized conditionally positive Gaussian priors enables the novel option of describing continuous-singular random probability measures with histograms, rather than only random discrete probability measures (e.g., the Dirichlet random measures of Section 6).

8.3. Existence of Weak Gaussian Histogram Limits

In Section 8.1, we saw that for the existence of tight histogram limits, condition (35) (which is close to necessary, cf. Remark 4) says that the limiting random

Φ

has a norm

{∥ Φ ∥}_{1, X}

that is a tight real-valued random variable. We shall see in the present subsection that for the existence of weak Gaussian histogram limits, it is sufficient that

{∥ Φ ∥}_{1, X}

is an integrable random variable.

Let

X

be a compact Hausdorff space, fix the topology on

M (X)

to be the weak topology and choose a directed set

A

of finite partitions in non-empty Borel sets. Then,

X

,

A

and

M (X)

satisfy the minimal conditions of Definition 1. The compactness of a subset of

M (X)

is still characterized by the Dunford–Pettis–Grothendieck condition, but with P replaced by the positive measure

| Φ |

: a subset H of

M (X)

is relatively compact in the weak topology if and only if for some positive, bounded measure

Q \in M^{+} (X)

,

sup_{Φ \in H} {∥| Φ | - | Φ | \land L Q∥}_{1, X} \to 0,

as

L \to \infty

. The proof of the existence theorem for histogram limits that are Radon with respect to the weak topology does not differ substantially from that of Theorem 2, so we state the result without proof.

Theorem 11.

Let

X

be a compact Hausdorff space, consider

M (X)

with the weak topology and choose a directed set of finite partitions

A

in non-empty Borel sets that resolves

X

. Let

(Π_{α}, φ_{* α β})

be a coherent system of Borel probability measures on the inverse system

(M (X_{α}), φ_{* α β})

. There exists a unique weakly-Radon probability measure Π on

M (X)

projecting to

Π_{α}

for all

α \in A

, if and only if, there is a

Q \in M_{b}^{+} (X)

such that for every

ϵ, δ > 0

, there is an

L > 0

such that,

Π_{α} ({Φ_{α} \in M (X_{α}) : {∥| Φ_{α} | - | Φ_{α} | \land L Q_{α}∥}_{1, X_{α}} > δ}) < ϵ,

(42)

for all

α \in A

.
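On a finite partition, the norm appearing in Condition (42) reduces to a finite sum, because the lattice minimum of two measures on a finite space is taken cell by cell. The sketch below illustrates this reduction; the function name and the list representation of histograms are assumptions made for illustration.

```python
def excess_norm(phi, q, L):
    """|| |phi| - |phi| ^ (L q) ||_1 for histograms phi, q on the same finite partition.

    phi: signed histogram components; q: non-negative components; L > 0.
    The minimum ^ is taken componentwise, so each cell contributes
    max(|phi(A)| - L q(A), 0).
    """
    return sum(max(abs(p) - L * w, 0.0) for p, w in zip(phi, q))
```

The quantity is non-increasing in L and vanishes as soon as L Q_α dominates |Φ_α| on every cell, which is why Condition (42) asks for a single L that makes it small with high probability, uniformly in α.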

Given a Radon probability measure

Π

on

M (X)

, the role of the mean measure G of Definition 5 as the dominating measure is taken over by the positive measure,

Q (B) = \int_{M (X)} | Φ | (B) d Π (Φ),

for

B \in B

, and

Q (B) = 0

implies

Π ({Φ \in M (X) : | Φ | (B) > 0}) = 0

(as in Lemma 3). Proposition 4 can be adapted to the signed case as well:

{Φ \in M (X) : | Φ | ≪ Q}

is closed in

M (X)

and,

{supp}_{T_{1}} (Π) \subset {Φ \in M (X) : | Φ | ≪ Q} .

Below, we apply Theorem 11 to Gaussian histogram systems. To prepare, it is noted that for all

α \in A

and all

A \in α

,

Q_{α} (A) = \int | Φ_{α} | (A) d Π_{α} (Φ_{α}) = \sqrt{\frac{2}{π} Σ (A \times A)}.

(43)

Clearly, the resulting set functions

Q_{α} : σ (α) \to [0, \infty)

are not the

σ (α)

-restrictions of the measure Q (contrary to the cases of positive or probability histogram limits, where all

G_{α}

are restrictions of the mean measure G). For every

α \in A

and all

A \in α

,

| Φ_{α} | (A) \leq | Φ | (A)

, and, hence,

Q_{α} (A) = \int | Φ_{α} | (A) d Π_{α} (Φ_{α}) \leq \int | Φ | (A) d Π (Φ) = Q (A)

. If we assume that

A

resolves

X

, the positive measures

| Φ_{α} |

(and the total-variational norms

∥ Φ_{α} ∥_{1, X_{α}} = | Φ_{α} | (X_{α})

) increase to

| Φ |

(and the total-variational norm

{∥ Φ ∥}_{1, X} = | Φ | (X)

).

Corollary 5.

Let

X

be a compact Hausdorff space and let

A

be a directed set of partitions generated by a basis that resolves

X

. Let Σ be a covariance measure on

X \times X

. Consider

M (X)

with the weak topology and the centered Gaussian histogram system

(Π_{Σ, α}, φ_{* α β})

. If,

Q (X) = sup_{α \in A} \sum_{A \in α} \sqrt{Σ (A \times A)} < \infty,

(44)

then there exists a unique weakly Radon probability distribution

Π_{Σ}

on

M (X)

projecting to

Π_{Σ, α}

for all

α \in A

.

Proof.

Let

ϵ, δ > 0

be given and choose

L = \sqrt{π} S / (\sqrt{2} ϵ δ)

, where

S > 0

denotes any upper bound for the left-hand side of Condition (44). Due to the bound (26) and Markov’s inequality, we have for any

α \in A

,

\begin{matrix} Π_{Σ, α} (\{Φ_{α} \in M (X_{α}) : {∥| Φ_{α} | - | Φ_{α} | \land L Q_{α}∥}_{1, X_{α}} > δ\}) & \leq Π_{Σ, α} (\{Φ_{α} \in M (X_{α}) : \frac{1}{L} \sum_{A \in α} \frac{Φ_{α} {(A)}^{2}}{Q_{α} (A)} > δ\}) \\ & \leq \frac{1}{L δ} \int_{M (X_{α})} \sum_{A \in α} \frac{Φ_{α} {(A)}^{2}}{Q_{α} (A)} d Π_{Σ, α} (Φ_{α}) \leq \frac{1}{L δ} \sum_{A \in α} \sqrt{\frac{π}{2} Σ (A \times A)} < ϵ, \end{matrix}

where we use the facts that, for every

α \in A

and all

A \in α

,

\int Φ_{α} {(A)}^{2} d Π_{Σ, α} (Φ_{α}) = Σ (A \times A)

and (43). □
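For reference, the ingredients of the last display can be written out as follows. The elementary inequality in the first line is our reading of the bound (26), which is stated outside this section, so it should be treated as an assumption; cells with Σ(A × A) = 0 contribute nothing and are left out.

```latex
% For x \ge 0 and a > 0: (x - a)_+ \le x^2 / a,
% applied cellwise with x = |\Phi_\alpha(A)| and a = L Q_\alpha(A).
\bigl\| |\Phi_\alpha| - |\Phi_\alpha| \wedge L Q_\alpha \bigr\|_{1, X_\alpha}
  = \sum_{A \in \alpha} \bigl( |\Phi_\alpha(A)| - L Q_\alpha(A) \bigr)_+
  \le \frac{1}{L} \sum_{A \in \alpha} \frac{\Phi_\alpha(A)^2}{Q_\alpha(A)},
\qquad
\int \frac{\Phi_\alpha(A)^2}{Q_\alpha(A)} \, d\Pi_{\Sigma, \alpha}
  = \frac{\Sigma(A \times A)}{\sqrt{\tfrac{2}{\pi} \Sigma(A \times A)}}
  = \sqrt{\tfrac{\pi}{2} \Sigma(A \times A)},
\qquad
\frac{1}{L \delta} \sqrt{\tfrac{\pi}{2}} \, S
  = \frac{\sqrt{2} \, \epsilon \, \delta}{\sqrt{\pi} \, S}
    \cdot \frac{\sqrt{\pi} \, S}{\sqrt{2} \, \delta}
  = \epsilon .
```

The final bound is strict whenever S is a strict upper bound for the sums of the square roots of Σ(A × A) over α.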

Two remarks are in order. First, we relate Condition (44) for the existence of weak Gaussian limits to Condition (40) for the existence of tight Gaussian limits by the Cauchy–Schwarz inequality:

\begin{matrix} \sum_{A, B \in α} \int_{M (X_{α})} & |Φ_{α} (A) Φ_{α} (B)| d Π_{Σ, α} (Φ_{α}) \\ \leq \sum_{A, B \in α} \sqrt{Σ (A \times A)} \sqrt{Σ (B \times B)} = {(\sum_{A \in α} \sqrt{Σ (A \times A)})}^{2}, \end{matrix}

showing that (44) implies (40). Second, we note that Corollary 1 stays valid in the signed case, so if Condition (44) holds, there exists a unique Radon probability measure

Π_{Σ}

on

M (X)

with the total-variational topology, projecting to

Π_{Σ, α}

for all

α \in A

.

Example 8.

Let

X

be a compact subset of

R^{d}

. Consider Example 7 with a kernel function that is constant:

k (x, y) = c

for some

c > 0

. Let

A

consist of partitions α with the property that for all

A, A^{'} \in α

, there exists a translation vector x in

R^{d}

such that the Lebesgue measure of

(A ∖ (A^{'} - x)) \cup ((A^{'} - x) ∖ A)

is zero. Then, for every α,

Σ_{α}

is the

| α | \times | α |

-matrix with all entries equal to,

Σ_{α, i j} = \int_{A_{i} \times A_{j}} c d x d y = c μ (A_{i}) μ (A_{j}) = c μ_{α}^{2},

(where

μ_{α} = μ (A)

for any

A \in α

). Clearly, the corresponding covariance measure Σ gives rise to positive semi-definite covariance matrices

Σ_{α}

, with random histogram components that are highly dependent: in fact, the linear space of all

μ_{α} \in M (X_{α})

with components that sum to zero forms the kernel of

Σ_{α}

:

Ker (Σ_{α}) = {μ_{α} \in M (X_{α}) : \sum_{i \in I (α)} μ_{α, i} = 0},

and a centered multivariate normal distribution is supported on the range of its covariance matrix. This means that

Φ_{α}

lies on the diagonal of

M (X_{α})

with probability one:

Π_{Σ, α} ({Φ_{α} \in M (X_{α}) : Φ_{α, 1} = \dots = Φ_{α, | α |}}) = 1 .

(45)

Note that

sup_{α \in A} \sum_{A \in α} \sqrt{Σ (A \times A)} = sup_{α \in A} \sum_{A \in α} \sqrt{c} μ (A) = \sqrt{c} μ (X) < \infty,

(46)

so, according to Corollary 5, there exists a weakly-Radon probability measure

Π_{Σ}

on

M (X)

projecting to

Π_{Σ, α}

for all

α \in A

.
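The computation in (46) can be reproduced in a simple special case. The sketch below assumes X = [0, 1] with Lebesgue measure and uniform partitions into n intervals (one admissible choice of translation-congruent cells). The function names are hypothetical.

```python
import math


def weak_condition_sum(n, c):
    """sum_A sqrt(Sigma(A x A)) for the constant kernel k = c on X = [0, 1],
    with the uniform partition into n intervals, so Sigma(A x A) = c * (1/n)**2."""
    return sum(math.sqrt(c * (1.0 / n) ** 2) for _ in range(n))


def constant_kernel_covariance(n, c):
    """Covariance matrix Sigma_alpha: every entry equals c * mu_alpha**2."""
    mu = 1.0 / n
    return [[c * mu * mu for _ in range(n)] for _ in range(n)]
```

The sum does not depend on n, in line with (46), and any vector whose components sum to zero is mapped to zero by the covariance matrix, in line with the description of its kernel.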

Additionally, a moment’s thought shows that the above example serves as a bound for a host of examples based on the reproducing kernels of Example 7.

Corollary 6.

Let

X

be a compact Hausdorff space and let

A

be a directed set of partitions generated by a basis that resolves

X

. Let Σ be a covariance measure based on a bounded kernel function

k : X \times X \to R

. Then, there exists a unique weakly-Radon probability distribution

Π_{Σ}

on

M (X)

projecting to

Π_{Σ, α}

for all

α \in A

.

Proof.

Assume first that

A

is as in Example 8. Since, for some

c > 0

and all

x, y \in X

,

| k (x, y) | \leq c

, we have, for all

α

and all

A \in α

,

Σ (A \times A) \leq c μ {(A)}^{2}

. Summability then follows as in Inequality (46), proving the existence of a weakly-Radon histogram limit

Π_{Σ}

. For any Borel-measurable partition

α

of

X

, the weakly continuous mapping

φ_{* α} : M (X) \to M (X_{α}) : μ \mapsto (μ (A_{1}), \dots, μ (A_{| α |}))

induces the Gaussian histogram distribution

Π_{Σ, α} = N (0, Σ_{α})

on

M (X_{α})

. The uniqueness of the limit proves the assertion. □

Over the last two decades there has been considerable interest in the so-called Gaussian free field (see, e.g., refs. [,]); below, we use Green’s functions for the harmonic operators in

d = 1, 2, \dots

as covariance kernels to define Gaussian histogram systems in the closure

X

of a non-empty, bounded, open subset of

R^{d}

.

Example 9.

We consider the existence question for the Gaussian free field first in

d = 1

: the Green’s function

G_{1} : X \times X \to R

is of the form,

G_{1} (x, y) = - | x - y | + f (x, y),

where

x \mapsto f (x, y)

is harmonic in x for every y. Define,

Σ_{1} (A \times B) = \int_{A \times B} G_{1} (x, y) d x d y,

(where we choose f such that

Σ_{1}

is symmetric and positive-definite). Choose a directed set

A

of partitions generated by a basis that resolves

X

. Based on Corollary 6, we see immediately that the associated centered Gaussian histogram system with histogram distributions

Π_{Σ_{1}, α}

has a limit

Π_{Σ_{1}}

that is a weakly Radon probability measure on

M (X)

. Then,

Q (B) = \int | Φ | (B) d Π_{Σ_{1}} (Φ)

, (

B \in B

), is a multiple of the Lebesgue measure due to translation invariance and,

Π_{Σ_{1}} ({Φ \in M (X) : Φ ≪ Q}) = 1,

implying that the random element

Φ \sim Π_{Σ_{1}}

is of the form,

Φ (B) = \int_{B} ϕ (x) d x,

for all

B \in B

, where ϕ is a random Radon–Nikodym density function in

L^{1} (X)

, the Banach space of Lebesgue integrable functions on

X

.

For

d = 2, 3, \dots

, the situation changes drastically (see Figure 1 and Figure 2): Green’s functions

G_{d} : X \times X \to R

are unbounded and display singular behavior in neighborhoods of the diagonal, namely,

G_{2} (x, y) = - log | x - y |

and

G_{d} (x, y) = {| x - y |}^{- (d - 2)}

for

d \geq 3

. To apply Corollary 6, we modify

G_{d}

for small length scales to regularize the singular behavior near the diagonal (e.g., for some small

ϵ > 0

, replace

{| x - y |}^{- (d - 2)}

by

{(| x - y | + ϵ)}^{- (d - 2)}

, which replaces the pole for

x = y

with an upper bound

1 / ϵ^{d - 2}

). The modified

G_{d}

are bounded kernel functions, and Corollary 6 guarantees that for every

ϵ > 0

, there exists a weak histogram limit

Π_{Σ_{d}, ϵ}

. One may hope that the limit of

Π_{Σ_{d}, ϵ}

as

ϵ \to 0

(which may exist only as a tightly Radon probability measure) describes the so-called Gaussian free field in d Euclidean dimensions. In light of earlier explorations (see, for example, ref. [] for a detailed overview of (mostly) the

d = 2

case), it is also possible that the

ϵ \to 0

limit exists only if we embed the space

M (X)

of Radon measures on

X

in spaces of distributions on

X

. The limiting probability distribution

Π_{Σ_{d}}

would then describe a random generalized function of the type discussed in [] (Ch. IX, §6, No. 10) and [].
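The regularization described in Example 9 can be sketched in a few lines. The sketch assumes that, for d ≥ 3, the singular kernel is proportional to the distance raised to the power −(d − 2), which is consistent with the upper bound 1/ϵ^{d − 2} quoted in the example, and that the regularized kernel is obtained by adding ϵ to the distance. The function name and the omission of normalizing constants are assumptions of this sketch.

```python
import math


def regularized_green(x, y, d, eps):
    """Regularized kernel (|x - y| + eps)**(-(d - 2)) for d >= 3 (assumed form).

    x, y are points of R^d given as sequences of length d; eps > 0.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return (dist + eps) ** (-(d - 2))
```

On the diagonal the regularized kernel takes its maximal value 1/ϵ^{d − 2}, so it is a bounded kernel function for every fixed ϵ > 0.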

Figure 1. A sample from a random histogram on a 64 × 64 (lower-right) partitioned square patch of two-dimensional Euclidean space-time with the Green’s function for the Laplacian to define the covariance measure and its 32 × 32 (lower-left), 16 × 16 (upper-right) and 8 × 8 (upper-left) coarsened histograms. Coherence of the histogram system says that the distributions of the random 8 × 8, 16 × 16 and 32 × 32 histograms must equal the distributions implied by the coarsening of the random 64 × 64 histogram. The histogram limit is the random object obtained by infinite refinement.

Figure 2. Samples from Gaussian random histograms on a 64 × 64 partitioned two-dimensional square slice of Euclidean space-time, with the Green’s function for the Laplacian to define the covariance measure, in two dimensions (upper-left), three dimensions (upper-right) and four dimensions (lower-left), alongside a sample with the Yukawa potential of a massive scalar boson field in four dimensions (lower-right).

8.4. Completely Random Gaussian Histogram Limits

The class of (centered) Gaussian histogram limits has a non-empty intersection with the class of completely random measures, characterized by covariance measures that place all mass on the diagonal. Completely random Gaussian histogram limits exist in the fixed or random atomic phase.

Definition 13.

Let Σ be a covariance measure on

X \times X

. We say that Σ is diagonal if for all

A, B \in B

,

Σ (A \times B) = Σ (A \cap B \times A \cap B)

.

Note that with a diagonal

Σ

, the set function

τ : B \to [0, \infty)

defined by

τ (B) = Σ (B \times B)

for all

B \in B

is a positive Radon measure on

X

, and the Hilbert space

L^{2} (Σ)

is isometrically isomorphic to

L^{2} (τ)

, the usual space of

τ

-square-integrable functions on

X

.

A diagonal covariance measure leaves histogram components independent and leads to a completely random limit, which is of a fixed- or random-atomic nature also in the case of a signed completely random measure. If a Gaussian histogram system with a diagonal covariance measure has a tight limit and

Σ

distributes its mass in a uniformly asymptotically negligible way (see (Section XVII.7) of []), infinite divisibility of the distribution of the random variable

{∥ Φ ∥}_{1, X}

is implied.

Corollary 7.

Let

X

be a compact Polish space and let

A

be a directed set of partitions that resolves

X

and is generated by a basis. Assume that Σ is a diagonal covariance measure. Then, the centered Gaussian histogram system

(Π_{Σ, α}, φ_{* α β})

has a unique tightly-Radon probability distribution

Π_{Σ}

on

M (X)

projecting to

Π_{Σ, α}

for all

α \in A

. If, in addition,

lim_{α \in A} max_{A \in α} τ (A) = 0,

(47)

then the total-variational norm

{∥ Φ ∥}_{1, X}

has a probability distribution that is infinitely divisible.

Proof.

For a diagonal covariance measure

Σ

, any

α

and any

t \in R

, the restrictions of cumulant measures

λ_{t}

of Definition 7 to the

σ

-algebras

σ (α)

are given by,

\begin{matrix} λ_{t} (B) & = log \int_{M (X_{α})} e^{t Φ_{α} (B)} d Π_{Σ, α} (Φ_{α}) = log \int_{M (X_{α})} \prod_{A \subset B} e^{t Φ_{α} (A)} d Π_{Σ, α} (Φ_{α}) \\ = log \prod_{A \subset B} \int_{R} e^{t Φ_{α} (A)} d Π_{Σ, α} (Φ_{α} (A)) = log \prod_{A \subset B} e^{\frac{1}{2} t^{2} τ (A)} = \frac{t^{2}}{2} \sum_{A \subset B} τ (A) = \frac{t^{2}}{2} τ (B), \end{matrix}

for any

B \in σ (α)

. By the Carathéodory extension, all cumulants are therefore finite positive measures and hence, by [] (Theorem 4.4), there exists a tight completely random limit described by a marked Poisson process [] (Ch. 9–10) (with marks in

R

rather than

(0, \infty)

), of the form (21).

With a diagonal covariance measure

Σ

, the components of

Φ_{α}

are independent random variables for every

α \in A

. Therefore, the norms of the random histograms

Φ_{α}

,

{∥Φ_{α}∥}_{1, α} = \sum_{A \in α} |Φ_{α} (A)|,

are sums of independent terms in a triangular array, and uniform asymptotic negligibility, as assumed in (47), is sufficient for tight convergence to an infinitely divisible limiting probability distribution [] (Section XVII.7). Since the total-variational norm

{∥ Φ ∥}_{1, X}

is the monotone limit of the norms

∥ Φ_{α} ∥_{1, α}

(Proposition 1), its probability distribution is tight and infinitely divisible. □

Example 10.

Let

X = [0, 1]

, let

A

consist of partitions in half-open intervals, generated by a basis and collectively fine enough to resolve

X

. Consider a centered Gaussian histogram system with diagonal covariance measure Σ defined by choosing τ equal to the Lebesgue measure. With sets

A_{α, i}

of the form

(s_{α, i}, s_{α, i + 1}]

(and the singleton

{0}

, for which we set

Φ_{α} ({0}) = 0

for all

α \in A

), the histogram system,

(Φ_{α} (A_{α, 1}), \dots, Φ_{α} (A_{α, | α |})) \sim \prod_{i = 1}^{| α |} N (0, s_{α, i + 1} - s_{α, i}),

describes the independent, normally distributed increments of Brownian motion started from

B (0) = 0

, so the (random) Stieltjes function

B : [0, 1] \to R

for the measure Φ,

B (t) = inf_{α \in A} \sum {Φ_{α} (A_{α, i}) : 1 \leq i \leq | α |, s_{α, i} < t},

is a version of the sample path of Brownian motion on

[0, 1]

. Since

A

resolves

X

, the Lebesgue measures of all intervals

(s_{α, i}, s_{α, i + 1}]

go to zero as α increases in

A

, and Condition (47) is satisfied. (By extension, if we replace normal distributions by stable distributions in this construction appropriately, infinite divisibility preserves the coherence, and the existence of Φ, cf. Theorem 10, implies random Stieltjes functions corresponding to right-continuous versions of Lévy sample paths on

[0, 1]

.)
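The histogram system of Example 10 is easy to simulate on dyadic partitions. The sketch below is a minimal illustration, assuming dyadic partitions of (0, 1] (one particular choice of refining partitions in half-open intervals) and a seeded pseudo-random generator. The function names are hypothetical.

```python
import math
import random


def sample_dyadic_histogram(level, rng):
    """Independent N(0, 2**-level) components on the 2**level dyadic intervals of (0, 1]."""
    sd = math.sqrt(2.0 ** -level)
    return [rng.gauss(0.0, sd) for _ in range(2 ** level)]


def coarsen(phi):
    """Push a dyadic histogram forward to the next coarser level by adding neighbours."""
    return [phi[2 * i] + phi[2 * i + 1] for i in range(len(phi) // 2)]


def stieltjes_path(phi):
    """Partial sums B(s_i) at the right endpoints of the partition cells."""
    path, total = [], 0.0
    for increment in phi:
        total += increment
        path.append(total)
    return path
```

Coarsening a sampled histogram leaves the values of the Stieltjes function at the coarse grid points unchanged, so refinement only adds detail between grid points; this is the path-level counterpart of coherence.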

Weak histogram limits with diagonal covariance measures display a limitation similar to that of Theorem 7. To appreciate the problem, note that for a diagonal covariance measure with positive

τ

dominated by the Lebesgue measure, Condition (44) cannot be satisfied. Purely atomic measures

τ

, however, lead to Gaussian histogram systems with weak limits.
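The contrast between the two cases can be seen in a direct computation of the sums in Condition (44). The sketch below assumes τ equal to the Lebesgue measure on [0, 1] with uniform partitions in the first function, and a purely atomic τ with finitely many atoms in the second. Both restrictions, as well as the function names, are assumptions made for illustration.

```python
import math


def weak_sum_lebesgue(n):
    """sum_A sqrt(tau(A)) for tau = Lebesgue measure on [0, 1], n equal cells."""
    return n * math.sqrt(1.0 / n)  # equals sqrt(n), unbounded in n


def weak_sum_atomic(weights):
    """sum over atoms of sqrt(weight): the value reached once a partition
    separates the finitely many atoms of a purely atomic tau (assumed finite)."""
    return sum(math.sqrt(w) for w in weights)
```

In the Lebesgue case the sum grows like the square root of the number of cells, so the supremum in (44) is infinite. In the atomic case, the square root is subadditive, so coarser partitions give smaller sums and the supremum equals the finite sum over the atoms.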

Corollary 7 and Example 8 form two extremes: in diagonal cases, Gaussian histogram limits manifest in the fixed- or random-atomic phase, while for covariance measures that spread their mass more homogeneously over

X \times X

, dependence between histogram components introduces a degree of smoothness, a situation that we have seen in its most extreme form in (the highly non-diagonal) Example 8. Gaussian histogram limits for other covariance measures

Σ

are somewhere in between: depending on the degree to which

Σ

-mass is located away from the diagonal, corresponding to the degree of dependence between Gaussian histogram components, the histogram limit may manifest in close-to-atomic (i.e., highly concentrated) or smooth/closer-to-constant form.

To demonstrate the explanatory value of the phase structure of Gaussian histogram limits described above, the last example, which is analyzed more comprehensively in forthcoming work, suggests the applicability of Gaussian histogram limits in (Euclidean) quantum field theory [].

Example 11.

Let

d \geq 1

be given and consider

X = {p \in R^{d} : ∥ p ∥ \leq Λ}

for some constant

Λ > 0

, and Σ diagonal with,

τ (B) = \int_{B} \frac{1}{p^{2}} d p,

for all

B \in B

. The space

X

plays the role of d-dimensional Euclidean ‘momentum space’ (and the constant Λ that makes

X

compact is known as the UV cutoff scale in physics). The kernel defining Σ is interpreted as the (unregularized, Euclidean) ‘propagator’ of the massless scalar field (roughly, the Green’s function for the Laplace operator, which is represented by the convolution kernel

- p^{2} δ_{d} (p - q)

in momentum space). The diagonal Gaussian histogram limit exists, cf. Corollary 7.

We point out the following consequence of the phase of this Gaussian histogram limit. In this case, ‘quantization’, a description of the field in terms of particles, emerges as a consequence of complete randomness: the Gaussian histogram limit is in the random-atomic phase and manifests as a random sum of discrete point masses in Euclidean momentum space. Such configurations have an immediate physical interpretation, as states describing (off-shell) particles, point-like quanta of momentum. It is noted that the emergence of quantization is not a feature of (second-quantized) quantum field theory: in the physical theory of quantum fields, particles are axiomatic and introduced by hand with the formal introduction of a Fock space to describe quantum states of the field (see the classical work [] (p. 106)).

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the author.

Acknowledgments

The author thanks Jan van Mill, Harm de With and Georg Meyl for numerous insightful discussions at various stages in the development of this work. The author also wishes to thank the University Torino and University Bocconi, Milano, for their hospitality.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A. Topologies on Measure Spaces

The locally convex topologies that play a role in vector spaces of measures are discussed here in some detail, with special attention to compactness criteria and subspaces of probability measures. In the main text, two topologies feature centrally, the tight and weak topologies (with a smaller role for the total-variational topology). While we mention a minimal set of definitions necessary in the main text in Section 2.1, here, we consider the weak topology in some more detail. We assume that the reader is familiar with the much-better-known total-variational and tight topologies, and we mention Prokhorov’s condition for tight compactness for reference in the main text.

From a measure-theoretic perspective, the canonical duality for a vector space of bounded measures is formulated with a space of bounded measurable functions. Let

(X, F)

be a measurable space and denote its vector space of bounded measures by

M_{b} (X, F)

(a measure is bounded if

{∥ μ ∥}_{1, X} = | μ | (X) = μ_{+} (X) + μ_{-} (X) < \infty

). Let

L^{\infty} (X, F)

denote the vector space of bounded measurable functions on

X

. For

μ \in M_{b} (X, F)

and

f \in L^{\infty} (X, F)

,

⟨ μ, f ⟩ = \int_{X} f d μ

defines a bilinear form and a sub-basis of sets

W_{f} = {μ \in M_{b} (X, F) : | ⟨ μ, f ⟩ | < 1}

for the weak topology

T_{1} = σ (M_{b} (X, F), L^{\infty} (X, F))

. The weak topology is the initial topology for the mappings

μ \mapsto ⟨ μ, f ⟩

,

f \in L^{\infty} (X, F)

, a locally convex topology with semi-norms

ρ_{f} (μ) = | ⟨ μ, f ⟩ |

. A net (sequence, filter)

μ_{i}

, (

i \in I

), converges weakly to

μ

if for every

f \in L^{\infty} (X, F)

,

\int f d μ_{i} \to \int f d μ

. The space

M_{b} (X, F)

with the weak topology is, in general, not metrizable, not complete and not separable, making it rather inaccessible compared to the tight and total-variational topologies.

The Dunford–Pettis–Grothendieck theorem characterizes weak compactness for subspaces of

M_{b} (X, F)

(see, for example, Appendix 8, Theorem 6 of []): a subset H of

M_{b} (X, F)

is relatively compact in the weak topology if for some

Q \in M^{1} (X, F)

,

sup_{μ \in H} {∥| μ | - | μ | \land L Q∥}_{1, X} \to 0,

as

L \to \infty

(to make this condition also necessary, choose Q from the positive elements of the L-space generated by

M_{b} (X, F)

, see []). Therefore, dominated subspaces play a central role when considering inner regularity with the weak topology. For any subset H of

M_{b} (X, F)

dominated by some

Q \in M^{1} (X, F)

and any

μ \in H

, there is a unique

d μ / d Q \in L^{1} (X, F, Q)

such that

μ (F) = \int_{F} (d μ / d Q) (x) d Q (x)

by the Radon–Nikodym theorem. In that case, duality between

L^{1} (X, F, Q)

and the vector space

L^{\infty} (X, F, Q)

of (Q-almost-everywhere equivalence classes of) bounded measurable functions on

X

corresponds to the better-known weak-star topology on the continuous dual of the normed space

L^{\infty} (X, F, Q)

. Weak(-star) compactness is characterized by the Dunford–Pettis theorem (see []), which says that a Q-dominated

H \subset M_{b} (X, F)

is relatively compact in the weak topology if and only if the subset of Radon–Nikodym densities

{d μ / d Q : μ \in H}

in

L^{1} (X, F, Q)

is uniformly integrable, i.e.,

sup_{μ \in H} \int_{{x \in X : | d μ / d Q | (x) > L}} |\frac{d μ}{d Q}| (x) d Q (x) \to 0,

as

L \to \infty

. Finally, we may characterize the relative compactness of a subset H of

M_{b} (X, F)

in the weak topology by the condition that there exists a

Q \in M^{1} (X, F)

such that for every

ϵ > 0

, there exists a

δ > 0

, such that,

Q (A) < δ \Rightarrow sup_{μ \in H} | μ | (A) < ϵ,

(A1)

for every

A \in F

. These three characterizations of weak compactness are equivalent.
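As a concrete illustration (a standard textbook example, not taken from the article), the criteria above can be checked on the unit interval. The choices of X = [0, 1], of Q as Lebesgue measure and of the sequence μ_n below are assumptions made only for the sake of the example.

```latex
% Assumed setting: X = [0,1] with its Borel sigma-algebra F, Q = Lebesgue measure,
% and H = \{ \mu_n : n \geq 1 \} with densities d\mu_n/dQ = n 1_{[0,1/n]}.
\[
  \frac{d\mu_n}{dQ}(x) = n\, 1_{[0,1/n]}(x), \qquad \|\mu_n\|_{1,X} = 1 .
\]
% Uniform integrability fails: for any L > 0 and any n > L,
\[
  \int_{\{x \in X : |d\mu_n/dQ|(x) > L\}} \Bigl|\frac{d\mu_n}{dQ}\Bigr|(x)\, dQ(x)
  = \int_0^{1/n} n \, dQ(x) = 1 ,
  \quad\text{so}\quad
  \sup_{\mu \in H} \int_{\{|d\mu/dQ| > L\}} \Bigl|\frac{d\mu}{dQ}\Bigr|\, dQ = 1
  \not\to 0 .
\]
% Condition (A1) fails for the same reason: with A_n = [0,1/n],
\[
  Q(A_n) = \tfrac{1}{n} \to 0
  \quad\text{while}\quad
  \sup_{\mu \in H} |\mu|(A_n) \geq \mu_n(A_n) = 1 .
\]
```

By the Dunford–Pettis criterion as stated above, this Q-dominated H is therefore not relatively compact in the weak topology. By contrast, any family whose densities with respect to Q are bounded by a common constant C is uniformly integrable, since the integral above vanishes as soon as L ≥ C.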

While tight and total-variational convergence are widely used in statistics, the weak topology is much less prominent. This is unfortunate, given that in frequentist statistics, the weak topology characterizes the testability of hypotheses and the estimability of parameters, cf. an underappreciated theorem of Le Cam and Schwartz that gives necessary and sufficient conditions for asymptotically consistent estimation []. The following theorem is the Le Cam–Schwartz theorem, in a form specific to the existence of asymptotically consistent hypothesis tests, in the case where H is a weakly compact subset of

M^{1} (X, F)

.

Theorem A1.

For all

n \geq 1

, let

X^{n} = (X_{1}, \dots, X_{n})

denote independent, identically distributed observations with

X \sim P

for some P from H, a weakly compact subset of

M^{1} (X, F)

. For mutually exclusive hypotheses

B, V \subset H

, there exists a sequence of measurable

ϕ_{n} : X^{n} \to [0, 1]

such that,

ϕ_{n} (X^{n}) \overset{P}{\to} 0, ϕ_{n} (X^{n}) \overset{Q}{\to} 1,

for all

P \in B

,

Q \in V

if and only if there exists a sequence

(ψ_{n})

of

T_{1}

-uniformly continuous functions

ψ_{n} : B \cup V \to [0, 1]

such that

ψ_{n} (P) \to 1_{V} (P)

, as

n \to \infty

for every

P \in B \cup V

.

The interpretation of this theorem is that data-based distinction between two subsets in a statistical model can be made with asymptotic certainty if and only if

B, V

allow for limiting separation with a sequence of

T_{1}

-uniformly continuous functions. This result appears technical and inaccessible, but, for weakly compact H, the tight and weak topologies are equal and the above condition may be replaced by the requirement that both B and V are

F_{σ}

-sets for the tight topology! The Le Cam–Schwartz theorem demonstrates the centrality of the weak topology in (asymptotic) frequentist statistics.

The total-variational topology

T_{T V}

is the strong topology

β (M_{b} (X, F), L^{\infty} (X, F))

associated with the weak topology

T_{1}

.

{∥ \cdot ∥}_{1, X}

is a norm on

M_{b} (X, F)

and the associated norm topology is complete. (Banach spaces like

(M_{b} (X, F), ∥ \cdot ∥_{1, X})

are considered in detail in [], for example.) Since,

{∥ μ ∥}_{1, X} = sup {| ⟨ μ, f ⟩ | : {∥ f ∥}_{\infty} \leq 1},

the total-variational distance between two probability measures

P, Q \in M^{1} (X)

can be written as the

L^{1}

-difference between densities, cf. (7), for any

μ

such that

P, Q ≪ μ

. On a subset

H \subset M^{1} (X, F)

that is dominated by a bounded positive measure Q, the Radon–Nikodym theorem induces a one-to-one mapping

H \to L^{1} (X, F, Q)

, which is isometric by (7).
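The identity between the total-variational norm and the L^1-distance of densities can be spelled out from the definitions in this appendix. The following is a sketch: the symbols p = dP/dμ and q = dQ/dμ are notation introduced here, and the two-point example is hypothetical.

```latex
% Notation assumed here: p = dP/d\mu and q = dQ/d\mu for a dominating measure \mu.
\[
  \|P - Q\|_{1,X}
  = \sup\bigl\{ |\langle P - Q, f\rangle| : \|f\|_\infty \leq 1 \bigr\}
  = \sup_{\|f\|_\infty \leq 1} \Bigl| \int_X f\,(p - q)\, d\mu \Bigr|
  = \int_X |p - q|\, d\mu ,
\]
% the supremum being attained at f = 1_{\{p > q\}} - 1_{\{p \leq q\}}.
% Hypothetical two-point example: X = \{0,1\}, \mu = counting measure,
% P = (1/2, 1/2), Q = (1/4, 3/4):
\[
  \|P - Q\|_{1,X}
  = \bigl|\tfrac12 - \tfrac14\bigr| + \bigl|\tfrac12 - \tfrac34\bigr|
  = \tfrac12 .
\]
```

With this normalization (no factor 1/2), two mutually singular probability measures lie at total-variational distance 2.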

If the

σ

-algebra

F

is the Borel

σ

-algebra

B

on

X

associated with a (Hausdorff completely regular) topology on

X

, weaker dual topologies are also natural: in particular, we consider the vector space of bounded Radon measures

M_{b} (X) \subset M_{b} (X, B)

and the vector space

C_{b} (X)

of bounded continuous functions on

X

. For

μ \in M_{b} (X)

and

f \in C_{b} (X)

,

⟨ μ, f ⟩

defines a sub-basis of sets

V_{f} = {μ \in M_{b} (X) : | ⟨ μ, f ⟩ | < 1}

for the tight topology

T_{C} = σ (M_{b} (X), C_{b} (X))

. The tight topology is the initial topology for the mappings

μ \mapsto ⟨ μ, f ⟩

,

f \in C_{b} (X)

, a locally convex topology with semi-norms

ρ_{f} (μ) = | ⟨ μ, f ⟩ |

. A net (sequence, filter)

μ_{i}

, (

i \in I

), converges tightly to

μ

if for every

f \in C_{b} (X)

,

\int f d μ_{i} \to \int f d μ

. If

X

is a Polish space, the cone of all positive, bounded Radon measures

M_{+} (X)

and the subset of probability measures

M^{1} (X)

with the tight topology are Polish spaces too. (Note that

M_{b} (X)

is not necessarily a Polish space!)
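The difference between the tight topology T_C and the weak topology T_1 is already visible on point masses. The following standard example is not taken from the article; it assumes X is the real line and writes δ_x for the Dirac measure at x.

```latex
% Assumed example: X = \mathbb{R}, \mu_n = \delta_{1/n}, \mu = \delta_0.
% Tight convergence: for every f \in C_b(X), by continuity of f at 0,
\[
  \int f \, d\delta_{1/n} = f(1/n) \to f(0) = \int f \, d\delta_0 .
\]
% No weak convergence: the bounded measurable function f = 1_{\{0\}} gives
\[
  \int 1_{\{0\}} \, d\delta_{1/n} = 0 \quad\text{for all } n \geq 1,
  \qquad
  \int 1_{\{0\}} \, d\delta_0 = 1 .
\]
```

The sequence converges in T_C but not in T_1, in line with T_1 being defined through the larger class of test functions.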

Prokhorov’s Theorem characterizes tight compactness for subspaces of

M_{b} (X)

[,]: let

X

be a Hausdorff completely regular space; a subset H of

M_{b} (X)

is relatively tightly compact if and only if

sup {∥ μ ∥_{1, X} : μ \in H} < \infty

and, for every

ϵ > 0

, there exists a compact

K \subset X

such that,

sup {| μ | (X ∖ K) : μ \in H} < ϵ .

(A2)
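A minimal sketch of how condition (A2) separates two simple families of point masses follows. The example is a standard one and is not taken from the article; X is assumed to be the real line, and the names H and H' are introduced here.

```latex
% Assumed example: X = \mathbb{R}, H = \{\delta_n : n \geq 1\}.
% The norm bound holds: \sup\{\|\delta_n\|_{1,X} : n \geq 1\} = 1 < \infty.
% But any compact K \subset \mathbb{R} lies in some [-M, M], so for n > M,
\[
  |\delta_n|(X \setminus K) = 1 ,
  \qquad\text{hence}\qquad
  \sup\{ |\mu|(X \setminus K) : \mu \in H \} = 1
\]
% for every compact K, and (A2) fails for every \epsilon \leq 1.
% For H' = \{\delta_{1/n} : n \geq 1\}, the compact set K = [0,1] gives
\[
  \sup\{ |\mu|(X \setminus K) : \mu \in H' \} = 0 < \epsilon .
\]
```

Under the theorem as stated, H (mass escaping to infinity) is not relatively tightly compact, whereas H' is.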

References

Le Cam, L. Asymptotic Methods in Statistical Decision Theory; Springer: New York, NY, USA, 1986.
Ghosal, S.; van der Vaart, A.W. Fundamentals of Nonparametric Bayesian Inference; Cambridge Series in Statistical and Probabilistic Mathematics; Cambridge University Press: Cambridge, UK, 2017.
Kleijn, B.J.K. Frequentist validity of Bayesian limits. Ann. Statist. 2021, 49, 182–202.
Kingman, J.F.C. Completely random measures. Pac. J. Math. 1967, 21, 59–78.
Kingman, J.F.C. Random discrete distributions. J. R. Stat. Soc. Ser. B (Methodol.) 1975, 37, 1–15.
Daley, D.J.; Vere-Jones, D. An Introduction to the Theory of Point Processes: Volume I: Elementary Theory and Methods; Probability and Its Applications; Springer: New York, NY, USA, 2003.
Daley, D.J.; Vere-Jones, D. An Introduction to the Theory of Point Processes: Volume II: General Theory and Structure; Probability and Its Applications; Springer: New York, NY, USA, 2007.
Bochner, S. Harmonic Analysis and the Theory of Probability; Courier Corporation: North Chelmsford, MA, USA, 1955.
Choksi, J.R. Inverse limits of measure spaces. Proc. Lond. Math. Soc. 1958, 3, 321–342.
Metivier, M. Limites projectives de mesures, martingales, applications. Ann. Mat. 1963, 63, 225–352.
Schwartz, L. Radon Measures on Arbitrary Topological Spaces and Cylindrical Measures; Studies in Mathematics; Tata Institute of Fundamental Research: Mumbai, India, 1973.
Bourbaki, N. Integration II: Chapters 7–9; Actualités scientifiques et industrielles; Springer: Berlin/Heidelberg, Germany, 2004.
Rao, M.M. Foundations of Stochastic Analysis; Academic Press: Cambridge, MA, USA, 1981.
Bogachev, V.I. Measure Theory; Springer: New York, NY, USA, 2007; Volumes I–II.
Mallory, D.; Sion, M. Limits of inverse systems of measures. Ann. Inst. Fourier 1971, 21, 25–57.
Rao, M.M. Projective limits of probability spaces. J. Multivar. Anal. 1971, 1, 28–57.
Pinter, M. The existence of an inverse limit of an inverse system of measure spaces—A purely measurable case. Acta Math. Hungar. 2010, 126, 65–77.
Beznea, L.; Cîmpean, I. On Bochner-Kolmogorov Theorem. In Séminaire de Probabilités XLVI; Donati-Martin, C., Lejay, A., Rouault, A., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 61–70.
Kraft, C.H. A Class of Distribution Function Processes Which Have Derivatives. J. Appl. Probab. 1964, 1, 385–388.
Ferguson, T.S. A Bayesian analysis of some nonparametric problems. Ann. Statist. 1973, 1, 209–230.
Ferguson, T.S. Prior distributions on spaces of probability measures. Ann. Statist. 1974, 2, 615–629.
De Blasi, P.; Favaro, S.; Lijoi, A.; Mena, R.H.; Prünster, I.; Ruggiero, M. Are Gibbs-type priors the most natural generalization of the Dirichlet process? IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 212–229.
Orbanz, P. Projective limit random probabilities on Polish spaces. Electron. J. Stat. 2011, 5, 1354–1373.
Werner, W.; Powell, E. Lecture notes on the Gaussian Free Field. arXiv 2020, arXiv:2004.04720.
Bourbaki, N. General Topology: Chapters 1–4; Elements of Mathematics; Springer: Berlin/Heidelberg, Germany, 2013.
Minlos, R.A. Generalized random processes and their extension to a measure. Sel. Transl. Math. Stat. Prob. 1963, 3, 291–313.
Ghosh, J.K.; Ramamoorthi, R.V. Bayesian Nonparametrics; Springer Series in Statistics; Springer: Berlin/Heidelberg, Germany, 2003.
Kingman, J.F.C.; Taylor, S.J. Introduction to Measure and Probability; Cambridge University Press: Cambridge, UK, 1966.
Pfanzagl, J. On the Existence of Product Measurable Densities. Sankhya 1969, 31, 13–18.
Strasser, H. Mathematical Theory of Statistics; De Gruyter: Berlin, Germany; New York, NY, USA, 1985.
Kechris, A.S. Classical Descriptive Set Theory; Graduate Texts in Mathematics; Springer: Berlin/Heidelberg, Germany, 1995.
Bourbaki, N. Topological Vector Spaces: Chapters 1–5; Springer: Berlin/Heidelberg, Germany, 1987.
Regazzini, E.; Lijoi, A.; Prünster, I. Distributional results for means of normalized random measures with independent increments. Ann. Stat. 2003, 31, 560–585.
James, L.F. A simple proof of the almost sure discreteness of a class of random measures. Stat. Probab. Lett. 2003, 65, 363–368.
Mauldin, R.D.; Sudderth, W.D.; Williams, S.C. Polya Trees and Random Distributions. Ann. Statist. 1992, 20, 1203–1221.
Lavine, M. Some Aspects of Polya Tree Distributions for Statistical Modelling. Ann. Statist. 1992, 20, 1222–1235.
Lavine, M. More Aspects of Polya Tree Distributions for Statistical Modelling. Ann. Statist. 1994, 22, 1161–1176.
Bourbaki, N. Integration I: Chapters 1–6; Springer: Berlin/Heidelberg, Germany, 2003.
Trèves, F. Topological Vector Spaces, Distributions and Kernels; Pure and Applied Mathematics; Academic Press: New York, NY, USA; London, UK, 1967.
Kan, R.; Robotti, C. On moments of folded and truncated multivariate normal distributions. J. Comput. Graph. Stat. 2017, 26, 930–934.
Sheffield, S. Gaussian free fields for mathematicians. Probab. Theory Relat. Fields 2007, 139, 521–541.
Gel’fand, I.M.; Vilenkin, N.Y. Generalized Functions: Applications of Harmonic Analysis; AMS Chelsea Publishing: Providence, RI, USA, 1964; Volume 4.
Feller, W. An Introduction to Probability Theory and Its Applications, 2nd ed.; Wiley Series in Probability and Statistics; John Wiley and Sons, Inc.: Hoboken, NJ, USA, 1991; Volume II.
Hellmund, G. Completely random signed measures. Stat. Probab. Lett. 2009, 79, 894–898.
Zuber, J.B.; Itzykson, C. Quantum Field Theory; Dover Books on Physics; Dover Publications: Mineola, NY, USA, 2012.
Diestel, J. Uniform Integrability: An Introduction; Dipartimento di Scienze Matematiche, Università Degli Studi di Trieste: Trieste, Italy, 1991.
Le Cam, L.; Schwartz, L. A Necessary and Sufficient Condition for the Existence of Consistent Estimates. Ann. Math. Statist. 1960, 31, 140–150.
Dunford, N.; Schwartz, J. Linear Operators, Part 1: General Theory; Wiley Classics Library; Wiley: Hoboken, NJ, USA, 1988.
Prokhorov, Y.V. Convergence of Random Processes and Limit Theorems in Probability Theory. Theory Probab. Appl. 1956, 1, 157–214.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
