Abstract
Dong, Goldschmidt and Martin (2006) (DGM) showed that, for and , the repeated application of independent single-block fragmentation operators based on mass partitions following a two-parameter Poisson–Dirichlet distribution with parameters to a mass partition having a Poisson–Dirichlet distribution with parameters leads to a remarkable nested family of Poisson–Dirichlet distributed mass partitions with parameters for Furthermore, these generate a Markovian sequence of -diversities following Mittag-Leffler distributions, whose ratios lead to independent Beta-distributed variables. These Markov chains are referred to as Mittag-Leffler Markov chains and arise in the broader literature involving Pólya urn and random tree/graph growth models. Here we obtain explicit descriptions of properties of these processes when conditioned on a mixed Poisson process when it equates to an integer which has interpretations in a species sampling context. This is equivalent to obtaining properties of the fragmentation operations of (DGM) when applied to mass partitions formed by the normalized jumps of a generalized gamma subordinator and its generalizations. We focus primarily on the case where
1. Introduction
Let denote a Markov chain characterized by a stationary transition density given for and :
where is the density of a variable with a Mittag-Leffler distribution, is a positive stable variable with density denoted as and Laplace transform More generally, as in [1,2,3,4], for let denote a variable with density then, is said to have a generalized Mittag-Leffler distribution with parameters and distribution denoted as In the cases where the marginal distributions of each are Furthermore, there is a sequence of random variables defined for each integer j as ; hence, there is the exact point-wise relation where, remarkably, the are independent variables, and is independent of for Note further that by setting there is the point-wise equality where all the variables on the right-hand side are independent. In these cases, the sequence may be referred to as a Mittag-Leffler Markov chain with law denoted as as in [5] and, subsequently, [6]. The Markov chain is described prominently in various generalities, that is, ranges of and , in [5,6,7,8,9]. See for example [5,6,10,11,12,13,14,15] for more references concerning Pólya urn and random tree/graph growth models.
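As a numerical illustration of the Mittag-Leffler variable described above, the following is a minimal sketch, assuming the standard construction in which the variable equals a positive stable variable raised to the power minus alpha, with the stable law normalized so that its Laplace transform is exp(-s^alpha). The sampler uses the Kanter (Chambers-Mallows-Stuck) representation; the function names are hypothetical and not taken from the paper.

```python
import math
import numpy as np

def sample_positive_stable(alpha, size, rng):
    """Draw S with E[exp(-s*S)] = exp(-s**alpha), 0 < alpha < 1 (Kanter's representation)."""
    u = rng.uniform(0.0, math.pi, size)   # uniform angle on (0, pi)
    e = rng.exponential(1.0, size)        # independent standard exponential
    a = np.sin(alpha * u) / np.sin(u) ** (1.0 / alpha)
    b = (np.sin((1.0 - alpha) * u) / e) ** ((1.0 - alpha) / alpha)
    return a * b

def sample_mittag_leffler(alpha, size, rng):
    """Assumed construction: Mittag-Leffler variable = S**(-alpha)."""
    return sample_positive_stable(alpha, size, rng) ** (-alpha)

rng = np.random.default_rng(0)
alpha = 0.5
draws = sample_mittag_leffler(alpha, 200_000, rng)
ml_mean = float(draws.mean())
ml_target = 1.0 / math.gamma(1.0 + alpha)  # known first moment of the Mittag-Leffler law
```

Under this normalization the first moment is 1/Γ(1+α), so comparing `ml_mean` with `ml_target` gives a quick sanity check of the sampler.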
Now, let denote a two-parameter Poisson–Dirichlet distribution over the space of mass partitions summing to say , as described in [3,4,16]. Let correspond in distribution to the ranked lengths of excursion of a generalized Bessel bridge on as described and defined in [1,4]. In particular, and correspond to excursion lengths of standard Brownian motion and Brownian bridge, on respectively. As noted in [6], the single-block fragmentation results for mass partitions by [17], which we shall describe in more detail in Section 1.2, allow one to couple a version of with a nested family of mass partitions where each takes its values in , initial has -diversity , and each successive has -diversity The distribution of this family is denoted as
Recall from [2] that for has distribution , and for a probability measure on , one may generate the general class of Poisson–Kingman distributions generated by an -stable subordinator with mixing by forming Some prominent examples of interest in this work are and where Hence, corresponds to the law of the ranked normalized jumps of a generalized gamma subordinator, say where has density In [6], we obtained some general distributional properties of formed by repeated application of the fragmentation operations in [17] to the case where Furthermore, letting denote a sequence of iid variables forming the arrival times, say , of a standard Poisson process, we ([6], Section 4.3) focused in more detail on the special case of for when and is a mixed Poisson process with random intensity depending on That is to say, corresponds in distribution to following a distribution, where corresponds to the distribution of
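The generalized gamma case can be explored numerically through its total mass. The sketch below assumes that the mixing density is an exponentially tilted stable density, proportional to exp(-lam * t) times the stable density; the paper's exact parametrization of the tilt is not reproduced here, so `lam` is a hypothetical tilting parameter. Sampling is by simple rejection from the untilted stable law.

```python
import math
import numpy as np

def sample_positive_stable(alpha, size, rng):
    """Draw S with E[exp(-s*S)] = exp(-s**alpha) via Kanter's representation."""
    u = rng.uniform(0.0, math.pi, size)
    e = rng.exponential(1.0, size)
    a = np.sin(alpha * u) / np.sin(u) ** (1.0 / alpha)
    b = (np.sin((1.0 - alpha) * u) / e) ** ((1.0 - alpha) / alpha)
    return a * b

def sample_tilted_stable(alpha, lam, size, rng):
    """Rejection sampler for the density proportional to exp(-lam*t) * f_alpha(t)."""
    out = np.empty(0)
    while out.size < size:
        s = sample_positive_stable(alpha, 2 * size, rng)
        # Accept each proposal s with probability exp(-lam * s).
        keep = rng.uniform(size=s.size) <= np.exp(-lam * s)
        out = np.concatenate([out, s[keep]])
    return out[:size]

rng = np.random.default_rng(0)
alpha, lam = 0.5, 1.0
tilted = sample_tilted_stable(alpha, lam, 50_000, rng)
tilted_mean = float(tilted.mean())
# Laplace exponent (lam + s)**alpha - lam**alpha gives mean alpha * lam**(alpha - 1).
tilted_target = alpha * lam ** (alpha - 1.0)
```

The acceptance probability is exp(-lam^alpha) on average, so plain rejection is only practical for moderate values of `lam`; more efficient samplers exist but are outside the scope of this sketch.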
In this work, we obtain results for the case where is such that which is when corresponds to the ranked normalized jumps of a generalized gamma process, and its size-biased generalizations. Interestingly, our results equate in distribution to the following setup involving Let be a mixed Poisson process defined by replacing in with Using the mixed Poisson framework in the manuscript of Pitman [18] (see also [6,19] for more details), we obtain some explicit distributional properties of and corresponding variables for when That is, when The equivalence in distribution to the fragmentation operations of [17] applied in the generalized gamma cases may be deduced from [18], who shows that when corresponds to the distribution of We shall primarily focus on the case of corresponding to the generalized gamma density and its size-biased distribution, which yields the most explicit results. The fragmentation operations (6) applied to allow one to recover the entire range of distributions for by gamma randomization, whereas the case for only applies to We note that descriptions of our results for , albeit less refined ones, appear in the unpublished manuscript ([9], Section 6). See also [20] for an application of for randomized
We close this section by recalling the definition of the first size-biased pick from a random mass partition (see [2,3,16]). Specifically, is referred to as the first size-biased pick from if it satisfies, for
Hereafter, let denote the remainder, such that where denotes the operation corresponding to ranked re-arrangement. From [1], may be interpreted as the length of excursion (i.e., one of the ), first discovered by dropping a uniformly distributed random variable onto the interval The fragmentation operation of [17] may be interpreted as shattering/fragmenting that interval by the excursion lengths of a process on with distribution and then re-ranking. For clarity and comparison, we first recall some details of the more well-known Markovian size-biased deletion operation leading to stick-breaking representations, as described in [1,2,3], and more related notions arising in a Bayesian nonparametric context in the setting, in the next section.
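The first size-biased pick and the re-ranked remainder can be illustrated on a finite mass partition. This is a minimal sketch under simplifying assumptions: the partition is a short finite vector rather than an infinite sequence, and the remainder is renormalized to sum to one before re-ranking. The helper names are hypothetical.

```python
import numpy as np

def size_biased_pick(masses, rng):
    """Return (index, mass) of a block chosen with probability proportional to its mass."""
    p = np.asarray(masses, dtype=float)
    idx = int(rng.choice(p.size, p=p / p.sum()))
    return idx, float(p[idx])

def delete_and_rerank(masses, idx):
    """Remove block idx, renormalize the rest to sum to one, and rank in decreasing order."""
    rest = np.delete(np.asarray(masses, dtype=float), idx)
    rest = rest / rest.sum()
    return np.sort(rest)[::-1]

rng = np.random.default_rng(1)
partition = np.array([0.5, 0.3, 0.15, 0.05])
idx, pick = size_biased_pick(partition, rng)
remainder = delete_and_rerank(partition, idx)
```

Dropping a uniform point on the unit interval and reporting the block it lands in is the same selection rule as `rng.choice` with probabilities equal to the block masses.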
Remark 1.
Although we acknowledge the influence and contributions of the manuscript [18], the pertinent distributional results we use from that work are re-derived at the beginning of Section 2. Otherwise, the interpretation of from that work is briefly mentioned in Section 1.3.
1.1. Markovian Sequences Obtained from Successive Size-Biased Deletion
Following [1], we may define to be a size-biased deletion operator on as where it can be recalled from (2) that Now, let be a collection of such operators. From [1], as per the description in ([4], Proposition 34, p. 881), it follows that for and is independent of the first size-biased pick , and hence, for
This leads to a nested Markovian family of mass partitions where with inverse local time at time (see ([3], Equation (4.20), p. 83)), and for each with inverse local time at time Furthermore, form a Markov chain with pointwise equality where are independent variables and are the respective first size-biased picks from for Furthermore, is independent of and, more generally, for
From this, one obtains the size-biased re-arrangement of a mass partition, say satisfying , and for Refs. [3,21] discuss the distribution and these other concepts in a species sampling and Bayesian context. We mention the roles of corresponding random distribution functions as priors in a Bayesian non-parametric context. Let denote a sequence of iid variables independent of then, the random distribution is said to follow a Pitman–Yor distribution with parameters (see [21,22]). is a two-parameter extension of the Dirichlet process [23] (which corresponds to ) and has been applied extensively as a more flexible prior in a Bayesian context, but it also arises in a variety of areas involving combinatorial stochastic processes [3,21]. An attractive feature of is that it may be represented as where are the iid concomitants of the as exploited in [22] (see also [21]). This constitutes the stick-breaking representation of Furthermore, we can describe as follows: let have distribution and denote the first value drawn from ; then, is the mass in corresponding to that atom of The size-biased deletion operation described above, as in (3), leads to the following decomposition of
where are independent of where , and independent of this, where See [1,4,24] and references therein for various interpretations of (4).
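The stick-breaking representation mentioned above can be sketched numerically. The code below uses the standard two-parameter construction in which the k-th stick proportion is Beta(1 - alpha, theta + k * alpha), independently over k; truncating at a finite number of atoms is an assumption made only for illustration, and the function name is hypothetical.

```python
import numpy as np

def pitman_yor_stick_breaking(alpha, theta, n_atoms, rng):
    """First n_atoms size-biased weights of a Pitman-Yor(alpha, theta) process."""
    k = np.arange(1, n_atoms + 1)
    v = rng.beta(1.0 - alpha, theta + k * alpha)   # independent stick proportions
    # Mass left before breaking stick k: product of (1 - v_i) for i < k.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

rng = np.random.default_rng(2)
weights = pitman_yor_stick_breaking(0.5, 1.0, 2000, rng)
ranked = np.sort(weights)[::-1]   # ranking gives a truncated mass partition
```

The weights come out in size-biased order; sorting them in decreasing order gives a truncated approximation to the corresponding ranked mass partition.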
1.2. DGM Fragmentation
The single-block fragmentation operator of [17] is defined over the space However, for further clarity we start with an explanation at the level of random distribution functions involving the representation in (4). Suppose that , with and, independent of this, hence, Suppose that is chosen independent of in (4); then, it follows from [17] that
and it is evident that the mass partition shatters/fragments into a countably infinite number of pieces It follows that, in this case, which is the featured case of the fragmentation described in [17]. Hence, for general a fragmentation of is defined as
where, independent of Let denote an independent collection of mass partitions defining a sequence of independent fragmentation operators It follows from [17] that a version of the family may be constructed by the recursive fragmentation, for
In particular, when
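The recursive single-block fragmentation can be sketched on truncated mass partitions. This is a hedged illustration rather than the construction of [17] itself: partitions are truncated to finitely many blocks via stick-breaking, the fragmenting partition is taken to have parameters (alpha, 1 - alpha) as in the DGM setting, and it is renormalized so that total mass is conserved exactly. All function names are hypothetical.

```python
import numpy as np

def stick_breaking(alpha, theta, n_atoms, rng):
    """Truncated two-parameter stick-breaking weights (size-biased order)."""
    k = np.arange(1, n_atoms + 1)
    v = rng.beta(1.0 - alpha, theta + k * alpha)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

def fragment_single_block(masses, fragmenter, rng):
    """Shatter one size-biased block of `masses` by `fragmenter`, then re-rank."""
    p = np.asarray(masses, dtype=float)
    q = np.asarray(fragmenter, dtype=float)
    idx = int(rng.choice(p.size, p=p / p.sum()))   # size-biased choice of block
    pieces = p[idx] * q / q.sum()                  # split the chosen block
    out = np.concatenate((np.delete(p, idx), pieces))
    return np.sort(out)[::-1]

rng = np.random.default_rng(3)
alpha, theta = 0.5, 0.0
current = np.sort(stick_breaking(alpha, theta, 200, rng))[::-1]
initial_mass = float(current.sum())
for r in range(3):
    # Independent fragmenting partition for each step of the recursion.
    q = stick_breaking(alpha, 1.0 - alpha, 200, rng)
    current = fragment_single_block(current, q, rng)
```

Each step removes one block and adds the pieces it shatters into, so the number of blocks grows while the total mass stays fixed.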
1.3. Remarks
We close this section with remarks related to some relevant work of Eugenio Regazzini and his students, arising in a Bayesian context. From [18], with regard to a species sampling context using (see [21]), interprets as the number of animals trapped and tagged up until time and hence, interprets as the time when the j-th animal is trapped for Ref. [18] indicates that this gives further interpretation to such types of quantities arising in [25,26]. Using a Chinese restaurant process metaphor, the animals may be replaced by customers arriving sequentially to a restaurant. More generically, is the number of exchangeable samples drawn from up until time Furthermore, for each is equivalent in distribution to which is now referred to in the Bayesian literature as a normalized generalized gamma process. While, according to [2], appears in a relevant species sampling context in the 1965 thesis of McCloskey [27], and certainly elsewhere, the paper by Regazzini, Lijoi, and Prünster [28] and subsequent works by Regazzini’s students (see [29]) helped to popularize the usage of in the modern literature on Bayesian non-parametrics. Our work presents a view of subjected to the fragmentation operations in [17]. Although we do not consider specific Bayesian statistical applications in this work, we note that other types of fragmentation/coagulation of models have been applied, for instance, in [30]. We anticipate the same will be true of the operations considered here.
2. Results
Hereafter, we shall focus on the case of as we will recover the general cases by applying gamma randomization as in ([4], Proposition 21) for or ([19], Corollary 2.1) for and other results. See also ([6], Section 2.2.1). We first re-derive some relevant properties related to that are easily verified by first conditioning on and otherwise can be found in [18]. First, for fixed and for
and for
Note these simple results hold for any variable T with density in place of and It follows from (7) and (8) that has the generalized gamma density Furthermore, for has the same distribution as with density Since it is assumed that is independent of it follows that for the conditional distribution of is , and hence, has distribution for as mentioned previously.
Remark 2.
For the next results, which are extensions to conditioned on we note, as in [19], that the densities are well-defined for any real number ϱ in place of with density provided that and for only in the case where which corresponds to Ref. ([19], Corollary 2.1) shows that distributions for ϱ can be expressed as randomized (over λ) distributions for any
For clarity, with respect to are independent variables for and is independent of and for each
Proposition 1.
Consider formed by the fragmentation operations in (6), when Denote the conditional distribution of as and its corresponding component values as Then, the distribution has the following properties.
- (i) is equivalent in distribution to .
- (ii) has distribution for
- (iii) has the same distribution as
3. Results for
We will now focus on results for , given in the cases where and This is equivalent to providing more explicit distributional results than Proposition 1 for the generalized gamma and its size-biased case, where for subjected to the fragmentation operations in (6). We first highlight a class of random variables that will play an important role in our descriptions.
Throughout, we define for with Let and denote, respectively, iid collections of and random variables that are mutually independent. Use these to form iid sums and construct increasing sums for
Lemma 1.
For set with and hence Then, for any and the joint density of can be expressed as
Furthermore,
3.1. Results for the Generalized Gamma Case
Let denote a collection of iid variables, and independent of this, let denote, for each fixed a collection of iid variables such that In addition, for each r, is independent of
Proposition 2.
Consider then, for each r, the joint distribution of the random variables is equivalent component-wise and jointly to the distribution of where:
- (i) with conditional density given for .
- (ii) The conditional distribution of is equivalent to that of where recall
- (iii) The conditional density of is
- (iv) Hence,
- (v) are independent.
Corollary 1.
Suppose that then for
where
Proof.
This follows from statement (iv) of Proposition 2. □
The corollary shows that the fragmentation operations in (6) lead to a nested family of (mixed) normalized generalized gamma distributed mass partitions, with replaced by the random quantities In other words, equates in distribution to the ranked masses of the random distribution function, for :
Now, in order to recover for when set, for where When as in Corollary 1, it follows from ([4], Proposition 21) that Hence It follows from Proposition 2 that, for Notably, are independent variables, such that for When or equivalently and for
3.2. Results for
Proposition 3.
Consider then, for each r, the joint distribution of the random variables is equivalent component-wise and jointly to the distribution of where:
- (i) , where .
- (ii) for component-wise and jointly.
- (iii) is equivalent in distribution to and equivalent in distribution to .
Corollary 2.
The distributions of the components of , where for satisfy for
where for independent of the other variables. In this case,
Proof.
Now, in order to recover for when use where and, It follows from ([19], Corollary 2.1) that for
3.3. Proofs of Propositions 2 and 3
Although the joint conditional density of in the setting can be easily obtained from ([6], p. 324), with for clarity, we derive it here. Since , and , it follows that the desired conditional density of can be expressed as,
Now, a joint density of follows from the descriptions in Proposition 2 and Lemma 1 and can be expressed, for , as
for Proposition 2 is verified by showing that integrating over in (13) leads to (12). This is equivalent to showing that
which follows by elementary calculations involving the change of variable for and exponential integrals. Now, to establish Proposition 3, first note that since , and the joint density of can be expressed as
Hence, the conditional density of can be expressed as,
which corresponds to verifying statement (i) of Proposition 3. Equations (14) and (15) show that is , which leads to having distribution This agrees with statement (ii) of Proposition 1, with Using and applying Proposition 2 starting with subject to (6) concludes the proof of Proposition 3.
Funding
This research was supported in part by grants RGC-GRF 16301521, 16300217 and 601712 of the Research Grants Council (RGC) of the Hong Kong SAR. This research also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 817257.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Acknowledgments
This article is dedicated to Eugenio Regazzini on the occasion of his 75th birthday.
Conflicts of Interest
The author declares no conflict of interest.
References
- Perman, M.; Pitman, J.; Yor, M. Size-biased sampling of Poisson point processes and excursions. Probab. Theory Relat. Fields 1992, 92, 21–39.
- Pitman, J. Poisson-Kingman partitions. In Science and Statistics: A Festschrift for Terry Speed; Goldstein, D.R., Ed.; Institute of Mathematical Statistics: Hayward, CA, USA, 2003; pp. 1–34.
- Pitman, J. Combinatorial Stochastic Processes. In Lectures from the 32nd Summer School on Probability Theory Held in Saint-Flour, July 7–24, 2002. With a Foreword by Jean Picard. Lecture Notes in Mathematics, 1875; Springer: Berlin/Heidelberg, Germany, 2006.
- Pitman, J.; Yor, M. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 1997, 25, 855–900.
- Rembart, F.; Winkel, M. A binary embedding of the stable line-breaking construction. arXiv 2016, arXiv:1611.02333.
- Ho, M.-W.; James, L.F.; Lau, J.W. Gibbs Partitions, Riemann-Liouville Fractional Operators, Mittag-Leffler Functions, and Fragmentations derived from stable subordinators. J. Appl. Prob. 2021, 58, 314–334.
- Goldschmidt, C.; Haas, B. A line-breaking construction of the stable trees. Electron. J. Probab. 2015, 20, 1–24.
- Haas, B.; Miermont, G.; Pitman, J.; Winkel, M. Continuum tree asymptotics of discrete fragmentations and applications to phylogenetic models. Ann. Probab. 2008, 36, 1790–1837.
- James, L.F. Stick-breaking PG(α,ζ)-Generalized Gamma Processes. Unpublished manuscript. arXiv 2013, arXiv:1308.6570.
- Aldous, D. The continuum random tree. I. Ann. Probab. 1991, 19, 1–28.
- Aldous, D. The continuum random tree III. Ann. Probab. 1993, 21, 248–289.
- Móri, T.F. The maximum degree of the Barabási-Albert random tree. Combin. Probab. Comput. 2005, 14, 339–348.
- Peköz, E.; Röllin, A.; Ross, N. Generalized gamma approximation with rates for urns, walks and trees. Ann. Probab. 2016, 44, 1776–1816.
- Peköz, E.; Röllin, A.; Ross, N. Joint degree distributions of preferential attachment random graphs. Adv. Appl. Probab. 2017, 49, 368–387.
- van der Hofstad, R. Random Graphs and Complex Networks; Cambridge University Press: New York, NY, USA, 2016; Volume I.
- Bertoin, J. Random Fragmentation and Coagulation Processes; Cambridge University Press: Cambridge, UK, 2006.
- Dong, R.; Goldschmidt, C.; Martin, J. Coagulation-fragmentation duality, Poisson-Dirichlet distributions and random recursive trees. Ann. Appl. Probab. 2006, 16, 1733–1750.
- Pitman, J. Mixed Poisson and negative binomial models for clustering and species sampling. Unpublished manuscript. 2017.
- James, L.F. Stick-breaking Pitman-Yor processes given the species sampling size. arXiv 2019, arXiv:1908.07186.
- Favaro, S.; James, L.F. A note on nonparametric inference for species variety with Gibbs-type priors. Electron. J. Statist. 2015, 9, 2884–2902.
- Pitman, J. Some Developments of the Blackwell-MacQueen urn Scheme; Statistics, Probability and Game Theory, IMS Lecture Notes Monogr. Ser. 30; Institute of Mathematical Statistics: Hayward, CA, USA, 1996; pp. 245–267.
- Ishwaran, H.; James, L.F. Gibbs sampling methods for stick-breaking priors. J. Am. Statist. Assoc. 2001, 96, 161–173.
- Ferguson, T.S. A Bayesian analysis of some nonparametric problems. Ann. Statist. 1973, 1, 209–230.
- James, L.F. Lamperti type laws. Ann. Appl. Probab. 2010, 20, 1303–1340.
- James, L.F.; Lijoi, A.; Prünster, I. Posterior analysis for normalized random measures with independent increments. Scand. J. Stat. 2009, 36, 76–97.
- Zhou, M.; Favaro, S.; Walker, S.G. Frequency of Frequencies Distributions and Size-Dependent Exchangeable Random Partitions. J. Am. Statist. Assoc. 2017, 112, 1623–1635.
- McCloskey, J.W. A Model for the Distribution of Individuals by Species in an Environment. Ph.D. Thesis, Michigan State University, East Lansing, MI, USA, 1965.
- Regazzini, E.; Lijoi, A.; Prünster, I. Distributional results for means of normalized random measures with independent increments. Ann. Statist. 2003, 31, 560–585.
- Lijoi, A.; Prünster, I. Models Beyond the Dirichlet Process. In Bayesian Nonparametrics; Hjort, N.L., Holmes, C., Müller, P., Walker, S., Eds.; Cambridge University Press: Cambridge, UK, 2010; pp. 80–136.
- Wood, F.; Gasthaus, J.; Archambeau, C.; James, L.F.; Teh, Y.W. The Sequence Memoizer. Commun. ACM 2011, 54, 91–98.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).