# Mixture of Species Sampling Models

## Abstract


## 1. Introduction

- $\xi ={\left({\xi}_{n}\right)}_{n\ge 1}$ is an mSSS;
- with probability one ${\left({\xi}_{n}\right)}_{n\ge 1}={\left({Z}_{{I}_{n}}\right)}_{n\ge 1}$, where ${\left({I}_{n}\right)}_{n\ge 1}$ is a sequence of integer-valued random variables, independent of the ${Z}_{n}$, such that, conditionally on ${p}^{\downarrow}:=({p}_{1}^{\downarrow},{p}_{2}^{\downarrow},\dots )$, the ${I}_{n}$ are independent and $\mathbb{P}\{{I}_{n}=i|{p}^{\downarrow}\}={p}_{i}^{\downarrow}$;
- with probability one ${\left({\xi}_{n}\right)}_{n\ge 1}:={\left({Z}_{{\mathcal{C}}_{n}(\Pi )}^{\prime}\right)}_{n\ge 1}$, where ${\left({Z}_{n}^{\prime}\right)}_{n\ge 1}$ is an exchangeable sequence with the same law as ${\left({Z}_{n}\right)}_{n\ge 1}$, Π is an exchangeable partition, independent of ${\left({Z}_{n}^{\prime}\right)}_{n\ge 1}$, obtained by sampling from ${\left({p}_{n}^{\downarrow}\right)}_{n\ge 1}$, and ${\mathcal{C}}_{n}(\Pi )$ is the index of the block of Π containing n.
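
The second representation in the list above, ${\xi}_{n}={Z}_{{I}_{n}}$, can be simulated directly. The following is a minimal sketch under stated assumptions, not the authors' code: the ranked weights ${p}^{\downarrow}$ are taken from a truncated and renormalised stick-breaking construction with a hypothetical parameter `theta`, and the ${Z}_{i}$ are drawn i.i.d. from a spike-and-slab law (a simple special case of an exchangeable sequence with a non-diffuse law). All function names are illustrative.

```python
import random


def stick_breaking_weights(theta, n_atoms, rng):
    """Truncated stick-breaking weights, renormalised and ranked in decreasing order.

    Assumption: the truncation at n_atoms and the renormalisation are only
    a computational convenience for this sketch.
    """
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, theta)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    total = sum(weights)
    return sorted((w / total for w in weights), reverse=True)


def spike_and_slab(rng, spike_prob=0.5):
    """Hypothetical non-diffuse base law: an atom at 0 mixed with a standard normal."""
    return 0.0 if rng.random() < spike_prob else rng.gauss(0.0, 1.0)


def sample_sequence(n, theta, base_sampler, n_atoms=200, seed=0):
    """Return (xi, indices, z) with xi[j] = z[indices[j]]."""
    rng = random.Random(seed)
    p_down = stick_breaking_weights(theta, n_atoms, rng)
    z = [base_sampler(rng) for _ in range(n_atoms)]
    # Given p_down, the indices I_n are i.i.d. with P(I_n = i) = p_down[i],
    # and they are drawn independently of the Z values.
    indices = rng.choices(range(n_atoms), weights=p_down, k=n)
    xi = [z[i] for i in indices]
    return xi, indices, z


if __name__ == "__main__":
    xi, indices, _ = sample_sequence(20, theta=2.0, base_sampler=spike_and_slab)
    print("distinct latent indices:", len(set(indices)))
    print("distinct observed values:", len(set(xi)))
```

Because the base law has an atom, two different latent indices can carry the same value, so the number of distinct observed values can be smaller than the number of distinct indices.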

## 2. Background Materials

#### 2.1. Exchangeable Random Partitions

**Theorem 1.** Given any exchangeable random partition Π with EPPF $\mathfrak{q}$, denote by ${\Pi}_{j,n}^{\downarrow}$ the blocks of the partition rearranged in decreasing order with respect to the number of elements in the blocks of ${\Pi}_{n}$. Then,

#### 2.2. Species Sampling Models

- (PS1) $\mathbb{P}\{{\xi}_{1}\in dx\}=H\left(dx\right)$;
- (PS2) the conditional distribution of ${\xi}_{n+1}$ given $({\xi}_{1},\dots ,{\xi}_{n})$ is $$\mathbb{P}\left\{{\xi}_{n+1}\in dx|{\xi}_{1},\dots ,{\xi}_{n}\right\}=\sum _{c=1}^{K}{\omega}_{n,c}{\delta}_{{\xi}_{c}^{*}}\left(dx\right)+{\nu}_{n}H\left(dx\right),$$

**Proposition 1.** Let H be a diffuse probability measure; then, an exchangeable sequence ${\left({\xi}_{n}\right)}_{n}$ is characterized by (PS1)–(PS2) if and only if its directing random measure is an $SSrp(\mathfrak{q},H)$.
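
As a concrete instance of (PS2), the weights are available in closed form for the Pitman–Yor process. The display below is a standard fact recalled for illustration, not a result of this paper. It assumes that $K$ is the number of distinct values ${\xi}_{1}^{*},\dots ,{\xi}_{K}^{*}$ among ${\xi}_{1},\dots ,{\xi}_{n}$, that ${n}_{c}$ (a symbol introduced here) is the number of observations equal to ${\xi}_{c}^{*}$, and that $H$ is diffuse. For $0\le \sigma <1$ and $\theta >-\sigma$,

$$
{\omega}_{n,c}=\frac{{n}_{c}-\sigma}{\theta +n},\qquad {\nu}_{n}=\frac{\theta +\sigma K}{\theta +n},\qquad \sum _{c=1}^{K}{\omega}_{n,c}+{\nu}_{n}=\frac{(n-\sigma K)+(\theta +\sigma K)}{\theta +n}=1.
$$

Setting $\sigma =0$ gives the Blackwell–MacQueen urn of the Dirichlet process, with ${\omega}_{n,c}={n}_{c}/(\theta +n)$ and ${\nu}_{n}=\theta /(\theta +n)$.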

## 3. Mixture of Species Sampling Models

#### 3.1. Definitions and Relation to Other Models

**Definition 1.** ${\left({\xi}_{n}\right)}_{n\ge 1}$ is a $gSSS(\mathfrak{q},H)$ if it is an exchangeable sequence with directing random measure P, where $P\sim SSrp(\mathfrak{q},H)$, H being any measure on $(\mathbb{X},\mathcal{X})$ (not necessarily diffuse).

**Definition 2.** We say that ${\left({\xi}_{n}\right)}_{n\ge 1}$ is a mixture of species sampling sequences ($mSSS$) if it is an exchangeable sequence with directing random measure

- $\tilde{u}$ is a random variable taking values in U with law Q;
- $\tilde{H}(\cdot ):={\alpha}_{\tilde{u}}(\cdot )/{\alpha}_{\tilde{u}}\left(\mathbb{X}\right)$;
- ${\left({Z}_{n}\right)}_{n\ge 1}$ are exchangeable random variables with directing random measure $\tilde{H}$;
- ${p}^{\downarrow}$ is a sequence of random weights in ∇ such that $P\{{\sum}_{j\ge 1}{p}_{j}^{\downarrow}=1\}=1$ and the conditional distribution of ${p}^{\downarrow}$ given $\tilde{u}$ depends only on ${\alpha}_{\tilde{u}}\left(\mathbb{X}\right)$. In particular, the (conditional) EPPF associated with the law of ${p}^{\downarrow}$ given $\tilde{u}$ has the form $$\mathfrak{q}({n}_{1},\dots ,{n}_{k}|\tilde{u}):=\frac{{\alpha}_{\tilde{u}}{\left(\mathbb{X}\right)}^{k}}{{\left({\alpha}_{\tilde{u}}\left(\mathbb{X}\right)\right)}_{n}}\prod _{c=1}^{k}({n}_{c}-1)!$$
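
The conditional EPPF in the last item can be checked numerically. The sketch below assumes that ${\left(a\right)}_{n}$ denotes the rising factorial $a(a+1)\cdots (a+n-1)$, which makes the formula the Ewens sampling formula with total mass `alpha_mass` $={\alpha}_{\tilde{u}}\left(\mathbb{X}\right)$; the function names are illustrative. It evaluates the formula and confirms that the probabilities of all partitions of $\{1,\dots ,n\}$ add up to one.

```python
from math import factorial


def rising_factorial(a, n):
    """(a)_n = a (a + 1) ... (a + n - 1), with (a)_0 = 1."""
    out = 1.0
    for i in range(n):
        out *= a + i
    return out


def dirichlet_eppf(block_sizes, alpha_mass):
    """EPPF of the item above for block sizes (n_1, ..., n_k) and total mass alpha_mass."""
    n, k = sum(block_sizes), len(block_sizes)
    numerator = alpha_mass ** k
    for size in block_sizes:
        numerator *= factorial(size - 1)
    return numerator / rising_factorial(alpha_mass, n)


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Either add `first` to one of the existing blocks ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or open a new singleton block.
        yield [[first]] + part


if __name__ == "__main__":
    mass = 1.5
    total = sum(
        dirichlet_eppf([len(block) for block in part], mass)
        for part in set_partitions([1, 2, 3, 4])
    )
    print(total)  # should be 1 up to floating-point error
```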

#### 3.2. Representation Theorems for mSSS

**Proposition 2.**

**Proof.**

**Proposition 3.**

**Remark 1.**

**Proof of Proposition 3.**

**Corollary 1.**

## 4. Random Partitions Induced by mSSS

#### 4.1. Explicit Expression of the EPPF

- it is possible to determine k subsets containing ${m}_{1},\dots ,{m}_{k}$ of these blocks;
- the union of the blocks in the i-th subset coincides with the i-th block of ${\tilde{\pi}}_{n}$ for $i=1,\dots ,k$;
- in the i-th block, there are ${\lambda}_{ij}$ blocks with j elements, for $j=1,\dots ,{n}_{i}$.

**Proposition 4.**

**Proof.**

**Corollary 2.**

**Proof.**

**Remark 2.**

**Remark 3.**

#### 4.2. EPPF When $\Pi $ Is of Gibbs Type

**Corollary 3.**

**Proof.**

#### 4.3. The EPPF of a $gSSS(\mathfrak{q},H)$

**Proposition 5.**

**Remark 4.**

#### 4.4. EPPF for $gSSS$ with Spike-and-Slab Base Measure

**Proposition 6.**

**Proof.**

## 5. Predictive Distributions

#### 5.1. Some General Results

**Proposition 7.**

**Proof.**

**Remark 5.**

**Example 1.** Let ${Z}_{n}^{\prime}$ be defined as a mixture of normal random variables with Normal-Inverse-Gamma prior. In other words, given ${\mu}_{0}\in \mathbb{R}$, ${k}_{0}>0,{\alpha}_{0}>0,{\beta}_{0}>0$,

#### 5.2. Predictive Distributions for $gSSS$

**Proposition 8.**

**Proof.**

## 6. Conclusions and Discussion

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A

**Lemma A1.** Fix a probability kernel K between two measurable spaces S and T, and let σ be a random element defined on $(\Omega ,\mathcal{F},\mathbb{P})$ taking values in S. Then, there exists a random element η in T, defined on some extension of the original probability space Ω, such that $\mathbb{P}[\eta \in \cdot \mid \sigma ]=K(\cdot \mid \sigma )$ a.s. and, moreover, η is conditionally independent given σ from any other random element on Ω.

**Lemma A2.** Fix two Borel spaces S and T, a measurable mapping $f:T\to S$ and some random elements σ in S and $\tilde{\eta}$ in T with $\mathcal{L}\left(\sigma \right)=\mathcal{L}\left(f\left(\tilde{\eta}\right)\right)$. Then, there is a random element η defined on some extension of the original probability space, such that $\mathcal{L}\left(\eta \right)=\mathcal{L}\left(\tilde{\eta}\right)$ and $\sigma =f\left(\eta \right)$ a.s.

**Lemma A3.** Fix three Borel spaces ${S}_{1}$, ${S}_{2}$ and ${T}_{1}$, a measurable mapping $\varphi :{T}_{1}\times {S}_{2}\to {S}_{1}$ and some random elements $\sigma =({\sigma}_{1},{\sigma}_{2})$ in ${S}_{1}\times {S}_{2}$ and ${\tau}_{1}$ in ${T}_{1}$, all defined on a probability space $(\Omega ,\mathcal{F},P)$. Assume that the conditional law of ${\sigma}_{1}$ given ${\sigma}_{2}$ is the same as the conditional law of $\varphi ({\tau}_{1},{\sigma}_{2})$ given ${\sigma}_{2}$ (P-almost surely). Then, there is a random element τ defined on some extension of the original probability space $(\Omega ,\mathcal{F},P)$ taking values in ${T}_{1}$ such that

- ${\sigma}_{1}=\varphi (\tau ,{\sigma}_{2})$ a.s.
- $\mathcal{L}({\tau}_{1},{\sigma}_{2})=\mathcal{L}(\tau ,{\sigma}_{2})$.

**Proof.**

## References

1. Ferguson, T.S. A Bayesian analysis of some nonparametric problems. Ann. Stat. **1973**, 1, 209–230.
2. Pitman, J.; Yor, M. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. **1997**, 25, 855–900.
3. Perman, M.; Pitman, J.; Yor, M. Size-biased sampling of Poisson point processes and excursions. Probab. Theory Relat. Fields **1992**, 92, 21–39.
4. Regazzini, E.; Lijoi, A.; Prünster, I. Distributional results for means of normalized random measures with independent increments. Ann. Stat. **2003**, 31, 560–585.
5. James, L.F.; Lijoi, A.; Prünster, I. Posterior analysis for normalized random measures with independent increments. Scand. J. Stat. **2009**, 36, 76–97.
6. Lijoi, A.; Prünster, I. Models beyond the Dirichlet process. In Bayesian Nonparametrics; Hjort, N.L., Holmes, C., Müller, P., Walker, S., Eds.; Cambridge University Press: New York, NY, USA, 2010.
7. De Blasi, P.; Favaro, S.; Lijoi, A.; Mena, R.H.; Prünster, I.; Ruggiero, M. Are Gibbs-Type Priors the Most Natural Generalization of the Dirichlet Process? IEEE Trans. Pattern Anal. Mach. Intell. **2015**, 37, 212–229.
8. Pitman, J. Poisson-Kingman partitions. In Statistics and Science: A Festschrift for Terry Speed; IMS Lecture Notes Monograph Series; Institute of Mathematical Statistics: Beachwood, OH, USA, 2003; Volume 40, pp. 1–34.
9. Ishwaran, H.; James, L.F. Gibbs sampling methods for stick-breaking priors. J. Am. Stat. Assoc. **2001**, 96, 161–173.
10. Antoniak, C.E. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Stat. **1974**, 2, 1152–1174.
11. Cifarelli, D.M.; Regazzini, E. Distribution functions of means of a Dirichlet process. Ann. Stat. **1990**, 18, 429–442.
12. Sangalli, L.M. Some developments of the normalized random measures with independent increments. Sankhyā **2006**, 68, 461–487.
13. Broderick, T.; Wilson, A.C.; Jordan, M.I. Posteriors, conjugacy, and exponential families for completely random measures. Bernoulli **2018**, 24, 3181–3221.
14. Bassetti, F.; Ladelli, L. Asymptotic number of clusters for species sampling sequences with non-diffuse base measure. Stat. Probab. Lett. **2020**, 162, 108749.
15. Pitman, J. Some developments of the Blackwell-MacQueen urn scheme. In Statistics, Probability and Game Theory; IMS Lecture Notes Monograph Series; Institute of Mathematical Statistics: Hayward, CA, USA, 1996; Volume 30, pp. 245–267.
16. Dunson, D.B.; Herring, A.H.; Engel, S.M. Bayesian selection and clustering of polymorphisms in functionally related genes. J. Am. Stat. Assoc. **2008**, 103, 534–546.
17. Kim, S.; Dahl, D.B.; Vannucci, M. Spiked Dirichlet process prior for Bayesian multiple hypothesis testing in random effects models. Bayesian Anal. **2009**, 4, 707–732.
18. Suarez, A.J.; Ghosal, S. Bayesian Clustering of Functional Data Using Local Features. Bayesian Anal. **2016**, 11, 71–98.
19. Cui, K.; Cui, W. Spike-and-Slab Dirichlet Process Mixture Models. Spike Slab Dirichlet Process. Mix. Model. **2012**, 2, 512–518.
20. Barcella, W.; De Iorio, M.; Baio, G.; Malone-Lee, J. Variable selection in covariate dependent random partition models: An application to urinary tract infection. Stat. Med. **2016**, 35, 1373–1389.
21. Canale, A.; Lijoi, A.; Nipoti, B.; Prünster, I. On the Pitman–Yor process with spike and slab base measure. Biometrika **2017**, 104, 681–697.
22. Teh, Y.; Jordan, M.I. Hierarchical Bayesian nonparametric models with applications. In Bayesian Nonparametrics; Hjort, N.L., Holmes, C., Müller, P., Walker, S., Eds.; Cambridge University Press: New York, NY, USA, 2010.
23. Teh, Y.W.; Jordan, M.I.; Beal, M.J.; Blei, D.M. Hierarchical Dirichlet processes. J. Am. Stat. Assoc. **2006**, 101, 1566–1581.
24. Camerlenghi, F.; Lijoi, A.; Orbanz, P.; Prünster, I. Distribution theory for hierarchical processes. Ann. Stat. **2019**, 1, 67–92.
25. Bassetti, F.; Casarin, R.; Rossini, L. Hierarchical Species Sampling Models. Bayesian Anal. **2020**, 15, 809–838.
26. Pitman, J. Combinatorial Stochastic Processes; Lectures from the 32nd Summer School on Probability Theory Held in Saint-Flour, 7–24 July 2002, with a Foreword by Jean Picard; Lecture Notes in Mathematics; Springer: Berlin, Germany, 2006; Volume 1875.
27. Crane, H. The ubiquitous Ewens sampling formula. Stat. Sci. **2016**, 31, 1–19.
28. Kingman, J.F.C. The representation of partition structures. J. Lond. Math. Soc. **1978**, 18, 374–380.
29. Aldous, D.J. Exchangeability and related topics. In École d’été de Probabilités de Saint-Flour, XIII—1983; Lecture Notes in Mathematics; Springer: Berlin, Germany, 1985; Volume 1117, pp. 1–198.
30. Kallenberg, O. Canonical representations and convergence criteria for processes with interchangeable increments. Z. Wahrscheinlichkeitstheorie Und Verw. Geb. **1973**, 27, 23–36.
31. Pitman, J. Exchangeable and partially exchangeable random partitions. Probab. Theory Relat. Fields **1995**, 102, 145–158.
32. Gnedin, A.; Pitman, J. Exchangeable Gibbs partitions and Stirling triangles. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) **2005**, 325, 83–102, 244–245.
33. Schervish, M.J. Theory of Statistics; Springer Series in Statistics; Springer: New York, NY, USA, 1995.
34. Marin, J.M.; Robert, C.P. Bayesian Core: A Practical Approach to Computational Bayesian Statistics; Springer Texts in Statistics; Springer: New York, NY, USA, 2007; pp. xiv+255.
35. Kallenberg, O. Foundations of Modern Probability, 3rd ed.; Probability Theory and Stochastic Modelling; Springer: New York, NY, USA, 2021; Volume 99.

**Figure 1.** Pictorial representation of the latent partition structure of an mSSS. In the example, the partition induced by $({\xi}_{1},\dots ,{\xi}_{n})$ for $n=8$ is ${\tilde{\Pi}}_{n}=\{[1,3,4,7],\left[2\right],[5,6,8]\}$, and it is represented using rounded squares (left bottom). Circles at the top left represent a compatible latent partition, namely ${\Pi}_{n}=\{[1,3],\left[2\right],[4,7],[5,8],\left[6\right]\}$. The partition on $\{1,\dots ,5\}$ induced by the latent ${Z}_{n}^{\prime}$, i.e., ${\Pi}_{|{\Pi}_{n}|}^{\left(0\right)}=\{[1,3],\left[2\right],[4,5]\}$, is represented with squares in the middle of the figure. Combining ${\Pi}_{n}$ and ${\Pi}_{|{\Pi}_{n}|}^{\left(0\right)}$, one obtains ${\tilde{\Pi}}_{n}$. The statistics $\mathit{n}$, $\mathit{m}$ and $\lambda $ corresponding to this particular configuration are shown in the box at the bottom right.
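
The configuration described in the caption of Figure 1 can be reproduced in a few lines. This is an illustrative sketch with hypothetical function names. It assumes the reading given in Section 4.1: ${n}_{i}$ is the size of the i-th block of the observed partition, ${m}_{i}$ is the number of latent blocks merged into it, and ${\lambda}_{ij}$ is the number of those latent blocks with j elements. Latent blocks are indexed from 1 in order of appearance.

```python
def merge_latent_partition(latent_blocks, top_blocks):
    """Merge the latent blocks according to a partition of their (1-based) indices."""
    return [
        sorted(x for j in group for x in latent_blocks[j - 1])
        for group in top_blocks
    ]


def configuration_statistics(latent_blocks, top_blocks):
    """Return the merged partition together with the statistics n, m and lambda."""
    merged = merge_latent_partition(latent_blocks, top_blocks)
    n = [len(block) for block in merged]
    m = [len(group) for group in top_blocks]
    lam = []
    for group, n_i in zip(top_blocks, n):
        counts = [0] * n_i  # counts[j - 1] = number of latent blocks of size j
        for j in group:
            counts[len(latent_blocks[j - 1]) - 1] += 1
        lam.append(counts)
    return merged, n, m, lam


if __name__ == "__main__":
    latent = [[1, 3], [2], [4, 7], [5, 8], [6]]   # latent partition of {1, ..., 8}
    top = [[1, 3], [2], [4, 5]]                   # partition of the 5 latent blocks
    merged, n, m, lam = configuration_statistics(latent, top)
    print(merged)  # [[1, 3, 4, 7], [2], [5, 6, 8]]
    print(n, m)    # [4, 1, 3] [2, 1, 2]
    print(lam)     # [[0, 2, 0, 0], [1], [1, 1, 0]]
```

Each row of `lam` satisfies $\sum_{j}{\lambda}_{ij}={m}_{i}$ and $\sum_{j}j{\lambda}_{ij}={n}_{i}$, which is a quick consistency check on any configuration.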

**Figure 2.** Predictive CDFs for the relative changes in larcenies between 1991 and 1995 (relative to 1991) for the 90 most populous US counties; data taken from Section 2.1 of [34]. Data have been rounded to the second decimal. Here, $n=90$ and $k=36$. Solid line: empirical CDF. Dotted line: predictive CDF from (33). Dashed line: predictive CDF from PS2 with $H={\mathcal{T}}_{2{\alpha}_{0}}(\cdot |{\mu}_{0},{\sigma}_{0}^{2})$, ${\sigma}_{0}^{2}={\beta}_{0}({k}_{0}+1)/({\alpha}_{0}{k}_{0})$. Different plots correspond to different values of $\theta $ and $\sigma $. In all the plots, the predictive CDFs are evaluated with ${\mu}_{0}=0$, ${\alpha}_{0}=0.1$, ${\beta}_{0}=0.1$ and ${k}_{0}=0.1$.

**Figure 3.** Predictive CDFs for the relative changes in larcenies between 1991 and 1995 (relative to 1991) for the 90 most populous US counties; data taken from Section 2.1 of [34]. Raw data, without rounding. Here, $n=90$ and $k=36$. Solid line: empirical CDF. Dotted line: predictive CDF from (33). Dashed line: predictive CDF from PS2 with $H={\mathcal{T}}_{2{\alpha}_{0}}(\cdot |{\mu}_{0},{\sigma}_{0}^{2})$, ${\sigma}_{0}^{2}={\beta}_{0}({k}_{0}+1)/({\alpha}_{0}{k}_{0})$. Different plots correspond to different values of $\theta $ and $\sigma $. In all the plots, the predictive CDFs are evaluated with ${\mu}_{0}=0$, ${\alpha}_{0}=0.1$, ${\beta}_{0}=0.1$ and ${k}_{0}=0.1$.
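
The base measure $H$ quoted in the captions of Figures 2 and 3 is a Student-t law with $2{\alpha}_{0}$ degrees of freedom, location ${\mu}_{0}$ and squared scale ${\sigma}_{0}^{2}={\beta}_{0}({k}_{0}+1)/({\alpha}_{0}{k}_{0})$. The sketch below checks this by simulation under an assumption that is not spelled out in the surviving text: the usual Normal-Inverse-Gamma hierarchy ${\sigma}^{2}\sim \mathrm{InvGamma}({\alpha}_{0},{\beta}_{0})$, $\mu |{\sigma}^{2}\sim N({\mu}_{0},{\sigma}^{2}/{k}_{0})$, $Z|\mu ,{\sigma}^{2}\sim N(\mu ,{\sigma}^{2})$. It covers only the base CDF, not the full predictive CDFs plotted in the figures, and all function names are illustrative.

```python
import math
import random


def scale_squared(k0, alpha0, beta0):
    """sigma_0^2 = beta0 (k0 + 1) / (alpha0 k0), as in the figure captions."""
    return beta0 * (k0 + 1.0) / (alpha0 * k0)


def student_t_pdf(t, nu):
    """Density of the standard Student-t law with nu degrees of freedom."""
    c = math.gamma((nu + 1.0) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))
    return c * (1.0 + t * t / nu) ** (-(nu + 1.0) / 2.0)


def student_t_cdf(t, nu, steps=2000):
    """CDF by composite Simpson integration on [0, |t|] plus symmetry (steps must be even)."""
    a = abs(t)
    h = a / steps
    s = student_t_pdf(0.0, nu) + student_t_pdf(a, nu)
    for i in range(1, steps):
        s += (4.0 if i % 2 else 2.0) * student_t_pdf(i * h, nu)
    area = s * h / 3.0
    return 0.5 + area if t >= 0 else 0.5 - area


def base_cdf(x, mu0, k0, alpha0, beta0):
    """CDF of the Student-t base measure with 2 alpha0 degrees of freedom."""
    sigma0 = math.sqrt(scale_squared(k0, alpha0, beta0))
    return student_t_cdf((x - mu0) / sigma0, 2.0 * alpha0)


def sample_marginal(rng, mu0, k0, alpha0, beta0):
    """One draw of Z from the assumed Normal-Inverse-Gamma hierarchy."""
    precision = rng.gammavariate(alpha0, 1.0 / beta0)  # 1 / sigma^2, shape alpha0, rate beta0
    sigma_sq = 1.0 / precision
    mu = rng.gauss(mu0, math.sqrt(sigma_sq / k0))
    return rng.gauss(mu, math.sqrt(sigma_sq))


if __name__ == "__main__":
    params = dict(mu0=0.0, k0=0.1, alpha0=0.1, beta0=0.1)  # values from the captions
    rng = random.Random(0)
    draws = [sample_marginal(rng, **params) for _ in range(20000)]
    empirical = sum(z <= 1.0 for z in draws) / len(draws)
    print(empirical, base_cdf(1.0, **params))
```

With the hyperparameters used in the figures, ${\sigma}_{0}^{2}=0.1\times 1.1/(0.1\times 0.1)=11$ and the law has $0.2$ degrees of freedom, so it is extremely heavy-tailed.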


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Bassetti, F.; Ladelli, L.
Mixture of Species Sampling Models. *Mathematics* **2021**, *9*, 3127.
https://doi.org/10.3390/math9233127
