Article

On Johnson’s “Sufficientness” Postulates for Feature-Sampling Models

by Federico Camerlenghi 1,2,3,* and Stefano Favaro 2,4,5
1
Department of Economics, Management and Statistics, University of Milano-Bicocca, Piazza dell’Ateneo Nuovo 1, 20126 Milano, Italy
2
Collegio Carlo Alberto, Piazza V. Arbarello 8, 10122 Torino, Italy
3
BIDSA, Bocconi University, via Röntgen 1, 20136 Milano, Italy
4
Department of Economics, Social Studies, Applied Mathematics and Statistics, University of Torino, Corso Unione Sovietica 218/bis, 10134 Torino, Italy
5
IMATI-CNR “Enrico Magenes”, 20133 Milano, Italy
*
Author to whom correspondence should be addressed.
Mathematics 2021, 9(22), 2891; https://doi.org/10.3390/math9222891
Submission received: 10 October 2021 / Revised: 3 November 2021 / Accepted: 10 November 2021 / Published: 13 November 2021

Abstract:
In the 1920s, the English philosopher W.E. Johnson introduced a characterization of the symmetric Dirichlet prior distribution in terms of its predictive distribution. This is typically referred to as Johnson’s “sufficientness” postulate, and it has been the subject of many contributions in Bayesian statistics, leading to predictive characterizations of infinite-dimensional generalizations of the Dirichlet distribution, i.e., species-sampling models. In this paper, we review “sufficientness” postulates for species-sampling models, and then investigate analogous predictive characterizations for the more general feature-sampling models. In particular, we present a “sufficientness” postulate for a class of feature-sampling models referred to as Scaled Processes (SPs), and then discuss analogous characterizations in the general setup of feature-sampling models.

1. Introduction

Exchangeability (de Finetti [1]) provides a natural modeling assumption in a large variety of statistical problems, and it amounts to the assumption that the order in which observations are recorded is not relevant. Consider a sequence of random variables $(Z_j)_{j \geq 1}$ defined on a common probability space $(\Omega, \mathcal{A}, \mathbb{P})$ and taking values in an arbitrary space, which is assumed to be Polish. The sequence $(Z_j)_{j \geq 1}$ is exchangeable if and only if
$$(Z_1, \ldots, Z_n) \overset{d}{=} (Z_{\sigma(1)}, \ldots, Z_{\sigma(n)})$$
for any permutation $\sigma$ of the set $\{1, \ldots, n\}$ and any $n \geq 1$. By virtue of the celebrated de Finetti representation theorem, exchangeability of $(Z_j)_{j \geq 1}$ is tantamount to asserting the existence of a random element $\tilde{\mu}$, defined on a (parameter) space $\Theta$, such that, conditionally on $\tilde{\mu}$, the $Z_j$'s are independent and identically distributed with common distribution $p_{\tilde{\mu}}$, i.e.,
$$Z_j \mid \tilde{\mu} \overset{\text{iid}}{\sim} p_{\tilde{\mu}}, \quad j \geq 1, \qquad \tilde{\mu} \sim \mathscr{M},$$
where $\mathscr{M}$ is the distribution of $\tilde{\mu}$. In a Bayesian setting, $\mathscr{M}$ takes on the interpretation of a prior distribution for the parameter object of interest. In this sense, the de Finetti representation theorem is a natural framework for Bayesian statistics. For mathematical convenience, $\Theta$ is assumed to be a Polish space, equipped with the Borel $\sigma$-algebra $\mathcal{B}(\Theta)$. Hereafter, with the term parameter, we refer to both a finite- and an infinite-dimensional object.
Within the framework of exchangeability (1), a critical role is played by the predictive distributions, namely, the conditional distributions of the $(n+1)$th observation $Z_{n+1}$ given $\mathbf{Z}_n := (Z_1, \ldots, Z_n)$. The problem of characterizing prior distributions $\mathscr{M}$ in terms of their predictive distributions has a long history in Bayesian statistics, starting from the seminal work of the English philosopher Johnson [2], who provided a predictive characterization of the symmetric Dirichlet prior distribution. Such a characterization is typically referred to as Johnson’s “sufficientness” postulate. Species-sampling models (Pitman [3]) provide arguably the most popular infinite-dimensional generalization of the Dirichlet distribution. They form a broad class of nonparametric prior models that correspond to the assumption that $p_{\tilde{\mu}}$ in (1) is an almost surely discrete random probability measure
$$\tilde{p} = \sum_{i \geq 1} \tilde{p}_i\, \delta_{\tilde{z}_i},$$
where: (i) $(\tilde{p}_i)_{i \geq 1}$ are non-negative random weights almost surely summing up to 1; (ii) $(\tilde{z}_i)_{i \geq 1}$ are random species' labels, independent of $(\tilde{p}_i)_{i \geq 1}$, and i.i.d. with common (non-atomic) distribution $P$. The term species refers to the fact that the law of $\tilde{p}$ is a prior distribution for the unknown species composition $(\tilde{p}_i)_{i \geq 1}$ of a population of individuals $Z_j$'s, with $Z_j$ belonging to a species $\tilde{z}_i$ with probability $\tilde{p}_i$ for $j, i \geq 1$. In the context of species-sampling models, Regazzini [4] and Lo [5] provided a “sufficientness” postulate for the Dirichlet process (Ferguson [6]). Such a characterization was then extended by Zabell [7] to the Pitman–Yor process (Perman et al. [8], Pitman and Yor [9]) and by Bacallado et al. [10] to the more general Gibbs-type prior models (Gnedin and Pitman [11]).
In this paper, we introduce and discuss Johnson’s “sufficientness” postulates in the feature-sampling setting, which generalizes the species-sampling setting by allowing each individual of the population to belong to multiple species, now called features. We point out that feature-sampling models are extremely important in different areas of application; see, e.g., Griffiths and Ghahramani [12], Ayed et al. [13] and the references therein. Under the framework of exchangeability (1), the feature-sampling setting assumes that
$$Z_j \mid \tilde{\mu} = \sum_{i \geq 1} A_{j,i}\, \delta_{\tilde{w}_i} \sim p_{\tilde{\mu}}$$
and
$$\tilde{\mu} = \sum_{i \geq 1} \tilde{p}_i\, \delta_{\tilde{w}_i},$$
where: (i) conditionally on $\tilde{\mu}$, $(A_{j,i})_{i \geq 1}$ are independent Bernoulli random variables with parameters $(\tilde{p}_i)_{i \geq 1}$; (ii) $(\tilde{p}_i)_{i \geq 1}$ are $(0,1)$-valued random weights; (iii) $(\tilde{w}_i)_{i \geq 1}$ are random features' labels, independent of $(\tilde{p}_i)_{i \geq 1}$, and i.i.d. with common (non-atomic) distribution $P$. That is, individual $Z_j$ displays feature $\tilde{w}_i$ if and only if $A_{j,i} = 1$, which happens with probability $\tilde{p}_i$. For example, if, conditionally on $\tilde{\mu}$, $Z_j$ displays only two features, say $\tilde{w}_1$ and $\tilde{w}_5$, then $Z_j$ equals the random measure $\delta_{\tilde{w}_1} + \delta_{\tilde{w}_5}$. The distribution $p_{\tilde{\mu}}$ is the law of a Bernoulli process with parameter $\tilde{\mu}$, which is denoted by $\mathrm{BeP}(\tilde{\mu})$, whereas the law of $\tilde{\mu}$ is a nonparametric prior distribution for the unknown feature probabilities $(\tilde{p}_i)_{i \geq 1}$, i.e., a feature-sampling model. Here, we investigate the problem of characterizing prior distributions for $\tilde{\mu}$ in terms of their predictive distributions, with the goal of providing “sufficientness” postulates for feature-sampling models. We discuss such a problem and present partial results for a class of feature-sampling models referred to as Scaled Process (SP) priors for $\tilde{\mu}$ (James et al. [14], Camerlenghi et al. [15]). With these results, we aim at stimulating future research in this field to obtain “sufficientness” postulates for general feature-sampling models.
The paper is structured as follows. In Section 2, we present a brief review on Johnson’s “sufficientness” postulates for species-sampling models. Section 3 focuses on nonparametric prior models for the Bernoulli process, i.e., feature-sampling models; we review their definitions, properties, and sampling structures. In Section 4, we present a “sufficientness” postulate for SPs. Section 5 concludes the paper by discussing our results and conjecturing analogous results for more general classes of feature-sampling models.

2. Species-Sampling Models

To introduce species-sampling models, we assume that the observations are $\mathbb{Z}$-valued random elements, where $\mathbb{Z}$ is supposed to be a Polish space whose Borel $\sigma$-algebra is denoted by $\mathscr{Z}$. Thus, $\mathbb{Z}$ contains all the possible species' labels of the populations. When we deal with species-sampling models, the hierarchical formulation (1) specializes as
$$Z_j \mid \tilde{p} \overset{\text{iid}}{\sim} \tilde{p}, \quad j \geq 1, \qquad \tilde{p} \sim \mathscr{M},$$
where $\tilde{p} = \sum_{i \geq 1} \tilde{p}_i \delta_{\tilde{z}_i}$ is an almost surely discrete random probability measure on $\mathbb{Z}$, and $\mathscr{M}$ denotes its law. We also remind the reader that: (i) $(\tilde{p}_i)_{i \geq 1}$ are non-negative random weights almost surely summing up to 1; (ii) $(\tilde{z}_i)_{i \geq 1}$ are random species' labels, independent of $(\tilde{p}_i)_{i \geq 1}$, and i.i.d. with common (non-atomic) distribution $P$. Using the terminology of Pitman [3], the discrete random probability measure $\tilde{p}$ is a species-sampling model. In Bayesian nonparametrics, popular examples of species-sampling models are: the Dirichlet process (Ferguson [6]), the Pitman–Yor process (Perman et al. [8], Pitman and Yor [9]), and the normalized generalized Gamma process (Brix [16], Lijoi et al. [17]). These are examples belonging to a peculiar subclass of species-sampling models, which are referred to as Gibbs-type prior models (Gnedin and Pitman [11], De Blasi et al. [18]). More general subclasses of species-sampling models are, e.g., the homogeneous normalized random measures (Regazzini et al. [19]) and the Poisson–Kingman models (Pitman [20], Pitman [21]). We refer to Lijoi and Prünster [22] and Ghosal and van der Vaart [23] for a detailed and stimulating account on species-sampling models and their use in Bayesian nonparametrics.
Because of the almost sure discreteness of $\tilde{p}$ in (4), a random sample $\mathbf{Z}_n := (Z_1, \ldots, Z_n)$ from $\tilde{p}$ features ties, that is, $\mathbb{P}(Z_{j_1} = Z_{j_2}) > 0$ for $j_1 \neq j_2$. Thus, $\mathbf{Z}_n$ induces a random partition of the set $\{1, \ldots, n\}$ into $K_n = k$ blocks, labeled by the distinct values $Z_1^*, \ldots, Z_{K_n}^*$, with corresponding frequencies $(N_{n,1}, \ldots, N_{n,K_n}) = (n_1, \ldots, n_k)$, such that $N_{n,i} \geq 1$ and $\sum_{1 \leq i \leq K_n} N_{n,i} = n$. From Pitman [3], the predictive distribution of $\tilde{p}$ is of the form
$$\mathbb{P}(Z_{n+1} \in A \mid \mathbf{Z}_n) = g(n, k, \mathbf{n})\, P(A) + \sum_{i=1}^{k} f_i(n, k, \mathbf{n})\, \delta_{Z_i^*}(A), \quad A \in \mathscr{Z},$$
for any $n \geq 1$, having set $\mathbf{n} = (n_1, \ldots, n_k)$, with $g$ and $f_i$ being arbitrary non-negative functions that satisfy the constraint $g(n, k, \mathbf{n}) + \sum_{i=1}^{k} f_i(n, k, \mathbf{n}) = 1$. The predictive distribution (5) admits the following interpretation: (i) $g(n, k, \mathbf{n})$ corresponds to the probability that $Z_{n+1}$ is a new species, that is, a species not observed in $\mathbf{Z}_n$; (ii) $f_i(n, k, \mathbf{n})$ corresponds to the probability that $Z_{n+1}$ is the species $Z_i^*$ observed in $\mathbf{Z}_n$. The functions $g$ and $f_i$ completely determine the distribution of the exchangeable sequence $(Z_j)_{j \geq 1}$ and, in turn, the distribution of the random partition of $\mathbb{N}$ induced by $(Z_j)_{j \geq 1}$. Predictive distributions of popular species-sampling models, e.g., the Dirichlet process, the Pitman–Yor process, and the normalized generalized Gamma process, are of the form (5) for suitable specifications of the functions $g$ and $f_i$. We refer to Pitman [21] for a detailed account of random partitions induced by species-sampling models and generalizations thereof.
Here, we recall the predictive distribution of Gibbs-type prior models (Gnedin and Pitman [11], De Blasi et al. [18]). Let us first introduce the definition of these processes.
Definition 1.
Let $\sigma \in (-\infty, 1)$ and let $P$ be a (non-atomic) distribution on $(\mathbb{Z}, \mathscr{Z})$. A Gibbs-type prior model is a species-sampling model with a predictive distribution of the form
$$\mathbb{P}(Z_{n+1} \in A \mid \mathbf{Z}_n) = \frac{V_{n+1,k+1}}{V_{n,k}}\, P(A) + \frac{V_{n+1,k}}{V_{n,k}} \sum_{i=1}^{k} (n_i - \sigma)\, \delta_{Z_i^*}(A), \quad A \in \mathscr{Z},$$
for any $n \geq 1$, where $\{V_{n,k} : n \geq 1,\ 1 \leq k \leq n\}$ is a collection of non-negative weights that satisfy the recurrence relation $V_{n,k} = (n - \sigma k)\,V_{n+1,k} + V_{n+1,k+1}$ for all $k = 1, \ldots, n$, $n \geq 1$, with the proviso $V_{1,1} = 1$.
Note that the Dirichlet process is a Gibbs-type prior model that corresponds to
$$V_{n,k} = \frac{\theta^k}{(\theta)_n}$$
for $\theta > 0$, where we have denoted by $(a)_b = \Gamma(a + b)/\Gamma(a)$ the Pochhammer symbol for the rising factorials. Moreover, the Pitman–Yor process is a Gibbs-type prior model corresponding to
$$V_{n,k} = \frac{\prod_{i=0}^{k-1} (\theta + i\sigma)}{(\theta)_n}$$
for $\sigma \in (0, 1)$ and $\theta > -\sigma$. We refer to Pitman [20] for other examples of Gibbs-type prior models and for a detailed account of the $V_{n,k}$'s; see also Pitman [21] and the references therein.
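For concreteness, the following sketch (ours, not part of the paper) evaluates the predictive weights in (6) from the $V_{n,k}$'s of the Dirichlet and Pitman–Yor processes; the data frequencies and the parameter values are arbitrary choices for illustration, and the computation recovers the familiar weights $\theta/(\theta+n)$ and $n_i/(\theta+n)$ for the Dirichlet process, and $(\theta+k\sigma)/(\theta+n)$ and $(n_i-\sigma)/(\theta+n)$ for the Pitman–Yor process.

from math import lgamma, exp

def pochhammer(a, b):
    # (a)_b = Gamma(a + b) / Gamma(a), the rising factorial
    return exp(lgamma(a + b) - lgamma(a))

def V_dirichlet(n, k, theta):
    # V_{n,k} = theta^k / (theta)_n for the Dirichlet process (sigma = 0)
    return theta**k / pochhammer(theta, n)

def V_pitman_yor(n, k, theta, sigma):
    # V_{n,k} = prod_{i=0}^{k-1} (theta + i*sigma) / (theta)_n
    num = 1.0
    for i in range(k):
        num *= theta + i * sigma
    return num / pochhammer(theta, n)

def gibbs_predictive(V, n, k, counts, sigma):
    """Predictive weights of (6): probability of a new species and of each observed species."""
    p_new = V(n + 1, k + 1) / V(n, k)
    p_old = [V(n + 1, k) / V(n, k) * (n_i - sigma) for n_i in counts]
    return p_new, p_old   # p_new + sum(p_old) equals 1

counts = [4, 3, 2, 1]                 # frequencies n_1, ..., n_k of the k observed species
n, k = sum(counts), len(counts)

# Dirichlet process with theta = 1
print(gibbs_predictive(lambda m, j: V_dirichlet(m, j, 1.0), n, k, counts, 0.0))

# Pitman-Yor process with theta = 1, sigma = 0.5
print(gibbs_predictive(lambda m, j: V_pitman_yor(m, j, 1.0, 0.5), n, k, counts, 0.5))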
Because of de Finetti’s representation theorem, there exists a one-to-one correspondence between the functions $g$ and $f_i$ in the predictive distribution (5) and the law $\mathscr{M}$ of $\tilde{p}$, i.e., the de Finetti measure. This is at the basis of Johnson’s “sufficientness” postulates, characterizing species-sampling models through their predictive distributions. Regazzini [4] and, later, Lo [5] provided the first “sufficientness” postulate for species-sampling models, showing that the Dirichlet process is the unique species-sampling model for which the function $g$ depends on $\mathbf{Z}_n$ only through $n$, and the function $f_i$ depends on $\mathbf{Z}_n$ only through $n$ and $n_i$ for $i \geq 1$. Such a result was extended in Zabell [24], providing the following “sufficientness” postulate for the Pitman–Yor process: the Pitman–Yor process is the unique species-sampling model for which the function $g$ depends on $\mathbf{Z}_n$ only through $n$ and $k$, and the function $f_i$ depends on $\mathbf{Z}_n$ only through $n$ and $n_i$ for $i \geq 1$. Bacallado et al. [10] discussed the “sufficientness” postulate in the more general setting of Gibbs-type prior models, showing that Gibbs-type prior models are the sole species-sampling models for which the function $g$ depends on $\mathbf{Z}_n$ only through $n$ and $k$, and the function $f_i$ depends on $\mathbf{Z}_n$ only through $n$, $k$, and $n_i$. This result shows a critical difference, at the sampling level, between the Pitman–Yor process and Gibbs-type prior models: it lies in the inclusion of the sampling information on the observed number of distinct species in the probability of observing, at the $(n+1)$th draw, a species already observed in the sample.

3. Feature-Sampling Models

Feature-sampling models generalize species-sampling models by allowing each individual to belong to more than one species, which are now called features. To introduce feature-sampling models, we consider a space of features $\mathbb{W}$, which is assumed to be a Polish space, and we denote by $\mathscr{W}$ its Borel $\sigma$-field. Thus, $\mathbb{W}$ contains all the possible features' labels of the population. Observations are represented through the counting measure (3), whose parameter $\tilde{\mu}$ is an almost surely discrete measure with masses in $(0,1)$. When we deal with feature-sampling models, the hierarchical formulation (1) specializes as
$$Z_j \mid \tilde{\mu} \overset{\text{iid}}{\sim} \mathrm{BeP}(\tilde{\mu}), \quad j \geq 1, \qquad \tilde{\mu} \sim \mathscr{M},$$
where $\tilde{\mu} = \sum_{i \geq 1} \tilde{p}_i \delta_{\tilde{w}_i}$ is an almost surely discrete random measure on $\mathbb{W}$, and $\mathscr{M}$ denotes its law. We also remind the reader that: (i) conditionally on $\tilde{\mu}$, $(A_{j,i})_{i \geq 1}$ are independent Bernoulli random variables with parameters $(\tilde{p}_i)_{i \geq 1}$; (ii) $(\tilde{p}_i)_{i \geq 1}$ are $(0,1)$-valued random weights; (iii) $(\tilde{w}_i)_{i \geq 1}$ are random features' labels, independent of $(\tilde{p}_i)_{i \geq 1}$, and i.i.d. with common (non-atomic) distribution $P$. Completely random measures (CRMs) (Daley and Vere-Jones [25], Kingman [26]) provide a popular class of nonparametric priors $\mathscr{M}$, the most common examples of which are the Beta process prior and the stable Beta process prior (Teh and Gorur [27], James [28]); see also Broderick et al. [29] and the references therein for other examples of CRM priors and generalizations thereof. Recently, Camerlenghi et al. [15] investigated an alternative class of nonparametric priors $\mathscr{M}$, generalizing CRM priors and referring to these as Scaled Processes (SPs). SP priors first appeared in the work of James [28].
We assume a random sample $\mathbf{Z}_n := (Z_1, \ldots, Z_n)$ to be modeled as in (7), and we introduce the predictive distribution of $\tilde{\mu}$, that is, the conditional probability of $Z_{n+1}$ given $\mathbf{Z}_n$. Note that, because of the pure discreteness of $\tilde{\mu}$, the observations $\mathbf{Z}_n$ may share a random number of distinct features, say $K_n = k$, denoted here as $W_1, \ldots, W_{K_n}$, and each feature $W_i$ is displayed by exactly $M_{n,i} = m_i$ of the $n$ individuals, for $i = 1, \ldots, k$. Since the features' labels are immaterial and i.i.d. from the base measure $P$, the conditional distribution of $Z_{n+1}$, given $\mathbf{Z}_n$, may be equivalently characterized through the vector $(Y_{n+1}, A_{n+1,1}, \ldots, A_{n+1,K_n})$, where: (i) $Y_{n+1}$ is the number of new features displayed by the $(n+1)$th individual, namely, features hitherto unobserved in the sample $\mathbf{Z}_n$; (ii) $A_{n+1,i}$ is a $\{0,1\}$-valued random variable for any $i = 1, \ldots, K_n$, and $A_{n+1,i} = 1$ if the $(n+1)$th individual displays feature $W_i$; it equals 0 otherwise. Hence, the predictive distribution of $\tilde{\mu}$ is
$$\mathbb{P}\big((Y_{n+1}, A_{n+1,1}, \ldots, A_{n+1,K_n}) = (y, a_1, \ldots, a_{K_n}) \mid \mathbf{Z}_n\big) = f(y, a_1, \ldots, a_k;\, n, k, \mathbf{m}),$$
where we denote by $f$ a probability distribution evaluated at $(y, a_1, \ldots, a_k)$, and where $n$, $k$, and $\mathbf{m} := (m_1, \ldots, m_k)$ constitute the sampling information. In the rest of this section, we specify the function $f$ under the assumption of a CRM prior and an SP prior, showing its dependence on $n$, $K_n$, and $(M_{n,1}, \ldots, M_{n,K_n})$. In particular, we show how SP priors allow one to enrich the predictive distribution of CRM priors by including additional sampling information in terms of the number of distinct features and their corresponding frequencies.
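To fix ideas on the sampling information $(n, k, \mathbf{m})$ entering (8), the following small sketch (ours, with a made-up data matrix) extracts these statistics from a binary feature-allocation matrix whose $(j, i)$ entry is $A_{j,i}$.

import numpy as np

# Rows are individuals Z_1, ..., Z_n; columns are the distinct observed features.
# Entry (j, i) is A_{j,i} = 1 if individual j displays feature W_i.
Z = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
])

n = Z.shape[0]                 # sample size
m = Z.sum(axis=0)              # frequencies M_{n,1}, ..., M_{n,K_n}
k = int((m > 0).sum())         # K_n, number of distinct observed features
print(n, k, m.tolist())        # -> 3 4 [2, 2, 2, 1]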

3.1. Priors Based on CRMs

Let $\mathbb{M}_{\mathbb{W}}$ denote the space of all bounded and finite measures on $(\mathbb{W}, \mathscr{W})$, that is to say, $\mu \in \mathbb{M}_{\mathbb{W}}$ iff $\mu(A) < +\infty$ for any bounded set $A \in \mathscr{W}$. Here, we recall the definition of a Completely Random Measure (CRM) (see, e.g., Daley and Vere-Jones [25]).
Definition 2.
A Completely Random Measure (CRM) $\tilde{\mu}$ on $(\mathbb{W}, \mathscr{W})$ is a random element taking values in the space $\mathbb{M}_{\mathbb{W}}$ such that the random variables $\tilde{\mu}(A_1), \ldots, \tilde{\mu}(A_n)$ are independent for any choice of bounded and disjoint sets $A_1, \ldots, A_n \in \mathscr{W}$ and for any $n \geq 1$.
We remind the reader that Kingman [26] proved that a CRM may be decomposed as the sum of a deterministic drift and a purely atomic component. In Bayesian nonparametrics, it is common to consider purely atomic CRMs without fixed points of discontinuity, that is to say, $\tilde{\mu}$ may be represented as $\tilde{\mu} := \sum_{i \geq 1} \tilde{\eta}_i \delta_{\tilde{w}_i}$, where $(\tilde{\eta}_i)_{i \geq 1}$ is a sequence of random atoms and $(\tilde{w}_i)_{i \geq 1}$ are the random locations. An appealing property of purely atomic CRMs is the availability of their Laplace functional; indeed, for any measurable function $f : \mathbb{W} \to \mathbb{R}_+$, one has
$$\mathbb{E}\Big[ e^{-\int_{\mathbb{W}} f(w)\, \tilde{\mu}(\mathrm{d}w)} \Big] = \exp\Big\{ -\int_{\mathbb{W} \times \mathbb{R}_+} \big(1 - e^{-s f(w)}\big)\, \nu(\mathrm{d}w, \mathrm{d}s) \Big\},$$
where $\nu$ is a measure on $\mathbb{W} \times \mathbb{R}_+$ called the Lévy intensity of the CRM $\tilde{\mu}$, and it is such that
$$\nu(\{w\} \times \mathbb{R}_+) = 0 \quad \forall\, w \in \mathbb{W}, \qquad \text{and} \qquad \int_{A \times \mathbb{R}_+} \min\{s, 1\}\, \nu(\mathrm{d}w, \mathrm{d}s) < \infty$$
for any bounded Borel set $A$. Here, we focus on homogeneous CRMs by assuming that the atoms $\tilde{\eta}_i$'s and the locations $\tilde{w}_i$'s are independent; in this case, the Lévy measure may be written as
$$\nu(\mathrm{d}w, \mathrm{d}s) = \lambda(s)\, \mathrm{d}s\, P(\mathrm{d}w)$$
for some measurable function $\lambda : \mathbb{R}_+ \to \mathbb{R}_+$ and a probability measure $P$ on $(\mathbb{W}, \mathscr{W})$, called the base measure, which is assumed to be diffuse. In this case, the distribution of $\tilde{\mu}$ will be denoted as $\mathrm{CRM}(\lambda; P)$, and the second integrability condition in (10) reduces to the following:
$$\int_{\mathbb{R}_+} \min\{s, 1\}\, \lambda(s)\, \mathrm{d}s < +\infty.$$
In the feature-sampling framework, $\tilde{\mu}$ may be used as a prior distribution if the sequence of atoms $(\tilde{\eta}_i)_{i \geq 1}$ takes values in $[0,1]$, which happens if the Lévy intensity has support on $\mathbb{W} \times [0,1]$. A noteworthy example, widely used in this setting, is the stable Beta process prior (Teh and Gorur [27]). It is defined as a CRM with Lévy intensity
$$\lambda(s) = \alpha \cdot \frac{\Gamma(1 + c)}{\Gamma(1 - \sigma)\,\Gamma(c + \sigma)}\; s^{-1-\sigma} (1 - s)^{c + \sigma - 1}\, \mathbb{1}_{(0,1)}(s),$$
where $c > 0$, $\sigma \in (0, 1)$, and $\alpha > 0$ (James [28], Masoero et al. [30]). Now, we describe the predictive distribution for an arbitrary CRM $\tilde{\mu}$. For the sake of clarity, we fix the following notation:
$$\mathrm{Poiss}(y; C) := \frac{C^y e^{-C}}{y!}, \quad y \in \mathbb{N}, \qquad \text{and} \qquad \mathrm{Bern}(a; p) := p^a (1 - p)^{1-a}, \quad a \in \{0, 1\},$$
to denote the probability mass functions of a Poisson random variable with parameter $C > 0$ and a Bernoulli random variable with parameter $p \in [0, 1]$, respectively. We refer to James [28] for a detailed posterior analysis of CRM priors; see also Broderick et al. [29] and the references therein.
Theorem 1
(James [28]). Let $Z_1, Z_2, \ldots$ be exchangeable random variables modeled as in (7), where $\mathscr{M}$ equals $\mathrm{CRM}(\lambda; P)$. If $\mathbf{Z}_n$ is a random sample that displays $K_n = k$ distinct features $\{W_1, \ldots, W_{K_n}\}$, and feature $W_i$ appears exactly $M_{n,i} = m_i$ times in the sample, for $i = 1, \ldots, K_n$, then
$$\mathbb{P}\big((Y_{n+1}, A_{n+1,1}, \ldots, A_{n+1,K_n}) = (y, a_1, \ldots, a_{K_n}) \mid \mathbf{Z}_n\big) = \mathrm{Poiss}\Big(y;\; \int_0^1 s(1-s)^n \lambda(s)\, \mathrm{d}s\Big) \prod_{i=1}^{k} \mathrm{Bern}(a_i;\, p_i),$$
where
$$p_i := \frac{\int_0^1 s^{m_i+1}(1-s)^{n-m_i} \lambda(s)\, \mathrm{d}s}{\int_0^1 s^{m_i}(1-s)^{n-m_i} \lambda(s)\, \mathrm{d}s}.$$
Proof. 
By James [28] (Proposition 3.2) for Bernoulli product models (see also Camerlenghi et al. [15] (Proposition 1)), the distribution of $Z_{n+1}$, given $\mathbf{Z}_n$, equals the distribution of
$$Z'_{n+1} + \sum_{i=1}^{K_n} A_{n+1,i}\, \delta_{W_i},$$
where $Z'_{n+1} \mid \tilde{\mu}' = \sum_{i \geq 1} A'_{n+1,i}\, \delta_{\tilde{w}_i} \sim \mathrm{BeP}(\tilde{\mu}')$ with $\tilde{\mu}' \sim \mathrm{CRM}((1-s)^n \lambda; P)$, and $A_{n+1,1}, \ldots, A_{n+1,K_n}$ are Bernoulli random variables with parameters $J_1, \ldots, J_{K_n}$, respectively, such that each $J_i$ is a random variable whose distribution has a density function of the form
$$f_{J_i}(s) \propto (1-s)^{n-m_i} s^{m_i} \lambda(s).$$
By exploiting the previous predictive characterization, we can derive the posterior distribution of $Y_{n+1}$ given $\mathbf{Z}_n$ by means of a direct application of the Laplace functional. Indeed, the distribution of $Y_{n+1} \mid \mathbf{Z}_n$ equals that of $\sum_{i \geq 1} A'_{n+1,i}$. Thus, for any $t \in \mathbb{R}$, we have the following:
$$\mathbb{E}[e^{-t Y_{n+1}} \mid \mathbf{Z}_n] = \mathbb{E}\big[e^{-t \sum_{i \geq 1} A'_{n+1,i}}\big] = \mathbb{E}\Big[\prod_{i \geq 1} e^{-t A'_{n+1,i}}\Big] = \mathbb{E}\Big[\mathbb{E}\Big[\prod_{i \geq 1} e^{-t A'_{n+1,i}} \,\Big|\, \tilde{\mu}'\Big]\Big] = \mathbb{E}\Big[\prod_{i \geq 1} \big(e^{-t}\tilde{\eta}_i + (1 - \tilde{\eta}_i)\big)\Big],$$
where we used the representation $\tilde{\mu}' = \sum_{i \geq 1} \tilde{\eta}_i \delta_{\tilde{w}_i}$ and the fact that the $A'_{n+1,i}$'s are independent Bernoulli random variables conditionally on $\tilde{\mu}'$. We now use the Laplace functional for $\tilde{\mu}'$ to get
$$\mathbb{E}[e^{-t Y_{n+1}} \mid \mathbf{Z}_n] = \mathbb{E}\Big[\exp\Big\{\sum_{i \geq 1} \log\big(1 + \tilde{\eta}_i(e^{-t} - 1)\big)\Big\}\Big] = \exp\Big\{-(1 - e^{-t}) \int_0^1 (1-s)^n s\, \lambda(s)\, \mathrm{d}s\Big\}.$$
As a direct consequence, the posterior distribution of $Y_{n+1}$ given $\mathbf{Z}_n$ is a Poisson distribution with mean $\int_0^1 (1-s)^n s\, \lambda(s)\, \mathrm{d}s$. Again, by exploiting the predictive representation (14), the posterior distribution of $A_{n+1,i}$, for $i = 1, \ldots, K_n$, is a Bernoulli with the following mean:
$$\mathbb{E}[J_i] = \int_0^1 s\, f_{J_i}(s)\, \mathrm{d}s = \frac{\int_0^1 (1-s)^{n-m_i} s^{m_i+1} \lambda(s)\, \mathrm{d}s}{\int_0^1 (1-s)^{n-m_i} s^{m_i} \lambda(s)\, \mathrm{d}s}. \qquad \square$$
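Since the predictive quantities of Theorem 1 are one-dimensional integrals, they can be evaluated numerically for any specific intensity $\lambda$. The sketch below (ours, not from the paper) does this with standard quadrature, using the stable Beta intensity (12) only as a test case with arbitrary parameter values.

from math import gamma
from scipy.integrate import quad

def stable_beta_intensity(s, alpha=1.0, c=1.0, sigma=0.25):
    # The stable Beta intensity (12); alpha, c, sigma are arbitrary example values.
    const = alpha * gamma(1 + c) / (gamma(1 - sigma) * gamma(c + sigma))
    return const * s ** (-1 - sigma) * (1 - s) ** (c + sigma - 1)

def poisson_mean(lam, n):
    # Mean of Y_{n+1} | Z_n in Theorem 1: int_0^1 s (1 - s)^n lam(s) ds
    return quad(lambda s: s * (1 - s) ** n * lam(s), 0, 1)[0]

def p_old(lam, n, m_i):
    # Probability p_i of displaying again a feature observed m_i times (Theorem 1)
    num = quad(lambda s: s ** (m_i + 1) * (1 - s) ** (n - m_i) * lam(s), 0, 1)[0]
    den = quad(lambda s: s ** m_i * (1 - s) ** (n - m_i) * lam(s), 0, 1)[0]
    return num / den

n, m = 10, [4, 2, 1]
print(poisson_mean(stable_beta_intensity, n))
print([p_old(stable_beta_intensity, n, mi) for mi in m])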
Corollary 1.
Let $Z_1, Z_2, \ldots$ be exchangeable random variables modeled as in (7), where $\mathscr{M}$ is the law of the stable Beta process. If $\mathbf{Z}_n$ is a random sample that displays $K_n = k$ distinct features $\{W_1, \ldots, W_{K_n}\}$, and feature $W_i$ appears exactly $M_{n,i} = m_i$ times in the sample, for $i = 1, \ldots, K_n$, then
$$\mathbb{P}\big((Y_{n+1}, A_{n+1,1}, \ldots, A_{n+1,K_n}) = (y, a_1, \ldots, a_{K_n}) \mid \mathbf{Z}_n\big) = \mathrm{Poiss}\Big(y;\; \alpha \frac{(c+\sigma)_n}{(c+1)_n}\Big) \prod_{i=1}^{k} \mathrm{Bern}\Big(a_i;\; \frac{m_i - \sigma}{n + c}\Big),$$
where $(x)_y = \Gamma(x + y)/\Gamma(x)$ denotes the Pochhammer symbol for $x, y > 0$.
Proof. 
It is sufficient to specialize Theorem 1 for the stable Beta process. In particular, from Theorem 1, the posterior distribution of $Y_{n+1}$ given $\mathbf{Z}_n$ is a Poisson distribution with mean
$$\int_0^1 s(1-s)^n \lambda(s)\, \mathrm{d}s \overset{(12)}{=} \alpha \frac{\Gamma(1+c)}{\Gamma(1-\sigma)\Gamma(c+\sigma)} \int_0^1 s^{-\sigma} (1-s)^{n+c+\sigma-1}\, \mathrm{d}s = \alpha \frac{(c+\sigma)_n}{(c+1)_n}.$$
Moreover, the parameters of the Bernoulli random variables $A_{n+1,1}, \ldots, A_{n+1,K_n}$ are equal to
$$p_i = \frac{\int_0^1 s^{m_i+1}(1-s)^{n-m_i} \lambda(s)\, \mathrm{d}s}{\int_0^1 s^{m_i}(1-s)^{n-m_i} \lambda(s)\, \mathrm{d}s} \overset{(12)}{=} \frac{B(m_i + 1 - \sigma,\, c + \sigma + n - m_i)}{B(m_i - \sigma,\, c + \sigma + n - m_i)} = \frac{m_i - \sigma}{n + c}$$
for $i = 1, \ldots, K_n$. □
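Corollary 1 gives the predictive rule of the stable Beta process in closed form, so a single predictive step can be sampled directly, in the spirit of Indian-buffet-type schemes. The sketch below is ours, with arbitrary parameter values and frequencies; its Poisson rate agrees with the quadrature sketch above when the same intensity is used.

import numpy as np
from math import lgamma, exp

rng = np.random.default_rng(0)

def pochhammer(x, y):
    # (x)_y = Gamma(x + y) / Gamma(x)
    return exp(lgamma(x + y) - lgamma(x))

def stable_beta_predictive_step(m, n, alpha=1.0, c=1.0, sigma=0.25):
    """One draw of (Y_{n+1}, A_{n+1,1}, ..., A_{n+1,K_n}) from Corollary 1."""
    rate = alpha * pochhammer(c + sigma, n) / pochhammer(c + 1, n)
    y_new = rng.poisson(rate)                                        # number of new features
    a_old = rng.binomial(1, [(mi - sigma) / (n + c) for mi in m])    # revisit of each old feature
    return y_new, a_old

m = [4, 2, 1]       # frequencies of the K_n = 3 observed features
n = 10              # sample size
print(stable_beta_predictive_step(m, n))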

3.2. SP Priors

From Theorem 1, under CRM priors, the distribution of the number of new features $Y_{n+1}$ is a Poisson distribution that depends on the sampling information only through the sample size $n$. Moreover, the probability of observing a feature already observed in the sample, say $W_i$, depends only on the sample size $n$ and the frequency $m_i$ of feature $W_i$ out of the initial sample. Camerlenghi et al. [15] showed that SP priors allow one to enrich the predictive structure of CRM priors, including additional sampling information in the probability of discovering new features. To introduce SP priors, consider a CRM $\tilde{\mu} = \sum_{i \geq 1} \tilde{\tau}_i \delta_{\tilde{w}_i}$ on $\mathbb{W}$, where $(\tilde{\tau}_i)_{i \geq 1}$ are positive random atoms and $(\tilde{w}_i)_{i \geq 1}$ are i.i.d. random locations, with Lévy intensity $\nu(\mathrm{d}w, \mathrm{d}s) = \lambda(s)\, \mathrm{d}s\, P(\mathrm{d}w)$ satisfying
$$\int_0^{\infty} \min\{s, 1\}\, \lambda(s)\, \mathrm{d}s < +\infty.$$
Consider the ordered jumps $\Delta_1 > \Delta_2 > \cdots$ of the CRM $\tilde{\mu}$ and define the random measure
$$\tilde{\mu}_{\Delta_1} = \sum_{i \geq 1} \frac{\Delta_{i+1}}{\Delta_1}\, \delta_{\tilde{w}_i},$$
obtained by normalizing $\tilde{\mu}$ by its largest jump. The definition of SPs follows with a suitable change in the distribution of $\Delta_1$ (James et al. [14], Camerlenghi et al. [15]). Let us denote by $\mathscr{L}(\cdot \mid a)$ a regular version of the conditional probability distribution of $(\Delta_{i+1}/\Delta_1)_{i \geq 1}$ given $\Delta_1 = a$. Now denote by $\Psi_1$ a positive random variable with density function $f_{\Psi_1}$ on $\mathbb{R}_+$ and define
$$\mathscr{L}(\cdot) := \int_{\mathbb{R}_+} \mathscr{L}(\cdot \mid a)\, f_{\Psi_1}(a)\, \mathrm{d}a,$$
that is, the law $\mathscr{L}$ is obtained by mixing the conditional law $\mathscr{L}(\cdot \mid a)$ with respect to the density function $f_{\Psi_1}$. Thus, we are ready to define an SP.
Definition 3.
A Scaled Process (SP) prior on $(\mathbb{W}, \mathscr{W})$ is defined as the almost surely discrete random measure
$$\tilde{\mu}_{\Psi_1} := \sum_{i \geq 1} \tilde{\eta}_i\, \delta_{\tilde{w}_i},$$
where $(\tilde{\eta}_i)_{i \geq 1}$ has distribution $\mathscr{L}$ and $(\tilde{w}_i)_{i \geq 1}$ is a sequence of independent random variables with common distribution $P$, also independent of $(\tilde{\eta}_i)_{i \geq 1}$. We will write $\tilde{\mu}_{\Psi_1} \sim \mathrm{SP}(\nu, f_{\Psi_1})$.
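As a constructive illustration of Definition 3 (our sketch, not taken from the paper), the weights of an SP built from the stable intensity $\lambda(s) = \sigma s^{-1-\sigma}$, the case treated in Corollary 2 below, can be simulated exactly: for that intensity the ordered jumps of the CRM are $\Delta_i = \Gamma_i^{-1/\sigma}$, with $\Gamma_i$ the arrival times of a unit-rate Poisson process, so, given $\Psi_1 = a$ (the variable replacing the largest jump $\Delta_1$), the SP weights are $\eta_i = (1 + a^{\sigma}(E_1 + \cdots + E_i))^{-1/\sigma}$ with $E_j$ i.i.d. standard exponential random variables. The Gamma prior used below for $\Psi_1$ is an arbitrary stand-in for $f_{\Psi_1}$.

import numpy as np

rng = np.random.default_rng(1)

def stable_sp_weights(sigma, psi1, n_atoms=1000):
    """Decreasing weights (eta_i) of a stable SP given Psi_1 = psi1, truncated at n_atoms atoms.

    For lambda(s) = sigma * s**(-1 - sigma), the ordered CRM jumps are
    Delta_i = Gamma_i**(-1/sigma); conditioning on the largest jump and giving it
    the law f_{Psi_1} yields eta_i = (1 + psi1**sigma * (E_1 + ... + E_i))**(-1/sigma).
    """
    E = rng.exponential(size=n_atoms)          # standard exponential increments
    return (1.0 + psi1 ** sigma * np.cumsum(E)) ** (-1.0 / sigma)

psi1 = rng.gamma(2.0, 1.0)        # one draw from a Gamma(2, 1), an arbitrary stand-in for f_{Psi_1}
eta = stable_sp_weights(sigma=0.5, psi1=psi1)
print(psi1, eta[:5])              # the first few weights, all in (0, 1) and decreasing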
A thorough account with a complete posterior analysis for SPs is given in Camerlenghi et al. [15]. Here, we characterize the predictive distribution (8) of SPs.
Theorem 2.
(Camerlenghi et al. [15], James [28]). Let $Z_1, Z_2, \ldots$ be exchangeable random variables modeled as in (7), where $\mathscr{M}$ equals $\mathrm{SP}(\nu, f_{\Psi_1})$. If $\mathbf{Z}_n$ is a random sample that displays $K_n = k$ distinct features $\{W_1, \ldots, W_{K_n}\}$, and feature $W_i$ appears exactly $M_{n,i} = m_i$ times in the sample, for $i = 1, \ldots, K_n$, then the conditional distribution of $\Psi_1$, given $\mathbf{Z}_n$, has posterior density
$$f_{\Psi_1 \mid \mathbf{Z}_n}(a) \propto e^{-\sum_{i=1}^{n} \int_0^1 s(1-s)^{i-1}\, a\, \lambda(as)\, \mathrm{d}s} \prod_{i=1}^{k} \int_0^1 s^{m_i}(1-s)^{n-m_i}\, a\, \lambda(as)\, \mathrm{d}s \; f_{\Psi_1}(a).$$
Moreover, conditionally on $\mathbf{Z}_n$ and $\Psi_1$,
$$\mathbb{P}\big((Y_{n+1}, A_{n+1,1}, \ldots, A_{n+1,K_n}) = (y, a_1, \ldots, a_{K_n}) \mid \mathbf{Z}_n, \Psi_1\big) = \mathrm{Poiss}\Big(y;\; \int_0^1 s\, \Psi_1 (1-s)^n \lambda(s\Psi_1)\, \mathrm{d}s\Big) \prod_{i=1}^{k} \mathrm{Bern}\big(a_i;\, p_i(\Psi_1)\big),$$
where
$$p_i(\Psi_1) := \frac{\int_0^1 s^{m_i+1}(1-s)^{n-m_i} \lambda(s\Psi_1)\, \mathrm{d}s}{\int_0^1 s^{m_i}(1-s)^{n-m_i} \lambda(s\Psi_1)\, \mathrm{d}s}.$$
Proof. 
The representation of the predictive distribution (19) follows from Camerlenghi et al. [15] (Proposition 2). Indeed, the posterior distribution of the largest jump directly follows from [15] (Equation (4)). In addition, the authors of [15] (Proposition 2) showed that the conditional distribution of $Z_{n+1}$, given $\mathbf{Z}_n$ and $\Psi_1$, equals the distribution of the following counting measure:
$$Z'_{n+1} + \sum_{i=1}^{K_n} A_{n+1,i}\, \delta_{W_i},$$
where $Z'_{n+1} \mid \tilde{\mu}'_{\Psi_1} = \sum_{i \geq 1} A'_{n+1,i}\, \delta_{\tilde{w}_i} \sim \mathrm{BeP}(\tilde{\mu}'_{\Psi_1})$ and $\tilde{\mu}'_{\Psi_1}$ is a CRM with Lévy intensity of the form
$$\nu_{\Psi_1}(\mathrm{d}w, \mathrm{d}s) = (1-s)^n\, \Psi_1\, \lambda(\Psi_1 s)\, \mathbb{1}_{(0,1)}(s)\, \mathrm{d}s\, P(\mathrm{d}w).$$
Moreover, $A_{n+1,1}, \ldots, A_{n+1,K_n}$ are Bernoulli random variables with parameters $J_1, \ldots, J_{K_n}$, respectively, such that conditionally on $\Psi_1$, each $J_i$ has a distribution with a density function of the form
$$f_{J_i \mid \Psi_1}(s) \propto (1-s)^{n-m_i} s^{m_i}\, \Psi_1\, \lambda(\Psi_1 s) \quad \text{on } (0,1).$$
As in the proof of Theorem 1, the distribution of $Y_{n+1} \mid (\Psi_1, \mathbf{Z}_n)$ equals that of $\sum_{i \geq 1} A'_{n+1,i}$. Thus, by the evaluation of the Laplace functional, one may easily realize that the last random sum has a Poisson distribution with mean $\int_0^1 (1-s)^n s\, \Psi_1 \lambda(\Psi_1 s)\, \mathrm{d}s$. Moreover, by exploiting the posterior representation (20), the variables $A_{n+1,i}$, for $i = 1, \ldots, K_n$, conditionally on $\mathbf{Z}_n$ and $\Psi_1$, are independent and Bernoulli distributed with mean
$$\mathbb{E}[J_i \mid \Psi_1] = \int_0^1 s\, f_{J_i \mid \Psi_1}(s)\, \mathrm{d}s = \frac{\int_0^1 (1-s)^{n-m_i} s^{m_i+1}\, \Psi_1 \lambda(s\Psi_1)\, \mathrm{d}s}{\int_0^1 (1-s)^{n-m_i} s^{m_i}\, \Psi_1 \lambda(s\Psi_1)\, \mathrm{d}s}. \qquad \square$$
Remark 1.
According to (18), the conditional distribution of Ψ 1 given Z n may include the whole sampling information, depending on the specification of ν and f Ψ 1 , and hence, the conditional distribution of Y n + 1 given Z n may also include such sampling information. As a corollary of Theorem 2, the conditional distribution of Y n + 1 given Z n is a mixture of Poisson distributions that may include the whole sampling information; in particular, the amount of sampling information in the posterior distribution is uniquely determined by the mixing distribution, namely by the conditional distribution of Ψ 1 , given Z n .
Hereafter, we specialize Theorem 2 for the stable SP, that is, a peculiar SP defined through a CRM with a Lévy intensity $\nu$ such that $\lambda(s) = \sigma s^{-1-\sigma}$ for a parameter $\sigma \in (0,1)$. We refer to Camerlenghi et al. [15] for a detailed posterior analysis of the stable SP prior.
Corollary 2.
Let $Z_1, Z_2, \ldots$ be exchangeable random variables modeled as in (7), where $\mathscr{M}$ equals $\mathrm{SP}(\nu, f_{\Psi_1})$, with $\lambda(s) = \sigma s^{-1-\sigma}$ for some $\sigma \in (0,1)$. If $\mathbf{Z}_n$ is a random sample that displays $K_n = k$ distinct features $\{W_1, \ldots, W_{K_n}\}$, and feature $W_i$ appears exactly $M_{n,i} = m_i$ times in the sample, for $i = 1, \ldots, K_n$, then the conditional distribution of $\Psi_1$, given $\mathbf{Z}_n$, has posterior density
$$f_{\Psi_1 \mid \mathbf{Z}_n}(a) \propto a^{-k\sigma}\, e^{-\sigma a^{-\sigma} \sum_{i=1}^{n} B(1-\sigma,\, i)}\, f_{\Psi_1}(a),$$
having denoted by $B(\cdot, \cdot)$ the classical Euler Beta function. Moreover, conditionally on $\mathbf{Z}_n$ and $\Psi_1$,
$$\mathbb{P}\big((Y_{n+1}, A_{n+1,1}, \ldots, A_{n+1,K_n}) = (y, a_1, \ldots, a_{K_n}) \mid \mathbf{Z}_n, \Psi_1\big) = \mathrm{Poiss}\Big(y;\; \sigma\, \Psi_1^{-\sigma}\, B(1-\sigma,\, n+1)\Big) \prod_{i=1}^{k} \mathrm{Bern}\Big(a_i;\; \frac{m_i - \sigma}{n - \sigma + 1}\Big).$$
Proof. 
The proof is a plain application of Theorem 2 under the choice $\lambda(s) = \sigma s^{-1-\sigma}$. □
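Corollary 2 suggests a simple two-stage predictive sampler, sketched below under an arbitrary exponential stand-in for $f_{\Psi_1}$ (ours, not part of the paper): first draw $\Psi_1$ from its posterior density, which depends on the data only through $n$ and $k$, by inverse-CDF sampling on a grid, and then draw the new and old features.

import numpy as np
from scipy.special import beta as B

rng = np.random.default_rng(2)
sigma = 0.5
f_psi1 = lambda a: np.exp(-a)                  # arbitrary stand-in for the prior density f_{Psi_1}

def sample_psi1_posterior(n, k, grid=np.linspace(1e-4, 20.0, 20000)):
    # Unnormalized posterior density of Psi_1 given Z_n (Corollary 2):
    # a^{-k*sigma} * exp(-sigma * a^{-sigma} * sum_{i=1}^n B(1-sigma, i)) * f_{Psi_1}(a)
    s = sum(B(1 - sigma, i) for i in range(1, n + 1))
    dens = grid ** (-k * sigma) * np.exp(-sigma * grid ** (-sigma) * s) * f_psi1(grid)
    cdf = np.cumsum(dens)
    cdf /= cdf[-1]
    return float(np.interp(rng.uniform(), cdf, grid))   # inverse-CDF draw on the grid

def stable_sp_predictive_step(m, n, k):
    psi1 = sample_psi1_posterior(n, k)
    y_new = rng.poisson(sigma * psi1 ** (-sigma) * B(1 - sigma, n + 1))    # new features
    a_old = rng.binomial(1, [(mi - sigma) / (n - sigma + 1) for mi in m])  # old features
    return psi1, y_new, a_old

print(stable_sp_predictive_step(m=[4, 2, 1], n=10, k=3))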

4. Predictive Characterizations for SPs

In this section, we introduce and discuss Johnson’s “sufficientness” postulates in the context of feature-sampling models under the class of SP priors. According to Theorem 1, if the feature-sampling model is a CRM prior, then the conditional distribution of Y n + 1 , given Z n , is a Poisson distribution that depends on the sampling information Z n only through the sample size n. Moreover, the conditional probability of generating an old feature W i given Z n depends on the sampling information Z n only through n and m i . As shown in Theorem 2, SP priors enrich the predictive structure of CRM priors through the conditional distribution of the latent variable Ψ 1 given the observable sample Z n . In the next theorem, we characterize the class of SP priors for which the conditional distribution of Y n + 1 given Z n depends on the sampling information only through n.
Theorem 3.
Let $Z_1, Z_2, \ldots$ be exchangeable random variables modeled as in (7), where $\mathscr{M}$ equals $\mathrm{SP}(\nu, f_{\Psi_1})$ and $\nu(\mathrm{d}w, \mathrm{d}s) = \lambda(s)\, \mathrm{d}s\, P(\mathrm{d}w)$. Moreover, suppose that $\mathbf{Z}_n$ is a random sample that displays $K_n = k$ distinct features $\{W_1, \ldots, W_{K_n}\}$, and feature $W_i$ appears exactly $M_{n,i} = m_i$ times in the sample, for $i = 1, \ldots, K_n$. If $f_{\Psi_1} : (0, r) \to \mathbb{R}_+$ is a continuous function on the compact support $(0, r)$ with $r > 0$, and the function $\lambda : \mathbb{R}_+ \to \mathbb{R}_+$ is continuous on its domain, then the conditional distribution of the latent variable $\Psi_1$ given $\mathbf{Z}_n$ depends on the sampling information $\mathbf{Z}_n$ only through $n$ if and only if $\lambda(s) = C s^{-1}$ on $(0, r)$ for some constant $C > 0$.
Proof. 
First of all, if $f_{\Psi_1}$ is defined on the compact support $(0, r)$ and if $\lambda(s) = C s^{-1}$ on $(0, r)$ for some constant $C > 0$, then it is easy to see that the posterior distribution of $\Psi_1$ in (18) depends only on $n$ and not on the other sample statistics. We now show the reverse implication. The posterior density of $\Psi_1$, conditionally on $\mathbf{Z}_n$, satisfies (18), and it is proportional to
$$f_{\Psi_1 \mid \mathbf{Z}_n}(a) \propto \prod_{i=1}^{n} e^{-\phi_i(a)} \prod_{i=1}^{K_n} \int_0^1 s^{m_i}(1-s)^{n-m_i}\, a\, \lambda(as)\, \mathrm{d}s \; f_{\Psi_1}(a),$$
where $\phi_i(a) = \int_0^1 s(1-s)^{i-1}\, a\, \lambda(as)\, \mathrm{d}s$. Then, there exists $c(m_1, \ldots, m_k, k, n)$ such that it holds that
$$f_{\Psi_1 \mid \mathbf{Z}_n}(a) = \frac{\prod_{i=1}^{n} e^{-\phi_i(a)} \prod_{i=1}^{K_n} \int_0^1 s^{m_i}(1-s)^{n-m_i}\, a\, \lambda(as)\, \mathrm{d}s \; f_{\Psi_1}(a)}{c(m_1, \ldots, m_k, k, n)}.$$
Because of the assumptions imposed, the distribution of $\Psi_1 \mid \mathbf{Z}_n$ does not depend on $K_n$, nor on the corresponding sample frequencies $M_{n,1}, \ldots, M_{n,K_n}$. Accordingly, the function
$$f_1(a, n) := f_{\Psi_1 \mid \mathbf{Z}_n}^{-1}(a) \prod_{i=1}^{n} e^{-\phi_i(a)}\, f_{\Psi_1}(a), \quad a \in (0, r),$$
depends only on $a$ and $n$, but not on $k$ and $(m_1, \ldots, m_k)$. Then, putting together (23) and (24), it holds that
$$f_1(a, n) \cdot \prod_{i=1}^{k} \int_0^1 s^{m_i}(1-s)^{n-m_i}\, a\, \lambda(as)\, \mathrm{d}s = c(m_1, \ldots, m_k, n, k), \quad a \in (0, r),$$
where $c$ is the normalizing factor, which does not depend on the variable $a$. By choosing $m_1 = \cdots = m_k = n \in \mathbb{N}$, thanks to Equation (25), we can state that the function
$$f_1(a, n) \left( \int_0^1 s^n\, a\, \lambda(as)\, \mathrm{d}s \right)^{k},$$
which is defined for any $a \in (0, r)$, does not depend on $a$, but only on $k$ and $n$. Since the previous assertion is true for any $k \geq 1$, one may select $k = 1$, thus obtaining the following identity:
$$f_1(a, n) = c' \left( \int_0^1 s^n\, a\, \lambda(as)\, \mathrm{d}s \right)^{-1}$$
for some constant $c'$, independent of $a$, but that may depend on $n$. Substituting (27) into (26), we obtain that
$$c' \left( \int_0^1 s^n\, a\, \lambda(as)\, \mathrm{d}s \right)^{k-1}$$
is a function that does not depend on $a$, but only on $n$ and $k$. As a consequence, we have that
$$\int_0^1 s^n\, a\, \lambda(as)\, \mathrm{d}s = \int_0^a \frac{s^n}{a^n}\, \lambda(s)\, \mathrm{d}s = C'$$
for a suitable constant $C'$, which does not depend on $a \in (0, r)$. To conclude, we take a derivative of the previous expression with respect to $a$, and this allows us to show that
$$a^n \lambda(a) = n\, a^{n-1}\, C',$$
namely, $\lambda(a) = C/a$ for $a \in (0, r)$, where $C$ is a positive constant. This is a valid Lévy intensity; indeed, it satisfies condition (11). Outside the interval $(0, r)$, $\lambda$ may be defined arbitrarily; indeed, the values of $\lambda$ on $[r, +\infty)$ do not affect the posterior distribution of $\Psi_1$ in (18). □
Remark 2.
Note that in Theorem 3, we have supposed that $f_{\Psi_1}$ has compact support $(0, r)$; thus, we are interested in defining $\lambda$ on $(0, r)$; outside this interval, $\lambda$ can be defined arbitrarily because it does not affect the posterior distribution (18) of $\Psi_1$. From the proof of Theorem 3, it becomes apparent that if the support of $f_{\Psi_1}$ is the entire positive real line $\mathbb{R}_+$, the posterior distribution of the largest jump depends only on $n$ if and only if $\lambda(s) = C s^{-1}$ on $\mathbb{R}_+$ for some constant $C > 0$. However, in this case, $\lambda$ does not meet the integrability condition (11); hence, this can only be considered a limiting case. It is interesting to observe that such a limiting situation, with the additional assumption $f_{\Psi_1} = f_{\Delta_1}$, corresponds to the Beta process case with $\sigma = 0$ and $c = 1$ (Griffiths and Ghahramani [12]).
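The forward direction of Theorem 3 is easy to check numerically (our sketch, with arbitrary values of the constants, and with $r$ assumed larger than the values of $a$ used): for $\lambda(s) = C/s$, each data-dependent factor $\int_0^1 s^{m_i}(1-s)^{n-m_i}\, a\, \lambda(as)\, \mathrm{d}s$ in (18) equals $C\, B(m_i, n-m_i+1)$ whatever the value of $a$, and the same happens for the exponential term, so the posterior of $\Psi_1$ reduces to the prior and carries no sampling information beyond $n$.

from scipy.integrate import quad
from scipy.special import beta as B

C, n, m_i = 2.0, 10, 3
lam = lambda s: C / s                     # the intensity lambda(s) = C / s of Theorem 3

for a in (0.3, 1.0, 2.5):                 # several values of a, all assumed to lie in (0, r)
    # Data-dependent factor of (18); for this intensity a * lam(a * s) = C / s, so the
    # integral equals C * B(m_i, n - m_i + 1) and does not vary with a.
    val = quad(lambda s: s ** m_i * (1 - s) ** (n - m_i) * a * lam(a * s), 0, 1)[0]
    print(a, val, C * B(m_i, n - m_i + 1))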
Now, we characterize SPs for which the posterior distribution of Ψ 1 depends only on n and K n , but not on the sample frequencies of the different features m . Here, we assume that f Ψ 1 has full support a priori. The following characterization has been provided in Camerlenghi et al. [15] (Theorem 3), but for completeness, we report the proof.
Theorem 4.
(Camerlenghi et al. [15]). Let $Z_1, Z_2, \ldots$ be exchangeable random variables modeled as in (7), where $\mathscr{M}$ equals $\mathrm{SP}(\nu, f_{\Psi_1})$ and $\nu(\mathrm{d}w, \mathrm{d}s) = \lambda(s)\, \mathrm{d}s\, P(\mathrm{d}w)$. Suppose that $\mathbf{Z}_n$ is a random sample that displays $K_n = k$ distinct features $\{W_1, \ldots, W_{K_n}\}$, and feature $W_i$ appears exactly $M_{n,i} = m_i$ times in the sample, for $i = 1, \ldots, K_n$. If $f_{\Psi_1} : \mathbb{R}_+ \to \mathbb{R}_+$ is a strictly positive and continuously differentiable function on $\mathbb{R}_+$, and $\lambda$ is continuously differentiable, then the conditional distribution of the latent variable $\Psi_1$, given $\mathbf{Z}_n$, depends on $\mathbf{Z}_n$ only through $n$ and $K_n$ if and only if $\lambda(s) = C s^{-1-\sigma}$ on $\mathbb{R}_+$ for some constant $C > 0$ and $\sigma \in (0, 1)$.
Proof. 
By arguing as in the proof of Theorem 3, the posterior density of $\Psi_1$ given $\mathbf{Z}_n$ is proportional to
$$\prod_{i=1}^{n} e^{-\phi_i(a)} \prod_{i=1}^{k} \int_0^1 s^{m_i}(1-s)^{n-m_i}\, a\, \lambda(as)\, \mathrm{d}s \; f_{\Psi_1}(a),$$
where $\phi_i(a) = \int_0^1 s(1-s)^{i-1}\, a\, \lambda(as)\, \mathrm{d}s$. Then, there exists $c(m_1, \ldots, m_k, n, k)$ such that it holds that
$$f_{\Psi_1 \mid \mathbf{Z}_n}(a) = \frac{\prod_{i=1}^{n} e^{-\phi_i(a)} \prod_{i=1}^{k} \int_0^1 s^{m_i}(1-s)^{n-m_i}\, a\, \lambda(as)\, \mathrm{d}s \; f_{\Psi_1}(a)}{c(m_1, \ldots, m_k, n, k)}.$$
As a consequence,
$$f_{\Psi_1 \mid \mathbf{Z}_n}^{-1}(a) \prod_{i=1}^{n} e^{-\phi_i(a)} \prod_{i=1}^{k} \int_0^1 s^{m_i}(1-s)^{n-m_i}\, a\, \lambda(as)\, \mathrm{d}s \; f_{\Psi_1}(a) = c(m_1, \ldots, m_k, n, k).$$
If the density function $f_{\Psi_1 \mid \mathbf{Z}_n}(a)$ does not depend on $m_1, \ldots, m_k$, then the following function
$$f_{\Psi_1 \mid \mathbf{Z}_n}^{-1}(a) \prod_{i=1}^{n} e^{-\phi_i(a)}\, f_{\Psi_1}(a) = f_1(a, k, n)$$
depends only on $k$, $n$, and $a$, but not on the frequency counts. Therefore, (29) boils down to
$$f_1(a, k, n) \cdot \prod_{i=1}^{k} \int_0^1 s^{m_i}(1-s)^{n-m_i}\, a\, \lambda(as)\, \mathrm{d}s = c(m_1, \ldots, m_k, n, k),$$
where the function on the right-hand side of (30) is independent of $a$ for any choice of the vector of sampling information $(m_1, \ldots, m_k, n, k)$. Now, since the vector $(m_1, \ldots, m_k, n, k)$ can be chosen arbitrarily, we can make the choice $m_1 = \cdots = m_k = m > 0$, such that the function
$$\left( w(a, k, n) \int_0^1 s^{m}(1-s)^{n-m}\, a\, \lambda(as)\, \mathrm{d}s \right)^{k}$$
does not depend on $a \in \mathbb{R}_+$, where $w(a, k, n) = f_1(a, k, n)^{1/k}$. Moreover, suppose that $m = n$; thus,
$$w(a, k, n) \int_0^1 s^n\, a\, \lambda(as)\, \mathrm{d}s$$
does not depend on $a \in \mathbb{R}_+$, which implies that
$$w(a, k, n) = c' \left( \int_0^1 s^n\, a\, \lambda(as)\, \mathrm{d}s \right)^{-1}$$
for a constant $c' > 0$ with respect to $a$, which can only depend on $k$ and $n$. By substituting (33) into (31), we obtain
$$\left( c' \left( \int_0^1 s^n\, \lambda(as)\, \mathrm{d}s \right)^{-1} \int_0^1 s^{m}(1-s)^{n-m}\, \lambda(as)\, \mathrm{d}s \right)^{k},$$
which is independent of $a \in \mathbb{R}_+$. Now, it is possible to choose $m = n - 1$ in the previous function. Therefore, there exists a constant $c''$ independent of $a$ such that the following identity holds:
$$\int_0^1 s^{n-1}\, \lambda(as)\, \mathrm{d}s - \int_0^1 s^{n}\, \lambda(as)\, \mathrm{d}s = c'' \int_0^1 s^{n}\, \lambda(as)\, \mathrm{d}s.$$
By taking the derivative of the previous equation two times with respect to $a$, one obtains
$$\lambda(a)\, (1 - n c'') = a\, \lambda'(a)\, c'',$$
which is an ordinary differential equation in $\lambda$ that can be solved by separation of variables. In particular, we obtain
$$\lambda(a) = C\, a^{(1 - n c'')/c''}, \quad \text{for } C > 0.$$
To conclude, observe that the exponent of $a$ in (34) should be compatible with the integrability condition (11) for homogeneous CRMs. Accordingly, it is easy to see that we must consider
$$\lambda(a) = C\, \frac{1}{a^{1+\sigma}},$$
where $C > 0$ and $\sigma \in (0, 1)$. The reverse implication of the theorem is trivially satisfied; hence, the proof is completed. □
We recall from Theorem 2 that the conditional distribution of $\Psi_1$ given $\mathbf{Z}_n$ uniquely determines the amount of sampling information included in the conditional distribution of the number of new features $Y_{n+1}$ given $\mathbf{Z}_n$. Such sampling information may range from the whole information, in terms of $n$, $K_n$, and $(M_{n,1}, \ldots, M_{n,K_n})$, to the sole information on the sample size $n$. According to Theorem 4, the stable SP prior of Corollary 2 is the sole SP prior for which the conditional distribution of the number of new features $Y_{n+1}$ given $\mathbf{Z}_n$ depends on the sampling information $\mathbf{Z}_n$ only through $n$ and $K_n$. Moreover, according to Theorem 3, the Beta process prior is the sole SP prior for which the conditional distribution of the number of new features $Y_{n+1}$ given $\mathbf{Z}_n$ depends on the sampling information $\mathbf{Z}_n$ only through $n$. In particular, Theorems 3 and 4 show that the Beta process prior and the stable SP prior may be considered, to some extent, the feature-sampling counterparts of the Dirichlet process prior and the Pitman–Yor process prior.
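A quick numerical check of Theorem 4 (our sketch, with arbitrary choices of $\sigma$, of the prior, and of the data): under $\lambda(s) = C s^{-1-\sigma}$, two samples with the same $n$ and $k$ but different frequency vectors $\mathbf{m}$ yield unnormalized posteriors for $\Psi_1$ that differ only by an $a$-free constant, hence the same posterior distribution and the same law for the number of new features.

import numpy as np
from scipy.integrate import quad

sigma, C = 0.5, 1.0
lam = lambda s: C * s ** (-1 - sigma)          # stable intensity of Theorem 4
prior = lambda a: np.exp(-a)                   # arbitrary stand-in for f_{Psi_1}

def log_posterior(a, n, m):
    # Unnormalized log of the posterior density (18) at a, for frequencies m = (m_1, ..., m_k).
    phi = sum(quad(lambda s: s * (1 - s) ** (i - 1) * a * lam(a * s), 0, 1)[0]
              for i in range(1, n + 1))
    logprod = sum(np.log(quad(lambda s: s ** mi * (1 - s) ** (n - mi) * a * lam(a * s), 0, 1)[0])
                  for mi in m)
    return -phi + logprod + np.log(prior(a))

grid = np.linspace(0.5, 3.0, 6)
post1 = np.array([log_posterior(a, n=6, m=[3, 2, 1]) for a in grid])   # k = 3
post2 = np.array([log_posterior(a, n=6, m=[2, 2, 2]) for a in grid])   # same n and k, different m
# With the stable intensity, the two unnormalized log posteriors differ by an a-free constant.
print(np.round((post1 - post2) - (post1 - post2)[0], 6))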

5. Discussion and Conclusions

In this paper, we have introduced and discussed Johnson’s “sufficientness” postulates in the context of feature-sampling models. “Sufficientness” postulates have been investigated extensively in the context of species-sampling models, providing an effective classification of species-sampling models on the basis of the form of their corresponding predictive distributions. Here, we made a first step towards the problem of providing an analogous classification for feature-sampling models. In particular, we obtained Johnson’s “sufficientness” postulates when the class of feature-sampling models is restricted to the class of scaled process priors. However, the results presented in the paper remain preliminary, and do not at all provide a complete answer to the characterization problem within the general class of feature-sampling models. This problem remains open.
Within the feature-sampling setting, the predictive distribution is of the form (8), though for the purpose of providing “sufficientness” postulates, one may focus on feature-sampling models exhibiting a general predictive distribution of the following type:
$$\mathbb{P}\big((Y_{n+1}, A_{n+1,1}, \ldots, A_{n+1,K_n}) = (y, a_1, \ldots, a_{K_n}) \mid \mathbf{Z}_n\big) = g(y;\, n, k, \mathbf{m}) \prod_{i=1}^{k} f_i(a_i;\, n, k, \mathbf{m}).$$
Note that (35) is a probability distribution, and it must satisfy a consistency condition, as usual. Among all the feature-sampling models whose predictive distribution can be written in the form (35), we are interested in characterizing nonparametric priors such that: (i) the function $g$ depends on the sampling information only through $n$, and the function $f_i$ depends only on $(n, m_i)$; (ii) $g$ depends only on $(n, k)$ and $f_i$ depends only on $(n, m_i)$; (iii) $g$ depends only on $(n, k)$ and $f_i$ depends only on $(n, k, m_i)$. In our view, these characterizations may provide a complete picture of “sufficientness” postulates within the feature setting, and they are also fundamental to guiding the selection of the prior distribution. We conjecture that CRMs are the nonparametric priors satisfying the characterization (i), that the SP with a stable Lévy measure is an example of a prior satisfying (ii), and that no examples satisfying (iii) have been considered in the current literature. Results in this direction are in Battiston et al. [31], where the authors characterize exchangeable feature allocation probability functions (Broderick et al. [32]) in product form; this could be a stimulating point of departure for the study of the characterization problem depicted above.

Author Contributions

Writing–original draft, F.C. and S.F.; writing–review and editing, F.C. and S.F. The authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program under grant agreement No. 817257.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

F.C. is extremely grateful to Eugenio Regazzini for the time spent at the Department of Mathematics of University of Pavia during his Ph.D. studies in Mathematical Statistics; F.C. wants to especially thank Eugenio Regazzini for having introduced him to the study of Bayesian Statistics with a stimulating Ph.D. course held together with Antonio Lijoi. S.F. wishes to express his gratitude to Eugenio Regazzini, whose fundamental contributions to Bayesian statistics have always been a great source of inspiration, transmitting enthusiasm and methods for the development of his own research. The authors gratefully acknowledge the financial support from the Italian Ministry of Education, University, and Research (MIUR), “Dipartimenti di Eccellenza” grant 2018-2022. F.C. is a member of the Gruppo Nazionale per l’Analisi Matematica, la Probabilità e le loro Applicazioni (GNAMPA) of the Istituto Nazionale di Alta Matematica (INdAM).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. de Finetti, B. La prévision: Ses lois logiques, ses sources subjectives. Ann. Inst. H. Poincaré 1937, 7, 1–68.
2. Johnson, W.E. Probability: The Deductive and Inductive Problems. Mind 1932, 41, 409–423.
3. Pitman, J. Some developments of the Blackwell-MacQueen urn scheme. In Statistics, Probability and Game Theory; IMS Lecture Notes Monograph Series; Institute of Mathematical Statistics: Hayward, CA, USA, 1996; Volume 30, pp. 245–267.
4. Regazzini, E. Intorno ad alcune questioni relative alla definizione del premio secondo la teoria della credibilità. Giornale dell’Istituto Italiano degli Attuari 1978, 41, 77–89.
5. Lo, A.Y. A characterization of the Dirichlet process. Stat. Probab. Lett. 1991, 12, 185–187.
6. Ferguson, T.S. A Bayesian analysis of some nonparametric problems. Ann. Statist. 1973, 1, 209–230.
7. Zabell, S.L. Symmetry and its discontents. In Cambridge Studies in Probability, Induction, and Decision Theory; Essays on the history of inductive probability, with a preface by Brian Skyrms; Cambridge University Press: New York, NY, USA, 2005; p. xii+279.
8. Perman, M.; Pitman, J.; Yor, M. Size-biased sampling of Poisson point processes and excursions. Probab. Theory Relat. Fields 1992, 92, 21–39.
9. Pitman, J.; Yor, M. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 1997, 25, 855–900.
10. Bacallado, S.; Battiston, M.; Favaro, S.; Trippa, L. Sufficientness postulates for Gibbs-type priors and hierarchical generalizations. Stat. Sci. 2017, 32, 487–500.
11. Gnedin, A.; Pitman, J. Exchangeable Gibbs partitions and Stirling triangles. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 2005, 325, 83–102, 244–245.
12. Griffiths, T.L.; Ghahramani, Z. The Indian buffet process: An introduction and review. J. Mach. Learn. Res. 2011, 12, 1185–1224.
13. Ayed, F.; Battiston, M.; Camerlenghi, F.; Favaro, S. Consistent estimation of small masses in feature sampling. J. Mach. Learn. Res. 2021, 22, 1–28.
14. James, L.F.; Orbanz, P.; Teh, Y.W. Scaled subordinators and generalizations of the Indian buffet process. arXiv 2015, arXiv:1510.07309.
15. Camerlenghi, F.; Favaro, S.; Masoero, L.; Broderick, T. Scaled process priors for Bayesian nonparametric estimation of the unseen genetic variation. arXiv 2021, arXiv:2106.15480.
16. Brix, A. Generalized gamma measures and shot-noise Cox processes. Adv. Appl. Probab. 1999, 31, 929–953.
17. Lijoi, A.; Mena, R.H.; Prünster, I. Controlling the reinforcement in Bayesian non-parametric mixture models. J. R. Stat. Soc. Ser. B Stat. Methodol. 2007, 69, 715–740.
18. De Blasi, P.; Favaro, S.; Lijoi, A.; Mena, R.H.; Prünster, I.; Ruggiero, M. Are Gibbs-type priors the most natural generalization of the Dirichlet process? IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 212–229.
19. Regazzini, E.; Lijoi, A.; Prünster, I. Distributional results for means of normalized random measures with independent increments. Ann. Stat. 2003, 31, 560–585.
20. Pitman, J. Poisson-Kingman Partitions; Lecture Notes-Monograph Series; Institute of Mathematical Statistics: Beachwood, OH, USA, 2003; pp. 1–34.
21. Pitman, J. Combinatorial Stochastic Processes; Lecture Notes in Mathematics; Lectures from the 32nd Summer School on Probability Theory held in Saint-Flour, 7–24 July 2002, with a foreword by Jean Picard; Springer: Berlin, Germany, 2006; Volume 1875, p. x+256.
22. Lijoi, A.; Prünster, I. Models beyond the Dirichlet process. In Bayesian Nonparametrics; Hjort, N.L., Holmes, C., Müller, P., Walker, S., Eds.; Cambridge University Press: Cambridge, UK, 2010; pp. 80–136.
23. Ghosal, S.; van der Vaart, A. Fundamentals of Nonparametric Bayesian Inference; Cambridge Series in Statistical and Probabilistic Mathematics; Cambridge University Press: Cambridge, UK, 2017; Volume 44, p. xxiv+646.
24. Zabell, S.L. The continuum of inductive methods revisited. In The Cosmos of Science: Essays of Exploration; University of Pittsburgh Press: Pittsburgh, PA, USA, 1997; pp. 351–385.
25. Daley, D.J.; Vere-Jones, D. An Introduction to the Theory of Point Processes: Volume II: General Theory and Structure (Probability and Its Applications), 2nd ed.; Springer: New York, NY, USA, 2008; p. xviii+573.
26. Kingman, J. Completely random measures. Pac. J. Math. 1967, 21, 59–78.
27. Teh, Y.; Gorur, D. Indian buffet processes with power-law behavior. Adv. Neural Inf. Process. Syst. 2009, 22, 1838–1846.
28. James, L.F. Bayesian Poisson calculus for latent feature modeling via generalized Indian buffet process priors. Ann. Stat. 2017, 45, 2016–2045.
29. Broderick, T.; Wilson, A.C.; Jordan, M.I. Posteriors, conjugacy, and exponential families for completely random measures. Bernoulli 2018, 24, 3181–3221.
30. Masoero, L.; Camerlenghi, F.; Favaro, S.; Broderick, T. More for less: Predicting and maximizing genomic variant discovery via Bayesian nonparametrics. Biometrika 2021, asab012.
31. Battiston, M.; Favaro, S.; Roy, D.M.; Teh, Y.W. A characterization of product-form exchangeable feature probability functions. Ann. Appl. Probab. 2018, 28, 1423–1448.
32. Broderick, T.; Pitman, J.; Jordan, M.I. Feature allocations, probability functions, and paintboxes. Bayesian Anal. 2013, 8, 801–836.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
