Article

The Convergent Indian Buffet Process

Department of Statistics, Inha University, Incheon 22212, Republic of Korea
Mathematics 2025, 13(23), 3881; https://doi.org/10.3390/math13233881
Submission received: 14 October 2025 / Revised: 21 November 2025 / Accepted: 1 December 2025 / Published: 3 December 2025

Abstract

We propose a new Bayesian nonparametric prior for latent feature models, called the Convergent Indian Buffet Process (CIBP). We show that under the CIBP, the number of latent features follows a Poisson distribution whose mean increases monotonically but converges to a finite value as the number of objects goes to infinity. That is, the expected number of features stays bounded even as the number of objects goes to infinity, unlike under the standard Indian Buffet Process, where the expected number of features grows without bound with the number of objects. We provide two alternative representations of the CIBP, one based on a hierarchical distribution and one based on a completely random measure, which are of independent interest. The proposed CIBP is assessed on a high-dimensional sparse factor model.

1. Introduction

1.1. Motivation and Background

Latent feature models provide a flexible framework for representing objects that may simultaneously possess multiple latent characteristics. The Indian Buffet Process (IBP), first introduced in [1], has become one of the most important Bayesian nonparametric priors for such models, owing to its elegant construction as an exchangeable distribution over binary matrices with infinitely many columns. Each column corresponds to a potential latent feature, and each row to an object possessing a subset of these features. This approach has been widely applied in diverse areas, including factor analysis, link prediction, and relational data modeling [2,3,4]. Although the IBP was originally developed more than a decade ago, recent research shows that IBP-based models remain an active and evolving area within Bayesian nonparametrics, with extensions including the hierarchical spike-and-slab IBP [5], similarity-based feature learning [6], and more general product feature allocation formulations [7].
Despite its popularity, a key limitation of the standard IBP and its two- and three-parameter extensions [8,9] is the unbounded growth of the expected number of features as the number of objects increases. For a latent feature matrix $\Xi \in \mathbb{R}^{p \times \infty}$, each row corresponds to an object and each column to a potential feature. The number of active features, that is, the number of nonzero columns of $\Xi$, is defined as
$$
K^+ := \sum_{k=1}^{\infty} \mathbb{I}\big(\xi^{(k)} \neq 0\big),
$$
where $\xi^{(k)}$ is the $k$-th column of $\Xi$ and $\mathbb{I}(\cdot)$ is the indicator function. Under the standard IBP and its two- or three-parameter generalizations, the expected number of features $\mathbb{E}[K^+]$ grows either logarithmically or polynomially in the number of objects $p$:
$$
\mathbb{E}[K^+] = \begin{cases} O(\log p), & \text{for the one- or two-parameter IBP}, \\ O(p^{a}), & \text{for the three-parameter IBP with } a > 0. \end{cases}
$$
Hence, existing IBP variants can produce redundant and uninformative features as data size increases, reducing interpretability and predictive performance. This property may be undesirable for applications where the number of underlying latent factors is believed to be finite or asymptotically stable.
In many practical domains, the assumption of infinitely increasing latent features is not realistic. For example, in macroeconomic or financial modeling, the co-movements among a large number of asset returns can often be explained by a fixed and small number of underlying factors [10,11]. Similarly, in biological and social network analysis, the generative mechanisms are often governed by a finite set of latent processes that do not proliferate with additional data [12]. However, the standard IBP tends to allocate unnecessary features as the data size grows, which complicates interpretability and reduces statistical efficiency.
This motivates the need for a new nonparametric prior that maintains the flexibility of the IBP while ensuring a convergent (bounded) number of features, i.e., the distribution of $K^+$ converges to a finite limit as $p \to \infty$. Such a prior would bridge the gap between classical finite feature models and existing nonparametric counterparts, combining parsimony with theoretical tractability.
In this context, we propose the Convergent Indian Buffet Process (CIBP), a novel three-parameter extension of the IBP whose expected number of features converges. The CIBP retains the exchangeability and constructive elegance of the IBP while introducing an additional parameter that governs the rate of convergence. Specifically, the total number of active features follows a Poisson distribution whose mean stays bounded as the number of objects tends to infinity.

1.2. Contributions and Organization

This paper makes both methodological and theoretical contributions to Bayesian nonparametric modeling. First, we introduce the Convergent Indian Buffet Process (CIBP), a new stochastic process that extends the Indian Buffet Process by ensuring a bounded expected number of latent features. The CIBP maintains the desirable exchangeability property of the IBP while incorporating an additional parameter that controls convergence. This enables parsimonious latent feature inference even as the number of observations increases.
Second, we establish the theoretical foundations of the CIBP. We derive its distributional properties, including the asymptotic convergence of the number of active features, and provide two equivalent formulations: a hierarchical representation based on Poisson–Beta–Bernoulli layers, and a completely random measure construction using the Beta process. These representations clarify the relationship between the CIBP and existing nonparametric priors, offering new insight into the structure of latent feature models.
Finally, we demonstrate the utility of the CIBP through its application to Bayesian sparse factor models. The empirical results show that the CIBP yields stable and interpretable factor inference while preventing overfitting in high-dimensional data.
The remainder of this paper is organized as follows: Section 2 introduces the Convergent Indian Buffet Process and analyzes its distributional properties. Section 3 and Section 4 develop the hierarchical and random measure formulations, respectively. Section 5 presents an application to Bayesian sparse factor modeling, and Section 6 concludes the paper.

2. Convergent Indian Buffet Process

In this section, we formalize the construction of the proposed Convergent Indian Buffet Process (CIBP). The CIBP extends the classical IBP by introducing a “convergence” parameter that ensures the total number of latent features remains bounded as the number of observations increases. While the standard IBP provides an elegant and exchangeable distribution over infinite binary matrices, its expected number of features grows unboundedly with data size. The CIBP modifies this mechanism while preserving exchangeability, thereby providing a regularized version of the IBP suitable for parsimonious latent feature inference.

2.1. Restaurant Analogy

The Convergent Indian Buffet Process (CIBP) is defined by the restaurant analogy given below. We denote by $B(a,b)$ the beta function for $a > 0$ and $b > 0$.
Definition 1 (The restaurant analogy of the CIBP). 
Let $\gamma > 0$, $\alpha > 0$ and $\kappa > 0$. We refer to the following stochastic process as the restaurant analogy of $\mathrm{CIBP}(\gamma,\alpha,\kappa)$:
1. The first customer tries $K_1^+$ dishes, where $K_1^+ \sim \mathrm{Poisson}\big(\gamma B(\alpha+1,\kappa)/B(\alpha,\kappa)\big)$.
2. For every $j = 2, \ldots, p$, the $j$-th customer does the following:
  • For every $k = 1, \ldots, K_{j-1}^+$, the $j$-th customer tries the $k$-th dish if $\xi_{j,k} = 1$ and does not otherwise, where
    $$
    \xi_{j,k} \sim \mathrm{Bernoulli}\left(\frac{m_{j,k} + \alpha}{j - 1 + \kappa + \alpha}\right). \tag{1}
    $$
    Here, $m_{j,k}$ denotes the number of customers before the $j$-th customer who have tried the $k$-th dish.
  • The $j$-th customer tries $K_j^{\mathrm{new}}$ new dishes, where
    $$
    K_j^{\mathrm{new}} \sim \mathrm{Poisson}\left(\gamma\,\frac{B(\alpha+1, \kappa+j-1)}{B(\alpha,\kappa)}\right). \tag{2}
    $$
  • Set $K_j^+ = K_{j-1}^+ + K_j^{\mathrm{new}}$.
Each realization of the above restaurant process is a binary matrix whose rows correspond to customers and whose number of columns is unbounded. The element at position $(j,k)$ is one if the $j$-th customer has sampled the $k$-th dish and zero otherwise. The distribution of this binary matrix, as generated by the restaurant analogy, is denoted by $\mathrm{CIBP}(\gamma,\alpha,\kappa)$.
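The restaurant analogy translates directly into a sequential sampler. The following is a minimal sketch, assuming NumPy; `log_beta`, `new_dish_mean`, and `sample_cibp` are hypothetical helper names, not code from this paper. Each customer revisits existing dishes with the Bernoulli probability of Definition 1 and then adds a Poisson number of new dishes; columns are kept in order of first appearance rather than in left-ordered form.

```python
import math

import numpy as np


def log_beta(a, b):
    """log B(a, b), computed with log-gamma to avoid overflow."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def new_dish_mean(j, gamma, alpha, kappa):
    """Poisson mean for customer j >= 1: gamma * B(alpha + 1, kappa + j - 1) / B(alpha, kappa)."""
    return gamma * math.exp(log_beta(alpha + 1, kappa + j - 1) - log_beta(alpha, kappa))


def sample_cibp(p, gamma, alpha, kappa, rng):
    """Draw a p x K+ binary matrix by running the CIBP restaurant analogy."""
    m = np.zeros(0)  # m[k]: number of earlier customers who tried dish k
    rows = []
    for j in range(1, p + 1):
        # Existing dishes: Bernoulli((m_{j,k} + alpha) / (j - 1 + kappa + alpha)).
        old = rng.random(m.size) < (m + alpha) / (j - 1 + kappa + alpha)
        # New dishes; for j = 1 the mean reduces to gamma * alpha / (alpha + kappa).
        k_new = rng.poisson(new_dish_mean(j, gamma, alpha, kappa))
        rows.append(np.concatenate([old, np.ones(k_new, dtype=bool)]))
        m = np.concatenate([m + old, np.ones(k_new)])
    xi = np.zeros((p, m.size), dtype=int)
    for j, row in enumerate(rows):
        xi[j, : row.size] = row  # dishes introduced later stay 0 for this row
    return xi


if __name__ == "__main__":
    xi = sample_cibp(p=30, gamma=1.0, alpha=1.0, kappa=1.0, rng=np.random.default_rng(0))
    print(xi.shape, xi.sum(axis=0))
```

Every column of the returned matrix is nonzero, because each dish is created by the customer who first tries it, so the number of columns is a draw of $K^+$.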
We now give some intuition for the convergence of the number of features under the CIBP. Under the CIBP, the expected number of new dishes taken by the $j$-th customer is
$$
\begin{aligned}
\gamma\,\frac{B(\alpha+1, \kappa+j-1)}{B(\alpha,\kappa)}
&= \gamma\,\frac{\Gamma(\alpha+1)\Gamma(\kappa+j-1)}{\Gamma(\alpha+\kappa+j)} \cdot \frac{\Gamma(\alpha+\kappa)}{\Gamma(\alpha)\Gamma(\kappa)} \\
&= \gamma\,\frac{\Gamma(\alpha+1)}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha+\kappa)}{\Gamma(\alpha+\kappa+j)} \cdot \frac{\Gamma(\kappa+j-1)}{\Gamma(\kappa)} \\
&= \frac{\gamma\alpha}{\alpha+\kappa+j-1} \prod_{h=1}^{j-1} \frac{\kappa+h-1}{\alpha+\kappa+h-1},
\end{aligned} \tag{3}
$$
where $\Gamma(\cdot)$ denotes the gamma function. In the simplest case $\alpha = 1$, the last display becomes
$$
\frac{\gamma}{\kappa+j} \cdot \frac{\kappa}{\kappa+j-1} = O\!\left(\frac{1}{j^2}\right).
$$
Since $\sum_{j=1}^{\infty} 1/j^2 < \infty$, we anticipate that the expected number of features converges to a finite number as $p \to \infty$. Indeed, this convergence holds for any $\alpha > 0$, as we show in the next subsection.
Throughout the paper, for notational simplicity, we let $\psi_{a_1,b_1}^{a_2,b_2}$ denote the ratio of two beta functions,
$$
\psi_{a_1,b_1}^{a_2,b_2} := \frac{B(a_1+a_2,\, b_1+b_2)}{B(a_1,b_1)},
$$
for $a_1, b_1 > 0$ and $a_2, b_2 \ge 0$. For example, the Poisson distribution in (2) may be written as $\mathrm{Poisson}(\gamma\psi_{\alpha,\kappa}^{1,j-1})$.
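The product form of the new-dish mean and the $\psi$ notation can be checked numerically. The sketch below assumes only the Python standard library; `psi` and `new_dish_mean_product` are hypothetical names. It compares $\gamma\psi_{\alpha,\kappa}^{1,j-1}$, computed on the log scale, with the product over $h$ in the last line of the derivation of the expected number of new dishes.

```python
import math


def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def psi(a1, b1, a2, b2):
    """psi_{a1,b1}^{a2,b2} = B(a1 + a2, b1 + b2) / B(a1, b1)."""
    return math.exp(log_beta(a1 + a2, b1 + b2) - log_beta(a1, b1))


def new_dish_mean_product(j, gamma, alpha, kappa):
    """gamma*alpha/(alpha+kappa+j-1) * prod_{h=1}^{j-1} (kappa+h-1)/(alpha+kappa+h-1)."""
    out = gamma * alpha / (alpha + kappa + j - 1)
    for h in range(1, j):
        out *= (kappa + h - 1) / (alpha + kappa + h - 1)
    return out


if __name__ == "__main__":
    gamma, alpha, kappa = 2.0, 0.7, 3.0
    for j in (1, 2, 10, 100):
        print(j, gamma * psi(alpha, kappa, 1, j - 1), new_dish_mean_product(j, gamma, alpha, kappa))
```

For $\alpha = 1$ the product form reduces to $\gamma\kappa/\{(\kappa+j)(\kappa+j-1)\}$, the $O(1/j^2)$ expression above.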

2.2. Distribution of the Number of Features Under the CIBP

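In this subsection, we show that the number of features under the CIBP follows a Poisson distribution whose mean converges to a finite limit as the number of objects increases. The name convergent IBP derives from this property.
Theorem 1. 
If $\Xi \sim \mathrm{CIBP}(\gamma,\alpha,\kappa)$, then the number of active features is distributed as
$$
K^+ \sim \mathrm{Poisson}\big(\gamma(1 - \psi_{\alpha,\kappa}^{0,p})\big), \tag{4}
$$
where $\psi_{\alpha,\kappa}^{0,p} := B(\alpha, \kappa+p)/B(\alpha,\kappa) \le 1$. Moreover, the mean $\gamma(1 - \psi_{\alpha,\kappa}^{0,p})$ is monotonically increasing in $p$ and tends to $\gamma$ as $p \to \infty$. This implies that $K^+$ converges in distribution to a random variable $K$ following $\mathrm{Poisson}(\gamma)$.
Proof. 
From the restaurant analogy of the CIBP, we have
$$
K^+ \overset{d}{=} \sum_{j=1}^{p} K_j^{\mathrm{new}}, \quad \text{where } K_j^{\mathrm{new}} \overset{\mathrm{ind}}{\sim} \mathrm{Poisson}\left(\gamma\,\frac{B(\alpha+1,\kappa+j-1)}{B(\alpha,\kappa)}\right).
$$
Therefore, by the additive property of independent Poisson random variables,
$$
K^+ \sim \mathrm{Poisson}\left(\frac{\gamma}{B(\alpha,\kappa)} \sum_{j=1}^{p} B(\alpha+1, \kappa+j-1)\right).
$$
From the identity $B(x,y) - B(x,y+1) = B(x+1,y)$, the sum telescopes:
$$
\sum_{j=1}^{p} B(\alpha+1,\kappa+j-1) = \sum_{j=1}^{p}\big\{B(\alpha,\kappa+j-1) - B(\alpha,\kappa+j)\big\} = B(\alpha,\kappa) - B(\alpha,\kappa+p),
$$
which implies (4).
For the second assertion, note that
$$
\psi_{\alpha,\kappa}^{0,p} = \frac{\Gamma(\alpha)\Gamma(\kappa+p)}{\Gamma(\alpha+\kappa+p)} \cdot \frac{\Gamma(\alpha+\kappa)}{\Gamma(\alpha)\Gamma(\kappa)} = \frac{\Gamma(\alpha+\kappa)}{\Gamma(\alpha+\kappa+p)} \cdot \frac{\Gamma(\kappa+p)}{\Gamma(\kappa)} = \prod_{j=1}^{p} \frac{\kappa+j-1}{\alpha+\kappa+j-1}.
$$
Since $\alpha > 0$, each factor $1 - \alpha/(\alpha+\kappa+j-1)$ lies in $(0,1)$, so $\psi_{\alpha,\kappa}^{0,p}$ is strictly decreasing in $p$ and the mean is strictly increasing. Moreover, since $\log(1-x) \le -x$, we have $\log\psi_{\alpha,\kappa}^{0,p} \le -\sum_{j=1}^{p} \alpha/(\alpha+\kappa+j-1) \to -\infty$, so $\psi_{\alpha,\kappa}^{0,p} \to 0$ as $p \to \infty$. Convergence of the Poisson means then implies convergence in distribution. □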
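The telescoping step and the limit in Theorem 1 can be checked numerically. This is an illustrative sketch using only the Python standard library, with hypothetical function names: it compares the sum of the new-dish means with the closed form $\gamma(1-\psi_{\alpha,\kappa}^{0,p})$ in (4) and evaluates the latter for increasing $p$.

```python
import math


def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def poisson_mean_sum(p, gamma, alpha, kappa):
    """gamma * sum_{j=1}^p B(alpha + 1, kappa + j - 1) / B(alpha, kappa)."""
    lb = log_beta(alpha, kappa)
    return gamma * sum(math.exp(log_beta(alpha + 1, kappa + j - 1) - lb) for j in range(1, p + 1))


def poisson_mean_closed(p, gamma, alpha, kappa):
    """Closed form gamma * (1 - B(alpha, kappa + p) / B(alpha, kappa))."""
    return gamma * (1.0 - math.exp(log_beta(alpha, kappa + p) - log_beta(alpha, kappa)))


if __name__ == "__main__":
    for p in (1, 10, 100, 10**4, 10**6):
        print(p, poisson_mean_closed(p, 3.0, 0.5, 2.0))
```

Because $\psi_{\alpha,\kappa}^{0,p}$ behaves like $\{\Gamma(\alpha+\kappa)/\Gamma(\kappa)\}\,p^{-\alpha}$ for large $p$, the mean approaches $\gamma$ slowly when $\alpha$ is small; with $\gamma = 3$, $\alpha = 0.5$ and $\kappa = 2$, the gap at $p = 10^6$ is still about $0.004$.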
We provide a simple numerical example supporting (4). In Table 1, we compare the empirical distribution of $K^+$ obtained from 5000 draws of feature matrices under $\mathrm{CIBP}(1,1,1)$ with $p = 500$ to the distribution $\mathrm{Poisson}(1 - \psi_{1,1}^{0,500})$, where $\psi_{1,1}^{0,500} = B(1,501)/B(1,1) = 1/501$. The absolute differences are below 0.005 across all categories, providing numerical support for the distribution of the number of active features under the CIBP.
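A check in the spirit of Table 1 can be sketched as follows. Because $K^+$ is the sum of the new-dish counts, this hypothetical helper (`compare_kplus`, assuming NumPy) samples only $K_1^{\mathrm{new}}, \ldots, K_p^{\mathrm{new}}$ rather than full matrices, and reports the largest absolute difference between empirical and Poisson probabilities over $k = 0, \ldots, 8$. It is not the code used to produce Table 1, and its output will not reproduce the table's entries.

```python
import math

import numpy as np


def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def compare_kplus(p, gamma, alpha, kappa, n_draws, seed=0, k_max=8):
    """Largest |empirical - Poisson| probability of K+ over k = 0, ..., k_max."""
    rng = np.random.default_rng(seed)
    lb = log_beta(alpha, kappa)
    # Means of K_1^new, ..., K_p^new from the restaurant analogy.
    means = np.array([gamma * math.exp(log_beta(alpha + 1, kappa + j - 1) - lb)
                      for j in range(1, p + 1)])
    k_plus = rng.poisson(means, size=(n_draws, p)).sum(axis=1)  # one K+ per simulated matrix
    lam = gamma * (1.0 - math.exp(log_beta(alpha, kappa + p) - lb))  # Poisson mean in Theorem 1
    gaps = []
    for k in range(k_max + 1):
        empirical = float(np.mean(k_plus == k))
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        gaps.append(abs(empirical - poisson))
    return max(gaps)


if __name__ == "__main__":
    print(compare_kplus(p=500, gamma=1.0, alpha=1.0, kappa=1.0, n_draws=5000))
```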

2.3. Connection to the Two-Parameter IBP

We give the restaurant analogy of $\mathrm{IBP}(\gamma,\kappa)$. The first customer tries $\mathrm{Poisson}(\gamma)$ dishes. For $j \ge 2$, the $j$-th customer tries each previously tasted dish $k$ according to
$$
\mathrm{Bernoulli}\left(\frac{m_{j,k}}{j-1+\kappa}\right),
$$
and tries $K_j^{\mathrm{new}}$ new dishes, with $K_j^{\mathrm{new}}$ following
$$
\mathrm{Poisson}\left(\frac{\gamma\kappa}{j-1+\kappa}\right).
$$
We denote by $\mathrm{IBP}(\gamma,\kappa)$ the distribution induced by the above restaurant analogy.
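The next theorem reveals the relationship between the CIBP and $\mathrm{IBP}(\gamma,\kappa)$.
Theorem 2. 
Let $\Xi \sim \mathrm{CIBP}(\gamma',\alpha,\kappa)$ and $\Xi_0 \sim \mathrm{IBP}(\gamma,\kappa)$ be $p \times \infty$-dimensional binary matrices. Then $\Xi$ converges to $\Xi_0$ in distribution as $\alpha \to 0$ and $\gamma'\alpha/\kappa \to \gamma$.
Proof. 
Note that
$$
\frac{m_{j,k}+\alpha}{j-1+\kappa+\alpha} \to \frac{m_{j,k}}{j-1+\kappa}
$$
as $\alpha \to 0$. Moreover,
$$
\gamma'\,\frac{B(\alpha+1,\kappa+j-1)}{B(\alpha,\kappa)} = \frac{\gamma'\alpha}{\alpha+\kappa+j-1}\prod_{h=1}^{j-1}\frac{\kappa+h-1}{\alpha+\kappa+h-1} \to \frac{\gamma\kappa}{j-1+\kappa}
$$
as $\alpha \to 0$ and $\gamma'\alpha/\kappa \to \gamma$, where the equality was shown in (3). These two displays imply that the means of the Bernoulli distribution in (1) and the Poisson distribution in (2) converge to the corresponding quantities for $\mathrm{IBP}(\gamma,\kappa)$. Since both restaurant processes are fully determined by these Bernoulli and Poisson parameters, this gives the desired result. □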
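The two limits in the proof can also be observed numerically. In this sketch (standard library only; names hypothetical), the first CIBP parameter is set to $\gamma\kappa/\alpha$, so that its product with $\alpha/\kappa$ equals the IBP parameter $\gamma$ exactly, and `max_gap` returns the largest absolute difference between the CIBP and IBP new-dish means and Bernoulli probabilities (for a few values of $m_{j,k} \ge 1$) over the first $p$ customers.

```python
def cibp_new_mean(j, gamma_c, alpha, kappa):
    """CIBP new-dish mean in product form, with first parameter gamma_c."""
    out = gamma_c * alpha / (alpha + kappa + j - 1)
    for h in range(1, j):
        out *= (kappa + h - 1) / (alpha + kappa + h - 1)
    return out


def ibp_new_mean(j, gamma, kappa):
    """IBP(gamma, kappa) new-dish mean; j = 1 gives gamma."""
    return gamma * kappa / (j - 1 + kappa)


def max_gap(alpha, gamma=5.0, kappa=4.0, p=50, m_max=3):
    """Largest CIBP-vs-IBP discrepancy over the first p customers, with gamma_c = gamma * kappa / alpha."""
    gamma_c = gamma * kappa / alpha
    gaps = []
    for j in range(1, p + 1):
        gaps.append(abs(cibp_new_mean(j, gamma_c, alpha, kappa) - ibp_new_mean(j, gamma, kappa)))
        if j >= 2:
            for m in range(1, min(m_max, j - 1) + 1):  # a few possible values of m_{j,k}
                cibp_prob = (m + alpha) / (j - 1 + kappa + alpha)
                gaps.append(abs(cibp_prob - m / (j - 1 + kappa)))
    return max(gaps)


if __name__ == "__main__":
    for alpha in (1.0, 0.1, 0.01, 0.001):
        print(alpha, max_gap(alpha))
```

Both discrepancies are increasing in $\alpha$ and vanish at a rate proportional to $\alpha$, which is what the proof uses.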
Figure 1 shows some realizations of $\mathrm{CIBP}(\gamma\kappa/\alpha, \alpha, \kappa)$ for $\alpha = 5$, $\alpha = 1$ and $\alpha = 0.5$, and of $\mathrm{IBP}(\gamma,\kappa)$, with $\gamma = 5$ and $\kappa = 4$ fixed. It is evident that the IBP generates more active features than the CIBP.
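The contrast in Figure 1 can be quantified through expected feature counts. The sketch below (standard library only; hypothetical names) evaluates $\mathbb{E}[K^+]$ under $\mathrm{IBP}(\gamma,\kappa)$ as the sum of the new-dish means $\gamma\kappa/(j-1+\kappa)$, which grows like $\gamma\kappa\log p$, and under $\mathrm{CIBP}(\gamma\kappa/\alpha,\alpha,\kappa)$ via Theorem 1, which stays below $\gamma\kappa/\alpha$.

```python
import math


def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def expected_kplus_ibp(p, gamma, kappa):
    """E[K+] under IBP(gamma, kappa): sum of the new-dish means gamma * kappa / (j - 1 + kappa)."""
    return sum(gamma * kappa / (j - 1 + kappa) for j in range(1, p + 1))


def expected_kplus_cibp(p, gamma, alpha, kappa):
    """E[K+] under CIBP(gamma, alpha, kappa): gamma * (1 - B(alpha, kappa + p) / B(alpha, kappa))."""
    return gamma * (1.0 - math.exp(log_beta(alpha, kappa + p) - log_beta(alpha, kappa)))


if __name__ == "__main__":
    gamma, kappa = 5.0, 4.0  # the values used for Figure 1
    for p in (10, 100, 1000, 10000):
        cibp = [round(expected_kplus_cibp(p, gamma * kappa / a, a, kappa), 2) for a in (5.0, 1.0, 0.5)]
        print(p, round(expected_kplus_ibp(p, gamma, kappa), 2), cibp)
```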

3. Hierarchical Representation

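In this section, we prove that the restaurant analogy of the CIBP can be expressed as a hierarchical distribution.
Definition 2 (Hierarchical representation of the CIBP). 
Let $\gamma > 0$, $\alpha > 0$, and $\kappa > 0$. We refer to the following hierarchical probability distribution as the hierarchical representation of $\mathrm{CIBP}(\gamma,\alpha,\kappa)$:
$$
\begin{aligned}
K &\sim \mathrm{Poisson}(\gamma), \\
\theta_k &\overset{\mathrm{iid}}{\sim} \mathrm{Beta}(\alpha,\kappa), && k \in [K], \\
\xi_{jk} \mid \theta_k &\overset{\mathrm{ind}}{\sim} \mathrm{Bernoulli}(\theta_k), && j \in [p],\ k \in [K],
\end{aligned} \tag{5}
$$
with $\xi_{jk} = 0$ for $k > K$.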
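Sampling from the hierarchical representation is even simpler than running the restaurant. A minimal sketch, assuming NumPy; `sample_cibp_hierarchical` is a hypothetical name. Columns that happen to be all zero are dropped, so the number of returned columns is $K^+$ rather than $K$.

```python
import numpy as np


def sample_cibp_hierarchical(p, gamma, alpha, kappa, rng):
    """Draw from the hierarchical representation; returns the p x K+ matrix of active features."""
    K = rng.poisson(gamma)                         # K ~ Poisson(gamma)
    theta = rng.beta(alpha, kappa, size=K)         # theta_k ~ Beta(alpha, kappa)
    xi = (rng.random((p, K)) < theta).astype(int)  # xi_jk | theta_k ~ Bernoulli(theta_k)
    return xi[:, xi.any(axis=0)]                   # keep only columns with at least one 1


if __name__ == "__main__":
    print(sample_cibp_hierarchical(8, 3.0, 1.0, 2.0, np.random.default_rng(0)))
```

Averaging the number of returned columns over many draws should approach $\gamma(1-\psi_{\alpha,\kappa}^{0,p})$, consistent with Theorem 1 and the equivalence established in Theorem 4.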
To state the result precisely, we introduce the notion of lof-equivalence classes. In the latent feature model, the order of the features does not affect the likelihood. Therefore, we regard two $p \times \infty$-dimensional binary matrices as equivalent if one can be obtained from the other by permuting its columns. A convenient representative of each equivalence class is given by the left-ordering procedure, which transforms each $p \times \infty$-dimensional binary matrix into its left-ordered form (lof) by arranging the columns in decreasing order of the score
$$
s_k := \sum_{j=1}^{p} \xi_{jk}\, 2^{p-j},
$$
i.e., the columns are ordered so that $s_1 \ge s_2 \ge \cdots$. The equivalence class defined by the left-ordering procedure is called the lof-equivalence class and is denoted by $[\Xi]$.
We introduce additional notation. Let $\Delta := \{0,1\}^p$ and $\Delta_1 := \Delta \setminus \{0\}$. Moreover, for $u \in \Delta_1$, we define
$$
K_u := \sum_{k=1}^{\infty} \mathbb{I}\big(\xi^{(k)} = u\big),
$$
where $\xi^{(k)}$ is the $k$-th column of $\Xi$; that is, $K_u$ is the number of columns of $\Xi$ equal to $u$. We then have $K^+ := \sum_{k=1}^{\infty} \mathbb{I}(\xi^{(k)} \neq 0) = \sum_{u \in \Delta_1} K_u$. We define
$$
m_k := \sum_{j=1}^{p} \xi_{jk},
$$
which is the number of rows that possess the $k$-th feature.
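The left-ordering procedure and the counts $K_u$ and $m_k$ can be sketched as follows, assuming NumPy; `left_order` and `feature_counts` are hypothetical names. Scores are computed with Python integers so that $2^{p-j}$ does not overflow for large $p$, and all-zero columns are omitted from the returned form.

```python
from collections import Counter

import numpy as np


def left_order(xi):
    """Left-ordered form: nonzero columns sorted by s_k = sum_j xi_jk * 2**(p - j), largest first."""
    xi = np.asarray(xi, dtype=int)
    xi = xi[:, xi.any(axis=0)]  # all-zero columns are dropped (they would sit at the far right)
    p = xi.shape[0]
    # Row index j = 0 here corresponds to j = 1 in the text, with weight 2**(p - 1).
    scores = [sum(int(v) << (p - 1 - j) for j, v in enumerate(col)) for col in xi.T]
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return xi[:, order]


def feature_counts(xi):
    """Return (K_u for every distinct nonzero column u, m_k for every column k)."""
    xi = np.asarray(xi, dtype=int)
    k_u = Counter(tuple(int(v) for v in col) for col in xi.T if col.any())
    return k_u, xi.sum(axis=0)


if __name__ == "__main__":
    xi = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 0, 0, 0]])
    print(left_order(xi))
    print(feature_counts(xi))
```

Two matrices that differ only by a column permutation have the same output from `left_order`, which is exactly the lof-equivalence described above; $K^+$ is `sum(k_u.values())`.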
The following theorem provides the probability mass function of the lof-equivalence class [ Ξ ] .
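Theorem 3. 
The probability mass function of the lof-equivalence class of a $p \times \infty$-dimensional random binary matrix $\Xi = (\xi_{jk})_{j \in [p],\, k \in \mathbb{N}}$ generated from the distribution in (5) is given by
$$
P([\Xi]) = \frac{\gamma^{K^+}}{\prod_{u \in \Delta_1} K_u!}\, \exp\left(-\gamma\sum_{j=1}^{p}\psi_{\alpha,\kappa}^{1,j-1}\right) \left[\prod_{k=1}^{K^+} \psi_{\alpha,\kappa}^{m_k,\,p-m_k}\right]. \tag{7}
$$
Proof. 
Recall that $m_k := \sum_{j=1}^{p}\xi_{jk}$. If $K \ge K^+$, we have
$$
\begin{aligned}
P(\Xi \mid K) &= \prod_{k=1}^{K} \frac{B(m_k+\alpha,\, p-m_k+\kappa)}{B(\alpha,\kappa)} \\
&= \left(\frac{B(\alpha,\, p+\kappa)}{B(\alpha,\kappa)}\right)^{K-K^+} \times \prod_{k=1}^{K^+}\frac{B(m_k+\alpha,\, p-m_k+\kappa)}{B(\alpha,\kappa)} \\
&= \big(\psi_{\alpha,\kappa}^{0,p}\big)^{K-K^+} \times \prod_{k=1}^{K^+}\psi_{\alpha,\kappa}^{m_k,\,p-m_k},
\end{aligned}
$$
where, for the second equality, we reorder the columns such that $m_k > 0$ if $k \le K^+$ and $m_k = 0$ otherwise. Since the cardinality of the lof-equivalence class is $|[\Xi]| = K!/\prod_{u \in \Delta}K_u!$, where $K_0 := K - K^+$ counts the zero columns among the first $K$ columns, the conditional probability mass function of $[\Xi]$ given $K \ge K^+$ is
$$
P([\Xi] \mid K) = \frac{K!}{\prod_{u \in \Delta} K_u!}\,\big(\psi_{\alpha,\kappa}^{0,p}\big)^{K-K^+}\prod_{k=1}^{K^+}\psi_{\alpha,\kappa}^{m_k,\,p-m_k}.
$$
If $K < K^+$, we have $P(\Xi \mid K) = 0$. Since $K \sim \mathrm{Poisson}(\gamma)$, we have
$$
P([\Xi]) = \frac{1}{\prod_{u \in \Delta_1}K_u!}\left[\prod_{k=1}^{K^+}\psi_{\alpha,\kappa}^{m_k,\,p-m_k}\right]\sum_{K=K^+}^{\infty}\frac{K!}{K_0!}\,\big(\psi_{\alpha,\kappa}^{0,p}\big)^{K-K^+}p_K(K),
$$
where $p_K$ denotes the probability mass function of $\mathrm{Poisson}(\gamma)$, i.e., $p_K(k) := e^{-\gamma}\gamma^k/k!$ for $k = 0, 1, 2, \ldots$. Note that
$$
\sum_{K=K^+}^{\infty}\frac{K!}{K_0!}\,\big(\psi_{\alpha,\kappa}^{0,p}\big)^{K-K^+}p_K(K) = e^{-\gamma}\gamma^{K^+}\sum_{K=K^+}^{\infty}\frac{\big(\gamma\psi_{\alpha,\kappa}^{0,p}\big)^{K-K^+}}{(K-K^+)!} = \gamma^{K^+}e^{-\gamma(1-\psi_{\alpha,\kappa}^{0,p})}.
$$
Moreover, by the identity $B(x,y) - B(x,y+1) = B(x+1,y)$, we have
$$
\begin{aligned}
1 - \psi_{\alpha,\kappa}^{0,p} &= 1 - \frac{B(\alpha,p+\kappa)}{B(\alpha,\kappa)} = \frac{1}{B(\alpha,\kappa)}\big\{B(\alpha,\kappa) - B(\alpha,p+\kappa)\big\} \\
&= \frac{1}{B(\alpha,\kappa)}\sum_{j=1}^{p}\big\{B(\alpha,\kappa+j-1) - B(\alpha,\kappa+j)\big\} = \frac{1}{B(\alpha,\kappa)}\sum_{j=1}^{p}B(\alpha+1,\kappa+j-1) = \sum_{j=1}^{p}\psi_{\alpha,\kappa}^{1,j-1}.
\end{aligned}
$$
Substituting the last two displays into the above expression for $P([\Xi])$, we obtain the desired result. □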
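The probability mass function in Theorem 3 is straightforward to evaluate on the log scale. The sketch below assumes NumPy and takes a $p \times K^+$ binary array with no all-zero columns; `log_psi` and `log_pmf_lof` are hypothetical names.

```python
import math
from collections import Counter

import numpy as np


def log_psi(a1, b1, a2, b2):
    """log psi_{a1,b1}^{a2,b2} = log B(a1 + a2, b1 + b2) - log B(a1, b1)."""
    def log_beta(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_beta(a1 + a2, b1 + b2) - log_beta(a1, b1)


def log_pmf_lof(xi, gamma, alpha, kappa):
    """log P([Xi]) from Theorem 3 for a p x K+ binary array with no all-zero columns."""
    xi = np.asarray(xi, dtype=int)
    p, k_plus = xi.shape
    m = xi.sum(axis=0)
    if k_plus and m.min() == 0:
        raise ValueError("remove all-zero columns before evaluating the pmf")
    k_u = Counter(tuple(int(v) for v in col) for col in xi.T)  # K_u for each distinct column u
    out = k_plus * math.log(gamma) - sum(math.lgamma(c + 1) for c in k_u.values())
    out -= gamma * sum(math.exp(log_psi(alpha, kappa, 1, j - 1)) for j in range(1, p + 1))
    out += sum(log_psi(alpha, kappa, int(mk), p - int(mk)) for mk in m)
    return out
```

As a sanity check, for $p = 2$ every lof-equivalence class is determined by how many copies of each of the columns $(1,0)^\top$, $(0,1)^\top$ and $(1,1)^\top$ it contains, and summing `math.exp(log_pmf_lof(...))` over all such counts up to a moderate truncation gives one up to negligible truncation error.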
By Theorem 3, it can be shown that Definitions 1 and 2 are equivalent.
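Theorem 4.
For a $p \times \infty$-dimensional binary matrix $\Xi$ following $\mathrm{CIBP}(\gamma,\alpha,\kappa)$, the probability mass function of the lof-equivalence class $[\Xi]$ is the same as (7). Therefore, the distribution in (5) is equivalent to $\mathrm{CIBP}(\gamma,\alpha,\kappa)$.
Proof. 
Let $\xi_j$ be the $j$-th row of $\Xi$. We have
$$
P(\xi_1) = \frac{1}{K_1^+!}\big(\gamma\psi_{\alpha,\kappa}^{1,0}\big)^{K_1^+}e^{-\gamma\psi_{\alpha,\kappa}^{1,0}},
$$
where $K_1^+$ denotes the number of nonzero elements of $\xi_1$. This coincides with (7) for $p = 1$ and $K^+ = K_1^+$.
For $p \ge 2$, the conditional distribution of $\xi_p$ given $\xi_1, \ldots, \xi_{p-1}$ is
$$
P(\xi_p \mid \xi_1, \ldots, \xi_{p-1}) = \frac{e^{-\gamma\psi_{\alpha,\kappa}^{1,p-1}}\big(\gamma\psi_{\alpha,\kappa}^{1,p-1}\big)^{K_p^{\mathrm{new}}}}{K_p^{\mathrm{new}}!} \times \prod_{k \in J_p}\frac{m_{p,k}+\alpha}{p-1+\kappa+\alpha}\prod_{k \in [K_{p-1}^+]\setminus J_p}\frac{p-1-m_{p,k}+\kappa}{p-1+\kappa+\alpha},
$$
where $m_{p,k} := \sum_{j=1}^{p-1}\xi_{jk}$, $K_p^{\mathrm{new}}$ is the number of new dishes taken by the $p$-th customer, and $J_p$ is the set of previously tasted dishes taken by the $p$-th customer, i.e., $J_p := \{k \in [K_{p-1}^+] : \xi_{pk} = 1\}$. Let $K_p^+ := \sum_{j=1}^{p}K_j^{\mathrm{new}} = K_{p-1}^+ + K_p^{\mathrm{new}}$ with $K_1^{\mathrm{new}} := K_1^+$. By the inductive hypothesis, we have
$$
\begin{aligned}
P(\xi_1, \ldots, \xi_p) &= P(\xi_p \mid \xi_1, \ldots, \xi_{p-1})\,P(\xi_1, \ldots, \xi_{p-1}) \\
&= \frac{e^{-\gamma\sum_{j=1}^{p}\psi_{\alpha,\kappa}^{1,j-1}}\,\gamma^{K_p^+}}{\prod_{j=1}^{p}K_j^{\mathrm{new}}!}\prod_{k \in J_p}\frac{m_{p,k}+\alpha}{p-1+\kappa+\alpha}\,\psi_{\alpha,\kappa}^{m_{p,k},\,p-1-m_{p,k}} \\
&\quad\times\prod_{k \in [K_{p-1}^+]\setminus J_p}\frac{p-1-m_{p,k}+\kappa}{p-1+\kappa+\alpha}\,\psi_{\alpha,\kappa}^{m_{p,k},\,p-1-m_{p,k}} \times \big(\psi_{\alpha,\kappa}^{1,p-1}\big)^{K_p^{\mathrm{new}}}.
\end{aligned}
$$
Since $m_k = m_{p,k} + 1$ for $k \in J_p$ and $m_k = m_{p,k}$ otherwise, we have, for $k \in J_p$,
$$
\frac{m_{p,k}+\alpha}{p-1+\kappa+\alpha}\,\psi_{\alpha,\kappa}^{m_{p,k},\,p-1-m_{p,k}} = \frac{m_{p,k}+\alpha}{p-1+\kappa+\alpha}\cdot\frac{B(m_{p,k}+\alpha,\,p-1-m_{p,k}+\kappa)}{B(\alpha,\kappa)} = \frac{B(m_{p,k}+1+\alpha,\,p-1-m_{p,k}+\kappa)}{B(\alpha,\kappa)} = \psi_{\alpha,\kappa}^{m_{p,k}+1,\,p-1-m_{p,k}} = \psi_{\alpha,\kappa}^{m_k,\,p-m_k},
$$
and similarly, for $k \in [K_{p-1}^+]\setminus J_p$,
$$
\frac{p-1-m_{p,k}+\kappa}{p-1+\kappa+\alpha}\,\psi_{\alpha,\kappa}^{m_{p,k},\,p-1-m_{p,k}} = \psi_{\alpha,\kappa}^{m_{p,k},\,p-m_{p,k}} = \psi_{\alpha,\kappa}^{m_k,\,p-m_k}.
$$
Therefore, since each of the $K_p^{\mathrm{new}}$ new dishes has $m_k = 1$ and hence contributes $\psi_{\alpha,\kappa}^{1,p-1} = \psi_{\alpha,\kappa}^{m_k,\,p-m_k}$,
$$
\begin{aligned}
P(\xi_1, \ldots, \xi_p) &= \frac{e^{-\gamma\sum_{j=1}^{p}\psi_{\alpha,\kappa}^{1,j-1}}\,\gamma^{K_p^+}}{\prod_{j=1}^{p}K_j^{\mathrm{new}}!}\prod_{k \in J_p}\psi_{\alpha,\kappa}^{m_k,\,p-m_k}\times\prod_{k \in [K_{p-1}^+]\setminus J_p}\psi_{\alpha,\kappa}^{m_k,\,p-m_k}\times\big(\psi_{\alpha,\kappa}^{1,p-1}\big)^{K_p^{\mathrm{new}}} \\
&= \frac{e^{-\gamma\sum_{j=1}^{p}\psi_{\alpha,\kappa}^{1,j-1}}\,\gamma^{K_p^+}}{\prod_{j=1}^{p}K_j^{\mathrm{new}}!}\prod_{k=1}^{K_p^+}\psi_{\alpha,\kappa}^{m_k,\,p-m_k}.
\end{aligned}
$$
Since $\prod_{j=1}^{p}K_j^{\mathrm{new}}!/\prod_{u \in \Delta_1}K_u!$ matrices generated by the above process have the same left-ordered form, we obtain $P([\Xi])$ by multiplying $P(\xi_1, \ldots, \xi_p)$ in the last display by $\prod_{j=1}^{p}K_j^{\mathrm{new}}!/\prod_{u \in \Delta_1}K_u!$, which yields (7). □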
Using the hierarchical representation, we can explicitly obtain the distribution of m k .
Corollary 1.
Assume $\Xi \sim \mathrm{CIBP}(\gamma,\alpha,\kappa)$. Let $F_k := P(K \ge k)$ for $K \sim \mathrm{Poisson}(\gamma)$. Then, the (marginal) probability mass function of $m_k := \sum_{j=1}^{p}\xi_{jk}$ is given by
$$
P(m_k = m) = F_k\binom{p}{m}\psi_{\alpha,\kappa}^{m,\,p-m}\,\mathbb{I}(m \in \{0,1,\ldots,p\}) + (1-F_k)\,\mathbb{I}(m = 0)
$$
for each $k \in \mathbb{N}$.
Proof. 
In the hierarchical representation of the CIBP, we have $\sum_{j=1}^{p}\xi_{jk} \mid \theta_k \sim \mathrm{Binom}(p, \theta_k)$ when $K \ge k$. Thus,
$$
P(m_k = m \mid K \ge k) = \binom{p}{m}\mathbb{E}\big[\theta_k^m(1-\theta_k)^{p-m}\big] = \binom{p}{m}\frac{B(\alpha+m,\,\kappa+p-m)}{B(\alpha,\kappa)} = \binom{p}{m}\psi_{\alpha,\kappa}^{m,\,p-m}.
$$
On the other hand, we have $P(m_k = 0 \mid K < k) = 1$. Combining these two derivations, we obtain the desired result. □
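Corollary 1 describes a mixture of a beta-binomial distribution and a point mass at zero, which the following standard-library sketch evaluates directly; `pmf_mk` is a hypothetical name, and $k \ge 1$ indexes columns of the hierarchical representation.

```python
import math


def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def pmf_mk(m, k, p, gamma, alpha, kappa):
    """P(m_k = m) from Corollary 1, with F_k = P(K >= k) for K ~ Poisson(gamma) and k >= 1."""
    f_k = 1.0 - sum(math.exp(-gamma) * gamma**i / math.factorial(i) for i in range(k))
    beta_binomial = 0.0
    if 0 <= m <= p:
        beta_binomial = math.comb(p, m) * math.exp(
            log_beta(alpha + m, kappa + p - m) - log_beta(alpha, kappa))
    return f_k * beta_binomial + (1.0 - f_k) * (m == 0)


if __name__ == "__main__":
    print([round(pmf_mk(m, 2, 10, 1.5, 0.7, 2.0), 4) for m in range(11)])
```

For every $k$, the probabilities sum to one over $m = 0, \ldots, p$; as $k$ grows, $F_k \to 0$ and the mass concentrates at $m = 0$.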

Exchangeability

The exchangeability of the IBP ensures that the associated posterior inference algorithms are tractable. As established in the following corollary, the CIBP is likewise an exchangeable distribution. This result follows directly from the hierarchical formulation of the CIBP, whose equivalence to the restaurant analogy is given in Theorem 4: conditionally on $K$ and $(\theta_k)_{k \in [K]}$, the rows of $\Xi$ are independent and identically distributed.
Corollary 2.
The row vectors $\xi_1, \ldots, \xi_p$ of a $p \times \infty$-dimensional binary matrix $\Xi$ generated from $\mathrm{CIBP}(\gamma,\alpha,\kappa)$ are exchangeable.

4. Construction from Random Measures

In this section, we provide the random measure construction of the CIBP. We first briefly review the concept of completely random measures; we refer to Appendix J of [13] for more details. Let $(\Omega, \mathcal{A})$ be a Polish space equipped with its Borel $\sigma$-field, and let $(\mathbb{M}, \mathcal{M})$ be the set of all measures on $(\Omega, \mathcal{A})$ equipped with its Borel $\sigma$-field. A completely random measure (CRM) $\mu$ on $(\Omega, \mathcal{A})$ is a random measure such that $\mu(A_1), \ldots, \mu(A_k)$ are mutually independent for all disjoint measurable sets $A_1, \ldots, A_k \in \mathcal{A}$. Every CRM consists of three independent parts:
$$
\mu = \mu_0 + \sum_{k=1}^{K} q_k\,\delta_{\omega_k} + \sum_{(q,\omega)\in\Phi} q\,\delta_\omega,
$$
where $\mu_0$ is a non-random measure, $(\omega_k)_{k \in [K]}$ are fixed points in $\Omega$, $(q_k)_{k \in [K]}$ are independent random variables on $\mathbb{R}_+$, and $\Phi$ is a Poisson process on $\mathbb{R}_+ \times \Omega$. In this paper, we consider purely atomic CRMs with $\mu_0 = 0$. If $\mu$ is a purely atomic CRM written as $\mu = \sum_{k=1}^{K} q_k\delta_{\omega_k} + \sum_{(q,\omega)\in\Phi} q\,\delta_\omega$, we write
$$
\mu \sim \mathrm{CRM}\big(\Lambda, (\omega_k, P_k)_{k \in [K]}\big),
$$
where $P_k$ is a probability measure on $\mathbb{R}_+$ such that $q_k \overset{\mathrm{ind}}{\sim} P_k$ for each $k \in [K]$, and $\Lambda$ is a measure on $\mathbb{R}_+ \times \Omega$ such that $\mathbb{E}\Phi = \Lambda$, i.e., $\Lambda$ is the intensity measure of the Poisson process $\Phi$. We write $\mu \sim \mathrm{CRM}(\Lambda)$ if $\mu = \sum_{(q,\omega)\in\Phi} q\,\delta_\omega$ with $\mathbb{E}\Phi = \Lambda$.
We introduce two specific CRMs. The Bernoulli process with mean measure $\mu$, denoted by $\mathrm{BeP}(\mu)$, is the CRM on $(\Omega, \mathcal{A})$ with intensity measure
$$
\Lambda_{\mathrm{BeP}(\mu)}(dq, d\omega) = \delta_1(dq)\,\mu(d\omega),
$$
where $\delta_1$ denotes a point mass at 1. The beta process with parameters $a, b > 0$ and base measure $\Lambda_0$ is the CRM on $(\Omega, \mathcal{A})$ with intensity measure
$$
\Lambda_{\mathrm{BP}(a,b,\Lambda_0)}(dq, d\omega) = \frac{1}{B(a,b)}\,q^{a-2}(1-q)^{b-1}\,dq\,\Lambda_0(d\omega).
$$
It is known that $\mathrm{IBP}(\gamma,\kappa)$ with $\gamma > 0$ and $\kappa > 0$ is described by a random measure as follows:
$$
\begin{aligned}
\xi_j \mid \mu &\overset{\mathrm{iid}}{\sim} \mathrm{BeP}(\mu), \quad j \in [p], \\
\mu &\sim \mathrm{BP}(1, \kappa, \gamma\Lambda_0),
\end{aligned} \tag{12}
$$
for some smooth probability measure $\Lambda_0$. Note that the intensity measure of $\mathrm{BP}(1, \kappa, \gamma\Lambda_0)$ is given by
$$
\Lambda_{\mathrm{BP}(1,\kappa,\gamma\Lambda_0)}(dq, d\omega) = \gamma\kappa\,q^{-1}(1-q)^{\kappa-1}\,dq\,\Lambda_0(d\omega).
$$
We introduce another random measure representation, which turns out to be related to the CIBP.
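Definition 3 (Random measure representation of the CIBP). 
Let $\gamma > 0$, $\alpha > 0$, and $\kappa > 0$. We refer to the following stochastic process as the random measure representation of $\mathrm{CIBP}(\gamma,\alpha,\kappa)$:
$$
\begin{aligned}
\xi_j \mid \mu &\overset{\mathrm{iid}}{\sim} \mathrm{BeP}(\mu), \quad j \in [p], \\
\mu &\sim \mathrm{BP}\Big(\alpha+1, \kappa, \frac{\gamma\alpha}{\alpha+\kappa}\Lambda_0\Big),
\end{aligned} \tag{13}
$$
for some smooth probability measure $\Lambda_0$. Note that the intensity measure of $\mathrm{BP}\big(\alpha+1, \kappa, \frac{\gamma\alpha}{\alpha+\kappa}\Lambda_0\big)$ is given by
$$
\Lambda_{\mathrm{BP}(\alpha+1,\kappa,\frac{\gamma\alpha}{\alpha+\kappa}\Lambda_0)}(dq, d\omega) = \frac{\gamma}{B(\alpha,\kappa)}\,q^{\alpha-1}(1-q)^{\kappa-1}\,dq\,\Lambda_0(d\omega).
$$
Since the function $q \mapsto q^{\alpha-1}(1-q)^{\kappa-1}$ is integrable on $[0,1]$ for $\alpha > 0$, this intensity has finite total mass, so $\mu$ has finitely many atoms almost surely and the number of features under (13) is finite. In contrast, this function is not integrable for $\alpha = 0$, which corresponds to the two-parameter IBP in (12); there, the beta process has infinitely many atoms and the number of features grows without bound as $p$ increases.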
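The integrability remark can be made concrete by computing the total mass of the intensity in Definition 3. The short derivation below is added for illustration and uses only the standard fact that a Poisson process with finite intensity has a Poisson-distributed number of points, which are i.i.d. from the normalized intensity:
$$
\Lambda_{\mathrm{BP}(\alpha+1,\kappa,\frac{\gamma\alpha}{\alpha+\kappa}\Lambda_0)}\big((0,1]\times\Omega\big) = \frac{\gamma}{B(\alpha,\kappa)}\int_0^1 q^{\alpha-1}(1-q)^{\kappa-1}\,dq\;\Lambda_0(\Omega) = \frac{\gamma}{B(\alpha,\kappa)}\,B(\alpha,\kappa)\cdot 1 = \gamma.
$$
Hence $\mu$ has $\mathrm{Poisson}(\gamma)$ atoms whose weights are i.i.d. with density proportional to $q^{\alpha-1}(1-q)^{\kappa-1}$, that is, $\mathrm{Beta}(\alpha,\kappa)$, which matches the first two layers of the hierarchical representation in Definition 2. For the two-parameter IBP, the analogous integral $\gamma\kappa\int_0^1 q^{-1}(1-q)^{\kappa-1}\,dq$ diverges at $q = 0$.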
In the following theorem, we show that Definitions 2 and 3 are equivalent.
Theorem 5.
The joint distribution of the random measures $\xi_1, \ldots, \xi_p$ generated as in (13) is given by
$$
P(\xi_1, \ldots, \xi_p) = \gamma^{K^+}\,e^{-\gamma\sum_{j=1}^{p}\psi_{\alpha,\kappa}^{1,j-1}}\left[\prod_{k=1}^{K^+}\psi_{\alpha,\kappa}^{m_k,\,p-m_k}\,\lambda_0(\omega_k^*)\right],
$$
where $\omega_1^*, \ldots, \omega_{K^+}^*$ are the $K^+$ atoms with $m_k := \sum_{j=1}^{p}\xi_j(\omega_k^*) \ge 1$ for $k \in [K^+]$, and $\lambda_0$ denotes the density of $\Lambda_0$.
Proof. 
By the well-known conjugacy result (Theorem 3.3 of Kim [14]),
$$
\mu \mid \xi_1, \ldots, \xi_{p-1} \sim \mathrm{CRM}\big(\Lambda_p, \{\omega_k^*, P_k\}_{k=1}^{K}\big),
$$
where $\omega_1^*, \ldots, \omega_K^*$ are the distinct atoms possessed by $\xi_1, \ldots, \xi_{p-1}$,
$$
P_k(dq) := \frac{q^{m_{p,k}+\alpha-1}(1-q)^{p-1-m_{p,k}+\kappa-1}\,dq}{\int_{(0,1]}q^{m_{p,k}+\alpha-1}(1-q)^{p-1-m_{p,k}+\kappa-1}\,dq} = \frac{q^{m_{p,k}+\alpha-1}(1-q)^{p-1-m_{p,k}+\kappa-1}}{B(m_{p,k}+\alpha,\,p-1-m_{p,k}+\kappa)}\,dq,
$$
with $m_{p,k} := \sum_{j=1}^{p-1}\xi_j(\omega_k^*)$, and
$$
\Lambda_p(dq, d\omega) := \frac{\gamma}{B(\alpha,\kappa)}\,q^{\alpha-1}(1-q)^{p-1+\kappa-1}\,dq\,\Lambda_0(d\omega).
$$
Thus, for each atom $\omega_k^*$, we have
$$
P\big(\xi_p(\omega_k^*) = 1 \mid \xi_1, \ldots, \xi_{p-1}\big) = \frac{\int_{(0,1]}q^{m_{p,k}+\alpha}(1-q)^{p-1-m_{p,k}+\kappa-1}\,dq}{B(m_{p,k}+\alpha,\,p-1-m_{p,k}+\kappa)} = \frac{B(m_{p,k}+1+\alpha,\,p-1-m_{p,k}+\kappa)}{B(m_{p,k}+\alpha,\,p-1-m_{p,k}+\kappa)} = \frac{m_{p,k}+\alpha}{p-1+\kappa+\alpha}. \tag{16}
$$
On the other hand, for an infinitesimal neighborhood $d\omega$ of a point $\omega \in \Omega\setminus\{\omega_1^*, \ldots, \omega_K^*\}$, we have
$$
\begin{aligned}
P\big(\xi_p(d\omega) = 1 \mid \xi_1, \ldots, \xi_{p-1}\big) &= \mathbb{E}\big[\xi_p(d\omega) \mid \xi_1, \ldots, \xi_{p-1}\big] = \frac{\gamma}{B(\alpha,\kappa)}\int_{(0,1]}q^{\alpha}(1-q)^{p-1+\kappa-1}\,dq\;\Lambda_0(d\omega) \\
&= \gamma\,\frac{B(\alpha+1,\,p-1+\kappa)}{B(\alpha,\kappa)}\,\Lambda_0(d\omega) = \gamma\psi_{\alpha,\kappa}^{1,p-1}\,\Lambda_0(d\omega).
\end{aligned} \tag{17}
$$
Therefore, since $\xi_p$ is completely random and $\Lambda_0$ is smooth, the restriction of $\xi_p$ to $\Omega\setminus\{\omega_1^*, \ldots, \omega_K^*\}$ is a Poisson process with intensity measure $\gamma\psi_{\alpha,\kappa}^{1,p-1}\Lambda_0$. This implies that the number of new atoms in $\xi_p$ follows a Poisson distribution with mean $\gamma\psi_{\alpha,\kappa}^{1,p-1}$. Combining (16) and (17) with the same inductive argument as in the proof of Theorem 4, we complete the proof. □

5. Application to Bayesian Sparse Factor Models

In this section, as an application, we consider a Bayesian factor model with the CIBP prior.

5.1. Model and Prior

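We assume that a $p$-dimensional random vector $Y$ is distributed as
$$
Y \mid Z = z \sim \mathcal{N}_p(Bz, \sigma^2 I), \qquad Z \sim \mathcal{N}_K(0, I),
$$
where $B$ is a factor loading matrix, $Z$ is a latent factor, and $\sigma^2 > 0$ is the noise variance. Let $\beta_{jk}$ be the $(j,k)$-th element of $B$. We impose the CIBP prior distribution on $B$ as
$$
\begin{aligned}
\beta_{jk} \mid \xi_{jk} &\overset{\mathrm{ind}}{\sim} (1-\xi_{jk})\,\delta_0 + \xi_{jk}\,\mathcal{N}(0,\tau), && j \in [p],\ k \in [K], \\
\xi_{jk} \mid \theta_k &\overset{\mathrm{ind}}{\sim} \mathrm{Bernoulli}(\theta_k), && j \in [p],\ k \in [K], \\
\theta_k &\overset{\mathrm{iid}}{\sim} \mathrm{Beta}(\alpha,\kappa), && k \in [K], \\
K &\sim \mathrm{Poisson}(\gamma),
\end{aligned}
$$
with $\kappa > 0$ and slab variance $\tau > 0$. This means that we impose $\mathrm{CIBP}(\gamma,\alpha,\kappa)$ on the binary matrix $\Xi := (\xi_{jk})_{j \in [p],\, k \in \mathbb{N}}$. We call the above prior distribution on $B$ $\mathrm{SSCIBP}_p(\gamma,\alpha,\kappa,\tau)$, shorthand for spike-and-slab CIBP.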

5.2. Posterior Computation

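In this subsection, we describe a Markov chain Monte Carlo (MCMC) algorithm for computing the posterior distribution under the $\mathrm{SSCIBP}_p(\gamma,\alpha,\kappa,\tau)$ prior on $B$ and the inverse gamma prior $\mathrm{IG}(a,b)$ on $\sigma^2$. We denote by $K^+$ the number of nonzero columns of the loading matrix $B$.
Sample $\beta_{jk}$ for $j \in [p]$ and $k \in [K^+]$. 
We sample $\beta_{jk}$ as
$$
\beta_{jk} \mid \cdots \sim \begin{cases} \mathcal{N}(\hat\beta_{jk}, \hat\tau_k) & \text{if } \xi_{jk} = 1, \\ \delta_0 & \text{if } \xi_{jk} = 0, \end{cases}
$$
where
$$
\hat\tau_k := \left(\sigma^{-2}\sum_{i=1}^{n}Z_{ik}^2 + \tau^{-1}\right)^{-1}, \qquad \hat\beta_{jk} := \hat\tau_k\left\{\sigma^{-2}\sum_{i=1}^{n}Z_{ik}\Big(Y_{ij} - \sum_{h \in [K^+]:\, h \neq k}Z_{ih}\beta_{jh}\Big)\right\}.
$$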
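The $\beta_{jk}$ update can be sketched as a sweep over the features of one row. This is an illustrative sketch assuming NumPy, with hypothetical names: `Y` is $n \times p$, `Z` is $n \times K^+$, `B` and `Xi` are $p \times K^+$, and `tau` is the slab variance. The partial residual excludes feature $k$ and uses the current values of the other loadings in the row.

```python
import numpy as np


def sample_beta_row(j, Y, Z, B, Xi, sigma2, tau, rng):
    """Update row j of B in place: beta_jk = 0 if xi_jk = 0, else beta_jk ~ N(beta_hat_jk, tau_hat_k)."""
    for k in range(Z.shape[1]):
        if Xi[j, k] == 0:
            B[j, k] = 0.0  # spike: point mass at zero
            continue
        zk = Z[:, k]
        tau_hat = 1.0 / (zk @ zk / sigma2 + 1.0 / tau)
        # Partial residual Y_ij - sum_{h != k} Z_ih beta_jh, using the current row of B.
        resid = Y[:, j] - Z @ B[j] + zk * B[j, k]
        beta_hat = tau_hat * (zk @ resid) / sigma2
        B[j, k] = rng.normal(beta_hat, np.sqrt(tau_hat))  # slab draw
    return B
```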
Sample $\xi_{jk}$ for $j \in [p]$ and $k \in \mathbb{N}$. 
Since the CIBP is exchangeable, we can assume without loss of generality that the $j$-th customer is the last customer. For each $k \in [K^+]$, $\xi_{jk}$ is sampled according to the posterior odds
$$
\frac{\Pi(\xi_{jk} = 1 \mid \cdots)}{\Pi(\xi_{jk} = 0 \mid \cdots)} = \frac{m_{j,k}+\alpha}{\kappa+p-1-m_{j,k}}\sqrt{\frac{\hat\tau_k}{\tau}}\exp\left(\frac{\hat\beta_{jk}^2}{2\hat\tau_k}\right),
$$
where $m_{j,k} := \sum_{l \in [p]:\, l \neq j}\xi_{lk}$. Next, we sample $\xi_{jk}$ for $k > K^+$ by a Metropolis–Hastings (MH) step as follows. We generate a proposal $K_j^* \in \mathbb{N}\cup\{0\}$ and $\beta_j^* := (\beta_{j,1}^*, \ldots, \beta_{j,K_j^*}^*) \in \mathbb{R}^{K_j^*}$ from the distribution
$$
J(K_j^*)\,J(\beta_j^* \mid K_j^*) = \mathrm{Poisson}(1) \times \mathcal{N}(0,\tau)^{\otimes K_j^*},
$$
and then accept the proposal with probability
$$
\min\left\{1,\ |M_j|^{-n/2}\exp\left(\frac{1}{2}(\beta_j^*)^\top M_j^{-1}\beta_j^*\sum_{i=1}^{n}E_{ij}^2\right)\big(\gamma\psi_{\alpha,\kappa}^{1,p-1}\big)^{K_j^*}\right\},
$$
where
$$
M_j := \sigma^{-2}\beta_j^*(\beta_j^*)^\top + I, \qquad E_{ij} := \sigma^{-2}\Big(Y_{ij} - \sum_{k=1}^{K^+}Z_{ik}\beta_{jk}\Big).
$$
If the proposal is accepted, we update
$$
B \leftarrow \Big(B,\ \big(\beta_{j,k}^*\,\mathbb{I}(j' = j)\big)_{j' \in [p],\, k \in [K_j^*]}\Big), \qquad K^+ \leftarrow K^+ + K_j^*.
$$
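The two updates for $\xi_{jk}$ can be transcribed as follows. This sketch assumes NumPy, uses hypothetical names, and transcribes the displayed odds and acceptance probability as stated rather than re-deriving them; `gamma_psi` stands for $\gamma\psi_{\alpha,\kappa}^{1,p-1}$ and `resid_j` for the residuals $Y_{ij} - \sum_{k}Z_{ik}\beta_{jk}$ of row $j$ under the current features. The log-determinant and quadratic-form terms equal the Gaussian marginal likelihood ratio of these residuals with the $K_j^*$ new factors integrated out, which is what the test checks.

```python
import numpy as np


def log_odds_existing(m_jk, beta_hat, tau_hat, p, alpha, kappa, tau):
    """log[Pi(xi_jk = 1 | .) / Pi(xi_jk = 0 | .)] for an existing feature k."""
    return (np.log(m_jk + alpha) - np.log(kappa + p - 1 - m_jk)
            + 0.5 * np.log(tau_hat / tau) + 0.5 * beta_hat**2 / tau_hat)


def propose_birth(tau, rng):
    """Proposal: K* ~ Poisson(1) and beta* ~ N(0, tau) independently for each new feature."""
    return rng.normal(0.0, np.sqrt(tau), size=rng.poisson(1.0))


def log_birth_ratio(beta_star, resid_j, sigma2, gamma_psi):
    """Log of the second argument of min{1, .} in the displayed acceptance probability."""
    k_star = beta_star.size
    if k_star == 0:
        return 0.0  # nothing is added
    n = resid_j.size
    M = np.outer(beta_star, beta_star) / sigma2 + np.eye(k_star)  # M_j
    E = resid_j / sigma2                                           # E_ij
    _, logdet = np.linalg.slogdet(M)
    quad = beta_star @ np.linalg.solve(M, beta_star)
    return -0.5 * n * logdet + 0.5 * quad * (E @ E) + k_star * np.log(gamma_psi)


def accept_birth(beta_star, resid_j, sigma2, gamma_psi, rng):
    """MH accept/reject for the proposed new features of row j."""
    return np.log(rng.random()) < min(0.0, log_birth_ratio(beta_star, resid_j, sigma2, gamma_psi))
```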
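Sample $Z_i$ for $i \in [n]$. 
We sample $Z_i$ as
$$
Z_i \mid \cdots \sim \mathcal{N}\big(\sigma^{-2}\hat\Sigma_Z B^\top Y_i,\ \hat\Sigma_Z\big),
$$
where $\hat\Sigma_Z := (\sigma^{-2}B^\top B + I)^{-1}$.
Sample $\sigma^2$. 
We sample the noise variance $\sigma^2$ as
$$
\sigma^2 \mid \cdots \sim \mathrm{IG}\left(a + \frac{np}{2},\ b + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{p}\Big(Y_{ij} - \sum_{k=1}^{K^+}Z_{ik}\beta_{jk}\Big)^2\right).
$$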
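The remaining two conditional updates are standard conjugate draws. In this sketch (assuming NumPy; hypothetical names), the Gaussian draw uses a Cholesky factor of the precision matrix $\hat\Sigma_Z^{-1}$, and the inverse gamma draw uses the fact that if $G \sim \mathrm{Gamma}(\text{shape}, 1)$, then $\text{scale}/G \sim \mathrm{IG}(\text{shape}, \text{scale})$.

```python
import numpy as np


def sample_Z(Y, B, sigma2, rng):
    """Draw every Z_i ~ N(sigma^-2 Sigma_Z B^T Y_i, Sigma_Z); Y is n x p, B is p x K+."""
    K = B.shape[1]
    prec = B.T @ B / sigma2 + np.eye(K)                   # Sigma_Z^{-1}
    L = np.linalg.cholesky(prec)                          # prec = L L^T
    mean = np.linalg.solve(prec, B.T @ Y.T / sigma2).T   # n x K posterior means
    # If eps ~ N(0, I), then L^{-T} eps has covariance (L L^T)^{-1} = Sigma_Z.
    noise = np.linalg.solve(L.T, rng.standard_normal((K, Y.shape[0]))).T
    return mean + noise


def sample_sigma2(Y, Z, B, a, b, rng):
    """Draw sigma^2 ~ IG(a + n p / 2, b + RSS / 2)."""
    n, p = Y.shape
    rss = np.sum((Y - Z @ B.T) ** 2)
    shape, scale = a + 0.5 * n * p, b + 0.5 * rss
    return scale / rng.gamma(shape)
```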
The dominant computational cost of our sampler is $O(npK^+)$ per iteration. Since $K^+$ remains stochastically bounded under the CIBP, the per-iteration cost grows only linearly in $p$, unlike under the classical IBP, where $\mathbb{E}[K^+] = O(\log p)$ a priori.

5.3. Simulation

We perform a simulation study to compare the CIBP and the two-parameter IBP as prior distributions in the sparse factor model. We generate simulated datasets as follows. For each $p \in \{100, 250, 500, 1000\}$, we generate a $p \times 5$ loading matrix $B_0$ with 10 nonzero rows. Each factor loading in the nonzero rows is generated from the uniform distribution on $[-3, 3]$. We then generate $n = 50$ random vectors independently from the normal distribution with mean $0$ and covariance matrix $B_0B_0^\top + I$. For each value of $p$, we generate 100 independent datasets.
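The data-generating design can be sketched as follows. This is a hypothetical reconstruction, not the authors' code: it assumes NumPy, that the 10 nonzero rows are chosen uniformly at random (the text does not say how), and that the loadings are uniform on $[-3, 3]$. Drawing $Y_i = B_0Z_i + \varepsilon_i$ with $Z_i \sim \mathcal{N}(0, I_5)$ and $\varepsilon_i \sim \mathcal{N}(0, I_p)$ is equivalent to sampling from $\mathcal{N}(0, B_0B_0^\top + I)$ and avoids forming the $p \times p$ covariance.

```python
import numpy as np


def simulate_dataset(p, n=50, n_factors=5, n_nonzero_rows=10, rng=None):
    """Return (Y, B0): Y is n x p with rows ~ N(0, B0 B0^T + I); B0 is p x n_factors."""
    rng = np.random.default_rng() if rng is None else rng
    B0 = np.zeros((p, n_factors))
    rows = rng.choice(p, size=n_nonzero_rows, replace=False)  # assumption: random row positions
    B0[rows] = rng.uniform(-3.0, 3.0, size=(n_nonzero_rows, n_factors))
    Z = rng.standard_normal((n, n_factors))                  # latent factors
    Y = Z @ B0.T + rng.standard_normal((n, p))                # unit-variance noise
    return Y, B0


if __name__ == "__main__":
    Y, B0 = simulate_dataset(250, rng=np.random.default_rng(0))
    print(Y.shape, int((np.abs(B0).sum(axis=1) > 0).sum()))
```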
In all experiments, we compare $\mathrm{CIBP}(\gamma,\alpha,\kappa)$ with $\gamma = 0.1$, $\alpha = 10$ and $\kappa = 10$ against $\mathrm{IBP}(\gamma,\kappa)$ with $\gamma = 0.1$ and $\kappa = 10$. Note that $\gamma = 0.1$ for the IBP equals $\gamma\alpha/\kappa = 0.1 \times 10/10$ for the CIBP. Posterior inference is performed using the MCMC scheme described in Section 5.2. We run 10,000 iterations, discard the first 5000 as burn-in, and keep every 10th draw.
Figure 2 shows the posterior distributions of the number of nonzero factors under the $\mathrm{CIBP}(\gamma,\alpha,\kappa)$ and $\mathrm{IBP}(\gamma,\kappa)$ priors, aggregated over 100 replications for each $p \in \{100, 250, 500, 1000\}$. As the dimension $p$ grows, the $\mathrm{IBP}(\gamma,\kappa)$ prior tends to considerably overestimate the number of factors. In contrast, the $\mathrm{CIBP}(\gamma,\alpha,\kappa)$ prior yields accurate estimates of the number of factors for all considered values of $p$.
Table 2 reports the averages of the following metrics across the 100 replications: (1) the posterior mean $\mathbb{E}[K^+]$ of the number of active features (nonzero columns of $B$), (2) the proportion of zero entries in the posterior median of $B$ (sparsity), and (3) the average test log-likelihood per observation, evaluated on an independent test set of size 100. Under the CIBP prior, $\mathbb{E}[K^+]$ remains stable around the true value of five across all values of $p$. In contrast, the IBP prior yields an increasing number of active features as $p$ grows. The CIBP also consistently produces sparser loading matrices than the IBP. Despite using fewer features and a sparser loading matrix, the CIBP yields equal or better predictive log-likelihood compared with the IBP. These results support the claim that the CIBP provides a parsimonious yet predictive alternative to the IBP when the underlying number of latent features does not diverge.
In Appendix A, we provide the results of a sensitivity analysis regarding the choice of hyperparameters, as well as convergence diagnostics for our MCMC algorithm.

6. Conclusions

In this paper, we introduced the CIBP, a novel three-parameter extension of the IBP. The key distinction between the CIBP and standard IBPs is that, under the CIBP, the distribution of the number of features converges to a stable limiting distribution as the number of objects increases. This property helps the CIBP avoid superfluous features in both interpretation and prediction. We evaluated the proposed CIBP within a high-dimensional sparse factor model and provided empirical evidence that it outperforms the IBP in estimating the number of factors.
Although our focus in this work is on posterior sampling, the proposed framework can be combined with several strategies to further improve scalability. First, stochastic subsampling may be used to approximate likelihood contributions when $p$ is large, following advances in subsampling-based MCMC for large-scale Bayesian inference [15]. Second, collapsed or partially collapsed sampling can be employed, which is known to improve mixing in nonparametric feature allocation models, including the IBP [16]. Finally, variational inference offers a substantially faster alternative for large-scale applications and can be adapted from existing variational inference methods for the standard IBP [17]. A systematic investigation of these computational refinements is left for future research.

Funding

This work was supported by INHA UNIVERSITY Research Grant.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A. Additional Experiments

Appendix A.1. Sensitivity Analysis of CIBP Hyperparameters

We examine the sensitivity of the Convergent Indian Buffet Process (CIBP) to its hyperparameters $\gamma > 0$, $\alpha > 0$, and $\kappa > 0$. For each hyperparameter configuration, we report the posterior average number of active features $\mathbb{E}[K_+]$ and the sparsity level of the loading matrix, along with Monte Carlo standard errors (SEs) computed from 20 independent simulated datasets. Tables A1, A2, and A3 present the results of the sensitivity analysis. Across experiments, we observe the following general trends:
  • $\gamma$ is the primary driver of latent feature complexity: larger values of $\gamma$ substantially increase $\mathbb{E}[K_+]$.
  • $\alpha$ adjusts sparsity patterns, with mild secondary effects on $\mathbb{E}[K_+]$.
  • $\kappa$ influences sparsity but has minimal impact on $\mathbb{E}[K_+]$, suggesting that it mainly regulates feature retention rather than feature count.
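One common way to obtain such Monte Carlo SEs, assumed here rather than taken from the paper, is to divide the sample standard deviation of the per-dataset estimates by the square root of the number of datasets (20 in these experiments). The function name `mc_mean_and_se` and the toy values below are hypothetical.

```python
import math
from statistics import mean, stdev


def mc_mean_and_se(estimates):
    """Average of per-dataset estimates and its Monte Carlo standard error.

    Assumes the estimates come from independent simulated datasets, so the
    SE of the average is the sample standard deviation divided by sqrt(R).
    """
    r = len(estimates)
    if r < 2:
        raise ValueError("need at least two datasets to estimate an SE")
    return mean(estimates), stdev(estimates) / math.sqrt(r)


# Hypothetical per-dataset posterior means of K_+ for R = 4 datasets.
avg, se = mc_mean_and_se([5.0, 5.2, 4.8, 5.0])
print(f"{avg:.2f} ({se:.2f})")
```

The printed string mirrors the "estimate (SE)" layout used in Tables A1 to A3.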
Table A1. Sensitivity to $\gamma$ (fixed $\alpha = 10$, $\kappa = 10$; $p = 500$).
| $\gamma$ | $\mathbb{E}[K_+]$ (SE) | Sparsity (%) (SE) |
|---|---|---|
| 0.05 | 4.58 (0.09) | 97.6 (0.11) |
| 0.10 | 5.07 (0.11) | 96.9 (0.10) |
| 0.20 | 5.41 (0.13) | 95.8 (0.14) |
| 0.50 | 5.83 (0.21) | 93.1 (0.20) |
Table A2. Sensitivity to $\alpha$ (fixed $\gamma = 0.1$, $\kappa = 10$; $p = 500$).
| $\alpha$ | $\mathbb{E}[K_+]$ (SE) | Sparsity (%) (SE) |
|---|---|---|
| 0.5 | 5.62 (0.08) | 97.3 (0.12) |
| 1 | 5.37 (0.09) | 97.1 (0.10) |
| 5 | 5.18 (0.10) | 97.0 (0.10) |
| 10 | 5.07 (0.11) | 96.9 (0.10) |
| 20 | 5.15 (0.12) | 96.8 (0.13) |
Table A3. Sensitivity to $\kappa$ (fixed $\gamma = 0.1$, $\alpha = 10$; $p = 400$).
| $\kappa$ | $\mathbb{E}[K_+]$ (SE) | Sparsity (%) (SE) |
|---|---|---|
| 1 | 5.11 (0.10) | 92.4 (0.13) |
| 5 | 5.09 (0.09) | 94.7 (0.12) |
| 10 | 5.08 (0.09) | 96.9 (0.10) |
| 20 | 5.05 (0.08) | 97.2 (0.14) |

Appendix A.2. MCMC Convergence Diagnostics

We assess the convergence of our MCMC algorithm using trace, autocorrelation, and partial autocorrelation plots of the posterior samples of a few randomly chosen nonzero loadings. Figure A1 presents these plots, which indicate that the MCMC chain converges well.
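For readers who wish to reproduce such diagnostics, the following Python sketch computes the sample autocorrelation function of a single chain and, via the Durbin-Levinson recursion, its partial autocorrelation function, together with the usual large-sample band of $\pm 1.96/\sqrt{n}$. It is an illustration, not the author's code; whether Figure A1 uses exactly this band is an assumption, and the AR(1) chain below is a hypothetical stand-in for the posterior samples of a loading.

```python
import math
import random


def acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a chain x."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + h] for t in range(n - h)) / c0 for h in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations at lags 1..max_lag via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    phi = []  # phi[j] holds phi_{k-1, j+1} from the previous step
    out = []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def white_noise_band(n, z=1.96):
    """Approximate 95% band +/- z / sqrt(n) for autocorrelations of a white-noise series."""
    return z / math.sqrt(n)


# Hypothetical chain: an AR(1) process with coefficient 0.8, mimicking a sticky sampler.
rng = random.Random(0)
chain = [0.0]
for _ in range(4999):
    chain.append(0.8 * chain[-1] + rng.gauss(0.0, 1.0))

band = white_noise_band(len(chain))
print([round(v, 2) for v in acf(chain, 5)])
print([round(v, 2) for v in pacf(chain, 5)], "band:", round(band, 3))
```

For a well-mixing sampler, the autocorrelations should fall inside the band after a few lags; for this AR(1) stand-in, only the lag-1 partial autocorrelation (about 0.8) should lie clearly outside it.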
Figure A1. The trace, autocorrelation and partial autocorrelation plots of the posterior samples of some randomly selected nonzero loadings. The blue dashed lines indicate confidence bands.

References

  1. Griffiths, T.L.; Ghahramani, Z. Infinite latent feature models and the Indian buffet process. In Proceedings of the 18th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 5–8 December 2005; pp. 475–482.
  2. Meeds, E.; Ghahramani, Z.; Neal, R.M.; Roweis, S.T. Modeling dyadic data with binary latent factors. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 3–6 December 2007; pp. 977–984.
  3. Miller, K.; Jordan, M.I.; Griffiths, T.L. Nonparametric latent feature models for link prediction. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 7–10 December 2009; pp. 1276–1284.
  4. Navarro, D.J.; Griffiths, T.L. Latent features in similarity judgments: A nonparametric Bayesian approach. Neural Comput. 2008, 20, 2597–2628.
  5. James, L.F.; Lee, J.; Pandey, A. Posterior distributions for Hierarchical Spike and Slab Indian Buffet processes. arXiv 2021, arXiv:2103.11407.
  6. Warr, R.L.; Dahl, D.B.; Meyer, J.M.; Lui, A. The Attraction Indian Buffet Distribution. Bayesian Anal. 2022, 17, 931–967.
  7. Ghilotti, L.; Camerlenghi, F.; Rigon, T. Bayesian analysis of product feature allocation models. J. R. Stat. Soc. Ser. B Stat. Methodol. 2025, qkaf058.
  8. Teh, Y.W.; Gorur, D. Indian buffet processes with power-law behavior. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 7–10 December 2009; pp. 1838–1846.
  9. Thibaux, R.; Jordan, M.I. Hierarchical beta processes and the Indian buffet process. In Proceedings of the Artificial Intelligence and Statistics, San Juan, Puerto Rico, 21–24 March 2007; pp. 564–571.
  10. Onatski, A. Determining the number of factors from empirical distribution of eigenvalues. Rev. Econ. Stat. 2010, 92, 1004–1016.
  11. Lettau, M.; Pelger, M. Estimating latent asset-pricing factors. J. Econom. 2020, 218, 1–31.
  12. Kopf, A.; Claassen, M. Latent representation learning in biology and translational medicine. Patterns 2021, 2, 100198.
  13. Ghosal, S.; Van der Vaart, A. Fundamentals of Nonparametric Bayesian Inference; Cambridge University Press: Cambridge, UK, 2017; Volume 44.
  14. Kim, Y. Nonparametric Bayesian estimators for counting processes. Ann. Stat. 1999, 27, 562–588.
  15. Bardenet, R.; Doucet, A.; Holmes, C. On Markov chain Monte Carlo methods for tall data. J. Mach. Learn. Res. 2017, 18, 1–43.
  16. Griffiths, T.L.; Ghahramani, Z. The Indian buffet process: An introduction and review. J. Mach. Learn. Res. 2011, 12, 1185–1224.
  17. Doshi, F.; Miller, K.; Van Gael, J.; Teh, Y.W. Variational inference for the Indian buffet process. In Proceedings of the Artificial Intelligence and Statistics, Clearwater Beach, FL, USA, 16–18 April 2009; pp. 137–144.
Figure 1. Draws from $\mathrm{CIBP}(\gamma\kappa/\alpha, \alpha, \kappa)$ and $\mathrm{IBP}(\gamma, \kappa)$ with $\gamma = 5$ and $\kappa = 4$, for $\alpha = 5$, $\alpha = 1$, and $\alpha = 0.5$.
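The CIBP construction is defined earlier in the paper and is not reproduced here. For the IBP panels, the following Python sketch draws a binary feature matrix from the two-parameter IBP (see, e.g., [16]) through its sequential "buffet" description: the $i$-th object takes each existing feature $k$ with probability $m_k/(\kappa + i - 1)$, where $m_k$ is the number of previous objects owning that feature, and then adds $\mathrm{Poisson}(\gamma\kappa/(\kappa + i - 1))$ new features. Reading $\mathrm{IBP}(\gamma, \kappa)$ as this process with mass $\gamma$ and concentration $\kappa$ is an assumption, and the function names are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_two_param_ibp(n, mass, conc, seed=None):
    """Binary n x K matrix from the two-parameter IBP, built object by object."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of objects so far that own feature k
    rows = []
    for i in range(1, n + 1):
        denom = conc + i - 1
        # Revisit each existing feature with probability m_k / (conc + i - 1).
        row = [1 if rng.random() < m / denom else 0 for m in counts]
        # Add a Poisson(mass * conc / (conc + i - 1)) number of brand-new features.
        new = poisson(rng, mass * conc / denom)
        row += [1] * new
        counts = [m + z for m, z in zip(counts, row)] + [1] * new
        rows.append(row)
    K = len(counts)
    # Earlier rows are shorter; pad them with zeros for features introduced later.
    return [r + [0] * (K - len(r)) for r in rows]


# Hypothetical draw with gamma = 5 (mass) and kappa = 4 (concentration).
Z = sample_two_param_ibp(n=50, mass=5.0, conc=4.0, seed=1)
print(len(Z), "objects,", len(Z[0]), "features")
```

Under this scheme the expected number of features after $n$ objects is $\sum_{i=1}^{n} \gamma\kappa/(\kappa + i - 1)$, which grows without bound in $n$; this is the behavior the CIBP is designed to avoid.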
Figure 2. The posterior distributions of the number of nonzero factors under the CIBP and IBP priors, respectively. The dashed vertical line indicates the true number of nonzero factors.
Table 1. Empirical and theoretical probabilities for the number of active features under $\mathrm{CIBP}(1, 1, 1)$ with $p = 500$.
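| $k$ | Empirical Prob. | Poisson Prob. | Abs. Diff. |
|---|---|---|---|
| 0 | 0.369 | 0.369 | 0.002 |
| 1 | 0.368 | 0.368 | 0.000 |
| 2 | 0.181 | 0.184 | 0.002 |
| 3 | 0.057 | 0.061 | 0.004 |
| 4 | 0.017 | 0.015 | 0.002 |
| ≥5 | 0.008 | 0.005 | 0.001 |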
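A comparison of this kind can be sketched as follows: tabulate simulated numbers of active features, pool all values of at least 5 into one cell, and set the resulting frequencies against Poisson probabilities with the same pooled tail. This Python sketch assumes (hypothetically) that `samples` holds values of $K_+$ simulated from the prior and that the Poisson mean `lam` is supplied by the user; the mean implied by the CIBP theory is not reproduced here, so no particular value is implied. With `lam = 1.0`, for instance, the probabilities of 0 and 1 both equal $e^{-1} \approx 0.368$.

```python
import math
from collections import Counter


def poisson_table(lam, k_tail=5):
    """Poisson(lam) probabilities for k = 0..k_tail-1, plus the tail P(K >= k_tail)."""
    probs = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(k_tail)]
    probs.append(1.0 - sum(probs))
    return probs


def empirical_table(samples, k_tail=5):
    """Relative frequencies of k = 0..k_tail-1, with all values >= k_tail pooled."""
    counts = Counter(min(k, k_tail) for k in samples)
    n = len(samples)
    return [counts[k] / n for k in range(k_tail + 1)]


def compare(samples, lam, k_tail=5):
    """Rows (k, empirical prob., Poisson prob., absolute difference); last k means '>= k_tail'."""
    emp = empirical_table(samples, k_tail)
    theo = poisson_table(lam, k_tail)
    return [(k, e, t, abs(e - t)) for k, (e, t) in enumerate(zip(emp, theo))]


# Hypothetical simulated values of K_+ (in practice, many draws from the prior).
samples = [0, 1, 1, 0, 2, 1, 0, 3, 1, 0, 6, 2]
k_tail = 5
for k, e, t, d in compare(samples, lam=1.0, k_tail=k_tail):
    label = f">={k}" if k == k_tail else str(k)
    print(f"{label:>4} {e:.3f} {t:.3f} {d:.3f}")
```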
Table 2. Comparison between CIBP and IBP priors in sparse factor modeling. Reported are averages over 10 datasets with standard errors in parentheses.
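| $p$ | $\mathbb{E}[K_+]$ (CIBP) | $\mathbb{E}[K_+]$ (IBP) | Sparsity (%) (CIBP) | Sparsity (%) (IBP) | Predictive Log-Likelihood (CIBP) | Predictive Log-Likelihood (IBP) |
|---|---|---|---|---|---|---|
| 100 | 5.12 (0.18) | 6.54 (0.22) | 81.82 (0.38) | 84.97 (0.52) | 1.72 (0.02) | 1.74 (0.02) |
| 250 | 5.18 (0.19) | 6.81 (0.29) | 85.05 (0.34) | 82.87 (0.61) | 1.70 (0.02) | 1.76 (0.03) |
| 500 | 5.29 (0.21) | 7.92 (0.41) | 95.21 (0.30) | 88.15 (0.73) | 1.69 (0.02) | 1.79 (0.03) |
| 1000 | 5.24 (0.23) | 7.53 (0.56) | 97.46 (0.27) | 87.09 (0.82) | 1.68 (0.02) | 1.83 (0.04) |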

Share and Cite

MDPI and ACS Style

Ohn, I. The Convergent Indian Buffet Process. Mathematics 2025, 13, 3881. https://doi.org/10.3390/math13233881

AMA Style

Ohn I. The Convergent Indian Buffet Process. Mathematics. 2025; 13(23):3881. https://doi.org/10.3390/math13233881

Chicago/Turabian Style

Ohn, Ilsang. 2025. "The Convergent Indian Buffet Process" Mathematics 13, no. 23: 3881. https://doi.org/10.3390/math13233881

APA Style

Ohn, I. (2025). The Convergent Indian Buffet Process. Mathematics, 13(23), 3881. https://doi.org/10.3390/math13233881
