The Convergent Indian Buffet Process

Ohn, Ilsang

doi:10.3390/math13233881

by

Ilsang Ohn

Department of Statistics, Inha University, Incheon 22212, Republic of Korea

Mathematics 2025, 13(23), 3881; https://doi.org/10.3390/math13233881

Submission received: 14 October 2025 / Revised: 21 November 2025 / Accepted: 1 December 2025 / Published: 3 December 2025

(This article belongs to the Special Issue Application of the Bayesian Method in Statistical Modeling, 2nd Edition)

Abstract

We propose a new Bayesian nonparametric prior for latent feature models, called the Convergent Indian Buffet Process (CIBP). We show that under the CIBP, the number of latent features is distributed as a Poisson distribution, with the mean monotonically increasing but converging to a certain value as the number of objects goes to infinity. That is, the expected number of features is bounded above even when the number of objects goes to infinity, unlike the standard Indian Buffet Process, under which the expected number of features increases with the number of objects. We provide two alternative representations of the CIBP based on a hierarchical distribution and a completely random measure, which are of independent interest. The proposed CIBP is assessed on a high-dimensional sparse factor model.

Keywords:

Indian buffet process; latent feature models; completely random measure; sparse factor models

MSC:

62F15; 62G20

1. Introduction

1.1. Motivation and Background

Latent feature models provide a flexible framework for representing objects that may simultaneously possess multiple latent characteristics. The Indian Buffet Process (IBP), first introduced by [1], has become one of the most essential Bayesian nonparametric priors for such models, owing to its elegant construction as an exchangeable distribution over binary matrices with infinitely many columns. Each column corresponds to a potential latent feature, and each row to an object possessing a subset of these features. This approach has been widely applied to diverse areas, including factor analysis, link prediction, and relational data modeling [2,3,4]. Although the IBP was introduced more than a decade ago, recent research shows that IBP-based models remain an active and evolving field within Bayesian nonparametrics, with extensions including hierarchical spike-and-slab IBP [5], similarity-based feature learning [6], and more general product feature allocation formulations [7].

Despite its popularity, a key limitation of the standard IBP and its two- and three-parameter extensions [8,9] lies in the unbounded growth of the expected number of features as the number of objects increases. For a latent feature matrix $Ξ \in R^{p \times \infty}$, each row corresponds to an object and each column to a potential feature. The number of active features, i.e., the number of nonzero columns in $Ξ$, is defined as

$K^{+} := \sum_{k = 1}^{\infty} I(ξ_{(k)} \neq 0),$

where $ξ_{(k)}$ is the k-th column of $Ξ$ and $I(\cdot)$ is the indicator function. Under the standard IBP and its two- or three-parameter generalizations, the expected number of features $E[K^{+}]$ increases either logarithmically or polynomially as the number of objects p grows:

$E[K^{+}] = \begin{cases} O(\log p), & \text{for the one- or two-parameter IBP}, \\ O(p^{a}), & \text{for the three-parameter IBP with } a > 0. \end{cases}$

Hence, existing IBP variants can produce redundant and uninformative features as data size increases, reducing interpretability and predictive performance. This property may be undesirable for applications where the number of underlying latent factors is believed to be finite or asymptotically stable.

In many practical domains, the assumption of infinitely increasing latent features is not realistic. For example, in macroeconomic or financial modeling, the co-movements among a large number of asset returns can often be explained by a fixed and small number of underlying factors [10,11]. Similarly, in biological and social network analysis, the generative mechanisms are often governed by a finite set of latent processes that do not proliferate with additional data [12]. However, the standard IBP tends to allocate unnecessary features as the data size grows, which complicates interpretability and reduces statistical efficiency.

This motivates the need for a new nonparametric prior that maintains the flexibility of the IBP while ensuring a convergent (bounded) number of features, i.e., the distribution of $K^{+}$ converges to a finite limit as $p \to \infty$. Such a prior would bridge the gap between classical finite feature models and existing nonparametric counterparts, combining parsimony with theoretical tractability.

In this context, we propose the Convergent Indian Buffet Process (CIBP), a novel three-parameter extension of the IBP that possesses a convergent expected number of features. The CIBP retains exchangeability and constructive elegance of the IBP while introducing an additional parameter that governs the rate of convergence. Specifically, the total number of active features follows a Poisson distribution whose mean does not diverge as the number of objects tends to infinity.

1.2. Contributions and Organization

This paper makes both methodological and theoretical contributions to Bayesian nonparametric modeling. First, we introduce the Convergent Indian Buffet Process (CIBP), a new stochastic process that extends the Indian Buffet Process by ensuring a bounded expected number of latent features. The CIBP maintains the desirable exchangeability property of the IBP while incorporating an additional parameter that controls convergence. This enables a parsimonious latent feature inference even as the number of observations increases.

Second, we establish the theoretical foundations of the CIBP. We derive its distributional properties, including the asymptotic convergence of the number of active features, and provide two equivalent formulations: a hierarchical representation based on Poisson–Beta–Bernoulli layers, and a completely random measure construction using the Beta process. These representations clarify the relationship between the CIBP and existing nonparametric priors, offering new insight into the structure of latent feature models.

Finally, we demonstrate the utility of the CIBP through its application to Bayesian sparse factor models. The empirical results show that the CIBP yields stable and interpretable factor inference while preventing overfitting in high-dimensional data.

The remainder of this paper is organized as follows: Section 2 introduces the Convergent Indian Buffet Process and analyzes its distributional properties. Section 3 and Section 4 develop the hierarchical and random measure formulations, respectively. Section 5 presents an application to Bayesian sparse factor modeling, and Section 6 concludes the paper.

2. Convergent Indian Buffet Process

In this section, we formalize the construction of the proposed Convergent Indian Buffet Process (CIBP). The CIBP extends the classical IBP by introducing a “convergence” parameter that ensures the total number of latent features remains bounded as the number of observations increases. While the standard IBP provides an elegant and exchangeable distribution over infinite binary matrices, its expected number of features grows unboundedly with data size. The CIBP modifies this mechanism while preserving exchangeability, thereby providing a regularized version of the IBP suitable for parsimonious latent feature inference.

2.1. Restaurant Analogy

The Convergent Indian Buffet Process (CIBP) is defined by the restaurant analogy given below. We denote by $B(a, b)$ the beta function for $a > 0$ and $b > 0$.

Definition 1 (The restaurant analogy of the CIBP).

Let $γ > 0$, $α > 0$ and $κ > 0$. We refer to the following stochastic process as the restaurant analogy of $\mathrm{CIBP}(γ, α, κ)$:

1. The first customer tries $K_{1}^{+}$ dishes, where $K_{1}^{+} \sim \mathrm{Poisson}(γ B(α + 1, κ) / B(α, κ))$.

2. For every $j = 2, \dots, p$, the j-th customer does the following:

(a) For every $k = 1, \dots, K_{j - 1}^{+}$, the j-th customer tries the k-th dish if $ξ_{j, k} = 1$ and does not otherwise, where

$ξ_{j, k} \sim \mathrm{Bernoulli}(\frac{m_{j, k} + α}{j - 1 + κ + α}).$

(1)

Here, $m_{j, k}$ denotes the number of customers before the j-th customer who have tried the k-th dish.

(b) The j-th customer tries $K_{j}^{new}$ new dishes, where

$K_{j}^{new} \sim \mathrm{Poisson}(γ \frac{B(α + 1, κ + j - 1)}{B(α, κ)}).$

(2)

(c) Set $K_{j}^{+} = K_{j - 1}^{+} + K_{j}^{new}$.

Each realization of the above restaurant process is a binary matrix whose rows correspond to customers and whose number of columns is unbounded. The element at position $(j, k)$ is one if the j-th customer has sampled the k-th dish and zero otherwise. The distribution of this binary matrix, as generated by the restaurant analogy, is denoted by $\mathrm{CIBP}(γ, α, κ)$.
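The restaurant analogy of Definition 1 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the author's implementation; the function names (`log_beta`, `sample_poisson`, `sample_cibp_restaurant`) are hypothetical, and the Poisson sampler uses Knuth's multiplication method, which is an assumption that is adequate only for the small means arising here.

```python
import math
import random


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def sample_poisson(mean, rng):
    """Draw from Poisson(mean) with Knuth's method (fine for small means)."""
    threshold = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def sample_cibp_restaurant(p, gamma, alpha, kappa, rng):
    """Return p binary rows (lists of 0/1) generated by the restaurant scheme."""
    rows = []    # rows[j - 1] holds the dishes tried by customer j
    counts = []  # counts[k] = number of earlier customers who tried dish k
    for j in range(1, p + 1):
        # Existing dishes, Eq. (1): Bernoulli((m + alpha) / (j - 1 + kappa + alpha)).
        row = [1 if rng.random() < (m + alpha) / (j - 1 + kappa + alpha) else 0
               for m in counts]
        # New dishes, Eq. (2): Poisson(gamma * B(alpha + 1, kappa + j - 1) / B(alpha, kappa)).
        rate = gamma * math.exp(log_beta(alpha + 1, kappa + j - 1) - log_beta(alpha, kappa))
        k_new = sample_poisson(rate, rng)
        row.extend([1] * k_new)
        counts = [m + x for m, x in zip(counts, row)] + [1] * k_new
        rows.append(row)
    width = len(counts)
    # Pad earlier rows with zeros so that all rows have K^+ columns.
    return [row + [0] * (width - len(row)) for row in rows]


if __name__ == "__main__":
    matrix = sample_cibp_restaurant(p=10, gamma=3.0, alpha=1.0, kappa=2.0,
                                    rng=random.Random(0))
    for row in matrix:
        print(row)
```

Every column of the returned matrix is nonzero by construction, so its width equals the number of active features $K^{+}$.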

We provide some intuition behind the convergence of the number of features under the CIBP. Under the CIBP, the expected number of new dishes taken by the j-th customer is

$\begin{aligned} γ \frac{B(α + 1, κ + j - 1)}{B(α, κ)} &= γ \frac{Γ(α + 1) Γ(κ + j - 1)}{Γ(α + κ + j)} \frac{Γ(α + κ)}{Γ(α) Γ(κ)} \\ &= γ \frac{Γ(α + 1)}{Γ(α)} \frac{Γ(α + κ)}{Γ(α + κ + j)} \frac{Γ(κ + j - 1)}{Γ(κ)} \\ &= γ \frac{α}{α + κ + j - 1} \prod_{h = 1}^{j - 1} \frac{κ + h - 1}{α + κ + h - 1}, \end{aligned}$

(3)

where Γ(·) denotes the gamma function. In the simplest case, α = 1, the last display becomes

$\frac{γ}{κ + j} \frac{κ}{κ + j - 1} = O(\frac{1}{j^{2}}).$

Since $\sum_{j = 1}^{\infty} 1 / j^{2} < \infty$, we can expect that the expected number of features converges to a finite number as p tends to ∞. Indeed, this convergence property holds for any α > 0, as we show in the next subsection.

Throughout the paper, for notational simplicity, we let $ψ_{a_{1}, b_{1}}^{a_{2}, b_{2}}$ denote the ratio of two beta functions defined as

$ψ_{a_{1}, b_{1}}^{a_{2}, b_{2}} := \frac{B(a_{1} + a_{2}, b_{1} + b_{2})}{B(a_{1}, b_{1})}$

for $a_{1}, b_{1} > 0$ and $a_{2}, b_{2} \geq 0$. For example, the Poisson distribution in (2) may be written as $\mathrm{Poisson}(γ ψ_{α, κ}^{1, j - 1})$.
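As a quick numerical sanity check, the beta-function ratio $ψ$ and the product form in the last line of (3) can be compared directly. The sketch below uses only the Python standard library; the helper names `psi` and `new_dish_rate_product` are hypothetical and introduced purely for illustration.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def psi(a1, b1, a2, b2):
    """psi_{a1,b1}^{a2,b2} = B(a1 + a2, b1 + b2) / B(a1, b1)."""
    return math.exp(log_beta(a1 + a2, b1 + b2) - log_beta(a1, b1))


def new_dish_rate_product(j, gamma, alpha, kappa):
    """Last line of (3): gamma * alpha / (alpha + kappa + j - 1) times the product over h < j."""
    value = gamma * alpha / (alpha + kappa + j - 1)
    for h in range(1, j):
        value *= (kappa + h - 1) / (alpha + kappa + h - 1)
    return value


if __name__ == "__main__":
    gamma, alpha, kappa = 2.0, 0.7, 3.0
    for j in (1, 2, 5, 20, 100):
        via_beta = gamma * psi(alpha, kappa, 1, j - 1)
        via_product = new_dish_rate_product(j, gamma, alpha, kappa)
        print(j, via_beta, via_product)
```

Working on the log scale through `math.lgamma` avoids overflow of the gamma function when $κ + j$ is large.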

2.2. Distribution of the Number of Features Under the CIBP

In this subsection, we show that the number of features under the CIBP follows a Poisson distribution whose mean remains bounded, and in fact converges, as the number of objects increases. The convergent IBP takes its name from this property.

Theorem 1.

If $Ξ \sim \mathrm{CIBP}(γ, α, κ)$, then the number of active features is distributed as

$K^{+} \sim \mathrm{Poisson}(γ (1 - ψ_{α, κ}^{0, p})),$

(4)

where $ψ_{α, κ}^{0, p} := B(α, κ + p) / B(α, κ) \leq 1$. Moreover, the mean $γ (1 - ψ_{α, κ}^{0, p})$ is monotonically increasing in p and tends to γ as $p \to \infty$. This implies that $K^{+}$ converges in distribution to a random variable K following $\mathrm{Poisson}(γ)$.

Proof.

From the restaurant analogy of the CIBP, we have that

$K^{+} \overset{d}{=} \sum_{j = 1}^{p} K_{j}^{new}, \quad \text{where } K_{j}^{new} \overset{ind}{\sim} \mathrm{Poisson}(γ \frac{B(α + 1, κ + j - 1)}{B(α, κ)}).$

Therefore, by the additive property of independent Poisson random variables,

$K^{+} \sim \mathrm{Poisson}(\frac{γ}{B(α, κ)} \sum_{j = 1}^{p} B(α + 1, κ + j - 1)).$

From the identity $B(x, y) - B(x, y + 1) = B(x + 1, y)$, we have

$\begin{aligned} \sum_{j = 1}^{p} B(α + 1, κ + j - 1) &= \sum_{j = 1}^{p} \{B(α, κ + j - 1) - B(α, κ + j)\} \\ &= B(α, κ) - B(α, κ + p), \end{aligned}$

which implies (4).

For the second assertion, note that

$\begin{aligned} ψ_{α, κ}^{0, p} &= \frac{Γ(α) Γ(κ + p)}{Γ(α + κ + p)} \frac{Γ(α + κ)}{Γ(α) Γ(κ)} \\ &= \frac{Γ(α + κ)}{Γ(α + κ + p)} \frac{Γ(κ + p)}{Γ(κ)} \\ &= \prod_{j = 1}^{p} \frac{κ + j - 1}{α + κ + j - 1}. \end{aligned}$

Since $α > 0$, each factor is strictly less than one and $\sum_{j = 1}^{\infty} α / (α + κ + j - 1) = \infty$, so it follows that $ψ_{α, κ}^{0, p} ↓ 0$ as $p \to \infty$. □

We provide a simple numerical example supporting (4). In Table 1, we compare the empirical distribution of $K^{+}$ obtained from 5000 draws of feature matrices under $\mathrm{CIBP}(1, 1, 1)$ with $p = 500$ to the distribution of $\mathrm{Poisson}(1 - ψ_{1, 1}^{0, 500})$. The absolute differences are below 0.005 across all categories, providing numerical support for the distribution of the number of active features under the CIBP.
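A comparison of this kind can be reproduced in miniature. The sketch below is not the code behind Table 1; it is a standard-library illustration, under the assumption that simulating $K^{+}$ as a sum of the independent Poisson counts in (2) is acceptable, and it uses a smaller $p$ and fewer draws to keep the run short. All function names are hypothetical.

```python
import math
import random


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def psi(a1, b1, a2, b2):
    """psi_{a1,b1}^{a2,b2} = B(a1 + a2, b1 + b2) / B(a1, b1)."""
    return math.exp(log_beta(a1 + a2, b1 + b2) - log_beta(a1, b1))


def feature_count_mean(p, gamma, alpha, kappa):
    """Closed-form Poisson mean of K^+ in (4)."""
    return gamma * (1.0 - psi(alpha, kappa, 0, p))


def summed_new_dish_rates(p, gamma, alpha, kappa):
    """The same mean, obtained by summing the per-customer rates in (2)."""
    return sum(gamma * psi(alpha, kappa, 1, j - 1) for j in range(1, p + 1))


def sample_poisson(mean, rng):
    """Knuth's method; adequate for the small means used here."""
    threshold = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def simulate_feature_counts(p, gamma, alpha, kappa, draws, rng):
    """Simulate K^+ as the sum of independent new-dish counts."""
    rates = [gamma * psi(alpha, kappa, 1, j - 1) for j in range(1, p + 1)]
    return [sum(sample_poisson(r, rng) for r in rates) for _ in range(draws)]


if __name__ == "__main__":
    # For CIBP(1, 1, 1), psi^{0,p} = 1 / (p + 1), so the mean is p / (p + 1).
    print("closed form, p = 500:", feature_count_mean(500, 1.0, 1.0, 1.0))
    print("summed rates, p = 500:", summed_new_dish_rates(500, 1.0, 1.0, 1.0))
    counts = simulate_feature_counts(100, 1.0, 1.0, 1.0, 2000, random.Random(1))
    print("empirical mean, p = 100:", sum(counts) / len(counts))
```

For $α = κ = 1$ the product formula gives $ψ_{1, 1}^{0, p} = 1 / (p + 1)$, so the Poisson mean in the example with $p = 500$ is $500 / 501$.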

2.3. Connection to the Two-Parameter IBP

We give the restaurant analogy of $\mathrm{IBP}(γ^{†}, κ)$. The first customer tries $\mathrm{Poisson}(γ^{†})$ dishes. For $j \geq 2$, the j-th customer tries each previously tasted dish according to

$\mathrm{Bernoulli}(\frac{m_{j, k}}{j - 1 + κ}),$

and tries $K_{j}^{new}$ new dishes, with $K_{j}^{new}$ following

$\mathrm{Poisson}(\frac{γ^{†} κ}{j - 1 + κ}).$

We denote by $\mathrm{IBP}(γ^{†}, κ)$ the distribution induced by the above restaurant analogy.

The next theorem reveals the relationship between $\mathrm{CIBP}(γ, α, κ)$ and $\mathrm{IBP}(γ^{†}, κ)$.

Theorem 2.

For two $p \times \infty$-dimensional binary matrices $Ξ \sim \mathrm{CIBP}(γ, α, κ)$ and $Ξ_{0} \sim \mathrm{IBP}(γ^{†}, κ)$, $Ξ$ converges to $Ξ_{0}$ in distribution as $α \to 0$ and $γ α / κ \to γ^{†}$.

Proof.

Note that

$\frac{m_{j, k} + α}{j - 1 + κ + α} \to \frac{m_{j, k}}{j - 1 + κ}$

as $α \to 0$. Moreover,

$\begin{aligned} γ \frac{B(α + 1, κ + j - 1)}{B(α, κ)} &= γ \frac{α}{α + κ + j - 1} \prod_{h = 1}^{j - 1} \frac{κ + h - 1}{α + κ + h - 1} \\ &\to \frac{γ^{†} κ}{j - 1 + κ} \end{aligned}$

as $α \to 0$ and $γ α / κ \to γ^{†}$, where the equality was shown in (3). These two displays imply that the means of the Bernoulli distribution in (1) and the Poisson distribution in (2) converge to the corresponding quantities for $\mathrm{IBP}(γ^{†}, κ)$. This gives the desired result. □
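The two limits used in the proof can be observed numerically. The following standard-library sketch (with hypothetical function names) fixes $γ = γ^{†} κ / α$, so that $γ α / κ = γ^{†}$ exactly, and lets $α$ shrink; the values $γ^{†} = 5$ and $κ = 4$ are chosen only for illustration.

```python
def cibp_new_dish_rate(j, gamma, alpha, kappa):
    """Poisson mean in (2), written in the product form of (3)."""
    value = gamma * alpha / (alpha + kappa + j - 1)
    for h in range(1, j):
        value *= (kappa + h - 1) / (alpha + kappa + h - 1)
    return value


def ibp_new_dish_rate(j, gamma_dagger, kappa):
    """Poisson mean for new dishes under the two-parameter IBP."""
    return gamma_dagger * kappa / (j - 1 + kappa)


def cibp_old_dish_prob(m, j, alpha, kappa):
    """Bernoulli probability in (1)."""
    return (m + alpha) / (j - 1 + kappa + alpha)


def ibp_old_dish_prob(m, j, kappa):
    """Bernoulli probability for an existing dish under the two-parameter IBP."""
    return m / (j - 1 + kappa)


if __name__ == "__main__":
    gamma_dagger, kappa, j = 5.0, 4.0, 10
    print("IBP rate:", ibp_new_dish_rate(j, gamma_dagger, kappa))
    for alpha in (5.0, 1.0, 0.5, 1e-3, 1e-6):
        gamma = gamma_dagger * kappa / alpha  # keeps gamma * alpha / kappa = gamma_dagger
        print(alpha, cibp_new_dish_rate(j, gamma, alpha, kappa))
```

For moderate $α$ the CIBP rate lies below the IBP rate at every $j \geq 1$, since every factor in the product form is smaller than its $α = 0$ counterpart.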

Figure 1 shows some realizations of $\mathrm{CIBP}(γ^{†} κ / α, α, κ)$ with $α = 5$, $α = 1$, and $α = 0.5$, and of $\mathrm{IBP}(γ^{†}, κ)$, where $γ^{†} = 5$ and $κ = 4$ are fixed. It is evident that the IBP generates more active features than the CIBP.

3. Hierarchical Representation

In this section, we prove that the restaurant analogy of the CIBP can be expressed as some hierarchical distribution.

Definition 2 (Hierarchical representation of the CIBP).

Let $γ > 0$, $α > 0$, and $κ > 0$. We refer to the following hierarchical probability distribution as the hierarchical representation of $\mathrm{CIBP}(γ, α, κ)$:

$\begin{aligned} K &\sim \mathrm{Poisson}(γ), \\ θ_{k} &\overset{iid}{\sim} \mathrm{Beta}(α, κ), && k \in [K], \\ ξ_{j k} | θ_{k} &\overset{ind}{\sim} \mathrm{Bernoulli}(θ_{k}), && j \in [p], k \in [K]. \end{aligned}$

(5)
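The three layers of (5) translate almost line by line into a sampler. The sketch below is an illustration with the Python standard library only; the function names are hypothetical, and the Poisson draw again uses Knuth's method as an assumption suited to moderate γ.

```python
import math
import random


def sample_poisson(mean, rng):
    """Knuth's method; adequate for moderate means."""
    threshold = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def sample_cibp_hierarchical(p, gamma, alpha, kappa, rng):
    """Draw a p x K binary matrix from (5); some of the K columns may be all zero."""
    K = sample_poisson(gamma, rng)                               # K ~ Poisson(gamma)
    thetas = [rng.betavariate(alpha, kappa) for _ in range(K)]   # theta_k ~ Beta(alpha, kappa)
    return [[1 if rng.random() < theta else 0 for theta in thetas]  # xi_jk ~ Bernoulli(theta_k)
            for _ in range(p)]


def count_active_features(matrix):
    """Number of nonzero columns, i.e., K^+."""
    return sum(1 for column in zip(*matrix) if any(column))


if __name__ == "__main__":
    rng = random.Random(0)
    matrix = sample_cibp_hierarchical(p=8, gamma=3.0, alpha=1.0, kappa=2.0, rng=rng)
    for row in matrix:
        print(row)
    print("active features:", count_active_features(matrix))
```

Unlike the restaurant scheme, this construction can produce all-zero columns, which is why $K^{+} \leq K$ and why the total count $K$ is Poisson with mean γ regardless of $p$.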

To articulate the result precisely, we introduce the notion of lof-equivalence classes. In the context of the latent feature model, the ordering of the features does not influence the likelihood. Therefore, we consider two $p \times \infty$-dimensional binary matrices to be equivalent if they can be converted into each other by rearranging their columns. It is practical to select a representative for each equivalence class through the left-ordering procedure. This procedure transforms each $p \times \infty$-dimensional binary matrix into its left-ordered form, arranging the columns based on the score $s_{k}$, which is defined as follows:

$s_{k} := \sum_{j = 1}^{p} ξ_{j k} 2^{p - j},$

i.e., the columns are ordered so that $s_{1} \geq s_{2} \geq \dots$. The equivalence class defined by the left-ordering procedure is called the lof-equivalence class and denoted by $[Ξ]$.
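The left-ordering procedure is straightforward to express in code. The following sketch assumes the matrix is stored as a list of equally long 0/1 rows (an illustrative choice, as are the function names) and sorts the columns by the score $s_{k}$ in decreasing order.

```python
def column_score(column):
    """s_k = sum_j xi_jk * 2^(p - j), reading the column as a binary number (row 1 first)."""
    p = len(column)
    return sum(x << (p - j) for j, x in enumerate(column, start=1))


def left_order(matrix):
    """Return the left-ordered form of a binary matrix given as a list of rows."""
    if not matrix or not matrix[0]:
        return [list(row) for row in matrix]
    columns = sorted(zip(*matrix), key=column_score, reverse=True)
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    matrix = [[0, 1, 1],
              [1, 0, 1]]
    # Column scores are 1, 2, 3, so the columns are reversed.
    print(left_order(matrix))
```

Two matrices belong to the same lof-equivalence class exactly when `left_order` maps them to the same result.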

We introduce additional notation. Let $Δ := \{0, 1\}^{p}$ and $Δ_{1} := Δ ∖ \{0\}$. Moreover, for $u \in Δ_{1}$, we define

$K_{u} := \sum_{k = 1}^{\infty} I(ξ^{(k)} = u),$

(6)

with $ξ^{(k)}$ being the k-th column of $Ξ$, which is the number of columns of $Ξ$ equal to $u$. We then have $K^{+} := \sum_{k = 1}^{\infty} I(ξ^{(k)} \neq 0) = \sum_{u \in Δ_{1}} K_{u}$. We define

$m_{k} := \sum_{j = 1}^{p} ξ_{j k},$

which is the number of rows that possess the k-th feature.

The following theorem provides the probability mass function of the lof-equivalence class $[Ξ]$.

Theorem 3.

The probability mass function of a $p \times \infty$-dimensional random binary matrix $Ξ \equiv (ξ_{j k})_{j \in [p], k \in N}$ generated from the distribution in (5) is given by

$P([Ξ]) = \frac{γ^{K^{+}}}{\prod_{u \in Δ_{1}} K_{u}!} e^{- γ \sum_{j = 1}^{p} ψ_{α, κ}^{1, j - 1}} [\prod_{k = 1}^{K^{+}} ψ_{α, κ}^{m_{k}, p - m_{k}}].$

(7)

Proof.

Recall that $m_{k} := \sum_{j = 1}^{p} ξ_{j k}$. If $K \geq K^{+}$, we have that

$\begin{aligned} P(Ξ | K) &= \prod_{k = 1}^{K} \frac{B(m_{k} + α, p - m_{k} + κ)}{B(α, κ)} \\ &= (\frac{B(α, p + κ)}{B(α, κ)})^{K - K^{+}} \times \prod_{k = 1}^{K^{+}} \frac{B(m_{k} + α, p - m_{k} + κ)}{B(α, κ)} \\ &= (ψ_{α, κ}^{0, p})^{K - K^{+}} \times \prod_{k = 1}^{K^{+}} ψ_{α, κ}^{m_{k}, p - m_{k}}, \end{aligned}$

where for the second equality, we reorder the columns such that $m_{k} > 0$ if $k \leq K^{+}$ and $m_{k} = 0$ otherwise. Therefore, since the cardinality of the lof-equivalence class is $|[Ξ]| = K! / \prod_{u \in Δ} K_{u}!$, the (conditional) probability mass function of $[Ξ]$ given $K \geq K^{+}$ is given by

$P([Ξ] | K) = \frac{K!}{\prod_{u \in Δ} K_{u}!} (ψ_{α, κ}^{0, p})^{K - K^{+}} \prod_{k = 1}^{K^{+}} ψ_{α, κ}^{m_{k}, p - m_{k}}.$

If $K < K^{+}$, we have $P(Ξ | K) = 0$. Since $K \sim \mathrm{Poisson}(γ)$, we have

$P([Ξ]) = \frac{1}{\prod_{u \in Δ_{1}} K_{u}!} [\prod_{k = 1}^{K^{+}} ψ_{α, κ}^{m_{k}, p - m_{k}}] \sum_{K = K^{+}}^{\infty} \frac{K!}{K_{0}!} (ψ_{α, κ}^{0, p})^{K - K^{+}} p_{K}(K),$

where $K_{0} = K - K^{+}$ is the number of zero columns and $p_{K}$ denotes the probability mass function of $\mathrm{Poisson}(γ)$, i.e., $p_{K}(k) := e^{- γ} γ^{k} / k!$ for $k = 0, 1, 2, \dots$. Note that

$\sum_{K = K^{+}}^{\infty} \frac{K!}{K_{0}!} (ψ_{α, κ}^{0, p})^{K - K^{+}} p_{K}(K) = e^{- γ} γ^{K^{+}} \sum_{K = K^{+}}^{\infty} \frac{1}{(K - K^{+})!} (γ ψ_{α, κ}^{0, p})^{K - K^{+}} = γ^{K^{+}} e^{- γ (1 - ψ_{α, κ}^{0, p})}.$

(8)

Moreover, by the identity $B(x, y) - B(x, y + 1) = B(x + 1, y)$, we have

$\begin{aligned} 1 - ψ_{α, κ}^{0, p} &= 1 - \frac{B(α, p + κ)}{B(α, κ)} \\ &= \frac{1}{B(α, κ)} \{B(α, κ) - B(α, p + κ)\} \\ &= \frac{1}{B(α, κ)} \sum_{j = 1}^{p} \{B(α, κ + j - 1) - B(α, κ + j)\} \\ &= \frac{1}{B(α, κ)} \sum_{j = 1}^{p} B(α + 1, κ + j - 1) \\ &= \sum_{j = 1}^{p} ψ_{α, κ}^{1, j - 1}. \end{aligned}$

(9)

Combining (8) and (9), we obtain the desired result. □

By Theorem 3, it can be shown that Definitions 1 and 2 are equivalent.

Theorem 4.

For a $p \times \infty$-dimensional binary matrix Ξ following $\mathrm{CIBP}(γ, α, κ)$, the probability mass function of the lof-equivalence class $[Ξ]$ is the same as (7). Therefore, the distribution in (5) is equivalent to $\mathrm{CIBP}(γ, α, κ)$.

Proof.

Let $ξ_{j}$ be the j-th row of $Ξ$. We have

$P(ξ_{1}) = \frac{1}{K_{1}^{+}!} (γ ψ_{α, κ}^{1, 0})^{K_{1}^{+}} e^{- γ ψ_{α, κ}^{1, 0}},$

where $K_{1}^{+}$ denotes the number of nonzero elements in $ξ_{1}$. This is the same as (7) with $p = 1$ and $K^{+} = K_{1}^{+}$.

For $p \geq 2$, the conditional distribution of $ξ_{p}$ given $ξ_{1}, \dots, ξ_{p - 1}$ is given by

$\begin{aligned} P(ξ_{p} | ξ_{1}, \dots, ξ_{p - 1}) &= e^{- γ ψ_{α, κ}^{1, p - 1}} \frac{(γ ψ_{α, κ}^{1, p - 1})^{K_{p}^{new}}}{K_{p}^{new}!} \\ &\quad \times \prod_{k \in J_{p}} \frac{m_{p, k} + α}{p - 1 + κ + α} \prod_{k \notin J_{p}} \frac{p - 1 - m_{p, k} + κ}{p - 1 + κ + α}, \end{aligned}$

(10)

where $m_{p, k} := \sum_{j = 1}^{p - 1} ξ_{j k}$, $K_{p}^{new}$ is the number of new dishes taken by the p-th customer, and $J_{p}$ is the set of previously sampled dishes taken by the p-th customer, i.e., $J_{p} := \{k \in [K_{p - 1}^{+}] : ξ_{p k} = 1\}$. Let $K_{p}^{+} := \sum_{j = 1}^{p} K_{j}^{new} = K_{p - 1}^{+} + K_{p}^{new}$ with $K_{1}^{new} = K_{1}^{+}$.

By the inductive hypothesis, we have

$\begin{aligned} P(ξ_{1}, \dots, ξ_{p}) &= P(ξ_{p} | ξ_{1}, \dots, ξ_{p - 1}) P(ξ_{1}, \dots, ξ_{p - 1}) \\ &= e^{- γ \sum_{j = 1}^{p} ψ_{α, κ}^{1, j - 1}} \frac{γ^{K_{p}^{+}}}{\prod_{j = 1}^{p} K_{j}^{new}!} \prod_{k \in J_{p}} \frac{m_{p, k} + α}{p - 1 + κ + α} ψ_{α, κ}^{m_{p, k}, p - 1 - m_{p, k}} \\ &\quad \times \prod_{k \notin J_{p}} \frac{p - 1 - m_{p, k} + κ}{p - 1 + κ + α} ψ_{α, κ}^{m_{p, k}, p - 1 - m_{p, k}} \times (ψ_{α, κ}^{1, p - 1})^{K_{p}^{new}}. \end{aligned}$

Since $m_{k} = m_{p, k} + 1$ for $k \in J_{p}$ and $m_{k} = m_{p, k}$ otherwise, we have

$\begin{aligned} \frac{m_{p, k} + α}{p - 1 + κ + α} ψ_{α, κ}^{m_{p, k}, p - 1 - m_{p, k}} &= \frac{m_{p, k} + α}{p - 1 + κ + α} \frac{B(m_{p, k} + α, p - 1 - m_{p, k} + κ)}{B(α, κ)} \\ &= \frac{B(m_{p, k} + 1 + α, p - 1 - m_{p, k} + κ)}{B(α, κ)} \\ &= ψ_{α, κ}^{m_{p, k} + 1, p - 1 - m_{p, k}} \\ &= ψ_{α, κ}^{m_{k}, p - m_{k}} \end{aligned}$

and similarly,

$\frac{p - 1 - m_{p, k} + κ}{p - 1 + κ + α} ψ_{α, κ}^{m_{p, k}, p - 1 - m_{p, k}} = ψ_{α, κ}^{m_{p, k}, p - m_{p, k}} = ψ_{α, κ}^{m_{k}, p - m_{k}}.$

Therefore,

$\begin{aligned} P(ξ_{1}, \dots, ξ_{p}) &= e^{- γ \sum_{j = 1}^{p} ψ_{α, κ}^{1, j - 1}} \frac{γ^{K_{p}^{+}}}{\prod_{j = 1}^{p} K_{j}^{new}!} \prod_{k \in J_{p}} ψ_{α, κ}^{m_{k}, p - m_{k}} \\ &\quad \times \prod_{k \notin J_{p}} ψ_{α, κ}^{m_{k}, p - m_{k}} \times (ψ_{α, κ}^{1, p - 1})^{K_{p}^{new}} \\ &= e^{- γ \sum_{j = 1}^{p} ψ_{α, κ}^{1, j - 1}} \frac{γ^{K_{p}^{+}}}{\prod_{j = 1}^{p} K_{j}^{new}!} \prod_{k = 1}^{K_{p}^{+}} ψ_{α, κ}^{m_{k}, p - m_{k}}. \end{aligned}$

(11)

Since $\prod_{j = 1}^{p} K_{j}^{new}! / \prod_{u \in Δ_{1}} K_{u}!$ many matrices generated by the above process have the same left-ordered form, we obtain $P([Ξ])$ by multiplying $\prod_{j = 1}^{p} K_{j}^{new}! / \prod_{u \in Δ_{1}} K_{u}!$ and $P(ξ_{1}, \dots, ξ_{p})$ in (11). □

Using the hierarchical representation, we can explicitly obtain the distribution of $m_{k}$.

Corollary 1.

Assume $Ξ \sim \mathrm{CIBP}(γ, α, κ)$. Let $F_{k} := P(K \geq k)$ for $K \sim \mathrm{Poisson}(γ)$. Then, the (marginal) probability mass function of $m_{k} := \sum_{j = 1}^{p} ξ_{j k}$ is given by

$P(m_{k} = m) = F_{k} \binom{p}{m} ψ_{α, κ}^{m, p - m} I(m \in \{0, 1, \dots, p\}) + (1 - F_{k}) I(m = 0)$

for each $k \in N$.

Proof.

In the hierarchical representation of the CIBP, we have $\sum_{j = 1}^{p} ξ_{j, k} | θ_{k} \sim \mathrm{Binom}(p, θ_{k})$ when $K \geq k$. Thus,

$\begin{aligned} P(m_{k} = m | K \geq k) &= \binom{p}{m} E[θ_{k}^{m} (1 - θ_{k})^{p - m}] \\ &= \binom{p}{m} \frac{B(α + m, κ + p - m)}{B(α, κ)} = \binom{p}{m} ψ_{α, κ}^{m, p - m}. \end{aligned}$

On the other hand, we have $P(m_{k} = 0 | K < k) = 1$. Combining these two derivations, we obtain the desired result. □
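The mixture in Corollary 1 can be checked numerically. The sketch below (standard library only, hypothetical function names) weights the beta-binomial term by $P(K \geq k)$, which is the event the proof conditions on, and verifies that the resulting probabilities sum to one.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def column_sum_pmf_given_present(m, p, alpha, kappa):
    """P(m_k = m | K >= k) = C(p, m) * psi_{alpha,kappa}^{m, p - m} (a beta-binomial pmf)."""
    log_comb = math.lgamma(p + 1) - math.lgamma(m + 1) - math.lgamma(p - m + 1)
    return math.exp(log_comb + log_beta(alpha + m, kappa + p - m) - log_beta(alpha, kappa))


def poisson_tail(k, gamma):
    """P(K >= k) for K ~ Poisson(gamma) and an integer k >= 1."""
    return 1.0 - sum(math.exp(-gamma) * gamma ** i / math.factorial(i) for i in range(k))


def column_sum_pmf(m, k, p, gamma, alpha, kappa):
    """Marginal pmf of m_k: beta-binomial if column k exists, point mass at 0 otherwise."""
    present = poisson_tail(k, gamma)
    value = present * column_sum_pmf_given_present(m, p, alpha, kappa)
    if m == 0:
        value += 1.0 - present
    return value


if __name__ == "__main__":
    p, gamma, alpha, kappa = 10, 2.0, 1.0, 1.0
    for k in (1, 2, 5):
        pmf = [column_sum_pmf(m, k, p, gamma, alpha, kappa) for m in range(p + 1)]
        print(k, [round(v, 4) for v in pmf], "sum =", sum(pmf))
```

Setting $m = 0$ in the conditional pmf gives $ψ_{α, κ}^{0, p}$, the probability that an existing column is empty; thinning $K \sim \mathrm{Poisson}(γ)$ by the complementary probability recovers the Poisson mean in (4).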

Exchangeability

The exchangeability of the IBP ensures that the associated posterior inference algorithms are tractable. As established in the following corollary, the CIBP is likewise an exchangeable distribution. This result follows directly from the hierarchical formulation of the CIBP given in Theorem 4.

Corollary 2.

The row vectors $ξ_{1}, \dots, ξ_{p}$ of a $p \times \infty$-dimensional binary matrix Ξ generated from $\mathrm{CIBP}(γ, α, κ)$ are exchangeable.

4. Construction from Random Measures

In this section, we provide the random measure construction of the CIBP. We first briefly review the concept of completely random measures. We refer to Appendix J of [13] for more details. Let $(Ω, \mathcal{A})$ be the pair of a Polish space with its Borel σ-field. Let $(M, \mathcal{M})$ be the pair of the set of all measures on $(Ω, \mathcal{A})$ with its Borel σ-field. A completely random measure (CRM) $μ$ on $(Ω, \mathcal{A})$ is a random measure such that $μ(A_{1}), \dots, μ(A_{k})$ are mutually independent for all disjoint measurable sets $A_{1}, \dots, A_{k} \in \mathcal{A}$. Every CRM consists of three independent parts:

$μ = μ_{0} + \sum_{k = 1}^{K} q_{k} δ_{ω_{k}} + \sum_{(q, ω) \in Φ} q δ_{ω},$

where $μ_{0}$ is a non-random measure, $(ω_{k})_{k \in [K]}$ are fixed points in $Ω$, $(q_{k})_{k \in [K]}$ are independent random variables on $R_{+}$, and $Φ$ is a Poisson process on $R_{+} \times Ω$. In this paper, we consider purely atomic CRMs with $μ_{0} = 0$. If $μ$ is a purely atomic CRM written as $μ = \sum_{k = 1}^{K} q_{k} δ_{ω_{k}} + \sum_{(q, ω) \in Φ} q δ_{ω}$, we write

$μ \sim \mathrm{CRM}(Λ, (ω_{k}, P_{k})_{k \in [K]}),$

where $P_{k}$ is a probability measure on $R_{+}$ such that $q_{k} \overset{ind}{\sim} P_{k}$ for each $k \in [K]$, and $Λ$ is a measure on $R_{+} \times Ω$ such that $E Φ = Λ$, i.e., $Λ$ is the intensity measure of the Poisson process $Φ$. We write $μ \sim \mathrm{CRM}(Λ)$ if $μ = \sum_{(q, ω) \in Φ} q δ_{ω}$ with $E Φ = Λ$.

We introduce two specific CRMs. The Bernoulli process with mean probability measure $μ'$, denoted by $\mathrm{BeP}(μ')$, is the CRM on $(Ω, \mathcal{A})$ with intensity measure

$Λ_{\mathrm{BeP}(μ')}(d q, d ω) = δ_{1}(d q) μ'(d ω),$

where $δ_{1}$ denotes a point mass at 1. The beta process with parameters $a, b > 0$ and base measure $Λ_{0}$ is the CRM on $(Ω, \mathcal{A})$ with intensity measure

$Λ_{\mathrm{BP}(a, b, Λ_{0})}(d q, d ω) = \frac{1}{B(a, b)} q^{a - 2} (1 - q)^{b - 1} d q Λ_{0}(d ω).$

It is known that $\mathrm{IBP}(γ^{†}, κ)$ with $γ^{†} > 0$ and $κ > 0$ is described by a random measure as follows:

$\begin{aligned} ξ_{j} | μ &\overset{iid}{\sim} \mathrm{BeP}(μ), && j \in [p], \\ μ &\sim \mathrm{BP}(1, κ, γ^{†} Λ_{0}) \end{aligned}$

(12)

for some smooth probability measure $Λ_{0}$. Note that the intensity measure of $\mathrm{BP}(1, κ, γ^{†} Λ_{0})$ is given by

$Λ_{\mathrm{BP}(1, κ, γ^{†} Λ_{0})}(d q, d ω) = γ^{†} κ q^{- 1} (1 - q)^{κ - 1} d q Λ_{0}(d ω).$

We introduce another random measure representation, which turns out to be related to the CIBP.

Definition 3 (Random measure representation of the CIBP).

Let $γ > 0$, $α > 0$, and $κ \geq 0$. We refer to the following stochastic process as the random measure representation of $\mathrm{CIBP}(γ, α, κ)$:

$\begin{aligned} ξ_{j} | μ &\overset{iid}{\sim} \mathrm{BeP}(μ), && j \in [p], \\ μ &\sim \mathrm{BP}(α + 1, κ, \frac{γ α}{α + κ} Λ_{0}) \end{aligned}$

(13)

for some smooth probability measure $Λ_{0}$. Note that the intensity measure of $\mathrm{BP}(α + 1, κ, \frac{γ α}{α + κ} Λ_{0})$ is given by

$Λ_{\mathrm{BP}(α + 1, κ, \frac{γ α}{α + κ} Λ_{0})}(d q, d ω) = \frac{γ}{B(α, κ)} q^{α - 1} (1 - q)^{κ - 1} d q Λ_{0}(d ω).$

(14)

As the function $q \mapsto q^{α - 1} (1 - q)^{κ - 1}$ is integrable on $[0, 1]$ for $α > 0$, there is a finite number of features under the distribution in (13). In contrast, this function is not integrable for $α = 0$; thus, there is an infinite number of features under the two-parameter IBP in (12).

In the following theorem, we show that Definitions 2 and 3 are equivalent.

Theorem 5.

The joint distribution of random measures $ξ_{1}, \dots, ξ_{p}$ generated as (13) is given by

$P(ξ_{1}, \dots, ξ_{p}) = e^{- γ \sum_{j = 1}^{p} ψ_{α, κ}^{1, j - 1}} [\prod_{k = 1}^{K^{+}} ψ_{α, κ}^{m_{k}, p - m_{k}} λ_{0}(ω_{k}^{*})],$

(15)

where there are $K^{+}$ atoms $ω_{1}^{*}, \dots, ω_{K^{+}}^{*}$ such that $m_{k} := \sum_{j = 1}^{p} ξ_{j}(ω_{k}^{*}) \geq 1$ for $k \in [K^{+}]$, and $λ_{0}$ denotes the density of $Λ_{0}$.

Proof.

By the well-known conjugacy result (Theorem 3.3 of Kim [14]),

$μ | ξ_{1}, \dots, ξ_{p - 1} \sim \mathrm{CRM}(Λ_{p}, (ω_{k}^{*}, P_{k})_{k = 1}^{K}),$

where $ω_{1}^{*}, \dots, ω_{K}^{*}$ are the unique atoms that $ξ_{1}, \dots, ξ_{p - 1}$ possess,

$\begin{aligned} P_{k}(d q) &:= \frac{q^{m_{p, k} + α - 1} (1 - q)^{p - 1 - m_{p, k} + κ - 1} d q}{\int_{(0, 1]} q^{m_{p, k} + α - 1} (1 - q)^{p - 1 - m_{p, k} + κ - 1} d q} \\ &= \frac{1}{B(m_{p, k} + α, p - 1 - m_{p, k} + κ)} q^{m_{p, k} + α - 1} (1 - q)^{p - 1 - m_{p, k} + κ - 1} d q, \end{aligned}$

with $m_{p, k} := \sum_{j = 1}^{p - 1} ξ_{j}(ω_{k}^{*})$, and

$Λ_{p}(d q, d ω) := \frac{γ}{B(α, κ)} q^{α - 1} (1 - q)^{p - 1 + κ - 1} d q Λ_{0}(d ω).$

Thus, for each atom $ω_{k}^{*}$, we have that

$\begin{aligned} P(ξ_{p}(ω_{k}^{*}) = 1 | ξ_{1}, \dots, ξ_{p - 1}) &= \frac{1}{B(m_{p, k} + α, p - 1 - m_{p, k} + κ)} \int_{(0, 1]} q^{m_{p, k} + α} (1 - q)^{p - 1 - m_{p, k} + κ - 1} d q \\ &= \frac{B(m_{p, k} + 1 + α, p - 1 - m_{p, k} + κ)}{B(m_{p, k} + α, p - 1 - m_{p, k} + κ)} \\ &= \frac{m_{p, k} + α}{p - 1 + κ + α}. \end{aligned}$

(16)

On the other hand, for a small change $dω$ in $ω \in Ω ∖ \{ω_{1}^{*}, \dots, ω_{K}^{*}\}$, we have

$\begin{aligned} P(ξ_{p}(d ω) = 1 | ξ_{1}, \dots, ξ_{p - 1}) &= E[ξ_{p}(d ω) | ξ_{1}, \dots, ξ_{p - 1}] \\ &= \frac{γ}{B(α, κ)} \int_{(0, 1]} q^{α} (1 - q)^{p - 1 + κ - 1} d q Λ_{0}(d ω) \\ &= γ \frac{B(α + 1, p - 1 + κ)}{B(α, κ)} Λ_{0}(d ω) \\ &= γ ψ_{α, κ}^{1, p - 1} Λ_{0}(d ω). \end{aligned}$

(17)

Therefore, since $ξ_{p}$ is completely random and $Λ_{0}$ is smooth, $ξ_{p}$ is a Poisson process with intensity measure $γ ψ_{α, κ}^{1, p - 1} Λ_{0}$ on $Ω ∖ \{ω_{1}^{*}, \dots, ω_{K}^{*}\}$. This implies that the number of new atoms in $ξ_{p}$ follows a Poisson distribution with mean $γ ψ_{α, κ}^{1, p - 1}$. Combining (16) and (17), we complete the proof. □

5. Application to Bayesian Sparse Factor Models

In this section, as an application, we consider a Bayesian factor model with the CIBP prior.

5.1. Model and Prior

We assume that a p-dimensional random vector $Y$ is distributed as

$Y | Z = z \sim N_{p}(B z, σ^{2} I), \quad Z \sim N_{K}(0, I),$

(18)

where B is a factor loading matrix, Z is a latent factor, and $σ^{2} > 0$ is a noise variance. Let $β_{j k}$ be the (j, k)-th element of B. We impose the CIBP prior distribution on B as

$\begin{aligned} β_{j k} | ξ_{j k} &\overset{ind}{\sim} (1 - ξ_{j k}) δ_{0} + ξ_{j k} N(0, τ), && j \in [p], k \in [K], \\ ξ_{j k} | θ_{k} &\overset{ind}{\sim} \mathrm{Bernoulli}(θ_{k}), && j \in [p], k \in [K], \\ θ_{k} &\overset{iid}{\sim} \mathrm{Beta}(α, κ), && k \in [K], \\ K &\sim \mathrm{Poisson}(γ) \end{aligned}$

with κ ≥ 0 and τ > 0. This means that we impose $\mathrm{CIBP}(γ, α, κ)$ on the binary matrix $Ξ := (ξ_{j k})_{j \in [p], k \in N}$. The above prior distribution on B is denoted by $\mathrm{SSCIBP}_{p}(γ, α, κ, τ)$, which is shorthand for spike-and-slab CIBP.

5.2. Posterior Computation

In this sectino, we describe a Markov chain Monte Carlo (MCMC) algorithm for computing the posterior distribution under the

{SSCIBP}_{p} (γ, α, κ, τ)

prior on

B

and inverse Gamma prior

IG (a, b)

on

σ^{2}

. We denote by

K^{+}

the number of nonzero columns of the loading matrix

B

.

Sample $β_{j k}$ for $j \in [p]$ and $k \in [K^{+}]$ .

We sample

β_{j k}

ss

β_{j k} | - \sim \{\begin{matrix} N ({\hat{β}}_{j k}, {\hat{τ}}_{k}) & if ξ_{j k} = 1 \\ δ_{0} & if ξ_{j k} = 0, \end{matrix}

where

\begin{matrix} {\hat{τ}}_{k} & : = {(σ^{- 2} \sum_{i = 1}^{n} Z_{i k}^{2} + τ^{- 1})}^{- 1} \\ {\hat{β}}_{j k} & : = {\hat{τ}}_{k} {σ^{- 2} \sum_{i = 1}^{n} Z_{i k} (Y_{i j} - \sum_{h \in [K^{+}] : h \neq k} Z_{i h} β_{j h})} . \end{matrix}

Sample $ξ_{j k}$ for $j \in [p]$ and $k \in N$ .

Since the CIBP is exchangeable, we can assume without loss of generality that the $j$-th customer is the last customer. For each $k \in [K^{+}]$, $ξ_{j k}$ is sampled from a Bernoulli distribution with posterior odds

$$\frac{Π(ξ_{j k} = 1 \mid -)}{Π(ξ_{j k} = 0 \mid -)} = \frac{m_{j, k} + α}{κ + p - 1 - m_{j, k}} \sqrt{\frac{\hat{τ}_{k}}{τ}} \exp\Big(\frac{1}{2 \hat{τ}_{k}} \hat{β}_{j k}^{2}\Big),$$

where $m_{j, k} := \sum_{l \in [p] : l \neq j} ξ_{l k}$. Next, we sample $ξ_{j k}$ for $k > K^{+}$ by a Metropolis–Hastings (MH) step as follows. We generate the proposal $K_{j}^{*} \in \mathbb{N} \cup \{0\}$ and $β_{j}^{*} := (β_{j, 1}^{*}, \dots, β_{j, K_{j}^{*}}^{*}) \in \mathbb{R}^{K_{j}^{*}}$ from the distribution

$$J(K_{j}^{*}) J(β_{j}^{*} \mid K_{j}^{*}) = \mathrm{Poisson}(1) \, N(0, τ)^{K_{j}^{*}},$$

and then we accept the proposal with probability

$$\min\Big\{1, |M_{j}|^{-n / 2} \exp\Big(\frac{1}{2} (β_{j}^{*})^{⊤} M_{j}^{-1} β_{j}^{*} \sum_{i = 1}^{n} E_{i j}^{2}\Big) (γ ψ_{α, κ}^{1, p - 1})^{K_{j}^{*}}\Big\},$$

where

$$\begin{aligned} M_{j} &:= σ^{-2} β_{j}^{*} (β_{j}^{*})^{⊤} + I, \\ E_{i j} &:= σ^{-2} \Big(Y_{i j} - \sum_{k = 1}^{K^{+}} Z_{i k} β_{j k}\Big). \end{aligned}$$

If the proposal is accepted, we update

$$\begin{aligned} B &\leftarrow \big(B, (β_{j, k}^{*} I(j^{'} = j))_{j^{'} \in [p], k \in [K_{j}^{*}]}\big), \\ K^{+} &\leftarrow K^{+} + K_{j}^{*}. \end{aligned}$$
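
The update of $ξ_{j k}$ for the existing features $k \in [K^{+}]$ can be sketched in Python as follows. The sketch works on the log scale because the exponential term in the odds can overflow for large $\hat{β}_{j k}^{2} / \hat{τ}_{k}$; this numerical safeguard and the function names are our own assumptions and are not prescribed by the algorithm above. The MH step for new features is not covered here.

```python
import math
import random

def log_odds_xi(m_jk, beta_hat, tau_hat, alpha, kappa, tau, p):
    """Log of Pi(xi_jk = 1 | -) / Pi(xi_jk = 0 | -) for an existing feature k."""
    # Prior part: (m_jk + alpha) / (kappa + p - 1 - m_jk)
    log_prior = math.log(m_jk + alpha) - math.log(kappa + p - 1 - m_jk)
    # Likelihood part: sqrt(tau_hat / tau) * exp(beta_hat^2 / (2 tau_hat))
    log_lik = 0.5 * math.log(tau_hat / tau) + beta_hat ** 2 / (2.0 * tau_hat)
    return log_prior + log_lik

def sample_xi(log_odds, rng=random):
    """Draw xi in {0, 1} with P(xi = 1) = odds / (1 + odds)."""
    # Evaluate the logistic function without overflowing exp()
    if log_odds >= 0:
        prob = 1.0 / (1.0 + math.exp(-log_odds))
    else:
        e = math.exp(log_odds)
        prob = e / (1.0 + e)
    return 1 if rng.random() < prob else 0
```

Any object with a `random()` method returning a uniform draw on [0, 1) can be passed as `rng`, including a `numpy.random.Generator`.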

Sample $Z_{i}$ for $i \in [n]$ .

We sample $Z_{i}$ as

$$Z_{i} \mid - \sim N(σ^{-2} \hat{Σ}_{Z} B^{⊤} Y_{i}, \hat{Σ}_{Z}),$$

where $\hat{Σ}_{Z} := (σ^{-2} B^{⊤} B + I)^{-1}$.
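
Because $\hat{Σ}_{Z}$ does not depend on $i$, all rows of the factor matrix can be drawn at once. The following sketch illustrates this with NumPy, assuming `Y` is stored as an n × p array and `B` as a p × K⁺ array; the function name and the use of a Cholesky factor to generate the correlated noise are illustrative choices.

```python
import numpy as np

def sample_Z(Y, B, sigma2, rng):
    """Draw all latent factors Z_1, ..., Z_n from their full conditionals."""
    n, K = Y.shape[0], B.shape[1]
    # Shared posterior covariance: (sigma^{-2} B^T B + I)^{-1}
    Sigma_Z = np.linalg.inv(B.T @ B / sigma2 + np.eye(K))
    # Row i of `mean` equals sigma^{-2} Sigma_Z B^T Y_i (Sigma_Z is symmetric)
    mean = Y @ B @ Sigma_Z / sigma2
    # If eps ~ N(0, I) then L eps ~ N(0, Sigma_Z) for Sigma_Z = L L^T
    L = np.linalg.cholesky(Sigma_Z)
    return mean + rng.standard_normal((n, K)) @ L.T
```

Inverting a K⁺ × K⁺ matrix once per iteration is cheap relative to the O(n p K⁺) products as long as K⁺ stays small.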

Sample $σ^{2}$ .

We sample the noise variance $σ^{2}$ as

$$σ^{2} \mid - \sim \mathrm{IG}\Big(a + \frac{n p}{2}, b + \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{p} \Big(Y_{i j} - \sum_{k = 1}^{K^{+}} Z_{i k} β_{j k}\Big)^{2}\Big).$$
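
A short sketch of this draw is given below. It assumes that $\mathrm{IG}(a, b)$ denotes the inverse Gamma distribution with shape $a$ and scale $b$ (density proportional to $x^{-a-1} e^{-b/x}$), which the text does not state explicitly, and it uses the fact that the reciprocal of a Gamma(shape, rate) variable is inverse Gamma distributed. The function name is hypothetical.

```python
import numpy as np

def sample_sigma2(Y, Z, B, a, b, rng):
    """Draw the noise variance sigma^2 from its inverse Gamma full conditional."""
    n, p = Y.shape
    resid = Y - Z @ B.T  # entries Y_ij - sum_k Z_ik beta_jk
    shape = a + 0.5 * n * p
    rate = b + 0.5 * np.sum(resid ** 2)
    # X ~ Gamma(shape, rate)  =>  1 / X ~ IG(shape, rate).
    # NumPy's gamma is parameterized by scale = 1 / rate.
    return 1.0 / rng.gamma(shape, 1.0 / rate)
```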

The dominant computational cost of our sampler is $O(n p K^{+})$ per iteration. Since $K^{+}$ remains stochastically bounded under the CIBP, the complexity is linear in $p$, unlike under the classical IBP, where $K^{+} = O(\log p)$ and the per-iteration cost therefore grows as $O(n p \log p)$.

5.3. Simulation

We perform a simulation study to compare the CIBP and the two-parameter IBP when they serve as prior distributions in the sparse factor model. We generate simulated datasets as follows. For each value $p \in \{100, 250, 500, 1000\}$, we generate a $p \times 5$ loading matrix $B_{0}$ having 10 nonzero rows. Each factor loading in the sampled nonzero rows is generated from the uniform distribution on $[-3, 3]$. We then independently generate $n = 50$ random vectors from the normal distribution with mean $0$ and covariance matrix $B_{0} (B_{0})^{⊤} + I$. For each configuration of $p$, we generate 100 independent datasets.
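
The data-generating process above can be sketched as follows. The sketch assumes that the 10 nonzero rows are chosen uniformly at random without replacement, which the description does not specify, and it draws each observation as $Y_{i} = B_{0} Z_{i} + ε_{i}$ with standard normal $Z_{i}$ and $ε_{i}$, which yields the covariance $B_{0} (B_{0})^{⊤} + I$. The function and argument names are illustrative.

```python
import numpy as np

def simulate_dataset(p, n=50, K=5, n_nonzero_rows=10, rng=None):
    """Generate one dataset (Y, B0) from the sparse factor model."""
    rng = np.random.default_rng() if rng is None else rng
    B0 = np.zeros((p, K))
    # Assumed: nonzero rows are picked uniformly at random
    rows = rng.choice(p, size=n_nonzero_rows, replace=False)
    B0[rows] = rng.uniform(-3.0, 3.0, size=(n_nonzero_rows, K))
    # Y_i = B0 Z_i + eps_i, so that Cov(Y_i) = B0 B0^T + I
    Z = rng.standard_normal((n, K))
    Y = Z @ B0.T + rng.standard_normal((n, p))
    return Y, B0

# One replication for p = 100
Y, B0 = simulate_dataset(100, rng=np.random.default_rng(0))
```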

In all experiments, we compare $\mathrm{CIBP}(γ, α, κ)$ with $γ = 0.1$, $α = 10$ and $κ = 10$ against $\mathrm{IBP}(γ^{†}, κ)$ with $γ^{†} = 0.1$ and $κ = 10$. Note that $γ^{†} = 0.1$ equals the value $γ α / κ = 0.1 \times 10 / 10$ implied by the CIBP hyperparameters. Posterior inference is performed using the MCMC scheme described in Section 5.2. We run 10,000 iterations, discard the first 5000 as burn-in, and retain every 10th draw thereafter.
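
For concreteness, the settings above translate into the following bookkeeping. The indexing convention (zero-based iterations, with the first retained draw being the first post-burn-in iteration) is an assumption made for illustration only.

```python
# Settings stated in the text
gamma, alpha, kappa = 0.1, 10.0, 10.0
n_iter, burn_in, thin = 10_000, 5_000, 10

# Matching IBP mass parameter: gamma_dagger = gamma * alpha / kappa
gamma_dagger = gamma * alpha / kappa

# Zero-based indices of the retained draws (assumed convention)
kept = list(range(burn_in, n_iter, thin))
print(len(kept))  # 500
```

Under this convention, 500 posterior draws per dataset enter the summaries.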

Figure 2 shows the posterior distributions of the number of nonzero factors under the $\mathrm{CIBP}(γ, α, κ)$ and $\mathrm{IBP}(γ^{†}, κ)$ priors, aggregated over 100 replications for each $p \in \{100, 250, 500, 1000\}$. As the dimension $p$ grows, the $\mathrm{IBP}(γ^{†}, κ)$ prior tends to considerably overestimate the number of factors. In contrast, the $\mathrm{CIBP}(γ, α, κ)$ prior yields accurate estimates of the number of factors for all considered values of $p$.

Table 2 reports the averages of the following metrics across the 100 replications: (1) the posterior mean $E[K^{+}]$ of the number of active features (nonzero columns of $B$), (2) the proportion of zero entries in the posterior median of $B$ (sparsity), and (3) the average test log-likelihood per observation, evaluated on an independent test set of size 100. Under the CIBP prior, $E[K^{+}]$ remains stable around the true value of five across all values of $p$. In contrast, the IBP prior yields more active features, and their number tends to increase as $p$ grows. Regarding the sparsity of the loading matrix, the CIBP produces sparser loading matrices than the IBP for every $p$ except $p = 100$. Despite using fewer features and a sparser loading matrix, the CIBP yields equal or better predictive log-likelihood compared to the IBP. These results support the claim that the CIBP provides a parsimonious yet predictive alternative to the IBP when the underlying number of latent features does not diverge.

In Appendix A, we provide the results of a sensitivity analysis regarding the choice of hyperparameters, as well as convergence diagnostics for our MCMC algorithm.

6. Conclusions

In this paper, we introduced the CIBP, a novel three-parameter extension of the IBP. The key distinction between the CIBP and standard IBPs is that, under the CIBP, the distribution of the number of features converges to a stable limiting distribution as the number of objects increases. This property helps the CIBP avoid superfluous features in both interpretation and prediction. We evaluated the proposed CIBP within a high-dimensional sparse factor model and provided empirical evidence that it outperforms the IBP in estimating the number of factors.

Although our focus in this work is on posterior sampling, the proposed framework can be combined with several strategies to further improve scalability. First, stochastic subsampling may be used to approximate likelihood contributions when p is large, following advances in subsampling-based MCMC for large-scale Bayesian inference [15]. Second, collapsed or partially collapsed sampling can be employed, which is known to improve mixing in nonparametric feature allocation models, including the IBP [16]. Finally, variational inference offers a substantially faster alternative for large-scale applications and can be adapted from the existing variational inference method for the standard IBP [17]. A systematic investigation of these computational refinements is left for future research.

Funding

This work was supported by INHA UNIVERSITY Research Grant.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A. Additional Experiments

Appendix A.1. Sensitivity Analysis of CIBP Hyperparameters

We examine the sensitivity of the Convergent Indian Buffet Process (CIBP) with respect to its hyperparameters $γ > 0$, $α > 0$, and $κ > 0$. For each hyperparameter configuration, we report the posterior average number of active features $E[K^{+}]$ and the sparsity level of the loading matrix, along with Monte Carlo standard errors (SEs) computed from 20 independent simulated datasets. Tables A1, A2 and A3 present the results of the sensitivity analysis. Across experiments, we observe the following general trends:

γ is the primary driver of latent feature complexity, strongly increasing or decreasing $E [K^{+}]$ .
α adjusts sparsity patterns, with mild secondary effects on $E [K^{+}]$ .
κ influences sparsity but has minimal impact on $E [K^{+}]$ , suggesting it mainly regulates feature retention, not feature count.
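
The Monte Carlo standard errors reported in the tables below can be computed as in the following sketch. It assumes that the SE is the sample standard deviation across datasets divided by the square root of the number of datasets; the text does not spell out the formula, and the function name is hypothetical.

```python
import numpy as np

def mean_and_se(values):
    """Average of a per-dataset metric and its Monte Carlo standard error."""
    v = np.asarray(values, dtype=float)
    # Sample standard deviation (ddof=1) divided by sqrt(number of datasets)
    return v.mean(), v.std(ddof=1) / np.sqrt(len(v))
```

Here `values` would hold one number per simulated dataset, for example the 20 posterior means of $K^{+}$ for a given hyperparameter configuration.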

Table A1. Sensitivity to $γ$ (fixed $α = 10$, $κ = 10$; $p = 500$).

$γ$	$E [K^{+}]$ (SE)	Sparsity (%) (SE)
0.05	4.58 (0.09)	97.6 (0.11)
0.10	5.07 (0.11)	96.9 (0.10)
0.20	5.41 (0.13)	95.8 (0.14)
0.50	5.83 (0.21)	93.1 (0.20)

Table A2. Sensitivity to $α$ (fixed $γ = 0.1$, $κ = 10$; $p = 500$).

$α$	$E [K^{+}]$ (SE)	Sparsity (%) (SE)
0.5	5.62 (0.08)	97.3 (0.12)
1	5.37 (0.09)	97.1 (0.10)
5	5.18 (0.10)	97.0 (0.10)
10	5.07 (0.11)	96.9 (0.10)
20	5.15 (0.12)	96.8 (0.13)

Table A3. Sensitivity to $κ$ (fixed $γ = 0.1$, $α = 10$; $p = 400$).

$κ$	$E [K^{+}]$ (SE)	Sparsity (%) (SE)
1	5.11 (0.10)	92.4 (0.13)
5	5.09 (0.09)	94.7 (0.12)
10	5.08 (0.09)	96.9 (0.10)
20	5.05 (0.08)	97.2 (0.14)

Appendix A.2. MCMC Convergence Diagnostics

We evaluate the convergence of our MCMC samples using trace, autocorrelation, and partial autocorrelation plots for a few randomly chosen nonzero loadings. Figure A1 presents these plots. The results show that the MCMC chain converges well.

Figure A1. The trace, autocorrelation and partial autocorrelation plots of the posterior samples of some randomly selected nonzero loadings. The blue dashed lines indicate confidence bands.
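
As an illustration of the autocorrelation diagnostic, the sample autocorrelation function of a single chain can be computed as sketched below. The band $\pm 1.96 / \sqrt{N}$ is the usual approximate 95% band under a white-noise assumption; whether the bands in Figure A1 were computed this way is not stated, so this is an assumption. The function names are illustrative.

```python
import numpy as np

def autocorr(chain, max_lag):
    """Sample autocorrelations of a 1-D chain at lags 0, 1, ..., max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    denom = x @ x
    return np.array([(x[: len(x) - h] @ x[h:]) / denom for h in range(max_lag + 1)])

def white_noise_band(n_draws):
    """Approximate 95% band for the autocorrelations of white noise (assumed)."""
    return 1.96 / np.sqrt(n_draws)
```

Autocorrelations that fall inside the band at all positive lags are consistent with nearly independent retained draws.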

References

1. Griffiths, T.L.; Ghahramani, Z. Infinite latent feature models and the Indian buffet process. In Proceedings of the 18th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 5–8 December 2005; pp. 475–482.
2. Meeds, E.; Ghahramani, Z.; Neal, R.M.; Roweis, S.T. Modeling dyadic data with binary latent factors. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 3–6 December 2007; pp. 977–984.
3. Miller, K.; Jordan, M.I.; Griffiths, T.L. Nonparametric latent feature models for link prediction. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 7–10 December 2009; pp. 1276–1284.
4. Navarro, D.J.; Griffiths, T.L. Latent features in similarity judgments: A nonparametric Bayesian approach. Neural Comput. 2008, 20, 2597–2628.
5. James, L.F.; Lee, J.; Pandey, A. Posterior distributions for Hierarchical Spike and Slab Indian Buffet processes. arXiv 2021, arXiv:2103.11407.
6. Warr, R.L.; Dahl, D.B.; Meyer, J.M.; Lui, A. The Attraction Indian Buffet Distribution. Bayesian Anal. 2022, 17, 931–967.
7. Ghilotti, L.; Camerlenghi, F.; Rigon, T. Bayesian analysis of product feature allocation models. J. R. Stat. Soc. Ser. B Stat. Methodol. 2025, qkaf058.
8. Teh, Y.W.; Gorur, D. Indian buffet processes with power-law behavior. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 7–10 December 2009; pp. 1838–1846.
9. Thibaux, R.; Jordan, M.I. Hierarchical beta processes and the Indian buffet process. In Proceedings of the Artificial Intelligence and Statistics, San Juan, Puerto Rico, 21–24 March 2007; pp. 564–571.
10. Onatski, A. Determining the number of factors from empirical distribution of eigenvalues. Rev. Econ. Stat. 2010, 92, 1004–1016.
11. Lettau, M.; Pelger, M. Estimating latent asset-pricing factors. J. Econom. 2020, 218, 1–31.
12. Kopf, A.; Claassen, M. Latent representation learning in biology and translational medicine. Patterns 2021, 2, 100198.
13. Ghosal, S.; Van der Vaart, A. Fundamentals of Nonparametric Bayesian Inference; Cambridge University Press: Cambridge, UK, 2017; Volume 44.
14. Kim, Y. Nonparametric Bayesian estimators for counting processes. Ann. Stat. 1999, 27, 562–588.
15. Bardenet, R.; Doucet, A.; Holmes, C. On Markov chain Monte Carlo methods for tall data. J. Mach. Learn. Res. 2017, 18, 1–43.
16. Griffiths, T.L.; Ghahramani, Z. The Indian buffet process: An introduction and review. J. Mach. Learn. Res. 2011, 12, 1185–1224.
17. Doshi, F.; Miller, K.; Van Gael, J.; Teh, Y.W. Variational inference for the Indian buffet process. In Proceedings of the Artificial Intelligence and Statistics, Clearwater Beach, FL, USA, 16–18 April 2009; pp. 137–144.

Figure 1. Draws from $\mathrm{CIBP}(γ^{†} κ / α, α, κ)$ and $\mathrm{IBP}(γ^{†}, κ)$ with $γ^{†} = 5$ and $κ = 4$, for $α = 5$, $α = 1$, and $α = 0.5$.

Figure 2. The posterior distributions of the number of nonzero factors under the CIBP and IBP priors, respectively. The dashed vertical line indicates the true number of nonzero factors.

Table 1. Empirical and theoretical probabilities for the number of active features under $\mathrm{CIBP}(1, 1, 1)$ with $p = 500$.

k	Empirical Prob.	Poisson Prob.	Abs. Diff.
0	0.369	0.369	0.002
1	0.368	0.368	0.000
2	0.181	0.184	0.002
3	0.057	0.061	0.004
4	0.017	0.015	0.002
≥5	0.008	0.005	0.001

Table 2. Comparison between CIBP and IBP priors in sparse factor modeling. Reported are averages over 100 datasets with standard errors in parentheses.

p	$E [K^{+}]$ (CIBP)	$E [K^{+}]$ (IBP)	Sparsity (%) (CIBP)	Sparsity (%) (IBP)	Predictive Log-Likelihood (CIBP)	Predictive Log-Likelihood (IBP)
100	5.12 (0.18)	6.54 (0.22)	81.82 (0.38)	84.97 (0.52)	$- 1.72$ (0.02)	$- 1.74$ (0.02)
250	5.18 (0.19)	6.81 (0.29)	85.05 (0.34)	82.87 (0.61)	$- 1.70$ (0.02)	$- 1.76$ (0.03)
500	5.29 (0.21)	7.92 (0.41)	95.21 (0.30)	88.15 (0.73)	$- 1.69$ (0.02)	$- 1.79$ (0.03)
1000	5.24 (0.23)	7.53 (0.56)	97.46 (0.27)	87.09 (0.82)	$- 1.68$ (0.02)	$- 1.83$ (0.04)

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
