1. Introduction
Change-point models deal with the analysis of an ordered sequence of random quantities. Examples of such sequences include daily average temperatures over time and sequencing data in genomics. One important component of a change-point model is change-point detection, which involves inferring the positions where some aspect of the data sequence changes, such as its location (e.g., the mean) or its distribution. These change points and their locations are of great practical interest. One of the first applications of these models dates back to the 1950s, when refs. [1,2] introduced a now well-known sequential method called the cumulative sum (CUSUM) to detect changes in the mean of a quality control process. Since then, change-point detection has been actively addressed in various application settings, such as financial analysis [3] and biostatistics [4,5,6]. Change-point detection is also widely studied in time series analysis [7,8,9,10,11]; however, in what follows, we focus on non-time-series techniques.
Change-point models are generally divided into two main groups: online methods, which perform sequential detection as new data continually arrive and are commonly used in anomaly detection, and offline methods, in which retrospective analysis is performed on the entire observed sequence [12]. In this article, we focus on the latter. Additionally, change-point models may be either parametric or nonparametric. Parametric models assume that the underlying distributions belong to some known family. In contrast, nonparametric approaches rely heavily on the estimation of density functions but may be employed in a broader range of applications [13,14,15,16].
The literature on change-point models is vast, and several methods for change-point detection have been proposed in the past few decades. Here, we discuss some approaches proposed for single and multiple change-point problems. For example, refs. [3,4] proposed circular binary segmentation and wild binary segmentation, respectively, both based on the binary segmentation algorithm proposed by [17]. These methods perform change-point tests sequentially to locate change points in the data sequence. Other methods, which are mainly used for multiple change-point problems, treat change-point detection as a model-selection problem and estimate change points by minimizing a criterion. These methods often rely on dynamic programming, such as the pruned exact linear time (PELT) algorithm [18] and the functional pruning (FP) algorithm [19]. Some well-known approaches for multiple change-point detection include the simultaneous multiscale change-point estimator (SMUCE) [20] and the heterogeneous simultaneous multiscale change-point estimator (H-SMUCE) [21], both of which are based on multiscale hypothesis testing, where the optimization process relies on the penalization of a test statistic. Additional approaches for change-point problems were described by [22,23].
The previously described approaches were proposed to detect change points in a single data sequence. However, there is also interest in identifying common patterns across multiple sequences, which allows grouping sequences that originate from the same distribution. To the best of our knowledge, techniques combining change-point estimation and model-based clustering have only been studied by [24,25,26]; hence, clustering change-point data from multiple sequences remains underexplored, especially with model-based techniques. Ref. [25] proposed a finite Gaussian mixture model for clustering observations with a single change point, whereas ref. [26] proposed a finite negative binomial mixture model for clustering multiple change-point data. Both approaches use the expectation–maximization (EM) algorithm to estimate the cluster assignments and the model parameters. The single change-point detection in [25] was performed using exhaustive searches for changes in the mean or variance, where competing models were compared based on the Bayesian information criterion (BIC). The multiple change-point approach detects changes in the mean of a count process by combining segmentation with an exhaustive search; as in the single change-point approach, the best model is selected based on the BIC. Focusing on the analysis of the mortality rate over time for 49 states in the United States, ref. [24] took a Bayesian approach to clustering multiple change-point data, assuming a functional Dirichlet process on the linear piecewise structure of the data to cluster states based on the change-point locations and slope magnitudes. Although these papers showed promising results in clustering change-point data while simultaneously performing change-point detection, none made their algorithm's implementation available. Moreover, there is currently no available software in R that simultaneously performs clustering and multiple change-point detection. Existing packages such as ecp [27] and bcp [28] can detect multiple change points within a single sequence of observations but do not perform clustering over multiple sequences.
In terms of application, an important motivation arises in clustering single-cell copy number data, where commonly used approaches estimate copy number profiles and cluster cells sequentially. Typically, hidden Markov models (HMMs) are used to infer copy number states for each cell, followed by clustering as a separate step, often relying on distance-based approaches [29,30]. Although ref. [31] proposed a method that simultaneously performs copy number profiling and clustering, it is also based on an HMM framework. Therefore, there remains a need for alternative approaches that jointly perform copy number profiling and clustering, particularly methods based on change-point models, which naturally represent structural changes in copy number profiles.
In this paper, we propose, and implement as an R package, a nonparametric Bayesian model for clustering multiple constant-wise change-point data via a Gibbs sampler. Similar to the approach of [24], our model incorporates a functional Dirichlet process on the constant-wise change-point structures that automatically controls the number of clusters, in contrast to other clustering techniques [32,33] that require this number to be pre-specified. To the best of our knowledge, this is the first work to provide an implementation for the problem of clustering multiple change-point data while simultaneously performing change-point detection. We apply our proposed approach to cluster abnormal (tumor) single-cell genomic data based on their copy number profiles, which resemble constant-wise structures. In addition, we evaluate the performance of our method under various simulated scenarios. Our proposed method is implemented as the R package BayesCPclust and is available from the Comprehensive R Archive Network (https://CRAN.R-project.org/package=BayesCPclust, accessed on 10 February 2025).
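A minimal sketch of obtaining the package, assuming a standard CRAN installation:

```r
# Install BayesCPclust from CRAN and load it; see the package
# documentation for the exported functions and their arguments.
install.packages("BayesCPclust")
library(BayesCPclust)
```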
The rest of this paper is organized as follows. Section 2 introduces our proposed methodology and provides the updating steps for the Gibbs sampler. Section 3 presents the performance results for our proposed method under various simulated scenarios. In Section 4, we show the results of applying our method to a single-cell copy number dataset. Finally, Section 5 and Section 6 present potential avenues for future work and a discussion of the implications of our work, respectively.
2. Methods
Let $\mathbf{Y}_n = (Y_{n1}, \dots, Y_{nM})$ be a data sequence ordered based on some covariate, such as time or position along a chromosome. For example, in the copy number dataset analyzed in Section 4, $Y_{nm}$ represents the GC-corrected copy number ratio aligned to genomic bin $m$ in cell $n$, where $m = 1, \dots, M$ and $n = 1, \dots, N$.
If we assume that there are $K_n$ change points in $\mathbf{Y}_n$, then $\mathbf{Y}_n$ can be partitioned into $K_n + 1$ distinct segments, $S_1, \dots, S_{K_n+1}$, with change-point positions $T_{n1}, \dots, T_{nK_n}$ such that $T_{n0} = 1$ and $T_{n,K_n+1} = M + 1$. Also, we assume that the change points are ordered; that is, $T_{nl} < T_{nl'}$ if and only if $l < l'$.
In our approach, we assume a constant-wise structure for $\mathbf{Y}_n$ defined by the model
$$Y_{nm} = f_n(x_m) + \varepsilon_{nm}, \tag{1}$$
where $\varepsilon_{nm} \sim N(0, \sigma^2_n)$ for $m = 1, \dots, M$ and $n = 1, \dots, N$.
The model in Equation (1) assumes that the mean in each interval between change points is constant, defined by an intercept $\beta_{nl}$, $l = 1, \dots, K_n + 1$. Furthermore, this model allows the variability around the mean to differ depending on the observation by specifying a variance component $\sigma^2_n$ for each $n$.
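As an illustration of Equation (1), the sketch below simulates one sequence from the constant-wise model; the helper name, the convention that a change point at position $T$ starts a new segment, and the example values are ours:

```r
# Sketch: simulate one data sequence from the constant-wise model in
# Equation (1). M locations, interior change points tau, segment
# intercepts beta (length K + 1), and a sequence-specific variance sigma2.
simulate_sequence <- function(M, tau, beta, sigma2) {
  bounds  <- c(1, tau, M + 1)                 # segment boundaries
  mean_fn <- rep(beta, times = diff(bounds))  # piecewise-constant mean
  mean_fn + rnorm(M, mean = 0, sd = sqrt(sigma2))
}
y <- simulate_sequence(M = 50, tau = c(20, 35),
                       beta = c(1, 3, 2), sigma2 = 0.5)
```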
Clustering change-point data via a functional Dirichlet process is formulated by assuming that the constant-wise structures for the observations are independent draws from some distribution $G$, which in turn follows a Dirichlet process prior. We define the constant-wise function as follows:
$$f_n(x) = \sum_{l=1}^{K_n + 1} \beta_{nl}\, \mathbb{1}\{T_{n,l-1} \le x < T_{nl}\},$$
where $\beta_{nl}$ is the intercept in segment $l$ for each observation $n$. This constant-wise function $f_n$ contains all information about the number of change points, their locations, and the intercepts of the corresponding segments. Furthermore, a Dirichlet process on $f_n$ leads to the hierarchical model
$$f_n \mid G \overset{\text{iid}}{\sim} G, \qquad G \sim \mathrm{DP}(\alpha, G_0),$$
where $G_0$ is the baseline distribution, such that $E[G] = G_0$, and $\alpha$ is the precision parameter that determines how distant the distribution $G$ is from $G_0$.
Integration over $G$ allows the predictive distribution of $f_n$ to be written as shown in [34]:
$$f_n \mid \mathbf{f}_{-n} \sim \frac{1}{\alpha + N - 1} \sum_{j \neq n} \delta_{f_j} + \frac{\alpha}{\alpha + N - 1}\, G_0, \tag{2}$$
where $\delta_{f_j}$ is a point mass distribution at $f_j$ and $\mathbf{f}_{-n}$ represents all the observations, except for $n$. Note that, under the first term in Equation (2), there is a positive probability that draws from $G$ will take on the same value. This implies that, for a long enough sequence of draws from $G$, the value of any draw will be repeated by another draw, indicating that $G$ is a discrete distribution. Therefore, a Dirichlet process on the change-point structures allows the proposed approach to control the number of clusters in the model without requiring pre-specification. More details about the Dirichlet process can be found in the works of [35,36].
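The discreteness implied by Equation (2) can be seen in a small simulation of the sequential (Pólya urn) scheme; here, purely for illustration, the baseline $G_0$ is a standard normal rather than a distribution over constant-wise functions:

```r
# Sketch: sequential draws from the Polya-urn predictive in Equation (2).
set.seed(2)
alpha <- 1; N <- 20
draws <- numeric(N)
draws[1] <- rnorm(1)                          # first draw comes from G0
for (n in 2:N) {
  if (runif(1) < alpha / (alpha + n - 1)) {
    draws[n] <- rnorm(1)                      # new value, drawn from G0
  } else {
    draws[n] <- draws[sample.int(n - 1, 1)]   # repeat an earlier draw
  }
}
table(round(draws, 3))                        # ties reveal the clustering
```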
We define the distribution $G_0$ in the following hierarchical form to cluster observations according to their constant-wise change-point profiles:
- (i) Distribution of the number of change points ($K$): We assume that each segment between change points has at least $w$ points to ensure a non-zero length. Let $I_l - w$ be the interval length of the $l$th segment after subtracting $w$, where $I_l = T_l - T_{l-1}$. As a result, $\sum_{l=1}^{K+1} (I_l - w) = M - (K+1)w \ge 0$, where $K \le \lfloor M/w \rfloor - 1$ to ensure that $(K+1)w \le M$. Therefore, $K$ follows a truncated Poisson distribution given by
$$P(K = k \mid \lambda) = \frac{\lambda^{k}/k!}{\sum_{j=0}^{K_{\max}} \lambda^{j}/j!}, \qquad k = 0, 1, \dots, K_{\max} = \lfloor M/w \rfloor - 1.$$
- (ii) Distribution of the interval lengths between change points: Given $K = k$, the distribution of the interval lengths $(I_1, \dots, I_{k+1})$ is defined over all combinations satisfying $I_l \ge w$ and $\sum_{l=1}^{k+1} I_l = M$. The change points' positions are obtained recursively by assuming that $T_0 = 1$ and $T_l = T_{l-1} + I_l$ for $l = 1, \dots, k$.
- (iii) Distribution of the constant levels ($\beta_l$): Given $K = k$, each $\beta_l$ is generated independently from a probability density function on its support, for $l = 1, \dots, k+1$.
- (iv) Finally, the constant-wise structure $f$ is then defined based on the random quantities generated according to their distributions defined in (i–iii).
- (v) The baseline distribution $G_0$ is defined based on the distributions given in (i–iii); that is, $G_0(df)$, the probability of observing the infinitesimal interval in the neighborhood of $f$, factorizes into the probabilities defined in (i–iii), where $df$ represents an infinitesimal change in $f$.
Note that, as mentioned, the distribution on the constant-wise structures is discrete. Therefore, observations in cluster $r$, for $r = 1, \dots, d$, are assumed to share the same constant-wise function $f_r^*$. Parameter estimation for the model is achieved in a Bayesian framework via a Gibbs sampler.
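A sketch of one draw from $G_0$ following (i)–(iii); the uniform density for the constant levels and the random allocation of the surplus points (a stand-in for the exact distribution over interval-length combinations) are our assumptions:

```r
set.seed(3)
M <- 100; w <- 10; lambda <- 2
Kmax <- floor(M / w) - 1
K <- sample(0:Kmax, 1, prob = dpois(0:Kmax, lambda))  # (i) truncated Poisson
# (ii) interval lengths: give every segment w points, then allocate the
# surplus at random (a stand-in for the exact combination probabilities)
extra <- if (K > 0) {
  tabulate(sample.int(K + 1, M - (K + 1) * w, replace = TRUE), nbins = K + 1)
} else M - w
len  <- w + extra
tau  <- cumsum(len)[seq_len(K)] + 1      # change-point positions
beta <- runif(K + 1, 0, 25)              # (iii) constant levels, assumed uniform
f    <- rep(beta, times = len)           # one constant-wise structure from G0
```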
2.1. Bayesian Inference
The vector with the observed data is denoted by $\mathbf{Y} = (\mathbf{Y}_1, \dots, \mathbf{Y}_N)$, where $\mathbf{Y}_n = (Y_{n1}, \dots, Y_{nM})$ for each observation $n$, while $\mathbf{f} = (f_1, \dots, f_N)$ is the set of all constant-wise functions across all $N$ observations. Let $\mathbf{K} = (K_1, \dots, K_N)$ be the vector with the number of change points for each data sequence. We define the set of all change points' positions as $\mathbf{T} = (\mathbf{T}_1, \dots, \mathbf{T}_N)$, with $\mathbf{T}_n = (T_{n1}, \dots, T_{nK_n})$, and $\boldsymbol{\beta} = (\boldsymbol{\beta}_1, \dots, \boldsymbol{\beta}_N)$ as the set of all intercept parameters, with $\boldsymbol{\beta}_n = (\beta_{n1}, \dots, \beta_{n,K_n+1})$. Let $\mathbf{X}_n$ be the design matrix for $\mathbf{Y}_n$.
The Dirichlet process hyperparameters $\alpha$ and $\lambda$ are given gamma priors, $\alpha \sim \mathrm{Gamma}(a_\alpha, b_\alpha)$ and $\lambda \sim \mathrm{Gamma}(a_\lambda, b_\lambda)$. The prior distribution for the intercepts $\beta_{nl}$, $l = 1, \dots, K_n + 1$, is improper (flat) to provide analytical simplifications in the calculations of their posterior conditional distributions. The variance components $\sigma^2_n$ are given independent inverse gamma priors, such that $\sigma^2_n \sim \mathrm{IG}(u, v)$.
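A sketch of draws from these priors (the hyperparameter values below are placeholders, not those used in the paper):

```r
library(extraDistr)
alpha  <- rgamma(1, shape = 1, rate = 1)       # DP precision prior
lambda <- rgamma(1, shape = 1, rate = 1)       # truncated-Poisson rate prior
sigma2 <- rinvgamma(10, alpha = 2, beta = 1)   # one variance per sequence
```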
Gibbs Sampler
In this section, we present the updating steps for the estimation of the parameters $f_n$, $\sigma^2_n$, $\alpha$, and $\lambda$, for $n = 1, \dots, N$ and $r = 1, \dots, d$, where $r$ denotes an individual cluster and $d$ denotes the total number of clusters. Each step involves calculating the full conditional distributions (see Appendix A for derivation details).
The following expression demonstrates the clustering capability of the Dirichlet process prior on the constant-wise structures $f_n$. The current value of $f_n$ can be selected to be one of the existing $f_r^*$ with a positive probability $q_{nr}$. In cases in which observation $n$ does not belong to any existing cluster, a new $f_n$ is generated from the posterior distribution $h(f_n \mid \mathbf{Y}_n)$, as shown in Equation (3). The posterior of $f_n$, conditional on $\mathbf{f}_{-n}$, is given by
$$f_n \mid \mathbf{f}_{-n}, \mathbf{Y}_n \sim q_{n0}\, h(f_n \mid \mathbf{Y}_n) + \sum_{r=1}^{d} q_{nr}\, \delta_{f_r^*}, \tag{3}$$
where $q_{n0}$ and $q_{nr}$ define the mixing weights when observation $n$ forms a new cluster and when observation $n$ belongs to an existing cluster $r$, respectively. Additionally, $h(f_n \mid \mathbf{Y}_n)$ is the posterior of $f_n$, given that a new cluster is formed by observation $n$. Since $q_{nr} \propto n_r\, L(\mathbf{Y}_n \mid f_r^*)$, we have that $L(\mathbf{Y}_n \mid f_r^*)$ represents the normal likelihood function corresponding to the observation $\mathbf{Y}_n$ after integrating out the variance component $\sigma^2_n$. Also, $\alpha$ corresponds to the precision hyperparameter for the Dirichlet process, and $n_r$ denotes the number of observations in cluster $r$. The full expressions for $q_{n0}$ and $h(f_n \mid \mathbf{Y}_n)$ are given in detail in Appendix A, Equations (A4) and (A5).
Regardless of whether $f_n$ is a new value or an existing $f_r^*$ (Step 1), the variance component for observation $n$ is updated using the full conditional of $\sigma^2_n$ given the other parameters:
$$\sigma^2_n \mid f_n, \mathbf{Y}_n \sim \mathrm{IG}\!\left(u + \frac{M}{2},\; v + \frac{1}{2} \sum_{m=1}^{M} \big(Y_{nm} - f_n(x_m)\big)^2\right).$$
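Under the inverse gamma prior above, this is a standard normal-inverse-gamma conjugate update; a sketch, assuming the shape-rate parametrization $\mathrm{IG}(u, v)$:

```r
# Draw sigma_n^2 from its inverse gamma full conditional given the
# current constant-wise fit for sequence n.
update_sigma2 <- function(y, fitted, u, v) {
  M <- length(y)
  extraDistr::rinvgamma(1, alpha = u + M / 2,
                        beta  = v + 0.5 * sum((y - fitted)^2))
}
```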
Note that $\mathbf{f}$ uniquely determines the collection of parameters $(\mathbf{K}, \mathbf{T}, \boldsymbol{\beta})$ and, as mentioned, it contains several identical elements. Therefore, $(\mathbf{K}, \mathbf{T}, \boldsymbol{\beta})$ also contains identical elements. In this step, we provide the updating procedures for the $d$ distinct components of $\mathbf{f}$, defined by $f_r^* = (K_r^*, \mathbf{T}_r^*, \boldsymbol{\beta}_r^*)$, for $r = 1, \dots, d$, where $d$ is the number of clusters at the current update of the Gibbs sampler. Considering the hierarchical structure for the distributions of $\mathbf{T}_r^*$ and $\boldsymbol{\beta}_r^*$, which both depend on the value of $K_r^*$, we first update $K_r^*$ from its posterior marginal probability function, whose full expression is given in detail in Appendix A, Equations (A6) and (A7).
Then, we update $\mathbf{T}_r^*$ given $K_r^*$ using the posterior probabilities of the corresponding interval-length combinations. This is carried out by exhaustively listing all combinations and numerically computing the corresponding probabilities.
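The exhaustive enumeration amounts to listing all compositions of $M$ into $K_r^* + 1$ parts of size at least $w$; the paper uses RcppAlgos for this step, while the plain recursive sketch below is ours:

```r
# All interval-length combinations: compositions of 'total' into 'parts'
# segments, each of length at least w (one combination per row).
compositions_min <- function(total, parts, w) {
  if (total < parts * w) return(NULL)          # no valid combination
  if (parts == 1) return(matrix(total, 1, 1))
  out <- lapply(w:(total - (parts - 1) * w), function(first) {
    rest <- compositions_min(total - first, parts - 1, w)
    if (!is.null(rest)) cbind(first, rest)
  })
  do.call(rbind, out)
}
nrow(compositions_min(total = 20, parts = 3, w = 5))  # 21 combinations
```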
Finally, $\boldsymbol{\beta}_r^*$, given $K_r^*$ and $\mathbf{T}_r^*$, is updated based on its full conditional distribution, where $\mathbf{Y}_{(r)}$ represents the observations in cluster $r$.
The update of $\lambda$ is based on its full conditional distribution and is carried out by the Metropolis–Hastings algorithm; that is, we generate proposals from a gamma distribution and accept them with some probability based on an acceptance ratio. Here, $a$ and $b$ are the prior hyperparameters previously defined as $a_\lambda$ and $b_\lambda$, respectively.
The update of $\alpha$ is carried out using the procedure described in [37]:
- Sample $\eta \sim \mathrm{Beta}(\alpha + 1, N)$;
- Draw $\alpha$ from the mixture $\pi_\eta\, \mathrm{Gamma}(a + d,\; b - \log \eta) + (1 - \pi_\eta)\, \mathrm{Gamma}(a + d - 1,\; b - \log \eta)$.
Here, $a$ and $b$ are the prior hyperparameters previously described as $a_\alpha$ and $b_\alpha$, respectively, and $d$ is the number of clusters at the current update of the Gibbs sampler, while the mixture membership probability satisfies
$$\frac{\pi_\eta}{1 - \pi_\eta} = \frac{a + d - 1}{N\,(b - \log \eta)}.$$
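A sketch of this two-step update, which matches the auxiliary-variable scheme of Escobar and West for the Dirichlet process precision; a shape-rate gamma parametrization is assumed:

```r
# One update of the DP precision alpha given d current clusters,
# N sequences, and a Gamma(a, b) prior.
update_alpha <- function(alpha, d, N, a, b) {
  eta <- rbeta(1, alpha + 1, N)                   # auxiliary variable
  odds <- (a + d - 1) / (N * (b - log(eta)))      # pi_eta / (1 - pi_eta)
  pi_eta <- odds / (1 + odds)
  if (runif(1) < pi_eta) {
    rgamma(1, shape = a + d,     rate = b - log(eta))
  } else {
    rgamma(1, shape = a + d - 1, rate = b - log(eta))
  }
}
```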
3. Simulations
We evaluated the performance of our method through three simulated scenarios. For each scenario, we varied one of the parameters while fixing the others, as shown in Table 1. We applied our method to 96 randomly generated datasets based on the model in Equation (1), considering the initialization for the Gibbs sampler described in Section 3.1. Then, using the evaluation metrics described in Section 3.2, we assessed our method's performance; the results are presented in Section 3.3, Section 3.4 and Section 3.5.
3.1. Gibbs Sampler Initialization and Implementation
This section describes the initialization of the Gibbs sampler and some details about our algorithm's implementation. For the simulation scenarios and real data analysis, fixed hyperparameter values were specified for the inverse gamma prior on the variance components and for the gamma priors on $\alpha$ and $\lambda$. The minimum number of locations in each segment between change points, $w$, was set to 10 in Scenario 1 and to 10, 20, and 50 in Scenario 2 for the respective settings considered.
To enable convergence diagnosis for the Gibbs sampler, we employed two chains with different initial values for each simulated scenario. The first chain started from the true settings; that is, we used the parameter values that generated the datasets as initial values for our algorithm, whereas the second chain was initialized at the true parameter values plus a small perturbation. For instance, the intercepts of each cluster were initialized at their true values plus 1.5, the change-point positions for each cluster started two points above the ground truth, and the variance components were initialized with draws from an inverse gamma distribution whose mean was twice the average used to generate the true variance components.
The simulations and computations for the Gibbs sampler algorithm were performed on Sharcnet's Graham cluster, using a single node with two Intel E5-2683 v4 "Broadwell" processors (2.1 GHz base frequency), for a total of 32 computing cores. The number of simulated datasets, 96, was chosen as a multiple of the number of cores. The computations were performed on CentOS 7 with R version 4.2.1 "Funny-Looking Kid" [38], using the parallel package version 4.4.2 [38] to simulate and compute the Gibbs sampler for independent datasets simultaneously, the extraDistr package version 1.10.0 [39] to generate samples from inverse gamma distributions, the RcppAlgos package version 2.9.3 [40] to generate all possible partitions of the number of points in each segment between two change points, the MASS package version 7.3-61 [41] to generate samples from multivariate normal distributions, and the FDRSeg package version 1.0-3 [42] to calculate the V measure. It is worth mentioning that our algorithm is implemented as the R package BayesCPclust version 0.1.0 [43].
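Schematically, the per-dataset runs parallelize as below; run_one_dataset is a hypothetical stand-in for our per-dataset routine:

```r
library(parallel)
# One independent simulation + Gibbs run per dataset, spread over the
# 32 cores of the node.
results <- mclapply(1:96, function(i) run_one_dataset(i), mc.cores = 32)
```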
3.2. Performance Metrics
For each chain, simulated setting, and randomly generated dataset, we ran our method for 5000 iterations to estimate change points and perform clustering. We discarded an initial portion of each chain as burn-in and thinned the remaining samples by keeping only every 25th iteration, ensuring that the retained samples were not highly correlated. For the 200 remaining samples, we calculated the posterior mean of each parameter, except for the discrete variables, such as the cluster assignments, number of clusters, number of change points, and their locations, for which we chose the most frequent value: the posterior mode.
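A sketch of the post-processing of one chain; the burn-in fraction and the stand-in draws are our assumptions:

```r
n_iter <- 5000
draws  <- rnorm(n_iter)                        # stand-in: one parameter's chain
keep   <- seq(n_iter / 2 + 1, n_iter, by = 25) # burn-in, then every 25th draw
mean(draws[keep])                              # posterior mean (continuous)
k_draws <- sample(2:3, n_iter, replace = TRUE) # stand-in: a discrete chain
as.numeric(names(which.max(table(k_draws[keep]))))  # posterior mode (discrete)
```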
To evaluate our method’s performance concerning intercept estimation, we computed the posterior mean for each intercept and simulated dataset, which corresponded to the optimal estimator under the squared error loss . Then, we calculated the average of these posterior means for each intercept and compared its value to the true settings considered when generating the datasets. Furthermore, we assessed uncertainty in the estimation of the intercepts by computing the average of the posterior variances across the simulated datasets, which represent the posterior expected risk under the squared loss function, and the average interval length of equal-tailed credible intervals taken over the 96 datasets. Additionally, we report the mean absolute deviation (MAD) for the variance components’ estimates.
For the discrete variables, we report the proportion of datasets in which we correctly estimated the parameters. To evaluate the clustering performance of our proposed approach, we considered the V measure [42], which assesses observation-to-cluster assignments and measures the homogeneity and completeness of a clustering result. Homogeneity measures whether each cluster contains only observations from a single true class, while completeness evaluates whether all observations from the same class are assigned to the same cluster. The V measure ranges from zero to one, where results closer to one indicate better clustering.
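The homogeneity/completeness construction can be computed directly from entropies of the confusion table; the paper uses the FDRSeg package, and the self-contained sketch below is ours:

```r
# V measure from true classes and estimated cluster labels.
v_measure <- function(truth, est) {
  p <- table(truth, est) / length(truth)          # joint distribution
  H <- function(q) { q <- q[q > 0]; -sum(q * log(q)) }
  Hc <- H(rowSums(p)); Hk <- H(colSums(p)); Hj <- H(as.vector(p))
  h  <- if (Hc == 0) 1 else 1 - (Hj - Hk) / Hc    # homogeneity
  cc <- if (Hk == 0) 1 else 1 - (Hj - Hc) / Hk    # completeness
  if (h + cc == 0) 0 else 2 * h * cc / (h + cc)
}
v_measure(truth = rep(1:2, each = 5), est = rep(c(2, 1), each = 5))  # 1
```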
3.3. Scenario 1: Varying the Number of Data Sequences with Variance Components around 0.05
Figure 1 shows the data structure of four data sequences from 1 of the 96 randomly generated synthetic datasets for Scenario 1. In this scenario, we varied the number of data sequences, considering $N = 10$, 25, and 50, while keeping the other parameters fixed, as described in Table 1. Each panel represents one observation colored by its cluster assignment. Both clusters had two change points. The change points' locations for Cluster 1 were 19 and 34, and for the second cluster, they were 15 and 32. Each segment between change points was defined by a constant level, (5, 20, 10) for the first cluster and (17, 10, 2) for the second cluster.
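A compact sketch of this design, with $M$ and the per-cluster sequence counts as our assumptions and fixed rather than inverse-gamma-drawn variances for brevity:

```r
set.seed(1)
M <- 50; N <- 10                      # illustrative values
gen_seq <- function(tau, beta, sigma2) {
  bounds <- c(1, tau, M + 1)
  rep(beta, times = diff(bounds)) + rnorm(M, sd = sqrt(sigma2))
}
cluster <- rep(1:2, each = N / 2)     # assumed balanced clusters
Y <- t(sapply(cluster, function(r) {
  if (r == 1) gen_seq(c(19, 34), c(5, 20, 10), 0.05)   # Cluster 1
  else        gen_seq(c(15, 32), c(17, 10, 2), 0.05)   # Cluster 2
}))
```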
Based on the methodology of [44], the convergence of the chains for all parameters after the burn-in period and thinning procedure was confirmed.
Table 2 presents the posterior estimates for the intercepts of each cluster when the number of data sequences was 10, 25, and 50 and the variance components were generated around 0.05. In this setting, our estimates were close to the true parameter values, showing that our proposed method retrieved the correct intercepts for each cluster. As the number of data sequences increased, the average of the posterior variances for the intercepts of each cluster and the average length of the credible intervals decreased, indicating that the uncertainty about the intercepts decreased as the number of data sequences increased. Overall, these results demonstrate that our method accurately recovered the true parameter values across all cases. Considering that each data sequence had its own variance component and $M$ was fixed, increasing the number of data sequences did not considerably improve the estimation of the variance components, as shown by the mean absolute deviation in Table 3.
The change points’s locations associated with the two clusters were correctly estimated for all 96 datasets. The number of clusters and the cluster assignment for each data sequence were correctly estimated for all 96 datasets, resulting in all V measures being equal to one. Due to these findings, we decided to not include the tables with the results for the change-point detection and the figures with the values for the V measure, which were all one for all the simulated datasets.
3.4. Scenario 2: Varying the Number of Data Sequences with Variance Components around 0.5
This section evaluates our method's performance under higher data dispersion than in the previous section. We generated 96 datasets as in the last experiment for each possible value of $N$; however, for this scenario, we sampled the variance components from an inverse gamma distribution with an average 10 times higher than in Scenario 1, as shown in Figure 2.
It is worth mentioning that the convergence of the chains for all parameters in Scenario 2 was also confirmed using the methodology of [44].
Table 4 shows the posterior estimates for the intercepts of each segment between change points for the two clusters when the number of data sequences was $N = 10$, 25, and 50 and the variance components were generated around 0.5. For every considered number of data sequences, our approach correctly estimated the intercepts. Although the average posterior variances and the average credible interval lengths of the intercepts for each cluster were noticeably higher than in the previous scenario, reflecting greater uncertainty due to the increase in data dispersion, they decreased as the number of data sequences increased. Nonetheless, our method showed satisfactory performance not only in estimating the intercepts for each cluster but also in correctly estimating the number of change points and their corresponding locations. Additionally, our method always recovered the true clustering configuration in our data, with all V measures equal to one. As in Scenario 1, we therefore do not include tables with the change-point detection results or figures with the V-measure values.
Furthermore, the mean absolute deviation for the variance components' estimates was small and remained stable, as in the previous scenario, suggesting that increasing the number of data sequences did not noticeably improve the precision of the variance component estimates, as reported in Table 5.
3.5. Scenario 3: Varying the Number of Locations
In this section, we present the performance results of our method for varying numbers of locations $M$. In this scenario, both clusters had two change points. The intercept values between change points were fixed across all cases, at (2, 15, 17) for the first cluster and at a second fixed triple for the second cluster, while the change points' locations varied with the number of locations $M$ in each case.
Table 6 presents the posterior estimates for the intercepts in each case of Scenario 3. As in the previous scenarios, the convergence of the chains for all parameters was confirmed. Based on the results, our approach correctly estimated the intercepts for each cluster and showed that, as the number of locations increased, the uncertainty in the estimation of the intercepts for each cluster decreased, as reflected by the decreasing average posterior variances. Once again, our method correctly estimated the number of change points and the change-point positions for all generated datasets. In addition, all V-measure values were equal to one, showing that our model recovered the true clustering configuration in our data.
Furthermore, in this scenario, we observed an increase in the precision of our estimates for the variance components as $M$ increased, as shown in Table 7. As discussed in the previous scenarios, the number of data sequences minimally affected the precision of our variance estimates, since each data sequence had its own variance component. However, increasing the number of locations decreased the mean absolute deviation of our estimates, suggesting that the number of locations considerably affects the estimation of the variance components.
4. Real Data Analysis
We further assessed the performance of our method on a real dataset. We applied our approach to a subset of the copy number genomic data analyzed by [29], focusing on patient CRC2. The dataset consists of copy number information for 45 cells (data sequences) from a frozen primary tumor and liver metastases of colorectal cancer. Each data point in the dataset corresponds to the ratio of reads aligned per 200-kb genomic bin per cell after GC correction. These ratios provide an indication of the number of copies in each genomic bin; a ratio greater than one indicates an amplification in the corresponding region. Genomic copy number alterations are common in many diseases, including cancer, where deletions or amplifications of DNA segments can contribute to alterations in the expression of tumor-suppressor genes [45,46]. Identifying the number and locations of these alterations is essential for understanding cancer progression. As tumors evolve, differences in genomic profiles, including copy number, are expected between primary and metastatic tumors [47,48,49,50,51].
In this work, for computational feasibility, we focused our analysis on chromosomes 19, 20, and 21, corresponding to 583 genomic bins (locations), since this is a region with visible change points, as observed by [29]. The raw data (FASTQ files) are publicly available at the NCBI Sequence Read Archive (SRA) under accession number SRP074289. The processed ratios were kindly provided by [29] upon our request.
Figure 3 displays the copy number data for six cells in our dataset: three from the primary tumor and three from a liver metastasis. Our main interest lies in clustering all 45 cells based on their copy number variations, evaluating whether they form groups according to their tissue of origin, and uncovering any novel patterns, if present.
Due to the computational cost, we fixed the maximum number of change points to two, and we applied a median moving window of size five to each data sequence using the R package zoo [52] to reduce the number of bins in the data and handle possible outliers. Considering the transformed data with 290 locations, we ran our algorithm using two chains of length 10,000. One chain was initialized using the clustering result from the K-means method with the number of clusters set to two; the other was initialized using random cluster assignments, that is, each cell was randomly assigned to one of two clusters. The number of change points for each cluster was set to zero at the beginning of the chains. Additionally, the initial values for the intercepts were the average copy number ratios taken over the cells in each initial cluster, and the sample variances were used as initial values for the variance components. The minimum number of locations in each segment between change points, $w$, was set to 50. Furthermore, convergence was confirmed using the methodology of [44] for each chain, after a burn-in of half the length of the chains and thinning the remaining samples by selecting every 50th one.
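One plausible reading of this preprocessing, which reproduces the reduction from 583 to 290 bins, is a width-5 rolling median advanced by two bins; whether this is the exact transformation used is our assumption (y_raw is a placeholder for one cell's sequence):

```r
library(zoo)
# Width-5 rolling median, advancing two bins at a time:
# floor((583 - 5) / 2) + 1 = 290 smoothed values per cell.
y_smooth <- rollapply(y_raw, width = 5, FUN = median, by = 2)
```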
Our approach identified three clusters: Cluster 1 was composed of 18 primary tumor cells with clear change points at bin locations 100 and 226; Cluster 2 was composed of four primary tumor cells and five metastatic tumor cells, with copy number ratios around one for all bins; and Cluster 3 was composed of 18 metastatic tumor cells with two change points at bin locations 165 and 215, as shown in Figure 4, Figure 5 and Figure 6, respectively.
Table 8 reports the posterior estimates for the intercepts of each segment between change points for each cluster. The intercepts for Cluster 2 did not differ significantly, since their credible intervals overlapped, suggesting the absence of change points, as shown in Figure 5, where the ratios remain steady around one for all locations. In addition, the posterior variances of the intercepts for Clusters 1 and 3 were noticeably smaller than those for Cluster 2, suggesting higher uncertainty in the estimation of the copy number information for Cluster 2; this may be because Cluster 2 comprised only 9 cells, compared with the 18 cells assigned to each of the other clusters. Interestingly, the cells belonging to Cluster 2 were not considered in the hierarchical clustering analysis performed by [29]. In addition, metastatic and primary tumor single cells were mostly clustered separately, as also observed by [29]. However, ref. [29] considered all chromosomes when clustering cells and found two clusters for the metastatic tumor cells, noting that amplifications of chromosomes 3 and 8 distinguished the metastatic subpopulations. These data were also analyzed by [31], whose authors developed a Markov chain-based method for clustering copy number data, considering the copy number data for chromosomes 18–21 from patient CRC2 to cluster tumor single cells according to their copy number profiles. As a result, ref. [31] identified two clusters of tumor single cells, separating primary from metastatic single cells.
6. Conclusions
The results from the simulation scenarios show that our approach can recover the true classification of each data sequence. Furthermore, it was precise in identifying the change points when we varied the number of data sequences and the number of locations. Importantly, the degree of dispersion in the data did not affect our method's performance; we observed satisfactory results in scenarios where the variance components were sampled from inverse gamma distributions with both small and large averages. Additionally, our method effectively recovered the true underlying data structure in the presence of outliers, demonstrating its robustness. This robustness was evaluated by introducing an outlier in the change-point location for a subset of data sequences from Cluster 1. Using a dataset from Scenario 2, we reduced the value of the 19th observation by 10 units in 9% of the sequences from Cluster 1, causing the first change point of these sequences to shift by one position from its true location. Despite this modification, our method successfully recovered both the true change-point profiles and the correct cluster assignments (see the results in Appendix C). Finally, applying our method to a single-cell copy number dataset yielded results consistent with [29]: we obtained similar clusters for tumor single cells based on their change-point structures, with some cells clustered according to their tissue of origin. However, the application also revealed a novel cluster composed of cells from both primary and metastatic tissue origins, providing new insights into the dataset.
To facilitate the use of our method, we developed the R package BayesCPclust, available from the Comprehensive R Archive Network, which, to our knowledge, is the first package that addresses the problem of clustering multiple change-point data while simultaneously performing change-point detection.
A limitation of our approach lies in its computational cost, since it requires calculating a probability for each possible combination of interval lengths between change points, which can be expensive as the number of locations increases. To remedy this, in the real data analysis, we calculated the probabilities for a sample of all possible combinations of interval lengths, reducing, though not sufficiently, the computational cost. In general, as the number of data sequences or locations increased, the average processing time to infer change points and perform the clustering analysis for the simulation scenarios also increased, with an average duration between 20 and 30 h for the scenarios with the highest number of locations (see Table A3 in Appendix B). Furthermore, we observed similar processing times for the first two scenarios (see Table A1 and Table A2 in Appendix B), suggesting that the data dispersion had a minimal effect on the computational cost of our algorithm.
A common issue in Bayesian mixture modeling is that the labels of the clusters can be permuted multiple times over the iterations of a Markov chain Monte Carlo (MCMC) method, such as the Gibbs sampler. This issue, known as label switching, happens because the data likelihood is invariant under permutation of the cluster labels. Solutions for undoing label switching are necessary to perform cluster-specific inference, and various approaches have been proposed to solve this issue [56,57,58]. In this work, we assigned the most frequent set of labels to the sequences of cluster assignments leading to the same clustering. Then, after this correction for label switching, we obtained all the corresponding posterior parameter estimates for each cluster.
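A sketch of this relabeling, where z is a hypothetical iterations-by-sequences matrix of sampled cluster labels:

```r
key   <- apply(z, 1, paste, collapse = "-")  # one signature per iteration
modal <- names(which.max(table(key)))        # most frequent labeling
use   <- which(key == modal)                 # iterations to summarize over
```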