Article

Change-Point Estimation and Detection for Mixture of Linear Regression Models

Wenzhi Zhao, Tian Cheng and Zhiming Xia
1 School of Science, Xi’an Polytechnic University, Xi’an 710048, China
2 School of Mathematics, Northwest University, Xi’an 710127, China
* Author to whom correspondence should be addressed.
Axioms 2025, 14(6), 402; https://doi.org/10.3390/axioms14060402
Submission received: 15 April 2025 / Revised: 14 May 2025 / Accepted: 19 May 2025 / Published: 26 May 2025

Abstract:
This paper studies the estimation and detection problems in the mixture of linear regression models with change point. An improved Expectation–Maximization (EM) algorithm is devised specifically for multi-classified mixture data with change points. Under appropriate conditions, the large-sample properties of the estimator are rigorously proven. This improved EM algorithm not only precisely locates the change points but also yields accurate parameter estimates for each class. Additionally, a detector grounded in the score function is proposed to identify the presence of change points in mixture data. The limiting distributions of the test statistics under both the null and alternative hypotheses are systematically derived. Extensive simulation experiments are conducted to assess the effectiveness of the proposed method, and comparative analyses with the conventional EM algorithm are performed. The results clearly demonstrate that the EM algorithm without considering change points exhibits poor performance in classifying data, often resulting in the misclassification or even omission of certain classes. In contrast, the estimation method introduced in this study showcases remarkable accuracy and robustness, with favorable empirical sizes and powers.

1. Introduction

Mixture models have found extensive applications in econometrics and social sciences, and the associated theoretical frameworks have been thoroughly investigated. Among them, finite mixtures of linear regression models, a particularly valuable class of mixture models, have been widely employed across diverse fields, including econometrics, marketing, epidemiology, and biology. Simultaneously, the statistical inference and parameter estimation based on mixture models play a pivotal role in practical applications. On another front, the change point problem remains a focal point in statistical research. Conceptually, a change point denotes the specific location or instant at which a sudden alteration in a statistical property occurs. Change point phenomena are pervasive in both natural and social systems, and have enabled crucial advancements in numerous domains such as industrial control, financial economics, biomedicine, and signal processing. The detection and estimation of change points within mixture datasets are essential for discerning underlying patterns and making informed decisions, thereby holding significant practical implications.
In recent decades, remarkable progress has been achieved in the analysis of mixture models. Goldfeld and Quandt provided an outstanding introduction to mixtures of linear regression models [1]. They delved into the parameter estimation problem of such mixtures and applied it to analyze the imbalance in the housing price market. Frühwirth-Schnatter summarized the Bayesian approach for mixtures of linear regression models [2]. Hurn et al. proposed a generalized linear finite mixture model and overcame the label switching issue by normalizing the loss function [3]. Li, Chen, and their collaborators concentrated on the hypothesis testing problem regarding the order of the mixture model and derived the limiting distribution of the test statistics [4,5]. Huang and Yao introduced a semiparametric mixture regression model, where the mixing proportion is a smooth function of covariates [6]. Huang, Li, and Wang developed a non-parametric finite mixture method for regression models and demonstrated the estimation procedure through an analysis of U.S. Housing Price Index (HPI) data [7].
The change point problem has attracted extensive research attention, with a vast body of literature accumulated over the years. It was first introduced by Page during his study of continuous inspection schemes, who proposed the Cumulative Sum (CUSUM) test method [8]. Subsequently, scholars like Sen and Srivastava, Hawkins, James et al., and Srivastava and Worsley investigated the detection of change points in the mean of normal random variable sequences [9,10,11,12,13]. Basseville and Nikiforov systematically expounded the theoretical foundation of change point detection and estimation algorithms [14]. Wang, Zou, and Yin delved into change point detection in multinomial data as the number of categories approaches infinity, rigorously proving the statistical properties of relevant test statistics [15]. Xia and Qiu proposed the JIC criterion to tackle the multi-change point problem in non-parametric regression models [16]. Bai employed the least squares method to estimate the mean change point of linear processes, deriving the consistency and convergence rate of the change point estimator [17]. Baranowski et al. utilized grid search algorithms based on binary segmentation to accurately determine the number and positions of change points [18]. Follain et al. developed a novel approach for estimating change points in partially observed, high-dimensional time series with simultaneous mean shifts in sparse coordinate subsets [19]. Gong et al. redefined the graph-based change point detection problem as a prediction task and proposed an innovative change point detection (CPD) method for dynamic graphs through a latent evolution model [22]. Drabech et al. introduced a Markov Random Field (MRF) model to detect slope changes [20], while Ratnasingam et al. proposed an empirical likelihood-based non-parametric procedure for identifying structural changes in quantile regression models [21]. Notably, despite these abundant studies, the statistical analysis of change points within finite mixtures of linear models remains a relatively overlooked area.
This paper delves into the estimation and detection problems of the finite mixture of linear regression models with change points. Specifically, for multi-classified mixture data incorporating change points, we ingeniously propose an improved Expectation–Maximization (EM) algorithm. This algorithm is meticulously designed to accurately compute parameter estimators, such as the location of change points and regression coefficients. Through rigorous mathematical proofs, we firmly establish the consistency and asymptotic normality of these parameter estimators, providing a solid theoretical foundation. Moreover, to effectively detect the presence of change points in the mixture of linear regression models, we innovatively develop a detector based on the score function. By conducting in-depth analyses, we systematically derive the limiting distributions under both the null hypothesis and the alternative hypothesis, which significantly contributes to the theoretical framework of this study. To thoroughly verify the practical effectiveness of the proposed method, we carefully design and execute a series of simulation experiments.
The remaining sections of this paper are structured as follows. In Section 2, we introduce the statistical model along with its associated assumptions. Section 3 is dedicated to the development of the estimation and detection techniques for change points within the mixture of linear regression models; we also obtain the large-sample properties of the estimators and test statistics. In Section 4, the simulation results are presented. Section 5 provides some concluding remarks and discussions. In Section 6, the limitations and future research directions are presented. The proofs of the theorems are included in the appendix.

2. Statistical Model and Assumptions

Assume that $\{(X_i, Y_i), i = 1, \ldots, n\}$ are independent random samples from the following data generating process:
$$Y_i \sim \begin{cases} \sum_{c=1}^{C} \pi_c^-\, N\big(X_i^T\beta_c^-,\ \sigma_c^{2-}\big), & 1 \le i \le k^*, \\[2pt] \sum_{c=1}^{C} \pi_c^+\, N\big(X_i^T\beta_c^+,\ \sigma_c^{2+}\big), & k^*+1 \le i \le n, \end{cases} \qquad k^* \in [1, n-1], \tag{1}$$
where $k^*$ is the change point, $X_i$ is a $p$-dimensional random covariate, $C$ is the number of components, and $\beta_c^-, \pi_c^-, \sigma_c^-$ and $\beta_c^+, \pi_c^+, \sigma_c^+$ are the mixture coefficients of the $c$th component before and after the change point, with $0 \le \pi_c^- \le 1$, $\pi_c^+ \ge 0$ and $\sum_{c=1}^C \pi_c^- = \sum_{c=1}^C \pi_c^+ = 1$. $\pi_c^-$ and $\pi_c^+$ are called mixing proportions or weights, $\beta_c^-, \beta_c^+ \in \mathbb{R}^p$, $\sigma_c^{2-}, \sigma_c^{2+} \in \mathbb{R}^+$, and all the parameters are unknown. Equivalently, $\{Y_i, i = 1, \ldots, n\}$ follow a finite mixture of normals,
$$Y_i \sim \sum_{c=1}^{C} \pi_c^-\, N\big(X_i^T\beta_c^-, \sigma_c^{2-}\big) \cdot I\{i \le k^*\} + \sum_{c=1}^{C} \pi_c^+\, N\big(X_i^T\beta_c^+, \sigma_c^{2+}\big) \cdot I\{i \ge k^*+1\}.$$
Naturally, the log-likelihood function corresponding to the dataset $\{(X_i, Y_i), i = 1, \ldots, n\}$ can be expressed as
$$l\big(k, \pi_c^-, \beta_c^-, \sigma_c^{2-}, \pi_c^+, \beta_c^+, \sigma_c^{2+}\big) = \sum_{i=1}^{n}\log\Big[\sum_{c=1}^{C}\big\{\pi_c^-\,\phi\big(Y_i\,|\,X_i^T\beta_c^-, \sigma_c^{2-}\big)\cdot I\{i \le k\} + \pi_c^+\,\phi\big(Y_i\,|\,X_i^T\beta_c^+, \sigma_c^{2+}\big)\cdot I\{i \ge k+1\}\big\}\Big], \tag{2}$$
where $\phi(y\,|\,\mu, \sigma^2)$ denotes the normal density with mean $\mu$ and variance $\sigma^2$.
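To make the data-generating process in (1) concrete, the following minimal Python sketch simulates one sample path. The function name simulate_mixture_change_point and the dictionary layout of the parameters are our own illustrative choices, not part of the paper.

```python
import numpy as np

def simulate_mixture_change_point(n, k_star, X, params_minus, params_plus, rng=None):
    """Draw Y_1, ..., Y_n from model (1): a C-component mixture of linear
    regressions whose parameters switch from params_minus to params_plus
    after observation k_star. Each params_* dict holds 'pi' (shape (C,)),
    'beta' (shape (C, p)) and 'sigma2' (shape (C,))."""
    rng = np.random.default_rng(rng)
    Y = np.empty(n)
    for i in range(n):
        par = params_minus if i < k_star else params_plus   # regime switch after k_star
        c = rng.choice(len(par["pi"]), p=par["pi"])          # latent component label z_i
        mean = X[i] @ par["beta"][c]                         # X_i^T beta_c
        Y[i] = rng.normal(mean, np.sqrt(par["sigma2"][c]))
    return Y
```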
For later use, we introduce some notation. Let $\theta^* = \big((\pi^*)^T, (\beta^*)^T, (\sigma^{2*})^T\big)^T$ denote the true parameter, where $\pi^* = (\pi_1^*, \ldots, \pi_{C-1}^*)^T$, $\beta^* = \big((\beta_1^*)^T, \ldots, (\beta_C^*)^T\big)^T$, $\sigma^{2*} = (\sigma_1^{2*}, \ldots, \sigma_C^{2*})^T$, and $\beta_i^* \in \mathbb{R}^p$, $i = 1, \ldots, C$. Let $\theta^{*-}$ and $\theta^{*+}$ represent the true parameters on the left and right sides of the change point, respectively. Similarly, we can define the generic parameters $\theta$, $\theta^-$ and $\theta^+$.
Let $\hat{k}_n$ and $\hat{\theta}_n$ be the maximum likelihood estimators of $k^*$ and $\theta^*$, which can be obtained by maximizing (2). In Section 3.1, an improved EM algorithm is proposed to carry out the estimation. The following assumptions are imposed.
Assumption 1.
$k^* = [n\tau^*]$ for some $\tau^* \in (0, 1)$.
The above assumption requires that $k^*$ depends on the sample size $n$ and is approximately proportional to $n$.
Assumption 2.
We suppose that $\theta^* \in \Theta$ and that the parameter space $\Theta \subset \mathbb{R}^q$ is a compact set, where $q = (p+2)\cdot C - 1$.
Assumption 2 restricts the parameter space to be compact, which means any open cover of Θ has a finite subcover.
Assumption 3.
$\{(X_i, Y_i), i = 1, 2, \ldots, n\}$ are independent.

3. Parameter Inference

3.1. Estimation Procedure

In this subsection, we propose an effective EM algorithm to deal with the parameter estimation problem. We take $i \ge k+1$ as an example to illustrate the derivation of the iterative formulas (the case $i \le k$ can be treated in the same way). In this case, the log-likelihood function corresponding to the data is
$$\sum_{i=k+1}^{n}\log\sum_{c=1}^{C}\pi_c^+\,\phi\big\{Y_i\,|\,X_i^T\beta_c^+, \sigma_c^{2+}\big\}. \tag{3}$$
In the EM framework, the mixture problem is described as an incomplete-data problem. We view the observed data $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, as incomplete and introduce unobserved Bernoulli random variables
$$z_{ic} = \begin{cases}1, & \text{if } (X_i, Y_i) \text{ is in the } c\text{th group},\\ 0, & \text{otherwise},\end{cases}\qquad i = 1, 2, \ldots, n,\ c = 1, 2, \ldots, C,$$
and $z_i = (z_{i1}, \ldots, z_{iC})^T$, which is the component label of $(X_i, Y_i)$. Therefore, the complete data are $\{(X_i, Y_i, z_i)\}_{i=1}^n$, and the complete log-likelihood function corresponding to (3) is
$$\sum_{i=k+1}^{n}\sum_{c=1}^{C} z_{ic}\big[\log\pi_c^+ + \log\phi\big\{Y_i\,|\,X_i^T\beta_c^+, \sigma_c^{2+}\big\}\big].$$
Suppose that in the $l$th cycle of the EM iteration we have $\pi_c^{+(l)}, \beta_c^{+(l)}, \sigma_c^{2+(l)}$; then, in the E-step of the $(l+1)$th cycle, the expectation of the latent variable $z_{ic}$ can be calculated by
$$r_{ic}^{(l+1)} = \frac{\pi_c^{+(l)}\,\phi\big\{Y_i\,|\,X_i^T\beta_c^{+(l)}, \sigma_c^{2+(l)}\big\}}{\sum_{c'=1}^{C}\pi_{c'}^{+(l)}\,\phi\big\{Y_i\,|\,X_i^T\beta_{c'}^{+(l)}, \sigma_{c'}^{2+(l)}\big\}}.$$
In the M-step of the $(l+1)$th cycle, we maximize
$$\sum_{i=k+1}^{n}\sum_{c=1}^{C} r_{ic}^{(l+1)}\big[\log\pi_c^+ + \log\phi\big\{Y_i\,|\,X_i^T\beta_c^+, \sigma_c^{2+}\big\}\big]. \tag{4}$$
The maximization of Equation (4) is equivalent to separately maximizing $\sum_{i=k+1}^{n}\sum_{c=1}^{C} r_{ic}^{(l+1)}\log\pi_c^+$ and $\sum_{i=k+1}^{n}\sum_{c=1}^{C} r_{ic}^{(l+1)}\log\phi\{Y_i\,|\,X_i^T\beta_c^+, \sigma_c^{2+}\}$. Furthermore, the iterative formulas can be easily derived as follows:
$$\pi_c^{+(l+1)} = \frac{\sum_{i=k+1}^{n} r_{ic}^{(l+1)}}{n-k},$$
$$\beta_c^{+(l+1)} = \Big(\sum_{i=k+1}^{n} X_i\, r_{ic}^{(l+1)}\, X_i^T\Big)^{-1}\Big(\sum_{i=k+1}^{n} X_i\, r_{ic}^{(l+1)}\, Y_i\Big),$$
$$\sigma_c^{2+(l+1)} = \frac{\sum_{i=k+1}^{n} r_{ic}^{(l+1)}\,\big(Y_i - X_i^T\beta_c^{+(l+1)}\big)^2}{\sum_{i=k+1}^{n} r_{ic}^{(l+1)}},\qquad c = 1, 2, \ldots, C.$$
Let the initial values be $\pi_c^{+(0)}, \beta_c^{+(0)}, \sigma_c^{2+(0)}$, $c = 1, \ldots, C$, and repeat the above iteration; when the iteration converges, we obtain the estimators of the mixture parameters, denoted by $\pi_c^+(k), \beta_c^+(k), \sigma_c^{2+}(k)$, $c = 1, \ldots, C$. At this point, the log-likelihood function corresponding to the data $\{(X_i, Y_i)\}_{i=k+1}^{n}$ is
$$l^+\big(k, \pi_c^+, \beta_c^+, \sigma_c^{2+}\big) = \sum_{i=k+1}^{n}\log\sum_{c=1}^{C}\pi_c^+(k)\,\phi\big\{Y_i\,|\,X_i^T\beta_c^+(k), \sigma_c^{2+}(k)\big\}.$$
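As a concrete illustration of the segment-level E- and M-steps above, here is a minimal sketch for fitting a C-component mixture of linear regressions on a single data segment. The function em_segment, the initialization scheme and the convergence tolerance are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

def em_segment(X, Y, C, n_iter=200, tol=1e-8, rng=None):
    """EM for a C-component mixture of linear regressions on one segment,
    using the responsibilities r_ic (E-step) and the closed-form updates
    for (pi_c, beta_c, sigma_c^2) (M-step) given in the text."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    pi = np.full(C, 1.0 / C)                       # crude starting values
    beta = rng.normal(scale=0.5, size=(C, p))
    sigma2 = np.full(C, Y.var() + 1e-6)
    loglik_old = -np.inf
    for _ in range(n_iter):
        # E-step: r_ic proportional to pi_c * phi(Y_i | X_i^T beta_c, sigma_c^2)
        dens = np.stack([pi[c] * norm.pdf(Y, X @ beta[c], np.sqrt(sigma2[c]))
                         for c in range(C)], axis=1)            # shape (n, C)
        total = dens.sum(axis=1, keepdims=True)
        r = dens / total
        # M-step: mixing proportions and weighted least squares per component
        pi = r.mean(axis=0)
        for c in range(C):
            w = r[:, c]
            beta[c] = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * Y))
            resid = Y - X @ beta[c]
            sigma2[c] = (w * resid ** 2).sum() / w.sum()
        loglik = np.log(total).sum()
        if loglik - loglik_old < tol:
            break
        loglik_old = loglik
    # segment log-likelihood l^-(k) or l^+(k) at the converged parameters
    dens = np.stack([pi[c] * norm.pdf(Y, X @ beta[c], np.sqrt(sigma2[c]))
                     for c in range(C)], axis=1)
    return {"pi": pi, "beta": beta, "sigma2": sigma2,
            "loglik": np.log(dens.sum(axis=1)).sum()}
```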
Whenever a value of $k$ is fixed, the mixture data are divided into two parts, $\{(X_i, Y_i)\}_{i=1}^{k}$ and $\{(X_i, Y_i)\}_{i=k+1}^{n}$. Let the initial values be $\pi_c^{\pm(0)}, \beta_c^{\pm(0)}, \sigma_c^{\pm(0)}$, $c = 1, \ldots, C$; we can then obtain the estimators of the change point and mixture parameters according to Algorithm 1:
$$(\hat{k}, \hat{\theta}^-, \hat{\theta}^+) = \arg\max_{k,\, \pi_c^\pm,\, \beta_c^\pm,\, \sigma_c^\pm}\ l\big(k, \pi_c^-, \beta_c^-, \sigma_c^{2-}, \pi_c^+, \beta_c^+, \sigma_c^{2+}\big),$$
where $\hat{\theta}^\pm = \big((\hat{\pi}^\pm)^T, (\hat{\beta}^\pm)^T, (\hat{\sigma}^{2\pm})^T\big)^T$, $\hat{\pi}^\pm = (\hat{\pi}_1^\pm, \ldots, \hat{\pi}_C^\pm)^T$, $\hat{\beta}^\pm = (\hat{\beta}_1^\pm, \ldots, \hat{\beta}_C^\pm)^T$, $\hat{\sigma}^{2\pm} = (\hat{\sigma}_1^{2\pm}, \ldots, \hat{\sigma}_C^{2\pm})^T$, and $l\big(k, \pi_c^-, \beta_c^-, \sigma_c^{2-}, \pi_c^+, \beta_c^+, \sigma_c^{2+}\big) = l^-\big(k, \pi_c^-, \beta_c^-, \sigma_c^{2-}\big) + l^+\big(k, \pi_c^+, \beta_c^+, \sigma_c^{2+}\big)$.
Algorithm 1 EM algorithm considering a change point
Input: $\{X_i, Y_i\}_{i=1}^{n}$, the number of mixture components $C$, the maximum number of iterations $T$.
Output: $\hat{k}$, $\hat{\theta}^-$, $\hat{\theta}^+$
1: for $k = 1, 2, \ldots, n-1$ do
2:   For the given $k$, divide the mixture data into two parts, $\{(X_i, Y_i)\}_{i=1}^{k}$ and $\{(X_i, Y_i)\}_{i=k+1}^{n}$, and apply the EM algorithm separately to each part. For $\{(X_i, Y_i)\}_{i=1}^{k}$, let the initial values be $\pi_c^{-(0)}, \beta_c^{-(0)}, \sigma_c^{-(0)}$, $c = 1, \ldots, C$.
3:   for $l = 0, 1, \ldots, T-1$ do
4:     E-step: calculate $r_{ic}^{(l+1)}(k)$,
$$r_{ic}^{(l+1)}(k) = \frac{\pi_c^{-(l)}\,\phi\big\{Y_i\,|\,X_i^T\beta_c^{-(l)}, \sigma_c^{2-(l)}\big\}}{\sum_{c'=1}^{C}\pi_{c'}^{-(l)}\,\phi\big\{Y_i\,|\,X_i^T\beta_{c'}^{-(l)}, \sigma_{c'}^{2-(l)}\big\}},\qquad i = 1, \ldots, k,\ c = 1, \ldots, C.$$
5:     M-step: calculate
$$\pi_c^{-(l+1)}(k) = \frac{\sum_{i=1}^{k} r_{ic}^{(l+1)}}{k},\qquad \beta_c^{-(l+1)}(k) = \Big(\sum_{i=1}^{k} X_i\, r_{ic}^{(l+1)}\, X_i^T\Big)^{-1}\Big(\sum_{i=1}^{k} X_i\, r_{ic}^{(l+1)}\, Y_i\Big),$$
$$\sigma_c^{2-(l+1)}(k) = \frac{\sum_{i=1}^{k} r_{ic}^{(l+1)}\,\big(Y_i - X_i^T\beta_c^{-(l+1)}\big)^2}{\sum_{i=1}^{k} r_{ic}^{(l+1)}}.$$
6:   end for
7:   When the iteration converges, record the estimators $\pi_c^-(k), \beta_c^-(k), \sigma_c^{2-}(k)$, $c = 1, \ldots, C$, and the log-likelihood value $l^-(k)$.
8:   For $\{(X_i, Y_i)\}_{i=k+1}^{n}$, repeat steps 3–7 and obtain $\pi_c^+(k), \beta_c^+(k), \sigma_c^{2+}(k)$, $c = 1, \ldots, C$, and $l^+(k)$.
9: end for
10: $\hat{k} = \arg\max_{1 \le k \le n-1}\big(l^-(k) + l^+(k)\big)$, $\hat{\theta}^- = \hat{\theta}^-(\hat{k})$, $\hat{\theta}^+ = \hat{\theta}^+(\hat{k})$.
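A compact sketch of Algorithm 1, built on the hypothetical em_segment helper above: it scans candidate change points, fits the two segments separately, and keeps the split maximizing $l^-(k)+l^+(k)$. The grid restriction and helper names are illustrative assumptions.

```python
def change_point_em(X, Y, C, k_grid=None, rng=None):
    """Algorithm 1 (sketch): profile the total log-likelihood over candidate
    change points k and return the maximizing split together with the
    segment-wise parameter estimates."""
    n = len(Y)
    if k_grid is None:
        # leave a few observations on each side so that C components can be fitted
        k_grid = range(5 * C, n - 5 * C)
    best = None
    for k in k_grid:
        left = em_segment(X[:k], Y[:k], C, rng=rng)      # data {1, ..., k}
        right = em_segment(X[k:], Y[k:], C, rng=rng)     # data {k+1, ..., n}
        total = left["loglik"] + right["loglik"]
        if best is None or total > best["loglik"]:
            best = {"k": k, "theta_minus": left, "theta_plus": right, "loglik": total}
    return best
```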

3.2. Hypothesis Test

Define the following notation for later use:
$$\eta(Y\,|\,\theta) = \sum_{c=1}^{C}\pi_c\,\phi\big\{X, Y\,|\,\beta_c, \sigma_c^2\big\},\qquad \ell(\theta, Y) = \log\eta(Y\,|\,\theta),$$
$$q_1\{\theta, Y\} = \frac{\partial \ell\{\theta, Y\}}{\partial\theta},\qquad q_2\{\theta, Y\} = \frac{\partial^2 \ell\{\theta, Y\}}{\partial\theta\,\partial\theta^T},$$
$$I(\theta) = -E\big[q_2\{\theta, Y\}\big],\qquad B(\theta) = \mathrm{Var}\big[q_1\{\theta, Y\}\big] = E\big[q_1\{\theta, Y\}\,q_1^T\{\theta, Y\}\big].$$
In the following, we are interested in testing the null hypothesis
$$H_0:\ \theta_i = \theta^*\quad (i = 1, \ldots, n)$$
against the local alternative hypothesis
$$H_A:\ \theta_i = \theta^* + \frac{1}{\sqrt{n}}\, g\Big(\frac{i}{n}\Big),$$
where $g = (g_1, g_2, \ldots, g_q)^T$ is a function of bounded variation on $[0,1]$ which describes the pattern of departure from stability of the parameter $\theta$.
We first build an empirical process and derive its limiting behavior under the $H_0$ and $H_A$ hypotheses, respectively, as described by Theorem 1; the supporting lemmas are given in Appendix A.
Theorem 1.
Under the conditions of Lemma A3 in Appendix A, consider the empirical process
$$\Delta_n(t, \hat{\theta}_n) = \frac{1}{\sqrt{n}}\sum_{i=1}^{[nt]} q_1(\hat{\theta}_n, Y_i),\qquad t \in (0, 1). \tag{5}$$
Then we have the following:
(1). Under $H_0$,
$$\hat{B}_n^{-1/2}\,\Delta_n(t, \hat{\theta}_n)\ \xrightarrow{d}\ W^0(t),\qquad n \to \infty,$$
where $W^0(t)$ is a $q$-dimensional standard Brownian bridge with $W^0(t) = W(t) - tW(1)$, and $\hat{B}_n$ is some consistent covariance matrix estimator, such as $\hat{B}_n = \frac{1}{n}\sum_{i=1}^{n} q_1(\hat{\theta}_n, Y_i)\, q_1(\hat{\theta}_n, Y_i)^T$.
(2). Under $H_A$,
$$\hat{B}_n^{-1/2}\,\Delta_n(t, \hat{\theta}_n)\ \xrightarrow{d}\ W^0(t) + B(\theta^*)^{1/2}\, G^0(t),\qquad n \to \infty,$$
where $G^0(t) = G(t) - tG(1)$ with $G(t) = \int_0^t g(y)\,dy$.
Next, we construct the test statistic according to the empirical process (5) stated above. The key idea is that the empirical process reflects symptoms of structural change, and its behavior under $H_0$ and $H_A$ differs significantly; the test statistic can then be constructed as
$$T_n = \sup_{1 \le k \le n}\big\|\hat{B}_n^{-1/2}\,\Delta_n(k/n, \hat{\theta}_n)\big\|^2. \tag{6}$$
Further, from Theorem 1 and the continuous mapping theorem, the following corollary can be obtained:
Corollary 1.
Given the empirical process (5) in Theorem 1, we have the following:
(1). Under $H_0$,
$$T_n\ \xrightarrow{d}\ \sup_{t \in [0,1]}\big\|W^0(t)\big\|^2 = \sup_{t \in [0,1]}\sum_{i=1}^{q}\big(W_i^0(t)\big)^2,\qquad n \to \infty.$$
(2). Under $H_A$,
$$T_n\ \xrightarrow{d}\ \sup_{t \in [0,1]}\big\|W^0(t) + B(\theta^*)^{1/2}\, G^0(t)\big\|^2,\qquad n \to \infty.$$
The above results show that the limiting distributions of the test statistic are different under the null and alternative hypotheses; thus, we can judge the existence of a change point accordingly.
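The statistic $T_n$ in (6) only needs the per-observation scores $q_1(\hat{\theta}_n, Y_i)$ evaluated at the MLE fitted under $H_0$. The following sketch (our own helper, score_cusum_statistic) standardizes the cumulative score process with $\hat{B}_n$ and takes the maximal squared norm.

```python
import numpy as np

def score_cusum_statistic(scores):
    """Compute T_n = max_k || B_hat^{-1/2} Delta_n(k/n, theta_hat) ||^2 from an
    (n, q) array whose i-th row is the score q_1(theta_hat, Y_i)."""
    n, q = scores.shape
    B_hat = scores.T @ scores / n                      # B_hat = (1/n) sum_i q1_i q1_i^T
    partial = np.cumsum(scores, axis=0) / np.sqrt(n)   # Delta_n(k/n, theta_hat), k = 1..n
    L = np.linalg.cholesky(B_hat)                      # B_hat = L L^T
    z = np.linalg.solve(L, partial.T)                  # ||z_k||^2 = Delta_k^T B_hat^{-1} Delta_k
    return (z ** 2).sum(axis=0).max()
```

Comparing $T_n$ with a tabulated quantile of $\sup_t \sum_{i=1}^{q}(W_i^0(t))^2$ (for example, 2.0005 at the 5% level for $q = 5$, as used in Section 4.2) then gives the rejection decision.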

3.3. Consistency

In order to prove the consistency of the estimator, we first construct a function $L_n$ as follows:
$$L_n(\tau, \theta) = \frac{1}{n}\sum_{i=1}^{n}\tilde{f}(X_i, Y_i\,|\,\tau, \theta) = \frac{1}{n}\sum_{i=1}^{[n\tau]}\tilde{f}(X_i, Y_i\,|\,\tau, \theta) + \frac{1}{n}\sum_{i=[n\tau]+1}^{n}\tilde{f}(X_i, Y_i\,|\,\tau, \theta), \tag{7}$$
where $[n\tau] = k$, $[a]$ denotes the integer part of $a$, and
$$\tilde{f}(X_i, Y_i\,|\,\tau, \theta) = \log\Big[\sum_{c=1}^{C}\big\{\pi_c^-\,\phi\big(Y_i\,|\,X_i^T\beta_c^-, \sigma_c^{2-}\big)\, I\{i \le [n\tau]\} + \pi_c^+\,\phi\big(Y_i\,|\,X_i^T\beta_c^+, \sigma_c^{2+}\big)\, I\{i \ge [n\tau]+1\}\big\}\Big].$$
Without loss of generality, we only consider the case $k \le k^*$ for brevity; then, by Kolmogorov's law of large numbers and Jensen's inequality, Formula (7) can be rewritten as
$$\begin{aligned}
L_n(\tau, \theta) &= \frac{1}{n}\sum_{i=1}^{[n\tau]} f(X_i, Y_i\,|\,\tau, \theta^-) + \frac{1}{n}\sum_{i=[n\tau]+1}^{[n\tau^*]} f(X_i, Y_i\,|\,\tau, \theta^+) + \frac{1}{n}\sum_{i=[n\tau^*]+1}^{n} f(X_i, Y_i\,|\,\tau, \theta^+)\\
&= \frac{[n\tau]}{n}\cdot\frac{1}{[n\tau]}\sum_{i=1}^{[n\tau]} f(X_i, Y_i\,|\,\tau, \theta^-) + \frac{[n\tau^*]-[n\tau]}{n}\cdot\frac{1}{[n\tau^*]-[n\tau]}\sum_{i=[n\tau]+1}^{[n\tau^*]} f(X_i, Y_i\,|\,\tau, \theta^+)\\
&\quad + \frac{n-[n\tau^*]}{n}\cdot\frac{1}{n-[n\tau^*]}\sum_{i=[n\tau^*]+1}^{n} f(X_i, Y_i\,|\,\tau, \theta^+)\\
&\xrightarrow{p}\ \tau\cdot E_{(\tau^*, \theta^{*-})} f(X, Y\,|\,\tau, \theta^-) + (\tau^*-\tau)\cdot E_{(\tau^*, \theta^{*-})} f(X, Y\,|\,\tau, \theta^+) + (1-\tau^*)\cdot E_{(\tau^*, \theta^{*+})} f(X, Y\,|\,\tau, \theta^+)\\
&\equiv L(\tau, \theta\,|\,\tau^*, \theta^*),
\end{aligned}$$
where
$$f(X_i, Y_i\,|\,\tau, \theta^-) = \tilde{f}(X_i, Y_i\,|\,\tau, \theta)\big|_{\theta = \theta^-},\qquad f(X_i, Y_i\,|\,\tau, \theta^+) = \tilde{f}(X_i, Y_i\,|\,\tau, \theta)\big|_{\theta = \theta^+}.$$
Now, we state our second theoretical result, which is the consistency of the obtained estimator.
Theorem 2.
Let $\hat{\tau} = \hat{k}_n/n$. Then, under the conditions of Lemmas A4 and A5 in Appendix A, we have
$$(\hat{\tau}, \hat{\theta}_n)\ \xrightarrow{p}\ (\tau^*, \theta^*).$$

4. Simulation

To assess the performance of our proposed method, we shall consider two simulation experiments as follows.

4.1. Estimation Experiment

In Experiment 1, we focus on the scenario where the vector $\beta$ is two-dimensional and the value of $C$ is set to 2. Additionally, when dealing with the parameter estimation problem of mixture models that incorporate change points, the conventional approach is to overlook the presence of these change points and directly employ the Expectation–Maximization (EM) algorithm to estimate the unknown parameters, as if the data were generated from a mixture model consisting of $2C$ components. Consequently, we conduct a comparison between the estimation results obtained by the EM method that does not take change points into account (referred to as "NCPEM") and those derived from our improved EM method ("CPEM"). We utilize the data generation mechanism of Model (1) to generate the data. Subsequently, we construct a two-component mixture model with a change point and specify the parameters as follows:
$$(\beta_1^-)^T = (1, 2),\ \sigma_1^{2-} = 0.1,\ \pi_1^- = 0.4,\qquad (\beta_2^-)^T = (1, 3),\ \sigma_2^{2-} = 0.2,\ \pi_2^- = 0.6,$$
$$(\beta_1^+)^T = (3, 4),\ \sigma_1^{2+} = 0.2,\ \pi_1^+ = 0.7,\qquad (\beta_2^+)^T = (3, 5),\ \sigma_2^{2+} = 0.2,\ \pi_2^+ = 0.3.$$
Here $X_i^T = (1, \tilde{X}_i)$ with $\tilde{X}_i \sim U(0, 1)$ for $i = 1, 2, \ldots, n$. We let the total sample size $n$ take values in $\{500, 1000, 2000\}$, with the change point $k^*$ equal to $\{250, 500, 1000\}$, respectively. Over 100 simulations, the change point estimators and the corresponding standard errors are $\hat{k} = 250.45\ (0.730)$, $500.18\ (0.458)$ and $1000.5\ (0.759)$. It can be seen that the change point estimator is close to the true value. The mixture parameter estimators and the corresponding standard errors are presented in Table 1, Table 2 and Table 3.
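For orientation, here is a sketch of how Experiment 1 could be reproduced with the hypothetical helpers simulate_mixture_change_point and change_point_em defined earlier. The parameter values are taken from the display above; the random seed and the search grid are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2025)
n, k_star, C = 1000, 500, 2
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])   # X_i^T = (1, U(0,1))
params_minus = {"pi": np.array([0.4, 0.6]),
                "beta": np.array([[1.0, 2.0], [1.0, 3.0]]),
                "sigma2": np.array([0.1, 0.2])}
params_plus = {"pi": np.array([0.7, 0.3]),
               "beta": np.array([[3.0, 4.0], [3.0, 5.0]]),
               "sigma2": np.array([0.2, 0.2])}
Y = simulate_mixture_change_point(n, k_star, X, params_minus, params_plus, rng=rng)
fit = change_point_em(X, Y, C, k_grid=range(100, n - 100, 5), rng=rng)
print(fit["k"], fit["theta_plus"]["beta"])   # the estimated change point should lie near 500
```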
As is evident from Table 1, the estimation results obtained by the Change Point Expectation–Maximization (CPEM) method are much closer to the true parameter values compared to those derived from the Non-Change Point Expectation–Maximization (NCPEM) method. Moreover, the CPEM method yields a significantly smaller standard error. Based on the estimations of $\sigma^2$ and $\pi$ presented in Table 2 and Table 3, it is clear that the estimation results of the NCPEM method deviate substantially from the true parameter values. Additionally, as the sample size increases, the majority of the estimation results tend to converge more closely to the true parameter values, accompanied by a smaller standard error. These findings strongly indicate that our improved CPEM method exhibits superior accuracy and enhanced robustness.

4.2. Detection Experiment

In Experiment 2, we explore the performance of the detection procedure. Let $p = 1$ and consider the following mixture model:
$$Y_i \sim \begin{cases}\sum_{c=1}^{2}\pi_c^-\, N\big(\mu_c^-, \sigma_c^{2-}\big), & 1 \le i \le k^*,\\[2pt] \sum_{c=1}^{2}\pi_c^+\, N\big(\mu_c^+, \sigma_c^{2+}\big), & k^*+1 \le i \le n,\end{cases}\qquad k^* \in [1, n], \tag{8}$$
where
$$\mu_1^- = 1,\ \sigma_1^{2-} = 0.1,\ \pi_1^- = 0.6,\qquad \mu_2^- = 2,\ \sigma_2^{2-} = 0.2,\ \pi_2^- = 0.4,$$
$$\mu_1^+ = 3,\ \sigma_1^{2+} = 0.2,\ \pi_1^+ = 0.3,\qquad \mu_2^+ = 4,\ \sigma_2^{2+} = 0.3,\ \pi_2^+ = 0.7.$$
The limiting distribution function in (6) is well known and has been tabulated by Kiefer [23]. In model (8), $q = 5$; at the significance level of 0.05, the critical value is 2.0005. We set $n$ to 500, 1000 and 2000, and $k^*$ to $n/4$, $n/2$ and $3n/4$, representing the cases where the change point is located in the front, middle and back end of the data. The empirical sizes and powers over 100 simulations are shown in Table 4 and Table 5.
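For model (8), the score vector $q_1(\theta, y)$ with respect to $\theta = (\pi_1, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2)$ (so $q = 5$, taking $\pi_2 = 1 - \pi_1$) has a simple closed form; a sketch is given below. Feeding these per-observation scores, evaluated at the MLE fitted under $H_0$, into the score_cusum_statistic helper above and comparing $T_n$ with the critical value 2.0005 gives one rejection decision; averaging over Monte Carlo replicates yields empirical sizes and powers as in Tables 4 and 5. The function name and parameterization are our own illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def normal_mixture_scores(y, pi1, mu1, mu2, s1, s2):
    """Per-observation score q_1(theta, y) for a two-component normal mixture,
    theta = (pi1, mu1, mu2, sigma1^2, sigma2^2); returns an (n, 5) array."""
    f1 = norm.pdf(y, mu1, np.sqrt(s1))
    f2 = norm.pdf(y, mu2, np.sqrt(s2))
    eta = pi1 * f1 + (1 - pi1) * f2                 # mixture density eta(y | theta)
    r1 = pi1 * f1 / eta                             # component responsibilities
    r2 = (1 - pi1) * f2 / eta
    d_pi1 = (f1 - f2) / eta
    d_mu1 = r1 * (y - mu1) / s1
    d_mu2 = r2 * (y - mu2) / s2
    d_s1 = r1 * ((y - mu1) ** 2 / (2 * s1 ** 2) - 1 / (2 * s1))
    d_s2 = r2 * ((y - mu2) ** 2 / (2 * s2 ** 2) - 1 / (2 * s2))
    return np.column_stack([d_pi1, d_mu1, d_mu2, d_s1, d_s2])
```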
Based on the analysis of the data in the tables, the performance of the method proposed in this paper under large-sample scenarios can be summarized as follows:
(1) Empirical Level Performance: As clearly shown in Table 4, with the continuous increase in the sample size n, the empirical level gradually approaches the significance level of 0.05. This indicates that, under large sample conditions, the actual test level of this method is in good agreement with the preset theoretical level.
(2) Empirical Power Performance: According to the data results in Table 5, there is a significant positive correlation between the increase in sample size and the improvement in empirical power. That is, the larger the sample size, the closer the value of empirical power is to 1, suggesting that the method can more effectively detect true change point situations in large samples.
(3) Influence of Change Point Location: The position of the change point in the data sequence has an impact on the detection effect. When the change point is located in the middle or front part of the data, the detection effect of this method is significantly better than when the change point is located at the end of the data.
(4) Comprehensive Performance Advantages: In large sample scenarios, the method proposed in this paper demonstrates good empirical size and power.

5. Concluding Remarks

This paper proposes an improved EM method to solve the parameter estimation problem for the mixture of linear regression models with a change point. The method is more accurate and robust than the usual EM algorithm. Furthermore, a detection method based on the score function is developed to test for a change point in mixture data. Simulations demonstrate the effectiveness of the proposed methodology.

6. Limitations and Future Research Directions

The method presented in this article also has certain limitations and challenges. First, there is the issue of robustness in complex noisy scenarios: the proposed method performs well under normal noise, but its detection accuracy under more complex noise interference remains to be studied. Second, the algorithm requires further research for change-point detection and estimation in high-dimensional data. Finally, the computational complexity of the proposed method is relatively high, and it is worth studying whether a simpler and faster procedure exists. These issues are possible directions for future research.

Author Contributions

Methodology, Z.X.; Software, W.Z.; Writing—original draft, T.C.; Writing—review & editing, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 12171391), the Natural Science Basic Research Program of Shaanxi (Grant No. 2024JC-YBQN-0039), and the Shaanxi Fundamental Science Research Project for Mathematics and Physics (Grant No. 23JSY043).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Appendix for Proof

Lemma A1.
Under Assumption 3 and the Cramér-Rao regularity conditions, we have
$$E\big[q_1(\theta^*, Y)\big] = 0,$$
$$E\big[q_2(\theta^*, Y)\big] = -E\big[q_1(\theta^*, Y)\, q_1^T(\theta^*, Y)\big].$$
Proof. 
See McLachlan and Peel [24]. □
Lemma A2.
For the process given by
$$\Delta_n(t, \theta^*) = \frac{1}{\sqrt{n}}\sum_{i=1}^{[nt]} q_1(\theta^*, Y_i),\qquad t \in (0, 1), \tag{A1}$$
and under the conditions of Lemma A1 and under $H_0$, the following functional central limit theorem holds:
$$\Delta_n(\cdot, \theta^*)\ \xrightarrow{d}\ Z(\cdot),$$
where $Z(\cdot)$ is a Gaussian process with mean $E[Z(t)] = 0$ and variance function $\mathrm{Var}[Z(t)] = t\, B(\theta^*)$.
Furthermore, if $B(\theta^*)$ is invertible, then we have
$$B(\theta^*)^{-1/2}\,\Delta_n(\cdot, \theta^*)\ \xrightarrow{d}\ W(\cdot),$$
where $W(\cdot)$ is a $q$-dimensional standard Brownian motion.
Proof. 
The proof follows by direct application of Donsker’s theorem (Billingsley [25]). □
Usually, in applications, the parameter $\theta^*$ under the null hypothesis is not known but has to be estimated. Let $\hat{\theta}_n$ be the maximum likelihood estimator of $\theta^*$; therefore,
$$\sum_{i=1}^{n} q_1(\hat{\theta}_n, Y_i) = 0.$$
A Taylor expansion of $\Gamma_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} q_1(\theta, Y_i)$ around $\theta^*$, evaluated at $\theta = \hat{\theta}_n$, gives
$$0 = \Gamma_n(\hat{\theta}_n) = \Gamma_n(\theta^*) + \Gamma_n'(\theta^*)(\hat{\theta}_n - \theta^*) + R_n.$$
Under suitable regularity conditions,
$$\Gamma_n'(\theta^*) = \frac{1}{n}\sum_{i=1}^{n} q_2(\theta^*, Y_i)\ \xrightarrow{p}\ -I(\theta^*),\qquad \sqrt{n}\,\Gamma_n(\theta^*)\ \xrightarrow{d}\ N\big(0, B(\theta^*)\big),\qquad \sqrt{n}\, R_n\ \xrightarrow{p}\ 0.$$
Therefore, the following holds:
$$\sqrt{n}\,(\hat{\theta}_n - \theta^*)\ \xrightarrow{d}\ N\big(0, V(\theta^*)\big),$$
where $V(\theta^*) = I(\theta^*)^{-1} B(\theta^*)\{I(\theta^*)^{-1}\}^T$. Equivalently, we can write
$$\sqrt{n}\,(\hat{\theta}_n - \theta^*) \approx I(\theta^*)^{-1}\,\Delta_n(1, \theta^*).$$
Lemma A3.
Under $H_0$, the following holds:
$$\Delta_n(\cdot, \hat{\theta}_n)\ \xrightarrow{d}\ Z^0(\cdot),$$
where $Z^0(t) = Z(t) - tZ(1)$.
Proof. 
A Taylor expansion of $\Delta_n(t, \hat{\theta}_n)$ around $\theta^*$ gives
$$\Delta_n(t, \hat{\theta}_n) \approx \frac{1}{\sqrt{n}}\sum_{i=1}^{[nt]} q_1(\theta^*, Y_i) + \frac{1}{n}\sum_{i=1}^{[nt]} q_2(\theta^*, Y_i)\cdot\sqrt{n}\,(\hat{\theta}_n - \theta^*) \approx \Delta_n(t, \theta^*) - \frac{[nt]}{n}\, I(\theta^*)\cdot I(\theta^*)^{-1}\Delta_n(1, \theta^*)\ \xrightarrow{d}\ Z(t) - tZ(1).\ \square$$
Next, we provide the proof of Theorem 1.
Proof. 
(1) On the basis of the three lemmas stated above and if $B(\theta^*)$ is invertible, we can obtain the first conclusion of Theorem 1:
$$\hat{B}_n^{-1/2}\,\Delta_n(\cdot, \hat{\theta}_n)\ \xrightarrow{d}\ W^0(\cdot),$$
where $W^0(\tau) = W(\tau) - \tau W(1)$ is a standard Brownian bridge and $\hat{B}_n$ is the variance estimator, such as $\hat{B}_n = \frac{1}{n}\sum_{i=1}^{n} q_1(\hat{\theta}_n, Y_i)\, q_1(\hat{\theta}_n, Y_i)^T$.
(2) Next, we prove the second conclusion. Under the $H_A$ hypothesis, $Y_i$ has probability density function
$$\eta(Y\,|\,\theta_i) \approx \eta(Y\,|\,\theta^*) + \frac{\partial\eta(Y\,|\,\theta)}{\partial\theta^T}\bigg|_{\theta=\theta^*}\cdot\frac{1}{\sqrt{n}}\, g\Big(\frac{i}{n}\Big) = \eta(Y\,|\,\theta^*) + \frac{\partial\log\eta(Y\,|\,\theta)}{\partial\theta^T}\bigg|_{\theta=\theta^*}\,\eta(Y\,|\,\theta^*)\cdot\frac{1}{\sqrt{n}}\, g\Big(\frac{i}{n}\Big) = \eta(Y\,|\,\theta^*)\Big\{1 + q_1(\theta^*, Y)^T\cdot\frac{1}{\sqrt{n}}\, g\Big(\frac{i}{n}\Big)\Big\},$$
which follows from a Taylor expansion of $\eta$. Therefore, under the alternative hypothesis, the process (A1) no longer has zero mean in general; instead,
$$E\big[q_1(\theta^*, Y_i)\big] \approx \int q_1(\theta^*, y)\,\eta(y\,|\,\theta^*)\,dy + \int q_1(\theta^*, y)\, q_1(\theta^*, y)^T\,\eta(y\,|\,\theta^*)\cdot\frac{1}{\sqrt{n}}\, g\Big(\frac{i}{n}\Big)\,dy = 0 + \frac{1}{\sqrt{n}}\, B(\theta^*)\, g\Big(\frac{i}{n}\Big).$$
Similarly to Lemmas A2 and A3, we can deduce
$$\Delta_n(\cdot, \theta^*)\ \xrightarrow{d}\ W_A(\cdot),$$
where $W_A(t) = B(\theta^*)^{1/2} W(t) + B(\theta^*)\, G(t)$ with $G(t) = \int_0^t g(y)\,dy$.
Finally, the following limiting process can be derived, which is our second conclusion:
$$\hat{B}_n^{-1/2}\,\Delta_n(t, \hat{\theta}_n) \approx \hat{B}_n^{-1/2}\big\{\Delta_n(t, \theta^*) - t\,\Delta_n(1, \theta^*)\big\}\ \xrightarrow{d}\ W^0(t) + B(\theta^*)^{1/2}\, G^0(t).\ \square$$
Next, we prove Theorem 2 and first give two useful lemmas.
Lemma A4.
Under the conditions of model (1) and Assumptions 1 and 2, $L_n$ converges uniformly to $L$ in probability.
Proof. 
According to the structure of the function $L$, we can show that $L_n$ converges to $L$ pointwise. Next, we prove uniform convergence.
Since the parameter space is compact, for any radii $\delta_i > 0$ there exist finitely many points $(\tau_i, \theta_i)$, $i = 1, \ldots, m$, such that $\Theta \subset \bigcup_{i=1}^{m} U\big((\tau_i, \theta_i), \delta_i\big)$, where $U\big((\tau_i, \theta_i), \delta_i\big)$ is the neighborhood with center $(\tau_i, \theta_i)$ and radius $\delta_i$. By the continuity of $L_n$, for any $\epsilon > 0$ the radii can be chosen so that
$$|(\tau_i, \theta_i) - (\tau, \theta)| < \delta_i\ \Longrightarrow\ |L_n(\tau_i, \theta_i) - L_n(\tau, \theta)| < \epsilon,\qquad \forall (\tau, \theta) \in U\big((\tau_i, \theta_i), \delta_i\big).$$
Consequently,
$$|L_n(\tau, \theta) - L(\tau, \theta)| \le |L_n(\tau, \theta) - L_n(\tau_i, \theta_i)| + |L_n(\tau_i, \theta_i) - L(\tau_i, \theta_i)| + |L(\tau_i, \theta_i) - L(\tau, \theta)| < 3\epsilon.\ \square$$
Lemma A5.
Under the conditions of model (1) and Assumptions 1 and 3, the parameters are identifiable.
Proof. 
By Jensen's inequality,
$$\begin{aligned}
&L(\tau, \theta\,|\,\tau^*, \theta^*) - L(\tau^*, \theta^*\,|\,\tau^*, \theta^*)\\
&\quad= \tau\cdot E_{(\tau^*,\theta^{*-})} f(X, Y\,|\,\tau, \theta^-) + (\tau^*-\tau)\cdot E_{(\tau^*,\theta^{*-})} f(X, Y\,|\,\tau, \theta^+) + (1-\tau^*)\cdot E_{(\tau^*,\theta^{*+})} f(X, Y\,|\,\tau, \theta^+)\\
&\qquad- \Big(\tau\cdot E_{(\tau^*,\theta^{*-})} f(X, Y\,|\,\tau^*, \theta^{*-}) + (\tau^*-\tau)\cdot E_{(\tau^*,\theta^{*-})} f(X, Y\,|\,\tau^*, \theta^{*-}) + (1-\tau^*)\cdot E_{(\tau^*,\theta^{*+})} f(X, Y\,|\,\tau^*, \theta^{*+})\Big)\\
&\quad= \tau\cdot E_{(\tau^*,\theta^{*-})}\big[\log g_1(X, Y\,|\,\theta^-, \theta^{*-})\big] + (\tau^*-\tau)\cdot E_{(\tau^*,\theta^{*-})}\big[\log g_2(X, Y\,|\,\theta^+, \theta^{*-})\big] + (1-\tau^*)\cdot E_{(\tau^*,\theta^{*+})}\big[\log g_3(X, Y\,|\,\theta^+, \theta^{*+})\big]\\
&\quad\le \tau\cdot\log\big\{E_{(\tau^*,\theta^{*-})}\big[g_1(X, Y\,|\,\theta^-, \theta^{*-})\big]\big\} + (\tau^*-\tau)\cdot\log\big\{E_{(\tau^*,\theta^{*-})}\big[g_2(X, Y\,|\,\theta^+, \theta^{*-})\big]\big\} + (1-\tau^*)\cdot\log\big\{E_{(\tau^*,\theta^{*+})}\big[g_3(X, Y\,|\,\theta^+, \theta^{*+})\big]\big\} = 0,
\end{aligned}$$
where
$$g_1(X, Y\,|\,\theta^-, \theta^{*-}) = \frac{\sum_{c=1}^{C}\frac{\pi_c^-}{\sqrt{2\pi\sigma_c^{2-}}}\exp\big\{-\frac{(Y - X^T\beta_c^-)^2}{2\sigma_c^{2-}}\big\}}{\sum_{c=1}^{C}\frac{\pi_c^{*-}}{\sqrt{2\pi\sigma_c^{*2-}}}\exp\big\{-\frac{(Y - X^T\beta_c^{*-})^2}{2\sigma_c^{*2-}}\big\}},\qquad
g_2(X, Y\,|\,\theta^+, \theta^{*-}) = \frac{\sum_{c=1}^{C}\frac{\pi_c^+}{\sqrt{2\pi\sigma_c^{2+}}}\exp\big\{-\frac{(Y - X^T\beta_c^+)^2}{2\sigma_c^{2+}}\big\}}{\sum_{c=1}^{C}\frac{\pi_c^{*-}}{\sqrt{2\pi\sigma_c^{*2-}}}\exp\big\{-\frac{(Y - X^T\beta_c^{*-})^2}{2\sigma_c^{*2-}}\big\}},$$
$$g_3(X, Y\,|\,\theta^+, \theta^{*+}) = \frac{\sum_{c=1}^{C}\frac{\pi_c^+}{\sqrt{2\pi\sigma_c^{2+}}}\exp\big\{-\frac{(Y - X^T\beta_c^+)^2}{2\sigma_c^{2+}}\big\}}{\sum_{c=1}^{C}\frac{\pi_c^{*+}}{\sqrt{2\pi\sigma_c^{*2+}}}\exp\big\{-\frac{(Y - X^T\beta_c^{*+})^2}{2\sigma_c^{*2+}}\big\}}.$$
Therefore, we have $L(\tau, \theta\,|\,\tau^*, \theta^*) \le L(\tau^*, \theta^*\,|\,\tau^*, \theta^*)$, and equality holds only when $(\tau, \theta) = (\tau^*, \theta^*)$; that is, $L$ is maximized at $(\tau^*, \theta^*)$. Next, we prove that the maximum point is unique.
By Jensen's inequality, the three expectation terms in the next-to-last expression above are each less than or equal to 0, so the difference is equal to 0 if and only if all three terms are equal to 0. Since $\tau, \tau^* \in (0, 1)$, $g_1(X, Y\,|\,\theta^-, \theta^{*-})$ and $g_3(X, Y\,|\,\theta^+, \theta^{*+})$ are identically equal to 1 if and only if $(\pi_c^-, \beta_c^-, \sigma_c^{2-}) = (\pi_c^{*-}, \beta_c^{*-}, \sigma_c^{*2-})$ and $(\pi_c^+, \beta_c^+, \sigma_c^{2+}) = (\pi_c^{*+}, \beta_c^{*+}, \sigma_c^{*2+})$; meanwhile, because $\beta_c^{*+} \neq \beta_c^{*-}$ and $\sigma_c^{*2+} \neq \sigma_c^{*2-}$, $g_2(X, Y\,|\,\theta^+, \theta^{*-})$ cannot be identically equal to 1, so the second term vanishes if and only if $\tau = \tau^*$. Therefore, $L(\tau, \theta\,|\,\tau^*, \theta^*)$ attains its unique maximum at $(\tau^*, \theta^*)$. $\square$
The proof of Theorem 2 is as follows.
Proof. 
For any $\epsilon > 0$, let
$$A = \big\{L_n(\tau^*, \theta^*) > L(\tau^*, \theta^*) - \epsilon/3\big\},\quad B = \big\{L_n(\hat{\tau}, \hat{\theta}_n) > L_n(\tau^*, \theta^*) - \epsilon/3\big\},\quad C = \big\{L(\hat{\tau}, \hat{\theta}_n) > L_n(\hat{\tau}, \hat{\theta}_n) - \epsilon/3\big\}.$$
From Lemmas A4 and A5, $L_n$ converges uniformly to $L$; furthermore, $(\hat{\tau}, \hat{\theta}_n)$ and $(\tau^*, \theta^*)$ are the maximum points of $L_n$ and $L$, respectively; therefore, as $n \to \infty$, $P(A)$, $P(B)$ and $P(C)$ all tend to 1. Hence, on the event $A \cap B \cap C$,
$$L(\tau^*, \theta^*) - L(\hat{\tau}, \hat{\theta}_n) = \big(L(\tau^*, \theta^*) - L_n(\tau^*, \theta^*)\big) + \big(L_n(\tau^*, \theta^*) - L_n(\hat{\tau}, \hat{\theta}_n)\big) + \big(L_n(\hat{\tau}, \hat{\theta}_n) - L(\hat{\tau}, \hat{\theta}_n)\big) < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.$$
Furthermore, the parameter space $\Theta$ is compact and $L$ is continuous, so for a sufficiently small $\delta > 0$, letting $N = U\big((\tau^*, \theta^*), \delta\big)$, we have
$$\sup_{(\tau, \theta) \in \Theta \cap N^c} L(\tau, \theta) \equiv L(\tau_0, \theta_0) < L(\tau^*, \theta^*).$$
We may take $\epsilon = L(\tau^*, \theta^*) - L(\tau_0, \theta_0) > 0$; then $-\epsilon < L(\tau^*, \theta^*) - L(\hat{\tau}, \hat{\theta}_n) < \epsilon$ holds with probability tending to 1.
Therefore, the following holds:
$$P\big(|L(\tau^*, \theta^*) - L(\hat{\tau}, \hat{\theta}_n)| < \epsilon\big) \to 1.$$
In other words, if $(\hat{\tau}, \hat{\theta}_n) \in N^c$, i.e., $|(\tau^*, \theta^*) - (\hat{\tau}, \hat{\theta}_n)| \ge \delta$, then $|L(\tau^*, \theta^*) - L(\hat{\tau}, \hat{\theta}_n)| \ge \epsilon$. It follows that
$$P\big(|(\tau^*, \theta^*) - (\hat{\tau}, \hat{\theta}_n)| \ge \delta\big) \le P\big(|L(\tau^*, \theta^*) - L(\hat{\tau}, \hat{\theta}_n)| \ge \epsilon\big) \to 0,$$
i.e., $P\big(|(\tau^*, \theta^*) - (\hat{\tau}, \hat{\theta}_n)| \ge \delta\big) \to 0$. Therefore, we obtain the desired result:
$$(\hat{\tau}, \hat{\theta}_n)\ \xrightarrow{p}\ (\tau^*, \theta^*).\ \square$$

References

1. Goldfeld, S.M.; Quandt, R.E. A Markov model for switching regressions. J. Econom. 1973, 1, 3–15.
2. Frühwirth-Schnatter, S. Finite Mixture and Markov Switching Models; Springer: New York, NY, USA, 2006; pp. 241–275.
3. Hurn, M.; Justel, A.; Robert, C.P. Estimating Mixtures of Regressions. J. Comput. Graph. Stat. 2003, 12, 55–79.
4. Li, P.; Chen, J. Testing the Order of a Finite Mixture. J. Am. Stat. Assoc. 2010, 105, 1084–1092.
5. Chen, J.; Li, P.; Fu, Y. Inference on the Order of a Normal Mixture. J. Am. Stat. Assoc. 2012, 107, 1096–1105.
6. Huang, M.; Yao, W. Mixture of Regression Models With Varying Mixing Proportions: A Semiparametric Approach. J. Am. Stat. Assoc. 2012, 107, 711–724.
7. Huang, M.; Li, R.; Wang, S. Nonparametric Mixture of Regression Models. J. Am. Stat. Assoc. 2013, 108, 929–941.
8. Page, E.S. Continuous inspection schemes. Biometrika 1954, 41, 100–115.
9. Sen, A.; Srivastava, M.S. On tests for detecting change in mean. Ann. Stat. 1975, 3, 98–108.
10. Sen, A.; Srivastava, M.S. Some one-sided tests for change in level. Technometrics 1975, 17, 61–64.
11. Hawkins, D.M. Testing a sequence of observations for a shift in location. J. Am. Stat. Assoc. 1977, 72, 180–186.
12. James, B.; James, K.; Siegmund, D. Tests for a change-point. Biometrika 1987, 74, 71–83.
13. Srivastava, M.S.; Worsley, K.J. Likelihood ratio tests for a change in the multivariate normal mean. J. Am. Stat. Assoc. 1986, 81, 199–204.
14. Basseville, M.; Nikiforov, I. Detection of Abrupt Changes: Theory and Applications; Prentice Hall: Hoboken, NJ, USA, 1993.
15. Wang, G.; Zou, C.; Yin, G. Change-point detection in multinomial data with a large number of categories. Ann. Stat. 2018, 46, 2020–2044.
16. Xia, Z.; Qiu, P. Jump information criterion for statistical inference in estimating discontinuous curves. Biometrika 2015, 102, 397–408.
17. Bai, J. Least squares estimation of a shift in linear processes. J. Time Ser. Anal. 1994, 15, 453–472.
18. Baranowski, R.; Chen, Y.; Fryzlewicz, P. Narrowest-over-threshold detection of multiple change points and change-point-like features. J. R. Stat. Soc. Ser. B Stat. Methodol. 2019, 81, 649–672.
19. Follain, B.; Wang, T.; Samworth, R.J. High-dimensional changepoint estimation with heterogeneous missingness. J. R. Stat. Soc. Ser. B Stat. Methodol. 2022, 84, 1023–1055.
20. Drabech, Z.; Douimi, M.; Zemmouri, E. A Markov random field model for change points detection. J. Comput. Sci. 2024, 83, 102429.
21. Ratnasingam, S.; Gamage, R. Empirical likelihood change point detection in quantile regression models. Comput. Stat. 2025, 40, 999–1020.
22. Gong, Y.; Dong, X.; Zhang, J.; Chen, M. Latent evolution model for change point detection in time-varying networks. Inf. Sci. 2023, 646, 119376.
23. Kiefer, J. K-sample analogues of the Kolmogorov-Smirnov and Cramér-von Mises tests. Ann. Math. Stat. 1959, 30, 420–447.
24. McLachlan, G.; Peel, D. Finite Mixture Models; Wiley: New York, NY, USA, 2000.
25. Billingsley, P. Convergence of Probability Measures, 2nd ed.; Wiley: New York, NY, USA, 1999.
Table 1. Estimated values and standard errors of β (for each method, the first line gives the estimates and the second line the corresponding standard errors).

β                 (β_1^-)^T = (1, 2)   (β_2^-)^T = (1, 3)   (β_1^+)^T = (3, 4)   (β_2^+)^T = (3, 5)
n = 500   CPEM    (0.994, 2.004)       (0.975, 3.023)       (2.993, 4.004)       (2.999, 4.999)
                  (0.045, 0.029)       (0.150, 0.129)       (0.064, 0.041)       (0.085, 0.057)
          NCPEM   (2.235, 1.911)       (0.988, 3.048)       (3.086, 3.874)       (3.022, 4.941)
                  (9.906, 1.624)       (7.933, 1.290)       (9.285, 1.609)       (0.511, 0.278)
n = 1000  CPEM    (0.982, 2.054)       (1.006, 2.993)       (2.992, 4.005)       (2.996, 5.001)
                  (0.042, 0.017)       (0.104, 0.077)       (0.039, 0.024)       (0.058, 0.038)
          NCPEM   (0.452, 2.220)       (0.884, 3.044)       (1.550, 4.192)       (3.187, 4.939)
                  (8.490, 1.664)       (3.710, 0.678)       (8.211, 1.367)       (2.540, 0.484)
n = 2000  CPEM    (1.000, 1.999)       (1.014, 2.995)       (2.930, 4.017)       (3.001, 4.997)
                  (0.022, 0.014)       (0.069, 0.046)       (0.030, 0.019)       (0.041, 0.035)
          NCPEM   (1.225, 2.054)       (1.318, 2.975)       (2.880, 3.921)       (3.009, 4.975)
                  (4.212, 0.624)       (2.383, 0.505)       (0.557, 0.342)       (0.182, 0.107)
Table 2. Estimated values and standard errors (in parentheses) of σ².

σ²                σ_1^{2-} = 0.1    σ_2^{2-} = 0.2    σ_1^{2+} = 0.1    σ_2^{2+} = 0.2
n = 500   CPEM    0.099 (0.006)     0.199 (0.012)     0.100 (0.008)     0.197 (0.011)
          NCPEM   0.193 (0.466)     0.346 (0.626)     0.171 (0.436)     0.441 (0.786)
n = 1000  CPEM    0.098 (0.011)     0.200 (0.005)     0.099 (0.006)     0.199 (0.008)
          NCPEM   0.143 (0.287)     0.375 (0.686)     0.147 (0.338)     0.376 (0.661)
n = 2000  CPEM    0.100 (0.003)     0.203 (0.015)     0.099 (0.011)     0.204 (0.005)
          NCPEM   0.153 (0.339)     0.333 (0.616)     0.096 (0.022)     0.376 (0.687)
Table 3. Estimated values and standard errors (in parentheses) of π.

π                 π_1^- = 0.6       π_2^- = 0.4       π_1^+ = 0.3       π_2^+ = 0.7
n = 500   CPEM    0.601 (0.031)     0.399 (0.031)     0.301 (0.030)     0.699 (0.030)
          NCPEM   0.182 (0.053)     0.310 (0.050)     0.189 (0.045)     0.320 (0.062)
n = 1000  CPEM    0.594 (0.063)     0.406 (0.063)     0.295 (0.019)     0.705 (0.019)
          NCPEM   0.189 (0.060)     0.310 (0.044)     0.193 (0.039)     0.308 (0.056)
n = 2000  CPEM    0.599 (0.014)     0.401 (0.013)     0.298 (0.033)     0.702 (0.033)
          NCPEM   0.194 (0.035)     0.311 (0.058)     0.186 (0.049)     0.310 (0.045)
Table 4. Empirical sizes.

n       500     1000    2000
size    0.07    0.06    0.05
Table 5. Empirical powers.

n \ k*    n/4     n/2     3n/4
500       0.60    0.83    0.59
1000      0.96    0.88    0.67
2000      1       0.98    0.94
