Abstract
This paper studies estimation and detection problems in mixtures of linear regression models with a change point. An improved Expectation–Maximization (EM) algorithm is devised specifically for multi-class mixture data with change points. Under appropriate conditions, the large-sample properties of the estimators are rigorously proven. The improved EM algorithm not only precisely locates the change point but also yields accurate parameter estimates for each class. Additionally, a detector based on the score function is proposed to identify the presence of change points in mixture data. The limiting distributions of the test statistics under both the null and alternative hypotheses are systematically derived. Extensive simulation experiments assess the effectiveness of the proposed method, including comparisons with the conventional EM algorithm. The results demonstrate that the EM algorithm that ignores change points performs poorly in classifying the data, often misclassifying or even omitting certain classes. In contrast, the estimation method introduced in this study is accurate and robust, with favorable empirical sizes and powers.
MSC:
62F10; 62J05
1. Introduction
Mixture models have found extensive applications in econometrics and social sciences, and the associated theoretical frameworks have been thoroughly investigated. Among them, finite mixtures of linear regression models, a particularly valuable class of mixture models, have been widely employed across diverse fields, including econometrics, marketing, epidemiology, and biology. Simultaneously, the statistical inference and parameter estimation based on mixture models play a pivotal role in practical applications. On another front, the change point problem remains a focal point in statistical research. Conceptually, a change point denotes the specific location or instant at which a sudden alteration in a statistical property occurs. Change point phenomena are pervasive in both natural and social systems, and have enabled crucial advancements in numerous domains such as industrial control, financial economics, biomedicine, and signal processing. The detection and estimation of change points within mixture datasets are essential for discerning underlying patterns and making informed decisions, thereby holding significant practical implications.
In recent decades, remarkable progress has been achieved in the analysis of mixture models. Goldfeld and Quandt provided an early treatment of mixtures of linear regression models [1]: they studied the parameter estimation problem for such mixtures and applied it to analyze imbalance in the housing market. Frühwirth-Schnatter summarized the Bayesian approach to mixtures of linear regression models in her monograph [2]. Hurn, Justel, and Robert proposed a generalized linear finite mixture model and overcame the label-switching issue by normalizing the loss function [3]. Li and Chen, and Chen, Li, and Fu, concentrated on hypothesis testing for the order of a mixture model and derived the limiting distributions of the test statistics [4,5]. Huang and Yao introduced a semiparametric mixture regression model in which the mixing proportions are smooth functions of covariates [6]. Huang, Li, and Wang developed a nonparametric finite mixture method for regression models and demonstrated the estimation procedure through an analysis of U.S. Housing Price Index (HPI) data [7].
The change point problem has attracted extensive research attention, with a vast body of literature accumulated over the years. It was first introduced by Page, who proposed the Cumulative Sum (CUSUM) test method during his study of continuous inspection schemes [8]. Subsequently, scholars such as Sen and Srivastava, Hawkins, James et al., and Srivastava and Worsley investigated the detection of change points in the mean of sequences of normal random variables [9,10,11,12,13]. Basseville and Nikiforov systematically expounded the theoretical foundations of change point detection and estimation algorithms [14]. Wang, Zou, and Yin studied change point detection in multinomial data as the number of categories approaches infinity, rigorously proving the statistical properties of the relevant test statistics [15]. Xia and Qiu proposed the JIC criterion to tackle the multiple change point problem in nonparametric regression models [16]. Bai employed the least squares method to estimate the mean change point of linear processes, deriving the consistency and convergence rate of the change point estimator [17]. Baranowski et al. utilized grid search algorithms based on binary segmentation to determine the number and positions of change points [18]. Follain et al. developed a novel approach for estimating change points in partially observed, high-dimensional time series with simultaneous mean shifts in sparse coordinate subsets [19]. Gong et al. redefined the graph-based change point detection problem as a prediction task and proposed a change point detection (CPD) method for dynamic graphs through a latent evolution model [22]. Drabech et al. introduced a Markov Random Field (MRF) model to detect slope changes [20], while Ratnasingam et al. proposed an empirical likelihood-based nonparametric procedure for identifying structural changes in quantile regression models [21].
Notably, despite these abundant studies, the statistical analysis of change points within finite mixtures of linear models remains a relatively overlooked area.
This paper addresses the estimation and detection problems for finite mixtures of linear regression models with change points. Specifically, for multi-class mixture data incorporating change points, we propose an improved Expectation–Maximization (EM) algorithm designed to compute parameter estimators such as the change point location and the regression coefficients. Through rigorous mathematical proofs, we establish the consistency and asymptotic normality of these estimators, providing a solid theoretical foundation. Moreover, to detect the presence of change points in the mixture of linear regression models, we develop a detector based on the score function and systematically derive its limiting distributions under both the null and the alternative hypotheses. Finally, to verify the practical effectiveness of the proposed method, we design and execute a series of simulation experiments.
The remaining sections of this paper are structured as follows. Section 2 introduces the statistical model and its associated assumptions. Section 3 is dedicated to the development of the estimation and detection techniques for change points within the mixture of linear regression models; there, we also obtain the large-sample properties of the estimators and test statistics. Section 4 presents the simulation results. Section 5 provides concluding remarks and discussions. Section 6 discusses limitations and future research directions. The proofs of the theorems are included in the Appendix.
2. Statistical Model and Assumptions
Assume that are independent random samples from the following data generating process:
where is a change point, is a p-dimensional random covariate, C is the number of components, and are mixture coefficients for the cth component; and are called mixing proportions or weights. , and the parameters are unknown. Equivalently, follow a finite mixture of normals,
Naturally, the log-likelihood function corresponding to the dataset can be expressed as
where denotes the normal density.
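Because the displayed model and likelihood are not reproduced in this extraction, the following minimal numpy sketch illustrates the kind of data generating process described above: a mixture of linear regressions whose mixing weights, regression coefficients, and noise scales switch at a change point. All parameter values and names (`simulate_mixture_cp`, `k_star`, the dict keys) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_mixture_cp(n, k_star, pre, post):
    """Draw (X, y) from a mixture of linear regressions whose parameters
    (weights 'pi', coefficients 'beta', noise sds 'sigma') switch at
    index k_star. `pre` and `post` describe the two regimes."""
    p = len(pre["beta"][0])
    X = rng.normal(size=(n, p))
    y = np.empty(n)
    for i in range(n):
        par = pre if i < k_star else post            # regime before / after the change point
        c = rng.choice(len(par["pi"]), p=par["pi"])  # latent component label
        y[i] = X[i] @ par["beta"][c] + rng.normal(scale=par["sigma"][c])
    return X, y

# Illustrative two-component setting with a change point at k_star = 100
pre = {"pi": [0.5, 0.5],
       "beta": [np.array([1.0, 2.0]), np.array([-1.0, -2.0])],
       "sigma": [0.5, 0.5]}
post = {"pi": [0.3, 0.7],
        "beta": [np.array([3.0, 0.0]), np.array([0.0, 3.0])],
        "sigma": [0.5, 0.5]}
X, y = simulate_mixture_cp(n=200, k_star=100, pre=pre, post=post)
```

The latent component labels are drawn independently at each index, matching the mixture structure; only the parameter set governing them changes at the change point.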
For later use, we introduce some notations. Let denote the true parameter, where and Let represent the parameters on the left and right sides of the change point, respectively. Similarly, we can define , ,
Let and be the maximum likelihood estimators of and , which can be obtained by maximizing (2). In Section 3.1, an improved EM algorithm is proposed to carry out the estimation. The following assumptions are imposed.
Assumption 1.
, and .
The above assumption requires that depends on the sample size n and is approximately proportional to n.
Assumption 2.
We suppose that and that the parameter space is a compact set.
Assumption 2 restricts the parameter space to be compact, which means any open cover of has a finite subcover.
Assumption 3.
are independent.
3. Parameter Inference
3.1. Estimation Procedure
In this subsection, we propose an effective EM algorithm to deal with the parameter estimation problem. We take as an example to illustrate the derivation of the iterative formula ( can be obtained in the same way). At this time, the log-likelihood function corresponding to the data is
In the EM framework, the mixture problem is described as an incomplete-data problem. We view the observed data as being incomplete, and then introduce unobserved random variables
and , which is the component label of . Therefore, the complete data are , and the complete log-likelihood function corresponding to (3) is
Supposing in the lth cycle of the EM algorithm iteration, we have ; then, in the E-step of th cycle, the expectation of the latent variable can be calculated by
In the M-step of the th cycle, we maximize
The maximization of Equation (4) is equivalent to separately maximizing and . Furthermore, the iterative formulas can be easily derived as follows:
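Since the displayed iterative formulas are not reproduced in this extraction, the following numpy sketch shows the standard form such an EM cycle takes for a C-component mixture of linear regressions: the E-step computes posterior component probabilities, and the M-step updates the mixing weights by their means and the coefficients by weighted least squares. Function and variable names are illustrative, not from the paper.

```python
import numpy as np

def em_mixreg(X, y, C, n_iter=200, seed=0):
    """Fit a C-component mixture of linear regressions by EM.
    Returns mixing weights pi, coefficients beta (C x p), noise sds sigma."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = rng.dirichlet(np.ones(C), size=n)   # random initial responsibilities
    beta = np.zeros((C, p))
    sigma = np.ones(C)
    for _ in range(n_iter):
        # M-step: update mixing weights, then weighted least squares per component
        pi = w.mean(axis=0)
        for c in range(C):
            wc = w[:, c]
            XtW = X.T * wc
            beta[c] = np.linalg.solve(XtW @ X + 1e-8 * np.eye(p), XtW @ y)
            sigma[c] = np.sqrt((wc * (y - X @ beta[c]) ** 2).sum() / wc.sum()) + 1e-8
        # E-step: posterior probability that observation i came from component c
        resid = y[:, None] - X @ beta.T
        dens = np.exp(-0.5 * (resid / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
        w = pi * dens + 1e-300                  # tiny floor guards against underflow
        w /= w.sum(axis=1, keepdims=True)
    return pi, beta, sigma
```

In the paper's setting this routine would be run separately on the two data segments induced by each candidate change point.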
Let the initial value be , repeat the above iteration algorithm, and when the iteration converges, we obtain the estimator of mixture parameters, denoted as . At this time, the log-likelihood function corresponding to the data is
Whenever a value of k is determined, the mixture data are divided into two parts: and . Let the initial value be ; then the estimates of the change point and mixture parameters can be obtained according to Algorithm 1:
where , and , ,
Algorithm 1 EM algorithm considering the change point
Input: , the number of mixture components C, the maximum iterations T.
Output:
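Algorithm 1 itself is not fully reproduced in this extraction, but its structure — split the data at each candidate k, fit each segment, and keep the k with the largest total log-likelihood — can be sketched as below. For brevity the per-segment fit here is plain least squares (a one-component stand-in); in the paper's setting the improved EM estimator for the mixture would be plugged in instead. All names are illustrative.

```python
import numpy as np

def fit_ols(Xs, ys):
    """One-component stand-in for the per-segment fit: OLS plus residual sd."""
    b = np.linalg.lstsq(Xs, ys, rcond=None)[0]
    r = ys - Xs @ b
    return np.array([1.0]), b[None, :], np.array([r.std() + 1e-8])

def loglik_mixreg(Xs, ys, pi, beta, sigma):
    """Observed-data log-likelihood of a fitted mixture of linear regressions."""
    resid = ys[:, None] - Xs @ beta.T
    dens = np.exp(-0.5 * (resid / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return np.log((pi * dens).sum(axis=1) + 1e-300).sum()

def estimate_change_point(X, y, fit=fit_ols, k_min=20):
    """Profile the split point: refit both segments for every candidate k
    and return the k that maximizes the total log-likelihood."""
    n = len(y)
    best_k, best_ll = None, -np.inf
    for k in range(k_min, n - k_min):
        ll = sum(loglik_mixreg(Xs, ys, *fit(Xs, ys))
                 for Xs, ys in ((X[:k], y[:k]), (X[k:], y[k:])))
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k
```

The restriction to k between `k_min` and `n - k_min` mirrors the usual requirement that both segments contain enough observations for the fit to be well defined.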
3.2. Hypothesis Test
Define the following notations for later use:
In the following, we are interested in testing the null hypothesis:
against the local alternative hypothesis:
where is a function of bounded variation on [0,1] that describes the pattern of departure from stability of the parameter .
Firstly, we give the following lemmas; we then build an empirical process and derive its limiting process under the null and alternative hypotheses, respectively, as described in Theorem 1.
Theorem 1.
(1). Under ,
where is a standard Brownian bridge with and some consistent covariance matrix estimation, such as
(2). Under ,
where with .
Next, we construct test statistics from the empirical process (5) stated above. The key idea is that the empirical process reflects symptoms of structural change, and its behavior under the null and alternative hypotheses differs significantly; the test statistics can then be constructed as follows:
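The displayed statistic is not reproduced in this extraction, but a statistic of this score-based CUSUM type can be computed generically as follows: standardize the partial sums of the per-observation score contributions (evaluated at the null-hypothesis MLE, so they sum to approximately zero) by a consistent covariance estimate, and take the supremum over split points. This is a hedged sketch of the general construction, not the paper's exact formula.

```python
import numpy as np

def score_cusum_stat(scores):
    """scores: (n, d) array of per-observation score contributions evaluated
    at the MLE under the null (their total is ~0). Returns the sup over k of
    the norm of the standardized partial-sum process S_k / sqrt(n)."""
    n, d = scores.shape
    V = scores.T @ scores / n                      # consistent covariance estimate
    L = np.linalg.cholesky(np.linalg.inv(V + 1e-10 * np.eye(d)))
    S = np.cumsum(scores, axis=0) / np.sqrt(n)     # empirical partial-sum process
    Z = S @ L                                      # each row z satisfies ||z||^2 = s V^{-1} s'
    return np.sqrt((Z ** 2).sum(axis=1)).max()
```

Under the null the standardized process behaves asymptotically like a Brownian bridge, so a statistic of this form is compared against quantiles of the supremum of the bridge's norm.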
Further from Theorem 1 and continuous mapping theorem, the following corollary can be obtained:
The above results show that the limiting distributions of the test statistics differ under the null and alternative hypotheses; thus, we can judge the existence of a change point accordingly.
3.3. Consistency
In order to prove the consistency of the estimator, we first construct a function as follows:
where means the integer part of a and
Without loss of generality, we consider only the case for brevity; then, by the Law of Large Numbers and the inequality, Formula (7) can be rewritten as
where
Now, we state our second theoretical result, which is the consistency of the obtained estimator.
Theorem 2.
Let and under the conditions of Lemmas A1 and A2 in Appendix A; then, we have
4. Simulation
To assess the performance of our proposed method, we shall consider two simulation experiments as follows.
4.1. Estimation Experiment
In Experiment 1, we focus on the scenario where the covariate vector is two-dimensional and the value of C is set to 2. When dealing with the parameter estimation problem for mixture models that incorporate change points, the conventional approach is to overlook the presence of these change points and directly employ the Expectation–Maximization (EM) algorithm to estimate the unknown parameters, as if the data were generated from a single mixture model consisting of C components. Consequently, we compare the estimation results obtained by the EM method that does not take change points into account (referred to as "NCPEM") with those derived from our improved EM method ("CPEM"). We use the data generation mechanism of Model (1) to generate the data: we construct a two-component mixture model with a change point and specify the parameters as follows:
and for . Let the total sample size n and the change point be and , respectively. The change point estimators and their standard errors are computed over 100 simulations; the change point estimator is close to the true value. The mixture parameter estimators and the corresponding standard errors are presented in Table 1, Table 2 and Table 3.
Table 1.
Estimated value and standard error of .
Table 2.
Estimated value and standard error of .
Table 3.
Estimated value and standard error of .
As is evident from Table 1, the estimation results obtained by the Change Point Expectation–Maximization (CPEM) method are much closer to the true parameter values compared to those derived from the Non-Change Point Expectation–Maximization (NCPEM) method. Moreover, the CPEM method yields a significantly smaller standard error. Based on the estimations of and presented in Table 2 and Table 3, it is clear that the estimation results of the NCPEM method deviate substantially from the true parameter values. Additionally, as the sample size increases, the majority of the estimation results tend to converge more closely to the true parameter values, accompanied by a smaller standard error. These findings strongly indicate that our improved CPEM method exhibits superior accuracy and enhanced robustness.
4.2. Detection Experiment
In Experiment 2, we explore the performance of the detection procedure. Let and consider the following mixture model:
where
The limiting distribution function in (6) is well known and was tabulated by Kiefer [23]. For model (8), , at the 0.05 significance level the critical value is 2.0005. We set n to 500, 1000, and 2000 and take k to be , , and , representing change points located in the front, middle, and back of the data. The empirical sizes and powers over 100 simulations are shown in Table 4 and Table 5.
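Critical values of limiting distributions of this sup-of-bridge type, as tabulated by Kiefer [23], can also be approximated by direct Monte Carlo simulation of a discretized Brownian bridge. The sketch below illustrates the idea; the grid size, repetition count, and function name are illustrative assumptions.

```python
import numpy as np

def bridge_sup_critical_value(d, alpha=0.05, n_grid=1000, n_rep=2000, seed=0):
    """Approximate the (1 - alpha) quantile of sup_t ||B(t)|| for a
    d-dimensional standard Brownian bridge B on [0, 1]."""
    rng = np.random.default_rng(seed)
    sups = np.empty(n_rep)
    t = np.arange(1, n_grid + 1) / n_grid
    for r in range(n_rep):
        inc = rng.normal(size=(n_grid, d)) / np.sqrt(n_grid)
        W = np.cumsum(inc, axis=0)          # Brownian motion on the grid
        B = W - t[:, None] * W[-1]          # bridge: B(t) = W(t) - t * W(1)
        sups[r] = np.sqrt((B ** 2).sum(axis=1)).max()
    return np.quantile(sups, 1 - alpha)
```

For d = 1 this reproduces the familiar Kolmogorov–Smirnov asymptotic critical value of about 1.36 at the 0.05 level, up to Monte Carlo and discretization error.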
Table 4.
Empirical sizes.
Table 5.
Empirical powers.
Based on the analysis of the data in the tables, the performance of the method proposed in this paper under large-sample scenarios can be summarized as follows:
(1) Empirical Size Performance: As shown in Table 4, as the sample size n increases, the empirical size gradually approaches the significance level of 0.05. This indicates that, under large-sample conditions, the actual level of the test agrees well with the preset theoretical level.
(2) Empirical Power Performance: According to the data results in Table 5, there is a significant positive correlation between the increase in sample size and the improvement in empirical power. That is, the larger the sample size, the closer the value of empirical power is to 1, suggesting that the method can more effectively detect true change point situations in large samples.
(3) Influence of Change Point Location: The position of the change point in the data sequence has an impact on the detection effect. When the change point is located in the middle or front part of the data, the detection effect of this method is significantly better than when the change point is located at the end of the data.
(4) Comprehensive Performance Advantages: In large sample scenarios, the method proposed in this paper demonstrates good empirical size and power.
5. Concluding Remarks
This paper proposes an improved EM method to solve the parameter estimation problem for a mixture of linear regression models with a change point. The method is more accurate and robust than the usual EM algorithm. Furthermore, a detection method based on the score function is developed to test for a change point in mixture data. Simulations demonstrate the effectiveness of the proposed methodology.
6. Limitations and Future Research Directions
The method presented in this article also has certain limitations and challenges. First, there is the issue of robustness in complex noise scenarios: the proposed method performs well under normal noise, but its detection accuracy under more complex noise interference remains to be studied. Second, the algorithm requires further research for change point detection and estimation in high-dimensional data. Finally, the computational complexity of the proposed method is relatively high, and it is worth studying whether a simpler and faster procedure exists. These issues are possible directions for future research.
Author Contributions
Methodology, Z.X.; Software, W.Z.; Writing—original draft, T.C.; Writing—review & editing, W.Z. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Natural Science Foundation of China (Grant No. 12171391) and the Natural Science Basic Research Program of Shaanxi (Grant No. 2024JC-YBQN-0039), Shaanxi Fundamental Science Research Project for Mathematics and Physics (Grant No. 23JSY043).
Data Availability Statement
Data are contained within the article.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A. Appendix for Proof
Lemma A1.
Under Assumption 3 and regularity conditions, we have
Proof.
See McLachlan and Peel [24]. □
Lemma A2.
For the process given by
and under the conditions of Lemma A1 and under the null hypothesis, the following functional central limit theorem holds:
where is the Gaussian process whose mean is and variance function is .
Furthermore, if is invertible, then we have
where is standard Brownian motion.
Proof.
The proof follows by direct application of Donsker’s theorem (Billingsley [25]). □
Usually in applications, the parameters under the null hypothesis are not known but have to be estimated. Let be the maximum likelihood estimator of ; therefore,
Taylor expansion of at when ; then,
Under suitable regularity conditions
Therefore, the following holds:
where Equivalently, we can write
Lemma A3.
Under , the following holds:
where
Proof.
Taylor expansion of at ,
□
Next, we provide the proof of Theorem 1.
Proof.
(1) On the basis of the three lemmas stated above and assuming is invertible, we can obtain the first conclusion in Theorem 1:
where is a standard Brownian bridge, is the variance estimation, such as
(2) Next, we prove the second conclusion. Under the alternative hypothesis, has the probability density function:
which can be derived from the Taylor expansion of . Therefore, under the alternative hypothesis, the process (A1) no longer has zero mean in general but
Similarly to Lemmas A2 and A3, we can deduce
where .
Finally, the following limiting process can be derived, which is our second conclusion:
□
Next, we prove Theorem 2 and first give two useful lemmas.
Lemma A4.
Under the conditions of model (8) and Assumptions 1 and 2, converges uniformly to with probability 1.
Proof.
According to the structure of the function , we can show that converges to pointwise. Next, we prove uniform convergence.
Due to the fact that the parameter space is a compact set, , where is the neighborhood with center and radius . According to the continuity of , ,
Consequently,
□
Lemma A5.
Under the conditions of model (8) and Assumptions 1 and 3, the parameters are identifiable.
Proof.
From the inequality,
where
Therefore, we have , and the equal sign holds only when ; that is, maximizes at . Next, we prove that the maximum point is unique.
According to the inequality, the three terms on the left-hand side are all less than or equal to 0. Therefore, the left-hand side equals 0 if and only if all three terms equal 0. Since , both and are equal to 1 if and only if and ; meanwhile, because , is equal to 1 if and only if . Therefore, attains its unique maximum at . □
The proof of Theorem 2 is as follows.
Proof.
For any , let
From Lemmas A4 and A5, uniformly converges to ; furthermore, and are the maximum points of and , respectively; therefore, when , all tend to 1. So when the event occurs,
Furthermore, the parameter space is a compact set and is continuous, so for a sufficiently small , let , we have
Without loss of generality, take ; then holds with probability 1.
Therefore, the following holds:
In other words, when and , then . It can be shown that
i.e., Therefore, we obtain the desired result:
□
References
- Goldfeld, S.M.; Quandt, R.E. A Markov model for switching regressions. J. Econom. 1973, 1, 3–15. [Google Scholar] [CrossRef]
- Frühwirth-Schnatter, S. Finite Mixture and Markov Switching Models; Springer: New York, NY, USA, 2006; pp. 241–275. [Google Scholar]
- Hurn, M.; Justel, A.; Robert, C.P. Estimating Mixtures of Regressions. J. Comput. Graph. Stat. 2003, 12, 55–79. [Google Scholar] [CrossRef]
- Li, P.; Chen, J. Testing the Order of a Finite Mixture. J. Am. Stat. Assoc. 2010, 105, 1084–1092. [Google Scholar] [CrossRef]
- Chen, J.; Li, P.; Fu, Y. Inference on the Order of a Normal Mixture. J. Am. Stat. Assoc. 2012, 107, 1096–1105. [Google Scholar] [CrossRef]
- Huang, M.; Yao, W. Mixture of Regression Models With Varying Mixing Proportions: A Semiparametric Approach. J. Am. Stat. Assoc. 2012, 107, 711–724. [Google Scholar] [CrossRef]
- Huang, M.; Li, R.; Wang, S. Nonparametric Mixture of Regression Models. J. Am. Stat. Assoc. 2013, 108, 929–941. [Google Scholar] [CrossRef]
- Page, E.S. Continuous inspection schemes. Biometrika 1954, 41, 100–115. [Google Scholar] [CrossRef]
- Sen, A.; Srivastava, M.S. On tests for detecting change in mean. Ann. Stat. 1975, 3, 98–108. [Google Scholar] [CrossRef]
- Sen, A.; Srivastava, M.S. Some one-sided tests for change in level. Technometrics 1975, 17, 61–64. [Google Scholar] [CrossRef]
- Hawkins, D.M. Testing a sequence of observations for a shift in location. J. Am. Stat. Assoc. 1977, 72, 180–186. [Google Scholar] [CrossRef]
- James, B.; James, K.; Siegmund, D. Tests for a change-point. Biometrika 1987, 74, 71–83. [Google Scholar] [CrossRef]
- Srivastava, M.S.; Worsley, K.J. Likelihood ratio tests for a change in the multivariate normal mean. J. Am. Stat. Assoc. 1986, 81, 199–204. [Google Scholar] [CrossRef]
- Basseville, M.; Nikiforov, I. Detection of Abrupt Changes: Theory and Applications; Prentice Hall: Hoboken, NJ, USA, 1993. [Google Scholar]
- Wang, G.; Zou, C.; Yin, G. Change-point detection in multinomial data with a large number of categories. Ann. Stat. 2018, 46, 2020–2044. [Google Scholar] [CrossRef]
- Xia, Z.; Qiu, P. Jump information criterion for statistical inference in estimating discontinuous curves. Biometrika 2015, 102, 397–408. [Google Scholar] [CrossRef]
- Bai, J. Least squares estimation of a shift in linear processes. J. Time Ser. Anal. 1994, 15, 453–472. [Google Scholar] [CrossRef]
- Baranowski, R.; Chen, Y.; Fryzlewicz, P. Narrowest-over-threshold detection of multiple change points and change-point-like features. J. R. Stat. Soc. Ser. Stat. Methodol. 2019, 81, 649–672. [Google Scholar] [CrossRef]
- Follain, B.; Wang, T.; Samworth, R.J. High-dimensional changepoint estimation with heterogeneous missingness. J. R. Stat. Soc. Ser. Stat. Methodol. 2022, 84, 1023–1055. [Google Scholar] [CrossRef]
- Drabech, Z.; Douimi, M.; Zemmouri, E. A Markov random field model for change points detection. J. Comput. Sci. 2024, 83, 102429. [Google Scholar] [CrossRef]
- Ratnasingam, S.; Gamage, R. Empirical likelihood change point detection in quantile regression models. Comput. Stat. 2025, 40, 999–1020. [Google Scholar] [CrossRef]
- Gong, Y.; Dong, X.; Zhang, J.; Chen, M. Latent evolution model for change point detection in time-varying networks. Inf. Sci. 2023, 646, 119376. [Google Scholar] [CrossRef]
- Kiefer, J. K-sample analogues of the Kolmogorov-Smirnov and Cramér-Von Mises tests. Ann. Math. Stat. 1959, 30, 420–447. [Google Scholar] [CrossRef]
- McLachlan, G.; Peel, D. Finite Mixture Models; Wiley: New York, NY, USA, 2000. [Google Scholar]
- Billingsley, P. Convergence of Probability Measures, 2nd ed.; Wiley: New York, NY, USA, 1999. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).