Robust Algorithms for Change-Point Regressions Using the t-Distribution

Abstract: Regression models with change-points have been widely applied in various fields. Most methodologies for change-point regressions assume Gaussian errors. For many real datasets having longer-than-normal tails or atypical observations, the use of normal errors may unduly affect the fit of change-point regression models. This paper proposes two robust algorithms, called EMT and FCT, for change-point regressions, which incorporate the t-distribution with the expectation and maximization (EM) algorithm and with a fuzzy classification procedure, respectively. For better resistance to high leverage outliers, we introduce a modified version of the proposed method that fits the t change-point regression model to the data after moderately pruning high leverage points. The selection of the degrees of freedom is discussed. The robustness properties of the proposed methods are also analyzed and validated. Simulation studies show the effectiveness and resistance of the proposed methods against outliers and heavy-tailed distributions. Extensive experiments demonstrate that the t-based approach is preferable to normal-based methods for its better robustness and computational efficiency. EMT and FCT generally work well, and FCT consistently yields less biased estimates, especially in cases of data contamination. Real examples show the need for and the practicability of the proposed methods.


Introduction
In regression analyses, the relationship between the response and explanatory variables often changes abruptly at some positions, termed change-points (CPs) or thresholds. Change-point (CP) locations signify the places where the data structure changes, which is usually pivotal for decision making or for understanding certain scientific issues. Hence, CP regression models have been widely used for uncovering the meaning of CPs and regression coefficients in many fields, such as engineering, economics, medicine, climatology, genomics and linguistics. Several real examples motivate CP detection. In medicine, CPs often denote threshold values at which the effect of some risk factor on the response is altered. For example, the risk of a baby having Down Syndrome (DS) at birth is low and unrelated to the mother's age until an age threshold, but afterward the DS risk grows steeply with the mother's age [1]. Another use of CP models in medical studies is for monitoring patients' health, such as detecting heart rate trends [2] or inspecting the activity of epileptics [3]. Besides, in econometrics, CPs often signal the time of market structure changes as a result of government interference or policy changes [4]. More applications include discovering the time of changes in the temperature trend for studying the global warming effect in climatology [5] and finding changes in the DNA copy number in genomics [6].
Outliers frequently occur in regression data due to measurement errors or the improper sampling of a fraction of the data from a different population. Outliers can distort the model fitting, and they have even greater impacts on CP regressions since they may not only distort the estimates of the model parameters but also twist the segmented structure of the data. Fearnhead and Rigaill [6] provided a real example illustrating the CP detection errors caused by outliers. They remarked that many CP methods are sensitive to outliers because of the modeling assumption of Gaussian noise, with much work on CP regressions being carried out for Gaussian models, such as in [7,10,11]. Despite the fact that such distortions very likely result from outliers, and despite the considerable concern for robust procedures in the CP detection literature, there are few studies on robust estimation for CP regressions.
In regression analysis, the two common approaches to the outlier problem are outlier diagnostics and robust procedures. The robust approach is the more recommended [12,13], since the diagnostic approach has several drawbacks: it may fail due to the masking effect, the probability distributions of the resulting statistics are unknown, and the variances of the estimators may be underestimated. Hence, in this study, we aim to find a new robust procedure for CP regressions. Here, we regard an estimation method as robust in the sense of insensitivity to outliers and heavy-tailed distributions.
In this paper, we propose a robust CP regression model using t-distributions. The longer tail of the t-distribution makes it a more robust approach to fitting CP regression data because atypical observations are given less weight in the computation of the CPs and regression parameters. With the t CP regression approach, the normal distribution in each regime is embedded in a broader class of symmetric distributions with an additional parameter, the degrees of freedom r. As commented by Lange et al. [14] and Peel and McLachlan [15], the parameter r is helpful for tuning the robustness of the t-based method, so it is crucial for achieving resistant estimations. The proposed method treats CPs as latent class variables representing the data points lying in the respective regimes, so the new method is free of the difficulty of the non-differentiability of the log-likelihood function with respect to the CPs regarded as parameters. The computations with the t log-likelihood are simplified by writing the t-distribution as a scale mixture of normal distributions. Through two latent variables introduced into the observed data, the expectation and maximization (EM) algorithm is employed to estimate the CP model.
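To make this downweighting concrete, the following sketch (our illustration, not code from the paper) computes the standard E-step weight for t-models, w = (r + 1)/(r + δ²), where δ is the standardized residual of an observation (cf. [14]):

```python
def t_weight(residual, sigma, r):
    """E-step weight under a t(r) error model; atypical observations
    (large standardized residuals) get small weights."""
    delta2 = (residual / sigma) ** 2
    return (r + 1.0) / (r + delta2)

w_typical = t_weight(1.0, 1.0, r=1)    # 2/2   = 1.0
w_outlier = t_weight(10.0, 1.0, r=1)   # 2/101 ≈ 0.02
```

A typical point (residual near one scale unit) keeps essentially full weight, while a 10-sigma outlier is downweighted by roughly a factor of fifty; smaller degrees of freedom downweight more aggressively.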
Considering the indefiniteness of the demarcation between two adjacent pieces, we extend the crisp CP model to a fuzzy class model. We create a fuzzy classification t (FCT) method which is more robust than the t-based EM-type algorithm. To make the t CP model more tolerant of high leverage outliers, we further propose a modified version to first moderately prune high leverage points and then employ the new method to the pruned dataset. The rules for deciding the proportion of pruning and the degrees of freedom of t-distributions are also discussed. Through numerous simulations and real applications, we demonstrate the effectiveness and superiority of the proposed methods by comparing them with existing methods.
The rest of this paper is organized as follows. In Section 2, we present some related work in the literature. In Section 3, we develop the new methods and illustrate the determination of the degrees of freedom of the t-distribution, the resistance to high leverage outliers and the robust properties of the proposed methods. In Section 4, experiments with numeric and real examples are provided to show the effectiveness and superiority of the proposed methods. Finally, the discussion and conclusions are stated in Section 5.

Related Work
Since change-point (CP) problems are often encountered in practice, numerous methodological approaches have been implemented for estimating CP models, such as likelihood-based estimation, Bayesian analyses, grid-searching approaches and clustering methods. A main difficulty with likelihood estimation for CP regressions is the non-differentiability of the likelihood function with respect to the CPs viewed as parameters.
Researchers have tried to overcome this problem through various approximations or smooth transitions between two adjoining regimes. For example, Muggeo [16] created a segmented (SEG) regression method by means of a simple linearization technique. This method allows simultaneous inference on the CPs and regression coefficients, but it is not resistant to outliers and only workable for continuous models. Recently, Lu and Chang [10] conquered the non-differentiability problem by transforming the CP regression model into a mixture regression model and introduced an EM-BP algorithm to derive estimates. There have been more recent methods for solving CP problems. Chakar et al. [17] considered the infeasibility of the dynamic programming algorithm for dependent series: they proposed a robust estimator of the autocorrelation parameter to de-correlate an autoregressive time series, after which the classical inference for independent series can be utilized to estimate the CPs in the autoregressive series.
There are some recent Bayesian CP analyses. For instance, Ko et al. [18] proposed the Dirichlet process hidden Markov model (DPHMM) to detect CPs without requiring the specification of the number of CPs a priori. Bardwell and Fearnhead [19] presented a Bayesian approach to detect abnormal regions in multiple time series, allowing the presence of changes in a small subset of the time series.
Nonparametric CP methods were also suggested for data which cannot be well-approximated by any parametric distribution, such as in the works of Zou et al. [20] and Haynes et al. [21]. Moreover, Frick et al. [7] considered the question of search accuracy in CP analysis. They introduced a simultaneous multiscale CP estimator (SMUCE) that provides estimates of an unknown step function while controlling the probability of over- and underestimating the number of CPs. Pein et al. [8] further proposed a heterogeneous simultaneous multiscale change-point estimator (H-SMUCE) for a heterogeneous piecewise constant model. More recently, various researchers have focused on developing computationally efficient methods [22,23]. For more reviews of CP work, we refer to the articles [24,25] and the references therein.
In spite of the recent rapid developments in various aspects of CP detection, there is little work on robustness to the presence of outliers [6,24]. To alleviate the disturbance of outliers, data may be pre-processed to remove them; however, after editing the suspected outliers, the subsequent analysis is still based on the normality assumption. A serious problem with this approach is that the resulting inferences may fail to reflect the uncertainty in the "cleaned" data; in particular, standard errors may be too small.
The least absolute deviations (LAD) estimation has been used by a few researchers to estimate CP regression models robustly [26,27]. A robust approach connected with the LAD procedure is to assume that the regression errors have a Laplace distribution; Yang [28] and Jafari et al. [29] employed Laplace distributions to estimate CP regression models robustly. The restriction to single-CP detection limits the applicability of these methods because multiple CPs often exist in reality. Moreover, although the LAD method is more robust than least squares (LS) estimation, optimizing its objective function is computationally intensive [25].
M-estimation is another popular robust approach in linear regressions, which replaces the least squares criterion with a resistant criterion. Considering the robustness of M-estimation, Fearnhead and Rigaill [6] proposed to associate the penalized cost approaches with a less sensitive loss function for estimating CPs robustly. The authors claimed that a bounded loss, such as the bi-weight loss, is favorable for robust estimations. However, their method relies on a constant variance assumption which is unsuitable in many applications [8]. Recently, a robust estimator based on the Wilcoxon statistic was proposed for estimating the CP of the mean [30], but this approach is restricted to a single CP detection.
The t-distribution has also been employed for the robust fitting of statistical models. For example, Lange et al. [14] proposed using t-distributions for the robust fit of a traditional linear regression model. Peel and McLachlan [15] proposed a mixture model of t-distributions for robust clustering. Yao et al. [31] presented a t mixture regression model for estimating model parameters robustly. Lin et al. [32,33] presented tests of heteroscedasticity in Student-t and skew-t-normal regression models, respectively. Although t-distributions have been used in mixture regressions to make resistant estimations, they have barely been considered for CP regression problems. Though locating the CPs in a piecewise regression model is similar to classifying data into classes of similar objects, CP regression problems differ from mixture regressions in some respects. For example, the data points in CP regressions are ordered according to some partitioning variable, and hence there are restrictions on the memberships of the data points belonging to each individual segment. Consequently, the robust mixture regressions of t-distributions cannot be applied to CP regressions directly. The rare use of t-distributions in CP regressions may be because the t likelihood function is non-differentiable with respect to the CPs and there is no explicit form for the maximizer of the t log-likelihood.
A few studies have utilized t-distributions to solve CP regression problems. Osorio and Galea [34] located a single CP for t regression models based on the Schwarz Information Criterion (SIC). Their method is restricted to single-CP detection, and the degrees of freedom of the t-distributions are not well-estimated. Lin et al. [35] adopted a Bayesian approach to identify the positions of changes in variance for a t regression model with the regression coefficients assumed to be constant. The assumption of constant regression coefficients is restrictive and unrealistic. Furthermore, their computational burden is heavy because six steps are required in one pass of the Gibbs sampler to locate a single CP, not to mention detecting multiple CPs. That is why they suggested detecting multiple CPs one by one in a hierarchical binary way instead of simultaneously. However, the hierarchical binary method for locating multiple CPs may not reach the true optimum [11]. Moreover, these researchers did not illustrate the resistance of their method against high leverage outliers. It is known that t-distributions are sensitive to high leverage outliers, and these kinds of outliers usually have severe effects on the fit of a regression model to the data.

An EMT Algorithm for Change-Point Regression Models
Assume a dataset {(x_j, y_j), j = 1, …, n} is to be fitted by the CP regression model in Equation (1) with c − 1 CPs, where the error of the ith segment follows a standard t-distribution with density h(t | r_i) and r_i degrees of freedom (DF), denoted t(r_i). We first suppose that the r_i are known; their choice is discussed later. The complete log-likelihood is given in Equation (4). To simplify the computations, we express the t-distribution as a scale mixture of normal distributions (cf. [12,28]): we introduce into the data vector additional missing data v = (v_1, …, v_n) such that, given v_j, the jth error is normally distributed with its variance scaled by 1/v_j, and v_j follows a gamma(r_i/2, r_i/2) distribution, where gamma(α, λ) denotes the gamma density.

For the E step of EM, given the current estimate θ^(0), we need the conditional expectation of the class indicator z_ij, denoted p*_ij and given in Equation (6), where the superscript 0 indicates that the parameters are evaluated at θ^(0); the quantity ε_τ^(k) is computed by Equation (5) with the density given by Equation (8). Replacing z_ij in Equation (4) with p*_ij gives the expected complete log-likelihood. Moreover, the gamma distribution is the conjugate prior distribution for v_j, so it is easy to show that the conditional expectation of v_j has a closed form. The M step then maximizes the expected complete log-likelihood: taking partial derivatives and setting them to zero yields weighted least-squares updates for the regression coefficients, in which observations with large scaled residuals receive small weights. Repeating the E and M steps until the stopping criterion is met, we obtain the estimator of the CPs. Based on the above, we propose the EMT Algorithm 1 for estimating CP regression models, as follows.
Step 5. Update the conditional expectations by Equation (5) with the density given by Equation (8).
Step 6. If the convergence criterion is met, then stop; else, set k = k + 1 and go to Step 2.
After the EMT algorithm converges, the estimates of the c − 1 CPs are given by the converged CP collection.
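Within one segment, the combined E and M updates reduce to iteratively reweighted least squares. The following minimal sketch (our illustration, not the paper's Algorithm 1: it assumes a single segment and omits the CP-membership updates) shows these mechanics:

```python
import numpy as np

def t_regression_em(X, y, r=1.0, n_iter=50):
    """Minimal EM (IRLS) fit of y = X @ beta + e with t(r) errors.
    Only the within-segment mechanics of the EMT idea are sketched."""
    n = len(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # LS start
    sigma2 = np.mean((y - X @ beta) ** 2)
    for _ in range(n_iter):
        res = y - X @ beta
        w = (r + 1.0) / (r + res ** 2 / sigma2)      # E-step weights
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)   # weighted LS (M step)
        sigma2 = np.sum(w * (y - X @ beta) ** 2) / n
    return beta, sigma2

# Synthetic line y = 1 + 2x with 5 gross outliers among 80 points.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 80)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 80)
y[:5] += 20.0
X = np.column_stack([np.ones_like(x), x])
beta_t, _ = t_regression_em(X, y, r=1.0)             # stays near (1, 2)
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]       # pulled by outliers
```

With r = 1, the five contaminated points receive tiny weights and the fitted coefficients remain close to the true values, while ordinary least squares is visibly pulled toward the outliers.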

A FCT Algorithm for t CP Regression Models
Since the EM algorithm converges slowly when the data involve noises (cf. [36]), we extend the crisp model to a fuzzy class model to improve the convergence rate and the precision of the estimations. Similar to Lu and Chang [10], using the fuzzy partitioning concept, we regard each probable CP combination as a partition of the data with a fuzzy membership and then transform these memberships into pseudo memberships of the data points belonging to each individual segment. As described previously, consider a regression model with c − 1 CPs, as given in Equation (1). The CP model is now a fuzzy class model with a fuzzy c-partition, and the regression parameters can be estimated by maximizing the fuzzy classification maximum likelihood (FCML) function (cf. [10,37]). Having the regression estimates θ̂, we update the CP memberships α_τ as follows: given a CP collection τ, we define an objective function of the memberships and optimize it to obtain the updated α_τ. We propose the FCT Algorithm 2 for robustly fitting CP regression models, as follows.
Step 5. If the convergence criterion is met, then stop; else, set k = k + 1 and go to Step 2.
After FCT converges, the estimates of the c − 1 CPs are given by the converged CP collection.
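The role of the fuzziness index m can be sketched with the generic FCM-style membership update (our illustration, under the assumption that the dissimilarity of a point to a class is its negative log-density; the paper's actual update operates on CP collections rather than on individual points):

```python
def memberships(d, m):
    """FCM-style membership update: d[i] is the dissimilarity of one data
    point to class i (e.g., its negative log-density under class i);
    returns u with u[i] proportional to d[i] ** (-1 / (m - 1))."""
    inv = [di ** (-1.0 / (m - 1.0)) for di in d]
    s = sum(inv)
    return [v / s for v in inv]

# A point that fits class 1 well vs. an outlier that fits neither class.
good, outlier = [0.5, 5.0], [50.0, 52.0]
u_good = memberships(good, 2.0)     # ≈ [0.91, 0.09]
u_out = memberships(outlier, 2.0)   # ≈ [0.51, 0.49] -- ambiguous
# The FCML objective weights each point by u**m, so the outlier's
# effective weight shrinks as m grows, while the good point's
# dominant membership keeps most of its weight.
w_out_m2 = max(u_out) ** 2
w_out_m3 = max(memberships(outlier, 3.0)) ** 3
```

An outlier that fits no segment well ends up with near-uniform memberships, and raising those memberships to the power m drives its effective weight in the objective toward zero; this is the mechanism by which a moderately larger m reduces the influence of outliers.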

Deciding the Degrees of Freedom of t-Distributions
In this section, we discuss the choice of the degrees of freedom (DF), r (cf. [28]). As is known, t(r) approaches the normal distribution as r becomes large; in practice, r need not be very large for near-normality to be achieved. Similar to [31], we set rmax = 15. After performing many simulations, we found that choosing r from a grid often fails to provide sufficiently precise CP estimates. However, numerous simulation results show that both EMT and FCT with r = 1 (EMT1 and FCT1) always produce rather precise CP estimates, whether or not the data are contaminated, and accurate CP estimates are critical to solving CP regression problems. To demonstrate that EMT1 and FCT1 are generally suitable for all kinds of data, we apply EMT and FCT with r = 1, 5, 10 and 15 to diversified datasets randomly generated from models with 1 and 2 CPs, with the data including 0, 5, 10 and 15 noise points, respectively. The models considered for generating the data are shown in Figure 1, with regression errors following a normal distribution N(0, 0.5²) (besides the cases in Figure 1, more situations and different kinds of models were also examined, but the results are similar, and hence we only report four cases here). We evaluate the performance based on R replications in each individual circumstance, with R equal to 1000 and 200 for the models with 1 and 2 CPs, respectively. The evaluation criteria include the precision of the CP estimates and the mean squared error (MSE) of the regression coefficient estimates, MSE(β̂_ij) = (1/R) Σ_r (β̂_ij(r) − β_ij)², where β̂_ij(r) is the estimate of β_ij in the rth replication. The normal-based EM-BP algorithm for CP regressions presented in [10] is also employed, renamed EMN, for comparison. For illustrative purposes, the true regression coefficients and one of the simulated series are shown in Figure 1. Table 1 summarizes the simulation results.
Table 1 demonstrates that EMT1 and FCT1 generally perform better than their counterparts with other degrees of freedom (DF), giving more precise CP estimates and smaller MSEs for the regression coefficients in all cases, especially when the data are contaminated. For clean data, such as in Figure 1a, all EMTs and FCTs with different DF provide adequate and comparable estimates of the CPs and regression coefficients. However, for data containing noisy points, such as the cases of Figure 1b,d, only EMT1 and FCT1 work well throughout, while the other EMTs and FCTs mostly fail to provide adequate CP estimates. As illustrated by Peel and McLachlan [15], the use of t components provides less extreme estimates of the posterior probabilities of the segment memberships of the CP model. The longer tail of the t-distribution makes the fitted CP models more robust by giving reduced weight to atypical observations in the calculation of the CP estimates. Furthermore, the t model-based approach embeds the normal distribution of each segment in a wider class of elliptically symmetric distributions, with the additional DF parameter being key for tuning robustness. This explains why the long tail of the t(1)-distribution makes EMT1 and FCT1 more robust than the variants with higher DF. Even for normal data with no contamination, EMT1 and FCT1 still perform as well as EMN because of the similar symmetric shapes of the normal and t-distributions. Moreover, that EMT1 and FCT1 are the most robust among all EMTs and FCTs can also be seen from the robustness properties of the proposed methods illustrated in Section 3.5. Thus, one may first fit the data using EMN, and EMT and FCT with several different DF, such as r = 1, 5, 10 and 15. If the estimates from EMN, the EMTs and the FCTs are close, one may use EMN, considering the data to be clean and normally distributed; otherwise, EMT1 and FCT1 should be adopted.
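The tail behavior behind these results can be checked numerically. The snippet below (our illustration of the standard densities, not code from the paper) compares the density that t(r) and the normal distribution assign to a point 4 standard units from the center:

```python
import math

def t_pdf(x, r):
    """Density of a standard t-distribution with r degrees of freedom."""
    c = math.gamma((r + 1) / 2) / (math.sqrt(r * math.pi) * math.gamma(r / 2))
    return c * (1 + x * x / r) ** (-(r + 1) / 2)

def norm_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Relative plausibility of a 4-unit observation under t(r) vs. normal.
ratios = {r: t_pdf(4.0, r) / norm_pdf(4.0) for r in (1, 5, 10, 15)}
# ratios[1] is in the hundreds; ratios[15] is already in single digits,
# consistent with t(15) being nearly normal.
```

A t(1) model finds a 4-unit point two orders of magnitude less surprising than a normal model does, so such a point exerts far less pull on the fit; by r = 15 the ratio has largely collapsed, which is why rmax = 15 suffices as an upper bound for the grid.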

Resistance against High Leverage Outliers
Similar to the M-estimates for linear regressions, the proposed EMT and FCT are still susceptible to high leverage outliers. Here, high leverage outliers refer to outliers in the space of the independent variables, since points far from the main body of points in the X space have high leverage. Modifications are therefore needed. In regression analyses, the leverage of an observation (x_j, y_j) is often measured by the squared Mahalanobis distance MD_j² = (x_j − x̄)ᵀ S⁻¹ (x_j − x̄), where x̄ and S are the sample mean and sample covariance of the x_j's, respectively. However, x̄ and S are not resistant to outliers, and leverage points may go unrecognized because of the effect of other high leverage points [34]. To lessen this masking effect, we replace x̄ and S with the minimum covariance determinant (MCD) estimators [38], computed from the subset of h observations whose covariance matrix has the smallest determinant; it is known (cf. [40]) that h = ⌊(n + p + 1)/2⌋ should be utilized to obtain the largest breakdown value for the MCD estimators. Using the fast MCD (FastMCD) algorithm of Rousseeuw and Van Driessen [41] with the small-sample corrections proposed by Pison et al. [42], the dataset is trimmed by removing the top α·100% of the data sorted in descending order of leverage, and EMT/FCT is then applied to the trimmed data. Since the proposed EMT/FCT is only sensitive to outliers with rather high leverages, a small value of α, such as 0.05, is usually enough in practice. In short, to advance the ability of the proposed algorithms to withstand outliers when the data involve doubtful high leverage outliers, we suggest employing EMT/FCT after trimming the observations whose modified MDs rank in the top α·100%, using a small α.
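A rough sketch of this trimming step follows (our simplified illustration: random elemental subsets refined by C-steps stand in for the FastMCD algorithm, and all function names are ours):

```python
import numpy as np

def robust_leverage_trim(X, alpha=0.05, n_trials=200, seed=0):
    """Trim the alpha fraction of rows of X with the largest robust
    Mahalanobis distances.  Location/scatter come from a crude MCD
    approximation; a production version would use FastMCD [41]."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = (n + p + 1) // 2                       # maximal-breakdown subset size
    best_det, best_mu, best_S = np.inf, None, None
    for _ in range(n_trials):
        idx = rng.choice(n, size=p + 2, replace=False)
        mu, S = X[idx].mean(axis=0), np.cov(X[idx].T).reshape(p, p)
        for _ in range(5):                     # C-steps: refit on h closest
            d2 = _mahal2(X, mu, S)
            keep = np.argsort(d2)[:h]
            mu, S = X[keep].mean(axis=0), np.cov(X[keep].T).reshape(p, p)
        det = np.linalg.det(S)
        if det < best_det:
            best_det, best_mu, best_S = det, mu, S
    d2 = _mahal2(X, best_mu, best_S)
    keep = d2 <= np.quantile(d2, 1 - alpha)
    return X[keep], keep

def _mahal2(X, mu, S):
    """Squared Mahalanobis distances of the rows of X from (mu, S)."""
    diff = X - mu
    inv = np.linalg.inv(S + 1e-9 * np.eye(len(S)))
    return np.einsum('ij,ij->i', diff @ inv, diff)
```

For example, with 95 bivariate standard normal points plus 5 planted points near (10, 10), the planted leverage points receive by far the largest robust distances and are exactly the ones removed at α = 0.05, whereas the classical mean and covariance would be inflated by them.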

The Robust Properties of EMT and FCT
In this section, we employ the influence function (IF) and the gross error sensitivity to illustrate the ability of the proposed algorithms to withstand outliers; these two criteria are frequently used to measure robustness in statistics [13]. Suppose y_1, …, y_n are observed data for estimating the parameter θ. An M-estimator is created by minimizing an objective of the form Σ_j ρ(y_j; θ), where ρ is a function measuring the difference between the assumed distribution and the true one; equivalently, the M-estimator solves the equation Σ_j ψ(y_j; θ) = 0, where ψ is the derivative of ρ. The loss function associated with the MLE obtained by EMTr (r denotes the DF) is equivalent to ρ(y; θ) = ((r + 1)/2) log(1 + (y − μ)²/(rσ²)), and hence ψ(y; θ) = (r + 1)(y − μ)/(rσ² + (y − μ)²). We only need to analyze ψ(y; θ), because the IF is proportional to ψ(y; θ) according to Equation (24). Note that ψ(y; θ) approaches 0 as y approaches positive or negative infinity (Equation (28)). The maximum and minimum values of ψ(y; θ) can also be derived by solving ∂ψ(y; θ)/∂y = 0 (Equation (29)). Therefore, the ψ(y; θ) in Equation (27) is bounded and continuous, as exhibited in Figure 2a. Furthermore, Figure 2a shows that for y values away from the center, the ψ(y; θ) of EMT1 is closest to 0 among all EMTr, which means that EMT1 is the most robust among all EMTr.
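To make the boundedness explicit, one can write out the score function associated with the t(r) loss and its stationary points (a short derivation in our own notation, consistent with the loss above):

```latex
\psi(y;\theta)=\frac{(r+1)(y-\mu)}{r\sigma^{2}+(y-\mu)^{2}},\qquad
\frac{\partial\psi}{\partial y}
=\frac{(r+1)\bigl(r\sigma^{2}-(y-\mu)^{2}\bigr)}{\bigl(r\sigma^{2}+(y-\mu)^{2}\bigr)^{2}}=0
\;\Longleftrightarrow\; y=\mu\pm\sigma\sqrt{r},
```

so that |ψ(y; θ)| ≤ (r + 1)/(2σ√r). Over r ≥ 1 this bound is minimized at r = 1, since its derivative in r is proportional to (r − 1), which again indicates that EMT1 has the smallest gross error sensitivity among all EMTr.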
Besides, the plot of the IF for the median is displayed in Figure 2b for comparison. The influence of an excessively large or small y on the median is constant, whereas the influence of an extremely large or small y on the estimator derived by EMT1 is tiny according to Equation (28) and Figure 2a. In fact, Equation (28) implies that an extremely large or small y can be viewed as an observation coming from a different but unknown distribution and having no influence (i.e., IC(y; F, θ) = 0) on our estimator. The maximum influence of an observation y on our estimators can be derived by solving Equation (29). Hence, the proposed estimator has a bounded and continuous IF and a finite gross error sensitivity. Thus, the t CP model-based approach is robust from the point of view of robust statistics, and EMT1 and FCT1 are more robust than their counterparts with DF greater than 1.

Simulation Studies
The comparisons are based on three models, each under four scenarios, using sample sizes of 50 and 75 with 1000 and 200 replicates per simulation run for the 1-CP and 3-CP models, respectively. For illustrative purposes, Figure 3 displays the true coefficients of the models used and one simulated series for each, including the generated data, fitted lines and noise points. The four scenarios considered are similar across the three models and include the cases of clean data, data with background noises and data contaminated by high leverage outliers. The evaluation criteria include (1) the MSE of the CP estimates, MSE(τ̂) = (1/R) Σ_k (τ̂_k − τ)²,
where τ̂_k is the CP estimate in the kth replication, and (2) the sum of the MSEs of the regression coefficient estimates (MSES-reg), as given by Equation (21). Table 2 reports the results.

Table 2 shows that EMT1 and FCT1 perform better than, or at least comparably to, the other methods in all cases of all three models, except when the data are contaminated by leverage outliers. For those cases, the trimmed versions, EMT1-tm and FCT1-tm, work well. In particular, FCT1 usually performs better than EMT1 because the influence of outliers can be greatly reduced by moderately increasing the fuzzy index m used in FCT1. Further findings are as follows:
1. For datasets with no atypical observations, all EM-type and FCT methods (including EMN, the EMT variants, EMT1-tm, FCT1 and FCT1-tm) work well for all three models, with FCT1 performing best, giving less biased CP estimates and smaller MSES-reg for the regression coefficients.
2. RSE works well for the 1-CP continuous model (model b). However, RSE is not workable for models with discontinuity (model a) or multiple CPs (model c), situations which frequently occur in practice.
3. For data with background noises, such as cases a.2, b.2 and c.2, only the proposed t-based approach works well in general; in particular, FCT1 provides the most precise estimates.
4. For datasets with extremely high leverage outliers, such as cases a.3, b.3 and c.3, no method works well except EMT1-tm and FCT1-tm. In these situations, both trimmed versions provide effective estimates of the CPs and regression coefficients for each model, with FCT1-tm producing less biased estimates on the whole.
5. Across extensive simulations, the iteration time taken by EMT1 and FCT1 is always less than that of the other EM-based methods in all cases, indicating that the proposed method is the most computationally economical.
In particular, it is even more time-saving than the normality-based EMN.
Next, we show that the proposed algorithms are more tolerant of extreme outliers than the existing methods, based on model b in Figure 3. Four cases of extreme contamination occurring at different locations are considered, with a simulated sample shown in Figure 4.
Note that in all four cases, the distance of the observed y-value from the predicted y-value is larger than 6 standard deviations, and the percentage of extreme outliers is more than 9.1% in each case. The estimation results are shown in Table 3. Similar to the previous comparisons, Table 3 shows that FCT1 performs best in all four cases of extreme contamination. In particular, FCT1 and FCT1-tm are much more resistant against radical outliers than the other methods, providing estimates as precise as those obtained from clean data. Specifically, in cases 1 and 4, where the drastic outliers (0, 7) and (10, 12) have rather high leverages and their responses are 12 standard deviations away from the predicted values, the FCT1 methods are still hardly affected and produce adequate estimates even though the extreme outliers account for almost 10% of the whole dataset. Moreover, as explained earlier, FCT1 is more robust than EMT1 in that the fuzzy partitioning and the fuzziness index m help reduce the influence of outliers by giving tiny weights, under a large m, to the exceptional outliers. Table 3 demonstrates again that the normal-based method, EMN, cannot work well in the presence of extreme outliers. Additionally, while RSE is quite robust to outliers with moderate leverages, it cannot provide satisfactory regression estimates for data involving drastic outliers with fairly high leverages.
In summary, the proposed FCT1 method is very robust to outliers, even when the dataset includes extreme outliers whose observed responses lie more than 6 standard deviations from the predicted values. Furthermore, FCT1 can still produce adequate estimates for datasets containing radical outliers with quite high leverages. More importantly, FCT1 tolerates extreme outliers accounting for about 10% of the total data in all four cases of the experiment. In practice, outliers more than 6 standard deviations from the regression line are unusual and very likely constitute less than 10% of the whole dataset. Thus, the proposed FCT1 is generally practicable in real applications.

Real Data Applications
Example 1. We applied the proposed methods to a dataset of 107 terrestrial animals gathered by Garland [45] for analyzing the relationship between the maximal running speed (MARS) and body size. It is understandable that animals of different sizes reach different maximal running speeds. McMahon [46] commented that animals are built for elastic similarity and that the MARS of animals is proportional to the fourth root of body mass, (mass)^(1/4), among elastically similar animals. The plot in Figure 5 shows that a slope change occurs in the relationship between MARS and (mass)^(1/4). Additionally, several exceptionally low points appear near the place where the regression slope changes. It is conceivable that these suspicious points pull down the regression line and hence have a substantial influence on the regression slope.
We respectively apply EMN, EMT and FCT to the MARS data, setting DF = 1. The fitting results in Figure 5 show that EMT produces a much smaller CP estimate, at 25, and the line on the right-hand side becomes flatter due to the downward pull exerted by those extremely low points (see the blue line fitted by EMT in Figure 5). This evidence indicates that FCT1 is much more resistant against those extraordinarily low points than EMN and EMT. Based on the previous simulation results showing that FCT1 usually provides less biased estimates than the other models, we believe that the estimated model provided by FCT1 is reasonable and best suited to the MARS data. The MARS data can be obtained from [45].

Example 2. We applied the proposed methods to a bedload transport dataset (cf. Ryan and Porth [47]) to study the relationship between the discharge rate (m³/s) and the rate of bedload transport (kg/s). Bedload transport measures the transportation of particles in a flowing fluid along a bed. Bedload transportation is often described as occurring in two phases, starting from a relatively slow stage (Phase I) and moving to a faster stage (Phase II); the transition point is the main interest of researchers studying bedload transportation. Ryan and Porth [47] had fitted the dataset with a piecewise regression model and found it difficult because few data were collected at higher flows. The scatter plots in Figure 6 show that the transportation rate tends to increase faster with the water discharge after the discharge rate goes beyond 1.5 m³/s. Furthermore, the two points on the upper right with extraordinarily high transportation rates are leverage points. We apply EMN, EMT1, EMT1-tm, FCT1 and FCT1-tm to the data. Figure 6 shows that EMN and EMT1 are heavily affected by the two high leverage points, and both give an identical CP at 74 (discharge rate, DR = 1.81354), which is irrational because only two points would be included in Phase II. By contrast, the other three methods are much more robust, resulting in the same CP at 62 (DR = 1.5786) and much lower slope estimates for the second line, especially FCT1 and FCT1-tm.
Particularly, the slope of the second line obtained by FCT1-tm is much smaller than those obtained by the other methods. These facts illustrate that FCT1 and FCT1-tm are more resistant to unusual observations; in particular, FCT1-tm is capable of withstanding high leverage outliers. The resulting CP estimate, 62, looks reasonable, and it is close to the finding of Zhang and Li [48]. To derive more accurate estimates for Phase II, as suggested by Ryan and Porth [47], more data at faster flows are required.

Example 3. We also applied the proposed methods to the return series analyzed by Lin et al. [35]. However, the box plots in Figure 7b show that the last three segments obtained by EMN contain extreme outliers. Hence, the CPs estimated by EMN are suspect, owing to its sensitivity to outliers and heavy tails. Moreover, the estimated CPs of Lin et al. were obtained by binary hierarchical segmentation, which does not generally yield the true optimum (cf. [11,49]). On the other hand, EMT1 and FCT1 produced three CPs, at 21, 32 and 108 and at 21, 32 and 151, respectively, in which the second segment contains only 11 observations and the difference in variation between the first two segments is not statistically significant. Furthermore, the scree test for choosing the number of CPs (see the discussion in Section 5) suggests that the optimal number of segments is 3. The box plots in Figure 8 show that several outliers appear in the last two segments. According to the previous experiments, FCT1 is powerful in resisting atypical observations and is more robust than EMT1. Thus, the CP estimates, 33 and 151, provided by FCT1 should be the optimal times of variance changes in the return series.

Discussion
Computation cost is usually a concern in CP detection methods. The capacity of a robust method to resist outliers for different sizes of datasets is important. For most statistical methods, it is harder to achieve robustness with a small dataset because the estimates are easily affected by anomalous observations when data are scarce. To show the high robustness of the proposed methods even for small samples, we used datasets consisting of tens or hundreds of data points in the experiments. When fitting a regression model with c − 1 CPs to a dataset of size n by EMT or FCT, we consider C(n − 1, c − 1) potential collections of CPs simultaneously. Therefore, the computational efficiency can be impeded as the size of the dataset or the number of CPs increases. In fact, besides the sample size and the number of CPs, other factors may influence the computational cost, such as the model variances, the model structure (e.g., continuity or discontinuity at the CPs), and the discrepancies between the models of two contiguous segments. According to our simulations, executed on a computer with an Intel Core i7-7700HQ CPU at 2.80 GHz and 8 GB of RAM, the proposed methods are suitable for datasets of sizes no more than 2000, 1000 and 500 for locating one, two and three CPs, respectively. For large datasets, a possible approach is to cut the sample down to a favorable size and then derive preliminary estimates from the reduced data. One possible way of reducing the data size is to remove the data of even rank in the dataset arranged in increasing order of the partitioning variable, repeating this reduction until the favorable size is reached. After deriving these rudimentary estimates, we can re-estimate the CPs and regression parameters by applying EMT again to the data near the preliminary CPs.
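The data-reduction step described above, repeatedly dropping the even-ranked observations in the ordering of the partitioning variable until a favorable size is reached, can be sketched as follows (a minimal illustration; the function name, the toy data and the target size of 2000 are ours, not the paper's):

```python
import numpy as np

def thin_data(x, y, target_size):
    """Repeatedly drop the even-ranked observations (in the ordering of x)
    until at most target_size points remain."""
    order = np.argsort(x)              # rank the data by the partitioning variable
    x, y = x[order], y[order]
    while len(x) > target_size:
        keep = np.arange(len(x)) % 2 == 0   # keep odd ranks (0-based even indices)
        x, y = x[keep], y[keep]
    return x, y

# Hypothetical example: halve a sample of 4000 points once.
x = np.linspace(0.0, 10.0, 4000)
y = 2.0 * x + np.random.normal(size=x.size)
x_red, y_red = thin_data(x, y, 2000)
```

The reduced sample preserves the ordering and the range of the partitioning variable, so the preliminary CP estimates remain on the original scale.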
The determination of the number of CPs is important in CP detection. However, it is difficult when likelihood-based tests for the number of CPs are unavailable, because the regularity conditions fail and little is known about the asymptotic distribution of the test statistics, even in Gaussian cases. Most CP studies adopt some information criteria or informal tests for deciding the number of segments. Here, we employed the idea of Hawkins's scree test [49] to investigate the progressive changes in the residual sum of squares, RSS(c), and in the mean squared error, MSE(c), where c is the number of segments fitted to the data. We also considered F(c) = −2lnL(c), where L(c) is the optimized likelihood obtained from a model of c segments fitted to the data. Additionally, Ciuperca's [50] generalized Bayesian information criterion (BIC) was also utilized.
The suggested indexes were applied to the four models shown in Figure 9, with the data contaminated by background noise, which includes outliers with quite high leverage. For data contaminated by high leverage outliers, the trimmed version with α = 0.05 was used, and the results were similar to those of the cases with no outliers; hence, we do not present the results for cases with high leverage outliers here. We report the results of both EMT1 and FCT1 in Table 5. For models including zero or one CP, all four suggested indexes (RSS, MSE, −2lnL and BIC) provide a correct c in each respective case. For models with multiple CPs, the BIC and −2lnL indexes fail to obtain the correct c, whereas the RSS and MSE indexes work well in all circumstances. It can be seen from Table 5 that both the RSS and MSE indexes decrease progressively for c less than the true value, and that the decrease becomes small once c goes beyond the true value. These two indexes give a clear indication of the optimal c. Therefore, the two suggested indexes, RSS and MSE, are useful for determining the number of segments. However, in some applications, the cut-point may not be so clearly indicated by these indexes. In such circumstances, one may fit the data with several possible values of c based on RSS and MSE, and then choose the best fitted model on the basis of some criteria in regression analysis. Although there is no solid inferential basis for the suggested method, it may suffice as a rough and practical tool. (Table 5 notes: MSE denotes the change in MSE, i.e., MSE(c − 1) − MSE(c); the optimal c is highlighted; extremely small numbers indicate unreasonable results due to using a wrong c.)
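The scree idea above, choosing c where the successive drops RSS(c − 1) − RSS(c) stop being substantial, can be sketched with a simple elbow rule. This is our own crude illustration, not the paper's procedure: the helper name, the drop-ratio threshold and the hypothetical RSS values are all assumptions.

```python
def choose_segments(rss, max_drop_ratio=0.1):
    """Pick the number of segments c from a scree of RSS values.

    rss: dict mapping c (number of fitted segments) to the fitted RSS(c).
    The chosen c is the last one whose drop RSS(c-1) - RSS(c) is still a
    sizeable fraction of the first drop (a simple elbow rule)."""
    cs = sorted(rss)
    drops = {c: rss[c - 1] - rss[c] for c in cs[1:]}
    first_drop = drops[cs[1]]
    best = cs[0]
    for c in cs[1:]:
        if drops[c] > max_drop_ratio * first_drop:
            best = c          # drop still substantial: keep adding segments
        else:
            break             # drop has flattened out: stop
    return best

# Hypothetical scree: large drops up to c = 3, negligible afterward.
rss = {1: 100.0, 2: 40.0, 3: 12.0, 4: 11.5, 5: 11.2}
print(choose_segments(rss))  # → 3
```

In practice one would compute RSS(c) by fitting the CP model for each candidate c; as the text notes, when the elbow is unclear, several candidate values of c should be compared with standard regression criteria.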
On the other hand, the penalized likelihood approach has been utilized in recent studies, such as [7,8], to infer both the number of CPs and their locations. For large datasets, one may consider penalized approaches, with a penalty added to the objective likelihood, to estimate the number and the locations of CPs simultaneously. However, selecting a proper penalty is not easy, because the penalty can have a substantial impact on the estimate of the number of CPs [28]. One may refer to Haynes et al. [51] for information on workable penalties.
In using the trimmed EMT, a small value of α was selected to remove the leverage points whose leverage scores rank in the top 100α%. Apparently, the choice of α affects the results of EMT1-tm and FCT1-tm. A formal statistical test is required to judge objectively whether a datum is a high leverage point based on a cut-point, namely the critical value resulting from a statistical test under a given significance level. In this way, high leverage points could be deleted adaptively rather than according to a subjectively fixed value. Moreover, high leverage points may present small residuals and hence provide important information for model fitting. Therefore, more research on incorporating the information of high leverage points with small residuals is also needed to judge high leverage points better. One possible method is to borrow the idea of monitoring introduced by Cerioli et al. [52]. First, the trimmed version with a small α is employed to derive robust estimates of the CPs and regression parameters. Then, the monitoring technique in [52] may be used to judge the trimmed points, so as to make the most of the dataset. A test statistic for outliers in CP regressions would best be based on the differences in the estimates of the CPs and model parameters between different samples. Further research is needed, and it will be covered in our future work.
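The trimming step, removing the observations whose leverage scores rank in the top 100α%, can be sketched as follows. This is a minimal sketch assuming a simple intercept-plus-slope design and leverage measured by the diagonal of the hat matrix; the function name and the toy data are ours.

```python
import numpy as np

def trim_high_leverage(x, y, alpha=0.05):
    """Drop the observations whose leverage (diagonal of the hat matrix)
    ranks in the top 100*alpha percent, before fitting the CP model."""
    X = np.column_stack([np.ones(len(x)), x])   # design matrix with intercept
    H = X @ np.linalg.pinv(X.T @ X) @ X.T       # hat matrix
    lev = np.diag(H)                            # leverage scores
    n_trim = int(np.ceil(alpha * len(x)))
    keep = np.argsort(lev)[: len(x) - n_trim]   # indices with lowest leverage
    keep.sort()                                 # preserve the original order
    return x[keep], y[keep]

# Toy data with two planted high leverage points.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
x[:2] = [8.0, -9.0]                             # extreme predictor values
y = 1.5 * x + rng.normal(size=100)
x_t, y_t = trim_high_leverage(x, y, alpha=0.05)
```

With α = 0.05 the two planted points fall in the trimmed top 5% and are removed; the t CP regression model would then be fitted to the reduced data.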

Conclusions
This paper proposed a new robust method for CP regressions using t-distributions. The t-distribution provides a longer-tailed substitute for the normal distribution and has an additional parameter, the DF, which is critical for tuning robustness. These favorable properties make the t CP regression model more tolerant of atypical observations. We considered the CPs as random variables and transferred them into latent class variables so that we could implement the EM algorithm to estimate the CPs and regression parameters simultaneously. Moreover, we rewrote the t-distribution as a scale mixture of normal distributions to simplify the computation of the t log-likelihood. We further extended the latent class model to a fuzzy class model; incorporating fuzzy clustering methods, we created the FCT algorithm, which makes more robust estimates for CP regressions than EMT. The proposed EMT and FCT are computationally easy and efficient compared with other existing EM-type methods. The proposed method has no problem with non-differentiability and requires no restrictions on the models, such as requiring continuity between two adjoining segments, allowing only a single CP, or demanding an unchanged regression variance; some of these restraints are often required by existing methods. To advance the ability of the proposed methods to resist high leverage outliers, we proposed the modified versions, called EMT-tm and FCT-tm, which first trim the high leverage points moderately and then apply the proposed methods to the reduced data. As for the choice of the degrees of freedom, r, EMT1 and FCT1 (DF r = 1) worked well in all experiments, whether or not the data were contaminated. The analysis of the robustness properties of the proposed t-based methods also showed that EMT1 and FCT1 were the most tolerant of atypical observations. In managing CP regression problems, we suggest fitting the data with EMN, EMT and FCT, respectively, using values such as r = 1, 5, 10 and 15.
If the estimates of the three methods are close, one may consider the data to be normally distributed with no abnormal observations, so that EMN can be used to obtain less variable estimators. Otherwise, EMT1 or FCT1 should be employed. In practice, FCT1 can be given priority for its better ability to withstand atypical observations. As for the trimmed proportion of the data, α, outliers with extremely high leverage seldom constitute more than 5% of the data; thus, α can be set to 0.05 at the start and increased if a higher value is needed (cf. [28,47]). Extensive experiments with numerical and real examples showed the effectiveness and practicability of EMT1 and FCT1. In particular, only the trimmed versions were workable for data containing extremely high leverage outliers, such as the real bedload transportation data. This evidence shows the effectiveness of, and the need for, the trimmed versions in real applications. Overall, the proposed EMT and FCT approaches are robust to atypical observations and heavy-tailed distributions. In particular, FCT1 generally works well and performs better than EMT1 in providing less biased estimates of the CPs and model parameters.

Conflicts of Interest:
The authors declare no conflict of interest.