Article

Bootstrap Confidence Intervals for Multiple Change Points Based on Two-Stage Procedures

1 Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
2 Department of Mathematics and Statistics, York University, Toronto, ON M3J 1P3, Canada
* Authors to whom correspondence should be addressed.
Entropy 2025, 27(5), 537; https://doi.org/10.3390/e27050537
Submission received: 15 April 2025 / Revised: 12 May 2025 / Accepted: 15 May 2025 / Published: 17 May 2025

Abstract

This paper investigates the construction of confidence intervals for multiple change points in linear regression models. First, we detect multiple change points by performing variable selection on blocks of the input sequence; second, we re-estimate their exact locations in a refinement step. Specifically, we exploit an orthogonal greedy algorithm to recover the number of change points consistently in the cutting stage, and employ the sup-Wald-type test statistic to determine the locations of multiple change points in the refinement stage. Based on a two-stage procedure, we propose bootstrapping the estimated centered error sequence, which can accommodate unknown magnitudes of changes and ensure the asymptotic validity of the proposed bootstrapping method. This enables us to construct confidence intervals using the empirical distribution of the resampled data. The proposed method is illustrated with simulations and real data examples.

1. Introduction

Multiple change point detection is common in many applications, such as signal processing [1], medical diagnosis [2], industrial control [3], and oceanography [4]. For example, in detecting changes in a patient's heart rate, accurate identification of the change points decomposes the data into stationary segments, allowing clinicians to characterize the behavior of the heart and diagnose diseases. The detection of multiple change points thus has important practical implications, which has prompted extensive research in the statistical community.
Implementing appropriate hypothesis testing is an important method for detecting multiple change points. Here are some examples. Ref. [5] proposed sup-Wald-type tests to test the null hypothesis of no change against alternative hypotheses containing an arbitrary number of changes to identify multiple structural changes in linear models. At the same time, the null hypothesis of s changes against s + 1 changes is tested to determine the number of breakpoints. Later, ref. [6] developed dynamic programming principles to estimate multiple change points in linear regression. Ref. [7] proposed a genetic algorithm for detecting multiple breakpoints. However, these methods are very time-consuming. There is another line of research on multiple change point detection that transforms the problem into a variable selection problem. For example, ref. [8] used LASSO [9] to estimate the locations of multiple change points in a one-dimensional piecewise constant signal observed in white noise. Ref. [10] introduced a two-stage procedure based on adaptive LASSO [11], SCAD [12], and MCP [13] regularization methods that can simultaneously detect multiple change points in linear models. In addition, the two-stage procedure has been extended to accelerated failure time (AFT) models [14].
Despite the aforementioned developments, research on inference for multiple change points remains limited. Under the null hypothesis, asymptotic [15] or approximate [16] distributions of test statistics have been derived, which enables quantifying uncertainty in the number of change points. For many testing procedures in change point analysis, the computation of critical values relies on the asymptotic behavior of the test statistic under the null hypothesis. However, the convergence of the test statistic to its limiting distribution is often slow, and in some cases, the exact form of this distribution remains unknown. The bootstrap is a computer-intensive inference technique that can approximate such distributions without relying on closed-form results. For example, ref. [17] proposed a bootstrap method in which the estimated error sequence is resampled with replacement to obtain confidence intervals for change points. Ref. [18] introduced an asymptotically valid confidence region for a single change point through the inversion of bootstrap tests. Ref. [19] studied the application of a circular overlapping block bootstrap method in the context of an at-most-one-change time series model with an abrupt change in the mean and dependent errors. In single change point estimation, the bootstrap has thus emerged as a valuable tool for approximating unknown probability distributions and the characteristics of change point estimators. These methods make it possible to provide confidence intervals for multiple change points. Ref. [20] addressed this problem in mean shift models by proposing the bootstrap construction of pointwise and uniform confidence intervals for multiple change points based on a moving sum procedure.
This paper aims to construct confidence intervals for multiple change points. Following [10], the multiple change point detection problem can be formulated as a model selection problem. Before using bootstrapping, it is important to ensure that the estimates of the number of change points and of the within-segment parameters are consistent. However, regularization methods such as the LASSO and SCAD may suffer from the bias problem inherited from the penalty function [12,21]. Specifically, LASSO may select more irrelevant variables, leading to an overestimation of the number of change points. To alleviate this problem, we adopt the $L_0$-regularization approach to achieve consistent estimation of the number of change points and their locations. Although subset selection is unbiased, it is often computationally expensive; several optimization strategies and algorithms have been proposed to overcome the computational difficulties. For example, ref. [22] introduced an iterative algorithm called the orthogonal greedy algorithm (OGA) for high-dimensional regression models, which sequentially selects input variables to be included in the linear regression model, and proposed a procedure named OGA + HDIC + Trim. OGA is a fast stepwise forward regression method that starts with a null model and adds predictors via component-wise linear least-squares estimation. HDIC is a high-dimensional information criterion used to decide how many predictors along the OGA path enter the model. The Trim step then excludes remaining irrelevant variables from OGA + HDIC. To construct confidence intervals for multiple change points, we proceed as follows. We first cut the data sequence into segments. Due to its model selection consistency and favorable convergence properties, we apply OGA + HDIC + Trim to find the data segments containing change points. We then utilize sup-Wald-type test statistics to locate the change points within those segments. Finally, we apply the bootstrap method to construct confidence intervals for the multiple change points using the estimated centered error sequence.
The main contributions of this paper are as follows. First, we propose a two-stage procedure that detects multiple change points by combining the OGA + HDIC + Trim procedure with sup-Wald-type test statistics; in the first stage, we cut the data sequence into segments. In addition, we give the asymptotic distributions of the change point estimators under certain conditions. Based on this framework, we further explore the application of bootstrapping techniques in constructing confidence intervals for the change points and demonstrate the validity of the bootstrap method; i.e., the proposed bootstrap $100(1-\alpha)\%$ confidence intervals asymptotically attain the coverage probability of $1-\alpha$ for a given $\alpha \in (0,1)$ [20]. Last but not least, we illustrate the effectiveness and applicability of the proposed method through extensive simulation studies and a real data example.
The rest of this paper is organized as follows. Section 2 details the detection of multiple change points using the OGA + HDIC + Trim procedure and sup-Wald-type test statistics. Section 3 introduces the resampling bootstrap method for the change point estimators based on the two-stage procedure, and Section 4 gives the theoretical properties of the proposed bootstrap method. Section 5 presents extensive simulation studies. Section 6 applies the proposed method to the seismograms of the 1982 Urakawa–Oki earthquake. Technical proofs of the main results are relegated to Appendix A. Throughout this paper, vectors and matrices are denoted in bold type.

2. Multiple Change Point Detection Based on Two-Stage Procedures

Assume that $(x_i, y_i)$, $i = 1, \dots, n$, satisfy the following linear regression model with $s$ change points located at $a_1 < \dots < a_s$:

$$
y_i = x_i^\top \beta_1 + \sum_{l=1}^{s} x_i^\top \delta_l\, I(a_l < i \le n) + \varepsilon_i
= \begin{cases}
x_i^\top \beta_1 + \varepsilon_i, & \text{if } 1 \le i \le a_1,\\
x_i^\top (\beta_1 + \delta_1) + \varepsilon_i, & \text{if } a_1 < i \le a_2,\\
\quad\vdots\\
x_i^\top \big(\beta_1 + \sum_{l=1}^{s} \delta_l\big) + \varepsilon_i, & \text{if } a_s < i \le n,
\end{cases}
$$

where $n$ is the sample size; $x_i = (x_{i,1}, \dots, x_{i,q})^\top$ is a sequence of $q$-dimensional predictors; $\beta_1 = (\beta_{1,1}, \dots, \beta_{q,1})^\top \ne 0$ is an unknown $q$-dimensional vector of regression coefficients; $s$ is the unknown number of change points; $1 < a_1 < \dots < a_s < n$ are the unknown change points; $\delta_l = (\delta_{1,l}, \dots, \delta_{q,l})^\top$, $l = 1, \dots, s$, are the unknown changes in the regression coefficient vectors at the change points; and the $\varepsilon_i$'s are unobservable random errors.

The estimation of the regression coefficients $\beta_1$ and $\beta_1 + \sum_{l=1}^{j} \delta_l$, $j = 1, \dots, s$, is hampered by the unknown parameters $s, a_1, \dots, a_s$. In light of [23], we propose to replace $s$ in (1) with a predetermined number of segments, denoted by $p_n$, to facilitate the estimation of the regression coefficients. This replacement allows us to identify the total number of change points. Specifically, we partition the data sequence into $p_n$ segments, where $p_n \to \infty$ as $n \to \infty$. All segments except the first have length $m$, while the first segment has length $n - (p_n - 1)m$. Let $Q_1 = \{1, 2, \dots, n - (p_n - 1)m\}$ and $Q_l = \{n - (p_n - l + 1)m + 1, \dots, n - (p_n - l)m\}$, $l = 2, \dots, p_n$. See (3) for more details on segmentation. We assume that $m < \min_j (a_{j+1} - a_j)/2$ for $j = 1, \dots, s$, so that each segment $Q_l$, for $l = 1, \dots, p_n$, contains at most one change point.
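The segment boundaries are determined entirely by $n$ and $m$. The following is a minimal Python sketch of this partition; the choice $p_n = \lfloor n/m \rfloor$ is an illustrative assumption, since the paper ultimately fixes $m$ (and hence $p_n$) via the BIC rule described in Section 2.2.

```python
def segment_bounds(n: int, m: int):
    """Return Q_1, ..., Q_{p_n} as (start, end) pairs (1-based, inclusive).
    Every segment except the first has length m; the first absorbs the
    remainder and has length n - (p_n - 1) * m."""
    p_n = n // m                          # assumed choice of p_n
    bounds = [(1, n - (p_n - 1) * m)]     # Q_1
    for l in range(2, p_n + 1):           # Q_l, l = 2, ..., p_n
        bounds.append((n - (p_n - l + 1) * m + 1, n - (p_n - l) * m))
    return bounds
```

For instance, `segment_bounds(600, 50)` partitions 600 observations into twelve segments of length 50.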
We denote the segment containing $a_j$ by $k_j$, i.e., $a_j \in Q_{k_j}$. This partitioning results in the following model:

$$
y_i = x_i^\top \beta_1 + \sum_{l=2}^{p_n} x_i^\top d_l\, I\big(n - (p_n - l + 1)m < i \le n\big) - \omega_i + \varepsilon_i,
$$

where

$$
d_{k_j} = \delta_j \ne 0 \quad \text{for } j = 1, \dots, s, \qquad d_l = 0 \quad \text{for any } l \notin \{k_1, \dots, k_s\},
$$

and $\omega_i = x_i^\top d_{k_j} I(i \in T_j)$ with $T_j = \{n - (p_n - k_j + 1)m + 1, \dots, a_j\}$; $\omega_i = 0$ for all $i \notin \bigcup_{j=1}^{s} Q_{k_j}$.
Remark 1.
By the definition of $m$, any two adjacent segments contain at most one change point between them. From (2), it follows that

$$
y_i = \begin{cases}
x_i^\top \big(\beta_1 + \sum_{l=1}^{j-1} \delta_l\big) + \varepsilon_i, & \text{if } i \in Q_{k_j - 1},\\
x_i^\top \big(\beta_1 + \sum_{l=1}^{j} \delta_l\big) - \omega_i + \varepsilon_i, & \text{if } i \in Q_{k_j},\\
x_i^\top \big(\beta_1 + \sum_{l=1}^{j} \delta_l\big) + \varepsilon_i, & \text{if } i \in Q_{k_j + 1}.
\end{cases}
$$

Based on the partition, we can obtain least-squares estimates $\hat\gamma_l$ for $l = k_j - 1, k_j, k_j + 1$ within each segment.
1. When $a_j = n - (p_n - k_j)m$, the change point coincides with the pre-specified cut-point and $\omega_i = x_i^\top \delta_j$. The regression coefficients corresponding to the three segments, denoted by $\gamma_l$, $l = k_j - 1, k_j, k_j + 1$, are equal to $\beta_1 + \sum_{l=1}^{j-1} \delta_l$, $\beta_1 + \sum_{l=1}^{j-1} \delta_l$, and $\beta_1 + \sum_{l=1}^{j} \delta_l$, respectively. The segment $Q_{k_j}$ that contains the change point $a_j$ can be identified by $\hat\gamma_{k_j + 1} - \hat\gamma_{k_j} \ne 0$.
2. When $a_j < n - (p_n - k_j)m$, due to the existence of $\omega_i$, the linear regression model is misspecified on $Q_{k_j}$, causing $\hat\gamma_{k_j}$ to converge to the pseudo-true value $\gamma_{k_j}$ under model misspecification [24,25]. Since $\gamma_{k_j}$ is different from both $\beta_1 + \sum_{l=1}^{j-1} \delta_l$ and $\beta_1 + \sum_{l=1}^{j} \delta_l$, the segment $Q_{k_j}$ can be determined by the following: $\hat\gamma_{k_j} - \hat\gamma_{k_j - 1} \ne 0$ and $\hat\gamma_{k_j + 1} - \hat\gamma_{k_j} \ne 0$.
According to Remark 1, we reformulate the change-point detection problem as a high-dimensional variable selection task by constructing differences between coefficients. Therefore, (2) can be rewritten in matrix form as follows:

$$
y_n = X_n \theta + e_n + \varepsilon_n,
$$

where $y_n = (y_1, y_2, \dots, y_n)^\top$,

$$
X_n = \begin{pmatrix}
X^{(1)} & 0 & \cdots & 0\\
X^{(2)} & X^{(2)} & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
X^{(p_n)} & X^{(p_n)} & \cdots & X^{(p_n)}
\end{pmatrix},
$$

$X^{(1)} = (x_1, \dots, x_{n-(p_n-1)m})^\top$, and $X^{(j)} = (x_{n-(p_n-j+1)m+1}, \dots, x_{n-(p_n-j)m})^\top$. In addition, $\theta = (\theta_1^\top, \dots, \theta_{p_n}^\top)^\top = \big(\beta_1^\top, (\gamma_2 - \beta_1)^\top, \dots, (\gamma_{p_n} - \gamma_{p_n-1})^\top\big)^\top$, $\varepsilon_n = (\varepsilon_1, \dots, \varepsilon_n)^\top$, and $e_n$ represents the artificial $n$-dimensional error due to model misspecification, whose elements are equal to zero for all $i \notin \bigcup_{j=1}^{s} Q_{k_j}$.

It can be seen that $\theta_l = 0$ if $l \notin \{1, k_j, k_j + 1\}$ for $j = 1, \dots, s$. Let $A_n = \{1, k_1, k_1 + 1, k_2, k_2 + 1, \dots, k_s, k_s + 1\}$. It is important to note that $\theta_l = 0_{q \times 1}$ for all $l \notin A_n$. Therefore, the estimation of $A_n$ comes down to identifying the non-zero elements of $\theta$, which is the focus of the subsequent subsections.
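To make the variable selection formulation concrete, the block lower-triangular matrix $X_n$ can be assembled column-block by column-block: block $l$ equals the raw predictor matrix with all rows preceding segment $Q_l$ zeroed out. A minimal sketch, reusing the hypothetical `segment_bounds` helper above:

```python
import numpy as np

def build_design(X, bounds):
    """Assemble X_n (n x p_n*q) from raw predictors X (n x q): column
    block l is X with the rows preceding segment Q_l set to zero, which
    reproduces the block lower-triangular structure of (3)."""
    blocks = []
    for start, _end in bounds:            # 1-based segment starts
        B = np.zeros_like(X, dtype=float)
        B[start - 1:, :] = X[start - 1:, :]
        blocks.append(B)
    return np.hstack(blocks)
```

Regressing $y_n$ on the columns of this matrix and selecting the non-zero coefficient blocks then recovers $\hat A_n$.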

2.1. Segment Selection

By the definition of $m$, each segment $Q_{k_j}$ contains exactly one change point. In this subsection, our goal is to select the segments $Q_{k_j}$ for $j = 1, \dots, s$. As noted above, selecting all non-zero elements of $\theta$ yields an estimate of $A_n$. Therefore, after the cutting stage, the problem of detecting multiple change points reduces to a variable selection problem in a high-dimensional scenario. We use the OGA + HDIC + Trim method for segment selection.

We rewrite the model (3) as follows:

$$
y_n = \sum_{j=1}^{r_n} z_j \theta_j + \tilde\varepsilon_n,
$$

where $r_n = p_n \cdot q$, $\{z_1, \dots, z_{r_n}\}$ are the column vectors of $X_n$, $\tilde\varepsilon_n = e_n + \varepsilon_n$, and $\theta = (\theta_1, \dots, \theta_{r_n})^\top$. Without loss of generality, replace $y_n$ by $y_n - \bar y_n 1_n$ and $z_j$ by $z_j - \bar z_j 1_n$. Define $\hat\sigma^2_{J_d} = n^{-1} y_n^\top (I_n - H_{J_d}) y_n$, where $H_{J_d}$ is the orthogonal projection matrix onto the linear span of $\{z_j, j \in J_d\}$. For convenience, denote $H_\emptyset = 0$. The proposed OGA + HDIC + Trim algorithm is outlined in Algorithm 1.
Algorithm 1 OGA + HDIC + Trim
Require: response vector $y_n \in \mathbb{R}^n$, regressor matrix $X_n \in \mathbb{R}^{n \times r_n}$.
Initialization: set $d = 0$, $r^{(0)} = y_n$, and $\hat J_0 = \emptyset$.
while $d = 0, 1, \dots, D_n - 1$ do
    Compute $\hat j_{d+1} = \arg\max_{1 \le j \le r_n,\, j \notin \hat J_d} |z_j^\top r^{(d)}| / (n^{1/2} \|z_j\|)$ and update $\hat J_{d+1} = \hat J_d \cup \{\hat j_{d+1}\}$;
    Compute $z^\perp_{\hat j_{d+1}}$ via $z^\perp_{\hat j_{d+1}} = \big(I_n - \sum_{\ell=1}^{d} H_{\hat j_\ell}\big) z_{\hat j_{d+1}}$, where $H_{\hat j_\ell} = z^\perp_{\hat j_\ell} z^{\perp\top}_{\hat j_\ell} / \|z^\perp_{\hat j_\ell}\|^2$;
    Compute $r^{(d+1)}$ via $r^{(d+1)} = (I_n - H_{\hat j_{d+1}}) r^{(d)}$.
end while
Compute the minimizer of HDIC via

$$
\mathrm{HDIC}(\hat J_d) = \log \hat\sigma^2_{\hat J_d} + \#(\hat J_d)\, c_n \log(r_n)/n, \qquad \hat d_n = \arg\min_{1 \le d \le D_n} \mathrm{HDIC}(\hat J_d).
$$

return $\hat J_n$ via

$$
\hat J_n = \begin{cases}
\big\{\hat j_\ell : \mathrm{HDIC}\big(\hat J_{\hat d_n} \setminus \{\hat j_\ell\}\big) > \mathrm{HDIC}(\hat J_{\hat d_n}),\ 1 \le \ell \le \hat d_n\big\}, & \text{if } \hat d_n > 1,\\
\{\hat j_1\}, & \text{if } \hat d_n = 1.
\end{cases}
$$
In this context, $D_n$ denotes the maximum number of iterations, and the convergence rate theory of OGA in [22] shows that $D_n = O\big(n^{1/2} (\log r_n)^{-1/2}\big)$. The parameter $c_n$ satisfies $c_n \to \infty$ and $c_n \log r_n = o(n^{1-2\gamma})$, where $\gamma \in [0, 1)$. Let $\hat\theta = (\hat\theta_1^\top, \dots, \hat\theta_{p_n}^\top)^\top$ be the estimate obtained by applying the OGA + HDIC + Trim procedure. We derive an estimate of $A_n$ from $\hat J_n$ as $\hat A_n = \{l : \hat\theta_l \ne 0,\ l = 1, \dots, p_n\}$. Denote

$$
\hat C_n = \{l : l \in \hat A_n,\ l + 1 \in \hat A_n,\ l = 2, \dots, p_n - 1\} = \{\hat k_1, \dots, \hat k_{\hat s}\},
$$

where $\hat k_1 < \dots < \hat k_{\hat s}$. It is clear that if $l + 1 \in \hat A_n$, $l \in \hat A_n$, and $l - 1 \notin \hat A_n$, then $l \in \hat C_n$ and $l - 1 \notin \hat C_n$. Let $Q_{(j)} = Q_{k_j} \cup Q_{k_j + 1}$. It includes only the change point $a_j$, since $m < \min_j (a_{j+1} - a_j)/2$, and this ensures that no change point overlaps with the cut-point of $Q_{(j)}$. By following the steps outlined above and considering that the Wald test cannot detect the location of a change point at a partition boundary, we obtain the following selected segments:

$$
\hat Q_{(j)} = \hat Q_{\hat k_j} \cup \hat Q_{\hat k_j + 1} = \{n - (p_n - \hat k_j + 1)m + 1, \dots, n - (p_n - \hat k_j - 1)m\},
$$

where $\hat Q_{\hat k_j} = \{n - (p_n - \hat k_j + 1)m + 1, \dots, n - (p_n - \hat k_j)m\}$.
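The following is a minimal NumPy sketch of Algorithm 1 applied to the columns of $X_n$. The greedy step, the HDIC minimization, and the trimming rule follow the displays above; the least-squares refits in the trimming step and the handling of degenerate columns are implementation assumptions.

```python
import numpy as np

def oga_hdic_trim(y, X, D_n, c_n):
    """Sketch of OGA + HDIC + Trim; assumes y and the columns of X
    are already centered. Returns the selected column indices."""
    n, r_n = X.shape
    r = y.astype(float).copy()            # residual r^(d)
    Q = np.zeros((n, 0))                  # orthonormalized selected columns
    J = []                                # OGA path j_hat_1, j_hat_2, ...
    col_norms = np.linalg.norm(X, axis=0)
    for _ in range(D_n):
        # greedy step: largest componentwise correlation with the residual
        scores = np.abs(X.T @ r) / (np.sqrt(n) * col_norms)
        scores[J] = -np.inf               # exclude already selected columns
        j = int(np.argmax(scores))
        J.append(j)
        # orthogonalize the chosen column against the current span
        z = X[:, j] - Q @ (Q.T @ X[:, j])
        z = z / np.linalg.norm(z)
        Q = np.column_stack([Q, z])
        r = r - z * (z @ r)               # residual r^(d+1)

    def hdic_path(k):                     # HDIC of the first k path variables
        resid = y - Q[:, :k] @ (Q[:, :k].T @ y)
        return np.log(np.mean(resid ** 2)) + k * c_n * np.log(r_n) / n

    d_hat = 1 + int(np.argmin([hdic_path(k) for k in range(1, D_n + 1)]))
    if d_hat == 1:
        return [J[0]]

    def hdic_subset(cols):                # HDIC of an arbitrary subset
        Xs = X[:, cols]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        return (np.log(np.mean((y - Xs @ beta) ** 2))
                + len(cols) * c_n * np.log(r_n) / n)

    path = J[:d_hat]
    base = hdic_subset(path)
    # Trim: keep a variable only if deleting it strictly increases HDIC
    kept = [j for j in path if hdic_subset([c for c in path if c != j]) > base]
    return sorted(kept) if kept else [J[0]]
```

Grouping the selected column indices by segment (each segment contributes $q$ consecutive columns) yields $\hat A_n$ and hence $\hat C_n$.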

2.2. Refining

By Theorem 1 in Section 4, $\hat s$ converges to $s$ as $n \to \infty$. Hence, we assume that for a large $n$, there exists $\hat Q_{(j)}$ such that $a_j \in \hat Q_{(j)}$. We now show how to estimate this change point. Note that for $i \in \hat Q_{(j)} = \{n - (p_n - \hat k_j + 1)m + 1, \dots, n - (p_n - \hat k_j - 1)m\}$, we have

$$
y_i = x_i^\top \big(\beta_j + \delta_j I(a_j < i \le n_j^{(r)})\big) + \varepsilon_i, \qquad n_j^{(l)} \le i \le n_j^{(r)},
$$

where $n_j^{(l)} = n - (p_n - \hat k_j + 1)m + 1$ and $n_j^{(r)} = n - (p_n - \hat k_j - 1)m$. Here, $\delta_j = \beta_{j+1} - \beta_j$, and $\beta_j$ and $\beta_{j+1}$ are the unknown $q$-dimensional vectors of regression coefficients on the index segments $\{n_j^{(l)}, \dots, a_j\}$ and $\{a_j + 1, \dots, n_j^{(r)}\}$, respectively. Note that $n_j^{(r)} - n_j^{(l)} + 1 = 2m$.

We compute the sup-Wald test statistics [26] and estimate $a_j$ by

$$
\hat a_{j,h} = \arg\max_{n_j^{(l)} + q < h < n_j^{(r)} - q} \hat\delta_{j;h}^\top \big(Z_{j,h}^\top M_j Z_{j,h}\big) \hat\delta_{j;h},
$$

where $Z_{j,h} = (0, \dots, 0, x_{h+1}, x_{h+2}, \dots, x_{n_j^{(r)}})^\top$, $X_j = (x_{n_j^{(l)}}, x_{n_j^{(l)}+1}, \dots, x_{n_j^{(r)}})^\top$, and $M_j = I - X_j (X_j^\top X_j)^{-1} X_j^\top$. We also obtain the estimates $(\hat\beta_{j;h}, \hat\delta_{j;h})$ of $(\beta_j, \delta_j)$ by regressing $y_{(j)} = (y_{n_j^{(l)}}, \dots, y_{n_j^{(r)}})^\top$ on $X_j$ and $Z_{j,h}$. The limiting behavior of $\hat a_{j,h}$ is given in Section 4.
Since the multiple change points are not dense in the data sequence, we assume without loss of generality that $m < \min_j (a_{j+1} - a_j)/2$, $j = 1, \dots, s - 1$, so that each segment $Q_{(j)}$ contains at most one change point. Note that if $m$ is too small, the regression parameter estimates become inconsistent and the computational time increases; therefore, we need to avoid choosing too small a value for $m$. To address this issue, we set $m = \lceil c_0 \sqrt{n} \rceil$ in line with Theorem 2 in Section 4, where $c_0$ serves as a tuning parameter and $\lceil \cdot \rceil$ is the ceiling function. We consider values of $c_0$ on the interval $[0.1, 1.5]$. The final value of $m$ is determined using the Bayesian information criterion (BIC) as follows:

$$
\hat m = \arg\min_m \bigg[ \log \sum_{i=1}^{n} \Big( y_i - x_i^\top \Big(\hat\beta_{1,h} + \sum_{j=1}^{\hat s} \hat\delta_{j,h} I(\hat a_{j,h} < i \le n)\Big) \Big)^2 + \hat s \cdot q \log n \bigg].
$$
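Within a selected segment, the refinement step is an exhaustive scan: for each candidate $h$, the estimate $\hat\delta_{j;h} = (Z_{j,h}^\top M_j Z_{j,h})^{-1} Z_{j,h}^\top M_j y_{(j)}$ follows from the Frisch–Waugh–Lovell theorem, and the Wald objective is evaluated at that $h$. A minimal sketch in local, 0-based indices (the selection of $m$ by BIC is assumed to have been done already):

```python
import numpy as np

def refine_change_point(y_seg, X_seg, q):
    """Scan candidate break positions h inside one selected segment and
    return the maximizer of delta_hat' (Z' M Z) delta_hat."""
    N = len(y_seg)
    # annihilator of X_j: M = I - X (X'X)^{-1} X'
    M = np.eye(N) - X_seg @ np.linalg.solve(X_seg.T @ X_seg, X_seg.T)
    best_h, best_val = None, -np.inf
    for h in range(q + 1, N - q):         # interior candidates only
        Z = np.zeros_like(X_seg, dtype=float)
        Z[h:] = X_seg[h:]                 # Z_{j,h}: rows after the break
        ZMZ = Z.T @ M @ Z
        delta = np.linalg.solve(ZMZ, Z.T @ (M @ y_seg))
        val = float(delta @ ZMZ @ delta)  # sup-Wald objective at h
        if val > best_val:
            best_h, best_val = h, val
    return best_h                         # offset of a_hat_j within the segment
```

Adding the segment's left endpoint $n_j^{(l)}$ converts the returned offset back to a global index.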

3. Bootstrap Confidence Intervals for Multiple Change Points

In this section, we construct bootstrap confidence intervals for multiple change points, which allow us to study the behavior of multiple change points in a linear regression model. The two-stage procedure inherently involves quantifying uncertainty in both the number of change points and their locations.
We define the estimated residuals $\hat\varepsilon_i$ and the centered residuals $\tilde\varepsilon_i$ as follows:

$$
\hat\varepsilon_i = y_i - x_i^\top \Big(\hat\beta_{1,h} + \sum_{l=1}^{\hat s} \hat\delta_{l,h} I(\hat a_{l,h} < i \le n)\Big), \qquad
\tilde\varepsilon_i = \hat\varepsilon_i - \frac{1}{\hat a_{j,h} - \hat a_{j-1,h}} \sum_{l = \hat a_{j-1,h}+1}^{\hat a_{j,h}} \hat\varepsilon_l, \quad \hat a_{j-1,h} < i \le \hat a_{j,h},
$$

where $\hat a_{0,h} = 1$ and $\hat a_{\hat s + 1, h} = n$. Let $\varepsilon^*_{\hat a_{j-1,h}+1}, \dots, \varepsilon^*_{\hat a_{j,h}}$ be independently and identically distributed (i.i.d.) random variables sampled from the empirical distribution function of $\{\tilde\varepsilon_{\hat a_{j-1,h}+1}, \dots, \tilde\varepsilon_{\hat a_{j,h}}\}$. We then consider the bootstrap observations defined as follows:

$$
y^*_i = x_i^\top \Big(\hat\beta_{1,h} + \sum_{l=1}^{\hat s} \hat\delta_{l,h} I(\hat a_{l,h} < i \le n)\Big) + \varepsilon^*_i, \qquad i = 1, \dots, n,
$$

where $\varepsilon^*_i$ represents the bootstrapped version of the residuals. To obtain an approximation to the distribution of $\hat a_{j,h}$ in (9), a bootstrap statistic of the estimate is defined as

$$
\hat a^*_{j,h_b} = \arg\max_{h_b} \hat\delta^{*\top}_{j;h_b} \big(Z_{j,h_b}^\top M_j Z_{j,h_b}\big) \hat\delta^*_{j;h_b}, \qquad j = 1, \dots, \hat s^*,
$$

where the number of change points $\hat s^*$ and the set $\hat C^*_n$ are obtained by applying the OGA-based procedure to $y^*_n = (y^*_1, \dots, y^*_n)^\top$ as in (6); the segmentation of the data sequence remains unchanged. In this case, $Z_{j,h_b} = (0, \dots, 0, x_{h_b+1}, \dots, x_{n_j^{(r)}})^\top$, and $(\hat\beta^*_{j;h_b}, \hat\delta^*_{j;h_b})$ are the least-squares estimates obtained by regressing $y^*_{(j)} = (y^*_{n_j^{(l)}}, \dots, y^*_{n_j^{(r)}})^\top$ on $X_j$ and $Z_{j,h_b}$. Next, we describe the construction of bootstrap CIs for multiple change points.
1. We generate a bootstrap sample $(y^*_1, \dots, y^*_n)$ by randomly sampling residuals $(\varepsilon^*_1, \dots, \varepsilon^*_n)$ from the set $\{\tilde\varepsilon_1, \dots, \tilde\varepsilon_n\}$ as in (11);
2. We apply the two-stage procedure and compute the local maximizer as in (12) for each estimated segment;
3. For a given bootstrap sample size $B$, we repeat Steps 1–2 $B$ times and record $\hat a^*_{j,h_b}(b)$, $j = 1, \dots, \hat s^*$, where $b = 1, \dots, B$ (see the code sketch at the end of this section).
Therefore, the bootstrap-based approximation for the change point $a_j$ can be constructed. Generally, for any $\alpha \in (0, 1)$, the bootstrap $100(1-\alpha)\%$ confidence interval for the change point $a_j$ is given by the following:

$$
\mathrm{CIs}_j(\alpha) = \big[\hat a_{j,h} + q^*_U(\alpha/2),\ \hat a_{j,h} + q^*_L(\alpha/2)\big],
$$

where $q^*_U(\alpha/2) = \sup\big\{x : \frac{1}{B} \sum_{b=1}^{B} I\big(\hat a^*_{j,h_b}(b) - \hat a_{j,h} \le x\big) \le \alpha/2\big\}$ and $q^*_L(\alpha/2) = \inf\big\{x : \frac{1}{B} \sum_{b=1}^{B} I\big(\hat a^*_{j,h_b}(b) - \hat a_{j,h} \le x\big) \ge 1 - \alpha/2\big\}$.

If $\hat C_n \subseteq \hat C^*_n$, then $\hat a^*_{j,h_b}$ is an estimate of $\hat a_{j,h}$ for $j = 1, \dots, \hat s$. If some elements of $\hat C_n$ are not in the set $\hat C^*_n$, then $\hat a_{j,h}$ has no corresponding bootstrap estimate for some $j$. Hence, the bootstrap CI of $\hat a_{j,h}$ is constructed using only the replicates for which $\hat a^*_{j,h_b}(b)$ exists, instead of $\{\hat a^*_{j,h_b}(1), \dots, \hat a^*_{j,h_b}(B)\}$ in (13), which yields

$$
\mathrm{CIs}_j(\alpha) = \big[\hat a_{j,h} + q^*_U(\alpha/2),\ \hat a_{j,h} + q^*_L(\alpha/2)\big],
$$

where

$$
q^*_U(\alpha/2) = \sup\Big\{x : \frac{1}{B^*} \sum_{b=1}^{B} I\big(\hat a^*_{j,h_b}(b) \text{ exists},\ \hat a^*_{j,h_b}(b) - \hat a_{j,h} \le x\big) \le \alpha/2\Big\},
$$

and

$$
q^*_L(\alpha/2) = \inf\Big\{x : \frac{1}{B^*} \sum_{b=1}^{B} I\big(\hat a^*_{j,h_b}(b) \text{ exists},\ \hat a^*_{j,h_b}(b) - \hat a_{j,h} \le x\big) \ge 1 - \alpha/2\Big\},
$$

with $B^*$ the number of replicates for which $\hat a^*_{j,h_b}(b)$ exists.
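The three steps above amount to a residual bootstrap loop around the two-stage estimator. The following is a minimal sketch; `fit_two_stage` is a hypothetical wrapper for the full Section 2 procedure (returning the sorted change point estimates and the fitted values), and the matching of bootstrap estimates to the original ones is simplified to matching by order.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_cis(y, X, fit_two_stage, alpha=0.05, B=500):
    """Residual-bootstrap CIs for the change points (Steps 1-3)."""
    a_hat, y_fit = fit_two_stage(y, X)
    knots = [0] + list(a_hat) + [len(y)]
    eps = (y - y_fit).astype(float)
    for lo, hi in zip(knots[:-1], knots[1:]):
        eps[lo:hi] -= eps[lo:hi].mean()   # center residuals per regime
    diffs = {j: [] for j in range(len(a_hat))}
    for _ in range(B):
        # Step 1: resample centered residuals within each regime
        e_star = np.concatenate([rng.choice(eps[lo:hi], size=hi - lo)
                                 for lo, hi in zip(knots[:-1], knots[1:])])
        # Step 2: rerun the two-stage procedure on the bootstrap sample
        a_star, _ = fit_two_stage(y_fit + e_star, X)
        # Step 3: record the deviations for the matched change points
        for j, a in enumerate(a_star[:len(a_hat)]):
            diffs[j].append(a - a_hat[j])
    cis = []
    for j, d in diffs.items():
        qU, qL = np.quantile(d, [alpha / 2, 1 - alpha / 2])
        cis.append((a_hat[j] + qU, a_hat[j] + qL))
    return cis
```

Replicates in which the bootstrap run detects fewer change points than $\hat s$ are, as described above, simply dropped for the unmatched $j$; the truncation used here is a simplification of that rule.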

4. Theoretical Validity of the Bootstrap Confidence Intervals

To investigate the performance of the bootstrap CIs for multiple change points, we make the following assumptions.
Assumption 1.
If $s \ge 1$, then $a_j/n \to \tau_j > 0$ for $1 \le j \le s$. Furthermore, if $s \ge 2$, then $\min_{1 \le j \le s-1} (\tau_{j+1} - \tau_j) > 0$.
Assumption 2.
$\varepsilon_i$, $i = 1, 2, \dots, n$, is a sequence of independently and identically distributed random variables with $E(\varepsilon_i) = 0$ and $E(\varepsilon_i^2) = \sigma^2$.
Assumption 3.
$a_j - n_j^{(l)} = \lceil 2 \tau_j m \rceil$, where $\tau_j \in (0, 1)$ and $\lceil \cdot \rceil$ is the ceiling function.
Assumption 4.
$\sup_{t_2 - t_1} \big\| \sum_{i=t_1}^{t_2} x_i x_i^\top / (t_2 - t_1) \big\|$ is stochastically bounded, and $\varepsilon_i$ is independent of the regressor $x_j$ for all $i$ and $j$.
Assumption 5.
$\delta_j \ne 0$, and $\|\delta_j\|^{-1} (2m)^{-1/2+\alpha} = o(1)$ for some $\alpha \in (0, 1/2)$ as $n \to \infty$.
By Assumption 1, it follows that $m/(a_{j+1} - a_j) \to 0$, i.e., there is at most one change point in each segment for a large $n$. Assumption 2 states that the errors are independent and identically distributed, which justifies resampling the centered residuals to generate the bootstrap distribution of the change point estimators. Assumption 3 keeps the change point bounded away from the segment endpoints for asymptotic purposes. Assumption 4 requires that there is enough data around each change point and at the beginning and end of the sample so that the change point can be identified. The asymptotic distribution of $\hat a_{j,h}$ depends on various unknown quantities, with the magnitude of the change $\delta_j$ being the most significant. Assumption 5 is a minimum-signal condition on the regression coefficients in the high-dimensional setting: the shift magnitude cannot be too small; otherwise, the change point cannot be identified. Next, we establish the consistency of the estimated number of change points $\hat s$ and of the change point estimator $\hat a_{j,h}$. The following theorem provides the consistency of the estimated number of change points.
Theorem 1.
Suppose that $m \to \infty$, $p_n \to \infty$, $D_n = O\big((n/\log r_n)^{1/2}\big)$, and $\log(r_n)/n \to 0$ as $n \to \infty$. Under Assumptions 1–5, we have

$$
\lim_{n \to \infty} P(\hat s = s) = 1; \qquad \lim_{n \to \infty} P\big(a_j \in \hat Q_{(j)},\ j = 1, \dots, s \,\big|\, \hat s = s\big) = 1,
$$

where $\hat s$ and $\hat k_j$, $j = 1, \dots, \hat s$, are given in (6).
Theorem 1 extends Theorem 4 in [22] to the multiple change point detection case. The consistency and asymptotic distribution of the change point estimators are given below.
Theorem 2.
If $\sum_{i=t_1}^{t_2} x_i x_i^\top / (t_2 - t_1) \to_p V$ as $t_2 - t_1 \to \infty$, then under Assumptions 1–4, we have that, as $n \to \infty$, $\hat a_{j,h} - a_j = O_p(\|\delta_j\|^{-2})$ and

$$
\frac{\delta_j^\top V \delta_j}{\sigma^2} \big(\hat a_{j,h} - a_j\big) \to_d \arg\max\big\{ W(c) - |c|/2 : c \in \mathbb{R} \big\},
$$

where $V$ is a strictly positive definite matrix, $\{W(c) : c \in \mathbb{R}\}$ is a two-sided Wiener process, and $\delta_j$ is either a fixed value or satisfies $\delta_j \to 0$ as specified in Assumption 5.
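Since the limit in Theorem 2 is pivotal up to the scale factor $\delta_j^\top V \delta_j / \sigma^2$, its quantiles can be tabulated once by Monte Carlo. A minimal sketch, where the window $T$ and step $dt$ are discretization assumptions (the negative drift $-|c|/2$ keeps the argmax near the origin, so a finite window suffices):

```python
import numpy as np

rng = np.random.default_rng(1)

def argmax_wiener(T=100.0, dt=0.05):
    """One approximate draw from argmax{ W(c) - |c|/2 : c in R },
    using independent Wiener paths on the two half-lines [0, T]."""
    k = int(T / dt)
    grid = np.arange(1, k + 1) * dt
    right = np.cumsum(rng.normal(0.0, np.sqrt(dt), k)) - grid / 2
    left = np.cumsum(rng.normal(0.0, np.sqrt(dt), k)) - grid / 2
    vals = np.concatenate([left[::-1], [0.0], right])  # value 0 at c = 0
    cs = np.concatenate([-grid[::-1], [0.0], grid])
    return cs[np.argmax(vals)]

draws = np.array([argmax_wiener() for _ in range(2000)])
print(np.quantile(draws, [0.025, 0.975]))  # tail quantiles of the limit law
```

Scaling these quantiles by $\sigma^2 / (\delta_j^\top V \delta_j)$ would give asymptotic intervals for $\hat a_{j,h} - a_j$; the bootstrap of Section 3 avoids having to estimate this scale factor directly.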
Subsequently, we establish the validity of the bootstrap CIs in (14). For future reference, we define some notation here. Any symbol with a superscript $*$ denotes an object under the bootstrap probability measure rather than under the original measure used in the other sections. For example, $E^*(\cdot)$ denotes the conditional expectation with respect to the bootstrap probability measure given the original data. Similarly, $P^*(\cdot)$ denotes the conditional probability under the bootstrap measure.
Theorem 3.
Under the assumptions of Theorem 2, we have

$$
\sup_{x \in \mathbb{R}} \big| P^*\big(\hat a^*_{j,h_b} - \hat a_{j,h} \le x\big) - P\big(\hat a_{j,h} - a_j \le x\big) \big| \to_p 0.
$$
The proofs of Theorems 2 and 3 are given in Appendix A.
Combining Theorem 3 with Theorems 1 and 2 establishes the validity of the bootstrap method for multiple change points.
Corollary 1.
Under Assumptions 1–5, we have that, as $n \to \infty$,

$$
\sup_{x \in \mathbb{R}} \bigg| P^*\Big(\bigcap_{j=1}^{s} \big\{\hat a^*_{j,h_b} - \hat a_{j,h} \le x\big\}\Big) - P\Big(\bigcap_{j=1}^{s} \big\{\hat a_{j,h} - a_j \le x\big\}\Big) \bigg| \to_p 0.
$$

Since $P\big(a_j \in \mathrm{CIs}_j(\alpha)\big) \to 1 - \alpha$ for each $j = 1, \dots, s$, by the Bonferroni correction, we have $P\big(\bigcap_{j=1}^{s} \{a_j \in \mathrm{CIs}_j(\alpha/s)\}\big) \ge 1 - \alpha$ asymptotically. The asymptotic validity of the proposed bootstrap CIs in (14) follows.

5. Simulation

In this section, we first present the simulation results for the change point detection procedure given in Section 2 and compare it with the two-stage multiple change point detection procedure involving LASSO (TSMCD$_{\text{lasso}}$) of [10]; we denote the proposed two-stage procedure by TSP$_{\text{oga,wald}}$. We also construct confidence intervals using the bootstrap method proposed in Section 3 and denote the CIs in (14) by bootstrap$_{\text{oga,wald}}$.

5.1. Detection of Multiple Change Points

We consider the simulation setting where the change points $a_j$, $j = 1, 2, 3$, are 150, 300, and 450, respectively, and generate data from the model

$$
\begin{aligned}
y_t &= 2\cos(t\pi/30) + 2\sin(t\pi/30) + 0.1\, y_{t-1}
+ \big(3\cos(t\pi/30) + \sin(t\pi/30) + 0.2\, y_{t-1}\big) I_{(150,600]}(t)\\
&\quad + \big(2\cos(t\pi/30) - 0.3\, y_{t-1}\big) I_{(300,600]}(t)
+ \big(2\cos(t\pi/30) + 2\sin(t\pi/30)\big) I_{(450,600]}(t) + \varepsilon_t,
\end{aligned}
$$

where $\varepsilon_1, \dots, \varepsilon_n$ are independent and follow the standard normal distribution. The simulated data are shown in Figure 1. In this model, $\{y_t, t = 1, \dots, 600\}$ is a periodic autocorrelated sequence with a period of 30 and an autocorrelation order of 1.
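One realization of this model can be generated recursively; the initialization $y_0 = 0$ below is an assumption, since the display above leaves the starting value unspecified.

```python
import numpy as np

rng = np.random.default_rng(2025)

n = 600
y = np.zeros(n + 1)                       # y[0] = y_0 = 0 (assumption)
for t in range(1, n + 1):
    c, s = np.cos(t * np.pi / 30), np.sin(t * np.pi / 30)
    mean = 2 * c + 2 * s + 0.1 * y[t - 1]
    if t > 150:                           # first change point
        mean += 3 * c + s + 0.2 * y[t - 1]
    if t > 300:                           # second change point
        mean += 2 * c - 0.3 * y[t - 1]
    if t > 450:                           # third change point
        mean += 2 * c + 2 * s
    y[t] = mean + rng.standard_normal()
y = y[1:]                                 # observed series y_1, ..., y_600
```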
Table 1 summarizes 1000 Monte Carlo replications of the multiple change point estimation. Following [22], we take $c_n = 2$ in (5), similar to AIC. We focus on counting the number of events for which $|\hat a_j - a_j| \le 5$. The percentage of correct identifications of all change points, denoted $c_{\text{all}}(\%)$, is the proportion of replicates for which $|\hat a_j - a_j| \le 5$ for all $j$; $c_j(\%)$ is the proportion of replicates for which $|\hat a_j - a_j| \le 5$. The mean and standard error of the estimated change points are calculated over the replicates for which the estimated change points deviate from the true values by at most 50 (i.e., $|\hat a_j - a_j| \le 50$).
From Table 1, we can see that TSP$_{\text{oga,wald}}$ generally outperforms TSMCD$_{\text{lasso}}$ in terms of the correct identification rate. It is noteworthy that TSMCD$_{\text{lasso}}$ performs significantly worse than TSP$_{\text{oga,wald}}$ in identifying all change points, especially at $a_1$, as evidenced by the lower $c_1$ value. TSP$_{\text{oga,wald}}$ and TSMCD$_{\text{lasso}}$ show comparable estimation accuracy in terms of the mean and standard error (SE) of the estimated change points.

5.2. Bootstrap CIs

All results are based on 500 realizations of the simulation setting, and we use $B = 500$ bootstrap replicates for the corresponding bootstrap CIs. We consider confidence levels $(1 - \alpha) \in \{0.9, 0.95\}$. For each $j$, the coverage of the bootstrap CI is calculated as the proportion of simulated realizations in which $\mathrm{CIs}_j(\alpha)$ contains $a_j$. For all $j$ jointly, the coverage is calculated as the proportion of simulated realizations in which $\mathrm{CIs}_j(\alpha/s)$ contains $a_j$ for every $j = 1, \dots, s$.
Table 2 confirms the effectiveness of our bootstrap method, as the empirical coverage probability for each $a_j$ is close to the nominal level. The overall coverage of the bootstrap confidence intervals for all $j$ is slightly higher than the nominal level, which may be due to the complexity of multiple change point detection. The average computational time for the bootstrap$_{\text{oga,wald}}$ procedure is 1.44 min per Monte Carlo replication, as measured on an Intel(R) Core(TM) i9-14900K processor (3.20 GHz) with 64 GB of RAM.

6. Empirical Application

In this section, we illustrate the proposed method with an application to the east–west component of seismograms recorded at Iwanai station during the first foreshock of the 1982 Urakawa–Oki earthquake. This dataset has been previously studied by [27]. The time series is analyzed using an autoregressive (AR) model of order 5. The estimates and a visualization of the 95% bootstrap CIs are presented in Table 3 and Figure 2, respectively.
It can be seen from Table 3 that change points are detected at 3074 and 3914. Ref. [27] reported estimated change points of 3079 and 3929, which are close to our estimates. In geology, these two change points represent the arrival times of the P-wave and the S-wave, two types of seismic waves. From Figure 2, it is clear that the confidence interval (CI) of the first change point is narrower than that of the second.
This example demonstrates the applicability and effectiveness of our method in detecting change points in seismic data. By accurately identifying the locations of these structural breaks, we can gain insights into the underlying geological processes and improve our understanding of seismic events.

7. Conclusions

This paper addresses the bias issues often encountered in penalized model selection by employing OGA for model selection and for estimating the segments containing change points in the cutting stage. The accuracy of multiple change point estimation is improved by applying sup-Wald-type test statistics in the refining stage. The proposed method constructs confidence intervals for multiple change points by combining a bootstrapping technique with the two-stage procedure, thereby quantifying the uncertainty of the multiple change points; the bootstrapping technique is also shown to be asymptotically valid. The reliable construction of confidence intervals makes this method a valuable addition to the field of change point analysis and regression modeling. Numerical studies demonstrate the statistical accuracy of the proposed method. Our method could also be combined with block bootstrapping to handle parameter changes in linear regression models with dependent errors.

Author Contributions

Conceptualization, L.H., B.J. and Y.W.; methodology, L.H. and B.J.; data analysis, L.H. and F.W.; writing, L.H., Y.W. and F.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (12231017, 72293573); and the Natural Science and Engineering Research Council of Canada (RGPIN-2023-05655).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Proof of Theorem 2.
We only sketch this proof because it is similar to the proofs of Propositions 1 and 2 in [26]. To prove Theorem 2, we define $V(h) = \hat\delta_{j;h}^\top (Z_{j,h}^\top M_j Z_{j,h}) \hat\delta_{j;h}$. For the sake of simplicity, we denote $h_0 = a_j$, so that $h_0$ is the change point in $\hat Q_{(j)}$. If $h = h_0$, then $Z_{j,h} = Z_{j,h_0}$, where $Z_{j,h_0} = (0, \dots, 0, x_{h_0+1}, \dots, x_{n_j^{(r)}})^\top$. By Equation (9), $\hat a_{j,h} = \arg\max_h V(h)$. Note that

$$
\hat\delta_{j;h} = \big(Z_{j,h}^\top M_j Z_{j,h}\big)^{-1} \big(Z_{j,h}^\top M_j Z_{j,h_0} \delta_j + Z_{j,h}^\top M_j \varepsilon\big), \qquad
\hat\delta_{j;h_0} = \delta_j + \big(Z_{j,h_0}^\top M_j Z_{j,h_0}\big)^{-1} Z_{j,h_0}^\top M_j \varepsilon.
$$

It follows that

$$
V(h) - V(h_0) = \delta_j^\top \Big[ Z_{j,h_0}^\top M_j Z_{j,h} \big(Z_{j,h}^\top M_j Z_{j,h}\big)^{-1} Z_{j,h}^\top M_j Z_{j,h_0} - Z_{j,h_0}^\top M_j Z_{j,h_0} \Big] \delta_j + v(h),
$$

where

$$
\begin{aligned}
v(h) &= 2 \delta_j^\top \big(Z_{j,h_0}^\top M_j Z_{j,h}\big) \big(Z_{j,h}^\top M_j Z_{j,h}\big)^{-1} Z_{j,h}^\top M_j \varepsilon - 2 \delta_j^\top Z_{j,h_0}^\top M_j \varepsilon\\
&\quad + \varepsilon^\top M_j Z_{j,h} \big(Z_{j,h}^\top M_j Z_{j,h}\big)^{-1} Z_{j,h}^\top M_j \varepsilon - \varepsilon^\top M_j Z_{j,h_0} \big(Z_{j,h_0}^\top M_j Z_{j,h_0}\big)^{-1} Z_{j,h_0}^\top M_j \varepsilon.
\end{aligned}
$$

Define, for $h \ne h_0$,

$$
g(h) = \delta_j^\top \Big[ Z_{j,h_0}^\top M_j Z_{j,h_0} - \big(Z_{j,h_0}^\top M_j Z_{j,h}\big) \big(Z_{j,h}^\top M_j Z_{j,h}\big)^{-1} \big(Z_{j,h}^\top M_j Z_{j,h_0}\big) \Big] \delta_j \big/ |h_0 - h|.
$$

When $h = h_0$, define $g(h) = \delta_j^\top \delta_j$. Then, we have

$$
V(h) - V(h_0) = -|h_0 - h|\, g(h) + v(h).
$$

Let

$$
Z_\Delta = \begin{cases}
Z_{j,h} - Z_{j,h_0} = (0, \dots, 0, x_{h+1}, \dots, x_{h_0}, 0, \dots, 0)^\top, & h < h_0,\\
Z_{j,h_0} - Z_{j,h} = (0, \dots, 0, x_{h_0+1}, \dots, x_h, 0, \dots, 0)^\top, & h > h_0,\\
0, & h = h_0.
\end{cases}
$$

We have $Z_{j,h_0} = Z_{j,h} - Z_\Delta\, \mathrm{sgn}(h_0 - h)$. It follows that

$$
\begin{aligned}
v(h) &= v_1(h) + v_2(h) + v_3(h) + v_4(h) + v_5(h)\\
&= 2 \delta_j^\top Z_\Delta^\top \varepsilon\, \mathrm{sgn}(h_0 - h)
- 2 \delta_j^\top Z_\Delta^\top X_j \big(X_j^\top X_j\big)^{-1} X_j^\top \varepsilon\, \mathrm{sgn}(h_0 - h)\\
&\quad - 2 \delta_j^\top \big(Z_\Delta^\top M_j Z_{j,h}\big) \big(Z_{j,h}^\top M_j Z_{j,h}\big)^{-1} Z_{j,h}^\top M_j \varepsilon\, \mathrm{sgn}(h_0 - h)\\
&\quad + \varepsilon^\top M_j Z_{j,h} \big(Z_{j,h}^\top M_j Z_{j,h}\big)^{-1} Z_{j,h}^\top M_j \varepsilon
- \varepsilon^\top M_j Z_{j,h_0} \big(Z_{j,h_0}^\top M_j Z_{j,h_0}\big)^{-1} Z_{j,h_0}^\top M_j \varepsilon.
\end{aligned}
$$

We now establish the convergence rate of the change point estimator $\hat a_{j,h}$ in Theorem 2. By Lemma A.2 in [26], there exists $\lambda > 0$ such that for every $\epsilon > 0$, $\inf_{|h - h_0| > C \|\delta_j\|^{-2}} g(h) \ge \lambda \|\delta_j\|^2$ with probability at least $1 - \epsilon$. Therefore, we only need to prove that

$$
P\big(|\hat a_{j,h} - a_j| > C \|\delta_j\|^{-2}\big)
\le P\Big( \sup_{|h - h_0| > C \|\delta_j\|^{-2}} V(h) \ge V(h_0) \Big)
\le P\Big( \sup_{h \in K(C)} \frac{|v(h)|}{|h_0 - h| \|\delta_j\|^2} \ge \lambda \Big) + \epsilon,
$$

where $K(C) = \{h : |h - h_0| > C \|\delta_j\|^{-2} \text{ and } n_j^{(l)} + \eta N_j \le h \le n_j^{(r)} - \eta N_j\}$ for a small number $\eta > 0$. It is easy to show that

$$
\begin{aligned}
\sup_{h \in K(C)} \frac{|v_2(h)|}{|h_0 - h| \|\delta_j\|^2}
&\le 2 \|\delta_j\|^{-1} \big\| Z_\Delta^\top X_j / (h_0 - h) \big\| \big\| (X_j^\top X_j)^{-1} X_j^\top \varepsilon \big\|
= O_p\big(\|\delta_j\|^{-1} N_j^{-1/2}\big) = o_p(1),\\
\sup_{h \in K(C)} \frac{|v_3(h)|}{|h_0 - h| \|\delta_j\|^2}
&\le 2 \|\delta_j\|^{-1} \big\| (Z_\Delta^\top M_j Z_{j,h}) / (h_0 - h) \big\| \big\| (Z_{j,h}^\top M_j Z_{j,h})^{-1} Z_{j,h}^\top M_j \varepsilon \big\|
= O_p\big(\|\delta_j\|^{-1} N_j^{-1/2}\big) = o_p(1),\\
\sup_{h \in K(C)} \frac{|v_4(h)|}{|h_0 - h| \|\delta_j\|^2}
&\le \|\delta_j\|^{-2} \big\| (Z_{j,h}^\top M_j Z_{j,h})^{-1/2} Z_{j,h}^\top M_j \varepsilon \big\|^2 / |h_0 - h|
= O_p\big(\|\delta_j\|^{-2} |h_0 - h|^{-1}\big) = O_p(1),\\
\sup_{h \in K(C)} \frac{|v_5(h)|}{|h_0 - h| \|\delta_j\|^2}
&\le \|\delta_j\|^{-2} \big\| (Z_{j,h_0}^\top M_j Z_{j,h_0})^{-1/2} Z_{j,h_0}^\top M_j \varepsilon \big\|^2 / |h_0 - h|
= O_p\big(\|\delta_j\|^{-2} |h_0 - h|^{-1}\big) = O_p(1).
\end{aligned}
$$

By Lemma A.3 in [26], there exists a large $C$ such that

$$
P\Big( \sup_{h \in K(C)} \frac{|v_1(h)|}{|h_0 - h| \|\delta_j\|^2} > \frac{\lambda}{5} \Big)
\le P\Big( \sup_{h \in K(C)} \|\delta_j\|^{-1} \big\| Z_\Delta^\top \varepsilon / (h_0 - h) \big\| > \frac{\lambda}{10} \Big)
\le \frac{100 B}{\lambda^2 C} \le \frac{\epsilon}{5}.
$$

This completes the proof of (A3).
To study the limiting distribution of $\hat a_{j,h}$, we need to investigate the behavior of $V(h)$ on $D(C) = \{h : |h - h_0| \le C \|\delta_j\|^{-2}\}$. Since $X_j^\top Z_\Delta = O_p(\|\delta_j\|^{-2})$ and $Z_{j,h}^\top M_j Z_\Delta = O_p(\|\delta_j\|^{-2})$, we have

$$
\begin{aligned}
|h_0 - h|\, g(h)
&= \delta_j^\top \Big[ Z_{j,h_0}^\top M_j Z_{j,h_0} - Z_{j,h_0}^\top M_j Z_{j,h} \big(Z_{j,h}^\top M_j Z_{j,h}\big)^{-1} Z_{j,h}^\top M_j Z_{j,h_0} \Big] \delta_j\\
&= \delta_j^\top \big\{ Z_\Delta^\top Z_\Delta - Z_\Delta^\top X_j (X_j^\top X_j)^{-1} X_j^\top Z_\Delta \big\} \delta_j
- \delta_j^\top \big\{ (Z_\Delta^\top M_j Z_{j,h}) (Z_{j,h}^\top M_j Z_{j,h})^{-1} (Z_{j,h}^\top M_j Z_\Delta) \big\} \delta_j\\
&= \delta_j^\top Z_\Delta^\top Z_\Delta \delta_j + o_p(1).
\end{aligned}
$$

Consider $v(h)$ in (A2). It is straightforward to show that if $|h - h_0| \le C \|\delta_j\|^{-2}$, then $v_2(h) = O_p(\|\delta_j\|^{-1} N_j^{-1/2}) = o_p(1)$ and $v_3(h) = O_p(\|\delta_j\|^{-1} N_j^{-1/2}) = o_p(1)$. It can also be shown that

$$
\begin{aligned}
v_4(h) + v_5(h)
&= \varepsilon^\top M_j Z_{j,h_0} \Big[ \big(Z_{j,h}^\top M_j Z_{j,h}\big)^{-1} - \big(Z_{j,h_0}^\top M_j Z_{j,h_0}\big)^{-1} \Big] Z_{j,h_0}^\top M_j \varepsilon\\
&\quad + \varepsilon^\top M_j Z_\Delta \big(Z_{j,h}^\top M_j Z_{j,h}\big)^{-1} Z_{j,h}^\top M_j \varepsilon
+ \varepsilon^\top M_j Z_{j,h_0} \big(Z_{j,h}^\top M_j Z_{j,h}\big)^{-1} Z_\Delta^\top M_j \varepsilon = o_p(1)
\end{aligned}
$$

on $D(C)$. Therefore, we obtain that

$$
V(h) - V(h_0) = -\delta_j^\top Z_\Delta^\top Z_\Delta \delta_j + 2 \delta_j^\top Z_\Delta^\top \varepsilon\, \mathrm{sgn}(h_0 - h) + o_p(1).
$$

In light of the proof of Proposition 2 in [26], we obtain the limiting distribution in Theorem 2. □
In the following, we introduce lemmas that are instrumental in proving Theorem 3.
Lemma A1.
If $\delta_j$ is fixed, or $\delta_j \to 0$ but satisfies Assumption 5, then as $N_j \to \infty$,

$$
\mathrm{var}^*\big(\varepsilon^*_{n_j^{(l)}}\big) = N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} \big(\varepsilon_i - \bar\varepsilon_{N_j}\big)^2 + O_p\big(N_j^{-1}\big),
$$

where $\varepsilon^*_{n_j^{(l)}}$ is defined in (11).
Proof of Lemma A1.
In view of (1) and (10), we have

$$
\tilde\varepsilon_i = \Big[ x_i I(a_j < i \le n_j^{(r)}) - \frac{1}{n_j^{(r)} - a_j} \sum_{l=a_j+1}^{n_j^{(r)}} x_l \Big]^\top \delta_j
+ (x_i - \bar x_j)^\top (\beta_j - \hat\beta_{j,h})
- \Big[ x_i I(\hat a_{j,h} < i \le n_j^{(r)}) - \frac{1}{N_j} \sum_{l=\hat a_{j,h}+1}^{n_j^{(r)}} x_l \Big]^\top \hat\delta_{j,h}
+ \varepsilon_i - \bar\varepsilon_{N_j},
$$

where $\bar x_j = \frac{1}{N_j} \sum_{l=n_j^{(l)}}^{n_j^{(r)}} x_l$ and $\bar\varepsilon_{N_j} = \frac{1}{N_j} \sum_{l=n_j^{(l)}}^{n_j^{(r)}} \varepsilon_l$.

Assume $a_j < \hat a_{j,h}$ (the other case can be handled in a similar way). Since $E^*\big(\varepsilon^*_{n_j^{(l)}}\big) = N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} \tilde\varepsilon_i = 0$, we have $\mathrm{var}^*\big(\varepsilon^*_{n_j^{(l)}}\big) = N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} \tilde\varepsilon_i^2$. It follows from (A5) that

$$
\begin{aligned}
\mathrm{var}^*\big(\varepsilon^*_{n_j^{(l)}}\big)
&= \frac{1}{N_j} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} (\varepsilon_i - \bar\varepsilon_{N_j})^2
+ \frac{2}{N_j} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} (x_i - \bar x_j)^\top (\beta_j - \hat\beta_{j,h}) (\varepsilon_i - \bar\varepsilon_{N_j})
+ \frac{1}{N_j} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} \big( (x_i - \bar x_j)^\top (\beta_j - \hat\beta_{j,h}) \big)^2\\
&\quad + \frac{1}{N_j^2} \Big( \sum_{i=\hat a_{j,h}+1}^{n_j^{(r)}} x_i^\top (\delta_j - \hat\delta_{j,h}) \Big)^2
+ \frac{1}{N_j^2} \Big( \sum_{i=a_j+1}^{\hat a_{j,h}} x_i^\top \delta_j \Big)^2
+ \frac{2}{N_j^2} \Big( \sum_{i=a_j+1}^{\hat a_{j,h}} x_i^\top \delta_j \Big) \Big( \sum_{l=\hat a_{j,h}+1}^{n_j^{(r)}} x_l^\top (\delta_j - \hat\delta_{j,h}) \Big).
\end{aligned}
$$

According to Corollary 1 in [26], we have $\beta_j - \hat\beta_{j,h} = O_p(N_j^{-1/2})$ and $\delta_j - \hat\delta_{j,h} = O_p(N_j^{-1/2})$, which, combined with $\hat a_{j,h} = a_j + O_p(\|\delta_j\|^{-2})$, yields $N_j^{-1} \sum_{i=a_j+1}^{\hat a_{j,h}} x_i^\top \delta_j = O_p(N_j^{-1} \|\delta_j\|^{-1})$ and $N_j^{-1} \sum_{l=\hat a_{j,h}+1}^{n_j^{(r)}} x_l^\top (\delta_j - \hat\delta_{j,h}) = O_p(N_j^{-1/2})$. This completes the proof. □
Lemma A2.
For every $\epsilon > 0$,

$$
\limsup_{N_j \to \infty} P\Big( \Big| N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} (\varepsilon^*_i)^2 - \mathrm{var}^*\big(\varepsilon^*_{n_j^{(l)}}\big) \Big| \ge \epsilon \Big) \le \epsilon \sigma^2.
$$
Proof of Lemma A2.
For every $\epsilon > 0$ and every $\eta > 0$,

$$
\begin{aligned}
P\Big( \Big| N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} (\varepsilon^*_i)^2 - \mathrm{var}^*\big(\varepsilon^*_{n_j^{(l)}}\big) \Big| \ge \epsilon \Big)
&= P\Big( \Big| N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} (\varepsilon^*_i)^2 - N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} \tilde\varepsilon_i^2 \Big| \ge \epsilon \Big)\\
&\le P\Big( \Big| N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} (\varepsilon^*_i)^2 I\big(|\varepsilon^*_i| \ge \eta \sqrt{N_j}\big) - N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} \tilde\varepsilon_i^2 I\big(|\tilde\varepsilon_i| \ge \eta \sqrt{N_j}\big) \Big| \ge \frac{\epsilon}{2} \Big)\\
&\quad + P\Big( \Big| N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} (\varepsilon^*_i)^2 I\big(|\varepsilon^*_i| < \eta \sqrt{N_j}\big) - N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} \tilde\varepsilon_i^2 I\big(|\tilde\varepsilon_i| < \eta \sqrt{N_j}\big) \Big| \ge \frac{\epsilon}{2} \Big).
\end{aligned}
$$

Clearly, for every $\epsilon > 0$ and every $\eta > 0$, as $N_j \to \infty$, we have

$$
N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} \tilde\varepsilon_i^2 I\big(|\tilde\varepsilon_i| \ge \eta \sqrt{N_j}\big) \to 0 \quad a.s.
$$

It follows that

$$
P\Big( N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} (\varepsilon^*_i)^2 I\big(|\varepsilon^*_i| \ge \eta \sqrt{N_j}\big) \ge \frac{\epsilon}{2} \Big)
\le \frac{2}{\epsilon}\, N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} \tilde\varepsilon_i^2 I\big(|\tilde\varepsilon_i| \ge \eta \sqrt{N_j}\big) \to 0 \quad a.s.
$$

Since

$$
\begin{aligned}
\limsup_{N_j \to \infty} P\Big( \Big| N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} (\varepsilon^*_i)^2 I\big(|\varepsilon^*_i| < \eta \sqrt{N_j}\big) - N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} \tilde\varepsilon_i^2 I\big(|\tilde\varepsilon_i| < \eta \sqrt{N_j}\big) \Big| \ge \frac{\epsilon}{2} \Big)
&\le \frac{4}{\epsilon^2} \limsup_{N_j \to \infty} N_j^{-2} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} \tilde\varepsilon_i^4 I\big(|\tilde\varepsilon_i| < \eta \sqrt{N_j}\big)\\
&\le \frac{4 \eta^2}{\epsilon^2} \limsup_{N_j \to \infty} N_j^{-1} \sum_{i=n_j^{(l)}}^{n_j^{(r)}} \tilde\varepsilon_i^2 = \frac{4 \eta^2 \sigma^2}{\epsilon^2} \quad a.s.,
\end{aligned}
$$

choosing $\eta^2 = \epsilon^3 / 8$ gives the assertion of Lemma A2. □
Lemma A3.
Under the assumption that $\sum_{i=t_1}^{t_2} x_i x_i^\top / (t_2 - t_1) \to_p V$ as $t_2 - t_1 \to \infty$, for every $\epsilon > 0$ there exists $\lambda > 0$ such that $g^* \ge \lambda \|\hat\delta_{j,h}\|^2$ with probability at least $1 - \epsilon$, where $g^* = \inf_{|h_b - h| > N_j \eta} g^*(h_b)$.
Lemma A4.
Under the assumption that $\sum_{i=t_1}^{t_2} x_i x_i^\top / (t_2 - t_1) \to_p V$ as $t_2 - t_1 \to \infty$, for every $\epsilon > 0$ and $\lambda > 0$, there exists $N_0 > 0$ such that for $N_j > N_0$, $P^*\big(|h_b - h| > N_j \eta\big) < \epsilon$.
The proofs of Lemmas A3 and A4 follow from [26] (see Lemmas A.2 and A.4). We next establish the consistency of $\hat a^*_{j,h_b}$ in Lemma A5, which serves as a key step in proving Theorem 3.
Lemma A5.
If $\delta_j$ is fixed, or $\delta_j \to 0$ but satisfies Assumption 5, then under Assumptions 1–4, we have

$$
\hat a^*_{j,h_b} - \hat a_{j,h} = O_{p^*}\big(\|\hat\delta_{j,h}\|^{-2}\big).
$$
Proof of Lemma A5.
We define $V^*(h_b) = \hat\delta^{*\top}_{j;h_b} \big(Z_{j,h_b}^\top M_j Z_{j,h_b}\big) \hat\delta^*_{j;h_b}$. By Equation (12), $\hat a^*_{j,h_b} = \arg\max_{h_b} V^*(h_b)$. Denote $h = \hat a_{j,h}$. If $h_b = h$, then $Z_{j,h_b} = Z_{j,h}$. It can be seen from (11) that

$$
\hat\delta^*_{j;h_b} = \big(Z_{j,h_b}^\top M_j Z_{j,h_b}\big)^{-1} \big(Z_{j,h_b}^\top M_j Z_{j,h} \hat\delta_{j,h} + Z_{j,h_b}^\top M_j \varepsilon^*\big), \qquad
\hat\delta^*_{j;h} = \hat\delta_{j,h} + \big(Z_{j,h}^\top M_j Z_{j,h}\big)^{-1} Z_{j,h}^\top M_j \varepsilon^*.
$$

It follows that

$$
V^*(h_b) - V^*(h) = \hat\delta_{j,h}^\top \Big[ Z_{j,h}^\top M_j Z_{j,h_b} \big(Z_{j,h_b}^\top M_j Z_{j,h_b}\big)^{-1} Z_{j,h_b}^\top M_j Z_{j,h} - Z_{j,h}^\top M_j Z_{j,h} \Big] \hat\delta_{j,h} + v^*(h_b),
$$

where

$$
\begin{aligned}
v^*(h_b) &= 2 \hat\delta_{j,h}^\top \big(Z_{j,h}^\top M_j Z_{j,h_b}\big) \big(Z_{j,h_b}^\top M_j Z_{j,h_b}\big)^{-1} Z_{j,h_b}^\top M_j \varepsilon^* - 2 \hat\delta_{j,h}^\top Z_{j,h}^\top M_j \varepsilon^*\\
&\quad + \varepsilon^{*\top} M_j Z_{j,h_b} \big(Z_{j,h_b}^\top M_j Z_{j,h_b}\big)^{-1} Z_{j,h_b}^\top M_j \varepsilon^* - \varepsilon^{*\top} M_j Z_{j,h} \big(Z_{j,h}^\top M_j Z_{j,h}\big)^{-1} Z_{j,h}^\top M_j \varepsilon^*.
\end{aligned}
$$

Define

$$
g^*(h_b) = \begin{cases}
\hat\delta_{j,h}^\top \Big[ Z_{j,h}^\top M_j Z_{j,h} - \big(Z_{j,h}^\top M_j Z_{j,h_b}\big) \big(Z_{j,h_b}^\top M_j Z_{j,h_b}\big)^{-1} \big(Z_{j,h_b}^\top M_j Z_{j,h}\big) \Big] \hat\delta_{j,h} \big/ |h - h_b|, & \text{if } h_b \ne h,\\
\hat\delta_{j,h}^\top \hat\delta_{j,h}, & \text{if } h_b = h.
\end{cases}
$$

We have

$$
V^*(h_b) - V^*(h) = -|h - h_b|\, g^*(h_b) + v^*(h_b).
$$

Again, since $V^*(\hat a^*_{j,h_b}) \ge V^*(h)$ by the definition of the maximizer, it suffices to prove, by Lemma A4, that

$$
P\Big( \sup_{|h_b - h| > C \|\hat\delta_{j,h}\|^{-2}} V^*(h_b) \ge V^*(h) \Big) < \epsilon.
$$

By Lemmas A1 and A2, for any $\epsilon > 0$ and $\eta > 0$, we have $P\big(|h_b - h| > N_j \eta\big) < \epsilon$ for a large $N_j$. Therefore, to prove the above equation, we only need to show that

$$
P\Big( \sup_{h_b \in K^*(C)} V^*(h_b) \ge V^*(h) \Big) < \epsilon,
$$

where $K^*(C) = \big\{h_b : |h_b - h| > C \|\hat\delta_{j,h}\|^{-2} \text{ and } n_j^{(l)} + N_j \eta \le h_b \le n_j^{(r)} - \eta N_j\big\}$ for a small number $\eta > 0$. Finding $\sup_{h_b \in K^*(C)} V^*(h_b)$ is equivalent to a restricted search; this is legitimate only after the consistency has been established. Thus, by Lemma A3, it suffices to show that

$$
P\Big( \sup_{h_b \in K^*(C)} \frac{|v^*(h_b)|}{|h - h_b| \|\hat\delta_{j,h}\|^2} > \lambda \Big) < \epsilon.
$$

Next, consider $v^*(h_b)$. Denote

$$
Z_\Delta^b = \begin{cases}
Z_{j,h_b} - Z_{j,h} = (0, \dots, 0, x_{h_b+1}, \dots, x_h, 0, \dots, 0)^\top, & h_b < h,\\
Z_{j,h} - Z_{j,h_b} = (0, \dots, 0, x_{h+1}, \dots, x_{h_b}, 0, \dots, 0)^\top, & h_b > h,\\
0, & h_b = h.
\end{cases}
$$

It follows that $Z_{j,h_b} = Z_{j,h} + Z_\Delta^b\, \mathrm{sgn}(h - h_b)$. Thus, we have

$$
\begin{aligned}
v^*(h_b) &= v^*_1(h_b) + v^*_2(h_b) + v^*_3(h_b) + v^*_4(h_b) + v^*_5(h_b)\\
&= 2 \hat\delta_{j,h}^\top Z_\Delta^{b\top} \varepsilon^*\, \mathrm{sgn}(h - h_b)
- 2 \hat\delta_{j,h}^\top Z_\Delta^{b\top} X_j \big(X_j^\top X_j\big)^{-1} X_j^\top \varepsilon^*\, \mathrm{sgn}(h - h_b)\\
&\quad - 2 \hat\delta_{j,h}^\top \big(Z_\Delta^{b\top} M_j Z_{j,h_b}\big) \big(Z_{j,h_b}^\top M_j Z_{j,h_b}\big)^{-1} Z_{j,h_b}^\top M_j \varepsilon^*\, \mathrm{sgn}(h - h_b)\\
&\quad + \varepsilon^{*\top} M_j Z_{j,h_b} \big(Z_{j,h_b}^\top M_j Z_{j,h_b}\big)^{-1} Z_{j,h_b}^\top M_j \varepsilon^*
- \varepsilon^{*\top} M_j Z_{j,h} \big(Z_{j,h}^\top M_j Z_{j,h}\big)^{-1} Z_{j,h}^\top M_j \varepsilon^*.
\end{aligned}
$$

By Lemmas A1 and A2, we deduce that $\big(X_j^\top X_j\big)^{-1} X_j^\top \varepsilon^* = N_j^{-1/2} O_{p^*}(1)$. Since $N_j^{-1/2+\alpha} \|\delta_j\|^{-1} \to 0$ and

$$
N_j^{-1/2+\alpha} \|\delta_j\|^{-1} - N_j^{-1/2+\alpha} \|\hat\delta_{j,h}\|^{-1}
= N_j^{-1/2+\alpha} \|\delta_j\|^{-1} \big( \|\hat\delta_{j,h}\| - \|\delta_j\| \big) \|\hat\delta_{j,h}\|^{-1}
= O_p\big(N_j^{-1+\alpha} \|\delta_j\|^{-2}\big),
$$

by Corollary 1 in [26], $N_j^{-1/2+\alpha} \|\hat\delta_{j,h}\|^{-1}$ tends to zero. It follows from $Z_\Delta^{b\top} X_j = |h - h_b|\, O_p(1)$ that

$$
\sup_{h_b \in K^*(C)} \frac{|v^*_2(h_b)|}{|h - h_b| \|\hat\delta_{j,h}\|^2}
\le 2 \|\hat\delta_{j,h}\|^{-1} \big\| Z_\Delta^{b\top} X_j / (h - h_b) \big\| \big\| (X_j^\top X_j)^{-1} X_j^\top \varepsilon^* \big\|
= O_{p^*}\big(\|\hat\delta_{j,h}\|^{-1} N_j^{-1/2}\big) = o_{p^*}(1).
$$

Similarly, we have $Z_\Delta^{b\top} M_j Z_{j,h_b} = |h - h_b|\, O_p(1)$ and $\big(Z_{j,h_b}^\top M_j Z_{j,h_b}\big)^{-1} Z_{j,h_b}^\top M_j \varepsilon^* = N_j^{-1/2} O_{p^*}(1)$ uniformly on $K^*(C)$, which implies

$$
\sup_{h_b \in K^*(C)} \frac{|v^*_3(h_b)|}{|h - h_b| \|\hat\delta_{j,h}\|^2}
\le 2 \|\hat\delta_{j,h}\|^{-1} \big\| (Z_\Delta^{b\top} M_j Z_{j,h_b}) / (h - h_b) \big\| \big\| (Z_{j,h_b}^\top M_j Z_{j,h_b})^{-1} Z_{j,h_b}^\top M_j \varepsilon^* \big\|
= O_{p^*}\big(\|\hat\delta_{j,h}\|^{-1} N_j^{-1/2}\big) = o_{p^*}(1).
$$

Furthermore, it is easy to show that $\sup_{h_b \in K^*(C)} |v^*_4(h_b)| / \big(|h - h_b| \|\hat\delta_{j,h}\|^2\big)$ and $\sup_{h_b \in K^*(C)} |v^*_5(h_b)| / \big(|h - h_b| \|\hat\delta_{j,h}\|^2\big)$ are uniformly $O_{p^*}(1)$ on $K^*(C)$. By Lemma A.3 in [26],

$$
P\Big( \sup_{h_b \in K^*(C)} \frac{|v^*_1(h_b)|}{|h - h_b| \|\hat\delta_{j,h}\|^2} > \frac{\lambda}{5} \Big)
\le P\Big( \sup_{h_b \in K^*(C)} \|\hat\delta_{j,h}\|^{-1} \big\| Z_\Delta^{b\top} \varepsilon^* / (h - h_b) \big\| > \frac{\lambda}{10} \Big)
\le \frac{100 B}{\lambda^2 C} \le \frac{\epsilon}{5}
$$

for a large $C$. In conclusion, the consistency of the bootstrap estimator has been established. □
Proof of Theorem 3.
Lemma A5 implies that when $C$ is large, $\hat a^*_{j,h_b}$ lies outside the set $K^*(C)$ with high probability. Let $D^*(C)$ denote the complement of $K^*(C)$, namely $D^*(C) = \{h_b : |h_b - h| \le C \|\hat\delta_{j,h}\|^{-2}\}$. To study the limiting distribution, we consider the expression of $V^*(h_b) - V^*(h)$ on $D^*(C)$.
We use $Z_{j,h} = Z_{j,h_b} - Z_\Delta^b\, \mathrm{sgn}(h - h_b)$ to obtain that

$$
\begin{aligned}
|h - h_b|\, g^*(h_b)
&= \hat\delta_{j,h}^\top \Big[ Z_{j,h}^\top M_j Z_{j,h} - Z_{j,h}^\top M_j Z_{j,h_b} \big(Z_{j,h_b}^\top M_j Z_{j,h_b}\big)^{-1} Z_{j,h_b}^\top M_j Z_{j,h} \Big] \hat\delta_{j,h}\\
&= \hat\delta_{j,h}^\top \big\{ Z_\Delta^{b\top} Z_\Delta^b - Z_\Delta^{b\top} X_j \big(X_j^\top X_j\big)^{-1} X_j^\top Z_\Delta^b \big\} \hat\delta_{j,h}
- \hat\delta_{j,h}^\top \big\{ Z_\Delta^{b\top} M_j Z_{j,h_b} \big(Z_{j,h_b}^\top M_j Z_{j,h_b}\big)^{-1} Z_{j,h_b}^\top M_j Z_\Delta^b \big\} \hat\delta_{j,h}.
\end{aligned}
$$

Since $X_j^\top Z_\Delta^b = O_p\big(\|\hat\delta_{j,h}\|^{-2}\big)$ and $Z_{j,h_b}^\top M_j Z_\Delta^b = O_p\big(\|\hat\delta_{j,h}\|^{-2}\big)$, we have

$$
|h - h_b|\, g^*(h_b) = \hat\delta_{j,h}^\top Z_\Delta^{b\top} Z_\Delta^b \hat\delta_{j,h} + o_{p^*}(1).
$$

Since $|h - h_b| \le C \|\hat\delta_{j,h}\|^{-2}$, we deduce that $v^*_2(h_b)$ and $v^*_3(h_b)$ are both bounded by $O_{p^*}\big(N_j^{-1/2} \|\hat\delta_{j,h}\|^{-1}\big) = o_{p^*}(1)$, and $v^*_4(h_b)$ and $v^*_5(h_b)$ are both $o_{p^*}(1)$ uniformly on $D^*(C)$.
By the above results, we obtain that

$$
V^*(h_b) - V^*(h) = -\hat\delta_{j,h}^\top Z_\Delta^{b\top} Z_\Delta^b \hat\delta_{j,h} + 2 \hat\delta_{j,h}^\top Z_\Delta^{b\top} \varepsilon^*\, \mathrm{sgn}(h - h_b) + o_{p^*}(1),
$$

which, combined with (A4), completes the proof. □

References

1. Haynes, K.; Eckley, I.A.; Fearnhead, P. Computationally efficient changepoint detection for a range of penalties. J. Comput. Graph. Stat. 2017, 26, 134–143.
2. Picard, F.; Robin, S.; Lavielle, M.; Vaisse, C.; Daudin, J.J. A statistical approach for array CGH data analysis. BMC Bioinform. 2005, 6, 27.
3. Li, J.; Fearnhead, P.; Fryzlewicz, P.; Wang, T. Automatic change-point detection in time series via deep learning. J. R. Stat. Soc. Ser. B Stat. Methodol. 2024, 86, 273–285.
4. Killick, R.; Fearnhead, P.; Eckley, I.A. Optimal detection of changepoints with a linear computational cost. J. Am. Stat. Assoc. 2012, 107, 1590–1598.
5. Bai, J.; Perron, P. Estimating and testing linear models with multiple structural changes. Econometrica 1998, 66, 47–78.
6. Bai, J.; Perron, P. Computation and analysis of multiple structural change models. J. Appl. Econom. 2003, 18, 1–22.
7. Davis, R.A.; Lee, T.C.M.; Rodriguez-Yam, G.A. Structural break estimation for nonstationary time series models. J. Am. Stat. Assoc. 2006, 101, 223–239.
8. Harchaoui, Z.; Lévy-Leduc, C. Multiple change-point estimation with a total variation penalty. J. Am. Stat. Assoc. 2010, 105, 1480–1493.
9. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288.
10. Jin, B.; Wu, Y.; Shi, X. Consistent two-stage multiple change-point detection in linear models. Can. J. Stat. 2016, 44, 161–179.
11. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
12. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
13. Zhang, C. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
14. Li, J.; Jin, B. Multi-threshold accelerated failure time model. Ann. Stat. 2018, 46, 2657–2682.
15. Eichinger, B.; Kirch, C. A MOSUM procedure for the estimation of multiple random change points. Bernoulli 2018, 24, 526–564.
16. Fang, X.; Li, J.; Siegmund, D. Segmentation and estimation of change-point models: False positive control and confidence regions. Ann. Stat. 2020, 48, 1615–1647.
17. Antoch, J.; Hušková, M.; Veraverbeke, N. Change-point problem and bootstrap. J. Nonparametr. Stat. 1995, 5, 123–144.
18. Dümbgen, L. The asymptotic behavior of some nonparametric change-point estimators. Ann. Stat. 1991, 19, 1471–1495.
19. Hušková, M.; Kirch, C. Bootstrapping confidence intervals for the change-point of time series. J. Time Ser. Anal. 2008, 29, 947–972.
20. Cho, H.; Kirch, C. Bootstrap confidence intervals for multiple change points based on moving sum procedures. Comput. Stat. Data Anal. 2022, 175, 107552.
21. Lv, J.; Fan, Y. A unified approach to model selection and sparse recovery using regularized least squares. Ann. Stat. 2009, 37, 3498–3528.
22. Ing, C.K.; Lai, T.L. A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Stat. Sin. 2011, 21, 1473–1513.
23. Jin, B.; Shi, X.; Wu, Y. A novel and fast methodology for simultaneous multiple structural break estimation and variable selection for nonstationary time series models. Stat. Comput. 2013, 23, 221–231.
24. White, H. Maximum likelihood estimation of misspecified models. Econometrica 1982, 50, 1–25.
25. Flynn, C.J.; Hurvich, C.M.; Simonoff, J.S. Efficiency for regularization parameter selection in penalized likelihood estimation of misspecified models. J. Am. Stat. Assoc. 2013, 108, 1031–1043.
26. Bai, J. Estimation of a change point in multiple regression models. Rev. Econ. Stat. 1997, 79, 551–563.
27. Takanami, T.; Kitagawa, G. Estimation of the arrival times of seismic waves by multivariate time series model. Ann. Inst. Stat. Math. 1991, 43, 407–433.
Figure 1. Simulated regression data with change points indicated by dotted lines.
Figure 2. Change points estimated by our method (vertical lines), with shaded areas representing the 95% confidence intervals around the change points. The results for the two change points are shown in red and blue, respectively.
Table 1. Performance of different methods for multiple change point detection.

Method            c_all (%)   c_1 (%)   c_2 (%)   c_3 (%)
TSP_oga,wald      90.70       98.50     96.80     97.80
  Mean                        150.34    300.41    449.77
  SE                          1.66      2.31      2.22
TSMCD_lasso       72.60       95.20     95.80     96.40
  Mean                        150.61    300.43    450.16
  SE                          2.42      2.31      2.19
Table 2. Performance of bootstrap_oga,wald: empirical coverage (%).

(1 − α)%   a_all    a_1      a_2      a_3
90         93.80    91.80    93.80    91.00
95         96.80    95.80    95.80    95.60
Table 3. Bootstrap CIs for the change points.

Change Point   95% Bootstrap CIs   90% Bootstrap CIs
3074           [3072, 3084]        [3073, 3080]
3914           [3877, 3952]        [3882, 3948]
