Article

A Blockwise Bootstrap-Based Two-Sample Test for High-Dimensional Time Series

Joint Laboratory of Data Science and Business Intelligence, Southwestern University of Finance and Economics, Chengdu 611130, China
Entropy 2024, 26(3), 226; https://doi.org/10.3390/e26030226
Submission received: 16 January 2024 / Revised: 24 February 2024 / Accepted: 27 February 2024 / Published: 1 March 2024
(This article belongs to the Special Issue Recent Advances in Statistical Inference for High Dimensional Data)

Abstract

We propose a two-sample testing procedure for high-dimensional time series. To obtain the asymptotic distribution of our ℓ∞-type test statistic under the null hypothesis, we establish high-dimensional central limit theorems (HCLTs) for α-mixing sequences. Specifically, we derive two HCLTs for the maximum of a sum of high-dimensional α-mixing random vectors under the assumptions of bounded finite moments and exponential tails, respectively. The proposed HCLT for α-mixing sequences under the bounded finite moments assumption is novel, and compared with existing results, we improve the convergence rate of the HCLT under the exponential tails assumption. To compute the critical value, we employ the blockwise bootstrap method. Importantly, our approach does not require the independence of the two samples, making it applicable to detecting change points in high-dimensional time series. Numerical results demonstrate the effectiveness and advantages of our method.

1. Introduction

A fundamental testing problem in multivariate analysis involves assessing the equality of two mean vectors, denoted as μ_X and μ_Y. Since its introduction by [1], the Hotelling T² test has proven to be a valuable tool in multivariate analysis. Subsequently, numerous studies have addressed the testing of μ_X = μ_Y within various contexts and under distinct assumptions. See refs. [2,3], along with their respective references.
Consider two sets of observations, {X_t}_{t=1}^{n_1} and {Y_t}_{t=1}^{n_2}, where X_t = (X_{t,1}, …, X_{t,p})^T and Y_t = (Y_{t,1}, …, Y_{t,p})^T. These observations are drawn from two populations with means μ_X and μ_Y, respectively. The classical problem is to test the hypotheses:
H_0: μ_X = μ_Y versus H_1: μ_X ≠ μ_Y.
When {X_t}_{t=1}^{n_1} and {Y_t}_{t=1}^{n_2} are two independent sequences that are also independent of each other, a considerable body of literature focuses on testing Hypothesis (1). The ℓ2-type test statistic corresponding to (1) is of the form (X̄ − Ȳ)^T S^{-1} (X̄ − Ȳ), where X̄ = n_1^{-1} Σ_{t=1}^{n_1} X_t, Ȳ = n_2^{-1} Σ_{t=1}^{n_2} Y_t, and S^{-1} is the weight matrix. A straightforward choice for S^{-1} is the identity matrix I_p [4,5], implying equal weighting for each dimension. Several classical asymptotic theories have been developed based on this choice. However, it disregards the variability in each dimension and the correlations between dimensions, resulting in suboptimal performance, particularly in the presence of heterogeneity or correlations between dimensions. In recent decades, numerous researchers have investigated various choices for S^{-1} along with the corresponding asymptotic theories; see refs. [6,7]. In addition, some researchers have developed a framework centered on ℓ∞-type test statistics, represented as max_{j∈[p]} |(S^{-1/2}(X̄ − Ȳ))_j| [8,9,10]. Extreme value theory plays a pivotal role in deriving the asymptotic behaviors of these test statistics.
However, when {X_t}_{t=1}^{n_1} and {Y_t}_{t=1}^{n_2} are two weakly dependent sequences that are not independent of each other, the above methods may not work well. In this paper, we introduce an ℓ∞-type test statistic T_n := {n_1 n_2/(n_1 + n_2)}^{1/2} |X̄ − Ȳ|_∞ for testing H_0 with two dependent sequences. Based on Σ, the variance of {n_1 n_2/(n_1 + n_2)}^{1/2} (X̄ − Ȳ), we construct a Gaussian maximum, denoted T_n^G, to approximate T_n under the null hypothesis. When n_1 = n_2 = n, T_n can be written as |S_n|_∞, the maximum norm of a sum of high-dimensional weakly dependent random vectors, where S_n = (2n)^{-1/2} Σ_{t=1}^n (X_t − Y_t). Let T_n^G = |G|_∞ with G = (G_1, …, G_p)^T ~ N{0, var(S_n)}, and let A be a class of Borel subsets of R^p. Define
ρ_n(A) = sup_{A∈A} |P(S_n ∈ A) − P(G ∈ A)|.
In particular, let A_max consist of all sets of the form A = {(a_1, …, a_p)^T ∈ R^p : max_{j∈[p]} |a_j| ≤ x} for some x ∈ R. Then we have
ρ_n(A_max) = sup_{x∈R} |P(T_n ≤ x) − P(T_n^G ≤ x)|.
Note that ρ_n(A_max) is exactly the Kolmogorov distance between the distributions of T_n and T_n^G.
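As a purely illustrative aside (ours, not part of the paper's theory), ρ_n(A_max) can be approximated by Monte Carlo: draw many replicates of T_n and of the Gaussian maximum, and compute the two-sample Kolmogorov distance between the two empirical distributions. For i.i.d. N(0, I_p) data the two laws coincide, so the estimated distance is small. The helper names below are hypothetical.

```python
import numpy as np

def kolmogorov_distance(a, b):
    """Two-sample Kolmogorov distance sup_x |F_a(x) - F_b(x)| between
    the empirical CDFs of the samples a and b."""
    grid = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(Fa - Fb))

def max_stat_draws(reps, n, p, rng):
    """Monte Carlo draws of |n^{-1/2} sum_t xi_t|_inf for i.i.d. xi_t ~ N(0, I_p)."""
    xi = rng.standard_normal((reps, n, p))
    return np.max(np.abs(xi.sum(axis=1)) / np.sqrt(n), axis=1)
```

Since n^{-1/2} Σ_t ξ_t is exactly N(0, I_p) here, comparing max_stat_draws(...) with maxima of |N(0, I_p)| draws yields a distance close to zero, in line with the Gaussian approximation viewpoint.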
When the dimension p diverges exponentially with respect to the sample size n, several studies have derived ρ_n(A_max) = o(1) under weak dependence assumptions. Based on the coupling method for β-mixing sequences, ref. [11] obtained ρ_n(A_max) = o(1) under the β-mixing condition. Ref. [12] extended the investigation to the physical dependence framework introduced by [13]. Considering three distinct types of dependence, namely α-mixing, m-dependence, and physical dependence measures, ref. [14] established nonasymptotic error bounds for Gaussian approximations of sums of high-dimensional dependent random vectors. Their analysis covered various choices of A, including hyper-rectangles, simple convex sets, and sparsely convex sets. Let A_re be the class of all hyper-rectangles in R^p. Under the α-mixing scenario and some mild regularity conditions, [14] showed
ρ_n(A_re) ≲ {log(pn)}^{7/6} n^{−1/9},
hence the Gaussian approximation holds if log(pn) = o(n^{2/21}). In this paper, under conditions similar to or even weaker than those of [14], we obtain
ρ_n(A_max) ≲ {log(pn)}^{3/2} n^{−1/6},
which implies that the Gaussian approximation holds if log(pn) = o(n^{1/9}). Refer to Remark 1 for more details on the comparison of the convergence rates. By using the Gaussian-to-Gaussian comparison and Nazarov's inequality for p-dimensional random vectors, we can easily extend our result to ρ_n(A_re) ≲ {log(pn)}^{3/2} n^{−1/6}. Given that our framework and numerous testing procedures rely on ℓ∞-type test statistics, we state our results under A_max. When p diverges polynomially with respect to n, to the best of our knowledge, no existing literature provides the convergence rate of ρ_n(A_max) for α-mixing sequences under bounded finite moments.
Based on the Gaussian approximation for high-dimensional independent random vectors [15,16], we employ the coupling method for α-mixing sequences [17] and the big-and-small-block technique to specify the convergence rate of ρ_n(A_max) under various divergence rates of p. For more details, refer to Theorem 1 in Section 3.1 and its proof in Appendix A. Given that Σ is typically unknown in practice, we develop a data-driven procedure based on the blockwise wild bootstrap [18] to determine the critical value at a given significance level α. The blockwise wild bootstrap method is widely used in time series analysis; see [19,20] and references therein.
The independence of {X_t}_{t=1}^{n_1} and {Y_t}_{t=1}^{n_2} is not a necessary assumption in our method; we only require that the paired sequence {(X_t, Y_t)} be weakly dependent. Therefore, our method can be applied effectively to detect change points in high-dimensional time series. Further details on this application can be found in Section 4.
The rest of this paper is organized as follows. Section 2 introduces the test statistic and the blockwise bootstrap method. The convergence rates of the Gaussian approximations for high-dimensional α-mixing sequences and the theoretical properties of the proposed test are given in Section 3. In Section 4, an application to change point detection for high-dimensional time series is presented. The selection method for the tuning parameter and a simulation study investigating the numerical performance of the test are presented in Section 5. We apply the proposed method to opening price data from multiple stocks in Section 6. Section 7 discusses the results and outlines future work. The proofs of the main results in Section 3 are detailed in Appendices A, B, C and D.
Notation: 
For any positive integer p ≥ 1, we write [p] = {1, …, p}. We use |a|_∞ = max_{j∈[p]} |a_j| to denote the ℓ∞-norm of a p-dimensional vector a. Let ⌊x⌋ and ⌈x⌉ represent the greatest integer less than or equal to x and the smallest integer greater than or equal to x, respectively. For two sequences of positive numbers {a_n} and {b_n}, we write a_n ≲ b_n or b_n ≳ a_n if lim sup_{n→∞} a_n/b_n ≤ c_0 for some positive constant c_0. We write a_n ≍ b_n if a_n ≲ b_n and b_n ≲ a_n hold simultaneously. Denote 0_p = (0, …, 0)^T ∈ R^p. For any m × m matrix A = (a_{ij})_{m×m}, let |A|_∞ = max_{i,j∈[m]} |a_{ij}| and ‖A‖_2 be the spectral norm of A. Additionally, denote λ_min(A) as the smallest eigenvalue of A. Let 1(·) be the indicator function. For any x, y ∈ R, denote x ∨ y = max{x, y} and x ∧ y = min{x, y}. Given γ > 0, we define the function ψ_γ(x) := exp(x^γ) − 1 for any x > 0. For a real-valued random variable ξ, we define ‖ξ‖_{ψ_γ} := inf[λ > 0 : E{ψ_γ(|ξ|/λ)} ≤ 1]. Throughout the paper, we use c, C ∈ (0, ∞) to denote two generic finite constants that do not depend on (n_1, n_2, p) and may differ at each occurrence.

2. Methodology

2.1. Test Statistic and Its Gaussian Analog

Consider two weakly stationary time series {X_t, t ∈ Z} and {Y_t, t ∈ Z} with X_t = (X_{t,1}, …, X_{t,p})^T and Y_t = (Y_{t,1}, …, Y_{t,p})^T. Let μ_X = E(X_t) and μ_Y = E(Y_t). The primary focus is on testing the equality of the mean vectors of the two populations:
H_0: μ_X = μ_Y versus H_1: μ_X ≠ μ_Y.
Given the observations {X_t}_{t=1}^{n_1} and {Y_t}_{t=1}^{n_2}, the estimators of μ_X and μ_Y are, respectively, μ̂_X = n_1^{-1} Σ_{t=1}^{n_1} X_t and μ̂_Y = n_2^{-1} Σ_{t=1}^{n_2} Y_t. In this paper, we assume n_1 ≍ n_2 ≍ n. It is natural to consider the ℓ∞-type test statistic T_n = {n_1 n_2/(n_1 + n_2)}^{1/2} |μ̂_X − μ̂_Y|_∞. Write ñ = max{n_1, n_2}. Define two new sequences {X̃_t}_{t=1}^{ñ} and {Ỹ_t}_{t=1}^{ñ} with
X̃_t = X_{t∧n_1} 1(1 ≤ t ≤ n_1) and Ỹ_t = Y_{t∧n_2} 1(1 ≤ t ≤ n_2).
For each t ∈ [ñ], let
Z_t = {n_2 ñ/(n_1(n_1 + n_2))}^{1/2} X̃_t − {n_1 ñ/(n_2(n_1 + n_2))}^{1/2} Ỹ_t.
Then, T n can be rewritten as
T_n = |ñ^{-1/2} Σ_{t=1}^{ñ} Z_t|_∞.
We reject the null hypothesis H_0 if T_n > cv_α, where cv_α represents the critical value at the significance level α ∈ (0, 1). Determining cv_α involves deriving the distribution of T_n under H_0. However, due to the divergence of p in the high-dimensional scenario, obtaining the distribution of T_n is challenging. To address this challenge, we employ the Gaussian approximation theorem [15,16]. We seek a Gaussian analog, denoted T_n^G, such that the Kolmogorov distance between T_n and T_n^G converges to zero under H_0. Then, we can replace cv_α by cv_α^G := inf{x > 0 : P(T_n^G > x) ≤ α}. Define a p-dimensional Gaussian vector
G ~ N(0_p, Ξ_ñ) with Ξ_ñ = var(ñ^{-1/2} Σ_{t=1}^{ñ} Z_t).
We then define the Gaussian analogue of T n as
T_n^G = |G|_∞.
Proposition 1 below demonstrates that the null distribution of T n can be effectively approximated by the distribution of T n G .

2.2. Blockwise Bootstrap

Note that the long-run covariance matrix Ξ n ˜ specified in (3) is typically unknown. As a result, determining cv α G through the distribution of T n G becomes challenging. To address this challenge, we introduce a parametric bootstrap estimator for T n using the blockwise bootstrap method [18].
For some constant ϑ ∈ [1/2, 1), let S ≍ ñ^{1−ϑ} and B = ⌊ñ/S⌋ be the size of each block and the number of blocks, respectively. Denote I_b = {(b−1)S+1, …, bS} for b ∈ [B−1] and I_B = {(B−1)S+1, …, ñ}. Let {ϱ_b}_{b=1}^B be a sequence of i.i.d. standard normal random variables and ϱ = (ϱ_1, …, ϱ_ñ)^T, where ϱ_t = ϱ_b if t ∈ I_b. Define the bootstrap estimator of T_n as
T̂_n^G = |ñ^{-1/2} Σ_{t=1}^{ñ} (Z_t − Z̄) ϱ_t|_∞,
where Z̄ = ñ^{-1} Σ_{t=1}^{ñ} Z_t. Based on this estimator, we define the estimated critical value ĉv_α as
ĉv_α := inf{x > 0 : P(T̂_n^G > x | E) ≤ α},
where E = {X_1, …, X_{n_1}, Y_1, …, Y_{n_2}}. Then, we reject the null hypothesis H_0 if T_n > ĉv_α. The procedure for selecting the parameter ϑ (or the block size S) is detailed in Section 5.1. In practice, we obtain ĉv_α through the following bootstrap procedure: generate K independent sequences {ϱ_{(1),t}}_{t=1}^{ñ}, …, {ϱ_{(K),t}}_{t=1}^{ñ}, with each {ϱ_{(k),t}}_{t=1}^{ñ} generated in the same manner as {ϱ_t}_{t=1}^{ñ}. For each k ∈ [K], calculate T̂_{(k),n}^G with {ϱ_{(k),t}}_{t=1}^{ñ}. Then, ĉv_α is the empirical (1−α)-quantile of {T̂_{(1),n}^G, …, T̂_{(K),n}^G}, i.e., the ⌈(1−α)K⌉-th smallest value. Here, K is the number of bootstrap replications.
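The bootstrap procedure above can be sketched in a few lines of Python (an illustrative implementation of ours, with hypothetical names; K and the seed are arbitrary): one N(0,1) multiplier is shared by all centered observations in a block, the ℓ∞ statistic is recomputed K times, and the empirical (1−α)-quantile of the K draws gives ĉv_α.

```python
import numpy as np

def blockwise_bootstrap_cv(Z, S, alpha=0.05, K=1000, rng=None):
    """Blockwise bootstrap critical value for T_n.

    Z: (n, p) array of the combined series Z_t; S: block size.
    Each replication draws one standard normal multiplier per block,
    applies it to every centered Z_t in that block, and recomputes
    the statistic |n^{-1/2} sum_t (Z_t - Zbar) * rho_t|_inf.
    """
    rng = np.random.default_rng(rng)
    n, _ = Z.shape
    Zc = Z - Z.mean(axis=0)                          # Z_t - Z_bar
    B = max(n // S, 1)                               # number of blocks
    block_id = np.minimum(np.arange(n) // S, B - 1)  # last block absorbs remainder
    stats = np.empty(K)
    for k in range(K):
        rho = rng.standard_normal(B)[block_id]       # one multiplier per block
        stats[k] = np.max(np.abs(Zc.T @ rho)) / np.sqrt(n)
    return np.quantile(stats, 1 - alpha)
```

The test then rejects H_0 when the observed statistic exceeds blockwise_bootstrap_cv(Z, S, alpha).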

3. Theoretical Results

We employ the concept of ‘ α -mixing’ to characterize the serial dependence of { ( X t , Y t ) } , with the α -mixing coefficient at lag κ defined as
α(κ) := sup_r sup_{A ∈ F_{−∞}^r, B ∈ F_{r+κ}^{∞}} |P(A ∩ B) − P(A)P(B)|,
where F_{−∞}^r and F_{r+κ}^{∞} are the σ-fields generated by {(X_t, Y_t) : t ≤ r} and {(X_t, Y_t) : t ≥ r + κ}, respectively. We call the sequence {(X_t, Y_t)} α-mixing if α(κ) → 0 as κ → ∞.

3.1. Gaussian Approximation for High-Dimensional α -Mixing Sequence

To show that the Kolmogorov distance between T_n and T_n^G converges to zero under various divergence rates of p, we need the following central limit theorems for high-dimensional α-mixing sequences.
Theorem 1.
Let {ξ_t}_{t=1}^n be an α-mixing sequence of p-dimensional centered random vectors and let {α(κ)}_{κ≥1} denote the α-mixing coefficients of {ξ_t}, defined in the same manner as (5). Write S_n = (S_{n,1}, …, S_{n,p})^T = n^{-1/2} Σ_{t=1}^n ξ_t and W = (W_1, …, W_p)^T ~ N(0_p, Σ_n) with Σ_n = E(S_n S_n^T). Define
ρ_n = sup_{x∈R} |P(|S_n|_∞ ≤ x) − P(|W|_∞ ≤ x)|.
(i) 
If max_{t∈[n]} max_{j∈[p]} E(|ξ_{t,j}|^m) ≤ C_1^*, α(κ) ≤ C_2^* κ^{−τ} and λ_min(Σ_n) ≥ C_3^* for some m > 3, τ > max{2m/(m−3), 3} and constants C_1^*, C_2^*, C_3^* > 0, we have
ρ_n ≲ p^{1/2} (log p)^{1/4} n^{−τ̃}
provided that p = o(n^{2τ̃}), where τ̃ = τ/(11τ + 12).
(ii) 
If max_{t∈[n]} max_{j∈[p]} ‖ξ_{t,j}‖_{ψ_{γ_1}} ≤ M_n, α(κ) ≤ C_1^{**} exp(−C_2^{**} κ^{γ_2}) and min_{j∈[p]} (Σ_n)_{j,j} ≥ C_3^{**} for some M_n ≥ 1, γ_1 ∈ (0, 2], γ_2 > 0 and constants C_1^{**}, C_2^{**}, C_3^{**} > 0, we have
ρ_n ≲ M_n {log(pn)}^{max{(2γ_2+1)/(2γ_2), 3/2}} n^{−1/6}
provided that {log(pn)}^3 = o{n^{γ_1 γ_2/(2γ_1 + 2γ_2 − γ_1 γ_2)}} and M_n^2 {log(pn)}^{1/γ_2} = o(n^{1/3}).
Remark 1.
In scenarios where the dimension p diverges polynomially with respect to n, Theorem 1(i) represents a novel contribution to the existing literature. Moreover, if τ → ∞ (i.e., α(κ) ≲ exp(−Cκ) for some constant C > 0), we have τ̃ → 1/11, and thus ρ_n = o(1) if p(log p)^{1/2} = o(n^{2/11}). Compared with Theorem 1 in [14], which provides a Gaussian approximation result when p diverges exponentially with respect to n, Theorem 1(ii) offers three improvements. Firstly, all conditions of Theorem 1(ii) are equivalent to those in Theorem 1 of [14], except that we additionally permit γ_1 ∈ (0, 1), thereby offering a weaker assumption that is more broadly applicable. Secondly, the convergence rate in n, namely n^{−1/6} in Theorem 1(ii), outperforms the rate n^{−1/9} demonstrated in Theorem 1 of [14]. Note that the convergence rate in Theorem 1 of [14] can be rewritten as
{M_n {log(pn)}^{(2γ_2+1)/(2γ_2)} n^{−1/6}}^{2/3} + M_n {log(pn)}^{7/6} n^{−1/9}.
To ensure ρ_n = o(1), our result requires M_n^6 {log(pn)}^{(6γ_2+3)/γ_2} = o(n) when γ_2 ≤ 2/3 and M_n^6 {log(pn)}^{max{(6γ_2+3)/γ_2, 9}} = o(n) when γ_2 > 2/3, respectively. Comparatively, the basic requirements under Theorem 1 of [14] are M_n^6 {log(pn)}^{(6γ_2+3)/γ_2} = o(n) when γ_2 ≤ 2/3 and M_n^9 {log(pn)}^{21/2} = o(n) when γ_2 > 2/3, respectively. Since (6γ_2+3)/γ_2 < 21/2 when γ_2 > 2/3, our result permits a larger or equal divergence rate of p compared with Theorem 1 in [14].

3.2. Theoretical Properties

In order to derive the theoretical properties of T_n, the following regularity assumptions are needed.
Assumption 1.
(i) 
For some m > 4, there exists a constant C_1 > 0 s.t. max_{t∈[ñ]} max_{j∈[p]} E(|Z_{t,j}|^m) ≤ C_1.
(ii) 
There exists a constant C_2 > 0 s.t. α(κ) ≤ C_2 κ^{−τ} for some τ > 3m/(m−4).
(iii) 
There exists a constant C_3 > 0 s.t. λ_min(Ξ_ñ) ≥ C_3.
Assumption 2.
(i) 
There exists a constant C_1 > 0 s.t. max_{t∈[ñ]} max_{j∈[p]} ‖Z_{t,j}‖_{ψ_2} ≤ C_1.
(ii) 
There exist two constants C_2, C_3 > 0 s.t. α(κ) ≤ C_2 exp(−C_3 κ).
(iii) 
There exists a constant C_4 > 0 s.t. min_{j∈[p]} (Ξ_ñ)_{j,j} ≥ C_4.
Remark 2.
The two mild Assumptions 1 and 2 delineate the conditions on {(X_t, Y_t)} needed to develop Gaussian approximation theories when the dimension p diverges at polynomial and exponential rates relative to the sample size n, respectively. Assumptions 1(i) and 1(ii) are common in multivariate time series analysis. Due to n_1 ≍ n_2 ≍ n, if max_{t∈[n_1], j∈[p]} E(|X_{t,j}|^m) ≤ C and max_{t∈[n_2], j∈[p]} E(|Y_{t,j}|^m) ≤ C, then Assumption 1(i) holds, as verified by the triangle inequality. Additionally, Assumption 1(iii) necessitates the strong nondegeneracy of Ξ_ñ, a requirement commonly assumed in Gaussian approximation theories (see refs. [21,22], among others). Note that Assumption 2(iii) is implied by Assumption 1(iii); the former only necessitates the nondegeneracy of min_{j∈[p]} var(ñ^{-1/2} Σ_{t=1}^{ñ} Z_{t,j}). We can relax Assumption 2(i) to max_{t∈[ñ]} max_{j∈[p]} ‖Z_{t,j}‖_{ψ_γ} ≤ C for any γ ∈ (0, 2], a standard assumption in the literature on ultra-high-dimensional data analysis, which ensures subexponential upper bounds for the tail probabilities of the statistics in question when p ≫ n, as discussed in [23,24]. The requirement of sub-Gaussianity in Assumption 2(i) is made for simplicity. If {X_t} and {Y_t} share the same tail behavior, Assumption 2(i) is satisfied automatically. Assumption 2(ii) necessitates that the α-mixing coefficients decay at an exponential rate.
Write Δ_n := max{n_1, n_2} − min{n_1, n_2}. Define two cases with respect to the distinct divergence rates of p as
  • Case1: {X_t}_{t=1}^{n_1} and {Y_t}_{t=1}^{n_2} satisfy Assumption 1, and the dimension p satisfies p^2 log p = o{n^{4τ/(11τ+12)}} and Δ_n^2 log p = o(n);
  • Case2: {X_t}_{t=1}^{n_1} and {Y_t}_{t=1}^{n_2} satisfy Assumption 2, and the dimension p satisfies log(pn) = o(n^{1/9}) and Δ_n^2 log p = o(n).
Note that the condition Δ_n^2 log p = o(n) restricts the maximum difference between the two sample sizes. Proposition 1 below demonstrates that, under the aforementioned cases and H_0, the Kolmogorov distance between T_n and T_n^G converges to zero as the sample size approaches infinity. Proposition 1 can be derived directly from Theorem 1. Note that, in the scenario where the dimension p diverges at a polynomial rate with respect to n, obtaining Proposition 1 requires only m > 3 and τ > max{2m/(m−3), 3}, an assumption weaker than Assumption 1. The more stringent restrictions m > 4 and τ > 3m/(m−4) in Assumption 1 are imposed to establish the results presented in Theorems 2 and 3.
Proposition 1.
In either Case1 or Case2, it holds under the null hypothesis H 0 that
sup_{x∈R} |P(T_n ≤ x) − P(T_n^G ≤ x)| = o(1).
According to Proposition 1, the critical value cv α can be substituted with cv α G . However, in practical scenarios, the long-run covariance Ξ n ˜ defined in (3) is typically unknown. This implies that obtaining cv α G directly from the distribution of T n G is not feasible. We introduce a bootstrap method for obtaining the estimator cv ^ α defined in (4). In situations where the dimension p diverges at a polynomial rate relative to the sample size n, we require an additional Assumption 3 to ensure that cv ^ α serves as a reliable estimator for cv α . Assumption 3 places restrictions on the cumulant function, a commonly assumed criterion in time series analysis. Refer to [25,26] for examples of such assumptions in the literature.
Assumption 3.
For each i, j ∈ [p], define cum_{i,j}(h, t, s) = cov(Z̊_{0,i} Z̊_{h,j}, Z̊_{t,i} Z̊_{s,j}) − γ_{t,i,i} γ_{s−h,j,j} − γ_{s,i,j} γ_{t−h,j,i}, where γ_{h,i,j} = cov(Z_{0,i}, Z_{h,j}) and Z̊_{t,j} = Z_{t,j} − E(Z_{t,j}). There exists a constant C_4 > 0 s.t.
max_{i,j∈[p]} Σ_{h=−∞}^{∞} Σ_{t=−∞}^{∞} Σ_{s=−∞}^{∞} |cum_{i,j}(h, t, s)| < C_4.
Similar to Case1 and Case2, we consider two cases corresponding to different divergence rates of the dimension p, as outlined below:
  • Case3: { X t } t = 1 n 1 and { Y t } t = 1 n 2 satisfy Assumptions 1 and 3.
  • Case4: { X t } t = 1 n 1 and { Y t } t = 1 n 2 satisfy Assumption 2.
Theorem 2.
In either Case3 with p log p = o[n^{min{(1−ϑ)/4, 2τ/(11τ+12)}}] and Δ_n^2 log p = o(n), or Case4 with log(pn) = o[n^{min{(1−ϑ)/2, ϑ/7, 1/9}}] and Δ_n^2 log p = o(n), it holds under H_0 that sup_{x∈R} |P(T_n ≤ x) − P(T̂_n^G ≤ x | E)| = o_p(1). Moreover, it holds under H_0 that
P(T_n > ĉv_α) → α as n → ∞.
Theorem 3.
In either Case3 with p = o{n^{(1−ϑ)/4}} or Case4 with log(pn) = o[n^{min{ϑ/3, (1−ϑ)/2}}], if max_{j∈[p]} |μ_{X,j} − μ_{Y,j}| ≳ n^{−1/2} (log p)^{1/2}, it holds that
P(T_n > ĉv_α) → 1 as n → ∞.
Remark 3.
The different requirements on the divergence rates of p follow from the fact that we do not rely on the Gaussian approximation and comparison results under certain alternative hypotheses. By Theorems 2 and 3, the optimal selections of ϑ are 1/2 and 7/9 in Case3 and Case4, respectively. This implies that lim_{n→∞} P_{H_0}(T_n > ĉv_α) = α holds with p log p = o(n^{1/8}) in Case3 and log(pn) = o(n^{1/9}) in Case4. Under certain alternative hypotheses, lim_{n→∞} P_{H_1}(T_n > ĉv_α) = 1 holds with p = o(n^{1/8}) in Case3 and log(pn) = o(n^{1/9}) in Case4.

4. Application: Change Point Detection

In this section, we show that our two-sample testing procedure can be regarded as a novel method for detecting change points in high-dimensional time series. For illustration, we present the notation for the detection of a single change point, with the understanding that it can easily be extended to the multiple change points case.
Consider a p-dimensional time series {X_t}_{t=1}^{n} and let μ_t = E(X_t). We consider the following hypothesis testing problem:
H_0: μ_1 = ⋯ = μ_n versus H_1: μ_1 = ⋯ = μ_{τ_0−1} ≠ μ_{τ_0} = ⋯ = μ_n.
Here, τ_0 is the unknown change point. Let w be a positive integer such that w < min{τ_0, n − τ_0}. We define μ̄_t = w^{-1} Σ_{l=t−w/2+1}^{t+w/2} μ_l, μ̄_{(1)} = w^{-1} Σ_{l=1}^{w} μ_l and μ̄_{(2)} = w^{-1} Σ_{l=n−w+1}^{n} μ_l. Then, for each t ∈ [3w/2, n − 3w/2], define Δ_{t,(1)} = μ̄_t − μ̄_{(1)} and Δ_{t,(2)} = μ̄_t − μ̄_{(2)}. Thus,
Δ_{t,(1)} = 0_p if 3w/2 ≤ t ≤ τ_0 − w/2; Δ_{t,(1)} = (μ̄_{(2)} − μ̄_{(1)})(t + w/2 − τ_0)/w if τ_0 − w/2 < t ≤ τ_0 + w/2; and Δ_{t,(1)} = μ̄_{(2)} − μ̄_{(1)} if τ_0 + w/2 < t ≤ n − 3w/2. Similarly, Δ_{t,(2)} = μ̄_{(1)} − μ̄_{(2)} if 3w/2 ≤ t ≤ τ_0 − w/2; Δ_{t,(2)} = (μ̄_{(1)} − μ̄_{(2)})(τ_0 + w/2 − t)/w if τ_0 − w/2 < t ≤ τ_0 + w/2; and Δ_{t,(2)} = 0_p if τ_0 + w/2 < t ≤ n − 3w/2.
Assume |μ̄_{(1)} − μ̄_{(2)}|_∞ = O(1), which represents the sparse signals case. Define t_1(ε_{t,(1)}) = min{t ∈ [3w/2, n − 3w/2] : |Δ_{t,(1)}|_∞ > ε_{t,(1)}} and t_2(ε_{t,(2)}) = max{t ∈ [3w/2, n − 3w/2] : |Δ_{t,(2)}|_∞ > ε_{t,(2)}} with two well-defined thresholds ε_{t,(1)}, ε_{t,(2)} ≥ 0. Due to the symmetry of |Δ_{t,(1)}|_∞ and |Δ_{t,(2)}|_∞, it holds under H_1 that
τ_0 = {t_1(ε_{t,(1)}) + t_2(ε_{t,(2)})}/2.
The sample estimators of μ̄_t, μ̄_{(1)} and μ̄_{(2)} are, respectively, μ̄̂_t = w^{-1} Σ_{l=t−w/2+1}^{t+w/2} X_l, μ̄̂_{(1)} = w^{-1} Σ_{l=1}^{w} X_l and μ̄̂_{(2)} = w^{-1} Σ_{l=n−w+1}^{n} X_l. Based on the method proposed in Section 2, with n_1 = n_2 = w, we define the following two test statistics:
T_w^{t,(1)} = w^{1/2} |μ̄̂_t − μ̄̂_{(1)}|_∞ and T_w^{t,(2)} = w^{1/2} |μ̄̂_t − μ̄̂_{(2)}|_∞.
Given a significance level α > 0, we choose ε_{t,(1)} = cv_{1,α}^t and ε_{t,(2)} = cv_{2,α}^t, where cv_{1,α}^t and cv_{2,α}^t are, respectively, the (1−α)-quantiles of the distributions of T_w^{t,(1)} and T_w^{t,(2)}. The estimated critical values ĉv_{1,α}^t and ĉv_{2,α}^t can be obtained by (4). Thus, t̂_1 = min{t ∈ [3w/2, n − 3w/2] : T_w^{t,(1)} > ĉv_{1,α}^t} and t̂_2 = max{t ∈ [3w/2, n − 3w/2] : T_w^{t,(2)} > ĉv_{2,α}^t}. Hence, the estimator of τ_0 is given by
τ̂_0 = (t̂_1 + t̂_2)/2.
We utilize T_w^{t,(1)} as an illustrative example to elucidate the applicability of our proposed method. Let w be an even integer. For any t ∈ [5w/2, n − 3w/2], we have T_w^{t,(1)} = |w^{-1/2} Σ_{l=1}^{w} (X_{t−w/2+l} − X_l)|_∞, where the sequence {X_{t−w/2+l} − X_l}_{l=1}^{w} possesses the same weak dependence properties and similar moment/tail conditions as {X_l}_{l=1}^{n}. For t ∈ [3w/2, 5w/2 − 1], let {X̃_l}_{l=1}^{t−w/2} be defined as X̃_l = X_l when l ∈ [1, w] and X̃_l = 0_p when l ∈ [w+1, t−w/2]. Additionally, define {Ỹ_l}_{l=t−w/2+1}^{2t−w} as Ỹ_l = X_l when l ∈ [t−w/2+1, t+w/2] and Ỹ_l = 0_p when l ∈ [t+w/2+1, 2t−w]. Then, T_w^{t,(1)} can be expressed as |w^{-1/2} Σ_{l=1}^{t/2−w/4} {(Ỹ_{t−w/2+l} − X̃_l) + (Ỹ_{2t−w+1−l} − X̃_{t−w/2+1−l})}|_∞, and {(Ỹ_{t−w/2+l} − X̃_l) + (Ỹ_{2t−w+1−l} − X̃_{t−w/2+1−l})}_{l=1}^{t/2−w/4} shares the same weak dependence properties and similar moment/tail conditions as {X_l}_{l=1}^{n}. Hence, our method can be applied to change point detection.
The selections of w and α are crucial in this method. We will elaborate on the specific choices for them in future works.
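The single change-point estimator above can be sketched as follows (an illustrative Python implementation of ours; for simplicity, fixed scalar thresholds eps1 and eps2 stand in for the bootstrap critical values ĉv_{1,α}^t and ĉv_{2,α}^t, and windows use zero-based Python indexing):

```python
import numpy as np

def estimate_change_point(X, w, eps1, eps2):
    """Single change-point estimator tau_hat = floor((t1 + t2)/2).

    X: (n, p) series; w: even window width; eps1, eps2: thresholds
    standing in for the bootstrap critical values.
    mu_bar_t is the centered rolling mean over a window of width w;
    mu1 and mu2 are the left-edge and right-edge window means.
    """
    n, p = X.shape
    mu1 = X[:w].mean(axis=0)              # left-edge mean, pre-change
    mu2 = X[n - w:].mean(axis=0)          # right-edge mean, post-change
    lo, hi = 3 * w // 2, n - 3 * w // 2   # t ranges over [3w/2, n - 3w/2]
    t1 = t2 = None
    for t in range(lo, hi + 1):
        mu_t = X[t - w // 2: t + w // 2].mean(axis=0)
        if t1 is None and np.max(np.abs(mu_t - mu1)) > eps1:
            t1 = t                        # first exceedance against the left edge
        if np.max(np.abs(mu_t - mu2)) > eps2:
            t2 = t                        # keep the last exceedance against the right edge
    if t1 is None or t2 is None:
        return None
    return (t1 + t2) // 2
```

On a series with a single mean shift, t1 and t2 bracket the change point roughly symmetrically, so their midpoint estimates τ_0.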

5. Simulation Study

5.1. Tuning Parameter Selection

Given the observations { X t } t = 1 n 1 and { Y t } t = 1 n 2 , we use the minimum volatility (MV) method proposed in [27] to select the block size S.
When the data are independent, following the multiplier bootstrap method described in [28], we set B = ñ (thus S = 1). In this case,
Ξ̂_ñ = var{ñ^{-1/2} Σ_{t=1}^{ñ} (Z_t − Z̄) ϱ_t | Z_1, …, Z_ñ} = ñ^{-1} Σ_{b=1}^{B} {Σ_{t∈I_b} (Z_t − Z̄)}{Σ_{t∈I_b} (Z_t − Z̄)}^T = ñ^{-1} Σ_{t=1}^{ñ} (Z_t − Z̄)(Z_t − Z̄)^T
proves to be a reliable estimator of Ξ_ñ introduced in (3). When the data are only weakly dependent, we expect a small value of S and a large value of B to suffice. Therefore, we recommend exploring a narrow range of S, such as S ∈ {1, …, m}, where m is a moderate integer. In our theoretical proofs, the quality of the bootstrap approximation depends on how well Ξ̂_ñ approximates the covariance Ξ_ñ. The idea behind the MV method is that the conditional covariance Ξ̂_ñ should exhibit stable behavior as a function of S within an appropriate range. For more comprehensive discussions of the MV method and its applications in time series analysis, we refer readers to [27,29]. For a moderately sized integer m, let S_1 < S_2 < ⋯ < S_m be a sequence of equally spaced candidate block sizes, and set S_0 = 2S_1 − S_2 and S_{m+1} = 2S_m − S_{m−1}. For each i ∈ {0, …, m+1}, let
Y_j^i = Σ_{b=1}^{B(S_i)} {Σ_{t∈I_b} (Z_{t,j} − Z̄_j)}^2,
where j ∈ [p] and B(S) = ⌊ñ/S⌋. Then, for each i ∈ {1, …, m}, we compute
Y_i = Σ_{j=1}^{p} sd{Y_j^l}_{l=i−1}^{i+1},
where sd ( · ) is the standard deviation. Then, we select the block size S i * with i * = arg min i { 1 , , m } Y i .

5.2. Simulation Settings

We present the results of a simulation study aimed at evaluating the finite-sample performance of tests based on T_n, as defined in (2). To assess the finite-sample properties of the proposed test, we employed the following generating process: W = HA + f(a) ∈ R^{n×p}, where A ∈ R^{p×p} is the loading matrix, f(·): R → R^{n×p} is a row-constant shift function, and the parameter a belongs to the set {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6}, representing the distance between the null and alternative hypotheses. Additionally, H = (H_1, …, H_n)^T ∈ R^{n×p} with H_t = ρ H_{t−1} + ε_t ∈ R^{p×1}, where ε_t ~ i.i.d. N(0_p, I_p) and ρ ∈ {0, 0.1, 0.2}. Construct f_i(a) = (m_1^{(i)}, …, m_n^{(i)})^T ∈ R^{n×p} with m_t^{(i)} = (m_{t,1}^{(i)}, …, m_{t,p}^{(i)})^T for i ∈ {1, 2}, where m_{t,j}^{(1)} = a^j and m_{t,j}^{(2)} = a(1 − j/p) for each t ∈ [n] and j ∈ [p]. Then f_1(·) and f_2(·) represent the sparse and dense signal cases, respectively. We consider three different loading matrices A as follows:
(M1).
Let V = (v_{k,l})_{1≤k,l≤p} s.t. v_{k,l} = 0.995^{|k−l|}, and let A = V^{1/2}.
(M2).
Let A = (a_{k,l})_{1≤k,l≤p} s.t. a_{k,k} = 1, a_{k,l} = 0.7 for |k−l| = 1, and a_{k,l} = 0 otherwise.
(M3).
Let r = ⌊p/2.5⌋ and V = (v_{k,l})_{1≤k,l≤p}, where v_{k,k} = 1, v_{k,l} = 0.9 for r(q−1)+1 ≤ k ≠ l ≤ rq with q = 1, …, ⌈p/r⌉, and v_{k,l} = 0 otherwise. Let A = V^{1/2}.
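For concreteness, one sample under design (M1) can be generated as below (our illustrative Python sketch; we read the sparse shift m_{t,j}^{(1)} as a^j and compute V^{1/2} by eigendecomposition, both of which are our assumptions about the intended construction):

```python
import numpy as np

def generate_sample(n, p, rho, a=0.0, sparse=True, rng=None):
    """Generate one sample W = H A + f(a) under design (M1).

    H_t = rho * H_{t-1} + eps_t with eps_t ~ N(0, I_p); A = V^{1/2} with
    V_kl = 0.995^{|k-l|}. The mean shift is m_{t,j} = a^j (sparse case,
    our reading of the paper's f_1) or a(1 - j/p) (dense case, f_2).
    """
    rng = np.random.default_rng(rng)
    # AR(1) factor series H
    eps = rng.standard_normal((n, p))
    H = np.zeros((n, p))
    H[0] = eps[0]
    for t in range(1, n):
        H[t] = rho * H[t - 1] + eps[t]
    # loading matrix A = V^{1/2} via symmetric eigendecomposition
    V = 0.995 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    evals, evecs = np.linalg.eigh(V)
    A = evecs @ np.diag(np.sqrt(np.clip(evals, 0, None))) @ evecs.T
    # row-constant mean shift f(a)
    j = np.arange(1, p + 1)
    m = a ** j if sparse else a * (1 - j / p)
    return H @ A + m
```

Calling generate_sample(n, p, rho) with a = 0 produces a null-hypothesis sample; a > 0 adds the sparse or dense mean shift.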
We assess the finite-sample performance of our proposed test (denoted Yang) in comparison with the tests introduced by [5] (denoted Dempster), [4] (denoted BS), [6] (denoted SD), and [8] (denoted CLX). All tests in our simulations are conducted at the 5% significance level with 1000 Monte Carlo replications, and the number of bootstrap replications is set to 1000. We consider dimensions p ∈ {50, 200, 400, 800} and sample size pairs (n_1, n_2) ∈ {(200, 220), (400, 420)}.

5.3. Simulation Results

For the testing of the null hypothesis, consider independent generations of { X t } and { Y t } , following the same process as W , with identical values for ρ and f ( a ) = 0 . The choice of f ( a ) = 0 here is made for the sake of simplicity. We exclusively present the simulation results for (M1) in the main body of the paper. The results obtained for (M2) and (M3) are analogous to those of (M1) and are detailed in the Appendix E.
Table 1 presents the performance of various methods in controlling Type I errors based on (M1). As the dimension p or sample size ( n 1 , n 2 ) increases, the results of all methods exhibit small changes, except BS’s. When ρ equals 0, indicating samples are generated from independent Gaussian distributions, both Yang’s method and BS’s method effectively control Type I errors at around 5 % , while the control achieved by the other three methods is less optimal. It is noteworthy that, with an increase in ρ , the data generated by the AR(1) model significantly influence the other methods. In contrast, Yang’s method demonstrates superior and more stable results with increasing ρ . These comparative effects are also observable in the results based on (M2) and (M3) in the Appendix E. For this reason, we exclusively compare the empirical power results by different methods with ρ = 0 .
Figure 1 and Figure 2 depict the empirical power of the various methods for sparse and dense signals based on (M1). Similarly, as the dimension p increases, the results of all methods show little variation, except Dempster's. However, with an increase in sample size (n_1, n_2), most methods exhibit improvement. In Figure 1, it is evident that Yang's method outperforms the others significantly when the signal is sparse. Methods like SD, BS, and Dempster rely on the ℓ2-norm of the data, aggregating signals across all dimensions for testing. This makes them less effective when the signal is sparse, i.e., when anomalies appear in only a few dimensions. CLX's approach, akin to Yang's, tests whether the largest signal is abnormal. Consequently, CLX performs better than the other three methods in scenarios with sparse signals but still falls short of Yang's method. On the contrary, when the signal is dense, Figure 2 shows that all methods yield favorable results, with Dempster's method proving the most effective. Yang's method performs at a relatively high level among these methods, whereas CLX's method, which performs well for sparse signals, exhibits relatively lower performance for dense signals. In conclusion, the proposed method exhibits the most stable performance across all settings and performs exceptionally well on sparse signals.

6. Real Data Analysis

In this section, we apply the proposed method to a dataset of stock data obtained from Bloomberg’s public database. This dataset includes daily opening prices from 1 January 2018 to 31 December 2021 for 30 companies in the Consumer Discretionary Sector (CDS) and 31 companies in the Information Technology Sector (ITS), all listed in the S&P 500. The sample sizes for the years 2018, 2019, 2020, and 2021 are 251, 250, 253, and 252, respectively. The findings are presented in Table 2. For both the Consumer Discretionary (CD) and Information Technology (IT) sectors, all p-values from the tests between two consecutive years are 0. This suggests a significant change in the average annual opening prices across different years for both sectors.
For data visualization, Figure 3 displays the average annual opening prices of the 30 companies in the CDS (left subgraph) and the 31 companies in the ITS (right subgraph) in 2018, 2019, 2020, and 2021. Both subgraphs exhibit a pattern of annual growth in the opening prices of nearly every stock. These results are well in line with the conclusions of Table 2.

7. Discussion

In this paper, we propose a two-sample test for high-dimensional time series based on the blockwise bootstrap. Our ℓ∞-type test statistic is designed to detect the largest abnormal signal among the dimensions. Unlike some frameworks, we do not require independence within each observation or between the two sets of observations. Instead, we rely on the weak dependence of the pair sequence { ( X t , Y t ) } to ensure the asymptotic properties of the proposed method. We derive two Gaussian approximation results for the cases in which the dimension p diverges, one at a polynomial rate relative to the sample size n and the other at an exponential rate relative to the sample size n. In the bootstrap procedure, the block size serves as the tuning parameter, and we employ the minimum volatility method, as proposed by [27], for block size selection.
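As a concrete illustration of the blockwise bootstrap idea, the sketch below computes a critical value for an ℓ∞-type statistic by attaching one Gaussian multiplier to each block of centered observations. The function name, the multiplier form, and the truncation of an incomplete last block are our own assumptions, not the paper's exact procedure (block size selection by the minimum volatility method is also omitted).

```python
import numpy as np

def blockwise_bootstrap_cv(Z, block_size, alpha=0.05, n_boot=500, seed=0):
    """Sketch of a multiplier blockwise-bootstrap critical value for the
    l_inf-type statistic max_j |n^{-1/2} sum_t (Z_{t,j} - Zbar_j)|.
    One standard-normal multiplier is shared by all observations in a block,
    which preserves the within-block serial dependence."""
    rng = np.random.default_rng(seed)
    n, p = Z.shape
    Zc = Z - Z.mean(axis=0)
    n_blocks = n // block_size
    Zc = Zc[: n_blocks * block_size]                       # drop incomplete tail block
    block_sums = Zc.reshape(n_blocks, block_size, p).sum(axis=1)  # shape (B, p)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        e = rng.standard_normal(n_blocks)                  # one multiplier per block
        stats[b] = np.abs(block_sums.T @ e).max() / np.sqrt(n_blocks * block_size)
    return np.quantile(stats, 1 - alpha)
```

The null is rejected when the observed ℓ∞-type statistic exceeds the returned quantile; a larger block size trades variance of the critical value against bias from broken serial dependence.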
Our test statistic is designed to pinpoint the maximum value among the dimensions, facilitating the detection of significant differences in certain dimensions. In cases where the differences in each dimension are minimal, it is more appropriate to consider an ℓ2-type test statistic rather than the ℓ∞-type one. Consequently, in the absence of prior information, test statistics that combine both types prove advantageous. However, deriving theoretical results for such a combined approach is a significant challenge. As discussed in Section 4, our two-sample testing procedure can be applied to change point detection in high-dimensional time series. The choices of w, the size of each subsample mean, and the significance level α play crucial roles in this change point detection procedure. We leave these considerations for future research.
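A change point scan built on the same ℓ∞-type idea might look as follows; `scan_change_point`, the grid of split points of width w, and the scaling factor are illustrative assumptions rather than the procedure of Section 4, and the bootstrap calibration of the threshold is omitted.

```python
import numpy as np

def scan_change_point(X, w):
    """Sketch: scan candidate split points on a grid of width w and return the
    split maximizing an l_inf-type two-sample statistic between the two sides."""
    n, _ = X.shape
    best_stat, best_k = -np.inf, None
    for k in range(2 * w, n - 2 * w + 1, w):   # keep at least 2w points per side
        diff = X[:k].mean(axis=0) - X[k:].mean(axis=0)
        # sqrt(n1 * n2 / (n1 + n2)) scaling, as in a two-sample mean comparison
        stat = np.sqrt(k * (n - k) / n) * np.abs(diff).max()
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_k, best_stat
```

In practice the maximal statistic would be compared against a bootstrap critical value at level α before declaring a change point at the returned location.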

Funding

This research received no external funding.

Data Availability Statement

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Proof of Theorem 1

Appendix A.1. Proof of Theorem 1(i)

Proof. 
We first show that, for any τ > (q − 1)m/(m − q) with some q ∈ [2, m],
max_{j∈[p]} E | ∑_{t=1}^{n} ξ_{t,j} |^q ≲ n^{q/2}.
If q = 2 , due to κ = 1 α m 2 m ( κ ) κ = 1 κ ( m 2 ) τ m < , Equation (1.12b) (Davydov’s inequality) of [30] yields
E t = 1 n ξ t , j 2 = t = 1 n E ( | ξ t , j | 2 ) + t 1 t 2 cov ( ξ t 1 , j , ξ t 2 , j ) n + t 1 t 2 { E ( | ξ t 1 , j | m ) } 1 m { E ( | ξ t 2 , j | m ) } 1 m α m 2 m ( | t 1 t 2 | ) n + n κ = 1 n α m 2 m ( κ ) n
for any j [ p ] . For q > 2 and j [ p ] , Theorem 6.3 of [30] yields
E | t = 1 n ξ t , j | q a q s n , j q + n b q 0 1 [ α 1 ( u ) n ] q 1 sup t [ n ] Q t , j ( u ) q d u ,
where a q , b q > 0 are two constants depending only on q, s n , j 2 = t 1 , t 2 = 1 n | Cov ( ξ t 1 , j , ξ t 2 , j ) | , α 1 ( u ) = κ 0 1 ( u α ( κ ) ) and Q t , j ( u ) = inf { x : P ( | ξ t , j | > x ) u } . By (A2), it holds that s n , j q = ( s n , j 2 ) q / 2 n q / 2 . Due to max t [ n ] max j [ p ] E ( | ξ t , j | m ) C , we have max t [ n ] max j [ p ] Q t , j ( u ) u 1 m . By the definition of α 1 ( · ) , we know that α 1 ( u ) u 1 τ . Thus
0 1 [ α 1 ( u ) n ] q 1 sup t [ n ] Q t , j ( u ) q d u 0 1 u q 1 τ q m d u C ,
where the last inequality follows from τ > ( q 1 ) m / ( m q ) . Hence, we have
E | ∑_{t=1}^{n} ξ_{t,j} |^q ≲ n^{q/2}
for any j [ p ] . By combining the above results, we complete the proof of (A1).
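For reference, the covariance bound invoked above, Davydov's inequality (Equation (1.12b) of [30]), can be stated as follows for σ-fields 𝒜 and ℬ with strong mixing coefficient α(𝒜, ℬ):

```latex
% Davydov's inequality: for X measurable w.r.t. \mathcal{A}, Y measurable
% w.r.t. \mathcal{B}, and exponents r, s, t \ge 1 with 1/r + 1/s + 1/t = 1,
|\operatorname{cov}(X, Y)|
  \le C \,\{\alpha(\mathcal{A}, \mathcal{B})\}^{1/r}
        \,\{\mathbb{E}(|X|^{s})\}^{1/s}
        \,\{\mathbb{E}(|Y|^{t})\}^{1/t} .
% Taking s = t = m gives 1/r = (m - 2)/m, which produces the factor
% \alpha^{(m-2)/m}(|t_1 - t_2|) appearing in the covariance bounds above.
```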
Now, we begin to prove Theorem 1(i). Define
ω n = sup x > 0 | P max j [ p ] S n , j x P max j [ p ] W j x | .
Let S ˇ n = ( S ˇ n , 1 , , S ˇ n , 2 p ) T = ( S n , 1 , S n , 1 , , S n , p , S n , p ) T and W ˇ = ( W ˇ 1 , , W ˇ 2 p ) T = ( W 1 , W 1 , , W p , W p ) T . Then, we have max j [ p ] | S n , j | = max j [ 2 p ] S ˇ n , j and max j [ p ] | W j | = max j [ 2 p ] W ˇ j . Then, to obtain Theorem 1(i), without loss of generality, it suffices to specify the convergence rate of ω n .
For some constant ς ( 0 , 1 ) , let B n = n ς and K n = n / B n be the number of blocks and the size of each block, respectively. For simplicity, we assume B n n ς and K n = n / B n n 1 ς . We first decompose the sequence { 1 , , n } into B n blocks: G b = { ( b 1 ) K n + 1 , , b K n } for b [ B n ] . Let g n k n be two non-negative integers such that K n = g n + k n . We then decompose each G b ( b [ B n ] ) to a “large” block I b with length g n and a “small” block J b with length k n : I b = { ( b 1 ) K n + 1 , , b K n k n } and J b = { b K n k n + 1 , , b K n } . Let H b = ( H b , 1 , , H b , p ) T = K n 1 / 2 t I b ξ t . For each b [ B n ] and some D n , define H b + = ( H b , 1 + , , H b , p + ) T with H b , j + = H b , j 1 ( | H b , j | D n ) E { H b , j 1 ( | H b , j | D n ) } and H b = ( H b , 1 , , H b , p ) T with H b , j = H b , j 1 ( | H b , j | > D n ) E { H b , j 1 ( | H b , j | > D n ) } . For each j [ p ] , by Theorem 2 of [17], there exists an independent sequence { H ˜ b , j } b = 1 B n such that H ˜ b , j has the same distribution as H b , j + and
E ( | H ˜ b , j H b , j + | ) 0 α ( k n ) inf { x R : P ( | H b , j + | > x ) u } d u .
Due to | H b , j + | 2 D n , we have inf { x R : P ( | H b , j + | > x ) u } D n for any u 0 , which implies
E ( | H ˜ b , j H b , j + | ) D n α ( k n ) .
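The big-block/small-block decomposition introduced above can be sketched directly; the function below is illustrative only, using 1-based indices to match the proof's notation.

```python
def big_small_blocks(n, K, k):
    """Sketch of the large/small-block decomposition used in the proof:
    {1, ..., n} is cut into B = n // K blocks G_b of length K, and each G_b
    is split into a large block I_b (the first K - k indices) and a small
    block J_b (the last k indices)."""
    assert 0 <= k < K <= n
    B = n // K
    blocks = []
    for b in range(B):
        G = list(range(b * K + 1, (b + 1) * K + 1))   # 1-based, as in the proof
        I, J = G[: K - k], G[K - k:]
        blocks.append((I, J))
    return blocks

blocks = big_small_blocks(n=12, K=4, k=1)
# each I_b has length 3 and each J_b has length 1
```

The small blocks J_b separate the large blocks I_b, so the α-mixing condition makes the large-block sums nearly independent, which is what the coupling construction of [17] exploits.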
Define S ˜ n = ( S ˜ n , 1 , , S ˜ n , p ) T = B n 1 / 2 b = 1 B n H ˜ b with S ˜ n , j = B n 1 / 2 b = 1 B n H ˜ b , j and
ω ˜ n = sup x > 0 | P max j [ p ] S ˜ n , j x P max j [ p ] W j x | .
For any ϵ 1 > 0 , triangle inequality implies
P max j [ p ] S n , j x P max j [ p ] S ˜ n , j x + ϵ 1 + P | max j [ p ] S n , j max j [ p ] S ˜ n , j | > ϵ 1 P max j [ p ] W j x + ϵ 1 + ω ˜ n + P | max j [ p ] S n , j max j [ p ] S ˜ n , j | > ϵ 1 P max j [ p ] W j x + P x ϵ 1 < max j [ p ] W j x + ϵ 1 + ω ˜ n + P | max j [ p ] S n , j max j [ p ] S ˜ n , j | > ϵ 1
for any x > 0 , then P ( max j [ p ] S n , j x ) P ( max j [ p ] W j x ) P ( x ϵ 1 < max j [ p ] W j x + ϵ 1 ) + ω ˜ n + P ( | max j [ p ] S n , j max j [ p ] S ˜ n , j | > ϵ 1 ) . Likewise, P ( max j [ p ] S n , j x ) P ( max j [ p ] W j x ) P ( x ϵ 1 < max j [ p ] W j x + ϵ 1 ) ω ˜ n P ( | max j [ p ] S n , j max j [ p ] S ˜ n , j | > ϵ 1 ) . Due to min j [ p ] ( Σ n ) j , j λ min ( Σ n ) c , Lemma A.1 of [31] yields
sup x R P x ϵ 1 < max j [ p ] W j x + ϵ 1 ϵ 1 ( log p ) 1 / 2
for any ϵ 1 > 0 . Thus, we can conclude that
ω n ω ˜ n + P | max j [ p ] S n , j max j [ p ] S ˜ n , j | > ϵ 1 + ϵ 1 ( log p ) 1 / 2 .
Define S n + = ( S n , 1 + , , S n , p + ) T = B n 1 / 2 b = 1 B n H b + . By triangle inequality,
| max j [ p ] S n , j max j [ p ] S n , j + | max j [ p ] | S n , j S n , j + | max j [ p ] | 1 n 1 / 2 b = 1 B n t J b ξ t , j | + max j [ p ] | 1 B n 1 / 2 b = 1 B n H b , j | .
By (A1), we have E ( | H b , j | 3 ) C . Thus E ( | H b , j | 3 ) E ( | H b , j | 3 ) C , and
E ( | H b , j | 2 ) E { | H b , j | 2 1 ( | H b , j | > D n ) } E ( | H b , j | 3 ) D n 1 D n 1 .
Similar to (A2), we have E ( | b = 1 B n t J b ξ t , j | ) B n 1 / 2 k n 1 / 2 for any j [ p ] , and
E b = 1 B n H b , j 2 = b = 1 B n E ( | H b , j | 2 ) + b 1 b 2 cov ( H b 1 , j , H b 2 , j ) B n D n 1 + b 1 b 2 α 1 3 k n 1 ( | b 1 b 2 | = 1 ) + | b 2 b 1 1 | K n 1 ( | b 1 b 2 | > 1 ) B n D n 1 + | b 1 b 2 | = 1 α 1 3 ( k n ) + | b 1 b 2 | > 1 α 1 3 ( | b 1 b 2 1 | K n ) B n D n 1 + B n k n τ 3 ,
where the last inequality follows from τ > 3 . Thus, E ( | b = 1 B n H b , j | ) B n 1 / 2 ( D n 1 / 2 + k n τ / 6 ) and
E | max j [ p ] S n , j max j [ p ] S n , j + | p k n 1 / 2 K n 1 / 2 + p D n 1 / 2 + p k n τ / 6 .
Due to H ˜ b , j having the same distribution as H b , j + and | H b , j + | 2 D n , by (A3), we have E ( | H ˜ b , j H b , j + | s ) D n s k n τ for s { 2 , 3 } . Thus, following the same arguments as in the proof of (A7), it holds that
E b = 1 B n ( H ˜ b , j H b , j + ) 2 b = 1 B n D n 2 k n τ + D n 2 k n 2 τ / 3 | b 1 b 2 | = 1 α 1 3 ( k n ) + | b 1 b 2 | > 1 α 1 3 ( | b 1 b 2 1 | K n ) B n D n 2 k n τ .
Thus, E ( | b = 1 B n ( H ˜ b , j H b , j + ) | ) B n 1 / 2 D n k n τ / 2 and
E | max j [ p ] S ˜ n , j max j [ p ] S n , j + | E max j [ p ] | S ˜ n , j S n , j + | p D n k n τ / 2 .
Together with (A8), we have
E | max j [ p ] S n , j max j [ p ] S ˜ n , j | p k n 1 / 2 K n 1 / 2 + p D n 1 / 2 + p k n τ / 6 + p D n k n τ / 2 .
Let ϵ 1 = p 1 / 2 ( log p ) 1 / 4 ( k n 1 / 2 K n 1 / 2 + D n 1 / 2 + k n τ / 6 + D n k n τ / 2 ) 1 / 2 . It holds by (A5) and Markov inequality that
ω n ω ˜ n + p 1 / 2 ( log p ) 1 / 4 k n 1 / 2 K n 1 / 2 + 1 D n 1 / 2 + 1 k n τ / 6 + D n k n τ / 2 1 / 2 .
Define Σ ˜ G = B n 1 b = 1 B n var ( H ˜ b ) and Δ = | Σ n Σ ˜ G | , where Σ n = E ( S n S n T ) . Note that
Δ = | 1 B n b = 1 B n var ( H ˜ b ) var ( H b + ) + 1 B n b = 1 B n var ( H b + ) var ( H b ) + 1 B n b = 1 B n var ( H b ) Σ n | 1 B n b = 1 B n | var ( H ˜ b ) var ( H b + ) | Δ 1 + 1 B n b = 1 B n | var ( H b + ) var ( H b ) | Δ 2 + | 1 B n b = 1 B n var ( H b ) Σ n | Δ 3 .
In what follows, we specify the convergence rates of | Δ 1 | , | Δ 2 | , and | Δ 3 | , respectively. Note that the ( i , j ) -th element of var ( H ˜ b ) var ( H b + ) is E ( H ˜ b , i H ˜ b , j H b , i + H b , j + ) . Due to H ˜ b , j having the same distribution as H b , j + and | H b , j + | D n for any b [ B n ] and j [ p ] , it holds by (A3) that
| E ( H ˜ b , i H ˜ b , j H b , i + H b , j + ) | | E { ( H ˜ b , i H b , i + ) H ˜ b , j } | + | E { ( H ˜ b , j H b , j + ) H ˜ b , i + } | D n 2 k n τ
for any b [ B n ] and i , j [ p ] . Thus, we can conclude that | Δ 1 | D n 2 k n τ . The ( i , j ) -th element of var ( H b + ) var ( H b ) is E ( H b , i + H b , j + H b , i H b , j ) . Note that E ( | H b , j | ) E { | H b , j | 1 ( | H b , j | > D n ) } E ( | H b , j | 3 ) D n 2 D n 2 . Due to H b , j = H b , j + + H b , j , it holds by (A6) that
| E ( H b , i + H b , j + H b , i H b , j ) | = | E { H b , i + H b , j + ( H b , i + + H b , i ) ( H b , j + + H b , j ) } | | E ( H b , i + H b , j ) | + | E ( H b , j + H b , i ) | + | E ( H b , i H b , j ) | D n 1
for any b [ B n ] and i , j [ p ] . Thus, we can conclude that | Δ 2 | D n 1 . The ( i , j ) -th element of Σ n B n 1 b = 1 B n var ( H b ) is n 1 t 1 , t 2 = 1 n E ( ξ t 1 , i ξ t 2 , j ) n 1 b = 1 B n t 1 , t 2 I b E ( ξ t 1 , i ξ t 2 , j ) , and
| 1 n t 1 , t 2 = 1 n E ( ξ t 1 , i ξ t 2 , j ) 1 n b = 1 B n t 1 , t 2 I b E ( ξ t 1 , i ξ t 2 , j ) | = 1 n | b 1 b 2 E t G b 1 ξ t , i t G b 2 ξ t , j + b = 1 B n E t I b ξ t , i t J b ξ t , j + t J b ξ t , i t G b ξ t , j | .
Similar to the proof of (A2), we have
| E t J b ξ t , i t G b ξ t , j | = | t J b cov ( ξ t , i , ξ t , j ) + t 1 t 2 : t 1 , t 2 J b cov ( ξ t 1 , i , ξ t 2 , j ) + t 1 J b t 2 I b cov ( ξ t 1 , i , ξ t 2 , j ) | k n + t 1 t 2 : t 1 , t 2 J b { E ( | ξ t 1 , i | 3 ) } 1 3 { E ( | ξ t 2 , j | 3 ) } 1 3 α 1 3 ( | t 1 t 2 | ) + t 1 J b t 2 I b { E ( | ξ t 1 , i | 3 ) } 1 3 { E ( | ξ t 2 , j | 3 ) } 1 3 α 1 3 ( | t 1 t 2 | ) k n .
Similarly, we can also obtain
| E t I b ξ t , i t J b ξ t , j | k n .
Thus,
| b = 1 B n E t I b ξ t , i t J b ξ t , j + t J b ξ t , i t G b ξ t , j | k n B n .
Analogously to the proof of (A2), if b 1 < b 2 , due to τ > 2 m / ( m 3 ) ,
| E t G b 1 ξ t , i t G b 2 ξ t , j | t 1 G b 1 t 2 G b 2 { E ( | ξ t 1 , i | m ) } 1 m { E ( | ξ t 2 , i | m ) } 1 m α m 2 m ( | t 1 t 2 | ) δ = 1 K n δ α m 2 m { ( b 2 b 1 1 ) K n + δ } 1 ( b 2 b 1 = 1 ) + K n 2 α m 2 m { ( b 2 b 1 1 ) K n } 1 ( b 2 b 1 > 1 ) .
Then,
b 1 < b 2 | E t G b 1 ξ t , i t G b 2 ξ t , j | B n + B n K n 2 m ( m 2 ) τ m δ = 1 B n δ ( m 2 ) τ m B n .
The same result still holds for b 1 > b 2 . Thus, we can conclude that
b 1 b 2 | E t G b 1 ξ t , i t G b 2 ξ t , j | B n .
Then, by (A11), it holds that
| 1 n t 1 , t 2 = 1 n E ( ξ t 1 , i ξ t 2 , j ) 1 n b = 1 B n t 1 , t 2 I b E ( ξ t 1 , i ξ t 2 , j ) | k n K n
for any i , j [ p ] . Thus, | Δ 3 | k n K n 1 . By (A10), we can conclude that
| Δ | | Δ 1 | + | Δ 2 | + | Δ 3 | D n 2 k n τ + 1 D n + k n K n .
Let { H ˜ b G } b = 1 B n be a sequence of independent Gaussian vectors such that H ˜ b G = ( H ˜ b , 1 G , , H ˜ b , p G ) T N { 0 p , var ( H ˜ b ) } for each b [ B n ] , where H ˜ b = ( H ˜ b , 1 , , H ˜ b , p ) T . By Theorem 1.1 of [15], the Cauchy–Schwarz inequality, and Jensen’s inequality,
sup x > 0 | P max j [ p ] 1 B n 1 / 2 b = 1 B n H ˜ b , j x P max j [ p ] 1 B n 1 / 2 b = 1 B n H ˜ b , j G x | p 1 / 4 · b = 1 B n E ( | Σ ˜ G 1 / 2 B n 1 / 2 H ˜ b | 2 3 ) p 1 / 4 B n 3 / 2 Σ ˜ G 1 / 2 2 3 · b = 1 B n E j = 1 p H ˜ b , j 2 3 / 2 p 7 / 4 B n 3 / 2 Σ ˜ G 1 / 2 2 3 · b = 1 B n max j [ p ] E ( | H ˜ b , j | 3 ) ,
where Σ ˜ G = B n 1 b = 1 B n var ( H ˜ b ) . Note that
| λ min ( Σ ˜ G ) λ min ( Σ n ) | Δ 2 p | Δ | .
Due to λ min ( Σ n ) c , we have λ min ( Σ ˜ G ) c as long as p | Δ | = o ( 1 ) . Thus, if p | Δ | = o ( 1 ) , we have Σ ˜ G 1 / 2 2 C . Since H b , j = K n 1 / 2 t I b ξ t , j , (A1) yields E ( | H ˜ b , j | 3 ) = E ( | H b , j + | 3 ) E ( | H b , j | 3 ) C for any b [ B n ] and j [ p ] , which implies
sup x > 0 | P max j [ p ] 1 B n 1 / 2 b = 1 B n H ˜ b , j x P max j [ p ] 1 B n 1 / 2 b = 1 B n H ˜ b , j G x | p 7 / 4 B n 1 / 2
provided that p | Δ | = o ( 1 ) . By Proposition 2.1 of [16], we have
sup x > 0 | P max j [ p ] 1 B n 1 / 2 b = 1 B n H ˜ b , j G x P max j [ p ] W j x | | Δ | 1 / 2 log p .
Then, by (A4), (A12), and (A13), we have
ω ˜ n p 7 / 4 B n 1 / 2 + | Δ | 1 / 2 log p
provided that p | Δ | = o ( 1 ) . Together with (A9),
ω n p 7 / 4 B n 1 / 2 + | Δ | 1 / 2 log p + p 1 / 2 ( log p ) 1 / 4 k n 1 / 2 K n 1 / 2 + 1 D n 1 / 2 + 1 k n τ / 6 + D n k n τ / 2 1 / 2
provided that p | Δ | = o ( 1 ) . Select D n n 4 τ / ( 11 τ + 12 ) , k n n 12 / ( 11 τ + 12 ) , and ς = 7 τ / ( 11 τ + 12 ) . Then, if p = o { n 2 τ / ( 11 τ + 12 ) } , we have
ω n p 7 / 4 n 7 τ / ( 22 τ + 24 ) + log p n 2 τ / ( 11 τ + 12 ) + p 1 / 2 ( log p ) 1 / 4 n τ / ( 11 τ + 12 ) p 1 / 2 ( log p ) 1 / 4 n τ / ( 11 τ + 12 ) .
Hence, we complete the proof of Theorem 1(i). □

Appendix A.2. Proof of Theorem 1(ii)

Proof. 
Define { ( G b , I b , J b ) } b = 1 B n , { H b + } b = 1 B n , and { H b } b = 1 B n in the same manner as in the proof of Theorem 1(i) with B n n ς , K n n 1 ς , k n n 1 ς and D n , where ς ( 0 , 1 ) . Let
ω n = sup x > 0 | P max j [ p ] S n , j x P max j [ p ] W j x | .
Analogously to (A5), due to min j [ p ] ( Σ n ) j , j > c , we have
ω n ω ˜ n + P | max j [ p ] S n , j max j [ p ] S ˜ n , j | > ϵ 2 + ϵ 2 ( log p ) 1 / 2
for some ϵ 2 > 0 , where S ˜ n , j = B n 1 / 2 b = 1 B n H ˜ b , j with { H ˜ b , j } specified in the same manner as in the proof of Theorem 1(i), and
ω ˜ n = sup x > 0 | P max j [ p ] S ˜ n , j x P max j [ p ] W j x | .
Define S n + = ( S n , 1 + , , S n , p + ) T = B n 1 / 2 b = 1 B n H b + . By triangle inequality,
| max j [ p ] S n , j max j [ p ] S n , j + | max j [ p ] | S n , j S n , j + | max j [ p ] | 1 n 1 / 2 b = 1 B n t J b ξ t , j | + max j [ p ] | 1 B n 1 / 2 b = 1 B n H b , j | .
Note that P ( | ξ t , j M n 1 | > x ) exp ( C x γ 1 ) for any x > 0 . Let γ ˜ = ( 1 / γ 1 + 1 / γ 2 ) 1 . By Theorem 1 of [32] and Bonferroni inequality, we have
P max j [ p ] | 1 n 1 / 2 b = 1 B n t J b ξ t , j | > x p B n k n exp C n γ ˜ / 2 x γ ˜ M n γ ˜ + p exp C n x 2 M n 2 B n k n
for any x M n n 1 / 2 . Similarly, by Theorem 1 of [32] again, for any x M n K n 1 / 2 ,
P ( | H b , j | > x ) = P | 1 K n 1 / 2 t I b ξ t , j | > x K n exp C K n γ ˜ / 2 x γ ˜ M n γ ˜ + exp C x 2 M n 2 .
Then, if D n > M n ,
E { H b , j 2 1 ( | H b , j | > D n ) } = 2 0 D n x P ( | H b , j | > D n ) d x + 2 D n x P ( | H b , j | > x ) d x D n 2 K n exp C K n γ ˜ / 2 D n γ ˜ M n γ ˜ + exp C D n 2 M n 2 + K n D n x exp C K n γ ˜ / 2 x γ ˜ M n γ ˜ d x + D n x exp C x 2 M n 2 d x D n 2 K n C K n γ ˜ / 2 D n γ ˜ M n γ ˜ + exp C D n 2 M n 2 .
Thus, for any b [ B n ] and j [ p ] ,
E ( | H b , j | 2 ) E { H b , j 2 1 ( | H b , j | > D n ) } D n 2 K n C K n γ ˜ / 2 D n γ ˜ M n γ ˜ + exp C D n 2 M n 2 .
Select D n = C * M n { log ( p n ) } 1 / 2 for some sufficiently large constant C * > 0 . Thus, for any x 0 ,
P max j [ p ] | 1 B n 1 / 2 b = 1 B n H b , j | > x p B n 1 / 2 max j [ p ] max b [ B n ] E ( | H b , j | ) x ( p n ) 1 x
provided that log ( p n ) = o { K n γ ˜ / ( 2 γ ˜ ) } . Then, by (A15), we can conclude that for any x M n n 1 / 2 ,
P | max j [ p ] S n , j max j [ p ] S n , j + | > x p B n k n exp C n γ ˜ / 2 x γ ˜ M n γ ˜ + p exp C n x 2 M n 2 B n k n + ( p n ) 1 x
provided that log ( p n ) = o { K n γ ˜ / ( 2 γ ˜ ) } . Similar to (A3), we have
E ( | H ˜ b , j H b , j + | ) D n α ( k n ) D n exp ( C k n γ 2 ) .
Select k n = C * * { log ( p n ) } 1 / γ 2 for some sufficiently large constant C * * > 0 . By (A18) and triangle inequality,
P | max j [ p ] S ˜ n , j max j [ p ] S n , j + | > x p B n 1 / 2 max b [ B n ] max j [ p ] E ( | H ˜ b , j H b , j + | ) x ( p n ) 1 x
for any x 0 . Thus, by (A17), for any x M n n 1 / 2 ,
P | max j [ p ] S n , j max j [ p ] S ˜ n , j | > x p B n k n exp C n γ ˜ / 2 x γ ˜ M n γ ˜ + p exp C n x 2 M n 2 B n k n + ( p n ) 1 x
provided that log ( p n ) = o { K n γ ˜ / ( 2 γ ˜ ) } . Let ϵ 2 = C * * * M n k n 1 / 2 K n 1 / 2 { log ( p n ) } 1 / 2 for some sufficient large constant C * * * > 0 . It holds by (A14) that
ω n ω ˜ n + M n { log ( p n ) } ( 2 γ 2 + 1 ) / 2 γ 2 K n 1 / 2
provided that log ( p n ) = o { k n γ ˜ / ( 2 γ ˜ ) B n γ ˜ / ( 2 γ ˜ ) K n γ ˜ / ( 2 γ ˜ ) } . Define Σ ˜ G = B n 1 b = 1 B n var ( H ˜ b ) and Δ = | Σ n Σ ˜ G | , where Σ n = E ( S n S n T ) . Note that
Δ = | 1 B n b = 1 B n var ( H ˜ b ) var ( H b + ) + 1 B n b = 1 B n var ( H b + ) var ( H b ) + 1 B n b = 1 B n var ( H b ) Σ n | 1 B n b = 1 B n | var ( H ˜ b ) var ( H b + ) | Δ 1 + 1 B n b = 1 B n | var ( H b + ) var ( H b ) | Δ 2 + | 1 B n b = 1 B n var ( H b ) Σ n | Δ 3 .
In what follows, we specify the convergence rates of | Δ 1 | , | Δ 2 | and | Δ 3 | , respectively. Note that the ( i , j ) -th element of var ( H ˜ b ) var ( H b + ) is E ( H ˜ b , i H ˜ b , j H b , i + H b , j + ) . Since H ˜ b , j has the same distribution as H b , j + and | H b , j + | D n for any b [ B n ] and j [ p ] , it holds by (A18) that
| E ( H ˜ b , i H ˜ b , j H b , i + H b , j + ) | | E { ( H ˜ b , i H b , i + ) H ˜ b , j } | + | E { ( H ˜ b , j H b , j + ) H ˜ b , i + } | ( p n ) 1
for any b [ B n ] and i , j [ p ] . Thus, we can conclude that | Δ 1 | ( p n ) 1 . The ( i , j ) -th element of var ( H b + ) var ( H b ) is E ( H b , i + H b , j + H b , i H b , j ) . Due to H b , j = H b , j + + H b , j , then it holds by (A16) that
| E ( H b , i + H b , j + H b , i H b , j ) | = | E { H b , i + H b , j + ( H b , i + + H b , i ) ( H b , j + + H b , j ) } | | E ( H b , i + H b , j ) | + | E ( H b , j + H b , i ) | + | E ( H b , i H b , j ) | ( p n ) 1
for any b [ B n ] and i , j [ p ] provided that log ( p n ) = o { K n γ ˜ / ( 2 γ ˜ ) } . Thus, we can conclude that | Δ 2 | ( p n ) 1 provided that log ( p n ) = o { K n γ ˜ / ( 2 γ ˜ ) } . The ( i , j ) -th element of Σ n B n 1 b = 1 B n var ( H b ) is n 1 t 1 , t 2 = 1 n E ( ξ t 1 , i ξ t 2 , j ) n 1 b = 1 B n t 1 , t 2 I b E ( ξ t 1 , i ξ t 2 , j ) , and
| 1 n t 1 , t 2 = 1 n E ( ξ t 1 , i ξ t 2 , j ) 1 n b = 1 B n t 1 , t 2 I b E ( ξ t 1 , i ξ t 2 , j ) | = 1 n | b 1 b 2 E t G b 1 ξ t , i t G b 2 ξ t , j + b = 1 B n E t I b ξ t , i t J b ξ t , j + t J b ξ t , i t G b ξ t , j | .
Note that E ( | ξ t , j | r ) M n r for any constant integer r > 0 . Equation (1.12b) of [30] yields
| E t J b ξ t , i t G b ξ t , j | = | t J b cov ( ξ t , i , ξ t , j ) + t 1 t 2 : t 1 , t 2 J b cov ( ξ t 1 , i , ξ t 2 , j ) + t 1 J b t 2 I b cov ( ξ t 1 , i , ξ t 2 , j ) | M n 2 k n + t 1 t 2 : t 1 , t 2 J b { E ( | ξ t 1 , i | 3 ) } 1 3 { E ( | ξ t 2 , j | 3 ) } 1 3 α 1 3 ( | t 1 t 2 | ) + t 1 J b t 2 I b { E ( | ξ t 1 , i | 3 ) } 1 3 { E ( | ξ t 2 , j | 3 ) } 1 3 α 1 3 ( | t 1 t 2 | ) M n 2 k n .
Similarly, we can also obtain
| E t I b ξ t , i t J b ξ t , j | M n 2 k n .
Thus,
| b = 1 B n E t I b ξ t , i t J b ξ t , j + t J b ξ t , i t G b ξ t , j | M n 2 k n B n .
By Equation (1.12b) of [30], if b 1 < b 2 ,
| E t G b 1 ξ t , i t G b 2 ξ t , j | t 1 G b 1 t 2 G b 2 { E ( | ξ t 1 , i | 3 ) } 1 3 { E ( | ξ t 2 , j | 3 ) } 1 3 α 1 3 ( | t 1 t 2 | ) M n 2 δ = 1 K n δ exp [ C { ( b 2 b 1 1 ) K n + δ } γ 2 ] M n 2 1 ( b 2 b 1 = 1 ) + M n 2 K n 2 exp { C ( b 2 b 1 1 ) γ 2 K n γ 2 } 1 ( b 2 b 1 > 1 ) .
Thus,
b 1 < b 2 | E t G b 1 ξ t , i t G b 2 ξ t , j | M n 2 B n + M n 2 K n 2 b 2 b 1 = 2 B n 1 exp { C ( b 2 b 1 1 ) γ 2 K n γ 2 } M n 2 B n .
The same result holds for b 1 > b 2 . Thus, we can conclude that
b 1 b 2 | E t G b 1 ξ t , i t G b 2 ξ t , j | M n 2 B n .
Note that the above upper bounds do not depend on ( i , j ) . Then by (A22), it holds that | Δ 3 | M n 2 k n K n 1 . By (A21), we can conclude that
| Δ | M n 2 k n K n
provided that log ( p n ) = o { K n γ ˜ / ( 2 γ ˜ ) } . Let { H ˜ b G } b = 1 B n be a sequence of independent Gaussian vectors such that H ˜ b G = ( H ˜ b , 1 G , , H ˜ b , p G ) T N { 0 p , var ( H ˜ b ) } for any b [ B n ] , where H ˜ b = ( H ˜ b , 1 , , H ˜ b , p ) T . Due to k n { log ( p n ) } 1 / γ 2 , we know that min j [ p ] ( Σ ˜ G ) j , j > c provided that M n 2 { log ( p n ) } 1 / γ 2 = o ( K n ) and log ( p n ) = o { K n γ ˜ / ( 2 γ ˜ ) } . Due to H ˜ b , j 2 D n M n { log ( p n ) } 1 / 2 , it holds that E ( H ˜ b , j 4 ) D n 2 E ( H ˜ b , j 2 ) M n 4 log ( p n ) for any b [ B n ] and j [ p ] , where the last inequality follows from E ( H ˜ b , j 2 ) = E ( | H b , j + | 2 ) E ( H b , j 2 ) and similar arguments as in the proof of (A23). By Theorem 2.1 of [16], we have
sup x > 0 | P max j [ p ] S ˜ n , j x P max j [ p ] 1 B n 1 / 2 b = 1 B n H ˜ b , j G x | M n { log ( p n ) } 3 / 2 B n 1 / 4
provided that M n 2 { log ( p n ) } 1 / γ 2 = o ( K n ) and log ( p n ) = o { K n γ ˜ / ( 2 γ ˜ ) } . By Proposition 2.1 of [16] and (A24), we have
sup x > 0 | P max j [ p ] 1 B n 1 / 2 b = 1 B n H ˜ b , j G x P max j [ p ] W j x | | Δ | 1 / 2 log p M n { log ( p n ) } ( 2 γ 2 + 1 ) / 2 γ 2 K n 1 / 2 .
By (A20), (A25) and (A26), due to γ ˜ = ( 1 / γ 1 + 1 / γ 2 ) 1 , we have
ω n M n { log ( p n ) } 3 / 2 B n 1 / 4 + M n { log ( p n ) } ( 2 γ 2 + 1 ) / 2 γ 2 K n 1 / 2
provided that log ( p n ) = o { B n γ 1 γ 2 / ( γ 1 + 2 γ 2 γ 1 γ 2 ) K n γ 1 γ 2 / ( 2 γ 1 + 2 γ 2 γ 1 γ 2 ) } and M n 2 { log ( p n ) } 1 / γ 2 = o ( K n ) . Select ς = 2 / 3 . Then B n n 2 / 3 , K n n 1 / 3 and
ω n M n { log ( p n ) } max { ( 2 γ 2 + 1 ) / 2 γ 2 , 3 / 2 } n 1 / 6
provided that M n 2 { log ( p n ) } 1 / γ 2 = o ( n 1 / 3 ) and { log ( p n ) } 3 = o { n γ 1 γ 2 / ( 2 γ 1 + 2 γ 2 γ 1 γ 2 ) } . Thus, we complete the proof of Theorem 1(ii). □

Appendix B. Proof of Proposition 1

Proof. 
Define
T ˚ n = | 1 n ˜ t = 1 n ˜ Z ˚ t | ,
where Z ˚ t = Z t E ( Z t ) . Under H 0 , we know that μ X = μ Y = : μ . Recall n 1 n 2 n and Δ n = n 1 n 2 n 1 n 2 . Without loss of generality, we assume n 1 n 2 . By triangle inequality, for any j [ p ] ,
| t = 1 n ˜ Z ˚ t , j t = 1 n ˜ Z t , j | t = 1 n 1 | n 2 2 n 1 ( n 1 + n 2 ) n 1 ( n 1 + n 2 ) | + t = n 1 + 1 n 2 | n 1 ( n 1 + n 2 ) | = O ( Δ n ) .
Thus | T n T ˚ n | = O ( Δ n n 1 / 2 ) . Write δ n = Δ n n 1 / 2 π n , where π n > 0 diverges at a sufficiently slow rate. Thus, we have
P ( T n x ) P ( T ˚ n x + δ n ) + P ( | T n T ˚ n | > δ n ) P ( T n G x + δ n ) + sup x R | P ( T ˚ n x ) P ( T n G x ) | + o ( 1 ) P ( T n G x ) + sup x R P ( x δ n T n G x + δ n ) + sup x R | P ( T ˚ n x ) P ( T n G x ) | + o ( 1 ) .
Analogously, we can also obtain that P ( T n x ) P ( T n G x ) sup x R P ( x δ n T n G x + δ n ) sup x R | P ( T ˚ n x ) P ( T n G x ) | o ( 1 ) . Thus,
sup x R | P ( T n x ) P ( T n G x ) | sup x R P ( x δ n T n G x + δ n ) + sup x R | P ( T ˚ n x ) P ( T n G x ) | + o ( 1 ) .
In Case1, by Assumption 1(iii), we have min j [ p ] ( Ξ n ˜ ) j , j > c . Then by Lemma A.1 of [31], due to Δ n 2 log p = o ( n ) , we have sup x R P ( x δ n T n G x + δ n ) Δ n n 1 / 2 π n ( log p ) 1 / 2 = o ( 1 ) . By Assumption 1(i), we have max t [ n ˜ ] max j [ p ] E ( | Z ˚ t , j | m ) C . Note that Ξ n ˜ = E ( n ˜ 1 / 2 t = 1 n ˜ Z ˚ t , n ˜ 1 / 2 t = 1 n ˜ Z ˚ t T ) . Then by Assumption 1 and Theorem 1(i), due to 3 m / ( m 4 ) > max { 2 m / ( m 3 ) , 3 } , we have sup x R | P ( T ˚ n x ) P ( T n G x ) | = o ( 1 ) provided that p 2 log p = o { n 4 τ / ( 11 τ + 12 ) } . Thus, if p 2 log p = o { n 4 τ / ( 11 τ + 12 ) } ,
sup x R | P ( T n x ) P ( T n G x ) | = o ( 1 ) .
Similarly, in Case2, by Assumption 2 and Theorem 1(ii) with ( M n , γ 1 , γ 2 ) = ( C , 2 , 1 ) , we have sup x R P ( x δ n T n G x + δ n ) Δ n n 1 / 2 π n ( log p ) 1 / 2 = o ( 1 ) and sup x R | P ( T ˚ n x ) P ( T n G x ) | = o ( 1 ) provided that log ( p n ) = o ( n 1 / 9 ) . Thus, if log ( p n ) = o ( n 1 / 9 ) ,
sup x R | P ( T n x ) P ( T n G x ) | = o ( 1 ) .
We complete the proof of Proposition 1. □

Appendix C. Proof of Theorem 2

Appendix C.1. Proof of Theorem 2 under Case3

Proof. 
By Proposition 1 under Case1, it suffices to show
sup x R | P ( T ^ n G x | E ) P ( T n G x ) | = o p ( 1 ) .
Recall T n G = | G | with G N ( 0 , Ξ n ˜ ) and T ^ n G = | n ˜ 1 / 2 t = 1 n ˜ ( Z t Z ¯ ) ϱ t | , where Ξ n ˜ = var ( n ˜ 1 / 2 t = 1 n ˜ Z t ) . Let
Ξ ^ n ˜ = 1 n ˜ b = 1 B t I b ( Z t Z ¯ ) t I b ( Z t Z ¯ ) T .
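This blockwise long-run covariance estimator can be sketched in a few lines; `block_cov_estimate` and the truncation of an incomplete last block are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def block_cov_estimate(Z, block_size):
    """Sketch of the blockwise estimator
    Xi_hat = (1/n) * sum_b (sum_{t in I_b} (Z_t - Zbar)) (sum_{t in I_b} (Z_t - Zbar))^T,
    which sums outer products of centered within-block sums."""
    n, p = Z.shape
    Zc = Z - Z.mean(axis=0)
    B = n // block_size
    Zc = Zc[: B * block_size]                          # drop incomplete tail block
    S = Zc.reshape(B, block_size, p).sum(axis=1)       # block sums, shape (B, p)
    return (S.T @ S) / n
```

With block size 1 this reduces to the usual sample covariance (normalized by n); larger blocks let the estimator absorb the serial covariances within each block.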
By Proposition 2.1 of [16], we have
sup x R | P ( T ^ n G x | E ) P ( T n G x ) | Γ 1 / 2 log p ,
where
Γ = | Ξ n ˜ Ξ ^ n ˜ | = 1 n ˜ | b = 1 B t I b ( Z t Z ¯ ) t I b ( Z t Z ¯ ) T var t = 1 n ˜ Z t | .
Let Z ˚ t = ( Z ˚ t , 1 , , Z ˚ t , p ) T = Z t E ( Z t ) . Then, for any i , j [ p ] , triangle inequality yields
| b = 1 B t I b ( Z t , i Z ¯ i ) t I b ( Z t , j Z ¯ j ) E t = 1 n ˜ Z ˚ t , i t = 1 n ˜ Z ˚ t , j | = | b = 1 B t I b ( Z ˚ t , i Z ˚ ¯ i ) t I b ( Z ˚ t , j Z ˚ ¯ j ) E t = 1 n ˜ Z ˚ t , i t = 1 n ˜ Z ˚ t , j | | b = 1 B t I b ( Z ˚ t , i Z ˚ ¯ i ) t I b ( Z ˚ t , j Z ˚ ¯ j ) b = 1 B E t I b Z ˚ t , i t I b Z ˚ t , j | + | b = 1 B E t I b Z ˚ t , i t I b Z ˚ t , j E t = 1 n ˜ Z ˚ t , i t = 1 n ˜ Z ˚ t , j | | b = 1 B t 1 , t 2 I b { Z ˚ t 1 , i Z ˚ t 2 , j E ( Z ˚ t 1 , i Z ˚ t 2 , j ) } | I 1 , i , j + S n ˜ | t = 1 n ˜ Z ˚ t , i t = 1 n ˜ Z ˚ t , j | I 2 , i , j + | b = 1 B E t I b Z ˚ t , i t I b Z ˚ t , j E t = 1 n ˜ Z ˚ t , i t = 1 n ˜ Z ˚ t , j | I 3 , i , j .
In what follows, we specify the upper bounds of I 1 , i , j , I 2 , i , j and I 3 , i , j , respectively. Without loss of generality, we assume n ˜ = B S with B n ϑ and S n 1 ϑ . By Assumption 1(i), it holds that max t [ n ˜ ] max j [ p ] E ( | Z ˚ t , j | m ) C for some m > 4 . Then, due to τ > 3 m / ( m 4 ) , (A1) yields
E t 1 , t 2 I b { Z ˚ t 1 , i Z ˚ t 2 , j E ( Z ˚ t 1 , i Z ˚ t 2 , j ) } 2 E t I b Z ˚ t , i 2 t I b Z ˚ t , j 2 S 2 .
By triangle inequality,
E b = 1 B t 1 , t 2 I b { Z ˚ t 1 , i Z ˚ t 2 , j E ( Z ˚ t 1 , i Z ˚ t 2 , j ) } 2 B S 2 + b = 1 B 1 s = 1 B b | t 1 , t 2 I b t 3 , t 4 I b + s cov ( Z ˚ t 1 , i Z ˚ t 2 , j , Z ˚ t 3 , i Z ˚ t 4 , j ) | B S 2 + b = 1 B 1 s = 1 B b | t 1 , t 2 I b t 3 , t 4 I b + s cum i , j ( t 2 t 1 , t 3 t 1 , t 4 t 1 ) | + b = 1 B 1 s = 1 B b | t 1 , t 2 I b t 3 , t 4 I b + s E ( Z ˚ t 1 , i Z ˚ t 3 , i ) E ( Z ˚ t 2 , j Z ˚ t 4 , j ) | + b = 1 B 1 s = 1 B b | t 1 , t 2 I b t 3 , t 4 I b + s E ( Z ˚ t 1 , i Z ˚ t 4 , j ) E ( Z ˚ t 3 , i Z ˚ t 2 , j ) } | .
By Assumption 3, t 1 , t 2 I b t 3 , t 4 I b + s cum i , j ( t 2 t 1 , t 3 t 1 , t 4 t 1 ) S , which implies
b = 1 B 1 s = 1 B b | t 1 , t 2 I b t 3 , t 4 I b + s cum i , j ( t 2 t 1 , t 3 t 1 , t 4 t 1 ) | B 2 S .
For any b [ B 1 ] and s [ B b ] , due to τ > 3 m / ( m 4 ) , Equation (1.12b) of [30] yields
| t 1 I b t 3 I b + s E ( Z ˚ t 1 , i Z ˚ t 3 , i ) | t 1 I b t 3 I b + s | E ( Z ˚ t 1 , i Z ˚ t 3 , i ) | t 1 I b t 3 I b + s { E ( | Z t 1 , i | m ) } 1 m { E ( | Z t 3 , i | m ) } 1 m α m 2 m ( t 3 t 1 ) t 1 I b t 3 I b + s α m 2 m ( t 3 t 1 ) h = 1 S h α m 2 m { h + ( s 1 ) S } 1 ( s = 1 ) + S 2 m ( m 2 ) τ m ( s 1 ) ( m 2 ) τ m 1 ( s > 1 ) .
Similarly, we also have
| t 2 I b t 4 I b + s E ( Z ˚ t 2 , j Z ˚ t 4 , j ) | 1 ( s = 1 ) + S 2 m ( m 2 ) τ m ( s 1 ) ( m 2 ) τ m 1 ( s > 1 ) .
Thus,
b = 1 B 1 s = 1 B b | t 1 , t 2 I b t 3 , t 4 I b + s E ( Z ˚ t 1 , i Z ˚ t 3 , i ) E ( Z ˚ t 2 , j Z ˚ t 4 , j ) | b = 1 B 1 s = 1 B b 1 ( s = 1 ) + S 4 m 2 ( m 2 ) τ m ( s 1 ) 2 ( m 2 ) τ m 1 ( s > 1 ) B .
Analogously, we also have b = 1 B 1 s = 1 B b | t 1 , t 2 I b t 3 , t 4 I b + s E ( Z ˚ t 1 , i Z ˚ t 4 , j ) E ( Z ˚ t 3 , i Z ˚ t 2 , j ) } | B . Combining this with (A29)–(A31), due to B S ,
E b = 1 B t 1 , t 2 I b { Z ˚ t 1 , i Z ˚ t 2 , j E ( Z ˚ t 1 , i Z ˚ t 2 , j ) } 2 B 2 S .
Then it holds that
I 1 , i , j = O p ( B S 1 / 2 ) .
Similar to (A1), we have | ( t = 1 n ˜ Z ˚ t , i ) ( t = 1 n ˜ Z ˚ t , j ) | = O p ( n ) . Thus, we know that
I 2 , i , j = O p ( S ) .
Note that
I 3 , i , j b 1 b 2 | E t I b 1 Z ˚ t , i t I b 2 Z ˚ t , j | .
For b 1 < b 2 , due to τ > 3 m / ( m 4 ) , Equation (1.12b) of [30] yields
| E t I b 1 Z ˚ t , i t I b 2 Z ˚ t , j | s = 1 S s { E ( | Z t , i | m ) } 1 m { E ( | Z t + s , j | m ) } 1 m α m 2 m { s + ( b 2 b 1 1 ) S } 1 ( b 2 b 1 = 1 ) + S 2 m ( m 2 ) τ m ( b 2 b 1 1 ) ( m 2 ) τ m 1 ( b 2 b 1 > 1 ) .
Thus,
b 1 < b 2 | E t I b 1 Z ˚ t , i t I b 2 Z ˚ t , j | B + S 2 m ( m 2 ) τ m b 2 b 1 > 1 ( b 2 b 1 1 ) ( m 2 ) τ m B .
Similarly, we can also obtain b 1 > b 2 | E { ( t I b 1 Z ˚ t , i ) ( t I b 2 Z ˚ t , j ) } | B , which implies I 3 , i , j B . Then by (A32) and (A33), it holds that
1 n ˜ | b = 1 B t I b ( Z t , i Z ¯ i ) t I b ( Z t , j Z ¯ j ) E t = 1 n ˜ Z ˚ t , i t = 1 n ˜ Z ˚ t , j | = O p ( S 1 / 2 ) .
Then, by Markov’s inequality,
Γ = | Ξ n ˜ Ξ ^ n ˜ | = O p ( p 2 S 1 / 2 ) .
By (A28), due to S n 1 ϑ , it holds that
sup x R | P ( T ^ n G x | E ) P ( T n G x ) | = o p ( 1 )
provided that p log p = o { n ( 1 ϑ ) / 4 } .
Recall cv ^ α = inf { x > 0 : P ( T ^ n G > x | E ) α } . For any ϵ > 0 , let cv α ( ϵ ) and cv α ( ϵ ) be two constants which satisfy P { T