A Blockwise Bootstrap-Based Two-Sample Test for High-Dimensional Time Series

We propose a two-sample testing procedure for high-dimensional time series. To obtain the asymptotic distribution of our ℓ_∞-type test statistic under the null hypothesis, we establish high-dimensional central limit theorems (HCLTs) for α-mixing sequences. Specifically, we derive two HCLTs for the maximum of a sum of high-dimensional α-mixing random vectors under the assumptions of bounded finite moments and exponential tails, respectively. The proposed HCLT for α-mixing sequences under the bounded finite moments assumption is novel, and, in comparison with existing results, we improve the convergence rate of the HCLT under the exponential tails assumption. To compute the critical value, we employ the blockwise bootstrap method. Importantly, our approach does not require independence of the two samples, making it applicable to detecting change points in high-dimensional time series. Numerical results emphasize the effectiveness and advantages of our method.


Introduction
A fundamental testing problem in multivariate analysis involves assessing the equality of two mean vectors, denoted µ_X and µ_Y. Since its inception by [1], the Hotelling T² test has proven to be a valuable tool in multivariate analysis. Subsequently, numerous studies have addressed the testing of µ_X = µ_Y within various contexts and under distinct assumptions. See refs. [2,3], along with their respective references.
Consider two sets of observations, {X_t}_{t=1}^{n_1} and {Y_t}_{t=1}^{n_2}, where X_t = (X_{t,1}, ..., X_{t,p})^T and Y_t = (Y_{t,1}, ..., Y_{t,p})^T. These observations are drawn from two populations with means µ_X and µ_Y, respectively. The classical test aims to test the hypotheses

H_0 : µ_X = µ_Y versus H_1 : µ_X ≠ µ_Y. (1)

When {X_t}_{t=1}^{n_1} and {Y_t}_{t=1}^{n_2} are two independent sequences that are also independent of each other, a considerable body of literature focuses on testing Hypothesis (1). The ℓ_2-type test statistic corresponding to (1) is of the form (X̄ − Ȳ)^T S^{−1} (X̄ − Ȳ), where X̄ = n_1^{−1} ∑_{t=1}^{n_1} X_t, Ȳ = n_2^{−1} ∑_{t=1}^{n_2} Y_t, and S^{−1} is the weight matrix. A straightforward choice for S^{−1} is the identity matrix I_p [4,5], implying equal weighting for each dimension. Several classical asymptotic theories have been developed based on this selection of S^{−1}. However, this choice disregards the variability in each dimension and the correlations between dimensions, resulting in suboptimal performance, particularly in the presence of heterogeneity or cross-dimensional correlations. In recent decades, numerous researchers have investigated various choices for S^{−1} along with the corresponding asymptotic theories. See refs. [6,7]. In addition, some researchers have developed a framework centered on ℓ_∞-type test statistics, represented as max_{j∈[p]} |(S^{−1/2}(X̄ − Ȳ))_j| [8-10]. Extreme value theory plays a pivotal role in deriving the asymptotic behavior of these test statistics.
However, when {X_t}_{t=1}^{n_1} and {Y_t}_{t=1}^{n_2} are two weakly dependent sequences that are not independent of each other, the above methods may not work well. In this paper, we introduce an ℓ_∞-type test statistic T_n := (n_1 n_2)^{1/2} (n_1 + n_2)^{−1/2} |X̄ − Ȳ|_∞ for testing H_0 under two dependent sequences. Based on Σ, the variance of (n_1 n_2)^{1/2} (n_1 + n_2)^{−1/2} (X̄ − Ȳ), we construct a Gaussian maximum, denoted T^G_n, to approximate T_n under the null hypothesis. When n_1 = n_2 = n, T_n can be written as |S_n|_∞, the maximum of a sum of high-dimensional weakly dependent random vectors, where S_n := (2n)^{−1/2} ∑_{t=1}^{n} (X_t − Y_t). Let G = (G_1, ..., G_p)^T ∼ N{0, var(S_n)} and let A be a class of Borel subsets of R^p. Define ρ_n(A) := sup_{A∈A} |P(S_n ∈ A) − P(G ∈ A)|. Particularly, let A_max consist of all sets of the form A_max = {(a_1, ..., a_p)^T ∈ R^p : max_{j∈[p]} |a_j| ≤ x} for some x ∈ R. Note that ρ_n(A_max) is then the Kolmogorov distance between T_n and T^G_n. When the dimension p diverges exponentially with respect to the sample size n, several studies have focused on deriving ρ_n(A_max) = o(1) under a weak dependence assumption. Based on the coupling method for β-mixing sequences, ref. [11] obtained ρ_n(A_max) = o(1) under the β-mixing condition, contributing to the understanding of such phenomena. Ref.
[12] extended the scope of the investigation to the physical dependence framework introduced by [13]. Considering three distinct types of dependence, namely α-mixing, m-dependence, and physical dependence measures, ref. [14] made significant strides. They established nonasymptotic error bounds for Gaussian approximations of sums involving high-dimensional dependent random vectors. Their analysis encompassed various scenarios of A, including hyper-rectangles, simple convex sets, and sparsely convex sets. Let A_re be the class of all hyper-rectangles in R^p. Under the α-mixing scenario and some mild regularity conditions, [14] showed ρ_n(A_re) ≲ {log(pn)}^{7/6} n^{−1/9}; hence the Gaussian approximation holds if log(pn) = o(n^{2/21}). In this paper, under conditions similar to or even weaker than those of [14], we obtain ρ_n(A_max) ≲ {log(pn)}^{3/2} n^{−1/6}, which implies that the Gaussian approximation holds if log(pn) = o(n^{1/9}). Refer to Remark 1 for more details on the comparison of the convergence rates. By using the Gaussian-to-Gaussian comparison and Nazarov's inequality for p-dimensional random vectors, we can easily extend our result to ρ_n(A_re) ≲ {log(pn)}^{3/2} n^{−1/6}. Given that our framework and numerous testing procedures rely on ℓ_∞-type test statistics, we state our results under A_max. When p diverges polynomially with respect to n, to the best of our knowledge, no existing literature provides the convergence rate of ρ_n(A_max) for α-mixing sequences under bounded finite moments.
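Since ρ_n(A_max) coincides with the Kolmogorov distance between T_n and T^G_n, it can also be examined numerically by simulating both statistics and comparing their empirical distribution functions. The following sketch (our own illustration, not part of the paper's procedure; the function name is ours) computes the empirical Kolmogorov distance between two samples:

```python
import numpy as np

def kolmogorov_distance(sample_a, sample_b):
    """sup_x |F_a(x) - F_b(x)| between the empirical CDFs of two samples."""
    sample_a, sample_b = np.sort(sample_a), np.sort(sample_b)
    grid = np.concatenate([sample_a, sample_b])        # evaluate at all jump points
    Fa = np.searchsorted(sample_a, grid, side="right") / sample_a.size
    Fb = np.searchsorted(sample_b, grid, side="right") / sample_b.size
    return float(np.max(np.abs(Fa - Fb)))
```

Applied to Monte Carlo draws of T_n and of the Gaussian maximum T^G_n, this quantity should shrink as n grows whenever the Gaussian approximation holds.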
Based on the Gaussian approximation for high-dimensional independent random vectors [15,16], we employ the coupling method for α-mixing sequences [17] and a "big-and-small" block technique to specify the convergence rate of ρ_n(A_max) under various divergence rates of p. For more details, refer to Theorem 1 in Section 3.1 and its corresponding proof in Appendix A. Given that Σ is typically unknown in practice, we develop a data-driven procedure based on the blockwise wild bootstrap [18] to determine the critical value for a given significance level α. The blockwise wild bootstrap method is widely used in time series analysis. See [19,20] and the references therein.
The independence between {X_t}_{t=1}^{n_1} and {Y_t}_{t=1}^{n_2} is not a necessary assumption in our method. We only require that the pair sequence {(X_t, Y_t)} is weakly dependent. Therefore, our method can be applied effectively to detect change points in high-dimensional time series. Further details on this application can be found in Section 4.
The rest of this paper is organized as follows. Section 2 introduces the test statistic and the blockwise bootstrap method. The convergence rates of Gaussian approximations for high-dimensional α-mixing sequences and the theoretical properties of the proposed test can be found in Section 3. In Section 4, an application to change point detection for high-dimensional time series is presented. The selection method for the tuning parameter and a simulation study investigating the numerical performance of the test are presented in Section 5. We apply the proposed method to opening price data from multiple stocks in Section 6. Section 7 discusses the results and outlines our future work. The proofs of the main results in Section 3 are detailed in Appendices A-D.
Notation: For any positive integer p ≥ 1, we write [p] = {1, ..., p}. We use |a|_∞ = max_{j∈[p]} |a_j| to denote the ℓ_∞-norm of the p-dimensional vector a. Let ⌊x⌋ and ⌈x⌉ represent the greatest integer less than or equal to x and the smallest integer greater than or equal to x, respectively. For two sequences of positive numbers {a_n} and {b_n}, we write a_n ≲ b_n if a_n ≤ C b_n for some constant C > 0. For a matrix A, let ∥A∥_2 be the spectral norm of A. Additionally, denote by λ_min(A) the smallest eigenvalue of A. Let 1(·) be the indicator function. For any x, y ∈ R, denote x ∨ y = max{x, y} and x ∧ y = min{x, y}. Given γ > 0, we define the function ψ_γ(x) := exp(x^γ) − 1 for any x > 0. For a real-valued random variable ξ, we define the Orlicz norm ∥ξ∥_{ψ_γ} := inf{λ > 0 : E{ψ_γ(|ξ|/λ)} ≤ 1}. Throughout the paper, we use c, C ∈ (0, ∞) to denote generic finite constants that do not depend on (n_1, n_2, p) and may differ from use to use.

Test Statistic and Its Gaussian Analog
Consider two weakly stationary time series {X_t, t ∈ Z} and {Y_t, t ∈ Z} with X_t = (X_{t,1}, ..., X_{t,p})^T and Y_t = (Y_{t,1}, ..., Y_{t,p})^T. Let µ_X = E(X_t) and µ_Y = E(Y_t). The primary focus is on testing equality of the mean vectors of the two populations:

H_0 : µ_X = µ_Y versus H_1 : µ_X ≠ µ_Y.

In this paper, we assume For each t ∈ [ñ], let Ỹ_t .
Then, T_n can be rewritten accordingly. We reject the null hypothesis H_0 if T_n > cv_α, where cv_α represents the critical value at the significance level α ∈ (0, 1). Determining cv_α involves deriving the distribution of T_n under H_0. However, due to the divergence of p in the high-dimensional scenario, obtaining the distribution of T_n is challenging. To address this challenge, we employ the Gaussian approximation theorem [15,16]. We seek a Gaussian analog, denoted T^G_n, with the property that the Kolmogorov distance between T_n and T^G_n converges to zero under H_0. Then, we can replace cv_α by cv^G_α := inf{x > 0 : P(T^G_n ≤ x) ≥ 1 − α}. We then define the Gaussian analogue of T_n accordingly. Proposition 1 below demonstrates that the null distribution of T_n can be effectively approximated by the distribution of T^G_n.
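As a minimal illustration (our own code, not part of the paper), the statistic T_n = (n_1 n_2)^{1/2}(n_1 + n_2)^{−1/2} |X̄ − Ȳ|_∞ can be computed directly from the two samples:

```python
import numpy as np

def linf_stat(X, Y):
    """T_n = sqrt(n1*n2/(n1+n2)) * max_j |Xbar_j - Ybar_j| for X: (n1, p), Y: (n2, p)."""
    n1, n2 = X.shape[0], Y.shape[0]
    diff = X.mean(axis=0) - Y.mean(axis=0)   # coordinate-wise mean difference
    return float(np.sqrt(n1 * n2 / (n1 + n2)) * np.max(np.abs(diff)))
```

The null hypothesis is rejected when this value exceeds the (bootstrap) critical value.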

Blockwise Bootstrap
Note that the long-run covariance matrix Ξ_ñ specified in (3) is typically unknown. As a result, determining cv^G_α through the distribution of T^G_n becomes challenging. To address this challenge, we introduce a parametric bootstrap estimator for T_n using the blockwise bootstrap method [18].
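A sketch of the blockwise wild bootstrap idea (our simplified rendering; the paper's exact estimator and centering may differ): each block of S consecutive centered observations is multiplied by a shared standard normal multiplier, so that the short-range dependence within blocks is preserved in the bootstrap sum, and the critical value is read off from the bootstrap quantile.

```python
import numpy as np

def blockwise_bootstrap_cv(Z, S, alpha=0.05, n_boot=500, seed=0):
    """Bootstrap critical value for max_j |n^{-1/2} sum_t Z_{t,j}|.
    Z: (n, p) array; S: block size; one N(0,1) multiplier per block."""
    rng = np.random.default_rng(seed)
    n, p = Z.shape
    Zc = Z - Z.mean(axis=0)          # center at the sample mean
    n_blocks = -(-n // S)            # ceil(n / S)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        e = np.repeat(rng.standard_normal(n_blocks), S)[:n]  # block multipliers
        stats[b] = np.max(np.abs(e @ Zc)) / np.sqrt(n)
    return float(np.quantile(stats, 1 - alpha))
```

With S = 1 this reduces to the usual multiplier bootstrap for independent data; larger S accounts for serial dependence.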

Theoretical Results
We employ the concept of α-mixing to characterize the serial dependence of {(X_t, Y_t)}, with the α-mixing coefficient at lag κ defined as

α(κ) := sup_{r∈Z} sup_{A∈F^r_{−∞}, B∈F^∞_{r+κ}} |P(A ∩ B) − P(A)P(B)|,

where F^r_{−∞} and F^∞_{r+κ} are the σ-fields generated by {(X_t, Y_t) : t ≤ r} and {(X_t, Y_t) : t ≥ r + κ}, respectively. We say the sequence {(X_t, Y_t)} is α-mixing if α(κ) → 0 as κ → ∞.

Gaussian Approximation for High-Dimensional α-Mixing Sequence
To show that the Kolmogorov distance between T_n and T^G_n converges to zero under various divergence rates of p, we need the following central limit theorems for high-dimensional α-mixing sequences.

Theoretical Properties
In order to derive the theoretical properties of T_n, the following regularity assumptions are needed.

Assumption 1.
(i) For some m > 4, there exists a constant

Assumption 2.
(i) There exists a constant C

Remark 2. The two mild assumptions, Assumptions 1 and 2, delineate the conditions on {(X_t, Y_t)} needed to develop Gaussian approximation theories when the dimension p diverges at polynomial and exponential rates relative to the sample size n, respectively. Assumptions 1(i) and 1(ii) are common assumptions in multivariate time series analysis. The tail condition in Assumption 2(i) is a standard assumption in the literature on ultra-high-dimensional data analysis. This assumption ensures subexponential upper bounds for the tail probabilities of the statistics in question when p ≫ n, as discussed in [23,24]. The requirement of sub-Gaussian properties in Assumption 2(i) is made for the sake of simplicity. If {X_t} and {Y_t} share the same tail probability, Assumption 2(i) is satisfied automatically. Assumption 2(ii) necessitates that the α-mixing coefficients decay at an exponential rate. The condition n log p = o(n) bounds the maximum difference between the two sample sizes. Proposition 1 below demonstrates that, under the aforementioned cases and H_0, the Kolmogorov distance between T_n and T^G_n converges to zero as the sample size approaches infinity. Proposition 1 can be derived directly from Theorem 1. Note that, in the scenario where the dimension p diverges at a polynomial rate with respect to n, obtaining Proposition 1 requires only m > 3 and τ > max{2m/(m − 3), 3}, an assumption weaker than Assumption 1. The more stringent restrictions m > 4 and τ > 3m/(m − 4) in Assumption 1 are imposed to establish the results presented in Theorems 2 and 3.

Proposition 1.
In either Case1 or Case2, it holds under the null hypothesis H_0 that the Kolmogorov distance between T_n and T^G_n converges to zero.

According to Proposition 1, the critical value cv_α can be substituted with cv^G_α. However, in practical scenarios, the long-run covariance Ξ_ñ defined in (3) is typically unknown. This implies that obtaining cv^G_α directly from the distribution of T^G_n is not feasible. We introduce a bootstrap method for obtaining the estimator ĉv_α defined in (4). In situations where the dimension p diverges at a polynomial rate relative to the sample size n, we require an additional Assumption 3 to ensure that ĉv_α serves as a reliable estimator of cv_α. Assumption 3 places restrictions on the cumulant function, a commonly assumed criterion in time series analysis. Refer to [25,26] for examples of such assumptions in the literature.

Assumption 3. For each i, j ∈ [p], define cum_{i,j}(h, t, s) = cov(Z̃_{0,i} Z̃_{h,j}, Z̃_{t,i} Z̃_{s,j}) − γ_{t,i,i} γ_{s−h,j,j} − γ_{s,i,j} γ_{t−h,j,i}, where γ_{h,i,j} = cov(Z_{0,i}, Z_{h,j}) and Z̃_{t,j} = Z_{t,j} − E(Z_{t,j}). There exists a constant

Similar to Case1 and Case2, we consider two cases corresponding to different divergence rates of the dimension p, as outlined below. Moreover, it holds under H_0 that

Remark 3. The different requirements on the divergence rates of p follow from the fact that we do not rely on the Gaussian approximation and comparison results under certain alternative hypotheses. By Theorems 2 and 3, the optimal selections of ϑ are 1/2 and 7/9 in Case3 and Case4, respectively. This implies that lim_{n→∞} P_{H_0}(T_n > ĉv_α) = α holds with p log p = o(n

Application: Change Point Detection
In this section, we show that our two-sample testing procedure can be regarded as a novel method for detecting change points in high-dimensional time series. To illustrate, we present notation for the detection of a single change point, with the understanding that it can be easily extended to the multiple change point case. Consider a p-dimensional time series {X_t}_{t=1}^{n}. Let µ_t = E(X_t). Consider the following hypothesis testing problem:

H'_0 : µ_1 = µ_2 = ··· = µ_n versus H'_1 : µ_1 = ··· = µ_{τ_0} ≠ µ_{τ_0+1} = ··· = µ_n.

Here, τ_0 is the unknown change point. Let w be a positive integer such that w < min{τ_0, n − τ_0}, with two well-defined thresholds ε_{t,(1)}, ε_{t,(2)} ≥ 0. Due to the symmetry of |∆_{t,(1)}| and |∆_{t,(2)}|, it holds under H'_1 that the two statistics behave symmetrically. The sample estimators of µ_t, µ_{(1)}, and µ_{(2)} are, respectively, µ̂_t = w^{−1} ∑_{l=t−w/2+1}^{t+w/2} X_l, µ̂_{(1)} = w^{−1} ∑_{l=1}^{w} X_l, and µ̂_{(2)} = w^{−1} ∑_{l=n−w+1}^{n} X_l. Based on the method proposed in Section 2, with n_1 = n_2 = w, we define two test statistics. Given a significance level α > 0, we choose ε_{t,(1)} and ε_{t,(2)} accordingly. We utilize T^{w}_{t,(1)} as an illustrative example to elucidate the applicability of our proposed method. Let w be an even integer. For any t ∈ [5w/2, n − 3w/2], the sequence {X_{t−w/2+l} − X_l}_{l=1}^{w} possesses the same weak dependence properties and similar moment/tail conditions as {X_l}_{l=1}^{n}. Hence, our method can be applied to change point detection.
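To make the construction concrete, the following sketch (illustrative only; the window indexing and the absence of thresholds are our simplifications of the statistics above) compares a centered rolling-window mean µ̂_t with the left-end mean µ̂_{(1)} and reports a scan statistic at each admissible t:

```python
import numpy as np

def changepoint_scan(X, w):
    """For each admissible t, compute sqrt(w/2) * |mu_hat_t - mu_hat_(1)|_inf,
    where mu_hat_t averages the window of size w centered at t and mu_hat_(1)
    averages the first w observations (n1 = n2 = w, as in Section 2)."""
    n, p = X.shape
    half = w // 2                      # w is assumed even
    mu1 = X[:w].mean(axis=0)           # baseline mean from the first w points
    out = {}
    for t in range(half, n - half + 1):
        mu_t = X[t - half:t + half].mean(axis=0)   # centered rolling-window mean
        out[t] = float(np.sqrt(w / 2) * np.max(np.abs(mu_t - mu1)))
    return out
```

Windows lying entirely after the change point produce large values of the scan statistic, which is what the thresholds ε_{t,(1)} and ε_{t,(2)} are designed to detect.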
The selections of w and α are crucial in this method. We will elaborate on specific choices for them in future work.

Tuning Parameter Selection
Given the observations {X_t}_{t=1}^{n_1} and {Y_t}_{t=1}^{n_2}, we use the minimum volatility (MV) method proposed in [27] to select the block size S.
When the data are independent, by the multiplier bootstrap method described in [28], we set B = ñ (and thus S = 1). In this case, the resulting estimator proves to be a reliable estimator of Ξ_ñ introduced in Section 3. When the data are weakly dependent (and thus nearly independent), we expect a small value of S and a large value of B. Therefore, we recommend exploring a narrow range of S, such as S ∈ {1, ..., m}, where m is a moderate integer. In our theoretical proofs, the quality of the bootstrap approximation depends on how well the bootstrap estimator approximates the covariance Ξ_ñ. The idea behind the MV method is that the conditional covariance estimator should exhibit stable behavior as a function of S within an appropriate range. For more comprehensive discussions of the MV method and its applications in time series analysis, we refer readers to [27,29]. For a moderately sized integer m, let S_1 < S_2 < ··· < S_m be a sequence of equally spaced candidate block sizes. For each i ∈ {0, ..., m + 1}, let the corresponding estimator be defined with j ∈ [p] and B(S) = ⌈ñ/S⌉. Then, for each i ∈ {1, ..., m}, we compute Υ_i, where sd(·) is the standard deviation. Finally, we select the block size S_{i*} with i* = arg min_{i∈{1,...,m}} Υ_i.
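The MV rule above can be sketched as follows (our own simplification: we track a per-coordinate batch-means variance estimate rather than the full bootstrap covariance Ξ_ñ, and we scan only interior candidates; both are assumptions, not the paper's exact statistic):

```python
import numpy as np

def mv_block_size(Z, candidates):
    """Pick the candidate block size whose variance estimate is most stable
    across its neighbours (minimum volatility). Z: (n, p) array."""
    n, p = Z.shape
    Zc = Z - Z.mean(axis=0)

    def var_est(S):
        B = n // S                                         # number of full blocks
        sums = Zc[:B * S].reshape(B, S, p).sum(axis=1) / np.sqrt(S)
        return sums.var(axis=0)                            # per-coordinate estimate

    V = np.array([var_est(S) for S in candidates])         # (m, p)
    # volatility of the estimate over the neighbourhood {i-1, i, i+1}
    vol = [V[i - 1:i + 2].std(axis=0).mean() for i in range(1, len(candidates) - 1)]
    return candidates[1 + int(np.argmin(vol))]
```

The returned block size is the one around which the estimate changes least, mirroring the sd(·)-based criterion Υ_i.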

Simulation Settings
We present the results of a simulation study aimed at evaluating the finite-sample performance of tests based on T_n, as defined in (2). To assess the finite-sample properties of the proposed test, we employed the following fundamental generating process: W = HA + f(a) ∈ R^{n×p}, where A ∈ R^{p×p} is the loading matrix, f(·) : R → R^{n×p} is a constant function, and the parameter a belongs to the set {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6}, representing the distance between the null and alternative hypotheses. Additionally, m^{(1)}_{t,j} = a_j and m^{(2)}_{t,j} = a(1 − j/p) for each t ∈ [n] and j ∈ [p]. Then f_1(·) and f_2(·) represent the sparse and dense signal cases, respectively. We consider three different loading matrices for A as follows:

(M2). Let A = (a_{k,l})_{1≤k,l≤p} such that a_{k,k} = 1, a_{k,l} = 0.7 for |k − l| = 1, and a_{k,l} = 0 otherwise.

(M3). Let r = ⌈p/2.5⌉ and V = (v_{k,l})_{1≤k,l≤p}, where v_{k,k} = 1, v_{k,l} = 0.9 for r(q − 1) + 1 ≤ k ≠ l ≤ rq with q = 1, ..., ⌊p/r⌋, and v_{k,l} = 0 otherwise.
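As an example of how such a generating process might be implemented (a sketch under our assumptions: H carries AR(1)-in-time innovations with coefficient ρ, which the description in this section leaves implicit, and we take the null case f(a) = 0), design (M2) can be simulated as:

```python
import numpy as np

def simulate_w_m2(n, p, rho, seed=0):
    """W = H A with A the tridiagonal (M2) loading matrix and H an
    AR(1)-in-time innovation matrix (null case, f(a) = 0)."""
    rng = np.random.default_rng(seed)
    A = np.eye(p)
    k = np.arange(p - 1)
    A[k, k + 1] = A[k + 1, k] = 0.7                  # a_{k,l} = 0.7 for |k - l| = 1
    H = np.zeros((n, p))
    H[0] = rng.standard_normal(p)
    for t in range(1, n):                            # AR(1) across time, iid across j
        H[t] = rho * H[t - 1] + np.sqrt(1 - rho**2) * rng.standard_normal(p)
    return H @ A
```

Setting ρ = 0 recovers independent rows, matching the independent-Gaussian scenario used for the Type I error comparison below.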

Simulation Results
For the testing of the null hypothesis, consider independent generations of {X_t} and {Y_t}, following the same process as W, with identical values of ρ and f(a) = 0. The choice f(a) = 0 here is made for the sake of simplicity. We present only the simulation results for (M1) in the main body of the paper. The results for (M2) and (M3) are analogous to those for (M1) and are detailed in Appendix E.
Table 1 presents the performance of the various methods in controlling the Type I error based on (M1). As the dimension p or the sample sizes (n_1, n_2) increase, the results of all methods exhibit only small changes, except for BS's. When ρ equals 0, indicating that the samples are generated from independent Gaussian distributions, both Yang's method and BS's method effectively control the Type I error at around 5%, while the control achieved by the other three methods is less satisfactory. It is noteworthy that, as ρ increases, the data generated by the AR(1) model significantly affect the other methods. In contrast, Yang's method demonstrates superior and more stable results with increasing ρ. These comparative effects are also observable in the results based on (M2) and (M3) in Appendix E. For this reason, we compare the empirical power of the different methods only with ρ = 0.
Figures 1 and 2 depict the empirical power of the various methods for sparse and dense signals based on (M1). Similarly, as the dimension p increases, the results of all methods show little variation, except for Dempster's. However, as the sample sizes (n_1, n_2) increase, most methods improve. Figure 1 makes it evident that Yang's method significantly outperforms the others when the signal is sparse. Methods such as SD, BS, and Dempster rely on the ℓ_2-norm of the data, aggregating signals across all dimensions for testing. This makes them less effective when the signal is sparse, i.e., when anomalies appear in only a few dimensions. CLX's approach, akin to Yang's, tests whether the largest signal is abnormal. Consequently, CLX performs better than the other three methods in scenarios with sparse signals but still falls short of Yang's method. On the contrary, when the signal is dense, Figure 2 shows that all methods yield favorable results, with Dempster's method proving the most effective. Yang's method performs at a relatively high level among these methods. In contrast, CLX's method, which performs well for sparse signals, exhibits relatively lower performance for dense signals. In conclusion, the proposed method exhibits the most stable performance across all settings and performs exceptionally well on sparse signals.

Real Data Analysis
In this section, we apply the proposed method to a dataset of stock data obtained from Bloomberg's public database. This dataset includes daily opening prices from 1 January 2018 to 31 December 2021 for 30 companies in the Consumer Discretionary Sector (CDS) and 31 companies in the Information Technology Sector (ITS), all listed in the S&P 500. The sample sizes for the years 2018, 2019, 2020, and 2021 are 251, 250, 253, and 252, respectively. The findings are presented in Table 2. For the data from the Consumer Discretionary (CD) and Information Technology (IT) sectors, all p-values from the tests between two consecutive years are 0. This suggests a significant variation in the average annual opening prices across years for both sectors.
For data visualization, Figure 3 displays the average annual opening prices of the 30 companies in the CDS (left subgraph) and the 31 companies in the ITS (right subgraph) in 2018, 2019, 2020, and 2021. Both subgraphs exhibit a pattern of annual growth in the opening prices of nearly every stock. These results are well in line with the conclusions of Table 2.

Discussion
In this paper, we propose a two-sample test for high-dimensional time series based on the blockwise bootstrap. Our ℓ_∞-type test statistic is designed to detect the largest abnormal signal among the dimensions. Unlike some frameworks, we do not require independence within each observation or between the two sets of observations. Instead, we rely on the weak dependence of the pair sequence {(X_t, Y_t)} to ensure the asymptotic properties of our proposed method. We derive two Gaussian approximation results for the cases in which the dimension p diverges at a polynomial rate and at an exponential rate relative to the sample size n, respectively. In the bootstrap procedure, the block size serves as the tuning parameter, and we employ the minimum volatility method proposed by [27] for block size selection.
Our test statistic is designed to pinpoint the maximum value among dimensions, facilitating the detection of significant differences in certain dimensions. In cases where the differences in each dimension are minimal, it is more appropriate to consider an ℓ_2-type test statistic rather than an ℓ_∞-type one. Consequently, in the absence of prior information, test statistics that combine both types prove advantageous. However, deriving theoretical results for such a combined approach is a significant challenge. As discussed in Section 4, our two-sample testing procedure can be applied to change point detection in high-dimensional time series. The choice of w, the size of each subsample mean, and the significance level α play crucial roles in this change point detection procedure. We leave these considerations for future research.
Then, we have max_{j∈[p]} |S_{n,j}| = max_{j∈[2p]} Š_{n,j} and max_{j∈[p]} |W_j| = max_{j∈[2p]} W̌_j. Hence, to obtain Theorem 1(i), without loss of generality, it suffices to specify the convergence rate of ω_n.
For some constant ς ∈ (0, 1), let B_n = ⌊n^ς⌋ and K_n = ⌈n/B_n⌉ be the number of blocks and the size of each block, respectively. For simplicity, we assume B_n ≍ n^ς and K_n = n/B_n ≍ n^{1−ς}. We first decompose the sequence {1, ..., n} into B_n blocks, splitting each into a "large" block I_b with length g_n and a "small" block J_b with length k_n. By Theorem 2 of [17], there exists an independent sequence {H̃_{b,j}}_{b=1}^{B_n} such that H̃_{b,j} has the same distribution as H^+_{b,j}. For any ϵ_1 > 0, the triangle inequality then implies a bound valid for any x > 0, for any ϵ_1 > 0. Thus, we can conclude the stated bounds, where the last inequality follows from τ > 3. Due to H̃_{b,j} having the same distribution as H^+_{b,j} and |H^+_{b,j}| ≤ 2D_n, by (A3), we have E(|H̃_{b,j} − H^+_{b,j}|^s) ≲ D_n^s k_n^{−τ} for s ∈ {2, 3}. Thus, following the same arguments as in the proof of (A7), it holds, together with (A8), that the claimed bound follows.
Theorem 2. In either Case3 with p log p

Table 1 .
The Type I error rates, expressed as percentages, were calculated from independently generated sequences {X_t}_{t=1}^{n_1} and {Y_t}_{t=1}^{n_2} based on (M1). The simulations were replicated 1000 times.

Table A2 .
The Type I error rates, expressed as percentages, were calculated from independently generated sequences {X_t}_{t=1}^{n_1} and {Y_t}_{t=1}^{n_2} based on (M3). The simulations were replicated 1000 times.