Estimation of Dynamic Networks for High-Dimensional Nonstationary Time Series

This paper is concerned with the estimation of time-varying networks for high-dimensional nonstationary time series. Two types of dynamic behaviors are considered: structural breaks (i.e., abrupt change points) and smooth changes. To simultaneously handle these two types of time-varying features, a two-step approach is proposed: multiple change point locations are first identified on the basis of comparing the difference between the localized averages on sample covariance matrices, and then graph supports are recovered on the basis of a kernelized time-varying constrained L1-minimization for inverse matrix estimation (CLIME) estimator on each segment. We derive the rates of convergence for estimating the change points and precision matrices under mild moment and dependence conditions. In particular, we show that this two-step approach is consistent in estimating the change points and the piecewise smooth precision matrix function, under a certain high-dimensional scaling limit. The method is applied to the analysis of network structure of the S&P 500 index between 2003 and 2008.


Introduction
Networks are useful tools to visualize the relational information among a large number of variables.Undirected graphical model is a rich class of statistical network model that encodes the conditional independence [37].Canonically, Gaussian graphical models (or its normalized version partial correlation [45]) can be represented by the inverse covariance matrix (i.e., the precision matrix), where a zero entry is associated with a missing edge between two vertices in the graph.Specifically, two vertices are not connected if and only if they are conditionally independent given the value of all other variables.On one hand, there is a large volume of literature on estimating the (static) precision matrix for graphical models in the high-dimensional setting, where the sample size and the dimension are both large [44,27,3,53,61,62,52,11,10,9,25,6,40,41].Most of the earlier work along this line assumes that the underlying network is time-invariant.This assumption is quite restrictive in practice and hardly plausible for many real-world applications such as gene regulatory networks, social networks, and stocking market, where the underlying data generating mechanisms are often dynamic.On the other hand, dynamic random networks have been extensively studied from the perspective of large random graphs such as community detection and edge probability estimation for dynamic stochastic block models (DSBMs) [38,49,32,15,23,22,30,18,20,47,48,7,4,29].
Such approaches do not model the sampling distributions of the error (or noise), since the "true" networks are connected with random edges sampled from certain probability models such as the Erdős-Rényi graphs [24] and random geometric graphs [46].
In this paper, we shall view the (time-varying) networks of interests as non-random graphs.
We adopt the graph signal processing approach for denoising the nonstationary time series and target on estimating the true unknown underlying graphs.Despite the recent attempts towards more flexible time-varying models [64,35,34,36,51,43,1,56], there are still a number of major limitations in the current high-dimensional literature.First, theoretical analysis was derived under the fundamental assumption that the observations are either temporally independent, or the temporal dependence has very specific forms such as Gaussian processes or (linear) vector autoregression (VAR) [64,35,6,16,54,51,63].Such dynamic structures are unduly demanding in view that many time series encountered in real applications have very complex nonlinear spatial-temporal dependency [57,26].Second, most existing work assumes the data have timevarying distributions with sufficiently light tails such as Gaussian graphical models and Ising models [64,35,16,54,36].Third, in change point estimation problems for high-dimensional time series, piecewise constancy is widely used [54,16,28,33], which can be fragile in practice.
For instance, financial data often appears to have time-dependent cross-volatility with structural breaks [2].For resting-state fMRI signals, correlation analysis reveals both slowly varying and abrupt changing characteristics corresponding to modularities in brain functional networks [12,31].
Advances in analyzing high-dimensional (stationary) time series have been made recently to address the aforementioned the nonlinear spatial-temporal dependency issue [51,58,50,6,63,14,13,5,8,55].In [14,8,55], the authors considered the theoretical properties of regularized estimation of covariance and precision matrices, based on various dependence measure of highdimensional time series.[43] considered the non-paranormal graphs that evolves with a random variable.[51] discussed the joint estimation of Gaussian graphical models based on a stationary VAR(1) model with special coefficient matrices, which may also depend on certain covariates.The authors applied a CLIME estimator with a kernel estimator of covariance matrix and developed consistency in the graph recovery at a given time points.[6] studied the recovery of the Granger causality across time and nodes assuming a stationary Gaussian VAR model with unknown order.
In this paper, we focus on the recovery of time-varying undirected graphs based on regularized estimation of the precision matrices for a general class of nonstationary time series.We simultaneously model two types of dynamics: abrupt changes with an unknown number of change points and the smooth evolution between the change points.In particular, we study a class of highdimensional piecewise locally stationary processes in a general nonlinear temporal dependency framework, where the observation are allowed to have a finite polynomial moment.
More specifically, there are two main goals of this paper: first to estimate the change point locations, as well as the number of change points, and second to estimate the smooth precision matrix functions between the change points.Accordingly, our proposed method contains two steps.In the first step, the maximum norm of the local difference matrix is computed at each time point and the jumps in the covariance matrices are detected at the location where the maximum norms are above a certain threshold.In the second step, the precision matrices before and after the jump are estimated by a regularized kernel smoothing estimator.These two steps are recursively performed until a stopping criterion is met.Moreover, a boundary correction procedure based on data reflection is considered to reduce the bias near the change point.
We provide an asymptotic theory to justify the proposed method in high dimensions: pointwise and uniform rates of convergence are derived for the change point estimation and graph recovery under mild and interpretable conditions.The convergence rates are determined via subtle interplay among the sample size, dimensionality, temporal dependence, moment condition, and the choice of bandwidth in the kernel estimator.Our results are significantly more involved than problems for sub-Gaussian tails and independent samples.We shall highlight that uniform consistency in terms of time-varying network structure recovery is much more challenging and difficult than pointwise consistency.For the multiple change point detection problem, we also characterize the threshold of the difference statistic that gives consistent selection of the number of change points.
We fix some notation.Positive, finite and non-random constants, independent of the sample size n and dimension p, are denoted by C, C 1 , C 2 , . . ., whose values may differ from line to line.For the sequence of real numbers, a n and b n , we write The rest of the paper is organized as following.Section 2 presents the time series model, as well as the main assumptions, which can simultaneously capture the smooth and abrupt changes.
In Section 3, we introduce the two-step method that first segments the time series based on the difference between the localized averages on sample covariance matrices and then recovers the graph support based on a kernelized CLIME estimator.In Section 4, we state the main theoretical results for the change point estimation and support recovery.Simulation examples are presented in Section 7 and a real data application is given in Section 8. Proof of main results can be found in Section 9.

Time series model
We first introduce a class of causal vector stochastic process.Then we state the assumptions to derive an asymptotic theory in Section 4 and explain their implications.Let ε i ∈ R p , i ∈ Z be independent and identically distributed (i.i.d.) random vectors and ) be a p-dimensional nonstationary time series generated by where ) is an R p -valued jointly measurable function.Suppose we observe the data points X i = X i,n = X • i (t i ) at the evenly spaced time intervals t i = i/n, i = 1, 2, . . ., n, We drop the subscription n in X i,n in the rest of this section.Since our focus is to study the second-order properties, the data is assumed to be mean zero.
Model (1) is first introduced in [21].The stochastic process X • i (t) i∈Z,t∈[0,1) can be thought as a triangular array system, double indexed by i and t, while the observations (X i ) n i=1 are sampled from the diagonal of the array.On one hand, fixing the time index t, the (vertical) process X • i (t) i∈Z is stationary.On the other hand, since H(F i ; t i ) is allowed to vary with t i , the diagonal process (2) is able to capture nonstationarity.
The process (X i ) i∈Z is causal or non-anticipative as X i is an output of the past innovations (ε j ) j≤i and does not depend on the future innovations.In fact, it covers a broad range of linear and nonlinear, stationary and non-stationary processes such as vector auto-regressive moving average processes, locally stationary processes, Markov chains, nonlinear functional processes [59,21,65,66,14].
Motivated by real applications where nonstationary time series data can involve both abrupt breaks and smooth varies between the breaks, we model the underlying processes as piecewise locally stationary with a finite number of structural breaks.
(ii) For each l = 0, . . ., ι, and 1 ≤ j, k ≤ p, we have σ jk (t) ∈ C 2 [t (l) , t (l+1) ).Now we introduce the temporal dependence measure.We quantify the dependence of X • i (t) i∈Z by the dependence adjusted norm (DAN) (cf.[60]).Let ε i be an independent copy of ε i and , with the same generating mechanism and input, except that ε i−m is replaced by an independent copy and Θ m,a,j = ∞ i=m θ i,a,j .The dependence adjusted norm of (X Intuitively, the physical dependence measure quantifies the adjusted stochastic difference between the random variable and its coupled version by replacing past innovations.Indeed, θ m,a,j measures the impact on X • ij (t) uniform over t by replacing ε i−m while freezing all the other inputs, while Θ m,a,j quantifies the cumulative influence of replacing ε −m on (X • ij (t)) i≥0 uniform over t.Then X •,j a,A controls the uniform polynomial decay in the lag of the cumulative physical dependence, where a depends on the the tail of marginal distributions of X • 1,j (t) and A quantifies the polynomial decay power and thus the temporal dependence strength.It is clear that X •,j a,A is a semi-norm, i.e., it is subaddative and absolutely homogeneous.
Assumption 2 (Dependence and moment conditions).Let X • i (t) be defined in (1) and X i in (2).There exist q > 2 and A > 0 such that We let M X,q := 1≤j≤p X •,j . The quantities M X,q and N X,2q measure the L q -norm aggregated effect and the largest effect of the element-wise DANs respectively.Both quantities play a role in the convergence rates of our estimator.
Obviously we have In contrast to other works in high-dimensional covariance matrix and network estimation, where sub-gaussian tails and independence are the keys to ensure consistent estimation, Assumption 2 only requires that the time series have finite polynomial moment, and it allows linear and nonlinear processes with short memory in the time domain.
Example 1 (Vector linear process).Consider the following vector linear process model where ε i = (ε 1 , . . ., ε p ) and ε ij are i.i.d. with mean 0 and variance 1, and ε ij q ≤ C q for each i ∈ Z and 1 ≤ j ≤ p with some constants q > 2 and C q > 0. The vector linear process is commonly seen in literature and application [42].It includes the time-varying VAR model where Suppose that the coefficient matrices A m (t) = (a m,jk (t)) 1≤j,k≤p , m = 0, 1, . . .satisfy the following condition.
where A m,j• (t) is the jth row of A m (t).Under condition (A1)-(A3), one can easily verify that for each 1 ≤ j, k ≤ p, the process satisfies: (1) (due to Burkholder's inequality, cf.[17]) Conditions (A1)-(A3) implicitly impose smoothness in each entry of the coefficient matrices, sparseness in each column of the entry and evolution, and polynomial decay rate in the lag m of each entry and its derivative.
, where (3).We assume that the change points are separated and sizeable.
Assumption 3 (Separability and sizeability of change points).There exist positive constants c 1 ∈ (0, 1) and c 2 > 0 independent of n and p such that max 0≤l≤ι (t (l+1) − t (l) ) ≥ c 1 and δ(t l ) := In the high-dimensional context, we assume that the inverse covariance matrices are sparse in the sense of their L 1 norms.

Assumption 4 (Sparsity of precision matrices). The precision matrix |Ω(t)|
, where κ p is allowed to grow with p.
If we further assume that the eigenvalues of the covariance matrices are bounded from below and above, i.e., there exists a constant 0 c −1 , then the covariance matrices and precision matrices are well-conditioned.In particular, as |, a small perturbation in the covariance matrix would guarantee a small change of the same order in the precision matrix under the spectral norm.

Method: change point estimation and support recovery
In graphical models (such as Gaussian graphical model or partial correlation graph), network structures relevant to correlations or partial correlations are second-order characteristics of the data distributions.Specifically, existence of edges coincides with the support of the inverse covariance matrix.In the presence of change points, we must first remove the change points before applying any smoothing procedures since To estimate the change points, compute The following steps are performed recursively.For l = 1, 2, . .., let until the following criterion is attained: where ν is an early stopping threshold.The value of ν is determined in Section 4, which depends on the dimension and sample size, as well as the serial dependence level, tail condition and local smoothness.Since our method only utilizes data in the localized neighborhood, multiple change points can be estimated and ranked in a single pass, which offers some computational advantage than the binary segmentation algorithm [16,28].
Once the change points are claimed, in the second step, we apply a kernelized time-varying (tv-) CLIME estimator for the covariance matrix functions of the multiple pieces of locally stationary processes before and after the structural breaks [10].Let where and where ωjk (t) = min(ω jk (t), ωkj (t)), and Ω(t) ≡ Ωλ (t) = (ω jk (t)) 1≤j,k≤p , Ωλ (t) = arg min Similar hybridized kernel smoothing and CLIME method for estimating the sparse and smooth transition matrices in high-dimensional VAR model has been considered in [19], where change point is not considered.Thus in the current setting we need to carefully control effect of (consistently) removing the change points before smoothing.
Then, the network is estimated by the "effective support" defined as follows.
Ĝ(t; u) = (ĝ jk (t; u)) 1≤j,k≤p , where ĝjk (t; It should be noted that the (vanilla) kernel smoothing estimator (10) of the covariance matrix does not adjust for the boundary effect due to the change points in the covariance matrice function.Thus, in the neighborhood of the change points, larger bias can be induced in estimating Σ(t) by Σ(t).As a remedy, we apply the following reflection procedure for boundary correction.

Theoretical results
In this section, we derive the theoretical guarratees for the change point estimation and graph support recovery.Roughly speaking, Proposition 1 and 2 below show that under appropriate conditions, if each element of the covariance matrix varies smoothly in time, one can obtain accurate snapshot estimation of the precision matrices as well as the time-varying graphs with high probability via the proposed kernel smoothed constrained l 1 minimization approach.
Proposition 1 (Rate of convergence for estimating precision matrices: pointwise and uniform).(i) Pointwise.Choose the parameter λ ) in the tv-CLIME estimator Ωλ • (t) in (12), where C is a sufficiently large constant independent of n and p.Then for any t ∈ [b, 1 − b], we have (12), where C is a sufficiently large constant independent of n and p.

Then we have sup
The optimal order of the bandwidth parameter b = b in ( 16) is the solution to the following equation: which implies that the closed-form expression for b is given by b for some constants C 1 and C 2 that are independent of n and p.
Given a finite sample, to distinguish the small entries in the precision matrix from the noise is challenging.Since a smaller magnitude of a certain element of the precision matrix implies a weaker connection of the edge in the graphical model, we instead consider the estimation of significant edges in the graph.Define the set of significant edges at level u as E * (t; u) = (j, k) : g * jk (t; u) = 0 , where g * jk (t; u) = I {|ω jk (t)| > u} .Then, as a consequence of ( 16), we have the following support recovery consistency result.
Proposition 2 (Consistency of support recovery: significant edges).Choose u as u = C 0 κ 2 p b 2 , where C 0 is taken as a sufficiently large constant independent of n and p. Suppose that u = o(1) as n, p → ∞.Then under conditions of Proposition 1, we have that as n, p → ∞, Proposition 2 shows that the pattern of significant edges in the time-varying true graphs , can be correctly recovered with high probability.However, it is still an open question to what extent the edges with magnitude below u can be consistently estimated, which can be naturally studied in the multiple hypothesis testing framework.Nonetheless, hypothesis testing for graphical models on the nonstationary high-dimensional time series is rather challenging.We leave it as a future problem.
Propositions 1 and 2 together yield that consistent estimation of the precision matrices and the graphs can be achieved before and after the change points.Now we provide theoretical result of the change point estimation.Theorem 5 below shows that if the change points are separated and sizeable, then we can consistently identify them via the single pass segmentation approach under suitable conditions.Denote where C 1 and C 2 are constants independent of n and p.
Theorem 5 (Consistency of change point estimation).Assume X i ∈ R p admits the form (2).
Suppose that Assumptions 2 to 3 are satisfied.Choose the bandwidth h = h , and ν = (1 + L)h 2 in ( 5) and ( 9) respectively.Assume that h = o(1) as n, p → ∞.We have that there exist constants C 1 , C 2 , C 3 independent of n and p such that Furthermore, on the event {ι = ι}, the ordered change-point estimator (ŝ defined in (7) satisfies Choose the penalty parameter as λ := C 1 κ p b 2 , where C 1 is a constant independent of n and p.Then Moreover, P sup t∈S (j,k)∈E * (t;2u ) X n −1/4 log(p) 1/4 , and u = u := C 0 κ 2 p b , where C 0 , C 1 and C 2 are constants independent of n and p. Suppose u = o(1).We have Choose the penalty parameter as λ := C 1 κ p b , where C 1 is a constant independent of n and p.Then Moreover, P sup t∈N (j,k)∈E * (t;2u ) Note that the convergence rates for the covariance matrix entries and precision matrix entries in case (ii) around the jump locations are slower than those for points well separated from the jump locations in case (i).This is because on the boundary due to the reflection, the smooth condition may no longer holds true.Indeed, we only take advantage of the Lipschitz continuous property of the covariance matrix function.Thus we lose one degree of regularity in the covariance matrix function, and the bias term b 2 in the convergence rate of the between-jump area becomes b around the jumps.We also note that around the smaller neighborhood of the jump J := ∪ 1≤j≤ι Th 2 , due to the larger error in the change point estimation, consistent recovery the graphs is not achievable.
We let the coefficient matrices A 1 (i) = {a m,jk (i)} 1≤j,k≤p evolve at each time point such that two entries are soft-thresholded and another two elements increase.Specifically, at time i, we randomly select two elements from the support of A 1 (i), which are denoted as {a 1,j l k l (i)}, l = 1, 2 and that a 1,j k (i) = 0, and set them to a 1,j l k l (i) = sign(a 1,j l k l (i))(|a 1,j l k l (i) − 0.05|).We also randomly select two elements from A 1 (i) and increase their values by 0.03.h , where l is obtained from ( 9) with ν = 0.For l = 1, 2, . . ., l − 1, compute We report the number of estimated jumps and the average absolute estimation error, where the average absolute estimation error is the mean of the distance between the estimated change points and the true change points.As is shown from Table 1 and Table 2, there is an apparent improvement in the estimation accuracy as the jump magnitude increases and dimension decreases.
The detection is relatively robust to the choice of bandwidth.We evaluate the support recovery performance of the time-varying CLIME at the lattice 100, 200, . . ., 900 with λ = 0.02, 0.06, 0.1.We take the uniform kernel function and the bandwidth is fixed as 0.2.At each time point t 0 , two quantities are computed: sensitivity and specificity, which are defined as: .
We plot the Receiver Operating Characteristic (ROC) curve, that is, sensitivity against 1specificity.From Figure 3 and Figure 4 we observe that, due to a screening step, the support recovery is robust to the choice of λ, except at the change points, where a non-negligible estimation error of the covariance matrix is induced and the overall estimation is less accurate.As the effective dimension of the network remains the same at p = 50 and p = 100 by the construction of the coefficient matrix A m (i), there is no significant difference in the ROC curves at different dimensions.In this section, we apply our method to a real financial dataset from Yahoo! Finance (finance.yahoo.com).
The data matrix contains daily closing prices of 420 stocks that are always in the S&P 500 index between January 2, 2002 through December 30, 2011.In total, there are n = 2519 time points.We select 100 stocks with the largest volatility and consider their log-returns; that is, for j = 1, . . ., 100, where p ij is the daily closing price of the stock j at time point i.We first compute the statistic  Lemma 1.Let (Y i ) i∈Z be a sequence that admits (2).Assume Y i ∈ L q for i = 1, 2, . . ., and the dependence adjusted norm (DAN) of the corresponding underlying array (Y • i (t)) satisfies Y • q,A < ∞ for q > 2 and A > 0. Let (ω(t, t i )) n i=1 be defined in (11) and suppose that the kernel function K(•) satisfies Assumption 5. Denote q,A (n) = n, n(log n) 1+2q , n q/2−Aq if A > 1/2−1/q, A = 1/2 − 1/q, and 0 < A < 1/2 − 1/q, respectively.Then there exist constants C 1 , C 2 and C 3 independent of n, such that for all x > 0, sup t∈(0,1) P sup  Proof.
where the last inequality follows from the fact that sup t n i=1 |w(t, t i ) − w(t − t i+1 )| B −1 n due to Assumption 5.
To see (28), it suffices to show Now we develop a probability deviation inequality for max 1≤i≤n | i j=1 α j Y j |, where α j ≥ 0, 1 ≤ j ≤ n are constants such that 1≤j≤n α j = 1.Denote P 0 (Y Then we can write Note that (P 0 (Y j )) j∈Z is an independent sequence.By Nagaev's inequality and Ottaviani's inequality, we have that where the last inequality holds because P 0 (Y j ) q ≤ 2 Y j q by Jensen's inequality.Since ∞ j=i+1 α j P k (Y j ) is a martingale difference sequence with respect to σ(ε i+1−k , ε i+2−k , . ..), we have that | ∞ k=1+n n j=i+1 α j P k (Y j )| is a non-negative sub-martingale.Then by Doob's inequality and Burkholder's inequality, we have x q ( n j=1 α 2 j ) q/2 Θ q n,q x q ≤ Θ q n,q n q/2−1 n j=1 α q j x q .(32) Combining ( 30), ( 31), ( 32) and ( 34), we obtain Now we have ( 29) by taking α j = n −1 for j = 1, . . ., n.Note that since K(•) has bounded support, for any given t ∈ [b, 1 − b], we have Therefore (27) follows from ( 35) by taking α j = w(t, tn + j) and note that Note that for any Lemma 2. Suppose (X ij ) i∈Z,1≤j≤p satisfy Assumption 2. Also let Assumption 5 hold.Let q,A (n) be defined as in Lemma 1. Then there exist constants C 1 , C 2 and C 3 independent of n and p, such that for all x > 0, we have sup t∈(0,1) and Proof.For 1 ≤ j, k ≤ p, let Y i,jk = X ij X ik .We now check the conditions in Lemma 1 for 9.2.Proof of main results.
Taking h = h , we have Furthermore we have P(A) ≥ 1 − C 1 p q,A (n)M q X,q ν q 2q n q c q In other words, {ι = ι} ⊂ A. Thus ( 19) is proved.
Proof of Theorem 6.We adopt the notations in the proof of Theorem 5 and assume that E holds.
Similar as in Lemma 3, we have that by Lemma 2, for any t ∈ (0, 1), where u = C 4 M X,q ν 2q B −1 n (p q,A (B n )) 1/q + ν 4 N X (log p/B n ) 1/2 for a large enough constant C 4 .Since under E, T b (j) ⊂ Tb+h 2 (j).For t ∈ ∪ 1≤j≤ι Tb+h 2 (j) The result (21) follows by the choices of b.The rest of the proof are similar as in that of Proposition 1 and Theorem 2.

Date:
First version: November 19, 2019.X. Chen's research is supported in part by NSF CAREER Award DMS-1752614 and UIUC Research Board Award RB18099.X. Chen acknowledges that part of this work is carried out at the MIT Institute for Data, System, and Society (IDSS).W.B. Wu's research is supported in part by NSF DMS-1405410.
, a non-negligible abrupt change in the covariance matrix will result in a substantial change of the graph structure for sparse and smooth covariance matrices.Thus our proposed graph recovery method consists of two steps: change point detection and support recovery.Let h ≡ h n > 0 be a bandwidth parameter such that h = o(1) and n −1 = o(h), and D h (0) = {h, h + 1/n, . . ., 1 − h} be a search grid in (0, 1).Define

1 − 1 K
−v|/b)/b.The bandwidth parameter b satisfies that b = o(1) and n −1 = o(b).Denote B n = nb.The kernel function K(•) is chosen to have properties as follows.Assumption 5 (Regularity of kernel function).The kernel function K(•) is non-negative, symmetric, and Lipschitz continuous with bounded support in [−1, 1], and that (u)du = 1.Assumption 5 is a common requirement on the kernel functions and can be fulfilled by a range of kernel functions such as the uniform kernel, triangular kernel, and the Epanechnikov kernel.Now the tv-CLIME estimator of the precision matrix Ω(t) is defined by Ω(t) = (ω jk (t)) 1≤j,k≤p ,

) Proposition 2 and
Theorem 5 together indicate the consistency in the snapshot estimation of the time-varying graphs before and after the change points.In a close neighborhood of the change points, we have the following result for the recovery of the time-varying network.DenoteS := b , 1 − b ] ∩ (∪ 1≤j≤ιT c h 2 +b (j) as the time intervals between the estimated change points, andN := [0, b ) ∪ ∪ 1≤j≤ι ( Th 2 +b ∩ T c h 2 ) ∪ (1 − b , 1]as the recoverable neighborhood of the jump.Theorem 6.Let Assumptions 2 to 5 be satisfied.We have the following results as n, p → ∞. (i) Between change points.For t ∈ S, take b = b and u = u , where b and u are defined in Proposition 2. Suppose u = o(1).we have sup t∈S max j,k |σ and i−m = ( i−m,1 , . . ., i−m,p ) , with m,k , m ∈ Z, j = 1, . . ., p generated as i.i.d.standardized T (8) random variables.In the simulation, we fix n = 1000 and vary p = 50 and p = 100.For each m = 1, . . ., 100, the coefficient matricesA m (i) = (1 + m) −β B m (i), where β = 1, and B m (1) is an R p×p block diagonal matrix.The 5 × 5 diagonal blocks in B m (i) are fixed with i.i.d.N (0, 1) entries and all the other entries are 0.

Figure 2 .
Figure 2. Support of the true covariance matrices, p = 100

Figure 6 .
Figure 6.Estimated networks at time points 813, 828, 888 and 903, corresponding to March 23, 2006, April 13, 2006, July 11, 2006, and August 1, 2006.Networks in the first and second row are estimated before and after the estimated change point at time 858, respectively.Different colors correspond to the nine sections in the S&P dataset.

Table 2 .
Number of estimated change points.