Estimation of Dynamic Networks for High-Dimensional Nonstationary Time Series

Xu, Mengyu; Chen, Xiaohui; Wu, Wei Biao

doi:10.3390/e22010055


by Mengyu Xu ¹, Xiaohui Chen ² and Wei Biao Wu ³,*

¹ Department of Statistics and Data Science, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL 32816, USA

² Department of Statistics, University of Illinois at Urbana-Champaign, S. Wright Street, Champaign, IL 61820, USA

³ Department of Statistics, University of Chicago, 5747 S. Ellis Avenue, Jones 311, Chicago, IL 60637, USA

* Author to whom correspondence should be addressed.

Entropy 2020, 22(1), 55; https://doi.org/10.3390/e22010055

Submission received: 14 November 2019 / Revised: 25 December 2019 / Accepted: 26 December 2019 / Published: 31 December 2019

(This article belongs to the Special Issue Nonparametric Statistical Inference with an Emphasis on Information-Theoretic Methods)


Abstract

This paper is concerned with the estimation of time-varying networks for high-dimensional nonstationary time series. Two types of dynamic behaviors are considered: structural breaks (i.e., abrupt change points) and smooth changes. To simultaneously handle these two types of time-varying features, a two-step approach is proposed: multiple change point locations are first identified on the basis of comparing the difference between the localized averages on sample covariance matrices, and then graph supports are recovered on the basis of a kernelized time-varying constrained $L_{1}$-minimization for inverse matrix estimation (CLIME) estimator on each segment. We derive the rates of convergence for estimating the change points and precision matrices under mild moment and dependence conditions. In particular, we show that this two-step approach is consistent in estimating the change points and the piecewise smooth precision matrix function, under a certain high-dimensional scaling limit. The method is applied to the analysis of the network structure of the S&P 500 index between 2003 and 2008.

Keywords: high-dimensional time series; nonstationarity; network estimation; change points; kernel estimation

1. Introduction

Networks are useful tools to visualize the relational information among a large number of variables. An undirected graphical model belongs to a rich class of statistical network models that encodes conditional independence [1]. Canonically, Gaussian graphical models (or their normalized version partial correlations [2]) can be represented by the inverse covariance matrix (i.e., the precision matrix), where a zero entry is associated with a missing edge between two vertices in the graph. Specifically, two vertices are not connected if and only if they are conditionally independent, given the value of all other variables.
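The correspondence between zeros of the precision matrix and missing edges can be illustrated with a small numerical sketch. The 4-node chain graph and its numerical values below are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import numpy as np

# Hypothetical chain graph 0 - 1 - 2 - 3, encoded by a tridiagonal
# precision matrix (illustrative values, not from the paper).
omega = np.array([
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
])

# The covariance matrix is dense even though the precision matrix is sparse.
sigma = np.linalg.inv(omega)

# Recover the graph: an edge (j, k) exists iff the (j, k) entry of the
# inverse covariance matrix is non-zero (up to numerical tolerance).
recovered = np.linalg.inv(sigma)
edges = [(j, k) for j in range(4) for k in range(j + 1, 4)
         if abs(recovered[j, k]) > 1e-8]

# Partial correlations are the sign-flipped, normalized precision entries.
d = np.sqrt(np.diag(recovered))
partial_corr = -recovered / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)
```

In this toy case every pair of variables is marginally correlated, yet only neighbouring nodes remain dependent after conditioning on the rest.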

On one hand, there is a large volume of literature on estimating the (static) precision matrix for graphical models in the high-dimensional setting, where the sample size and the dimension are both large [3,4,5,6,7,8,9,10,11,12,13,14,15,16]. Most of the earlier work along this line assumes that the underlying network is time-invariant. This assumption is quite restrictive in practice and hardly plausible for many real-world applications, such as gene regulatory networks, social networks, and stock markets, where the underlying data generating mechanisms are often dynamic. On the other hand, dynamic random networks have been extensively studied from the perspective of large random graphs, such as community detection and edge probability estimation for dynamic stochastic block models (DSBMs) [17,18,19,20,21,22,23,24,25,26,27,28,29,30]. Such approaches do not model the sampling distributions of the error (or noise), since the “true” networks are connected with random edges sampled from certain probability models, such as the Erdős–Rényi graphs [31] and random geometric graphs [32].

In this paper, we view the (time-varying) networks of interest as non-random graphs. We adopt the graph signal processing approach for denoising the nonstationary time series and aim to estimate the true unknown underlying graphs. Despite the recent attempts towards more flexible time-varying models [33,34,35,36,37,38,39,40], there are still a number of major limitations in the current high-dimensional literature. First, theoretical analysis was derived under the fundamental assumption that the observations are either temporally independent, or the temporal dependence has very specific forms, such as Gaussian processes or (linear) vector autoregression (VAR) [14,33,34,37,41,42,43]. Such dynamic structures are unduly demanding in view that many time series encountered in real applications have very complex nonlinear spatial-temporal dependency [44,45]. Second, most existing work assumes the data have time-varying distributions with sufficiently light tails, such as Gaussian graphical models and Ising models [33,34,36,41,42]. Third, in change point estimation problems for high-dimensional time series, piecewise constancy is widely used [41,42,46,47], which can be fragile in practice. For instance, financial data often appear to have time-dependent cross-volatility with structural breaks [48]. For resting-state fMRI signals, correlation analysis reveals both slowly varying and abruptly changing characteristics corresponding to modularities in brain functional networks [49,50].

Advances in analyzing high-dimensional (stationary) time series have been made recently to address the aforementioned nonlinear spatial-temporal dependency issue [14,37,43,51,52,53,54,55,56,57]. In [53,56,57], the authors considered the theoretical properties of regularized estimation of covariance and precision matrices, based on various dependence measures of high-dimensional time series. Reference [38] considered the non-paranormal graphs that evolve with a random variable. Reference [37] discussed the joint estimation of Gaussian graphical models based on a stationary VAR(1) model with special coefficient matrices, which may also depend on certain covariates. The authors applied a constrained $L_{1}$-minimization for inverse matrix estimation (CLIME) estimator with a kernel estimator of the covariance matrix and developed consistency in the graph recovery at a given time point. Reference [14] studied the recovery of the Granger causality across time and nodes assuming a stationary Gaussian VAR model with unknown order.

In this paper, we focus on the recovery of time-varying undirected graphs on the basis of the regularized estimation of the precision matrices for a general class of nonstationary time series. We simultaneously model two types of dynamics: abrupt changes with an unknown number of change points and the smooth evolution between the change points. In particular, we study a class of high-dimensional piecewise locally stationary processes in a general nonlinear temporal dependency framework, where the observations are allowed to have a finite polynomial moment.

More specifically, there are two main goals of this paper: first, to estimate the change point locations, as well as the number of change points, and second, to estimate the smooth precision matrix functions between the change points. Accordingly, our proposed method contains two steps. In the first step, the maximum norm of the local difference matrix is computed at each time point and the jumps in the covariance matrices are detected at the location where the maximum norms are above a certain threshold. In the second step, the precision matrices before and after the jump are estimated by a regularized kernel smoothing estimator. These two steps are recursively performed until a stopping criterion is met. Moreover, a boundary correction procedure based on data reflection is considered to reduce the bias near the change point.

We provide an asymptotic theory to justify the proposed method in high dimensions: pointwise and uniform rates of convergence are derived for the change point estimation and graph recovery under mild and interpretable conditions. The convergence rates are determined via a subtle interplay among the sample size, dimensionality, temporal dependence, moment condition, and the choice of bandwidth in the kernel estimator. Our analysis is significantly more involved than in settings with sub-Gaussian tails and independent samples. We highlight that uniform consistency in terms of time-varying network structure recovery is much more challenging than pointwise consistency. For the multiple change point detection problem, we also characterize the threshold of the difference statistic that gives a consistent selection of the number of change points.

We fix some notation. Positive, finite, and non-random constants, independent of the sample size $n$ and dimension $p$, are denoted by $C, C_{1}, C_{2}, \dots$, whose values may differ from line to line. For sequences of real numbers $a_{n}$ and $b_{n}$, we write $a_{n} = O(b_{n})$ or $a_{n} ≲ b_{n}$ if $\limsup_{n \to \infty} (a_{n} / b_{n}) \leq C$ for some constant $C < \infty$, and $a_{n} = o(b_{n})$ if $\lim_{n \to \infty} (a_{n} / b_{n}) = 0$. We say $a_{n} ≍ b_{n}$ if $a_{n} = O(b_{n})$ and $b_{n} = O(a_{n})$. For a sequence of random variables $Y_{n}$ and a corresponding sequence of constants $a_{n}$, denote $Y_{n} = O_{P}(a_{n})$ if for any $ε > 0$ there is a constant $C > 0$ such that $P(|Y_{n}| / a_{n} > C) < ε$ for all $n$. For a vector $x \in R^{p}$, we write $|x| = (\sum_{j=1}^{p} x_{j}^{2})^{1/2}$. For a matrix $Σ$, $|Σ|_{1} = \sum_{j,k} |σ_{jk}|$, $|Σ|_{\infty} = \max_{j,k} |σ_{jk}|$, $|Σ|_{L_{1}} = \max_{k} \sum_{j} |σ_{jk}|$, $|Σ|_{F} = (\sum_{j,k} σ_{jk}^{2})^{1/2}$ and $ρ(Σ) = \max\{|Σ x| : |x| = 1\}$. For a random vector $z \in R^{p}$, write $z \in L^{a}$, $a > 0$, if $∥z∥_{a} := [E(|z|^{a})]^{1/a} < \infty$. Let $∥z∥ = ∥z∥_{2}$. Denote $a \land b = \min(a, b)$ and $a \lor b = \max(a, b)$.
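As a quick reference, the matrix norms defined above can be computed as in the following sketch. The function name `matrix_norms` and the dictionary keys are illustrative choices, not notation from the paper.

```python
import numpy as np

def matrix_norms(S):
    """Return the matrix norms used in the text for a square matrix S."""
    A = np.abs(S)
    return {
        "elementwise_l1": A.sum(),            # |S|_1: sum of |s_jk|
        "max": A.max(),                       # |S|_inf: largest |s_jk|
        "L1": A.sum(axis=0).max(),            # |S|_{L_1}: largest column sum
        "frobenius": np.sqrt((S ** 2).sum()), # |S|_F
        "spectral": np.linalg.norm(S, 2),     # rho(S): largest singular value
    }

norms = matrix_norms(np.array([[1.0, -2.0], [3.0, 4.0]]))
```

Note that $|Σ|_{1}$ (entrywise sum) and $|Σ|_{L_{1}}$ (maximum column sum) are different quantities; the former appears in the CLIME objective and the latter in the sparsity assumption.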

The rest of the paper is organized as follows: Section 2 presents the time series model, as well as the main assumptions, which can simultaneously capture the smooth and abrupt changes. In Section 3, we introduce the two-step method that first segments the time series based on the difference between the localized averages on sample covariance matrices and then recovers the graph support based on a kernelized CLIME estimator. In Section 4, we state the main theoretical results for the change point estimation and support recovery. Simulation examples are presented in Section 5 and a real data application is given in Section 6. Proofs of the main results can be found in Section 7.

2. Time Series Model

We first introduce a class of causal vector stochastic processes. Next, we state the assumptions needed to derive the asymptotic theory in Section 4 and explain their implications. Let $ε_{i} \in R^{p}$, $i \in Z$, be independent and identically distributed (i.i.d.) random vectors and $F_{i} = (\dots, ε_{i-1}, ε_{i})$ be a shift process. Let $X_{i}^{\circ}(t) = (X_{i1}^{\circ}(t), \dots, X_{ip}^{\circ}(t))$ be a $p$-dimensional nonstationary time series generated by

\begin{matrix} X_{i}^{\circ}(t) = H(F_{i}; t), \end{matrix}

(1)

where $H(\cdot; \cdot) = (H_{1}(\cdot; \cdot), \dots, H_{p}(\cdot; \cdot))$ is an $R^{p}$-valued jointly measurable function. Suppose we observe the data points $X_{i} = X_{i,n} = X_{i}^{\circ}(t_{i})$ at the evenly spaced time points $t_{i} = i/n$, $i = 1, 2, \dots, n$,

\begin{matrix} X_{i,n} = H(F_{i}; i/n). \end{matrix}

(2)

We drop the subscript $n$ in $X_{i,n}$ in the rest of this section. Since our focus is to study the second-order properties, the data are assumed to have mean zero.

Model (1) was first introduced in [58]. The stochastic process $(X_{i}^{\circ}(t))_{i \in Z, t \in [0, 1)}$ can be thought of as a triangular array system, doubly indexed by $i$ and $t$, while the observations $(X_{i})_{i=1}^{n}$ are sampled from the diagonal of the array. On one hand, when fixing the time index $t$, the (vertical) process $(X_{i}^{\circ}(t))_{i \in Z}$ is stationary. On the other hand, since $H(F_{i}; t_{i})$ is allowed to vary with $t_{i}$, the diagonal process (2) is able to capture nonstationarity.

The process $(X_{i})_{i \in Z}$ is causal or non-anticipative, as $X_{i}$ is an output of the past innovations $(ε_{j})_{j \leq i}$ and does not depend on future innovations. In fact, it covers a broad range of linear and nonlinear, stationary and non-stationary processes, such as vector auto-regressive moving average processes, locally stationary processes, Markov chains, and nonlinear functional processes [53,58,59,60,61].

Motivated by real applications where nonstationary time series data can involve both abrupt breaks and smooth variation between the breaks, we model the underlying processes as piecewise locally stationary with a finite number of structural breaks.

Definition 1 (Piecewise locally stationary time series model).

Define $\mathrm{PLS}_{ι}([0, 1], L)$ as the collection of mean-zero piecewise locally stationary processes on $[0, 1]$: a process $(X(t))_{0 \leq t \leq 1}$ belongs to $\mathrm{PLS}_{ι}([0, 1], L)$ if there is a nonnegative integer $ι$ such that $X(t)$ is stochastic Lipschitz continuous in $t$ with Lipschitz constant $L$ on each interval $[t^{(l)}, t^{(l+1)})$, $l = 0, \dots, ι$, where $0 = t^{(0)} < t^{(1)} < \dots < t^{(ι)} < t^{(ι+1)} = 1$. A vector stochastic process $(X(t))_{0 \leq t \leq 1} \in \mathrm{PLS}_{ι}([0, 1], L)$ if all coordinates belong to $\mathrm{PLS}_{ι}([0, 1], L)$. For the process $(X_{0}^{\circ}(t))_{0 \leq t \leq 1}$ defined in (1), this means that there exist a non-negative integer $ι$ and a constant $L > 0$ such that

\max_{1 \leq j \leq p} ∥H_{j}(F_{0}; t) - H_{j}(F_{0}; t')∥ \leq L |t - t'| \quad \text{for all } t^{(l)} \leq t, t' < t^{(l+1)}, \; 0 \leq l \leq ι.

Remark 1.

If we assume $(X_{i}^{\circ}(t))_{0 \leq t \leq 1} \in \mathrm{PLS}_{ι}([0, 1], L)$, $i \in Z$, then for each $i' = i - k, \dots, i + k$ with $k/n \to 0$ such that $t^{(l)} \leq i/n, i'/n < t^{(l+1)}$ for some $0 \leq l \leq ι$, we have

\max_{1 \leq j \leq p} ∥H_{j}(F_{i'}; i/n) - H_{j}(F_{i'}; i'/n)∥ \leq L k / n = o(1).

In other words, within a locally stationary time period, in a local window of $i$, $(X_{i'j})_{i-k \leq i' \leq i+k}$ can be approximated by the stationary process $(X_{i'j}^{\circ}(i/n))_{i-k \leq i' \leq i+k}$ for each $j = 1, \dots, p$. This justifies the terminology of local stationarity.

The covariance matrix function of the underlying process is $Σ(t) = (σ_{jk}(t))_{1 \leq j, k \leq p}$, $t \in [0, 1]$, where $σ_{jk}(t) = E(H_{j}(F_{0}; t) H_{k}(F_{0}; t))$, and the precision matrix function is $Ω(t) = Σ(t)^{-1} = (ω_{jk}(t))_{1 \leq j, k \leq p}$. The graph at time $t$ is denoted by $G(t) = (V, E(t))$, where $V$ is the vertex set and $E(t) = \{(j, k) : ω_{jk}(t) \neq 0\}$. Note that $(X_{i}^{\circ}(t))_{t} \in \mathrm{PLS}_{ι}([0, 1], L)$, $i \in Z$, implies piecewise Lipschitz continuity of $Σ(t)$ except at the breaks $t^{(1)}, \dots, t^{(ι)}$. In particular, if $\sup_{0 \leq t \leq 1} \max_{1 \leq j \leq p} ∥H_{j}(F_{0}; t)∥ \leq C$ for some constant $C > 0$, then

\begin{matrix} |Σ(s) - Σ(t)|_{\infty} \leq 2 C L |s - t|, \quad \forall s, t \in [t^{(l)}, t^{(l+1)}), \; l = 0, \dots, ι. \end{matrix}

(3)
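The bound (3) follows from a short argument that is not spelled out in the text. The following sketch fills in the step, using only the Cauchy–Schwarz inequality, the Lipschitz condition of Definition 1, and the uniform bound $C$ on the $L^{2}$ norms; the shorthand $H_{j}(t)$ for $H_{j}(F_{0}; t)$ is introduced here for brevity.

```latex
\begin{align*}
|\sigma_{jk}(s) - \sigma_{jk}(t)|
  &= \bigl| E[(H_j(s) - H_j(t)) H_k(s)] + E[H_j(t) (H_k(s) - H_k(t))] \bigr| \\
  &\le \| H_j(s) - H_j(t) \| \, \| H_k(s) \| + \| H_j(t) \| \, \| H_k(s) - H_k(t) \| \\
  &\le L |s - t| \cdot C + C \cdot L |s - t| = 2 C L |s - t| .
\end{align*}
```

Taking the maximum over $1 \leq j, k \leq p$ gives the stated bound in the entrywise maximum norm.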

The reverse direction is not necessarily true, i.e., (3) does not in general imply $(X_{i}^{\circ}(t))_{t} \in \mathrm{PLS}_{ι}([0, 1], L)$, $i \in Z$. As a trivial example, let $ε_{ij} = -2^{-1/2}$ with probability $2/3$ and $\sqrt{2}$ with probability $1/3$, i.i.d. for all $i, j$, so that $ε_{ij}$ has mean 0 and variance 1. At time $t_{k} = k/n$, let $X_{ij}^{\circ}(t_{k}) = (-1)^{k} \sqrt{t_{k}} \, ε_{ij}$. Then for any $k$ and $k'$ such that $k + k'$ is odd, $|Σ(t_{k}) - Σ(t_{k'})|_{\infty} = |t_{k} - t_{k'}|$, while $∥X_{01}^{\circ}(t_{k}) - X_{01}^{\circ}(t_{k'})∥_{2} = \sqrt{t_{k}} + \sqrt{t_{k'}}$.

Assumption 1 (Piecewise smoothness).

(i) Assume $(X_{i}^{\circ}(t))_{0 \leq t \leq 1} \in \mathrm{PLS}_{ι}([0, 1], L)$ for each $i \in Z$, where $L > 0$ and $ι \geq 0$ are constants independent of $n$ and $p$. (ii) For each $l = 0, \dots, ι$ and $1 \leq j, k \leq p$, we have $σ_{jk}(t) \in C^{2}[t^{(l)}, t^{(l+1)})$.

Now we introduce the temporal dependence measure. We quantify the dependence of $(X_{i}^{\circ}(t))_{i \in Z}$ by the dependence adjusted norm (DAN) (cf. [62]). Let $ε_{i}'$ be an independent copy of $ε_{i}$ and $F_{i,\{m\}} = (\dots, ε_{i-m-1}, ε_{i-m}', ε_{i-m+1}, \dots, ε_{i})$. Denote $X_{i,\{m\}}^{\circ}(t) = (X_{i1,\{m\}}^{\circ}(t), \dots, X_{ip,\{m\}}^{\circ}(t))$, where $X_{ij,\{m\}}^{\circ}(t) = H_{j}(F_{i,\{m\}}; t)$, $1 \leq j \leq p$. Here $X_{i,\{m\}}^{\circ}(t)$ is a coupled version of $X_{i}^{\circ}(t)$, with the same generating mechanism and input, except that $ε_{i-m}$ is replaced by an independent copy $ε_{i-m}'$.

Definition 2 (Dependence adjusted norm (DAN)).

Let $a \geq 1$ and $A > 0$ be constants. Assume $\sup_{0 \leq t \leq 1} ∥X_{1j}^{\circ}(t)∥_{a} < \infty$, $j = 1, \dots, p$. Define the uniform functional dependence measure for the sequences $(X_{ij}^{\circ}(t))_{i \in Z, t \in [0, 1]}$ of form (1) as

θ_{m,a,j} = \sup_{0 \leq t \leq 1} ∥X_{ij}^{\circ}(t) - X_{ij,\{m\}}^{\circ}(t)∥_{a}, \quad j = 1, \dots, p,

and $Θ_{m,a,j} = \sum_{i=m}^{\infty} θ_{i,a,j}$. The dependence adjusted norm of $(X_{ij}^{\circ}(t))_{i \in Z, t \in [0, 1]}$ is defined as

∥X_{\cdot,j}∥_{a,A} = \sup_{m \geq 0} (m + 1)^{A} Θ_{m,a,j},

whenever $∥X_{\cdot,j}∥_{a,A} < \infty$.

Intuitively, the physical dependence measure quantifies the adjusted stochastic difference between the random variable and its coupled version obtained by replacing past innovations. Indeed, $θ_{m,a,j}$ measures the impact on $X_{ij}^{\circ}(t)$, uniformly over $t$, of replacing $ε_{i-m}$ while freezing all the other inputs, while $Θ_{m,a,j}$ quantifies the cumulative influence of replacing $ε_{-m}$ on $(X_{ij}^{\circ}(t))_{i \geq 0}$, uniformly over $t$. Then $∥X_{\cdot,j}∥_{a,A}$ controls the uniform polynomial decay in the lag of the cumulative physical dependence, where $a$ depends on the tail of the marginal distributions of $X_{1,j}^{\circ}(t)$ and $A$ quantifies the polynomial decay power and thus the temporal dependence strength. It is clear that $∥X_{\cdot,j}∥_{a,A}$ is a semi-norm, i.e., it is subadditive and absolutely homogeneous.
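To make the dependence measure concrete, consider a worked example that is not in the paper: a hypothetical scalar stationary AR(1) process $X_{i} = φ X_{i-1} + ε_{i}$ with $|φ| < 1$ and no dependence on $t$. Writing $X_{i} = \sum_{m \geq 0} φ^{m} ε_{i-m}$, replacing $ε_{i-m}$ by its independent copy changes only one term of the sum, which gives the quantities below.

```latex
\begin{align*}
X_i - X_{i,\{m\}} &= \phi^m (\varepsilon_{i-m} - \varepsilon_{i-m}'), \\
\theta_{m,a} &= |\phi|^m \, \| \varepsilon_0 - \varepsilon_0' \|_a, \\
\Theta_{m,a} &= \sum_{i=m}^{\infty} \theta_{i,a}
             = \frac{|\phi|^m}{1 - |\phi|} \, \| \varepsilon_0 - \varepsilon_0' \|_a, \\
\| X_{\cdot} \|_{a,A} &= \frac{\| \varepsilon_0 - \varepsilon_0' \|_a}{1 - |\phi|}
             \, \sup_{m \ge 0} (m+1)^A |\phi|^m < \infty
             \quad \text{for every } A > 0 .
\end{align*}
```

Because the cumulative dependence decays geometrically, the DAN is finite for any polynomial rate $A$ as soon as $ε_{0} \in L^{a}$; processes with slower, polynomially decaying coefficients restrict the admissible range of $A$.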

Assumption 2 (Dependence and moment conditions).

Let $X_{i}^{\circ}(t)$ be defined in (1) and $X_{i}$ in (2). There exist $q > 2$ and $A > 0$ such that

ν_{2q} := \sup_{t \in [0, 1]} \max_{1 \leq j \leq p} E |X_{j}^{\circ}(t)|^{2q} < \infty \quad \text{and} \quad N_{X,2q} := \max_{1 \leq j \leq p} ∥X_{\cdot,j}∥_{2q,A} < \infty.

(4)

We let $M_{X,q} := (\sum_{1 \leq j \leq p} ∥X_{\cdot,j}∥_{2q,A}^{q})^{1/q}$ and write $N_{X} = N_{X,4}$, $M_{X} = M_{X,2}$. The quantities $M_{X,q}$ and $N_{X,2q}$ measure the $L^{q}$-norm aggregated effect and the largest effect of the element-wise DANs, respectively. Both quantities play a role in the convergence rates of our estimator.

Obviously, we have $∥X_{ij} - X_{ij,\{m\}}∥_{a} \leq θ_{m,a,j}$ and $\max_{1 \leq j \leq p} E |X_{ij}|^{2q} \leq ν_{2q}$ for all $1 \leq i \leq n$. In contrast to other works on high-dimensional covariance matrix and network estimation, where sub-Gaussian tails and independence are the keys to ensuring consistent estimation, Assumption 2 only requires that the time series have a finite polynomial moment, and it allows linear and nonlinear processes with short memory in the time domain.

Example 1 (Vector linear process).

Consider the following vector linear process model

H(F_{i}; t) = \sum_{m=0}^{\infty} A_{m}(t) ε_{i-m},

where $ε_{i} = (ε_{i1}, \dots, ε_{ip})$ and the $ε_{ij}$ are i.i.d. with mean 0 and variance 1, and $∥ε_{ij}∥_{q} \leq C_{q}$ for each $i \in Z$ and $1 \leq j \leq p$ with some constants $q > 2$ and $C_{q} > 0$. The vector linear process is commonly seen in the literature and in applications [63]. It includes the time-varying VAR model, where $A_{m}(t) = A(t)^{m}$, as a special case.

Suppose that the coefficient matrices $A_{m}(t) = (a_{m,jk}(t))_{1 \leq j, k \leq p}$, $m = 0, 1, \dots$, satisfy the following conditions.

(A1): For each $1 \leq j, k \leq p$, $a_{m,jk}(t) \in C^{2}[0, 1]$.
(A2): For each $1 \leq j \leq p$, there is a constant $C_{A,j} > 0$ such that for each $t \in [0, 1]$, $\sum_{k=1}^{p} a_{m,jk}(t)^{2} \leq C_{A,j} (m + 1)^{-2(A+1)}$ for all $m \geq 0$.
(A3): For any $t, t' \in [0, 1]$, $\sum_{m=0}^{\infty} \sum_{k=1}^{p} [a_{m,jk}(t) - a_{m,jk}(t')]^{2} \leq L^{2} |t - t'|^{2}$ for each $j = 1, \dots, p$.

Note that

\begin{matrix} σ_{jk}(t) & = \sum_{m \geq 0} A_{m,j\cdot}(t)^{⊤} A_{m,k\cdot}(t), \\ Θ_{m,q,j} & \leq 2 C_{q} \sqrt{q - 1} \sum_{l=m}^{\infty} \sup_{0 \leq t \leq 1} (A_{l,j\cdot}(t)^{⊤} A_{l,j\cdot}(t))^{1/2}, \\ ∥X_{ij}^{\circ}(t) - X_{ij}^{\circ}(t')∥^{2} & = \sum_{m=0}^{\infty} \sum_{k=1}^{p} [a_{m,jk}(t) - a_{m,jk}(t')]^{2}, \end{matrix}

where $A_{m,j\cdot}(t)$ is the $j$th row of $A_{m}(t)$. Under conditions (A1)–(A3), one can easily verify that for each $1 \leq j, k \leq p$, the process satisfies: (1) $σ_{jk}(t) \in C^{2}[0, 1]$; (2) $∥X_{\cdot,j}∥_{q,A} \leq C_{q} \sqrt{(q - 1) C_{A,j}}$ (due to Burkholder’s inequality, cf. [64]); (3) $∥H_{j}(F_{0}; t) - H_{j}(F_{0}; t')∥ \leq L |t - t'|$.

Conditions (A1)–(A3) implicitly impose smoothness in each entry of the coefficient matrices, sparseness in each column of the entry and evolution, and polynomial decay rate in the lag m of each entry and its derivative.
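For readers who want to experiment with Example 1, the following is a minimal simulation sketch of the vector linear process in the time-varying VAR special case $A_{m}(t) = A(t)^{m}$. The function name `simulate_tv_linear`, the truncation of the infinite sum at `trunc` lags, the Gaussian innovations, and the example coefficient function are all assumptions made for illustration.

```python
import numpy as np

def simulate_tv_linear(n, p, coef, trunc=50, seed=0):
    """Simulate X_i = sum_{m=0}^{trunc} A(i/n)^m eps_{i-m}, i = 1..n.

    coef(t) returns the p x p matrix A(t); the infinite sum in the model
    is truncated at `trunc` lags (an approximation, assumed adequate when
    the spectral radius of A(t) is well below 1).
    """
    rng = np.random.default_rng(seed)
    # eps[trunc + i - 1] plays the role of the innovation eps_i.
    eps = rng.standard_normal((n + trunc, p))
    X = np.zeros((n, p))
    for i in range(1, n + 1):
        A = coef(i / n)
        power = np.eye(p)                  # A(t)^0
        for m in range(trunc + 1):
            X[i - 1] += power @ eps[trunc + i - 1 - m]
            power = power @ A              # A(t)^(m+1)
    return X

# Hypothetical coefficient: smooth drift plus a sign flip (break) at t = 0.5.
def example_coef(t, p=3):
    base = 0.3 + 0.2 * t
    return (base if t < 0.5 else -base) * np.eye(p)

X = simulate_tv_linear(n=200, p=3, coef=example_coef)
```

Note that the sketch evaluates all lag coefficients at the same rescaled time $i/n$, as in model (2), rather than running a recursion with a time-varying coefficient.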

For $1 \leq l \leq ι$, let $δ_{jk}(t^{(l)}) := σ_{jk}(t^{(l)}) - σ_{jk}(t^{(l)}-)$ and $Δ(t^{(l)}) = (δ_{jk}(t^{(l)}))_{1 \leq j, k \leq p}$, where $σ_{jk}(t^{(l)}-) = \lim_{t \to t^{(l)}-} σ_{jk}(t)$ is well-defined in view of (3). We assume that the change points are separated and sizeable.

Assumption 3 (Separability and sizeability of change points).

There exist positive constants $c_{1} \in (0, 1)$ and $c_{2} > 0$ independent of $n$ and $p$ such that $\min_{0 \leq l \leq ι} (t^{(l+1)} - t^{(l)}) \geq c_{1}$ and $δ(t^{(l)}) := |Δ(t^{(l)})|_{\infty} \geq c_{2}$ for each $1 \leq l \leq ι$.

In the high-dimensional context, we assume that the inverse covariance matrices are sparse in the sense of their $L_{1}$ norms.

Assumption 4 (Sparsity of precision matrices).

The precision matrix satisfies $|Ω(t)|_{L_{1}} \leq κ_{p}$ for each $t \in [0, 1]$, where $κ_{p}$ is allowed to grow with $p$.

If we further assume that the eigenvalues of the covariance matrices are bounded from below and above, i.e., there exists a constant $0 < c < 1$ such that $c \leq \inf_{t \in [0, 1]} |Σ(t)|_{2} \leq \sup_{t \in [0, 1]} |Σ(t)|_{2} \leq c^{-1}$, then the covariance matrices and precision matrices are well-conditioned. In particular, as $|Ω(t) - Ω(t')|_{2} \leq c^{-2} |Σ(t) - Σ(t')|_{2}$, a small perturbation in the covariance matrix guarantees a small change of the same order in the precision matrix under the spectral norm.

3. Method: Change Point Estimation and Support Recovery

In graphical models (such as the Gaussian graphical model or partial correlation graph), network structures relevant to correlations or partial correlations are second-order characteristics of the data distributions. Specifically, the existence of edges coincides with non-zero entries of the inverse covariance matrix. We consider the dynamics of time series with both structural breaks and smooth changes. The piecewise stochastic Lipschitz continuity in Definition 1 allows the time series to have discontinuities in the covariance matrix function at time points $t^{(l)}$, $l = 1, \dots, ι$ (i.e., change points), while only smooth changes (i.e., twice continuous differentiability of the covariance matrix function in Assumption 1) can occur between the change points.

In the presence of change points, we must first remove the change points before applying any smoothing procedures since $|Ω(t) - Ω(t-)|_{\infty} \geq |Σ(t)|_{L_{1}}^{-1} |Σ(t-)|_{L_{1}}^{-1} |Δ(t)|_{\infty}$, i.e., a non-negligible abrupt change in the covariance matrix will result in a substantial change of the graph structure for sparse and smooth covariance matrices. Thus our proposed graph recovery method consists of two steps: change point detection and support recovery.

Let $h \equiv h_{n} > 0$ be a bandwidth parameter such that $h = o(1)$ and $n^{-1} = o(h)$, and let $D_{h}(0) = \{h, h + 1/n, \dots, 1 - h\}$ be a search grid in $(0, 1)$. Define

D(s) = n^{-1} \left( \sum_{i=0}^{hn-1} X_{ns-i} X_{ns-i}^{⊤} - \sum_{i=1}^{hn} X_{ns+i} X_{ns+i}^{⊤} \right), \quad s \in D_{h}(0).

(5)

To estimate the change points, compute

\hat{s}_{1} = \arg\max_{s \in D_{h}(0)} |D(s)|_{\infty}.

(6)

The following steps are performed recursively. For $l = 1, 2, \dots$, let

\begin{matrix} D_{h}(l) = D_{h}(l - 1) \cap \{\hat{s}_{l} - 2h, \dots, \hat{s}_{l} + 2h\}^{c}, \end{matrix}

(7)

\begin{matrix} \hat{s}_{l+1} = \arg\max_{s \in D_{h}(l)} |D(s)|_{\infty}, \end{matrix}

(8)

until the following criterion is attained:

\begin{matrix} \max_{s \in D_{h}(l)} |D(s)|_{\infty} < ν, \end{matrix}

(9)

where $ν$ is an early stopping threshold. The value of $ν$ is determined in Section 4; it depends on the dimension and sample size, as well as the serial dependence level, tail condition, and local smoothness. Since our method only utilizes data in the localized neighborhood, multiple change points can be estimated and ranked in a single pass, which offers some computational advantage over the binary segmentation algorithm [41,46].
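The single-pass procedure in (5)–(9) can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: the function names are hypothetical, the window length is taken as `round(h * n)`, candidate locations are handled as integer indices $c = ns$, and the threshold `nu` is supplied by the user instead of being derived from the theory of Section 4.

```python
import numpy as np

def local_difference_stat(X, h):
    """Return {c: |D(c/n)|_inf} for every candidate index c = n*s in the grid."""
    n, p = X.shape
    m = int(round(h * n))                        # window length hn
    outer = np.einsum("ij,ik->ijk", X, X)        # X_i X_i^T for each i
    # csum[r] = sum_{i=1}^{r} X_i X_i^T, with csum[0] = 0
    csum = np.concatenate([np.zeros((1, p, p)), np.cumsum(outer, axis=0)])
    stats = {}
    for c in range(m, n - m + 1):
        left = csum[c] - csum[c - m]             # i = c-m+1, ..., c
        right = csum[c + m] - csum[c]            # i = c+1, ..., c+m
        stats[c] = np.abs(left - right).max() / n
    return stats

def detect_change_points(X, h, nu):
    """Greedy selection (6)-(8) with the stopping rule (9)."""
    n = X.shape[0]
    m = int(round(h * n))
    active = local_difference_stat(X, h)
    found = []
    while active:
        c_hat = max(active, key=active.get)
        if active[c_hat] < nu:                   # stopping criterion (9)
            break
        found.append(c_hat / n)
        # Remove the 2h-neighbourhood of the detected point, as in (7).
        active = {c: v for c, v in active.items() if abs(c - c_hat) > 2 * m}
    return found
```

The statistic is computed once for all candidate locations; the recursion only masks neighbourhoods of previously detected points, which is what makes the procedure single-pass.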

Once the change points are identified, in the second step we recover the networks from the locally stationary time series before and after the structural breaks. In [11], where $X_{i}$, $i = 1, \dots, n$, are assumed to share an identical covariance matrix, the precision matrix estimator $\hat{Ω}$ is defined as

\begin{matrix} \hat{Ω}_{λ} = \arg\min_{Ω \in R^{p \times p}} |Ω|_{1} \quad \text{s.t.} \quad |\hat{Σ} Ω - \mathrm{Id}_{p}|_{\infty} \leq λ, \end{matrix}

(10)

where $\hat{Σ}$ is the sample covariance matrix. Inspired by (10), we apply a kernelized time-varying (tv-) CLIME estimator for the covariance matrix functions of the multiple pieces of locally stationary processes before and after the structural breaks. Let

\begin{matrix} \hat{Σ}(t) = \sum_{i=1}^{n} w(t, t_{i}) X_{i} X_{i}^{⊤}, \end{matrix}

(11)

where

\begin{matrix} w(t, t_{i}) = \frac{K_{b}(t_{i}, t)}{\sum_{i'=1}^{n} K_{b}(t_{i'}, t)} \end{matrix}

(12)

and $K_{b}(u, v) = K(|u - v| / b) / b$. The bandwidth parameter $b$ satisfies $b = o(1)$ and $n^{-1} = o(b)$. Denote $B_{n} = nb$. The kernel function $K(\cdot)$ is chosen to have the following properties.

Assumption 5 (Regularity of kernel function).

The kernel function $K(\cdot)$ is non-negative, symmetric, and Lipschitz continuous with bounded support in $[-1, 1]$, and satisfies $\int_{-1}^{1} K(u) \, du = 1$.
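The kernel-weighted covariance estimator (11) with weights (12) can be sketched as below, using the Epanechnikov kernel as one choice compatible with Assumption 5. The function names are illustrative, and no boundary correction is applied in this sketch.

```python
import numpy as np

def epanechnikov(u):
    """K(u) = 0.75 (1 - u^2) on [-1, 1], zero elsewhere; integrates to 1."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def kernel_covariance(X, t, b):
    """Kernel-smoothed covariance estimate at rescaled time t in (0, 1).

    X has shape (n, p) with row i-1 observed at t_i = i/n; b is the bandwidth.
    Returns the p x p estimate and the normalized weights w(t, t_i).
    """
    n = X.shape[0]
    t_grid = np.arange(1, n + 1) / n
    k = epanechnikov(np.abs(t_grid - t) / b) / b   # K_b(t_i, t)
    w = k / k.sum()                                # weights sum to one
    return (X * w[:, None]).T @ X, w
```

Only observations with $|t_{i} - t| \leq b$ receive non-zero weight, so roughly $2 B_{n} = 2nb$ data points contribute to each local estimate.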

Assumption 5 is a common requirement on kernel functions and can be fulfilled by a range of kernels, such as the uniform kernel, the triangular kernel, and the Epanechnikov kernel. Now the tv-CLIME estimator of the precision matrix $Ω(t)$ is defined by $\tilde{Ω}(t) = (\tilde{ω}_{jk}(t))_{1 \leq j, k \leq p}$, where $\tilde{ω}_{jk}(t) = \min(\hat{ω}_{jk}(t), \hat{ω}_{kj}(t))$, and $\hat{Ω}(t) \equiv \hat{Ω}_{λ}(t) = (\hat{ω}_{jk}(t))_{1 \leq j, k \leq p}$,

\begin{matrix} \hat{Ω}_{λ}(t) = \arg\min_{Ω \in R^{p \times p}} |Ω|_{1} \quad \text{s.t.} \quad |\hat{Σ}(t) Ω - \mathrm{Id}_{p}|_{\infty} \leq λ. \end{matrix}

(13)
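The optimization problem (13) decouples over the columns of $Ω$, and each column problem is a linear program. The sketch below solves it with `scipy.optimize.linprog`; the function name `clime` and the split of each column into positive and negative parts are implementation choices made here, not details taken from the paper, and the input `S` stands for any covariance estimate such as $\hat{Σ}(t)$.

```python
import numpy as np
from scipy.optimize import linprog

def clime(S, lam):
    """Solve min |Omega|_1 s.t. |S Omega - I|_inf <= lam, column by column.

    Each column beta is written as u - v with u, v >= 0, so that
    |beta|_1 = sum(u) + sum(v) at the optimum and the constraint is linear.
    """
    p = S.shape[0]
    A_ub = np.block([[S, -S], [-S, S]])   # encodes  S beta - e_j <= lam  and  -(S beta - e_j) <= lam
    cost = np.ones(2 * p)
    Omega = np.zeros((p, p))
    for j in range(p):
        e = np.zeros(p)
        e[j] = 1.0
        b_ub = np.concatenate([lam + e, lam - e])
        res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
        if not res.success:
            raise RuntimeError(f"column {j}: {res.message}")
        Omega[:, j] = res.x[:p] - res.x[p:]
    return Omega

Omega_hat = clime(np.eye(3), lam=0.1)
```

The output is generally not symmetric, which is why a symmetrization step is applied afterwards. For large $p$, dedicated solvers are likely preferable to a generic LP routine.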

A similar hybrid of kernel smoothing and the CLIME method for estimating the sparse and smooth transition matrices in a high-dimensional VAR model has been considered in [65], where change points are not considered. Thus, in the current setting, we need to carefully control the effect of (consistently) removing the change points before smoothing.

Then, the network is estimated by the “effective support” defined as follows:

\begin{matrix} \hat{G}(t; u) = (\hat{g}_{jk}(t; u))_{1 \leq j, k \leq p}, \quad \text{where} \quad \hat{g}_{jk}(t; u) = I\{|\tilde{ω}_{jk}(t)| \geq u\}. \end{matrix}

(14)
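The symmetrization and thresholding steps can be sketched as follows. One assumption is made explicit here: the symmetrization keeps, for each pair $(j, k)$, the entry of smaller magnitude, which is the rule used in the original CLIME paper and is taken to be what the $\min$ in the text abbreviates. The function names are illustrative.

```python
import numpy as np

def symmetrize_min_magnitude(Omega_hat):
    """For each (j, k), keep whichever of omega_jk, omega_kj is smaller in magnitude.

    Assumption: this smaller-magnitude rule follows the original CLIME
    construction and is used here as the reading of the min in the text.
    """
    keep = np.abs(Omega_hat) <= np.abs(Omega_hat.T)
    return np.where(keep, Omega_hat, Omega_hat.T)

def effective_support(Omega_tilde, u):
    """Indicator matrix of entries with |omega_jk| >= u, as in (14)."""
    return (np.abs(Omega_tilde) >= u).astype(int)

Omega_raw = np.array([[1.0, 0.5], [-0.2, 1.0]])
Omega_sym = symmetrize_min_magnitude(Omega_raw)
G_hat = effective_support(Omega_sym, u=0.3)
```

In this toy input the two off-diagonal estimates disagree, the smaller-magnitude value $-0.2$ is retained in both positions, and the edge is then dropped by the threshold $u = 0.3$.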

It should be noted that the (vanilla) kernel smoothing estimator (11) of the covariance matrix does not adjust for the boundary effect due to the change points in the covariance matrix function. Thus, in the neighborhood of the change points, a larger bias can be induced in estimating $Σ(t)$ by $\hat{Σ}(t)$. As a remedy, we apply the following reflection procedure for boundary correction. Denote $\hat{T}_{d}(j) := [\hat{s}_{j} - d, \hat{s}_{j} + d)$ for $d \in (0, 1)$, and suppose $t \in \hat{T}_{b + h^{2}}(j)$ for some $1 \leq j \leq ι$. We replace (11) by

\hat{Σ}(t) = \sum_{i=1}^{n} w(t, t_{i}) \breve{X}_{i} \breve{X}_{i}^{⊤},

and then apply the rest of the tv-CLIME approach. Here

\begin{matrix} \breve{X}_{i} = \begin{cases} X_{i} & \text{if } (i - n \hat{s}_{j})(t - \hat{s}_{j}) \geq 0; \\ X_{2 n \hat{s}_{j} - i} & \text{otherwise}. \end{cases} \end{matrix}

(15)
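The reflection rule (15) amounts to an index remapping: observations on the opposite side of the estimated change point from $t$ are replaced by their mirror images on the same side as $t$. The sketch below returns the remapped 1-based indices. It assumes that the side of $t$ is determined by the sign of $t - \hat{s}_{j}$, rounds $n \hat{s}_{j}$ to the nearest integer, and clips mirrored indices to $[1, n]$; the clipping and the function name are choices made here, not part of the paper.

```python
import numpy as np

def reflect_indices(n, t, s_hat):
    """Map each index i = 1..n to itself or to its mirror image 2*c - i,
    where c = round(n * s_hat), following the rule in (15)."""
    idx = np.arange(1, n + 1)
    c = int(round(s_hat * n))
    same_side = (idx - c) * (t - s_hat) >= 0   # i and t on the same side of s_hat
    mirrored = np.clip(2 * c - idx, 1, n)      # clipping is an added safeguard
    return np.where(same_side, idx, mirrored)

right = reflect_indices(10, t=0.6, s_hat=0.5)
left = reflect_indices(10, t=0.4, s_hat=0.5)
```

Given a data matrix `X` with row `i - 1` holding $X_{i}$, the reflected sample is `X[reflect_indices(n, t, s_hat) - 1]`, which can then be passed to the kernel smoother. Since the kernel has support of width $b$, only mirrored indices within the local window affect the estimate.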

4. Theoretical Results

In this section, we derive the theoretical guarantees for the change point estimation and graph support recovery. Roughly speaking, Propositions 1 and 2 below show that under appropriate conditions, if each element of the covariance matrix varies smoothly in time, one can obtain an accurate snapshot estimation of the precision matrices as well as the time-varying graphs with high probability via the proposed kernel smoothed constrained $L_{1}$-minimization approach.

Define $J_{q,A}(n, p) = M_{X,q} (p \, ϖ_{q,A}(n))^{1/q}$, where $ϖ_{q,A}(n) = n$, $n (\log n)^{1+2q}$, $n^{q/2 - Aq}$ if $A > 1/2 - 1/q$, $A = 1/2 - 1/q$, and $0 < A < 1/2 - 1/q$, respectively.

Proposition 1 (Rate of convergence for estimating precision matrices: pointwise and uniform).

Suppose Assumptions 2, 4, and 5 hold with $ι = 0$. Let $B_{n} = bn$ for $n^{-1} = o(b)$ and $b = o(1)$.

(i): Pointwise. Choose the parameter $λ^{\circ} \geq C κ_{p} (b^{2} + B_{n}^{-1} J_{q,A}(B_{n}, p) + N_{X} (\log p / B_{n})^{1/2})$ in the tv-CLIME estimator $\hat{Ω}_{λ^{\circ}}(t)$ in (13), where $C$ is a sufficiently large constant independent of $n$ and $p$. Then for any $t \in [b, 1 - b]$, we have

$\begin{matrix} |\hat{Ω}_{λ^{\circ}}(t) - Ω(t)|_{\infty} = O_{P}(κ_{p} λ^{\circ}). \end{matrix}$

(16)
(ii): Uniform. Choose $λ^{⋄} \geq C κ_{p} (b^{2} + B_{n}^{-1} J_{q,A}(n, p) + N_{X} B_{n}^{-1} (n \log p)^{1/2})$ in the tv-CLIME estimator $\hat{Ω}_{λ^{⋄}}(t)$ in (13), where $C$ is a sufficiently large constant independent of $n$ and $p$. Then we have

$\begin{matrix} \sup_{t \in [b, 1 - b]} |\hat{Ω}_{λ^{⋄}}(t) - Ω(t)|_{\infty} = O_{P}(κ_{p} λ^{⋄}). \end{matrix}$

(17)

The optimal order of the bandwidth parameter $b = b_{♯}$ in (17) is the solution to the following equation:

\begin{matrix} b^{2} = B_{n}^{-1} \max(J_{q,A}(n, p), N_{X} (n \log(p^{2}))^{1/2}), \end{matrix}

which implies that the closed-form expression for $b_{♯}$ is given by

\begin{matrix} b_{♯} = C_{1} (n^{-1} J_{q,A}(n, p))^{1/3} + C_{2} N_{X}^{1/3} n^{-1/6} (\log p)^{1/6} \end{matrix}

for some constants $C_{1}$ and $C_{2}$ that are independent of $n$ and $p$.
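The step from the balancing equation to the closed form is short and can be spelled out as follows. The shorthand $M_{n}$ for the maximum on the right-hand side is introduced here for convenience; the constant factor from $\log(p^{2}) = 2 \log p$ is absorbed into $C_{2}$.

```latex
\begin{align*}
b^2 = \frac{M_n}{n b}, \quad
M_n := \max\bigl( J_{q,A}(n,p),\; N_X (n \log(p^2))^{1/2} \bigr)
\;&\Longrightarrow\;
b^3 = \frac{M_n}{n}
\;\Longrightarrow\;
b_\sharp = \Bigl( \frac{M_n}{n} \Bigr)^{1/3}, \\
\max(x, y)^{1/3} \le x^{1/3} + y^{1/3} \le 2 \max(x, y)^{1/3}
\;&\Longrightarrow\;
b_\sharp \asymp \bigl( n^{-1} J_{q,A}(n,p) \bigr)^{1/3}
  + N_X^{1/3} \, n^{-1/6} (\log p)^{1/6} .
\end{align*}
```

The second term uses $(n^{-1} N_{X} (n \log p)^{1/2})^{1/3} = N_{X}^{1/3} n^{-1/6} (\log p)^{1/6}$. With this choice the squared-bandwidth bias term and the stochastic terms in $λ^{⋄}$ are of the same order.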

Given a finite sample, it is challenging to distinguish the small entries of the precision matrix from noise. Since a smaller magnitude of an element of the precision matrix implies a weaker connection along the corresponding edge of the graphical model, we instead consider the estimation of significant edges in the graph. Define the set of significant edges at level u as

E^{*} (t; u) = \{(j, k) : g_{j k}^{*} (t; u) \neq 0\}

, where

g_{j k}^{*} (t; u) = I \{| ω_{j k} (t) | > u\} .

Then, as a consequence of (17), we have the following support recovery consistency result.

Proposition 2 (Consistency of support recovery: significant edges).

Choose u as

u_{♯} = C_{0} κ_{p}^{2} b_{♯}^{2}

, where

C_{0}

is taken as a sufficiently large constant independent of n and p. Suppose that

u_{♯} = o (1)

as

n, p \to \infty

. Then under conditions of Proposition 1, we have that as

n, p \to \infty

,

\begin{matrix} P (sup_{t \in [b, 1 - b]} \sum_{(j, k) \in E^{c} (t)} I \{{\hat{g}}_{j k} (t; u_{♯}) \neq 0\} \neq 0) \to 0, \end{matrix}

(18)

\begin{matrix} P (sup_{t \in [b, 1 - b]} \sum_{(j, k) \in E^{*} (t; 2 u_{♯})} I \{{\hat{g}}_{j k} (t; u_{♯}) = 0\} \neq 0) \to 0 . \end{matrix}

(19)

Proposition 2 shows that the pattern of significant edges in the time-varying true graphs

G (t), t \in [b, 1 - b]

, can be correctly recovered with high probability. However, it is still an open question to what extent the edges with magnitude below u can be consistently estimated, which can be naturally studied in the multiple hypothesis testing framework. Nonetheless, hypothesis testing for graphical models on the nonstationary high-dimensional time series is rather challenging. We leave it as a future problem.
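The thresholding that defines the significant edges can be sketched as follows. This is a minimal illustration that applies the indicator I{|ω_{jk}| > u} to a given matrix; we assume the estimated graph is obtained by applying the same rule to an estimated precision matrix, and the function name and the toy matrix are ours.

```python
def significant_edges(omega, u):
    """Return the set of index pairs (j, k) with |omega[j][k]| > u.

    Follows the definition as written, so diagonal pairs are not excluded.
    """
    p = len(omega)
    return {(j, k) for j in range(p) for k in range(p) if abs(omega[j][k]) > u}


# Toy 3 x 3 precision matrix (illustrative values only).
omega_hat = [
    [2.0, 0.4, 0.01],
    [0.4, 1.5, -0.3],
    [0.01, -0.3, 1.0],
]
print(sorted(significant_edges(omega_hat, u=0.1)))
```

Entries whose magnitude falls below the level u, such as the pair (0, 2) in the toy matrix, are treated as absent edges.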

Propositions 1 and 2 together yield that the consistent estimation of the precision matrices and the graphs can be achieved before and after the change points. Now, we provide the theoretical result of the change point estimation. Theorem 1 below shows that if the change points are separated and sizable, then we can consistently identify them via the single pass segmentation approach under suitable conditions. Denote

h_{⋄} = C_{1} {(n^{- 1} J_{q, A} (n, p))}^{1 / 3} + C_{2} N_{X}^{1 / 3} n^{- 1 / 6} log {(p)}^{1 / 6},

where

C_{1}

and

C_{2}

are constants independent of n and p.

Theorem 1 (Consistency of change point estimation).

Assume

X_{i} \in R^{p}

admits the form (2). Suppose that Assumptions 2 and 3 are satisfied. Choose the bandwidth

h = h_{⋄}

, and

ν = (1 + L) h_{⋄}^{2}

in (5) and (9) respectively. Assume that

h_{⋄} = o (1)

as

n, p \to \infty

. We find that there exist constants

C_{1}, C_{2}, C_{3}

independent of n and p, such that

\begin{matrix} P (| \hat{ι} - ι | > 0) \leq C_{1} {(\frac{p ϖ_{q, A} (n) M_{X, q}^{q} ν_{2 q}^{q}}{n^{q} c_{2}^{q}})}^{1 / 3} + C_{2} p^{2} exp \{- C_{3} {(\frac{n {log}^{2} (p)}{N_{X}^{2}})}^{1 / 3}\} . \end{matrix}

(20)

Furthermore, in the event

{ι = \hat{ι}}

, the ordered change-point estimator

({\hat{s}}_{(1)} < {\hat{s}}_{(2)} < \dots < {\hat{s}}_{(\hat{ι})})

defined in (7) satisfies

\begin{matrix} max_{1 \leq j \leq ι} | {\hat{s}}_{(j)} - t^{(j)} | = O_{P} (h_{⋄}^{2}) . \end{matrix}

(21)
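To make the detection step concrete, the sketch below computes a jump statistic of the kind analyzed in Theorem 1: the maximum entrywise absolute difference between the sums of X_i X_i^⊤ over the window to the left and the window to the right of a candidate point, divided by n. The exact statistic is defined in (5), which is not reproduced in this section, so the form and normalization here are our assumption, inferred from (48); the function name and the toy data are also ours.

```python
def jump_statistic(X, window):
    """Max-norm of (left-window sum - right-window sum) of X_i X_i^T, divided by n.

    X is a list of n rows of length p. Entry s of the result compares rows
    s-window..s-1 with rows s..s+window-1; entries without a full window on
    both sides are None.
    """
    n, p = len(X), len(X[0])
    # prefix[s][j][k] = sum over rows i < s of X[i][j] * X[i][k]
    prefix = [[[0.0] * p for _ in range(p)]]
    for row in X:
        last = prefix[-1]
        prefix.append(
            [[last[j][k] + row[j] * row[k] for k in range(p)] for j in range(p)]
        )
    stat = [None] * n
    for s in range(window, n - window + 1):
        stat[s] = max(
            abs(2 * prefix[s][j][k] - prefix[s - window][j][k] - prefix[s + window][j][k])
            for j in range(p)
            for k in range(p)
        ) / n
    return stat


# Noise-free toy series: the second moment jumps from 1 to 9 at index 50.
X = [[1.0]] * 50 + [[3.0]] * 50
stat = jump_statistic(X, window=10)
valid = [s for s in range(len(X)) if stat[s] is not None]
s_hat = max(valid, key=lambda s: stat[s])
print(s_hat, stat[s_hat])
```

In this noise-free example the statistic peaks exactly at the jump and decays linearly on either side, which mirrors the piecewise linear approximation used in the proof of Theorem 1.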

Proposition 2 and Theorem 1 together indicate the consistency in the snapshot estimation of the time-varying graphs before and after the change points. In a close neighborhood of the change points, we have the following result for the recovery of the time-varying network. Denote

S : = [b_{♯}, 1 - b_{♯}] \cap (\cup_{1 \leq j \leq \hat{ι}} {\hat{T}}_{h_{⋄}^{2} + b_{♯}}^{c} (j))

as the time intervals between the estimated change points, and

N : = [0, b_{♯}) \cup (\cup_{1 \leq j \leq \hat{ι}} ({\hat{T}}_{h_{⋄}^{2} + b_{♯}} \cap {\hat{T}}_{h_{⋄}^{2}}^{c})) \cup (1 - b_{♯}, 1]

as the recoverable neighborhood of the jump.

Theorem 2.

Let Assumptions 2 to 5 be satisfied. We have the following results as

n, p \to \infty

.

(i): Between change points. For $t \in S$, take $b = b_{♯}$ and $u = u_{♯}$, where $b_{♯}$ and $u_{♯}$ are defined in Proposition 2. Suppose $u_{♯} = o (1)$. We have

$\begin{matrix} sup_{t \in S} max_{j, k} | {\hat{σ}}_{j, k} (t) - σ_{j, k} (t) | = O_{P} (b_{♯}^{2}) . \end{matrix}$

(22)

Choose the penalty parameter as $λ_{♯} : = C_{1} κ_{p} b_{♯}^{2}$ , where $C_{1}$ is a constant independent of n and p. Then

$\begin{matrix} sup_{t \in S} {| {\hat{Ω}}_{λ_{♯}} (t) - Ω (t) |}_{\infty} = O_{P} (κ_{p}^{2} b_{♯}^{2}) . \end{matrix}$

Moreover,

$\begin{matrix} P (sup_{t \in S} \sum_{(j, k) \in E^{c} (t)} I \{{\hat{g}}_{j, k} (t; u_{♯}) \neq 0\} = 0) \to 1, \end{matrix}$

(23)

$\begin{matrix} P (sup_{t \in S} \sum_{(j, k) \in E^{*} (t; 2 u_{♯})} I \{{\hat{g}}_{j k} (t; u_{♯}) = 0\} = 0) \to 1 . \end{matrix}$

(24)
(ii): Around change points. For $t \in N$, take $b = b_{⋆} : = C_{1} {(n^{- 1} J_{q, A} (n, p))}^{1 / 2} + C_{2} N_{X}^{1 / 2} n^{- 1 / 4} log {(p)}^{1 / 4}$, and $u = u_{⋆} : = C_{0} κ_{p}^{2} b_{⋆}$, where $C_{0}$, $C_{1}$ and $C_{2}$ are constants independent of n and p. Suppose $u_{⋆} = o (1)$. We have

$\begin{matrix} sup_{t \in N} max_{j, k} | {\hat{σ}}_{j, k} (t) - σ_{j, k} (t) | = O_{P} (b_{⋆}) . \end{matrix}$

Choose the penalty parameter as $λ_{⋆} : = C_{1} κ_{p} b_{⋆}$ , where $C_{1}$ is a constant independent of n and p. Then

$\begin{matrix} sup_{t \in N} {| {\hat{Ω}}_{λ_{⋆}} (t) - Ω (t) |}_{\infty} = O_{P} (κ_{p}^{2} b_{⋆}) . \end{matrix}$

(25)

Moreover,

$\begin{matrix} P (sup_{t \in N} \sum_{(j, k) \in E^{c} (t)} I \{{\hat{g}}_{j, k} (t; u_{⋆}) \neq 0\} = 0) \to 1, \end{matrix}$

(26)

$\begin{matrix} P (sup_{t \in N} \sum_{(j, k) \in E^{*} (t; 2 u_{⋆})} I \{{\hat{g}}_{j, k} (t; u_{⋆}) = 0\} = 0) \to 1 . \end{matrix}$

(27)

Note that the convergence rates for the covariance matrix entries and precision matrix entries in case (ii), around the jump locations, are slower than those in case (i), for points well separated from the jump locations. This is because, on the boundary, the smoothness condition may no longer hold after reflection. There we can only exploit the Lipschitz continuity of the covariance matrix function. Thus, we lose one degree of regularity in the covariance matrix function, and the bias term

b^{2}

in the convergence rate of the between-jump area becomes b around the jumps. We also note that around the smaller neighborhood of the jump

J : = \cup_{1 \leq j \leq \hat{ι}} {\hat{T}}_{h_{⋄}^{2}}

, due to the larger error in the change point estimation, consistent recovery of the graphs is not achievable.

5. A Simulation Study

We simulate data from the following multivariate time series model:

X_{i} = \sum_{m = 0}^{100} A_{m} (i) ϵ_{i - m}, i = 1, \dots, n,

where

A_{m} (i) \in R^{p \times p}, 0 \leq m \leq 100, 1 \leq i \leq n

, and

ϵ_{i - m} = {(ϵ_{i - m, 1}, \dots, ϵ_{i - m, p})}^{⊤}

, with

ϵ_{m, k}

,

m \in Z

,

k = 1, \dots, p

generated as i.i.d. standardized

T (8)

random variables. In the simulation, we fix

n = 1000

and vary

p = 50

and

p = 100

. For each

m = 1, \dots, 100

, the coefficient matrices

A_{m} (i) = {(1 + m)}^{- β} B_{m} (i)

, where

β = 1

, and

B_{m} (1)

is an

R^{p \times p}

block diagonal matrix. The

5 \times 5

diagonal blocks in

B_{m} (i)

are fixed with i.i.d.

N (0, 1)

entries and all the other entries are 0.
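A reduced version of this data-generating model can be sketched as follows. This is a minimal illustration under simplifying assumptions of ours: the coefficient matrices are kept fixed over time, the lag-0 matrix is zero throughout (so there are no change points), and the default sizes are much smaller than in the study to keep the run short. The function names are hypothetical.

```python
import math
import random


def standardized_t(rng, df=8):
    """Student t draw with df degrees of freedom, rescaled to unit variance."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df) / math.sqrt(df / (df - 2.0))


def simulate(n=200, p=10, lags=20, block=5, beta=1.0, seed=0):
    """Simulate X_i = sum_{m=1}^{lags} (1 + m)^(-beta) B_m eps_{i-m}.

    Each B_m is block diagonal with block x block Gaussian blocks,
    drawn once and held fixed over time (a simplification).
    """
    if p % block != 0:
        raise ValueError("p must be a multiple of the block size")
    rng = random.Random(seed)
    n_blocks = p // block
    # blocks[m - 1][g] is the g-th diagonal block of B_m
    blocks = [
        [[[rng.gauss(0.0, 1.0) for _ in range(block)] for _ in range(block)]
         for _ in range(n_blocks)]
        for _ in range(lags)
    ]
    eps = [[standardized_t(rng) for _ in range(p)] for _ in range(n + lags)]
    X = []
    for i in range(n):
        x = [0.0] * p
        for m in range(1, lags + 1):
            scale = (1.0 + m) ** (-beta)
            e = eps[i + lags - m]  # innovation at lag m
            for g in range(n_blocks):
                off = g * block
                blk = blocks[m - 1][g]
                for r in range(block):
                    x[off + r] += scale * sum(blk[r][c] * e[off + c] for c in range(block))
        X.append(x)
    return X


X = simulate(n=100, p=10, lags=10)
print(len(X), len(X[0]))
```

Because every coefficient matrix is block diagonal with the same block layout, coordinates in different blocks are independent, so the covariance and precision matrices inherit the block structure.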

We set the number of abrupt changes to

ι = 2

and

(n t^{(1)}, n t^{(2)}) = (300, 650)

. The matrix

A_{0} (i)

is set to be a zero matrix for

i = 1, 2, \dots, 299

, while

A_{0} (i) = A_{0} (299) + α α^{⊤}

,

i = 300, 301, \dots, 649

, and

A_{0} (i) = A_{0} (649) - α α^{⊤}

,

i = 650, 651, \dots, 1000

, where the first 20 entries in

α

are taken to be a constant

δ_{0}

and the others are 0.

We let the coefficient matrices

A_{1} (i) = {a_{1, j k} (i)}_{1 \leq j, k \leq p}

evolve at each time point, such that two entries are soft-thresholded and another two elements increase. Specifically, at time i, we randomly select two elements from the support of

A_{1} (i)

, which are denoted as

{a_{1, j_{l}^{⋆} k_{l}^{⋆}} (i)}, l = 1, 2

and that

a_{1, j^{⋆} k^{⋆}} (i) \neq 0

, and set them to

a_{1, j_{l}^{⋆} k_{l}^{⋆}}^{⋆} (i) = sign (a_{1, j_{l}^{⋆} k_{l}^{⋆}} (i)) (| a_{1, j_{l}^{⋆} k_{l}^{⋆}} (i) - 0.05 |)

. We also randomly select two elements from

A_{1}^{⋆} (i)

and increase their values by

0.03

.

Figure 1 and Figure 2 show the support of the true covariance matrices at

i = 100, 200, \dots, 900

.

In detecting the change points, the cutoff value

ν

of detection is chosen as follows. After removing the neighborhoods of the detected change points, we obtain the ordered values

D_{h}^{(1)} \geq D_{h}^{(2)} \geq \dots \geq D_{h}^{(l^{*})}

by ordering the values of the statistic at the detected points, where

l^{*}

is the number of points obtained from (9) with

ν = 0

. For

l = 1, 2, \dots, l^{*} - 1

, compute

R_{h}^{(l)} = \frac{D_{h}^{(l)}}{D_{h}^{(l + 1)}} .

We let

\hat{ι} = arg {max}_{1 \leq l \leq l^{*} - 1} R_{h}^{(l)}

and set

ν = D_{h}^{(\hat{ι})}

.
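The ratio rule described above can be sketched as follows: sort the candidate statistics in decreasing order, take the ratios of consecutive values, and place the cutoff at the largest relative drop. This is a minimal illustration; the function name and the toy values are ours, and the candidate values are assumed to be positive.

```python
def ratio_cutoff(candidates):
    """Return (estimated number of jumps, cutoff value) from candidate statistics.

    candidates: positive values of the detection statistic at candidate points.
    """
    d = sorted(candidates, reverse=True)
    if len(d) < 2:
        raise ValueError("need at least two candidate values")
    # ratios[l] compares the (l+1)-th and (l+2)-th largest values (0-based l)
    ratios = [d[l] / d[l + 1] for l in range(len(d) - 1)]
    best = max(range(len(ratios)), key=ratios.__getitem__)
    return best + 1, d[best]


# Two large values followed by three small ones (illustrative numbers).
n_jumps, cutoff = ratio_cutoff([0.9, 0.05, 0.8, 0.04, 0.06])
print(n_jumps, cutoff)
```

With these toy values the largest ratio separates the two large statistics from the three small ones, so two change points are retained and the cutoff equals the second-largest value.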

We report the number of estimated jumps and the average absolute estimation error, where the average absolute estimation error is the mean of the distance between the estimated change points and the true change points. As is shown in Table 1 and Table 2, there is an apparent improvement in the estimation accuracy as the jump magnitude increases and dimension decreases. The detection is relatively robust to the choice of bandwidth.

We evaluate the support recovery performance of the time-varying CLIME at the lattice

100, 200, \dots, 900

with

λ = 0.02, 0.06, 0.1

. We take the uniform kernel function and the bandwidth is fixed as

0.2

. At each time point

t_{0}

, two quantities are computed: sensitivity and specificity, which are defined as:

\begin{matrix} sensitivity & = \frac{\sum_{1 \leq j, k \leq p} I {{\hat{g}}_{j k} (t_{0}; u) \neq 0, g_{j k} (t_{0}; u) \neq 0}}{\sum_{1 \leq j, k \leq p} I {g_{j k} (t_{0}; u) \neq 0}}, \\ specificity & = \frac{\sum_{1 \leq j, k \leq p} I {{\hat{g}}_{j k} (t_{0}; u) = 0, g_{j k} (t_{0}; u) = 0}}{\sum_{1 \leq j, k \leq p} I {g_{j k} (t_{0}; u) = 0}} . \end{matrix}
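The two quantities can be computed from a pair of indicator matrices as in the following minimal sketch. The function name and the toy matrices are ours; as in the displayed formulas, all pairs (j, k) with 1 ≤ j, k ≤ p are counted.

```python
def sensitivity_specificity(g_hat, g_true):
    """Sensitivity and specificity of an estimated edge-indicator matrix.

    g_hat, g_true: p x p nested lists whose nonzero entries mark edges.
    Returns NaN for a ratio whose denominator is zero.
    """
    tp = fn = tn = fp = 0
    for row_hat, row_true in zip(g_hat, g_true):
        for est, truth in zip(row_hat, row_true):
            if truth != 0:
                if est != 0:
                    tp += 1
                else:
                    fn += 1
            else:
                if est == 0:
                    tn += 1
                else:
                    fp += 1
    sensitivity = tp / (tp + fn) if tp + fn > 0 else float("nan")
    specificity = tn / (tn + fp) if tn + fp > 0 else float("nan")
    return sensitivity, specificity


g_true = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
g_hat = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
print(sensitivity_specificity(g_hat, g_true))
```

Sweeping the threshold or tuning parameter and recording one (1 - specificity, sensitivity) pair per setting traces out the ROC curve.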

We plot the Receiver Operating Characteristic (ROC) curve, that is, sensitivity against 1-specificity. From Figure 3 and Figure 4 we observe that, due to a screening step, the support recovery is robust to the choice of

λ

, except at the change points, where a non-negligible estimation error of the covariance matrix is induced and the overall estimation is less accurate. As the effective dimension of the network remains the same at

p = 50

and

p = 100

by the construction of the coefficient matrix

A_{m} (i)

, there is no significant difference in the ROC curves at different dimensions.

6. A Real Data Application

Understanding the interconnections among financial entities and how they vary over time provides investors and policy makers with insights into risk control and decision making. Reference [66] presents a comprehensive study of the applications of network theory in financial systems. In this section, we apply our method to a real financial dataset from Yahoo! Finance (finance.yahoo.com). The data matrix contains daily closing prices of 420 stocks that remained in the S&P 500 index from 2 January 2002 through 30 December 2011. In total, there are

n = 2519

time points. We select 100 stocks with the largest volatility and consider their log-returns; that is, for

j = 1, \dots, 100

,

X_{i j} = log (p_{i + 1, j} / p_{i j}),

where

p_{i j}

is the daily closing price of stock j at time point i. We first compute the statistics (5) and (6) for the change point detection. We look at the top three statistics for different bandwidths. For bandwidth

k = n^{- 1 / 5} = 0.21

, we rank the test statistic and find that the location for the top change point is: 7 February 2008 (

n_{{\hat{s}}_{1}} = 1536

), which is shown in Figure 5. The detected change point is quite robust to the choice of bandwidth. Our result is partially consistent with that of the change point detection method in [48]. In particular, the two breaks in 2006 and 2007 were also found in [48] and it is conjectured that the 2007 break may be associated with the U.S. housing market collapse. Meanwhile, it is interesting to observe the increased volatility before the 2008 financial crisis.
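The preprocessing described at the start of this section (log-returns followed by a volatility screen) can be sketched as follows. This is a minimal illustration: the function names are ours, and measuring volatility by the standard deviation of the log-returns is our assumption, since the text does not state which volatility measure is used.

```python
import math
import statistics


def log_returns(prices):
    """prices[i][j] is the closing price of stock j on day i.

    Returns an (n - 1) x p nested list with entries log(p_{i+1,j} / p_{i,j}).
    """
    return [
        [math.log(nxt / cur) for cur, nxt in zip(prices[i], prices[i + 1])]
        for i in range(len(prices) - 1)
    ]


def most_volatile(returns, k):
    """Indices of the k columns with the largest standard deviation of returns."""
    p = len(returns[0])
    vol = [statistics.pstdev([row[j] for row in returns]) for j in range(p)]
    return sorted(range(p), key=lambda j: -vol[j])[:k]


prices = [[100.0, 50.0], [110.0, 50.0], [99.0, 50.0], [108.9, 50.0]]
r = log_returns(prices)
print(most_volatile(r, 1))
```

The screen keeps only column indices, so the same indices can be used to subset the return matrix before computing the change point statistics.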

Next, we estimate the time-varying networks before and after the change point at 26 May 2006 with the largest jump size. Specifically, we look at four time points at: 813, 828, 888, and 903, corresponding to 23 March 2006, 13 April 2006, 11 July 2006, and 1 August 2006. We use tv-CLIME (13) with the Epanechnikov kernel with the same bandwidth as in the change point detection to estimate the networks at the four points. Optimal tuning parameter

λ

is automatically selected according to the stability approach [67]. The estimated networks are shown in Figure 6. Networks in the first and second rows are estimated before and after the estimated change point at time 858, respectively. At each time point, companies in the same sector tend to cluster together, for example the Energy-sector companies OXY, NOV, TSO, MRO, and DO (highlighted in cyan). The matrix below reports the number of differing edges between each pair of networks at the four time points. The networks at the first two time points (813 and 828) are more similar to each other, as are those at the last two (888 and 903), than networks on opposite sides of the change point at time 858. The distance matrix of the estimated networks is

(\begin{matrix} 0 & 332 & 350 & 396 \\ 332 & 0 & 394 & 428 \\ 350 & 394 & 0 & 234 \\ 396 & 428 & 234 & 0 \end{matrix}) .
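A distance matrix of this kind can be computed from adjacency matrices as in the sketch below. This is a minimal illustration on toy graphs; counting each undirected pair {j, k} once is our assumption, since the text does not say whether edges are counted once or twice, and the function names are ours.

```python
def edge_difference(adj_a, adj_b):
    """Number of unordered pairs {j, k}, j < k, whose edge status differs."""
    p = len(adj_a)
    return sum(
        1
        for j in range(p)
        for k in range(j + 1, p)
        if bool(adj_a[j][k]) != bool(adj_b[j][k])
    )


def distance_matrix(graphs):
    """Pairwise edge differences between a list of adjacency matrices."""
    m = len(graphs)
    return [[edge_difference(graphs[a], graphs[b]) for b in range(m)] for a in range(m)]


g1 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
g2 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
g3 = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
print(distance_matrix([g1, g2, g3]))
```

The resulting matrix is symmetric with a zero diagonal, matching the structure of the matrix reported above.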

7. Proof of Main Results

7.1. Preliminary Lemmas

Lemma 1.

Let

{(Y_{i})}_{i \in Z}

be a sequence that admits (2). Assume

Y_{i} \in L^{q}

for

i = 1, 2, \dots

, and the dependence adjusted norm (DAN) of the corresponding underlying array

(Y_{i}^{\circ} (t))

satisfies

∥ Y_{\cdot} ∥_{q, A} < \infty

for

q > 2

and

A > 0

. Let

{(w (t, t_{i}))}_{i = 1}^{n}

be defined in (12) and suppose that the kernel function

K (\cdot)

satisfies Assumption 5. Denote

ϖ_{q, A} (n) = n, n {(log n)}^{1 + 2 q}, n^{q / 2 - A q}

if

A > 1 / 2 - 1 / q

,

A = 1 / 2 - 1 / q

, and

0 < A < 1 / 2 - 1 / q

, respectively. Then there exist constants

C_{1}, C_{2}

and

C_{3}

independent of n, such that for all

x > 0

,

\begin{matrix} sup_{t \in (0, 1)} P (|\sum_{i = 1}^{n} w (t, t_{i}) (Y_{i} - E (Y_{i}))| > x) \leq C_{1} \frac{ϖ_{q, A} (B_{n}) {∥Y_{\cdot}∥}_{q, A}^{q}}{B_{n}^{q} x^{q}} + C_{2} exp (\frac{- C_{3} B_{n} x^{2}}{{∥Y_{\cdot}∥}_{2, A}^{2}}) . \end{matrix}

(28)

\begin{matrix} P (sup_{t \in (0, 1)} |\sum_{i = 1}^{n} w (t, t_{i}) (Y_{i} - E (Y_{i}))| > x) \leq C_{1} \frac{ϖ_{q, A} (n) {∥Y_{\cdot}∥}_{q, A}^{q}}{B_{n}^{q} x^{q}} + C_{2} exp (\frac{- C_{3} B_{n}^{2} x^{2}}{n {∥Y_{\cdot}∥}_{2, A}^{2}}) . \end{matrix}

(29)

Proof.

Let

S_{i} = \sum_{j = 1}^{i} (Y_{j} - E (Y_{j}))

. Note that

\begin{matrix} sup_{t \in (0, 1)} |\sum_{i = 1}^{n} w (t, t_{i}) (Y_{i} - E (Y_{i}))| & = sup_{t \in (0, 1)} |\sum_{i = 1}^{n} w (t, t_{i}) (S_{i} - S_{i - 1})| \\ & \leq sup_{t} |\sum_{i = 1}^{n - 1} [(w (t, t_{i}) - w (t, t_{i + 1})) S_{i}]| + sup_{t} |w (t, 1) S_{n}| \\ & ≲ B_{n}^{- 1} max_{1 \leq i \leq n} | S_{i} |, \end{matrix}

where the last inequality follows from the fact that

{sup}_{t} \sum_{i = 1}^{n} | w (t, t_{i}) - w (t, t_{i + 1}) | ≍ B_{n}^{- 1}

, due to Assumption 5.

To see (29), it suffices to show

\begin{matrix} P (max_{1 \leq i \leq n} | S_{i} | > x) \leq C_{1} \frac{ϖ_{q, A} (n) {∥Y_{\cdot}∥}_{q, A}^{q}}{x^{q}} + C_{2} exp (\frac{- C_{3} x^{2}}{n {∥Y_{\cdot}∥}_{2, A}^{2}}) . \end{matrix}

(30)

Now, we develop a probability deviation inequality for

{max}_{1 \leq i \leq n} | \sum_{j = 1}^{i} α_{j} Y_{j} |

, where

α_{j} \geq 0

,

1 \leq j \leq n

are constants such that

\sum_{1 \leq j \leq n} α_{j} = 1

. Denote

P_{0} (Y_{i}) = E (Y_{i} | ε_{i}) - E (Y_{i})

and

P_{k} (Y_{i}) = E (Y_{i} | ε_{i - k}, \dots, ε_{i}) - E (Y_{i} | ε_{i - k + 1}, \dots, ε_{i}) .

Then we can write

\begin{matrix} max_{1 \leq i \leq n} | \sum_{j = 1}^{i} α_{j} Y_{j} | & \leq max_{1 \leq i \leq n} | \sum_{j = 1}^{i} α_{j} P_{0} (Y_{j}) | + max_{1 \leq i \leq n} | \sum_{k = 1}^{n} \sum_{j = 1}^{i} α_{j} P_{k} (Y_{j}) | \\ + max_{1 \leq i \leq n} | \sum_{k = n + 1}^{\infty} \sum_{j = 1}^{i} α_{j} P_{k} (Y_{j}) | . \end{matrix}

(31)

Note that

{(P_{0} (Y_{j}))}_{j \in Z}

is an independent sequence. By Nagaev’s inequality and Ottaviani’s inequality, we have that

\begin{matrix} P (max_{1 \leq i \leq n} | \sum_{j = 1}^{i} α_{j} P_{0} (Y_{j}) | \geq x) & ≲ \frac{\sum_{j = 1}^{n} α_{j}^{q} {∥P_{0} (Y_{j})∥}_{q}^{q}}{x^{q}} + exp (- \frac{C_{3} x^{2}}{\sum_{j = 1}^{n} α_{j}^{2} {∥ P_{0} (Y_{j}) ∥}_{2}^{2}}) \\ & ≲ \frac{\sum_{j = 1}^{n} α_{j}^{q} {∥ Y_{j} ∥}_{q}^{q}}{x^{q}} + exp (- C_{3} \frac{x^{2}}{\sum_{j = 1}^{n} α_{j}^{2} {∥ Y_{j} ∥}_{2}^{2}}), \end{matrix}

(32)

where the last inequality holds because

∥ P_{0} (Y_{j}) ∥_{q} \leq 2 {∥ Y_{j} ∥}_{q}

by Jensen’s inequality. Since

\sum_{j = i + 1}^{\infty} α_{j} P_{k} (Y_{j})

is a martingale difference sequence with respect to

σ (ε_{i + 1 - k}, ε_{i + 2 - k}, \dots)

, we have that

| \sum_{k = 1 + n}^{\infty} \sum_{j = i + 1}^{n} α_{j} P_{k} (Y_{j}) |

is a non-negative sub-martingale. Then by Doob’s inequality and Burkholder’s inequality, we have

\begin{matrix} P (max_{1 \leq i \leq n} | \sum_{k = n + 1}^{\infty} \sum_{j = 1}^{i} α_{j} P_{k} (Y_{j}) | \geq x) \\ \leq P (| \sum_{k = n + 1}^{\infty} \sum_{j = 1}^{n} α_{j} P_{k} (Y_{j}) | \geq \frac{x}{2}) + P (max_{1 \leq i \leq n} | \sum_{k = n + 1}^{\infty} \sum_{j = 1 + i}^{n} α_{j} P_{k} (Y_{j}) | \geq \frac{x}{2}) \\ ≲ \frac{{∥\sum_{k = 1 + n}^{\infty} \sum_{j = 1}^{n} α_{j} P_{k} (Y_{j})∥}_{q}^{q}}{x^{q}} \\ ≲ \frac{{(\sum_{j = 1}^{n} α_{j}^{2})}^{q / 2} Θ_{n, q}^{q}}{x^{q}} \leq \frac{Θ_{n, q}^{q} n^{q / 2 - 1} \sum_{j = 1}^{n} α_{j}^{q}}{x^{q}} . \end{matrix}

(33)

Now, we deal with the term

{max}_{1 \leq i \leq n} | \sum_{k = 1}^{n} \sum_{j = 1}^{i} α_{j} P_{k} (Y_{j}) |

. Define

a_{m} = min (2^{m}, n)

and

M_{n} = ⌈ log n / log 2 ⌉

. Then

\begin{matrix} max_{1 \leq i \leq n} | \sum_{k = 1}^{n} \sum_{j = 1}^{i} α_{j} P_{k} (Y_{j}) | \leq \sum_{m = 1}^{M_{n}} max_{1 \leq i \leq n} | \sum_{l = 1}^{⌈ i / a_{m} ⌉} \sum_{j = 1 + (l - 1) a_{m}}^{min (l a_{m}, i)} \sum_{k = 1 + a_{m - 1}}^{a_{m}} α_{j} P_{k} (Y_{j}) | . \end{matrix}

(34)

Let

A_{o d d} = {1 \leq l \leq ⌈ i / a_{m} ⌉, l is odd}

and

A_{e v e n} = {1 \leq l \leq ⌈ i / a_{m} ⌉, l is even}

. We have

\begin{matrix} P (max_{1 \leq i \leq n} | \sum_{l = 1}^{⌈ i / a_{m} ⌉} Z_{l, m, i} | \geq x) \leq P (max_{1 \leq i \leq n} | \sum_{A_{o d d}} Z_{l, m, i} | \geq x / 2) + P (max_{1 \leq i \leq n} | \sum_{A_{e v e n}} Z_{l, m, i} | \geq x / 2), \end{matrix}

where we have that

Z_{l, m, i} : = \sum_{j = 1 + (l - 1) a_{m}}^{min (l a_{m}, i)} α_{j} P_{a_{m - 1}}^{a_{m}} (Y_{j})

is independent of

Z_{l + 2, m, i}

for

1 \leq l \leq ⌈ i / a_{m} ⌉, 1 \leq m \leq M_{n}, 1 \leq i \leq n

, as

P_{a_{m - 1}}^{a_{m}} (Y_{j}) : = \sum_{k = 1 + a_{m - 1}}^{a_{m}} P_{k} (Y_{j})

is

a_{m}

-dependent. Therefore, we can apply Ottaviani’s inequality and Nagaev’s inequality for independent variables. As a consequence,

P (max_{1 \leq i \leq n} | \sum_{l = 1}^{⌈ i / a_{m} ⌉} Z_{l, m, i} | \geq x) ≲ \frac{\sum_{1 \leq l \leq ⌈ n / a_{m} ⌉} {∥ Z_{l, m, n} ∥}_{q}^{q}}{x^{q}} + exp (- \frac{C_{3} x^{2}}{\sum_{1 \leq l \leq ⌈ n / a_{m} ⌉} {∥ Z_{l, m, n} ∥}_{2}^{2}}) .

Again, by Burkholder’s inequality, we have that for

q \geq 2

,

\begin{matrix} ∥ Z_{l, m, n} ∥_{q} & \leq \sum_{k = 1 + a_{m - 1}}^{a_{m}} {∥ \sum_{j = 1 + (l - 1) a_{m}}^{min (l a_{m}, n)} α_{j} P_{k} (Y_{j}) ∥}_{q} \\ ≲ {(\sum_{j = 1 + (l - 1) a_{m}}^{min (l a_{m}, n)} α_{j}^{2})}^{1 / 2} (Θ_{a_{m - 1}} - Θ_{a_{m}}) . \end{matrix}

Note

\sum_{j = 1 + (l - 1) a_{m}}^{min (l a_{m}, n)} α_{j}^{2} \leq a_{m}^{(q - 2) / q} {(\sum_{j = 1 + (l - 1) a_{m}}^{min (l a_{m}, n)} α_{j}^{q})}^{2 / q}

. Let

τ_{m} = m^{- 2} / \sum_{m = 1}^{M_{n}} m^{- 2}

, and we have

τ_{m} ≍ m^{- 2}

as

1 \leq \sum_{m = 1}^{M_{n}} m^{- 2} \leq π^{2} / 6

. In respect to (34), we have that

\begin{matrix} P (max_{1 \leq i \leq n} | \sum_{k = 1}^{n} \sum_{j = 1}^{i} α_{j} P_{k} (Y_{j}) | \geq x) & \leq \sum_{m = 1}^{M_{n}} P (max_{1 \leq i \leq n} | \sum_{l = 1}^{⌈ i / a_{m} ⌉} Z_{l, m, i} | \geq τ_{m} x) \\ & ≲ \frac{\sum_{j = 1}^{n} α_{j}^{q}}{x^{q}} {∥ Y_{\cdot} ∥}_{q, A}^{q} \sum_{m = 1}^{M_{n}} τ_{m}^{- q} a_{m}^{(1 / 2 - A) q - 1} + \sum_{m = 1}^{M_{n}} exp (- \frac{C_{3} x^{2} τ_{m}^{2} a_{m}^{2 A}}{\sum_{j = 1}^{n} α_{j}^{2} {∥ Y_{\cdot} ∥}_{2, A}^{2}}) . \end{matrix}

(35)

Note

\sum_{m = 1}^{M_{n}} τ_{m}^{- q} a_{m}^{(1 / 2 - A) q - 1} ≍ n^{- 1} ϖ_{q, A} (n)

, and

\sum_{m = 1}^{M_{n}} exp (- \frac{C_{3} x^{2} τ_{m}^{2} a_{m}^{2 A}}{\sum_{j = 1}^{n} α_{j}^{2} {∥ Y_{\cdot} ∥}_{2, A}^{2}}) ≲ exp (- \frac{C_{3} x^{2}}{\sum_{j = 1}^{n} α_{j}^{2} {∥ Y_{\cdot} ∥}_{2, A}^{2}}) .

Combining (31), (32), (33) and (35), we obtain

\begin{matrix} P (max_{1 \leq i \leq n} | \sum_{j = 1}^{i} α_{j} (Y_{j} - E (Y_{j})) | > x) \\ \leq C_{1} \frac{ϖ_{q, A} (n) \sum_{j = 1}^{n} α_{j}^{q} {∥Y_{\cdot}∥}_{q, A}^{q}}{n x^{q}} + C_{2} exp (\frac{- C_{3} x^{2}}{\sum_{j = 1}^{n} α_{j}^{2} {∥Y_{\cdot}∥}_{2, A}^{2}}) . \end{matrix}

(36)

Now, we have (30) by taking

α_{j} = n^{- 1}

for

j = 1, \dots, n

. Note that since

K (\cdot)

has bounded support, for any given

t \in [b, 1 - b]

, we have

\begin{matrix} P (| \sum_{i = 1}^{n} w (t, t_{i}) (Y_{i} - E Y_{i}) | > x) \leq P (| \sum_{i = - B_{n}}^{B_{n}} w (t, t_{t n + i}) (Y_{t n + i} - E Y_{t n + i}) | > x) \\ \leq C_{1} \frac{ϖ_{q, A} (B_{n}) \sum_{i = - B_{n}}^{B_{n}} w {(t, t_{t n + i})}^{q} {∥Y_{\cdot}∥}_{q, A}^{q}}{B_{n} x^{q}} + C_{2} exp (\frac{- C_{3} x^{2}}{\sum_{i = - B_{n}}^{B_{n}} w {(t, t_{t n + i})}^{2} {∥Y_{\cdot}∥}_{2, A}^{2}}) . \end{matrix}

Therefore (28) follows from (36) by taking

α_{j} = w (t, t_{t n + j})

, and note that for any

t \in [b, 1 - b]

,

\sum_{i = - B_{n}}^{B_{n}} w {(t, t_{t n + i})}^{β} ≍ B_{n}^{1 - β}

for a constant

β \geq 2

. □

Lemma 2.

Suppose

{(X_{i j})}_{i \in Z, 1 \leq j \leq p}

satisfies Assumption 2. Furthermore, let Assumption 5 hold. Let

ϖ_{q, A} (n)

be defined as in Lemma 1. Then there exist constants

C_{1}, C_{2}

, and

C_{3}

independent of n and p, such that for all

x > 0

, we have

\begin{matrix} sup_{t \in (0, 1)} P (| \sum_{i = 1}^{n} w (t, t_{i}) (X_{i} X_{i}^{⊤} - E (X_{i} X_{i}^{⊤})) |_{\infty} \geq x) \\ \leq C_{1} ν_{2 q}^{q} \frac{p ϖ_{q, A} (B_{n}) M_{X, q}^{q}}{B_{n}^{q} x^{q}} + C_{2} p^{2} exp (- C_{3} \frac{B_{n} x^{2}}{ν_{4}^{2} N_{X}^{2}}), \end{matrix}

(37)

and

\begin{matrix} P (sup_{t \in (0, 1)} | \sum_{i = 1}^{n} w (t, t_{i}) (X_{i} X_{i}^{⊤} - E (X_{i} X_{i}^{⊤})) |_{\infty} \geq x) \\ \leq C_{1} ν_{2 q}^{q} \frac{p ϖ_{q, A} (n) M_{X, q}^{q}}{B_{n}^{q} x^{q}} + C_{2} p^{2} exp (- C_{3} \frac{B_{n}^{2} x^{2}}{n ν_{4}^{2} N_{X}^{2}}) . \end{matrix}

(38)

Proof.

For

1 \leq j, k \leq p

, let

Y_{i, j k} = X_{i j} X_{i k}

. We now check the conditions in Lemma 1 for

{(Y_{i, j k})}_{1 \leq i \leq n}

. Denote

Y_{i, j k, {m}} = X_{i j, {m}} X_{i k, {m}}

. Then the uniform functional dependence measure of

{(Y_{i, j k})}_{i}

is

\begin{matrix} θ_{m, q, j k}^{Y} & = & sup_{i} {∥ Y_{i, j k} - Y_{i, j k, {m}} ∥}_{q} \\ = & sup_{i} {∥ X_{i j} X_{i k} - X_{i j, {m}} X_{i k, {m}} ∥}_{q} \\ \leq & sup_{i} ∥ X_{i j} (X_{i k} - X_{i k, {m}}) ∥_{q} + sup_{i} {∥ X_{i k, {m}} (X_{i j} - X_{i j, {m}}) ∥}_{q} . \end{matrix}

Thus the DAN of the process

Y_{\cdot, j k}

satisfies that

\begin{matrix} ∥ Y_{\cdot, j k} ∥_{q, A} \leq sup_{i} ∥ X_{i j} ∥_{2 q} ∥ X_{\cdot, k} ∥_{2 q, A} + sup_{i} ∥ X_{i k} ∥_{2 q} ∥ X_{\cdot, j} ∥_{2 q, A} \leq ν_{2 q} (∥ X_{\cdot, k} ∥_{2 q, A} + ∥ X_{\cdot, j} ∥_{2 q, A}) . \end{matrix}

The result follows immediately from Lemma 1 and the Bonferroni inequality. □

Lemma 3.

We adopt the notation in Lemma 2. Suppose Assumptions 1, 2, and 5 hold with

ι = 0

. Recall

B_{n} = n b

, where

b \to 0

and

B_{n} / \sqrt{n} \to \infty

as

n \to \infty

. Then there exists a constant C independent of n and p such that

\hat{Σ} (t)

in (11) satisfies that for any

t \in [c, 1 - c]

,

\begin{matrix} | \hat{Σ} {(t) - Σ (t) |}_{\infty} & = O_{P} (b^{2} + M_{X, q} ν_{2 q} B_{n}^{- 1} {(p ϖ_{q, A} (B_{n}))}^{1 / q} + ν_{4} N_{X} {(log p / B_{n})}^{1 / 2}) . \end{matrix}

(39)

Furthermore,

\begin{matrix} sup_{t \in [c, 1 - c]} {| \hat{Σ} (t) - Σ (t) |}_{\infty} = O_{P} (b^{2} + M_{X, q} ν_{2 q} B_{n}^{- 1} {(p ϖ_{q, A} (n))}^{1 / q} + ν_{4} N_{X} B_{n}^{- 1} {[n log p]}^{1 / 2}) . \end{matrix}

(40)

Proof.

First, we have

\begin{matrix} E {\hat{σ}}_{j k} (t) - σ_{j k} (t) = \sum_{i = 1}^{n} w (t, t_{i}) [σ_{j k} (t_{i}) - σ_{j k} (t)] . \end{matrix}

Approximating the discrete summation with integral, we obtain for all

1 \leq j, k \leq p

,

\begin{matrix} sup_{t \in [b, 1 - b]} |E {\hat{σ}}_{j k} (t) - σ_{j k} (t) - \int_{- 1}^{1} K (u) [σ_{j k} (u b + t) - σ_{j k} (t)] d u| = O (B_{n}^{- 1}) . \end{matrix}

By Assumption 1, we have

\begin{matrix} σ_{j k} (u b + t) - σ_{j k} (t) & = u b σ_{j k}^{'} (t) + \frac{1}{2} u^{2} b^{2} σ_{j k}^{″} (t) + o (b^{2} u^{2}) . \end{matrix}

Thus we have

{sup}_{t \in [c, 1 - c]} {| E \hat{σ} (t) - σ (t) |}_{\infty} = O (B_{n}^{- 1} + b^{2})

, in view of Assumption 5. By Lemma 2, we have

\begin{matrix} sup_{t \in (0, 1)} P ({|\hat{Σ} (t) - E \hat{Σ} (t)|}_{\infty} \geq x) & \leq C_{1} ν_{2 q}^{q} \frac{p ϖ_{q, A} (B_{n}) M_{X, q}^{q}}{B_{n}^{q} x^{q}} + C_{2} p^{2} exp (- C_{3} \frac{B_{n} x^{2}}{ν_{4}^{2} N_{X}^{2}}) . \end{matrix}

Denote

u = C_{4} (M_{X, q} ν_{2 q} B_{n}^{- 1} {(p ϖ_{q, A} (B_{n}))}^{1 / q} + ν_{4} N_{X} {(log p / B_{n})}^{1 / 2})

for a large enough constant

C_{4}

, then for any

t \in (0, 1)

,

\begin{matrix} {|\hat{Σ} (t) - E \hat{Σ} (t)|}_{\infty} = O_{P} (u) . \end{matrix}

Thus (39) is proved. The result (40) can be obtained similarly. □

7.2. Proof of Main Results

Proof of Proposition 1.

Given (39) and (40), the proof of (16) is standard. (See, e.g., Theorem 6 of [11]). For

λ^{\circ}

and

λ^{⋄}

given in Proposition 1, by Lemma 3, we have that, respectively,

\begin{matrix} λ^{\circ} & \geq sup_{t} E (κ_{p} {| \hat{Σ} (t) - Σ (t) |}_{\infty}), \end{matrix}

(41)

\begin{matrix} λ^{⋄} & \geq E (κ_{p} sup_{t} {| \hat{Σ} (t) - Σ (t) |}_{\infty}) . \end{matrix}

(42)

Then note that for any

t \in [0, 1]

, for any

λ > 0

,

\begin{matrix} | {\hat{Ω}}_{λ} {(t) - Ω (t) |}_{\infty} \leq {| Ω (t) |}_{L_{1}} {| Σ (t) {\hat{Ω}}_{λ} (t) - {Id}_{p} |}_{\infty} \\ \leq {| Ω (t) |}_{L_{1}} [| \hat{Σ} (t) {\hat{Ω}}_{λ} (t) - {Id}_{p} |_{\infty} + | (Σ (t) - \hat{Σ} (t)) {Ω (t) |}_{\infty} + | {\hat{Ω}}_{λ} {(t) - Ω (t) |}_{L^{1}} {| \hat{Σ} (t) - Σ (t) |}_{\infty}] \end{matrix}

where by construction, we have

| \hat{Σ} (t) {\hat{Ω}}_{λ} (t) - {Id}_{p} |_{\infty} \leq λ

and

| {\hat{Ω}}_{λ} {(t) - Ω (t) |}_{L^{1}} \leq 2 κ_{p}

. Consequently,

\begin{matrix} | {\hat{Ω}}_{λ} {(t) - Ω (t) |}_{\infty} \leq κ_{p} (λ + 3 κ_{p} {| \hat{Σ} (t) - Σ (t) |}_{\infty}) . \end{matrix}

(43)

Then (16) and (17) follow from (41) to (43). □

Proof of Proposition 2.

Proposition 2 is an immediate result of (17). □

Proof of Theorem 1.

Denote

r_{j}, 1 \leq j \leq ι

as the jump time points, ordered decreasingly by the infinity norm of the corresponding jumps in the covariance matrix, i.e.,

| Δ (r_{1}) |_{\infty} \geq | Δ (r_{2}) |_{\infty} \geq \dots \geq | Δ (r_{ι}) |_{\infty} \geq {| Δ (s) |}_{\infty}

for

s \in (0, 1) \cap {r_{1}, \dots, r_{ι}}^{c}

. (Temporal order is applied if there is a tie.) Let

T_{h} (j) = [r_{j} - h, r_{j} + h)

. For

h = o (1)

, as a result of Assumption 3,

T_{h} (j) \cap T_{h} (i) = \emptyset

if

i \neq j

for n sufficiently large. That is to say, each time point

s \in (0, 1)

is in the neighborhood of, at most, one change point.

For any

s \in [t^{(j)}, t^{(j + 1)})

,

j = 0, 1, \dots, ι

, denote

\bar{D} (s) = E [D (s)]

and

D^{⋄} (s) = \{\begin{matrix} (h - s + t^{(j)}) Δ (t^{(j)}), & t^{(j)} \leq s < t^{(j)} + h \\ 0, & t^{(j)} + h \leq s < t^{(j + 1)} - h \\ (h + s - t^{(j + 1)}) Δ (t^{(j + 1)}), & t^{(j + 1)} - h \leq s \leq t^{(j + 1)} . \end{matrix}

(44)

Then, for

s \in \cup_{1 \leq j \leq ι} [t^{(j)} + h, t^{(j + 1)} - h)

, by (3), we have

{| Σ (s + t) - Σ (s) |}_{\infty} \leq L | t |, \forall | t | \leq h,

and we can easily verify that

sup_{s \in [0, 1]} {| \bar{D} (s) - D^{⋄} (s) |}_{\infty} \leq L h^{2} .

(45)

Note that

| D^{⋄} {(s) |}_{\infty}

is maximized at

s = r_{1}

and

| D^{⋄} (r_{1}) |_{\infty} = h {| Δ (r_{1}) |}_{\infty}

. By the triangle inequality, we have that, for any

s \in [0, 1]

,

\begin{matrix} | \bar{D} (r_{1}) |_{\infty} - {| \bar{D} (s) |}_{\infty} & \geq & h c_{2} - | \bar{D} (r_{1}) - D^{⋄} (r_{1}) |_{\infty} - | D^{⋄} (s) |_{\infty} - {| \bar{D} (s) - D^{⋄} (s) |}_{\infty} \\ & \geq & h c_{2} - {| D^{⋄} (s) |}_{\infty} - 2 L h^{2} \\ & \geq & c_{2} (| s - r_{1} | \land h) - 2 L h^{2} . \end{matrix}

(46)

On the other hand, since

| D (r_{1}) |_{\infty} \leq {| D ({\hat{s}}_{1}) |}_{\infty}

, we have

\begin{matrix} | \bar{D} (r_{1}) |_{\infty} - {| \bar{D} ({\hat{s}}_{1}) |}_{\infty} & \leq | D (r_{1}) |_{\infty} - | D ({\hat{s}}_{1}) |_{\infty} + | D (r_{1}) - \bar{D} (r_{1}) |_{\infty} + {| D ({\hat{s}}_{1}) - \bar{D} ({\hat{s}}_{1}) |}_{\infty} \\ & \leq | D (r_{1}) - \bar{D} (r_{1}) |_{\infty} + {| D ({\hat{s}}_{1}) - \bar{D} ({\hat{s}}_{1}) |}_{\infty} . \end{matrix}

(47)

Denote the event

A : = {{sup}_{s \in [h, 1 - h]} | D (s) - \bar{D} (s) |_{\infty} \leq h_{⋄}^{2}}

and let

Y_{i} = {(Y_{i, j k})}_{1 \leq j, k \leq p}

,

Y_{i, j k} = X_{i j} X_{i k} - σ_{i, j k}

. Note that

| D_{j k} (s) - \bar{D}_{j k} (s) | = \frac{1}{n} |\sum_{i = 1}^{h n} Y_{n_{s} + 1 - i, j k} - \sum_{i = 1}^{h n} Y_{n_{s} + i, j k}| .

(48)

By Lemma 2, we have for any

x > 0

,

\begin{matrix} P (sup_{s \in [h, 1 - h]} {| D (s) - \bar{D} (s) |}_{\infty} \geq x) \leq C_{1} \frac{p ϖ_{q, A} (n) M_{X, q}^{q} ν_{2 q}^{q}}{n^{q} x^{q}} + C_{2} p^{2} exp (- C_{3} \frac{n x^{2}}{N_{X}^{2}}) . \end{matrix}

(49)

It follows that

| \bar{D} (r_{1}) |_{\infty} - {| \bar{D} ({\hat{s}}_{1}) |}_{\infty} = O_{P} (h^{- 1} J_{q, A} (n, p) + N_{X} h^{- 1} {(n^{- 1} log (p))}^{1 / 2}) .

Taking

h = h_{⋄}

, we have

| {\hat{s}}_{1} - r_{1} | = O_{P} (h_{⋄}^{2}) .

Furthermore, we have

P (A) \geq 1 - C_{1} {(\frac{p ϖ_{q, A} (n) M_{X, q}^{q} ν_{2 q}^{q}}{n^{q} c_{2}^{q}})}^{1 / 3} - C_{2} p^{2} exp (- C_{3} {(\frac{n {log}^{2} (p)}{N_{X}^{2}})}^{1 / 3}) .

Let

A_{k} : = {{max}_{1 \leq j \leq k} | {\hat{s}}_{j} - r_{j} | \leq c_{2}^{- 1} 2 (L + 1) h_{⋄}^{2}}

for some

1 \leq k \leq ι

. Assume

A \subset A_{k}

. Under

A_{k}

we have that

[r_{j} - h_{⋄}, r_{j} + h_{⋄}) \subset {\hat{T}}_{2 h_{⋄}} (j) = : [{\hat{s}}_{j} - 2 h_{⋄}, {\hat{s}}_{j} + 2 h_{⋄})

for

1 \leq j \leq k

and

r_{k + 1} \notin \cup_{1 \leq j \leq k} {\hat{T}}_{2 h_{⋄}} (j)

as a consequence of Assumption 3. According to (46) and (47), if

A

holds, then

| {\hat{s}}_{k + 1} - r_{k + 1} | \leq c_{2}^{- 1} 2 (L + 1) h_{⋄}^{2}

, which implies

A \subset A_{k + 1}

. The result (21) follows by induction.

Suppose

A

holds. By the choice of

ν

, as a consequence of (45) and (49), and that

ν ≪ h_{⋄}

, we have that

sup_{s \in [0, 1]} {| D (s) - D^{⋄} (s) |}_{\infty} \leq ν .

As a result,

min_{1 \leq j \leq ι} {| D (r_{j}) |}_{\infty} \geq c_{2} h_{⋄} - ν \geq ν,

i.e.,

\hat{ι} \geq ι

. On the other hand, since

\cup_{1 \leq j \leq ι} {\hat{T}}_{2 h_{⋄}} (j)

is excluded from the searching region for

s_{ι + 1}

, we have

sup_{s \in {(\cup_{1 \leq j \leq ι} {\hat{T}}_{2 h_{⋄}} (j))}^{c}} {| D (s) |}_{\infty} \leq ν .

In other words,

A \subset {\hat{ι} = ι}

. Thus (20) is proved. □

Proof of Theorem 2.

We adopt the notations in the proof of Theorem 1 and assume that

E

holds. Similar to Lemma 3, we have that by Lemma 2, for any

t \in (0, 1)

,

\begin{matrix} {|\hat{Σ} (t) - E \hat{Σ} (t)|}_{\infty} = O_{P} (u), \end{matrix}

where

u = C_{4} (M_{X, q} ν_{2 q} B_{n}^{- 1} {(p ϖ_{q, A} (B_{n}))}^{1 / q} + ν_{4} N_{X} {(log p / B_{n})}^{1 / 2})

for a large enough constant

C_{4}

.

Under

E

,

T_{b} (j) \subset {\hat{T}}_{b + h_{⋄}^{2}} (j)

. For

t \in {(\cup_{1 \leq j \leq ι} {\hat{T}}_{b + h_{⋄}^{2}} (j))}^{c} \cap [b, 1 - b]

, we have that for all

1 \leq j, k \leq p

,

\begin{matrix} |E {\hat{σ}}_{j k} (t) - σ_{j k} (t)| & = \int_{- 1}^{1} K (u) [σ_{j k} (u b + t) - σ_{j k} (t)] d u + O (B_{n}^{- 1}) \\ & = b σ_{j k}^{\prime} (t) \int_{- 1}^{1} u K (u) d u + (\frac{1}{2} b^{2} σ_{j k}^{\prime\prime} (t) + o (b^{2})) \int_{- 1}^{1} u^{2} K (u) d u + O (B_{n}^{- 1}) \\ & = O (b^{2} + B_{n}^{- 1}) . \end{matrix}

On the other hand, for

t \in \cup_{1 \leq j \leq ι} ({\hat{T}}_{b + h_{⋄}^{2}} (j) \cap T_{h_{⋄}^{2}}^{c} (j)) \cup [0, b] \cup [1 - b, 1]

, due to reflection, differentiability is no longer available. By Lipschitz continuity, we get

\begin{matrix} |E {\hat{σ}}_{j k} (t) - σ_{j k} (t)| = \int_{- 1}^{1} K (u) [σ_{j k} (u b + t) - σ_{j k} (t)] d u + O (B_{n}^{- 1}) = O (b + B_{n}^{- 1}) . \end{matrix}

The result (22) follows from the choice of b. The rest of the proof is similar to that of Proposition 1 and Theorem 2. □
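The two bias orders used in this proof can be checked numerically. The sketch below is illustrative only and rests on assumptions that are not taken from the paper: it uses the Epanechnikov kernel K(u) = 0.75(1 − u²) on [−1, 1], a hypothetical smooth entry σ(t) = sin(2πt), and a hypothetical Lipschitz entry σ(t) = |t − 0.5| with a kink at t = 0.5. The O(B_n^{-1}) discretization term is ignored. For the smooth entry the first-order term vanishes because the kernel is symmetric, so halving b should roughly quarter the bias; at the kink the bias is only linear in b.

```python
import numpy as np


def epanechnikov(u):
    """Symmetric kernel supported on [-1, 1] (an assumed choice of K)."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)


def smoothing_bias(sigma, t, b, num=20001):
    """Midpoint-rule approximation of int_{-1}^{1} K(u) [sigma(t + u b) - sigma(t)] du."""
    du = 2.0 / num
    u = -1.0 + (np.arange(num) + 0.5) * du           # midpoints of the cells
    return float(np.sum(epanechnikov(u) * (sigma(t + u * b) - sigma(t))) * du)


def smooth(t):
    return np.sin(2 * np.pi * t)                     # twice differentiable entry


def kinked(t):
    return np.abs(t - 0.5)                           # Lipschitz, not differentiable at 0.5


for b in (0.1, 0.05, 0.025):
    print(b, smoothing_bias(smooth, 0.3, b), smoothing_bias(kinked, 0.5, b))

ratio_smooth = smoothing_bias(smooth, 0.3, 0.1) / smoothing_bias(smooth, 0.3, 0.05)
ratio_kinked = smoothing_bias(kinked, 0.5, 0.1) / smoothing_bias(kinked, 0.5, 0.05)
print(ratio_smooth, ratio_kinked)
```

A ratio near 4 for the smooth entry and near 2 for the kinked entry corresponds to the O(b²) and O(b) rates in the two displays above.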

Author Contributions

Methodology, M.X., X.C. and W.B.W.; writing (original draft preparation), M.X., X.C. and W.B.W.; writing (review and editing), M.X., X.C. and W.B.W.; software, M.X. All authors have read and agreed to the published version of the manuscript.

Funding

X.C.’s research is supported in part by NSF CAREER Award DMS-1752614 and UIUC Research Board Award RB18099. W.B.W.’s research is supported in part by NSF DMS-1405410.

Acknowledgments

X.C. acknowledges that part of this work was carried out at the MIT Institute for Data, Systems, and Society (IDSS).

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Support of the true covariance matrices,

p = 50

.

Figure 2. Support of the true covariance matrices,

p = 100

.

Figure 3. ROC curve of the time-varying CLIME,

p = 50

.

Figure 4. ROC curve of the time-varying CLIME,

p = 100

.

Figure 5. Break size

| D_{s} |_{\infty}

. From 4 February 2004, to 30 November 2009.

Figure 6. Estimated networks at time points 813, 828, 888, and 903, corresponding to 23 March 2006, 13 April 2006, 11 July 2006, and 1 August 2006. Colors correspond to the nine sections in the S&P dataset.

Table 1. Average distance.

	Bandwidth	0.14	0.16	0.18	0.2	0.22	0.24
$p = 50$	$δ_{0} = 1$	23.4	21.0	17.47	16.6	14.7	16.5
$p = 50$	$δ_{0} = 2$	7.4	6.9	8.3	8.1	7.2	6.3
$p = 100$	$δ_{0} = 1$	37.2	30.1	26.4	25.5	21.2	21.3
$p = 100$	$δ_{0} = 2$	7.8	8.2	9.9	6.9	8.9	7.6

Table 2. Number of estimated change points.

	Bandwidth	0.14	0.16	0.18	0.2	0.22	0.24
$p = 50$	$δ_{0} = 1$	2.38	2.16	1.99	2.00	2.00	2.00
$p = 50$	$δ_{0} = 2$	2.46	2.31	2.00	2.00	2.00	2.00
$p = 100$	$δ_{0} = 1$	2.25	2.09	1.99	1.99	2.00	2.00
$p = 100$	$δ_{0} = 2$	2.38	2.19	2.00	2.00	2.00	2.00

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
