1. Introduction
Graphical models are instrumental in the analysis of multivariate data. Originally, these models were employed for independently sampled data, but their use has been extended to multivariate, stationary time series [1,2], which triggered their popularity in statistics, machine learning, signal processing and neuroinformatics.
For a better understanding of the significance of graphical models, let $\mathbf{x}$ be a random vector having a Gaussian distribution with zero mean and positive definite covariance matrix $\mathsf{\Gamma}$. A graph $G=(V,E)$ can be assigned to $\mathbf{x}$ in order to visualize the conditional independence relations between its components. The symbol $V$ denotes the vertices of $G$, while $E$ is the set of its edges. There are no loops from a vertex to itself, nor multiple edges between two vertices. Hence, $E$ is a subset of the set of all unordered pairs of distinct vertices. Each vertex of the graph is assigned to an entry of $\mathbf{x}$. We conventionally draw an edge between two vertices $a$ and $b$ if the random variables $\mathbf{x}_{a}$ and $\mathbf{x}_{b}$ are not conditionally independent, given all other components of $\mathbf{x}$. The description above follows the main definitions from [3] and assumes that the graph $G$ is undirected. Proposition 1 from the same reference provides a set of equivalent conditions for conditional independence. The most interesting one states that $\mathbf{x}_{a}$ and $\mathbf{x}_{b}$ are conditionally independent if and only if the entry $(a,b)$ of $\mathsf{\Gamma}^{-1}$ is zero. This shows that the missing edges of $G$ correspond to zero entries in the inverse of the covariance matrix, which is called the concentration matrix.
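This correspondence is easy to verify numerically. The sketch below uses a hypothetical 4-variable Gaussian (the matrix values are invented for illustration): a zero in the concentration matrix survives the round trip through the covariance, while the covariance itself is dense.

```python
import numpy as np

# Hypothetical concentration matrix K with K[0, 2] = 0, i.e., variables
# 0 and 2 are conditionally independent given the others, so the graph
# has no edge between them.
K = np.array([[2.0, 0.6, 0.0, 0.3],
              [0.6, 2.0, 0.5, 0.0],
              [0.0, 0.5, 2.0, 0.4],
              [0.3, 0.0, 0.4, 2.0]])
assert np.all(np.linalg.eigvalsh(K) > 0)   # positive definite

Gamma = np.linalg.inv(K)                   # covariance matrix of x
# Marginally, variables 0 and 2 are still correlated ...
assert abs(Gamma[0, 2]) > 1e-6
# ... but inverting the covariance recovers the missing edge:
assert abs(np.linalg.inv(Gamma)[0, 2]) < 1e-9
```

The zero lives in the concentration matrix, not in the covariance, which is why graph estimation works on $\mathsf{\Gamma}^{-1}$.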
There is an impressive amount of literature on graphical models. In this work, we focus on a generalization of this problem to time series. The main difference between the static case and the dynamic case is that the former relies on the sparsity pattern of the concentration matrix, whereas the latter looks for zeros in the inverse of the spectral density matrix. One of the main difficulties stems from the fact that the methods developed in the static case cannot be applied straightforwardly to time series.
Parametric as well as nonparametric methods have already been proposed in the literature dedicated to graphical models for time series. Some of the recently introduced estimation methods are based on convex optimization. We briefly discuss below the most important algorithms which belong to this class.
Reference [4] extends the static case by allowing the presence of latent variables. The key point of their approach is to express the manifest concentration matrix as the sum of a sparse matrix and a low-rank matrix. Additionally, they provide conditions for the decomposition to be unique, in order to guarantee identifiability. The two matrices are estimated by minimizing a penalized likelihood function, where the penalty involves both the $\ell_{1}$-norm and the nuclear norm. Interestingly enough, the authors of the discussion paper [5] pointed out that an alternative solution, which relies on the Expectation-Maximization algorithm, can be easily obtained.
In the dynamic case, reference [6] has an important contribution, which consists in showing that graphical models for multivariate autoregressive processes can be estimated by solving a convex optimization problem which follows from the application of the Maximum Entropy principle. This paved the way for the development of efficient algorithms dedicated to topology selection in graphical models of autoregressive processes [7,8] and autoregressive moving average processes [9].
A happy marriage between the approach from [4] and the use of Maximum Entropy led to the solution proposed in [10] for the identification of graphical models of autoregressive processes with latent variables. Similar to [4], the estimation is done by minimizing a cost function whose penalty term is given by a linear combination of the $\ell_{1}$-norm and the nuclear norm. The two coefficients of this linear combination are chosen by the user and have a strong influence on the estimated model. The method introduced in [10] performs the estimation for various pairs of coefficients, which yields a set of candidate models; the winner is decided by using a score function.
To the best of our knowledge, there is no other work that extends the estimation method from [5] to the case of latent-variable autoregressive models. The main contribution of this paper is to propose an algorithm of this type, which combines the strengths of Expectation-Maximization and convex optimization. The key point for achieving this goal is to apply the Maximum Entropy principle.
The rest of the paper is organized as follows. In the next section, we introduce the notation and present the method from [10]. Section 3 outlines the newly proposed algorithm. The outcome of the algorithm is a set of models from which we choose the best one by employing information theoretic (IT) criteria. Section 4 is focused on the description of these criteria: we discuss the selection rules from the previous literature and propose a novel criterion. The experimental results are reported in Section 5. Section 6 concludes the paper.
2. Preliminaries and Previous Work
Let $\mathbf{x}_{1},\dots,\mathbf{x}_{T}$ be a $\kappa$-dimensional ($\kappa>1$) time series generated by a stationary and stable $\mathrm{VAR}$ process of order $p$. We assume that the spacing of observation times is constant and $\mathbf{x}_{t}=\left(\mathbf{x}_{1t},\dots,\mathbf{x}_{\kappa t}\right)^{\top}$. The symbol $(\cdot)^{\top}$ denotes transposition. The difference equation of the process is
$$\mathbf{x}_{t}+\sum_{i=1}^{p}\mathbf{A}_{i}\mathbf{x}_{t-i}=\boldsymbol{\epsilon}_{t},$$
where $\mathbf{A}_{1},\dots,\mathbf{A}_{p}$ are matrix coefficients of size $\kappa\times\kappa$ and $\{\boldsymbol{\epsilon}_{t}\}$ is a sequence of independently and identically distributed random $\kappa$-vectors. We assume that the vectors $\{\boldsymbol{\epsilon}_{t}\}$ are drawn from a $\kappa$-variate Gaussian distribution with zero mean vector and covariance matrix $\mathsf{\Sigma}\succ 0$. Additionally, the means of the vectors $\{\mathbf{x}_{t}\}$ are assumed to be constant.
The conditional independence relations between the variables in $\mathbf{x}_{t}$ are provided by the inverse of the spectral density matrix (ISDM) of the $\mathrm{VAR}$ process $\{\mathbf{x}_{t}\}$. The ISDM has the expression
$$\mathsf{\Phi}^{-1}(\omega)=\mathbf{A}^{H}(\omega)\,\mathsf{\Sigma}^{-1}\mathbf{A}(\omega)=\sum_{i=-p}^{p}\mathbf{Q}_{i}\,e^{-j\omega i},$$
where $j=\sqrt{-1}$ and $(\cdot)^{H}$ is the operator for conjugate transpose. We define $\mathbf{A}_{0}=\mathbf{I}$, where $\mathbf{I}$ stands for the identity matrix of appropriate size, and $\mathbf{A}(\omega)=\sum_{i=0}^{p}\mathbf{A}_{i}e^{-j\omega i}$. For $i\ge 0$, we have that $\mathbf{Q}_{i}=\sum_{k=0}^{p-i}\mathbf{A}_{k}^{\top}\mathsf{\Sigma}^{-1}\mathbf{A}_{k+i}$ and $\mathbf{Q}_{-i}=\mathbf{Q}_{i}^{\top}$. The sparsity pattern of the ISDM encodes the conditional dependence relations between the variables of $\mathbf{x}_{t}$, i.e., two variables $\mathbf{x}_{a}$ and $\mathbf{x}_{b}$ are independent, conditional on the other variables, if and only if [1,11]
$$\mathsf{\Phi}^{-1}(\omega)(a,b)=0,\qquad \forall\,\omega\in(-\pi,\pi].$$
In the graph corresponding to the ISDM ${\mathsf{\Phi}}^{1}\left(\omega \right)$, the nodes stand for the variables of the model, and the edges stand for conditional dependence, i.e., there is no edge between conditionally independent variables.
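This characterization can be checked numerically on a small example. The sketch below uses a hypothetical VAR(1) model with $\mathsf{\Sigma}=\mathbf{I}$ and an invented coefficient matrix chosen so that one ISDM entry vanishes at every frequency:

```python
import numpy as np

# Hypothetical VAR(1): x_t + A1 x_{t-1} = eps_t, Sigma = I, kappa = 3.
A0 = np.eye(3)
A1 = np.array([[0.3, 0.2, 0.0],
               [0.0, 0.3, 0.2],
               [0.0, 0.0, 0.3]])

# Coefficients of the ISDM polynomial: Q_i = sum_k A_k^T Sigma^{-1} A_{k+i}.
Q0 = A0.T @ A0 + A1.T @ A1
Q1 = A0.T @ A1            # and Q_{-1} = Q1^T

# Evaluate Phi^{-1}(w) = sum_{i=-1}^{1} Q_i e^{-j w i} on a frequency grid.
for w in np.linspace(-np.pi, np.pi, 50):
    Phi_inv = Q0 + Q1 * np.exp(-1j * w) + Q1.T * np.exp(1j * w)
    # Entry (0, 2) vanishes at every frequency: the first and third
    # variables are conditionally independent given the second one.
    assert abs(Phi_inv[0, 2]) < 1e-12
    # Entry (0, 1) does not vanish: that edge is present in the graph.
    assert abs(Phi_inv[0, 1]) > 1e-3
```

A missing edge thus corresponds to an entire matrix polynomial entry being zero, not merely a zero at one frequency.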
In a latent-variable graphical model, it is assumed that $\kappa=\mathrm{K}+\mathrm{r}$, where $\mathrm{K}$ variables are accessible to observation (they are called manifest variables) and $\mathrm{r}$ variables are latent, i.e., not accessible to observation, but playing a significant role in the conditional independence pattern of the overall model. The existence of latent variables in a model can be described in terms of the ISDM by the block decomposition
$$\mathsf{\Phi}^{-1}(\omega)=\begin{bmatrix}\mathbf{\Upsilon}_{m}(\omega) & \mathbf{\Upsilon}_{m\ell}(\omega)\\ \mathbf{\Upsilon}_{\ell m}(\omega) & \mathbf{\Upsilon}_{\ell}(\omega)\end{bmatrix},$$
where the partition corresponds to that of the spectral density matrix into $\mathsf{\Phi}_{m}(\omega)$ and $\mathsf{\Phi}_{\ell}(\omega)$, the manifest and latent components, respectively. Using the Schur complement, the ISDM of the manifest component has the form [10] (Equation (21)):
$$\mathsf{\Phi}_{m}^{-1}(\omega)=\mathbf{\Upsilon}_{m}(\omega)-\mathbf{\Upsilon}_{m\ell}(\omega)\,\mathbf{\Upsilon}_{\ell}^{-1}(\omega)\,\mathbf{\Upsilon}_{\ell m}(\omega).$$
When building latent-variable graphical models, we assume that $\mathrm{r}\ll\mathrm{K}$, i.e., few latent variables are sufficient to characterize the conditional dependence structure of the model. The previous formula can therefore be written as a decomposition (6) in which $\mathbf{S}(\omega)$ is sparse and $\mathsf{\Lambda}(\omega)$ has (constant) low rank almost everywhere in $(-\pi,\pi]$. Furthermore, we can write [12] (Equation (4)):
$$\mathsf{\Phi}_{m}^{-1}(\omega)=\mathbf{\Delta}(\omega)\left(\mathbf{X}+\mathbf{L}\right)\mathbf{\Delta}(\omega)^{H},$$
where
$$\mathbf{\Delta}(\omega)=\left[\mathbf{I},\ e^{j\omega}\mathbf{I},\ \dots,\ e^{j\omega p}\mathbf{I}\right]$$
is a shift matrix, and $\mathbf{X}$ and $\mathbf{L}$ are $\mathrm{K}(p+1)\times\mathrm{K}(p+1)$ positive semidefinite matrices. We split all such matrices in $\mathrm{K}\times\mathrm{K}$ blocks, e.g., $\mathbf{X}=\left[\mathbf{X}_{k,l}\right]_{k,l=\overline{0,p}}$. The block trace operator for such a matrix is $\mathcal{D}(\cdot)$, defined by
$$\mathcal{D}_{i}(\mathbf{X})=\sum_{k=0}^{p-i}\mathbf{X}_{k,k+i},\qquad i=\overline{0,p}.$$
For negative indices, the relation $\mathcal{D}_{-i}(\mathbf{L})=\mathcal{D}_{i}(\mathbf{L})^{\top}$ holds. Note that (8) can be rewritten as
$$\mathsf{\Phi}_{m}^{-1}(\omega)=\sum_{i=-p}^{p}\mathcal{D}_{i}(\mathbf{X}+\mathbf{L})\,e^{-j\omega i}.$$
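The block trace operator is straightforward to implement. The sketch below (hypothetical sizes $\mathrm{K}=2$, $p=2$, random symmetric data) also verifies numerically that $\mathbf{\Delta}(\omega)\mathbf{X}\mathbf{\Delta}(\omega)^{H}$ has the $\mathcal{D}_{i}(\mathbf{X})$ as its polynomial coefficients:

```python
import numpy as np

def block_trace(X, K, p, i):
    """D_i(X): sum of the K x K blocks on the i-th block diagonal of X,
    i.e., sum over k = 0..p-i of the block X[k, k+i] (blocks from 0)."""
    return sum(X[k*K:(k+1)*K, (k+i)*K:(k+i+1)*K] for k in range(p - i + 1))

rng = np.random.default_rng(0)
K, p = 2, 2
M = rng.standard_normal((K*(p+1), K*(p+1)))
X = M @ M.T                        # symmetric positive semidefinite

# Check Delta(w) X Delta(w)^H == sum_i D_i(X) e^{-j w i} at one frequency.
w = 0.7
Delta = np.hstack([np.exp(1j*w*k) * np.eye(K) for k in range(p + 1)])
lhs = Delta @ X @ Delta.conj().T
rhs = block_trace(X, K, p, 0).astype(complex)
for i in range(1, p + 1):
    Di = block_trace(X, K, p, i)
    rhs += Di * np.exp(-1j*w*i) + Di.T * np.exp(1j*w*i)   # D_{-i} = D_i^T
assert np.allclose(lhs, rhs)
```

The transposed blocks supply the negative-index coefficients, exactly as in the relation $\mathcal{D}_{-i}(\mathbf{L})=\mathcal{D}_{i}(\mathbf{L})^{\top}$.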
The first $p+1$ sample covariances of the $\mathrm{VAR}$ process are [13]:
$$\widehat{\mathbf{C}}_{i}=\frac{1}{T}\sum_{t=1}^{T-i}\mathbf{x}_{t+i}\,\mathbf{x}_{t}^{\top},\qquad i=\overline{0,p}.$$
However, only the upper left $\mathrm{K}\times\mathrm{K}$ blocks corresponding to the manifest variables can be computed from data; they are denoted $\widehat{\mathbf{R}}_{i}$. With $\widehat{\mathbf{R}}=[\widehat{\mathbf{R}}_{0}\ \dots\ \widehat{\mathbf{R}}_{p}]$, we build the block Toeplitz matrix
$$\mathcal{T}(\widehat{\mathbf{R}})=\begin{bmatrix}\widehat{\mathbf{R}}_{0} & \widehat{\mathbf{R}}_{1} & \cdots & \widehat{\mathbf{R}}_{p}\\ \widehat{\mathbf{R}}_{1}^{\top} & \widehat{\mathbf{R}}_{0} & \cdots & \widehat{\mathbf{R}}_{p-1}\\ \vdots & \vdots & \ddots & \vdots\\ \widehat{\mathbf{R}}_{p}^{\top} & \widehat{\mathbf{R}}_{p-1}^{\top} & \cdots & \widehat{\mathbf{R}}_{0}\end{bmatrix}.$$
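These two quantities are simple to compute; a minimal sketch (hypothetical sizes, biased $1/T$ normalization assumed) is:

```python
import numpy as np

def sample_covariances(x, p):
    """Biased sample covariances R_i = (1/T) sum_t x_{t+i} x_t^T, i = 0..p.
    x has shape (T, K): rows are observations of the manifest variables."""
    T = x.shape[0]
    return [x[i:].T @ x[:T - i] / T for i in range(p + 1)]

def block_toeplitz(R):
    """Symmetric block Toeplitz matrix with block (k, l) equal to
    R_{l-k} for l >= k and R_{k-l}^T otherwise."""
    p = len(R) - 1
    return np.vstack([np.hstack([R[l - k] if l >= k else R[k - l].T
                                 for l in range(p + 1)])
                      for k in range(p + 1)])

# Hypothetical usage: K = 3 manifest variables, T = 500 samples, p = 2.
rng = np.random.default_rng(1)
x = rng.standard_normal((500, 3))
R = sample_covariances(x, 2)
Tmat = block_toeplitz(R)
assert Tmat.shape == (9, 9)
assert np.allclose(Tmat, Tmat.T)
```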
It was proposed in [10] to estimate the matrices $\mathbf{X}$ and $\mathbf{L}$ by solving the optimization problem (13). In this problem, $\mathrm{tr}(\cdot)$ is the trace operator, $\log(\cdot)$ denotes the natural logarithm and $\det(\cdot)$ stands for the determinant. Minimizing $\mathrm{tr}(\mathbf{L})$ induces low rank in $\mathbf{L}$, and $\lambda,\gamma>0$ are trade-off constants. The function $f(\cdot)$ is a group sparsity promoter whose expression is given in (14).
Note that ${\mathcal{D}}_{i}(\mathbf{X}+\mathbf{L})(a,b)$ is the $i$th degree coefficient of the polynomial that occupies the $(a,b)$ position in the matrix polynomial $\mathsf{\Phi}_{m}^{-1}(\omega)$. Sparsity is encouraged by minimizing the $\ell_{1}$-norm of the vector formed by the coefficients that are maximum in absolute value for each position $(a,b)$.
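The structure of such a promoter (an $\ell_\infty$ norm across degrees, an $\ell_1$ norm across off-diagonal positions) can be sketched as follows; the exact expression in (14), including its argument and any weighting, is as in [10], and the coefficient values below are invented:

```python
import numpy as np

def group_sparsity_promoter(D):
    """Given the list D = [D_0, ..., D_p] of K x K coefficient matrices of
    a matrix polynomial, sum over off-diagonal positions (a, b) the maximum
    absolute coefficient at that position: l_inf across degrees, l_1
    across positions."""
    per_position_max = np.stack([np.abs(Di) for Di in D]).max(axis=0)
    K = per_position_max.shape[0]
    mask = ~np.eye(K, dtype=bool)           # off-diagonal positions only
    return per_position_max[mask].sum()

# A position whose coefficients are all small contributes its largest
# coefficient, so zeroing an entire polynomial entry reduces f.
D = [np.array([[1.0, 0.2], [0.2, 1.0]]),
     np.array([[0.1, 0.0], [0.0, 0.1]])]
assert abs(group_sparsity_promoter(D) - 0.4) < 1e-12   # 0.2 + 0.2
```

Penalizing the per-position maxima drives whole polynomial entries to zero at once, which is exactly what a missing edge requires.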
3. New Algorithm
The obvious advantage of the optimization problem (13) is its convexity, which allows the safe computation of the solution. However, a possible drawback is the presence of two parameters, $\lambda$ and $\gamma$, whose values must be chosen. A way to eliminate one of the parameters is to assume that the number $\mathrm{r}$ of latent variables is known. At least for parsimony reasons, it is natural to suppose that $\mathrm{r}$ is very small. Since a latent variable influences all manifest variables in the ISDM (5), there cannot be too many independent latent variables. Therefore, giving $\mathrm{r}$ a fixed small value is not likely to be restrictive.
In this section, we describe an estimation method which is clearly different from the one in [10]. More precisely, we generalize the Expectation-Maximization algorithm from [5], developed there for independent and identically distributed random variables, to a $\mathrm{VAR}$ process. For this purpose, we work with the full model (4) that includes the ISDM part pertaining to the $\mathrm{r}$ latent variables. Without loss of generality, we assume that $\mathbf{\Upsilon}_{\ell}(\omega)$ equals the identity matrix $\mathbf{I}$; the effect of the latent variables on the manifest ones in (5) can be modeled by $\mathbf{\Upsilon}_{\ell m}(\omega)$ alone. Combining with (2), the model is
$$\mathsf{\Phi}^{-1}(\omega)=\sum_{i=-p}^{p}\mathbf{Q}_{i}\,e^{-j\omega i},$$
where the matrices $\mathbf{Q}_{i}$ have to be found.
The main difficulty of this approach is the unavailability of the latent part of the matrices (11). Were such matrices available, we could work with SDM $\mathsf{\Phi}(\omega)$ estimators (confined to order $p$) of the form
$$\mathsf{\Phi}(\omega)=\sum_{i=-p}^{p}\mathbf{C}_{i}\,e^{-j\omega i},\qquad \mathbf{C}_{-i}=\mathbf{C}_{i}^{\top},$$
where $\mathbf{C}_{i}$ denotes the $i$th covariance lag of the VAR process $\{\mathbf{x}_{t}\}$ (see also (1) and (11)). We split the matrix coefficients from (15) and (16) according to the sizes of the manifest and latent variables, e.g.,
$$\mathbf{C}_{i}=\begin{bmatrix}\mathbf{C}_{i}^{mm} & \mathbf{C}_{i}^{m\ell}\\ \mathbf{C}_{i}^{\ell m} & \mathbf{C}_{i}^{\ell\ell}\end{bmatrix}.$$
To overcome the difficulty, the Expectation-Maximization algorithm alternately keeps fixed either the model parameters $\mathbf{Q}_{i}$ or the matrices $\mathbf{C}_{i}$, while estimating or optimizing the remaining unknowns. The expectation step of Expectation-Maximization assumes that the ISDM $\mathsf{\Phi}^{-1}(\omega)$ from (15) is completely known. Standard matrix identities [5] can be easily extended to matrix trigonometric polynomials for writing down the formula
$$\mathsf{\Phi}(\omega)=\begin{bmatrix}\widehat{\mathsf{\Phi}}_{m}(\omega) & -\widehat{\mathsf{\Phi}}_{m}(\omega)\,\mathbf{\Upsilon}_{m\ell}(\omega)\\ -\mathbf{\Upsilon}_{\ell m}(\omega)\,\widehat{\mathsf{\Phi}}_{m}(\omega) & \mathbf{I}+\mathbf{\Upsilon}_{\ell m}(\omega)\,\widehat{\mathsf{\Phi}}_{m}(\omega)\,\mathbf{\Upsilon}_{m\ell}(\omega)\end{bmatrix}.$$
Identifying (16) with (18) gives expressions for estimating the matrices $\mathbf{C}_{i}$, depending on the matrices $\mathbf{Q}_{i}$ from (15). The upper left corner of (18) needs no special computation, since the natural estimator is
$$\widehat{\mathsf{\Phi}}_{m}(\omega)=\sum_{i=-p}^{p}\widehat{\mathbf{R}}_{i}\,e^{-j\omega i},$$
where the sample covariances $\widehat{\mathbf{R}}_{i}$ are directly computable from the time series. It results that
$$\widehat{\mathbf{C}}_{i}^{mm}=\widehat{\mathbf{R}}_{i},\qquad i=\overline{0,p}.$$
The other blocks from (17) result from convolution expressions associated with the polynomial multiplications from (18). The lower left block of the coefficients is
$$\widehat{\mathbf{C}}_{i}^{\ell m}=-\sum_{k=-p}^{p}\mathbf{Q}_{k}^{\ell m}\,\widehat{\mathbf{R}}_{i-k},\qquad i=\overline{-2p,2p},$$
with the conventions $\widehat{\mathbf{R}}_{-k}=\widehat{\mathbf{R}}_{k}^{\top}$ and $\widehat{\mathbf{R}}_{k}=\mathbf{0}$ for $|k|>p$. Note that the trigonometric polynomial $\mathbf{\Upsilon}_{\ell m}(\omega)\,\mathsf{\Phi}_{m}(\omega)$ has degree $2p$, since its factors have degree $p$. With (20) available, we can compute
$$\widehat{\mathbf{C}}_{i}^{\ell\ell}=\delta_{i}\mathbf{I}-\sum_{k}\widehat{\mathbf{C}}_{k}^{\ell m}\left(\mathbf{Q}_{k-i}^{\ell m}\right)^{\top},\qquad i=\overline{-p,p},$$
where $\delta_{i}=1$ if $i=0$ and $\delta_{i}=0$ otherwise. Although the degree of the polynomial from the lower right block of (18) is $3p$, we need to truncate it to degree $p$, since this is the degree of the ISDM $\mathsf{\Phi}^{-1}(\omega)$ from (15). This is the reason for computing only the coefficients $i=\overline{-p,p}$ in (21). The same truncation is applied on (20); note that there we cannot compute only the coefficients that are finally needed, since all of them are required in (21).
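The convolve-then-truncate computations above amount to multiplying coefficient sequences of matrix trigonometric polynomials. A minimal sketch (with hypothetical degree-1 factors and an illustrative helper `matpoly_mul`, not part of the paper) is:

```python
import numpy as np

def matpoly_mul(A, B):
    """Multiply two matrix trigonometric polynomials given as dicts
    {degree: coefficient matrix}; the product's coefficients are the
    plain convolution of the two coefficient sequences."""
    C = {}
    for i, Ai in A.items():
        for k, Bk in B.items():
            C[i + k] = C.get(i + k, 0) + Ai @ Bk
    return C

# Hypothetical degree-1 factors (p = 1): their product has degree 2p = 2,
# mirroring the degree count for the lower left block of (18).
rng = np.random.default_rng(2)
U = {i: rng.standard_normal((2, 2)) for i in (-1, 0, 1)}
P = {i: rng.standard_normal((2, 2)) for i in (-1, 0, 1)}
C = matpoly_mul(U, P)
assert sorted(C) == [-2, -1, 0, 1, 2]           # degree 2p

# Consistency check at one frequency ...
w = 0.3
ev = lambda D: sum(M * np.exp(-1j * w * i) for i, M in D.items())
assert np.allclose(ev(C), ev(U) @ ev(P))

# ... and the truncation to degree p used in the text:
C_trunc = {i: C[i] for i in C if abs(i) <= 1}
assert sorted(C_trunc) == [-1, 0, 1]
```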
In the maximization step of Expectation-Maximization, the covariance matrices $\mathbf{C}_{i}$ are assumed to be known and are kept fixed; the ISDM can then be estimated by solving an optimization problem that will be detailed below. The overall solution we propose is outlined in Algorithm 1 and explained in what follows.
Algorithm 1 Algorithm for Identifying $\mathrm{SP}$ of ISDM (AlgoEM)

Input: Data $(\mathbf{x}_{1:\mathrm{K},1},\dots,\mathbf{x}_{1:\mathrm{K},T})$, $\mathrm{VAR}$ order $p$, number of latent variables $\mathrm{r}$, an information theoretic criterion (ITC).

Initialization:
  Evaluate $\widehat{\mathbf{R}}_{i}$ for $i=\overline{0,p}$ (see (11) and the discussion below it);
  $\widehat{\mathbf{R}}\leftarrow[\widehat{\mathbf{R}}_{0}\ \dots\ \widehat{\mathbf{R}}_{p}]$;
  $\widehat{\mathsf{\Phi}}_{m}(\omega)\leftarrow\sum_{i=-p}^{p}\widehat{\mathbf{R}}_{i}e^{-j\omega i}$;
  $\{\check{\mathbf{Q}}_{i}^{(0)}(1:\mathrm{K},1:\mathrm{K})\}_{i=0}^{p}\leftarrow\mathrm{ME}_{\mathrm{I}}(\widehat{\mathbf{R}})$ (see (22));
  Compute $\check{\mathbf{\Upsilon}}_{\ell m}^{(0)}(\omega)$ from EIG of $\check{\mathbf{Q}}_{0}^{(0)}$;
for all $\lambda\in\{\lambda_{1},\dots,\lambda_{L}\}$ do
  Maximum Entropy Expectation-Maximization (penalized setting):
  for $\mathrm{it}=1,\dots,N_{\mathrm{it}}$ do
    Use $\widehat{\mathsf{\Phi}}_{m}(\omega)$ and $\check{\mathbf{\Upsilon}}_{\ell m}^{(\mathrm{it}-1)}(\omega)$ to compute $\check{\mathbf{C}}^{(\mathrm{it})}$ (see (16)–(18));
    $\{\check{\mathbf{Q}}_{i}^{(\mathrm{it})}\}_{i=0}^{p}\leftarrow\mathrm{ME}_{\mathrm{II}}(\check{\mathbf{C}}^{(\mathrm{it})},\lambda)$ (see (23));
    Get $\check{\mathbf{\Upsilon}}_{\ell m}^{(\mathrm{it})}(\omega)$ from $\{\check{\mathbf{Q}}_{i}^{(\mathrm{it})}\}_{i=0}^{p}$ (see (15));
  end for
  Use $\{\check{\mathbf{Q}}_{i}^{(N_{\mathrm{it}})}\}_{i=0}^{p}$ to compute $\check{\mathsf{\Phi}}_{\lambda}^{-1}(\omega)$;
  Determine $\mathrm{SP}_{\lambda}$ (see (24));
  if ADAPTIVE then
    $\check{\mathbf{\Upsilon}}_{\ell m}^{(0)}(\omega)\leftarrow\check{\mathbf{\Upsilon}}_{\ell m}^{(N_{\mathrm{it}})}(\omega)$
  end if
  $\widehat{\mathbf{\Upsilon}}_{\ell m}^{(0)}(\omega)\leftarrow\check{\mathbf{\Upsilon}}_{\ell m}^{(N_{\mathrm{it}})}(\omega)$;
  Maximum Entropy Expectation-Maximization (constrained setting):
  for $\mathrm{it}=1,\dots,N_{\mathrm{it}}$ do
    Use $\widehat{\mathsf{\Phi}}_{m}(\omega)$ and $\widehat{\mathbf{\Upsilon}}_{\ell m}^{(\mathrm{it}-1)}(\omega)$ to compute $\widehat{\mathbf{C}}^{(\mathrm{it})}$ (see (16)–(18));
    $\{\widehat{\mathbf{Q}}_{i}^{(\mathrm{it})}\}_{i=0}^{p}\leftarrow\mathrm{ME}_{\mathrm{III}}(\widehat{\mathbf{C}}^{(\mathrm{it})},\mathrm{SP}_{\lambda})$ (see (25));
    Get $\widehat{\mathbf{\Upsilon}}_{\ell m}^{(\mathrm{it})}(\omega)$ from $\{\widehat{\mathbf{Q}}_{i}^{(\mathrm{it})}\}_{i=0}^{p}$ (see (15));
  end for
  Use $\{\widehat{\mathbf{Q}}_{i}^{(N_{\mathrm{it}})}\}_{i=0}^{p}$ to compute $\widehat{\mathsf{\Phi}}_{\lambda}^{-1}(\omega)$;
  Find the matrix coefficients of the $\mathrm{VAR}$ model by spectral factorization of $\widehat{\mathsf{\Phi}}_{\lambda}^{-1}(\omega)$ and compute $\mathrm{ITC}(\mathrm{Data};\mathrm{SP}_{\lambda})$.
end for
$\widehat{\mathrm{SP}}\leftarrow\arg\min_{\lambda}\mathrm{ITC}(\mathrm{Data};\mathrm{SP}_{\lambda})$;
The initialization stage provides a first estimate for the ISDM, from which the Expectation-Maximization alternations can begin. An estimate for the left upper corner of $\mathsf{\Phi}^{-1}(\omega)$ is obtained by solving the classical Maximum Entropy problem for a $\mathrm{VAR}(p)$ model, using the sample covariances of the manifest variables. We present below the matrix formulation of this problem, which allows an easy implementation in CVX (a Matlab-based modeling system for convex optimization) [14]. The mathematical derivation of the matrix formulation from the information theoretic formulation can be found in [6,9].

First Maximum Entropy Problem [$\mathrm{ME}_{\mathrm{I}}(\widehat{\mathbf{R}})$]:

The block Toeplitz operator $\mathcal{T}$ is defined in (12). The size of the positive semidefinite matrix variable $\mathbf{X}$ is $\mathrm{K}(p+1)\times\mathrm{K}(p+1)$. For all $i=\overline{0,p}$, the estimate $\check{\mathbf{Q}}_{i}^{(0)}(1:\mathrm{K},1:\mathrm{K})$ of the ISDM (15) is given by $\mathcal{D}_{i}(\mathbf{X})$.
In order to compute an initial value for ${\mathbf{{\rm Y}}}_{\ell m}\left(\omega \right)$, we resort to the eigenvalue decomposition (EIG) of ${\stackrel{\u02c7}{\mathbf{Q}}}_{0}^{\left(0\right)}(1:\mathrm{K},1:\mathrm{K})$. More precisely, after arranging the eigenvalues of ${\stackrel{\u02c7}{\mathbf{Q}}}_{0}^{\left(0\right)}(1:\mathrm{K},1:\mathrm{K})$ in the decreasing order of their magnitudes, we have ${\stackrel{\u02c7}{\mathbf{Q}}}_{0}^{\left(0\right)}(1:\mathrm{K},1:\mathrm{K})=\mathbf{U}\mathbf{D}{\mathbf{U}}^{\top}$. Then, we set ${\stackrel{\u02c7}{\mathbf{Q}}}_{0}^{\left(0\right)}(\mathrm{K}+1:\mathrm{K}+r,1:\mathrm{K})={\mathbf{D}}^{1/2}(1:\mathrm{r},1:\mathrm{r}){\mathbf{U}}^{\top}(1:\mathrm{K},1:\mathrm{r})$ and ${\stackrel{\u02c7}{\mathbf{Q}}}_{i}^{\left(0\right)}(\mathrm{K}+1:\mathrm{K}+\mathrm{r},1:\mathrm{K})=0$ for $i=\overline{1,p}$.
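This initialization step reduces to an ordered eigendecomposition and a rank-$\mathrm{r}$ square-root factor. A minimal sketch (hypothetical sizes $\mathrm{K}=4$, $\mathrm{r}=1$, invented data) is:

```python
import numpy as np

# Take the top-r eigenpairs of the manifest block Q0 and use D^{1/2} U^T
# as the initial latent-manifest rows, as described in the text.
rng = np.random.default_rng(3)
K, r = 4, 1
M = rng.standard_normal((K, K))
Q0 = M @ M.T + K * np.eye(K)              # symmetric positive definite

evals, evecs = np.linalg.eigh(Q0)         # eigh returns ascending order
order = np.argsort(np.abs(evals))[::-1]   # sort by decreasing magnitude
D = np.diag(evals[order])
U = evecs[:, order]
assert np.allclose(U @ D @ U.T, Q0)       # Q0 = U D U^T

Q0_lm = np.sqrt(D[:r, :r]) @ U[:, :r].T   # r x K latent-manifest init
assert Q0_lm.shape == (r, K)
```

Note that `numpy.linalg.eigh` orders eigenvalues ascendingly, so the explicit re-sorting by magnitude mirrors the convention stated above.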
When the covariances $\mathbf{C}_{i}$ are fixed in the maximization step of the Expectation-Maximization algorithm, the coefficients of the matrix polynomial that is the ISDM (15) are estimated from the solution of the following optimization problem:

Second Maximum Entropy Problem [$\mathrm{ME}_{\mathrm{II}}(\mathbf{C},\lambda)$]:

Since now we work with the full model, the size of $\mathbf{X}$ is $(\mathrm{K}+\mathrm{r})(p+1)\times(\mathrm{K}+\mathrm{r})(p+1)$. The function $f(\cdot)$ is the sparsity promoter defined in (14); it depends only on the entries of the block corresponding to the manifest variables. The equality constraints in (23) guarantee that the latent variables have variance one and that they are independent, given the manifest variables; this corresponds to the lower right block of (15).
The estimates $\{\check{\mathbf{Q}}_{i}^{(N_{\mathrm{it}})}\}_{i=0}^{p}$ obtained after these iterations are further employed to compute $\check{\mathsf{\Phi}}_{\lambda}^{-1}(\omega)$ by using (15). If $\lambda$ is large enough, then $\check{\mathsf{\Phi}}_{\lambda}^{-1}(\omega)$ is expected to have a certain sparsity pattern, $\mathrm{SP}_{\lambda}$. Since the objective of (23) does not ensure exact sparsification, and also because of the numerical calculations, the entries of $\check{\mathsf{\Phi}}_{\lambda}^{-1}(\omega)$ that belong to $\mathrm{SP}_{\lambda}$ are small, but not exactly zero. In order to turn them to zero, we apply a method similar to the one from [6] (Section 4.1.3). We first compute the maximum of the partial spectral coherence (PSC),
$$\max_{\omega}\ \frac{\left|\check{\mathsf{\Phi}}_{\lambda}^{-1}(\omega)(a,b)\right|}{\sqrt{\check{\mathsf{\Phi}}_{\lambda}^{-1}(\omega)(a,a)\,\check{\mathsf{\Phi}}_{\lambda}^{-1}(\omega)(b,b)}},$$
for all $a\ne b$ with $1\le a,b\le\mathrm{K}$. Then $\mathrm{SP}_{\lambda}$ comprises all the pairs $(a,b)$ for which the maximum PSC is not larger than a threshold $\mathrm{Th}$. The discussion on the selection of the parameters $N_{\mathrm{it}}$ and $\mathrm{Th}$ is deferred to Section 5.
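The thresholding step can be sketched as follows, on ISDM samples from a hypothetical VAR(1) model (invented coefficients, $\mathsf{\Sigma}=\mathbf{I}$) whose $(0,2)$ entry vanishes at all frequencies:

```python
import numpy as np

def max_psc(Phi_inv_grid):
    """Maximum over frequency of the partial spectral coherence
    |Phi^{-1}(a,b)| / sqrt(Phi^{-1}(a,a) * Phi^{-1}(b,b)) for each pair.
    Phi_inv_grid has shape (n_freq, K, K)."""
    K = Phi_inv_grid.shape[1]
    psc = np.zeros((K, K))
    for F in Phi_inv_grid:
        d = np.sqrt(np.abs(np.diag(F)))
        psc = np.maximum(psc, np.abs(F) / np.outer(d, d))
    return psc

# Hypothetical ISDM samples on a frequency grid:
A1 = np.array([[0.3, 0.2, 0.0],
               [0.0, 0.3, 0.2],
               [0.0, 0.0, 0.3]])
Q0, Q1 = np.eye(3) + A1.T @ A1, A1
freqs = np.linspace(-np.pi, np.pi, 64)
grid = np.stack([Q0 + Q1 * np.exp(-1j * w) + Q1.T * np.exp(1j * w)
                 for w in freqs])

Th = 1e-6
psc = max_psc(grid)
SP = [(a, b) for a in range(3) for b in range(a + 1, 3) if psc[a, b] <= Th]
assert SP == [(0, 2)]     # the only pair with no edge in the graph
```

In practice the ISDM comes from the estimates $\check{\mathbf{Q}}_{i}^{(N_{\mathrm{it}})}$ and $\mathrm{Th}$ is strictly positive, so small nonzero entries are also mapped to missing edges.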
The regularized estimate of the ISDM is further improved by solving a problem similar to (23), but with the additional constraint that the sparsity pattern of the ISDM is $\mathrm{SP}_{\lambda}$. More precisely:

Third Maximum Entropy Problem [$\mathrm{ME}_{\mathrm{III}}(\mathbf{C},\mathrm{SP})$]:

This step of the algorithm has a strong theoretical justification, which stems from the fact that $\widehat{\mathsf{\Phi}}^{-1}(\omega)$ is the Maximum Entropy solution of a covariance extension problem (see [10] (Remark 2.1)). The number of iterations, $N_{\mathrm{it}}$, is the same as in the case of the first loop.
The spectral factorization of the positive matrix trigonometric polynomial $\widehat{\mathsf{\Phi}}_{\lambda}^{-1}(\omega)$ is computed by solving a semidefinite programming problem. The implementation is the same as in [8], except that in our case the model contains latent variables. Therefore, the matrix coefficients produced by spectral factorization are altered to keep only those entries that correspond to the manifest variables. The resulting $\mathrm{VAR}$ model is fitted to the data and then various IT criteria are evaluated. The accuracy of the selected model depends on the criterion that is employed, as well as on the strategy used for generating the $\lambda$ values that yield the competing models. In the next section, we list the model selection rules that we apply; the problem of generating the $\lambda$ values is treated in Section 5.
As already mentioned, the estimation problem is solved for several values of $\lambda$: $\lambda_{1}<\lambda_{2}<\cdots<\lambda_{L}$. From the description above, we know that, for each value of the parameter $\lambda$, $\mathbf{\Upsilon}_{\ell m}(\omega)$ gets the same initialization, which is based on (22). It is likely that this initialization is poor. A better approach is an ADAPTIVE algorithm which takes into consideration the fact that the difference $\lambda_{i}-\lambda_{i-1}$ is small for all $i=\overline{2,L}$. This algorithm initializes $\mathbf{\Upsilon}_{\ell m}(\omega)$ as explained above only when $\lambda=\lambda_{1}$. When $\lambda=\lambda_{i}$ for $i=\overline{2,L}$, the initial value of $\mathbf{\Upsilon}_{\ell m}(\omega)$ is taken to be the estimate of this quantity that was previously obtained by solving the optimization problem (23) for $\lambda=\lambda_{i-1}$. The effect of the ADAPTIVE procedure will be investigated empirically in Section 5.
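The warm-start bookkeeping of the ADAPTIVE variant amounts to threading each solution into the next initialization. A minimal sketch (the names `solve_me2` and `fake_solver` are hypothetical stand-ins, not the actual ME$_{\mathrm{II}}$ solver):

```python
# Sweep the lambda grid, warm-starting each solve from the previous one.
def adaptive_sweep(lambdas, init, solve_me2):
    estimates = {}
    current = init                        # EIG-based init, used for lambda_1
    for lam in sorted(lambdas):           # lambda_1 < ... < lambda_L
        current = solve_me2(lam, current) # warm start from lambda_{i-1}
        estimates[lam] = current
    return estimates

# Toy stand-in solver that records which initialization it received:
trace = []
def fake_solver(lam, warm):
    trace.append(warm)
    return ('est', lam)

adaptive_sweep([0.3, 0.1, 0.2], 'eig_init', fake_solver)
assert trace == ['eig_init', ('est', 0.1), ('est', 0.2)]
```

Only the first solve sees the eigendecomposition-based initialization; every later solve starts from the neighboring estimate, which is cheap because consecutive $\lambda$ values are close.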