1. Introduction
Manfred Deistler is justly famous for his landmark contributions to the theoretical foundations of time series analysis. In this short note, however, we deliberately focus on a lesser-known aspect of his activity, which nevertheless had quite an impact on the theory and practice of one of the most powerful tools of contemporary time-series analysis and econometrics: dynamic factor models.
Dynamic factor models—as we shall see in Section 3 below, this traditional terminology is somewhat of a misnomer, as the general dynamic factor model (6) and (7) (of which other factor models are particular cases) follows as a representation result rather than constituting a statistical model, and does not necessarily involve factors; see Hallin and Lippi (2014)—were developed, mostly in the econometric literature, as a response to the need to analyze and forecast time series in high dimension. Increasingly often, datasets of econometric interest take the form of a large number N of time series observed over a period of time T—the finite realization of the double-indexed process
$$\{X_{it} \mid i \in \mathbb{N},\ t \in \mathbb{Z}\} \tag{1}$$
with arbitrarily intricate cross-sectional and serial dependence structures.
Even for moderate values of N, the traditional (parametric) methods of multivariate time-series analysis run into the theoretical and numerical problems related to the curse of dimensionality. The need for an alternative approach became evident in the late 1970s, leading to the first factor model proposals by Sargent and Sims (1977), Geweke (1977), Chamberlain (1983), and Chamberlain and Rothschild (1983). These four papers can be considered as early forerunners of the modern literature on factor models—a literature that had a new start in the early 2000s with, essentially, four papers that triggered most subsequent developments: Forni et al. (2000), Bai and Ng (2002), and Stock and Watson (2002a, 2002b).
Chamberlain (1983) and Chamberlain and Rothschild (1983) were particularly influential as an early example of high-dimensional time-series asymptotics where both the dimension N and the series length T tend to infinity.
Econometricians, of course, were not the only ones facing inference problems in high-dimensional spaces. Interestingly, a couple of years later, mathematical statisticians, in the more restricted context of Gaussian i.i.d. observations (a very particular case of time series), independently adopted a somewhat different approach, leading to the so-called spiked covariance model. We show here how spiked covariance models and factor models, while sharing some common features, nevertheless differ on an essential point, and we explain why factor models are both more general and statistically more successful.
Finally, we show how Manfred Deistler, by providing the missing final piece of the general dynamic factor model jigsaw, made a decisive impact in this area—an impact that deserves to be better known.
Outline of the paper. Section 2 deals with the spiked covariance model developed in the probability and mathematical statistics literature. This model, indeed, is somewhat similar to the factor-model approach, with an essential difference that helps understand the benefits of the latter.
Section 3 features a brief history of the factor-model approach and introduces the general dynamic factor model (GDFM).
Section 4 highlights the importance of Manfred Deistler’s contribution to the ultimate development of the GDFM methodology.
2. Spiked Covariance Models: A Needle in a Growing Haystack
While econometricians were facing the time-series version of high-dimensional observations and the curse of dimensionality, mathematical statisticians were also dealing with high-dimensional asymptotics, in the more restricted framework of i.i.d. samples where only cross-sectional dependencies are present. Interestingly, while sharing some common features with factor models, the models they developed lead to strikingly different conclusions.
Most of the probability and statistics literature in the area revolves around the so-called spiked covariance models—a terminology that was coined, apparently, by Johnstone (2001). In that model, which has attracted much interest in recent years, the observation is of the form
$$\mathbf{X}^{(N)}_t := (X_{1t}, \ldots, X_{Nt})', \qquad t = 1, \ldots, T, \tag{2}$$
where the $\mathbf{X}^{(N)}_t$ are i.i.d. $\mathcal{N}(\mathbf{0}, \boldsymbol{\Gamma}^{(N)})$ with, denoting by $\mathcal{S}^{N-1}$ the unit sphere in $\mathbb{R}^N$,
$$\boldsymbol{\Gamma}^{(N)} = \sigma^2\big(\mathbf{I}_N + h\,\mathbf{v}^{(N)}\mathbf{v}^{(N)\prime}\big), \qquad \mathbf{v}^{(N)} \in \mathcal{S}^{N-1}, \tag{3}$$
for some unspecified $\sigma^2 > 0$, $h \geq 0$; asymptotics are taken as N and T tend to infinity in such a way that $N/T$ tends to a positive constant $c$—the so-called phase transition threshold. To simplify the discussion, let $\sigma^2 = 1$, that is, $\boldsymbol{\Gamma}^{(N)} = \mathbf{I}_N + h\,\mathbf{v}^{(N)}\mathbf{v}^{(N)\prime}$ for some $h \geq 0$.
That model leads to a number of mathematically beautiful but statistically puzzling asymptotic results: the sample covariance eigenvalues pack together, filling the support of the Marchenko–Pastur density; the distribution of any finite number of centered and normalized largest sample covariance eigenvalues converges to the multivariate Tracy–Widom law irrespective of the value of $h$ below the phase transition threshold; and the sequence of distributions of the observations under $h = 0$ (no spike) is contiguous to the corresponding sequence with $h$ below that threshold, which, in particular, precludes consistent estimation of $h$ (see Onatski et al. 2013, 2014 for details). The statistical value of such results is, to say the least, somewhat limited—all the more so that in practice N and T do not tend to infinity, so that the value of $c$, the role of which is crucial, is completely arbitrary and bears no relation to the observed sample. (The value of the actual ratio $N/T$ is usually chosen for want of anything better.) That spiked covariance literature, thus, has little to offer to econometricians who have to produce forecasts and, moreover, are facing serially dependent and mostly non-Gaussian observations.
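To fix ideas, here is a minimal simulation sketch (ours, not from the paper; the unit vector, sample sizes, and spike values are arbitrary choices) of the single-spike model (2) and (3): below the phase-transition threshold, the largest sample covariance eigenvalue is indistinguishable from the Marchenko–Pastur bulk; above it, it separates.

```python
# Minimal sketch (ours) of the spiked covariance model (2)-(3): with
# Gamma = I_N + h vv' and N/T -> c, the largest sample eigenvalue separates
# from the Marchenko-Pastur bulk only when the spike h is large enough
# (above sqrt(c), the BBP phase transition).
import numpy as np

rng = np.random.default_rng(0)

def largest_sample_eigenvalue(N, T, h):
    """Largest eigenvalue of the sample covariance under a single spike of size h."""
    v = np.ones(N) / np.sqrt(N)                # a unit vector on the sphere S^{N-1}
    Gamma = np.eye(N) + h * np.outer(v, v)     # spiked covariance (3), sigma^2 = 1
    X = rng.multivariate_normal(np.zeros(N), Gamma, size=T)   # T i.i.d. N(0, Gamma) draws
    S = X.T @ X / T                            # sample covariance matrix
    return np.linalg.eigvalsh(S)[-1]

N, T = 200, 400                                # N/T = c = 0.5
c = N / T
bulk_edge = (1 + np.sqrt(c)) ** 2              # upper edge of the Marchenko-Pastur support
for h in [0.0, 0.3, 4.0]:                      # below and above the threshold sqrt(c) ~ 0.71
    lam = largest_sample_eigenvalue(N, T, h)
    print(f"h = {h:3.1f}: largest sample eigenvalue = {lam:5.2f} (bulk edge = {bulk_edge:4.2f})")
```

With these values, the first two cases typically print a value close to the bulk edge (about 2.9), while the last one prints a clearly separated value (about 5.6 in the limit), in line with the phase-transition results recalled above.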
These intriguing results all are due to the choice of the asymptotic scheme itself. Recall that asymptotics are a mathematical fiction by which limiting results (as N and T tend to infinity) are expected to provide an approximation to the actual fixed-$(N, T)$ problem. The scenario of that mathematical fiction is not in the data and entirely depends on the statistician’s choice. That choice, thus, should aim at optimizing the quality of the approximation and is not meant to describe any actual real-world situation: a scenario under which the “cross-sectional future” resembles the actual observation is likely to achieve that objective much better than a “worst-case” one. In traditional time series asymptotics with fixed dimension N, where only $T \to \infty$, stationarity with respect to t is the reasonable and usual choice. While the specification, for $i > N$, of a fictitious yet sensible “cross-sectional future” is more delicate, the choice leading to the spiked covariance model (2) and (3) definitely has the flavor of a “catastrophe scenario”, which is unlikely to provide a good approximation to the finite-dimensional, finite-sample situation.
Below are two types of data-generating processes (two sequences of unit vectors $\mathbf{v}^{(N)} \in \mathcal{S}^{N-1}$) leading to the single-spiked-covariance model—the $N \times N$ covariance matrix (3):

(a) $X_{1t} = \sqrt{h}\,\xi_t + Z_{1t}$ and $X_{it} = Z_{it}$ for $i = 2, \ldots, N$ (that is, $\mathbf{v}^{(N)} = (1, 0, \ldots, 0)'$), where the $\xi_t$ are i.i.d. $\mathcal{N}(0, 1)$ and the $Z_{it}$ are i.i.d. $\mathcal{N}(0, 1)$, $i = 1, \ldots, N$, $t = 1, \ldots, T$;

(b) $X_{it} = \sqrt{h/N}\,\xi_t + Z_{it}$, $i = 1, \ldots, N$ (that is, $\mathbf{v}^{(N)} = N^{-1/2}(1, \ldots, 1)'$), where the $\xi_t$ and the $Z_{it}$ are i.i.d. $\mathcal{N}(0, 1)$ as in (a), $t = 1, \ldots, T$.
Actually, (a) and (b) coincide up to a rotation in $\mathbb{R}^N$—an orthogonal matrix with first column of the form $N^{-1/2}(1, \ldots, 1)'$, indeed, turns (a) into (b)—they thus only differ by the choice of a coordinate system, and all of their covariance eigenvalues (theoretical and empirical) coincide.
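This coordinate-free nature of the eigenvalues is easy to check numerically; the following small sketch (ours) builds an orthogonal (Householder) matrix sending the first canonical basis vector to $N^{-1/2}(1, \ldots, 1)'$ and verifies that it maps the covariance of (a) into that of (b).

```python
# Small numerical check (ours): the covariances of scenarios (a) and (b) are
# orthogonally equivalent, hence have identical eigenvalues.
import numpy as np

N, h = 6, 3.0
e1 = np.eye(N)[:, 0]                      # v in scenario (a)
v_b = np.ones(N) / np.sqrt(N)             # v in scenario (b)

Gamma_a = np.eye(N) + h * np.outer(e1, e1)
Gamma_b = np.eye(N) + h * np.outer(v_b, v_b)

w = e1 - v_b                              # Householder reflection sending e1 to v_b
H = np.eye(N) - 2.0 * np.outer(w, w) / (w @ w)

print(np.allclose(H @ e1, v_b))                        # True
print(np.allclose(H @ Gamma_a @ H.T, Gamma_b))         # True: same matrix up to an orthogonal change of basis
print(np.allclose(np.linalg.eigvalsh(Gamma_a),
                  np.linalg.eigvalsh(Gamma_b)))        # True: identical eigenvalues
```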
Under (a), a bounded spike (justifying the spiked terminology) is “hidden” under a growing number of uninformative white noises—a finite needle buried in an ever-growing haystack. Growing N clearly does not provide any information (only growing T does). The fact that the needle goes undetected when its size $h$ is small relative to the asymptotic value $c$ of the ratio $N/T$ (the larger that ratio, the faster the haystack grows) thus is hardly surprising.
Model (b) takes the form of a factor model decomposition, with a cross-sectionally pervasive “common shock” $\xi_t$ loaded by all components with loadings $\sqrt{h/N}$ tending to zero as $N \to \infty$, and an idiosyncratic $Z_{it}$, which is Gaussian white noise. While cross-sectionally pervasive, however, $\xi_t$ is not loaded strongly enough for the largest eigenvalue of $\boldsymbol{\Gamma}^{(N)}$, which is $1 + h$, to diverge as $N \to \infty$.
The situation, however, improves dramatically if the size of the needle grows with the dimension N. Letting $h = h_N$ of the form $N^{\delta}$ in (b), with $\delta > 0$ being arbitrarily small (loadings $\sqrt{h_N/N}$ still tending to zero as $N \to \infty$, at a slightly slower rate, though), $\boldsymbol{\Gamma}^{(N)}$’s largest eigenvalue is $1 + N^{\delta}$, which tends to infinity as $N \to \infty$. With $h_N$ of the form $cN$ ($c > 0$ arbitrarily small), the loadings are $\sqrt{c}$ and no longer tend to zero as $N \to \infty$; $\boldsymbol{\Gamma}^{(N)}$’s largest eigenvalue is $1 + cN$, which linearly tends to infinity as $N \to \infty$. All of the problems then disappear: with such loadings, (b) yields a (very) special case of the dynamic factor models developed by the econometricians and described in Section 3, where consistent estimation of the loadings—hence of the common components—is possible.
Now, the same conclusions hold about (b) if we let
$$X_{it} = \sqrt{h_N/N}\,\xi_t + Z_{it}, \qquad h_N = N^{\delta}, \quad 0 < \delta < 1, \tag{4}$$
or
$$X_{it} = \sqrt{c}\,\xi_t + Z_{it}, \qquad c > 0 \text{ arbitrarily small}. \tag{5}$$
Assuming (b) with loadings of the form (4) or (5) instead of the original formulation (2) and (3) is clearly tantamount to adopting alternative asymptotic scenarios, as it only modifies the postulated but not-to-be-observed form of the “cross-sectional future” (namely, the form of the cross-sectional components with index $i > N$), which is an arbitrary choice of the statistician. Under these alternative scenarios, the spike $h_N$, say, keeps growing with N (viz., as $N^{\delta}$ under (4), as $cN$ under (5)), thus balancing the impact of a growing dimension N: the needle now is growing—be it arbitrarily slowly—with the haystack.
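The contrast with the bounded-spike case is easily visualized; the sketch below (ours, with arbitrary parameter values) simulates the constant-loadings scenario of type (5) with a short, fixed T: the largest eigenvalue of $\boldsymbol{\Gamma}^{(N)}$ grows linearly with N, and the cross-sectional average recovers the common shock increasingly well as N grows, so that growing the haystack now helps.

```python
# A minimal sketch (ours, arbitrary parameter values) of the "growing needle"
# scenario: constant loadings make the largest covariance eigenvalue grow
# linearly with N, and the cross-sectional average consistently recovers the
# common shock as N increases, even for fixed T.
import numpy as np

rng = np.random.default_rng(1)
T, c = 50, 0.25                                   # short series, small constant loading sqrt(c)

for N in [10, 100, 1000]:
    xi = rng.standard_normal(T)                   # common shock xi_t
    Z = rng.standard_normal((T, N))               # idiosyncratic Gaussian white noise
    X = np.sqrt(c) * xi[:, None] + Z              # constant loadings sqrt(c), as in (5)
    top_eig = np.linalg.eigvalsh(np.eye(N) + c * np.ones((N, N)))[-1]   # = 1 + cN
    xbar = X.mean(axis=1) / np.sqrt(c)            # rescaled cross-sectional average
    err = np.mean((xbar - xi) ** 2)               # expected value 1/(cN): vanishes as N grows
    print(f"N = {N:5d}: largest eigenvalue of Gamma = {top_eig:7.1f}, "
          f"MSE of recovered shock = {err:.4f}")
```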
The asymptotic scenario (5) and, in fact, any scenario of loadings leading to a linearly exploding largest eigenvalue for $\boldsymbol{\Gamma}^{(N)}$ (all other eigenvalues remaining bounded) is particularly appealing, as it consists of assuming that the fictitious “cross-sectional future” resembles the observed “cross-sectional past”—a form of cross-sectional stability that is much more likely to provide a good approximation of the finite-$(N, T)$ problem under study than the classical spiked-covariance-model asymptotics underlying (2) and (3).
3. Dynamic Factor Models: Pervasive Needles and the Blessing of Dimensionality
As mentioned in the Introduction, econometricians faced the needle-in-the-haystack problem long before probabilists and mathematical statisticians did—long before the expression “big data” was coined. Econometrics requires operational solutions: the asymptotic scenario of a needle that, at the end of the day, cannot be found is somewhat inappropriate in the econometric context. Moreover, economic data are seldom i.i.d. Gaussian; they generally are serially auto- and cross-correlated and often heavy-tailed (often with infinite fourth-order moments). An econometric theory of high-dimensional time series therefore needs to address much more general situations than those covered by Gaussian spiked covariance models.
The econometric theory of factor models for high-dimensional observations arose from that need and takes various forms; the spirit of the approach, as initiated by Chamberlain (1983) and Chamberlain and Rothschild (1983), is easily explained, however, from a multispiked (q spikes) extension of (b) yielding q exploding eigenvalues, namely, q needles, which, in a sense, are growing with the haystack.
Under their most general form—the so-called general dynamic factor model (the terminology generalized dynamic factor model is used equivalently) or GDFM introduced in Forni et al. (2000)—factor models proceed as follows. The observation is still of the form described in (1), but the assumptions are much more general, allowing for serial auto- and cross-correlations and non-Gaussian densities. Assuming that they exist, denote by $\boldsymbol{\Sigma}^{(N)}(\theta)$, $\theta \in [-\pi, \pi]$, the spectral density matrices of the N-dimensional processes $\mathbf{X}^{(N)}_t := (X_{1t}, \ldots, X_{Nt})'$. These density matrices are nested as N increases. Of their eigenvalues, q are ($\theta$-a.e.) exploding and the other ones remain ($\theta$-a.e.) bounded as $N \to \infty$. This yields a decomposition of $\boldsymbol{\Sigma}^{(N)}(\theta)$ into
$$\boldsymbol{\Sigma}^{(N)}(\theta) = \boldsymbol{\Sigma}^{(N)}_{\chi}(\theta) + \boldsymbol{\Sigma}^{(N)}_{\xi}(\theta),$$
where $\boldsymbol{\Sigma}^{(N)}_{\chi}$, called the common spectral density, has reduced rank q and q diverging dynamic eigenvalues—the terminology dynamic eigenvalues used for the eigenvalues of a spectral density matrix was coined by Brillinger (1964, 1981), who introduced the concept. As for $\boldsymbol{\Sigma}^{(N)}_{\xi}$, called the idiosyncratic spectral density, it only has bounded dynamic eigenvalues.
That decomposition of $\boldsymbol{\Sigma}^{(N)}(\theta)$ in turn induces a decomposition of the observations $X_{it}$ into
$$X_{it} = \chi_{it} + \xi_{it}, \tag{6}$$
where $\chi_{it}$, with spectral density $\boldsymbol{\Sigma}^{(N)}_{\chi}$, and $\xi_{it}$, with spectral density $\boldsymbol{\Sigma}^{(N)}_{\xi}$, are mutually orthogonal at all leads and lags; $\chi_{it}$ is called the common component, $\xi_{it}$ the idiosyncratic component. Since the spectral density $\boldsymbol{\Sigma}^{(N)}_{\chi}$ has reduced rank q, the common component $\chi_{it}$ is driven by q mutually orthogonal white noises $u_{1t}, \ldots, u_{qt}$:
$$\chi_{it} = b_{i1}(L)\,u_{1t} + \cdots + b_{iq}(L)\,u_{qt} \tag{7}$$
(L, as usual, stands for the lag operator), while the idiosyncratic component $\xi_{it}$, having bounded dynamic eigenvalues, is only mildly cross-correlated (it can be strongly auto-correlated, though); see Hallin and Lippi (2019) or Lippi et al. (2022) for recent surveys.
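For readers who want to see what these objects look like in practice, here is a rough sketch (ours; the lag-window estimator and all tuning constants are arbitrary choices, not those of the cited papers) of Brillinger-style dynamic eigenvalues, i.e., the eigenvalues of an estimated spectral density matrix computed on a grid of frequencies. Under a GDFM with q common shocks, the q largest ones diverge with N at (almost) all frequencies while the remaining ones stay bounded.

```python
# A rough sketch (ours) of dynamic eigenvalues: eigenvalues of a lag-window
# estimate of the spectral density matrix, computed on a grid of frequencies.
import numpy as np

def dynamic_eigenvalues(X, n_freq=51, bandwidth=None):
    """X is a T x N panel (rows = time); returns frequencies and an (n_freq, N) array
    of spectral-density eigenvalues, sorted in decreasing order at each frequency."""
    T, N = X.shape
    M = bandwidth if bandwidth is not None else int(np.sqrt(T))   # lag-window size
    X = X - X.mean(axis=0)
    # sample autocovariance matrices Gamma_k = (1/T) sum_t X_{t+k} X_t', k = 0..M
    gammas = {k: (X[k:].T @ X[:T - k]) / T for k in range(M + 1)}
    freqs = np.linspace(-np.pi, np.pi, n_freq)
    eigs = np.empty((n_freq, N))
    for j, theta in enumerate(freqs):
        S = gammas[0].astype(complex) / (2 * np.pi)
        for k in range(1, M + 1):
            w = 1 - k / (M + 1)                                   # Bartlett kernel weight
            S += w * (gammas[k] * np.exp(-1j * k * theta)
                      + gammas[k].T * np.exp(1j * k * theta)) / (2 * np.pi)
        eigs[j] = np.linalg.eigvalsh(S)[::-1]                     # real, S is Hermitian
    return freqs, eigs
```

Inspecting how the largest entries of `eigs` grow when the routine is applied to panels of increasing cross-sectional dimension is the usual informal device for gauging q.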
Further constraints can be imposed on the decomposition. Among them is the static loading assumption (Stock and Watson 2002a, 2002b; Bai and Ng 2002; Bai 2003, and many others)
$$\chi_{it} = \mathbf{b}_i' \mathbf{u}_t, \qquad \mathbf{u}_t := (u_{1t}, \ldots, u_{qt})', \tag{8}$$
($\mathbf{b}_i$ here is a q-dimensional real vector), under which the shocks $u_{1t}, \ldots, u_{qt}$ are loaded in a static and contemporaneous way, while in (7) the loadings are filters $b_{ik}(L)$ and the shocks are loaded in a dynamic way (involving lagged shock values).
As soon as the spectral density $\boldsymbol{\Sigma}^{(N)}(\theta)$ exists and admits a finite number q of exploding eigenvalues, a GDFM representation (6) with dynamic loadings (7) exists (although an infinite number of exploding eigenvalues is not impossible in theory, such cases are extremely artificial and contrived: see Hallin and Lippi (2014) for an example). The existence of a factor model decomposition (6) with static loadings (8), however, is a strong assumption one would like to avoid. Multivariate economic time series, let alone infinite-dimensional ones, typically involve leading and lagging series, loading the common shocks with various leads and lags. The GDFM, which allows for this and basically does not place any restrictions on the data-generating process, is much preferable in that respect.
The GDFM was introduced by Forni et al. (2000), who establish its asymptotic identifiability—the “blessing of dimensionality”—and propose a consistent (consistency here is not uniform in t, though: see Forni et al. (2000) for details) estimation strategy based on Brillinger’s concept of dynamic principal components.
The moot point with dynamic principal components, however, is that their computation involves two-sided filters, i.e., both the past and the future values of the observed $X_{it}$’s. This is fine in the “center” of the observation period but not at its edges. In particular, the estimation, for forecasting purposes, of the end-of-sample common components $\chi_{iT}$ is likely to be poor irrespective of N and T, as the required future observations are not available.
The advantage of static loadings is that, under (8), q coincides with the number of exploding “static” eigenvalues—the eigenvalues of the $N \times N$ covariance matrix $\boldsymbol{\Gamma}^{(N)}$ of $\mathbf{X}^{(N)}_t$. Additionally, Bai and Ng (2002) and Stock and Watson (2002a, 2002b) propose an estimation method relying on a traditional principal component analysis of $\mathbf{X}^{(N)}_t$. These principal components, at given time t, only require the contemporaneous observation $\mathbf{X}^{(N)}_t$: no problems, thus, for $t = T$ and forecasting purposes. This, and the popularity of traditional principal component methods, explains why practitioners, somewhat regrettably, prefer the static contemporaneous loading approach despite its lesser generality, its lack of parsimony, and the fact that the crucial and quite restrictive underlying assumption (8) may not hold. This latter fact is often dismissed by arguing that lagged values of the factors can be incorporated into a static loading scheme via stacking. This argument, which may very severely inflate the number of factors, is flawed: there is no guarantee that these lagged values enjoy the pervasiveness properties required from static factors, with the consequence that, in the traditional principal component estimation method, they get lost to the idiosyncratic component.
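As an illustration of why the static approach is so convenient in practice, here is a minimal sketch (ours; the data-generating process, dimensions, and names are arbitrary) of principal-component estimation under a static model of type (8): the factor estimates at time t involve the cross-section at time t only, so the end of the sample causes no difficulty.

```python
# A minimal sketch (ours) of static principal-component factor estimation in the
# spirit of Stock-Watson/Bai-Ng: factors are recovered, at each t, from the
# contemporaneous cross-section only, with no two-sided filtering involved.
import numpy as np

rng = np.random.default_rng(2)
T, N, q = 200, 100, 2

Lambda = rng.standard_normal((N, q))            # static loadings
F = rng.standard_normal((T, q))                 # q factors, loaded contemporaneously
X = F @ Lambda.T + rng.standard_normal((T, N))  # panel satisfying a static model like (8)

# PCA: eigenvectors of the sample covariance of the cross-section
Xc = X - X.mean(axis=0)
eigval, eigvec = np.linalg.eigh(Xc.T @ Xc / T)
W = eigvec[:, -q:]                              # top-q eigenvectors (loading space)
F_hat = Xc @ W                                  # estimated factors, up to rotation, any t <= T

# the estimated factor space spans the true one: the unexplained share is close to 0
beta, *_ = np.linalg.lstsq(F_hat, F, rcond=None)
resid = F - F_hat @ beta
print("unexplained share:", np.sum(resid**2) / np.sum(F**2))
```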
Up to this point, the general dynamic factor model, in view of its generality, is a brilliant idea, the practical implementation of which apparently is hopelessly blocked by the two-sided nature of Brillinger’s dynamic principal components. This, however, is where Manfred Deistler enters the picture.
4. Manfred Deistler and the General Dynamic Factor Model
Manfred Deistler’s seminal contribution to the analysis of general dynamic factor models originates in his interest in the properties of reduced-rank processes. The following result—unrelated, at first sight, to factor models and high-dimensional time series—follows from Anderson and Deistler (2008a).
Anderson and Deistler call tall the transfer function $\mathbf{c}(L)$ of a Q-dimensional process $\mathbf{y}_t$ driven by a q-dimensional white noise $\mathbf{w}_t$ where $Q > q$, namely, a process of the form
$$\mathbf{y}_t = \mathbf{c}(L)\,\mathbf{w}_t, \qquad t \in \mathbb{Z}, \tag{9}$$
where $\mathbf{c}(L)$ is some $Q \times q$ filter. By abuse of language, we say that the process $\mathbf{y}_t$ itself is tall.
Recall that a process satisfying (9) with $Q > q$ is called rational if the filter $\mathbf{c}(L)$ itself is rational, that is, if $Q \times q$ matrix filters $\mathbf{a}(L)$ and $\mathbf{b}(L)$ and integers m and p exist such that, for all $i = 1, \ldots, Q$ and $j = 1, \ldots, q$, $c_{ij}(L) = a_{ij}(L)/b_{ij}(L)$, the degree of $a_{ij}(L)$ is m, the degree of $b_{ij}(L)$ is p, and $b_{ij}(0) = 1$; note that $\mathbf{c}(L)$ then involves a finite (although unspecified) number P of real parameters: call it rational of order P. Also, recall that a subset of a topological space is generic if it contains an open and dense subset. Denoting by $\boldsymbol{\Theta}_P$ the parameter space (a complete description of $\boldsymbol{\Theta}_P$ would require the filters $\mathbf{a}(L)$ and $\mathbf{b}(L)$ to be stable and to have no common zeroes) indexing the family of rational filters of order P, the genericity below is meant for $\boldsymbol{\Theta}_P$ as a subset of $\mathbb{R}^P$: call it $\boldsymbol{\Theta}_P$-genericity.
As in Forni et al. (2015), the way the fundamental result of Anderson and Deistler (2008a) is presented here is geared towards its general dynamic factor model application and slightly differs from the original formulation. Rather than a rational spectrum of order P (P unspecified but finite), the latter assumes a state space representation with finitely many parameters. For the sake of simplicity, the formulation below also slightly differs from the statements in Sections 3 and 4 of Forni et al. (2015), where more general and complete results can be found.
Proposition 1. Let $\mathbf{y}_t$ be a tall process satisfying (9) for some rational filter $\mathbf{c}(L)$ of order P. Then, for $\boldsymbol{\Theta}_P$-generic values of the parameter (P unspecified), $\mathbf{y}_t$ admits, for some $K \in \mathbb{N}$, a VAR(K) representation of the form
$$\mathbf{A}(L)\,\mathbf{y}_t = \mathbf{R}\,\mathbf{w}_t, \tag{10}$$
where $\mathbf{A}(L)$ is $Q \times Q$ and $\mathbf{R}$ is $Q \times q$.

Refinements of this result can be found in Anderson and Deistler (2009), Chen et al. (2011), and Anderson et al. (2012), where some of its consequences are also discussed. The most elegant proof is found in Anderson et al. (2016).
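To make the proposition concrete, here is a toy simulation (ours; the particular rational filter is an arbitrary choice) with Q = 2 and q = 1: a VAR(2) fitted by ordinary least squares recovers a finite autoregressive representation whose fitted innovation covariance is (numerically) of rank one.

```python
# A toy simulation (ours) of the Anderson-Deistler phenomenon: a "tall" rational
# process (Q = 2 series driven by a single, q = 1, white noise) admits a
# finite-order VAR with singular innovations. For the filter chosen here,
#   y1_t = u_t + 0.5 u_{t-1},   (1 - 0.7 L) y2_t = u_t,
# one checks by hand that
#   y1_t = 0.5 y2_{t-1} - 0.35 y2_{t-2} + u_t,   y2_t = 0.7 y2_{t-1} + u_t,
# i.e., a VAR(2) whose innovation vector has rank one.
import numpy as np

rng = np.random.default_rng(3)
T = 100_000
u = rng.standard_normal(T)

y1 = np.zeros(T)
y2 = np.zeros(T)
for t in range(1, T):
    y1[t] = u[t] + 0.5 * u[t - 1]
    y2[t] = 0.7 * y2[t - 1] + u[t]
Y = np.column_stack([y1, y2])

# OLS fit of a VAR(2): regress Y_t on (Y_{t-1}, Y_{t-2})
Z = np.column_stack([Y[1:-1], Y[:-2]])
target = Y[2:]
coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
resid = target - Z @ coef

print(np.round(coef.T, 2))     # rows ~ [[0, 0.5, 0, -0.35], [0, 0.7, 0, 0]]
print(np.round(np.linalg.eigvalsh(np.cov(resid.T)), 3))
# eigenvalues ~ [0, 2]: the fitted innovation covariance has rank one
```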
Anderson and Deistler (2008a) only briefly mention, without entering into details, the relevance of their result for the general dynamic factor model (with dynamic loadings as in (7)). That relevance is further discussed in Anderson and Deistler (2008b, 2009), Deistler et al. (2010, 2011), and Forni and Lippi (2011); it is fully exploited in Forni et al. (2015, 2017, 2018), Barigozzi et al. (2021a, 2021b), and several subsequent papers. The same result also has important consequences in the macroeconomic applications of factor models and the estimation and identification of structural VARs: see Forni et al. (2020).
The relevance of the above result for the general dynamic factor model stems from the fact that, by definition, for any $N > q$, the N-dimensional vector $\boldsymbol{\chi}^{(N)}_t := (\chi_{1t}, \ldots, \chi_{Nt})'$ of common components is tall, with $Q = N$ and the q-dimensional white noise $\mathbf{u}_t := (u_{1t}, \ldots, u_{qt})'$ playing the role of the white noise in (9). Under the very mild additional assumption that the filter loading the shocks in (7) is rational (for some unspecified m and p), the above proposition thus applies.
Suppose, for convenience (and without loss of generality), that $N = m(q + 1)$, where $m \in \mathbb{N}$, and write $\boldsymbol{\chi}^{(N)}_t$ as
$$\boldsymbol{\chi}^{(N)}_t = \big(\boldsymbol{\chi}^{(1)\prime}_t, \ldots, \boldsymbol{\chi}^{(m)\prime}_t\big)',$$
where $\boldsymbol{\chi}^{(k)}_t$, $k = 1, \ldots, m$, are $(q + 1)$-dimensional blocks. Assuming the rationality of all the $\boldsymbol{\chi}^{(k)}_t$’s, the Anderson–Deistler result applies to each block, so that
$$\mathbf{A}^{(k)}(L)\,\boldsymbol{\chi}^{(k)}_t = \mathbf{R}^{(k)}\,\mathbf{w}_t, \qquad k = 1, \ldots, m,$$
that is, generically (genericity here is meant for any fixed N), for some $K$, $\mathbf{A}^{(N)}(L)$, and $\mathbf{R}^{(N)}$, we have for $\boldsymbol{\chi}^{(N)}_t$ the block-diagonal VAR representation
$$\mathbf{A}^{(N)}(L)\,\boldsymbol{\chi}^{(N)}_t = \mathbf{R}^{(N)}\,\mathbf{w}_t$$
with $(q + 1) \times (q + 1)$ blocks $\mathbf{A}^{(k)}(L)$, $k = 1, \ldots, m$, an $N \times q$ matrix $\mathbf{R}^{(N)} := (\mathbf{R}^{(1)\prime}, \ldots, \mathbf{R}^{(m)\prime})'$, and a q-dimensional white noise $\mathbf{w}_t$; it can be shown, moreover, that $\mathbf{w}_t = \mathbf{H}\,\mathbf{u}_t$ for some $q \times q$ orthogonal matrix $\mathbf{H}$.
As a consequence,
$$\mathbf{A}^{(N)}(L)\,\mathbf{X}^{(N)}_t = \mathbf{R}^{(N)}\,\mathbf{w}_t + \mathbf{A}^{(N)}(L)\,\boldsymbol{\xi}^{(N)}_t, \tag{11}$$
where, being a linear transformation of idiosyncratic components, $\mathbf{A}^{(N)}(L)\,\boldsymbol{\xi}^{(N)}_t$ itself is idiosyncratic. This is a static factor model (static contemporaneous loadings) for the filtered series $\mathbf{A}^{(N)}(L)\,\mathbf{X}^{(N)}_t$: traditional principal components, thus, which do not involve filters, can be used instead of dynamic ones in the estimation of (11) once consistent estimators $\hat{\mathbf{A}}^{(N)}(L)$ are substituted for the unspecified filters $\mathbf{A}^{(N)}(L)$. Such estimators can be constructed from the decomposition of the estimated spectral density followed by inverse Fourier transforms and Yule–Walker VAR estimation in dimension $(q + 1)$: see Forni et al. (2015, 2017) for details.
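The following heavily simplified sketch (ours, not the authors’ code; the lag-window estimator, the VAR order K = 1, the block layout, and all tuning constants are arbitrary simplifications) strings these steps together, mainly to show that every operation involved is one-sided: no future observations are used at any point.

```python
# A much-simplified sketch (ours) of the one-sided GDFM estimation strategy:
# (i) estimate the spectral density of the panel, (ii) keep its q largest dynamic
# eigenvalues to get the common spectral density, (iii) recover common-component
# autocovariances by inverse Fourier transform, (iv) fit (q+1)-dimensional block
# VAR(1)s by Yule-Walker, (v) run ordinary PCA on the filtered series as in (11).
import numpy as np

def gdfm_one_sided(X, q, M=None, n_freq=101):
    """X is a T x N panel (rows = time); assumes, for simplicity, N is a multiple of q+1."""
    T, N = X.shape
    X = X - X.mean(0)
    M = M if M is not None else int(np.sqrt(T))
    # (i) lag-window spectral density on a frequency grid
    Gam = {k: X[k:].T @ X[:T - k] / T for k in range(M + 1)}
    freqs = 2 * np.pi * np.arange(n_freq) / n_freq - np.pi
    S = np.zeros((n_freq, N, N), dtype=complex)
    for j, th in enumerate(freqs):
        S[j] = Gam[0] / (2 * np.pi)
        for k in range(1, M + 1):
            w = 1 - k / (M + 1)
            S[j] += w * (Gam[k] * np.exp(-1j * k * th)
                         + Gam[k].T * np.exp(1j * k * th)) / (2 * np.pi)
    # (ii) common spectral density: projection on the q leading dynamic eigenvectors
    S_chi = np.zeros_like(S)
    for j in range(n_freq):
        lam, P = np.linalg.eigh(S[j])
        S_chi[j] = (P[:, -q:] * lam[-q:]) @ P[:, -q:].conj().T
    # (iii) autocovariances of the common components: Gamma_chi(k) = int S_chi(th) e^{ik th} dth
    dth = 2 * np.pi / n_freq
    def Gam_chi(k):
        return np.real((S_chi * np.exp(1j * k * freqs)[:, None, None]).sum(0) * dth)
    G0, G1 = Gam_chi(0), Gam_chi(1)
    # (iv) block-diagonal VAR(1) by block Yule-Walker: A_block = Gamma_chi(1) Gamma_chi(0)^{-1}
    A = np.zeros((N, N))
    for b in range(0, N - q, q + 1):
        idx = slice(b, b + q + 1)
        A[idx, idx] = G1[idx, idx] @ np.linalg.inv(G0[idx, idx])
    # (v) filter and run static PCA on Z_t = X_t - A X_{t-1} (one-sided: past values only)
    Z = X[1:] - X[:-1] @ A.T
    lam, P = np.linalg.eigh(Z.T @ Z / len(Z))
    shocks = Z @ P[:, -q:]             # estimated common shocks, up to rotation and scale
    return A, shocks
```

On a panel simulated from a rational GDFM with q known, `gdfm_one_sided(X, q)` returns one-sided estimates of the VAR filter and of the common-shock space; the papers cited above treat the choice of the VAR order, the kernel and bandwidth, and the determination of q itself with far more care.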
Based on this, Forni and Lippi (2011) and Forni et al. (2015, 2017) propose a winning strategy for a consistent one-sided reconstruction of the common components $\chi_{it}$ (hence, also, of the idiosyncratic components $\xi_{it}$), their impulse-response functions, the common shocks $u_{kt}$, the loading filters $b_{ik}(L)$, etc. The corresponding asymptotic distributions are derived in Barigozzi et al. (2021b). Surprisingly, the consistency rates are comparable to those obtained by Bai (2003) for the static method (without the preliminary filtering (11))—the validity of which, however, requires the much more stringent assumptions of the static model (8). The same results also apply in the identification and estimation of volatilities (Barigozzi and Hallin 2016, 2017, 2019), of time-varying GDFMs (Barigozzi et al. 2021a), and in the prediction of conditional variances, values at risk, and expected shortfalls (Hallin and Trucíos 2022; Trucíos et al. 2022).
The (asymptotic) validity of this strategy, of course, requires (11) to hold. This has to be assumed. Since, however, it holds generically, that assumption is extremely mild.
Numerical exercises (both Monte-Carlo and empirical) demonstrate the forecasting superiority of the resulting method, which seems to outperform all of the other methods proposed in the literature while remaining valid under much milder and more general assumptions on the data-generating process.
Forni et al. (2018) show that, even when the assumptions (8) of static loadings are satisfied, the GDFM method still performs better than the static one.
Barigozzi et al. (2021b) finalize the study of its asymptotic properties by establishing the corresponding asymptotic distributional results.
Manfred Deistler, thus, besides his many contributions to the mathematical foundations of time-series analysis, can be credited for unlocking the applicability of the general dynamic factor model with dynamic loadings instead of the more restrictive, less performant, and less parsimonious model with static loadings—thereby turning a beautiful but hardly applicable theoretical model into a fully operational and effective statistical tool, of central importance in contemporary applied time-series econometrics.