Manfred Deistler and the General-Dynamic-Factor-Model Approach to the Statistical Analysis of High-Dimensional Time Series

Abstract: For more than half a century, Manfred Deistler has been contributing to the construction of the rigorous theoretical foundations of the statistical analysis of time series and more general stochastic processes. Half a century of unremitting activity is not easily summarized in a few pages. In this short note, we chose to concentrate on a relatively little-known aspect of Manfred's contribution that nevertheless had quite an impact on the development of one of the most powerful tools of contemporary time-series analysis and econometrics: dynamic factor models.


Introduction
Manfred Deistler is justly famous for his landmark contribution to the theoretical foundations of time series analysis. In this short note, however, we deliberately chose to focus on a lesser-known aspect of his activity, which nevertheless had quite an impact on the theory and practice of one of the most powerful tools of contemporary time-series analysis and econometrics: dynamic factor models.
Dynamic factor models (as we shall see in Section 3 below, this traditional terminology is somewhat of a misnomer: the general dynamic factor model (6) and (7), of which other factor models are particular cases, follows as a representation result rather than constituting a statistical model, and does not necessarily involve factors; see Hallin and Lippi (2014)) were developed, mostly in the econometric literature, as a response to the need to analyze and forecast time series in high dimension. Increasingly often, datasets of econometric interest take the form

X_{N,T} := {X_{it} | i = 1, ..., N; t = 1, ..., T}    (1)

of a large number N of time series observed over a period of time T: the finite N × T realization of the double-indexed process X := {X_{it} | i ∈ N, t ∈ Z}, with arbitrarily intricate cross-sectional and serial dependence structures.
Even for moderate values of N, the traditional (parametric) methods of multivariate time-series analysis run into the theoretical and numerical problems related to the curse of dimensionality. The need for an alternative approach became evident in the late 1970s, leading to the first factor model proposals by Sargent and Sims (1977), Geweke (1977), Chamberlain (1983), and Chamberlain and Rothschild (1983). These four papers can be considered early forerunners of the modern literature on factor models, a literature that had a new start in the early 2000s with four papers, essentially, that triggered most subsequent developments: Forni et al. (2000), Bai and Ng (2002), and Stock and Watson (2002a, 2002b). Chamberlain (1983) and Chamberlain and Rothschild (1983) were particularly influential as an early example of high-dimensional time-series asymptotics where both the dimension N and the series length T tend to infinity.
Econometricians, of course, were not the only ones facing inference problems in high-dimensional spaces. Interestingly, a couple of years later, mathematical statisticians, in the more restricted context of Gaussian i.i.d. observations (a very particular case of time series), independently adopted a somewhat different approach, leading to the so-called spiked covariance model. We show here how spiked covariance models and factor models, while sharing some common features, nevertheless differ on an essential point, and we explain why factor models are both more general and statistically more successful.
Finally, we show how Manfred Deistler, by providing the missing final piece of the general dynamic factor model jigsaw, made a decisive impact in this area-an impact that deserves to be better known.
Outline of the paper. Section 2 deals with the spiked covariance model developed in the probability and mathematical statistics literature. This model, indeed, is somewhat similar to the factor-model approach, with an essential difference that helps understand the benefits of the latter. Section 3 features a brief history of the factor-model approach and introduces the general dynamic factor model (GDFM). Section 4 highlights the importance of Manfred Deistler's contribution to the ultimate development of the GDFM methodology.

Spiked Covariance Models: A Needle in a Growing Haystack
While econometricians were facing the time-series version of high-dimensional observations and the curse of dimensionality, mathematical statisticians were also dealing with high-dimensional asymptotics, in the more restricted framework of i.i.d. samples where only cross-sectional dependencies are present. Interestingly, while sharing some common features with factor models, the models they developed lead to strikingly different conclusions.
Most of the probability and statistics literature in the area revolves around the so-called spiked covariance models, a terminology that was coined, apparently, by Johnstone (2001). In that model, which has attracted much interest in recent years, the observation is of the form X_{N,T} := {X_{it} | i = 1, ..., N; t = 1, ..., T}, where the X_t := (X_{1t}, ..., X_{Nt})′, t = 1, ..., T are i.i.d. N(0, C) with, denoting by S^{N−1} the unit sphere in R^N,

C = C_N := I_N + Σ_{k=1}^q λ_k v_k^{(N)} v_k^{(N)′},    λ_1 ≥ ... ≥ λ_q > 0,    (2)

for some unspecified v_k^{(N)} ∈ S^{N−1}, k = 1, ..., q; asymptotics are taken as N and T tend to infinity in such a way that N/T → κ, the so-called phase transition threshold. To simplify the discussion, let q = 1, that is,

C_N = I_N + λ v_N v_N′,    v_N ∈ S^{N−1},  λ > 0.    (3)

That model leads to a number of mathematically beautiful but statistically puzzling asymptotic results: the sample covariance eigenvalues pack together, filling the support of the Marchenko-Pastur density; the distribution of any finite number of centered and normalized largest sample covariance eigenvalues converges to the multivariate Tracy-Widom law, irrespective of the values of (1 + λ) in [1, κ); and the sequence of distributions of X_t under λ = 0 (no spike) is contiguous to the corresponding sequence with 0 < λ < κ, albeit with contiguity rate n^0 = 1, which, in particular, precludes consistent estimation of λ (see Onatski et al. 2013, 2014 for details). The statistical value of such results is, to say the least, somewhat limited, all the more so that in practice N = N_0 and T = T_0 do not tend to infinity, so that the value of κ, the role of which is crucial, is completely arbitrary and bears no relation to the observed sample. (The value of the actual ratio N_0/T_0 is usually chosen for want of anything better.) That spiked covariance literature, thus, has little to offer to econometricians, who have to produce forecasts and, moreover, are facing serially dependent and mostly non-Gaussian observations.
These intriguing results all are due to the choice of the asymptotic scheme itself. Recall that asymptotics are a mathematical fiction by which limiting results (as N and T tend to infinity) are expected to provide an approximation to the actual fixed-(N = N_0, T = T_0) problem. The scenario of that mathematical fiction is not in the data and entirely depends on the statistician's choice. That choice, thus, should aim at optimizing the quality of the approximation and is not meant to describe any actual real-world situation: a scenario under which the "cross-sectional future" resembles the actual observation is likely to achieve that objective much better than a "worst-case" one. In traditional time-series asymptotics with fixed dimension (N = N_0), where only T → ∞, stationarity with respect to t is the reasonable and usual choice. While the specification, for N > N_0, of a fictitious yet sensible "cross-sectional future" is more delicate, the choice leading to the spiked covariance model (2) and (3) definitely has the flavor of a "catastrophe scenario", which is unlikely to provide a good approximation to the finite-dimensional, finite-sample situation.
Below are two types of data-generating processes (two sequences of unit vectors v_N) leading to the single-spiked-covariance model, that is, to the N × N covariance matrix (3):

(a) v_N := (1, 0, ..., 0)′, under which X_{1t} is white noise with variance 1 + λ while X_{2t}, ..., X_{Nt} are standard white noises;

(b) v_N := N^{−1/2}(1, 1, ..., 1)′, under which X_{it} = χ_{it} + ξ_{it} with χ_{it} = (λ/N)^{1/2} u_t, where u_t and the ξ_{it} are mutually independent standard white noises.

Under (a), a bounded spike λ (justifying the spiked terminology) is "hidden" under a growing number N − 1 of uninformative white noises: a finite needle buried in an ever-growing haystack. Growing N clearly does not provide any information (only growing T does). The fact that the needle goes undetected when its size λ is small relative to the asymptotic value κ of the ratio N/T (the larger that ratio, the faster the haystack grows) thus is hardly surprising.
Model (b) takes the form of a factor model decomposition, with a cross-sectionally pervasive "common shock" u_t loaded by all components X_{it} with loadings (λ/N)^{1/2} tending to zero as N → ∞, and an idiosyncratic ξ_{it}, which is Gaussian white noise. While cross-sectionally pervasive, however, u_t is not loaded strongly enough for the largest eigenvalue of C_N, which is 1 + λ, to diverge as N → ∞.
The situation, however, improves dramatically if the size of the needle grows with the dimension N. Letting χ_{it} be of the form

χ_{it} = (λ/N^{1−δ})^{1/2} u_t,    (4)

with δ ∈ (0, 1) arbitrarily small (loadings (λ/N^{1−δ})^{1/2} still tending to zero as N → ∞, though at a slightly slower rate), C_N's largest eigenvalue is 1 + N^δ λ, which tends to infinity as N → ∞. With χ_{it} of the form

χ_{it} = (cλ)^{1/2} u_t    (5)

(c > 0 arbitrarily small), the loadings are (cλ)^{1/2} and no longer tend to zero as N → ∞; C_N's largest eigenvalue is 1 + cλN, which tends linearly to infinity as N → ∞.
All of the problems then disappear: with such loadings, (b) yields a (very) special case of the dynamic factor models developed by econometricians and described in Section 3, where consistent estimation of the loadings, hence of λ, is possible. Assuming (b) with χ_{it} of the form (4) or (5) instead of the original formulation (2) and (3) is clearly tantamount to adopting alternative asymptotic scenarios, as it only modifies the postulated but not-to-be-observed form of the "cross-sectional future" (namely, the form of the cross-sectional components with index i ≥ N + 1), which is an arbitrary choice of the statistician. Under these alternative scenarios, the spike λ = λ_N, say, keeps growing with N (viz., λ_N = λN^δ under (4), λ_N = cλN under (5)), thus balancing the impact of a growing dimension N: the needle now grows, be it arbitrarily slowly, with the haystack. The asymptotic scenario (5) and, in fact, any scenario of loadings leading to a linearly exploding largest eigenvalue of C_N (all other eigenvalues remaining bounded) is particularly appealing, as it consists of assuming that the fictitious "cross-sectional future" resembles the observed "cross-sectional past": a form of cross-sectional stability that is much more likely to provide a good approximation of the finite-(N, T) problem under study than the classical spiked-covariance-model asymptotics underlying (2) and (3).
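These eigenvalue computations are easy to check numerically. The following minimal numpy sketch (with hypothetical parameter values λ = 2, c = 0.5, δ = 0.5, not taken from the text) contrasts the bounded largest eigenvalue of C_N under (b) with the diverging ones under (4) and (5):

```python
import numpy as np

def largest_eigenvalue(loadings):
    """Largest eigenvalue of C_N = I_N + b b' for a loading vector b."""
    N = loadings.size
    C = np.eye(N) + np.outer(loadings, loadings)
    return np.linalg.eigvalsh(C)[-1]   # eigvalsh returns eigenvalues in ascending order

lam, c, delta = 2.0, 0.5, 0.5          # hypothetical parameter values
for N in (100, 400):
    b_spiked = np.full(N, np.sqrt(lam / N))               # (b): largest eigenvalue 1 + lam
    b_slow = np.full(N, np.sqrt(lam / N ** (1 - delta)))  # (4): 1 + lam * N**delta
    b_const = np.full(N, np.sqrt(c * lam))                # (5): 1 + c * lam * N
    print(N, largest_eigenvalue(b_spiked),
          largest_eigenvalue(b_slow), largest_eigenvalue(b_const))
```

Under (b) the largest eigenvalue stays at 1 + λ = 3 for both values of N, while under (4) and (5) it grows with N: the needle growing with the haystack.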

Dynamic Factor Models: Pervasive Needles and the Blessing of Dimensionality
As mentioned in the Introduction, econometricians faced the needle-in-the-haystack problem long before probabilists and mathematical statisticians did, long before the expression "big data" was coined. Econometrics requires operational solutions: the asymptotic scenario of a needle that, at the end of the day, cannot be found is somewhat inappropriate in the econometric context. Moreover, economic data are seldom i.i.d. Gaussian; they generally are serially auto- and cross-correlated, and often heavy-tailed (often yielding infinite fourth-order moments). An econometric theory of high-dimensional time series therefore needs to address much more general situations than those covered by Gaussian spiked covariance models.
The econometric theory of factor models for high-dimensional observations arose from that need and takes various forms; the spirit of the approach, as initiated by Chamberlain (1983) and Chamberlain and Rothschild (1983), is easily explained, however, from a multi-spiked (q spikes) extension of (b) yielding q exploding eigenvalues λ_{1;N}, ..., λ_{q;N}: q needles which, in a sense, are growing with the haystack.
Under their most general form, the so-called general dynamic factor model (the terminology generalized dynamic factor model is used equivalently) or GDFM introduced in Forni et al. (2000), factor models proceed as follows. The observation is still of the form X_{N,T} described in (1), but the assumptions are much more general, allowing for serial auto- and cross-correlations and non-Gaussian densities. Assuming that they exist, denote by Σ_N(θ), θ ∈ [0, π], the N × N spectral density matrices of the N-dimensional processes {X_{N,t} := (X_{1t}, ..., X_{Nt})′ | t ∈ Z}. These density matrices are nested as N → ∞. Some eigenvalues, q of them, are (θ-a.e.) exploding, while the other ones remain (θ-a.e.) bounded as N → ∞. This yields a decomposition of Σ_N(θ) into

Σ_N(θ) = Σ^χ_N(θ) + Σ^ξ_N(θ),

where Σ^χ_N(θ), called the common spectral density, has reduced rank q and q diverging dynamic eigenvalues (the terminology dynamic eigenvalues, used for the eigenvalues of a spectral density matrix, was coined by Brillinger (1964, 1981), who introduced the concept). As for Σ^ξ_N(θ), called the idiosyncratic spectral density, it only has bounded dynamic eigenvalues. That decomposition of Σ_N(θ) in turn induces a decomposition of the observations X_{it} into

X_{it} = χ_{it} + ξ_{it},    (6)

where χ_{it}, with spectral density Σ^χ_N(θ), and ξ_{it}, with spectral density Σ^ξ_N(θ), are mutually orthogonal at all leads and lags; χ_{it} is called the common component, ξ_{it} the idiosyncratic component. Since the spectral density Σ^χ_N(θ) has reduced rank q, the common component χ_{it} is driven by q << N mutually orthogonal white noises:

χ_{it} = B_i(L) u_t,    u_t := (u_{1t}, ..., u_{qt})′,    (7)

where B_i(L) is a 1 × q filter (L, as usual, stands for the lag operator), while the idiosyncratic component ξ_{it}, having bounded dynamic eigenvalues, is only mildly cross-correlated (it can be strongly autocorrelated, though); see Hallin and Lippi (2019) or Lippi et al. (2022) for recent surveys. Further constraints can be imposed on the decomposition.
Among them is the static loading assumption (Stock and Watson 2002a, 2002b; Bai and Ng 2002; Bai 2003, and many others)

χ_{it} = B_i′ u_t    (8)

(B_i here is a q × 1 real vector), under which the shocks u_t are loaded in a static and contemporaneous way, while in (7) the loadings are filters B_i(L) and the shocks are loaded in a dynamic way (involving lagged shock values). As soon as the spectral density Σ_N(θ) exists and admits a finite number q of exploding eigenvalues, a GDFM representation (6) with dynamic loadings (7) exists (although an infinite number of exploding eigenvalues is not impossible in theory, such cases are extremely artificial and contrived: see Hallin and Lippi (2014) for an example). The existence of a factor model decomposition (6) with static loadings (8), however, is a strong assumption one would like to avoid. Multivariate economic time series, let alone infinite-dimensional ones, typically involve leading and lagging series, loading the common shocks with various leads and lags. The GDFM, which allows for this and basically does not place any restrictions on the data-generating process, is much preferable in that respect.
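The reduced-rank structure of the common spectral density behind (6) and (7) can be illustrated numerically. The sketch below is a toy example (hypothetical one-lag loading filters B_i(L) = b_{i0} + b_{i1} L with q = 1, not taken from the cited papers): it builds the population spectral density of the common component and checks that, at a given frequency, it has a single nonzero dynamic eigenvalue, which is large for large N, all other dynamic eigenvalues being zero.

```python
import numpy as np

def common_spectral_density(b0, b1, theta):
    """Population spectral density of chi_it = (b0_i + b1_i L) u_t at frequency theta."""
    B = b0 + b1 * np.exp(-1j * theta)           # N-vector of transfer functions B_i(e^{-i theta})
    return np.outer(B, B.conj()) / (2 * np.pi)  # rank-one Hermitian N x N matrix

rng = np.random.default_rng(0)
N, theta = 200, 1.0
b0, b1 = rng.normal(size=N), rng.normal(size=N)  # hypothetical filter coefficients
S_chi = common_spectral_density(b0, b1, theta)
dyn_eig = np.linalg.eigvalsh(S_chi)              # "dynamic eigenvalues" at frequency theta
print(dyn_eig[-1], abs(dyn_eig[-2]))             # one large eigenvalue, the rest zero
```

With generic loadings, the top dynamic eigenvalue is of order N, while the second-largest is zero up to rounding: the q = 1 exploding-eigenvalue pattern that defines the GDFM.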
The GDFM was introduced by Forni et al. (2000), who establish its asymptotic identifiability (the "blessing of dimensionality") and propose a consistent estimation strategy (consistency here is not uniform in t, though: see Forni et al. (2000) for details) based on Brillinger's concept of dynamic principal components.
The moot point with dynamic principal components, however, is that their computation involves two-sided filters, i.e., both the past and the future values of the observed X_{it}'s. This is fine in the "center" of the observation period but not at its edges. In particular, the estimation of u_T, for forecasting purposes, is likely to be poor irrespective of N and T, as the future observations X_{N,T+1}, X_{N,T+2}, . . . are not available.
The advantage of static loadings is that, under (8), q coincides with the number of exploding "static" eigenvalues: the eigenvalues of the N × N covariance matrix C_N of X_{N,T}. Additionally, Bai and Ng (2002) and Stock and Watson (2002a, 2002b) propose an estimation method relying on a traditional principal component analysis of X_{N,T}. These principal components, at any given time t, only require the contemporaneous observation X_{N,t} := (X_{1t}, ..., X_{Nt})′: no problems, thus, for t = T and forecasting issues. This, and the popularity of traditional principal component methods, explains why practitioners, somewhat regrettably, prefer the static contemporaneous loading approach despite its lesser generality, its lack of parsimony, and the fact that the crucial and quite restrictive underlying assumption (8) may not hold. This latter objection is often dismissed by arguing that lagged values of the factors can be incorporated into a static loading scheme via stacking. That argument, besides the fact that stacking may very severely inflate the number of factors, is flawed: there is no guarantee that these lagged values enjoy the pervasiveness properties required from static factors, with the consequence that, in the traditional principal component estimation method, they get lost to the idiosyncratic component.
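A minimal simulation (with hypothetical parameter values; a sketch, not the cited papers' actual implementation) illustrates why the contemporaneous principal-component approach is so attractive when (8) does hold: with one pervasive static factor, the top principal component recovers the common shock from contemporaneous data alone.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 50, 500
b = rng.normal(loc=2.0, scale=1.0, size=N)     # pervasive static loadings (hypothetical)
u = rng.normal(size=T)                         # common shock u_t
X = np.outer(b, u) + rng.normal(size=(N, T))   # X_it = b_i u_t + xi_it

C_hat = X @ X.T / T                            # N x N sample covariance
_, eigvec = np.linalg.eigh(C_hat)
b_hat = eigvec[:, -1]                          # top "static" principal direction
u_hat = b_hat @ X                              # factor estimate: contemporaneous data only

corr = abs(np.corrcoef(u_hat, u)[0, 1])
print(round(corr, 3))                          # close to 1: the common shock is recovered
```

No filtering, hence no edge problem at t = T; but the construction stands or falls with assumption (8).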
Up to this point, the general dynamic factor model, in view of its generality, is a brilliant idea, the practical implementation of which apparently is blocked, hopelessly, by the two-sided nature of Brillinger's dynamic principal components. Here is, however, where Manfred Deistler enters the picture.

Manfred Deistler and the General Dynamic Factor Model
Manfred Deistler's seminal contribution to the analysis of general dynamic factor models originates in his interest in the properties of reduced-rank processes. The following result, unrelated at first sight to factor models and high-dimensional time series, follows from Anderson and Deistler (2008a).
Anderson and Deistler call tall the transfer function D(L) of a Q-dimensional process {Y_t | t ∈ Z} driven by a q-dimensional white noise {w_t | t ∈ Z} where q < Q, namely, a process of the form

Y_t = D(L) w_t,    (9)

where D(L) is some Q × q filter. By abuse of language, we say that the process {Y_t} itself is tall.
Recall that a process satisfying (9) with D(L) = [D_{ij}(L)] is called rational if the filter D(L) itself is rational, that is, if Q × q matrix filters [E_{ij}(L)] and [F_{ij}(L)] and integers m and p exist such that, for all i = 1, ..., Q and j = 1, ..., q, F_{ij}(0) = 1, the degree of E_{ij}(L) is m, the degree of F_{ij}(L) is p, and D_{ij}(L) = E_{ij}(L)/F_{ij}(L); note that D(L) then involves a finite (although unspecified) number P = Qq(m + p + 1) of real parameters: call it rational of order P. Also, recall that a subset of a topological space is generic if it contains an open and dense subset. Denoting by Π_P the parameter space (a complete description of Π_P would require the filters F_{ij}(L) and E_{ij}(L) to be stable and to have no common zeroes) indexing the family of rational filters of order P, the genericity below is meant for Π_P as a subset of R^P: call it Π_P-genericity.
As in Forni et al. (2015), the way the fundamental result of Anderson and Deistler (2008a) is presented here is geared towards its general dynamic factor model application and slightly differs from the original formulation. Rather than a rational spectrum of order P (P unspecified but finite), the latter assumes a state space representation with finitely many parameters. For the sake of simplicity, the formulation below also slightly differs from the statements in Sections 3 and 4 of Forni et al. (2015), where more general and complete results can be found.

Proposition 1. Let {Y_t | t ∈ Z} be a tall process satisfying (9) for some rational Q × q filter D(L). Then, for Π_P-generic values of D(L) (P unspecified), Y_t admits, for some K < ∞, a VAR(K) representation of the form

A(L) Y_t = R w_t,    (10)

where A(L) is Q × Q and R is Q × q.
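To make the proposition concrete, here is a toy check on a hypothetical tall process (Q = 2, q = 1, not from the cited papers): for D(L) = (1, 1 + aL)′, a VAR representation of the form (10) holds with K = 1, A(L) = [[1, 0], [−(1 + aL), 1]], and R = (1, 0)′.

```python
import numpy as np

rng = np.random.default_rng(2)
T, a = 500, 0.7
w = rng.normal(size=T)                  # scalar white noise (q = 1)

# Tall process Y_t = D(L) w_t with D(L) = (1, 1 + a L)': Q = 2 > q = 1
Y1 = w.copy()
Y2 = w.copy()
Y2[1:] += a * w[:-1]                    # Y_2t = w_t + a w_{t-1}

# VAR(1) check, A(L) Y_t = R w_t:
# row 1:  Y_1t = w_t                       (the R w_t side)
# row 2:  Y_2t - Y_1t - a Y_{1,t-1} = 0    (the noise is annihilated)
row1 = Y1 - w
row2 = Y2[1:] - Y1[1:] - a * Y1[:-1]
print(np.abs(row1).max(), np.abs(row2).max())   # both zero up to rounding
```

The singular (tall) moving-average process thus inverts into a finite-order autoregression, which is exactly the property exploited in the GDFM application below.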
Refinements of this result can be found in Anderson and Deistler (2009), Chen et al. (2011), and Anderson et al. (2012), where some of its consequences are also discussed. The most elegant proof is found in Anderson et al. (2016). Anderson and Deistler (2008a) only briefly mention, without entering into details, the relevance of their result for the general dynamic factor model (with dynamic loadings as in (7)). That relevance is further discussed in Anderson and Deistler (2008b, 2009), Deistler et al. (2010), and Forni and Lippi (2011); it is fully exploited in Forni et al. (2015, 2017, 2018), Barigozzi et al. (2021a, 2021b), and several subsequent papers. The same result also has important consequences in the macroeconomic applications of factor models and the estimation and identification of structural VARs: see Forni et al. (2020).
The relevance of the above result for the general dynamic factor model stems from the fact that, by definition, for almost all values of i_1, ..., i_{q+1}, the (q + 1)-dimensional vector Y_t := (χ_{i_1,t}, ..., χ_{i_{q+1},t})′ of common components is tall, with Q = q + 1 and the q-dimensional white noise w_t = u_t in (9). Under the very mild additional assumption that {Y_t} is rational (for some unspecified m and p), the above proposition thus applies.
Suppose, for convenience (and without loss of generality), that N = M(q + 1) for some M ∈ N, and write (χ_{1,t}, ..., χ_{N,t})′ as (χ^1_t′, ..., χ^M_t′)′ where χ^k_t ∈ R^{q+1}, k = 1, ..., M. Assuming the rationality of all the χ^k_t's, the Anderson-Deistler result applies, so that, generically (genericity here is meant for any fixed N), the vector χ_{N,t} of common components admits a block-diagonal VAR representation

A_N(L) χ_{N,t} = R_N v_t,

where A_N(L) is block-diagonal with (q + 1) × (q + 1) blocks A^k(L), k = 1, ..., M, R_N is an N × q matrix, and v_t is a q-dimensional white noise; it can be shown, moreover, that v_t = O u_t for some q × q orthogonal matrix O.
As a consequence,

A_N(L) X_{N,t} = R_N v_t + A_N(L) ξ_{N,t},    (11)

where, being a linear transformation of idiosyncratic components, A_N(L) ξ_{N,t} itself is idiosyncratic. This is a static factor model (static contemporaneous loadings) for the filtered series A_N(L) X_{N,t}: traditional principal components, thus, which do not involve filters, can be used instead of dynamic ones in the estimation of (11) once consistent estimators Â^k(L) are substituted for the unspecified filters A^k(L). Such estimators can be constructed from the decomposition of the estimated spectral density, followed by inverse Fourier transforms and Yule-Walker VAR estimation in dimension q + 1: see Forni et al. (2015, 2017) for details. Based on this, Forni and Lippi (2011) and Forni et al. (2015, 2017) propose a winning strategy for a consistent one-sided reconstruction of the common components χ_{it} (hence, also, of the idiosyncratic components ξ_{it}), their impulse-response functions, the common shocks u_t, the loading filters B_i(L), etc. The corresponding asymptotic distributions are derived in Barigozzi et al. (2021b). Surprisingly, the consistency rates are comparable to those obtained by Bai (2003) for the static method (without the preliminary filtering (11)), the validity of which, however, requires the much more stringent assumptions of the static model (8). The same results also apply in the identification and estimation of volatilities (Barigozzi and Hallin 2016), of time-varying GDFMs (Barigozzi et al. 2021a), and in the prediction of conditional variances, values at risk, and expected shortfalls (Hallin and Trucíos 2022; Trucíos et al. 2022).
The (asymptotic) validity of this strategy, of course, requires (11) to hold. This has to be assumed. Since, however, it holds generically, that assumption is extremely mild.
Numerical exercises (both Monte-Carlo and empirical) demonstrate the forecasting superiority of the resulting method, which seems to outperform all of the other methods proposed in the literature while remaining valid under much milder and more general assumptions on the data-generating process. Forni et al. (2018) show that, even when the static loading assumption (8) is satisfied, the GDFM method still performs better than the static one. Barigozzi et al. (2021b) finalize the study of its asymptotic properties by establishing the corresponding asymptotic distributional results. Manfred Deistler, thus, besides his many contributions to the mathematical foundations of time-series analysis, can be credited with unlocking the applicability of the general dynamic factor model with dynamic loadings, instead of the more restrictive, less well-performing, and less parsimonious model with static loadings, thereby turning a beautiful but hardly applicable theoretical model into a fully operational and effective statistical tool of central importance in contemporary applied time-series econometrics.