Linear System Challenges of Dynamic Factor Models

: A survey is provided dealing with the formulation of modelling problems for dynamic factor models, and the various algorithm possibilities for solving these modelling problems. Emphasis is placed on understanding requirements for the handling of errors, noting the relevance of the proposed application of the model, be it for example prediction or business cycle determination. Mixed frequency problems are also considered, in which certain entries of an underlying vector process are only available for measurement at a submultiple frequency of the original process. Certain classes of processes are shown to be generically identiﬁable, and others not to have this property.


Introduction
The purpose of this paper 1 is to survey (and provide commentary on) a collection of contributions to do with Dynamic Factor Models developed especially by the authors, principally the first two, over more than a decade, taking the ideas as far as a listing of some problems of current interest. Matters which we are especially concerned to highlight include 1.
The theoretical underpinning for allowing modelling to focus on AR rather than ARMA modelling, and the simplifications this makes possible for the identification of models.

2.
Issues arising in practice from numerical problems in attempting to build models; some of these errors can be mitigated through the use of carefully chosen forms for the models. 3.
The difficulties arising from modelling time series of different but multiply related periodicities, e.g., monthly and quarterly, and existing tools for resolving, at least partly, the difficulties. These build on the ideas developed for single-periodicity modelling.
Feeding in all these matters is the application of ideas of digital signal processing, for which issues of handling numerical problems, and handling data streams of multiple periodicities have been well studied. The relevance of ideas of digital signal processing to dynamic factor modelling has perhaps not been fully recognized.
Dynamic Factor Models were introduced two decades ago, see in particular Forni et al. (2000), Forni and Lippi (2001), Stock and Watson (2002), Bai and Ng (2002), with macroeconometric applications as their main purpose. The basic idea was that a small number of factors, common to all the time-series of a large macroeconomic dataset, could be used both in structural modeling and forecasting of some series of interest like inflation, industrial production, unemployment. For a typical dataset, see the monthly US macroeconomic panel, including 128 series, described in great detail in McCracken and Ng (2016).
We begin by recalling and commenting on a number of ideas in a recent survey we coauthored, Lippi et al. (2022). The survey confined attention to the modelling of multivariate time series with the same sampling frequency for every component. As foreshadowed above, in this paper, besides dealing with such material, we treat modelling where time series data can be available with different periodicities, provided they are integrally related, e.g., monthly and quarterly.
Throughout the paper, we shall confine attention to stationary processes. We take some time to define what is meant by a dynamic factor model, progressively introducing and commenting on the various assumptions that in aggregate constitute the definition. The comments may be as important as the assumptions. The various assumptions reflect reality to some degree, and they also reflect the requirement to formulate a solvable problem. We also offer commentary on the notion of errors; the comments deal with two issues, first the need to in some way limit the errors involved in determining a process model, given that the model is obtained from sample statistics rather than population data, and second, the need to choose a type of process model that is appropriate to the application: one does not want massive errors in whatever it is desired to compute because of sensitivities intrinsic to the model. An overall conclusion (motivated by the signal processing literature) is that on occasions state-variable models may be preferred to ARMA or AR. Accordingly, some discussion is also offered explaining how state-variable models can be determined without even intermediate use of an ARMA or AR model 2 . Analogously to the result that AR models can often be determined from covariance data using the Yule-Walker equations is an algorithm involving a finite number of rational calculations for obtaining state-variable models of a canonical spectral factor from lagged covariance data. There may well be numerical advantages in using such a state-variable model.
The last part of the paper traces developments applicable to multiple frequency systems, or systems in which data of different periodicities is collected. More precisely, we postulate that there is an underlying stationary zero mean vector time series (y t ) = [(y f t ) (y s t ) ] which is defined at time t = 0, 1, 2, . . . and with y f t (the subvector of "fast" components) observed at all t while y s t (the subvector of "slow" components) is observed at t = 0, J, 2J, . . . for some integer J > 1. The task is to build a canonical spectral factor of the process (y t ) using the measured second-order statistics. The whole set-up can be embedded within a high-dimensional time-series framework of course, but this brief statement encapsulates what is different about the mixed frequency problem.
The conclusions section of the paper sets out a number of issues which have not yet been fully addressed.

Dynamic Factor Models
Dynamic factor models can roughly be described as those associated with a high dimension, say N, vector process (y N t ) (normally taken with zero mean) defined at t = 0, 1, 2, . . . and possessing an additive decomposition with (χ N t ) termed the process of common components or latent variable process, and (ξ N t ) termed the process of idiosyncratic components. Typically, (χ N t ) is regarded as the true process of interest, with (ξ N t ) some kind of contamination intruding into the measurement process ('noise' in electrical engineering parlance, which in general is colored rather than white). However one possible alternative interpretation is that (1) displays the decomposition of stock price observations into a part (χ N t ) representing the comovements between the variables (for instance representing the "market" effect on N different stock returns) and individual movements (ξ N t ) (representing effects associated with each specific firm). It is assumed that the one-dimensional processes (χ it , t ∈ Z), i ∈ {1, . . . , N} are also stationary and strongly dependent across the index i, while the processes (ξ it , t ∈ Z), i ∈ {1, . . . , N} are stationary and weakly cross-dependent (the terms strongly dependent and weakly cross-dependent requiring technical definitions). It is standard to allow for variation in N above some N 0 , with y N t , χ N t and ξ N t being nested, for example y N t is the subvector of y N+1 t formed from its first N entries. Both (χ N t ) and (ξ N t ) are zero mean and the two vector processes are mutually independent. The spectral densities Φ N χ (e jω ), ω ∈ [−π, π] and Φ N ξ (e jω ), ω ∈ [−π, π] are assumed to exist and (y N t ) is then necessarily stationary with These spectral matrices obviously inherit the nesting property of the underlying processes.
We sum up the above description as follows.
Assumption 1. A nested (with N) set of observation processes (y N t ) of zero mean arises as an additive combination of a stationary zero mean nested common component process (χ N t ) and a stationary zero mean nested process of idiosyncratic components (ξ N t ), as in (1). The latter two processes are independent, the first is strongly cross-dependent (a term defined subsequently) and the latter is weakly cross-dependent (also defined subsequently), and the spectra of the three processes are related by (2).
Note that there is no assumption to this point that the spectra are rational, or otherwise finitely parametrized.
To formulate a solvable identification problem, a number of other assumptions are always invoked in considering dynamic factor models, and we will introduce them gradually and attempting a logical progression over the subsequent subsections, albeit with an attempt to justify their relevance.

The Goal of Modelling
Exactly what should be the goal of modelling in science is as much a philosophical as it is a scientific question, with answers often reflecting on the real-world utility of any model. Much the same question arises in connection with economic modelling; the very highlycited paper Sims (1980) seeks to peer behind the mathematical formalism to understand exactly what relation the models of economics have to the real world.
For the purposes of this paper, we will simply assert that a model may be desired for various purposes which may not all be applicable for any one model at a particular time. Two of these for example might be forecasting of future values of a time series, and identification of business cycles. For modelling and especially forecasting purposes, often a canonical (stable, miniphase) spectral factor of Φ N χ (e jω ) is desired. Such a spectral factor is to be constructed using the data y N t with 0 ≤ t ≤ T where T is some finite positive integer. The dimension N may in fact exceed T, indeed theoretical studies focus on letting N rather than T tend to infinity.
To make progress, further assumptions beyond the nesting property are needed. Some of these assumptions bear on the ability to somehow dispose of the contaminating influence of (ξ N t ), and we will deal with them further below, and at that time clarify the strong and weak dependence notions. So assume temporarily that somehow we can work just with (χ N t ).

The Issue of Errors and Continuity
Observe that there are different ways to describe a stable miniphase spectral factor, see e.g., Hannan and Deistler (2012) and Caines (2018) for the relevant linear system theory background. If it has a rational transfer matrix, state-variable descriptions could be used (and then there is freedom to choose a coordinate basis). There is interest especially in those which are of minimal dimension. Again, if the spectral factor is rational, a left or right matrix fraction description could be used (a left matrix fraction description being an ARMA description, subject to rules about leading coefficient matrices). There is interest especially in those where the fraction is coprime (common factors are generally as unhelpful in a matrix transfer function as they are for the numerator and denominator of a scalar transfer function). If it is known to be autoregressive, an AR description may be used. A very different form again is provided by a series of graphical images of the values of the transfer matrix entries as a function of ω; this might even be the most convenient form of model if the model does not have a rational transfer function matrix.
No matter what means are used to construct a model for the process (χ N t ), it is clear that one needs to be certain that a small error in (say) the lagged covariance sequence values (due say to use of sample rather than population statistics) will ultimately flow through to a small error in the quantities obtained in whatever particular application is envisaged when using the model. Since the model is the tool underpinning the application, it is reasonable to seek further assurance that the small error in the lagged covariance values (which in almost all cases are the raw data from which any model is constructed) will also give rise to a small error in the actual model of the minimum phase transfer function, be it frequency domain description, AR, ARMA, or general state-variable. Thus, if the model is being used for prediction, one ultimately wants a small error in the estimation variance of the predictions, and an accurate AR model may consequently be sought (assuming existence of an AR model is guaranteed). On the other hand, and even if an AR model should exist, if the model is being used to look for spectral peaks capturing some oscillatory phenomenon, or to understand the different bandwidths of the processes of interest, one is more likely to want a small error in the frequency domain description of the model, or even the pole positions of the model, as opposed to the AR coefficients.
Evidently then, some kind of a continuity is being sought (small errors in data used for generating the model should give small errors in the ultimate answer in the application for which the model is being used), but evidently also the particular parameters relevant to considering continuity, including continuity within the model description, are application dependent.
The most common sort of continuity in a model to which attention is given is the continuity of coefficients in a finite dimensionally parameterised model. One thinks of the class of systems of interest as collection of subsets, where the subsets are obtained by performing (data driven) model selection or by making various assumptions about integer parameters associated with the dimension of the innovations process, or a minimal state variable realization. However, such thinking may exclude certain systems at the start, and on occasions continuity of a frequency domain plot may be of more interest than continuity of a parameter in a model of some assumed order.
Because so much work in econometrics is linked to forecasting, there is a justified preference for using (when they exist) autoregressive models. The crucial part of the modelling to get right is to have accurate AR coefficients, which (in the event that Yule-Walker equations are used to derive the coefficients) probably requires a well-conditioned and thus more easily invertible matrix of lagged covariance coefficients, in addition to some suitable level of accuracy in the lagged covariance coefficients themselves. [More precisely, since on occasions a singular matrix may appear in the Yule-Walker equations, one will want a modified form of well-conditioning as indicated below.] One should note that the conditioning of the matrix in question can be related to an important property of the associated multivariate spectrum. Let ν max denote the maximum over ω of the maximum eigenvalue of the Hermitian matrix Φ N χ (e jω ), and let ν min denote the minimum over ω of the smallest nonzero eigenvalue of Φ N χ (e jω ). (Zero eigenvalues arise if the matrix is singular and these are for this purpose discarded; it is assumed the number of zero eigenvalues is the same for all ω). The condition number of the block Toeplitz matrix inverted in a Yule-Walker calculation is known to approach ν max /ν min as the size of the matrix goes to infinity, Miranda and Tilli (2000); Serra (1999), with modification in the singular case. Thus, ill-conditioning in a Yule-Walker calculation may be an automatic consequence of the nature of the spectrum being modelled.
In a sense, if we use an information criterion such as AIC for estimating the AR order p, such a problem is unlikely to occur Shibata (1980).
In contrast however to the use of a model for prediction, if bandwidth determination and oscillatory phenomena or accuracy of pole positions in the model are issues of concern, it is a well-established, indeed now old, fact in digital signal processing that autoregressive (and ARMA) models (known often in the relevant literature as direct models) can be very dangerous, in the sense that pole positions and frequency peaks can be extremely sensitive to the AR coefficients, e.g., Mitra (2011); Rabiner and Gold (1975). MATLAB (Signal Processing Toolbox) even allows the determination of "lattice-form" state-variable realizations of a prescribed transfer function that will have low or even minimum sensitivity in the frequency response to truncation or round-off errors associated with the coefficients in the realization, or minimum round-off errors in calculations using the coefficients in the realization to compute outputs from inputs.
Using an AR representation for the canonical spectral factor will likely be a bad idea in this situation: the numerical parameters in the model may need to be known to an unreasonable degree of accuracy. At the same time, as the signal processing literature indicates, there may be a state-variable model for which much less demanding accuracy requirements apply to the numerical parameters in the model, in order to learn from the model what is desired.
Note further that if multi-step prediction is in contemplation, rather than one-stepahead prediction, recursion using AR coefficients will also be risky for those systems with pole positions which are very sensitive functions of the AR coefficients. In the event though that prediction was required for a certain fixed number of steps, h say, ahead, as opposed to an interval extending some distance into the future, one could potentially avoid recursion by directly estimating an h-step prediction model by minimizing the h-step prediction errors by least squares. See for example Bhansali and Ghosh (1999), which distinguishes the two approaches with the names "plug-in method" and "direct method".
To sum up then, we must be concerned about two things: 1.
If there is a true underlying process with an associated canonical spectral factor, and a canonical spectral factor obtained using approximate values of lagged covariance coefficients, there will be an error between the two. 2.
We should use a description of the computed canonical spectral factor that is appropriate to the application, in the sense that the error mechanisms in the relevant computations will not cause significant errors in the quantities computed as part of the application.
In connection with the first issue of continuity of the canonical spectral factor frequency response, reference Anderson (1985) provides a sufficient condition for continuity: the derivative (with respect to ω) of Φ N χ (e jω ) must be bounded, to ensure that a small L ∞ error in the spectrum gives rise to a small L ∞ error in the canonical spectral factor frequency response. To relate this to a covariance sequence condition, observe that Formal differentiation yields and by the Weierstrass M-test, uniform convergence in ω will occur provided A sufficient condition ensuring this property is incorporated in the following: Assumption 2. The derivative (with respect to ω) of Φ N χ (e jω ) is bounded, which is guaranteed if the covariance sequence E[χ N t+s (χ N t ) ] goes to zero faster than (1/s) α for any α > 2.
As explained in Anderson (1985), in the absence of a derivative bound, an arbitrarily small variation of a scalar spectrum can produce an arbitrary phase shift anywhere in (0, π) in the associated spectral factor (though the spectral factor magnitude is obviously simply given by taking the square root of the spectrum value at each frequency, and so continuity applies to the magnitude). Evidently, without the Assumption, the spectral factorization problem is almost ill-posed for any real world situation.
As noted above, this assumption is relevant to securing accuracy in frequency domain behavior of the miniphase spectral factor. It is at best indirectly relevant to securing accuracy in say an AR model being used for prediction purposes, if such is known to exist. Nevertheless, if by virtue of some other assumptions or argument, a stable AR model (and surely stability is a requirement) exists, then this imposes an exponential rate of decay (with increasing lag) on the lagged covariances, and the assumption is fulfilled. Thus, the assumption is not offensive to custom. Actually, the ratio ν max /ν min is also partially relevant to the assumption: the greater is this ratio, the more sensitive appears to be the spectral factor frequency response to variations in the power spectrum, Anderson (1985).

Controlling the Number of Parameters to Be Estimated
It is clear that any canonical spectral factor of practical use has to be parametrised by a finite number of quantities, be they integers or real parameter values. One way to ensure this can occur is simply to lay down an assumption that the spectrum is rational: is rational and therefore bounded in e jω .
The boundedness assumption rules out the inclusion of a deterministic component. On the face of it, a rationality assumption is a leap of faith-there seems no reason grounded in physics or economics that suggests this is reasonable, apart perhaps from a belief in the general applicability of an Occam's Razor principle. What makes it reasonable and less instrumentalist though is to recognize that a very minor and more reasonable variant on it will be satisfactory. Using the crucial fact that any covariance sequence satisfying Assumption 2 (corresponding to a stationary process) can be approximated arbitrarily closely in L p norm for any p ∈ [1, ∞) by a covariance with rational spectrum 3 , it follows that if the true spectrum is not rational, under the decay condition of Assumption 2, the rationality assumption of Assumption 3 becomes in practical terms acceptable, as it can apply to a spectrum arbitrarily close to that of the true process. There is however a caution: while the approximation idea is valid for a fixed N, the assumption says nothing about what happens when N goes to infinity, and the next and less innocent assumption given below is needed to address the issue.
More specifically and given Assumption 3, there will in fact be three critical Ndependent integers associated with the spectral factor of Φ N χ (e jω ): 1.
The dimension n of a minimal state-space realization of the canonical spectral factor (or its McMillan degree) 2.
The number of columns q in the minimal state-space factor, or the dimension of an innovations sequence (this being a white noise sequence, known in this context as the dynamic factor sequence) 3. The , which is the dimension of the space spanned by χ 1t , χ 2t , . . . , χ Nt .
While the existence of n (for fixed N) is a direct consequence of Assumption 3, the other two integers deserve some comment. The rationality assumption implies there is a rational canonical spectral factor for (χ N t ), thus where F ∈ R n×n , G ∈ R n×q with rank q, H N ∈ R N×n with rank r, and the matrices H N are nested; this means we can write for n-dimensional row vectors h 1 , h 2 , . . . , The process (w t ) is zero mean white noise with E[w t w t ] = I q . Stability and the miniphase property correspond to We note that one can also write with obvious definition of the N × q transfer matrix K N (z). As is well known, such a stable and miniphase transfer function satisfying the spectral factorization equation Φ N χ (e jω ) = K N (e jω )(K N (e −jω )) is unique up to right multiplication by a real orthogonal matrix, here of dimension q × q, but through normalization can be made unique.
Since GG is of size n × n, and the number of columns of G, viz. q, is minimal in a canonical spectral factor, it is immediate that q ≤ n. Further, since we see that r is bounded by the dimension of the square matrix E[x(0)x (0)], i.e., r ≤ n. However, also, γ N (0) is related to the spectrum by which shows that r ≥ q.
All three integers evidently depend on N, and one of our interests is to let N vary. However, using these integers, we introduce by fiat an almost natural strengthening of the nesting property of the underlying processes and their spectral densities and thereby almost 'define away' the dependence on N: Assumption 4. There exists a positive integer N 0 such that for all N ≥ N 0 , the integers n, q, r are independent of N.
By way of heuristic explanation or even justification, we comment that as N increases, the parameter count in the canonical spectral factor increases, linearly with N. However, the assumption means that the number of independent driving signals which are the ultimate cause of the output of the canonical spectral factor (the dimension of what is termed the dynamic factors of the model) remains constant at q, the complexity (minimal state-space dimension) n of the model stays constant, and once the output vector has become sufficiently big (of size r), all further outputs are linear combinations of other ones, and not even delayed versions of others. There is almost no new information in these further outputs 4 , other than that explaining how they are related to earlier outputs. Implicit then in the assumption is a sort of Occam's Razor principle, limiting the rate of increase in complexity (the parameter count effectively) of the model to the least amount consistent with increase in N.
A physical analogy arises if one considers χ t as like a vector of vibrations arising in a piece of machinery, or electrical thermal noise at the ports of a linear circuit. There may be a limited number q of sources of the vibrations (resistors in the circuit analogy), and one could add more and more sensors (corresponding to N) to the set-up. Their presence would not change the underlying dynamical behavior of the machinery or circuit, governed by a set of differential equations with 'complexity' n. At some point when adding sensors (but with no associated sensor noise since that is to be bundled in with ξ t in this analogy), say when N = r, essentially no new information is provided by an additional sensor: the additional sensor just provides a linear combination of the measurements from other sensors.
The assumption that q is independent of N is shared by all works on dynamic factor models. A large majority in that literature also shares the assumption that r (and n) is independent of N. However, as the following simple example shows, q can be independent of N without r being independent of N. Let where w t is a scalar white noise, so that in this case q = 1. Now, if the α i ∈ [0, .9] and α i = α j for all i = j, then the dimension of the space spanned by χ it , i = 1, . . . , N is N, so that Assumption 4 does not hold, see Forni et al. (2015) for details.
Henceforth, we shall neglect the dependence of n, q, r on N for small values of N.
We summarize the inequalities linking the three key integer parameters and the minimum allowed value N 0 of N in the following theorem: Theorem 1. With notation as above and under Assumptions 2-4, there holds q ≤ r ≤ n and r ≤ N 0 , (with r, q, n independent of N ≥ N 0 ).
Actually, r > q has always been observed in empirical applications to large macroeconomic data sets, Barigozzi et al. (2021). One can observe also that as N grows, the number of parameters grows linearly with N, in fact as O(nN).
Given covariance data for (χ N t ), one way the integer r can be obtained is to use the fact that for large enough N, Sample averages have to replace the population averages, and methods such as singular value decomposition could be used to execute the calculations.
For future reference, we note that the equation implies that the McMillan degree of K N (the dimension of a minimal realization of K N , i.e., n) is one half the McMillan degree of Φ N χ (e jω ), or equal to the McMillan degree of the transfer matrix with associated impulse response the lagged covariance sequence Anderson (1969).

Strong and Weak Cross-Dependence
Strong dependence captures the notion that as N tends to infinity, the effect of the input signals, and all the dynamic complexity showing up in the state variable, continue to show up in a nondecreasing way in the progressively added output components. This can be captured formally by Assumption 5. For all ω ∈ [−π, π], the largest q eigenvalues of Φ N χ (e jω ) diverge linearly with N as N → ∞, while the other eigenvalues are zero. In addition, the r largest eigenvalues of T −1 ∑ T 1 χ N t (χ N t ) diverge to infinity linearly with N, while the other eigenvalues are bounded.
Note that the second part of the assumption is not a consequence of the first part, Lippi et al. (2022).
Weak dependence in a sense is the converse. The different components of ξ N t do not 'reinforce' one another due to some common dependences, but rather have very limited dependence between each other: Evidently, for any fixed T, the larger N becomes, the greater is the 'signal-to-noise' ratio between χ N t , t ∈ {0, 1, 2, . . . , T} and ξ N t , t ∈ {0, 1, 2, . . . , T}. Motivated by communication engineering, λ max (Φ χ (e jω ))/λ max (Φ ξ (e jω )) can be regarded as a surrogate for signal-to-noise ratio. This might prompt us to contemplate the possibility of 'denoising', i.e., separating out the (χ N t ) process from (y N t ) for large enough N. Note that in using terminology with the word 'noise', we acknowledge that this interpretation of the (χ N t ) process will not be appropriate in some applications.
There are similar but not identical ways to introduce assumptions of this type which, taken in conjunction with the earlier assumptions, are just as suitable. Discussion with comparison can be found in Lippi et al. (2022).
This concludes the setting out of assumptions on the processes being examined. We turn now to the procedures used for modelling.

Obtaining the Common Component Process from the Measurement Process
The key to achieving an additive decomposition of the measurement process is to appeal to the strong dependence of the processes (χ N it ) and the weak cross-dependence of the processes (ξ N it ). It was shown in Stock and Watson (2002), Forni et al. (2009) that under the earlier assumptions a consistent estimator of χ N t can be obtained by using a static PCA of y N t . At a similar time, Bai and Ng (2002) demonstrated consistent estimation of r; the work relies also on using asymptotic principal components, is nonparametric, and does not require a low value of n. As for the input dimension of the canonical spectral factor of the common component process, Hallin and Liška (2007), Amengual and Watson (2007), Bai and Ng (2007), Onatski (2009) allow estimation of q.
As noted in Forni et al. (2009), it is evident that for all N ≥ N 0 , the dimension of the space, call it S N t , spanned by χ it , i = 1, 2, . . . , N is r. Let ( f t ) be an r-dimensional process such that f t forms a basis for S N t and f t = Sχ N t where S is an r × N selector matrix independent of t and picking out a spanning set of entries 5 of χ it for i limited to the set i = 1, 2, . . . , N 0 . It follows also that χ N t = L N f t for some factor loading matrix N × r matrix L N independent of t. The entries of f t are often termed minimal static factors.
Actually, in place of the selector matrix S, one can use any left multiple TS with T nonsingular and r × r. 6 A minimal static factor is then seen to be unique up to premultiplication by a constant nonsingular matrix T, with the factor loading matrix unique up to postmultiplication by T −1 . Note that the way L N has been constructed, the successive (with N) matrices are nested. It is possible to eliminate the nonuniqueness using normalizing procedures, see, e.g., Lippi et al. (2022).
The connection with the earlier model of (3) is straightforward to see. Evidently, for some r × n matrixH, there holds f t =Hx t (7) with The process ( f t ) in a sense is now the real process of interest. Of course, its dimension does not grow with N. Its determination however depends on using large N, in order that (χ N t ) can be extracted from (y N t ), and then ( f t ) is extracted from (χ N t ). Importantly, it now becomes computationally practical to consider the determination of the integer n from the lagged covariance sequence determined by f t . By way of background, recall, see, e.g., Hannan and Deistler (2012); Ho and Kalman (1966) that n is available as the rank of all sufficiently large square block Hankel matrices formed from the covariance data Singular value decomposition (SVD) is a computationally effective way to determine the rank. Of course, given that the covariances are actually determined using sample rather than population averages, and using finite rather than infinite values of N and T to extract (χ N t ) from (y N t ), judgment will have to be used to deal with the computational errors; SVD really is a tool for estimating rather than determining the rank. Further details appear in a later section.

Two Alternative Approaches to Additive Decomposition of the Spectral Matrix
For completeness, we summarize here two ways, see Lippi et al. (2022), whereby the spectra Φ N χ (e jω ) and Φ N ξ (e jω ) can be derived from Φ N y (e jω ). First, an assumption is made that the idiosyncratic process vector is a vector of independent second order processes. Under such a special assumption, it is in principle possible using partial fraction expansion of the power spectrum Φ N y (e jω ) to separate the two spectra Φ N χ (e jω ) and Φ N ξ (e jω ). Second, dropping the rationality assumption but retaining the assumption that the idiosyncratic process vector is a vector of independent second order AR processes, and by considering (q + 1) × (q + 1) submatrices of Φ N χ (e jω ) which contain precisely one diagonal entry (this requires N > 2q), and which therefore have all entries known from Φ N y (e jω ) except for that diagonal entry, the singularity of the matrix generically enables recovery of the diagonal entry.
The diagonality assumption on Φ N χ (e jω ) in both procedures is stronger than the weak cross-dependence and can be seen as a misspecification of the general model. However, Doz et al. (2012) argues that: (i) under diagonal Φ N ξ (e jω ) the model can be estimated by Maximum Likelihood, (ii) as N tends to infinity the effect of the misspecification vanishes. Whether the effect also vanishes in the procedures outlined above is unknown.

Tall Stable Miniphase Spectral Factors and Singular Rational Spectra
An important property is (for N > q) the tallness of the transfer function matrix . Likewise, the transfer function matrix between the innovations and the minimal static factor process ( f t ), being given byH(I − Fz) −1 G is r × q, and while always r ≥ q, as noted earlier r > q is normally encountered in practice.
In this section, we recall results appearing in Deistler (2008, 2009); Anderson et al. (2012b); Chen et al. (2012) stemming from this property. The first key result deals with the zeros of the generating model.
Theorem 2 (Anderson and Deistler (2008)). Consider the stable, miniphase model (3) with N > q and assume that the entries of the matrices F, G, H N take generic values. Then the model has the zero free property 7 In Theorem 2 we make the assumption that the entries of F, G, H N take generic values, with the requirements of stability and the miniphase property. The inclusion of these requirements is dealt with further below. Genericity of course can only be defined with respect to a well specified parameter space. In Theorem 2, in agreement with the Anderson and Deistler's papers mentioned above, the parameter space is implicitly defined as a minimum-restriction set. Precisely, the parameter space is an open subset of R g , contained in R g − G, where g is the total number of entries of F, G, H N and G is the subset of R g where either the stability or the miniphase condition does not hold. If the system is zero free, that is consistent with the miniphase requirement, and it is clear that if the system is zero free for one set of parameter values, it will be free for almost all, given that it is a rank condition. (Note also that an inessential variant to the count can occur if some of the entries are fixed at zero, or one, as can occur if a square matrix is a companion matrix or a block companion matrix for example. Such fixed entries are not part of the count.) In other words, apart from the stability and miniphase conditions, we assume that each parameter can vary independently of the others. Throughout the paper genericity will always be implicitly defined with respect to such minimum-restriction parameter spaces. See however the Conclusions for further considerations on this issue.
Zero freeness with N > q implies a number of properties: 1.
Suppose that for polynomial matricesĀ Then zero freeness implies B N ∈ R N×q has full rank for all z. Accordingly, there exists a polynomial unimodular (constant determinant) common premultiplier U N (z) ofĀ N ,B N such that a new polynomial fraction description of K N (z) exists with where B N is a constant matrix. We have stressed earlier that N can increase; note though that det A N (z) will have degree n as a polynomial in z for all N ≥ N 0 . 2.
The system (3) is left invertible with unknown initial state, that is, there exists an integer L ≤ n such that from the sequence χ N k , χ N k+1 , . . . , χ M k+L−1 the state x k and the sequence w k+1 , w k+2 , . . . , w k+L−1 can be determined.
Other characterizations and properties to be found in Anderson and Deistler (2008) deal with the Smith-McMillan form, the impulse response of K N (z), and the absence of nontrivial so-called output-nulling subspaces.
There is of course a very minor restatement of the above Theorem and the two following comments based on the canonical spectral factor generating f t rather than χ N t . It too is (normally) zero free, due to the tallness inequality r > q. However, in working with an AR description of f t and setting χ N t = L N f t , rather than working with an AR description of χ N t , the reduction in parameter count is substantial, and varies as O(N) rather than O(N 2 ). To this point, the stability of the model (3) and its connection with a spectrum have not been appealed to. To bring in those ideas, continue to assume the underlying stable miniphase spectral model of Φ N χ is (3) and the matrices in a state-variable description are generic. The spectrum Φ N χ can be generated by a singular autoregressive process where B N ∈ R N×q has rank q and w t is a unit covariance innovations process; the A N i and B N are computable in a finite number of rational calculations from the lagged covariances

2.
Suppose the rational spectral matrix Φ N χ (e jω ) is written in the form where the triple {F,H N ,Q N } is minimal (controllable and observable) in the state-variable sense and all eigenvalues ofF have magnitude less than 1, such matrices being computable in a finite number of rational calculations from the lagged covariances E[ Then (a) there existsḠ N such that the stable miniphase (canonical) spectral factor can be written in the formH N (I − zF) −1ḠN (b)Ḡ N is computable in a finite number of rational calculations fromF,H N andQ N , and (c) the triple {F,H N ,Ḡ N } is minimal.

3.
In the event thatḠ N defined above is independent of N, say G =Ḡ N , thenQ N = P(H N ) for some positive definite P which is independent of N, with P the unique solution of P − FPF = GG .
The following statement is a quite obvious consequence of the previous Theorems: Theorem 4. Consider the ARMA process where A(z) is an m × m stable polynomial matrix, B(z) is an m × q polynomial matrix. Suppose that m > q, so that x t is a singular ARMA process. Then for generic values of the coefficients of the entries of B(z), B(z) is zeroless, so that x t has a finite AR representation.
The relevance of AR processes in our context is not a new idea of course, see Stock and Watson (2005), Forni et al. (2009), Stock andWatson (2016), Forni et al. (2020). These papers apply Theorem 4, in which x t is replaced by f t , while singularity is a consequence of the assumption that r > q, so that we are again working with a generically zero-free model.
The first statement of the Theorem 3 above is little more than a restatement of the first statement of Theorem 2. As a side comment, note that B N can be uniquely specified by multiplying on the right by an orthogonal matrix rendering it triangular. Reference Anderson et al. (2012b) contains further information concerning exploitation of the freedoms that remain after such a normalization, and the associated parameter counts, including the value of m. The row degrees in particular of the matrix I + A N 1 z + · · · + A N m z m can be minimized in a certain sense. Notwithstanding, the number of nonzero parameters in the singular AR description is essential proportional to N 2 , which is an unattractive feature, contrasting with the linear dependence on N of the parameter count for a state-variable model. In a different direction, a different aspect of the awkwardness of working with the singular AR model is that the nesting property for N ≥ N 0 is not simply captured. Of course, in the event one seeks, as noted above, an AR model for ( f t ) rather than (χ N t ) the difficulty of working with large N is no longer present. Not only does one have χ N t = L N f t , with N is now responsible for only linear growth in the number of parameters, but the nesting property holds, in particular being handled by L N .
The second statement is more subtle: the fact that the same pairF,H N can be used for state variable realizations of the covariance sequence associated with Φ N χ (e jω ) and the associated stable miniphase spectral factor is an old though not well-known result, see, e.g., Anderson (1969); it is not restricted to the zero-free (autoregressive) case. The multivariate version of the result is not easy to prove, in contrast to the converse (which states that a triple 'realizing' the covariance sequence can inherit two of the component matrices of the triple 'realizing' the spectral factor). What is novel in the zero-free case is that the determination of the spectral factor requires no more than a finite number of rational calculations. These calculations involve solving a recursive discrete time matrix Riccati equation involvingF,H N andQ N over a finite interval of length O(n) or (what is nontrivially equivalent) performing a Cholesky decomposition on a (square) Toeplitz matrix with O(nN) rows, formed fromF,H N andQ N , see Anderson and Deistler (2009). In the event that N is large and the realization of the covariance sequence is available, this is an attractive idea: with both approaches, the computational complexity is linear in N. Of course, it is the zero-free property of the canonical spectral factor that assures the finite-time convergence of the Riccati equation.
The third statement simply captures, and is also an extension of, the nesting property. Though not contained in Anderson and Deistler (2009), a proof is contained in Appendix A.

Obtaining Parameters in the AR and State-Variable Models
We suppose in this section that from the sample data (y N t ) collected over a time interval and with a large value of N, using PCA or an equivalent, an estimate of (χ N t ) is obtained over effectively the same interval. This is not even a sample of a process over a finite interval, but only an estimate of it. Finding a model for the 'true' process (χ N t ) based on the estimates then has to proceed using the assumptions that n, q, r are known, that N ≥ N 0 , and that closeness of the sample-derived lagged covariance data and population lagged covariances assure closeness of the associated canonical spectral factors. Thus, the Assumptions introduced earlier need to all be in force.
Consider first the determination of the coefficient matrices in an AR model for (χ N t ). What we say will be very little modified if we choose instead to model f t , and so we omit any detailed discussion of this alternative in this section.
The fact that the spectrum of interest is generated by a (singular) autoregression as in (10) suggests that a stable miniphase spectral factor may be obtainable by Yule-Walker equations-the usual tool for obtaining an AR model (when one is known to exist) from covariance data. Indeed, this possibility was examined in Deistler et al. (2010). The Yule-Walker equations in general take the form where γ j is the lagged covariance E[χ N t (χ N t−j ) ] and Γ N is the N × N block Toeplitz matrix with first row [γ 0 γ 1 . . . γ N−1 ]. However, a difficulty immediately arises. For a singular covariance, there is no guarantee that Γ N is nonsingular. Nevertheless, it is established in Deistler et al. (2010) that the choice where use of a pseudoinverse to replace the regular inverse ensures that the resulting AR model defines a stationary process, i.e., the associated matrix polynomial [I + A N 1 z + A N 2 z 2 + . . . A N N z N ] has all determinantal zeros in |z| > 1, which is a rephrasing of the stability condition. The paper Chen et al. (2011), which imposes fewer assumptions than are incorporated here, takes this further: under some circumstances, there may be unit circle determinantal zeros, corresponding to a deterministic component of the process. (The assumptions in this paper rule out this possibility.) Further, if one does not use a pseudoinverse (which delivers a minimum norm solution of (11)) but uses some other procedure to define the matrices A N i to satisfy (11), the AR system so defined may not be stable. Note that the quantities γ i will actually be replaced by estimates which are consistent as N, T → ∞.
For completeness, we recall the potential numerical hazards arising when the key Toeplitz matrix is ill-conditioned, as discussed in Section 2.1.
The procedure for determining a state-space model is of course different. The starting point is the sample calculations giving estimates of E[χ N t (χ N t−s ) ] for s = 0, 1, 2, . . . . If these were exact population second moments corresponding to a system satisfying the assumptions, particularly the rational spectrum assumption, an algorithm due to Ho and Kalman (1966) Zeiger and McEwen (1974) presented a variant that generates approximate finite dimensional realizations of low dimension from approximate data, as needed here. The algorithm uses truncated singular value decomposition, which means that for O(N)-dimensioned matrices of much lower rank than N the complexity appears to be O(N). Of course, once the triple {F,Q N ,H N } has been determined, the construction of G using the ideas of Theorem 3 can be attempted. Once again, it may well make more sense to seek a state variable triple realizing the sequence E[ f t ( f t−s ) ], s = 0, 1, 2, . . . , rather than E[χ N t (χ N t−s ) ], s = 0, 1, 2, . . . The bulk of the suggested algorithms in this subsection have not been simulated with real data. However, even if they do not perform particularly well, they offer the opportunity to obtain a first estimate of a model, using which an iterative algorithm to refine the estimate can be used, Shumway and Stoffer (1982). This algorithm, based on expectation maximization, is a maximum likelihood algorithm which at each step always increases the likelihood, and necessarily therefore reaches a maximum, though it may not be a global maximum. The use of a well-chosen initialization can however promote that desired outcome.
The algorithm of Shumway and Stoffer (1982) appears to offer no straightforward technique for incorporating a rank constraint on the input coupling matrix G of the canonical spectral factor, or simply the zero-free property. Consequently, while it uses Kalman filtered and smoothed estimates, it is not immediately clear how one should exploit the notion that such estimates in the zero-free case can be exactly determined using a finite number of measurements as set out in the commentary immediately following Theorem 3.

Mixed Frequency Systems
Some econometric time series may be collected monthly, others may be collected quarterly. A methodology is needed to allow graceful combination of the two periodicities, and the modelling of high dimension time series. Starting with this section, there appears a summary of ideas developed by authors Anderson and Deistler with their colleagues in treating such problems. Note importantly that mixed frequency problems are of interest outside the 'high dimensionality' context, and some limited conclusions are drawn here that do not appeal to properties such as zero freeness.
The historical roots of the applicable techniques lie in signal processing, which has treated with a technique termed 'blocking' or 'lifting' problems involving time series with integrally related periodicities, with a view to bringing to bear tools applicable to single frequency systems, such as rational transfer functions, and their descriptions and properties involving for example state-variable realizations, poles and zeros, see, e.g., (Mitra 2011, see Chapters 13 and 14). The book Bittanti and Colaneri (2009) also contains, or contains references to, many of the relevant ideas, and has a more control systems flavour.

Blocked Linear Systems
We draw on Chen et al. (2012) for initial ideas, which cover change of periodicity as opposed to treatment of simultaneous multiple periodicities. Once change of periodicity is understood, it is straightforward to introduce multiple periodicities.
Consider a time-invariant finite-dimensional linear system with input sequence u t and output sequence y t given by x t+1 = Ax t + Bu t , y t = Cx t + Du t with t = 0, 1, 2, . . . . Imagine then that blocks of J successive inputs and outputs are collected into N-vectors U t = [u t u t−1 , . . . , u t−J+1 ] and Y t = [y t y t−1 , . . . , y t−J+1 ] . Then one can write the following equations for a 'blocked' or 'lifted' system: These are equations of a time-invariant linear system, of different input and output dimension but the same state dimension as the original. One can regard this second system as updating every J time instants, thus at (for example) t = 0, J, 2J, . . . . Note that even if the 'direct feedthrough' term Du t is absent in the unblocked system as when D = 0, a nonzero direct feedthrough term does appear in the blocked system in general. The transfer function of the blocked system is The paper Chen et al. (2011) records many connections between the original and 'blocked' transfer functions and state-variable realizations (some being novel to the paper), in particular, the carrying over of controllability and observability and minimality, connections between the normal ranks, a formula connecting the first column of the blocked transfer function and the original transfer function, relations between the poles and, separately, the finite and infinite zeros of the two transfer functions, implying that one is stable (or miniphase) if and only if the other is stable (or miniphase), and one is zero free if and only if the other is zero free.
In this connection, note that if the dimension of y t exceeds that of u t , the same is true of the dimensions of Y t and U t . Moreover, if the original A, B, C, D matrices are generic, the unblocked and therefore the blocked system are zero free.
The property of zero-free carryover becomes relevant also in considering time series with multiple periodicities.

Systems with Multiple Periodicities in Their Outputs
Again our starting point is the linear system defined by x t+1 = Ax t + Bu t , y t = Cx t + Du t , with t = 0, 1, 2, . . . . However we now suppose that the vector y t is partitioned into two subvectors of dimensions p 1 and p 2 , thus The process (y f t ) is observed at all time instants, while the process (y s t ) is observed at time instants 0, J, 2J, . . . even though it is known to exist for all time instants. From the observer's point of view, (y f t ) and (y s t ) constitute 'fast' and 'slow' processes. Evidently the observed output comprises two (in general multivariate) time series with different integrally related periodicities. As set out in, e.g., Anderson et al. (2016aAnderson et al. ( , 2016b, We can establish a single frequency system by blocking the fast process (y f t ). In more detail, consistent with the partition of y t , we also adopt the corresponding partitions To obtain a blocked model, we block u t as before, but only use the observed values within each y t for blocking, which means that for t = 0, J, 2J, . . . , It is straightforward to obtain matrices A b , C b , C b , D b constituting a state-variable realization for the system taking the sequence U 0 , U J , . . . to Y 0 , Y J , . . . , see Zamani et al. (2011). In particular, This system is self-evidently a purely time-invariant system, governed by a single period which happens to be that defined by the observed slow process (y s t ), despite the system containing within its outputs all the information from the fast process (y f t ). Of course, there is no attempt in this model to interpolate the missing (nonobserved) values of the slow process.
The original system and the blocked system have the same state-space dimension. It is natural then to ask if minimality carries over. This question is examined in Anderson et al. (2016b). If the original system is controllable, then it is straightforward to establish that there can be no nonzero vector w and constant λ for which w A b = λw , w B b = 0, i.e., the blocked model is controllable. The argument does not work for observability because there may be modes solely observed by C s which at the high frequency are distinct, but when sampled every J time instants, merge. This can occur if for example A has two distinct eigenvalues with the same J-th power. Such a situation is nongeneric however, and one can show that if A is generic (and in particular has distinct eigenvalues whose J-th powers are also distinct), observability of the underlying system implies observability of the blocked system. Of course, if the underlying system is observable from y f alone, the blocked system is observable irrespective of the genericity of A, this following by essentially the same argument as used to study the carrying over of controllability. Furthermore, if the original system is uncontrollable, or unobservable, the same property necessarily holds for the blocked system. Evidently then, for generic A, the original system and the blocked system are either both minimal, or both nonminimal.
Going on from this, it follows that if the underlying system has {A, B, C, D} and {TAT −1 , TB, CT −1 , D} as two minimal state variable quadruples for some nonsingular T, the corresponding two blocked systems are and if any minimal realization of the blocked system is known, it must be expressible in terms of an associated underlying system via the formula (16).
Summing up, we can state: Theorem 5. Consider an underlying linear system defined by x t+1 = Ax t + Bu t , y t = Cx t + Du t , with t = 0, 1, 2, . . . and where the vector y t and matrices C, D are partitioned into blocks with p 1 and p 2 rows, as The process (y f t ) is observed at all time instants, while the process (y s t ) is observed at time instants 0, J, 2J, . . . . Then the associated observed block model, with input and output given by (15), has a state-variable realization given by (16). Moreover, controllability of either the underlying system or the blocked observed system implies the same property for the other. If A is generic, the same holds true in respect of observability. Coordinate basis change and blocking commute.
We also note by way of a side remark that there is a new sort of prediction problem indirectly presenting itself here: nowcasting, which can be described roughly as prediction of the immediate but unobserved past, present or immediate future. This will apply for those time instants between those at which the slow process is measured.
Following on from the preceding theorem, it is also relevant to consider what happens when the underlying system is a canonical spectral factor generating the process (y t ). Under some circumstances, the blocked system will not have this property simply because of its input, state and output dimensions, and there is more than one reason why this can happen:

1.
Depending on the value of J and the dimensions of u t , y t , the dimension of U t may end up larger than that of Y t , i.e., the transfer function of the blocked system may be fat, as noted in Anderson et al. (2016b).

2.
Depending on the value of J and the dimensions of u t , x t , the dimension of U t may end up greater than that of x t (which is the state dimension of the blocked system).
Apart from this, the helpful property of zero freeness, which in many applications will be a property of the underlying system, is not guaranteed to carry over to the blocked system, although , as shown in Zamani et al. (2011), if the dimension of y f t alone exceeds that of u t , and the matrices A, B, C, D are generic, then the system defined by The overarching problem is one of identifying the underlying system using the mixed frequency observations. The key is to work via the blocked system. Unsurprisingly, if the blocked system is indeed zero free, the process is much more straightforward.
In econometric applications, the condition for the zero-free property of the blocked system may well be fulfilled. However, this may not be the case, even though the underlying system is zero free. This is the main scenario treated in the next section.

Mixed Frequency System Identification
There are two broad thrusts to this section. First, we postulate that the underlying system has an AR description, and exploit that fact. Second, we exploit the structure of the blocked system (but at a certain point, also appeal to the zero-free structure of the underlying system).

Mixed Frequency System Identificaiton with an AR Underlying System
It is well known that the AR property is not preserved under marginalization. Therefore, even if the underlying process is AR but some of its entries are only observed every J time instants, one cannot expect that the observed process is AR. (This is roughly equivalent to observing that even if the underlying process is zero-free, the blocked process may not have this property). More to the point, even if it is AR, the associated parameters in the defining matrices of the AR equation cannot be straightforwardly related to those of the underlying process. Nevertheless, using an approach based on exploiting Yule-Walker equations, progress can be made on identification. This idea was originally observed in Chen and Zadrozny (1998) and further developed in Anderson et al. (2012a). Subsequently, it was shown in Anderson et al. (2016a) that for systems with generic parameter values, the suggested algorithm could always be executed. The key here was to ensure utilization of all observed second order moments, in contrast to Chen and Zadrozny (1998) which omitted use of some.
Suppose that the underlying process is The innovations covariance may be singular. The identifiability question is whether the coefficient matrices a f f (i), a f s (i), etc and the entries of Σ can be identified from covariance data associated with the time series (y f t ), t = 0, 1, 2, . . . and (y s t ), t = 0, J, 2J, . . . . As asserted above, this problem can be tackled by generating 'extended' Yule-Walker equations, which are obtained by multiplying (17) on the right by the fast components of y t−k , k = 1, 2, . . . and forming expectations. The resulting equations are linear in the AR coefficients and, crucially, it can be proved that they are generically solvable, i.e., for almost all values of the coefficient and entries of Σ, a unique solution exists, see Anderson et al. (2016a). (An example is given below). Once these coefficients have been found, the entries of Σ can be computed, except again on a set of parameter values of measure zero. The corresponding estimators are however not "good" and in particular not asymptotically efficient, also because observed lags of the covariances of the slow process are not exploited.

A Simple AR(1) Example
In this subsection, and following Anderson et al. (2016b), we consider what is almost the simplest possible case of a mixed frequency AR system. The underlying system is AR(1), and has two scalar outputs, one of which is observed at every second time instant. We also assume regularity of the underlying system, so the zero-lag covariance Σ of the driving white noise is nonsingular. We investigate to what extent the parameters of the process before observation loss can be recovered from the observed process statistics.
Before loss of observations, the process is

Set
A = a f f a f s a s f a ss With every second value of the 'slow' process (y s t ) being observed, it is easy to see that with obvious definitions ofÃ andw t . It is straightforward to check that The crucial question at this point is now: given the AR process generated in (19), and an identification using the process measurements of the entries ofÃ andΣ (presumably using Yule-Walker equations which will yield unique values in the regular case), can the entries of A and Σ then be inferred?
The high level answer is that for generic values of the parameters, they can indeed be inferred. The more detailed answer is that the values cannot be inferred if and only if all three of the following conditions (defining a semi-algebraic set of parameters of measure zero) are fulfilled (for details, see Anderson et al. (2016b), which builds on a simpler version of the problem assuming diagonal Σ treated in Anderson et al. (2012a)): As noted in Anderson et al. (2016b), if the underlying process continues to have a lag of 1, the approach is generalizable to multivariate y f t and y s t , except that the definition of nongeneric values of parameters becomes more involved.
Generalization to lags greater than 1 is however not possible using the framework given in this subsection; this is because the AR structure of the original system no longer yields an AR structure for the blocked system. This is unsurprising, given the nonclosure of AR systems under marginalization. Nevertheless, the general approach based on modified Yule-Walker equations remains available Anderson et al. (2016b).

Mixed Frequency System Identification Exploiting Blocked System Structure
In this subsection, we record ideas originally presented in Anderson et al. (2016b) for the case where the slow process comprises every second value in the time sequence of the relevant entries of the underlying process. However, here we describe the situation when every J-th value rather than every second value is used. While the underlying system is zero free, the blocked system is not assumed to have this property.
To begin, suppose the underlying system state variable realization {A, B, C, D} is minimal. Then generically, the same is true of the blocked system, defined by the quadruple {A b , B b , C b , D b }. The blocked system itself may not be a canonical miniphase system, but as argued in Anderson et al. (2016b), there is a minimal state-variable realization of the canonical miniphase system (yielding the same spectrum) with the same C b and A b . It follows that if we identify a minimal state-variable realization of the canonical miniphase system for the blocked system, and this includes state-update and state-to-output coupling matricesÃ b ,C b , then for some nonsingular T, there will holdÃ b = T −1 A b T,C b = TC b . From these two matrices, one seeks matricesÃ,C f ,C s such that Of course,C f andC s are immediately obtained from the second equation. With A sufficiently generic that no two eigenvalues have the same J-th power, the matrixÃ (which has the same property) is determined up to a finite number of possibilities corresponding to different possible choices for the J-th root of each eigenvalue ofÃ b . It may be that knowledge ofC fÃ ,C fÃ2 , . . . then suffices to achieve disambiguation. Alternatively the set can be disambiguated to identify just one of the possible choices by invoking a further appeal to genericity (still excluding a semi-algebraic set of measure zero, but one which contains the set ensuring that no two eigenvalues of A have the same J-th power). Exclusion from the larger semi-algebraic set guarantees the observability of (A, C f ) or equivalently of (Ã,C f ), which is enough to secure disambiguation.
As noted above, we must be able to identify matricesÃ b ,C b which are part of a quadruple of matrices defining a state-variable realization of the canonical spectral factor of the blocked system. We do not however need to identify all matrices of a realization of the canonical spectral factor. We can in fact identify the relevant matrices by identifying a state-variable realization of the covariance sequence associated with the spectrum of Y t -see the discussion following Theorem 3-and this is a relatively straightforward matter, using the algorithm earlier cited, Zeiger and McEwen (1974). Now suppose in fact that the underlying system is a VAR process of order p, thus with (ν t ) a white noise process. Of course, the system is assumed to be stable. This system can also be described by the state-variable equations    y t . . .
It is explained in Anderson et al. (2016a) how the coefficient matrices A i can be determined from the observable pair (Ã,C) (whereC = [(C f ) (C s ) ] ) under the additional genericity assumptions that A p is nonsingular, as is the covariance of the white noise process ν t .
In the earlier part of this paper, we stressed the potential numerical hazards when working with AR descriptions of high dimensional processes. The same warning of course applies here if the processes are typical econometric processes. This means that procedures for reducing the problem dimensionality by exploiting the singularity of the zero lag covariance of the output are desirable. How best to do this when measured signals have two periodicities (whether or not at the blocked system level) has not however been addressed in any depth in the literature. Note though that the minimal static factors can be constructed as easily for the mixed frequency case as for the single frequency case, since they are effectively determined using the zero lag covariance of the measured outputs. Another comparatively untreated issue for mixed frequency systems is that of passing to the common components process from a measurement process in which idiosyncratic components are additively embedded.

VARMA and VMA System Identification
The content of this subsection is not especially relevant to Dynamic Factor Models, but is related to the earlier material of this and the previous section, in that it focusses on further questions of identifiability when not all output values of an underlying process are observed.
An alternative starting point for the underlying process is to assume a different model class, where one postulates a model with A i , B i of dimension N × N. For convenience, make the definitions It is assumed that y t = [(y f t ) (y s t ) ] with y s t observed at time t = 0, J, 2J, . . . for some integer J > 1. The stability and miniphase properties are captured by the assumptions With the degrees of a(z) and b(z) assumed known, the identification task is to estimate the entries of the coefficient matrices in a(z), b(z).
Of course, (w t ) is a zero mean white noise sequence; we also assume E[w t w t ] = Σ. If rank Σ = q < N, the system is singular. Otherwise it is regular.
Note that this problem setting does not have to arise in a context of high-dimensional time series (though it may), which means that non-AR modelling becomes relevant.
Identifiability of such models is linked to the question of uniqueness of the polynomials making up the transfer function a −1 (z)b(z). In the regular case, one can require left coprimeness of the pair a(z), b(z) and then enforce through multiplication of each of a, b by the same unimodular (constant determinant) multiplier a canonical form. In the singular case, the situation is more complicated. For example, even for an AR system, the pair (a, b) with b(z) = B 0 may not be left coprime. Even when they are left coprime, the Yule-Walker equations which are usually used to obtain the A i from covariances may not have a unique solution, in contrast to the regular case.
In case s ≤ p, the reference Anderson et al. (2016a) establishes that for generic values of the entries of the coefficient matrices, identifiability can be achieved. A two-step process is used. First, analogously to the procedure outlined in Section 7.1, a modified set of Yule-Walker equations can be obtained in which only known covariance data appears, and from these, the denominator matrix a(z) can be obtained, given generic values of the coefficient matrices in a(z) and b(z). Then by using the knowledge of all the A i and some of the covariances E[y t y t−s ], it turns out that all the missing covariance data can be obtained. Following that, the matrix polynomial b(z) is straightforward to obtain.
One might wonder whether the condition s ≤ p is important. It appears so. One can in fact show, see Deistler et al. (2017), Theorem 6. Consider the MA model obtained from (25) by setting A 1 = A 2 = · · · = A p = 0. Suppose that E[w t w t ] is nonsingular. With y t = [(y f t ) (y s t ) ] ] and y s t observed for t = 0, J, 2J . . . for some integer J > 1, the parameter matrices B i are not identifiable.
The underlying reason for this conclusion is that there are more parameters to identify than there is relevant covariance data available to identify them.
A very simple example, drawn from Anderson et al. (2016a), establishes the point. In obvious notation, consider the following system where y t has dimension 2: Suppose first that all covariance data is available, viz. γ(0) = E[y t y t ] and γ(1) = E[y t y t−1 ]. It is easy to check that Identification is the task of solving these equations for the seven unknown parameters defining B 1 and Σ. The number of independent scalar equations is seven; though there are multiple solutions, they can be disambiguated by the miniphase condition requiring det b(z) = 0 when |z| ≤ 1. Now consider the situation where every second entry of y s t is observed. Then the 22 entry of γ(1) is simply unavailable, being E[y s t y s t−1 ]. A continuum of values is possible for this entry, and hence a continuum of values for the entries of Σ and B 1 .
It is perhaps surprising to find in the mixed frequency case that with an underlying AR process with lag 1 one has identifiability, while with an underlying MA process with lag 1 one does not have identifiability.

Conclusions
In the course of this survey of a number of results, we have highlighted several issues that are worth re-emphasising here. Furthermore, some open issues, both in the theory and applications to data are pointed out.
First, we have recorded a number of issues arising in modelling. Though there are exceptions, most problem statements with their assumptions need to guarantee that an estimated model is continuously dependent on the data from which it is obtained. Once an explicit assumption of rationality of the model transfer function has been made, continuity of the real-valued parameters for a given (and possibly estimated) set of integer-valued specification parameters with respect to the data is almost always guaranteed. Before such a rationality assumption is made however, continuity (in the L ∞ sense) of a canonical spectral factor is not automatic, and depends on some additional assumption such as boundedness of the derivative of the spectrum. It is also useful to identify which particular form of model of an entity such as a canonical spectral factor is most preferable. One can think of ARMA/AR, state-variable, or even a frequency-based description involving plots of the different entries of the spectral factor against frequency. Moreover, the proposed application for a model (and the calculations using in addressing that application) may be a determinant of the best type of model. For some applications, e.g., those involving business cycles, state-variable models exhibiting pole positions in diagonal blocks of the state update matrix, or frequency domain plots, may be clearly preferable to AR models. For forecasting, AR models are likely to be preferable. Let us recall too that while some of the assumptions used in ensuring solvability of the identification (or modelling) problem are grounded in intuitively reasonable hypotheses, others are of a more instrumentalist character, thus invoked to ensure mathematical solvability rather than to reflect some underlying natural truth.
As a second theme, we have emphasised through much of the paper both the occurrence of and benefit from the zero-free property of canonical spectral factors arising in dynamic factor modelling. It is of great utility, and deserves to be accorded significant recognition for this fact. This observation is also relevant in the mixed frequency case. Another, perhaps less important, idea flowing from linear systems considerations is the fact that state-variable realizations of a covariance sequence and a canonical spectral factor for the associated power spectrum can be assumed to have the state transition matrix and state to output coupling matrices identical. This underpins an ability to compute from a covariance sequence a canonical spectral factor using quite different calculations to those of a Yule-Walker procedure (and thereby possibly avoid numerical problems which can arise).
A third major theme is that of mixed frequency identification problems. Several key conclusions emerge here. First, it is evident that their treatment is not as mature as the treatment of single frequency problems. Second, issues of genericity become more important. Third, certain mixed frequency identification scenarios are provably nonidentifiable (as indicated in our discussion of VARMA and MA modelling).
What now of the future? Among the outstanding issues worthy of investigation, we would certainly include the following: 1.
The zero-free property has been applied to macroeconomic analysis as a means to solve the fundamentalness problem. Further work is ongoing on this and, we hope, the zero-free property will increasingly become familiar to macroeconomists, see the literature cited below Theorem 4.

2.
An important observation on the zero-free property in statements like Theorem 4 is that its genericity is usually obtained with respect to a parameterization in which each parameter is free to vary independently of all the others, see the comment below Theorem 2. However, in applications to macroeconomic models, structural restrictions may prevent such free variability and can therefore interfere with the genericity of zero-freeness. For an attempt to solve this issue, see Forni et al. (2020).

3.
Another observation on Theorem 4 is that an AR representation may exist also for non-generic parameters, for which zerolessness does not hold but the zeros of B(z) lie outside the unit circle. However, in that case the AR would not be of finite length. 4.
In some cases, we have outlined algorithms which have not yet been tried on real data. It would be worthwhile to investigate them using real data.

5.
A full treatment (algorithms through to testing on real data) including the decomposition into common components and idiosyncratic processes is needed for systems with multiple frequencies.
Author Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: Suppose that for any N ≥ N 0 , the transfer function H N (I − zF) −1 G is a spectral factor of Φ N χ (e jω ) with {F, G,H N } a minimal state-space realization. Define P to be the unique positive definite matrix satisfying P −FPF = GG .
Note that P is positive definite since λ i (F) < 1 andF, G is a controllable pair. Then one can verify using the identity P −FPF = (I − zF)P + P(I − z −1 F ) − (I − zF)P(I − z −1F ) that State space models have larger equivalences classes and therefore more flexibility when choosing "nice" representatives. Special representatives, in echelon form, can yield state space models with the same parameters as ARMA models.
3 This is an easy consequence of the celebrated solution of the partial realization problem for covariance sequences, Byrnes and Lindquist (1997). 4 In a practical situation, given that sensors normally are not noise free, the introduction of more sensors is likely to aid the de-noising process. Our remarks here pertain to the process of common components. 5 Such a spanning set of entries can be obtained by identifying the rows defining an r × r nonsingular principal submatrix of E[χ N t (χ N t ) ] 6 As an alternative, PCA can also be used to obtain a minimal static factor. 7 For a scalar rational transfer function written as a ratio of coprime polynomials n(z)/d(z), the zero free property is equivalent to n(z) being a nonzero constant. The idea generalizes to coprime matrix fraction representations of a rational transfer function matrix 8 Coprimeness means there exists no polynomial C N (z) ∈ R N×N [z] with nonconstant determinant and matricesÃ N ∈ R N×N [z],B N ∈ R N×q [z] such thatĀ N = C NÃN andB N = C NBN