1. Introduction
In this paper, we estimate Approximate Dynamic Factor Models (ADFMs) with incomplete panel data. Data incompleteness covers, among others, two scenarios:
(i) public holidays, operational interruptions, trading suspensions, etc. cause the absence of single elements,
(ii) mixed-frequency information, e.g., monthly and quarterly indicators, results in systematically missing observations and temporal aggregation. To obtain balanced panel data without any gaps, we relate each irregular time series to an artificial, high-frequency counterpart following Stock and Watson [
1]. Depending on the relation, the artificial analogs are categorized as stock, flow and change in flow variables. In the literature, the above scenarios of data irregularities are handled in [1,2,3,4,5,6,7].
The gaps in
(i) and
(ii) are permanent, as they cannot be filled by any future observations. In contrast, publication delays cause temporary gaps until the desired information is available. The numbers of (trading) days, public holidays, weeks, etc. per month change over time. Therefore, calendar irregularities, the chosen time horizon, and different publication conventions further affect the panel data pattern. In the following, incomplete data refers to any collection of stock, flow and change in flow variables [1,4].
Factor models with cross-sectionally correlated errors are called approximate, whereas factor models without any cross-sectional correlation are called exact. In Approximate (Static) Factor Models with independent and identically distributed (iid) factors, Stock and Watson [
8] showed that unobserved factors can be consistently estimated using Principal Component Analysis (PCA). Moreover, the consistent estimation of the factors leads to a consistent forecast. Under additional regularity assumptions, these consistency results remain valid even for Approximate Factor Models with time-dependent loadings. In the past, Approximate Static Factor Models (ASFMs) were extensively discussed in the literature [
9,10,11,12,13,14,15].
Dynamic Factor Models (DFMs) assume time-varying factors, whose evolution over time is expressed by a Vector Autoregression Model (VAR). For Exact Dynamic Factor Models (EDFMs), Doz et al. [
16] showed that these models may be regarded as misspecified ADFMs. Under this misspecification and in the maximum likelihood framework, they proved the consistency of the estimated factors. Therefore, cross-sectional correlation of errors is often ignored in recent studies [
7,17,18,19,20]. However, cross-sectional error correlation cannot be excluded in empirical applications. The estimation of DFMs is not trivial due to the hidden factors and high-dimensional parameter space. Shumway and Stoffer [
21] and Watson and Engle [
22] elegantly solved this problem by employing an Expectation-Maximization Algorithm (EM) and the Kalman Filter (KF)-Kalman Smoother (KS). By incorporating loading restrictions, Bork [
23] further developed this estimation procedure for factor-augmented VARs. Asymptotic properties of the estimation with KS and EM for approximate dynamic factor models have recently been investigated by Barigozzi and Luciani [
24]. For EDFMs, Reis and Watson [
25] were the first to treat serial autocorrelation of the errors. For the same model framework, Bańbura and Modugno [
20] provided a Maximum-Likelihood Estimation (MLE) using the EM and KF for incomplete data. It should be noted that Jungbacker et al. [
26] proposed a computationally more efficient estimation procedure, which, however, involves a more complex time-varying state-space representation.
This paper also aims at the estimation of ADFMs for incomplete panel data in the maximum likelihood framework. It contributes to the existing estimation methodology in the following manner: First, we explicitly allow for iid cross-sectionally correlated errors similar to Jungbacker et al. [
26], but do not undertake any adaptations of the underlying DFM. In contrast, Bańbura and Modugno [
20] consider serial error correlation instead and assume zero cross-sectional correlation. Second, our MLE does not combine an EM and the KF. We instead propose the alternating use of two EMs and employ conditional factor moments in closed form. The first EM reconstructs missing panel data for each relevant variable by using a relation between low-frequency observations and their artificial counterparts of higher frequency [
1]. The second EM performs the actual MLE based on the full data and is similar to Bork [
23] and Bańbura and Modugno [
20]. Our estimation approach for incomplete panel data deals with a simpler state-space representation of DFMs, which is invariant with respect to any chosen relationship between low-frequency observations and their artificial counterparts of higher frequency. In contrast, the approaches by Bańbura and Modugno [
20] and Jungbacker et al. [
26] usually deal with more complex underlying DFMs and require adjustments, even if the relationship between low-frequency and high-frequency observations changes for a single variable only. The literature offers different types of relations between low-frequency and high-frequency observations; we refer to
Section 2.2 for more details. Third, our paper addresses a model selection problem for the factor dimension and autoregressive order. For this, we propose a two-step approach and investigate its performance in a Monte Carlo study. The choice of the factor dimension is inspired by Bai and Ng [
27] and the choice of the autoregressive lag is based on the Akaike Information Criterion (AIC) adjusted for the hiddenness of the factors as in Mariano and Murasawa [
28]. It should be noted that our paper does not provide any statistical inference on ADFMs for incomplete panel data.
As an application, we develop a framework for forecasting weekly returns using the estimated factors to determine their main driving indicators of different frequencies. We also empirically construct prediction intervals for index returns taking into account uncertainties arising from the estimation of the latent factors and model parameters. Our framework is able to trace the expected behavior of the index returns back to the initial observations and their high-frequency counterparts. In the empirical study, weekly prediction intervals of the Standard & Poor’s 500 (S&P500) returns are determined for support of asset and risk management. Thus, we detect the drivers of its expected market development and define two dynamic trading strategies to profit from the gained information. For this, our prediction intervals serve as the main ingredient of the two trading strategies.
The remainder of this paper is structured as follows.
Section 2 introduces ADFMs. For known model dimensions and autoregressive order, we derive here our estimation procedure for complete and incomplete data sets.
Section 3 proposes a selection procedure for the optimal factor dimension and autoregressive order.
Section 4 summarizes the results of a Monte Carlo study, where we examine the performance of our estimation method and compare it with the benchmark of Bańbura and Modugno [
20] across different sample sizes, factor dimensions, autoregressive orders and proportions of missing data. In
Section 5, we present our forecasting framework for a univariate return series using the estimated factors in an autoregressive setup. We also discuss the construction of empirical prediction intervals and use them to specify our two dynamic trading strategies.
Section 6 contains our empirical study and
Section 7 concludes. Finally, note that all computations were carried out in Matlab. Our Matlab codes and data are available as supplementary materials.
3. Model Selection for Unknown Dimensions and Autoregressive Orders
The ADFM (
1)–(2) for complete panel data and its estimation require knowledge of the factor dimension
K and autoregressive order
p. In empirical analyses, both must be determined. For this, we propose a two-step model selection method. For static factor models, Bai and Ng [
27] thoroughly investigated the selection of the optimal factor dimension
and introduced several common model selection procedures which were reused in, e.g., [
23,25,37,38,39,40]. In this paper, we deploy the following modification of Bai and Ng [
27]:
where
denotes an upper limit for factor dimension
K and
covers the estimated residual variance of Model (
1) ignoring any autoregressive factor dynamics. Bai and Ng [
27] (p. 199, Theorem 2) showed that panel criteria in the form of (
16) consistently estimate the true factor dimension, if their assumptions A-D are satisfied, PCA is used for factor estimation and the penalty function obeys for
:
The penalty function
in (17) coincides with the second panel criterion in Bai and Ng [
27] (p. 201) except for
. For empirical studies, Bai and Ng [
27] suggest
as scaling of the penalty in (17) with
as minimum of (
18) for fixed
regarding
and
. Therefore, their penalty depends on the variance that remains, although the upper limit of the factor dimension was reached. If we use
, the setting
for all
is a trivial solution for SFM (
1). Furthermore, it yields
and thus, overrides the penalty. For any
, the choice of
affects
and hence, the penalty in (17). To avoid any undesirable degree of freedom arising from the choice of
, we therefore propose
for a non-negative multiplier
m and
denoting the empirical residual variance, if Model (
1) is estimated using the PPCA of Tipping and Bishop [
29].
Irrespective of whether PCA or PPCA is deployed, the error variance decreases when the factor dimension increases. Thus,
holds. The non-negativity of
m causes that
in (
19) and the penalty in (17) are non-negative. This guarantees that large
K is punished. Unlike
, the strictness of
depends on
m instead of
. Hence, the strictness of the penalty and upper limit of the factor dimension are separated from each other. The panel criteria of Bai and Ng [
27] are asymptotically equivalent as
, but may behave differently for finite samples [25,27]. For a better understanding of how
m influences the penalty function, we exemplarily consider various multipliers
in
Section 4. Finally, we answer why
instead of
or any alternative is used. For
, the term
in (
19) coincides with the negative slope of the straight line through the points
and
, i.e., we linearize the decay in
over the interval
and then, take its absolute value for penalty adjustment. In other words, for
the term
in (
19) describes the absolute value of the decay in
per unit in dimension. In the empirical study of
Section 6, we also use
, since this provides a decent dimension reduction, but is not so restrictive that changes in the economy are ignored. In total, neither our proposal of
nor the original version in Bai and Ng [
27] affects the asymptotic behavior of the function
such that
in (17) consistently estimates the true dimension. Please note that we neglect the factor dynamics and treat DFMs as SFMs in this step.
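For illustration, the following Matlab sketch mirrors the structure of this first selection step: factors are extracted by PCA for each candidate dimension, the residual variance is recorded, and a Bai-Ng-type panel criterion is minimized. The scaling sigma2 and the penalty shape gNT are only stand-ins for the exact quantities in (16)-(19), and PCA replaces the PPCA of Tipping and Bishop [29].

function Khat = select_factor_dimension(X, Kmax, m)
    % X: T x N standardized panel without gaps; Kmax: upper limit (>= 2); m: multiplier.
    [T, N] = size(X);
    [~, ~, V] = svd(X, 'econ');                           % PCA via the singular value decomposition
    resvar = zeros(Kmax, 1);
    for K = 1:Kmax
        W = V(:, 1:K);                                    % loadings of the first K principal components
        E = X - X * (W * W');                             % residuals of the static K-factor model (1)
        resvar(K) = sum(E(:).^2) / (N * T);               % estimated residual variance V(K)
    end
    sigma2 = m * (resvar(1) - resvar(Kmax)) / (Kmax - 1); % linearized decay of V(K), stand-in for (19)
    gNT = (N + T) / (N * T) * log(min(N, T));             % Bai-Ng-type penalty shape, stand-in for (17)
    [~, Khat] = min(resvar + (1:Kmax)' * sigma2 * gNT);   % minimize the panel criterion, cf. (16)
end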
In the next step, our model selection approach derives the optimal autoregressive order
for any fixed
using AIC. As factors are unobservable, we replace the log-likelihood
of Model (2) by the conditional expectation
in the usual AIC. Furthermore, Equation (2) can be rewritten as a stationary VAR(1) process
, whose covariance matrix
has a similar representation to (
3). When we run the EM for a fixed
K and a prespecified range of the autoregressive orders, the optimal
satisfies
with
as upper lag length to be tested [
41]. For
, we use the maximum likelihood estimates of matrices
M,
D and
. Like
, the criterion
truncates the infinite series for
. Alternatively,
can explicitly be computed, see Lemma A.2.7 in [
41]. Further, the vector
comprises the first
p observations of
X. For
, Model (
1)–(2) is regarded as SFM. In particular, the objective function of the selection criterion (
20) for SFMs is
. Thereafter, we choose an optimal factor dimension
by using (17) and ignoring the autoregressive structure in (2). An algorithm for the overall model selection procedure is provided in Algorithm A1.
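As a simplified illustration of the second step, the Matlab sketch below selects the lag order by fitting VAR(p) models to the estimated factor means and minimizing a standard Gaussian AIC. The criterion (20) instead evaluates the conditional expectation of the log-likelihood delivered by the EM, so this code is only a stand-in under that simplification.

function phat = select_lag_order(Fhat, pmax)
    % Fhat: T x K estimated factor means; pmax: largest lag order tested.
    [T, K] = size(Fhat);
    aic = zeros(pmax, 1);
    for p = 1:pmax
        Y = Fhat(p+1:T, :);                               % left-hand side of the VAR(p)
        Z = ones(T - p, 1);                               % intercept
        for lag = 1:p
            Z = [Z, Fhat(p+1-lag:T-lag, :)];              % stack lagged factors
        end
        B = Z \ Y;                                        % least-squares VAR coefficients
        U = Y - Z * B;                                    % residuals
        Sigma = (U' * U) / (T - p);                       % residual covariance matrix
        npar = K^2 * p + K + K * (K + 1) / 2;             % coefficients, intercepts and covariance terms
        aic(p) = (T - p) * log(det(Sigma)) + 2 * npar;    % Gaussian AIC up to additive constants
    end
    [~, phat] = min(aic);
end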
4. Monte Carlo Simulation
In this section, we analyze the two-step estimation method for ADFM (
1)–(2) for complete and incomplete panel data within a Monte Carlo (MC) simulation study. Among other things, we address the following questions:
(i) does the data sample size (i.e., length and number of time series) affect the estimation quality?
(ii) to what extent does data incompleteness deteriorate the estimation quality?
(iii) do the underlying panel data types (i.e., stock, flow and change in flow variables) matter?
(iv) does our model selection procedure detect the true factor dimension and lag order, even for
and
?
(v) how does our two-step approach perform compared to the estimation method of Bańbura and Modugno [
20]?
(vi) are factor means and covariance matrices more accurate for the closed-form factor distributions (
4) instead of the standard KF and KS?
Before we answer the previous questions, we explain how our random samples are generated. For
with
, let
stand for the uniform distribution on the interval
and let
be a diagonal matrix with elements
. For fixed data and factor dimensions
, let
and
represent arbitrary orthonormal matrices. Then, we receive the parameters of the ADFM (
1)–(2) in the following manner:
The above ADFMs have cross-sectionally, but not serially, correlated shocks. To avoid implicitly constructing SFMs with eigenvalues of
close to zero, the eigenvalues of
lie within the range
. The division by
p balances the sum of all eigenvalues regarding the autoregressive order
p. For simplicity reasons, we consider matrices
with positive eigenvalues. However, this assumption, the restriction to eigenvalues in the range
and the division by
p can be skipped. If matrices
meet the covariance-stationarity conditions, we simulate factor samples
and panel data
using Equations (
1) and (2). Otherwise, all matrices
are drawn again until the covariance-stationarity conditions are met. Similarly, we only choose matrices
W of full column rank
K.
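The following Matlab sketch illustrates this sampling scheme; the sample sizes, eigenvalue range and noise scalings are placeholders rather than the exact values used in our study.

rng(1);
N = 50; T = 200; K = 3; p = 2;                            % placeholder sample sizes and model orders
lb = 0.2; ub = 0.8;                                       % placeholder eigenvalue range of the VAR matrices
W = orth(randn(N, K)) * diag(0.5 + rand(K, 1));           % loadings with full column rank K
SigEps = 0.1 * (eye(N) + 0.5 * ones(N));                  % cross-sectionally correlated shock covariance
stable = false;
while ~stable                                             % redraw until covariance-stationarity holds
    A = cell(1, p);
    for i = 1:p
        Q = orth(randn(K));                               % orthonormal eigenvectors
        A{i} = Q * diag((lb + (ub - lb) * rand(K, 1)) / p) * Q';  % eigenvalues in [lb, ub] divided by p
    end
    comp = [cell2mat(A); eye(K * (p - 1)), zeros(K * (p - 1), K)]; % companion matrix of the VAR(p)
    stable = all(abs(eig(comp)) < 1);
end
F = zeros(T, K);
for t = p+1:T                                             % factor dynamics, Equation (2)
    for i = 1:p
        F(t, :) = F(t, :) + F(t - i, :) * A{i}';
    end
    F(t, :) = F(t, :) + 0.1 * randn(1, K);                % factor innovations (placeholder scale)
end
X = F * W' + randn(T, N) * chol(SigEps);                  % observation equation, Equation (1)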
So far, we have complete panel data. Let the ratio of gaps arising from missing observations and low-frequency time series, respectively, be given. To obtain incomplete data, we remove elements from each time series. For stock variables, we randomly delete values to end up with irregularly scattered gaps. At this stage, flow and change in flow variables serve as low-frequency information, which is supposed to have an ordered pattern of gaps. Therefore, an observation is made only at regularly spaced points in time. Please note that an observed (change in) flow variable is a linear combination of high-frequency data.
In
Table A1,
Table A2,
Table A3,
Table A4,
Table A5,
Table A6,
Table A7,
Table A8 and
Table A9 the same
applies to all univariate columns in
X such that gaps of (change in) flow variables occur at the same time. If the panel data contains a single point in time without any observation, neither our closed-form solution nor the standard KF provide factor estimates. To avoid such scenarios, i.e., empty rows of the observed panel data
, each panel data in the second (third) column of
Table A1,
Table A2,
Table A3 and
Table A4 comprises
time series modeled as stock variables and
time series treated as (change in) flow variables. To ensure at least one observation per row of
, we check each panel data sample, before we proceed. If there is a zero row in
, we reapply our missing data routine based on the complete data
X.
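Building on the generated panel X above, the missing-data pattern can be imposed as in the following Matlab sketch, where the gap ratio, the aggregation length h and the split into stock and (change in) flow variables are placeholders.

ratio = 0.2; h = 3;                                       % placeholder gap share and aggregation length
Nstock = 40;                                              % first Nstock columns are stock variables
Xobs = X;
for j = 1:Nstock                                          % stock variables: irregularly scattered gaps
    Xobs(randperm(T, round(ratio * T)), j) = NaN;
end
for j = Nstock+1:N                                        % (change in) flow variables: ordered gaps
    agg = movsum(X(:, j), [h - 1, 0]);                    % linear combination of high-frequency data
    Xobs(:, j) = NaN;
    Xobs(h:h:T, j) = agg(h:h:T);                          % one observation per low-frequency period
end
if any(all(isnan(Xobs), 2))                               % no row may be completely unobserved
    warning('Empty row found: redraw the missing-data pattern from the complete panel X.');
end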
Note that estimated factors are unique except for an invertible, linear transformation. For a proper quality assessment across diverse estimation methods, we must take this ambiguity into account as in [
4,8,16,20,42]. Let
F and
be the original and estimated factors, respectively. If the estimation methodology works, it holds:
. The solution
justifies the trace
of Stock and Watson [
8] defined by
The trace statistic lies in the interval [0, 1], with the lower (upper) limit indicating a poorly (perfectly) estimated factor span.
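In Matlab, one common form of this statistic reads as follows; normalizations may differ slightly from the exact definition used in our tables.

function r2 = trace_r2(F, Fhat)
    % Multivariate trace R^2 in the spirit of Stock and Watson [8]: share of the
    % variation of the true factors F explained by the span of the estimates Fhat.
    P  = Fhat * ((Fhat' * Fhat) \ Fhat');                 % projection onto the estimated factor span
    r2 = trace(F' * P * F) / trace(F' * F);               % value in [0, 1]
end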
Eventually, we choose for the termination criteria:
and
, i.e., we have the same
and
as in the empirical application of
Section 6. Furthermore, we use constant interpolation for incomplete panel data, when we initialize the set
. In
Table A1,
Table A2 and
Table A3, we consider, for known factor dimension
K and lag order
p, whether the standard KF and KS should be used for estimating the factor means
and covariance matrices
instead of the closed-form distributions (
4). To be more precise,
Table A1 shows trace
means (each based on 500 MC simulations) when we combine the EM updates (
6)–(
10) with the standard KF and KS. For the same MC paths,
Table A2 provides trace
means, when we use Equation (
4) instead.
A comparison of
Table A1 and
Table A2 shows: First, both estimation methods offer large trace
values regardless of the data type, i.e., the mix of stock, flow and change in flow variables does not affect the trace
. Second, the larger the percentage of gaps the worse the trace
. Third, the trace
increases for large samples (i.e., more or longer time series). Fourth, for larger
K and
p the trace
, ceteris paribus, deteriorates. Fifth, our estimation method based on closed-form factor moments appears more robust than the Kalman approach. For instance, in
Table A1 for
and 40% of missing data the trace
is NaN, which is an abbreviation for
Not a Number, i.e., there was at least one MC path that the Kalman approach could not estimate. By contrast, the respective trace
in
Table A2 is 0.94 and so, all 500 MC paths were estimated without any problems. The means in
Table A1 and
Table A2 are quite close; therefore,
Table A3 divides the means in
Table A2 by their counterparts in
Table A1. Hence, ratios larger than one indicate that our estimation method outperforms the Kalman approach, while ratios less than one do the opposite. Since all ratios in
Table A3 are at least one, our method is superior.
For the sake of simplicity, we proceed with stock variables only, i.e., we treat all incomplete time series as stock variables in
Table A4, which compares the single-step estimation method from Bańbura and Modugno [
20] (abbreviated by BM) with our closed-form factor moments, two-step approach (abbreviated by CFM). At first glance, one step less speaks in favor of the single-step ansatz. However, one step less comes with a price, i.e., its state-space representation. Whenever a switch between data types occurs, the state-space representation of the overall model in [
20] calls for adjustments. Furthermore, the inclusion of mixed-frequency information requires a regular scheme as for months and quarters. E.g., for weeks and months with an irregular pattern, the state-space representation in [
20] becomes very large or calls for a recursive implementation of the temporal aggregation (
11) as in Bańbura and Rünstler [
18]. By contrast, our two-step approach permits any data type and calendar structure through the linear relation (
11) and leaves the overall model untouched. This is simple and reduces the risk of mistakes. Moreover, the estimation of factor moments in closed form is computationally cheaper than a KF-KS-based estimation. Because of this, our approach can be more than five times faster than the corresponding procedure in [
20]. According to
Table A4, our approach is 5.5 times faster than its BM counterpart for complete panel data with
and
. For missing panel data in the range of [10%, 40%], ceteris paribus, our closed-form approach is 3.3–3.7 times faster than the KF-KS approach in [
20].
Bańbura and Modugno [
20] first derived their estimation method for EDFMs. Thereafter, they followed the argumentation in Doz et al. [
16] to admit weakly cross-sectionally correlated shocks
. Since Doz et al. [
16] provided asymptotic results, we would like to assess how the method of [
20] performs for finite samples with cross-sectionally correlated shocks. With a view to
Table A4 we conclude: First, the general facts remain valid, i.e., for more missing data the trace
means worsen. Similarly, for larger
K and
p, the trace
means, ceteris paribus, deteriorate. By contrast, for larger panel data the trace
means improve. Second, for simple factor dynamics, i.e., small
K and
p, or sufficiently large panel data, cross-sectional correlation of the idiosyncratic shocks does not matter, if the ratio of missing data is low. This is in line with the argumentation in [
16,
20]. However, for small panel data, e.g.,
and
, with 40% gaps and factor dimensions
cross-sectional error correlation matters. This is why our two-step estimation method outperforms the one-step approach of [
20] in such scenarios.
Next, we focus on our two-step model selection procedure. Here, we address the impact of the multiplier
m in Equation (
19) on the estimated factor dimension. For
Table A5,
Table A6,
Table A7,
Table A8 and
Table A9, we set
in Algorithm A1. Since
Table A5 and
Table A6 treat ADFMs with
and
, the upper limits
and
are set. In
Table A7,
Table A8 and
Table A9, we have trace
means, estimated factor dimension and lag orders of ADFMs with
and
. Therefore, we specify
and
in these cases. For efficiency reasons, the criterion (17) tests factor dimensions in the range
instead of the overall range
. A comparison of
Table A5 and
Table A6 shows that multipliers
and
detect the true factor dimension and hence, support that the true lag order is identified. In doing so, larger panel data increases the estimation quality, i.e., trace
means increase, while estimated factor dimensions and lag orders converge to the true ones. By contrast, more gaps deteriorate the results.
For a better understanding of the meaning of
m, we have a look at ADFMs with
in
Table A7,
Table A8 and
Table A9 and conclude: First, multiplier
is too strict, since it provides 12 for the estimated factor dimension, which is the lower limit of our tests. Fortunately, the criterion (
20) for estimating the autoregressive order tends to the true one, even though the estimated factor dimension is too small. Second, for
the slope argumentation after Equation (17) yields
, which properly estimates the true factor dimension for all scenarios in
Table A8. As a consequence, the trace
means in
Table A8 clearly dominate their analogs in
Table A7. Third, we consider
in
Table A9 for some additional sensitivity analyses. If 40% of the panel data is missing,
overshoots the true factor dimension, which is reflected in slightly smaller trace
means than in
Table A8. For lower ratios of missing observations, our two-step estimation method with
also works well, i.e., it delivers large trace
means and the estimated factor dimensions and lag orders tend towards the true values. With
Table A7,
Table A8 and
Table A9 in mind, for empirical studies we recommend choosing
m rather too small than too large.
5. Modeling Index Returns
The preceding sections show how to condense information in large, incomplete panel data in the form of factors with known distributions. In the past, factor models were popular for nowcasting and forecasting of Gross Domestic Products (GDPs) and the construction of composite indicators [
4,
12,
14,
15,
17,
18,
28,
43].
Now, we show how estimated factors may support investing and risk management. Let
be the returns, e.g., of the S&P 500 price index. The panel data
delivers additional information on the financial market, related indicators, the real economy, etc. Like Bai and Ng [
44], we construct interval estimates instead of point estimates for the future returns. However, our prediction intervals are derived empirically, since asymptotic intervals are not available in the presence of missing observations.
Uncertainties arising from the estimation of factors and model parameters shall affect the interval size. Additionally, we intend to disclose the drivers of the expected returns supporting plausibility assessments. As any problem resulting from incomplete data was solved before, we assume coincident updating frequencies of factors and returns. Let the return dynamics satisfy an Autoregressive Extended Model (ARX)
with
and
. The VAR
in (2) requires the latter constraint, as otherwise for
the ARX parameters are not identifiable. Thus, for sample length
,
and
, we consider the following regression model
where
and
are constants and
denotes the factor at time
t in Model (
1)–(2). Then, we collect the regression parameters of (
21) in the joint vector
.
The OLS estimate
of
is asymptotically normal with mean
and covariance matrix
depending on
and the design matrix resulting from (
21) [
30] (p. 215) and its parameters can be consistently estimated. Subsequently, we assess the uncertainty caused by the estimation of
. For this, the asymptotic distribution with consistently estimated parameters is essential, since an unknown parameter vector
is randomly drawn from it [
41] (Algorithm 4.2.1) for the construction of prediction intervals of
. The factors are unique up to an invertible, linear transformation
as shown by
The unobservable factor
must be extracted from
X which may be distorted by estimation errors. To cover the inherent uncertainty, we apply (
4) and obtain for (
21)
with
as square root matrix of
and
for all
. The vector
and error
are independent for all
.
When we empirically construct prediction intervals for
, uncertainties due to factor and ARX parameter estimation shall drive the interval width. To implement this in a Monte Carlo approach, let
C be the number of simulated
using Equation (
22). After Algorithm A2 determined the factor distribution (
4), for each trajectory
a random sample
enters the OLS estimate of
such that the distribution of
depends on
c. Therefore, we capture both estimation risks despite their nonlinear relation.
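The Matlab sketch below compresses this simulation loop into a stylized ARX(1) with one factor lag; r, Fmean and SigmaF denote the return series, the conditional factor means and a factor covariance matrix, the factors are drawn period by period as a simplification of the joint distribution (4), and the full procedure is Algorithm 4.2.1 in [41].

C = 10000; alpha = 0.5;                                   % number of trajectories and prediction level
T = numel(r);                                             % r: T x 1 vector of weekly log-returns
rsim = zeros(C, 1);
for c = 1:C
    Fc = Fmean + randn(size(Fmean)) * chol(SigmaF);       % period-wise factor draw (simplification of (4))
    Z  = [ones(T - 1, 1), r(1:T-1), Fc(1:T-1, :)];        % ARX(1) design matrix, cf. (21)
    y  = r(2:T);
    theta_hat = Z \ y;                                    % OLS estimate of the ARX parameters
    res = y - Z * theta_hat;
    s2  = (res' * res) / (numel(y) - numel(theta_hat));
    Sig = s2 * inv(Z' * Z);                               % asymptotic covariance of theta_hat
    theta_c = theta_hat + chol(Sig)' * randn(numel(theta_hat), 1);  % parameter draw
    znext   = [1, r(T), Fc(T, :)];                        % regressors for the next period
    rsim(c) = znext * theta_c + sqrt(s2) * randn;         % simulated 1-step ahead return, cf. (22)
end
rsrt = sort(rsim);
lo = rsrt(ceil((1 - alpha) / 2 * C));                     % lower bound of the empirical prediction interval
up = rsrt(floor((1 + alpha) / 2 * C));                    % upper bound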
The orders
are selected using AIC based on the estimated factor means. To take the factor hiddenness into account, we approximate the factor variance by the distortion of
. Then, let the periods and frequencies of
and
be coincident. Besides
, this avoids a run-up period before
offering additional information in terms of
. For chosen
from Model (
1)–(2), the optimal pair
can be computed using an adjusted AIC. Here we refer to Ramsauer [
41] for more details. Finally, a prediction interval for
can be generated in a Monte-Carlo framework by drawing
from the asymptotic distribution of
, the factors
from (
4) and using (
21).
The mean and covariance matrix of the OLS estimate
are functions of the factors such that the asymptotic distribution of
in Ramsauer [
41] (Algorithm 4.2.1) depends on
. If we neglect the
impact on the mean and covariance matrix of
for a moment, e.g., in case of a sufficiently long sample and little varying factors, we may decompose the forecasted returns as follows
with
and
for all
.
If neither the returns
nor any transformation of
are part of the panel data
X, the distinction between the four pillars in (
23) is more precise. In Equation (
23), there are four drivers of
.
AR Nature covers the autoregressive return behavior, whereas
Factor impact maps the information extracted from the panel data
X. Therefore, both affect the direction of
. By contrast, the latter two capture estimation uncertainties. Therefore,
Factor Risk reveals the distortion caused by
and hence, indicates the variation inherent in the estimated factors. This is of particular importance for data sets of small size or with many gaps. Finally,
AR Risk incorporates deviations from the expected trend, since it adds the deviation of the ARX residuals.
The four drivers in (
23) support the detection of model inadequacies and the construction of extensions, since each driver can be treated separately or as part of a group. For instance, a comparison of the pillars
AR Nature and
Factor Impact shows, whether a market has an own behavior such as a trend and seasonalities or is triggered by exogenous events. Next, we trace back the total contribution of
Factor Impact to its single constituents such that the influence of a single signal may be analyzed. For this purpose, we store the single constituents of
Factor Risk, sort all time series in line with the ascendingly ordered returns and then derive prediction intervals for both (i.e., the returns and their single drivers). This procedure avoids discrepancies due to data aggregation and ensures consistent expectations of
and its drivers.
All in all, the presented approach for modeling the 1-step ahead returns of a financial index offers several advantages for asset and risk management applications: First, it admits the treatment of incomplete data. E.g., if macroeconomic data, flows, technical findings and valuation results are included, data and calendar irregularities cannot be neglected. Second, for each low-frequency signal a high-frequency counterpart is constructed (nowcasting) to identify, e.g., structural changes in the real economy at an early stage. Third, the ARX Model (
21) links the empirical behavior of an asset class with exogenous information to provide interval and point estimates. Besides the expected return trend, the derived prediction intervals measure estimation uncertainties. In addition, investors take a great interest in the market drivers, as these indicate the sustainability of a market development. For instance, if increased inflows caused by an extremely loose monetary policy trigger a stock market rally and an asset manager is aware of this, he cares more about an unexpected change in monetary policy than about poor macroeconomic figures. As soon as the drivers are known, alternative hedging strategies can be developed. In our example, fixed income derivatives might also serve for hedging purposes instead of equity derivatives.
The prediction intervals cover the trend and uncertainty of the forecasted returns. Therefore, we propose some simple and risk-adjusted dynamic trading strategies incorporating them. For simplicity, our investment strategies are restricted to a single financial market and a bank account. For
, let
be the ratio of the total wealth invested with an expected return
over the period
. The remaining wealth
is deposited in the bank account at an interest rate
. Let
and
be lower and upper limits, respectively, of the
-prediction interval for the same period. Then, a trading strategy based on the prediction intervals is given by
If the prediction interval is centered around zero, no clear trend is indicated apart from lateral movements. Regardless of the interval width, Strategy (
24) takes a neutral allocation (i.e., 50% market exposure and 50% bank account deposit). As soon as the prediction interval is shifted to the positive (negative) half-plane, the market exposure increases up to 100% (decreases down to 0%). Depending on the interval width, the same shift size results in different proportions
, i.e., for large intervals with a high degree of uncertainty, a shift to the positive (negative) half-plane causes a smaller increase (decrease) in
compared to tight ones indicating low uncertainty. Besides temporary uncertainties, the prediction level
affects the interval size and so, the market exposure
. Therefore, we have: The higher the level
, the smaller and rarer the deviations from the neutral allocation.
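Since Equation (24) is not restated here, the following Matlab rule reproduces the behavior just described (a neutral 50% exposure for an interval centered around zero, full or zero exposure once the interval lies entirely in the positive or negative half-plane, and a dampened reaction for wide intervals); the exact functional form of (24) may differ.

expo = @(lo, up) max(0, min(1, up ./ (up - lo)));         % assumes up > lo; equals 0.5 for a centered interval
pi_t = expo(lo, up);                                      % share of wealth invested in the index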
Strategy (
24) is not always appropriate for applications in practice due to investor-specific risk preferences and restrictions. For all
, Strategy (
24) can therefore be adjusted accordingly
with
from Equation (
24).
with
are the lower and upper limits, respectively, of the market exposure which may not be exceeded.
reflects the risk appetite of the investor.
The max-min-construction in Equation (
25) defines a piecewise linear function bounded below (above) by
(
). Within these limits the term
drives the market exposure
. For
changes in
are scaled-up (i.e., increased amplitude of
versus
). Furthermore, the limits are more likely to be reached. This is why
refers to a risk-affine investor. By contrast,
reduces the amplitude of
and thus, of
. Therefore,
covers a risk-averse attitude. As an example, we choose
and
which implies:
such that short sales are possible.
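A plausible reading of the max-min construction in (25) is sketched below in Matlab, where pi_lo and pi_hi denote the exposure limits and s the risk-appetite scaling; the mapping of labels such as 2/1/0 to these parameters is an assumption for illustration only.

% Rescale the deviation of pi_t from the neutral 50% allocation by s and cap it at the exposure limits.
pi_ls = @(pi_t, s, pi_lo, pi_hi) max(pi_lo, min(pi_hi, 0.5 + s .* (pi_t - 0.5)));
w = pi_ls(pi_t, 1, 0, 2);                                 % e.g., leverage up to 200% without short sales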
6. Empirical Application
This section applies the developed framework to the S&P500 price index. Diverse publication conventions and delays require us to declare when we run our updates. From a business perspective, the period between the end of trading on Friday and its restart on Monday is reasonable. On the one hand, there is plenty of time after the day-to-day business is done. On the other hand, there is enough time left to prepare changes in existing asset allocations triggered by the gained information, e.g., the weekly prediction intervals, before the stock exchange reopens. In this example, we have a weekly horizon such that the obtained prediction intervals cover the expected S&P500 log-return until the next Friday. For the convenience of the reader, we summarize the vintage data of weekly, monthly or quarterly frequencies in
Appendix E. Here, we mention some characteristics of the raw information, explain the preprocessing of inputs and state the data types (stock, flow or change in flow variable) of the transformed time series. Some inputs are related to each other; therefore, we group them into US Treasuries, US Corporates, US LIBOR, Foreign Exchange Rates and Gold, Demand, Supply, and Inflation, before we analyze the drivers of the predicted log-returns. This improves the clarity of our results, in particular, when we illustrate them.
The overall sample ranges from 15 January 1999 to 5 February 2016 and is updated weekly. We set a rolling window of 364 weeks, i.e., seven years, such that the period from 15 January 1999 until 30 December 2005 constitutes our in-sample period. Based on this, we construct the first prediction interval for the S&P500 log-return from 30 December 2005 until 6 January 2006. Then, we shift the rolling window by one week and repeat all steps (incl. model selection and estimation) to derive the second prediction interval. Finally, we proceed until the sample end is reached. As the length of the rolling window is kept, the estimated contributions remain comparable as time goes by. Furthermore, our prediction intervals react to structural changes, e.g., crises, more quickly compared to an increasing in-sample period. As upper limits of the factor dimension, factor lags and return lags we choose and , respectively. For the termination criteria, we have: and . To avoid any bias caused by simulation, each prediction interval relies on trajectories.
For the above settings, we receive the prediction intervals in
Figure A1 for weekly S&P500 log-returns. To be precise, the light gray area reveals the 50%-prediction intervals, while the black areas specify the 90%-prediction intervals. Here, each new, slightly darker area corresponds to prediction levels increased by 10%. In addition, the red line shows the afterwards realized S&P500 log-returns. Please note that the prediction intervals cover the S&P500 returns quite well, as there is a moderate number of interval outliers. However, during the financial crisis in 2008/2009 we have a cluster of interval outliers, which calls for further analyses. Perhaps, the inclusion of regime-switching concepts may remedy this circumstance.
As supplement to
Figure A1,
Figure A2 breaks the means of the predicted S&P500 log-returns down into the contributions of our panel data groups. In contrast to
Figure A1, where Factor and AR Risks widened the prediction intervals, neither of them matters in
Figure A2. This makes sense, as we average the predicted returns, whose Factor and AR Risks are assumed to have zero mean. Dark and light blue areas show how financial data affects our return predictions. In particular, during the financial crisis in 2008/2009 and in the years 2010–2012, when the United States (US) Federal Reserve intervened in capital markets in the form of its quantitative easing programs, financial aspects mainly drove our return predictions. Since 2012, the decomposition has been more scattered and has changed quite often, i.e., both macroeconomic and financial events matter.
Figure A3 also supports the hypothesis that exogenous information increasingly affected the S&P500 returns in recent years. Although the factor dimension stayed within the range [15, 16] and we have for the autoregressive return order
, from mid-2013 until mid-2015 the factor lags
p and
increased. This indicates a more complex ADFM and ARX modeling.
Next, we focus on the financial characteristics of the presented approach. Therefore, we verify whether the Trading Strategies (
24) and (
25) may benefit from the proper mapping of the prediction intervals. Here, we abbreviate Trading Strategy (
24) based on the 50%-prediction intervals by Prediction Level (PL) 50, while PL 60 is its analog using the 60%-prediction intervals, etc. For simplicity, our cash account does not offer any interest rate, i.e.,
for all times
and transaction costs are neglected. In total,
Figure A4 illustrates how an initial investment of 100 United States Dollars (USD) on 30 December 2005 in the trading strategies PL 50 until PL 90 with weekly rebalancing would have evolved. Hence, it shows a classical backtest.
In addition, we analyze how Leverage & Short Sales (L&S) change the risk-return profile of Trading Strategy (
24). Again, we have for the cash account:
and there are zero transaction costs. That is, we examine how the risk-return profile of Trading Strategy (
25) deviates from the one in (
24) and what the respective contribution of parameters
and
is. In
Figure A4, L&S 2/1/0 stands for Trading Strategy (
25) with weekly rebalancing based on PL 50 with parameters
and
. The trading strategy L&S 2/1/−1 is also based on PL 50, but has the parameters
and
.
In
Figure A4, the strategy S&P500 reveals how a pure investment in the S&P500 would have performed. Moreover,
Figure A4 shows the price evolution of two Buy&Hold (B&H) and two Constant Proportion Portfolio Insurance (CPPI) strategies with weekly rebalancing. Hence, the Buy&Hold strategies serve as Constant Mix strategies. Here, B&H 50 denotes a Buy&Hold strategy with rebalanced S&P500 exposure on average of PL 50. Similarly, B&H 90 invests the averaged S&P500 exposure of PL 90. In
Figure A4, CPPI 2/80 stands for a CPPI strategy with multiplier 2 and floor 80%. The floor of a CPPI strategy denotes the minimum repayment at maturity. For any point in time before maturity, the cushion represents the difference between the current portfolio value and the discounted floor. Here, discounting does not matter, since
holds. The multiplier of a CPPI strategy determines to what extent the positive cushion is leveraged. As long as the cushion is positive, the cushion times the multiplier, which is called the exposure, is invested in the risky assets. Because of
, there is no penalty if the exposure exceeds the current portfolio value. To avoid borrowing money, the portfolio value at a given rebalancing date caps the risky exposure in this section. As soon as the cushion is zero or becomes negative, the total wealth is deposited in the bank account with
for the remaining time to maturity. Further information about CPPI strategies can be found in, e.g., Black and Perold [
45]. Similarly, CPPI 3/60 stands for a CPPI strategy with multiplier 3 and floor 60%.
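The CPPI mechanics described above, with zero interest (no discounting of the floor) and the risky exposure capped by the portfolio value, can be summarized in Matlab as follows; the return series ret and the initial wealth V0 are placeholders.

function V = cppi_backtest(ret, m, floorpct, V0)
    % ret: vector of weekly index returns; m: multiplier; floorpct: floor as share of V0.
    V = zeros(numel(ret) + 1, 1);
    V(1) = V0;
    floorval = floorpct * V0;                             % minimum repayment at maturity
    for t = 1:numel(ret)
        cushion  = max(V(t) - floorval, 0);               % cushion = portfolio value minus floor
        exposure = min(m * cushion, V(t));                % cap avoids borrowing money
        V(t+1) = V(t) + exposure * ret(t);                % cash earns zero interest
    end
end
% e.g., V = cppi_backtest(ret, 2, 0.80, 100) corresponds to CPPI 2/80 with 100 USD initial wealth.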
Besides
Figure A4,
Table A10 lists some common performance and risk measures for all trading strategies. We conclude: First, for higher prediction levels the Log-Return (Total, %) of the corresponding PL strategy decreases. E.g., compare PL 50 and PL 90. By definition, a high prediction level widens the intervals such that shifts in their location have less impact on the stock exposure
in (
24). As shown in
Figure A5, all PL strategies are centered around a level of 50%, but PL 50 adjusts its stock exposure more often and to a bigger extent than PL 90. Second, all PL strategies have periods of time with a lasting stock exposure
or
. Over our out-of-sample period, PL 50 invested on average 51% of its wealth in the S&P 500, but outperformed B&H 50 by far. Hence, changing our asset allocation by
in (
24) really paid off.
Except for the L&S strategies, PL 50 has the highest Log-Return (Total, %) and therefore appears very attractive. However, the upside usually comes at a price. This is why we next focus on the volatilities of our trading strategies. In this regard, CPPI 2/80 offers the lowest weekly standard deviation at 0.93%. With its allocation in
Figure A5 in mind, this makes sense, as CPPI 2/80 was much less exposed to the S&P500 than all others. Please note that
Figure A5 also shows how CPPI 3/60 was hit by the financial crisis in 2007/2008, when its S&P500 exposure dramatically dropped from 100% on 3 October 2008 to 21% on 13 March 2009. For PL strategies, the volatility shows the opposite picture compared to the Log-Return (Total, %), i.e., the higher the prediction level, the lower the weekly standard deviation. This sounds reasonable, as PL 90 makes smaller bets than PL 50. For L&S strategies,
Table A10 confirms that leveraging works as usual: return and volatility increased at the same time.
The Sharpe Ratio links the return and volatility of a trading strategy. Except for L&S 1.5/1/−0.5, the PL strategies offer the largest Sharpe Ratios. In particular, PL 80 has the largest weekly Sharpe Ratio of 7.39%. As a supplement,
Table A11 reveals that the Sharpe Ratios of PL 80 and PL 90 are significantly different from those of the S&P500, CPPI 2/80 and CPPI 3/60. The differences within or between the PL and L&S strategies are not significant. The Omega Measure compares the upside and downside of a strategy. Based on
Table A10, L&S 1.5/1/−0.5 and L&S 2/1/−1 have the largest Omega Measures given by 134.92% and 132.39%. The Omega Measures of the PL strategies lie in the range [121.34%, 124.94%], which are larger than those of the benchmark strategies in the range [103.86%, 111.16%]. The differences between all Omega Measures are not significant, see Table 4.16 in [
41].
Similar to the volatility, CPPI 2/80 has the smallest 95% Value at Risk and 95% Conditional Value at Risk. The PL strategies have more or less the same weekly 95% VaR, since all lie in the range
. However, their 95% CVaR ranges from −3.19% to −2.78% and thus reflects that PL 50 makes bigger bets than PL 90. For L&S strategies, there is no clear pattern in how leveraging and short selling affect the 95% VaR and CVaR. Finally, we consider the Maximum Drawdown based on the complete out-of-sample period. Please note that
Figure A4 and
Figure A5 and
Table A10 confirm that CPPI 3/60 behaved like the S&P500 until it was knocked out by the financial crisis in 2007/2008. This is why its Maximum Drawdown of −48.43% is close to the −56.24% of the S&P500. By contrast, the Maximum Drawdowns of the PL strategies lie in the range of [−19.91%, −17.37%], which is less than half. They are even smaller than the Maximum Drawdown of CPPI 2/80, which is −23.18%. For L&S strategies, short sales allow us to gain from a drop in the stock market, while leveraging boosts profits and losses. In total, this yields a scattered picture for their Maximum Drawdowns.
With the financial figures in mind, we recommend PL 50 for several reasons: First, it provides a decent return, which is steadily gained over the total period. Second, it has an acceptable volatility and a moderate downside. Please note that all PL strategies, L&S 1.5/1/−0.5 and L&S 2/1/−1 are positively skewed, which indicates a capped downside. The normalized histograms of the log-returns for all trading strategies can be found in Figure 4.6 from Ramsauer [
41].
If we repeat the previous analysis for complete panel data, we can verify whether the inclusion of mixed-frequency information really pays off. Instead of all 33 time series in
Appendix E, we restrict ourselves to US Treasuries, Corporate Bonds, London Interbank Offered Rate (LIBOR) and Foreign Exchange (FX)&Gold. Therefore, we have 22 time series without any missing observations. Again, we keep our rolling window of 364 weeks and gradually shift it over time, until we reach the sample end. For the upper limit of the factor dimension, we set
. At this stage, there are no obvious differences between the prediction intervals for incomplete and complete panel data [
41] (Figure 4.7). If we break the means of the predicted log-returns in
Figure A6 down into the contributions of the respective groups as shown in
Figure A6, we have a different pattern than in
Figure A2. E.g.,
Figure A2 detects supply as main driver at the turn of the year 2009/2010, whereas
Figure A6 suggests US Treasuries and Corporate Bonds. However, in the years 2010–2012 US Treasuries gained in importance in
Figure A6, which also indicates the interventions of the US Federal Reserve through its quantitative easing programs.
Next, we analyze the impact of the prediction intervals on Trading Strategies (
24) and (
25). Besides PL and L&S strategies of
Figure A4 based on 33 variables,
Figure A7 shows their analogs arising from 22 complete time series. Please note that the expression PL 50 (no) in
Figure A7 is an abbreviation for PL 50 using panel data with no gaps. The same holds for L&S 2/1/0, etc. Besides the prices in
Figure A7,
Table A12 lists their performance and risk measures. The S&P exposure of the single strategies based on the 22 complete time series can be found in Figure 4.11 from Ramsauer [
41]. Thus, we conclude: First, PL 50 (no) has a total log-return of 30.22%, which exceeds all other PL (no) strategies, but is much less than 50.93% of PL 50. Similarly, the L&S (no) strategies have a much lower log-return than their L&S counterparts. Second, PL 50 (no) changes its S&P500 exposure more often and to a larger extent than PL 90 (no). Third, the standard deviations of PL (no) strategies exceed their PL analogs such that their Sharpe Ratios are about half of the PL Sharpe Ratios. As shown in
Table A13, the Sharpe Ratios of PL and PL (no) strategies are significantly different. Fourth, PL (no) strategies are dominated by their PL versions in terms of Omega Measure. Table 4.19 in [
41] shows that such differences are not significant. Fifth, the 95% VaR and CVaR of the PL (no) strategies are slightly worse than those of the PL alternatives, but their Maximum Drawdowns almost doubled in the absence of macroeconomic signals. Except for PL 50 (no), the returns of all PL (no) strategies are negatively skewed [
41] (Figure 4.12). This indicates that large profits were removed and large losses added. All in all, we therefore suggest the inclusion of macroeconomic variables.
Eventually, we consider the Root-Mean-Square Error (RMSE) for weekly point forecasts of the S&P500 log-returns. We replace sampled factors and ARX coefficients by their estimates to predict the log-return of next week. In this context, an ARX based on incomplete panel data has a RMSE of 0.0272, while an ARX restricted to 22 variables provides a RMSE of 0.0292. Please note that a constant forecast
yields a RMSE of 0.0259, the RMSEs of Autoregressive Models (ARs) with orders from 1–12 lie in the range
and the RMSEs of Random Walks with and without drift are 0.0380 and 0.0379, respectively. Therefore, our model is mediocre in terms of RMSE. Since the RMSE controls the size, but not the direction of the deviations,
Figure A8 illustrates the deviations
of our ARX based on all panel data and the AR(3), which was best regarding RMSE. As
Figure A8 shows, the orange histogram has 4 data points with
. Our ARX predictions for 10/17/2008, 10/31/2008, 11/28/2008 and 03/13/2009 were too conservative, which deteriorated its RMSE. If we exclude these four dates, our mixed-frequency ARX has a RMSE of 0.0251, which beats all other models.
For comparing the predictive ability of competing forecasts, we perform a conditional Giacomini-White test. Our results rely on the MATLAB implementation available at
http://www.execandshare.org/CompanionSite/site.do?siteId=116 (accessed on 13 December 2020) of the test introduced in Giacomini and White [
46]. Furthermore, we consider the squared error loss function. We conclude: First, the inclusion of macroeconomic data in our approach is beneficial at a 10%-significance level. A comparison of our method based on incomplete panel data vs. complete financial data only provides a
p-value of 0.06 and a test statistic of 5.61. In this context, forecasting with macroeconomic variables outperforms forecasting based on pure financial data more than 50% of the time. Second, there are no significant differences between our approach and an AR(3) or a constant forecast
. By comparing our approach with an AR(3), we observe a p-value of 0.364. Similarly, we have a
p-value of 0.355 compared to the constant forecast
. Unfortunately, this also holds true if we remove the four previously mentioned outliers from our prediction sample.
Finally, we verify the quality of our interval forecasts with respect to the Ratio of Interval Outliers (RIO) and Mean Interval Score (MIS) for prediction intervals in
Table A14. For the respective definitions, we refer to Gneiting and Raftery [
47], Brechmann and Czado [48] and Ramsauer [
41]. In this context, the inclusion of mixed-frequency information provides some statistical improvements. Except for the 50%-prediction intervals, we have more outliers in
Table A14 when the ARX relies on the 22 complete time series than on all 33 variables. Thus, the macroeconomic indicators make our model more cautious. Except for the 90%-prediction intervals based on complete panel data, all Ratios of Interval Outliers are below the targeted threshold. In contrast to the RIO, which counts the number of interval outliers, the MIS takes into account by how much the prediction intervals are exceeded. In this regard, the ARX using incomplete panel data dominates the ARX restricted to the 22 time series. All in all, this again underpins the advantages arising from the inclusion of macroeconomic information.
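For reference, the interval score of Gneiting and Raftery [47] for a central prediction interval [lo, up] at level 1 − alpha and a realization x equals the interval width plus a penalty of (2/alpha) times the amount by which x falls outside the interval; the MIS averages this score over the evaluation sample. The Matlab sketch below computes the MIS together with a simple outlier ratio, which matches the name RIO but may differ in detail from the definition in [41,48].

function [mis, rio] = interval_diagnostics(lo, up, x, alpha)
    % lo, up, x: vectors of interval bounds and realized returns; alpha: 1 minus the prediction level.
    is = (up - lo) ...
       + (2 / alpha) .* (lo - x) .* (x < lo) ...
       + (2 / alpha) .* (x - up) .* (x > up);              % interval score of Gneiting and Raftery [47]
    mis = mean(is);                                        % Mean Interval Score
    rio = mean(x < lo | x > up);                           % share of realizations outside the interval
end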