Recovering Yield Curves from Dynamic Term Structure Models with Time-Varying Factors

A dynamic version of the Nelson-Siegel-Svensson term structure model with time-varying factors is considered for predicting out-of-sample maturity yields. Simple linear interpolation cannot be applied to recover yields at the very short and long ends of the term structure, where data are often missing. This motivates the use of dynamic parametric term structure models that exploit both time series and cross-sectional variation in yield data to predict missing data at the extreme ends of the term structure. Although the dynamic Nelson-Siegel-Svensson model is weakly identified when the two decay factors become close to each other, its predictions may be more accurate than those from more restricted models, depending on data and maturity.


Introduction
Yield curves need to be estimated, since bonds with different maturities are not directly comparable due to different coupon payments. Many central banks provide estimated yields from government bonds. The two common methods used to construct such estimates are those based on splines (smooth curve fitting) and those that are based on parametric models. Although most central banks make the estimated yield curve data publicly available, the exact methodology used to construct such estimates is usually not disclosed. For example, the U.S. Department of Treasury publishes daily treasury yield curve rates on their website (https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=yield (accessed on 14 June 2019)). The methodology used to obtain the data is explained at their Treasury Yield Curve Methodology page (https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/yieldmethod.aspx (last revised 14 October 2018; accessed on 14 July 2019)), which only states that the "yield curve is derived using a quasi-cubic hermite spline function" with no further details. Furthermore, "Treasury reserves the option to make changes to the yield curve as appropriate and in its sole discretion. Such changes may include, but are not necessarily limited to, adding, removing, or modifying inputs, and making changes to the methodology for deriving the yield curve." Moreover, even when the estimated yield curve data are publicly available, observations are often missing, particularly at the very short and long ends of the term structure. For example, for the (nominal) daily yield curve data from 3 January 2000 to 31 December 2018 (4753 observations) available from the U.S. Department of Treasury, there are 396 (8%) missing values for the one-month rate and 994 (21%) missing values for the 30-year rate. This is unfortunate as the rates at these very short and long ends of the term structure are often of interest.
Given the black box nature of yield curve data construction, the main objective of this paper is to examine whether parametric dynamic term structure models can recover the yield curve data provided by central banks. As the yield curve data are estimated from the traded bond prices, one would ideally try to recover the yield estimates from market quotes of the traded bond prices. However, such data are usually proprietary and not easily accessible to academic researchers. Furthermore, as emphasized in Nymand-Andersen [1], "data cleaning is too often neglected or deprioritised [. . . ] with the risk of not being able to differentiate noise from real (economic) signals when interpreting statistical results".
The approach taken in this paper is to use the publicly provided data to infer the full term structure of yields by predicting the missing values. As yield curve data have a panel data structure, we can exploit both the time series and cross-sectional variation in doing so. Recent work by Koo et al. [2] does this nonparametrically without imposing parametric functional forms. However, they assume that time variation in yields is driven by a vector of common observed covariates (including deterministic trends). They apply their method to the daily bond quotes data from the proprietary CRSP U.S. Treasury database.
This paper takes a parametric approach and considers some generalization of the class of dynamic term structure models used in the literature. Section 2 summarizes this literature that this paper builds upon. The main novelty of the specification considered in this paper is to allow the two decay parameters in the generalized Nelson and Siegel [3] model of Svensson [4], Söderlind and Svensson [5] to be time-varying. Koopman et al. [6] considered a specification with one time-varying decay parameter that generalizes the Nelson and Siegel [3] model. The two decay parameter model is harder to estimate than the one decay parameter model due to weak identification when the two decay parameters take similar values, as discussed in Section 2.
Models with time-varying decay parameters become nonlinear in the state variables of a state-space model. The issue of nonlinear filtering and estimation is discussed in Section 3. Section 4 conducts a small Monte Carlo simulation experiment to compare the finite sample performance of alternative nonlinear filters. An important criterion in choosing among the alternative filters is computational cost (efficiency), not just their accuracy. The results of the simulations indicate that the extended Kalman filter can be an effective choice due to its computational simplicity.
An extensive literature compares the performance of various term structure models using time series out-of-sample prediction for future observations as the performance criterion [1,7]. This paper instead uses cross-sectional out-of-sample prediction for yields with maturities not used in estimation (but in the same sample period) as the performance criterion.
Section 5 compares the out-of-sample cross-sectional predictability of alternative dynamic term structure specifications while using two types of data. For data known to be generated from parametric models, the specification with two time-varying decay parameters performs best, despite the weak identification problem. This result is perhaps not surprising. The results using data estimated from spline based methods are more mixed and can be data dependent. If the out-of-sample maturity is inside the range of maturities used for estimation, a simple linear interpolation may perform as well as predictions from parametric dynamic term structure models. However, interpolation cannot be used for maturities at the very short or long end of the term structure where the missing data problem is most prevalent. For these extreme maturities, predictions from the general specification with two time-varying decay parameters generally perform better than those based on more restricted models. However, this is not always the case and the simpler models can perform as well as the generalized model that is proposed in this paper.

Yield Curve Estimation Methods
There are two main approaches to yield curve modeling: spline based and parametric model based [1]. The spline based approach is used by the Bank of England, the Federal Reserve Bank of New York, Bank of Canada, among others (footnote 2, p. 5 in [1]). The parametric model based approach is used by many European central banks, such as Deutsche Bundesbank, Banco de España, Banca d'Italia, Banque de France, and the European Central Bank (footnote 3, p. 5 in [1]).

Spline Based Methods
The spline based approach uses piecewise polynomials to flexibly fit a curve through the data. The splines ensure that the piecewise functions are smoothly joined at the knot points, resulting in a smooth curve over the support. To implement this method, one needs to specify the number of knot points and their positions. The piecewise polynomials may be fit to the forward rate function, the discount function, or the log of the discount function [8].
The spline based methods differ in how they determine the coefficients for each piecewise polynomial. The different methods can be characterized by the use of different regularization, or roughness penalty. The roughness penalty function may or may not depend on maturity or vary over time (Section 3.1 in [1]). Related to spline based methods are the nonparametric methods, where polynomials are locally weighted by kernel functions [2,9]. The analogue of roughness penalty for nonparametric methods is the bandwidth, which controls the 'locality' of the kernel weights.
The flexibility to fit any curve is both an advantage and disadvantage of the spline based approach. It is well known that the shape of the fitted curve can be quite sensitive to the choice of roughness penalty. "Given the superior curvature properties of cubic splines it is a bit surprising how relatively poor the spline smoother performs [in the simulations]. [. . . ] One explanation of these results is therefore that this is related to the lack of a good bandwidth estimator". ( [9], p. 215).
For researchers to be able to reproduce the yield curve data obtained from spline based methods, we need information on the position of the knot points, type of spline (piecewise polynomial) used, and the weight used for the roughness penalty term. I am not aware of any central bank that uses spline based methods that provides such reproducible information. This is in contrast to the transparency of the parsimonious parametric approach that is discussed below.

Parametric Approach
The most commonly used parametric models for the yield curve are the Nelson and Siegel [3] and Svensson [4], Söderlind and Svensson [5] models. One of the appealing features of these models is that one can recover the implied instantaneous forward rate from the yield curve and vice versa. The Nelson and Siegel [3] instantaneous forward rate at time t with maturity τ is specified as

f_t(τ) = β1t + β2t e^(−λt τ) + β3t λt τ e^(−λt τ).

The four parameters, known as latent factors, β1t, β2t, β3t, λt have a t subscript to indicate that they can be time varying.
The yield curve at t with maturity τ is then the 'average' of forward rates given by

y_t(τ) = (1/τ) ∫₀^τ f_t(s) ds = β1t + β2t (1 − e^(−λt τ))/(λt τ) + β3t [(1 − e^(−λt τ))/(λt τ) − e^(−λt τ)].

The yield curve is linear in the factors β1t, β2t, β3t but nonlinear in the decay factor λt. Associated with each factor β are the 'loadings'

w1t(τ) = 1,  w2t(τ) = (1 − e^(−λt τ))/(λt τ),  w3t(τ) = w2t(τ) − e^(−λt τ).

Not only does this parametric form turn out to fit the yield data flexibly, but the factors also have natural interpretations. This interpretability has contributed to the widespread use of the Nelson and Siegel [3] model. The decay factor λt controls how the yield curve exponentially decays with maturity τ. β1t can be interpreted as the level, or long term, factor since an increase in β1t will shift the yield for all maturities uniformly due to the constant loading w1t = 1. β2t is known as the slope, or short term, factor. w2t is an exponentially decreasing function of τ with w2t(0) = 1 and w2t(∞) = 0. An increase in β2t shifts the short rate more than the long rate, and hence changes the slope of the yield curve. β3t is the curvature, or medium term, factor. w3t(0) = w3t(∞) = 0 and w3t(τ) is concave with a local maximum (hump shape). The decay rate λt determines the position of the local maximum. An increase in β3t shifts the medium rates more than the short and long rates and, hence, increases the curvature of the yield curve.
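As a concrete illustration, the three Nelson-Siegel loadings can be computed directly from their closed forms (a minimal sketch in Python; the function name and maturity grid are my own choices):

```python
import numpy as np

def ns_loadings(tau, lam):
    """Nelson-Siegel factor loadings at maturities tau (in years) with decay lam."""
    x = lam * np.asarray(tau, dtype=float)
    w1 = np.ones_like(x)                 # level loading: constant 1
    w2 = (1.0 - np.exp(-x)) / x          # slope loading: 1 at tau = 0, decays to 0
    w3 = w2 - np.exp(-x)                 # curvature loading: hump shaped, 0 at both ends
    return w1, w2, w3

tau = np.array([0.25, 1.0, 3.0, 5.0, 10.0, 30.0])
w1, w2, w3 = ns_loadings(tau, lam=0.5)
```

Evaluating the loadings on a grid of maturities confirms the interpretations above: w2 decreases monotonically from one toward zero, while w3 rises to a single interior maximum (whose position is controlled by λt) and decays back toward zero.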
These parametric loadings are known to match the data quite well. The first three principal components of the sample covariance of yields account for most of the variation in the U.S. yields, as documented by Litterman and Scheinkman [10]. The first factor is a 'parallel shift' and accounts for 80-90% of the term structure variation. The second factor is a 'twist' where the short and long rates move in opposite directions and accounts for 5-10% of variation. The third factor is a 'butterfly' where intermediate rates move in the opposite direction of the short and long rates and accounts for 1-2% of the variation.
The Svensson [4], Söderlind and Svensson [5] model adds another factor β4t and associated decay parameter λ2t with forward rate specified as

f_t(τ) = β1t + β2t e^(−λ1t τ) + β3t λ1t τ e^(−λ1t τ) + β4t λ2t τ e^(−λ2t τ).

The Nelson and Siegel [3] yield curve can have, at most, one hump. The additional factor and decay parameter in the Svensson [4], Söderlind and Svensson [5] model allow the yield curve to have more than one hump.
Although the additional factors increase the flexibility of the model to fit the data, they make the model potentially weakly identified. The loadings associated with the factors β 3t , β 4t become highly collinear for λ 1t ≈ λ 2t [11]. Despite this weak identification problem, most central banks that adopt the model based approach appear to favor the extended Svensson [4], Söderlind and Svensson [5] model over the original Nelson and Siegel [3] model [1]. Another recent contributing factor for the popularity of this class of parametric models is the relation with no arbitrage restrictions. Early work conducted by Filipovic [12] showed that, if the latent factors follow diffusion processes, the Nelson and Siegel [3] model is generally not arbitrage free. To be consistent with no arbitrage, the factors need to be constant or deterministic. For the Svensson [4], Söderlind and Svensson [5] model, only one factor can be non-deterministic to be consistent with no arbitrage. Christensen et al. [13] are able to find a member of the affine class of arbitrage free models of Duffie and Kan [14] with factor loadings that exactly match those of the Nelson and Siegel [3] model. However, this requires modifying the Nelson and Siegel [3] model by including an additional deterministic term in the yield curve that only depends on maturity. Krippner [15] shows how the Nelson and Siegel [3] factors (approximately) map to those of the arbitrage free gaussian affine class models of Dai and Singleton [16].
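The weak identification problem noted above can be illustrated numerically: as λ2t approaches λ1t, the two curvature loading columns become nearly perfectly correlated (a small sketch; the maturity grid and decay values are my own choices):

```python
import numpy as np

def w3(tau, lam):
    """Curvature loading of the Nelson-Siegel-Svensson model."""
    x = lam * tau
    return (1.0 - np.exp(-x)) / x - np.exp(-x)

tau = np.linspace(0.25, 30.0, 60)                        # maturities in years
close = np.corrcoef(w3(tau, 0.50), w3(tau, 0.55))[0, 1]  # nearby decay values
far   = np.corrcoef(w3(tau, 0.50), w3(tau, 2.00))[0, 1]  # well-separated values
```

With the nearby decay values, the correlation between the two curvature loading columns exceeds 0.99, so the associated factors are nearly collinear and their coefficients are only weakly identified.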
For the Svensson [4], Söderlind and Svensson [5] model, both Christensen et al. [17] and Krippner [15] show that, in addition to the maturity-dependent yield adjustment term, a second slope factor is needed to be consistent with no arbitrage. In these so-called arbitrage free affine dynamic Nelson-Siegel-Svensson models, the factors follow diffusion processes, except for the decay parameters, which remain fixed.
These parametric models can be fit to a cross-section of bonds period by period. A dynamic term structure model links these factors over time by assuming that they follow an autoregressive process. Koopman et al. [6] considered a dynamic version of the Nelson and Siegel [3] model where all latent factors follow an AR(1) process. Christensen et al. [17] considered dynamic versions of the Svensson [4], Söderlind and Svensson [5] model. However, they restricted the two decay factors to be fixed parameters. This paper considers a model where the two decay factors also follow an AR(1) process.
Early work by Diebold and Li [18] used a two-step method to estimate dynamic term structure models. The two-step method exploits the fact that, conditional on the decay factor, the model becomes linear in the remaining factors. However, in the first step, one needs to arbitrarily fix the value of the decay parameter to some 'reasonable' value. Swanson and Xiong [19] use a three-step procedure, in which the decay parameter for the two-step procedure is chosen using a grid search that optimizes the in-sample fit of the factors. This appears to be done period by period recursively without specifying the dynamics of the decay parameters. Statistical inference based on these multistep procedures is complicated, since the estimates in each step are conditional on the estimates from the previous step(s).
More recent work jointly estimates all parameters by maximizing a gaussian likelihood function [6,13,17]. The likelihood function is evaluated by casting the dynamic term structure model in state-space form. For the fixed decay factor specifications of Christensen et al. [13,17], the standard linear Kalman filter can be used to evaluate the likelihood. For specifications with time-varying decay factors, the state-space model becomes nonlinear in the states and the standard Kalman filter cannot be applied. Koopman et al. [6] use the extended Kalman filter, which applies the Kalman filter to a linearized approximation.
One main reason for Christensen et al. [13,17] to restrict the decay parameter(s) to be fixed is to exploit the no arbitrage condition. They show that their specifications belong to the class of affine arbitrage free models provided a yield adjustment term is added to the original Nelson-Siegel-Svensson specifications. There is some evidence that this yield adjustment term may be economically small and, hence, the gains from exploiting the no-arbitrage restriction may also be small [20]. This paper follows the lead of Koopman et al. [6] and examines the gain, if any, of allowing the decay parameters to be time varying instead.
An important aspect of the model based approach not emphasized enough in the literature is its parsimony. In particular, for researchers to be able to reproduce the yields and forward rates for any maturity, all one needs are the values of a small number of latent factors (six for the Svensson [4], Söderlind and Svensson [5] model). The European Central Bank is to be commended for not only making the daily yield curve data publicly available, but also providing the daily estimated values of the six latent factors of the Svensson [4], Söderlind and Svensson [5] model used to fit the yield curve (http://sdw.ecb.europa.eu/browse.do?node=9691126).

Which Method to Use?
As evidenced by the fact that both spline based and parametric model based approaches are used in practice, neither approach dominates the other. The spline based approach is favored when the emphasis is on obtaining a 'good' fit that passes through most of the data points. The model based approach is favored when the emphasis is on parsimony and smoothness of the resulting curve avoiding overfitting in-sample. The appropriate choice requires trading-off these pros and cons, depending on one's primary objective for fitting the yield curve [1]. Alternatively, rather than choosing one method over the other, one can combine or average the estimates from both methods [19].

State-Space Formulations
I compare the performance of four parametric models with time-varying factors to examine the cross-sectional predictability of dynamic term structure models. All of these models can be written in state-space form, as follows.
y_t = Z(α_t) + ε_t,  ε_t ~ N(0, H),  (2)

α_{t+1} = d + T α_t + v_t,  v_t ~ N(0, Q),  (3)

for t = 1, . . . , n. y_t is the m × 1 observed vector of yields in period t with maturities τ = (τ_1, . . . , τ_m) and α_t = (L_t, S_t, C_1t, C_2t, log λ1t, log λ2t) is the 6 × 1 unobserved latent state vector. The components of the state vector are the level (L_t), slope (S_t), curvature (C_1t, C_2t), and decay (λ1t, λ2t) factors. v_t ⊥ ε_t denotes statistical independence of v_t and ε_t. The parameters of the model are θ = (H, d, T, Q). The m × m symmetric positive definite matrix H is assumed to be diagonal with positive entries and Q is a 6 × 6 symmetric positive semi-definite matrix. If we do not restrict T or Q to be diagonal, there are m + 6 + 6 × 6 + 6 × 7/2 = m + 63 parameters to estimate.

The measurement equation links the yields to the state (factor) vector via the loadings

Z(α_t) = 1_m L_t + w2(τ, λ1t) S_t + w3(τ, λ1t) C_1t + w3(τ, λ2t) C_2t,

where w2(τ, λ) = (1 − e^(−λτ))/(λτ) and w3(τ, λ) = w2(τ, λ) − e^(−λτ) are applied element by element to τ, and 1_m is the m × 1 vector of ones.

The models used in this paper do not impose the no arbitrage restrictions. As shown in Christensen et al. [17], the model with two decay factors λ1, λ2 but one slope factor S_t cannot be arbitrage free. Christensen et al. [17] introduce a second slope factor in addition to the yield adjustment term to make the model arbitrage free. However, the model with a single slope factor already has a large number of parameters (m + 63) to estimate. Adding a second slope factor would increase the number of parameters to m + 7 + 49 + 28 = m + 84 (without restricting T or Q). Another reason to keep the model simple is to avoid exacerbating the weak identification problem alluded to in Section 2. The arbitrage free model with two decay parameters and two slope factors is not identified without imposing an identification restriction, such as λ1 > λ2. Furthermore, there is some evidence, at least with U.S. data, that the Nelson and Siegel [3] model is 'nearly' arbitrage free, indicating that the yield adjustment term may be economically small [20].
For these reasons, I keep the original Nelson-Siegel-Svensson specification without imposing the no arbitrage restrictions. As discussed below, I instead examine the gains from having a time-varying decay parameter.
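To make the measurement equation concrete, the m × 4 loading matrix implied by a given state vector, and the corresponding model-implied yields, can be sketched as follows (an illustrative sketch only; the maturity grid and state values are hypothetical):

```python
import numpy as np

def loading_matrix(tau, lam1, lam2):
    """m x 4 loading matrix linking (L, S, C1, C2) to yields in the DNSS model."""
    def w2(lam):
        x = lam * tau
        return (1.0 - np.exp(-x)) / x
    def w3(lam):
        return w2(lam) - np.exp(-lam * tau)
    return np.column_stack([np.ones_like(tau), w2(lam1), w3(lam1), w3(lam2)])

tau = np.array([0.25, 0.5, 1, 2, 3, 5, 7, 10, 20], dtype=float)  # m = 9 maturities
# hypothetical state vector: (L, S, C1, C2, log lam1, log lam2)
alpha = np.array([4.0, -1.0, 2.0, 1.5, np.log(0.5), np.log(0.15)])
Z = loading_matrix(tau, np.exp(alpha[4]), np.exp(alpha[5]))
yhat = Z @ alpha[:4]   # model-implied yields at the nine maturities
```

With H diagonal and T, Q unrestricted, the parameter count is m + 6 + 36 + 21 = m + 63, so for m = 9 yields this amounts to 72 parameters.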
An issue that does not appear to be discussed much in the literature is whether the specification should restrict the yield to be non-negative. Most dynamic term structure specifications appear not to impose such a non-negativity restriction. Inspecting the factor data publicly available from the European Central Bank indicates that the level factor L_t and decay parameters λ1, λ2 are restricted to be non-negative (though I could not find any documentation that mentions this restriction). Koopman et al. [6] do not appear to restrict either the level factor L_t or the decay factor λ_t to be non-negative. The specifications used in this paper do not restrict the level factor L_t but do restrict the decay parameters λ to be non-negative. This is done by entering the decay parameters in the state vector α_t with the (natural) log transformation log λ.
The dynamic Nelson-Siegel-Svensson (DNSS) specification presented above is the most general model considered in this paper. I also consider three special cases of this general model that have been used in the literature. The Koopman et al. [6] specification nulls out rows and columns four and six of the parameters d, T, and Q, resulting in a state-space model with a four dimensional state vector α_t = (L_t, S_t, C_1t, log λ1t). This specification is called the dynamic Nelson-Siegel model, denoted DNS, and differs from that in Koopman et al. [6] in that the decay factor λ_t is restricted to be non-negative. The two other special cases keep the decay parameters λ fixed as in Christensen et al. [13,17], but without imposing the no arbitrage restrictions. These specifications are referred to as the Nelson-Siegel-Svensson (NSS) model with state vector α_t = (L_t, S_t, C_1t, C_2t) and the Nelson-Siegel (NS) model with state vector α_t = (L_t, S_t, C_1t).

Filtering and Estimation
The parameters of the state-space model (2) and (3) are estimated by maximum likelihood. The extended Kalman filter linearizes the measurement equation around the predicted state,

Z(α_t) ≈ Z(a_t|t−1) + Ż_t (α_t − a_t|t−1),

where the m × 6 jacobian Ż_t = ∂Z/∂α is evaluated at α = a_t|t−1. The expressions for the jacobian ∂Z/∂α are given in online Appendix A.1. The gaussian log-likelihood function can be evaluated using the extended Kalman filter, which consists of the filtering steps

v_t = y_t − Z(a_t|t−1),  F_t = Ż_t A_t|t−1 Ż_t′ + H,

a_t|t = a_t|t−1 + A_t|t−1 Ż_t′ F_t^(−1) v_t,  A_t|t = A_t|t−1 − A_t|t−1 Ż_t′ F_t^(−1) Ż_t A_t|t−1,

and prediction steps

a_t+1|t = d + T a_t|t,  A_t+1|t = T A_t|t T′ + Q.

The filter is initialized with a_1|0 = (I − T)^(−1) d and the solution to the Lyapunov equation A_1|0 = T A_1|0 T′ + Q. The log-likelihood is evaluated as

log L(θ) = −(1/2) Σ_t [m log 2π + log |F_t| + v_t′ F_t^(−1) v_t].

The affine arbitrage free models estimated in Christensen et al. [13,17] have the important feature that the decay parameters λ1, λ2 are fixed and not time-varying. This makes the state-space model linear in the states (though still nonlinear in the parameters) and the standard Kalman filter can be used to evaluate the likelihood function. Koopman et al. [6] consider the Nelson and Siegel [3] model with time-varying decay parameter λ_t and find improvement in the in-sample fit of the model. Once we make the decay parameter part of the state vector α_t, the state-space model becomes nonlinear in the state and the standard Kalman filter cannot be used. Koopman et al. [6] apply the extended Kalman filter to the linearized model.
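The extended Kalman filter recursions can be sketched generically in Python (a minimal illustration, not the paper's implementation: the jacobian is computed numerically rather than from the analytic expressions in Appendix A.1, and the measurement function below is a toy example of my own):

```python
import numpy as np

def ekf_loglik(y, Zfun, H, d, T, Q):
    """Extended Kalman filter log-likelihood for
       y_t = Z(alpha_t) + eps_t,  alpha_{t+1} = d + T alpha_t + v_t."""
    n, m = y.shape
    k = d.size
    a = np.linalg.solve(np.eye(k) - T, d)   # unconditional state mean
    A = Q.copy()
    for _ in range(500):                    # fixed-point solve of A = T A T' + Q
        A = T @ A @ T.T + Q
    loglik, eps = 0.0, 1e-6
    for t in range(n):
        base = Zfun(a)
        Zdot = np.empty((m, k))             # numerical jacobian at predicted state
        for j in range(k):
            e = np.zeros(k); e[j] = eps
            Zdot[:, j] = (Zfun(a + e) - base) / eps
        v = y[t] - base                     # prediction error
        F = Zdot @ A @ Zdot.T + H
        Finv = np.linalg.inv(F)
        loglik += -0.5 * (m * np.log(2 * np.pi)
                          + np.linalg.slogdet(F)[1] + v @ Finv @ v)
        K = A @ Zdot.T @ Finv               # filtering step
        a_f, A_f = a + K @ v, A - K @ Zdot @ A
        a, A = d + T @ a_f, T @ A_f @ T.T + Q   # prediction step
    return loglik

# toy example: nonlinear measurement of a bivariate AR(1) state
rng = np.random.default_rng(0)
k, m, n = 2, 3, 200
T = np.diag([0.9, 0.8]); d = np.array([0.1, 0.0])
Q = 0.01 * np.eye(k); H = 0.01 * np.eye(m)
C = rng.normal(size=(m, k))
Zfun = lambda a: np.tanh(C @ a)
alpha = np.linalg.solve(np.eye(k) - T, d)
ys = np.empty((n, m))
for t in range(n):
    ys[t] = Zfun(alpha) + rng.multivariate_normal(np.zeros(m), H)
    alpha = d + T @ alpha + rng.multivariate_normal(np.zeros(k), Q)
ll = ekf_loglik(ys, Zfun, H, d, T, Q)
```

For the DNSS model, Zfun would map the six-dimensional state to the m yields through the Nelson-Siegel-Svensson loadings.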
The extended Kalman filter is computationally attractive, since it applies the standard Kalman filter to the approximated linear state-space model. However, the concern is the error due to the linear approximation. As an important feature of the Nelson-Siegel-Svensson specification is the ability to parsimoniously capture nonlinear curvatures in the term structure, one may be concerned whether the linearization in the extended Kalman filter may compromise this feature. As this issue appears to be little explored in the literature, I consider two alternative nonlinear filtering methods [21,22]: the unscented filter and the Rao-Blackwellized particle filter.
The unscented Kalman filter approximates the unknown distribution of a nonlinearly transformed random variable by a small number of discretized points and associated weights, known as sigma points and sigma weights. These points and weights are chosen to match the first two moments of the target distribution. It can be shown that the mean of the extended Kalman filter estimate is accurate to first order, while that of the unscented filter is accurate to the second order. (The variance is estimated to the second order of approximation by both filters.) Further details are provided in Appendix A.2.
To apply the unscented filter, one needs to specify the sigma points and weights. The requirement to match the first two moments of the target distribution imposes certain restrictions on the choice. However, they do not uniquely narrow down the choice and certain 'tuning' parameters need to be selected by the user. One of the simplest and most commonly used choices, suggested in Julier and Uhlmann [23], requires choosing how to allocate the non-negative weights that sum to one over the sigma points. The first sigma point is equal to the mean of the target distribution, while the remaining points come in pairs that offset each other, so that the mean of the sigma points is equal to the target mean.
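A minimal sketch of this sigma-point construction, parameterized by the weight w0 placed on the central point (implementation details are my own assumptions):

```python
import numpy as np

def sigma_points(mu, P, w0=0.0):
    """Julier-Uhlmann style sigma points matching mean mu and covariance P."""
    k = mu.size
    S = np.linalg.cholesky((k / (1.0 - w0)) * P)   # matrix square root of scaled P
    pts = [mu] + [mu + S[:, i] for i in range(k)] + [mu - S[:, i] for i in range(k)]
    wts = np.r_[w0, np.full(2 * k, (1.0 - w0) / (2 * k))]
    return np.array(pts), wts

mu = np.array([1.0, -2.0])
P = np.array([[2.0, 0.3], [0.3, 1.0]])
pts, wts = sigma_points(mu, P, w0=1.0 / 5.0)   # for k = 2, w0 = 1/(2k+1) gives equal weights
mean = wts @ pts                               # weighted mean of the sigma points
cov = (pts - mean).T @ np.diag(wts) @ (pts - mean)
```

Setting w0 = 1/(2k + 1) weights all 2k + 1 points equally, while w0 = 0.5 puts half the weight on the central point; in both cases the weighted mean and covariance of the sigma points reproduce mu and P exactly.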
The second alternative is based on particle filtering, also known as sequential Monte Carlo [24,25]. The unscented filter uses a small number of fixed points and weights to approximate the unknown distribution. In contrast, particle filters use a large number of random draws to accurately approximate the unknown target distribution. In a nonlinear state-space model, the target distribution of interest is the distribution of the (unobserved) state vector α t . In a standard application of particle filters, one would draw particles from the full state space with dimension equal to that of α t .
Rao-Blackwellization draws particles from a reduced dimension by partitioning the state space [26]. By drawing a subset of state variables conditional on the other state variables, one can reduce simulation variance and obtain much more accurate filtered estimates than drawing from the full state space without conditioning. The key to achieving effective variance reduction through Rao-Blackwellization is to find an appropriate partitioned state space. An important feature of the DNSS model is that, conditional on the two decay factors λ_t = (λ1t, λ2t), the model becomes linear in the remaining four state variables L_t, S_t, C_1t, C_2t. To denote the partitioned system, split the state vector as α_t = (x_t, λ_t) where x_t = (L_t, S_t, C_1t, C_2t) and λ_t = (log λ1t, log λ2t). Conditional on λ_t, the measurement equation can then be written as

y_t = Z(λ_t) x_t + ε_t,

which is linear in x_t, while the state equation remains linear as before. Details of the filtering algorithm, mostly taken from Schon et al. [26], are reproduced in Appendix A.3 for the reader's convenience. The extended and the unscented Kalman filters can both be considered to be Rao-Blackwellized for the DNSS model, as the state equation is linear in the states. The approximation is only applied to the measurement equation, which is nonlinear in the state variable(s).
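To illustrate the idea, the following toy sketch applies a Rao-Blackwellized particle filter to a scalar analogue of this structure: particles are drawn only for the log decay state, and the conditionally linear state is handled by a per-particle Kalman filter (a stylized one-factor toy model, not the paper's six-state DNSS specification; all parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# toy model: y_t = w(lam_t) x_t + e_t, with scalar AR(1) states x_t and log lam_t
phi_x, phi_l, qx, ql, r, tau = 0.9, 0.9, 0.05, 0.01, 0.01, 2.0
w = lambda lam: (1.0 - np.exp(-lam * tau)) / (lam * tau)   # Nelson-Siegel slope loading

n = 300
x = np.zeros(n); l = np.zeros(n); y = np.zeros(n)
for t in range(1, n):
    x[t] = phi_x * x[t - 1] + rng.normal(0, np.sqrt(qx))
    l[t] = phi_l * l[t - 1] + rng.normal(0, np.sqrt(ql))
for t in range(n):
    y[t] = w(np.exp(l[t])) * x[t] + rng.normal(0, np.sqrt(r))

# RBPF: sample only log lam; filter x analytically per particle
M = 500
lp = rng.normal(0, 0.1, M)            # particles for log lam
m_ = np.zeros(M); P_ = np.ones(M)     # per-particle Kalman mean/variance of x
wt = np.full(M, 1.0 / M)
x_hat = np.zeros(n); lam_hat = np.zeros(n)
for t in range(n):
    if t > 0:
        lp = phi_l * lp + rng.normal(0, np.sqrt(ql), M)   # propagate nonlinear state
        m_, P_ = phi_x * m_, phi_x**2 * P_ + qx           # Kalman prediction for x
    c = w(np.exp(lp))
    S = c**2 * P_ + r                                     # predictive variance of y_t
    wt *= np.exp(-0.5 * (y[t] - c * m_)**2 / S) / np.sqrt(S)
    wt /= wt.sum()
    K = P_ * c / S                                        # Kalman update given lam
    m_ = m_ + K * (y[t] - c * m_)
    P_ = (1.0 - K * c) * P_
    x_hat[t] = wt @ m_
    lam_hat[t] = wt @ np.exp(lp)
    if 1.0 / (wt**2).sum() < M / 2:                       # resample when ESS is low
        idx = rng.choice(M, M, p=wt)
        lp, m_, P_ = lp[idx], m_[idx], P_[idx]
        wt = np.full(M, 1.0 / M)
```

Only the nonlinear block is sampled (two-dimensional in the DNSS model, one-dimensional in this toy); the linear block is integrated out analytically, which is the source of the variance reduction.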

Computational Issues
As already indicated, the most general DNSS specification has a large number of parameters (m + 63) to estimate. In order to keep the number of parameters linear in the state dimension for estimation purposes, I restrict T and Q in the state equation to be both diagonal as in Christensen et al. [17]. This reduces the number of parameters to estimate for the DNSS model to the more manageable m + 18.
Even with this restricted specification, the DNSS model with two time-varying decay factors is significantly harder to estimate than the DNS model (with one time-varying decay factor). The reason is the weak identification problem with the DNSS model when λ 1t ≈ λ 2t . The consequence is that the numerical optimization procedure can fail to converge when the starting parameter values result in λ 1t ≈ λ 2t . As a partial solution to this problem, optimization of the log-likelihood is carried out for a number of randomly drawn sets of starting values.
The three-step estimation procedure in Swanson and Xiong [19] deals with this issue by restricting the two decay parameters to be a certain distance apart period-by-period ( [19], footnote 8, p. 12). However, when the dynamics of λ_t are specified as autoregressive processes, as is done in this paper, such a period-by-period restriction cannot be imposed. Therefore, I effectively do a grid search over a large number of randomly drawn starting parameter values.
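The random multi-start strategy can be sketched as follows (a generic illustration with a toy multimodal objective standing in for the negative log-likelihood; scipy is assumed available):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)

# toy objective with two local minima, standing in for the negative log-likelihood
negloglik = lambda p: (p[0]**2 - 1.0)**2 + 0.3 * p[0] + (p[1] - 2.0)**2

best = None
for _ in range(20):                                   # random multi-start
    p0 = rng.uniform(-3, 3, size=2)                   # randomly drawn starting values
    res = minimize(negloglik, p0, method="Nelder-Mead")
    if res.success and (best is None or res.fun < best.fun):
        best = res                                    # keep the best converged run
```

For the DNSS likelihood, the draws would cover the model parameters θ, and starts for which the optimizer fails to converge (e.g., when λ1t ≈ λ2t) are discarded in the same way the loop above discards unsuccessful runs.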

Comparison of Nonlinear Filters
This section conducts a small Monte Carlo simulation study to compare the accuracy of the alternative nonlinear filtering methods described in the previous section. Due to computational cost, this simulation study is restricted to studying the accuracy of the filtering algorithms assuming the true parameter values are known. In practice, the parameter values are unknown and must be estimated. The effect of filtering error on the accuracy of the parameter estimates is an important issue to be examined, but is left for future work. The main computational difficulty in examining estimation error is the Rao-Blackwellized particle filter, as it requires running the recursive filter over a large number of particles for each simulated sample, a simulation within a simulation. As shown below, even with known parameter values, the Rao-Blackwellized particle filter is considerably more computationally expensive than the other nonlinear filters.

Simulation Design
In the simulation experiment, I consider two weighting allocations for the unscented filter. The first is to equally weight all points and it is denoted USF. The second is to allocate half weight to the first sigma point (which is the target mean) and distribute the remaining half equally to the remaining points. This second filter is denoted USF(w 0 = 0.5).
To make the accuracy measures comparable across different state variables, I use the scale-free accuracy measure recommended by Hyndman and Koehler [27]. Let e t = a t|t − α t denote the error of the filtered state series a t|t . The scaled error is defined as q t = e t /mae rw , where mae rw = ∑ n t=2 |α t − α t−1 |/(n − 1) is the mean absolute error from a 'naive' forecast assuming α t follows a random walk. The scale-free accuracy measure used is the mean absolute scaled error defined as mase = ∑ n t=1 |q t |/n. A sample of size n = 3000 is simulated 999 times and, for each sample, the mase is computed using the alternative nonlinear filters to obtain a distribution of mase for each nonlinear filter. Figure 1 compares the distribution of mase for the measurement with nine maturities and Figure 2 for the measurement with 13 maturities. Several important features are apparent in these figures. First, no filter dominates in accuracy for all state variables. For recovering the (conditionally) linear state variables corresponding to the L, S, C 1 , C 2 factors, the approximate filters EKF, USF are competitive against RBP(m = 999). However, for the decay factors λ, which are the main sources of nonlinearity, RBP(m = 999) is more accurate than the approximate filters EKF, USF.
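The mase criterion defined above can be written compactly (a small sketch; names are my own):

```python
import numpy as np

def mase(filtered, true_state):
    """Mean absolute scaled error: MAE of the filtered series divided by the
    in-sample MAE of a naive random-walk forecast of the true state."""
    mae_rw = np.abs(np.diff(true_state)).mean()
    return np.abs(filtered - true_state).mean() / mae_rw
```

A value below one indicates that the filter recovers the state better than the naive random-walk benchmark.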
The mase accuracy measure uses the random walk forecast as benchmark, and errors with mase < 1 indicate better recovery than from the random walk model. The figures show that this condition is clearly satisfied only for the level factor L (the state variables were generated from a persistent but stationary AR(1) model with T = 0.9). The potentially collinear two curvature factors C 1 , C 2 have mase of at least two, confirming the weak identification problem with the DNSS model.
Finally, the performance of the filters depends on the signal-to-noise ratio in the observed measurements, which in turn depends on the measurement error variance H. The value H = 10 −5 used in the figures was chosen so that some differences among the filters become visually apparent. The distributions of filtering errors for a lower signal-to-noise ratio with H = 0.1 (not reported) are visually indistinguishable and overlap each other.

Trading off Accuracy and Computational Cost
The main finding from this small simulation experiment is that no filter dominates in terms of accuracy. Another important consideration in choosing the appropriate filter is its computational cost. Some filters, such as EKF, have no tuning parameter to control their accuracy (except possibly using higher-order expansions), and their performance depends on the extent of nonlinearity in the data. For other filters, we have some control over accuracy through the choice of tuning parameters. For example, for USF the number of sigma points, their positions, and the associated weights can all affect accuracy. Generally, there is a tradeoff between accuracy and computational cost: for USF, the more sigma points used, the better the expected accuracy, but at a higher computational cost (longer computing time).
Figures 3 and 4 compare the accuracy versus computational cost tradeoff of the nonlinear filters. Because the Rao-Blackwellized particle filter (RBP) takes much longer than the other filters, the computation time on the horizontal axis is the log of the timing relative to the extended Kalman filter. A fast and accurate filter would appear in the bottom-left corner of each plot. Filters that lie to the north-east (up and to the right) of another filter are dominated by that filter.
Viewed from this accuracy-computational cost tradeoff criterion, Figures 3 and 4 show that RBP does not dominate any other filter, but it is dominated for several state variables of the DNSS model. In particular, the RBP filter with m = 99 particles is always dominated, except for the λ 2 state variable with nine yields. The computational cost of the RBP filter is roughly proportional to 2nm, where n is the sample size, two is the dimension of the simulated state vector (the two decay parameters), and m is the number of particles. RBP requires running the filter for each particle over the sample of size n and, hence, has computational cost of order O(2nm). For the daily sample analyzed below, n ≈ 3000, as used in the simulations. Although computation time is highly dependent on the software and hardware used to implement the filters, in my implementation filtering 13 maturities with sample size n = 3000 takes about 0.07 s for the EKF filter, 0.16 s for the USF filter (13 sigma points), and 27.5 s for RBP(m = 999); RBP(m = 999) is about 400 times slower than EKF.

Figure 1 (caption). Yields with maturities τ = 1/12, 6/12, 1, 2, 3, 5, 7, 10, 20 years, each with 3000 observations. L is the level, S the slope, C 1 , C 2 curvature factors with decay parameters λ 1 , λ 2 , respectively. Each panel compares the error distribution of the filtered series over five different filters. EKF is the extended Kalman filter, USF is the unscented filter with equal weights on every sigma point, USF(w 0 = 0.5) is the unscented filter with weight w 0 on the central (first) sigma point and equal weights for the remaining sigma points, and RBP is the Rao-Blackwellized particle filter with m particles.

Recovering Out-of-Sample Maturity Yields
This section analyzes real data in order to examine the predictability of out-of-sample maturity yields from dynamic term structure models with time-varying factors considered above.

Data
Two types of data are used for the empirical analysis. The first is daily yield data from the European Central Bank for the sample 6 September 2004 to 21 February 2019 (3701 observations) (Source: http://sdw.ecb.europa.eu/browse.do?node=9691126). These are known to be constructed from the fitted Nelson-Siegel-Svensson model. The ECB provides two sets of estimated values of the six factors L, S, C 1 , C 2 , λ 1 , λ 2 from a cross-section of bonds, one consisting of AAA rated bonds and one consisting of bonds of all ratings. The observed yields are computed from these factors as fitted values from the Nelson-Siegel-Svensson model; that is, by evaluating the measurement Equation (2), which is equivalent to evaluating (1). Because the latent factors are observable in this data set, I can directly assess the in-sample accuracy of the filtered factors.
The second type of data comes from central banks known to use spline-based methods: the Bank of Canada, the Bank of England, and the U.S. Department of Treasury. All data are nominal yields and are publicly available (Sources: Bank of Canada (https://www.bankofcanada.ca/rates/interest-rates/bond-yield-curves/), Bank of England (https://www.bankofengland.co.uk/statistics/yield-curves), U.S. Department of Treasury (https://www.treasury.gov/resource-center/data-chart-center/interestrates/Pages/TextView.aspx?data=yield)). As mentioned in the introduction, data at the short- and long-end of the term structure are frequently missing in these data sets. This motivates examining the recoverability of out-of-sample maturity yields from the estimated dynamic term structure models. Although the data are made publicly available, as far as I can tell, none of these central banks provides sufficient information to reproduce the yield curve data.

Figure 5 shows the time series of Nelson-Siegel-Svensson factor data from the European Central Bank. The level L and decay λ 1 , λ 2 parameters appear to be restricted to be positive, but not the slope S and curvature C 1 , C 2 factors. The two curvature and decay parameters are plotted together to highlight the weak identification problem. For the post-2015 period, the two decay factors nearly overlap, indicating weak identification during this period. Another symptom of weak identification is the several 'spikes' or 'jumps' in the factor estimates. One cannot expect the DNSS model to accurately recover the latent factor series, particularly the curvature C 1 , C 2 and decay λ 1 , λ 2 factors, for this data set. Table A1 in the online Appendix shows the results of fitting a univariate AR(1) model to each factor series from the European Central Bank, as assumed in the DNSS specification with diagonal T. The estimated AR(1) parameters are all above 0.98, with R 2 all above 0.96.

Estimation
I first need to estimate the model parameters in order to predict out-of-sample maturity yields. For each data set, I divide the sample of maturities into two groups, one used for estimation and the other for out-of-sample prediction. The grouping is somewhat arbitrary, but the general rule for the in-sample maturities is to use the range of maturities commonly used in the existing literature with no missing values. For the out-of-sample maturities, I use maturities from both extreme ends of the term structure that typically have missing values, as well as one maturity not used in estimation that lies within the maturity range used for estimation. Figure 6 graphically summarizes the in-sample and out-of-sample maturities used in the empirical analysis. Because the European Central Bank (ECB) makes the underlying factor data available, we can choose any (reasonable) maturity and recover the yield data as fitted values, as described above. Two sets of in-sample maturities are used for the ECB data, one with nine maturities τ 9 = (1/12, 6/12, 1, 2, 3, 5, 7, 10, 20) and one with thirteen maturities τ 13 = (1, 1.5, 2, 3, 4, 5, 7, 8, 9, 10, 15, 20, 25). These are the maturities used in the simulation experiment in Section 4. These two choices are made to match the maturities available from the other central banks (τ 9 for the U.S. Department of Treasury data and τ 13 for the Bank of Canada and Bank of England data).

The parameters of the model are estimated by numerically maximizing the Gaussian log-likelihood function (4) using the extended Kalman filter. The likelihood is maximized with a gradient based optimizer. Because of the large number of parameters to estimate, especially for the DNSS model, finite difference methods for numerically approximating the gradients are not only computationally costly, but can also be inaccurate.
The scores are evaluated using algorithmic differentiation (the R package TMB (https://cran.r-project.org/package=TMB) was used to evaluate the likelihood and scores). Because of the weak identification problem discussed in Section 3.3, numerical optimization is sensitive to the choice of starting parameter values: the optimizer often fails to converge and, when it does converge, it does not always converge to similar parameter estimates. The numerical optimizer is therefore run from a number of randomly selected starting values, and the reported estimates are those with the highest likelihood value among the runs that converged.
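The multi-start strategy can be sketched as follows. This is a minimal Python illustration in which a toy bimodal objective stands in for the negative Gaussian log-likelihood, and plain gradient descent stands in for the gradient based optimizer; the objective, starting-value range, and number of starts are all assumptions for illustration, not the paper's actual TMB setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def neg_log_lik(theta):
    # Toy surrogate with two local minima (x = -1 and x = +1), mimicking an
    # objective whose optimizer is sensitive to starting values.
    x, y = theta
    return (x**2 - 1.0)**2 + (y - 0.5)**2

def grad(theta):
    # Analytic gradient, standing in for algorithmic differentiation.
    x, y = theta
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * (y - 0.5)])

def local_minimize(start, lr=0.02, steps=2000, tol=1e-10):
    """Plain gradient descent; reports whether the run converged."""
    theta = np.asarray(start, dtype=float)
    for _ in range(steps):
        g = grad(theta)
        theta = theta - lr * g
        if g @ g < tol:                      # converged: gradient ~ 0
            return theta, neg_log_lik(theta), True
    return theta, neg_log_lik(theta), False

# Run from randomly selected starting values and keep only converged runs.
runs = [local_minimize(rng.uniform(-2.0, 2.0, size=2)) for _ in range(20)]
converged = [(f, th) for th, f, ok in runs if ok]

# The reported estimate: highest likelihood = lowest negative log-likelihood.
best_f, best_theta = min(converged, key=lambda r: r[0])
```

Different starts end at different local minima here, which is exactly why only the converged run with the highest likelihood is retained.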
The parameter estimates are reported in Tables A2-A8 of the online Appendix. The QML standard errors are computed by evaluating both the scores and the Hessian with algorithmic differentiation. As another symptom of weak identification, the Hessian evaluated at the parameter estimates is sometimes singular. For such cases, the reported standard errors use the pseudo-inverse of the Hessian. Consistent with the least squares estimates fitted to the underlying factor data, the estimated latent series are all highly persistent, with AR(1) parameters close to the nonstationary boundary of one (the AR(1) parameters were constrained to the stationarity region when maximizing the likelihood).
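The pseudo-inverse fallback for the QML standard errors can be sketched as follows. This is an illustrative Python sketch; the outer-product-of-scores estimate of the score covariance is an assumption standing in for whatever estimator the paper's implementation uses:

```python
import numpy as np

def qml_standard_errors(hessian, scores):
    """QML (sandwich) standard errors H^{-1} B H^{-1} from the Hessian and
    an n x p matrix of per-observation scores.

    When the Hessian is numerically singular -- a symptom of weak
    identification -- fall back to the Moore-Penrose pseudo-inverse.
    """
    B = scores.T @ scores                    # outer product of scores
    try:
        Hinv = np.linalg.inv(hessian)
    except np.linalg.LinAlgError:            # singular Hessian
        Hinv = np.linalg.pinv(hessian)
    cov = Hinv @ B @ Hinv                    # sandwich covariance
    return np.sqrt(np.diag(cov))
```

With a singular Hessian the pseudo-inverse zeroes out the unidentified directions, so the corresponding standard errors collapse to zero rather than blowing up.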
For the European Central Bank data, we can compare how well the filtered state series using these estimates recover the factor data. Figures 7 and 8 compare the 'true' α t and filtered a t|t state series for the AAA rating data (Figures A1 and A2 in Appendix A show the comparisons for the data based on all ratings). Consistent with the simulation results, the first two factors L and S are reasonably well recovered, especially when using thirteen maturities τ 13 . The two curvature factors C 1 , C 2 are tracked reasonably well for the first half of the sample, but not for the second half, after some spikes or jumps in the true data. The two decay parameters are close to each other in the second half of the sample, and C 1 , C 2 are likely to be highly collinear and weakly identified during this period, as observed in Figure 5. The two decay factors are never well tracked in any case, highlighting the difficulty of identifying these factors from the observed yields. Figures 7 and 8 thus show the difficulty of recovering the underlying state series of the DNSS model due to weak identification. However, this does not necessarily imply that predictions from the model will perform poorly: under weak identification, there may be several alternative parameter configurations that, nonetheless, produce similar predictions.

Prediction
The focus is on cross-sectional predictability rather than the more commonly tested time series predictability. This focus is motivated by the fact that data for the very short- and long-ends of the term structure are often missing, as can be seen in Figure 6.
The cross-sectional predictions for yields of out-of-sample maturity τ are obtained from the measurement Equation (2) by evaluating it at the filtered state series a t|t . Note that this uses the filtered series rather than the one-step predictions a t|t−1 = E t−1 [α t ] or the smoothed series a t|n = E n [α t ]. The smoothed series uses information from future observations and is subject to 'look ahead' bias. The one-step prediction may be a better choice than the filtered series to mimic real-time, or online, prediction situations. The filtered series are used so that the model based predictor can be compared with a 'naive' interpolated predictor. If there are maturities τ 1 , τ 2 used for estimation that bracket the out-of-sample maturity τ 0 , such that τ 1 < τ 0 < τ 2 , a simple natural predictor is the linearly interpolated value y t (τ 0 ) = y t (τ 1 ) + ((τ 0 − τ 1 )/(τ 2 − τ 1 ))(y t (τ 2 ) − y t (τ 1 )). Because these interpolated values use information up to time t, they are comparable to forecasts based on filtered series rather than one-step predictions. The main advantage of the model based predictor over interpolation is that it applies even when the out-of-sample maturity has no bracketing in-sample maturities; extrapolating outside the in-sample maturity range is more problematic than interpolating within it.

The model based predictors y (τ) t based on the filtered state series are subject to sampling errors. The conditional mean squared prediction error can be approximated with the delta method, which linearizes the measurement function around a t|t . There are two difficulties with assessing the size of sampling error in the predictions. First, this expression does not account for parameter uncertainty, the fact that the filtered series are computed using estimated parameter values that are themselves subject to sampling error. Second, there is no natural estimator of the innovation variance H for out-of-sample maturities, since H is associated with each in-sample maturity.
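The interpolated predictor can be sketched as follows (an illustrative Python sketch; the maturities and yields in the example are hypothetical):

```python
import numpy as np

def interp_yield(tau0, taus, yields):
    """Linearly interpolated yield at out-of-sample maturity tau0 from the
    bracketing in-sample maturities tau1 < tau0 < tau2.

    Raises if tau0 falls outside the in-sample maturity range, since that
    case requires extrapolation rather than interpolation.
    """
    taus = np.asarray(taus, dtype=float)
    yields = np.asarray(yields, dtype=float)
    i = np.searchsorted(taus, tau0)          # index of the right bracket
    if i == 0 or i == len(taus):
        raise ValueError("no bracketing in-sample maturities for tau0")
    tau1, tau2 = taus[i - 1], taus[i]
    w = (tau0 - tau1) / (tau2 - tau1)
    return (1.0 - w) * yields[i - 1] + w * yields[i]
```

The deliberate failure outside the in-sample range mirrors the point in the text: the model based predictor is needed precisely where no bracketing maturities exist.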
An alternative approach to deal with the latter problem is to turn the prediction problem into an estimation problem. This can be accommodated within the state-space model by including the partially observed yields in the estimation sample. The state-space filter can then fill in the missing values (Section 4.10 in [28]). This approach is left for future work, including an examination of how the fraction of missing observations in the sample affects performance relative to the prediction approach used in this paper.
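The missing-value mechanism is sketched below for a scalar local-level-style model: when y t is missing, the update step is skipped, so the filtered value equals the one-step prediction and the filter effectively fills in the gap. This is a minimal Python illustration under assumed parameter values, not the paper's implementation:

```python
import numpy as np

def filter_with_missing(y, T=0.98, Q=0.1, Z=1.0, H=0.5, a0=0.0, P0=10.0):
    """Scalar Kalman filter treating NaN observations as missing:
    the prediction step always runs; the update step runs only when
    y_t is observed (cf. Section 4.10 in [28])."""
    a, P = a0, P0
    out = []
    for yt in y:
        a, P = T * a, T * P * T + Q          # prediction step
        if not np.isnan(yt):                 # update step (skipped if missing)
            F = Z * P * Z + H                # innovation variance
            K = P * Z / F                    # Kalman gain
            a = a + K * (yt - Z * a)
            P = P - K * Z * P
        out.append(a)
    return np.array(out)
```

At a missing date the filtered value is simply the transition applied to the previous filtered value, which is exactly the 'fill in' behavior described above.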
Because of these difficulties, instead of formal statistical tests I use informal visual diagnostics to check the cross-sectional predictability of the model based forecasts. For time series forecast evaluation, it is common practice to specify a loss function and compare certain moments of the forecast error distribution, such as the mean absolute error (for absolute error loss) or the root mean squared error (for quadratic loss). Recent work by Diebold and Shin [29] and Jin et al. [30] proposes an approach that does not depend on the choice of loss function. Rather than focus on certain moments of the error distribution, this approach checks how 'close' the forecast error cumulative distribution function (cdf) is to that of a perfect forecast. The cdf of a perfect forecast jumps vertically from 0 to 1 at e = 0, where e is the support of the forecast errors. Jin et al. [30] construct a formal test that measures the closeness of the two cdfs. Because of the difficulties with formal testing alluded to above, in this application I plot the empirical cdfs of the forecast errors and visually compare how close they are to the cdf of a perfect forecast. Figures 9-11 compare the empirical cdfs of the prediction errors (the European Central Bank data for all ratings are given in Figure A3 of the online Appendix). For the European Central Bank data in Figure 9, it is visually apparent that the predictions from the DNSS model are closest to the vertical line at e = 0 for most out-of-sample maturities. Figure A4 in the online Appendix compares the kernel density estimates of the error distributions; for most out-of-sample maturities, the density of the DNSS error distribution has the least bias and variance. Despite the weak identification problem with DNSS, this result is not surprising, since we know that the European Central Bank data are generated from the DNSS model.
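Computing the empirical cdfs being compared is straightforward; a minimal Python sketch (the function name is illustrative):

```python
import numpy as np

def ecdf(errors):
    """Empirical cdf of forecast errors, evaluated at the sorted errors.

    The cdf of a perfect forecast jumps from 0 to 1 at e = 0, so the
    closer this step function hugs the vertical line at zero, the
    better the forecast."""
    e = np.sort(np.asarray(errors))
    F = np.arange(1, len(e) + 1) / len(e)
    return e, F
```

Plotting (e, F) for each model's prediction errors on common axes reproduces the kind of visual comparison made in Figures 9-11.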
Of more interest are the data estimated from spline based methods, shown in Figures 10 and 11 (Figures A6 and A8 in the online Appendix compare the corresponding kernel density estimates of the distributions of prediction errors). For these data, the results depend on the data set and maturity. For the Bank of Canada and U.S. Department of Treasury data shown in Figure 10, the interpolation method is often the best, with DNSS a close second. However, the interpolation method can only be used for non-extreme maturities within the range of maturities used for estimation. For the Bank of England data presented in Figure 11, there is not much difference (at least visually) among the different models, except for NSS at the short-end of the term structure. No model performs well at the long-end, although one must keep in mind that these comparisons are based on a limited amount of data due to a large fraction of missing observations; see Figure 6.

To illustrate how loss function based summary measures of predictive performance compare with the distribution based measures used above, Figure 12 shows the difference in prediction mean absolute errors (MAE) between DNSS and the other models. A negative difference indicates better prediction from the DNSS model than from the comparison model. These results correspond to the cross-sectional out-of-sample forecasts for the spline based data in Figures 10 and 11. The confidence intervals for the difference in MAE were constructed from the Diebold and Mariano [31] statistic with heteroskedasticity and autocorrelation robust (HAR) standard errors. For maturities inside the range of maturities used for estimation, simple linear interpolation generally performs better. However, the maturities at both ends of the term structure that require extrapolation are the ones that are often missing, and for these cases the DNSS model generally performs better than the more restrictive parametric models.
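The Diebold-Mariano comparison can be sketched as follows. This Python sketch uses a simple Bartlett-kernel (Newey-West style) long-run variance as a stand-in for the HAR standard errors used in the paper; the bandwidth choice is an assumption:

```python
import numpy as np

def dm_stat(e1, e2, bandwidth=4):
    """Diebold-Mariano statistic for equal predictive accuracy under
    absolute error loss: t-statistic of the mean loss differential
    d_t = |e1_t| - |e2_t|, with a Bartlett-kernel long-run variance."""
    d = np.abs(np.asarray(e1)) - np.abs(np.asarray(e2))
    n = len(d)
    u = d - d.mean()
    lrv = u @ u / n                          # lag-0 variance
    for lag in range(1, bandwidth + 1):      # Bartlett-weighted autocovariances
        w = 1.0 - lag / (bandwidth + 1.0)
        lrv += 2.0 * w * (u[lag:] @ u[:-lag]) / n
    return d.mean() / np.sqrt(lrv / n)
```

A negative statistic indicates smaller losses for the first forecast, matching the sign convention of the MAE differences plotted in Figure 12.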
Figure A9 in the online Appendix shows the difference in prediction mean squared errors (MSE). These results generally agree with those based on MAE in Figure 12. For the Bank of England data for maturities τ = 1/2 and τ = 6, some of the differences are not statistically significant using MSE. This illustrates the sensitivity of the performance ranking to the choice of loss function.

Figure 12 (caption). Maturities τ 13 = (1, 1.5, 2, 3, 4, 5, 7, 8, 9, 10, 15, 20, 25). DNSS is the dynamic Nelson-Siegel-Svensson specification, where both decay parameters λ 1,t and λ 2,t follow an AR(1) process, DNS is the dynamic Nelson-Siegel specification with a single decay parameter λ 1,t that follows an AR(1) process, NSS is the Nelson-Siegel-Svensson specification with the two decay parameters λ 1 and λ 2 assumed to be fixed, and NS is the Nelson-Siegel specification with a single decay parameter λ 1 assumed to be fixed.

Concluding Remarks
This research was motivated by the need to recover missing yield data at the extreme ends, the very short and long maturities, of the term structure. I sincerely hope that more central banks follow the lead taken by the European Central Bank and provide sufficient information to reproduce the yield estimates. The sooner the need that motivated this research becomes obsolete, the better.
Currently, however, data produced by the spline based methods employed by several major central banks remain a black box. For researchers who do not have access to the underlying bond data, the class of parametric dynamic term structure models considered in this paper can be used to infer data that are often missing at the extreme ends of the term structure. The performance of alternative parametric specifications is somewhat mixed and data dependent. The advice is to carefully consider how predictions from alternative specifications differ from each other, and to weigh the tradeoff between parsimony and in-sample overfitting. The result that more complex specifications do not necessarily produce better out-of-sample predictions is consistent, for example, with the assessment in Nymand-Andersen [1] based on the cross-section of bond prices.
There are several directions in which the parametric dynamic term structure approach can be further developed. Due to computational constraints, the specifications considered in this paper restrict the state variable dynamics to the diagonal AR(1) specification. A general VAR(1) specification would not only require estimating a much larger number of parameters, but would also introduce additional issues, such as constraining the estimates to remain in the stationarity region [6]. If computing power were not a constraint (such as with access to cloud computing clusters), estimation using the alternative nonlinear filters could also be pursued.
Another potentially important issue not addressed in the current paper is the effect of structural breaks or jumps in the underlying state series. The time series plots of the factors presented in Figure 5 show several spikes or jumps, especially around the financial crisis period in 2008. The state-space model could be generalized to accommodate such potential breaks, following the approach in Andrieu et al. [32] and Nemeth et al. [33].
Funding: This research received no external funding.

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A

Appendix A.1. Linearization for the Extended Kalman Filter
For the DNSS specification, the state vector is α t = (L t , S t , C 1t , C 2t , log λ 1t , log λ 2t ). A Taylor expansion of the measurement function around α t = a t gives Z(α t ) ≈ Z(a t ) + (∂Z(a t )/∂α)(α t − a t ), where ∂Z(α)/∂α is the m × 6 Jacobian matrix of the measurement function. The DNS specification is analogous, with a 4-dimensional state vector (L t , S t , C 1t , log λ 1t ) and an m × 4 Jacobian.

Appendix A.2. Unscented Kalman Filter
Consider an s-dimensional target distribution with first two moments x ∼ (x̄, P xx ). The 2s + 1 sigma points x i and weights w i that match the first two moments of x are [23,34,35] x 0 = x̄ and x i = x̄ + λL ,i , x i+s = x̄ − λL ,i for i = 1, . . . , s, where L ,i is the i-th column of L such that LL′ = P xx . To determine λ and w 0 , set w 0 = k/(s + k) for some tuning parameter k; the remaining weights are w i = 1/(2(s + k)), and the second moment is matched with λ = √(s + k). If the target x is Gaussian, set s + k = 3, or k = 3 − s; this may result in negative weights for s > 3. Equivalently, set w 0 as the tuning parameter, with k = w 0 s/(1 − w 0 ) and λ = √(s/(1 − w 0 )).
Updating step of the filter: estimate a t|t , P t|t given y t , a t = a t|t−1 , P t = P t|t−1 using the sigma points and weights above. More specifically, propagate the sigma points through the measurement function to compute the predicted measurement ȳ t , the innovation covariance P vv,t , and the state-measurement cross-covariance P av,t , and update a t|t = a t + P av,t P −1 vv,t (y t − ȳ t ) and P t|t = P t − P av,t P −1 vv,t P ′ av,t . Prediction step of the filter: estimate a t+1 , P t+1 given a t|t , P t|t with sigma points (and the same weights) x 0,t = a t|t , x i,t = a t|t + λL ,i , x i+s,t = a t|t − λL ,i , propagated through the state transition function. For linear state equations, the usual prediction step a t+1 = c t + T t a t|t , P t+1 = T t P t|t T ′ t + R t Q t R ′ t applies without approximation error (Rao-Blackwellization).
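The sigma-point construction and its moment matching can be sketched as follows. This is a minimal Python illustration of the w 0 parameterization above; using the Cholesky factor for L is one valid choice satisfying LL′ = P xx , and the function name is illustrative:

```python
import numpy as np

def sigma_points(xbar, Pxx, w0=0.5):
    """2s + 1 sigma points and weights matching the first two moments of
    x ~ (xbar, Pxx): weight w0 on the central point, the remaining
    (1 - w0) spread equally over the other 2s points, and
    lambda = sqrt(s / (1 - w0))."""
    s = len(xbar)
    lam = np.sqrt(s / (1.0 - w0))
    L = np.linalg.cholesky(Pxx)              # L such that L @ L.T = Pxx
    pts = np.vstack([xbar]
                    + [xbar + lam * L[:, i] for i in range(s)]
                    + [xbar - lam * L[:, i] for i in range(s)])
    w = np.full(2 * s + 1, (1.0 - w0) / (2 * s))
    w[0] = w0
    return pts, w
```

Propagating these points through the measurement or state function and recomputing weighted means and covariances gives the ȳ t , P vv,t , and P av,t quantities used in the updating step.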
The unscented filter is accurate to second order for the mean, whereas the extended Kalman filter is only accurate to first order. The variance estimate is accurate to second order for both filters. If k < 0, the prediction step may yield a non-positive definite covariance matrix.

Appendix A.3. Rao-Blackwellized Particle Filter
The material in this section is largely taken from Schon et al. [26] for the reader's convenience.
Set t ← t + 1 and go back to step (2).

Table A1. AR(1) least squares estimates for Nelson-Siegel-Svensson factors from the European Central Bank. L t is the level, S t the slope, C 1,t , C 2,t curvature factors with decay parameters λ 1,t , λ 2,t , respectively; (a) for factors estimated from AAA bonds and (c) for factors estimated from all bonds. µ is the sample mean, ρ is the AR(1) coefficient of the demeaned factor series (without an intercept), σ is the standard error of regression, and T is the sample size. Standard errors in parentheses. The daily sample is 6 September 2004 to 21 February 2019.

Table A2. Maximum likelihood estimates of four dynamic Nelson-Siegel-Svensson type specifications for zero yield curve data from the European Central Bank (AAA bonds). DNSS is the dynamic Nelson-Siegel-Svensson specification where both decay parameters λ 1,t and λ 2,t follow an AR(1) process, DNS is the dynamic Nelson-Siegel specification with a single decay parameter λ 1,t that follows an AR(1) process, NSS is the Nelson-Siegel-Svensson specification with the two decay parameters λ 1 and λ 2 assumed to be fixed, and NS is the Nelson-Siegel specification with a single decay parameter λ 1 assumed to be fixed. The likelihood functions were evaluated using the extended Kalman filter. σ τ are the estimated measurement equation innovation variances (assumed to be diagonal), ρ(·) are the estimated AR(1) parameters of the state series, E[·] are the estimated unconditional means of the state series, and σ(·) are the estimated standard deviations of the state equation innovations. QML standard errors in parentheses. The daily sample is 6 September 2004 to 21 February 2019 (3701 observations). The measurement consists of 13 yields with maturities τ = 1, 1.5, 2, 3, 4, 5, 7, 8, 9, 10, 15, 20, 25 (years).

Table A3. Maximum likelihood estimates of four dynamic Nelson-Siegel-Svensson type specifications for zero yield curve data from the European Central Bank (AAA bonds).
DNSS is the dynamic Nelson-Siegel-Svensson specification where both decay parameters λ 1,t and λ 2,t follow an AR(1) process, DNS is the dynamic Nelson-Siegel specification with a single decay parameter λ 1,t that follows an AR(1) process, NSS is the Nelson-Siegel-Svensson specification with the two decay parameters λ 1 and λ 2 assumed to be fixed, and NS is the Nelson-Siegel specification with a single decay parameter λ 1 assumed to be fixed. The likelihood functions were evaluated using the extended Kalman filter. σ τ are the estimated measurement equation innovation variances (assumed to be diagonal), ρ(·) are the estimated AR(1) parameters of the state series, E[·] are the estimated unconditional means of the state series, and σ(·) are the estimated standard deviations of the state equation innovations. QML standard errors in parentheses. The daily sample is 6 September 2004 to 21 February 2019 (3701 observations).

Predictions from European Central Bank (all ratings) estimates with maturities τ 9 = (1/12, 6/12, 1, 2, 3, 5, 7, 10, 20) (left panel) and τ 13 = (1, 1.5, 2, 3, 4, 5, 7, 8, 9, 10, 15, 20, 25) (right panel). DNSS, DNS, NSS, and NS are as defined above. The daily sample is 6 September 2004 to 21 February 2019 (3701 observations).

Figure A6. Kernel density estimates of prediction error distributions. Predictions from Bank of England data with maturities τ 13 = (1, 1.5, 2, 3, 4, 5, 7, 8, 9, 10, 15, 20, 25). DNSS, DNS, NSS, and NS are as defined above. The daily sample is 4 January 2000 to 31 December 2018 (4801 observations, some of which may be missing).