Forecasting Inflation Uncertainty in the G7 Countries

There is substantial evidence that inflation rates are characterized by long memory and nonlinearities. In this paper, we introduce a long-memory Smooth Transition AutoRegressive Fractionally Integrated Moving Average-Markov Switching Multifractal specification [STARFIMA(p, d, q)-MSM(k)] for modeling and forecasting inflation uncertainty. We first provide the statistical properties of the process and investigate the finite-sample properties of the maximum likelihood estimators through simulation. Second, we evaluate the out-of-sample forecast performance of the model in forecasting inflation uncertainty in the G7 countries. Our empirical analysis demonstrates the superiority of the new model over the alternative STARFIMA(p, d, q)-GARCH-type models in forecasting inflation uncertainty.


Introduction
The financial crisis of 2007-2009 and the long-lasting economic recovery have renewed interest in studying and measuring inflation uncertainty. Studies by Baker et al. (2015) and Jurado et al. (2015), for example, discuss new approaches to defining and measuring inflation and, more generally, macroeconomic uncertainty. Theoretical and empirical studies indicating that uncertainty negatively affects economic growth are well documented in the literature (see Bernanke 1983; Bloom 2009, 2014; Stock and Watson 2012; Henzel and Rengel 2017). In this context, Stock and Watson (2012) find that liquidity risk and uncertainty shocks account for about two-thirds of the US GDP decline during the Great Recession. Bloom (2014) and Henzel and Rengel (2017) provide evidence for a countercyclical behavior of uncertainty. Gurkaynak and Wright (2012) and Wright (2011) argue that inflation uncertainty may explain the behavior of bond risk premia and thus plays a major role in understanding the different effects of monetary policy on short- and long-term interest rates. As stressed in Goodhart (1999) and Greenspan (2003), effective monetary policy requires reliable, easy-to-update, and accurate measures of inflation uncertainty.
In spite of being inherently unobservable, inflation uncertainty can be estimated from econometric models. One of the most frequently used approaches to measuring inflation uncertainty consists of applying Engle's (1982) AutoRegressive Conditional Heteroscedasticity (ARCH) processes and their generalized variants. These models are motivated by stylized facts on inflation uncertainty, in particular volatility clustering, high persistence, and asymmetry (see, among others, Baillie et al. 1996; Fountas et al. 2004; Karanasos and Schurer 2008; Caporale et al. 2012; Clements 2014; Makarova 2018). The popularity of GARCH-type models stems from their formal simplicity, flexibility, low computational costs, and their capacity to reproduce clustering effects. However, thorough investigations reveal that alternative inflation uncertainty measures (distinct absolute powers of inflation rates) typically exhibit structural dynamics and persistence patterns that GARCH-type models cannot reproduce. This raises the question of which econometric models are appropriate for modeling (and producing accurate measures of) inflation uncertainty.
In this paper, we consider a new modeling approach by combining long-memory Smooth Transition AutoRegressive Fractionally Integrated Moving Average (STARFIMA) specifications with Markov Switching Multifractal (MSM) models, as developed by Calvet and Fisher (2004). MSM processes represent an alternative tool for modeling and forecasting volatility in financial and commodities markets, and they regularly outperform GARCH-type models in out-of-sample forecasting evaluations (see Lux et al. 2016; Wang et al. 2016; Segnon et al. 2017). Owing to their formal construction, MSM models properly reproduce the structural dynamics observed in different absolute powers of inflation rates.¹
The rest of the paper is organized as follows. Section 2 introduces the STARFIMA-MSM model. The statistical properties of the model are established in Section 3. Section 4 briefly outlines maximum likelihood estimation and optimal forecasting. Section 5 presents the data analysis for the G7 countries, forecasting methodologies, and the empirical results. Section 6 concludes.

The STARFIMA(p, d, q)-MSM(k) Model
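We define the STARFIMA(p, d, q)-MSM(k) model to be a discrete-time stochastic process $\{x_t\}$ satisfying the equation
$$\Phi_{s_t;\eta}(L)\,(1-L)^d\,x_t = \Theta(L)\,\varepsilon_t, \tag{1}$$
where $\varepsilon_t \mid \Omega_{t-1} \sim N(0, h_t)$ and
$$h_t = \sigma^2 \prod_{j=1}^{k} M_t^{(j)}. \tag{2}$$
In Equations (1) and (2), $L$ denotes the lag operator and $\Omega_{t-1}$ is the σ-field generated by the information set $\{\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots\}$. The lag polynomials are defined as
$$\Phi_{s_t;\eta}(L) = 1 - \sum_{i=1}^{p} \phi_i(s_t, \eta_i)\,L^i, \qquad \Theta(L) = 1 + \sum_{j=1}^{q} \theta_j L^j,$$
where the $p$ autoregressive coefficients $\phi_i(s_t, \eta_i) = \phi_{i0} + \phi_{i1}\,G(s_t; \tau, c)$ are nonlinear functions of the state variable $s_t$, $\eta_i = (\phi_{i0}, \phi_{i1}, \tau, c)$ is a vector of parameters, $d \in (-0.5, 0.5)$ is a real number, and $(1-L)^d$ is the fractional differencing operator given by
$$(1-L)^d = \sum_{n=0}^{\infty} \frac{\Gamma(n-d)}{\Gamma(n+1)\,\Gamma(-d)}\,L^n,$$
where $\Gamma(\cdot)$ denotes the gamma function.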
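The fractional differencing operator can be made concrete with a short numerical sketch. The code below assumes the standard binomial expansion $(1-L)^d = \sum_n \pi_n L^n$ with $\pi_n = \Gamma(n-d)/(\Gamma(n+1)\Gamma(-d))$ and computes the weights through the equivalent recursion $\pi_0 = 1$, $\pi_n = \pi_{n-1}(n-1-d)/n$. The function names and the truncation at the sample length (pre-sample values set to zero) are illustrative choices, not part of the paper.

```python
import math


def frac_diff_weights(d, n_terms):
    """Coefficients pi_0, ..., pi_{n_terms-1} of (1 - L)^d = sum_n pi_n L^n.

    Uses the recursion pi_n = pi_{n-1} * (n - 1 - d) / n, which avoids
    evaluating gamma functions of large arguments.
    """
    weights = [1.0]
    for n in range(1, n_terms):
        weights.append(weights[-1] * (n - 1 - d) / n)
    return weights


def frac_diff_weight_gamma(d, n):
    """Closed-form weight Gamma(n - d) / (Gamma(n + 1) * Gamma(-d)).

    Only valid for non-integer d (Gamma(-d) is undefined otherwise).
    """
    return math.gamma(n - d) / (math.gamma(n + 1) * math.gamma(-d))


def frac_diff(x, d):
    """Apply the truncated filter to a finite series x.

    Assumption for illustration: observations before the sample are zero.
    """
    w = frac_diff_weights(d, len(x))
    return [sum(w[n] * x[t - n] for n in range(t + 1)) for t in range(len(x))]
```

For $d = 1$ the weights collapse to $(1, -1, 0, \ldots)$, so `frac_diff` reduces to ordinary first differencing, which is a convenient sanity check.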
In Equation (2), $M_t^{(1)}, \ldots, M_t^{(k)}$ denote the random volatility components (called multipliers). At date $t$, each multiplier $M_t^{(j)}$ is drawn from the base distribution $M$ (to be specified) with positive support. Depending on its rank within the hierarchy of multipliers, $M_t^{(j)}$ changes from one period to the next with probability $\gamma_j$ and remains unchanged with probability $1 - \gamma_j$. We specify these transition probabilities as
$$\gamma_j = 2^{-(k-j)}, \qquad j = 1, \ldots, k, \tag{4}$$
so that the transition matrix related to the $j$th multiplier is given by
$$P_j = \begin{pmatrix} 1 - 0.5\gamma_j & 0.5\gamma_j \\ 0.5\gamma_j & 1 - 0.5\gamma_j \end{pmatrix}.$$
In this paper, we draw each multiplier $M_t^{(j)}$ (in case of a change) from a binomial distribution with support $\{m_0, 2 - m_0\}$, $1 < m_0 < 2$, and (binomial) probability 0.5, implying the unconditional expectation $E(M_t^{(j)}) = 1$. If we assume stochastic independence among the multipliers, the transition matrix of the vector $M_t = (M_t^{(1)}, \ldots, M_t^{(k)})$ is given by
$$P = P_1 \otimes P_2 \otimes \cdots \otimes P_k,$$
where $\otimes$ denotes the Kronecker product. Using the binomial base distribution for the single multipliers implies the finite support $\Gamma \equiv \{m_0, 2 - m_0\}^k$ for $M_t$.
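The Kronecker construction of the joint transition matrix can be sketched as follows. The code assumes that the Lux (2008) switching probabilities take the form $\gamma_j = 2^{-(k-j)}$ and that a multiplier that switches is redrawn from the binomial base distribution, so that it moves to the other value with probability $0.5\gamma_j$. The helper names are hypothetical, and NumPy is an implementation choice.

```python
from functools import reduce
from itertools import product

import numpy as np


def lux_gammas(k):
    """Assumed switching probabilities gamma_j = 2^{-(k - j)}, j = 1..k."""
    return [2.0 ** -(k - j) for j in range(1, k + 1)]


def multiplier_matrix(gamma):
    """2x2 transition matrix of one binomial multiplier.

    With probability gamma the multiplier is redrawn (each value with
    probability 0.5), so the off-diagonal entries equal 0.5 * gamma.
    """
    return np.array([[1.0 - 0.5 * gamma, 0.5 * gamma],
                     [0.5 * gamma, 1.0 - 0.5 * gamma]])


def msm_transition_matrix(k):
    """Joint 2^k x 2^k matrix P = P_1 (x) P_2 (x) ... (x) P_k."""
    return reduce(np.kron, [multiplier_matrix(g) for g in lux_gammas(k)])


def msm_states(m0, k):
    """All 2^k multiplier vectors, ordered consistently with np.kron
    (the first multiplier is the slowest-varying index)."""
    return np.array(list(product([m0, 2.0 - m0], repeat=k)))
```

Because every block $P_j$ is symmetric, the joint matrix is doubly stochastic and the uniform distribution over the $2^k$ states is stationary.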
Remark 1. The stochastic process in Equation (1) can be viewed as a special case of the model proposed by Hillebrand and Medeiros (2016) with constant conditional variance and multiple regimes. The process reduces to the linear AutoRegressive Fractionally Integrated Moving Average (ARFIMA) model when setting $\phi_{i1} = 0$ for $i = 1, \ldots, p$. In this paper, we consider only two regimes, since this turns out to be sufficient in our empirical application below. We allow the conditional variance in Equation (2), which we model as the product of the time-varying multipliers and the positive scaling factor $\sigma^2$, to vary over time (see Calvet and Fisher 2004). As the transition function, we specify the first-order logistic function $G(s_t; \tau, c) = (1 + \exp\{-\tau(s_t - c)\})^{-1}$, $\tau > 0$, which is infinitely differentiable and satisfies $\lim_{s_t \to -\infty} G(s_t; \tau, c) = 0$ and $\lim_{s_t \to +\infty} G(s_t; \tau, c) = 1$. For $\tau \to +\infty$, the function $G(s_t; \tau, c)$ approaches the indicator function $1(s_t > c)$. The parameter $\tau$ regulates the smoothness of the transition from one regime to another (cf. van Dijk et al. 2002).
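As a small illustration of Remark 1, the logistic transition function and the resulting state-dependent autoregressive coefficient $\phi_{i0} + \phi_{i1} G(s_t; \tau, c)$ can be written as below. The function names are hypothetical, and the two-branch evaluation is only a numerical safeguard against overflow for large $\tau$.

```python
import math


def logistic_transition(s, tau, c):
    """First-order logistic function G(s; tau, c) = 1 / (1 + exp(-tau (s - c))).

    Evaluated in a numerically stable way so that very large tau
    (the near-threshold case) does not overflow.
    """
    z = tau * (s - c)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def ar_coefficient(s, phi0, phi1, tau, c):
    """State-dependent AR coefficient phi0 + phi1 * G(s; tau, c)."""
    return phi0 + phi1 * logistic_transition(s, tau, c)
```

Setting `phi1 = 0` removes the dependence on the state variable, which corresponds to the linear ARFIMA case, while a very large `tau` makes the function behave like the indicator of `s > c`.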
Remark 2. The transition probabilities defined in Equation (4) have been proposed by Lux (2008). This specification reduces the number of parameters to be estimated and enables us to obtain some statistical properties of the model. In Calvet and Fisher (2001), the $k$ transition probabilities are specified as
$$\gamma_j = 1 - (1 - \gamma_1)^{b^{\,j-1}}, \qquad j = 1, \ldots, k,$$
with $\gamma_1 \in (0, 1)$ and $b > 1$, which guarantees the convergence of the discrete-time MSM model to the Poisson multifractal process in the continuous-time limit. Calvet and Fisher (2004) assume binomial and log-normal base distributions for the multipliers. Liu et al. (2007) find that assuming other base distributions, such as lognormal and gamma, makes little difference in empirical applications.
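For comparison, the Calvet and Fisher parameterization can be evaluated with a few lines of code. The sketch assumes the form $\gamma_j = 1 - (1 - \gamma_1)^{b^{j-1}}$, and the function name is hypothetical.

```python
def calvet_fisher_gammas(gamma_1, b, k):
    """Switching probabilities gamma_j = 1 - (1 - gamma_1)^(b^(j-1)), j = 1..k.

    gamma_1 in (0, 1) is the probability of the slowest component and
    b > 1 controls how quickly the probabilities grow with j.
    """
    return [1.0 - (1.0 - gamma_1) ** (b ** (j - 1)) for j in range(1, k + 1)]
```

With $\gamma_1 = 0.1$, $b = 2$, and $k = 3$ this gives probabilities 0.1, 0.19, and 0.3439, so higher-ranked multipliers switch more often. Unlike the specification in Remark 2 attributed to Lux (2008), this form adds the two parameters $\gamma_1$ and $b$ to the estimation problem.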

Statistical Properties
In this section, we consider statistical properties of the STARFIMA(p, d, q)-MSM(k) and the general STARFIMA(p, d, q)-GARCH-type processes, as defined in Section 2.
Assumption 1. The roots of the characteristic polynomials $\Phi_{s_t;\eta}(L)$ and $\Theta(L)$ lie outside the unit circle and the logistic transition function $G(s_t; \tau, c)$ is well-defined.
Proof. The proof follows from Assumption 1 and the conditions in Theorem 2.1 of Ling and McAleer (2002a), where we replace the constant mean process with our stationary STARFIMA(p, d, q) process from Equations (1) and (3).
Proposition 3. Under Proposition 1 and with $m$ denoting a positive integer, it follows that the $2m$-th moments of $\{x_t\}$ and $\{\varepsilon_t\}$ are finite.
Proof. The proof follows from Proposition 1 and the conditions in Theorem 1 in Shiryaev (1995, p. 118).
Proposition 4. Under Proposition 2, it follows that the $m\delta$-th moments of $\{x_t\}$ and $\{\varepsilon_t\}$ exist.
Proof. The proof follows from Proposition 2 and Theorem 2.2 in Ling and McAleer (2002a).
Remark 3. Second moments and autocovariances of the MSM(k) process for binomial and lognormal base distributions of the multipliers are given in Lux (2008). As argued in Ling and McAleer (2002a), Proposition 4 cannot easily be extended to higher-order generalized GARCH processes, as specified in Equation (5). However, Ling (1999) provides a sufficient condition for the existence of $2m$-th moments for the standard GARCH(p, q) process. Ling and McAleer (2002b) establish necessary and sufficient higher-order moment conditions for standard GARCH(p, q) and APARCH(p, q) processes.
Next, we present results for (i) the autocorrelation function of the process $\{x_t\}$ from Equation (1), which we denote by $\rho(n) = \mathrm{Cov}(x_t, x_{t-n})/\mathrm{Var}(x_t)$, and (ii) the $q$-order autocorrelation function of the process $\{\varepsilon_t\}$, denoted by $\rho_q(n) = \mathrm{Cov}(|\varepsilon_t|^q, |\varepsilon_{t+n}|^q)/\mathrm{Var}(|\varepsilon_t|^q)$, for every moment $q$ and every integer $n$. For this purpose, we consider the two arbitrary numbers $\kappa_1, \kappa_2 \in (0, 1)$, $\kappa_1 < \kappa_2$, which we use to define the following set of integers (as before, $k$ denotes the number of volatility multipliers in Equation (2)):
$$S_k = \{n : \kappa_1 k \le \log_2 n \le \kappa_2 k\}.$$
It is easy to check that $S_k$ contains a wide range of intermediate lags.
Proof. The proof follows from Proposition 2 and Theorem 2.4 in Hosking (1981).
Proposition 6. Under Assumption 2, it follows that $\ln \rho_q(n) \sim -\psi(q) \ln n$ as $k \to \infty$, where $\psi(q) = \log_2\!\left(E(M^q)/[E(M^{q/2})]^2\right)$.
Proof. The proof follows from Proposition 2 and the proof of Proposition 1 in Calvet and Fisher (2004).
Remark 4. MSM processes only exhibit apparent long memory with asymptotic hyperbolic decay in the autocorrelation of absolute powers over a finite horizon.This does not coincide with the traditional definition of long memory with asymptotic power-law behavior of the autocorrelation function in the limit or divergence of the spectral density (see Beran 1994).

Maximum Likelihood Estimation
Hillebrand and Medeiros (2016) suggest using Nonlinear Least Squares (NLS) for parameter estimation of the STARFIMA model. We collect all parameters of the STARFIMA specification in the vector $\chi$ and denote (i) an appropriately defined subset of the parameter space by $\Xi$ and (ii) the true parameter vector by $\chi_0$. Then, for a sample of $T$ observations, the NLS estimator is given by
$$\hat{\chi} = \arg\min_{\chi \in \Xi} \sum_{t=1}^{T} \varepsilon_t^2(\chi). \tag{6}$$
In the case of normally distributed innovations $\varepsilon_t$, NLS is equivalent to Maximum Likelihood Estimation (MLE), whereas for non-normal innovations NLS can be interpreted as Quasi MLE (QMLE). Wooldridge (1994), Pötscher and Prucha (1997), and Hillebrand and Medeiros (2016) show that the NLS estimator is consistent and asymptotically normal under appropriate regularity conditions. Li and McLeod (1986) derive asymptotic properties of the MLE for ARFIMA processes, as well as a portmanteau test for checking model adequacy.
Proposition 7. Let $\hat{\chi}$ be the solution to the minimization problem (6). Under Assumption 1, it follows that $\hat{\chi}$ is (i) a consistent estimator of $\chi_0$ and (ii) asymptotically normal.
Proof. Under Assumption 1, the conditions of Theorems 1 and 2 in Hillebrand and Medeiros (2016) are satisfied, yielding the proof.
Using a binomial base distribution for the $k$ multipliers, Calvet and Fisher (2004) derive a closed-form solution for the log-likelihood and exact ML estimators of the parameters in the MSM(k) model. In fact, discrete base distributions with positive support for the multipliers imply a finite number of states for the hidden Markov process in the MSM model. This allows us to derive the exact likelihood function via Bayesian updating. For pre-specified $k$, it is known that the MLE is consistent and asymptotically efficient.
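Since the off-diagonal blocks in the information matrix of a STARFIMA(p, d, q)-MSM(k) model are zero, the parameters in the STARFIMA(p, d, q) and MSM(k) specifications can be estimated separately, without asymptotic efficiency loss (see Lundbergh and Terasvirta 1999). Therefore, in a first stage, we estimate the conditional mean via NLS, thus providing consistent estimates of the $\varepsilon_t$'s, which we use in the second stage to estimate the parameters of the conditional variance from the specification
$$\varepsilon_t = u_t \sqrt{h_t}. \tag{7}$$
Denoting the parameter vector by $\xi = (m_0, \sigma)$ (defined on a compact subset of the parameter space), we obtain the parameters in the second stage by maximizing the log-likelihood
$$\ln L(\varepsilon_1, \ldots, \varepsilon_T; \xi) = \sum_{t=1}^{T} \ln\left[\omega(\varepsilon_t; \xi) \cdot (\pi_{t-1} P)\right]. \tag{8}$$
In Equation (8), $\omega(\varepsilon_t; \xi)$ is a $1 \times 2^k$ vector containing the conditional densities of any observation $\varepsilon_t$ given by
$$\omega(\varepsilon_t; \xi) = \left[\frac{1}{\sqrt{h(m_1)}}\,\phi\!\left(\frac{\varepsilon_t}{\sqrt{h(m_1)}}\right), \ldots, \frac{1}{\sqrt{h(m_{2^k})}}\,\phi\!\left(\frac{\varepsilon_t}{\sqrt{h(m_{2^k})}}\right)\right], \tag{9}$$
where $\phi(\cdot)$ denotes the standard normal density and $h(m_j) = \sigma^2 \prod_{i=1}^{k} m_j^{(i)}$, with $m_j^{(i)}$ being the $i$-th element of vector $m_j$. The transition matrix $P$ has the components $p_{i,j} = \Pr(M_{t+1} = m_j \mid M_t = m_i)$. $M_t$ is latent, but we can recursively compute the conditional probabilities $\pi_t^i = \Pr(M_t = m_i \mid \varepsilon_t, \ldots, \varepsilon_1)$ through Bayesian updating as
$$\pi_t = \frac{\omega(\varepsilon_t; \xi) * (\pi_{t-1} P)}{\left[\omega(\varepsilon_t; \xi) * (\pi_{t-1} P)\right] \iota'}, \tag{10}$$
where $*$ denotes the element-wise (Hadamard) product and $\iota$ is a $1 \times 2^k$ vector of ones.
Proposition 8. Let $\vartheta = (\chi', \xi')'$ denote the complete parameter vector of the STARFIMA(p, d, q)-MSM(k) model. Under Assumptions 1 and 2 and Propositions 3 and 4, there exists an MLE $\hat{\vartheta}$ that is consistent and asymptotically efficient.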
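The second-stage likelihood evaluation can be sketched as a standard hidden-Markov filter. The code below is a minimal illustration under stated assumptions: switching probabilities $\gamma_j = 2^{-(k-j)}$, a binomial base distribution, Gaussian $u_t$, and a filter initialized at the uniform (stationary) distribution. The function names are hypothetical, and the residuals `eps` stand in for the first-stage NLS residuals.

```python
from functools import reduce
from itertools import product

import numpy as np


def msm_setup(m0, sigma, k):
    """Return the 2^k x 2^k transition matrix P and the state variances h(m_i).

    Assumes gamma_j = 2^{-(k - j)} and binomial multipliers {m0, 2 - m0}.
    """
    gammas = [2.0 ** -(k - j) for j in range(1, k + 1)]
    blocks = [np.array([[1 - 0.5 * g, 0.5 * g], [0.5 * g, 1 - 0.5 * g]])
              for g in gammas]
    P = reduce(np.kron, blocks)
    states = np.array(list(product([m0, 2.0 - m0], repeat=k)))
    h_states = sigma ** 2 * states.prod(axis=1)
    return P, h_states


def msm_loglik(eps, m0, sigma, k):
    """Log-likelihood of residuals eps via Bayesian updating.

    Returns the log-likelihood and the final filtered state probabilities.
    """
    P, h_states = msm_setup(m0, sigma, k)
    pi = np.full(len(h_states), 1.0 / len(h_states))  # stationary start
    loglik = 0.0
    for e in eps:
        # Conditional normal densities of e in each of the 2^k states.
        omega = np.exp(-0.5 * e ** 2 / h_states) / np.sqrt(2 * np.pi * h_states)
        joint = omega * (pi @ P)       # numerator of the updating step
        lik_t = joint.sum()            # one-step predictive density of e
        loglik += np.log(lik_t)
        pi = joint / lik_t             # filtered probabilities
    return loglik, pi
```

In practice this function would be passed (with a sign change) to a numerical optimizer over $(m_0, \sigma)$. The number of states grows as $2^k$, which makes the filter expensive for large $k$.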
Proof. Under the given assumptions, the conditions of Theorem 1 in Hillebrand and Medeiros (2016) are met, yielding the proof.
Remark 5. The shortcoming of the exact MLE is that it becomes computationally demanding for a large number of multipliers ($k > 10$). Furthermore, a continuous base distribution with positive support for the multipliers implies an infinite state space of the hidden Markov chain, so that the MLE is not applicable.
To circumvent these issues, Lux (2008) proposes a generalized method-of-moments estimator with linear forecasting.Recently, Žikeš et al. (2017) established the Whittle estimation approach.
In Section 4.3, we show that numerical optimization of the MSM(k) log-likelihood function produces satisfactory results for a moderate number of volatility components.

Optimal Forecasting
Using the maximum likelihood estimation approach, we easily obtain volatility forecasts in the MSM(k) model via Bayesian updating of the conditional probabilities. The $h$-step-ahead volatility forecasts of the MSM(k) model are given by
$$\hat{h}_{t+h} = \sum_{i=1}^{2^k} h(m_i)\,\Pr(M_{t+h} = m_i \mid \varepsilon_t, \ldots, \varepsilon_1). \tag{11}$$
In fact, to produce volatility forecasts over arbitrary, long-term horizons as given in Equation (11), we need the conditional probabilities of future multipliers. These conditional state probabilities can be iterated forward via the transition matrix $P$ as follows:
$$\hat{\pi}_{t,h} = \pi_t P^h, \tag{12}$$
where the $i$-th element of $\hat{\pi}_{t,h}$ is $\Pr(M_{t+h} = m_i \mid \varepsilon_t, \ldots, \varepsilon_1)$. For GARCH-type models, the formulas for the $h$-step-ahead volatility forecasts are available in the literature (see, for example, Lux et al. 2016, Appendix A).
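The forward iteration of the state probabilities can be illustrated as follows. The sketch assumes switching probabilities $\gamma_j = 2^{-(k-j)}$ and binomial multipliers, takes the filtered probabilities at the forecast origin as given, and computes the forecast as the probability-weighted average of the state variances. All names are hypothetical.

```python
from functools import reduce
from itertools import product

import numpy as np


def msm_setup(m0, sigma, k):
    """Transition matrix P and state variances h(m_i) of a binomial MSM(k).

    Assumes gamma_j = 2^{-(k - j)} for the switching probabilities.
    """
    gammas = [2.0 ** -(k - j) for j in range(1, k + 1)]
    blocks = [np.array([[1 - 0.5 * g, 0.5 * g], [0.5 * g, 1 - 0.5 * g]])
              for g in gammas]
    P = reduce(np.kron, blocks)
    states = np.array(list(product([m0, 2.0 - m0], repeat=k)))
    return P, sigma ** 2 * states.prod(axis=1)


def msm_forecast(pi_t, P, h_states, horizon):
    """h-step-ahead variance forecast: (pi_t P^h) . h(m)."""
    probs = pi_t @ np.linalg.matrix_power(P, horizon)
    return float(probs @ h_states)
```

Starting from the high-volatility state, the forecasts decline toward the unconditional variance $\sigma^2$ as the horizon grows, which reflects the mean reversion built into the multiplier hierarchy.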

Monte Carlo Simulation
We assess the robustness of the MLE in small samples via Monte Carlo simulations. We choose the number of volatility components as $k = 8$, which turns out to be optimal in our empirical application below.² As the base distribution, we consider a binomial distribution taking on the values $m_0$ and $2 - m_0$, each with probability 0.5. Along with the switching probabilities from Equation (4), our simulation of the MSM model only requires two parameters: the binomial parameter $m_0$ and the scale factor (unconditional standard deviation) $\sigma$, which we normalize to unity. We simulate 500 independent sample paths of our restricted MSM model for (i) the three different binomial parameters $m_0 \in \{1.1, 1.2, 1.3\}$ and (ii) the three different sample sizes $T \in \{250, 500, 1000\}$.
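A single sample path of the restricted binomial MSM model can be simulated along the following lines. This is a sketch under assumptions: the switching probabilities are taken as $\gamma_j = 2^{-(k-j)}$, the innovations are standard normal, and the initial multipliers are drawn from the base distribution. The function name and the use of NumPy's `default_rng` generator are illustrative choices.

```python
import numpy as np


def simulate_msm(T, m0, sigma, k, rng):
    """Simulate T observations eps_t = u_t * sqrt(h_t) of a binomial MSM(k).

    Returns the simulated series eps and the variance path h.
    Assumes gamma_j = 2^{-(k - j)} and u_t ~ N(0, 1).
    """
    gammas = 2.0 ** -(k - np.arange(1, k + 1))
    support = [m0, 2.0 - m0]
    M = rng.choice(support, size=k)          # initial multipliers
    eps = np.empty(T)
    h = np.empty(T)
    for t in range(T):
        renew = rng.random(k) < gammas       # which multipliers are redrawn
        M = np.where(renew, rng.choice(support, size=k), M)
        h[t] = sigma ** 2 * M.prod()
        eps[t] = np.sqrt(h[t]) * rng.standard_normal()
    return eps, h
```

A Monte Carlo experiment of the kind described above would call this function repeatedly, for example 500 times for each pair $(m_0, T)$, re-estimate the parameters on every path, and summarize the bias and mean squared error of the estimates.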
Table 1 reports the Monte Carlo maximum likelihood estimation results for small sample sizes. The first two rows provide the average bias and the mean squared error (MSE) of the parameter estimates, relative to the true parameters. The results of the ML estimation appear reasonable and exhibit a decrease in the MSEs with increasing sample size $T$. From $T_1 = 250$ to $T_2 = 500$, the MSEs decrease by a factor of about 2. Overall, our Monte Carlo simulation demonstrates that ML estimation produces reliable results.
Table 1. Monte Carlo maximum likelihood estimation results for small sample sizes.
Next, we analyze the capacity of the MSM model to reproduce the statistics of empirical data. We first estimate the binomial parameter $m_0$ and the scaling factor $\sigma^2$ for each G7 country and then use the parameter estimates to simulate 500 independent sample paths with country-specific sample sizes corresponding to those from the empirical data. The country-specific averaged means, standard deviations, skewness and kurtosis values, and the Hurst exponents are reported in Table 2. Overall, the results indicate that the MSM model reproduces the inflation-rate characteristics accurately. We note, however, that the MSM model is not able to capture the asymmetric properties observed in the data.
² Technical details on the determination of the optimal number of multipliers are available upon request.

Data
Our data set consists of seasonally adjusted consumer-price-index (CPI)-based inflation rates for the G7 countries (USA, UK, Germany, France, Italy, Canada, and Japan). The monthly data were compiled from the International Financial Statistics (IFS). Our data cover the following country-specific time spans: (i) January 1985-December 2015 for the USA, France, and Italy; (ii) January 1985-November 2015 for Canada and Japan; (iii) January 1989-December 2015 for the UK; (iv) January 1992-December 2015 for Germany.
The descriptive statistics of the inflation rates are reported in Table 3. The inflation-rate time series exhibit positive skewness and excess kurtosis (kurtosis greater than 3) for all G7 countries. This indicates a deviation from the normal distribution that is confirmed by the Jarque-Bera test. To test for stationarity, we apply the Phillips-Perron unit-root test, which does not reject the null hypothesis of a unit root at the 1% level for any of the G7 countries (see Table 4). We also apply the KPSS test for stationarity, the results of which are also reported in Table 4. Here, the null hypothesis of stationarity is rejected for all G7 countries at any conventional significance level. In order to analyze the decay in the tails of the unconditional distributions, we also report the country-specific tail indices in Table 3, which range between 2 and 13. For the USA, UK, France, Germany, Italy, and Canada, the tail indices are substantially larger than 2, indicating convergence under time-aggregation towards the normal distribution. For Japan, the tail index is close to 2, indicating that the unconditional distribution exhibits tail behavior like the normal distribution. The results of the ARCH tests in Table 3 suggest the presence of heteroscedasticity in the G7 inflation-rate time series. Figure 1 displays the inflation-rate series.

Forecasting Methodology
To analyze the predictive ability of our proposed model in forecasting inflation uncertainty, we adopt a rolling forecasting scheme that keeps the estimation sample size fixed over the out-of-sample period and adds new (and removes old) observations on a monthly basis. We define the following in-sample (out-of-sample) periods: (i) January 1958-October 2009 (November 2009-November 2015) for the USA, Canada, and Japan; (ii) January 1989-November 2009 (December 2009-December 2015) for the UK; (iii) January 1958-November 2009 (December 2009-December 2015) for France and Italy; (iv) January 1992-November 2009 (December 2009-December 2015) for Germany. For each country and model specification, we consider inflation uncertainty forecasts for the horizons $h = 1, 2, 3, 4, 5, 6$ months. We consider the end of the global financial crisis of 2007-2009 as the splitting point in our forecasting analysis.
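The rolling scheme can be expressed as a simple index generator. The sketch below is one possible reading of the procedure: a fixed-length estimation window moves forward one observation at a time, and each window is paired with the positions of the observations to be forecast at horizons 1 to `max_horizon`. The function name and the zero-based indexing are illustrative assumptions.

```python
def rolling_windows(n_obs, window, max_horizon):
    """Yield (estimation_indices, target_indices) for a rolling scheme.

    estimation_indices is a range of fixed length `window`; target_indices
    lists the positions forecast at horizons 1..max_horizon. Windows stop
    once the longest-horizon target would fall outside the sample.
    """
    last_start = n_obs - window - max_horizon
    for start in range(last_start + 1):
        estimation = range(start, start + window)
        targets = [start + window + h - 1 for h in range(1, max_horizon + 1)]
        yield estimation, targets
```

For example, with 10 observations, a window of 5, and a maximum horizon of 2, the generator produces four windows, the first estimating on positions 0 to 4 and forecasting positions 5 and 6.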
In a first step, we evaluate the forecasting performance of our specifications on the basis of two loss functions, (i) the mean squared error (MSE) and (ii) the mean absolute error (MAE), given by
$$\mathrm{MSE} = \frac{1}{T} \sum_{t=1}^{T} \left(h_{f,t} - \sigma^2_{a,t}\right)^2, \tag{13}$$
$$\mathrm{MAE} = \frac{1}{T} \sum_{t=1}^{T} \left|h_{f,t} - \sigma^2_{a,t}\right|, \tag{14}$$
with $h_{f,t}$ denoting the volatility forecast obtained from the binomial MSM or GARCH-type models, and $\sigma^2_{a,t}$ the monthly actual inflation uncertainty proxy obtained from the monthly squared residuals from suitably selected STARFIMA model specifications. (Here, $T$ is the number of out-of-sample observations.) Next, we use the predictive ability tests of Hansen (2005) and Diebold and Mariano (1995) to test the relative forecasting performance of our proposed specification against competitor models.
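The two loss functions translate directly into code. The sketch below takes a sequence of variance forecasts and a sequence of squared-residual proxies of equal length; the function names are hypothetical.

```python
def mse(forecasts, actuals):
    """Mean squared error between variance forecasts and the realized proxy."""
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(forecasts)


def mae(forecasts, actuals):
    """Mean absolute error between variance forecasts and the realized proxy."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(forecasts)
```

Because the proxy is a squared residual, it is a noisy measure of the latent variance, so the MSE penalizes occasional large proxy values much more heavily than the MAE does.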
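The Equal Predictive Ability (EPA) test of Diebold and Mariano (1995) enables us to directly compare the forecasting accuracy of two competing models (say, $M_1$ and $M_2$) under a predefined loss function. The null hypothesis of no difference in the forecasting accuracy between the competing models is stated as
$$H_0: E(d_t) = 0, \tag{15}$$
where $d_t = L(\varepsilon_{t,M_1}) - L(\varepsilon_{t,M_2})$, with $\varepsilon_{t,M_1}$ and $\varepsilon_{t,M_2}$ denoting the forecast errors obtained from the models $M_1$ and $M_2$, respectively. The loss function $L(\cdot)$ is either the squared error loss $L(\varepsilon_{t,M_i}) = \varepsilon^2_{t,M_i}$ or the absolute error loss $L(\varepsilon_{t,M_i}) = |\varepsilon_{t,M_i}|$. The test statistic is given by
$$\mathrm{EPA} = \frac{\bar{d}}{\sqrt{\hat{V}(\bar{d})}}, \tag{16}$$
where $\bar{d} = T^{-1} \sum_{t=1}^{T} d_t$ is the sample mean of the loss differential and $\hat{V}(\bar{d})$ is the heteroscedasticity and autocorrelation consistent (HAC) estimator of its variance, built from the estimated autocovariances of $d_t$ up to lag $N$. ($\hat{\gamma}_j$ is the estimate of the autocovariance function at lag $j$, $N$ is the nearest integer larger than $T^{1/3}$.) Under the null hypothesis, the EPA test statistic in Equation (16) is asymptotically standard normally distributed.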
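A compact version of the EPA statistic might look as follows. This is a sketch, not the exact estimator used in the paper: the HAC variance uses Bartlett (Newey-West) weights, which is an assumption made here to keep the variance estimate non-negative, and the truncation lag is taken as the smallest integer strictly larger than $T^{1/3}$. The reported p-value is one-sided, for the alternative that Model 1 has the larger expected loss.

```python
import math


def dm_test(errors_1, errors_2, loss="squared"):
    """Diebold-Mariano type EPA statistic and one-sided p-value.

    d_t = L(e1_t) - L(e2_t); a positive statistic indicates that model 1
    has the larger average loss. Bartlett weights are an assumption.
    """
    L = (lambda e: e * e) if loss == "squared" else abs
    d = [L(a) - L(b) for a, b in zip(errors_1, errors_2)]
    T = len(d)
    d_bar = sum(d) / T
    N = math.floor(T ** (1.0 / 3.0)) + 1  # integer just above T^(1/3)

    def autocov(j):
        return sum((d[t] - d_bar) * (d[t - j] - d_bar) for t in range(j, T)) / T

    # Long-run variance of d_t with Bartlett weights 1 - j / (N + 1).
    lrv = autocov(0) + 2.0 * sum(
        (1.0 - j / (N + 1)) * autocov(j) for j in range(1, N + 1)
    )
    stat = d_bar / math.sqrt(lrv / T)
    p_value = 0.5 * math.erfc(stat / math.sqrt(2.0))  # P(Z > stat)
    return stat, p_value
```

Swapping the two error series flips the sign of the statistic and turns the p-value $p$ into $1 - p$. The function assumes the loss differentials are not all identical, since otherwise the variance estimate is zero.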
Based on the framework of the Reality Check (RC) proposed by White (2000), the Superior Predictive Ability (SPA) test of Hansen (2005) enables us to compare a benchmark forecast model, $M_0$, with $K$ alternative competing models, $M_1, \ldots, M_K$, under predefined loss functions. The null hypothesis, stating that the benchmark model is not outperformed by any of the $K$ competing models, is formalized as
$$H_0: \max_{i=1,\ldots,K} E(d_{t,M_i}) \le 0, \tag{17}$$
where $d_{t,M_i} = L(\varepsilon_{t,M_0}) - L(\varepsilon_{t,M_i})$ for $i = 1, \ldots, K$ and $L(\cdot)$ denotes either the squared-error or the absolute-error loss function, as defined above. To formally state the test statistic, we consider (i) the sample mean of the $i$th loss differential, $\bar{d}_{M_i} = T^{-1} \sum_{t=1}^{T} d_{t,M_i}$, and (ii) the estimated variance $\widehat{\mathrm{Var}}(\sqrt{T}\,\bar{d}_{M_i})$ for $i = 1, \ldots, K$. We refer the reader to Hansen (2005) for the technical details on how to estimate this latter variance by bootstrapping. To test the null hypothesis in Equation (17), we use the test statistic
$$T^{\mathrm{SPA}} = \max\left\{\max_{i=1,\ldots,K} \frac{\sqrt{T}\,\bar{d}_{M_i}}{\sqrt{\widehat{\mathrm{Var}}(\sqrt{T}\,\bar{d}_{M_i})}},\; 0\right\}, \tag{18}$$
the p-values of which can be obtained via a stationary bootstrap procedure.

Table 2. Simulated moments and Hurst exponents via the binomial MSM(k) model.

Table 3. Descriptive statistics of the G7 inflation-rate time series.
Note: Q(8) denotes the Ljung-Box test for serial correlation out to lag 8. ARCH(1) denotes the Engle test for ARCH effects at lag 1.

Table 4. Unit root tests for inflation time series.
and −3.441. ST and ST* denote the KPSS test statistics using residuals from regressions with (i) intercept and time trend and (ii) intercept only. The critical values at the 1% level are 0.216 and 0.739.

Table 11. Diebold and Mariano (1995) Equal Predictive Ability (EPA) test, squared error loss, mean process: ARFIMA.
Note: The displayed numbers are p-values of the EPA test of Diebold and Mariano (1995) using the squared error loss. We test the null hypothesis that the forecasts at horizon h of Model 1 are equal to those of Model 2 against the one-sided alternative that forecasts of Model 1 are inferior to those of Model 2. The p-values are obtained for the following out-of-sample periods: November 2009-November 2015 for Canada and Japan; December 2009-December 2015 for the US, UK, France, Germany, and Italy. The inflation-rate mean process is ARFIMA.

Table 15. Diebold and Mariano (1995) Equal Predictive Ability (EPA) test, squared error loss, mean process: STARFI.
Note: The displayed numbers are p-values of the EPA test of Diebold and Mariano (1995) using the squared error loss. We test the null hypothesis that the forecasts at horizon h of Model 1 are equal to those of Model 2 against the one-sided alternative that forecasts of Model 1 are inferior to those of Model 2. The p-values are obtained for the following out-of-sample periods: November 2009-November 2015 for Canada and Japan; December 2009-December 2015 for the US, UK, France, Germany, and Italy. The inflation-rate mean process is STARFI.