A Locally Both Leptokurtic and Fat-Tailed Distribution with Application in a Bayesian Stochastic Volatility Model

In the paper, we begin by introducing a novel scale mixture of normal distribution such that its leptokurticity and fat-tailedness are only local, with this "locality" being separately controlled by two censoring parameters. This new, locally leptokurtic and fat-tailed (LLFT) distribution provides a viable alternative to other, globally leptokurtic, fat-tailed and symmetric distributions typically entertained in financial volatility modelling. Then, we incorporate the LLFT distribution into a basic stochastic volatility (SV) model to yield a flexible alternative to common heavy-tailed SV models. For the resulting LLFT-SV model, we develop a Bayesian statistical framework and effective MCMC methods to enable posterior sampling of the parameters and latent variables. Empirical results indicate the validity of the LLFT-SV specification for modelling both "non-standard" financial time series with repeating zero returns, as well as more "typical" data on the S&P 500 and DAX indices. For the former, the LLFT-SV model is also shown to markedly outperform a common, globally heavy-tailed, t-SV alternative in terms of density forecasting. Applications of the proposed distribution in more advanced SV models seem to be easily attainable.


Introduction
Most leptokurtic and heavy-tailed distributions commonly entertained in financial volatility modelling may be derived as scale mixtures of normal (SMN) distributions, with the concept dating back to [1]. SMN is a very wide and useful class of distributions, which belongs to an even wider class of elliptical distributions (see [2]). We say that a random variable follows a scale mixture of normals if it has the stochastic representation X = Z/√V, where Z has a standard normal distribution independent of a positive random variable V. If V ∼ Gamma(ν/2, ν/2), then X follows Student's t-distribution with ν degrees of freedom, while for V ∼ IGamma(α, β) we obtain a variance gamma distribution (see [3]). For V ∼ Pareto(1, ν/2) we obtain a modulated normal type I (MN type I) distribution, while V ∼ Beta(ν/2, 1) yields a modulated normal type II distribution (see [4]), which is also known in the literature as a slash distribution (see [5]). Extensions of the latter were introduced by [6,7], who developed the modified slash (MS) and generalized modified slash (GMS) distributions, respectively. As mentioned above, the SMN class contains many different subclasses of distributions with different tail behaviour and modal concentration in comparison with the Gaussian distribution. In this paper, we construct a new SMN distribution and employ it within the basic stochastic volatility (SV) model, defined as y_t = √(h_t) ε_t, ln h_t = γ + φ(ln h_{t−1} − γ) + η_t, where ε_t ∼ N(0, 1) and η_t ∼ N(0, σ²) are i.i.d. and mutually independent. Although it is well known that such an SV model has the ability to induce leptokurtosis and heavy tails, typically observed in financial time series, its underlying conditional Gaussianity is still quite a limitation. Introducing the LLFT distribution for ε_t aims at a more adequate capturing of the heavy tails and leptokurticity of financial assets' returns, as compared not only to the basic but also to conditionally heavy-tailed SV models typically entertained in the literature, including the t-SV and slash-SV models.
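For illustration, the stochastic representation above can be simulated directly. The sketch below (parameter values are illustrative; numpy is assumed) draws from two of the special cases listed and checks that both share the variance ν/(ν − 2) of Student's t:

```python
import numpy as np

rng = np.random.default_rng(0)

def smn_draws(n, mixing_v):
    """Draw from the scale mixture of normals X = Z / sqrt(V),
    with Z ~ N(0, 1) independent of the positive variable V."""
    z = rng.standard_normal(n)
    return z / np.sqrt(mixing_v(n))

nu = 10.0
# V ~ Gamma(nu/2, rate nu/2)  ->  X ~ Student's t with nu d.o.f.
t_draws = smn_draws(200_000, lambda n: rng.gamma(nu / 2, 2.0 / nu, n))
# V ~ Beta(nu/2, 1)           ->  X ~ slash with parameter nu
slash_draws = smn_draws(200_000, lambda n: rng.beta(nu / 2, 1.0, n))

# both sample variances should be close to nu/(nu - 2) = 1.25
print(t_draws.var(), slash_draws.var())
```

Note that numpy's `gamma` is parametrized by shape and scale, so a rate of ν/2 corresponds to a scale of 2/ν.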
Modifications of the distribution for ε_t have been one of the most prolific strands of the SV literature (see, e.g., [9][10][11][12][13][14][15][16]). Other typical directions of generalizing the basic SV structure focus on capturing the leverage effect and asymmetry (see, e.g., [12,[17][18][19]) as well as on refining the volatility process by, for example, accommodating realized volatility and long memory (see, e.g., [18,20]), or allowing for discrete, Markov switches of the parameters (see, e.g., [21][22][23]). However, we do not follow these lines of research in our current paper, focusing rather on the construction of a new distribution "from scratch" and its introduction into the basic SV model, thereby contributing to the research area of improving the conditional distribution in SV models. Extending our framework to other, more elaborate SV specifications is left for future work.
For statistical inference in the resulting LLFT-SV model, we resort to the Bayesian framework, which is typically considered for SV models (see [14,[24][25][26]). We show that, under a certain prior structure, the marginal data density in the LLFT-SV model (and thus the posterior distribution) is bounded even when the same values of observations repeat in the sample, which is not necessarily the case if these were modelled with the slash or MN type I distributions, as duly noted by [27]. The result seems essential in view of repeating (or at least strongly concentrated around some constant, e.g., the mode) values of returns, which occur quite often for some individual financial assets (such as a company's stocks).
Building on the hierarchical representation of SMN distributions, we develop an effective Markov chain Monte Carlo (MCMC) method for posterior sampling of the LLFT-SV model's parameters and latent variables. The procedure suitably adapts standard techniques of the Metropolis-Hastings algorithm and Gibbs sampler.
To sum up, the contribution of our work is several-fold. Methodologically, a new, specific scale mixture of normals is constructed, featuring five free parameters, two of which censor, separately, the leptokurticity and the fat tails so that these are only local, while the other three control the relative weights and magnitudes of the two features in a disentangled manner. Then, we introduce this new LLFT distribution into a stochastic volatility model and develop a Bayesian framework for the resulting LLFT-SV specification, resolving some additional theoretical considerations about the existence of the posterior distribution. Next, we adapt relevant MCMC numerical methods for posterior simulation of the parameters and latent variables. Finally, we conduct an extensive empirical study of the LLFT-SV model's workings and density forecasting performance for a selected financial asset, and additionally provide a brief illustration for two common stock market indices.
The article is organized as follows. In Section 2, we define the locally both leptokurtic and fat-tailed (LLFT) scale mixture of normal distributions. In particular, we show in detail the properties of the introduced distributions, together with their relations to other, well-known distribution families. In Section 3, we introduce the stochastic volatility (SV) model incorporating the proposed distribution for the error term and develop a Bayesian framework for the resulting LLFT-SV model. Section 4 presents a real-world data analysis illustrating predictive advantages of the model. Finally, Section 5 concludes and discusses possible avenues for further research.

The Slash Distribution-A Short Review
We begin with a detailed presentation of the slash distribution's properties, with the distribution itself parametrized according to [5], where it was introduced. Following the cited paper, we assume that X|W ∼ N(0, W²), (2) where W has a Pareto(1, ν) distribution and N(µ, σ²) stands for the Gaussian distribution with mean µ and variance σ². Note that recently, another parametrization has been used (see, for example, [14,15]), in which it is assumed that X|W̃ ∼ N(µ, 1/W̃), where W̃ has a Beta(ν̃, 1) distribution. However, both parametrizations are equivalent under ν̃ = ν/2, since Pareto(1, ν) is the distribution of the inverse of a Beta(ν, 1) random variable. In this paper, we choose to stick to the original parametrization by [5], as it enables a direct comparison of the slash distribution's properties with those of Student's t. We will discuss this later in this section. The probability density function (pdf) of the random variable X given by (2) is f_X(x) = ν/(2√(2π)) (x²/2)^{−(ν+1)/2} [Γ((ν+1)/2) − Γ((ν+1)/2, x²/2)] for x ≠ 0, with f_X(0) = ν/[(ν+1)√(2π)], (3) which is easy to obtain by noticing that the function under the defining integral is the kernel of an inverse gamma distribution (for x ≠ 0) at W², and Γ(a, x) denotes the upper incomplete gamma function: Γ(a, x) = ∫_x^∞ t^{a−1} e^{−t} dt. The exact form of the pdf (3) of the slash distribution also admits a recurrent form, which was also shown in [5]. The density is a continuous function on the real line, symmetric and unimodal. The characteristic function of X given by (2) has the form E(e^{itX}) = (ν/2) E_{ν/2+1}(t²/2), where E_n(z) = ∫_1^∞ e^{−tz} t^{−n} dt is the exponential integral function. It is well known that the slash distribution reduces to the standard Gaussian case as ν → ∞. Note that, similarly to Student's t-distribution, the moment E(X^{2r}) exists only under the assumption ν > 2r, r ∈ N, and is given as E(X^{2r}) = (2r − 1)!! ν/(ν − 2r). Hence, the variance exists only under ν > 2 and equals E(X²) = ν/(ν − 2), which is the same as for Student's t-distribution with ν > 2 degrees of freedom.
The kurtosis of the slash distribution exists only for ν > 4 and equals Ku(X) = 3(ν − 2)²/[(ν − 4)ν]. For a given value of ν, the slash distribution's kurtosis is lower than that of Student's t-distribution with ν degrees of freedom, but is analogously a decreasing function of ν, tending to 3 as ν → ∞ and approaching ∞ as ν → 4⁺. Interestingly, the slash distribution features the same tail thickness as the t-distribution with ν degrees of freedom, i.e., P(|X| > x) decays proportionally to x^{−ν} as x → ∞. Thereby, the probability of extreme values increases as ν → 0⁺, while for ν → ∞, the slash distribution reduces to the standard Gaussian case.
In view of the above, the properties of the slash distribution with parameter ν are comparable with those of Student's t-distribution, with both sharing and 'suffering' from the same restriction ensuring the existence of their kth moments: ν > k. Otherwise, the relevant integrals do not converge, since too much probability mass is located in the tails.
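The slash pdf can be verified numerically against the defining mixture integral. In the sketch below (scipy assumed), the closed form is expressed via the regularized lower incomplete gamma function P(a, x), using γ(a, b) = Γ(a)P(a, b), which is equivalent to the upper-incomplete-gamma form Γ(a) − Γ(a, b):

```python
import numpy as np
from math import gamma as gamma_fn, sqrt, pi
from scipy.special import gammainc  # regularized lower incomplete gamma P(a, x)

def slash_pdf(x, nu):
    # closed form (x != 0): nu/(2*sqrt(2*pi)) * b^(-a) * gamma_lower(a, b),
    # with a = (nu + 1)/2 and b = x^2/2
    a, b = (nu + 1) / 2, x * x / 2
    return nu / (2 * sqrt(2 * pi)) * b ** (-a) * gamma_fn(a) * gammainc(a, b)

def slash_pdf_numeric(x, nu):
    # direct trapezoidal evaluation of the mixture integral:
    # X | W ~ N(0, W^2), W ~ Pareto(1, nu)
    w = np.linspace(1.0, 200.0, 400_000)
    f = np.exp(-x * x / (2 * w * w)) / (np.sqrt(2 * np.pi) * w) * nu * w ** (-nu - 1)
    return float(np.sum((f[:-1] + f[1:]) / 2) * (w[1] - w[0]))

nu = 3.0
for x in (0.5, 1.0, 2.5):
    print(x, slash_pdf(x, nu), slash_pdf_numeric(x, nu))
```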
In the following subsection, we propose a modification of the slash distribution to 'curb' the heaviness of the tails but only far enough from the mode. In consequence, all moments of such a modified distribution exist.

The Concept of the Local Slash (LS) Distribution
Here, we modify the slash distribution reviewed in the previous section by introducing a censored variable in place of W in (2): X|W ∼ N(0, (min{W, d})²), (5) where, similarly as before, W has a Pareto(1, ν) distribution, and d > 1 is a censoring parameter. The marginal distribution of X given by (5) depends strongly on the parameter d, which is related to the tail thickness. The introduction of the parameter d ensures that the tails' thickness of X is upheld but only up to some distance from the mode (with the distance being controlled for by d), while thinning out thereafter. Since the heavy-tailedness is only "local" now, we name the new distribution the local slash (LS) distribution. The probability density function of the LS distribution can be easily derived and admits the form f_LS(x) = ν ∫_1^d φ(x/w) w^{−ν−2} dw + d^{−ν−1} φ(x/d), where φ denotes the standard normal pdf. The moments E(X^{2r}) exist for any ν > 0 and d > 1, each being a continuous function of ν and d for any fixed r ∈ N. Thus, the variance exists for any ν ∈ R₊ and equals E(X²) = (ν − 2d^{2−ν})/(ν − 2) for ν ≠ 2 (with E(X²) = 1 + 2 ln d for ν = 2). Note that for any fixed ν > 0 we have lim_{d→∞} E(X²) = ν/(ν − 2) for ν > 2 and lim_{d→∞} E(X²) = ∞ for ν ≤ 2, which means that in the limiting case of d → ∞, the variance stabilizes for ν > 2 and tends to the variance of the slash distribution (coinciding with that of Student's t-distribution).
The above result also indicates that for any ν ≤ 2, the variance exists and can be arbitrarily high (for sufficiently large d).
The kurtosis exists for any d > 1 and ν > 0 and equals Ku(X) = 3(ν − 2)²(ν − 4d^{4−ν})/[(ν − 4)(ν − 2d^{2−ν})²] for ν ∉ {2, 4}, with the values at ν = 2 and ν = 4 obtained as the corresponding limits. Hence, for ν > 4, the kurtosis tends to that of the slash distribution as d → ∞. Figure 1 presents the shape of the variance and kurtosis of the LS distribution as functions of both ν and d. Two white lines represent the cases ν = 2 and ν = 4. Note that for any fixed d > 1, the variance increases as ν → 0⁺, while the kurtosis tends to 3 with either ν → 0⁺ or ν → ∞, achieving its maximum at ν = 2 (see Figure 1). Note also that the higher the value of d, the more mass is spread out in the tails (for fixed ν).
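A quick Monte Carlo check of the LS construction can be sketched as follows (illustrative values of ν and d; the variance expression used below follows from computing E[(min{W, d})²] directly):

```python
import numpy as np

rng = np.random.default_rng(1)
nu, d, n = 3.0, 5.0, 500_000

w = rng.uniform(size=n) ** (-1.0 / nu)          # W ~ Pareto(1, nu) via inverse cdf
x = np.minimum(w, d) * rng.standard_normal(n)   # X | W ~ N(0, (min{W, d})^2)

var_theory = (nu - 2 * d ** (2 - nu)) / (nu - 2)   # E[(min{W, d})^2] for nu != 2
print(x.var(), var_theory)
```

Note that for ν = 3 the plain slash distribution has no finite kurtosis, whereas the censoring at d makes all moments of the LS distribution finite.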

Local Modulated Normal Type I Distribution
We begin this section with a presentation of the modulated normal (MN) type I distribution introduced by [4], which we later modify into the local modulated normal (LMN) type I distribution. Firstly, let us recall that a random variable X follows an MN type I distribution if X|R ∼ N(0, R²), where R is a continuous random variable on (0, 1) with a Beta(ρ, 1) distribution, ρ > 0. The pdf and cdf of the random variable X admit the forms f_X(x) = ρ ∫_0^1 r^{ρ−2} φ(x/r) dr and F_X(x) = ρ ∫_0^1 r^{ρ−1} Φ(x/r) dr, where φ and Φ denote the standard normal pdf and cdf, respectively. The characteristic function equals E(e^{itX}) = (ρ/2)(t²/2)^{−ρ/2} γ(ρ/2, t²/2), with γ(a, x) denoting the lower incomplete gamma function, and tends to the characteristic function of the standard normal distribution as ρ → ∞, which indicates the equivalence of the two distributions in the limiting case. The moments E(X^{2r}) exist for any ρ > 0 and equal E(X^{2r}) = 2^r Γ(r + 1/2) ρ/[√π (ρ + 2r)], r ∈ N. Hence, the variance is E(X²) = ρ/(ρ + 2), while the kurtosis equals Ku(X) = 3(ρ + 2)²/[ρ(ρ + 4)], which is a decreasing function of ρ (yet always taking on values higher than 3) and tends to ∞ as ρ → 0⁺. Figure 3 presents the pdf of the MN type I distribution for different values of ρ, compared with the Gaussian curve. The presented cases exhibit the ability of this distribution to capture leptokurtic data. It is straightforward to show that, for ρ < 2, the pdf is a convex function on the sets (0, ∞) and (−∞, 0). If 1 < ρ < 2 (see Figure 3, green line), the density is pointed at zero, thus not differentiable at this point. If ρ ≤ 1, the pdf tends to ∞ as x → 0, and the distribution degenerates to a single-point distribution as ρ → 0. Hence, for ρ ≤ 1, the normalized probability ε⁻¹P(|X| < ε) tends to ∞ as ε → 0. For ρ > 1, the limit of this probability equals √(2/π) ρ/(ρ − 1), approaching ∞ as ρ → 1⁺. Note that the accumulation of mass near zero is typical for the returns of many financial assets, corresponding to only small or even no price changes at all. However, such a behaviour of the data can be observed only locally, in the sense that for a finite sample size it seems a little far-fetched to assume that the normalized probability ε⁻¹P(|X| < ε) is unbounded.
Note that, for a finite sample size, such a property for ρ ≤ 1 becomes a (somewhat incidental) assumption that cannot be validated. To weaken this assumption and control for the pointedness of the distribution, we introduce a locally leptokurtic distribution with a pdf continuous at zero, for which ε⁻¹P(|X| < ε) is bounded in ε > 0 for any fixed ρ > 0, including ρ ≤ 1. Moreover, the pdf of such a distribution is differentiable on the entire real line.
The idea for constructing a locally leptokurtic distribution is to use a censored random variable in place of R as the standard deviation of X, which (conditionally on R) is normally distributed: X|R ∼ N(0, (max{R, c})²), (16) where R is a continuous random variable with a Beta(ρ, 1) distribution and c ∈ (0, 1) is a censoring parameter. We refer to the marginal distribution of X as the local modulated normal (LMN) type I distribution. Its pdf and cdf are given by the following formulas: f_X(x) = c^{ρ−1} φ(x/c) + ρ ∫_c^1 r^{ρ−2} φ(x/r) dr, F_X(x) = c^ρ Φ(x/c) + ρ ∫_c^1 r^{ρ−1} Φ(x/r) dr. Note that the pdf is continuous on the entire real line. Since the first term of the above cdf tends to 0 as ρ → ∞, the LMN type I distribution reduces to the standard normal distribution in this limiting case, for any fixed c ∈ (0, 1). Moreover, it reduces to a zero-mean Gaussian case with variance c² when ρ → 0⁺. The characteristic function of the LMN type I distribution admits the form E(e^{itX}) = c^ρ e^{−c²t²/2} + ρ ∫_c^1 r^{ρ−1} e^{−r²t²/2} dr. The moments E(X^{2r}) exist for any ρ > 0, c ∈ (0, 1), and are given as E(X^{2r}) = 2^r Γ(r + 1/2)(2r c^{ρ+2r} + ρ)/[√π (ρ + 2r)], r ∈ N. Hence, the variance equals E(X²) = (2c^{ρ+2} + ρ)/(ρ + 2), while the kurtosis Ku(X) = 3(ρ + 2)²(4c^{ρ+4} + ρ)/[(ρ + 4)(2c^{ρ+2} + ρ)²], being higher than 3 and approaching this value as either ρ → 0⁺ or ρ → ∞. Figure 4 presents the densities of the LMN type I distribution for different values of ρ ≥ 1, while Figure 5 does so for ρ ≤ 5/4, with c ∈ {0.02, 0.05, 0.1, 0.2, 0.5, 0.8} in both cases. The presented curves exhibit visible flexibility in capturing leptokurtic data. In particular, it can be noticed that a combination of a small value of ρ with a small value of c can produce an "extremely" leptokurtic distribution (see Figure 5, with c = 0.02).
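The LMN type I construction is equally simple to simulate (illustrative parameter values; the variance expression below follows from computing E[(max{R, c})²] directly):

```python
import numpy as np

rng = np.random.default_rng(2)
rho, c, n = 0.5, 0.1, 500_000

r = rng.beta(rho, 1.0, n)
x = np.maximum(r, c) * rng.standard_normal(n)   # X | R ~ N(0, (max{R, c})^2)

var_theory = (2 * c ** (rho + 2) + rho) / (rho + 2)
print(x.var(), var_theory)
```

With ρ = 0.5 the plain MN type I pdf would be unbounded at zero; the floor at c keeps the density finite while preserving a pronounced peak.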

Locally Both Leptokurtic and Fat-Tailed Distribution
To capture both effects in financial modelling, that is, leptokurticity and fat tails, we construct a new distribution, belonging to the SMN family, that combines the advantages of both the LS and LMN type I distributions introduced above. Let the conditional distribution of some random variable X be defined as: X|R ∼ N(0, (min{max{R, c}, d})²), (19) with censoring parameters 0 < c < 1 < d, where R is a mixture of Beta(ρ, 1) and Pareto(1, ν) distributions with weights p and 1 − p, respectively, i.e., R ∼ Beta(ρ, 1) with probability p, and R ∼ Pareto(1, ν) with probability 1 − p. (20) We name the marginal distribution of X the locally both leptokurtic and fat-tailed (LLFT) distribution, since it has the ability to be (locally) leptokurtic and (locally) heavy-tailed. Note that the LLFT distribution can be written as a mixture of the LMN type I distribution (given by (16)) and the LS distribution (given by (5)), with weights p and 1 − p, respectively. The pdf, cdf, characteristic function and moments follow immediately from the previous two sections and the well-known properties of mixtures of probability distributions; therefore, we omit the details. Figure 6 presents the limiting cases of the LLFT distribution, the set of which comprises the slash, LS, MN type I, LMN type I and Gaussian distributions. Figure 7 presents the pdf of the LLFT distribution for various parameter settings (see Figure 7b). Moving away from the mode, the curves behave similarly (see Figure 7c), with their tails' local heaviness driven by the parameter d: the higher its value, the longer the tails keep their local heaviness (see Figure 7d) before they thin out (see Figure 7e). This, however, does not pertain to the case of d → ∞ (red line), which represents a simple mixture of the slash and MN type I distributions (without their "local" modifications).
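Drawing from the LLFT distribution reduces to drawing the mixing scale and clipping it to [c, d]. A minimal sketch (illustrative parameters; the two component variances below follow from computing E[(max{R, c})²] and E[(min{R, d})²] directly for the Beta and Pareto components):

```python
import numpy as np

rng = np.random.default_rng(3)
rho, nu, p, c, d, n = 2.0, 3.0, 0.6, 0.1, 5.0, 500_000

is_beta = rng.uniform(size=n) < p
r = np.where(is_beta, rng.beta(rho, 1.0, n), rng.uniform(size=n) ** (-1.0 / nu))
x = np.clip(r, c, d) * rng.standard_normal(n)   # scale = min{max{R, c}, d}

var_lmn = (2 * c ** (rho + 2) + rho) / (rho + 2)   # Beta component (weight p)
var_ls = (nu - 2 * d ** (2 - nu)) / (nu - 2)       # Pareto component (weight 1 - p)
print(x.var(), p * var_lmn + (1 - p) * var_ls)
```

The `np.clip` call implements the double censoring min{max{·, c}, d} in one step.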
As regards the moments of the LLFT distribution, they can easily be derived as the moments of a mixture, and thus we skip their presentation. However, for a reason clarified below, it merits a mention that the expectation E(e^{rX}) exists and is finite for any r ∈ N. It can be shown that the finiteness of E(e^{rX}) stems from the tails of X ∼ LLFT thinning out beyond the censoring and actually converging to the tails of a zero-mean normal distribution with variance d².
The above feature of the LLFT distribution could prove essential in empirical finance, particularly in option pricing, where predicting the price of a derivative hinges upon some model built for logarithmic rates of return, say y_t, the conditional distribution of which (given the past of the process) belongs to the same family of distributions as the one assumed for the standardized errors of the observations. Transformation of the return y_t into the price requires applying the exponential function to y_t. Depending on the distribution family of the latter, the conditional (on the past) expectation of the price simply may not exist (as happens in the case of y_t following a Student's t or slash distribution). However, it follows from the finiteness of E(e^{rX}) that if the conditional distribution of y_t is LLFT, then the expectation of the induced price distribution is finite (conditionally on the past price).

The SV Model with Locally Leptokurtic and Fat-Tailed Innovations
In this section, we use the newly proposed, LLFT distribution (given by (19) and (20)) in the context of modelling financial time series. In volatility modelling of this type of data, two major classes of models are used: the generalized autoregressive conditionally heteroscedastic (GARCH) and stochastic volatility (or stochastic variance, SV) models. Here, we focus on the latter class. The stochastic variance models were introduced to describe time-varying volatility ( [17,24,[28][29][30][31]), and it seems that they are more flexible than GARCH. In the basic SV model, a log-normal auto-regressive process is specified for the conditional variance, with the conditional mean equation's innovations following the Gaussian distribution. We generalize such a model by assuming the LLFT distribution instead, which yields the LLFT-SV specification. Due to the LLFT distribution's properties (see Section 2), such a model is a natural and promising alternative to the heavy-tailed SV (including t-SV) models proposed in, e.g., [14,17].
The basic stochastic volatility model is defined as follows: y_t = x_t β + √(h_t) ε_t, ln h_t = γ + φ(ln h_{t−1} − γ) + η_t, where y_t denotes an observable at time t, x_t is a 1 × r vector of exogenous variables or lagged observations with parameters comprised in an r × 1 vector β, {ε_t} ∼ iiN(0, 1) is a Gaussian white noise sequence with mean zero and unit variance, {η_t} ∼ iiN(0, σ²), η_t and ε_s are mutually independent (denoted as η_t ⊥ ε_s) for all t, s ∈ {1, 2, . . . , T}, T is the length of the modelled time series, and finally, 0 < |φ| < 1. We extend the above SV specification by waiving the normality of ε_t and replacing it with the LLFT distribution given by (19): ε_t = min{max{ω_t, c}, d} λ_t, where {λ_t} ∼ iiN(0, 1), ω_t ∼ Beta(ρ, 1) with probability p and ω_t ∼ Pareto(1, ν) with probability 1 − p, and λ_s ⊥ ω_t for s, t ∈ {1, 2, . . . , T}, with censoring parameters c and d such that 0 < c < 1 < d.
The conditional distribution of y_t in the LLFT-SV model (given the past of y_t and the current latent variables h_t and ω_t) is determined by the distribution of λ_t, so y_t follows the normal distribution with mean µ_t = x_t β and standard deviation σ_t = √(h_t) min{max{ω_t, c}, d}. For µ_t we assume an autoregressive structure of order m with a constant: µ_t = β_0 + ∑_{j=1}^m β_j y_{t−j}, where the polynomial 1 − ∑_{j=1}^m β_j B^j (with B denoting the backshift operator) has roots outside the unit circle.
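A simulated path from the LLFT-SV data-generating process can be sketched as follows (all parameter values are illustrative, not estimates from the paper; the conditional mean x_t β is set to zero for simplicity):

```python
import numpy as np

rng = np.random.default_rng(4)

# illustrative parameter values (not estimates from the paper)
T, gamma_, phi, sigma = 5000, 0.0, 0.95, 0.2
rho, nu, p, c, d = 1.5, 3.0, 0.7, 0.1, 5.0

# log-volatility: AR(1) in ln h_t around gamma
lnh = np.empty(T)
lnh[0] = gamma_
for t in range(1, T):
    lnh[t] = gamma_ + phi * (lnh[t - 1] - gamma_) + sigma * rng.standard_normal()

# LLFT innovations: eps_t = min{max{omega_t, c}, d} * lambda_t
is_beta = rng.uniform(size=T) < p
omega = np.where(is_beta, rng.beta(rho, 1.0, T), rng.uniform(size=T) ** (-1.0 / nu))
eps = np.clip(omega, c, d) * rng.standard_normal(T)

y = np.exp(lnh / 2) * eps   # y_t = sqrt(h_t) * eps_t with x_t * beta = 0
print(len(y), lnh.mean())
```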

Bayesian Setup
The Bayesian statistical model amounts to specifying the joint distribution of all observations, latent variables and parameters. The assumptions presented so far determine the conditional distribution of the observations and latent variables, given the parameters, thus necessitating the marginal distribution of the parameters (the prior or a priori distribution) to be formulated. In the prior structure, we assume mutual independence between the parameters β, φ, γ, σ², p, ν, ρ, c, d and use standard prior distributions. The vector β, as well as the scalar parameter φ, are assumed to have truncated normal distributions. The parameter γ has a normal prior, whereas for σ², we set an inverse gamma prior distribution IG(α_σ, β_σ) with mean β_σ/(α_σ − 1) for α_σ > 1. For ν we assume a gamma distribution G(α_ν, β_ν), with mean α_ν/β_ν. For ρ, we also assume a gamma distribution, while for the parameter p, a beta distribution. Finally, for the parameters c and d, we assume inverse Nakagami distributions (see [32]) truncated to the intervals (0, 1) and (1, ∞), respectively. A random variable x follows an inverse Nakagami distribution with parameters α, β > 0, which we write as x ∼ INK(α, β), if x² follows an inverse gamma distribution IG(α, β); owing to this, sampling from the INK(α, β) distribution is straightforward and comes down to sampling from the corresponding IG(α, β) distribution and taking the square root of the draw. The probability density function is given as f_INK(x|α, β) = [2β^α/Γ(α)] x^{−2α−1} e^{−β/x²}. The inverse Nakagami distribution has been presented by [32], although in a different parameterization. Note that under c and d following truncated INK distributions, c² and d² are then inverse-gamma distributed (with the corresponding truncations). Notice that for c ∈ (0, 1), a beta prior could be considered instead. However, the resulting full conditional posterior would be far less tractable for MCMC sampling. The exact specifications of these distributions are presented later in this section.
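The square-root trick for the inverse Nakagami prior can be sketched as follows (a minimal illustration with hypothetical hyperparameter values; the optional truncation is handled by simple rejection):

```python
import numpy as np

rng = np.random.default_rng(5)

def inverse_nakagami(alpha, beta, size, lower=0.0, upper=np.inf):
    """Draw from INK(alpha, beta), optionally truncated to (lower, upper),
    by sampling x^2 ~ IG(alpha, beta) and taking the square root."""
    out = np.empty(size)
    filled = 0
    while filled < size:
        g = rng.gamma(alpha, 1.0 / beta, size)   # Gamma(alpha, rate beta)
        x = np.sqrt(1.0 / g)                     # so x^2 ~ IG(alpha, beta)
        keep = x[(x > lower) & (x < upper)]
        take = min(size - filled, keep.size)
        out[filled:filled + take] = keep[:take]
        filled += take
    return out

alpha, beta = 3.0, 2.0
x = inverse_nakagami(alpha, beta, 500_000)
print(np.mean(x ** 2))   # should be near beta/(alpha - 1) = 1 for alpha > 1
```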
Under a sample y_1, y_2, . . . , y_T, where T is the sample's length, we introduce the matrix notation y = [y_1, y_2, . . . , y_T]′, with the latent variables collected analogously into vectors h and ω. As regards the initial conditions y^(0), for h_t we fix h_0 = 1, while for y_t the first m pre-sample observations are used. The following symbols are used:
• f_N^n(x|u, V): the probability density function of the n-variate normal distribution with mean vector u and positive definite covariance matrix V;
• f_N(x|a, b): the probability density function of the normal distribution with mean a and variance b;
• f_LN(x|a, b): the probability density function of the log-normal distribution whose underlying normal has mean a and variance b;
• f_G(x|a, b): the probability density function of the gamma distribution with mean a/b;
• f_IG(x|a, b): the probability density function of the inverse gamma distribution with mean b/(a − 1) for a > 1;
• f_Beta(x|a, b): the probability density function of the beta distribution with parameters a and b;
• f_Pareto(x|a, b): the probability density function of the Pareto distribution with parameters a and b.
Now the Bayesian model can be fully written as the product of the conditional data density, the distributions of the latent variables, and the parameters' priors (see (25)), with the prior distributions specified as follows:

MCMC Method for the Bayesian LLFT-SV Model
The posterior density function, p(θ, ω, h|y, y^(0)), where θ comprises all the model's parameters, is proportional to (25) and is thus high-dimensional and non-standard. To make inference about the parameters and latent variables, relevant numerical methods are needed. In our paper, we resort to a common Markov chain Monte Carlo (MCMC) method, namely the Gibbs algorithm, consisting of sequential sampling from the full conditional posteriors derived from (25), which we present below.

The Full Conditional Posterior Distributions of Parameters
The conditional posterior densities of the LLFT-SV model's parameters follow from (25). Since these full conditional densities have known closed forms, we can simulate directly from the corresponding distributions using standard pseudo-random number generators.
As regards the parameters c and d, their full posterior conditionals are far more complicated as compared to the remaining ones.

• Conditional distribution of c. If ∑_{t=1}^T I_(0,1)(ω_t) = 0, then the conditional distribution is straightforward, since p(c|y, h, ω, β, γ, φ, σ², ν, ρ, p, d, y^(0)) ∝ p(c). If ∑_{t=1}^T I_(0,1)(ω_t) = n > 0, then the full conditional posterior density of c is given by (26), which can be expressed as a mixture of n + 1 different distributions (27), where ω̃_1, ω̃_2, . . . , ω̃_T are the order statistics defined by sorting the values of ω_1, ω_2, . . . , ω_T in ascending order, with the corresponding ã_{y,t} for t = 1, . . . , T, where a_{y,t} = (y_t − x_t β)²/h_t, having previously been assigned to the unordered ω_t's. To simulate from this mixture, the mixing weights and the probability density functions along with their normalizing constants are needed. For k = 1, we have a truncated inverse Nakagami distribution for c with parameters α_c and β_c, together with the normalizing constant of the truncated inverse Nakagami distribution. For k = 2, . . . , n, we also have truncated inverse Nakagami distributions for c, with suitably updated parameters and normalizing constants. Finally, for k = n + 1, the last component is defined on the interval (ω̃_n, 1). The mixing weights w_{c,k} are proportional to the products of the components' weights and normalizing constants. To simulate from the mixture components, we apply the acceptance-rejection method by using the uniform distribution on the interval (ω̃_{k−1}, ω̃_k) for k = 1, . . . , n, and on the interval (ω̃_n, 1) for k = n + 1.
• Conditional distribution of d. If ∑_{t=1}^T I_(1,+∞)(ω_t) = T − n > 1, then the full conditional posterior density of d is given by (31), which can be expressed as a mixture of T − n + 1 different distributions (32). In this case, the first component is proportional to a truncated inverse Nakagami distribution, with the corresponding normalizing constant. For k = 2, . . . , T − n, we also have truncated inverse Nakagami distributions for d. The last component is a truncated gamma distribution. The mixing weights are again proportional to the products of the components' weights and normalizing constants. Similarly to the algorithm for the parameter c, to simulate from the mixture components here, the acceptance-rejection method is applied by using the uniform distribution on the interval (ω̃_{n+k−1}, ω̃_{n+k}) for k = 2, . . . , T − n.
In turn, for k = T − n + 1, we simulate from a truncated gamma distribution on the interval (ω̃_T, +∞) using the algorithm proposed by [33]. If ∑_{t=1}^T I_(1,+∞)(ω_t) = T − n = 1, then we have only two components in the mixture (32): the first and the last.
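The acceptance-rejection step with a uniform proposal on a bounded interval can be sketched generically as follows (the target kernel below is a hypothetical truncated inverse-Nakagami-type kernel with made-up parameter values, used purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)

def ar_draw(logf, a, b, n_grid=1000):
    """One acceptance-rejection draw from an unnormalized density f on (a, b),
    using a Uniform(a, b) proposal and a grid-based envelope constant."""
    grid = np.linspace(a, b, n_grid)
    log_m = logf(grid).max()            # approximate bound on log f over (a, b)
    while True:
        x = rng.uniform(a, b)
        if np.log(rng.uniform()) < logf(x) - log_m:
            return x

# hypothetical truncated inverse-Nakagami-type kernel on (0.2, 0.8)
alpha_c, beta_c = 2.0, 0.05
logf = lambda x: -(2 * alpha_c + 1) * np.log(x) - beta_c / x ** 2
draws = np.array([ar_draw(logf, 0.2, 0.8) for _ in range(2000)])
print(draws.min(), draws.max())
```

The uniform proposal is a reasonable choice here because each mixture component lives on a short interval between consecutive order statistics, where the target kernel varies moderately.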

The Conditional Posterior of h t s
We sample each element of the vector h using the independence Metropolis-Hastings algorithm (similarly to [24]) within a separate Gibbs step, initializing at h_t = 0.1y_t² + 1 for t = 1, . . . , T. The conditional posterior density of h_t, t ∈ {1, . . . , T}, is proportional to the product of the Gaussian density of y_t (given h_t and ω_t) and the log-normal transition densities of h_t and h_{t+1}. The proposal distribution used to draw h_t is an inverted gamma distribution. As regards the initial condition for ln h_t, it is assumed that ln h_0 = 0.
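A generic sketch of one such independence Metropolis-Hastings step is given below; the target kernel and all parameter values are illustrative stand-ins (not the paper's exact expressions), chosen only to show the accept/reject mechanics with an inverted gamma proposal:

```python
import numpy as np

rng = np.random.default_rng(7)

def log_target(h, a=1.2, mu=0.0, s2=0.25):
    # illustrative unnormalized full-conditional kernel for h_t > 0:
    # a Gaussian likelihood factor times a log-normal (transition) factor
    return -0.5 * np.log(h) - a / (2 * h) - (np.log(h) - mu) ** 2 / (2 * s2)

def log_ig(h, shape, scale):
    # unnormalized log-density of the inverted gamma proposal
    return -(shape + 1) * np.log(h) - scale / h

def independence_mh(n, shape=1.5, scale=1.0):
    h, accepted = 1.0, 0
    out = np.empty(n)
    for i in range(n):
        prop = scale / rng.gamma(shape, 1.0)   # an IG(shape, scale) draw
        log_ratio = (log_target(prop) - log_target(h)
                     - log_ig(prop, shape, scale) + log_ig(h, shape, scale))
        if np.log(rng.uniform()) < log_ratio:
            h, accepted = prop, accepted + 1
        out[i] = h
    return out, accepted / n

draws, acc = independence_mh(5000)
print(acc, draws.mean())
```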

The Conditional Posterior of ω t s
The conditional posterior density of ω_t, t ∈ {1, . . . , T}, is given by p(ω_t|y, h, ω_1, . . . , ω_{t−1}, ω_{t+1}, . . . , ω_T, β, γ, φ, σ², ν, ρ, p, c, d, y^(0)), which can be written as a mixture of four different distributions (38). The weights w̃_{ω,i}, i = 1, 2, 3, 4, of the mixture are proportional to the corresponding components' normalizing constants. It is easy to build an algorithm to generate from the mixture. First, we sample from the uniform distribution to randomly select the component distribution, and then we generate a pseudo-random value from it. The first term of the mixture is given by the density of a Beta(ρ, 1) distribution truncated to the interval (0, c), which can be sampled using the inverse cdf technique. The second density, for ρ > 1, is a truncated beta density with parameters ρ − 1 and 1. However, for 0 < ρ < 1, we obtain a non-standard distribution, for which an independence Metropolis-Hastings algorithm can be applied with the proposal distribution being the Beta distribution with parameters ρ + 1 and 1. The third term of the mixture is the density of an inverse gamma distribution for ω_t², truncated to the interval (1, d²). Thus, we can sample ω_t² from the truncated inverted gamma distribution with shape parameter (ν + 1)/2 and scale parameter a_{y,t}/2, and then calculate ω_t. We simulate from the truncated gamma distribution using the algorithm proposed by [33]. The last term of the mixture is the density of a Pareto(1, ν) distribution truncated to the interval (d, +∞), which can be easily sampled from via the inverse cdf technique. We initialize the sampler at the starting values ω_t = 1.1 for t = 1, . . . , T.
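The two inverse-cdf draws (for the components supported on (0, c) and on (d, +∞)) can be sketched as follows (illustrative parameter values; the component supports follow from the censoring regions of ω_t):

```python
import numpy as np

rng = np.random.default_rng(8)
rho, nu, c, d = 1.5, 3.0, 0.1, 5.0
n = 200_000

u = rng.uniform(size=n)
# component on (0, c): Beta(rho, 1) truncated to (0, c), cdf F(x) = (x/c)^rho
comp_low = c * u ** (1 / rho)
# component on (d, inf): Pareto(1, nu) truncated to (d, inf), i.e. Pareto(d, nu)
comp_high = d * u ** (-1 / nu)

print(comp_low.mean(), c * rho / (rho + 1))    # truncated-beta mean
print(comp_high.mean(), d * nu / (nu - 1))     # truncated-Pareto mean (nu > 1)
```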

Empirical Illustrations
In this section, we apply the LLFT-SV model to describe the volatility of three time series of daily stock market prices. We begin with the analysis of a quite particular asset, namely MS Industries AG (MSAG.DE), a Germany-based industrial technology company. The MSAG.DE data set is specific in the sense of featuring many (repeating) zero returns, which is typical for many individual companies' stocks. Sections 4.1 and 4.2 cover the results for the original and perturbed price series, respectively, with a very detailed analysis providing in-depth insights into the LLFT-SV model's performance. Then, in Section 4.3, we shift our attention to two common stock market indices, the S&P 500 and DAX, to examine the LLFT-SV model's validity also for more 'typical' financial data.
The MSAG.DE stock prices were downloaded from http://finance.yahoo.com (accessed on 31 March 2021) and cover the period from 28 December 2016 through 25 February 2021. The series is transformed into logarithmic rates of return, expressed in percentage points and forming a series of 1053 observations. The first two available observations are reserved for the initial condition in the AR(2) structure underlying the mean equation (two lags are chosen in view of a Lindley-type test for the restriction that the autocorrelation parameters for higher lags are equal to zero). Therefore, x_t = (1, y_{t−1}, y_{t−2}), and the final number of modelled observations is T = 1051.
Basic descriptive characteristics of the modelled data are presented in Table 1, with the prices and returns plotted in Figure 8. It can be seen from the graphs that the return rates are centred around zero, featuring some outliers as well. The data distribution is highly non-normal, as confirmed by the sample kurtosis, which exceeds three by far. From Table 1, it can also be noted that the returns' range is fairly wide, with the minimum and maximum at −22.399 and 30.390, respectively. These extreme observations occurred in March 2020 and can be explained by the turbulence in financial markets caused by the COVID-19 pandemic. The data are also characterized by the presence of zero returns, with zero being exactly the median (while the sample mean is close to zero). The relative frequency of zero returns equals about 0.11. The exact positions of the zero returns are indicated in the figures by blue vertical lines. This relatively high concentration of zero returns manifests in the histogram through a prominent peak at the mode (see panel (c) in Figure 8). It is worth noting that the zero returns are not always related to zero volumes; in many cases, the volume is positive, but the price stays unchanged. In view of the above, it could be argued that the data at hand, with some concentration at zero, should be modelled by means of mixed, continuous-discrete distributions. These, however, remain highly unpopular in practice, usually giving way to models based on families of continuous distributions. As pointed out by [27], Bayesian inference on the basis of a sample containing repeated observations (not necessarily zero-valued) is not always possible with the use of a continuous sampling distribution, even under a proper prior distribution. Admittedly, the LLFT-SV model introduced in this paper, being based on a mixture of continuous probability distributions, fails to explicitly model the incidence of zero returns.
However, owing to the properties of the LLFT distribution proposed in this paper, the zero returns, or any repeated observations, do not pose such a problem in the LLFT-SV model, since (under the assumed prior structure) it can be proved that the marginal likelihood (i.e., the marginal data density value) is finite (see Theorem A1 in Appendix A). The empirical study is divided into two subsections. In the first one, we analyze the original series, featuring repeated zero returns. However, the validity of the LLFT-SV model can also be illustrated in the case where repeated observations do not occur per se, but the series still contains values that concentrate tightly around the median. To this end, we perturb the MSAG.DE prices and present the results for such a series in Section 4.2.
For the sake of comparison, we also consider the stochastic volatility model with a conditional Student's t-distribution (t-SV). The t-distribution's density function can be expressed as a scale mixture of normals. That is, the t-distributed (with ν degrees of freedom) random variable ε t in the t-SV model can be written as ε t = z t /√V t , where z t ∼ N(0, 1) is independent of V t ∼ Gamma(ν/2, ν/2). The t-SV model is usually used to better capture the heavy tails of the empirical distributions of returns. However, as exemplified below, in contrast to LLFT-SV, the t-SV model may still not be flexible enough to simultaneously allow for fat tails and a very high peak close to the mean (or median) of the returns.
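The scale-mixture representation of the Student's t-distribution used in the t-SV model can be checked by simulation; a hedged sketch (the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
nu, n = 5.0, 200_000

# SMN construction: eps = z / sqrt(V), with z ~ N(0,1) independent of
# V ~ Gamma(nu/2, rate = nu/2), yields eps ~ t with nu degrees of freedom.
z = rng.standard_normal(n)
V = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)  # scale = 1/rate
eps_smn = z / np.sqrt(V)

# Direct t draws for comparison; both samples share the variance nu / (nu - 2).
eps_t = rng.standard_t(df=nu, size=n)
```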
Such prior distributions reflect our rather limited prior knowledge about the model's parameters. In order to examine the sensitivity of the posterior distributions to the priors, we try different values of the remaining hyperparameters, which we present in the subsections below.
To obtain a pseudo-random sample from the posterior distribution, we use the hybrid sampler presented in the previous section (the Metropolis-Hastings algorithm within the Gibbs procedure), generating 50,000 burn-in and 200,000 posterior draws. The algorithm has been implemented in the authors' own computer code written in GAUSS. Computations were carried out on a personal computer with an Intel Core i7-9850H processor and 16 GB RAM, and took about 5-6 h for a single model. A detailed examination of the MCMC convergence is provided in Appendix B.

Results for the Original MSAG.DE Data
In this subsection, we present the results obtained for the original MSAG.DE data set and assume a priori that values of c very close to 0 are unlikely. To represent such a prior belief about c, it is assumed that c ∼ INK(2; 0.1) truncated to the interval (0, 1). Table 2 reports the posterior medians, 90% confidence intervals, and interquartile ranges. The 90% confidence intervals are calculated using the 5th and 95th percentiles of the posterior samples. The posterior distribution of ν in the LLFT-SV model turns out to be sensitive to the prior (see Table 2). The sensitivity of the posterior distribution of ν with respect to the prior can also be seen in the t-SV model (see the last two columns in Table 2). However, in both types of models considered here, the percentiles of ν indicate that the local slash and Student's t-distributions are more appropriate than the normal distribution for the mean equation innovations. In turn, the posterior distribution of parameter p indicates that the locally leptokurtic and heavy-tailed mixture normal distribution is more appropriate than the local slash distribution. The LLFT distribution makes it possible to explain both the outlying returns and the returns concentrating close to their median. As regards the persistence in volatility (φ) and the variance of the volatility process (σ 2 ), all models give similar, though not identical, results.

Table 2. Posterior characteristics of model parameters for the original MSAG.DE data set. The first row of each entry: posterior median. The second row: 90% posterior confidence interval (in parentheses). The third row: interquartile range.

Although visible, the posterior sensitivity of the latent processes' parameters does not carry over to a similar sensitivity of the posterior means of the processes themselves, especially when it comes to the conditional standard deviations of the returns, σ t (see Table 3). The average of the posterior means of σ t = √h t min{max{ω t , c}, d} is equal to 2.119, with a standard deviation of 1.051.
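The double censoring entering σ t above is a simple clipping of the mixing variable; a minimal sketch (the posterior medians of c and d from this section are used only as example values):

```python
import numpy as np

def llft_sigma(h, omega, c, d):
    """Conditional standard deviation in the LLFT-SV model:
    sigma_t = sqrt(h_t) * min{max{omega_t, c}, d}."""
    return np.sqrt(h) * np.clip(omega, c, d)

# Small omega_t is censored from below by c (the peak is only locally sharp),
# while large omega_t is censored from above by d (the tails are only locally fat).
```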
In the t-SV model, the average of the posterior means of σ t = √h t ω t is equal to 2.173, with a standard deviation of 1.210. The series of the posterior means of σ t obtained in both models are highly correlated, with a correlation coefficient of 0.964. In all Bayesian models considered in this subsection, the corresponding latent processes are highly correlated, indicating very similar dynamics. This can be seen in Figures 15-21, where we graph the posterior means and conditional standard deviations of the h t s (which are comparable across the two models) and of √ω t in t-SV, which corresponds to min{max{ω t , c}, d} in LLFT-SV. Notice that, due to the non-negativity of these quantities, the two-standard-deviation bands presented in the figures are truncated to positive values. Finally, we compare the predictive capabilities of the Bayesian LLFT-SV and t-SV models. For this purpose, the data set is split into two subsets: a training set and an ex post prediction evaluation set. As the training set, we take the first 800 observations, with the resulting posterior viewed as the prior distribution before each consecutive observation is included into the recursive (not rolling) estimation window. The forecast evaluation period encompasses the most recent 251 trading days. We perform one- to ten-step-ahead predictions over the period 2 March 2020 through 25 February 2021, which gives 242 predictive distributions for each of the ten forecast horizons under consideration, and thereby 2420 predictive distributions in total. The predictive distributions are calculated based on the whole data set available at time 800 + t for each t = 1, . . . , 242 (recursive forecasting scheme). The models are re-estimated each time, i.e., upon arrival of each new observation. Each of the predictive densities is based upon 50,000 MCMC posterior draws, preceded by either 250,000 burn-in passes for t = 1 or 10,000 cycles for t = 2, 3, . . .
, 242, with the sampler each time initiated at the final draw of the previous run. The sequence of one-step-ahead predictive distributions covers the period from 2 March 2020 to 12 February 2021, while the sequence of ten-step-ahead predictive distributions covers the period 2 March 2020-25 February 2021. As the basis of the predictive model comparison, we choose the sum of the so-called log predictive likelihoods, which for some horizon h in a model M can be written as:

SLP(h, M) = ∑_{t=T̃}^{T̃+n_h−1} ln p(y^o_{t+h} | y^{t,o}_1, y^(0), M),    (39)

where p(y^o_{t+h} | y^{t,o}_1, y^(0), M) is the predictive density of y_{t+h} at the observed value y^o_{t+h}, y^{t,o}_1 = (y^o_1, . . . , y^o_t) denotes the observations up to time t, n_h is the number of forecasts, and, finally, T̃ = T − n_h. Note that for h = 1, the difference SLP(1, M_1) − SLP(1, M_2) amounts to the cumulative log predictive Bayes factor in favour of model M_1 against M_2, which informs us how the posterior odds of M_1 versus M_2 (based on the observations up to time T̃) change upon observing the predicted data, y^{T̃+n_h}_{T̃+1} = (y_{T̃+1}, . . . , y_{T̃+n_h}). The use of predictive likelihoods (the predictive density values at the realized "future" observations) is motivated and described in, e.g., [35]. Note that, in this study, we do not consider the models' in-sample "fit", not only due to our intent to avoid the burden of efficiently approximating marginal data density values (and the issue of sensitivity with respect to the priors), but mostly because it would be hardly related to the present context of forecasting.
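Once the predictive density values are available, the SLP criterion reduces to a simple accumulation; a sketch using decimal logs, as in the cumulative Bayes-factor discussion in this section (the function name is ours):

```python
import numpy as np

def slp10(pred_density_values):
    """Sum of decimal-log predictive likelihoods, SLP(h, M), for one horizon."""
    return float(np.sum(np.log10(pred_density_values)))

# For h = 1, SLP(1, M1) - SLP(1, M2) is the cumulative (decimal) log
# predictive Bayes factor in favour of M1 against M2.
```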
The predictive density values (predictive likelihoods) in Equation (39) need to be calculated numerically using draws from the posterior distribution of the parameters and latent variables:

p(y^o_{t+h} | y^{t,o}_1, y^(0), M) ≈ (1/N) ∑_{j=1}^{N} p(y^o_{t+h} | y^{t,o}_1, y^(0), θ^(j), h^(j)_{t+h−1}, M),

where (θ^(j), h^(j)_{t+h−1}) are draws from the posterior (j = 1, 2, . . . , N). Table 4 presents the results of the predictive power comparison of the two alternative models, LLFT-SV and t-SV, obtained for the original (i.e., not perturbed) data set, assuming ν ∼ G(8; 0.8) and fixing parameters c and d at the medians of their marginal posteriors obtained under two different priors for c (we provide further details below). Thus, two cases are considered: in the first, c = 0.118 and d = 7.553, while in the second, c = 0.014 and d = 2.808. Note that we decided to fix the values of these two parameters here (rather than sample them from their posterior) in order to spare much of the otherwise largely time-consuming calculations. Even so, the entire forecasting exercise for a single LLFT-SV model, with fixed values of c and d, takes roughly two days of calculations (on a personal computer with an Intel Core i7-9850H processor and 16 GB RAM). Allowing for posterior sampling of these two parameters as well would extend this time to about eight days (for a single model). The results in Table 4 show clear evidence that the LLFT-SV models (particularly the one with c = 0.014) outperform t-SV for all forecast horizons. Due to the apparently small differences in the log predictive likelihoods of the t-SV and LLFT-SV models with c = 0.118 and d = 7.553, one could get the impression that both are quite comparable in terms of their predictive power.
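A minimal sketch of this Monte Carlo approximation, assuming (for illustration only) that the conditional density of the predicted observation given each posterior draw is Gaussian with mean mu^(j) and standard deviation sigma^(j):

```python
import numpy as np

def mc_predictive_density(y_obs, mu_draws, sigma_draws):
    """Average the conditional density of y over N posterior draws:
    p_hat(y) = (1/N) * sum_j N(y | mu^(j), (sigma^(j))^2)."""
    mu = np.asarray(mu_draws, dtype=float)
    sd = np.asarray(sigma_draws, dtype=float)
    z = (y_obs - mu) / sd
    dens = np.exp(-0.5 * z ** 2) / (sd * np.sqrt(2.0 * np.pi))
    return float(dens.mean())
```

Averaging the conditional densities (rather than a kernel estimate over simulated draws of y) keeps the estimator smooth in y and reduces its Monte Carlo variance.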
However, the cumulative decimal logarithm of the predictive Bayes factor in favour of the LLFT-SV model with c = 0.118 versus t-SV, defined as the difference of the corresponding sums of the log predictive likelihoods for h = 1 (see the column for h = 1 in Table 4), equals about 2.175, which indicates that the posterior odds ratio based on the first 800 observations and calculated in favour of the LLFT-SV model increased about 150-fold upon observing the predicted data. The LLFT-SV model with c = 0.014 provides a further, even more considerable improvement, since the cumulative logarithm of the predictive Bayes factor in favour of this model against the one with c = 0.118 equals about 22. Thus, the posterior odds of these two models increase by as much as about 22 orders of magnitude upon observing the data over the forecast period (i.e., 2 March 2020 through 25 February 2021). Note that the low value of c = 0.014 allows σ t to reach very small values (close to zero), thus enabling the distribution of the conditional mean equation's innovations to feature a noticeable peakedness. Therefore, it may be concluded that enabling a sharp peak at the mode of the conditional distribution and modelling returns hovering near zero seem more important to the predictive performance than fat tails (at least for the data at hand).
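The conversion from a cumulative decimal-log Bayes factor to the quoted change in posterior odds is just exponentiation:

```python
# log10(BF) = 2.175 implies the posterior odds in favour of LLFT-SV
# (c = 0.118) versus t-SV grew roughly 150-fold over the forecast period;
# log10(BF) = 22 likewise corresponds to 22 orders of magnitude.
odds_growth = 10.0 ** 2.175
```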

Results for the Perturbed MSAG.DE Data
Now, we illustrate the validity of the LLFT-SV model in the case where repeated observations do not occur per se, but the series still contains values that concentrate tightly around (but not at) the mode. For this purpose, we use slightly perturbed observations, applying two kinds of perturbation to the price series, both in the spirit of [27]. In the first case, the data are perturbed by adding a uniformly distributed random number from the interval (−5 × 10⁻⁵, 5 × 10⁻⁵), while in the second, we widen the interval to (−5 × 10⁻⁴, 5 × 10⁻⁴). As a consequence of the price perturbation, the perturbed returns become different from each other and different from zero. The perturbed returns corresponding to the zero returns in the original series fall within the intervals (−0.0072, 0.0049) and (−0.056, 0.058) in the first and second case, respectively. Basic descriptive characteristics (not presented in the paper for the sake of brevity) of the perturbed daily returns are very similar to those of the original data, presented in Table 1. Moreover, the histograms of the original and both perturbed data sets practically coincide (thus, we do not present them, either). Naturally, the medians are now different from zero, equaling −0.0005 in the first case and 0.0034 in the second. In the first case, the concentration of the returns around zero is higher than in the second, with about 11% of the returns lying within the interval (−0.0072, 0.0049), compared with only 3.2% in the second case (in the same interval).
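The perturbation scheme can be sketched as follows (a hypothetical reimplementation; the paper applies uniform noise on (−5 × 10⁻⁵, 5 × 10⁻⁵) or (−5 × 10⁻⁴, 5 × 10⁻⁴) to the prices):

```python
import numpy as np

def perturb_prices(prices, eps, seed=0):
    """Add i.i.d. Uniform(-eps, eps) noise to each price."""
    rng = np.random.default_rng(seed)
    p = np.asarray(prices, dtype=float)
    return p + rng.uniform(-eps, eps, size=p.shape)

# After perturbation, consecutive equal prices (i.e., zero returns) become
# distinct with probability one, while the return histogram is practically unchanged.
```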
It can be seen from Table 5 that upon specifying INK(2; 0.1) as the prior for c in the LLFT-SV model, thereby favouring a priori values of the parameter relatively distant from zero, the results of Bayesian inference about the parameters and latent variables turn out to be quite robust to the considered perturbations. It seems that under such a prior, the values of c are too high, dominating (i.e., censoring) ω t , to "adequately" explain the presence of numerous observations very close to the mode. Thereby, the returns that are close, but not equal, to zero appear to still be treated here in the same way as the zero returns in the model for the original series.
Turning back to Table 5, we notice that very similar results are obtained for the t-SV model. It seems that although the model "adequately" captures the heavy tails of the empirical distribution of the returns, it fails to explain the observations that are compressed around the mode. In both types of models, the inference about latent processes, h t and σ t , remains unchanged (again, relevant graphs are not presented in the paper, for the sake of brevity).
Since both fat tails and a high concentration of financial returns around the mode need to be accounted for simultaneously, we also consider LLFT-SV models that allow c to be closer to zero (as compared with the prior assumptions of the previous analysis). Recall that smaller values of c lead to a more pointed distribution of the conditional mean equation's innovations. To investigate this, we assume that c follows a priori the INK(0.4; 0.009) distribution, so that its mode is now equal to about 0.1, instead of the 0.2 considered in the previous analysis. Simultaneously, we impose the restriction ν ≥ 1, which is necessary from a numerical point of view. This restriction is not needed when c is bounded away from zero (in which case the posterior probability of ν ≤ 1 is equal to zero). When the prior distribution of c admits relatively more mass close to zero, only the larger (second) perturbation turns out to change the location and dispersion of the posterior distributions of ν and ρ, which, in consequence, also affects the inference about all the remaining parameters (apparently with the exception of the volatility persistence, φ); see Table 6. For the original data and the first perturbation, the two parameters, ν and ρ, have their posterior probability mass at much lower values than in the case of c ∼ INK(2; 0.1). Moreover, the posterior marginals of these two parameters exhibit a very high concentration around their modes. Additionally, d gathers almost all of its posterior mass near 1 (see Figure 22), indicating that the presence of the latent process h t in the LLFT-SV structure is enough to describe the heavy tails of the empirical distribution of the modelled returns. Thus, in this sense, the LLFT-SV model is able to detect "inliers", defined here as observations that lie in the central part of the empirical distribution and whose occurrence is "atypical" in the sense of their repeatedness or their "atypically" high concentration around the mode.
Note that the inference about the mixing variables, ω t s, depends on the size of the perturbation. It seems that the closer an observation (y t ) is to zero (or to the mode), the smaller the value of the corresponding min{max{ω t , c}, d} in the LLFT-SV model with c ∼ INK(0.4; 0.009). Thanks to this prior assumption, min{max{ω t , c}, d} can take on values close to zero and, in consequence, atypical "inlying" observations can be better explained. This result is in accordance with the predictive power comparison of the models presented in the previous subsection: allowing c to take on low values considerably improves the predictive performance of the LLFT-SV model, due to its far more appropriate handling of the very centre of the data distribution.

Results for Stock Market Indices
In this section, we present only a brief analysis of the LLFT-SV model estimation results for two common stock market indices: the Standard and Poor's 500 (S&P 500) and the Deutscher Aktienindex (DAX), both representing "typical" financial series, with their dynamics reflecting movements of the entire market rather than of a single company's stock prices (possibly of limited liquidity). Both data sets were downloaded from http://stooq.pl (an open-access web page) and cover the same period as the one considered previously for MSAG.DE: 28 December 2016 through 25 February 2021. As seen in Figure 26, the logarithmic rates of return reveal strong leptokurticity and fat tails, with the latter built up particularly by the market volatility outburst triggered by the COVID-19 pandemic. However, contrary to the MSAG.DE data set, the series currently at hand feature neither zero returns nor any repeating values, therefore representing quite distinct characteristics.
Similarly to the analysis for MSAG.DE, the first two available observations are spared for the initial condition in the AR(2) structure, yielding a final number of modelled observations of T = 1047 for DAX and T = 1044 for the S&P 500.
Posterior characteristics of the key parameters of the LLFT-SV model (ν, ρ, p) indicate the validity of modelling the error term through the LLFT distribution, pointing to a non-negligible contribution of both mixture components: beta and Pareto (in the SMN representation); see Table 7. As implied by the posterior expectations of p, for DAX, both components are almost of the same weight, while in the case of S&P 500, the beta component constitutes roughly 30% of the mixture. Higher posterior medians of c for DAX, as compared to S&P 500, indicate a lower local conditional peakedness, which is suitably accompanied by lower posterior medians of d, the values of which, being close to 1, bring the conditional distribution of ε t (given d) closer to a normal distribution. However, it seems that in the case of S&P 500 the posterior distributions of c and d are visibly affected by the priors (see relevant panels in Figure 27).
In general, we could conclude that incorporating the LLFT distribution into an SV model may be an empirically valid extension of the basic stochastic volatility structure also for "more typical" financial time series, such as market indices, rather than only for some specific company's ("less typical") stock returns.

Discussion
In the paper, we proposed a new scale mixture of normal distributions that features local leptokurticity and local fat tails. This new LLFT distribution was further incorporated into a basic stochastic volatility model (yielding an LLFT-SV specification), so as to enhance the model's capability of capturing the corresponding empirical properties of financial time series. For the new model, we developed a Bayesian framework along with valid MCMC methods of posterior sampling. Empirical results indicate the validity of the LLFT-SV specification for modelling both "non-standard" financial time series with repeating zero returns (resulting in a single pronounced histogram bar), as well as more "typical" data on the S&P 500 and DAX indices. For the former, the LLFT-SV model was also shown to markedly outperform a common, globally heavy-tailed, t-SV alternative in terms of density forecasting (as measured by the sum of predictive likelihoods). Apparently, the noticeable predictive superiority of the LLFT-SV model draws on its more adequate handling of the peak in the central part of the returns' distribution, but not at the expense of a valid modelling of the outliers.
In general, it can be concluded that not only fat tails but also close-to-mode returns are vital to financial data modelling. Thus, distributions more flexible than the common heavy-tailed ones (such as Student's t or slash) may be empirically required, so that both features can be "freely" controlled by separate parameters (rather than a single one). Nonetheless, the LLFT-SV specification proposed in this study leaves some important room for further improvement, with the model's most obvious limitations being the symmetry of the conditional distribution and the preclusion of the leverage effect. The symmetry of the LLFT distribution suggests that it lends itself mainly to modelling either time series with a quite symmetric data distribution, or at least those where an adequate capturing of the tails and peakedness would empirically prove more important than allowing for skewness (although this could not be settled without comparing alternative models). Obviously, extending the LLFT-SV model towards asymmetry could be directly related to its gaining the ability to account for the leverage effect. Moreover, skewing the LLFT distribution itself seems a valid and important extension. We leave such generalizations of our current framework for future work. Finally, and on a different note, an evaluation of the LLFT-SV model in terms of risk modelling and management would also be most desirable.
Author Contributions: All authors have contributed equally to the paper. This includes conceptualization, methodology, formal analysis, investigation, data curation, software, visualization, and writing. In addition, all authors have read and agreed to the published version of the manuscript.

Theorem A1. For the stochastic volatility model presented in Section 3, and under the additional assumption that the inverse gamma prior distribution of σ 2 is truncated to the interval (0, κ 2 σ ), for any fixed κ 2 σ < ∞, we have that p(y) < ∞.

Proof of Theorem A1. For the term f 1 , we have the following inequality: f 1 ≤ e . Using this inequality, we get that there exist C 1 , C 2 < ∞ such that

The term f 2 is the kernel of a multivariate Gaussian distribution for h̃, which can be written as follows: where Σ h = [a ij ] T×T is a tridiagonal matrix such that |Σ h | 1/2 = σ −T , with the elements on the main diagonal a ii = (φ 2 + 1)/σ 2 , for i = 1, 2, . . . , T − 1, a TT = 1/σ 2 , and a ij = −φ/σ 2 for |i − j| = 1, and r is the mean vector. The term g(φ) is a polynomial of order T − 2, with positive values for φ ∈ (−1, 1) and with known coefficients that depend only on the sample size, T. Hence, there exists C 3 < ∞, which depends only on T, such that g(φ) ≤ C 3 .
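The determinant claim |Σ h | 1/2 = σ −T can be verified numerically for the tridiagonal structure described above; a sketch (T, φ, and σ 2 are arbitrary test values, and the function name is ours):

```python
import numpy as np

def sigma_h_precision(T, phi, sigma2):
    """Tridiagonal Sigma_h from the proof: a_ii = (phi^2 + 1)/sigma^2 for
    i = 1, ..., T-1, a_TT = 1/sigma^2, and a_ij = -phi/sigma^2 for |i - j| = 1."""
    A = np.diag(np.full(T, (phi ** 2 + 1.0) / sigma2))
    A[T - 1, T - 1] = 1.0 / sigma2
    idx = np.arange(T - 1)
    A[idx, idx + 1] = -phi / sigma2
    A[idx + 1, idx] = -phi / sigma2
    return A

# |Sigma_h| should equal sigma^(-2T), regardless of phi in (-1, 1).
det = np.linalg.det(sigma_h_precision(6, 0.95, 0.4))
```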