A Bayesian Approach to Heavy-Tailed Finite Mixture Autoregressive Models

In this paper, a Bayesian analysis of finite mixture autoregressive (MAR) models based on the assumption of scale mixtures of skew-normal (SMSN) innovations (called SMSN–MAR) is considered. This model is not simultaneously sensitive to outliers, as the celebrated SMSN distributions, because the proposed MAR model covers the lightly/heavily-tailed symmetric and asymmetric innovations. This model allows us to have robust inferences on some non-linear time series with skewness and heavy tails. Classical inferences about the mixture models have some problematic issues that can be solved using Bayesian approaches. The stochastic representation of the SMSN family allows us to develop a Bayesian analysis considering the informative prior distributions in the proposed model. Some simulations and real data are also presented to illustrate the usefulness of the proposed models.


Introduction
Data analysts apply computational models to describe and infer statements about complex datasets. Mixture models are a valuable class of these models. Finite mixture models are of great importance in statistical inferences. The multimodality, skewness, kurtosis, and unobserved heterogeneity are usually observed in many datasets, for example time series datasets. Importance of mixture distributions, which are the main tools for statistical mixture models, has been noted by many references [1][2][3][4][5][6][7][8]. Various statistical fields containing time series modeling and regression analysis frequently use the mixture modeling. In fact, in analyzing time series data, some events may affect and change the behavior of data over time, for example, finance crises in many financial or panel time series datasets. The finite mixture autoregressive (MAR) models were suggested by Wong and Li [7], to catch the multimodal phenomena, are flexible and applicable in many fields, such as electroencephalogram modeling in medicine [8], interest rates and bond pricing [9], Forex rate [10], and other various fields such as telecommunications, hydrology, biology, sociology, and medical sciences.
Most researchers considered the MAR models based on Gaussian distribution, which are called Gaussian mixture of autoregressive (GMAR) models [7][8][9][10][11]. The GMAR models are sensitive to the existence of outliers, heavy tailed distribution and asymmetry in datasets, and thus some authors considered MAR models based on non-Gaussian, heavy tailed and/or asymmetric distributions. Wong and Li [12] introduced logistic mixture autoregressive model. Wong and Chan [13] also introduced a student t-mixture autoregressive model and discussed its inferences and then applied this model to the heavy tailed financial data.
The SMSN family, introduced by Branco [1], is a rich family containing famous symmetric scale mixtures of normal (SMN; [14]) distributions and asymmetric distributions such as the skew-t, skew-slash and skew-contaminated-normal distribution. The MAR model based on the SMSN innovations, hereafter called SMSN-MAR models, has reasonable performance to model real time series data with outliers and skewness. The various MAR model members can determine weights to each observation, and therefore each single observation affects the estimation of the model's parameters, and it leads to robust inferences. Bayesian inferences of the MAR models have some advantages compared to classical methods, that some of them are discussed below.
For maximum likelihood computations and classical inference of a MAR model, although the numerical optimization algorithms such as the EM-algorithm can be applied, converging to the major mode may fail, while this is not an issue in Bayesian inferences. Furthermore, in the MAR models there exist many situations with indirect information about the parameters or large changes in the results by small changes in the data. When we consider a sample from a MAR model, the sample includes no information about the parameters of one or more components with a positive probability. Therefore, the likelihood function can be unbounded and consequently the resultant likelihood estimates are only local maximum. Using suitable priors and some Bayesian techniques can solve such problems. Another problematic issue on the classical likelihood-based inference is that, in MAR model, the component parameters are not identifiable marginally, so distinguishing any component from other components in the likelihood is hard. The identifiability is not an important problem in Bayesian inferences. Some of the other advantages of the Bayesian inferences of the mixture models are given in [15,16].
In this paper, we consider a Bayesian technique to find estimates of the SMSN-MAR models by the Gibb sampling scheme and application of the Markov Chain Monte Carlo (MCMC) methods. The main characteristics of the SMSN distributions are reviewed in Section 2. In Section 3, we consider the MAR models and apply the Bayesian methodology for estimating the proposed MAR model's parameters. To evaluate the Bayesian estimates of parameters and performance of the proposed model, numerical studies and a real practical example are reported in Section 4. The conclusion is provided in the last section.

The SMSN Distributions
A random variable Y is belonged to the scale mixtures of skew normal (SMSN) family, if it has the following representation: where the random variable Z follows a skew-normal distribution in [17], where σ 2 and λ are respectively scale and skewness parameters, the function k(u) is positive respect to u, U is the scale random variable with density function or pmf h(·; ν) and distribution function H(·; ν), and the vector or scalar ν indexes the distribution of U. The random variable Y is denoted by Y ∼ SMSN µ, σ 2 , λ, ν , and in this work, because of its suitable mathematical properties, we let k(u) = 1/u, so Y|U = u ∼ SN µ, u −1 σ 2 , λ . Therefore, the density of Y is given by where θ = µ, σ 2 , λ, ν , and φ(·) and Φ(·) are density and distribution functions of the standard normal distribution, respectively.
and W 0 and W 1 are independent standard normal random variables.
The famous members of the SMSN family are the scale mixtures of normal (SMN) family [8,13], as the symmetric members of the SMSN family, and skew-normal (SN), skew-t (ST), skew-slash (SSL) and skew-contaminated-normal (SCN) distributions. More details are given in [18,19].

The SMSN-MAR Model
In this paper, we study the MAR model, which has been studied by Wong and Li [7], and Wong and Chan [20], due to its flexibility in use for non-linear time series analysis. This model is represented by or equivalently, j=1 ϕ 1,j X t−j + ε 1,t ; with probability π 1 ϕ 2,0 + p 2 j=1 ϕ 2,j X t−j + ε 2,t ; with probability π 2 , t = 0, ±1, ±2, . . . , . . . ϕ g,0 + p g j=1 ϕ g, j X t−j + ε g,t ; with probability π g where • g is a known positive integer which indicates the number of components in the model; • each component occurs with probabilities π i > 0, i = 1, . . . , g, g i=1 π i = 1, which obeys a discrete distribution π; • for each i = 1, . . . , g, the ith autoregressive component is of order p i ≥ 1; ϕ i,j , j = 0, . . . , p i , are the autoregressive coefficients of the ith components; each ith innovation's component ε i,t distributed as following: Hereafter, this model will be called SMSN-MAR(p,g). Note that in each component, . . , g. Note that we can assume that the vector of coefficients in MAR models, distributed as π, therefore, this model can be also interpreted as random coefficients autoregressive (RCA) model.
Without loss of generality, it is convenient to set p = max i=1,...,g p i and ϕ i,j = . . . = ϕ i,p = 0 for j > p i . Also, let {Z t } be an i.i.d. sequence of random variables which are distributed as π (Multinomial), such that P(Z t = i) = π i , i = 1, . . . , g, that determines the components, and be independent of innovations and random history of model X = (X 1 , . . . , X n ) . Therefore, X t obeys from ith component of MAR model, if Z t = i. Thus, Equation (3) can be written as where It is convenient to set vector and matrix notations of sample X = (X 1 , . . . , X n ) in the form of . , ε Z t ,n ) and A be the n × p matrix whose tth row is X t−1 , in the rest of the paper. Thus, the following matrix representation Note that in Equation (7), . , g and distributed as π, so by Equation (5) we have Therefore, the density of N t is given by . . , g be the vector of parameters. By using Markovian property of this model, we have that L(θ|X) = f X (X 1 , . . . , X n ; θ) = n t=1 f X t |X t−p ,..., X t−1 (X t ; θ), where f X (.) be the density of X. An auxiliary determiner of components in MAR models (6), can be expressed in the random vector Z t = Z t1 , . . . , Z tg , where Therefore, under the above approach, the latent random vector Z t , t = 1, . . . , n has the following multinomial Z t ∼ Multinomial 1, π 1 , . . . , π g with the following probability mass function: Also, to provide a MCMC method, we have: is the missing parts of data, and θ is the vector of unknown parameters. The conditional likelihood function based on the complete data is provided by

Bayesian Approach
In this part, we use the Bayesian approach by applying MCMC algorithms for the model in Equation (3). We consider informative priors under squared error loss function (SEL). To consider the prior distributions for the parameters of the considered model, we assign prior distributions to the parameters as follow: for i = 1, . . . , g, where Dir and IG denote the Dirichlet distribution and the Inverse-Gamma distribution, respectively. The prior distribution assigned to ν i , i = 1, . . . g, varies with the particular SMSN distributions. For skew-t model, we can consider ν i ∼ exp(ς i /2)I (2,∞) , exponential distribution with mean 2/ς i before truncation, for skew-slash model, we can consider ν i ∼ Gamma(a i , b i ) with small positive values a i and b i such that b i a i , and for skew-contaminated-normal model, the independent and non-informative U(0, 1) prior distributions are considered for each component of ν i = (ν i , γ i ) . We also assume that priors are independent and all hyper-parameters (parameters of priors) are known, which guarantees that all posteriors are proper. The joint posterior of θ and missing variables is in the form of π( θ| D) ∝ L(θ |D )π(θ), which is not analytically tractable, but the MCMC methods such as the Gibbs sampling, will be applied to draw samples from the marginal of conditional posteriors which are shown as follows, when θ (−m) , means the mth element of θ has been removed (Algorithm 1).

1.
Z t |θ, U, W; X ∼ Multinomial 1, p 1 , . . . , p g , Finally, in order to complete the specifications of the sampling via MCMC methods, the posterior distribution of the latent variable U ti , t = 1, . . . , n, i = 1, . . . , g, H post (u; θ, ν), and its indexed parameter ν i , f post (ν; θ), should be determined. Set c ti = X t − φ i X t−1 − ∆ i W ti 2 /σ 2 i + (W ti − b i ) 2 , we then have, for t = -1, . . . , n and the several models given as: and SSL-MAR: SCN-MAR: where Note that Equations for v i and γ i do not have closed forms, but a Metropolis-Hasting algorithm which has used in [21] can be employed to obtain draws of them.
In the Bayesian technique, some model selection criteria such as the expected Bayesian information criterion (EBIC) and the expected Akaike information criterion (EAIC) [22,23] can be considered. Let θ 1 , . . . , θ m as a sample from π( θ| D). The posterior mean of the deviance is estimated by D = m i=1 D(θ i )/m, such that D(θ) = −2 n t=1 log f (X t |X t−1 , θ), and f (·|X t−1 , θ) is the density given in Equation (8). These criteria can be respectively estimated byÊAIC = D + 2k andÊBIC = D + k log n, where k is the number of parameters, n is the sample size andD = D θ , where θ = m i=1 θ i /m.

Numerical Studies
In this section, numerical studies and a real data example are given to study the ability of the considered SMSN-MAR(p,g) model. The statistical R software version 3.6.1 is used. A sample size of n = 200, and the following prior distributions are selected: φ i ∼ N p+1 0, 10 3 I p , ∆ i ∼ N 1 0, 10 3 and γ 2 i ∼ IG(0.001, 0.001), with ν i ∼ exp(0.1)I (2,∞) for the skew-t model, ν i ∼ gamma(1, 0.01) for the skew-slash model, and ν i ∼ U(0, 1) and independent of γ i ∼ U(0, 1) for the skew-contaminated normal model. Also, a Gibbs sampling with iterations equal to 60,000 and a burn-in of 10,000 cycles is applied.
In the simulation parts, we considered two schemes. In the first scheme, simulations are based on the SMSN-MAR models to show the performances of the considered model and Bayesian estimates, and in the second scheme, simulations are mixture autoregressive model based on Generalized-Hyperbolic distribution (GH-MAR model). The GH distribution is well known because of it's asymmetry and heavy-tailed nature [24,25]. The second scheme will show the performance of the proposed Bayesian estimates of the heavy-tailed SMSN-MAR models to the time series with heavy-tailed and asymmetric innovations. Finally, the SMSN-MAR model is applied on the closing price of the Iran Telecommunication Company stock via the proposed Bayesian methodology.

Real Data
In this section, we consider the 446 daily observations of the closing price of Iran Telecommunication Company (I.T.C.) stock based on the Iranian Rial from 2011-07-02 up to 2013-07-02 (see data at Tehran securities exchange technology management company site: www.tsetmc.com). Original time series plot of the I.T.C. stock data { }, where is the closing price at time , are given in Figure 2. It can be observed that the time series { } is not stationary. A histogram of the data is plotted in Figure 3.

Real Data
In this section, we consider the 446 daily observations of the closing price of Iran Telecommunication Company (I.T.C.) stock based on the Iranian Rial from 2011-07-02 up to 2013-07-02 (see data at Tehran securities exchange technology management company site: www.tsetmc.com). Original time series plot of the I.T.C. stock data {x t }, where x t is the closing price at time t, are given in Figure 2. It can be observed that the time series {x t } is not stationary. A histogram of the data is plotted in Figure 3.

Real Data
In   Using the Box-Cox transformation with = 0, and the difference with order of one, we have where * = log .
The time series { } has been plotted in in Figure 4. The Dickey-Fuller test verifies that the transformed time series data are stationary (p-value = 0.01). Using the Box-Cox transformation with λ = 0, and the difference with order of one, we have where The time series y t has been plotted in in Figure 4. The Dickey-Fuller test verifies that the transformed time series data are stationary (p-value = 0.01).  Using the Box-Cox transformation with = 0, and the difference with order of one, we have where * = log .
The time series { } has been plotted in in Figure 4. The Dickey-Fuller test verifies that the transformed time series data are stationary (p-value = 0.01).  The EAIC and EBIC criteria demonstrate that the best SMSN-MAR model has two components with order p 1 = 3 for the first AR component and p 2 = 1 for the second AR component. Using the proposed Bayesian approach, we fitted the N-MAR (GMAR), SN-MAR, ST-MAR, SSL-MAR and SCN-MAR models with g = 1, 2, 3 components. The Bayesian model selection criteria for them are given in Table 2. The Bayesian estimates (posterior mean) of the ST-MAR (3,2) (the best fitted) model parameters for the return series of the I.T.C. stock data are given in Table 3.

Conclusions
In this paper, we have considered the robust MAR models based on the scale mixtures of skew normal distributions, called SMSN-MAR, and a Bayesian approach to estimate the models' parameters. These models are used for modeling non-linear time series data, which involve GMAR models, and offer greater flexibility than normal distribution. Numerical studies verified the good performance of the considered models and suitability of Bayesian estimates. A real data example demonstrated that SMSN-MAR models can be useful tools for non-linear and non-stationary time series modeling. In future studies, the researchers can consider cyclostationary or almost cyclostationary processes [26][27][28][29][30][31][32][33] with SMSN errors or mixture models based on these processes.