1. Introduction
The autoregressive (AR) model is one of the simplest and most popular models in the time series context. The AR(
p) time series process
${y}_{t}$ is expressed as a linear combination of
p finite lagged observations in the process with a random innovation structure for
$t=\{1,2,\dots \}$ and is given by:
$${y}_{t}={\phi}_{0}+{\phi}_{1}{y}_{t-1}+{\phi}_{2}{y}_{t-2}+\cdots +{\phi}_{p}{y}_{t-p}+{a}_{t},$$
where
${\phi}_{1},{\phi}_{2},\dots ,{\phi}_{p}$ are known as the
p AR parameters. The process mean (i.e., mean of
${y}_{t}$) for the AR(
p) process in (
1) is given by
${\mu}^{*}={\phi}_{0}{(1-{\phi}_{1}-{\phi}_{2}-\cdots -{\phi}_{p})}^{-1}$. Furthermore, if all roots of the characteristic equation:
$$1-{\phi}_{1}x-{\phi}_{2}{x}^{2}-\cdots -{\phi}_{p}{x}^{p}=0$$
are greater than one in absolute value, then the process is described as stationary (as is assumed throughout this paper). The innovation process
${a}_{t}$ in (
1) represents white noise with mean zero (since the process mean is already built into the AR(
p) process) and a constant variance, which can be seen as independent “shocks” randomly selected from a particular distribution. In general, it is assumed that
${a}_{t}$ follows the normal distribution, in which case the time series process
${y}_{t}$ will be a Gaussian process [
1]. This assumption of normality is generally made because natural phenomena often appear to be normally distributed (examples include ages and weights), and it is appealing for its symmetry, infinite support, and computational convenience. However, this assumption is often violated in real-life statistical analyses, which may lead to serious consequences such as biased estimates or inflated variances. Examples of time series data exhibiting asymmetry include (but are not limited to) financial indices and returns, measurement errors, sound frequency measurements, tourist arrivals, production in the mining sector, and sulphate measurements in water.
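The stationarity condition above can be verified numerically by locating the roots of the characteristic polynomial. A minimal sketch in Python (assuming numpy; `is_stationary` is an illustrative helper, not from the paper):

```python
import numpy as np

def is_stationary(phi):
    """Check the AR(p) stationarity condition: every root of the
    characteristic equation 1 - phi_1 x - ... - phi_p x^p = 0
    must exceed one in absolute value."""
    # numpy.roots expects coefficients ordered from the highest power down:
    # -phi_p x^p - ... - phi_1 x + 1
    coeffs = [-c for c in reversed(phi)] + [1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary([0.5]))   # AR(1) with phi_1 = 0.5 -> True
print(is_stationary([1.2]))   # AR(1) with phi_1 = 1.2 -> False
```

For an AR(1) process the condition reduces to the familiar requirement $|{\phi}_{1}|<1$.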
To address these limitations of the normality assumption, many studies have proposed AR models characterized by asymmetric innovation processes and fitted them to real data to illustrate their practicality, particularly in the time series environment. The traditional approach for defining non-normal AR models is to keep the linear model (
1) and let the innovation process
${a}_{t}$ follow a non-normal process instead. Some early studies include the work of Pourahmadi [
2], who considered various non-normal distributions for the innovation process in an AR(1) process, such as the exponential, mixed exponential, gamma, and geometric distributions. Tarami and Pourahmadi [
3] investigated multivariate AR processes with the
t distribution, allowing for the modeling of volatile time series data. Other models abandoning the normality assumption have been proposed in the literature (see [
4] and the references within). Bondon [
5] and, more recently, Sharafi and Nematollahi [
6] and Ghasami et al. [
7] considered AR models defined by the epsilon-skew-normal (
$\mathcal{ESN}$), skew-normal (
$\mathcal{SN}$), and generalized hyperbolic (
$\mathcal{GH}$) innovation processes, respectively. Finally, AR models are not only applied in the time series environment: Tuaç et al. [
8] considered AR models for the error terms in the regression context, allowing for asymmetry in the innovation structures.
This paper considers the innovation process
${a}_{t}$ to be characterized by the skew generalized normal (
$\mathcal{SGN}$) distribution (introduced in Bekker et al. [
9]). The main advantages gained from the
$\mathcal{SGN}$ distribution include the flexibility in modeling asymmetric characteristics (skewness and kurtosis, in particular) and the infinite real support, which is of particular importance in modeling error structures. In addition, the
$\mathcal{SGN}$ distribution adapts better to skewed and heavytailed datasets than the normal and
$\mathcal{SN}$ counterparts, which is of particular value in the modeling of innovations for AR processes [
7].
The focus is firstly on the
$\mathcal{SGN}$ distribution assumption for the innovation process
${a}_{t}$. Following the skewing methodology suggested by Azzalini [
10], the
$\mathcal{SGN}$ distribution is defined as follows [
9]:
Definition 1. A random variable X is characterized by the $\mathcal{SGN}$ distribution with location, scale, shape, and skewing parameters $\mu ,\alpha ,\beta $, and λ, respectively, if it has probability density function (PDF): $f\left(x\right)=\frac{2}{\alpha}\varphi \left(z\right)\Phi \left(\sqrt{2}\lambda z\right)$, where $z=(x-\mu )/\alpha \in \mathbb{R}$, $\mu ,\lambda \in \mathbb{R}$, and $\alpha ,\beta >0$. This is denoted by $X\sim \mathcal{SGN}(\mu ,\alpha ,\beta ,\lambda )$. Referring to Definition 1,
$\Phi (\cdot )$ denotes the cumulative distribution function (CDF) for the standard normal distribution, with
$2\Phi \left(\sqrt{2}\lambda z\right)$ operating as a skewing mechanism [
10]. The symmetric base PDF to be skewed is given by
$\varphi $, denoting the PDF of the generalized normal (
$\mathcal{GN}$) distribution given by:
$$\varphi \left(z\right)=\frac{\beta}{2\Gamma \left(1/\beta \right)}\mathrm{exp}\left(-{\left|z\right|}^{\beta}\right),$$
where
$\Gamma (\cdot )$ denotes the gamma function [
11]. The standard case for the
$\mathcal{SGN}$ distribution with
$\mu =0$ and
$\alpha =1$ in Definition 1 is denoted as
$X\sim \mathcal{SGN}(\beta ,\lambda )$. Furthermore, the
$\mathcal{SGN}$ distribution results in the standard
$\mathcal{SN}$ distribution in the case of
$\mu =0$,
$\alpha =\sqrt{2}$ and
$\beta =2$, denoted as
$X\sim \mathcal{SN}\left(\lambda \right)$ [
11]. In addition, the distribution of
X collapses to that of the standard normal distribution if
$\lambda =0$ [
10].
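To make the definition concrete, the density of Definition 1 can be sketched in Python, assuming the $\mathcal{GN}$ base PDF $\varphi (z)=\beta \,\mathrm{exp}(-{|z|}^{\beta})/(2\Gamma (1/\beta ))$ and scipy for $\Phi (\cdot )$; `sgn_pdf` is an illustrative helper name, not from the paper:

```python
import numpy as np
from math import gamma, sqrt
from scipy.stats import norm

def sgn_pdf(x, mu=0.0, alpha=1.0, beta=2.0, lam=0.0):
    """Sketch of the SGN(mu, alpha, beta, lam) density: a generalized
    normal base PDF skewed by the mechanism 2*Phi(sqrt(2)*lam*z)."""
    z = (x - mu) / alpha
    # Generalized normal base: beta / (2*Gamma(1/beta)) * exp(-|z|^beta)
    base = beta / (2.0 * gamma(1.0 / beta)) * np.exp(-np.abs(z) ** beta)
    return 2.0 / alpha * base * norm.cdf(sqrt(2.0) * lam * z)
```

With $\mu =0$, $\alpha =\sqrt{2}$, and $\beta =2$ this reproduces the standard $\mathcal{SN}(\lambda )$ density, and setting additionally $\lambda =0$ collapses it to the standard normal density, matching the special cases noted above.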
Following the definition and properties of the
$\mathcal{SGN}$ distribution, discussed in [
11] and summarized in
Section 2 below, the AR(
p) process in (
1) with independent and identically distributed innovations
${a}_{t}\sim \mathcal{SGN}(0,\alpha ,\beta ,\lambda )$ is presented with its maximum likelihood (ML) procedure in
Section 3.
Section 4 evaluates the performance of the conditional ML estimator for the ARSGN(
p) model through simulation studies. Real financial, chemical, and population datasets are considered to illustrate the relevance of the newly proposed model, which can accommodate both skewness and heavy tails simultaneously. Simulation studies and real data applications illustrate the competitive nature of this newly proposed model, specifically in comparison to the AR(
p) process under the normality assumption, as well as the ARSN(
p) process proposed by Sharafi and Nematollahi [
6]; this is an AR(
p) process with the innovation process defined by the
$\mathcal{SN}$ distribution such that
${a}_{t}\sim \mathcal{SN}(0,\alpha ,\lambda )$. In addition, this paper also considers the AR(
p) process with the innovation process defined by the skew
t (
$\mathcal{ST}$) distribution [
12] such that
${a}_{t}\sim \mathcal{ST}(0,\alpha ,\lambda ,\nu )$, referred to as an ARST(
p) process. It is shown that the proposed ARSGN(
p) model competes well with the ARST(
p) model while requiring a shorter run time, thus accounting well for processes exhibiting asymmetry and heavy tails. Final remarks are summarized in
Section 5.
2. Review on the Skew Generalized Normal Distribution
Consider a random variable
$X\sim \mathcal{SGN}(\mu ,\alpha ,\beta ,\lambda )$ with PDF defined in Definition 1. The behavior of the skewing mechanism
$2\Phi \left(\sqrt{2}\lambda z\right)$ and the PDF of the
$\mathcal{SGN}$ distribution is illustrated in
Figure 1 and
Figure 2, respectively (for specific parameter structures). From Definition 1, it is clear that
$\beta $ does not affect the skewing mechanism, as opposed to
$\lambda $. When
$\lambda =0$, the skewing mechanism yields a value of one, and the
$\mathcal{SGN}$ distribution simplifies to the symmetric
$\mathcal{GN}$ distribution. Furthermore, as the absolute value of
$\lambda $ increases, the range of
x values over which the skewing mechanism is applied decreases within the interval
$(0,2)$. As a result, higher peaks are evident in the PDF of the
$\mathcal{SGN}$ distribution [
11]. These properties are illustrated in
Figure 1 and
Figure 2.
The main advantage of the
$\mathcal{SGN}$ distribution is its flexibility in accommodating both skewness and kurtosis (specifically, heavier tails than those of the
$\mathcal{SN}$ distribution); the reader is referred to [
11] for more detail. Furthermore, a random variable from the binomial distribution with parameters
n and
p can be approximated by a normal distribution with mean
$np$ and variance
$np(1-p)$ if
n is large or
$p\approx 0.5$ (that is, when the distribution is approximately symmetrical). However, if
$p\ne 0.5$, an asymmetric distribution is observed with considerable skewness for both large and small values of
p. Bekker et al. [
9] addressed this issue and showed that the
$\mathcal{SGN}$ distribution outperforms both the normal and
$\mathcal{SN}$ distributions in approximating binomial distributions for both large and small values of
p with
$n\le 30$.
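The asymmetry driving this result is easy to quantify: the skewness of a Binomial(n, p) variable is $(1-2p)/\sqrt{np(1-p)}$, which is zero at $p=0.5$ but grows as p approaches 0 or 1. A quick numerical check, assuming scipy:

```python
from scipy.stats import binom

# Skewness of Binomial(30, p) for a symmetric and two asymmetric cases;
# the normal approximation has skewness 0 regardless of p.
for p in (0.5, 0.1, 0.02):
    skew = float(binom.stats(30, p, moments="s"))
    print(f"p = {p}: skewness = {skew:.3f}")
```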
In order to demonstrate some characteristics (in particular, the expected value, variance, kurtosis, skewness, and moment generating function (MGF)) of the
$\mathcal{SGN}$ distribution, the following theorem from [
11] can be used to approximate the
kth moment.
Theorem 1. Suppose $X\sim \mathcal{SGN}(\beta ,\lambda )$ with the PDF defined in Definition 1 for $\mu =0$ and $\alpha =1$, then: $\mathbb{E}\left[{X}^{k}\right]=\frac{\Gamma \left((k+1)/\beta \right)}{\Gamma \left(1/\beta \right)}\mathbb{E}\left[\Phi \left(\sqrt{2}\lambda {A}^{1/\beta}\right)+{\left(-1\right)}^{k}\Phi \left(-\sqrt{2}\lambda {A}^{1/\beta}\right)\right]$, where A is a random variable distributed according to the gamma distribution with scale and shape parameters 1 and $(k+1)/\beta $, respectively. Proof. The reader is referred to [
11] for the proof of Theorem 1. □
Theorem 1 is shown to be the most stable and efficient for approximating the
kth moment of the distribution of
X, although it is important to note that a sample size of
$n>\mathrm{60,000}$ is required to obtain significant estimates of these characteristics.
Figure 3 illustrates the skewness and kurtosis characteristics that were calculated using Theorem 1 for various values of
$\beta $ and
$\lambda $. When evaluating these characteristics, it is seen that both kurtosis and skewness are affected by
$\beta $ and
$\lambda $ jointly. In particular (referring to
Figure 3):
Skewness is a monotonically increasing function of $\lambda $ for $\beta \le 2$; that is, for $\lambda <0$, the distribution is negatively skewed, and vice versa.
In contrast, skewness is a non-monotonic function of $\lambda $ for $\beta >2$.
Considering kurtosis, for all real values of $\lambda $, decreasing values of $\beta $ result in larger kurtosis, yielding heavier tails than those of the normal distribution.
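These skewness and kurtosis patterns can also be reproduced by integrating the standard $\mathcal{SGN}$ density directly, as a numerical cross-check alongside the approximation of Theorem 1. A sketch assuming scipy and the density of Definition 1 with $\mu =0$, $\alpha =1$; both helper names are hypothetical:

```python
import numpy as np
from math import gamma, sqrt
from scipy.stats import norm
from scipy.integrate import quad

def sgn_moment(k, beta, lam):
    """k-th raw moment of SGN(beta, lam) (mu = 0, alpha = 1), obtained by
    numerically integrating x^k against the density
    f(x) = beta/Gamma(1/beta) * exp(-|x|^beta) * Phi(sqrt(2)*lam*x)."""
    pdf = lambda x: (beta / gamma(1.0 / beta) * np.exp(-abs(x) ** beta)
                     * norm.cdf(sqrt(2.0) * lam * x))
    val, _ = quad(lambda x: x ** k * pdf(x), -np.inf, np.inf)
    return val

def sgn_skew_kurt(beta, lam):
    """Skewness and kurtosis from the first four raw moments."""
    m1, m2, m3, m4 = (sgn_moment(k, beta, lam) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    skew = (m3 - 3 * m1 * var - m1 ** 3) / var ** 1.5
    kurt = (m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4) / var ** 2
    return skew, kurt
```

For $\beta =2$ and $\lambda =0$ this returns skewness 0 and kurtosis 3 (the normal benchmark), while $\beta =1$ with $\lambda =0$ yields kurtosis 6, consistent with the heavier tails for smaller $\beta $ described above.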
In a more general sense for an arbitrary $\mu $ and $\alpha $, Theorem 1 can be extended as follows:
Theorem 2. Suppose $X\sim \mathcal{SGN}(\beta ,\lambda )$ and $Y=\mu +\alpha X$ such that $Y\sim \mathcal{SGN}(\mu ,\alpha ,\beta ,\lambda )$, then: $\mathbb{E}\left[{Y}^{k}\right]=\sum _{j=0}^{k}\binom{k}{j}{\mu}^{k-j}{\alpha}^{j}\mathbb{E}\left[{X}^{j}\right]$, with $\mathbb{E}\left[{X}^{j}\right]$ defined in Theorem 1. Proof. The proof follows from Theorem 1 [
11]. □
Theorem 3. Suppose $X\sim \mathcal{SGN}(\beta ,\lambda )$ with the PDF defined in Definition 1 for $\mu =0$ and $\alpha =1$, then the MGF is given by:where W is a random variable distributed according to the generalized gamma distribution (refer to [11] for more detail) with scale, shape, and generalizing parameters 1, $j+1$, and β, respectively. Proof. From Definition 1 and (
2), it follows that:
Furthermore, using the infinite series representation of the exponential function:
where
W is a random variable distributed according to the generalized gamma distribution with scale, shape, and generalizing parameters 1,
$j+1>0$ and
$\beta >0$, respectively, and PDF:
when
$w>0$, and zero otherwise. Similarly,
${I}_{2}$ can be written as:
Thus, the MGF of
X can be written as follows:
□
The representation of the MGF (and by extension, the characteristic function) of the $\mathcal{SGN}$ distribution, defined in Theorem 3 above, can be seen as an infinite series of weighted expected values of generalized gamma random variables.
Remark 1. It is clear that β and λ jointly affect the shape of the $\mathcal{SGN}$ distribution. In order to distinguish between the two parameters, this paper refers to λ as the skewing parameter, since the skewing mechanism depends on λ only; β is referred to as the generalization parameter, as it accounts for flexibility in the tails and generalizes the normal to the $\mathcal{GN}$ distribution of [13].
3. The ARSGN($p$) Model and Its Estimation Procedure
This section focuses on the model definition and ML estimation procedure of the ARSGN(p) model.
Definition 2. If ${Y}_{t}$ is defined by an AR(p) process with independent innovations ${a}_{t}\sim \mathcal{SGN}(0,\alpha ,\beta ,\lambda )$ with PDF $f\left({a}_{t}\right)=\frac{2}{\alpha}\varphi \left({a}_{t}/\alpha \right)\Phi \left(\sqrt{2}\lambda {a}_{t}/\alpha \right)$, then it is said that ${Y}_{t}$ is defined by an ARSGN(p) process for time $t=\{1,2,\dots \}$ and with process mean ${\mu}^{*}={\phi}_{0}{(1-{\phi}_{1}-{\phi}_{2}-\cdots -{\phi}_{p})}^{-1}$.
Remark 2. The process mean for an AR(p) process keeps its basic definition, regardless of the underlying distribution for the innovation process ${a}_{t}$.
With
${a}_{t}$ representing the process of independently distributed innovations with the PDF defined in Definition 2, the joint PDF for
$({a}_{p+1},{a}_{p+2},\dots ,{a}_{n})$ is given as:
$$\prod _{t=p+1}^{n}f\left({a}_{t}\right),$$
for
$n>p$. Furthermore, from (
1), the innovation process can be rewritten as:
$${a}_{t}={y}_{t}-{\phi}_{0}-{\phi}_{1}{y}_{t-1}-{\phi}_{2}{y}_{t-2}-\cdots -{\phi}_{p}{y}_{t-p}.$$
Since the distribution for
$({Y}_{1},{Y}_{2},\dots ,{Y}_{p})$ is intractable (being a linear combination of
$\mathcal{SGN}$ variables), the complete joint PDF of
$({Y}_{1},{Y}_{2},\dots ,{Y}_{n})$ is approximated by the conditional joint PDF of
${Y}_{t}$, for
$t=\{p+1,p+2,\dots \}$, which defines the likelihood function
$l(\Theta )$ for the ARSGN(
p) model. Thus, using (
3) and (
4), the joint PDF of
${Y}_{t}$ given
$({Y}_{1},{Y}_{2},\dots ,{Y}_{p})$ is given by:
$$\prod _{t=p+1}^{n}\frac{2}{\alpha}\varphi \left({z}_{t}^{*}\right)\Phi \left(\sqrt{2}\lambda {z}_{t}^{*}\right),$$
where
${z}_{t}^{*}=({y}_{t}-{\phi}_{0}-{\phi}_{1}{y}_{t-1}-{\phi}_{2}{y}_{t-2}-\cdots -{\phi}_{p}{y}_{t-p})/\alpha $. The ML estimator of
$\Theta =(\alpha ,\beta ,\lambda ,\mathit{\phi})$ is obtained by maximizing the conditional log-likelihood function, where
$\mathit{\phi}=({\phi}_{0},{\phi}_{1},{\phi}_{2},\dots ,{\phi}_{p})$. Evidently,
$p+m$ parameters need to be estimated for an AR(
p) model, where
m represents the number of parameters in the distribution considered for the innovation process.
Theorem 4. If ${Y}_{t}$ is characterized by an ARSGN(p) process, then the conditional log-likelihood function is given as: $l(\Theta )=\sum _{t=p+1}^{n}\mathrm{log}f\left({a}_{t}\right)$ for $t=\{p+1,p+2,\dots \}$ and $f(\cdot )$ defined in Definition 2. The conditional log-likelihood in Theorem 4 can be written as:
$$l(\Theta )=(n-p)\mathrm{log}\left(\frac{\beta}{\alpha \Gamma \left(1/\beta \right)}\right)-\sum _{t=p+1}^{n}{\left|{z}_{t}^{*}\right|}^{\beta}+\sum _{t=p+1}^{n}\mathrm{log}\Phi \left(\sqrt{2}\lambda {z}_{t}^{*}\right),$$
where
${z}_{t}^{*}={a}_{t}/\alpha $ and
${a}_{t}$ is defined in (
4). The ML estimation process of the ARSGN(
p) process is summarized in Algorithm 1 below.
Algorithm 1: ML estimation for the ARSGN(p) process
1: Determine the sample mean $\overline{y}$, variance ${s}^{2}$, and autocorrelations ${r}_{j}$ for $j=1,2,\dots ,p$.
2: Define the p Yule–Walker equations [14] in terms of the theoretical autocorrelations ${\rho}_{i}$ for an AR(p) process. Set the theoretical autocorrelations ${\rho}_{i}$ in the Yule–Walker equations equal to the sample autocorrelations ${r}_{i}$, and solve the method of moment estimates (MMEs) for the p AR parameters simultaneously in terms of ${r}_{1},{r}_{2},\dots ,{r}_{p}$. Use these MMEs as the starting values for the AR parameters $\mathit{\phi}={({\phi}_{1},{\phi}_{2},\dots ,{\phi}_{p})}^{\top}$.
3: Set the starting values for the intercept ${\phi}_{0}$ and scale parameter $\alpha $ equal to the MMEs [14].
4: Set the starting values for the shaping parameters $\beta $ and $\lambda $ equal to two and zero, respectively.
5: Use the optim() function in the $\mathtt{R}$ software to maximize the conditional log-likelihood function iteratively and yield the most likely underlying distribution with its specified parameters.
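Algorithm 1 can also be mirrored outside of R. The sketch below, written for $p=1$ and assuming scipy's minimize in place of optim(), uses the Yule–Walker starting value ${\phi}_{1}={r}_{1}$ together with plausible method-of-moments starting values for ${\phi}_{0}$ and $\alpha $ (the exact MMEs of step 3 are those of [14]); all names are illustrative:

```python
import numpy as np
from math import gamma, sqrt
from scipy.stats import norm
from scipy.optimize import minimize

def neg_cond_loglik(theta, y, p):
    """Negative conditional log-likelihood of the ARSGN(p) model with
    theta = (phi_0, ..., phi_p, alpha, beta, lam), assuming the innovation
    density f(a) = (2/alpha)*beta/(2*Gamma(1/beta))*exp(-|z|^beta)
    * Phi(sqrt(2)*lam*z), where z = a/alpha."""
    phi, alpha, beta, lam = theta[:p + 1], theta[p + 1], theta[p + 2], theta[p + 3]
    if alpha <= 0 or beta <= 0:
        return np.inf
    # Residuals a_t = y_t - phi_0 - phi_1*y_{t-1} - ... - phi_p*y_{t-p}, t > p.
    a = np.array([y[t] - phi[0] - phi[1:] @ y[t - p:t][::-1]
                  for t in range(p, len(y))])
    z = a / alpha
    log_f = (np.log(2.0 / alpha) + np.log(beta / (2.0 * gamma(1.0 / beta)))
             - np.abs(z) ** beta + norm.logcdf(sqrt(2.0) * lam * z))
    return -np.sum(log_f)

# Simulate an AR(1) path with normal innovations (a special case of the SGN).
rng = np.random.default_rng(1)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 1.0 + 0.6 * y[t - 1] + rng.normal()

# Steps 1-4: starting values (Yule-Walker phi_1 = r_1; beta = 2, lam = 0).
r1 = np.corrcoef(y[:-1], y[1:])[0, 1]
start = [np.mean(y) * (1 - r1), r1, np.std(y) * sqrt(1 - r1 ** 2), 2.0, 0.0]

# Step 5: maximise the conditional log-likelihood (scipy minimises, so negate).
fit = minimize(neg_cond_loglik, start, args=(y, 1), method="Nelder-Mead")
print(fit.x[:2])   # estimates of (phi_0, phi_1)
```

Since the simulated innovations are normal, the fitted shape parameters should settle near $\beta =2$ and $\lambda =0$, with the AR estimates close to the generating values.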
