Skew Generalized Normal Innovations for the AR(p) Process Endorsing Asymmetry

Abstract: The assumption of symmetry is often incorrect in real-life statistical modeling due to asymmetric behavior in the data. This implies a departure from the well-known assumption of normality defined for innovations in time series processes. In this paper, the autoregressive (AR) process of order p (i.e., the AR(p) process) is of particular interest using the skew generalized normal (SGN ) distribution for the innovations, referred to hereafter as the ARSGN(p) process, to accommodate asymmetric behavior. This behavior is examined by investigating some properties of the SGN distribution, which is a fundamental element for AR modeling of real data that exhibit non-normal behavior. Simulation studies illustrate the asymmetry and statistical properties of the conditional maximum likelihood (ML) parameters of the ARSGN(p) model. It is concluded that the ARSGN(p) model accounts well for time series processes exhibiting asymmetry, kurtosis, and heavy tails. Real time series datasets are analyzed, and the results of the ARSGN(p) model are compared to those of previously proposed models. The findings demonstrate the effectiveness and viability of relaxing the normality assumption and the value added by considering the SGN distribution as a candidate for AR time series processes.


Introduction
The autoregressive (AR) model is one of the simplest and most popular models in the time series context. The AR(p) time series process y t is expressed as a linear combination of p finite lagged observations of the process plus a random innovation for t = {1, 2, . . .} and is given by:

y t = ϕ 0 + ϕ 1 y t−1 + ϕ 2 y t−2 + · · · + ϕ p y t−p + a t , (1)

where ϕ 1 , ϕ 2 , . . . , ϕ p are known as the p AR parameters. The process mean (i.e., the mean of y t ) for the AR(p) process in (1) is given by µ * = ϕ 0 (1 − ϕ 1 − ϕ 2 − · · · − ϕ p ) −1 . Furthermore, if all roots of the characteristic equation ϕ(x) = 1 − ϕ 1 x − ϕ 2 x 2 − · · · − ϕ p x p = 0 are greater than one in absolute value, then the process is stationary (as is assumed throughout this paper). The innovation process a t in (1) represents white noise with mean zero (since the process mean is already built into the AR(p) process) and constant variance, and can be seen as independent "shocks" randomly drawn from a particular distribution. In general, it is assumed that a t follows the normal distribution, in which case the time series process y t is a Gaussian process [1].
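The stationarity condition and process mean above can be checked numerically. The following is a minimal Python sketch (the paper's own computations use R; the helper names `is_stationary` and `process_mean` are illustrative):

```python
import numpy as np

def is_stationary(phi):
    """Return True if all roots of the AR characteristic equation
    phi(x) = 1 - phi_1 x - ... - phi_p x^p lie outside the unit circle."""
    # np.roots expects coefficients ordered from the highest power down:
    # [-phi_p, ..., -phi_1, 1]
    coeffs = np.append(-np.asarray(phi[::-1], dtype=float), 1.0)
    return bool(np.all(np.abs(np.roots(coeffs)) > 1.0))

def process_mean(phi0, phi):
    """Process mean mu* = phi_0 (1 - phi_1 - ... - phi_p)^(-1)."""
    return phi0 / (1.0 - np.sum(phi))
```

For example, an AR(1) process with ϕ 1 = 0.5 is stationary (root at x = 2), whereas ϕ 1 = 1.5 is not.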
This assumption of normality is generally made due to the fact that natural phenomena often appear to be normally distributed (examples include age and weights), and it tends to be appealing due to its symmetry, infinite support, and computationally efficient characteristics. However, this assumption is often violated in real-life statistical analyses, which may lead to serious implications such as bias in estimates or inflated variances. Examples of time series data exhibiting asymmetry include (but are not limited to) financial indices and returns, measurement errors, sound frequency measurements, tourist arrivals, production in the mining sector, and sulphate measurements in water.
To address the natural limitations of normal-behavior, many studies have proposed AR models characterized by asymmetric innovation processes that were fitted to real data illustrating their practicality, particularly in the time series environment. The traditional approach for defining non-normal AR models is to keep the linear model (1) and let the innovation process a t follow a non-normal process instead. Some early studies include the work of Pourahmadi [2], considering various non-normal distributions for the innovation process in an AR(1) process such as the exponential, mixed exponential, gamma, and geometric distributions. Tarami and Pourahmadi [3] investigated multivariate AR processes with the t distribution, allowing for the modeling of volatile time series data. Other models abandoning the normality assumption have been proposed in the literature (see [4] and the references within). Bondon [5] and, more recently, Sharafi and Nematollahi [6] and Ghasami et al. [7] considered AR models defined by the epsilon-skew-normal (E SN ), skew-normal (S N ), and generalized hyperbolic (GH) innovation processes, respectively. Finally, AR models are not only applied in the time series environment: Tuaç et al. [8] considered AR models for the error terms in the regression context, allowing for asymmetry in the innovation structures.
This paper considers the innovation process a t to be characterized by the skew generalized normal (SGN ) distribution (introduced in Bekker et al. [9]). The main advantages gained from the SGN distribution include the flexibility in modeling asymmetric characteristics (skewness and kurtosis, in particular) and the infinite real support, which is of particular importance in modeling error structures. In addition, the SGN distribution adapts better to skewed and heavy-tailed datasets than its normal and SN counterparts, which is of particular value in the modeling of innovations for AR processes [7].
Referring to Definition 1, Φ(·) denotes the cumulative distribution function (CDF) of the standard normal distribution, with 2Φ( √ 2λz) operating as a skewing mechanism [10]. The symmetric base PDF to be skewed is given by φ, denoting the PDF of the generalized normal (GN ) distribution:

φ(z; µ, α, β) = β (2αΓ(1/β)) −1 exp( −|(z − µ)/α| β ),

where Γ(·) denotes the gamma function [11]. The standard case for the SGN distribution with µ = 0 and α = 1 in Definition 1 is denoted as X ∼ SGN (β, λ). Furthermore, the SGN distribution reduces to the standard SN distribution in the case of µ = 0, α = √ 2, and β = 2, denoted as X ∼ SN (λ) [11]. In addition, in the latter case, the distribution of X collapses to that of the standard normal distribution if λ = 0 [10].
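As a numerical sketch, assume the GN base density φ(z; 0, 1, β) ∝ exp(−|z| β ) with the skewing mechanism 2Φ( √ 2λz) applied to the standardized argument z = (x − µ)/α. This is an assumed parametrization, chosen to be consistent with the special cases stated above (µ = 0, α = √2, β = 2, λ = 0 recovers the standard normal); the function names are hypothetical:

```python
import numpy as np
from math import gamma, sqrt
from scipy.stats import norm

def gn_pdf(z, beta):
    """Standard generalized normal (GN) density (assumed parametrization):
    beta / (2*Gamma(1/beta)) * exp(-|z|^beta)."""
    return beta / (2.0 * gamma(1.0 / beta)) * np.exp(-np.abs(z) ** beta)

def sgn_pdf(x, mu, alpha, beta, lam):
    """Sketch of the SGN PDF: scaled GN density multiplied by the skewing
    mechanism 2*Phi(sqrt(2)*lambda*z), with z = (x - mu)/alpha."""
    z = (x - mu) / alpha
    return (2.0 / alpha) * gn_pdf(z, beta) * norm.cdf(sqrt(2.0) * lam * z)
```

Setting µ = 0, α = √2, β = 2, and λ = 0 in this sketch reproduces the standard normal density, matching the collapse property noted above.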
Following the definition and properties of the SGN distribution, discussed in [11] and summarized in Section 2 below, the AR(p) process in (1) with independent and identically distributed innovations a t ∼ SGN (0, α, β, λ) is presented with its maximum likelihood (ML) procedure in Section 3. Section 4 evaluates the performance of the conditional ML estimator for the ARSGN(p) model through simulation studies. Real financial, chemical, and population datasets are considered to illustrate the relevance of the newly proposed model, which can accommodate both skewness and heavy tails simultaneously. Simulation studies and real data applications illustrate the competitive nature of this newly proposed model, specifically in comparison to the AR(p) process under the normality assumption, as well as the ARSN(p) process proposed by Sharafi and Nematollahi [6]; the latter is an AR(p) process with the innovation process defined by the SN distribution such that a t ∼ SN (0, α, λ). In addition, this paper also considers the AR(p) process with the innovation process defined by the skew-t (ST ) distribution [12] such that a t ∼ ST (0, α, λ, ν), referred to as an ARST(p) process. With a shorter run time, it is shown that the proposed ARSGN(p) model competes well with the ARST(p) model, thus accounting well for processes exhibiting asymmetry and heavy tails. Final remarks are summarized in Section 5.

Review on the Skew Generalized Normal Distribution
Consider a random variable X ∼ SGN (µ, α, β, λ) with the PDF defined in Definition 1. The behavior of the skewing mechanism 2Φ( √ 2λz) and the PDF of the SGN distribution are illustrated in Figures 1 and 2, respectively (for specific parameter structures). From Definition 1, it is clear that β does not affect the skewing mechanism, as opposed to λ. When λ = 0, the skewing mechanism yields a value of one, and the SGN distribution simplifies to the symmetric GN distribution. Furthermore, as the absolute value of λ increases, the range of x values over which the skewing mechanism transitions between its bounds of zero and two narrows. As a result, higher peaks are evident in the PDF of the SGN distribution [11]. These properties are illustrated in Figures 1 and 2. The main advantage of the SGN distribution is its flexibility in accommodating both skewness and kurtosis (specifically, heavier tails than those of the SN distribution); the reader is referred to [11] for more detail. Furthermore, a random variable from the binomial distribution with parameters n and p can be approximated by a normal distribution with mean np and variance np(1 − p) if n is large or p ≈ 0.5 (that is, when the distribution is approximately symmetric). However, if p is far from 0.5, an asymmetric distribution is observed with considerable skewness for both large and small values of p. Bekker et al. [9] addressed this issue and showed that the SGN distribution outperforms both the normal and SN distributions in approximating binomial distributions for both large and small values of p with n ≤ 30. In order to demonstrate some characteristics (in particular, the expected value, variance, kurtosis, skewness, and moment generating function (MGF)) of the SGN distribution, the following theorem from [11] can be used to approximate the kth moment.

Theorem 1. Suppose X ∼ SGN (β, λ) with the PDF defined in Definition 1 for µ = 0 and α = 1; then: where A is a random variable distributed according to the gamma distribution with scale and shape parameters 1 and (k + 1)/β, respectively.
Proof. The reader is referred to [11] for the proof of Theorem 1.
Theorem 1 is shown to be the most stable and efficient approach for approximating the kth moment of the distribution of X, although it is important to note that a sample size of n > 60,000 is required to obtain significant estimates of these characteristics. Figure 3 illustrates the skewness and kurtosis characteristics calculated using Theorem 1 for various values of β and λ.
When evaluating these characteristics, it is seen that both kurtosis and skewness are affected by β and λ jointly. In particular (referring to Figure 3):

• Skewness is a monotonically increasing function of λ for β ≤ 2; that is, for λ < 0, the distribution is negatively skewed, and vice versa.
• In contrast to the latter, skewness is a non-monotonic function of λ for β > 2.
• Considering kurtosis, all real values of λ together with decreasing values of β result in larger kurtosis, yielding heavier tails than those of the normal distribution.

In a more general sense, for arbitrary µ and α, Theorem 1 can be extended as follows:

Theorem 2. Suppose X ∼ SGN (β, λ) and Y = µ + αX such that Y ∼ SGN (µ, α, β, λ); then: Proof. The proof follows from Theorem 1 [11].
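The skewness and kurtosis patterns described in the bullet points can be verified numerically. The sketch below (in Python; the paper's own computations use Theorem 1 in R) obtains raw moments of the standard SGN density by numerical integration instead of the moment approximation of Theorem 1, under an assumed GN parametrization consistent with the special cases stated in this section:

```python
import numpy as np
from math import gamma, sqrt
from scipy.stats import norm
from scipy.integrate import quad

def sgn_pdf(z, beta, lam):
    """Standard SGN PDF (mu = 0, alpha = 1), assumed parametrization:
    GN density times the skewing mechanism 2*Phi(sqrt(2)*lambda*z)."""
    gn = beta / (2.0 * gamma(1.0 / beta)) * np.exp(-np.abs(z) ** beta)
    return 2.0 * gn * norm.cdf(sqrt(2.0) * lam * z)

def sgn_moment(k, beta, lam):
    """k-th raw moment by numerical integration (an alternative to the
    approximation of Theorem 1)."""
    return quad(lambda z: z ** k * sgn_pdf(z, beta, lam), -np.inf, np.inf)[0]

def sgn_skew_kurt(beta, lam):
    """Standardized skewness and kurtosis from the first four raw moments."""
    m = [sgn_moment(k, beta, lam) for k in range(1, 5)]
    var = m[1] - m[0] ** 2
    skew = (m[2] - 3 * m[0] * m[1] + 2 * m[0] ** 3) / var ** 1.5
    kurt = (m[3] - 4 * m[0] * m[2] + 6 * m[0] ** 2 * m[1] - 3 * m[0] ** 4) / var ** 2
    return skew, kurt
```

For β = 2 and λ = 0 the distribution is a rescaled normal, so the sketch returns zero skewness and kurtosis three; for λ < 0 with β ≤ 2 it returns negative skewness, as the first bullet states.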
Theorem 3. Suppose X ∼ SGN (β, λ) with the PDF defined in Definition 1 for µ = 0 and α = 1, then the MGF is given by: where W is a random variable distributed according to the generalized gamma distribution (refer to [11] for more detail) with scale, shape, and generalizing parameters 1, j + 1, and β, respectively.
Proof. From Definition 1 and (2), it follows that: Furthermore, using the infinite series representation of the exponential function: where W is a random variable distributed according to the generalized gamma distribution with scale, shape, and generalizing parameters 1, j + 1 > 0 and β > 0, respectively, and PDF: when w > 0, and zero otherwise. Similarly, I 2 can be written as: Thus, the MGF of X can be written as follows: The representation of the MGF (and by extension, the characteristic function) of the SGN distribution, defined in Theorem 3 above, can be seen as an infinite series of weighted expected values of generalized gamma random variables.

Remark 1.
It is clear that β and λ jointly affect the shape of the SGN distribution. In order to distinguish between the two parameters, this paper refers to λ as the skewing parameter, since the skewing mechanism depends on λ only. β is referred to as the generalization parameter, as it accounts for flexibility in the tails and generalizes the normal distribution to the GN distribution of [13].

The ARSGN(p) Model and Its Estimation Procedure
This section focuses on the model definition and ML estimation procedure of the ARSGN(p) model.

Definition 2. If Y t is defined by an AR(p) process (1) with independent innovations a t ∼ SGN (0, α, β, λ), then it is said that Y t is defined by an ARSGN(p) process for time t = {1, 2, . . .} and with process mean µ * = ϕ 0 (1 − ϕ 1 − ϕ 2 − · · · − ϕ p ) −1 .

Remark 2.
The process mean for an AR(p) process keeps its basic definition, regardless of the underlying distribution for the innovation process a t .
With a t representing the process of independently distributed innovations with the PDF defined in Definition 2, the joint PDF of (a p+1 , a p+2 , . . . , a n ) is given by: for n > p. Furthermore, from (1), the innovation process can be rewritten as: Since the distribution of (Y 1 , Y 2 , . . . , Y p ) is intractable (being a linear combination of SGN variables), the complete joint PDF of (Y 1 , Y 2 , . . . , Y n ) is approximated by the conditional joint PDF of Y t , for t = {p + 1, p + 2, . . .}, which defines the likelihood function l(Θ) for the ARSGN(p) model. Thus, using (3) and (4), the joint PDF of Y t given (Y 1 , Y 2 , . . . , Y p ) is obtained, and the conditional ML estimation is based on maximizing the conditional log-likelihood function, where ϕ = (ϕ 0 , ϕ 1 , ϕ 2 , . . . , ϕ p ). Evidently, p + m parameters need to be estimated for an AR(p) model, where m represents the number of parameters in the distribution considered for the innovation process.
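The construction above (recover a t from the observed series via (4), then sum the innovation log-density over t = p + 1, . . . , n) can be sketched as follows. This is an illustrative Python version under the assumed SGN parametrization used earlier; the function names are hypothetical:

```python
import numpy as np
from math import gamma, sqrt
from scipy.stats import norm

def sgn_logpdf(a, alpha, beta, lam):
    """Log-density of SGN(0, alpha, beta, lam) innovations
    (assumed parametrization of the SGN density)."""
    z = a / alpha
    log_gn = np.log(beta / (2.0 * alpha * gamma(1.0 / beta))) - np.abs(z) ** beta
    return log_gn + np.log(2.0) + norm.logcdf(sqrt(2.0) * lam * z)

def conditional_loglik(theta, y, p):
    """Conditional log-likelihood of an ARSGN(p) process: condition on the
    first p observations, recover a_t = y_t - phi_0 - sum phi_i y_{t-i},
    and sum the SGN log-density over t = p+1, ..., n."""
    phi0 = theta[0]
    phi = theta[1:p + 1]
    alpha, beta, lam = theta[p + 1], theta[p + 2], theta[p + 3]
    a = np.array([y[t] - phi0 - np.dot(phi, y[t - p:t][::-1])
                  for t in range(p, len(y))])
    return float(np.sum(sgn_logpdf(a, alpha, beta, lam)))
```

As a sanity check, with α = √2, β = 2, and λ = 0 this conditional log-likelihood coincides with the Gaussian conditional log-likelihood of the recovered innovations.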

Theorem 4.
If Y t is characterized by an ARSGN(p) process, then the conditional log-likelihood function is given as: The conditional log-likelihood in Theorem 4 can be written as: where z t * = a t /α and a t is defined in (4). The ML estimation procedure for the ARSGN(p) model is summarized in Algorithm 1 below.

Algorithm 1:
1: Determine the sample mean ȳ, variance s 2 , and autocorrelations r j for j = 1, 2, . . . , p.
2: Define the p Yule-Walker equations [14] in terms of the theoretical autocorrelations ρ i for an AR(p) process. Set the theoretical autocorrelations ρ i in the Yule-Walker equations equal to the sample autocorrelations r i , and solve for the method-of-moments estimates (MMEs) of the p AR parameters simultaneously in terms of r 1 , r 2 , . . . , r p . Use these MMEs as the starting values for the AR parameters ϕ = (ϕ 1 , ϕ 2 , . . . , ϕ p ).
3: Set the starting values for the intercept ϕ 0 and scale parameter α equal to their MMEs [14] such that:
4: Set the starting values for the shaping parameters β and λ equal to two and zero, respectively.
5: Use the optim( ) function in R to maximize the conditional log-likelihood function iteratively and yield the most likely underlying distribution with its specified parameters.
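Step 2 of Algorithm 1 can be sketched numerically: the Yule-Walker system is Toeplitz in the sample autocorrelations, so the MME starting values solve a Toeplitz linear system. A minimal Python illustration (the paper works in R; `yule_walker_start` is a hypothetical helper name):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker_start(y, p):
    """Method-of-moments starting values for the p AR parameters:
    solve the Yule-Walker equations R * phi = r, where R is the
    Toeplitz matrix of sample autocorrelations r_0, ..., r_{p-1}."""
    y = np.asarray(y, dtype=float)
    yc = y - y.mean()
    n = len(y)
    # biased sample autocovariances at lags 0, 1, ..., p
    acov = np.array([np.dot(yc[:n - j], yc[j:]) / n for j in range(p + 1)])
    r = acov[1:] / acov[0]            # sample autocorrelations r_1, ..., r_p
    phi = solve_toeplitz(np.concatenate(([1.0], r[:-1])), r)
    return phi
```

For an AR(1) series the starting value is simply r 1 , and for larger p the solution is exact in the sample autocorrelations, as the algorithm prescribes.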

Application
In this section, the performance and robustness of the conditional ML estimator Θ for the ARSGN(p) time series model are illustrated through various simulation studies. The proposed model is also applied to real data in order to illustrate its relevance in comparison to previously proposed models. All computations were carried out using R 3.5.0 in a 64-bit Windows environment with a 2.30 GHz Intel(R) Core(TM) i5-6200U CPU and 4.0 GB RAM, and run times are given in seconds. The R code is available from the first author upon request.

Numerical Studies
The aim of this subsection is to illustrate the performance and robustness of the conditional ML estimator Θ for the proposed model in Definition 2 using various simulation studies. Define the hypothetical AR(5) time series model, which will (partly) be considered in the simulation studies below: where a t and the sample size n will be defined differently for each simulation study. The simulation studies are algorithmically described in Algorithm 2 below.

Algorithm 2:
1: Generate independent random variables U ∼ GN (β) with PDF φ(·; 0, 1, β) and U 1 ∼ N (0, 1), using the rgnorm( ) and rnorm( ) functions in R, respectively.
2: Following [9,15], simulate the innovation process a t for t = {1, 2, . . . , n} using (7).
3: Determine the time series y t using the arima.sim( ) function in R with the simulated innovations a t from (7) and the theoretical parameters defined in (6), then add the process mean µ * to the time series y t .
4: Apply Algorithm 1 to estimate the parameters of the ARSGN(p) model, considering various values of p.
5: For a second simulation study, repeat Steps 1 to 4 above 500 times with p = 2 in order to analyze the sampling distributions of the parameters of the ARSGN(p) model.
6: For a third simulation study, investigate the performance of the ARSGN(p) model when the innovation process a t is instead described by an ST distribution, for p = 2 in (6).
7: For a fourth simulation study, extend the latter by describing the innovation process a t by ST distributions with various degrees of freedom, evaluating the performance of the ARSGN(p) model for various tail weights of a t .
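Steps 1 to 3 can be sketched in Python. The exact representation in (7) follows [9,15] and is not reproduced here; the sketch below instead uses a standard conditioning representation valid for densities of the form 2 f(u)Φ( √ 2λu): draw U from the symmetric GN base and U 1 from N (0, 1), keep U when U 1 ≤ √ 2λU, and flip its sign otherwise. The helper names and the direct AR recursion (in place of arima.sim in R) are illustrative assumptions:

```python
import numpy as np

def rsgn(n, beta, lam, rng):
    """Sample standard SGN(beta, lam) innovations via a conditioning
    representation (a sketch; the paper's representation may differ in form).
    U ~ GN(beta) symmetric, U1 ~ N(0, 1)."""
    # GN(beta) draw: |U|^beta ~ Gamma(1/beta, 1) with a random sign
    g = rng.gamma(1.0 / beta, 1.0, size=n)
    u = rng.choice([-1.0, 1.0], size=n) * g ** (1.0 / beta)
    u1 = rng.standard_normal(n)
    return np.where(u1 <= np.sqrt(2.0) * lam * u, u, -u)

def simulate_arsgn(n, phi0, phi, alpha, beta, lam, rng, burn=500):
    """Simulate an ARSGN(p) path by direct recursion, discarding a burn-in."""
    p = len(phi)
    a = alpha * rsgn(n + burn, beta, lam, rng)
    y = np.zeros(n + burn)
    for t in range(p, n + burn):
        y[t] = phi0 + np.dot(phi, y[t - p:t][::-1]) + a[t]
    return y[burn:]
```

With λ < 0 the simulated innovations are negatively skewed, and with λ = 0 they reduce to symmetric GN noise, matching the properties discussed in Section 2.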

Simulation Study 1
In order to evaluate the conditional ML estimation performance of the ARSGN(p) model, the time series y t is simulated and estimated for various orders p = 1, 2, 3, 4, 5 and sample sizes n = 100, 500, 1000, 5000. Assuming a t ∼ SGN (0, 4, 3, −10) for the hypothetical model defined in (6), the innovation process a t is simulated and the time series y t is estimated with an ARSGN(p) model, as described in Algorithm 2. Table 1 summarizes the parameter estimates and standard errors obtained from the ARSGN(p) model for all p = 1, 2, 3, 4, 5 and n = 100, 500, 1000, 5000. In general, it is clear that the model fits the simulated innovation processes (and time series) relatively well, except for λ, which tends to be more volatile with larger standard errors, although these standard errors decrease for larger sample sizes. It is also noted that there are occasional trade-offs in the estimation of β and λ for p = 4 and p = 5, which decrease the standard error of one parameter but increase the standard error of the other. These occasional instances of "incorrect" estimation, which consequently also influence the estimation of ϕ 0 , may be explained by the fact that β and λ jointly affect the asymmetric behavior of the SGN distribution. In addition to Table 1, Figure 4 illustrates the estimated ARSGN(3) model with the distribution of the residuals, where the residual at time t is defined as â t = Y t − Ŷ t , with Ŷ t representing the estimated value at time t. The asymmetric behavior of the SGN distribution is especially evident from the fitted model in Figure 4.

Simulation Study 2
Sampling distributions for the parameters can be used to evaluate the robustness of the estimator for the proposed model. In order to construct the sampling distributions for the parameters in Table 1, a Monte Carlo simulation study is applied by repeating the simulation for a t and the estimation procedure 500 times, considering p = 2 and n = 5000 for the hypothetical model defined in (6) with a t ∼ SGN (0, 4, 3, −10).
The sampling distributions obtained for the conditional ML parameter estimates are illustrated in Figure 5, all centered around their theoretical values. The occasional trade-offs between β and λ (noted in Simulation Study 1) are evident from these distributions. Furthermore, Table 2 summarizes the 5th, 50th (i.e., the median), and 95th percentiles obtained from these sampling distributions, of which the 5th and 95th percentiles can be used to approximate 90% confidence intervals. It is evident that the theoretical values of all parameters fall within these confidence intervals, which exclude zero, suggesting that all parameter estimates are significant. It is concluded that the conditional ML estimation of the proposed model is robust, since all 50th percentiles are virtually identical to the theoretical values, confirming what is depicted in Figure 5.

Simulation Study 3
For comparison and completeness' sake, consider the hypothetical time series model defined in (6), for up to p = 2 only. This time, the innovation process is simulated from an ST distribution such that a t ∼ ST (0, 2, 15, 5) [12], and thus, the simulated time series y t is referred to as an ARST(2) process. The aim of this simulation study is to evaluate the fit of the ARSGN(2) model in comparison to each of the AR(2), ARSN(2), and ARST(2) models, even though the true innovation process follows an ST distribution.
Considering a sample size of n = 5000, the innovation process a t was simulated using the rst( ) function in R. The AR(2), ARSN(2), ARSGN(2), and ARST(2) models were each fitted to the time series y t by maximizing the respective conditional log-likelihood functions. Starting values were chosen similarly for all models, as discussed in Algorithm 1, except for λ, which was set equal to the sample skewness for both the ARSN(2) and ARST(2) models, and the degrees of freedom ν, which was initialized at one for the latter. It should be noted that the starting value for λ in the ARSGN(p) model was not set to the sample skewness, since it was noted in Section 2 that λ and β jointly affect the skewness and shape of the distribution. Table 3 summarizes the conditional ML parameter estimates with the standard errors given in parentheses, as well as the log-likelihood, Akaike information criterion (AIC), and run times for the various AR models, where the AIC is defined as:

AIC = 2M − 2ℓ n (Θ̂),

with ℓ n (Θ̂) representing the maximized value of the log-likelihood function defined in Theorem 4 and M denoting the number of parameters in the model [16]. From the log-likelihood and AIC values obtained for the different models, it can be seen that the ARSGN(2) model competes well with the ARST(2) model (see Figure 6). In particular, the run time for the estimation of the ARSGN(2) model is shorter than that of the ARST(2) model, with all parameters being significant at a 95% confidence level. Thus, it can be concluded that the ARSGN(2) model is a valid contender in comparison to other popular models accounting for asymmetry, kurtosis, and heavy tails and performs competitively considering the computational time.

Table 3. AR(2) ML parameter estimates with standard errors (in parentheses), log-likelihood ℓ n (Θ̂), Akaike information criterion (AIC), and run times for sample size n = 5000 and a t ∼ ST (0, 2, 15, 5). ST, skew-t.
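The AIC-based model comparison above reduces to a one-line computation. A minimal sketch (the helper names are illustrative):

```python
def aic(loglik, n_params):
    """Akaike information criterion: AIC = 2*M - 2*l_n(Theta_hat),
    penalizing the maximized log-likelihood by the parameter count M."""
    return 2 * n_params - 2 * loglik

def best_by_aic(models):
    """Select the model with the smallest AIC from a dict
    {name: (maximized log-likelihood, number of parameters)}."""
    return min(models, key=lambda name: aic(*models[name]))
```

A model with a higher log-likelihood can still lose the comparison if its extra parameters do not buy enough fit, which is the trade-off driving the ARSGN versus ARST comparison in Table 3.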

Simulation Study 4
The purpose of this simulation study is to evaluate the estimation performance and adaptability of the proposed ARSGN(p) model (in comparison to some of its competitors) on processes with various tail weights simulated from the ST distribution. Thus, consider the hypothetical time series model defined in (6), for up to p = 2 only and with a t ∼ ST (0, 2, 15, ν), where ν = 5, 6, . . . , 35 represents the different degrees of freedom under which y t is simulated. For a sample size of n = 10,000, the AR(2) model is estimated assuming various distributions for the innovation process a t . Figures 7 and 8 illustrate the standard errors of the parameter estimates of interest and the AIC values obtained from the various AR models fitted to the simulated ARST(2) process for degrees of freedom ν = 5, 6, . . . , 35. Figure 7 illustrates that the skewing parameter λ for the ARSGN(2) model is more volatile compared to the other parameters, although it is less volatile than λ in the ARSN(2) model for lower degrees of freedom. Observing the AIC values in Figure 8, it is clear that the AR(2) model (under the assumption of normality) performs the worst for all degrees of freedom, whereas the ARSN(2) model performs similarly to the ARST(2) model for degrees of freedom ν > 15. In contrast, the proposed ARSGN(2) model performs almost equivalently to the ARST(2) model for all degrees of freedom, indicating that the proposed model adapts well to various levels of skewness and kurtosis.

Real-World Time Series Analysis
This subsection illustrates the relevance of the ARSGN(p) model in areas such as chemistry, population studies, and economics, in comparison to previously proposed AR models. Descriptive statistics for all time series considered below are summarized in Table 4.

Figure 8. AIC values obtained from AR(2) models fitted (assuming various distributions for a t ) to an ARST(2) process simulated with a t ∼ ST (0, 2, 15, ν) for different degrees of freedom ν.

Viscosity during a Chemical Process
In order to compare the ARSGN(p) model, defined in Definition 2, to previously proposed models (i.e., AR models assuming the normal and SN distributions for the innovation process a t , respectively), consider the time series data Series D in [17]. This dataset consists of n = 310 hourly measurements of viscosity during a chemical process, represented in Figure 9. The Shapiro-Wilk test applied to the time series suggests that the data are not normally distributed (p-value < 0.001); this is also confirmed by the histogram in Figure 9. From the autocorrelation function (ACF) and partial autocorrelation function (PACF), it is evident that an AR(1) model is a suitable choice for fitting a time series model. Previously, Box and Jenkins [17] fitted an AR(1) model to this time series, assuming that the innovations are normally distributed. Sharafi and Nematollahi [6] relaxed this normality assumption and allowed for asymmetry by fitting an ARSN(1) model. Table 5 summarizes the conditional ML parameter estimates (with the standard errors given in parentheses) for both of these models, together with those obtained for the newly proposed ARSGN(1) model and the ARST(1) model. In addition, the maximized log-likelihood, AIC values, Kolmogorov-Smirnov (KS) test statistics, and run times are also presented for the various models, where the KS test statistic is defined as:

D = sup t | F(â t ) − F(a t ) |,

where F(â t ) and F(a t ) represent the empirical and estimated distribution functions for the residuals and innovation process, respectively [18]. From the log-likelihood, AIC values, and KS test statistics calculated for the four models, it can be concluded that the ARSGN(1) model fits this time series best, with a competitive estimation run time. Note that from the estimated intercept ϕ̂ 0 , the process mean is estimated as µ̂ * = 9.492, which is evident from the time plot in Figure 9.
Evaluating the standard errors of the parameter estimates obtained for the ARSGN(1) model, it is observed that all parameters differ significantly from zero at a 95% confidence level, except for the skewing parameter λ, suggesting that the innovation process does not contain significant skewness. This is confirmed by Figure 10, from which it is clear that only slight skewness is present in the distributions of the residuals obtained for all four models. In this case, it is also clear that the SGN distribution captures the kurtosis well. Finally, Figure 11 illustrates the CDFs of the various estimated models together with the empirical CDF of the residuals obtained from the ARSGN(1) model, suggesting that the ARSGN(1) model fits the innovation process best.

Table 5. ML parameter estimates with standard errors (in parentheses), log-likelihood ℓ n (Θ̂), AIC, Kolmogorov-Smirnov (KS) test statistics, and run times for the hourly viscosity measurements during a chemical process [17].
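The KS comparison used throughout this subsection (empirical CDF of the residuals against the fitted innovation CDF) can be sketched with scipy; the helper name is illustrative, and the normal CDF below stands in for whichever fitted innovation CDF is under consideration:

```python
import numpy as np
from scipy.stats import kstest, norm

def ks_residual_stat(residuals, fitted_cdf):
    """KS statistic sup_x |F_emp(x) - F_fit(x)| between the empirical CDF
    of the model residuals and a fitted innovation CDF (a callable)."""
    result = kstest(residuals, fitted_cdf)
    return result.statistic, result.pvalue
```

A small statistic (and large p-value) indicates that the fitted innovation distribution tracks the residuals closely, which is the criterion by which the ARSGN(1) model is preferred in Table 5.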

Estimated Resident Population for Australia
In order to illustrate the relevance of the proposed model for higher orders of p, consider the quarterly estimated Australian resident population data (in thousands), which consist of n = 89 observations from June 1971 to June 1993. Figure 12 shows the time plot and ACF for the original time series, from which it is clear that nonstationarity is present, since the process mean and autocorrelations depend on time. This is also confirmed by the augmented Dickey-Fuller (ADF) test, which yields a p-value of 0.3493, suggesting that the time series exhibits a unit root [19]. Differencing the original time series twice yields a stationary time series with a p-value < 0.01 for the ADF test (suggesting no unit roots). This stationary time series and its distribution are illustrated in Figure 13, with the ACF and PACF suggesting that an AR(3) model is the appropriate choice for fitting a time series model. Previously, Brockwell and Davis [20] fitted an AR(3) model to the differenced (i.e., stationary) time series, assuming that the innovations are normally distributed. However, both the histogram and the Shapiro-Wilk test (p-value < 0.001) applied to this stationary time series suggest that the innovation process a t is not normally distributed. Instead, the SGN is considered as a distribution for the innovation process; that is, a t ∼ SGN (0, α, β, λ). Table 6 summarizes the estimation results for AR models under the SGN distribution in comparison to the normal, SN , and ST distributions. From the maximized log-likelihood, AIC values, and KS test statistics calculated for the four models, it can be concluded that the ARSGN(3) model fits best, with an estimated process mean µ̂ * = 0.339 (also evident from Figure 13).
Furthermore, evaluating the standard errors of the parameters obtained for the ARSGN(3) model, it is evident that all parameters differ significantly from zero at a 95% confidence level, except for the skewing parameter λ, suggesting that the innovation process does not exhibit significant skewness. Figure 14 illustrates these estimated models, confirming that the proposed ARSGN(3) model adapts well to various levels of asymmetry.
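The differencing transformation used to obtain a stationary series (twice for the population data above, once for the insolvency data below) is a one-liner in practice. A minimal Python sketch (the helper name is illustrative):

```python
import numpy as np

def difference(y, d):
    """Apply d-th order differencing to remove trend nonstationarity
    before fitting an AR model to the differenced series."""
    return np.diff(np.asarray(y, dtype=float), n=d)
```

Second-order differencing removes a quadratic trend entirely, which is why a series whose level grows roughly quadratically (such as a population count) becomes stationary after differencing twice; each difference also shortens the series by one observation.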

Insolvencies in South Africa
As a final application, consider the monthly (seasonally adjusted) number of insolvencies in South Africa from January 2000 to November 2019 (retrieved from Stats SA [21]). The time plot and ACF for the time series in Figure 15 suggest that the time series is nonstationary (which is also confirmed by the ADF test with a p-value of 0.5807). A stationary time series is obtained by differencing the original time series once. This stationary time series and its distribution are illustrated in Figure 16, with the ACF and PACF suggesting that an AR(2) model is the appropriate choice for fitting a time series model. Although the Shapiro-Wilk test yields a p-value < 0.001 (suggesting non-normality), the AR(2) model is fitted assuming each of the normal, SN , SGN , and ST distributions for the innovation process a t , respectively; Table 7 and Figure 17 show the results obtained. Referring to Table 7, it appears that the ARST(2) model fits best, with the proposed ARSGN(2) model as the runner-up (comparing the maximized log-likelihood and AIC values). However, when comparing the standard errors of the parameter estimates, it is suggested that the estimates for the ARSGN(2) model generally exhibit less volatility than those of its competitors, with all parameters being significant at a 95% confidence level. Figure 17 confirms that the SGN distribution adapts well to various levels of skewness and kurtosis.

Conclusions
In this paper, the AR(p) time series model with skew generalized normal (SGN ) innovations was proposed, i.e., the ARSGN(p) model. The main advantage of the SGN distribution is its flexibility in accommodating asymmetry and kurtosis, as well as heavier tails than those of the normal distribution. The conditional ML estimator for the parameters of the ARSGN(p) model was derived, and its behavior was investigated through various simulation studies in comparison to previously proposed models. Finally, real time series datasets were fitted using the ARSGN(p) model, the usefulness of which was illustrated by comparing the estimation results of the proposed model to those of some of its competitors. In conclusion, the ARSGN(p) model is a meaningful contender in the AR(p) environment compared to other often-considered models. A stepping stone for future research is the extension of the ARSGN(p) model to the multivariate case. Furthermore, an alternative method for defining a non-normal AR model by discarding the linearity assumption may be explored; see, for example, the work done on non-linear time series models in [22,23].