1. Introduction
Linear regression is one of the most common and well-understood techniques for modeling the relationship between one or more covariates or predictor variables, X, and the conditional mean of the response variable Y, given $X = x$. The conditional mean minimizes the expected squared error: $E(Y \mid X = x) = \arg\min_{a} E\left[(Y - a)^{2} \mid X = x\right]$. If the conditional mean of Y, given x, is linear and expressed as $E(Y \mid X = x) = x^{\top}\beta$, then $\beta$ can be estimated by solving $\min_{\beta}\sum_{i=1}^{n}\left(y_i - x_i^{\top}\beta\right)^{2}$, which yields the ordinary least squares solution of the linear regression model.
However, inference from this model requires that specific assumptions be made about the distribution of the error (i.e., linearity, homoscedasticity, independence, or normality). In contrast, quantile regression, as introduced by Koenker and Bassett [1], aims to extend these ideas to the estimation of conditional quantile function models, wherein the quantiles of the conditional distribution of the response variable are expressed as functions of observed covariates. For a pair of observed values $(x, y)$ of the random vector $(X, Y)$, a quantile regression has the form $Q_{\tau}(Y \mid X = x) = \inf\{y : F(y \mid x) \geq \tau\}$, where $Q_{\tau}(Y \mid X = x)$ indicates that the specific quantile of Y is the smallest value of y, denoted by $Q_{\tau}$, given $X = x$, without making assumptions about the distribution of the error. In the case of classical quantile regressions, we can assume that these quantiles of the conditional distribution have a linear form,
$$Q_{\tau}(Y \mid X = x) = x^{\top}\beta(\tau),$$
where $\beta(\tau)$ is a parameter vector (a vector of regression coefficients).
For the following, it may be useful to note that this expression can be written in an equivalent way: $Y = x^{\top}\beta(\tau) + \epsilon$, with $Q_{\tau}(\epsilon \mid X = x) = 0$. Condition (1) is similar to that used in the standard linear regression, in which the conditional mean of the variable of interest Y is modelled as a linear expression of the explanatory variables X. One difference is that here we allow the coefficients to differ from quantile to quantile. This provides additional information not apparent from simple linear regression.
In the parameter estimation problem, the form of the quantile-regression function is known, but it contains unknown parameters, $\beta(\tau)$. The most intuitive way to calculate the standard sample quantile estimator consists of ordering the n observed values, the quantile of order $\tau$ being provided by the $(\lceil n\tau \rceil)$-th observation, where $\lceil n\tau \rceil$ is the smallest integer greater than or equal to $n\tau$. However, it is more useful to notice that the popular method for estimating the unknown parameters, $\beta(\tau)$, in a quantile-regression function is by solving the estimating equation
$$\hat{\beta}(\tau) = \arg\min_{\beta}\sum_{t=1}^{n}\rho_{\tau}\left(y_t - x_t^{\top}\beta\right),$$
where $\rho_{\tau}(\cdot)$ is the check function defined by $\rho_{\tau}(v) = v\left(\tau - I(v < 0)\right)$ for some $\tau \in (0, 1)$. Here, $I(\cdot)$ is an indicator function that takes the value of unity when its argument is true and zero otherwise. However, this function is not differentiable at zero, and explicit solutions to the minimization problem are unobtainable [2,3]. In quantile-regression methods, linear programming is frequently implemented for parameter estimation.
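As a concrete illustration of the check-function criterion and the linear-programming solution, the following R sketch evaluates the check loss and fits a linear quantile regression with the quantreg package; the simulated data and the quantile level are illustrative choices, not taken from the paper.

```r
library(quantreg)

# Check (pinball) loss: rho_tau(v) = v * (tau - I(v < 0))
check_loss <- function(v, tau) v * (tau - (v < 0))

set.seed(1)
n   <- 200
x   <- rnorm(n)
y   <- 1 + 2 * x + rnorm(n)   # illustrative linear model
tau <- 0.5

# rq() solves the quantile-regression problem by linear programming
fit <- rq(y ~ x, tau = tau)
coef(fit)

# Objective value at the fitted coefficients (sum of check losses)
sum(check_loss(y - cbind(1, x) %*% coef(fit), tau))
```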
The Expectation-Maximization (EM) algorithm and the Alternating Least Squares (ALS) algorithm are both iterative optimization techniques commonly employed in statistical modeling for parameter estimation. In quantile regression, which aims at estimating conditional quantiles of a response variable given covariates, the choice between EM and ALS relies on several considerations.
Opting for EM over ALS in quantile regression is justified by its ability to handle latent variables, to accommodate diverse error distributions, and by its alignment with the objectives of quantile regression. Additionally, EM avoids the swapping effect by updating parameters sequentially rather than alternately, enhancing stability in estimation. However, it is essential to note that the bias of EM hinges on factors such as sample size and model assumptions.
Although the speed of EM varies with the complexity of the model and the convergence criteria, its advantages in handling latent variables and error distributions make it a favorable choice in many quantile-regression applications.
There are numerous articles about the parameter estimation of quantile-regression models with the EM algorithm. Tian et al. [4] proposed a new method based on the Expectation-Maximization algorithm for a linear quantile-regression model with asymmetric Laplace error distribution. Furthermore, Tian et al. [5] used this method for a linear quantile-regression model with autoregressive errors. Zhou et al. [6] developed EM and GEM algorithms for calculating the quantiles of linear and non-linear regression models.
All of these results are only for errors modeled by short-memory time-series models. However, long-memory models are very important; they are used in various fields, such as hydrology, chemistry, and economics; see, for example, Hurst [7], Jeffreys [8], and Student [9].
The most commonly used long-memory model, which deals with the modeling of cyclic behavior, is the Generalized Autoregressive Moving Average (GARMA) model. It was proposed by Gray et al. [10] to deal with non-additivity, non-normality, and heteroscedasticity in real time-series data.
This behavior has attracted considerable attention, prompting extensive research efforts over recent decades. For instance, Darmawan et al. [11] utilized a GARMA model to predict COVID-19 data trends in Indonesia; Albarracin et al. [12] analyzed the structure of GARMA models in practical scenarios; and Hunt et al. [13] introduced an R package, 'garma', specifically tailored for fitting and forecasting GARMA models in R (version 4.4.0).
The foundations of the methods for independent data have now been consolidated, and computational commands for the analysis of such data are provided by most of the available statistical software (e.g., R (version 4.4.1) and Python (version 3.13)).
In this paper, we will consider a linear quantile-regression model with Generalized Autoregressive Moving Average errors, defined as follows:
$$y_t = x_t^{\top}\beta(\tau) + \epsilon_t, \qquad t = 1, \dots, n,$$
where $y_t$ is the t-th observation of the response variable, $x_t$ is a $k \times 1$ covariate vector, $\beta(\tau)$ is a regression parameter vector, and $\epsilon_t$ is a stationary long-memory process generated by the GARMA(p,0) model, as follows:
$$\Phi(L)\left(1 - 2uL + L^{2}\right)^{d}\epsilon_t = e_t,$$
where $\Phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^{p}$, with $\phi = (\phi_1, \dots, \phi_p)^{\top}$ the autoregressive parameter vector, L is the backshift operator defined by $L\epsilon_t = \epsilon_{t-1}$, $\left(1 - 2uL + L^{2}\right)^{d}$ represents the long-memory Gegenbauer component with memory parameter d and Gegenbauer frequency parameter u, and $e_t$ is an i.i.d. process.
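To build intuition for the Gegenbauer long-memory component, the following R sketch simulates an approximate GARMA(1,0)-type error series by truncating the moving-average expansion of the Gegenbauer filter and then applying the AR(1) recursion; the truncation length, parameter values, and Gaussian innovations are illustrative assumptions rather than the paper's settings.

```r
# Gegenbauer coefficients C_j(d, u) from (1 - 2uL + L^2)^(-d) = sum_j C_j(d, u) L^j,
# computed with the standard three-term recursion.
gegenbauer_coefs <- function(m, d, u) {
  C <- numeric(m + 1)
  C[1] <- 1                      # C_0
  if (m >= 1) C[2] <- 2 * d * u  # C_1
  if (m >= 2) {
    for (j in 2:m) {
      C[j + 1] <- 2 * u * ((d - 1) / j + 1) * C[j] -
        (2 * (d - 1) / j + 1) * C[j - 1]
    }
  }
  C
}

# Simulate an approximate GARMA(1,0) error: Phi(L) (1 - 2uL + L^2)^d eps_t = e_t
simulate_garma_error <- function(n, phi, d, u, trunc = 1000) {
  C <- gegenbauer_coefs(trunc, d, u)
  e <- rnorm(n + trunc)                        # illustrative i.i.d. innovations
  eps_star <- stats::filter(e, C, sides = 1)   # long-memory part (truncated MA expansion)
  eps_star <- eps_star[(trunc + 1):(n + trunc)]
  as.numeric(stats::filter(eps_star, phi, method = "recursive"))  # AR(1) part
}

set.seed(123)
eps <- simulate_garma_error(n = 500, phi = 0.3, d = 0.2, u = 0.8)
acf(eps, lag.max = 50)  # slowly decaying, oscillating autocorrelations
```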
In this work, we aim to estimate the parameters of our models with the EM algorithm (see [14]), based on the method of Tian et al. [4]. Moreover, we will derive the asymptotic properties of our estimators.
The remainder of this paper is organized as follows: In Section 2, we provide the likelihood function and the estimation of our parameters by the EM algorithm. In Section 3, we derive the asymptotic properties of the estimators under some mild conditions. Some simulation results are provided in Section 4, and finally a real data example is presented in Section 5.
2. Estimation Procedure via EM Algorithm
The maximum-likelihood method is a widely recognized technique for deriving estimators, and one of the principal reasons for its popularity is that the resulting estimator has many desirable properties. It can be used to estimate the parameters of various statistical models, including quantile-regression models. However, in some complicated problems maximum-likelihood estimators are unsuitable or do not exist. In our setting, to estimate the quantile-regression coefficients $\beta(\tau)$ in model (1), it is natural to simultaneously estimate the regression coefficients and the GARMA parameters using quantile regression. Hence, in this section we focus on the estimation of the parameter vector of models (3) and (4) by maximum likelihood. It is also worth noting that maximum-likelihood estimation is related to other optimization techniques; in particular, if we assume a normal distribution, it is equivalent to the ordinary least squares method. In practice, the normality assumption on the random error may be abandoned for many reasons, such as outliers, contaminated data, and heavy-tailed distributions. The Laplace distribution is a good robust alternative in this case [15]. Yu and Moyeed [16] found that minimizing expression (1) is equivalent to maximizing a likelihood function formed from independently distributed asymmetric Laplace error densities (see [17]). In this work, we adopt a similar idea to build a likelihood function; the distribution of the process $e_t$ in model (3) is assumed to be an asymmetric Laplace distribution with location parameter zero, scale parameter $\sigma > 0$, and skewness parameter $\tau \in (0, 1)$. Let $e_t \sim \operatorname{ALD}(0, \sigma, \tau)$, with density given as follows:
$$f\left(e_t \mid \sigma, \tau\right) = \frac{\tau(1-\tau)}{\sigma}\exp\left\{-\rho_{\tau}\left(\frac{e_t}{\sigma}\right)\right\},$$
where $\rho_{\tau}(\cdot)$ is the quantile check function.
The mean and the variance of this distribution are given by:
$$E\left(e_t\right) = \sigma\,\frac{1-2\tau}{\tau(1-\tau)} \quad\text{and}\quad \operatorname{Var}\left(e_t\right) = \sigma^{2}\,\frac{1-2\tau+2\tau^{2}}{\tau^{2}(1-\tau)^{2}}$$
(see [18]).
The Laplace distribution was chosen for the error term in the GARMA model due to several advantages it offers. Firstly, it offers robustness to outliers: the Laplace distribution, also referred to as the double-exponential distribution, has heavier tails than the normal distribution, which makes it more robust to outliers in the data. In a GARMA error model, which aims to capture the time-series behavior of a process, robustness to outliers is essential for precise estimation and prediction. Secondly, unlike the normal distribution, the asymmetric Laplace distribution is able to capture skewness in the data. In time-series analysis, it is common to encounter data that exhibit asymmetric patterns; by using the asymmetric Laplace distribution for the error term in the GARMA model, we align the model's assumptions with the potential skewness in the data, thus improving its ability to capture the underlying patterns. Additionally, the Laplace distribution possesses straightforward mathematical properties that enhance its computational efficiency, which is beneficial when fitting the GARMA model to extensive datasets or when performing simulations for different scenarios.
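As a quick numerical check, assuming the asymmetric-Laplace density and moment expressions given above, the following R sketch integrates the density numerically and compares the result with the closed-form mean and variance; the parameter values are illustrative.

```r
# Asymmetric Laplace density with location 0, scale sigma, skewness tau
check_fun <- function(v, tau) v * (tau - (v < 0))
dald <- function(x, sigma, tau) tau * (1 - tau) / sigma * exp(-check_fun(x / sigma, tau))

sigma <- 1.5
tau   <- 0.3

# Moments by numerical integration of the density
m1 <- integrate(function(x) x   * dald(x, sigma, tau), -Inf, Inf)$value
m2 <- integrate(function(x) x^2 * dald(x, sigma, tau), -Inf, Inf)$value

c(mean_numeric = m1, mean_formula = sigma * (1 - 2 * tau) / (tau * (1 - tau)))
c(var_numeric = m2 - m1^2,
  var_formula = sigma^2 * (1 - 2 * tau + 2 * tau^2) / (tau^2 * (1 - tau)^2))
```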
Under such circumstances, the maximum-likelihood estimates for the parameters $\beta(\tau)$, $\sigma$, and $\phi$ are obtained by maximizing the marginal density, $f\left(y \mid \beta(\tau), \sigma, \phi\right)$, which is obtained by replacing $\epsilon_t$ with $y_t - x_t^{\top}\beta(\tau)$ in Equation (3), that is,
$$L\left(\beta(\tau), \sigma, \phi\right) = \prod_{t=1}^{n}\frac{\tau(1-\tau)}{\sigma}\exp\left\{-\rho_{\tau}\left(\frac{e_t^{*}}{\sigma}\right)\right\}, \qquad e_t^{*} = \Phi(L)\left(1 - 2uL + L^{2}\right)^{d}\left(y_t - x_t^{\top}\beta(\tau)\right).$$
The maximization of (4) is equivalent to the maximization of the logarithm of the likelihood function:
$$\ell\left(\beta(\tau), \sigma, \phi\right) = n\log\frac{\tau(1-\tau)}{\sigma} - \sum_{t=1}^{n}\rho_{\tau}\left(\frac{e_t^{*}}{\sigma}\right).$$
Clearly, the objective function has no closed form. Hence, it is not possible to estimate the quantile-regression coefficients directly, because this objective function is non-convex with respect to $\beta(\tau)$ and is not differentiable (the check function $\rho_{\tau}$ is not differentiable at 0). Therefore, to overcome this problem, we suggest the use of the property of the asymmetric Laplace distribution as a normal distribution conditional on an exponential mixing distribution, proposed by Yu and Moyeed [16], as follows:
$$e_t = \theta z_t + \kappa\sqrt{\sigma z_t}\,u_t,$$
where $\theta = \dfrac{1-2\tau}{\tau(1-\tau)}$, $\kappa^{2} = \dfrac{2}{\tau(1-\tau)}$, $z_t$ is exponentially distributed with mean $\sigma$, and $u_t \sim N(0, 1)$, with $z_t$ and $u_t$ independent.
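The mixture representation above can be verified by simulation; the following R sketch draws $\theta z_t + \kappa\sqrt{\sigma z_t}\,u_t$ with $z_t$ exponential with mean $\sigma$ and compares the histogram with the asymmetric-Laplace density used earlier. The parameterization follows the common normal-exponential mixture form recalled above, and the parameter values are illustrative.

```r
set.seed(42)
tau   <- 0.25
sigma <- 1
theta <- (1 - 2 * tau) / (tau * (1 - tau))
kappa <- sqrt(2 / (tau * (1 - tau)))

n <- 1e5
z <- rexp(n, rate = 1 / sigma)                  # exponential mixing variable with mean sigma
u <- rnorm(n)
e <- theta * z + kappa * sqrt(sigma * z) * u    # normal-exponential mixture draws

# Asymmetric Laplace density for comparison
check_fun <- function(v, tau) v * (tau - (v < 0))
dald <- function(x, sigma, tau) tau * (1 - tau) / sigma * exp(-check_fun(x / sigma, tau))

hist(e, breaks = 100, freq = FALSE, main = "ALD via normal-exponential mixture")
curve(dald(x, sigma, tau), add = TRUE, lwd = 2)
```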
This proposal should be seen as a reformulation of the previous objective function, using the Gegenbauer polynomial coefficients $C_j(d, u)$, defined by the expansion:
$$\left(1 - 2uL + L^{2}\right)^{-d} = \sum_{j=0}^{\infty} C_j(d, u)\,L^{j}.$$
Then, under the previous results, Equations (3), (8) and (9) lead to the following explicit expression:
$$e_t^{*} = \Phi(L)\left(1 - 2uL + L^{2}\right)^{d}\left(y_t - x_t^{\top}\beta(\tau)\right) = \theta z_t + \kappa\sqrt{\sigma z_t}\,u_t.$$
Therefore, we can conclude that $y_t$ is conditionally normally distributed given the $z_t$ (see [5] for more details). Then, the quantile-regression model here is represented as a normal regression model. The full joint density for $(y, z)$ is given by:
$$f\left(y, z \mid \beta(\tau), \sigma, \phi\right) = \prod_{t=1}^{n}\frac{1}{\sqrt{2\pi\sigma_t^{2}}}\exp\left\{-\frac{\left(e_t^{*} - \mu_t\right)^{2}}{2\sigma_t^{2}}\right\}\,\frac{1}{\sigma}\exp\left\{-\frac{z_t}{\sigma}\right\},$$
with $\mu_t = \theta z_t$ and $\sigma_t^{2} = \kappa^{2}\sigma z_t$.
The estimators of the unknown quantile-regression parameters, $\beta(\tau)$, the scale $\sigma$, and the vector parameters of the GARMA model, $\phi$, may be obtained by maximizing the likelihood function:
$$L\left(\beta(\tau), \sigma, \phi \mid y, z\right) = \prod_{t=1}^{n}\frac{1}{\sqrt{2\pi\kappa^{2}\sigma z_t}}\exp\left\{-\frac{\left(e_t^{*} - \theta z_t\right)^{2}}{2\kappa^{2}\sigma z_t}\right\}\frac{1}{\sigma}\exp\left\{-\frac{z_t}{\sigma}\right\}.$$
The corresponding log-likelihood function is given by:
$$\ell\left(\beta(\tau), \sigma, \phi \mid y, z\right) = -\frac{n}{2}\log\left(2\pi\kappa^{2}\sigma\right) - n\log\sigma - \frac{1}{2}\sum_{t=1}^{n}\log z_t - \sum_{t=1}^{n}\frac{\left(e_t^{*} - \theta z_t\right)^{2}}{2\kappa^{2}\sigma z_t} - \sum_{t=1}^{n}\frac{z_t}{\sigma}.$$
Due to the unobserved variables, $z_t$, direct maximization of the likelihood becomes intractable. In this case, solving Equation (2) requires an iterative algorithm. The EM algorithm (see [20]) is a broad method for finding the maximum-likelihood estimates of the parameters of an underlying distribution from a given dataset that has missing values. In the next section, we present the Expectation-Maximization (EM) algorithm, which is derived from likelihood optimization.
2.1. EM Algorithm
The Expectation-Maximization (EM) algorithm [
14] is a commonly utilized method for finding maximum-likelihood estimates in statistical models that depend on unobserved latent variables or missing data. A latent variable is a variable that affects the observed data but cannot be observed directly. The EM algorithm is an iterative approach that alternates between two steps. The first attempts to estimate the missing or latent variables; this is known as the expectation step or E-step. The second attempts to optimize the parameters of the model to best explain the data; this is known as the maximization step or M-step.
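In generic terms, the E-step/M-step alternation described above can be organized as the following R skeleton; e_step() and m_step() are hypothetical placeholders for the model-specific computations derived in the remainder of this section, and the stopping rule is an illustrative choice.

```r
# Generic EM skeleton: alternate E- and M-steps until the parameter vector stabilizes
em_fit <- function(y, X, theta_init, e_step, m_step, tol = 1e-6, max_iter = 500) {
  theta <- theta_init
  for (h in seq_len(max_iter)) {
    latent    <- e_step(y, X, theta)   # E-step: expected latent quantities given current theta
    theta_new <- m_step(y, X, latent)  # M-step: maximize the Q-function given the E-step output
    if (max(abs(theta_new - theta)) < tol) {
      theta <- theta_new
      break
    }
    theta <- theta_new
  }
  theta
}
```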
In the model (10), the EM iterations are based on regarding the random variables $(z_t)$ as a set of unobserved latent variables or missing values. We seek to estimate our model by maximizing the complete-data log-likelihood, which is then denoted in Equation (12) by $\ell_c\left(\Theta \mid y, z\right)$, where $\Theta = \left(\beta(\tau)^{\top}, \phi^{\top}, \sigma\right)^{\top}$ are the unknown parameter vectors for which we wish to find the MLE. To apply the EM algorithm for estimation, we must first find the conditional pdf of the unobserved variable $z_t$, in order to estimate the missing variables in the dataset. Indeed, since $z_t$, in Equation (8), is assumed to be exponentially distributed with mean $\sigma$ and $u_t$ is assumed to be $N(0,1)$, it can be observed that the conditional distribution of $z_t$, given y, is given by:
$$z_t \mid y, \Theta \sim \operatorname{GIG}\left(\frac{1}{2},\; \frac{\theta^{2}}{\kappa^{2}\sigma} + \frac{2}{\sigma},\; \frac{e_t^{*2}}{\kappa^{2}\sigma}\right),$$
where GIG denotes the Generalized Inverse Gaussian distribution, a three-parameter distribution introduced by Good [21] that has been applied in a variety of fields of statistics. More recently, Sánchez et al. [22] used the GIG distribution as a mixing distribution to estimate the parameters of quantile-regression models. Furthermore, note that the r-th moment of $Z \sim \operatorname{GIG}(\lambda, a, b)$ is given by:
$$E\left(Z^{r}\right) = \left(\frac{b}{a}\right)^{r/2}\frac{K_{\lambda+r}\left(\sqrt{ab}\right)}{K_{\lambda}\left(\sqrt{ab}\right)},$$
and, in particular,
$$E\left(Z^{-1}\right) = \left(\frac{a}{b}\right)^{1/2}\frac{K_{\lambda-1}\left(\sqrt{ab}\right)}{K_{\lambda}\left(\sqrt{ab}\right)},$$
where K is a modified Bessel function of the second kind (see [23] for more details). The above relations (14) and (15) will be useful in calculating the conditional expectation of the log-likelihood function, with respect to the conditional distribution of z given y, when applying the following two EM steps, described as follows:
To apply EM to model (10) (with GARMA errors), we start by writing down the expected complete-data log-likelihood, given by Equation (2), known as the Q-function, $Q\left(\Theta \mid \Theta^{(h)}\right)$, where $E_{z \mid y, \Theta^{(h)}}$ means that the expectation is taken using $\Theta^{(h)}$ for $\Theta$, with respect to the unknown data z given the observed data y and the current parameter estimates $\Theta^{(h)}$, which is the estimated value at the h-th iteration. Calculating this expectation at the (h+1)-th iteration yields
$$Q\left(\Theta \mid \Theta^{(h)}\right) = E_{z \mid y, \Theta^{(h)}}\left[\ell_c\left(\Theta \mid y, z\right)\right].$$
Then,
$$Q\left(\Theta \mid \Theta^{(h)}\right) = \mathrm{const} - \frac{n}{2}\log\left(\kappa^{2}\sigma\right) - n\log\sigma - \frac{1}{2\kappa^{2}\sigma}\sum_{t=1}^{n}\left[\widehat{z_t^{-1}}\,e_t^{*2} - 2\theta e_t^{*} + \theta^{2}\widehat{z_t}\right] - \frac{1}{\sigma}\sum_{t=1}^{n}\widehat{z_t},$$
where we have defined the pseudo-values $\widehat{z_t} = E\left(z_t \mid y, \Theta^{(h)}\right)$, $\widehat{z_t^{-1}} = E\left(z_t^{-1} \mid y, \Theta^{(h)}\right)$, and $e_t^{*} = \Phi(L)\left(1 - 2uL + L^{2}\right)^{d}\left(y_t - x_t^{\top}\beta(\tau)\right)$.
For evaluating (2), it is necessary to compute the conditional expectations $\widehat{z_t}$ and $\widehat{z_t^{-1}}$, as in (14) and (15), which depend on the conditional pdf of z in Equation (11).
These expressions are straightforward to derive using Equations (14) and (15), as discussed in Eberlein and Hammerstein [24]. Evaluating the moments of the conditional GIG distribution at $\Theta^{(h)}$ gives:
$$\widehat{z_t} = \sqrt{\frac{b_t^{(h)}}{a^{(h)}}}\;\frac{K_{3/2}\left(\sqrt{a^{(h)}b_t^{(h)}}\right)}{K_{1/2}\left(\sqrt{a^{(h)}b_t^{(h)}}\right)} \quad\text{and}\quad \widehat{z_t^{-1}} = \sqrt{\frac{a^{(h)}}{b_t^{(h)}}}\;\frac{K_{-1/2}\left(\sqrt{a^{(h)}b_t^{(h)}}\right)}{K_{1/2}\left(\sqrt{a^{(h)}b_t^{(h)}}\right)},$$
where $a^{(h)}$ and $b_t^{(h)}$ denote the GIG parameters of Equation (11) evaluated at $\Theta^{(h)}$.
We can calculate $K_{3/2}$ explicitly by applying the well-known recursion formula defined by $K_{\nu+1}(x) = K_{\nu-1}(x) + \frac{2\nu}{x}K_{\nu}(x)$, together with $K_{1/2}(x) = K_{-1/2}(x) = \sqrt{\frac{\pi}{2x}}\,e^{-x}$, resulting in the following expression:
$$K_{3/2}(x) = \sqrt{\frac{\pi}{2x}}\,e^{-x}\left(1 + \frac{1}{x}\right).$$
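Both the GIG moment formula and the explicit expression for $K_{3/2}$ can be checked numerically in R with the base besselK function; the sketch below uses the GIG(λ, a, b) parameterization adopted above, and the numeric values are illustrative.

```r
# r-th moment of Z ~ GIG(lambda, a, b), with density proportional to
# z^(lambda - 1) * exp(-(a * z + b / z) / 2)
gig_moment <- function(r, lambda, a, b) {
  s <- sqrt(a * b)
  (b / a)^(r / 2) * besselK(s, lambda + r) / besselK(s, lambda)
}

# E(Z) and E(1/Z) for lambda = 1/2, the case arising in the E-step
a <- 2.5
b <- 1.2
c(EZ = gig_moment(1, 1 / 2, a, b), EZinv = gig_moment(-1, 1 / 2, a, b))

# Numerical check of K_{3/2}(x) = sqrt(pi / (2x)) * exp(-x) * (1 + 1/x)
x <- 1.7
c(besselK(x, 3 / 2), sqrt(pi / (2 * x)) * exp(-x) * (1 + 1 / x))
```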
The above E-step of the EM algorithm simply uses relations (14) and (15) to calculate the expectation of the unobservable information, z, given the observed data and the current estimates of the unknown parameters, $\Theta^{(h)}$.
In the M-step, the parameters are re-estimated by maximizing the Q-function to find the new estimate $\Theta^{(h+1)}$ by solving the maximization step; the M-step procedure is as follows:
$$\Theta^{(h+1)} = \arg\max_{\Theta} Q\left(\Theta \mid \Theta^{(h)}\right),$$
where each parameter estimate can be acquired by partially maximizing the objective Q-function:
- (A.1) $\beta^{(h+1)}(\tau) = \arg\max_{\beta(\tau)} Q\left(\beta(\tau), \phi, \sigma \mid \Theta^{(h)}\right)$, with the remaining parameters held at their current values;
- (A.2) $\phi^{(h+1)} = \arg\max_{\phi} Q\left(\beta(\tau), \phi, \sigma \mid \Theta^{(h)}\right)$, with the remaining parameters held at their current values;
- (A.3) $\sigma^{(h+1)} = \arg\max_{\sigma} Q\left(\beta(\tau), \phi, \sigma \mid \Theta^{(h)}\right)$, with the remaining parameters held at their current values.
In this regard, starting values, $\Theta^{(0)}$, are necessary in order to initiate the iterative procedure. These can be obtained from the Least Squares Estimates (LSE) method (see [25]).
An estimate of $\Theta$ can be obtained by equating the score vector, comprising the first-order partial derivatives of $Q\left(\Theta \mid \Theta^{(h)}\right)$ with respect to each parameter, denoted by $\partial Q/\partial\Theta$, to the zero vector, leading to the Q-function equations. In particular, we have the following explicit expressions:
2.1.1. Estimator of $\beta(\tau)$
Let $\beta^{(h+1)}(\tau)$ be the parameter value at a local maximum of the Q-function in Equation (2). Differentiating the previous expression with respect to $\beta(\tau)$, we find
$$\frac{\partial Q}{\partial \beta(\tau)} = \frac{1}{\kappa^{2}\sigma}\sum_{t=1}^{n} x_t^{*}\left[\widehat{z_t^{-1}}\left(y_t^{*} - x_t^{*\top}\beta(\tau)\right) - \theta\right] = 0,$$
where $y_t^{*} = \Phi(L)\left(1 - 2uL + L^{2}\right)^{d}y_t$ and $x_t^{*} = \Phi(L)\left(1 - 2uL + L^{2}\right)^{d}x_t$ are the filtered response and covariates. In matrix form, we have:
$$X^{*\top}W\left(y^{*} - X^{*}\beta(\tau)\right) - \theta\,X^{*\top}\mathbf{1}_n = 0,$$
where:
$$X^{*} = \left(x_1^{*}, \dots, x_n^{*}\right)^{\top}$$
and
$$W = \operatorname{diag}\left(\widehat{z_1^{-1}}, \dots, \widehat{z_n^{-1}}\right).$$
We obtain the (h+1)-th iteration estimator of the parameter $\beta(\tau)$ as follows:
$$\beta^{(h+1)}(\tau) = \left(X^{*\top}WX^{*}\right)^{-1}X^{*\top}\left(Wy^{*} - \theta\,\mathbf{1}_n\right).$$
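As an illustration, a minimal R sketch of the weighted-least-squares-type update above is given below; it assumes the filtered design matrix, filtered response, pseudo-values, and θ have already been computed at the current iteration, and all variable and function names are hypothetical.

```r
# One update of beta(tau) as a weighted least-squares solve, assuming:
#   Xs       - filtered design matrix (n x k)
#   ys       - filtered response vector (length n)
#   zinv_hat - pseudo-values E(1/z_t | y) from the E-step (length n)
#   theta    - the constant (1 - 2*tau) / (tau * (1 - tau))
update_beta <- function(Xs, ys, zinv_hat, theta) {
  W <- diag(zinv_hat)
  solve(t(Xs) %*% W %*% Xs, t(Xs) %*% (zinv_hat * ys - theta))
}
```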
2.1.2. Estimator of $\phi$
Now, we apply the same procedure for our parameters $\phi_1, \dots, \phi_p$. For $\phi_j$, $j = 1, \dots, p$, we have:
$$\frac{\partial Q}{\partial \phi_j} = \frac{1}{\kappa^{2}\sigma}\sum_{t} w_{t-j}\left[\widehat{z_t^{-1}}\left(w_t - \sum_{k=1}^{p}\phi_k w_{t-k}\right) - \theta\right] = 0,$$
where $w_t = \left(1 - 2uL + L^{2}\right)^{d}\left(y_t - x_t^{\top}\beta(\tau)\right)$. Then, we write the matrix form as follows:
$$V^{\top}W\left(w - V\phi\right) - \theta\,V^{\top}\mathbf{1}_n = 0,$$
where $w = \left(w_1, \dots, w_n\right)^{\top}$, and $V$ is the matrix whose t-th row is $\left(w_{t-1}, \dots, w_{t-p}\right)$, for $t = 1, \dots, n$. Therefore, the (h+1)-th iteration estimators of the parameters $\phi$
are given by:
$$\phi^{(h+1)} = \left(V^{\top}WV\right)^{-1}V^{\top}\left(Ww - \theta\,\mathbf{1}_n\right).$$
2.1.3. Estimator of $\sigma$
Finally, for $\sigma$, differentiating the Q-function and equating the derivative to zero, we have:
$$\sigma^{(h+1)} = \frac{1}{3n}\left(\frac{1}{\kappa^{2}}\sum_{t=1}^{n}\left[\widehat{z_t^{-1}}\,e_t^{*2} - 2\theta e_t^{*} + \theta^{2}\widehat{z_t}\right] + 2\sum_{t=1}^{n}\widehat{z_t}\right).$$
4. Simulation
In this section, we present a simulation study in order to illustrate the performance of our proposed estimation procedure of $\Theta = \left(\beta(\tau)^{\top}, \phi^{\top}, \sigma\right)^{\top}$ under different scenarios. This study was performed based on the QR model with GARMA errors, designed with two covariates, each measured on n samples. The response variable, $y_t$, is generated from the following equation:
$$y_t = \beta_1(\tau)\,x_{t1} + \beta_2(\tau)\,x_{t2} + \epsilon_t,$$
where $\epsilon_t$ is generated from the GARMA error model defined above.
The simulation scenario considers the following setting: the sample size (n) is fixed at three increasing values, combinations of the vector of true parameter values are specified, and different degrees of asymmetry are considered by choosing three quantile points ($\tau$) for estimation, with 500 Monte Carlo replications for each n.
The covariate values for $x_{t1}$ and $x_{t2}$ are i.i.d. random vectors generated from the standard normal distribution. The data for the error term, $e_t$, are also generated from different distributions: the Gaussian density, $N(0, 1)$; the density function of the Student-t distribution with $\nu$ degrees of freedom; and the density function of the skewed-t distribution with $\nu$ degrees of freedom. The performance and recovery of the estimators are assessed by the bias and the Root Mean Square Error (RMSE), to ensure that the estimated parameters have small bias and small variance; these are given, respectively, by:
$$\operatorname{Bias}\left(\hat{\vartheta}\right) = \frac{1}{R}\sum_{i=1}^{R}\left(\hat{\vartheta}_i - \vartheta\right) \quad\text{and}\quad \operatorname{RMSE}\left(\hat{\vartheta}\right) = \sqrt{\frac{1}{R}\sum_{i=1}^{R}\left(\hat{\vartheta}_i - \vartheta\right)^{2}},$$
where $\vartheta$ is the true parameter value, $\hat{\vartheta}_i$ is its maximum-likelihood estimate obtained with the EM algorithm in the i-th replication, and R = 500 is the number of Monte Carlo replications. The Monte Carlo simulation experiments were performed using the R software (version 4.4.1); see www.r-project.org, accessed on 31 October 2023.
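The two performance measures can be computed over the Monte Carlo replications with a few lines of R; the replication values used below are simulated placeholders rather than the study's output.

```r
# Bias and RMSE of a set of Monte Carlo estimates around the true value
bias_rmse <- function(estimates, true_value) {
  c(bias = mean(estimates - true_value),
    rmse = sqrt(mean((estimates - true_value)^2)))
}

# Illustrative use with 500 placeholder replications of a single parameter
set.seed(7)
est <- 0.5 + rnorm(500, sd = 0.1)   # placeholder estimates around a true value of 0.5
bias_rmse(est, true_value = 0.5)
```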
The results derived from simulation studies are presented in
Table 1,
Table 2 and
Table 3. A look at the results in all these tables allows us to conclude that, as expected, in general, as the sample size increases, the bias and RMSE of the estimators decrease for all values of the quantile points ($\tau$) considered. Moreover, $\hat{\beta}(\tau)$, $\hat{\phi}$, and $\hat{\sigma}$ all seem to be consistent and asymptotically normally distributed.
5. Real Data Example
To evaluate the quantile-regression model with GARMA errors formulated in this paper, we used the Engel food expenditure dataset (engel), sourced from the R package quantreg. Koenker and Bassett [27] utilized this dataset to exemplify the application of quantile regression in R on real-world data.
In Figure 1, displayed below, Koenker and Bassett show the scatter plot of the original data on double-log axes. Fitted regression quantile lines are superimposed on the same figure. The estimated slope parameters (Engel elasticities) are 0.8358, 0.8326, 0.8780, and 0.9170 for the quantiles corresponding to the 20th, 40th, 60th, and 80th percentiles, respectively. It is interesting to note in this example that the fitted quantile lines indicate an increasing conditional scale effect.
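For reference, the classical Engel quantile fits described above can be reproduced with the quantreg package; the sketch below fits the four quantile levels on log10 scales, which is one common way to obtain the elasticities, though the exact transformation behind Figure 1 is assumed here.

```r
library(quantreg)
data(engel)   # Engel food expenditure data: income and foodexp

taus <- c(0.2, 0.4, 0.6, 0.8)
fit  <- rq(log10(foodexp) ~ log10(income), tau = taus, data = engel)
coef(fit)     # slopes are the estimated Engel elasticities at each quantile level

plot(log10(engel$income), log10(engel$foodexp),
     xlab = "log10(income)", ylab = "log10(food expenditure)")
for (j in seq_along(taus)) abline(coef = coef(fit)[, j], lty = j)
```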
To investigate the influence of the GARMA error model's significance in quantile-regression models, we suggest the following approach, with the regression part specified as
$$\log\left(\text{foodexp}_t\right) = \beta_0(\tau) + \beta_1(\tau)\log\left(\text{income}_t\right) + \epsilon_t,$$
with the GARMA errors expressed as
$$\Phi(L)\left(1 - 2uL + L^{2}\right)^{d}\epsilon_t = e_t,$$
where $e_t$ is assumed to follow the asymmetric Laplace distribution of Section 2.
Our methodology commences by estimating the model parameters at the quantile levels 0.2, 0.4, 0.6, and 0.8 via our method described in Section 2 with the R software (version 4.4.1).
Table 4 shows the estimated coefficients (Estimate) and the corresponding predictions at the four quantile levels $\tau = 0.2, 0.4, 0.6$, and $0.8$.
Upon examination of Table 4, the estimates associated with the covariate "income" consistently increase as $\tau$ increases, in accordance with our expectations. Moreover, the predicted values are close to the true values and tend to increase as $\tau$ increases, as expected. This closeness to the true values not only validates the efficacy of our method but also underscores its reliability in making accurate predictions.