A General Family of Autoregressive Conditional Duration Models Applied to High-Frequency Financial Data

In this paper, we propose a general family of Birnbaum–Saunders autoregressive conditional duration (BS-ACD) models based on generalized Birnbaum–Saunders (GBS) distributions, denoted by GBS-ACD. We further generalize these GBS-ACD models by using a Box-Cox transformation with a shape parameter λ to the conditional median dynamics and an asymmetric response to shocks; this is denoted by GBS-AACD. We then carry out a Monte Carlo simulation study to evaluate the performance of the GBS-ACD models. Finally, an illustration of the proposed models is made by using New York stock exchange (NYSE) transaction data.


Introduction
The modeling of high-frequency financial data has been the focus of intense interest over the last decades. A prominent approach to modeling the durations between successive events (trades, quotes, price changes, etc.) was introduced by Engle and Russell (1998). These authors proposed the autoregressive conditional duration (ACD) model, which has some similarities with the ARCH (Engle 1982) and GARCH (Bollerslev 1986) models. The usefulness of appropriately modeling duration data is stressed by the relatively recent market microstructure literature; see Diamond and Verrechia (1987), Easley and O'Hara (1992), and Easley et al. (1997). Generalizations of the original ACD model are basically based on the following three aspects, i.e., (a) the distributional assumption in order to yield a unimodal failure rate (FR) (Grammig and Maurer 2000;Lunde 1999), (b) the linear form for the conditional mean (median) dynamics (Allen et al. 2008;Bauwens and Giot 2000;Fernandes and Grammig 2006), and (c) the time series properties (Bauwens and Giot 2003;Chiang 2007;De Luca and Zuccolotto 2006;Jasiak 1998;Zhang et al. 2001); see the reviews by Pacurar (2008) and Bhogal and Variyam Thekke (2019). Bhatti (2010) proposed a generalization of the ACD model that falls into all three branches above, based on the Birnbaum-Saunders (BS) distribution, denoted as the BS-ACD model. This model has several advantages over the traditional ACD ones; in particular, the BS-ACD model (1) has a realistic distributional assumption, that is, it provides both an asymmetric probability density function (PDF) and a unimodal FR shape; (2) it provides a natural parametrization of the point process in terms of a conditional median duration which is expected to improve the model fit despite a conditional mean duration, since the median is generally considered to be a better measure of central tendency than the mean for asymmetrical and heavy-tailed distributions; and (3) has easy implementation for estimation; see Ghosh and Mukherjee (2006), Bhatti (2010), Leiva et al. (2014), and Saulo et al. (2019).
Based on the relationship between the BS and symmetric distributions, Díaz-García and Leiva (2005) introduced generalized BS (GBS) distributions, obtaining a wider class of distributions that has either lighter or heavier tails than the BS density, allowing them to provide more flexibility. This new class essentially provides flexibility in the kurtosis level; see Sanhueza et al. (2008). In addition, the GBS distributions produce models whose parameter estimates are often robust to atypical data; see Leiva et al. (2008) and Barros et al. (2008). The GBS family includes as special cases the BS, BS-Laplace (BS-LA), BS-Logistic (BS-LO), BS-power-exponential (BS-PE), and BS-Student-t (BS-t) distributions.
The main aim of this work is to generalize the BS-ACD model, which was proposed by Bhatti (2010), based on GBS distributions (GBS-ACD). The proposed models should hold with the properties of the BS-ACD model, but, in addition, they should provide further properties and more flexibility. As mentioned before, the GBS family has models that have heavier tails than the BS density, and this characteristic is very useful in the modeling of high-frequency financial durations, since duration data are heavy-tailed and heavily right-skewed. We subsequently develop a class of augmented GBS-ACD (GBS-AACD) models by using the Box-Cox transformation (Box and Cox 1964) with a shape parameter λ ≥ 0 to the conditional duration process and an asymmetric response to shocks; see Fernandes and Grammig (2006). Thus, the proposed GBS-ACD and GBS-AACD models would provide greater range and flexibility while fitting data. We apply the proposed models to high-frequency financial transaction (trade duration, TD) data. This type of data has unique features absent in data with low frequencies.
For example, TD data (1) inherently arrive in irregular time intervals, (2) possess a large number of observations, (3) exhibit some diurnal pattern, i.e., activity is higher near the beginning and closing than in the middle of the trading day, and (4) present a unimodal failure rate; see Engle and Russell (1998) and Bhatti (2010). In addition, TD data have a relevant role in market microstructure theory, since they can be used as proxies for the existence of information in the market, and then serve as predictors for other market microstructure variables; see Mayorov (2011).
The rest of the paper proceeds as follows. Section 2 describes the BS and GBS distributions. In addition, some propositions are presented. Section 3 derives the GBS-ACD models associated with these distributions. Section 4 derives the GBS-AACD class of models. A Monte Carlo study of the proposed GBS-ACD model is performed in Section 5. Next, Section 6 presents an application of the proposed models to three data sets of New York stock exchange (NYSE) securities, and their fits are then assessed by a goodness-of-fit test. Finally, Section 7 offers some concluding remarks.

The Birnbaum-Saunders Distribution and Its Generalization
In this section, we describe the BS and GBS distributions and some of their properties. The two-parameter BS distribution was introduced by Birnbaum and Saunders (1969) for modeling failure times of a material exposed to fatigue. They assumed that the fatigue failure follows from the development and growth of a dominant crack. Let θ = (κ, σ) and Expressions for the first, second, and third derivatives of the function a(·; θ) are, respectively, given by a (x; θ) = 3 A random variable (RV) X has a BS distribution with parameter vector θ = (κ, σ) , denoted by BS(θ), if it can be expressed as where a −1 (·; θ) denotes the inverse function of a(·; θ), κ is a shape parameter, and when it decreases to zero, the BS distribution approaches the normal distribution with mean σ and variance τ, such that τ → 0 when κ → 0. In addition, σ is a scale parameter and also the median of the distribution F BS (σ) = 0.5, where F BS is the BS cumulative distribution function (CDF). The BS distribution holds proportionality and reciprocal properties given by b X ∼ BS(κ, b σ), with b > 0, and 1/X ∼ BS(κ, 1/σ); see Saunders (1974). The probability density function (PDF) of a two-parameter BS random variable X is given by where φ(·) denotes the PDF of the standard normal distribution. Díaz-García and Leiva (2005) proposed the GBS distribution by assuming that Z in (3) follows a symmetric distribution in R, denoted by X ∼ GBS(θ, g), where g is a density generator function associated with the particular symmetric distribution. An RV Z has a standard symmetric distribution, denoted by Z ∼ S(0, 1; g) ≡ S(g), if its density takes the form f Z (z) = c g(z 2 ) for z ∈ R, where g(u) with u > 0 is a real function that generates the density of Z, and c is the normalization constant, that is, c = 1/ +∞ −∞ g(z 2 )dz. Note that U = Z 2 ∼ Gχ 2 (1; g), namely, U has a generalized chi-squared (Gχ 2 ) distribution with one degree of freedom and density generator g; see Fang et al. (1990). Table 1 presents some characteristics and the values of u 1 (g), u 2 (g), u 3 (g), and u 4 (g) for the Laplace, logistic, normal, power-exponential (PE) and Student-t symmetric distributions, where u r (g) = E[U r ] denotes the rth moment of U. Table 1. Constants (c and c g 2 ), density generators (g), and expressions of some moments u r (g) for the indicated distributions.
Consider an RV Z such that Z = a(X; θ) ∼ S(g) so that The density associated with X in (5) is given by where, as mentioned earlier, g is the generator and c the normalizing constant associated with a particular symmetric density; see Table 1. The mean and variance of X are, respectively, where u r = u r (g) = E[U r ], with U ∼ Gχ 2 (1, g); see Table 1. Based on Table 1, the expressions for the BS-LA, BS-LO, BS-PE, and BS-t densities are as follows: x > 0 and κ, σ, η > 0.
Note that if η = 1 (BS-PE) or if η → ∞ (BS-t), then we obtain the BS distribution. It is worthwhile to point out that the BS-PE distribution has a greater (smaller) kurtosis than that of the BS distribution when η < 1 (η > 1). In addition, the BS-t distribution has a greater degree of kurtosis than that of the BS distribution for η > 8; see Marchant et al. (2013). Let An alternative way to obtain GBS distributions is through sinh-symmetric (SHS) distributions. Díaz-García and Domínguez-Molina (2006) proposed SHS distributions by using the sinh-normal distribution introduced by Rieck and Nedelman (1991) in the symmetric case. They assumed the standard symmetrically distributed RV Z as follows: Then, The density associated with H in (10) is given by where g and c are as given in (6). A prominent result, which will be useful later on, is the following.

Existing ACD Models
Let X i = T i − T i−1 denote the trade duration, i.e., the time elapsed between two transactions occurring at times T i and T i−1 . Engle and Russell (1998) assumed that the serial dependence commonly found in financial duration data is described by ψ i = E[X i |F i−1 ], where ψ i stands for the conditional mean of the ith trade duration based on the conditioning information set F i−1 , which includes all information available at time T i−1 . The basic ACD(r, s) model is then defined as where r and s refer to the orders of the lags and {ε i } is an independent and identically distributed nonnegative sequence with PDF f (·). Engle and Russell (1998) assumed a linear ACD(1,1) model defined by ψ i = α + βx i−1 + γψ i−1 , where α, β, and γ are the parameters. Note that a wide range of ACD model specifications may be defined by different distributions of ε i and specifications of ψ i ; see Fernandes and Grammig (2006) and Pathmanathan et al. (2009). An alternative ACD model is the Birnbaum-Saunders autoregressive conditional duration (BS-ACD) model proposed by Bhatti (2010). This approach takes into account the natural scale parameter in the BS(θ) distribution to specify the BS-ACD model in terms of a time-varying conditional median duration σ i = F −1 BS (0.5|F i−1 ), where F BS denotes the CDF of the BS distribution. This specification has several advantages over the existing ACD models, as previously mentioned, including a realistic distributional assumption-an expected improvement in the model fit as a result of the natural parametrization in terms of the conditional median duration, since the median is generally considered to be a better measure of central tendency than the mean for asymmetrical and heavy-tailed distributions-and ease of fitting.
The PDF associated with the BS-ACD(r, s) model is given by where

GBS-ACD Models
We now extend the class of BS-ACD(r, s) models by using the GBS distributions. As explained earlier, this family of distributions possesses either lighter or heavier tails than the BS density, thus providing more flexibility. From the density given in (6), the GBS-ACD(r, s) model can be obtained in a way analogous to that provided for the BS-ACD(r, s) model in (13) with an associated PDF expressed as where c and g are as given in (6), θ i = (κ, σ i ) for i = 1, . . . , n, with where ξ = (κ, α, β 1 , . . . , β r , γ 1 , . . . , γ s ) and ζ = (ζ 1 , . . . , ζ k ) denotes the additional parameters required by the density function in (6).

Properties
Proposition 2 (Expected value of logarithmic duration in the GBS-ACD(r, s) model). Assuming that the The constant (depending only on the kernel g) u 1 is given in (7).

Estimation
Let (X 1 , . . . , X n ) be a sample from X i ∼ GBS(θ i , g) for i = 1, . . . , n, and let x = (x 1 , . . . , x n ) be the observed durations. Then, the log-likelihood function for ξ = (κ, α, β 1 , . . . , β r , γ 1 , . . . , γ s ) is obtained as where the time-varying conditional median σ i is given as in (16). The maximum-likelihood (ML) estimates can be obtained by maximizing the expression defined in (18) by equating the score vector˙ GBS (ξ), which contains the first derivatives of GBS (ξ), to zero, providing the likelihood equations. They must be solved by an iterative procedure for non-linear optimization, such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton method. It can easily be seen that the first-order partial derivatives of GBS (ξ) are The asymptotic distribution of the ML estimator ξ can be used to perform inference for ξ. This estimator is consistent and has an asymptotic multivariate normal joint distribution with mean ξ and covariance matrix Σ ξ , which may be obtained from the corresponding expected Fisher information matrix I(ξ). Then, Notice that I(ξ) −1 is a consistent estimator of the asymptotic variance-covariance matrix of ξ. Here, we approximate the expected Fisher information matrix by its observed version obtained from the Hessian matrix¨ GBS (ξ), which contains the second derivatives of GBS (ξ). The elements of the Hessian are expressed as follows: for each u, v ∈ {κ, α, β 1 , . . . , β r , γ 1 , . . . , γ s }, where The partial derivatives are given in (19). Furthermore, the second-order partial derivatives of a(x i ; θ i ) and a (x i ; θ i ) in (21), respectively, are given by Note that the functions a(x i ; θ i ) and a (x i ; θ i ) have continuous second-order partial derivatives at a given point θ i ∈ R 4 , i = 1, . . . , n. Then, by Schwarz's Theorem, it follows that the partial differentiations of these functions are commutative at that point, that is, With this in mind, the mixed partial derivatives of a(x i ; θ i ) and a (x i ; θ i ) in (21) have the following form: In the above identities, the mixed partial derivatives ∂ 2 σ i ∂α∂w 2 and ∂ 2 σ i ∂β l ∂γ m , respectively, are given by . . , s and i = 1, . . . , n.

Residual Analysis
We carry out goodness-of-fit through residual analysis. In particular, we consider the generalized Cox-Snell residual, which is given by whereŜ(x i |F i−1 ) denotes the fitted conditional survival function. When the model is correctly specified, the Cox-Snell residual has a unit exponential (EXP(1)) distribution; see Bhatti (2010).

The GBS-AACD Models
Now, we introduce a generalization of the linear form for the conditional median dynamics based on the Box-Cox transformation; see Box and Cox (1964) and Fernandes and Grammig (2006) for pertinent details. Hereafter, we use the log-linear form σ i given in (16) with r = 1 and s = 1 (i.e., the GBS-ACD(r = 1, s = 1) model, which we abbreviate as the GBS-ACD model, since a higher-order model does not increase the distributional fit of the residuals (Bhatti 2010)). Therefore, (16) results in The asymmetric version of the GBS-ACD model-GBS-AACD model-is given by where b and c are the shift and rotation parameters, respectively. By applying the Box-Cox transformation with parameter λ ≥ 0 to the conditional duration model process σ i and introducing the parameter ν, we can write (24) as The parameter λ determines the shape of the transformation, i.e., concave (λ ≤ 1) or convex (λ ≥ 1), and the parameter ν aims to transform the (potentially shifted and rotated) term |ϕ i−1 − b| + c(ϕ i−1 − b) . Setting α = λα * − β + 1 and γ = λγ * , we obtain We present below the forms of GBS-AACD models obtained from different specifications. Note that the Logarithmic GBS-ACD type II is equivalent to (23).

Numerical Results for the GBS-ACD Models
In this section, we perform two simulation studies, one for evaluating the behavior of the ML estimators of the GBS-ACD models, and another for examining the performance of the residuals. We have focused on the GBS-ACD models because similar results were obtained for the GBS-AACD models.

Study of ML Estimators
Through a Monte Carlo (MC) study, we evaluate here the finite sample behavior of the ML estimators of the GBS-ACD model parameters presented in Section 3. The sample sizes considered were n = 500, 1000, and 3000. The number of MC replications was B = 1000. The data-generating process for each of the realizations is where the distribution of i is a generalized gamma with density f (x; µ, σ, ν) = θ θ z θ ν exp(−θz)/(Γ(θ)x) with z = (x/µ) µ and θ = 1/σ 2 |ν| 2 . Note that stationarity conditions only require |β| < 1, and in (27), β = 0.9; see Bauwens and Giot (2000).
We estimate the GBS-ACD model parameters through the following two-step algorithm: • Estimate only the ACD parameters (α, β, γ) by the Nelder and Mead (1965)  • Estimate all of the ACD model parameters using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton approach, with starting values obtained from the estimates obtained in the anterior step.
The estimation results from the simulation study are presented in Table 2. The following sample statistics for the ML estimates are reported: Mean, coefficients of skewness (CS) and kurtosis (CK), relative bias (the RB, in absolute values, is defined as |E( τ) − τ|/τ, where τ is an estimator of a parameter τ), and root mean squared error ( √ MSE). The sample CS and CK are, respectively, given by where x = (x i , . . . , x n ) denotes an observation of the sample. This definition of kurtosis is the raw measure, not excess kurtosis, which subtracts three from this quantity. From Table 2, we note that, as the sample size increases, the RBs and √ MSE become smaller. We can also note that both β and γ are persistently skewed and somewhat unstable; nonetheless, they remain close to a normal distribution in terms of their skewness and kurtosis values.

Study of Residuals
We now carry out an MC simulation study to examine the performance of the Cox-Snell residual r cs defined in (22). To do so, we use the estimation procedure presented in Section 5.1 and consider only the BS-PE-ACD model, as it provides greater flexibility in relation to other models, that is, it has either less or greater (lighter or heavier tails) than the BS distribution. The BS-PE-ACD samples are generated using the transformation in (5). We simulate B = 1000 MC samples of size n = 500. The empirical autocorrelation function (ACF) of the residual r cs is plotted in Figure 1a. This plot indicates that the BS-PE-ACD model is well specified, since the residual r cs mimics a sequence of independent and identically distributed RVs and there is no indication of serial correlation. Moreover, the empirical mean of the residual r cs , whose value was expected to be 1, was 0.9836. Finally, using a quantile-against-quantile (QQ) plot with a simulated envelope (see Figure 1b), we note that the Cox-Snell residual has an excellent agreement with the EXP(1) distribution, which supports the adequacy and flexibility of the BS-PE-ACD model. It is then possible to conclude that the residual r cs seems adequate to assess the adjustment of the proposed models. a quantile-against-quantile (QQ) plot with a simulated envelope (see Figure 1b), we note that the Cox-Snell residual has an excellent agreement with the EXP(1) distribution, which supports the adequacy and flexibility of the BS-PE-ACD model. It is then possible to conclude that the residual r cs seems adequate to assess the adjustment of the proposed models.

Application to Analysis of Financial Transaction Data
In this section, our objective is to assess the GBS-ACD and GBS-AACD models using TD data. In particular, we consider here three TD data sets studied in Bhatti (2010), corresponding to the time elapsed (in seconds) between two consecutive transactions, which cover forty trading days from January 1, 2002 to February 28, 2002: International Business Machines (IBM), Johnson and Johnson Company (JNJ), and The Proctor and Gamble Company (PG). Note that, as mentioned before, these types of data exhibit some diurnal patterns, so that the final data sets are constructed from adjusted TDx i = x i /φ, whereφ = exp(ŝ) andŝ denotes a set of quadratic functions and indicator variables for each half-hour interval of the trading day from 9:30 am to 4:00 pm; for more details, see Giot (2000), Tsay (2002), and Bhatti (2010). Table 3 provides some descriptive statistics for both plain and diurnally adjusted TD data, which include central tendency statistics and coefficients of variation (CV), of skewness (CS), and of kurtosis (CK), among others. These measures indicate the positively skewed nature and the high kurtosis of the data. Figure 2 shows graphical plots of the ACF and partial ACF for the IBM, JNJ, and PG data sets, which indicate the presence of serial correlation.

Application to Analysis of Financial Transaction Data
In this section, our objective is to assess the GBS-ACD and GBS-AACD models using TD data. In particular, we consider here three TD data sets studied in Bhatti (2010), corresponding to the time elapsed (in seconds) between two consecutive transactions, which cover forty trading days from January 1, 2002 to February 28, 2002: International Business Machines (IBM), Johnson and Johnson Company (JNJ), and The Proctor and Gamble Company (PG). Note that, as mentioned before, these types of data exhibit some diurnal patterns, so that the final data sets are constructed from adjusted TDx i = x i /φ, whereφ = exp(ŝ) andŝ denotes a set of quadratic functions and indicator variables for each half-hour interval of the trading day from 9:30 am to 4:00 pm; for more details, see Giot (2000), Tsay (2002), and Bhatti (2010). Table 3 provides some descriptive statistics for both plain and diurnally adjusted TD data, which include central tendency statistics and coefficients of variation (CV), of skewness (CS), and of kurtosis (CK), among others. These measures indicate the positively skewed nature and the high kurtosis of the data. Figure 2 shows graphical plots of the ACF and partial ACF for the IBM, JNJ, and PG data sets, which indicate the presence of serial correlation.  The hazard function of a positive RV X is given by h X (t) = f X (x)/(1 − F X (x)), where f X (·) and F X (·) are the PDF and CDF of X, respectively. A useful way to characterize the hazard function is by the scaled total time on test (TTT) function, namely, we can detect the type of hazard function that the data have and then choose an appropriate distribution. The TTT function is given by W X (u) =

Exploratory Data Analysis
is the inverse CDF of X. By plotting the consecutive points (k/n, W n (k/n)) with W n (k/n) = [∑ k i=1 x (i) + (n − k)x k ]/ ∑ n i=1 x (i) for k = 0, . . . , n, and x (i) being the ith-order statistic, it is possible to approximate W X (·); see Aarset (1987) and Azevedo et al. (2012).
From Figure 3, we observe that the TTT plots suggest a failure rate with a unimodal shape. We also observe that the histograms suggest a positive skewness for the data density. This supports the results obtained in Table 3. However, Huber and Vanderviere (2008) pointed out that, in cases where the data follow a skewed distribution, a significant number of observations can be classified as atypical when they are not. The boxplots depicted in Figure 3 suggest such a situation, i.e., most of the observations considered as potential outliers by the usual boxplot are not outliers when we consider the adjusted boxplot. The hazard function of a positive RV X is given by h X (t) = f X (x)/(1 − F X (x)), where f X (·) and F X (·) are the PDF and CDF of X, respectively. A useful way to characterize the hazard function is by the scaled total time on test (TTT) function, namely, we can detect the type of hazard function that the data have and then choose an appropriate distribution. The TTT function is given by W X (u) = is the inverse CDF of X. By plotting the consecutive points (k/n, W n (k/n)) with W n (k/n) = [∑ k i=1 x (i) + (n − k)x k ]/ ∑ n i=1 x (i) for k = 0, . . . , n, and x (i) being the ith-order statistic, it is possible to approximate W X (·); see Aarset (1987) and Azevedo et al. (2012).
From Figure 3, we observe that the TTT plots suggest a failure rate with a unimodal shape. We also observe that the histograms suggest a positive skewness for the data density. This supports the results obtained in Table 3. However, Huber and Vanderviere (2008) pointed out that, in cases where the data follow a skewed distribution, a significant number of observations can be classified as atypical when they are not. The boxplots depicted in Figure 3 suggest such a situation, i.e., most of the observations considered as potential outliers by the usual boxplot are not outliers when we consider the adjusted boxplot.

Estimation Results and Analysis of Goodness-of-Fit for the GBS-ACD Models
We now estimate the GBS-ACD models by the maximum likelihood method using the steps described in Section 5.1. Tables 4-6 present the estimation results for the indicated models. The standard errors (SEs) are reported in parentheses and ℓ stands for the value of the log-likelihood function, whereas AIC = −2ℓ + 2k and BIC = −2ℓ + k ln n denote, respectively, the Akaike information and Bayesian information criteria, where k stands for the number of parameters and n for the number of observations. The maximum and minimum values of the sample autocorrelations (ACF) from order 1 to 60 are also reported. Finally,γ denotes the mean magnitude of autocorrelation for the first 15 lags, namely,γ = 1/15 ∑ 15 i=1 |γ k |, where γ k = cor(x i , x i+k ). The mean magnitude of autocorrelationγ is relevant for separating the influence of the sample size on the measure of the degree of autocorrelation in the residuals.

Estimation Results and Analysis of Goodness-of-Fit for the GBS-ACD Models
We now estimate the GBS-ACD models by the maximum likelihood method using the steps described in Section 5.1. Tables 4-6 present the estimation results for the indicated models. The standard errors (SEs) are reported in parentheses and stands for the value of the log-likelihood function, whereas AIC = −2 + 2k and BIC = −2 + k ln n denote, respectively, the Akaike information and Bayesian information criteria, where k stands for the number of parameters and n for the number of observations. The maximum and minimum values of the sample autocorrelations (ACF) from order 1 to 60 are also reported. Finally,γ denotes the mean magnitude of autocorrelation for the first 15 lags, namely,γ = 1/15 ∑ 15 i=1 |γ k |, where γ k = cor(x i , x i+k ). The mean magnitude of autocorrelationγ is relevant for separating the influence of the sample size on the measure of the degree of autocorrelation in the residuals.
From Tables 4-6, we observe that all of the parameters are statistically significant at the 1% level. It is also interesting to observe that, in general, the ACD parameter estimates are very similar across the models independently of the assumed distribution. In terms of AIC values, the BS-PE-ACD model outperforms all other models. Based on the BIC values, we note that the BS-PE-ACD model once again outperforms the remaining models, except for the JNJ data set. However, in this case, there does not exist one best model, since the BIC values for the BS-ACD and BS-PE-ACD models are very close.
In order to check for misspecification, we look at the sample ACF from order 1 to 60. Tables 4-6 report that there is no sample autocorrelation greater than 0.05 (in magnitude) throughout the models and residuals. Figure 4 shows the QQ plots of the Cox-Snell residual with the IBM, JNJ, and PG data sets. The QQ plot allows us to check graphically if the residual follows the EXP(1) distribution. These graphical plots show an overall superiority in terms of fit of the BS-PE-ACD model. Moreover, the empirical means of the residual r cs for the BS-PE-ACD model with the IBM, GM, and PG data sets were 1.0271, 0.9990, and 1.0153, respectively. Thus, the BS-PE-ACD model seems to be more suitable for modeling the data considered. It must be emphasized that this model provides greater flexibility in terms of kurtosis compared to the BS-ACD model. Table 4. Estimation results based on the generalized Birnbaum-Saunders autoregressive conditional duration (GBS-ACD) models for IBM trade durations.  Table 5. Estimation results based on the GBS-ACD models for JNJ trade durations.  Table 6. Estimation results based on the GBS-ACD models for PG trade durations.  Table 6. Estimation results based on the GBS-ACD models for PG trade durations.

Estimation Results for the BS-PE-AACD Models
We estimate here different ACD specifications (see Section 4) assuming a BS-PE PDF and using JNJ TD data. We focus on the BS-PE-AACD models (in short, AACD models), since, as observed in Section 6.2, this model fits the data adequately to provide effective ML-based inference. The estimation is performed using the steps presented in Section 5.1.
Tables 7 and 8 report the estimation results for different specifications. It is important to point out that the estimates of the BS-PE parameters κ and η are quite robust throughout the specifications. The Box-Cox ACD result (see column BCACD) shows that allowing ν of ϕ i−1 to freely vary in the logarithm ACD processes (LACD I and LACD II) increases the log-likelihood value, indicating that ν may play a role. In fact,ν is significantly different from zero and one, thus supporting the BCACD model against its logarithm counterparts, i.e., LACD I and LACD II. The AIC values show that the BCACD, LACD I, and AACD are best models. From the BIC values, the LACD I, BCACD, and AACD models are the best ones. Note, however, that the BIC values for the LACD I and BCACD models are quite close. Tables 4-6 also show that there is no sample autocorrelation greater than 0.05 (in magnitude) throughout the models and residuals.

Estimation Results for the BS-PE-AACD Models
We estimate here different ACD specifications (see Section 4) assuming a BS-PE PDF and using JNJ TD data. We focus on the BS-PE-AACD models (in short, AACD models), since, as observed in Section 6.2, this model fits the data adequately to provide effective ML-based inference. The estimation is performed using the steps presented in Section 5.1.
Tables 7 and 8 report the estimation results for different specifications. It is important to point out that the estimates of the BS-PE parameters κ and η are quite robust throughout the specifications. The Box-Cox ACD result (see column BCACD) shows that allowing ν of ϕ i−1 to freely vary in the logarithm ACD processes (LACD I and LACD II) increases the log-likelihood value, indicating that ν may play a role. In fact,ν is significantly different from zero and one, thus supporting the BCACD model against its logarithm counterparts, i.e., LACD I and LACD II. The AIC values show that the BCACD, LACD I, and AACD are best models. From the BIC values, the LACD I, BCACD, and AACD models are the best ones. Note, however, that the BIC values for the LACD I and BCACD models are quite close. Tables 4-6 also show that there is no sample autocorrelation greater than 0.05 (in magnitude) throughout the models and residuals.

Concluding Remarks
We have introduced a general class of ACD models based on GBS distributions. These distributions possess either lighter or heavier tails than the BS distribution, thus providing a wider class of positively skewed densities with nonnegative support. In addition, we have proposed a wider class of GBS-ACD models based on the Box-Cox transformation with a shape parameter to the conditional duration process and an asymmetric response to shocks. We then investigated the performance of the maximum likelihood estimates of the GBS-ACD models by means of an MC study. We also compared the proposed GBS-ACD and GBS-AACD models through an analysis with real financial data sets, which has shown the superiority of the BS-PE-ACD and BS-PE-BCACD models. A future line of research may be the out-of-sample forecast abilities of these models, as well as their application to other types of irregularly time-spaced data (besides TD data).
Author Contributions: All authors contributed equally to this manuscript. All authors have read and agreed to the published version of the manuscript Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.
Using this identity and Proposition 2 in the second identity for E[(ln X i ) 2 ] in (A3), the proof follows.