Quasi-Maximum Likelihood Estimation for Long Memory Stock Transaction Data — Under Conditional Heteroskedasticity Framework

This paper introduces Quasi-Maximum Likelihood (QML) Estimation for Long Memory Stock Transaction Data of unknown underlying distribution. The moments under conditional heteroscedasticity are discussed. In a Monte Carlo experiment, the QML estimator is found to perform as well as CLS and FGLS in terms of eliminating serial correlation, but it can be sensitive to starting values. Hence, a two-stage QML (2SQML) procedure is suggested. In empirical estimation on transaction data for two stocks, Ericsson B and AstraZeneca, 2SQML turns out to be relatively more efficient than CLS and FGLS. The empirical results suggest that both series have long memory properties, which implies that the impact of macroeconomic news or rumors at one point in time has a persistent effect on future transactions.


Introduction
In classical economic theory, price determination is a function of demand and supply. For example, in the Walrasian auctioneer approach, demands and supplies of a good are aggregated to find a market-clearing price. However, the availability of high frequency data enables the study of market mechanisms, or market microstructure. These studies depart from the demand-supply function of price determination under classical economics and take into account other factors considered influential in price determination. For instance, Working (1953), in matching demand and supply curves in equilibrium, expands the focus to the underlying trading mechanism. Demsetz (1968) investigates the effect of transaction costs on price determination in the securities market and incorporates the influence of the time dimension of demand and supply in analyzing the formation of market prices. Studies of market microstructure concern, among other things, the impact of transactions, bid-ask spreads (the difference between bid and ask prices), and the volume of and time between transactions (duration) on price formation. These studies also investigate the trading behavior of actors in response to news, rumors, etc. In the securities market, a transaction (or trade) is completed when a buyer and a seller agree to exchange a specific volume of stocks at a certain price. These transactions and the time elapsed between transactions are correlated. Fewer trades take place with increasing time lapse between successive transactions in a given time interval. Therefore, trading intensity and duration can be considered inversely related. During the past decades, research on market microstructure aimed at understanding pricing processes has centered on trading intensity and durations. Diamond and Verrecchia (1987) illustrate the implications of bad news for low trading intensity. Easley and O'Hara (1992) differ on the level of implication, demonstrating that low trading intensity implies no news. Besides, Engle (2000) finds that durations are associated with price volatilities. Moreover, stock transactions data are counts over fixed intervals of time, and Quoreshi (2014) explains that such time series may also have a long memory property.
The long memory phenomenon in time series was first considered by Hurst (1951, 1956). In these studies, he explains the long-term storage requirements of the Nile River. He shows that the cumulative water flow in a year depends not only on the water flows in recent years, but also on water flows in years much prior to the present year. Mandelbrot and van Ness (1968) explain and advance Hurst's studies by employing fractional Brownian motion. In analogy with Mandelbrot and van Ness (1968), Granger (1980), Granger and Joyeux (1980) and Hosking (1981) developed Autoregressive Fractionally Integrated Moving Average (ARFIMA) models to account for long memory in time series data. Granger and Ding (1996) pointed out that a number of other processes can also have the long memory property. An empirical study regarding the usefulness of ARFIMA models is conducted by Bhardwaj and Swanson (2005), who find strong evidence in favor of ARFIMA in absolute, squared, and log-squared stock index returns.
A time series of count data describes a non-negative sequence of count observations, integer-valued and observed at equidistant intervals of time. The literature on techniques to model, estimate and exploit such data is ever growing. Jacobs and Lewis (1978a, 1978b, 1983) introduced time dependence and developed discrete ARMA (DARMA) models. An important difference between the continuous-variable ARMA model and its integer-valued ARMA (INARMA) counterpart is that the latter contains parameters that are interpreted as probabilities and thus take on values in narrower intervals than do the parameters of the ARMA model (e.g., McKenzie 1986; Al-Osh and Alzaid 1987, 1988). Brännäs and Quoreshi (2010) advance an integer-valued moving average (INMA) model for the number of transactions in intraday stock data. Quoreshi (2006, 2008, 2014, 2017) advances the INMA model further into bivariate, multivariate and long memory (INARFIMA) frameworks. These papers consider Conditional Least Squares (CLS), Feasible Generalized Least Squares (FGLS), and Generalized Method of Moments (GMM). A large number of studies have considered the modeling of bivariate or multivariate count data assuming an underlying Poisson distribution (e.g., Gourieroux et al. 1984). Heinen and Rengifo (2003) introduced multivariate time series count data models based on the Poisson and the double Poisson distribution. Other extensions to traditional count data regression models are considered by, e.g., Brännäs and Brännäs (2004) and Rydberg and Shephard (1999). None of these papers consider maximum likelihood estimation, since the full density function of the underlying distribution is unknown. Sunecher et al. (2018) recently introduced a first-order bivariate integer-valued moving average (BINMA(1)) process for which they propose a generalized quasi-likelihood (GQL) method of estimation. Ristic et al. (2018) introduce a new BINMA(1) process with independent Negative Binomial (NB) innovations under nonstationary moment conditions; they also employ a GQL method of estimation.
In this paper, we propose a quasi-maximum likelihood (QML) estimator for a nonstationary integer-valued long memory model with unknown underlying distribution, and compare this estimator with CLS and FGLS, which have performed better than GMM in previous studies. We employ the QML estimator on stock transactions data for Ericsson B and AstraZeneca. Both stock series demonstrate long memory. Empirically, it is also found that the QML estimator is more efficient than the other estimators.
The paper is organized as follows. The INARFIMA (0, d, 0) model and its conditional and unconditional moment properties are presented in Section 2. The estimation procedures CLS and FGLS for the unknown parameters are discussed, and the QML estimator is proposed, in Section 3. The results from a Monte Carlo experiment are presented in Section 4. A detailed description of the empirical data is given in Section 5. The empirical results for the stock series are presented in Section 6, and concluding comments are included in the final section.


The Model
Quoreshi (2014) proposes the INARFIMA (p, d, q) model for the long memory properties of stock transaction data. The INMA (∞) representation of the INARFIMA model, which is INARFIMA (0, d, 0), can be written as

y_t = (1 − L)^(−d) ∘ u_t = Σ_{j=0}^{∞} d_j ∘ u_{t−j},   (1)

where L is the lag operator, so that L^i (d ∘ u_t) = d ∘ u_{t−i} for i > 0, and where the weights are d_j = Γ(j + d)/[Γ(j + 1)Γ(d)], j = 0, 1, 2, . . ., with d_0 = 1. Here we assume that there is no cross-lag dependence among the u_t. Note that y_t has long memory in the sense that the variables have slowly decaying autocorrelation functions. The d_j are considered thinning probabilities and hence take values in [0, 1]. The macroeconomic news or rumors are assumed to be captured by {u_t} and filtered by {d_j} through the system. The binomial thinning operator ∘ is used to account for the integer-valued property of count data. It can be written in terms of an iid sequence {v_k} of 0-1 random variables, such that

d_j ∘ u = Σ_{k=1}^{u} v_k, with Pr(v_k = 1) = d_j = 1 − Pr(v_k = 0).

Many authors accentuate exact distributional results for y_t, while Brännäs and Quoreshi (2010) stress only the first- and second-order moment conditions. In analogy with Brännäs and Quoreshi (2010), we construct the Quasi-Maximum Likelihood estimation based on the first- and second-order moment conditions.
Assuming independence between and within the thinning operations, and that {u_t} is an iid sequence with

E(u_t) = λ and V(u_t) = σ² = vλ, where v > 0,   (2)

the unconditional first- and second-order moments can be given as

E(y_t) = λ Σ_{i=0}^{∞} d_i,   (3)
V(y_t) = γ_0 = λ Σ_{i=0}^{∞} d_i(1 − d_i) + σ² Σ_{i=0}^{∞} d_i²,   (4)
γ_k = σ² Σ_{i=0}^{∞} d_i d_{i+k}, k ≥ 1,   (5)

where γ_k denotes the autocovariance function at lag k. It is obvious from (3)-(5) that the mean, variance, and autocovariance take only positive values, since λ, σ², and the d_i are all positive, and that Σ_{i=1}^{∞} d_i < ∞ is required for y_t to be a stationary process. Note also that the variance may be larger than the mean (overdispersion), smaller than the mean (underdispersion), or equal to the mean (equidispersion) depending on whether v > 1, v ∈ (0, 1) or v = 1, respectively. When the lag length q is finite, summing to infinity is replaced by summing to q. The conditional mean and variance for the INARFIMA (0, d, 0) are obtained in an analogous way:

E(y_t | Y_{t−1}) = E_{t−1} = λ + Σ_{i=1}^{∞} d_i u_{t−i},   (6)
V(y_t | Y_{t−1}) = V_{t−1} = σ² + Σ_{i=1}^{∞} d_i(1 − d_i) u_{t−i},   (7)

where Y_{t−1} is the information set available at time t − 1. The conditional mean and variance vary with the u_{t−i}. Since the conditional variance varies with u_{t−i}, there is a conditional heteroscedasticity, or nonstationarity, of the moving average type that Brännäs and Hall (2001) call MACH(q). The effect of u_{t−i} on the mean is greater than on the variance, since d_i ≥ d_i(1 − d_i). Note also that, like the unconditional variance, the conditional variance can imply overdispersion, underdispersion, or equidispersion, depending on whether λ(v − 1) is greater than, smaller than, or equal to Σ_{i=1}^{∞} d_i² u_{t−i}, respectively.
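The unconditional moment expressions can be checked numerically. The sketch below is our own illustration, not code from the paper; the truncation lag and the parameter values are arbitrary choices. It evaluates the mean, variance, and autocovariance formulas for a truncated weight sequence and shows that v = 1 gives equidispersion.

```python
import numpy as np
from scipy.special import gammaln

def weights(d, m):
    """Thinning probabilities d_j = Gamma(j+d)/[Gamma(j+1)Gamma(d)], j = 0..m."""
    j = np.arange(m + 1)
    return np.exp(gammaln(j + d) - gammaln(j + 1) - gammaln(d))

def unconditional_moments(lam, v, d, m=70):
    """Unconditional mean, variance, and lag-k autocovariance, truncated at lag m."""
    dj = weights(d, m)
    sigma2 = v * lam
    mean = lam * dj.sum()                                         # Equation (3)
    var = lam * (dj * (1 - dj)).sum() + sigma2 * (dj ** 2).sum()  # Equation (4)
    def gamma(k):                                                 # Equation (5)
        return sigma2 * (dj[:-k] * dj[k:]).sum()
    return mean, var, gamma

mean, var, gamma = unconditional_moments(lam=5.0, v=1.0, d=0.25)
print(mean, var, gamma(1))  # with v = 1 the variance equals the mean
```

With v = 1 the variance collapses to λ Σ d_i, which is exactly the mean, while v > 1 adds λ(v − 1) Σ d_i² and produces overdispersion.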

Estimation
If we do not assume a full density function, we may use the Quasi-Maximum Likelihood (QML) estimator, as discussed by Weiss (1986), instead of the Maximum Likelihood (ML) estimator. Conditional Least Squares (CLS), Feasible Generalized Least Squares (FGLS) and Generalized Method of Moments (GMM) are discussed for the INMA model by Brännäs and Quoreshi (2010). In previous studies, FGLS is the best of the three estimators in terms of eliminating serial correlation (Brännäs and Quoreshi 2010). CLS is second, and almost as good as FGLS. Here, we construct the QML and FGLS estimators for the INARFIMA (0, d, 0) model and compare the results with CLS.
The Conditional Least Squares (CLS) estimator for the INARFIMA (0, d, 0) representation is based on the residual

e_t = y_t − E_{t−1} = y_t − λ − Σ_{i=1}^{m} d_i u_{t−i}.   (8)
The criterion function S_CLS = Σ_{t=m+1}^{T} e_t² is minimized with respect to the unknown parameters ψ = (λ, δ′)′, where δ is the vector of parameters with elements d_i, and E_{t−1} is the conditional expected value of y_t defined in Equation (6). Using a finite maximum lag m in (8) instead of infinite lags may cause biasing effects. Due to the omitted variables u_{t−m−1}, . . ., we may expect a positive bias in the parameters λ and δ (Brännäs and Quoreshi 2010). These moment conditions correspond to the normal equations of the CLS estimator, which focuses on the unknown parameters of the conditional mean function. Alternatively and equivalently, the properties E(e_t) = 0 and E(e_t e_{t−j}) = 0, j ≥ 1, could be used. The FGLS estimator minimizes

S_FGLS = Σ_{t=m+1}^{T} e_t² / V̂_{t−1},   (9)

where the weights V̂_{t−1} are given estimates of the conditional variance. The variance of the errors from the CLS estimates may be used to approximate V̂_{t−1} in Equation (9). Alternatively, V̂_{t−1} can be estimated as specified in (7) by employing the estimates from CLS. The covariance matrix estimators for CLS and FGLS take the usual sandwich forms based on the derivatives of the respective criterion functions. The QML estimator for the INARFIMA (0, d, 0) representation uses the same residual as in Equation (8), and we propose maximizing the criterion function

L(ψ) = Π_{t=m+1}^{T} (2π V_{t−1})^{−1/2} exp(−e_t² / (2V_{t−1})),   (10)

where ψ = (λ, δ′)′ and V_{t−1} is as in Equation (7). This specification may be motivated by the central limit theorem, even though the y_t are counts: the standardized prediction error is approximately normally distributed with mean zero and variance one if the sample size is large enough. A relevant empirical study of the distributional properties of high-frequency intraday transaction prices is conducted by Andersen et al. (2001). Taking the logarithm of Equation (10), we may use the criterion function

ln L(ψ) = −((T − m)/2) ln(2π) − (1/2) Σ_{t=m+1}^{T} [ln V̂_{t−1} + e_t² / V̂_{t−1}],   (11)

where V̂_{t−1} is an estimate of V_{t−1}. Since 2 and π enter only through constants, maximizing (11) is equivalent to minimizing the criterion function

S_QML = Σ_{t=m+1}^{T} [ln V̂_{t−1} + e_t² / V̂_{t−1}].   (12)

Note that V̂_{t−1} is estimated at the same time as the other parameters. If the estimation is sensitive to the starting value of V̂_{t−1}, we can estimate CLS in a first stage and calculate V̂_{t−1}, which can then be used as the starting value for QML. We call this estimation procedure Two-Stage Quasi-Maximum Likelihood (2SQML) estimation. The covariance matrix estimators for QML and 2SQML take the corresponding sandwich forms.
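The two-stage procedure can be sketched in a few lines. The code below is a minimal illustrative implementation, not the authors' code: the unobserved innovations u_t are approximated recursively from the data, the lag length and sample size are kept far below the paper's choices for speed, equidispersion (σ² = λ, i.e., v = 1) is imposed, and a derivative-free optimizer is used for both stages.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(7)

def weights(d, m):
    """d_j = Gamma(j+d)/[Gamma(j+1)Gamma(d)] for j = 1..m (d_0 = 1 is implicit)."""
    j = np.arange(1, m + 1)
    return np.exp(gammaln(j + d) - gammaln(j + 1) - gammaln(d))

def simulate(lam, d, T, m, rng):
    """INARFIMA(0, d, 0) counts via independent binomial thinnings."""
    dj = np.concatenate(([1.0], weights(d, m)))
    u = rng.poisson(lam, T + m)
    y = np.zeros(T, dtype=np.int64)
    for i in range(m + 1):
        y += rng.binomial(u[m - i : m - i + T], dj[i])
    return y

def residuals(params, y, m):
    """One-step errors e_t = y_t - E_{t-1}; unobserved u_t built recursively."""
    lam, d = params
    dj = weights(d, m)
    u = np.zeros(len(y))
    for t in range(len(y)):
        k = min(t, m)
        u[t] = y[t] - dj[:k] @ u[t - k:t][::-1]
    return u - lam, u

def s_cls(params, y, m):
    """CLS criterion: sum of squared one-step prediction errors."""
    lam, d = params
    if lam <= 0 or not 0.0 < d < 0.5:
        return np.inf
    e, _ = residuals(params, y, m)
    return np.sum(e[m:] ** 2)

def s_qml(params, y, m, v=1.0):
    """QML criterion: sum of log V_{t-1} + e_t^2 / V_{t-1}."""
    lam, d = params
    if lam <= 0 or not 0.0 < d < 0.5:
        return np.inf
    e, u = residuals(params, y, m)
    dj = weights(d, m)
    w = dj * (1 - dj)
    V = np.array([v * lam + w @ u[t - m:t][::-1] for t in range(m, len(y))])
    V = np.maximum(V, 1e-8)
    return np.sum(np.log(V) + e[m:] ** 2 / V)

m = 25  # far below the paper's lag length of 70, for speed
y = simulate(lam=5.0, d=0.25, T=1200, m=m, rng=rng)

# Stage 1: CLS gives starting values; Stage 2: QML started at the CLS estimates.
stage1 = minimize(s_cls, x0=[y.mean() / 3.0, 0.2], args=(y, m), method="Nelder-Mead")
stage2 = minimize(s_qml, stage1.x, args=(y, m), method="Nelder-Mead")
print(stage1.x, stage2.x)
```

Starting the QML stage at the CLS estimates is exactly the point of 2SQML: the QML criterion is only searched locally around a consistent first-stage estimate, which avoids the sensitivity to arbitrary starting values reported in the Monte Carlo section.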
Note that we call an estimator relatively efficient if the standard deviations (errors) of the parameters of interest are smaller than those of the other estimators. The covariance matrices mentioned above provide the standard errors of the estimates for the respective estimators. The criteria used to choose the best-fitting model are, e.g., Adjusted-R², mean square error (MSE), the Akaike Information Criterion (AIC), and the Schwarz Bayesian Information Criterion (SBIC). The Adjusted-R² measures the degree of variation in the dependent variable explained by the independent variables, while the MSE measures to what extent the regression model is unable to explain the variation in the dependent variable. Hence, a higher Adjusted-R² implies a smaller MSE. S_CLS is the MSE for Equation (8), while the corresponding Adjusted-R² is (1 − S_CLS/var(y_t)), where var(y_t) is the variance of the dependent variable. The AIC and SBIC are similar to the MSE but, instead of the MSE, use a function that incorporates the value of the likelihood function used in estimation, as in Equation (11). In a first step, these criteria may be used for model selection.
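Under these definitions, the criteria can be computed directly from the residuals and the log-likelihood value. The snippet below is a schematic reading of the formulas in this section (the Adjusted-R² is taken as 1 − S_CLS/var(y_t), as stated above, with the mean squared residual in the role of S_CLS); it illustrates the inverse relation between MSE and Adjusted-R².

```python
import numpy as np

def fit_criteria(y, e, loglik, k):
    """MSE, Adjusted-R^2 as defined in the text, AIC, and SBIC.
    k is the number of estimated parameters; loglik is the maximized log-likelihood."""
    n = len(e)
    mse = np.mean(e ** 2)
    adj_r2 = 1.0 - mse / np.var(y)  # 1 - S_CLS/var(y_t) in the paper's notation
    aic = -2.0 * loglik + 2.0 * k
    sbic = -2.0 * loglik + k * np.log(n)
    return mse, adj_r2, aic, sbic

# toy dependent variable and two hypothetical residual series
y = np.array([3.0, 7, 4, 6, 5, 8, 2, 5])
e_good = np.array([0.2, -0.1, 0.3, -0.2, 0.1, -0.3, 0.2, -0.2])
print(fit_criteria(y, e_good, loglik=-10.0, k=2))
```

Scaling the residuals up raises the MSE, lowers the Adjusted-R², and (with a correspondingly lower log-likelihood) raises both AIC and SBIC.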
In time series analysis, the Ljung-Box and Box-Pierce statistics are widely used to check for serial correlation in the residuals. These criteria can be used to evaluate estimators given a model. The null hypothesis is that the residuals e_t are independently distributed, while the alternative hypothesis is that they are not; that is, there is serial correlation between the residuals. The Ljung-Box statistic for the residuals can be written as

Q_LB = n(n + 2) Σ_{k=1}^{h} ρ̂_k² / (n − k),

where n is the number of observations and h is the number of lags tested (Ljung and Box 1978). Here ρ̂_k is the autocorrelation of the residuals at lag k. The null hypothesis is rejected if

Q_LB > χ²_{1−α,h},

where χ²_{1−α,h} is the chi-squared quantile with significance level α and h degrees of freedom. Similarly, the Box-Pierce statistic can be written as

Q_BP = n Σ_{k=1}^{h} ρ̂_k²,

with n, h, and ρ̂_k as described above (Box and Pierce 1970). The difference between the two statistics emerges from the factors (n + 2) and (n − k) in the Ljung-Box statistic, so that Q_LB > Q_BP, and hence Q_LB rejects the null hypothesis more readily.
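Both statistics are simple functions of the squared residual autocorrelations; a generic implementation (not tied to the paper's data) is:

```python
import numpy as np
from scipy.stats import chi2

def acf(x, h):
    """Sample autocorrelations of x at lags 1..h."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.array([x[k:] @ x[:-k] for k in range(1, h + 1)]) / (x @ x)

def portmanteau(resid, h, alpha=0.05):
    """Ljung-Box and Box-Pierce statistics with the chi-square(h) critical value."""
    n = len(resid)
    rho = acf(resid, h)
    q_lb = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, h + 1)))
    q_bp = n * np.sum(rho ** 2)
    return q_lb, q_bp, chi2.ppf(1 - alpha, h)

rng = np.random.default_rng(0)
q_lb, q_bp, crit = portmanteau(rng.standard_normal(500), h=10)
print(q_lb, q_bp, crit)
```

Since (n + 2)/(n − k) > 1 at every lag, Q_LB exceeds Q_BP by construction for any residual series, which is the inequality noted in the text.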

Monte Carlo Experiment
Quoreshi (2014) studies, in a brief Monte Carlo experiment, the bias, MSE, Ljung-Box (LB), AIC, and SBIC properties of the CLS estimator for finite-lag specifications when data are generated according to INARFIMA (0, d, 0). Here, we employ the QML, 2SQML, and FGLS estimators and compare the results with CLS. Initially, we found that QML is highly sensitive to starting values and produces severely biased estimates. Hence, we focus the results on 2SQML, FGLS, and CLS. This study generates data from a Poisson distribution to study the bias properties under the 2SQML and FGLS estimation procedures. Smith et al. (1996) and Quoreshi (2014) studied bias and misspecification in ARFIMA and INARFIMA models, respectively. Drost et al. (2009) investigate the finite sample behavior of semiparametric integer-valued AR(p) models, while Brännäs and Quoreshi (2010) study finite-lag misspecification when the data are generated according to an infinite-lag INMA model. In this Monte Carlo experiment, we study the bias, MSE, Ljung-Box, AIC, and SBIC properties of the FGLS and 2SQML estimators for finite-lag specifications when data are generated according to INARFIMA (0, d, 0). The data generating process is as in (1), with d_j = Γ(j + d)/[Γ(j + 1)Γ(d)], j = 0, 1, 2, . . ., where d_0 = 1 and where u_t is drawn according to Equation (2). The values d = 0.1, 0.25, and 0.4 are used and the lag length m = 70 is chosen. The u_t sequence is generated as Poisson with parameter λ = 5, so that λ = σ² in the conditional variance equation in (7). Six time series with lengths T = 2000 and T = 10,000 are generated. The first 500 observations are discarded to avoid start-up effects. The results of the Monte Carlo experiment are given in Tables 1-3. In Table 1, we see that the estimates of d decrease as the lag length (M) increases from 10 to 70 and approach the theoretical value d = 0.1. This implies that the bias of the estimates decreases from 0.071 to 0.004 for T = 2000 as the lag length approaches the theoretical lag length of 70 (see d corresponding to M10-M70 in Table 1). The parameter estimated with lag length 90 (M90) is d < 0.1, which implies negative bias. Like Brännäs and Quoreshi (2010), we conclude that we may expect a positive biasing effect on the parameters due to omitted variables. The results for MSE, AIC, and SBIC are the same to three decimals for all three estimators (Table 1). Depending on the size of d, the standard AIC and SBIC may need to be corrected. The results indicate that the 2SQML, FGLS, and CLS estimators perform equally well in terms of eliminating serial correlation (see Q_LB100 in Tables 1-3). However, the standard error of d (see s.e. for M70 in Tables 1-3) varies slightly between the estimators. Taking this into account, we may conclude that CLS performs best, while the 2SQML estimator performs better than FGLS. It is to be noted that both QML and FGLS are sensitive to starting values. In that case, 2SQML should be used.
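The data-generating step of the experiment can be sketched as follows. This is an illustration under the stated design, not the paper's own simulation code; the series length is one of the two used in the paper, and the burn-in of 500 observations matches the text.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(42)

def thinning_weights(d, m):
    """d_j = Gamma(j+d)/[Gamma(j+1)Gamma(d)], j = 0..m, with d_0 = 1."""
    j = np.arange(m + 1)
    return np.exp(gammaln(j + d) - gammaln(j + 1) - gammaln(d))

def generate_inarfima(d, lam, T, m=70, burn=500):
    """INARFIMA(0, d, 0) counts: binomial thinning of iid Poisson(lam) innovations,
    with the first `burn` observations discarded to avoid start-up effects."""
    dj = thinning_weights(d, m)
    n = T + burn
    u = rng.poisson(lam, n + m)
    y = np.zeros(n, dtype=np.int64)
    for i in range(m + 1):
        # d_i o u_{t-i}: independent binomial thinning, vectorised over t
        y += rng.binomial(u[m - i : m - i + n], dj[i])
    return y[burn:]

y = generate_inarfima(d=0.25, lam=5.0, T=2000)
print(y[:10], y.mean())
```

Because each thinning is a Binomial(u, d_i) draw, the generated series is integer-valued and non-negative by construction, and its sample mean should fluctuate around λ Σ d_j.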

Data and Descriptive Statistics
In this paper, we use the same data set as Quoreshi (2014). The reason is that we introduce the Quasi-Maximum Likelihood method, which was not considered in that study, or in earlier count data studies, because the underlying distribution is unknown. We employ this method on the same data set and replicate the previous study in order to compare with the earlier results. Quoreshi (2014) downloaded tick-by-tick data for Ericsson B and AstraZeneca from the Ecovision system; the data were later filtered to generate transaction data, which are counts. The stocks are frequently traded and have the highest turnovers on the Stockholmsbörsen. The two stock series were collected for the period 5 November-12 December 2002. Due to a technical problem in downloading, there are no data for 12 November, and the first minute captured on 5 December is 10:37. Since we are interested in capturing the number of ordinary transactions, we have deleted all trading before 09:35 (trading opens at 09:30) and after 17:14 (the order book closes at 17:20). The transactions in the first few minutes are subject to a different trading mechanism, while there is practically no trading after 17:14. The data are aggregated into one-minute intervals. For high frequency data, researchers usually use one-, two-, five-, or ten-minute intervals, and the choice is rather arbitrary. There are altogether 11,960 observations for both the Ericsson B and AstraZeneca series. The series, together with their autocorrelation and partial-autocorrelation functions and histograms, are exhibited in Figure 1. There are frequent zeros in both series, especially in the AstraZeneca series, and hence the application of count data modeling is called for. The counts in both series fluctuate around their means, which is an indication of mean-reverting processes. The autocorrelation functions for both series suggest fractional integration, which implies long memory. The histograms exhibit the distribution of counts and the possible empirical densities of the counts. Even with a relatively large sample size, the distributions appear far from normal. They resemble the Poisson family in shape, but both variances are greater than their respective means (Figure 1). Hence, there is no scope for employing a known distribution.

Empirical Results
CLS, FGLS, and 2SQML are employed for estimation with a lag length of 70, following the suggestions of Brännäs and Quoreshi (2010). It is to be noted that the AIC and SBIC criteria are not applicable in the context of long memory (Brännäs and Quoreshi 2010; Quoreshi 2014), which is also supported by the Monte Carlo experiment. The results of the empirical studies are presented in Table 4. Empirically, we find evidence of the long memory property (d < 0.5) for both the Ericsson B and AstraZeneca series (see Table 4). Both the AstraZeneca and Ericsson series have the mean reversion property and are covariance stationary. The findings suggest that the impact of macroeconomic news or rumors at one point in time has a persistent effect on future transactions. It may be recalled that news disseminated through formal channels, which may affect the overall stock market or a particular stock, is termed macroeconomic news; examples are news on interest rates or unemployment statistics for a country, which may influence all stocks. Rumors are pieces of information that spread through unofficial channels yet relate to macroeconomic news or to a particular stock. For AstraZeneca, we find that CLS performs best, while 2SQML performs better than FGLS, in terms of eliminating serial correlation. Considering the standard errors of the parameters (λ, d), we find that 2SQML performs best among the three estimators, while CLS performs better than FGLS. For Ericsson B, we find that 2SQML performs best in terms of both eliminating serial correlation and minimizing the standard errors of the parameters. All other corresponding estimates are equal to three decimals across the three estimators.

Concluding Remarks
This paper introduces Quasi-Maximum Likelihood estimation of an integer-valued long memory model with unknown underlying distribution, and the estimation procedures for QML have been discussed. The paper compares the 2SQML estimator with FGLS and CLS. In a Monte Carlo experiment, it is found that the 2SQML, FGLS, and CLS estimators perform equally well in terms of eliminating serial correlation. The empirical study suggests that CLS performs best for AstraZeneca, while 2SQML performs best for Ericsson B, in terms of eliminating serial correlation. However, the 2SQML estimator performs better than both CLS and FGLS in terms of minimum standard errors of the parameter estimates for both Ericsson B and AstraZeneca, although CLS performs best, followed by 2SQML, in the simulation study. Note that the data in the simulation study are equidispersed, since they are generated from the Poisson distribution, while the data in the empirical study are overdispersed. Collectively, the results may indicate that the 2SQML estimator is relatively more efficient than CLS and FGLS for overdispersed data. The empirical results suggest that both series have long memory properties, which implies that the impact of macroeconomic news or rumors at one point in time has a persistent effect on future transactions.

Figure 1.The time series of Ericsson B (mean 11.73 and variance 84.86) and AstraZeneca (mean 1.33 and variance 3.75) and their autocorrelation and partial-autocorrelation functions.

Table 2 .
The properties of the CLS, FGLS, and 2SQML estimators for finite-lag specifications, when data is generated according to INARFIMA (0, d, 0) models with d = 0.25 and lag (m) = 70.

Table 3 .
The properties of the CLS, FGLS, and 2SQML estimators for finite-lag specifications, when data is generated according to INARFIMA (0, d, 0) models with d = 0.4 and lag (m) = 70.