3.1. Conventional Methods
A well-established technique for estimating the Hurst exponent is based on a statistic known as the "rescaled range" or "R/S" statistic (the range of cumulative deviations rescaled by the sample standard deviation). This statistic was first introduced in 1951 by the hydrologist H. E. Hurst, who observed long-range dependence in the dynamics of the Nile river's annual water levels while determining the long-term storage capacity crucial for the construction of irrigation reservoirs. Since then, robust empirical evidence of long-range dependence in time series has been documented extensively across disciplines, particularly in the physical sciences, where the series studied exhibit some kind of trending behavior (e.g., tree-trunk circumferences, rainfall levels, air-temperature fluctuations, oceanic movements, and volcanic activity). Among the first to use rescaled range analysis to examine this behavior in common stock returns was the renowned mathematician B. Mandelbrot, who also coined the term Hurst exponent in recognition of Hurst. Refs. [12,29] substantially refined the R/S statistic. In particular, they advocate its robustness in detecting as well as estimating long-range dependence even for non-Gaussian processes with extreme degrees of skewness and kurtosis. This body of research also demonstrated the method's superiority over traditional approaches, such as spectral analysis or variance ratios, in detecting long memory.
However, as [5] pointed out, these refinements could not distinguish the effects of short-range from long-range dependence. To compensate for this weakness, he proposed a new modified R/S framework. His findings indicate that the dependence structure documented in previous studies is mostly short-ranged, corresponding to high-frequency autocorrelation or heteroskedasticity. Two implications of [5] are important for us: (i) empirical inferences of long-range behavior must be drawn carefully, preferably by accounting for dependence at higher frequencies, and (ii) in such cases, conventional short-range dependence models (such as AR(1) or random walk models) might be adequate. On the other hand, as implied by a counterargument raised in [30], we should also be cautious about [5]'s modified method because of its tendency to reject long-range dependence even when evidence of such behavior does in fact exist (albeit weakly).
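To make the discussion concrete, the following is a minimal sketch of the classical (unmodified) R/S procedure described above, assuming equally spaced observations; the function names and the block-halving scheme are our own illustrative choices, not [5]'s exact modified statistic.

```python
import numpy as np

def rescaled_range(x):
    """Classical R/S statistic: range of cumulative mean-deviations over the SD."""
    x = np.asarray(x, dtype=float)
    z = np.cumsum(x - x.mean())            # cumulative deviations from the mean
    return (z.max() - z.min()) / x.std(ddof=1)

def hurst_rs(x, min_block=8):
    """Estimate H as the log-log slope of the average R/S against block size n."""
    x = np.asarray(x, dtype=float)
    sizes, stats = [], []
    n = len(x) // 2
    while n >= min_block:
        blocks = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
        stats.append(np.mean([rescaled_range(b) for b in blocks]))
        sizes.append(n)
        n //= 2
    slope, _ = np.polyfit(np.log(sizes), np.log(stats), 1)
    return slope                           # since E[R/S] ~ c * n^H

# White noise should give H near 0.5 (small samples bias the estimate upward).
print(hurst_rs(np.random.default_rng(0).standard_normal(4096)))
```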
Therefore, despite the enormous praise the R/S statistic has enjoyed over the years, we follow these authors' advice not to rely solely on this technique, but to draw on a diverse range of well-established alternatives in the literature for estimating long-range dependence. In the following paragraphs we describe the methods we used to estimate the long-range dependence parameter H. Furthermore, to suit our empirical analysis, we focus on the case of discrete-time stochastic processes. Additional estimation techniques we utilize in this study include the aggregated variance method as analyzed in [25,31], the Higuchi method [32], the residuals of regression [33], and the periodogram method [34]. Detailed discussions of these methods, as well as their strengths and weaknesses, are available upon request.
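As an illustration of one of these alternatives, here is a minimal sketch of the aggregated variance method in the spirit of [25,31]: for a long-memory series, the variance of block means scales as m^(2H-2) with the block size m. The function name and the grid of block sizes are our own choices.

```python
import numpy as np

def hurst_aggvar(x, min_m=2, max_frac=0.1, n_sizes=20):
    """Aggregated variance estimate of H: var(block means) ~ m^(2H - 2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ms = np.unique(np.logspace(np.log10(min_m),
                               np.log10(max(min_m + 1, n * max_frac)),
                               n_sizes).astype(int))
    logm, logv = [], []
    for m in ms:
        k = n // m                           # number of complete blocks
        if k < 2:
            continue
        means = x[:k * m].reshape(k, m).mean(axis=1)
        logm.append(np.log(m))
        logv.append(np.log(means.var(ddof=1)))
    slope, _ = np.polyfit(logm, logv, 1)     # slope estimates 2H - 2
    return 1.0 + slope / 2.0
```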
3.2. Wavelet-Based Maximum Likelihood Estimator
Most recent studies adopt a 'time domain' perspective; that is, the data are analyzed as time series recorded at a pre-determined frequency (daily, weekly, monthly, etc.). However effective, this approach implicitly assumes that the recording frequency is the sole frequency to be considered when studying realizations of a time-varying variable. Problems emerge when this assumption turns out to be insufficient: what happens when many frequencies, not one, dictate the underlying generating process of the variable of interest? This issue is particularly relevant in the context of financial assets, whose prices are determined by the activities of agents with multiple trading frequencies.
To address this concern, a different approach that takes the frequency aspect into account is called for. A well-established methodology representing this branch of 'frequency-domain' analysis is Fourier/spectral analysis, a powerful tool specifically designed to study the cyclical behavior of stationary variables. Building on this fundamental idea, a more advanced technique, the wavelet transform, was developed to incorporate both the time and frequency aspects of a data sequence simultaneously. It is worth noting that although wavelet analysis has long been used in engineering, in particular signal processing, its application in finance has only recently become popular thanks to the work of pioneers such as [2,35].
To apply the wavelet-based estimation method to long-memory processes, we begin by examining the popular case of the fractional ARIMA process class: the FARIMA(0, d, 0) [also known as a "fractional difference process" (hereafter, FDP)], which is described as
$$(1 - B)^{d} X_t = \varepsilon_t,$$
or, equivalently,
$$X_t = (1 - B)^{-d} \varepsilon_t,$$
with $B$ the backshift operator, $\{\varepsilon_t\}$ a zero-mean white noise process with variance $\sigma_\varepsilon^2$, and $d$ the fractional difference parameter. This expression means that the $d$-th order difference of $X_t$ equals a (stationary) white noise process. A zero-mean FDP (with $-1/2 < d < 1/2$), denoted as $\{X_t\}$, is stationary and invertible (see e.g., [2,36]). Recall that we define its slowly decaying auto-covariance function as:
$$s_{X,\tau} \equiv \mathrm{cov}\{X_t, X_{t+\tau}\} = \frac{\sigma_\varepsilon^2 \,\sin(\pi d)\,\Gamma(1 - 2d)\,\Gamma(\tau + d)}{\pi\,\Gamma(\tau + 1 - d)}.$$
Correspondingly, for frequency $f$ satisfying $|f| \le 1/2$, the spectral density function (SDF) of $\{X_t\}$ satisfies:
$$S_X(f) = \frac{\sigma_\varepsilon^2}{\left[4 \sin^2(\pi f)\right]^{d}}.$$
The SDFs of the process with different values of $d$ (and standard normal innovations, i.e., $\sigma_\varepsilon^2 = 1$) are plotted in Figure 1. When $d > 0$ (i.e., long memory exists), the slope of the SDF on a log-log scale increases as $d$ increases. In this case, the SDF has an asymptote at frequency zero, or it is "unbounded at the origin", in [36]'s terminology. In other words, $S_X(f) \to \infty$ as $f \to 0$. Correspondingly, the auto-correlation function (hereafter, ACF) decays more slowly as $d$ increases. While the ACF of a process with $d$ close to zero quickly dissipates after a small number of lags, the ACF of a process at the 'high-end' of the long-memory family, with $d$ close to $1/2$, effectively persists at long lags. The former is commonly interpreted as exhibiting "short-memory" behavior.
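The two closed forms above are easy to evaluate directly; the short sketch below computes the FDP auto-covariance and SDF for several values of d, reproducing the qualitative pattern in Figure 1 (an SDF growing without bound at the origin, and an ACF decaying more slowly as d increases). Function names are ours.

```python
import numpy as np
from scipy.special import gamma

def fdp_acvs(d, max_lag, sigma2_eps=1.0):
    """Auto-covariance s_{X,tau} of an FDP(d) at lags 0..max_lag (0 < d < 1/2)."""
    tau = np.arange(max_lag + 1)
    return (sigma2_eps * np.sin(np.pi * d) * gamma(1 - 2 * d)
            * gamma(tau + d) / (np.pi * gamma(tau + 1 - d)))

def fdp_sdf(f, d, sigma2_eps=1.0):
    """SDF S_X(f) = sigma_eps^2 / [4 sin^2(pi f)]^d for 0 < |f| <= 1/2."""
    return sigma2_eps / (4.0 * np.sin(np.pi * f) ** 2) ** d

for d in (0.05, 0.25, 0.45):
    # Larger d: the SDF blows up faster as f -> 0, and the ACF decays more slowly.
    print(f"d={d}: S_X(0.001)={fdp_sdf(0.001, d):8.1f}, "
          f"ACF at lag 20 = {fdp_acvs(d, 20)[-1] / fdp_acvs(d, 0)[0]:.3f}")
```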
Here we can see the relationship between the auto-covariance and the spectrum: the slow decay of the ACF toward very long lags corresponds to very low frequencies (as the observations become separated by a great time distance, the wavelength of the corresponding periodic signal becomes very long). This reminds us that the spectrum is simply a "representation" of the autocorrelation function in the frequency domain. In addition, Figure 1 shows that the higher the degree of long memory (the higher the $d$ parameter), the larger the spectrum will be as $f \to 0$. It is well established that a slowly decaying auto-correlation and a spectrum unbounded at the origin each independently characterize long-memory behavior (see, e.g., [37,38]). In line with these authors, [39] agrees that a pattern in which power concentrates at low frequencies and declines exponentially as frequency increases, such as the ones in the top plot of Figure 1, is the "typical shape" of an economic variable. An important remark follows from this observation: since the periodogram is very high at low frequencies, it is the low-frequency components of a long-memory process that contribute the most to the dynamics of the whole process. For our purposes, this means that to understand the underlying mechanism of the risk process, emphasis needs to be placed on the activities of investors with long trading horizons rather than the day-to-day, noisy activities of, for example, market makers.
To avoid the burden of computing the exact likelihood, [2] utilize an approximation to the covariance matrix obtained via the discrete wavelet transform (hereafter, the DWT): let $\mathbf{X}$ be a fractional difference process with dyadic length $N = 2^{J}$ and covariance matrix $\Sigma_X$; the likelihood function is defined as:
$$L(d, \sigma_\varepsilon^2 \mid \mathbf{X}) = \frac{\exp\!\left(-\tfrac{1}{2}\,\mathbf{X}^{T}\Sigma_X^{-1}\mathbf{X}\right)}{(2\pi)^{N/2}\,|\Sigma_X|^{1/2}},$$
where $|\Sigma_X|$ denotes the determinant of $\Sigma_X$. Furthermore, we have the approximate covariance matrix given by $\widehat{\Sigma}_X = \mathcal{W}^{T} \Lambda_N \mathcal{W}$, where $\mathcal{W}$ is the orthonormal matrix representing the DWT and $\Lambda_N$ is a diagonal matrix which contains the variances of the DWT coefficients. The approximate likelihood function and its logarithm are:
$$\widehat{L}(d, \sigma_\varepsilon^2 \mid \mathbf{X}) = \frac{\exp\!\left(-\tfrac{1}{2}\,\mathbf{X}^{T}\widehat{\Sigma}_X^{-1}\mathbf{X}\right)}{(2\pi)^{N/2}\,|\widehat{\Sigma}_X|^{1/2}}, \qquad -2\ln \widehat{L}(d, \sigma_\varepsilon^2 \mid \mathbf{X}) = N\ln(2\pi) + \ln|\widehat{\Sigma}_X| + \mathbf{X}^{T}\widehat{\Sigma}_X^{-1}\mathbf{X}. \quad (1)$$
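The quality of this approximation rests on the DWT's ability to (approximately) decorrelate an FDP, so that $\Lambda_N$ is nearly diagonal in the wavelet basis. The sketch below checks this numerically: it simulates an FDP via its truncated MA(∞) representation and inspects the within-level variances and lag-one correlations of the DWT coefficients. We use PyWavelets, whose 'sym4' filter (8 taps) plays the role of the LA(8) filter of [2]; the simulation scheme and truncation length are our own choices.

```python
import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(1)

def simulate_fdp(n, d, trunc=2048):
    """Simulate FDP(d) via the truncated MA(inf) filter psi_k = Gamma(k+d)/(Gamma(d) k!)."""
    psi = np.ones(trunc)
    for k in range(1, trunc):
        psi[k] = psi[k - 1] * (k - 1 + d) / k   # recursion for the MA weights
    eps = rng.standard_normal(n + trunc)
    return np.convolve(eps, psi, mode='valid')[:n]

x = simulate_fdp(4096, d=0.4)
coeffs = pywt.wavedec(x, 'sym4', mode='periodization')  # [cA_J, cD_J, ..., cD_1]
for j, w in enumerate(reversed(coeffs[1:]), start=1):   # j = 1 is the finest level
    r1 = np.corrcoef(w[:-1], w[1:])[0, 1]
    print(f"level {j}: n={len(w):4d}, var={w.var():8.3f}, lag-1 corr={r1:+.2f}")
```

The per-level variances grow with scale (as expected for $d > 0$), while the within-level lag-one correlations stay small, which is what makes the diagonal approximation workable.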
As noted earlier, [1] introduced the wavelet variance $\nu_X^2(\tau_j)$ for scale $\tau_j$, which satisfies $\sum_{j=1}^{\infty} \nu_X^2(\tau_j) = \mathrm{var}\{X_t\}$; the variance of the level-$j$ DWT coefficients is then $C_j \equiv \mathrm{var}\{W_{j,t}\} = 2^{j}\,\nu_X^2(\tau_j)$. The properties of diagonal and orthonormal matrices allow us to rewrite the approximate log-likelihood function in Equation (1) as:
$$-2\ln \widehat{L}(d, \sigma_\varepsilon^2 \mid \mathbf{X}) = N\ln(2\pi) + \sum_{j=1}^{J}\left[ N_j \ln C_j + \frac{1}{C_j}\sum_{t=0}^{N_j-1} W_{j,t}^2 \right], \quad (2)$$
where $W_{j,t}$ denotes the $t$-th level-$j$ DWT coefficient of $\mathbf{X}$ and $N_j = N/2^{j}$ (the single scaling coefficient is omitted, which is innocuous for a zero-mean process).
The maximum likelihood procedure requires us to find the values of $d$ and $\sigma_\varepsilon^2$ that minimize this negative log-likelihood function. To do this, we write $C_j = \sigma_\varepsilon^2\, C_j'(d)$, where $C_j'(d)$ is the level-$j$ coefficient variance under unit innovation variance, set the derivative of Equation (2) with respect to $\sigma_\varepsilon^2$ to zero, and find the MLE of $\sigma_\varepsilon^2$:
$$\hat{\sigma}_\varepsilon^2(d) = \frac{1}{N}\sum_{j=1}^{J}\frac{1}{C_j'(d)}\sum_{t=0}^{N_j-1} W_{j,t}^2.$$
Finally, we put this value into Equation (2) to get the reduced log-likelihood, which is a function of the parameter $d$ alone (up to additive constants):
$$l(d \mid \mathbf{X}) = N \ln \hat{\sigma}_\varepsilon^2(d) + \sum_{j=1}^{J} N_j \ln C_j'(d).$$
Minimizing $l(d \mid \mathbf{X})$ over $d$ yields the wavelet-based MLE $\hat{d}$.
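Putting the pieces together, the following is a simplified sketch of the resulting estimator, with two hedges: we approximate $C_j'(d)$ by the band-pass (octave) integral of the unit-variance FDP spectrum rather than the exact filter-dependent coefficient variances used in [2], and we again let PyWavelets' 'sym4' stand in for the LA(8) filter. The scaling coefficients are dropped, as is reasonable for a zero-mean process.

```python
import numpy as np
import pywt
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

def band_variance(d, j):
    """C'_j(d): unit-innovation variance of level-j DWT coefficients,
    approximated by the band-pass integral over the octave (1/2^(j+1), 1/2^j)."""
    sdf = lambda f: (4.0 * np.sin(np.pi * f) ** 2) ** (-d)
    val, _ = quad(sdf, 2.0 ** -(j + 1), 2.0 ** -j)
    return 2.0 ** (j + 1) * val

def neg2_profile_loglik(d, details):
    """Reduced (profile) log-likelihood l(d | X), up to additive constants."""
    m = sum(len(w) for w in details)
    cj = [band_variance(d, j) for j in range(1, len(details) + 1)]
    sigma2 = sum((w ** 2).sum() / c for w, c in zip(details, cj)) / m
    return m * np.log(sigma2) + sum(len(w) * np.log(c)
                                    for w, c in zip(details, cj))

def wavelet_mle(x, wavelet='sym4', level=None):
    """Wavelet-based MLE of the fractional difference parameter d."""
    coeffs = pywt.wavedec(np.asarray(x, float), wavelet,
                          mode='periodization', level=level)
    details = list(reversed(coeffs[1:]))    # index 0 = finest level (j = 1)
    res = minimize_scalar(neg2_profile_loglik, bounds=(-0.49, 0.49),
                          args=(details,), method='bounded')
    return res.x                            # estimate of d; recall H = d + 1/2
```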
As an illustration, we apply the wavelet MLE to our simulated fGn dataset with H = 0.7 and to the volatility series of the S&P500 index (proxied by daily absolute returns). Because a dyadic-length signal is crucial for this experiment, we obtain daily data ranging from 6 February 1981 to 31 July 2013 (from http://finance.yahoo.com), for a total of $8192 = 2^{13}$ working days. In addition, we set the number of simulated fGn observations equal to 8192 for comparison. We chose an LA(8) wavelet with the decomposition depth set to 13 levels.
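A usage sketch, reusing simulate_fdp and wavelet_mle from the sketches above: since a stationary fGn with H = 0.7 behaves like an FDP with d = H - 1/2 = 0.2 at low frequencies, we use a simulated FDP as a stand-in for the fGn series (exact estimates will differ from those reported below).

```python
x = simulate_fdp(8192, d=0.2)             # dyadic length, as required here
d_hat = wavelet_mle(x, wavelet='sym4')    # 'sym4' standing in for LA(8)
print(f"d_hat = {d_hat:.3f}, H_hat = {d_hat + 0.5:.3f}")  # expect ~0.2 / ~0.7
```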
Figure 2 summarizes our results. Because the actual values of the SDF are very small, we plot their base-10 logarithms to make the figure visually clear. Estimates of $d$ are 0.2435 and 0.2444 (corresponding to $H = 0.7435$ and $H = 0.7444$, via $H = d + 1/2$) for the simulated fGn and the S&P500 daily risk process, respectively. The corresponding estimates of $\hat{\sigma}_\varepsilon^2$ (the residuals' variance) differ markedly, with the S&P500 series yielding the much smaller value. Subsequently, we have the corresponding time series models $(1 - B)^{0.2435} X_t = \varepsilon_t$ and $(1 - B)^{0.2444} X_t = \varepsilon_t$, with the respective estimated innovation variances.
To further demonstrate the ability of our estimator to capture long-memory behavior, for each case we plot the theoretical SDF of a fractional difference process with the parameter $d$ set equal to the estimated value. We then fit this SDF (indicated by a green line) to the corresponding periodogram/spectral density function obtained from the data. In line with [1]'s findings, in both cases the two spectra are in good agreement in terms of overall shape, save for some random variation. However, we obtain a much smaller value of $\hat{\sigma}_\varepsilon^2$ for the S&P500 series, so random variation is less severe in its case; in other words, the green line approximates the spectrum of the risk series better than in the case of the fGn. In summary, it can be concluded that this method is effective at detecting long-range dependence. The result also indicates that the daily S&P500 volatility series can be reasonably modelled as an fGn process, since the two have very similar long-memory parameters.