Long-Range Correlations and Characterization of Financial and Volcanic Time Series

In this study, we use Diffusion Entropy Analysis (DEA) to detect and analyze the scaling properties of time series from both emerging and well-established markets, as well as volcanic eruptions recorded by a seismic station. Both the financial and the volcanic time series are high-frequency data. The objective is to determine whether they follow a Gaussian or a Lévy distribution and to establish whether long-range correlations exist in these time series. The results obtained from the DEA technique are compared with those of the Hurst R/S analysis and Detrended Fluctuation Analysis (DFA). We conclude that these methodologies are effective in classifying the high-frequency financial indices and volcanic eruption data: the financial time series can be characterized by a Lévy walk, while the volcanic time series is characterized by a Lévy flight.


Introduction
The collection and analysis of time series data is an important area of research. Inferences drawn from such data sets have supported forecasting as well as improvements to various industrial products. One inference that is often sought is whether a time series exhibits persistence (long-range correlations), randomness, or anti-persistence. Long-range correlations refer to the slow decay of the temporal or spatial correlation function, defined as $\gamma_{xy}(\delta) = \langle X(t)\, Y(t + \delta) \rangle$. (1) A time series that exhibits long-range correlations is one in which the evolution of the system is affected by previous system states over long periods of time [1][2][3][4][5]. This makes the detection of long-range correlations in time series data very important in various fields. Figure 1 shows a time series exhibiting both short-term and long-term dependency. The plot shows the hourly occupancy rate of a road in the Bay Area. We observe two repeating patterns, daily and weekly. The daily pattern describes the morning peaks versus the evening peaks, whereas the weekly pattern reflects the difference between workdays and weekends. However, determining the existence of long-range correlations with the formula in Equation (1) is challenging because of its sensitivity to noise. This, in addition to other factors, has pushed research toward the development of a number of scaling methods [2][3][4][5][7][8][9][10][11][12][13][14][15][16].
Various scaling methods exist and have been used by many researchers to detect persistence or anti-persistence in time series. The most notable applications are in financial and geophysical time series. Examples of these scaling methods are the Rescaled Range Analysis (R/S), the Detrended Fluctuation Analysis (DFA), the Relative Dispersion Analysis (RDA), and the fairly recent Diffusion Entropy Analysis (DEA), which was developed by Scafetta [7][8][9]. Scafetta used the DEA to detect the scaling behavior of DNA sequences. The R/S, DFA, and RDA are variance scaling methods, and their scaling exponent is called the Hurst exponent, named after Hurst, who first studied it in hydrology. The DEA, on the other hand, is a pdf scaling method.
The variance scaling methods, however, encounter various challenges when faced with time series data that exhibit anomalous behavior. The R/S analysis in particular is usually unable to correctly detect the scaling exponent of non-stationary time series data, while the DFA is known to overestimate the scaling exponent. Thus, two shortcomings of the variance scaling methods are that they may fail to recover the exact value of the exponent even when they detect scale invariance, and that they are not applicable to processes with infinite variance, such as the Lévy flight [7]. This is what makes the DEA, the main focus of this paper, an important method for detecting the scaling exponent of a time series. The DEA detects the scaling parameter δ using the pdf of the diffusion process derived from the time series. The advantage of the DEA over the variance scaling methods is that it can efficiently establish the possible existence of scaling in time series data with normal or anomalous properties, without any alteration of the data due to detrending [7][8][9][17].
Research on long-range correlations has made it possible to gain more insight into the long-range evolution patterns of complex and chaotic phenomena, both in nature (geophysical time series) and in other equally important fields, including financial markets, traffic analysis, and bioengineering. The results of this research have provided various approaches to minimize risk and to forecast or predict future dynamical trends [7][8][9][10][18].
In this study we consider several financial time series as well as some geophysical time series and analyze their long-range correlations using the R/S analysis, the DFA, and the DEA. The continuous time-varying Lévy process is effective for capturing the stochastic volatility (SV) and fat tails of the data distribution. It is known that the volatilities of high-frequency data are correlated and vary stochastically over time. We seek to characterize the time series data (i.e., to determine whether it follows a Gaussian or a Lévy distribution) by comparing the scaling exponents derived with the R/S analysis and the DFA against that of the DEA. This manuscript is organized as follows. Section 2 gives a brief background of the R/S analysis and the DFA and a detailed background of the DEA, including the procedure used to detect the scaling exponent (δ) for stationary and non-stationary time series data. Section 3 presents an overview of the data, including results on the stationarity of the series. Tables of results and figures are shown in Section 4, and Sections 5 and 6 contain a brief discussion and the conclusions.

Variance Scaling Methods
In this section we briefly introduce the Rescaled Range Analysis and the Detrended Fluctuation Analysis. The Diffusion Entropy Analysis is then discussed in more detail.

Rescaled Range Analysis
The idea of the Rescaled-Range analysis (R/S) was presented by Hurst in the framework of his study on the long-run variations of the water level of the Nile river [19]. It has become very popular since then, and has been applied to a wide range of disciplines, including traffic analysis, bioengineering, physics, geology, biology and geophysics [20].
The name H for the parameter derived from this technique was coined by Mandelbrot in tribute to the hydrologist Hurst and the mathematician Hölder. The parameter H, also known as the index of dependence, represents the relative trend of a time series and always lies between 0 and 1; it is equal to 1/2 for processes with independent increments. Of particular interest for our work is the case 0.5 < H < 1, since it indicates long-range correlations.
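As an illustration, the classical R/S estimate of H can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in this study: the function names (`slope`, `rescaled_range`, `hurst_rs`), the choice of non-overlapping blocks, and the block sizes are all hypothetical. For each block, the range of the cumulative deviations from the block mean is divided by the block standard deviation, and H is read off as the slope of log(R/S) against log(block size).

```python
import math
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def rescaled_range(block):
    """R/S of one block: range of cumulative deviations over the block std."""
    n = len(block)
    mean = sum(block) / n
    cum, lo, hi = 0.0, 0.0, 0.0
    for v in block:
        cum += v - mean
        lo, hi = min(lo, cum), max(hi, cum)
    std = math.sqrt(sum((v - mean) ** 2 for v in block) / n)
    return (hi - lo) / std if std > 0 else None


def hurst_rs(series, block_sizes):
    """Estimate H as the slope of log(mean R/S) against log(block size)."""
    log_n, log_rs = [], []
    for n in block_sizes:
        # Non-overlapping blocks of length n (an assumption of this sketch).
        blocks = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
        values = [rs for rs in map(rescaled_range, blocks) if rs is not None]
        if values:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(values) / len(values)))
    return slope(log_n, log_rs)


# Synthetic check on uncorrelated Gaussian noise (not data from this study).
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
h_noise = hurst_rs(noise, [16, 32, 64, 128, 256, 512])
print(f"R/S estimate of H for white noise: {h_noise:.2f}")
```

For uncorrelated noise the estimate should lie near 1/2, although the R/S statistic is known to be biased upward for short blocks, so values somewhat above 0.5 are expected in this synthetic check.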

Detrended Fluctuation Analysis
In order to study the self-similarity and long-range dependence of time series Peng et al. [21] proposed the Detrended Fluctuation Analysis (DFA) while examining a series of DNA nucleotides. From the moment it was proposed to date, DFA has become a widely used method for the determination of fractal scaling properties and the detection of long-range correlations in non-stationary time series. It has been applied for example in biology, meteorology, geophysics and economics [21][22][23][24][25][26][27].
The principal advantage of the DFA lies in its ability to differentiate the intrinsic autocorrelations of the time series from those imposed by non-stationary external trends. That is, the method focuses on the intrinsic structure of the correlations of market fluctuations at different time scales, leaving aside non-stationary trends.
The DFA method yields a scaling exponent α, estimated from the slope of the function F(s), which measures the root-mean-square deviation from an optimal linear approximation of the trend in segments of length s. The fluctuation function F(s) behaves as a power law in s, F(s) ∝ s^α, so α can be computed from the slope of a log-log plot of F(s) versus s. The DFA exponent α and the Hurst parameter H are related by Equation (2). However, because of its sensitivity to abnormal values in the series, the rescaled range analysis is not suitable for analyzing long-range autocorrelation in non-stationary series.
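The DFA steps described above can be sketched as follows. This is a minimal illustration with first-order (linear) detrending and non-overlapping segments. The function names and the chosen scales are hypothetical and are not taken from the implementation used in this study.

```python
import math
import random


def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return b, my - b * mx


def dfa_fluctuation(profile, s):
    """F(s): RMS deviation of the profile from per-segment linear trends."""
    xs = list(range(s))
    total, count = 0.0, 0
    for start in range(0, len(profile) - s + 1, s):
        seg = profile[start:start + s]
        b, a = linear_fit(xs, seg)
        total += sum((y - (a + b * x)) ** 2 for x, y in zip(xs, seg))
        count += s
    return math.sqrt(total / count)


def dfa_exponent(series, scales):
    """Estimate alpha as the slope of log F(s) against log s."""
    mean = sum(series) / len(series)
    profile, cum = [], 0.0
    for v in series:
        cum += v - mean  # integrate the mean-centred series
        profile.append(cum)
    log_s = [math.log(s) for s in scales]
    log_f = [math.log(dfa_fluctuation(profile, s)) for s in scales]
    alpha, _ = linear_fit(log_s, log_f)
    return alpha


# Synthetic checks (not data from this study).
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
walk, level = [], 0.0
for step in noise:
    level += step
    walk.append(level)

scales = [16, 32, 64, 128, 256]
alpha_noise = dfa_exponent(noise, scales)
alpha_walk = dfa_exponent(walk, scales)
print(f"alpha (white noise) = {alpha_noise:.2f}, alpha (random walk) = {alpha_walk:.2f}")
```

On these synthetic inputs the exponent should be close to 0.5 for uncorrelated noise and close to 1.5 for its cumulative sum, which gives a quick sanity check before applying the method to real data.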

Diffusion Entropy Analysis
Based on the direct evaluation of the Shannon entropy [7][8][9][14][15], the DEA is a pdf scaling method that treats the numbers in a time series as the trajectory of a diffusion process [13]. The scaling property for a stationary time series takes the form
$$p(x, t) = \frac{1}{t^{\delta}} F\left(\frac{x}{t^{\delta}}\right), \qquad (3)$$
where x denotes the diffusion variable, p(x, t) is its probability density function (pdf) at time t, and 0 < δ < 1 is the scaling exponent. The scaling property for a non-stationary time series takes the same form with a time-dependent exponent δ(t):
$$p(x, t) = \frac{1}{t^{\delta(t)}} F\left(\frac{x}{t^{\delta(t)}}\right). \qquad (4)$$
As derived in References [7,8], a diffusion process generated by a Lévy walk is characterized by the following relation:
$$\delta = \frac{1}{3 - 2H}, \qquad (5)$$
where H refers to the scaling exponent derived from the variance scaling methods.

Estimation Procedure
In this subsection, we describe the estimation technique for the scaling exponent, δ. We first present a brief background on the Shannon Entropy that is used for estimating δ.

The Shannon Entropy
The concept of entropy was developed by Rudolf Clausius in 1865, a few years after he stated the laws of thermodynamics [18,28]. Entropy is an indicator of the lack of information about an event that occurs with probability p [18].
Other types of entropy include the Kolmogorov-Sinai entropy, the Rényi entropy, and the Tsallis entropy [7][8][9][18]. The Shannon entropy (named after Shannon) measures the information of a probability distribution as follows:
$$S = -\sum_{i} p_i \log p_i. \qquad (6)$$
The summation is replaced by an integral in the case of continuous probability distributions. Equation (6) is used to derive the log equation that determines the DEA scaling exponent δ. The technique for estimating δ is as follows:
• The time series data is first transformed into a diffusion process.
• Shannon's entropy of the diffusion process is calculated.
A log-linear or log-quadratic equation is derived from the Shannon entropy by substituting Equation (3) or Equation (4), respectively. Simplifying the result of the substitution gives the following relation for a stationary time series:
$$S(t) = A + \delta \log(t). \qquad (7)$$
For a non-stationary series, the relation is
$$S(t) = A + \delta(t)\, \tau, \qquad (8)$$
where δ(t) = δ_0 + η log(t) and τ = log(t), with η log(t) < 1 − δ_0. After some simplifications, Equation (8) becomes a log-quadratic relation, Equation (9), in which K < 0 and δ_0 ≡ δ from the stationary pdf. Thus, by fitting a log-quadratic model to the non-stationary series and a log-linear model to the stationary series, we are able to determine the scaling exponent δ (δ_0). At t = 1, it is clear that the constant A in both Equations (7) and (8) is given by S(1).
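For the stationary case, the log-linear relation can be verified in a few lines. The sketch below assumes the standard DEA scaling form p(x, t) = t^{-δ} F(x t^{-δ}) from Reference [7], the continuous form of the Shannon entropy, and a normalized F (its integral equals 1). The substitution variable y is introduced here for illustration only.

```latex
\begin{aligned}
S(t) &= -\int_{-\infty}^{\infty} p(x,t)\,\log p(x,t)\,dx,
  \qquad p(x,t) = t^{-\delta}\,F\!\left(x\,t^{-\delta}\right) \\
     &= -\int_{-\infty}^{\infty} F(y)\,\bigl[\log F(y) - \delta \log t\bigr]\,dy
  \qquad \bigl(y = x\,t^{-\delta},\ dx = t^{\delta}\,dy\bigr) \\
     &= A + \delta \log t,
  \qquad A = -\int_{-\infty}^{\infty} F(y)\,\log F(y)\,dy .
\end{aligned}
```

Setting t = 1 gives S(1) = A, so that S(t) − S(1) = δ log t is a straight line in log t with slope δ.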
Thus δ (or δ_0) is obtained by estimating the slope of the log-linear equation above, or from the coefficients of the log-quadratic equation. For details of the algorithm used to transform the series into a diffusion process, we refer the reader to Reference [7].
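The estimation procedure for the stationary case can be sketched as follows. This is a minimal illustration under stated assumptions and not the algorithm of Reference [7] in full: the diffusion variable is taken to be the sum of the series over overlapping windows of length t, the pdf is approximated by a histogram with a fixed bin width, and δ is the fitted slope of S(t) against log t. The function names, the bin width, and the window lengths are hypothetical choices.

```python
import math
import random
from collections import Counter


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def diffusion_entropy(series, t, bin_width):
    """Shannon entropy of the sums over all overlapping windows of length t."""
    # Prefix sums turn each window sum into a single subtraction.
    prefix = [0.0]
    for v in series:
        prefix.append(prefix[-1] + v)
    sums = [prefix[i + t] - prefix[i] for i in range(len(series) - t + 1)]
    # Histogram of the diffusion variable with fixed-width bins.
    counts = Counter(math.floor(x / bin_width) for x in sums)
    total = len(sums)
    discrete = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Adding log(bin_width) approximates the continuous entropy;
    # it shifts the constant A but does not affect the slope.
    return discrete + math.log(bin_width)


def dea_exponent(series, times, bin_width):
    """Estimate delta as the slope of S(t) against log(t)."""
    log_t = [math.log(t) for t in times]
    entropy = [diffusion_entropy(series, t, bin_width) for t in times]
    return slope(log_t, entropy)


# Synthetic check on uncorrelated Gaussian noise (not data from this study).
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(20000)]
delta_noise = dea_exponent(noise, [4, 8, 16, 32, 64], bin_width=0.5)
print(f"DEA estimate of delta for white noise: {delta_noise:.2f}")
```

For uncorrelated Gaussian noise the diffusion is ordinary, so the estimate should be close to 0.5. For a non-stationary series, the same entropy values could instead be fitted with a quadratic in log t, as described above.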

Financial and Volcanic Time Series
In this work we applied two variance scaling methods (the R/S analysis and the DFA) and a pdf scaling method (the DEA) to financial and volcanic time series data. This section gives a brief background of the data sets used and presents the stationarity test: the Augmented Dickey-Fuller (ADF) test was used to check the stationarity of the time series [29].

Financial Time Series
The financial data used was taken from:

Volcanic Time Series
The Volcanic data used was recorded by seismic stations belonging to the Bezymianny Volcano Campaign Seismic Network (PIRE). Data was requested for 10 days before and 5 days after the published time of the volcanic eruptions. The seismic stations used were BEZB and BELO. Volcanic eruptions 1 and 2 were from BEZB and Volcanic eruptions 3-8 were from BELO.

Stationarity of the Financial and Volcanic Time Series
In this section the stationarity of the Financial and Volcanic data is determined by using the Augmented Dickey-Fuller test (ADF). We implemented both methods in R and Python.

Augmented Dickey-Fuller
The Augmented Dickey-Fuller test is a type of statistical test called a unit root test. The null hypothesis of the test is that the time series can be represented by a unit root, that is, it is not stationary (it has some time-dependent structure). The alternative hypothesis (accepted when the null hypothesis is rejected) is that the time series is stationary.
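The logic of a unit root test can be illustrated with the simplest member of the family. The sketch below is not the augmented test used in this study: it implements the plain Dickey-Fuller regression with a constant and no lagged differences, Δy_t = c + ρ y_{t−1} + e_t, and compares the t-statistic of ρ with the asymptotic 5% critical value of −2.86 for that specification. The function name and the synthetic series are hypothetical.

```python
import math
import random


def dickey_fuller_stat(y):
    """t-statistic of rho in the regression dy_t = c + rho * y_{t-1} + e_t."""
    x = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(x)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((v - mx) * (d - md) for v, d in zip(x, dy))
    rho = sxy / sxx
    c = md - rho * mx
    rss = sum((d - c - rho * v) ** 2 for v, d in zip(x, dy))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of rho
    return rho / se


# Asymptotic 5% critical value for the regression with a constant, no trend.
CRITICAL_5PCT = -2.86

# Synthetic series (not data from this study).
random.seed(2)
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]
walk, level = [], 0.0
for step in noise:
    level += step
    walk.append(level)

stat_noise = dickey_fuller_stat(noise)
stat_walk = dickey_fuller_stat(walk)
for name, stat in [("white noise", stat_noise), ("random walk", stat_walk)]:
    verdict = "reject unit root" if stat < CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: DF statistic = {stat:.2f} ({verdict})")
```

The augmented test adds lagged differences of the series to this regression to absorb serial correlation in the errors. Library routines also report a p-value rather than a comparison with a single critical value.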

Financial Time Series
Applying the ADF test to the financial data at the significance level α = 0.05 yielded the p-values reported in Table 1.

Volcanic Time Series
Applying the ADF test to the volcanic time series at the significance level α = 0.05 yielded the p-values reported in Table 2. The above tables summarize the results obtained for the two time series; it is clear from both tests that the volcanic time series is non-stationary while the financial time series is stationary.

Results
This section describes the analysis of the financial indices and volcanic time series when our models are applied to the data sets. Tables 3 and 4 show the scaling exponents derived by applying the three scaling methods. The δ, H, and α exponents are used to obtain δ_Lévy (R/S) and δ_Lévy (DFA). The Hurst analysis of the financial indices and volcanic time series is shown in Figures 2-6 and 17-20. The slope of the best straight line fitted to the logarithmic plot of the rescaled range (R/S) versus time is the Hurst exponent (see Table 3). The DFA results show that the scaling exponent (α) is less than 1, which confirms the presence of long-range correlations; that is, large values are likely to be followed by large values and vice versa. The DFA thus allows us to study the correlations in the data without disturbance from seasonality or trend. In Figures 12-16 and 25-28, we notice a considerable difference between the DEA results for the financial indices and those for the volcanic eruption data. Unlike for the financial indices, S(t) − S(1) for the volcanic eruption data increases almost exponentially with the logarithm of the time scale.

Table 3. Scaling exponents for emerging and established markets time series.

Figures
In this section we present figures obtained from our numerical simulation of the financial and volcanic time series data after we applied the R/S analysis, the DFA and the DEA. Figures 2-6 show the log-log plot from the R/S analysis applied on the financial time series. Figures 7-11

Discussion
For the financial series, all three scaling methods correctly detect the existence of long-range correlations. Comparing δ with the relation in Equation (5), we see that the relation holds up to adjustments within the interval (0, 0.06); exact equality is almost always impossible because each scaling method derives its scaling exponent through approximations. We therefore deduce that the financial time series is characterized by a Lévy walk. For the volcanic data, however, the R/S analysis is unable to correctly detect the existence of long-range correlations because the volcanic data is non-stationary, whereas the DEA and DFA do detect them. Equation (5) is not satisfied, and clearly δ ≠ H and δ ≠ α. Hence the volcanic series can be characterized neither by fractional Brownian motion (FBM) nor by a Lévy walk. The volcanic time series is thus characterized by a Lévy flight (i.e., it has an infinite variance).
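The comparison can be made concrete with the approximate exponents quoted in the Conclusions. The sketch below assumes that the Lévy walk relation of Equation (5) is δ = 1/(3 − 2H), the form reported in the DEA literature [7,8]. The function name is hypothetical and the input values are the rounded figures from the text, not the full entries of Tables 3 and 4.

```python
def levy_walk_delta(h):
    """DEA exponent predicted for a Lévy walk from a variance scaling exponent h."""
    return 1.0 / (3.0 - 2.0 * h)


# Rounded values quoted in the Conclusions (assumed representative).
cases = [
    ("financial, R/S", 0.65, 0.59),
    ("financial, DFA", 0.65, 0.59),
    ("volcanic, R/S", 0.39, 0.6837),
]

gaps = {}
for label, h, delta_dea in cases:
    predicted = levy_walk_delta(h)
    gaps[label] = abs(predicted - delta_dea)
    print(f"{label}: predicted delta = {predicted:.3f}, "
          f"DEA delta = {delta_dea}, gap = {gaps[label]:.3f}")
```

With these rounded inputs the financial prediction is about 0.588, within a few thousandths of the DEA value, whereas the volcanic prediction is about 0.450, far outside the tolerance of 0.06 mentioned above.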

Conclusions
In this study, we have used high-frequency financial and volcanic time series to analyze their scaling and dynamic behavior. We have implemented three scaling techniques, namely Diffusion Entropy Analysis, Detrended Fluctuation Analysis, and the R/S analysis, which provide scaling exponents and the Hurst parameter. The techniques allow us to characterize the data distribution and the long-range correlations of the data. To obtain a good fit for the data, we first analyzed their stationarity using unit root tests (see Section 3.3). The p-values are significant at all specified levels for the financial data (Table 1), so the high-frequency financial indices used in this paper are stationary. In Section 3.3.3, we see that the volcanic time series data shows non-stationary behavior. We applied the three scaling techniques to our financial and geophysical data in order to estimate the exponent parameters.
Tables 3 and 4 summarize the estimates of the parameters α, δ, and H for the financial and volcanic data, respectively. The estimated values (α, δ, and H) fall between 0 and 1, which means that the high-frequency stock market data and volcanic eruption data show long-memory behavior. Long memory indicates that present information is highly correlated with past information at specified levels, which may facilitate prediction. We conclude that for the high-frequency stock market data, the Hurst coefficient is close to 0.65, the detrended fluctuation parameter is close to 0.65, and the diffusion entropy parameter is close to 0.59. For the high-frequency volcanic time series, the Hurst coefficient is close to 0.39 and the diffusion entropy parameter is 0.6837. In addition, we have shown with a combination of the DEA and the variance scaling methods that the financial time series can be characterized by a Lévy walk, while the volcanic time series is characterized by a Lévy flight. The Lévy process is useful for detecting a financial crash in the stock market or risky seismic events. Since the high-frequency data follow an almost log-normal distribution, for any finite-variance Lévy process, randomizing time is equivalent to randomizing variance. Thus the time-varying Lévy process generates stochastic volatility (SV) by randomizing time, which may improve forecasting performance. The reason is that the SV model takes into account a stochastic component of the data volatility and estimates the time-varying parameters using filtering techniques in order to predict future volatility [30].