Efficiency of the Moscow Stock Exchange before 2022

This paper investigates the degree of efficiency of the Moscow Stock Exchange. A market is called efficient if the prices of its assets fully reflect all available information. We show that the degree of market efficiency is significantly low for most months from 2012 to 2021. We calculate the degree of market efficiency by (i) filtering out regularities in the financial data and (ii) computing the Shannon entropy of the filtered return time series. We develop a simple method for estimating volatility and price staleness in empirical data, in order to filter such regularity patterns out of the return time series. The resulting time series of stock returns are then clustered into groups according to entropy measures: in particular, we use the Kullback-Leibler distance and a novel entropy metric capturing the co-movements between pairs of stocks. Using Monte Carlo simulations, we identify the time periods of market inefficiency for a group of 18 stocks. The inefficiency of the Moscow Stock Exchange that we detect signals the possibility of devising strategies that are profitable net of transaction costs. The deviation of a stock from efficient behavior strongly depends on the industrial sector it belongs to.


Introduction
When prices reflect all available information, the market is called efficient [1]. One way to assess the efficiency of a market is to test the Efficient Market Hypothesis (EMH). In its weak form, the EMH states that the last price incorporates all past information about market prices [2]. If the weak form of the EMH is rejected, previous prices help to predict future prices. For traders, market efficiency means that analyzing the history of past prices does not help to design a strategy yielding an abnormal profit. For a company issuing shares, market efficiency means that the price of its shares already reflects all information about the valuation and decisions of the company. The EMH is also of great interest in research. Mathematical models of an asset price are usually based on the assumption that the price follows a martingale: the expected value of a future price is the current price. If the EMH is rejected, there exists an estimate of the future price that is better than its current value, and new models should be devised.
A review of works confirming the EMH was presented by Fama in 1970 [2] and again in 1991 [3]. The martingale hypothesis was also tested later. It was shown that the efficiency of a market depends on the development of the country [4]. The martingale hypothesis was also confirmed on short time intervals, but may be violated on longer ones [5]. In addition, a range of strategies has been designed to increase expected profit. High-frequency and algorithmic trading strategies are discussed in [6]. Statistical and machine learning methods for high-frequency trading are reviewed in [7]. The existence of such profitable strategies contradicts the Efficient Market Hypothesis. According to Grossman and Stiglitz [8], the degree of market inefficiency determines the effort investors are willing to expend to gather and trade on information. A goal of this paper is to investigate the degree of stock market efficiency using the Shannon entropy.
Before estimating the degree of market efficiency, we need to remove regularities that make prices more predictable but do not imply any profitable strategy. The methodology of filtering regularities was introduced in [9]. However, such filtering has usually not been applied in other research works (see e.g. [10,11,12]). In fact, deviations of price behavior from perfect randomness may result from a known regularity pattern, such as volatility clustering or daily seasonality, rather than signal market inefficiency.
We process the data by filtering out regularities of financial time series, including volatility clustering and price staleness. Price staleness is defined as a lack of price adjustments, yielding 0-returns. Traders may trade less because of high transaction costs, so the price does not update; see [13] for more details. Price staleness produces an extra amount of 0-returns, called excess 0-returns. The other source of 0-returns in the time series is price rounding. The estimates of volatility and of the degree of price staleness are mutually connected: excess 0-returns appearing due to price staleness tend to bias the volatility estimate downwards, while a volatility estimate is needed to compute the expected amount of 0-returns due to rounding.
One way to estimate volatility in the presence of excess 0-returns was presented in [14]. It uses the expectation-maximization algorithm [15] to estimate returns in place of all 0-returns, and a GARCH(1,1) model to estimate volatility [16]. The maximization of the likelihood function at each step of that algorithm requires several user-defined parameters for numerical optimization. If the volatility estimate is sensitive to these parameters, they may affect the entropy of returns standardized by volatility and the amount of 0-returns in the time series. In this article we suggest a modification of the moving-average volatility estimation that requires tuning a single parameter, which can be chosen using out-of-sample testing. The idea is to adapt a simple method of volatility estimation so that price staleness is taken into account. Moreover, while estimating volatility, we filter out excess 0-returns.
The degree of market efficiency has been measured for many countries. Stock indices for 20 countries, including Taiwan, Mexico, and Singapore, were considered in [17]. The efficiency of 11 emerging markets, the US, and Japan was measured in [18] using the Hurst exponent and R/S statistics. A review of articles about the Baltic countries was presented in [19]. The degree of uncertainty of the Chinese [20], Tunisian [21], and Portuguese [22] stock markets was also studied using entropy measures. However, the efficiency of the Russian stock market has not yet been analyzed. In this paper, we present an analysis of market efficiency based on the estimation of the Shannon entropy for a group of 18 stocks of Russian companies from five industries.
Our paper introduces four original contributions. First, we construct a method for filtering out heteroskedasticity and price staleness. This filtering helps to identify the true degree of market inefficiency. Second, we calculate the degree of market inefficiency for the previous decade using monthly intervals. We conclude that the degree of market inefficiency for the Moscow Stock Exchange was greater than 80%. Third, we determine which stocks exhibit the largest amount of inefficiency, as measured by estimating the Shannon entropy of their high-frequency price time series. We show that the months in which the predictability of stock prices attains its maximum cluster together, and we identify the price behavior that repeats most often in inefficient time periods. Finally, we estimate the closeness of price movements using two entropy measures. Based on these results, we cluster together groups of stocks for which the efficient market hypothesis is rejected, pointing out how market inefficiency depends on the sector the stocks belong to.
The article is organized as follows. Section 2 describes the dataset and the methodology of filtering data regularities and calculating the Shannon entropy. Section 3 presents the results on simulated and real data. Section 4 concludes the paper.

Dataset
We study the Moscow Stock Exchange. We consider closing prices aggregated at the one-minute time scale. In particular, we select only minutes of the main trading session, from 10:00 to 18:40. The time interval covers ten years, from 2012 to 2021, and is divided into monthly intervals. We take 18 companies; 16 of them are from five sectors: oil industry, metallurgy, banks, telecommunications, and electricity. All stocks are listed in Table 1. For each company, we specify the stock ticker, its sector, the size of the data, and the number of outliers removed.

Apparent inefficiencies
To estimate the degree of market efficiency, we first have to eliminate known patterns of predictability, such as daily seasonality. Financial agents operating in the market tend to trade less in the middle of the day. This is reflected in prices, and this pattern should be filtered out to detect genuine patterns of inefficiency. Other known regularities are volatility clustering, price staleness, and microstructure noise. See Appendix A for the guide to filtering out apparent inefficiencies. The contribution of this article is a simple method for filtering volatility clustering and price staleness. One of the methods to estimate volatility is the exponentially weighted moving average (EWMA), described in the next section.

EWMA
We define price returns as r_t = ln(P_t/P_{t−1}), where P_t is the last price available at time t and ln() is the natural logarithm. In order to estimate the volatility σ_n, we apply the exponentially weighted moving average [24] of the values μ₁⁻¹|r_{n−1}|:

σ_n = (1 − α) σ_{n−1} + α μ₁⁻¹ |r_{n−1}|.   (1)

Here the fact that E[|r_n|] = μ₁ σ_n is used (μ₁ = √(2/π) for Gaussian returns), and more weight is given to the more recent data. An alternative formula, based on the fact that E[r_n²] = σ_n², is

σ_n² = (1 − α) σ_{n−1}² + α r_{n−1}².   (2)

A large value of a return increases the value of volatility. The current value of volatility reflects all available returns and changes slowly if α is small. Usually, the smoothing parameter α is taken close to 0. For instance, α = 0.05 is used in [9]. The value of α is set to 0.12 for in-sample testing and 0.22 for out-of-sample testing in [25]. Instead, Hunter [24] suggests using α = 0.2 ± 0.1. Using the principle of the best one-step forecast, the smoothing parameter is set to 0.06 for daily data and to 0.03 for monthly data [26].
We follow the approach suggested in [26] (p. 97) to find the optimal value of α. The goal is to select the parameter α that minimizes Er_σ = Σ_i (σ_i² − r_i²)². To minimize Er_σ as a function of the single parameter 0 < α < 1, we apply Brent's algorithm [27].
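As an illustration, the EWMA recursion and the selection of α can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's code: the function names (`ewma_vol`, `best_alpha`) are ours, and a simple grid search stands in for Brent's algorithm.

```python
import numpy as np

MU1 = np.sqrt(2.0 / np.pi)  # E|Z| for a standard normal Z, so E|r_n| = MU1 * sigma_n

def ewma_vol(returns, alpha, sigma0=None):
    """EWMA volatility: sigma_n = (1-alpha)*sigma_{n-1} + alpha*|r_{n-1}|/MU1."""
    r = np.asarray(returns, dtype=float)
    sigma = np.empty_like(r)
    s0 = float(np.std(r)) if sigma0 is None else sigma0
    sigma[0] = s0 if s0 > 0 else 1.0
    for n in range(1, len(r)):
        sigma[n] = (1 - alpha) * sigma[n - 1] + alpha * abs(r[n - 1]) / MU1
    return sigma

def forecast_error(alpha, returns):
    """Er_sigma = sum_i (sigma_i^2 - r_i^2)^2, the one-step forecast criterion."""
    r = np.asarray(returns, dtype=float)
    sigma = ewma_vol(r, alpha)
    return float(np.sum((sigma ** 2 - r ** 2) ** 2))

def best_alpha(returns, grid=None):
    """Minimize Er_sigma over alpha in (0, 1); a grid search stands in here
    for Brent's algorithm used in the paper."""
    grid = np.linspace(0.01, 0.99, 99) if grid is None else grid
    errs = [forecast_error(a, returns) for a in grid]
    return float(grid[int(np.argmin(errs))])
```

In practice one would replace the grid search with a bounded one-dimensional minimizer on (0, 1).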

Estimation of price staleness
Let us define an efficient price, P^e, as a continuous process following a Geometric Brownian Motion. If the efficient price changes insignificantly, the return of the rounded price is equal to 0. Conversely, if the return of the rounded price is 0, the return of the efficient price has a value close to 0. We use Equation 3 to determine the probability that a 0-return appears due to rounding:

p_t = erf( d / (2√2 P σ √∆) ),   (3)

where erf(x) is the Gaussian error function, d is the tick size, ∆ is the time step, P is the rounded price, and σ is the volatility estimate [29]. The formula is obtained by considering the probability that a price following a Geometric Brownian Motion moves by less than one tick, assuming normally distributed price increments.
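The probability in Eq. 3 can be computed with the standard library alone. The function below is a sketch under our reading of the reconstructed formula (a Gaussian log-return with standard deviation σ√∆ staying within half a tick of the current price); `prob_zero_from_rounding` and its default arguments are our names, not the paper's code.

```python
import math

def prob_zero_from_rounding(price, sigma, d=0.01, delta=1.0):
    """Probability that a rounded price shows a 0-return purely because the
    efficient price moved less than one tick over a step of length delta.
    The log-return is ~ N(0, sigma^2 * delta); the rounded price stays put
    roughly when |r| < d / (2 * price)."""
    if sigma <= 0:
        return 1.0
    x = d / (2.0 * price)
    return math.erf(x / (sigma * math.sqrt(2.0 * delta)))
```

As expected, the probability of a rounding zero grows as volatility shrinks relative to the tick size.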
There is another source of 0-returns, namely price staleness. Price staleness is a regularity in which the fundamental (efficient) price of an asset is not updated for economic reasons. Such a reason is, for instance, a high transaction cost, which makes transactions unprofitable for traders; see [13] for more details. The presence of price staleness in the data implies the existence of excess 0-returns in place of the returns of the efficient price. The excess 0-returns tend to reduce the volatility estimate. Therefore, we need to filter out 0-returns due to price staleness and keep 0-returns due to rounding.
According to our methodology for filtering out excess 0-returns presented in [29], we keep 0-returns proportionally to the probability in Eq. 3 and set the other 0-returns as missing values. We adopt this methodology to estimate the degree of price staleness together with volatility in the next section.

Modification of EWMA
In this section, we present a modification of the EWMA that takes the effect of price staleness into consideration. The modification is to set σ_n = σ_{n−1} (i.e., α = 0) whenever the value of r_{n−1} is missing because of price staleness: there is no new information from returns to update the volatility estimate.
Initially, the expected amount of 0-returns due to rounding is N_save = 0. Thus, each appearance of a 0-return does not affect the value of volatility. A 0-return is attributed to rounding, and kept in the sequence, whenever the cumulative sum of the p_i (Eq. 3) reaches a new integer value. Further details and the volatility-estimation algorithm can be found in Appendix B.
We update the estimates of volatility and price staleness minute by minute. This method has the clear advantage of making online inference possible, since data are processed in real time.
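A minimal sketch of the staleness-aware update: when a return is flagged as missing, the previous volatility estimate is carried forward unchanged. Function and argument names are ours.

```python
import math

MU1 = math.sqrt(2.0 / math.pi)  # E|Z| for standard normal Z

def ewma_vol_with_staleness(returns, alpha, sigma0):
    """EWMA where a return flagged as missing (None/NaN, i.e. a stale
    0-return) carries no information: sigma_n = sigma_{n-1} (alpha = 0)."""
    sigmas = [sigma0]
    for r in returns:
        prev = sigmas[-1]
        if r is None or (isinstance(r, float) and math.isnan(r)):
            sigmas.append(prev)          # stale minute: keep previous estimate
        else:
            sigmas.append((1 - alpha) * prev + alpha * abs(r) / MU1)
    return sigmas[1:]
```

Because each step uses only the previous estimate and the latest return, the update runs online, one minute at a time.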

The Shannon entropy
The degree of market efficiency is assessed by computing the Shannon entropy. The entropy of a source is an average measure of the randomness of its outputs [30].
Definition 1. Let X = {X₁, X₂, . . .} be a stationary random process with a finite alphabet A and measure μ. The n-th order entropy of X is

H_n(X) = − Σ_{x₁ⁿ ∈ Aⁿ} μ(x₁ⁿ) log μ(x₁ⁿ),

with the convention 0 log 0 = 0. The process entropy (entropy rate) of X is

h(X) = lim_{n→∞} H_n(X)/n.

Discretization
The Shannon entropy is computed over a finite alphabet. To measure it, we need the length k of the blocks of symbols to be sufficiently large: predictable behavior of returns may be visible in longer blocks yet unnoticeable in shorter ones. For this reason, we consider 3-symbol and 4-symbol discretizations based on empirical quantiles.
s_t = 0 if r_t ≤ θ₁, s_t = 1 if θ₁ < r_t ≤ θ₂, s_t = 2 if r_t > θ₂ (3 symbols); s_t = 0, 1, 2, 3 if r_t falls in (−∞, Q₁], (Q₁, Q₂], (Q₂, Q₃], (Q₃, ∞), respectively (4 symbols), where θ₁ and θ₂ are tertiles and Q₁, Q₂, Q₃ are quartiles. The tertiles divide the data into three equal parts; the quartiles divide the data into four equal parts. Q₂ is also the median of the empirical distribution of returns. For the later analysis, we will also need a discretization describing the behavior of a pair of stocks:
s_t = 0 if r(1)_t < m₁ and r(2)_t < m₂; s_t = 1 if r(1)_t < m₁ and r(2)_t ≥ m₂; s_t = 2 if r(1)_t ≥ m₁ and r(2)_t < m₂; s_t = 3 if r(1)_t ≥ m₁ and r(2)_t ≥ m₂,   (4)

where r(1)_t and r(2)_t are two time series of price returns and m₁ and m₂ are their medians.
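Both discretizations reduce, in code, to quantile look-ups. The sketch below (function names ours) uses empirical tertiles or quartiles for a single series and the two medians for a pair; the exact symbol ordering of the pair encoding is our assumption.

```python
import numpy as np

def discretize(returns, n_symbols=3):
    """Map returns to {0, ..., n_symbols-1} by empirical quantiles
    (tertiles for 3 symbols, quartiles for 4)."""
    r = np.asarray(returns, dtype=float)
    # interior quantile cut points: 1/3, 2/3 for tertiles; 1/4, 1/2, 3/4 for quartiles
    qs = np.quantile(r, np.linspace(0, 1, n_symbols + 1)[1:-1])
    return np.searchsorted(qs, r, side="left")

def discretize_pair(r1, r2):
    """4-symbol encoding of a pair of stocks: 2*[r1 >= m1] + [r2 >= m2],
    with m1, m2 the medians of the two series."""
    r1, r2 = np.asarray(r1, float), np.asarray(r2, float)
    return 2 * (r1 >= np.median(r1)).astype(int) + (r2 >= np.median(r2)).astype(int)
```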

Let x₁ⁿ ∈ Aⁿ be a sequence of length n generated by an ergodic source μ over the finite alphabet A, and denote blocks by x_i^{i+k−1} = x_i . . . x_{i+k−1}. The sequence may contain missing values, generated independently of x₁ⁿ. We consider all blocks of length k that do not contain missing values, and we take the maximum admissible block length k given the number n_b(k) of blocks of length k (Eq. 5). The base of the logarithm is the size of the alphabet A (3 or 4).
For each a₁ᵏ ∈ Aᵏ, the empirical frequency is defined as f(a₁ᵏ) = #{occurrences of a₁ᵏ} / n_b(k)   (6), and the empirical k-entropy is defined by Ĥ_k = − Σ_{a₁ᵏ ∈ Aᵏ} f(a₁ᵏ) log f(a₁ᵏ). The estimate of the process entropy is ĥ_k = Ĥ_k / k.
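The k-block estimator, skipping blocks that contain missing values, can be sketched as follows (names ours; logarithm base equal to the alphabet size, as in the text):

```python
import math
from collections import Counter

def k_entropy(symbols, k, alphabet_size):
    """Empirical k-block Shannon entropy H_k (log base = alphabet size),
    computed over blocks that contain no missing value (None).
    Returns the entropy-rate estimate h_k = H_k / k."""
    blocks = [tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1)]
    blocks = [b for b in blocks if None not in b]       # drop blocks with gaps
    n_b = len(blocks)
    counts = Counter(blocks)
    H = -sum((c / n_b) * math.log(c / n_b, alphabet_size)
             for c in counts.values())
    return H / k
```

A constant sequence gives rate 0, while an i.i.d. uniform sequence gives a rate close to 1 (the bias correction of the next paragraph addresses the finite-sample shortfall).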
See [31] for the proof of the consistency of this estimator and [29] for the case of missing values. Since the sequence is finite, the entropy estimate is biased downwards. To remove this bias, we use the correction for the entropy estimation introduced in [32,33], where the sequence G(i) is defined recursively, with Euler's constant γ = 0.577215 . . . .

Detection of inefficiency
Three steps determine whether a time interval is efficient or not. First, we filter out apparent inefficiencies (see Appendix A). Then, we estimate the entropy of the filtered return time series using Eq. 7. Finally, we determine whether the value of entropy is significantly low relative to the case of perfect randomness. We detect inefficiency in the time interval using Monte Carlo simulations, regarding a Brownian motion as absolutely unpredictable. First, we set the length of the sequences to l = n_b(k) + k − 1. Then, we simulate 10⁴ realizations of Brownian motion with Gaussian increments and length l. For each realization, we calculate the entropy using the 3- and 4-symbol discretizations.
Then, we find the first percentile of the obtained entropies for each discretization. These percentiles are the lower bounds of the 99% confidence interval (CI) for testing market efficiency. Finally, we define the efficiency rate as the ratio of the entropy of the time interval to the bound of the CI. If the efficiency rate is less than 1 for at least one discretization, we define the time interval as inefficient. We test for inefficiency twice, using different discretizations, because a single test may not be robust; see the example in Appendix C.
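A minimal Monte Carlo sketch of the CI bound: simulate Gaussian-increment sequences, discretize each by its own empirical quantiles, and take the first percentile of the k-block entropy rates. All names are ours, and the simulation count is reduced for brevity; the paper uses 10⁴ realizations.

```python
import numpy as np

def mc_entropy_bound(length, k, n_symbols, n_sims=500, q=0.01, seed=0):
    """First percentile of k-block entropy rates over simulated Gaussian
    increments (a proxy for Brownian-motion returns): the lower bound of
    the efficiency confidence interval."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(n_sims):
        r = rng.normal(size=length)
        qs = np.quantile(r, np.linspace(0, 1, n_symbols + 1)[1:-1])
        s = np.searchsorted(qs, r)                      # quantile discretization
        blocks = np.lib.stride_tricks.sliding_window_view(s, k)
        _, counts = np.unique(blocks, axis=0, return_counts=True)
        p = counts / counts.sum()
        # entropy rate in base n_symbols
        rates.append(float(-(p * np.log(p)).sum() / np.log(n_symbols) / k))
    return float(np.quantile(rates, q))
```

A month's efficiency rate is then the observed entropy rate divided by this bound; values below 1 flag the month as inefficient.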

Kullback-Leibler divergence
In addition to estimating the entropy of one time series, we can also measure the difference between two time series. The Kullback-Leibler divergence [34] is used to measure the similarity between two sequences. For two discrete probability distributions P and Q,

D_KL(P‖Q) = Σ_i p_i log(p_i / q_i).
We use p_i and q_i as the empirical probabilities obtained in Eq. 6. Since the Kullback-Leibler divergence is asymmetric, we consider the distance between two time series proposed in [35],

D(P, Q) = D_KL(P‖Q) + D_KL(Q‖P).   (8)
The greater the distance D(P, Q), the more the probability distributions P and Q differ.
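The symmetrized distance is a two-liner; the sketch assumes matching outcome indices with q_i > 0 wherever p_i > 0 (function names ours).

```python
import math

def kl_divergence(p, q):
    """D_KL(P||Q) over matching outcome indices; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_distance(p, q):
    """Symmetrized Kullback-Leibler distance D(P, Q) = D_KL(P||Q) + D_KL(Q||P)."""
    return kl_divergence(p, q) + kl_divergence(q, p)
```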

Simulations
The aim of this section is to assess the accuracy of the estimation of volatility and of the degree of price staleness. We will choose the method that gives the smallest estimation error for the further analysis on real data. We take the following model of an observed price P̃_t, t = 1 . . . 2N: the observed price equals the previous observed price when the staleness indicator B_t = 1, and the rounded efficient price otherwise,
where W₁ and W₂ are two independent Brownian motions of length 2N, N = 10⁵, the initial price is P₀ = 100, and ν = 10⁻⁴. B_t = 1 stands for the case in which the price is not updated due to price staleness (see [13,36]). Prices are rounded to two digits, so the tick size is d = 0.01. We consider 4 choices for pr_t (the probability of staleness) and σ_t, listed below. We compare the two methods that use Sig₁ and Sig₂ for volatility estimation. We set a fixed value α = 0.05 as a benchmark for the comparison. We also apply the non-modified EWMA estimation from Section 2.3 with the selected optimal value of α, to show the contribution of 0-filtering to the accuracy of volatility estimation. We simulate 10³ prices for each model.
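A sketch of the simulated observed price with a constant staleness probability (the paper varies pr_t and σ_t across four specifications; here both are constant for brevity, and all names are ours):

```python
import numpy as np

def simulate_stale_prices(n, sigma, pr_stale, p0=100.0, tick=0.01, seed=0):
    """Efficient log-price is a Gaussian random walk; with probability
    pr_stale the observed price is not updated (staleness), otherwise it is
    the efficient price rounded to the tick size."""
    rng = np.random.default_rng(seed)
    log_p = np.log(p0) + np.cumsum(rng.normal(0.0, sigma, n))
    efficient = np.exp(log_p)
    stale = rng.random(n) < pr_stale           # B_t = 1 -> no price update
    obs = np.empty(n)
    last = round(p0 / tick) * tick
    for t in range(n):
        if not stale[t]:
            last = round(efficient[t] / tick) * tick
        obs[t] = last
    return obs
```

The resulting series exhibits both rounding zeros (small moves within one tick) and excess zeros from staleness, which is the situation the modified EWMA is meant to handle.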
Table 2 reports the mean absolute percentage error (MAPE) of the volatility estimate for six different approaches. These approaches differ in the choice of the function for volatility, the value of α, and the handling of missing values. Table 3 reports three values for each of the two methods using Sig₁ and Sig₂ for the volatility estimation. The first value is the optimal value of α. The second, Er_N, is the absolute error of the proportion of 0-returns that remain in the data and are attributed to rounding; here N_round is the number of 0-returns that would appear due to rounding (before adding the effect of staleness in the simulated data), N_A is the number of non-missing returns, and N₀ is the number of 0-returns. The third value is the proportion of data set as missing values, 1 − N_A/N. It can be seen from Table 2 that the method that most often gives the lowest MAPE uses the fixed α = 0.05 and Sig₁ for volatility estimation. Moreover, in almost all cases, 0-filtering makes the volatility estimate more accurate. The error in the number of 0-returns due to rounding is smaller for Sig₁ than for Sig₂ in all 16 cases. After comparing the two volatility functions, we choose Sig₁, which uses absolute values of returns. For the rest of the paper, we fix α = 0.05 for simplicity of the further analysis.

Moscow Stock Exchange
We define the degree of inefficiency as the fraction of months that are identified as inefficient according to Section 2.9. The degree of inefficiency for the chosen group of stocks traded on the Moscow Exchange is 0.823. The degree of inefficiency for each stock and discretization is presented in Table 4. We notice that the 4-symbol discretization yields a larger number of inefficient months than the 3-symbol discretization. That is, the 4-symbol discretization reveals more predictable structure than the 3-symbol discretization.
Figure 1 shows the minimum value of the efficiency rate among all months for each stock.
There are two notable deviations from 1, for the stocks MLTR (Mechel, a mining and metals company) and RSTI (Rosseti, a power company); we investigate them in the next section. For the other 16 stocks, the minimum efficiency rate is attained by the stock AFLT and equals 0.933 (0.964) for 3 (4) symbols.

(Table 3 caption: values of α, errors in the number of 0-returns due to rounding, and the fraction of data set as missing values. The first column indicates the model; 95% CIs are shown below each averaged statistic; v₁ stands for using Sig₁, v₂ for using Sig₂.)

Analysis of MLTR and RSTI
We plot the values of the efficiency rates over monthly intervals for the MLTR and RSTI stocks; see Fig. 2 and Fig. 3. Both types of discretization show coherent results. For MLTR, there are two notable decreases in the efficiency rates, at the beginning of 2014 and in the middle of 2016. For both discretizations, the eight months with the lowest efficiency rates (in ascending order of time) are Jan-Feb and May-Oct of 2014. For each month, we report the most frequent block of symbols in Table 5. Note that the block 1111 of the 4-symbol discretization is the most frequent one in 6 months out of 8 for MLTR. This block denotes a slight decrease in the price for 4 minutes in a row. The meaning of the last two columns is discussed later.
For RSTI, there are two sharp decreases, in 2014 and 2015. There are 11 months with the lowest efficiency rates in common for both discretizations: Apr-Sep of 2014 and Jun-Oct of 2015. Note that these inefficient months cluster together and are not distributed uniformly over the entire 10-year period. This signals a market condition that affects the inefficiency of the stocks for more than one month.
(Table 4 caption: fraction of inefficient months using the 3-symbol and 4-symbol discretizations.)

We construct a simple trading strategy on discretized returns to test the predictability of future returns. We consider blocks of length 4 obtained by the 4-symbol discretization. For each month, we divide the blocks into two equal parts; the discretization is made using only the first part of the month. We consider the sequences of the first 3 symbols of each block. If, in the first part of the month, the empirical probability of observing symbol 0 or 1 after a sequence of 3 symbols exceeds one half, the sequence belongs to group D (decreasing); if the empirical probability of observing symbol 2 or 3 exceeds one half, the sequence belongs to group I (increasing). Then, for the second part of the month, we count a success if symbol 0 or 1 follows a sequence from group D, or if symbol 2 or 3 follows a sequence from group I, and we calculate the fraction of successes. This is the probability of making a profit by selling after group D or buying after group I; in the case of market efficiency, this probability equals 0.5. For example, according to Table 5 we expect that after 111 the next symbol is 1, so after this block a trader can sell the stock. The fourth column of Table 5 shows the results for the filtered return time series; the fifth column refers to the original return time series. In all cases the probability is greater than 0.5. As expected, the probabilities for the original return time series are greater than those for the filtered series, since the predictability of the original series also stems from the sources of apparent inefficiency.
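The strategy can be sketched as follows on a single month's symbol sequence (names ours; contexts with tied counts are simply skipped):

```python
from collections import defaultdict

def strategy_success(symbols):
    """In-sample half: label each 3-symbol context D (next symbol most often
    0/1) or I (next symbol most often 2/3). Out-of-sample half: a success is
    a 0/1 after a D context or a 2/3 after an I context. Returns the success
    fraction, which is 0.5 under market efficiency."""
    half = len(symbols) // 2
    train, test = symbols[:half], symbols[half:]
    counts = defaultdict(lambda: [0, 0])        # context -> [n_down, n_up]
    for i in range(len(train) - 3):
        ctx, nxt = tuple(train[i:i + 3]), train[i + 3]
        counts[ctx][0 if nxt <= 1 else 1] += 1
    group = {c: ("D" if d > u else "I")
             for c, (d, u) in counts.items() if d != u}
    hits = total = 0
    for i in range(len(test) - 3):
        ctx, nxt = tuple(test[i:i + 3]), test[i + 3]
        if ctx in group:
            total += 1
            hits += (nxt <= 1) == (group[ctx] == "D")
    return hits / total if total else 0.5
```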
The same analysis is done for the RSTI stock. The eleven months with the lowest efficiency rates are presented in Table 6. For the RSTI stock, the simple trading strategy gives a fraction of successes (in predicting increases and decreases of the price) greater than 0.5 for all 11 months. The frequent behavior of the RSTI price during the chosen months is a slight increase for several minutes in a row, denoted by symbol 2.
The simple trading strategy is an illustrative example of market inefficiency. In fact, such a strategy could yield no profit in practice, because it does not take into account transaction costs and other trading frictions. Moreover, the filtering of the daily seasonality pattern uses the whole period of analysis, so this method cannot be applied in real time. Finally, we consider blocks containing only observed returns, neglecting the missing values; a practical application of the strategy would have to handle the case in which a missing value follows a sequence of 3 symbols.

(Tables 5 and 6 captions: the first column lists the months with the lowest efficiency rates; columns 2 and 3 are the most frequent blocks in the 3- and 4-symbol discretizations; columns 4 and 5 are the success probabilities of the simple trading strategy for filtered and original price returns.)

Stock market clustering
Most of the month-long time intervals are identified as inefficient. But is there some dependence between stocks that are inefficient at the same time?

Kullback-Leibler distance
We measure the similarity of discretized filtered returns using the Kullback-Leibler (KL) distance (Eq. 8). We use the length of blocks k as the maximum value suitable for both sequences according to Eq. 5, and the 4-symbol discretization. The Kullback-Leibler divergence D_KL(P‖Q) is calculated using empirical frequencies, and the entropy rates are calculated using Eq. 7. Using the Kullback-Leibler distance for all pairs of stocks, we cluster them into three groups by hierarchical clustering with the UPGMA algorithm [37]. The result is shown in Fig. 4. Combining companies into one cluster means that their stocks share a common behavior that is not related to the value of volatility, the degree of price staleness, or the structure of microstructure noise. It can be seen that banks and oil companies are clustered together (right), although there is no visible distinction between the stocks of banks and those of oil companies. There is a group of four stocks, RTKM, HYDR, AFLT, MGNT, that have nothing in common at first glance. The remaining group (left) mainly consists of metallurgy companies. According to the clustering tree, the two telecommunications companies differ significantly, as do the electricity companies.
Finally, the two stocks with the lowest efficiency rates, RSTI and MLTR, are the furthest (in the sense of the KL distance) from any other stock. That is, no stock behaves similarly to these two.

Entropy of co-movement
We now consider another measure of the difference between two stocks: the entropy of co-movement. We calculate the Shannon entropy of the discretization describing the joint movement of a pair of prices, presented in Eq. 4. We consider only the minutes that are common to both stocks, and for those minutes we use the values of the residuals obtained after ARMA fitting. The result is shown in Fig. 5.
The two telecommunications companies form a separate cluster. The three metallurgy companies MAGN, CHMF, and NLMK also cluster together. Stocks of oil companies and banks form another cluster; the same cluster, with the exception of TATN (oil industry), was also formed in the previous section. The "closeness" of the stocks GAZP and SBER is detected both in this and in the previous section. The three stocks on the left, which join the other stock clusters last, are the stocks with the lowest efficiency rates.
Some clusters may form because the companies belong to the same industry; this division into industries is noticeable in the dendrogram of Figure 5. However, this criterion does not explain all clusters: for instance, GMKN (metallurgy) falls in the cluster of oil companies and banks.

Conclusions and discussion
We have investigated the predictability of the Moscow Stock Exchange. We are interested in a measure of market inefficiency that is not related to known sources of regularity in financial time series. Usually, these sources are not filtered out and, accordingly, their impact is included in the measured degree of price predictability (see e.g. [10,11,12]).

We have focused on two sources of regularity, namely volatility clustering and price staleness [13]. Volatility clustering was filtered in [9] by estimating volatility with the exponentially weighted moving average. We have developed a modification of this volatility estimation that takes into consideration the effect of price staleness, since price staleness produces excess 0-returns that affect the estimate. Another approach to estimating volatility in the presence of 0-returns was proposed in [14], where all 0-returns are re-evaluated within an expectation-maximization algorithm. In our approach, we separate the 0-returns that may result from rounding from those due to price staleness; thus, we also filter out apparent inefficiency due to price staleness. The advantage of our approach is its simplicity: the method has a single parameter, which can be optimized using historical data. Our approach, combining the estimates of volatility and of the degree of staleness, can be used for real-time analysis, since only past observations of the time series are used.

We used the Shannon entropy as a measure of randomness to calculate the degree of inefficiency of the Moscow Stock Exchange. We used two types of discretization of the return time series to test efficiency more reliably for each month. The 4-symbol discretization detects more price movements that lead to market inefficiency than the 3-symbol discretization. More than 80% of the months over the period from 2012 to 2021 are identified as inefficient: even after filtering out all sources of apparent inefficiency, most months contain signals of market inefficiency.

We selected the two stocks that exhibit the lowest efficiency rates and have shown that their most inefficient months are grouped together. We have also shown that, for such months, discretized price returns are predictable both before and after filtering out apparent inefficiencies.

Finally, we used two methods to cluster stocks based on the filtered return time series. Inspired by [35], we computed the Kullback-Leibler distance between stocks and grouped them into three clusters. We also introduced the entropy of co-movement of two stocks; in this case, stock prices display common patterns that have an interpretation in terms of the sector the stocks belong to. We also noticed that the stocks of banks and of oil companies are linked to each other. One possible improvement of the stock clustering is to modify the entropy of co-movement so that it defines a proper distance function; this is left for future research.

The proposed method for measuring market efficiency using the Shannon entropy can be applied to markets of other countries. In this work, we used monthly time intervals for the entropy calculation; our future work will address the optimization of the length of the return time series. One open problem is to detect a significant decrease in entropy without resorting to Monte Carlo simulations. We also plan to move to higher frequencies (below one minute) to analyze the predictability of financial time series.

Appendices

A Data cleaning and whitening

A.1 Outliers
We use the method of outlier detection introduced in [23]. The algorithm finds price values that are too far from the mean relative to the standard deviation: it deletes a price P_i if

|P_i − P̄_i(k)| ≥ c · s_i(k) + γ,

where P̄_i(k) and s_i(k) are, respectively, the δ-trimmed sample mean and standard deviation of the k price records closest to time i. The δ% lowest and the δ% highest observations are discarded when the mean and the standard deviation are calculated from the sample. The parameters are k = 20, δ = 5, c = 5, γ = 0.05.
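A sketch of the filter with the stated parameters; the windowing details (k neighbours centred on time i, excluding P_i itself) are our reading of [23], and the function name is ours.

```python
import numpy as np

def remove_outliers(prices, k=20, delta=5, c=5, gamma=0.05):
    """Drop P_i when it deviates from the delta%-trimmed mean of its k
    nearest-in-time neighbours by at least c * (trimmed std) + gamma."""
    p = np.asarray(prices, dtype=float)
    keep = np.ones(len(p), dtype=bool)
    for i in range(len(p)):
        lo = max(0, i - k // 2)
        window = np.delete(p[lo:lo + k + 1], i - lo)    # neighbours without P_i
        cut = int(len(window) * delta / 100)            # trim delta% per tail
        trimmed = np.sort(window)[cut:len(window) - cut]
        if abs(p[i] - trimmed.mean()) >= c * trimmed.std() + gamma:
            keep[i] = False
    return p[keep]
```

The additive constant γ prevents zero-variance windows (long runs of identical prices) from flagging every tiny move as an outlier.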

A.2 Stock Splits
We check the condition |r| > 0.2 in the return series to detect unadjusted splits. No unadjusted splits were found.

A.3 Intraday Volatility Pattern
The volatility of intraday returns exhibits periodic behavior: it is higher near the opening and the closing of the market, showing a U-shaped profile every day. The intraday volatility pattern is filtered from the return series using the following model. We define deseasonalized returns as R̃_{d,t} = R_{d,t}/ξ_t, where R_{d,t} is the raw return of day d at intraday time t, ξ_t = (1/N_days) Σ_d |R_{d,t}|/s_d is the intraday volatility pattern, s_d is the standard deviation of absolute returns of day d, and N_days is the number of days in the sample.
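The deseasonalization can be sketched as follows, assuming the returns are arranged in an (N_days × N_intraday) array and that ξ_t averages |R_{d,t}|/s_d over days (the exact normalization of ξ_t is garbled in the source, so this form is our assumption):

```python
import numpy as np

def deseasonalize(R):
    # R: (N_days, N_intraday) array of raw returns R_{d,t}
    R = np.asarray(R, dtype=float)
    s = np.abs(R).std(axis=1, ddof=1, keepdims=True)  # s_d: daily scale
    xi = (np.abs(R) / s).mean(axis=0)                 # xi_t: intraday pattern
    return R / xi                                     # deseasonalized returns
```

Dividing each intraday slot by its own average (scale-normalized) absolute return flattens the U-shaped profile while leaving the day-to-day volatility level untouched.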

A.4 Heteroskedasticity
Different days show different levels of dispersion of the deseasonalized returns R̃. In order to remove this heteroskedasticity, we estimate the volatility σ_t as described in Appendix B and define the standardized returns r_t = R̃_t/σ_t.

A.5 Price staleness
If transaction costs are high, the price is updated less frequently, even when trading volume is not zero. This effect is called price staleness and is discussed in Section 2.4. We identify 0-returns appearing due to rounding (and not due to price staleness) using Equation 3. The other 0-returns are set as missing values, as described in Appendix B.

A.6 Microstructure noise
The last step in filtering apparent inefficiencies is filtering out microstructure noise. Microstructure effects are caused by transaction costs and price rounding. We consider the residuals of an ARMA(P,Q) model of the standardized returns after filtering out 0-returns. We apply the methodology introduced in [38], which computes the residuals of an ARMA(P,Q) model by means of the Kalman filter.
We select the values of P and Q that minimize the BIC [39], subject to P + Q < 6. The values of P and Q are chosen for each calendar year and are used for the next year. For the year 2012 we select P = 0 and Q = 1, corresponding to an MA(1) model.

B Algorithm
The aim of the algorithm is to estimate volatility and filter out the excess 0-returns due to price staleness. Some 0-returns appear due to price rounding; these 0-returns are kept in the data. First, we set the number of 0-returns "to save" N_save = 0 and the first value of a cumulative function Z_1 = 0. The cumulative function is updated as Z_t = Z_{t−1} + p_t if r_{t−1} is not flagged as missing due to staleness. Each time ⌊Z_t⌋ − ⌊Z_{t−1}⌋ = 1, N_save is increased by 1.
We notice that the first non-zero return after a run of 0-returns due to staleness is the sum of all missing returns generated by a hidden efficient price. This return is also set as missing. However, the value of the return used for estimating volatility is replaced by its expected value: r̃_{n−1} = r_{n−1}/√(N_0 + 1), where N_0 is the number of missing values strictly before the non-zero return r_{n−1}. The same applies to initially missing values, e.g., those due to no trading or to errors in data collection.
Another assumption is that a 0-return appears due to staleness if the previous return was also zero and was classified as stale. We include this rule because we assume that two consecutive 0-returns are more likely to be caused by high transaction costs than by rounding (that is, simply speaking, by two consecutive outcomes of a Gaussian random variable both being smaller than the tick size).
Generally, for the estimation of volatility at time t we should consider three cases: P_{t−1} was missing (or minute t − 1 is non-trading), r_{t−1} = 0, and r_{t−1} ≠ 0. The algorithm is the following. We give the algorithm for the case of Sig 1, which is used in the application to the real data. We remove all 0-returns that start the sequence.
For t from 2 to N, where N is the length of the time series:
Step 1:
• Calculate p_t (Eq. 3).
• If r_{t−1} is not missing, set Z_t = Z_{t−1} + p_t.
• If ⌊Z_t⌋ − ⌊Z_{t−1}⌋ = 1, set N_save = N_save + 1.
Finally, we check whether the effect of staleness really exists in the price time series: if N_real ≤ Σ_i p_i + 1.96·√Var, we leave the time series without setting any missing values, where N_real is the initial number of 0-returns.
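A compact sketch of this procedure is given below. It is our reading of Appendix B: the rounding probabilities p_t (Eq. 3) are supplied as an input, Var is taken as Σ p_t(1 − p_t) under a Bernoulli assumption, and the exact bookkeeping of which zeros are "saved" is an illustrative choice:

```python
import numpy as np

def flag_stale_zeros(returns, p):
    # returns: return series; p: per-minute probabilities of a rounding zero
    returns = np.asarray(returns, dtype=float)
    p = np.asarray(p, dtype=float)
    n_real = int(np.sum(returns == 0.0))
    # overall check: do rounding zeros alone explain the observed zeros?
    if n_real <= p.sum() + 1.96 * np.sqrt((p * (1.0 - p)).sum()):
        return np.zeros(len(returns), dtype=bool)  # no staleness detected
    missing = np.zeros(len(returns), dtype=bool)
    Z, budget = 0.0, 0                              # cumulative counter, N_save
    for t in range(1, len(returns)):
        if not missing[t - 1]:
            if np.floor(Z + p[t]) - np.floor(Z) >= 1:
                budget += 1                         # one more zero "to save"
            Z += p[t]
        if returns[t] == 0.0:
            if missing[t - 1] and returns[t - 1] == 0.0:
                missing[t] = True                   # zero after a stale zero
            elif budget > 0:
                budget -= 1                         # kept as a rounding zero
            else:
                missing[t] = True                   # excess zero: staleness
    return missing
```

Flagged positions would then be treated as missing values in the volatility estimation, with the first non-zero return after a stale run rescaled by √(N₀+1) as described above.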

C A predictable time series with entropy at maximum
The goal of this section is to construct a price model whose entropy is high because of discretization. The model shows that a high entropy value may be caused by the discretization itself rather than by genuine randomness of the return time series.
The symbols 0, 1, and 2 occur with equal probabilities. Symbol 1 corresponds to log-returns r equal to −0.4, and symbol 2 to log-returns equal to 0.4. The structure of symbol 0 is more complicated: it covers three further symbols 3, 4, and 5, corresponding to log-returns −0.3, 0.1, and 0.2, respectively. Which of the symbols 3, 4, or 5 appears depends on the previous value among these symbols. Given the previous symbol, Table 7 reports the probabilities of the next one.
The model implies a zero average return. However, a profitable trading strategy exists: after symbol 3 a trader should buy, while after symbols 4 and 5 the trader should sell. Yet the entropy of the 3-symbol series is at its maximum, which would seem to imply the absence of profitable strategies.
Considering the same example with a 4-symbol discretization, we get Q_1 = −0.4, Q_2 = 0.1, and Q_3 = 0.4. Thus we can distinguish the returns r = 0.2 from the others using the 4-symbol discretization. Table 7 gives the corresponding probabilities for blocks of two symbols from the 4-symbol discretization. An observed price moves along a discrete grid: possible price values are multiples of the tick size d, i.e., P_t = d·⌊P_t^e/d⌋, where P_t^e is the efficient price.
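The point of this appendix can be checked numerically with a plug-in block-entropy estimator (a standard construction, not code from the paper): a series whose 1-symbol entropy is maximal can still have a block entropy well below the maximum, revealing its predictability.

```python
import math
from collections import Counter

def block_entropy(symbols, k):
    # plug-in Shannon entropy, in bits per symbol, of blocks of length k
    blocks = [tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    H = -sum(c / total * math.log2(c / total) for c in counts.values())
    return H / k
```

For the deterministic cycle 0, 1, 2, 0, 1, 2, … the 1-symbol entropy equals the maximum log2(3) ≈ 1.585 bits, while the 2-symbol block entropy drops to about 0.79 bits per symbol, exposing the structure that the marginal distribution hides.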

Figure 4: Hierarchical clustering tree using the KL distance. The threshold for clustering into groups is 0.035.

Figure 5: Hierarchical clustering tree using the entropy of co-movement. The threshold for clustering into groups is 0.989.

Table 1: Stocks of Russian companies traded at the Moscow Exchange. All data are provided by Finam Holdings.
GARCH(1.25 × 10⁻⁸, 0.15, 0.8). We divide the data into two equal parts of size N. The first part is a training set for finding the optimal values of α from Equations 1 and 2. The second part is a testing set for calculating the errors reported in Tables 2 and 3.

Table 2: Results on volatility estimation.

Table 3: Results on filtering out 0-returns. Columns: model, α for v_1, α for v_2, Er_N for v_1, Er_N for v_2.

Table 4: The degree of inefficiency for each stock.

Table 5: The most frequent blocks appearing for the stock MLTR and the probabilities of success.

Table 6: The most frequent blocks appearing for the stock RSTI and the probabilities of success.

Table 7: Transition probabilities. Rows stand for the first symbol of a block, columns for the second symbol.