How Complexity and Uncertainty Grew with Algorithmic Trading

The machine-learning paradigm promises to reduce traders' uncertainty through better predictions made by ever more complex algorithms. We ask whether both uncertainty and complexity leave detectable traces at the aggregated market level. Analyzing almost one billion trades of eight currency pairs (2007–2017), we show that increased algorithmic trading is associated with more complex subsequences and more predictable structures in bid-ask spreads. However, algorithmic involvement is also associated with more future uncertainty, which seems contradictory at first sight. On the micro-level, traders employ algorithms to reduce their local uncertainty by creating more complex algorithmic patterns, which entails more predictable structure and more complexity. On the macro-level, the increased overall complexity implies more combinatorial possibilities, and therefore more uncertainty about the future. The chain rule of entropy reveals that uncertainty has been reduced when trading on the level of the fourth digit behind the dollar, while new uncertainty has started to arise at the fifth digit behind the dollar (a.k.a. 'pip-trading'). In short, our information-theoretic analysis clarifies that the seeming contradiction between decreased uncertainty on the micro-level and increased uncertainty on the macro-level is the result of the inherent relationship between complexity and uncertainty.


SI.1. Assumptions of Linear Regression
We ran a series of diagnostic analyses on the residuals of our multiple linear regressions and feel comfortable with the multiple linear regression with normal noise model. The most worrisome violation stems from the normality of the residuals. Fortunately, this also tends to be the least concerning assumption, since central limit theorem-based results make the actual distribution of the residuals wash out with enough data. Given the nature of our data from eight different currency pairs, linked in time, any alternative bootstrapping procedure would have to be fairly intricate, which would reduce transparency and replicability. We feel that our normal noise assumptions are within what is generally accepted in the community.
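A minimal sketch of such a residual diagnostic, using synthetic stand-in data (the sample size, coefficients, and noise scale here are illustrative assumptions, not the paper's actual panel):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical stand-in for one currency pair's bi-monthly panel:
# 65 periods, two predictors, normal noise.
n = 65
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.2, -0.8]) + rng.normal(scale=0.3, size=n)

# OLS fit and residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# D'Agostino-Pearson test: H0 = residuals are normally distributed;
# a large p-value means no evidence against normality.
stat, p = stats.normaltest(resid)
print(f"normality test p-value: {p:.3f}")
```

In practice one would complement such a test with visual checks (e.g., a Q-Q plot of the residuals), since formal normality tests become overly sensitive with large samples.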

SI.2. Comparison without Lagged Term
Social dynamics often contain an important path dependency. Therefore, researchers interested in other influences often include a lagged value of the dependent variable as an independent variable, in what are known as dynamic panel data models. Our analysis confirms that the dynamics of the previous bi-monthly period t-1 are highly predictive of the next one, t. In this sense, our results show the influence of the other tested variables independent of this effect of path dependency. There is, however, an ongoing discussion about the practice of including a lagged dependent variable in panel data models [1]. Leaving out the lagged term leads to substantial autocorrelation in the residuals of the regression, which would violate basic assumptions. Just to make sure, we also ran the exercise without it. The main conclusions drawn from this study are strongly reinforced when running the tests without the lagged term: the influence of algorithmic trading increases. For example, Figure SI.2 shows the case of measure EeM_200. ATemp_200 is only weakly significant when considering lagged path dependency, but becomes significantly stronger without it. The same is shown for measure heM_200, and in general applies to all tests we have seen. We present the version with the lagged term in the main article, because it better corresponds to the basic assumptions of linear analysis, and because it presents the more conservative version of our results.
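The effect of including versus omitting the lagged dependent variable can be sketched as follows on a synthetic autoregressive series (the AR coefficient and regressor here are illustrative assumptions; the Durbin-Watson statistic is near 2 when residuals show little first-order autocorrelation):

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic; values near 2 indicate little first-order autocorrelation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def ols_resid(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(1)

# Hypothetical path-dependent series: y_t depends on y_{t-1} and a regressor x_t
T = 300
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.7 * y[t - 1] + 0.5 * x[t] + rng.normal(scale=0.2)

# Static model: y_t ~ x_t (lagged term omitted) -> autocorrelated residuals
X_static = np.column_stack([np.ones(T - 1), x[1:]])
r_static = ols_resid(X_static, y[1:])

# Dynamic model: y_t ~ y_{t-1} + x_t (lagged dependent variable included)
X_dyn = np.column_stack([np.ones(T - 1), y[:-1], x[1:]])
r_dyn = ols_resid(X_dyn, y[1:])

print(f"DW without lag: {durbin_watson(r_static):.2f}")  # well below 2
print(f"DW with lag:    {durbin_watson(r_dyn):.2f}")     # close to 2
```

This illustrates why omitting the lagged term violates the independence assumption on the residuals, while including it absorbs the path dependency.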

SI.3. Full Array of Models for H1
As discussed in the main article, we use three complementary estimates for the rise of algorithmic trading (AT), namely empirical, linear and exponential. Given that our information-theoretic dynamical-systems indicators are not as deeply established in the social sciences, and given that there are fewer agreed-upon best practices for their estimation, we test their robustness by using two different methods to calculate them for each bi-monthly interval, namely ϵ-machines (epsilon machines, eM) [2] and frequency counts (fq) [3]. Note that predictive complexity is a measure of the associated ϵ-machine [4,5], which we derive only once, with the Causal State Splitting Reconstruction (CSSR) algorithm.
The full array of models shows that our results are quite robust, independent of the derivation method and of the estimate for the rise of algorithmic trading. In Tables SI.1 and SI.2, model (2) and model (7) are marked in bold, since these are the ones presented in the main article. As can be seen, they are among the models in which algorithmic trading has the least influence, and are therefore a rather conservative estimate of our broader results.
Table SI.1: Tests for bi-monthly changes in coarse-grained (20 bins based) complexity in the form of predictable information (E) and predictive complexity (C), measured according to frequency counts (fq) and ϵ-machines (eM); showing unstandardized beta coefficients, with standard errors in italic parentheses. *** p<0.01, ** p<0.05, * p<0.1 (N = 520).
Table SI.2: Tests for bi-monthly changes in fine-grained (200 bins based) complexity in the form of predictable information (E) and predictive complexity (C), measured according to frequency counts (fq) and ϵ-machines (eM); showing unstandardized beta coefficients, with standard errors in italic parentheses. *** p<0.01, ** p<0.05, * p<0.1 (N = 520).
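A minimal sketch of the frequency-count (fq) approach: the entropy rate can be estimated from block entropies as h ≈ H(L) − H(L−1). The block length and the two toy sequences below are illustrative assumptions; a perfectly periodic sequence has entropy rate near 0 bits, while fair coin flips approach 1 bit:

```python
import math
import random
from collections import Counter

def block_entropy(seq, L):
    """Shannon entropy (bits) of the empirical distribution of length-L blocks."""
    counts = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def entropy_rate_fq(seq, L):
    """Frequency-count estimate of the entropy rate: h ~ H(L) - H(L-1)."""
    return block_entropy(seq, L) - block_entropy(seq, L - 1)

# Perfectly periodic sequence: fully predictable, entropy rate ~ 0 bits
periodic = [0, 1] * 500

# Fair coin flips: maximally uncertain binary source, entropy rate ~ 1 bit
random.seed(0)
coin = [random.randint(0, 1) for _ in range(1000)]

print(f"h(periodic) = {entropy_rate_fq(periodic, 3):.3f} bits")
print(f"h(coin)     = {entropy_rate_fq(coin, 3):.3f} bits")
```

The ϵ-machine route (CSSR) instead reconstructs causal states from the sequence and reads the entropy rate and predictive complexity off the reconstructed machine; the frequency-count estimate above serves as the simpler cross-check.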

SI.4. Full Array of Models for H2
Model (2) in Tables SI.3 and SI.4 is the one presented in the main article. As can be seen, it is among the models in which algorithmic trading has the least influence, and is therefore a rather conservative estimate of our broader results.
Table SI.3: Tests for bi-monthly changes in coarse-grained (20 bins based) remaining uncertainty in the form of the entropy rate (h), measured according to frequency counts (fq) and ϵ-machines (eM); showing unstandardized beta coefficients, with standard errors in italic parentheses. *** p<0.01, ** p<0.05, * p<0.1 (N = 520).

SI.6. Conditional Entropy Plots
In line with the chain rule of entropy, Figure SI.5 shows the diverging uncertainty between the more coarse-grained and the more fine-grained perspective. In 2007, both were still similar. The area between the two levels of uncertainty is the conditional uncertainty (conditioned on the more coarse-grained resolution level).
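The decomposition behind this plot can be sketched numerically. The chain rule of entropy gives H(fine) = H(coarse) + H(fine | coarse); since the coarse symbol is a deterministic function of the fine one, the conditional term is exactly the gap between the two curves. The bin counts and uniform toy data below are illustrative assumptions:

```python
import math
import random
from collections import Counter

def entropy(symbols):
    """Shannon entropy (bits) of an empirical distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

random.seed(0)

# Hypothetical price observations discretized at a fine resolution (200 bins);
# the coarse view (20 bins) merges every 10 fine bins into one.
fine = [random.randrange(200) for _ in range(100_000)]
coarse = [f // 10 for f in fine]

H_fine = entropy(fine)
H_coarse = entropy(coarse)
H_joint = entropy(list(zip(coarse, fine)))
H_cond = H_joint - H_coarse   # H(fine | coarse), the area between the two curves

# Chain rule: H(fine) = H(coarse) + H(fine | coarse)
print(f"H(fine)                    = {H_fine:.4f} bits")
print(f"H(coarse) + H(fine|coarse) = {H_coarse + H_cond:.4f} bits")
```

The identity holds exactly here because coarse-graining is a function of the fine symbols, which is precisely why the conditional uncertainty can be read off as the area between the two resolution levels.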

SI.7. Decreasing Bid-Ask Spreads
To gain some intuition about the evolution of bid-ask spreads, Figure SI.6 shows visual evidence of changing bid-ask spreads over the decade. In agreement with Hendershott and Moulton [6], the descriptive data show an increase of the spread during the 2008 financial crisis (Lehman Brothers filed for bankruptcy in September 2008). In 2011, Hendershott et al. [7] speculated that there could be temporary increases in realized spreads, related to asymmetries exploited by liquidity suppliers during early phases of algorithmic trading. From a decade-long perspective, this might be the case, but it also seems that the increase in spread is rather linked to the financial crisis and its accompanying turmoil per se. Over the decade, however, bi-monthly bid-ask spreads decreased by half, sometimes more. In Jan-Feb 2007, the average bid-ask spread of EUR/USD was precisely one-hundredth of a cent higher than in Nov-Dec 2017 (0.00013 vs. 0.00003). Figure SI.6b shows that a similar decreasing tendency also applies to the bi-monthly standard deviation of the bid-ask spread over the same period.
It is important to point out that our relatively short time window cannot eliminate the possibility that the rather large and volatile bid-ask spread around 2008-2009 is rather the result of the global financial crisis.
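The bi-monthly aggregation of the spread can be sketched as follows. The quote data here are a synthetic stand-in (hourly quotes over a single year, with an assumed mean spread around the 2007 EUR/USD level), not the actual tick data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Hypothetical tick data: one quote per hour over 2007 (stand-in for real quotes)
idx = pd.date_range("2007-01-01", "2007-12-31 23:00", freq="h")
mid = 1.30 + rng.normal(scale=0.001, size=len(idx)).cumsum() * 0.01
spread = np.abs(rng.normal(loc=1.3e-4, scale=2e-5, size=len(idx)))
quotes = pd.DataFrame({"bid": mid - spread / 2, "ask": mid + spread / 2}, index=idx)

# Bi-monthly mean and standard deviation of the bid-ask spread
s = quotes["ask"] - quotes["bid"]
bimonthly = s.resample("2MS").agg(["mean", "std"])
print(bimonthly)
```

Each row of `bimonthly` corresponds to one two-month interval (Jan-Feb, Mar-Apr, ...), matching the bi-monthly resolution used throughout the analysis.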
Our analysis of predicting the bi-monthly means and standard deviations with our six IVs (Table SI.5) confirms that both decreases are linked to a strongly and monotonically increasing tendency that is in line with the rise of algorithmic trading (our independent variable AT, in its three different versions, namely empirical (ATemp), linear (ATlin) and exponential (ATexp)). The strongest predictor is the lagged path dependency term, closely followed by our AT variable. Interest rate and unemployment rate are also significant predictors, but less important in terms of effect size. GDP growth rate and inflation, which have seen important variance over the decade, do not play a significant role in predicting changing bid-ask tendencies. In addition to the negative association between algorithmic trading and the bid-ask spread, the standard deviation also decreased in association with the increasing algorithmic trading tendency, which gives us first indications in terms of temporal predictability.