Exploring Simplicity Bias in 1D Dynamical Systems

Arguments inspired by algorithmic information theory predict an inverse relation between the probability and complexity of output patterns in a wide range of input–output maps. This phenomenon is known as simplicity bias. By viewing the parameters of dynamical systems as inputs, and the resulting (digitised) trajectories as outputs, we study simplicity bias in the logistic map, Gauss map, sine map, Bernoulli map, and tent map. We find that the logistic map, Gauss map, and sine map all exhibit simplicity bias upon sampling of map initial values and parameter values, but the Bernoulli map and tent map do not. The simplicity bias upper bound on the output pattern probability is used to make a priori predictions regarding the probability of output patterns. In some cases, the predictions are surprisingly accurate, given that almost no details of the underlying dynamical systems are assumed. More generally, we argue that studying probability–complexity relationships may be a useful tool when studying patterns in dynamical systems.


1. Introduction
In recent years, several studies of simplicity bias in input–output maps have observed a general inverse relationship between the complexity of outputs and their respective probabilities [1,2]. More specifically, using arguments inspired by algorithmic information theory (AIT) [3,4,5], and specifically algorithmic probability, an upper bound on the probability P(x) of observing output pattern x was presented [1], with the bound depending on the estimated Kolmogorov complexity of x. The upper bound implies that complex output patterns must have low probabilities, while high-probability outputs must be simple. Example systems where simplicity bias has been observed include RNA structures [1,6], differential equation solutions [1], finite state transducers [2], time series patterns in natural data [7], and natural protein structures [8], among others.
A full understanding of exactly which systems will, and will not, show simplicity bias is still lacking, but the phenomenon is expected to appear in a wide class of input–output maps under fairly general conditions. Some of these conditions were suggested in ref. [1]: (1) the number of inputs should be much larger than the number of outputs, (2) the number of outputs should be large, and (3) the map should be 'simple' (technically of O(1) complexity), to prevent the map itself from dominating over the inputs in defining output patterns. Indeed, if an arbitrarily complex map were permitted, outputs could have arbitrary complexities and probabilities, removing any connection between probability and complexity. Finally, (4) because many AIT applications rely on approximating Kolmogorov complexity via standard lossless compression algorithms [9,10] (but see [11,12] for a fundamentally different approach), another proposed condition is that the map should not generate pseudo-random outputs like π = 3.1415..., which standard compressors cannot handle effectively. Such outputs may have high probability and appear 'complex', hence apparently violating simplicity bias, while in fact being simple.
To explore further the presence of simplicity bias in dynamical systems and physics, and also in maps which test the boundaries of the conditions for simplicity bias described above, we examine the output trajectories of a selection of 1D maps from chaos theory, namely the logistic map, the Gauss ("mouse") map, the sine map, the Bernoulli map, and the tent map. Since its popularisation by Robert May [13] in the 1970s, the logistic map has been heavily studied. It is a textbook example of a simple system that can produce simple, complex, and chaotic patterns via iterations of the map [14]. The map is related to a very wide range of nonlinear models with applications in epidemiology, economics, physics, time series, and other fields [15]. Because of its popularity, and because its trajectory outputs can be simple as well as complex, chaotic, and even pseudo-random, we focus primarily on the logistic map, but we also analyse the other maps mentioned.
Although AIT is not restricted to binary strings, most AIT results are framed in terms of binary strings, and applications are therefore easier in the same framework. Thus, in this work we study simplicity bias in digitised binary trajectories of these example 1D dynamical systems. Our main conclusions are that simplicity bias appears in the logistic map, the Gauss ("mouse") map, and the sine map, and hence we suggest that simplicity bias may also appear in natural dynamical systems more generally. In a broader context, this work contributes to research at the intersection of dynamical systems, AIT, and machine learning.
2. Background and problem setup
2.1. Background theory and pertinent results.
2.1.1. AIT and Kolmogorov complexity. We now give some basic background on AIT and describe simplicity bias in more detail. Because the current work does not involve detailed AIT or related theory, we only give a brief survey of relevant results, without many formal details. The interested reader can refer to standard texts, e.g., refs. [16,17,18,19].
Within computer science, algorithmic information theory [3,4,5] (AIT) directly connects computation, computability theory, and information theory. The central quantity of AIT is Kolmogorov complexity, K(x), which measures the complexity of an individual object x as the amount of information required to describe or generate x. More technically, K(x) is the length of a shortest program which runs on an (optimal prefix) universal Turing machine (UTM) [20], generates x, and halts. Formally, the Kolmogorov complexity $K_U(x)$ of a string x with respect to U is defined [3,4,5] as

$$K_U(x) = \min_{p} \{ |p| : U(p) = x \},$$

where p is a binary program for a prefix optimal UTM U, and |p| denotes the length of the program p in bits. By the invariance theorem [16], for any two optimal UTMs U and V, $K_U(x) = K_V(x) + O(1)$, so the complexity of x is independent of the machine up to additive constants. Hence we conventionally drop the subscript U in $K_U(x)$ and speak of 'the' Kolmogorov complexity K(x).
Informally, K(x) can be defined as the length of a shortest program that produces x, or simply as the size in bits of the compressed version of x (assuming a perfect compressor). If x contains repeating patterns, like x = 1010101010101010, then it is easy to compress, and hence K(x) will be small. On the other hand, a randomly generated bit string of length n is highly unlikely to contain any significant patterns, and hence can only be described by specifying each bit separately without any compression, so that K(x) ≈ n bits. Other, more expressive names for K(x) are descriptional complexity, algorithmic complexity, and program-size complexity, each of which highlights the idea that K(x) measures the amount of information needed to describe or generate x precisely and unambiguously. Note that Shannon information and Kolmogorov complexity are related [21], but differ fundamentally in that Shannon information quantifies the information or complexity of a random source, while Kolmogorov complexity quantifies the information of individual sequences or objects. An increasing number of studies show that AIT and Kolmogorov complexity can be successfully applied in physics, including thermodynamics [22,23,24,25], quantum physics [26], and entropy estimation [27,28]. Further, applications to biology [29,8,30], other natural sciences [31], and engineering are also numerous [32,16,33].
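As a rough illustration of the compression view of K(x), the sketch below compares the zlib-compressed sizes of a periodic string and a random string. It is only a proxy: zlib is a general-purpose compressor with a fixed header and checksum overhead, not the Lempel–Ziv 1976 measure used later in this paper, and the string length, seed, and function name are arbitrary illustrative choices.

```python
import random
import zlib

def compressed_bits(s: str) -> int:
    """Size in bits of the zlib-compressed string: a crude upper-bound proxy
    for K(s), inflated by a constant container overhead."""
    return 8 * len(zlib.compress(s.encode("ascii"), 9))

n = 1000
periodic = "10" * (n // 2)                 # 1010...10: easy to describe
rng = random.Random(0)                     # fixed seed for repeatability
random_bits = "".join(rng.choice("01") for _ in range(n))

print("periodic:", compressed_bits(periodic), "bits")
print("random:  ", compressed_bits(random_bits), "bits")
```

The periodic string compresses to a small fraction of the random one, whose compressed size stays of the order of n bits, in line with K(x) ≈ n for typical random strings.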
2.1.2. The coding theorem and algorithmic probability. An important result in AIT is Levin's coding theorem [34], which establishes a fundamental connection between K(x) and probability predictions. Mathematically, it states that

$$P(x) = 2^{-K(x) + O(1)}, \qquad (1)$$

or equivalently

$$-\log_2 P(x) = K(x) + O(1), \qquad (2)$$

where P(x) is the probability that an output x is generated by a (prefix optimal) UTM fed with a random binary program. Thus, high complexity outputs have exponentially low probability, and simple outputs must have high probability. This is a profound result which links notions of data compression and probability in a direct way. P(x) is also known as the algorithmic probability of x.
Given the far-reaching and striking nature of this theorem, it is somewhat surprising that it is not more widely studied in the natural sciences. Part of the reason for this inattention is that AIT results are often difficult to apply directly in real-world contexts, owing to a number of issues, including the fact that K(x) is formally uncomputable and the ubiquitous use of UTMs.
2.1.3. The simplicity bias bound. Coding theorem-like behaviour in real-world input–output maps has been studied recently, leading to the observation of a phenomenon called simplicity bias [1] (see also ref. [35]). Simplicity bias is captured mathematically as

$$P(x) \le 2^{-a\tilde{K}(x) - b}, \qquad (3)$$

where P(x) is the (computable) probability of observing output x on random choice of inputs, and $\tilde{K}(x)$ is the approximate Kolmogorov complexity of the output x. In words, complex outputs from input–output maps have lower probabilities, and high probability outputs are simpler. The constants a > 0 and b can be fit with little sampling, and often even predicted without recourse to sampling [1]. We assume b = 0 in Eq. (3) throughout this work, which is a default assumption as argued and discussed in ref. [1]. There is also a conditional version of the simplicity bias equation [36].
Simplicity bias differs from Levin's coding theorem in several ways, including that it does not assume UTMs, it uses approximations of complexities, and for many outputs $P(x) \ll 2^{-\tilde{K}(x)}$. Hence the abundance of low complexity, low probability outputs [2,37] is a signature of simplicity bias.
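Eq. (3) is simple enough to evaluate directly. The sketch below (function names are illustrative, not taken from ref. [1]) computes the bound and its slope on a log10 scale, which is how the bound appears as a straight line on a plot of log10 P(x) against complexity: with a = 1 and b = 0, each extra bit of complexity lowers the bound by a factor of 2, i.e., by log10(2) ≈ 0.301 in log10 P(x).

```python
import math

def simplicity_bias_bound(k_tilde: float, a: float = 1.0, b: float = 0.0) -> float:
    """Upper bound on P(x) from Eq. (3): P(x) <= 2**(-a*K~(x) - b)."""
    return 2.0 ** (-a * k_tilde - b)

def log10_bound_slope(a: float = 1.0) -> float:
    """Change in log10 of the bound per unit of K~(x): -a*log10(2)."""
    return -a * math.log10(2.0)

for k in (0, 5, 10, 25):
    print(k, simplicity_bias_bound(k), math.log10(simplicity_bias_bound(k)))
print("slope:", log10_bound_slope())
```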
2.1.4. Estimating pattern complexity. To estimate complexity, we follow ref. [1] and use

$$C_{LZ}(x) = \begin{cases} \log_2(n), & x = 0^n \text{ or } 1^n, \\ \log_2(n)\,\left[N_w(x_1 \ldots x_n) + N_w(x_n \ldots x_1)\right]/2, & \text{otherwise}, \end{cases} \qquad (4)$$

where $N_w(x)$ comes from the 1976 Lempel–Ziv complexity measure [9]. The simplest strings $0^n$ and $1^n$ are separated out because $N_w(x)$ assigns complexity 1 to the string 0 or 1, but complexity 2 to $0^n$ or $1^n$ for n ≥ 2, whereas the true Kolmogorov complexity of such a trivial string actually scales as $\log_2(n)$ for typical n, because one only needs to encode n. Having said that, the minimum possible value is K(x) ≈ 0 for a simple string, and so, e.g., for binary strings of length n we can expect 0 ≤ K(x) ≤ n bits. Because for a random string of length n the value $C_{LZ}(x)$ is often much larger than n, especially for short strings, we scale the complexity so that we can set a = 1 in Eq. (3), via

$$\tilde{K}(x) = \log_2(M) \cdot \frac{C_{LZ}(x) - \min_x C_{LZ}(x)}{\max_x C_{LZ}(x) - \min_x C_{LZ}(x)}, \qquad (5)$$

where M is the maximum possible number of output patterns in the system, and the min and max complexities are over all strings x which the map can generate. $\tilde{K}(x)$ is the approximation to Kolmogorov complexity that we use throughout. Since M ≤ 2^n for binary strings of length n, this scaling results in $0 \le \tilde{K}(x) \le n$, which is the desired range of values.

2.2. Digitised map trajectories.
The AIT coding theorem is framed in terms of random inputs or 'programs' and resultant output strings or patterns. The core idea of the simplicity bias bound is that, for a large range of systems, uniformly sampling input parameter values ('programs') yields an exponential variation in output pattern probabilities, with high probability patterns having low complexities. While dynamical systems may not appear to fit this input–output framework, they can be viewed as input–output functions if the initial values and parameters are considered input 'programs', which are then computed to generate 'output' dynamical system trajectories. Because output pattern complexities and probabilities are more easily calculated if outputs are binary strings (and, as noted above, AIT is mainly framed in terms of binary strings), we digitise the real-valued trajectories into binary strings.
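The complexity estimate of Eqs. (4) and (5) can be sketched as follows. The word count N_w is computed here with the widely used Kaspar–Schuster scan for the Lempel–Ziv (1976) parsing; this is our choice of implementation and is assumed, not confirmed, to match the one used in ref. [1]. For Eq. (5), the minimum and maximum of C_LZ should be taken over all strings the map can generate; the usage example instead enumerates all 2^n strings for a small n = 10, i.e., it assumes M = 2^n. Function names are illustrative.

```python
import itertools
import math

def lz76_word_count(s: str) -> int:
    """N_w(s): number of words in the Lempel-Ziv (1976) parsing of s,
    computed with the Kaspar-Schuster algorithm."""
    n = len(s)
    if n <= 1:
        return n
    i, k, l = 0, 1, 1      # i: start of earlier copy, k: match length, l: start of current word
    c, k_max = 1, 1        # the first symbol is always a word of its own
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:  # the current word reaches the end of s
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:     # no earlier copy extends further: close the word
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def c_lz(x: str) -> float:
    """C_LZ(x) from Eq. (4)."""
    n = len(x)
    if len(set(x)) == 1:   # x = 0^n or 1^n
        return math.log2(n)
    return math.log2(n) * (lz76_word_count(x) + lz76_word_count(x[::-1])) / 2

def k_tilde(x: str, c_min: float, c_max: float, m: int) -> float:
    """Scaled complexity K~(x) from Eq. (5); m is the number of possible outputs M."""
    return math.log2(m) * (c_lz(x) - c_min) / (c_max - c_min)

# Usage: scale over all 2**n strings of a small length n (assumes M = 2**n).
n = 10
strings = ["".join(bits) for bits in itertools.product("01", repeat=n)]
values = [c_lz(s) for s in strings]
c_min, c_max = min(values), max(values)
print(k_tilde("0" * n, c_min, c_max, 2 ** n))   # the simplest string
print(k_tilde("0110100110", c_min, c_max, 2 ** n))
```

By construction the all-zeros string gets K̃ = 0 and the most complex strings get K̃ = log2(M) = n.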
To illustrate this input–output framework, consider binary sequence trajectories resulting from digitised realisations of the logistic map

$$x_{k+1} = \mu x_k (1 - x_k), \qquad (6)$$

where the inputs are the pair of values $x_0 \in (0.0, 1.0)$ and $\mu \in (0.0, 4.0]$. For the outputs, the map is first iterated to obtain a sequence of n real values $x_1, x_2, x_3, \ldots, x_n$ in [0, 1]. Similar to the field of symbolic dynamics [38], the real-valued trajectory is digitised into a binary output string x by applying a threshold, writing 1 if $x_k \ge 0.5$ and 0 otherwise [39]. Hence a binary output sequence x of n bits is generated for each input pair µ and $x_0$. For example, choosing µ = 3.8 and $x_0 = 0.1$ with n = 25, and iterating Eq. (6) to obtain $x_k$ for k = 1, 2, ..., 25, we obtain the real-valued trajectory

$$x_1, x_2, \ldots, x_{25} = 0.34, 0.86, \ldots, 0.25, 0.72, 0.77, 0.67,$$

which after digitisation becomes the binary string

x = 0101011011111011010110111.

Figure 1 illustrates the trajectory and digitisation procedure. The choice of n is a balance between being long enough that many different patterns can emerge, and short enough that good frequency and probability estimates can be made without excessive sampling. Different µ and $x_0$ inputs can yield different binary strings $x \in \{0, 1\}^n$.
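The digitisation procedure can be sketched as follows. The helper names are illustrative. The `burn_in` parameter implements the discarding of the first 50 iterations used throughout the paper; it should be set to 0 to mimic the worked example, whose first iterate x_1 = 0.34 follows directly from x_0 = 0.1.

```python
from typing import Callable

def logistic(mu: float) -> Callable[[float], float]:
    """The logistic map of Eq. (6) for a fixed parameter mu."""
    return lambda x: mu * x * (1.0 - x)

def digitised_trajectory(f: Callable[[float], float], x0: float, n: int = 25,
                         threshold: float = 0.5, burn_in: int = 50) -> str:
    """Iterate f from x0, drop burn_in iterates, then record n bits:
    '1' if x_k >= threshold and '0' otherwise."""
    x = x0
    for _ in range(burn_in):
        x = f(x)
    bits = []
    for _ in range(n):
        x = f(x)
        bits.append("1" if x >= threshold else "0")
    return "".join(bits)

print(digitised_trajectory(logistic(3.8), 0.1, burn_in=0))
print(digitised_trajectory(logistic(2.5), 0.3))   # fixed point 0.6 > 0.5: all ones
```

The first printed string begins with 010101, as in the example string above.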
Note that throughout the paper, we ignore the first 50 iterations of a map, so as to discard the transient dynamics of the iterated maps. Ignoring the initial iterations in this way is common practice [40].

3. Results

3.1. Logistic map.
The logistic map in Eq. (6) is possibly the best known and most iconic 1D map studied in dynamical systems and chaos theory. To our knowledge, there have not been many studies of the complexity of digitised trajectories of the logistic map. A notable exception is the work by Kaspar and Schuster [41], who also studied the estimated complexity of the resulting binary string trajectories.
However, their work was fundamentally different in that it was not concerned with simplicity bias or with estimating the probability of different outputs.
3.1.1. Parameter intervals. The intervals $x_0 \in (0.0, 1.0)$ and $\mu \in (0.0, 4.0]$ are the standard ranges in which the logistic map is studied. It can be shown [40], for example, that for µ > 4.0 almost all trajectories are not confined to [0, 1]; similarly, if $x_0 \notin (0, 1)$ the behaviour can be unbounded or trivial. For some large values of µ (e.g., µ = 4.0), almost all initial values $x_0$ yield complex and chaotic outputs [14], and the distribution over digitised trajectories is roughly uniform, with little bias. Note that we use the word 'bias' to describe a strongly non-uniform distribution of the probability of obtaining a given binary string output.
Figure 2(a) shows the bifurcation diagram of the logistic map, in which the asymptotic $x_k$ values are depicted for different values of µ. The diagram shows fixed points, oscillations, and non-periodic behaviour. The value 0.5 is added as a red line, highlighting the digitisation threshold. It is known [14] that if µ ∈ [0.0, 1.0], then the trajectories tend to 0. Because we threshold at 0.5, the corresponding binary string is x = ...0000. If µ ∈ (1.0, 2.0], the trajectories tend to 1 − (1/µ), which means that we again expect binary strings x = ...0000. For µ ∈ (2.0, 3.0], the resulting pattern is x = ...1111, because 1 − (1/µ) is greater than 0.5. For µ slightly above 3.0 we still expect x = ...1111: although the first bifurcation appears at µ = 3.0, where period-two oscillations begin, both values of the oscillation remain above 0.5 until µ = 1 + √5 ≈ 3.24, where the lower value crosses the threshold. Figure 2(b) shows the same bifurcation diagram, zoomed in on larger values of µ. From about 3.24 to about 3.5, oscillations between two or four values appear, and larger values of µ yield oscillation periods of 8, 16, etc., which yield patterns such as x = 0101.... Chaotic trajectories do not occur until [14] µ ≥ 3.56994567... ≈ 3.57 (https://oeis.org/A098587), so the µ interval from 3.0 to about 3.5699 contains period-doubling bifurcations, in which the oscillation period doubles at each successive bifurcation, but no truly chaotic trajectories. Finally, for µ between ≈3.57 and 4.0, more complex and chaotic patterns can emerge, but 'islands' of stability also appear in this interval, as can be observed in Figure 2(b).
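The thresholds discussed above can be checked in closed form. The fixed point 1 − 1/µ equals 0.5 at µ = 2, and the period-2 cycle created at µ = 3 consists of the two points [(µ + 1) ± √((µ + 1)(µ − 3))]/(2µ). Its lower point equals 0.5 exactly when 1/2 lies on the cycle, i.e., when f(µ/4) = 1/2, which reduces to (µ − 2)(µ² − 2µ − 4) = 0 and hence µ = 1 + √5 ≈ 3.236 for the period-2 branch. The sketch below (function names illustrative) checks these values numerically.

```python
import math

def fixed_point(mu: float) -> float:
    """Non-zero fixed point of the logistic map, stable for 1 < mu < 3."""
    return 1.0 - 1.0 / mu

def period_two_points(mu: float) -> tuple[float, float]:
    """Lower and upper points of the period-2 cycle, which exists for mu > 3."""
    root = math.sqrt((mu + 1.0) * (mu - 3.0))
    return (mu + 1.0 - root) / (2.0 * mu), (mu + 1.0 + root) / (2.0 * mu)

mu_cross = 1.0 + math.sqrt(5.0)   # lower period-2 point equals 1/2 here
for mu in (3.1, 3.2, mu_cross, 3.3, 3.4):
    low, high = period_two_points(mu)
    print(f"mu={mu:.4f}  low={low:.4f}  high={high:.4f}")
```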
In our numerical simulations, we separately investigate several sampling choices: µ sampled uniformly from (0.0, 4.0], [3.0, 4.0], and [3.57, 4.0], and finally µ fixed at 4.0. The motivation for these choices is as follows. Given that µ ∈ (0.0, 4.0] is the standard range in which the map is studied, the most 'natural' sampling strategy is to sample uniformly across this interval. The intervals [3.0, 4.0] and [3.57, 4.0] are interesting test cases because we can expect complex patterns to appear more frequently in them. Finally, when µ is fixed at 4.0, most trajectories will be highly complex, and so we might expect simplicity bias to disappear.
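A minimal sketch of the sampling experiment, assuming µ uniform on a chosen interval and x_0 uniform on [0, 1], with the 50-iteration burn-in, n = 25, and threshold 0.5 described in the text. The function name, seed, and the small default sample count are illustrative (the paper uses 10^6 samples); `random.uniform` may in principle return an interval endpoint, which is immaterial here. Passing equal bounds, e.g. (4.0, 4.0), fixes µ.

```python
import random
from collections import Counter

def sample_output_probabilities(mu_low, mu_high, samples=10_000, n=25,
                                burn_in=50, threshold=0.5, seed=0):
    """Monte Carlo estimate of P(x) for digitised logistic-map outputs.

    Returns a dict mapping each observed n-bit string to its frequency."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        mu = rng.uniform(mu_low, mu_high)
        x = rng.uniform(0.0, 1.0)
        for _ in range(burn_in):          # discard transient iterates
            x = mu * x * (1.0 - x)
        bits = []
        for _ in range(n):
            x = mu * x * (1.0 - x)
            bits.append("1" if x >= threshold else "0")
        counts["".join(bits)] += 1
    return {s: c / samples for s, c in counts.items()}

probs = sample_output_probabilities(0.0, 4.0)
for s, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(s, p)
```

Because f(x) ≤ µ/4 on [0, 1], every µ < 2 produces the all-zeros string, so that string alone receives roughly half of the probability mass when µ is sampled from (0, 4].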

3.1.2. Connection of simplicity and probability.
Reflecting on the bifurcation diagram and the related comments above, it is easily seen that when µ is sampled uniformly from (0.0, 4.0], some simple binary patterns will have high probability and highly complex patterns will have low probability, simply because most of the interval (0.0, 4.0] yields fixed points or low-period oscillations. Hence, some general form of bias towards simplicity is expected for the logistic map. What is not a priori obvious, however, is whether, or to what extent, the simplicity bias bound in Eq. (3) will be followed or have predictive value.
3.1.3. Simplicity bias appears when bias appears. Following the protocol for generating binary strings via logistic map trajectories described above, we now examine the probability P(x) that the digitised trajectory x is produced on random sampling of the input parameters µ and $x_0$, here using $10^6$ samples of uniformly chosen random parameters. Using Eq. (3), we can make an a priori prediction of the upper bound decay upon sampling of the input parameters. (Recall that we ignore the first 50 iterations of the map, in order to exclude the transient dynamics.) Figure 3(a) shows that the upper bound prediction (black line) agrees remarkably well with the probability and complexity data for the different binary string outputs (blue dots): the logistic map displays simplicity bias when µ is sampled uniformly from (0.0, 4.0], such that high probability outputs have low complexities, and complex outputs have low probabilities. The gradient prediction (a = 1) of the black line used no information about the map, except the assumption that almost all $2^n$ outputs of length n bits are realisable. An upper-bound fit to the $\log_{10} P(x)$ data gives a slope of −0.18. Many output strings fall below the upper bound prediction, as expected from earlier studies [1,37], but it is known that randomly generated outputs tend to be close to the bound [2]. Put differently, even though many output strings (blue dots) appear to be far below the bound, most of the probability mass for each complexity value is concentrated close to the bound. Thus, this simple bound predicts P(x) quite accurately using only the complexity values of output strings, while otherwise completely ignoring the details of the underlying dynamical system.

In Figure 3(b) we make a similar plot, except that the sampling is restricted to µ ∈ [3.0, 4.0]. The qualitative pattern in (b) is similar to (a), although simplicity bias is slightly less clear. An upper-bound fit to the $\log_{10} P(x)$ data gives a slope of −0.17. For Figure 3(c), the sampling interval was µ ∈ [3.57, 4.0], which is the region containing some truly chaotic trajectories [40]. Both bias and simplicity bias are less clear in this plot; an upper-bound fit to the $\log_{10} P(x)$ data gives a slope of −0.04. In Figure 3(d), µ is no longer sampled but is fixed at µ = 4.0, so that almost all $x_0$ values yield chaotic trajectories. As expected, there is very little bias in the distribution, i.e., P(x) is roughly uniform, and hence no simplicity bias can be observed; an upper-bound fit to the $\log_{10} P(x)$ data gives a slope of 0.02. Figure 3(d) still shows some very simple strings, which can be understood from the fact that $x_i \approx 0 \Rightarrow x_{i+j} \approx 0$ for small j, even if the trajectory becomes much more complex for j ≫ 1. In other words, if the trajectory reaches a value close to zero, it tends to stay close to zero for several iterations, and these short stretches with many 0s have very low complexity.
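The text does not specify how the upper-bound fits were performed. One simple possibility, sketched below as an assumption rather than the authors' procedure, is to bin outputs by K̃(x), keep the largest log10 P(x) in each bin, and fit a least-squares line through these maxima. The resulting slope can then be compared with the a = 1 prediction of −log10(2) ≈ −0.30 per bit. The function name and bin width are illustrative.

```python
import math

def upper_bound_slope(points, bin_width=1.0):
    """Slope of a least-squares line through the per-bin maxima of log10 P(x).

    points: iterable of (k_tilde, probability) pairs with probability > 0."""
    best = {}
    for k, p in points:
        b = round(k / bin_width)
        lp = math.log10(p)
        if b not in best or lp > best[b]:
            best[b] = lp
    if len(best) < 2:
        raise ValueError("need at least two complexity bins")
    xs = [b * bin_width for b in sorted(best)]
    ys = [best[b] for b in sorted(best)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic check: an upper envelope with slope -0.2 plus points well below it.
envelope = [(k, 10 ** (-0.2 * k)) for k in range(11)]
below = [(k, 10 ** (-0.2 * k - 1.5)) for k in range(11)]
print(upper_bound_slope(envelope + below))   # -0.2 up to rounding error
```

Points far below the bound do not affect the fit, which is the intended behaviour for an upper-bound estimate; the price is sensitivity to sampling noise in sparsely populated high-complexity bins.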
As argued above, the fact that simple patterns occur with high probability when sampling µ from (0.0, 4.0] is not in itself remarkable, and can be rationalised from the bifurcation diagram. What is noteworthy, however, is that we do not just see a vague general inverse relation between complexity and probability (as the preceding discussion might predict), but rather an exponential decay in probability with linearly increasing complexity, which follows the upper bound of Eq. (3) quite closely.
See Appendix A for an illustration of the effects of using different values of n, and Appendix B for the same plots as in Figure 3 but with semi-transparent data points, to highlight the distribution of data points.
3.1.4. Distribution of complexities. By itself, simplicity bias does not necessarily imply that the distribution of complexity values, $P(\tilde{K}(x) = r)$, will be skewed towards low complexity values. Rather, simplicity bias means that the individual probabilities of some simple strings will be higher than those of more complex strings. Because the probability of observing a given complexity r depends on the product of the number of strings of complexity r and their individual probabilities, the distribution $P(\tilde{K}(x) = r)$ may peak at high complexities, at low complexities, or possibly have no peak at all.
To investigate this distribution for the logistic map, Figure 4 shows the distribution of complexities for the different sampling intervals of µ. As can be seen in Figure 4(a), when µ is sampled from (0.0, 4.0] there is a bias towards lower complexities. In (b), when µ is sampled from [3.0, 4.0], the distribution of complexities is roughly uniform (at least on a log scale). In (c), the distribution is also somewhat uniform but peaks around medium complexity values. In (d), there is a bias towards higher complexity values. For comparison, (e) shows the distribution of complexities resulting from sampling purely random binary strings, which is very similar to that in (d) obtained with µ = 4.0. It is noteworthy that in some of these cases, although there is some evidence of bias towards higher or lower complexities, the distributions still display a wide spread and are not narrowly peaked (at least on a log scale). We note that if the logistic map produced binary output strings with uniform probability over strings, in theory the frequency would grow exponentially with complexity r; however, for Lempel–Ziv complexity with short strings, the distribution is not quite exponential (see [1,42]).
The appearance of a distribution that is much less peaked towards high complexity than that of randomly chosen strings can be rationalised from AIT arguments. To first order, there are ∼$2^r$ strings of complexity r, each with probability ∼$2^{-r}$, so that the total probability $2^r \cdot 2^{-r} = O(1)$ of complexity r is independent of r, and the distribution over r is presumably roughly uniform. For prefix complexity, the number of strings with complexity r is slightly less than $2^r$, and hence for a prefix UTM the distribution would be biased towards lower r values to some extent. Nonetheless, this rough argument is still valid as a first-order approximation for distributions with large spread, as seen in Figure 4. See also ref. [8] for similar arguments and results regarding the distribution of complexities in a biological setting, and ref. [42] for the setting of machine learning.
Finally, we note that in practice far fewer samples are needed to estimate the distribution $P(\tilde{K}(x) = r)$ than the distribution P(x), because many strings share the same complexity value. Comparing the sampled distribution with a null model of random strings may be the quickest and easiest way to diagnose simplicity bias in a dynamical system.
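A sketch of this diagnostic, under stated assumptions: digitised logistic-map outputs with µ uniform on (0, 4] are compared against uniformly random 25-bit strings, using the unscaled forward LZ76 word count N_w as the complexity measure (a simplification of Eqs. (4) and (5)). Sample sizes, seeds, and function names are illustrative. Each distribution maps a complexity value to the fraction of samples having it.

```python
import random
from collections import Counter

def lz76_word_count(s):
    """N_w(s) via the Kaspar-Schuster scan of the LZ76 parsing."""
    n = len(s)
    if n <= 1:
        return n
    i, k, l, c, k_max = 0, 1, 1, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                return c + 1
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                c += 1
                l += k_max
                if l + 1 > n:
                    return c
                i, k, k_max = 0, 1, 1
            else:
                k = 1

def complexity_distribution(strings):
    counts = Counter(lz76_word_count(s) for s in strings)
    total = sum(counts.values())
    return {r: cnt / total for r, cnt in sorted(counts.items())}

def logistic_outputs(samples, n=25, burn_in=50, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(samples):
        mu, x = rng.uniform(0.0, 4.0), rng.uniform(0.0, 1.0)
        for _ in range(burn_in):
            x = mu * x * (1.0 - x)
        bits = []
        for _ in range(n):
            x = mu * x * (1.0 - x)
            bits.append("1" if x >= 0.5 else "0")
        out.append("".join(bits))
    return out

def random_outputs(samples, n=25, seed=1):
    rng = random.Random(seed)
    return ["".join(rng.choice("01") for _ in range(n)) for _ in range(samples)]

def mean_complexity(dist):
    return sum(r * p for r, p in dist.items())

map_dist = complexity_distribution(logistic_outputs(5000))
null_dist = complexity_distribution(random_outputs(5000))
print("map: ", map_dist)
print("null:", null_dist)
print("mean N_w:", mean_complexity(map_dist), "vs", mean_complexity(null_dist))
```

A map distribution shifted well below the null model signals bias towards simple outputs; a map distribution indistinguishable from the null model (as expected for µ = 4.0) signals its absence.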
3.1.5. Complex and pseudo-random outputs. The occurrence of complex patterns may prompt a question: since the logistic map in Eq. (6) is itself simple, with low Kolmogorov complexity, it might be supposed that any output x it generates should be simple as well. How, then, can we obtain large variations in complexity, which would lead to large variations in probability? Or are all the apparently complex outputs merely pseudo-random patterns? We can address these questions analytically by bounding the complexity K(x) of a discretised trajectory, written as an n-bit binary string x, using the inequality

$$K(x) \le K(\mu, x_0) + \log_2(n) + O(1). \qquad (7)$$

This bound follows from the fact that any x can be described precisely by describing: (a) the logistic map function in Eq. (6), which takes only a few bits because K(logistic map) = O(1) bits; (b) a pair of values µ and $x_0$ which yield x, taking $K(\mu, x_0)$ bits; and (c) the length of the string (i.e., the number of iterations to perform) via n, which takes at most $\log_2(n)$ bits (up to log-log terms). From this upper bound, we see that if µ and $x_0$ are simple values with short descriptions, then K(x) ≪ n, so that x must be simple. However, because µ and $x_0$ are (truncated) real numbers randomly chosen from their intervals, they need not be simple; rather the opposite. Almost all numbers sampled here are given to d decimal places, and so will have high complexity values of ∼$d\log_2(10)$ bits. Therefore, outputs need not be simple just because the map is simple, owing to the contribution of the input complexity in Eq. (7). Nor are all outputs merely pseudo-random patterns. Indeed, this argument accords with the fact that in Figure 3 very many outputs x are realised, most of which must be (truly) complex, because it is well known from AIT that almost all strings of length n are complex and only relatively few are simple.
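To make the last point concrete, suppose (purely as an illustrative assumption, roughly matching double precision) that each of µ and x_0 is specified to d = 15 decimal places. For a typical such pair, the input term in the bound on K(x) for an n = 25 bit output is then far larger than n:

```latex
K(\mu, x_0) \approx 2\,d\log_2 10 = 2 \times 15 \times 3.32 \approx 100 \text{ bits},
\qquad
K(\mu, x_0) + \log_2 n \approx 100 + \log_2 25 \approx 104 \text{ bits} \gg n = 25 .
```

Since K(x) cannot meaningfully exceed about n bits anyway, the bound is not binding here: the inputs, not the map, carry enough information to produce complex outputs.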
Extending the preceding discussion of Eq. (7), it is also interesting to ask whether there are simple input parameters which generate high-entropy pseudo-random outputs. Indeed there are: for example, if µ = 4.0 then almost all $x_0$ lead to chaotic trajectories, including some simple initial values like $x_0 = 1/3$, which can be described with only a few bits. Then $K(\mu, x_0) = O(1)$ and so $K(x) \le \log_2(n) + O(1)$ (up to additive log-log terms), but at the same time $\tilde{K}(x) \approx n$ bits, because the Lempel–Ziv complexity measure cannot compress pseudo-random strings.
On reflection, these pseudo-random strings with $K(x) \ll \tilde{K}(x)$ must be 'rare' in the space and must not have individually high probability; otherwise we would not see simplicity bias, or would at least see many points strongly violating the upper bound in Figure 3(a), which we do not.
3.1.6. Pre-chaotic regime. As discussed above, when µ > 3.5699... some trajectories can be chaotic, but there are also intermittent 'islands' of stability, and in general the pattern of behaviour in the region 3.57 to 4.0 is intricate and complicated. In the region µ ∈ [0.0, 3.5699], however, the behaviour is more straightforward, with progressive period-doubling bifurcations but no true chaos. Feigenbaum [43,44] famously showed that the distance between successive bifurcations eventually decreases exponentially, as $\mu_{q+1} - \mu_q \approx 2.069/4.669^q$ [40], where $\mu_q$ is the value of µ at the q-th bifurcation.
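The approximate spacing law can be summed as a geometric series. The sketch below (an illustration using only the constants quoted above, starting from µ_1 = 3; the function name is hypothetical) generates the approximate bifurcation points and their limit 3 + 2.069/(4.669 − 1) ≈ 3.564, which is close to, but slightly below, the accumulation point ≈ 3.5699. For comparison, the exact second bifurcation is at 1 + √6 ≈ 3.449, while the formula gives ≈ 3.443.

```python
def approx_bifurcation_points(count: int, mu_1: float = 3.0) -> list[float]:
    """mu_1, mu_2, ... from the approximation mu_{q+1} = mu_q + 2.069 / 4.669**q."""
    mus = [mu_1]
    for q in range(1, count):
        mus.append(mus[-1] + 2.069 / 4.669 ** q)
    return mus

limit = 3.0 + 2.069 / (4.669 - 1.0)   # sum of the geometric series
for q, mu in enumerate(approx_bifurcation_points(6), start=1):
    print(q, round(mu, 4))
print("limit:", round(limit, 4))
```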
Because the interval µ ∈ [0.0, 3.5699] is relatively easy to understand, and to see whether simplicity bias appears even without chaos, we also generated a complexity–probability plot with µ sampled uniformly from [0.0, 3.5699]. Figure 5(a) shows the decay in probability with increasing complexity. As is apparent, there are fewer patterns (blue dots), because restricting µ to this interval restricts the space of possible patterns. Some simplicity bias is observed, but the bound is followed less clearly than when sampling across (0.0, 4.0]; for example, the upper envelope of the data points is less clearly linear.
To get a better understanding of the complexity of trajectories, we can find¹ the oscillation periods for some chosen values of µ: µ = 3.4 gives period 2; µ = 3.5, period 4; µ = 3.56, period 8; µ = 3.569, period 32; and µ = 3.5699, period 128. So for the sampled interval [0.0, 3.5699], the highest period is 128. Because we use n = 25, any pattern with period 32 or more cannot repeat within a single output string and so will not appear periodic; n would need to exceed the period for the periodicity to become plain. From this brief analysis, we can see how the low-probability, high-complexity patterns in Figure 5(a) arise. In conclusion, some simplicity bias is observed for the interval µ ∈ [0.0, 3.5699], but it is not as pronounced as when sampling µ ∈ (0.0, 4.0], which is presumably due to the presence of potentially more complex patterns in the interval 3.57 to 4.0.

¹ WolframAlpha (https://www.wolframalpha.com/), using the command "logistic map for r=3.4 and x=0.2".

3.2. Gauss map ("mouse map").
Moving on from the logistic map, we now explore simplicity bias in another 1D map, namely the Gauss map, which is also known as the mouse map because its bifurcation diagram is reminiscent of a mouse [45]. The equation defining the dynamical system is

$$x_{k+1} = \exp(-\alpha x_k^2) + \beta. \qquad (8)$$

The value α = 7.5 is chosen and kept fixed: for many values of α the bifurcation diagram is not sufficiently complex to yield many varied trajectories, and the value 7.5 has been used previously [46]. The value β is the bifurcation parameter, and the map is typically studied [45,47] with β ∈ [−1.0, 1.0]. Similar to the logistic map example, we sample initial values $x_0$ and values of β, ignore the first 50 iterations (due to transient dynamics), and then digitise the real-valued trajectory to form binary string outputs. Because iterations of Eq. (8) are not confined to [0.0, 1.0], and due to the form of the bifurcation diagram [45], we sample $x_0 \in [-0.5, 0.5]$ uniformly. Also due to the form of the bifurcation diagram, the digitisation threshold is set at 0.2 (instead of 0.5, as for the logistic map). The threshold is changed merely to avoid having too many trivial outputs x = 000...000. As above, we use n = 25.
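A sketch of the digitised Gauss map, assuming Eq. (8) in the form x_{k+1} = exp(−α x_k²) + β with α = 7.5, threshold 0.2, a 50-iteration burn-in, n = 25, x_0 uniform on [−0.5, 0.5], and β uniform on [−1, 1]. The function name and seed are illustrative. At the ends of the β range the output is trivial, since exp(−αx²) lies in (0, 1]: β = −1 keeps every iterate in (−1, 0], below the threshold, and β = 1 keeps every iterate in (1, 2], above it.

```python
import math
import random

def gauss_map_bits(beta, x0, alpha=7.5, n=25, threshold=0.2, burn_in=50):
    """Digitised trajectory of x_{k+1} = exp(-alpha * x_k**2) + beta."""
    x = x0
    for _ in range(burn_in):              # discard transient iterates
        x = math.exp(-alpha * x * x) + beta
    bits = []
    for _ in range(n):
        x = math.exp(-alpha * x * x) + beta
        bits.append("1" if x >= threshold else "0")
    return "".join(bits)

rng = random.Random(1)
beta, x0 = rng.uniform(-1.0, 1.0), rng.uniform(-0.5, 0.5)
print(round(beta, 4), round(x0, 4), gauss_map_bits(beta, x0))
```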
In Figure 5(b), we see clear simplicity bias in the Gauss map. The slope of the decay follows the predicted black line very closely, although the offset b is not well approximated by b = 0. Nonetheless, there is clear simplicity bias, similar to the logistic map case, and hence we can predict the relative log-probabilities of output strings with quite high accuracy.
3.3. Sine map.
We now study simplicity bias in the sine map. Wolfram [48] described the sine map and displayed its bifurcation diagram, which is broadly similar to that of the logistic map, to illustrate Feigenbaum's discovery of universality. The map is

$$x_{k+1} = \mu \sin(\pi \sqrt{x_k}). \qquad (9)$$

For this map, we sample $x_0 \in [0.0, 1.0]$ uniformly $10^6$ times, and return to the digitisation threshold of 0.5. The parameter µ is sampled uniformly from [0.0, 1.0]. As before, the first 50 iterates are ignored, and we use n = 25. Note that another form of the sine map, without the square root on $x_k$, is sometimes studied in dynamical systems [49,50]. Figure 5(c) shows the complexity–probability plot for the sine map, and simplicity bias is also present here. However, the upper bound is less clearly linear (on the log scale), and the black line prediction is not followed as closely. Also noteworthy is that in the tail of the decay at higher complexities, the upper bound appears to curve up slightly; it is not clear why this happens.
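The same procedure for the sine map can be sketched as follows, assuming the square-root form x_{k+1} = µ sin(π√x_k) with µ and x_0 uniform on [0, 1]; the function name and seed are illustrative. For µ, x_0 ∈ [0, 1] the iterates stay in [0, µ], so the square root is always defined, and every µ < 0.5 yields the all-zeros string, one simple source of high-probability simple outputs.

```python
import math
import random

def sine_map_bits(mu, x0, n=25, threshold=0.5, burn_in=50):
    """Digitised trajectory of x_{k+1} = mu * sin(pi * sqrt(x_k))."""
    x = x0
    for _ in range(burn_in):              # discard transient iterates
        x = mu * math.sin(math.pi * math.sqrt(x))
    bits = []
    for _ in range(n):
        x = mu * math.sin(math.pi * math.sqrt(x))
        bits.append("1" if x >= threshold else "0")
    return "".join(bits)

rng = random.Random(2)
samples = [sine_map_bits(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0))
           for _ in range(2000)]
print(sum(s == "0" * 25 for s in samples) / len(samples))   # share of all-zero outputs
```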
3.4. Bernoulli map.
Moving on to another prototypical chaotic map, the Bernoulli map (also known as the dyadic map, bit shift map, doubling map, or sawtooth map) is defined via the equation

$$x_{k+1} = 2x_k \bmod 1, \qquad (10)$$

with $x_0 \in (0.0, 1.0)$. This map shows sensitive dependence on initial conditions, because the trajectories of two inputs $x_0$ and $y_0$ which differ by an arbitrarily small amount $|x_0 - y_0| > 0$ will eventually diverge for large enough k.
Given that the Bernoulli map is also a 1D chaotic system with a simple (O(1)-complexity) map, it is interesting to ask whether it shows simplicity bias like the logistic map and others do. A little reflection shows that this map shows neither simplicity bias nor bias, even for a digitised version. The trajectory is generated by multiplying by 2 and discarding the integer part, so if x_0 is a random real number, then the trajectory will be random and incompressible, because (2x_0 mod 1) will be another random number, assuming that x_0 is defined to a large number of decimal places. Multiplying a random number by 2 does not remove the randomness. In binary, each iteration simply shifts the expansion of x_0 one place to the left, so with a threshold of 0.5, for example, the digitised trajectory reads off successive binary digits of x_0. Hence the binary discretised trajectory would look almost random, with small and quickly decaying autocorrelation (see also Section 9.4 of ref. [17] for a similar conclusion). For bias and simplicity bias, it is necessary for random inputs to be able to lose much of their information and complexity (i.e., complex inputs must be able to produce simple outputs), but the Bernoulli map does not allow this. Hence the behaviour of this map is similar to that of the logistic map with µ = 4.0, in the sense that there is neither bias nor simplicity bias. Indeed, this similarity is quite natural given the semiconjugacy x = sin²(πθ) between the Bernoulli (doubling) map θ ↦ 2θ mod 1 and the logistic map with µ = 4.0. This map does not have a bifurcation parameter µ.
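The bit-shift argument can be checked exactly with rational arithmetic. The sketch below assumes a 0.5 digitisation threshold and represents x_0 as a dyadic rational with 200 random binary digits, a stand-in for a random real number defined to many decimal places; the digitised trajectory is then exactly the binary expansion of x_0, so random inputs give random outputs. Exact fractions are used deliberately: a double drawn by Python's random.random() is a multiple of 2^−53, so floating-point iteration of 2x mod 1 reaches exactly 0 within 53 steps, an artefact that would masquerade as a simple output.

```python
import random
from fractions import Fraction

HALF = Fraction(1, 2)


def bernoulli_pattern(x0: Fraction, n: int) -> str:
    """Digitise the Bernoulli map x_{k+1} = 2 x_k mod 1 with threshold 1/2."""
    bits = []
    x = x0
    for _ in range(n):
        bits.append("1" if x >= HALF else "0")
        x = (2 * x) % 1  # exact: doubling shifts the binary expansion left
    return "".join(bits)


rng = random.Random(42)
DIGITS = 200
m = rng.getrandbits(DIGITS)
x0 = Fraction(m, 2**DIGITS)           # x0 = 0.b1 b2 ... b200 in binary
expansion = format(m, f"0{DIGITS}b")  # the digits b1 ... b200 as a string
print(bernoulli_pattern(x0, 25) == expansion[:25])  # True
```

Discarding a burn-in of 50 iterates would only shift which 25 digits of x_0 are read off; it does not make the output any less random.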
3.5. Tent map
The last map we look at is the tent map, which is well known and widely studied in dynamical systems research [14]. The iterated values follow x_{k+1} = 2x_k if x_k < 1/2 and x_{k+1} = 2(1 − x_k) if x_k ≥ 1/2, for k ≥ 0. This map does not have a bifurcation parameter µ. Although it is a 1D dynamical system like the maps above, the tent map does not lead to strong bias in the distribution of digitised binary string outputs x, and hence cannot show simplicity bias. Intuitively, this is because almost all values of x_0 yield complex paths, whereas simplicity bias typically arises when most inputs lead to relatively simple paths. Indeed, the tent map is topologically conjugate to the logistic map with µ = 4.0 [14], and we saw neither bias nor simplicity bias in the logistic map with µ = 4.0; this helps to explain the absence of simplicity bias in the tent map.
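The absence of bias can also be made exact. Assuming the slope-2 tent map above and a 0.5 threshold (the threshold is an assumption here), one can check that for x_0 = 0.b_1b_2… in binary, with b_0 = 0, the digitised trajectory is s_k = b_k XOR b_{k+1}: each step shifts the expansion and, whenever the leading digit is 1, complements the remaining digits. This is exactly the binary-to-Gray-code conversion, a bijection on n-bit strings, so a uniformly random x_0 produces uniformly distributed patterns. The sketch verifies this with exact fractions; the odd numerator (final binary digit 1) sidesteps the finite-expansion edge case in which 1 − x is not an exact digit-wise complement.

```python
import itertools
import random
from fractions import Fraction

HALF = Fraction(1, 2)


def tent_pattern(x0: Fraction, n: int) -> str:
    """Digitised trajectory of the slope-2 tent map, threshold 1/2."""
    bits = []
    x = x0
    for _ in range(n):
        bits.append("1" if x >= HALF else "0")
        x = 2 * x if x < HALF else 2 * (1 - x)
    return "".join(bits)


def xor_adjacent(digits: str) -> str:
    """s_k = b_k XOR b_{k+1}, with b_0 = 0 (binary-to-Gray-code conversion)."""
    out, prev = [], "0"
    for b in digits:
        out.append("1" if b != prev else "0")
        prev = b
    return "".join(out)


rng = random.Random(7)
DIGITS = 201
numerator = 2 * rng.getrandbits(DIGITS - 1) + 1  # odd, so the last binary digit is 1
x0 = Fraction(numerator, 2**DIGITS)
expansion = format(numerator, f"0{DIGITS}b")
print(tent_pattern(x0, 25) == xor_adjacent(expansion[:25]))  # True

# Bijection check on all 10-bit strings: distinct inputs give distinct outputs.
images = {xor_adjacent("".join(t)) for t in itertools.product("01", repeat=10)}
print(len(images) == 2**10)  # True
```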

Discussion
Arguments inspired by algorithmic information theory (AIT) predict that many input-output maps will have strongly non-uniform probability distributions over outputs, with complex patterns having exponentially low probabilities and some simple patterns having high probability; this phenomenon is known as simplicity bias [1]. Here, we numerically investigated the presence of simplicity bias in digitised trajectories arising from iterations of the logistic map, Gauss map, sine map, Bernoulli map, and tent map. By digitising the real-valued trajectories, we studied the probability and complexity of the resulting binary strings. Our main conclusions are that (i) we observe simplicity bias in the logistic map, Gauss map, and sine map, and in some cases the probability of the resulting binary strings can be predicted a priori with surprising accuracy; and (ii) we do not observe simplicity bias in the trajectories of the Bernoulli map and tent map, nor indeed any bias at all.
Due to the qualitatively different behaviours exhibited by the logistic map for different µ values, we separately studied different regimes by sampling µ from (0.0, 4.0], (3.0, 4.0], and (3.57, 4.0], and also by fixing µ = 4.0. In general, simplicity bias and the accuracy of the upper bound prediction were highest for µ sampled across the full range (0.0, 4.0], and decreased for smaller ranges, until disappearing completely for µ = 4.0. The logistic map is perhaps the most iconic example of a dynamical system in chaos theory, and has been studied very extensively for decades. Here we report a novel finding relating to this map, one that is not (merely) a subtle higher-order effect, but rather a strong effect involving order-of-magnitude variations in pattern probability. This finding is also interesting given that we did not necessarily expect to observe simplicity bias when outputs can be pseudo-random (cf. ref. [1]). It appears that in this map, pseudo-random outputs are sufficiently rare that they do not cause strong violations of the simplicity bias upper bound. Additionally, we found that simplicity bias can be 'tuned' by altering the µ interval: sampling from the full interval (0.0, 4.0] yields a biased (low-entropy) distribution over output strings along with simplicity bias, while sampling only µ values close to 4.0 yields low-bias (high-entropy) distributions and little or no simplicity bias. While we observe simplicity bias similar to that predicted by AIT arguments in some of these maps, we stress that we have not in any way proven that these patterns are directly linked to the AIT arguments; establishing such a link requires much more work. Nevertheless, we argue that studying probability-complexity relationships, and looking for patterns such as simplicity bias, may be a fruitful perspective for dynamical systems research.
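The 'tuning' of bias by the µ interval can be quantified with the Shannon entropy of the empirical output distribution. A minimal sketch, assuming the logistic map x_{k+1} = µx_k(1 − x_k) with the protocol described for the logistic map (threshold 0.5, first 50 iterates discarded, n = 25). The sample size is small for speed, and the plug-in entropy estimate is biased low and capped at log2(samples), so only comparisons between intervals are meaningful; the function names are hypothetical.

```python
import math
import random
from collections import Counter


def logistic_pattern(x0, mu, n=25, burn_in=50, threshold=0.5):
    """Digitised logistic-map trajectory after discarding burn_in iterates."""
    x = x0
    for _ in range(burn_in):
        x = mu * x * (1.0 - x)
    bits = []
    for _ in range(n):
        x = mu * x * (1.0 - x)
        bits.append("1" if x >= threshold else "0")
    return "".join(bits)


def output_entropy(sample_mu, samples=5_000, seed=1):
    """Plug-in Shannon entropy (bits) of the digitised-output distribution."""
    rng = random.Random(seed)
    counts = Counter(
        logistic_pattern(rng.random(), sample_mu(rng)) for _ in range(samples)
    )
    return -sum((c / samples) * math.log2(c / samples) for c in counts.values())


h_full = output_entropy(lambda r: 4.0 * (1.0 - r.random()))   # mu in (0, 4]
h_upper = output_entropy(lambda r: 3.57 + 0.43 * r.random())  # mu in [3.57, 4)
h_chaos = output_entropy(lambda r: 4.0)                        # mu = 4 only
print(f"H(0,4] = {h_full:.1f}, H[3.57,4) = {h_upper:.1f}, H(mu=4) = {h_chaos:.1f} bits")
```

Low entropy for the full interval reflects the large share of samples that converge to fixed points or short cycles, which digitise to strings such as 000...000, 111...111, or 0101...; at µ = 4 almost every sample produces a distinct string.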
A weakness of our probability predictions is that they only constitute an upper bound on the probabilities; for example, Figure 3(a) shows that many output trajectory patterns x fall far below their respective upper bounds. Following the hypothesis of [2,37], these low-complexity, low-probability outputs are presumably patterns which the logistic map finds 'hard' to make, yet which are not intrinsically very complex. Further, the presence of these low-complexity, low-probability patterns may indicate the non-universal computational power of the map [2]. A potential avenue for future work would be to study which types of patterns fall far below the bound, and possible approaches to improving their probability predictions [37].
The motivations for this work were to explore new examples of simplicity bias in physics, and specifically in dynamical systems, to test the boundaries of relevance of simplicity bias, and to explore how information theory and algorithmic probability can inform dynamical systems research. By extension, this work contributes to the project of investigating the interaction between machine learning and dynamical systems, because machine learning is intimately connected to information theory and algorithmics. The broader context of our work is a research project into the interface of dynamical systems, machine learning, and AIT. Given the strong interest in applying machine learning to the analysis of dynamical systems, an open and important question is the extent to which machine learning methods are applicable to dynamical systems problems, and what kinds of limitations or advantages this relatively novel approach may have. Since information theory and machine learning are inextricably linked, and have even been referred to as "two sides of the same coin" [51], one perspective on these questions is to consider whether computation and information processing (fundamental components of machine learning) might themselves have limits of applicability, or in some other way constrain or inform dynamical behaviours. Some fascinating examples of such limits are known in relation to the uncomputability of trajectories, meaning that some properties of dynamical trajectories may not be possible to calculate, even in principle. In a seminal paper, Moore [52] proved that the motion of a single particle in a potential can (in some specific settings) be sufficiently complex to simulate a Turing machine, and thereby yield trajectories with uncomputable quantitative properties. More recently, Watson et al. [53] proved for an example many-body system that even if the exact initial values of all system parameters were known, the renormalisation group trajectory and resultant fixed point are impossible to predict. This kind of unpredictability is stronger than the unpredictability of (merely) chaotic dynamics, in which the limiting factor is the accuracy with which initial conditions can be measured. See also Wolfram [54,55], Svozil [56], Lloyd [57], and Aguirre et al. [58] for more discussion of (un)computability and (un)predictability in physical systems. Naturally, if the dynamics of some system cannot be computed even in principle, the accuracy of machine learning approaches to prediction in such settings will be restricted (but see [59]).
There may also be deep connections between deep neural networks (DNNs) and simplicity bias. Indeed, upon random sampling of parameters, DNNs exhibit an exponential bias towards functions with low complexity [60,61,62]. This property implies that they can learn Kolmogorov-simple data fairly well, but will not generalise well on complex data. Interestingly, by changing the initialisation over parameters towards a more artefactual chaotic regime of DNNs, this simplicity bias becomes weaker [63]. It has recently been shown that in this regime, DNNs no longer generalise well on either simple or complex data, and tend to overfit [42]. This is not unexpected, because DNNs are highly expressive, and classical bias-variance arguments suggest that they should be highly prone to overfitting. The big question is why standard DNNs do not fall prey to this problem. It has been argued that the Occam's-razor-like simplicity bias towards simple functions observed in standard DNNs compensates for the exponential growth of the number of possible functions with complexity, and that this compensation explains why such DNNs can be highly expressive without overfitting [42]. These principles imply that if a dynamical system exhibits some form of simplicity bias, then the inbuilt Occam's razor inductive bias of DNNs should make it much easier for DNNs to learn than a dynamical system without simplicity bias.
The word "complexity" can take on many meanings [64,65], and can be vague. In this work we are precise about what we mean by complexity: Kolmogorov complexity, estimated in practice using lossless compression methods, which provide a standard and theoretically motivated approximation to this uncomputable quantity. Bialek et al. [66,67] discuss complexity in relation to time series, and argue that Kolmogorov complexity lacks intuitive appeal as a measure of the complexity of such sequential patterns. Many would agree that a series with rich, complicated structure and long-range correlations is truly complex, whereas a random string of bits is merely an irregular and, in a sense, trivial pattern. In contrast, Kolmogorov complexity assigns the highest complexity to such random strings, precisely because they contain no correlations or structure that would allow them to be compressed. Having noted this, the discussion does not directly bear upon our work, because we are not studying 'complexity' in a general sense or arguing for one metric over another. Rather, we are specifically studying simplicity bias, and AIT-inspired arguments as a mathematical framework for understanding probability and dynamical systems. AIT allows one to make quantitative bounds and predictions about various systems, as we illustrate here, regardless of whether or not the term "complexity" is being used in its truest or most correct sense.
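General-purpose compressors carry header and bookkeeping overhead that is large compared with a 25-bit string, so a Lempel-Ziv phrase count is a convenient compression-based estimate at this scale. The sketch below assumes the form C_LZ(x) = log2(n) for x = 0^n or 1^n, and C_LZ(x) = log2(n)[N_w(x_1…x_n) + N_w(x_n…x_1)]/2 otherwise, where N_w is the number of words in the Lempel-Ziv (1976) parsing. This is assumed to match the style of estimator used in refs. [1,2]; any further rescaling used for the figures in this paper is not shown here and is not reproduced.

```python
import math


def lz76_words(s: str) -> int:
    """Number of words N_w in the Lempel-Ziv (1976) parsing of s."""
    n, words, p = len(s), 0, 0
    while p < n:
        length = 1
        # Extend the current word while it still occurs earlier in the text;
        # the earlier occurrence may overlap the word itself, as LZ76 allows.
        while p + length <= n and s[p:p + length] in s[:p + length - 1]:
            length += 1
        words += 1  # a new word ends here (or the string ran out)
        p += length
    return words


def c_lz(s: str) -> float:
    """Complexity estimate for a non-empty binary string s (assumed form)."""
    n = len(s)
    if s in ("0" * n, "1" * n):
        return math.log2(n)
    return math.log2(n) * (lz76_words(s) + lz76_words(s[::-1])) / 2


for x in ["0" * 25, "01" * 12 + "0", "0101011011111011010110111"]:
    print(x, round(c_lz(x), 1))
```

The third string is the digitised trajectory shown in Figure 1; it scores higher than the constant and periodic strings because it is more irregular, which is exactly the property that Kolmogorov-style measures assign high complexity to, whether or not the pattern is 'interesting'.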
Mathematicians and physicists are fascinated by simple systems that can exhibit complex patterns, even when those patterns are not technically random. In this context, Wolfram [55] investigated simple automata and showed that some have high levels of computing power, leading to his conjecture that many natural systems can be Turing complete, i.e., that they can implement arbitrary algorithms, and hence produce complex patterns. Despite this focus on complexity, we observe in our work here that complexity is, in a sense, not actually that common: even though the logistic map is famous precisely because of its ability to produce complex behaviour, within the space of possible parameters (µ, x_0 ∈ ℝ), and even when restricting to µ ∈ (0.0, 4.0] and x_0 ∈ (0.0, 1.0), chaos and 'complexity' occur only rarely (assuming uniform sampling of the parameter space). This observation accords with the fact that, while Wolfram highlighted some rule sets that produce randomness, most of his automaton rule sets do not produce complex pseudo-random patterns [55]. Coe et al. [68] analytically studied the question of when automata produce 'random' and complex dynamics, and also found that relatively few do.
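One rough way to quantify how rarely chaos occurs under uniform parameter sampling is to estimate the Lyapunov exponent λ(µ), the long-run average of ln|µ(1 − 2x_k)| along a trajectory, for randomly drawn µ ∈ (0, 4], and to count how often λ > 0. This is a swapped-in diagnostic rather than a method used in this paper; the burn-in, averaging length, and the tolerance that absorbs finite-time noise near λ = 0 are arbitrary choices, and the function names are hypothetical.

```python
import math
import random


def lyapunov_logistic(mu, x0, burn_in=200, steps=1000):
    """Finite-time Lyapunov exponent of x_{k+1} = mu x_k (1 - x_k)."""
    x = x0
    for _ in range(burn_in):  # let transients die out first
        x = mu * x * (1.0 - x)
    total = 0.0
    for _ in range(steps):
        # ln|f'(x_k)| with f'(x) = mu (1 - 2x); the floor avoids log(0) at x = 1/2
        total += math.log(max(abs(mu * (1.0 - 2.0 * x)), 1e-300))
        x = mu * x * (1.0 - x)
    return total / steps


def chaotic_fraction(samples=1000, tol=0.01, seed=2):
    """Fraction of uniformly drawn (mu, x0), mu in (0, 4], with exponent > tol."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        mu = 4.0 * (1.0 - rng.random())  # uniform on (0, 4]
        x0 = rng.random()
        if lyapunov_logistic(mu, x0) > tol:
            hits += 1
    return hits / samples


frac = chaotic_fraction()
print(f"fraction with positive exponent: {frac:.3f}")
```

Since positive exponents only occur above µ ≈ 3.5699 (and not even throughout that range, because of periodic windows), the fraction cannot exceed about (4 − 3.5699)/4 ≈ 0.11 under uniform sampling of µ ∈ (0, 4].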
The study of random dynamical systems, in which dynamical systems are randomly perturbed in some way, is quite well established [69], including specifically dynamical systems with additive noise [70]. While deterministic dynamical systems have been studied due to their relevance to modelling problems in, e.g., ecology, physics, and economics, randomly perturbed dynamics are also common in science, and hence important to study, and arise naturally in modelling physical systems (e.g., [71]). This motivates studying simplicity bias in such random dynamical systems in future work. As a separate motivation, in the deterministic logistic map, many sampled µ values yield trajectories which quickly converge to fixed points or other fairly trivial patterns. However, introducing some small random noise may prevent these trajectories from converging to trivial patterns, so the perturbed system may show simplicity bias even where its deterministic counterpart does not. The relation of simplicity bias to random dynamical systems has recently been studied in initial work [72], but many questions remain open.
In this work we have used 1D maps, including the logistic map, as toy systems to explore simplicity bias in dynamical systems. The connection between AIT and dynamical systems has received some earlier attention from analytical [73,74,75] and computational [76,72] perspectives. In future work it may be fruitful to investigate which, if any, properties of the logistic (or other chaotic) map can be predicted or bounded using the simplicity bias bound. That is, if we assume that the simplicity bias bound holds, what other dynamical properties of the map might follow? Even if these properties are already known, it would still be insightful to see whether they could be derived (perhaps more simply) from AIT arguments. Further, the presence of simplicity bias in some of the 1D maps suggests that the simplicity bias bound might be used as a trajectory prediction method. This is related to the a priori prediction of natural time series patterns attempted by Dingle et al. [7]. Another angle would be to study non-digitised trajectories of dynamical systems, which would require different complexity measures amenable to continuous curves, such as that proposed in ref. [77]. More generally, exploring the use of arguments inspired by AIT in dynamical systems and chaos research is an attractive and potentially fruitful research avenue.

Figure 1. An example of a real-valued (orange) and digitised (blue) trajectory of the logistic map, with µ = 3.8 and x_0 = 0.1. The digitisation is defined by writing 1 if x_k ≥ 0.5 and 0 otherwise, resulting in the pattern x = 0101011011111011010110111, which has length n = 25 bits.

Figure 2. Bifurcation diagram for the logistic map.In (a), the diagram for parameters µ ∈(0, 4.0]; and in (b), for values µ ∈(2.9, 4.0].The value 0.5 has been highlighted in red, to indicate the cut off threshold used to digitise trajectories by a value of 0 if the output is below the threshold, and a value of 1 if it is greater than or equal to the threshold.

Figure 3. Simplicity bias in the digitised logistic map from random samples with x 0 ∈ (0, 1) and µ sampled in different intervals.Each blue datapoint corresponds to a different binary digitised trajectory x of length 25 bits.The black line is the upper bound prediction of Eq. (3).(a) Clear simplicity bias for µ ∈(0.0, 4.0] with P (x) closely following the upper bound, except for low frequency and high complexity outputs which suffer from increased sampling noise; (b) simplicity bias is still present for µ ∈[3.0, 4.0]; (c) the distribution of P (x) becomes more flat (less biased) and simplicity bias is much less clear when µ ∈[3.57, 4.0] due to constraining the sampling to µ-regions more likely to show chaos; (d) the distribution of P (x) is roughly uniform when using µ = 4.0, with almost no bias, and hence no possibility of simplicity bias.

Figure 4. The distribution P(K(x) = r) of output complexity values, with x_0 ∈ (0.0, 1.0) and µ sampled from different intervals. (a) A roughly uniform complexity distribution for µ ∈ (0.0, 4.0], with some bias towards lower complexities (mean is 3.4 bits); (b) a close to uniform distribution of complexities for µ ∈ [3.0, 4.0] (mean is 10.3 bits); (c) the distribution leans towards higher complexities when µ ∈ [3.57, 4.0] (mean is 14.1 bits); (d) the distribution is biased towards higher complexity values when µ = 4.0 (mean is 16.4 bits); (e) for comparison, purely random binary strings of length 25 bits were generated (mean is 16.2 bits). The distributions of complexity values in (d) and (e) are very similar, but (a)-(c) show distinct differences. Calculating and comparing P(K) is an efficient way of checking how simplicity-biased a map is.

Figure 5. Simplicity bias in (a) the logistic map with µ sampled in [0.0, 3.5699], which is the non-chaotic period doubling regime (upper bound fitted slope is -0.17);(b) the Gauss map (upper bound fitted slope is -0.13); and (c) the sine map (upper bound fitted slope is -0.17).

Figure 8. Simplicity bias in the logistic, Gauss map, and sine map, same as Figure5, but with semi-transparent data points.