Moving Frames of Reference, Relativity and Invariance in Transfer Entropy and Information Dynamics



Introduction
Einstein's theory of relativity postulates that the laws of physics are the same for observers in all moving frames of reference (no frame is preferred) and that the speed of light is the same in all frames [1]. These postulates can be used to quantitatively describe the differences in measurements of the same events made by observers in different frames of reference.
Information-theoretic measures are always computed with reference to some observer. They are highly dependent on how the observer measures the data, the subtleties of how an observer asks a question of the data, how the observer attempts to interpret information from the data, and what the observer already knows [2,3]. Here we take inspiration from the theory of relativity to explore the effect of a moving observer on information-theoretic measures. To make such an investigation, however, we need not only an observer for the information measures but specifically: (1) a space-time interpretation for the relevant variables in the system; and (2) some frame of reference for the observer, which can be moving in space-time in the system while the measures are computed.
A candidate for such investigations is a recently introduced framework for information dynamics [4][5][6][7][8], which measures information storage, transfer and modification at each local point in a spatiotemporal system. This framework has had success in various domains, particularly in application to cellular automata (CAs), a simple but theoretically important class of discrete dynamical system that is set on a regular space-time lattice. In application to CAs, the framework has provided quantitative evidence for long-held conjectures that the moving coherent structures known as particles are the dominant information transfer entities and that collisions between them are information modification events. In considering the dynamics of information, the framework examines the state updates of each variable in the system with respect to the past state of that variable. For example, in examining the information transfer into a destination variable using the transfer entropy [9], we consider how much information was contributed from some source, in the context of the past state of that destination. This past state can be seen as akin to a stationary frame of reference for the measurement. As such, we have the possibility to use this framework to explore "relativistic" effects on information; i.e., as applied to a spatiotemporal system such as a CA, with a spatiotemporally moving frame of reference. We begin our paper by introducing CAs in Section 2, basic information-theoretic quantities in Section 3, and the measures for information dynamics in Section 4.
Our primary concern in this paper then lies in exploring a new interpretation of this framework for information dynamics by defining and incorporating a moving frame of reference for the observer (Section 5). The type of relativity presented for application to these lattice systems is akin to an ether relativity, where there is a preferred stationary frame in which information transfer is limited by the speed of light. (We note the existence of a discretized special relativity for certain CAs by Smith [10]. For special relativity to be applicable, the CA laws must obey the same rules in all frames of reference. Smith notes the difficulty of finding any non-trivial CA rules that meet this requirement, and indeed uses only a simple diffusion process as an example. While in principle we could apply our measures within moving frames of reference in that particular discretization, and intend to do so in future work, we examine only an ether-type of relativity in this study, as this is more naturally applicable to lattice systems.) We also mathematically investigate the shift of frame to demonstrate the invariance of certain information properties. That is, while the total information required to predict a given variable's value remains the same, shifting the frame of reference redistributes that information amongst the measurements of information storage and transfer by the observer. The nature of that redistribution will depend on whether the shift of frame retains, adds or removes relevant information regarding the source-destination interactions.
We perform experiments on elementary cellular automata (ECAs) using the new perspective on information dynamics with shifted frames of reference in Section 6, comparing the results to those found in the stationary frame. We find that, as expected, the use of a moving frame of reference has a dramatic effect on the measurements of information storage and transfer, though the results are readily interpretable in the context of the shifted frame. In particular, particles only appear as information transfer in frames in which they are moving; otherwise they appear as information storage.

Dynamics of Computation in Cellular Automata
Cellular automata (CAs) have been a particular focus for experimentation with the framework for the information dynamics measures that we use here. This is because CAs have been used to model a wide variety of real-world phenomena (see [11]), and have attracted much discussion regarding the nature of computation in their dynamics.
CAs are discrete dynamical systems consisting of an array of cells that each synchronously update their state as a function of the states of a fixed number of spatially neighboring cells using a uniform rule. We focus on Elementary CAs, or ECAs, a simple variety of 1D CAs using binary states, deterministic rules and one neighbor on either side (i.e., cell range $r = 1$). An example evolution of an ECA may be seen in Figure 1(a). For more complete definitions, including that of the Wolfram rule number convention for describing update rules (used here), see [12].
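The update scheme just described can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper: the function names `eca_step` and `run_eca`, the periodic boundary conditions and the random initial row are all assumptions made for the example. The rule number is decoded with the usual Wolfram convention, in which the (left, centre, right) neighbourhood read as a 3-bit number selects one bit of the rule number.

```python
import random


def eca_step(cells, rule):
    """Apply one synchronous ECA update (r = 1) with periodic boundaries (assumed)."""
    width = len(cells)
    new_cells = []
    for i in range(width):
        left = cells[(i - 1) % width]
        centre = cells[i]
        right = cells[(i + 1) % width]
        # Wolfram convention: the neighbourhood, read as a 3-bit number,
        # indexes the bit of the rule number that gives the next state.
        index = (left << 2) | (centre << 1) | right
        new_cells.append((rule >> index) & 1)
    return new_cells


def run_eca(rule, width, steps, seed=0):
    """Return a list of rows (time steps), starting from a random initial row."""
    rng = random.Random(seed)
    rows = [[rng.randint(0, 1) for _ in range(width)]]
    for _ in range(steps):
        rows.append(eca_step(rows[-1], rule))
    return rows


if __name__ == "__main__":
    # Print a small space-time diagram of rule 54 (time runs downwards).
    for row in run_eca(rule=54, width=60, steps=20):
        print("".join("#" if c else "." for c in row))
```

Running the script prints a raw space-time diagram comparable in kind to the raw states of Figure 1(a), although the exact pattern depends on the (assumed) random initial condition.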
Studies of information dynamics in CAs have focused on their emergent structure: particles, gliders, blinkers and domains. A domain is a set of background configurations in a CA, any of which will update to another configuration in the set in the absence of any disturbance. Domains are formally defined by computational mechanics as spatial process languages in the CA [13]. Particles are considered to be dynamic elements of coherent spatiotemporal structure, as disturbances or in contrast to the background domain. Gliders are regular particles, and blinkers are stationary gliders. Formally, particles are defined by computational mechanics as a boundary between two domains [13]; as such, they can be referred to as domain walls, though this term is usually reserved for irregular particles. Several techniques exist to filter particles from background domains (e.g., [5][6][7][13][14][15][16][17][18][19][20]). As a visual example, see Figure 1(a) and Figure 1(b): the horizontally moving gliders in Figure 1(a) are filtered using negative values of the measure in Figure 1(b) (which will be introduced in Section 4.1), while the domains (in the background) and the blinkers (the stationary large triangular structures) in Figure 1(a) are filtered using positive values of the measure in Figure 1(b).
These emergent structures have been quite important to studies of computation in CAs, for example in the design or identification of universal computation in CAs (see [11]), and in the analyses of the dynamics of intrinsic or other specific computation ([13,21,22]). This is because these studies typically discuss the computation in terms of the three primitive functions of computation and their apparent analogues in CA dynamics [11,21]:
• blinkers as the basis of information storage, since they periodically repeat at a fixed location;
• particles as the basis of information transfer, since they communicate information about the dynamics of one spatial part of the CA to another part; and
• collisions between these structures as information modification, since collision events combine and modify the local dynamical structures.

Information-theoretic Quantities
To quantify these dynamic functions of computation, we look to information theory (e.g., see [2,3]), which has proven to be a useful framework for the design and analysis of complex self-organized systems, e.g., [23][24][25][26][27]. In this section, we give a brief overview of the fundamental quantities which will be built on in the following sections.
The Shannon entropy represents the uncertainty associated with any measurement $x$ of a random variable $X$ (logarithms are in base 2, giving units in bits):

$$H(X) = -\sum_{x} p(x) \log p(x).$$

The joint entropy of two random variables $X$ and $Y$ is a generalization to quantify the uncertainty of their joint distribution:

$$H(X, Y) = -\sum_{x,y} p(x, y) \log p(x, y).$$

The conditional entropy of $X$ given $Y$ is the average uncertainty that remains about $x$ when $y$ is known:

$$H(X \mid Y) = -\sum_{x,y} p(x, y) \log p(x \mid y).$$

The mutual information between $X$ and $Y$ measures the average reduction in uncertainty about $x$ that results from learning the value of $y$, or vice versa:

$$I(X; Y) = H(X) - H(X \mid Y).$$

The conditional mutual information between $X$ and $Y$ given $Z$ is the mutual information between $X$ and $Y$ when $Z$ is known:

$$I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z).$$

Moving to dynamic measures of information in time-series processes $X$, the entropy rate is the limiting value of the average entropy of the next realizations $x_{n+1}$ of $X$ conditioned on the realizations $x^{(k)}_n = \{x_{n-k+1}, \ldots, x_{n-1}, x_n\}$ of the previous $k$ values $X^{(k)}$ of $X$ (up to and including time step $n$). Writing $X'$ for the next value of the process:

$$H_{\mu X} = \lim_{k \to \infty} H\left(X' \mid X^{(k)}\right) = \lim_{k \to \infty} H_{\mu X}(k).$$

Finally, the effective measure complexity [28] or excess entropy [23] quantifies the total amount of structure or memory in a system, and is computed in terms of the slowness of the approach of the entropy rate estimates to their limiting value (see [23]). For our purposes, it is best formulated as the mutual information between the semi-infinite past and semi-infinite future of the process:

$$E_X = \lim_{k \to \infty} I\left(X^{(k)}; X^{(k^+)}\right),$$

where $X^{(k^+)}$ refers to the next $k$ states with realizations $x^{(k^+)} = \{x_{n+1}, x_{n+2}, \ldots, x_{n+k}\}$. This interpretation is known as the predictive information [29], as it highlights that the excess entropy captures the information in a system's past that can also be found in its future.
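As a concrete illustration of these definitions, the following sketch estimates the quantities from paired samples using plug-in (relative frequency) probabilities. The function names are hypothetical, and the plug-in estimator is an assumption of this example rather than something prescribed by the text; the identities used are exactly those stated above, e.g., $H(X \mid Y) = H(X, Y) - H(Y)$ and $I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z)$.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable samples."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def conditional_entropy(xs, ys):
    """H(X|Y) = H(X,Y) - H(Y), estimated from paired samples."""
    return entropy(list(zip(xs, ys))) - entropy(list(ys))


def mutual_information(xs, ys):
    """I(X;Y) = H(X) - H(X|Y)."""
    return entropy(list(xs)) - conditional_entropy(xs, ys)


def conditional_mutual_information(xs, ys, zs):
    """I(X;Y|Z) = H(X|Z) - H(X|Y,Z)."""
    return conditional_entropy(xs, zs) - conditional_entropy(xs, list(zip(ys, zs)))


if __name__ == "__main__":
    # XOR toy data: z = x XOR y over all four equally likely input pairs.
    xs = [0, 0, 1, 1]
    ys = [0, 1, 0, 1]
    zs = [x ^ y for x, y in zip(xs, ys)]
    print(mutual_information(xs, zs))                   # x alone says nothing about z
    print(conditional_mutual_information(xs, zs, ys))   # x is fully informative given y
```

The XOR data shows in miniature why a conditional mutual information can exceed the unconditioned one: the information is only available synergistically.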

Framework for Information Dynamics
A local framework for information dynamics has recently been introduced in [4][5][6][7][8]. This framework examines the information composition of the next value $x_{n+1}$ of a destination variable, in terms of how much of that information came from the past state of that variable (information storage), how much came from respective source variables (information transfer), and how those information sources were combined (information modification). The measures of the framework provide information profiles quantifying each element of computation at each spatiotemporal point in a complex system.
In this section, we describe the information storage and transfer components of the framework (the information modification component is not studied here; it may be seen in [6]). We also review example profiles of these information dynamics in ECA rule 54 (see raw states in Figure 1(a)). ECA rule 54 is considered a class IV complex rule, contains simple glider structures and collisions, and is therefore quite useful in illustrating the concepts around information dynamics.

Information Storage
We define information storage as the amount of information from the past of a process that is relevant to or will be used at some point in its future. The statistical complexity [30] measures the amount of information in the past of a process that is relevant to the prediction of its future states. It is known that the statistical complexity $C_{\mu X}$ provides an upper bound to the excess entropy [31]; i.e., $E_X \leq C_{\mu X}$. This can be interpreted in that the statistical complexity measures all information stored by the system that may be used in the future, whereas the excess entropy only measures the information that is used by the system at some point in the future. Of course, this means that the excess entropy measures information storage that will possibly but not necessarily be used at the next time step $n + 1$. When focusing on the dynamics of information processing, we are particularly interested in how much of the stored information is actually in use at the next time step, so as to be examined in conjunction with information transfer.
As such, the active information storage $A_X$ was introduced [7] to explicitly measure how much of the information from the past of the process is observed to be in use in computing its next state. The active information storage is the average mutual information between realizations $x^{(k)}_n$ of the past state $X^{(k)}$ (as $k \to \infty$) and the corresponding realizations $x_{n+1}$ of the next value $X'$ of a given time series $X$:

$$A_X = \lim_{k \to \infty} A_X(k),$$
$$A_X(k) = I\left(X^{(k)}; X'\right).$$

We note that the limit $k \to \infty$ is required in general, unless the next value $x_{n+1}$ is conditionally independent of the far past values $x^{(\infty)}_{n-k}$ given $x^{(k)}_n$. We can then extract the local active information storage $a_X(n + 1)$ [7] as the amount of information storage attributed to the specific configuration or realization $(x^{(k)}_n, x_{n+1})$ at time step $n + 1$; i.e., the amount of information storage in use by the process at the particular time step $n + 1$:

$$a_X(n + 1) = \lim_{k \to \infty} a_X(n + 1, k),$$
$$a_X(n + 1, k) = \log \frac{p(x^{(k)}_n, x_{n+1})}{p(x^{(k)}_n)\, p(x_{n+1})}.$$

(Descriptions of the manner in which local information-theoretical measures are obtained from averaged measures may be found in [5,31].) By convention, we use lower case labels for the local values of information-theoretic quantities. Note that $A_X(k)$ and $a(i, n + 1, k)$ represent finite-$k$ estimates.
Where the process of interest exists for cells on a lattice structure, we include the index $i$ to identify the variable of interest. This gives the following notation for local active information storage $a(i, n + 1)$ in a spatiotemporal system:

$$a(i, n + 1) = \lim_{k \to \infty} a(i, n + 1, k),$$
$$a(i, n + 1, k) = \log \frac{p(x^{(k)}_{i,n}, x_{i,n+1})}{p(x^{(k)}_{i,n})\, p(x_{i,n+1})}.$$

We note that the local active information storage is defined for every spatiotemporal point $(i, n)$ in the lattice system. We have $A(i, k) = \langle a(i, n, k) \rangle_n$ as the average for variable $i$. For stationary systems of homogeneous variables where the probability distribution functions are estimated over all variables, it is appropriate to average over all variables also, giving:

$$A(k) = \langle a(i, n, k) \rangle_{i,n}.$$

Figure 2(a) shows the local active information as this mutual information between the destination cell and its past history. Importantly, $a(i, n, k)$ may be positive or negative, meaning the past history of the cell can either positively inform us or actually misinform us about its next state. An observer is misinformed where, conditioned on the past history, the observed outcome was relatively unlikely as compared with the unconditioned probability of that outcome (i.e., $p(x_{n+1} \mid x^{(k)}_n) < p(x_{n+1})$). In deterministic systems (e.g., CAs), negative local active information storage means that there must be strong information transfer from other causal sources.

As reported in [7], and shown in the sample application to rule 54 in Figure 1(b), when applied to CAs the local active information storage identifies strong positive values in the domain and in blinkers (vertical gliders). For each of these entities, the next state is effectively predictable from the destination's past. This was the first direct quantitative evidence that blinkers and domains were the dominant information storage entities in CAs. Interestingly for rule 54, the amount of predictability from the past (i.e., the active information storage) is roughly the same for both the blinkers and the background domain (see further discussion in [7]). Furthermore, negative values are typically measured at (the leading edge of) traveling gliders, because the past of the destination (being in the regular domain) would predict domain continuation, which is misinformative when the glider is encountered.
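A minimal sketch of how finite-$k$ local active information storage values $a(i, n+1, k)$ might be estimated for an ECA is given below. It assumes plug-in probability estimates pooled over all cells and time steps (appropriate only for stationary, homogeneous systems, as noted above), periodic boundaries and a random initial row; the names `run_eca` and `local_active_storage` are hypothetical and this is not the authors' implementation.

```python
from collections import Counter
from math import log2
import random


def run_eca(rule, width, steps, seed=0):
    """Simulate an ECA with periodic boundaries from a random initial row (assumed setup)."""
    rng = random.Random(seed)
    rows = [[rng.randint(0, 1) for _ in range(width)]]
    for _ in range(steps):
        prev = rows[-1]
        rows.append([
            (rule >> ((prev[i - 1] << 2) | (prev[i] << 1) | prev[(i + 1) % width])) & 1
            for i in range(width)
        ])
    return rows


def local_active_storage(rows, k):
    """Return {(i, n + 1): a(i, n + 1, k)} using pooled plug-in probabilities."""
    width = len(rows[0])
    observations = []
    for n in range(k - 1, len(rows) - 1):
        for i in range(width):
            # Past state x^(k)_{i,n} = (x_{i,n-k+1}, ..., x_{i,n}), oldest first.
            past = tuple(rows[n - q][i] for q in range(k - 1, -1, -1))
            observations.append(((i, n + 1), past, rows[n + 1][i]))
    total = len(observations)
    joint = Counter((past, nxt) for _, past, nxt in observations)
    pasts = Counter(past for _, past, _ in observations)
    nexts = Counter(nxt for _, _, nxt in observations)
    # a = log2 [ p(past, next) / (p(past) p(next)) ], written with raw counts.
    return {
        pos: log2(joint[(past, nxt)] * total / (pasts[past] * nexts[nxt]))
        for pos, past, nxt in observations
    }


if __name__ == "__main__":
    rows = run_eca(rule=54, width=200, steps=200, seed=1)
    a = local_active_storage(rows, k=6)
    print("A(k=6) =", sum(a.values()) / len(a))
    print("min local value =", min(a.values()))
```

Individual local values may be negative, but their average is a plug-in mutual information and so cannot be. The values of `width`, `steps` and `k` here are arbitrary choices for the sketch; much larger samples would be needed for reliable estimates at large $k$.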

Information Transfer
Information transfer is defined as the amount of information that a source provides about a destination's next state that was not contained in the destination's past. This definition pertains to Schreiber's transfer entropy measure [9] (which we will call the apparent transfer entropy, as discussed later). The transfer entropy captures the average mutual information from realizations $y^{(l)}_n$ of the state $Y^{(l)}$ of a source $Y$ to the corresponding realizations $x_{n+1}$ of the next value $X'$ of the destination $X$, conditioned on realizations $x^{(k)}_n$ of the previous state $X^{(k)}$:

$$T_{Y \to X}(k, l) = I\left(Y^{(l)}; X' \mid X^{(k)}\right).$$

Schreiber emphasized that, unlike the (unconditioned) time-differenced mutual information, the transfer entropy was a properly directed, dynamic measure of information transfer rather than shared information.
In general, one should take the limit as $k \to \infty$ in order to properly represent the previous state $X^{(k)}$ as relevant to the relationship between the next value $X'$ and the source $Y$ [5]. Note that $k$ can be limited here where the next value $x_{n+1}$ is conditionally independent of the far past values $x^{(\infty)}_{n-k}$ given $(x^{(k)}_n, y_n)$. One can then interpret the transfer entropy as properly representing information transfer [5,32]. Empirically, of course, one is restricted to finite-$k$ estimates $T_{Y \to X}(k, l)$. Furthermore, where only the previous value $y_n$ of $Y$ is a direct causal contributor to $x_{n+1}$, it is appropriate to use $l = 1$ [5,32]. So for our purposes, we write:

$$T_{Y \to X} = \lim_{k \to \infty} T_{Y \to X}(k),$$
$$T_{Y \to X}(k) = I\left(Y; X' \mid X^{(k)}\right).$$

We can then extract the local transfer entropy $t_{Y \to X}(n + 1)$ [5] as the amount of information transfer attributed to the specific configuration or realization $(x_{n+1}, x^{(k)}_n, y_n)$ at time step $n + 1$; i.e., the amount of information transferred from $Y$ to $X$ at time step $n + 1$:

$$t_{Y \to X}(n + 1) = \lim_{k \to \infty} t_{Y \to X}(n + 1, k),$$
$$t_{Y \to X}(n + 1, k) = \log \frac{p(x_{n+1} \mid x^{(k)}_n, y_n)}{p(x_{n+1} \mid x^{(k)}_n)}.$$

Again, where the processes $Y$ and $X$ exist on cells on a lattice system, we denote by $i$ the index of the destination variable $X_i$ and by $i - j$ the index of the source variable $X_{i-j}$, such that we consider the local transfer entropy across $j$ cells in:

$$t(i, j, n + 1) = \lim_{k \to \infty} t(i, j, n + 1, k),$$
$$t(i, j, n + 1, k) = \log \frac{p(x_{i,n+1} \mid x^{(k)}_{i,n}, x_{i-j,n})}{p(x_{i,n+1} \mid x^{(k)}_{i,n})}.$$

The local transfer entropy is defined for every channel $j$ for the given destination $i$, but for proper interpretation as information transfer $j$ is constrained among causal information contributors to the destination [32] (i.e., within the past light cone [33]). For CAs, for example, we have $|j| \leq r$, being $|j| \leq 1$ for ECAs as shown in Figure 2(a).
We have $T(i, j, k) = \langle t(i, j, n, k) \rangle_n$ as the average transfer from variable $i - j$ to variable $i$. For systems of homogeneous variables where the probability distribution functions for transfer across $j$ cells are estimated over all variables, it is appropriate to average over all variables also, giving:

$$T(j, k) = \langle t(i, j, n, k) \rangle_{i,n}.$$

Importantly, the information conditioned on by the transfer entropy (i.e., that contained in the destination's past about its next state) is that provided by the local active information storage. (Note however that a conditional mutual information may be either larger or smaller than the corresponding unconditioned mutual information [3]; the conditioning removes information redundantly held by the source and the conditioned variable, but also includes synergistic information that can only be decoded with knowledge of both the source and conditioned variable [34].)

The local transfer entropy may also be positive or negative. As reported in [5], when applied to CAs it is typically strongly positive when measured at a glider in the same direction $j$ as the macroscopic motion of the glider (see the sample application to rule 54 in Figure 1(c)). Negative values imply that the source misinforms an observer about the next state of the destination in the context of the destination's past. Negative values are typically only found at gliders for measurements in the orthogonal direction to macroscopic glider motion (see the right moving gliders in Figure 1(c)); at these points, the source (still part of the domain) would suggest that the domain pattern in the destination's past would continue, which is misinformative. Small positive non-zero values are also often measured in the domain and in the orthogonal direction to glider motion (see Figure 1(c)). These correctly indicate non-trivial information transfer in these regions (e.g., indicating the absence of a glider), though they are dominated by the positive transfer in the direction of glider motion. These results for local transfer entropy provided the first quantitative evidence for the long-held conjecture that particles are the information transfer agents in CAs.
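The local (apparent) transfer entropy across $j$ cells can be sketched in the same way as the storage measure. The code below is an illustrative plug-in estimator under the same assumptions as before (pooled statistics over a homogeneous lattice, periodic boundaries, $l = 1$); the function names are hypothetical. As a toy check, it uses ECA rule 170, in which every cell simply copies its right-hand neighbour, so that all information should appear in the $j = -1$ channel.

```python
from collections import Counter
from math import log2
import random


def run_eca(rule, width, steps, seed=0):
    """Simulate an ECA with periodic boundaries from a random initial row (assumed setup)."""
    rng = random.Random(seed)
    rows = [[rng.randint(0, 1) for _ in range(width)]]
    for _ in range(steps):
        prev = rows[-1]
        rows.append([
            (rule >> ((prev[i - 1] << 2) | (prev[i] << 1) | prev[(i + 1) % width])) & 1
            for i in range(width)
        ])
    return rows


def local_transfer_entropy(rows, k, j):
    """Return {(i, n + 1): t(i, j, n + 1, k)} for source cell i - j, plug-in estimate."""
    width = len(rows[0])
    obs = []
    for n in range(k - 1, len(rows) - 1):
        for i in range(width):
            past = tuple(rows[n - q][i] for q in range(k - 1, -1, -1))
            source = rows[n][(i - j) % width]
            obs.append(((i, n + 1), past, source, rows[n + 1][i]))
    c_p = Counter(p for _, p, _, _ in obs)
    c_px = Counter((p, x) for _, p, _, x in obs)
    c_ps = Counter((p, s) for _, p, s, _ in obs)
    c_psx = Counter((p, s, x) for _, p, s, x in obs)
    # t = log2 [ p(next | past, source) / p(next | past) ]
    return {
        pos: log2((c_psx[(p, s, x)] / c_ps[(p, s)]) / (c_px[(p, x)] / c_p[p]))
        for pos, p, s, x in obs
    }


if __name__ == "__main__":
    rows = run_eca(rule=170, width=100, steps=50, seed=2)
    for j in (-1, 1):
        t = local_transfer_entropy(rows, k=3, j=j)
        print("T(j=%d, k=3) = %.3f bits" % (j, sum(t.values()) / len(t)))
```

For this pure shift rule the $j = 1$ source is already contained in the destination's past (for $k \geq 2$), so it adds nothing, whereas the $j = -1$ source fully determines the next state.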
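We note that the transfer entropy can also be conditioned on other possible causal contributors $Z$ in order to account for their effects on the destination. We introduced the conditional transfer entropy for this purpose [5,6]:

$$T_{Y \to X \mid Z} = \lim_{k \to \infty} T_{Y \to X \mid Z}(k),$$
$$T_{Y \to X \mid Z}(k) = I\left(Y; X' \mid X^{(k)}, Z\right),$$
$$t_{Y \to X \mid Z}(n + 1, k) = \log \frac{p(x_{n+1} \mid x^{(k)}_n, y_n, z_n)}{p(x_{n+1} \mid x^{(k)}_n, z_n)}.$$

This extra conditioning can exclude the (redundant) influence of a common drive $Z$ from being attributed to $Y$, and can also include the synergistic contribution when the source $Y$ acts in conjunction with another source $Z$ (e.g., where $X$ is the outcome of an XOR operation on $Y$ and $Z$).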
We specifically refer to the conditional transfer entropy as the complete transfer entropy (with notation $T^c_{Y \to X}(k)$ and $t^c_{Y \to X}(n + 1, k)$ for example) when it conditions on all other causal sources $Z$ to the destination $X$ [5]. For CAs, this means conditioning on the only other causal contributor to the destination. For example, for the $j = 1$ channel, we can write

$$t^c(i, j = 1, n + 1) = \lim_{k \to \infty} t^c(i, j = 1, n + 1, k),$$
$$t^c(i, j = 1, n + 1, k) = \log \frac{p(x_{i,n+1} \mid x^{(k)}_{i,n}, x_{i+1,n}, x_{i-1,n})}{p(x_{i,n+1} \mid x^{(k)}_{i,n}, x_{i+1,n})},$$

with $T^c(j, k)$ for the spatiotemporal average in homogeneous, stationary systems. To differentiate the conditional and complete transfer entropies from the original measure, we often refer to $T_{Y \to X}$ simply as the apparent transfer entropy [5]; this nomenclature conveys that the result is the information transfer that is apparent without accounting for other sources.
In application to CAs, we note that the results for $t^c(i, j, n + 1, k)$ are largely the same as for $t(i, j, n + 1, k)$ (e.g., compare Figure 1(d) with Figure 1(c) for rule 54), with some subtle differences. These results are discussed in detail in [5]. First, in deterministic systems such as CAs, $t^c(i, j, n + 1, k)$ cannot be negative, since by accounting for all causal sources (and without noise) there is no way that our source can misinform us about the next state of the destination. Also, the strong transfer measured in gliders moving in the macroscopic direction of the measured channel $j$ is slightly stronger with $t^c(i, j, n + 1, k)$. This is because, by accounting for the other causal source, we can be sure that there is no other incoming glider to disturb this one, and thus attribute more influence to the source of the ongoing glider here. Other scenarios regarding synergistic interactions in other rules are discussed in [5].
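The non-negativity of the local complete transfer entropy in a deterministic CA can be checked numerically with a small sketch. As before, this is a plug-in estimate over pooled lattice statistics, with hypothetical function names, periodic boundaries and a random initial row assumed; it handles only the ECA channels $j = \pm 1$, for which the single other causal neighbour is $x_{i+j,n}$.

```python
from collections import Counter
from math import log2
import random


def run_eca(rule, width, steps, seed=0):
    """Simulate an ECA with periodic boundaries from a random initial row (assumed setup)."""
    rng = random.Random(seed)
    rows = [[rng.randint(0, 1) for _ in range(width)]]
    for _ in range(steps):
        prev = rows[-1]
        rows.append([
            (rule >> ((prev[i - 1] << 2) | (prev[i] << 1) | prev[(i + 1) % width])) & 1
            for i in range(width)
        ])
    return rows


def local_complete_transfer_entropy(rows, k, j):
    """Return {(i, n + 1): t^c(i, j, n + 1, k)} for j = +1 or -1 in an ECA."""
    if j not in (1, -1):
        raise ValueError("this sketch only handles the ECA channels j = +1 and j = -1")
    width = len(rows[0])
    obs = []
    for n in range(k - 1, len(rows) - 1):
        for i in range(width):
            past = tuple(rows[n - q][i] for q in range(k - 1, -1, -1))
            source = rows[n][(i - j) % width]   # x_{i-j,n}
            other = rows[n][(i + j) % width]    # the remaining causal neighbour
            obs.append(((i, n + 1), past, other, source, rows[n + 1][i]))
    c_po = Counter((p, o) for _, p, o, _, _ in obs)
    c_pox = Counter((p, o, x) for _, p, o, _, x in obs)
    c_pos = Counter((p, o, s) for _, p, o, s, _ in obs)
    c_posx = Counter((p, o, s, x) for _, p, o, s, x in obs)
    # t^c = log2 [ p(next | past, other, source) / p(next | past, other) ]
    return {
        pos: log2((c_posx[(p, o, s, x)] / c_pos[(p, o, s)]) / (c_pox[(p, o, x)] / c_po[(p, o)]))
        for pos, p, o, s, x in obs
    }


if __name__ == "__main__":
    rows = run_eca(rule=54, width=200, steps=200, seed=5)
    tc = local_complete_transfer_entropy(rows, k=4, j=1)
    print("min local t^c =", min(tc.values()))
    print("T^c(j=1, k=4) =", sum(tc.values()) / len(tc))
```

Because the past state, the source and the other neighbour together contain the full ECA neighbourhood, the numerator probability is exactly 1, which is why no local value can fall below zero.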

Information Dynamics for a Moving Observer
In this section, we consider how these measures of information dynamics would change for a moving observer. First, we consider the meaning of the past state $x^{(k)}_n$ in these measures, and how it can be interpreted as a frame of reference. We then provide a formulation to interpret these measures for an observer with a moving frame of reference. We consider what aspects of the dynamics would remain invariant, and finally consider what differences we may expect to see from measures of information dynamics by moving observers.

Realizations $x^{(k)}_n$ of the past state $X^{(k)}$ of the destination variable $X$ play a very important role in the measures of information dynamics presented above. We see that the active information storage directly considers the amount of information contained in $x^{(k)}_n$ about the next value $x_{n+1}$ of $X$, while the transfer entropy considers how much information the source variable adds to this next value conditioned on $x^{(k)}_n$. The role of the past state $x^{(k)}_n$ can be understood from three complementary perspectives here:
(1) To separate information storage and transfer. As described above, we know that $x^{(k)}_n$ provides information storage for use in computation of the next value $x_{n+1}$. The conditioning on the past state in the transfer entropy ensures that none of that information storage is counted as information transfer (where the source and past hold some information redundantly) [5,6].
(2) To capture the state transition of the destination variable. We note that Schreiber's original description of the transfer entropy [9] can be rephrased as the information provided by the source about the state transition in the destination. That $x^{(k)}_n \rightarrow x_{n+1}$ is a state transition is underlined by the fact that the $x^{(k)}_n$ are embedding vectors [35], which capture the underlying state of the process.
(3) To examine the information composition of the next value $x_{n+1}$ of the destination in the context of the past state $x^{(k)}_n$ of the destination. With regard to the transfer entropy, we often describe the conditional mutual information as "conditioning out" the information contained in $x^{(k)}_n$, but this nomenclature can be slightly misleading. This is because, as pointed out in Section 4.2, a conditional mutual information can be larger or smaller than the corresponding unconditioned form, since the conditioning both removes information redundantly held by the source variable and the conditioned variable (e.g., if the source is a copy of the conditioned variable) and adds information synergistically provided by the source and conditioned variables together (e.g., if the destination is an XOR-operation of these variables). As such, it is perhaps more useful to describe the conditioned variable as providing context to the measure, rather than "conditioning out" information. Here then, we can consider the past state $x^{(k)}_n$ as providing context to our analysis of the information composition of the next value $x_{n+1}$.
Note that we need k → ∞ to properly capture each perspective here (see discussion in Section 4.1 and Section 4.2 regarding conditions where finite-k is satisfactory).
Importantly, we note that the final perspective of $x^{(k)}_n$ as providing context to our analysis of the information composition of the computation of the next state can also be viewed as a "frame of reference" for the analysis.

Information Dynamics with a Moving Frame of Reference
Having established the perspective of $x^{(k)}_n$ as providing a frame of reference for our analysis, we now examine how the measures of our framework are altered if we consider a moving frame of reference for our observer in lattice systems.
It is relatively straightforward to define a frame of reference for an observer moving at $f$ cells per unit time towards the destination cell $x_{i,n+1}$. Our measures consider the set of $k$ cells backwards in time from $x_{i,n+1}$ at $-f$ cells per time step:

$$x^{(k,f)}_{i-f,n} = \{x_{i-(q+1)f,\,n-q} \mid 0 \leq q < k\} = \{x_{i-kf,\,n-k+1}, \ldots, x_{i-2f,\,n-1}, x_{i-f,\,n}\}.$$

Notice that $x^{(k,0)}_{i-0,n} \equiv x^{(k)}_{i,n}$ with $f = 0$, as it should.

We can then define measures for each of the information dynamics in this new frame of reference $f$. As shown with the double headed arrow in Figure 2(b), the local active information in this frame becomes the local mutual information between the observer's frame of reference $x^{(k,f)}_{i-f,n}$ and the next state of the destination cell $x_{i,n+1}$; mathematically this is represented by:

$$a(i, n + 1, f) = \lim_{k \to \infty} a(i, n + 1, k, f),$$
$$a(i, n + 1, k, f) = \log \frac{p(x^{(k,f)}_{i-f,n}, x_{i,n+1})}{p(x^{(k,f)}_{i-f,n})\, p(x_{i,n+1})}.$$

Crucially, $a(i, n + 1, k, f)$ is still a measure of local information storage for the moving observer: it measures how much information is contained in the past of their frame of reference about the next state that appears in their frame. The observer, as well as the shifted measure itself, is oblivious to the fact that these observations are in fact taken over different variables. Finally, we write $A(k, f) = \langle a(i, n + 1, k, f) \rangle_{i,n}$ as the average of finite-$k$ estimates over all space-time points $(i, n)$ in the lattice, for stationary homogeneous systems.

As shown by directed arrows in Figure 2(b), the local transfer entropy becomes the local conditional mutual information between the source cell $x_{i-j,n}$ and the destination $x_{i,n+1}$, conditioned on the moving frame of reference $x^{(k,f)}_{i-f,n}$:

$$t(i, j, n + 1, f) = \lim_{k \to \infty} t(i, j, n + 1, k, f),$$
$$t(i, j, n + 1, k, f) = \log \frac{p(x_{i,n+1} \mid x^{(k,f)}_{i-f,n}, x_{i-j,n})}{p(x_{i,n+1} \mid x^{(k,f)}_{i-f,n})}. \tag{36}$$

The set of sensible values to use for $j$ remains those within the light-cone (i.e., those that represent causal information sources to the destination variable); otherwise we only measure correlations rather than information transfer. That said, we also do not consider the transfer entropy for the channel $j = f$ here, since this source is accounted for by the local active information. Of course, we can now also consider $j = 0$ for moving frames $f \neq 0$.
Writing the local complete transfer entropy $t^c(i, j, n + 1, k, f)$ for the moving frame trivially involves adding conditioning on the remaining causal source (that which is not the source $x_{i-j,n}$ itself, nor the source $x_{i-f,n}$ in the frame) to Equation (36).
Again, $t(i, j, n + 1, f)$ is still interpretable as a measure of local information transfer for the moving observer: it measures how much information was provided by the source cell about the state transition of the observer's frame of reference. The observer is oblivious to the fact that the states in its frame of reference are composed of observations taken over different variables.
Also, note that while $t(i, j, n + 1, f)$ describes the transfer across $j$ cells in a stationary frame as observed in a frame moving at speed $f$, we could equally express it as the transfer observed across $j - f$ cells in the frame $f$. Finally, we write $T(j, k, f) = \langle t(i, j, n + 1, k, f) \rangle_{i,n}$ as the average of finite-$k$ estimates over all space-time points $(i, n)$ in the lattice, for stationary homogeneous systems.
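The construction of the moving frame $x^{(k,f)}_{i-f,n}$ and the corresponding storage measure can be sketched as follows. The function names (`frame_state`, `local_storage_in_frame`) are hypothetical, and the sketch again assumes periodic boundaries and pooled plug-in estimates. As a toy stand-in for a moving structure, it uses ECA rule 170, whose entire pattern shifts one cell to the left per time step: in the co-moving frame $f = -1$ the next state is fully predictable from the frame's past, whereas in the stationary frame $f = 0$ (with a random initial row) the past is almost uninformative.

```python
from collections import Counter
from math import log2
import random


def run_eca(rule, width, steps, seed=0):
    """Simulate an ECA with periodic boundaries from a random initial row (assumed setup)."""
    rng = random.Random(seed)
    rows = [[rng.randint(0, 1) for _ in range(width)]]
    for _ in range(steps):
        prev = rows[-1]
        rows.append([
            (rule >> ((prev[i - 1] << 2) | (prev[i] << 1) | prev[(i + 1) % width])) & 1
            for i in range(width)
        ])
    return rows


def frame_state(rows, i, n, k, f):
    """x^(k,f)_{i-f,n} = {x_{i-(q+1)f, n-q} | 0 <= q < k}, returned oldest first."""
    width = len(rows[0])
    return tuple(rows[n - q][(i - (q + 1) * f) % width] for q in range(k - 1, -1, -1))


def local_storage_in_frame(rows, k, f):
    """Return {(i, n + 1): a(i, n + 1, k, f)} using pooled plug-in probabilities."""
    width = len(rows[0])
    obs = [((i, n + 1), frame_state(rows, i, n, k, f), rows[n + 1][i])
           for n in range(k - 1, len(rows) - 1) for i in range(width)]
    total = len(obs)
    joint = Counter((s, x) for _, s, x in obs)
    states = Counter(s for _, s, _ in obs)
    nexts = Counter(x for _, _, x in obs)
    return {pos: log2(joint[(s, x)] * total / (states[s] * nexts[x])) for pos, s, x in obs}


if __name__ == "__main__":
    rows = run_eca(rule=170, width=100, steps=50, seed=3)
    for f in (0, -1):
        a = local_storage_in_frame(rows, k=3, f=f)
        print("A(k=3, f=%d) = %.3f bits" % (f, sum(a.values()) / len(a)))
```

With $f = 0$, `frame_state` reduces to the ordinary past state of cell $i$, matching the identity $x^{(k,0)}_{i-0,n} \equiv x^{(k)}_{i,n}$.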
In the next two subsections, we describe what aspects of the information dynamics remain invariant, and how we can expect the measures to change, with a moving frame of reference.

Invariance
This formulation suggests the question of why we consider the same set of information sources $j$ in the moving and stationary frames (i.e., those within the light-cone), rather than say a symmetric set of sources around the frame of reference (as per a stationary frame). To examine this, consider the local (single-site) entropy $h(i, n + 1) = -\log p(x_{i,n+1})$ as a sum of incrementally conditioned mutual information terms as presented in [6]. For ECAs (a deterministic system), in the stationary frame of reference, this sum is written as follows (with $i(\cdot\,;\cdot)$ and $i(\cdot\,;\cdot \mid \cdot)$ denoting local mutual information and local conditional mutual information):

$$h(i, n + 1) = i(x^{(k)}_{i,n}; x_{i,n+1}) + i(x_{i-j,n}; x_{i,n+1} \mid x^{(k)}_{i,n}) + i(x_{i+j,n}; x_{i,n+1} \mid x^{(k)}_{i,n}, x_{i-j,n}), \tag{37}$$
$$h(i, n + 1) = a(i, n + 1, k) + t(i, j, n + 1, k) + t^c(i, -j, n + 1, k), \tag{38}$$

with either $j = 1$ or $j = -1$. Since $h(i, n + 1)$ represents the information required to predict the state at site $(i, n + 1)$, Equation (37) shows that one can obtain this by considering the information contained in the past of the destination, then the information contributed through channel $j$ that was not in this past, then that contributed through channel $-j$ which was not in this past or the channel $j$. The first term here is the active information storage, the first local conditional mutual information term here is a transfer entropy, and the second is a complete transfer entropy. Considering any sources in addition to or instead of these will only return correlations to the information provided by these entities.

Note that there is no need to take the limit $k \to \infty$ for the correctness of Equation (37) (unless one wishes to properly interpret the terms as information storage and transfer). In fact, the sum of incrementally conditioned mutual information terms in Equation (37) is invariant as long as all terms use the same context. We can also consider a moving frame of reference as this context and so construct this sum for a moving frame of reference $f$. Note that the choice of $f$ determines which values to use for $j$, so we write an example with $f = 1$:

$$h(i, n + 1) = i(x^{(k,f=1)}_{i-1,n}; x_{i,n+1}) + i(x_{i,n}; x_{i,n+1} \mid x^{(k,f=1)}_{i-1,n}) + i(x_{i+1,n}; x_{i,n+1} \mid x^{(k,f=1)}_{i-1,n}, x_{i,n}), \tag{39}$$
$$h(i, n + 1) = a(i, n + 1, k, f = 1) + t(i, j = 0, n + 1, k, f = 1) + t^c(i, j = -1, n + 1, k, f = 1). \tag{40}$$

This is true because the set of causal information contributors is invariant, and we are merely considering the same causal sources but in a different context. Equation (39) demonstrates that prediction of the next state for a given cell in a moving frame of reference depends on the same causal information contributors. Considering the local transfer entropy from sources outside the light cone instead may be insufficient to predict the next state [32].
Choosing the frame of reference here merely sets the context for the information measures, and redistributes the attribution of the invariant amount of information in the next value $x_{i,n+1}$ between the various storage and transfer sources. This could be understood in terms of the different context redistributing the information atoms in a partial information diagram (see [34]) of the sources to the destination.
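The invariance argument can be checked numerically. The sketch below (hypothetical function names, plug-in estimates pooled over a periodic lattice, rule 54 chosen arbitrarily) computes the three incrementally conditioned local terms for a given frame $f$ and ordered pair of remaining sources, and lets one verify that storage plus apparent transfer plus complete transfer equals the local entropy $h(i, n+1) = -\log p(x_{i,n+1})$ at every point, in both the stationary frame ($f = 0$, sources $j = 1$ then $j = -1$) and the moving frame ($f = 1$, sources $j = 0$ then $j = -1$).

```python
from collections import Counter
from math import log2
import random


def run_eca(rule, width, steps, seed=0):
    """Simulate an ECA with periodic boundaries from a random initial row (assumed setup)."""
    rng = random.Random(seed)
    rows = [[rng.randint(0, 1) for _ in range(width)]]
    for _ in range(steps):
        prev = rows[-1]
        rows.append([
            (rule >> ((prev[i - 1] << 2) | (prev[i] << 1) | prev[(i + 1) % width])) & 1
            for i in range(width)
        ])
    return rows


def local_decomposition(rows, k, f, j1, j2):
    """Return a list of (h, a, t, tc) tuples, one per space-time point.

    The context is the frame x^(k,f)_{i-f,n}; j1 is the channel used for the
    apparent transfer term and j2 the channel for the complete transfer term.
    """
    width = len(rows[0])
    obs = []
    for n in range(k - 1, len(rows) - 1):
        for i in range(width):
            ctx = tuple(rows[n - q][(i - (q + 1) * f) % width] for q in range(k - 1, -1, -1))
            s1 = rows[n][(i - j1) % width]
            s2 = rows[n][(i - j2) % width]
            obs.append((ctx, s1, s2, rows[n + 1][i]))
    total = len(obs)
    c_x = Counter(x for _, _, _, x in obs)
    c_c = Counter(c for c, _, _, _ in obs)
    c_cx = Counter((c, x) for c, _, _, x in obs)
    c_cs = Counter((c, s1) for c, s1, _, _ in obs)
    c_csx = Counter((c, s1, x) for c, s1, _, x in obs)
    c_css = Counter((c, s1, s2) for c, s1, s2, _ in obs)
    c_cssx = Counter(obs)
    out = []
    for c, s1, s2, x in obs:
        p_x = c_x[x] / total
        p_x_c = c_cx[(c, x)] / c_c[c]
        p_x_cs = c_csx[(c, s1, x)] / c_cs[(c, s1)]
        p_x_css = c_cssx[(c, s1, s2, x)] / c_css[(c, s1, s2)]
        out.append((-log2(p_x),              # h: local entropy
                    log2(p_x_c / p_x),       # a: storage in the frame
                    log2(p_x_cs / p_x_c),    # t: apparent transfer via j1
                    log2(p_x_css / p_x_cs))) # tc: complete transfer via j2
    return out


if __name__ == "__main__":
    rows = run_eca(rule=54, width=100, steps=100, seed=4)
    for f, j1, j2 in ((0, 1, -1), (1, 0, -1)):
        terms = local_decomposition(rows, k=3, f=f, j1=j1, j2=j2)
        worst = max(abs(a + t + tc - h) for h, a, t, tc in terms)
        mean_a = sum(a for _, a, _, _ in terms) / len(terms)
        print("f=%d: max |a + t + tc - h| = %.2e, mean storage = %.3f" % (f, worst, mean_a))
```

The sum closes only because the frame and the two sources together cover the full causal neighbourhood of the destination; the individual terms, by contrast, are free to differ between frames.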
Note that we examine a type of ether relativity for local information dynamics. That is to say, there is a preferred stationary frame of reference $f = 0$ in which the velocity for information is bounded by the speed of light $c$. The stationary frame of reference is preferred because it is the only frame that has an even distribution of causal information sources on either side, while other frames observe an asymmetric distribution of causal information sources. It is also the only frame of reference that truly represents the information storage in the causal variables. As pointed out in Section 1, we do not consider a type of relativity where the rules of physics (i.e., CA rules) are invariant, remaining observationally symmetric around the frame of reference.

Hypotheses and Expectations
In general, we expect the measures a(i, n, k, f) and t(i, j, n, k, f) to be different from the corresponding measurements in a stationary frame of reference. This is because the frames of reference $x^{(k,f)}_{i-f,n}$ provide, in general, different contexts for the measurements. As exceptional cases, however, the measurements would not change if:
• The two contexts or frames of reference in fact provide the same information redundantly about the next state (and in conjunction with the sources for transfer entropy measurements).
• Neither context provides any relevant information about the next state at all.
Despite such differences to the standard measurements, as described in Section 5.2, the measurements in a moving frame of reference are still interpretable as information storage and transfer for the moving observer, and still provide relevant insights into the dynamics of the system.
In the next section, we will examine spatiotemporal information profiles of CAs, as measured by a moving observer. We hypothesize that in a moving frame of reference f, we shall observe:
• Regular background domains appearing as information storage regardless of movement of the frame of reference, since their spatiotemporal structure renders them predictable in both moving and stationary frames. In this case, both the stationary and moving frames would retain the same information redundantly regarding how their spatiotemporal pattern evolves to give the next value of the destination in the domain;
• Gliders moving at the speed of the frame appearing as information storage in the frame, since the observer will find a large amount of information in their past observations that predict the next state observed. In this case, the shift of frame incorporates different information into the new frame of reference, making that added information appear as information storage;
• Gliders that were stationary in the stationary frame appearing as information transfer in the channel j = 0 when viewed in moving frames, since the j = 0 source will add a large amount of information for the observer regarding the next state they observe. In this case, the shift of frame of reference removes relevant information from the new frame of reference, allowing scope for the j = 0 source to add information about the next observed state.

Results and Discussion
To investigate the local information dynamics in a moving frame of reference, we study ECA rule 54 here with a frame of reference moving at f = 1 (i.e., one step to the right per unit time). Our experiments used 10,000 cells initialized in random states, with 600 time steps captured for estimation of the probability distribution functions (similar settings were used in introducing the local information dynamics in [5][6][7]). We fixed k = 16 for our measures (since the periodic background domain for ECA rule 54 has a period of 4, this captures an adequate amount of history to properly separate information storage and transfer, as discussed in [5]). We compute the local information dynamics measures in both the stationary frame of reference (Figure 1) and the moving frame of reference f = 1 (Figure 3). The results were produced using the "Java Information Dynamics Toolkit" [36], and can be reproduced using the Matlab/Octave script movingFrame.m in the demos/octave/CellularAutomata example distributed with this toolkit.
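For readers who want to explore the setup without the toolkit, the procedure can be sketched in a few lines of NumPy. This is an illustrative plug-in estimator, not the toolkit's implementation or the movingFrame.m script: the function names, periodic boundaries, random seed, and the scaled-down lattice (1,000 cells, 200 time steps, k = 8) are assumptions chosen so that it runs quickly, so the averages it prints should not be expected to match the values reported for 10,000 cells and k = 16.

```python
import numpy as np


def run_eca(rule, width, steps, seed=0):
    """Evolve an elementary CA from a random row (periodic boundaries assumed)."""
    rng = np.random.default_rng(seed)
    table = np.array([(rule >> b) & 1 for b in range(8)], dtype=np.int64)
    x = np.zeros((steps, width), dtype=np.int64)
    x[0] = rng.integers(0, 2, size=width)
    for n in range(1, steps):
        left, right = np.roll(x[n - 1], 1), np.roll(x[n - 1], -1)
        x[n] = table[4 * left + 2 * x[n - 1] + right]
    return x


def counts_of(codes):
    """For each entry, the number of entries in the array with the same code."""
    flat = codes.ravel()
    _, inverse, counts = np.unique(flat, return_inverse=True, return_counts=True)
    return counts[inverse].reshape(codes.shape)


def local_active_storage(x, k, f):
    """Local active information storage a(i, n+1, k, f) in bits.

    Row r of the result corresponds to destination time n + 1 = k + r.
    """
    steps, width = x.shape
    past = np.zeros((steps - k, width), dtype=np.int64)
    for q in range(k):
        # Rows hold time n - q for n = k-1 .. steps-2; shifting by (q+1)*f
        # places x_{i-(q+1)f, n-q} at column i, i.e. the moving frame's path.
        rows = x[k - 1 - q: steps - 1 - q]
        past = (past << 1) | np.roll(rows, (q + 1) * f, axis=1)
    nxt = x[k:]
    p_next = counts_of(nxt) / nxt.size
    p_next_given_past = counts_of(past * 2 + nxt) / counts_of(past)
    return np.log2(p_next_given_past / p_next)


WIDTH, STEPS, K = 1000, 200, 8   # scaled-down, illustrative settings
x = run_eca(54, WIDTH, STEPS)
a_stationary = local_active_storage(x, K, f=0)
a_moving = local_active_storage(x, K, f=1)
for label, a in (("f=0", a_stationary), ("f=1", a_moving)):
    print(f"{label}: mean a = {a.mean():.3f} bits, "
          f"fraction of negative local values = {(a < 0).mean():.3f}")
```

Plotting `a_stationary` and `a_moving` as images (time down the page) gives local profiles of the same kind as the storage panels in the figures, though at this reduced scale the estimates are noisier.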
We first observe that the background domain is captured as a strong information storage process irrespective of whether the frame of reference is moving (with a(i, n, k = 16, f = 1), Figure 3(b)) or stationary (with a(i, n, k = 16, f = 0), Figure 1(b)). That is to say that the frame of reference is strongly predictive of the next state in the domain, regardless of whether the observer is stationary or moving. This is as expected, because the background domain is not only temporally periodic, but spatiotemporally periodic, and the moving frame provides much redundant information with the stationary frame about the next observed state.
Although it is not clear from the local profiles, the average active information storage is significantly lower for the moving frame than for the stationary frame (A(i, n, k = 16, f = 1) = 0.468 bits versus A(i, n, k = 16, f = 0) = 0.721 bits). At first glance, this seems strange, since the background domain is dominated by information storage, and the observer in both frames should be able to adequately detect the periodic domain process. On closer inspection, though, we can see that the storage process in the domain is significantly more disturbed by glider incidence in the moving frame, with a larger number and magnitude of negative local values encountered, and more time needed for the local values to recover to their usual levels in the domain. This suggests that the information in the moving frame is not fully redundant with the stationary frame, which could be explained in that the stationary frame (being centred in the light cone) is better able to retain information about the surrounding dynamics that could influence the next state. The moving frame (moving at the speed of light itself) is not able to contain any information regarding incoming dynamics from neighboring cells. Thus, in the moving frame, more of the (invariant) information in the next observed state is distributed amongst the transfer sources.
Also as expected, we note that gliders that are moving at the same speed as the frame of reference f = 1 are now considered as information storage in that frame. That is, the right-moving gliders previously visible as misinformative storage in Figure 1(b) now blend in with the background information storage process in the moving frame in Figure 3(b). As previously discussed, this is because the moving frame brings new information for the observer about these gliders into the frame of reference.
Figure 3(b) also shows that it is only gliders moving in orthogonal directions to the frame f = 1 (including blinkers, which were formerly considered stationary) that contain negative local active information storage, and are therefore information transfer processes in this frame. Again, this is as expected, since these gliders contribute new information about the observed state in the context of the frame of reference. For gliders that are now moving relative to the moving frame of reference, this is because the information about those gliders is no longer in the observer's frame of reference but can now be contributed to the observer by the neighboring sources. To understand these processes in more detail, however, we consider the various sources of that transfer via the transfer entropy measurements in Figure 3(c)-Figure 3(f).
First, we focus on the vertical gliders that were stationary in the stationary frame of reference (i.e., the blinkers): we had expected that these entities would be captured as information transfer processes in the j = 0 (vertical) channel in the f = 1 moving frame. This expectation is upheld, but the dynamics are more complicated than foreseen in our hypothesis. Here, we see that the apparent transfer entropy from the j = 0 source alone does not dominate the dynamics for this vertical glider (Figure 3(c)). Instead, the information transfer required to explain the vertical gliders is generally a combination of both apparent and complete transfer entropy measures, requiring the j = −1 source for interpretation as well. The full information may be accounted for by taking either Figure 3(c) plus Figure 3(f) or Figure 3(e) plus Figure 3(d) (as per the two different orders of considering sources to sum the invariant information in Equation (38)). Further, we note that some of the points within the glider are even considered as strong information storage processes: note how there are positive storage points amongst the negative points (skewed by the moving frame) for this glider in Figure 3(b). These vertical gliders are thus observed in this frame of reference to be a complex structure consisting of some information storage, as well as information transfer requiring both other sources for interpretation. This is a perfectly valid result, demonstrating that switching frames of reference does not lead to the simple one-to-one correspondence between individual information dynamics that one may naively expect.

Figure 1. Measures of information dynamics applied to ECA Rule 54 with a stationary frame of reference (all units in (b)-(d) are in bits). Time increases down the page for all plots. (a) Raw CA; (b) Local active information storage a(i, n, k = 16); (c) Local apparent transfer entropy t(i, j = −1, n, k = 16); (d) Local complete transfer entropy t_c(i, j = −1, n, k = 16).

Figure 2. Local information dynamics for a lattice system with speed of light c = 1 unit per time step: (a) (left) with stationary frame of reference (f = 0); (b) (right) with moving frame of reference f = 1 (i.e., at one cell to the right per unit time step). The red double-headed arrow represents active information storage a(i, n + 1, f) from the frame of reference; the blue single-headed arrows represent transfer entropy t(i, j, n + 1, f) from each source orthogonal to the frame of reference. Note that the frame of reference in the figures is the path of the moving observer through space-time.