Morphological Computation: Synergy of Body and Brain

There are numerous examples that show how the exploitation of the body’s physical properties can lift the burden of the brain. Examples include grasping, swimming, locomotion, and motion detection. The term Morphological Computation was originally coined to describe processes in the body that would otherwise have to be conducted by the brain. In this paper, we argue for a synergistic perspective, and by that we mean that Morphological Computation is a process which requires a close interaction of body and brain. Based on a model of the sensorimotor loop, we study a new measure of synergistic information and show that it is more reliable in cases in which there is no synergistic information, compared to previous results. Furthermore, we discuss an algorithm that allows the calculation of the measure in non-trivial (non-binary) systems.


Introduction
There are numerous examples that show how the exploitation of the body's physical properties can lift the burden of the brain. Examples range from grasping [1][2][3], swimming [4][5][6], and locomotion [7][8][9][10] to motion detection [11]. Probably the most prominent example in this field is the Passive Dynamic Walker [9], a purely mechanical system that mimics human walking. It has carefully chosen length and weight proportions of the leg segments, as well as carefully designed feet. If placed on a slope, it shows a natural, appealing walking behaviour. This strongly indicates that human walking does not have to be fully controlled and that part of it can result from physical interactions of the legs (weight, friction, etc.) with their environment (slope, gravity, etc.). Another impressive example is human grasping, which exploits at least two different physical interactions. First, as a result of the skin's softness and friction, even fragile objects can be held securely with some variation of grip posture and grip pressure. This means that the brain does not have to carefully control the position of the fingers or the tightness of the grasp and, in particular, does not have to precisely estimate the shape of the object. This significantly reduces the computational burden for the brain in grasping. The second effect is the friction in the hand's tendon network, which has been shown to perform logic computation and to affect torque production capabilities [2].
The term Morphological Computation [12] was originally coined to describe processes in the body that would otherwise have to be conducted by the brain [13]. One of the main questions that arises, and is so far unsolved, concerns the distinction between the Passive Dynamic Walker and a ball rolling downhill. Both are purely mechanical systems, but one would assign morphological computation to the Passive Walker, whereas one would generally have difficulty stating that the ball is performing computation or reducing the computational complexity for a brain. There are three possible solutions to this problem; their relation to synergy is addressed below. First, as argued in [14,15], the Passive Walker itself is not performing computation, but it shows that morphological computation can be present in human walking. Second, the definition of physical computation [16] offers a way to distinguish between pure physics and physical computation. Physical computation requires four ingredients: a function that encodes the data of the user into an initial state of the system; a physical process that transforms the initial state into some target state; a decoding function that transforms the target state back into something that the user can process; and, finally, a theory about how the system works. The implications of this theory of physical computation, in particular with respect to morphological computation, are discussed in [17]. With respect to the ball and the Passive Walker, the theory of [16] would lead to the conclusion that both are computing if there is a user that translates their states, e.g., to measure the slope. This is in accordance with the initial definition given by [12] and is also used as a basis for [18,19], which are discussed below. The third possibility can be summarised in the following way (cited from [20]): "nonneural body parts could be described as parts of a computational system, but they do not realise computation autonomously, only in connection with some kind of [. . .] central control system." If we now compare the three approaches, they do not seem to be entirely different. All three argue that morphological computation requires the interaction of a brain with the body. In the context of this work, this is understood as a synergistic perspective on morphological computation. This perspective is explained in more detail after the related work on formalising morphological computation has been presented.
Pfeifer and Iida [21] state that "One problem with the concept of morphological computation is that while intuitively plausible, it has defied serious quantification efforts." Since then, two different streams of formalising morphological computation have emerged: a dynamical systems approach and an information theoretic approach. The two approaches do not stand in opposition but should rather be seen as complementary [22]. The first approach [18,19] models processes in the body in the context of reservoir computing [23,24]. This means that the body is understood as a type of physical reservoir computer and that the controller or brain harnesses the body dynamics to produce a behaviour. Examples are the spine-driven robot [7], which uses the spine dynamics as part of its controller, and the dynamics of an octopus arm that can be used for computation [25]. Within this first approach, there are also several works that discuss the importance of a tight body-brain-environment coupling, of which the following are just a few examples [13,26,27,28,29,30,31]. Although very intuitive and compelling, this approach does not allow one to quantify how the body reduces the computational burden of the brain. This is the motivation for the second, information theoretic approach [15]. The guiding idea is to model the sensorimotor loop as a causal graph [32] (details follow below) and, based on that, to ask how much internal processes (with respect to the agent's perspective) contributed to an observed behaviour, as opposed to body-environment interactions. The information theoretic measures have been successfully applied to quantify morphological computation in muscle models [14] and soft robotics [1], and relations have been drawn to unique and synergistic information [33] based on the work by [34]. Unfortunately, at that time, the synergistic information could only be calculated for simple binary models of the sensorimotor loop, which prohibited further investigation of non-trivial systems or even real data. This is the gap this work addresses. Based on the complexity measure by Ay [35] and Perrone and Ay [36], we investigate synergistic information in binary and non-binary models of the sensorimotor loop and compare the results to our previous work in [33].
This work is organised in the following way. Section two discusses in detail the relation between synergistic information and morphological computation, based on previous work and the causal model of the sensorimotor loop. The third section presents the parametrised model of the sensorimotor loop, which is used to analyse and compare the new measure with previous work. The fourth section presents numerical results, which are discussed in the final section.

A Synergistic Perspective on Morphological Computation
The introduction motivated understanding morphological computation as a process that results from some type of control that exploits the physical properties of the body. This is one way to distinguish morphological computation from purely physical processes, and it can be understood as the synergistic coupling of brain and body that morphological computation requires. This section gives a formal motivation for quantifying morphological computation as synergistic information, based on our previous work [15,33]. For the derivation of an information theoretic quantification, it is helpful to have a causal model of the sensorimotor loop [32], which is presented first.

Causal Model of the Sensorimotor Loop
We assume that there is a canonical way to separate a cognitive system into four parts, namely brain, sensors, actuators, and body. We are fully aware that the system-environment separation is a very difficult and still unsolved question for biological systems (see, e.g., [37] for a discussion). This holds even more for the distinction between body and brain. Yet, in order to derive a quantification, we have to assume that such a distinction exists.
In our conceptual model of the sensorimotor loop, which is derived from [22], the brain or controller sends signals to the actuators that influence the environment (see Figure 1). We prefer the notion of the system's Umwelt [31,38,39], which is the part of the system's environment that can be affected by the system and itself affects the system. The states of the actuators and the Umwelt are not directly accessible to the cognitive system, but the loop is closed because information about both the Umwelt and the actuators is provided to the controller by the system's sensors. In addition to this general concept, which is widely used in the embodied artificial intelligence community (see, e.g., [22]), we introduce the notion of world to the sensorimotor loop, that is, the system's morphology together with the system's Umwelt. This differentiation between body and world is analogous to the agent-environment distinction made in the context of reinforcement learning [40], where the environment is defined as everything that cannot be changed arbitrarily by the agent. Revisiting the examples given in the introduction (e.g., locomotion, grasping), we see that most behaviours that are interesting in the context of morphological computation can be sufficiently modelled as reactive behaviours. Hence, for the remainder of this work, we omit the controller and assume that the sensors are directly connected to the actuators. For a discussion of the causal diagram for non-reactive systems, the reader is referred to [32]. Quantifications of morphological computation for non-reactive systems are discussed in [15].
The causal diagram of the sensorimotor loop is shown on the right-hand side of Figure 1. The random variables A, S, W, and W' refer to the actuator signals, the sensor signals, and the current and next world states. Directed edges reflect causal dependencies between the random variables. The random variables S and A are not to be mistaken for the sensors and actuators themselves. The variable S is the output of the sensors, which is available to the controller or brain, and the action A is the input that the actuators take. Consider an artificial robotic system as an example. The sensor state S could then be the pixel matrix delivered by a camera sensor, and the action A could be a numerical value that is taken by a motor controller and converted into currents to drive a motor.
Throughout this work, we use capital letters (X, Y, . . . ) to denote random variables, lower-case letters (x, y, . . . ) to denote specific values that a random variable can take, and calligraphic letters (𝒳, 𝒴, . . . ) to denote the alphabets of the random variables. This means that x_t is the specific value that the random variable X_t takes at time t ∈ ℕ, with x_t ∈ 𝒳. Greek letters refer to generative kernels, i.e., kernels that describe an actual underlying mechanism or a causal relation between two random variables. In the causal graphs throughout this paper, these kernels are represented by direct connections between the corresponding nodes. This notation distinguishes generative kernels from other kernels, such as the conditional probability of a given that w was previously seen, denoted by p(a|w), which can be calculated or sampled but does not reflect a direct causal relation between the two random variables A and W (see Figure 1).
Because all measures consider random variables of consecutive time indices, we abbreviate the random variables in the remainder of this work for better readability. Random variables without a time index refer to time index t, and primed variables refer to time index t + 1. Hence, W and W' refer to W_t and W_{t+1}, respectively.

Quantifying Morphological Computation as Synergistic Information
We can now restate the two original concepts of quantifying morphological computation [15] (see Figure 2) and discuss their relationship to synergistic information [33] as defined by [34]. This builds the foundation for the comparison with quantifying synergistic information based on the measure proposed by [35]. The basis for both original concepts, MC_A and MC_W, is the world dynamics kernel α(w'|w, a), which describes how the next world state W' depends on the current world state W and the current action A (see Figure 1, right-hand side, and Figure 2, left-hand side). For the first concept, MC_A, let us assume that the next world state W' does not depend on the current action A. In this case, the world dynamics kernel α(w'|w, a) reduces to α(w'|w), which is given by α(w'|w) = ∑_a p(w', w, a)/p(w) (see also Figure 2, centre). We would then state that there is maximal morphological computation, as the system's behaviour is not controlled by the brain at all. An example of such a system is the Passive Dynamic Walker discussed in the introduction. We can measure how much the observed behaviour differs from this assumption with the Kullback-Leibler divergence. This leads to the following formalisation:
$$ D\big(p(w',w,a)\,\big\|\,p(w,a)\,\alpha(w'|w)\big) = \sum_{w',w,a} p(w',w,a) \log \frac{\alpha(w'|w,a)}{\alpha(w'|w)} = I(W';A|W) \tag{1} $$
Unfortunately, Equation (1) is zero for maximal morphological computation, which is why we initially chose to normalise and invert it, leading to the following definition:
$$ \mathrm{MC}_A = 1 - \frac{1}{\log|\mathcal{W}|} \sum_{w',w,a} p(w',w,a) \log \frac{\alpha(w'|w,a)}{\alpha(w'|w)} = 1 - \frac{I(W';A|W)}{\log|\mathcal{W}|} $$
The second concept, MC_W, starts from the opposite assumption, namely that the current world state W has no influence on the next world state W' (see Figure 2, right-hand side). In this case, the world dynamics kernel α(w'|w, a) reduces to α(w'|a), which is given by α(w'|a) = ∑_w p(w', w, a)/p(a) (see also Figure 2), and analogously leads to the following definition of MC_W:
$$ \mathrm{MC}_W = \sum_{w',w,a} p(w',w,a) \log \frac{\alpha(w'|w,a)}{\alpha(w'|a)} = I(W';W|A) $$
The relations of the measures to transfer entropy [41,42] and to the information bottleneck [43] are discussed in [15]. In the context of this work, we focus on their relation to synergistic information as defined by [34] and [35,36].
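As a minimal sketch (not the implementation published with this work), both measures can be estimated from a joint distribution p(w', w, a) stored as a Python dictionary. The helper names `marginal`, `cond_mutual_info`, `mc_w`, and `mc_a` are hypothetical, information is measured in bits, and MC_A is assumed to be normalised as 1 − I(W'; A|W)/log₂|𝒲|.

```python
import math
from collections import defaultdict

W_NEXT, W, A = 0, 1, 2  # tuple positions of w', w and a in a state (w', w, a)


def marginal(p, keep):
    """Marginalise a joint distribution {state tuple: probability} onto the positions in `keep`."""
    out = defaultdict(float)
    for state, prob in p.items():
        out[tuple(state[i] for i in keep)] += prob
    return out


def cond_mutual_info(p, x, y, z):
    """I(X;Y|Z) in bits, where x, y, z are tuple positions in the joint distribution p."""
    p_xyz = marginal(p, (x, y, z))
    p_xz = marginal(p, (x, z))
    p_yz = marginal(p, (y, z))
    p_z = marginal(p, (z,))
    total = 0.0
    for (vx, vy, vz), prob in p_xyz.items():
        if prob > 0:
            total += prob * math.log2(prob * p_z[(vz,)] / (p_xz[(vx, vz)] * p_yz[(vy, vz)]))
    return total


def mc_w(p):
    """MC_W = I(W'; W | A)."""
    return cond_mutual_info(p, W_NEXT, W, A)


def mc_a(p, n_world_states):
    """MC_A = 1 - I(W'; A | W) / log2|W| (normalised and inverted)."""
    return 1.0 - cond_mutual_info(p, W_NEXT, A, W) / math.log2(n_world_states)


# W and A independent and uniform; the next world state copies the current one (w' = w).
p_world = {(w, w, a): 0.25 for w in (-1, 1) for a in (-1, 1)}
print(mc_w(p_world), mc_a(p_world, 2))
```

For a world that simply copies its current state (w' = w), the sketch yields MC_W = 1 bit and MC_A = 1, i.e., maximal morphological computation; if the world instead copies the action (w' = a), both measures are 0.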
Next, we briefly restate the information decomposition by [34,44] that was used in the context of morphological computation in [33].

Synergistic Information Based on the Decomposition of the Multivariate Mutual Information
Consider three random variables X, Y, and Z. Suppose that a system wants to predict the value of the random variable X, but it can only access the information in Y or Z. The question is: how is the information that Y and Z carry about X distributed over Y and Z? In general, there may be redundant or shared information (information contained in both Y and Z), but there may also be unique information (information contained in only one of Y and Z). Finally, there is also the possibility of synergistic or complementary information, i.e., information that is only available when Y and Z are taken together. The classical example of synergy is the XOR function: if Y and Z are independent, uniformly distributed binary random variables and X = Y XOR Z, then neither Y nor Z contains any information about X (in fact, X is independent of Y and X is independent of Z), but when Y and Z are taken together, they completely determine X.
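The XOR example can be checked numerically with a short, self-contained sketch in plain Python (entropies in bits; the helper names are hypothetical). It computes I(X; Y), I(X; Z), and I(X; Y, Z) from the joint distribution of X = Y XOR Z with independent, uniform Y and Z.

```python
import itertools
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a distribution {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, keep):
    """Marginal distribution on the tuple positions listed in `keep`."""
    out = defaultdict(float)
    for state, prob in joint.items():
        out[tuple(state[i] for i in keep)] += prob
    return out


def mutual_info(joint, xs, ys):
    """I(Xs; Ys) = H(Xs) + H(Ys) - H(Xs, Ys) for tuples of positions xs, ys."""
    return (entropy(marginal(joint, xs)) + entropy(marginal(joint, ys))
            - entropy(marginal(joint, xs + ys)))


# X = Y XOR Z with Y, Z independent, uniform bits; states are (x, y, z)
xor = {(y ^ z, y, z): 0.25 for y, z in itertools.product((0, 1), repeat=2)}

print(mutual_info(xor, (0,), (1,)))    # I(X;Y)
print(mutual_info(xor, (0,), (2,)))    # I(X;Z)
print(mutual_info(xor, (0,), (1, 2)))  # I(X;Y,Z)
```

Both pairwise mutual informations vanish while the joint one equals one bit, which is the purely synergistic situation described above.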
The total information that (Y, Z) contains about X is quantified by the mutual information I(X; Y, Z). However, there is no canonical way to separate it into these four kinds of information. Different variants have been proposed (see, e.g., [34,36,44,45,46,47]), but a final definition has yet to be found.
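Mathematically, one would like to have four functions, namely the shared information SI(X : Y; Z), the unique information of Y, UI(X : Y \ Z), the unique information of Z, UI(X : Z \ Y), and, finally, the synergistic information (also named complementary information in [34]), CI(X : Y; Z), that satisfy
$$ I(X;Y,Z) = \mathrm{SI}(X:Y;Z) + \mathrm{UI}(X:Y\setminus Z) + \mathrm{UI}(X:Z\setminus Y) + \mathrm{CI}(X:Y;Z), $$
$$ I(X;Y) = \mathrm{SI}(X:Y;Z) + \mathrm{UI}(X:Y\setminus Z), $$
$$ I(X;Z) = \mathrm{SI}(X:Y;Z) + \mathrm{UI}(X:Z\setminus Y). $$
It follows from these defining equations [34] and the chain rule of mutual information, I(X; Y, Z) = I(X; Z) + I(X; Y|Z), that an information decomposition always satisfies
$$ I(X;Y|Z) = \mathrm{UI}(X:Y\setminus Z) + \mathrm{CI}(X:Y;Z). \tag{8} $$
Several candidates have been proposed for SI, UI, and CI so far (see, e.g., [45,46]). A new candidate is presented below (see Section 4.2). In this section, we describe the decomposition of [34], which is defined in the following way.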
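Let Σ be the set of all possible joint distributions of X, Y, and Z. Fix an element P ∈ Σ (the "true" joint distribution of X, Y, and Z). Define
$$ \Delta_P = \{\, Q \in \Sigma : Q(X=x, Y=y) = P(X=x, Y=y) \text{ and } Q(X=x, Z=z) = P(X=x, Z=z) \text{ for all } x, y, z \,\} $$
as the set of all joint distributions that have the same marginal distributions on the pairs (X, Y) and (X, Z) as P. Then
$$ \mathrm{CI}(X:Y;Z) = I(X;Y,Z) - \min_{Q \in \Delta_P} I_Q(X;Y,Z), $$
$$ \mathrm{UI}(X:Y\setminus Z) = \min_{Q \in \Delta_P} I_Q(X;Y|Z), $$
$$ \mathrm{UI}(X:Z\setminus Y) = \min_{Q \in \Delta_P} I_Q(X;Z|Y), $$
$$ \mathrm{SI}(X:Y;Z) = \max_{Q \in \Delta_P} \mathrm{CoI}_Q(X;Y;Z), $$
where CoI denotes the co-information as defined in [48]. Here, a subscript Q in an information quantity means that the quantity is computed with respect to Q as the joint distribution.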
In [34], the formulas for UI, CI, and SI are derived from considerations about decision problems in which the objective is to predict the outcome of X. In the context of morphological computation, we apply the information decomposition in the following way: we set X = W', Y = W, and Z = A. In the sensorimotor loop, W and A not only carry information about W', they actually control W'. From an abstract point of view, however, the situation is similar: in the sensorimotor loop, we also expect to find aspects of redundant, unique, and complementary influence of W and A on W'. Formally, since everything is defined probabilistically, we can still use the same functions UI, CI, and SI. We believe that the arguments behind the definitions of UI, CI, and SI remain valid in the setting of the sensorimotor loop, where we need them.
The reason for investigating the unique and synergistic information is indicated in Equation (8), which we rewrite in terms of the sensorimotor loop in the following way:
$$ \mathrm{MC}_W = I(W';W|A) = \mathrm{UI}(W':W\setminus A) + \mathrm{CI}(W':W;A). \tag{14} $$
Equation (14) shows that MC_W can be decomposed into unique and synergistic information. This also shows the advantage of the information decomposition approach as defined by [34]: synergistic and unique information can each be computed from the other if the conditional mutual information I(W'; W|A) is known. The conditional mutual information can easily be estimated from observations, provided that there are enough samples with respect to the dimensionality of W', W, and A. We are interested not only in mathematically rigorous definitions but also in applicability to real data. This is where the decomposition by [34] currently has a disadvantage: there is no algorithm known to us that computes synergistic and unique information for non-trivial, i.e., non-binary, systems. For the binary model of the sensorimotor loop (see below), we used an approximation in our previous work [33] that was already described in the original paper [34]. We will compare our previous results with a new measure that can also be computed for non-trivial systems. However, there is a more important problem with the definitions given by [34]. They do not incorporate the probability distribution over the inputs, in this case the random variables W and A, which means that the synergistic measure CI(W' : W; A) can falsely detect correlations in the input as synergistic information (this is shown in the results section and discussed at the end of this work).
Applicability to non-trivial systems and potential false positives are the two reasons why we introduce a measure for synergistic information based on [35,36] in the remainder of this section.

Synergistic Information as the Difference between the Whole and the Sum of Its Parts
The basic idea of the complexity measure defined by [35] is summarised in that paper in the following way: "The whole is more than the sum of its elementary parts." The underlying information theoretic idea is best explained using the schematics shown in Figure 3. The left-hand side of Figure 3 shows the "whole", while the right-hand side shows "the elementary parts" of a stochastic system with two input variables, X_1 and X_2, and two output variables, Y_1 and Y_2. We refer to the graphical model on the left-hand side of Figure 3 as the full model and to the model on the right-hand side as the split model. The full model assumes that every output node is connected to every input node. The split model assumes that each input node affects only one output node. The undirected connection between the input nodes indicates that the input distribution is taken into account in both models. Both models are defined by their feature sets F. In the example given above, the feature set of the full model is F_full = {{X_1, Y_1, X_2, Y_2}}, and the feature set of the split model is F_split = {{X_1, Y_1}, {X_2, Y_2}, {X_1, X_2}}. Note that we have explicitly included the feature {X_1, X_2}, which corresponds to the input distribution. The divergence between the full and split models is defined as a measure of complexity in [35]. Variations of this measure have been proposed in [49,50] and compared in [51].
As discussed earlier in this work, we are primarily interested in the relation of the three random variables W, A, and W', which represent the current world state, the current action, and the next world state, respectively. Hence, we translate the complexity measure [35], which was originally formulated for four random variables (see Figure 3), to three random variables (see Figure 4). The resulting quantity is also known as synergistic information and was first discussed in [36].
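Given three random variables X, Y, and Z (see Figure 4), the synergistic information is defined as the averaged Kullback-Leibler divergence
$$ D\big(p_{\mathrm{full}} \,\big\|\, p_{\mathrm{split}}\big) = \sum_{x,y,z} p(x,y,z) \log \frac{p_{\mathrm{full}}(x,y,z)}{p_{\mathrm{split}}(x,y,z)}, $$
where the full and split models are defined by the feature sets depicted in Figure 4. Here, p_full = p, and p_split is the maximum entropy distribution whose marginals on the features of the split model coincide with those of p. Because the split model contains the input feature {Y, Z}, p_split(y, z) = p(y, z), so the divergence equals the average over the inputs, ∑_{y,z} p(y, z) D(p(x|y, z) ‖ p_split(x|y, z)).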
In their work, the authors showed that the main difference between the synergistic measure defined by [34] and the synergistic measure defined by [36] is that the former does not take the input distribution into account. This is shown by Equation (17) (cited from [36]); the difference between the measures proposed by [34] and [36] is seen in line 3 of that equation (compare with Equation (9)). Another difference is that, so far, the measure by [34] cannot be calculated for non-trivial systems, e.g., with the iterative scaling algorithm discussed in the next section.
Applying the measure by [36] to the sensorimotor loop in the context of morphological computation, we ask how much the observed behaviour of our embodied agent differs from the assumption that there is no synergistic term. In other words, we measure the averaged Kullback-Leibler divergence
$$ \mathrm{MC}_{SY} = \sum_{w',w,a} p(w',w,a) \log \frac{p_{\mathrm{full}}(w',w,a)}{p_{\mathrm{split}}(w',w,a)}, $$
where the feature set of the full model is defined as F_full = {{W', W, A}} and the feature set of the split model is defined as F_split = {{W', W}, {W', A}, {W, A}} (see Figure 4, where X = W', Y = W, and Z = A).

Maximum Entropy Estimation with the Iterative Scaling Algorithm
There is a standard method to calculate the maximum entropy estimate of a probability distribution based on features, known as iterative scaling, which is well established in this field and goes back to the work of Darroch and Ratcliff [52] and Csiszár [53]. The algorithm can be summarised in the following form (for joint distributions). Let p̃ be the target distribution (in our case, p̃(w', w, a) = ∑_s p(w)β(s|w)π(a|s)α(w'|w, a); see Section 3.1), V be the set of random variables (in the context of this work, V = {W', W, A}), and F be the feature set (e.g., F_split = {{W', W}, {W', A}, {W, A}}). For simplicity of presentation, we use the following abbreviations: p(V) = p(w', w, a); p(F_i) is either p(w', w), p(w', a), or p(w, a), depending on the index i; and p(V \ F_i | F_i) is correspondingly either p(a|w', w), p(w|w', a), or p(w'|w, a). As an example, the target distribution for the first feature is given by p̃(F_1) = ∑_a p̃(w', w, a). Iterative scaling is then defined in the following way:
$$ p^{(0)}(V) = \frac{1}{|\mathcal{W}|^2\,|\mathcal{A}|}, $$
$$ p^{(n+1)}(V) = \tilde p\big(F_{n \bmod |F|}\big)\; p^{(n)}\big(V \setminus F_{n \bmod |F|} \,\big|\, F_{n \bmod |F|}\big). $$
In words, we initialise the joint distribution p^(0) to be the uniform distribution. In each iteration, we pick one feature from the set of features. We then multiply the marginal distribution of this feature in the target distribution, p̃(F_{n mod |F|}), by the conditional distribution of the remaining variables given the chosen feature under the current estimate, p^(n)(V \ F_{n mod |F|} | F_{n mod |F|}). In the notation above, F_{n mod |F|} refers to the cyclic selection of features based on the current iteration step n, modulo the number |F| of defined features. This algorithm is proven to converge [52,53] and is used in the numerical simulations below (source code is available at [54]). Next, we discuss the parametrised model of the sensorimotor loop, which is used in this work to evaluate and compare the different measures.
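The iterative scaling procedure described above can be sketched in plain Python as follows. This is an illustrative re-implementation under our own naming (`iterative_scaling` and `split_synergy` are hypothetical), not the code published at [54]: distributions are dictionaries over state tuples, features are tuples of variable positions, and a fixed number of iterations replaces a convergence check. The divergence of the target from the resulting split-model estimate gives the synergy measure for three variables.

```python
import itertools
import math
from collections import defaultdict


def marginal(p, keep):
    """Marginal of a joint distribution {state tuple: probability} on the positions in `keep`."""
    out = defaultdict(float)
    for state, prob in p.items():
        out[tuple(state[i] for i in keep)] += prob
    return out


def iterative_scaling(target, alphabets, features, iterations=600):
    """Maximum entropy estimate that matches the marginals of `target` on every feature.

    target:     dict mapping state tuples to probabilities (the target distribution)
    alphabets:  one sequence of values per random variable
    features:   tuples of variable positions, e.g. ((0, 1), (0, 2), (1, 2))
    """
    states = list(itertools.product(*alphabets))
    p = {s: 1.0 / len(states) for s in states}           # p^(0): uniform distribution
    target_marginals = [marginal(target, f) for f in features]
    for n in range(iterations):
        k = n % len(features)                              # feature F_{n mod |F|}
        feature, wanted = features[k], target_marginals[k]
        current = marginal(p, feature)
        new_p = {}
        for s, prob in p.items():
            key = tuple(s[i] for i in feature)
            # target marginal of F_k times p^(n)(V \ F_k | F_k) = p^(n)(V) / p^(n)(F_k)
            new_p[s] = wanted.get(key, 0.0) * prob / current[key] if current[key] > 0 else 0.0
        p = new_p
    return p


def kl_divergence(p, q):
    """D(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(prob * math.log2(prob / q[s]) for s, prob in p.items() if prob > 0)


def split_synergy(target, alphabets):
    """Divergence of the full model from the split model with F_split = {{0,1},{0,2},{1,2}}."""
    split = iterative_scaling(target, alphabets, [(0, 1), (0, 2), (1, 2)])
    return kl_divergence(target, split)


# X = Y XOR Z with independent, uniform Y and Z; states are (x, y, z)
xor = {(y ^ z, y, z): 0.25 for y, z in itertools.product((0, 1), repeat=2)}
print(split_synergy(xor, [(0, 1)] * 3))
```

For XOR, all pairwise marginals are uniform, so the split estimate stays uniform and the measure evaluates to one bit. For a distribution with only pairwise interactions, the split estimate converges to the target itself and the measure tends to zero.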

Parametrised Model of the Sensorimotor Loop
The previous section discussed five different measures, namely MC_W, MC_A, UI(W' : W \ A), CI(W' : W; A), and MC_SY. The first four were evaluated in previous publications [14,15] based on a binary model of the sensorimotor loop. MC_W was also evaluated on data from hopping models [14] and in the context of soft robotics [1]. This is why we concentrate on CI(W' : W; A) and MC_SY in this work. A new proposal for the unique information follows below (see Section 4.2).
In our previous work [33], we used an approximation to calculate the synergistic information and were only able to apply it to binary systems. As stated earlier, we are interested in applying synergistic information to real-world applications; hence, this section has two goals. The first goal is to compare the previous results with the new measure. The second goal is to investigate how the new measure performs on non-binary systems in a fully controlled setting, i.e., in a model. This is a necessary step before applying the measures to real data, as we need to understand whether the results comply with the intuitive understanding of morphological computation.
The next section introduces the binary and non-binary models of the sensorimotor loop. The binary model is used to compare the results of our new measure MC_SY with our previous results on CI(W' : W; A) [33].

Binary Model of the Sensorimotor Loop
The causal diagram of the sensorimotor loop (see Figure 1) implies that we need to define four different maps, namely the world dynamics kernel α(w'|w, a), the policy π(a|s), the sensor map β(s|w), and, finally, the input distribution p(w). Note that in this section we operate on binary random variables, i.e., w', w, s, a ∈ Ω = {−1, 1}. The binary model of the sensorimotor loop is then defined by the following set of equations (see also [1,15]):
$$ \alpha_{\phi,\psi,\chi}(w'|w,a) = \frac{e^{\phi w' w + \psi w' a + \chi w' w a}}{\sum_{\tilde w \in \Omega} e^{\phi \tilde w w + \psi \tilde w a + \chi \tilde w w a}} \tag{21} $$
$$ \beta_\zeta(s|w) = \frac{e^{\zeta s w}}{\sum_{\tilde s \in \Omega} e^{\zeta \tilde s w}} \tag{22} $$
$$ \pi_\mu(a|s) = \frac{e^{\mu a s}}{\sum_{\tilde a \in \Omega} e^{\mu \tilde a s}} \tag{23} $$
$$ p_\tau(w) = \frac{e^{\tau w}}{\sum_{\tilde w \in \Omega} e^{\tau \tilde w}} \tag{24} $$
In the context of this work, we only vary the parameters φ, ψ, and χ, which means that we change the causal dependence W → W' (parameter φ), the causal dependence A → W' (parameter ψ), and, finally, the causal dependence (W, A) → W' (parameter χ). The other parameters are set such that they result in uniform distributions of the corresponding kernels, i.e., τ = µ = ζ = 0.
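The binary model can be turned into the joint distribution p(w', w, a) = ∑_s p_τ(w) β_ζ(s|w) π_µ(a|s) α(w'|w, a) with a few lines of Python. The sketch below is our own illustration (the names `kernel` and `binary_loop` are hypothetical) and assumes that all four maps are normalised exponential kernels over Ω = {−1, 1}, as in the binary model; with τ = µ = ζ = 0, as in this work, W and A are independent and uniform.

```python
import itertools
import math

OMEGA = (-1, 1)  # binary alphabet


def kernel(energy):
    """Normalised exponential kernel {v: exp(energy(v)) / sum_u exp(energy(u))} over OMEGA."""
    weights = {v: math.exp(energy(v)) for v in OMEGA}
    total = sum(weights.values())
    return {v: wt / total for v, wt in weights.items()}


def binary_loop(phi, psi, chi, tau=0.0, mu=0.0, zeta=0.0):
    """Joint distribution {(w', w, a): p} of the binary sensorimotor loop."""
    p_w = kernel(lambda v: tau * v)                                          # input distribution
    joint = {state: 0.0 for state in itertools.product(OMEGA, repeat=3)}
    for w, s, a in itertools.product(OMEGA, repeat=3):
        beta = kernel(lambda v: zeta * v * w)[s]                              # sensor map
        pi = kernel(lambda v: mu * v * s)[a]                                  # policy
        alpha = kernel(lambda v: phi * v * w + psi * v * a + chi * v * w * a)  # world dynamics
        for w_next in OMEGA:
            joint[(w_next, w, a)] += p_w[w] * beta * pi * alpha[w_next]      # sum over s
    return joint


p = binary_loop(phi=2.0, psi=0.5, chi=1.0)
print(sum(p.values()))
```

Each lambda is evaluated immediately inside `kernel` within the same loop iteration, so it always sees the current values of w, s, and a.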
Before the results are discussed in the next section (see Section 4), we first present the modification that allows us to model the non-binary sensorimotor loop.

Non-Binary Model of the Sensorimotor Loop
A generalisation from the binary model to a non-binary model requires modifying the function that operates on the state values. In the case of the binary alphabet Ω = {−1, 1}, this function was simply the product. We explain the approach used in this work to generalise this function based on the policy π(a|s). For the non-binary model, the policy is given by
$$ \pi_\mu(a|s) = \frac{e^{\mu f(a,s)}}{\sum_{\tilde a \in \mathcal{A}} e^{\mu f(\tilde a,s)}}. $$
There are various ways in which the function f(a, s) can be chosen. Our choice is derived from the requirement to have a single parameter µ that allows us to smoothly transition from independence to a strong dependence (a different method is briefly discussed in the next paragraph). Therefore, we chose to normalise the values a ∈ 𝒜 = {1, 2, . . . , N} and s ∈ 𝒮 = {1, 2, . . . , N} such that they are mapped onto the interval [−1, 1]. This allows us to define f(a, s), similar to the binary case (see Equation (23)), as the product of both mapped random variables, i.e.,
$$ f(a,s) = \hat a\,\hat s, $$
where â and ŝ denote the values of a and s after mapping onto [−1, 1]. As this example indicates, we chose the same number of bins for all random variables, w', w, s, a ∈ Ω = {1, 2, . . . , N}.
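The following sketch illustrates one way to implement the non-binary policy. The linear map x ↦ 2(x − 1)/(N − 1) − 1 from {1, . . . , N} onto [−1, 1] is our assumption (the text only requires that the values are mapped onto that interval), and the function names are hypothetical. For N = 2, the map sends 1 to −1 and 2 to 1, so the policy coincides with the binary policy.

```python
import math


def to_unit_interval(x, n):
    """Assumed linear map of a state x in {1, ..., n} onto [-1, 1]."""
    return 2.0 * (x - 1) / (n - 1) - 1.0


def f(a, s, n):
    """Coupling function: product of the mapped action and sensor values."""
    return to_unit_interval(a, n) * to_unit_interval(s, n)


def policy(s, mu, n):
    """pi_mu(a|s) for a = 1, ..., n, returned as a list indexed by a - 1."""
    weights = [math.exp(mu * f(a, s, n)) for a in range(1, n + 1)]
    total = sum(weights)
    return [wt / total for wt in weights]


# mu = 0 gives independence; increasing mu concentrates the policy on a = N when s = N
for mu in (0.0, 1.0, 5.0):
    print(mu, [round(p, 3) for p in policy(s=4, mu=mu, n=4)])
```

A single parameter µ thus moves the policy smoothly from the uniform distribution (µ = 0) to a strong dependence of A on S.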
It must be noted that this choice of f(a, s) is a projection of the full space of couplings onto a subspace. In particular, there will be cases in which the effects of ψ, φ, and χ cannot be fully separated. One such example is the case in which ψ, φ, and χ are large and w', w, and a are all equal to N. The reason is that the bases spanning the space given by our choice of f(a, s) are not orthogonal, unlike in the binary case (Equations (21) to (24)). Walsh bases are one possible way to define the function(s) f such that they are orthogonal with respect to the L² inner product given by the uniform distribution.
This concludes the description of the parametrised binary and non-binary models of the sensorimotor loop. The next section presents the numerical results obtained with these two models.

Numerical Simulations
In this section, we plot the numerical results for two measures, CI(W' : W; A) and MC_SY. CI(W' : W; A) is plotted to compare our previous results [33] with the results obtained for MC_SY with the iterative scaling algorithm. The entire source code used in this work is available at [54].

Results for the Binary Sensorimotor Loop
This section begins by revisiting the point made earlier about the difference between CI(W' : W; A) and MC_SY with respect to including the input distribution in the feature set. Figure 5 shows two experiments with the binary model of the sensorimotor loop. Both plots show the results for χ = 0.0 and ψ, φ ∈ [0, 5.0], i.e., without the synergistic term χw'wa (see Equation (21)). The first plot (left-hand side) shows the result for CI(W' : W; A). Along the diagonal φ ≈ ψ, we see a region in which the synergistic information is non-zero. This is counter-intuitive, as the higher-order interaction term is set to zero (χ = 0, see Equation (21)) and the bases in the binary model of the sensorimotor loop are orthogonal. Hence, high values of φ and ψ should not result in non-zero synergistic information in these cases. The second plot shows the results for MC_SY with the feature set F = {{W, A}, {W, W'}, {A, W'}}. This measure yields zero synergistic information for χ = 0.0 and any choice of φ, ψ ∈ [0, 5], which is what we expect. The difference between the two approaches becomes more evident if we increase the synergistic coupling factor χ.
Figure 6 shows the two measures for varying values of χ ∈ {0, 1.25, 2.5, 3.75, 5.0}. We see that for increasing values of χ, the amount of detected synergistic information increases for both measures, although in different ways. CI(W' : W; A) shows growing regions with high values along the diagonal, whereas MC_SY shows areas in which the synergistic information is close to zero even for values of χ > 0. The latter is surprising because, in the binary case, the three bases are orthogonal, and hence the synergistic information should be distinguishable from the pairwise interactions, also for high values of ψ and φ. This issue is addressed again in the discussion. The plots for the non-binary case, in particular for 4 and 8 bins, are shown in Figure 7. Note that CI(W' : W; A) was omitted for those cases, as it could not be computed for non-trivial systems at the time.

New Measure for Unique Information
Equation (14) shows that the conditional mutual information I(X; Y|Z) is given by the sum of the unique information UI(X : Y \ Z) and the synergistic information CI(X : Y; Z). Given that MC_SY is a new measure of synergy (first introduced in [36]), we can now also give a new definition of the unique information in terms of MC_W and MC_SY. We denote it by MC_P, because it captures the part of MC_W that results from uncontrolled physical interactions:
$$ \mathrm{MC}_P = \mathrm{MC}_W - \mathrm{MC}_{SY}. $$
Note that MC_P is not equivalent to the definition of UI(X : Y \ Z) given above, because CI(W' : W; A) ≠ MC_SY. Figure 8 shows MC_W, MC_P, and MC_SY for 2, 4, and 8 bins and varying values of the synergistic parameter χ.
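To illustrate how MC_W, MC_SY, and MC_P relate, the following self-contained sketch evaluates all three for the binary model with τ = µ = ζ = 0, so that W and A are independent and uniform. It assumes MC_P = MC_W − MC_SY, obtained from Equation (14) with MC_SY in place of CI. It is our own simplified re-implementation (plain Python, bits, a fixed number of iterative scaling steps; all function names are hypothetical) and not the code from [54].

```python
import itertools
import math
from collections import defaultdict

OMEGA = (-1, 1)
STATES = list(itertools.product(OMEGA, repeat=3))  # states (w', w, a)
F_SPLIT = ((0, 1), (0, 2), (1, 2))                  # {W', W}, {W', A}, {W, A}


def marginal(p, keep):
    """Marginal of {state tuple: probability} on the positions in `keep`."""
    out = defaultdict(float)
    for state, prob in p.items():
        out[tuple(state[i] for i in keep)] += prob
    return out


def binary_joint(phi, psi, chi):
    """p(w', w, a) = p(w) p(a) alpha(w'|w, a) with uniform p(w), p(a) (tau = mu = zeta = 0)."""
    def energy(v, w, a):
        return phi * v * w + psi * v * a + chi * v * w * a
    joint = {}
    for w_next, w, a in STATES:
        alpha = math.exp(energy(w_next, w, a)) / sum(math.exp(energy(v, w, a)) for v in OMEGA)
        joint[(w_next, w, a)] = 0.25 * alpha
    return joint


def cond_mutual_info(p, x, y, z):
    """I(X;Y|Z) in bits for tuple positions x, y, z."""
    p_xyz, p_xz, p_yz, p_z = (marginal(p, k) for k in ((x, y, z), (x, z), (y, z), (z,)))
    return sum(v * math.log2(v * p_z[(vz,)] / (p_xz[(vx, vz)] * p_yz[(vy, vz)]))
               for (vx, vy, vz), v in p_xyz.items() if v > 0)


def split_estimate(p, iterations=900):
    """Iterative scaling onto F_split, starting from the uniform distribution."""
    q = {s: 1.0 / len(STATES) for s in STATES}
    targets = [marginal(p, f) for f in F_SPLIT]
    for n in range(iterations):
        k = n % len(F_SPLIT)
        feature, wanted = F_SPLIT[k], targets[k]
        current = marginal(q, feature)
        # all probabilities stay strictly positive in this model, so no division by zero
        q = {s: wanted[tuple(s[i] for i in feature)] * v / current[tuple(s[i] for i in feature)]
             for s, v in q.items()}
    return q


def measures(phi, psi, chi):
    """Return (MC_W, MC_SY, MC_P) for the binary model."""
    p = binary_joint(phi, psi, chi)
    mc_w = cond_mutual_info(p, 0, 1, 2)                          # I(W'; W | A)
    q = split_estimate(p)
    mc_sy = sum(v * math.log2(v / q[s]) for s, v in p.items())  # D(p || p_split)
    return mc_w, mc_sy, mc_w - mc_sy                              # MC_P = MC_W - MC_SY


for chi in (0.0, 2.5):
    print(chi, measures(phi=1.0, psi=1.0, chi=chi))
```

With χ = 0, the split model reproduces p, so MC_SY vanishes up to numerical error and MC_P equals MC_W. A purely synergistic coupling (φ = ψ = 0, χ > 0) gives the opposite case, in which MC_P is approximately zero.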

Discussion
This work is a continuation of our previous work on the quantification of morphological computation. Initially, we proposed two measures, MC_W and MC_A, that are based on calculating the conditional mutual information in the sensorimotor loop [15]. In a later work, we investigated the relation of the conditional mutual information to the unique information UI(W' : W \ A) and the synergistic information CI(W' : W; A) [33], while primarily focussing on unique information at that time. In this work, we investigated synergistic information and compared a measure based on [35] with the synergistic information that was independently discovered in [34,44]. The main difference between the two measures is that the previously utilised measure does not take the input distribution p(w, a) into account. We have shown in this work that omitting the input distribution can lead to positive synergistic information in cases in which it should be zero. Furthermore, we showed that the new measure MC_SY can be calculated for non-trivial systems with the iterative scaling method.
Although the new measure MC_SY has significantly better properties (no false positives), it does produce false negatives. This means that MC_SY is close to zero for high values of χ, i.e., for a strong synergistic term in our parametrised model, if the two other couplings W → W' (parameter φ) and A → W' (parameter ψ) are large. It seems that when φ or ψ is large, it becomes increasingly difficult to detect synergistic information. At this point, it is not clear whether this is a general problem or something specific to the measure MC_SY. This is the subject of ongoing investigations.

Figure 2. Visualisation of the two concepts MC_A and MC_W. Left-hand side: causal diagram for a reactive system. Centre: causal diagram assuming no effect of the action A on the next world state W'. Right-hand side: causal diagram assuming no effect of the previous world state W on the next world state W'.

Figure 3. Quantifying complexity. Left-hand side: full model of two input and two output variables. Right-hand side: split model, as proposed by [35].

Figure 4. Quantifying synergy. Left-hand side: full model of two input variables and one output variable. Right-hand side: split model, as proposed by [36].

Figure 5. CI(W' : W; A) (left-hand side) and MC_SY (right-hand side) without synergistic information present in the model, i.e., χ = 0. The comparison of the plots reveals that CI(W' : W; A) has regions with non-zero values, although no synergistic information is present. By this we mean that the higher-order interaction term χw'wa is set to zero (see Equation (21)). No false positives are found for MC_SY in this case.