Can Transfer Entropy Infer Causality in Neuronal Circuits for Cognitive Processing?

Finding the causes of observed effects and establishing causal relationships between events is (and has been) an essential element of science and philosophy. Automated methods that can detect causal relationships would be very welcome, but practical methods that can infer causality are difficult to find, and are the subject of ongoing research. While Shannon information only detects correlation, there are several information-theoretic notions of “directed information” that have successfully detected causality in some systems, in particular in the neuroscience community. However, recent work has shown that some directed information measures can sometimes inadequately estimate the extent of causal relations, or even fail to identify existing cause-effect relations between components of systems, especially if neurons contribute in a cryptographic manner to influence the effector neuron. Here, we test how often cryptographic logic emerges in an evolutionary process that generates artificial neural circuits for two fundamental cognitive tasks: motion detection and sound localization. Our results suggest that whether or not transfer entropy measures of causality are misleading depends strongly on the cognitive task considered. These results emphasize the importance of understanding the fundamental logic processes that contribute to cognitive processing, and of quantifying their relevance in any given nervous system.


I. INTRODUCTION
When searching for common foundations of cortical computation, more and more emphasis has been placed on information-theoretic descriptions of cognitive processing [1]-[4]. One of the core tasks in the analysis of cognitive processing is to follow the flow of information within the nervous system, namely to find cause-effect components. Indeed, understanding causal relationships is considered to be fundamental to all natural science [5]. However, inferring causal relationships and separating them from mere correlations is difficult, and the subject of ongoing research [6], [7]. Granger causality is an established statistical measure that aims to determine directed (causal) functional interactions among components or processes of a system. The main idea is that if a process X influences a process Y, then an observer can predict the future state of Y more accurately given the history of both X and Y (written as $X_{0:t}$ and $Y_{0:t}$) than given the history of Y alone. Schreiber [8] described Granger causality in terms of information theory by introducing the concept of transfer entropy (TE), positing that the influence of process X on Y can be captured using the transfer entropy from process X to Y:
$$TE_{X\to Y} = \sum_{y_{t+1},\, x_{0:t},\, y_{0:t}} p(y_{t+1}, x_{0:t}, y_{0:t}) \log_2 \frac{p(y_{t+1} \mid x_{0:t}, y_{0:t})}{p(y_{t+1} \mid y_{0:t})}. \qquad (1)$$
The transfer entropy (1) is a conditional mutual information, $I(Y_{t+1} : X_{0:t} \mid Y_{0:t})$, and quantifies what the process Y at time t + 1 knows about the process X up to time t, given the history of Y up to time t. More colloquially, $TE_{X\to Y}$ measures "how much uncertainty about the future course of Y can be reduced by the past of X, given Y's own past." Transfer entropy reduces to Granger causality for so-called "auto-regressive processes" [9], which encompass most biological dynamics. As a result, transfer entropy has become the most widely used directed information measure, especially in neuroscience (see [4], [10] and references cited therein).
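To make Eq. (1) concrete, the following is a minimal Python sketch of a plug-in (frequency-count) estimator of transfer entropy for binary time series. It assumes a history length of 1, i.e., $x_{0:t}$ and $y_{0:t}$ are truncated to $x_t$ and $y_t$ as in a first-order Markov process. The function name `transfer_entropy` is illustrative and not taken from any particular library, and no bias correction is applied.

```python
import math
import random
from collections import Counter
from typing import Sequence


def transfer_entropy(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in estimate of TE_{X->Y} in bits, using history length 1.

    Implements Eq. (1) with x_{0:t}, y_{0:t} truncated to x_t, y_t:
    sum over (y_{t+1}, x_t, y_t) of p(y_{t+1}, x_t, y_t) *
    log2[p(y_{t+1} | x_t, y_t) / p(y_{t+1} | y_t)].
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be equally long, with at least 2 samples")
    n = len(x) - 1
    triples = Counter(zip(y[1:], x[:-1], y[:-1]))  # counts of (y_{t+1}, x_t, y_t)
    past_xy = Counter(zip(x[:-1], y[:-1]))         # counts of (x_t, y_t)
    next_y = Counter(zip(y[1:], y[:-1]))           # counts of (y_{t+1}, y_t)
    past_y = Counter(y[:-1])                       # counts of y_t
    te = 0.0
    for (y1, x0, y0), count in triples.items():
        p_cond_full = count / past_xy[(x0, y0)]      # p(y_{t+1} | x_t, y_t)
        p_cond_self = next_y[(y1, y0)] / past_y[y0]  # p(y_{t+1} | y_t)
        te += (count / n) * math.log2(p_cond_full / p_cond_self)
    return te


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.randint(0, 1) for _ in range(20_000)]
    y = [0] + x[:-1]  # y_{t+1} = x_t: X drives Y, Y does not drive X
    print(round(transfer_entropy(x, y), 3), round(transfer_entropy(y, x), 3))
```

With 20,000 samples, the estimate for X → Y comes out close to 1 bit and the reverse direction close to 0. Plug-in estimates are biased upward for short series, so longer histories or smaller samples would call for a bias-corrected estimator.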
Critique: Despite the increasing popularity of the concept, the use of transfer entropy to search for and detect causal relations has been questioned by James et al. [11], who presented scenarios in which TE may either underestimate or overestimate the flow of information from one process to another. In particular, the authors present two examples of causal processes implemented with the XOR (exclusive OR, ⊕) logic operation to show that TE underestimates information flow in one example and overestimates it in the other.
The key idea behind the James et al. criticism is that causal relations cannot be captured correctly in networks with polyadic dependencies (where more than one variable causally influences another). For instance, if $Y_{t+1} = X_t \oplus Y_t$, it is not possible to determine the influence of variable X on Y using $TE_{X\to Y}$, which considers X in isolation and independently of variable Y. We should make it clear that the formulation of TE is not the source of these problems. Rather, Shannon's mutual information, $I(X:Y) = H(X) + H(Y) - H(X,Y)$, is by definition dyadic, and cannot capture polyadic correlations. Consider for example a time-independent process between binary variables X, Y, and Z where $Z = X \oplus Y$, with X and Y independent and uniformly distributed. As is well known, the mutual information between X and Z, and also between Y and Z, vanishes: $I(X:Z) = 0$, $I(Y:Z) = 0$ (this is the quintessential one-time-pad encryption process). Thus, while the TE formulation aims to capture a directed (causal) dependency of variables, Shannon information measures only the undirected (correlational) dependency of two variables. As a consequence, problems with TE measurements of causality are unavoidable when using Shannon information, and do not stem from how Schreiber formulated TE to capture directed relations. Note that methods such as partial information decomposition have been proposed to take into account the synergistic influence of a set of variables on the others [12]. However, such higher-order calculations are more costly (possibly exponentially so) and require significantly more data to produce accurate measurements.
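The one-time-pad example can be checked by enumerating the four equally likely outcomes of two independent, uniform bits. In the short Python sketch below, the helper name `mutual_information` is our own. It shows that neither input alone shares any information with $Z = X \oplus Y$, while the pair $(X, Y)$ determines Z completely.

```python
import math
from collections import Counter
from itertools import product


def mutual_information(pairs):
    """I(A:B) in bits, treating every (a, b) pair in `pairs` as equally likely."""
    n = len(pairs)
    joint = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((p_a[a] / n) * (p_b[b] / n)))
        for (a, b), c in joint.items()
    )


# Independent, uniformly distributed binary X and Y; Z = X xor Y.
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
i_xz = mutual_information([(x, z) for x, _, z in outcomes])
i_yz = mutual_information([(y, z) for _, y, z in outcomes])
i_xy_z = mutual_information([((x, y), z) for x, y, z in outcomes])
print(i_xz, i_yz, i_xy_z)  # -> 0.0 0.0 1.0
```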
We first tested how well TE measures capture causality (i.e., correctly attribute the influence of inputs on the output) in a first-order Markov process, for all 16 possible connected 2 → 1 binary logic gates with inputs X and Y and output Z (Fig. 1A). Our calculations show that TE measures are reliable (i.e., correctly identify the information resident in X and Y as the source of influence on the output) in 14 out of 16 gates. TE measures are inaccurate only for the XOR relation $Z_{t+1} = X_t \oplus Y_t$ and the XNOR (or EQUALS) relation $Z_{t+1} = \neg(X_t \oplus Y_t)$. In both cases $TE_{X\to Z} = TE_{Y\to Z} = 0$, even though both X and Y certainly influence the output Z. Thus, out of the 10 relations in which both inputs influence the output (meaning they are polyadic relations), TE measures are perfectly reliable for 8. It is therefore not valid to conclude that TE fails to capture causal relations in all polyadic relations. We repeated similar calculations for a feedback loop network where variable X is connected to another variable Y and to itself (Fig. 1B). Again, TE measurements are inaccurate only for $X_{t+1} = X_t \oplus Y_t$ and $X_{t+1} = \neg(X_t \oplus Y_t)$ ($TE_{X_t\to X_{t+1}} = TE_{Y_t\to X_{t+1}} = 1$, while the entropy of X at any time is 1 bit, $H(X_t) = 1$). Given that TE measurements only fail to identify causal relations in cryptographic polyadic relationships (XOR and XNOR), we now set out to determine how often these cryptographic relations are likely to appear in basic cognitive tasks. If such logic is widely used in cognitive processing, an analysis of causal relationships among neural components (neurons, voxels, etc.) using the transfer entropy is bound to be problematic. If, however, the logic that causes problems for transfer entropy is rare within biological control structures, treatments using the TE concept can largely be trusted.
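The gate census above can be reproduced with a short enumeration. This Python sketch assumes that X and Y are redrawn independently and uniformly at every time step. Under that assumption $Z_t$ is independent of $(X_t, Y_t)$, and $TE_{X\to Z} = I(Z_{t+1} : X_t \mid Z_t)$ reduces to $I(Z_{t+1} : X_t)$. The gate numbering (bit k of the index is the output for the k-th input row) is our own convention.

```python
import math
from collections import Counter
from itertools import product

ROWS = list(product((0, 1), repeat=2))  # input rows (x_t, y_t), each with probability 1/4


def mutual_information(pairs):
    """I(A:B) in bits for equally likely (a, b) outcomes."""
    n = len(pairs)
    joint = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    return sum((c / n) * math.log2((c / n) / ((p_a[a] / n) * (p_b[b] / n)))
               for (a, b), c in joint.items())


def truth_table(index):
    """Gate `index` in 0..15: bit k of `index` is the output for input row ROWS[k]."""
    return {row: (index >> k) & 1 for k, row in enumerate(ROWS)}


def transfer_entropies(table):
    """(TE_{X->Z}, TE_{Y->Z}) for z_{t+1} = f(x_t, y_t) with i.i.d. uniform inputs.

    Because z_t is independent of (x_t, y_t) here, conditioning on z_t drops out.
    """
    te_x = mutual_information([(x, table[(x, y)]) for x, y in ROWS])
    te_y = mutual_information([(y, table[(x, y)]) for x, y in ROWS])
    return te_x, te_y


def uses_both_inputs(table):
    """True if flipping either input can change the output (a polyadic relation)."""
    uses_x = any(table[(0, y)] != table[(1, y)] for y in (0, 1))
    uses_y = any(table[(x, 0)] != table[(x, 1)] for x in (0, 1))
    return uses_x and uses_y


polyadic = [g for g in range(16) if uses_both_inputs(truth_table(g))]
blind = [g for g in polyadic if max(transfer_entropies(truth_table(g))) < 1e-12]
print(len(polyadic), blind)  # -> 10 [6, 9]  (6 is XOR, 9 is XNOR/EQU)
```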
To answer this question, we use a new tool in computational cognitive neuroscience, namely computational models of cognitive processing that can explain task performance in terms of plausible dynamic components [13]. In particular, we use Darwinian evolution to evolve artificial digital brains (also known as Markov Brains or MBs [14]) that can receive sensory stimuli from the environment, process this information, and take actions in response. We evolve Markov Brains that perform two different cognitive tasks whose circuitry has been thoroughly studied: visual motion detection [15] and sound localization [16]. Markov Brains have been shown to be a powerful platform that can unravel the information-theoretic correlates of fitness and network structure in neural networks [17], [18]. Our computational platform enables us to analyze the structure, function, and circuitry of hundreds of evolved digital brains. As a result, we can obtain statistics on the frequency of different causal relations in evolved circuits (as opposed to studying only a single evolutionary outcome). We can also assess how crucial cryptographic operators are for each evolved task, using knockout experiments that measure their contribution to the task. While artificial evolution of control structures ("artificial brains") is not a substitute for the analysis of information flow in biological brains, this investigation should provide some insight into how common the problematic logic elements are likely to be, and how the use of logical elements depends on the task under investigation.

A. Markov Brains
Markov Brains (MBs) are evolvable networks of binary neurons (each takes the value 0 when quiescent and 1 when firing) in which neurons are connected via probabilistic or deterministic logic gates (in this work, we restricted MBs to evolve only 2-to-1 deterministic logic gates). The states of the neurons are updated in a Markov process, i.e., the probability distribution of the neuron states at time step t + 1 depends only on the neuron states at time step t. The connectivity and the underlying logic of the MB's network are encoded in a genome. Thus, we can evolve populations of MBs using a Genetic Algorithm (GA) [19] to perform a variety of cognitive tasks (for a more detailed description of Markov Brain function and implementation, see [14]). In the following sections, we describe two fitness functions designed to evolve motion detection and sound localization circuits in MBs.
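A single Markov Brain update with deterministic 2-to-1 gates can be sketched in a few lines of Python. The `Gate` class and `update` function are hypothetical stand-ins, not the MB implementation of [14]. We assume that each neuron starts an update at 0 and that the outputs of several gates writing to the same neuron are combined with OR. In a full agent, the sensory neurons would then be overwritten with the next stimulus before the following update.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Gate:
    """A deterministic 2-to-1 logic gate (hypothetical encoding, not the MB genome format)."""
    in_a: int                         # index of the first input neuron
    in_b: int                         # index of the second input neuron
    out: int                          # index of the output neuron
    table: Tuple[int, int, int, int]  # outputs for (a, b) = (0,0), (0,1), (1,0), (1,1)


def update(state: Sequence[int], gates: Sequence[Gate]) -> List[int]:
    """One Markov step: the states at t+1 depend only on the states at t.

    Assumptions: every neuron starts the step at 0, and when several gates
    write to the same neuron their outputs are combined with a logical OR.
    """
    nxt = [0] * len(state)
    for g in gates:
        nxt[g.out] |= g.table[2 * state[g.in_a] + state[g.in_b]]
    return nxt


XOR = (0, 1, 1, 0)
AND = (0, 0, 0, 1)
brain = [Gate(0, 1, 2, XOR), Gate(0, 1, 3, AND)]
print(update([1, 1, 0, 0], brain))  # -> [0, 0, 0, 1]
print(update([1, 0, 0, 0], brain))  # -> [0, 0, 1, 0]
```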

B. Motion Detection
The first fitness function is designed to evolve MBs that function as a visual motion detection circuit. The circuit model of motion detection proposed by Reichardt and Hassenstein is based on a delay-and-compare scheme [20]. The main idea behind this model is that a moving object is sensed by two adjacent receptors of the retina at two different time points. Fig. 2A shows a Reichardt detector circuit. Our setup for evolving motion detection circuits is similar to the setup previously used in [21]. In that setup, two sets of inputs are presented to the MB at two consecutive time steps, and the MB classifies the input as preferred direction (PD), stationary, or null direction (ND).
The value of a sensory neuron is 1 when a stimulus is present, and 0 otherwise (see Fig. 2).
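The delay-and-compare scheme and the 16-pattern input space can be illustrated with an idealized binary Reichardt detector. In this Python sketch, receptor A precedes receptor B in the preferred direction (an assumption about orientation). The response is shifted by 1 to match the 0/1/2 output code of the two output neurons, and the function names are hypothetical. Enumerating all 2^4 patterns reproduces the counts of 3 PD, 3 ND, and 10 stationary patterns given for this task.

```python
from collections import Counter
from itertools import product

LABELS = {0: "ND", 1: "stationary", 2: "PD"}


def reichardt(a0: int, b0: int, a1: int, b1: int) -> int:
    """Delay-and-compare response for receptors A, B at time t (a0, b0) and t+1 (a1, b1).

    Each half-circuit multiplies one receptor's delayed signal with the other
    receptor's current signal; the two products are subtracted.
    """
    return a0 * b1 - b0 * a1


def classify(a0: int, b0: int, a1: int, b1: int) -> int:
    """Output code 0: ND, 1: stationary, 2: PD (sum of the two output neurons)."""
    return reichardt(a0, b0, a1, b1) + 1


counts = Counter(LABELS[classify(*pattern)] for pattern in product((0, 1), repeat=4))
print(counts["PD"], counts["ND"], counts["stationary"])  # -> 3 3 10
```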

C. Sound Localization
The second fitness function is designed to evolve MBs that function as a sound localization circuit. Sound localization mechanisms in human auditory systems rely on several cues, such as the interaural time difference or the interaural level difference. The interaural time difference is the difference between the times at which a sound reaches the two ears. Fig. 3A shows a simple schematic of a sound localization model proposed by Jeffress, in which sound reaches the right ear and the left ear at two possibly different times. These stimuli are then delayed in an array of delay components and travel to an array of detector neurons (marked with different colors in Fig. 3A). Each detector fires only if the two signals from different pathways, the left-ear pathway (shown at the bottom) and the right-ear pathway (shown at the top), reach that neuron simultaneously. In our experimental setup, two sequences of stimuli are presented to two different sensory neurons, N0 and N1, that represent receptors in the two ears. The stimuli in the two sequences are lagged or advanced with respect to one another (as shown in Fig. 3B). The agent receives these sequences and should identify which of 5 different angles the sound is coming from. The binary value of a sensory neuron is 1 when a stimulus is present (black blocks in Fig. 3B) and 0 otherwise (white blocks in Fig. 3B). Similar to the schema shown in Fig. 3A, Markov Brains have five designated output neurons (N11-N15), each corresponding to one of the sound sources placed at a specific angle. The colors of the detector neurons (N11-N15) in Fig. 3B match the angle of each sound source in Fig. 3A.
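The Jeffress scheme of delay lines feeding coincidence detectors can be sketched as follows. This is an idealized Python model, not the Markov Brain setup of Fig. 3B. Detector k is assumed to receive the left-ear signal through k delay elements and the right-ear signal through $n-1-k$ elements. The two delay lines therefore run in opposite directions, and adjacent detectors differ in relative delay by two steps. A lag of d steps of the right ear behind the left ear then excites detector $(d + n - 1)/2$ (for example, detector 3 for d = 2 with n = 5).

```python
from typing import List, Sequence


def delayed(signal: Sequence[int], steps: int, length: int) -> List[int]:
    """Delay a binary signal by `steps` time steps (zero-padded, cut to `length`)."""
    return ([0] * steps + list(signal) + [0] * length)[:length]


def jeffress(left: Sequence[int], right: Sequence[int], n_detectors: int = 5) -> List[int]:
    """Firing (0/1) of each coincidence detector in a Jeffress-style array.

    Detector k receives the left-ear signal through k delay elements and the
    right-ear signal through (n_detectors - 1 - k) elements; it fires only if
    both delayed signals are 1 at the same time step.
    """
    length = max(len(left), len(right)) + n_detectors
    fires = []
    for k in range(n_detectors):
        left_k = delayed(left, k, length)
        right_k = delayed(right, n_detectors - 1 - k, length)
        fires.append(int(any(a and b for a, b in zip(left_k, right_k))))
    return fires


# A single click reaching both ears together, or the right ear two steps later.
print(jeffress([1, 0, 0], [1, 0, 0]))  # -> [0, 0, 1, 0, 0]
print(jeffress([1, 0, 0], [0, 0, 1]))  # -> [0, 0, 0, 1, 0]
```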

III. RESULTS
For the motion detection (MD) and sound localization (SL) tasks, we evolved 100 populations each for 10,000 generations, allowing all possible 2 → 1 logic gates as primitives.At the end of each evolutionary run, we isolated one of the genotypes with the highest score from each population and tested its ability to perform the required function.
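The evolutionary loop can be illustrated with a deliberately tiny genetic algorithm. This Python sketch is generic and hypothetical: it is not the GA of [19], and it does not use the Markov Brain genome encoding. Each genome is a 4-bit truth table, and fitness is the fraction of rows matching a target. The top half of the population survives unchanged (elitism), and the bottom half is replaced by one-bit mutants of survivors. Returning the fittest genome at the end mirrors isolating a top-scoring genotype from each population.

```python
import random
from typing import List

TARGET = [0, 1, 1, 0]  # hypothetical task: reproduce the XOR truth table


def fitness(genome: List[int]) -> float:
    """Fraction of truth-table rows the genome gets right."""
    return sum(g == t for g, t in zip(genome, TARGET)) / len(TARGET)


def mutate(genome: List[int], rng: random.Random) -> List[int]:
    """Copy the genome and flip one randomly chosen bit."""
    child = list(genome)
    child[rng.randrange(len(child))] ^= 1
    return child


def evolve(pop_size: int = 20, generations: int = 200, seed: int = 1) -> List[int]:
    """Truncation selection with elitism: the top half survives unchanged and
    its mutated offspring replace the bottom half."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        offspring = [mutate(rng.choice(parents), rng) for _ in range(pop_size - len(parents))]
        pop = parents + offspring
    return max(pop, key=fitness)


best = evolve()
print(best, fitness(best))
```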

A. Evolved Motion Detection Circuits
Out of 100 replicates of this experiment, 89 led to circuits that performed motion detection with perfect fitness. The number of gates in evolved brains varies significantly, with a minimum of 3, a maximum of 21, a mean of 9.75, and a standard deviation of 4.39. The frequency distribution of the types of logic gates per individual brain is shown for these 89 perfect circuits in Fig. 4A. These brains have 868 gates in total, of which 29 are XOR gates and 67 are EQU gates, which together constitute 11.1% of all gates. To gain a better understanding of the distribution and composition of logic gates in evolved motion detection circuits, we performed gate-knockout assays on all 89 brains. We sequentially eliminated each logic gate and re-measured the mutant brain's fitness. This allowed us to estimate which gates were essential to the motion detection function (if the mutant brain's fitness drops) and which gates were redundant (if the mutant brain's fitness remains perfect). The frequency distribution of each type of logic gate per individual brain for essential gates is shown for the 89 perfect brains in Fig. 4B. The total number of essential gates identified by knockout experiments is 237 for 89 brains, among which only 6 are XOR gates and 24 are EQU gates. XOR and EQU gates, which are the only cryptographic logic gates, thus constitute 12.6% of essential gates in the 89 brains. The mean number of essential gates in these brains is 2.78 with a standard deviation of 1.33.
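The knockout assay amounts to removing one gate at a time and re-measuring fitness. The Python sketch below applies it to a hypothetical three-gate toy brain, not one of the evolved circuits. It uses a synchronous update in which gate outputs are OR-ed, and sensory neurons 0 and 1 stay clamped to the stimulus. Fitness is the fraction of stimuli for which output neuron 3 equals the XOR of the two inputs after two updates.

```python
from itertools import product
from typing import List, Sequence, Tuple

Gate = Tuple[int, int, int, Tuple[int, int, int, int]]  # (in_a, in_b, out, truth table)

XOR = (0, 1, 1, 0)
AND = (0, 0, 0, 1)
COPY_A = (0, 0, 1, 1)  # output equals the first input


def step(state: Sequence[int], gates: Sequence[Gate], stimulus: Tuple[int, int]) -> List[int]:
    """Synchronous update (outputs OR-ed together); sensory neurons 0 and 1 stay clamped."""
    nxt = [0] * len(state)
    for a, b, out, table in gates:
        nxt[out] |= table[2 * state[a] + state[b]]
    nxt[0], nxt[1] = stimulus
    return nxt


def fitness(gates: Sequence[Gate]) -> float:
    """Fraction of stimuli for which output neuron 3 equals x0 XOR x1 after two updates."""
    hits = 0
    for x0, x1 in product((0, 1), repeat=2):
        state = [x0, x1, 0, 0, 0]
        for _ in range(2):
            state = step(state, gates, (x0, x1))
        hits += state[3] == (x0 ^ x1)
    return hits / 4


def essential_gates(gates: Sequence[Gate]) -> List[int]:
    """Knock out each gate in turn; a gate is essential if fitness drops without it."""
    base = fitness(gates)
    return [i for i in range(len(gates))
            if fitness(list(gates[:i]) + list(gates[i + 1:])) < base]


# Hypothetical toy brain: gate 0 computes XOR into neuron 2, gate 1 relays it to
# output neuron 3, and gate 2 writes an AND into neuron 4, which nothing reads.
toy = [(0, 1, 2, XOR), (2, 2, 3, COPY_A), (0, 1, 4, AND)]
print(fitness(toy), essential_gates(toy))  # -> 1.0 [0, 1]
```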

B. Evolved Sound Localization Circuits
In our sound localization experiments, 71 runs out of 100 resulted in the evolution of brains with perfect fitness. The number of gates in evolved brains has a minimum of 6, a maximum of 24, a mean of 12.6, and a standard deviation of 3.42. The frequency distribution of the types of logic gates per individual brain is shown for these 71 perfect brains in Fig. 5A. These brains have 893 gates in total, among which we found 110 XOR gates and 108 EQU gates, so that these gates constitute 24.4% of all gates. This percentage is significantly higher than the frequency found in motion detection circuits.
We also performed knockout analyses on all 71 evolved sound localization circuits. The frequency distribution of each type of logic gate per individual brain for essential gates is shown for the 71 perfect brains in Fig. 5B. The total number of essential gates identified by knockout experiments is 365, of which 35 are XOR gates and 45 are EQU gates. Thus, XOR and EQU gates constitute 21.9% of essential gates in the 71 brains. The mean number of essential gates in these brains is 5.14 with a standard deviation of 1.88.

C. Transfer Entropy Misestimations in Evolved Circuits
To summarize how accurate transfer entropy measures are when applied to evolved motion detection circuits, we calculated the directed information flow that transfer entropy captures accurately, as well as its misestimates of information flow. Recall that transfer entropy misestimates the influence of the input neurons on the output of XOR and EQU logic gates by 1 bit (see [11] for more details). In all other logic gates, transfer entropy captures the influence of inputs on outputs accurately: the sum of the transfer entropies from the inputs to the output is 1 bit, and the influence is also attributed correctly to each input. Note also that in ZERO and ONE logic gates (where the output is always 0 or 1 regardless of the input states), the entropy of the output is 0, and transfer entropy correctly calculates the influence of the inputs on the output as 0. Mean values of misestimates and exact measures in the 89 brains, with 95% confidence intervals, are shown for essential gates in Fig. 6A. We also calculated exact measures and misestimates of transfer entropy on the essential gates of each of the 71 circuits evolved for the sound localization task. Mean values of misestimates and exact measures in the 71 brains, with 95% confidence intervals for essential gates, are shown in Fig. 6B.
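The per-gate accounting described above can be written down directly. In this Python sketch, each brain is represented by a hypothetical list of gate-type names for its essential gates. Following the text, ZERO/ONE gates contribute nothing, XOR/EQU gates contribute 1 misestimated bit, and every other gate contributes 1 correctly captured bit. The confidence interval uses a normal approximation (mean ± 1.96 standard errors), which is our assumption; the text does not specify how the intervals in Fig. 6 were computed.

```python
import math
import statistics
from typing import Iterable, List, Tuple

CRYPTOGRAPHIC = {"XOR", "EQU"}
CONSTANT = {"ZERO", "ONE"}


def te_accounting(essential_gates: Iterable[str]) -> Tuple[float, float]:
    """(bits captured, bits missed) by TE for one brain's essential gates."""
    captured = missed = 0.0
    for gate in essential_gates:
        if gate in CONSTANT:
            continue          # output entropy is 0; TE correctly reports 0
        if gate in CRYPTOGRAPHIC:
            missed += 1.0     # per the text, TE misestimates this gate by 1 bit
        else:
            captured += 1.0   # per the text, TE attributes the 1 bit correctly
    return captured, missed


def mean_with_ci(values: List[float]) -> Tuple[float, float]:
    """Mean and half-width of a normal-approximation 95% confidence interval."""
    m = statistics.mean(values)
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m, half


# Hypothetical gate-type lists for three brains (not data from the experiments).
brains = [["AND", "XOR"], ["OR", "ONE", "EQU", "NAND"], ["AND", "OR"]]
per_brain = [te_accounting(b) for b in brains]
print(per_brain)  # -> [(1.0, 1.0), (2.0, 1.0), (2.0, 0.0)]
print(mean_with_ci([c for c, _ in per_brain]))
```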
These results demonstrate that gate type compositions and circuit structures in brains evolved for the motion detection (MD) and sound localization (SL) tasks are vastly different. The mean number of logic gates in the SL task (12.6 gates per brain) is greater than in the MD task (9.75 gates per brain). Moreover, the mean number of essential gates in SL (5.14 gates per brain) is approximately twice that in MD (2.78 gates per brain). More importantly, 21.9% of essential gates in SL are XOR/EQU gates, whereas in MD only 12.6% are. Transfer entropy successfully captures 2.42 bits (averaged across all 89 brains) of information flowing between neurons in a typical motion detection circuit, whereas it fails to capture 0.35 bits. The results for sound localization circuits are more concerning: transfer entropy successfully captures 3.8 bits (averaged across all 71 brains) of information transferred between neurons in a sound localization circuit, while it does not detect 1.13 bits.

IV. DISCUSSION
We used an agent-based evolutionary platform to quantitatively measure the frequency and significance of cryptographic logic gates, i.e., XOR and EQU gates, in evolved digital brains that perform two fundamental and well-studied cognitive tasks: visual motion detection and sound localization. We evolved 100 populations for each of the cognitive tasks and analyzed the brain with the highest fitness at the end of each run. Markov Brains evolved a variety of neural architectures that differ in the number of neurons and logic gates, as well as in the types of logic gates used to perform each of the cognitive tasks. In fact, both modeling [22] and empirical [23] studies have shown that a wide variety of internal parameters in neural circuits can result in the same functional output [24]. Thus, it would be informative and perhaps necessary to examine a variety of circuits that perform the same cognitive task [21].
When evolving ensembles of brains for motion detection, we found XOR/EQU logic gates to be infrequent (1.08 gates per individual brain), whereas XOR/EQU gates are significantly more frequent in sound localization circuits (3.07 gates per brain). In fact, XOR/EQU gates compose only 11% of all evolved gates in motion detection circuits, while they compose 24% of logic gates in sound localization circuits. To assess the significance of cryptographic gates in circuits, we performed gate knockout assays on the brains. We found the number of essential XOR/EQU gates for the sound localization task to be significantly greater than the number of essential XOR/EQU gates for motion detection. In the sound localization task, XOR/EQU gates constitute 21% of all essential gates, whereas in the motion detection task they constitute only 12.6% of essential gates. These results suggest that circuits evolved to perform different types of cognitive tasks require different sets of relations (logic gates). More importantly, the differences between the two tasks resulted in different levels of accuracy when using transfer entropy measures to identify causal relations among neurons.
How accurate, then, is the transfer entropy method if we apply it to digital brains that perform motion detection or sound localization? We directly measured the amount of transfer entropy from the input neurons of each essential logic gate to its output neuron. Because we know exactly which neurons influence the output neurons, and the dependency between them, we can calculate the amount by which transfer entropy measures misestimate information flow in our artificial brains. Transfer entropy successfully captures 2.42 bits of information transferred between neurons per brain in a motion detection circuit, while failing to capture 0.35 bits. In sound localization circuits, transfer entropy successfully captures 3.8 bits on average while failing to detect 1.13 bits. Our results imply that transfer entropy may or may not accurately estimate information flow, depending on the type of circuit or cognitive task it is applied to. This finding highlights the importance of understanding the frequency and types of fundamental processes and relations in biological nervous systems. Performing such an analysis in vivo will remain a daunting task for the foreseeable future, but advances in the evolution of digital cognitive systems may give us a glimpse of them, and perhaps guide the development of other measures of information flow.

Fig. 1. (A) A network where processes X and Y influence the future state of Z. (B) A feedback network in which processes X and Y influence the future state of X.
Fig. 2. (A) A Reichardt detector circuit. In Reichardt detector circuits, the results of the multiplications from each half circuit are subtracted. (B) Schematic examples of three types of input patterns received by the two sensory neurons at two consecutive time steps. Grey squares show the presence of stimuli in those neurons.
Thus, 16 possible sensory patterns can be presented to the MB to classify, of which 3 input patterns are PD, 3 are ND, and the other 10 are stationary patterns. Two neurons are assigned as output neurons of the motion detection circuit. The sum of the binary values of these neurons represents the output of the motion detection circuit: 0: ND, 1: stationary stimulus, 2: PD.

Fig. 3. (A) Schematic of 5 sound sources at different angles with respect to a listener (top view) and the Jeffress model of sound localization. (B) Schematic examples of 5 time sequences of input patterns received by the two sensory neurons (the receptors of the two ears) at three consecutive time steps. Black squares show the presence of stimuli in those neurons.

Fig. 4. (A) Frequency distribution of all gates (essential and redundant) in 89 Markov Brains that perform the motion detection task perfectly. (B) Frequency distribution of essential gates (gates whose knockout results in a fitness drop) in 89 Markov Brains that perform the motion detection task perfectly.

Fig. 5. (A) Frequency distribution of all gates (essential and redundant) in 71 Markov Brains that perform the sound localization task perfectly. (B) Frequency distribution of essential gates (gates whose knockout results in a fitness drop) in 71 Markov Brains that perform the sound localization task perfectly.

Fig. 6. Transfer entropy measures (exact measures and misestimates by transfer entropy) on essential gates of perfect circuits for (A) motion detection and (B) sound localization. Columns show mean values and 95% confidence intervals of misestimates and exact measures per individual brain.