Article

The Consensus Problem in Polities of Agents with Dissimilar Cognitive Architectures

by Damian Radosław Sowinski 1,*, Jonathan Carroll-Nellenback 2, Jeremy DeSilva 3, Adam Frank 2, Gourab Ghoshal 2 and Marcelo Gleiser 4
1 Thayer School of Engineering, Dartmouth College, Hanover, NH 03755, USA
2 Department of Physics and Astronomy, University of Rochester, Rochester, NY 14627, USA
3 Department of Anthropology, Dartmouth College, Hanover, NH 03755, USA
4 Department of Physics and Astronomy, Dartmouth College, Hanover, NH 03755, USA
* Author to whom correspondence should be addressed.
Entropy 2022, 24(10), 1378; https://doi.org/10.3390/e24101378
Submission received: 15 July 2022 / Revised: 9 September 2022 / Accepted: 19 September 2022 / Published: 27 September 2022
(This article belongs to the Special Issue Information Theoretic Measures and Their Applications II)

Abstract

Agents interacting with their environments, machine or otherwise, arrive at decisions based on their incomplete access to data and their particular cognitive architecture, including data sampling frequency and memory storage limitations. In particular, the same data streams, sampled and stored differently, may cause agents to arrive at different conclusions and to take different actions. This phenomenon has a drastic impact on polities—populations of agents predicated on the sharing of information. We show that, even under ideal conditions, polities consisting of epistemic agents with heterogeneous cognitive architectures might not achieve consensus concerning what conclusions to draw from data streams. Transfer entropy applied to a toy model of a polity is analyzed to showcase this effect when the dynamics of the environment are known. As an illustration where the dynamics are not known, we examine empirical data streams relevant to climate and show how the consensus problem manifests.

1. Introduction

Agent success necessitates predicting with fidelity the behavior of treacherous environments in order to optimize action. Information theory is the power tool for quantifying predictability, so any agent that thinks—an epistemic agent—can be modeled as a computing entity attempting to infer information theoretic measures. Channel capacity, Shannon entropy, mutual information, transfer entropy, and other such measures are defined with respect to empirically inaccessible joint probability distributions [1,2,3,4]. Finite computational and sensory resources, such as memory and sampling frequency, respectively, affect the estimation of these distributions [5,6,7,8,9,10,11,12,13,14,15,16,17]. From a Bayesian perspective, this implies that any agent with finite cognitive resources will make judgements that are governed by such limitations [18,19,20,21]. A group of epistemic agents—a polity—predicated on the pooling and dissemination of information is affected by the architectural heterogeneity of its constituents. In this context, a question of interest is to what extent agreement can be reached within polities given such limitations.
Transfer entropy provides a canonical example of how architecture affects conclusions about the structure of the environment. It is well known that sub-sampling of continuous time series can lead to differing estimates of transfer entropy [22,23,24], a failure circumvented by considering rates in the continuum limit $\Delta t \to 0$ [10,24]. Mathematically satisfying as they are, such considerations are not applicable to physical agents whose information storage and processing capabilities are limited by the universe they find themselves attempting to predict and navigate [25]. Szilard and Landauer revealed that information is physical; there is an energy cost in memory for the read-write cycle [26,27,28]. Bekenstein demonstrated that any finite volume of spacetime has an upper limit to the information it can contain [29,30,31]. Bremermann showed that information cannot be processed at an arbitrary rate without loss of fidelity [32]. Herein, these limitations are not swept under the rug—any finite agent must have computational/cognitive abilities limited by finite memory and finite sampling frequency, which we bundle together under the term cognitive architecture. Embracing the inevitability of differing estimates of information theoretic measures, due to variance in cognitive architectures or in the allotment of cognitive resources, gives a different perspective on the origins of disagreement—the Consensus Problem.
To shed light on what this means, we introduce a toy model of a polity, built from the bottom up, in order to show how disagreement amongst agents can emerge under ideal circumstances. We start with an epistemic agent capable of harnessing the correlations in the history of its environment to predict the future. The agent is placed in a simple environment consisting of a pair of stimuli which it samples according to its own limitations. We develop an exactly solvable model for the transfer entropy between the two stimuli and analyze how the inference of information flow depends on the memory capacity and temporal sampling of an observing agent [22,23,24]. Finally, a polity of agents, each having access to identical streams of stimuli, is formed, and their individual estimates are brought together and compared. As a representative illustration, we use time series of CO$_2$ content and atmospheric temperature taken at Mauna Loa to demonstrate this effect, suggesting that polarization amongst epistemic agents on scientifically charged topics may persist irrespective of the quality of data supporting one particular conclusion.

2. From Agents to Polities

In this section, we construct a model of an epistemic agent. We emphasize epistemic since we are not concerned with the actions of the agent, but with how it uses memories of its environment to predict the future and inform itself of how it should act. The agent is placed into an environment in which it has access to two time series of sensory input, which it samples as coupled qualia and stores in memory. Both sampling rate and storage are fixed—a toy model of the intrinsic cognitive architecture of the agent. We give a brief primer on transfer entropy and then construct a toy model of the qualia coming from a complex environment. With a single epistemic agent in hand, we introduce a simple description of a polity, an ensemble of epistemic agents, and show how the consensus problem can emerge in heterogeneous populations of agents.

2.1. Transfer Entropy and Influence

An epistemic agent, simple or complex, exists in a world full of uncertainty. That uncertainty implies that the inference of probability distributions, and the measures derived from them, lies at the core of an agent’s model of the world. Physical limits on processing and/or reaction times can hamper the actions, and existence, of an agent if that model only describes the past states of its environment. A model of the future helps mitigate the vagaries of the world.
Transfer entropy [4,33] is the simplest measure quantifying the extent to which past correlations between two processes reduce the future uncertainty in either process—the extent to which an agent can predict one of the processes given past knowledge of both. Given two processes, $X_1(t)$ and $X_2(t)$, the transfer entropy is defined as the excess information gained by knowing the past history of process 2 in addition to that of process 1,
$$T_{2\to 1} = H[X_{1,t}\mid X_{1,t'<t}] - H[X_{1,t}\mid X_{1,t'<t},\,X_{2,t'<t}], \qquad (1)$$
where $H[X]$ is the hidden information, i.e., the Shannon entropy of X, and $H[X\mid Y]$ is the conditional entropy of X conditioned on Y [1]. In what follows, it is useful to rewrite the transfer entropy in terms of the mutual information $M[X:Y]$, expressed as the Kullback–Leibler divergence between the joint and product distributions, $M[X:Y] = D_{KL}(\rho_{XY}\,\|\,\rho_X\rho_Y)$ [34]. Using this, we can write an alternative expression for the transfer entropy [2,3,35]:
$$T_{2\to 1} = M[X_{1,t} : X_{1,t'<t},\,X_{2,t'<t}] - M[X_{1,t} : X_{1,t'<t}]. \qquad (2)$$
For any finite agent, memory limitations preclude storage of the entire history, so a discrete subset of events is taken at $t' \in \{t - n\Delta t\}_{n=1,\dots,N}$, where N is the number of past snapshots of the world the agent stores when estimating the joint distribution over histories, i.e., its memory. A schematic of the information diagram is shown in Figure 1, highlighting the relevant measures.
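To make the estimation problem concrete, the conditional-entropy form above can be computed directly from sampled histories. The following Python sketch is a minimal plug-in (histogram) estimator of $T_{2\to 1}$ for memory $N=1$; the equal-width binning and bin count are illustrative assumptions, not the exact Gaussian computation used later in the text.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x1, x2, n_bins=8):
    """Plug-in estimate (in bits) of T_{2->1} with memory N = 1.

    Symbolizes each series into n_bins equal-width bins, then computes
    H[X1_t | X1_{t-1}] - H[X1_t | X1_{t-1}, X2_{t-1}] from empirical counts.
    """
    def symbolize(x):
        x = np.asarray(x, dtype=float)
        edges = np.linspace(x.min(), x.max(), n_bins + 1)
        return np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)

    s1, s2 = symbolize(x1), symbolize(x2)
    trips = list(zip(s1[1:], s1[:-1], s2[:-1]))   # (future, past1, past2)

    def entropy(counts):
        p = np.array(list(counts.values()), dtype=float)
        p /= p.sum()
        return -np.sum(p * np.log2(p))

    h_fp1 = entropy(Counter((f, p1) for f, p1, _ in trips))
    h_p1 = entropy(Counter(p1 for _, p1, _ in trips))
    h_fp1p2 = entropy(Counter(trips))
    h_p1p2 = entropy(Counter((p1, p2) for _, p1, p2 in trips))
    # H[X1_t | X1_past] - H[X1_t | X1_past, X2_past]
    return (h_fp1 - h_p1) - (h_fp1p2 - h_p1p2)
```

Such plug-in estimates are biased for small samples, which is one reason the body of the paper works with an exactly solvable Gaussian model instead.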
The processes in question can be treated symmetrically: how does the reduction in uncertainty about $X_1$'s future from knowing $X_2$'s history compare with the reduction in uncertainty about $X_2$'s future from knowing $X_1$'s history? To probe the asymmetry of transfer entropy in these two cases, it is useful to define the ratio of the difference in information flows to the total information flow:
$$\mathcal{T}_{ij} = \frac{T_{i\to j} - T_{j\to i}}{T_{i\to j} + T_{j\to i}}. \qquad (3)$$
This quantity is bounded on the interval $[-1,1]$, saturating iff the transfer entropy vanishes in only one direction, which occurs iff the target process is deterministic. Since $T_{i\to j} \ge 0$, when both directions vanish we set $\mathcal{T}_{ij} \equiv 0$. We note that, in certain cases, the transfer entropy reduces to the Granger causality [36,37], and there is a tendency to interpret it as a causal measure. However, one must be cautious employing such interpretations, given that both measures are based entirely on correlations [36,38,39,40]. In passing, we refer to $\mathcal{T}_{ij}$ as influence, but only with regard to how correlations help shape an agent's predictions, as opposed to any sort of causal influence between the actual processes.
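The normalized-asymmetry convention above can be made explicit with a small helper; the function name is ours, for illustration only.

```python
def influence(t_ij: float, t_ji: float) -> float:
    """Normalized asymmetry of transfer entropy.

    Returns (T_ij - T_ji) / (T_ij + T_ji), with the convention
    that the influence is 0 when both transfer entropies vanish.
    """
    total = t_ij + t_ji
    if total == 0.0:
        return 0.0  # no information flow in either direction
    return (t_ij - t_ji) / total
```

For example, `influence(0.3, 0.1)` is positive (net flow $1\to 2$) while `influence(0.1, 0.3)` has the opposite sign, and the measure saturates at $\pm 1$ when one direction carries no information at all.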

2.2. A Toy Model of Qualia

Consider now a simple agent, one whose senses allow it to observe the world through two stimuli. The experiences of these stimuli by the agent are referred to as qualia (singular quale); both terms are used interchangeably. Placing the agent into an environment provides it with these stimuli, albeit noised by whatever else is happening. One could have in mind a thermostat that senses both temperature and pressure, though, in principle, one can imagine a far more complex agent sampling myriad stimuli and ignoring all but two. Like the temperature and the pressure of the thermostat's environment, the stimuli are coupled together by whatever processes give rise to them in the environment, and the agent will estimate the influence between the two by computing the transfer entropy from its memory and sampling of the environment.
To see this in action, let us consider as our stimuli a pair of positions, $X_1(t)$ and $X_2(t)$, coupled in the environment by the evolution equations
$$\frac{dX_1}{dt} = -\alpha_1 (X_1 - X_2) + \beta_1 \eta_1, \qquad (4)$$
$$\frac{dX_2}{dt} = -\alpha_2 (X_2 - X_1) + \beta_2 \eta_2, \qquad (5)$$
and initial conditions $X_1(t_0)$ and $X_2(t_0)$. These could be the temperature and pressure of the thermostat, but dimensionalized to facilitate comparisons. Since a metric structure is closely tied to similarity judgements when comparing both auditory and visual qualia, position is an appropriate descriptor [41,42]. Here, $\alpha_i, \beta_i \in \mathbb{R}^+$ parameterize the deterministic and stochastic contributions, respectively, to the equations of motion. The latter are associated with the environment acting as a heat reservoir, $\mathcal{R}$, inducing fluctuations in both processes—$\eta_{1,2}$ are independent white noise terms satisfying $\langle\eta_i(t)\rangle = 0$ and $\langle\eta_i(t)\eta_j(t')\rangle = \delta_{ij}\delta(t-t')$. The dimensions of the parameters are $[\eta_i] = \mathrm{Time}^{-1/2}$, $[\alpha_i] = \mathrm{Time}^{-1}$, and $[\beta_i] = \mathrm{Length}\times\mathrm{Time}^{-1/2}$.
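Paths like those discussed below can be generated with a standard Euler–Maruyama integration of Equations (4) and (5). This is a sketch only; the step size, parameter values, and seed are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate_qualia(alpha=(1.0, 0.5), beta=(0.3, 0.3), x0=(1.0, -1.0),
                    dt=1e-3, n_steps=10_000, seed=0):
    """Euler-Maruyama integration of the coupled stimuli:

        dX1 = -alpha1 (X1 - X2) dt + beta1 dW1
        dX2 = -alpha2 (X2 - X1) dt + beta2 dW2

    With beta = 0 the separation X1 - X2 decays at rate alpha1 + alpha2.
    """
    rng = np.random.default_rng(seed)
    x = np.empty((n_steps + 1, 2))
    x[0] = x0
    sqrt_dt = np.sqrt(dt)
    for n in range(n_steps):
        x1, x2 = x[n]
        drift = np.array([-alpha[0] * (x1 - x2), -alpha[1] * (x2 - x1)])
        noise = rng.standard_normal(2) * sqrt_dt  # dW increments
        x[n + 1] = x[n] + drift * dt + np.array(beta) * noise
    return x
```

Note that the weighted combination $\alpha_2 X_1 + \alpha_1 X_2$ is drift-free, which is the center-of-mass Wiener process diagonalized in Appendix A.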
The four coupling constants define natural time and length scales in the model,
$$\tau = \frac{1}{\alpha_1+\alpha_2}, \qquad \lambda = \frac{\beta_1+\beta_2}{\sqrt{\alpha_1+\alpha_2}}, \qquad (6)$$
which are used to dimensionalize all variables; see Appendix A for more details. The latter describes the size of fluctuations in the separation of the two processes, while the former sets the decay of transients. In the left panel of Figure 1, we plot instances of paths generated by Equations (4) and (5) for several values of the coupling constants. We note that the model has an explicit coupling which introduces correlations between the past and present of both processes. The relevant quantities determining the behavior of the information dynamics of the two processes are
$$a = \frac{\alpha_1-\alpha_2}{\alpha_1+\alpha_2}, \qquad b = \frac{\beta_1-\beta_2}{\beta_1+\beta_2}, \qquad (7)$$
with the former being the deterministic and the latter the stochastic asymmetry parameter, both normalized to the range $[-1,1]$.

2.3. Influence between Qualia

Equations (4) and (5) are diagonalized into a Wiener process for the center of mass motion and an Ornstein–Uhlenbeck process for the separation, giving exact solutions. For a more in-depth derivation, please see Appendix A. Given that both are Gaussian processes, the solution to the corresponding Fokker–Planck equation is a multivariate Gaussian, allowing for an exact analytical form of $\mathcal{T}_{12}(\Delta t, N)$. For a Gaussian process, if A and B have covariances $\Sigma_{AA}$ and $\Sigma_{BB}$, respectively, and joint covariance $\Sigma$, then the mutual information is
$$M[A:B] = \frac{1}{2}\log\frac{|\Sigma_{AA}|\,|\Sigma_{BB}|}{|\Sigma|}. \qquad (8)$$
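The Gaussian mutual information above is straightforward to evaluate numerically with log-determinants. The sketch below reports the result in bits and treats scalar variances as 1×1 covariance blocks; it illustrates the formula and is not the paper's own code.

```python
import numpy as np

def gaussian_mutual_information(sigma_aa, sigma_bb, sigma_joint):
    """Mutual information (in bits) between jointly Gaussian blocks A and B:

        M[A:B] = 1/2 log2( |Sigma_AA| |Sigma_BB| / |Sigma| )

    slogdet is used for numerical stability with larger covariance blocks.
    """
    _, logdet_a = np.linalg.slogdet(np.atleast_2d(sigma_aa))
    _, logdet_b = np.linalg.slogdet(np.atleast_2d(sigma_bb))
    _, logdet_j = np.linalg.slogdet(np.atleast_2d(sigma_joint))
    return 0.5 * (logdet_a + logdet_b - logdet_j) / np.log(2)
```

For two standard normal variables with correlation r, this reduces to the familiar $-\tfrac{1}{2}\log_2(1-r^2)$, and it vanishes when the joint covariance is block diagonal.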
Combining this with Equation (2), we show the results for different values of N in Figure 3. In the small-asymmetry regime, $\epsilon \ll 1$, where $(a+b)\sim O(\epsilon)$ and $|b-a|\sim O(\epsilon^2)$, for small timescales $\Delta t \sim O(\epsilon^2)$, short memory $N=1$, and in the limit $t\to\infty$, the influence reads
$$\mathcal{T}_{12}(\Delta t, 1) = \frac{3}{\ln 2}\,(a+b)\left[\Delta t - (a+b)^2\right] + O(\epsilon^4). \qquad (9)$$
We note that there are many ways in which one can define a small-asymmetry limit, and this is simply one of them. When the asymmetry switches sign, agents sampling timescales on opposite sides of $\Delta t = (a+b)^2$ reach contradictory conclusions about the direction of influence. Since the transfer entropy for Gaussian processes is equivalent to the Granger causality, this implies that one can confuse the direction of causation, which, as previously mentioned, is a common misinterpretation of a purely correlative measure [36,37]. Figure 2 indicates that this phenomenon is not limited to the linear regime but is a general feature found across parameter space. A longer discussion of how to compute the influence within the qualia model is presented in Appendix B.
It is beneficial to think of a space of all possible cognitive architectures given fixed values of deterministic and stochastic model parameters. A polity distribution function, discussed in the next section, has this space as its support. If the population has a fairly uniform set of architectures, then this density will be highly peaked. As variance in architecture is introduced, or as cognitive resources are moved around, the polity distribution function spreads out and deforms. When the distribution crosses the vanishing influence surface, agents on either side will necessarily have differing opinions.

2.4. A Toy Model of Polities

A polity can be thought of as a more complex agent, one whose beliefs are a conglomeration of the beliefs of its constituent agents, and whose actions are determined by that distribution. Pooling of information can be done at multiple levels, and in different fashions that distribute weight amongst agent opinions. Here, we consider a simple case where final estimates of an information measure are pooled together but examine the effect of different weight schemes. Bringing together the estimates of many individual agents, the naive expectation is that the law of large numbers might allow the polity to form an estimate with smaller uncertainty. Given a homogeneous population of agents, this would be the case; not so for heterogeneous polities.
Consider then a polity of agents with heterogeneous cognitive architectures, $\mathcal{A} = \{(N_1,\Delta t_1), (N_2,\Delta t_2), \dots\}$, described by a population distribution function, $\rho_N(\Delta t)\,d\Delta t$, giving the fraction of the population with memory N and sampling times between $\Delta t$ and $\Delta t + d\Delta t$. Each agent, assumed independent of the others, has access to the same data, samples and remembers as its nature allows, and estimates the influence between processes, contributing that result to the polity. The distribution of estimates is
$$\rho(\mathcal{T}_{12}) = \sum_{N=1}^{\infty}\int_0^{\infty} d\Delta t\;\rho_N(\Delta t)\,\rho(\mathcal{T}_{12}\mid N,\Delta t), \qquad (10)$$
where an agent's uncertainty in their estimate of $\mathcal{T}_{12}$ can be included in the conditional distribution $\rho(\mathcal{T}_{12}\mid N_a,\Delta t_a)$. More details on the derivation are given in Appendix B.
The distribution $\rho_N(\Delta t)$ describes the heterogeneity of cognitive architectures found in the polity. If the agents are similar enough that they all have near-identical hardware, whether biological or otherwise, one would expect the distribution to be peaked at some $\Delta t_*$, with a large variance. To account for several orders of magnitude in sampling rates, we take the $\Delta t$ dependence to be log-normal, with mean $\langle\Delta t\rangle$ and logarithmic spread $\theta = \tfrac{1}{2}\ln\left(\langle\Delta t^2\rangle/\langle\Delta t\rangle^2\right)$. Meanwhile, the N dependence is taken to be geometric, with mean $\langle N\rangle$,
$$\rho_N(\Delta t) = \frac{e^{-\theta/4}}{\sqrt{4\pi\theta}}\,\frac{1}{\langle N\rangle\,\Delta t}\left(1-\frac{1}{\langle N\rangle}\right)^{N-1}\sqrt{\frac{\langle\Delta t\rangle}{\Delta t}}\,\exp\!\left[-\frac{1}{4\theta}\ln^2\!\frac{\Delta t}{\langle\Delta t\rangle}\right], \qquad (11)$$
where $\Delta t \in (0,\infty)$ and $N \ge 1$. Correlations between sampling and memory, representing constraints such as a favored look-back window time, $T = \langle N\Delta t\rangle = \langle N\rangle\langle\Delta t\rangle$, can easily be incorporated into this framework.
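A polity of architectures can be drawn from this distribution by sampling N geometrically and $\Delta t$ log-normally. The parameterization below (log-variance $2\theta$ with the log-mean shifted to keep the mean at $\langle\Delta t\rangle$) is one consistent choice, stated as an assumption rather than a verbatim transcription of the distribution above.

```python
import numpy as np

def sample_polity(n_agents, mean_n=4, mean_dt=1.0, theta=0.5, seed=0):
    """Draw cognitive architectures (N, dt) for a polity.

    N is geometric with mean <N>; dt is log-normal with mean <dt>.
    The choice sigma^2 = 2*theta, mu = ln<dt> - theta keeps the mean
    of dt equal to <dt>; it is an assumption for illustration.
    """
    rng = np.random.default_rng(seed)
    n = rng.geometric(1.0 / mean_n, size=n_agents)           # memory sizes N >= 1
    sigma = np.sqrt(2.0 * theta)
    mu = np.log(mean_dt) - theta
    dt = rng.lognormal(mean=mu, sigma=sigma, size=n_agents)  # sampling times
    return n, dt
```

Each sampled pair (N, dt) can then be handed the same data stream, with every agent producing its own influence estimate to be pooled into the polity belief distribution.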

3. Results

In this section, we look at the influence measure across a wide range of parameters in our coupled qualia model, as well as differing cognitive architectures. The central contour plot (A) of Figure 2 shows the asymmetry across the full domain of the deterministic parameter, $a \in [-1,1]$, as inferred by agents with a range of sampling timescales, $\Delta t \in [10^{-3}, 10^3]$. In the bottom-left surface plot (B) of Figure 2, we see the corresponding transfer entropy from process 1 to process 2 (blue), and vice versa (pink). The dominating process is the one with the weaker deterministic coupling; the other process is pulled towards it, as seen in the left panel of Figure 1. Three slices, labelled I, II, and III, are taken of these surfaces for constant values of the deterministic asymmetry parameter, namely $a \in \{0.1, 0.2, 0.5\}$, and displayed in the upper-left three panels (C). For the first (last) of these, the transfer entropy in the direction $2\to 1$ ($1\to 2$) is always larger. In both cases, agents would interpret the information flow as always being unidirectional, irrespective of their sampling times $\Delta t$. In the middle panel, however, there is a cross-over of transfer entropy at a particular value of temporal discretization. The exact value of $\Delta t$ at which this occurs is not important in our discussion, as the far-right plot of the influence shows that there is a wide range of deterministic asymmetry parameters sharing this feature. In particular, the direction of influence inferred by agents will depend on their sampling of the past: if the agent samples the processes at longer timescales, they will believe that process 2 holds more information about process 1 than the other way around. Conversely, for shorter sampling timescales, the agents will reach the opposite conclusion.
Re-framing our discussion to an ensemble of agents that do not have an agreed upon sampling timescale, there does not exist a singular conclusion concerning information flows across the entire ensemble: samples of agents drawn from this ensemble will reach contradictory conclusions concerning historical correlations between the processes.
The influence cross-over is not simply dependent on the sampling timescale $\Delta t$, but also on the memory size of the agents, captured by the parameter N. The plot array (D) in Figure 2 shows multiple instances of the central contour plot (A) for varying values of the noise asymmetry parameter, b, and memory size, N. The black line in these panels indicates the region where the influence flips, and the rows show its dependence as a function of increasing N. Since the crossover region is monotonic in N, there are points of constant asymmetry parameters and temporal discretization that nonetheless lead to contradictory conclusions due to different memory capacities. We note that the effect of N appears weaker than that of $\Delta t$, as the locus of influence flips changes with N and appears to saturate by $N \approx 8$.
One could imagine that the ways N and $\Delta t$ affect the transfer entropies conspire so that, for a constant look-back window, $T = N\Delta t$, their effects would cancel out. This is not the case, however, as seen in Figure 3, where the top panel (A) shows $\mathcal{T}_{12}$ for a wide range of $T \in [10^{-2}, 10^2]$ and $N \in \{1, 2, \dots, 10^2\}$, and the bottom panel (B) explores the space of deterministic and stochastic parameters. The black line represents the cross-over point for the asymmetry in transfer entropy, a general feature over a large number of samples. It is interesting to note that, for $a \gtrsim b$, the cross-over curve hugs the (horizontal, dashed) line $N = 2$ for window sizes smaller than the natural timescale, $\tau$, while for larger windows it hugs the (diagonal, dashed) line corresponding to discretizations of the window into increments of size $\tau$. As b grows larger than a, the crossover curve moves to the right, yet remains approximately parallel to this latter line for $N \gtrsim 2$.

The Consensus Problem

The presence of surfaces of vanishing influence in the space of cognitive architectures seems to be a generic feature across many model parameters. For simplicity, let us assume that the computation of influence depends solely on $\Delta t$ and a critical timescale $\tau_*$. If the agent samples on a timescale shorter than $\tau_*$, they compute the influence to be $\mathcal{T}_{12}^{<}$; otherwise, they compute $\mathcal{T}_{12}^{>}$. This can be modeled by a Heaviside step function, $\Theta$, so that $\mathcal{T}_{12}(\Delta t) = (\mathcal{T}_{12}^{>} - \mathcal{T}_{12}^{<})\,\Theta(\Delta t - \tau_*) + \mathcal{T}_{12}^{<}$. Then, the belief distribution over values of influence is bimodal:
$$\rho(\mathcal{T}_{12}) = \frac{1}{2}(1-f)\,\delta(\mathcal{T}_{12}-\mathcal{T}_{12}^{<}) + \frac{1}{2}(1+f)\,\delta(\mathcal{T}_{12}-\mathcal{T}_{12}^{>}), \quad\text{where}\quad f = \operatorname{erf}\!\left[\frac{1}{2\sqrt{\theta}}\left(\ln\frac{\langle\Delta t\rangle}{\tau_*} - \theta\right)\right], \qquad (12)$$
with $\operatorname{erf}(x)$ the Gaussian error function. Both outcomes in the polity belief distribution are weighted by a prefactor determined by the fraction of the polity distribution function lying on either side of the critical sampling time.
A similar result can be found if we allow for uncertainty in the computation of influence, and for polity censuring. Consider the case where an agent with memory below some critical size $N_1$ computes an influence with mean $\mathcal{T}_{12}^{(1)}$ and variance $\sigma_1^2$, while agents above $N_1$ but below $N_2$ compute an influence with mean $\mathcal{T}_{12}^{(2)}$ and variance $\sigma_2^2$. Agents with memory above $N_2$ are censured so that their beliefs do not contribute to the polity. We also assume that the errors are small enough that the influence distributions are Gaussian. The belief distribution in the polity is
$$\rho(\mathcal{T}_{12}) = \frac{1}{\sqrt{2\pi}}\left[\frac{1-x^{N_1}}{1-x^{N_2}}\,\sigma_1^{-1}\,e^{-\frac{(\mathcal{T}_{12}-\mathcal{T}_{12}^{(1)})^2}{2\sigma_1^2}} + \frac{x^{N_1}-x^{N_2}}{1-x^{N_2}}\,\sigma_2^{-1}\,e^{-\frac{(\mathcal{T}_{12}-\mathcal{T}_{12}^{(2)})^2}{2\sigma_2^2}}\right], \quad\text{where}\quad x = 1 - \frac{1}{\langle N\rangle}. \qquad (13)$$
For more details on either calculation, see Appendix B. Again, we find a bimodal distribution with each mode weighted by the structural and statistical properties of the polity. This is quite general; creating distributions with more modes is a straightforward generalization.
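The censored two-camp mixture can be assembled numerically as follows. The helper names are ours, and the weights follow the geometric architecture distribution with $x = 1 - 1/\langle N\rangle$ as in the expression above.

```python
import numpy as np

def polity_belief_weights(mean_n, n1, n2):
    """Mixture weights for the censored polity belief distribution.

    Agents with N <= n1 form camp 1, n1 < N <= n2 form camp 2, and
    N > n2 are censured. Weights follow the geometric architecture
    distribution with x = 1 - 1/<N>, renormalized over the uncensored pool.
    """
    x = 1.0 - 1.0 / mean_n
    w1 = (1.0 - x**n1) / (1.0 - x**n2)
    w2 = (x**n1 - x**n2) / (1.0 - x**n2)
    return w1, w2

def belief_density(t, means, sigmas, weights):
    """Evaluate the Gaussian-mixture belief distribution over influence t."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    for m, s, w in zip(means, sigmas, weights):
        out += w * np.exp(-(t - m) ** 2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)
    return out
```

With modes placed at influence values of opposite sign, the resulting density straddles zero, which is exactly the consensus-problem configuration discussed next.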
A multi-modal polity belief distribution implies that there are camps within the polity with similar beliefs, though not necessarily with similar cognitive architectures. This is not a problem for consensus if the modes all have the same sign for influence—different camps still agree on the direction of influence. However, a problem occurs if the modes have different signs, i.e., the distribution cuts through a line of vanishing influence. In this case, the camps have opposing beliefs, and the polity will not have a consensus belief in the direction of influence. We call this impasse the Consensus Problem, and it should be clear now how the heterogeneity of cognitive architectures within a polity contributes to its emergence. In the next section, we show this emergence in real-world data.

4. An Argument over Climate Data

As an illustration of our framework applicable to empirical data, we use climate data gathered at the Mauna Loa Observatory: CO$_2$ content and local temperatures [43,44,45]. For clarity, we are not attempting to examine whether carbon dioxide content is driving temperature, or vice versa, but to show that the consensus problem can be identified in data coming from a dynamical system whose dynamics need not be known. These data consist of monthly measurements from 1958 to the present, with accuracy at the $10^{-2}$ ppm and 0.1 °C levels. We leverage the uncertainty at the data level of accuracy to bootstrap a polity of heterogeneous agents with different ages. This is done by taking substreams drawn from the full data, with starting dates drawn uniformly from 1958–2010, and introducing Gaussian noise with standard deviation equal to the last significant digit of the original data. We experimented with many homogeneous polities of equal-age agents, each instance using substreams of the same length but not necessarily starting at the present, and found the results robust for lengths up to ~20–30 months, beyond which data volume became an issue. Our bootstrapped results for a polity consisting of many ages are shown in Figure 4.
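A minimal version of this bootstrap might look as follows. The array names, index ranges, and noise levels are placeholders standing in for the monthly Mauna Loa series; this is a sketch of the procedure described above, not the authors' pipeline.

```python
import numpy as np

def bootstrap_substreams(co2, temp, n_agents=100, start_lo=0, start_hi=624,
                         noise_co2=0.01, noise_temp=0.1, seed=0):
    """Bootstrap a polity of agents of different 'ages' from monthly data.

    Each agent receives a substream starting at a uniformly drawn month,
    with Gaussian noise at the level of the last significant digit
    (0.01 ppm for CO2, 0.1 C for temperature, as assumed here).
    """
    rng = np.random.default_rng(seed)
    substreams = []
    for _ in range(n_agents):
        start = int(rng.integers(start_lo, start_hi))
        c = co2[start:] + rng.normal(0.0, noise_co2, size=len(co2) - start)
        t = temp[start:] + rng.normal(0.0, noise_temp, size=len(temp) - start)
        substreams.append((c, t))
    return substreams
```

Each noisy substream pair would then be fed to an agent's transfer entropy estimate, with the agent's (N, Δt) architecture deciding how the stream is sampled and remembered.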
The analysis was done on the raw data, as well as detrended data, representing polities aware of long-term trends. Removing the linear/exponential trends increased the transfer entropies by nearly an order of magnitude, and tightened the error bars. Because influence is insensitive to scale transformations, the former effect did not change the overall shape of the influence curve significantly. Further detrending by the removal of the highest power harmonics had almost no effect on both transfer entropies and influence. For a more detailed description of the procedure, see Appendix C.
Figure 4B shows that $\mathcal{T}_{T,\mathrm{CO}_2}$ does, indeed, depend on the memory usage/look-back window. For $N\Delta t < 6$ months, a period associated with changing weather, an agent as modeled above would infer that temperature influences CO$_2$ content. For $N\Delta t > 6$ months, a period associated with seasonal changes, an agent would infer the opposite influence. The two data streams thus yield contradictory conclusions about which process influences which, depending on the architecture of the agent.
Moving up to polities, consensus on the direction of influence for a random sample of agents is difficult unless the sample is drawn so that all the agents have similar memory usage, given this monthly sampling strategy. The shape of the curve is similar to the second example discussed in the previous section, so we expect the polity belief distribution to be bimodal, and this is supported by Figure 5. There, we see several polity belief distributions for three different values of the average polity memory size. Around $\langle N\rangle \approx 6$, the median value of the influence becomes 0, implying that the population is equally divided in its opinions. For populations with shorter memories, the belief that temperature is influencing CO$_2$ is dominant. Populations with longer memories tend to believe the opposite. Once again, we intend this as a demonstration of how the consensus problem might emerge in polities, and understand that the complexities of weather and climate change cannot be boiled down to two simple data streams.

5. Discussion and Conclusions

Under the assumption that information is physical, i.e., that any realistic epistemic agent will have bounds on its ability to acquire, store, and process information from the environment stemming from its finite cognitive architecture [26,32,46,47,48,49,50,51,52,53], this paper has reinterpreted known problems with transfer entropy estimation as a source of disagreement within populations of such agents that span a large enough volume of possible cognitive architectures. The consensus problem does not stem from any differences in the sensory data different agents are exposed to; exposing agents to identical data streams does not ensure that all will reach the same conclusion. This result is qualitatively obvious to anyone familiar with large groups of humans and has implications for recent studies on opinion formation and polarization [54,55,56]. The results also have implications for presumably model-agnostic machine learning. ML architectures utilize information theoretic measures to learn from data. However, such learning demands efficient storage of belief distributions in lieu of enormous data sets and is therefore subject to historical sampling. Algorithms with memory usage optimized to specific hardware architectures will likely encounter the consensus problem described here when compared with identical algorithms running on different architectures.
Transfer entropy has gained popularity recently in the analysis of group dynamics [57,58,59]. We hope to investigate whether our results hold for more than two data streams, and what happens as we move up in scale from small groups of agents to flocks and coarse-grained polities. Extending this work to larger scales will help clarify the dynamics of group formation and models of interaction in social organisms [60,61]. Beyond that, how does the consensus problem manifest, due to cognitive architectural choices, in the inference of other information theoretic measures? Going back to the notion of agency, epistemic agents have a repertoire of actions available to them, which they must choose from based on the conclusions they draw about their changing environment: in what way does this repertoire affect the availability of sensory data to an agent, introducing new ways for the consensus problem to manifest? These points highlight the need to clarify what is meant by an epistemic agent, as well as to further develop the notion of a polity as a group of agents.

Author Contributions

Conceptualization, D.R.S., A.F. and M.G.; methodology, D.R.S. and J.C.-N.; software, D.R.S. and J.C.-N.; validation, D.R.S. and J.C.-N.; formal analysis, D.R.S. and J.C.-N.; investigation, D.R.S. and J.C.-N.; resources, G.G. and J.D.; data curation, D.R.S.; writing—original draft preparation, D.R.S.; writing—review and editing, A.F., G.G., J.D. and M.G.; visualization, D.R.S. and J.C.-N.; supervision, G.G. and J.D.; project administration, A.F. and M.G. All authors have read and agreed to the published version of the manuscript.

Funding

D.R.S. gratefully acknowledges funding from the National Institutes of Health grant R01EB27577. G.G. and A.F. acknowledge support from the Templeton Foundation under grant number 62417 Information Architectures That Enable Life: The Emergence of Meaning.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data for atmospheric CO$_2$ content and temperature for the Mauna Loa observation site are available at the National Oceanic and Atmospheric Administration website [43] and the World Meteorological Organization website [44]. The code used to generate the data on influence can be shared by the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. The Sensory Model

Our sensory model consists of two coupled stochastic processes that can be written as a vector Ornstein–Uhlenbeck process:
$$\frac{d\mathbf{X}}{dt} = -\mathbf{A}\cdot\mathbf{X} + \mathbf{B}\cdot\boldsymbol{\eta}, \quad\text{where}\quad \mathbf{A} = \begin{pmatrix}\alpha_1 & -\alpha_1\\ -\alpha_2 & \alpha_2\end{pmatrix} \quad\text{and}\quad \mathbf{B} = \begin{pmatrix}\beta_1 & 0\\ 0 & \beta_2\end{pmatrix}. \qquad (A1)$$
The η are uncorrelated white noise with first and second moments
$$\langle\boldsymbol{\eta}(t)\rangle = \mathbf{0} \quad\text{and}\quad \langle\boldsymbol{\eta}(t)\cdot\boldsymbol{\eta}(t')^{\mathsf{T}}\rangle = \mathbb{1}\,\delta(t-t'). \qquad (A2)$$
In what follows, we will show how to scale the variables to reveal the relevant model parameters. Then, we will diagonalize the system and give interpretations to the new coordinates. Lastly, we integrate the system using standard techniques.

Appendix A.1. Scaling and Dimensionalization

The dimensions of all relevant quantities are
$$[X_i] = \mathrm{L}, \qquad [t] = \mathrm{T}, \qquad [\alpha_i] = \mathrm{T}^{-1}, \qquad [\beta_i] = \mathrm{L}\,\mathrm{T}^{-1/2}, \qquad [\eta_i] = \mathrm{T}^{-1/2},$$
with which one can construct a time scale and a length scale,
$$\tau = \frac{1}{\alpha_1+\alpha_2} \qquad\text{and}\qquad \ell = \frac{\beta_1+\beta_2}{\sqrt{\alpha_1+\alpha_2}},$$
which are used to construct the dimensionless (tilde) quantities
$$X_i = \ell\,\tilde{X}_i, \qquad t = \tau\,\tilde{t}.$$
Meanwhile, the noise scales as $\eta_i(t) = \tau^{-1/2}\,\tilde{\eta}_i(\tilde{t})$. Introducing the dimensionless parameters defined on $[-1,1]$,
$$a = \frac{\alpha_1-\alpha_2}{\alpha_1+\alpha_2} \qquad\text{and}\qquad b = \frac{\beta_1-\beta_2}{\beta_1+\beta_2},$$
the linear operators are rendered dimensionless as well:
$$\tilde{\mathsf{A}} = \tau\,\mathsf{A} = \frac{1}{2}\begin{pmatrix} 1+a & -(1+a) \\ -(1-a) & 1-a \end{pmatrix}, \qquad \tilde{\mathsf{B}} = \frac{\tau^{1/2}}{\ell}\,\mathsf{B} = \frac{1}{2}\begin{pmatrix} 1+b & 0 \\ 0 & 1-b \end{pmatrix}.$$
With these, the equations of motion become dimensionless,
$$\frac{d\tilde{\mathbf{X}}}{d\tilde{t}} = -\tilde{\mathsf{A}}\cdot\tilde{\mathbf{X}} + \tilde{\mathsf{B}}\cdot\tilde{\boldsymbol{\eta}},$$
and their behavior is controlled by the parameters $(a,b)$. These equations have the same form as the original ones, so we proceed by dropping the tildes, with the understanding that we are now working in the dimensionless variables.
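The scaling maps above can be summarized in a few lines. This sketch assumes the length scale $\ell = (\beta_1+\beta_2)/\sqrt{\alpha_1+\alpha_2}$ implied by the dimensional analysis; the function name is our own:

```python
import numpy as np

def dimensionless_params(alpha1, alpha2, beta1, beta2):
    """Map raw couplings to the natural scales and the (a, b) parameters."""
    tau = 1.0 / (alpha1 + alpha2)                      # natural timescale
    ell = (beta1 + beta2) / np.sqrt(alpha1 + alpha2)   # natural length scale
    a = (alpha1 - alpha2) / (alpha1 + alpha2)          # deterministic asymmetry
    b = (beta1 - beta2) / (beta1 + beta2)              # stochastic asymmetry
    return tau, ell, a, b
```

Both $a$ and $b$ land in $[-1,1]$ by construction, which is what makes them convenient coordinates for the parameter scans in the figures.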

Appendix A.2. Diagonalization and Translations

Note that $\mathsf{A}$ is idempotent, $\mathsf{A}^2 = \mathsf{A}$, so it is a projection operator. What does it project onto? Idempotency implies two eigenvalues, 0 and 1, whose eigenvectors obey a standard Wiener process and an Ornstein–Uhlenbeck process, respectively. The latter spans the translationally invariant subspace, since the equations of motion are invariant under the global translation $X_i \to X_i + c$. Let us go through this in more detail.
Using the decomposition $\mathbf{X} = \mathsf{A}\cdot\mathbf{X} + (\mathbb{1}-\mathsf{A})\cdot\mathbf{X}$, we define the eigenvectors
$$\mathbf{Y}_1 = \mathsf{A}\cdot\mathbf{X},$$
$$\mathbf{Y}_0 = (\mathbb{1}-\mathsf{A})\cdot\mathbf{X}.$$
Under a translation, $\mathbf{X} \to \mathbf{X} + c\,\mathbf{1}$, they transform as $\mathbf{Y}_1 \to \mathbf{Y}_1$ and $\mathbf{Y}_0 \to \mathbf{Y}_0 + c\,\mathbf{1}$. From this, we see that $\mathbf{Y}_1$ is translationally invariant and proportional to the displacement $X_1 - X_2$, while $\mathbf{Y}_0$ represents the center-of-mass motion. By projecting the equations of motion, we find that the two satisfy
$$\frac{d\mathbf{Y}_1}{dt} = -\mathbf{Y}_1 + \mathsf{A}\cdot\mathsf{B}\cdot\boldsymbol{\eta},$$
$$\frac{d\mathbf{Y}_0}{dt} = (\mathbb{1}-\mathsf{A})\cdot\mathsf{B}\cdot\boldsymbol{\eta},$$
confirming our previous assertion that the translationally invariant eigenvector is an Ornstein–Uhlenbeck process, and the center of mass is a standard Wiener process. Note that the two processes are not independent, since the cross-correlation between the transformed noises is non-vanishing—the two talk to one another through the heat bath. We say that they are uncoupled deterministically but remain coupled stochastically through the cross-correlation of the noise.

Appendix A.3. Solution by Integration Factor

The equations of motion can be solved using an integration factor, $\mathsf{G}(t) = e^{-\mathsf{A}t}$, giving
$$\mathbf{X}(t) = \mathsf{G}(t-t_0)\cdot\mathbf{X}_0 + \int_{t_0}^{t} ds\; \mathsf{G}(t-s)\cdot\mathsf{B}\cdot\boldsymbol{\eta}(s).$$
Since $\mathsf{A}$ is idempotent, the integration factor can be written in terms of scalar exponential functions:
$$e^{-\mathsf{A}t} = \mathbb{1} - \mathsf{A}t + \frac{1}{2}\mathsf{A}^2 t^2 - \frac{1}{3!}\mathsf{A}^3 t^3 + \cdots = \mathbb{1} - \mathsf{A} + e^{-t}\mathsf{A}.$$
This form shows us that, at long times, the operator approaches the projection onto the center-of-mass subspace. From the solution, one may compute moments and cumulants of the process.
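The collapse of the exponential series under idempotency can be checked directly; the truncation depth below is arbitrary:

```python
import numpy as np

def green_function(A, t, terms=40):
    """Integration factor G(t) = exp(-A t) summed from its power series."""
    G = np.eye(2)
    term = np.eye(2)
    for n in range(1, terms):
        term = term @ (-A * t) / n   # accumulates (-A t)^n / n!
        G = G + term
    return G

a = 0.3
A = 0.5 * np.array([[1 + a, -(1 + a)], [-(1 - a), 1 - a]])
t = 1.7
# Idempotency collapses the series: exp(-A t) = 1 - A + exp(-t) A
closed = np.eye(2) - A + np.exp(-t) * A
assert np.allclose(green_function(A, t), closed)
```

As $t \to \infty$ the closed form tends to $\mathbb{1} - \mathsf{A}$, the projector onto the center-of-mass subspace, exactly as stated above.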

Appendix B. Information Theory

Appendix B.1. Statistics

Here, we compute the mean and covariance of our vector process. The mean is
$$\boldsymbol{\mu}(t) = \langle\mathbf{X}(t)\rangle = (\mathbb{1}-\mathsf{A})\cdot\mathbf{X}_0 + e^{-(t-t_0)}\,\mathsf{A}\cdot\mathbf{X}_0.$$
As stated, the mean approaches the center of mass of the system. For completeness, in the initial coordinate system,
$$\mu_1(t) = \frac{1}{2}\big(X_1(t_0)+X_2(t_0)\big) + \frac{\big(1-e^{-(t-t_0)}\big)a - e^{-(t-t_0)}}{2}\big(X_2(t_0)-X_1(t_0)\big),$$
$$\mu_2(t) = \frac{1}{2}\big(X_1(t_0)+X_2(t_0)\big) + \frac{\big(1-e^{-(t-t_0)}\big)a + e^{-(t-t_0)}}{2}\big(X_2(t_0)-X_1(t_0)\big).$$
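A quick numerical consistency check of the componentwise means against the matrix form (our own sketch, with the signs as reconstructed here and an arbitrary parameter point):

```python
import numpy as np

a = 0.4
A = 0.5 * np.array([[1 + a, -(1 + a)], [-(1 - a), 1 - a]])
X0 = np.array([1.0, 3.0])
dt_elapsed = 0.8                 # t - t0 in dimensionless units

# Matrix form of the mean
mu = (np.eye(2) - A) @ X0 + np.exp(-dt_elapsed) * (A @ X0)

# Componentwise expressions
s, d = X0[0] + X0[1], X0[1] - X0[0]
e = np.exp(-dt_elapsed)
mu1 = 0.5 * s + ((1 - e) * a - e) / 2 * d
mu2 = 0.5 * s + ((1 - e) * a + e) / 2 * d
assert np.allclose(mu, [mu1, mu2])
```

In the long-time limit both components of the mean coincide, reflecting the approach to the (weighted) center of mass.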
It takes some work to find a decent form for the covariance,
$$\Sigma(t,t') = \Big\langle\big(\mathbf{X}(t)-\langle\mathbf{X}(t)\rangle\big)\cdot\big(\mathbf{X}(t')-\langle\mathbf{X}(t')\rangle\big)^T\Big\rangle = \int_{t_0}^{t}\! ds \int_{t_0}^{t'}\! ds'\; \mathsf{G}(t-s)\cdot\mathsf{B}\cdot\langle\boldsymbol{\eta}(s)\cdot\boldsymbol{\eta}(s')^T\rangle\cdot\mathsf{B}^T\cdot\mathsf{G}(t'-s')^T = \int_{t_0}^{\min(t,t')}\! ds\; \mathsf{G}(t-s)\cdot\mathsf{B}\cdot\mathsf{B}^T\cdot\mathsf{G}(t'-s)^T.$$
Since $\Sigma(t,t')^T = \Sigma(t',t)$, without loss of generality we choose $t' = t + \Delta t > t$, so that
$$\begin{aligned}
\Sigma(t, t+\Delta t) ={}& (\mathbb{1}-\mathsf{A})\cdot\mathsf{B}^2\cdot(\mathbb{1}-\mathsf{A})^T\,(t-t_0) + \mathsf{A}\cdot\mathsf{B}^2\cdot(\mathbb{1}-\mathsf{A})^T \\
&+ e^{-\Delta t}\Big[(\mathbb{1}-\mathsf{A})\cdot\mathsf{B}^2\cdot\mathsf{A}^T + \tfrac{1}{2}\,\mathsf{A}\cdot\mathsf{B}^2\cdot\mathsf{A}^T\Big] \\
&- \Big[e^{-\Delta t}(\mathbb{1}-\mathsf{A})\cdot\mathsf{B}^2\cdot\mathsf{A}^T + \mathsf{A}\cdot\mathsf{B}^2\cdot(\mathbb{1}-\mathsf{A})^T\Big]\, e^{-(t-t_0)} \\
&- \tfrac{1}{2}\, e^{-\Delta t}\,\mathsf{A}\cdot\mathsf{B}^2\cdot\mathsf{A}^T\, e^{-2(t-t_0)}.
\end{aligned}$$
The first line is dominant, representing diffusion of the center of mass together with a constant correlation. The second line represents short-time correlations that decay on a timescale $\tau$. The third and fourth lines are transients, decaying on timescales $\tau$ and $\tau/2$, respectively. In the asymptotic limit, $t - t_0 \gg \tau$, only the short-time correlation terms remain, subdominant to the linear term. Since the model is Gaussian, all higher-order moments can be constructed from these first two.
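The closed form can be cross-checked against direct quadrature of the covariance integral. This sketch uses our reading of the signs and an illustrative parameter point; the quadrature grid is arbitrary:

```python
import numpy as np

a, b = 0.3, 0.5
A = 0.5 * np.array([[1 + a, -(1 + a)],
                    [-(1 - a), 1 - a]])
B2 = np.diag([(0.5 * (1 + b)) ** 2, (0.5 * (1 - b)) ** 2])   # B.B^T
P0 = np.eye(2) - A
G = lambda t: P0 + np.exp(-t) * A       # integration factor, idempotent A

t_el, dT = 2.0, 0.4                     # elapsed time t - t0, and Delta t

# Direct trapezoidal quadrature of
#   Sigma(t, t + dT) = int_0^{t_el} G(t_el - s) B^2 G(t_el + dT - s)^T ds
s = np.linspace(0.0, t_el, 20001)
vals = np.array([G(t_el - si) @ B2 @ G(t_el + dT - si).T for si in s])
direct = 0.5 * np.sum((vals[1:] + vals[:-1]) * np.diff(s)[:, None, None],
                      axis=0)

# Closed form, term by term
e, E = np.exp(-dT), np.exp(-t_el)
closed = (P0 @ B2 @ P0.T * t_el + A @ B2 @ P0.T
          + e * (P0 @ B2 @ A.T + 0.5 * A @ B2 @ A.T)
          - (e * (P0 @ B2 @ A.T) + A @ B2 @ P0.T) * E
          - 0.5 * e * (A @ B2 @ A.T) * E ** 2)
assert np.allclose(direct, closed, atol=1e-6)
```

The agreement confirms the bookkeeping of the five terms: linear diffusion, constant correlation, the $e^{-\Delta t}$ short-time piece, and the two transients.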

Appendix B.2. Belief Distribution

The EA’s belief distribution over possible paths is required. Here, we construct the multivariate Gaussian describing the probability of a finite history appearing in our model. With a time discretization, $\Delta t$, we write the two histories $\mathbf{x}_N = [X_1(t), X_2(t), X_1(t-\Delta t), X_2(t-\Delta t), \ldots, X_1(t-N\Delta t), X_2(t-N\Delta t)]$, the mean history $\boldsymbol{\mu}_N = [\mu_1(t), \mu_2(t), \mu_1(t-\Delta t), \mu_2(t-\Delta t), \ldots, \mu_1(t-N\Delta t), \mu_2(t-N\Delta t)]$, and the covariance history
$$\Sigma_N(\Delta t) = \begin{pmatrix}
\Sigma(t,t) & \Sigma(t,\,t-\Delta t) & \cdots & \Sigma(t,\,t-N\Delta t) \\
\Sigma(t-\Delta t,\,t) & \Sigma(t-\Delta t,\,t-\Delta t) & \cdots & \Sigma(t-\Delta t,\,t-N\Delta t) \\
\vdots & \vdots & \ddots & \vdots \\
\Sigma(t-N\Delta t,\,t) & \Sigma(t-N\Delta t,\,t-\Delta t) & \cdots & \Sigma(t-N\Delta t,\,t-N\Delta t)
\end{pmatrix}.$$
We mention that this form is amenable to numerical computation, since removing certain rows and columns gives the matrices necessary for computing the mutual information. Though we do not use it explicitly, for completeness, the multivariate Gaussian distribution over the two histories reads
$$\rho(\mathbf{x}; \boldsymbol{\mu}_N, \Sigma_N) = \frac{1}{(2\pi)^{N}\,|\Sigma_N|^{1/2}} \exp\left[-\frac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu}_N)\cdot\Sigma_N^{-1}\cdot(\mathbf{x}-\boldsymbol{\mu}_N)\right].$$

Appendix B.3. Information Measures

With the belief distribution, we can exactly compute the transfer entropy. Consider subdividing our history vector into two disjoint parts: x = [ x A x B ] . Then, the history mean breaks up similarly, while the history covariance breaks into
$$\Sigma = \begin{pmatrix} \Sigma_{AA} & \Sigma_{AB} \\ \Sigma_{BA} & \Sigma_{BB} \end{pmatrix}.$$
With these, the mutual information between the disjoint subsets of events is the well known result for multivariate Gaussians,
$$M[A:B] = \frac{1}{2}\log\frac{|\Sigma_{AA}|\,|\Sigma_{BB}|}{|\Sigma|}.$$
It is clear that, if the subsets are uncorrelated, $|\Sigma| = |\Sigma_{AA}|\,|\Sigma_{BB}|$, the argument of the logarithm is unity and the mutual information vanishes. Otherwise, $|\Sigma| = |\Sigma_{AA}|\,|\Sigma_{BB} - \Sigma_{BA}\cdot\Sigma_{AA}^{-1}\cdot\Sigma_{AB}|$. For our purposes, we must figure out how to use this with the covariance history. As it turns out, we simply keep the rows and columns of the covariance history corresponding to $X_1$ and $X_2$ at whatever times are included in the history.
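The determinant formula is straightforward to implement; a minimal sketch of our own (the index sets and covariance values below are illustrative):

```python
import numpy as np

def gaussian_mi(Sigma, idx_a, idx_b):
    """Mutual information (in nats) between two disjoint index sets of a
    multivariate Gaussian: M = 0.5 * log(|S_AA| |S_BB| / |S|)."""
    S = Sigma[np.ix_(idx_a + idx_b, idx_a + idx_b)]
    Saa = Sigma[np.ix_(idx_a, idx_a)]
    Sbb = Sigma[np.ix_(idx_b, idx_b)]
    return 0.5 * np.log(np.linalg.det(Saa) * np.linalg.det(Sbb)
                        / np.linalg.det(S))

# Uncorrelated blocks -> zero mutual information
S0 = np.diag([1.0, 2.0, 3.0])
assert abs(gaussian_mi(S0, [0], [1, 2])) < 1e-12
# Correlation -> positive mutual information
S1 = np.array([[1.0, 0.6, 0.0],
               [0.6, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
assert gaussian_mi(S1, [0], [1, 2]) > 0.0
```

Selecting `idx_a` and `idx_b` is exactly the "keep the rows and columns" prescription above applied to the covariance history.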
Using Equation (A16) in the mutual information form of the transfer entropy, let us do the $N=1$ calculation explicitly, writing $t' = t - \Delta t$:
$$\begin{aligned}
T_{2\to 1} &= M[X_{1,t} : X_1 X_{2,t'}] - M[X_{1,t} : X_{1,t'}] \\
&= \frac{1}{2}\log\frac{\Sigma_{11}(t,t)\begin{vmatrix} \Sigma_{11}(t',t') & \Sigma_{12}(t',t') \\ \Sigma_{21}(t',t') & \Sigma_{22}(t',t') \end{vmatrix}}{\begin{vmatrix} \Sigma_{11}(t,t) & \Sigma_{11}(t,t') & \Sigma_{12}(t,t') \\ \Sigma_{11}(t,t') & \Sigma_{11}(t',t') & \Sigma_{12}(t',t') \\ \Sigma_{12}(t,t') & \Sigma_{12}(t',t') & \Sigma_{22}(t',t') \end{vmatrix}} - \frac{1}{2}\log\frac{\Sigma_{11}(t,t)\,\Sigma_{11}(t',t')}{\begin{vmatrix} \Sigma_{11}(t,t) & \Sigma_{11}(t,t') \\ \Sigma_{11}(t,t') & \Sigma_{11}(t',t') \end{vmatrix}} \\
&= \frac{1}{2}\log\frac{\begin{vmatrix} \Sigma_{11}(t',t') & \Sigma_{12}(t',t') \\ \Sigma_{21}(t',t') & \Sigma_{22}(t',t') \end{vmatrix}\begin{vmatrix} \Sigma_{11}(t,t) & \Sigma_{11}(t,t') \\ \Sigma_{11}(t,t') & \Sigma_{11}(t',t') \end{vmatrix}}{\Sigma_{11}(t',t')\begin{vmatrix} \Sigma_{11}(t,t) & \Sigma_{11}(t,t') & \Sigma_{12}(t,t') \\ \Sigma_{11}(t,t') & \Sigma_{11}(t',t') & \Sigma_{12}(t',t') \\ \Sigma_{12}(t,t') & \Sigma_{12}(t',t') & \Sigma_{22}(t',t') \end{vmatrix}}.
\end{aligned}$$
One can play around with this expression but, in the end, cannot escape taking the determinants and creating an algebraic mess of gargantuan size—the expressions for $N > 1$ are even more space-consuming, since the determinants can gain up to two additional rows and columns each time $N$ goes up by 1.
To compute the transfer entropy in the opposite direction, $T_{1\to 2}$, we note that relabelling $1 \leftrightarrow 2$ is equivalent to flipping the signs of the asymmetry parameters. Rather than opting for the full analytical expressions, this is where we went the numerical route, implementing the determinants pointwise across a finite domain of all relevant variables. This allowed us to compute values of $N$ up to about 100, at which point computational time became a bottleneck that prevented further investigation. The limits $N \to \infty$, $\Delta t \to 0$ are clearly interesting, since they correspond to ideal epistemic agents. With the two information flows, the influence is easily computed.
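For concreteness, the $N=1$ transfer entropy can be evaluated from any 3×3 covariance of $[X_1(t), X_1(t'), X_2(t')]$. The covariance entries below are made up for illustration, not derived from the model:

```python
import numpy as np

def transfer_entropy_n1(S):
    """T_{2->1} for memory N=1 from the 3x3 covariance of
    [X1(t), X1(t'), X2(t')], in the mutual-information form."""
    det = np.linalg.det
    # M[X1,t : X1 X2, t']
    m_joint = 0.5 * np.log(S[0, 0] * det(S[1:, 1:]) / det(S))
    # M[X1,t : X1, t']
    m_self = 0.5 * np.log(S[0, 0] * S[1, 1] / det(S[:2, :2]))
    return m_joint - m_self

# X2's past carries extra information about X1's present
S = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
assert transfer_entropy_n1(S) > 0.0
# If X2(t') is uncorrelated with X1 at both times, nothing extra flows
S_indep = np.array([[1.0, 0.5, 0.0],
                    [0.5, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
assert abs(transfer_entropy_n1(S_indep)) < 1e-12
```

Exchanging the roles of the two streams (relabelling $1 \leftrightarrow 2$) gives $T_{1\to 2}$, and the difference of the two flows gives the influence.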

Appendix B.4. From Agent to Polity

A polity consists of multiple epistemic agents which, by pooling measures, can be considered a new type of agent with its own belief distribution. We consider only non-interacting agents whose belief distributions are independent of one another. Each agent is privy to the same data streams, though they may not sample them at the same rates due to differing cognitive architectures. We denote the set of agents with their respective cognitive architectures as $\mathcal{A} = \{(N_1, \Delta t_1), (N_2, \Delta t_2), \ldots\}$. Each agent estimates the influence, and if they are certain of their estimate, the conditional distribution collapses to a Dirac delta, $\rho(T_{12}|N_a, \Delta t_a) = \delta(T_{12} - T_{12}^{N_a,\Delta t_a})$; for more sophisticated agents, we imagine the distribution being peaked, but with a variance stemming from the known uncertainties and error propagation inherent to the computation.
The polity belief distribution is found using Bayes’ Rule:
$$\begin{aligned}
\rho(T_{12}) &= \sum_{a\in\mathcal{A}} \rho(T_{12}|N_a, \Delta t_a)\, p_a \\
&= \sum_{a\in\mathcal{A}} \sum_{N=1}^{\infty} \int_0^{\infty}\! d\Delta t\; \delta_{N,N_a}\,\delta(\Delta t - \Delta t_a)\, \rho(T_{12}|N_a, \Delta t_a)\, p_a \\
&= \sum_{N=1}^{\infty} \int_0^{\infty}\! d\Delta t \left[\sum_{a\in\mathcal{A}} \delta_{N,N_a}\,\delta(\Delta t - \Delta t_a)\, p_a\right] \rho(T_{12}|N, \Delta t) \\
&= \sum_{N=1}^{\infty} \int_0^{\infty}\! d\Delta t\; \rho_N(\Delta t)\, \rho(T_{12}|N, \Delta t),
\end{aligned}$$
where, in the last line, we have introduced the polity architecture function, so that the fraction of the polity with memory length $N$ and sampling time between $\Delta t$ and $\Delta t + d\Delta t$ is $\rho_N(\Delta t)\, d\Delta t$. Here, $p_a$ is a weight fraction describing how much agent $a$'s beliefs contribute to the beliefs of the polity. In a pure democracy, for example, $p_a = 1/|\mathcal{A}|$, so that each agent contributes equally. In an autocracy, the weights would be clustered around a few agents, or a single one—the dictator. Many such games can be played by choosing different functional forms for these weights.
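A toy version of this aggregation, with discretized near-delta beliefs and hypothetical agent conclusions (all numbers below are illustrative):

```python
import numpy as np

grid = np.linspace(-1.0, 1.0, 201)      # discretized support for T12

def trapezoid(f, x):
    """Simple trapezoidal quadrature (kept dependency-light on purpose)."""
    return 0.5 * np.sum((f[1:] + f[:-1]) * np.diff(x))

def agent_belief(center, width=0.05):
    """Peaked (near-delta) belief of one agent over the influence T12."""
    rho = np.exp(-0.5 * ((grid - center) / width) ** 2)
    return rho / trapezoid(rho, grid)

# Hypothetical polity: two agents conclude T12 > 0, one concludes T12 < 0
beliefs = [agent_belief(0.3), agent_belief(0.3), agent_belief(-0.4)]
weights = np.full(len(beliefs), 1.0 / len(beliefs))    # pure democracy

rho_polity = sum(w * rho for w, rho in zip(weights, beliefs))
assert abs(trapezoid(rho_polity, grid) - 1.0) < 1e-9   # still normalized
```

An autocracy would simply replace `weights` with a vector concentrated on one agent, collapsing the polity belief onto the dictator's.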
Given the geometric–lognormal distribution defined in the text, let us work out the first example for $\rho(T_{12}|N,\Delta t) = \delta(T_{12} - T_{12}^{N,\Delta t})$—agents with certainty. First, we marginalize over $N$, leaving us with just the lognormal distribution, $n(\Delta t)$:
$$\rho(T_{12}) = \int_0^{\infty}\! d\Delta t\; n(\Delta t)\, \delta\Big(T_{12} - \big(T_{12}^{>} - T_{12}^{<}\big)\Theta(\Delta t - \tau_*) - T_{12}^{<}\Big) = \delta\big(T_{12} - T_{12}^{<}\big) \int_0^{\tau_*}\! d\Delta t\; n(\Delta t) + \delta\big(T_{12} - T_{12}^{>}\big) \int_{\tau_*}^{\infty}\! d\Delta t\; n(\Delta t).$$
The partial integrals of the lognormal distribution are known in terms of error functions. For the second example, we first marginalize over $\Delta t$ to get
$$\rho(T_{12}) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(T_{12} - \langle T_{12}\rangle_1)^2}{2\sigma_1^2}} \sum_{N=1}^{N_1} \frac{1}{\langle N\rangle}\left(1 - \frac{1}{\langle N\rangle}\right)^{N-1} + \frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(T_{12} - \langle T_{12}\rangle_2)^2}{2\sigma_2^2}} \sum_{N=N_1+1}^{N_2} \frac{1}{\langle N\rangle}\left(1 - \frac{1}{\langle N\rangle}\right)^{N-1}.$$
The remaining sums are trivial.

Appendix C. Mauna Loa Data

We acquired our data for atmospheric CO2 content and temperature at the Mauna Loa observation site from the NOAA and WMO websites [43,44]. The data span the years 1958–2021, with measurements taken at monthly intervals. The CO2 data are measured in parts per million, with an accuracy of $10^{-2}$ ppm. The temperature data represent the mean daily temperature in Celsius, with an accuracy of $10^{-1}$ °C.

Appendix C.1. Detrending

We wanted to examine the effect of detrending the raw data on our results and were surprised to find only small quantitative changes. Trends were removed in steps, and at each step we bootstrapped the detrended data (see below) and ran our analysis. As a first step, we performed a boxcar average with a window of one year and removed it from the data. This eliminated most of the exponential growth in the dataset but left a residual linear trend, which we removed with a linear least-squares fit. The next detrending steps involved the removal of the highest-power harmonics. To this end, we took the power spectrum of the data, found the highest peak, and removed from the raw data all frequencies within a 90% drop of that peak. We then repeated this for the second harmonic, at which point the remaining harmonics had power comparable to their neighbors.
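The steps above can be sketched as follows. This is our own illustration, not the paper's code: the one-year window and 90% power-drop threshold come from the text, while the band half-width around each spectral peak and the synthetic test signal are stand-ins:

```python
import numpy as np

def detrend(y, window=12, n_harmonics=2):
    """Boxcar average, linear fit, then removal of the strongest harmonics."""
    y = np.asarray(y, dtype=float).copy()
    # 1. subtract a one-year running mean (window in months, edge-padded)
    kernel = np.ones(window) / window
    pad = np.pad(y, (window // 2, window - window // 2 - 1), mode="edge")
    y = y - np.convolve(pad, kernel, mode="valid")
    # 2. remove the residual linear trend via least squares
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    y -= slope * t + intercept
    # 3. zero out the highest-power harmonics in the spectrum
    for _ in range(n_harmonics):
        Y = np.fft.rfft(y)
        power = np.abs(Y) ** 2
        peak = 1 + int(np.argmax(power[1:]))       # skip the DC bin
        band = power >= 0.1 * power[peak]          # within a 90% drop of peak
        band &= np.abs(np.arange(len(Y)) - peak) < 5   # stay local to peak
        Y[band] = 0.0
        y = np.fft.irfft(Y, n=len(y))
    return y
```

On a synthetic monthly series with a linear trend plus an annual cycle, the residual after these three steps is small compared to the original variability.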

Appendix C.2. Bootstrapping

Note that Δt = 1 month is fixed by the data, so we explored the effect of memory size, N, on EA judgements. To create a data ensemble that would allow us to measure the errors in the influence, we bootstrapped our data into 1000 smaller datasets. After fixing N, each smaller dataset was constructed as follows. A random present time, t, is chosen uniformly from the data between 2004 and 2021. The original data are taken from 1958 to t, and arithmetic noise is added at each month, drawn from a Gaussian distribution with zero mean and standard deviation equal to the accuracy of the data. Transfer entropy and influence are computed for this noised subset of data for values of N = 1, 2, …, 100, corresponding to EA memory capacities of up to NΔt ≈ 8 years. We then used a peaked smoothing kernel, [0.09, 0.18, 0.46, 0.18, 0.09], to clean up the simulation results. This is repeated 1000 times to generate the full ensemble. For fixed N, statistics are run on the ensemble, with means and standard deviations computed via unbiased estimators.
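The ensemble construction can be sketched as below, with a placeholder `estimator` standing in for the transfer-entropy/influence computation, which is not reproduced here; the function name and the choice of "present" window are our own:

```python
import numpy as np

KERNEL = np.array([0.09, 0.18, 0.46, 0.18, 0.09])   # peaked smoothing kernel

def bootstrap_ensemble(x, y, estimator, accuracy_x, accuracy_y,
                       n_boot=1000, n_vals=100, seed=0):
    """Bootstrap sketch: noised truncations of the record, an influence
    curve over N for each, kernel-smoothed, then ensemble statistics."""
    rng = np.random.default_rng(seed)
    results = np.empty((n_boot, n_vals))
    for i in range(n_boot):
        # random "present" drawn from the last quarter of the record
        t = int(rng.integers(3 * len(x) // 4, len(x)))
        xs = x[:t] + rng.normal(0.0, accuracy_x, t)   # noise at data accuracy
        ys = y[:t] + rng.normal(0.0, accuracy_y, t)
        curve = np.array([estimator(xs, ys, N) for N in range(1, n_vals + 1)])
        # smooth the raw curve over N with the peaked kernel
        results[i] = np.convolve(curve, KERNEL, mode="same")
    mean = results.mean(axis=0)
    std = results.std(axis=0, ddof=1)                 # unbiased estimator
    return mean, std
```

With a deterministic toy estimator, the smoothing kernel (which sums to one) leaves interior values of a linear curve unchanged and the ensemble spread vanishes, which makes the plumbing easy to sanity-check.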

References

  1. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  2. Dembo, A.; Cover, T.M.; Thomas, J.A. Information theoretic inequalities. IEEE Trans. Inf. Theory 1991, 37, 1501–1518. [Google Scholar] [CrossRef]
  3. Cover, T.M.; Thomas, J.A. Information theory and statistics. Elem. Inf. Theory 1991, 1, 279–335. [Google Scholar]
  4. Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 2000, 85, 461. [Google Scholar] [CrossRef]
  5. Lizier, J.T.; Prokopenko, M.; Zomaya, A.Y. Local information transfer as a spatiotemporal filter for complex systems. Phys. Rev. E 2008, 77, 026110. [Google Scholar] [CrossRef] [PubMed]
  6. Wibral, M.; Vicente, R.; Lizier, J.T. Directed Information Measures in Neuroscience; Springer: Berlin/Heidelberg, Germany, 2014. [Google Scholar]
  7. Caticha, A. Lectures on probability, entropy, and statistical physics. arXiv 2008, arXiv:0808.0012. [Google Scholar]
  8. Sowinski, D.R. Complexity and Stability for Epistemic Agents: The Foundations and Phenomenology of Configurational Entropy; Dartmouth College: Hanover, NH, USA, 2016. [Google Scholar]
  9. Ursino, M.; Ricci, G.; Magosso, E. Transfer Entropy as a Measure of Brain Connectivity: A Critical Analysis with the Help of Neural Mass Models. Front. Comput. Neurosci. 2020, 14, 45. [Google Scholar] [CrossRef]
  10. Bossomaier, T.; Barnett, L.; Harré, M.; Lizier, J.T. Transfer entropy. In An Introduction to Transfer Entropy; Springer: Berlin/Heidelberg, Germany, 2016; pp. 65–95. [Google Scholar]
  11. Gencaga, D.; Knuth, K.H.; Rossow, W.B. A recipe for the estimation of information flow in a dynamical system. Entropy 2015, 17, 438–470. [Google Scholar] [CrossRef]
  12. Wolpert, D.H.; Wolf, D.R. Estimating functions of probability distributions from a finite set of samples. Phys. Rev. E 1995, 52, 6841. [Google Scholar] [CrossRef]
  13. Agapiou, S.; Papaspiliopoulos, O.; Sanz-Alonso, D.; Stuart, A.M. Importance sampling: Intrinsic dimension and computational cost. Stat. Sci. 2017, 32, 405–431. [Google Scholar] [CrossRef]
  14. Aguilera, A.C.; Artés-Rodríguez, A.; Pérez-Cruz, F.; Olmos, P.M. Robust sampling in deep learning. arXiv 2020, arXiv:2006.02734. [Google Scholar]
  15. Hollingsworth, J.; Ratz, M.; Tanedo, P.; Whiteson, D. Efficient sampling of constrained high-dimensional theoretical spaces with machine learning. arXiv 2021, arXiv:2103.06957. [Google Scholar] [CrossRef]
  16. Rotskoff, G.M.; Mitchell, A.R.; Vanden-Eijnden, E. Active Importance Sampling for Variational Objectives Dominated by Rare Events: Consequences for Optimization and Generalization. arXiv 2021, arXiv:2008.06334. [Google Scholar]
  17. Zhu, J.; Bellanger, J.J.; Shu, H.; Le Bouquin Jeannès, R. Contribution to transfer entropy estimation via the k-nearest-neighbors approach. Entropy 2015, 17, 4173–4201. [Google Scholar] [CrossRef]
  18. Caticha, A.; Giffin, A. Updating probabilities. AIP Conf. Proc. 2006, 872, 31–42. [Google Scholar]
  19. Ramsey, F.P. Truth and probability. In Readings in Formal Epistemology; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–45. [Google Scholar]
  20. Caticha, A. Entropic dynamics. AIP Conf. Proc. 2002, 617, 302–313. [Google Scholar]
  21. Caticha, A. Entropic dynamics. Entropy 2015, 17, 6110–6128. [Google Scholar] [CrossRef]
  22. Barnett, L.; Seth, A.K. Detectability of Granger causality for subsampled continuous-time neurophysiological processes. J. Neurosci. Methods 2017, 275, 93–121. [Google Scholar] [CrossRef]
  23. Spinney, R.E.; Lizier, J.T. Characterizing information-theoretic storage and transfer in continuous time processes. Phys. Rev. E 2018, 98, 012314. [Google Scholar] [CrossRef]
  24. Spinney, R.E.; Prokopenko, M.; Lizier, J.T. Transfer entropy in continuous time, with applications to jump and neural spiking processes. Phys. Rev. E 2017, 95, 032319. [Google Scholar] [CrossRef]
  25. Prokopenko, M.; Lizier, J.T. Transfer entropy and transient limits of computation. Sci. Rep. 2014, 4, 5394. [Google Scholar] [CrossRef]
  26. Szilard, L. Über die Entropieverminderung in einem thermodynamischen System bei Eingriffen intelligenter Wesen. Z. Phys. 1929, 53, 840–856. [Google Scholar] [CrossRef]
  27. Landauer, R. Irreversibility and heat generation in the computing process. IBM J. Res. Dev. 1961, 5, 183–191. [Google Scholar] [CrossRef]
  28. Boyd, A.B.; Crutchfield, J.P. Maxwell demon dynamics: Deterministic chaos, the Szilard map, and the intelligence of thermodynamic systems. Phys. Rev. Lett. 2016, 116, 190601. [Google Scholar] [CrossRef] [PubMed]
  29. Bekenstein, J.D. Universal upper bound on the entropy-to-energy ratio for bounded systems. In JACOB BEKENSTEIN: The Conservative Revolutionary; World Scientific: Singapore, 2020; pp. 335–346. [Google Scholar]
  30. Bekenstein, J.D. How does the entropy/information bound work? Found. Phys. 2005, 35, 1805–1823. [Google Scholar] [CrossRef]
  31. Bekenstein, J.D. Black holes and entropy. In JACOB BEKENSTEIN: The Conservative Revolutionary; World Scientific: Singapore, 2020; pp. 307–320. [Google Scholar]
  32. Bremermann, H.J. Minimum energy requirements of information transfer and computing. Int. J. Theor. Phys. 1982, 21, 203–217. [Google Scholar] [CrossRef]
  33. Massey, J. Causality, feedback and directed information. In Proceedings of the 1990 International Symposium on Information Theory and Its Applications (ISITA-90), Waikiki, HI, USA, 27–30 November 1990; pp. 303–305. [Google Scholar]
  34. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
  35. Gleiser, M.; Sowinski, D. How we make sense of the world: Information, map-making, and the scientific narrative. In The Map and the Territory; Springer: Berlin/Heidelberg, Germany, 2018; pp. 141–163. [Google Scholar]
  36. Barnett, L.; Barrett, A.B.; Seth, A.K. Granger Causality and Transfer Entropy Are Equivalent for Gaussian Variables. Phys. Rev. Lett. 2009, 103, 238701. [Google Scholar] [CrossRef]
  37. Amblard, P.O.; Michel, O.J. The relation between Granger causality and directed information theory: A review. Entropy 2013, 15, 113–143. [Google Scholar] [CrossRef]
  38. James, R.G.; Barnett, N.; Crutchfield, J.P. Information flows? A critique of transfer entropies. Phys. Rev. Lett. 2016, 116, 238701. [Google Scholar] [CrossRef]
  39. Lizier, J.T.; Prokopenko, M. Differentiating information transfer and causal effect. Eur. Phys. J. B 2010, 73, 605–615. [Google Scholar] [CrossRef]
  40. Ay, N.; Polani, D. Information flows in causal networks. Adv. Complex Syst. 2008, 11, 17–41. [Google Scholar] [CrossRef]
  41. Oh, S.; Bowen, E.F.; Rodriguez, A.; Sowinski, D.; Childers, E.; Brown, A.; Ray, L.; Granger, R. Towards a Perceptual Distance Metric for Auditory Stimuli 2020. Available online: http://xxx.lanl.gov/abs/2011.00088 (accessed on 14 July 2022).
  42. Bowen, E.; Rodriguez, A.; Sowinski, D.; Granger, R. Visual stream connectivity predicts assessments of image quality. Accepted in the Journal of Vision under JOV-07873-2021R2. arXiv 2020, arXiv:2008.06939. [Google Scholar]
  43. Lawrimore, J.H.; Menne, M.J.; Gleason, B.E.; Williams, C.N.; Wuertz, D.B.; Vose, R.S.; Rennie, J. Global Historical Climatology Network - Monthly (GHCN-M); NOAA National Centers for Environmental Information, NESDIS, NOAA, U.S. Department of Commerce: Washington, DC, USA, 2021; Version 3. [Google Scholar]
  44. World Meteorological Organization. Climate Explorer; World Meteorological Organization: Geneva, Switzerland, 2021. [Google Scholar]
  45. Koutsoyiannis, D.; Kundzewicz, Z.W. Atmospheric Temperature and CO2: Hen-Or-Egg Causality? Sci 2020, 2, 83. [Google Scholar] [CrossRef]
  46. Bekenstein, J.D. Black Holes and Entropy. Phys. Rev. D 1973, 7, 2333–2346. [Google Scholar] [CrossRef]
  47. Lloyd, S. Ultimate physical limits to computation. Nature 2000, 406, 1047–1054. [Google Scholar] [CrossRef]
  48. Bennett, C.H. The thermodynamics of computation—A review. Int. J. Theor. Phys. 1982, 21, 905–940. [Google Scholar] [CrossRef]
  49. Earman, J.; Norton, J.D. EXORCIST XIV: The wrath of Maxwell’s demon. Part I. From Maxwell to Szilard. Stud. Hist. Philos. Sci. Part B Stud. Hist. Philos. Mod. Phys. 1998, 29, 435–471. [Google Scholar] [CrossRef]
  50. Earman, J.; Norton, J.D. Exorcist XIV: The wrath of Maxwell’s demon. Part II. From Szilard to Landauer and beyond. Stud. Hist. Philos. Sci. Part B Stud. Hist. Philos. Mod. Phys. 1999, 30, 1–40. [Google Scholar] [CrossRef]
  51. Kim, H.; Davies, P.; Walker, S.I. New scaling relation for information transfer in biological networks. J. R. Soc. Interface 2015, 12, 20150944. [Google Scholar] [CrossRef]
  52. Lizier, J.T.; Mahoney, J.R. Moving frames of reference, relativity and invariance in transfer entropy and information dynamics. Entropy 2013, 15, 177–197. [Google Scholar] [CrossRef]
  53. Wolpert, D.H. Minimal entropy production rate of interacting systems. New J. Phys. 2020, 22, 113013. [Google Scholar] [CrossRef]
  54. Sinan, A.; Lev, M.; Arun, S. Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proc. Natl. Acad. Sci. USA 2009, 106, 21544–21549. [Google Scholar] [CrossRef]
  55. Mimar, S.; Juane, M.M.; Park, J.; Muñuzuri, A.P.; Ghoshal, G. Turing patterns mediated by network topology in homogeneous active systems. Phys. Rev. E 2019, 99, 062303. [Google Scholar] [CrossRef]
  56. Conover, M.; Ratkiewicz, J.; Francisco, M.; Goncalves, B.; Menczer, F.; Flammini, A. Political Polarization on Twitter. Proc. Int. AAAI Conf. Web Soc. Media 2021, 5, 89–96. [Google Scholar]
  57. Bettencourt, L.M.; Gintautas, V.; Ham, M.I. Identification of functional information subgraphs in complex networks. Phys. Rev. Lett. 2008, 100, 238701. [Google Scholar] [CrossRef]
  58. Brown, J.; Bossomaier, T.; Barnett, L. Information flow in finite flocks. Sci. Rep. 2020, 10, 3837. [Google Scholar] [CrossRef]
  59. Brown, J.M.; Bossomaier, T.; Barnett, L. Information transfer in finite flocks with topological interactions. J. Comput. Sci. 2021, 53, 101370. [Google Scholar] [CrossRef]
  60. Jiang, L.; Giuggioli, L.; Perna, A.; Escobedo, R.; Lecheval, V.; Sire, C.; Han, Z.; Theraulaz, G. Identifying influential neighbors in animal flocking. PLoS Comput. Biol. 2017, 13, e1005822. [Google Scholar] [CrossRef]
  61. Vahdati, A.R.; Weissmann, J.D.; Timmermann, A.; Ponce de León, M.S.; Zollikofer, C.P. Drivers of Late Pleistocene human survival and dispersal: An agent-based modeling and machine learning approach. Quat. Sci. Rev. 2019, 221, 105867. [Google Scholar] [CrossRef]
Figure 1. (A) Instances of paths generated by the equations of motion, Equations (4) and (5), with the same seed for each pseudorandom generator used to simulate the heat bath. Axes have been scaled to the natural length and time scales, ℓ and τ, respectively. Paths are initialized 10 units from each other. Note the existence of transient behavior decaying with timescale τ, followed by steady-state behavior dominated by stochasticity. The black line represents the mean of the two processes, while the grey lines of decreasing opacity are an integer number of natural length scales away from the mean; (B) an information diagram of transfer entropy in our toy model. To the right, we have our coupled stochastic processes X1 and X2, as well as the heat reservoir, R, through which noise is introduced to both processes. Coupling constants are labelled αi and βi. The present is at time t, and the temporal discretization scale is Δt. To the left, we have an information diagram where each circle represents the entropy of that quantity. The present and past entropies are shown as bubbles, and the mutual information is the intersection of these bubbles. The transfer entropy is labeled in relation to this mutual information. Note that this diagram has a mirror image on the other side of the processes (not shown) that represents the reverse T1→2 calculation.
Figure 2. (A) The influence contours in our toy model across a [ 1 , 1 ] and Δ t [ 10 3 , 10 3 ] for b = 0.03 and N = 4 . The three slices, I , I I , and I I I , are taken at a = 0.1 , 0.2 , 0.5 , respectively. (B) The transfer entropy surfaces from which (A) was constructed, with slices shown. (C) The three slices are displayed. Note the crossover for a = 0.2 . (D) T 12 is plotted for other values of b and N, with a vanishing influence represented by the black line. Each contour plot has the same axes as (A). For negative values of b, the diagrams would have a flipped color scheme. Note how for constant a values the sign flip in the asymmetry of transfer entropy is a generic feature across a large portion of the parameter space, typically occurring within an order of magnitude of the natural timescale.
Figure 3. (A) Influence between processes with a = 0.7 and b = 0.75 . Due to numerical accuracy, the grey regions have been removed from the plot. The horizontal line represents N = 2 , while the diagonal line represents look-back windows that have been cut into intervals of duration equal to the natural timescale of the system, Δ t = 1 (in dimensionless units). The black curve cutting through the plot is null information flow. For constant look-back window size, T, the direction of information flow depends on the discretization of the window; (B) a plot array for several values of the deterministic and stochastic parameters. Each subplot has the same description as the top panel. Note that the sign cross-over is a generic feature in much of the parameter space; furthermore, it is typically found within an order of magnitude of the natural timescale.
Figure 4. (A) Raw data from temperature and CO2 content sensors taken at Mauna Loa from 1958–2022. Smaller inset plots show detrending procedures applied to the data: first, the exponential and linear trends are removed using a best fit. This is followed by the progressive removal of the harmonics containing the most signal power; (B) the transfer entropies computed from the pairs of data streams as a function of agent memory size. The scale on the vertical axis is irrelevant, as any scaling is removed in the next step. The progressively detrended data result in qualitatively similar transfer entropies compared to the raw data, with a noticeable decrease in variance and a scaling by an order of magnitude after the initial detrending step; (C) the influence between the data streams—since the influence is insensitive to scale changes in the transfer entropy, both raw and detrended data result in nearly identical curves.
Figure 5. The left panel shows the polity distributions for ensembles of agents with average memory ⟨N⟩. The central curve is the average value of the influence, and the margins are taken at 10% intervals centered on the median. The right panels show the belief distributions for, from top to bottom, ⟨N⟩ = 1.5, 6, and 24. The thick dashed line marks the mean, and the thin dashed line the median. The inset pie charts show the fraction of the population that believes one way or the other.
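The consensus problem in the figure can be caricatured in a few lines: agents with different memory sizes land on opposite sides of the influence sign cross-over, so a polity's belief split depends on its memory distribution. Everything below is a hypothetical toy, not the paper's model: the cross-over location (N = 4), the lognormal memory spread, and the noise level are all invented for illustration.

```python
import numpy as np

def polity_beliefs(mean_memory, n_agents=10_000, seed=0):
    """Toy polity: each agent draws a memory size N around mean_memory
    and believes according to the sign of a hypothetical influence
    curve eta(N) with a zero crossing near N = 4 (cf. the cross-over
    in Figure 3), blurred by finite-data estimation noise.
    Returns the fraction of agents concluding that X drives Y."""
    rng = np.random.default_rng(seed)
    N = rng.lognormal(mean=np.log(mean_memory), sigma=0.5, size=n_agents)
    eta = np.tanh(np.log(N / 4.0))                  # invented cross-over at N = 4
    eta += 0.2 * rng.standard_normal(n_agents)      # estimation noise
    return np.mean(eta > 0)
```

Small-memory polities then conclude overwhelmingly one way and large-memory polities the other, with intermediate ⟨N⟩ producing the contested splits shown in the pie charts.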
Sowinski, D.R.; Carroll-Nellenback, J.; DeSilva, J.; Frank, A.; Ghoshal, G.; Gleiser, M. The Consensus Problem in Polities of Agents with Dissimilar Cognitive Architectures. Entropy 2022, 24, 1378. https://doi.org/10.3390/e24101378
