1. Introduction
Despite its centrality to fields such as linguistics and ethology, the concept of communication has no generally accepted definition. If the main focus is on language, it seems intuitive to use the so-called “conduit metaphor” [1,2] and to describe communication as a transfer of information from sender to receiver, the purpose being to reconstruct the message as accurately as possible on the receiver’s side. Successful communication, according to this classical view, constitutes lossless transfer of information from the sender’s to the receiver’s mind, where the concept of information is derived from semantics and pragmatics, that is, from the dictionary meaning of words and the way these words are deployed by intentional actors in ongoing social interaction [3].
In contrast, biological accounts of communication emphasize the evolved, adaptive nature of communication. According to this view, communication can be defined as “the process of conveying information from senders to receivers by means of signals, and signals as the behaviors or structures that senders evolved in order to convey information” [
4] (p. 2). Now, animals interact in many ways, including patently non-communicational encounters such as predation, accidental eavesdropping across species, and so on. Hence, treating communication as an evolved feature, adaptive to the sender and receiver, focuses the inquiry on those aspects of social interaction that are communicative “by design.”
To situate the current state of play in the philosophy of biology, and more specifically, the philosophy of biological communication, we will introduce two main approaches to communication: the informational, and the influential. The former builds on classical formal theories of communication in terms of a sender, a message, and a receiver, initiated by Shannon [
5]. The latter prefers the terms signaler, signal, and perceiver, and originates in work by Dawkins and Krebs [
6].
The original work by Shannon was developed in the context of optimizing the intelligibility of human speech communication over telephone lines [5]. This work conceptualized information as independent of content, and focused on stable transmittable differences, quantifiable in binary digits (bits). Later work by Lakoff [2] and others resulted in the conduit metaphor, which tends to presuppose human-like communicators informing each other about states of the world. These states may then include the states of the communicator’s mind.
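The content-independent, bit-based notion of information described above can be illustrated with a minimal sketch. It computes the standard Shannon entropy of the empirical symbol distribution in a message. The function name `entropy_bits` and the sample strings are hypothetical and chosen only for illustration; they do not come from the cited works.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy, in bits per symbol, of the empirical distribution.

    Only symbol frequencies matter; what the symbols mean plays no role.
    """
    counts = Counter(symbols)
    total = len(symbols)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical example messages (assumptions for illustration only).
balanced = "HTHTHTHT"   # two equally frequent symbols
skewed = "HHHHHHHT"     # one symbol dominates
constant = "HHHHHHHH"   # no transmittable difference at all
relabeled = "ABABABAB"  # same statistics as `balanced`, different symbols

for message in (balanced, skewed, constant, relabeled):
    print(message, round(entropy_bits(message), 3))
```

On this measure, relabeling the symbols leaves the quantity unchanged, which is one way of seeing why the informational approach needs additional assumptions before it can say anything about meaning.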
While the informational approach works well in the context of humans and machines, Dawkins and Krebs [
6] criticized its use in biological settings. They maintained that organisms tend to reflexively optimize energy use, and that this can be achieved by means of signaling and the perception of those signals by other organisms. Hence the point of communication in this sense is to influence the behavior of the organism’s environment to avoid excessive energy expenditure, or to gain energy [
7] (p. 176) (see also [
8]). This is in contrast with organisms trying to inform each other about something, or maintaining desires or beliefs that need to be communicated somehow [
9].
We take the conduit metaphor to fall under the informational approach, and by highlighting its limitations in the context of biological communication, we also argue for the influential approach. However, by treating communication as a continuum that can include biological forms of communication as well as human and artificial ones, we aim to show that it may be possible to unify the two approaches. Further, this unification is mediated by differences in cognitive capabilities, specifically the degree to which reflective processing is supported.
More specifically, we will investigate communication from two perspectives. First, we treat it as a natural biological phenomenon, focusing on its connection to an organism’s embodied structures and memory system. Such a perspective can elucidate how communicative abilities have evolved as a graded set distributed across procedural, semantic, and episodic long-term memory as well as working memory. Moreover, this approach lets us investigate parallels between communication and knowledge, which also map onto the memory systems [
10,
11,
12]. In particular, a reflexive and a reflective form of communication will emerge. Second, we approach communication from the perspective of artificial neural networks. This will show that communication is possible independently of ontological commitments, requiring only similarity of experience. The article will present arguments for the following three theses:
Communication can fruitfully be grounded in an organism’s embodied structure and memory system.
Communication features gradient properties that are plausibly divided into a reflexive and a reflective form.
The conduit metaphor of communication is limited by not taking into account reflexive reward and aversion inducing processes that motivate approach or avoidance.
This teleological view of communication has the additional benefit of being easily extended to artificial systems if we replace evolutionary adaptations with actual intelligent design, in this case by human engineers. To take a very simple example, a red light on a console that alerts the user of memory overload or low battery power can be viewed as communicative if it was designed for this specific purpose. In Section 2, a short background to our biological account of communication is presented, which Section 3 links to embodied structures, memory systems, and knowledge. Section 4 focuses on the communicative sender, both from a reflexive and a reflective perspective, and Section 5 then deals with the communicative receiver in a similar manner. In Section 6, the discussion is connected to the development of biologically grounded communicative features in AI systems.
2. Background
An important point of contention in biological theories of communication concerns the notion that signals carry information, which is then processed by the receiver. As illustrated by many of the examples in the following sections, both the production and perception of signals may be too direct to plausibly involve mental representations or advanced cognitive processing. According to some authors, it is therefore best to avoid the notion of information and to define communication as the process of altering others’ behavior via evolved mechanisms [
4,
13]. For example, piercing shrieks draw attention and increase arousal simply because of their acoustic properties [
14], leading the authors to propose a distinction between direct and indirect affect induction in the audience. This view of communication, with a focus on influencing instead of informing others, is a valuable contribution from biological research and a reminder that language is not the only possible form of communication. On the other hand, the existence of “direct” signals does not necessarily mean that they carry no information. If the effect—or meaning—of a signal depends on the receiver’s set of sensory organs, cognitive architecture, and unique life history, the informational content of a signal is best treated not as an intrinsic property of the signal itself, but as a product of its interaction with a particular receiver in a particular context [
15]. Once we acknowledge that the informational content of a given signal is not constant under all conditions, even “direct” signals such as startling shrieks can be accommodated by an information-based view of communication, which can then be defined as exchange of information via an evolved (for biological systems) or designed (for artificial systems) mechanism. It makes sense here to also contrast the biological perspective with Floridi’s (see, e.g., [
16,
17]) notion of “true semantic content.” Factual semantics is necessary when an agent needs to acquire knowledge about the world. This is particularly the case when that knowledge is needed as a means to an end, such as finding the solution to a problem, or finding the path to some goal. In situations when an agent cannot observe the world first hand, but is dependent on a third party for getting information, the veracity of that third party’s account is critical. This state of affairs is common in human society, but does have analogues among animals as well. An example of this may be the case of bees reporting on the suitability of found hive migration sites (see, e.g., [
18]). Yet, as will be further explicated below, in cases such as mate attraction, it may not make sense to speak of, e.g., a colorful plume, or towering antlers as conveying facts about an individual’s fitness. Rather, it may be more plausible to understand the colors, or the antlers as inducing reward processes in the observer. These processes again facilitate and motivate approach behavior by reflex mechanisms.
The “conduit metaphor,” however, is not compatible with a biological account of communication. As we argue below, the language-inspired notion of communication as the process of intentionally transferring a mental representation from the sender to the receiver via a symbolic code represents only the tip of the iceberg: a highly specialized and rather unusual example of communication in the biological world. Instead, a useful starting point in studying communication may be to specify the various cognitive mechanisms involved in the production and perception of different signals, from fairly “direct” to the most cognitively sophisticated. In other words, given that animals’ innate capabilities are formed by evolutionary processes and vary widely, it is crucial to take each particular species’ capabilities and prerequisites into account. A failure to acknowledge, for example, non-linguistic or modal limitations risks missing essential communicative features. Therefore, an animal’s cognitive faculties, as well as developmental factors, are potentially interesting.
3. Knower
To gain an overarching perspective on communication, we find it helpful to link our discussion of communication to cognitive psychology (see, e.g., [
11,
12,
19,
20,
21,
22]). We will argue that both communication and knowledge feature parallel gradient properties that can be usefully compared. In doing so, we hope to be able to ground communication in a way that is elucidating.
All organisms have been formed by adaptations through evolutionary processes. They thus in a certain sense match their environment and so, through their structure, embody a form of “biological knowledge” [
23]. What this means is that each organism has a specific set of ways to interact with the world and a specific set of faculties to perceive the world. In other words, an organism’s structures and capabilities enable and delimit its impressions (actions) on the world and its fellow creatures, as well as its interpretations (perceptions) of incoming stimuli. Since different species live under very different circumstances and in different environments, they subsequently have been shaped to appear, act, and perceive very differently.
With respect to cognitive capabilities, memory is central to knowledge. Memory is often divided into long-term memory and working memory. Long-term memory can, in turn, be divided into (non-declarative or implicit) procedural memory governing actions, skills, and an animal’s ability to tackle practical obstacles, as well as (declarative or explicit) semantic memory governing pattern recognition and categorizations, and episodic memory, which is closely tied to remembrance and language. Working memory consists of a central executive that works as a decision-making and conscious control station, a phonological loop governing internal linguistic sequences, a visuospatial sketchpad governing visual semantics and mental images, and an episodic buffer that binds information into episodes. Working memory, together with episodic long-term memory, governs reflection [
20].
A way to facilitate our discussion is offered by the cognitive psychological Dual Process Theory (see, e.g., [
21,
22,
24,
25,
26,
27]). Sidestepping a number of details, Dual Process Theory divides mental processes into two kinds: non-conscious and automatic Type 1 processes and conscious and reflecting Type 2 processes. We will use these process forms, and the aforementioned memory forms, as a background framework to engage the following discussion of communication.
3.1. Reflexive Knowledge
From a cognitive psychological perspective, reflexive knowledge relies heavily on purely embodied structures. As mentioned, evolved structures provide limits and affordances for an organism’s interactions with the world. This has led to vast differences in size, form, and capability. Moreover, different organisms rely differently on their various senses. For example, to some, tactile or olfactory stimuli are essential, whereas others primarily depend on visual or auditory input. It is important to understand both what forms of stimuli an organism is capable of registering and to what degree it relies on a particular form of stimuli (or a particular weighted combination). These limitations will naturally also constrain that organism’s abilities for communication with its environment, in particular its abilities to receive signals.
Concerning cognitive capabilities, reflexive knowledge also relies on non-conscious Type 1 processes [
21]. Such reflexive knowledge is based in procedural and semantic long-term memory [
11]. These processes are thought to be primarily implicit, intuitive, and automatic. They are linked to motor skills and abilities, such as bodily movement and the ability to vocalize. Moreover, the abilities to categorize and to learn associatively constitute a conceptual form of reflexive knowledge. Procedural memory governs motoric, reflexive and perceptual pathways, whereas semantic memory governs, for example, associative pathways (see, e.g., [
28,
29]).
Reflexive knowledge may also include propensities for reward and aversion. These propensities can be seen as innate or learned reflexive associations between the sensory apparatus and the motivation and motor pathways [
30]. These pathways enable quick and effortless reaction to rewarding or threatening stimuli, but often at the cost of accuracy [
31]. Typically, rewarding stimuli motivate approach behavior, while aversive stimuli tend to motivate avoidance [
32].
Reflexive knowledge is spread throughout the animal kingdom, even though it can look very different depending on the context in which various animals find themselves. Reflexive knowledge thus enables signals to be elicited automatically, as well as automatic reactions to signals.
3.2. Reflection
Reflective abilities—although they are unusual in the totality of biological knowledge—rely heavily on Type 2 processes, involving capabilities such as language and mental representation. Type 2 processes are based in episodic long-term memory and working memory—albeit intertwined with procedural and semantic memory.
It is worth pointing out that “the brain mechanisms subserving episodic-like memory are highly conserved among mammals” [
33] (p. 10373), and so many animals have some capacity for episodic long-term memory [
34], such as rats [
35], corvids [
36], and primates [
37]. Including all organisms, different species can be seen to range from having no such capabilities, or being capable of some rudimentary forms, to having open-ended general intelligence.
Although it is no trivial matter to dissociate short-term memory from working memory, it is also well established that many animals, such as mice, rats, dogs, and monkeys, have some form of working memory [
38,
39]. This involves being able to chunk information and to resist interfering stimuli. However, it remains unclear exactly to what extent animals have such working memory capabilities. This may be partly because such capabilities are hard to measure with certainty, and partly because the research area is still relatively overlooked [
33]. Working memory is involved in enabling domain-general information processing [
20,
40], where particularly the episodic buffer works as a link between the central executive in working memory and episodic memory in the long-term memory system.
Now, reflective capabilities involve, for example, rule-based explicit reasoning, mentalizing (mindreading), mental time travelling, hypothetical thinking, and language abilities (see, e.g., [
28,
41]). Working memory is divided into a number of discrete components (see, e.g., [
20]). The phonological loop governs internal monologues and speech as well as interpretation. The visuospatial sketchpad governs visual and spatial information. The central executive governs cognitive control and executive functions. The episodic buffer governs mediation between memory systems, especially between the central executive and episodic long-term memory, integrating relevant information needed for planning and executive control. The episodic long-term memory governs integration of sensory streams, together with the episodic buffer, encoding and reconstructing episodes. In summary, reflective knowledge is also spread throughout the animal kingdom, even though it is much less clear how, and to what degree. It affords communicating about events and situations that are temporally and spatially distanced from the immediate environment.
3.3. Grounding Communication
In order to ground communication in memory, we link reflexive communication to embodied and innate aspects of an animal, as well as to reflexive behaviors. Indeed, this is plausible given that Type 1 processes rely on procedural and semantic long-term memory, thought to be instantiated in many animals. Importantly, as previously pointed out, such aspects can take very different forms depending on species. Moreover, reflexive signaling can involve conscious aspects, but is nonetheless seen as principally consisting of non-conscious functions. Reflective communication is linked to mentalizing abilities and Type 2 processes that rely on episodic memory and working memory. These competences are also widespread, although it is unclear to what extent they occur in highly developed forms.
Briefly put, and illustrated in Figure 1 below, procedural communicative features involve, for example, somatic, innate and inflexible responses and behaviors. Semantic features involve, for example, learned inflexible behaviors, whereas episodic features involve, for example, learned flexible behaviors as well as mental simulation, planning, and language.
In the following two sections we will use this grounding of knowledge and communication in memory as a background framework to stepwise investigate communication, first from the perspective of the sender and thereafter from the perspective of the receiver.
4. Sender: Reflexive and Reflective Communication Production
In this section, we will focus on the communicative sender. This role can be filled by an animal, human, or AI system, potentially involving a wide range of communicative possibilities.
The least cognitively demanding production mechanisms rely on innate neural circuits and require neither learning nor conscious access to the communicated signal. In fact, the production of many signals does not even involve the brain. Signals of this type, which we refer to as ‘somatic,’ are long-term modifications of the signaler’s body that evolved in order to inform other organisms about the fitness, age, sex, and social status of the signaler. For example, males of many animal species possess ornate and seemingly useless features: antlers in deer, large tail feathers in peacocks, brightly colored spots in fishes, and so on [
42]. These decorations are thought to evolve due to sexual selection driven by female preferences. Sexual selection in humans is an object of continuing debate and speculation. For example, it is possible that a descended larynx and beard in males are examples of somatic features whose evolution was driven by female preferences and male competition in the context of attempting to exaggerate the apparent body size [
43,
44]. If that is true, these human peculiarities can be regarded as somatic communicative signals. It is also important to emphasize that the operation of sexual selection is not limited to somatic signals. Complex behavioral traits, such as songs of oscine birds or roaring contests of male deer [
45], also evolve to regulate mating. There are even speculations that such uniquely human abilities as music and language [
46] were affected by sexual selection.
Moving on from somatic features to signals whose production is rapid and controlled by the brain, there are many examples of communicative signals that are fully or largely innate in terms of both form and context of production. For example, worker ants returning from a food site lay down a pheromone trail, which helps to recruit and guide other workers, who in turn strengthen the trail with fresh pheromone markers until the food supply is exhausted. By using several types of attractant and repellent pheromones with varying half-life, ants can coordinate the behavior of the entire colony in an adaptive and highly flexible manner [
47]. However, the form of signal (the choice of a particular pheromone) and the timing of its expression appear to be determined by simple ‘if-then’ rules, leaving limited room for learning, broader context, or conscious intentions. This may be obvious in the case of ants, but innate and relatively inflexible signals are by no means unique to invertebrates. On the contrary, a very large proportion of animal signals falls into this category. For example, the basic structure of nearly all primate vocalizations and many gestures is genetically determined [
48], and each is associated with a range of typical eliciting contexts. In humans, congenitally deaf infants learn to laugh normally [
49], which indicates that the appropriate motor programs (a coordinated activity of the diaphragm and muscles of the larynx) are species-typical behaviors that mature without auditory feedback and are triggered in a predetermined eliciting context (social play, tickling), again without the need for environmental input. Nor do we grow out of such innate signaling as adults: if suddenly frightened, most people will scream and display the classical primate “fear face” before being able to monitor or suppress this involuntary reaction. As demonstrated by this example, neural circuitry for the production of species-typical signals in relatively narrow, predetermined contexts remains operative in organisms endowed with a strong capacity for social learning and intentional control, including humans.
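The simple ‘if-then’ character of the pheromone-trail example above can be made concrete with a toy simulation. This is only a sketch under stated assumptions, not a model taken from the cited literature: a single attractant pheromone, a hypothetical half-life of five time steps, a hypothetical recruitment threshold, and one unit of pheromone per returning worker. The function name `simulate_trail` and all parameter values are illustrative.

```python
def simulate_trail(food_units, steps, half_life=5.0, threshold=0.5):
    """Return a list of (trail_strength, foragers), one entry per time step.

    All quantities are in arbitrary units; the values are assumptions.
    """
    decay = 0.5 ** (1.0 / half_life)  # per-step factor for the given half-life
    strength = 0.0
    foragers = 1  # a single scout has found the food site
    history = []
    for _ in range(steps):
        strength *= decay  # the pheromone evaporates
        # Rule 1: IF a worker returns with food THEN it deposits pheromone.
        returning = min(foragers, food_units)
        food_units -= returning
        strength += returning
        # Rule 2: IF the trail is above threshold THEN one more worker joins,
        # ELSE one worker abandons the trail.
        if strength > threshold:
            foragers += 1
        else:
            foragers = max(foragers - 1, 0)
        history.append((strength, foragers))
    return history


history = simulate_trail(food_units=10, steps=40)
for step, (strength, foragers) in enumerate(history, start=1):
    print(step, round(strength, 2), foragers)
```

In this sketch the trail strengthens while food is being carried back and fades once the supply is exhausted, so colony-level coordination arises without any rule referring to learning, broader context, or intention.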
In contrast to ants laying pheromone tracks or deaf infants laughing when tickled, many animals deploy species-typical signals with a considerable degree of flexibility. In many cases, learning has only a limited role in determining the context of production. A well-known example is the alarm call that vervet monkeys use to alert members of the group to the presence of an aerial predator. While young monkeys initially produce the eagle alarm call to things such as falling leaves and harmless birds, they gradually learn which species of raptors are particularly dangerous and call only when they spot those [
50]. The acoustic structure of the call itself is innate; further, there is a strong predisposition to apply this call type to threats from above rather than to terrestrial predators such as leopards or snakes, for which vervet monkeys use different alarm calls. Learning serves to fine-tune the eliciting context, but the production of alarm calls remains rather predictable.
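The division of labor described above, with an innate call form and an innate bias toward threats from above, plus learning that only fine-tunes the eliciting context, can be sketched as follows. This is a hypothetical toy model, not the mechanism proposed in the cited study: the stimulus labels, the learning rate, and the calling threshold are assumptions made for illustration.

```python
# Stimuli that the innate predisposition treats as "threat from above".
# The labels are illustrative assumptions.
AERIAL_STIMULI = ("raptor", "falling_leaf", "harmless_bird")


def make_juvenile():
    """A naive juvenile starts with a full-strength response to anything aerial."""
    return {stimulus: 1.0 for stimulus in AERIAL_STIMULI}


def gives_eagle_call(weights, stimulus, threshold=0.5):
    """The call is produced only if the association strength exceeds the threshold."""
    return weights.get(stimulus, 0.0) > threshold


def learn(weights, stimulus, dangerous, rate=0.3):
    """Move the association toward 1 (dangerous) or 0 (harmless).

    Terrestrial stimuli are never added: learning tunes the context of the
    eagle call but does not extend it beyond the innate aerial category.
    """
    if stimulus in weights:
        target = 1.0 if dangerous else 0.0
        weights[stimulus] += rate * (target - weights[stimulus])


monkey = make_juvenile()
for _ in range(5):
    learn(monkey, "falling_leaf", dangerous=False)
    learn(monkey, "harmless_bird", dangerous=False)
    learn(monkey, "raptor", dangerous=True)
    learn(monkey, "leopard", dangerous=True)  # ignored: not an aerial stimulus

for stimulus in ("raptor", "falling_leaf", "harmless_bird", "leopard"):
    print(stimulus, gives_eagle_call(monkey, stimulus))
```

The form of the signal never changes in this sketch; only the set of contexts that trigger it shrinks with experience, which is why production remains rather predictable.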
At the opposite extreme of flexibility, calls of chimpanzees are much less context specific, even if their acoustic structure is innate, and some calls may even be produced with intention to inform. For example, chimpanzees appear to produce more alarm calls when other animals are not aware of the threat [
51], and they may be able to inhibit the production of food grunts when it would be disadvantageous to disclose this information to others [
52], although this inhibition appears to be effortful and is not always successful [
53].
It is also important to point out that the same signal can be produced with varying degrees of flexibility or intentional control. The question of intentionality in animal communication is fraught with difficulty [
54,
55], but human emotional expressions are a clear case in point. Non-verbal vocalizations and facial expressions can be produced spontaneously, as when laughing at something amusing or showing a genuine, Duchenne smile [
56], but they can also be used in a more controlled fashion, as when smiling or chuckling politely on social occasions. Interestingly, different neural circuits appear to be involved depending on whether an emotional expression such as a laugh is produced spontaneously or volitionally [
57], which demonstrates that the same communicative signal can be generated by different cognitive mechanisms. In addition, there are detectable differences between spontaneous and volitional facial expressions [
56] and vocalizations [
58], indicating that markers of genuine affect are hard to fake and thus relatively “honest.” The crucial point is that this honesty stems precisely from lack of intentional control. The less the context of production is open to manipulation, the more reliably the signal expresses the true mental state of the sender. As the amount of flexibility increases, the signal can potentially express a wider range of meanings [
55], but it also places a greater burden on the receiver, who now has to take into account the broader context, and possibly also the reputation of the sender, since the “honesty” of communication is no longer guaranteed.
Finally, some aspects of language itself also appear to belong in the category of innate signals with relatively flexible usage. Emotional prosody in spoken language shows strong regularities around the world [
59,
60], making it straightforward to determine whether a speaker of an unfamiliar language is angry, happy, or sad. The changes in voice quality, rate of speaking, intonation and other acoustic features appear to stem from the even more universal nonverbal emotional vocalizations [
61,
62], which are in turn traceable back to the vocalizations of the great apes and other primates [
63,
64]. In addition to emotional prosody, spoken language utilizes a number of largely universal grammatical markers, such as rising intonation in questions [
65] or simple interjections such as “Huh?” [
66]. While their usage is flexible and subject to intentional control, the form of these signals is thus strongly constrained by the need to conform to the repertoire of vocal and gestural communicative signals that humans are genetically endowed with.
Signals with a completely arbitrary, purely learned form are not common in the natural world. The most obvious exception is human language, although even language is now regarded as less arbitrary than originally claimed by Saussure [
67] due to the widespread presence of onomatopoeia and other forms of sound symbolism in basic vocabulary [
68,
69]. Among animals, the form of signals is normally either wholly or partially innate, but there are interesting exceptions to this rule. The gestural repertoire of great apes is generally considered to be more flexible than their vocalizations [
70,
71]. Furthermore, all species of great apes can be taught to understand and produce hundreds of signs from American Sign Language (ASL). While the grammatical structure of their sentences remains relatively impoverished [
72], rigorous testing has confirmed that they do understand the meaning of the signs and can produce them appropriately, not only to obtain reward but also to request information, inform others of their intended course of action, and so on [
73].
The work with language-trained apes probably constitutes the most convincing example of intentional use of symbolic signals by any non-human animal, but signals with non-innate form do exist in the natural world. Vocal dialects are common among songbirds and have been reported in some mammals such as whales [
74,
75] and bats [
76]. Learning plays an important role in the acquisition of such signals, which makes them more similar to human language than to human emotional expressions. Once learned, however, these signals may well be produced without intention to inform and with only limited sensitivity to context, placing them closer to the relatively inflexible signals discussed above.
A number of species have varying degrees of reflective communicative competencies, based in episodic memory and working memory, although testing these abilities is made difficult by the fact that it is sometimes possible to explain the same abilities in ways more in line with reflexive behavior [
33,
36]. At lower levels, capabilities can include any communicative behavior indicating recall of past events. At higher levels, reflective communication involves competencies such as rudimentary symbolic language and a sense of time. In its most advanced forms open-ended language abilities are tied to a developed general intelligence involving the ability to communicate through speech, writing, sign, or gesture, where arbitrary symbols are used as representations in socially agreed upon manners.
There are various theories concerning why such abilities might have developed. One proposal is that they allow groups to plan for the future [
77]. By playing out and discussing long-term future scenarios, rather than actually carrying them out, efficiency and survival can be increased by a large degree. For example, instead of going into a dark cave to explore, it is safer to first think through and discuss various scenarios and thereafter take relevant precautions beforehand. Such abilities offer enormous survival benefits.
By forming complex syntactic and semantic structures, communication can be both powerful and efficient, involving, for example, mental imagery, recollection, inner speech, reflective awareness, willed action, deliberation, and planning [
78]. Such purpose-driven and intentional abilities of communication enable a highly flexible form of communication in large social groups, referred to by Hockett and Hockett [
79] as “design features” involving ‘displacement’ (ability to tend to things not immediately present), ‘productivity’ (ability to understand new utterances), ‘cultural transmission’ (language learning in social groups), and ‘duality’ (meaningful language, made up by meaningless parts).
5. Receiver: Reflexive and Reflective Communication Perception
In this section, we will focus on the communicative receiver. As before, we take it that this role can be filled by an animal, human, or AI system.
The most direct effect of a signal on a receiver, in the sense of involving the smallest amount of neural processing, is largely determined by the properties of peripheral receptors. An example already mentioned in Section 2 is the generally aversive effect of harsh and loud shrieks on listeners [
14]. It is also possible that the cries of infants in humans and other mammalian species are under selective pressure to (i) maximize their subjectively experienced loudness by carrying a significant amount of energy in the range of frequencies to which adults are particularly sensitive [
64], potentially causing pain and even hearing loss in the listener [
80] and (ii) prevent habituation by means of introducing frequency modulation, non-linear vocal phenomena, and other acoustic irregularities [
64,
81]. The aversive effect of such sounds is not mediated by learned associations, but basically stems from excessive stimulation of cochlear hair cells in their most sensitive frequency range. Some minimal degree of neural processing is still necessary, so there is arguably no absolute divide between receptor-driven and other innate responses discussed below. However, such “direct” signals are interesting theoretically since they highlight the danger of approaching all biological communication with a toolkit borrowed from linguistics. The informational content of these stimuli, if any, is clearly very different from that of a verbal utterance.
In many cases, the receiver’s response is not predicated on the physical properties of the signal, but it is nevertheless innate—that is, largely predictable based on the characteristics of the signal and the genetic makeup of the receiver. A good example of such inflexible innate response is the startle reflex—a rapid, spontaneous defensive reaction to a threatening stimulus such as a sudden loud noise. The response does not have to be completely impervious to contextual effects. For instance, in humans the eyeblink to a sudden noise is attenuated by positive and enhanced by negative affective states [
82]. Non-associative learning in the form of habituation can also play some role in modulating the response. However, the basic pattern of the eliciting stimulus and response is “hard wired” rather than learned.
In the animal world, innate responses are extremely common and crucial for survival. To refer back to the example of somatic signals that regulate mating, female preferences for features such as bright plumage or long tail feathers are not the product of associative learning, but rather innately specified responses to the appropriate triggering stimulus. In other words, a peahen does not learn by observation that males with large tails produce healthy offspring; instead, her brain is predisposed to respond favorably to a particular combination of visual features on a large tail (see, e.g., [
46]). Innately specified responses can persist not only without a chance to learn the meaning of the signal through previous exposure, but without even a theoretical possibility of such exposure. For instance, moths that migrated to Pacific islands relatively recently continue to drop to the ground upon hearing ultrasound, although this defensive measure against bats is meaningless in their bat-free environment. In contrast, this motor response has been decoupled from the detection of bat cries in species endemic to the islands, although their ears are still sensitive to ultrasound.
A well-documented example of an innate response in humans is rapid detection of threatening stimuli by subcortical circuits centered on the amygdala, which orchestrates a reflexive fearful response to pictures of snakes and spiders [
83]. Interestingly, the amygdala also appears to respond similarly to facial expressions of fear in other humans, or rather to the increased visibility of the sclera as the sender’s eyes open wide in fear [
84]. In this case, both the production of the facial expression of fear and its detection appear to be innate and relatively inflexible—that is, hard to control or inhibit intentionally. Revealingly, the responsible neural mechanisms are largely subcortical, which makes both production and response very fast, but also hinders intentional control.
When there is no innate predisposition to respond to a signal in a particular way, the receiver has to learn the signal’s meaning from experience. In behavioral terms, it means observing what events tend to follow the detection of this signal—in other words, what the signal predicts in terms of environmental changes or the ensuing behavior of the sender. In neurological terms, learning the signal’s predictive power (or, more generally, its meaning) requires some form of associative learning. Depending on exactly what is learned and how this information is processed, we propose the following three subtypes of learned responses, from least to most cognitively sophisticated.
The simplest strategy is to associate a signal with a single, standardized response that does not depend on the broader context. Learned, but inflexible responses of this kind appear to be relatively uncommon in the natural world. Overtrained operant conditioning in laboratory animals or household pets is a possible example, but such “mindless” conditioning is seldom advantageous in nature. There is, however, an interesting special case, namely behavioral programs with an innately specified response to a learned signal. Imprinting is popularly associated with the image of Konrad Lorenz followed by his goslings, who had taken him to be their mother. In more natural circumstances, however, imprinting has an important role to play in creating a powerful bond between the mother and her offspring. In highly vocal and colonial animals such as seals and walruses, the ability of the mother to learn the voice of her pup is crucial for them to reunite after the mother’s hunting expeditions. The pup’s calls—more specifically, the unique signature of frequency modulation in the pup’s bark that enables individual recognition [
85]—are thus learned signals that trigger innate nurturing behavior in the mother.
In the majority of cases, when a signal is not coupled with an innately specified response, the animal learns to extract the relevant information from the signal and to respond appropriately, taking into account additional factors such as the sender’s identity, the history of previous interactions with the sender, the presence of other group members, and other contextual factors. As a result, there is no longer a one-to-one correspondence between the signal and the response. For example, vervet monkeys respond to alarm calls depending on their current position—that is, the response is not stereotypical. An animal who hears an eagle alarm call while on the ground will rush up into the branches, whereas an animal who is already high up will descend from the exposed treetops [
50]. Furthermore, if an alarm call is followed by the sound made by the actual predator, this otherwise frightening sound no longer provokes a strong response. For all practical purposes, it appears that an eagle alarm call evokes the mental representation of an eagle in the audience, a snake alarm call brings to mind the image of a snake, and so on [
55].
The idea of signals evoking mental representations remains a controversial, but parsimonious explanation for flexible responses [
54,
55] to context-specific, or functionally referential, signals such as alarm calls. Whether or not mental representations are involved, highly flexible cognitive processing is required when the same signal can be produced in a broad range of contexts. For example, people can laugh with each other or at each other, and the meaning of a laugh can vary accordingly, from benign amusement to malicious taunting [
86]. Likewise, chimpanzees who hear a sequence of screams from two familiar individuals seem to be able not only to determine who is the aggressor and who is the victim, but also to judge whether these roles conform to their expectations based on the existing social hierarchy [
87]. In cases such as this, it becomes increasingly natural to describe animal communication in terms of the inferences that receivers make on the basis of the information that they extract from a signal.
This view aligns closely with the pragmatic approach to human communication. One implication is that the distinction between human language and animal communication has become increasingly blurred on the receiver’s side, whereas the production of signals in the animal world is usually—but not always—restricted to species-typical displays [
88]. Characteristically, comprehension far outstrips production both in human infants and in language-trained animals [
73], again suggesting that the capacity for highly flexible, context-dependent interpretation of learned signals is more widespread and less cognitively costly than the corresponding production skills.
Phenomena such as gaze following, present in chimpanzees, for example, indicate some ability to understand mental states. An alternative interpretation, however, is that such competencies merely involve “goal-directed action and perception, common to all apes, (rather than a) sharing (of) psychological states with others in collaborative acts involving joint intention and attention, (which is) unique to the human.” [
89] (p. vii).
Now, communication perception can involve ascriptions of intentionality to others, where a subject can “see” others as intentional agents in their own right. This can be achieved by, for example, theorizing about others’ mental states or by simulating them [
41,
78]. So, communication perception can involve the ability to take another’s perspective as well as the ability to discern their intent. In a social setting, such abilities enable complex social interactions in which long-term planning toward goals in the distant future is made possible. By mental trial and error, different predictions concerning perceptual input can then be made.
However, as far as we know, capacities for such flexible mental representations, involving simulation and reflection on one’s own and others’ thoughts, are biological outliers.
6. Communication between Artificial Neural Networks
In this section, we describe how artificial neural networks can illustrate and lend support to the idea that different but similar experiences enable communication between agents, irrespective of ontological commitments.
Artificial systems can presently communicate with humans or other artefacts through a variety of means: voice and natural language, as well as the recognition of human facial expressions, prosody, and body language. The communication of non-linguistic information has so far flowed mainly from humans to machines, as machines lack proper emotional systems and designers have to make do with simple theatrics. As for recognition in general, deep neural networks have improved both the inference of human emotional states and the generation of very natural-sounding language. The latter has in fact become good enough to be indistinguishable from the real thing, which in the context of phone calls raises ethical concerns about subterfuge and has prompted calls for artificial systems to be made to identify themselves when interacting with humans.
Deep learning neural networks have become one of the most essential computational engineering methods used today. These networks often consist of input, output and hidden neurons that learn to detect features of the signal in the data. In the early days of what became known as ‘computer vision,’ the first pattern recognition algorithms were developed that consisted of multiple layers of hand-coded feature detectors using fixed network weights [
90,
91]. Over time, new methods were developed to learn the weights of the hidden units, whose state is invisible to the observer; this is why a neural network with intermediate feature-detecting hidden units is typically referred to as a ‘black box.’ One of the first methods by which a network adapts its own weights was devised by analogy with evolution. By randomly sampling the parameter space, preserving those weight parameters that returned the best results, and discarding those with poor results, the network started to learn its own weights [
92]. A different approach came with the discovery of the backpropagation algorithm, which allowed neural networks to be trained more effectively [
93,
94]. Even though some theoretical neuroscientists are convinced that backpropagation is also a plausible account of how the cortex learns to adapt synaptic strengths [
95], many remain skeptical of the method. Alternatively, predictive coding schemes have been proposed, which suggest that higher cortical cells predict the neural activity of lower cortical cells. In a complex interaction between information flowing upstream towards and downstream away from higher cortical regions, the system begins to learn based on local rules of computation [
96]. This is illustrated in
Figure 2, which shows, from the point of view of a single neuron, how information flows in both directions in order to build an internal representation.
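As a hedged illustration of the predictive coding idea described above, the following Python sketch implements a toy linear model. A representation r is refined using only the prediction error (the upstream signal), while a weight matrix W generates predictions of the input (the downstream signal). This is not the model used in the cited works: the linear form, the function names (init_weights, predict, infer, learn), the learning rates, and the toy data are all assumptions made for illustration.

```python
import random

def init_weights(n_in, n_hidden, seed=0):
    """Random generative weights W (n_in x n_hidden); purely illustrative."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]

def predict(W, r):
    """Downstream pass: generate a prediction of the input from representation r."""
    return [sum(w * rj for w, rj in zip(row, r)) for row in W]

def infer(W, x, steps=50, rate=0.1):
    """Upstream pass: adjust r using only the prediction error x - W r."""
    r = [0.0] * len(W[0])
    for _ in range(steps):
        error = [xi - pi for xi, pi in zip(x, predict(W, r))]
        for j in range(len(r)):
            r[j] += rate * sum(W[i][j] * error[i] for i in range(len(W)))
    return r

def learn(W, data, epochs=200, rate=0.05):
    """Local learning rule: each weight changes by (its error) x (its activity)."""
    W = [row[:] for row in W]  # leave the caller's weights untouched
    for _ in range(epochs):
        for x in data:
            r = infer(W, x)
            error = [xi - pi for xi, pi in zip(x, predict(W, r))]
            for i, row in enumerate(W):
                for j in range(len(row)):
                    row[j] += rate * error[i] * r[j]
    return W

def mean_squared_error(W, data):
    """Average squared prediction error after inference, over all inputs."""
    total = 0.0
    for x in data:
        r = infer(W, x)
        total += sum((xi - pi) ** 2 for xi, pi in zip(x, predict(W, r)))
    return total / len(data)

# Toy "sensory" inputs that share a two-dimensional structure (an assumption).
DATA = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]

if __name__ == "__main__":
    W0 = init_weights(n_in=4, n_hidden=2)
    W1 = learn(W0, DATA)
    print("error before learning:", mean_squared_error(W0, DATA))
    print("error after learning: ", mean_squared_error(W1, DATA))
```

Under these assumptions, the prediction error is expected to fall as the weights are adapted. Every update uses only quantities available at the connection itself (the error at one end and the activity at the other), which is the sense in which the rule is local.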
The downstream flow of information is given by a generative model that generates images from the internal representation in order to predict the activity of lower cortical regions. This allows a human-like communication process of two agents A and B to be modelled in a computer simulation as illustrated in
Figure 3 [
97]. Each agent is represented by a neural network that was trained on similar data, in this case a set of images of pears. Each of these networks includes a recognition model as well as a generative model, which allow the agent to build internal representations of pears and to generate images of them. Given this setup, agent A can generate an image of, e.g., a pear, which is fed as an input image into the network of agent B. This network then correctly recognizes the image and classifies it as a pear. Running this simulation back and forth closely resembles a dialog between two distinct human agents.
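The exchange between agents A and B can be sketched in a few lines of Python. The sketch is not the cited simulation: the neural networks are replaced by simple prototype (class-mean) models, a second category (‘apple’) is introduced so that recognition is not trivial, and all names (PROTOTYPES, Agent, generate, recognize, dialog) are hypothetical. What it retains is the structure of the argument: each agent learns from its own, slightly different samples, and the two agents exchange only generated data.

```python
import random

# Hypothetical four-pixel "images"; the categories and values are assumptions.
PROTOTYPES = {"pear": [1.0, 0.0, 1.0, 0.0], "apple": [0.0, 1.0, 0.0, 1.0]}

def noisy(vector, rng, amount=0.1):
    """Return a copy of vector with bounded uniform noise added to each entry."""
    return [v + rng.uniform(-amount, amount) for v in vector]

class Agent:
    """An agent with its own experience, a generative and a recognition model."""

    def __init__(self, seed, n_samples=20):
        self.rng = random.Random(seed)
        self.memory = {}
        for label, prototype in PROTOTYPES.items():
            # Each agent sees its own noisy samples: similar but not identical data.
            samples = [noisy(prototype, self.rng) for _ in range(n_samples)]
            self.memory[label] = [sum(column) / n_samples for column in zip(*samples)]

    def generate(self, label):
        """Generative model: produce an image from the internal representation."""
        return noisy(self.memory[label], self.rng)

    def recognize(self, image):
        """Recognition model: return the label of the closest internal representation."""
        def distance(label):
            return sum((a - b) ** 2 for a, b in zip(image, self.memory[label]))
        return min(self.memory, key=distance)

def dialog(first, second, label, turns=4):
    """Pass generated images back and forth; return what each listener recognized."""
    speaker, listener = first, second
    heard = []
    for _ in range(turns):
        label = listener.recognize(speaker.generate(label))
        heard.append(label)
        speaker, listener = listener, speaker
    return heard

if __name__ == "__main__":
    agent_a, agent_b = Agent(seed=1), Agent(seed=2)
    print(dialog(agent_a, agent_b, "pear"))
```

Neither agent ever accesses PROTOTYPES after training, and the two internal representations differ, yet the label survives every turn because the agents’ experiences are sufficiently similar. The noise bound of 0.1 was chosen so that this holds for any seed in this toy setting.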
Traditional accounts of communication require the existence of an external material object x to express the perception of an agent (‘A perceives x’) and an expression for informing B about x, oftentimes by ostension (‘A tells B about x by pointing at x’). The way human perception and action operate is closely related to the two-way process illustrated above. Thus, what we learn from such simulations is that no reference to any further objective reality need be made besides there being ‘data’ available. A communication process is perfectly expressible without relying on the individuals of an external reality. In other words, it is not required that the world appears to our senses in preparcelled form consisting of given objects. The world presents itself not in a veridical way but in a way that is useful for preserving homeostasis. This view has found support among many scholars, including proponents of ‘Evolutionary Epistemology’ (see, e.g., [
98,
99]). This view was abandoned by mainstream thought for some decades before being resurrected in a new guise in light of modern neuroscientific findings [
100]. Running a computer simulation of this model imitates a functioning communication process without committing to an ontology of a structured external world. Even if the communicators in the real world are humans rather than artificially intelligent agents, there is no particular commitment to an ontology about the real world. This is in line with contemporary constructivist and Kantian understandings of perception [
101,
102,
103,
104]. It is a view that challenges traditional semiotic accounts that assume the existence of mind-independent objects with certain features that are signified by signs. It further challenges modern science-oriented approaches that conceive of information as veridical (see, e.g., [
17]).
There are multiple theoretical advantages of this view. First, traditional problems concerning the inscrutability of reference (see, e.g., [
105,
106]) disappear since the commitment to a notion of reference is not required. A stronger claim, motivated by neuroscience research, is made by Rosenberg [
107,
108] (Ch. 8). He says that even if introspection tells us that our thoughts are about something in the world, this is just an illusion. Advocating scientism, he says that science gives us no reason to believe that neural circuits are about something. In the end, neural circuits are nothing more than matter, and matter can never be intrinsically about anything at all. If our thoughts could be said to be about something in the world, then we would expect to find a sort of ‘map’ of the world in the brain. However, all we find is a neuronal network with changing connections. According to the sciences, mental states are not about the world but the neural structure is physically isomorphic to the world. Whether Rosenberg is right in his judgement is controversial, and even though some others such as Kenny [
109] (Ch. 9) or Bennett and Hacker [
110] share his type of sentiment, they draw different conclusions. According to them, when it comes to cognitive processes, we should consider not only the brain in isolation but the person as a whole. In any case, while we are not committed to Rosenberg’s claim that the manifest concept of reference is necessarily defective, we do support the idea that reference is not required. Thus, we suggest that network modelling and simulations show how the notion of reference is not required in a science-oriented perspective on communication. This is a weaker and less controversial position to be in.
Second, the representational account of perception supported by our scientific framework manages to avoid traditional problems of perception. These problems mostly concern how it is that we see an object, and they typically emerge once a mind-independent object is presupposed. However, not much is left of these problems within a framework that does not assume a world that is shaped prior to human perception. The problems simply disappear if objects are conceived of as constructions in the Kantian framework [
97].
Third, communicative processes very often refer to fictive objects, such as unicorns or the characters in novels, rather than presumably existing ones. If there is no commitment to objects in the real world, then there is also no need to draw a metaphysical distinction between real and fictitious objects. The process alone is what distinguishes the one from the other. While philosophers have pondered the metaphysical distinction between real and fictional objects, an account of communication such as ours draws a line between the two types of objects by referring to the underlying cognitive process involved rather than to the objects themselves. While reference to real objects relies on a successful ‘upstream’ recognition process, fantasizing about Sherlock Holmes depends on a ‘downstream’ information flow driven by the active generative network (see
Figure 3).
Interestingly, this account of communication helps explain the phenomenon of humans communicating about fictitious, or made up, worlds and agents.