1. Introduction
During language comprehension, utterances are mapped onto their meanings, and during language production, meanings are mapped onto utterances. An important challenge is that the number of utterances of a language, and of meanings that can be represented, is infinite. Consequently, we cannot memorize all possible utterances and meanings. Instead, from the finite set of utterances to which we are exposed during language acquisition, we generalize in order to produce and comprehend an infinite number of utterances [
1].
Systematicity refers to this ability to generalize from known instances to novel ones by exploiting the regularities between them, in a manner similar to how (rule-based) symbolic functions operate over variables, processing variables of the same type uniformly, or systematically. This ability has been proposed to be ubiquitous in human cognition, and even to constitute a law of cognitive systems [
2,
3]. Other ways to refer to this notion are “compositionality” and “compositional generalization”.
Fodor and Pylyshyn [
2] started a debate arguing that connectionist cognitive models (i.e., models implementing artificial neural networks) cannot behave systematically, and even if they could, they would need to implement a symbol system, similar to the one proposed by the Language of Thought Hypothesis [
4], where the cognitive system consists of rules operating over symbols, with combinatorial dynamics and internal hierarchical structure. In that case, connectionist models are reduced to descriptions at the implementational level of analysis, with little to no explanatory value at the algorithmic level [
5].
Since the beginning of this debate, proponents of connectionism have argued that connectionist models can exhibit systematicity (for a review, see [
6]), from a theoretical point of view (e.g., [
7,
8]) and empirically (e.g., [
9,
10,
11,
12]). However, some points of the debate remain open, including the extent to which connectionist models can exhibit systematicity, and under which circumstances systematic behavior is expected as an implication of the cognitive architecture rather than as a mere coincidence.
Although it is now evident that connectionist models can generalize, it has been argued that they do not show a level of generalization or systematicity comparable to that of humans, which is why modern deep learning models, in contrast to humans, require vast amounts of training to learn certain tasks [
13]. In order to operationalize and measure systematicity, Hadley [
14] proposed to define it in terms of learning and generalization, where a neural network behaves systematically if it can process inputs for which it was not trained. Then, the level of systematicity depends on how different the training items are from the test items. Along this line, Hadley [
14,
15] proposed that human level systematicity is achieved if the model exhibits
semantic systematicity: the ability to construct correct meaning representations of novel sentences.
In this context, many language comprehension models have been proposed (e.g., [
16,
17,
18,
19,
20,
21,
22]). Of particular relevance for our purposes is the approach of Frank et al. [
23], which develops a connectionist model of sentence comprehension that is argued to achieve semantic systematicity. Their model takes a sentence and constructs a
situation vector, according to the Distributed Situation Space model (DSS, [
23,
24]). Each situation vector corresponds to a
situation model (see [
25]) of the state-of-affairs described by a sentence, which incorporates “world knowledge”-driven inferences. For example, when the model processes “a boy plays soccer”, it not only recovers the explicit literal propositional content, but also constructs a more complete situation model, in which a boy is likely to be playing outside, on a field, with a ball, etc. In this way, it differs from other connectionist models of language processing, which typically employ simpler meaning representations, such as case-roles (e.g., [
26,
27,
28,
29]). Crucially, Frank et al. [
23]’s model generalizes to sentences and meaning representations that it has not seen during training, exhibiting different levels of semantic systematicity.
Frank et al. [
23] explain the reason for the development of systematicity to be the inherent structure of the world from which the semantic representations are obtained (similar to [
19]). Hence, systematicity does not have to be an inherent property of the cognitive architecture but rather a property of the representations that are used. In this way, the model addresses the systematicity debate, providing an important step towards psychologically plausible models of language comprehension.
In this paper, we investigate whether the approach of Frank et al. [
23] can be applied to language production. We present a connectionist model that produces sentences from these situation models. We test whether the model can produce sentences describing situations for which a particular voice was not seen during training (passive vs. active), i.e., exhibiting syntactic systematicity, and further, whether the model can produce sentences for semantic representations that were not seen during training, i.e., exhibiting semantic systematicity. Additionally, we test whether the model can produce words in syntactic roles with which they were not seen during training. Finally, we test whether the model can produce sentences describing situations that are not allowed by the rules that generated the semantic representations (i.e., impossible or imaginary situations).
The results of testing active vs. passive show that the model successfully learns to produce sentences in all conditions, demonstrating systematicity similar to [
23]. Furthermore, the model is not only able to produce a single novel sentence for a novel message representation, but it can also typically produce all the encodings related to that message. Concerning the production of words in novel syntactic roles, the model produces the expected patterns in most cases, but more so when there are no alternative ways of encoding the semantics. Nonetheless, when the model is queried to produce sentences describing impossible situations, it completely fails to produce such sentences and instead produces similar sentences describing plausible scenarios. The dataset and code to train/test our sentence production model can be found here:
https://github.com/iesus/systematicity-sentence-production (accessed on 11 August 2021).
Finally, we elaborate on the nature of the input representations as well as the mapping from inputs to outputs. In both cases, and like Frank et al. [
23], we argue that regularities between representations are necessary for a systematic behavior to emerge.
The structure of this paper is as follows:
Section 2 introduces the Distributed Situation Space as described by Frank et al. [
23].
Section 3 presents the model of language production and its architecture.
Section 4,
Section 5 and
Section 6 each present a different set of testing conditions and their results. Finally,
Section 7 and
Section 8 present respectively the discussion and conclusion.
3. Language Production Model
Our model architecture (see
Figure 1) is broadly similar to the one of Frank et al. [
23], with the main difference being that the inputs and outputs are reversed: it maps DSS representations onto sentences. As Frank et al. [
23] point out, this is not intended to model human language development.
The model is an extension of a Simple Recurrent Network (SRN [
31]). It consists of an input layer, a 120-unit recurrent hidden (sigmoid) layer, and a 43-unit (softmax) output layer. The dimensionality of the input layer is determined by the chosen semantic representation (150-dimensional situation vector or 44-dimensional belief vector), plus one bit indicating whether the model should produce an active sentence (1) or a passive one (0). The dimensionality of the output layer is the number of words in the vocabulary plus the end-of-sentence marker (43).
Time in the model is discrete. At each time step t, the activation of the input layer is propagated to the hidden recurrent layer. This layer also receives its own activation at time-step t-1 (zeros at t = 0) through context units. Additionally, the hidden layer receives the word produced at time-step t-1 (zeros at t = 0) through monitoring units, where only the unit corresponding to the word produced at time-step t-1 is activated (set to 1).
We did not test with more sophisticated architectures such as LSTMs [
32] or GRUs [
33] because the focus of this work is the representations used by the model rather than the model itself. Consequently, we used the minimum machinery possible aside from the input and output representations.
More formally, the activation of the hidden layer is given by:

$$\mathbf{h}_t = \sigma(\mathbf{W}_{ih}\,\mathbf{x}_t + \mathbf{W}_{hh}\,\mathbf{h}_{t-1} + \mathbf{W}_{mh}\,\mathbf{m}_{t-1} + \mathbf{b}_h)$$

where $\mathbf{x}_t$ is the activation of the input layer, $\mathbf{m}_{t-1}$ is the vector of monitoring units, $\mathbf{W}_{ih}$ is the weight matrix connecting the input layer to the hidden layer, $\mathbf{W}_{hh}$ connects the hidden layer to itself, $\mathbf{W}_{mh}$ connects the monitoring units to the hidden layer, and $\mathbf{b}_h$ is the bias unit of the hidden layer.
Then, the activation of the hidden layer $\mathbf{h}_t$ is propagated to the output layer, which yields a probability distribution over the vocabulary, and its activation is given by:

$$\mathbf{o}_t = \mathrm{softmax}(\mathbf{W}_{ho}\,\mathbf{h}_t + \mathbf{b}_o)$$

where $\mathbf{W}_{ho}$ is the weight matrix connecting the hidden layer to the output layer and $\mathbf{b}_o$ is the output bias unit.
The word produced at time-step t is defined as the one with highest probability (highest activation). The model stops when an end-of-sentence marker (a period) has been produced.
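To make the propagation and production loop concrete, the following NumPy sketch implements the forward pass and greedy decoding described above. It is a minimal illustration under assumed weight names (W_ih, W_hh, W_mh, W_ho) and parameter packaging, not the original implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def produce_sentence(dss, voice_bit, params, eos_index, max_len=20):
    """Greedy production: at each time step, the most active word is produced."""
    W_ih, W_hh, W_mh, W_ho, b_h, b_o = params
    x = np.concatenate([dss, [voice_bit]])   # DSS/belief vector plus active/passive bit
    h_prev = np.zeros(W_hh.shape[0])         # context units start at zero
    m_prev = np.zeros(W_mh.shape[1])         # monitoring units start at zero
    sentence = []
    for _ in range(max_len):
        h = sigmoid(W_ih @ x + W_hh @ h_prev + W_mh @ m_prev + b_h)
        o = softmax(W_ho @ h + b_o)          # probability distribution over the vocabulary
        word = int(np.argmax(o))             # word with the highest activation
        sentence.append(word)
        if word == eos_index:                # the end-of-sentence marker "." stops production
            break
        m_prev = np.zeros_like(m_prev)
        m_prev[word] = 1.0                   # monitoring unit of the word just produced
        h_prev = h
    return sentence
```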
While it is outside the scope of this work, Calvillo and Crocker [
34] present a more thorough analysis of the internal mechanism of this model using Layer-wise Relevance Propagation [
35].
As an example,
Figure 2 illustrates the production of “someone plays badly.”. At time step t = 0, the DSS representation is fed to the hidden layer, which propagates its activation to the output layer. The output layer then yields a probability distribution over the vocabulary. In this case, the words “someone” and “a” have high activation. Since “someone” has the highest activation, it is the word the model produces. At t = 1, the hidden layer is fed again with the DSS representation, but this time also with the activation of the hidden layer at t = 0 and the word produced at t = 0 (“someone”). It then propagates its activation to the output layer, which again yields a probability distribution over the vocabulary. This time the only activated word is “plays”. At t = 2, the process is repeated, but this time the model activates “badly” and “a”. Since “badly” has higher activation, it produces “badly”. Finally, at t = 3, the model produces “.”, which signals the end of production.
Testing for Systematicity
We defined different test conditions where the model must produce a sentence that it has not seen during training. Depending on its success, and how different the test items are to the training ones, we can assess the degree to which the model can generalize and exhibit systematicity.
The model was trained with cross-entropy backpropagation [
36] and stochastic gradient descent (see
Appendix B for more details). Each test instance corresponds to giving the model a semantic representation dss and seeing whether the model can produce one of the related sentences in S(dss), the set of sentences that describe dss. The test conditions are divided into three sets corresponding to the next three sections: the first set (Active vs. Passive) tests whether the model can produce sentences in passive or active voice for a novel semantics. The second set (Words in New Syntactic Roles) tests whether the model can produce a word in a syntactic role with which that word was not seen in training. Finally, the third set (Semantic Anomalies) tests whether the model can produce sentences for situations that violate rules of the microworld.
4. Active vs. Passive
The dataset contains 782 unique DSS representations of microworld situations. These were divided into two sets: setAP (424 situations) contains situations related to both active and passive sentences, and setA (358 situations) contains situations related only to active sentences, in which the direct object is a person (e.g., “Heidi beats Charlie.”) or unspecified (e.g., “Charlie plays.”), and for which the microlanguage defines no passive sentences. Using this division, we defined our test conditions, which are outlined in
Table 4.
setAP allowed for three conditions:
1: The model has seen active sentences, and a passive is queried.
2: The model has seen passive sentences, and an active is queried.
3: New situations, passive and active sentences are queried.
setA allowed for two testing conditions:
4: The model has seen active sentences, and a passive is queried.
5: New situations, passive and active sentences are queried.
These conditions correspond to different levels of generalization. In all cases, the queried sentence is new to the model. For Conditions 1, 2 and 4 the model has seen the situations but not in the queried voice. Importantly, for Conditions 3 and 5, the model has never seen the situation itself.
From another view, in Conditions 1, 2 and 3, the model has seen similar syntactic and semantic patterns but not the specific sentences or semantics. In contrast, in Conditions 4 and 5, where a passive is queried, the model must produce a sentence with a voice for which no example was given during training, not only for that specific semantics but for that type of semantics.
For these conditions, we applied 10-fold cross-validation, where for each fold, some items are held out for testing, and the rest are used for training. Thus setAP was randomly shuffled and split into 10 folds of 90% training and 10% test situations, meaning per fold 382/42 training/test items. For each fold, the test situations were further split into the three conditions, rendering 14 different test situations per condition, per fold. SetA was also shuffled and split into 10 folds, but in order to preserve uniformity, for each fold 14 situations were drawn per condition, meaning that each fold contained 28 test and 330 training situations.
For Condition 1, the situations were coupled with their active sentences and incorporated into the training set (while the passive sentences remained in the test set), and vice versa for Condition 2. Similarly, for Condition 4, the active sentences were incorporated into the training set, while during testing the model is queried for a passive construction.
For each type of semantic representation (150-dimensional situation vectors and 44-dimensional belief vectors), we trained 10 instances of the model initialized with different weights, corresponding to each fold as described above. The results reported below are averages over these instances.
The model was first tested to see if it could produce a single sentence for each test item. Then, the model was tested to see whether it could produce all the sentences related to each test item.
4.1. Quantitative Analysis
As the initial production policy, we define the word that the model produces at each time step as the one with the highest activation in the output layer. Thus, for a given dss, the model produces a sentence ŝ describing the state-of-affairs represented in dss.
We assume that ŝ is correct if it belongs to the set S(dss) of all possible realizations of dss in the queried voice. However, sometimes ŝ does not perfectly match any sentence in S(dss). As such, we compute the similarity between ŝ and each sentence in S(dss) using their Levenshtein distance [37], which is the number of insertions, deletions or substitutions that are needed in order to transform one string into another. More formally, the Levenshtein similarity between sentences $s_1$ and $s_2$ is:

$$\mathrm{sim}(s_1, s_2) = 1 - \frac{LD(s_1, s_2)}{\max(|s_1|, |s_2|)}$$

where $LD(s_1, s_2)$ is the Levenshtein distance. This similarity is 0 when the sentences are completely different and 1 when they are the same. Then, having a sentence ŝ produced by the model, we compute the highest similarity that one can achieve by comparing ŝ with each sentence s in S(dss):

$$\mathrm{sim}_{\max}(\hat{s}) = \max_{s \in S(dss)} \mathrm{sim}(\hat{s}, s)$$
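For reference, a word-level implementation of this similarity could look as follows (a sketch; the function names and the choice of word-level rather than character-level edits are assumptions for the example).

```python
def levenshtein(a, b):
    """Number of insertions, deletions or substitutions needed to turn sequence a into b."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                           # deletion
                          d[i][j - 1] + 1,                           # insertion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitution
    return d[len(a)][len(b)]

def similarity(s1, s2):
    """1 minus the normalized Levenshtein distance over words: 1.0 if identical, 0.0 if disjoint."""
    w1, w2 = s1.split(), s2.split()
    return 1.0 - levenshtein(w1, w2) / max(len(w1), len(w2))

def max_similarity(produced, expected):
    """Highest similarity between the produced sentence and any sentence in S(dss)."""
    return max(similarity(produced, s) for s in expected)
```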
For each test item, the model produced a sentence ŝ, which was compared to the expected ones, rendering the results in
Table 5. For Conditions 4 and 5, where the model is queried for a passive sentence, there are no corresponding examples in the dataset.
The performance using the 150-dimensional situation vectors can be seen in columns 3 and 4 in
Table 5. On average, the model produces perfect sentences for almost two thirds of the situation vectors. While this may seem modest, we should consider that the situation vectors went through a dimensionality reduction, during which some information is lost. Nevertheless, the model achieves a high average similarity score on the test conditions (see Table 5). Since all generated sentences are novel to the model, this demonstrates that the model can reasonably generalize.
For Condition 5, where active sentences are queried, we see a drop in performance. This could be because setA contains fewer sentences per situation and therefore fewer training items.
The performance with belief vectors can be seen in the last two columns in
Table 5. The average similarity across all the test conditions is very high, with a large proportion of perfect matches, and is almost perfect in several cases, demonstrating high semantic systematicity.
Given that the errors with belief vectors are much fewer, and that a qualitative analysis (in the next subsection) shows that the sentences are similar in nature, we continue the rest of this quantitative analysis using only belief vectors.
Now we test whether the model can produce not only one sentence but all the sentences related to a given situation. Previously, at each time step the word produced was the one with the highest activation. Sometimes, however, multiple words can have high activation, particularly when they all may be correct continuations of the sentence. In order to explore these derivations, we redefine the word production policy: at a given time step, the words produced are all the words with an activation above a threshold τ. By following all the word derivations that comply with this, the model can produce multiple sentences for a given semantics.
In general, τ should be low enough that the model can derive the full range of possible sentences, but not so low that it overgenerates. We evaluated several formulations of τ in terms of precision and recall, in order to identify the one that best reproduces all and only the sentences related to the semantic representations in the training set. The results indicate that the optimal threshold is rather insensitive to the shape of the output distribution and that a fixed value may be sufficient; indeed, the maximum f-score was achieved with a fixed threshold (see
Appendix C for further details).
For our experiments, we chose a fixed value of τ that yields a high recall and a relatively high precision. A high recall is preferred in order to produce a relatively large number of sentences that could give us insight into the production mechanism of the model.
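The following sketch illustrates this threshold-based derivation procedure: every word whose activation exceeds τ is expanded, so a single semantics can yield several sentences. The `step` function is assumed to wrap the forward pass of the earlier sketch.

```python
import numpy as np

def produce_all(dss, voice_bit, step, h0, vocab_size, eos_index, tau, max_len=20):
    """Derive every sentence whose words all have activation above the threshold tau."""
    x = np.concatenate([dss, [voice_bit]])
    sentences = []

    def expand(h_prev, m_prev, prefix):
        if len(prefix) >= max_len:
            return                                  # safety cut-off against runaway derivations
        o, h = step(x, h_prev, m_prev)              # forward pass: output distribution, new hidden state
        for word in np.flatnonzero(o > tau):        # follow every sufficiently active word
            word = int(word)
            if word == eos_index:
                sentences.append(prefix + [word])   # a complete derivation
            else:
                m = np.zeros(vocab_size)
                m[word] = 1.0                       # monitor this word on the next time step
                expand(h, m, prefix + [word])

    expand(h0, np.zeros(vocab_size), [])
    return sentences
```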
For each condition, the following was done: for each dss, we calculated the precision, recall and f-score of the set of sentences produced by the model with respect to the set S(dss); then, these values were averaged across the dss representations of the test condition. In these calculations, only the sentences that fully matched a sentence in S(dss) were considered, discarding partial matches. Finally, these values were averaged across the previously described 10 folds, giving the results in
Table 6.
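Concretely, the per-representation scores could be computed as in the following minimal sketch, in which only exact matches between produced and expected sentences count.

```python
def precision_recall_f(produced, expected):
    """Precision, recall and f-score over sets of sentences; partial matches are discarded."""
    produced, expected = set(produced), set(expected)
    hits = len(produced & expected)                     # sentences that fully match
    precision = hits / len(produced) if produced else 0.0
    recall = hits / len(expected) if expected else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```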
The first and second columns in
Table 6 show the test conditions and the type of sentence the model must produce. The third column shows the average number of sentences related to each dss, together with its standard deviation (for example, the number of active sentences related to each dss in Condition 2). In general, the range is quite wide: some representations are related to a single sentence, while others are related to many more, 130 being the maximum. On average, the model must produce multiple encodings per dss (the exact figures are given in Table 6).
The fourth column in
Table 6 shows the percentage of situations for which the model produced all expected sentences without errors, which was the case for more than half of the representations in all conditions (55.9% on average). Considering only these cases, the mean number of sentences per situation was nonetheless substantial, showing that the model can reproduce a large number of sentences per semantics without difficulty, sometimes reproducing up to 40 different encodings without errors.
The last three columns in
Table 6 show precision, recall and f-score. In all conditions the model produced more than 90% of the expected sentences (93.7% on average). Additionally, the sentences produced were mostly correct (80.8% on average). The value of τ was chosen such that recall would be high; nonetheless, as we will see shortly, the overgenerated sentences are semantically very similar to the expected ones, raising again the question of whether they should be considered errors.
From these results, we see that for more than half of the situations, the model can perfectly produce all the related sentences, and even for the situations with errors, it can reproduce a large proportion of the expected sentences, even though they are novel, again demonstrating systematicity.
4.2. Qualitative Analysis
As before, we analyze first the model’s output in single sentence production, and then we continue with the case where the model must produce all the sentences related to a semantics.
Inspecting the sentences produced with the two types of representations (situation and belief vectors), we see that in both cases they are, with only a few exceptions, syntactically correct, and in all cases their semantics are, if not fully correct, at least closely related to the intended semantics.
The dimensionality reduction used to generate the 150-dimensional situation vectors introduces some information loss: after the reduction, the representations of sentences with adverbial modifiers no longer distinguish between “well” and “badly”, or between “ease” and “difficulty”. The errors elicited for three folds were manually analyzed, finding that, out of 82 errors, 34 (41.5%) were related to modification. The information loss also affects other aspects, causing other error types, but with fewer attestations.
Belief vectors elicited no errors related to adverbial modification. Setting these errors aside, the sentences obtained using the two kinds of representations are qualitatively similar. Because of that, and since the errors with belief vectors are much fewer, we focus the rest of the analysis on the output obtained using belief vectors. However, we expect that the performance using the 150-dimensional situation vectors would be similar to that using belief vectors if the dimensionality reduction did not introduce information loss. See [
38] for an alternative dimensionality reduction technique that may well mitigate this loss.
Although the performance with belief vectors is quite high, the model still produces systematic errors that provide us with insight into its internal mechanism. Because of that, the errors produced in five folds were manually inspected to see their regularities. Examples of these are in
Table 7.
All the inspected errors occur when the model produces a sentence that is semantically very similar to the one expected, reproducing patterns of the speech error literature (e.g., [
39]), which states that speech errors involve elements with phonological and/or semantic similarity. In our case, the model does not operate with phonological information, and consequently the errors are solely related to semantic similarity. This pattern appears even when testing with training items, where the model cannot distinguish between some highly similar situations, even though it has already seen them (Examples 1–3 in
Table 7).
The errors observed (35 in total) can be classified into three main categories:
Underspecification: sentences providing correct information but omitting details (Examples 4–6 in
Table 7).
Overspecification: sentences containing information that is not in the semantics but that is likely to be the case (Examples 7–9 in
Table 7).
Very high similarity: errors related to situations that are remarkably similar because of the design of the microworld (Examples 10–12 in
Table 7).
The model is expected to describe situations assuming that the comprehender has no contextual information. Thus, a sentence that underspecifies gives too little information to fully describe the intended situation, while a sentence that overspecifies gives more information than the semantics contains. It is debatable whether these should be considered errors, given that people are not always precise, sometimes being vague (underspecifying) and sometimes making assumptions under uncertainty (overspecifying). For uniformity, however, we consider as an “error” any difference between the semantics of the input and the semantics of the sentence produced.
The errors in the third category are sentences that seem correct at first glance; it is only after looking deeply into the microworld and the microlanguage that we see the error. First, in this microworld, whenever there is a winner, there is also a loser, meaning that sentences that are apparently contradictory (“someone loses.” vs. “someone wins.”) actually have the same implications and therefore are semantically identical. Second, the winner and the loser are usually in the same location, except when playing hide and seek, in which case the loser can be in the bedroom, while the winner could be in the bathroom and vice versa. Finally, prepositional phrases (e.g., “in the bedroom”) are attached to the subject of the sentence according to the microlanguage, meaning that in “Heidi beats Sophia in the bedroom”, Heidi is in the bedroom while Sophia could be in either the bedroom or the bathroom; similarly, in “Sophia loses to Heidi in the bedroom”, it is Sophia who stays in the bedroom while Heidi could also be in the bathroom. Apart from this detail, the situations are almost identical.
So far, we see that the model can use the linguistic elements learned during training to characterize situations of which it has no experience. The only difficulty appears to be distinguishing highly similar situations. However, the performance is very good in general, and even for the sentences with an error, the output is largely correct. Furthermore, these errors serve to further demonstrate systematicity, as they are elicited precisely because of proximity/similarity in the semantic space, where similar situations have representations that are close to each other.
We also manually analyzed the output for three folds in the case where the model must produce all the sentences related to each semantics. We found 391 sentences with errors, which followed the same patterns and proportions as in single sentence production. Additionally, since the model explores areas of low probability, a fourth category of errors appeared, which interestingly seems to coincide with errors found in the literature on human speech errors, such as repetitions and substitutions [
40]; only four sentences had a clear syntactic anomaly.
In general, the sentences produced with the most activated words are the best, and as one goes further away from the most activated words, errors start to appear, first producing underspecification/overspecification and then repetitions or syntactic errors. Nonetheless, with only few exceptions, the sentences maintain syntactic adequacy and high semantic similarity.
Finally, when the model produces multiple sentences, we can also identify undergenerations, which show what the model preferred not to produce. We compared the undergenerated sentences, in the same three folds as before, against the ones produced. We found 38 situations with undergenerated sentences, which in general followed the statistical patterns of the microlanguage; for example, the location is preferentially mentioned at the end of the sentence. For more details, see
Appendix D.
4.3. Undefined Passive Sentences
For Conditions 4 and 5, where a passive sentence is queried,
Table 8 presents examples of output sentences and the situations that they were supposed to convey. These situations are of two types: winning/losing situations in which both actors are explicitly mentioned, and situations in which the object of the action is unspecified.
We report the case where the model must produce multiple sentences per semantics, because it gives us a slightly wider view; however, the results are very similar for single sentence production. As before, we manually analyzed the output for three folds.
Here, the model must produce passive sentences for areas of the semantic space to which no sentences in the microlanguage are related. Consequently, the model is more uncertain, activating more words at each time step and producing relatively more sentences. Most of these follow the semantics; however, productions with low probability contain errors similar to those previously reported.
Concerning winning/losing situations (77 situations, Examples 1–2 in
Table 8), the object is always a game because in the microworld winning/losing only happens when playing games. Thus, the model produces the name of the game when it is known (e.g., “soccer is…”); otherwise, the sentence starts with “a game is…”. Then, one player is mentioned (the other omitted), and the rest of the situation is described.
For situations with an unspecified object (seven situations, Examples 3–4 in
Table 8), it is unknown whether the subject is playing a game or with a toy. Most of the time (four situations), both types of sentences are produced: sentences where “a game” is the object, and sentences where “a toy” is the object. Apart from this, the sentences follow the semantics.
As in the other conditions, over- and underspecification errors occur in Conditions 4 and 5, but they are rare (Example 5 in
Table 8). Two types of error that appear only for these situations are the inversion of the winning/losing relation in game situations (39 situations, Example 6 in
Table 8) and the mention of the agent at the beginning of the sentence (12 situations, Example 7 in
Table 8).
Although the model exhibits confusion as it explores areas of low probability, it can still process most of the information in each input. For these conditions, not only are the specific representations novel, but the model has also never seen this kind of situation coupled with passive sentences. It is because of the systematic behavior of the model that it can produce coherent sentences for these areas of the semantic space. A classical symbol model would have difficulty producing any output, as grammar rules describing passive sentences for these situations simply do not exist in this microlanguage. From this view, our model can be regarded as more robust and perhaps even more systematic.
6. Semantic Anomalies
The last test condition investigates whether the model can produce sentences for semantic representations that violate rules of the microworld. These representations are not only outside the training set but also outside the set of possible elements in the input space. Intuitively, this condition addresses the human ability to produce sentences for situations that are not real or that contradict common world knowledge [
41].
According to the rules of the microworld, each game is played only in specific locations: chess in the bedroom, soccer in the street, and hide and seek in all locations except the street. We constructed representations that violate these rules by taking the semantic representations related to sentences with the pattern “X plays Y”, where X is a person and Y is a game. Then, we set to 0 all dimensions related to locations (basic events placing a person in a location Z that is not the target one). Finally, the target location (which violates the rules) is set to 1.0 for the protagonist of the situation (X). In this way, the original semantic representations are mostly preserved, except that all people are placed in the target location.
For example, the representation for “Charlie plays chess in the playground.” is the same as the one for “Charlie plays chess in the bedroom.”, except that the dimensions related to locations would place “Charlie” in the playground and would remove all activation related to other people in other places.
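As an illustration, this construction could be sketched as follows, assuming the belief vector is represented as a mapping from basic-event names to values; the event naming (e.g., place(charlie,playground)) is an assumption for the example, not the original encoding.

```python
def make_anomalous(belief, protagonist, target_location):
    """Turn an 'X plays Y' representation into one that violates the location rules.

    `belief` maps basic-event names to activation values; the 'place(person,location)'
    naming is assumed here for illustration.
    """
    anomalous = dict(belief)
    for event in belief:
        if event.startswith("place("):                     # zero out all location dimensions
            anomalous[event] = 0.0
    # place the protagonist in the rule-violating target location
    anomalous[f"place({protagonist},{target_location})"] = 1.0
    return anomalous

# e.g., make_anomalous(belief_charlie_plays_chess, "charlie", "playground")
```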
Here, the training set is the full dataset, and the test items are the newly created representations that violate the rules. We trained three instances of the model and manually analyzed their output.
Results
The sentences produced show that the model is uncertain about the meaning of the input. In most cases, the model produces the usual place for the game (the bedroom for chess, the street for soccer); in the case of hide and seek, it avoids expressing the location. In all cases, the sentences produced follow the rules of the microworld and avoid aspects that would contradict those rules.
During training, the model learns, for example, that soccer is always played in the street, so if someone is playing soccer, he/she is in the street. Then, if a semantics indicates that someone plays soccer in the playground, according to the microworld, either that person is not playing soccer or he/she is in the street. The sentences produced follow one of these options but not both.
It is debatable whether these outputs are correct or not. If we consider the microworld rules as unbreakable, then the model is correct avoiding sentences that make no sense in this microworld. However, if we consider that the semantic representation should be followed, no matter the implications in the microworld, then the model was incorrect as it was unable to follow those representations.
If we expect the model to produce sentences that break the rules, then perhaps we need to relax those rules. In the real world, some events imply certain other events; however, most events are independent of each other. Moreover, a person’s knowledge is limited, and one cannot form very strong rules about the world, since for each rule, many exceptions exist. In contrast, with the limited set of events that the model experiences, the rules that the model learns could be considered “hard”, as during training it receives no information leading it to believe that those rules can be broken.
We should also consider that the dimensions of the semantic representations are not independent. Indeed, each game entails certain locations. We altered the values of the location dimensions, but the model still knows the game, which implies particular locations and can therefore cancel the effect of our alterations. If we want the model to produce sentences that break these entailments, it would need to be rewired such that each input dimension is independent, so that altering one dimension would not affect the model’s behavior with respect to the others.