Referential Salience in French and Mandarin Chinese: Influence of Syntactic, Semantic and Textual Factors

Abstract: In this article, we propose a multifactorial approach to salience analysis, examining the influence of five factors on the salience of referential entities in discourse. Significance tests and Cramér's V tests were conducted to analyze textual data obtained through manual annotation of four text excerpts in French and in Chinese. The results show that almost all the factors have a significant influence on referents' salience (except the animacy factor in one of the excerpts). While it seems difficult to predict a fixed ranking of salience factors, which depends more on textual characteristics than on differences between the two languages, the different values of each factor under investigation behave identically in terms of their positive or negative contribution to salience. The results also suggest that some factors (syntactic function and syntactic parallelism) may have a more stable influence on referents' salience than other factors (animacy, mobility, and main character), which are potentially constrained by textual properties such as the main character's nature, its number of occurrences, and the possible existence of competing protagonists.


Introduction
Salience (also referred to as 'prominence') has recently attracted considerable attention in various linguistic fields (Schnedecker 2011). In this article, this notion is examined from a referential and discursive perspective (Landragin 2004; Chiarcos et al. 2011; Von Heusinger and Schumacher 2019), in which it concerns a property of entities in the discourse representation and serves more particularly to describe the centrality of certain referents in the consciousness of the discourse participants (Neveu 2011). The emergence of the term and its use in the field of discourse reference stem from studies using a cognitive approach around the 1980s (Chafe 1976; Chafe 1994; Prince 1981; Yule 1981; Givón 1983; Ariel 1990), according to which the choice of referential expressions is directly linked to the memory process and the cognitive system that is the mental representation of discourse entities. Salience, as presumed by the speaker and perceived by the hearer, is thus applied to referential entities in a stretch of discourse, and can account for various linguistic phenomena related to the interpretation and production of language, such as the interpretation of anaphoric expressions and the choice of referential expressions. Both the speaker and the hearer collaborate in the processes of referential choice and referential understanding: the degree of salience of a referent indicates to the speaker which referring expression to choose, and to the hearer how to find the relevant referent. From this perspective, taking this notion into account is essential when dealing with the automatic generation or processing of referential expressions.
With a multifactorial approach to salience, our objective is first to verify the influence of five factors, namely syntactic function, syntactic parallelism, animacy, mobility, and main character, on the salience of referents. In addition, we aim to compare not only the relative importance of these factors, but also the contributions of each categorical value of each factor (e.g., subject or direct object for the syntactic function factor). For this purpose, we used textual data annotated with the five factors. Most of the above studies were based on psycholinguistic experiments or descriptive observations. We consider it more informative to rely on attested data and to situate each referential expression in its textual context. With textual data, we are able to analyze referents' salience in their context of realization and to take into account the influence of several factors at the same time.
The interest of this study also lies in its contrastive and comparative approach. Previous research in centering theory (Kameyama 1986; Walker et al. 1998; Di Eugenio 1998; see also Section 2.1 for more discussion of the theory) has shown that the factors that determine the ranking of entities according to their salience may be universal or specific to the language being processed. Given that the five factors under examination are all likely to influence salience in French and in Chinese (Hou and Landragin 2019), we would like to know whether the contribution of each factor is similar in the two typologically different languages. Furthermore, excerpts of the same genre but with different characteristics (see Section 3.1) were chosen to investigate the relative importance of the factors in four different excerpts of parallel texts. While salience factors and their operation may be constrained by textual genre (Schnedecker 2021), a further question is whether the factors have the same effects in texts (or excerpts) of the same genre. Through the statistical results of the annotation, we address the following questions in this article:

• Does each of the factors have a statistically significant influence on referents' salience?
• Is the relative importance of each factor (or the ranking of factors according to their importance) similar in each language?
• Is the relative importance of each factor always similar in texts (or excerpts) of the same genre?
• Do the different categorical values of a single factor all contribute to an increase in the degree of salience? If not, are the patterns (of positive/negative contribution) similar in each language (or excerpt)?
More specifically, we put forth the following hypotheses:
• While the strength of influence might vary, each factor will have a statistically significant influence on referents' salience.
• Given the inherent linguistic differences between the two languages, the relative importance of each factor may be different in French and Chinese.
• While salience factors may be influenced by the specific textual genre, we predict that the relative importance of each factor will remain largely consistent within texts of the same genre.
• Not all values of a single factor have a uniformly positive contribution to referential salience. Some values may enhance salience, while others may diminish it, but the patterns (of positive/negative contribution) are similar in each language.
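To make the first hypothesis more concrete, the following is a minimal sketch of how the association between one factor and salience could be quantified with a Pearson chi-square statistic and Cramér's V. The contingency counts, the binary salient/non-salient coding, and the function names `chi_square` and `cramers_v` are illustrative assumptions and are not taken from the annotated corpus or from the authors' scripts.

```python
from math import sqrt

def chi_square(table):
    """Pearson chi-square statistic and sample size for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of factor value and salience.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n

def cramers_v(table):
    """Cramér's V: effect size in [0, 1] derived from the chi-square statistic."""
    stat, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return sqrt(stat / (n * k))

# Hypothetical counts (NOT corpus data): rows are values of the syntactic
# function factor, columns are (salient, non-salient) referents.
counts = [
    [30, 10],  # subject
    [10, 30],  # direct object
    [20, 20],  # other functions
]

stat, n = chi_square(counts)
v = cramers_v(counts)
print(f"chi2 = {stat:.2f}, n = {n}, Cramér's V = {v:.3f}")
```

The chi-square statistic addresses whether an association exists at all, whereas Cramér's V gives a sample-size-independent measure of its strength, which is what allows factors to be compared with one another. Obtaining a p-value would additionally require the chi-square distribution, which this sketch leaves out.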
In the following sections, we first discuss the notion of salience and our multifactorial approach in Section 2. Then, we present our corpora, annotation methodology, and statistical methods in Section 3. Sections 4-7 are devoted, respectively, to the results of the statistical tests of the syntactic (syntactic function and syntactic parallelism), semantic (animacy and mobility), and textual (main character) factors. The overall results are summarized in Section 8, followed by a discussion of the stability of the factors' contributions to referential salience and some theoretical implications in Section 9. The last section presents our conclusion and research perspectives.

Salience: Main Characteristics, Related Theories, and Multifactorial Approach
In order to define the notion of salience (or prominence), Himmelmann and Primus (2015) and Von Heusinger and Schumacher (2019) proposed three fundamental characteristics of salience:
(i) Relational (or singling-out): the prominent status is the result of competition among language units of the same level (e.g., syllables, referents);
(ii) Dynamic: the prominent status may change;
(iii) Structural attraction: prominent units are structural attractors in their domain.
According to Von Heusinger and Schumacher (2019), the relational principle results from the fact that an entity is considered salient only if it is more salient in relation to the other entities. In the process of interpreting anaphors, discourse referents (realized by referential expressions) are in competition with one another, and it is the most salient entity that attracts the attention of the hearer and provides an anchor for the resolution of an anaphoric expression. In Example (1), the referents zhǔrèn 'the director' and zhè běn shū de zuòzhě 'the author of this book' are in competition. After the interpretation of the second clause, it is respectively zhǔrèn and zhè běn shū de zuòzhě that attract more attention of the hearer in (1) a and (1) b, and become the salient referents in their own context.
(1) a. [主任]i 把这本书的作者介绍给我，[Ø] […]
[Zhǔrèn]i bǎ zhè běn shū de zuòzhě jièshào gěi wǒ, [Ø] […]
[…]oxiàng gāng bìyè de xi[…]
'[…] The director introduced me to the author of this book, who appeared to be a young lady who had just graduated.'
The second criterion emphasizes that the degree of salience of an entity may change as the discourse progresses. As a result, a referent considered salient enough to be the referent of an anaphoric expression at a particular time (or place) may lose (or maintain) its high-saliency status later, as a result of the influence of salience factors (see below). The third characteristic proposes that salient units may be more central in the process of structure building and may contribute to more operations or structures. This seems to be the corollary of the special attention attributed to the most salient entity and could explain the fact that a salient referent can be more easily retrieved by a reduced linguistic form.
In the literature, several theories close to the notion of salience share this perspective that certain entities are more salient (or central) than others in the consciousness of the speaker and the hearer, and that there is a correspondence between linguistic form and degree of salience. In centering theory (Grosz et al. 1995), the 'centers' of an utterance are the entities (or semantic objects) that link that utterance to others in the segment of the discourse in question. According to Grosz et al. (1995), each utterance has a set of forward-looking centers (Cf) that are realized through the constituent expressions of an utterance (U). The elements of Cf are ranked according to their relative salience. Moreover, each utterance other than the initial utterance contains a single backward-looking center (Cb), which is to be chosen from the Cf of the preceding utterance and represents the discourse entity with which the current utterance is most concerned. Various factors can influence the ranking of Cf in an utterance. Most of the work in centering theory emphasizes the role of syntactic functions, and considers that the subject is more likely to contribute to a rise in the ranking. Other factors such as word order, subordination, and lexical semantics are also assumed to affect the ranking.
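The relation between Cf and Cb described above can be illustrated with a minimal sketch. It assumes a deliberately simplified ranking by grammatical function (subject > object > other) and adopts the usual centering formulation in which Cb is the highest-ranked element of the preceding Cf that is realized in the current utterance. The `RANK` table, the function names, and the toy utterances are hypothetical and serve only as an illustration.

```python
# Assumed, simplified salience ranking: a lower number means a higher rank.
RANK = {"subject": 0, "object": 1, "other": 2}

def forward_looking_centers(utterance):
    """Return the entities of an utterance ordered by assumed salience (Cf).

    `utterance` is a list of (entity, grammatical_function) pairs.
    """
    ordered = sorted(utterance, key=lambda pair: RANK[pair[1]])
    return [entity for entity, _ in ordered]

def backward_looking_center(previous_cf, current_utterance):
    """Return the highest-ranked entity of the previous Cf that is realized
    in the current utterance, or None if there is no such entity."""
    realized = {entity for entity, _ in current_utterance}
    for entity in previous_cf:
        if entity in realized:
            return entity
    return None

# Toy utterances (hypothetical annotation, loosely modeled on Example (1)).
u1 = [("author", "object"), ("director", "subject"), ("me", "other")]
u2 = [("author", "subject")]

cf1 = forward_looking_centers(u1)
cb2 = backward_looking_center(cf1, u2)
print(cf1, cb2)
```

In this toy case the director outranks the author in the first utterance, but only the author is realized in the second one, so the author becomes its backward-looking center. The initial utterance has no Cb, which is why the function is only called from the second utterance onward.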
In accessibility theory (Ariel 1990), the choice of referential expressions (or accessibility markers) by the speaker tells us about the cognitive accessibility of the referent in the mental representation of the discourse. A speaker will use a high (or low) accessibility marker to encode a referent that is assumed to be accessible (or inaccessible) to the hearer. Four factors are considered to have a determining effect on the degree of accessibility:
(i) Distance: The distance between the antecedent and the anaphor (relevant to subsequent mentions only);
(ii) Competition: The number of competitors on the role of antecedent;
(iii) Saliency: The antecedent being a salient referent, mainly whether it is a topic or a non-topic;
(iv) Unity: The antecedent being within vs. without the same frame/world/point of view/segment or paragraph as the anaphor. (Ariel 1990, pp. 28-29)
In addition to accessibility theory, the Givenness Hierarchy (Gundel et al. 1993) also intends to associate different uses of referential expressions in discourse with the cognitive status of referents in the mental representation of interlocutors. A hierarchy of six cognitive statuses ranging from 'in focus' to 'type identifiable' is suggested:
in focus > activated > familiar > uniquely identifiable > referential > type identifiable
According to Gundel et al. (1993), a cognitive status higher (or more to the left) in the hierarchy includes all the lower statuses, and not the reverse. For example, an entity in focus is necessarily activated, whereas an activated entity is not necessarily in focus. With this inclusive feature, the hierarchy allows the use of a referring expression corresponding to a lower cognitive status for an entity of a higher status. This differs from accessibility theory, in which the choice of a marker corresponds to a given degree of accessibility on the accessibility scale.
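The inclusive (implicational) character of the Givenness Hierarchy can be sketched as an ordered enumeration. The six status names follow Gundel et al. (1993); the numeric values, the class name `Status`, and the two helper functions are assumptions introduced here purely for illustration.

```python
from enum import IntEnum

class Status(IntEnum):
    """Cognitive statuses, ordered from lowest to highest (assumed encoding)."""
    TYPE_IDENTIFIABLE = 1
    REFERENTIAL = 2
    UNIQUELY_IDENTIFIABLE = 3
    FAMILIAR = 4
    ACTIVATED = 5
    IN_FOCUS = 6

def entailed_statuses(status):
    """A status includes itself and every lower status, not the reverse."""
    return [s for s in Status if s <= status]

def form_is_licensed(required, actual):
    """A form that requires `required` may be used for a referent whose
    actual status is at least as high in the hierarchy."""
    return actual >= required

# An entity in focus is necessarily activated ...
print(Status.ACTIVATED in entailed_statuses(Status.IN_FOCUS))
# ... but an activated entity is not necessarily in focus.
print(Status.IN_FOCUS in entailed_statuses(Status.ACTIVATED))
```

Under this encoding, `form_is_licensed(Status.ACTIVATED, Status.IN_FOCUS)` holds while the reverse call does not. This mirrors the point that a form associated with a lower status can still be used for a referent of higher status, whereas a one-to-one scale would tie each form to a single degree.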
In fact, although the cognitive status of referents is often analyzed through the observation of referential expressions, it seems more likely that the hearer establishes a referent in the mental representation of the discourse and relates subsequent references to this mental representation, rather than to the original linguistic expression in the text (Brown and Yule 1983). While the entities of discourse are virtually present in the mental representation of the interlocutors, salience, as a property of these entities, is neither tangible nor visible. It is thus difficult to observe the degree of salience of entities directly.
Most of the above-mentioned studies agree that the lexical form of an entity can reflect the degree of salience of a referent in its immediate context, especially in the case of reduced lexical forms, which represent salient referents. In our analysis, salience is quite close to, but different from, the notion of accessibility. On the one hand, accessibility theory emphasizes a one-to-one relationship between the form of an expression and the cognitive status of the entity to which the expression refers, with a more or less static view. We consider that the lexical form of an entity is only a reflection of the degree of salience of a referent in its immediate context. Moreover, in an authentic text, this reflection of salience by the form of expressions is more complex than a one-to-one relation. Our view of this relationship is broadly consistent with that of Gundel et al. (1993), who argue that a referent with the 'in focus' (salient) cognitive status may be realized prototypically by reduced forms, or less frequently by other linguistic forms generally related to a less salient referent. Therefore, even if salient referents are not always introduced by reduced referential expressions, high salience markers (anaphoric personal pronouns and zero pronouns) necessarily encode salient referents in their context of occurrence.
On the other hand, the degree of salience does not depend solely on the four factors in accessibility theory. Since discourse entities are constantly updated by textual data, the characteristics of the pronoun, of the antecedent, and of other elements (e.g., verbs and grammatical constructions) of the relevant sentences (or, more broadly, of a discourse segment), the inherent properties of the referent, and the relational properties between the antecedent and anaphoric expressions are all likely to influence salience, hence the importance of a multifactorial approach to salience analysis (Landragin 2004; Hou and Landragin 2019). In accessibility theory, only the distance factor has been measured quantitatively to demonstrate distributional differences between different accessibility markers (i.e., pronoun, demonstrative, and definite description) and their antecedent. In our study, we will measure and compare several factors in two languages to understand their contribution to referential salience. By adopting this quantitative and contrastive method, which goes beyond the scope of accessibility theory, we aim to provide empirical evidence supporting the multifactorial nature of salience. This evidence will not only enhance our understanding of salience in cognitive terms but will also contribute to a better understanding of anaphora interpretation.
In our exploration of salience from a relational perspective, we consider that the salience of an entity is determined not only by the factors that are associated with the entity in question, but also by those that arise from the contexts of its potential competitors. This relational point of view is indeed taken into account by centering theory (Grosz et al. 1995), which proposes a ranking of Cf according to their degree of salience. However, centering theory focuses on local coherence and models the relationship between two consecutive utterances, whereas an utterance can be linked to an earlier utterance. As a result, this theory cannot explain cases where an anaphoric expression that marks high salience is not linked to an entity realized by an expression in the preceding utterance (i.e., where an anaphor and its antecedent are not located in two consecutive utterances), or cases where two expressions that are markers of high salience are found in the same utterance. A focus on local coherence might also miss factors that have a more global influence, such as factors from the context of encyclopedic knowledge and general cognitive processes (e.g., factors associated with the inherent semantic properties of a referent). By extending the analysis beyond immediate linguistic elements to encompass broader discourse factors, our approach offers a more nuanced understanding of the anaphor-antecedent relationship.
In our conception of salience, there is no limit to the number of salient entities in a single utterance, but the durability of the high-salience status of two or more entities over the course of the processing of the entire utterance must be questioned, as the analysis of salience must also take into account the moment and progress of the current processing or production. An entity is salient in relation to its own context and through the properties (or factors) that belong to it. That is to say, high salience status is the result of an accumulation of factors related to (but not limited to) the properties of the antecedent and the anaphor, the properties of other elements (i.e., referential, verbal or other elements) in the sentence of the antecedent and in that of the anaphor (or, even more broadly, in a segment of discourse), the inherent properties of the referent, the relational properties between the antecedent and the anaphor, the situational context, etc. In Example (2), the salience status of referents cannot be established solely on the basis of the content of the first sentence. Instead, the whole situation constructed by the two sentences in (2) involves a set of potential factors (such as syntactic function, syntactic parallelism, or animacy), making the referents 'Susan' and 'Betsy' salient as the referents of elle and lui, respectively.
[Cornish (2000)]
In this article, we consider salience as the property of a discourse entity to be more in the center of attention in relation to other entities, in the mental representation of the speaker and the hearer, at a specific moment, and in a specific context. The notion is characterized by its relational, dynamic, and structural attraction aspects. Moreover, the complexity of the notion requires a model that considers salience from a multifactorial perspective. According to Landragin (2004), two dimensions of salience can be distinguished, namely factors related to the cognitive aspect, such as perceptual intentions, subject attention, memory or affect, and factors related to the physical aspect. The latter includes, on the one hand, formal physical factors, such as salience due to particular syntactic constructions, syntactic function, and word order, and, on the other hand, semantic physical factors, such as salience related to the thematic role or the theme (or topic) of the utterance. In line with this research, Hou and Landragin (2019) revisited salience factors and categorized them into syntactic, semantic, textual, and pragmatic domains:
(i) Syntactic factors: syntactic function, grammatical constructions with salience effect, syntactic parallelism, and syntactic hierarchy;
(ii) Semantic factors: verb semantics (in the utterance of the antecedent or of the pronoun) and referents' semantic features;
(iii) Textual factors: order of occurrence of the referents, recency (distance), frequency of occurrence of the referents, uniqueness, and main character;
(iv) Pragmatic factors: pragmatic constraint and the given-new distinction.
The influence of multiple factors in salience analysis or in anaphora resolution has been observed in several languages, such as French (Landragin 2004, 2015; Schnedecker 2011), English (Chiarcos 2011), Spanish for L2 Spanish learners (Lozano 2016; Martín-Villena and Lozano 2020), and English for L2 English learners (Quesada and Lozano 2020). In this study, we examine these phenomena in light of an original study of salience in Chinese, aiming to delineate the specific characteristics and underlying mechanisms that drive referential salience in this language, especially within a contrastive approach (French/Chinese). It is within this multifactorial and contrastive approach that we analyze five salience factors in this study: syntactic function, syntactic parallelism, animacy, mobility, and main character.

Salience Factors under Investigation
After clarifying our approach to the notion of salience, we review the discussions in the literature on the factors analyzed in this study in order to examine if they have a statistically significant influence on referents' salience, and if the factors show similar or different effects in Chinese and in French. Five representative factors among all the factors discussed in Hou and Landragin (2019) were selected, since these factors were found to be influential in both languages we are analyzing, and they consistently appear across the corpus, ensuring a robust dataset for analysis. The other factors have not been annotated and examined, since annotating all the factors is very time-consuming, and some factors, such as syntactic constructions with salience effect, verb semantics (of implicit causality), the concrete/abstract nature of referents or pragmatic constraint, have a relatively restricted occurrence or are even virtually unobservable in our quantitative analysis corpus, which proves to be quite different from the materials used in psycholinguistic studies (Stevenson et al. 1994; Sun 2014). In order to analyze these factors quantitatively with a corpus-based approach, it would be better to adopt a different methodology than the one used in this research, and to consider, for example, a search of the targeted constructions in corpus databases or in a larger corpus collection built specifically for this purpose.
In the literature, it is often argued that the most salient entity in a French sentence is the one that occupies the syntactic function of the subject. This argument is put forward especially in the work on centering theory and confirmed by psycholinguistic experiments (Matthews and Chodorow 1988; Gordon and Chan 1995; Hudson-D'Zmura and Tanenhaus 1997). In these experiments, a self-paced reading test and a reading comprehension test were used to show that reading is faster when the antecedent occupies the subject function. In addition to the subject, other functions (or values of the syntactic function factor) can be ranked according to their ability to contribute positively to the salience of entities (Grosz et al. 1995).
The above-mentioned research classifies direct and indirect objects in the same group and does not distinguish between the two. According to a cognitive point of view (Van Hoek 2007), when there are two objects in the sentence, the degree of salience of the direct object (DO) and that of the indirect object (IO) differ. While the subject functions as the most salient entity (or Figure in cognitive terms) in the sentence, the DO functions as the second most salient entity (or primary landmark in cognitive terms) and is more prominent than the other object (the secondary landmark), which yields the following hierarchy:
(3) Subject > direct object > indirect object > other
In Chinese, the topic (if there is one in the sentence) is considered to be the function that contributes the most to a referent's salience (Jiang 2004, 2017; Wang 2004). Although 'topic/theme' is primarily considered to be a pragmatic notion (Reinhart 1981) or a notion of information structure (Lambrecht 1994), and although the 'topic-comment' structure is universal, it should be noted that languages have different formal devices to encode it, hence the importance of distinguishing a pragmatic topic, which constitutes what the comment is about in a 'topic-comment' structure, from the syntactic topic, which is the formal device of a pragmatic topic (Gundel 1988). This distinction is especially important for Chinese (Li and Thompson 1976; Huang 1992; Her 1991; Shi 2000), which is considered a pragmatic language (Huang 1994, 2000) and a topic-prominent language (Li and Thompson 1976). That being said, a pragmatic topic is not always encoded by a syntactic topic (it can also be encoded by a syntactic subject). Syntactic topics, however, always refer to pragmatic topics. In Examples (4) and (5), the expressions zhè kuài jiāsù de suìpiàn ('the accelerating fragment') and tā ('it'), which are not subjects of the sentences, constitute the syntactic topics and also encode the pragmatic topics in (4) and (5).
Except for the difference in the primacy of topic function in Chinese, Wang (2004) and Jiang (2004, 2017) propose the same ranking of other values as in French:
(6) Topic > subject > object(s) > other
Another essential factor is syntactic parallelism, also called structural parallelism. This is a phenomenon whereby anaphoric pronouns prefer to co-refer with an element having the same syntactic function in the previous clause. Unlike the previous factor, which is a syntactic property of the antecedent expression, syntactic parallelism concerns both the properties of the antecedent and those of the anaphor, or more precisely a relational property between the two expressions. In the literature, this phenomenon was first observed and considered for pronouns in subject function (Grober et al. 1978; Zhu 2002), as shown in example (7), and later for the interpretation of pronouns in object function (Chambers and Smyth 1998; Jiang 2004), as shown in example (8). In our analysis, we consider that there is a parallel relationship between the antecedent and the anaphor in cases where both expressions function as subject, DO, or IO.
In addition to syntactic properties, we also analyze two semantic factors, animacy and mobility, which are inherent properties of referents. It is often argued in the literature, particularly in cognitive linguistic and psycholinguistic approaches, that animate entities are generally more salient than inanimate entities in both French and Chinese (Lyons 1980; Comrie 1989; Langacker 1991; Pattabhiraman 1992; Hou and Sun 2005; Wang 2014). On the other hand, the semantic feature 'mobility' is less often analyzed as a salience factor. According to Talmy (2000), Zhang (2007), and Schmid (2010), movable entities are supposed to attract more attention than immovable entities and are therefore expected to be more salient. In this article, through the exploitation of corpus data, we attempt to confirm the influence of the mobility factor on salience.
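The two syntactic factors discussed above, the function hierarchies in (3) and (6) and the parallelism criterion, can be expressed as a minimal sketch. The language codes, the value labels, and the function names are hypothetical choices made for illustration; they are not the labels of the annotation scheme itself.

```python
# Hierarchies (3) and (6), most salient function first.
HIERARCHY = {
    "fr": ["subject", "direct object", "indirect object", "other"],  # (3)
    "zh": ["topic", "subject", "object", "other"],                   # (6)
}

# Functions for which a parallel antecedent-anaphor relation is recognized.
PARALLEL_FUNCTIONS = {"subject", "direct object", "indirect object"}

def function_rank(function, language):
    """Return the position of `function` in the hierarchy (0 = most salient)."""
    order = HIERARCHY[language]
    # Hierarchy (6) groups direct and indirect objects under 'object(s)'.
    if language == "zh" and function in ("direct object", "indirect object"):
        function = "object"
    # Any function not listed in the hierarchy is treated as 'other'.
    return order.index(function) if function in order else len(order) - 1

def is_parallel(antecedent_function, anaphor_function):
    """True if both expressions function as subject, DO, or IO alike."""
    return (antecedent_function == anaphor_function
            and anaphor_function in PARALLEL_FUNCTIONS)

print(function_rank("subject", "fr"), function_rank("subject", "zh"))
print(is_parallel("direct object", "direct object"))
```

Note that the first function describes a property of the antecedent alone, whereas the second needs both expressions, which mirrors the relational nature of syntactic parallelism.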
In order to decide which non-human beings we consider animate, we adopted Yamamoto's (1999) criterion that animate entities must have a face. Thus, body parts of a human or an animate object will be treated as inanimate. Although body parts have a more or less animate characteristic, this characteristic is in fact transferred from the entire animate (or human) entity. In other words, they do not possess animacy in themselves. For the mobility factor, Schmid (2010) and Talmy (2000) consider that immovable entities have a permanent location. In addition to this criterion, in order to distinguish movable entities from immovable ones, we consider that movable entities are those that undoubtedly have the ability to move, or those that undergo a change in location in our text excerpts. As shown in example (9), tā ('she') is considered as an animate and movable entity, while tā de yī zhī shǒu ('one of his hands') is considered as an inanimate and movable entity.
The last factor analyzed, main character, is categorized as a textual factor. Sanford and Garrod (1981) consider that a particular centrality is given to main characters when interpreting anaphors in written texts. Lima and Bianco's (1999) experiments show that the textual cue of the main character is crucial for anaphoric interpretation among French students. According to their study, references to the main character are always easier to understand, irrespective of the syntactic functions of the referent. In the corpus study of Jiang (2004), it is found that when only one main character is involved in a Chinese discourse, zero anaphora may even go across clauses or sentences to refer to the main character (which is mentioned several clauses before). In our study, the main character is identified as the most frequently mentioned referent in our four text excerpts.
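The frequency-based identification of the main character can be sketched as follows. The sketch assumes that each mention has already been reduced to a referent identifier; the identifiers and the function name are hypothetical.

```python
from collections import Counter

def main_character(referent_ids):
    """Return the most frequently mentioned referent, or None if there are no mentions."""
    if not referent_ids:
        return None
    counts = Counter(referent_ids)
    # most_common(1) returns a list holding the single (referent, count) pair
    # with the highest count.
    referent, _ = counts.most_common(1)[0]
    return referent

# Hypothetical sequence of referent identifiers, one per mention.
mentions = ["r1", "r2", "r1", "r3", "r1", "r2"]
print(main_character(mentions))
```

A pure frequency count does not handle ties or competing protagonists in any principled way, so such cases would need a separate decision.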

Corpus and Annotation Methodology
The corpus of this study is composed of four narrative text excerpts of relatively small size, listed in Appendix A. In these excerpts, markables were annotated manually 2 using the TXM software (Heiden 2010). The corpus includes both the excerpts in their original language and the corresponding translation excerpts in the other language. While the two excerpts from 'The Belly of Paris' (FR and CTRF) are taken from the beginning of the novel and represent typical characteristics of the narrative genre, the two excerpts from 'The Dark Forest' (FTRC and CH) are in the middle of a narrative science fiction novel. Thus, even though the four annotated excerpts are of the same genre, they are deliberately chosen to be distinctive. A summary of the annotation information is presented in Table 1, and the factors and the annotated values of each factor are summarized in Table 2.
In order to annotate salience factors and to facilitate data processing, the following major preparation stages were carried out:
(i) Annotation of all referential expressions. Co-referential expressions are assigned the same referent identifier under the REF property, as shown in Figure 1.
(ii) Annotation of properties for high salience markers and potential antecedents (i.e., syntactic function, animacy, and main character).
(iii) Exporting, cleaning, and formatting text data from the TXM tool to a CSV table.
(iv) Generating new properties (i.e., mobility 4 and syntactic parallelism) in the CSV table based on properties already annotated.
(v) Data processing for statistical methods.
With respect to high salience markers, we decided to focus on two types of markers: anaphoric personal pronouns and zero anaphors. This choice is justified, on the one hand, by the fact that these markers all contain little lexical information and, on the other hand, by the fact that they are considered to be highly accessible markers (Ariel 1990) that orient the hearer towards salient referents. The zero anaphor, like its pronominal counterpart, indicates a coherence mechanism in both languages, namely that the speaker will continue to talk about a referent already salient or present in a salient situation (Kleiber 1994). This choice is also motivated by the fact that, according to Gundel et al. (1993), the two reduced forms in a discourse inevitably encode the salient referents with the most restrictive cognitive status, even though, much less frequently, other linguistic forms can also be used to realize a salient referent in narrative texts. The observation of pronominal and zero anaphors ensures that entities identified in this way must be salient in their context, so that we can perform an analysis of these entities and the factors influencing referential salience.
Before presenting the statistical methods (Section 3.2), it is necessary to explain the influence of our approach on the data exploitation methodology. Our conception of salience is based on the fact that it is a relational notion. The high salience status of an entity exists only in comparison with other entities of the same type. If we have a series of expressions in a text (schematized by Example (10)), we consider that the referent of Xn+4 is salient, and that this salience is determined by various factors. Analysis solely in terms of the characteristics of the anaphor (Xn+4) and the antecedent (Xn) would neglect the relational principle and the role of other potential antecedents in the process of anaphora interpretation.
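The relational principle can be made concrete with a small sketch. It is an illustration only, not the procedure actually used for the corpus: the field names (`form`, `ref`), the function `candidate_observations`, and the window of four preceding mentions (chosen to mirror the schematic series Xn ... Xn+4) are all assumptions. Each potential antecedent of an anaphor becomes one observation, and it is labeled salient only if it shares the anaphor's referent identifier.

```python
def candidate_observations(mentions, anaphor_index, window=4):
    """Build one observation per potential antecedent of a given anaphor.

    `mentions` is a list of dicts with hypothetical keys 'form' and 'ref'
    (the referent identifier). The window size is an assumption.
    """
    anaphor = mentions[anaphor_index]
    start = max(0, anaphor_index - window)
    return [
        {
            "mention": m["form"],
            "ref": m["ref"],
            # A candidate counts as salient if it is coreferent with the anaphor
            "salient": m["ref"] == anaphor["ref"],
        }
        for m in mentions[start:anaphor_index]
    ]


# Toy series of expressions; X_n+4 is the anaphor and corefers with X_n
mentions = [
    {"form": "X_n", "ref": "A"},
    {"form": "X_n+1", "ref": "B"},
    {"form": "X_n+2", "ref": "C"},
    {"form": "X_n+3", "ref": "B"},
    {"form": "X_n+4", "ref": "A"},
]
obs = candidate_observations(mentions, anaphor_index=4)
for o in obs:
    print(o)
```

Keeping the non-selected candidates (here the mentions of B and C) in the data is what allows the salient and non-salient groups to be compared factor by factor.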

Statistical Methodology
In this study, the variables are of the categorical type ('salient' versus 'not salient', or the different values of each salience factor). The Chi-squared (Chi2) test, Fisher's exact test, and Cramer's V test were applied in order to determine whether the association between the factor in question and the salience of an entity was statistically significant, and to determine the strength of this association. We also provide contingency tables and the conditional distribution of observations. In a contingency table, one variable is generally a response variable Y (the 'salience' variable in our analysis) and the other is an explanatory variable X (each salience factor). It is therefore instructive to construct a conditional probability distribution for the values of Y, given the value of X, in order to compare the various values of each salience factor.
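As a minimal sketch of this step, the snippet below builds a contingency table from (X, Y) pairs and derives the conditional distribution of Y given X. The counts are invented toy data, not the figures of our tables, and the function names are illustrative.

```python
from collections import Counter


def contingency(observations):
    """Count (factor value, salience status) pairs."""
    return Counter(observations)


def conditional_distribution(table):
    """Return P(Y = y | X = x) for every observed cell of the table."""
    row_totals = Counter()
    for (x, _y), n in table.items():
        row_totals[x] += n
    return {(x, y): n / row_totals[x] for (x, y), n in table.items()}


# Hypothetical toy data (NOT the counts reported in the article)
observations = (
    [("subject", "salient")] * 6
    + [("subject", "not salient")] * 4
    + [("DO", "salient")] * 1
    + [("DO", "not salient")] * 9
)
table = contingency(observations)
cond = conditional_distribution(table)
for cell in sorted(cond):
    print(cell, round(cond[cell], 2))
```

Comparing the rows of `cond` (here 60% of subject antecedents are salient against 10% of DO antecedents) is the kind of comparison on which the rankings of factor values are based.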
Both the Chi2 test and Fisher's exact test aim to determine whether the two variables analyzed in a contingency table are independent or not. Generally, the Chi2 test applies to large-sample data, and Fisher's exact test is used when the sample size is small, especially when more than 20% of the cells have an expected count below 5. For all factors, we applied both tests as a double check. The interpretation of these two significance tests is based primarily on the p-value. We chose the 0.001 significance level for rejecting the null hypothesis, namely the absence of dependence between the factor in question and salience.
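The logic of the two tests can be sketched for a 2 x 2 table with the standard library alone. This is an illustration under stated assumptions, not the code used for our results (which relied on SciPy): the table is invented, no continuity correction is applied (library implementations may apply one by default for 2 x 2 tables), and the p-value shortcut is valid for one degree of freedom only.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi2_statistic(table):
    """Pearson chi-squared statistic (no continuity correction)."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def chi2_pvalue_df1(stat):
    """Upper-tail p-value for 1 degree of freedom, i.e. 2 x 2 tables only."""
    return math.erfc(math.sqrt(stat / 2))


def share_small_expected(table, threshold=5):
    """Proportion of cells whose expected count is below the threshold."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < threshold for e in cells) / len(cells)


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2 x 2 table."""
    (a, b), (c, d) = table
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # Sum over all tables that are at most as probable as the observed one
    return sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7))


# Hypothetical table: rows = factor value (yes/no), columns = salient / not salient
toy = [[30, 10], [15, 45]]
stat = chi2_statistic(toy)
p_chi2 = chi2_pvalue_df1(stat)
p_fisher = fisher_exact_2x2(toy)
print(round(stat, 2), p_chi2 < 0.001, p_fisher < 0.001, share_small_expected(toy))
```

Running both tests on the same table gives the double check described above, and `share_small_expected` indicates when the Chi2 approximation becomes questionable.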
Cramer's V test was used in order to measure the intensity of dependence and to make a comparison between factors, between excerpts, or between the two languages. According to Sheskin (2011), a V value below 0.3 indicates a weak association. When the V value is between 0.3 and 0.5, there is a moderate association between the two variables. A V value greater than 0.5 indicates a high degree of dependence.
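For illustration, Cramer's V can be computed from the Chi2 statistic as V = sqrt(Chi2 / (n * (min(r, c) - 1))), where n is the number of observations and r and c are the numbers of rows and columns. The sketch below uses this uncorrected formula on an invented table; library implementations may apply a bias correction, so their values can differ slightly. The treatment of the boundary values 0.3 and 0.5 in `strength` is an assumption.

```python
import math


def cramers_v(table):
    """Uncorrected Cramer's V for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for o, c in zip(row, col_totals)
    )
    return math.sqrt(chi2 / (n * (min(len(row_totals), len(col_totals)) - 1)))


def strength(v):
    """Label a V value with the thresholds attributed to Sheskin (2011)."""
    if v < 0.3:
        return "weak"
    if v <= 0.5:
        return "moderate"
    return "strong"


toy = [[30, 10], [15, 45]]  # hypothetical counts, not the article's data
v = cramers_v(toy)
print(round(v, 2), strength(v))
```

The same `strength` function can be applied directly to reported V values, for example to compare excerpts or factors.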
Association plots of the factors indicate the over-/under-representation of the observed frequency of a cell in a contingency table and its significance, and can help to analyze the contribution of the values taken by each factor. In an association plot, the color of the shading and the (upward or downward) orientation correspond to the (positive or negative) sign of a residual, which is used to measure the difference between the observed frequency and the expected frequency. The intensity of the shading shows its relative importance. This graph therefore makes it possible to analyze the positive/negative contribution of each value of our five factors.
Multiple correspondence analysis (MCA) graphs are presented in Section 8 to visualize the relationships between salience and the five factors analyzed. Multiple correspondence analysis applies to a table which cross-classifies each individual (i.e., referential entity) with respect to all the categorical variables, including the salience factors and the salience status. These MCA graphs take into account all the values (or modalities) assigned to each observation sample and represent the values often associated with a high degree of salience.
We used the Python libraries 'SciPy' and 'dython' to compute the Chi2 tests and Cramer's V values, the R package 'vcd' to generate association plots, and the 'Prince' library to obtain MCA graphs.
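The quantity displayed by an association plot can be sketched as follows, assuming that the residuals are Pearson residuals, (observed - expected) / sqrt(expected). The table is invented, and the cut-off of 2 in absolute value used to flag a cell is a common convention adopted here as an assumption rather than a description of our plots.

```python
import math


def pearson_residuals(table):
    """Pearson residual (O - E) / sqrt(E) for every cell of an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [
        [(o - r * c / n) / math.sqrt(r * c / n) for o, c in zip(row, col_totals)]
        for row, r in zip(table, row_totals)
    ]


def describe(residual, cutoff=2.0):
    """Sign gives over-/under-representation; the cut-off is an assumed convention."""
    direction = "over-represented" if residual > 0 else "under-represented"
    marked = "marked" if abs(residual) > cutoff else "not marked"
    return f"{direction} ({marked})"


# Hypothetical table: rows = factor value (yes/no), columns = salient / not salient
toy = [[30, 10], [15, 45]]
res = pearson_residuals(toy)
for row in res:
    print([f"{r:+.2f} {describe(r)}" for r in row])
```

The squared residuals sum to the Chi2 statistic, which is why cells with large residuals are the ones that contribute most to a significant association.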

Influence of the Syntactic Function Factor
In order to show the ranking of the various values of the factor, we first present the counts of the syntactic function versus the salience or not of the entities in Table 3. We also present the conditional distributions of the salience, given the syntactic function of the previous mention of the referent.
In both French excerpts (FR and FTRC), the subject is the function that contributes the most to increasing the referents' salience. In addition, the conditional distribution of referents realized previously by an IO shows that 50% of IOs are salient in the FR and FTRC excerpts, while the marginal percentages of salient referents are, respectively, 29.89% and 33.77%. This suggests that the IO function may contribute to referents' salience. A closer observation of the sentences containing IOs indicates that this salience may be due to the fact that an IO referent is often a human entity or even a main character in the text, at least in our four excerpts.
In the two Chinese excerpts (CTRF and CH), the topic appears to be the value that contributes the most to the increase in salience. The subject value follows closely with a conditional percentage of 56.90% of the salient antecedents in the CTRF excerpt, and 58.27% in the CH excerpt. According to the conditional distributions of the referents that are the subjects of our current investigations, two hierarchies of syntactic function values can be established in French (11) and Chinese (12). From the point of view of probability, a referent realized by the syntactic function further to the left of the ranking is more likely to stand out than a referent with a syntactic function value further to the right of the ranking. However, due to the relatively limited number of occurrences of IOs and topics, a confirmatory analysis is required to enhance the reliability of the topic's and IO's positions in these rankings.
In order to test whether the influence of the syntactic function factor is significant and to determine the degree of intensity of this influence, we then performed the Chi2 test, the Fisher's exact test, and the Cramer's V test. The results in Table 4 suggest that, for all four excerpts, the dependence between the salience of a referent and the syntactic function of the antecedent is significant (p < 0.001). The Cramer's V values of the four text excerpts are between 0.38 and 0.58. Applying Sheskin's (2011) criteria, the influence of the syntactic function factor on salience can therefore be classified as moderate (the FR excerpt) or strong (the CTRF, FTRC, and CH excerpts).
Graphically, this association can be seen in the association plots (Figure 2). In the four plots, the use of enhanced shading for the bars representing the subject and other functions demonstrates that these two values both contribute significantly to the association between syntactic function and salience: subject antecedents are significantly more frequent and other antecedents are significantly less frequent in the salient group than in the non-salient group. With respect to the rest of the functions, the bars for each function have the same orientation (up or down) in all four excerpts: while there is an over-representation 5 of topics and IOs in the salient antecedents' group, there is an under-representation of DOs in the salient group.
That being said, the topic and IO functions seem to be able to contribute to increasing the referents' salience, while the DO function decreases the salience degree.

Influence of the Syntactic Parallelism Factor
After having cross-classified the syntactic functions of potential antecedents and the referents' salience status, we seek in this section to observe the influence of another syntactic factor, namely syntactic parallelism. Unlike the syntactic function factor, syntactic parallelism is a variable that contains only two values: parallelism or not. In Table 5, we present the counts cross-classifying syntactic parallelism and salience, as well as the conditional distribution of salient and non-salient referents according to whether the anaphor and the antecedent occupy the same syntactic function or not. In the four excerpts, having syntactic parallelism is more likely to contribute to the salience of the referents, which makes it possible to establish the same ranking of the two values (parallelism above non-parallelism) in both French and Chinese.
According to the results of the Chi2 tests and Fisher's exact tests in Table 6, the influence of the syntactic parallelism factor on referents' salience is significant. Cramer's V values (respectively, 0.34, 0.49, 0.57, 0.54 in the FR, CTRF, FTRC, and CH excerpts) show that the dependence between syntactic parallelism and salience is stronger in the two excerpts of 'The Dark Forest' than in the two excerpts of 'The Belly of Paris' (the same phenomenon can be observed for the syntactic function factor, see Table 3), and that the strength of this dependence can be moderate or strong. The association plots also illustrate the influence of syntactic parallelism on referents' salience. Figure 3 shows a significant over-representation of syntactic parallelism phenomena and a significant under-representation of cases where there is no parallel relationship between anaphors and their antecedents in all four text excerpts.
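Since syntactic parallelism is derived from properties that are already annotated, its generation in the CSV table can be sketched in a few lines. The column names (`anaphor_function`, `antecedent_function`, `parallelism`) and the sample rows are hypothetical and do not reproduce the actual export.

```python
import csv
import io


def add_parallelism(rows):
    """Add a binary 'parallelism' value to each row (modifies rows in place)."""
    for row in rows:
        same = row["anaphor_function"] == row["antecedent_function"]
        row["parallelism"] = "yes" if same else "no"
    return rows


# Hypothetical CSV export; the column names are illustrative only
sample = io.StringIO(
    "ref,anaphor_function,antecedent_function\n"
    "A,subject,subject\n"
    "B,subject,DO\n"
)
rows = add_parallelism(list(csv.DictReader(sample)))
for row in rows:
    print(row["ref"], row["parallelism"])
```

The resulting two-valued column can then be cross-classified with the salience status in the same way as the other factors.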

Influence of the Semantic Features of the Referent
In this section, we step out of the syntactic domain and examine whether inherent properties of referents, such as their animate/inanimate and movable/immovable features, can influence their salience.
Tables 7 and 8 show that in all four excerpts, the proportion of salient antecedents is greater among animate entities than among inanimate entities, and the same pattern can be observed among movable and immovable entities. In that respect, we can establish the rankings of the salience degree 'animate entities > inanimate entities' and 'movable entities > immovable entities'. However, are the animate (or movable) entities significantly more prominent than the inanimate (or immovable) entities in all four excerpts?
For the animacy factor, there are significantly more animate entities among the salient antecedents (p < 0.001, Table 9) in the FR, CTRF, and FTRC excerpts. In other words, animacy has a significant influence on referents' salience in these three excerpts. On the other hand, in the CH excerpt, the p value (in both the Chi2 test and the Fisher's exact test) is above the significance level (0.001), so the independence hypothesis cannot be rejected. Regarding the mobility factor, in all four excerpts, there are significantly more movable entities among the salient antecedents (p < 0.001, Table 10).
The association plots 9-16 (see Figure 4) also show that animate (movable) entities are over-represented while inanimate (immovable) entities are under-represented among salient antecedents. While the over-representation and the under-representation are significant in all four excerpts for the mobility factor and in the FR and CTRF excerpts for the animacy factor, they are not significant in the CH excerpt for the animacy factor. In the FTRC excerpt, animate entities are significantly over-represented, but the under-representation of inanimate entities is not significant.
The Cramer's V values in Tables 9 and 10 show that in the four excerpts, the influence of mobility on referents' salience is more stable than that of animacy: while the association strength is between weak and strong for the animacy factor (the V values are, respectively, 0.65, 0.61, 0.25, and 0.06 in the FR, CTRF, FTRC, and CH excerpts), the degree of association is between moderate and strong for the mobility factor (the V values are, respectively, 0.53, 0.50, 0.46, and 0.42 in the FR, CTRF, FTRC, and CH excerpts).
The V values also seem to indicate that the two semantic factors play a slightly more important role in French (the FR and FTRC excerpts) than in Chinese (the CH and CTRF excerpts). Compared to the minor differences observed between the two languages, the differences are more pronounced between the two excerpts from 'The Belly of Paris' and the two excerpts from 'The Dark Forest'. For both semantic features, their influence on referents' salience is greater in the excerpts from 'The Belly of Paris', especially for the animacy feature. Moreover, a comparison between the V values of the two factors within the same excerpts shows that the animacy factor plays a more important role than the mobility factor in the FR and CTRF excerpts, whereas the influence of mobility is greater than that of animacy in the FTRC and CH excerpts. This could be explained by the fact that the degree of influence of the two factors may depend on the nature (semantic feature) of the main characters. While the main protagonist ('Florent') in the FR and CTRF excerpts is a human entity (included in the animate entity category), the main character ('the droplet', a space probe) in the FTRC and CH excerpts is a movable inanimate entity. In this context, in the latter two excerpts, there are relatively more occurrences of movable inanimate or immovable protagonists and fewer occurrences of protagonists in the upper level (in the animate category), as can be seen in Tables 7 and 8. As a result, the influence of the animacy factor is reduced in these two excerpts, while mobility plays a more decisive role than animacy.

Influence of the Main Character Factor
In this section, we explore whether being the main character can have an influence on referents' salience. The conditional percentages in Table 11 show that in all four excerpts, the percentage of salient entities is higher when the referent is the main character than when it is another less central character (for example, 74.68% compared to 25.32% in the FR excerpt). This indicates that being the main character can promote a referent's salience. The p values of the Chi2 and the Fisher's exact tests in Table 12 confirm the statistical significance (p < 0.001) of the influence of the main character factor. This significance is also shown in Figure 5, where there is a significant over-representation of main characters in the category of salient antecedents in all four excerpts. While less central characters are significantly under-represented in the FR and CTRF excerpts, their under-representation is not statistically significant in the FTRC and CH excerpts.
With respect to the strength of association, Cramer's V values (respectively, 0.60, 0.49, 0.26, and 0.24) indicate that the association is greater in the excerpts from 'The Belly of Paris'. While the effect size is rather strong in the FR and CTRF excerpts, the effect is small in the FTRC and CH excerpts. The strength of association seems to depend on the number of occurrences of the main character, since the excerpts from 'The Belly of Paris' were extracted from the beginning of the novel and contain more narration and description of the main character, 'Florent', whereas the excerpts from 'The Dark Forest' were taken from the middle of the novel and describe not only the main protagonist, 'the droplet', but also the interactions between it and the other less central protagonists. This interpretation is also supported by the number of mentions of the main character and the percentage of this number relative to the total number of mentions of referential expressions in the four excerpts, as shown in Table 13.

Overall Results and Comparison between Factors
In the previous sections, each factor was analyzed specifically and independently from the influences of the other factors. However, no single factor alone would be able to explain all the occurrences of high salience markers. In this section, we summarize the overall results and compare the contributions of the five factors in question within each text excerpt.
Firstly, we present the MCA graphs (Figure 6) of the four excerpts, which provide a synthetic visualization of the relationships between the response variable (salience) and the explanatory variables (salience factors). In the four graphs, we can see a clear opposition between referents in the high salience (salience_YES) group and in the low salience (salience_NO) group: on the positive side of the first factorial axis, we can notice the anaphoric expressions that represent entities of high salience; on the negative side of this axis, we see the anaphoric expressions that represent entities of low salience. The two groups (i.e., the entities with, respectively, high and low salience) are also distinguished by the overrepresented values of certain factors. In the FR and CTRF excerpts, high salience is more closely related to the animate, main character, and movable values of the animacy, main character, and mobility factors (upper right corner of the plot), whereas low salience is related to the non-main character, inanimate, and immovable values (lower left corner of the plot). In the FTRC and CH excerpts, high salience is more closely associated with the subject, parallelism, and main character values of the syntactic function, syntactic parallelism, and main character factors (lower right corner of the plot), whereas low salience is associated with the non-parallelism, non-main character, and other values of the syntactic parallelism, main character, and syntactic function factors (upper left corner of the plot). Since component 0 (along the first factorial axis) has a greater contribution to the total inertia of the contingency table than component 1 (along the second factorial axis), all four graphs illustrate a stronger association between high salience and the subject (or topic, indirect object, syntactic parallelism, animate, movable, and main character) value, which confirms our previous analysis.
Table 14 summarizes the Cramer's V values for the five factors in each excerpt. It reveals that the high salience status appears to be the result of a combination of several factors, and that this combination is not always realized in the same way: the relative importance of the factors is not always of the same order, and the relatively small effect size of one factor may be offset by an increase in the influence of other factors. For example, the small effect size of the animacy factor in the FTRC and CH excerpts could lead to the syntactic function and syntactic parallelism factors (or some other factors that have not been analyzed in this article) playing a more important role in increasing referential salience. Through the rankings of the effect sizes of the factors in (14), it seems difficult to establish a fixed ranking of salience factors, but it can be concluded that the differences due to textual characteristics (FR vs. FTRC, or CTRF vs. CH) are greater than the differences between the two languages (FR vs. CTRF, or FTRC vs. CH) 6. Nevertheless, it can be noticed that Cramer's V values for the animacy, mobility, and main character factors are slightly higher in the French excerpts than in the Chinese excerpts. Since the differences are not very pronounced, additional data will be necessary to confirm whether these factors play a more important role in French than in Chinese. While it seems impossible to predict an immutable ranking in the four excerpts, the different values of each factor under investigation show an identical behavior in terms of the positive/negative contribution to salience. In other words, we have observed a homogeneity in the rankings of values under each salience factor, as shown by the following rankings:
In this study, we investigated the influence of various factors on referential salience in French and Chinese. Our analysis confirms that, in all the analyzed excerpts, the syntactic function, syntactic parallelism, mobility, and main character factors all have a statistically significant influence on referents' salience. For the animacy factor, its influence is significant in most of the excerpts, except in the CH excerpt where the main character is a movable inanimate entity and animate entities have a relatively low frequency. The relative importance of each factor was not markedly different between French and Chinese. Nevertheless, it can be noticed that the animacy, mobility, and main character factors have a slightly stronger influence in the French excerpts than in the Chinese excerpts. Furthermore, we found that, even in texts of the same genre, the relative importance of each salience factor can be constrained by different textual characteristics such as the nature of the main character, its number of occurrences, and the possible existence of competing protagonists. With regard to the fourth hypothesis, our findings affirm that not all values of a single factor have a uniformly positive contribution to referential salience, but the patterns of positive and negative contributions of all values are similar in the two languages.
In addition to these findings, we would also like to discuss the stability and instability of the contributions of the factors to the salience of referents. The results in Table 14 suggest that some factors may have a more stable influence on referents' salience than other factors. On the one hand, the syntactic function and syntactic parallelism factors, whose effect sizes are between moderate and strong, contribute to the increase in salience in a reliable manner. On the other hand, a greater range (between small and strong) is found for the effect sizes of the animacy and main character factors. It is likely that the role played by these two factors, as well as the mobility factor, will vary in importance depending on the nature of the texts. As discussed in Sections 6 and 7, the influence of the animacy factor may depend on whether the main character is an animate entity, while the effect of the main character factor may be constrained by the number of times the character occurs in the excerpt in question, or by the fact that there are several competing main characters. As for the mobility factor, even if its V values prove to be fairly stable (between moderate and strong) in the four excerpts, it can be presumed that in a text where the main character (or rather the most central topic) is an immovable entity, the influence of the mobility factor would also be reduced, as illustrated by the description below (16) of the Diamant dit 'le Régent' on the website of the Louvre Museum. However, it should be noted that in narrative texts, it is not very usual to have an immovable entity as the main 'character'.
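The contrast between stable and unstable factors can be illustrated with the Cramer's V values quoted in the preceding sections. The sketch below computes, for each factor, the spread of V across the four excerpts and, for each excerpt, the ranking of the factors. The syntactic function factor is left out because only its overall range (0.38 to 0.58) is quoted in the running text; the spread (maximum minus minimum) is a simple indicator chosen for this illustration, not a statistic used in the analysis.

```python
excerpts = ["FR", "CTRF", "FTRC", "CH"]

# Cramer's V values as quoted in the text, in the order of `excerpts`
reported_v = {
    "syntactic parallelism": [0.34, 0.49, 0.57, 0.54],
    "animacy": [0.65, 0.61, 0.25, 0.06],
    "mobility": [0.53, 0.50, 0.46, 0.42],
    "main character": [0.60, 0.49, 0.26, 0.24],
}

# Spread of V across excerpts: a small spread suggests a more stable influence
spread = {f: round(max(v) - min(v), 2) for f, v in reported_v.items()}


def ranking(excerpt):
    """Factors ordered from the largest to the smallest V in one excerpt."""
    i = excerpts.index(excerpt)
    return sorted(reported_v, key=lambda f: reported_v[f][i], reverse=True)


for factor, value in sorted(spread.items(), key=lambda item: item[1]):
    print(f"{factor}: spread {value}")
for name in excerpts:
    print(name, " > ".join(ranking(name)))
```

With these values, the animacy factor shows the largest spread and the rankings differ between the excerpts from 'The Belly of Paris' and those from 'The Dark Forest', in line with the observation that textual characteristics weigh more than the language.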

Theoretical Implications
In this subsection, we outline some possible implications of our findings on referential salience factors for the different theoretical frameworks in the literature. We refrain from providing a thorough analysis here, both for space reasons and because the nature of our work remains exploratory.
First of all, in an approach complementary to that of accessibility theory (Ariel 1990), we have adopted a quantitative and contrastive method and provided empirical evidence supporting the multifactorial nature of salience. The results of Chi2 and Fisher's exact tests confirm that the salience of the entities depends on a multitude of factors, which include, but are not limited to, the five factors under investigation. The distinction between salience and accessibility is therefore further underscored by our findings. While accessibility theory mainly focuses on the distance, competition, saliency, and unity factors, our study reveals that salience encompasses a broader range of factors. Understanding the influence of these factors amounts, in fact, to reconstructing the cues provided by the speaker so that the hearer can identify the correct referent of the anaphora in question. The effectiveness of the syntactic parallelism factor also indicates that salience depends not only on factors deriving from the characteristics of the antecedent but also on the relational properties between the antecedent and anaphoric expressions.
We then consider the implications of our findings for centering theory (Grosz et al. 1995), which considers that various factors on a rather local level (i.e., in the preceding and current utterances) can influence the degree of salience. On the one hand, we have shown, through the results for the animacy, mobility, and main character factors, that determining factors do not only derive from the local level, but also from a more global context that goes beyond the limit of a series of utterances, or from the context of general cognitive processes. On the other hand, our results of Cramer's V tests in Table 14 indicate that even though the five factors all contribute significantly to referential salience, their relative importance, which does not follow a fixed ranking order, depends both on textual characteristics (to a large extent) and linguistic specificities (to a lesser extent).
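As a reminder of what the exact test computes, the sketch below implements a two-sided Fisher's exact test in Python. It rests on several assumptions: it is restricted to 2 x 2 tables (the tables in the article may have more rows), the counts are invented rather than drawn from the annotated excerpts, the function name is hypothetical, and the two-sided p-value follows one common convention, namely summing the probabilities of all tables that are no more probable than the observed one.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Invented counts (assumption): rows = factor values, columns = high / low salience
hypothetical = [[3, 1],
                [1, 3]]
print(round(fisher_exact_2x2(hypothetical), 4))  # prints 0.4857
```

An exact test of this kind is generally preferred over the Chi2 approximation when some expected counts are small, which is a typical situation for infrequent factor values in short excerpts.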

Conclusions and Perspectives
In this study, we have discussed the notion of salience in discourse reference, its particularities in relation to related notions such as accessibility (Ariel 1990) and centering of attention (Grosz et al. 1995), and the importance of a multifactorial analysis of salience. We also reviewed studies on the influence of five salience factors (i.e., syntactic function, syntactic parallelism, animacy, mobility, and main character), and specified for each factor our annotation criteria and annotated values.
The annotation of salience factors and the application of statistical tests (Chi2, Fisher's exact test, and Cramer's V) showed that almost all the factors have a significant influence on referents' salience (except the animacy factor in one of the excerpts). With regard to the importance of the five factors analyzed, we found that the ranking of the factors is not always of the same order and that a lower influence of one factor could be compensated by an increase in the influence of other factors. While in all four excerpts we were able to observe a regularity in the rankings of values within each salience factor, we find it difficult to predict a fixed ranking of salience factors according to their relative importance. Although our contrastive analysis of French and Chinese excerpts reveals no significant disparities in the overall importance of each factor, there are some nuances to consider. For instance, the Cramer's V values for the animacy, mobility, and main character factors are slightly higher in the French excerpts than in the Chinese ones. This subtle observation may offer preliminary insights into the ongoing debate regarding the language-specific factors that determine referential salience. Nevertheless, since the differences are not very pronounced, additional data will be necessary to confirm whether these factors play a more important role in French than in Chinese. Compared to the minor differences between the two languages, the importance of the factors appears to be more strongly constrained by textual characteristics such as the nature of the main character, its number of occurrences, and the possible existence of competing protagonists, at least for the five factors under investigation and in the four excerpts. For all five factors, some categories (such as the subject category of the syntactic function factor) may enhance salience, while others (like the non-presence of parallelism of the syntactic parallelism factor) may diminish it, but the patterns (of positive/negative contribution) are similar in the two languages.
The results also indicate that certain factors (syntactic function and syntactic parallelism) may exert a more stable influence on referents' salience than other factors (animacy, mobility, and main character). The effect sizes of the latter may be constrained by textual properties such as the nature of the main character, its number of occurrences, and the possible existence of competing protagonists.
As a perspective of this work, we intend to examine, with annotation data, the influences of other salience factors (see Hou and Landragin 2019 for more discussion), such as the order of occurrence of referents in a sentence, the fact of being a pragmatic topic, and the syntactic hierarchy (i.e., main constituents versus modifiers). In addition, other methods of corpus analysis, such as corpus study using databases, can be considered to examine the effect of factors which occur less frequently in a relatively small corpus (such as the factor grammatical constructions with salience effect). An analysis of a corpus consisting of narrative texts of very different natures, or of texts of other genres, would also be worthwhile for examining differences in the importance and the stability of salience factors. This is illustrated by Schnedecker (2021), who points out that in informative texts of the journalistic portrait type, the main referent may rarely be taken up by a pronominal form. However, it is implausible that the referent is rarely perceived as salient in readers' mental representations. In this sense, the high-salience status of referents in other textual genres would not have the same pattern of manifestation nor respond to the same factors as those observed in narrative texts. In the long term, it would be useful to explore means to capture the interactions between factors, to build a model that classifies referents as salient or not, and thus to contribute to the interpretation of anaphoric expressions.
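The positive or negative contribution of each value of a factor can be read from the sign of Pearson residuals, which are the quantities conventionally displayed in association plots. The Python sketch below is a minimal illustration under stated assumptions: the counts and labels are invented, the function name is hypothetical, and we assume that the association plots referred to in this article follow the usual residual-based convention.

```python
import math


def pearson_residuals(table):
    """Pearson residuals (observed - expected) / sqrt(expected) per cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            # Count expected if the factor and salience were independent
            expected = row_totals[i] * col_totals[j] / n
            residual_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(residual_row)
    return residuals


# Invented counts (assumption): columns are high salience, low salience
labels = ["subject", "other"]
hypothetical = [[30, 10],
                [10, 30]]

for label, (high, low) in zip(labels, pearson_residuals(hypothetical)):
    direction = "positive" if high > 0 else "negative"
    print(f"{label}: residual for high salience = {high:+.2f} ({direction} contribution)")
```

A value whose residual in the high-salience column is positive occurs more often with salient referents than independence would predict, and a negative residual indicates the opposite; comparing the signs across languages is one way to check that the patterns of contribution coincide.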
Paul, and Marie insulted him.'
down in front of him, took one of his hands and saw that it was warm.' [Le Ventre de Paris 'The Belly of Paris', Émile Zola (excerpt)]
(i) Annotation of all referential expressions. Co-referential expressions are assigned the same referent identifier under the REF property, as shown in Figure 1.
(ii) Annotation of properties for high salience markers and potential antecedents (i.e., syntactic function, animacy, and main character).
(iii) Exporting, cleaning, and formatting text data from the TXM tool to a CSV table.

Figure 2. Association plots for the factor syntactic function.

Figure 3. Association plots for the factor syntactic parallelism.

Figure 4. Association plots for the factors animacy and mobility.

Figure 5. Association plots for the factor main character.

Figure 6. Relationship between referents' salience and factors in MCA graphs.
a. Syntactic function (French): Subject > (IO >) DO > other
a'. Syntactic function (Chinese): (Topic >) subject > (IO >) DO > other
b. Syntactic parallelism: Syntactic parallelism > non-presence of parallelism
c. Animacy: animate > inanimate
d. Mobility: movable > immovable
e. Main character: main character > non-main character

9. Discussion
9.1. Discussion of the Overall Results

(16) Cette pierre fut découverte en 1698 à Golconde, en Inde, et Ø suscita immédiatement l'intérêt de Thomas Pitt, gouverneur anglais de Madras. Le diamant fut taillé en Angleterre puis acquis à la demande du régent Philippe d'Orléans en 1717. Le Régent surpassait en beauté et en poids tous les diamants jusqu'alors connus en Occident. Aujourd'hui encore, il est considéré comme le plus beau diamant du monde par sa pureté et la qualité de sa taille.
'This stone was discovered in 1698 in Golconde, India, and Ø immediately attracted the interest of Thomas Pitt, English governor of Madras. The diamond was cut in England and then purchased for the French Crown at the behest of the Regent Philippe d'Orléans in 1717. The Regent surpassed in beauty and weight all the diamonds previously known in the western world until that time. Even today, it is considered to be the most beautiful diamond in the world by its flawless brilliance and its perfect cut.'
[Diamant dit 'le Régent' ('Diamond known as the "Regent"'), https://collections.louvre.fr/ark:/53355/cl010103121 (accessed on 9 December 2023)]

'The director introduced me to the author of this book, who appeared to be a young lady who had just graduated.'

Table 1. Summary of annotation information.

Table 2. Annotated values of each salience factor.

Table 3. Contingency table of referents' salience and antecedents' syntactic function, with conditional percentages.

Table 4. Results of Chi2, Fisher's exact, and Cramer's V tests for the factor syntactic function.

Table 5. Contingency table of referents' salience and antecedents' syntactic parallelism, with conditional percentages.

Table 6. Results of Chi2, Fisher's exact, and Cramer's V tests for the factor syntactic parallelism.

Table 8. Contingency table of referents' salience and referents' mobility, with conditional percentages.

Table 9. Results of Chi2, Fisher's exact, and Cramer's V tests for the animacy factor.

Table 10. Results of Chi2, Fisher's exact, and Cramer's V tests for the mobility factor.

Table 11. Contingency table of referents' salience and the factor main character, with conditional percentages.

Table 12. Results of Chi2, Fisher's exact, and Cramer's V tests for the main character factor.

Table 13. Number of mentions of the main character and its percentage in relation to the total number of mentions.

Table 14. Summary of Cramer's V values for all factors.