1. Introduction
The central goal of this paper is to show how both grammatical and cognitive mechanisms involved in language processing can account for variable singular–plural agreement phenomena that, from a normative point of view, are labeled as ‘non-standard’. More specifically, we explore how the conceptualization of perceived referents leaves visible traces in the morphosyntax of Spanish. In several Spanish constructions, there is a possible mismatch between the verbal agreement actually produced by speakers and the agreement form prescribed by the grammatical norm. The widespread nature of this phenomenon can be illustrated by a series of agreement patterns (1–3) that diverge, to varying degrees, from the prescriptive norm. Although some of these are traditionally classified as deviations, they often display systematic behavior that warrants closer examination.¹
| (1) | Habían | muchas | personas | en | la | fiesta. |
| | exist-3PL.PST | many-F.PL | person-F.PL | in | ART.F.SG | party |
| | ‘There were many people at the party.’ |
| (2) | Me | dolieron | las | piernas. |
| | 1SG.DAT | hurt-3PL.PST | ART.F.PL | leg-F.PL |
| | ‘My legs hurt.’ |
| (3) | Un | grupo | de | estudiantes | recorrió(-ieron) | la | avenida. |
| | INDEF.M.SG | group | of | student-M.PL | walk-3SG(PL).PST | ART.F.SG | avenue |
| | ‘A group of students walked along the avenue.’ |
In example (1), the impersonal verb
haber ‘there is’, which is considered invariable in standard Spanish, appears with plural morphology under the influence of the plural postverbal noun phrase (NP)
muchas personas ‘many people’. This pattern can be readily found in spontaneous speech and in corpus data, although the grammatical norm would classify it as non-standard. On this account, the plural agreement morphology reflects the way speakers conceptualize the referent set, rather than the formal grammatical features of the syntactic head (e.g.,
Claes, 2015). In (2), the verb
dolieron ‘hurt’ agrees with the plural NP
las piernas ‘the legs’, which functions as the grammatical subject, while the dative
me ‘to me’ designates the experiencer (e.g.,
Melis, 2022). Although normative grammar fully accepts this construction, it illustrates a different kind of mismatch: the experiencer functions as the conceptual subject of the event, while the grammatical agreement is driven by the postposed, yet semantically salient, body-part referent. Finally, the case in (3) displays variable singular or plural agreement patterns depending on how speakers choose to construe the referent of the clause including a collective NP. Although both variants are acceptable in contemporary Spanish, the alternation between singular and plural agreement reflects a conceptual choice: whether the speaker treats the group as a unified collective entity (
recorrió) or focuses on the plurality of its members (
recorrieron) (e.g.,
Martínez, 1999). This leads to a comparable alternating set of constructions that constitute the empirical focus of this study, namely those in which perception verbs are followed by an infinitival clause (4–5).
| (4) | A | lo | lejos | se | oye(n) | volar | palomas. |
| | to | ART.N.SG | far | REFL | hear-3SG(PL).PRS | fly | pigeon-F.PL |
| | ‘In the distance one hears pigeons flying.’ |
| (5) | En | las | aguas | del | mar | se | ve(n) | flotar | cañas. |
| | in | ART.F.PL | water-F.PL | of-ART.M.SG | sea | REFL | see-3SG(PL).PRS | float | reed-F.PL |
| | ‘In the waters of the sea one sees reeds floating.’ |
In these constructions, the finite perception verb may appear in either the singular or the plural. Formally, this variation reflects two possible agreement targets: the speaker may establish agreement either with the postposed NP (namely,
palomas ‘pigeons’ and
cañas ‘reeds’) or with the infinitival complement representing the perceived event as a whole (that is,
volar palomas ‘pigeons fly’ and
flotar cañas ‘reeds float’). What remains to be determined is whether this alternation correlates with differences in the speaker’s perceptual construal of the witnessed scene. In the literature, examples such as these are often described as instances of ‘agreement
ad sensum’, that is, agreement motivated by meaning rather than by formal morphosyntactic features (e.g.,
Corbett, 2003). In such cases, the speaker’s cognitive representation of the referent, such as its animacy or perceptual salience, overrides the syntactic configuration that would otherwise determine agreement.
The objectives of this study are as follows: The first is to investigate which cognitive mechanisms underlie these cases of variable agreement in Spanish and more specifically to examine the role of ‘salience’, understood as the relative cognitive prominence that speakers assign to elements within a perceived/conceptualized scene (
Langacker, 1987,
2008;
Talmy, 2000;
Croft & Cruse, 2004). Salience has long been recognized as a fundamental factor in linguistic construal, governing whether speakers foreground an event as a holistic occurrence or highlight a particular participant as the Figure/Trajector against a more backgrounded Ground/Landmark. From this perspective, agreement alternations in perception–verb constructions may reflect speakers’ shifting attentional focus: at times privileging the perceived event as a single unit, and at others foregrounding the entity that initiates the perceived action. The second objective is to analyze how specific properties of the perception act or the referent (such as the overall perceptual modality and animacy of the stimulus) correlate with these grammatical options. More broadly, the aim of this study is to show why such variable agreement phenomena are relevant for linguistic theory: they offer an empirical window onto the interface between grammar and cognition and reveal how speakers negotiate between formal and conceptual constraints when producing language (see also
Acuña-Fariña et al., 2014, p. 120).
Addressing these questions requires a methodological approach that combines usage-based and experimental evidence. Corpus data are indispensable for identifying actual distributional tendencies, but they cannot on their own reveal the cognitive representations that guide speakers’ grammatical choices (e.g.,
Schütze, 2016;
Granvik et al., 2025, among many others). For this reason, corpus-based observations are complemented here with an acceptability-rating task, which allows us to probe whether observed distributional patterns align with speakers’ internal evaluations under controlled conditions. Similar corpus–experiment triangulation has proved fruitful in related domains such as argument structure alternations, as earlier research has shown (e.g.,
Bresnan et al., 2007).
The remainder of this paper is structured as follows:
Section 2 examines standard and non-standard agreement patterns in Spanish, situating the phenomenon within broader theoretical discussions on the syntax–semantics interface and the role of meaning in morphosyntactic variation.
Section 3 introduces the infinitival construction with perception verbs, which will serve as the testing ground for the present study.
Section 4 and
Section 5 present, respectively, the results of a corpus analysis and an experimental study designed to capture both the distributional tendencies and the acceptability of singular and plural agreement in these constructions. Finally,
Section 6 brings together the findings and discusses their implications for our understanding of agreement in Spanish as a phenomenon at the interface between grammatical constraints, perceptual modalities, and conceptual foregrounding.
2. Variable Verbal Agreement in Spanish
First, it is necessary to establish what the grammatical norm of Spanish defines as standard agreement. In general terms, (verbal) agreement is a formal mechanism through which elements within a clause share morphosyntactic information. As a consequence, agreement ensures internal harmony among syntactically related constituents (
Bello, 1964;
Avendaño, 2007). Classical examples include, at the level of the NP,
las casas blancas (‘the white houses’, gender and number agreement between a determiner, the noun and adjective) and, at the level of the VP,
los chicos bailan (‘the kids dance’, number and person agreement between the subject and verb). Among these, subject–verb agreement has been described as the most widespread agreement mechanism cross-linguistically, playing a central role in encoding grammatical relations (
Mallinson & Blake, 1981;
Vigliocco et al., 1996;
Mare & Pato, 2018;
Sánchez et al., 2014;
Avendaño, 2007;
Haskell & MacDonald, 2003). Indeed, from a typological perspective, three main formal mechanisms are generally recognized for identifying the syntactic function of a constituent: (i) word order, (ii) case marking, and (iii) agreement. Of these, agreement is considered the most consistent and reliable indicator across languages (
Givón, 2001).
Still, agreement is not universal, nor does it operate in the same way in all languages (
Corbett, 2006). Some languages lack morphological agreement entirely and rely on other devices to encode syntactic relations. Japanese, for instance, does not inflect the verb for person or number, and grammatical relations are marked instead by particles alongside relatively fixed word order. Swedish, though inflectional, also displays minimal verbal agreement (e.g.,
Jag talar ‘I speak’,
du talar ‘you speak’,
han talar ‘he speaks’ all show an invariant verbal form). Spanish, by contrast, has a rich agreement system that allows for considerable syntactic flexibility. Compared to English, which exhibits highly restricted subject–verb agreement (e.g.,
s/he runs vs.
you run vs.
they run), Spanish encodes number and person systematically, making it possible to omit the subject (e.g.,
corre ‘s/he runs’,
corres ‘you run’,
corren ‘they run’) or vary word order without loss of grammatical information.
The canonical rule in Spanish prescribes that “los rasgos de número y persona de los verbos conjugados constituyen el reflejo gramatical de los de su sujeto” (‘the number and person features of conjugated verbs constitute the grammatical reflection of those of their subject’)
(RAE & ASALE, 2009, p. 2559). In practice, however, this idealized pattern is often complicated by semantic, discourse-related, and cognitive factors that introduce variation.
One key observation is that verbal agreement does not always align with the canonical subject, even when a clear subject is present in the structure. Numerous cases can be found in which the verb agrees instead with another nearby NP with the syntactic function of a direct object (DO) (6a)², an adverbial adjunct (6b), a secondary predicate (6c) or even the subject of a subordinate clause (6d).
| (6a) | Se | han | ido | citando | a | varios | guardia | civiles. |
| | REFL | AUX.PRS.3PL | go-PTCP | cite-GER | to | several | guard | civil-PL |
| | ‘Several members of the Civil Guard have been summoned.’ |
| | (Gómez Torrego, 1992, p. 24) |
| (6b) | Para | llegar | a | Madrid | sólo | se | tardan | diez | minutos. |
| | for | arrive-INF | to | Madrid | only | REFL | take-3PL.PRS | ten | minute-PL |
| | ‘It only takes ten minutes to get to Madrid.’ |
| (6c) | Miningas | es | como | se | llaman | allí | a | las | muchachas. |
| | Miningas | be-3SG.PRS | how | REFL | call-3PL.PRS | there | to | ART.F.PL | girl-F.PL |
| | ‘Miningas is what girls are called there.’ |
| | (Goytisolo, J., Fiestas, p. 149) |
| (6d) | No | se | saben | cuántos | alumnos | habrán | asistido | a | clase. |
| | not | REFL | know-3PL.PRS | how.many | student-M.PL | AUX.FUT.3PL | attend-PTCP | to | class |
| | ‘It is not known how many students will have attended class.’ | |
In the literature, mismatches such as these have been accounted for in terms of agreement with an ‘attractor’. The attractor is an intervening constituent situated between the grammatical subject and the verb, which may ‘attract’ agreement features (
Vigliocco et al., 1996;
Wagers et al., 2009;
Sánchez et al., 2014). This phenomenon typically arises when the distance between the subject and the verb increases or the subject is not explicitly mentioned, and the syntactic or semantic salience of a nearby constituent exerts a stronger influence. Similar attraction effects have been widely attested in psycholinguistic studies of agreement production (cf.
Bock & Miller, 1991).³ A second explanatory factor underlying certain cases of agreement mismatch, one that interferes with purely grammatical agreement, involves ‘notional’ or ‘conceptual agreement’. As early as
Gili Gaya (
1980), reference was made to
concordancia mentada ‘intended agreement’, that is, agreement that responds to the speaker’s expressive or communicative intentions rather than to purely grammatical constraints. This type of agreement operates as a principle of textual cohesion, where morphological concord serves as a “guía de interpretación y especificación de los distintos elementos del discurso” (‘a guide to the interpretation and specification of the various elements of discourse’) (
Avendaño, 2007, p. 207). From this perspective, agreement fulfills not only a grammatical but also a discursive and cognitive function, as an interpretive strategy through which speakers activate particular ‘construals’ in their interlocutors. This notion can also be subsumed under a broader principle: the so-called agreement
ad sensum, or agreement according to the overall meaning of the clause (
Corbett, 2003;
Haskell & MacDonald, 2003;
Sánchez et al., 2013;
Mare & Pato, 2018;
Enghels, 2019).
This approach aligns closely with cognitively and functionally oriented hypotheses about language. From the perspective of Cognitive Grammar (e.g.,
Langacker, 1987,
2008), grammatical patterns emerge from the way speakers conceptualize a scene. In this sense, agreement tends to be established with the element that is most ‘salient’ or mentally accessible at the moment of production. This principle is also connected to notions of ‘perspective’ and ‘evidentiality’: speakers may align agreement with the entity they perceive first, or about which they have the most immediate information (cf.
infra Section 5.2).
Naturally, the idea that language and perception are closely interrelated has been expressed before. As early as the 1970s, Miller and Johnson-Laird observed that “the impression that perception and language are closely related may stem from a feeling that people use language primarily to talk about the world they perceive” (
Miller & Johnson-Laird, 1976, p. 119). A few years later,
Jackendoff (
1983, p. 3) reformulated this as a guiding question: “What does the grammatical structure of natural language reveal about the nature of perception and cognition?”
From this perspective, it is particularly interesting to interpret the variable agreement phenomena within the framework of Construction Grammar (e.g.,
Goldberg, 2006, amongst others). This approach conceives grammatical constructions as complex symbolic units, that is, conventionalized pairings of form and meaning. According to the Principle of No Synonymy, two constructions that differ formally must also differ in their meaning or pragmatic function (
Bolinger, 1968, p. 127). Consequently, the view that a difference in syntactic form spells a difference in meaning implies that the choice between singular and plural agreement in contexts of variability is not neutral either. Rather, it may produce subtle effects on the focus or perspective of the utterance. In this sense, variable agreement in perception–verb constructions offers an ideal testing ground to explore how morphosyntactic alternation reflects differences in conceptualization, as further explained in the next section.
3. Testing Ground: The Infinitive Construction with Perception Verbs
In most languages, the semantic nature of perception verbs is determined by two fundamental parameters: the degree of agentivity of the perceiver and the sensory modality involved (
Viberg, 1984;
Sweetser, 1990;
Evans & Wilkins, 2000, among others). In this study, we will focus on four types of verbs that combine these two dimensions:
ver ‘to see’ and
mirar ‘to look at’ for visual perception, and
oír ‘to hear’ and
escuchar ‘to listen to’ for auditory perception. Among the various possible complementation patterns that perception verbs allow, one of the most distinctive is the infinitival construction. This construction represents the type of complement that encodes an act of direct perception of an event by an experiencer subject. The perceived event, expressed by a subordinate infinitive complement, includes a second participant, here referred to as the subordinate participant or NP2, who is responsible for the action expressed by the infinitive. Example (7) illustrates this type of construction.
| (7) | perceiver | perception verb | perceived event |
| | NP1 | PV | Inf | NP2 |
| | Ø | Veía | agrandarse | las bocas de aquellas dos mujeres. |
| | I | see-1SG.IPFV | enlarge-INF.REFL | ART.F.PL mouth-F.PL of those two woman-F.PL |
| | ‘I saw the mouths of those two women grow.’ |
| | (Muñoz Molina, A., El jinete polaco, p. 353 [CREA]) |
Since the infinitive is a non-finite form, it cannot display verbal agreement with a subject. To observe agreement marking within this domain, we must therefore turn to the pronominalized infinitival construction, where the perception verb itself becomes the locus of agreement (8a–b). In this type of construction, the presence of the reflexive
se eliminates the explicit perceiver subject and turns the structure into an impersonal or passive configuration (e.g.,
Mendikoetxea, 1999); the perception verb may occur in either the singular or the plural.
| (8a) | | PV | Inf | NP2 |
| | Se | oye(n) | sonar | las campanas. |
| | REFL | hear-3SG(PL).PRS | ring-INF | ART.F.PL bell-F.PL |
| | ‘One hears the bells ring.’/‘The bells are heard ringing.’ |
| (8b) | | PV | Inf | NP2 |
| | Se | ve(n) | volar | los pájaros. |
| | REFL | see-3SG(PL).PRS | fly-INF | ART.M.PL bird-M.PL |
| | ‘One sees the birds fly.’/‘The birds are seen flying.’ |
In the descriptive and theoretical literature, Spanish
se-constructions are commonly discussed as a domain where closely related form-meaning pairings cluster together and partially overlap. Specifically, three relevant construals have been distinguished, namely passive se, impersonal
se, and middle(-like)
se, although actual tokens tend to occupy intermediate positions rather than neatly instantiate one discrete type (
Maldonado, 1999;
Mendikoetxea, 1999).
First, in passive
se constructions, the clause lacks an overt external argument and instead favors a construal in which the postverbal NP is interpreted as subject-like (as a promoted internal argument) and may control agreement on the finite verb. The classical illustration is
Se venden casas ‘Houses are sold’. In the perception verb + infinitive configuration, a passive-like construal can be defined as especially compatible with plural agreement, as in
Se ven volar los pájaros, where the perceived entities are foregrounded as the salient stimulus and can be paraphrased in a subject-prominent way (‘The birds are seen flying’). Second, impersonal
se is usually defined as a structure without an explicit subject in which
se introduces a non-specific human perspective (‘one/people’), while the postverbal NP retains a more object-like status; the finite verb then tends to appear in singular. A standard example is
Aquí se trabaja bien ‘People work well here’ (
Sánchez López, 2002). In perception verb + infinitive clauses, singular morphology has therefore often been associated with an impersonal analysis. Third, middle(-like)
se constructions are typically characterized by participant defocusing and a shift in attention toward the event or situation as such. What is salient is not who initiates the event nor the affected participant, but rather the manifestation of the situation (
Maldonado, 1999). In the perception verb + infinitive construction, this middle-like region is relevant because singular agreement readily supports an event-centered construal:
Se oye volar las moscas naturally profiles the perceptual scene (‘the flying is audible/one can hear the flying’), with reduced prominence of the participants.
Traditional grammars, including the Nueva Gramática de la lengua española, frequently treat agreement as a key diagnostic for distinguishing passive vs. impersonal se and have often attributed singular agreement to the presence of se itself. However, the impersonal analysis has been questioned because se, as a clitic, does not behave like a canonical subject in the ordinary sense (NGLE, §41.12e–l). Moreover, real usage does not reveal a clean passive-impersonal classification as the same se pattern can occur in contexts compatible with either interpretation, and agreement and other cues do not consistently align with one category or the other. So, closely related sentences can exhibit mismatches between agreement, Differential Object Marking, and interpretive intuitions (e.g., the contrast between Se busca a los culpables ‘one looks for the guilty ones’ and Se buscan soluciones ‘solutions are sought’), suggesting that agreement morphology does not transparently encode a categorical construction choice.
This paper therefore goes beyond the diagnostic tradition and treats the pronominal (infinitive) construction as a constructional family in which agreement participates in the profiling of different construals. Crucially, in the complex perception verb + infinitive configuration, the hearer need not first compute a fully articulated impersonal agent interpretation and only then assign agreement. Rather, singular vs. plural agreement functions as a cue toward an event-centered interpretation (often middle-like) or an entity-centered one (often passive-like). The empirical goal of the paper is to identify which features most strongly condition these interpretations in usage and acceptability patterns.
Importantly, the focus on the infinitival construction is motivated by the considerable controversy it has generated in the literature regarding its internal structure. The debate centers above all on the hybrid status of the second participant, the so-called NP2, which occupies an intermediate position between the perception verb and the infinitive or can be postposed to the complex predicate. On the one hand, NP2 clearly functions as the semantic subject of the event expressed by the infinitive, that is, the participant performing the action being perceived. On the other hand, the infinitive, by its very nature, lacks the capacity to assign case to its arguments. For this reason, NP2 often appears morphologically marked as the DO of the perception verb. In the grammatical tradition, this configuration is therefore referred to as
accusativus cum infinitivo (e.g.,
Del Rey Quesada, 2022).
Without going into detail, this functional ambiguity has given rise to a large debate as to whether we are dealing with a simple construction, involving a DO and a predicative complement, or rather with a complex construction, in which the infinitive functions as an independent subordinate predicate. A further question concerns the nature of this subordinate unit: if we accept that the infinitive forms part of a subordinate structure, does it represent a propositional unit, that is, a complete clause (e.g.,
Felser, 1999), or rather a non-propositional structure (e.g.,
Rodríguez Espiñeira, 2000)? These issues relate directly to the type of syntactic link established, on the one hand, between the perception verb and NP2, and on the other, between NP2 and the infinitive (see
Enghels (2007) for an overview of these different analyses and the arguments raised).
Consequently, this construction offers a particularly rich window into the interaction between syntactic and conceptual dimensions in structures that do not fit into traditional grammatical categories. From a constructional perspective, this raises a further question: how many argument slots does a construction such as ver a los niños jugar ‘to see the children play’ or oír a los niños cantar ‘to hear the children sing’ actually contain? Are we dealing with a three-slot structure, comprising subject, verb, and a complex complement that combines an NP2 and an infinitive, or rather with a construction that includes four clearly differentiated elements?
These observations lead to the question whether a homogeneous analysis of all infinitive constructions following perception verbs is in fact possible and necessary. An alternative hypothesis would be that the internal organization of the construction depends on the semantics of the perception verb itself and on the nature of the perceived stimulus. In other words, the variability observed across verbs such as ver, oír, mirar and escuchar may be motivated by differences in how the perceptual experience is construed. From this perspective, the alternation between singular and plural agreement in se-constructions could be seen as a form of agreement ad sensum, that is, agreement guided by the speaker’s conceptualization of the perceptual scene rather than by purely syntactic constraints.
It is generally known that, under the label of ‘perception verbs’, we find verbal forms that, while sharing certain general characteristics, follow distinct semantic dynamics depending on the perceptual modality they encode (
Ibarretxe-Antuñano, 1999;
Enghels, 2007,
2019). Crucially, these differences are not arbitrary but reflect a well-attested hierarchy of perceptual modalities (
Viberg, 1984), according to which vision occupies a privileged position in Western societies as a source of reliable and objective information, followed by auditory perception and, lower on the scale, tactile, gustatory, and olfactory perception. This perceptual hierarchy leaves clear traces in language: visual perception tends to be encoded by the highest number of lexically specific verbs, whereas less dominant modalities such as touch, taste, or smell are more often expressed through semantically underspecified or multifunctional verbs (e.g.,
sentir ‘to feel’, ‘to smell’, sometimes ‘to hear’). In addition, visual perception verbs stand out for their greater syntactic flexibility, allowing a wider range of complementation patterns. Finally, these verbs also display a higher degree of semantic flexibility, giving rise to extensive metaphorical extensions and polysemy (
Sweetser, 1990;
Evans & Wilkins, 2000).
What is most relevant for the present study is the difference in the nature of the perceived stimulus. In the case of visual perception, it is sufficient for the object to be present in order to be perceived. In contrast, auditory perception is not activated merely by the presence of a stimulus: it requires the emission of a sound effect. This implies that auditory stimuli are inherently dynamic, since they must produce sound. Thus, when we say oigo al niño ‘I hear the child’, what we actually perceive is the auditory result of some activity carried out by the child. This contrasts with visual perception, where the child may simply be present and serve as the object of perception. This distinction is crucial, as it significantly conditions the relationship between the verb, the type of stimulus, and the grammatical structure that accompanies it, as further analyzed through a corpus study.
4. Variable Agreement: Results from a Previous Corpus Study
The first case study examines empirical data from a corpus. This section provides a concise summary of
Enghels (
2019), from which only the most relevant conclusions are retained, serving as a starting point for the subsequent experimental analysis. To empirically determine which factors condition singular or plural agreement in the pronominal infinitive construction, a corpus was compiled containing all occurrences of this construction with the four perception verbs under investigation:
ver, oír, mirar, and
escuchar. The data were extracted from a range of sources, including written press (e.g., El País, El Mundo), literary texts, and electronic Spanish databases (e.g., Corpus de Referencia del Español Actual (CREA), Corpus del Español (CDE)) in order to cover different genres and registers. The final dataset comprised approximately 4000 infinitive constructions, of which 748 correspond to pronominal cases.
The collected constructions were classified according to the formal agreement relationship between the perception verb and the subordinate NP2 participant. The first group comprises those cases in which both the verb and the participant are singular. In a second group, the subordinate participant NP2 is plural while the verb appears in the singular (e.g., Se oye_sg entrechocar [botellas]_pl ‘One hears bottles clinking together’). A third category contains examples showing plural agreement, i.e., both the subordinate participant NP2 and the verb appear in the plural (e.g., Se veían_pl bajar [reatas de mulas]_pl ‘Strings of mules were seen coming down’). The analysis concentrated only on the second and third patterns, as these are the only contexts that allow a contrastive examination of whether a postverbal plural NP2 participant triggers formal agreement on the perception verb. Notably, these represented a relatively small but balanced subset of the total corpus (singular agreement n = 88; plural agreement n = 90).
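The three-way classification described above can be made explicit in a short sketch. It is an illustration only, not the coding procedure actually used for the corpus: the record layout (`Token`), the function name `agreement_pattern`, and the third, singular example sentence are assumptions introduced here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One pronominal perception verb + infinitive token (hypothetical layout)."""
    text: str
    verb_number: str  # "sg" or "pl": number marked on the finite perception verb
    np2_number: str   # "sg" or "pl": number of the subordinate participant NP2


def agreement_pattern(token):
    """Return 1, 2 or 3 for the three groups described in the text, else None."""
    if token.np2_number == "sg" and token.verb_number == "sg":
        return 1  # singular verb, singular NP2: no contrast observable
    if token.np2_number == "pl" and token.verb_number == "sg":
        return 2  # plural NP2, verb stays singular
    if token.np2_number == "pl" and token.verb_number == "pl":
        return 3  # plural NP2, verb shows plural agreement
    return None   # plural verb with singular NP2: outside the three groups


tokens = [
    Token("Se oye entrechocar botellas", "sg", "pl"),
    Token("Se veían bajar reatas de mulas", "pl", "pl"),
    Token("Se oye cantar un gallo", "sg", "sg"),  # invented filler example
]

# Only patterns 2 and 3 enter the contrastive analysis.
contrastive = [t for t in tokens if agreement_pattern(t) in (2, 3)]
counts = Counter(agreement_pattern(t) for t in contrastive)
```

Applied to the full set of pronominal tokens, a tally of this kind would yield the two contrastive subsets reported above (pattern 2, n = 88; pattern 3, n = 90).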
Figure 1 displays the correlation between perceptual modality and the type of verbal agreement: singular agreement is represented in blue, and plural agreement in a contrasting color (see also
Enghels, 2019, p. 119). Based on these data, the statistical analysis reveals a clear contrast between visual and auditory perception verbs.⁴ The visual ones show a marked preference for plural agreement: in 59.3% of the analyzed cases, the plural subordinate participant triggers plural formal agreement on the main verb (that is, pattern 3 described above). In contrast, the auditory verbs display a strong preference for singular agreement, even when the subordinate NP2 participant is plural, namely in 63.8% of the analyzed cases (that is, pattern 2 described above).
In the examples involving visual perception verbs (9a–b), the second participant typically follows the infinitive, which creates a certain structural distance between the main verb and the subordinate participant. Despite this distance, formal agreement systematically occurs between the two constituents. In (9b), the structural distance is even greater, as the infinitive and the subordinate participant are separated by a prepositional complement (
con dirección al trabajo), yet plural agreement is still maintained. This observation suggests that such agreement cannot be accounted for by purely grammatical factors and instead calls for a conceptual interpretation. A plausible hypothesis is that the speaker perceives a strong connection between the perceptive act and the initiator of the perceived event. By contrast, the examples with auditory verbs in (10a–b) systematically lack agreement between the main verb and the subordinate NP2 participant, regardless of whether the latter appears immediately after the verb or at a greater distance. In the literature, this behavior has been interpreted as reflecting the more impersonal nature of auditory constructions (
Mendikoetxea, 1999;
Enghels, 2007). In sentences such as (10a)
se oye a los grillos beber, the interpretation does not involve a specific perceiver but rather conveys a generic perception, as in ‘it is heard that the crickets are drinking’.
| (9a) | Más allá | se | ven | pasar | los transeúntes. |
| | farther.away | SE | see-3PL | pass-INF | the passers-PL |
| | ‘Farther away, one sees passers-by going by.’ |
| | (Romero, L., La noria, 1952 [Spanish Online]) |
| (9b) | […] cuando | se | miraban | pasar | con dirección al trabajo | |
| | when | SE | watch-3PL.IPFV | pass-INF | with direction to.the work | |
| | a los empleados y funcionarios del Banco Nacional. | |
| | DOM the employees and officials of.the Bank National.PL |
| | ‘When one watched the employees and officials of the National Bank on their way to work, wearing their impeccable shirts, going by.’ |
| | (La Prensa, December 5, 1997 [CREA]) |
| (10a) | Antes de salir el sol | se | oye | a los grillos | beber. |
| | before of rise-INF the sun | SE | hear-3SG | DOM the crickets.PL | drink-INF |
| | ‘Before sunrise, one hears the crickets drinking.’ |
| | (Ruy Sánchez, A., Los jardines secretos de Mogador, 2002 [CDE]) |
| (10b) | Dicen los campesinos que en las noches oscuras aún | se | escucha | gemir |
| | say-3PL the peasants that in the nights dark still | SE | hear-3SG | moan-INF |
| | a las almas sin sepultura. | | | |
| | DOM the souls without burial.PL | | | |
| | ‘Peasants say that on dark nights one can still hear the souls without burial moaning.’ |
| | (Cuvi, P., Ecuador. Paso a paso, 1994 [CREA]) |
According to the hypothesis proposed, the higher proportion of plural agreement observed with ver and mirar can be explained by the object-centered nature of visual perception (supra Section 3). Visual perception tends to focus on and foreground the participant who initiates the perceived event, thereby favoring an interpretation in which this second participant is placed in the foreground. In such cases, visual verbs select infinitival complements of a non-clausal type, corresponding to a 4-slot type of the whole construction, and the speaker establishes a direct semantic link between the subordinate participant and the perception verb itself. This promotes the interpretation of that participant as the (grammatical) subject of the main predicate, which in turn triggers plural agreement. The prototypical constructional configuration for visual perception verbs corresponds to a non-clausal analysis of the construction and can be represented as:
In contrast, the more event-oriented nature of auditory perception explains the predominance of singular agreement in pronominal infinitive constructions with oír and escuchar. These verbs preferentially select infinitival complements that denote dynamic events, where the speaker first establishes a semantic relationship between the subordinate participant NP2 and the infinitive, and only subsequently between that complex and the perception verb. In this configuration, the second participant is no longer the core perceived entity; rather, the entire subordinate clause constitutes the perceptual focus. Consequently, agreement is singular. Hence the prototypical constructional configuration for auditory perception verbs, corresponding to a clausal analysis of the construction, is:
Thus, the alternation observed in the corpus data shows a tendency toward semantic agreement, guided by the conceptual salience of the stimulus rather than by strictly grammatical rules. In this view, variation reflects subtle cognitive adjustments during linguistic production.
However, it was also observed that these configurations do not represent fixed patterns but rather prototypical tendencies. As the corpus data reveal, instances of singular agreement also occur with visual perception verbs (11a), and plural agreement occasionally appears with auditory verbs (11b). A possible explanation for the observed cases is that, when the subordinate NP2 referent is abstract (as in sus problemas con la justicia ‘his problems with the law’ in (11a)), visual perception can no longer construe it as a salient concrete entity, which leads to the absence of plural agreement. Conversely, when the perceived object in auditory perception refers to a clear sound source (such as las fuertes pisadas ‘the strong/loud footsteps’ in (11b)), the speaker may opt for a more object-centered construal and establish plural agreement.
| (11a) | Se | vio | venir | sus problemas | con la justicia. |
| | SE | see-3SG.PFV | come-INF | his problems.PL | with the law |
| | ‘His problems with the law were seen coming.’ |
| | (Clarín, April 7, 1997 [CREA]) |
| (11b) | Se | oyeron | las fuertes pisadas de Federico | perderse |
| | SE | hear-3PL.PFV | the strong footsteps.PL of Federico | lose-INF.REFL |
| | por el largo corredor del hotel. |
| | through the long corridor of the hotel |
| | ‘Federico’s heavy footsteps were heard fading away down the long hotel corridor.’ |
| | (de Lera, A. M., Los clarines del miedo, 1967, p. 14) |
Moreover, both types of agreement may alternate in nearly identical syntactic contexts (12a–b). This suggests that the alternation stems from subtle discourse-level adjustments. In (12a), the event of flies flying is foregrounded as a whole, an interpretation that can be linked to the metaphorical meaning of the sentence. The sentence does not literally refer to flies buzzing but figuratively highlights how quiet, inactive, or dull the professor’s classes are, so much so that even the slightest noise would be noticeable. In (12b), attention is drawn to the flies as initiators of the perceived action, which is audible because of the literal, extreme silence in the square. In short, the variation might again be explained by conceptual foregrounding on the part of the speaker.
| (12a) | […] profesor en cuyas aulas | se | oía | volar | las moscas. |
| | professor in whose classrooms | SE | hear-3SG.IPFV | fly-INF | the flies.PL |
| | ‘A professor in whose classes you could hear the flies buzzing.’ |
| | (Chavarría, D., El rojo en la pluma del loro, 2002 [CREA]) |
| (12b) | No hay ni un ruido en la plaza; | se | oyen | volar | las moscas. |
| | there.is not even one noise in the square | SE | hear-3PL | fly-INF | the flies.PL |
| | ‘There isn’t a single noise in the square; you can hear the flies flying.’ |
| | (Sarduy, S., De dónde son los cantantes, 1967 [CDE]) |
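The kind of tally that underlies percentages such as those reported above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the study: the input is limited to the eight glossed examples (9a)–(12b) cited in this section, annotated here by hand, and the names `TOKENS`, `MODALITY`, and `plural_share` are hypothetical. The resulting shares therefore describe this toy sample only and are not the corpus figures.

```python
from collections import Counter

# Toy input: the eight glossed examples (9a)-(12b) from this section,
# hand-annotated as (example, perception verb, agreement on the main verb).
# This is NOT the full corpus sample analyzed in the study.
TOKENS = [
    ("9a", "ver", "plural"),
    ("9b", "mirar", "plural"),
    ("10a", "oír", "singular"),
    ("10b", "escuchar", "singular"),
    ("11a", "ver", "singular"),
    ("11b", "oír", "plural"),
    ("12a", "oír", "singular"),
    ("12b", "oír", "plural"),
]

# Mapping from verb lemma to perceptual modality, as described in the text.
MODALITY = {
    "ver": "visual",
    "mirar": "visual",
    "oír": "auditory",
    "escuchar": "auditory",
}


def plural_share(tokens):
    """Return {modality: share of tokens showing plural agreement}."""
    totals = Counter()
    plurals = Counter()
    for _example, verb, agreement in tokens:
        modality = MODALITY[verb]
        totals[modality] += 1
        if agreement == "plural":
            plurals[modality] += 1
    return {modality: plurals[modality] / totals[modality] for modality in totals}


if __name__ == "__main__":
    for modality, share in plural_share(TOKENS).items():
        print(f"{modality}: {share:.1%} plural agreement")
```

Keeping the verb-to-modality mapping separate from the annotated tokens means that the same tally could be rerun on a larger annotated sample without changing the counting logic.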
As an interim conclusion, the corpus analysis has provided an empirical overview of how variable agreement patterns manifest in Spanish, revealing which combinations of verb type (visual vs. auditory perception) and agreement (singular vs. plural) actually occur and how frequently. However, this corpus-based research also faced certain inherent limitations. The first concerns the size of the dataset: the number of relevant examples was small, suggesting that the pronominal infinitive construction is infrequent in usage. Moreover, as is often the case in corpus-based research, these observations are limited to ‘attested usage’. They show what speakers produce, but not how they internally evaluate alternative forms that may be infrequent or absent from the data. Therefore, following Schütze (2016) and Granvik et al. (2025), among many others, we believe that corpus evidence alone cannot distinguish whether the observed agreement preferences stem from grammatical constraints, discourse-pragmatic tendencies, or processing factors.
To address these questions, the corpus results serve as the basis for an acceptability-rating task, designed to probe speakers’ intuitions about the well-formedness of the relevant agreement patterns under controlled conditions⁵.
6. Discussion and Conclusions
Overall, it is fair to conclude that the results of the acceptability-rating task offer a more nuanced picture of agreement variability in perception–verb constructions than the corpus analysis suggested. While the corpus study clearly showed an effect of perceptual modality on agreement configurations—with ver favoring plural agreement and oír favoring singular agreement—the experimental data did not fully replicate this pattern. No global interaction between Modality and Agreement emerged, indicating that when speakers are asked to rate the naturalness of these constructions in isolation, they do not systematically prefer one agreement type over the other depending on perceptual modality. This discrepancy between corpus and experimental evidence might reflect a task effect: corpus data capture production, where speakers make spontaneous linguistic choices shaped by discourse context and communicative goals, whereas the acceptability task taps into processing and metalinguistic evaluation, where sentences are assessed without a discourse frame and thus without the pragmatic pressures that favor certain configurations.
Despite the absence of an overall visual modality dominance effect or an interaction of Modality with Agreement, the analysis confirmed that both the singular and plural variants are cognitively active and equally represented in speakers’ internal grammar. The balanced acceptability ratings for both agreement patterns suggest that speakers treat them as coexisting constructions, each linguistically valid and licensed by the grammar of contemporary Spanish.
At the same time, the results clearly identified Animacy as a decisive factor in acceptability, modulating the relationship between Modality and Agreement. As expected, sentences with human NP2s received higher ratings overall than those with inanimate NP2s. Yet the direction of this effect was the inverse of what was initially hypothesized: rather than promoting plural agreement, the presence of human referents favored singular marking, while inanimate referents were more compatible with plural agreement. This inversion can be explained by the role of Differential Object Marking (DOM). As DOM is almost categorically associated with human and definite objects, its presence reinforces the DO status of NP2 and thus enhances the acceptability of singular agreement.
The results showed another clear asymmetry between human and inanimate NP2s. For human referents with plural agreement (not marked by DOM), acceptability ratings pattern as predicted by H1b, with sentences involving visual perception receiving higher scores than those involving auditory perception. In contrast, for inanimate NP2s (also not marked by DOM) plural agreement is overall rated as more acceptable than singular agreement, regardless of perceptual modality. Notably, and contrary to H1b, in this inanimate domain plural agreement showed a slight preference for oír over ver. At a more abstract level, this divergence suggests that the interaction between perceptual modality and agreement is not uniform across referent types but is mediated by the conceptual status of NP2.
Overall, these findings demonstrate that speakers can modulate their choice between alternating constructions depending on both referential properties (e.g., animacy and the presence of DOM) and perceptual modality (visual vs. auditory), although the latter plays a lesser role than anticipated.
From a broader theoretical perspective, these results lend further support to the view that agreement phenomena reflect the interface between grammatical form and conceptual construal. The fact that both agreement patterns are accepted by native speakers indicates that multiple constructions coexist within the grammar of Spanish. At the same time, the divergence between corpus and experimental outcomes underscores the importance of combining production-based and rating-based methodologies to capture the full range of grammatical variation.
For future research, it will be crucial to complement these acceptability data with production experiments that elicit spontaneous speech. Such studies would clarify how speakers actually construct and select agreement patterns in real but controlled communicative situations, and how contextual or discourse factors influence their choices. Furthermore, new experimental paradigms could be designed to further test the role of salience (a) in contexts where DOM cannot interfere, for instance with inanimate NP2s only; (b) by introducing adjectival modifiers that contribute specific perceptual-conceptual properties (for instance, brightness, movement, or dynamicity); or (c) by systematically testing animate referents across both agreement types and across conditions with and without DOM, including configurations that run counter to the grammatical norm, such as se oyen/ven jugar a los niños or se oye/ve jugar los niños. These controlled manipulations would make it possible to test whether the link between salience and agreement persists independently of the grammatical marking associated with DOM.
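To make the fully crossed design implied by the proposal for animate referents more concrete, the following Python sketch enumerates the conditions obtained by crossing perceptual modality, agreement, and the presence of DOM. It is a minimal illustration under stated assumptions: the sentence frame is taken from the examples quoted above (se oyen/ven jugar a los niños, se oye/ve jugar los niños), whereas the names `VERB_FORMS` and `build_conditions` and the dictionary layout are hypothetical and do not describe an existing experimental script.

```python
from itertools import product

# Present-tense forms of oír and ver, as they appear in the examples in the text.
VERB_FORMS = {
    ("auditory", "singular"): "oye",
    ("auditory", "plural"): "oyen",
    ("visual", "singular"): "ve",
    ("visual", "plural"): "ven",
}


def build_conditions():
    """Cross modality x agreement x DOM and build one sentence per cell."""
    conditions = []
    for modality, agreement, dom in product(
        ("visual", "auditory"), ("singular", "plural"), (True, False)
    ):
        verb = VERB_FORMS[(modality, agreement)]
        # DOM is realized by the marker 'a' before the animate NP2.
        np2 = "a los niños" if dom else "los niños"
        conditions.append(
            {
                "modality": modality,
                "agreement": agreement,
                "dom": dom,
                "sentence": f"se {verb} jugar {np2}",
            }
        )
    return conditions


if __name__ == "__main__":
    for condition in build_conditions():
        print(condition)
```

Crossing all three factors yields eight cells, including the combinations that run counter to the grammatical norm, so that each factor can in principle be evaluated independently of the others.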