According to the Implicit Prosody Hypothesis [1], readers generate imagined representations of prosodic structure during silent reading that are similar to the explicit prosodic representations that readers produce when reading aloud. This hypothesis has been supported by behavioral evidence demonstrating similarity between real and imagined representations of a variety of prosodic phenomena, including intonation, phrasing, stress, and meter [4]. For example, evidence for implicit intonational structure is provided by the fact that readers are faster to recognize target words that are produced aloud with a previously imagined intonation contour [6]. Readers impose implicit phrase boundaries in sentences that are long enough to have a phrase break [8] and tend to balance the size of adjacent phrases even during silent reading [9], providing evidence for implicit prosodic phrasing. Readers take longer to silently read words with two stressed syllables than words with one stressed syllable [11], and take longer to read sentences in which a local lexical stress pattern mismatches the predicted metric structure as determined by prior sentence material [12], providing evidence for an implicit metric structure. Although these behavioral similarities between patterns associated with explicit and implicit prosody provide indirect support for implicit prosodic representations, they cannot tell us to what extent implicit prosodic representations are processed similarly to explicit prosodic representations. In the current study, we used event-related potentials (ERPs) to investigate the processing of implicit prosodic representations, and how it compares to that of explicit prosody.
1.1. Behavioral Studies of Explicit and Implicit Linguistic Metric Representation
The specific focus of the current study is the similarity between implicit and explicit metric processing. For this investigation, we exploit metrical regularity in English; English is a stress-timed language, meaning that speakers produce temporally regularized sequences of strong (stressed) and weak (unstressed) syllables. The metric structure in stress-timed languages is conveyed by the timing of strong syllables [17]. There are constraints on the ordering of strong and weak beats in stress-timed languages, as strong beats tend to occur at regular intervals [19], speakers avoid clashes of strong beats and lapses of weak beats [20], and under some circumstances, speakers shift the location of stress on words to maintain metric regularity (e.g., thirTEEN MEN → THIRteen MEN) [18]. Speakers signal strong syllables in speech with a variety of acoustic cues, including longer duration and higher intensity [21]. Strong syllables also hold a privileged position in auditory language comprehension; listeners are faster to detect phonemes in stressed syllables [25], lexical access is more disrupted by the mispronunciation of stressed syllables than unstressed syllables [26], and listeners tend to interpret stressed syllables as word onsets [27]. Moreover, listeners use the pattern of strong and weak syllables to predict what words will come next [29], and to resolve lexical ambiguity [30].
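The ordering constraints described above (avoiding clashes of strong beats and lapses of weak beats) can be illustrated with a minimal sketch. This is not a procedure from the study: the function name `find_clashes_and_lapses`, the encoding of syllables as the characters `S` and `W`, and the default threshold of three consecutive weak syllables for a lapse are all assumptions made for illustration.

```python
def find_clashes_and_lapses(pattern, lapse_len=3):
    """Locate stress clashes and lapses in a string of 'S'/'W' syllables.

    clashes: indices i where syllables i and i + 1 are both strong.
    lapses:  (start, end) index pairs for runs of at least `lapse_len`
             weak syllables (the threshold is an illustrative assumption).
    """
    clashes = [i for i in range(len(pattern) - 1)
               if pattern[i] == "S" and pattern[i + 1] == "S"]

    lapses = []
    run_start = None
    # A trailing "S" sentinel closes any weak run that ends the pattern.
    for i, syllable in enumerate(pattern + "S"):
        if syllable == "W":
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= lapse_len:
                lapses.append((run_start, i - 1))
            run_start = None
    return clashes, lapses


# thirTEEN MEN (W S S) contains a clash; THIRteen MEN (S W S) does not.
before_shift = find_clashes_and_lapses("WSS")
after_shift = find_clashes_and_lapses("SWS")
```

Under this encoding, the stress shift in the example above removes the only clash in the sequence, which is the regularizing effect the text describes.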
Like speakers and listeners, readers show evidence of sensitivity to metric structure. For example, readers spend more time fixating four-syllable words with two stressed syllables (e.g., RAdiAtion) than four-syllable words with one stressed syllable (e.g., geOmetry) [11]. In silent reading, syntactically ambiguous sentences are more likely to be resolved in ways that maintain alternating strong and weak syllables [14]. In the study that serves as the inspiration for the current study, Breen and Clifton tracked participants’ eye movements as they read limericks designed to induce readers to generate strong expectations about the stress pattern of upcoming words [13]. The target word in the critical items, which was always the final word of the second line of the limerick, was a stress-alternating noun–verb homograph; these words are realized with strong–weak (SW) stress as a noun (e.g., PERmit), but weak–strong (WS) stress as a verb (e.g., perMIT) [29]. In this way, the target was either SW or WS, and this lexical stress pattern was either consistent or inconsistent with the metric structure of the limerick (see Table 1). Throughout this paper, we will refer to the occurrence of an inconsistent SW word when a WS word is predicted as a strong–weak (SW) violation, and to the occurrence of an inconsistent WS word when a SW word is predicted as a weak–strong (WS) violation.
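The naming convention just defined, in which a violation is labeled by the stress pattern of the word that actually occurs, can be stated compactly as follows. This is only a sketch of the terminology; the function name `classify_target` and the label strings are hypothetical and are not taken from the original materials.

```python
def classify_target(target_stress, predicted_stress):
    """Label a target by its own stress pattern and its fit with the context.

    Both arguments are "SW" (strong-weak) or "WS" (weak-strong).
    A violation is named after the pattern of the word that occurs,
    not after the pattern that the context predicted.
    """
    valid = {"SW", "WS"}
    if target_stress not in valid or predicted_stress not in valid:
        raise ValueError("stress patterns must be 'SW' or 'WS'")
    if target_stress == predicted_stress:
        return f"{target_stress} consistent"
    return f"{target_stress} violation"


# PERmit (SW, noun reading) appearing where the meter predicts WS:
label = classify_target("SW", "WS")
```

Here `label` is `"SW violation"`, matching the usage adopted in the text: the SW form occurred where a WS form was predicted.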
Breen and Clifton predicted that readers would encounter difficulty whenever the stress pattern of the target word mismatched the pattern of the limerick. However, they only observed an effect of metric mismatch for WS violations (e.g., Table 1D); reading times for SW violations (e.g., Table 1B) did not differ from those of consistent SW words. Breen and Clifton argued that these results reflect the uneven distribution of SW and WS words in the English lexicon; 85–90% of content words in English have an initial stressed syllable [34]. Specifically, there is a minimal cost to encountering a SW word in a context where a WS word is predicted because SW is the default stress pattern. Identifying a WS word in a context that predicts SW, on the other hand, is costly because of both the conflict with context and the lower base frequency of the WS pattern. This interpretation is supported by previous work showing that auditory word identification is more disrupted when a canonically SW word is pronounced as WS than when a canonically WS word is pronounced as SW [35]. Moreover, the observed effect was not on initial reading times, but only on the combined duration of fixations on the target word and time spent rereading earlier sentence material. The latency of this effect, therefore, suggests that the WS violation did not disrupt initial reading times but required later reanalysis.
1.2. Event-Related Potential Studies of Explicit Linguistic Metric Processing
In ERP investigations of explicit metric processing during speech perception, multiple methods have been used to investigate metric violations. One major source of variation among these studies is whether the metric violation is determined by the lexical stress pattern of the word in isolation or only by the context in which the word occurs. In studies of the first variety, researchers presented multisyllabic words auditorily with the correct or incorrect stress pattern either in isolation [36] or in a sentence context [39]. In studies of the second variety, researchers established a context that created an expectation of a specific metric pattern, then presented a target that had the correct metric pattern in isolation but was consistent or inconsistent with the expected pattern created by the context. One such paradigm used word strings to create metric context: listeners heard a string of three or four prime words with the same lexical stress pattern (all SW or all WS, e.g., BANKer, HELPful, PARty or moRALE, emBRACE, deLIGHT) followed by a target word with the same stress pattern as the primes or the opposite pattern [41]. In another such paradigm, participants heard sentences with a consistent metric structure including a target which was either consistent or inconsistent with the established pattern [43] (e.g., stress clash in “The chamPAGNE COCKtails are very delicious”). A final method used cross-modal information to inform prosodic interpretation, as in [49], where participants viewed pictures which disambiguated the meaning of semantically ambiguous two-syllable strings like greenhouse, which are disambiguated by stress patterns (GREENhouse vs. green HOUSE).
Regardless of the type of manipulation, these ERP studies demonstrate that encountering metric violations while listening generally gives rise to an early negativity between 250 and 500 ms [36]. However, this early effect is not consistent across studies in terms of timing and polarity. Some of the variance can be explained by the different responses to SW violations and WS violations in two-syllable words; SW violations, where a SW word appears when a WS word is predicted, typically elicit an early negativity [41]. The results are more mixed for WS violations, where a WS word appears when a SW word is predicted, which elicit an early negativity in some cases [41] but have also been shown to elicit an early positivity relative to predicted metric patterns [36]. In two studies, both SW and WS violations elicited an early negativity, but the negativity to SW violations peaked earlier [41].
Additionally, explicit metric violations have often been shown to elicit a late positivity between 500 and 1000 ms [36]. In contrast to the early time window, this later effect does not seem to differ in polarity or timing as a function of target lexical stress pattern. However, its presence is dependent on the experimental task; in cases where the participants’ task is to make an explicit assessment of the accuracy of the metric structure of the target, that target usually elicits a late positivity [36], though this is not always the case [45]. In contrast, if the participants’ task does not include a specific assessment of the metric structure, a late positivity is absent [41]. Indeed, in cases where the explicitness of a metric judgment is varied within the experiment, a late positivity is generally evident only when the task requires this judgment [39].
Despite some variation across studies, these neural effects of metric inconsistency appear to be distinct from the neural effects of either syntactic or semantic violations. Syntactic violations typically elicit a biphasic response consisting of a left-lateralized anterior negativity peaking around 300 ms (LAN) and a posterior positivity peaking around 600 ms after stimulus onset (P600/LPC) [50]. A simultaneous test of metric and syntactic violations reported distinct negativities for each violation type, however, with the negativity to metric violations occurring earlier than the negativity evoked by syntactic violations (which was interpreted as a LAN) [43]. Semantic violations typically elicit a parietally-maximal negativity around 400 ms (N400) [51]. Although some authors have interpreted the early negativity elicited by metric violations as an N400 [39], this metric negativity has been observed in response to illegal stress shifts in pseudowords which have no lexico-semantic content and should not result in an N400 [45]. Further, semantic incongruity and metric incongruity have been shown to modulate the amplitude of an early negativity differently when considered in the same design, even by authors who categorize deviations from a predicted metric structure as N400 effects [39]. Finally, [44] observed that simultaneous metric and semantic violations lead to a larger negativity than that observed for semantic violation alone, and [52] used neuroimaging to demonstrate that the responses to semantic and metric violations have different neural generators, providing evidence that metric violations are not simply processed as semantic violations.
1.3. Event-Related Potential Studies of Implicit Linguistic Metric Processing
ERPs have also been used to explore implicit metric representations during silent reading. In one study, readers were presented with strings of four two-syllable English prime words with consistent lexical stress patterns, followed by a target word that was consistent or inconsistent with the stress pattern of the previous words [53]. Both SW and WS violations resulted in a larger fronto-central negativity from 250–400 ms after word onset, relative to words with a predicted stress pattern. In addition, all SW targets, whether consistent or inconsistent with the context, elicited a larger negativity (350–450 ms after word onset) than WS targets. In another study exploring silent metric processing in word lists, readers were presented with strings of three two-syllable German prime words followed by a SW or WS target. In this case, there were no observable ERP differences for SW violations, but WS violations were more positive than correct WS targets in three time windows: between 250–400, 400–600, and 600–800 ms after target onset [54]. A final study presented participants with an auditory tone sequence with a SW or WS pattern followed by a visually presented two-syllable English word which was consistent or inconsistent with the tone sequence stress pattern [55]. The results demonstrated a larger negativity from 300–700 ms after target presentation for SW violations compared to correct SW targets, but no significant ERP effect for WS violations. In general, these studies demonstrate that, similar to explicit metric violations, implicit metric violations often evoke an early negativity that is more reliably observed for SW than WS violations. Moreover, two of these studies are consistent with results from explicit meter studies in that when the task does not require an explicit metric judgment (and none of these did; rather, participants’ task was to make an old/new judgment of the target [53], a lexical decision judgment [55], or answer a semantic question about the word strings [54]), there is no late positivity.
Multiple factors could be contributing to the variability in results observed across previous investigations of ERP responses to implicit metric violations. First, these studies have used different target words in the SW and WS conditions, meaning that the observed results may reflect differences beyond prosody, including phonetic, orthographic, or lexical differences between conditions. Second, these studies used single words or word lists to create metric expectations but, in these contexts, readers are not required to fully process the syntactic and semantic structure of the targets; this variability could lead to heterogeneous depth of processing across conditions. Therefore, in the current study, we implemented metric expectations using metrically regular rhyming couplets, which encourage readers to make strong predictions about when strong and weak syllables will occur but also require deep linguistic processing. Moreover, our target words are stress-alternating noun–verb homographs, which can have SW or WS stress depending on the syntactic category. In this way, readers are exposed to the same visual, orthographic, and segmental input across all conditions.
If readers are generating implicit metric predictions during silent reading, we predict that targets which are inconsistent with the metric context will result in early differences in the ERP waveform compared to metrically consistent targets. However, based on prior work, we predict that this early effect may differ depending on the type of violation. Specifically, we predict SW violations will elicit an early negativity relative to consistent SW words. Conversely, WS violations may result in either a reduced negativity, or a positivity, relative to WS consistent targets. Moreover, we predict the absence of a late positivity in response to metric violations, as participants are not making explicit judgments about the metric structure.
The goal of the current study was to investigate the realization of metric representations during silent reading using ERPs. Participants silently read metrically regular rhyming couplets in which the final target word had a strong–weak (SW) or weak–strong (WS) lexical stress pattern that was either consistent or inconsistent with the metric stress pattern predicted by the couplet. The results demonstrated that SW targets which were inconsistent with the stress pattern of the couplet (i.e., SW violations) elicited two separate negativities (80–155 ms and 325–375 ms after word onset) relative to SW targets which were consistent with the predicted stress pattern. Conversely, WS targets inconsistent with the stress pattern of the couplet (i.e., WS violations) elicited an early positivity (365–435 ms after word onset) relative to WS targets which were consistent with the predicted stress pattern. Neither SW nor WS violations elicited a late positivity. Together with prior results, the current results support the Implicit Prosody Hypothesis, which maintains that readers are generating implicit versions of prosodic structure even when reading silently, and that these representations are similar to explicit ones.
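Effects such as those summarized above are typically quantified as a difference in mean amplitude between violation and consistent conditions within a time window. The sketch below illustrates that computation only; it is not the analysis pipeline used in this study. The function names, the 5 ms sampling step, and the waveforms are invented for illustration, and only the 325–375 ms and 80–155 ms window bounds are taken from the text.

```python
def mean_amplitude(times_ms, amplitudes_uv, start_ms, end_ms):
    """Average amplitude (in microvolts) over samples inside [start_ms, end_ms]."""
    samples = [a for t, a in zip(times_ms, amplitudes_uv)
               if start_ms <= t <= end_ms]
    if not samples:
        raise ValueError("no samples fall inside the requested window")
    return sum(samples) / len(samples)


def window_effect(times_ms, violation_uv, consistent_uv, start_ms, end_ms):
    """Violation minus consistent mean amplitude; negative means a negativity."""
    return (mean_amplitude(times_ms, violation_uv, start_ms, end_ms)
            - mean_amplitude(times_ms, consistent_uv, start_ms, end_ms))


# Synthetic waveforms (not real data), sampled every 5 ms from 0 to 495 ms.
times = list(range(0, 500, 5))
consistent = [0.0 for _ in times]
violation = [-1.0 if 325 <= t <= 375 else 0.0 for t in times]

effect_325_375 = window_effect(times, violation, consistent, 325, 375)
effect_80_155 = window_effect(times, violation, consistent, 80, 155)
```

With these made-up waveforms, the difference is negative in the 325–375 ms window and zero in the 80–155 ms window. A real analysis would additionally average over trials, participants, and electrodes before any statistical test.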
The observation of a significant negative left-lateralized deflection from 80–155 ms in response to SW violations is an unexpected result based on prior work on explicit and implicit linguistic metric processing. Few studies of linguistic meter have reported consistent differences in components this early, though one study demonstrated a significant negativity between 100–320 ms in response to an inappropriate stressed syllable [46]. However, negativities in the 100–200 ms time window have been widely observed in response to metric violations in musical studies. This effect, termed the metric mismatch negativity (MMN), has been observed when a strong tone occurs at an unpredicted temporal location (i.e., when a weak tone is predicted) [63]. This situation is analogous to the circumstance under which we observed the early negativity in the current study, such that a strong beat at a predicted weak time elicits the early negativity (SW violation), whereas a weak beat at a predicted strong time does not (WS violation). Importantly, as this effect was detected based on a marginal interaction of metric consistency with electrode position and this is the first study we are aware of to report this early negativity in response to an implicit strong beat occurring at a predicted weak time, additional experiments will be required to determine the reliability and meaning of this component.
The negativity between 325–375 ms observed for SW violations is consistent with results from previous investigations of both explicit and implicit violations of metric structure. Specifically, previous studies have demonstrated that SW metric violations result in a negative deflection in the 250–500 ms range relative to metrically consistent targets [36]. Moreover, a similar effect has also been shown in a small set of studies investigating metric structure in silent reading of single words [53]. The current study extends this finding to silent reading of metric violations in sentence contexts using orthographically identical items across all conditions. The observation of a positivity for WS violations from 365–435 ms after word onset is also consistent with both prior listening and reading studies. Two prior studies of explicit metric violations [40] and one prior study of implicit metric processing [54] have observed positive deflections for inconsistent WS targets relative to consistent WS targets. Our results therefore suggest that prior findings of different responses to SW and WS violations are not simply due to idiosyncratic differences between the SW and WS target items chosen for these prior experiments, but do indeed reflect the activation of abstract metric representations during silent reading.
The different results observed across multiple studies for SW vs. WS violations may be due to differences in the underlying phonological structure of the target words. According to [17], the trochaic foot (SW) is the default phonological structure in Germanic languages, including English. This phonological constraint is realized in the lexical stress patterns of words, such that most two-syllable words begin with a stressed syllable (85–90% of the time in English [34]; 73% of the time in German [66]). This asymmetry means that accessing a SW (trochaic) representation of a target is globally easier than accessing a WS (iambic) representation, irrespective of the context in which the target occurs. Therefore, the lexical representation of a SW target is harder to access when its stress pattern conflicts with the local metric context than when its stress pattern is consistent with the local context. Conversely, resolving WS violations is more challenging for readers, due to conflicting cues in both the local environment and the global environment.
Under this view of phonological asymmetry, the negativity observed between 325–375 ms for SW violations in the current study, and in a similar time window in other studies, could be related to the N400, which reflects the ease with which lexical access is achieved. The negativity for SW violations could reflect either additional lexical processing due to the added difficulty of accessing the appropriate lexical content in the presence of lexical stress mismatch, or lexical repair processes due to automatic activation of the metrically consistent, but semantically inconsistent, alternate form of the noun/verb homograph. However, it is important to note that this interpretation of the negativity as indexing lexical processing is challenged by previous work exploring simultaneous violations of metric and semantic structure, in which the latency and distribution of the negativity differs across violation types [39], as well as evidence that metrically inconsistent pseudowords also elicit such negativities, even though they lack semantic context [45]. Alternatively, it could be that the negativity we observed in the current experiment indicates the violation of a consistent, rule-based sequence, in this case realized as the metric structure [45].
In contrast, the positivity observed between 365–435 ms for WS violations in the current study, and in a similar time window in other studies, could be related to conflict processing. When a WS violation occurs, the reader must resolve the conflict between a metric context which leads them to predict a SW target and a semantic context which leads them to predict a WS target. In addition, there is the added conflict that WS two-syllable words are phonologically marked in the language. These factors together may lead to the observed positivity, which may signal an error in processing that is harder for readers to recover from. This interpretation is consistent with previous ERP research on metric structure in German, where metric violations in three-syllable words that did not violate metric foot structure led to an early negativity, whereas violations that also conflicted with foot structure resulted in an early positivity [36], similar to the results in the current study.
Consistent with other explicit and implicit metric processing studies that do not involve an explicit metric task, we did not observe evidence of a late positivity for metric violations relative to consistent metric conditions. Previous studies of both explicit and implicit metric processing demonstrate that late positivities in response to metric violation are most likely observed when the participant’s task is to assess the metric structure. Indeed, only one previous study of implicit metric processing observed a late positivity in response to metric violations [54] while two others did not [53], and none of these studies required an explicit metric judgment. This interpretation is in line with previous work showing a dissociation between early and late ERP effects of syntactic violations, where early negativities are thought to reflect automatic processing and late positivities are thought to reflect controlled processes of repair [67] and the difficulty of the required repair process [69]. The current results suggest that although both implicit and explicit metric violations are automatically detected, as evidenced by early (<500 ms) waveform differences, only violations that rise to the level of awareness give rise to a late positivity.
It is also possible that the lack of a late positivity in the current study reflects a lack of power; our choice to present the same orthographic information across conditions meant that the number of items in the experiment was limited by the number of two-syllable stress-alternating noun–verb homographs in English that were known to our participants and could be embedded in rhyming couplets. Moreover, compared to previous studies using word lists, the stress pattern of the target in the current study was locally ambiguous, and only disambiguated by the implicit metric structure provided by the context. Although this manipulation is a better test of the abstract metric structure compared to other studies that used different items across SW and WS conditions, it produces a less clearly defined metric violation than paradigms employing single target words with unambiguous stress patterns.
Although the current results are generally consistent with prior ERP work on explicit and implicit linguistic metric structures, they are inconsistent with results observed in a previous eye-tracking experiment using the same materials. Recall that Breen and Clifton observed inflated reading times only for WS violations, and not for SW violations [13]; moreover, these effects were observed only in relatively later reading time measures. Conversely, our results demonstrate significant early ERP differences for both SW and WS violations, though they differ in polarity, timing, and topography. These differential effects are likely due to differences in the temporal control of stimulus presentation between the studies. In Breen and Clifton’s experiment, participants read normally at their own pace, meaning they could take as much time as needed to process material in advance of the critical word, and could look back to prior sentence material to resolve difficulty generated at the target word. In contrast, materials in the current study were presented in a region-by-region segmented fashion, giving participants less time to generate predictions about upcoming material, and disallowing regressions. Moreover, the fact that the current materials were presented in a time-controlled manner means that the metric structure of the sentence materials was more obvious for readers, making the metric inconsistency more explicit, resulting in significant ERP effects of both types of metric violations.
Future work could directly investigate the role of temporal stimulus control on implicit metric violation processing by replicating the current paradigm using simultaneous collection of eye-tracking data and ERPs, a method which has already been used to successfully adjudicate debates about linguistic processing in eye movements [70]. In this way, the role of metric inconsistency in silent reading could be assessed without explicitly controlling the timing of materials. Additionally, while the current results demonstrate that readers engage in implicit prosody during silent reading of poetry, it is an open question to what extent these findings generalize to normal reading. The couplets used in the current study were designed to have strict metric and rhyming structure, which is rare in non-poetic language. However, our study does provide insight into the role of meter in implicit prosody. To determine whether our result can be replicated in non-poetic contexts which do not have concomitantly high metrical expectancies, future work will explore differences in brain activity in response to metric violations in silently-read prose sentences.