Our review of how current models of eye movement control capture regressive eye movements in reading shows that each of them contributes helpful ideas to our understanding of the mechanisms controlling regressions, but that each also has limitations in several respects.
In the following, we therefore propose a new framework that may serve as a general tool for understanding regressive eye movements without being limited to a small range of linguistic phenomena. As a starting point, we use the FC model proposed by Bicknell and Levy (15). Instead of focusing on theoretical considerations about reading strategies, however, our aim is to develop a realistic model of human reading behavior, which means that the model should be able to account for findings from the existing literature and to make further testable predictions about reading behavior. This requires substantial modifications to the architecture of the FC model, which is why we call the new account the Information Gathering Framework (IGF).
We acknowledge that our approach has limitations as well, and we encourage others to test and modify this framework. Note also that, in contrast to the FC model, the IGF has not yet been implemented as a computational model that allows reading to be simulated. Instead, the IGF takes into account more cognitive and linguistic properties of eye movement control than the FC model does. Future research should build on the present considerations to combine the two approaches and to develop a computational version of the IGF.
The architecture of the Information Gathering Framework
Before explaining the assumptions of the IGF in more detail and clarifying its modifications from the FC model, we will briefly summarize the architecture of the IGF by the following six assumptions:
The confidence in each word’s identity is described by the confidence level. The confidence level is computed by matching predictions about incoming material with the lexical representations of a word.
The lexical representation of a word is viewed as an infinite bundle of features which takes time to be retrieved and which varies among individuals (indicated by the lexical quality level).
The focus of attention (i.e., the area within which confidence levels are computed in parallel) is restricted to two words.
There are three different thresholds for the confidence level, each causing an action: the forward threshold defines the confidence level needed to trigger a progressive eye movement, the backward threshold defines the level needed to prevent a regression, and the re-inspection threshold defines the level that prevents a word from being selected as a regression target on the basis of explicit linguistic processing.
There are two different scenarios that cause a regressive eye movement: first, the confidence level of a word falls below the forward threshold after the eyes have already moved on to the next word; second, the confidence level of a word has not reached the backward threshold by the time the confidence level of the next word reaches the forward threshold.
There are also two different ways in which a regression target is selected: either the word within the perceptual span whose confidence level is below the re-inspection threshold is targeted, or the target is chosen by experience-based strategies.
Note that although all regressions share certain characteristics (e.g., an eye movement against the intended reading direction, re-reading of earlier sentence material), the idea of subsuming all regressions under one unifying function is probably not convincing. Inhoff et al. (1), for example, suggested that two types of regressions can be distinguished according to their size, function, and target control. One type, referred to as ‘large regressions’, comprises regressions “that traverse across more than one prior word” (p. 36). According to Inhoff et al., these regressions are closely coupled to linguistic processing and serve to improve comprehension by re-processing prior text. The second type, ‘small regressions’, typically includes refixations of the current word and inter-word regressions to the immediately preceding word. These regressions are assumed to reflect responses to inaccurate or premature oculomotor programming and serve to improve visual word recognition.
We agree with this distinction, and in the following we focus on ‘large regressions’ only. However, although some regressions to word n-1 certainly share the characteristics of ‘small regressions’, we doubt that all of them belong to this class. We therefore use a slightly broader definition: we exclude only regressions due to errors in oculomotor programming and include all other regressions to word n-1.
For all of these inter-word regressions, we propose one unifying function: to gather additional information relevant to sentence interpretation, more precisely, additional information about the identity of words.
(1) The lexical quality level
The FC model proposes that because word identification is based on noisy visual information, “word recognition may be best thought of as a process that never is completed” (15, p. 1170). Although we agree with the assumption of incomplete word recognition, we doubt that noisy visual information is in fact the major determinant of word identification, especially because there is convincing evidence that visual information is decoded very rapidly (e.g., 23). We therefore claim that word identification is mainly affected by the retrieval of lexical information (as also proposed by the SWIFT and E-Z Reader models).
To incorporate this idea into our framework, we assume that the underlying language model contains lexical representations of each word. Specifically, the lexical representations stored in memory are viewed as (theoretically) infinite bundles of features, containing information about the word’s orthography, phonology, meaning, and morpho-syntax as well as its constituent binding preferences (cf. also 24, who introduced this idea as the concept of lexical quality in order to explain differences in language skill between individuals). Because of the complexity of the lexical representation, it takes time to retrieve this information from the lexicon.
We use the term ‘lexical quality level’ for the amount of information about a word that has currently been retrieved from the lexicon. Typically, the amount of information (and thus the lexical quality level) increases continuously during a fixation, because a fixation allows lexical information to be retrieved on the basis of the visual input. Once the eyes have moved to the next word, however, no additional information can be received, and the quality level then decreases continuously over time due to interference from other words and decay of the memory trace (25; see Figure 1 for a schematic illustration). Note also that the lexical quality level of a word (like the confidence level, see below) never reaches its maximum, because retrieval of the information from the lexical entry can by definition never be completed.
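Because the IGF has not yet been implemented computationally, the time course described here can only be sketched under additional assumptions. The following Python sketch assumes, purely for illustration, an exponential rise of the lexical quality level during a fixation and an exponential decay afterwards. The function name `lexical_quality` and the time constants `tau_rise` and `tau_decay` are hypothetical; they are neither part of the framework nor estimated from data.

```python
import math


def lexical_quality(t: float, t_leave: float,
                    tau_rise: float = 80.0, tau_decay: float = 400.0) -> float:
    """Hypothetical lexical quality level (0..1) of a word at time t (ms).

    The word is fixated from t = 0 until t_leave. During the fixation the
    level rises asymptotically toward 1; after the eyes leave, it decays
    (standing in for interference and decay of the memory trace).
    tau_rise and tau_decay are illustrative assumptions, not estimates.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    if t <= t_leave:
        return 1.0 - math.exp(-t / tau_rise)
    q_leave = 1.0 - math.exp(-t_leave / tau_rise)
    return q_leave * math.exp(-(t - t_leave) / tau_decay)


# A 250 ms fixation followed by 500 ms spent on later words.
for t in (0, 100, 250, 500, 750):
    print(t, round(lexical_quality(t, t_leave=250), 3))
```

Any function that rises toward, but stays below, the maximum during fixation and declines afterwards would serve equally well; the exponential form is chosen only for simplicity.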
(2) The confidence level
In addition to the lexical quality level, the IGF claims that a confidence level is computed for each word, which represents the reader’s confidence in the identity of that word.
According to the FC model, the reader computes the confidence level of a particular word on the basis of its language model. If additional information causes the confidence in a previous word’s identity to fall below a certain threshold, a regressive saccade to this word is triggered. Because the FC model computes the confidence level on the basis of an underlying bigram frequency model, its focus lies on coping with noisy visual input, and its computation of confidence levels is only a coarse approximation of the complexity of word recognition processes.
Since we want to take a broader perspective here that also covers higher-order language processing, we propose within the IGF that the computation of confidence levels (like the computation of lexical quality levels) is based on linguistic processing and takes a certain amount of time. During this time, the confidence level of a word typically increases (asymptotically approaching but never reaching the full confidence level), because the information from the lexical representation provides more and more supporting evidence (see Figure 1 for a schematic illustration). For current purposes, it is assumed that the confidence level is computed by matching the features of the lexical representation with predictions derived from the preceding sentence material on the basis of explicit production rules (26).
These production rules represent all procedural (grammatical) knowledge and define condition–action pairs. For example, if an inanimate noun (e.g., the table) is encountered as the initial argument in an English sentence (condition), the production rules predict that a verb (action) will follow in the course of the sentence. More precisely, they predict that this verb should agree with the argument in number (singular), should be compatible with an inanimate subject, and so on. If a verb like talks is encountered next, the production rules are violated, because talks requires an animate subject. If, on the other hand, a relative pronoun like which follows, it introduces a relative clause. In this case, the production rules are not violated, and the action (the expected verb) is simply postponed. Moreover, not every condition–action pair is mandatory; some pairs are optional (e.g., the indirect object of verbs like write: He writes a letter (to his father)). If the evidence provided by the lexical representation matches the predictions made on the basis of the production rules, a high confidence level is computed. If, by contrast, the production rules are violated, a low confidence level results. Accordingly, if the context is highly predictive, less lexical information, and thus less time, is needed to reach a certain level of confidence, resulting in shorter fixation durations.
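The matching of lexical evidence against rule-based predictions can be illustrated with a toy version of the table/talks/which example above. The Python sketch below is hypothetical throughout: the classes `Word` and `Prediction`, the feature label `needs_animate_subject`, the outcome labels, and the numeric `CONFIDENCE` values are our own illustrative assumptions and not a specification of the production-rule system (26).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    form: str
    category: str                       # e.g. "noun", "verb", "relpron"
    features: frozenset = frozenset()   # hypothetical lexical feature labels


@dataclass(frozen=True)
class Prediction:
    """Pending action of one condition-action pair: 'a verb will follow'."""
    category: str = "verb"
    subject_animate: bool = False       # fixed by the condition, e.g. 'the table'


def evaluate(pred: Prediction, word: Word) -> str:
    """Compare an incoming word with the pending prediction."""
    if word.category == "relpron":
        return "postponed"    # relative clause: the verb is still expected later
    if word.category != pred.category:
        return "mismatch"     # not what the rule predicts at this point
    if "needs_animate_subject" in word.features and not pred.subject_animate:
        return "violation"    # e.g. 'The table talks ...'
    return "match"


# Hypothetical mapping from outcome to an initial confidence value in [0, 1].
CONFIDENCE = {"match": 0.9, "postponed": 0.7, "mismatch": 0.3, "violation": 0.1}

pred = Prediction(subject_animate=False)   # condition: inanimate initial argument
talks = Word("talks", "verb", frozenset({"needs_animate_subject"}))
stands = Word("stands", "verb")
which = Word("which", "relpron")

for w in (talks, stands, which):
    outcome = evaluate(pred, w)
    print(w.form, outcome, CONFIDENCE[outcome])
```

A realistic system would need many such condition–action pairs, including optional ones such as the indirect object of write, and would let confidence evolve over time rather than assigning a single value.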
Note that the confidence level is highly correlated with the lexical quality level, but the two parameters are not the same. A poor reader could have high confidence in a word’s identity although the word is ambiguous (e.g., in meaning): because a small lexicon implies representations with only a few features, this reader is not aware of alternative interpretations. Conversely, a proficient reader could have low confidence in the same word’s identity because he takes into account several potential ambiguities that the poor reader is not aware of. In addition, a highly predictive context may mean that less information (and thus a lower lexical quality level) is needed to confirm the prediction and to reach a given level of confidence. This explains why gaze durations on highly predictable words are shorter than those on unpredictable words (e.g., 27).
Although the notion of explicit production rules has not yet been experimentally verified, there is comprehensive evidence from a variety of behavioral tasks (including reading) that prediction at several linguistic levels forms an integral part of language processing (for a recent overview, see 28). In addition, influential accounts highlight the close relationship between language production and comprehension, assuming that both modalities share fundamental mechanisms (29). Thus, the concept of production rules guiding predictions about upcoming input may provide a useful tool for modeling language processing in terms of prediction, although it needs more experimental support.
Also, the claim that a mismatch of predictions is the main determinant of regressions is not without problems. In particular, it would imply that regressions serve to improve comprehension because they provide additional information that helps to solve prediction conflicts (note that we have to assume that there are indeed solutions in coherent sentences and texts). However, the empirical evidence for this claim is somewhat inconsistent.
Schotter and colleagues (30) examined whether regressions aid comprehension in a clever masking experiment with garden-path sentences: all words to the left of the current fixation were replaced with an x-mask, so that regressions could not provide any useful information. The authors found that although the opportunity to regress supported comprehension, actually making a regression did not lead to significantly better comprehension compared to cases in which the reader did not regress.
More recently, Metzner, von der Malsburg, Vasishth, and Rösler (31) compared sentence comprehension in free reading and word-by-word presentation in a concurrent ERP/eye-tracking study. They found that accuracy was higher in natural reading than with word-by-word presentation, but that this benefit was only visible when the eyes actually made a regressive saccade.
It is not fully clear where these differences come from. The mode of presentation might have affected the results. The difficulty of the sentence material, however, also seems likely to have influenced the benefit of a regression: the overall accuracy results indicate that the stimulus material used by Schotter et al. was much harder to process than the sentences used by Metzner et al. Thus, whether regressions support comprehension seems to depend on the language proficiency of the reader relative to the difficulty of the material. In other words, a regressive eye movement is of little use if the reader lacks the ability to resolve the linguistic problem. This may also explain the lack of a comprehension benefit for regressions in the data reported by Christianson, Luke, Hussey, and Wochna (32, experiment 1), among many others.
(3) The confidence level is monitored by three independent control mechanisms
The FC model proposes that the generation of eye movements is monitored by a simple control policy that sets two different values of confidence causing an action. If the first value is reached, a forward saccade to the next word of low confidence is initiated. If the confidence level of a word falls under the second value, a regressive eye movement to this particular word is triggered.
In the IGF, the actions are controlled by three independent thresholds for the confidence level, which we refer to as the forward, backward, and re-inspection thresholds (see Figure 1).
The first (forward) mechanism defines the level of first-pass confidence, namely the amount of evidence about word n’s identity that is retrieved in first-pass reading and assessed to be sufficient for the current sentence interpretation. When a certain level of confidence is reached, the eyes move to the next word.
It is further proposed that this forward control mechanism works in a highly automatic manner, by default targeting the next word. This automatic saccade generation is canceled, and the eyes move to word n+2, if parafoveal processing already yields a certain level of confidence for word n+1. The forward control mechanism proposed here is compatible with current models of saccade control like SWIFT (19, 6) that assume a) parallel processing of several words, b) largely automatic generation of progressive (and regressive) eye movements, and c) word identification as the core function of saccades in reading.
The forward threshold in particular mediates between speed and accuracy: if the threshold is lowered, reading speed increases but accuracy suffers; if the threshold is raised, accuracy improves at the expense of reading speed.
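The speed side of this trade-off can be made concrete with a worked example. Assuming, for illustration only, that the confidence level of the fixated word rises as c(t) = 1 − (1 − c0)·exp(−t/τ), the time needed to reach the forward threshold θ is t = −τ·ln((1 − θ)/(1 − c0)). The Python sketch below computes this quantity; `time_to_threshold`, `tau`, and the starting value `c0` are hypothetical, and accuracy is not modeled at all.

```python
import math


def time_to_threshold(theta: float, tau: float = 80.0, c0: float = 0.0) -> float:
    """Time (ms) until c(t) = 1 - (1 - c0) * exp(-t / tau) reaches theta.

    Hypothetical dynamics: c0 is the confidence at fixation onset (raised,
    for instance, by a predictive context) and tau is an illustrative time
    constant. Because c(t) approaches 1 without reaching it, theta must lie
    in [c0, 1).
    """
    if not (0.0 <= c0 <= theta < 1.0):
        raise ValueError("require 0 <= c0 <= theta < 1")
    return -tau * math.log((1.0 - theta) / (1.0 - c0))


for theta in (0.6, 0.8, 0.9):
    neutral = time_to_threshold(theta)
    predictive = time_to_threshold(theta, c0=0.4)
    print(f"theta={theta}: {neutral:.0f} ms (neutral), {predictive:.0f} ms (predictive)")
```

Lowering θ shortens the time to threshold, and a higher starting value c0 (for example, one raised by a highly predictive context, as discussed above) has the same effect, in line with shorter fixations on predictable words.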
The second (backward) mechanism defines the level of confidence that has to be reached in order to prevent a regressive eye movement from happening. Thus, a regression is performed whenever the level of confidence for a word does not reach a certain threshold. In contrast to the forward control mechanism, this backward mechanism is highly linguistically controlled.
Although the forward and backward control mechanisms often interact, they are assumed to be independent and may be adjusted separately. Thus, there may exist a first-pass strategy that allows for relatively superficial reading, but this does not necessarily mean that at the same time the probability for regressions increases. In addition, both control mechanisms are assumed to be sensitive to top-down influences (i.e. tasks) that may reduce or increase the thresholds for first-pass reading times and regressions. Bicknell and Levy (15) for example showed that the most efficient reading strategy (i.e., the one that leads to highest comprehension accuracy) is one that allows for a lower level of confidence in first pass and increases the probability for regressions at the same time.
The third (re-inspection) mechanism defines the level of confidence that prevents a word from being selected as a regression target on the basis of explicit linguistic processing. Thus, if the confidence level of a word does not reach this re-inspection threshold during the fixation of the subsequent word, the word becomes a potential target for a later regression (we explain this procedure in more detail below).
(4) Limited focus of attention
The FC model takes into account the limitations of the visual field in order to compute the degree of noisiness for the visual input, but it is not specified with regard to the focus of attention. However, because the underlying language model is restricted to bigram frequencies, the confidence level of a word can only be affected by the visual information about the subsequent word.
Within the IGF, the visual field also shapes the amount of visual information that is available to the reader during a fixation and that is used for computing the lexical quality level. In addition, however, it is assumed that the computation of confidence levels always requires attention, so that the confidence levels of all words in a sentence cannot be monitored in parallel. In particular, research based on SAT (speed-accuracy trade-off) experiments has indicated that the focus of attention is very limited, covering only two chunks (33). Within the IGF, we therefore assume that the focus of attention is restricted to the currently fixated word (W6 in the example below) and the preceding word (W5 in the example below), which means that only the lexical representations of these two words can be used in parallel to compute confidence levels (see Figure 2).
Note that the concurrent allocation of attention to word n and word n-1 is a highly controversial claim and stands in clear contrast to models like E-Z Reader. However, there is evidence that this kind of attention allocation is indeed possible (e.g., 34).
(5) Four different eye movement scenarios
In a framework with the architecture described above, four different eye movement scenarios are possible (see Figure 3). We describe them in turn. Note that each graph represents the confidence levels of six words (W1–W6) while the eyes are currently fixating word 6 (W6).
Pattern 1
The confidence level of W5 has already passed the forward threshold, which triggered a saccade to W6. Now the confidence level of W6 is also increasing, and W6 remains fixated until its confidence level reaches the forward threshold or, alternatively, until the confidence level of W5 drops below the forward threshold.
Pattern 2
The confidence level of W5 drops below the forward threshold after first passing it (which triggered the saccade to W6). This may happen because the computation of the confidence level for W5 continues after the eyes have moved to W6. Sometimes this computation reveals that W5 cannot be integrated into the current sentence structure, which causes the confidence level of W5 to drop below the forward threshold. In response, a regressive eye movement is triggered.
Pattern 3
Another scenario also causes a regression: the confidence level of W6 has already passed the forward threshold, but the confidence level of W5 has not yet reached the backward threshold. This happens, for example, if the new input does not provide the expected evidence about W5’s identity. In this case, the confidence level of W5 increases only slowly, and if the confidence level of W6 reaches the forward threshold in the meantime, a regression is triggered. We assume that this happens especially at the end of a sentence, where the whole sentence structure is evaluated.
Pattern 4
In this case, the confidence level of W6 reached the forward threshold after the confidence level of W5 reached the backward threshold. This is assumed to be the “normal” case and it triggers an eye movement to W7.
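The four patterns can be summarized as a decision procedure over the confidence levels of W5 and W6 during a fixation on W6. The Python sketch below is a simplification under our own assumptions: confidence is sampled at discrete time steps, the threshold values are hypothetical (only the ordering forward < backward, which the patterns imply, matters), and the names `Thresholds` and `monitor` are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values; the sketch only relies on forward < backward.
    forward: float = 0.50
    backward: float = 0.70


def monitor(trace: Iterable[Tuple[float, float]],
            th: Thresholds = Thresholds()) -> str:
    """Classify a fixation on W6 from successive (conf_W5, conf_W6) samples.

    W5 is assumed to have already passed the forward threshold (this is what
    triggered the saccade to W6). The first decisive sample ends the fixation.
    """
    for w5, w6 in trace:
        if w5 < th.forward:
            return "pattern 2: regression (W5 cannot be integrated)"
        if w6 >= th.forward:
            if w5 >= th.backward:
                return "pattern 4: saccade to W7"
            return "pattern 3: regression (missing evidence about W5)"
    return "pattern 1: W6 remains fixated"


traces = {
    "normal": [(0.60, 0.20), (0.72, 0.40), (0.75, 0.55)],
    "integration failure": [(0.60, 0.20), (0.45, 0.30)],
    "missing evidence": [(0.55, 0.30), (0.60, 0.52)],
    "still processing": [(0.60, 0.20), (0.65, 0.30)],
}
for name, trace in traces.items():
    print(f"{name}: {monitor(trace)}")
```

Checking W5 first reflects that an integration failure (pattern 2) triggers a regression regardless of how far W6 has progressed.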
(6) How the target of a regressive eye movement is selected
The IGF predicts that there are two different regression scenarios: regressions due to integration difficulties (pattern 2) on the one hand and regressions due to missing evidence on the other (pattern 3). However, a crucial question is how the target of this regressive eye movement is selected.
The FC model predicts that a regression always targets the word whose confidence level is below the backward threshold, which is always the directly preceding word (due to the underlying bigram frequency model). However, the assumption that regressions always target word n-1 (an assumption shared, for example, by the E-Z Reader 10 model) is merely a simplified approximation, as discussed above. We also have to keep in mind that the word at which problems become apparent does not always correspond to the word that causes the difficulty. A prominent example is garden-path sentences, in which difficulties are often caused by a misinterpretation of a word earlier in the sentence. In this case, re-inspecting word n-1 would not help to solve the problem, and since we assume that the function of a regression is to solve the problem, this is not a plausible mechanism.
Another possibility would be to select the word with the lowest quality level as the regression target instead, because there is an increased likelihood that more evidence (provided by the lexical representation) about this word would help to increase confidence. However, this assumption also raises difficulties: as already discussed, the quality level and the confidence level are not the same, so a low quality level does not automatically cause a low confidence level. In addition, this assumption would imply that words earlier in the sentence or text are more likely to become regression targets, because their quality level is lower (due to its decrease over time). This prediction, however, is not supported by the empirical findings either.
A third possibility would be that the confidence levels of all prior words are re-computed and that the word with the lowest confidence level (or with a confidence level below the backward threshold) is selected as the regression target. However, since the computation of confidence levels requires attention and the focus of attention is very limited (see above), this is not possible within the model’s architecture either.
For this reason, a third threshold is assumed within the IGF: the re-inspection threshold. Typically, the confidence level of word n-1 reaches both the backward and the re-inspection threshold during a fixation on word n (see Figure 3). In some cases, however, linguistic processing still reveals substantial doubt about a word’s identity, even though the word provides a possible but unexpected input for the current sentence interpretation. As a consequence, the confidence level of this word reaches the forward and the backward threshold but not the re-inspection threshold. At this point in time, however, this has no effect on eye movement behavior.
If, however, a regressive eye movement is triggered in the course of sentence reading, the word whose confidence level did not reach the re-inspection threshold is selected as the regression target because more confidence is needed here.
Since this procedure would require monitoring the confidence levels of all words in a sentence or even a text in parallel, the number of words that can be selected by such a mechanism must be limited. For the current framework, we claim that this target selection mechanism is restricted to words within the perceptual span.
Several studies have shown that during reading the perceptual span comprises 3 to 4 letter spaces to the left of the fixation (35, 36) and 14 to 15 letter spaces to the right of the fixation (37, 38). Because the perceptual span is not a restriction of the visual system per se but is rather shaped by attentional processes (as indicated, for example, by the finding that systematically increasing the font size of the letters to the right or left of the fixation does not reduce the perceptual span; 39), it has been hypothesized that the perceptual span changes when a regressive eye movement is made. This hypothesis has been confirmed by Apel and colleagues (40), who showed that the perceptual span shifts toward the direction of the eye movement, which for regressions implies a shift of attention to the left. Although the authors did not determine the actual size of the perceptual span to the left of fixation during regressions, we suggest that it encompasses about 15–20 characters to the left, by analogy with the size of the rightward perceptual span in progressive eye movements. The precise size of the perceptual span, however, has to be examined by future research.
For the architecture of the IGF, it follows that when a regression is made, a word within about 15–20 characters to the left of the current fixation is selected as the regression target if its confidence level did not reach the re-inspection threshold.
However, because the confidence level of word n-1 has never reached the re-inspection threshold when a regression is triggered (see Figure 3), this would lead to the prediction that regressions always target word n-1 (which is obviously not the case, as discussed earlier). Note, however, that word n-1 is still in the focus of attention, which allows both for the computation of its confidence level and for the retrieval of its lexical information. It is therefore assumed that word n-1 is selected as the regression target only if linguistic processing reveals that information about the identity of word n-1 would help to solve the problem. In all other cases, the earlier word within the perceptual span whose confidence level did not reach the re-inspection threshold is selected as the regression target.
If none or more than one of the words (apart from word n-1) failed to reach the re-inspection threshold, the regression target is selected by the backward control mechanism on the basis of experience-based strategies, which also means that target selection is not restricted to words within the perceptual span. It seems likely that strategy-based target selection is the rule rather than the exception.
The limited set of selection strategies is based on language experience and aims to define the most efficient way of gathering the required information, without taking into account the details of the lexical representation or requiring language processing itself. ‘Most efficient’ refers to the combination of speed and accuracy: the strategy is the fastest way to find the most relevant information in the absence of explicit knowledge, taking into account the speed-accuracy trade-off. ‘Language experience’ means that the strategy has been applied most frequently in the past and has yielded good results, so that a reader faced with a certain category of task estimates, on the basis of his language experience, where the relevant information is likely to be found. ‘Strategy’ means that the same type of eye movement (B) is performed when the reader is faced with the same task (A), at least within a single reader, resulting in the simple condition term: if A, then B.
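The target selection procedure described in this section can be sketched as follows. The sketch is hypothetical: `PriorWord`, `select_target`, and the strategy `reread_from_start` are illustrative names, the span of 20 characters takes the upper end of the 15–20 characters suggested above, the re-inspection threshold value is arbitrary, and whether information about word n-1 would help is passed in as a flag because this linguistic decision is not spelled out at this level of the framework.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical parameter values (assumptions, not estimates from the literature).
SPAN_LEFT = 20        # leftward perceptual span during regressions, in characters
REINSPECTION = 0.85   # re-inspection threshold on the confidence scale


@dataclass(frozen=True)
class PriorWord:
    index: int             # position in the sentence (0 = first word)
    distance: int          # characters between the word and the current fixation
    max_confidence: float  # highest confidence level the word reached


Strategy = Callable[[List[PriorWord]], int]


def select_target(prior: List[PriorWord], n: int,
                  n1_would_help: bool, strategy: Strategy) -> int:
    """Return the index of the regression target once a regression is triggered."""
    # Word n-1 is still in the focus of attention, so linguistic processing
    # can decide whether information about it would solve the problem.
    if n1_would_help:
        return n - 1
    # Candidates: earlier words inside the leftward span that never reached
    # the re-inspection threshold.
    candidates = [w for w in prior
                  if w.index < n - 1
                  and w.distance <= SPAN_LEFT
                  and w.max_confidence < REINSPECTION]
    if len(candidates) == 1:
        return candidates[0].index
    # None or several candidates: fall back on an experience-based strategy
    # ("if A, then B"), which is not restricted to the perceptual span.
    return strategy(prior)


def reread_from_start(prior: List[PriorWord]) -> int:
    """Hypothetical strategy: return to the first word of the sentence."""
    return min(w.index for w in prior)


words = [PriorWord(0, 30, 0.90), PriorWord(1, 18, 0.60),
         PriorWord(2, 10, 0.95), PriorWord(3, 4, 0.50)]
print(select_target(words, n=4, n1_would_help=False, strategy=reread_from_start))  # -> 1
```

Strategies are passed in as functions so that individual differences between readers can be represented by swapping the strategy rather than by changing the selection rule itself.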
Note that it is probably not a particular sentence type that induces a particular backward strategy; rather, these strategies mainly differ between individuals owing to memory capacity or reading skill. Accordingly, many studies have found that readers prefer a particular strategy. Poor readers, for example, seem to use the backtracking strategy more often than good readers do (41; see also, e.g., 11, 12 for the identification of scanpath signatures among individuals).
The assumption that the target selection of regressions is under linguistic control (which is assumed for targets selected because their confidence level did not reach the re-inspection threshold) is contentious. Mitchell and colleagues (42), for example, introduced the idea that regressive eye movements may simply reflect a kind of cognitive-inhibition mechanism. This ‘time out hypothesis’ assumes that “the function of the system is nothing more than that of postponing new input” (p. 269), which also implies that there is no linguistic guidance of regression target selection. However, the authors were unable to provide evidence for this hypothesis, because their syntactic manipulation had a clear impact on the landing sites of regressive eye movements.
The opposite claim, however, has not received sufficient support either. Frazier and Rayner (7) proposed the ‘selective reanalysis hypothesis’, which assumes that in garden-path sentences the parser regresses to the position where it expects the source of the error. Although Frazier and Rayner found that 53% of regressions initiated in the disambiguating region and beyond ended in the ambiguous region, the landing sites of these regressions showed relatively high variance, which calls such strong linguistic guidance into question. Because the number of regressions was very small and statistical evidence was missing, Meseguer and colleagues (10) conducted a follow-up study two decades later, but they did not find convincing evidence for strong linguistic guidance either.
Thus, we think that further factors may shape the landing site distribution, although linguistic computations are assumed to be its main determinant. These factors include differences between individuals in linguistic knowledge (e.g., 1, 43) or memory capacity (44, 45). General factors like spatial memory (46, 47; see 48 for an overview), oculomotor error (49), and visual salience (e.g., 50) may also play an important role in determining the landing site distributions of regressions. This, of course, makes it hard to derive strong predictions from the model’s architecture, and we acknowledge that more research is needed in this domain.
Applying the Information Gathering Framework to the findings in the literature and deriving further predictions
Having described the main properties of the IGF, we will now discuss how the model may account for a variety of critical empirical findings reported in the context of regressive eye movements during reading.
In addition, an important criterion for the strength of a model is whether it allows further predictions to be derived. In the following, we will therefore also discuss several predictions that follow from the architecture of the model. Note, however, that not all of the predictions discussed here can verify or falsify the model. For example, the IGF assumes that new input is matched against predictions arising from previous input, which is one of the core principles of the model. If we were to find empirical evidence against this assumption, this would question the validity of the model. Whether these predictions are generated by production rules, by contrast, primarily affects the detailed architecture of the model but not its core principles.
(1) Properties of word n and word n-1
Above we mentioned the work by Bicknell and Levy (21) testing predictions of the FC model. In their study, they focused on the relationship between inter-word regressions and the properties of word n and word n-1. They also discuss the predictions that several theories of regressive eye movements during reading make for this relationship.
Table 1 provides an overview of these predictions according to Bicknell and Levy.
Predictions were tested using the Dundee corpus (22). In contrast to earlier studies (51, 5), the authors controlled for skipping of word n-1 and clearly distinguished between the factors word length, frequency and predictability. The analysis revealed more regressions when word n-1 was longer, less frequent and less predictable, as well as when word n was less predictable (see Table 1). The length and frequency of word n had no effect.
However, because frequency and predictability of word n-1 were highly correlated, the authors carried out an additional analysis that accounted for this correlation. This analysis revealed highly significant effects of both the predictability and the frequency of word n-1, but in opposite directions (i.e., more regressions for less predictable but more frequent words).
Bicknell and Levy argue that these results fit best with the assumptions of the FC model. In general, the FC model proposes that an unpredictable word n is more likely to cause confidence to fall, which triggers a regressive eye movement. In addition, because the confidence level of longer, less frequent and less predictable words is lower to begin with, it is more likely to fall. This may explain the generally higher regression probability when word n-1 is longer, less frequent and less predictable. The opposing effects of predictability and frequency, however, are interpreted to mean that unpredictable words cause more regressions only if they are more predictable in alternative possible contexts (as indicated by a high frequency). Thus, Bicknell and Levy conclude that their data suggest “that the amount by which a word makes confidence to fall is a key determinant in whether a reader will make a regressive saccade.” (p. 936)
The IGF shares the predictions of the FC model with regard to the properties of word n and word n-1. If the confidence level of word n-1 increases more slowly (due to low frequency or low predictability), it is more likely either to drop below the forward threshold during a fixation on word n (regressions of type I) or not to reach the backward threshold (regressions of type II). But the IGF provides a clear theoretical explanation for the opposing effects of frequency and predictability: Because the lexical quality level and the confidence level are assumed to be (in principle) independent, properties like frequency (which is associated with the lexical quality level) and predictability (which is associated with the confidence level) may affect regression behavior in different ways.
Like the FC model, the IGF also predicts that a less predictable word n increases regression probability, because it fits poorly with the prior context. In response, the confidence level of word n-1 drops below the forward threshold and a regression is triggered (regressions of type I). This should happen largely independently of the length or frequency of word n (or at least without producing a clear pattern). However, the IGF makes an additional prediction: If the confidence level of word n needs more time to cross the forward threshold, then the confidence level of word n-1 has more time to reach the backward threshold. Thus, the rate of type II regressions should be reduced when the confidence level of word n increases more slowly (i.e., for less frequent and less predictable words).
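The additional prediction for type II regressions can be illustrated with a deliberately minimal sketch. It rests on simplifying assumptions of our own that are not part of the IGF as stated: confidence grows linearly at a word-specific rate, word n-1 is left as soon as its confidence reaches the forward threshold, and a type II regression occurs if word n-1 is still below the backward threshold at the moment word n crosses the forward threshold. The names `WordState`, `fixation_until_forward` and `type2_regression`, as well as all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WordState:
    """Hypothetical linear confidence accumulator for one word."""
    confidence: float  # current confidence level in [0, 1]
    rate: float        # confidence gained per ms (assumed higher for frequent/predictable words)


def fixation_until_forward(word: WordState, forward: float) -> float:
    """Time (ms) until this word's confidence reaches the forward threshold."""
    if word.confidence >= forward:
        return 0.0
    return (forward - word.confidence) / word.rate


def type2_regression(prev: WordState, curr: WordState,
                     forward: float = 0.6, backward: float = 0.9) -> bool:
    """True if word n-1 is still below the backward threshold when word n
    crosses the forward threshold (a type II regression in this toy model)."""
    duration = fixation_until_forward(curr, forward)
    # Word n-1 keeps accumulating confidence while word n is being fixated.
    prev_conf = min(1.0, prev.confidence + prev.rate * duration)
    return prev_conf < backward


# Word n-1 was left as soon as it crossed the forward threshold (0.6).
prev = WordState(confidence=0.6, rate=0.002)
easy_n = WordState(confidence=0.0, rate=0.006)  # frequent/predictable word n
hard_n = WordState(confidence=0.0, rate=0.003)  # infrequent/unpredictable word n

print(type2_regression(prev, easy_n))  # True: word n-1 only reaches about 0.8
print(type2_regression(prev, hard_n))  # False: word n-1 reaches the 0.9 threshold
```

With these made-up rates, the slower accumulation on the harder word n gives word n-1 twice as long to gain confidence, so the type II regression disappears. Type I regressions (a fall in the confidence of word n-1) are not modeled in this sketch.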
This prediction, however, cannot be tested with the data of Bicknell and Levy, because they restricted their analysis to regressions targeting word n-1 and, in addition, excluded regressions launched from the last word in a line. This hypothesis therefore has to be tested by future research. Also note that in the analysis reported above, word n-1 always served as the regression target (in contrast to the assumptions of the IGF). It is therefore hard to distinguish which properties of word n-1 caused regressions and which qualified it as a potential regression target. This issue also needs further empirical examination.
(2) Regressions to the immediately preceding word
Although the landing positions of regressions are spread over the whole sentence, many studies have shown that the majority of regressive eye movements target the word immediately preceding the currently fixated word (see e.g., [5, 11, 12] for corresponding evidence). In particular, all current models of eye movement control discussed above (E-Z Reader 10, the model of falling confidence, Glenmore, and SWIFT, with some exceptions mentioned above) account only for these instances.
Mitchell et al. (42) argue (in favor of an automatic regression mechanism) that a regression from word n+1 to word n is the “smallest possible regression” (p. 271). And indeed, a regression to word n has some important advantages over target words that are farther away from the current fixation: First, the saccade is short and fast, so less effort is needed to execute and control it. Second, the target word can be processed parafoveally, so the saccade can be guided by visual input. Third, memory demands are low because the word was encountered immediately before (see [46] for a detailed discussion of 'spatial knowledge' in the context of regressions to word n-1).
In the IGF, however, we argue that regressions to the immediately preceding word can be explained more plausibly by a regression mechanism that is controlled by linguistic factors.
Although they differ in their explanations, both the E-Z Reader and the SWIFT model account for the often replicated finding that the processing of word n also affects the processing of word n+1 (known as “lag” or “spillover” effects: 52, 53; see [51] for a discussion). Within the IGF, however, this finding can be explained by the idea that the computation of the confidence level of word n continues after the eyes have moved to word n+1, because the retrieval and integration of linguistic information take time. Because language processing is organized hierarchically, and this hierarchy is assumed to correspond (at least to some degree) to the time course of sentence interpretation, the computation of the confidence level of word n during the fixation of word n+1 is based primarily on higher-order linguistic processing such as lexical integration. Thus, an integration failure of word n will often become apparent only on word n+1 (see pattern 2 described above). If this integration fails because the predictions based on the production rules are not met, a regression is triggered. If the production rules reveal that more information about word n (which is assumed to be within the focus of attention, see above) would help to solve the problem, this regression targets word n (see also 18).
Because there are many more instances in which the integration of word n fails due to wrong and/or underspecified assumptions about its identity than instances in which integration fails due to wrong or underspecified identities of previous words (as is the case, for instance, in most garden path sentences), the eyes very frequently regress to word n. This explains why the majority of regressions target the immediately preceding word.
In addition, the backward control mechanism may also have developed a strategy of selecting the preceding word. Recall that the strategies applied by the backward control mechanism are assumed to be based on general language knowledge and experience and are hence frequency-based. Thus, if no unique target is available because the confidence level of either no word or more than one word (apart from word n-1) has failed to reach the re-inspection threshold, the backward control mechanism might select the preceding word, because this word often provides the most useful information for solving the processing problem.
This view is further supported by the findings of von der Malsburg and Vasishth (12), which indicate that low-capacity readers were less likely to re-read sentences when faced with garden path sentences. Instead, they more frequently made rapid regressions to the word in the pre-disambiguating region. Since such rapid regressions place lower demands on memory (as discussed above), this strategy suits readers with low memory capacity.
(3) Sentence wrap-up effects
A clear limitation of eye movement models like SWIFT, Glenmore or E-Z Reader is that they attribute regressive eye movements only to processing difficulties (in the case of E-Z Reader) or to incomplete word processing or identification (in the case of SWIFT and Glenmore). Although this of course covers a wide range of the regressions reported in the literature, it excludes some findings at the same time. An important subclass of regressions, for example, reflects the increased probability of regressing from the end of a sentence (the ‘sentence wrap-up effect’), which was mentioned earlier.
As discussed above, the IGF is not restricted to processing difficulties; rather, it posits that regressions are triggered whenever predictions derived from previous input are not met. This can happen either because the current input conflicts with the predictions (which leads to a decrease in confidence) or because expected evidence is missing (which leads to a slower increase in confidence). For regressions from the final region, we assume that the latter scenario applies.
Thus, when the eyes move to the final (or pre-final) word, the confidence level of this word is computed by matching it against the predictions. In addition, the sentence-final punctuation is perceived (at least parafoveally), which signals a sentence boundary. A sentence boundary indicates that no further input for the current sentence interpretation will arrive, so no prediction (condition-action pair) can be deferred to later input. Thus, at the end of a sentence, the whole sentence interpretation is evaluated (56, 9, 55). If this evaluation reveals that more evidence is needed to build a coherent sentence interpretation, a regression is made to compensate for this information deficit. Of course, the degree of evidence (or confidence) in a sentence structure that is judged sufficient (the backward threshold) may depend on factors like task or time pressure.
Since the evaluation of the whole sentence does not deal with a concrete integration problem, it is reasonable to assume that no single target position can be defined on the basis of the production rules. Instead, the regression strategy selects a target position on the basis of language experience. This assumption fits well with the regression patterns reported by von der Malsburg and Vasishth (11, 12), which show a clear tendency for readers to regress to the beginning of the sentence and to read the whole sentence again.
(4) Gaze durations and regressions
At the beginning, we mentioned the counterintuitive finding of Altmann and colleagues (14) that gaze durations before regressions tend to be shorter than gaze durations before progressions. Although these results may in general be taken to support the claim that increased fixation durations and a higher number of regressive eye movements have to be distinguished functionally, the SWIFT model, for example, accounts for this effect by assuming saccadic overshoots. In the case of an overshoot, a new saccade program is started immediately. Because word n-1 has likely not been recognized completely and therefore still has a high activation level, this word is often targeted by the new saccade.
However, the architecture of the IGF also directly predicts this pattern. Recall that fixation durations are mainly monitored by the forward threshold: As soon as the confidence level of word n reaches the forward threshold, the eyes move to word n+1. If, however, the computation of the confidence level of word n-1 reveals integration difficulties (recall that the computation of the confidence level of word n-1 still continues during a fixation of word n), this causes the confidence level of word n-1 to fall. As a consequence, the fixation of word n is cancelled and a regressive eye movement is performed instead. Because the fixation of word n is cancelled, fixation durations before regressive eye movements tend to be shorter.
But our model makes an additional prediction: Because regressions due to missing evidence are not triggered before the fixation of the current word is completed, we expect no shortening of fixation durations before these regressions (in contrast to regressions due to integration difficulties, where the fixation is cancelled and therefore shortened).
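The contrast between cancelled and completed fixations can be written as a simple decision rule. This is a hypothetical illustration: the function `next_saccade`, its parameters and the times in the example are our own, and the IGF itself does not specify how the timing of an integration failure is determined.

```python
from typing import Optional, Tuple


def next_saccade(time_to_forward: float,
                 integration_failure_at: Optional[float],
                 prev_below_backward: bool) -> Tuple[float, str]:
    """Return (fixation duration in ms, saccade label) for a fixation on word n.

    time_to_forward: time at which word n's confidence would reach the forward threshold.
    integration_failure_at: time at which word n-1's confidence falls (None if it never does).
    prev_below_backward: whether word n-1 is still below the backward threshold
        when the fixation on word n is completed.
    """
    if integration_failure_at is not None and integration_failure_at < time_to_forward:
        # Type I: the ongoing fixation on word n is cancelled, so it is cut short.
        return integration_failure_at, "regression (type I)"
    if prev_below_backward:
        # Type II: the regression is issued only after the fixation has run its course.
        return time_to_forward, "regression (type II)"
    return time_to_forward, "progression"


print(next_saccade(220.0, 150.0, False))  # (150.0, 'regression (type I)')
print(next_saccade(220.0, None, True))    # (220.0, 'regression (type II)')
print(next_saccade(220.0, None, False))   # (220.0, 'progression')
```

Under this rule only type I regressions can be preceded by a shortened fixation, which is the asymmetry the prediction relies on.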
(5) Regression targets within and outside the perceptual span
The IGF makes a strong prediction with regard to the target selection of regressions: Only words within the perceptual span, which is assumed to comprise about 15–20 characters to the left of the current fixation, can be selected as a regression target by an explicit linguistic computation. Words outside of the perceptual span are assumed to only be selected by a backward strategy. This division should be reflected by the empirical data.
First, it would be quite unexpected if regression landing sites showed, for example, a Gaussian or a linear distribution over the sentence, ranging from very short to very long amplitudes with no further distinctions. We would rather expect the majority of regressive saccades to land within the perceptual span. In addition, we would expect to find a clear pattern for regressions that land outside the perceptual span, because these regression targets are assumed to be selected by a strategy. Murray and Kennedy (41), for example, identified three different regression strategies in the context of anaphor processing: re-reading ab initio, selective reinspection of some words, or right-to-left backtracking. For the first scenario, for instance, we would expect a clear tendency for long regressions to target the beginning of a sentence.
Second, in the case that there exists a well-defined target position from a theoretical linguistic point of view (as for example, in garden path sentences), we would expect that this defined target position is selected as a regression target only if it is within the perceptual span. If the ambiguous word is outside the perceptual span, for instance, no preference for a selection of this word is predicted, unless it is selected by the strategy.
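The target selection rule described in this subsection, together with the fallback to the preceding word discussed for the backward control mechanism, can be sketched as follows. This is a hypothetical toy version: we approximate "within the perceptual span" by the word's first character lying at most 20 characters (the upper end of the 15–20 range assumed above) to the left of the fixation, and we represent the output of the linguistic computation simply as a list of flagged word indices. The function names, the strategy labels and the example sentence analysis are our own.

```python
from typing import List, Sequence


def word_starts(sentence: str) -> List[int]:
    """Character offset at which each space-separated word begins."""
    starts, pos = [], 0
    for word in sentence.split(" "):
        starts.append(pos)
        pos += len(word) + 1
    return starts


def select_target(starts: Sequence[int], current: int, fixation_char: int,
                  flagged: Sequence[int], strategy: str = "preceding",
                  span_left: int = 20) -> int:
    """Return the index of the word targeted by a regression (toy IGF sketch)."""
    # Earlier words flagged by the linguistic computation that start inside the leftward span.
    in_span = [w for w in flagged
               if w < current and starts[w] >= fixation_char - span_left]
    if len(in_span) == 1:
        return in_span[0]      # explicit, linguistically driven selection
    # No unique candidate within the span: fall back to an experience-based strategy.
    if strategy == "preceding":
        return current - 1     # regress to the immediately preceding word
    if strategy == "ab_initio":
        return 0               # re-read from the beginning of the sentence
    raise ValueError(f"unknown strategy: {strategy!r}")


sentence = "While the man hunted the deer ran into the woods"
starts = word_starts(sentence)   # [0, 6, 10, 14, 21, 25, 30, 34, 39, 43]
flagged = [3]                    # assume "hunted" is the word whose analysis must be revised
print(select_target(starts, current=6, fixation_char=31, flagged=flagged))  # 3: within span
print(select_target(starts, current=9, fixation_char=44, flagged=flagged))  # 8: strategy
print(select_target(starts, current=9, fixation_char=44, flagged=flagged,
                    strategy="ab_initio"))                                  # 0
```

The same flagged word is selected from "ran" but not from "woods", where it lies outside the assumed span and the outcome depends only on the strategy.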
(6) Independence of forward and backward thresholds
Within the IGF, it is assumed that first-pass reading times are controlled by the forward threshold, whereas the probability of regressing is controlled by the backward threshold. Although there is considerable evidence that these two thresholds interact strongly (as indicated, for example, by the speed-accuracy tradeoff), we assume that they can be set independently.
Thus, we predict that there are cases in which a riskier forward strategy does not necessarily lead to an increased probability of regressions. Conversely, there should be cases in which the probability of regressions increases even though first-pass reading times do not become longer.
(7) Regressions are sensitive to task modulations
Since regressions are assumed to be mediated by both the forward and the backward threshold, we would expect an adjustment of these thresholds to affect the probability of triggering a regression. In particular, top-down influences like task demands or time pressure should affect regression behavior during reading, leading to more or fewer regressions.
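Predictions (6) and (7) can be illustrated with a toy simulation in which, as the IGF assumes, the forward and backward thresholds are separate parameters. The sketch is hypothetical throughout: first-pass time is simply the time a randomly drawn accumulation rate needs to reach the forward threshold, and a sentence-final (type II) regression occurs when a randomly drawn amount of sentence-level evidence falls short of the backward threshold. Because the two quantities are independent by construction, the sketch only demonstrates the logic of the prediction, not evidence for it.

```python
import random
from statistics import mean
from typing import Tuple


def simulate(forward: float, backward: float,
             n_trials: int = 10_000, seed: int = 1) -> Tuple[float, float]:
    """Return (mean first-pass time in ms, proportion of sentence-final regressions)."""
    rng = random.Random(seed)
    first_pass = []
    regressions = 0
    for _ in range(n_trials):
        rate = rng.uniform(0.003, 0.008)    # hypothetical confidence gain per ms
        first_pass.append(forward / rate)   # time until the forward threshold is reached
        evidence = rng.uniform(0.5, 1.0)    # hypothetical sentence-level evidence at the final word
        if evidence < backward:             # missing evidence -> type II regression
            regressions += 1
    return mean(first_pass), regressions / n_trials


# Same forward threshold, different backward thresholds (e.g., easy vs. difficult questions).
easy_fp, easy_reg = simulate(forward=0.6, backward=0.7)
hard_fp, hard_reg = simulate(forward=0.6, backward=0.9)
print(f"easy: first pass {easy_fp:.0f} ms, regressions {easy_reg:.2f}")
print(f"hard: first pass {hard_fp:.0f} ms, regressions {hard_reg:.2f}")
```

Because both runs use the same seed and forward threshold, their mean first-pass times are identical, while the proportion of regressions rises with the stricter backward threshold (close to 0.4 versus 0.8 with these uniform draws).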
Testing the Information Gathering Framework
In the last section we described the architecture of the IGF and also outlined some predictions that can be derived from the framework. In the following we will look for further empirical evidence by applying these predictions to an experiment conducted by Weiss and colleagues (57).
In this experiment, 92 native English speakers read 99 English sentences in total while their eye movements were monitored. These sentences comprised 36 semantic reversal anomalies (SRAs), 39 relative clause sentences (RC) and 24 garden path sentences (GP; see Table 2 for an overview), and each of the RC and GP sentences was followed by a comprehension question.
Crucially, the question difficulty was manipulated between subjects: While one group received only easy comprehension questions (e.g., probing for a word), the other only received questions that required a deeper understanding of the sentence (see [58, 59] for similar manipulations).
The analysis revealed that, for anomalous SRA sentences, first-pass reading times and go-past times on the verb and object regions were significantly increased, an effect that was also modulated by the association between verb and object. Difficult questions, however, led to significantly longer reading times and more regressions in the sentence-final region, as indicated by significant effects of question difficulty on go-past time and regressions out. Question difficulty neither had a significant effect on the earlier regions nor interacted with first-pass reading. This was also true for the RC and GP sentences.
To further clarify this pattern, regions 1–4 (SRA sentences) or 1–2 (RC and GP sentences) were merged to one region and the original final region was divided into two regions. The new final region consisted of the last 2–3 words of the sentences. Again, difficult questions induced longer go-past times in the final region for all three sentence types but neither in the first nor in the second region.
Let us now see how the IGF may account for these results.
(1) Task manipulation should only affect regression rates
From the perspective of the IGF, we expect the task manipulation to adjust the backward threshold. Thus, subjects in the easy condition should have adopted a more superficial reading strategy than subjects in the difficult condition, setting the backward threshold to a lower level. More precisely, the IGF makes the strong prediction that this task manipulation should affect only regression rates, not first-pass reading times.
Interestingly, that is exactly the pattern that was found in the data. For the SRAs, the anomaly effect became apparent in first-pass reading irrespective of the task manipulation. However, although the question type did not affect first-pass reading behavior, difficult questions induced significantly more regressions. We may interpret these results as evidence that readers adjust the backward threshold independently of the forward threshold by adopting different reading strategies.
(2) Task manipulation should only affect regressions of type II (missing evidence)
A second prediction that can be directly derived from the model’s architecture is that adjusting the backward threshold should only affect regressions of type II (due to missing evidence) but not regressions of type I (due to integration difficulties). Thus, we would expect to find an increasing number of regressive eye movements from the end of a sentence but not from the regions before.
Again, the reported results are in line with this prediction: In all three sentence types there was a significant increase of regressions out of the last 2–3 words of a sentence for the difficult condition. This was not the case for the regions before. Thus, the backward threshold seems to only affect regressions of type II (due to missing evidence) but not regressions of type I (due to integration difficulties).
(3) Shorter fixation durations before regressions of type I (integration difficulties)
The IGF makes the strong prediction that fixation durations before regressions should be shorter compared to fixation durations preceding progressions, but only before regressions of type I (due to integration difficulties). This means that we should find shorter fixation durations before regressions in all sentence regions except the last region, where we expect to find either no or a reduced effect of saccade type.
In order to test this prediction, we re-analyzed the data by identifying all inter-word saccades in the SRAs (n = 41,800) and categorizing them as progressive (n = 31,671) or regressive (n = 10,129) eye movements. We then assigned these saccades to the six regions of the sentence (for an example of the region scheme, see Table 2).
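The classification step can be sketched as follows. The sketch assumes a hypothetical data layout (one list of (word index, duration) fixations per trial and a mapping from word index to region), attributes each saccade to the region of its launching fixation, and skips refixations of the same word. These choices are our assumptions; the text does not spell out how refixations were handled. The function names and sample values are illustrative only.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterator, List, Tuple

Fixation = Tuple[int, float]  # (word index, fixation duration in ms)


def classify_saccades(fixations: List[Fixation],
                      word_region: Dict[int, int]) -> Iterator[Tuple[int, str, float]]:
    """Yield (launch region, saccade type, duration of the launching fixation)."""
    for (w_from, duration), (w_to, _) in zip(fixations, fixations[1:]):
        if w_to == w_from:
            continue  # refixation of the same word: not an inter-word saccade
        kind = "progressive" if w_to > w_from else "regressive"
        yield word_region[w_from], kind, duration


def mean_durations(trials: List[List[Fixation]],
                   word_region: Dict[int, int]) -> Dict[Tuple[int, str], float]:
    """Mean duration of the fixation preceding each saccade, per (region, saccade type)."""
    cells: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for fixations in trials:
        for region, kind, duration in classify_saccades(fixations, word_region):
            cells[(region, kind)].append(duration)
    return {cell: mean(durations) for cell, durations in cells.items()}


word_region = {0: 1, 1: 1, 2: 2, 3: 2}   # hypothetical 4-word item with 2 regions
trial = [(0, 200.0), (1, 230.0), (2, 180.0), (1, 210.0), (2, 240.0), (3, 250.0)]
print(mean_durations([trial], word_region))
```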
A first analysis revealed that fixations before regressions were generally shorter (mean 217 ms) than fixations before progressive saccades (mean 222 ms). This difference of about 6 ms was highly significant (t(14691) = 4.92, p < .001). Looking at the means for the individual regions, we also observed that this difference ranged from about 10 to 22 ms in regions 1–5 but dropped to about 2 ms in the last region (see Figure 4). We tested whether this difference was significant by fitting a linear mixed-effects model to the log duration of the fixation preceding each saccade. For this purpose, we combined regions 1–5 into a new region (region_early) and compared it with region 6 (region_late), treating SACCADE TYPE and REGION as well as their interaction as fixed effects.
We also included random intercepts for subjects and items and used the maximal random-effects structure. Following convention, we treat |t| > 2 as significant.
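For readers who prefer a formula, the model described above can be written roughly as follows. The notation is our own; only random intercepts are shown for readability, whereas the analysis used the maximal random-effects structure, which would add by-subject and by-item random slopes. The contrast coding of SACCADE TYPE and REGION is not specified in the text, so the signs of the reported coefficients depend on that coding.

```latex
$$
\log \mathrm{FD}_k = \beta_0 + \beta_1\,\mathrm{ST}_k + \beta_2\,\mathrm{R}_k + \beta_3\,\mathrm{ST}_k\,\mathrm{R}_k
  + u_{s[k]} + w_{i[k]} + \varepsilon_k,
\qquad
u_s \sim \mathcal{N}(0, \sigma_u^2),\quad
w_i \sim \mathcal{N}(0, \sigma_w^2),\quad
\varepsilon_k \sim \mathcal{N}(0, \sigma^2)
$$
```

Here $\mathrm{FD}_k$ is the duration of the fixation preceding saccade $k$, $\mathrm{ST}_k$ codes SACCADE TYPE (progressive vs. regressive), $\mathrm{R}_k$ codes REGION (region_early vs. region_late), and $s[k]$ and $i[k]$ are the subject and item of saccade $k$.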
The results of the linear mixed-effects model showed that SACCADE TYPE (β = .07, SE = .01, t = 6.28), REGION (β = .10, SE = .01, t = 7.31) and their interaction (β = -.05, SE = .02, t = -2.48) all had a significant effect on fixation durations. Thus, although fixations before regressions were generally shorter (indicated by the significant effect of SACCADE TYPE), this effect was absent in the last region of the sentence (indicated by the significant SACCADE TYPE × REGION interaction).
This somewhat surprising finding fits well with the prediction of the IGF: Because only regressions of type I (due to integration difficulties) are triggered by cancelling the preceding fixation, only fixations before these regressions should be shorter.
Another interesting, although unrelated, finding is that fixation durations generally increase over the course of the sentence (indicated by the significant effect of REGION; see also Figure 4). In terms of the IGF, this points to the idea that the amount of information to be dealt with increases over the course of the sentence, which leads to longer computation times until the forward threshold of confidence is reached. Future research might usefully examine the reasons for this in more detail.
(4) Regression amplitudes and landing sites of regressive eye movements
Although the IGF is not yet very specific with regard to landing site distributions, we would nonetheless expect the perceptual span to be reflected in the amplitudes of regressions. Because regression target selection is assumed to be linguistically constrained but also to require precise spatial knowledge (see Inhoff et al., 2005, for a discussion), the majority of regressions should target a word within the perceptual span. We therefore first computed the amplitude of all regressive eye movements in the SRA sentences (see Figure 5).
This analysis revealed that 74.81% of all regressions landed within the 15-character window to the left of the current fixation. However, because this analysis included all regressions, the maximum possible amplitude was limited for regressions launched close to the beginning of the sentence. We therefore conducted a second analysis restricted to regressions launched from the final region (using the region scheme outlined above).
As Figure 6 shows, the pattern is similar, but the proportion of regressions within the 15-character window dropped to 51.61%. Nevertheless, at about 15–20 characters there again seems to be some kind of invisible boundary beyond which regressive eye movements become clearly less likely. This fits well with the assumption of the IGF that the linguistically driven selection of target positions is limited by the perceptual span. From these data we may tentatively conclude that, for regressive eye movements, the perceptual span comprises about 15–20 characters to the left of the current fixation, although more research is certainly needed here.
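The window proportion and the search for a drop-off can be computed in a few lines. This minimal sketch assumes (our assumption) that each saccade is stored as a pair of launch and landing character positions on the same line; the function names and sample values are hypothetical and do not reproduce the reported percentages.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def regression_amplitudes(saccades: Iterable[Tuple[int, int]]) -> List[int]:
    """Leftward amplitude (in characters) of every regressive saccade (launch, landing)."""
    return [launch - landing for launch, landing in saccades if landing < launch]


def proportion_within(amplitudes: List[int], window: int = 15) -> float:
    """Share of regressions landing within `window` characters left of the launch site."""
    if not amplitudes:
        return float("nan")
    return sum(a <= window for a in amplitudes) / len(amplitudes)


def amplitude_histogram(amplitudes: List[int], bin_width: int = 5) -> Dict[Tuple[int, int], int]:
    """Counts per amplitude bin (1-5, 6-10, ...), useful for spotting a drop-off."""
    counts = Counter((a - 1) // bin_width for a in amplitudes)
    return {(b * bin_width + 1, (b + 1) * bin_width): counts[b] for b in sorted(counts)}


saccades = [(40, 35), (40, 28), (52, 3), (30, 18), (60, 59), (25, 30)]  # last one is progressive
amps = regression_amplitudes(saccades)   # [5, 12, 49, 12, 1]
print(proportion_within(amps))           # 0.8
print(amplitude_histogram(amps))
```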
Because the number of characters varied within sentences and regions, saccade amplitude says little about where in the sentence the regressions actually landed. We therefore further investigated the landing site distributions by aligning the target positions with the six sentence regions defined above.
When all regressions are taken into account, we see a clear tendency to target the first region of the sentence (29.51%), which probably reflects subjects re-reading the whole sentence (see Figure 7). When we focus only on regressions from the final region, we again see an increased tendency to regress to the beginning of the sentence (14.45%), but substantially more regressions (33.18%) landed in the pre-final region (a pattern to be expected given the results of the amplitude analysis above). These results are fully in line with the predictions of the IGF: The majority of regressions target a position within the perceptual span, but if they cross this span, a strategy is most likely applied, namely re-reading the whole sentence. This also fits well with the regression patterns reported in [11, 12].
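The landing-site analysis amounts to tabulating the region in which each regression lands, optionally restricted to a launch region. A minimal sketch, assuming each regression has already been reduced to a (launch region, landing region) pair; the function name and the sample data are hypothetical, and region length is not controlled for, so the same caveat about region length applies.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def landing_distribution(regressions: Iterable[Tuple[int, int]],
                         launch_region: Optional[int] = None) -> Dict[int, float]:
    """Percentage of regressions landing in each region.

    regressions: (launch region, landing region) pairs.
    launch_region: if given, only regressions launched from this region are counted.
    """
    landings = [land for launch, land in regressions
                if launch_region is None or launch == launch_region]
    if not landings:
        return {}
    counts = Counter(landings)
    return {region: 100 * n / len(landings) for region, n in sorted(counts.items())}


regressions = [(6, 5), (6, 1), (6, 5), (4, 3), (5, 1), (3, 1)]  # hypothetical data
print(landing_distribution(regressions))                   # all regressions
print(landing_distribution(regressions, launch_region=6))  # from the final region only
```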
However, because the experiment was not designed for an analysis of landing-site distributions, factors such as region length were not controlled. These results therefore offer only a first impression and underline the need for future research to investigate the target patterns of regressive eye movements in more detail.