Attention to a Moment in Time Impairs Episodic Distinctiveness during Rapid Serial Visual Presentation

Abstract: Human attention is limited in the ability to select and segregate relevant distinct events from the continuous flow of external information while concurrently encoding their temporal succession. While it is well-known that orienting attention to one external target stimulus impairs the encoding of ensuing relevant external events, it is still unknown whether orienting attention to internally generated events can interfere with concurrent processing of external input. We addressed this issue by asking participants to identify a single target embedded among distractors in a non-spatial rapid serial visual presentation (RSVP) stream and to indicate whether that target appeared before or after an internally estimated midpoint of the stream. The results indicate that (a) such an internally generated temporal benchmark does not interfere with the identification of a subsequent physical target stimulus but (b) the two events cannot be accurately segregated when the physical target immediately follows the internally generated temporal event. These findings indicate that the asymmetrical distribution around the midpoint of order reversals reflects an impaired temporal discrimination ability. Orienting attention to a moment in time reduces episodic distinctiveness as much as orienting attention to external events.


Introduction
Optimal human adaptation to the environment relies on the ability to identify goal-relevant events at the expense of irrelevant ones, along with the temporal information they convey [1], especially when multiple or fast-changing pieces of information must be managed concurrently or in rapid succession [2].
It is well known that these abilities come with costs. Indeed, the human attentional system is often unable to deploy resources to multiple items concurrently or in close succession [2] and to segregate proximal sensorial events accurately [3].
Traditionally, the temporal constraints of attention have been experimentally investigated by overloading the visual attentional system with concurrently or rapidly presented task-relevant events [4]. When focusing on task-relevant information, the ability to consciously report or respond to other subsequent information is usually impaired [4,5]. One such procedure is the rapid serial visual presentation (RSVP), whereby visual events rapidly succeed one another in foveal vision at a rate of about ten items per second. The human ability to consciously recognize the second of two targets (T1 and T2) embedded among distractors is severely compromised when T2 is presented within 100-500 milliseconds after T1. Under these conditions, subjects perceive T2 at a rate of approximately 50%, a phenomenon referred to as the attentional blink (AB; [4]). Crucially, when T2 immediately follows T1 and no distracting information is presented in between, such events can be processed in parallel and be accurately reported (Lag 1 sparing), at the cost of a loss of episodic distinctiveness. Information about the temporal order of stimuli is usually lost (e.g., they are frequently reported in reversed order) [6][7][8][9]. Hence, it seems that attentional constraints on visual perception do not allow for both the identity and the temporal order of two rapidly presented relevant visual events to undergo full processing simultaneously.
Symmetry 2021, 13, 1938
Traditional accounts of AB rely on the idea of a low-level central bottleneck in information processing that does not allow perceptual processing of more than one event at a time (e.g., [10]). However, both reported and missed targets produce similar semantic and repetition priming effects on consecutive successfully reported targets during the AB and elicit event-related potentials (ERPs) associated with visual and semantic processing [11][12][13]. These results suggest that even unreported targets undergo a high level of processing (e.g., semantic processing).
In addition, more recent evidence suggests that top-down processes such as temporal expectations [14][15][16][17][18] also play an important role in the attentional blink, suggesting that the expectations and the learned contingencies of the environment could play a role in improving the structural knowledge of the task. Indeed, the literature has extensively shown that information about the temporal dimensions of stimuli and events can orient attention and improve subsequent performance (e.g., [19,20]). Importantly, (at least) part of this anticipatory ability has been described to be flexible and under endogenous control (e.g., [20]).
Despite the evidence that top-down cognitive control processes (e.g., temporal expectations) have a role in modulating performance on the attentional blink, it is still unknown whether internally generated events can affect the subsequent processing of external input. Indeed, in typical RSVP paradigms attentional constraints are generally assessed by means of bottom-up external events, such as the onset of a target stimulus (e.g., T1).
Evidence already exists that the deployment of attention to sensory input and to representations in long-term memory is governed by similar neurocognitive processes [21,22]. This supports the hypothesis that an internally generated event, such as a temporal benchmark, can indeed open an attentional episode, engaging attentional resources as much as an external perceptual event does, so as to interfere with the identification or with the temporal localization of a subsequent external stimulus.
From a theoretical point of view, results coherent with such a hypothesis would shed light on the possibility that internal and external input processing share attentional resources, undergoing similar temporal constraints.
In this regard, the present study was implemented to investigate the possibility that internally generated temporal tasks embedded in an RSVP stream might act in the same way as external stimuli in the dynamics of the AB. To do so, we designed a single-target, non-spatial RSVP task, asking participants to perform two tasks: an identification task, requiring them to report a physical target stimulus among distractors, and a temporal judgment task, requiring them to indicate whether the target appeared before or after the estimated midpoint of the stream. Importantly, since the temporal midpoint was not explicitly signalled, it lacked bottom-up dimensions. We hypothesized that if the formation of a temporal expectation for a specific point in time (the midpoint) operates as a physical stimulus does, then it should interfere with the ongoing or subsequent stimulus processing. More specifically, in the identification task this would happen if the temporal dimension interferes with the consolidation of the target memory trace. Considering temporal judgments, if the benchmark and the target do not interfere with each other, we should observe a symmetrical performance around the midpoint, as uncertainty should be equally distributed before and after the midpoint. Conversely, if the internally generated benchmark is capable of interfering with the target stimulus, we should observe that accurate judgments of the target temporal dimension are not equally distributed around the midpoint.

Participants
Twenty volunteers (mean age = 23.6, SD = 4.67; 13 females, 7 males; 1 left-handed) took part in the study. They had normal or corrected-to-normal vision. The study was approved by the Institutional Review Board of the Department of Psychology at the Sapienza University of Rome and conducted in accordance with its policies and with the Declaration of Helsinki. All participants provided written informed consent and were unaware of the hypotheses of the study.

Stimuli and Apparatus
Stimuli in the experimental phase were white alphanumeric characters displayed in foveal vision on a black background through Rapid Serial Visual Presentation (RSVP) streams. Capital Latin letters were used as distractors (all but B, G, I, J, O, S, and Z), and Arabic digits (from 1 to 9, excluding 5) as target items. A red 5 was presented only during the practice trials, always at the midpoint position, to ensure participants developed a temporal expectation of the RSVP streams.
All the stimuli were about 2 × 2 cm. Participants were seated approximately 60 cm from the screen (a 17" CRT Samsung computer monitor), so that each stimulus subtended approximately 1° of visual angle. The RSVP task was implemented in E-Prime and run on a Dell PC.

Experimental Protocol
For half of the participants, each RSVP stream (Figure 1) consisted of 25 stimuli (fixed-length condition), whereas the other half was presented with streams of variable length (25, 31, and 37 stimuli; variable-length condition). The variable-length condition was designed to exclude the potential role of the development of automaticity and expectations. Indeed, we expected the same asymmetrical effect around the midpoint whether participants were required to maintain (fixed-length condition) or to update trial by trial (variable-length condition) their representation of the estimated midpoint. Each trial started with a central fixation cross. Participants were asked to press the spacebar to initiate the RSVP stream autonomously. The first stimulus in the stream appeared after 500 ms of a blank screen. Only in the variable-length condition, after the spacebar was pressed, a word (the Italian terms for short, medium, and long) replaced the fixation cross, explicitly cueing the length of the upcoming stream, and remained on the screen for 500 ms (plus an additional 50 ms of a blank screen). Each stimulus of the stream remained on the screen for 82 ms with no interstimulus interval (refresh rate = 85 Hz).
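The reported 82 ms stimulus duration is consistent with a whole number of refresh cycles at 85 Hz. The following is a minimal sketch of that arithmetic, assuming (it is not stated in the text) that each stimulus was locked to an integer number of frames; the function names are illustrative only.

```python
def frame_duration_ms(refresh_hz: float) -> float:
    """Duration of one refresh cycle in milliseconds."""
    return 1000.0 / refresh_hz


def frames_for(target_ms: float, refresh_hz: float) -> int:
    """Whole number of frames closest to the requested duration."""
    return round(target_ms / frame_duration_ms(refresh_hz))


REFRESH_HZ = 85.0   # refresh rate reported in the text
STIMULUS_MS = 82.0  # stimulus duration reported in the text

n_frames = frames_for(STIMULUS_MS, REFRESH_HZ)
actual_ms = n_frames * frame_duration_ms(REFRESH_HZ)
print(n_frames, round(actual_ms, 2))
```

Under this assumption, the nominal duration corresponds to 7 frames, whose exact duration differs from 82 ms by well under one millisecond.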
The experiment started with a practice phase. In this phase, participants were asked to launch and attend the RSVP streams and focus on locating the mid-position of each stream (signalled by the red 5). According to other results [17,18], this kind of training should have been able to create expectations and instantiate top-down attention over the temporal midpoint. Except for the red 5, all the other stimuli in the practice phase were letter distractors. Thirty trials were presented in the fixed-length condition and 48 in the variable-length condition (16 for each stream length).
In the experimental phase, a white letter distractor replaced the red 5 and one different white digit target was pseudo-randomly presented at a distance (Lag) of 1, 3, or 9 positions before or after the midpoint of the stream (always the 13th position in the fixed-length task; the 13th, the 16th, or the 19th position in the variable-length task for the short, medium, and long streams, respectively). Hence, the target could appear in the first or the second half of each stream (Half factor), determining 6 possible lags (−9, −3, and −1 for the first half of the stream; +1, +3, and +9 for the second half of the stream). In this way we aimed at investigating temporal ordering effects between the physical (the digit) and the temporal (the midpoint benchmark) targets, allowing both to act as potential first (T1) or second (T2) targets, during the different phases of the AB (Lag 1 sparing, inside, and outside the AB).
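The mapping from stream length and lag to the serial position of the target can be sketched as follows. This is an illustration of the design described above, not the original E-Prime script; the helper names and the use of signed lags are assumptions made for the example.

```python
ITEM_MS = 82  # stimulus duration reported in the text
LAGS = (-9, -3, -1, 1, 3, 9)  # negative = before the midpoint
STREAM_LENGTHS = {"short": 25, "medium": 31, "long": 37}


def midpoint_position(stream_length: int) -> int:
    """1-based serial position of the middle item of an odd-length stream."""
    if stream_length % 2 == 0:
        raise ValueError("an even-length stream has no single middle item")
    return (stream_length + 1) // 2


def target_position(stream_length: int, lag: int) -> int:
    """1-based serial position of a target presented at a signed lag."""
    return midpoint_position(stream_length) + lag


for name, length in STREAM_LENGTHS.items():
    positions = [target_position(length, lag) for lag in LAGS]
    print(name, midpoint_position(length), positions)

# Onset asynchrony between the midpoint item and the target, in ms
offsets_ms = {lag: lag * ITEM_MS for lag in LAGS}
```

With these values, the midpoints fall at positions 13, 16, and 19, and a target at Lag +3 starts 246 ms after the onset of the midpoint item.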
The lags were chosen to detect the trends associated with a typical AB curve. Five hundred milliseconds after the end of the stream, a question appeared on the screen asking participants to type on the keyboard the digit they had observed (identification task). Participants were informed that the digit 5 was never presented and were instructed to guess if they were not sure of the identity of the digit. Following their response, a second question invited them to judge whether the digit had appeared in the first (left arrow) or the second (right arrow) half of the stream (temporal order judgment task). The total number of trials was 288 in both stream-length conditions, with each "Lag × Half" cell consisting of 48 trials. Short, medium, and long streams were equally presented across conditions in the variable-length task. For both tasks, digit targets were equally presented in each cell, in order to prevent numeracy effects (e.g., [23,24]).
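The trial counts above follow from a fully crossed design: 3 lags × 2 halves × 48 trials = 288. A minimal sketch of how such a balanced list could be generated is given below. The plain shuffle stands in for the pseudo-randomization, whose actual constraints are not described in the text, and `build_trials` is a hypothetical helper.

```python
import itertools
import random
from collections import Counter

LAGS = (1, 3, 9)
HALVES = ("first", "second")
DIGITS = (1, 2, 3, 4, 6, 7, 8, 9)  # 5 is never a target
TRIALS_PER_CELL = 48


def build_trials(stream_lengths, seed=0):
    """Return a shuffled trial list balanced over half, lag, length, and digit."""
    reps, remainder = divmod(TRIALS_PER_CELL, len(DIGITS) * len(stream_lengths))
    if remainder:
        raise ValueError("cell size is not divisible by digits x stream lengths")
    trials = [
        {"half": half, "lag": lag, "length": length, "digit": digit}
        for half, lag, length, digit in itertools.product(
            HALVES, LAGS, stream_lengths, DIGITS
        )
        for _ in range(reps)
    ]
    random.Random(seed).shuffle(trials)  # assumption: unconstrained shuffle
    return trials


fixed = build_trials(stream_lengths=(25,))
variable = build_trials(stream_lengths=(25, 31, 37))
cell_counts = Counter((t["half"], t["lag"]) for t in variable)
print(len(fixed), len(variable), set(cell_counts.values()))
```

In this sketch each digit appears 6 times per cell in the fixed-length list and twice per cell and stream length in the variable-length list, so both lists contain 288 trials.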

Statistical Analyses
In order to assess if the ability to identify the target changed according to its distance from the internally generated temporal benchmark, the proportion of correct target identifications over the total number of trials was normalized through arcsine square root transformation and analyzed in a 2 × 2 × 3 mixed ANOVA as a function of the stream lengths (fixed or variable; between-subjects), the position relative to the midpoint (before and after; within-subjects), and the lag between the midpoint distractor and the target (1, 3, 9; within-subjects).
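For reference, the arcsine square root transformation maps a proportion p to asin(√p). A short sketch follows, assuming the plain form in radians; some authors multiply the result by 2, and the text does not say which variant was used.

```python
import math


def arcsine_sqrt(p: float) -> float:
    """Arcsine square root transform of a proportion in [0, 1] (radians)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))


for p in (0.5, 0.9, 0.99, 1.0):
    print(p, round(arcsine_sqrt(p), 3))
```

The transformation stretches the scale near 0 and 1, which is where proportions such as near-ceiling identification accuracy are most compressed.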
To investigate whether the target identification interferes with the temporal midpoint identification accuracy, the proportion of correct temporal judgments was calculated over the trials in which the target had been correctly identified. Arcsine square root transformations were performed on proportions to normalize data. Data were then analyzed in a 2 × 2 × 3 mixed ANOVA, with stream length (fixed or variable) as a between-subject factor and position relative to the midpoint (before or after) and lag (1, 3, 9) as within-subject factors.
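The conditional accuracy described here can be sketched as follows. The data layout (one pair of booleans per trial) and the function name are assumptions made for the example, and the responses are invented.

```python
def conditional_toj_accuracy(trials):
    """Proportion of correct temporal judgments among correctly identified targets.

    `trials` is an iterable of (identified_correctly, judged_correctly) booleans.
    Returns None when no target was identified, since the proportion is undefined.
    """
    judged = [toj_ok for id_ok, toj_ok in trials if id_ok]
    if not judged:
        return None
    return sum(judged) / len(judged)


# Hypothetical responses for one participant in one Lag x Half cell
example = [(True, True), (True, False), (False, True), (True, True), (False, False)]
print(conditional_toj_accuracy(example))  # 2 of the 3 identified trials are correct
```

Trials with a missed target are excluded from both the numerator and the denominator, so a lucky temporal judgment on an unidentified target does not inflate the score.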

Identification Task
The ANOVA revealed no significant main effect. Performance in the identification task was near-optimal for all the conditions (non-transformed means in Table 1). These results indicate that the internally generated temporal benchmark does not interfere with target identification, regardless of whether the target appears before or after the midpoint.

Temporal Order Judgment Task
The ANOVA showed a significant main effect of Lag (F(2,36) = 258.261, p < 0.0001, partial η² = 0.93), due to higher accuracy at longer than at shorter lags from the midpoint, and no significant main effect of either Stream-Length (F(1,18) = 0.008, p = 0.93) or Half (F(1,18) = 3.263, p = 0.08, partial η² = 0.15). A significant interaction between Half and Lag was also found (F(2,36) = 3.242, p = 0.05, partial η² = 0.15), with Newman-Keuls post-hoc tests showing a significant difference in accuracy between Lag 1 before and after the midpoint (p < 0.01) and between Lag 3 before and after the midpoint (p = 0.02), whereas the difference between Lag 9 before and after the midpoint was not significant (p = 0.96). The differences observed at Lags 1 and 3 were due to a significant reduction in midpoint identification accuracy when the target appeared in the second half compared to the first half of the stream, indicating that target identification interferes with the temporal midpoint identification accuracy (Figure 2).
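The reported effect sizes can be cross-checked against the F ratios using the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The sketch below only re-derives the values from the statistics reported above; it is not a re-analysis of the data.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) as reported in the text
reported = {
    "Lag": (258.261, 2, 36),
    "Half": (3.263, 1, 18),
    "Half x Lag": (3.242, 2, 36),
}
for effect, (f_value, df1, df2) in reported.items():
    print(effect, round(partial_eta_squared(f_value, df1, df2), 2))
```

The three values round to 0.93, 0.15, and 0.15, matching the effect sizes given in the text.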

To test whether participants had effectively lost their ability to temporally segregate the target from the benchmark at Lag 1, we compared their performance with that expected by chance, as if they were guessing the temporal location of the target (50% of before/after responses). Results revealed that proportions in the Lag −1 condition were significantly higher than chance level (t = 5.0009, p < 0.0001), confirming that participants were able to correctly judge the temporal location of the target with respect to the midpoint when the target appeared at Lag −1. In contrast, performance at Lag +1 did not differ from chance level (t = 1.481, p = 0.15), indicating that participants were not able to correctly judge the temporal location of the target when it appeared at Lag +1.
This asymmetrical performance indicates that participants accurately segregated the two events only when the target was presented in the first half of the stream (non-transformed means in Table 2). Overall, these results indicate that (a) the internally generated temporal benchmark does not interfere with the identification of a subsequent physical target stimulus, but (b) the two events cannot be accurately segregated when the physical target immediately follows the internally generated temporal event.
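A comparison against chance of this kind is typically a one-sample t-test with a reference value of 0.5. The sketch below shows the computation on invented per-participant proportions. Whether the original test was run on raw or arcsine-transformed proportions is not stated, so raw proportions are assumed here (on the transformed scale, chance would correspond to asin(√0.5) = π/4).

```python
import math
import statistics


def one_sample_t(values, mu):
    """Return (t, degrees of freedom) for a one-sample t-test against mu."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("zero variance: t is undefined")
    return (mean - mu) / (sd / math.sqrt(n)), n - 1


CHANCE = 0.5  # guessing between "first half" and "second half"

# Made-up per-participant proportions, for illustration only
hypothetical_lag_minus_1 = [0.70, 0.65, 0.80, 0.55, 0.75, 0.60]
t_value, df = one_sample_t(hypothetical_lag_minus_1, CHANCE)
print(round(t_value, 2), df)
```

The p value would then be read from the t distribution with n − 1 degrees of freedom.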

Discussion
The present study aimed to investigate whether the typical pattern of AB results due to the reciprocal influence between the processing of two rapidly presented target stimuli can be found even when one of these two targets does not consist of a physical event, but rather a temporal event. To this purpose, we manipulated the relative position of these two targets, the internal benchmark and the physical target, and assessed the effects of the temporal distance and the order of presentation of the two events on both target identification and temporal judgment.
Firstly, in line with the single-target condition of the AB tasks (for a review, see [25]), we found a near-optimal performance at the identification task, indicating that the task was cognitively effortless.
Secondly and more importantly, we found that the performance in the temporal judgment task was clearly lag-dependent, with the short lags being progressively less accurate compared to the long lags, but in a way that is not entirely predicted by the uncertainty of temporal estimation. Indeed, if no interference occurred between the two events, we should have observed a symmetrical performance, with the point of maximal uncertainty (chance performance) being equally distributed immediately before and immediately after the midpoint (Lag −1 and Lag +1) [26][27][28]. Instead, the results obtained point to a disruption of the temporal segregation of the two events when the physical target immediately follows the internally generated temporal event, indicating that the episodic distinctiveness between the two events is lost.
According to the eSTST model [6], the representations of two or more target stimuli presented in rapid succession with no intervening distractors are embedded into a single attentional episode. Despite the fact that information regarding their identity is usually spared, the perception of their temporal order is impaired. In our results, the impairment in temporal judgment of the targets presented in the second half of the stream is due to the fact that the temporal benchmark set by participants was able to open an attentional episode, which included the post-benchmark target (Lag +1), causing the ordering information between the two to be lost. Indeed, the order encoded in working memory is dependent upon target strength and regulated by means of a competitive mechanism during the consolidation stage. This serial visual encoding (and the whole AB phenomenon) has the function of temporally segregating episodic representations stored in working memory. When a gap is presented in between two or more task-relevant items, inhibitory mechanisms occur to isolate the first attentional episode, preventing new information from entering the already busy consolidation stage. Instead, the human attentional system allows an uninterrupted sequence of relevant stimuli to be entirely encoded in parallel (allowing for the "sparing" to occur) thanks to excitatory mechanisms, but does not guarantee their episodic distinctiveness, provoking order reversals, integrations, repetition blindness, and decreased T1 identification. Thus, the results of the present study provide preliminary evidence that such a detriment in episodic distinctiveness can also be generated by attentional engagement with a task-relevant internal event, consisting of a purely temporal expectation and lacking physical features.
One may wonder why the same detrimental effect observed in the second half of the stream is not also obtained when the target precedes the internal benchmark. A long research tradition has extensively highlighted that the competitive race between targets is guided by prior entry mechanisms [29][30][31]. The stronger the target, the earlier its entry into working memory. The boost and bounce model [32] suggests that the onset of T1 enhances the identification of T2, resulting in order reversals and reduced accuracy of T1. However, in our case, the appearance of the target is not succeeded by a second target sharing physical features with the first one. Assuming that prior entry is a phenomenon governed by bottom-up saliency-driven mechanisms, the attentional enhancement generated by the identification of T1 has no T2 to "boost". Instead, the post-target position is always occupied by a distractor (a letter), exacerbating the "bounce" and ensuring that the target can be correctly located in time (at least better than chance). Since we also found an asymmetry at Lag 3 (246 ms later), which is separated from the midpoint by non-relevant stimuli, it would be possible to argue that the internal benchmark is not strong enough to trigger subsequent inhibition, allowing the attentional episode to spread until Lag 3.
A potential explanation of the interference observed between an internally generated event and a target stimulus may rely upon the idea that the temporal benchmark identification is a hypothesis-testing process. In this view, the temporal judgment is a continuous process starting when a tentative midpoint identification hypothesis is formulated (at the time the benchmark is set) and ending at the end of the stream, when the initial hypothesis is confirmed or adjusted according to the actual duration of the stream. If the target appears during this ongoing process (i.e., after the midpoint), the temporal judgment is impaired (for a review, see [33]).
One alternative explanation regards the possibility that the effects observed are due to time perception mechanisms. Cognitive models of time estimation (see [34]) suggest that subjective duration of time is due to the monitoring, by a "switch", of the accumulation of pulses emitted by a pacemaker. Attentional resources support the functioning of this switch. However, when attention is distracted from the passage of time, the "switch" is disrupted, and "time flies". In the present paradigm, participants might have been engaged in the identification task first, distracting themselves from the passage of time. Once the target appeared, they might have switched to the temporal task, restoring the time-keeping mechanism. As a result, the perceived duration between the target and the end of the stream would have been longer than its actual duration, causing the subjective midpoint to be shifted toward the end of the stream. However, we found that asymmetries also exist at lags longer than Lag 1 (i.e., Lag 3). Whereas it is conceivable that temporal discrimination at Lag 1 could be hard, it is less plausible to postulate shifts of the subjective midpoint of more than 240 ms. Moreover, if participants had not engaged in the midpoint estimation, we would not have observed the "V shape" in our results (Figure 2). Such a pattern suggests that uncertainty about the correct temporal order of the benchmark and the target increases as the lag decreases, promoting near-chance performance at short lags, as the literature on temporal order judgment would predict (e.g., [27,28,31]). Nevertheless, a more direct measure of the temporal estimation accuracy would be useful in future studies to exclude or determine its role in the pattern observed.
Alternatively, the preference for the first-half responses could be the result of a more general biased estimation, typically described in spatial paradigms. For instance, one study [35] observed that the identification of a first, centrally presented target (T1) impairs a successive temporal judgment about which of two lateralized stimuli appeared earlier. The authors found this impairment when the targets to be judged (T2) were presented at a short SOA after T1, but not when participants were required to ignore T1, compared to a long SOA condition, consistent with a resource depletion hypothesis. Individuals involved in a temporal judgment between two spatially presented targets exhibited a rightward shift when a non-spatial identification task preceded the judgment task by 280 ms (AB). Instead, at a longer lag and in no-T1 trials (low cognitive load), they found a leftward bias, resembling the left-hemifield advantage observed both in dual-stream RSVP studies [36] and in the pseudo-neglect literature [37], which mainly uses line bisection tasks. Some studies have observed the pseudo-neglect effect also in the absence of visuospatial processing, a phenomenon known as representational pseudo-neglect [38]. Given the close relation between the representation of space and time, and magnitudes in general (ATOM, [39]), participants might have recalled a spatial representation of the RSVP temporal stream. Consistent with the representational pseudo-neglect phenomenon, they could have exhibited a left hemifield advantage, with higher accuracy when the target appeared in the first compared to the second half. However, such an interpretation requires more assumptions than a simple episodic account does, as it requires assuming that (a) the representation of the stream is spatially recalled; (b) there is a left hemifield advantage; and (c) this left hemifield advantage only affects the physical target stimulus but does not affect the midpoint identification. Based on these considerations, we would exclude such a representational pseudo-neglect hypothesis.
In summary, the results of the present study provide preliminary evidence for the capacity of an internally generated temporal benchmark to asymmetrically interfere with temporal information of a subsequent target stimulus, in a way which has been traditionally found between two or more physically present items. Despite the fact that further research is needed, the present preliminary findings suggest that during rapidly presented visual stimulation, the temporal segregation between two events is disrupted even when one of the two events is internally generated.
Author Contributions: Conceptualization, P.Z., F.F. and S.S.; methodology, software, and formal analysis, P.Z.; writing-original draft preparation, P.Z.; interpretation and discussion of the results, P.Z., F.F. and S.S. All authors have read and agreed to the published version of the manuscript.