Listening for Imagery by Native Speakers and L2 Learners

Slobin’s thinking-for-speaking (TFS) hypothesis suggests that speakers are habitually attuned to aspects of an event that are readily codable in the language while they are formulating speech. This TFS process varies considerably cross-linguistically and can be observed in all forms of production and reception, including listening for understanding or mental imagery. This study explored whether second language (L2) learners engage in mental simulation of deictic paths while processing motion language online. Forty Chinese native speakers (NSs) and eighty English-speaking learners of L2 Chinese participated in an online judgment task. They listened to motion sentences containing deictic paths while simultaneously watching a motion display of a toward- or away-direction. Since simultaneous presentation of a sentence and a display of the same directionality requires the same neural structures to process competing inputs, interference effects are expected and reaction times should be longer. Results of a repeated measures ANOVA show interference effects for the NSs, but not for the L2 learners of either heritage or foreign language background, suggesting that while the NSs were sensitive to the deictic cues and automatically performed mental simulations of the deictic paths, the L2 learners’ listening for imagery did not pattern with the NSs. The results add to our understanding of L2 learners’ development of TFS in the new modality of listening for imagery.


Introduction
The extent to which language influences thinking has been a focus of study for decades in the fields of anthropology, linguistics and first language (L1) acquisition, psychology, and recently in second and bilingual acquisition research. Focusing on the influence of language on online thinking, Slobin's thinking-for-speaking hypothesis (TFS) [1][2][3] has provided new insights into this topic and attracted substantial research that explores issues pertaining to conceptual changes in language learning (e.g., [4][5][6][7][8]). This hypothesis suggests that speakers are habitually attuned to aspects of an event that are readily codable in the language while they are formulating speech. This TFS process varies cross-linguistically and can be observed in all forms of production and reception such as thinking for speaking/writing, listening/reading for understanding, thinking for translating, or listening for imaging [2,9]. Although Slobin's TFS framework underscores the online effects of language on thought processes, there have been very limited experimental methods employed to study the real time processing of TFS, especially for second language (L2) learners. It remains unclear whether or how well L2 learners can acquire the thought patterns of the L2 while processing the language online. This study filled the gap by adopting a simulation-based approach [10,11] to examine patterns of mental simulation produced by L1 speakers and L2 learners in real-time processing of motion language containing deictic paths.

Listening for Imagery by Native Speakers
Other methods that have been used to study processing of motion language are behavioral experiments designed to study mental imagery or simulation as well as the activation of neural patterns corresponding to perceptual or motor experiences, which are grounded in a variety of experiential domains, including cognitive, physiological, biological, and cultural (cf. [10]). Mental simulation or imagery as a basic form of cognition plays a crucial role in many thought processes such as communication, navigation, memory, and problem solving [16]. Simulation-based theories of language processing claim that understanding language involves the automatic and unconscious running of mental simulation related to the content of the utterance [10,17,18,19,20,21,22,23]. Studies using neuroimaging technology, such as Positron Emission Tomography (PET) and functional Magnetic Resonance Imaging (fMRI), have provided convergent evidence concerning the localization of simulation (e.g., [24][25][26]). In these studies, it was observed that mental simulation for understanding an utterance uses the same neural circuitry as that activated when one actually perceives or performs the action named in the utterance. That is, when a language learner runs a simulation in understanding a motion sentence such as Throw the ball to me, the neural motor structures responsible for the action of throwing a ball are activated as well.
Two kinds of behavioral predictions are commonly made in simulation-based approaches: compatibility effects and interference effects [10]. The distinction between these two effects lies in the timing of the presentation of stimuli. Compatibility effects are expected when the presentation of the sentence and a corresponding image/action do not overlap temporally, and the presentation of the sentence appears before the image or action. In such a design, the simulation that occurs while processing the preceding sentence facilitates the response in the following task if the image or action matches the sentence in terms of orientation, shape, type of action, etc. Such a facilitation effect occurs because the neural regions responsible for comprehending the content of the sentence are activated, which primes the activation of the same neural regions for the subsequent compatible task. Zwaan et al. [23] used an experimental design to elicit compatibility effects. Participants first heard a sentence describing the motion of a ball toward or away from the listener (e.g., toward condition: The shortstop hurled the softball at you; away condition: You hurled the softball at the shortstop). They then saw pictures of objects presented in a smaller-bigger sequence (suggesting movement toward) or a bigger-smaller sequence (suggesting movement away) and were asked to decide whether the two pictures displayed the same object. A compatibility effect was observed. Participants responded more quickly to visual stimuli that matched the direction of movement described in the sentence they had heard, showing that the participants constructed a visual simulation during sentence comprehension.
The second kind of behavioral prediction is that of an interference effect. When the presentation of the sentence and the image/action overlap temporally, or the presentation of the sentence appears after the image or action, interference effects between the two tasks can be expected. In contrast to compatibility effects, the simultaneous presentation of the sentence and the image/action requires the same neural structures to process multiple tasks simultaneously, and thus processing is slowed for the two tasks. Kaschak et al. [11] is an example of a method that was designed to elicit interference effects. Participants were asked to decide whether the sentence they heard was meaningful or not while simultaneously watching a visual presentation of motion. The critical sentences described events involving movement in one of the four directions (up, down, toward, or away), such as The car approached you (toward condition) or The car left you in the dust (away condition), and the visual presentation also depicted one of the four directions. It was found that participants took longer to decide on the meaningfulness of the sentence when the visual stimulus and the sentence both involved motion in the same direction, because the same cognitive structure was required to process two competing tasks.
Simulations evoked by linguistic input can reflect universally-shared human experiences, as well as language-specific differences in patterns of imagery [10,27]. Such differences in the construction of mental simulations can be illustrated by the following two scenarios, involving boiling water and chewing betel nuts. Most people are familiar with the perceptual experience of seeing and hearing boiling water. When hearing the sentence The water on the stove is boiling, people unconsciously run a simulation depicting the heat, vapor, or bubbles rising from the bottom of the pot. However, some experiences can be specific to languages, cultures, and individuals. For instance, upon hearing the sentence Chewing betel nuts brought him a burning sensation and made his teeth and mouth bloody red, the mental representation evoked by this linguistic input can vary profoundly, depending on whether one has had the experience of chewing betel nuts or has seen other people do so, or whether one has other factual knowledge of betel nuts. For example, a mouth cancer expert may have a simulation with more vivid details concerning the medical effects of betel nut chewing, which would be unlikely to take place for people who lack such knowledge or relevant experience. Given that mental simulations are specific to languages and experiences, the typologically different TFS patterns observed between speakers of S-languages and V-languages are likely to be implicated in the existence of different simulation patterns between such two types of speakers.

Listening for Imagery by L2 Learners
Few attempts have been made to investigate whether and how language-driven mental imagery develops in L2 learning. Following the method used in Bergen et al. [19] with English NSs, Wheeler and Stojanovic [28] used an image-verb forced-choice matching task to test whether proficient L2 English learners show the same interference effects as L1 learners. They first showed an image depicting an action and then a verb describing an action (e.g., grab, kick, or lick). The image and verb used either the same effector (e.g., grab and push both use the hand) or different effectors (e.g., grab uses the hand; lick uses the mouth). The participants' task was to determine as quickly as possible whether or not the image and the verb depicted the same action. In the non-matching conditions, participants took longer to decide when the image and verb used the same effector than when they did not.
When processing the image and verb that used the same effectors (e.g., image of jump followed by verb kick), it required co-activation of the same neural representations for processing the image and the verb, which resulted in a longer response time for two competing inputs involving the same areas of motor cortex. By contrast, when the image and verb depicted actions using different effectors, there was less similarity in the neural representations and thus made the simultaneous processing easier (cf. [29,30]). The 40 L2 participants in Wheeler and Stojanovic [28] were mostly international students studying in a public university in the United States and their mean length of English study was 14 years. Although the response times overall were longer than the NSs in Bergen et al. [19], the results suggested that proficient L2 learners can perform automatic mental simulation while comprehending motion language in a way similar to NSs.
In another study on simulation with English prepositions, Shoen [31] examined patterns of mental simulation in L1 and L2 English learners. The 36 L1 participants and 35 L2 participants first heard a sentence containing a preposition (e.g., the man went up the mountain) and then saw animated geometrical shapes representing the semantic content of the preposition with a moving object and a fixed landmark. The visual display either matched or mismatched the preposition heard in the sentence (i.e., match condition: upward display vs. mismatch condition: downward display). The participants' task was to answer as fast as possible whether the moving object they saw was a square or a circle. The results suggested the presence of compatibility effects in the matching conditions, but the effect was not statistically significant for the L1 group. The L2 group, by contrast, did not pattern with the L1 group, and no statistically significant result was found.
While there have been a substantial number of studies on the role of mental simulation in sentence processing by L1 learners, little research has been conducted to explore L2 learners' online performance in this regard. Focusing on L1 and L2 TFS, the present study aims to explore whether or how well L2 learners of heritage and foreign language backgrounds engage in mental simulation of deictic paths while processing motion language online.

Chinese Deictic Paths
According to Talmy's motion event typology [12,13,32], the Chinese language typically uses path satellites to denote path information, which places it in the class of satellite-framed languages. Slobin [14,33] reanalyzed such constituents as full path verbs with an equal status as the preceding manner verb, and treated the manner verb and the path verb together as a serial verb construction. He proposed that serial-verb languages such as Chinese and Thai should be classified into a third type, equipollently-framed languages. Whether or not Chinese should be considered an equipollently-framed language has been a topic of heated discussion, with no general consensus reached thus far (see discussions of different stances in [32,34,35,36,37,38,39]). Nevertheless, it is generally agreed that Chinese encodes deictic paths more frequently than English. Chen [35] (p. 53) reported that 55% of the motion event descriptions found in 59 Chinese frog stories encoded deictic paths. Wu [40] examined 240 motion events that involved movement of an object produced by 40 Chinese speakers and found that 93% encoded deictic paths. Slobin [14] also commented that space deixis seems to be more closely tied to conceptions of path for Chinese speakers, as compared to the other groups of speakers. In addition to a path particle such as into, out, or down that indicates the direction of a movement, Chinese speakers frequently attach a deictic hither/thither path from the perspective of the speaker: lái 'moving toward the speaker' or qù 'moving away from the speaker'. The deictic path informs the interlocutors of the relative location where one is standing and where the moving figure is heading. As shown in example (1a), the hither path lái denotes that the agent Little Wang is walking into a space and moving toward the speaker.
Substituting the hither path lái with the thither path qù would completely change the relative location of the two, where Little Wang is walking into another space further away from the speaker. If the deictic cue is overlooked or misinterpreted, the interlocutors would not be able to quickly construct the correct relative location between oneself and the moving figure. Such use of space deixis is habitually encoded when describing movement of objects as well. For instance, the thither path qù in (1b) tells that the movement of the inanimate object shū 'book' to the office is in a direction away from the interlocutors, suggesting that they are in a location other than the office. Note that the use of deictic paths is mostly a matter of choice, but there are occasions when it becomes a matter of grammaticality. Specifically, when the sentence does not have a location or object noun (e.g., removal of 'office' in example (1b)), it becomes necessary to encode the deictic path for the utterance to be grammatical (cf. [7]).
Properly uttering deictic expressions involves several simultaneous dimensions of processing. Interlocutors need to resort to perception, proximity, and ongoing interaction in the context of utterance [41,42]. For two interlocutors to communicate effectively, they need to share not only the same grammar, but also the same pragmatically appropriate ways of orienting themselves verbally and perceptually in context. The encoding of deictic paths presents language-specific TFS patterns, in which the deictic path needs to be frequently attended to in different social situations. It has been noted that this conceptual restructuring presents a considerable challenge for L2 learners whose L1 does not require such encodings of deictic relations. The deictic paths were often misused or ungrammatically omitted by English-speaking learners of L2 Chinese [7,40]. Wu [7,40] also reported that heritage language learners (HLLs) performed better than foreign language learners (FLLs) of the same proficiency level in supplying deictic paths, potentially because HLLs have more opportunities to use deictic paths in different social contexts in real-life situations. It is therefore important to examine whether there is a difference in processing deictic paths between these two types of learners.
As suggested by Slobin [1], L2 learners may require a relatively long period of time to restructure their L1 TFS in order to be able to express motion events fluently in the L2. To explore the underlying cognitive processes that may not be obvious from speech, it would be desirable to apply methods developed in simulation-based research to explore the online processing of deictic paths in an L2. If L2 learners can process Chinese deictic paths in a way similar to L1 learners, which would require them to show similar simulation patterns as the L1 learners, this would suggest that they have successfully internalized the L2 TFS patterns.

Research Design
The present research partially replicated the design in Kaschak et al. [11] with L1 speakers and included both L1 and L2 participants to explore the "listening for imaging" phenomenon with online processing of the hither/thither paths. Specifically, it investigated: (a) the extent to which L1 and L2 participants run mental simulations when comprehending motion sentences that contain the Chinese deictic path lái 'moving toward the speaker' or qù 'moving away from the speaker'; and (b) how such simulations might relate to participants' language backgrounds or Chinese proficiency levels. The study was approved by the Institutional Review Board at a public university in the United States. All participants provided informed consent to participate in the study prior to the data collection.

Subjects
Forty L1 speakers and 80 L2 learners of Chinese participated in the study. The L1 participants were international students or their spouses from China or Taiwan. The L2 participants were heritage or foreign language learners enrolled in an intermediate-level Chinese language course or higher at the time the study was conducted. To examine whether learner background plays a role in conceptual changes in processing deictic paths, the L2 participants completed a language background questionnaire to determine their status as either HLLs or FLLs. An L2 participant was classified as an HLL if he or she (1) identified Mandarin Chinese or another Chinese dialect as his or her strongest language before the age of five; or (2) had one or both parents with Mandarin Chinese or another Chinese dialect as their native or dominant language and also reported exposure to the language at home.
The L2 participants also completed a Chinese elicited imitation test (EIT) [43] to assess their L2 Chinese proficiency. The EIT was a Chinese version of the EIT developed by Ortega et al. [44], in which they tested parallel EITs in English, German, Spanish, as well as Japanese and confirmed that the EITs offered a good indication of learners' global L2 proficiency. The EIT was composed of 30 sentences ranging from 7 to 19 syllables in length and contained different vocabulary and grammatical patterns reflecting a varying degree of difficulty. The participants were required to repeat the sentence they heard as exactly as possible and their repetition was recorded and graded based on a 5-point scoring rubric for each sentence. No response or the repetition of only one word was given a score of 0, repetition of half of the sentence or less was given a score of 1, repetition that showed a change in content or form affecting the meaning was given a 2, accurate repetition of content with some changes to form was given a score of 3, and a perfect repetition was given a score of 4 (final full score = 120). A score of 50 was adopted as a cut point to divide the participants into a low or high proficiency group. Table 1 summarizes the L2 groups classified by language background (HLLs vs. FLLs) and proficiency level (Low vs. High) based on results of the EIT scores.
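The scoring scheme described above can be illustrated with a minimal sketch. The function names are hypothetical, and the treatment of a total exactly equal to the cut point is an assumption, since the text does not state on which side of the boundary a score of 50 falls.

```python
from typing import Sequence

N_ITEMS = 30         # number of EIT sentences reported in the text
MAX_ITEM_SCORE = 4   # rubric scores run from 0 to 4 per sentence
CUT_POINT = 50       # cut point reported in the text


def eit_total(item_scores: Sequence[int]) -> int:
    """Sum the per-sentence rubric scores (maximum 30 * 4 = 120)."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores")
    if any(s < 0 or s > MAX_ITEM_SCORE for s in item_scores):
        raise ValueError("each item score must be between 0 and 4")
    return sum(item_scores)


def proficiency_group(total: int) -> str:
    """Assign a proficiency group from the total score.

    Assumption: a total exactly at the cut point counts as "High".
    """
    return "High" if total >= CUT_POINT else "Low"
```

For example, a learner who repeats every sentence perfectly receives the full score of 120 and is placed in the high-proficiency group.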

Materials and Procedure
An online judgment task was developed to examine interference effects that arise when simultaneously processing linguistic stimuli and perceiving visual stimuli. Participants listened to Chinese motion sentences containing a deictic path lái or qù that could easily be interpreted as either toward or away from themselves (see examples (2a) and (2b)) while simultaneously watching a black-and-white motion display.

Examples (2a) and (2b) form one of the 16 pairs of critical sentences used in the experiment. In each pair, the two sentences differed only in the use of the deictic path lái or qù, which resulted in a contrast in directionality for moving either toward or away. All of the critical sentences were controlled so as to be between 6 and 9 syllables in length and were composed of vocabulary that was judged appropriate for the L2 learners' proficiency levels. A complete list of the critical toward- and away-sentences is provided in Appendix A.
The motion display (see Figure 1 for a static screenshot) was intended to convey a sense of objects moving either toward or away from the participants. The toward-display was constructed by presenting an image of a sky filled with shooting stars moving from the center to the boundaries, and the away-display was created by reversing the timeline of the presentation, which resulted in the image of stars moving from the boundaries to the center. Each motion display was presented using a resolution of 450 × 338 pixels at a rate of 15 frames per second. A match condition was produced when the sentence described motion in the same direction as the motion depicted in the visual display. In contrast, a mismatch condition was generated when the sentence described motion in a direction opposite to that depicted in the display. The participants' task was to decide whether or not the sentence they heard made sense. They pressed a key labeled "Y" if the sentence made sense, or pressed a key labeled "N" if the sentence did not make sense. They were told that their responses would be timed and they should respond as quickly as possible while still maintaining accuracy.
The prediction was that participants would take longer to decide whether the sentence made sense in a match condition than in a mismatch condition. An interference effect was expected because the simultaneous presentation of a display and a sentence of the same directionality required the same neural structures to process the competing inputs, leading to interference between the two tasks (cf. [10,11]). If the L2 learner participants were sensitive to the deictic cues and were able to simulate the toward- and away-imagery, they would show the same patterns of reaction times as the Chinese NSs in their online L2 listening-for-imagery performance.
In order to disguise the critical sentences, 24 filler sentences, including 12 sensible sentences and 12 non-sensible sentences (see examples (3a) and (3b)), were included in the experiment. The filler sentences were similar to the critical sentences in both length and level of difficulty for vocabulary.

Four lists were generated, with lists 1 and 2 presenting the toward-display and lists 3 and 4 presenting the away-display. The 16 pairs of critical sentences and fillers were counterbalanced across lists. Participants were randomly assigned to one of the lists.

Procedure
The experiment was conducted on a PC laptop computer using SuperLab 4.5 Stimulus Presentation Software (Cedrus Corporation: Phoenix, AZ, USA). The experiment included four blocks, each containing 10 sentences: 4 critical sentences (2 toward- and 2 away-sentences) and 6 fillers (3 sensible and 3 non-sensible sentences). The sentences were presented via headphones while the toward- or away-display was shown simultaneously. Participants were instructed to keep their eyes on the screen. A green fixation cross appeared before each block started, and a red fixation cross appeared after the block ended. A screen asking if the participant was ready for the next block appeared between the blocks. The participant then pressed any key to continue with the next block. During each block, the 10 sentences were presented with a 4-second interval between sentence onsets for the L1 participants and a 7-second interval for the L2 participants. The L2 participants were given more time because L2 sentence processing can be expected to take longer than L1 processing. The length of the adjusted interval was determined through pilot tests conducted on both low- and high-proficiency L2 learners. A practice trial containing 5 sentences was given prior to the experimental blocks. The participants had to respond correctly to all five practice sentences before they were allowed to proceed to the experimental blocks. Reaction times were measured and compared between the matching and non-matching conditions and among different groups, and accuracy for each sentence was also recorded. The L1 participants spent about 5-7 min completing the experiment, and the L2 participants spent about 8-10 min.

Patterns of Simulation by L1 Participants
The reaction times were calibrated for analysis. A calibrated value of 0 ms was generated when a response occurred at the end of the sentence. A negative reaction time was generated when a response was made before the sentence ended, and a positive value when a response was made after the sentence ended. The following procedures were adopted to eliminate outliers that could confound the results. First, participants were checked for an accuracy rate lower than 80% or a mean reaction time more than 2.5 standard deviations from the mean reaction time of all participants. No participants were excluded on these criteria. Next, incorrect responses were excluded. Responses that were more than 2.5 standard deviations from the mean for each participant were also removed. These two procedures led to the exclusion of less than 5% of the data. The average accuracy rate for all responses was 95.93%.
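The calibration and trial-level trimming steps can be sketched as follows. This is an illustration only, not the authors' analysis script: the function names and the trial representation are hypothetical, and two details are assumptions because the text does not specify them, namely that the per-participant mean and standard deviation are computed over correct responses only and that the sample standard deviation is used.

```python
from statistics import mean, stdev


def calibrate(response_ms: float, sentence_end_ms: float) -> float:
    """Reaction time relative to sentence offset.

    0 means the response coincided with the end of the sentence,
    negative values are responses made before the sentence ended.
    """
    return response_ms - sentence_end_ms


def accuracy(trials: list[dict]) -> float:
    """Proportion of correct responses for one participant."""
    return sum(t["correct"] for t in trials) / len(trials)


def trim_participant(trials: list[dict], sd_cutoff: float = 2.5) -> list[dict]:
    """Drop incorrect responses, then responses beyond the SD cutoff.

    Each trial is a dict with keys "rt" (calibrated ms) and "correct" (bool).
    Assumption: mean and sample SD are computed over correct trials only.
    """
    correct = [t for t in trials if t["correct"]]
    rts = [t["rt"] for t in correct]
    if len(rts) < 2:
        return correct  # SD is undefined for fewer than two values
    m, sd = mean(rts), stdev(rts)
    return [t for t in correct if abs(t["rt"] - m) <= sd_cutoff * sd]
```

A participant whose `accuracy` value falls below 0.80 would be excluded before trimming, following the 80% threshold stated above.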
The remaining response times were submitted to a 2 (Sentence Direction: Toward vs. Away) × 2 (Compatibility: Match vs. Mismatch) repeated measures ANOVA. Figure 2 presents the mean reaction times for each condition. The effect for sentence direction was not significant, F(1, 39) = 3.771, p = 0.059, partial η² = 0.088, suggesting the toward- and away-sentences were generally processed at the same speed. Responses were significantly faster when the direction implied by the visual display did not match that of the sentence than when there was a match, F(1, 39) = 6.160, p = 0.017, partial η² = 0.136. There was no significant interaction between sentence direction and compatibility, F(1, 39) = 0.392, p = 0.535, partial η² = 0.010. These results are in agreement with Kaschak et al. [11], supporting the predicted interference effects. When the L1 participants simultaneously processed the visual display and the motion sentence, they took longer to decide whether the sentence made sense when the visual display and the sentence depicted motion of the same direction. This suggested that language comprehension requires the same general mechanisms that are used in perception of motion. The mental simulation activated by deictic cues is thus sensitive to the directional axis of toward versus away. The NSs' automatic online construction of mental simulations for the deictic cues shows that they are tuned to the cue of deictic paths in their L1 listening-for-imagery.
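As a consistency check on the reported effect sizes, partial eta squared can be recovered from an F statistic and its degrees of freedom through the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies that identity; the function name is hypothetical, and the code is not part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared derived from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values reported for the L1 group, each with df = (1, 39)
reported = {
    "sentence direction": 3.771,
    "compatibility": 6.160,
    "interaction": 0.392,
}
for effect, f_value in reported.items():
    print(effect, round(partial_eta_squared(f_value, 1, 39), 3))
```

Rounded to three decimals, the three values come out as 0.088, 0.136, and 0.010, matching the effect sizes reported above.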

Patterns of Simulation by L2 Participants
The L2 learners' responses were screened following the same outlier analysis procedures as were applied to the L1 data. The results show that a portion of the L2 participants in each group had an accuracy rate below the 80% threshold. Those learners' data were removed, because their responses suggested that other confounding factors beyond the focus of this study had influenced their performance. Table 2 summarizes the results of the accuracy analysis. As can be seen in Table 2, only two HLLs and four FLLs from the low-proficiency groups had an accuracy rate above 80%, and the mean accuracy rate for each of these groups fell below the 80% threshold. Even though the learner participants were given a longer interval of 7 s to determine whether the sentence made sense, it appeared that the majority of the low-proficiency learners were unable to succeed in this task. This suggests either that the low-proficiency learners needed a longer time to process the meanings of the sentences or that the task required more advanced L2 listening skills than their current levels. On the other hand, the high-proficiency learners were more successful in this task. The mean accuracy rates for both HLLs and FLLs met the 80% threshold. However, a portion of the participants were still excluded because they were below the threshold. The remaining participants from the high-proficiency groups were 15 HLLs and nine FLLs.
Because the numbers of participants remaining in the low-proficiency groups were not sufficient for an inferential analysis, the remaining participants from the two proficiency levels were merged into just two groups, namely, HLLs and FLLs. The new HLL group comprised 17 participants and the new FLL group had 13. Among these participants, no one was excluded for having a mean reaction time more than 2.5 standard deviations from the group mean. The remaining responses for HLLs and FLLs were each submitted to a separate 2 (Sentence Direction: Toward vs. Away) × 2 (Compatibility: Match vs. Mismatch) repeated measures ANOVA to examine whether the interference effects observed in the L1 data could also be found.
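The reaction-time screening and the 2 × 2 repeated measures ANOVA can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: all names and data are hypothetical, and the input is assumed to be one mean reaction time per participant per cell. It relies on the fact that in a fully within-subject 2 × 2 design each effect has one degree of freedom, so its F(1, n − 1) equals the squared one-sample t statistic computed on per-participant contrast scores.

```python
from statistics import mean, stdev


def exclude_rt_outliers(mean_rts, cutoff=2.5):
    """Keep participants whose mean RT is within `cutoff` SDs of the group mean."""
    values = list(mean_rts.values())
    m, s = mean(values), stdev(values)
    return {pid: rt for pid, rt in mean_rts.items() if abs(rt - m) <= cutoff * s}


def f_from_contrast(scores):
    """F(1, n-1) for a 1-df within-subject effect: squared one-sample t."""
    n = len(scores)
    return n * mean(scores) ** 2 / stdev(scores) ** 2


def rm_anova_2x2(cells):
    """cells maps a participant ID to mean RTs in the order
    (toward_match, toward_mismatch, away_match, away_mismatch)."""
    direction, compatibility, interaction = [], [], []
    for tm, tx, am, ax in cells.values():
        direction.append((tm + tx) - (am + ax))      # Toward vs. Away
        compatibility.append((tm + am) - (tx + ax))  # Match vs. Mismatch
        interaction.append((tm - tx) - (am - ax))    # Direction x Compatibility
    return {
        "direction": f_from_contrast(direction),
        "compatibility": f_from_contrast(compatibility),
        "interaction": f_from_contrast(interaction),
    }


# Hypothetical mean RTs in ms; the tenth participant is an invented outlier.
rts = [1500, 1510, 1490, 1520, 1480, 1505, 1495, 1515, 1485, 5000]
mean_rts = {f"P{i:02d}": rt for i, rt in enumerate(rts, start=1)}
kept = exclude_rt_outliers(mean_rts)

# Hypothetical cell means (ms) for four invented participants.
cells = {
    "P01": (1510, 1450, 1530, 1480),
    "P02": (1620, 1600, 1650, 1570),
    "P03": (1490, 1440, 1500, 1470),
    "P04": (1700, 1640, 1690, 1650),
}
f_values = rm_anova_2x2(cells)
for effect, f in f_values.items():
    print(f"{effect}: F(1, {len(cells) - 1}) = {f:.2f}")
```

In this framing, an interference effect would appear as a reliable Compatibility effect with longer reaction times in the Match cells. The sketch returns only F values; obtaining p-values would require the F distribution, for example from a statistics package.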

Discussion and Conclusions
Similar to findings reported in Wheeler and Stojanovic [28] and Shoen [31], comparisons of the mean reaction times between the L1 and L2 groups showed that the L2 learners took longer to respond than did the NSs. However, neither the HLLs nor the FLLs showed the interference effects observed in the L1 data. This suggests that, while these L2 learners were capable of judging whether the Chinese sentences were sensible, processing the deictic paths lái or qù by listening did not trigger mental imagery directed toward or away that could interfere with the simultaneous processing of the visual presentations, as it did for the L1 group. Although the majority of the participants were from the high-proficiency groups, the L2 learners' online processes of listening-for-imagery seemed to indicate that they had not yet fully internalized the L2-specific deictic paths in their L2 TFS, even with advanced proficiency. While L2 learners were able to produce motion language following L2 TFS patterns in the mode of writing or speaking as reported in previous studies (e.g., [5,7,8]), scrutiny of the online automatic processing of motion language in the mode of "listening for imagery" may reveal a different story. On the other hand, the results of this study were in agreement with findings reported in Yoshioka and Kellerman [45], Choi and Lantolf [46], and Stam [47], which examined the co-speech gestures produced by L2 learners and found no sign of switching to L2 TFS (cf. Gullberg [48] for discussion on gesture and bilingualism). Given that Slobin's TFS hypothesis has highlighted cross-linguistic differences in online processes for language production and comprehension, it is crucial to examine the underlying cognitive mechanisms L2 learners adopt in real-time language processing.
The nonverbal and online methods adopted in the present study and the aforementioned gesture studies have provided a new lens for examining the development of L2 TFS. Although English speakers naturally run mental simulations of objects moving toward them when processing motion sentences such as The car approached you, automatic activation of simulation in processing Chinese deictic paths lái 'hither' or qù 'thither' proved to be a domain of L2-specific TFS patterns that, for English-speaking learners, had not yet become salient enough to be noted when they were listening to the motion sentences. This echoes Wu [8,40], where it was reported that L2 learners of both HLL and FLL backgrounds produced significantly fewer deictic paths when describing motion scenes, as compared to Chinese NSs.
Nevertheless, caution must be exercised when interpreting the results. The lack of statistical significance for the compatibility factor could be attributed to the small number of participants who contributed data to the final analysis. Moreover, it was observed that, although the L2 learners were instructed to look at the computer screen while listening to the sentences, some of them occasionally avoided looking at the screen in order to better concentrate on listening to the Chinese sentences. Hence, whether the L2 learners were capable of simulating the deictic paths online awaits further research.
The results of the study also provide some suggestions for the future design of simulation-based experiments with L2 participants. First, simulation-based methodology uses reaction times to determine whether learners are engaging in online mental simulations of the content of the sentence, but low- or intermediate-proficiency learners may not be suitable for such designs, because their limited L2 processing abilities and varied L2 listening skills are very likely to complicate the results. Additionally, a design that is meant to elicit interference effects may present more of a challenge for L2 learners than one that is meant to generate compatibility effects. This is because processing L2 linguistic stimuli while performing another perceptual or motor task at the same time, as required in an interference design, consumes more attentional resources than a compatibility design, in which the two tasks do not overlap temporally. Thus, an interference-based design may require participants with more advanced L2 processing skills than those required in a compatibility design. As observed in this study, some L2 participants tended to shift their focus away from one of the required tasks. It is therefore important to videotape experiment sessions to keep track of participants' actual performance.