1. Introduction
Despite their intriguing findings, the above studies have been critiqued for largely focusing on isolated words. The tasks employed in these studies were designed to examine the behavior of individual words out of context. As the activation and integration of word meaning operate through different processes in normal silent reading than in individual word recognition, there have been repeated calls to examine the behavior of emotion words under normal reading conditions. Thus, a few studies, e.g., (
Knickerbocker et al. 2015;
Scott et al. 2012), have recently attempted to explore the processing advantage of emotion words in contextualized tasks using eye-tracking. In these studies, the participants read sentences in which the target words were embedded, and the participants’ eye movements were monitored against several predefined measures. However, these studies have mainly focused on native language speakers to the neglect of bilinguals who are, in fact, the majority of the world’s residents.
The current study aims to address this gap through investigating the processing of L2 emotion words in normal silent reading by foreign language learners. Specifically, the current study will extend the work of
Knickerbocker et al. (
2015) on native English speakers to a population of Arab EFL learners. Through the use of 13 eye-movement measures, the current study will compare the processing of emotionally positive versus neutral words in Experiment 1 and of emotionally negative versus neutral words in Experiment 2.
The findings of the present study will be of great importance for this line of emotion research for three reasons. First, the study addresses a true gap in the literature. There is a dearth of research on the emotionality advantage among bilinguals in normal reading. This gap is particularly important because the native language is often described as being more emotionally advantageous than the L2 (
Caldwell-Harris 2014) and as being emotionally close while subsequent languages are emotionally distant (
Chen et al. 2015). Second, the findings will target a population that is relatively underrepresented in the psycholinguistic literature; namely, Arabic speakers of English. This particular population is important because Arabic and English are extremely distant languages that follow completely different linguistic systems. Third, the study will have theoretical implications in relation to the Revised Hierarchical Model (
Kroll and Stewart 1994) and the Mental Lexicon Model (
Jiang 2000). The study will also have practical implications in relation to language pedagogy.
To situate the present study, the following section will include a brief survey of earlier studies in this direction. This will be followed by listing our research questions and describing the study methodology. The results will then be presented and discussed in light of the literature survey. Finally, conclusions, including the relevant implications and suggested directions for future research, will be drawn.
3. The Current Study
Similar to
Scott et al. (
2012) and
Knickerbocker et al. (
2015), several studies have recently emerged to investigate word processing in silent reading using eye-trackers (
Gerth and Festman 2021;
Huang et al. 2022;
Kuperman and Deutsch 2020). However, there is a paucity of research on the processing of emotion words among bilinguals. This is the gap that the current study aims to fill. This study, which extends the work of
Knickerbocker et al. (
2015), is designed to investigate how emotionally positive and negative words are processed in comparison with neutral words in normal silent reading across several eye-movement and eye-fixation measures. The focus here will be on foreign language learners, particularly Arab EFL learners, who learned English as late bilinguals in an instructional setting and live within an L1-dominant community. Will the processing advantage for emotion words cited above (
Knickerbocker et al. 2015) be maintained in this context? Following
Knickerbocker et al. (
2015), we also aimed to examine any modulating effects that several item-related and participant-related factors might have on the potential advantage of emotion words (positive in Experiment 1 and negative in Experiment 2) over neutral words. More specifically, the current study addresses these research questions:
RQ1. Is there an advantage for positive emotion words when compared to neutral words for EFL learners during natural reading? (Experiment 1)
RQ2. What modulating effects do learner-related and item-related factors have on any potential advantage for positive emotion words over neutral words during natural reading? (Experiment 1)
RQ3. Is there an advantage for negative emotion words when compared to neutral words for EFL learners during natural reading? (Experiment 2)
RQ4. What modulating effects do learner-related and item-related factors have on any potential advantage for negative emotion words over neutral words during natural reading? (Experiment 2)
The modulating factors that we aim to examine in the present study include the following participant-related factors: measures of depression and anxiety (as represented by BDI and STAI scores, respectively); a rough proficiency measure (vocabulary test scores); and age, as our participants varied greatly in age. The item-related factors include arousal, word length, and word frequency, which are well-established factors modulating the reading of emotion words (
Angele et al. 2014;
Barriga-Paulino et al. 2022;
Brysbaert and Vitu 1998).
4. Experiment 1
4.1. Participants
A total of 44 participants took part in Experiment 1. These were L1 Arabic–L2 English speakers in Saudi Arabia (36 female and 8 male). They ranged in age between 18 and 38 (M = 23.52, SD = 5.29). Age was statistically controlled for in the analysis (see below). They started learning English at an average age of seven and a half (M = 7.48, SD = 4.50), mostly in schools. The participants’ average self-reported proficiency across the four skills (reading, listening, speaking and writing) on a scale from 1 = very poor to 5 = excellent was 4.32 (SD = 0.52).
The participants completed the V_YesNo online vocabulary test (
Meara and Miralpeix 2017; maximum score = 10,000) as a proxy for their L2 proficiency. Their scores ranged between 4159 and 8966 (M = 6543.75, SD = 1039.75), which indicates a good-to-high level of proficiency (
Meara and Miralpeix 2017). We added the vocabulary test score as a main and interacting variable in the mixed-effects models (see below) in order to examine any potential effect of proficiency on reading times (RTs) for positive and neutral words.
The participants were presented with one of two versions of the eye-tracking experiment randomly (see Design below). Thus, nearly half of the participants took Version 1 of the Experiment (n = 23), and the other half took Version 2 (n = 21).
4.2. Design and Stimuli
The items were borrowed from
Knickerbocker et al. (
2015), who selected their target positive and neutral words from the ANEW database (
Bradley and Lang 1999).
Knickerbocker et al. (
2015) included 36 pairs, but we opted to select the 21 pairs with the most frequent target words (see
Appendix A). This was intended to ensure that our L2 participants were likely to know them. The target words were divided equally between three parts of speech (Verbs = 7, Nouns = 7, Adjectives = 7). Nearly all target items belonged to the most frequent 4000-word families in English and were thus likely to be known by EFL speakers in the present context (see
Participants above). Only four target words (
appliance, optimism, soothe and
detest) belonged to frequency bands beyond the 4000 level, that is, they were less frequent. Thus, we included word frequency as a covariate in the analysis (see below) to control for its effect on emotion word processing. The items in each pair were matched for various factors. The characteristics of the target items are presented in
Table 1. It is evident that the target items were similar along several variables, but they significantly differed in their valence and arousal scores. The former score is the variable under manipulation in the present study, and arousal will be controlled for in the statistical analysis (see below).
The sentence contexts were also adopted from
Knickerbocker et al. (
2015). Each pair of items was presented in two sentence contexts (counterbalanced across lists). The sentences had already been normed for understandability and predictability with 20 native speakers of English (see
Knickerbocker et al. (
2015) for more details). The following is an example of sentence contexts for the target words (bake/love):
The Italian chef will probably bake/love pizza until the day he dies.
Erin’s parents thought she would bake/love the food that was at the family reunion.
We checked the lexical coverage of the sentence contexts and found it acceptable; 97.5% of the words in the contexts belonged to the most frequent 4000-word families (likely to be known by our participants, see above). Thus, no changes were made to the sentence contexts. One-third of the sentence contexts were presented with a comprehension question to ensure that participants were reading for meaning. (Please refer to
Knickerbocker et al. (
2015) for the full list of sentences.)
4.3. Procedures
After signing the consent form, each participant was asked to complete the BDI-II (
Beck et al. 1996) and the State Version of the STAI (
Spielberger et al. 1983). The items within each questionnaire were presented in random order. The BDI includes 21 items for which participants select the statement that indicates their feelings and behaviors during the past two weeks. The following item was excluded from the BDI-II as it was considered culturally sensitive:
0 I have not noticed any recent change in my interest in sex.
1 I am less interested in sex than I used to be.
2 I have almost no interest in sex.
3 I have lost interest in sex completely.
Thus, the possible scores on the BDI scale in the present study ranged between 0 and 60, with higher scores reflecting higher levels of depressive symptoms. The STAI includes 20 items that assess the feelings of the participants at that very moment. The possible scores ranged between 20 and 80, with higher scores indicating higher levels of anxiety. After completing the emotion questionnaires, the participant completed the vocabulary test, then the eye-tracking experiment began.
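The score ranges reported above follow from simple arithmetic. The following is a minimal sketch in Python (the study reports no scoring code, and the helper name `score_range` is hypothetical), assuming each BDI-II item is scored from 0 to 3, as in the excluded item shown above, and each STAI item from 1 to 4:

```python
# Hypothetical helper: possible total-score range for a questionnaire
# whose items all share the same minimum and maximum score.
def score_range(n_items, min_per_item, max_per_item):
    return n_items * min_per_item, n_items * max_per_item

# BDI-II: 21 items, one excluded in this study; each item scored 0-3.
bdi_range = score_range(21 - 1, 0, 3)

# STAI (state form): 20 items; assumed here to be scored 1-4 each.
stai_range = score_range(20, 1, 4)

print(bdi_range, stai_range)
```

With one item removed, the BDI maximum drops from 63 to 60, which matches the range stated in the text.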
We used an SR Research EyeLink 1000+ eye-tracker. Eye movements were recorded monocularly and head movement was minimized using a desk-mounted chinrest. A 9-point grid calibration procedure was conducted before the experiment. Each screen was additionally preceded by a fixation point for drift correction. The eye-tracker was re-calibrated at least once during the experiment. The instructions stated that the sentences should be read as naturally as possible for comprehension and that the space bar should be pressed to go to the next screen. Each sentence was presented on a single line, the sentences appeared in random order, and the target positive and neutral words did not appear at the beginning or end of a line. A third of the sentences were followed by a comprehension question, and the scores indicated that the participants had attended to the text (mean percentage of correct responses: 97.56%, SD = 4.61). Before leaving the lab, the participants completed a short language-background questionnaire.
4.4. Analysis
Trials were eliminated from the data analysis if the eye tracker lost track of the eye. Following
Knickerbocker et al. (
2015), we applied a two-step cleaning procedure in the DataViewer software. First, adjacent fixations (less than 0.5 degrees apart) were merged if one or both of them were short (less than 80 ms). Second, single fixations shorter than 100 ms or longer than 800 ms were removed (3.28% of all fixations).
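The two-step cleaning procedure can be illustrated with a short Python sketch. This is not the DataViewer implementation: the `Fixation` record and both function names are hypothetical. It also assumes that a merged fixation keeps the position of the longer fixation and the summed duration, which the text does not specify.

```python
import math
from collections import namedtuple

# Hypothetical record: position in degrees of visual angle, duration in ms.
Fixation = namedtuple("Fixation", ["x", "y", "duration"])

def merge_short_fixations(fixations, max_dist=0.5, short_ms=80):
    """Step 1: merge adjacent fixations that are < max_dist degrees apart
    when one or both of them are shorter than short_ms."""
    merged = []
    for fix in fixations:
        if merged:
            prev = merged[-1]
            close = math.hypot(fix.x - prev.x, fix.y - prev.y) < max_dist
            short = prev.duration < short_ms or fix.duration < short_ms
            if close and short:
                # Assumption: keep the longer fixation's position, sum durations.
                keep = prev if prev.duration >= fix.duration else fix
                merged[-1] = Fixation(keep.x, keep.y, prev.duration + fix.duration)
                continue
        merged.append(fix)
    return merged

def drop_outliers(fixations, min_ms=100, max_ms=800):
    """Step 2: remove fixations shorter than min_ms or longer than max_ms."""
    return [f for f in fixations if min_ms <= f.duration <= max_ms]

# Made-up example data, for illustration only.
raw = [
    Fixation(1.0, 0.0, 60),   # short and close to the next one: merged
    Fixation(1.2, 0.0, 200),
    Fixation(3.0, 0.0, 90),   # isolated and below 100 ms: removed
    Fixation(5.0, 0.0, 850),  # above 800 ms: removed
    Fixation(7.0, 0.0, 250),
]
cleaned = drop_outliers(merge_short_fixations(raw))
print([f.duration for f in cleaned])
```

Merging runs before filtering so that a very short fixation adjacent to a longer one contributes its time to that fixation instead of being discarded outright.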
The analysis was conducted in R version 4.0.5 (
R Development Core Team 2021). Separate models were constructed for 13 eye-movement measures examining early and late eye movements on the positive/neutral word (target region) and on the post-target region. Similar to
Knickerbocker et al. (
2015), the post-target region was operationalized as two words following the target word.
Table 2 presents a list of the eye-movement measures and their specified features. Binary (0/1) outcome values were analyzed using a mixed-logit regression (or, mixed-effects logistic regression) analysis (
glmer function in the
lme4 package). For continuous dependent variables, we employed linear mixed-effects (LME) models (
lmer function in the
lme4 package).
Reading times were log-transformed to reduce skewness in the data. All 13 analyses (one for each eye-movement measure) adopted the maximal random-effects structure justified by the design. The main target variable was Word Type, with
neutral as the reference level. We also included various variables that might modulate differences between positive and neutral words: Trial; Log vocabulary test score; STAI score; BDI score; Age; Arousal; Length; and Log word frequency. All models included random intercepts for subjects and items, as well as by-subject random slopes for Word Type. Seven 2-way interactions were then added stepwise, one at a time (between Word Type and each of the other variables, except Trial), to examine any modulating effect of these variables on the difference between positive and neutral words. Log-likelihood tests and AIC values were used to compare the resulting model with the original model, and only interactions that significantly improved the model fit were retained.
All continuous variables were centered.
Table 3 presents a summary of the continuous variables. The final models were checked for collinearity, and no issues were observed, with variance inflation factors (VIFs) of around 3.
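The comparison between a base model and a model with one added interaction can be sketched as follows. This Python sketch is not the authors' R code. The function names, the log-likelihood values, the parameter counts, and the 0.05 threshold are all assumptions made for illustration. Because each step adds a single interaction term between the two-level Word Type factor and one other variable, the likelihood-ratio statistic is referred to a chi-square distribution with 1 degree of freedom.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * log_lik

def lrt_df1(log_lik_base, log_lik_extended):
    """Likelihood-ratio test for nested models differing by one parameter.
    For 1 df, the chi-square survival function is erfc(sqrt(stat / 2))."""
    stat = max(2 * (log_lik_extended - log_lik_base), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2))

# Made-up values, for illustration only (not taken from the study).
base_ll, base_k = -1520.4, 12          # model without the interaction
ext_ll, ext_k = -1516.1, base_k + 1    # model with one added interaction

stat, p_value = lrt_df1(base_ll, ext_ll)
aic_base, aic_ext = aic(base_ll, base_k), aic(ext_ll, ext_k)

# Assumed retention rule: significant improvement at alpha = 0.05 and lower AIC.
retain = p_value < 0.05 and aic_ext < aic_base
print(round(stat, 2), retain)
```

The same check would be repeated once for each of the seven candidate interactions, always against the model without that interaction.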
4.5. Results and Discussion
Table 4 presents the means (continuous variables) and percentages (binary variables) for neutral and positive words. No consistent direction is observed; while most of the measures exhibited the expected advantage for positive over neutral words (shorter RT for the former than the latter), the remaining measures showed a processing disadvantage.
The results of the mixed-effects modelling are presented in
Table 5. It is evident that Word Type was not a significant main predictor for any of the 13 eye-movement measures (early, late, or post-target). However, Word Type interacted with several predictor variables. The significant interactions are presented in
Figure 1,
Figure 2,
Figure 3,
Figure 4 and
Figure 5.
Figure 1 shows that the usual advantage for positive words over neutral words is only evident in the first fixation duration measure for higher-frequency words; that is, positive words were read faster than neutral words only when these were frequent in the language.
The interactions depicted in
Figure 2 and
Figure 3 seem to suggest that the advantage in the total reading time for positive words over neutral ones is evident for words with a low arousal score and for participants with a high BDI score, respectively. Arousal also seems to have an effect on the second pass reading time measure (
Figure 4), again with an advantage for positive words with a low arousal score. This same effect is also evident for the first pass reading time in the post-target region (
Figure 5). It is noteworthy that as the arousal scores increase for all three measures, positive words show the opposite effect; that is, a disadvantage in processing compared to neutral items.
5. Experiment 2
5.1. Participants
Forty-three L1 Arabic—L2 English speakers participated in Experiment 2 (female = 29, male = 14). None of them had participated in Experiment 1. Their average age was 29.70 years (Min = 18, Max = 62, SD = 9.73). Similar to Experiment 1, age was added as a covariate in the analysis. They started learning English mostly in educational settings, at an average age of six (M = 6.30, SD = 4.00). The participants’ average self-reported proficiency across the four skills was 4.34 (SD = 0.52).
The mean score achieved by the participants in the V_YesNo was 6492.37 (Min = 4608, Max = 8601, SD = 1147.28), indicating a good-to-high level of proficiency. The vocabulary scores were included as a factor in the analysis. The participants were randomly presented with one version of the experiment (Version 1, n = 21 and Version 2, n = 22).
5.2. Design and Stimuli
The items for Experiment 2 were borrowed from
Knickerbocker et al. (
2015), following the same procedures as Experiment 1. We selected the pairs with the same neutral words as the ones included in Experiment 1. The 21 negative/neutral pairs are presented in
Appendix B. Here is an example for the pair (bake/hate):
I am pretty sure that Kevin will bake/hate the sugar cookie mix from the grocery store.
Every holiday, I return home and bake/hate the dinner that the family eats together.
5.3. Procedures
The procedures followed those in Experiment 1. The mean percentage of correct responses to the comprehension questions was high (M = 96.68%, SD = 4.23).
5.4. Analysis
The data cleaning and analysis followed the procedures described in Experiment 1. Only 2.49% of the data points were lost due to the cleaning procedure.
Table 6 presents a summary of the continuous variables. The collinearity was checked for significant predictors in each model using the VIF. All values were around 3, indicating no collinearity issues.
5.5. Results and Discussion
The means and percentages for continuous and binary variables, respectively, are presented in
Table 7. The emotion-word advantage is less clearly evident in Experiment 2 than in Experiment 1. Almost all of the eye-movement measures showed a disadvantage for negative words in comparison to neutral ones, with longer reading times in the former than the latter.
Looking now at the results of the mixed-effects modelling in Experiment 2 (
Table 8), we find that Word Type was a significant main predictor for only one measure, the skipping rate, with a medium effect (
odds ratio = 0.29,
d = −0.67). This result seems to suggest that the odds of skipping a neutral word were over three times (1/0.29) larger than those of skipping a negative word. It should be noted, however, that this effect is further modulated by word length (see
Figure 6). Negative words are less likely to be skipped than neutral words when they are short, but the difference seems to diminish for longer words.
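The relation between the reported odds ratio, its reciprocal, and the standardized effect size can be checked with a few lines of Python. The text does not state which conversion was used to obtain d; the sketch below assumes the common logit-based conversion, d = ln(OR) × √3 / π, and the function name is hypothetical.

```python
import math

def odds_ratio_to_d(odds_ratio):
    """Logit-based conversion of an odds ratio to Cohen's d (an assumption;
    the study does not state which conversion was applied)."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

odds_ratio = 0.29                 # reported for the skipping rate
d = odds_ratio_to_d(odds_ratio)
reciprocal = 1 / odds_ratio       # odds for neutral relative to negative words

print(round(d, 2), round(reciprocal, 2))
```

Under this assumption the conversion gives about −0.68, close to the reported −0.67; the small gap is plausibly due to rounding of the odds ratio. The reciprocal of about 3.45 corresponds to the statement that the odds of skipping a neutral word were over three times those of skipping a negative word.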
Finally, the effect of arousal, which was evident in Experiment 1 (whereby the advantage for emotion words was evident only for words with lower arousal score), was also demonstrated in one early (first pass reading time) and one late (total reading time) measure on the target region (see
Figure 7 and
Figure 8). The BDI and STAI had no significant main or interacting effects on the difference between negative and neutral words.
6. Discussion
The current study aimed to examine the processing of emotionally positive and negative words versus neutral words in normal silent reading by Arab EFL learners. To this end, two experiments were conducted in which the participants read English sentences with embedded target words while their eye movements were monitored and recorded against 13 measures. Experiment 1 was designed to address Research Question 1 (RQ1), concerning whether positive emotion words display an advantage over neutral words, and Research Question 2 (RQ2), regarding the modulating effects of item-related and participant-related features. Experiment 2 was designed to address RQ3, concerning whether negative emotion words display an advantage over neutral words, and RQ4, regarding, again, the modulating effects of item-related and participant-related features. The results of the two experiments failed to reveal any significant advantage for emotion words across the 13 eye-movement measures (but see Experiment 2 for a main effect of word type on the skipping rate of negative words).
This finding is in contrast to the confirmed processing advantage for emotion words in earlier studies in word processing (
Ferré 2002;
Kousta et al. 2009;
Kazanas and Altarriba 2015b) and in normal reading (
Knickerbocker et al. 2015;
Scott et al. 2012). However, this should not come as a surprise as emotion words can behave differently in bilinguals, particularly for late bilinguals who learned their L2 in instructional settings, as is the case in the present study (
Ferré et al. 2018). According to
Kroll and Stewart’s (
1994) RHM, strong conceptual connections between L2 words and their meanings may require additional meaningful exposure before L2 learners can access meaning directly, and thus behave similarly to L1 speakers. Similarly,
Jiang (
2000) predicted that very few advanced language learners can escape the fossilization stage and deal with L2 vocabulary as naturally as native speakers. Additionally, the L2, particularly in foreign language learning contexts, is known to be emotionally distant (
Caldwell-Harris 2014;
Chen et al. 2015).
Despite failing to show any significant main effects for word type, Experiment 1 showed an interaction between word type and frequency. Positive words were only read faster than neutral words in the case of highly frequent words. This finding can again be interpreted in light of the RHM. Highly frequent L2 words are likely to access meaning through direct conceptual links due to their recurrent use. Hence, highly frequent words may connect with their concepts directly without L1 mediation and can thus mirror the behavior of L1 emotion words.
Similarly, Experiment 2 revealed a significant main effect of word type on the skipping rate, which was further modulated by length. Negative words were less likely to be skipped than neutral words (with a medium effect), particularly when they were shorter. It is well-known that adult readers skip approximately one-third of the words in a text as they read (
Rayner 1998,
2009). It is also widely acknowledged that shorter words are more likely to be skipped than longer ones (
Brysbaert and Vitu 1998;
Rayner and McConkie 1976;
Vitu et al. 1995). This can be related to
parafoveal lexical processing, whereby shorter words are preprocessed while they are still at the parafoveal view, leading to a higher skipping rate (see
Schotter 2018 for an overview of the evidence). In addition to word length, a number of other contributing factors have also been investigated, such as predictability from context (
Drieghe et al. 2005;
Gollan et al. 2011) and high word frequency (
Angele et al. 2014;
Rayner and Fischer 1996). The finding that negative words may be less likely to be skipped than neutral words is novel. Could this be a result of automatic vigilance (
Pratto and John 1991)? That is, could negative words be more resistant to skipping because they require prolonged attention? Further research is needed to investigate this finding in depth.
In addition to word frequency and skipping, Experiments 1 and 2 showed that arousal modulated the processing of emotion words. Emotionally positive and negative words were only read faster than neutral words when they were low in arousal. In contrast, emotion words suffered from a processing disadvantage when the words were high in arousal. This finding is similar to that of previous studies in word recognition in which arousing words were recognized more slowly than calming words (
Barriga-Paulino et al. 2022;
Kuperman et al. 2014). This could again be interpreted in light of the automatic vigilance model of emotion (
Pratto and John 1991), as highly arousing words may hold attention longer than words with low arousal. It must be noted, however, that
Knickerbocker et al. (
2015) also found a modulating effect for arousal on the processing advantage of emotion words, but the effect was facilitative for positive words and inhibitory for negative words.
Concerning the scores of the BDI and STAI, minimal main and interaction effects were attached to them in the present study, as was the case in
Knickerbocker et al.’s (
2015) study with native speakers of English. In
Knickerbocker et al.’s (
2015) study, this indicated that the processing advantage of emotion words can emerge in L1 silent reading regardless of the participants’ status of anxiety or depression. However, this conclusion will not apply to the present study as no processing advantage for emotion words was noted.
7. Conclusions
The current study aimed to examine the processing of emotionally positive/negative words versus neutral words in L2 normal silent reading. Two experiments were conducted in which Arab EFL learners read English sentences in which positive versus neutral words (Experiment 1) or negative versus neutral words (Experiment 2) were embedded. The participants’ eye movements were monitored against 13 measures. In contrast to earlier L1 studies (
Knickerbocker et al. 2015;
Scott et al. 2012), the present study failed (overall) to find any main effects for word type in either experiment. However, an interaction was found between word type and word frequency/arousal in Experiment 1 and between word type and word length/arousal in Experiment 2. Only Experiment 1 showed a few instances of interaction between word type and BDI scores, which assess depression. The results seem to support the hypothesis that the L1 is emotionally close while the L2 is emotionally distant to bilinguals, especially late bilinguals who learned the L2 in instructional settings (
Caldwell-Harris 2014;
Chen et al. 2015).
In terms of theoretical models/hypotheses, the current study partially supports
Kroll and Stewart (
1994) and
Jiang (
2000). L2 emotion words could have behaved similarly to their L1 counterparts if the participants had had longer natural exposure to the L2. In this case, the direct connections between L2 words and their concepts could have been strengthened enough to allow L2 emotion words to behave similarly to their L1 counterparts. It must be noted, however, that the results of the current study call for the explicit integration of the affective factor in mental lexicon models. The current study also casts doubt on the applicability of two models to the processing of emotion words by bilinguals: the motivated attention and affective states model (
Lang et al. 1990,
1997) and the automatic vigilance model of emotion (
Pratto and John 1991). A revision of the models in light of bilingualism research could help provide a comprehensive understanding of the behavior of emotion words.
In terms of pedagogy, the results of the current study add further emphasis to the need to pay special attention to the teaching of emotion words in the L2 classroom and to engage learners in experiences that would strengthen their understanding and appreciation of these words. The facts that the L2 is emotionally distant and that EFL learners struggle to express their true feelings are well documented in the literature (
Caldwell-Harris 2014;
Chen et al. 2015;
Pratto and John 1991). This difficulty emerges in the present study, although the participants’ proficiency level was good-to-high based on the vocabulary test they completed prior to the experiments.
Based on the present study, we can suggest some directions for future research. As noted previously, work on emotion words (e.g., happy, sad, joy) abounds in the literature, but far fewer studies appear to focus on the reading of emotion-laden words (e.g., cancer, butterfly, knife). The question remains in terms of how eye movements are modulated by the type of emotion word that is read or viewed, and whether or not words of higher or lower arousal and positive or negative valence influence measures of eye movements and reading in significant and systematic ways. A true comparison would involve creating sets of these materials that control for various characteristics or features (e.g., length, frequency, orthographic neighbors, valence, arousal, etc.) and differ only on the basis of whether they directly label an emotional state or represent an emotion, in the case of emotion-laden words. Work on both early and late measures of processing for both of these word classes will further illuminate our understanding of emotion word processing on the whole.