Bilingual Contextual Variability: Learning Words in Two Languages

Lauro, Justin; Toassi, Pamela Freitas Pereira

doi:10.3390/educsci15091264


by Justin Lauro ¹,* and Pamela Freitas Pereira Toassi ²

¹ Department of Psychology, Barry University, Miami Shores, FL 33161, USA

² Foreign Languages and Translation Department, Universidade de Brasília, Brasília 70910-900, DF, Brazil

* Author to whom correspondence should be addressed.

Educ. Sci. 2025, 15(9), 1264; https://doi.org/10.3390/educsci15091264

Submission received: 26 June 2025 / Revised: 8 September 2025 / Accepted: 15 September 2025 / Published: 22 September 2025

(This article belongs to the Special Issue Language Learning in Multilingual, Inclusive and Immersive Contexts)


Abstract

Background. Bilingual novel word learning is shaped by both semantic context and the language in which learning occurs. According to the context variability hypothesis and instance-based learning frameworks, varied semantic contexts promote the formation of flexible lexical-semantic representations. However, the extent to which these benefits generalize across languages and transfer to novel contexts remains unclear. Method. Two experiments examined the effects of study language (L1, L2, or both) and semantic variability (repeated vs. varied contexts) on novel word learning in English–Spanish bilinguals. Participants studied rare words embedded in sentences and were tested via a word-stem completion task. In Experiment 1, test sentences were identical to those seen during the study. In Experiment 2, half of the test sentences were novel, requiring generalization beyond previously encountered contexts. Orthographic overlap across languages was also assessed. Results. In Experiment 1, varied semantic contexts improved recall accuracy, supporting the context variability hypothesis. Unexpectedly, words studied in L2 were recalled more accurately than those studied in L1, consistent with desirable difficulty effects. Additionally, orthographic overlap moderated learning, with greater benefits observed in mixed-language conditions. In Experiment 2, overall accuracy declined, and no main effects of language or context were observed. However, a three-way interaction showed that orthographic overlap improved recall only when words were studied in L1 and tested in novel contexts. Conclusions. Semantic and linguistic variability can enhance bilingual word learning when test conditions are consistent with the learning context. However, generalization to novel contexts may require deeper processing, extended exposure, or additional retrieval cues.

Keywords: bilingualism; word learning; semantic variability; orthographic overlap; desirable difficulty

1. Introduction

Repeated exposure to unknown words in context leads to the integration of form and semantic features into the lexicon (Bolger et al., 2008; Masson, 1995; McRae et al., 1997; Plaut & Booth, 2000). According to the instance-based framework of word learning (Reichle & Perfetti, 2003), each encounter with a novel word forms a distinct memory trace, encoding both word-level and contextual features (Bolger et al., 2008; Lauro et al., 2020; Nelson et al., 2005; Reichle & Perfetti, 2003). Features repeated across encounters are strengthened, while novel features are incorporated into a developing lexical-semantic network.

Semantic context is particularly important for building robust lexical-semantic networks of memory traces of word learning encounters (Bolger et al., 2008; Lauro et al., 2020; Reichle & Perfetti, 2003). The context variability hypothesis suggests that exposure to varied semantic contexts promotes more generalized, flexible word knowledge by enhancing integration into lexical-semantic networks. This effect has been demonstrated in monolingual learners (Bolger et al., 2008) and second language (L2) learners (Elgort, 2011), as well as in both languages of bilingual learners (Lauro et al., 2020). These studies typically employ rare words or pseudowords embedded in sentence contexts. Repeated exposure to these sentences can be manipulated by repeating or varying the semantic contexts in which novel target (pseudo)words are embedded. Then, lexical-semantic knowledge can be assessed in a post-test session measuring form-based (Elgort, 2011) or meaning-based (Elgort & Warren, 2014; Lauro et al., 2020) word knowledge.

General vocabulary knowledge also influences the extent to which individuals can extract word meaning from context. Skilled readers rely less on the episodic context during studying than less-skilled readers (Nelson et al., 2005) and can learn new words even in uninformative semantic contexts (Eskenazi et al., 2018). The Lexical-Quality Hypothesis posits that efficient word recognition in skilled readers frees cognitive resources for meaning construction (Perfetti & Hart, 2001). Similarly, skilled spellers are better able to infer the meanings of novel words from context once they learn the spelling of each word (Eskenazi et al., 2018). However, for bilinguals, reading in L2 requires more cognitive resources for word recognition and orthographic processing, limiting the capacity for semantic processing (Elgort et al., 2018).

Beyond semantic variability, the language in which a novel word is encountered may also influence the ability to extract word meanings from context in bilinguals and second language learners. The language in which words are encountered is typically encoded in memory (Cristoffanini et al., 1986; De Groot, 1992; MacLeod & Kampe, 1996; Marian & Neisser, 2000). Repeated exposure across languages may promote semantic integration. Previous research shows that bilinguals exposed to novel words in both L1 and L2 were better able to judge the semantic relatedness of word pairs than bilinguals exposed to the words in a single language (Lauro et al., in press).

Bilingual lexical access is generally language non-selective, with activation spreading across both languages (Brysbaert & Duyck, 2010; Dijkstra & Van Heuven, 2002). This activation can facilitate lexical access for cognates, words with high cross-language orthographic overlap and shared meaning (Costa et al., 1999), but hinder lexical access for interlingual homographs or false cognates, which are words that share orthographic form but not meaning across languages (Dijkstra et al., 2000; Toassi et al., 2023). Semantic context modulates the degree of cross-language activation (Lauro & Schwartz, 2017). However, the interaction between semantic contextual variation and language context in novel word learning remains underexplored.

Encountering a word or concept in multiple languages may integrate language-specific features, potentially increasing the number of nodes available within the developing lexical-semantic network. Each linguistic context contributes unique language-specific orthographic and phonological features. Over time, repeated exposure across languages can create more interconnected lexical-semantic networks, making the concept more accessible and flexible in retrieval. While initial cognitive load may increase, deeper processing through repeated, meaningful contexts could improve subsequent recall. Bilingual non-selective access may increase cognitive demand during initial learning, especially for words that have a low degree of cross-language overlap. This is because bilinguals must manage competing activations across languages (Lauro et al., in press).

This increased demand might enhance memory recall through deeper processing, as well as through desirable difficulty (R. A. Bjork, 1994; R. A. Bjork & Bjork, 2020; E. L. Bjork & Bjork, 2011). Deliberately difficult challenges can stimulate deeper cognitive processing and aid in long-term learning. Tasks that initially require more effort, such as learning the meanings of target words across multiple languages, can improve learning outcomes. In fact, bilingualism by itself can create a desirable difficulty during word learning (Bogulski et al., 2019), particularly if the language context of vocabulary acquisition is varied. Bilinguals have to constantly engage in language regulation processes and typically outperform monolinguals in acquiring vocabulary in a (separate) foreign language (Bogulski et al., 2019; Bradley et al., 2013). Multi-language variation in novel word learning may therefore increase the desirable difficulty of learning.

We propose that encountering a novel word in multiple linguistic contexts introduces unique orthographic and phonological features from each language. These repeated, meaningful exposures can build a richer and more interconnected lexical-semantic network. While the initial cognitive demand may be greater, especially when cross-language overlap is low, the increased effort may ultimately enhance recall.

2. Present Study

This study examines the potential interaction of language and semantic context in bilingual word learning. Across two experiments, English–Spanish bilinguals studied novel words in their dominant (L1), non-dominant (L2), or both languages, with target words appearing either in repeated or varied semantic contexts. Learning was assessed using a post-test word-stem completion task in which the missing stem was embedded into sentences. The word-stem completion task was selected because it provides a sensitive measure of how contextual variability supports recall of novel word forms within sentence contexts. In Experiment 1, the test sentence matched a sentence from the study phase. In Experiment 2, half of the test sentences were entirely novel, requiring participants to generalize word knowledge beyond prior contexts, and half were repeated sentences from the study phase.

We hypothesized that repeated semantic contexts would enhance recall in Experiment 1, where test sentences matched previously studied sentences. In Experiment 2, we predicted that this advantage for words studied in repeated semantic contexts would hold for the repeated test sentences. However, for the novel test sentences, we predicted that words studied in varied semantic contexts would lead to better performance, reflecting improved generalization of word knowledge. Additionally, across both experiments, we predicted higher recall for L1 words, reflecting the stronger pre-established lexical-semantic networks of the dominant language. However, cross-language learning was predicted to facilitate integration into the lexical-semantic network, particularly in novel test sentences in Experiment 2, where more semantic flexibility was required.

3. Experiment 1

3.1. Method

3.1.1. Participants

Participants were recruited from a university in Miami, Florida, and received course credit for their participation. Forty-eight English–Spanish bilinguals were included in the sample (IRB approval #1715119). Self-reported proficiency-based dominance was assessed on a 7-point Likert scale, with higher scores indicating greater proficiency. Based on these ratings, 41 participants were classified as English-dominant and 7 as Spanish-dominant. In addition, we considered participants’ language background and order of acquisition, given the large population of Spanish heritage speakers in this region. Participants represented diverse cultural backgrounds, including the United States (38%), Puerto Rico (21%), and Cuba (17%), among others. Participants reported their age of acquisition for both English and Spanish. In Experiment 1, 17 participants were simultaneous bilinguals, 3 reported English as their L1, and 28 reported Spanish as their L1. Of the 17 simultaneous bilinguals, 15 were English-dominant based on self-reported proficiencies. Of the 28 L1 Spanish bilinguals, 23 were English-dominant based on self-reported proficiencies (see Table 1 for participant characteristics). Finally, we note that participants in Experiment 1 were the same as those in Lauro et al. (in press); the present study focuses specifically on recall performance for the same studied words.

3.1.2. Materials

Stimuli

The stimuli in the present study were identical to those used in Lauro et al. (in press). Target novel words consisted of 84 rare English and Spanish words. These 84 rare words were selected based on their low-frequency counts in both English and Spanish (Brysbaert & New, 2009; Cuetos et al., 2011). Target words were randomly divided into two lists: List A (42 words) and List B (42 words). Participants were randomly assigned to one of these lists, with half of the participants receiving List A and half receiving List B. This counterbalancing was done to ensure that any potential effects were not due to specific items. Word frequencies (words per million, wpm) were obtained from SUBTLEXus (Brysbaert & New, 2009) for English and SUBTLEX-Esp (Cuetos et al., 2011) for Spanish. To ensure low-frequency status, only items with frequencies less than 3 wpm in both languages were included. Across the final stimulus set, English words ranged from 0 to 1.82 wpm (M = 0.36, SD = 0.42), while Spanish words ranged from 0 to 3.10 wpm (M = 0.52, SD = 0.64). No significant difference was found between the average word frequencies of English (M = 0.28 wpm, SD = 0.39) and Spanish (M = 0.38 wpm, SD = 0.63), t(166) = −1.22, p = 0.23. Word length was also similar across languages (English M = 6.08 letters, SD = 1.22; Spanish M = 6.58 letters, SD = 1.08).

Cross-language orthographic overlap was defined as the degree of similarity in spelling between a word in one language and its translation in another. This was measured using normalized Levenshtein distance, a metric based on the number of single-character edits needed to transform one word into the other (Levenshtein, 1966; Schepens et al., 2012), using the following formula: score = 1 − (Levenshtein distance / maximum word length). To ensure our effects were due to orthographic overlap and not misleading semantic associations, no false cognates (i.e., words with similar spelling but different meanings) were used. The average orthographic similarity ratio was 0.61 (SD = 0.28).
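The overlap score described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `levenshtein` and `orthographic_overlap` are hypothetical, lowercasing before comparison is an assumption, and the example pair abstruse/abstrusa is used only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def orthographic_overlap(word: str, translation: str) -> float:
    """score = 1 - (Levenshtein distance / maximum word length)."""
    longest = max(len(word), len(translation))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    # Assumption: comparison ignores letter case.
    distance = levenshtein(word.lower(), translation.lower())
    return 1 - distance / longest


if __name__ == "__main__":
    # One substitution over eight letters gives 1 - 1/8.
    print(orthographic_overlap("abstruse", "abstrusa"))
```

A score of 1 means identical spelling across languages, and a score of 0 means no letters can be kept in place.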

For the study phase, all target words were embedded into sentences. Four sentences were constructed for each target word, resulting in 336 distinct sentence contexts in each language. English and Spanish sentences were direct translations. To illustrate how target words appeared across conditions, consider the English word remiss. In the repeated context, English condition, participants read “Natasha was quite remiss when she messed up the customer’s order” four times. In the varied context condition, four separate sentences were created for the target word. Similarly, the Spanish word abstrusa appeared in the repeated Spanish context condition as “Algunas de las novelas clásicas son demasiado abstrusas para que las entiendan los lectores principiantes.” In the varied context condition, four separate sentences were created for the target word.

The average words per sentence was 15.50 (SD = 4.05) for English sentences, and 16.16 (SD = 5.12) for Spanish sentences. Sentences were randomly assigned to one of three experimental conditions: English-only, Spanish-only, or mixed-language. Additionally, half of the sentences were randomly assigned to a single-sentence context (i.e., repeated context) and half were assigned to a four-sentence context (i.e., varied context). Participants studied only the 42 words from their assigned list (either List A or List B) within these sentences.

For the testing phase, a word-stem sentence completion task was created. In this task, participants were presented with the first letter of a target word (e.g., R_ _ _ _ _) and asked to complete the stem with the studied word. Additionally, missing word stems were embedded in sentence contexts. Specifically, a sentence that was presented in the study phase was presented in the testing phase. Participants were only tested on target words they had studied previously. This method was used to explicitly assess memory for the recently studied items.

To assess prior knowledge of the target words, 42 words in the stimulus pool that were not studied were presented in word pairs, with either a high-frequency synonym or an unrelated word. Accuracy of non-studied word pairs was less than chance (M = 0.32, SD = 0.15), t(47) = −8.26, p < 0.001, indicating that prior knowledge of words in the stimulus pool was minimal and that target words were indeed unfamiliar to the participants.

3.1.3. Design

A 3 (study language: L1-only, L2-only, both L1 and L2) × 2 (semantic variation: repeated context, varied context) within-subjects design was used. Participants studied 42 rare target words embedded in sentence contexts. The dependent variable was accuracy in the word-stem sentence completion task.

3.1.4. Procedure

Study Phase

Participants studied 42 of the target novel words. Half of the words were presented to participants in a single-sentence context, repeated four times, and half of the words were presented in four distinct sentence contexts. Additionally, one-third of the target words were always presented in English, one-third always in Spanish, and one-third were presented in both English and Spanish. In the conditions with a single-sentence context and mixed languages, direct translations of the target sentence were used.

Stimuli were randomly assigned to two blocks of 21 items each. Within each block, target words were presented within sentences in randomized order. Sentences were displayed centered on the screen, and participants pressed the spacebar to advance after reading each sentence. Each target word was presented four times, with the presentation randomized within each block.
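As a rough illustration of how 42 words could be distributed over the study conditions and blocks, the following Python sketch assigns words to the six language-by-context cells and then splits them into two blocks of 21. It is a sketch under assumptions, not the authors' procedure: it assumes the two factors were fully crossed with 7 words per cell, and the names `assign_conditions` and `split_into_blocks`, the placeholder word labels, and the fixed seed are all hypothetical.

```python
import random
from itertools import product

LANGUAGES = ("english", "spanish", "mixed")
CONTEXTS = ("repeated", "varied")


def assign_conditions(words, seed=0):
    """Randomly assign words to the 3 x 2 study cells in equal numbers.

    Assumption: the design is fully crossed, so 42 words yield 7 per cell.
    """
    rng = random.Random(seed)
    shuffled = list(words)
    rng.shuffle(shuffled)
    cells = list(product(LANGUAGES, CONTEXTS))
    per_cell = len(shuffled) // len(cells)
    assignment = {}
    for index, cell in enumerate(cells):
        for word in shuffled[index * per_cell:(index + 1) * per_cell]:
            assignment[word] = cell
    return assignment


def split_into_blocks(words, n_blocks=2, seed=0):
    """Randomly split the words into equally sized study blocks."""
    rng = random.Random(seed)
    shuffled = list(words)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_blocks
    return [shuffled[i * size:(i + 1) * size] for i in range(n_blocks)]


if __name__ == "__main__":
    word_list = [f"word{i:02d}" for i in range(42)]  # placeholder labels
    cells = assign_conditions(word_list)
    blocks = split_into_blocks(word_list)
    print(len(cells), [len(block) for block in blocks])
```

Under this assumed crossing, one-third of the words fall in each language condition and half in each semantic context condition, which matches the proportions described above.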

Testing Phase

Following each study block, participants completed a word-stem completion task. In this task, participants were presented with one previously studied sentence, with only the first letter of the target word provided (e.g., the target word “Remiss” would appear as R _ _ _ _ _). Participants were asked to type in the missing letters.
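The stem format in this task can be sketched in a few lines of Python. The scoring rule is an assumption, since the criteria are not spelled out here: the sketch treats a response as correct only if the full typed word matches the target, ignoring case and surrounding spaces. The names `make_stem` and `is_correct` are hypothetical.

```python
def make_stem(target: str) -> str:
    """Show the first letter and replace each remaining letter with a blank."""
    return " ".join([target[0].upper()] + ["_"] * (len(target) - 1))


def is_correct(response: str, target: str) -> bool:
    """Assumed scoring rule: exact match on the whole word, case-insensitive."""
    return response.strip().lower() == target.lower()


if __name__ == "__main__":
    print(make_stem("remiss"))
    print(is_correct(" Remiss ", "remiss"))
```

A more lenient rule (for example, tolerating a one-letter misspelling) would change accuracy rates, so the choice of rule matters when comparing across studies.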

3.1.5. Data Analysis

Study conditions in English and Spanish were recoded as L1 or L2 based on each participant’s self-reported proficiency-based dominance (i.e., for English-dominant participants, English = L1, Spanish = L2; for Spanish-dominant participants, Spanish = L1 and English = L2), rather than strictly order of acquisition. This coding ensured that comparisons reflected relative self-reported proficiency-based dominance. This approach allowed us to analyze the data from a psycholinguistic perspective, focusing on the cognitive processes related to dominant vs. non-dominant language use in bilinguals. All study factors were within-subjects and included study language (L1, L2, or both) and semantic variation (repeated or varied sentences). The three-level study language was coded using two orthogonal contrasts: (1) L1 vs. L2, and (2) single-language vs. mixed-language conditions.
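To make the contrast coding concrete, the sketch below shows one common way to code a three-level factor with two orthogonal contrasts. The specific weights are an assumption (the exact values used in the analysis are not reported here), and `CONTRASTS`, `column`, `is_centered`, and `is_orthogonal` are hypothetical names. Exact fractions are used so the checks do not depend on floating-point rounding.

```python
from fractions import Fraction

# Assumed weights: contrast 1 compares L1 vs. L2 and ignores the mixed
# condition; contrast 2 compares the two single-language conditions
# (averaged) with the mixed-language condition.
CONTRASTS = {
    "L1":   (Fraction(-1, 2), Fraction(-1, 3)),
    "L2":   (Fraction(1, 2),  Fraction(-1, 3)),
    "both": (Fraction(0),     Fraction(2, 3)),
}


def column(index):
    """Return the weights of one contrast across the three levels."""
    return [codes[index] for codes in CONTRASTS.values()]


def is_centered(weights):
    """A contrast is centered when its weights sum to zero."""
    return sum(weights) == 0


def is_orthogonal(u, v):
    """Two contrasts are orthogonal when their dot product is zero."""
    return sum(a * b for a, b in zip(u, v)) == 0


if __name__ == "__main__":
    l1_vs_l2, single_vs_mixed = column(0), column(1)
    print(is_centered(l1_vs_l2), is_centered(single_vs_mixed))
    print(is_orthogonal(l1_vs_l2, single_vs_mixed))
```

With centered, orthogonal codes, each coefficient can be read as its own comparison without being confounded with the other contrast.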

Accuracy in the word-stem completion task was analyzed with logistic mixed-effects regression using the lme4 package (Bates et al., 2015) in R version 4.4.1 (Baayen, 2008; Baayen et al., 2008). A model comparison approach was used. The simplest model included study language and semantic variation as fixed effects, with random intercepts for participants and items. Random slopes for participants and items were then added incrementally. If the addition of a slope improved the model fit, as indicated by a significant log-likelihood ratio test, it was retained. Probabilities of correct responses were then calculated from the final model using the emmeans package (Lenth, 2017).
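The model comparison step rests on a likelihood ratio test between nested models. The analysis itself was run in R with lme4; the Python sketch below only mirrors the arithmetic of the test, using the standard library. The helper names `chi2_sf` and `likelihood_ratio_test` are hypothetical, and the log-likelihood values in the example are made up for illustration.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df.

    Starts from the closed form for df = 1 or df = 2 and applies the
    recurrence Q(x | nu + 2) = Q(x | nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        sf, nu = math.erfc(math.sqrt(half)), 1
    else:
        sf, nu = math.exp(-half), 2
    while nu < df:
        sf += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return sf


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int):
    """Return (statistic, p-value) for two nested models.

    df is the number of extra parameters in the fuller model.
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, df)


if __name__ == "__main__":
    # Made-up values: adding a random slope typically adds a variance and
    # a covariance parameter, hence df = 2 in this illustration.
    stat, p_value = likelihood_ratio_test(-1200.0, -1197.5, df=2)
    print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")
```

In this illustration the slope would not be retained at the conventional 0.05 threshold, so the simpler model would be kept.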

3.2. Results

A logistic mixed-effects regression model was used to analyze the word-stem completion accuracy, with study language (L1 vs. L2; single- vs. mixed-language), semantic context (repeated vs. varied), and orthographic overlap as predictors. The final model included random intercepts for participants and items, indicating variability in performance across individuals and items. Neither random slopes by-participants nor by-items improved the model fit (or did not converge). The results for the logistic mixed-effects models are displayed in Table 2.

There was a significant effect of language context (L1 vs. L2), β = 0.28, SE = 0.08, z = −3.66, p < 0.001. The probability of a correct response was higher for L2 (estimated probability = 0.55, SE = 0.06) than for L1 (estimated probability = 0.41, SE = 0.06). However, this main effect should be interpreted cautiously due to the significant interaction with orthographic overlap. Accuracy did not differ across single-language and mixed-language conditions, z < 1. There was also a significant effect of semantic context, β = 0.36, SE = 0.18, z = 2.00, p = 0.045. Accuracy was higher for the varied semantic context condition (estimated probability = 0.52, SE = 0.06) than the repeated semantic context condition (estimated probability = 0.43, SE = 0.06).
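Because the model coefficients are on the log-odds scale while the condition estimates are probabilities, it can help to move between the two. The short Python sketch below does this for the estimated probabilities reported above. The helper names `logit`, `inv_logit`, and `log_odds_difference` are hypothetical, and the calculation is only a reading aid, not a reanalysis.

```python
import math


def logit(p: float) -> float:
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def log_odds_difference(p_a: float, p_b: float) -> float:
    """Difference in log-odds between two conditions (b minus a)."""
    return logit(p_b) - logit(p_a)


if __name__ == "__main__":
    # Estimated probabilities reported for Experiment 1.
    language = log_odds_difference(0.41, 0.55)  # L1 vs. L2
    context = log_odds_difference(0.43, 0.52)   # repeated vs. varied
    print(f"L2 vs. L1: {language:.2f} log-odds, odds ratio {math.exp(language):.2f}")
    print(f"varied vs. repeated: {context:.2f} log-odds, odds ratio {math.exp(context):.2f}")
```

On this scale, the L2 advantage corresponds to a difference of roughly 0.56 log-odds (an odds ratio of about 1.76), and the varied-context advantage to roughly 0.36 log-odds. How these differences map onto the reported coefficients depends on the contrast coding, which is not assumed here.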

There was no significant effect of orthographic overlap of the novel target word on word-stem completion accuracy, z < 1. However, there was a significant interaction between study language (single vs. mixed) and orthographic overlap, β = 0.14, SE = 0.07, z = 2.02, p = 0.04. As the degree of cross-language orthographic overlap increased, the difference between the single-language (i.e., the average of L1 and L2 conditions) and mixed-language conditions increased. Figure 1 displays the estimated probabilities when the novel target word had a relatively low (0.20) degree of orthographic overlap and a relatively high (0.80) degree of orthographic overlap.

3.3. Experiment 1 Discussion

The significant main effect of semantic variability, in which varied contexts led to higher accuracy, directly supports the context variability hypothesis (Bolger et al., 2008). After multiple exposures to novel words, participants were more likely to answer correctly on a word-stem completion task when those words were studied in varied semantic contexts, even when tested in one of those contexts. Within the instance-based framework, repeated features across encounters with novel words strengthen episodic memory traces of the word learning event, and varied features across encounters create new traces within a lexical-semantic network. This finding suggests that the flexibility of a broader lexical-semantic network outweighed a potential disadvantage of generally weaker, context-specific features, which would be enhanced in the repeated context condition.

The second major finding was that participants had a higher probability of a correct response for words studied exclusively in L2. This was not predicted. However, this unexpected result could suggest that L2 novel words are processed differently, perhaps requiring more cognitive effort for retrieval. For example, previous research has shown that L2 requires more deliberate processing and allocation of attentional resources (Godfroid et al., 2018). Learning novel words exclusively in L2 sentences may have required more effort and deeper processing compared to L1 sentences, which may have been processed more automatically. This would therefore lead to stronger memory traces for words studied in L2 contexts. Finally, a significant interaction was observed between study language (single- vs. mixed-language) and the degree of cross-language orthographic overlap. Higher overlap resulted in greater differences between the single- and mixed-language conditions, suggesting that orthographic overlap moderates word learning in bilingual sentences.

The observed pattern of findings in varied semantic and both-language learning conditions may be interpreted through desirable difficulties (R. A. Bjork, 1994; R. A. Bjork & Bjork, 2020; E. L. Bjork & Bjork, 2011). The increased difficulty of studying in L2 could lead to improved recall, compared to studying in L1, by increasing the effort required during encoding and retrieval. While this increased effort may initially hinder performance, it also supports long-term retention and potentially flexible access to lexical-semantic representations (R. A. Bjork & Bjork, 2020). Furthermore, because the word-stem completion task repeated previously studied semantic contexts, it did not require full retrieval of lexical-semantic knowledge, but rather recognition of word-form level knowledge.

One possible limitation of Experiment 1 was that the testing phase did not directly assess semantic knowledge because the sentence was repeated from the study phase. Because Experiment 1 used the same context for both study and test, it remained unknown if the words were truly learned or if the participants just recognized the sentence. Experiment 2 was designed to address this critical issue.

4. Experiment 2

Experiment 2 was designed to extend the findings of Experiment 1 by examining whether participants could recall novel words when tested in novel semantic contexts. This design addressed a critical limitation of Experiment 1, where the repetition of the study sentences during the word-stem completion task could have been measuring recognition rather than recall of the target word’s meaning. In Experiment 2, we introduced novel sentence contexts during the testing phase, requiring participants to generalize their word knowledge beyond specific cues from the learning phase.

4.1. Method

4.1.1. Participants

Participants in Experiment 2 were recruited from the same pool as in Experiment 1. Forty-four English–Spanish bilinguals participated. Self-reported proficiency-based dominance was assessed on a 7-point Likert scale, with 38 participants classified as English-dominant and 6 as Spanish-dominant. Additionally, participants reported their order of acquisition. Twenty-six were simultaneous bilinguals, 8 reported English as their L1, and 10 reported Spanish as their L1 (see Table 1 for participant characteristics). All simultaneous bilinguals were classified as English-dominant based on their self-reported proficiencies. Among the 8 L1 English participants, 4 were English-dominant. Among the 10 L1 Spanish participants, 8 were English-dominant. Participants also represented diverse cultural backgrounds; more than half of the participants were born in the United States (n = 25, 57%), followed by the Dominican Republic (n = 4, 9%), Uruguay (n = 3, 7%), Cuba (n = 2, 5%), Nicaragua (n = 2, 5%), Puerto Rico (n = 2, 5%), Italy (n = 2, 5%), Argentina (n = 1, 2%), Colombia (n = 1, 2%), Ecuador (n = 1, 2%), and Switzerland (n = 1, 2%).

4.1.2. Materials

The same materials as Experiment 1 were used in Experiment 2, with one exception. In the testing phase of Experiment 2, novel sentences were presented, with the missing word-stem embedded, for half of the trials. On the other half of the trials, participants were presented with the exact same sentence they had studied during the study phase. This allowed us to compare memory for target words presented in new versus previously encountered semantic contexts.

4.1.3. Design and Procedure

The same experimental design and procedures used in Experiment 1 were employed in Experiment 2. In addition to the study language and study semantic context variables, the test context was also manipulated. In half of the trials, missing word stems were embedded in sentences from the study session. For the other half of the trials, missing word stems were embedded in novel sentence contexts. The order of repeated and novel test contexts was counterbalanced across participants.

4.1.4. Data Analysis Plan

The data analysis plan was the same as Experiment 1. This allowed for direct comparisons between the results of Experiment 1 and Experiment 2.

4.2. Results

As in Experiment 1, a logistic mixed-effects regression model was used to analyze the word-stem completion accuracy, with study language (L1 vs. L2; single- vs. mixed-language), semantic context (repeated vs. varied), and orthographic overlap as predictors. In Experiment 2, the test context (repeated vs. varied) was also included as a predictor. All possible interaction terms were also included in the model. The final model included random intercepts for participants and items, with by-participant random slopes for study language (see Table 2).

There were no significant main effects observed for study language (L1 vs. L2: z = −0.92, p = 0.36; single- vs. mixed-language: z = −0.18, p = 0.86), semantic context (z = −0.15, p = 0.59), test context (z = −0.58, p = 0.56) or orthographic overlap (z = −0.49, p = 0.62).

There was, however, a significant three-way interaction between study language (L1 vs. L2), test context, and orthographic overlap, β = 0.76, SE = 0.40, z = 1.93, p = 0.05. The impact of orthographic overlap on word-stem completion accuracy depended on the language in which the novel word was studied (L1 vs. L2) and the semantic context in which the word appeared during the testing phase (repeated vs. varied). In the L1, varied test context condition, higher orthographic overlap was associated with higher odds of a correct word-stem completion. In L2, regardless of test context, orthographic overlap did not improve accuracy (Figure 2). No other interactions were significant.

4.3. Discussion

In contrast to our predictions, no significant main effects were observed for study language, semantic context, or orthographic overlap of the target word. Participants did not demonstrate significant differences in their ability to recall the studied words based on the language in which they studied, nor did they demonstrate differences based on the variability of the semantic contexts in which they studied. The absence of a main effect of semantic variability suggests a limitation of the context variability hypothesis. When participants were required to fully access word knowledge in new, unfamiliar contexts, varied study contexts did not provide a significant advantage.

However, the significant three-way interaction between study language, test context, and orthographic overlap indicates that participants did benefit from cross-language orthographic similarities during retrieval, but only when the semantic contexts in which the novel words were embedded at test had not been seen previously. Specifically, the post-hoc analyses revealed that orthographic overlap facilitated word-stem completion only when the novel target words were studied in L1 and tested in a novel semantic context. This suggests that when learners process information in their dominant language and encounter that information in unfamiliar contexts, they may be better able to use form-based cues (e.g., orthographic similarity) to support lexical retrieval.

Moreover, the overall low accuracy rates in Experiment 2, compared to Experiment 1, demonstrate the difficulty in recalling recently studied novel words in completely novel contexts. Participants struggled to retrieve and apply newly acquired lexical-semantic knowledge beyond the specific contexts encountered during the learning phase. This finding suggests that the study phase may not have adequately fostered robust lexical-semantic networks for the novel words. It is possible that more extensive exposure or more explicit semantic processing during learning is necessary for words to be accessed flexibly across diverse contexts. Furthermore, the cognitive demands of generalizing semantic knowledge, particularly in the absence of semantic retrieval cues, might have exceeded the threshold of desirable difficulty, leading to cognitive overload and thus interfering with successful retrieval.

5. General Discussion

Across two experiments, we examined the role of language context and semantic variability in novel word learning among English–Spanish bilinguals. We explored how repeated exposures to novel words in varying language and semantic contexts influenced recall, using a word-stem completion embedded in sentence contexts. Experiment 1 examined recall when tested sentences matched study sentences, while Experiment 2 assessed recall in novel sentence contexts.

Experiment 1 provided evidence for the context variability hypothesis (Bolger et al., 2008). Words studied in varied semantic contexts were recalled more accurately than those studied in repeated contexts, specifically when the testing semantic context matched the studied semantic context. This supports the claim that varied contexts enhance lexical-semantic representations by creating multiple (partially overlapping) memory traces (Reichle & Perfetti, 2003), which contribute to greater network flexibility. However, recall was unexpectedly higher for words studied in L2 than in L1. This may reflect a desirable difficulty (E. L. Bjork & Bjork, 2011), in which the increased cognitive effort required to process L2 material led to deeper encoding and more robust memory traces. Even so, we acknowledge that this effect may be moderated by individual differences that were not fully captured by our subjective measure of language dominance. For example, the increased cognitive effort associated with L2 processing may have created an optimal level of desirable difficulty for some learners, whereas for participants with lower L2 proficiency the same demand may have exceeded a beneficial threshold. The observed effects may therefore represent an average outcome that masks these individual trajectories. The low overall accuracy in Experiment 2 may also reflect item-specific effects. While the statistical models included random intercepts for items, we acknowledge that certain characteristics of the target novel words may influence their learnability. The present findings may not apply uniformly to all novel words but rather represent general trends.

An additional consideration is the directionality of learning across participants’ languages. Because most of the sample were English-dominant bilinguals, the L1/L2 contrast in practice largely reflected learning with English as the dominant language and Spanish as the non-dominant language. Although our analyses recorded study conditions based on individual dominance, the effects of L2 learning observed here may be interpreted in the context of this asymmetry.

In Experiment 2, participants were required to recall studied words and apply them to novel sentence contexts. There was no effect of the language or semantic contexts in which the words were studied. The overall low accuracy rates in this experiment demonstrate the difficulty of generalizing newly learned words to unfamiliar contexts, suggesting that the initial study phase may not have been sufficient to develop robust lexical-semantic representations.

However, a key finding emerged in a three-way interaction. Orthographic overlap across languages facilitated recall for words studied in L1 and tested in novel sentence contexts. This suggests that form-based cues (e.g., shared orthography across languages) can support retrieval when learners rely less on semantic cues and more on word-form recognition. Cross-language orthographic overlap did not facilitate processing for words studied in L2, possibly due to larger processing costs associated with L2 reading.
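To make the notion of cross-language orthographic overlap concrete, the following is a minimal sketch of one common way to quantify it: a length-normalized Levenshtein similarity, in the spirit of the Levenshtein (1966) and Schepens et al. (2012) entries in the reference list. The normalization by the longer word, the function names, and the example word pair are assumptions made for illustration; they are not taken from the study's stimuli or analysis scripts.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def orthographic_overlap(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1.0 for identical forms, 0.0 for no overlap.
    Assumption: distance is normalized by the length of the longer word."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical English-Spanish translation pair, not one of the study's items.
print(orthographic_overlap("lagoon", "laguna"))
```

Under this sketch, higher values indicate translation pairs whose written forms are more alike, which is the kind of form-based cue discussed above.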

Taken together, these findings suggest that while varied semantic exposure supports recall in familiar contexts (Experiment 1), it may not enhance generalization unless learners also develop deep, flexible lexical-semantic representations, which may require additional support, such as extended exposure or explicit semantic processing. The reduced performance in Experiment 2 further highlights the challenges of generalizing word knowledge to novel contexts in the absence of semantic retrieval cues, particularly in L2.

These results align with the instance-based framework of learning, in which each exposure contributes to a distinct memory trace. Varied semantic contexts across exposures increase the number and distinctiveness of these traces, but this benefit appears to be contingent on the availability of retrieval cues at test. Simply studying in varied semantic contexts may not be sufficient for integration of lexical-semantic networks, particularly when cognitive demands are above the threshold of desirable difficulty, such as recalling words studied in the L2 in completely novel semantic contexts.

Finally, the findings related to the language contexts in which word learning occurs suggest that increased cognitive effort associated with L2 processing could lead to deeper encoding of each instance, resulting in stronger memory traces, particularly when semantic retrieval cues are available (Experiment 1). However, this increased cognitive effort could have overburdened learners during the retrieval process, particularly in the absence of semantic retrieval cues (Experiment 2).

Limitations

While the present study offers important insights into bilingual word learning, several limitations should be acknowledged. First, the word-stem completion task primarily assessed form-based retrieval, particularly in Experiment 1. Consequently, the findings should be interpreted as reflecting the influence of semantic and language context on the recall of word forms embedded in sentence contexts. This design may not have fully captured the development of robust lexical-semantic representations. Both experiments provided semantic cues during post-test, even when words in Experiment 2 were tested in novel contexts. A free-recall or meaning-generation task may be able to better assess the degree to which learners develop a robust lexical-semantic network of the recently acquired words.

Second, the relatively short learning phase may have limited the extent to which participants encoded relevant information to form robust lexical-semantic networks of the target words. Extended exposure or more explicit semantic processing during learning may be necessary to facilitate generalization of semantic knowledge. For example, a study on incidental word learning found that L2 learners needed between 12 and 20 repetitions to recall the meanings of novel words (Elgort & Warren, 2014). However, tasks that require recognition of form and meaning may require significantly less exposure than more demanding tasks, such as form and meaning recall or production tasks (Peters & Webb, 2018).

Third, although language dominance was considered, the study did not directly assess or control for L2 proficiency or vocabulary size, both of which likely influence learning outcomes. Individual differences in proficiency and vocabulary size impact how bilinguals encode and retrieve lexical-semantic representations (Uchihara et al., 2019; Zhang & Zhang, 2022). Although some of this variability was accounted for through the random effects structure in the mixed-effects models, future work should include systematic assessments of proficiency in order to determine how it interacts with language and semantic variation during the learning process. Relatedly, the operationalization of L1 and L2 based on subjective self-reported dominance may not fully capture nuanced variability in bilingual proficiency. This method of operationalizing language dominance is nonetheless not uncommon. For example, one study found that more than half of 140 studies relied exclusively on participants’ language proficiency questionnaire data (Hulstijn, 2012). Moreover, subjective self-ratings have been shown to be consistent with objective assessments of language proficiency (e.g., Sheng et al., 2014).

6. Conclusions

This study examined how semantic and language context affect bilingual novel word learning, using word-stem completion tasks embedded in sentence contexts. Findings from Experiment 1 support the context variability hypothesis, showing that varied semantic exposure enhances recall, particularly for words with lower orthographic overlap, when the test contexts match those used during learning. However, these benefits did not extend to novel test contexts in Experiment 2, highlighting the challenges of generalizing word knowledge in the absence of semantic retrieval cues. The results further suggest that studying in L2, though more demanding, may promote deeper encoding under certain conditions, consistent with the concept of desirable difficulty. Yet, this effect appears limited when learners are required to generalize meaning across unfamiliar contexts, particularly without sufficient exposure or support.

Taken together, these findings emphasize the importance of both semantic diversity and linguistic context in bilingual word learning. They also point to cognitive and linguistic constraints that may limit learners’ ability to apply new vocabulary across contexts. Future research should explore how increasing exposure duration, promoting active retrieval, or tailoring semantic support to learners’ proficiency levels might enhance the flexible use of newly acquired words.

Author Contributions

Conceptualization, J.L.; methodology, J.L.; software, J.L. and P.F.P.T.; validation, J.L. and P.F.P.T.; formal analysis, J.L. and P.F.P.T.; investigation, J.L. and P.F.P.T.; resources, J.L. and P.F.P.T.; data curation, J.L. and P.F.P.T.; writing—original draft preparation, J.L.; writing—review and editing, J.L. and P.F.P.T.; visualization, J.L.; supervision, J.L.; project administration, J.L.; funding acquisition, P.F.P.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded, in part, by a Fulbright Scholar Grant awarded to Pamela Toassi to support her as a Fulbright Scholar at Barry University.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Barry University (protocol code 1715119, approved on 19 September 2024).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original data presented in this study are openly available in the Open Science Framework (OSF) at https://osf.io/68ajb/?view_only=7374066055a94fa991f873b93a225d52 (accessed on 20 June 2025).

Conflicts of Interest

The authors declare no conflict of interest.

References

Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics using R (8th printing). Cambridge University Press.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In D. S. Dunn (Ed.), Psychology and the real world: Essays illustrating fundamental contributions to society (pp. 56–64). Worth Publishers.
Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe, & A. P. Shimamura (Eds.), Metacognition (pp. 185–206). MIT Press.
Bjork, R. A., & Bjork, E. L. (2020). Desirable difficulties in theory and practice. Journal of Applied Research in Memory and Cognition, 9(4), 475–479.
Bogulski, C. A., Bice, K., & Kroll, J. F. (2019). Bilingualism as a desirable difficulty: Advantages in word learning depend on regulation of the dominant language. Bilingualism: Language and Cognition, 22(5), 1052–1067.
Bolger, D. J., Balass, M., Landen, E., & Perfetti, C. A. (2008). Context variation and definitions in learning the meanings of words: An instance-based learning approach. Discourse Processes, 45(2), 122–159.
Bradley, K. A. L., King, K. E., & Hernandez, A. E. (2013). Language experience differentiates prefrontal and subcortical activation of the cognitive control network in novel word learning. NeuroImage, 67, 101–110.
Brysbaert, M., & Duyck, W. (2010). Is it time to leave behind the revised hierarchical model of bilingual language processing after fifteen years of service? Bilingualism: Language and Cognition, 13(3), 359–371.
Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990.
Costa, A., Miozzo, M., & Caramazza, A. (1999). Lexical selection in bilinguals: Do words in the bilingual’s two lexicons compete for selection? Journal of Memory and Language, 41(3), 365–397.
Cristoffanini, P., Kirsner, K., & Milech, D. (1986). Bilingual lexical representation: The status of Spanish-English cognates. Quarterly Journal of Experimental Psychology, 38(3), 367–393.
Cuetos, F., Glez-Nosti, M., Barbón, A., & Brysbaert, M. (2011). SUBTLEX-ESP: Spanish word frequencies based on film subtitles. Psicológica, 32, 133–143.
De Groot, A. M. (1992). Determinants of word translation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18(5), 1001–1018.
Dijkstra, T., Timmermans, M., & Schriefers, H. (2000). On being blinded by your other language: Effects of task demands on interlingual homograph recognition. Journal of Memory and Language, 42(4), 445–464.
Dijkstra, T., & Van Heuven, W. J. B. (2002). The architecture of the bilingual word recognition system: From identification to decision. Bilingualism: Language and Cognition, 5(3), 175–197.
Elgort, I. (2011). Deliberate learning and vocabulary acquisition in a second language. Language Learning, 61(2), 367–413.
Elgort, I., Brysbaert, M., Stevens, M., & Van Assche, E. (2018). Contextual word learning during reading in a second language: An eye-movement study. Studies in Second Language Acquisition, 40(2), 341–366.
Elgort, I., & Warren, P. (2014). L2 vocabulary learning from reading: Explicit and tacit lexical knowledge and the role of learner and item variables. Language Learning, 64(2), 365–414.
Eskenazi, M. A., Swischuk, N. K., Folk, J. R., & Abraham, A. N. (2018). Uninformative contexts support word learning for high-skill spellers. Journal of Experimental Psychology: Learning, Memory, and Cognition, 44(12), 2019–2025.
Godfroid, A., Ahn, J., Choi, I., Ballard, L., Cui, Y., Johnston, S., Lee, S., Sarkar, A., & Yoon, H.-J. (2018). Incidental vocabulary learning in a natural reading context: An eye-tracking study. Bilingualism: Language and Cognition, 21(3), 563–584.
Hulstijn, J. H. (2012). The construct of language proficiency in the study of bilingualism from a cognitive perspective. Bilingualism: Language and Cognition, 15(2), 422–433.
Lauro, J., & Schwartz, A. I. (2017). Bilingual non-selective lexical access in sentence contexts: A meta-analytic review. Journal of Memory and Language, 92, 217–233.
Lauro, J., Schwartz, A. I., & Francis, W. S. (2020). Bilingual novel word learning in sentence contexts: Effects of semantic and language variation. Journal of Memory and Language, 113, 104123.
Lauro, J., Toassi, P. F. P., & Arêas da Luz Fontes, A. B. (in press). Bilingual word learning: Recognizing novel words in context. Journal of Learning and Instruction.
Lenth, R. V. (2017). emmeans: Estimated marginal means, aka least-squares means. Version 1.11.1, R package. R Team.
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady, 10, 707–710.
MacLeod, C. M., & Kampe, K. E. (1996). Word frequency effects on recall, recognition, and word fragment completion tests. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22(1), 132–142.
Marian, V., & Neisser, U. (2000). Language-dependent recall of autobiographical memories. Journal of Experimental Psychology: General, 129(3), 361–368.
Masson, M. E. J. (1995). A distributed memory model of semantic priming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21(1), 3–23.
McRae, K., De Sa, V. R., & Seidenberg, M. S. (1997). On the nature and scope of featural representations of word meaning. Journal of Experimental Psychology: General, 126(2), 99–130.
Nelson, J. R., Balass, M., & Perfetti, C. A. (2005). Differences between written and spoken input in learning new words. Written Language & Literacy, 8(2), 25–44.
Perfetti, C. A., & Hart, L. (2001). The lexical basis of comprehension skill. In D. S. Gorfein (Ed.), On the consequences of meaning selection: Perspectives on resolving lexical ambiguity (pp. 67–86). American Psychological Association.
Peters, E., & Webb, S. (2018). Incidental vocabulary acquisition through viewing L2 television and factors that affect learning. Studies in Second Language Acquisition, 40(3), 551–577.
Plaut, D. C., & Booth, J. R. (2000). Individual and developmental differences in semantic priming: Empirical and computational support for a single-mechanism account of lexical processing. Psychological Review, 107(4), 786–823.
Reichle, E. D., & Perfetti, C. A. (2003). Morphology in word identification: A word-experience model that accounts for morpheme frequency effects. Scientific Studies of Reading, 7(3), 219–237.
Schepens, J., Dijkstra, T., & Grootjen, F. (2012). Distributions of cognates in Europe as based on Levenshtein distance. Bilingualism: Language and Cognition, 15(1), 157–166.
Sheng, L., Lu, Y., & Gollan, T. H. (2014). Assessing language dominance in Mandarin-English bilinguals: Convergence and divergence between subjective and objective measures. Bilingualism: Language and Cognition, 17(2), 364–383.
Toassi, P. F. P., Lauro, J., Da Silva Gadelha, L. M., & Carthery-Goulart, M. T. (2023). Effect of interlingual homographs and word frequency on bilingual lexical access. Ilha do Desterro: A Journal of English Language, Literatures in English and Cultural Studies, 76(3), 137–155.
Uchihara, T., Webb, S., & Yanagisawa, A. (2019). The effects of repetition on incidental vocabulary learning: A meta-analysis of correlational studies. Language Learning, 69(3), 559–599.
Zhang, S., & Zhang, X. (2022). The relationship between vocabulary knowledge and L2 reading/listening comprehension: A meta-analysis. Language Teaching Research, 26(4), 696–725.

Figure 1. Estimated probabilities of word-stem completion accuracy as a function of orthographic overlap, study language, and study semantic context conditions.

Figure 2. Estimated probabilities of word-stem completion accuracy as a function of study language, test semantic context, and orthographic overlap. Estimates calculated from the final model (Table 2).

Table 1. Participant Characteristics.

		Experiment 1	Experiment 2
Sample Size		48	44
Median Age		22 (3.51)	21 (6.48)
L1 AoA		4.73 (4.53)	2.73 (3.05)
L2 AoA		3.43 (5.06)	3.99 (5.33)
Subjective L1 Rating (1–7)		6.58 (0.82)	6.83 (0.48)
Subjective L2 Rating (1–7)		6.00 (1.24)	5.86 (1.19)
Way(s) in which L2 was acquired
	L2 Immersion	21 (44%)	16 (36%)
	With friends	26 (54%)	17 (39%)
	At home	25 (52%)	34 (77%)
	At school	40 (83%)	28 (64%)
	At work	7 (15%)	7 (16%)
	Self-taught	20 (42%)	12 (27%)

Note. Percentages may not sum to 100% due to participants selecting multiple acquisition methods.

Table 2. Logistic mixed-effects models for Experiment 1 and Experiment 2.

	Experiment 1			Experiment 2
	Estimate (SE)	z	p	Estimate (SE)	z	p
(Intercept)	−0.10 (0.21)	−0.43	0.67	−2.14 (0.38)	−5.69	<0.001
L1 vs. L2	0.28 (0.08)	3.66	<0.001	−0.19 (0.21)	−0.92	0.36
Single Language vs. Mixed Language	0.04 (0.06)	0.50	0.61	−0.02 (0.11)	−0.18	0.86
Semantic context	0.36 (0.18)	2.00	0.05	−0.15 (0.28)	−0.53	0.59
Test Context	--	--	--	−0.43 (0.74)	−0.58	0.56
Orthographic Overlap	0.07 (0.10)	0.66	0.51	−0.08 (0.17)	−0.49	0.62
L1 vs. L2 * Semantic Context	−0.15 (0.15)	−0.99	0.32	0.39 (0.31)	1.28	0.20
Single vs. Mixed language * Semantic Context	−0.06 (0.13)	−0.46	0.65	−0.09 (0.19)	−0.51	0.61
L1 vs. L2 * Test Context	--	--	--	−0.14 (0.39)	−0.37	0.71
Single vs. Mixed * Test Context	--	--	--	−0.02 (0.20)	−0.09	0.93
Semantic Context * Test Context	--	--	--	−0.12 (0.56)	−0.22	0.82
L1 vs. L2 * Orthographic Overlap	−0.05 (0.09)	−0.57	0.57	0.09 (0.20)	0.44	0.66
Single Language vs. Mixed Language * Orthographic Overlap	0.14 (0.07)	2.02	0.04	−0.05 (0.11)	−0.47	0.64
Semantic Context * Orthographic Overlap	0.02 (0.20)	0.11	0.91	−0.23 (0.33)	−0.68	0.49
Test Context * Orthographic Overlap	--	--	--	0.05 (0.33)	0.16	0.87
L1 vs. L2 * Semantic Context * Test Context	--	--	--	0.06 (0.61)	0.10	0.92
Single vs. Mixed * Semantic Context * Test Context	--	--	--	0.20 (0.37)	0.55	0.58
L1 vs. L2 * Semantic Context * Orthographic Overlap	−0.31 (0.17)	−1.82	0.07	0.04 (0.40)	0.10	0.92
Single vs. Mixed language * Semantic Context * Orthographic Overlap	0.10 (0.14)	0.72	0.47	−0.14 (0.21)	−0.68	0.50
L1 vs. L2 * Test Context * Orthographic Overlap	--	--	--	0.76 (0.40)	1.93	0.05
Semantic Context * Test Context * Orthographic Overlap	--	--	--	0.30 (0.66)	0.58	0.57
L1 vs. L2 * Semantic Context * Test Context * Orthographic Overlap	--	--	--	1.05 (0.79)	0.46	0.65
Single vs. Mixed * Semantic Context * Test Context * Orthographic Overlap	--	--	--	0.12 (0.42)	1.32	0.19

* Note. Semantic Context refers to the study phase context (repeated or varied); Test Context refers to the test phase context (repeated or varied).
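
Because the estimates in Table 2 are on the log-odds scale, a short worked conversion may help in reading them. The sketch below transforms the two intercepts into probabilities and two Experiment 1 estimates into odds ratios. It assumes that the predictors were centered or contrast-coded, so that each intercept approximates the average log-odds of a correct completion; the coding scheme is not stated in this section, so the resulting probabilities are only indicative. The variable and function names are illustrative.

```python
import math


def logistic(log_odds: float) -> float:
    """Inverse logit: convert a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Fixed-effect estimates copied from Table 2 (log-odds scale).
intercept_exp1 = -0.10
intercept_exp2 = -2.14
l1_vs_l2_exp1 = 0.28
semantic_context_exp1 = 0.36

# Assumption: with centered or contrast-coded predictors, the intercept
# approximates the average log-odds of a correct word-stem completion.
p_exp1 = logistic(intercept_exp1)
p_exp2 = logistic(intercept_exp2)

# Exponentiating a coefficient gives the multiplicative change in odds
# associated with a one-unit change in that predictor.
or_language = math.exp(l1_vs_l2_exp1)
or_context = math.exp(semantic_context_exp1)

print(f"Experiment 1 baseline accuracy: {p_exp1:.3f}")
print(f"Experiment 2 baseline accuracy: {p_exp2:.3f}")
print(f"Odds ratio, L1 vs. L2 (Exp. 1): {or_language:.2f}")
print(f"Odds ratio, semantic context (Exp. 1): {or_context:.2f}")
```

Under these assumptions, the intercepts correspond to baseline accuracies of roughly 0.48 in Experiment 1 and 0.11 in Experiment 2, which matches the marked drop in performance described in the text.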

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

