The Complexity in Bilingual Code-Switching Research: A Systematic Review

This systematic review explored how researchers operationalized bilingualism when investigating the relationship between bilingual code-switching experience and cognition. Through a PRISMA-guided systematic review of thirty-two studies with original data, published in English, focusing on adult non-clinical samples, with bilingualism as a key variable, we aimed to understand the prevalence of these issues. Criteria for inclusion required an assessment of bilingualism beyond language proficiency or age of acquisition, and consideration of naturalistic code-switching behaviors. We report our results through an analysis of themes that included aspects of language that are considered when measuring bilingualism and code-switching experience. We present our findings and offer insights for future research, advocating for the inclusion of sociocultural factors and more complex analytical modeling in bilingualism research to foster an evolution in the field.


Introduction
Bilingualism is a norm around the world, with current estimates placing over half of the world's population as speaking two or more languages (Grosjean 2010). Research into the effects of bilingualism on the developing mind has been a topic of study since at least the late 1800s, with views from early work captured by the following excerpt: "If it were possible for a child or boy to live in two languages at once equally well, so much the worse. His intellectual and spiritual growth would not thereby be doubled but halved" (Laurie 1890, p. 16). The prevailing attitude that learning a second language during childhood was a detriment was the norm until Peal and Lambert (1962) conducted a study that controlled for factors that were responsible for the negative findings (second language type, level of bilingualism, and sociocultural status).
From then until the early 2010s, studies generally reported that bilingualism had a positive effect on cognition (for a review, see Bialystok 2011). A resurgence in the prevalence of bilingualism research was spurred by the finding that bilinguals simultaneously activate both of their languages regardless of the language that they speak (Kroll et al. 2014; Thierry and Wu 2007), resulting in a flurry of studies investigating the cognitive ramifications of bilingualism (Kroll et al. 2015). Early results from this period of research suggested that bilingualism conferred a large "bilingual advantage" across a variety of cognitive domains such as attentional control and working memory, and extended to higher-order skills such as metalinguistic awareness and abstract and symbolic thinking (Adesope et al. 2010). However, recent attempts to replicate these initial findings have shown effects that are small or null, leading some researchers to claim that either there is no bilingual advantage, or if there is, then the circumstances in which it exists are "very specific and undetermined" (Paap et al. 2015). Since then, a softer form of the bilingual advantage hypothesis has emerged (Woumans and Duyck 2015), with a renewed focus on identifying the specific circumstances under which bilingualism may affect cognition (de Bruin 2019).

Complexity in Defining the Bilingual Experience
One possible reason for the mixed results in this field is that researchers studying bilingualism have often treated language experience as a binary variable, categorizing individuals as either "bilingual" or "monolingual" (de Bruin 2019). The issue with grouping all bilinguals together for straightforward comparisons with monolinguals is that individuals do not share the exact same set of experiences. Bilinguals are a diverse and multifaceted group, differing in the languages they speak, the way they acquired those languages, and the contexts in which they use them (Luk and Bialystok 2013). These variations in bilingual experiences have been hypothesized to influence the cognitive effects of bilingualism, resulting in different cognitive outcomes among bilingual individuals (Green and Abutalebi 2013).

Our Systematic Review
Although research on bilingualism has been conducted for many decades (Hakuta 1986), current theoretical advances in code switching (Green and Abutalebi 2013; Green and Wei 2014), recent calls for culturally informed and culturally sensitive research (Marian and Hayakawa 2021), and the proposal of an experience-dependent bilingual advantage (Blanco-Elorrieta and Pylkkänen 2018) have triggered an increase in studies looking at when/where/how bilinguals use their languages in their daily lives. The research on the effects of bilinguals' language-switching experience on cognitive outcomes suggests that there are nuanced differences around how bilingualism is measured and operationalized. In this systematic review, we provide a brief overview of the current ways in which researchers are studying and operationalizing the relationship between bilingualism, naturalistic language use, code switching, and cognition. We review some of the findings and limitations in current research, and highlight potential avenues that have been proposed in order to address some of those concerns. Furthermore, methodology for assessing code switching has not been reviewed to assess commonalities across language questionnaires, strengths, and limitations across the most common operationalizations used for code switching. Therefore, we conducted a systematic review of the last decade's (2012–2023) literature to view the innovations and integrations of current theoretical advances and paradigms on code switching and cognitive research for bilingual populations. Through our review, we aimed to answer the following research questions:
• What aspects of language are being considered when measuring bilingualism?
• How is language use/code switching being measured/operationalized?
We sought to identify the ways in which bilingualism has been measured, what the field is missing, why it is important for us to consider the missing elements, and hopefully identify some potential avenues that allow us to consider the complex and nuanced interactions between language and the contexts in which we use them.

Search Strategy and Inclusion Criteria
We conducted our systematic review using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement method and the PRISMA checklist (Page et al. 2021). Specifically, the PRISMA method provides a clear and efficient process that guides the researcher through a systematic review and has been used effectively for other reviews on bilingualism research (e.g., Giovannoli et al. 2020; Van den Noort et al. 2019). Based on previous systematic reviews' search methods for bilingualism (Adesope et al. 2010), we performed our literature search using the databases of EBSCO, PsycINFO, and Web of Science, searching for peer-reviewed articles published between 2012 and 2023. The search was conducted in August 2023. We used different combinations of our keywords to maximize our search findings: "bilingual*" AND "switch*" AND "cognit*". Our initial search criteria retrieved 1620 published articles combined, and after eliminating duplicates, a total of 813 articles were kept for possible inclusion.
The inclusion criteria were (1) the study being fully published in English, (2) spoken bilingualism being a main variable in the study, (3) the sample being focused on adults (18 years or older), (4) the study not using a clinical sample, (5) the study being based on original data, (6) the study measuring at least one non-verbal cognitive outcome, and (7) the methods and results indicating an assessment of naturalistic code-switching behavior. The first author of this project checked all 813 papers' abstracts for possible inclusion; 519 papers were eliminated because the abstract content did not indicate a focus on our main variables. The remaining 294 articles were fully read and screened separately, following our inclusion criteria, by the first and last authors, who met weekly for consensus on articles that met the eligibility criteria for this study.
The process of elimination followed previous similar literature reviews looking for mention or evidence (or absence) of each criterion (Adesope et al. 2010; Giovannoli et al. 2020; Van den Noort et al. 2019). Zotero citation manager software (6.0.30) was used for organizational purposes (Corporation for Digital Scholarship 2023). Figure 1 provides a full breakdown of the articles eliminated for not meeting each inclusion criterion. Of the articles reviewed, we found that 30 met the full criteria for our systematic review. As an additional step suggested by the PRISMA checklist (Page et al. 2021), we conducted a forward search, cross-checking the references for the 30 included articles for any relevant work, and found 2 additional articles that met our inclusion criteria (Gullifer et al. 2018; Pot et al. 2018). Thus, our systematic review included a final sample of 32 published articles.
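The screening counts reported above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python; the variable names are illustrative, and the only inputs are the counts stated in the text.

```python
# Counts as reported in the text; variable names are illustrative only.
records_identified = 1620          # initial search results across databases
after_deduplication = 813          # records kept after removing duplicates
excluded_at_abstract = 519         # eliminated during abstract screening
included_after_full_text = 30      # met all criteria after full-text screening
added_from_reference_check = 2     # found by cross-checking reference lists

# Derived quantities at each screening stage.
duplicates_removed = records_identified - after_deduplication
full_text_screened = after_deduplication - excluded_at_abstract
excluded_at_full_text = full_text_screened - included_after_full_text
final_sample = included_after_full_text + added_from_reference_check

print("duplicates removed:", duplicates_removed)
print("full texts screened:", full_text_screened)
print("excluded at full text:", excluded_at_full_text)
print("final sample:", final_sample)
```

The derived values agree with the 294 full-text articles and the final sample of 32 reported in the text.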

Data Extraction
The included studies were reviewed using a synthesis matrix where we tabulated study background information such as authors' names, sample characteristics, and methodology (see Table 1). While our systematic review did not conduct a traditional thematic analysis, we followed the recommended guidelines for thematic analysis during our review (Braun and Clarke 2006) to address specific themes. Given our research questions and study focus, two of the authors fully reviewed each article and separately summarized information about methodology and results for bilingualism and code switching as our main themes. Both researchers met after reviewing the first 10 articles (about 32%; sorted in alphabetical order by the first author) to check the initial inter-rater reliability and reduce the risk of bias for these themes. Following McHugh's (2012) recommendations for calculating the inter-rater reliability's Kappa statistics, we calculated each percentage by dividing the number of summaries matching between both researchers by the total number of summaries completed at each stage. Inter-rater reliability was primarily tested to increase rigor in our review analysis as well as establish replicability (Syed and Nelson 2015). The first inter-rater reliability test indexed by Cohen's kappa (K) was high (K = 0.92). A second meeting was held to discuss the findings on the remaining 22 articles to ensure that agreement remained consistent, and high inter-rater reliability was once again achieved (K = 0.87). The two researchers resolved any disagreements by holding discussions until consensus was reached.
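To make the two agreement statistics mentioned above concrete, the sketch below computes percent agreement and Cohen's kappa for two raters. It is an illustration only: the category labels and ratings are hypothetical, not the review's data, and the functions are not the authors' actual procedure.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters assigned the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of items.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions per label.
    labels = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[label] * counts_b[label] for label in labels) / n**2
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical theme labels for ten summaries (not data from the review).
rater_a = ["freq", "freq", "type", "context", "type",
           "freq", "context", "type", "freq", "context"]
rater_b = ["freq", "freq", "type", "context", "type",
           "freq", "context", "type", "freq", "type"]

print(percent_agreement(rater_a, rater_b))
print(round(cohens_kappa(rater_a, rater_b), 3))
```

Kappa is lower than raw percent agreement whenever some agreement would be expected by chance, which is why the two statistics are usually reported together.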

Results and Discussion
Our search drew 813 articles, of which 32 studies fully met our inclusion criteria for a systematic review focused on the relationship between bilingualism, code-switching practices, and cognition (for a list of included articles, see Table 1). We summarized our results based on the main characteristics across these studies, the specific methodology for assessing bilingualism, code switching and cognitive outcomes, implications, and limitations. An initial important finding in our review was that a large portion of the bilingualism literature over the last decade has been centered around just two aspects of bilingualism: self-reported age of acquisition and language proficiency. Specifically, 264 articles were excluded from our review because they did not take into account naturalistic code-switching behaviors in their operationalization of bilingualism. This suggests that much of the experimental research on the relationship between code switching and executive functions, which traditionally involves laboratory tasks designed to assess participants' language-switching abilities, does not account for how bilinguals regularly use or change between their languages.

How Is Bilingualism Being Measured in Research?
One of the main goals of our review was to explore what aspects of bilingualism are considered when assessing this variable in research. Across all 32 articles included in this review, we found that bilingualism is typically assessed using measures such as self-reported language history or language experience questionnaires, self-rated proficiency tests, vocabulary tests (general, receptive/productive), verbal fluency tasks, and picture- or image-naming tasks.
Every study in our review used some form of self-report language measure to capture language characteristics of interest such as self-reported proficiency, age of acquisition (AoA), or language use. We found that the majority, 62.2% of studies, used some type of standardized self-reported questionnaire to assess language history and language experiences. Among the most common questionnaires were the Language History Questionnaire, 47.8% (LHQ; Li et al. 2014, 2020); the Language Experience and Proficiency Questionnaire, 39.1% (LEAP-Q; Marian et al. 2007); and the Language Social and Background Questionnaire, 17.3% (LSBQ; Anderson et al. 2018). All of these questionnaires seek information about participants' self-rated proficiency, age of acquisition, contextual language use, language exposure over their lifetime, and media consumption habits. Of note is that all three questionnaires cover traditional media such as reading, TV, and music, but the LHQ and LSBQ ask additional questions regarding social media and internet use. Additionally, the LHQ and the LSBQ ask participants to estimate the frequency of mixed language use in conversation and the language partners that are involved in this language switching (parents/family, friends; LHQ: coworkers/roommates/others; LSBQ: social media). The LEAP-Q was the only survey questionnaire that did not ask participants about mixed language use, although the authors state they are considering including questions regarding code-switching practices and exposure into future iterations of the questionnaire (Kaushanskaya et al. 2020).
Language proficiency. Language proficiency is often defined as a measure of a bilingual's ability to comprehend, speak, read, and write in a language. It is often the primary measure used to determine whether or not an individual is considered bilingual (Bedore et al. 2012; Blumenfeld and Marian 2007; Luk and Bialystok 2013). Proficiency is often measured via self-reported proficiency in L1 and L2 or via objective measures of language proficiency such as lexical decision, language fluency, and sentence completion tasks (de Bruin 2019; Surrain and Luk 2017). Proponents of self-assessment generally use questionnaires because of the relative ease with which individuals can reflect on their proficiency in understanding, speaking, reading, and writing (Treffers-Daller 2018). Support for the use of these questionnaires comes in the form of the moderately high correlation (r = 0.63) between a participant's subjective perception of language proficiency and their actual performance on objective language proficiency measures (Zell and Krizan 2014). Overall, self-report language questionnaires were one of the most common measures used to assess bilingualism.
Proficiency has been used to dichotomize participants as being either balanced or unbalanced bilinguals. Balanced bilinguals include those who have/report similar levels of language proficiency across both languages, whereas unbalanced bilinguals report a difference in their language proficiencies. In a survey of the field between 2005 and 2015, Surrain and Luk (2017) found that bilinguals are commonly operationalized in terms of their second language proficiency. However, the development of language proficiency is a dynamic process and can be influenced by other language-related factors such as age of acquisition; language exposure; and language use, whether frequency-, activity-, or setting-based, or by context-based switching (De Houwer and Ortega 2018).
Age of acquisition. In the context of bilingualism, it stands to reason that the more a person hears and interacts with others in a second language (L2), the greater their subsequent language skills (language proficiency) will be (e.g., reading, speaking, writing, and listening skills). According to Bronfenbrenner and Ceci's (1994) Bioecological Model, human development occurs directly as a result of an individual's interactions with their immediate environment. In order for those reciprocal interactions to effectively shape neural outcomes, the interactions must occur with regularity and over extended periods of time. Consequently, the earlier that these interactions occur, the greater their potential influence on subsequent development. As a result, age of acquisition is a common measure in bilingualism research since the age at which a person acquires their second language plays a significant role in their subsequent opportunities to engage in it. Researchers typically use AoA to divide their participants into groups based on years of language experience. AoA can be used to compare the temporal order of when the language was learned, referred to as simultaneous bilinguals when both languages are learned concurrently or as sequential bilinguals when one language is learned after the first has already developed (McLaughlin 1977). It can also be used to delineate the specific time at which the languages were learned, where sequential bilinguals can either be considered early if the second language was acquired before the age of six or late when acquired any time after early childhood (De Houwer and Ortega 2018).
Historically, the field of bilingualism research has often relied on a combination of language proficiency and age of acquisition to formulate an index for bilingualism. Under this framework, individuals who acquired a second language early in life were typically considered "balanced" bilinguals, while those who began learning a language later were often deemed unlikely to achieve native-like fluency. However, there is a problem with this perspective. It operates under the assumption that early exposure to a language almost certainly results in mastery. Contemporary researchers in bilingualism emphasize that this is an oversimplified view. Numerous sociolinguistic factors can play a pivotal role in determining whether an individual not only achieves, but also retains bilingual proficiency (Singleton and Pfenninger 2018); we describe some below.
Language Use. The Adaptive Control Hypothesis (ACH; Green and Abutalebi 2013) argues that the cognitive impacts of bilingualism largely depend on when and how bilinguals switch between languages in different interactional contexts throughout their lives. The ACH specifies three distinct interactional contexts: a single-language context (SLC), a dual-language context (DLC), and a dense code-switching context (DCS). In the SLC, one language is exclusively used in a particular setting (e.g., L1 exclusively used at home, L2 exclusively used at work/school). In this context, the environment itself serves as a cue for the appropriate language selection. As a result, the researchers hypothesize that this pattern of language use trains the use of proactive control in order to maintain the current language while monitoring for conflict and preventing interference from the language they are not currently using. Conversely, in the DLC, bilinguals use two or more languages in a single context. They may switch languages based on their conversational partner within the same context or engage in intersentential code switching with a single conversational partner. For instance, they might converse in L1 with one coworker and in L2 with another.
According to the ACH, the DLC not only hones an individual's ability to discern environmental cues for language selection but also reinforces the proactive control processes found in the SLC. Moreover, the frequent alternation between languages enhances the efficiency of the task-switching system, facilitating smoother transitions between tasks. Lastly, the DCS is characterized by bilinguals frequently engaging in conversations with other bilinguals, seamlessly transitioning between languages within a single utterance. Although the hypothesized contexts in the ACH are based on patterns of language use, they also invoke the construct of code switching. A potential confound highlights the need to distinguish between language use and code switching when considering interactional contexts. When studies find that a person's dual-language context (DLC) experience is related to cognitive outcomes, the observed association could result either from switching between languages with distinct conversational partners or from a specific form of code switching (intersentential) with a single conversational partner.

How Is Code Switching Being Measured in Bilingualism Research?
The next question we addressed in our study was "how are researchers assessing code switching?" Code switching refers to the ability of bilingual individuals to fluidly switch between their languages in response to internal and/or environmental cues (Beatty-Martínez et al. 2020). Throughout the literature, this construct is referenced by a variety of names (e.g., language switching, language mixing, or code mixing) (Rodriguez-Fornells et al. 2012). This variability extends to the definitional boundaries between code switching and code mixing. While some researchers assert distinct definitions, attributing language switches within sentences and language switches between sentences to code mixing and code switching, respectively, a substantial portion of the academic discourse uses "code switching" as an overarching term to include all forms of bilingual language use, blurring traditional distinctions (Ritchie and Bhatia 2012). One of the key findings from our study is that the operationalization of these code-switching measures varies substantially across studies. Some studies focused on the frequency of code switches, while others considered the different types of code switches bilingual users engage in. Others account for the interactional contexts in which code switching is present, and some utilize questions assessing various combinations of the three.
Initially, code-switching research investigated the effects of an individual's code-switching frequency. In contrast to prior work that had found a moderating effect of L2 proficiency on cognitive control, Verreyt et al. (2016) found that it was language-switching frequency (e.g., "How often are you in a situation in which you switch languages") instead of L2 proficiency that moderated the associations with cognitive control assessed via Flanker and Simon tasks. Xie (2014) found that language-switching frequency was negatively related to task-switching performance, whereas no performance differences were found between low and high L2 proficiency levels. Barbu et al. (2018) compared high-frequency switchers to low-frequency switchers and found that high-switching bilinguals switch between tasks faster than low-switching bilinguals. Jiao et al. (2019) and Sanchez-Azanza et al. (2020) created a composite language-switching frequency variable by averaging participants' responses across the distinct code-switching types measured in the Bilingual Switching Questionnaire (BSWQ; Rodriguez-Fornells et al. 2012). Jiao et al. (2019) found no relationship between switching and cognitive outcomes, whereas Sanchez-Azanza et al. (2020)'s latent code-switching variable was associated with the latent Inhibition, Updating, and Shifting executive control variables in different directions. A problem with only accounting for code-switching frequency is that the cognitive effects of switching between languages can depend on the intentionality of the switch, whether the switch occurs within or between sentences, and even on the amount or type of words being switched (Festman 2012; Zeller 2020).
The Bilingual Switching Questionnaire (BSWQ; Rodriguez-Fornells et al. 2012) evaluates individual variations in language switching, focusing on four distinct types of code switching, each with its unique association with cognitive control mechanisms. These types include switching from Language 1 to another language (L1S), switching from Language 2 to another language (L2S), context-dependent switching (CS), and unintended switching (US). The BSWQ was used and adapted to various languages in 12 of the 32 papers we reviewed (Beatty-Martínez et al. 2020; Chan et al. 2020; Han et al. 2022; Hartanto and Yang 2020; Jiao et al. 2019; Jylkkä et al. 2017; López-Penadés et al. 2020; Ooi et al. 2018; Pot et al. 2018; Sanchez-Azanza et al. 2020; Thanissery et al. 2020). While the BSWQ seems to be the most common measure for these types of code switching, other measures were used to further delineate different forms of code switching.
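As an illustration of the composite switching-frequency variable described earlier (averaging responses across the BSWQ switching types), the sketch below averages four subscale scores for one participant. The subscale abbreviations come from the text; the scores, the function name, and the equal weighting are assumptions made for the example and may differ from how individual studies built their composites.

```python
from statistics import mean

# Subscale abbreviations as given in the text for the BSWQ.
BSWQ_SUBSCALES = ("L1S", "L2S", "CS", "US")


def composite_switching_score(subscale_scores):
    """Average the four subscale scores into one frequency composite.

    Equal weighting of subscales is an assumption of this sketch.
    """
    missing = [name for name in BSWQ_SUBSCALES if name not in subscale_scores]
    if missing:
        raise ValueError(f"Missing subscale scores: {missing}")
    return mean(subscale_scores[name] for name in BSWQ_SUBSCALES)


# Hypothetical subscale scores for a single participant.
participant = {"L1S": 3.0, "L2S": 2.5, "CS": 4.0, "US": 1.5}
composite = composite_switching_score(participant)
print(composite)
```

A single composite is convenient for regression models, but it discards the distinction between switching types, which is the limitation the surrounding text raises about frequency-only measures.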
Another way in which code switching can be characterized is whether the language switch occurs within or between sentences (Muysken 2013). Intersentential code switching occurs between sentences, where a complete sentence in one language is followed by a new sentence in another language. On the other hand, intrasentential code switching involves a seamless switch within a single sentence or utterance, without any pause or hesitation. Based on an examination of sociolinguistic corpora, Muysken (2000) outlined three distinct types of within-sentence (intrasentential) code switching: insertion, alternation, and dense code switching. Insertion involves taking an element from one language and placing it into the morphosyntactic frame of the other. For example, an English-dominant frame: "I need to finish my trabajo (work) by tomorrow." Alternation involves a smooth shift from one language to another at the end of a clause. E.g., "I was reading a fascinating book, y de repente, me di cuenta que casi eran las tres de la mañana (and suddenly I realized it was almost three AM)." Dense code switching (also known as congruent lexicalization) involves the use of shared grammatical structures and concepts between the two languages; in practice, this means that the speakers can freely choose the word and language that best matches their communicative goal. E.g., "Ayer fuí al mercado to buy some frutas, pero estaban way too caras (I went to the market yesterday to buy some fruits but they were way too expensive)."
Three studies (Hofweber et al. 2016, 2020a, 2020b) presented their participants with examples of various code-switching types: (1) insertion of English into German, (2) insertion of German into English, (3) alternation, and (4) dense code switching, and had their participants provide frequency judgments regarding their social networks' usage of the different types of code switches.
These measures were often used in conjunction with language use measures such as the Revised Bilingual Interactional Context Questionnaire (RCSICQ; Hartanto and Yang 2020) or Patterns of Language Use Questionnaire (PLUQ; Kałamała et al. 2020) in order to obtain a more accurate portrayal of the proportion of time that individuals use their languages across a variety of contexts (SLC/DLC/DCS). Kałamała et al. (2020) adapted the Patterns of Language Use Questionnaire (PLUQ) and the Code-switching and Interactional Context Questionnaire (CSICQ; Hartanto and Yang 2016) to measure both intersentential and intrasentential code switching. Kharkhurin and Wei (2015) developed for their own study the Code-Switching Attitudes and Behaviors Questionnaire, where participants would rate their code-switching behavior, emotional state, and contexts when doing so. Carter et al. (2023) adapted their own hybrid Switching Experience and Environments Questionnaire (SEEQ) using questions from the BSWQ about types of code switching, the interactional contexts, and language use in the Adaptive Control Hypothesis (Green and Abutalebi 2013) across four distinct social contexts from Hartanto and Yang (2016)'s Code-Switching and Interactional Contexts Questionnaire.
Four studies (Carter et al. 2023; Gullifer et al. 2018; Kałamała et al. 2020; van den Berg et al. 2022) used Gullifer et al. (2018)'s concept of language entropy in order to represent the within-person variance across an individual's context-based language use. They used the concept of Shannon entropy from information theory in order to mathematically formulate the overall balance of language use across communicative contexts. Low entropy scores are generally indicative of an individual who experiences very little uncertainty in what language is needed in a particular context (e.g., they pretty much exclusively use English at work/school and Spanish at home). High entropy scores generally reflect an individual who spends most of their time in contexts where there is a high diversity of language use. This measure is meant to address some of the shortcomings of alternative approaches that make the assumption that bilinguals spend all of their time in one of the contexts specified in the ACH. Supporting the use of this measure, and of bilingualism as a continuum, studies have found that participants' language entropy scores moderate the degree of functional connectivity between the language and control networks of the brain (Carter et al. 2023; Li et al. 2021; Sulpizio et al. 2020). The remaining studies created their own questionnaires to include a selection of the items measured. Some researchers such as Jylkkä et al. (2020) question the reliability and validity of self-report code-switching questionnaires and propose the use of Ecological Momentary Assessments (EMAs) as a viable alternative. They argue that EMAs or repeated measurements over the course of a two-week period can provide a more representative measure of naturalistic code-switching behavior than a cross-sectional time point. Most of the studies in our review used subjective measures of code switching, although a few utilized objective measures.
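The language entropy measure described above can be illustrated with the standard Shannon entropy formula, H = -Σ p_i log2(p_i), where p_i is the proportion of language use for language i within a given context. The sketch below is a minimal illustration under that assumption; the function name and the usage proportions are hypothetical, and individual studies may compute or aggregate entropy differently.

```python
import math


def language_entropy(proportions):
    """Shannon entropy (in bits) of language-use proportions in one context.

    `proportions` holds one non-negative value per language; values are
    normalised to sum to 1 before the entropy is computed.
    """
    if any(p < 0 for p in proportions):
        raise ValueError("Proportions must be non-negative.")
    total = sum(proportions)
    if total <= 0:
        raise ValueError("At least one proportion must be positive.")
    probabilities = [p / total for p in proportions]
    # Terms with p = 0 contribute nothing (the limit of p * log2(p) is 0).
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return abs(entropy)  # abs() only normalises -0.0 to 0.0


# Hypothetical two-language usage proportions for three contexts.
home = language_entropy([1.0, 0.0])      # one language only: no uncertainty
work = language_entropy([0.9, 0.1])      # mostly one language
friends = language_entropy([0.5, 0.5])   # fully balanced use

print(home, round(work, 3), friends)
```

With two languages the score ranges from 0 (a fully compartmentalized, single-language context) to 1 (perfectly balanced use), which is what allows bilingual experience to be treated as a continuum rather than a category.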
In contrast to studies that use self-report measures, some studies experimentally manipulated the languages being used across trials. Similar to broader cognitive work examining task switching, these studies (Beatty-Martínez et al. 2020; Jylkkä et al. 2020) focused on measures in terms of switch or mixing costs. In bilingualism work, both switching and mixing costs refer to the cognitive costs associated with switching between two languages in an experimental task. They are measured by comparing the performance speed and accuracy in mixed-language conditions (where the individual must switch back and forth between languages) versus single-language conditions. A switch cost is the specific cost of changing languages on a given trial, while the mixing cost is the general cost associated with being in a context (block) where such changes can occur. For instance, a bilingual person might be asked to name pictures in only one language (single-language block) and then to name pictures by switching between two languages (mixed-language block). If the person is slower or makes more errors in the mixed-language block, this could be attributed to the "mixing cost". The "switch cost" represents the additional time needed to process a language switch between individual trials when compared to the processing time between trials with a consistent language. Han et al. (2022) used the difference between the number of exemplars produced in the mixed-language condition and single-language conditions in the Semantic Verbal Fluency Task as a measure of natural code-switching proficiency.
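The two costs defined above can be sketched as simple differences between mean reaction times. The example assumes one common convention: the mixing cost compares non-switch (repeat) trials in the mixed block with trials in the single-language block, and the switch cost compares switch trials with repeat trials inside the mixed block. The reaction times are hypothetical, and individual studies may define the comparison conditions differently.

```python
from statistics import mean


def mixing_and_switch_costs(single_block_rts, mixed_repeat_rts, mixed_switch_rts):
    """Return (mixing_cost, switch_cost) in the same unit as the input RTs.

    mixing cost = mean RT on repeat trials in the mixed block
                  minus mean RT in the single-language block
    switch cost = mean RT on switch trials
                  minus mean RT on repeat trials, both in the mixed block
    """
    mixing_cost = mean(mixed_repeat_rts) - mean(single_block_rts)
    switch_cost = mean(mixed_switch_rts) - mean(mixed_repeat_rts)
    return mixing_cost, switch_cost


# Hypothetical picture-naming reaction times in milliseconds.
single_block = [600, 620, 610]   # single-language block
mixed_repeat = [680, 700, 690]   # mixed block, same language as previous trial
mixed_switch = [740, 760, 750]   # mixed block, language changed from previous trial

mixing_cost, switch_cost = mixing_and_switch_costs(
    single_block, mixed_repeat, mixed_switch
)
print("mixing cost (ms):", mixing_cost)
print("switch cost (ms):", switch_cost)
```

The same differencing logic applies to accuracy or error rates in place of reaction times.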
Evidence from these paradigms points to the idea that switching between languages is a cognitively demanding task (Gade et al. 2021). These experimental results run counter to the experiences in bilingual communities around the world where switching languages is seen as effortless. Researchers (Beatty-Martínez et al. 2018; Blanco-Elorrieta and Pylkkänen 2018) suggest that experimental manipulations aimed at measuring code switching in the lab inadvertently introduce artificial constraints that differ from how participants actually use their languages in their day-to-day lives; it is suggested that the resulting switch costs actually reflect task-specific difficulties and not the intended differences in language-switching ability. Zhu et al. (2022) further explored this hypothesis in their experimental work. They found that factors such as voluntary language choice and consistency in language use, which are characteristic of natural settings, play a crucial role. Specifically, these factors seem to mitigate the significant cognitive demands and performance impairments typically observed in controlled scenarios involving switching between languages.
Our review found examples of researchers developing innovative experimental tasks that more closely resemble naturalistic language-switching contexts in order to minimize those artificial constraints. Lai and O'Brien (2020) and Keijzer and Schmid (2016) both used interviews to elicit naturalistic switching behaviors as part of their experimental paradigms. Keijzer and Schmid (2016) conducted the first half of their semi-structured interviews in Dutch before functionally switching to English and subsequently analyzed how often their participants code-switched during the interview.
Lai and O'Brien (2020) created two distinct tasks in order to assess code switching. The first task was a cued story recount task designed to elicit semi-cued language switching by getting participants to recount a story (e.g., Little Red Riding Hood) while using both English and Mandarin. The story was initially presented as an auditory narration with accompanying PowerPoint slides with sentences in English, Mandarin, and English-Mandarin mixed sentences, with intersentential swaps occurring twice within the story. Next, Lai and O'Brien created a naturalistic conversation task in order to elicit uncued language switching. Since their laboratory was a traditionally English-dominant context, they engaged their participants in conversation, asking them to share their thoughts about the story and their favorite childhood story, with the experimenters speaking exclusively in Mandarin. Ng and Yang (2021) constructed a bilingual opportunistic verbal planning task. In this task, participants were presented with sentences that were missing a word and asked to complete the sentences with the words that best fit the context. The participants were unaware that the sentences were constructed in such a way that the most appropriate answer would require them to code-switch. These kinds of experimental tasks serve as a form of carefully designed speech elicitation task that allows researchers to increase their study's ecological validity while maintaining a high degree of experimental control (Beatty-Martínez et al. 2018).

Recommendations
This systematic review investigated how researchers operationalized bilingual language use and code switching across 32 studies investigating the cognitive effects of bilingualism published between 2012 and 2023. First, this review focused on specific aspects of language that were being considered when measuring bilingualism. Second, this review was interested in how bilingualism researchers were measuring code switching in their samples. The articles identified in this systematic review have mostly focused on the interaction between bilinguals' language knowledge/use (bilingualism) and cognition. Our main suggestions for the field are twofold. First, we consider how we can improve our assessment and exploration of this person-specific domain. Second, we argue that an understanding of the intricate relationship between bilingualism and cognition within an individual necessitates a nuanced consideration of both the languages in question and the sociocultural variables informing a person's language use.
Muysken's (2013) theory regarding bilingual optimization strategies suggests that the strategies bilinguals employ are influenced by a combination of language-specific factors, person-specific factors, and sociocultural factors influencing language use. The first layer, language-specific factors, centers on perceived language distance: the degree of similarity between the two languages. Languages that are closely related (e.g., Spanish and Italian) may have more overlapping sounds, vocabulary, grammatical structures, or ways of expressing ideas, making certain types of dense code switching or alternation easier and more frequent. In contrast, languages that are more distant from each other (e.g., English and Japanese) might lead to different optimization strategies, potentially resulting in a decrease in dense code switching but an increase in insertion (Muysken 2013). The second layer considers the bilingual individual by examining the language processing capacity of the bilinguals themselves. This element is what the majority of the articles in our review have centered on. Individual elements of bilingual knowledge include aspects like language proficiency levels, order of acquisition, or dominance effects, which in turn impact the ease of access to vocabulary and structures in each language. Bilinguals often optimize their language use to minimize cognitive effort, which can lead to varying degrees of code switching, borrowing, or avoiding certain linguistic structures that are more challenging to process. Finally, the third layer posits that the social environment and context in which a bilingual individual operates play a crucial role in determining how they use their languages. Factors such as the status and prestige of languages, the societal attitudes towards bilingualism, and the specific needs of communication in a given social setting can all influence which language is used and how. The critical aspect is that in order to study the relationship between bilingualism and cognition, we have to consider the interactions across all of these elements.

Operationalizing Bilingualism
Given the inherent diversity among bilingual individuals, it is crucial for researchers to choose measures and participants that align closely with their research hypotheses. Recognizing the variations in how we define and operationalize bilingualism, however, is critical. How bilingualism is defined and operationalized can significantly influence the interpretation and generalizability of research outcomes. First, varying criteria for what constitutes bilingualism can lead to different participant inclusion standards. Second, employing different metrics for bilingualism captures distinct facets of language use, which can lead to misleading comparisons across seemingly similar, but fundamentally different, language constructs. Lastly, if bilingualism is operationalized in a study-specific way, the findings might not, and should not be expected to, generalize to populations with different bilingual experiences.
One of the key obstacles in expanding bilingualism research beyond the confines of the tried and true measures most frequently used in the literature (i.e., L2 AoA, L2 usage, L2 proficiency, etc.) is the difficulty in quantifying complex bilingual experiences into continuous measures that can be incorporated in our statistical models. When examining bilingualism, the tendency is to consider aspects of bilingualism in isolation or to utilize specific variables to create distinct groupings. In their meta-analysis on the effects of bilingualism on executive functioning, Lehtonen et al. (2018) point out that it can be difficult to compare the results emerging from studies that use dichotomized proficiency groups (low vs. high) since the criteria used to sort participants are not consistent. This approach can inadvertently oversimplify the intricate and multifaceted nature of bilingual experiences. By focusing on select indicators, researchers may neglect the interplay of various factors that contribute to an individual's development as a bilingual. Furthermore, creating rigid groupings based on isolated variables can lead to an incomplete or skewed understanding of bilingualism, potentially marginalizing or misrepresenting certain bilingual experiences. Such an approach can also lead to overgeneralizations, failing to capture the complex, nuanced differences across the experiences of bilinguals around the world.
Traditionally, one of the difficulties associated with the utilization of language questionnaires has been that researchers have been left to their own devices when it comes to interpreting the results. In recent years, the authors of the three most common measures have provided researchers with various degrees of support in interpreting their results. The LEAP-Q (Kaushanskaya et al. 2020) recommends that each research team ultimately decide the proficiency thresholds they want to use in conjunction with other aspects of their language experience. Kaushanskaya et al. (2020) recommend that researchers conduct factor or correlational analyses on their own LEAP-Q data before they combine participant responses. Anderson et al. (2018) provide an interpretation guide for the LSBQ and recommended cut-off scores for researchers who wish to discretize the continuous outcome variable into categorical groups. The LHQ-3 (the LHQ's current version) gives researchers the option to collect and analyze data on their own or manage their project via a web interface that provides users with the ability to create scores that represent overall proficiency, dominance, and immersion levels for each participant's known languages.
Even though the authors of these bilingualism questionnaires (Anderson et al. 2018; Kaushanskaya et al. 2020) recommend that researchers use their instruments in combination with behavioral measures of language proficiency, many studies elected to exclusively use self-report measures. When deciding whether to use objective or self-report measures in a study, researchers need to consider whether the objective measure is accurately capturing the construct of interest. The biggest strength associated with using self-report measures is their ease of use, since researchers can disseminate these questionnaires quickly and efficiently to gather participant responses. While self-report and objective measures of language proficiency seem to correlate (Zell and Krizan 2014), there is little evidence that subjective measures of code switching are measuring the same underlying construct as traditional objective code-switching measures (Blanco-Elorrieta and Pylkkänen 2018; Woumans et al. 2019). Because of this, if researchers are interested in utilizing an objective measure of code switching, we recommend including subjective measures as well so that researchers can assess the relationship between the subjective and objective measures within their sample.
The variety of definitions of language-switching behaviors we observed in our review suggests that a proper operationalization of code switching has to move beyond simply measuring the frequency of code switching. Considerations to take into account are the inclusion of measures that account for the types of code switching that individuals can engage in (intersentential, intrasentential, unintentional), the language-based directionality of the switch (L1, L2), and the separation of unintentional switching. Finally, these should be complemented with measures that consider the languages being used and the contexts in which individuals use their languages, since these factors have been found to differentially impact the cognitive outcomes associated with code switching. When selecting between measures that account for participants' interactional contexts and language use, we recommend that researchers use the Revised Bilingual Interactional Context Questionnaire (RCSICQ; Hartanto and Yang 2020) or Patterns of Language Use Questionnaire (PLUQ; Kałamała et al. 2020) over Hartanto and Yang's (2016) Interactional Context Questionnaire, since the newer measures are better able to capture variations across individuals who experience a mixture of bilingual interactional contexts.
Given the impact of different operationalizations of bilingualism on the generalization of findings, it is important to consider these variations when interpreting study outcomes in aggregate, as pooling results from studies with varying definitions of bilingualism may further obfuscate the results. If studies have used different measures and thresholds for defining bilingualism, then combining their results in a meta-analysis may lead to heterogeneous findings that are difficult to interpret and generalize. Additionally, meta-analyses often involve pooling results across different populations, languages, and contexts, which may obscure the specific factors that contributed to the observed effects. This can make it difficult to determine the specific conditions under which the cognitive effects of bilingualism are present. To make useful recommendations for future research, we need to account for the specific languages involved and the sociocultural conditions for language use.

Going beyond Switch Frequency or Type
Not all bilinguals switch between languages, but among those who do, there are differences in where, when, and why they choose to do so (Muysken 2013). The prevalence of these different types of code switching varies across sociocultural groups, with recent immigrants tending to favor insertion, alternation being associated with contexts in which the use of a language carries sociopolitical implications, and dense code switching occurring in communities with a long history of bilingualism (Muysken 2000). Sociolinguistic factors such as cultural norms can also influence code switching by determining the appropriate context for a switch. In metropolitan bilingual communities around the world, code switching occurs opportunistically, with words from both languages interspersed throughout a single utterance (Poplack 1987; Tiv et al. 2022). In other communities, the switching is less frequent and is used to contrast or emphasize a particular point (Myslín and Levy 2015; Poplack 1987).
One approach towards tackling this complexity is to study populations who use the same languages across distinct geographical regions with different sociolinguistic norms regarding the day-to-day use of their multiple languages (Beatty-Martínez et al. 2020; Hofweber et al. 2016; Ooi et al. 2018). Beatty-Martínez et al. (2020) recruited proficient bilingual speakers of English and Spanish across three different populations (Puerto Rico, Spain, United States) that varied in how languages were used in their everyday lives. Bilinguals in Puerto Rico predominantly spoke Spanish but commonly engaged in code switching and were regularly exposed to English in various domains (education, media, etc.). Bilinguals in Spain lived in an environment where they predominantly spoke Spanish and where switching between languages was not a common occurrence. Finally, bilinguals in the United States were native Spanish speakers who had moved to the US during childhood or adolescence, were raised in communities where code switching was a common practice, and were now living in a predominantly English-speaking environment (Beatty-Martínez et al. 2020).
Critically, participants in all of these populations were highly proficient bilinguals, native Spanish speakers who acquired English as their second language either simultaneously or during their early childhood. Beatty-Martínez et al. (2020) were able to show that although these participants would have traditionally been considered a part of the same English-Spanish bilingual population, the contexts in which they utilized their languages related to the way in which cognitive control was engaged during language production. Similarly, Hofweber et al. (2016) compared the code-switching habits of German and English speakers across two populations: first-generation speakers in the United Kingdom and fifth-generation speakers in South Africa. Hofweber and colleagues found that although both populations frequently engaged in insertion and alternation, the fifth-generation speakers reported engaging in more dense code switching than the first-generation speakers, and this increase was associated with better conflict monitoring performance. Ooi et al. (2018) investigated how age of acquisition and code switching impacted attentional control by examining the differences across bilinguals' experiences in Edinburgh and Singapore. Participants in Singapore acquire their second language early on in life and regularly engage in contexts where they have to use both languages/dense code switching. In Edinburgh, the researchers recruited two groups of bilinguals (early vs. late acquisition) and a monolingual control group. They found that while both groups of bilinguals demonstrated enhanced attentional control, the performance was indexed by different measures. Singaporean bilinguals showed a lower conflict effect whereas late Edinburgh bilinguals demonstrated better attentional switching. Studies such as these, where the researchers consider the differences across their carefully selected populations, allow us to tease out and examine the intricate relationship between language, bilingualism, and the sociocultural contexts in which bilinguals use their languages.
Kałamała et al. (2023) add that even in studies that utilize continuous measures, the use of regression models ignores the potential relationships between multiple variables, since the focus is predominantly on how well they explain the variance of the dependent variable. They argue for the exploration of alternative methods for modeling the direct and complex interactions between the different variables associated with bilingual experiences. Kałamała et al. (2023) employed the psychometric network framework (Borsboom et al. 2021). These models take a different approach towards explaining behavior, in which behaviors emerge from the complex interactions between psychological, environmental, and biological components. In contrast to latent variable models, network models focus on the variance that is unique to connected nodes instead of the totality of measures. Under this framework, variables such as second language proficiency and second language AoA correlate with each other not because they indicate the same underlying latent "bilingualism" construct, but because of mutually reinforcing interactions between the variables. In these models, the interactions between all of the variables are represented as connections between nodes. Connections are referred to as edges and represent the partial correlations between variables after controlling for the shared variance across other variables in the network. In their analysis, when compared to a traditional factor analysis, the network analysis was a better fit for describing the individual differences in bilingual experiences. The real power of these models is that there is no need to invoke some ethereal latent bilingualism variable; instead, bilingualism is entirely defined by the variables and their direct mutualistic interactions (Kałamała et al. 2023). In their sample of young adult language-unbalanced bilinguals living in Poland, they found a particular network structure that they attributed to the sociolinguistic environment in which their sample lived; they propose that bilinguals living in a distinct language environment would demonstrate a different network structure to the one present in their study.
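As a small illustration of edges as partial correlations, the sketch below computes the edge weights for a three-variable network using the standard first-order partial correlation formula. The variable names and correlation values are hypothetical and are not drawn from Kałamała et al. (2023); networks estimated in practice involve more variables and typically different estimation procedures, which this sketch does not attempt to reproduce.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical zero-order correlations among three bilingualism variables
r = {
    ("proficiency", "aoa"): -0.50,
    ("proficiency", "use"): 0.60,
    ("aoa", "use"): -0.40,
}

def corr(a, b):
    """Look up a symmetric correlation regardless of key order."""
    return r[(a, b)] if (a, b) in r else r[(b, a)]

nodes = ["proficiency", "aoa", "use"]
edges = {}
for i, a in enumerate(nodes):
    for b in nodes[i + 1:]:
        (c,) = [n for n in nodes if n not in (a, b)]  # the remaining node
        edges[(a, b)] = partial_corr(corr(a, b), corr(a, c), corr(b, c))
```

With these illustrative values, each edge is weaker than the corresponding zero-order correlation, because part of each pairwise association is shared with the third variable.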

Modeling Complexity
The use of network approaches as a means to model complex interactions has been growing across both general cognition (Castro and Siew 2020; Siew 2019) and psycholinguistics (Faust et al. 2016; Kenett et al. 2016; Stella 2020; Stella et al. 2018; Vitevitch et al. 2014). Researchers employing network approaches to study the mental lexicon have been able to show that the structure of the phonological network influences higher-level language processes involved in both comprehension and production (Vitevitch et al. 2014). Xu et al. (2021) used network modeling to study similarities and differences in naturalistic code-switched speech across two different bilingual populations (Spanish-English and Mandarin-English). This approach allowed them to model the organization of the lexicon in both languages in order to examine the connections within and between languages. Xu and colleagues found that the semantic organization of code-switched speech differed across languages. This study exemplifies the importance of considering the relationship between the languages themselves and the practices of code-switched speech, as well as the utility of network approaches in modeling the complex relationships.
Titone and Tiv's (2022) Systems Framework of Bilingualism provides theoretical guidance for researchers on how to incorporate social and cultural factors into neurocognitive research. They introduce the framework as a way of understanding the ramifications of the sociolinguistic context on language use, development, and cognition (Titone and Tiv 2022). Their framework explores the benefits of incorporating a systems approach at the levels envisioned by Bronfenbrenner (1977) in his ecological systems theory: within the individual, between individuals, and at the societal level. This approach enables researchers to continue using the kinds of measures that exist in standard language background questionnaires to account for individual experiences with bilingualism, but allows us to do more than indirectly approximate how people use languages socially with others. Tiv et al. (2020) used a form of network analysis, social network analysis, to examine the interpersonal language dynamics of English and French bilinguals in Montréal. They mapped the real-world, in-person social networks of individuals and found distinct differences between individuals' bilingual, English, and French networks. They found that the bilingual network was the most densely connected, suggesting that bilinguals were most likely to connect with other bilinguals over monolinguals in either language, and that they felt closest to other bilingual individuals. They interpreted this to mean that, at least in Montréal, bilingualism functions as a salient social identifier that signals group membership and cultivates the development of stronger ties to other bilinguals in their social network. By employing models such as the psychometric network framework, researchers can holistically examine the intricate interactions between various bilingualism-related variables at the micro-, meso-, and macro-levels. This approach acknowledges that bilingual experiences are shaped by a complex web of factors, all of which interact and influence one another. Instead of attempting to distill bilingualism into a single latent variable, network models embrace its multifaceted nature, offering a more comprehensive and nuanced understanding of bilingual experiences.
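The description of one network as more "densely connected" than another can be made concrete with the standard graph-theoretic definition of density for an undirected network: the number of observed ties divided by the number of possible ties. The sketch below is illustrative only; the names and ties are hypothetical, and it is an assumption that this definition matches the metric reported by Tiv et al. (2020).

```python
def density(nodes, ties):
    """Share of possible undirected ties that are actually present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Ignore self-ties and count each pair once, whatever its order
    unique = {frozenset(t) for t in ties if len(set(t)) == 2}
    return len(unique) / (n * (n - 1) / 2)

# Hypothetical personal networks of one respondent
bilingual_net = density(["ana", "ben", "chloe"],
                        [("ana", "ben"), ("ben", "chloe"), ("ana", "chloe")])
english_net = density(["dev", "eli", "fay"], [("dev", "eli")])
```

In this toy example every possible tie is present in the bilingual network, whereas only one of three possible ties is present in the English network.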

Conclusions
This systematic review explored two main questions: (1) what aspects of language are being considered when measuring bilingualism and (2) how language use/code switching is being measured/operationalized. Through a PRISMA-guided review, we identified 32 studies that investigated how researchers operationalized bilingual language use and code switching and we offered several recommendations for future researchers in this field. Ultimately, understanding the intricate relationship between bilingualism and cognition necessitates a nuanced exploration of sociocultural variables. Factors such as the specific languages spoken, the cultural and demographic characteristics of the population studied, and the contextual usage of languages critically shape this relationship. These factors, often overlooked, play a pivotal role in understanding how bilingual experiences influence cognitive processes. To accurately gauge the impact of bilingualism on cognition, it is essential to dissect and analyze these sociocultural elements in detail. This approach will enable a more precise understanding of the extent to which bilingual experiences are shaped by, and in turn shape, the sociocultural contexts in which they occur.

Table 1. Synthesis matrix for systematic review.