Is Early Bilingual Experience Associated with Greater Fluid Intelligence in Adults?

Emerging evidence suggests that early bilingual experience constrains the development of attentional processes in infants, and that some of these early bilingual adaptations could last into adulthood. However, it is not known whether the early adaptations in the attentional domain alter more general cognitive abilities. If they do, then we would expect that bilingual adults who learned their second language early in life would score more highly across cognitive tasks than bilingual adults who learned their second language later in life. To test this hypothesis, 170 adult participants were administered a well-established (non-verbal) measure of fluid intelligence: Raven’s Advanced Progressive Matrices (RAPM). Fluid intelligence (the ability to solve novel reasoning problems, independent of acquired knowledge) is highly correlated with numerous cognitive abilities across development. Performance on the RAPM was greater in bilinguals than monolinguals, and greater in ‘early bilinguals’ (adults who learned their second language between 0–6 years) than ‘late bilinguals’ (adults who learned their second language after age 6 years). The groups did not significantly differ on a proxy of socioeconomic status. These results suggest that the difference in fluid intelligence between bilinguals and monolinguals is not a consequence of bilingualism per se, but of early adaptive processes. However, the finding may depend on how bilingualism is operationalized, and thus needs to be replicated with a larger sample and more detailed measures.


Introduction
Recent studies suggest that mere exposure to a bilingual environment can affect an infant's attentional development. Specifically, 6-month-old infants from bilingual homes looked longer at a novel visual stimulus than infants from monolingual homes (Singh et al. 2015) and 7- to 9-month-old infants from bilingual homes redirected their visual attention faster and more frequently than infants from monolingual homes (Dal Ben et al. 2021; D'Souza et al. 2020; Kalashnikova et al. 2021). These findings are consistent with the hypothesis that infants' attentional processes adapt to the complexity of their language environment through greater exploration of the visual environment (D'Souza and D'Souza 2021). Furthermore, by comparing adults who learned their second language early in development with adults who learned their second language later in development, D'Souza et al. (2021) found that the early bilingual adaptations in the attentional domain appear to last into adulthood. Specifically, the 'early bilingual' adults redirected attention faster than the 'late bilingual' adults, a finding which parallels the observation that infants from bilingual homes redirect attention faster than infants from monolingual homes. These data hint at the possibility that bilingual environments constrain the early development of the attentional system with long-term consequences.
However, it is not known whether these early adaptations to bilingual environments in the attentional domain constrain cognitive development more generally. This is because most investigations into the relationship between bilingual language experience and cognitive abilities compared bilinguals with monolinguals, rather than early bilinguals with late bilinguals. Moreover, the few studies that did compare early bilinguals with late bilinguals report mixed results. The first of these studies compared the two groups on inhibitory control: the ability to inhibit dominant, automatic, or prepotent responses (Luk et al. 2011). This measure was selected because it is believed that when words in one language are activated in the bilingual brain during language production, the activation of words in the other language(s) is suppressed. Over developmental time, the process that suppresses activation in the language domain is thereby strengthened in the bilingual brain through regular use (Green 1998). Researchers (including Luk et al. 2011) have postulated a link between this change in activation levels during language production and the ability to consciously and deliberately override the tendency to produce a dominant or automatic response, such as the name of a colour word (Stroop 1935). Luk et al. (2011) found that early bilinguals (43 university students who acquired their second language (L2) before age 10 years) demonstrated greater inhibitory control than late bilinguals (43 university students who acquired L2 after age 10 years). This hints at a relationship between early bilingual experience and cognitive development. However, subsequent studies have either failed to replicate this finding (Humphrey and Valian 2012; Kalia et al. 2014; Kapa and Colombo 2013; Paap et al. 2014; Pelham and Abrams 2014; Tao et al. 2011) or only partially replicated it (Hartanto and Yang 2019).
For example, the largest and most recent study to date found that early bilinguals (142 university students who acquired L2 before age 10 years) demonstrated greater inhibitory control than late bilinguals (54 university students who acquired L2 after age 10 years), but only in one of several measures of inhibitory control (Hartanto and Yang 2019). The evidence in favour of a relationship between early language experience and inhibitory control is therefore limited.
Although researchers have mostly focused on group differences in inhibitory control, some have examined differences in another cognitive control process: monitoring and updating working memory representations. These studies have also yielded mixed results. For example, Paap et al. (2014) found that 91 bilinguals who had been exposed to L2 since birth performed better on a measure of monitoring than 55 bilinguals who had been exposed to L2 after age 6 years, but other studies have not found this effect (Hartanto and Yang 2019; Luk et al. 2011; Pelham and Abrams 2014; Yow and Li 2015). Moreover, Paap et al. (2014) concluded that their finding could be spurious because a group difference was only observed in one of five measures of monitoring.
Thus, the few studies that directly compared early bilinguals with late bilinguals across cognitive tasks yield inconclusive results. Yet, no study has hitherto directly compared early bilinguals with late bilinguals on a measure that may underpin many of these cognitive processes: fluid intelligence. Fluid intelligence is the ability to solve novel reasoning problems, independent of acquired knowledge. It is strongly correlated with a large number of apparently diverse cognitive variables (Salthouse et al. 2008), though it is not clear why. For some researchers (e.g., Salthouse et al. 2008), there is a single psychological capacity underpinning controlled or effortful processes, which has a lot of different labels: cognitive control, executive function, fluid intelligence, working memory. For other researchers (e.g., Friedman et al. 2011), the labels describe different, albeit similar, psychological capacities. To further complicate the issue, various theoretical frameworks exist and vie with each other to explain the same empirical data: EF unity/diversity (Friedman et al. 2008); the multiple demand system (Duncan 2010); executive/controlled attention (Engle et al. 1999a; Kane and Engle 2003); proactive control (Braver 2012; Braver et al. 2007); the supervisory attention system (Norman and Shallice 1986; Shallice and Burgess 1996).
Languages 2022, 7, 100
Although there is no consensus on what the relationship is between fluid intelligence and various cognitive abilities, we suggest the relationship is not something that exists between separable functions, but rather that each function is the emerging property of dynamic interactions among diverse, interdependent, neural subsystems, not unlike how aerodynamic stability, maneuverability, and performance are the emerging properties of interactions between the concrete properties of an aeroplane (the number of blades in its turbine, the aspect and taper ratios of its wings, the shape of its fuselage, etc.) and current weather conditions. In other words, we conceptualize fluid intelligence and cognitive abilities as emerging (and sometimes overlapping) properties that arise from the collective action of the components at the level below, and which may constrain the arrangements of the components below to some purpose or function, but cannot themselves be linked to any separable concrete component. To adapt to its physical and social environment, a system (individual) must internally represent parts of the external world and actively select, maintain, and implement goals (common executive function or common EF). Some tasks may also require the individual to continuously update their goal representations in working memory. We suggest that performing these goal-directed tasks is likely to require the coordinated activity of components involved in solving novel reasoning problems (Figure 1). For example, a latent variable (common EF) was found to predict performance across nine different cognitive tasks (Friedman et al. 2008). It was also found to be related to a measure of intelligence (r = .51), as was another latent variable: updating working memory (r = .49) (Friedman et al. 2011).
Fluid intelligence is important because it is not only related to adaptive processes such as learning, but also to educational, occupational, and social outcomes (Deary et al. 2007;Neisser et al. 1996).
Figure 1. What is the relationship between fluid intelligence and cognitive abilities? We conceptualise them as emerging properties of dynamic interactions between multiple internal and external constraints. The system (individual) adapts to the external world (represented here by cognitive tasks (the nine boxes)) by actively selecting and implementing goals. This appears to involve the coordination of a core set of structures and functions (common executive function or common EF). However, to succeed on some tasks, the individual must also replace (update) a goal with another and/or rapidly shift between goals. Fluid intelligence is the ability to reason or implement a goal (independent of acquired knowledge) and update the working memory. It therefore overlaps with common EF and updating.
How might early bilingual experience constrain the development of such a general cognitive ability as fluid intelligence? One possibility is that changes in fluid intelligence reflect early experience-driven adaptations to environmental complexity. There is evidence that bilingual environments are more complex than monolingual environments. For example, some bilingual parents produce single utterances that contain phonemes from different languages (Bail et al. 2015), and some produce more pronunciation errors in their non-native language (Bosch and Ramon-Casas 2011). It has been proposed that infants adapt to these more complex (language) environments by exploring (gathering more information from) their visual environment to facilitate learning (D'Souza and D'Souza 2021). networks such as the 'multiple-demand cortex' (MDC; Duncan 2010). The MDC is not a single cortical region but rather a network of co-activating frontoparietal regions that stretch from intraparietal sulcus to middle frontal gyrus, and which are involved in functions as diverse as controlling attentional processes, maintaining goals, and selecting strategies. The MDC is especially active during tasks that require learning (Duncan and Owen 2000; Ruge and Wolfensteller 2016). For example, stimulation of MDC regions (using repetitive transcranial magnetic stimulation) substantially enhances word learning (Fiori et al. 2018; Sliwinska et al. 2021; Sliwinska et al. 2017). It has been argued that the MDC facilitates learning through the orchestration of domain-specific networks Schneider 2005, 2012; Uddin 2015) while also serving as an attentional modulator of tasks (Walsh et al. 1998). We therefore propose that early adaptations that engage and strengthen parts of the MDC may have indirect consequences on learning in general, and thus on tasks that measure the ability to solve novel problems, independent of acquired knowledge: i.e., fluid intelligence.
Moreover, a study found that research participants demonstrate greater performance on a widely used test of fluid intelligence (Raven's Advanced Progressive Matrices; Raven et al. 2003) when the test is modified to force participants to attend to each part of the puzzle separately (Duncan et al. 2017). This hints at the possibility that adaptive visuoattentional motor behaviour, such as redirecting attention and visual scanning (exploration), may scaffold the development of learning strategies as well as strengthen frontoparietal connectivity. This would fit with evidence that early bilingual adults are quicker at detecting change than late bilingual adults (D'Souza et al. 2021). In other words, early adaptations to complex language environments might have cascading effects on neural connectivity that scaffold a range of cognitive functions and behavioural strategies, possibly even resulting in individuals being better at solving problems by attending more to the parts of a problem. Indeed, measures of attentional scope and control during task performance are associated with measures of fluid intelligence in children and adults (Cowan et al. 2006). This raises the possibility that performance on cognitive tasks is contingent on the participant's attentional ability to zoom in to maintain a goal in the presence of interference (e.g., when naming the colour of a colour word rather than reading the word aloud), zoom out to apprehend and retain multiple items simultaneously (e.g., when maintaining a specific configuration of visual stimuli in working memory), and switch back and forth to identify, represent, and contrast the parts of a problem (see Cowan 1995, for discussion).
If early bilingual experience results in increased exploration and attentional control, and if attentional control co-develops with the neural structures (e.g., the MDC) through which fluid intelligence emerges, then we would expect early bilingual adults to score more highly on a fluid intelligence task than late bilingual adults. To test this hypothesis, we directly compared early bilingual adults with late bilingual adults on the well-established (non-verbal) measure of fluid intelligence: Raven's Advanced Progressive Matrices.

Participants
A total of 212 English-speaking adult participants (18+ years) were recruited online through the university and social media, of whom 42 were excluded from analysis because they provided incomplete information regarding their linguistic status. Of the remaining 170 participants, 49% (n = 83) were women and 51% (n = 87) were men. Participants were organized by linguistic status based on commonly used criteria: individuals who acquired two or more languages by 6 years of age (early bilinguals); individuals who acquired two or more languages after 6 years of age (late bilinguals); and individuals who acquired no more than one language (monolinguals) Vaid 2006, 2007). See Table 1 for participant characteristics by group. The groups did not significantly differ on gender, χ²(2, N = 170) = 1.91, p = .385, or parental education, F(2, 167) = 3.04, p = .051, ω² = 0.02. However, because the group difference in parental education approached significance, we carried out post-hoc Bonferroni-adjusted t-tests to aid interpretation. Parental education was not significantly higher among early bilinguals than late bilinguals (t = 2.15, p = .100) or monolinguals (t = 2.20, p = .088). Nor was parental education significantly higher among late bilinguals than monolinguals (t = 0.11, p > .999).
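As a rough check on the effect size reported above, omega-squared can be recovered from a one-way between-subjects ANOVA using only the F statistic and its degrees of freedom, via the standard identity ω² = df_b(F − 1) / (df_b(F − 1) + N). The sketch below is illustrative only: the function name `omega_squared` is ours and is not taken from the original analysis.

```python
def omega_squared(f_stat: float, df_between: int, df_within: int) -> float:
    """Estimate omega-squared for a one-way between-subjects ANOVA from F.

    Uses omega^2 = df_b * (F - 1) / (df_b * (F - 1) + N),
    where N = df_b + df_w + 1 is the total sample size.
    """
    n_total = df_between + df_within + 1
    numerator = df_between * (f_stat - 1.0)
    return numerator / (numerator + n_total)


# Parental education, as reported in the text: F(2, 167) = 3.04
print(round(omega_squared(3.04, 2, 167), 2))  # 0.02
```

The result agrees with the reported ω² = 0.02 after rounding. The estimate can be slightly negative when F is below 1.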
As expected, the bilingual groups did not differ on first language (L1) proficiency, U = 1543.50, Z = −0.88, p = .382. This is because 99.1% of participants reported having a native-level grasp of their first language; the one exception being a late bilingual who reported having an 'advanced' but not 'native-level' grasp of the language. However, second language (L2) proficiency was higher in the early bilingual group than in the late bilingual group, U = 1056.50, Z = −3.30, p < .001. L2 proficiency in relation to L1 proficiency (i.e., L2 proficiency minus L1 proficiency) was closer to 0 in the early bilingual group than in the late bilingual group, U = 1080.00, Z = −3.16, p = .002, indicating that early bilinguals were more likely to be equally proficient in their first two languages than late bilinguals. (Mann-Whitney U tests were used to analyse ordinal data.)
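For readers unfamiliar with the Mann-Whitney U test, the following minimal sketch shows how U and an approximate Z can be computed for two groups of ordinal ratings. It is an illustration under stated assumptions rather than the procedure used in the study: the function names and the example ratings are hypothetical, and the Z value uses the plain normal approximation without a tie correction, so it will differ somewhat from software that corrects for ties.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, Z): the smaller U statistic and its normal approximation."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    u = min(u_a, u_b)
    # Normal approximation WITHOUT tie correction (an assumption of this sketch).
    z = (u - n_a * n_b / 2) / sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return u, z


# Hypothetical proficiency ratings (1 = basic ... 4 = fluent), not study data.
early_l2 = [4, 4, 3, 4, 4]
late_l2 = [2, 3, 3, 4, 2]
print(mann_whitney_u(early_l2, late_l2))
```

Because the smaller of the two U values is reported, Z is zero or negative, as in the values given above.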

Materials
Materials comprised an online background questionnaire and an online version of Raven's Advanced Progressive Matrices-Set I (Raven et al. 1988). These were presented to participants via the online Gorilla platform (www.gorilla.sc; Anwyl-Irvine et al. 2020).

Background Questionnaire
Participants were asked questions about their demographic, parents' education, and language background. Specifically, participants were asked to provide the following information: age; gender; parents' level of education (rated from 1 (no qualifications) to 6 (doctoral level qualifications)); the languages they speak; and for each language, whether they acquired the language early (first six years of life) or late (6-18 years or adulthood), and whether they were fluent in the language or not (rated from 1 (basic) to 4 (fluent)).
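The grouping criteria described under Participants could be applied to these questionnaire responses roughly as follows. This is a sketch under assumptions: the `"early"`/`"late"` coding of each language and the function name `classify_linguistic_status` are hypothetical, and "late bilingual" is read here as having two or more languages but fewer than two acquired in the first six years of life.

```python
from typing import Iterable


def classify_linguistic_status(acquisition_periods: Iterable[str]) -> str:
    """Classify a participant from one 'early'/'late' entry per language.

    'early' = acquired in the first six years of life; 'late' = afterwards.
    The string coding is a hypothetical encoding of the questionnaire answers.
    """
    periods = list(acquisition_periods)
    if any(p not in ("early", "late") for p in periods):
        raise ValueError("each entry must be 'early' or 'late'")
    if periods.count("early") >= 2:
        return "early bilingual"
    if len(periods) >= 2:
        return "late bilingual"
    return "monolingual"


print(classify_linguistic_status(["early", "early"]))  # early bilingual
print(classify_linguistic_status(["early", "late"]))   # late bilingual
print(classify_linguistic_status(["early"]))           # monolingual
```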
The answer to one of these questions (parents' level of education) was used as a proxy of socioeconomic background. Socioeconomic status (SES) is a multifaceted, complex construct (Farah 2017). It is often measured as a composite of parents' education, occupation, and income (e.g., Li et al. 2020) and is associated with academic achievement (e.g., Gottfried et al. 2003; Sirin 2005) and intelligence (e.g., Von Stumm and Plomin 2015; White 1982). Parental education is a particularly powerful predictor of academic achievement, intelligence, and socioeconomic success. For example, a structural equation modelling study of 8- to 12-year-old children (N = 868) found that SES was related indirectly to children's academic achievement through parental education (rather than income), specifically through the parents' beliefs and behaviours (Davis-Kean 2005). Because there are such strong correlations between parental education, other measures of family SES, and academic achievement and intelligence, and because parental education is a greater predictor of educational achievement than other measures of SES (Liu et al. 2020), a single measure (parental education) was adopted as a proxy of SES to keep the online design as short as possible.

Raven Advanced Progressive Matrices-Set I
Participants completed Set I of Raven's Advanced Progressive Matrices (RAPM; Raven et al. 1988). The RAPM was selected because it is a commonly used nonverbal test of fluid intelligence and proxy for general cognitive ability (Raven et al. 2003). The items comprise a series of black visual geometric designs on a white background, in the form of a 2 × 2, 3 × 3, 4 × 4, or 6 × 6 matrix. Each matrix has a single missing piece, and the participant is instructed to identify the missing piece from six to eight options. The RAPM contains 48 items, presented as one set of 12 (Set I) and another of 36 (Set II). Only Set I of the RAPM was administered to participants in the current study, because evidence suggests that Set I is a reliable and valid short form of the RAPM in the assessment of fluid intelligence (Chiesi et al. 2012) and a short online procedure helps to avoid collecting data from only persistent and determined participants. That is, a participant may be more likely to withdraw from an online study (which they can do by simply closing their browser) than a traditional lab-based one.

Procedure
Participants were provided with a link to access this online study. After providing informed consent, they were asked to fill in the background questionnaire. The participants then received instructions on how to complete the RAPM-Set I and were provided with three practice items. Once they had completed the practice items, they were allowed to proceed and complete the 12 items in Set I. Participants had unlimited time to complete each item, but were informed that their response times would be recorded.

Ethical Approval
The study received ethical approval from the School Research Ethics Panel (No. PSYPGT20_135), which was ratified by the Faculty Research Ethics Panel under the terms of Anglia Ruskin University's Policy and Code of Practice for the Conduct of Research with Human Participants.

Comparing Early Bilinguals with Late Bilinguals and Monolinguals
Six outliers (with scores more than 2 SD below the group mean) were identified and excluded from analysis (two from each group). A further seven outliers (with response times more than 2 SD above the group mean) were excluded from analysis (one early bilingual, three late bilinguals, and three monolinguals). As predicted, fluid intelligence significantly differed across groups, F(2, 154) = 9.91, p < .001, ω² = 0.10 (see Figure 2). Planned contrasts were used to break down the variation due to the experimental manipulation into its component parts. The planned contrasts revealed that fluid intelligence was higher in

Comparing Early and Late Bilinguals: A Bayesian Analysis
To further evaluate the evidence, a Bayesian analysis was carried out (van Doorn et al. 2021). We focus on the critical difference reported in the main document between early bilinguals and late bilinguals. The null hypothesis (H0) postulates that there is no difference in fluid intelligence between early and late bilinguals and therefore δ should equal 0. The one-sided alternative hypothesis (H+) states that only positive values of δ are possible and assigns more prior mass to values closer to 0 than extreme values. Specifically, δ was assigned a Cauchy prior distribution with r = 1/√2, truncated to allow only positive effect size values. Figure A1 (the prior and posterior plot) shows that the Bayes factor (BF+0) indicates evidence for H+; specifically, BF+0 = 12.15, which means that the data are approximately 12 times more likely to occur under H+ than H0. This result indicates "moderate" to "strong" evidence in favour of H+ (Kass and Raftery 1995). The error percentage is < 0.001%, which indicates great stability of the numerical algorithm that was used to obtain the result. In order to assess the robustness of the Bayes factor to our prior specification, Figure A2 shows BF+0 as a function of the prior width r. Across a range of widths, the Bayes factor appears to be relatively stable, ranging from 8.26 to 13.21.
We also report the results for parameter estimation. Of interest is the posterior distribution of the standardized effect size δ (i.e., the population version of Cohen's d, the standardized difference in mean fluid intelligence). For parameter estimation, δ was assigned a Cauchy prior distribution with r = 1/√2. Figure A1 shows that the median of the resulting posterior distribution of δ equals 0.50, albeit with a central 95% credible interval for δ that ranges from 0.13 to 0.89. In summary, we hypothesised that fluid intelligence would be higher in early bilinguals than late bilinguals. A Bayesian version of the independent t-test suggests that the evidence provided by the data is about 12:1 in favour of our hypothesis, BF+0 = 12.15, error ≈ 6.62 × 10⁻⁶. According to Jeffreys (1961), Kass and Raftery (1995), and Raftery (1995), these data provide "moderate" to "strong" evidence in favour of the alternative hypothesis.
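To make the 12:1 figure more concrete, a Bayes factor can be converted into a posterior probability once prior odds are chosen, since posterior odds equal the Bayes factor multiplied by the prior odds. The sketch below assumes equal prior odds for H+ and H0 (our assumption for illustration, not a claim made in the analysis above); the function name is hypothetical.

```python
def posterior_probability(bayes_factor: float, prior_odds: float = 1.0) -> float:
    """Posterior probability of H+ given BF+0 and the prior odds of H+ to H0."""
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Reported Bayes factor, with equal prior odds assumed.
print(round(posterior_probability(12.15), 3))  # 0.924

# Bounds of the reported robustness range for the prior width.
print(round(posterior_probability(8.26), 2))   # 0.89
print(round(posterior_probability(13.21), 2))  # 0.93
```

Under that assumption, the reported Bayes factor corresponds to a posterior probability of roughly 0.92 for H+, and the robustness range corresponds to roughly 0.89 to 0.93.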

Exploratory Analyses
To probe further, we carried out an additional set of (exploratory) analyses. Because these analyses are exploratory, they should be interpreted with extra caution. First, we directly compared late bilinguals with monolinguals. Fluid intelligence was greater in late bilinguals than monolinguals, t(109) = 2.20, p = .030, d = 0.42 (two-tailed).
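The relationship between an independent-samples t statistic and Cohen's d can be sketched as follows. The helper names and the sample values are illustrative assumptions. The per-group sample sizes after exclusions are not given in this section, so the group sizes used in the note after the code are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_and_d(group_a, group_b):
    """Return (t, d, df) for a Student's t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    d = (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
    t = d / sqrt(1 / n_a + 1 / n_b)
    return t, d, df


def d_from_t(t: float, n_a: int, n_b: int) -> float:
    """Recover Cohen's d from t and the two group sizes."""
    return t * sqrt(1 / n_a + 1 / n_b)


# Hypothetical scores, not study data.
t, d, df = pooled_t_and_d([2, 4, 6], [1, 3, 5])
print(round(t, 3), round(d, 3), df)  # 0.612 0.5 4
```

For instance, with t = 2.20 and purely illustrative group sizes of 55 and 56 (which sum to df + 2 = 111), `d_from_t` returns approximately 0.42.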
Second, because the early bilinguals had more languages than the late bilinguals, we decided to see whether the number or proficiency of languages the bilinguals could speak correlates with fluid intelligence. Neither the number of languages, nor the number of advanced/fluent/native languages, nor second language proficiency, nor second language proficiency in relation to first language proficiency correlated with fluid intelligence, r = .02, p = .821, r = −.11, p = .269, τ = −.08, p = .333, τ = −.10, p = .238, respectively (Kendall's tau was used for ordinal data).
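Kendall's tau suits ordinal ratings because it depends only on the ordering of pairs of observations. The sketch below implements the tau-b variant, which adjusts for ties; which variant was used above is not stated, so tau-b is an assumption here, as are the function name and the example values.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty)).

    C and D count concordant and discordant pairs; Tx and Ty count pairs
    tied only on x or only on y. Pairs tied on both are ignored.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denominator


# Hypothetical ordinal ratings and scores, not study data.
print(kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 3]))  # 0.8
```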
Finally, because bilingual experience was so variable, we decided to see what would happen if we used a different definition of bilingualism. If we re-categorize bilinguals as individuals who are advanced or fluent/native in 2 or more languages, then, excluding 10 outliers (3 early bilinguals, 1 late bilingual, and 6 monolinguals) for having scores more than 2 SD below their group mean, and 5 outliers (2 early bilinguals, 1 late bilingual, 2 monolinguals) for taking too long to provide their answers (+2 SD), we no longer find a significant main effect of group, F(2, 152) = 2.49, p = .086. If the scores of all participants are included (excluding the five participants who took too long), then there is an effect of group, F(2, 162) = 4.73, p = .010, but only between bilinguals and monolinguals (p = .003) and not between early bilinguals and late bilinguals (p = .305). We obtain a similar result if we include the five participants who were too slow: F(2, 167) = 5.95, p = .003; bilinguals vs. monolinguals, p = .002; early bilinguals vs. late bilinguals, p = .155. Therefore, the difference between early and late bilinguals varies according to how they are categorized.
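Because the outcome depends on which participants are excluded, it may help to see the 2 SD exclusion rule written out. The following sketch applies it within each group; the function name, the use of the sample standard deviation, and the example scores are assumptions made for illustration.

```python
from statistics import mean, stdev


def exclude_outliers(values_by_group, n_sd=2.0, direction="below"):
    """Drop values more than n_sd SDs below (or above) their own group mean.

    Uses the sample standard deviation (an assumption of this sketch).
    """
    if direction not in ("below", "above"):
        raise ValueError("direction must be 'below' or 'above'")
    kept = {}
    for group, values in values_by_group.items():
        centre, spread = mean(values), stdev(values)
        if direction == "below":
            cutoff = centre - n_sd * spread
            kept[group] = [v for v in values if v >= cutoff]
        else:
            cutoff = centre + n_sd * spread
            kept[group] = [v for v in values if v <= cutoff]
    return kept


# Hypothetical RAPM Set I scores (0-12), not study data.
scores = {"early": [10] * 9 + [1], "late": [8, 9, 7, 8, 9]}
print(exclude_outliers(scores, direction="below"))
```

Calling the function with `direction="above"` on response times would correspond to the +2 SD rule for participants who took too long.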

Discussion
This study was the first to directly compare early bilinguals with late bilinguals on a commonly used measure of fluid intelligence. We found that fluid intelligence was greater in early bilinguals (adults who learned their second language between 0 and 6 years) than late bilinguals (adults who learned their second language after age 6 years). The groups did not significantly differ on a proxy of socioeconomic status (parental education). These data suggest that the difference in fluid intelligence between the bilinguals and monolinguals was not a result of bilingualism per se, but of early adaptive processes. However, this result must be interpreted with caution because it could not be replicated when bilingualism was operationalized differently. A large replication study is needed.
If the main result is replicated in a larger study, it will be important to ask the following question: How could early cognitive development be constrained by bilingual experience? One suggestion is that bilingual language processing requires (and thus strengthens) domain-general cognitive control processes. For example, 20-month-old bilingual toddlers demonstrate a processing cost when they hear a language switch within a sentence (e.g., "Look! Find the chien!") (Byers-Heinlein et al. 2017). The researchers interpreted this switch cost to mean that the toddlers were monitoring and controlling their languages during language listening, and that monitoring, controlling, and switching between two languages during comprehension "trains information processing beyond the domain of language" (Byers-Heinlein et al. 2017, p. 9035). However, there are different ways of interpreting the data and, most importantly, there remains a large explanatory gap between what may be a language-specific mechanism and a domain-general cognitive control process. For example, switching between languages in the temporal domain may require different mechanisms from those necessary for switching between visual stimuli in the spatial domain. In addition, the processing cost was only observed when the switch occurred within a single sentence, not between sentences. For this reason, the data can only suggest a link between cognitive control and a language environment where switching within sentences (intra-sentential code switching) is common. However, a study of 10- and 18-month-olds in Montreal, Canada, found low rates of infant-directed code-switching (0.6-2.8 times per 100 words), and most of these (77-83%) were instances of inter-sentential code switching, not intra-sentential code switching (Kremin et al. 2021).
A more parsimonious explanation for why there is a switch cost in 20-month-old toddlers is that the brain actively adapts to its environment in a cost-efficient way by automatically minimizing the gap between prior expectations and sensory input (Friston 2009, 2010). Involvement of a higher-level (domain-general) cognitive control process is possible but not necessary.
Alternatively, because infants raised in bilingual environments are likely to receive fewer words from each language, and more varied, noisier, and less predictable speech sounds, they may sample (explore) more of their environment than infants exposed to just one language (D'Souza and D'Souza 2021). For example, visual information (such as lip movements) is known to facilitate the discrimination of speech sounds, so infants in more complex (language) environments may use visual information for a longer period of developmental time than those in less complex environments, which may result in them rapidly and more frequently switching attention between the objects they are exploring and speakers' lip movements. This kind of activity would be supported by the mobilisation (and thus refinement) of neurocognitive mechanisms that enable rapid disengaging and attention switching. Indeed, this is what was found in another study: infants raised in bilingual homes were faster at disengaging attention from one visual stimulus in order to shift attention to another visual stimulus, and they switched attention more frequently between visual stimuli, than infants raised in monolingual homes (D'Souza et al. 2020).
How might these early bilingual adaptations constrain the development of fluid intelligence specifically? The neurocognitive mechanisms that enable rapid disengaging and attention switching comprise long-range white matter tracts that connect and facilitate coordinated activity among multiple neural systems in the parietal lobe (e.g., intraparietal cortex, superior parietal lobe, and precuneus) and frontal lobe (e.g., anterior cingulate cortex, presupplementary and supplementary motor areas, and middle frontal gyrus) (Corbetta and Shulman 2002; Corbetta et al. 2008). Most of these neural systems have also been associated with fluid intelligence (Cocchi et al. 2014; Cole et al. 2015; Duncan 1995; Fedorenko et al. 2013; Gray et al. 2003; Tamnes et al. 2010; Ullén et al. 2008; Wendelken et al. 2017; see Basten et al. 2015, for meta-analysis). For example, spontaneous activity in the right middle frontal gyrus correlates with two functionally segregated attentional systems: the dorsal and ventral frontoparietal networks (Corbetta et al. 2008). The dorsal network appears to bias or filter activity in the ventral network, ensuring that the ventral network processes only behaviourally-important information, while the ventral network reorients the dorsal network to unexpected but behaviourally-important stimuli. These interactions happen via the right middle frontal gyrus (Fox et al. 2006). Damage to this region disrupts performance on tasks that require participants to reorient attention (Japee et al. 2015). This suggests that the right middle frontal gyrus plays an important role in attention. However, it also plays an important role in fluid intelligence. In one Magnetic Resonance Imaging study, 104 young adults were administered 21 different cognitive tasks, including Raven's Advanced Progressive Matrices (Colom et al. 2013). Confirmatory factor analysis found significant overlap between the right middle frontal gyrus (cortical grey matter volume, cortical surface area) and fluid intelligence.
The connection between attention and fluid intelligence is not surprising, since both involve building, monitoring, and maintaining behaviourally-important goal representations in the prefrontal cortex (Duncan 2010). Early adaptations in the attentional system may therefore shape the emergence and development of fluid intelligence through increased long-range frontoparietal connectivity. This would explain why fractional anisotropy in white matter fibre tracts that link parieto-occipital regions with premotor and prefrontal areas is greater in children who were exposed to two languages before age 3 years than in children who were exposed at age 3–5 years (Mohades et al. 2012).
Furthermore, fluid intelligence is believed to reflect processing speed (Salthouse 1996) and/or working memory capacity (Conway et al. 2002; Engle et al. 1999b; Kyllonen and Christal 1990). Performance on tests of fluid intelligence and processing speed is correlated with prefrontal white matter volume (Ullén et al. 2008), which in turn appears important for the synchronization of cortical neural activity (Traub et al. 2004) and a wide range of cognitive functions, including attention and working memory (e.g., Fries 2005; Singer 1999; Uhlhaas and Singer 2006). Adaptations in attention may therefore lead to greater prefrontal connectivity that results in more temporally stable neural activity, which may affect other cognitive functions dependent on neural synchrony in the millisecond range. This is consistent with evidence of greater structural and functional connectivity in the frontoparietal control network in bilingual adults than monolingual adults (Grady et al. 2015), and more importantly, greater in early bilingual adults than late bilingual adults (Berken et al. 2015, 2016). Greater attentional control may also lead to improvements in fluid intelligence via working memory. Although working memory has often been described as a system that comprises a supervisory subsystem that directs attention to relevant information, suppresses irrelevant information, and controls three subordinate subsystems which maintain representations (Baddeley and Hitch 1974), more recent models suggest that working memory refers to representations in long-term memory that have become activated through attentional processes, up to four of which may be called the 'focus of attention' (Cowan 1995, 1999, 2010). In other words, working memory and attention are not necessarily separate subsystems but different aspects (states) of the same underlying neurocognitive system; we select and maintain items in working memory by actively attending to them. If early bilingual adaptations lead to greater connectivity within attentional networks, it would explain why bilinguals perform particularly well on working memory tasks that require greater attentional control, such as complex span tasks (Linck et al. 2014).
Fluid intelligence may also reflect the ability to direct attention to the simpler parts of a complex task (Duncan et al. 2017). If this is the case, then it is possible that early bilingual experience biases the visuo-attentional system in a way that supports the development of learning strategies as well as strengthens frontoparietal connectivity. That is, performance on tasks of fluid intelligence may rely on the ability to zoom in and out or switch back and forth between the different aspects of a single representation or problem. This attentional flexibility could mean that bilinguals are more likely to find the solution to a task of fluid intelligence, and may explain how early bilingual adults were better at detecting minute gradual changes in a visual stimulus than late bilingual adults (D'Souza et al. 2021).
Assuming that early bilingual experience is related to fluid intelligence, then given that fluid intelligence is also correlated with general cognitive ability, why do comparisons between early bilinguals and late bilinguals (or even between bilinguals and monolinguals) on cognitive performance report mixed results (e.g., Donnelly et al. 2019)? Although attention, fluid intelligence, and cognitive abilities overlap, they are still functionally separate. Attention is the ability to select goal-relevant sensory information, monitor information, and redirect action; fluid intelligence is the ability to recognize patterns, engage in abstract thinking, and solve novel problems; and while cognitive abilities include problem solving, they also include planning, controlling, and regulating the flow of information processing more generally to support goal-oriented behaviour, including the capacity to draw on past experience and knowledge to comprehend situations, figure out what is needed, decide on and plan new courses of action, and override automatic thoughts and behaviours. Although all three functional clusters rely on many of the same neural regions, adaptive demands differ across time and place, and thus, the neurocognitive processes adapt to the environment not by activating the same neural regions but through the timely coordination of different neural populations. It is therefore possible that some cognitive tasks require long-range connectivity between parietal and frontal regions, while others rely more on specific neural systems in prefrontal cortex.
Moreover, there is evidence of a gradual differentiation of cognitive ability, with a transition from a single non-specific function early in life to diverse and increasingly specialized functions by adulthood. For example, confirmatory factor analyses often identify a 'mental set-shifting' factor in adults, but not in young children (see Karr et al. 2018, for review). These findings are consistent with the observation that grey matter in regions associated with mental set-shifting (e.g., dorsolateral prefrontal cortex) is pruned after grey matter in regions more associated with general cognitive ability (e.g., ventral frontal areas) (Gogtay et al. 2004). This suggests that early adaptations in frontoparietal connectivity may have broad implications for fluid intelligence but not necessarily for specific cognitive abilities such as mental set-shifting.
In summary, we propose that early bilingual adaptations involve mechanisms that enable rapid disengaging and attention switching and that these mechanisms are likely to comprise white matter tracts and functional connectivity among diverse frontal and parietal cortical regions. This 'attentional network' is unlikely to be an independent system. It is more likely to involve interdependent processes that overlap with other neurocognitive processes and which have cascading effects on how fast information is processed, how representations are selected, maintained, and updated, and how complex tasks are solved.

Limitations and Future Research
Although our data fit with findings in the wider literature, our study needs to be replicated with larger samples and more detailed measures of fluid intelligence, language experience, and socioeconomic background. For example, rather than ask participants to tick whether they acquired the language early (first six years of life) or late (6–18 years or adulthood), we could ask them to state when they acquired each language. The need to replicate our results with more detailed measures of language experience is particularly evident given that the effect was not observed when we re-analysed the data using a different criterion for bilingualism. This raises the question of precisely what kind of bilingual experience leads to greater fluid intelligence. A larger sample would also permit us to analyse typological similarity and other potentially important factors (Antoniou and Wright 2017). A larger study could also factor in the time dimension by longitudinally relating infant and home environment data to later measures of language experience, home/school environment, and fluid intelligence. This could pave the way to studies that elucidate causal mechanisms.
If the current study is replicated on larger scales, with larger samples and more detailed measures, and fluid intelligence is found to be greater in early bilinguals than late bilinguals (as the current study hints at), then it will pave the way to further studies on the relationship between early bilingual experience and fluid intelligence. Why has bilingual research not hitherto focused on fluid intelligence? We suspect that the reason is the popularity of the inhibitory control hypothesis (Green 1998) as an explanation for findings in the 'bilingual advantage' literature. As mentioned in the Introduction, the automatic process through which one language is selected for use during language production has been linked, via an inhibitory control mechanism, to the process through which a research participant controls their motor behaviour in tasks that require them to consciously and deliberately override an automatic, dominant, or prepotent response to an external stimulus. This argument requires more detail, however, because even if an inhibitory control mechanism is used², there exists a wide range of inhibitory control mechanisms across multiple spatial and temporal scales, from the inhibition of dendritic growth to the regulation of neuronal oscillatory activity. They also serve a wide range of functions, from generating complex properties (e.g., non-linearity) in cortical circuits to preventing involuntary motor movements (Buzsaki 2006). A diverse range of inhibitory mechanisms may even be observed within the same domain and modality. For example, 'inhibition of return' and the 'inhibitory surround' mechanism are two very different processes in the low-level domain of visual attention, and 'prepotent response inhibition', 'resistance to distractor interference', and 'resistance to proactive interference' appear to be dissociable functions in the higher-level domain of cognitive control (Friedman and Miyake 2004). Even within the same cortical regions, different inhibitory neural networks coexist with each other: lateral inhibition divides, isolates, and suppresses neural activity; negative (inhibitory) feedback provides stability; and feedforward inhibition filters afferent excitation or increases the temporal precision of neural firing (Buzsaki 2006). Moreover, even seemingly single inhibitory processes may be subserved by different coalitions of neural activity. For example, performance across four similar inhibitory control (Stroop-like interference) tasks was found to be uncorrelated (Shilling et al. 2002; Borragan et al. 2018; Cipolotti et al. 2016). This diversity in inhibitory mechanisms precludes the possibility of any straightforward link between the automatic modulation of activation levels in the language domain and deliberate motor control. The two processes are conceptually separate. More importantly, the inhibitory control hypothesis fails to explain why mere exposure to two or more languages leads preverbal infants (who do not select and produce language) to redirect visual attention faster and more frequently than infants exposed to just one language (see D'Souza and D'Souza 2016, 2021, for discussion). Future research should therefore seek to elucidate the putative link between the process through which one language is selected for use during language production and the process through which a participant controls their motor behaviour.
Future research should also establish when in development the inhibitory mechanism constrains cognition and behaviour. Indeed, there is evidence that greater inhibitory control is more likely to be found in late bilinguals than early bilinguals (Donnelly et al. 2019). In addition, whereas inhibitory control is often greater in late bilinguals than monolinguals, processing speed is often greater in early bilinguals than monolinguals (Donnelly et al. 2019). This is consistent with our hypothesis that early bilingual experience strengthens long range connectivity within the attentional network, while inhibitory control may play a role later in development.

Conclusions
Fluid intelligence was greater in bilinguals than monolinguals, and greater in early bilinguals than late bilinguals. This suggests that the difference in fluid intelligence was not a result of bilingualism per se, but of early adaptive processes. We hypothesise that early bilingual experience leads to adaptations that enable rapid disengaging and attention switching, and that these adaptations are likely to comprise increased functional connectivity among diverse frontal and parietal cortical regions that are implicated in a wide range of higher-level functions including fluid intelligence. However, our finding needs to be replicated with a larger sample and more detailed measures, because early and late bilinguals did not differ when bilingualism was operationalized differently.

Conflicts of Interest: The authors declare no conflict of interest.
Appendix A

Figure A1. A Bayes factor of 12.15 suggests that the evidence provided by the data is about 12:1 in favour of the alternative hypothesis (the red part of the pie chart) that early bilinguals would demonstrate greater fluid intelligence than late bilinguals. A Bayes factor of 1 indicates that both hypotheses (alternative and null) predicted the data equally well. This prior posterior plot shows the prior (dashed line) and posterior (full line). The two gray dots indicate the prior and posterior density at the test value. The median and 95% central credible interval of the posterior distribution are shown in the top right corner. The pie chart ("probability wheel") at the top visualises the evidence that the data provide for the two rival hypotheses (red = alternative hypothesis; white = null hypothesis).

Figure A2. The Bayes factor robustness check. The maximum Bayes factor (BF+0) is attained when setting the prior width r to 0.43. The plot indicates BF+0 for the user specified prior (r = 1/√2), wide prior (r = 1), and ultrawide prior (r = √2). The evidence for the alternative hypothesis is somewhat stable across a wide range of prior distributions, suggesting that the analysis is robust.
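The "about 12:1" reading of the Bayes factor in Figure A1 can be turned into a posterior probability. The sketch below is illustrative and not part of the reported analysis. It assumes that the two hypotheses are equally likely a priori (prior odds of 1), which is also how the share of the probability wheel is read here. The function name is hypothetical.

```python
def posterior_prob_alternative(bayes_factor, prior_prob=0.5):
    """Posterior probability of the alternative hypothesis.

    bayes_factor: evidence for the alternative over the null.
    prior_prob: prior probability of the alternative (assumed 0.5 here).
    """
    if bayes_factor <= 0:
        raise ValueError("bayes_factor must be positive")
    if not 0 < prior_prob < 1:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    prior_odds = prior_prob / (1 - prior_prob)
    # Posterior odds = Bayes factor x prior odds.
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1 + posterior_odds)


# Bayes factor reported in Figure A1; equal prior odds are an assumption.
p_alt = posterior_prob_alternative(12.15)
print(f"Posterior probability of the alternative: {p_alt:.3f}")  # 0.924
```

With equal prior odds, a Bayes factor of 12.15 corresponds to a posterior probability of about 0.92 for the alternative hypothesis, and a Bayes factor of 1 leaves the probability at 0.5. Different prior odds would give a different posterior probability for the same Bayes factor.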


1
A recent review and meta-analysis of 1194 effect sizes from studies that included 10,937 bilinguals and 12,477 monolinguals between 3 and 17 years of age found no effect of language status (bilingual vs. monolingual) on cognitive ability, after accounting for moderating factors and publication bias (Lowe et al. 2021). The meta-analysis did not compare early bilinguals with late bilinguals.

2
Although it is possible that a dedicated control system inhibits the activation of words in the non-target language (Green 1998), such a system is not necessary. It is possible that greater activation of task-relevant words attenuates activation of task-irrelevant words. This could be achieved through simple lateral inhibition mechanisms, whereby the synchronous activation of one neural population automatically disrupts the synchronous activation of nearby populations (Buzsaki 2006). These lateral inhibition mechanisms are ubiquitous in the human brain and may help to prevent task-irrelevant interference (Buzsaki 2006). In other words, control processes and attentional processes may not be two independent systems but two aspects of one complex process involving highly diverse, interconnected, and interdependent neurons.