1. Introduction
Bilingualism and multilingualism are very common and established phenomena all over the world. For example, Singapore has four official languages and the country has adopted a bilingual education policy [1], India specifies 22 languages in the Eighth Schedule of the Indian Constitution [2], and in the European Union, nearly two-thirds of working-age adults report knowledge of at least one foreign language (European Commission, 2019).
Bi- and multilingualism are also complex phenomena. There are many reasons why people in the world communicate in more than one language. In some cases, bi- and multilinguals are speakers of a minority indigenous language who learn the dominant state language. In others, we find instances of so-called neighbourhood multilingualism (e.g., India; [3]). However, a great deal of attention has recently been paid to cases of sequential bi- and multilingualism. Multilinguals can be immigrants who speak their first language(s) as well as the language(s) of their host countries. Or they can simply be people who have learned languages beyond their native one to gain a competitive edge.
The fact that people around the world are using, learning, and communicating in more than one language has important implications, such as many people making decisions in a language that is not their native tongue on a daily basis. The question arises whether there are differences in the way people make decisions depending on whether they are presented with their native or their foreign language. A burgeoning body of literature suggests so. It has been reported that making decisions in a native language differs from making decisions in a foreign language in some key aspects, a phenomenon known as the foreign language effect (henceforth FLe; [4,5]).
The underlying theoretical basis of the FLe comes largely from the literature on thinking and decision-making, in particular from the influential dual-process framework [6,7]. While different characterizations of the framework are on offer (cf. [8]), they typically posit that two cognitive systems are implicated in reasoning. System 1 is intuitive and affective, whilst System 2 is analytic and rule-governed [6]; the latter is thus often referred to as more ‘rational’. With regard to decision-making, it is generally claimed that engagement of System 2 leads to more deliberative thinking and therefore to better choices. However, under certain circumstances, people are prone to ‘errors’ in judgement and decision-making (i.e., cognitive biases) believed to arise from the fast, automatic responses of System 1 (but cf. [9,10] for a proposal on how adaptive behaviours might sometimes come from fast, intuitive reasoning).
For individuals who speak more than one language, it has been reported that certain types of cognitive biases are reduced when decision-making problems are presented in their foreign language (FL) compared with their native language (NL) (for a review see [11]). For example, Keysar et al. [4] used different decision-making paradigms involving risk and loss aversion in a series of six experiments. Results showed that participants presented with problems in their NL tended to choose more risky options when a problem with the same outcome was framed in terms of losses than when it was framed in terms of gains (known as a ‘framing effect’). Conversely, no such ‘framing effect’ was observed (i.e., there was no asymmetry in risk-seeking choices between loss and gain framings) when the problem was presented in the FL. The authors proposed an account whereby reasoning in the FL elicits a weaker emotional reaction than reasoning in the NL, which in turn reduces cognitive biases influenced by emotional reactions.
Expanding on the findings of Keysar et al. [4], Costa et al. [5] tested the generalizability and boundaries of the FLe. To this end, they employed a large set of paradigms that either invoked or did not invoke an emotional component. Their first set of results replicated those of Keysar et al. [4], in that they reported a reduced framing effect for loss aversion biases in the FL compared with the NL. Moreover, the FLe was observed in other types of problems too, in particular in paradigms related to mental accounting and in most of the risk aversion scenarios (since then, framing effects and/or loss aversion biases in the context of the FLe have also been explored in other populations of young adults, e.g., [12,13,14,15]). Finally, among the paradigms not expected to carry an emotional component, they tested the FLe on logical reasoning using the cognitive reflection test (CRT; [16]). The CRT is a three-item test (in its original version, but see methods for a variant with more items) designed to elicit an incorrect, ‘intuitive’ answer (System 1). To override the intuitive answer and determine the correct one, one needs to engage in deliberative, analytical, and logical thinking (System 2). Costa et al. [5] reported that both the number of correctly answered questions and the number of intuitive wrong answers were similar for people performing the test in their NL and those performing it in their FL, thus revealing an absence of FLe in this task. Taken together, the results of this study suggested that the FLe generalizes to other decision-making paradigms involving emotional components but not to emotionally neutral paradigms (e.g., the CRT), thus giving an indication of the scope and potential boundaries of the FLe.
Further exploring the generalizability of the FLe, Costa et al. [17] assessed the FLe in relation to moral decision-making. They presented a series of moral dilemmas to participants in either their NL or their FL, in various populations of late L2 learners of different languages. The moral dilemmas included the well-known trolley problem, in which participants are asked whether they would sacrifice one individual (by pressing a switch) in order to save five people. They also employed the footbridge dilemma, which asks whether one would sacrifice an individual by pushing him off a bridge in order to save five people. The footbridge dilemma was considered ‘more emotional’ insofar as it entails pushing a man directly, whereas the trolley dilemma was considered less emotional. Under a utilitarian view, the ‘rational’ answer is considered the one that achieves the greatest benefit for the greatest number (i.e., answering yes in the examples above). Results of this study showed that for the trolley (or switch) problem, the one considered less emotional, participants chose the utilitarian option most of the time in both the NL and FL conditions. Conversely, for the footbridge problem, the one considered more emotional, participants presented with the dilemma in their FL made more utilitarian choices than those presented with it in their NL (but see also related studies on different populations by [18,19]). In addition, although the focus of the study by Costa et al. [17] was on morality, a subset of participants completed the CRT as well, which was used in that study as a background measure to support the claim that the FL group had a good understanding of the problems. In contrast to previous findings (cf. [5]), they reported better performance for participants doing the task in their FL compared with those doing it in their NL. Taken together, their results were interpreted as support for reduced emotionality in the FL.
As we have seen, one of the most prominent accounts of the potential underpinnings of the FLe appeals to reduced emotionality when reasoning in a FL. Such emotional distance in the FL might stem from factors such as the learning modality, for example, classroom instruction for the FL [20]. Using different methodologies such as behavioural, physiological, and neurophysiological measures, researchers have shown differences in emotional processing between the native and the foreign language. For example, taboo or swear words tend to be rated as less strong in the second than in the first language [21]. Caldwell-Harris and Ayçiçeği-Dinn ([22]; Experiment 1) showed reduced skin conductance responses to auditorily presented emotional phrases, such as childhood reprimands, when they were heard in the FL compared with the NL, thus suggesting a stronger emotional resonance in the NL than in the FL. In addition, using event-related potentials (ERPs), Opitz and Degner [23] reported a longer latency of an ERP component (Early Posterior Negativity, EPN), elicited when participants read positively (e.g., angel) and negatively (e.g., crime) loaded words relative to neutral words (e.g., bottle), in the second language (L2) than in the first language (L1). In sum, the reduced emotionality account has been further supported by evidence that the FL tends to elicit weaker emotional responses than the NL (for a review see [24]).
However, the reduced emotionality account has also been challenged on empirical and theoretical grounds. For example, in the context of moral decision making it has been argued that reduced emotionality cannot be the sole underlying mechanism to explain the FLe. In particular, a study by Geipel et al. [19] found a FLe when participants were faced with what is generally regarded as a more emotional dilemma (i.e., the footbridge problem) but not when they were faced with what is usually considered to be a less emotional moral dilemma (i.e., the trolley problem), consistent with previous findings. Yet, an important element of their study was that the authors also tested the purported reduced emotionality account by gathering self-ratings of emotionality, and in particular ratings of distress after participants completed the moral judgement questions. The reasoning was that if reduced emotionality modulated the FLe, one would then expect to find reduced emotionality ratings in the FL compared with the NL, but only in the case of the more emotionally salient footbridge problem. However, this was not the case, since the authors found reduced emotionality ratings in the FL in both the trolley and the footbridge problem, thus indicating that reduced emotionality is unlikely to modulate the FLe on moral judgment. While these results challenge the reduced emotionality account, the authors also provided an alternative explanation of FLe by pointing to the moderating impact of norm violations. In particular, in another experiment, the authors showed that the FL only influenced moral judgement when the action in question involved a social or moral norm.
An alternative account that has recently garnered a great deal of attention posits that reasoning in a FL promotes deliberative thinking, and that this explains the FLe in certain cognitive biases (see e.g., [25]). The logic behind this account stems from the proposal that reduced cognitive fluency improves performance on tasks that require more careful processing by making people more cautious about their responses. In other words, a decrease in processing fluency prompts people to slow down and think more carefully about the decision-making situation. Evidence of this can be found in earlier behavioural observations from the judgement and decision-making literature. For example, Alter et al. ([26]; Experiment 1) showed that factors that reduce cognitive fluency, such as presenting the CRT items in difficult-to-read text, led to higher accuracy rates than administering the test in easy-to-read fonts, and concluded that cognitive disfluency prompted engagement of System 2 (analytic reasoning). Moreover, reduced cognitive fluency created by presenting information in a difficult-to-read font has also been shown to reduce the number of erroneous responses to distorted questions [27] and to reduce confirmation bias [28]. Thus, the ability of disfluency to prompt additional processing and reduce the tendency towards bias and error has been documented in a variety of relevant contexts (but see [29] for contrasting evidence). In the context of the FLe, it is proposed that thinking in a FL incurs higher processing costs, thus reducing cognitive fluency. In turn, cognitive disfluency increases deliberation (see also [11]).
Hayakawa et al. [30] set out to test between the two aforementioned accounts, namely, reduced-emotionality and increased-deliberation. They used a ‘process-dissociation task’, a method that allows one to partial out the effects of deontological responding (related to emotionality) from those of utilitarian responding (related to deliberative reasoning). Results of a series of six experiments focusing on moral dilemmas showed that using a FL reduced deontological responding. In addition, they found no evidence that FL increased utilitarian responding (cf. [17]). They concluded that the moral FLe (MFLe) arises from reduced emotionality rather than increased deliberation.
To sum up, a body of literature has shown that using a FL reduces certain types of cognitive biases and increases utilitarian choices, particularly in paradigms that elicit an emotional component. Our understanding of the FLe is still far from conclusive (for discussions and future directions see [11,31]); moreover, recent discussions argue that the reduced emotionality and the increased deliberation proposals are not mutually exclusive [11,25]. In addition, among the questions that remain open is the extent to which factors surrounding the bilingual experience impact the FLe [11]. Most of the studies in the FLe literature have been conducted in populations of young adults who acquired their FL later in life, usually in late childhood or adolescence (e.g., [4,5,18,25,30]). But would the FLe still be observed irrespective of when the FL was learnt? Or depending on how balanced people's proficiency is across their languages? These are relevant lines of inquiry because, as noted above, the reduced emotionality account posits weaker emotional reactions in the FL compared with the NL. However, emotional responses in FL and NL have been reported to be modulated by age of acquisition [24]. As such, it would be expected that with the NL and FL being closer in age of acquisition, the purported difference in emotional reaction would be reduced and, in turn, emotionality should not differentially affect reasoning in the NL or FL. Similarly, the increased-deliberation account posits that the FLe stems from increased processing costs in the FL, leading to reduced cognitive fluency, which in turn promotes increased deliberation. Under this logic, as the gap in proficiency between languages narrows, there would be no additional processing cost in the FL and hence no increase in deliberative processing in the FL. In fact, Costa et al. [17] performed a post hoc analysis splitting their participants into lower and higher proficiency and found that the number of utilitarian choices in the aforementioned footbridge dilemma was higher for participants with lower proficiency than for those with higher proficiency.
A study that set out to focus specifically on aspects of the bilingual experience that could modulate the FLe on moral decision making was recently published by Wong and Ng [32]. They assessed the impact of age of acquisition and language dominance on responses to a series of personal and impersonal moral dilemmas in Chinese-English young adult early bilinguals. Results of their study showed no differences in the rate of utilitarian choices when participants performed in their NL or their FL. However, a FLe was crucially observed when language dominance was taken into account: the more dominant participants were in the language they were tested in, the larger the difference in their moral judgements of personal versus impersonal dilemmas. Finally, Mækelæ and Pfuhl [33], using a battery of reasoning tasks (including the CRT, base-rate neglect, ratio bias, and a probability matching task), compared the performance of young-adult participants tested in their first language, in their foreign language, or in a language-switching condition. Results of their study showed no differences in performance between the three conditions, and thus the authors concluded that deliberation is not increased when using a second language.
To expand on previous findings, the present study aims to test the FLe on logical reasoning in an older population. Age-related declines in certain aspects of cognitive functioning in healthy adults have been well documented in the literature (for discussions see e.g., [34,35,36]). In particular, cognitive abilities related to fluid intelligence (e.g., memory, processing speed, reasoning) have been reported to decline more than those related to crystallised intelligence (e.g., verbal ability) [35]. A number of accounts have been offered to explain such changes. From a neurobiological point of view, for example, one possibility is that frontal networks of the brain are more sensitive to ageing, which in turn impacts the cognitive abilities that depend on them (for a review see e.g., [37]). Reasoning is an ability that relies on frontal brain networks, and therefore its study in an older population can aid our understanding of cognitive changes in older adulthood.
In addition, and of relevance for the present study, within the framework of dual-processing it has been proposed that deliberative processing (related to System 2) is more likely to decline with age than is affective processing (related to System 1) (for a review see [38]). Based on that observation, studying populations who are not expected to be at ceiling on deliberative processing, in this case older adults, can be more informative for capturing an effect. Interestingly, a parallel line of reasoning is found in research on the impact of bi/multilingualism on cognition, and in particular on executive functioning (we thank an anonymous reviewer for raising this point). In that area of research, it has been suggested that certain cognitive effects of bilingualism are more likely to emerge in older adults than in younger adults (see e.g., [39,40]) due to young adults performing at ceiling (but cf. [41,42]). From that perspective, effects of language experience thus seem more likely to be observed in older adults. Although reliance on young adults is a common feature of modern experimental psychology, looking at young populations only is also problematic as it limits the generalizability of findings. Yet, as we have seen, the FLe, in particular in relation to deliberative processing, has been under-researched in older adults, as a large proportion of studies have focused on young adults.
In addition, given that most work on FLe has focused on testing the reduced-emotionality account, the present work sheds light on inconsistent previous findings on performance on CRT (related to increased-deliberation) in the context of the FLe. Furthermore, we take into account bilingualism-related factors (FL proficiency). Our hypotheses are thus based on the FLe in the context of the increased-deliberation account, and we reasoned that increased-deliberation could be tested in two ways as we outline below.
If thinking in a FL reduces cognitive fluency, and if reduced cognitive fluency engages deliberative and analytical thinking, our predictions are two-fold: first, we hypothesize that participants performing an analytical task (CRT) in their FL would have a higher rate of correct responses than those performing in their NL. Second, if the cognitive demands imposed by thinking in a FL are reduced at higher levels of proficiency, we predict an inverse relationship between proficiency and performance, such that participants performing the task in the FL will exhibit better performance at lower levels of proficiency. Finally, we also studied the effects of the demographic factors age and level of education. Regarding age, it has been reported that older adults, compared with younger adults, tended to provide a higher relative proportion of intuitive errors on the 3-item version of the CRT, whilst the opposite pattern (i.e., younger adults providing a higher proportion of intuitive errors than older adults) was reported for a longer 7-item version of the CRT [43]. In a similar vein, Stieger and Reips [44] performed analyses on each CRT item separately and reported that the lower the age, the higher the accuracy rate on item 1 of the classical 3-item version of the CRT. We also account for the effects of education, as it has been reported that higher education is a strong predictor of higher performance on the classic version of the CRT [44].
3. Results
Table 1 shows demographic information and language proficiency scores. There were no significant differences between the two groups with regards to age (U = 1501.50, z = −1.93, p = 0.054), gender distribution (χ²(n = 129) = 1.09, p = 0.296) and years of education (U = 1526.50, z = −1.33, p = 0.185). The groups were also comparable on the Raven’s APM short form (U = 1884.50, z = −0.027, p = 0.978).
As expected, the groups differed on their use of Swedish at different stages in life (all ps < 0.001). The NL group rated their proficiency in speaking Swedish (Mdn = 9.0) significantly higher than the FL group did (Mdn = 8.0; U = 1437.00, z = −2.22, p = 0.026). Self-rated writing proficiency was also higher in the NL group (Mdn = 9.0) than in the FL group (Mdn = 8.0; U = 1299.00, z = −2.78, p = 0.005), but the two groups did not significantly differ on their self-rated abilities to read (MdnNL = 9.0, MdnFL = 9.0; U = 1673.50, z = −1.03, p = 0.302) and understand Swedish (MdnNL = 9.0, MdnFL = 9.0; U = 1717.50, z = −0.794, p = 0.427). The overall composite Swedish mean score was significantly different between the NL (Mdn = 9.0) and the FL (Mdn = 8.8) groups (U = 1421.50, z = −2.28, p = 0.023). There was also a significant difference between the groups on the Swedish synonym test (SRB), in that the NL group performed better than the FL group (Mdn = 25.5 and Mdn = 21.0, respectively; U = 908.00, z = −4.87, p < 0.001).
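The U and z statistics in the group comparisons above are of the kind produced by a Mann-Whitney U test. As an illustration only, the following minimal Python sketch computes U and its normal-approximation z for two independent samples. The function names and the ratings are hypothetical and made up for this example (they are not the study data), and the sketch omits the tie correction that statistical packages usually apply to the standard deviation, so its z values would differ slightly from package output when ties are present.

```python
from math import sqrt


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, z): the smaller U and its normal-approximation z score.

    Assumption: no tie correction is applied to the standard deviation.
    """
    n1, n2 = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n1])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u = min(u_a, n1 * n2 - u_a)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mean_u) / sd_u


# Made-up self-ratings on a 0-10 scale, for illustration only.
nl_ratings = [9, 9, 8, 10, 9, 7]
fl_ratings = [8, 7, 8, 6, 9, 7]
u_stat, z_stat = mann_whitney_u(nl_ratings, fl_ratings)
print(f"U = {u_stat:.2f}, z = {z_stat:.2f}")
```

Because the smaller of the two U values is returned, z is never positive, which matches the sign convention of the values reported here.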
Regarding our first prediction, namely, that the FL group would perform better than the NL group on the CRT, we first compared overall accuracy on the CRT between the groups. There was no significant difference in performance between the NL group (Mdn = 1.00, M = 1.89, SD = 1.63) and the FL group (Mdn = 1.00, M = 1.67, SD = 1.62; U = 1705.50, z = −0.942, p = 0.346). In addition, since the groups did not differ on the covariates [49], namely, the variables age and years of education, we used ANCOVA to adjust for these two covariates. This showed that while both covariates were related to participants’ performance on the CRT (age: F(1, 121) = 6.01, p = 0.016, ηp² = 0.047; years of education: F(1, 121) = 3.99, p = 0.048, ηp² = 0.032), the group effect remained non-significant even after controlling for them (estimated marginal means after controlling for covariates: MNL = 2.00, SE = 0.18; MFL = 1.57, SE = 0.24; F(1, 121) = 2.06, p = 0.154, ηp² = 0.017). The parameter estimates for the covariates indicated that CRT accuracy decreased with age in the sample overall (b = −0.063, 95% CI [−0.1132, −0.0121]), whereas more years of education were associated with better CRT performance overall (b = 0.059, 95% CI [0.0005, 0.1177]) (see Appendix A for additional analyses).
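The partial eta squared values reported for the ANCOVA can be recovered from each F statistic and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies this identity to the three F values reported above; the function name is a hypothetical helper introduced for this illustration.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values as reported in the text, each with df = (1, 121).
reported = {"age": 6.01, "years of education": 3.99, "group": 2.06}
for label, f_value in reported.items():
    print(label, round(partial_eta_squared(f_value, 1, 121), 3))
```

Rounded to three decimals, the results agree with the effect sizes reported in the text (0.047, 0.032, and 0.017).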
Table 2 displays the distribution (in percent) across the six CRT items between the NL and FL groups. There was no significant association between the proportion of correctly answered problems and language group, χ²(n = 129) = 3.49, p = 0.745 (Fisher’s exact p = 0.783).
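As a consistency check on the chi-square results, the p value can be computed from the test statistic using closed-form expressions for the upper tail of the chi-square distribution. The sketch below is illustrative and rests on an assumption: the degrees of freedom are not stated in the text, so df = 6 is assumed for the association with language group (seven possible scores, 0 to 6, across two groups) and df = 1 for the gender comparison. The function name is a hypothetical helper.

```python
from math import erfc, exp, factorial, sqrt


def chi2_upper_tail(x, df):
    """Upper-tail p value of a chi-square statistic (df = 1 or any even df)."""
    if df == 1:
        # For one degree of freedom the tail reduces to the complementary error function.
        return erfc(sqrt(x / 2))
    if df > 0 and df % 2 == 0:
        # For even df the tail is a finite sum (a Poisson cumulative probability).
        half = x / 2
        return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))
    raise ValueError("this sketch only handles df = 1 or even df")


# Assumed degrees of freedom (not stated in the text): 6 and 1.
print(round(chi2_upper_tail(3.49, 6), 3))
print(round(chi2_upper_tail(1.09, 1), 3))
```

Under these assumed degrees of freedom, the computed values round to the reported p = 0.745 and p = 0.296, respectively.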
Table 3 displays zero-order correlations between the variables within the FL group. There was a relatively strong association between years of education and CRT performance, even after removal of two extreme outliers (28 and 40 years of education, respectively), rs(42) = 0.37, p = 0.015, 95% CI [0.0688, 0.6139]. However, this association did not reach statistical significance after correcting for multiple comparisons. Age was negatively associated with all measures of Swedish proficiency: the SRB, each of the self-rated sub-domains, and the Swedish mean composite (Table 3), indicating that the older the participants, the lower their Swedish proficiency. After correcting for multiple comparisons, the associations between age and each of the self-rating measures and Swedish mean composite score remained significant, but not the association between age and SRB. In addition, there were strong positive correlations between the Swedish self-rating measures and the objective measure of Swedish, namely, the SRB (Swedish synonym test), which remained after correcting for multiple comparisons (Table 3).
Regarding our second study prediction, namely that participants performing the task in the FL would exhibit better performance at lower levels of proficiency, the correlations between each of the four self-rated proficiency sub-domains and CRT performance (accuracy) in the FL group were instead positive, although none reached statistical significance (ps ≥ 0.071, Table 3). Controlling for age and years of education using partial correlations changed the direction of the association between CRT performance and Swedish speaking and writing to negative, but all four associations remained non-significant (ps ≥ 0.632). The association between the Swedish mean composite score and CRT accuracy was not statistically significant either (rs(45) = 0.28, p = 0.060, 95% CI [−0.0203, 0.5388]). A partial correlation controlling for age and education yielded a negative association between CRT accuracy and Swedish mean composite score, which again did not reach statistical significance, r(40) = −0.002, p = 0.991, BCa 95% CI [−0.3107, 0.2781]. The relationship between performance on the SRB test and CRT accuracy was positive but not statistically significant, rs(45) = 0.23, p = 0.128, 95% CI [−0.0769, 0.4973], and controlling for age and education in a partial correlation did not change this result, r(40) = 0.01, p = 0.966, BCa 95% CI [−0.4035, 0.4401].
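To make the logic of the partial correlations concrete, the following minimal Python sketch computes a Pearson partial correlation between two variables while controlling for one or more covariates, using the standard recursive formula. It is an illustration under stated assumptions, not the analysis pipeline used here: the function names are hypothetical, the data are made up, and the bootstrap (BCa) confidence intervals reported above are not reproduced.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, covariates):
    """Correlation between x and y after partialling out each covariate in turn."""
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[0], covariates[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up values for illustration: both variables partly track the covariate.
covariate = [1, 2, 3, 4, 5, 6]
var_a = [2, 0, 4, 5, 3, 7]
var_b = [2, 1, 3, 4, 4, 7]
print(round(pearson(var_a, var_b), 3))
print(round(partial_corr(var_a, var_b, [covariate]), 3))
```

With k covariates and n complete cases, the degrees of freedom for the associated test are n − 2 − k.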