The Critical Period Hypothesis for L2 Acquisition: An Unfalsifiable Embarrassment?

This article focuses on the uncertainty surrounding the issue of the Critical Period Hypothesis. It puts forward the case that, with regard to naturalistic situations, the hypothesis has the status of both "not proven" and unfalsified. The article analyzes a number of reasons for this situation, including the effects of multi-competence, which remove any possibility that competence in more than one language can ever be identical to monolingual competence. With regard to the formal instructional setting, it points to many decades of research showing that, as critical period advocates acknowledge, in a normal schooling situation, adolescent beginners in the long run do as well as younger beginners. The article laments the profusion of definitions of what the critical period for language actually is and the generally piecemeal nature of research into this important area. In particular, it calls for a fuller integration of recent neurolinguistic perspectives into discussion of the age factor in second language acquisition research.


Introduction
In SLA research, the age at which L2 acquisition begins has all but lost its status as a simple quasi-biological attribute and is now widely recognized to be a 'macrovariable' (Flege et al. 1999; cf. Birdsong 2018), in other words, a complex combination of sociocultural and psychological variables. Dimensions other than physical maturation are increasingly often taken into account in discussions of age and language acquisition. This stems from the recognition that a host of factors is responsible for individual variability in L2 attainment, including the state of entrenchment of the L1, psychological variables such as self-regulation, motivation and identification, conative factors, as well as the degree of immersion in the L2 context, among others (see Birdsong 2018). Despite this, a narrow maturational perspective still persists, in the form of various versions of the Critical Period Hypothesis (CPH) (Lenneberg 1967). However, the precise timing of the offset of the posited critical period has long been a matter of debate, as has the proposed range of its effects (cf. Singleton 2005).
Some CPH advocates in the SLA area maintain the traditional perspective of the hypothesis ever more strictly. With regard to their criterion for the falsification of the CPH, they demand "scrutinized nativelikeness" for all linguistic features in the performance of later learners of additional languages at all times (Abrahamsson and Hyltenstam 2009; Long 2013). The one dimension in which some prominent CPH advocates (e.g., DeKeyser 2003; Johnson and Newport 1989) concede that the critical age has no role is that of formal education, for reasons having to do with the essential experiential difference between the normal language classroom and the naturalistic learning environment.
In this position paper, we explore a number of areas, concluding that in the naturalistic sphere, the critical period notion remains unproven but also unfalsified, a disappointing state of affairs given the amount of time that has passed since the CPH first emerged. We point out the main reasons for this lack of progress and the unfortunate consequences of the failure to resolve the controversy.

The Notion of Critical Period
The term critical period, as used by biologists, refers to a phase in the development of an organism during which a particular capacity or behavior must be acquired if it is to be acquired at all. More precisely, a critical period is a "bounded maturational span during which experiential factors interact with biological mechanisms to determine neurocognitive and behavioral outcomes" (Birdsong 2017). Certain influences or stimuli from the environment are judged necessary for the particular development to take place. Critical periods are assumed to be enabled by the fact that the brain is especially plastic during early development, allowing for neural wiring to form optimal circuits for the development of a specific capacity or behavior. The reason why critical periods end is purportedly to facilitate future development. Once neural circuits are fully formed, they become fixed, which allows other, more complicated functions to build on the more basic ones once those are consolidated (Cisneros-Franco et al. 2020).
An example often cited is that of early imprinting in certain species. Thus, for instance, immediately after hatching, ducklings follow and become attached to the first moving object they perceive, usually their mother. This following behavior occurs only within a certain period of time, after which the hatchlings develop a fear of strange objects and retreat from them instead of following them. The interval between the onset of the following behavior and its cessation is what is seen as the critical period for imprinting (Clark and Clark 1977, p. 520). Another example is the acquisition of birdsong: for instance, if a young chaffinch does not hear an adult bird singing within a certain period, the young bird in question will apparently never sing a full chaffinch song (Thorpe 1954). Imprinting in birds exemplifies a sharply delimited critical period of relatively short duration. Critical periods for the development of complex behaviors in humans are understood to be longer and much less clearly delineated (Purves et al. 2004). If language acquisition in human beings is constrained by the limits of a critical period, the implication is, however, that unless language acquisition gets under way before the period ends, it will not fully happen. There is also widely assumed to be an analogical implication that additional languages acquired beyond the end of the critical period will not be completely or "perfectly" acquired. This analogy is actually problematic, since the idea of a critical period for language in general is different from that of a critical period for specific language competencies.
Early evidence for the existence of a critical period for first language acquisition was based on cases of "feral children", or of children who were deprived of socialization during childhood and were later unable to acquire language successfully. Obviously, these conditions of extreme deprivation could have serious psychological consequences, and so the problems with speech development cannot be attributed solely to a missed critical period. Less problematic is the more recently available evidence from infants who are born deaf but later have their hearing restored surgically through cochlear implantation. The earlier the surgery is performed, apparently, the more likely the child is to develop normal speech, preferably before the age of 6 months (Kral and Eggermont 2007). In contrast, a mature person deprived of hearing speech for three years will retain their language faculty, deprivation of a specific sensory experience at a later age seeming not to be damaging to the already formed system. On the other hand, the neural circuitry in the adult is certainly not immutable; plasticity persists and can be redeveloped (see Cisneros-Franco et al. 2020, for an overview and bibliography).
If there is a critical period, what are its limits and what is the extent of its effects? With regard to limits, as Bates et al. (1997) noted some years ago, "the end of the critical period for language in humans has proven . . . difficult to find, with estimates ranging from 1 year of age to adolescence" (p. 85). Differences concerning the offset of language-readiness go back to the origins of the Critical Period Hypothesis. For Penfield (Penfield and Roberts 1959), the widely acknowledged forerunner of the CPH, the critical age was after age nine, when the brain was supposed to lose its plasticity, whereas for Lenneberg (1967), the "father" of the CPH, it was puberty, when the process of assigning language functions to the language-dominant brain hemisphere was supposed to be complete. Both Penfield and Lenneberg strongly promoted the idea that language learning capacity is programmed to undergo a sudden and serious decline at a particular point; like many researchers taking this line, however, they disagreed as to where precisely this point is located. There have been claims that the critical period for everyone ends even earlier than age six (see, e.g., Hyltenstam and Abrahamsson 2003; Ruben 1997). Meisel (2008) suggests that, at least for some aspects of language, the window of opportunity for nativelike ultimate attainment begins to close as early as 3-4 years of age. Recent developments in critical period research have brought us no closer to an agreed offset point. Two recent large-scale studies whose findings have been interpreted by their authors as supportive of the CPH have placed the close of the critical period at 9 years (Dollmann et al. 2020) and at 17 years (Hartshorne et al. 2018), a difference of no less than eight years.
Especially problematic is the widespread acceptance of puberty (following Lenneberg) as the critical period offset point. The CPH literature generally assumes puberty to occur at around twelve to fourteen years of age. This assumption turns out to be a gross simplification. Puberty (1) turns out, in fact, to be associated with quite a wide age range (8-14 years), (2) usually occurs later in boys than in girls, and (3) has an increasingly early onset in girls in many cultures (see, e.g., Roberts 2013). Some girls, in fact, experience puberty as early as age six (see, e.g., Biro et al. 2020), and there are also cases of individuals not reaching puberty until their very late teens (see, e.g., Abdel Aal 2016).
If the acquirability of a second language is indeed tied to puberty, it follows from the foregoing that this acquirability is severely curtailed at age six in some individuals, while in others it remains unproblematic until age 17 or 18, a proposition that, commonsensically, appears rather questionable. However, although implausible, the proposition is not invalidated by even very large individual differences. There remains the theoretical possibility that the ability to fully acquire a second language may be related to puberty. However, the wide age range for puberty raises a very serious issue concerning the design of studies which explore this question. Most researchers collect data on the age of arrival in the L2 country, or the age at which language instruction began, but not on the age at which individuals began puberty, probably because of the widespread assumption that puberty occurs at roughly the same age for everyone. Since this assumption is unreliable, not much useful information can be found in the literature on this issue.

CPH or CPHs?
Nor do differences among researchers concern only the CPH offset point. Regarding the affected language learning capacities, as pointed out by Singleton (2005), CPH advocates have written of deficits in general language learning ability and in linguistic features of every degree of supposed innateness. As far as the underlying sources of critical period effects are concerned, Singleton (2005) catalogues six accounts of a neurobiological nature, as well as four relating to cognitive development, and a further four having to do with affect and motivation. His response to this enormous range of perspectives is that the CPH cannot plausibly be regarded as a scientific hypothesis either in the strict Popperian sense of something which can be falsified (see, e.g., Popper 1959) or in the looser sense of something that can be clearly confirmed or supported (see, e.g., Ayer 1959). Birdsong and Vanhove (2016, p. 164) make a similar point, saying that the CPH is actually "a conglomerate of partly overlapping, partly contradictory hypotheses", and thus resistant to proof or disproof. Singleton (2005) also critiques the "multiple critical periods" idea, revived by, for example, Granena and Long (2013), who posit three sensitive periods, closing, according to their analysis, first for phonology, then for lexis, and finally for syntax. Supporters of this idea might have been pleased with the results of the two recent studies mentioned above, by Hartshorne et al. (2018) and Dollmann et al. (2020); the former inferred critical period closure at the age of 17 on the basis of a test of syntax (in L2 English), whereas the latter suggested closure at the age of 9 on the basis of measurements of the degree of foreign accent (in L2 German). However, many other critical periods for language have been proposed, with different sequences and different ages.
For example, Meisel (2008, 2010) suggests that there are various periods for different aspects of grammar (for example, inflectional morphology), and that some aspects of language are affected already at 3-4 years of age. Additionally, the multiple critical periods hypothesis has always been undermined by mixed evidence and by counterevidence. For example, the notion that in order to attain a nativelike accent, one has to begin one's L2 experience in early childhood was devastatingly contradicted by Bongaerts' series of studies (e.g., Bongaerts 1999, 2003) and by Moyer's work (e.g., Moyer 1999, 2004, 2013, 2014). Even studies which offer support to the CPH, such as that of Dollmann et al. (2020), find that there are always exceptions to the general trend; that is, there are always cases of high levels of ultimate attainment in late L2 learners. Moreover, if multiple critical periods were indeed to occur in the postulated sequence, it would be impossible to encounter L2 users who have a nativelike accent but an imperfect command of syntax, which is obviously not the case. One could argue that the last point rules out the possibility that multiple critical periods occur in a specific sequence, but not that they occur in general. However, if multiple critical periods exist in an unspecified number, for an unspecified set of aspects of language, take place at varying ages, and in a different sequence for each individual, the lack of specificity renders this hypothesis almost meaningless, at least until a theoretical model or explanation is offered which would make some predictions as to why the sequence should be variable. To date, no such model has been proposed.
The notion of multiple, separate critical periods for language, specifically for grammar and lexis, is also undermined by a number of recent trends in linguistics which blur the distinction between the two. The notion that lexis and syntax are clearly separable (cf. Singleton 2020a, 2021; Singleton and Leśniewska 2021) was dealt a death blow by the work of Sinclair (e.g., Sinclair 1991) and Hoey (e.g., Hoey 2007), then buried deep by emergent grammar (see, e.g., Lantolf and Thorne 2006) and by the usage-based perspective on language knowledge (e.g., Ellis 2017).
Naturalistic evidence generally supports the notion that, in the long term, the earlier L2 learning begins, the higher the degree of L2 proficiency attained. This is the pattern found in classic immigrant studies. Thus, for example, Asher and García (1969) demonstrated that an early age of arrival in America was a better predictor of English pronunciation than length of residence; Seliger et al. (1975) found that most of those who had migrated to Israel or the United States before age 9 thought themselves to be native speakers of Hebrew or English, whereas most of those who had migrated at or after age 16 felt they still had a foreign accent; Patkowski (1980) showed a negative relationship between English syntactic rating and age of arrival in the United States; Hyltenstam (1992) discovered a higher number of lexical and grammatical errors in the Swedish of immigrants settling in Sweden after age 7; and Piske et al. (2002) found the vowel production of early bilinguals to be more nativelike than that of late bilinguals. A general finding appears to be that those who arrive early in a country where a language in general use differs from their home language are more likely than older arrivals eventually to pass for native speakers of the new language.
This "earlier the better" tendency in naturalistic SLA is, however, only a tendency. Not all immigrants arriving in their host country in childhood attain a high degree of mastery of the ambient language, and those who arrive later do not necessarily fail to acquire the degree of proficiency attained by younger arrivals. Regarding the latter point, one can cite the case of the 20 late L2 acquirers of French in Kinsella and Singleton's (2014) study. In a test of their identification of regional French accents and a lexicogrammatical test, 3 of these 20 participants scored within native-speaker ranges across the board. Such findings do not, however, undermine the CPH for its most stalwart advocates (e.g., Abrahamsson and Hyltenstam 2009; Long 2013), for whom the criterion for falsification is 'scrutinized nativelikeness' in the L2 at all times with regard to every single linguistic feature in the later learner (Abrahamsson and Hyltenstam 2009).

Problems with the "Scrutinized Nativelikeness" Yardstick
Nativelikeness (see, e.g., Long 1990, 1993) has, in fact, proved extremely difficult to establish and demonstrate in general (cf. Dewaele et al. in press). Some years ago, Davies, addressing the problem of defining what a native speaker actually is, expressed the view that "the distinction native speaker-non-native speaker . . . is at bottom one of confidence and identity" (Davies 2003, p. 213). The concept of "scrutinized nativelikeness" is problematic for two important reasons. Firstly, the conception of nativeness as a benchmark implies that there is a specific, clearly defined level of language proficiency that characterizes native speakers, whereas, in reality, native speakers of a language display quite a wide spectrum of divergence from idealized norms (Dąbrowska et al. 2020). It is now recognized that even monolingual native speakers exhibit features in their representations of linguistic structure that would normally be deemed erroneous (see Dąbrowska 2012). Hulstijn (2019) emphasizes the need to recognize this range of levels of usage among native speakers, because the widespread assumption concerning the homogeneity of the native speaking population is problematic for SLA research. Most studies that assess the level of L2 learners in relation to native speaker norms with the use of tests assume that native speakers will obtain maximum results. This impression may be reinforced by the fact that native speaker control groups tend to be highly educated (Andringa 2014), owing usually to the convenience factor of researchers drawing the native-speaker sample from their colleagues or students. When comparing non-native speakers' performance on grammar tasks to that of native speakers, Dąbrowska et al. (2020) observed that the amount of variation in native speakers substantially exceeded expectations based on previous research.
This was due to their use of a larger and more heterogeneous group of native speaker controls; a less restrictive approach to the selection of native speaker controls increases the performance overlap between near-native learners and native speakers.
Secondly, it is doubtful whether any speaker of more than one language should be assessed using norms based on the performance of monolingual speakers. From Cook's (2016) multi-competence perspective, none of the languages of such a speaker can be expected to coincide faithfully with the native language of monolinguals (see Cook 2002). The interaction between the relevant language competencies inevitably has effects on language production (see, e.g., Jarvis and Pavlenko 2008). Birdsong (2008, p. 22) expresses a similar view, arguing that "minor quantitative departures from monolingual values are artefacts of the nature of bilingualism, wherein each language affects the other and neither is identical to that of a monolingual". It goes without saying, or should, that this mutual influence includes the domain of "language intuition" (cf. Abrahamsson 2012). Recent work in translanguaging (e.g., Wei 2018; Singleton 2020b) supports Cook's and Birdsong's insights.
The above discussion has important implications for CPH research. Birdsong (2014, p. 47) comments that, because of the mutual influence of a multilingual's knowledge of his/her languages, and the fact that the L2 will inevitably be affected by such influence, "nonnativelikeness will eventually be found"; if, then, pure and exceptionless nativelikeness is demanded in order to disconfirm the CPH, he argues, "the CPH is invulnerable to falsification". An implication which is important for future research is that L2 learners should be compared to, or judged according to, norms set by bilingual (or multilingual) speakers of the same language who acquired the language in question from birth.

Aptitude
Studies claiming to demonstrate the existence of a critical period for SLA tend to find exceptions in the form of late L2 learners who nevertheless perform at a nativelike level. This is of course problematic for proponents of the CPH. One way in which CPH advocates try to deal with the problem is by positing that some individuals have particular innate characteristics which allow them to overcome the disadvantages of missing the critical period. The main candidate trait in this respect is language aptitude, the degree of which has been widely portrayed as inborn (see, e.g., Carroll 1981). High aptitude is often designated a "gift for languages" (Rosenthal 1996, p. 59), which, according to some, may act to some degree as a prophylactic against the effects of the critical period (see, e.g., Abrahamsson and Hyltenstam 2008; Granena and Long 2013; Dollmann et al. 2020). Such a claim is made, for example, by DeKeyser (2000) in relation to those of his immigrant late learners who performed at a native level, all of whom, he reported, showed high aptitude. Birdsong's (2014) reanalysis of the data in question, however, suggests that education was a more robust predictor of the proficiency results than aptitude. In any case, it is not clear that language aptitude is simply an innate trait (cf. Singleton 2017); at least to an extent, the awareness that derives from experience and training seems to impact on it (cf. Robinson 2002). Kormos (2013, pp. 145-46), citing a range of studies (Eisenstein 1980; Sparks et al. 1995; Sáfár and Kormos 2008; Nijakowska 2010), sums up the way in which thinking on this matter is moving: "Although language-learning aptitude might seem to be a relatively stable individual characteristic when compared with other factors, such as motivational orientation and action control mechanisms, there seems to be some converging evidence that certain components of aptitude . . . might improve in the course of language learning."
This, of course, hugely complicates the posited interaction between aptitude and the so-called critical age. It very much suggests that what has been propounded on this issue is, to say the least, wildly premature.

Age or Opportunity?
While it is a widely accepted fact that, in the naturalistic environment, older beginners are not generally as successful as younger ones, doubts arise as to whether age itself is in fact the variable at play. There are other variables that have nothing to do with biological maturation, but are confoundable with age. Most obviously, length of residence in the target country often correlates with ultimate attainment alongside age (see, e.g., the results of Dollmann et al. 2020). A wide range of other factors have been proposed in this respect, including psychological, social, and educational ones. A major suspect in this context appears to be the amount and quality of input experienced (see, e.g., Flege 2019). Arrival in the host country at a later age means, for example, less time spent in school. As Flege points out, it is often assumed that increased length of residence automatically means more input, but it does not; immigrants may spend years interacting with a predominantly L1 environment, or with the accented (or otherwise non-native) L2 speech of other members of the immigrant community. As Flege and Bohn (2021) put it, "immigrants' length of residence (LOR) in a predominantly L2-speaking environment is problematic because it does not vary linearly with the phonetic input that L2 learners receive and because it provides no insight into the quality of L2 input that has been received" (p. 32).
Socioeconomic status (SES) is likely to be a factor, too, by analogy with the role that SES is known to play in L1 and L2 acquisition (see Huttenlocher et al. 2010 and Vasilyeva et al. 2008 for SES in L1, and Huang et al. 2018 for SES in L2). Immigrants with a lower SES are less likely to have good educational opportunities, and are more likely to have stronger ties to L1-speaking migrant communities. Such considerations have led some to postulate that opportunity rather than age is the most important predictor of attainment: the opportunity for large amounts of high-quality input, interaction, and education in the target language. Thus, Marinova-Todd et al. (2000) argue that late L2 acquisition ends in full success for those "adults who invest sufficient time and attention in SLA and who benefit from high motivation and from supportive, informative L2 environments".

Looking for Discontinuity
As has been mentioned several times, research on the age factor in SLA in the naturalistic sphere generally shows a negative correlation between age of acquisition and ultimate attainment. In other words, the older one is at the beginning of acquisition, the lower the level of proficiency that will be the long-term outcome of learning. This is a general trend, not really questioned despite the abundant exceptions. However, to demonstrate the existence of a critical period, there has to be incontrovertible evidence of a discontinuity in the relationship between the effects of different ages of acquisition on ultimate attainment, preferably visible in studies with large numbers of participants.
However, it is less clear what counts as such discontinuity, and, more generally, what shape is to be expected of the relationship between age of acquisition (AoA) and ultimate attainment (UA) (see Hartshorne et al. 2018; Vanhove 2013). Do we expect UA to be the same for all AoAs in the entire critical period window, then to drop sharply and flatten again, or is there supposed to be a gradual decline throughout? Different conceptualizations of the critical period effects are possible. This raises serious methodological issues for large-scale studies. The work of Vanhove (2013) and Birdsong and Vanhove (2016) shows how differences (sometimes seemingly minor) in how data are statistically analyzed can make a major difference in what inferences are drawn from the data when it comes to the analysis of the AoA-UA function.
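The methodological point can be made concrete with a small, purely hypothetical simulation (the data and the particular segmented model are our invention for illustration, not any study's actual analysis). Data generated by a smooth, gradual AoA-UA decline will still yield a "best-fitting" breakpoint when a two-segment model is imposed, because a model with separate lines on either side of a breakpoint always fits at least as well as a single line:

```python
import random

random.seed(1)

# Simulated learners: age of acquisition (AoA) from 1 to 40, with ultimate
# attainment (UA) generated by a smooth, GRADUAL decline plus noise --
# no true discontinuity is present in these data.
aoa = [random.uniform(1, 40) for _ in range(400)]
ua = [100 - 1.2 * a + random.gauss(0, 5) for a in aoa]
data = list(zip(aoa, ua))

def ols_sse(points):
    """Sum of squared errors of the least-squares line through `points`."""
    if len(points) < 2:
        return 0.0
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    den = sum((x - mx) ** 2 for x in xs) or 1.0
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / den
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Model A: one continuous linear decline across the whole AoA range.
sse_linear = ols_sse(data)

# Model B: separate lines before and after a breakpoint, with the
# breakpoint itself chosen by grid search over candidate ages.
def piecewise_sse(bp):
    return (ols_sse([p for p in data if p[0] <= bp])
            + ols_sse([p for p in data if p[0] > bp]))

best_bp = min(range(5, 36), key=piecewise_sse)
sse_piecewise = piecewise_sse(best_bp)

print(f"single-line SSE: {sse_linear:.0f}")
print(f"'best' breakpoint: age {best_bp} (SSE {sse_piecewise:.0f})")
# The segmented model nominates a breakpoint and achieves a lower SSE even
# though the generating process is perfectly smooth; an improved fit alone
# is therefore not evidence of a critical-period discontinuity.
```

The point, in the spirit of Vanhove (2013), is that the choice of model must be justified and any improvement in fit must be tested against what the extra parameters buy by chance, rather than the discontinuity simply being read off the better-fitting curve.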
Very intriguing results have recently come from a study by Hartshorne et al. (2018), involving a record-breaking number of participants (over 600,000), which makes it a prime example of the so-called "big data" approach to psycholinguistic study. The test constructed by the researchers went viral on social media, thanks to the fact that participants could test themselves on whether they were nativelike and also see whether the test could accurately identify which variety of English they used. It should be noted here that the test was designed with the intention of measuring the subjects' knowledge of syntax only. One of the main challenges in critical-period-related research is that it is difficult to infer information about the change in learning rate (when it occurs, and how sudden it is) from data about AoA and ultimate attainment. (Two contradictory theoretical models, a gradual decline in learning ability over the lifespan and a sudden slowing down of learning at the end of a critical period, may produce the same result in terms of ultimate attainment.) To overcome that challenge, the authors of the study used a novel modelling technique to mathematically reconstruct the learning rate on the basis of the available answers. The results indicate that the learning rate for syntax declines by an astonishing 50 percent at around the age of 17.4. In response to some concerns about the novel method used to analyze the data (see Frank 2018), a follow-up study was conducted by Chen and Hartshorne (2021) that presents an improved analysis of the impressive dataset, which had meanwhile grown to include data from more than one million respondents. While the original study did not provide any estimate of uncertainty, Chen and Hartshorne (2021) added bootstrapping to obtain confidence intervals. They also used Item Response Theory, instead of simply the proportion of correct answers on the test, to gauge syntactic knowledge.
Finally, and most importantly, they used a different mathematical model from the original study, in order to investigate the possibility that the previous results may have been an artefact of the model used. The results of the new study confirmed the previous findings; only the age at which the syntax learning ability declines was found to be slightly later. Hartshorne et al. (2018) and Chen and Hartshorne (2021) thus provide support, in a way, to the CPH, in the sense that they provide evidence of discontinuity in the learning rate, but their findings also constitute a major challenge for existing CPH-based predictions in that they place the discontinuity much later than most scholars have predicted. The impressive size of the dataset and the ensuing statistical power give a lot of weight to the results. It also needs to be remembered, however, that the two studies only concern syntactic knowledge, and only as much of that knowledge as was measured by the specific test used in the study, as the authors themselves acknowledge. The test features a selection of items which are meant to represent syntax, and in fact, some grammar topics appear to be overrepresented (cleft sentences, passive voice) and some items would usually be labelled as lexical (such as the item "I . . . . the story", in which the correct answer "told" needs to be selected from among distractors such as "said"). It is, therefore, unclear to what extent the test covers English syntax comprehensively. Moreover, syntax is assumed to consist of a finite number of rules, a view which has been challenged by developments in phraseology and construction grammar (as we discuss elsewhere in this article). Syntax is also assumed to be a unified whole, for which one critical period would apply, which is again one theoretical possibility, but not the only one. Such reservations by no means take away from the importance of the study, but they point to the need to conduct further studies along these lines.
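The role of bootstrapping in quantifying the uncertainty of a breakpoint estimate can be illustrated with a minimal, hypothetical sketch (invented numbers loosely echoing the reported pattern; this is not Chen and Hartshorne's model or data): resample the dataset with replacement, re-estimate the breakpoint on each resample, and read a confidence interval off the distribution of estimates.

```python
import random

random.seed(2)

# Invented data WITH a genuine discontinuity: flat attainment up to age 17,
# declining thereafter, plus noise.
aoa = [random.uniform(1, 40) for _ in range(500)]
ua = [(95.0 if a <= 17 else 95.0 - 2.0 * (a - 17)) + random.gauss(0, 4)
      for a in aoa]
data = list(zip(aoa, ua))

def seg_sse(points):
    """SSE of the least-squares line through `points` (0 if too few points)."""
    if len(points) < 2:
        return 0.0
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    den = sum((x - mx) ** 2 for x in xs) or 1.0
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / den
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def estimate_breakpoint(points):
    """Grid-search the breakpoint minimising the two-segment SSE."""
    return min(range(5, 36),
               key=lambda bp: seg_sse([p for p in points if p[0] <= bp])
                              + seg_sse([p for p in points if p[0] > bp]))

point_estimate = estimate_breakpoint(data)

# Bootstrap: 200 resamples with replacement, re-estimating each time.
boot = sorted(estimate_breakpoint(random.choices(data, k=len(data)))
              for _ in range(200))
ci_low, ci_high = boot[4], boot[195]  # ~2.5th and ~97.5th percentiles

print(f"breakpoint: {point_estimate}, 95% bootstrap CI: [{ci_low}, {ci_high}]")
```

The width of the resulting interval is precisely the estimate of uncertainty that the original Hartshorne et al. (2018) analysis lacked; a narrow interval around the estimated breakpoint lends the discontinuity claim far more weight than a point estimate alone.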

Neurolinguistics: New Developments
Despite the many versions of the critical period discussed earlier, Lenneberg is aptly described as the "father" of the CPH, to the extent that his 1967-vintage version of the hypothesis, postulating that the critical period affects all aspects of language and that it ends at puberty, is probably still the one that holds most sway. The element that has most suffered the ravages of ageing is the rationale underlying Lenneberg's hypothesis. Lenneberg explained the alleged problems of later second language learners in terms of a developmental process in the brain which, according to him, was completed by the "critical age" of puberty. The process referred to is the "lateralization" of language functions to the brain hemisphere dominant for language (usually the left). Lenneberg's account of the nature and timescale of such lateralization is no longer taken seriously by neuroscientists. Indeed, as early as 1973, Krashen devastatingly critiqued Lenneberg's account of lateralization, claiming on the basis of brain damage studies that it was complete before age 5 (Krashen 1973, p. 65). Current research suggests a complex and multi-factored relationship between lateralization and age (see, e.g., Nenert et al. 2017).
It is worth stressing that the original CPH made claims about the biology of the human brain; that is, the hypothesized changes were supposed to be neurological in nature. However, subsequent research sought to provide evidence for the hypothesis principally on the basis of studies of human behavior, e.g., second language attainment in learners with different AoAs. The reason for this was that meaningful, detailed study of the postulated neurological reality behind age-related language acquisition phenomena was not possible at the time.
Much has changed in this respect since the 1960s, when Lenneberg's hypothesis was originally published, particularly in the last two decades. In fact, developments in the neurosciences have been much more radical in recent decades than those in mainstream second language acquisition research, which relies on essentially the same approaches as before (the main technological development being larger-scale studies facilitated by the internet). Neurolinguistics, however, has seen major developments in the form of electroencephalography/event-related potential (EEG/ERP) research as well as (functional) magnetic resonance imaging ((f)MRI).
When large groups of neurons fire at the same time in the brain, the resulting small voltages can be recorded at the scalp, and ERPs are derived by time-locking such recordings to stimuli. Different types of linguistic stimuli result in distinct "brain signatures" (DeLuca et al. 2019, p. 176), i.e., different ERP patterns. The patterns most often mentioned in the literature on linguistic processing are the N400 and the P600, which differ in terms of amplitude, direction and timing. The P600, for instance, accompanies many types of morphosyntactic processing in native speakers, while N400 effects are typical of semantic processing. Such signatures therefore allow researchers to gauge a participant's sensitivity to a specific linguistic stimulus. If the ERP signatures generated in response to the same stimulus differ between someone's L1 and L2(s), this implies that different processing mechanisms are used in each case. (See DeLuca et al. 2019, for an accessible overview of ERPs in language processing research and further bibliography.) Even though the body of ERP research on language processing has produced results which are, to some extent, mixed (see, for example, the work of Schmid and associates, which shows the plasticity of the adult brain to be limited, especially in the case of morphosyntax, e.g., Bergmann et al. 2015), the majority of studies support the view that the adult brain retains substantial plasticity. DeLuca et al. (2019) provide a careful summary of research to date and conclude that the overall picture is one of much greater plasticity of the adult brain than the classic CPH would imply. Most importantly, according to the majority of studies, L2 learners at lower levels of proficiency produce ERP responses to the same stimuli that differ from those of L1 speakers, while L2 learners at higher levels tend to parallel L1 speakers of the same language in terms of ERP patterns. DeLuca et al. (2019) conclude that, even allowing for minor discrepancies between the results of some studies, the overall picture which emerges is one of L2 learners gradually shifting to language processing that is qualitatively similar to that of native users.
Interestingly, as noted by Steinhauer and Kasparian (2020), early ERP studies, from the turn of the millennium, actually seemed to confirm the CPH, showing that the ERP patterns of brain activation of learners with late AoA differed from those of native speakers, in contrast to the patterns of learners with early AoA, which mirrored those of native speakers. Such studies were later found to have confounded AoA with proficiency. Newer, better-controlled studies showed that the type of brain activity changes with proficiency and, most importantly, that late adult learners of an L2 who reached very high levels of attainment displayed patterns of brain activity comparable to those of native speakers, even for morphosyntax (e.g., Bowden et al. 2013).
Such findings were reported, for example, by Rossi et al. (2014), a study which investigated late English-Spanish bilinguals of varying degrees of (self-reported) English proficiency. The study used ERP to examine their reactions to gender and number violations in the use of clitic pronouns in Spanish sentences. The study suggests that it is possible for late bilingual speakers of Spanish at high levels of proficiency to process grammatical features in a nativelike way, with nativelikeness evidenced by the corresponding ERP signature. Similar conclusions were reached in Rossi et al. (2017). In a nod to the concept of Universal Grammar (UG), which predicts that if a syntactic feature is absent from one's L1, it is impossible or more difficult to acquire in the L2, both of the aforementioned studies incorporated grammatical structures of two kinds: those with and those without corresponding L1 structures. Both investigations showed that, for both types of structures, processing at high proficiency levels is similar to that of native speakers (as evidenced by the type of ERP signature).
Recent work with implications for the CPH comes from ERP studies on attrition. We report in detail on one such highly interesting study, by Kasparian and Steinhauer (2016). They presented three groups with the same stimulus sentences; in some of the sentences, one word was replaced with an incorrect word, either very similar to the correct one or less similar. The three groups were Italian monolinguals living in Italy; Italians living in Canada who reported using mostly English in their everyday lives and having occasional difficulty remembering words in Italian (an indication of L1 attrition); and L1 English speakers with advanced L2 Italian. The ERP patterns of the attriters and those of the L2 learners were found to be similar. What is more, the ERP patterns were related to proficiency in Italian, and thus, for the attriters, to the extent of attrition. Very high levels of L2 proficiency were accompanied by ERP patterns similar to those of monolingual native speakers. Lower levels of L2 proficiency and more severe cases of L1 attrition were characterized by similar ERP patterns, different from those of monolinguals and indicating problems with lexical retrieval. These findings were limited to lexical aspects of language, but in another study (Kasparian and Steinhauer 2017), the authors obtained similar results for morphosyntax. These studies point to the ongoing plasticity of the adult brain, whether for language acquisition or loss.
(F)MRI is another technology which can provide a glimpse of the underlying language processing mechanisms. It is a non-invasive neuroimaging technique that relies on the use of radio frequency pulses and magnetic fields. With respect to the brain, specific structures can be captured by static MRI, whereas neural processes can be captured by functional MRI (fMRI). In contrast to ERP, MRI provides information on the location of brain activity. (f)MRI studies also provide some evidence about the neuroplasticity of the adult brain in terms of language learning, where "plasticity" refers to regional changes in the volume and/or density of white and grey matter, as well as changes in connectivity between different areas of the brain, i.e., patterns of activation. According to the review provided by DeLuca et al. (2019), most such studies are longitudinal, with participants undergoing brain scans before and after a language training program. L2 acquisition has been found to have observable effects on the brain, resulting in increases in grey matter volume and in white matter volume in brain regions related to language. Fewer (f)MRI studies have looked at more naturalistic language acquisition; for example, Pliatsikas et al. (2015) found patterns of white matter increase in late bilinguals similar to those in child bilinguals. DeLuca et al. (2019) sum up their review of MRI research in the following words: "MRI affords us the opportunity to literally see first-hand if the 1967 predictions hold. Evidence ( . . . ) clearly suggests they do not" (p. 185). The authors conclude their in-depth overview of a vast number of neurolinguistic studies examining brain plasticity as follows: "Taken together, the evidence indicates that the neural substrates and processes underlying language acquisition and production in the L2 are maximally comparable to those in L1 across the lifespan. Any maturational constraints that might apply are not specific to language learning, especially to the extent that they create critical periods, but are generic constraints brought about by healthy ageing and apply to other aspects of cognition (e.g., memory). While showing this does nothing to negate the very observable differences in path and outcome between typical L1 and adult L2 acquisition, it does suggest that other variables conspire to account for these differences; that is, there is no true fundamental difference in how language is acquired and processed, irrespective of age" (p. 188).

Concluding Remarks
The paradox concerning the CPH is that, although a vast literature exists on the matter, the findings remain open to widely divergent interpretations. This is because (i) the notion of a critical period is used with very different, often underdefined meanings, (ii) the relevant research is extremely varied and variable, and (iii) the different categories of participants in the relevant studies are less than precisely profiled and are sometimes confused.
One reason for these deficiencies may be that the research in question is not "high-stakes": it is backed neither by copious funds nor by international teams of researchers. This situation may change in the coming years. Another contributing factor may be the very limited cooperation between linguists and psychologists. Psychologists either do not take linguistic theories into account or, where linguistic theories have made their way into psycholinguistic research, these tend to derive from the UG model (see, e.g., Rossi et al. 2014; Rossi et al. 2017), whereas linguistics certainly has more to offer than this. Linguists, for their part, could benefit from a careful consideration of the neurolinguistic research outlined above. What would be welcome, from the linguistics side, is a carefully designed, large-scale study testing various aspects of language knowledge and use, which could be combined with neurolinguistic measures.
Another shortcoming and source of problems is the widespread (mostly tacit) assumption that measuring any small subset of language skills or aspects of language competence is representative of the bigger picture. As a result, many studies focus on various arbitrarily selected aspects of language knowledge and use. For example, the age-factor-related literature cited throughout this paper very often focuses on accent, the presence of a nativelike accent being popularly taken as the criterion for "nativelikeness". Accent is thus often covertly assumed to represent language proficiency in general. It seems tempting to some (and not just lay-people) to imagine that nativelike pronunciation is the ultimate marker of performing like a native speaker, but we have to recognize that, while the various aspects of language proficiency do tend to develop broadly in parallel, pronunciation is arguably distinct from other language skills. Similarly, studies that examine morphosyntax often include apparently arbitrarily selected items (e.g., Hartshorne et al. 2018). Age-related SLA research has given too little attention to testing language knowledge comprehensively and systematically.
While it is astonishing that, after decades of research, we are nowhere near being able to close the topic of the CPH, some might doubt the importance of this debate, especially given that the overall picture is so muddied. After all, while there are individuals who achieve remarkable success in the acquisition of a foreign language despite starting late, most do not. A myriad of factors, including individual difference factors, affect this process, and getting to the bottom of the issue may be seen as either too difficult or simply not worth the major research effort it would entail. However, the CPH debate is anything but irrelevant. In fact, it is one of those issues in second language acquisition research that has important real-world implications, because research findings may inform policies applied in the formal instructional context.
The popularity of the concept of maturational constraints limiting a person's ability to learn a language beyond puberty has been behind the widespread move to lower the starting age of institutional L2 learning. This trend was instigated some seventy years ago, under the influence of advocates of early L2 instruction in the school curriculum, such as Penfield, and has accelerated dramatically in recent times all around the world (see, e.g., Murphy 2014), despite not being supported by empirical research. In fact, it flies in the face of such research, which, for nearly half a century, has shown that in a normal schooling situation, pupils who are taught an L2 at primary level do not, in the long run, maintain the advantage of their early start (see, e.g., Pfenninger and Singleton 2017; Singleton and Pfenninger 2016). Starting-age-related differences have been shown to level out over the course of the secondary school period. This actually implies that the late starters do better than the early starters: they acquire as much second language knowledge within a considerably shorter period of time, and thus progress faster.
From the 1970s onwards (e.g., Burstall 1975; Carroll 1975), studies have consistently failed to confirm the hope that early instruction would deliver higher proficiency levels than later instruction. Moreover, later beginners, who have less learning time, have been found to be equal or superior across a range of measures (see Muñoz and Singleton 2011). In Canada and the US, older immersion learners were also found to be as successful as younger learners over shorter time periods (e.g., Swain and Lapkin 1989; Turnbull et al. 1998). A very recent study (Baumert et al. 2020), involving a sample of almost 20,000 students, likewise failed to show an advantage in outcomes for earlier beginners, although, despite the consistency of the research record on this matter, the authors attribute their results to the fact that at the secondary level in German schools, everyone is taught English at the same level. Importantly, a meticulous synthesis of empirical studies examining the results of early L2 instruction (Huang 2016) has shown that there is no solid evidence supporting the "younger is better" approach to L2 teaching.
To sum up: given the amount of time and attention that has been devoted to this topic over the last five decades, the overall results are very disappointing. Our own discussion in this article has tended in the direction of affirming that the CPH, despite its long history, has the status of "not proven". For an issue which attracts such popular interest and prejudice, and which has been a research topic for half a century, still to be surrounded by such ambivalence is indeed embarrassing and unfortunate, especially given the hunger for clarity in relation to planning institutional second language teaching.

Conflicts of Interest: The authors declare no conflict of interest.

1. Another term used in connection with this concept is "sensitive period". Sometimes, a distinction is made between a critical period and a sensitive period (the latter denoting a milder version of the former); for example, Knudsen (2004) defines critical periods as "a subset of sensitive periods for which the instructive influence of experience is essential for typical circuit performance and the effects of experience on performance are irreversible", while a sensitive period occurs when "the effect of experience on the brain is particularly strong during a limited period in development". However, the two terms are often used interchangeably. Moreover, in linguistics, there is a well-established tradition of referring to the sensitive/critical period for a second language as a "critical period", even if it is closer to a sensitive period. We therefore follow Birdsong (2017) in treating these two terms as interchangeable and make no specific distinction between them.