The Lexical Development of Canadian-Born Romanian L1 Bilingual Kindergarteners

Abstract: This study charts the lexical development of three sequential bilingual kindergarteners whose first language, Romanian, was acquired naturalistically at home, and whose second language, English, was acquired in kindergarten. The children's lexical development in English and Romanian was assessed at five different points over a two-year period via the PPVT-4 (Peabody Picture Vocabulary Test 4) and a specially adapted PPVT-4 for Romanian. The children's lexical repertoires were further analyzed to uncover home versus school and cognate versus non-cognate acquisitional differences. In addition, because there is no database of lexical items acquired by monolingual Romanian children, the PPVT-4 adapted for Romanian was administered to 22 monolingual six-year-old Romanian children in Sibiu, Romania. The findings indicate the following: (i) the three bilinguals' receptive vocabulary in English was below average when they joined kindergarten, and at or above average two years later; (ii) their lexical growth in Romanian was steady; (iii) the bilinguals' scores for words belonging to a home register reflected ceiling effects in English and Romanian (i.e., were very well known); (iv) academic words were known to an equal extent in English and Romanian, but scores were lower than for the home register; and (v) there was no definitive evidence of cognate facilitation. A comparison of the monolingual and bilingual Romanian repertoires reflects the following: (i) equally high scores for home items; (ii) differences in scores in the academic register in favour of the Romanian monolinguals; and (iii) important lifestyle and cultural differences between the groups. The Romanian children, for example, were more familiar than their Canadian counterparts with items related to home maintenance, such as șmirghăluiește ('sanding') and mistrie ('trowel'), or items probably learned in school, such as foca ('walrus') and broască țestoasă ('tortoise').


Introduction
The primary objective of this longitudinal study is to document the bilingual lexical development of three Romanian-speaking Canadian children between the ages of four and six. To chart their lexical development in Romanian, their first language (L1), and in English, their second language (L2), two separate measures of lexical ability were administered, namely, PPVT-4 (Peabody Picture Vocabulary Test 4) and a specially adapted PPVT-4 for Romanian. The children's language-specific development included an analysis of home versus school vocabulary differences in each language. In addition, as Romanian and English share a sizable number of Greco-Latinate cognates (Petrescu et al. 2017), possible cognate facilitation resulting from the transfer from L1 to L2 or L2 to L1 was also investigated.
For comparative purposes, especially as there are not yet any known accounts of lexical development in monolingual Romanian contexts, we also collected Romanian data via the adapted Romanian PPVT-4 from 22 monolingual kindergarteners in an urban centre in Romania. While this was not a longitudinal study, these data were collected when the children were six years old, paralleling the stage at which we collected our final data from the three Canadian bilingual children.

The Study's Rationale
Our study aims to investigate the receptive lexical knowledge of bilingual children whose L1 and L2 acquisition is successive but overlapping. In the case of simultaneous or overlapping bilingual acquisition, some argue that vocabulary growth in each of the two languages is generally delayed because of the limited input and interaction in each of the two languages; furthermore, a bilingual's lexicon in each of the two languages is smaller than a monolingual's for the same reasons (Bialystok et al. 2010; Werker 2012). Another interesting question is whether a bilingual's L2 lexicon is qualitatively different because of environmental and cultural factors such as the manner and purpose of L2 acquisition (Sabourin et al. 2014). Sabourin et al. (2014), for example, suggest that late L2 learners, irrespective of age, could have an integrated L1-L2 lexicon if L2 acquisition included input and interaction in a naturalistic context. As a corollary to this, Jiang (2000) hypothesizes that in a foreign-language context, where the instructor is the only source of input and interaction, learners are frequently encouraged to translate from their L2 to their L1; this gives rise to a reliance upon the L1 and to a superficial and occasionally inaccurate semantic representation of L2 words. In addition to exploring these issues, our study also aims to consider how potential disadvantages in terms of reduced lexical size or slow lexical acquisition could be mitigated by language relatedness. From a lexical point of view, for instance, it could be advantageous to have a Romance language, like Romanian, as a definitive L1 and a language like English as an L2, because numerous "everyday" Latinate words in Romanian pair up with low-frequency "academic" words (cognates) in English (Petrescu et al. 2017).
Furthermore, typological similarities between an L1 and L2 could lead to more than cognate facilitation because these similarities allow for extra "processing space" to be devoted to lexical acquisition (Ard and Homburg 1983). In short, limited input and interaction could be compensated for by typological similarities.

Lexical Knowledge of Monolingual and Bilingual Children
Extracting words from the speech stream during the first year of L1 acquisition is a complex process ("prosodic bootstrapping") that requires segmentation at different, interacting levels: phoneme perception in syllabic units in keeping with the phonemic repertoire of the ambient language; phrasal boundaries using acoustic cues such as pauses and drops in F0 (fundamental frequency); rhythm (including stresses) to acquire smaller phrases and word onsets; and so on (Cutler 1994; Guasti 2000; Jusczyk et al. 1993; Werker and Tees 1984). By nine months or so, a baby has acquired the basic phonotactics of their L1 and can begin to hold a rudimentary version of the phonological form of an L1 word (its "lexeme") in memory (Jusczyk et al. 1993), a milestone allowing them to start mapping meaning onto these forms. During their pre-school years, children begin to acquire words rapidly, mapping meaning onto phonological forms via numerous mechanisms, such as child-directed speech, semantic and syntactic bootstrapping, cognitive development, and a rapid increase in world knowledge (Clark 1995; Gleitman 1990; Pinker 1984; Snow 1977; Tomasello 2003).
In garden-variety types of successive child L2 acquisition, L2 words are acquired once the child has a well-established L1. In cases of overlapping bilingual acquisition in early childhood, the L1 lexicon might be small (and restricted to items learned in a domestic context) when L2 acquisition commences. In such a case, and most noticeably when the L1 is a minority language, while the L2 is the language of the community and school, an important number of concepts are learned through the L2 rather than the L1 (Sherkina-Lieber and Helms-Park 2015). The issue of overlapping L1 (minority) acquisition and L2 (majority) acquisition pertains to the current study, in which the participants began their L2 Languages 2018, 3, 33 3 of 29 acquisition of English in a meaningful way at only the age of four, when they joined kindergarten. Thus, one question that the study addresses is whether the children's two lexicons differ in terms of the types of words known in each language (for example, whether words for food and personal grooming are better known in Romanian than are terms for simple arithmetic or animal houses).
Irrespective of whether L1 and L2 lexical acquisition overlap or are clearly sequential, there is inevitable interaction between the L2 and L1 lexicons (Jarvis 2009;Jiang 2000;Kroll and Stewart 1994;Kroll et al. 2010;Pavlenko 2009;Wolter 2006). In general ways, the very existence of an L1 can facilitate L2 lexical acquisition as the use of the L1 provides evidence to the user that language has structures that can convey meanings and that at least some words have referents. Furthermore, many of the concepts that accompany lexical items do not have to be re-learned. However, the L1 also complicates L2 lexical acquisition. If we consider the process of extracting words from the speech stream alone, an L1 mechanism for segmenting speech phonologically would likely be a default at the outset of L2 acquisition, and an L2 mechanism would need to be acquired alongside to segment the L2 speech stream; the degree of ease or difficulty in this regard would depend on how similar or different, respectively, the two mechanisms are (Cutler 1994;Werker 2012). Beyond phonological segmentation, there are complex ways in which related lexical entries in the two lexicons are linked. Kroll and Stewart's (1994) revised hierarchical model, for instance, posits the view that in early L2 lexical acquisition, the learner relies on the L1 translation of the L2 item to arrive at the concept ("word association"), but with growing L2 proficiency, the learner can possibly arrive at the concept directly without reliance on the L1 ("concept mediation"). However, there are other possibilities. The concept and associated lexical entry could first be acquired in an L2 context, for instance, when a Romanian-speaking child learns 'rectangle' in kindergarten without knowing the Romanian word dreptunghi and the shape it represents.
While L2 lexical knowledge can be both qualitatively and quantitatively different from L1 knowledge, perhaps much also depends on the mode and purpose of L2 acquisition. Based on a preliminary analysis of lexical acquisition of L1 late French bilinguals (immersion into a French environment took place between the ages of 9 and 19 years), Sabourin et al. (2014) suggest that naturalistic acquisition could be conducive to (i) an integrated L1/L2 lexicon (i.e., "concept mediation") versus an L2 lexicon linked to its L1 counterpart through word associations, and (ii) richer semanticity of L2 lexical representations (cf. Finkbeiner et al. 2004). Another factor that comes into play in the interaction of the two lexicons is the nature of the lexical items and their conceptual representations themselves. For example, when concepts attached to L1 and L2 words are the same, as is more often the case with items with concrete referents than abstract ones (De Groot and Keijzer 2000), bidirectional transfer of conceptual knowledge can be positive.
Previous research suggests that bilingual children develop smaller vocabularies in each of the two languages, and more slowly in each language than monolingual children (Mahon and Crutchley 2006; Nicoladis and Genesee 1996; Oller and Eilers 2002; Oller et al. 2007; Umbel et al. 1992). As a large body of research suggests that vocabulary size is an important factor in academic success (August et al. 2005; Ouellette 2006; Rohde and Thompson 2007; Swanson et al. 2008; Verhallen and Schoonen 1993; Vermeer 2001), such findings are especially consequential for a bilingual child's school language. However, Feng (2009) and Bialystok et al. (2010), on the basis of tests of receptive lexical knowledge, caution that bilinguals' overall lexical scores in each of their two languages need to be broken down further to provide a clearer picture of their lexical abilities, for example, by distinguishing between everyday home and community vocabulary and academic school-based vocabulary. Via a meta-analysis of existing studies on English receptive vocabulary knowledge obtained from 772 English monolingual children and 966 bilingual children between the ages of 3 and 10, Bialystok et al. (2010) uncovered significantly lower scores for bilingual children when compared with monolingual ones. Nonetheless, there was no disadvantage for academic vocabulary among the bilingual group but, instead, lower scores only in vocabulary related to home life, most likely learned by children in their L1. These findings are in line with research that found that bilingual children are not disadvantaged in academic and literacy achievement (Bialystok et al. 2005), nor in academic uses of spoken language (Peets and Bialystok 2015). Furthermore, when taken together, the bilingual children's vocabularies in their two languages could be larger than the vocabulary of monolingual children (Bialystok et al. 2010).
The potential strengths and gaps in the lexical knowledge of a child in each language is an issue that is addressed in this study.
With regard to the assertion that bilingual children develop vocabulary more slowly in each language compared with monolingual children, at least when receptive vocabulary knowledge is assessed (Mahon and Crutchley 2006; Oller and Eilers 2002), Bialystok et al. (2010) caution that this claim is too general to apply to all languages and that the specific language pairs that the children are learning will influence the rate of acquisition. For example, the existence of Greco-Latin cognates in Romanian and English could give a Romanian L1 speaker an advantage over a Vietnamese one if both are acquiring academic English vocabulary (Petrescu et al. 2017). We should note that even if cognates are excluded from consideration, the numerous typological similarities between related languages can make L2 acquisition easier for such learners, thus freeing up more resources for lexical acquisition (see, for example, Ard and Homburg's (1983) study comparing L1 Spanish and L1 Arabic learners acquiring L2 English vocabulary).

Romanian-English Cognates and Cognate Facilitation
Romanian belongs to the Romance subgroup of the Indo-European languages, and has strong lexical, grammatical, and phonological links to Italian, French, Spanish, Portuguese, and other Romance languages, being the "easternmost representative of the family of Romance languages" (Cojocaru 2003). At the morphosyntactic level, Romanian is one of the most conservative Romance languages (e.g., it is the only Romance language that has preserved all six cases and three genders inherited from Latin).
At the lexical level, Romanian is a language that is deeply rooted in Latin, developing in relative isolation as a result of geographical and historical circumstances. According to Maneca (1996), the language has a core vocabulary of Latin words (approximately 35%), which constitute around 74% of the most frequently used words in the language. Given the Latinate influence on the English lexicon through Latin as well as Old and Middle French, Romanian shares a sizeable number of cognates with English (Petrescu et al. 2017). For the purposes of this study, we define cognates psycholinguistically, that is, as words that overlap phonologically and semantically irrespective of diachronic (etymological) factors (Helms-Park and Dronjic 2016; Costa et al. 2005;Dijkstra et al. 2010;Hall 2002;Midgley et al. 2011). In this study, the degree of phonological overlap is calculated via Kohnert et al.'s COSP (crosslinguistic overlap scale for phonology) (Kohnert et al. 2004) in order to assess the probability of interlingual transfer. The theoretical basis for focusing on phonological overlap is the view that it is through automatic phonological activation of neighbours in the bilingual lexicon that cognate recognition begins; however, the learner must also ascertain immediately or subsequently that the words overlap semantically (Carroll 1992;Costa et al. 2005;Dijkstra et al. 2010;Midgley et al. 2011). For example, the Romanian word vehicol overlaps both phonologically and semantically with the English 'vehicle', fulfilling the condition of transfer, and thus potentially facilitating the acquisition of the English word by Romanian speaking bilingual children.
While there is a robust body of literature that demonstrates that adults are more successful in recognizing, acquiring, and retaining cognates compared with non-cognates (De Groot and Keijzer 2000; Sánchez-Casas and García-Albea 2005), research provides conflicting evidence regarding cognate facilitation when it comes to children. Umbel et al. (1992) found that children from Spanish monolingual and Spanish bilingual homes achieved similar overall scores on both the Peabody Picture Vocabulary Test (Dunn and Dunn 1981) and the Test de Vocabulario en Imágenes Peabody-Adaptación Hispanoamericana (Dunn et al. 1986), responding correctly on cognates and non-cognates at about the same rate (68% vs. 67%). In a follow-up study using the same receptive tests, Umbel and Oller (1994) tested first, third, and sixth graders and obtained similar results. In contrast, Malabonga et al. (2008) report that recognition of cognates increases as the children progress academically. On the basis of their cognate awareness test (CAT), a multiple-choice receptive test, they found no evidence of cognate facilitation effects among Spanish-English fourth graders but, a year later, the same students exhibited a cognate advantage on a multiple-choice test of low-frequency English words. The results of a variety of studies indicate that children's sensitivity to cognate recognition is associated with a range of factors, namely, amount of language exposure (Pérez et al. 2010), previous knowledge of the word concept in the L1, levels of L1 ability (Malabonga et al. 2008), and age (Kelley and Kohnert 2012).
In this study, we will examine whether cognate facilitation plays a role in lexical development in a context that has received little attention, that is, child Romanian and English bilingualism. Unlike Romance languages like French, Spanish, and Italian, Romanian has remained surprisingly under-represented in both L1 and L2 acquisition research (see Avram 2001;Buja 2008, for some notable exceptions). More generally, the current study highlights the growing importance of Romanian as a heritage (minority) language in a Canadian setting and provides an opportunity to examine the lexical knowledge of young bilinguals in this new context. Contrary to a typical French immersion scenario, which has served as the setting of most of the research on childhood bilingualism in Canada, in our study, the children's L1 and sole target language in their pre-school years is the minority language (Romanian). It is only after the age of four, when they join an English-medium kindergarten, that they acquire English in a systematic and meaningful way.

Method
The design and method of this two-part study aim to answer the following research questions:

• What are the patterns of lexical development in Romanian and English depicted by the three Canadian bilinguals between the ages of four and six?
• Is there any evidence of cognate facilitation among Romanian-English bilinguals during their kindergarten years?
• How do the Romanian lexicons of bilingual children compare with those of monolinguals of the same age? Apart from potential differences in vocabulary size, do the types of lexical items comprehended by these monolinguals and bilinguals differ?
In order to address the above questions, Romanian and English lexical data were collected at five different points over a period of two years from three Romanian-speaking, Canadian-born children between the ages of 4 and 6 and 22 monolingual Romanian children aged 6.

Bilingual Participants
The participants in this bilingual study were approximately four years old at the starting point of this project (3;11-4;2), when they started junior kindergarten (JK) in English, and approximately six years old at the end (5;11-6;2), when they finished senior kindergarten (SK) in English. The two male participants (anonymized as "Dan" and "Radu") and the female participant (anonymized as "Moni") were born in Canada to parents who spoke Romanian natively. The children's first and dominant home language was Romanian; their initial interactions and contacts were with the family members, peers, and caregivers who spoke only Romanian to them. Prior to entering school, their contact with English was limited to very sporadic media exposure and minor interaction with English-speaking children in the playground. All three were first-born children, and none of them had any siblings at the commencement of the study; however, a baby brother was born to one of the children (Radu) shortly after the study started. Parents reported that the children had no major health issues and had normal hearing and vision.
The bilingual children in this study are typical of a cohort of Canadian-born children of Romanian-speaking parents who arrived in Canada around the year 2000 and settled in the northern end of Toronto within the municipalities of Vaughan, Richmond Hill, and Aurora (Statistics Canada 2011). Their residency cohesion supports minority language maintenance as it provides varied input and plentiful opportunity for minority language use, factors that have been identified in successful language development and maintenance (De Houwer 2009). Furthermore, the three children came from university-educated families who valued both multilingualism and heritage language retention. At the age of four, the children were enrolled in English kindergarten, and at the end of the study, they were registered to begin Grade 1 in French immersion schools.

Monolingual Participants
As there are no standardized language tests or norms for monolingual Romanian-speaking children and no systematic study of lexical acquisition in that population, data were collected from six-year-old Romanian-speaking monolingual children in Romania (N = 22). While the scope of this part of the study was limited, we felt that it would be illuminating to examine the types of items that Romanian-speaking children of the same age would have acquired in the two different contexts, that is, the monolingual context in Romania and the bilingual one in Canada. The Romanian-born monolingual children were enrolled in the equivalent of full-time kindergarten ("Grupa Pregătitoare") in Sibiu, Romania, a historic city that is also a leading cultural centre in Europe. Like Canadian senior kindergarten, the Romanian "Grupa Pregătitoare" prepares children for the Grade 1 curriculum. In addition, as with the bilingual kindergarteners in the Canadian study, the Romanian participants belonged to middle-class families that encouraged extra-curricular activities such as sport, art, and personal development. In short, in terms of socio-economic status, the children in Romania were comparable to the three bilinguals.

Peabody Picture Vocabulary Test, Fourth Edition
The measure used to assess the children's receptive vocabulary knowledge of English was the PPVT-4 (Dunn and Dunn 2007), which contains two versions, Form A and Form B. In order to avoid word recognition through repeated exposure to the same instrument, the forms were alternated between tests. Each of the 228 target items in each form consists of four coloured pictures as response options. The 228 items in each form are split into 17 sets with 12 items each. The items are listed by frequency in descending order (from the most common to the least). There are several reasons for using a receptive vocabulary test rather than a productive one in this study. First, because there is usually a lag between word comprehension and word production (Bates et al. 1994;Gershkoff-Stowe et al. 1997), we realized that there would be a problem eliciting word production in English at the beginning of this study. (Note that the children started interacting in English only at the age of four, the point at which the longitudinal study began.) Furthermore, even in an L1 (here, Romanian), children's production of words is contingent upon many factors, such as quick retrieval of the lexeme, ability to pronounce the word, or the child's level of confidence; moreover, when responding to a picture stimulus, more than one response could be correct, while others could be partially or mostly correct (e.g., if a leopard is called a cat) (Bates et al. 1994;Gershkoff-Stowe et al. 1997).

Romanian-Adapted Peabody Picture Vocabulary Test
As there are no standardized lexical tests for Romanian-speaking children, the Peabody Picture Vocabulary Test, Fourth Edition (Dunn and Dunn 2007), was adapted to Romanian, and the resulting two forms, each with 228 items divided into 17 sets, were used as a measure of Romanian receptive vocabulary knowledge at the five points that coincided with the English testing.
The English stimuli were translated into Romanian with the help of two reputable dictionaries (Bantaş 1994; Leviţchi 2005), as well as three native speakers of Romanian, one of whom was a linguist well versed in language acquisition, and two of whom were parents of young Romanian-speaking children. As there is no established word frequency list for the Romanian lexicon, the Romanian-translated items closely match the English ones with respect to grammatical category, cultural importance, and relative importance in a child's life. For example, the English word gigantic, which is at the 6000 word level of the BNC (British National Corpus), was translated as gigantic in Romanian rather than as uriaș or mare, which are likely to be in the same frequency bands as huge (2000 word-level BNC) or big (1000 word-level BNC). In addition, culturally-biased items from the English PPVT-4 were replaced by words that are more familiar to monolingual Romanian children in a typical Romanian-speaking home. For instance, item 15 from Form A (target word cookie) and item 32 from Form B (target word muffin) were replaced by pictures and names of pastries that are encountered more frequently by Romanian children, for example cozonac ('sweetbread') and biscuite ('biscuit').

Procedure
The bilingual children were recruited through an information flyer posted on a well-known Romanian forum in Canada. Ethics approval was obtained from the University of Toronto (Protocol Reference #25110) and letters of consent were obtained from all participants (Appendix A). The monolingual children were recruited through directly contacting a kindergarten in downtown Sibiu.
The bilingual children were administered both the English PPVT and the Romanian-adapted version at each point of data collection, alternating the order of the languages tested. The tests were administered at five points over two years: at the beginning of the study in the fall (September) of junior kindergarten (Time 1-Form B), then in the spring (March) (Time 2-Form A), followed in the fall (September) of senior kindergarten (Time 3-Form B), then again in the spring (March) (Time 4-Form A), and concluded at the end of the study (August) (Time 5-Form B). The test was administered in the children's homes by the same researcher, who gave the instructions in Romanian for the Romanian test and in English for the English one.
Also, to avoid priming, the testing in the two languages took place at a minimum of two-week intervals. The task took approximately 20 to 30 minutes to administer individually to each child. For each item, the examiner said a word and the child pointed to the picture that best illustrated its meaning. The testing stopped when there were eight or more errors in a set of 12.
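The stopping rule described above can be illustrated with a short sketch. It assumes that responses are recorded per set as lists of correct/incorrect flags; the function name and the response data are hypothetical, and the sketch covers only the stopping criterion, not the standardized PPVT-4 scoring procedure.

```python
def apply_stop_rule(responses_by_set, max_errors=8):
    """Walk through the sets in order and stop after the first set
    containing `max_errors` or more errors (eight of 12 in the text).

    Returns (number of sets administered, number of correct responses).
    """
    sets_given = 0
    correct = 0
    for index, set_responses in enumerate(responses_by_set, start=1):
        errors = set_responses.count(False)
        correct += len(set_responses) - errors
        sets_given = index
        if errors >= max_errors:
            break  # discontinue testing after this set
    return sets_given, correct


# Hypothetical child: the third set reaches eight errors, so the
# fourth set is never administered.
responses = [
    [True] * 12,
    [True] * 10 + [False] * 2,
    [True] * 4 + [False] * 8,
    [True] * 12,
]
sets_given, correct = apply_stop_rule(responses)
```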
The monolingual children were administered Form B of the PPVT-4, as this was the form used with the bilingual children at the age of six, the same age as the monolingual participants. This allowed for an item analysis between the two groups.

Coding, Scoring, and Data Analysis
As the bilingual children in this study used Romanian at home and English at school, there was a possibility that different patterns of responses would emerge for the English PPVT-4 and its Romanian-adapted version. In order to explore such differences, all words from the 17 sets were first categorized as "home" or "academic" (see Appendix B). Scores were then compared at T1 and T5. Expanding on the criteria used by Bialystok et al. (2010), we included the following types of items in the home category: commonly experienced food and household items (e.g., banana, lamp), culture-specific items (e.g., muffin, canoe, etc.), frequently used clothing (e.g., shoe, jacket), household pets (e.g., dog, cat), frequent physical activities (e.g., jumping, peeking), high-frequency body parts (e.g., mouth, knee), common colours (e.g., red, blue), and words that are unlikely to appear in an academic context (e.g., horrified). Criteria for including the items in the academic category included the following: professions (carpenter, dentist), animals or plants (hyena, cactus), shapes (e.g., rectangle, diamond), musical instruments (violin, clarinet), low frequency body parts (e.g., sternum, pelvis), geographic locations (e.g., peninsula), and words used for academic tasks (e.g., enumerating, composing). Using the criteria mentioned above, two people independently classified all of the items from sets 1-17 for both forms. The inter-rater raw agreement was 97.92% and chance corrected agreement using Cohen's Kappa was 0.92, which also indicated very good inter-rater reliability. A consensus was reached on all disagreements and in the end, 54 items were classified as "home" and 138 as "academic".
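For readers unfamiliar with the two agreement statistics reported above, the following sketch shows how raw agreement and Cohen's Kappa are conventionally computed for two raters. The ten ratings are invented for illustration and are not the study's data; the function names are likewise hypothetical.

```python
from collections import Counter


def raw_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters gave the same label."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = raw_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement if each rater labelled items independently
    # at their own observed base rates.
    labels = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Invented ratings for ten items (NOT the study's data).
rater_1 = ["home"] * 4 + ["academic"] * 6
rater_2 = ["home"] * 3 + ["academic"] * 7

agreement = raw_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
```

With 192 classified items (54 home plus 138 academic), a raw agreement of 97.92% would correspond to 188 matching classifications, although the text does not state that count directly.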
Because low frequency items generally end up being assigned to the "academic" category and the high-frequency items are potentially assigned to the "home" category, the frequency level for each item was established using the British National and Lextutor corpora (www.lextutor.ca). The items in the "home" category ranged from level 1000 to 8000 frequency band, while those in the "academic" category ranged between 1000 and 17,000. Within the "academic" category, 79.71% of the items were in the 1000 to 8000 frequency band, while almost 20% (18.11%) of the items were in the 9000 to 17,000 frequency band. (Three items could not be found in the British National and Lextutor corpora, and, therefore, were not assigned to any frequency band; they were excluded from the present analysis). Although it is obvious that frequency was not an unambiguous classification factor, to eliminate confounding effects related to frequency, only those items with a frequency between 1000 and 8000 level for each of the two categories were included in the analysis of home versus academic repertoires. The final count for the home items was 52 and for the academic ones was 110.
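The frequency-based restriction can be sketched as a simple filter. The words below appear as examples elsewhere in this section, but the frequency bands attached to them are invented placeholders, as are the variable and function names.

```python
# (word, category, frequency band). Bands are INVENTED for illustration;
# None marks an item that could not be found in the corpora.
items = [
    ("banana", "home", 4000),
    ("lamp", "home", 3000),
    ("peninsula", "academic", 7000),
    ("sternum", "academic", 15000),
    ("clarinet", "academic", None),
]


def in_analysis_range(band, low=1000, high=8000):
    """Keep only items with a known band between `low` and `high`."""
    return band is not None and low <= band <= high


kept = [(word, category) for word, category, band in items
        if in_analysis_range(band)]

# Tally the retained items per category.
counts = {}
for _, category in kept:
    counts[category] = counts.get(category, 0) + 1
```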
In addition, each English word from the PPVT-4 (Form B) was rated as being a cognate or a non-cognate. Two native speakers of Romanian independently classified the items into cognates and non-cognates, quantifying the similarity between the phonological forms of the English and Romanian equivalents with the COSP (Kohnert et al. 2004). Following Kelley and Kohnert (2012), who found COSP to be a suitable instrument in child bilingualism research, each Romanian-English pair was phonetically transcribed and assigned a value between 0 and 10, with 0 corresponding to a word pair that shared no phonological commonalities and 10 corresponding to a complete phonological overlap. To assign a COSP value, the scale in Table 1 was used for each cognate pair. Two consonants were considered to be similar sounds (score 1 in the category of initial sound overlap) if they shared at least one of the three features of place, manner, and voice, or at least one of the same sounds in a consonant cluster. For example, the English word tuba (tuba in Romanian) would be given a score of 1 for the initial sound overlap instead of 3 because the Romanian sound /t/ is a voiceless dental stop, unlike its English equivalent, which is a voiceless alveolar stop. The final score was determined by four features, namely, shared initial sound, shared number of syllables, shared consonants, and shared vowels (see Table 1). Note that the scale does not take into consideration fine-tuned distinctions between the acoustic and perceptual aspects of phonemes that are considered parallel in the two languages (e.g., the language-specific voice onset time of voiced and voiceless obstruent pairs in languages like Romanian and English).

Feature: Initial sound (0-3 points)
Overlap scoring:
3 = Same consonant
2 = Same vowel
1 = Similar sound (e.g., same sound class or one element of a consonant cluster)
0 = Complete mismatch between initial sounds
Examples (from Romanian and English): lichid-'liquid'; cerc-'circle'
Note. Adapted from "Crossing borders: recognition of Spanish words by English-speaking children with and without language impairment", by Kohnert et al. (2004).
The PPVT-4 contained words whose COSP scores ranged from 0 to 10. For instance, banana had a score of 10; ferma, farm had a score of 7; and autobus, bus had a score of 1. The average COSP score on the PPVT-4 Form B was 6.59 (SD = 2.43). Pairs with COSP scores between 0 and 5 were excluded from the analysis. This cut-off point was chosen based on empirical evidence from Kohnert et al. (2004), who found that the majority of the monolingual English-speaking adults correctly guessed the English translation for 15-50% of Spanish words with COSP scores from 6 to 9, but did not guess the English translation for Spanish words with COSP scores lower than 5. The raw agreement between raters for assigning COSP scores for Form B was 95.31%, and chance-corrected agreement using Cohen's Kappa was 0.90 (i.e., high inter-rater reliability). Consensus was reached on all disagreements, and all test items were classified as either cognate items (total of 70 items from sets 1-16) or non-cognate items (total of 122 items from sets 1-16). After COSP scores were assigned, the semantic overlap between pairs of cognates was examined and only those items that were identical or near-identical in meaning were selected for the analysis. For instance, the pair marsupiu-'marsupial', with a COSP score of 9, has identical meanings and was thus selected for the analysis. In contrast, the pair fizician-'physician', despite having a COSP score of 9 and being etymologically related, was excluded from the analysis because fizician means 'physicist' in Romanian.
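To make the relationship between raw and chance-corrected agreement concrete, the sketch below applies the standard Cohen's Kappa formula, kappa = (p_o - p_e) / (1 - p_e). It rests on assumptions that the text does not state: that agreement was computed on the binary cognate/non-cognate label, that 95.31% corresponds to 183 of 192 items, and that both raters used the two labels in the final 70/122 proportions. It is an illustration of the formula, not a reconstruction of the authors' calculation.

```python
def cohen_kappa(p_observed, p_expected):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    return (p_observed - p_expected) / (1 - p_expected)


TOTAL_ITEMS = 192    # 70 cognate + 122 non-cognate items (sets 1-16)
COGNATES = 70
NON_COGNATES = 122

# Assumption: 95.31% raw agreement corresponds to 183 of 192 items.
AGREED_ITEMS = 183
p_o = AGREED_ITEMS / TOTAL_ITEMS

# Assumption: both raters used the two labels in the final 70/122 proportions,
# so chance agreement is the sum of the squared label proportions.
p_e = (COGNATES / TOTAL_ITEMS) ** 2 + (NON_COGNATES / TOTAL_ITEMS) ** 2

kappa = cohen_kappa(p_o, p_e)
print(f"p_o = {p_o:.4f}, p_e = {p_e:.4f}, kappa = {kappa:.2f}")
```

Under these assumptions the result rounds to 0.90, which is in line with the reported value.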
In order to analyze the results in more fine-tuned ways, the data were also grouped as home/non-cognate, home/cognate, academic/non-cognate, and academic/cognate. These categories were used to compare the performance of the bilinguals in Romanian and English at T5, as well as the performance of the bilingual and monolingual children in Romanian at the age of six. By breaking up the categories in this fashion, the results can be viewed without the confounding overlap between academic and cognate words because, in both English and Romanian, the academic register tends to contain more Latinate words than is the case with non-academic high frequency items (Petrescu et al. 2017).
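The two-step cognate screen and the four-way grouping can be sketched as follows. This is a minimal illustration, not the authors' procedure. It assumes that a pair failing either screen is labelled non-cognate (the text says only that such pairs were excluded from the cognate analysis). The COSP scores and meanings follow the text, but the register labels for every item except marsupiu are hypothetical.

```python
from collections import Counter

COSP_CUTOFF = 5  # pairs must score higher than this to count as cognates


def is_cognate(cosp, same_meaning):
    """Apply the two screens: phonological overlap, then semantic overlap."""
    return cosp > COSP_CUTOFF and same_meaning


def category(register, cosp, same_meaning):
    """Return one of the four non-overlapping labels, e.g. 'academic/cognate'."""
    status = "cognate" if is_cognate(cosp, same_meaning) else "non-cognate"
    return f"{register}/{status}"


# (Romanian word, register, COSP score, same meaning as the English item?)
# Register labels other than the one for marsupiu are hypothetical.
items = [
    ("marsupiu", "academic", 9, True),
    ("fizician", "academic", 9, False),  # means 'physicist', not 'physician'
    ("banana", "home", 10, True),
    ("ferma", "home", 7, True),
    ("autobus", "home", 1, True),
]

counts = Counter(category(reg, cosp, same) for _, reg, cosp, same in items)
for label, n in sorted(counts.items()):
    print(label, n)
```

Keeping the register and the cognate status as separate inputs mirrors the point made above: each item lands in exactly one of the four cells, so academic and cognate effects are not conflated.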

Results
As indicated above, the current study aimed to address three research questions. First, the study aimed to look at patterns of lexical development in Romanian-English bilingual children. Next, the study set out to determine whether there is any evidence of cognate facilitation among bilinguals during their kindergarten years. Finally, the study aimed to identify similarities and differences in types of vocabulary acquired by monolingual Romanian-speaking children and the Canadian-born Romanian-English bilingual children.

Lexical Development of the Three Romanian-English Bilinguals
3.1.1. Peabody Picture Vocabulary Test, Fourth Edition Overall Scores from T1 to T5 in English and Romanian-Adapted Versions
Raw scores and percentiles for the English PPVT-4, as well as raw scores for the Romanian-adapted PPVT-4, were obtained. In Figure 1a-c, the children's raw scores for the English PPVT-4 and Romanian-adapted PPVT-4 are plotted at each round of data collection. Figure 1a reveals that Dan's English and Romanian raw scores were near equal at the start of the study (T1) and continued to grow over the two-year period, with Romanian surpassing English at T5. The steepest growth curve for English appeared between T1 (when Dan started kindergarten) and T2 (after six months of English instruction). For Romanian, Dan experienced a sudden surge in his receptive vocabulary between T1 and T2 and between T4 and T5 (end of kindergarten) following a four-month visit to Romania. Figure 1b reveals that Radu's raw scores for both Romanian and English were near equal at T1, with Romanian showing a small advantage. Both languages continued to develop over the two-year period. Radu's English raw scores increased suddenly between T1 and T2, and then again between T4 and T5. The development of his Romanian receptive vocabulary from T1 to T5 was steady, without periods of stagnation or attrition. Figure 1c shows that at T1, Moni's raw scores for Romanian were higher than the raw scores for English and remained so until T5, when her English score surpassed the Romanian one. Both Romanian and English raw scores continued to grow over the two-year period.
Figure 2 reproduces the plot for percentile scores for the English data for all three children. Overall, all three children's percentile scores improved substantially from T1 to T5, with Radu and Moni exceeding the mean scores at T5 and Dan arriving at the mean at T5. (Note that no corresponding summary can be provided for Romanian because there are no standardized tests for receptive vocabulary).


Item Category Analysis
The PPVT-4 data were further analyzed using two conceptual frameworks: home/academic category and cognate/non-cognate category. (Note that only Form B of the PPVT-4 was used for these comparisons, making the scores at T2 and T4, which were based on Form A, unsuitable for a comparison of congruent items.) Because the means reported are raw scores, the maximum number of items in each category needs to be considered as a guideline. Percentages are not reported, as they are not suitable given that the PPVT-4 works with age-appropriate percentile scores.

Scores at T1, T3, and T5 for the Home and Academic Items in Romanian and English
To explore whether certain portions of the children's vocabulary are affected by the context where they are used (home or school), the items in the PPVT-4 were classified as "home" and "academic". Of the total number of items present in Form B of the PPVT-4, 54 items were classified as "home" and 138 as "academic". However, as discussed in the method section, the analysis excluded items that were beyond the 8000-frequency level, leaving 52 home items and 110 academic items. Table 2 presents the distribution of the items in each home/academic category within the frequency levels. As can be seen in Table 2, three items (amounting to 2.17% of the 138 academic items) could not be found in the BNC and Lextutor corpora, and therefore, assigning these words to a frequency band was not possible. For this reason, they were excluded from the present analysis. The results of the analysis are presented in Figure 3a-c.

Figure 3a-c reveals that all three children had proportionally higher scores on the home items in both English and Romanian. By T5, their scores were close to the maximum (52). All three children had proportionally lower scores on the academic items in both Romanian and English. In the academic section, Dan and Radu experienced the sharpest increases between T1 and T5 in both English and Romanian. Moni experienced a sharp increase between T1 and T5 mainly in English (her English home vocabulary also increased sharply between T1 and T5).
Scores at T1, T3, and T5 for the Cognate and Non-Cognate Items in Romanian and English
In order to investigate cross-linguistic influence in terms of possible cognate facilitation, the items from the English PPVT-4 tests were classified as either cognates or non-cognates on the basis of the cross-linguistic overlap scale for phonology (COSP) discussed in Section 2.4. As explained earlier, only items that had a COSP score higher than 5 and were identical or near-identical semantically were included in the present analysis. After this screening, 122 items were classified as non-cognates and 70 items were classified as cognates.
We examined the possibility of cognate facilitation by comparing how steep the upward trend was from T1 to T5 for cognates and non-cognates. In the present case, the curve for non-cognates was steeper than that for cognates (see Figure 4). If anything, improvement was slightly higher for the non-cognates in some cases, for example, Moni's English scores and Radu's Romanian scores.
Table 3 analyzes the means and standard deviations in the four categories mentioned above: home versus academic and cognate versus non-cognate. As explained in Section 2.4, because there is overlap between academic and cognate repertoires in a language like English when examined alongside a Romance language like Romanian, the items were further broken up into four non-overlapping categories; namely, home/non-cognate, home/cognate, academic/non-cognate, and academic/cognate. A close examination of these newly created categories indicates that the bilinguals' performance in the home/non-cognate and home/cognate categories was close to the maximum in English and Romanian and does not provide any evidence for cognate facilitation. Items in the academic categories were less well known in both English and Romanian, and once again provided no evidence for cognate facilitation. We should note that, while the raw scores for academic/non-cognates were slightly higher in English, the raw scores for academic/cognates were a little higher in Romanian. In short, the bilinguals' mean scores were very similar in their two languages.

Overview of the Lexical Knowledge of Monolingual Children Compared to Bilingual Children at Age Six
A list with all the academic items and the correct answers for each group is presented in Appendix C. All the "home" items are listed in Appendix D. Overall, the Romanian monolingual children had a larger vocabulary than their bilingual counterparts. Their mean score was 134.86, while the bilingual children's mean raw score was 119.33. However, a Kruskal-Wallis non-parametric test indicated that the difference was not significant (p = 0.13). Table 4 presents the bilingual and monolingual performance at age six on sub-classes of words on the English and Romanian PPVT-4. According to Table 4, the home items were known to an equal extent by both groups. The academic items were better known by the monolingual children (M = 84.73; SD = 19.87 for monolinguals vs. M = 70.0; SD = 21 for bilinguals).
For the monolingual group, a Wilcoxon test revealed a significant difference between home and academic scores in favour of home scores (p < 0.0001). Note that comparisons between cognates and non-cognates within the monolingual group are not relevant. We have included the cognate/non-cognate distinction in Table 4 only for the purposes of comparing their scores with the bilinguals to determine whether the bilinguals enjoyed any cognate facilitation.
When the bilinguals were compared with the monolinguals, using the Kruskal-Wallis non-parametric test, in each of the categories labeled in Table 4, none of the differences were significant. The means and these non-significant values confirm that the bilinguals did not benefit from cognate facilitation. When we look at the means of the monolinguals and bilinguals, we see that in all of the home categories their scores are very similar. However, the monolinguals have much higher means in the academic categories.

Discussion
The results indicate that the receptive vocabulary of the three Canadian-born children increased with age in both Romanian and English. In English, their scores could actually be viewed in light of established norms. While all started below the 50th percentile at T1 at the beginning of the study (when they started kindergarten), they scored at or above the 50th percentile at T5 (at the end of their kindergarten years). Their sharpest increase occurred during the first six months of attending kindergarten, reflecting the rapidity of L2 lexical growth at least in the context of this study.
There were, however, some noteworthy differences in the children's patterns of lexical development. One factor that we could isolate as impacting both Romanian and English was the length and location of the children's summer vacations. Both Radu and Dan, for example, experienced their steepest growth in Romanian vocabulary after month-long visits to Romania at the beginning of kindergarten Year 1 and kindergarten Year 2, underlining the powerful impact of being fully immersed in a heritage-language milieu. Moni's visits to Romania, on the other hand, were relatively brief, which might explain the more gradual increase in her Romanian vocabulary. Conversely, Dan's English scores showed peak performance at T2 and T4 (six months into kindergarten Years 1 and 2, respectively) and lowest performance at T3 and T5 (at the beginning and end of kindergarten Year 2). A plausible explanation for this behavior is again Dan's month-long trips to Romania over the summer holidays. After full immersion in the heritage language environment, Dan's English skills seem to have regressed, attesting to the fact that the linguistic soundscape of the children is highly dynamic and plays a crucial role in the balance between the children's two languages (De Houwer 2009). The finding is also in line with previous research that suggests that summer vacations in the home country positively impact receptive language skills in the minority language of bilingual children (Hammer et al. 2008; Rojas and Iglesias 2013) and negatively impact the school language.
While two years of schooling in English appear to have erased the gap between the bilingual children and their English-speaking monolingual counterparts, there appear to be some striking differences between the bilinguals' Romanian vocabulary when compared with their 22 monolingual counterparts in Romania. As expected, the size of the monolinguals' vocabulary was greater than that of their Canadian counterparts. In short, while the English monolingual and bilingual comparisons in this study did not show a smaller English vocabulary in the bilingual children, contrary to expectations (Bialystok and Feng 2009;Werker 2012), bilingual and monolingual comparisons in the heritage language did seem to be compatible with the general belief that, as bilingual children's input and interaction is divided between two languages, their lexicons in each show gaps in some content areas (academic vocabulary). The difference in the size of the monolinguals' and bilinguals' academic vocabulary can be attributed to the bilinguals' lack of opportunity to study academic content in Romanian. The following section explores some differences between the words known by the monolinguals and bilinguals.

Comparing the Compositions of the Monolingual and Bilingual Romanian Lexical Repertoires
The purpose of the study was not simply to quantify the receptive lexical knowledge of the monolingual and bilingual participants, but also to look at the types of items in each group's repertoire. Looking at the actual items in their home and academic vocabularies sheds some light on the similarities and differences in the environments of Romanian lexical acquisition in Canada and Romania. A comparison of the academic items known by the bilingual children and their monolingual counterparts reveals some commonalities. Neither group was familiar with certain words related to geometry (sferic-'spheric', parale-'parallel', concav-'concave'); low-frequency words related to body parts (bazin-'pelvis'; maxilar-'jaw'); archaic words (pocal-'goblet'; teacă-'pod'); or words related to different trades (scripete-'hoisting'; lubrifiază-'lubricating'; vitezometru-'speedometer'). On the other hand, certain academic items were known by both groups, for instance, creier ('brain'), astronaut ('astronaut'), and binoclu ('binoculars'). However, there were many words that only the monolingual children knew, probably through classroom activities, for example, hartă as in 'map', marsupiu as in 'marsupial', or fundație as in 'foundation'. Other words that were known by the Romanian monolinguals and not by the Romanian-English bilinguals reflected cultural differences between their communities. For example, words such as asamblează-'assembling', fundație-'foundation', mistrie-'trowel', tubular-'tubular' are often encountered in a domestic context, as many Romanian families build, repair, or modify their homes on their own. Moreover, these Romanian children have probably had the opportunity to encounter words such as stup-'hive', țap-'goat', mânz-'colt', and păun-'peacock' in their grandparents' homes (as many still live in villages and raise their own animals).

Cognate Facilitation
As discussed above, to investigate whether young bilingual children are sensitive to Romanian-English cognates with overlapping meanings in spoken form, we examined the children's scores on cognates and non-cognates in the PPVT-4. In the bilingual study conducted in Canada, the results were mixed. One of the children, Radu, seemed to show evidence of cognate facilitation at both T1 and T5, while Dan and Moni showed no cognate advantage at any stage. As suggested by Kelley and Kohnert (2012), such individual variation is frequent in the study of bilingual children "even within any well-defined relatively homogenous sample" (p. 200). In our comparison between the Canadian bilinguals and the Romanian monolinguals, there was only a small handful of cognates that the bilinguals knew and that the monolinguals had difficulty with: tripleți-"triplets", jogging-"jogging", pedală-"pedal", and fictiv-"fictive". However, there were numerous other potentially recognizable cognates that the monolinguals knew (despite having no second language), but that the bilinguals clearly did not recognize. The means and the non-significant statistical differences recorded in Table 4 provide further evidence of the lack of cognate facilitation among the three bilinguals.
The literature cautions that identifying cognates is a skill that develops with age, with proficiency in the two related languages, and occasionally only through instruction, for example, when children begin to read for comprehension, as it is not intuitively developed at a young age (Kelley and Kohnert 2012; Malabonga et al. 2008). The three bilingual children in our study were younger than the age at which children are apt to recognize cognates (Grade 5, as reported in Malabonga et al. 2008). They were also very new learners of English, having only started interacting in the language at the age of four.

Limitations of This Study
The design of the two-year longitudinal bilingual study provided some much-needed insights into first language development and maintenance of Romanian in a large urban centre in Canada, where English is the mainstream language. However, such a study needs to be followed up by research involving larger sample groups, especially as Romanian is of growing importance as a heritage language in the Greater Toronto Area. Larger numbers would lend power to any statistical tests that chart bilingual progress or make comparisons between groups. While the monolingual study featured a larger sample group (n = 22), one of its major shortcomings lies in the fact that data collection was limited to a single point in time (i.e., when children were six years old). Longitudinal or larger cross-sectional studies on the acquisition of Romanian in Romania would certainly provide some yardsticks for viewing Canadian heritage language data in perspective.
A further limitation lies in the fact that the data analyzed in this study encompass receptive vocabulary knowledge alone. Ideally, such data can be greatly enriched by naturalistic or elicited production data because the latter provides information about the morphosyntactic, semantic and pragmatic accuracy of word use in a contextualized manner. As we know, correct responses in multiple-choice receptive tests can result from guessing through elimination, and even if not the product of guessing, correct responses do not necessarily reflect an accurate or rich mental representation of the item (Gyllstad et al. 2015).
While we made progress in creating a version of the PPVT-4 for Romanian by consulting parents of preschool children in Romanian-only households in a specific neighborhood in the Greater Toronto Area and pilot testing the items with older children, one major drawback of this newly designed test lay in the lack of reliable frequency data for the Romanian items. As the test items were, for the most part, translation equivalents of the English items, there is a distinct possibility that many were not in the same frequency range as their English equivalents. A second drawback was that the lack of any Romanian norms for tests of children's vocabulary made it impossible to consider the bilingual children's performance alongside that of standardized monolingual scores.
A related concern is the use of the PPVT-4 to explore cognate facilitation. While examining cognate facilitation is apropos or even necessary in a study involving two lexically proximate languages like Romanian and English, the PPVT-4 was not designed to investigate cognate effects in a controlled manner. It was thus difficult to include a sufficient number of relevant items in the test, that is, those with a COSP of 5 and over, to obtain conclusive results within the bilingual study or across the monolingual and bilingual studies. Thus, what we obtained was only a gross measure of cognate facilitation (or the lack thereof).

Implications for Research and Testing
This study focuses on language acquisition in a unique context. There have been no previous studies involving Canadian-born children with Romanian as their first language and English as their second. Moreover, even if we consider contexts beyond a Canadian one, barring a few notable exceptions (Avram 2001;Buja 2008), Romanian has been conspicuously absent in mainstream child acquisition research. The study also makes a contribution to minority language retention in a situation where the school language, English, is not only the mainstream language of the community, but is also the language of world media, children's electronic games and TV shows, and global communication. By using a longitudinal design, the current study also captures changes in the children's minority and majority language lexicons at multiple points, allowing for new insights on the transition from home-based monolingualism to community-based bilingualism. This unique design adds to the few studies involving Romanian as a heritage language that have been conducted in different contexts and with a different focus (Buzilă 2016;Mesaros 2008;Montrul et al. 2015;Nesteruk 2010).
These insights notwithstanding, in light of the new Romanian diaspora, especially in countries like Canada, the time is ripe for larger studies investigating the acquisition of Romanian in both monolingual and bilingual contexts, and involving participants of various ages. As discussed earlier, reliable frequency data in Romanian as well as standardized Romanian tests would greatly enhance acquisition research in the language. Such studies could also provide a clearer picture of cross-linguistic influence involving Romanian L1 and English L2, including cognate facilitation, especially because Romanian, like French and other Romance languages, has the Latinate counterparts of low-frequency English academic words, but occurring in its higher frequency bands (Cummins 2005;Petrescu et al. 2017).
The development of the Romanian-adapted PPVT-4 used to test the bilingual and monolingual children was also a novel contribution of this study. However, it is clear that such a test is only a good starting point for creating a Romanian version of the PPVT based on frequency data and validated through trial runs involving large sample groups consisting of participants of not only different ages, but also different socio-cultural milieus. Perhaps, like any test being used in both monolingual and bilingual situations, a valid and reliable Romanian PPVT needs to be based on not only general frequency data, but also frequency data from specific communities. For example, in our study, the monolingual Romanian children knew words such as stup-'hive', mistrie-'trowel', and mânz-'colt', presumably because these words are frequently used in their cultural context. It is much less likely that pre-schoolers growing up in an urban centre in North America would be exposed to these items at home or would even find them useful in their interactions within the Romanian-speaking community. One potential solution would be to build different versions of a lexical test for bilinguals and monolinguals that take ethnographic details into consideration. Another solution would be to build a certain amount of flexibility into one test package, for example, by allowing for alternatives to certain culture-specific test items.

Funding:
This research received no external funding.

Acknowledgments:
We greatly appreciate the insightful and relevant comments made by the reviewers as well as the excellent guidance and oversight provided by the academic editors. The mechanism for submitting manuscripts and responding to reviews was excellent. We would also like to thank Pablo Velázquez for his clear and quick responses to our queries throughout the review process. Finally, we would like to extend our grateful thanks to Tamara Al-Kasey for helping us edit the first draft of our paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Parent Consent Form
I have read and understood the letter describing the proposed study, titled "Minority Language Acquisition and Retention: A Study of Canadian-Born Romanian-Speaking Bilingual Children". I understand that my child will be participating in language activities so that the researcher can examine my child's language and language comprehension and production. All information collected will be kept confidential. There are no risks involved and there are financial direct benefits to me, and my child. The study will increase researchers' understanding of children's language development. I understand that my child will be asked if s/he wants to take part and will not be required to do so if s/he is shy or unwilling. Also, s/he may stop and return to the classroom at any time without penalty. I may withdraw my consent at any time.