How Many Palabras? Codeswitching and Lexical Diversity in Spanish-English Picture Books

Bilingual picture books have been growing in popularity, with caregivers, teachers, and researchers increasingly interested in understanding how picture books might be able to support the learning of words in two languages. In this study, we present the first evaluation of the quantity and quality of text contained within bilingual picture books in English and Spanish targeted to children ages 0–9 and available to parents in the United States. We focus specifically on a sample of codeswitching books (N = 45) which present text in one language embedded in another language. All books were transcribed and evaluated for (1) the number of words and utterances presented in each language; (2) the quality and complexity of text presented in each language; and (3) how switching occurred between the two languages. Results showed that although picture books in our sample presented predominantly English text and more complex English sentences, relatively more unique words were presented in Spanish. Furthermore, picture books in our sample presented frequent switching between languages, particularly within utterances. We suggest that bilingual picture books provide children with potentially enriching yet asymmetrical opportunities for learning in each


Introduction
Early reading experiences provide an important foundation for young children's language development. Children's vocabulary knowledge is positively predicted by having more books in their home, reading with their caregivers at earlier ages, and spending more time reading with their caregivers (Farrant and Zubrick 2012;Sénéchal and Lefevre 2002;Duursma et al. 2007;Fitton et al. 2018). One explanation for why early reading is critical is that picture books contain rich linguistic input, often including words and concepts that are not typically found in spoken language (Montag et al. 2015). Given that children's language development is linked to their rich and varied experiences with language in their environments (e.g., Hurtado et al. 2008;Ramírez-Esparza et al. 2017;Sperry and Sperry 1996;Weisleder and Fernald 2013), it is important to evaluate the text that children's picture books contain as a key source of enriching input contributing to young children's language development (Montag et al. 2015;Logan et al. 2019).
For bilingual children, who need to learn roughly twice as many words as their monolingual peers, language skills depend not just on their overall input, but also on the input that they receive in each of their languages (Marchman et al. 2017;Place and Hoff 2011;Pearson et al. 1997). Moreover, bilinguals have to contend with challenges stemming from unequal exposure to each of their languages and the possibility that the two languages will be mixed together (Fennell and Byers-Heinlein 2014;Oller et al. 2007). Recognizing these challenges inherent in bilingual learning, caregivers, teachers, and researchers are becoming increasingly interested in understanding how picture books support the learning of words in two languages (Brouillard et al. 2020;Méndez et al. 2015;Read et al. 2021b;Restrepo et al. 2013). However, no empirical research has systematically evaluated the quality of the text in bilingual picture books, which present text in two languages. In the current study, we examined the text of Spanish-English bilingual picture books in order to understand how these books may contribute to the input that children receive within and across languages.
Bilingual children's books have been growing in popularity, both as a source of support for bilingual language development and as a means to expose monolingual children to words in a second language. In our own search, more than half of the books marketed to families as bilingual were published in the last 10 years (see Appendix A for full list of books). Bilingual families report owning bilingual books in addition to monolingual books in each of their languages (Gonzalez-Barrero et al. 2021). Beyond providing exposure to multiple languages, bilingual books have the potential to support dual-language literacy and academic skills for bilingual children at home, in the classroom, and in libraries (Agosto 1997;Dávila et al. 2017;Castro et al. 2011;Ernst-Slavit and Mulhern 2003;Glazer et al. 2017;Gómez et al. 2021;Hadaway and Young 2013;Rodríguez-Valls 2011;Semingson et al. 2015;Thibeault and Matheson 2020;Walker et al. 1996).
A growing body of research has begun documenting the benefits of bilingual books. For instance, reading bilingual books increases bilingual children's engagement and confidence during literacy activities (Hu et al. 2012;Zaidi 2020). Furthermore, compared to monolingual books, bilingual books have been found to be just as supportive, and sometimes more supportive, of young children's vocabulary learning and language skills in two languages (Brouillard et al. 2020;Naqvi et al. 2012;Read et al. 2021b;Tsybina and Eriks-Brophy 2010). Given the potential importance of bilingual picture books for dual language development, recent studies have emphasized the need to design and select bilingual children's books that promote engagement and learning, focusing on issues related to the appropriateness of the narrative content, translation quality, and design features of the text and illustrations (Chen 2019;Gallagher and Bataineh 2020;Glazer et al. 2017;Hojeij et al. 2019;Walker et al. 1996). Critically, in order to understand the potential role of bilingual books for supporting learning in two languages, examinations of the quantity and quality of text in each language contained within these books are needed.
Previous research exploring bilingual language environments has shown that the language input that bilingual children experience not just holistically, but in each language individually, contributes to dual language development. Bilingual children's knowledge of their two languages develops separately, and language skills in each of their languages are related to the specific experience they have had with that language (Marchman et al. 2010;Pearson et al. 1997). When children hear more words and a higher diversity of words in a particular language, their vocabulary knowledge in that language grows (David and Wei 2008;Marchman et al. 2017;Hoff et al. 2012;Pearson et al. 1997). Therefore, to understand how the text in bilingual picture books may promote learning, it is important to determine how much input is provided in each language and whether the input is equally rich across the two languages.
It is also important to recognize that the two languages can be used together, not just presented individually (e.g., Bullock and Toribio 2009;Vogel and García 2017). According to both observational studies and parental reports, codeswitching, or the use of two languages together, is a common feature of bilingual children's auditory input (Bail et al. 2015;Byers-Heinlein 2013;Goodz 1989;Kremin et al. 2021). Children regularly hear switches in language both within a single utterance (e.g., ¡Mira el firetruck!) and between utterances (e.g., ¡Mira el camión de bomberos! Its sirens are so loud!). Interestingly, bilingual children do not show disruptions in their processing when hearing switches that span utterance boundaries but are often less efficient at comprehending sentences that contain a within-utterance switch (Byers-Heinlein et al. 2017;Morini and Newman 2019;Potter et al. 2019). Moreover, codeswitching can interfere with children's learning of new words (Byers-Heinlein 2013;Byers-Heinlein et al. 2020). Nonetheless, codeswitching is relatively common in bilingual children's everyday experience. However, we do not yet know whether bilingual picture books offer input that matches the mixing found in spoken language or present the two languages in different ways.
Only a limited set of studies has examined early reading and literacy activities within bilingual environments, using different approaches. One approach involves assessing readers' talk surrounding shared reading activities. When asked about reading practices at home, bilingual caregivers report using both languages when reading to their child, even if reading a book in a single language (Muysken et al. 1996;Read et al. 2021a). For example, caregivers may translate words or switch to their dominant language while reading a book in their less-preferred language (Gonzalez-Barrero et al. 2021), although the language of preference during reading may depend on the home language environment and the reading preference of the caregiver (Read et al. 2021a). Direct assessments of talk during shared reading to bilingual children, most of which have been conducted via case studies, demonstrate that teachers and caregivers incorporate both languages in their extra-textual talk (Bauer 2000;García and Kleifgen 2020;Gonzalez-Barrero et al. 2021;Kabuto 2010;Li and Fleer 2015;Moody et al. 2021;Pontier and Gort 2016;Song 2016). In addition, children's learning from reading interactions is shaped by how the two languages are used together (Brouillard et al. 2020;Méndez et al. 2015;Read et al. 2021b;Restrepo et al. 2013). When Spanish-speaking preschoolers heard both English and Spanish during instruction involving shared reading, they learned more words than when they only heard English (Méndez et al. 2015). Together, these studies provide evidence that bilingual children regularly encounter both languages during reading and that differences in how the languages are used can affect the words that they learn.
While parents undoubtedly present important input via their talk surrounding books, the text of the books themselves provides a potentially valuable source of input, and we know little about the quality of the text in bilingual children's books. Given that parents are likely to read picture books to young children from an early age (Deckner et al. 2006;Raikes et al. 2006;Taaffe Young et al. 1998), scrutinizing the text in children's picture books can provide a useful measure of the kinds of words that children have the opportunity to encounter. This approach has been used recently to assess the quantity and quality of words found in English children's picture books (Dawson et al. 2021;Logan et al. 2019;Montag et al. 2015). However, no study has examined the quantity and quality of text contained within bilingual children's picture books. In the present study, we measured the quantity and quality of text in Spanish-English bilingual codeswitching picture books currently available to parents in the U.S. to understand their potential contribution to the language input of young bilingual learners.
Although bilingual books are now available in a variety of language combinations, we focused specifically on books containing English and Spanish given that Spanish is the second most spoken language in the U.S. (U.S. Census Bureau 2019). A limited set of studies has examined the text of Spanish-English bilingual picture books written for children; however, these studies have largely focused on the narrative content of books (Alamillo 2017;Barrera et al. 2003;Botelho and Marion 2020;Chappell and Faltis 2007;Chaudhri and Torres 2021;Clark et al. 2015;Domke 2018;Gomm et al. 2017;Kelly 2020;Naidoo 2011;Naidoo and López-Robertson 2007). Only two studies have provided analyses of the quantity of each language presented in bilingual children's books, showing that Spanish-English bilingual books contain predominantly English text (Domke 2018;Gomm et al. 2017). According to Domke (2018), English text was also more readable than Spanish text. These findings provide preliminary evidence that Spanish-English bilingual children's books do not seem to provide equal input in the two languages and are likely skewed toward English. In addition to examining the amount of text in each language, our study further probed the richness of the input that children may encounter in Spanish vs. English, as well as providing the first analysis of how the languages are used together in a large sample of picture books.
Our sample of books was generated using online search engines that parents in the U.S. are likely to use when choosing bilingual picture books in English and Spanish for their children. Previous research has indicated that there are many different types of bilingual books (Domke 2018;Jeffers 2009;Semingson et al. 2015). Our search results included translation books, which present the entire text in both languages, as well as codeswitching books, which involve the embedding of words and/or sentences in one language within another language. The current study targets codeswitching books specifically, as this book format presents different information in each language and also presents multiple ways in which the two languages can be used together, reflecting the potential variability of dual-language experience. Moreover, while a translation book may be essentially two copies of a monolingual book, books containing codeswitching represent a type of input that is unique to the bilingual experience.
Each book was transcribed in its entirety, and we calculated multiple measures of quantitative and qualitative properties of the text in each picture book, taking inspiration from recent studies evaluating the quantity and quality of text in popular children's books in English (Logan et al. 2019;Montag et al. 2015), and from research describing child-directed speech (Anderson et al. 2021;Hurtado et al. 2008;Rowe 2012). For each picture book in our sample, we computed total word counts, assessed sentence complexity, and measured lexical diversity in each language. We additionally examined switching between the two languages. Our goal was to address three primary research questions.
First, we asked whether the picture books provide different amounts of exposure to Spanish vs. English. Our measures of quantity (calculated separately for each language) were the total number of words (word tokens), the number of unique words excluding repetitions (word types), and the number of utterances fully in that language (i.e., number of English-only utterances and the number of Spanish-only utterances). We predicted that picture books, on average, would present text predominantly in English, consistent with prior studies (Domke 2018;Gomm et al. 2017).
Our second question concerned whether there were differences in the quality of text presented in each language. To assess quality, for each picture book, we considered both the lexical diversity and the sentence complexity found in each language. We operationalized lexical diversity as type/token ratio, which corresponds to the proportion of unique words in a language, accounting for the total number of words appearing in that language. To measure sentence complexity, we calculated the mean length of utterance in each language in each picture book. Our prediction was that picture books, on average, would present greater lexical diversity and sentence complexity in English. We also examined the most frequent words used across our entire corpus of picture books for each language as a preliminary way of testing whether different words and concepts were presented in the two languages.
Our final research question asked how the languages were used together. Because no prior study has explored codeswitching in a large sample of bilingual books, we did not have direct hypotheses about the nature of mixing. Rather, we aimed to describe (1) the frequency of language switching both within and between utterances, (2) the quantity of English and Spanish words presented within utterances that contain both languages, and (3) the complexity of utterances containing both languages.

Sample
Our final sample of codeswitching books (N = 45) was taken from a larger list of bilingual children's books in Spanish and English compiled from several online sources: Amazon, Google, and two local library system catalogs (Phoenix Public Library and El Paso Public Library). One additional book was transcribed but excluded due to being more than 5 standard deviations from the sample mean for number of word tokens.
Searches were conducted in June and July of 2021 using the search terms "bilingual children's books" or "Spanish English bilingual children's books." The results of each search were filtered to exclude electronic-only materials and to only include books targeted toward children ages 0-9 that contained both Spanish and English. Books were then classified as either codeswitching if the text switched between English and Spanish such that one language was not exclusively used for direct translations, or translational if the full text was presented in both English and Spanish. The search generated a list of 286 titles.

Transcription and Coding
We transcribed all books with the Computerized Language Analysis (CLAN) program using the Codes for the Human Analysis of Transcripts (CHAT) transcription format (MacWhinney 2000). Text was transcribed as written in the book by five trained coders. Coders were all fluent in English and at least moderately proficient in Spanish. Although books often contain dialogue by characters (usually in quotation marks), all text was transcribed as coming from a single source. Background text that was part of the illustration on a page was only included if it was part of the main narrative.
Consistent with CHAT conventions, we divided the text into utterances. Coders were instructed to delimit an utterance when a statement ended in a punctuation mark indicating a full stop (period, exclamation mark, or question mark). In the case of dialogue (placed within quotation marks), coders were instructed to transcribe character statements and the surrounding narrative text as separate utterances (see the example below).
> it was not that bad.
Consistent with CHAT conventions, for each transcript, one language was assigned as the default language and the other language was assigned as the secondary language. For all our picture books, English was assigned as the default language and Spanish was assigned as the secondary language. Utterances fully in Spanish were marked as Spanish-only utterances. Spanish words that appeared intermixed with English words within an utterance were marked as Spanish words. Coders were instructed to mark all words that were in Spanish even if these were proper nouns (e.g., Mamá) or had been adopted into the English lexicon (e.g., piñata). Different forms of the same word (e.g., abuela/abuelita, mamá/mami, dog/doggy) were written out as they appeared in the picture book, therefore counting as unique word types. The full codebook used for transcription is available on the Open Science Framework (OSF): https://osf.io/sxm5t.

Measures
Several measures were derived from the transcriptions of each book, using CLAN, to assess quantitative and qualitative features of each text. All measures were calculated for each picture book individually.
Number of word tokens was defined as the total number of words, number of word types was defined as the number of unique words, and type/token ratio was calculated by taking the number of word types and dividing it by the number of word tokens. We also calculated the total number of utterances separately for English-only, Spanish-only, and mixed-language utterances (which contained both languages). Mean length of utterance (MLU) was determined by averaging the number of word tokens in each utterance.
Amount of exposure in each language (quantity) was assessed by comparing the number of word tokens and word types in English vs. Spanish, as well as the number of English-only utterances vs. the number of Spanish-only utterances. Measures of quality included type/token ratio (lexical diversity) for English vs. Spanish and MLU (sentence complexity) for English-only utterances vs. Spanish-only utterances.
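The per-language measures defined above can be illustrated with a minimal sketch. It is not the CLAN pipeline used in the study: the representation of a book as lists of (word, language) pairs, the function name language_measures, and the three-utterance example book are all assumptions made for illustration.

```python
# Illustrative sketch only; the study derived these measures in CLAN.
# Assumption: each utterance is a non-empty list of (word, language) pairs,
# with language coded as "en" or "es" (a hypothetical stand-in for CHAT marking).

def language_measures(utterances):
    tokens = {"en": [], "es": []}
    single_language_lengths = {"en": [], "es": []}
    for utterance in utterances:
        languages = {lang for _, lang in utterance}
        for word, lang in utterance:
            tokens[lang].append(word.lower())
        # MLU is computed over single-language utterances only.
        if len(languages) == 1:
            (only,) = languages
            single_language_lengths[only].append(len(utterance))
    results = {}
    for lang in ("en", "es"):
        n_tokens = len(tokens[lang])
        n_types = len(set(tokens[lang]))
        lengths = single_language_lengths[lang]
        results[lang] = {
            "tokens": n_tokens,
            "types": n_types,
            "ttr": n_types / n_tokens if n_tokens else None,
            "utterances": len(lengths),
            "mlu": sum(lengths) / len(lengths) if lengths else None,
        }
    return results

# Hypothetical three-utterance book: English-only, Spanish-only, mixed.
book = [
    [("mommy", "en"), ("loves", "en"), ("kisses", "en")],
    [("besos", "es"), ("mami", "es"), ("besos", "es")],
    [("she", "en"), ("placed", "en"), ("el", "es"), ("guisante", "es"),
     ("in", "en"), ("the", "en"), ("bed", "en")],
]
measures = language_measures(book)
print(measures["es"])
```

In this sketch, words inside mixed-language utterances still count toward each language's tokens and types, while MLU is restricted to single-language utterances, mirroring the definitions in the text.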
In addition, we examined switching between languages. While there are different approaches to assessing language switching (e.g., Bail et al. 2015;Cantone 2007;David and Wei 2008;Isurin et al. 2009;Kremin et al. 2021;Nicoladis and Genesee 1998), in the current study, we relied on CHAT coding conventions designed to separate a default and secondary language (Muysken 2000;Myers-Scotton 1997). We first identified utterances that contained both English words and words marked in Spanish. These utterances were classified as a mixed-language utterance, and counted as a single within-utterance switch (even if multiple switches occurred within that one utterance, consistent with the conventions used by Bail et al. (2015) to describe switching in spoken language; e.g., "she placed el guisante in the bed for their guest" from La Princesa and the Pea). The number of between-utterance switches was derived by identifying Spanish-only utterances and determining whether a language switch occurred before and/or after that Spanish-only utterance (e.g., "mommy loves kisses sweet as can be"/"besos mami besos" from Besos for Baby). In accordance with CHAT conventions, between-utterance switches were counted when a Spanish-only utterance and another utterance type (English-only or mixed-language utterances) occurred sequentially (i.e., when an English-only utterance was followed by a Spanish-only utterance, when a Spanish-only utterance was followed by an English-only utterance, when a mixed-language utterance was followed by a Spanish-only utterance, and when a Spanish-only utterance was followed by a mixed-language utterance).
We counted the number of within-utterance switches and the number of between-utterance switches in each picture book, and also calculated the percentage of mixed-language utterances in each picture book. Additionally, we calculated the proportion of words in English vs. Spanish in each mixed-language utterance and calculated MLU for mixed-language utterances.
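The switch-counting rules described in this section can be sketched as follows. This is one possible reading of those rules, not the CLAN commands used in the study; the (word, language) input format and the function names classify and count_switches are hypothetical.

```python
# Illustrative sketch of the switch-counting rules; not the study's CLAN code.
# Assumption: each utterance is a non-empty list of (word, language) pairs,
# with language coded as "en" or "es".

def classify(utterance):
    languages = {lang for _, lang in utterance}
    if languages == {"en"}:
        return "english_only"
    if languages == {"es"}:
        return "spanish_only"
    return "mixed"

def count_switches(utterances):
    kinds = [classify(u) for u in utterances]
    # Each mixed-language utterance counts as one within-utterance switch,
    # however many times the language changes inside it.
    within = sum(kind == "mixed" for kind in kinds)
    # A between-utterance switch is counted when exactly one of two adjacent
    # utterances is Spanish-only (the other being English-only or mixed).
    between = sum(
        (a == "spanish_only") != (b == "spanish_only")
        for a, b in zip(kinds, kinds[1:])
    )
    percent_mixed = 100 * within / len(kinds) if kinds else 0.0
    return within, between, percent_mixed

# Hypothetical sequence: English, Spanish, English, mixed, Spanish.
book = [
    [("mommy", "en"), ("loves", "en"), ("kisses", "en")],
    [("besos", "es"), ("mami", "es"), ("besos", "es")],
    [("good", "en"), ("night", "en")],
    [("she", "en"), ("placed", "en"), ("el", "es"), ("guisante", "es")],
    [("buenas", "es"), ("noches", "es")],
]
within, between, percent_mixed = count_switches(book)
print(within, between, percent_mixed)
```

Under this reading, two consecutive Spanish-only utterances do not produce a between-utterance switch, and a change from an English-only to a mixed-language utterance is captured only by the within-utterance count.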

Reliability
Reliability for a set of six measures was assessed separately for each picture book using Krippendorff's alpha, which accounts for chance, can accommodate multiple coders, is sensitive to the size of values used in coding data, and takes into consideration the measure of the codes (e.g., ordinal vs. interval; Hayes and Krippendorff 2007;Krippendorff 2004, 2011a, 2011b). Reliability is considered good when values are 0.80 or higher. An experienced coder (the second author) transcribed two "training" books in our sample. The five coders were then asked to transcribe these two books, and training continued until each coder achieved reliability of 0.80 or higher with the expert's transcription for each of our main measures. Once reliability for our training books was achieved, each coder was assigned a set of picture books to transcribe. After coders completed their assigned picture books, we conducted an additional test of reliability by having all coders transcribe the same six books. Reliability was then calculated for six measures: (1) overall MLU, (2) overall number of utterances, (3) overall number of word types, (4) overall number of word tokens, (5) number of Spanish word tokens, and (6) number of Spanish-only utterances. These measures were selected for reliability because they provide an assessment of consistency in overall transcription of the picture books as well as consistency in coders' use of CHAT-specific codes to mark Spanish words and Spanish-only utterances. Reliability was calculated for 17% of the final sample (two training books plus six additional reliability books).
Krippendorff's alpha was high, averaging 0.94 (0.96 for MLU, 0.98 for number of utterances, 0.99 for word tokens, 0.99 for word types, 0.91 for Spanish word tokens, and 0.82 for Spanish-only utterances), demonstrating good reliability. Reliability for the number of utterances in Spanish was initially low (0.40); coders were reminded of the code for marking utterances fully in Spanish and asked to correct all of their transcripts. After correction, good reliability was achieved.

Results
Each of our key measures was derived in CLAN for each picture book separately, and we report descriptives aggregated across picture books, unless otherwise noted. For across-language comparisons, we conducted two-tailed paired sample t-tests, except for our analyses of codeswitching, which were modified in response to the skewness of the data. All analyses and plots were conducted and generated in R (version 3.6.3). All data, CLAN codes used to derive each measure, and R script for the main analyses reported in this manuscript are openly available on OSF: https://osf.io/sxm5t.
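As an illustration of the across-language comparisons, a paired-sample t statistic can be computed from per-book difference scores. The sketch below is written in Python rather than R and is not the authors' analysis script; the function name paired_t and the four-book data are invented for the example.

```python
# Illustrative sketch of a paired-sample t statistic; the study's analyses
# were run in R. The data below are invented for the example.
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Return (t, degrees of freedom) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample SD of differences
    return mean(diffs) / standard_error, n - 1

# Hypothetical per-book counts for four books.
english = [10, 12, 9, 15]
spanish = [4, 5, 6, 5]
t, df = paired_t(english, spanish)
print(round(t, 2), df)
```

The sign of t depends only on the order of subtraction: paired_t(spanish, english) returns the same magnitude with the opposite sign. Converting t to a two-tailed p value requires the t distribution with n − 1 degrees of freedom, which this sketch omits.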

Overall Descriptives
We first examined the total number of word tokens, word types, and utterances in each picture book, collapsing across languages. Picture books in our sample contained on average 581.84 word tokens (SD = 356.49; range: 96-1769). Of those words, the mean number of word types (unique words) was 261.62 (SD = 121.71; range: 50-514), resulting in an average type/token ratio of 0.49 (SD = 0.11; range: 0.16-0.74). Picture books in our sample contained on average 85.98 utterances (SD = 51.3; range: 12-217) with an average MLU of 7.34 (SD = 4.33; range: 3.9-31.25). Means and standard deviations for each of our key measures by language are presented in Table 1.

Measures of Quantity: Codeswitching Picture Books Present Predominantly English Text
Our first question was whether codeswitching picture books present different quantities of text in Spanish vs. English. To address this question, we compared the average number of word tokens, number of word types, and number of utterances in each language (see Figure 1). Picture books in our sample on average contained significantly more word tokens in English than in Spanish [t(44) = −8.41, p < 0.001] and significantly more word types in English than in Spanish [t(44) = −10.03, p < 0.001]. We also examined the number of utterances in each language (excluding mixed-language utterances). The number of English-only utterances was higher than the number of Spanish-only utterances, a difference that was significant [t(44) = 6.27, p < 0.001]. Examining the percentage of word tokens in Spanish in picture books in our sample, it is clear that English was the dominant language; on average, only 15.42% (SD = 10.6%) of the text in our picture books was composed of Spanish word tokens, with some books having as little as 1% Spanish word tokens and other books having a maximum of 49% Spanish word tokens. Most books (N = 34) contained less than 20% Spanish word tokens, a few books (N = 11) contained between 20% and 50% Spanish word tokens, and no picture book in our sample contained 50% or more Spanish word tokens.

Measures of Quality: Codeswitching Picture Books Present Shorter Utterances in Spanish but Relatively More Unique Words in Spanish
Our second aim was to compare the quality of text across the two languages. Our first measure of quality was type/token ratio (see Figure 2A). Contrary to our hypothesis, type/token ratio was significantly higher in Spanish than in English [t(44) = 3.62, p < 0.001]. That is, although there may have been fewer words in Spanish presented within our picture books, the lexical diversity of Spanish was higher, demonstrating there were relatively more unique words presented in Spanish.
Our second measure of the quality of the text was mean length of utterance (MLU), which reflects the mean number of word tokens within single-language utterances (excluding mixed-language utterances). The MLU of English-only utterances within picture books in our sample was significantly higher than the MLU of Spanish-only utterances [t(44) = 10.78, p < 0.001]. This suggests that picture books in our sample presented longer and presumably more complex English-only utterances than Spanish-only utterances (see Figure 2B).
Thus, although picture books in our sample contained more total word tokens and word types in English, and more word tokens per utterance in English, relatively more unique words were presented in Spanish than in English.

Measures of Frequent Words: Across Books, Function Words Appear More Often in English, and Words about Family Are More Common in Spanish
To better understand differences in how the two languages were being used, we explored which words were presented most frequently across our entire sample of picture books. For these analyses, we examined frequencies over the full corpus of text generated by our sample of picture books. Our corpus included a total of 30,381 word tokens, of which 26,805 were in English and 3578 were in Spanish.
We first examined the ten most frequent words for English and Spanish separately (see Table 2). In English, the most frequently used words in our corpus were function words (e.g., articles, conjunctions, prepositions, and pronouns). Although some function words also appear among the most frequent words in Spanish, the most frequently used words in our corpus in Spanish were predominantly content words (nouns, specifically words referring to people). To rule out the possibility that apparent differences in the information being presented were due to linguistic properties (e.g., because of the use of grammatical gender and plural marking, there are more articles in Spanish, and the common English verb "to be" can be translated as either ser or estar in Spanish), we then examined the translation equivalents for the top five most frequent words in each language (see Table 3). While this analysis revealed that function words did appear relatively often in Spanish, we still found that certain content words were much more frequent in Spanish than in English. For example, the Spanish word abuela appeared 75 times and comprised 2.1% of Spanish words in our corpus, but its English equivalent grandma (or grandmother) occurred only 52 times and comprised just 0.1% of English words in our corpus. Lastly, to further investigate the possibility that some information (e.g., names for family members) disproportionately appeared in Spanish, we examined the top five most frequently used nouns in English and Spanish in our corpus. Here, we attempted to describe words at the conceptual level, and collapsed across closely related variants of the same word (e.g., abuela, abuelas, and abuelita were grouped together). Table 4 displays the results. We again found differences in the kinds of words being used across the two languages. 
Specifically, the most common nouns in Spanish made up a much larger proportion of the total Spanish input (9.9%), compared to the most common English nouns, which made up only 0.7% of the total English input. Moreover, in English, the five most frequent nouns appeared to refer to places and routines, while the most frequently used nouns in Spanish were largely names of family members. Together, our preliminary exploration into the kinds of words being presented frequently across our corpus of codeswitching picture books reveals differences across languages: the most common words in Spanish mostly refer to family. In contrast, English presents more function words, and individual nouns are far less frequent and less tightly clustered in meaning.
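The frequency comparison described above can be illustrated with a minimal sketch. It assumes each word token has already been tagged with its language; the toy token list, the `eng`/`spa` tags, and the function name `top_words` are hypothetical and are not taken from the study's CLAN or R materials.

```python
from collections import Counter

# Hypothetical toy corpus of (token, language) pairs; NOT the study's data.
tokens = [
    ("abuela", "spa"), ("makes", "eng"), ("the", "eng"), ("sopa", "spa"),
    ("for", "eng"), ("the", "eng"), ("familia", "spa"), ("and", "eng"),
    ("abuela", "spa"), ("sings", "eng"), ("to", "eng"), ("the", "eng"),
    ("bebé", "spa"),
]


def top_words(tagged_tokens, language, n=3):
    """Return (word, count, share) for the n most frequent words in one language.

    The share is computed relative to that language's own token total,
    mirroring the per-language percentages reported in the text.
    """
    words = [word.lower() for word, lang in tagged_tokens if lang == language]
    total = len(words)
    return [(word, count, count / total)
            for word, count in Counter(words).most_common(n)]


english_top = top_words(tokens, "eng")
spanish_top = top_words(tokens, "spa")

for label, rows in (("English", english_top), ("Spanish", spanish_top)):
    for word, count, share in rows:
        print(f"{label}: {word} appears {count} times ({share:.1%} of {label} tokens)")
```

Expressing each count as a share of its own language's tokens, rather than of the whole corpus, is what allows a word such as abuela to be compared with its English equivalent despite the much larger English total.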

Measures of Language Mixing: Codeswitching Picture Books Present More within-Utterance Than between-Utterance Language Switches
Our third question concerned how the two languages were presented together. Given high kurtosis (>3) in the distribution of difference scores for the following paired comparisons, we conducted Wilcoxon signed rank tests for these analyses. We first evaluated the amount of switching between languages and then examined the length and composition of mixed-language utterances.
Only two books did not contain within-utterance switches and only nine books did not contain between-utterance switches. Figure 3 presents the distribution of switches as a percentage of the overall number of utterances, for within- and between-utterance switches. Picture books in our sample contained on average 27.82 within-utterance switches; that is, on average, 38.4% of utterances included switching. Strikingly, there were more mixed-language utterances than Spanish-only utterances [Z = 932.5, p < 0.001]. In addition, books had on average 11.33 between-utterance switches; that is, on average, 13.18% of utterances represented a switch from the prior utterance. Thus, there were significantly more within-utterance switches than between-utterance switches [Z = 923, p < 0.001], and overall, switching was common. Further, mixed-language utterances presented more English word tokens (M = 8.01, SD = 5.42) than Spanish word tokens (M = 1.93, SD = 2.16), a difference that was significant [Z = 946, p < 0.001]. Interestingly, the MLU for mixed-language utterances (M = 8.96, SD = 5.42) was higher than the MLU for Spanish-only utterances [Z = 1011, p < 0.001] and higher than the MLU for English-only utterances [Z = 915, p < 0.001], suggesting that mixed-language utterances were typically more complex than single-language utterances.
Together, these results show that codeswitching picture books present high amounts of switching to young children, both within and between utterances, with more within-utterance switches than between-utterance switches. Based on MLU, mixed-language utterances were the most complex type of sentence, but consistent with the overall composition of the books, mixed utterances contained more English words than Spanish words. In fact, most mixed-language utterances contained only a word or two in Spanish and conveyed the majority of their content in English.
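One way to operationalize the two kinds of switching is sketched below. This is an illustration under stated assumptions rather than the coding scheme used in the study: an utterance counts as a within-utterance switch if it contains words in both languages, and a between-utterance switch is counted only when two consecutive single-language utterances differ in language. The function names and the toy book are hypothetical.

```python
def utterance_language(word_langs):
    """Label a non-empty utterance 'eng', 'spa', or 'mixed' from its per-word language tags."""
    kinds = set(word_langs)
    return kinds.pop() if len(kinds) == 1 else "mixed"


def switch_rates(utterances):
    """Count within- and between-utterance switches for one book.

    Assumptions (not from the source): a mixed utterance counts as one
    within-utterance switch, and transitions into or out of a mixed
    utterance are not counted as between-utterance switches.
    """
    labels = [utterance_language(u) for u in utterances]
    within = sum(label == "mixed" for label in labels)
    between = sum(
        prev != cur and "mixed" not in (prev, cur)
        for prev, cur in zip(labels, labels[1:])
    )
    n = len(labels)
    return {
        "within": within,
        "between": between,
        "within_pct": 100 * within / n,
        "between_pct": 100 * between / n,
    }


# Hypothetical five-utterance book, given as per-word language tags.
book = [
    ["eng", "eng", "spa", "eng"],  # mixed
    ["eng", "eng", "eng"],         # English-only
    ["spa"],                       # Spanish-only (switch from prior utterance)
    ["eng", "eng"],                # English-only (switch from prior utterance)
    ["eng", "spa", "spa", "eng"],  # mixed
]

rates = switch_rates(book)
print(rates)
```

Per-book percentages computed this way could then be compared across the sample with a paired non-parametric test, as in the Wilcoxon signed rank tests reported above.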

Discussion
The goal of the current study was to examine the quantity and quality of text in Spanish-English codeswitching picture books readily available to parents in the U.S. We considered the amount and complexity of text presented in each language, as well as how the two languages were used together. Overall, our sample of codeswitching picture books contained many more words and utterances in English than in Spanish. Additionally, English utterances had a higher MLU than Spanish utterances, suggesting higher sentence complexity in English. However, the type/token ratio of the Spanish text was higher than the type/token ratio of the English text, demonstrating that relatively more unique words were presented in Spanish than in English. Finally, our sample of picture books presented high amounts of codeswitching, with over a third of utterances including words in both languages, many more than the number of utterances presented solely in Spanish. Together, these findings indicate that codeswitching picture books offer potentially enriching dual-language input, but they appear to provide unequal experience across the two languages.
Consistent with previous research (Domke 2018;Gomm et al. 2017), we found that Spanish-English codeswitching picture books primarily present English text. The majority of our sample (73% of picture books) contained 20% or fewer Spanish word tokens, and every single book in our sample contained more English words than Spanish words. Although this was consistent with our predictions, it is notable how much English is overrepresented within codeswitching picture books, given that these picture books are marketed as "bilingual." It is an open question whether this type of imbalance is characteristic of codeswitching books in general, or if this is particular to codeswitching books available to families in the U.S., where the perceived status of English and Spanish may differ, particularly in academic or literary contexts (e.g., Cha and Goldenberg 2015;Hoff et al. 2018;Torres 2007). Future work describing codeswitching picture books available in other countries and other language combinations could determine whether codeswitching books tend to present a dominant language or if this pattern is specific to the U.S. context.
In addition to highlighting quantitative differences in English and Spanish text, our results revealed that text written in English and Spanish differed along qualitative dimensions. Consistent with our predictions, the picture books in our sample presented more complex English-only utterances than Spanish-only utterances (as defined by MLU). For example, two consecutive lines from Paletero Man demonstrate this trend: "We called out your name when we saw your coins drop but you must have not heard us because you didn't stop. Muchas gracias amigos." Contrary to our predictions, however, picture books in our sample had higher lexical diversity in Spanish than in English, indicating a greater variety of word types in Spanish relative to the total amount of Spanish words. Together, these findings demonstrate that the two languages are being used differently.
Examining the frequency of words in English and Spanish across the entire corpus provided additional evidence that the two languages were used to present different information. Approximately 10% of the total input in Spanish consisted of words referring to just four family members (abuela, mamá, papá, and tía). For example, an excerpt from Between Us and Abuela: A Family Story from the Border uses Spanish to incorporate a family word, "Abuela drops kisses on our fingers." In contrast, the most common words in English were function words, and the most frequent nouns in English, which comprised less than 1% of the total English words, were not closely related. For example, all function words (underlined) in this mixed-language utterance from Gazpacho for Nacho appear in English: "Among all the colorful, fresh alimentos, he picked out a few crispy green pimientos." Thus, these codeswitching picture books introduce different concepts and vocabulary across languages, and they appear to be providing richer grammatical complexity in English.
We selected our picture books to contain codeswitching, and indeed, picture books in our sample contained high amounts of switching between languages. We found that the rate of switching in our books was notably higher than even the highest estimates of the frequency of switching in child-directed speech, which range from 3% to 20% of utterances (Bail et al. 2015;Bentahila and Davies 1995;Goodz 1989;Kremin et al. 2021;Nicoladis and Secco 2000;Pan 1995;Tare and Gelman 2011). We also found that the picture books in our sample presented more within-utterance switches than between-utterance switches. This pattern contrasts with recent studies of language mixing in child-directed speech showing that caregivers appear to be more likely to switch languages between utterances than within utterances (Bail et al. 2015;Kremin et al. 2021). Finally, in addition to being highly frequent, mixed-language utterances had a higher MLU than any other utterance type, demonstrating that the complexity of mixed-language utterances is high within the picture books in our sample. For example, the mixed-language utterance "Brave bomberos reach their station, time for rest and relaxation, eat their supper, wash los platos, feed their pets-los perros, gatos" from Fuego Fuego Brave Bomberos demonstrates high MLU with several within-utterance switches. Thus, bilingual picture books present complex input with respect to how the two languages are used together, and this input does not appear to mirror typical patterns of child-directed speech. This difference could be supportive of dual-language learning by adding variability and complexity to the bilingual language input children experience.
Our study contributes to the growing body of evidence demonstrating that young children's picture books contain rich linguistic input from which young children can learn (Montag et al. 2015;Logan et al. 2019) and that bilingual picture books in particular may be an important format for engaging and teaching bilingual children language and literacy skills (Brouillard et al. 2020;Hu et al. 2012;Naqvi et al. 2012;Read et al. 2021a;Tsybina and Eriks-Brophy 2010;Zaidi 2020). However, several important questions remain. In particular, it is crucial to consider the underlying causes driving the differences in quantitative and qualitative features in the English vs. Spanish text in picture books. There are several non-exclusive possibilities for why these differences are present.
One potential reason for the qualitative differences found between English and Spanish in picture books in our sample could be linguistic differences between Spanish and English. Spanish uses richer inflectional morphology compared to English (e.g., in Spanish, both grammatical gender and plurality are marked for nouns, articles, and adjectives; verbs can be inflected for tense, person, and mood; e.g., Lang 2013;Moreno-Sandoval and Goñi-Menoyo 2002), and it is possible that this leads to an inflated value for our type/token ratio measure. For example, accounting for the Spanish variations of the English word the in our exploration of the types of words presented in English and Spanish across our entire corpus, we found that articles represented a similar proportion of the input in each language.
However, even after we considered different translation equivalents of highly common words, we still found that function words appeared to be more frequent in English, as a relative proportion of the English tokens as well as in the input as a whole. Given that function words are the most frequent words found in natural language (Zipf 1949), the fact that common words such as a, is, and to were more frequent in English could explain why lexical diversity was higher in Spanish while MLU was higher for English-only utterances.
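The reasoning above, that frequent function words can raise utterance length while lowering lexical diversity, can be illustrated with a small worked example. The sketch assumes MLU is measured in words and that type/token ratio is computed over lowercased word forms; both are simplifying assumptions, and the toy utterances and function names are hypothetical rather than drawn from the corpus.

```python
def type_token_ratio(utterances):
    """Number of unique word forms divided by total word tokens."""
    tokens = [word.lower() for utterance in utterances for word in utterance]
    return len(set(tokens)) / len(tokens)


def mean_length_of_utterance(utterances):
    """Mean utterance length in words (an assumption; MLU is often measured in morphemes)."""
    return sum(len(utterance) for utterance in utterances) / len(utterances)


# Hypothetical English text: full sentences that repeat function words.
english = [
    "the dog is in the house".split(),
    "the cat is on the bed".split(),
]

# Hypothetical Spanish text: short utterances made up of content words.
spanish = [["perro"], ["casa"], ["gato", "cama"]]

ttr_english = type_token_ratio(english)
ttr_spanish = type_token_ratio(spanish)
mlu_english = mean_length_of_utterance(english)
mlu_spanish = mean_length_of_utterance(spanish)

print(f"English: TTR = {ttr_english:.2f}, MLU = {mlu_english:.2f}")
print(f"Spanish: TTR = {ttr_spanish:.2f}, MLU = {mlu_spanish:.2f}")
```

In this toy case, the repeated function words (the, is) lengthen the English utterances but add few new types, so English shows the higher MLU and the lower type/token ratio, which is the same direction of difference described in the text.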
It remains an open question whether child-directed texts composed of only Spanish tend to have higher lexical diversity than child-directed text composed of only English, or if this pattern reflects the use of Spanish in combination with English. Future research will explore differences between bilingual and monolingual picture books written in both English and Spanish, but no appropriate corpora for comparison exist at present. Moreover, even if it is the case that Spanish text tends to have higher lexical diversity, we found very different words occurring with high frequency across the two languages, again suggesting that these books present different types of input in English vs. Spanish.
Another factor that may have led to both the quantitative and qualitative differences between languages in our sample of picture books is that English appears to be treated as the primary language, while Spanish is embedded to introduce novel and unique words. This possibility is supported by the characteristics of Spanish-only and mixed-language utterances. On average, English-only utterances were about 3.5 times longer than Spanish-only utterances, which were typically only one to two words in length. Likewise, mixed-language utterances were largely composed of English words, with, on average, only one or two words in Spanish intermixed (e.g., "Yellow are faroles flickering bright." from Green is a Chile Pepper: A Book of Colors). Thus, Spanish words appear to be sprinkled into English text, with very few longer sequences of Spanish words in succession.
Further supporting the view that Spanish was presented as a supplement to English is the fact that 37 of the 45 books in our sample contained a glossary of translations for the Spanish words, implying that the author or publisher assumed that the reader would have greater familiarity with English. Authors aiming to sell books within the English-dominant U.S. may thus not be targeting bilingual children, but rather attempting to reach English-speaking children for whom Spanish words would represent novel vocabulary in an unfamiliar language (Domke 2018;Torres 2007). If many of these books were intended for the purpose of teaching English-speaking children new Spanish words, that could explain why the lexical diversity was higher in Spanish and why nouns dominated the Spanish input. Future studies could further explore the explicit goals of different types of books, in addition to the text contained within them.
It is also important to consider what messages, implicitly or explicitly, these books may send about the status of the two languages and how the two languages are, and should be, used. Specifically, the predominance of English in the books may reinforce English as a language for reading, learning, and speaking, while signaling that Spanish is less important (Domke 2018;Hammer et al. 2012;Montanari et al. 2020). Beliefs about language status can be instantiated, developed, and challenged across a variety of social contexts, including children's interactions with family, peers, teachers, and texts (e.g., Jaffe 2003;Kyratzis 2010;Martínez-Roldán and Malavé 2004;Showstack 2017). The privileging of English and minoritization of Spanish within the bilingual picture books in our sample could inform children's developing beliefs about the relative importance of Spanish vs. English (Fuller and Leeman 2020). Furthermore, the differences in the types of words being used across the two languages may also imply that there are constraints in the use of Spanish, i.e., Spanish is appropriate to use in contexts related to family, but not beyond this context (e.g., Kyratzis 2010). Future research should investigate how these books affect children's attitudes toward each language, and the consequences of this messaging.
An additional important remaining question concerns how the input provided by codeswitching picture books can actually shape young children's language learning. Given that picture books in our sample present relatively more unique words in Spanish than in English, this book format may promote learning new Spanish vocabulary. Children readily learn words from shared reading (Flack et al. 2018), especially when they engage in dialogue about those words (Blewitt et al. 2009;Lugo-Neris et al. 2010;Penno et al. 2002). For example, in an intervention study with primarily Spanish-speaking children, Lugo-Neris et al. (2010) found that providing elaboration in Spanish when introducing novel English words during shared reading activities ("bridging") promoted the acquisition of the novel words. Thus, the inclusion of Spanish words embedded in English text may potentially facilitate bridging, providing opportunities for parents to elaborate on new Spanish vocabulary. Furthermore, the inclusion of glossaries may support word learning for second language learners (Madiba 2010), particularly if they also encourage parents to define and discuss the Spanish words. However, it is also possible that the sporadic inclusion of Spanish words is unhelpful for learning. For instance, Spanish-only utterances in our sample were often single words (e.g., "And just like that, the mouse found her hat. 'Adiós!' she said as she skipped away." from The Fabulous Lost & Found and the Little Mouse who Spoke Spanish), and young bilinguals have been shown to be relatively less efficient at processing isolated words (Morini and Newman 2019). Thus, it is an open question whether and how the presentation of Spanish vocabulary embedded within English text provides effective opportunities to learn Spanish words.
Given that picture books in our sample presented mixed-language utterances frequently, this book format may also present challenges beyond that of single-language texts. Research has suggested that mixed-language utterances may be relatively difficult for children to process in real time, particularly when codeswitched words are presented in the child's weaker language (Byers-Heinlein et al. 2017;Morini and Newman 2019;Potter et al. 2019). It could be that by introducing added processing demands, frequent switching in bilingual books could reduce the potential for learning, and that children would benefit more from books that present the languages more separately. Conversely, encountering complex, challenging input in shared reading might be especially supportive of children's learning of two languages (Brouillard et al. 2020;Read et al. 2021b), just as the inclusion of unique words and complex structures in written text has been argued to support monolingual children's language development (Fletcher and Reese 2005;Hoff-Ginsberg 1998;Montag et al. 2015). Future studies can explore these competing possibilities.
In order to understand what young learners may gain from the text in codeswitching picture books, it is also important to consider the language skills that they bring to reading interactions. Children's vocabulary knowledge contributes to how well they acquire information from joint reading activities (Robbins and Ehri 1994;Sénéchal et al. 1995). Bilingual children have unequal experience and proficiency in each of their languages (Marchman et al. 2017;Pearson et al. 1997), and they process words more efficiently in the language that they hear more often (Marchman et al. 2010;Potter et al. 2019), suggesting that they may not learn words equally easily across their two languages. As we have already suggested, the codeswitching picture books in our sample may be targeting English speakers learning Spanish vocabulary. Would Spanish speakers learning English benefit equally from English-dominant codeswitching books? A recent study found that Spanish-speaking children learn English words from Spanish-dominant codeswitching books (Read et al. 2021b). For Spanish-speaking children, English-dominant codeswitching books may also provide opportunities to learn English words and grammatical structure, with the embedded Spanish vocabulary providing a bridge or a scaffold into the English text (Domke 2018;Leacox and Jackson 2014;Lugo-Neris et al. 2010). Given that codeswitching within text can serve several functions (Chaudhri and Torres 2021;Torres 2007), assessing what is being conveyed when a codeswitch occurs could clarify whether one language is being used to scaffold the content presented in the other language.
In addition to recognizing that children may vary in what they learn from the text most readily, we must also recognize how caregivers contribute to reading interactions with their child. A large literature has documented that children's learning from shared reading can depend on features of the interactions, including how caregivers ask questions, provide comments and feedback, and elaborate on the story beyond what is in the text (Gómez et al. 2021;Jiménez et al. 2006;Whitehurst et al. 1988;Zucker et al. 2013). For children learning a second language in particular, the kinds of talk and behaviors they experience within language interactions are especially important for learning (Hoff and Core 2013;Hurtado et al. 2008), and caregivers vary with respect to the behaviors they incorporate into their interactions (Custode and Tamis-LeMonda 2020;Escobar and Tamis-LeMonda 2017;Luo et al. 2020;Winstone et al. 2021). Currently, it is not clear how codeswitching picture books may support or affect the extratextual talk and behaviors that caregivers provide during reading activities, but future studies could explore whether some books promote or impede high-quality reading experiences.
Moreover, reading interactions may have many potential targets for learning, including language-related goals, factual content, and lessons (Bergman Deitcher et al. 2019;Breitfeld et al. 2021). In our sample, the topic of family appeared to be a significant focus of Spanish content across books, and it could be that parents choose to read bilingual books in order to share cultural values or discuss particular topics. It has been widely suggested that when words and concepts are presented in predictable contexts, this supports their learning (Benitez and Saffran 2018;Benitez and Smith 2012;Chang and Deák 2019;Eiteljoerge et al. 2019;Roy et al. 2015;Schwab et al. 2018;Vlach and Sandhofer 2011;Wojcik and Saffran 2015), and picture books might be able to offer useful predictability. It could be that for Spanish-English bilingual families, some topics are more commonly discussed in one language than the other (e.g., family-related words in Spanish, school-related words in English), and perhaps the use of these concepts in books is reflecting and/or supporting wider usage patterns. Bilingual reading environments are complex, and our study represents an important first step in understanding how picture books might be used to support bilingual language skills. The next steps will include examining how caregivers interact with codeswitching books while reading with their children, how individual differences in children's and/or caregivers' language contribute to this interaction, and what language, cultural, and literacy knowledge children ultimately gain from these interactions.

Conclusions
Children's picture books have the potential to provide rich opportunities for learning, and codeswitching picture books in particular provide children with different learning opportunities in English vs. Spanish. Specifically, although codeswitching picture books in English and Spanish contain more text in English and more complex text in English, they present relatively more unique words in Spanish, offering chances to learn new vocabulary in Spanish. Furthermore, the two languages are used in complex ways, and the text included higher amounts of within-utterance language switching than children typically hear in spoken language. Together, these findings indicate that Spanish-English codeswitching picture books may provide a unique source of language input for children learning a second language, but also highlight how much more research is needed to better understand and promote reading materials and practices that support bilingual language development.

Data Availability Statement: All data, CLAN codes used to derive each measure, and R script for the main analyses reported in this manuscript are openly available on OSF: https://osf.io/sxm5t.