Regular and Irregular Inflection in Different Groups of Bilingual Children and the Role of Verbal Short-Term and Verbal Working Memory

Bilingual children often experience difficulties with inflectional morphology. The aim of this longitudinal study was to investigate how regularity of inflection in combination with verbal short-term and working memory (VSTM, VWM) influences bilingual children’s performance. Data from 231 typically developing five- to eight-year-old children were analyzed: Dutch monolingual children (N = 45), Frisian-Dutch bilingual children (N = 106), Turkish-Dutch bilingual children (N = 31), Tarifit-Dutch bilingual children (N = 38) and Arabic-Dutch bilingual children (N = 11). Inflection was measured with an expressive morphology task. VSTM and VWM were measured with a Forward and Backward Digit Span task, respectively. The results showed that, overall, children performed more accurately on regular than on irregular forms, with the smallest gap between regulars and irregulars for monolinguals. Furthermore, this gap was smaller for older children and children who scored better on a non-verbal intelligence measure. In bilingual children, higher accuracy at using (irregular) inflection was predicted by a smaller cross-linguistic distance, a larger amount of Dutch at home, and a higher level of parental education. Finally, children with better VSTM, but not VWM, were more accurate at using regular and irregular inflection.


Introduction
Inflectional morphology is a locus of difficulty for bilingual children, but less is known about the role of regularity of inflection, and about the impact of verbal short-term and working memory on children's accuracy at using regular versus irregular inflection. Investigating these issues, the current study aimed to contribute to our understanding of how language-internal factors, child-internal factors, and their interactions determine language outcomes in bilingual children. Importantly, the bilingual child does not exist. Hoff (2013, p. 261) stated that "All children who are bilingual have in common that they have been exposed to two languages. Beyond that, the ways in which children's environments provide exposure to two (or more) languages vary enormously." This statement highlights the variation that exists within the bilingual population, which pertains to exposure and use (Francot et al. 2020). Others argue that a binary categorical distinction between monolinguals and bilinguals falls short in describing the actual variation that exists in both groups (Surrain and Luk 2017;Bialystok et al. 2012;Dixon et al. 2012;Kroll and Bialystok 2013). To explore variation among bilingual children and determine whether the observed patterns generalize across bilinguals, we distinguished in this study between bilinguals who are all learners of Dutch but who have either Arabic, Frisian, Tarifit or Turkish as their other language. We examined bilingual speakers of these four languages in particular, because Frisian is the only officially recognized regional minority language in the Netherlands (Mercator European Research Centre on Multilingualism and Language Learning 2007) and Arabic, Tarifit and Turkish are the languages of two of the largest non-Western immigrant groups in the Netherlands (Den Ridder et al. 2020).

Inflection and Morphological Regularity
Various studies have shown that bilingual children make more inflection errors than their monolingual peers (Nicoladis et al. 2007; Rispens and De Bree 2015; Schwartz et al. 2009). That bilingual children perform less well on inflectional morphology is not surprising, given that their total language input is distributed across languages (Oller and Eilers 2002), which results in less input in each language compared to monolingual children learning that language. In addition, for many bilinguals, exposure to one of their languages does not start at birth, resulting in a shorter length of exposure compared to same-age monolingual children.
Despite the unequal input to bilingual and monolingual children, it is not the case that bilingual children perform lower across-the-board. For example, investigating Hebrew-Dutch children, Rispens and De Bree (2015) found that bilingual and monolingual children performed similarly on Dutch regular past tense, but worse on Dutch irregular past tense. A similar observation was reported by Schwartz et al. (2009) who conclude that their monolingual and bilingual participants did not differ on regular plurals in Hebrew, while the bilinguals were less accurate than monolinguals on irregular forms. Relatively high accuracies with regular forms are not only found in between-child comparisons, but also emerge in within-child comparisons in studies that describe cross-linguistic patterns in bilingual French-English children (Nicoladis et al. 2007;Nicoladis and Paradis 2012;Paradis et al. 2011). These studies demonstrate higher accuracy for regular than irregular tense inflection with a larger difference between regulars and irregulars in English than French. In sum, regularity of inflection appears to be one of the key factors that modulates whether or not monolinguals outperform bilinguals on inflectional morphology.
Adopting a Usage-Based (UB) approach to language development, which presupposes that aspects of children's language input determine their language development, scholars have attributed the developmental differences between regular and irregular forms to frequency distributions in the input (e.g., Paradis et al. 2011; Rispens and De Bree 2015; Schwartz et al. 2009). In line with this approach, empirical studies have demonstrated that input distributions indeed predict bilingual children's accuracy at using regular and irregular inflection (Blom et al. 2012; Blom and Paradis 2013). The different acquisition rates of regular and irregular inflection originate, most likely, from relative type and token frequencies (Bybee 2007, 2008). For example, in English, regular past tense -ed has a high type frequency, which means that there are many different verbs that end in the past tense with -ed. The phonological variability of these verbs also tends to be large, which, together with a high type frequency, contributes to children's early productive use of -ed. Irregular past tense forms in English generally have a low type frequency, and children's accuracy with irregular past tense is largely dependent on token frequency, which varies from verb to verb and results in different acquisition rates for individual verbs. Irregular verbs with a low token frequency will be acquired late, in particular by bilingual children, as their input is distributed (Schwartz et al. 2009). The cross-linguistic differences between English and French may stem from the higher type and lower token frequencies of irregular forms in French compared to English (Nicoladis et al. 2007; Paradis et al. 2011).
Previous research on the acquisition of inflection in bilingual children has largely focused on regular and irregular verb inflection, specifically tense inflection in English.
As part of the current study, we investigated noun plural and past participle formation in Dutch. To our knowledge, three studies have investigated this aspect of Dutch in monolingual and bilingual children (Boerma et al. 2017;Lalleman 1986;Verhoeven and Vermeer 1985). Lalleman (1986) and Verhoeven and Vermeer (1985) both found that the monolinguals outperformed the bilinguals. Boerma et al. (2017) analyzed plurals and participles separately and used a two-wave longitudinal design. For plurals, they observed that the monolinguals were more accurate than the bilinguals regardless of wave and regularity. The monolingual children improved more than the bilingual children. This was related to the irregular noun forms that remained difficult for the bilinguals, in line with the suggestion by Schwartz et al. (2009). In the case of participles, a different pattern emerged. The monolinguals were more accurate than the bilinguals on regular and strong forms, but only a marginal difference was found for irregular participles, because these forms were difficult for bilingual and monolingual children alike. The current longitudinal study builds on these previous findings by investigating different bilingual groups, and asking how performance on regular and irregular noun plurals and past participles relates to child-internal resources such as verbal short-term memory and verbal working memory. We turn to these issues below.

Different Bilingual Groups and Cross-Language Distance
The bilingual children in the current study share the same majority language (Dutch), but have different minority languages (Arabic, Frisian, Tarifit, Turkish). In the context of this study, the notion 'minority language' is used as an umbrella for the languages spoken by a minority of the inhabitants in the country, and it contrasts with the notion 'majority language', which refers to the dominant language in the society that is spoken by the majority of the inhabitants (Hogan-Brun and Wolff 2003). In this sense, minority language encompasses heritage language, home language, native language, first language, or regional language, among other more specific notions.
Comparing performances across groups that have a different minority language can provide insight into the role of cross-language distance and transfer (Blom et al. 2020;Floccia et al. 2018). Several studies have investigated accuracy at using English tense marking inflection in English language learners in Canada dividing the sample of participating children into children learning a tense-marking inflecting minority language and children learning a non-tense marking isolating minority language (Paradis 2011;Blom et al. 2012). These studies consistently show that being familiar with tense inflection via the minority language is associated with fewer errors in using tense inflection in the majority language, pointing to effects of transfer. Separate analyses for English regular and irregular past tense inflection indicate that both are impacted by transfer effects (Blom and Paradis 2013). Other research shows that these transfer effects are persistent and that despite early and lengthy exposure to English, English tense inflection is a locus of errors if the minority (and native language) does not possess tense inflection (Paradis et al. 2016;McDonald 2000).
For the current study, we investigated children whose minority language is a migrant language (Arabic, Tarifit, Turkish) or a regional language (Frisian). All four languages are minority languages in the context of the Netherlands, where the study is situated, and they all have inflection. However, they overlap with Dutch, the majority language, in different degrees. Frisian is typologically close to Dutch, which is reflected in lexical and grammatical overlap, although there are also differences between Frisian and Dutch. In a previous study, we examined transfer effects in performance on Dutch noun plurals and past participles in the same Frisian-Dutch sample included in the current study by comparing accuracies on target forms that do and do not overlap between the two languages (Blom and Bosma 2016). Overlap was based on lexical and grammatical overlap, that is, if Dutch and Frisian have the same lexeme and similar suffixes and changes in the stem for plural or participle formation. The results revealed that the children performed better on overlapping items, confirming that cross-language proximity impacts on bilingual children's performance and leads not only to between-child variation but also to within-child variation. Arabic, Tarifit, and Turkish are all typologically distant from Dutch. Although all three languages have rich inflectional systems, we expected that the groups of children with Arabic, Tarifit or Turkish as the minority language would perform lower than the Frisian group on regular and irregular inflection in Dutch, because of much less phonological and morphological overlap with Dutch. Arabic, Tarifit, and Turkish are quite similar to each other in terms of distance to Dutch (Blom et al. 2020). 
Yet, for the purpose of the current study, we did not collapse the data from the bilingual Arabic-Dutch, Tarifit-Dutch and Turkish-Dutch children, because previous research suggests notable differences in bilingual exposure, use and proficiency between these groups, and patterns found for one group do not necessarily generalize to other groups (Scheele et al. 2010; Blom 2019).

Verbal Short-Term and Working Memory
According to the UB approach, children's accuracy at using inflection is in part determined by child-external input-related factors. Individual variation in children's ability to accurately use inflection is moreover expected to be impacted by child-internal resources, such as verbal short-term memory (VSTM) and verbal working memory (VWM). Several studies found that children with better VSTM and VWM also had higher language outcomes (Gathercole 1996, 2000; Baddeley et al. 1998; French and O'Brien 2008; Masoura and Gathercole 2005; Paradis 2011; Service 1992; Verhagen and Leseman 2016), suggesting that language knowledge in long-term memory is associated with children's ability to temporarily store and manipulate information. It is still largely an open question whether VSTM and VWM have differential effects, and whether regular and irregular aspects of language show different relations with VSTM and VWM.
According to Baddeley's (2003) well-known working memory model, VSTM consists of two subcomponents: one to hold memory traces in mind and another to prevent these traces from decay through subvocal rehearsal. As such, VSTM is thought to facilitate language acquisition in native and child second language (L2) learners. Recent studies with bilingual children support this hypothesis. Paradis (2011) found that VSTM predicts receptive vocabulary and the production of tense inflection in English language learners who are between ages 4 and 7 years. Another study found that VSTM predicts receptive vocabulary and grammar in 5- and 6-year-old monolingual Dutch and bilingual Turkish-Dutch children (Verhagen and Leseman 2016). In this latter study, the same relations emerged for monolinguals and bilinguals, and for grammar skills at the sentence level and at the word level. At the word level, the same (standardized) instrument was used as in the current study, targeting children's ability to produce correct noun plurals and past participles. Verhagen and Leseman (2016) did not, however, distinguish between the regular and irregular items that were both included in this task. It is therefore unknown whether VSTM is related differently to regular and irregular inflection.
VWM refers to the ability to manipulate temporarily stored information, and involves storage and concurrent processing of information. VWM is, for example, measured with a listening span task, in which children need to comprehend a sentence and provide a truth-value judgment about it while remembering a digit presented auditorily after the last word of each sentence, or with a digit span task, in which children are asked to repeat sequences of auditorily presented digits in reversed order. VWM is thus involved in more complex mental processes, which is confirmed in research with children showing that VWM performance is constrained both by VSTM capacity and executive processes such as domain-general attention (Magimairaj and Montgomery 2012). Two studies observed significant relations between VWM and second language (L2) grammar outcomes in 15- to 16-year-old youth (Kormos and Sáfár 2008) and 7- to 8-year-old children (Engel de Abreu and Gathercole 2012). These studies investigated L2 learning in instructional settings, and did not focus on inflectional morphology. Verhagen and Leseman's (2016) study demonstrates that VWM also predicts grammar, including accurate use of noun plurals and past participles, in a naturalistic learning setting. It is, however, unknown whether this effect was the same for regular and irregular items.
There are reasons to hypothesize that VWM has a differential effect on regular and irregular inflection. Investigating native Arabic-speaking children in Israel who were, on average, 11 years old, Cohen-Mimran et al. (2013) observed that VWM was only related to regular inflection. In their study, children participated in a listening span task in which they were asked to recall target words at the end of sentences. After hearing a series of sentences, they had to recall the target words, which could be regular or irregular. Recall of words with regular morphology was found to be significantly poorer than recall of words with irregular morphology. The authors argue that the decomposition of regular forms imposes extra load and draws on executive processes, explaining the differential effects for regulars and irregulars. A similar explanation is given by Bosma et al. (2017), who found that VWM, but not VSTM, predicted Frisian-Dutch bilingual children's ability to detect cross-linguistic phonological regularities. A recent study with Turkish adults found no effects of VWM on the processing of English L2 regular past tense morphology (Rizaoglu and Gürel 2020). According to the authors, the absence of a significant outcome may, however, be related to high VWM scores and a lack of variation.

The Present Study
The aim of this longitudinal study is to investigate bilingual children's performance on noun plurals and past participles in Dutch focusing on effects of regularity, cross-language distance, and the role of VSTM and VWM. Previous research investigated effects of regularity, but it is unknown whether findings are the same across bilingual groups that vary in cross-language distance, and if effects of VSTM and VWM interact with regularity. Furthermore, the longitudinal design made it possible to investigate developmental patterns.
First, we hypothesized that owing to the distributed input of bilingual children, bilingualism would affect children's accuracy at using inflection and that this effect would be moderated by regularity and cross-language distance. The following predictions were formulated. We expected that the bilingual children in our study would be outperformed by the monolingual children (Nicoladis et al. 2007;Lalleman 1986;Verhoeven and Vermeer 1985;Boerma et al. 2017), and that this effect would be more pronounced for irregulars than regulars (Rispens and De Bree 2015;Schwartz et al. 2009;Boerma et al. 2017). It was predicted that the gap between bilinguals and monolinguals would be larger for the Arabic-Dutch, Tarifit-Dutch or Turkish-Dutch children than for Frisian-Dutch children. The reason is that positive transfer is facilitated if the cross-language distance is smaller (Blom and Bosma 2016;Floccia et al. 2018). Second, we hypothesized that regular and irregular inflection would differ in accuracy because of differences between regular and irregular inflection in input distributions, specifically, differences in type-token frequency balance. We predicted that children would perform more accurately using regular inflection compared to irregular inflection, in line with previous findings for English and French (Nicoladis et al. 2007;Nicoladis and Paradis 2012;Paradis et al. 2011). Rispens and De Bree (2015) suggest, moreover, that age moderates the differences in accuracy between regulars and irregulars as an effect of accumulating input. This may indeed be a viable hypothesis, as the older children are, the more input they have accumulated and the more likely it is that the specific effects of input distributions are not noticeable anymore as soon as input thresholds for mastery have been reached. 
Third, we hypothesized that children's accuracy at using inflection would be dependent on cognitive resources, specifically VSTM and VWM capacity, with differential effects for regular and irregular inflection. We predicted that VSTM and VWM would be positively related to accuracy at using inflection (Verhagen and Leseman 2016), regardless of bilingual group, and that better VWM scores would predict higher accuracy at using regular but not irregular inflection (Bosma et al. 2017; Cohen-Mimran et al. 2013).

Participants
For this longitudinal study, data from 231 children were analyzed. All children were typically developing with no indication of language and speech disorders, as judged by the Questionnaire for Parents of Bilingual Children (PaBiQ; Tuller 2015). They were assigned to five different groups: monolingual children (N = 45), Frisian-Dutch bilingual children (N = 106), Turkish-Dutch bilingual children (N = 31), Tarifit-Dutch bilingual children (N = 38) and Arabic-Dutch bilingual children (N = 11). The children were followed longitudinally, with three measurements over one-year intervals. All children were 5 or 6 years old at time 1, 6 or 7 years old at time 2, and 7 or 8 years old at time 3. Bilingual children were included in the dataset if their exposure to the non-Dutch language at home was more than 20% of the time. The groups are of unequal sizes because we combined two separate datasets for the current study, one focusing on bilingualism in migration contexts (Turkish-Dutch, Tarifit-Dutch) and another focusing on bilingualism in the context of regional minority languages (Frisian-Dutch). Only during testing did it turn out that some children of Moroccan descent spoke Arabic instead of Tarifit. We included these children, treating them as a separate bilingual group. All children were born in the Netherlands, and exposed to Dutch and the minority language from birth in different degrees with considerable variation in bilingual exposure in all groups. Participant characteristics are given in Table 1. As age, parental education and non-verbal IQ have previously been found to influence the acquisition of inflectional morphology (Paradis 2011), these were included as control variables in the present study.
Table 1. Means (standard deviations) for age in months at time 1, 2 and 3, non-verbal intelligence (WNV (Wechsler Non-verbal Scale of Ability), standardized norm score), parental education (EdPar, measured on a 9-point scale) and current use of Dutch at home (Use Dutch, in percentages), and gender distribution in the three groups.
Non-verbal IQ was measured with the short version of the Wechsler Non-verbal Scale of Ability (WNV; Wechsler and Naglieri 2006). This comprised the subtest Matrices, in which children had to complete figural matrices by selecting the missing piece from four or five response options, and the subtest Recognition, in which children looked at geometric designs for three seconds and then had to select the viewed stimulus from four or five response options. Parental education was calculated as the average educational level of both parents, based on the PaBiQ (Tuller 2015). Educational level was defined as the highest degree obtained on a 9-point scale, ranging from no education (1) to a university degree (9). The PaBiQ also provided information about the current use of the Dutch language at home, which was measured relative to the total amount of language input that the child received from his/her mother, father, siblings and other adults who looked after the child at least once per week. For each of these people, the question had to be answered how often (s)he spoke the Dutch language to the child: 'never' (0%), 'seldom' (25%), 'sometimes' (50%), 'usually' (75%) and 'always' (100%).
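To make the two questionnaire-based measures concrete, the following Python sketch computes them for a single, invented child. It assumes that Use Dutch is the unweighted mean of the percentages assigned to each interlocutor; the text does not state whether interlocutors were weighted, so this aggregation, like the function names and the example answers, is an illustration and not the PaBiQ scoring procedure itself.

```python
from statistics import mean

# Frequency labels and their percentages, as listed in the text.
DUTCH_FREQUENCY = {
    "never": 0,
    "seldom": 25,
    "sometimes": 50,
    "usually": 75,
    "always": 100,
}


def use_dutch(answers):
    """Percentage of Dutch used with the child at home.

    `answers` maps each interlocutor to a frequency label.
    ASSUMPTION: interlocutors are weighted equally (unweighted mean).
    """
    return mean(DUTCH_FREQUENCY[label] for label in answers.values())


def parental_education(level_parent_1, level_parent_2):
    """Average educational level of both parents on the 9-point scale
    (1 = no education, 9 = university degree)."""
    for level in (level_parent_1, level_parent_2):
        if not 1 <= level <= 9:
            raise ValueError("educational level must lie between 1 and 9")
    return (level_parent_1 + level_parent_2) / 2


# Invented example answers for one child.
example_child = {
    "mother": "usually",
    "father": "sometimes",
    "siblings": "always",
    "other_adults": "seldom",
}
print(use_dutch(example_child))      # 62.5
print(parental_education(9, 5))      # 7.0
```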

Inflectional Morphology
Inflectional morphology was measured with the Word Formation subtest which is part of the Taaltest Alle Kinderen (TAK, "Language Assessment for All Children", Verhoeven and Vermeer 2001). The Word Formation test includes 12 noun items testing pluralization, and 12 verb items testing participle formation. Plurals or participles that did not undergo a change to the stem vowel were classified as regular (8 nouns, 4 verbs) whereas plurals and participles that showed a stem vowel change were classified as irregular (4 nouns, 8 verbs). The items are listed in Appendix A.
Items were scored as incorrect if there was a morphological error related to the specific properties of the inflected form. For noun pluralization, this included the omission of the plural suffix, the use of an incorrect plural suffix, and no lengthening of the stem vowel in cases where this was required, as in weg-wegen ([ʋɛx]-[ˈʋeːɣə]) 'road-roads'. Phonological errors in the stem but not in the target morpheme, e.g., kranden instead of kranten 'newspapers', were not scored as incorrect. Final -n deletion was also not considered a mistake, since this is not uncommon in colloquial Dutch (Booij 1995, p. 141). For participle formation, items were scored as incorrect when the prefix or suffix was omitted or used incorrectly, or when there was an error with the stem, i.e., no or incorrect changes to the stem.

Verbal Short-Term and Working Memory
Verbal short-term memory and verbal working memory were measured with a Forward and a Backward Digit Span task, respectively. The tasks were based on the Alloway Working Memory Assessment (AWMA; Alloway 2012) and were translated to Dutch. The children heard sequences of digits, which they had to repeat in the same (Forward Digit Span) or in reverse order (Backward Digit Span). The Forward Digit Span is considered a measure of verbal short-term memory, as the digits only need to be stored, whereas the Backward Digit Span is considered a measure of verbal working memory, as the digits need to be stored and processed to be able to repeat them in reverse order (Alloway et al. 2008).
The AWMA procedure was applied for scoring (Alloway 2012). Both the Forward and the Backward Digit Span task consisted of seven blocks of six trials, and for each correct trial, the child received one point. This means that there was a maximum of six points per block and a maximum of 42 points per task. Trials were scored as incorrect if the sequence was (partly) incorrect, if children omitted one or more digits, or if they recalled one or more digits incorrectly. Each task started with a block with sequences of one digit, after which the length of the sequences increased by one digit for each subsequent block. The child automatically continued with the next block when (s)he repeated the first four trials, or four out of the first five trials, within one block correctly, in which case (s)he received a score of 6 or 5, respectively. In other words, the child automatically received one point for each skipped trial. When the child answered three trials within one block incorrectly, the task stopped immediately.
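The scoring rules can be summarized in a short Python sketch. It illustrates the rules as described above and is not the AWMA software; the function name and the input format (one list of True/False trial outcomes per block) are hypothetical. Two points that the description leaves open are handled by assumption: a child who reaches four correct trials only on the sixth trial advances with four points, and correct trials in the block in which the task stops still count towards the total.

```python
N_BLOCKS = 7           # sequence lengths 1 to 7
TRIALS_PER_BLOCK = 6   # maximum of 6 points per block, 42 per task


def score_digit_span(blocks):
    """Score a Forward or Backward Digit Span administration.

    `blocks` is a list with one list of booleans per administered block
    (True = sequence repeated correctly), in order of administration.
    """
    total = 0
    for trials in blocks[:N_BLOCKS]:
        correct = incorrect = administered = 0
        for outcome in trials[:TRIALS_PER_BLOCK]:
            administered += 1
            if outcome:
                correct += 1
            else:
                incorrect += 1
            if incorrect == 3:
                # Three errors within one block: the task stops immediately.
                # ASSUMPTION: correct trials in this block still count.
                return total + correct
            if correct == 4:
                # Four correct trials: move on to the next block.
                break
        if correct < 4:
            # Block data ended before a stop or advance criterion was met.
            return total + correct
        # One point per correct trial plus one point per skipped trial.
        total += correct + (TRIALS_PER_BLOCK - administered)
    return total


# Invented example: 6 points, then 5 points, then a stop with 1 correct trial.
example = [
    [True, True, True, True],
    [True, False, True, True, True],
    [False, True, False, False],
]
print(score_digit_span(example))  # 12
```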

Procedure
This research was screened by the Standing Ethical Assessment Committee of the Faculty of Social and Behavioral Sciences at Utrecht University. Criteria were met and further verification was not deemed necessary. Participants were recruited through primary schools in the Netherlands, which distributed consent forms and information folders about the experiment among the parents of the children. In accordance with the Declaration of Helsinki, the parents of the participating children all signed a consent form. All children were tested individually in a quiet room at school or at home. They were tested by experimenters who had a native command of Dutch and, in the case of the bilinguals, also of Frisian, Turkish, Tarifit or Arabic. The tasks in the present study were part of a larger test battery that included language, working memory and attention tasks. For all of the measures reported in this study, the language of instruction was Dutch.

Data Analysis
Our aim was to investigate monolingual and bilingual children's accuracy at using regular versus irregular inflection, and specifically to determine whether (1) different groups of bilingual children score lower on irregular inflection, but not on regular inflection, compared to monolingual children, (2) differential outcomes for regular and irregular inflection are modulated by age, (3) VSTM and VWM predict outcomes on both regular and irregular inflection. Children's responses on the 12 noun and 12 participle items were the binary outcome (correct, incorrect). The main factors of interest were: Group (monolinguals, Arabic-Dutch, Frisian-Dutch, Tarifit-Dutch, and Turkish-Dutch bilinguals), Regular (irregular, regular), Age, VSTM and VWM.

Preliminary Analyses
Preliminary analyses were carried out to examine whether the groups differed on any of the child-level predictor variables that were included in the main analyses. We first examined differences in Age, VSTM and VWM between the five groups. For each of these variables, we fitted a linear mixed-effects model with the respective variable as the dependent variable, Group as fixed factor, and Subject and Time (1, 2, 3) as random intercepts. For Age, including Subject prevented the model from converging; it was therefore left out as a random intercept. We used the R function lmer from the lme4 package. Pairwise comparisons were carried out with the R functions pairs and emmeans from the emmeans package using the Šidák correction. Next, the variables Parental education, Non-verbal intelligence and Use Dutch (the percentage of current use of Dutch) in the homes of the bilingual children were compared across the five groups. Since these variables were only measured at time 1, there is just one measurement per subject. Therefore, no random intercepts for Subject and Time were needed. Since the assumptions of a classical ANOVA were not met, we used the R function pairwise.wilcox.test with Holm correction. Finally, we compared the proportions of females and males between the groups with Fisher's Exact test, using the R function fisher.test.
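For readers unfamiliar with the two corrections, the sketch below shows the arithmetic behind them in Python. It is not the code used in the study, which relied on emmeans and pairwise.wilcox.test in R; the function names and the p-values in the example are invented. With five groups there are 10 pairwise comparisons, which is the family size assumed here.

```python
from math import comb


def sidak_adjust(p_values):
    """Sidak-adjusted p-values: 1 - (1 - p)^m for a family of m tests."""
    m = len(p_values)
    return [1 - (1 - p) ** m for p in p_values]


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * p_values[index])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted


n_groups = 5
n_pairs = comb(n_groups, 2)  # 10 pairwise comparisons among five groups

# Invented raw p-values, one per pairwise comparison.
raw = [0.0004, 0.003, 0.012, 0.020, 0.041, 0.090, 0.210, 0.350, 0.620, 0.880]
assert len(raw) == n_pairs

for p, s, h in zip(raw, sidak_adjust(raw), holm_adjust(raw)):
    print(f"raw={p:.4f}  sidak={s:.4f}  holm={h:.4f}")
```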
The Tarifit-Dutch bilinguals were significantly younger than the monolinguals (t = −5.54, p < 0.001), Arabic-Dutch (t = −4.83, p < 0.001), Frisian-Dutch (t = −6.23, p < 0.001) and Turkish-Dutch bilinguals (t = −3.66, p < 0.001). The Tarifit-Dutch bilinguals scored significantly lower on VSTM than the monolinguals (t = −4.24, p < 0.001) and the Frisian-Dutch bilinguals (t = −3.44, p < 0.01). Their VWM scores were lower than those of the Frisian-Dutch bilinguals (t = −3.46, p < 0.01). The parents of the monolinguals were significantly more highly educated than the parents in the Tarifit-Dutch (p < 0.05) and Turkish-Dutch bilingual groups (p < 0.05), and the parents of the Frisian-Dutch bilinguals were significantly more highly educated than the parents in the Arabic-Dutch (p < 0.05), Tarifit-Dutch (p < 0.001) and Turkish-Dutch groups (p < 0.01). The Tarifit-Dutch bilinguals had significantly lower non-verbal intelligence scores than the monolinguals (p < 0.01), Frisian-Dutch (p < 0.001) and Turkish-Dutch bilinguals (p < 0.05). In the homes of the Arabic-Dutch, Tarifit-Dutch and Turkish-Dutch bilinguals, significantly more Dutch was used with the children compared to the Frisian-Dutch bilinguals (Arabic: p < 0.01, Tarifit: p < 0.001, Turkish: p < 0.01), and in the Tarifit-Dutch homes, significantly more Dutch was used with the children than in the Turkish-Dutch homes (p < 0.001). There were no significant differences in the proportions of boys and girls for any pair of language groups.

Main Analyses
To test whether our predictions were borne out by the data and pinpoint significant explanatory factors, we analyzed the data using mixed-effects models: one model on the basis of all children, and a second model on the basis of only the bilingual subjects. We ran this second model to look more closely into differences between the four bilingual groups by including the role of the variable Use Dutch. This variable could not be included in the first model with the monolinguals because the monolingual children did not vary on the current use of Dutch at home.
In both models, in addition to the variables of interest (Group, Regular, Age, VSTM, VWM), the variables Sex (boy, girl), Parental education, Non-verbal intelligence, Word type (noun, verb), and Lemma frequency (based on the Corpus Spoken Dutch, Oostdijk 2000) were included because these may predict additional variance, allowing us to better understand the effects of the factors of interest. Correlations between the fixed-effects predictors are in Appendix B.
For the first model, we started by entering the following variables in the model: Age, Sex, Group (monolingual, Frisian-Dutch, Arabic-Dutch, Tarifit-Dutch and Turkish-Dutch bilingual), Parental education, Non-verbal intelligence, VSTM, VWM, Word type, Regular, and Lemma frequency. Additionally, as interactions, we added all possible combinations of two variables. Since the variables Sex, Word type and Lemma frequency were not found to yield significant main effects, we did not include interactions with these variables. In order to run the model, we used the function glmer of the R Package lme4 (Bates et al. 2015). Subject and Word were entered as random intercepts.
Given the binary response variable, we ran a logistic model with the link function 'logit' and the optimizer 'nlminb' from the R package optimx (Nash and Varadhan 2011). In our experience, the optimizer 'nlminb' allows models to converge where other optimizers do not. Nevertheless, there were four interactions that prevented the model from converging: Age:Non-verbal intelligence, Age:Regular, Parental education:Regular, VSTM:VWM. We removed those interactions. Given the large number of main effects and interactions, it is tempting to use automatic variable selection procedures such as stepwise forward or backward selection. However, these methods have been strongly criticized (Flom and Cassell 2007). Among other problems, models resulting from a stepwise selection produce p-values that are biased towards 0. Therefore, we decided to continue with the full model, as was suggested as one of the alternatives by Flom and Cassell (2007). Using the function Anova from the car package, Wald χ² tests were calculated and p-values obtained for each of the fixed effects (Fox and Weisberg 2019). We chose the type III test because of the presence of (significant) interactions (see: https://www.r-bloggers.com/2011/03/anova-%E2%80%93-type-iiiiii-ss-explained/, accessed on 19 March 2021).
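The bookkeeping behind the interaction terms of the first model can be sketched as follows. This is an illustrative Python sketch, not the authors' analysis script (the analysis itself was run in R with glmer), and the variable labels are shorthand assumed here. It enumerates all two-way combinations of the seven predictors that remained eligible for interactions once Sex, Word type and Lemma frequency were excluded, and then drops the four interactions reported as preventing convergence.

```python
from itertools import combinations

# Predictors eligible for interactions in the first model. The labels are
# shorthand chosen for this sketch, not names taken from the analysis scripts.
eligible = ["Age", "Group", "ParentalEducation", "NonverbalIntelligence",
            "VSTM", "VWM", "Regular"]

# All possible combinations of two variables.
all_pairs = [frozenset(pair) for pair in combinations(eligible, 2)]

# Interactions reported as preventing the first model from converging.
removed = [
    frozenset({"Age", "NonverbalIntelligence"}),
    frozenset({"Age", "Regular"}),
    frozenset({"ParentalEducation", "Regular"}),
    frozenset({"VSTM", "VWM"}),
]

retained = [pair for pair in all_pairs if pair not in removed]


def label(pair):
    """Format an interaction as 'A:B' with the names in a stable order."""
    return ":".join(sorted(pair))


print(len(all_pairs), len(removed), len(retained))
for pair in retained:
    print(label(pair))
```

Under this reading of the text, 21 two-way interactions are possible and 17 remain in the first model after the four non-converging terms are removed.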
After running the full model, the effects of significant fixed effects were examined. Pairwise comparisons were carried out with the R functions pairs and emmeans from the emmeans package, using the conservative Šidák correction. This correction is slightly less conservative than the Bonferroni correction and therefore slightly more powerful (Bretz et al. 2016). Graphs were plotted with functions from the ggeffects package, which uses the predictions generated by the model when one holds the non-focal variable(s) constant and varies the focal variable(s) (Lüdecke 2018). For plotting (interaction) graphs of only categorical variables, we used the function ggemmeans, and for plotting continuous variables or interactions that include a continuous variable, we used the function ggpredict. Confidence intervals were based on the standard errors of the predicted values, assuming a normal distribution (±1.96 * SE).
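To make the two computational steps in this paragraph concrete, here is a minimal Python sketch (again not the authors' R code). It compares the Šidák and Bonferroni adjustments for m comparisons and forms a 95% interval as estimate ± 1.96 × SE. The back-transformation from the logit scale to a probability, the function names and the numbers in the usage lines are assumptions made for illustration; the text only states that the intervals were ±1.96 * SE.

```python
import math


def sidak(p, m):
    """Sidak-adjusted p-value for one of m comparisons."""
    return 1.0 - (1.0 - p) ** m


def bonferroni(p, m):
    """Bonferroni-adjusted p-value for one of m comparisons (capped at 1)."""
    return min(1.0, p * m)


def inv_logit(x):
    """Map a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def wald_interval(estimate, se, z=1.96):
    """Normal-theory interval on the logit scale, back-transformed."""
    lower, upper = estimate - z * se, estimate + z * se
    return inv_logit(lower), inv_logit(upper)


# Hypothetical numbers: a raw p-value of 0.01 among 10 pairwise comparisons
# (five groups yield 10 pairs), and a predicted logit of 1.5 with SE 0.3.
p_raw, m = 0.01, 10
print(sidak(p_raw, m), bonferroni(p_raw, m))
print(wald_interval(1.5, 0.3))
```

For any p between 0 and 1 and any m greater than 1, the Šidák-adjusted value is never larger than the Bonferroni-adjusted value, which is the sense in which the Šidák correction is slightly less conservative.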
For the second model, the same procedure was followed as for the complete data set (see above). The same variables were entered in the model, and the variable Use Dutch was added. As interactions, we added all possible combinations of two variables. Since the variables Sex, Word type and Lemma frequency were not found to yield significant main effects, we did not include interactions with these variables. We entered Subject and Word as random intercepts. Twelve interactions prevented the model from converging and were removed.

Results
Children's accuracy scores are displayed in Table 2, separately for regular and irregular inflection. VSTM and VWM scores are shown in Table 3.

Mixed-Effects Modeling: All Children
In the first model, we analyzed the data of all children, both monolingual and bilingual. The results are presented in Table 4. In this section, we continue by discussing the significant fixed effects one by one. As to the main effects, we found that higher age, higher non-verbal IQ scores (WNV) and higher VSTM significantly improved children's accuracy. Since the variables Group and Regular are involved in interactions, the results for the individual variables may be misleading. Therefore, we focus on the four significant interactions, starting with the two interactions that pertain to the predictions we formulated (Group:Regular, Group:VSTM). The figures that pertain to interactions that are not of direct relevance to our hypotheses can be found in Appendix C.
The interaction between Group and Regular is shown in Figure 1. Children performed more accurately at regular forms compared to irregular forms. For all five groups, this difference was significant (monolinguals: z = −5.12, p < 0.001; Frisian-Dutch bilinguals: z = −6.29, p < 0.001; Arabic-Dutch bilinguals: z = −4.70, p < 0.001; Tarifit-Dutch bilinguals: z = −4.64, p < 0.001; Turkish-Dutch bilinguals: z = −5.50, p < 0.001). Figure 1 shows that the difference between regulars and irregulars was smallest for monolinguals, and largest for Turkish-Dutch bilinguals. The difference between the groups was smaller for regular inflection than for irregular inflection, and for regular inflection, the 95% confidence intervals were smaller than for irregular inflection. For regular inflection, we found that the Turkish-Dutch bilinguals were less accurate than the monolinguals (z = 4.59, p < 0.001) and the Frisian-Dutch bilinguals (z = 4.49, p < 0.001). For irregular inflection, it turned out that the monolinguals were more accurate than the Frisian-Dutch bilinguals (z = 3.96, p < 0.01) and Turkish-Dutch bilinguals (z = 5.66, p < 0.001), and that the Turkish-Dutch bilinguals were less accurate than the Tarifit-Dutch bilinguals (z = 3.50, p < 0.05) and Frisian-Dutch bilinguals (z = 3.28, p < 0.05).
Figure 2 shows the interaction between Group and VSTM. The better children scored on VSTM, the more accurately they used inflection, except for the Turkish-Dutch group.
Two other interactions were significant: Group:Parental education (Figure A1) and Non-verbal intelligence:Regular (Figure A2). For each of the groups, we found that higher parental education was associated with higher accuracy at using inflection, but for the Turkish-Dutch and Frisian-Dutch bilingual groups, the effect of parental education was stronger than for the other groups.
The interaction between non-verbal intelligence and regularity shows that with increasing intelligence scores, the difference between accuracy at regular and irregular forms became smaller. Figure A2 also suggests that the effect of non-verbal intelligence was larger for irregular inflection than for regular inflection. The 95% confidence intervals were larger for irregular inflection than for regular inflection.

Mixed-Effects Modeling: Bilingual Children
In the second model, we analyzed the data of the bilingual children only, so that we could also examine the effect of the current use of Dutch at home (Use Dutch). The results are presented in Table 5. We continue by discussing the significant fixed effects one by one. As to the main effects, we found that higher age, non-verbal intelligence (WNV) scores and VSTM scores significantly improved children's accuracy at using inflection. More frequent use of Dutch at home was also associated with higher inflection accuracy. Pairwise comparisons indicated that fewer errors were made with regular than irregular inflection (p < 0.001) and that the Frisian-Dutch and the Arabic-Dutch group performed more accurately than the Turkish-Dutch group (respectively p < 0.001 and p < 0.05). Although the main effects of Group and Regular should be interpreted with caution because these variables are involved in interactions, they were not contradicted by the significant interactions in which they are involved. The significant interactions are discussed below, starting with those that involve Regular (Age:Regular, Use Dutch:Regular, WNV:Regular), followed by the interaction that involves Group (Group:Parental education). Lastly, the interactions with VSTM and VWM are presented (Parental education:VSTM, Parental education:VWM). The figures that illustrate interactions that are not of direct relevance to our hypotheses can be found in Appendix D.
The interactions between age and regularity, between use of Dutch at home and regularity, and between non-verbal intelligence and regularity are plotted in Figure 3 and in Figures A3 and A4 in Appendix D. The figures show that with increasing age, use of Dutch, and non-verbal intelligence scores, the difference between the scores for regulars and irregulars became smaller. Additionally, the figures suggest that the effects of age, use of Dutch, and non-verbal intelligence were larger for irregular inflection than for regular inflection. The 95% confidence intervals were larger for irregular inflection than for regular inflection.
The interaction between group and parental education is illustrated in Figure A5. For each of the groups, higher parental education was associated with higher accuracy, but for the Turkish-Dutch and Frisian-Dutch groups, the effect of parental education was stronger than for the other groups.
Figures A6 and A7 illustrate the interactions between parental education and, respectively, VSTM and VWM. With increasing level of parental education and with better VSTM scores, children's accuracy on inflection improved. The positive effect of VSTM disappeared when parental level of education increased. Figure A7 shows that with increasing level of parental education, children's inflection accuracy improved. However, for low values of parental education, lower inflection accuracy was associated with higher VWM scores, whereas for high values of parental education, it was the other way around.

Discussion
Inflectional morphology can be a locus of errors in bilingual children. The aim of this longitudinal study was to examine how this effect is influenced by regularity of inflection, to determine the role of verbal short-term and working memory (VSTM, VWM), and to investigate interactions between these language- and child-level variables. In addition, unlike most previous research, this study focused on Dutch and, importantly, included four groups of typically developing five- to eight-year-old bilingual children (Arabic-Dutch, Frisian-Dutch, Tarifit-Dutch and Turkish-Dutch), allowing us to explore variation within the heterogeneous population of bilingual children. VSTM and VWM were measured with a Forward and a Backward Digit Span task, respectively. A mixed-effects modeling approach was used in which we included not only the target variables (regularity, group, VSTM/VWM) as predictors of children's accuracy at using inflection, but also additional variables at the language level (lemma frequency, word type) and child level (age, sex, non-verbal intelligence, parental education, use of Dutch at home), enabling a better and more nuanced understanding of the impact of the target variables.

Regularity
Previous research on French-English bilinguals has shown that children's performance on regular inflection is more accurate than performance on irregular inflection, and that the gap between regular and irregular is larger in English than in French (Nicoladis et al. 2007; Nicoladis and Paradis 2012; Paradis et al. 2011). This difference between acquisition rates of regular and irregular forms is probably related to unequal input distributions and the relatively low type frequency of irregular forms, in line with a UB approach to language acquisition (Bybee 2007, 2008). The results of the current study indicate that higher accuracies for regular than irregular inflection are also found for children learning Dutch. This effect was found regardless of lemma frequency and whether items were nouns or verbs. The difference between regular and irregular inflection was significant in all five groups in this study, but the size of the effect varied across groups. The Turkish-Dutch bilinguals showed the largest difference between regular and irregular inflection, whereas the difference was the smallest for monolinguals (in the analyses with all children). Interestingly, in the analysis with only bilingual children, we found a significant interaction between regularity and use of Dutch instead of an interaction between group and regularity. More use of Dutch at home was associated with a smaller difference between accuracy at using regular and irregular inflection. Use of Dutch was not included in the model with all children, as, due to how this variable was measured, the monolingual children did not vary in this respect. However, it is likely that the small difference between regular and irregular inflection found for the monolinguals stems from frequent exposure to and use of Dutch in this group.
In addition to group, two other factors interacted with regularity. Both analyses revealed that with increasing non-verbal intelligence, the gap between regular and irregular inflection narrowed. Regardless of their non-verbal intelligence scores, children were able to use regular inflection accurately, whereas irregular inflection was more difficult for children with lower non-verbal intelligence scores than for their peers with higher non-verbal intelligence scores. The effect of non-verbal intelligence replicates previous research showing that non-verbal intelligence predicts bilingual children's correct production of verb morphology (Paradis 2011), and demonstrates that analytical reasoning is a component of language aptitude in children. A second interaction effect was found for age. This interaction was only part of the statistical model with bilinguals, as the model with all children did not converge when it was included. With older age, the differences in accuracy between regular and irregular inflection became smaller, in line with Rispens and De Bree (2015). This effect can be attributed to increasing accuracy at using irregular inflection, as regular inflection was highly accurate throughout the age range investigated.
To conclude, between-group differences are more likely for irregular than regular forms. The effect of regularity is caused by the relative difficulty of irregular forms. More use of Dutch at home, an older age, and better learning potential/analytical reasoning predict fewer errors with irregulars and are associated with a smaller gap between regular and irregular inflection.

Group
Previous research has shown that monolingual children make fewer inflection errors than bilingual children (Nicoladis et al. 2007; Lalleman 1986; Verhoeven and Vermeer 1985; Boerma et al. 2017), in particular if accuracy at using irregular forms is compared across the two groups (Rispens and De Bree 2015; Schwartz et al. 2009; Boerma et al. 2017). We found effects of group, which were more pronounced for irregulars than regulars, but the effects of group did not conform to a binary distinction between monolinguals and bilinguals. Rather, they displayed a distinction between monolinguals versus Arabic-Dutch/Frisian-Dutch/Tarifit-Dutch bilinguals versus Turkish-Dutch bilinguals. For example, using regular inflection, Turkish-Dutch bilingual children made more errors than monolinguals, but no significant differences emerged between Tarifit-Dutch or Frisian-Dutch bilingual children on the one hand and monolingual children on the other hand. Additionally, monolingual children were more accurate than Frisian-Dutch bilinguals and Turkish-Dutch bilinguals at using irregular inflection, but the difference between monolinguals and Tarifit-Dutch bilinguals did not reach significance. We turn to this outcome below. The findings confirm the heterogeneity in the bilingual population which is also emphasized by other researchers (Surrain and Luk 2017; Bialystok et al. 2012; Dixon et al. 2012; Kroll and Bialystok 2013).
Our predictions about cross-linguistic distance were only partially borne out. Cross-linguistic distance is the smallest for the Frisian-Dutch group, hence this group was expected to be the bilingual group with the highest accuracy. The Frisian-Dutch group indeed performed significantly better than the Turkish-Dutch group on both regular and irregular inflection, in line with our predictions. However, the Frisian-Dutch group did not perform more accurately than the Arabic-Dutch and Tarifit-Dutch groups. Moreover, whereas the Frisian-Dutch group was outperformed by the monolinguals on irregular inflection, no such effect was found for the Arabic-Dutch and Tarifit-Dutch groups. We suggest that two effects explain these patterns: a limited impact of cross-language similarity and a strong impact of the amount of Dutch. In explaining this below, we focus on irregulars, because these showed the most pronounced effects of group, and revealed patterns that contradicted the hypothesized effect of cross-linguistic distance.
The irregular items in the standardized test we used overlap less between Frisian and Dutch than the regular items (Blom and Bosma 2016). As a consequence, Frisian-Dutch bilingual children do not benefit from Frisian when they use Dutch irregulars. For example, all regular noun plurals end in the same suffix in Dutch and Frisian, whereas all four irregular noun plurals differ between the two languages, which means that Frisian interferes (Blom and Bosma 2016). The second effect concerns the role of amount of Dutch. The model with only bilinguals showed that more use of Dutch at home predicted higher inflection accuracy. The impact of use of Dutch at home is in line with other research showing effects of input (for an overview of input effects in bilingual children, see Unsworth 2016), and supports a UB approach to inflection that foregrounds experiential effects. As shown in Table 1, in the Frisian-Dutch group, Dutch is used relatively infrequently, whereas in the Arabic-Dutch and Tarifit-Dutch groups, Dutch is used relatively frequently. Firstly, this pattern confirms findings in other research showing that migrant families of Moroccan descent use more Dutch at home than families of Turkish descent (Scheele et al. 2010). Secondly, it explains why the Frisian-Dutch children do not outperform the Arabic-Dutch and Tarifit-Dutch children, and why irregulars show, from the perspective of cross-linguistic distance, an unexpected pattern. In particular, irregulars that are not highly frequent, which is the case for most items in this study, will require a substantial amount of input before these forms become entrenched and part of a child's repertoire, because they tend to rely on token frequency (Blom et al. 2012; Paradis et al. 2011; Rispens and De Bree 2015).
The effect of Group was moderated by regularity and parental education. For each of the groups, we found that higher parental education was associated with higher accuracy at using inflection, but for the Turkish-Dutch and Frisian-Dutch bilingual groups, the effect of parental education was stronger than for the Arabic-Dutch and Tarifit-Dutch groups (Figures A1 and A5). These differential patterns underscore that effects of parental education can vary across monolingual and bilingual populations and across different bilingual populations (Scheele et al. 2010). They also suggest that previous findings on the impact of socioeconomic status on Dutch vocabulary size in Turkish-Dutch children (Prevoo et al. 2014) hold for Dutch inflectional morphology as well. In a sample of Turkish-Dutch 5-year-old children, Prevoo et al. (2014) found that a higher socioeconomic status (which included parental education) predicted Dutch vocabulary. This relationship was explained by reading input (the availability of children's books at home).
We conclude that a binary categorical distinction between monolingual and bilingual groups is untenable. Factors that predict a higher accuracy at using (irregular) inflection in bilingual children comprise a smaller cross-linguistic distance, a larger amount of use of Dutch at home, and a higher level of parental education.

Verbal Short-Term (VSTM) and Verbal Working Memory (VWM)
We predicted that VSTM would be positively related to inflection accuracy, regardless of bilingual group and regularity, and that better VWM scores would predict higher accuracy at using regular but not irregular inflection. Overall, the results showed that better VSTM predicted better inflection accuracy, supporting results in previous research (Gathercole 1996, 2000; Baddeley et al. 1998; French and O'Brien 2008; Masoura and Gathercole 2005; Paradis 2011; Service 1992). However, a significant interaction effect between Group and VSTM showed that this effect held for all groups, except the Turkish-Dutch bilingual children. This interaction did not fit in model 2 with the bilinguals only, and prevented this model from converging. In our study, the Turkish-Dutch bilinguals scored, on average, lower on inflection than the other bilingual groups. This may suggest that the role of VSTM in learning inflection is dependent on level of development, similar to what has been proposed for word learning (Baddeley 2003). Interestingly, our findings differ from those of Verhagen and Leseman (2016), who found that VSTM predicted accuracy at using inflection in 5-year-old Turkish-Dutch children. The children in our sample were, on average, older. However, probably more relevant for understanding the differential findings is the fact that Verhagen and Leseman measured VSTM using (non-)word recall tests whereas we used a Forward Digit Span task. Based on several studies in which relationships between VSTM and vocabulary development were investigated, Baddeley (2003) concludes that digit span tasks tend to correlate less reliably with vocabulary than nonword recall tests. Model 2 did reveal a significant interaction between VSTM and parental education showing that the positive effect of better VSTM on inflection accuracy was found for lower, but not for higher levels of parental education.
Parental education is a proxy for input quality and quantity (Hoff 2006), and the observed interaction indicates that children who experience higher input quality and quantity rely less on VSTM for learning inflection than children who receive less input and input of lower quality.
VWM did not predict children's accuracy at using inflection. A significant interaction was found between VWM and parental education, showing that for lower parental education, higher accuracies on inflection were associated with lower VWM. We do not have an explanation for this unexpected and counter-intuitive pattern, but it is relevant to note that with increasing parental educational level, the differences between higher and lower VWM outcomes almost disappeared or even reversed. The absence of an effect of VWM is not in agreement with the study of Verhagen and Leseman (2016), in which better VWM scores in monolingual Dutch and bilingual Turkish-Dutch children predicted fewer errors at using inflection. One possible explanation for the contradictory results may have to do with the inclusion of non-verbal intelligence. Non-verbal intelligence, VSTM and VWM have all been found to predict bilingual children's accuracy at using inflection (Paradis 2011; Verhagen and Leseman 2016), but tend to be correlated (Colom et al. 2006; Engle et al. 1999; Engel de Abreu et al. 2010; Giofre et al. 2013). For example, in our study, VWM was moderately to strongly correlated with non-verbal intelligence and VSTM, respectively (Appendix B). While Paradis (2011) considered the simultaneous effects of VSTM and non-verbal intelligence, Verhagen and Leseman (2016) included the effects of VSTM and VWM. The results of our study indicate that when all three measures of children's language learning potential are included, only VSTM and non-verbal intelligence are relevant predictors.
In conclusion, children with better VSTM are more accurate at using inflection, but this effect does not generalize to all children. Analytical reasoning (indexed by non-verbal intelligence) may be more important than VWM for children's accurate use of inflection.

Limitations and Future Research
The study had a number of limitations. The different groups had unequal sizes and, in particular, the Arabic-Dutch sample was very small. This group was not targeted, and it only turned out during testing that the children were speakers of (Moroccan-)Arabic instead of Tarifit. Despite its small size, the behavioral patterns in this group of children lined up with patterns in the other groups, strengthening the conclusions. Another limitation is that for each construct, only one indicator was used. Multiple indicators enable the extraction of a latent variable, which provides a more robust reflection of the construct that is less dependent on one specific task. This would, for example, allow a more reliable distinction between VWM and non-verbal intelligence. In addition, the role of cognitive functions related to VWM, such as attention and inhibitory control, could be examined. In this study, we included proportional measures of Dutch input quantity at home. However, input quality is often more important than mere input quantity (Rowe 2012). Conceivably, measures of Dutch input quality could provide insight into the differential patterns of parental education across groups. In this study, VSTM was related to accuracy at using inflection, but this relationship was not found for all children (all except for Turkish-Dutch bilingual children, and bilingual children whose parents were more highly educated). An avenue for future research concerns the question of which children do and do not benefit from good VSTM, and why this is the case.
Funding: This study was funded with grants from the Dutch Organization for Scientific Research (NWO Vidi awarded to Elma Blom) and the Province of Fryslân.
Institutional Review Board Statement: All subjects gave their informed consent for inclusion before they participated in the study. This research with Arabic-Dutch, Tarifit-Dutch and Turkish-Dutch subjects was screened by the Standing Ethical Assessment Committee of the Faculty of Social and Behavioral Sciences at Utrecht University. Criteria were met and further verification was not deemed necessary. The research with Frisian-Dutch subjects was carried out at the Fryske Akademy and University of Amsterdam. Unfortunately, the study was not officially evaluated by an ethics committee before the start of the study due to a miscommunication. In hindsight, the ethics committee of the University of Amsterdam evaluated the information folder and the informed consent form that we used and concluded that the research had been conducted with the wellbeing of the participants in mind.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: All anonymized data, analysis scripts and results are available on Open Science Framework (https://osf.io/gqb8v/, accessed on 19 March 2021).

Conflicts of Interest: The authors declare no conflict of interest.
The variables were averaged per subject and word (random factors) before calculating the correlation coefficients, which resulted in n = 5232 cases for all children and n = 4128 cases for the bilingual children only. In both groups, the highest correlation was found between VSTM and VWM.
Appendix C. Interaction Graphs: All Children
Figure A1. Interaction between group and parental education (EdPar).
Figure A3. Interaction between use of Dutch at home and regularity of inflection.
Figure A4. Interaction between non-verbal intelligence (WNV) and regularity of inflection.