Effects of a Dutch Family Literacy Program: The Role of Implementation

Abstract: It is hypothesized that variability found in the effects of family literacy programs results from differences in implementation by parents. In this study, the implementation and effects of a Dutch program were examined in a sample of 207 kindergarteners (mean age at pre-test: 64 months). No main intervention effects on children’s literacy development were found. The quality of implementation proved to be higher for high-SES and native Dutch (speaking) parents than for low-SES, ethnic-minority parents with other home languages. Parent SES, ethnic-minority status, and home language did not moderate the program effects on child language scores and the program failed to impact targeted parental attributes, namely, the home literacy environment and parent self-efficacy. Finally, children’s development proved unrelated to implementation variables. Our results stress the importance of delivery for adequate implementation.


Introduction
Recognizing the strong influence of parents as first educators of their children, Family Literacy Programs (FLPs) aim to promote children's literacy development by stimulating their Home Literacy Environments (HLEs) [1,2]. Hannon [3] defines FLPs as 'programmes to teach literacy that acknowledge and make use of learner's family relationships and engagement in family literacy practices' (p. 100). Although this definition encompasses different sorts of programs [4,5] many interventions encourage parents to engage in joint literacy activities with their child. There appears to be substantial variability in FLP effects on children's literacy skills. Since it is hypothesized that this variability is partly due to differences in parental implementation, this study tests the role that different aspects of parental implementation play in program effects.

Variability in Effects of FLPs
Over the past decades, various FLPs have been developed and many have been the subject of effect studies. These effect studies were summarized in a number of meta-analyses, showing that FLPs are generally effective, although there is great variability in effect sizes. Meta-analyses showed significant but small effects of FLPs on children's literacy outcomes [6,7]. Comparing the impact of different types of FLPs, Sénéchal and Young [8] found tutoring programs, in which parents teach literacy skills such as letter knowledge, to yield large effects on reading acquisition, whereas shared reading programs generally had trivial effects. Mol, Bus, de Jong, and Smeets [9] summarized the effects of Dialogic Reading programs on vocabulary development, examining the added value of this approach, which requires the child's active participation, above and beyond typical shared reading. They found a medium mean effect in favor of Dialogic Reading.
The characteristics of the target population appear to be one source of variability in program outcomes. Two of the aforementioned meta-analyses provided evidence for differential effects of shared reading programs for different subgroups of children. Mol et al. [9] found that for children who were at risk of language and literacy impairments (based on family income or maternal education), the effects of Dialogic Reading on vocabulary skills were trivial compared to the effects for non-at-risk children (d = 0.13 vs. d = 0.53). Manz et al. [6] reported a significant difference in effect sizes between Caucasian and ethnic-minority families (d = 0.64 versus d = 0.16), as well as in effect sizes between middle- or high-income and low-income families (d = 0.39 versus d = 0.14). An overview of different meta-analyses [10] suggests that this raises doubts about whether low-SES and ethnic-minority families are capable of executing FLPs optimally and, consequently, that studies analyzing the effects of FLPs in such groups should take program implementation into account.
The quality of program implementation by parents may vary for different reasons. Many shared reading programs, for instance, target parental strategies such as scaffolding, which require parents to be sensitive and responsive to their children's input [11]. Previous research has shown that low-SES parents demonstrate less of this behavior compared to high-SES parents [12,13]. Another possible reason is that ethnic-minority families are hampered in conducting program activities by limited language proficiency. Often, FLPs are delivered in the majority language, which might lead to the suboptimal implementation of these programs in families with other home languages [14]. A shortcoming in many intervention studies on FLPs so far is that they rarely include measures of program implementation in effect analyses [6,10,15].

Defining Implementation
Implementation is assumed to play an important role in the effectiveness of any intervention program [16,17,18]. In their landmark review, Durlak and DuPre [16] analyzed over 500 studies on (mental) health prevention and promotion programs for children and adolescents and found strong support for the importance of implementation in determining program effects. Summarizing the outcomes of five meta-analyses, the authors concluded that good implementation generally results in effect sizes two to three times larger than when implementation is poor. They therefore state that 'the assessment of implementation is an absolute necessity in program evaluations. Evaluations that lack carefully collected information on implementation are flawed and incomplete' [16] (p. 340).
In the current study, we build on a framework proposed by Powell and Carey [19] to systematically analyze the implementation of FLPs. This framework consists of three main components, two of which focus on parental behaviors: receipt and enactment. Both components contain a quality and a quantity dimension. Receipt refers to parent engagement in training and program activities. Attendance at training sessions is an example of a measure of receipt quantity, whereas quality can be assessed by parents' use of targeted program strategies, understanding of program content, and their engagement during program activities with their child. Enactment pertains to the degree to which participants use the gained knowledge and skills in their day-to-day life. Are parents able to transfer the learned program strategies to activities outside of the intervention? Enactment quality refers to the quality of parent-child interaction during reading or other targeted activities outside program time or after the intervention has ended. It also includes parents' intentions to change their behavior as a result of the intervention and changes in their sense of self-efficacy in supporting their child's learning. The quantity of enactment pertains to the frequency of reading or other targeted activities outside program time or after the intervention has ended. For sustained program effects, it is important that parents are able to maintain their use of newly learned skills in order to reach more long-term goals such as improving children's literacy skills.
Although both variables pertain to parental behavior, they can be argued to take on different roles in program effectiveness. Receipt pertains to parental behavior during program activities and can thus only be assessed in families taking part in the intervention. Variability in receipt is then assumed to predict variability in child outcomes in participating families [20]. Enactment can, in essence, be seen as a mediator of intervention effects on child development: FLPs are hypothesized to induce changes in parental behavior outside program time and such changes are assumed to contribute to children's literacy development [21,22]. The implication of this view is that enactment variables should be assessed both in families that participate in the intervention and control families that do not; only then can the hypothesized mediation effect be tested.
Delivery is the third component in Powell and Carey's [19] framework and refers to the transfer of main program contents from trainers to parents. The quantity dimension of delivery involves the dosage of parent training (e.g., number and duration of training sessions), whereas the quality dimension reflects the way program contents are communicated to parents. Because the current study focuses on parental implementation, delivery is not included in our analyses, although we do assess it as an indicator of treatment fidelity (see Method).
Recently, de la Rie et al. [15] reviewed the available research on implementation of FLPs and its relation to program effects. The authors analyzed 46 studies and found that information on implementation varied in breadth and quality: Almost all studies provided information on parents' quantitative engagement in programs (i.e., receipt quantity), but fewer studies reported about quality of engagement (i.e., receipt quality), and transfer to daily life (i.e., enactment). The relationships between implementation and FLP effects remained largely unexplored. Moreover, studies that did analyze this relationship reported inconsistent findings. Some studies found relationships between implementation and effects [20,22], whereas others did not [23,24]. None of the included studies examined the quality and quantity dimensions of both receipt and enactment, as well as relationships among implementation variables and program effects. In conclusion, a comprehensive approach in measuring implementation seems to be lacking, even though this can provide crucial information on how to improve program effectiveness [16].

The Current Study
In the current study, we examined the effectiveness of 'Early Education at Home' (EEH) [25], an FLP that is conducted in Dutch primary schools and that serves a diverse population of families in terms of SES, ethnic background, and home language. In light of the hypothesized implementation issues in low-SES and ethnic-minority families, we examined whether parental SES, ethnic-minority status, and home language are related to the quality and quantity of receipt and enactment of EEH. Because implementation issues in these groups of families are assumed to hamper program effects, we first tested whether program effects were moderated by parent background variables (SES, ethnic-minority status, home language). Subsequently, we tested whether the quality and quantity of both receipt and enactment were associated with program outcomes. As argued in the introduction, receipt variables are included in our analyses as predictors of experimental children's growth in language and literacy skills. Enactment was measured in both conditions and treated as a mediator of intervention effects.
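The mediation logic described above (condition, then enactment, then child outcome) can be illustrated with a generic product-of-coefficients sketch. This is not the analysis used in this study: the function names, variable names, and all data values below are hypothetical and serve only to show why the mediator must be measured in both conditions. Without control-group values of the mediator, the path from condition to mediator cannot be estimated.

```python
from statistics import mean


def cov(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def mediation_paths(condition, mediator, outcome):
    """Ordinary least squares path estimates for a single-mediator model."""
    var_x, var_m = cov(condition, condition), cov(mediator, mediator)
    cov_xm = cov(condition, mediator)
    cov_xy, cov_my = cov(condition, outcome), cov(mediator, outcome)
    a = cov_xm / var_x        # condition -> mediator
    total = cov_xy / var_x    # condition -> outcome, ignoring the mediator
    denom = var_x * var_m - cov_xm ** 2
    # Mediator -> outcome, holding condition constant.
    b = (cov_my * var_x - cov_xy * cov_xm) / denom
    # Condition -> outcome, holding the mediator constant.
    direct = (cov_xy * var_m - cov_my * cov_xm) / denom
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": total}


# Hypothetical data, invented for illustration only.
condition = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = control, 1 = intervention
mediator = [2.0, 2.5, 3.0, 2.5, 3.0, 3.5, 4.0, 3.5]  # e.g., an HLE score
outcome = [10, 12, 13, 11, 13, 15, 17, 14]  # e.g., a child language score
paths = mediation_paths(condition, mediator, outcome)
print(paths)
```

In this linear setup the total effect equals the direct effect plus the indirect effect (a times b), so a mediator that the intervention fails to change (a close to zero) cannot carry an indirect effect.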

Research Questions
The following research questions are addressed:
1. Does EEH positively affect children's language and literacy skills?
2. What are the relationships among parental SES, ethnic-minority status, and home language, and implementation of EEH?
3. Are effects of EEH moderated by parental SES, ethnic-minority status, and home language?
4. Do receipt variables (quantity and quality of parental engagement in the intervention) predict EEH children's growth in language and literacy skills?
5. Do enactment variables (HLE, parents' sense of self-efficacy, and the quality of parents' behavior and language) mediate the effects of EEH on children's language and literacy development?

Hypotheses
Hypothesis 1. EEH positively affects children's language and literacy skills.

Hypothesis 2. Parents of lower SES, ethnic-minority parents, and parents with a home language other than the majority language show lower implementation quality. We base this hypothesis on the notion that FLPs are often not well tailored to the needs of at-risk families [6].

Hypothesis 3. The effects of EEH are smaller for children of lower-SES parents, ethnic-minority parents, and parents with a home language other than the majority language. We base this hypothesis on meta-analytic findings of smaller FLP effects in these groups [6,9].

Hypothesis 4. As can be expected from the wider implementation quality literature [16,19], we hypothesize that receipt variables will be positively associated with children's growth in language and literacy skills.

Hypothesis 5. In line with the literature on our selected enactment variables [26,27], as well as with program theory [25], we expect program effects to be mediated by the HLE, parents' sense of self-efficacy, and the quality of parents' behavior and language while they engage in literacy-related activities with their child.

Sample
Primary schools in the Western part of The Netherlands were invited to participate by letters and subsequent telephone calls, as well as by posting a call in a digital community for kindergarten teachers. Although teachers self-selected into the experimental condition, the comparability of experimental and control conditions was maximized by including both an experimental and a control class from the same school, in order to minimize school effects. The study involved a total of 7 schools and 27 kindergarten teachers. On average the participating teachers were 49 years of age, ranging from 23 to 59 years. All 18 participating classes were second-year kindergarten classes, with pupils aged between 4 and 6 years old. In six of the schools, two classes took part, whereas one larger school participated with six classes (3 experimental and 3 control classes). See Table 1 for an overview of participants (number of schools, classes, teachers, children and their parents). Teachers delivered the intervention to the parents of their pupils. In the larger school, one teacher delivered the intervention to three experimental classes. None of the teachers had prior experience working with EEH.

Table 1. Overview of participants.

| | Control Group | Intervention Group a | Total |
|---|---|---|---|
| Schools | 7 | 7 | 7 |
| Classes | 9 | 9 | 18 |
| Teachers | 13 | 14 | 27 |
| Kindergartners and their parents | 98 | 119 | 217 |

Note. a 7 out of the 14 teachers who were teaching in the intervention group delivered EEH to parents and children.
All parents were informed about the study by the school through a letter and they could indicate if they did not wish to participate. None of the parents refused participation. All children in the selected classes participated in the study, with three exceptions: two children who had a twin in the other condition (to prevent bias from a control group child being exposed to the intervention), and one child with Down's syndrome. In total, parents of 217 children from 18 classes agreed to participate in the study; 119 children participated in the intervention (9 classes) and 98 were in the control condition (9 classes). Children in both the intervention group and the control group had a mean age of 64 months at pre-test (range intervention group: 57-76 months; range control group: 57-75 months). In the intervention group, 51% of the sample consisted of girls, whereas in the control group this was the case for 49% of the participants.
We checked for significant differences between the experimental and control conditions on children's pre-test language and literacy scores, using independent samples t-tests, and none were found. Regarding relevant background characteristics of children (gender, age) and parents (SES, migration background, home language), and richness of the HLE and parent self-efficacy (PSE), we again found no significant differences between the experimental and control group participants at pre-test, suggesting that the two conditions were comparable on important characteristics.
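For readers less familiar with this kind of baseline check, the following is a minimal sketch of the independent samples t statistic with pooled variance. The scores are invented for illustration and are not study data; the function name `pooled_t` is hypothetical. The sketch returns only the t statistic and its degrees of freedom, since a p-value requires the t distribution, which is not in the Python standard library.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Student's independent-samples t statistic with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # The pooled variance weights each group's sample variance
    # by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(group_a) - mean(group_b)) / standard_error
    return t_stat, df


# Hypothetical pre-test scores, invented for illustration only.
experimental = [52, 48, 55, 50, 47, 53]
control = [51, 49, 54, 50, 46, 52]
t_stat, df = pooled_t(experimental, control)
print(f"t({df}) = {t_stat:.2f}")
```

A t value well below the two-tailed critical value for the given degrees of freedom (about 2.23 for df = 10 at the 0.05 level) would be read as no significant baseline difference between conditions.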
We asked the parents who considered themselves to be most involved in the child's upbringing to fill in a parent questionnaire (79% mothers, 20% fathers, 1% foster parents or extended family members). All but three parents completed this questionnaire. For parent-child observations, we asked parents who were most involved in conducting EEH with the child (experimental group) or in the upbringing of the child (control group) to participate. Parents' characteristics are presented in Table 2. During the school year, ten pupils left the study as a result of their families moving out of the area, or switching schools for another reason (e.g., because the child needed special education), decreasing the sample size to a total of 207 participants at the end of the school year (115 experimental participants; 92 controls). Although there was 4.6% attrition from the original sample, bivariate correlations suggested no significant differences between dropouts and the remainder of our sample on key background characteristics (i.e., parental educational attainment and migration background, gender of the child). There was one exception: home language was significantly related to drop-out (r = 0.202, p < 0.01), indicating that native Dutch-speaking parents remained in the sample more often compared to non-native Dutch speakers.
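The attrition figure follows directly from the enrolment and retention counts reported above. The short sketch below reproduces the arithmetic; only the variable names are our own.

```python
# Counts reported in the text.
enrolled = {"experimental": 119, "control": 98}
retained = {"experimental": 115, "control": 92}

# Number of children lost per condition.
dropped = {group: enrolled[group] - retained[group] for group in enrolled}
total_enrolled = sum(enrolled.values())
total_dropped = sum(dropped.values())
attrition_pct = 100 * total_dropped / total_enrolled

print(dropped)
print(f"{attrition_pct:.1f}% overall attrition")
```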

Measures
Language and literacy skills. Three measures were included to assess child language and emergent literacy skills. First, we used a standardized language test [28] that was part of the participating schools' student monitoring system and that was administered (pre- and post-test) by teachers in a regular class setting (approx. 30 min.). This test included measures of receptive vocabulary, critical listening, phonemic and rhyme awareness, print knowledge, and auditory synthesis abilities. To measure receptive vocabulary, for example, children were asked to select out of four images the image corresponding to a word that was read aloud by the teacher. Rhyme awareness was measured, for example, by the teacher reading a word aloud followed by four other words, after which children were asked which of the final four words had the same ending sound as the first word. Cronbach's alpha for the total score is 0.87 [28]. Information on children's emergent literacy skills was additionally obtained by teacher ratings via a questionnaire. We used an emergent literacy instrument with a five-point Likert scale based on Van Steensel [26], which consists of three subscales with a total of 15 items: oral language, phonological awareness, and print knowledge. To assess oral language, teachers were, for instance, asked to indicate to what extent a child could tell a coherent story. For our data, the composite alpha was 0.97 (averaged across pre- and post-test).
Additional information on children's curriculum-based vocabulary was obtained from a receptive vocabulary test (similar in format to the Peabody Picture Vocabulary Test) designed for this study by the first and third authors. We incorporated 43 words from EEH program themes. Children were tested individually (approx. 5 min.) by a research assistant, in a quiet one-on-one setting. Cronbach's alpha for this test was 0.71 (averaged across pre- and post-test).
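The reliabilities reported for these measures are Cronbach's alphas. As a reminder of how this coefficient is obtained, the sketch below applies the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), to a small set of invented ratings. The data and the function name `cronbach_alpha` are hypothetical and unrelated to the instruments used here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of item scores per respondent."""
    n_items = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in item_scores]) for i in range(n_items)
    ]
    # Sample variance of respondents' total scores.
    total_variance = variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings (5 respondents x 3 items), invented for illustration only.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 1],
]
alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.2f}")
```

Alpha approaches 1 when items rise and fall together across respondents, which is why it is commonly read as an index of internal consistency.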
Implementation. For an overview of our measurements of program implementation following the conceptual framework of Powell and Carey [19], see Table 3.

Table 3. Receipt and enactment [19].

| Element of Implementation | Dimension | Aspect |
|---|---|---|
| Receipt | Quantity | Attendance at training sessions; number of diaries handed in; activities completed |
| Receipt | Quality | Quality of parent behavior and language during a program activity (Program Activity; shared reading) |
| Enactment | Quantity | Frequency of literacy-related activities outside program time |
| Enactment | Quality | Quality of parent behavior and language during a non-program activity (Non-Program Activity; prompting board); parent self-efficacy in helping the child succeed in school |

Regarding receipt quantity, attendance at group meetings was registered for each session by the teachers who delivered the program to parents. Additionally, parents were given diaries for every activity booklet, in which they were instructed to register completed program activities on a checklist. We counted the number of diaries handed in by parents and the number of activities completed, based on what parents reported in the diaries.
All other implementation measures were administered twice: at the beginning and at the end of the intervention period (see Table 4). The quality dimension of intervention receipt was measured by observing parent-child interactions during a program activity (shared reading) at pre- and post-test and scoring the quality of parents' behaviors and language. Most observations took place at school, whereas a few parents preferred to be observed in their home. In order to rate the observations of both the program and non-program activities (see 'Enactment quality' below for the non-program activity), we used an observation scheme developed by Kenney [29] (later used by Mol & Neuman [30]), which we translated from English to Dutch. This observation scheme includes the following six categories of parent behavior and language: labeling, generalizing, repetition and paraphrasing, scaffolding, fostering child autonomy, and quantity and variety of language (see Appendix A for examples). These are all aspects targeted in EEH and were scored on a scale ranging from 1 (not at all characteristic) to 4 (very characteristic). Cronbach's alpha reliabilities of the scales were 0.83 for the non-program activity (NPA) and 0.87 for the program activity (PA).

Table 4 (fragment). Parent-child observations, receipt and enactment: quality of parent behavior and language during a Non-Program Activity (NPA) and a Program Activity (PA) [29].

Enactment quantity was measured by an HLE questionnaire consisting of eight items derived from Van Steensel [31]. Van Steensel found support for the construct and predictive validity of the HLE questionnaire in a previous study of a comparable sample. The questionnaires were available in Dutch, English, Arabic, Turkish, and Polish. The parents were asked to indicate how often they engaged in literacy-related activities with their child on a scale ranging from 1 (daily) to 4 (almost never/never). The following activities were included: shared reading, going to the library, singing songs, writing alphabet letters, storytelling, visiting a bookstore, playing educational (online) games, and watching educational TV shows together. The Cronbach's alpha coefficient for this scale was 0.63.

Enactment quality was measured by scoring the quality of parents' behaviors and language during a non-program activity (NPA), which was a prompting board task very much like those in EEH, but without written (program) instructions. A prompting board is a complex picture, suggesting a sequence of events, and is designed to elicit child speech. We selected a picture of a busy park on a summer day (pre-test) and a picture of a zoo (post-test) from an existing prompting board book [33]. We invited parents to engage in a conversation with their child as they would normally do when looking at pictures together.
Because of the large number of observations (n = 156 at pre-test and n = 148 at post-test for non-program and program activities combined), ratings were given by a total of 12 coders during the observations (one coder per parent-child dyad). In order to assess interobserver agreement, Cronbach's alphas were calculated for the NPA and the PA on both the pre- and post-test, based on a random selection of 12% of all ratings. This selection of observations was double-coded by the first author, using the video recordings that were made during the observations. The average alphas across the observed categories of parent behavior and language for the NPA and the PA at pre-test (α = 0.86 and 0.87, respectively) and post-test (α = 0.78 and 0.83, respectively) indicated sufficient agreement. As our sample contained parents with limited Dutch language proficiency, observations were conducted in parents' self-reported home language, with the aid of bilingual research assistants who spoke one or more of the following languages: Arabic, Berber, English, Polish, and Turkish.
To measure parental self-efficacy (PSE) as an aspect of enactment, we administered the 'How to help my child succeed in school scale' at pre- and post-test. This scale was developed by Hoover-Dempsey and colleagues [26,32] and consists of 12 items that measure parents' perceptions of personal efficacy, specifically in relation to supporting their children's school success. This scale contains items such as 'I know how to help my child do well in school' and 'If I try hard, I can get through to my child even when he or she has trouble understanding something.' Parents were asked to rate their sense of efficacy per item on a 5-point scale ranging from 1 (strongly disagree) to 5 (strongly agree). Cronbach's alpha for this scale was 0.76.
Parent and child background variables. To obtain relevant background information from parents and answer our second and third research questions, we added questions to our parent questionnaire pertaining to SES, ethnic-minority status, and home language. SES was operationalized as the highest level of education that parents had completed. Ethnic-minority status was derived from parents' country of birth. Home language was operationalized as the parents' best oral language. Furthermore, we asked teachers to provide us with information regarding children's age and gender.

Table 4 provides information on the overall planning of the study. The experimental teachers were trained and coached four times from June to October 2014. The measurements took place at four points in time. The pre-tests were spread across two periods: the language test was administered in June 2014 (T1), since this is the prescribed administration time [28], whereas the other measures were administered in September and October 2014 (T2), closer to the start of the intervention period (September 2014). Subsequently, data were collected halfway through the school year (T3: January 2015), and at post-test (T4: June 2015).

Intervention
EEH is a government-funded program conducted in major cities and suburban districts across The Netherlands, Belgium, and Germany. EEH is mostly conducted in schools with many children from low-educated and/or ethnic-minority families, although higher-educated, native Dutch families also take part because the program often targets whole classes. EEH involves literacy-related activities such as shared reading, prompting board activities [34], and arts and crafts activities, which parents are stimulated to conduct with their child at home. Activities can also be part of daily routines outside the home, such as when parents take their child on a walk to the park and discuss what kind of things and animals can be found there (such as leaves, squirrels). Parents are instructed to pose stimulating questions (such as "during what season are the leaves falling and why do you think that happens?") and, for example, gather objects to bring home and use to create art or a drawing (such as leaves and beechnuts). During playful activities such as these, parents can stimulate their child's literacy development by exposing the child to sophisticated vocabulary and abstract language, carefully adjusted to the child's developmental level. Parents are trained by their child's teacher during group meetings at school in which literacy activities are discussed, modeled, and role-played. EEH is assumed to affect children's literacy outcomes by means of improving both the frequency and the quality of literacy-related activities in the home. With respect to the quality of shared activities, stimulating child autonomy, variety of language, and out-of-context language (also referred to as abstract language in the literature) are targeted. In addition, parent training is assumed to increase parents' self-efficacy. Parents who are enabled to create successful learning experiences with their child are expected to gain confidence in their role.
Materials. Participants in EEH are provided with a colorful bag holding a multi-order with activity booklets (one for each theme, with a total of seven themes per year) which include eight literacy-related activities and instructions, as well as materials for conducting these activities, such as storybooks, prompting boards [35], and materials for arts and crafts (e.g., colored paper, crayons, paint, scissors). The storybooks used in the intervention were written specifically for this age group by well-known (Dutch) children's book writers. Each activity comes with a sheet of instructions for parental guidance and suggestions for questions aimed to trigger stimulating parent-child interactions, characterized by responsiveness, open-ended questions, scaffolding, and exposure to new vocabulary. As a considerable part of the target group of EEH consists of children of ethnic-minority parents who are more proficient in languages other than Dutch, some materials (i.e., storybooks) are also available in a selection of other languages (i.e., Arabic, English, and Turkish).
Teacher training. The teachers who delivered the program to parents were trained by the first author in two phases. Phase 1 was a three-hour session in which teachers were instructed on the specific contents of EEH, and on delivering the program to parents. The teachers were trained in using four techniques: explaining activities to parents in an interactive manner, modeling interaction strategies, conducting program activities together with all attending parents, and role-play (i.e., enacting activities with a colleague and/or parent(s), where one plays the parent and another the child). They were trained to invite parents to actively share their experiences with the program during the meetings. Finally, the teachers were asked to provide parents with ideas for transferring skills mastered during the intervention period to their daily lives (enactment). This included suggestions for turning a regular daily activity, such as shopping, into a learning experience. Parents could, for example, be encouraged to discuss pieces of clothing with their child, and to ask open-ended questions, such as 'What pieces of clothing are suitable for winter?'.
Additionally, the teachers were stimulated to adapt their instructions to low-educated and low-literate parents, and parents with limited Dutch language proficiency, through the use of pictures, repetition, monitoring parents' understanding, and, when possible, allowing time for parents to translate for others. EEH assumes that parents are best able to support their children's development by using the language they are most proficient in, and that knowledge and skills acquired in the first language can be transferred to the second language [34]. Hence, teachers were encouraged to stimulate parents with limited Dutch proficiency to make use of the materials available in other languages and conduct activities in their home language.
After this first training session, the intervention commenced. Phase 2 of the teacher training consisted of coaching. After the second and the third parent meeting, which were observed by the first author, teachers were provided with immediate feedback regarding their performance based on these observations (1.5 h per session).
Parent meetings. Parents in the intervention group were trained by their child's teacher. Teachers were requested to organize seven six-weekly group meetings, lasting between 60 and 90 min each. Teachers worked from a scripted outline to ensure fidelity across schools. The first part of a standard EEH parent meeting is dedicated to informing parents about what children have been learning in class during the previous period, and what they will be learning during the upcoming period. During the second part of the meeting, teachers evaluate the activities that parents completed with their child over the preceding period, in order to identify difficulties and suggestions for improvement for upcoming program themes. In the final part, teachers provide parents with an overview of the activities in the new workbook and explain how to conduct these activities.
Treatment fidelity (delivery). Adherence to the proposed number and duration of parent meetings was checked by contacting experimental group teachers after each planned meeting, asking them to provide basic quantitative information. Almost all teachers were able to organize all intended seven meetings. One teacher organized six meetings. The average duration of group meetings across all schools was 52 min, with quite a large range (15-80 min), suggesting that, generally, schools did not stay within the range that is prescribed by the program (60-90 min).
In order to assess the quality of intervention delivery, the first author observed three out of seven parent meetings at each participating school, using a checklist to assess adherence to program guidelines. This checklist entailed topics such as evaluation of the previous EEH theme, use of the trained techniques (e.g., role-play, modeling), and use of open questions and concrete examples. Overall, the quality of delivery was quite high across schools, with one exception. In this school, the teacher failed to adequately address the new theme in class and the activities for the upcoming EEH theme in all of the observed meetings. In the remaining schools, most meetings were in line with program guidelines. Nonetheless, all teachers but one largely ignored transfer of program skills to daily situations outside of program time. Regarding explanation of the upcoming EEH theme, three of the proposed techniques (modeling, enacting, and role-play) were hardly ever used by any of the teachers.
Control group. The control group was a "business as usual" control group, meaning that the children in the control group followed the same school curriculum as the experimental group children. However, there were no family literacy type programs offered to the control group children. One of the inclusion criteria for participation in this study was that the school was not working with a family literacy or other type of parental involvement program in which systematic parent-child activities and training for parents were offered.

Analyses
We estimated, a priori, that a sample of 128 children was needed to test the intervention effects with a two-sided test, an alpha of 0.05, and a statistical power of 0.80 [36]. The power analyses were based on the overall moderate effect size (Cohen's d = 0.50) of FLPs found in recent meta-analyses [6,9] and were conducted in G*Power Version 3.1.9.2 using a two-sided t-test [37]. For all research questions we employed regression analyses using the program MLwiN Version 2.36 [38]. When significantly related to outcome measures, relevant background variables (child gender, child age, parent SES, parent ethnic-minority status, and home language) were added to the models as covariates. Our data are hierarchical, that is, measurements are nested within pupils, pupils are nested within classes, and classes are nested within schools. Because of this hierarchical structure, we first tested for significant variance on the upper levels, to determine whether or not we should employ multi-level analyses. For each set of outcome measures, which differed per research question, decisions were made regarding the most appropriate strategy to model growth (growth model, pre-test as covariate, or change scores).
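The reported sample size can be roughly reproduced by hand. The sketch below is not the G*Power computation, which works with the noncentral t distribution. It assumes the normal approximation for a two-sided, two-sample comparison, n per group ≈ 2(z_{1-α/2} + z_{power})² / d², plus a commonly used small-sample correction of z²_{1-α/2} / 4. The function name and its arguments are illustrative only.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80, t_correction=True):
    """Approximate per-group sample size for a two-sided, two-sample t-test.

    Assumption: normal approximation, optionally with a small additive
    correction for using the t rather than the normal distribution.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_power = z.inv_cdf(power)           # quantile matching the desired power
    n = 2 * (z_alpha + z_power) ** 2 / d ** 2
    if t_correction:
        n += z_alpha ** 2 / 4            # small-sample correction (assumption)
    return ceil(n)


# Values taken from the text: d = 0.50, alpha = 0.05, power = 0.80.
total_sample = 2 * n_per_group(0.50)
print(total_sample)
```

With the correction, the approximation gives 64 children per group, that is, 128 in total, which agrees with the figure reported above. Without the correction it gives 63 per group, which illustrates why an exact t-based calculation is preferable for planning.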
Research Questions 1 and 3 involved effects on children's development, and thus included language, emergent literacy and receptive vocabulary as outcome measures. To analyze language scores we fitted a growth model to the data, as we had three measurement points (see Table 4). Exploration of intercept-only models showed significant variance in language scores on all four levels (see Appendix B, Table A1). Hence, we proceeded with a four-level growth model. Emergent literacy and receptive vocabulary were measured at two time points. As we were dealing with (1) a quasi-experimental setting without randomization, and (2) existing groups (i.e., classes), we used change scores to conduct these analyses [39]. Because we found significant variance in change scores for emergent literacy on the class level, we proceeded with a two-level model (pupils and classes; see Appendix B, Table A3). We found no significant variance on the class or school level for receptive vocabulary, and hence the analyses involving this outcome measure were conducted uni-level (see Appendix B, Table A5).
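To make the logic of this screening step concrete, the sketch below computes change scores and then a one-way ANOVA estimate of the intraclass correlation (the share of variance in change scores that lies between classes). This is a descriptive stand-in under stated assumptions, not the variance tests run in MLwiN, and the scores are hypothetical values chosen for illustration, not data from the study.

```python
from statistics import mean


def change_scores(pre, post):
    """Post-test minus pre-test, pupil by pupil."""
    return [after - before for before, after in zip(pre, post)]


def icc_oneway(groups):
    """ANOVA estimate of the between-group share of variance, ICC(1).

    `groups` is a list of lists, one inner list of scores per class.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(score for g in groups for score in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective group size; equals the common size when classes are balanced.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Hypothetical pre-test and post-test scores for three classes.
classes = [
    ([10, 12, 11], [14, 15, 16]),
    ([9, 10, 13], [10, 12, 14]),
    ([11, 11, 12], [11, 12, 12]),
]
gains = [change_scores(pre, post) for pre, post in classes]
print(gains, round(icc_oneway(gains), 3))
```

An estimate near zero suggests that a single-level analysis loses little, whereas a clearly positive value points to clustering that a multi-level model should absorb. The decision in the study itself rested on significance tests of the upper-level variance components.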
The second research question regarded the prediction of implementation and thus included both receipt and enactment variables as outcomes. For receipt we were interested in examining whether parent characteristics would predict overall implementation. Therefore, we analyzed sum scores for attendance, diaries, and activities, and mean scores for program activity (PA). Attendance scores were analyzed with a multilevel model with two levels: schools and pupils (see Appendix C, Table A7). Diary scores were analyzed with a pupil and a class level (see Appendix C, Table A8). Sum scores for activities conducted were analyzed uni-level (see Appendix C, Table A9). PA was analyzed uni-level, as no significant variance was found on the upper levels (see Appendix C, Table A10). With respect to enactment variables (HLE, PSE, NPA), we were interested in examining whether targeted behaviors and practices increased and whether low-SES and ethnic-minority families' growth on enactment variables was different from that of higher-SES and native Dutch parents. Therefore, we analyzed change scores. For change in HLE, PSE and NPA, no significant multi-level structures were found, and hence, all analyses including these variables as dependent variables were conducted uni-level (see Appendix C, Tables A14-A16).
The fourth research question involved the same child outcome measures as in Research Questions 1 and 3, but now the sample included only experimental group children (see Figure 1), and hence the multi-level structures were explored separately. Language scores were analyzed using a growth model with two levels: time and pupil (see Appendix E, Table A23). Change scores on emergent literacy were analyzed using a multilevel model with two levels: pupils and schools (Appendix E, Table A26), and change scores on vocabulary were analyzed uni-level (Appendix E, Table A28). The fifth research question involved possible mediation of intervention effects by enactment variables (see Figure 2), which we tested following a widely used method [39]. We first examined relations between EEH and child development, followed by relations between EEH and growth on enactment variables (mediators), and finally, relations between growth in enactment variables and child development, after controlling for the effect of condition. The first and third steps for testing mediation included language, emergent literacy and receptive vocabulary as outcome measures and hence were analyzed according to the procedure described for Research Question 1. Change scores on our mediator variables (HLE, PSE, and NPA) showed no significant variance on the upper levels and thus were analyzed uni-level (see Appendix F, Tables A30-A32).
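The three mediation steps described above can be sketched as three single-level regressions. The code below is an illustration under simplifying assumptions: it uses ordinary least squares without covariates, ignores the multi-level structure, and returns coefficients only, without significance tests. The function names and the data are hypothetical and do not come from the study.

```python
def ols(y, *predictors):
    """Return [intercept, b1, b2, ...] from ordinary least squares."""
    rows = [[1.0, *xs] for xs in zip(*predictors)]
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def mediation_steps(condition, mediator_change, outcome_change):
    """Coefficients for the three regressions, in the order given in the text."""
    total = ols(outcome_change, condition)[1]        # step 1: condition -> outcome
    a_path = ols(mediator_change, condition)[1]      # step 2: condition -> mediator
    _, direct, b_path = ols(outcome_change, condition, mediator_change)  # step 3
    return {"total": total, "a": a_path, "b": b_path, "direct": direct}


# Hypothetical data: 0 = control, 1 = intervention.
condition = [0, 0, 0, 1, 1, 1]
mediator_change = [1, 2, 3, 2, 4, 6]
outcome_change = [1 + 2 * c + 3 * m for c, m in zip(condition, mediator_change)]
print(mediation_steps(condition, mediator_change, outcome_change))
```

In this framework, mediation is plausible only if the condition predicts the mediator (step 2) and the mediator predicts the outcome once condition is controlled (step 3).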

Descriptive Statistics
The descriptive statistics for child outcomes, receipt, and enactment variables are presented in Table 5. First, the table shows that language and literacy scores were higher at post-test than at pre-test. This was not the case for the three parent enactment variables that were expected to mediate intervention effects. For example, the overall quality of parents' behavior and language during the NPA slightly decreased over the year. Average scores on receipt quantity variables revealed that parental engagement in the program was not optimal: on average, parents returned approximately 4 out of 7 diaries and conducted about 60% of program activities. As the end of the intervention year approached, we witnessed a decline in attendance, diaries handed in, and activities conducted. Conversely, there was a slight increase in receipt quality (PA). Bivariate correlations between study variables (at post-test) are presented in Table 6. All child outcome measures were significantly and moderately to strongly correlated with one another. This is to be expected, as all child outcomes were intended to measure (a specific part of) language ability. Three of our measures of implementation quality (parent self-efficacy, and parent behavior and language during a prompting board activity and during shared reading) were significantly related to one or more child outcome measures. The strongest significant relation was found between two measures of receipt: the number of diaries handed in and the percentage of activities conducted. This strong association reflects the fact that the percentage of activities conducted was derived from the diaries that were handed in by parents. Furthermore, parent behavior and language during the program task (shared reading) was significantly, though weakly, associated with parents' attendance at group meetings and the number of diaries handed in.
Parent behavior and language during the non-program activity (prompting board) was associated only with the number of diaries handed in. Moreover, parent ethnicity correlated significantly with our child outcome measures and parent behavior and language (during both activities), in the expected direction.
Regarding our first research question, we found no direct effects of the EEH intervention on children's language skills as measured by a standardized language test, their (teacher-reported) emergent literacy skills, and their curriculum-based receptive vocabulary scores. Tables with parameter estimates are presented in Appendix B (Tables A2, A4, and A6).
[...] (Tables A11-A13). We did, however, find significant relations between SES and receipt quality, as measured by parents' behavior and language during a program activity (shared reading; see Table 7). A higher level of education was associated with a higher mean score on PA. In addition, non-native Dutch parents scored significantly lower on PA than native Dutch parents. Finally, parents whose home language was different from Dutch and parents who were equally proficient in Dutch and their mother tongue scored significantly lower on PA than parents who indicated their home language to be Dutch. Regarding enactment variables, we found no significant relations between SES, ethnic-minority status, and home language, and change in HLE, PSE, and NPA (see Appendix C, Tables A17-A19).
Table 7. Regression: predicting parents' behavior and language during a program activity.
[...] to be the case (see Appendix D, Tables A20-A22).

The Role of 'Receipt'
In answering our fourth research question, we analyzed relations among receipt variables and children's language and literacy development.
None of our receipt variables, namely attendance at training sessions, diaries handed in, activities conducted, and quality of behavior and language during a program activity (PA), significantly predicted children's language and literacy development. Tables with parameter estimates are presented in Appendix E (Tables A24, A25, A27 and A29).
To test whether enactment variables mediated intervention effects, we first analyzed relations among EEH and growth in enactment variables HLE, PSE and NPA. These were non-significant, indicating that the intervention did not succeed in improving these attributes in parents. Second, we analyzed effects of change in enactment on children's language development. Growth in HLE, PSE and NPA was found to not significantly predict children's development on any of the language measures. Finally, no significant relations were found among the mediator variables and child outcome measures, while controlling for condition. These results indicate that program effects were not mediated by enactment variables. Tables with parameter estimates are presented in Appendix F (Tables A33-A38).

Earlier results of family literacy programs underline the importance of examining implementation [7,10], as this seems key to understanding variability in intervention effects [16]. We evaluated the outcomes of EEH, a program that aims to stimulate kindergartners' language and literacy skills by improving the frequency and quality of home literacy practices as well as the degree of parent self-efficacy and analyzed whether possible effects were associated with two aspects of parental implementation: receipt and enactment.
Regarding our first research question, contrary to our first hypothesis, results indicated no main intervention effects on children's language and literacy skills. With respect to our second research question, we found significant relations among parent background variables and receipt quality, as measured by the observed quality of behavior and language during a program activity (shared reading [...] their children [42,43]. Therefore, it is likely that when parents believe that program activities can be effective in improving their child's development, they are more likely to be able to realize desired outcomes.
Another explanation is that program effects were hampered by flaws in delivery. In- [...] can then be transferred to the second language [34]. However, we were not able to test whether this transfer occurred, as our sample included a very limited number of ethnic-minority parents who conducted the program in their mother tongue. Furthermore, we did not test children's language and literacy skills in other languages than Dutch. [...]
Another approach to realize more differentiation in program delivery could be to deliver the program via additional home visits. This approach has been found to be more effective than a center-based approach [6] and has the advantage that delivery can be tailored to the individual needs of parents.
A number of FLPs that made use of home visits with disadvantaged families showed significant effects on child outcomes [11,47,51,52]. Moreover, provided that bilingual deliverers are available for intervention implementation, parents can be instructed in their home language. Some findings suggest that home visits in parents' home language are a beneficial way of delivering FLPs in these families [47].
Finally, an additional reason for delivery issues might be that program activities were insufficiently aligned with participating families' literacy practices. In the late eighties, FLPs were criticized by researchers who pointed out that these programs were mostly based on mainstream Western pedagogies and ignored the cultural capital of ethnic-minority families [53,54]. More recently, scholars have argued for a more partnership-driven approach to intervention research [6]. Such an approach relies heavily on the active involvement of stakeholders (e.g., parents and children), in order to form theories and methods that underlie study designs [55].

Note. N pupils = 119; N classes = 9; N schools = 7. n.s. = non-significant.
Note. gm = grand mean centered. Parent home language reference category = Dutch. n.s. = non-significant. *** p < 0.001.
Table A12. Multi-level Regression: Predicting Diaries with Parent Background Variables.
Note. gm = grand mean centered. Parent home language reference category = Dutch. n.s. = non-significant. *** p < 0.001.
Note. gm = grand mean centered. Parent home language reference category = Dutch. n.s. = non-significant.
Note. gm = grand mean centered. Parent home language reference category = Dutch. n.s. = non-significant. * p < 0.05; *** p < 0.001.
Note. gm = grand mean centered. n.s. = non-significant. * p < 0.05; ** p < 0.01; *** p < 0.001.
Note. gm = grand mean centered. EEH = Early Education at Home. n.s. = non-significant. ** p < 0.01.
Note. N repeated measures = 404; N pupils = 142; N classes = 18; N schools = 7. gm = grand mean centered. EEH = Early Education at Home. n.s. = non-significant. * p < 0.05; ** p < 0.01; *** p < 0.001.
Note. gm = grand mean centered. EEH = Early Education at Home. n.s. = non-significant. ** p < 0.01; *** p < 0.001.