Quantity–Quality Trade-Off and Early Childhood Development in Rural Family: Evidence from China’s Guizhou Province

This paper empirically investigates the causal effect of having siblings on the cognitive, language, motor, and social-emotional skills of infants under the age of 2 in rural families in Guizhou Province in China. The results are based on data from a survey conducted in 2017. To effectively relieve the endogeneity induced by selection bias, we applied the matching-smoothing (MS) method to evaluate the effects of having siblings. The results show that, first, having siblings produces significant negative impacts on an infant’s cognitive, language, and social-emotional skills; second, intrahousehold resource allocation is the mechanism behind the Quantity–Quality (Q–Q) trade-off, and it exerts its effects through two key identified channels—the home environment and parental warmth. By spreading the parents’ investment among siblings in terms of both the home environment and parental warmth, having siblings hinders infants’ early development. Our findings provide new evidence for the relation between the Q–Q trade-off and early childhood development in rural families in western China.

However, this empirical relationship has been challenged by many studies, which cite omitted-variable bias as the principal obstacle to establishing a causal link between the quantity of children and their long-term human capital and income outcomes [6][7][8][9][10]. Omitted-variable bias occurs when relevant variables are left out of the econometric model; this violates the assumption that the error term is uncorrelated with the regressors and biases the ordinary least squares (OLS) estimator. For example, unobserved family heterogeneity, such as parents' capacity [11], usually affects both parents' fertility choices and children's development. Thus, applying the OLS estimator without accounting for parents' capacity will result in omitted-variable bias. Previous studies have also distinguished long-term from short-term consequences in a way that might explain the contradictory findings. For example, twin birth was adopted as an instrumental variable (IV) in a sample from Sweden, and the results revealed no impact of family size on children's long-term outcomes but a significant negative impact on those children's grades in compulsory and secondary school, i.e., their short-term performance [9].
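The omitted-variable bias described above can be illustrated with a small simulation. The sketch below is not the authors' analysis: the data are synthetic, and the effect sizes, the sign of the link between parental capacity and fertility, and the function name `simulate_ovb` are all assumptions chosen only for illustration.

```python
import numpy as np

def simulate_ovb(n=20000, true_effect=-0.2, seed=0):
    """Compare OLS with and without an unobserved confounder (synthetic data).

    All numeric values here are illustrative assumptions, not estimates
    from the survey.
    """
    rng = np.random.default_rng(seed)
    capacity = rng.normal(size=n)  # stands in for unobserved parental capacity
    # Assumed: capacity shifts the probability of having more than one child.
    sibling = (rng.normal(size=n) + 0.8 * capacity > 0).astype(float)
    # Assumed: capacity also affects the child's development score directly.
    outcome = true_effect * sibling + 0.15 * capacity + rng.normal(size=n)

    ones = np.ones(n)
    x_short = np.column_stack([ones, sibling])            # capacity omitted
    x_long = np.column_stack([ones, sibling, capacity])   # capacity included
    b_short = np.linalg.lstsq(x_short, outcome, rcond=None)[0][1]
    b_long = np.linalg.lstsq(x_long, outcome, rcond=None)[0][1]
    return b_short, b_long

b_short, b_long = simulate_ovb()
print(f"sibling coefficient, capacity omitted:  {b_short:.3f}")
print(f"sibling coefficient, capacity included: {b_long:.3f}")
```

Under these assumed signs the short regression is pulled toward zero; reversing the sign of either assumed link would push it the other way, which is why the direction of the bias cannot be known without further identification.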
In this paper, we analyze the direct impact of having siblings on early childhood development rather than on children's long-term welfare. First, early childhood plays a vital role in the acquisition of cognitive and non-cognitive skills that are crucial to lifetime achievement [23,26,27,28,29]. Second, because children at this stage (under two years old) are too young to enter school and society, there are few unobserved changes in the environment over time, which allows the focus to be placed on the family itself without having to consider other external factors. Therefore, investigating sibling effects on children's early development helps to establish a cleaner causal relationship for the Q-Q trade-off.
Additionally, there is another advantage to using a sample that comprises infants below the age of 2: it facilitates the identification of the mechanism driving sibling effects; that is, it can be more easily determined whether having siblings exerts an impact on the intrahousehold resource allocation and, in turn, affects the newborn's early development. Parents' investment has been previously emphasized [1], but it is poorly identified in much of the literature [24]. The home environment and parental warmth are two crucial resources that provide a sound foundation for early childhood development [30][31][32][33]. Evidence has been found that an additional child substantially reduced parental time [24].
Differing from other work studying the trade-off, the uniqueness of this paper is represented by the following four aspects: First, in terms of the target, we investigated sibling effects on infants' early development in rural families in western China. Compared with families from coastal cities, inland rural ones are more vulnerable to resource constraints in China [34][35][36]. Investigating the trade-off and intrahousehold resource allocation underlying it enables us to get a deeper insight into the urban-rural inequality in the human capital of China, and understanding this issue is vital for China's future growth [37,38].
Second, the samples in this study were infants younger than 2 years old. Contrary to studies that focused on the impact of additional children and held the opinion that the birth order was important [6,8], what we studied here was the sibling effects on the "newcomer" baby. Since a newborn baby younger than two years old is too young to have another younger sibling, the birth order does not matter in this study.
Third, in terms of the approach, considering that the validity of an IV is sometimes difficult to satisfy, we took another approach, the matching-smoothing (MS) method, to address selection bias [39]. The MS procedure is described in more detail in Section 3.3. As its name indicates, MS is composed of two parts: matching and smoothing. The matching part is propensity score matching (PSM), in which each treatment unit (have siblings) is matched one-to-one to a comparison unit (no siblings) on the basis of their propensity scores. In the smoothing part, the trend of the matched differences, or unobserved selection, is fitted by a non-parametric smoothing model using local polynomial regression.
Last, as for the mechanism, our paper also differs from the study of Juhn et al., which focused only on parental time [24]. The links of the home environment and parental warmth to children's early development have been found to be robust and positive [30,33,40,41,42,43]. In this paper, home environment means the overall quality of the child care environment as measured by the total score of the Child Care Home Observation for Measurement of the Environment (CC-HOME) inventory. The CC-HOME inventory contains 43 caregiver-report binary-choice items in six subscales [44]. Parental warmth means the parents' warm affectionate behaviors toward their baby as measured by the primary caregiver's responses to six relevant questions [45]. In comparison, the home environment provides general information about the child care environment, whereas parental warmth directly assesses the parents' warm affectionate behaviors in detail. They are described in more detail in Section 3.4, and the items or questions used to measure them are presented in Tables A1 and A2. By showing that having siblings causes parents to spread out their investment in both the home environment and parental warmth, our work sheds light on the relation between the Q-Q trade-off and early childhood development.
The rest of this paper is organized as follows. Section 2 introduces our hypothesis and explains the identification strategies we applied to verify it. Section 3 describes the data and variables. Section 4 reports the empirical results. Section 5 discusses our main results. Section 6 concludes the paper.

Hypotheses
The purpose of our study is to investigate the direct association between having siblings and a newborn infant's early development, as measured by cognitive, language, motor, and social-emotional skills. We put forward the first hypothesis as follows:
Hypothesis 1. Having siblings has impacts on the infant's early development.
In terms of the mechanism, the home environment and parental warmth are two important issues in the literature on early childhood development [33]. More specifically, the home environment builds a foundation for children's skill formation [30,40,41], and parental warmth is crucial for children's early outcomes [42,43]. Parents' investment lies at the center of the Q-Q theory, yet it has been poorly measured and estimated in many studies. Hence, our second hypothesis is as follows:
Hypothesis 2a. Having siblings influences the parents' investment in the home environment.
Hypothesis 2b. Having siblings influences the parents' investment in parental warmth.

Sampling
The data we used in this paper were collected from a sample of rural families in Guizhou province, a typical underdeveloped and minority-inhabited area in western China. In Guizhou, one developing prefecture was randomly chosen as the sample prefecture from which the program randomly selected one county. Similarly, after that, one town was randomly chosen from the list of all towns in the county. The sample town consists of nine villages in total.

Survey Organization
The survey was conducted in Guizhou province in 2017 by the China Reach program of the China Development Research Foundation (CDRF). We first obtained a list of registered births in each village of the sample town from the local regulatory authority. From the list, we identified all households that had babies who were under 24 months old at the time of the survey. We ended up with 446 households that met the criteria. After the sample households were selected, we visited them to conduct one-on-one interviews. After dropping 2 sample households with missing information, our final sample includes 444 infants from nine villages.
The Peking University Institutional Review Board (PU IRB), Beijing, China, approved the ethical assessment of the study (No. IRB00001052-17056), and verbal informed consent was obtained from all study subjects.

Econometric Model
We used multivariate regression analysis to test Hypothesis 1 as follows:
$$\text{bayley}_i = \alpha_0 + \alpha_1\,\text{sibling}_i + X_{1i}\beta_1 + \tau_i + \varepsilon_{1i} \qquad (1)$$
where bayley_i denotes the development of infant i measured by the Bayley Scales of Infant and Toddler Development III (Bayley-III) and comprises four variables, cog_i, lang_i, motor_i, and soemo_i, which are described in detail in the next section. sibling_i is a dummy variable indicating whether the infant has siblings (sibling_i = 1) or not (sibling_i = 0). $X_{1i}$ denotes infant and family characteristics, including gender (male), age in months (month), infants' birthweight (birthweight), infants' birth-height (birthheight), parents' age (fage, mage), employment status (fwork, mwork), education (fedu, medu), minority (fminority, mminority), whether the mother is the primary caregiver (moncare), and the family income groups. $\tau_i$ is the village dummy variable, which controls for unobserved heterogeneity at the village level. $\varepsilon_{1i}$ is the error term, which is assumed to be i.i.d. Since, in the relevant literature, parental socioeconomic status [46], family income [40,47,48], and community environment have been found to be associated with early childhood development [49,50], we controlled for these covariates in the baseline regression.
However, as stated above, selection bias is the main challenge. It can be described by the following equation:
$$E[\text{bayley}_i \mid \text{sibling}_i = 1] - E[\text{bayley}_i \mid \text{sibling}_i = 0] = E[\text{bayley}_{1i} - \text{bayley}_{0i} \mid \text{sibling}_i = 1] + \left\{ E[\text{bayley}_{0i} \mid \text{sibling}_i = 1] - E[\text{bayley}_{0i} \mid \text{sibling}_i = 0] \right\} \qquad (2)$$
where bayley_{1i} and bayley_{0i} denote the potential outcomes of infant i with and without siblings, respectively. The left-hand side is what the OLS estimate captures. The first term on the right-hand side is the average treatment effect on the treated (ATT), which is what we are interested in. However, the second term on the right-hand side, the selection bias, potentially contaminates the estimate.
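The decomposition of the naive group difference into the ATT and the selection bias can be checked numerically. The following is a minimal sketch on toy potential outcomes, which are observable only in a simulation; the function name `decompose_naive_difference` and the toy numbers are hypothetical.

```python
import numpy as np

def decompose_naive_difference(y0, y1, treated):
    """Split the naive treated-vs-control mean gap into ATT and selection bias.

    y0, y1  : potential outcomes without / with siblings (known only in
              simulated data, never in a real survey)
    treated : 1 if the infant has siblings, 0 otherwise
    """
    y0 = np.asarray(y0, dtype=float)
    y1 = np.asarray(y1, dtype=float)
    d = np.asarray(treated, dtype=bool)
    naive = y1[d].mean() - y0[~d].mean()            # observed group difference
    att = (y1[d] - y0[d]).mean()                    # effect on the treated
    selection_bias = y0[d].mean() - y0[~d].mean()   # baseline gap between groups
    return naive, att, selection_bias

# Toy data: every infant loses one point from having siblings, but the
# treated infants would have scored lower even without siblings.
naive, att, bias = decompose_naive_difference(
    y0=[1, 2, 3, 4], y1=[0, 1, 2, 3], treated=[1, 1, 0, 0]
)
print(naive, att, bias)
```

In this toy case the naive gap mixes a true effect with a baseline gap between the groups, which is the contamination that matching on the propensity score is meant to reduce.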
To address this problem, at least in part, we applied the MS method to evaluate the heterogeneous treatment effect of siblings. The matching part of this algorithm is the same as in PSM [51]. According to the sibling dummy variable, the sample was divided into the treatment group (have siblings) and the control group (no siblings). Then, a probit regression model was used to estimate the propensity score for all families, namely, their probability of having more than one child given all the observed covariates. After that, on the basis of the estimated propensity score, each treated unit was matched one-to-one with a control unit by nearest-neighbor matching; as a consequence, the two units in each pair have almost the same propensity score. However, PSM is often sensitive to unobserved selection.
To address this sensitivity, in the smoothing part, the matched differences between the treated and control units were plotted, and local polynomial regression, a non-parametric smoothing model, was used to fit the trend of the matched differences against the propensity score [52]. That is, the MS approach represents treatment effect heterogeneity or unobserved selection non-parametrically rather than through an imposed functional form.
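The two parts of the MS procedure can be sketched in a few lines. This is an illustrative sketch rather than the implementation used in the paper: it takes already-estimated propensity scores as input (the probit step is omitted), matches with replacement, and smooths with a local linear fit using a Gaussian kernel and a fixed bandwidth. These choices, and all function names, are assumptions.

```python
import numpy as np

def match_nearest(ps_treated, ps_control):
    """For each treated unit, index of the control unit with the closest score.

    Assumption: matching is done with replacement.
    """
    ps_treated = np.asarray(ps_treated, dtype=float)
    ps_control = np.asarray(ps_control, dtype=float)
    dist = np.abs(ps_treated[:, None] - ps_control[None, :])
    return dist.argmin(axis=1)

def local_linear_smooth(x, y, grid, bandwidth=0.1):
    """Local linear regression of y on x, evaluated at each grid point.

    Assumption: Gaussian kernel weights with a fixed bandwidth.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.asarray(grid, dtype=float)
    fitted = np.empty(grid.size)
    for k, g in enumerate(grid):
        w = np.exp(-0.5 * ((x - g) / bandwidth) ** 2)
        sw = np.sqrt(w)
        design = np.column_stack([np.ones_like(x), x - g])
        # Weighted least squares; the intercept is the fitted value at g.
        coef = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)[0]
        fitted[k] = coef[0]
    return fitted

def matching_smoothing(ps, treated, outcome, grid, bandwidth=0.1):
    """Return the matched ATT and the smoothed matched differences on grid."""
    ps = np.asarray(ps, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    d = np.asarray(treated, dtype=bool)
    idx = match_nearest(ps[d], ps[~d])
    diffs = outcome[d] - outcome[~d][idx]   # treated minus matched control
    att = diffs.mean()
    curve = local_linear_smooth(ps[d], diffs, grid, bandwidth)
    return att, curve
```

The returned curve plays the role of the fitted trend of matched differences against the propensity score; a flat curve would suggest a homogeneous effect, while a sloped one points to heterogeneity or unobserved selection.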
To test Hypothesis 2, we used the multivariate regression as follows:
$$m_i = \gamma_0 + \gamma_1\,\text{sibling}_i + X_{2i}\beta_2 + \tau_i + \varepsilon_{2i}, \qquad m_i \in \{\text{home}_i, \text{warmth}_i\} \qquad (3)$$
where home_i measures the home environment of infant i, and warmth_i measures the parental warmth toward infant i. They are also described in detail in the next section. $X_{2i}$ controls for family characteristics as in Equation (1), $\tau_i$ still controls for village heterogeneity, and $\varepsilon_{2i}$ is the error term, which is assumed to be i.i.d. Likewise, to reduce the influence of selection bias, we adopted the MS method again to estimate Equation (3) using all the observed covariates of the family and village characteristics mentioned above.

Measurement of Key Variables
The dependent variable in the baseline regression, bayley, is the development of children measured by the Bayley Scales of Infant and Toddler Development III (Bayley-III), a well-known scale that has great reliability and validity. This standardized measurement, originally developed by Nancy Bayley [53], contains a series of play tasks and questions, and the scale scores are internationally applied to evaluate the developmental functioning of infants/toddlers from birth to age 3 [54].
In this study, we adopted its four main subtests (the cognitive, language, motor, and social-emotional scales) to assess the children's cognitive skills (such as playing, attention to objects, and counting), language skills (such as understanding and expression of language), motor skills (such as fine and gross motor skills), and social-emotional skills (such as social responsiveness and self-regulation), respectively. They are represented by the variables cog_i, lang_i, motor_i, and soemo_i. All enumerators attended a week-long training course on how to administer the Bayley-III, and they were blind to the study hypotheses. They administered the test one-on-one with household members using a standardized set of toys and a detailed scoring sheet. The assessments of cognitive, language, and motor skills depend on scores that are given according to the infant's successful completion of the items, while the social-emotional score comes from caregivers' responses to relevant questions. A lower score usually means a higher risk of children experiencing developmental problems in the future [55].
In order to assess the home environment and parental warmth, trained enumerators made a 90-120-minute home visit when the infant and the primary caregiver were both present and the infant was awake. The home environment was assessed by the infant/toddler version of the CC-HOME inventory designed by Bradley et al. [44]. It includes 43 caregiver-report binary-choice items in six subscales: Caregiver Responsivity, Acceptance, Organization, Learning Materials, Caregiver Involvement, and Variety of Stimulation. The items and internal consistencies in terms of Cronbach's alpha are presented in detail in Table A1. The Cronbach's alpha coefficients of the HOME inventory and its six subscales are all larger than 0.7, implying that the internal consistencies are acceptable with this study's sample. The total score for the home environment, i.e., the home variable used in our mechanism analysis, is the sum of the scores for the six subscales.
Parental warmth was assessed by using the primary caregiver's responses to the six questions on how often they showed warm affectionate behaviors to their babies. The six questions take less time and training to administer than most existing measures and are strongly robust and reliable [45]. The questions and internal consistencies are presented in Table A2. The Cronbach's alpha of parental warmth is equal to 0.8, implying its internal consistency is good in this sample, too. The response to each question is on a five-point scale, where 1 indicates never/almost never and 5 indicates always/almost always. The total score for parental warmth, i.e., the warmth variable used in our mechanism analysis, is the sum of scores for the six questions. The lowest possible score is 6 (lowest warmth) and the full score is 30 (highest warmth).
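The scoring rule for parental warmth and the internal-consistency statistic reported for both instruments can be written out explicitly. The sketch below assumes responses are stored as an array with one row per caregiver and one column per question; the function names are hypothetical, and the formula is the standard Cronbach's alpha.

```python
import numpy as np

def warmth_total(responses):
    """Sum six five-point items per caregiver; totals range from 6 to 30."""
    r = np.asarray(responses, dtype=float)
    if r.ndim != 2 or r.shape[1] != 6:
        raise ValueError("expected an (n_caregivers, 6) array")
    if r.min() < 1 or r.max() > 5:
        raise ValueError("responses must lie on the 1-5 scale")
    return r.sum(axis=1)

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total)."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)
```

The same `cronbach_alpha` function applies to the binary CC-HOME items, since the formula only needs item and total-score variances.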

Descriptive Statistics
The distribution of the Bayley-III scores (Figure A1) gives an intuitive impression of the developmental differences between the treatment group (have siblings) and the control group (no siblings). The mean cognitive, language, and social-emotional scores are higher in one-child families, while the difference in the mean motor scores is not obvious. This implies a negative association between having siblings and an infant's development of cognitive, language, and social-emotional skills.
Table A3 reports the sample mean and standard deviation of each variable. The Wald test shows that the differences in the average cognitive, language, and social-emotional scores are statistically significant. Hence, we have reason to believe that an infant's neurodevelopment (except motor skills) is negatively correlated with having siblings, which is consistent with the Q-Q trade-off. Further, the means of several control variables (parents' age and years of schooling, father's employment status, and the indicator for a total family income in the last 12 months between 100,000 and 250,000 yuan) differ significantly between the two groups. Finally, the mean scores of the dependent variables in the mechanism analysis, home and warmth, are also higher in the no-sibling group, indicating that the home environment and parental warmth are both negatively associated with having siblings.

The Sibling Effects on the Infant's Development
Table 1 presents the estimates for the sibling effects on the infant's Bayley score. In Panel A, after successively controlling for the infant's characteristics (Column 2), the family's characteristics (Column 3), and village fixed effects (Column 4), the OLS-estimated negative impact of having siblings on the cognitive score remains statistically significant. Therefore, the baseline regressions show that, after controlling for other factors, the infant's cognitive score in the have-sibling group is lower on average than that in the no-sibling group.
Note to Table 1: (i) Coefficients and standard errors are reported to the nearest 0.01. (ii) In the ordinary least squares (OLS) estimate, the robust standard errors clustered at the village level are presented in parentheses. In the propensity score matching (PSM) and matching-smoothing (MS) estimates, the average treatment effect on the treated (ATT) is reported, and the robust standard errors are obtained by the bootstrap method with 50 replications. (iii) *, **, and *** denote p < 0.1, p < 0.05, p < 0.01 in two-tailed tests, respectively. (iv) The estimates for covariates are presented in detail in Tables A4-A7.
Then, to overcome selection bias, we applied the PSM method to estimate the sibling effects. Column 5 presents the PSM estimate of the effect of having siblings on the cognitive score for the ATT analysis, and it can be compared with the MS estimate shown in Column 6. Both the PSM and MS estimates indicate that the negative effect of having siblings on an infant's cognitive score is highly significant. Furthermore, owing to selection bias, the OLS severely underestimates the absolute value of the sibling effects. The MS-estimated coefficient of the sibling effect on the cognitive score is −0.19, which is closer to a small effect size than to the trivial one implied by the OLS estimation.
The estimates of the sibling effects on infants' language, motor, and social-emotional scores are presented in Panels B-D of Table 1, respectively. The estimated sibling effects on the language and social-emotional scores are also significantly negative, and the PSM and MS estimates indicate that selection bias drives the OLS to underestimate them. However, little evidence supports the assertion that siblings have an adverse impact on an infant's motor skills. The effect size of having siblings is small for the language score (−0.21) and trivial for the social-emotional score (−0.14).
Figures A2 and A3 show the density distribution and the common support of the propensity score before and after matching in the PSM analysis. Before matching, the distributions of the treatment group and the control group differ markedly. After matching, the difference diminishes substantially, and the two distribution curves almost coincide. According to Figure A3, the region of common support between the two groups is large, and most observations fall within it; the common support assumption of the PSM method is therefore satisfied, and the matching result is satisfactory.
Figure 1 demonstrates the advantage of the MS method, which uses local polynomial regression to fit the matched differences. The figure plots the heterogeneous treatment effect in a non-parametric representation, which is preferable to the functional form imposed a priori in PSM when unobserved selection is present. In Figure 1, the X-axis represents the continuous propensity score, and the Y-axis shows the differences in the infants' expected development scores. The traditional assumption of a linear function in PSM is not reasonable here.
The sibling effects on infants' cognitive, language, and social-emotional scores become progressively more negative as the propensity for having siblings increases (Figure 1, Panels A, C, and D). Figure 1 implies that selection bias is particularly serious for families that, according to their observed characteristics, have a greater propensity to have more than one child. For those families, selection plays a more pivotal role in both the fertility decision and the infant's development.
In short, in terms of early childhood development, having siblings exerts a negative influence on an infant's neurodevelopment, including cognitive, language, and social-emotional skills, in rural families in Guizhou, China, and these effects might be detrimental to these children's future development in many aspects. Overall, the Q-Q trade-off holds from the perspective of those newborn babies, and Hypothesis 1 is verified.

The Mechanism behind the Trade-Off
In this section, the mechanisms through which having siblings affects the infant's neurodevelopment are discussed. The discussion is focused on the sibling effects on the home environment and parental warmth, as the influences of these two factors on early childhood development have been emphasized in many studies [31]. We included the infant and family characteristics and village dummy variables as the covariates of the home environment and parental warmth. Table 2 reports the estimates for the sibling effects on the home environment and parental warmth. First, the OLS, PSM, and MS estimates are all significantly negative, which is reasonable. Faced with an increase in the number of children, parents in rural areas, who are constrained by a limited budget, have to reallocate intrahousehold resources. As a result, their investments in the home environment and warm affectionate behaviors decline. Second, the selection bias is more serious in the estimate for sibling effects on the home environment, which is also clearly depicted in Table 2. In short, two key mechanisms are successfully identified. Having siblings hinders the infant's neurodevelopment by lowering the parents' investment in the home environment and parental warmth.

Mediation Analysis
The causal mediation analysis is a three-stage linear regression as follows:
$$\text{bayley}_i = \kappa_1 + c\,\text{sibling}_i + e_{1i}$$
$$m_i = \kappa_2 + a\,\text{sibling}_i + e_{2i}$$
$$\text{bayley}_i = \kappa_3 + c'\,\text{sibling}_i + b\,m_i + e_{3i}$$
where $m_i$ is the mediator (home_i or warmth_i), and the definitions of the variables are the same as in Equations (1) and (3). To avoid unnecessary multicollinearity, here we only identify the pure mediation effect. That is, following a previous study [56], we did not include any infant and family characteristics or village dummy variables in the mediation effect model.
Table 3 presents the estimates for the mediation effect of the home environment. The first stage is the baseline model, so the resulting estimate is the same as the result from the univariate analysis. The sibling effects on infants' developmental skills (except the motor score) are all negative and significant (Row 1, Table 3). The second stage is the mediation model, from which we can see that, as observed in Table 2, having siblings has a substantially adverse impact on the home environment (Row 2, Table 3). In the third stage, the comprehensive model, the sibling and home environment coefficients are both statistically significant in determining the cognitive, language, and social-emotional scores (Rows 3 and 4, Table 3), and the sibling coefficient is smaller in magnitude than that in the baseline model. Furthermore, as the Sobel test has known limits and flaws [57], we used the bias-corrected bootstrap test [58,59] with 95% confidence intervals to examine the mediation effect. The results of the mediation analyses show that the home environment indeed acts as a mediator of the sibling effects on infants' cognitive and language scores, particularly the latter. This can also be inferred from the mediated proportion of the total effect (Rows 7-9, Columns 1-2, Table 3).
Likewise, the estimates of the mediation effect of parental warmth confirm our hypotheses (Table 4). This channel plays a statistically significant role in the sibling effects on infants' language and social-emotional skills (Rows 7 and 8, Columns 2 and 4, Table 4), particularly the latter, for which the mediated proportion of the total effect is 17.3% (Row 9, Column 4, Table 4).
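The three-stage mediation model and the bias-corrected bootstrap interval can be sketched as follows. This is an illustration under assumptions, not the authors' code: it uses the standard product-of-coefficients indirect effect and defines the mediated proportion as indirect over total effect, which the source does not spell out; the function names, the number of replications, and the seed are arbitrary.

```python
import numpy as np
from statistics import NormalDist

def mediation_effects(sibling, mediator, outcome):
    """Three OLS stages without covariates: total, direct and indirect effects."""
    s, m, y = (np.asarray(v, dtype=float) for v in (sibling, mediator, outcome))
    ones = np.ones_like(s)
    x_base = np.column_stack([ones, s])
    total = np.linalg.lstsq(x_base, y, rcond=None)[0][1]   # stage 1: outcome on sibling
    a = np.linalg.lstsq(x_base, m, rcond=None)[0][1]       # stage 2: mediator on sibling
    coef = np.linalg.lstsq(np.column_stack([ones, s, m]), y, rcond=None)[0]
    direct, b = coef[1], coef[2]                           # stage 3: both regressors
    indirect = a * b
    return {"total": total, "direct": direct, "indirect": indirect,
            "proportion": indirect / total}

def bc_bootstrap_ci(sibling, mediator, outcome, n_boot=2000, level=0.95, seed=0):
    """Bias-corrected bootstrap confidence interval for the indirect effect."""
    s, m, y = (np.asarray(v, dtype=float) for v in (sibling, mediator, outcome))
    rng = np.random.default_rng(seed)
    n = s.size
    point = mediation_effects(s, m, y)["indirect"]
    draws = np.empty(n_boot)
    for r in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample infants with replacement
        draws[r] = mediation_effects(s[idx], m[idx], y[idx])["indirect"]
    nd = NormalDist()
    # Bias correction: share of bootstrap draws below the point estimate,
    # clipped so that the inverse normal CDF stays finite.
    p = float(np.mean(draws < point))
    p = min(max(p, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = nd.inv_cdf(p)
    z = nd.inv_cdf(0.5 + level / 2)
    lower_q, upper_q = nd.cdf(2 * z0 - z), nd.cdf(2 * z0 + z)
    lower, upper = np.quantile(draws, [lower_q, upper_q])
    return point, lower, upper
```

A mediator would be judged statistically significant in this sketch when the returned interval excludes zero, which mirrors how a bootstrap test of the indirect effect is usually read.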

Sibling Number Effects
The key independent variable in the multivariate analysis above, sibling, is a dummy variable, so it only has one of two outcomes, namely, whether the infants have siblings or not. The sibling number effect is also an interesting issue related to the Q-Q trade-off. However, because of the One-Child Policy, although relaxed recently, most families in China usually have two or three children at most, i.e., the sibling number is one or two in most of the sample population.
Here, we replaced the variable sibling with num_sibling and repeated the multivariate analysis. Table A8 presents the estimates for the sibling number effects on infants' developmental skills. The results are consistent with the estimates for the sibling effects in Table 1: an increase in the number of siblings also considerably decreases the infant's cognitive, language, and social-emotional scores.
Then, we used the variable num_sibling to re-estimate the mediation effect model and check the robustness of the mediation effect. Tables A9 and A10 report the estimates for the mediation effect of the home environment and parental warmth, respectively. The home environment works as a key mediator of the sibling number effects on the infant's cognitive and language skills, and parental warmth plays a vital mediating role in the sibling number effects on the infant's social-emotional skills. These results are close to those reported in Tables 3 and 4, so the mechanism identification is robust, too.

Discussion
In this paper, we discuss the relation between the famous Q-Q trade-off theory and early childhood development in rural Chinese families. Using survey data collected from families in Guizhou province in 2017, we applied a multivariate analysis to investigate the sibling effects on infants' cognitive, language, motor, and social-emotional skills, which were measured by the well-known Bayley-III scale score.
First, the OLS estimates reveal that, after controlling for the infant's characteristics, family characteristics, and village fixed effects, sibling effects on infants' neurodevelopment (except motor skill) are all negative and statistically significant. Second, the PSM and MS estimates show that having siblings indeed exerts adverse impacts on infants' cognitive, language, and social-emotional skills, but the OLS estimate tends to underestimate this effect because of selection bias. In terms of the magnitude, the real sibling effect sizes on infants' cognitive and language skills are small, and the effect sizes on social-emotional skills are trivial. Third, two key mechanisms are successfully identified here: the home environment and parental warmth. The home environment plays an essential role in sibling effects on infants' cognitive and language scores, while parental warmth plays a vital part in sibling effects on infants' language and social-emotional score. Finally, the sibling number effect on infants' neurodevelopment is consistent with the sibling effects and thus verifies the robustness of the estimates.
Given the multi-factorial measure of the home environment, we further examined which subscale is most strongly correlated with infants' neurodevelopment. When looking at an infant's cognitive skills, the results from our analyses show that acceptance, organization, learning materials, and variety of stimulation have significantly positive effects (Tables A11-A13). When compared, learning materials have the largest effect size. When looking at an infant's language skills, our results show that five out of the six scales have significant effects. The only exception is acceptance. In terms of the magnitude of the estimated coefficient, organization is the biggest, followed by learning materials, and the smallest is involvement. When looking at an infant's social-emotional skills, we find that only the learning materials come out as statistically significant. Taken together, our study provides new evidence that learning materials might be the most effective home environment factor for improving the comprehensive neurodevelopment of infants.
Compared with their urban counterparts, Chinese rural households are more vulnerable to resource constraints. Thus, having an additional child will tend to decrease parents' investment in the home environment and parental warmth for the individual child on average. Using the 2005 inter-census 1% population survey data in China, Qin et al. (2018) found evidence of the Q-Q trade-off only in lower-income areas with less-developed credit markets [60]. Considering that low income and less-developed credit markets are mainly rural phenomena, our finding is consistent with theirs.
We acknowledge a couple of limitations of our study. First and foremost, our sample is not nationally representative, so the conclusions may not be generally applicable to all families in the whole country. As a consequence of the lack of an urban sample, we are unable to examine whether the Q-Q trade-off still holds for urban families with higher income and a more developed credit market. Second, the impact of household resource constraints on the relationship between the Q-Q trade-off and early childhood development could be worth investigating. We leave this issue for future research.

Conclusions
Despite the limitations, all of these findings support the conclusion that the Q-Q trade-off holds in rural families in China's Guizhou province. Having siblings indeed hinders a newborn's early development in this setting, and it may be harmful to their future achievements. The important mechanism driving the trade-off is the reallocation of intrahousehold resources.
We believe that this conclusion is of certain value to local and even national policymakers who are facing the knotty challenge of substantial urban-rural inequality in human capital in today's China.

Acknowledgments:
The team would like to gratefully acknowledge the participants in this study for their cooperation.

Conflicts of Interest:
The authors declare no conflict of interest.
Figure A3. The common support of the propensity score.
Note: (i) The village fixed effect is controlled by 8 village dummies, which are not presented in Table A3 because of limited space. (ii) The statistics reported in the table are the sample means, and the standard deviations are presented in parentheses. (iii) The Wald test reports the t statistics of the differences, mean (control group) − mean (treated group), and the p value is presented in square brackets. *, **, and *** denote p < 0.1, p < 0.05, p < 0.01 in two-tailed tests, respectively.