Unpaid Caregiving and Labor Force Participation among Chinese Middle-Aged Adults

Unpaid family caregivers must consider the economic trade-off between caregiving and paid employment. Prior literature has suggested that labor force participation (LFP) declines with caregiving intensity, but no study has evaluated this relationship by accounting for the presence of both kinks and discontinuities. Here we used respondents of the China Health and Retirement Longitudinal Study baseline survey who were nonfarming, of working age (aged 45–60) and had a young grandchild and/or a parent/parent-in-law. For women and men separately, a caregiving threshold-adjusted probit model was used to assess the association between LFP and weekly unpaid caregiving hours. Instrumental variables were used to rule out the endogeneity of caregiving hours. Of the 3718 respondents in the analysis, LFP for men was significantly and inversely associated with caregiving that involved neither discontinuities nor kinks. For women, a kink was identified at the caregiving threshold of eight hrs/w such that before eight hours, each caregiving hour was associated with an increase of 0.0257 in the marginal probability of LFP, but each hour thereafter was associated with a reduction of 0.0014 in the marginal probability of LFP. These results have implications for interventions that simultaneously advance policies of health, social care and labor force.


Background
With a rapidly ageing population and an increasing presence of women in the labor force, societies face a crisis when confronted with a dramatic increase in the demand for caregiving. In many low- or middle-income countries, middle-aged adults are often unpaid caregivers for their ageing parents (or parents-in-law), whereas grandparents usually take on the parental role for their grandchildren [1,2].
An important case study of unpaid caregiving among family members is China, which, due to decades of falling birth rates and rising life expectancy, has the world's fastest ageing population [3]. Recent data suggest that 90% of Chinese families are in need of either elderly care or childcare, while 40% of them require both [4]. Because China has very limited publicly funded home care, the percentage of adults who are unpaid caregivers to one or more family members is very high. Specifically, studies show that 57%, 40%, 38% and 27% of daughters, sons, daughters-in-law and sons-in-law, respectively, regularly provide care to elderly parents (or in-laws) [5] and 70% of grandparents raise their grandchildren [6]. Notably, grandparenthood generally occurs early in the life course of Chinese adults: on average, 80% of them become grandparents by the age of 55 [7]. This means middle-aged Chinese adults (aged 45-60 years) are very likely to assume multiple caregiving roles and must consider the relevant economic trade-off between providing unpaid familial care and engaging in paid employment.

Literature Review
The relationship between the intensity of unpaid caregiving and the probability of labor force participation (LFP) has been studied extensively in the international literature. A consensus has emerged from these studies that suggests LFP is generally inversely related to more intensive caregiving and that there exists a caregiving threshold beyond which increased caregiving has a larger negative effect on LFP than before that threshold is reached [8]. What remains unclear from the literature is the exact form of such caregiving thresholds, specifically if they constitute kinks in and/or discontinuities to the relationship between caregiving and LFP. Most studies have only located discontinuities but did not conduct analyses to identify potential kinks. Carmichael and Charles [9,10] found two simultaneous discontinuities at caregiving hours CG > 0 and at CG > 10 hrs/w; Heitmueller [11] assessed a single discontinuity at CG > 20 hrs/w following a prior study [12] that had suggested that threshold; Van Houtven et al [2] tested a single discontinuity at either CG > 0 or CG > 20 hrs/w, and Jacobs and colleagues [13,14] examined multiple discontinuities at 0, 5 and 15 hrs/w. Meanwhile, several studies have estimated kinks by testing a set of rather arbitrary thresholds (10, 15 and 20 hrs/w), but they did not assess potential discontinuities at these thresholds [15,16]. To date, no study has simultaneously examined both kinks and discontinuities associated with the relationship between LFP and unpaid caregiving; indeed, neither term ("kink" nor "discontinuity") was specifically mentioned in the referenced articles [8].
This paper aims to fill this gap in the empirical literature. By using nationally representative data from the China Health and Retirement Longitudinal Study baseline survey, we investigated potential thresholds of weekly unpaid caregiving hours to see if they resulted in kinks and/or discontinuities in the relationship between caregiving and labor force participation. Our results will provide actionable implications for policy strategies that support family caregivers to balance their caregiving activities and work commitment.

Conceptual Framework
In the standard labor-leisure choice theory, individuals with family responsibilities must simultaneously allocate time to labor work, leisure and unpaid home care to maximize their utility [17][18][19]. Depending on the number of hours of caregiving, individuals confront the traditional labor-leisure trade-off wherein the decision of whether or not to work depends on the reservation wage, nonwage income and an array of sociodemographic and other contextual variables. Theory predicts that when holding all else constant, an increase in caregiving hours reduces the maximum amount of time available for paid work and thereby induces an availability effect that tends to lower LFP. The resulting relationship between caregiving intensity and LFP depends on a host of intrinsic factors including preference orderings between labor and leisure, social context and the life cycle [20]. Gendered impacts on the choices between caregiving and LFP are predicted to be in line with economic theories of specialization and bargaining [18,21] which may be particularly salient in the case of caring for certain dependents (such as parents or spouse) [22].
Recent work by Van Houtven and colleagues [8] advances the theory by examining the theoretical basis for a caregiving hours threshold. This work assesses labor-leisure preference orderings in which the marginal rate of substitution may change abruptly if leisure time were below a leisure threshold. As people value an incremental leisure time before this threshold more than they do after the threshold, this allows for the possibility that kinks may arise on indifference curves thereby yielding possible kinks and/or discontinuities in the relationship between caregiving hours and LFP.

Materials and Methods
Our empirical work was informed by this theoretical model in order to test for a potentially nonlinear relationship between hours of caregiving and LFP by accounting for possible presence of both kinks and discontinuities. Analysis was conducted on Chinese men and women separately to reflect the gender gap in employment [23]. In the following sections, we describe the data sources (Section 2.1.); the study sample (Section 2.2.); present the variables used in the analysis (Section 2.3.); and detail the statistical procedures, including identifying a caregiving threshold, testing for endogeneity of caregiving hours using instrumental variables (Section 2.4.) and sensitivity analyses (Section 2.5.).

Study Setting and Design
This is a population-based cross-sectional study using person-level data from the China Health and Retirement Longitudinal Study (CHARLS) baseline survey. CHARLS is a nationally representative panel study designed to comprehensively understand social trends, socio-economic well-being and the ageing of community-dwelling Chinese adults aged 45 or older. Between June 2011 and March 2012, a nationally representative sample of 23,422 dwelling units representing potential households was drawn from 150 counties (or districts) and 450 villages (or communities) in 28 provinces across China using multistage probability sampling [24]. Excluding empty or non-resident dwelling units yielded 12,740 age-eligible households for the baseline survey. Within each sampled household, the main respondent, defined to be a family member who was at least 45 years of age with sufficient knowledge about the household, and his/her spouse (if any) were both invited to participate in the baseline survey. This procedure yielded 10,257 households with at least one age-eligible member who agreed to participate in the baseline survey (response rate = 80.5%). A total of 17,708 individuals from these households completed the baseline survey at home using a face-to-face computer-assisted personal interview technique [24].

Study Sample
The analysis included CHARLS baseline participants who were of working age; were not engaged in agricultural work or in an unpaid family business; and had at least one grandchild under the age of 16 or a parent (or parent-in-law) who was still alive. The normal pensionable age in China is 60 for males, 55 for white-collar women (such as teachers and civil servants) and 50 for blue-collar women [25]. Consequently, in order to consider the potential trade-off between paid work and caregiving, we limited the sample to men aged 45-60 and women aged 45-55 (N = 8,603, 48.6%). We then excluded participants who reported being agricultural workers or unpaid family business workers (N = 4,140); self-employed individuals who worked with another hired family employee (N = 264); those who did not report having grandchildren under the age of 16 or parents (or parents-in-law) who were alive (N = 368); and those who had missing data in the survey (N = 113). These procedures resulted in 3,718 participants (21.0% of the total sample) who met the eligibility criteria, comprising 2,268 men (61.0%) and 1,450 women (39.0%).
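The sample flow described above can be retraced with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the variable and function names are introduced here for illustration and are not CHARLS identifiers.

```python
# Counts copied from the study-design and study-sample descriptions.
eligible_households = 12_740
participating_households = 10_257
total_respondents = 17_708        # completed the baseline survey
working_age = 8_603               # men aged 45-60 and women aged 45-55

exclusions = {
    "agricultural or unpaid family business workers": 4_140,
    "self-employed with a hired family employee": 264,
    "no grandchild under 16 and no living parent(-in-law)": 368,
    "missing survey data": 113,
}
men, women = 2_268, 1_450


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


analytic_sample = working_age - sum(exclusions.values())

print("household response rate (%):", pct(participating_households, eligible_households))
print("working-age share (%):", pct(working_age, total_respondents))
print("analytic sample:", analytic_sample)
print("share of all respondents (%):", pct(analytic_sample, total_respondents))
print("men / women (%):", pct(men, analytic_sample), pct(women, analytic_sample))
```

Laying the exclusions out as a mapping makes it easy to confirm that the four exclusion counts and the gender split both add up to the same analytic sample.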

Variables
We used a binary outcome variable to represent an individual's self-reported current labor force participation (LFP) status using the survey question, "Did you work for at least one-hour last week? We consider any of the following activities to be work: earn a wage, run your own business and unpaid family business work, et al. Work does not include doing your own housework or doing activities without pay, such as voluntary work." Those who answered "Yes" were classified as labor force participants, while those who responded "No" were classified as non-participants [26].
The primary exposure variable was the number of hrs/w an individual provided unpaid caregiving services to grandchildren, parents and/or parents-in-law without financial compensation in the last year. In the survey, individuals reported how many hrs/w in the past year they had cared for each dependent (grandchildren, parents and parents-in-law). These responses were summed to yield total weekly unpaid caregiving hours. Those who did not report any caregiving activity over the past year were assigned a value of 0.
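As a minimal sketch of how the exposure variable could be assembled, the function below sums reported hours across dependents and returns 0 when no caregiving was reported. The dictionary keys are hypothetical labels chosen for illustration, not CHARLS variable names, and treating a missing report as zero hours is an assumption based on the description above.

```python
def total_weekly_caregiving_hours(reported_hours):
    """Sum weekly caregiving hours across dependents.

    `reported_hours` maps a dependent type to hours per week, or to None
    when no care was reported for that dependent (counted as 0 here).
    """
    return sum(hours for hours in reported_hours.values() if hours is not None)


# Two hypothetical respondents.
respondent_a = {"grandchildren": 20, "parents": 6, "parents_in_law": None}
respondent_b = {"grandchildren": None, "parents": None, "parents_in_law": None}

print(total_weekly_caregiving_hours(respondent_a))  # 26
print(total_weekly_caregiving_hours(respondent_b))  # 0
```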
We extracted the following person-level characteristics from the survey that had previously been identified in the literature [2,[27][28][29]] as influencing the decision to participate in the labor market and the intensity of caregiving at home: age (years), marital status (currently married vs. not married), highest education (illiterate or primary/elementary school, middle school, high school or college and above), area of residence (urban vs. rural), having work-limiting health conditions (yes vs. no), household size (i.e., number of people in the household; range 1-12) and monthly income of spouse. In sensitivity analysis, we introduced four more covariates: whether the individual held an urban or rural Hukou (household registration), the household income, the place of Hukou registration in terms of the three economic macro-regions (East, Central or West China) [30][31][32] and the tier of the city where respondents had registered their Hukou on the basis of the 2013 version of China's City-Tier Classification (a city that was ranked Tier 2 or above vs. a city below Tier 2) [33].

Statistical Analysis
We first compared the baseline characteristics of respondents stratified by their labor force participation (LFP) status using two-sample tests (t-test or Chi-square test). Probit regression analysis was performed for women and men separately to estimate the association between weekly caregiving hours and LFP. In each analysis, we investigated a potential threshold of caregiving hours by entering three independent caregiving-hours-related variables into the probit model as recently proposed by Van Houtven and colleagues [8]: (1) a continuous variable representing caregiving hours, CG; (2) a dummy variable that indicated whether caregiving hours exceeded a threshold, CGˆ; and (3) an interaction term between caregiving hours and the threshold dummy variable, CG*CGˆ. The three estimated coefficients on these caregiving variables reflect, respectively, the incremental change in the likelihood of LFP from a unit increase in caregiving hours, an abrupt discontinuity in the relationship between caregiving and LFP at the caregiving threshold, and a potential change in the incremental effect of caregiving on LFP once caregiving hours exceed the threshold. In an iterative selection procedure, we tested all potential thresholds of caregiving hours (from 0 to 140 hrs/w in increments of 1-10 h, depending on the availability of observations) to identify the threshold that maximized the likelihood of the corresponding probit model. Using this threshold in the probit model, we conducted two joint Wald tests: (1) first, we tested the null hypothesis that the coefficients of the three caregiving variables were all zero; rejecting this hypothesis confirmed the significance of an overall association between LFP and the set of caregiving variables; (2) next, we tested whether the coefficient of the caregiving threshold dummy variable, CGˆ, and that of the interaction term, CG*CGˆ, were jointly zero to assess whether there was a significant change in the association between caregiving and LFP once caregiving hours were at or beyond the threshold.
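Written out, the threshold-adjusted probit described above can be summarised as follows. This is a sketch in notation introduced here for illustration: the symbols \beta_0, ..., \beta_3, \gamma, X_i (the covariate vector) and c (the threshold) do not appear in the text, and the form assumes that the interaction uses uncentred caregiving hours and that the dummy equals one at or beyond the threshold.

```latex
\begin{aligned}
\Pr(\mathrm{LFP}_i = 1 \mid CG_i, X_i)
  &= \Phi\!\left(\beta_0 + \beta_1\, CG_i + \beta_2\, \widehat{CG}_i
     + \beta_3\, CG_i \cdot \widehat{CG}_i + X_i^{\top}\gamma\right), \\
\widehat{CG}_i &= \mathbf{1}\{CG_i \ge c\}, \\
H_0^{(1)} &: \beta_1 = \beta_2 = \beta_3 = 0, \qquad
H_0^{(2)} : \beta_2 = \beta_3 = 0 .
\end{aligned}
```

In this form the slope of the probit index with respect to caregiving hours is \beta_1 below the threshold and \beta_1 + \beta_3 at or beyond it, so a kink corresponds to \beta_3 \neq 0, while \beta_2 governs the shift in level at the threshold (with uncentred hours, the total change in the index at CG = c is \beta_2 + \beta_3 c). The two null hypotheses correspond to the two joint Wald tests.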
We also performed an instrumental variables analysis to correct for potential endogeneity of caregiving hours [34]. For both women and men, this technique was employed to address two statistical challenges: first, the potential for a reverse relationship running from LFP to caregiving hours; second, the presence of unmeasured confounders (such as low career ambition or a preference for family caregiving). We used three instrumental variables for weekly caregiving hours: the number of grandchildren aged below 16; whether the husband's father was widowed; and whether the wife's father was widowed. These instruments are established in the international literature [2,11,28,34] and meet the required assumptions [35]: (1) as Chinese adults are legally obligated to support and take care of their elderly parents [36] and are likely to take on the parental role for their grandchildren [7], it is reasonable to assume that individuals tasked with heavy unpaid caregiving duties at home (captured by the three instruments) would tend to allocate more time to unpaid caregiving in order to fulfil their obligations; (2) the three instruments are not expected to be correlated with LFP, as prior literature based in developing countries [37][38][39], especially in Asian countries [5,[40][41][42]], has identified the determinants of LFP to be educational attainment and external factors such as urban location.
Using these instruments, we performed a Limited-Information Maximum Likelihood (LIML) procedure [43,44]. This method was chosen over the more commonly used Two-Stage Least-Squares procedure because it results in less bias in the estimates when the instruments are weakly associated with the endogenous variable [45]. In the first equation, the three instruments were entered into a linear regression to predict caregiving hours, controlling for all covariates. In the second equation, caregiving hours were used in the threshold-adjusted probit model to predict LFP, after adjustment for covariates. We used the same caregiving threshold that had been previously identified in the one-step probit regression analysis. We assessed whether the predictions from a model treating the caregiving hours variable as exogenous differed significantly from a model where it was treated as endogenous using two Sargan-Hansen statistics [46,47]. We tested the validity of our instruments with tests of under-identification (using the Anderson canonical correlation LM statistic), over-identification (i.e., the Sargan-Hansen test of over-identifying restrictions) and weak identification (i.e., the Cragg-Donald Wald F-statistic in the first equation). The one-step probit model (without the use of instruments) was deemed more appropriate when we failed to reject the exogeneity of caregiving hours.
Using either the LIML model or the one-step probit model (whichever was deemed to be more appropriate), we predicted the probability of LFP separately for a Chinese man and a Chinese woman with reference-level characteristics who spent between 0 and 140 h a week on unpaid caregiving. We defined a Chinese woman with reference-level characteristics to be 50 years of age, married, with middle school education, living in a rural community with 4 household members, without work-limiting health conditions, and with a spouse who earned 1,466 RMB per month. A Chinese man with reference-level characteristics shared the same characteristics except for being 52 years of age and having a spouse who earned 442 RMB per month.

Sensitivity Analysis
For women and men separately, we conducted three sets of sensitivity analysis. The first set of analysis entails the estimation of four simpler models of unpaid caregiving hours and LFP following the specifications used in the prior literature [2,[9][10][11][13][14][15][16]]. The first model regarded caregiving as a dummy variable (denoting caregivers vs. noncaregivers) considering neither kinks nor discontinuities; the other three models excluded the possibility of a discontinuity or a kink or both. A pairwise likelihood ratio test was performed to compare each simpler model with our model to establish the statistical value of accounting for the presence of both kinks and discontinuities in the estimation of the relationship between unpaid caregiving hours and LFP.
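The likelihood ratio comparison of nested models can be sketched as below. This is an illustrative implementation rather than the Stata routine used in the study: the function name and the log-likelihood values are hypothetical, and only the closed-form chi-square tail probabilities for one or two restrictions are coded (for example, dropping the threshold dummy alone, or dropping both the dummy and the interaction).

```python
import math


def likelihood_ratio_test(llf_full, llf_restricted, df):
    """Return (LR statistic, p-value) for a nested model with `df` restrictions."""
    stat = max(0.0, 2.0 * (llf_full - llf_restricted))
    if df == 1:
        # Upper tail of chi-square(1): P(Z^2 > stat) = erfc(sqrt(stat / 2)).
        p_value = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        # Upper tail of chi-square(2): exp(-stat / 2).
        p_value = math.exp(-stat / 2.0)
    else:
        raise ValueError("closed forms are coded for df = 1 or df = 2 only")
    return stat, p_value


# Hypothetical log-likelihoods, for illustration only.
llf_full_model = -500.0            # CG, threshold dummy and interaction
llf_no_kink_no_jump = -503.8       # CG only (two restrictions)

stat, p_value = likelihood_ratio_test(llf_full_model, llf_no_kink_no_jump, df=2)
print(f"LR = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value indicates that the restricted model fits significantly worse, which is the sense in which the fuller specification adds statistical value.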
Next, we undertook subgroup analyses stratified by education attainment (below middle school vs. middle school or above), types of Hukou (urban vs. rural), household income (below the median income vs. equal to or above the median income) and regions of Hukou (in East, Central and West China; and in a Tier 2 or above vs. a below Tier 2 city). Using the weekly caregiving threshold yielded in the primary analysis, we repeated the analysis on each subgroup to verify the robustness of our primary findings.
Last, separate analyses were conducted to assess the LFP relationship with hours of grandchild care and with hours of eldercare (provided to parents and/or parents-in-law), respectively. For each type of caregiving and for women and men separately, we performed the instrumental variables analysis using the LIML procedure; a one-step probit analysis without the use of instruments was undertaken if endogeneity of caregiving hours was rejected. For the grandchild care analysis, three instruments were used: the presence of grandchildren aged below 16 (yes/no); the number of these young grandchildren; and the number of kindergartens in the community. For the eldercare analysis, another three instruments were used: the number of parents and parents-in-law who were alive; whether any parent or parent-in-law was in poor health (yes/no); and the presence of eldercare facilities in the community (including publicly financed nursing homes, organizations for helping the elderly, elderly activity centers, home-based eldercare centers and elderly primary care centers) [26]. A new caregiving threshold was located in each instance using the same iterative procedure. All analyses were conducted using Stata/SE 15.0 (StataCorp LLC, College Station, TX, USA).

Results
Table 1 reports the baseline characteristics of respondents by LFP status and by gender. Among a total of 2,387 (64.2%) labor force participants, there were 681 women and 1,706 men. Compared to men, women were younger (mean age = 48.8 vs. 51.6 years, p-value < 0.01), lived in smaller households (mean size = 3.5 vs. 3.8 persons, p-value < 0.01) and had a spouse with higher monthly income (mean income = 1,822.8 RMB vs. 502.2 RMB, p-value < 0.01). Labor force participating women were also less likely to be married (94% vs. 97%, p-value < 0.01), to have completed middle school (58% vs. 66%, p-value < 0.01) and to live in rural areas (36% vs. 44%, p-value < 0.01). The hours of unpaid caregiving per week did not differ between labor force participating women and men. We observed similar differences between gender groups among non-participants (N = 1,331, 35.8%): women were younger (mean age = 50.6 vs. 54.4 years, p-value < 0.01), had lower education (percentage of middle school graduates = 47% vs. 59%, p-value < 0.01) and had a spouse with higher monthly income (mean income = 1,149.7 RMB vs. 259.0 RMB, p-value < 0.01). Hours of unpaid caregiving, household size, marital status and urban/rural residence did not differ between women and men who were non-participants.

Endogeneity of Unpaid Caregiving Hours
For both women and men, the three instruments passed the over-identifying restriction test (both p-values > 0.1; see Appendix A), but we failed to reject under-identification on both occasions (both p-values > 0.1). Furthermore, for both gender groups, the three instruments were weak (Cragg-Donald Wald F-statistic = 0.803 and 1.198 for women and men, respectively). Nevertheless, use of the three instruments was still deemed appropriate as the Limited-Information Maximum Likelihood (LIML) estimator tended to be robust to weak instruments [45]. Using these instruments, we failed to reject the exogeneity of unpaid caregiving hours for both women and men (both p-values > 0.1). Consequently, the use of an instrumental variables approach was ruled out in both cases, and accordingly, we only present results of the one-stage probit analysis below (Table 2). Two sets of instrumental variables analyses using a two-stage least-squares and a LIML procedure yielded largely similar results (Appendix A).
Table 2. Association of weekly unpaid caregiving hours and labor force participation estimated by the one-stage probit model.

Marginal Probability (95% CI)
Caregiving hours before the threshold, per hour, CG: 0.0132 *** (0.00726, 0.0191)
† The probit coefficient representing the association between labor force participation and caregiving hours after the threshold was calculated by summing the coefficients of CG and CG*CGˆ. Its p-value represents the joint significance of CG and CG*CGˆ. ** p-value < 0.05, *** p-value < 0.01. CG, caregiving; CI, confidence interval.

Association between Unpaid Caregiving Hours and LFP
For women, testing various caregiving thresholds between 1-140 hrs/w yielded probit models with log-likelihood values ranging from a low of -820.8 (corresponding to a caregiving threshold of 12 hrs/w) to a high of -817.2 (corresponding to a caregiving threshold of eight hrs/w). Hence, the threshold of unpaid caregiving was identified to be eight hrs/w for women. When compared to women who provided less than eight hours of caregiving per week (Appendix B), those offering at least eight hours of caregiving per week were slightly older (mean age = 50.5 years vs. 49.4 years, p-value < 0.01) and less likely to be married (92% vs. 96%, p-value < 0.01), to have graduated from college (2% vs. 5%, p-value < 0.05) and to have work-limiting health conditions (7% vs. 12%, p-value < 0.05). They also lived in larger households (4.1 vs. 3.4 persons, p-value < 0.01) and were less likely to be employed (percentage of labor force participants = 35% vs. 53%, p-value < 0.01).
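The selection step of the iterative procedure amounts to choosing the candidate threshold whose fitted probit model has the highest log-likelihood. The sketch below shows only that step: the function names are hypothetical, the model fit is represented by a caller-supplied function, and the example uses just the two log-likelihood values quoted above because the full profile over 1-140 hrs/w is not reported in the text.

```python
def profile_thresholds(candidates, fit_loglik):
    """Map each candidate threshold to the log-likelihood of its fitted model.

    `fit_loglik` is a caller-supplied function (hypothetical here) that fits
    the threshold-adjusted probit for one threshold and returns its
    maximised log-likelihood.
    """
    return {threshold: fit_loglik(threshold) for threshold in candidates}


def select_threshold(loglik_by_threshold):
    """Return the threshold with the highest maximised log-likelihood."""
    return max(loglik_by_threshold, key=loglik_by_threshold.get)


# Stand-in for a real model fit: look up the two values quoted for women.
reported_for_women = {8: -817.2, 12: -820.8}
profile = profile_thresholds(reported_for_women.keys(), reported_for_women.get)

print(select_threshold(profile))  # 8
```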
Using the eight-hour threshold, we found that before eight hours, each unpaid caregiving hour was significantly associated with a higher likelihood of LFP (probit coefficient = 0.0804, 95% CIs: 0.0123 to 0.149, p-value < 0.05), which corresponds to an increase of 0.0257 (95% CIs: 0.00394 to 0.0474) in the marginal probability of LFP. There was no evidence of any significant abrupt change in LFP at the threshold (probit coefficient = -0.0840, 95% CIs: -0.313 to 0.144, p-value > 0.1). After the threshold, we observed a significant decrease of 0.0271 (95% CIs: -0.0488 to -0.0054) in the slope of the marginal probability of LFP for each additional unpaid caregiving hour above the caregiving threshold (probit coefficient of the interaction = -0.0847, 95% CIs: -0.153 to -0.0165, p-value < 0.05). These findings imply that each unpaid caregiving hour for women beyond eight hrs/w was associated with a reduction of 0.0014 in the marginal probability of LFP. Overall, there was strong evidence of a significant association between unpaid caregiving hours and LFP among women (joint p-value of three caregiving hours variables < 0.001), and that there was a differential association between caregiving hours and LFP below and above the caregiving threshold of eight hrs/w (joint p-value of the threshold and interaction = 0.028).
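The post-threshold figure quoted above follows from adding the reported change in slope to the pre-threshold estimate. The check below is only an arithmetic sketch using the rounded values from the text; the variable names are introduced here for illustration.

```python
# Rounded estimates for women, copied from the paragraph above.
pre_threshold_me = 0.0257      # marginal probability per hour below eight hrs/w
slope_change_me = -0.0271      # change in that slope beyond the threshold
pre_threshold_coef = 0.0804    # probit coefficient on CG
interaction_coef = -0.0847     # probit coefficient on the interaction term

# Post-threshold slope = pre-threshold slope + change in slope.
post_threshold_me = round(pre_threshold_me + slope_change_me, 4)
post_threshold_coef = round(pre_threshold_coef + interaction_coef, 4)

print(post_threshold_me)    # -0.0014, the per-hour reduction quoted in the text
print(post_threshold_coef)  # -0.0043 on the probit index scale
```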
For men, the log-likelihood function of the probit model ranged from -1089 (corresponding to a caregiving threshold of 1 h per week) to -1087 (corresponding to a caregiving threshold of 72 hrs/w). Hence, the caregiving threshold was estimated to occur at 72 h of caregiving per week. Compared to men who provided less than 72 h of caregiving per week (Appendix B), those tasked with heavier caregiving duties were 2 years older on average (mean age = 54.2 vs. 52.2 years, p-value < 0.01), more likely to report work-limiting health conditions (21% vs. 13%, p-value < 0.01), and living in larger households (4.6 vs. 3.7 persons, p-value < 0.01). With regard to LFP status, men providing at least 72 h of caregiving per week were less likely to be employed (percentage of labor force participants = 67% vs. 76%, p-value < 0.01).
Using the 72-h caregiving threshold, we found that before 72 h, each caregiving hour was significantly associated with lower LFP (probit coefficient = -0.00442, 95% CIs: -0.00822 to -0.000612, p-value < 0.05), such that an hourly increment in caregiving reduced the probability of LFP by 0.00119 (95% CIs: -0.00222 to -0.000169) at the margin. There was no significant change in LFP at the caregiving threshold (probit coefficient = 1.262, 95% CIs: -0.178 to 2.702, p-value > 0.1), nor in the relationship between LFP and incremental changes in caregiving hours above versus below that threshold. Specifically, the marginal probability of LFP fell continuously with more caregiving hours in the pre- and post-threshold periods (probit coefficient of interaction = -0.00852, 95% CIs: -0.0212 to 0.00418, p-value > 0.1). In summary, we found strong evidence of an overall association between caregiving hours and LFP for men (joint p-value of three caregiving hours variables = 0.0120), but this association did not depend on the caregiving threshold (joint p-value of the threshold and interaction = 0.160).
In Figure 1, we report the predicted probability of LFP and its 95% CIs with different unpaid caregiving hours for Chinese women with reference-level characteristics. When women were not caregivers, their probability of LFP was 0.535 (95% CIs: 0.400 to 0.670), but with unpaid caregiving, the probability of LFP would initially grow to 0.743 (95% CIs: 0.562 to 0.924) at 7 caregiving hrs/w. At the caregiving threshold of eight hours, there was an estimated, though not statistically significant, discontinuity as the probability of LFP fell abruptly to 0.488 (95% CIs: 0.342 to 0.634), and with caregiving hours beyond the caregiving threshold the probability of LFP fell continuously from 0.488 to just 0.274 (95% CIs: 0.126 to 0.423) once she reached 140 caregiving hours a week.

Figure 1. Predicted probability of labor force participation based on the one-stage probit model for Chinese women with reference-level characteristics. We defined Chinese women with reference-level characteristics to be 50 years of age, married, with middle school education, living in a rural community with 4 household members, did not have work-limiting health conditions, and whose spouse earned 1,466 RMB per month. Grey dashed lines represent the 95% confidence intervals for the predicted probability of labor force participation. LFP, labor force participation.
In Figure 2, we report the predicted probability of LFP and 95% CIs for reference-level Chinese men with different weekly hours of unpaid caregiving. We observed a steady decline in their probability of LFP from 0.710 (95% CIs: 0.669 to 0.752) to 0.597 (95% CIs: 0.493 to 0.700) as their unpaid caregiving hrs/w increased from 0 to 71 h. At the caregiving threshold of 72 h, the probability reached a high of 0.811 (95% CIs: 0.643 to 0.980), but thereafter the probability fell with more caregiving hours, ultimately reaching a low of 0.502 (95% CIs: 0.326 to 0.677) at 140 h of unpaid caregiving a week.

Figure 2. Predicted probability of labor force participation based on the one-stage probit model for Chinese men with reference-level characteristics. We defined a Chinese man with reference-level characteristics to be 52 years of age, married, with middle school education, living in a rural community with 4 household members, without work-limiting health conditions, and with a spouse who earned 442 RMB per month. Grey dashed lines represent 95% confidence intervals for the predicted probability of labor force participation. LFP, labor force participation.
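The point predictions reported for Figures 1 and 2 can be approximately retraced from the rounded probit coefficients given in the text. The sketch below is a consistency check rather than the estimation code used in the study. It rests on three assumptions that are not stated in the text: the threshold dummy equals one at or beyond the threshold, the interaction uses uncentred caregiving hours, and the reference-level intercept (which is not reported) can be backed out from the reported probability at zero caregiving hours. The function and dictionary names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def predicted_lfp(cg, threshold, p_at_zero, b_cg, b_jump, b_inter):
    """Probit prediction for a reference-level person at `cg` caregiving hrs/w."""
    # Assumption: the unreported intercept (covariates at reference levels)
    # is recovered from the reported probability at zero caregiving hours.
    intercept = STD_NORMAL.inv_cdf(p_at_zero)
    # Assumption: the dummy switches on at or beyond the threshold.
    above = 1.0 if cg >= threshold else 0.0
    index = intercept + b_cg * cg + b_jump * above + b_inter * cg * above
    return STD_NORMAL.cdf(index)


# Rounded coefficients and baseline probabilities copied from the text.
WOMEN = dict(threshold=8, p_at_zero=0.535,
             b_cg=0.0804, b_jump=-0.0840, b_inter=-0.0847)
MEN = dict(threshold=72, p_at_zero=0.710,
           b_cg=-0.00442, b_jump=1.262, b_inter=-0.00852)

for label, params, hours in (("women", WOMEN, (0, 7, 8, 140)),
                             ("men", MEN, (0, 71, 72, 140))):
    for h in hours:
        print(label, h, round(predicted_lfp(h, **params), 3))
```

Because the inputs are rounded, the output agrees with the reported probabilities only to within a few thousandths. Confidence intervals cannot be reproduced this way, since they require the full covariance matrix of the estimates.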

Sensitivity Analysis
We report estimation results for the four simpler models in Table 3. For women, we found strong statistical evidence that omitting kinks, discontinuities or both would greatly reduce the performance of the model (all p-values of likelihood ratio tests < 0.05). For men, the results implied that while our model outperformed both the model that treated caregiving as a dummy variable and the discontinuity-only model (both p-values of likelihood ratio tests < 0.05), it was comparable to the kink-only model (p-value < 0.1) and to the model that accounted for neither kinks nor discontinuities (p-value > 0.1).
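The likelihood ratio comparisons described above can be sketched as follows. This is an illustration only: the log-likelihood values are hypothetical, and the number of restrictions is assumed to equal the number of caregiving terms that a simpler nested model drops (for example, two when both the threshold indicator and the interaction are omitted). The chi-square tail probability is computed from its closed form for integer degrees of freedom so that the sketch needs only the standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of half-integer powers.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total

def likelihood_ratio_test(loglik_full, loglik_restricted, n_restrictions):
    """Return the LR statistic and its p-value for nested models."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_sf(stat, n_restrictions)

# Hypothetical log-likelihoods: full model vs. a model with neither kink nor jump.
stat, p_value = likelihood_ratio_test(-1180.4, -1184.9, n_restrictions=2)
print(f"LR statistic = {stat:.2f}, p-value = {p_value:.4f}")
```

A small p-value indicates that the restricted model fits significantly worse, which is the sense in which the full model "outperforms" a simpler one.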
Results of subgroup analyses for Chinese women, stratified by Hukou status, educational level, household income and Hukou region, are reported in Table 4. For women who had urban Hukou status, had at least a middle school education, had household income at or above the median level, or had registered their Hukou in a city below Tier 2, the relationship between unpaid caregiving hours and LFP generally followed the pattern revealed in our primary analysis; that is, before eight hours, LFP tended to increase with more unpaid caregiving hours, but any additional caregiving hours thereafter reduced LFP. For women with rural Hukou status, the pre-threshold increasing association between caregiving hours and LFP diminished (p-value of the caregiving continuous variable > 0.1), and for those with household income below the median level or with their Hukou registered in either the West Chinese region or a city ranked Tier 2 or above, there was no association between unpaid caregiving hours and LFP (joint p-value of the three caregiving hours variables > 0.05).
We report results of the same subgroup analyses for Chinese men in Table 5. We did not identify any association between unpaid caregiving hours and LFP for men who had urban Hukou status, had at least a middle school education, came from a household with income below the median level, or had their Hukou registered in the East or West Chinese regions (all p-values of the joint significance of the three caregiving hours variables > 0.1). For men who had rural Hukou status, did not complete middle school, or had their Hukou registered in the Central Chinese region, LFP decreased continuously with more unpaid caregiving hours, without any discontinuity or kink at the 72-h caregiving threshold. For men whose household income was at or above the median level, LFP was initially unrelated to unpaid caregiving hours before the 72-h threshold (p-value of the caregiving continuous variable > 0.1); at the threshold, there was an increase of 0.969 in the marginal probability of LFP (p-value < 0.05), and LFP decreased with more caregiving hours thereafter (joint p-value of the threshold and interaction < 0.05).
The relationship between LFP and hours of grandchild care was assessed for women and men. We failed to reject the exogeneity of hours of grandchild care for both genders (Appendix C, Table A3); therefore, results of a one-step probit model are reported (Table 6). For women, our analysis yielded 4 hrs/w as an important threshold of grandchild care. Although the three caregiving hours variables were individually insignificant, we found an overall negative association between women's LFP and hours of grandchild care (p-value < 0.01). For men, we identified a negative association between LFP and hours of grandchild care (p-value < 0.01) and a threshold of grandchild care at 72 hrs/w. Before 72 h, each hour of grandchild care was significantly associated with a 0.00199 decrease in men's marginal probability of LFP (p-value < 0.01). There was an abrupt but statistically insignificant rise in LFP at the 72-h threshold, and additional grandchild care hours beyond the threshold continued to lower men's marginal probability of LFP by 0.00199. Hence, the threshold effect among men was insignificant (p-value > 0.05).
Note to Table 6: We report the marginal effect point estimate and the standard error. The one-step probit analysis results are reported for the grandchild care analysis while the instrumental variables analysis results are reported for the eldercare analysis. † The probit coefficient representing the association between labor force participation and caregiving hours after the threshold was calculated by summing the coefficients of CG and CG*ĈG. Its p-value represents the joint significance of CG and CG*ĈG. * p-value < 0.1, ** p-value < 0.05, *** p-value < 0.01.
We also examined the relationship between LFP and hours of eldercare provided to parents and/or parents-in-law. For both women and men, we found strong evidence to reject the exogeneity of eldercare hours, which corroborated the use of the three instrumental variables (Table 6; Appendix C, Table A4). For women, 7 h of eldercare per week was identified as the threshold. Before 7 h, each eldercare hour was significantly associated with a 0.365 increase in women's marginal probability of LFP (p-value < 0.05). Although there was no discontinuity at the 7-h threshold, each eldercare hour thereafter was associated with a reduction of 0.001 in the marginal probability of LFP (p-value < 0.05), which gave rise to a significant threshold effect (p-value < 0.05). For men, the threshold of eldercare occurred at 70 hrs/w. Before 70 h, each eldercare hour was associated with higher LFP, but this association did not reach statistical significance (p-value > 0.05). Neither the discontinuity nor the kink was individually significant (both p-values > 0.1); however, we did identify a significant threshold effect that might indicate a net positive change in LFP once men's eldercare hours reached or exceeded the 70-h threshold (p-value < 0.05). The overall association between men's LFP and eldercare hours was also significant (p-value < 0.05).
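The pre- and post-threshold effects reported in this section can be read off a threshold-adjusted probit index. The following is a sketch under assumed notation rather than a restatement of the estimated model: $\beta_0,\dots,\beta_3$, $\gamma$, the covariate vector $X_i$ and the threshold $c$ are symbols introduced here for illustration, with $\widehat{CG}_i$ denoting the indicator that caregiving hours $CG_i$ have reached the threshold.

```latex
\begin{align}
\Pr(\mathrm{LFP}_i = 1) &= \Phi(\eta_i), \qquad
\eta_i = \beta_0 + \beta_1\, CG_i + \beta_2\, \widehat{CG}_i
       + \beta_3\, CG_i\,\widehat{CG}_i + X_i^{\top}\gamma, \\
\widehat{CG}_i &= \mathbf{1}[\,CG_i \ge c\,], \\
\frac{\partial \eta_i}{\partial CG_i} &=
\begin{cases}
\beta_1, & CG_i < c, \\
\beta_1 + \beta_3, & CG_i > c,
\end{cases} \\
\frac{\partial \Pr(\mathrm{LFP}_i = 1)}{\partial CG_i}
 &= \phi(\eta_i)\,\frac{\partial \eta_i}{\partial CG_i}, \qquad CG_i \ne c.
\end{align}
```

In this form, the post-threshold coefficient is the sum of the coefficients on $CG$ and on the interaction, a kink corresponds to $\beta_3 \ne 0$, and a marginal effect is the index slope scaled by the standard normal density $\phi(\eta_i)$, so its size depends on the covariate values at which it is evaluated.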

Discussion
In this population-based cross-sectional study, we used data from the CHARLS baseline survey to explore the relationship between weekly unpaid caregiving hours and LFP among Chinese women and men. Three major findings emerged from our analysis. First, LFP was significantly associated with caregiving for both genders. Second, although we did identify a caregiving threshold (72 hrs/w) for men, their LFP was generally inversely related to caregiving without any kinks or discontinuities. Third, we identified a statistically significant kink in the relationship among women, whereby their probability of LFP was initially positively associated with caregiving until it reached a caregiving threshold of eight hrs/w, after which the probability of LFP fell continuously with more caregiving hours.

In Contrast to Prior Literature
To the best of our knowledge, our work is the first empirical analysis that simultaneously assessed both kinks and discontinuities in the relationship between caregiving and labor force participation status. Furthermore, we established the statistical value of considering both kinks and discontinuities through examining four simpler models all of which had poorer performance. A recently proposed theoretical model suggests LFP would be inversely related to caregiving intensity with one discontinuity in this relationship [8].
Our analysis attests to this non-linearity, but other findings are not in line with those theoretical predictions. First, among both women and men, we did not identify any statistically significant discontinuity (both p-values > 0.1) in their respective LFP relationships. Second, the relationship between caregiving intensity and LFP for women was not consistently negative: LFP first grew with caregiving hours, reached a peak at 7 hrs/w of caregiving, and then fell from the caregiving threshold of 8 hrs/w onward. While the first discrepancy is largely attributable to the structure of our data, the second warrants further consideration. Our results plausibly imply that, when confronted with the double burden of paid work and caregiving responsibilities, Chinese women are inclined to combine paid work with a moderate amount of caregiving; it is only when these caregiving duties become more time-consuming that they tend to withdraw from the labor market. Hence, our results suggest that while caregiving might exert an adverse impact on the employment opportunities of Chinese men, Chinese women are more likely to balance their work and caregiving activities, at least until their intensity of caregiving reaches the caregiving threshold. These findings are unique in the international literature and contrast with prior studies that suggest caregivers are generally more likely to withdraw from the labor market given more intensive caregiving [2,9,11,15,48,49]. We add to the literature by identifying a segment of the LFP relationship with caregiving intensity that is not exclusively decreasing, at least for women who provide unpaid care up until the threshold.
Regarding potentially significant thresholds of weekly unpaid caregiving hours, prior studies have explored several candidates (including 0, 5, 10, 15 and 20 h), but these were either chosen conveniently in increments of 5 h or were loosely based on prior findings [12]. For the first time, we were able to locate two caregiving thresholds, one for women and one for men, that were statistically grounded, and we verified the effects of these thresholds by joint hypothesis testing. No Asian-based study has previously examined a caregiving threshold, so our study represents an advancement in that regard [5,41,50–52].
Another novelty of our analysis was to simultaneously deal with the potential endogeneity of unpaid caregiving hours and empirically locate a caregiving threshold. Although caregiving hours are considered exogenous in the theoretical model proposed by Van Houtven et al. [8], we used instrumental variables to statistically rule out the potential for reverse causality and unmeasured confounding while embedding a maximum likelihood-based procedure to identify a significant caregiving threshold. The popularity of instrumental variables is well established in the health economics literature, and at least five studies have applied this technique to understand the causal role of unpaid caregiving in labor market outcomes [2,11,28,34,50]. However, there is a paucity of work that jointly uses instrumental variables and a selection procedure to detect thresholds [53]. Our paper demonstrates that it is feasible to combine such methods, and we hope that it may encourage others to replicate them when assessing a complex causal relationship involving potential kinks and/or discontinuities.
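One generic way to locate a threshold by maximum likelihood is a grid search: fit the threshold-adjusted probit at each candidate threshold and keep the candidate with the highest log-likelihood. The sketch below illustrates that idea only and is not the procedure used in this study. It treats caregiving hours as exogenous (no instrumental variables), uses simulated data with hypothetical coefficients, and relies on a small hand-written Fisher scoring routine so that it runs with the standard library alone; all function names are illustrative.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()

def design_row(hours, threshold):
    """Intercept, hours, threshold indicator (discontinuity), interaction (kink)."""
    above = 1.0 if hours >= threshold else 0.0
    return [1.0, hours, above, hours * above]

def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def clipped_prob(eta):
    """Phi(eta), kept away from 0 and 1 so that logs and ratios stay finite."""
    return min(max(PHI.cdf(eta), 1e-10), 1.0 - 1e-10)

def fit_probit(rows, y, iterations=20):
    """Fit a probit model by Fisher scoring; return (coefficients, log-likelihood)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        # Tiny ridge on the diagonal: a numerical safeguard, not part of the model.
        info = [[1e-9 if i == j else 0.0 for j in range(k)] for i in range(k)]
        score = [0.0] * k
        for x, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, x))
            p, dens = clipped_prob(eta), PHI.pdf(eta)
            weight = dens * dens / (p * (1.0 - p))
            resid = (yi - p) * dens / (p * (1.0 - p))
            for i in range(k):
                score[i] += resid * x[i]
                for j in range(k):
                    info[i][j] += weight * x[i] * x[j]
        beta = [b + step for b, step in zip(beta, solve(info, score))]
    loglik = 0.0
    for x, yi in zip(rows, y):
        p = clipped_prob(sum(b * v for b, v in zip(beta, x)))
        loglik += math.log(p) if yi == 1 else math.log(1.0 - p)
    return beta, loglik

def search_threshold(hours, y, candidates):
    """Return (threshold, coefficients, log-likelihood) with the best fit."""
    fits = []
    for c in candidates:
        beta, loglik = fit_probit([design_row(h, c) for h in hours], y)
        fits.append((loglik, c, beta))
    loglik, c, beta = max(fits, key=lambda f: f[0])
    return c, beta, loglik

# Simulated data with a kink at 8 hours (all values hypothetical).
random.seed(0)
hours = [random.uniform(0, 40) for _ in range(300)]
y = []
for h in hours:
    index = -0.2 + 0.12 * h if h < 8 else 0.76 - 0.03 * (h - 8)
    y.append(1 if random.random() < PHI.cdf(index) else 0)

best, beta, loglik = search_threshold(hours, y, candidates=range(4, 37, 4))
print("selected threshold:", best, "log-likelihood:", round(loglik, 2))
```

In applied work the probit fit would normally come from an established statistics package, and the significance of the selected threshold would still need to be assessed separately, for example with joint tests of the threshold and interaction terms.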
Our work identified a significant positive association between LFP and having work-limiting health conditions. Specifically, having a work-limiting health condition was associated with increases of 0.539 and 0.434 in the marginal probability of LFP for women and men, respectively. While this positive association may seem counter-intuitive and contrasts with the international literature [54], it is important to note that our study sample comprises nonfarming, working-age Chinese adults (aged 45–60 years) who face retirement, if they are not already retired [25]. For our study participants, therefore, the decision is whether to exit the labor market or to continue working until they reach their mandatory age of retirement. In this way, our results imply that having a work-limiting health condition might act as a proxy for low levels of accumulated wealth, whereby healthy individuals (with higher accumulated wealth) tend to retire earlier than those with work-limiting health conditions. The latter, who tend to have lower wealth and greater financial insecurity, are likely to work towards their mandatory retirement age in order to live more independently in older age. Future studies with data on common measures of accumulated wealth, such as net worth, home ownership and total assets, are needed to confirm that work-limiting health conditions act as a proxy for low wealth [55]. As China is ready to raise the mandatory retirement age [56], it is important to monitor the trend of employment among those with work-limiting health conditions to allow for the design of welfare programs that aid the well-being of those individuals.

Policy Implications
Our findings have important implications for policy makers. Within China's institutional and cultural context, unpaid caregiving by family members is expected to remain the predominant source of care in future years [36]. As such, policy makers need to be well informed about the trade-off between increased unpaid caregiving and erosion in labor market participation. Interventions that help unpaid caregivers better balance their caregiving commitments and labor market responsibilities have the potential to advance both sets of activities. This could be accomplished in many ways, including but not limited to more flexible work hours, paid leave for caregiving and care allowances. Depending on the type of caregiving, targeted interventions could be implemented to benefit childcare and eldercare providers. These include publicly funded daycare, expanded insurance coverage for children with complex medical needs, government assistance for long-term care accommodations for seniors and the establishment of community-based eldercare facilities. Such programs have already been implemented successfully in some western countries with proven effectiveness in alleviating the burden of caregiving and supporting caregivers to balance paid work and caregiving obligations [57]. Moreover, because we found that Chinese women and men react differently when confronted with the double burden of caregiving and employment, policies need to reflect this gendered difference. Specifically, family-friendly policies need to target women who combine work and caregiving in order to enhance their ability to actively take part in both unpaid home care and labor activities, with the goal of advancing their well-being. There is evidence from some western countries that such family-friendly policies targeting women are promising tools for promoting higher economic activity among women in addition to improving the work-family balance for both genders [58].
Furthermore, welfare programs that support male employees with family caregiving duties are nearly non-existent in China [59]. Because we found that middle-aged Chinese men, especially those providing care to grandchildren, tend to withdraw from work as caregiving tasks increase, efforts must be made to design interventions that aid men who engage in both paid work and unpaid caregiving.

Limitations
Our study has a number of limitations that are common in observational studies using cross-sectional survey data. First, the accuracy of these data relied entirely on self-reporting by participants. However, the CHARLS is a nationally representative survey with rigorous sampling procedures and well-established survey instruments [26], which should lead to reliable responses among participants. Second, our study is situated solely in China, which impedes our ability to generalize the findings to other countries. However, we do believe that the increasing burden of unpaid caregiving is a shared concern worldwide [1,2], and the results from our analysis provide insights that are applicable to an international context. Furthermore, the significance of our study is that, for the first time, a complex relationship between LFP and unpaid caregiving that entails potential kinks and discontinuities has been empirically examined. In this way, our work provides important statistical insights and paves the way for others to replicate this analysis using international data. We were unable to account for the effect of policy reforms that occurred after 2011, including the replacement of the one-child policy with a two-child policy [60], the expansion of welfare programs for disabled persons [61] and the extension of maternity/paternity leave in some regions [59]. In particular, China launched a Hukou reform in May 2010, starting in Guangzhou, by introducing the Unified Residence Hukou as a third class of Hukou beyond Agricultural Hukou and Non-agricultural Hukou [62]. Due to the nascent status of this new Hukou class, only 0.6% (N = 107) of all CHARLS baseline survey participants reported having this class of Hukou. Hence, future studies need to assess the impact of this Hukou reform in the analysis of caregiving and labor engagement.
Furthermore, we were unable to establish a causal relationship between unpaid caregiving and LFP because of the cross-sectional nature of our data, nor could we assess the possibility of multiple meaningful caregiving hours thresholds because of our relatively small sample size [8]. Hence, future studies with access to larger and more recent longitudinal data need to revisit this topic in order to examine the causal role of unpaid caregiving in various labor market outcomes in the current era. Finally, we were unable to account for potential differences among individuals' region of Hukou registration, the region where unpaid caregiving activities took place and the region of labor force participation, and thereby did not identify subgroups of workers (such as rural migrant workers) whose places of caregiving and working might differ [63]. Future studies with those data will provide additional insights on how middle-aged Chinese adults find the balance between caregiving and working.

Conclusions
This study offers important empirical insights regarding the complex relationship between the intensity of unpaid caregiving and labor force participation among Chinese women and men. The findings help inform health and social care policy as well as labor force policy in the face of an aging population. Policies that assist unpaid caregivers to maintain balance in their caregiving and labor market activities are of universal importance. Moreover, there are opportunities to extend the methodology to other labor market outcomes that may be impacted by unpaid caregiving, such as hours of work and hourly wages.

Data Availability Statement: Publicly available datasets were analyzed in this study. These data can be found here: http://charls.pku.edu.cn/pages/data/2011-charls-wave1/en.html.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Table A1. Results of a two-stage least-square and a limited-information maximum likelihood procedure using the proposed instruments.
We report the marginal effect point estimate and the standard error. We omitted the variable representing individual's marital status due to multicollinearity. † The probit coefficient representing the association between labor force participation and caregiving hours after the threshold was calculated by summing the coefficients of CG and CG*ĈG. Its p-value represents the joint significance of CG and CG*ĈG. IV, instrumental variables. * p-value < 0.1, ** p-value < 0.05, *** p-value < 0.01.
We report the mean and the standard deviation. CG, caregiving; hr, hours; wk, weeks. * p-value < 0.1, ** p-value < 0.05, *** p-value < 0.01.

Table A3. Results of an association between labor force participation and weekly hours of grandchild care among women and men.
We report the marginal effect point estimate and the standard error. † The probit coefficient representing the association between labor force participation and caregiving hours after the threshold was calculated by summing the coefficients of CG and CG*ĈG. Its p-value represents the joint significance of CG and CG*ĈG. LIML, limited-information maximum likelihood; IV, instrumental variables. * p-value < 0.1, ** p-value < 0.05, *** p-value < 0.01.

Table A4. Results of an association between labor force participation and weekly hours of eldercare among women and men.
We report the marginal effect point estimate and the standard error. † The probit coefficient representing the association between labor force participation and caregiving hours after the threshold was calculated by summing the coefficients of CG and CG*ĈG. Its p-value represents the joint significance of CG and CG*ĈG. LIML, limited-information maximum likelihood; IV, instrumental variables. * p-value < 0.1, ** p-value < 0.05, *** p-value < 0.01.