4.1. Results from the Fuzzy Regression Discontinuity Design
In this study, our primary objective was to determine whether PM2.5 concentrations influence the level of pension contributions under URRPS and UEBPI. To assess whether this variable improves model performance and predictive power, we used the annual maximum PM2.5 concentration as the explanatory variable and constructed two separate models, one for URRPS pension contributions and one for UEBPI pension contributions. The two models are as follows.
Case 1: The model was designed to determine the impact of PM2.5 concentrations exceeding the 35 μg/m³ policy threshold on the pension contributions of URRPS, controlling for year, province, log-transformed rural population size, log-transformed average years of schooling, log-transformed average disposable income of rural residents, and log-transformed degree of government intervention;
Case 2: The model was designed to determine the impact of PM2.5 concentrations exceeding the 35 μg/m³ policy threshold on the pension contributions of UEBPI, controlling for year, province, log-transformed urban employee population size, log-transformed average disposable income of urban employees, level of tax liability, and industrial structure.
4.1.1. Impact of Severe PM2.5 Concentrations on Changes in Pension Contributions of URRPS
First, to identify the causal impact of severe PM2.5 concentrations on changes in pension contributions of URRPS, we implemented a fuzzy regression discontinuity design (Fuzzy RDD) within a two-stage least squares (2SLS) framework. The PM2.5 cutoff is set at 35 μg/m³, consistent with the Chinese national air quality standard. We defined a binary instrumental variable Z equal to 1 if the annual maximum PM2.5 concentration in a province exceeds this threshold and 0 otherwise. The results of the first-stage regression and the second-stage IV regression are reported in Table 1. For each control and independent variable, the table shows the estimated coefficient, its significance level (*, **, and ***), and its standard error; in the second-stage IV regression, standard errors are clustered by province.
In the first-stage regression, the dependent variable is pollution exposure, measured as the three-year moving average of PM2.5. The first-stage model includes Z (the instrument), a flexible polynomial in the running variable, their interaction, and log-transformed control variables (income, education level, government expenditure, and rural population), along with province and year fixed effects. The results showed that the cutoff indicator Z is not individually significant, whereas the interaction between Z and the running variable has a positive and sizeable coefficient, indicating that the slope shift at the cutoff is highly significant. This pattern is consistent with the logic of a fuzzy RDD: crossing the 35 μg/m³ standard changes the conditional relationship between the running variable and long-term exposure and thereby raises exposure probabilistically rather than deterministically. Meanwhile, the quadratic term of the running variable was significantly negative, indicating a nonlinear trend near the threshold. Additionally, the first-stage F-statistic indicated that the instrument is informative, and the R² of 0.97 indicated a good first-stage fit, together supporting the validity of the Fuzzy RDD approach in this study. We further validated the Fuzzy RDD with a first-stage diagnostic figure that plots long-term PM2.5 exposure against the running variable near the 35 μg/m³ threshold.
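The design variables described above can be assembled as follows. This is a minimal sketch, not the authors' code, and it rests on assumptions: records are hypothetical (province, year, annual maximum PM2.5) tuples, the running variable is taken to be the annual maximum PM2.5 centred at 35 μg/m³, and the three-year moving average is a trailing window over years t−2 to t.

```python
from collections import defaultdict

CUTOFF = 35.0  # national annual PM2.5 standard (μg/m³)


def build_frdd_rows(records):
    """Build fuzzy-RDD design variables from (province, year, max_pm25) tuples.

    Hypothetical layout: one record per province-year. Exposure is a trailing
    three-year moving average (years t-2, t-1, t); rows without three
    consecutive years of data are skipped.
    """
    by_province = defaultdict(dict)
    for province, year, pm in records:
        by_province[province][year] = pm

    rows = []
    for province, series in by_province.items():
        for year, pm in sorted(series.items()):
            window = [series.get(year - k) for k in (2, 1, 0)]
            if None in window:
                continue  # need three consecutive years for the moving average
            r = pm - CUTOFF                 # running variable, centred at the cutoff
            z = 1.0 if r > 0 else 0.0       # instrument: 1 if the cutoff is exceeded
            rows.append({
                "province": province,
                "year": year,
                "exposure": sum(window) / 3.0,
                "R": r,
                "Z": z,
                "Z_x_R": z * r,             # slope shift above the cutoff
                "R_sq": r * r,              # quadratic term in the running variable
            })
    return rows
```

Keying each province's series by year, rather than shifting rows, prevents a window from silently spanning a missing year.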
Figure 4 shows that the mean jump in long-term exposure at the 35 μg/m³ threshold is limited, but to the right of the threshold, exposure grows more rapidly with the running variable. This suggests that the threshold affects long-term exposure mainly by changing the marginal relationship between exposure and the running variable, which provides empirical support for local IV identification by the FRDD and is consistent with the significant interaction term in the first-stage regression.
In the second-stage IV regression, the dependent variable is the actual level of URRPS pension contributions. The endogenous variable is pollution exposure, defined as the three-year moving average of PM2.5, and it is estimated by IV using the threshold indicator Z, its interaction with the running variable, a quadratic polynomial in the running variable, and the same set of covariates as in the first stage. The coefficient on pollution exposure is −34.36 and significant at the 1% level, suggesting that, within the neighborhood of the cutoff, a 1 μg/m³ increase in PM2.5 exposure is associated with a decrease of about 34.36 units in URRPS contributions. Other covariates, such as education level and government intervention, show the expected signs and significant effects, indicating that economic capacity and demographic structure also play substantial roles in URRPS pension contributions, in line with previous studies. Taken together, these findings support the hypothesis that higher fine-particulate exposure causally reduces URRPS pension contributions near the regulatory threshold; they also agree with the first-stage evidence of a significant slope change at the cutoff, reinforcing the causal interpretation under the local RDD assumptions.
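The two-stage logic can be illustrated on synthetic data. The sketch below is hypothetical and simplified: it omits covariates and fixed effects, treats Z and Z × R as excluded instruments with R and R² as controls (one reading of the specification above), and simulates an unobserved confounder so that naive OLS is biased while 2SLS recovers the true effect of −2. Running the two stages as separate OLS regressions gives the correct 2SLS point estimate, but the second-stage standard errors from such a manual procedure are not valid; a dedicated IV routine is needed for inference.

```python
import random

CUTOFF = 35.0  # national annual PM2.5 standard (μg/m³)


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x_rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(row[i] * row[j] for row in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)


def simulate(n=5000, seed=7):
    """Synthetic province-years with an unobserved confounder u (hypothetical data-generating process)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        pm = rng.uniform(CUTOFF - 20.0, CUTOFF + 20.0)
        r = pm - CUTOFF                      # running variable
        z = 1.0 if r > 0 else 0.0            # instrument
        u = rng.gauss(0.0, 1.0)              # confounder of exposure and contributions
        exposure = 40.0 + 0.3 * r + 0.8 * z * r + u + rng.gauss(0.0, 1.0)
        contrib = 100.0 - 2.0 * exposure + 0.5 * r + 3.0 * u + rng.gauss(0.0, 1.0)
        data.append((r, z, exposure, contrib))
    return data


def fuzzy_rdd_2sls(data):
    """Two explicit OLS stages; Z and Z*R are excluded instruments, R and R^2 are controls."""
    exposure = [e for _, _, e, _ in data]
    contrib = [c for _, _, _, c in data]
    # First stage: exposure on [1, R, R^2, Z, Z*R]
    first_x = [[1.0, r, r * r, z, z * r] for r, z, _, _ in data]
    g = ols(first_x, exposure)
    fitted = [sum(gi * xi for gi, xi in zip(g, row)) for row in first_x]
    # Second stage: contributions on [1, R, R^2, fitted exposure]
    second_x = [[1.0, r, r * r, fh] for (r, _, _, _), fh in zip(data, fitted)]
    return ols(second_x, contrib)[3]


data = simulate()
iv_effect = fuzzy_rdd_2sls(data)
naive = ols([[1.0, r, r * r, e] for r, _, e, _ in data], [c for _, _, _, c in data])[3]
print(f"2SLS: {iv_effect:.2f}  naive OLS: {naive:.2f}  (true effect: -2.00)")
```

In this simulated setting, the naive regression is pulled toward zero by the confounder, while the instrumented estimate stays close to −2.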
To further examine the negative impact of PM2.5 on URRPS pension contributions, Figure 5 illustrates the nonlinear relationship between PM2.5 exposure and URRPS pension contributions, using quadratic polynomial regressions around the 35 μg/m³ policy threshold. The fitted curves show contributions rising with pollution under low-pollution conditions and declining once pollution exceeds the threshold. This pattern may reflect a mechanism in which, in low-pollution zones, economic activity rises in tandem with pollution and raises contributions, whereas above the standard, health-related absenteeism and compliance costs dominate and reduce the ability and willingness to contribute to the pension. Meanwhile, the visible discontinuity at the cutoff supports the presence of a local treatment effect, consistent with the 2SLS estimate from the Fuzzy RDD, and suggests that high pollution levels significantly undermine pension funding outcomes.
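The quadratic fits behind a plot like Figure 5 can be sketched as below. This is a minimal, hypothetical illustration rather than the authors' plotting code: it assumes that a separate quadratic in the centred running variable (PM2.5 − 35) is fitted on each side of the cutoff and reads the discontinuity as the gap between the two fitted values at the cutoff. The function names and the synthetic data in the usage lines are invented for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quad_fit(xs, ys):
    """Least-squares fit y = a + b*x + c*x^2 via the normal equations and Cramer's rule."""
    s = [sum(x ** k for x in xs) for k in range(5)]            # power sums of x
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] for i in range(3)]       # X'X for columns 1, x, x^2
    d = det3(a)
    coefs = []
    for j in range(3):
        m = [row[:] for row in a]
        for i in range(3):
            m[i][j] = t[i]   # replace column j with X'y
        coefs.append(det3(m) / d)
    return tuple(coefs)


def jump_at_cutoff(pm, contrib, cutoff=35.0):
    """Gap between the right and left quadratic fits, evaluated at the cutoff."""
    left = [(p - cutoff, y) for p, y in zip(pm, contrib) if p <= cutoff]
    right = [(p - cutoff, y) for p, y in zip(pm, contrib) if p > cutoff]
    a_left = quad_fit(*zip(*left))[0]    # intercept = fitted value at R = 0
    a_right = quad_fit(*zip(*right))[0]
    return a_right - a_left


# Hypothetical noise-free data: rising below the cutoff, falling above it, with a drop of 3.
pm = [15 + 0.5 * i for i in range(81)]   # 15.0 ... 55.0 μg/m³
contrib = [10 + 0.5 * (p - 35) - 0.01 * (p - 35) ** 2 if p <= 35
           else 7 + 0.2 * (p - 35) - 0.02 * (p - 35) ** 2 for p in pm]
print(jump_at_cutoff(pm, contrib))
```

Centring the running variable at the cutoff makes each side's intercept directly equal to its fitted value at 35 μg/m³, so the jump is simply the difference of intercepts.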
4.1.2. Impact of Severe PM2.5 Concentrations on Changes in Pension Contributions of UEBPI
Next, to identify the causal impact of severe PM2.5 concentrations on changes in the pension contributions of UEBPI, we applied a similar Fuzzy RDD within a two-stage least squares (2SLS) framework. Unlike the control set used in the previous analysis of PM2.5 and UEBPI pension contributions, the current specification uses a different set of control variables in the Fuzzy RDD.
In the first stage of the Fuzzy RDD, we regress pollution exposure (the three-year moving average of PM2.5) on the instrumental variable Z (an indicator equal to 1 when PM2.5 exceeds the 35 μg/m³ cutoff), the running variable, and their interaction, together with control variables including log-transformed income, tax revenue, industrial structure, log-transformed population, and province and year fixed effects. The first-stage R² of 0.98 indicates a good model fit.
These results resemble the first-stage results for URRPS: although the coefficient on Z (−0.929) is not individually significant, the slope changes sharply at the cutoff, as the interaction term is positive and statistically significant, indicating a stronger marginal association between the running variable and long-term exposure once the 35 μg/m³ standard is exceeded. Meanwhile, the quadratic term in the running variable is negative (−24.5217) and significant at the 1% level, supporting the use of piecewise polynomial controls in the second-stage regression. Overall, the first stage shows no sizable mean jump in exposure at 35 μg/m³ but a statistically pronounced kink. The cutoff indicator and its interaction therefore provide exogenous local variation in long-term pollution exposure, satisfy the relevance condition in the first stage of the FRDD model, and offer a reliable basis for estimating the local causal effect of exposure on UEBPI pension contributions in the second stage.
Table 2 reports the second-stage results. For each control and independent variable, the table shows the estimated coefficient, its significance level (*, **, and ***), and its standard error.
The second-stage results show that higher PM2.5 exposure significantly reduces actual UEBPI pension contributions within the neighborhood of the threshold. The coefficient on treatment is negative (−27.5573) and significant at the 5% level, suggesting that, near the cutoff, a 1 μg/m³ increase in PM2.5 exposure is associated with a decrease of about 27.5573 units in UEBPI pension contributions. Although the interaction term is not significant in the second stage, the quadratic term of the running variable is positive and significant at the 10% level (coefficient = 4.5838, p < 0.1), suggesting a localized upward trend in pension contributions near the PM2.5 threshold. Several control variables further support this economic interpretation. The coefficient on urban employee population size is negative and significant at the 1% level (−2624.235, p < 0.01), suggesting that more populous provinces may be more vulnerable to pollution shocks in terms of pension system performance. The industrial structure variable has a positive effect that is significant at the 1% level, suggesting that service-driven economies may be better able to sustain pension growth under unfavorable environmental conditions. Overall, these findings suggest that when PM2.5 exceeds the 35 μg/m³ threshold, severe pollution may inhibit the ability of local governments or employers to sustain strong growth in pension contributions, even when underlying economic conditions would support such growth.
Overall, the Fuzzy RDD results for UEBPI and URRPS share several features despite the substantial differences in system design. First, both pension systems exhibit statistically significant breakpoint effects around the national annual PM2.5 standard of 35 μg/m³, suggesting that air pollution can significantly affect pension contribution levels through behavioral adjustment mechanisms in both urban and rural settings; the LATE estimates support a causal effect of PM2.5 in both cases. Second, the first-stage coefficient on the interaction between the instrument Z and the running variable is significant in both models, indicating a stronger marginal association between the running variable and long-term exposure above the threshold; exceeding the pollution threshold thus provides an exogenous source of variation in pension contributions [63].
4.1.3. Robustness of Fuzzy Regression Discontinuity Design Estimates
To validate the robustness of our regression discontinuity design (RDD) estimates, we performed a bandwidth sensitivity analysis. The choice of bandwidth in RDD is crucial because it determines the range of data around the cutoff used for local estimation. A narrow bandwidth may lead to high variance because of the small sample, whereas an overly wide bandwidth may introduce bias by including observations far from the threshold, potentially violating the local randomization assumption [57,64]. Therefore, following standard practice, we estimate the treatment effects across multiple alternative bandwidths and assess whether the results remain consistent. This diagnostic helps ensure that the estimated causal effects are not sensitive to arbitrary bandwidth choices, thereby reinforcing the credibility of the findings [65].
To assess the robustness of the estimated treatment effect of PM2.5 on rural pension contributions under the RDD, we applied the method of Calonico et al. [59] to determine the optimal bandwidths and conducted the bandwidth sensitivity analysis using multiple bandwidths around the 35 μg/m³ policy cutoff.
Figure 6 shows that the point estimates are negative at all bandwidths, indicating that the negative local average treatment effect of long-term PM2.5 exposure on pension contributions is directionally robust within the threshold neighborhood. The point estimates shrink in magnitude from approximately −3.5 to approximately −2 as the bandwidth widens from ±25 and ±30 μg/m³ to ±35 and ±40 μg/m³. At ±35 μg/m³, the 95% confidence interval lies entirely below zero, so the estimate is statistically significant. The results are consistent across reasonable bandwidths, supporting the conclusion that the local average treatment effect of long-term PM2.5 exposure on pension contributions is negative.
4.2. Results from the Double Machine Learning with XGBoost and Causal Forest Learner
The FRDD results indicate that PM2.5 and the other control variables have nonlinear relationships with pension contributions. Moreover, the Fuzzy RDD uses only the 35 μg/m³ PM2.5 threshold as an instrument and therefore identifies only the LATE for provinces near the cutoff, whereas we also wish to estimate the global average treatment effect (ATE) of PM2.5 on pension contributions. This requires modeling the relationship between actual PM2.5 levels and pension contributions flexibly through machine learning. We therefore applied the double machine learning (DML) framework with XGBoost and Causal Forest (Ranger) as nonlinear learners. The dependent variable is total pension contributions under URRPS and UEBPI, respectively, and the key treatment variable is the annual maximum PM2.5 concentration (Max_pm2.5). We further extend the model with a one-year lag of PM2.5 (Max_pm2.5_lag) to assess potential lagged effects. The results of the double machine learning with XGBoost are reported in Table 3, Table 4, Table 5 and Table 6.
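The partialling-out logic of DML can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a single hypothetical control `x` on [0, 1], a binned-means regressor standing in for XGBoost or the forest learner, and a partially linear model y = θ·d + g(x) + e with θ = −0.5 in the simulated data. Both nuisance functions, E[d | x] and E[y | x], are fitted with cross-fitting, and θ is then estimated by regressing outcome residuals on treatment residuals. Production analyses would typically rely on a dedicated package such as DoubleML.

```python
import math
import random


def binned_mean_learner(train_x, train_y, n_bins=20):
    """Stand-in nuisance learner: mean of y within equal-width bins of x in [0, 1]."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(train_x, train_y):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(train_y) / len(train_y)
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]


def dml_partialling_out(x, d, y, n_folds=5, seed=0):
    """Cross-fitted DML estimate of theta in y = theta * d + g(x) + e."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    res_d, res_y = [0.0] * len(y), [0.0] * len(y)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # Nuisance models are fitted on the other folds only (cross-fitting).
        m_hat = binned_mean_learner([x[i] for i in train], [d[i] for i in train])
        l_hat = binned_mean_learner([x[i] for i in train], [y[i] for i in train])
        for i in test:
            res_d[i] = d[i] - m_hat(x[i])   # treatment minus its out-of-fold prediction
            res_y[i] = y[i] - l_hat(x[i])   # outcome minus its out-of-fold prediction
    # Final stage: regress outcome residuals on treatment residuals (no intercept).
    return sum(a * b for a, b in zip(res_d, res_y)) / sum(a * a for a in res_d)


rng = random.Random(42)
n = 2000
x = [rng.random() for _ in range(n)]                                  # hypothetical control
d = [40 + 5 * math.sin(3 * xi) + rng.gauss(0, 1) for xi in x]          # plays the role of Max_pm2.5
y = [-0.5 * di + 10 * xi ** 2 + rng.gauss(0, 1) for xi, di in zip(x, d)]
print(round(dml_partialling_out(x, d, y), 3))   # should be close to the true theta of -0.5
```

Cross-fitting keeps each observation's residuals free of its own influence on the nuisance fits, which is what lets flexible learners such as XGBoost be plugged in without overfitting bias in θ.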
In the baseline XGBoost specification, which includes only the contemporaneous PM2.5 measure, we find an economically meaningful negative effect under both pension systems. Specifically, a one-unit increase in annual Max_pm2.5 is associated with a CNY 0.2022 billion decrease in total URRPS pension contributions and a CNY 4.5156 billion decrease in total UEBPI pension contributions. This suggests that current-year pollution may depress residents' capacity or willingness to contribute to the pension system. The PM2.5 coefficient under URRPS is smaller than that under UEBPI, which can be partly attributed to the institutional design of URRPS: it relies less on wage-based payroll contributions and more on voluntary individual contributions and government subsidies, making its contributions less sensitive to external environmental shocks [66,67]. When the lagged PM2.5 variable is included under URRPS, the contemporaneous effect remains statistically significant (estimate = −0.4145), and the lagged effect is negative (−0.4976) and highly significant at the 1% level; under UEBPI, lagged PM2.5 is likewise more significant than contemporaneous PM2.5 in explaining variation in pension contributions. This shift in significance supports the hypothesis that the economic consequences of air pollution for pension contributions materialize with a lag, potentially reflecting delayed impacts on household health, labor supply, and informal-sector income.
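Constructing the one-year lag within each province is a small but error-prone step, because a naive row shift would carry the previous province's value into the first year of the next province. A minimal sketch, assuming a list-of-dicts panel with hypothetical `province` and `year` keys and made-up values; dropping rows whose lag is missing is one common choice, not necessarily the authors'.

```python
def add_one_year_lag(rows, key="Max_pm2.5", lag_key="Max_pm2.5_lag"):
    """Attach the previous year's value of `key` within each province.

    rows: list of dicts with 'province', 'year' and `key` (hypothetical layout).
    The lag is None when the province has no observation for year t-1.
    """
    lookup = {(r["province"], r["year"]): r[key] for r in rows}
    return [{**r, lag_key: lookup.get((r["province"], r["year"] - 1))} for r in rows]


# Hypothetical panel values for illustration only.
panel = [
    {"province": "A", "year": 2019, "Max_pm2.5": 48.0},
    {"province": "A", "year": 2020, "Max_pm2.5": 41.0},
    {"province": "B", "year": 2020, "Max_pm2.5": 30.0},
]
lagged = add_one_year_lag(panel)
# Keep only rows with an observed lag before fitting the lagged specification.
complete = [r for r in lagged if r["Max_pm2.5_lag"] is not None]
print(complete)
```

Looking up (province, year − 1) explicitly, rather than shifting by position, also yields a missing lag when a year is absent from the panel.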
To further validate these findings and capture more flexible nonlinearities and interactions, we also estimate the DML model with a Causal Forest (Ranger) learner, which additionally allows us to examine how treatment effects vary across features. The DML results with the Causal Forest learner are reported in Table 7 and Table 8, and the significance of the covariates for heterogeneous treatment effects under URRPS and UEBPI is reported in Table 9 and Table 10.
The Causal Forest results show larger negative effects than the XGBoost baseline: both current and lagged PM2.5 are negatively associated with pension contributions under both pension systems. Under URRPS, the estimated coefficient is −0.4722 (p = 0.0515) for Max_pm2.5 and −0.5035 (p = 0.0249) for Max_pm2.5_lag; under UEBPI, it is −4.6508 (p = 0.2246) for Max_pm2.5 and −7.9637 (p = 0.0436) for Max_pm2.5_lag. The lagged effects are thus significant at the 5% level in both systems, whereas the contemporaneous effect is only marginally significant under URRPS and not significant under UEBPI. These findings indicate that air pollution, as a non-institutional factor, reduces both types of pension contributions, and that its lagged effect is more pronounced than its contemporaneous effect. This is broadly consistent with Huang et al. [68] and Dong et al. [69], who found that long-term exposure to PM2.5 has long-term effects on mortality and morbidity, implying lagged effects of air pollution risks. The lagged effect of PM2.5 also reaches the insurance sector: according to the empirical findings of Brook et al. [70], long-term exposure to PM2.5 reduces life expectancy and alters survival curves, and this reduction in life expectancy operates with a lag, changing the timing and size of life insurance and pension liabilities. Adetutu et al. [23] found a correlation between rising lagged pollution risks and changes in life insurance coverage, showing that the insurance industry is sensitive to the life expectancy and morbidity risks induced by air pollution. These findings are also consistent with our conclusion that lagged PM2.5 has a greater effect on pension contributions than current PM2.5.
As Table 9 and Table 10 show, government financial spending, education level, population size, and income are the key factors explaining differences in the conditional average treatment effect (CATE) under URRPS. This suggests that the impact of pollution on contributions varies mainly with these structural characteristics and that pension contributions may be more sensitive in settings with high government financial input, high levels of education, and large populations. Furthermore, the insignificant contribution of the provincial variables to the heterogeneity analysis indicates that provincial differences do not explain fluctuations in CATE within the model; socioeconomic characteristics, rather than individual provincial identities, are the primary drivers of heterogeneity. Under UEBPI, the heterogeneity results are consistent with those under URRPS: structural and scale covariates, such as industrial structure, urban income level, population size, and tax level, mainly explain differences in CATE within the DML model. Province-specific effects are again negligible, and the results suggest that pension contributions in more industrialized and larger provinces are more sensitive to PM2.5 exposure.
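One simple way to see the kind of heterogeneity described above is to compare average CATEs across covariate groups. The sketch below is a hypothetical descriptive check, not the procedure behind Table 9 and Table 10: it assumes per-observation CATE estimates are already available (for example, from a causal forest) and splits observations at the median of one covariate. The CATE and education values in the usage lines are invented.

```python
from statistics import mean, median


def cate_gap_by_median(cates, covariate):
    """Mean CATE above vs. at-or-below the covariate's median, and their difference."""
    cut = median(covariate)
    high = [c for c, v in zip(cates, covariate) if v > cut]
    low = [c for c, v in zip(cates, covariate) if v <= cut]
    return mean(high), mean(low), mean(high) - mean(low)


# Hypothetical CATEs of PM2.5 on contributions and a hypothetical education covariate.
high, low, gap = cate_gap_by_median([-0.2, -0.3, -0.6, -0.7], [8.0, 9.0, 11.0, 12.0])
print(high, low, gap)
```

A negative gap means that pollution reduces contributions more strongly in the high-covariate group, which is the pattern the text describes for education, government spending, and population size.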
Understanding the importance of explanatory variables in predicting pension contributions also sheds light on the role of PM2.5 concentrations as a potential predictor. The importance score measures a feature's contribution to the construction of the double machine learning models: the more frequently a feature is used in building the DML learners, the higher its relative importance.
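The frequency-based importance score described above can be sketched as the share of all tree splits that use each feature. This mirrors XGBoost's `weight` importance type (the number of times a feature is used to split), but the input format here, a list of split-feature names per tree, and the feature names in the usage lines are hypothetical.

```python
from collections import Counter


def split_frequency_importance(trees):
    """Share of all splits that use each feature, sorted from most to least used.

    trees: list of trees, each given as the list of feature names used at its
    split nodes (hypothetical input format).
    """
    counts = Counter(feature for tree in trees for feature in tree)
    total = sum(counts.values())
    return sorted(((f, c / total) for f, c in counts.items()), key=lambda fc: -fc[1])


# Hypothetical split records from three small trees.
trees = [
    ["rural_pop", "gov_exp", "rural_pop"],
    ["Max_pm2.5", "rural_pop"],
    ["education", "Max_pm2.5_lag"],
]
print(split_frequency_importance(trees))
```

Normalizing counts to shares makes importance comparable across models with different numbers of trees, although split frequency alone does not measure how much each split improves the fit.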
Figure 7 presents the importance plot for URRPS: rural population size, local government expenditure, and education level are the most important predictors of pension contributions. This is related to the government subsidies and incentives in the URRPS system and to the fact that education enhances financial literacy and trust in the system, leading to a more stable contribution path. Under UEBPI, industrial structure is the most important predictor of pension contributions (Figure 8). This is directly linked to the fact that contributions under this system are tied to the wage base, and an industrial structure oriented toward high-value, standardized labor tends to produce more stable contributions and a higher base. Importantly, both Max_pm2.5 and its lagged value rank among the top contributors under both pension systems, reinforcing their substantive relevance to pension funding outcomes.
Combining the above empirical results, the Fuzzy RDD and DML outcomes are consistent with the previous literature in direction, mechanism, and strength. First, the significant negative LATE of air pollution on pension contributions identified by the Fuzzy RDD near the threshold is consistent with Huang's evidence [17] that pollution inhibits firms' production and thus compresses the social security contribution base. Second, the acceleration in marginal effects observed in the DML model as pollution intensity rises mirrors findings of nonlinear risk responses in insurance markets; for example, both crop insurance [24] and life insurance [23] respond nonlinearly to pollution risk. Third, incorporating lagged effects into the DML model amplifies the impact of pollution risk on pension contributions, in line with studies showing that long-term PM2.5 exposure has lagged effects on mortality and disease [25,70]. This suggests that air pollution shocks affect pensions gradually rather than instantly. Furthermore, we find that UEBPI pension contributions are more sensitive to industrial structure, whereas URRPS contributions are more sensitive to income, education, and government spending. This is consistent with evidence that pollution significantly reduces labor productivity in manufacturing settings [44,47] and that pollution shapes residents' welfare and financial behaviors [37,42]. Finally, whereas the established literature mostly examines the negative economic impacts of pollution through healthcare expenditure or firm performance [36,38], our study links pollution risk to the level of pension contributions and finds that, under both URRPS and UEBPI, the effect of pollution risk on pension contributions is consistent in terms of both threshold and lagged effects. This comparative evidence provides an empirical basis for an in-depth discussion of the explanatory mechanisms, system differences, and policy implications.