3.1. Model Specification
A review of the existing literature shows that the causal effects of the NFCP can be estimated using methods such as DID, synthetic control, and regression control. However, the DID approach relies on the stringent parallel trends assumption; if the NFCP improves the overall environment for urban forest development, including in cities that were not selected, this assumption may be difficult to satisfy. Synthetic control and regression control methods are better suited to settings in which only a few cities implement the policy while most remain in the control group, and their applicability is limited under staggered policy implementation. Given these limitations of traditional causal inference methods, DML has attracted increasing attention [18,22]. In particular, NFCP implementation is influenced by many factors. These confounders are not only high-dimensional but may also relate nonlinearly to urban forest development outcomes, which undermines the robustness of traditional causal inference models. DML, in contrast, can flexibly capture the nonlinear effects of such high-dimensional confounders, reducing the risk of omitting observable confounders and yielding more robust causal inference. Accordingly, the DML model is specified as follows:

$$Y_{it} = \theta_0 D_{it} + g(X_{it}) + U_{it} \tag{1}$$
In Equation (1), $U_{it}$ is an error term satisfying $E(U_{it} \mid D_{it}, X_{it}) = 0$. Here, $Y_{it}$ denotes the UFDE of city $i$ in year $t$, and $D_{it}$ is a dummy variable indicating whether city $i$ has been selected as a National Forest City. $\theta_0$ is the coefficient measuring the effect of the NFCP on UFDE. $X_{it}$ is a set of high-dimensional control variables, which may include confounding factors that simultaneously affect $D_{it}$ and $Y_{it}$. The function $g(X_{it})$ has an unknown form and must be estimated using machine learning methods; its estimate is denoted $\hat{g}(X_{it})$.

At the same time, to obtain an approximately unbiased estimator of the NFCP treatment effect in finite samples and to accelerate convergence, the auxiliary regression in Equation (2) is constructed:

$$D_{it} = m(X_{it}) + V_{it} \tag{2}$$

In Equation (2), $E(V_{it} \mid X_{it}) = 0$, and $m(X_{it})$ is an unknown function that must be estimated using machine learning methods; its estimate is denoted $\hat{m}(X_{it})$. The residual is then computed as $\hat{V}_{it} = D_{it} - \hat{m}(X_{it})$. By the same principle, we obtain $\hat{g}(X_{it})$. Finally, using $\hat{V}_{it}$ as an instrumental variable for $D_{it}$, the NFCP intervention coefficient is estimated as follows:

$$\hat{\theta}_0 = \left( \frac{1}{n} \sum_{i,t} \hat{V}_{it} D_{it} \right)^{-1} \frac{1}{n} \sum_{i,t} \hat{V}_{it} \left( Y_{it} - \hat{g}(X_{it}) \right) \tag{3}$$

where $n$ is the number of city-year observations.
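As a minimal numerical check of the estimator in Equation (3), the Python sketch below simulates data from Equations (1) and (2) and plugs in the true nuisance values. The functional forms `m_X` and `g_X`, the continuous treatment, and the helper name `theta_eq3` are hypothetical choices for illustration, not the study's data or code; in the actual procedure the true values would be replaced by machine-learning estimates.

```python
import numpy as np

def theta_eq3(y, d, v_hat, g_hat):
    """Equation (3): V_hat instruments D after g_hat(X) is removed from Y.
    The 1/n factors in numerator and denominator cancel."""
    return np.sum(v_hat * (y - g_hat)) / np.sum(v_hat * d)

rng = np.random.default_rng(0)
n, theta_0 = 20_000, 0.5
X = rng.normal(size=(n, 3))
m_X = np.sin(X[:, 0]) + 0.5 * X[:, 1]   # hypothetical m(X), nonlinear in X
g_X = X[:, 0] ** 2 - X[:, 2]            # hypothetical g(X), nonlinear in X
V = rng.normal(size=n)
U = rng.normal(size=n)
d = m_X + V                             # Equation (2)
y = theta_0 * d + g_X + U               # Equation (1)

# Oracle residual; in practice m_X is replaced by a fitted m_hat(X).
v_hat = d - m_X
theta_hat = theta_eq3(y, d, v_hat, g_X)
print(round(theta_hat, 3))
```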
Double machine learning estimation thus removes the influence of the high-dimensional control variables $X_{it}$ on the treatment variable and allows the NFCP treatment effect estimator to converge at the root-$n$ rate under standard regularity conditions. To reduce the estimation variability arising from random sample splitting, we implement a two-fold cross-fitting procedure and repeat the DML estimation 20 times using different random partitions of the sample. The final result is the median of the estimates across all repetitions.
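The repeated two-fold cross-fitting with median aggregation can be sketched in Python as follows, assuming scikit-learn's `LassoCV` as the learner. For brevity, this sketch uses the partialling-out form of the DML score (regressing the outcome residual on the treatment residual), a standard alternative to the form in Equation (3), which would additionally need a preliminary estimate of the treatment effect to construct the estimate of g. The function name `dml_plr` and the simulated data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

def dml_plr(y, d, X, n_folds=2, n_reps=20, seed=0):
    """Cross-fitted DML for Y = theta*D + g(X) + U; median over repeated splits."""
    estimates = []
    for rep in range(n_reps):
        folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed + rep)
        y_res = np.empty_like(y)
        d_res = np.empty_like(d)
        for train, test in folds.split(X):
            # Nuisances are fitted on one fold and predicted on the held-out fold.
            ell_hat = LassoCV(cv=3).fit(X[train], y[train])  # E(Y | X)
            m_hat = LassoCV(cv=3).fit(X[train], d[train])    # E(D | X) = m(X)
            y_res[test] = y[test] - ell_hat.predict(X[test])
            d_res[test] = d[test] - m_hat.predict(X[test])
        # Partialling-out score: residual-on-residual regression.
        estimates.append(np.sum(d_res * y_res) / np.sum(d_res ** 2))
    return float(np.median(estimates)), estimates

# Hypothetical data: 50 controls, of which only a few enter m(X) and g(X).
rng = np.random.default_rng(4)
n, p = 1000, 50
X = rng.normal(size=(n, p))
d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
y = 0.5 * d + X[:, 0] - X[:, 2] + rng.normal(size=n)
theta_med, theta_all = dml_plr(y, d, X)
print(round(theta_med, 3))
```

Each repetition draws a new random partition through `random_state=seed + rep`, so the spread of `theta_all` indicates how sensitive the estimate is to the particular split, while the median limits the influence of any single unlucky partition.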
In the machine learning step, this study uses Lasso regression as the main regularized learner. Compared with unpenalized regression, Lasso has strong variable selection capability in high-dimensional settings: its L1 penalty shrinks the coefficients of less informative variables toward zero and sets some of them exactly to zero, so that relevant controls are selected automatically. This reduces redundancy among the controls and makes the fitted model easier to interpret. Lasso also remains stable with high-dimensional controls, limiting overfitting and improving the out-of-sample performance of the estimates. In addition, the Lasso model has a simple structure and high computational efficiency, and its results are easy to interpret in an economic context. Therefore, within the double machine learning framework, using Lasso regression to estimate the nuisance functions in Equations (1) and (2) helps improve the accuracy and robustness of the estimated NFCP treatment effect.
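The variable-selection behavior described above can be illustrated with scikit-learn's `Lasso` on simulated data in which only 3 of 100 controls matter; the penalty `alpha=0.2` and the data-generating process are assumptions chosen for illustration. Note that scikit-learn's `Lasso` minimizes $\frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1$, so the controls should be on a common scale (here they are standard normal by construction).

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p = 200, 100
X = rng.normal(size=(n, p))        # controls already on a common scale
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]        # hypothetical: only three controls matter
y = X @ beta + rng.normal(size=n)

lasso = Lasso(alpha=0.2).fit(X, y)
selected = np.flatnonzero(lasso.coef_)   # indices of nonzero coefficients
print("selected controls:", selected)
print("leading coefficients:", np.round(lasso.coef_[:3], 2))
```

The retained coefficients are shrunk toward zero relative to their true values, which is the price Lasso pays for selection; in the DML setting this shrinkage bias is what the orthogonalized estimator is designed to tolerate.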
Ridge regression and elastic net also have distinct advantages in high-dimensional linear settings. Ridge regression mitigates multicollinearity by introducing an L2 penalty, yielding stable estimates when variables are highly correlated. Elastic net combines the L1 and L2 penalties: it still performs variable selection, but it avoids Lasso's tendency to select one variable arbitrarily from a group of strongly correlated variables, thereby improving interpretability and predictive stability. When the control variables are high-dimensional and correlated, ridge regression and elastic net therefore provide useful alternatives for coefficient shrinkage and robust estimation. Accordingly, this study uses Lasso regression as the primary learner in the double machine learning causal inference model and uses elastic net and ridge regression for robustness checks.
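The grouping behavior of elastic net on strongly correlated controls can be sketched with scikit-learn as follows. The nearly collinear regressors `x1` and `x2`, the irrelevant control `x3`, and all penalty values are illustrative assumptions; the Lasso and ridge fits are printed alongside for comparison.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(2)
n = 2000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1 (corr about 0.995)
x3 = rng.normal(size=n)              # irrelevant control
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + rng.normal(size=n)

fits = {
    "lasso": Lasso(alpha=0.1).fit(X, y),
    "ridge": Ridge(alpha=50.0).fit(X, y),
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y),
}
for name, model in fits.items():
    print(name, np.round(model.coef_, 3))
```

With `l1_ratio=0.5`, the L2 part of the elastic-net penalty pulls the coefficients on `x1` and `x2` toward each other, so the fit spreads the combined effect across the correlated pair instead of depending on which of the two happens to be picked.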
Because omitted unobserved variables and potential reverse causality may bias the regression results, this study addresses endogeneity using both the DML-based partially linear instrumental variable (PLIV) model and the DID method. The PLIV model is specified as follows:

$$Y_{it} = \theta_0 D_{it} + g(X_{it}) + U_{it}, \quad E(U_{it} \mid Z_{it}, X_{it}) = 0 \tag{4}$$

$$Z_{it} = m(X_{it}) + V_{it}, \quad E(V_{it} \mid X_{it}) = 0 \tag{5}$$

In Equations (4) and (5), $Z_{it}$ is the instrumental variable introduced in this study, while $g(X_{it})$ and $m(X_{it})$ are unknown nonlinear functions that capture the potentially complex effects of the control variables on the dependent variable and the instrumental variable, respectively. $U_{it}$ and $V_{it}$ are random disturbance terms with zero mean. Other variables are defined as in Equations (1) and (2).

Under the DML framework, no specific functional form is imposed on $g(X_{it})$ and $m(X_{it})$. Instead, machine learning methods are used to estimate the conditional expectations $E(Y_{it} \mid X_{it})$, $E(D_{it} \mid X_{it})$, and $E(Z_{it} \mid X_{it})$, respectively. These estimates are used to construct residuals whose resulting moment condition satisfies Neyman orthogonality:

$$\tilde{Y}_{it} = Y_{it} - \hat{E}(Y_{it} \mid X_{it}), \quad \tilde{D}_{it} = D_{it} - \hat{E}(D_{it} \mid X_{it}), \quad \tilde{Z}_{it} = Z_{it} - \hat{E}(Z_{it} \mid X_{it})$$

Subsequently, using $\tilde{Z}_{it}$ as the instrumental variable, an instrumental variable regression of $\tilde{Y}_{it}$ on $\tilde{D}_{it}$ yields a consistent estimate of $\theta_0$, namely $\hat{\theta}_0 = \sum_{i,t} \tilde{Z}_{it}\tilde{Y}_{it} \big/ \sum_{i,t} \tilde{Z}_{it}\tilde{D}_{it}$. To avoid overfitting bias, this study employs two-fold cross-fitting: the sample is randomly divided into two subsamples, the machine learning models are trained on one subsample and used to predict on the other, and the roles of the two subsamples are then swapped. This reduces overfitting bias and improves the robustness of the estimates.
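A minimal Python sketch of the cross-fitted PLIV estimator is given below, assuming `LassoCV` for all three nuisance regressions and a single random two-fold split (repeated splitting is omitted for brevity). The confounder `u`, the instrument `z`, and the function name `dml_pliv` are hypothetical; `u` is deliberately left out of `X` to mimic an unobserved variable.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

def dml_pliv(y, d, z, X, n_folds=2, seed=0):
    """Cross-fitted PLIV: residualise Y, D, Z on X, then IV of Y~ on D~ using Z~."""
    y_t, d_t, z_t = (np.empty_like(v) for v in (y, d, z))
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        for target, resid in ((y, y_t), (d, d_t), (z, z_t)):
            # Fit E(target | X) on the training fold, residualise the held-out fold.
            fit = LassoCV(cv=3).fit(X[train], target[train])
            resid[test] = target[test] - fit.predict(X[test])
    # Just-identified IV estimator on the residuals.
    return np.sum(z_t * y_t) / np.sum(z_t * d_t)

rng = np.random.default_rng(5)
n, p = 3000, 30
X = rng.normal(size=(n, p))
u = rng.normal(size=n)                       # unobserved confounder
z = X[:, 0] + rng.normal(size=n)             # hypothetical instrument
d = 0.8 * z + 0.5 * X[:, 1] + u + rng.normal(size=n)
y = 0.5 * d + X[:, 0] + u + rng.normal(size=n)
theta_iv = dml_pliv(y, d, z, X)
print(round(theta_iv, 3))
```

Because `u` raises both `d` and `y`, a plain regression of the outcome residual on the treatment residual would be biased upward in this simulation, whereas instrumenting with the residualised `z` recovers a value close to the simulated effect of 0.5.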
The DID model is specified as follows:
$$Y_{it} = \alpha_0 + \alpha_1 DID_{it} + \beta X_{it} + \mu_i + \lambda_t + \varepsilon_{it}$$

where $Y_{it}$ and $X_{it}$ have the same meanings as in the benchmark regression; $DID_{it}$ indicates whether city $i$ is included in the pilot program in year $t$; $\mu_i$ and $\lambda_t$ denote city fixed effects and year fixed effects, respectively; $\varepsilon_{it}$ is the random error term; and the coefficient $\alpha_1$ represents the policy effect of the NFCP on UFDE. All of the above models were estimated using Stata 18.0.
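For readers who want to replicate the DID specification outside Stata, the two-way fixed-effects estimator can be sketched in Python through the within transformation, which is exact for a balanced panel: subtracting city and year means sweeps out the intercept and both sets of fixed effects. The simulated staggered adoption, the homogeneous policy effect of 0.3, and the helper `twfe_did` are illustrative assumptions. When effects differ across adoption cohorts, this estimator can be biased under staggered timing.

```python
import numpy as np

def twfe_did(y, did, x, city, year):
    """Two-way within estimator for a balanced panel; returns [policy effect, beta]."""
    def demean(v):
        # Subtract city means and year means, then add back the grand mean.
        city_mean = np.bincount(city, weights=v) / np.bincount(city)
        year_mean = np.bincount(year, weights=v) / np.bincount(year)
        return v - city_mean[city] - year_mean[year] + v.mean()
    W = np.column_stack([demean(did), demean(x)])
    coef, *_ = np.linalg.lstsq(W, demean(y), rcond=None)
    return coef

rng = np.random.default_rng(1)
n_city, n_year = 100, 12
city = np.repeat(np.arange(n_city), n_year)
year = np.tile(np.arange(n_year), n_city)
# About half the cities adopt in a year between 4 and 9; the rest never adopt.
adopt = np.where(rng.random(n_city) < 0.5, rng.integers(4, 10, n_city), n_year + 1)
did = (year >= adopt[city]).astype(float)
mu = rng.normal(size=n_city)                 # city fixed effects
lam = np.linspace(0.0, 1.0, n_year)          # year fixed effects
x = rng.normal(size=city.size) + 0.5 * mu[city]
y = 0.3 * did + 0.8 * x + mu[city] + lam[year] + rng.normal(scale=0.5, size=city.size)

alpha_1, beta = twfe_did(y, did, x, city, year)
print(round(alpha_1, 3), round(beta, 3))
```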