Evaluation of the Urban Forest Development Effectiveness in Chinese Cities: A Causal Inference Approach Based on Double Machine Learning

Liu, Huanpeng; Wang, Luning; Wei, Feng; Wang, Yameng

doi:10.3390/f17060666

¹ School of Economics, Qufu Normal University, Rizhao 276825, China

² College of Economics and Management, Northwest A&F University, Yangling 712100, China

* Author to whom correspondence should be addressed.

Forests 2026, 17(6), 666; https://doi.org/10.3390/f17060666

Submission received: 23 April 2026 / Revised: 23 May 2026 / Accepted: 28 May 2026 / Published: 30 May 2026

(This article belongs to the Special Issue Integrative Forest Governance, Policy, and Economics)

Abstract

In the context of rapid urbanization and climate change, evaluating urban forest development and the effectiveness of related policies is of great significance. This study takes Chinese prefecture-level cities as the research object and constructs an evaluation system for Urban Forest Development Effectiveness (UFDE), encompassing forest networks, forest health, ecological welfare, and development coordination. The analytic hierarchy process–entropy weight method is employed to measure UFDE. On this basis, leveraging the quasi-natural experiment formed by the staggered implementation of the National Forest City Policy (NFCP), this paper applies double machine learning (DML) to identify the causal effects of the policy. The results show that NFCP significantly improves UFDE, and this conclusion remains robust across various model specifications and robustness checks. Meanwhile, the policy effects exhibit significant heterogeneity, being more pronounced in eastern and central regions, as well as in humid climate zones, while being relatively weaker in western and arid regions. Methodologically, this study introduces DML to enhance the precision of causal identification, and in terms of measurement, it achieves a multidimensional, comprehensive evaluation. It provides a new analytical framework for assessing environmental policy effectiveness and offers empirical evidence for optimizing urban ecological governance and promoting green development.

Keywords: urban forest development effectiveness; national forest city policy; double machine learning; causal inference; ecological benefits

1. Introduction

Against the backdrop of intensifying global climate change and rapid urbanization, achieving a dynamic balance between economic growth and ecological protection has become a central issue in contemporary economics and public policy research. A large body of studies has shown that urban green infrastructure, particularly urban forests, plays an irreplaceable role in improving air quality, regulating urban climate, enhancing ecosystem services, and promoting residents’ well-being [1,2,3]. At the same time, with the deepening of the concept of sustainable development, urban green spaces have been incorporated into national governance and the global development agenda, becoming an important pathway for achieving the United Nations Sustainable Development Goals [4]. Therefore, methods for promoting urban forest development through effective institutional arrangements have become a key concern for governments around the world.

Against this backdrop, to systematically advance urban ecological development, China has implemented the National Forest City initiative since 2004, gradually forming the National Forest City Policy (NFCP) with clear standards and an established evaluation system [5]. Through institutionalized assessment mechanisms and incentive–constraint arrangements, this policy guides local governments to increase investment and institutional innovation in urban planning, ecological restoration, and greening development. Compared with conventional environmental governance policies, the NFCP is more comprehensive and long-term oriented [6]. It not only emphasizes improvements in green coverage but also focuses on the construction of ecological networks, optimization of forest quality, and the overall enhancement of ecosystem service functions. Therefore, the policy serves not only as a key institutional tool for promoting ecological civilization in China but also as an important practical vehicle for evaluating local governments’ capacity for green governance. In theory, as a selection-based institutional instrument, the NFCP establishes explicit greening targets and comprehensive evaluation criteria, encouraging local governments to compete and invest in ecological development, thereby exerting a significant influence on Urban Forest Development Effectiveness (UFDE) in Chinese cities. However, an important question remains: has the NFCP actually improved UFDE in practice?

Given the importance of the NFCP in China, a substantial body of literature has examined it in depth [7,8,9]. Some studies suggest that well-designed environmental policies can not only correct externalities but also enhance resource allocation efficiency by stimulating innovation [10,11]. In the Chinese context, many scholars have explored related policies from the perspectives of air pollution control and green development [12], noting that environmental governance is often achieved through “top-down” policy transmission and local government incentive mechanisms [12,13]. In addition, with the advancement of econometric methods, some studies have begun to employ quasi-natural experiments and difference-in-differences approaches to identify policy effects [14].

Despite the growing body of related research, several notable limitations remain. First, in terms of indicator selection, existing studies mainly focus on single ecological functions of urban forests (such as air purification or carbon storage) and lack a systematic measurement of the “comprehensive effectiveness of urban forest development” [1,15]. Although some scholars have summarized the benefits of urban forests from a multidimensional perspective, a quantitative evaluation framework is still lacking [16]. Second, regarding research content, scholars have primarily focused on pollution control performance without directly examining the policy-driven mechanisms underlying urban forest development itself [12,17]. Third, methodologically, traditional multi-period difference-in-differences (DID) models have limitations in handling high-dimensional variables and complex nonlinearities. Crucially, under staggered policy implementation, traditional two-way fixed effects (TWFE) DID models are constrained by the stringent parallel trends assumption and are subject to the “negative weighting” problem that arises under heterogeneous treatment effects, which can bias the estimates. The double machine learning (DML) framework [18] offers substantive methodological gains in addressing these challenges. By constructing Neyman-orthogonal moment conditions, DML makes the estimated policy effect insensitive to small errors in the machine-learned nuisance functions, and it identifies the effect by conditioning flexibly on a rich set of observed controls rather than by imposing parallel trends. Thus, in the context of the NFCP’s staggered rollout, DML can mitigate selection bias caused by observed differences in cities’ initial endowments without forcing compliance with the parallel trends assumption. Despite these advantages, its application in evaluating specific forest policies remains limited. In addition, many studies rely on case analyses or descriptive approaches and lack rigorous causal identification [4,19]. Finally, empirical research on the NFCP, a typical selection-based policy instrument, remains scarce, especially studies that systematically exploit its staggered implementation for evaluation.

Based on this, the present study takes Chinese prefecture-level cities as the research object and examines the effects of the NFCP. Unlike previous studies that rely on single indicators or descriptive analysis, this paper first constructs a multidimensional evaluation system for urban forest development using the analytic hierarchy process (AHP)–entropy weight method, capturing UFDE in China across multiple dimensions, including forest networks, forest health, ecological welfare, and the coupling coordination degree of forest development. Second, in terms of methodology, this study introduces DML [18] to identify the causal effects of the policy. This approach addresses estimation bias arising from high-dimensional control variables and potential nonlinear relationships [14], thereby improving the robustness and precision of causal inference. This research design enables a more accurate identification of the net policy effects in complex real-world settings, enhancing the credibility of the conclusions. In addition, this study systematically examines the heterogeneity of policy effects from the perspectives of regional differences, natural conditions, and governance characteristics, thereby deepening the understanding of the underlying mechanisms. These extensions respond, to some extent, to the current research frontier in environmental economics concerning causal identification in complex policy environments [20,21].

The remainder of this study is organized as follows: Section 2 introduces the policy background; Section 3 presents the methods and data; Section 4 reports the regression results; Section 5 discusses the findings; Section 6 concludes this study.

2. Background

During China’s rapid industrialization and urbanization, tightening resource constraints and increasing environmental pressures have made the traditional growth-oriented development model increasingly unsustainable, creating a need for transformation. Against this backdrop, the Chinese government has gradually introduced the concept of “ecological civilization” and incorporated environmental quality as a key objective within the national governance system. As spatial units where population and economic activities are highly concentrated, cities’ ecological carrying capacity is directly linked to the level of sustainable development. Urban forests, as a core component of urban green infrastructure, play a crucial role in improving air quality, regulating climate, and enhancing ecosystem service functions, and they have therefore become a key focus of policy support.

The NFCP is an important institutional arrangement introduced under this broader macrocontext. To actively promote urban forest development in China and to incentivize and recognize cities that have made remarkable achievements in this field while setting exemplary models for ecological construction, the National Afforestation Committee and the State Forestry Administration launched the “National Forest City” designation program in 2004. They also formulated the “Evaluation Indicators for National Forest Cities” and the “Application Measures for National Forest Cities.” At the same time, the China Urban Forest Forum has been held annually. As the highest-level forum in the field of urban ecology and urban forest development in China, it is guided by the following vision: “bringing forests into cities and enabling cities to embrace forests.” Cities awarded the title of “National Forest City,” as evaluated and approved by the National Afforestation Committee and the State Forestry Administration, will be officially announced at this forum.

In terms of implementation, the NFCP has been carried out in batches and advanced in a rolling manner, exhibiting a typical pattern of gradual diffusion. In 2004, the former State Forestry Administration launched the National Forest City selection program and awarded the first batch of National Forest City titles to seven cities, marking the policy’s pilot exploration stage. Subsequently, the policy was gradually expanded nationwide: The second and third batches were conducted in 2005 and 2006, respectively, with a continuous increase in the number of selected cities and a significant expansion of policy coverage. Thereafter, the selection process entered a normalized stage, generally conducted on an annual basis, with the cumulative number of designated cities steadily rising and forming a policy pattern that expanded from pilot points to broader coverage. As of 5 January 2024, a total of 219 cities nationwide had been designated as National Forest Cities, with all 31 provinces, autonomous regions, and municipalities achieving full participation in the National Forest City initiative.

From an institutional perspective, the NFCP adopts a “phased admission and dynamic expansion” implementation approach, whereby different cities enter the policy system at different points in time, objectively creating policy shocks with temporal variation. Compared with a one-time, full-scale rollout, this gradual approach not only reduces the uncertainty of policy implementation but also provides room for the accumulation and dissemination of policy experience. Moreover, due to the exogenous differences in the timing of cities’ selection, this institutional arrangement constitutes a typical “quasi-natural experiment” from a methodological standpoint, offering favorable conditions for using DML to identify the causal effects of the policy.

The NFCP can influence UFDE through multiple mechanisms. On the one hand, the policy strengthens fiscal incentives, guiding local governments to increase investment in greening projects, ecological restoration, and public green space development, thereby directly expanding the scale of urban forests and green areas. On the other hand, by relying on quantitative indicators and performance evaluation systems, it embeds ecological objectives into urban planning and land-use decisions, institutionally constraining extensive development practices. At the same time, the implementation process is supported by technical standards and the dissemination of best practices, which enhance the scientific and refined management of urban forest development. In addition, the reputational incentives and demonstration effects generated by the selection mechanism further stimulate competition among local governments in the field of green development, thereby continuously promoting ecological investment and institutional innovation. However, due to differences in economic development levels, natural resource endowments, and governance capacity, cities may respond differently to the NFCP and achieve varying policy outcomes. This can lead to significant spatial heterogeneity in policy effectiveness. Therefore, it is necessary to systematically evaluate its effects and underlying mechanisms within a unified policy framework.

3. Data and Methods

3.1. Model Specification

A review of the existing literature shows that causal inference on the effects of the NFCP can be conducted using methods such as DID, synthetic control, and regression control. However, the DID approach imposes a stringent requirement of the parallel trends assumption; if the NFCP improves the overall atmosphere for urban forest development, this assumption may be difficult to satisfy. Synthetic control and regression control methods are more suitable for cases where only a small number of cities implement the policy while most remain in the control group, and their applicability is limited under staggered policy implementation. Given the limitations of traditional causal inference methods, DML has attracted increasing attention [18,22]. In particular, the implementation of the NFCP is influenced by many factors, and these confounding variables are not only high-dimensional but may also have nonlinear relationships with urban forest development outcomes, thereby affecting the robustness of traditional causal inference models. In contrast, DML can effectively handle the nonlinear effects of these “high-dimensional” confounders on estimation results, significantly reducing the risk of omitted observable confounders and yielding more robust causal inference. Accordingly, the DML model is specified as follows:

$$Y_{it} = \theta_0 D_{it} + g(X_{it}) + \varepsilon_{it} \tag{1}$$

In Equation (1), $E(\varepsilon_{it} \mid D_{it}, X_{it}) = 0$. Here, $Y_{it}$ denotes the UFDE of city $i$ in year $t$, and $D_{it}$ is a dummy variable indicating whether city $i$ has been selected as a National Forest City. $\theta_0$ represents the coefficient measuring the effect of NFCP on UFDE. $X_{it}$ is a set of high-dimensional control variables, which may include confounding factors that simultaneously affect $Y_{it}$ and $D_{it}$. The function $g(X_{it})$ represents an unknown functional form of the control variables, which needs to be estimated using machine learning methods; its estimate is denoted $\hat{g}(X_{it})$.

At the same time, to reduce the finite-sample bias of the estimator $\hat{\theta}_0$ of the NFCP treatment effect and to accelerate its convergence, an auxiliary regression, Equation (2), is constructed:

$$D_{it} = m(X_{it}) + \omega_{it} \tag{2}$$

In Equation (2), $E(\omega_{it} \mid X_{it}) = 0$, and $m(X_{it})$ is an unknown functional form that needs to be estimated using machine learning methods; its estimate is denoted $\hat{m}(X_{it})$. The residual $\hat{\omega}_{it}$ is then computed as $\hat{\omega}_{it} = D_{it} - \hat{m}(X_{it})$. By the same principle, we can obtain $Y_{it} - \hat{g}(X_{it}) = \theta_0 D_{it} + \varepsilon_{it}$. Finally, using $\hat{\omega}_{it}$ as an instrumental variable for $D_{it}$, the estimate of the NFCP intervention coefficient can be obtained as follows:

$$\hat{\theta}_0 = \left(\frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{\omega}_{it} D_{it}\right)^{-1} \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{\omega}_{it} \left(Y_{it} - \hat{g}(X_{it})\right) \tag{3}$$
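As a brief check on where Equation (3) comes from, the estimator can be read as the sample analogue of a moment condition. The lines below are a sketch that uses only the assumptions already stated for Equations (1) and (2); because $\omega_{it}$ is a function of $D_{it}$ and $X_{it}$, the condition $E(\varepsilon_{it} \mid D_{it}, X_{it}) = 0$ implies that $\omega_{it}$ and $\varepsilon_{it}$ are uncorrelated.

```latex
\begin{aligned}
% Moment condition implied by E(\varepsilon_{it} | D_{it}, X_{it}) = 0
0 &= E\bigl[\omega_{it}\,\varepsilon_{it}\bigr]
   = E\bigl[\omega_{it}\bigl(Y_{it} - g(X_{it}) - \theta_0 D_{it}\bigr)\bigr] \\
% Solving the condition for \theta_0
\theta_0 &= \bigl(E[\omega_{it} D_{it}]\bigr)^{-1}
            E\bigl[\omega_{it}\bigl(Y_{it} - g(X_{it})\bigr)\bigr]
\end{aligned}
```

Replacing the expectations with sample averages over $i$ and $t$, and $g$ and $\omega_{it}$ with their estimates, gives the form of Equation (3), with the first factor entering as an inverse.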

Double machine learning estimation not only helps to remove the confounding influence that the high-dimensional control variables $X_{it}$ exert through the treatment variable $D_{it}$ but also accelerates the convergence of the NFCP treatment effect estimator $\hat{\theta}_0$. Moreover, to reduce estimation variability arising from random sample splitting, we implement a two-fold cross-fitting procedure and repeat the DML estimation 20 times using different random partitions of the sample. The final results are then aggregated by taking the median of the estimates across all repetitions.
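The procedure described above (two-fold cross-fitting, 20 random partitions, median aggregation) can be sketched in a few lines. This is a minimal illustration and not the authors' Stata implementation: ordinary least squares on a small polynomial basis stands in for the Lasso learner so that the example needs only the standard library, the data are simulated with a known effect of 0.5, and the helper names (`ols`, `basis`, `dot`, `dml_once`, `dml_median`) are hypothetical.

```python
import random
import statistics


def ols(rows, y):
    """Least-squares coefficients from the normal equations (small designs only)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(rows, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


def basis(x):
    """Hypothetical control basis: intercept, x, x squared."""
    return [1.0, x, x * x]


def dot(b, f):
    return sum(u * v for u, v in zip(b, f))


def dml_once(x, d, y, rng):
    """One 2-fold cross-fitted estimate in the form of Equation (3)."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    half = len(idx) // 2
    folds = [idx[:half], idx[half:]]
    num = den = 0.0
    for k in (0, 1):
        train, hold = folds[1 - k], folds[k]
        # m_hat: regress D on the controls (Equation (2)), training fold only
        b_m = ols([basis(x[i]) for i in train], [d[i] for i in train])
        # g_hat: regress Y on [D, controls] and keep only the control part
        b_g = ols([[d[i]] + basis(x[i]) for i in train],
                  [y[i] for i in train])[1:]
        for i in hold:  # residuals are formed on the held-out fold
            f = basis(x[i])
            omega = d[i] - dot(b_m, f)
            num += omega * (y[i] - dot(b_g, f))
            den += omega * d[i]
    return num / den


def dml_median(x, d, y, reps=20, seed=0):
    """Repeat the random split `reps` times and take the median estimate."""
    rng = random.Random(seed)
    return statistics.median(dml_once(x, d, y, rng) for _ in range(reps))


# Simulated data with a known treatment effect of 0.5 (purely illustrative)
sim = random.Random(42)
x = [sim.uniform(-1, 1) for _ in range(2000)]
d = [1.0 if xi + sim.gauss(0, 1) > 0 else 0.0 for xi in x]
y = [0.5 * di + xi + xi * xi + sim.gauss(0, 0.2) for xi, di in zip(x, d)]
theta_hat = dml_median(x, d, y)
print(round(theta_hat, 3))
```

Taking the median rather than the mean across repetitions makes the aggregate less sensitive to an occasional unlucky split. With a Lasso or other regularized learner, only the two `ols` calls would change.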

In the machine learning estimation, this study employs Lasso regression as the key regularization method to assist model estimation. Compared with other estimation methods, Lasso regression has stronger variable selection capability in high-dimensional data settings. By introducing the L₁ regularization term, it shrinks the coefficients of less important variables toward zero and sets some of them exactly to zero, thereby achieving automatic variable selection. This feature not only helps alleviate multicollinearity but also significantly enhances the interpretability of the model. In addition, Lasso regression performs robustly when handling high-dimensional control variables, effectively avoiding overfitting and improving the generalization ability of the estimation results. Moreover, the Lasso model has a simple structure and high computational efficiency, and it produces results that are easier to interpret and analyze in an economic context. Therefore, within the double machine learning framework, using Lasso regression to estimate the nuisance functions $m(X_{it})$ and $g(X_{it})$ helps improve the accuracy and robustness of the estimated NFCP treatment effects.

In addition, ridge regression and elastic net also have distinct advantages in high-dimensional linear settings. Ridge regression alleviates the problem of multicollinearity by introducing an L₂ regularization term, enabling stable estimation results when variables are highly correlated. Elastic net combines the advantages of L₁ and L₂ regularization: It not only performs variable selection but also avoids Lasso’s tendency to pick one variable arbitrarily from a group of strongly correlated variables, thereby improving model interpretability and predictive stability. Therefore, in the presence of high-dimensional control variables and correlations among them, ridge regression and elastic net provide important support for parameter shrinkage and robust estimation. Accordingly, this study mainly adopts Lasso regression to estimate the double machine learning causal inference model, and it uses elastic net and ridge regression for robustness checks.
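The difference between the three penalties is easiest to see in the textbook special case of an orthonormal design, where each penalized coefficient is a simple function of the corresponding least-squares coefficient. The sketch below assumes the objective ½‖y − Xb‖² + λ₁‖b‖₁ + (λ₂/2)‖b‖² with X′X = I; the function names and the example coefficients are hypothetical and are not taken from the study.

```python
def soft_threshold(b, lam):
    """Lasso solution for one coefficient under an orthonormal design."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # small coefficients are set exactly to zero


def ridge_shrink(b, lam):
    """Ridge solution: proportional shrinkage, never exactly zero."""
    return b / (1.0 + lam)


def elastic_net_shrink(b, lam1, lam2):
    """Elastic net: soft-threshold first, then shrink proportionally."""
    return soft_threshold(b, lam1) / (1.0 + lam2)


ols_coefs = [2.0, -0.3, 0.05]  # made-up least-squares coefficients
lasso = [soft_threshold(b, 0.5) for b in ols_coefs]
ridge = [ridge_shrink(b, 0.5) for b in ols_coefs]
enet = [elastic_net_shrink(b, 0.5, 0.5) for b in ols_coefs]
print(lasso, ridge, enet)
```

Lasso and elastic net drop the two small coefficients entirely, whereas ridge keeps all three at reduced magnitude, which is why ridge alone does not perform variable selection.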

Considering that endogeneity issues—such as omitted unobserved variables and potential reverse causality—may bias the regression results, this study employs both the DML-based Partially Linear Instrumental Variable Model (PLIV) and the DID method to address endogeneity. The PLIV model is specified as follows:

$$Y_{it} = \alpha_0 D_{it} + g(X_{it}) + U_{it} \tag{4}$$

$$Z_{it} = m(X_{it}) + V_{it} \tag{5}$$

In Equations (4) and (5), $Z_{it}$ is the instrumental variable introduced in this study, while $g(X_{it})$ and $m(X_{it})$ are unknown nonlinear functions, capturing the potentially complex effects of control variables on the dependent variable and the instrumental variable, respectively. $U_{it}$ and $V_{it}$ are random disturbance terms with zero mean. Other variables have the same definitions as in Equations (1) and (2).

Under the DML framework, no specific functional form assumptions are imposed on $g(X_{it})$ and $m(X_{it})$. Instead, machine learning methods are utilized to estimate $E(Y_{it} \mid X_{it})$, $E(D_{it} \mid X_{it})$, and $E(Z_{it} \mid X_{it})$, respectively. Based on this, residuals are constructed through Neyman orthogonality:

$$\tilde{Y}_{it} = Y_{it} - E(Y_{it} \mid X_{it}), \quad \tilde{D}_{it} = D_{it} - E(D_{it} \mid X_{it}), \quad \tilde{Z}_{it} = Z_{it} - E(Z_{it} \mid X_{it})$$

Subsequently, using $\tilde{Z}_{it}$ as the instrumental variable, an instrumental variable regression of $\tilde{Y}_{it}$ on $\tilde{D}_{it}$ is performed to obtain a consistent estimate of the parameter $\alpha_0$. To avoid overfitting bias, this study employs a 2-fold cross-fitting approach. The sample is randomly divided into two sub-samples, and the machine learning models are trained on one sub-sample and used for prediction on the other, with the roles then swapped, to reduce overfitting bias and improve the robustness of the estimation.
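With a single instrument, the instrumental variable regression on residuals has a simple closed form. The expression below is a sketch in the residual notation introduced above; it omits an intercept on the assumption that the residuals are centred approximately at zero, and it treats the residuals as computed from cross-fitted predictions.

```latex
% Just-identified IV estimator on the partialled-out variables
\hat{\alpha}_0
  = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} \tilde{Z}_{it}\,\tilde{Y}_{it}}
         {\sum_{i=1}^{N}\sum_{t=1}^{T} \tilde{Z}_{it}\,\tilde{D}_{it}}
```

The denominator shows why instrument relevance matters: if $\tilde{Z}_{it}$ is only weakly correlated with $\tilde{D}_{it}$ after the controls are partialled out, the ratio becomes unstable.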

The DID model is specified as follows:

$$Y_{it} = \beta\, \mathit{NFCP}_{it} + \theta X_{it} + \mu_i + \rho_t + \varepsilon_{it}$$

where $Y_{it}$ and $X_{it}$ have the same meanings as in the benchmark regression; $\mathit{NFCP}_{it}$ indicates whether city $i$ has been selected as a National Forest City in year $t$; $\mu_i$ and $\rho_t$ denote city fixed effects and time fixed effects, respectively; $\varepsilon_{it}$ is the random error term; and the coefficient $\beta$ represents the policy effect of the NFCP on UFDE. All of the above models were estimated using Stata 18.0.
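For intuition about the two-way fixed effects specification, the sketch below computes the DID coefficient for a balanced panel by double-demeaning the outcome and the treatment indicator, which absorbs the city and year effects. It is a simplified illustration, not the Stata estimation used in the study: the control variables are omitted, the toy panel is made up, and the function name `twfe_beta` is hypothetical.

```python
def twfe_beta(y, d):
    """TWFE slope for a balanced panel (lists of per-city lists over years)."""
    n, t = len(y), len(y[0])

    def double_demean(a):
        # Subtract city means and year means, then add back the grand mean
        row = [sum(r) / t for r in a]
        col = [sum(a[i][j] for i in range(n)) / n for j in range(t)]
        grand = sum(row) / n
        return [[a[i][j] - row[i] - col[j] + grand for j in range(t)]
                for i in range(n)]

    yd, dd = double_demean(y), double_demean(d)
    num = sum(yd[i][j] * dd[i][j] for i in range(n) for j in range(t))
    den = sum(dd[i][j] ** 2 for i in range(n) for j in range(t))
    return num / den


# Toy staggered rollout: city 0 treated from year 2, city 1 from year 3,
# city 2 never treated; the true effect is a constant 0.04
d = [[0, 0, 1, 1, 1], [0, 0, 0, 1, 1], [0, 0, 0, 0, 0]]
mu = [0.2, 0.5, 0.1]                    # city fixed effects
rho = [0.0, 0.01, 0.02, 0.03, 0.04]     # year fixed effects
y = [[mu[i] + rho[j] + 0.04 * d[i][j] for j in range(5)] for i in range(3)]
beta_hat = twfe_beta(y, d)
print(beta_hat)
```

The estimate recovers the effect here because the toy effect is identical across cities and years. When effects differ across adoption cohorts, the same estimator can place negative weights on some comparisons, which is the concern raised in the Introduction.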

3.2. Construction of UFDE

3.2.1. Calculating Indicator Weights Using AHP

Based on the UFDE indicator system, this study selects specific indicators from four dimensions: forest network, forest health, ecological welfare, and development coordination (as shown in Table 1). Among them, the forest network is measured by the forest coverage rate and the green coverage rate of built-up areas; forest health is measured by the growth rate of forest coverage and the growth rate of green space area in built-up areas; ecological welfare is measured by per capita park green space and per capita green space in built-up areas; and development coordination is measured by the coupling coordination degree between the forest coverage rate and the green coverage rate of built-up areas (the calculation process is provided in Appendix A). All indicators are positive in orientation.

To evaluate the UFDE, this study first applies the AHP method. The UFDE measurement factors are compared pairwise and scored on the basis of expert judgment, the scores for the first-level indicators are assembled into a judgment matrix (see Table 2), and the weights of the first-level indicators are then calculated from this matrix.

This study uses MCDM Online to calculate the weight matrix. Using the geometric mean method, the weight vector for the first-level indicators (i.e., the criterion layer) is obtained as follows: 0.55786, 0.09633, 0.24948, and 0.09633. In addition, MCDM Online provides the results of the consistency test: The maximum eigenvalue is 4.04338, the consistency index CI is 0.01446, the random index RI is 0.9, and the consistency ratio CR = CI/RI is 0.01607. Because CR is below the conventional threshold of 0.1, the consistency test is passed: the judgment matrix has acceptable consistency and satisfies the requirements of the AHP model. Using the same approach, the indicator-layer weights are calculated via MCDM Online as 0.75, 0.25, 0.75, 0.25, 0.83333, 0.16667, and 1, and they also pass the consistency test.
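The geometric mean method and the consistency test can be reproduced in a few lines. In the sketch below, the 4×4 judgment matrix is an illustrative assumption (Table 2 is not reproduced here); it was chosen because it yields the criterion-layer weights and consistency statistics reported above. The maximum eigenvalue is approximated by the mean of (Aw)ᵢ/wᵢ, and the function name `ahp_weights` is hypothetical.

```python
import math


def ahp_weights(matrix):
    """Geometric-mean weights and consistency statistics for a judgment matrix."""
    n = len(matrix)
    # Geometric mean of each row, normalized to sum to one
    gm = [math.prod(row) ** (1.0 / n) for row in matrix]
    w = [g / sum(gm) for g in gm]
    # Approximate the maximum eigenvalue as the mean of (A w)_i / w_i
    aw = [sum(a * wj for a, wj in zip(row, w)) for row in matrix]
    lam = sum(v / wi for v, wi in zip(aw, w)) / n
    ci = (lam - n) / (n - 1)
    return w, lam, ci


# Illustrative pairwise comparisons (assumed, not taken from Table 2); order:
# forest network, forest health, ecological welfare, development coordination
A = [
    [1, 5, 3, 5],
    [1 / 5, 1, 1 / 3, 1],
    [1 / 3, 3, 1, 3],
    [1 / 5, 1, 1 / 3, 1],
]
w, lam, ci = ahp_weights(A)
cr = ci / 0.9  # RI = 0.9 as reported in the text
print([round(v, 5) for v in w], round(lam, 5), round(ci, 5), round(cr, 5))
```

A consistency ratio below 0.1 is the usual acceptance rule; a ratio of exactly zero would indicate a perfectly consistent matrix.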

3.2.2. Combined Weighting Results of AHP–Entropy Weight Method

To balance subjective judgment and objective information in assigning indicator weights, this study combines the AHP method with the entropy weight method to determine the indicator weights, and it then uses these weights to construct the UFDE evaluation index. First, to remove the influence of differing units and scales, the original data are normalized using the min–max method. Since all indicators in this study are positive indicators, the following formula is used:

$$Z_{ij} = \frac{X_{ij} - \min(X_j)}{\max(X_j) - \min(X_j)}$$

Next, the objective weights are calculated using the entropy weight method. First, the share of the $i$-th sample in the total of the $j$-th indicator is calculated as follows:

$$p_{ij} = \frac{Z_{ij}}{\sum_{i=1}^{n} Z_{ij}}$$

On this basis, information entropy is calculated as

$$e_j = -k \sum_{i=1}^{n} p_{ij} \ln(p_{ij})$$

where $k = 1/\ln(n)$. The degree of divergence (information redundancy) of indicator $j$ is then computed as $d_j = 1 - e_j$. The entropy weight used in the final method is calculated as follows:

$$w_j^{(E)} = \frac{d_j}{\sum_{j=1}^{m} d_j}$$
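The entropy weighting steps can be sketched as follows. The data are made up, the function names are hypothetical, and one convention is assumed that the text does not state: terms with a share of zero (which min–max normalization always produces for the minimum observation) contribute zero to the entropy sum. A constant indicator would cause a division by zero and is not handled here.

```python
import math


def min_max(column):
    """Min-max normalization for a positive-direction indicator."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def entropy_weights(columns):
    """columns: one list of raw values per indicator, one value per sample."""
    n = len(columns[0])
    k = 1.0 / math.log(n)
    d = []
    for col in columns:
        z = min_max(col)
        total = sum(z)
        p = [v / total for v in z]
        # Assumed convention: 0 * ln(0) is treated as 0
        e = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
        d.append(1.0 - e)  # degree of divergence d_j = 1 - e_j
    return [dj / sum(d) for dj in d]


# Toy data: 4 cities, 2 indicators (values are made up)
raw = [
    [10.0, 20.0, 30.0, 40.0],  # evenly spread, so entropy is higher
    [1.0, 1.0, 1.0, 9.0],      # concentrated in one city, so entropy is lower
]
w_e = entropy_weights(raw)
print([round(v, 4) for v in w_e])
```

The second indicator receives the larger weight because its values differ more sharply across cities, which is exactly the information the entropy method rewards.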

Finally, the subjective and objective weights are combined to obtain the comprehensive weight:

$$w_j = \theta w_j^{(A)} + (1 - \theta) w_j^{(E)}$$

where $w_j^{(A)}$ is the weight of each indicator obtained by AHP, $w_j^{(E)}$ is the corresponding entropy weight, and $\theta$ is the weight preference parameter, which is set to 0.5, indicating that the objective entropy weights and the expert judgment weights are equally important.

Based on the integrated weights, the UFDE comprehensive evaluation index is calculated as follows:

$$Y_i = \sum_{j=1}^{m} w_j Z_{ij}$$

This method effectively combines the subjective judgments of experts and the objective information in the data, enabling the evaluation results to be more scientific and more reflective of the overall decision-making process.
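The combination step and the final index are simple weighted sums. The sketch below uses made-up weight vectors and a made-up normalized observation; the function names are hypothetical, and the equal-importance setting of 0.5 for the preference parameter follows the description above.

```python
def combine_weights(w_ahp, w_entropy, theta=0.5):
    """Linear combination of subjective (AHP) and objective (entropy) weights."""
    return [theta * a + (1.0 - theta) * e for a, e in zip(w_ahp, w_entropy)]


def composite_index(z_row, weights):
    """Weighted sum of one city's normalized indicator values."""
    return sum(w * z for w, z in zip(weights, z_row))


w_ahp = [0.6, 0.4]       # hypothetical AHP weights
w_entropy = [0.2, 0.8]   # hypothetical entropy weights
w = combine_weights(w_ahp, w_entropy)
score = composite_index([0.5, 1.0], w)  # hypothetical normalized indicators
print(w, score)
```

Because both weight vectors sum to one and the normalized indicators lie between 0 and 1, the resulting index also lies between 0 and 1.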

3.3. Data Sources

The study sample covers the period 2003–2022, determined based on data availability. Forest coverage rate and the growth rate of forest coverage are mainly obtained from the China Forestry Statistical Yearbook, the China Forestry and Grassland Statistical Yearbook, and the forestry statistical bulletins of individual provinces (municipalities). The green coverage rate of built-up areas, per capita park green space, and per capita green space in built-up areas are mainly sourced from the China City Statistical Yearbook and the statistical yearbooks of each city. In addition, the data are supplemented and cross-validated using the Urban Construction Statistical Yearbook and annual bulletins released by local housing and urban–rural development authorities to ensure completeness and consistency. The NFCP data are primarily based on the list of “National Forest Cities” and related announcement documents published on the official website of the National Forestry and Grassland Administration. For control variables, the underlying data for per capita GDP, level of opening-up, population density, urbanization rate, industrial structure upgrading, and fiscal pressure are drawn from the China City Statistical Yearbook and the statistical yearbooks of each city. Government governance capability is measured by whether a city is a smart city, coded 1 if yes and 0 if no (see Table A1 for specific definitions). The annual average temperature is compiled from raw data provided by the National Centers for Environmental Information (NCEI), which is part of the National Oceanic and Atmospheric Administration (NOAA). Average city elevation and terrain ruggedness are derived from the SRTM DEM dataset provided by the National Aeronautics and Space Administration (NASA). During data processing, missing values are handled as follows. To limit estimation bias caused by interpolation, this study applies linear interpolation only to cities with a missing rate of less than 10%, and it excludes cities with a missing rate exceeding 10%, such as Harbin, Qiqihar, Changchun, Jilin, and Pu’er.
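The missing-data rule can be sketched as a small filter followed by interpolation. The example below is illustrative only: it assumes the missing rate is computed per city over one series, keeps a city whose rate is exactly at the threshold, and leaves gaps at the start or end of a series unfilled. The city labels, values, and function names are hypothetical.

```python
def missing_rate(series):
    """Share of missing (None) entries in one city's series."""
    return sum(v is None for v in series) / len(series)


def interpolate(series):
    """Linearly interpolate interior gaps between observed values."""
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = out[a] + (out[b] - out[a]) * (i - a) / (b - a)
    return out


def clean_panel(panel, max_missing=0.10):
    """Drop cities above the missing-rate threshold; interpolate the rest."""
    return {city: interpolate(s) for city, s in panel.items()
            if missing_rate(s) <= max_missing}


# Toy panel with 20 yearly observations per city (values are made up)
city_a = [float(v) for v in range(20)]
city_a[5] = None                          # 5% missing, so it is interpolated
city_b = [None] * 5 + [1.0] * 15          # 25% missing, so it is excluded
cleaned = clean_panel({"city_a": city_a, "city_b": city_b})
print(sorted(cleaned), cleaned["city_a"][5])
```

Filtering before interpolating keeps heavily imputed series out of the estimation sample, which is the stated purpose of the 10% rule.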

4. Results

4.1. Benchmark Regression

Table 3 reports the benchmark regression results from using a DML causal inference approach with Lasso-based regression to identify the effect of NFCP on China’s UFDE. In column (1), where only time and city fixed effects are controlled, the estimated coefficient of NFCP’s intervention effect is 0.050 and is statistically significant at the 5% level. This indicates that even under the initial model specification, NFCP exhibits a preliminary positive effect on UFDE. In column (2), after additional control variables are included, the intervention effect coefficient of NFCP increases to 0.060 and is significant at the 1% level. In column (3), which further incorporates the squared terms of the control variables to account for potential nonlinearities, the NFCP intervention effect coefficient decreases slightly to 0.055 but continues to be significant at the 1% level. This suggests that, after controlling for other confounding factors and their potential nonlinear effects, the identified causal impact of NFCP on UFDE becomes more precise. In column (4), province fixed effects are additionally introduced on the basis of the previous specifications. The results show that the NFCP intervention effect coefficient is 0.040 and remains significant at the 1% level. This implies that, once the model accounts for multidimensional confounding factors, nonlinear relationships, and potentially omitted macro-level variables, cities implementing NFCP experience an average increase of 0.040 in the UFDE index. Overall, the results demonstrate that NFCP has a statistically significant positive effect on UFDE in China, thereby providing strong evidence for the beneficial role of NFCP in promoting urban greening, enhancing ecological welfare, and fostering green development.

4.2. Addressing Endogeneity

This study constructs the cumulative number of approved National Forest Cities in other provinces nationwide (excluding the focal province) in a given year as the instrumental variable for whether a given city is approved. In the context of Chinese-style decentralization and the promotion tournaments among local officials, ecological governance exhibits significant peer effects. An increase in the number of National Forest Cities in other provinces will, through the demonstration effect of central policy guidance and horizontal competitive pressure, prompt the local provincial and municipal governments to intensify their application efforts, thereby positively impacting the probability of the focal city becoming a National Forest City. Furthermore, the number of forest city construction approvals in other provinces is jointly determined by external macro-policies and the decisions of non-local governments, making it independent of the focal city in terms of both geography and administrative jurisdiction. The number of approvals in other regions does not directly affect the local UFDE through direct fiscal transfers or natural ecosystem spillovers; instead, it operates solely through the single channel of influencing the focal city’s policy implementation. Therefore, the instrumental variable satisfies both the relevance and exogeneity conditions. Column (5) of Table 3 presents the estimation results of the PLIV model. As shown by the estimates, the coefficient of the NFCP passes the significance test at the 10% level, indicating that the benchmark results remain robust after controlling for potential endogeneity issues.
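The construction of the instrument can be illustrated with a small counting function. This is a sketch under assumptions that the text does not spell out: the count is taken to include approvals made in the focal year itself, and the approval records, province labels, and function name are all hypothetical.

```python
def other_province_cumulative(approvals, province, year):
    """Count cities approved up to `year` in provinces other than `province`.

    approvals: iterable of (city, province, approval_year) tuples.
    """
    return sum(1 for _, prov, yr in approvals
               if prov != province and yr <= year)


# Made-up approval records for illustration
approvals = [
    ("city_a", "P1", 2004),
    ("city_b", "P2", 2005),
    ("city_c", "P2", 2007),
    ("city_d", "P3", 2006),
]

# Instrument value for any city located in province P1
z_2003 = other_province_cumulative(approvals, "P1", 2003)
z_2006 = other_province_cumulative(approvals, "P1", 2006)
z_2007 = other_province_cumulative(approvals, "P1", 2007)
print(z_2003, z_2006, z_2007)
```

Because the focal province is excluded, every city in the same province shares the same instrument value in a given year, so the instrument varies only across provinces and over time.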

Column (6) of Table 3 reports the estimation results of the DID model. With two-way fixed effects included, the NFCP significantly improves UFDE: the regression coefficient of the core explanatory variable is positive and significant at the 1% level, indicating that the benchmark regression results are robust.

The parallel trend assumption is a fundamental prerequisite for the DID method to identify causal effects. It requires that, in the absence of a policy shock, the outcome variables of the treatment and control groups follow the same trajectory. Only when this assumption holds can the DID estimator be interpreted as the net effect of the policy; otherwise, the estimation results may be biased by conflating the policy effect with pre-existing trend differences. By conducting a pre-trend test or employing an event study approach, one can intuitively determine whether the two groups shared a consistent dynamic path prior to the policy’s implementation, thereby enhancing the credibility and causal explanatory power of the empirical results. This study utilizes an event study approach to further test for parallel trends before the policy and to examine the dynamic effects after the policy. The test results are presented in Figure 1. As shown, none of the regression coefficients prior to the implementation of the NFCP pass the significance test; however, in the year of NFCP implementation and subsequent years, the regression coefficients become significantly positive. This indicates that the estimation results of the difference-in-differences model are valid and robust.
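For reference, a generic event-study specification of the kind described above can be written as follows. This is the standard textbook form and not necessarily the exact equation estimated in this study; the symbols $\mu_i$, $\lambda_t$, $D_{it}^{k}$, and the choice of $k=-1$ as the omitted reference period are our assumptions.

```latex
\[
\mathrm{UFDE}_{it} = \mu_i + \lambda_t + \sum_{k \neq -1} \beta_k \, D_{it}^{k} + X_{it}'\gamma + \varepsilon_{it}
\]
```

Here $D_{it}^{k}$ equals one when city $i$ in year $t$ is $k$ years away from its NFCP approval, $\mu_i$ and $\lambda_t$ are city and year fixed effects, and $X_{it}$ collects the control variables. Under this form, the pre-trend test amounts to checking that the $\beta_k$ with $k < -1$ are statistically indistinguishable from zero, while the $\beta_k$ with $k \geq 0$ trace out the dynamic policy effect.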

4.3. Heterogeneity Analysis

Given the pronounced non-uniformity among China’s prefecture-level cities in terms of natural–geographical conditions, levels of economic development, factor endowments, and government governance capacity, relying solely on the average treatment effect for the full sample may obscure the true heterogeneity in policy impacts across different types of cities. As emphasized in most studies that evaluate China’s regional or environmental policies, during the downward transmission of uniform macro-level policies, implementation outcomes are constrained by multiple interrelated local factors, such as local fiscal capacity, industrial structure (e.g., resource-based versus non-resource-based industries), natural–climatic conditions (e.g., humid vs. arid zones), and administrative mobilization capacity (i.e., “big government” versus “small government”). These mechanisms can generate substantial asymmetric causal effects. Therefore, conducting a multidimensional heterogeneity decomposition of the sample not only enables a more accurate identification of the urban contexts in which NFCP yields the largest marginal benefits, thereby revealing the potential constraints under which the policy is effective, but also provides more targeted empirical evidence and theoretical support for future “tailored” and “classified” approaches to advancing urban greening. Moreover, heterogeneity analysis can effectively mitigate the risk of encountering Simpson’s paradox that may arise in pooled regressions. Against this background, this study performs the following heterogeneity analyses.

4.3.1. Regional Heterogeneity

Given China’s vast territory, there are substantial macroscopic disparities among the eastern, central, and western regions in terms of stages of economic development, local fiscal capacity, the pace of urbanization, and the underlying natural ecological endowments. Such location-based differences may lead to markedly heterogeneous marginal effects when a uniform national-level policy is transmitted and implemented in a top-down manner. Therefore, this study partitions the full sample into three regional subsamples—east, central, and west—based on cities’ geographical location to examine the asymmetric effects of NFCP across different macro-regions.

The regression results (columns 1–3 of Table 4) indicate that the intervention effect of NFCP is spatially non-uniform. Specifically, in the eastern and central regional subsamples, the estimated intervention effect coefficients are 0.065 and 0.067, respectively, and both are statistically significant at the 1% level. These findings indicate that the implementation of NFCP improves UFDE in both regions; the two coefficients are nearly identical, with the central estimate only marginally larger than the eastern one. By contrast, for the western regional subsample, the NFCP intervention effect coefficient is only 0.012 and is not statistically significant. This implies that, in the current sample, NFCP has not produced a statistically significant positive impact on UFDE in western cities. Overall, the policy dividend of NFCP appears to be effectively realized in the eastern and central regions, whereas a clear attenuation of the policy effect is observed in the western region.

4.3.2. City Size Heterogeneity

City size typically reflects the population carrying capacity and density of a region, the degree of constraints on land resources within built-up areas, and the scale of allocation of administrative and fiscal resources by local governments. Owing to inherent structural differences across cities of varying sizes—in terms of pressures on green space planning, the carrying capacity of infrastructure, and the initial conditions for ecological construction—the implementation pathways and ultimate effectiveness of a uniform NFCP when it is rolled out to cities of different scales may exhibit heterogeneity. Accordingly, this study partitions the full sample into large-city and small-city subsamples based on city size characteristics in order to further test whether the NFCP effect exhibits scale-dependent asymmetry. The regression results (columns 4–5 of Table 4) show that, in the large-city and small-city subsamples, the estimated NFCP intervention effects are 0.081 and 0.047, respectively, and both are significant at the 1% level. These empirical findings indicate that the effectiveness of NFCP is not constrained in absolute terms by city size; the policy generates a comprehensive and positive effect on the UFDE of both large and small cities.

4.3.3. Heterogeneity in Government Intervention Intensity

As a typical public-sector-led urban environmental and ecological project, NFCP’s planning, implementation, and sustained advancement often rely heavily on local governments’ administrative mobilization capacity and their coordinated fiscal resources. Differences in local governments’ intervention intensity (typically manifested as variations in the relative size of public fiscal expenditure, i.e., “big government” versus “small government”) may directly determine the degree of resource allocation and governance efficiency during the policy execution process. Based on this, we measure the degree of government intervention using the ratio of fiscal expenditure to GDP. Cities with a degree of government intervention higher than the sample mean are classified as “big government” cities, while those below the mean are classified as “small government” cities. Consequently, we divide the full sample into two subsamples, “big government” and “small government”, to examine whether the effectiveness of the NFCP is conditioned by differences in the scale of local governments.
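The grouping rule above is simple enough to state in a few lines. The following is a minimal sketch, assuming one fiscal-expenditure and one GDP figure per city; the function name, the city labels, and the numbers are hypothetical, and the treatment of a city exactly at the mean (assigned to “small”) is our assumption.

```python
from statistics import mean

def classify_by_intervention(fiscal_expenditure, gdp):
    """Label each city 'big' or 'small' by its fiscal expenditure / GDP ratio vs. the sample mean."""
    ratios = {city: fiscal_expenditure[city] / gdp[city] for city in fiscal_expenditure}
    threshold = mean(ratios.values())
    # Strictly above the mean -> "big"; at or below the mean -> "small" (assumption).
    groups = {city: ("big" if r > threshold else "small") for city, r in ratios.items()}
    return groups, threshold

# Invented figures for illustration only.
spend = {"city_a": 30.0, "city_b": 10.0, "city_c": 14.0}
gdp = {"city_a": 100.0, "city_b": 100.0, "city_c": 100.0}
groups, threshold = classify_by_intervention(spend, gdp)
```

Because the threshold is a sample mean rather than a median, the two groups need not be of equal size; in the toy data one city is “big” and two are “small”.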

The regression results (columns 1–2 of Table 5) show that, in the grouped tests by government intervention intensity, the estimated intervention effect coefficients of the NFCP are 0.045 for the big government subsample and 0.041 for the small government subsample. Among them, the intervention coefficient in the big government group is significant at the 5% level, whereas the coefficient in the small government group is significant at the more stringent 1% level. These findings objectively indicate that, regardless of whether the local government’s intervention scale is relatively large or small, NFCP can effectively and significantly promote urban forest development in prefecture-level cities. Moreover, by comparing the regression outcomes across the two groups, it can be observed that not only are both coefficients positive and statistically significant but their absolute magnitudes are also very close (0.045 versus 0.041). This suggests that NFCP’s positive driving effect on UFDE exhibits a high degree of consistency and stability across different government scales, and the policy effectiveness does not display a clear polarization or asymmetric pattern due to differences in local governments’ intervention intensity.

4.3.4. Heterogeneity in Resource Endowments

A city’s development trajectory is often tightly linked to its initial resource endowment and the structure of its dominant industries. Resource-based cities typically rely on the extraction and primary processing of natural resources for a prolonged period. Compared with non-resource-based cities, they often face more complex issues related to land degradation and leftover damage, relatively weak ecological foundations, and a more heavily constrained industrial structure. These inherent differences in ecological and industrial conditions may therefore lead to systematic disparities between the two types of cities in terms of the initial difficulty of advancing urban greening, the level of cost investment required, and the implementation pathways adopted. Whether a uniform NFCP yields differentiated implementation effects across these two city types is thus an empirical question worthy of in-depth examination. Accordingly, based on cities’ resource-endowment characteristics, this study divides the full sample into resource-based cities and non-resource-based cities to conduct the heterogeneity tests.

The regression results (columns 3–4 of Table 5) show that, in the grouped tests by resource endowment, the estimated NFCP intervention effect coefficients are identical across the two subsamples: both equal 0.036. In terms of statistical significance, the coefficient for non-resource-based cities is significant at the 1% level, whereas the coefficient for resource-based cities is significant at the 10% level. These results indicate that, regardless of whether a city has resource-based industrial characteristics, NFCP exerts a positive effect on its UFDE. Because the point estimates are the same in the two groups, resource-based cities do not appear to receive a smaller average marginal improvement in urban greening outcomes than non-resource-based cities under the same national policy shock. Although statistical significance in the resource-based city subsample is weaker than in the non-resource-based city subsample (10% versus 1%), possibly owing to within-sample variance or other factors, the overall results indicate that the policy delivers stable positive effects in both types of cities, with no clear evidence of policy exclusion attributable to differences in resource endowments.

4.3.5. Heterogeneity in Natural Climatic Conditions

As an ecosystem project centered on living vegetation, urban forest development is unavoidably constrained by fundamental natural climatic conditions, such as precipitation and humidity. Cities with different climatic characteristics exhibit inherent disparities in the selection of suitable tree species, vegetation survival rates, the difficulty of afforestation and landscaping, and the requirements for subsequent maintenance and management. Therefore, when a uniform NFCP is implemented across climate zones, its policy effectiveness is likely to diverge significantly. Against this background, this study divides the full sample into two subsamples—“humid-type” and “arid-type” cities—based on the wet–arid climatic conditions of the cities’ locations to test whether natural climate endowments introduce heterogeneous interference in the implementation effects of the policy.

The regression results (columns 5–6 of Table 5) show that the heterogeneity test by the wet–arid type exhibits the most pronounced inter-group differences among all the subgroup analyses conducted in this study. Specifically, in the humid-type city subsample, the estimated NFCP intervention effect coefficient reaches 0.102 and is highly statistically significant at the 1% level. Moreover, this coefficient is the largest in absolute magnitude across all heterogeneity analyses, indicating that NFCP substantially promotes UFDE in humid cities. By contrast, in the arid-type city subsample, the intervention effect coefficient sharply declines to 0.018, failing to achieve statistical significance at any conventional level. This starkly different empirical pattern objectively suggests that, under the current data sample, the positive effects generated by NFCP are almost entirely concentrated in humid-type cities, whereas in arid-type cities, the policy has not yet produced a substantively positive and statistically significant impact on UFDE.

Overall, the implementation effectiveness of NFCP demonstrates an exceptionally strong dependence on climatic conditions and a high degree of asymmetry. The underlying logic behind this asymmetry warrants further exploration. First, regarding stringent ecological constraints, arid regions are strictly limited by total precipitation and available water resources. The scarcity of suitable large canopy trees and the low natural survival rate of vegetation constitute an insurmountable “hard constraint” for urban forest development, preventing the initial policy momentum from easily translating into tangible improvements in greening indicators. Second, concerning exorbitant maintenance costs, sustaining an equivalent scale of urban forests in arid areas relies heavily on long-term upkeep investments, such as artificial irrigation and soil amelioration. This drastically exacerbates the financial burden on local governments, thereby diminishing their marginal willingness to execute the policy. Finally, from the perspective of the long-term feasibility of urban forestry investments, indiscriminately advancing large-scale, water-intensive greening projects in water-scarce regions entails high risks of ecological degradation. It may even aggravate regional water stress, thwarting the achievement of dual financial and ecological sustainability. Consequently, compared to wet regions with superior natural ecological endowments, arid regions confront steeper barriers and greater risks when advancing the National Forest City policy, naturally restricting the manifestation of policy dividends.

4.4. Robustness Checks

4.4.1. Test Based on Principal Component Analysis (PCA)

When calculating the UFDE, the AHP scoring is relatively subjective. To verify the robustness of our results, we re-estimate the model using PCA. Compared to AHP, which relies on the subjective weighting of expert scoring, the core advantage of PCA lies in its objective, data-driven nature, thereby eliminating the interference of human bias. Moreover, PCA can automatically eliminate multicollinearity among indicators. When processing complex objective quantitative data, its calculation results offer greater mathematical rigor and scientific persuasiveness.
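One common way to build a PCA-based composite index is to take the first principal component of the standardized indicators. The sketch below illustrates that approach on simulated data; whether this study uses only the first component or a variance-weighted combination of several components is not stated here, so the function name, the single-component choice, and the toy indicators are all assumptions.

```python
import numpy as np

def pca_composite_index(indicators):
    """First-principal-component score of standardized indicators (rows = observations)."""
    X = np.asarray(indicators, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # z-score each indicator
    corr = Z.T @ Z / Z.shape[0]                # correlation matrix of the indicators
    eigvals, eigvecs = np.linalg.eigh(corr)    # eigenvalues returned in ascending order
    loading = eigvecs[:, -1]                   # direction of largest variance
    if loading.sum() < 0:                      # eigenvector sign is arbitrary; fix it
        loading = -loading
    return Z @ loading, loading, eigvals[-1]

# Simulated indicators sharing one common factor (illustrative only).
rng = np.random.default_rng(0)
common = rng.normal(size=200)
toy = np.column_stack([common + 0.3 * rng.normal(size=200) for _ in range(4)])
scores, loading, top_eigval = pca_composite_index(toy)
```

The loadings play the role that expert weights play in AHP, but they are determined entirely by the correlation structure of the data.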

Table 6 presents the regression results using the UFDE calculated via PCA. In the full sample, the NFCP exerts a significant promoting effect on UFDE. For the eastern region, although the effect of the NFCP on UFDE is no longer statistically significant, the magnitude of the regression coefficient is notably larger than that of the western region. Conversely, while the regression coefficient for the NFCP in the western region is significant at the 10% level, its magnitude remains the smallest among the eastern, central, and western regions. This ordering of magnitudes is largely consistent with the baseline regression results. For cities in arid regions, the impact of the NFCP on UFDE does not pass the significance test; for cities in humid regions, the impact of the NFCP on UFDE is significantly positive at the 1% level. These findings indicate that our core conclusion regarding the effectiveness of the NFCP remains robust when UFDE is calculated using PCA, which reduces the concern that the results are driven by the choice of measurement method.

4.4.2. Alternative Dependent Variable

To mitigate the risk that the UFDE estimates depend on the arbitrary choice of a particular weighting scheme, this study conducts a robustness check by replacing the measurement of UFDE. The arithmetic mean method is one of the classical approaches in the AHP framework for approximating the principal eigenvector of the judgment matrix: it normalizes each column of the matrix and then takes the row-wise mean. The calculation is transparent and, when the judgment matrix is reasonably consistent, yields stable weight allocations. Therefore, this study recalculates the AHP weights for the underlying evaluation indicators using the arithmetic mean method, derives a new UFDE index accordingly, and then substitutes it for the original dependent variable to re-estimate the model.
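The arithmetic mean method can be illustrated with a short sketch. The 3×3 judgment matrix below is hypothetical (it is not the expert scoring used in this study) and is chosen to be perfectly consistent, so the approximation reproduces the underlying weights exactly.

```python
def ahp_arithmetic_mean_weights(judgment):
    """Approximate AHP priority weights: normalize each column, then average each row."""
    n = len(judgment)
    col_sums = [sum(judgment[i][j] for i in range(n)) for j in range(n)]
    normalized = [[judgment[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return [sum(row) / n for row in normalized]

# Hypothetical reciprocal judgment matrix: indicator 1 is twice as important
# as indicator 2 and four times as important as indicator 3.
judgment = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_arithmetic_mean_weights(judgment)
```

For this matrix the weights are 4/7, 2/7, and 1/7. With an inconsistent judgment matrix the result only approximates the principal eigenvector, which is why a consistency check is normally applied before the weights are used.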

Figure 2 visually presents the point estimates and their corresponding confidence intervals for the regression model across different sample groups after replacing the dependent variable. The red dashed line in the figure represents the baseline where the coefficient estimate equals zero. As shown in Figure 2, in the full sample, as well as in the eastern, central, and humid region subsamples, the estimated intervention effect of the NFCP is clearly greater than zero. This implies that, even after altering the internal weighting rule of the dependent variable, the policy’s average promoting effect on UFDE nationwide, as well as in the eastern, central, and humid regions, remains valid. By contrast, in the western and arid region subsamples, although the estimated intervention effects of the NFCP are also above zero, the zero baseline intersects their confidence intervals, indicating that the regression coefficients for these subsamples fail to pass the statistical significance test. This suggests that, following the substitution of the dependent variable, the policy’s promoting effect on UFDE in the western and arid regions is statistically insignificant. Overall, these findings comprehensively demonstrate that the core conclusion regarding the effectiveness of the NFCP in this study is highly robust and reliable, effectively ruling out any incidental biases potentially driven by the adoption of a specific weighting method for measurement.

4.4.3. City-by-Time Interaction Fixed Effects

To mitigate the potential bias caused by omitted unobservable time-varying city-specific factors and to effectively enhance the accuracy of the DML causal inference, we further control for city-by-time interaction fixed effects in this robustness check. The regression results are presented in Figure 2. After controlling for the city-by-time interaction fixed effects, the estimated intervention effects of the NFCP for the full sample, as well as the eastern, central, and humid region subsamples, are clearly located to the right of the zero baseline, and the zero baseline does not intersect their confidence intervals. This indicates that, even with the inclusion of city-by-time interaction fixed effects, the average promoting effect of the NFCP on UFDE remains robust for the nation as a whole, as well as for the eastern, central, and humid regions. In contrast, for the western and arid region subsamples, although the estimated intervention effects of the NFCP are also located to the right of the zero baseline, the zero baseline intersects their confidence intervals. This suggests that the impact of the NFCP on UFDE in the western and arid regions fails to pass the statistical significance test after controlling for these interaction fixed effects. Overall, these findings demonstrate that the core conclusion regarding the effectiveness of the NFCP remains highly robust even after strictly controlling for city-by-time interaction fixed effects.

4.4.4. Substitution of the Estimation Method

To verify that the causal inference results produced by the DML framework are not solely driven by the algorithmic specifications of the Lasso regression, this study replaces the estimation procedure with ordinary least squares (OLS), elastic net, and ridge regression for robustness checks. Within the DML framework, the advantage of adopting OLS as a benchmark estimator lies in its provision of the most intuitive and transparent linear causal reference point, which enables us to examine whether the estimated intervention effect is merely an artifact generated by fitting higher-order nonlinearities. Meanwhile, this study further introduces elastic net and ridge regression as alternative estimators in the machine learning dimension. Ridge regression improves model stability by incorporating an L₂ penalty term, which can effectively address potential severe multicollinearity among high-dimensional control variables. As a result, it substantially reduces the variance of the estimates and enhances the reliability of parameter estimation. Elastic net integrates the complementary strengths of Lasso and ridge regression. It not only facilitates effective feature selection but also maintains favorable predictive performance when explanatory variables are highly correlated.
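To make the role of the nuisance learner concrete, the following sketch implements a cross-fitted partialling-out estimator for a partially linear model, with closed-form ridge regression as the learner. It is an illustration under stated assumptions rather than the estimation code of this study: the function names are hypothetical, the data are simulated with a continuous treatment, and fixed effects are omitted. Swapping `ridge_fit_predict` for a Lasso, elastic net, or OLS fit would give the other variants discussed above.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Closed-form ridge regression (L2 penalty) with an unpenalized intercept."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]),
                           Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - x_mean) @ beta

def dml_plr(y, d, X, n_folds=5, lam=1.0, seed=0):
    """Cross-fitted estimate of theta in y = theta * d + g(X) + e."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    y_res, d_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Nuisance predictions use models fitted on the other folds only.
        y_res[test] = y[test] - ridge_fit_predict(X[train], y[train], X[test], lam)
        d_res[test] = d[test] - ridge_fit_predict(X[train], d[train], X[test], lam)
    # Regress the outcome residual on the treatment residual.
    theta = (d_res @ y_res) / (d_res @ d_res)
    # Heteroskedasticity-robust standard error from the orthogonal score.
    psi = d_res * (y_res - theta * d_res)
    se = np.sqrt(np.mean(psi ** 2) / n) / np.mean(d_res ** 2)
    return theta, se

# Simulated data with a known effect of 0.5 (illustrative only).
rng = np.random.default_rng(42)
n, p = 2000, 10
X = rng.normal(size=(n, p))
d = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)
y = 0.5 * d + X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=n)
theta_hat, se_hat = dml_plr(y, d, X)
```

Because the final step uses only residuals, the estimate of the treatment effect is insensitive to small errors in either nuisance fit, which is why different learners are expected to give similar answers when the model is well specified.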

The regression results are presented in Figure 3: Regardless of whether OLS estimation, elastic net, or ridge regression is employed, the estimated intervention effects of the NFCP for the full sample, as well as the eastern, central, and humid region subsamples, are clearly located to the right of the zero baseline, and the zero baseline does not intersect their confidence intervals. This indicates that the average promoting effect of the NFCP on UFDE remains statistically significant. Conversely, for the western and arid region subsamples, although the estimated coefficients of the NFCP are also positioned to the right of the zero baseline, the zero baseline intersects their confidence intervals, implying that the impact of the NFCP on UFDE in these regions fails to pass the statistical significance test. These findings demonstrate that the core conclusion regarding the effectiveness of the NFCP is not driven by the choice of estimation methods or underlying algorithms, thereby proving to be highly robust.

4.4.5. Treatment of Outliers

The presence of potential outliers in the sample data, if left untreated, could severely bias the regression coefficients, thereby undermining the robustness of the conclusions and even distorting statistical inference. Therefore, to verify the robustness of the baseline regression results, we employ the Winsorization method to treat the variables. Specifically, to ensure the reliability of the results, the variables are Winsorized at the 5th and 95th percentiles. The regression results following Winsorization are presented in Figure 4: After Winsorizing the variables, the estimated intervention effects of the NFCP for the full sample, as well as the eastern, central, and humid region subsamples, are clearly located to the right of the zero baseline, and the zero baseline does not intersect their confidence intervals. Concurrently, for the western and arid region subsamples, although the intervention effects of the NFCP are also positioned to the right of the zero baseline, the zero baseline intersects their confidence intervals, indicating that the impact of the NFCP on UFDE in the western and arid regions fails to pass the statistical significance test. These findings clearly demonstrate that the core conclusion regarding the effectiveness of the NFCP is not driven by outliers and remains highly robust.
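Winsorization at the 5th and 95th percentiles can be sketched as follows. This is a minimal illustration with hypothetical function names and toy data; the percentile uses linear interpolation between order statistics, which is one of several conventions and is an assumption here.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def winsorize(values, lower=5.0, upper=95.0):
    """Replace values below/above the chosen percentiles with the percentile values."""
    s = sorted(values)
    lo_cut, hi_cut = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]

# Twenty ordinary values plus one extreme outlier (illustrative only).
data = [float(v) for v in range(1, 21)] + [1000.0]
clipped = winsorize(data)
```

Unlike trimming, Winsorization keeps every observation in the sample and only pulls the tails in toward the cut-offs, so the sample size is unchanged.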

4.4.6. Controlling the Effects of Other Policies

In the process of promoting urban ecological civilization in China, multiple macro-level policy initiatives are often implemented with similar objectives and overlapping time horizons. For example, during the same period in which NFCP was being advanced, a national-level program—the National Garden City (NGC) selection policy led by the Ministry of Housing and Urban–Rural Development—was also concurrently promoted. Policies within the same category exhibit substantial overlap in terms of implementation timing, coverage areas, and policy visions related to “increasing green coverage and reducing carbon emissions.” If, during evaluation, these simultaneously implemented policy factors are not disentangled and excluded from the model, the “policy effect” estimated in the baseline regression is very likely to be confounded with the benefits arising from other policies. In this case, it becomes difficult to determine whether improvements in a city’s greening level are attributable specifically to NFCP or simply because the city happened to be selected as an NGC. Therefore, to eliminate such compounded effects and precisely identify the net effect attributable solely to NFCP, it is necessary to further control for the interference of these other policies in the model. To this end, this study incorporates related policies such as NGC as additional control variables within the DML model, and then it re-conducts causal identification and regression estimation. The results are shown in Figure 3. After controlling for NGC, the estimated effects are consistent with the baseline estimates and the heterogeneity estimates across the full sample, as well as across the eastern, central, and western regions and the arid- and humid-region subsamples. This indicates that the estimation results are robust.

5. Discussion

Based on panel data from prefecture-level cities in China, this study employs the DML framework to evaluate the impact of NFCP on UFDE. The results show that the policy significantly improves UFDE overall and that the estimated effects exhibit pronounced heterogeneity across different regions and under varying endowments of natural conditions. Compared with the existing literature, this study makes contributions in two respects. Methodologically, it introduces the DML framework to enhance the robustness of causal identification. Substantively, it further deepens the understanding of the policy’s mechanisms of effect and its applicable boundaries.

First, from the theoretical perspective of policy effects, the significant positive impact of the NFCP can be understood as an empirical manifestation of the potential synergistic relationship between environmental regulation and local governance incentives. The existing literature demonstrates that environmental policies typically influence local government behavior through a fundamental “constraint–incentive” framework; that is, they create constraints by setting targets and assessment standards while simultaneously guiding the behavioral orientation of local governments through performance evaluation systems [10,11]. In the institutional context of China, this logic is often coupled with the “target responsibility system” and a performance-oriented official evaluation system [13], thereby incorporating ecological governance into the vital agenda of local governance to a certain extent. Under this framework, as a policy arrangement with clear indicator requirements, the policy effects of the NFCP can be interpreted from the perspectives of “strengthened assessment constraints” and “adjustment of governance priorities.” Specifically, the relevant indicator system may compel local governments to attach greater importance to urban forest construction during resource allocation and policy execution, thereby driving the improvement of corresponding ecological indicators. Meanwhile, as environmental performance is gradually integrated into the evaluation system, local governments’ investments and execution efforts in the field of ecological governance are likely to increase accordingly [12]. Furthermore, the honorary attributes attached to the policy may, to some extent, enhance its external visibility, although the specific mechanisms and extent of this impact require further investigation. 
Our empirical results further validate this logic: Following the implementation of the policy, indicators related to urban forest construction (such as green coverage rate and ecological service capacity) exhibited systematic improvements. This is consistent with the findings of Nowak et al. (2006) [15] and Livesley et al. (2016) [1] regarding the crucial role of urban forests in improving air quality and enhancing ecosystem services.

Second, this study finds substantial regional heterogeneity in the policy effects: The impact is more pronounced in the eastern and central regions, while it is relatively weaker in the western region. This pattern can be explained from two perspectives, namely differences in development stage and constraints arising from natural endowments. From an economic development standpoint, the eastern and central regions possess stronger fiscal capacity and more developed infrastructure, enabling them to translate policy requirements into actual ecological investment more effectively [23]. In addition, higher levels of development are typically associated with stronger public environmental preferences, which further encourage local governments to strengthen greening initiatives [12]. From the perspective of natural endowments, urban forest development is highly dependent on water resources and climatic conditions [16]. Arid and semi-arid cities not only face stringent ecological baseline constraints (e.g., insufficient precipitation and high evaporation rates that result in low afforestation survival rates) but must also bear exorbitant subsequent maintenance costs, particularly for water resource allocation and continuous artificial irrigation, to sustain a certain stock of urban forests. Such steep financial and ecological costs directly threaten the long-term feasibility of urban forestry investments. Forcibly imposing the same greening standards in arid regions as in humid regions not only poses an acute risk of vegetation degradation and ecological regression but may also trigger secondary ecological crises, such as competing with residents for water use or even exacerbating groundwater depletion. Consequently, this negative feedback mechanism in arid regions significantly attenuates the actual efficacy of the policy. A similar conclusion is echoed by Tratalos et al. (2007) [24], who indicate that the distribution of urban green space is significantly influenced by climatic and geographical conditions. Therefore, despite the uniform standards of the NFCP, its marginal effects inherently vary across different ecological zones. This suggests that future policy design should place greater emphasis on being “tailored to local conditions.” For ecologically fragile arid regions, policymakers should move beyond the paradigm of singularly pursuing green coverage rates. Instead, the focus should shift to “determining greening based on water availability” and developing water-saving, near-natural urban greening models.

From a methodological standpoint, an important contribution of this study lies in introducing DML into the evaluation of urban ecological policies characterized by staggered implementation. Traditional DID methods rely on linear model specifications and on the parallel trends assumption. When a policy such as the NFCP is implemented in phases, these conventional approaches often struggle to address the “negative weighting” bias caused by the heterogeneity of treatment effects across different cohorts. In contrast, DML offers gains in two main respects. First, in the context of staggered policies, the timing of a city’s enrollment is often highly correlated with its initial economic and ecological conditions. By constructing Neyman-orthogonal scores and using cross-fitting, DML makes the estimated policy effect less sensitive to errors in the estimated nuisance functions, which helps control the confounding bias induced by self-selection on observed characteristics. Second, DML does not depend on a linear functional form for the controls. By using machine learning algorithms to fit the nuisance functions flexibly, it accommodates high-dimensional and nonlinear confounders, although identification still requires that treatment be as good as random conditional on the included covariates and fixed effects. In the face of progressive policy rollouts and heterogeneous urban development, these gains allow this study to identify the net effect of the NFCP more robustly than traditional policy evaluation methods. In addition, this study examines the consistency of the results using alternative estimators, including OLS, ridge regression, and elastic net. Ridge regression uses L₂ regularization to mitigate multicollinearity [25]. Elastic net combines L₁ and L₂ penalties, balancing variable selection against model stability [26]. Incorporating these methods makes the estimate of the policy’s net effect more reliable and reduces the risk that it reflects misspecification bias in traditional modeling frameworks.
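The partialling-out logic behind DML can be illustrated with a minimal sketch. It assumes a partially linear model, random forest learners, and simulated data with a known effect of 0.5; none of these choices is taken from the article, and the city and time fixed effects used in the actual estimation are omitted. The names `simulate` and `dml_plr` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold


def simulate(n=1500, seed=0):
    """Toy data (assumption): uptake and outcome both depend on X[:, 0]."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 5))
    propensity = 1.0 / (1.0 + np.exp(-X[:, 0]))
    d = rng.binomial(1, propensity).astype(float)
    # True treatment effect is 0.5; the remaining terms are confounding and noise.
    y = 0.5 * d + X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=n)
    return y, d, X


def dml_plr(y, d, X, n_splits=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(X) + e."""
    y_res = np.zeros(len(y))
    d_res = np.zeros(len(d))
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train, test in folds.split(X):
        # Nuisance models are fitted on the other folds only (cross-fitting).
        m_y = RandomForestRegressor(n_estimators=100, min_samples_leaf=5,
                                    random_state=seed)
        m_d = RandomForestRegressor(n_estimators=100, min_samples_leaf=5,
                                    random_state=seed)
        m_y.fit(X[train], y[train])
        m_d.fit(X[train], d[train])
        y_res[test] = y[test] - m_y.predict(X[test])
        d_res[test] = d[test] - m_d.predict(X[test])
    # Regress outcome residuals on treatment residuals (orthogonal score).
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    # Heteroskedasticity-robust standard error based on the score.
    psi = d_res * (y_res - theta * d_res)
    se = np.sqrt(np.mean(psi ** 2) / len(y)) / np.mean(d_res ** 2)
    return theta, se


y, d, X = simulate()
naive = y[d == 1].mean() - y[d == 0].mean()  # ignores confounding
theta_hat, se_hat = dml_plr(y, d, X)
print(f"naive: {naive:.3f}  DML: {theta_hat:.3f} (se {se_hat:.3f})")
```

In this toy setting the naive difference in means absorbs the confounding through `X[:, 0]`, while the residual-on-residual regression should land much closer to the simulated effect. The sketch still relies on all confounders being observed in `X`.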

This study further verifies the independence and validity of the main conclusions by controlling for NGC. In practice, multiple environmental policies are often implemented concurrently; if such overlapping policies are not accounted for, the estimated effect of any one of them may be confounded by the others, producing a policy mix effect. Prior research suggests that the stacking of multiple policies can lead to causal identification bias [20]. After incorporating NGC as an additional variable in the model, our key estimates remain statistically significant, indicating that the effect of the NFCP is not driven by this concurrent policy. This strengthens the credibility of the research findings and suggests that the NFCP makes a distinct contribution within China’s urban ecological governance framework.

From a broader perspective on urban sustainable development, the findings of this study also carry important policy implications. Urban forests are not only a key component of ecological construction but also a critical means to enhance urban livability and mitigate climate change [4,27]. The United Nations Sustainable Development Goals (SDGs) explicitly emphasize the role of urban green spaces in improving residents’ well-being. Our empirical results suggest that, through policy guidance, cities can achieve a substantial improvement in the level of urban green infrastructure within a relatively short period. This provides a potentially replicable policy experience for developing countries seeking to advance urban sustainability.

6. Conclusions

Focusing on UFDE in China’s prefecture-level cities, this study constructs a comprehensive evaluation framework using the AHP–entropy weight method and, in combination with the DML approach, conducts a systematic assessment and causal identification of the implementation effects of the NFCP. By integrating multidimensional greening indicators, improving the empirical methodology, and conducting a series of robustness checks, this study establishes a relatively complete research framework and an internally coherent logical chain. The main findings are as follows. First, this study finds that the NFCP significantly improves UFDE in prefecture-level cities, and this conclusion remains stable across various model specifications and testing conditions. This suggests that institutional arrangements centered on policy selection and incentive mechanisms can effectively guide local governments to optimize resource allocation, increase ecological investment, and thereby achieve sustained improvements in urban greening levels. Second, the policy effects are more pronounced in the eastern and central regions and in areas with humid climatic conditions, but they are relatively weaker in the western region and in arid areas.

The research innovations and academic contributions of this study are reflected in three aspects. The first is methodological innovation. This study introduces the DML framework into the evaluation of urban ecological policies, thereby addressing estimation biases that commonly arise in conventional parametric models under high-dimensional and nonlinear settings and improving the precision and robustness of causal identification. The second is innovation in the indicator system. This study develops the UFDE index by combining the analytic hierarchy process (AHP) with the entropy-weight method. In addition, it incorporates a coupling–coordination degree model, thereby enriching the ways in which UFDE can be measured. The third is the expansion of the research perspective. Building upon the identification of the overall policy effect, the analysis further investigates heterogeneity across multiple dimensions, including regional differences, natural conditions, and the intensity of government intervention. This provides more granular empirical evidence for understanding how the policy takes effect.

From a practical standpoint, this study offers important implications for optimizing urban ecological governance and promoting green development in China. Firstly, it is crucial to continue refining the NFCP system by strengthening assessment and incentive mechanisms. This will facilitate a shift in ecological development from quantitative expansion to qualitative improvement. Secondly, policies should be differentiated based on the resource endowments and development stages of various regions. Specifically, increased financial and technical support should be directed towards western and ecologically fragile areas. Thirdly, the boundaries between government and market roles should be clearly defined, encouraging social capital participation in ecological construction to enhance resource allocation efficiency. Finally, there is a need to strengthen the synergistic design among different ecological policies to avoid efficiency losses caused by policy overlaps.

It is important to acknowledge that this study also has certain limitations. Although the DML method offers advantages in causal identification, its results are nonetheless contingent on data quality and variable selection. Furthermore, this study primarily focuses on the average treatment effect of the policy and has not yet delved into an in-depth analysis of its long-term dynamic effects or micro-level mechanisms (e.g., changes in firm behavior or resident welfare). Future research could integrate remote sensing data, micro-level firm data, or resident survey data to further explore the comprehensive impacts of urban forest policies across multiple dimensions (e.g., environmental health effects). Concurrently, it would be beneficial to investigate the synergistic or substitutive relationships among different ecological policies to achieve a more comprehensive understanding of the operational mechanisms within the ecological governance system.

Author Contributions

Conceptualization, H.L. and Y.W.; methodology, H.L.; software, H.L.; validation, L.W. and F.W.; formal analysis, H.L.; investigation, H.L.; resources, F.W.; data curation, Y.W.; writing—original draft preparation, L.W.; writing—review and editing, H.L.; visualization, H.L.; supervision, H.L.; project administration, Y.W.; funding acquisition, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the General Program of the National Social Science Fund of China (Grant No. 22BJY153).

Data Availability Statement

The data presented in this study are available upon request from the corresponding author due to the large scale and high dimensionality of the compiled panel dataset.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

The calculation of the coupling coordination degree index mainly consists of four steps. First, the min–max normalization method is used to standardize C1 and C2, transforming the original indicators into standardized values

U_{i}^{'}

within the interval [0, 1]. Then, based on this, the coupling degree is calculated as

C = \frac{2 \sqrt{U_{1}^{'} \cdot U_{2}^{'}}}{U_{1}^{'} + U_{2}^{'}}

which is used to characterize the interaction intensity between C1 and C2; the closer its value is to 1, the higher the degree of coupling.

Next, a comprehensive development index is constructed as

T = \alpha U_{1}^{'} + \beta U_{2}^{'}, \quad \alpha = \beta = 0.5

to reflect the overall development level.

Finally, the coupling degree is combined with the comprehensive development index to calculate the coupling coordination degree

\tilde{C} = \sqrt{C \times T}

thereby providing a comprehensive measure of the coordinated development level between C1 and C2.
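The four steps above can be sketched in a few lines of Python. This is an illustrative implementation rather than the authors' code: the function names are hypothetical, the input values are made up, and setting the coupling degree to 0 when both standardized values are 0 is an assumption, since the formula is undefined at that point.

```python
import numpy as np


def min_max(x):
    """Min-max normalization to [0, 1]; a constant series maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span


def coupling_coordination(c1, c2, alpha=0.5, beta=0.5):
    """Coupling coordination degree sqrt(C * T) for two indicator series."""
    u1, u2 = min_max(c1), min_max(c2)
    total = u1 + u2
    # Avoid 0/0: where both standardized values are 0, C is set to 0 (assumption).
    safe_total = np.where(total > 0, total, 1.0)
    coupling = np.where(total > 0, 2.0 * np.sqrt(u1 * u2) / safe_total, 0.0)
    development = alpha * u1 + beta * u2
    return np.sqrt(coupling * development)


# Hypothetical C1 (forest coverage) and C2 (built-up green coverage), in percent.
c1 = [20.0, 35.0, 50.0]
c2 = [30.0, 40.0, 45.0]
ccd = coupling_coordination(c1, c2)
print(ccd)
```

With α = β = 0.5, the product C × T simplifies to the square root of U₁′U₂′, so the coupling coordination degree equals the fourth root of U₁′U₂′. A city therefore scores high only when both standardized indicators are high.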

Appendix B

Table A1. Control variables and their definitions.

Variable	Definition	Source
Per Capita GDP	GDP/Year-End Resident Population	Urban Construction Statistical Yearbook
Level of Opening-Up	Actually Utilized Foreign Capital/GDP	Urban Construction Statistical Yearbook
Population Density	Resident Population/Administrative Area	Urban Construction Statistical Yearbook
Urbanization Rate	Urban Population/Total Population	Urban Construction Statistical Yearbook
Industrial Structure Upgrading	Share of Tertiary Industry Value-added in GDP	Urban Construction Statistical Yearbook
Fiscal Pressure	Fiscal Expenditure/Fiscal Revenue	Urban Construction Statistical Yearbook
Government Governance Capability	Smart City Pilot Status (1 If Yes, 0 Otherwise)	Obtained from Public Sources
Annual Average Temperature	Annual Average City Temperature	Calculated based on NCEI data
Average City Elevation	Average City Surface Elevation	Extracted from SRTM DEM
Terrain Ruggedness	Difference between the Highest and Lowest Elevations of the City	Extracted from SRTM DEM

References

1. Livesley, S.J.; McPherson, E.G.; Calfapietra, C. The Urban Forest and Ecosystem Services: Impacts on Urban Water, Heat, and Pollution Cycles at the Tree, Street, and City Scale. J. Environ. Qual. 2016, 45, 119–124.
2. Kacprzak, M.J.; Ellis, A.; Fijalkowski, K.; Kupich, I.; Gryszpanowicz, P.; Greenfield, E.; Nowak, D. Urban Forest Species Selection for Improvement of Ecological Benefits in Polish Cities—The Actual and Forecast Potential. J. Environ. Manag. 2024, 366, 121732.
3. Ramon, M.; Ribeiro, A.P.; Theophilo, C.Y.S.; Moreira, E.G.; de Camargo, P.B.; Pereira, C.A.d.B.; Saraiva, E.F.; Tavares, A.d.R.; Dias, A.G.; Nowak, D.; et al. Assessment of Four Urban Forest as Environmental Indicator of Air Quality: A Study in a Brazilian Megacity. Urban Ecosyst. 2023, 26, 197–207.
4. Kabisch, N.; Korn, H.; Stadler, J.; Bonn, A. Nature-Based Solutions to Climate Change Adaptation in Urban Areas—Linkages Between Science, Policy and Practice. In Nature-Based Solutions to Climate Change Adaptation in Urban Areas. Theory and Practice of Urban Sustainability Transitions; Kabisch, N., Korn, H., Stadler, J., Bonn, A., Eds.; Springer: Cham, Switzerland, 2017.
5. Wang, Y.; Zhang, M.; An, Z.; Hou, M.; Wei, F.; Lu, W. Green Innovation, Industrial Upgrading, and Urban Environmental Improvement-Evidence from the Construction of National Forest Cities in China. Forests 2026, 17, 462.
6. Wang, Y.; Zou, F.; Wei, F.; Hou, M.; Jin, H.; Zhang, M. Can Forest Cities Enhance Both Economic and Ecological Resilience? Front. Ecol. Evol. 2025, 13, 1671456.
7. Wang, Y.; Zou, F.; Guo, W.; Lu, W.; Deng, Y. Impact of Forest City Selection on Green Total Factor Productivity in China under the Background of Sustainable Development. Forests 2024, 15, 1064.
8. Ma, Y.; Geng, Y.; Zhong, S. Urban Forest Development and Extreme Heat Mitigation: The Climate Adaptation Effects of China’s National Forest City Policy. Forests 2026, 17, 79.
9. Li, L.; Li, B.; Yao, T.; Zeng, Y. Building Urban Resilience through National Forest City: Evidence from China’s Sustainable Urban Transformation. Appl. Spat. Anal. Policy 2026, 19, 73.
10. Porter, M.E.; Linde, C.V.D. Toward a New Conception of the Environment-Competitiveness Relationship. J. Econ. Perspect. 1995, 9, 97–118.
11. Ambec, S.; Cohen, M.A.; Elgie, S.; Lanoie, P. The Porter Hypothesis at 20: Can Environmental Regulation Enhance Innovation and Competitiveness? Rev. Env. Econ. Policy 2013, 7, 2–22.
12. Zheng, S.; Kahn, M.E. A New Era of Pollution Progress in Urban China? J. Econ. Perspect. 2017, 31, 71–92.
13. Li, H.; Zhou, L.-A. Political Turnover and Economic Performance: The Incentive Role of Personnel Control in China. J. Public Econ. 2005, 89, 1743–1762.
14. Angrist, J.D.; Pischke, J.-S. Mostly Harmless Econometrics: An Empiricist’s Companion; Princeton University Press: Princeton, NJ, USA, 2009.
15. Nowak, D.J.; Crane, D.E.; Stevens, J.C. Air Pollution Removal by Urban Trees and Shrubs in the United States. Urban For. Urban Green. 2006, 4, 115–123.
16. Roy, S.; Byrne, J.; Pickering, C. A Systematic Quantitative Review of Urban Tree Benefits, Costs, and Assessment Methods across Cities in Different Climatic Zones. Urban For. Urban Green. 2012, 11, 351–363.
17. Kahn, M.E.; Sun, W.; Zheng, S. Clean Air as an Experience Good in Urban China. Ecol. Econ. 2022, 192, 107254.
18. Chernozhukov, V.; Chetverikov, D.; Demirer, M.; Duflo, E.; Hansen, C.; Newey, W.; Robins, J. Double/Debiased Machine Learning for Treatment and Structural Parameters. Econom. J. 2018, 21, C1–C68.
19. Haase, D.; Larondelle, N.; Andersson, E.; Artmann, M.; Borgström, S.; Breuste, J.; Gomez-Baggethun, E.; Gren, Å.; Hamstead, Z.; Hansen, R.; et al. A Quantitative Review of Urban Ecosystem Service Assessments: Concepts, Models, and Implementation. AMBIO 2014, 43, 413–433.
20. Böhringer, C.; Keller, A.; Van Der Werf, E. Are Green Hopes Too Rosy? Employment and Welfare Impacts of Renewable Energy Promotion. Energy Econ. 2013, 36, 277–285.
21. Ostrom, E. Polycentric Systems for Coping with Collective Action and Global Environmental Change. Glob. Environ. Change 2010, 20, 550–557.
22. Heiler, P. Heterogeneous Treatment Effect Bounds under Sample Selection with an Application to the Effects of Social Media on Political Polarization. J. Econom. 2024, 244, 105856.
23. Zhang, Q.; Yang, L.; Xu, S. The Relationships of Supporting Services and Regulating Services in National Forest City. Forests 2022, 13, 1368.
24. Tratalos, J.; Fuller, R.A.; Warren, P.H.; Davies, R.G.; Gaston, K.J. Urban Form, Biodiversity Potential and Ecosystem Services. Landsc. Urban Plan. 2007, 83, 308–317.
25. Hoerl, A.E.; Kennard, R.W. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 2000, 42, 80–86.
26. Zou, H.; Hastie, T. Regularization and Variable Selection via the Elastic Net. J. R. Stat. Soc. Ser. B 2005, 67, 301–320.
27. Gill, S.E.; Handley, J.F.; Ennos, A.R.; Pauleit, S. Adapting Cities for Climate Change: The Role of the Green Infrastructure. Built Environ. 2007, 33, 115–133.

Figure 1. Results of the parallel trend test.

Figure 2. Robustness checks with alternative dependent variables and city–year interactions. ADV represents the alternative dependent variable. CTF stands for city-by-time interaction fixed effects. FULL is the full sample. ER indicates the eastern region. CR indicates the central region. WR indicates the western region. AR stands for arid region. HR stands for humid region.

Figure 3. Robustness test results using alternative estimation methods. OLS denotes the OLS estimation results. ELASTIC denotes the elastic net estimation results. RIDGE denotes the ridge regression estimation results.

Figure 4. Robustness check results after outlier treatment and controlling for other policies. Wins. denotes the Winsorized estimation results, and NGC denotes the estimation results after controlling for the NGC policy.

Table 1. Design of the evaluation index system for UFDE.

Target Layer	Criterion Layer	Indicator Layer	Indicator Attribute
UFDE (U)	Forest Network (B1)	Forest Coverage Rate (C1)	Positive
	Forest Network (B1)	Green Coverage Rate of Built-up Areas (C2)	Positive
	Forest Health (B2)	Growth Rate of Forest Coverage (C3)	Positive
	Forest Health (B2)	Growth Rate of Green Space Area in Built-up Areas (C4)	Positive
	Ecological Welfare (B3)	Per Capita Park Green Space (C5)	Positive
	Ecological Welfare (B3)	Per Capita Green Space in Built-up Areas (C6)	Positive
	Development Coordination (B4)	Coupling Coordination Degree of C1 and C2 (C7)	Positive

Table 2. U–B judgment matrix.

U	B1	B2	B3	B4
B1	1	5	3	5
B2	1/5	1	1/3	1
B3	1/3	3	1	3
B4	1/5	1	1/3	1
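
As a rough check on the judgment matrix in Table 2, the criterion-layer weights and the consistency ratio can be computed as follows. This is a minimal sketch, assuming the principal-eigenvector method and Saaty's commonly tabulated random index (0.90 for a 4 × 4 matrix); the article's own calculation may differ in detail, and `ahp_weights` is an illustrative name.

```python
import numpy as np

# U-B judgment matrix from Table 2 (rows and columns ordered B1, B2, B3, B4).
A = np.array([
    [1.0, 5.0, 3.0, 5.0],
    [1 / 5, 1.0, 1 / 3, 1.0],
    [1 / 3, 3.0, 1.0, 3.0],
    [1 / 5, 1.0, 1 / 3, 1.0],
])

# Commonly tabulated random consistency index values (assumption).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Return (weights, lambda_max, consistency_ratio) for a judgment matrix."""
    eigvals, eigvecs = np.linalg.eig(matrix)
    k = np.argmax(eigvals.real)            # principal (largest real) eigenvalue
    lam_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                        # normalize weights to sum to 1
    n = matrix.shape[0]
    ci = (lam_max - n) / (n - 1)           # consistency index
    cr = ci / RANDOM_INDEX[n]              # consistency ratio
    return w, lam_max, cr


weights, lam_max, cr = ahp_weights(A)
for name, w in zip(["B1", "B2", "B3", "B4"], weights):
    print(f"{name}: {w:.3f}")
print(f"lambda_max = {lam_max:.3f}, CR = {cr:.3f}")
```

A consistency ratio below 0.10 is the usual threshold for accepting a judgment matrix. Because the rows for B2 and B4 are identical, the two criteria receive the same weight.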

Table 3. Baseline regression results.

Variables	Dependent Variable: UFDE
Variables	(1)	(2)	(3)	(4)	(5)	(6)
NFCP	0.050 **	0.060 ***	0.055 ***	0.040 ***	0.183 *	0.054 ***
	(0.020)	(0.012)	(0.010)	(0.009)	(0.095)	(0.019)
Control Variables	no	yes	yes	yes	yes	yes
Squared Terms of the Control Variables	no	no	yes	yes	yes	yes
Time Fixed Effects	yes	yes	yes	yes	yes	yes
City Fixed Effects	yes	yes	yes	yes	yes	yes
Province Fixed Effects	no	no	no	yes	yes	yes
Obs	3280	3280	3280	3280	2546	2546

Note: ***, **, and * denote statistical significance at the 1%, 5%, and 10% levels, respectively; the values in parentheses are standard errors.

Table 4. Heterogeneity analysis by region and city size.

Variables	Dependent Variable: UFDE
Variables	Eastern	Central	Western	Large Cities	Small Cities
NFCP	0.065 ***	0.067 ***	0.012	0.081 ***	0.047 ***
	(0.015)	(0.026)	(0.022)	(0.015)	(0.017)
Control Variables	yes	yes	yes	yes	yes
Squared Terms of the Control Variables	yes	yes	yes	yes	yes
Time Fixed Effects	yes	yes	yes	yes	yes
City Fixed Effects	yes	yes	yes	yes	yes
Province Fixed Effects	yes	yes	yes	yes	yes
Obs	1200	960	1120	1180	2100

Note: ***, **, and * denote statistical significance at the 1%, 5%, and 10% levels, respectively; the values in parentheses are standard errors.

Table 5. Heterogeneity analysis by government intervention, resource-based cities, and humid–arid types.

Variables	Dependent Variable: UFDE
Variables	Strong Intervention	Weak Intervention	Resource-Based Cities	Non-Resource-Based Cities	Arid-Type Cities	Humid-Type Cities
NFCP	0.045 **	0.041 ***	0.036 *	0.036 ***	0.018	0.102 ***
	(0.019)	(0.014)	(0.020)	(0.011)	(0.012)	(0.018)
Control Variables	yes	yes	yes	yes	yes	yes
Squared Terms of the Control Variables	yes	yes	yes	yes	yes	yes
Time Fixed Effects	yes	yes	yes	yes	yes	yes
City Fixed Effects	yes	yes	yes	yes	yes	yes
Province Fixed Effects	yes	yes	yes	yes	yes	yes
Obs	483	2797	1320	1960	2160	1120

Note: ***, **, and * denote statistical significance at the 1%, 5%, and 10% levels, respectively; the values in parentheses are standard errors.

Table 6. Regression results based on PCA.

Variables	Dependent Variable: UFDE
Variables	FULL	ER	CR	WR	AR	HR
NFCP	0.086 **	0.102	0.156 *	0.089 *	0.016	0.299 ***
	(0.035)	(0.065)	(0.095)	(0.050)	(0.047)	(0.057)
Control Variables	yes	yes	yes	yes	yes	yes
Squared Terms of the Control Variables	yes	yes	yes	yes	yes	yes
Time Fixed Effects	yes	yes	yes	yes	yes	yes
City Fixed Effects	yes	yes	yes	yes	yes	yes
Province Fixed Effects	yes	yes	yes	yes	yes	yes
Obs	3280	1200	960	1120	2160	1120

Note: ***, **, and * denote statistical significance at the 1%, 5%, and 10% levels, respectively; the values in parentheses are standard errors.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
