5.2. Robustness Test
(1) Parallel trend test. Following Fang et al. [9], we use the event study method to test for parallel trends. The model is set up as follows:
where the time dummy variable denotes PDA in a given year within the time window.
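The event-study specification described above can be sketched as follows. The notation is an assumption made for illustration, not the paper's own: $Y_{it}$ is the polycentric spatial structure index of city $i$ in year $t$, $D_{it}^{k}$ equals 1 when city $i$ is $k$ years from its platform launch, $X_{it}$ collects the controls, and $\mu_i$ and $\lambda_t$ are city and year fixed effects. The window bounds $m$ and $n$ and the omitted reference period ($k=-1$, a common convention) are likewise assumed.

```latex
Y_{it} = \alpha + \sum_{\substack{k=-m \\ k \neq -1}}^{n} \beta_k D_{it}^{k}
         + \gamma^{\prime} X_{it} + \mu_i + \lambda_t + \varepsilon_{it}
```

Under this form, parallel trends correspond to $\beta_k$ being statistically indistinguishable from zero for $k<0$, while the dynamic effect is read from $\beta_k$ for $k \ge 0$.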
Figure 4 shows no significant difference in urban spatial structure between the treatment and control groups before the launch. After the launch, the urban polycentric spatial structure index of the treatment group is significantly higher than that of the control group, which supports the parallel trend assumption of the baseline regression. The increasing trend of the post-launch coefficient estimates indicates that PDA has a long-term dynamic effect that grows over time.
(2) Placebo test. We run 500 regressions in which the treated cities and the launch time of the government data platform are randomly assigned. In Figure 5, the kernel density of the estimated coefficients approximates a normal distribution with a mean of 0. The p-values of the vast majority of the estimated coefficients are greater than 0.1 (the 10% significance level), and none of the coefficients exceeds the baseline regression coefficient of 0.0251. The conclusion therefore passes the placebo test and is not driven by random factors.
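The placebo procedure can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration: the balanced synthetic panel, the true effect of 0.5, the function names, and the use of a plain two-way fixed effects estimator without controls. It is not the paper's data or code.

```python
import numpy as np

def twfe_did(y, d):
    """Two-way fixed effects DID coefficient for a balanced panel.

    y, d have shape (n_units, n_periods). Unit and period effects are removed
    by double demeaning, then demeaned y is regressed on demeaned d.
    """
    def demean(a):
        return (a - a.mean(axis=1, keepdims=True)
                  - a.mean(axis=0, keepdims=True) + a.mean())
    y_t, d_t = demean(y), demean(d)
    return float((d_t * y_t).sum() / (d_t ** 2).sum())

def placebo_coefficients(y, n_treated, n_draws=500, seed=0):
    """Re-estimate the DID with randomly drawn fake treated units and launch times."""
    rng = np.random.default_rng(seed)
    n_units, n_periods = y.shape
    coefs = np.empty(n_draws)
    for b in range(n_draws):
        d = np.zeros((n_units, n_periods))
        fake_units = rng.choice(n_units, size=n_treated, replace=False)
        # Fake launch period in 1..n_periods-1, so every fake unit has
        # at least one pre-period and one post-period.
        fake_starts = rng.integers(1, n_periods, size=n_treated)
        for u, s in zip(fake_units, fake_starts):
            d[u, s:] = 1.0
        coefs[b] = twfe_did(y, d)
    return coefs

# Hypothetical panel: 100 cities, 10 years, 30 treated from period 5 onward.
rng = np.random.default_rng(42)
n_units, n_periods = 100, 10
true_d = np.zeros((n_units, n_periods))
true_d[:30, 5:] = 1.0
y = (rng.normal(size=(n_units, 1)) + rng.normal(size=(1, n_periods))
     + 0.5 * true_d + rng.normal(scale=0.5, size=(n_units, n_periods)))

actual = twfe_did(y, true_d)
coefs = placebo_coefficients(y, n_treated=30)
share_exceeding = float(np.mean(np.abs(coefs) >= abs(actual)))
print(f"actual={actual:.3f}, placebo mean={coefs.mean():.3f}, "
      f"share exceeding={share_exceeding:.3f}")
```

If the real estimate reflects the policy rather than chance, the placebo coefficients should centre on zero and almost none should be as large as the actual estimate.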
(3) Test for heterogeneity of treatment effects. When a multi-period DID model is used to estimate policy effects, differences in the timing of policy shocks across treatment groups may produce heterogeneous treatment effects, which can seriously bias the weighted-average estimate [48]. To address this issue, this paper draws on Goodman-Bacon [49] to identify the sources and extent of bias arising from heterogeneous treatment effects. This is done by decomposing the two-way fixed effects estimator into three sets of 2 × 2 DID estimators and calculating the average treatment effect and weight of each set separately.
Figure 6 reports the results of the heterogeneous treatment-effect test based on the Goodman-Bacon decomposition. The 2 × 2 DID estimates that use cities entering the public data access pilot later, or never entering it, as the control group carry a combined weight of 97.79%, while the 2 × 2 DID estimates that use cities entering the pilot earlier as the control group carry a weight of only 2.21%. This indicates that the degree of treatment-effect heterogeneity in this paper is small and does not significantly affect the baseline regression results, which strengthens the reliability of the baseline conclusions.
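The logic of the decomposition can be written compactly. The symbols below are assumptions introduced for illustration: $s_k$ is the weight and $\hat{\beta}_k^{2\times 2}$ the estimate of the $k$-th 2 × 2 comparison, while $\bar{\beta}_{\text{later/never}}$ and $\bar{\beta}_{\text{earlier}}$ denote the weight-normalised averages within the two groups of comparisons reported above.

```latex
\hat{\beta}^{\mathrm{TWFE}} = \sum_{k} s_k \, \hat{\beta}_k^{2\times 2},
\qquad s_k \ge 0, \quad \sum_{k} s_k = 1,
\\[4pt]
\hat{\beta}^{\mathrm{TWFE}} = 0.9779 \, \bar{\beta}_{\text{later/never}}
                            + 0.0221 \, \bar{\beta}_{\text{earlier}}
```

Because comparisons that use earlier-treated cities as controls receive a weight of only 0.0221, even a sizeable bias in $\bar{\beta}_{\text{earlier}}$ moves the overall estimate very little.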
(4) Excluding confounding policy shocks. This paper manually collects policies similar to PDA for testing. The related policies mainly include the Smart City (SMART), Information Benefits the People (INFO), Broadband China (BROADBAND), Power Sector Reform (POWER), and administrative reform (ADMINISTRATIVE) pilots. (Data on administrative division changes include county-to-district conversions, county-to-city reorganizations, provincial-level county administration, and newly established prefecture-level cities during the study period. The original data were sourced from the National Administrative Division Information Inquiry Platform provided by the Ministry of Civil Affairs.)
Table 3 shows that the estimated coefficients of the policies other than PDA are insignificant. This suggests that the pilot shock selected in this paper, the launch of the government data platform, is unique and relatively exogenous, and that the findings are not affected by the confounding policies.
(5) Instrumental variables. To mitigate potential endogeneity, we adopt the instrumental variable (IV) method and estimate the model by two-stage least squares. The first instrument, IV1, is based on the inverse of the mean distance from the coastline of the domicile cities of all municipal party secretaries in each city. Relevance stems from the potential influence of municipal party secretaries’ hometown characteristics on their support for public data openness policies, as personal backgrounds may correlate with policy preferences. Exogeneity rests on the historically fixed nature of these hometown characteristics, which were established before policy implementation and are unlikely to be affected in reverse by urban spatial structure. This supports the independence of the instrument from the error term, making it suitable for estimating the impact of public data openness on urban spatial structure. The domicile characteristics of successive municipal party secretaries in each city from 1985 to 2009 (before the launch of the urban public open data platform) are manually collected [50]. Because these pre-2010 domicile characteristics are cross-sectional, they cannot be used directly as an instrument for panel data. We therefore adopt the interaction term between the number of internet broadband-access subscribers per 100 people and the domicile characteristics of pre-2010 municipal party secretaries as IV1 [51]. In columns (1) and (2) of Table 4, the p-value of the under-identification test is 0.0001, rejecting the null hypothesis of under-identification. The Cragg–Donald Wald F value of 86.0820 exceeds the commonly accepted rule of thumb of 10 and the Stock–Yogo 10% critical value for weak instruments of 16.38, indicating that weak instruments are not a serious concern. The regression coefficient of PDA is significantly positive, indicating that the baseline conclusions remain robust.
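The construction of a panel instrument from a cross-sectional characteristic, followed by two-stage least squares, can be sketched as below. The data are simulated and every name (`coast_proximity`, `broadband`, the coefficients 0.8, 0.5 and 2.0) is hypothetical. The sketch also omits fixed effects and controls and treats the endogenous regressor as continuous, so it illustrates the mechanics only, not the paper's specification.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x, one instrument z, and a constant.

    Returns the second-stage coefficient on x and the first-stage F statistic
    for the excluded instrument.
    """
    n = len(y)
    const = np.ones(n)
    Z = np.column_stack([const, z])
    x_hat = Z @ ols(Z, x)                      # first stage: fitted x
    rss_u = np.sum((x - x_hat) ** 2)           # with the instrument
    rss_r = np.sum((x - x.mean()) ** 2)        # constant only
    f_stat = (rss_r - rss_u) / (rss_u / (n - 2))
    beta = ols(np.column_stack([const, x_hat]), y)[1]  # second stage
    return float(beta), float(f_stat)

rng = np.random.default_rng(0)
n_cities, n_years = 2000, 10

# Time-invariant characteristic (one value per city) interacted with a
# time-varying series, giving an instrument that varies by city and year.
coast_proximity = rng.uniform(0.5, 1.5, size=(n_cities, 1))
broadband = rng.uniform(0.0, 2.0, size=(n_cities, n_years))
z = (coast_proximity * broadband).ravel()

u = rng.normal(size=z.size)                    # unobserved confounder
x = 0.8 * z + u + rng.normal(size=z.size)      # endogenous regressor
y = 0.5 * x + 2.0 * u + rng.normal(size=z.size)

beta_ols = float(ols(np.column_stack([np.ones(z.size), x]), y)[1])
beta_iv, first_stage_f = two_stage_least_squares(y, x, z)
print(f"OLS={beta_ols:.3f}, 2SLS={beta_iv:.3f}, first-stage F={first_stage_f:.1f}")
```

In this simulation OLS is biased upward by the confounder, while 2SLS recovers a value close to the assumed effect of 0.5. With a single endogenous regressor and homoskedastic errors, the first-stage F computed here coincides with the Cragg–Donald statistic, which is why it is compared against the same thresholds.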
The second instrument follows Bosker and Buringh [52] and uses the ratio of river density to the exchange rate (IV2), derived from geographical feature data. (River density is extracted from the vector map of rivers provided by the National Geographic Information Center; river density = river length/area.) Regarding relevance, cities with higher river density typically face complex challenges in managing water resources, conserving ecosystems, and controlling floods. Addressing these challenges requires greater transparency in public data on water resource monitoring, water environment governance, and urban drainage systems. Such data openness is crucial for supporting sustainable urban development and improving residents’ quality of life, especially before government data platforms were established. This implies a positive correlation between river density and public data openness. Regarding exogeneity, river density is primarily determined by a city’s natural geographical conditions, a long-formed characteristic with no direct causal relationship with urban spatial structure, and thus meets the exogeneity requirement for instrumental variables. Because this geographical feature is cross-sectional, this study uses the ratio of river density to the exchange rate (IV2) as the instrument so that it varies over time and matches the panel data. As shown in columns (3) and (4) of Table 4, the under-identification test and the Cragg–Donald Wald F statistic confirm that IV2 meets the instrument selection criteria, and the estimated coefficients of IV2 are significantly positive, consistent with theoretical expectations.
Regarding the exogeneity of the instruments, the exclusion restriction is difficult to verify because the disturbance term is inherently unobservable. Conley et al. [53] introduced the “plausibly exogenous” instrumental variable approach, which relaxes the strict exclusion restriction by allowing the instrument to have some direct effect on the dependent variable. This direct effect is constrained to a specific range or given a prior distribution, and confidence intervals for the regression coefficients are constructed from this prior information to assess the robustness of the estimates under imperfect exogeneity. Following Huang et al. [54], this study employs the local-to-zero (LTZ) method proposed by Conley et al. [53] for estimation. The results in columns (1)–(6) of Table 5 show that the findings based on plausibly exogenous instrumental variable estimation are consistent with the previous conclusions.
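For reference, the local-to-zero idea can be summarised as follows. The notation is assumed here for illustration ($X$ the endogenous regressors, $Z$ the instruments, $\gamma$ the direct effect of the instruments on the outcome), and the prior on $\gamma$ is taken to be normal, which is one common choice rather than necessarily the one used in this paper.

```latex
Y = X\beta + Z\gamma + \varepsilon, \qquad
\gamma \sim N(\mu_\gamma, \Omega_\gamma),
\\[4pt]
\hat{\beta}_{\mathrm{2SLS}} \overset{\text{approx}}{\sim}
N\!\left(\beta + A\mu_\gamma,\; V_{\mathrm{2SLS}} + A\,\Omega_\gamma A^{\prime}\right),
\qquad
A = \left(X^{\prime}Z\,(Z^{\prime}Z)^{-1}Z^{\prime}X\right)^{-1} X^{\prime}Z
```

Setting $\mu_\gamma = 0$ and $\Omega_\gamma = 0$ recovers the usual 2SLS inference. A nonzero $\Omega_\gamma$ widens the confidence interval to reflect uncertainty about the exclusion restriction.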
(6) Additional controls for other factors. Because educational attainment, digital literacy, and urban governance quality may be correlated with the relationship studied here, these three indicators are added as control variables. Educational attainment is measured by human capital, calculated as the ratio of regular university students to the total year-end population. Digital literacy is measured by the “Digital Financial Inclusion City Index” developed by the Digital Finance Research Center of Peking University. Urban governance quality is measured by the Government Transparency Index published by Tsinghua University. Column (1) of Table 6 shows that the results remain robust.
(7) Replace core explanatory variables. First, the method of identifying urban areas is changed. Based on LandScan data, exploratory spatial data analysis is used to identify urban areas [55]. A spatial weight matrix is constructed based on vertex adjacency, and local Moran indices are calculated to select grids whose spatial lag of population density is above average. Adjacent grids are merged into polygons according to vertex adjacency rules, and the polygon with the largest population is designated as the main urban area. Final urban areas are those with an area exceeding 1 km² and a total population above 50,000. Second, following Li [56], we use consumer-service and public-service POI data and kernel density estimation to measure a multi-center agglomeration index of service facilities. The results in Table 6 show that the conclusions remain robust after replacing the core explanatory variables.
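The grid-based identification steps can be sketched on a toy population raster. Several simplifications are assumed: vertex adjacency is taken to mean the eight surrounding cells (queen contiguity), cells are kept when both their own standardised density and its spatial lag are above average, the significance testing of the local Moran index is omitted, and the raster, cell size, and function names are hypothetical.

```python
import numpy as np

def spatial_lag_queen(z):
    """Mean of the (up to 8) vertex-adjacent neighbours of every grid cell."""
    rows, cols = z.shape
    padded = np.pad(z, 1, constant_values=0.0)
    inside = np.pad(np.ones_like(z), 1, constant_values=0.0)
    total = np.zeros_like(z)
    count = np.zeros_like(z)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            total += padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            count += inside[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
    return total / count

def label_queen(mask):
    """Label vertex-connected groups of True cells with a flood fill."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for r, c in zip(*np.nonzero(mask)):
        if labels[r, c]:
            continue
        current += 1
        labels[r, c] = current
        stack = [(r, c)]
        while stack:
            i, j = stack.pop()
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < mask.shape[0] and 0 <= nj < mask.shape[1]
                            and mask[ni, nj] and not labels[ni, nj]):
                        labels[ni, nj] = current
                        stack.append((ni, nj))
    return labels, current

def identify_urban_areas(pop, cell_area_km2=1.0, min_area_km2=1.0, min_pop=50_000):
    """Return candidate urban areas sorted by population (largest first)."""
    density = pop / cell_area_km2
    z = (density - density.mean()) / density.std()
    lag = spatial_lag_queen(z)
    # Local Moran's I is z * lag here; keep high-high cells only.
    candidate = (z > 0) & (lag > 0)
    labels, n_labels = label_queen(candidate)
    areas = []
    for k in range(1, n_labels + 1):
        cells = labels == k
        area = float(cells.sum() * cell_area_km2)
        population = float(pop[cells].sum())
        if area > min_area_km2 and population > min_pop:
            areas.append({"area_km2": area, "population": population})
    return sorted(areas, key=lambda a: a["population"], reverse=True)

# Toy raster: sparse background, two dense clusters, one isolated dense cell.
pop = np.full((20, 20), 100.0)
pop[2:7, 2:7] = 10_000.0      # 5 x 5 cluster
pop[12:15, 12:15] = 8_000.0   # 3 x 3 cluster
pop[18, 2] = 60_000.0         # isolated cell surrounded by sparse cells
areas = identify_urban_areas(pop)
print(areas)
```

The first element of `areas` plays the role of the main urban area. The isolated dense cell is dropped because its neighbours are sparse, which is the purpose of screening on the spatial lag rather than on density alone.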
(8) Other robustness tests. This paper also adopts the following robustness tests. Adjusting the research sample: the municipalities are excluded from the regression sample. Removing outlier effects: all variables in the baseline regression, except the treatment variable, are winsorized at the 1% level. Controlling for province–time interaction fixed effects: province–time interaction fixed effects are added to the baseline regression.
Table 7 indicates that the baseline conclusion that “PDA is conducive to promoting urban polycentric development” still holds.
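The outlier treatment mentioned above can be illustrated with a short winsorization sketch. It assumes a two-sided rule that clips values below the 1st percentile and above the 99th percentile. The source does not state the exact rule, and the sample array is hypothetical.

```python
import numpy as np

def winsorize(x, lower_pct=1.0, upper_pct=99.0):
    """Clip values outside the given percentiles to the percentile bounds."""
    x = np.asarray(x, dtype=float)
    lower, upper = np.percentile(x, [lower_pct, upper_pct])
    return np.clip(x, lower, upper)

# Hypothetical variable: the values 1..999 plus one extreme outlier.
raw = np.concatenate([np.arange(1.0, 1000.0), [1e6]])
clean = winsorize(raw)
print(raw.max(), clean.max(), clean.min())
```

Unlike trimming, winsorizing keeps every observation in the sample and only pulls the extreme values in to the percentile bounds, so the panel stays balanced.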