Data Elements as a Systemic Enabler of Corporate Green Innovation: A Complex Adaptive System Perspective on China’s Public Data Openness Reform

Zhang, Xuexin; Zhang, Lin

doi:10.3390/systems14070731

by Xuexin Zhang and Lin Zhang *

School of Accounting, Harbin University of Commerce, Harbin 150028, China

* Author to whom correspondence should be addressed.

Systems 2026, 14(7), 731; https://doi.org/10.3390/systems14070731 (registering DOI)

Submission received: 27 May 2026 / Revised: 19 June 2026 / Accepted: 21 June 2026 / Published: 24 June 2026


Abstract

Sustainability transitions confront firms with an informational paradox: regulatory pressure to innovate green has intensified, yet the knowledge required to do so is dispersed across agencies, sectors, and jurisdictions that rarely speak to one another. Treating data as a strategic factor of production, this paper asks whether and how opening public data (the systematic release of government-held datasets) reconfigures the conditions under which firms generate green innovation. We model the green-innovation ecosystem as a Complex Adaptive System (CAS) in which heterogeneous, bounded-rational agents co-evolve with a data-mediated selection environment. Within this frame, public data openness (PDO) is not a marginal input but an exogenous shock to the fitness landscape that propagates through three coupling channels: supply–demand alignment, recalibration of government intervention, and amplification of green credit. Formal derivations link each channel to a testable proposition, and a multi-period Difference-in-Differences (DID) design built on the staggered roll-out of Chinese municipal open-data platforms identifies the causal effects, with Callaway–Sant’Anna estimators and double/debiased machine learning (DDML) addressing recent econometric critiques. The evidence supports each proposition and reveals a distinctive heterogeneity signature consistent with differences in absorptive capacity: the policy is most consequential where agents and ecosystems are best able to convert data into knowledge. Reframing PDO as a systemic enabler clarifies why uniform rollouts yield uneven returns and motivates a tiered design that scales with the absorptive capacity of recipient firms and regions.

Keywords: public data openness; green innovation; complex adaptive systems; innovation ecosystem; data elements; difference-in-differences

1. Introduction

On a spring morning in 2016, the municipal government of Hangzhou switched on a new kind of urban infrastructure: not a road, a substation, or a fibre line, but an open-data portal that, by close of business, had exposed several hundred government-held datasets to firms, researchers, and citizens. Within three years, green-patent filings by Hangzhou-headquartered listed firms had grown markedly faster than those of otherwise comparable firms in cities that had not yet opened their data. Across China, the pattern repeats: cities that opened public data earlier and more comprehensively tend, after a lag, to host firms whose green-innovation output pulls ahead. Yet the pattern is far from uniform. Among firms in the same city, ownership structure, managerial background, and pre-existing absorptive capacity appear to partition the response into sharply different regimes. Some firms accelerate; others do not move at all.

This unevenness sits awkwardly with the dominant theoretical lenses applied to data and innovation. Resource-based and knowledge-based accounts treat data as an input whose marginal product is positive but bounded by absorptive capacity [1]; information-economics accounts emphasise signal-precision gains that loosen credit constraints [2]; and digital-economics arguments highlight reductions in search and verification cost. Each lens captures something, but each treats the firm in isolation from the ecosystem in which it acts. The heterogeneity we observe—and the comparative-static pattern across mechanisms—is difficult to derive from any single lens taken alone.

We argue that PDO is better read as an exogenous shock to a CAS: the green-innovation ecosystem [3,4]. Firms are heterogeneous, bounded-rational agents whose innovation strategies co-evolve with a selection environment composed of customers, suppliers, regulators, and financiers [5,6]. Public data do not enter agents’ production functions as a passive factor; they reshape the fitness landscape on which agents adapt, and they propagate through three coupling channels that are inscribed in the ecosystem rather than within any single firm. PDO tightens supply–demand alignment by reducing signal noise; it recalibrates the role of the state by compressing the information rents on which discretionary intervention rests [7,8]; and it amplifies green-credit access by sharpening the posterior beliefs that lenders form about firm green credibility. From this CAS frame we derive four formal propositions (a main effect and three channel effects), each of which yields a testable hypothesis with a sign restriction.

We test these predictions using the staggered roll-out of municipal open-data platforms across Chinese A-share listed firms over 2013–2022 as a quasi-natural experiment. The empirical strategy combines a multi-period DID baseline with two safeguards demanded by the recent econometric literature: the Callaway and Sant’Anna [9] estimator (hereafter CSDID), which corrects the heterogeneous-treatment-effect biases that contaminate two-way fixed-effect DID (hereafter TWFE-DID) [10,11], and DDML [12], which residualises high-dimensional controls without imposing a linear specification.

The evidence supports the CAS reading. The main effect is positive and robust across TWFE-DID, CSDID, and DDML specifications. Each of the three coupling channels operates in the direction implied by its proposition. Additionally, the heterogeneity signature is consistent with the absorptive-capacity primitive in our model: PDO is most consequential for firms whose ecosystems and managerial cognition are best able to convert data into actionable green-innovation effort, and least consequential where regime lock-in or limited absorptive capacity blocks the conversion.

In bringing these system-theoretic primitives to bear on the data-element literature, the paper provides both a formal account of why uniform data-openness reforms yield uneven returns and firm-level causal evidence on the channels through which they operate. This focus distinguishes our study from Lv and Zhang [13], who examine how open government data raise green economic efficiency at the regional level: we shift the level of analysis to the firm, identify a causal effect on green innovation rather than aggregate efficiency, and open the mechanism black box through a derived three-channel decomposition. A full statement of the paper’s contributions is deferred to Section 6.1, where it can be read against the evidence.

This study is positioned within the scope of system practice in social science because public data openness is not examined as an isolated digital policy, but as a system-level intervention that changes the interaction rules among firms, governments, financial institutions, and markets. In this system, firms are heterogeneous adaptive agents; governments provide institutional signals and regulatory constraints; financial institutions allocate green credit according to information precision; and markets transmit demand-side feedback. Public data openness alters the information architecture connecting these actors, thereby changing feedback loops, coordination costs, and the fitness landscape of green-innovation strategies. This framing allows the paper to move beyond a linear “policy–outcome” logic and to explain why the same data-opening reform produces uneven innovation responses across firms and regions. In short, public data openness reshapes the adaptive structure of the corporate green-innovation system by changing information flows, feedback loops, agent interactions, and the selection environment in which firms, governments, markets, and financial institutions co-evolve.

The rest of the paper proceeds as follows. Section 2 develops the theoretical framework and derives the hypotheses. Section 3 describes the research design. Section 4 presents the empirical results. Section 5 reports the mechanism and heterogeneity analyses. Section 6 discusses theoretical contributions, articulates the paper’s contribution to systems research, draws out policy implications, and concludes.

2. Theoretical Framework and Hypotheses

2.1. Conceptual Foundations

Three strands of theory underpin our analysis. First, the data-as-factor literature argues that data are characterised by non-rivalry, near-zero replication cost, and combinatorial value, so that their economic worth grows with re-use, integration, and depth of processing [2,14]. Public data openness is the institutional mechanism that scales this potential by exposing previously siloed government datasets to a heterogeneous community of users [13]. The literature, however, has largely modelled PDO as a static input that linearly enters firms’ production or knowledge-accumulation functions, under-weighting the dynamic, agent-level adaptation it triggers. In the Chinese setting this adaptation is shaped by distinctive institutional features—an unusually salient role for local government in resource allocation, a state-influenced supply of green credit, and a fragmented, municipality-by-municipality data-governance landscape—so that public data openness operates on a selection environment in which administrative signals are first-order rather than peripheral.

Second, the innovation-ecosystem and CAS literature conceives of innovation as the emergent outcome of a population of heterogeneous, bounded-rational agents whose strategies co-evolve with a selection environment [3,4]. CAS supplies the following three theoretical primitives we exploit: agents adapt locally, so exogenous shocks propagate through micro-level revision rather than instantaneous re-equilibration; the system exhibits emergent macro-patterns that cannot be read off individual incentives; and shocks reshape the fitness landscape, altering both the level and the relative payoffs of competing strategies. A complementary multi-level perspective [5] explicates how niche innovations penetrate incumbent regimes under landscape pressure, while dynamic-capability theory [6] supplies the firm-level micro-foundation for adaptation.

Third, the corporate green-innovation literature converges on a patent-based measurement convention [15] and identifies two families of antecedents—internal (absorptive capacity, R&D intensity, managerial cognition [1,16]) and external (environmental regulation, industrial policy, and financial development [7,8,17])—that interact within an ecosystem [18,19,20]. The CAS reading we adopt integrates these strands by treating internal absorptive capacity as the firm-level adaptation kernel and external institutions and markets as the selection environment, with PDO operating as an exogenous landscape shock.

Figure 1 visualises the resulting theoretical framework: PDO enters the green-innovation ecosystem as an exogenous landscape shock and propagates through three coupling channels (supply–demand alignment, recalibration of government intervention, and amplification of green credit), each of which feeds back to firm-level fitness and the equilibrium share of green-innovation strategies.

2.2. The Green-Innovation Ecosystem as a Complex Adaptive System

We model the green-innovation ecosystem as a four-tuple $\Sigma = \langle N, S, F, W \rangle$, where $N$ is a finite set of heterogeneous firm agents, $S$ is the strategy space of innovation portfolios, $F$ is a fitness function mapping strategy–environment pairs to expected payoffs, and $W$ is a transition kernel that governs agent-level adaptation [3]. Public data openness $D_{o}(t)$ enters the environment as an exogenous variable; the policy question is how a perturbation of this variable propagates through $W$ into the equilibrium distribution of strategies in $N$.

In this framework, the system boundary is defined as the urban green-innovation ecosystem in which listed firms interact with local governments, financial institutions, suppliers, customers, and data-platform operators. The agents are not assumed to optimise under complete information; rather, they update their innovation strategies through bounded rationality and local learning. Public data openness changes the system’s information topology by reducing data fragmentation and increasing observability across agents. As a result, the reform affects not only individual firms’ knowledge stocks, but also the feedback loops through which market demand, regulatory intervention, and green finance jointly select and reinforce green-innovation strategies. The emergent macro-pattern observed in green-patent data is therefore the population-level footprint of these reconfigured feedback loops, rather than the simple sum of independent firm-level responses to a digital-policy shock. A feature specific to the Chinese reform reinforces this reading: because municipal data platforms were launched in a staggered, locally initiated manner rather than through a single national mandate, the reform itself supplies the heterogeneous, exogenous landscape shocks our identification exploits, while the agents adapting to them remain embedded in a strongly state-mediated institutional context.

2.3. Knowledge Accumulation and Replicator Dynamics

Let $K_{i}(t)$ denote the green-innovation-relevant knowledge stock of firm $i$. Following Cohen and Levinthal [1] and Teece [6], knowledge evolves according to the following:

$$K_{i}(t+1) = (1 - \delta) K_{i}(t) + \eta R_{i}(t) + \lambda D_{o}(t) A(K_{i}(t)) \tag{1}$$

where $\delta \in (0, 1)$ is the depreciation rate, $R_{i}(t)$ is internal R&D, $D_{o}(t)$ is the publicly accessible data stock, and $A(\cdot)$ is an increasing, concave absorptive-capacity function. Equation (1) embeds the Cohen–Levinthal formalism in an open-data environment: the marginal contribution of PDO to firm $i$’s knowledge stock is $\partial K_{i}(t+1) / \partial D_{o}(t) = \lambda A(K_{i}(t))$, which is positive for every firm provided $\lambda > 0$ and $A(K_{i}) > 0$. Because $A(\cdot)$ is increasing, firms with greater absorptive capacity convert a given unit of open data into more knowledge; because $A(\cdot)$ is concave, this advantage grows at a diminishing rate, the cross-partial $\lambda A'(K_{i}) > 0$ being decreasing in $K_{i}$. Public data openness therefore always raises the knowledge stock, and the cross-firm heterogeneity we exploit below reflects differences in the level of absorptive capacity and in ecosystem position rather than a ranking on the cross-partial term $\lambda A'(K_{i})$.
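A minimal numerical sketch of Equation (1) may help fix ideas. It assumes a logarithmic absorptive-capacity function, $A(K) = \ln(1 + K)$, and illustrative values for $\delta$, $\eta$, and $\lambda$; none of these choices come from the paper, and the function names are hypothetical.

```python
import math

# Hypothetical parameter values for delta, eta, lambda (not taken from the paper).
DELTA, ETA, LAM = 0.1, 0.5, 0.2


def absorptive_capacity(k: float) -> float:
    """Assumed form A(K) = ln(1 + K): increasing and concave for K >= 0."""
    return math.log1p(k)


def next_knowledge(k: float, r: float, d_open: float) -> float:
    """One step of Equation (1): K(t+1) = (1 - delta) K + eta R + lambda D_o A(K)."""
    return (1.0 - DELTA) * k + ETA * r + LAM * d_open * absorptive_capacity(k)


def data_gain(k: float, r: float = 1.0) -> float:
    """Extra knowledge next period when D_o moves from 0 to 1, holding K and R fixed."""
    return next_knowledge(k, r, 1.0) - next_knowledge(k, r, 0.0)


low_gain = data_gain(1.0)    # firm with a small knowledge stock
high_gain = data_gain(10.0)  # firm with a large knowledge stock
print(f"gain at K=1: {low_gain:.4f}, gain at K=10: {high_gain:.4f}")
```

Under these assumptions the one-period gain from open data equals $\lambda A(K)$, so it is positive for both firms and larger for the firm with the larger knowledge stock.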

At the population level, let $x_{i}(t)$ denote the share of firms adopting green-innovation strategy $i$. Following Hofbauer and Sigmund [21], adaptation in $N$ follows the replicator equation:

$$\frac{d x_{i}}{d t} = x_{i} \left[ F_{i}(D_{o}) - \bar{F}(x, D_{o}) \right] \tag{2}$$

with $\bar{F}(x, D_{o}) = \sum_{j} x_{j} F_{j}(D_{o})$ the average fitness. Equation (2) states that strategies with above-average fitness expand their share and those with below-average fitness contract. The macro-level patterns observed in patent data are the aggregate footprint of this population-level adaptation; the question is how changes in $D_{o}$ alter the fitness landscape that shapes (2).
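The replicator logic of Equation (2) can be sketched with an explicit Euler discretisation. The two-strategy setup, the constant fitness values, and the step size are assumptions made for illustration only; in the paper fitness depends on $D_{o}$ through the channels developed later.

```python
from typing import List


def replicator_step(shares: List[float], fitness: List[float], dt: float) -> List[float]:
    """One explicit Euler step of Equation (2)."""
    avg = sum(x * f for x, f in zip(shares, fitness))  # average fitness
    return [x + dt * x * (f - avg) for x, f in zip(shares, fitness)]


def simulate(shares: List[float], fitness: List[float],
             steps: int = 1000, dt: float = 0.01) -> List[float]:
    """Iterate the Euler step; fitness is held constant (a simplification)."""
    for _ in range(steps):
        shares = replicator_step(shares, fitness, dt)
    return shares


start = [0.2, 0.8]                    # [green, conventional] strategy shares
closed = simulate(start, [1.0, 1.0])  # equal fitness: shares do not move
opened = simulate(start, [1.3, 1.0])  # hypothetical fitness lift for green strategies
print(f"green share, closed: {closed[0]:.3f}; opened: {opened[0]:.3f}")
```

With equal fitness the shares stay at their starting values; once the green strategy has above-average fitness its share expands over the simulated horizon, which is the direction of adjustment Equation (2) describes.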

2.4. The Selection Landscape and the Main Effect

Following the production-function tradition [7] and the data-as-factor literature [2], we specify fitness as a generalised Cobb–Douglas function in which $D_{o}$ enters as a distinct factor:

$$F_{i}(D_{o}) = A_{i} K_{i}^{\alpha} L_{i}^{\beta} M_{i}^{\gamma} D_{o}^{\varphi}, \qquad \alpha + \beta + \gamma + \varphi \leq 1 \tag{3}$$

where $L_{i}$ is labour input, $M_{i}$ denotes the institutional and market environment, and $\alpha, \beta, \gamma, \varphi > 0$ are output elasticities. Differentiating (3) with respect to $D_{o}$ yields the direct fitness effect of public data openness:

$$\frac{\partial F_{i}}{\partial D_{o}} = \frac{\varphi F_{i}}{D_{o}} > 0 \tag{4}$$

PDO uniformly elevates the fitness of strategies that rely on external information, scaled by $\varphi$. Combining (2) with (4), the equilibrium share $x_{GI}^{*}$ of green-innovation strategies is strictly increasing in $D_{o}$. Two features of this derivation are non-trivial. First, because $A(\cdot)$ appears in (1), the marginal effect of $D_{o}$ on $K_{i}$, and hence on $F_{i}$, is heterogeneous across agents in a way that yields the empirical heterogeneity signature we look for. Second, because fitness is multiplicative in $M_{i}$, PDO can shift $x_{GI}^{*}$ not only directly but also via the institutional and market channels we develop next.
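Equation (4) can be checked numerically against Equation (3). The elasticities and input levels below are hypothetical and chosen only so that the exponents sum to less than one; the sketch compares the closed-form derivative with a central finite difference.

```python
def fitness(k: float, l: float, m: float, d_open: float,
            a: float = 1.0, alpha: float = 0.3, beta: float = 0.3,
            gamma: float = 0.2, phi: float = 0.1) -> float:
    """Equation (3) with hypothetical elasticities summing to 0.9 <= 1."""
    return a * k**alpha * l**beta * m**gamma * d_open**phi


def analytic_marginal(k: float, l: float, m: float, d_open: float,
                      phi: float = 0.1) -> float:
    """Equation (4): dF/dD_o = phi * F / D_o."""
    return phi * fitness(k, l, m, d_open, phi=phi) / d_open


def numeric_marginal(k: float, l: float, m: float, d_open: float,
                     h: float = 1e-6) -> float:
    """Central finite difference of Equation (3) in D_o."""
    return (fitness(k, l, m, d_open + h) - fitness(k, l, m, d_open - h)) / (2 * h)


point = dict(k=4.0, l=2.0, m=1.5, d_open=3.0)  # illustrative input levels
exact = analytic_marginal(**point)
approx = numeric_marginal(**point)
print(f"analytic: {exact:.6f}, finite difference: {approx:.6f}")
```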

Proposition 1 (Main Effect).

An increase in public data openness shifts the selection landscape such that the equilibrium share of green-innovation strategies in the firm population is strictly increasing in $D_{o}$.

Hypothesis 1.

Public data openness has a positive effect on corporate green-innovation activity.

Because the marginal effect of public data openness on a firm’s knowledge stock, $\lambda A(K_{i})$ in Equation (1), rises with absorptive capacity, the same shock is expected to produce different responses across agents according to their absorptive capacity and ecosystem position. We therefore state three directional expectations before turning to the data. First, on absorptive capacity and managerial cognition, the effect should be larger for firms better able to convert data into green-innovation effort: firms with environmentally experienced executives, non-state firms, and firms outside heavily polluting industries. Second, on regional financial development, the effect should be larger where formal credit markets are thin, because open data substitutes for missing financial infrastructure (the credit channel of H4). Third, on openness quality and maturity, the effect should be larger in cities with earlier or higher-quality data openness. The heterogeneity analysis in Section 5.2 tests these stated priors rather than interpreting subsample differences post hoc.

2.5. Coupling Channel I—Supply–Demand Alignment

A central CAS prediction is that exogenous shocks alter the institutional component of fitness through the informational topology that connects supply and demand. Let $Q_{i}^{s}$ and $Q_{i}^{d}$ denote firm $i$’s supply and demand quantities and define the supply–demand mismatch index:

$$\Phi_{i} = \frac{\sigma(Q_{i}^{s})}{\sigma(Q_{i}^{d})} \tag{5}$$

Mismatch raises coordination cost $C_{c}(\Phi)$, specified as a convex quadratic:

$$C_{c}(\Phi) = c_{0} + c_{1} \Phi + c_{2} \Phi^{2}, \qquad c_{1}, c_{2} > 0 \tag{6}$$

The cost enters fitness through the following institutional channel: $F_{i} = F_{i}^{0} \cdot \exp(-\kappa \cdot C_{c}(\Phi))$. PDO narrows the information gap between supply and demand sides (standardised, machine-readable datasets reduce signal noise about regulatory trends, emissions monitoring, procurement, and technology adoption) and so:

$$\frac{\partial \Phi}{\partial D_{o}} < 0 \tag{7}$$

By the chain rule applied through (6) and the institutional channel,

$$\frac{\partial F_{i}}{\partial D_{o}} = -\kappa F_{i} (c_{1} + 2 c_{2} \Phi) \cdot \frac{\partial \Phi}{\partial D_{o}} > 0 \tag{8}$$
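The chain-rule step behind Equation (8) can be verified numerically. The sketch below assumes a specific mismatch response, $\Phi(D_{o}) = \Phi_{0} e^{-\rho D_{o}}$, which satisfies Equation (7); this functional form and all parameter values are hypothetical and serve only to make the derivative computable.

```python
import math

C0, C1, C2 = 0.1, 0.5, 0.25  # hypothetical cost coefficients with c1, c2 > 0
KAPPA, F0 = 0.8, 1.0         # hypothetical kappa and baseline fitness F_i^0
PHI0, RHO = 2.0, 0.3         # assumed mismatch response Phi(D_o) = PHI0 * exp(-RHO * D_o)


def mismatch(d_open: float) -> float:
    """Assumed mismatch index, decreasing in D_o as Equation (7) requires."""
    return PHI0 * math.exp(-RHO * d_open)


def mismatch_slope(d_open: float) -> float:
    """dPhi/dD_o for the assumed form; always negative."""
    return -RHO * mismatch(d_open)


def cost(phi_mis: float) -> float:
    """Equation (6): convex quadratic coordination cost."""
    return C0 + C1 * phi_mis + C2 * phi_mis ** 2


def fitness(d_open: float) -> float:
    """Institutional channel: F = F0 * exp(-kappa * C_c(Phi(D_o)))."""
    return F0 * math.exp(-KAPPA * cost(mismatch(d_open)))


def marginal_fitness(d_open: float) -> float:
    """Equation (8): -kappa * F * (c1 + 2 c2 Phi) * dPhi/dD_o."""
    phi_mis = mismatch(d_open)
    return -KAPPA * fitness(d_open) * (C1 + 2 * C2 * phi_mis) * mismatch_slope(d_open)


d, h = 1.0, 1e-6
analytic = marginal_fitness(d)
numeric = (fitness(d + h) - fitness(d - h)) / (2 * h)
print(f"Equation (8): {analytic:.6f}, finite difference: {numeric:.6f}")
```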

Proposition 2.

Public data openness raises green-innovation fitness by compressing supply–demand mismatch.

Hypothesis 2.

Public data openness promotes corporate green innovation by reducing supply–demand mismatch.

Interpreted in system terms, this is a market-feedback coupling mechanism: by improving the visibility of demand-side information and regulatory trends, public data openness strengthens the feedback loop from market signals to firms’ green-innovation strategies, so the channel runs continuously rather than as a one-off mediator.

2.6. Coupling Channel II—Recalibration of Government Intervention

Aghion et al. [7] and Chang et al. [8] show that government intervention in industrial and environmental policy is non-monotonic in its productivity effects: mild intervention corrects externalities, but excessive intervention crowds out private incentives. We parameterise this through a quadratic component multiplying baseline fitness:

$$F_{i} = F_{i}^{0} \left[ 1 + \theta_{1} G_{t} - \theta_{2} G_{t}^{2} \right], \qquad \theta_{1}, \theta_{2} > 0 \tag{9}$$

The interior optimum is $G^{*} = \theta_{1} / (2 \theta_{2})$. When ambient $G_{t}$ exceeds $G^{*}$, fitness is decreasing in further intervention. Public data openness compresses the information rents on which discretionary intervention rests [8,16]: once procurement, subsidy allocation, and regulatory targeting are observable to firms and citizens, the marginal return to additional state engagement falls. Formally,

$$\frac{\partial G_{t}}{\partial D_{o}} < 0 \quad \text{whenever } G_{t} > G^{*} \tag{10}$$

The derivative of (9) is $\partial F_{i} / \partial G_{t} = F_{i}^{0} (\theta_{1} - 2 \theta_{2} G_{t})$, which is negative for $G_{t} > G^{*}$. Combining it with (10) through the chain rule yields $\partial F_{i} / \partial D_{o} > 0$ in the over-intervention regime, the empirically relevant case for many Chinese municipalities. PDO thus operates as a corrective mechanism on the state side of the ecosystem, not just on the firm side.
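A small worked example of the quadratic term in Equation (9) shows why the sign of the effect depends on the regime. The values of $\theta_{1}$ and $\theta_{2}$ and the intervention levels are hypothetical.

```python
THETA1, THETA2 = 0.8, 1.0  # hypothetical theta_1, theta_2 > 0


def intervention_multiplier(g: float) -> float:
    """Bracketed term of Equation (9): 1 + theta_1 * G - theta_2 * G^2."""
    return 1.0 + THETA1 * g - THETA2 * g ** 2


def optimal_intervention() -> float:
    """Interior optimum G* = theta_1 / (2 * theta_2)."""
    return THETA1 / (2.0 * THETA2)


g_star = optimal_intervention()

# The same reduction in intervention (0.1), starting above and below G*.
gain_over = intervention_multiplier(0.6) - intervention_multiplier(0.7)
gain_under = intervention_multiplier(0.1) - intervention_multiplier(0.2)
print(f"G* = {g_star:.2f}; gain above G*: {gain_over:+.3f}; gain below G*: {gain_under:+.3f}")
```

Under these assumptions, lowering intervention raises the fitness multiplier only when the starting level lies above $G^{*}$, which is the regime restriction attached to Equation (10).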

Proposition 3.

When ambient government intervention exceeds the social optimum, public data openness raises green-innovation fitness by dampening intervention.

Hypothesis 3.

Public data openness promotes corporate green innovation by reducing the intensity of government intervention.

Interpreted in system terms, this is an institutional-feedback mechanism: greater transparency of policy implementation and subsidy allocation recalibrates the state’s role from discretionary actor to feedback channel, so that policy signals correct rather than crowd out the selection of green-innovation strategies.

2.7. Coupling Channel III—Amplification of Green-Credit Access

The credit channel operates through information asymmetry between firms and lenders. Banks form posterior beliefs about a firm’s green credibility from observable signals $s$; under Bayesian updating with prior $\pi_{0}$ that the firm is green-type,

$$P(g \mid s) = \frac{f(s \mid g)\, \pi_{0}}{f(s \mid g)\, \pi_{0} + f(s \mid ng)\, (1 - \pi_{0})} \tag{11}$$

where $f(s \mid g)$ and $f(s \mid ng)$ are the conditional densities of the signal under green-type and non-green-type firms respectively. PDO sharpens signal precision (standardised emissions, energy-use, and regulatory-compliance data tighten the distribution of $s \mid g$ around its true mean), producing, by a standard monotone-likelihood-ratio argument:

$$\frac{\partial P(g \mid s)}{\partial \left( 1 / \mathrm{Var}(s \mid g) \right)} > 0 \quad \text{for high realisations of } s \tag{12}$$

Translating into the equilibrium green-credit ratio $C_{g}$, the share of credit allocated to certified green projects, yields:

$$\frac{\partial C_{g}}{\partial D_{o}} > 0 \tag{13}$$

The step from Equation (12) to Equation (13) follows by the chain rule. Equation (12) shows that the posterior $P(g \mid s)$ is increasing in signal precision $1 / \mathrm{Var}(s \mid g)$ for high signal realisations; aggregating over the population of high-signal green-type firms, the equilibrium green-credit ratio $C_{g}$ inherits this monotonicity, so $\partial C_{g} / \partial (1 / \mathrm{Var}(s \mid g)) > 0$. Because public data openness raises signal precision, $\partial (1 / \mathrm{Var}(s \mid g)) / \partial D_{o} > 0$, and the product of the two partials gives $\partial C_{g} / \partial D_{o} > 0$, which is Equation (13).
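The Bayesian update in Equation (11) can be illustrated with Gaussian signal densities. The Gaussian form, the prior, the means, and the standard deviations below are all assumptions of this sketch rather than the paper's specification; the signal is evaluated at the green-type mean, which is one reading of a "high realisation" in Equation (12).

```python
import math


def normal_pdf(s: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (s - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_green(s: float, sd_green: float, prior: float = 0.3,
                    mean_green: float = 1.0, mean_other: float = 0.0,
                    sd_other: float = 1.0) -> float:
    """Equation (11) with assumed Gaussian densities f(s|g) and f(s|ng)."""
    num = normal_pdf(s, mean_green, sd_green) * prior
    den = num + normal_pdf(s, mean_other, sd_other) * (1.0 - prior)
    return num / den


signal = 1.0                                      # signal at the green-type mean
noisy = posterior_green(signal, sd_green=1.0)     # before data opening
precise = posterior_green(signal, sd_green=0.5)   # after a hypothetical precision gain
print(f"posterior with noisy signal: {noisy:.3f}; with precise signal: {precise:.3f}")
```

In this parameterisation a tighter distribution of $s \mid g$ raises the lender's posterior that the firm is green-type. The sign is specific to signals close to the green-type mean, in line with the restriction stated in Equation (12).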

Proposition 4.

Public data openness amplifies green-credit access by reducing information asymmetry in the credit market.

Hypothesis 4.

Public data openness promotes corporate green innovation by raising the proportion of green credit.

Interpreted in system terms, this is a financial-feedback mechanism: sharper environmental and compliance signals let lenders distinguish credible green innovators from symbolic adopters, so that capital flows reinforce the strategies the data infrastructure makes legible.

Taken together, the three channels are not three independent mediating variables but three interlocking coupling mechanisms—market-feedback, institutional-feedback, and financial-feedback—that jointly reconfigure the feedback architecture of the green-innovation ecosystem. Public data openness operates as a system-level enabler precisely because it activates these mechanisms simultaneously, rather than because it raises any single firm-level input.

3. Research Design

3.1. Institutional Background

China’s public data openness reform originated with Shanghai’s 2012 municipal portal and diffused unevenly across prefecture-level cities over the following decade. By the close of our sample period, more than one hundred and seventy cities had launched dedicated open-data platforms, with substantial heterogeneity in dataset volume, update frequency, and quality [14]. Adoption was driven by a combination of central guidance and local administrative initiative rather than a single national rollout; the staggered, partially decentralised character of adoption supplies the identifying variation we exploit.

3.2. Empirical Specification

The baseline multi-period Difference-in-Differences specification is as follows:

$$GI_{i,t} = \alpha + \beta \cdot Open_{c(i),t} + \Gamma' X_{i,t} + \mu_{i} + \lambda_{t} + \varepsilon_{i,t} \tag{14}$$

where $i$ indexes firms, $c(i)$ is the home city, and $t$ is the year. $GI_{i,t}$ is the natural logarithm of (1 + green-patent applications); $Open_{c(i),t}$ is unity if city $c(i)$ had launched a PDO platform by year $t$; $X_{i,t}$ is a vector of firm and city controls; $\mu_{i}$ and $\lambda_{t}$ are firm and year fixed effects; and $\varepsilon_{i,t}$ is the error term, with standard errors clustered at the city level. $\beta$ is the parameter of interest, identified under the parallel-trend assumption.
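To make the fixed-effects logic of Equation (14) concrete, the sketch below estimates $\beta$ on a small simulated balanced panel by two-way demeaning. It omits the control vector $X_{i,t}$ and clustered standard errors, uses hypothetical units and launch years, and imposes a homogeneous, noise-free treatment effect, so it illustrates the mechanics rather than the paper's estimation.

```python
from typing import Dict, Tuple

Panel = Dict[Tuple[str, int], float]


def two_way_demean(v: Panel) -> Panel:
    """Within transformation for a balanced panel: remove unit and year means."""
    units = sorted({u for u, _ in v})
    years = sorted({t for _, t in v})
    grand = sum(v.values()) / len(v)
    u_mean = {u: sum(v[u, t] for t in years) / len(years) for u in units}
    t_mean = {t: sum(v[u, t] for u in units) / len(units) for t in years}
    return {(u, t): v[u, t] - u_mean[u] - t_mean[t] + grand for (u, t) in v}


def twfe_beta(y: Panel, d: Panel) -> float:
    """OLS slope of demeaned outcome on demeaned treatment (no controls)."""
    y_w, d_w = two_way_demean(y), two_way_demean(d)
    return sum(y_w[k] * d_w[k] for k in y) / sum(d_w[k] ** 2 for k in y)


launch = {"A": 2015, "B": 2017, "C": None}  # hypothetical units; C is never treated
unit_fe = {"A": 1.0, "B": 2.0, "C": 0.5}    # hypothetical fixed effects
years = range(2013, 2019)
true_beta = 0.5                              # homogeneous effect, by assumption

d = {(u, t): float(launch[u] is not None and t >= launch[u])
     for u in launch for t in years}
y = {(u, t): unit_fe[u] + 0.1 * (t - 2013) + true_beta * d[u, t] for (u, t) in d}

beta_hat = twfe_beta(y, d)
print(f"estimated beta: {beta_hat:.4f}")
```

Because the simulated effect is identical across cohorts, two-way fixed effects recover it; the concern about staggered adoption arises when that homogeneity fails.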

A central methodological concern is that TWFE-DID identifies a weighted average of cohort-specific treatment effects whose weights can be negative when treatment is staggered [9,10]. To address this we complement (14) with the CSDID estimator [12], which non-parametrically aggregates group-time average treatment effects on the treated, $ATT(g, t)$, into an overall average treatment effect on the treated (ATT), and with DDML [11], in which $Open_{c(i),t}$ enters as the treatment and high-dimensional controls are residualised using cross-fitted Lasso and stacked learners. CSDID and DDML jointly close two distinct doors, heterogeneous-effect misweighting and high-dimensional confounding, through which bias might enter (14).
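The group-time building block $ATT(g, t)$ can be sketched in its simplest unconditional form: the outcome change of cohort $g$ between its last pre-treatment year and year $t$, minus the same change among never-treated units. The simulated panel, the cohort-specific effects, and the equal-weighted aggregation are assumptions for illustration; this is not the full Callaway and Sant’Anna procedure, which also handles covariates and inference.

```python
from statistics import mean
from typing import Dict, Optional, Tuple


def att_gt(y: Dict[Tuple[str, int], float], cohort: Dict[str, Optional[int]],
           g: int, t: int) -> float:
    """Unconditional ATT(g, t) with never-treated units as the comparison group."""
    base = g - 1  # last pre-treatment year for cohort g
    treated = [u for u, c in cohort.items() if c == g]
    never = [u for u, c in cohort.items() if c is None]
    change_treated = mean(y[u, t] - y[u, base] for u in treated)
    change_never = mean(y[u, t] - y[u, base] for u in never)
    return change_treated - change_never


cohort = {"A": 2015, "B": 2017, "C": None, "D": None}  # hypothetical units
effect = {2015: 1.0, 2017: 0.2}                        # heterogeneous effects by cohort
unit_fe = {"A": 1.0, "B": 2.0, "C": 0.5, "D": 1.5}
years = range(2013, 2019)

y = {}
for u, g in cohort.items():
    for t in years:
        treated_now = g is not None and t >= g
        y[u, t] = unit_fe[u] + 0.1 * (t - 2013) + (effect[g] if treated_now else 0.0)

att_early = att_gt(y, cohort, 2015, 2016)
att_late = att_gt(y, cohort, 2017, 2018)

# Equal-weighted average over post-treatment cells (an assumed aggregation scheme).
post_cells = [(g, t) for g in effect for t in years if t >= g]
overall = mean(att_gt(y, cohort, g, t) for g, t in post_cells)
print(f"ATT(2015, 2016) = {att_early:.2f}; ATT(2017, 2018) = {att_late:.2f}; overall = {overall:.3f}")
```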

Three identification concerns deserve explicit treatment. First, the Stable Unit Treatment Value Assumption (SUTVA) requires that a city’s PDO adoption not affect outcomes in untreated cities through spillover. We mitigate this risk by including city-level macroeconomic controls (financial deepening, human capital, and GDP per capita) that absorb the most plausible spillover vectors, and by reporting an exclusion exercise that drops cities with adjacent or co-administered units in Section 4.6. Second, selection-into-adoption could generate omitted-variable bias if early-adopting cities differ on unobserved time-varying determinants of firm green innovation. We address this with the parallel-trend and placebo tests in Section 4.4, and with the Honest-DID bound of Rambachan and Roth [22], which shows that the post-treatment estimate survives plausible degrees of pre-trend violation. Third, the sample is restricted to A-share listed firms; extension to non-listed firms—where absorptive capacity heterogeneity is likely larger—awaits future data work and is flagged as a boundary condition rather than a flaw of the present design.

For mechanism analysis we estimate the standard two-step mediation system. In the first stage, the candidate mediator is regressed on the treatment together with the baseline control vector and fixed effects; in the second stage, the outcome is regressed on the treatment, the mediator, the controls, and the fixed effects:

$$M_{i,t} = \alpha + \delta \cdot Open_{c(i),t} + \Gamma' X_{i,t} + \mu_{i} + \lambda_{t} + \varepsilon_{i,t} \tag{15}$$

$$GI_{i,t} = \alpha + \beta \cdot Open_{c(i),t} + \rho \cdot M_{i,t} + \Gamma' X_{i,t} + \mu_{i} + \lambda_{t} + \varepsilon_{i,t} \tag{16}$$

where $M_{i,t}$ denotes one of the three mediators (supply–demand mismatch $\Phi_{i,t}$, government intervention $G_{c,t}$, or green-credit ratio $C_{g,c,t}$). A statistically significant coefficient $\delta$ in Equation (15) jointly with a significant $\rho$ in Equation (16) is interpreted as evidence that the corresponding coupling channel is operative; significance of $\beta$ in Equation (16) after controlling for the mediator quantifies the residual direct effect of PDO on green innovation.
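The two-step logic of Equations (15) and (16) can be sketched on toy data that are assumed to be already purged of fixed effects and controls. The parameter values, the data, and the helper names are hypothetical, and the example is noise-free so that the coefficients are recovered exactly; it shows the mechanics of the decomposition, not a significance test.

```python
from typing import List, Tuple


def slope(y: List[float], x: List[float]) -> float:
    """OLS slope without intercept (variables assumed already demeaned)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def ols_two(y: List[float], x1: List[float], x2: List[float]) -> Tuple[float, float]:
    """OLS of y on x1 and x2 without intercept, via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Hypothetical within-transformed treatment and a mediator shock orthogonal to it.
open_w = [-1.0, -1.0, 1.0, 1.0]
shock = [0.1, -0.1, 0.1, -0.1]
delta, rho, beta = -0.4, -0.5, 0.2  # assumed true coefficients

mediator = [delta * d + e for d, e in zip(open_w, shock)]
outcome = [beta * d + rho * m for d, m in zip(open_w, mediator)]

delta_hat = slope(mediator, open_w)                     # first stage, Equation (15)
beta_hat, rho_hat = ols_two(outcome, open_w, mediator)  # second stage, Equation (16)
indirect = delta_hat * rho_hat                          # effect routed through the mediator
print(f"delta = {delta_hat:.2f}, rho = {rho_hat:.2f}, direct = {beta_hat:.2f}, indirect = {indirect:.2f}")
```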

3.3. Variables

3.3.1. Dependent Variable

The principal outcome is the natural logarithm of one plus the green-patent application count of firm $i$ in year $t$, $GI_{i,t} = \ln(1 + GreenPat_{i,t})$. Green patents are identified from the China National Intellectual Property Administration (CNIPA) database using the eight technological fields of the WIPO IPC Green Inventory (alternative energy production, transportation, energy conservation, waste management, agricultural/forestry, administrative/regulatory/design aspects, nuclear power, and core inventions in green technologies). Patent applications are preferred to grants because they capture innovation effort at the moment of disclosure, free of examination lags [15]. For the substantive-versus-strategic decomposition reported in Section 4.3, we further separate the count into invention patents and utility-model patents. Invention patents undergo substantive examination on novelty, inventive step, and industrial applicability, and proxy radical, knowledge-intensive innovation; utility-model patents are examined only for formality and proxy incremental, strategic innovation [23]. The grant series $GrantGI_{i,t}$ is used as an alternative dependent variable in robustness checks.

3.3.2. Treatment Variable

The treatment is an absorbing indicator that switches on in the year a firm’s home city brings a municipal public data openness portal into operation:

$$Open_{c(i),t} = \mathbb{1}\{ t \geq T_{c}^{*} \} \tag{17}$$

where $T_{c}^{*}$ is the calendar year in which city $c$ launched its open-data portal, and $c(i)$ maps firm $i$ to its registered city. Adoption dates are cross-verified against the Fudan University China Local Public Data Openness Report (2023) and the Central China Normal University Government Open Data Utilisation Report (2022). The treatment is irreversible by construction; we therefore exclude the possibility that a city “closes” its portal post-adoption.
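Constructing the indicator in Equation (17) amounts to comparing each year with the city's launch year. The sketch below uses the two launch years mentioned in the text (Shanghai 2012, Hangzhou 2016) plus a hypothetical never-adopting city; the firm identifiers and the firm-to-city mapping are invented for illustration.

```python
from typing import Dict, Optional, Tuple


def open_indicator(year: int, launch_year: Optional[int]) -> int:
    """Equation (17): 1 if the city's portal launched on or before `year`, else 0."""
    return int(launch_year is not None and year >= launch_year)


launch_years: Dict[str, Optional[int]] = {
    "Shanghai": 2012,
    "Hangzhou": 2016,
    "NeverOpened": None,  # hypothetical city without a portal in the sample window
}
firm_city = {"firm_1": "Shanghai", "firm_2": "Hangzhou", "firm_3": "NeverOpened"}

# Firm-year treatment panel over the 2013-2022 sample window.
panel: Dict[Tuple[str, int], int] = {
    (firm, year): open_indicator(year, launch_years[city])
    for firm, city in firm_city.items()
    for year in range(2013, 2023)
}
print([panel["firm_2", y] for y in range(2013, 2023)])
```

The indicator never switches back to zero once it turns on, which matches the absorbing-treatment assumption stated above.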

3.3.3. Mediators

Three mediators correspond to the three coupling channels derived in Section 2. Each is constructed at the level dictated by the underlying mechanism—firm-year for supply–demand mismatch (which is firm-specific by construction) and city-year for government intervention and green credit (both of which are determined at the municipal level).

(a) Supply–demand mismatch $\Phi_{i,t}$ (mediator for H2). Following the production–demand coordination literature [24], we construct $\Phi_{i,t}$ as the ratio of supply-side to demand-side volatility over a rolling window:

$$\Phi_{i,t} = \frac{\sigma(Q_{i,t}^{s})}{\sigma(Q_{i,t}^{d})}, \qquad Q_{i,t}^{s} = OperCost_{i,t} + (Inv_{i,t} - Inv_{i,t-1}) \tag{18}$$

where $OperCost_{i,t}$ is operating cost, $Inv_{i,t}$ is end-of-year inventory net value, and supply $Q_{i,t}^{s}$ is recovered from the inventory accounting identity. Demand $Q_{i,t}^{d}$ is proxied by operating-revenue growth scaled by sector-year mean to remove aggregate cyclicality. Standard deviations are computed over a three-year window. Larger $\Phi_{i,t}$ signals that supply volatility outpaces demand signal volatility, i.e., higher coordination friction; Equation (7) predicts $\partial \Phi / \partial D_{o} < 0$.

Two clarifications about scope follow. The index is a firm-level coordination-friction measure; because the demand component is scaled by the sector–year mean and firm fixed effects absorb time-invariant cross-industry differences, level differences across industries are already largely netted out, and the proxy is most informative for firms whose output is inventoried and whose demand is observable. As a further check against industrial heterogeneity, we re-compute the index net of its industry–year mean; public data openness still significantly compresses the industry-adjusted mismatch (first-stage coefficient −0.0353, p = 0.020), confirming that the result is not an artefact of industry composition.
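A minimal sketch of how the index in Equation (18) could be computed for a single firm is given below. It assumes a trailing three-year window and the sample standard deviation; the text does not state either convention, and the ratio is unaffected by the choice between sample and population standard deviation. The function names and the toy series are hypothetical.

```python
from statistics import stdev


def supply(oper_cost, inv, inv_prev):
    """Q^s from the inventory identity: operating cost plus the change in inventory."""
    return oper_cost + (inv - inv_prev)


def mismatch(q_s, q_d, window=3):
    """Rolling ratio sd(Q^s) / sd(Q^d); None where the window is incomplete or sd(Q^d) is zero."""
    out = []
    for t in range(len(q_s)):
        if t + 1 < window:
            out.append(None)
            continue
        sd_s = stdev(q_s[t - window + 1 : t + 1])
        sd_d = stdev(q_d[t - window + 1 : t + 1])
        out.append(sd_s / sd_d if sd_d > 0 else None)
    return out


# Toy series for one firm (illustrative numbers, not sample data).
q_s = [10, 12, 14, 13]
q_d = [1.0, 1.1, 1.2, 1.0]
phi = mismatch(q_s, q_d)
```

The first two entries are undefined because a three-year window is not yet available, which is one reason the effective mediation sample is shorter than the full panel.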

(b) Government intervention G_{c,t} (mediator for H3). Following Chang et al. [8] and the public-finance tradition, intervention intensity is measured as the share of municipal general-budget expenditure in regional GDP:

G_{c,t} = \frac{\mathit{FiscalExp}_{c,t}}{\mathit{GDP}_{c,t}} \qquad (19)

A larger G_{c,t} indicates a more interventionist local state. The quadratic specification of Equation (9) implies that PDO compresses the resource rents underpinning discretionary intervention; we therefore expect \partial G_{c,t} / \partial D_{o} < 0 in the over-intervention regime.

(c) Green-credit ratio C_{g,c,t} (mediator for H4). Following the green-finance literature [25], the green-credit ratio is the share of green-classified loans in total bank credit at the city–year level:

C_{g,c,t} = \frac{\mathit{GreenLoan}_{c,t}}{\mathit{TotalLoan}_{c,t}} \qquad (20)

Green loans are identified using the China Banking and Insurance Regulatory Commission’s green-credit guidelines and aggregated from bank-branch reporting at the municipal level. Equations (11)–(13) predict that PDO sharpens the precision of signals on which lenders condition green-credit allocation, raising the equilibrium ratio.

3.3.4. Control Variables

The control vector follows the literature on Chinese firm green-innovation determinants and is partitioned into firm-level and city-level blocks. At the firm level we include firm size (Size), leverage (Lev), profitability (ROA), listing age (Listage), board independence (Indep), ownership concentration (Top1), CEO–chair duality indicator (Dual), capital intensity (FixA), board size (Bdsize), and sales growth (Salesgr). At the city level we include human capital (Hcap), financial deepening (Fin), and GDP per capita (Eco). All continuous variables are winsorised at the 1st and 99th percentiles. Firm- and year-fixed effects absorb time-invariant heterogeneity and aggregate shocks. Definitions are summarised in Table 1.
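The winsorisation step can be illustrated with a short sketch. The quantile convention (linear interpolation between order statistics) is an assumption, since the text does not specify one, and the input series is a toy example.

```python
def winsorise(values, lower=0.01, upper=0.99):
    """Clamp values to the [lower, upper] empirical quantiles (linear interpolation)."""
    ordered = sorted(values)
    n = len(ordered)

    def quantile(p):
        # Position of the p-th quantile among the sorted observations.
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    lo_cut, hi_cut = quantile(lower), quantile(upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]


raw = list(range(1, 102))  # toy series 1..101
clean = winsorise(raw)     # extremes are pulled in to the 1st/99th percentile values
```

Winsorising replaces extreme observations with the cut-off values rather than dropping them, so the sample size is unchanged.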

3.4. Data Sources and Sample

The sample comprises Shanghai- and Shenzhen-listed A-share firms over 2013–2022. We exclude financial firms, ST/PT firms, firms with negative equity, and observations with substantial missing values. Continuous variables are winsorised at the 1st and 99th percentiles. Green-patent data are drawn from CNIPA (https://www.cnipa.gov.cn/, accessed on 12 December 2025); financial and governance data from CSMAR and WIND (https://data.csmar.com/, https://www.wind.com.cn/, accessed on 20 December 2025); city-level macroeconomic indicators from the China Urban Statistical Yearbook (http://www.stats.gov.cn/, accessed on 14 December 2025); and PDO adoption dates from publicly available municipal documentation.

4. Empirical Results

4.1. Descriptive Statistics

Table 2 reports descriptive statistics for the principal variables. The distribution of the natural logarithm of (1 + green-patent applications) is approximately symmetric about a mean of 6.49 with a standard deviation of 1.79, consistent with the empirical literature on Chinese listed firms after winsorisation. About one-third of firm–year observations are exposed to public data openness (mean of Open = 0.322), reflecting the staggered municipal roll-out across 2013–2022. The three mediators—supply–demand mismatch, government-intervention intensity, and green-credit ratio—display the dispersion required for the mechanism analysis in Section 5.

4.2. Baseline DID

Table 3 reports the estimates of Equation (14). The coefficient on Open is 0.2379 and statistically significant at the 1% level under both city-clustered standard errors (Column 1) and two-way (firm and city) clustered standard errors (Column 2). The magnitude implies that, conditional on firm- and year-fixed effects and the control vector, firms in cities that have opened public data file roughly 27% more green patents than firms in otherwise comparable cities still under closed-data regimes, expressed in exponentiated terms. The point estimate is consistent with Proposition 1 and supports Hypothesis H1.
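The conversion from the log-scale coefficient to the percentage figure quoted above is the standard exponentiation of a semi-log coefficient. Because the dependent variable is ln(1 + applications), the figure is an approximation to the percentage change in the patent count itself:

```latex
\%\Delta \approx \left(e^{\hat{\beta}} - 1\right)\times 100
        = \left(e^{0.2379} - 1\right)\times 100
        \approx 26.9\%
```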

4.3. Decomposition by Patent Type

A central theoretical claim is that PDO shifts the population of innovation strategies toward higher-fitness, substantive configurations rather than proliferating low-quality patents. Table 4 decomposes the dependent variable into three components. The coefficient on PDO is 0.3906 and significant at the 1% level for green-invention patents (substantive innovation, higher technical novelty and examination depth), while the coefficients for utility-model patents (strategic innovation, 0.1166) and authorised grants (0.1100) are statistically indistinguishable from zero. The substantive-versus-strategic gap is consistent with the CAS prediction that an enlarged information set shifts agents toward radical rather than imitative reconfigurations of the fitness landscape.

4.4. Identification Checks

Figure 2 plots event-time coefficients from a dynamic specification of Equation (14) in which Open is replaced by a vector of leads and lags. Pre-treatment coefficients are tightly bracketed around zero; the joint pre-trend test fails to reject parallel trends (p = 0.978). Post-treatment coefficients rise monotonically over the six years following PDO adoption, peaking at roughly 0.36, consistent with cumulative adaptation as predicted by Equation (1).
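The lead and lag indicators that replace Open in the dynamic specification can be sketched as below. The window lengths, the binning of endpoints, and the choice of the period before adoption as the omitted reference are illustrative assumptions; the text does not report these details, and the function name is hypothetical.

```python
def event_time_dummies(year, launch_year, leads=4, lags=6, ref=-1):
    """Return {k: 0/1} for event time k = year - launch_year.

    Event times are binned at -leads and +lags; the reference period `ref`
    is omitted. Never-treated cities (launch_year is None) get all zeros.
    """
    dummies = {k: 0 for k in range(-leads, lags + 1) if k != ref}
    if launch_year is None:
        return dummies
    k = max(-leads, min(lags, year - launch_year))  # bin the endpoints
    if k != ref:
        dummies[k] = 1
    return dummies


two_years_after = event_time_dummies(2018, 2016)   # switches on k = +2
reference_year = event_time_dummies(2015, 2016)    # k = -1 is omitted
far_before = event_time_dummies(2013, 2022)        # binned into k = -4
never_treated = event_time_dummies(2018, None)     # all zeros
```

Each firm-year switches on at most one indicator, so the coefficients are read relative to the omitted reference period.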

Figure 3 reports a randomisation-inference placebo test: treatment timing is randomly permuted across firms and years for 300 replications, and the resulting placebo coefficients are plotted as a kernel density. The placebo mass concentrates tightly around zero, while the actual estimate (β = 0.238) lies far in the right tail of the placebo distribution. The probability of obtaining a coefficient as large as the observed estimate by random chance is essentially nil, ruling out a spurious-trend interpretation.
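The logic of the placebo test can be sketched on simulated data. For brevity the sketch uses a difference in means as a stand-in for the DID coefficient and permutes a cross-sectional treatment label rather than treatment timing; both are simplifying assumptions, and all numbers are simulated.

```python
import random


def diff_in_means(y, d):
    """Mean outcome of treated units minus mean outcome of controls."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def placebo_test(y, d, reps=300, seed=0):
    """Permute treatment labels `reps` times and compare with the observed statistic."""
    rng = random.Random(seed)
    observed = diff_in_means(y, d)
    placebo = []
    for _ in range(reps):
        shuffled = d[:]
        rng.shuffle(shuffled)
        placebo.append(diff_in_means(y, shuffled))
    # One-sided p-value: share of placebo draws at least as large as the estimate.
    p_value = sum(b >= observed for b in placebo) / reps
    return observed, placebo, p_value


# Simulated data with a true effect of 0.24 (illustrative only).
data_rng = random.Random(1)
d = [1] * 50 + [0] * 100
y = [0.24 * di + data_rng.gauss(0, 0.1) for di in d]
observed, placebo, p_value = placebo_test(y, d)
```

When the true effect is positive, the observed statistic sits in the right tail of the permutation distribution, which is centred on zero by construction.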

Figure 4 reports the Honest-DID sensitivity analysis of Rambachan and Roth [22]. The confidence set for the post-treatment estimate widens with a one-dimensional relaxation parameter \bar{M}, which governs the permissible degree of pre-trend violation. The estimate remains statistically distinguishable from zero for plausible values of \bar{M}, indicating that the result is robust to bounded violations of the parallel-trend assumption.

4.5. Heterogeneity-Robust Estimators and Machine Learning

Two contemporary critiques motivate the alternative estimators reported in Table 5. First, TWFE-DID under staggered adoption identifies a weighted average of cohort-specific ATTs whose weights can be negative [9,10], which can attenuate or even invert the sign of the aggregated estimate. Columns (1)–(2) of Table 5 report the Callaway and Sant’Anna [9] ATT and its post-treatment cohort average. Both estimates are roughly 2.4 times the TWFE-DID baseline of 0.2379 (β = 0.5721 and 0.5731, respectively), consistent with the negative-weight contamination identified by the recent literature and reinforcing the qualitative conclusion of Hypothesis H1.

Second, identification under conditional parallel trends requires that the control set adequately characterises the conditioning information. We therefore complement the parametric specifications with DDML [12], in which Open enters as the treatment of interest and the controls in X are residualised flexibly via cross-fitted Lasso (Column 3) and stacked learners (Column 4). Both DDML estimates remain positive and statistically significant, with magnitudes (0.1080 and 0.1288) that span the model-selection uncertainty inherent in the underlying first-stage learners. Triangulating across CSDID and DDML closes two distinct doors (heterogeneous-effect misweighting and high-dimensional confounding) through which bias might enter the TWFE estimate.
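The cross-fitted partialling-out idea behind DDML can be sketched in a few lines. This is a toy version under stated assumptions: a single simulated control, a univariate OLS learner standing in for the Lasso and stacked learners, two folds, and no fixed effects. It is not the estimation code used for Table 5, and every name in it is hypothetical.

```python
import random


def ols_fit(x, y):
    """Univariate OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def ddml_plr(y, d, x, folds=2):
    """Cross-fitted estimate of theta in y = theta * d + g(x) + e."""
    n = len(y)
    y_res, d_res = [0.0] * n, [0.0] * n
    for k in range(folds):
        test = [i for i in range(n) if i % folds == k]
        train = [i for i in range(n) if i % folds != k]
        x_train = [x[i] for i in train]
        # Nuisance learners are fitted on the training folds only.
        ay, by = ols_fit(x_train, [y[i] for i in train])
        ad, bd = ols_fit(x_train, [d[i] for i in train])
        # Residuals are formed on the held-out fold.
        for i in test:
            y_res[i] = y[i] - (ay + by * x[i])
            d_res[i] = d[i] - (ad + bd * x[i])
    # Residual-on-residual regression gives the treatment coefficient.
    return sum(a * b for a, b in zip(d_res, y_res)) / sum(a * a for a in d_res)


# Simulated data: x confounds both treatment and outcome; true theta = 0.25.
rng = random.Random(0)
n = 2000
x = [rng.uniform(-1, 1) for _ in range(n)]
d = [1 if 0.8 * xi + rng.gauss(0, 1) > 0 else 0 for xi in x]
y = [0.25 * di + 2.0 * xi + rng.gauss(0, 0.5) for di, xi in zip(d, x)]

theta_hat = ddml_plr(y, d, x)   # close to 0.25
naive = ols_fit(d, y)[1]        # ignores x and is biased upward
```

The comparison with the naive slope shows why residualising both the outcome and the treatment on the controls matters when the controls drive selection into treatment.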

4.6. Conventional Robustness

Table 6 reports five conventional robustness exercises. Column (1) adds an extended block of controls; column (2) excludes confounding digital and trade-related pilot policies; column (3) excludes supply-chain pilot zones and high-speed-rail openings that might affect the same firms; column (4) excludes the 2020–2022 pandemic years; and column (5) excludes municipalities directly administered by the central government. The coefficient on Open remains positive and statistically significant across all five specifications, with magnitudes in a tight band around the baseline 0.2379. The only attenuation occurs in column (2), where stripping out digital and trade pilots removes some of the variation correlated with PDO adoption; the coefficient remains significant at the 10% level there.

A natural concern is that a binary adoption indicator does not reflect how far data openness has progressed. As a continuous alternative, we replace Open with an openness-intensity measure—the cumulative number of years since a city operationalised its open-data platform (zero before launch). Table 7 shows that this continuous measure also raises green innovation, by 0.0404 per additional year of openness and by 0.1479 in a concave log specification; the implied cumulative effect over the post-treatment window is consistent with the binary baseline of 0.2379.

We also examine the robustness of the outcome to alternative measures of green innovation (Table 8). Public data openness significantly raises green invention-patent grants—authorised, substantively examined output—and the green invention share, defined as the proportion of substantive invention patents in a firm’s green portfolio. Both reinforce the substantive-innovation interpretation; the effect on total and utility-model grants is insignificant, consistent with the effect being concentrated in substantive invention rather than in incremental or purely formal output.

5. Mechanism and Heterogeneity Analysis

5.1. The Three Coupling Mechanisms

Table 9 reports the two-stage mediation estimates of Equations (15)–(16) for the three mediators in turn. Columns (1)–(2) test Hypothesis H2 (supply–demand mismatch). In stage 1, PDO reduces mismatch by 0.0439 (significant at the 5% level), consistent with the noise-reduction prediction of Equation (7). In stage 2, the coefficient on (dev) is −0.0199 (significant at the 5% level), confirming that mismatch suppresses green-innovation activity. The implied indirect effect is positive and supports H2.

Columns (3)–(4) test Hypothesis H3 (government intervention). In stage 1, PDO reduces intervention intensity by 0.0163 (significant at the 1% level), exactly the compression of discretionary intervention predicted by Equation (10) in the over-intervention regime. In stage 2, the coefficient on (govintv) is negative (−0.7226) but statistically indistinguishable from zero at conventional levels (SE 0.4763). H3 thus receives partial support, and the gap between a robust first stage and a weak second stage warrants a fuller explanation. Three factors are at work. First, measurement: intervention intensity is observed at the city–year level whereas innovation is a firm-level outcome, so the second-stage coefficient is attenuated by aggregation error relative to the firm-specific exposure that actually matters. Second, economics: in the over-intervention regime the marginal firm-level innovation response to a small reduction in aggregate intervention is of second order, because the channel works through portfolio reallocation and distortion correction rather than a direct level shift. Third, unobserved pathways: the firm-level routes through which a less interventionist environment raises innovation—reduced rent-seeking and a reallocation of managerial attention from compliance toward R&D—are not captured by the aggregate intensity proxy. Consistent with this reading, when the three mediators are entered jointly (Table 10) they are jointly significant even though the intervention channel remains individually weak, while the credit and demand channels carry the bulk of the identifiable transmission. Sharper identification of this channel would require a firm-level measure of exposure to discretionary intervention, which we leave to future data work.

Columns (5)–(6) test Hypothesis H4 (green credit). In stage 1, PDO raises the green-credit ratio by 0.0048 (significant at the 1% level), consistent with the precision-amplification logic of Equations (11) and (12). In stage 2, the coefficient on (gcredit) is 0.3856 and significant at the 1% level. The credit channel is the strongest in transmission magnitude among the three, in keeping with the deferred-payoff structure of green-R&D investments that require external financing.
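As a back-of-envelope illustration, and assuming the indirect effect is read as the product of the stage-1 and stage-2 coefficients (the conventional product-of-coefficients reading, which the text does not spell out), the two channels with a significant second stage imply:

```latex
\begin{aligned}
\text{H2 (mismatch):}\quad     & (-0.0439)\times(-0.0199) \approx 0.0009 > 0 \\
\text{H4 (green credit):}\quad & 0.0048\times 0.3856 \approx 0.0019 > 0
\end{aligned}
```

On this reading the credit product is roughly twice the mismatch product, in line with the ranking stated above. No product is formed for the intervention channel because its second-stage coefficient is statistically indistinguishable from zero.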

Two interpretive caveats apply to the mediation estimates. First, identifying indirect effects via the two-stage system in Equations (15) and (16) requires sequential ignorability—the standard mediation assumption that the mediator is, conditional on controls and fixed effects, independent of unmeasured determinants of green innovation. We do not impose this assumption on the main estimate of β; it applies only to the channel decomposition. Second, the city-aggregated nature of the intervention and credit mediators introduces measurement noise relative to the firm-level outcome, which biases the second-stage coefficients toward zero. The pattern observed here—strong first-stage effects for all three mediators but a statistically weak second stage for intervention—is consistent with this attenuation, and our preferred interpretation is qualitative: PDO operates through all three channels, with credit and supply–demand alignment carrying the bulk of the identifiable transmission.

Because the three channels are theorised as interlocking rather than independent, we also estimate a joint specification that enters all three mediators simultaneously (Table 10). The mediators are jointly significant (F = 2.88, p = 0.037), confirming that they operate together rather than as mutually exclusive alternatives; the green-credit channel retains the largest independent transmission. Pairwise multiplicative interactions among the channels are not statistically significant, so we describe the channels as jointly operative and co-moving rather than multiplicatively amplifying.

5.2. Heterogeneity by Ecosystem Position

Section 2.4 set out directional expectations for heterogeneity; we now test them. CAS theory predicts that the same exogenous shock produces divergent fitness gains across agents with different absorptive capacities and ecosystem positions. Table 11 partitions the sample along six dimensions and reports the coefficient on Open for each subsample. The effect is positive and statistically significant in every subsample, but the magnitudes vary in directions that map onto the model primitives. Firms led by managers with prior environmental-protection experience exhibit one of the largest within-subsample coefficients (0.3449 for green-exec), illustrating the role of micro-level absorptive capacity A(K_{i}) in Equation (1). Firms in financially under-developed regions exhibit a comparably large coefficient (0.3584), consistent with the credit-substitution logic of Hypothesis H4: where formal credit markets are thin, PDO substitutes for missing financial infrastructure. The early-open and high-openness columns show that cities with longer exposure or higher data-quality scores convert PDO into innovation at substantially larger magnitudes, consistent with the precision argument in Equation (7).

6. Discussion and Conclusions

6.1. Theoretical Contributions

The paper makes four contributions. Theoretically, it brings CAS and innovation-ecosystem theory to bear on the data-element literature, providing a formal scaffold (knowledge accumulation, replicator dynamics, and an explicit fitness landscape) that generates the comparative-static predictions the empirical analysis tests. Methodologically, it triangulates conventional DID with heterogeneity-robust estimators and machine-learning controls, addressing critiques that have rendered much of the existing DID-based open-data literature vulnerable. Substantively, it decomposes the policy’s causal architecture into three derived coupling channels and shows that mediation analyses support each, with only partial support for the government-intervention channel. Practically, it reframes public data openness from a uniform public good into a systemic enabler whose returns scale with absorptive capacity, motivating a tiered policy design.

Within the CAS apparatus developed above, the paper makes two theoretical moves. First, it imports the formal apparatus of CAS—replicator dynamics, an explicit fitness landscape, and a knowledge-accumulation law with absorptive-capacity heterogeneity—into the literature on data elements and innovation policy. Where the prevailing approach treats data as an input that linearly augments output, the CAS reading treats public data openness as an environmental shock that propagates through coupling mechanisms inscribed in the ecosystem rather than within any single firm [3,4]. Second, it derives the three coupling channels from a common model primitive rather than enumerating them post hoc. This permits sign-restricted testing and a more disciplined mediation analysis. Both moves prepare the ground for the broader contribution to system research articulated in Section 6.2 and respond to the call by Yin et al. [18] for system-theoretic frames that articulate how digital infrastructure interacts with green-innovation performance.

A further theoretical implication is that the empirical heterogeneity is not a nuisance to be controlled for, but a prediction of the model. Because the marginal effect of D_{o} on K_{i} is λ \cdot A'(K_{i}), the same shock produces different fitness gains across agents whose A(\cdot) differs, which is exactly the pattern we observe across ownership, industry, managerial cognition, and regional data quality. Future work that treats heterogeneity as evidence about the structure of A(\cdot) is likely to yield further identification leverage.

6.2. Contribution to Systems Research

Beyond its substantive contribution to the data-element and green-innovation studies, this study contributes to system research in three distinct ways. First, it extends system practice in social science by treating China’s public data openness reform as a system-level intervention rather than a conventional digital-policy shock. The analysis shows how a policy that appears administrative at the surface can restructure the adaptive relations among firms, governments, markets, and financial institutions, and that the macro-pattern observed in green-patent data is the population-level footprint of this restructuring rather than the additive sum of independent firm-level responses. Second, it contributes to complex-system research by formalising corporate green innovation as an emergent outcome of heterogeneous agents adapting on a changing fitness landscape. This approach explains why policy effects are not uniform, but depend on absorptive capacity, managerial cognition, and regional ecosystem conditions, and it converts heterogeneity from a nuisance to be conditioned away into direct evidence about the structure of the adaptation kernel. Third, it contributes to digital-system research by showing that data elements function not only as production inputs, but also as connectors that reshape information flows, feedback loops, and cross-agent coordination. In this sense, public data openness operates as a systemic enabler of green innovation rather than a simple resource supplement, and the three coupling mechanisms—market-feedback alignment, institutional-feedback recalibration, and financial-feedback amplification—jointly illustrate how a single information-architecture reform can re-wire the selection environment of an entire innovation ecosystem.

6.3. Policy Implications

Three implications follow for the design of data infrastructure. First, dataset quality and update frequency, rather than headline coverage, should be the operative metric: only data that meaningfully reduce signal noise compress Φ and shift fitness. Second, complementary investments in firm-level absorptive capacity (digital infrastructure, data-literacy training, and R&D tax incentives) are pre-conditions for PDO returns, not optional add-ons [1,6]. Third, the credit channel is policy-actionable: pairing PDO portals with green-credit certification protocols would amplify the H4 mechanism we identify and would do so most strongly in financially under-developed regions where the channel binds.

6.4. Limitations and Future Research

The study has limitations that open avenues for future research. It focuses on Chinese A-share listed firms over 2013–2022; extension to private firms and other jurisdictions would broaden external validity. Because the sample excludes unlisted small and medium-sized enterprises—which rely more on external open data than on internal R&D—our estimates may be a lower bound if such firms benefit more from openness, or an upper bound if listed firms’ greater absorptive capacity lets them convert open data more effectively; the net direction is a boundary condition rather than a settled magnitude. The sample window also ends in 2022, so post-2022 developments—further municipal platform rollouts, evolving data-openness quality, and post-pandemic normalisation—fall outside our estimates and are flagged as a scope condition. The patent-based measure captures innovation effort and output but not market diffusion or downstream environmental outcomes. The CAS model abstracts from inter-firm strategic interaction; richer agent-based simulations could quantify emergent dynamics such as cluster formation and regime shifts [5]. Finally, our mediation analysis identifies the direction and presence of the three channels but not their relative welfare weights; explicit structural estimation of Equations (1)–(13) would permit such a decomposition. Pursuing these avenues would deepen the integration of system theory with the empirical evaluation of data-element policies.

6.5. Conclusions

This paper has reframed corporate green innovation under public data openness through the lens of CAS. Treating data as a strategic factor of production and PDO as an exogenous shock to the fitness landscape of the green-innovation ecosystem, we derived four formal propositions—a main effect and three coupling channels operating through supply–demand alignment, the recalibration of government intervention, and the amplification of green credit—and translated each into a sign-restricted, testable hypothesis. Exploiting the staggered municipal roll-out of open-data platforms across China between 2013 and 2022, we identified the causal effect using a multi-period DID design and reinforced inference with the Callaway–Sant’Anna estimator and DDML, two safeguards that jointly address heterogeneous-treatment-effect mis-weighting and high-dimensional confounding.

The empirical evidence converges on a single, theoretically coherent picture. PDO raises firm-level green-innovation activity by roughly a quarter relative to comparable closed-data cities, with the effect concentrated in substantive invention patents rather than utility-model patents, consistent with a shift in the population of innovation strategies toward higher-fitness, knowledge-intensive configurations. All three derived channels operate in the direction implied by the model: supply–demand mismatch contracts, discretionary government intervention is dampened in the over-intervention regime, and the green-credit ratio expands. Heterogeneity analyses confirm the model’s most distinctive prediction: the same exogenous shock yields markedly different fitness gains across agents whose absorptive capacity, managerial cognition, and ecosystem position differ.

Taken together, these results reposition PDO from a uniform public good into a systemic enabler whose marginal returns scale with the absorptive capacity of recipient firms and regions. The policy implication is that uniform national rollouts will continue to generate uneven returns; tiered designs that pair data infrastructure with investments in firm-level absorptive capacity, data-literacy training, and green-credit certification protocols are likely to capture a substantially larger share of the potential welfare gains. More broadly, the analysis illustrates the value of bringing system-theoretic primitives—replicator dynamics, fitness landscapes, and absorptive-capacity heterogeneity—to bear on the evaluation of data-element policies, and points to a research agenda in which heterogeneity is treated not as a nuisance to be conditioned out but as direct evidence on the structure of the adaptation kernel itself.

Author Contributions

Conceptualization, X.Z. and L.Z.; methodology, X.Z.; software, X.Z.; validation, X.Z. and L.Z.; formal analysis, X.Z.; investigation, X.Z.; resources, L.Z.; data curation, X.Z.; writing—original draft preparation, X.Z. and L.Z.; writing—review and editing, X.Z. and L.Z.; visualisation, X.Z.; supervision, L.Z.; project administration, X.Z.; funding acquisition, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Heilongjiang Provincial Philosophy and Social Science Research Planning Project, including the project “Research on the Collaborative Mechanism of Artificial Intelligence Empowering the Value Enhancement of the Whole Agricultural Industry Chain in Heilongjiang Province” (Grant No. 25JYI003), the project “The Study on Promoting the Upgrade of Heilongjiang’s Ice and Snow Industry through ‘Data Elements × Value Networks’” (Grant No. 24GLH005), and the project “Research on the Mechanisms and Pathways of Digital–Real Economy Integration Empowering the Development of Corporate Green Productivity in Heilongjiang Province” (Grant No. 25JYC025).

Data Availability Statement

The data presented are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CAS	Complex Adaptive System
PDO	Public data openness
DID	Difference-in-Differences
SUTVA	Stable Unit Treatment Value Assumption
TWFE-DID	Two-way fixed-effect DID
DDML	Double/debiased machine learning
CSDID	Callaway–Sant’Anna difference-in-differences
ATT	Average treatment effect on the treated

References

Cohen, W.M.; Levinthal, D.A. Absorptive capacity: A new perspective on learning and innovation. Adm. Sci. Q. 1990, 35, 128–152.
Goldfarb, A.; Tucker, C. Digital economics. J. Econ. Lit. 2019, 57, 3–43.
Holland, J.H. Studying complex adaptive systems. J. Syst. Sci. Complex. 2006, 19, 1–8.
Adner, R. Ecosystem as structure: An actionable construct for strategy. J. Manag. 2017, 43, 39–58.
Geels, F.W. Technological transitions as evolutionary reconfiguration processes: A multi-level perspective and a case-study. Res. Policy 2002, 31, 1257–1274.
Teece, D.J. Explicating dynamic capabilities: The nature and microfoundations of (sustainable) enterprise performance. Strateg. Manag. J. 2007, 28, 1319–1350.
Aghion, P.; Cai, J.; Dewatripont, M.; Du, L.; Harrison, A.; Legros, P. Industrial policy and competition. Am. Econ. J. Macroecon. 2015, 7, 1–32.
Chang, L.; Li, W.; Lu, X. Government engagement, environmental policy, and environmental performance: Evidence from the most polluting Chinese listed firms. Bus. Strategy Environ. 2015, 24, 1–19.
Callaway, B.; Sant’Anna, P.H.C. Difference-in-differences with multiple time periods. J. Econom. 2021, 225, 200–230.
de Chaisemartin, C.; D’Haultfœuille, X. Two-way fixed effects estimators with heterogeneous treatment effects. Am. Econ. Rev. 2020, 110, 2964–2996.
Goodman-Bacon, A. Difference-in-differences with variation in treatment timing. J. Econom. 2021, 225, 254–277.
Chernozhukov, V.; Chetverikov, D.; Demirer, M.; Duflo, E.; Hansen, C.; Newey, W.; Robins, J. Double/debiased machine learning for treatment and structural parameters. Econom. J. 2018, 21, C1–C68.
Lv, L.; Zhang, P. Unlocking green potential: How open government data enhances green economic efficiency in China? J. Environ. Manag. 2025, 380, 125043.
Kitchin, R. The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences; SAGE Publications: Thousand Oaks, CA, USA, 2014.
Fleming, L.; Sorenson, O. Science as a map in technological search. Strateg. Manag. J. 2004, 25, 909–928.
Banerjee, S.B. Managerial perceptions of corporate environmentalism: Interpretations from industry and strategic implications for organizations. J. Manag. Stud. 2001, 38, 489–513.
Khan, P.A.; Johl, S.K.; Johl, S.K. Does adoption of ISO 56002-2019 and green innovation reporting enhance the firm sustainable development goal performance? An emerging paradigm. Bus. Strategy Environ. 2021, 30, 2922–2936.
Yin, S.; Zhang, N.; Ullah, K.; Gao, S. Enhancing digital innovation for the sustainable transformation of manufacturing industry: A pressure-state-response system framework to perceptions of digital green innovation and its performance for green and intelligent manufacturing. Systems 2022, 10, 72.
Mubarak, M.F.; Tiwari, S.; Petraite, M.; Mubarik, M.; Raja Mohd Rasi, R.Z. How Industry 4.0 technologies and open innovation can improve green innovation performance? Manag. Environ. Qual. 2021, 32, 1007–1022.
Ahmed, R.R.; Akbar, W.; Aijaz, M.; Channar, Z.A.; Ahmed, F.; Parmar, V. The role of green innovation on environmental and organizational performance: Moderation of human resource practices and management commitment. Heliyon 2023, 9, e12679.
Hofbauer, J.; Sigmund, K. Evolutionary Games and Population Dynamics; Cambridge University Press: Cambridge, UK, 1998.
Rambachan, A.; Roth, J. A more credible approach to parallel trends. Rev. Econ. Stud. 2023, 90, 2555–2591.
Liang, Z.; Shen, Y.; Yang, K.; Kuang, J. The role of high-tech certification in enterprise green innovation. Front. Environ. Sci. 2025, 13, 1539990.
Lee, H.L.; Padmanabhan, V.; Whang, S. Information distortion in a supply chain: The bullwhip effect. Manag. Sci. 1997, 43, 546–558.
Flammer, C. Corporate green bonds. J. Financ. Econ. 2021, 142, 499–516.

Figure 1. Public data openness as a system-level intervention in the green-innovation ecosystem. Panel (a) presents the conceptual map: public data openness enters as an exogenous landscape shock and propagates through three coupling channels (supply–demand alignment, recalibration of government intervention, and amplification of green credit), each feeding back to firm-level fitness and the equilibrium share of green-innovation strategies. Panel (b) presents the corresponding feedback representation; the blue rectangle demarcates the boundary of the green-innovation ecosystem (the system Σ = ⟨N, S, F, W⟩), separating the endogenous agents and feedback loops inside the system from the exogenous data-openness shock entering from outside. Panels (a,b) are referenced where the framework is introduced in Section 2.1.

Figure 2. Event-time parallel-trend test. The vertical dashed line marks the timing of public-data openness (the reference period), the solid horizontal line denotes the zero effect, and the whiskers represent 95% confidence intervals.

Figure 3. Randomisation-inference placebo test (300 replications).

Figure 4. Honest-DID sensitivity [22]. The horizontal dashed line denotes the zero effect; the red bar shows the original 95% confidence interval, and the blue bars show the robust confidence intervals as the relaxation parameter increases.

Table 1. Definitions of control variables.

Variable	Symbol	Definition
Firm size	Size	Natural logarithm of total assets
Leverage	Lev	Total debt divided by total assets
Profitability	ROA	Net income divided by total assets
Listing age	Listage	Natural logarithm of years since IPO
Board independence	Indep	Share of independent directors (%)
Ownership concentration	Top1	Shareholding ratio of the largest shareholder
CEO–chair duality	Dual	Indicator equal to 1 if CEO is also board chair
Capital intensity	FixA	Net fixed assets divided by total assets
Board size	Bdsize	Natural logarithm of the number of board members
Sales growth	Salesgr	Year-on-year growth rate of operating revenue
Human capital	Hcap	College-graduate share of city employment
Financial deepening	Fin	(Deposits + loans)/city GDP
GDP per capita	Eco	Natural logarithm of GDP per capita (city level)

Notes: All firm-level controls are computed from CSMAR and WIND filings; city-level controls are from the China Urban Statistical Yearbook. Continuous variables are winsorised at the 1st and 99th percentiles.
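The winsorisation step mentioned in the notes can be sketched as follows. This is a minimal illustration, assuming a linear-interpolation percentile rule; the helper names `percentile` and `winsorise` and the toy sample are hypothetical, and the paper does not state which percentile convention or software routine was used.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("empty input")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def winsorise(values, lower_q=1.0, upper_q=99.0):
    """Clip each value to the [lower_q, upper_q] percentile range of the sample."""
    ordered = sorted(values)
    lo_cut = percentile(ordered, lower_q)
    hi_cut = percentile(ordered, upper_q)
    return [min(max(v, lo_cut), hi_cut) for v in values]


# Toy sample (hypothetical): the values 1..100 plus one extreme outlier.
sample = [float(i) for i in range(1, 101)] + [1000.0]
clipped = winsorise(sample)
print(min(clipped), max(clipped))
```

Values beyond the cut-offs are replaced by the cut-off rather than dropped, so the sample size is unchanged.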

Table 2. Descriptive statistics.

Variable	N	Mean	SD	Min	Max
GI	14,865	6.491	1.789	0.693	9.632
Open	14,865	0.322	0.467	0.000	1.000
dev	13,272	0.018	0.293	−1.000	11.415
govintv	14,062	0.129	0.047	0.043	0.625
gcredit	12,393	0.092	0.024	0.043	0.160
FixA	14,865	0.230	0.166	0.000	0.960
Salesgr	14,865	0.294	3.487	−0.985	263.271
Bdsize	14,865	2.140	0.203	0.693	2.890
Indep	14,865	37.180	5.433	0.000	80.000
Dual	14,865	0.251	0.434	0.000	1.000
Top1	14,865	0.343	0.147	0.022	0.900
Listage	14,865	1.991	0.920	0.000	3.434
Size	14,865	21.978	1.232	17.803	27.967
Fin	14,865	3.258	1.412	0.588	8.778
Hcap	14,865	0.050	0.039	0.001	0.144
Eco	14,865	11.148	0.651	8.448	12.293

Notes: Continuous variables are winsorised at the 1st and 99th percentiles. The variable dev is the firm-level supply–demand mismatch index Φ defined in Equation (18) (the ratio of supply-side to demand-side volatility; larger values denote greater coordination friction); govintv is government-intervention intensity (municipal general-budget expenditure scaled by regional GDP, Equation (19)); and gcredit is the green-credit ratio (Equation (20)). Counts vary across mediators because some mediator inputs (sales-volume volatility, municipal expenditure, and green credit) are missing for a subset of firm–years.

Table 3. Baseline Difference-in-Differences estimates.

Variables	(1) City-Clustered	(2) Two-Way Clustered
Open	0.2379 ***	0.2379 ***
	(0.0733)	(0.0783)
Firm FE	Yes	Yes
Year FE	Yes	Yes
Controls	Yes	Yes
N	14,567	14,567
Adj. R²	0.987	0.987

Notes: Dependent variable: ln(1 + green-patent applications). Column (1) clusters standard errors at the city level; column (2) employs two-way clustering at the firm and city levels. Controls include firm-level Size, Lev, ROA, Listage, Indep, Top1, Dual, FixA, Bdsize, Salesgr, and city-level Hcap, Fin, Eco (Hcap is defined at the city level in Table 1). Standard errors in parentheses. Firm and year fixed effects are included in all columns. *** p < 0.01.
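Because the dependent variable is ln(1 + green-patent applications), the Open coefficient is a change in log points. As a reading aid, the sketch below converts the column (1) estimate into an approximate percentage change in (1 + applications). The helper name `log_points_to_pct` and the normal-approximation interval are illustrative assumptions, not part of the paper's procedure.

```python
import math


def log_points_to_pct(beta: float) -> float:
    """Convert a log-point coefficient into a percentage change in (1 + y)."""
    return (math.exp(beta) - 1.0) * 100.0


# Values copied from Table 3, column (1).
beta_open = 0.2379
se_city = 0.0733

# Normal-approximation 95% interval on the log-point scale (an assumption).
z = 1.96
lower = beta_open - z * se_city
upper = beta_open + z * se_city

pct_point = log_points_to_pct(beta_open)
pct_lower = log_points_to_pct(lower)
pct_upper = log_points_to_pct(upper)

print(f"Point estimate: {pct_point:.1f}% change in (1 + applications)")
print(f"Approximate 95% interval: {pct_lower:.1f}% to {pct_upper:.1f}%")
```

On this reading, the point estimate corresponds to roughly a 27% increase in (1 + applications). The percentage applies to the shifted count, so it only approximates the change in applications themselves when counts are well above zero.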

Table 4. Decomposition of the effect by patent type.

Variables	(1) Invention	(2) Utility Model	(3) Grants
Open	0.3906 ***	0.1166	0.1100
	(0.0810)	(0.0879)	(0.0752)
Firm FE	Yes	Yes	Yes
Year FE	Yes	Yes	Yes
Controls	Yes	Yes	Yes
N	14,567	14,567	14,567
Adj. R²	0.980	0.983	0.986

Notes: Dependent variables: ln(1 + green-invention patents), ln(1 + green-utility-model patents), and ln(1 + authorised green patents), respectively. Standard errors clustered at the city level in parentheses. Same controls as Table 3. Firm and year fixed effects are included in all columns. *** p < 0.01.

Table 5. Heterogeneity-robust DID and double machine-learning estimates.

Variables	(1) CSDID ATT	(2) CSDID Post-Avg.	(3) DDML (Lasso)	(4) DDML (Stack.)
Open	0.5721 ***	0.5731 ***	0.1080 ***	0.1288 **
	(0.0584)	(0.0587)	(0.0343)	(0.0646)
Controls	Yes	Yes	Yes (residualised)	Yes (residualised)
Firm FE	No	No	Yes (residualised)	Yes (residualised)
Year FE	No	No	Yes (residualised)	Yes (residualised)
Estimator	CS-DID	CS-DID	Lasso (cross-fit)	pystacked
N	9459	9459	14,567	14,567

Notes: Columns (1)–(2) report the Callaway and Sant’Anna [12] doubly robust estimator of ATT(g, t), aggregated as the overall ATT and the simple average across post-treatment event times. The CSDID procedure conditions on the same covariate set as the baseline. Columns (3)–(4) report double/debiased machine-learning estimates [11] with cross-fitted Lasso and the pystacked meta-learner, respectively; both residualise the high-dimensional control block rather than estimating it parametrically. Standard errors clustered at the city level in parentheses (CSDID: influence-function based; DDML: cross-fit). Firm and year fixed effects are residualised in columns (3)–(4) and are not entered separately in columns (1)–(2), as the table rows indicate. *** p < 0.01, ** p < 0.05.

Table 6. Conventional robustness checks.

Variables	(1) Ext. Controls	(2) Digital/Trade	(3) SC/HSR	(4) Excl. Pandemic	(5) Excl. Munic.
Open	0.2383 ***	0.1342 *	0.2565 ***	0.2474 ***	0.2379 ***
	(0.0730)	(0.0802)	(0.0730)	(0.0854)	(0.0733)
Firm FE	Yes	Yes	Yes	Yes	Yes
Year FE	Yes	Yes	Yes	Yes	Yes
Controls	Yes	Yes	Yes	Yes	Yes
N	14,567	11,892	14,567	11,261	14,567
Adj. R²	0.987	0.989	0.987	0.981	0.987

Notes: Dependent variable: ln(1 + green-patent applications). Column (1) augments the baseline with an extended control block (additional firm and city covariates). Column (2) excludes cities exposed to digital and trade pilot programmes. Column (3) excludes supply-chain pilot and high-speed-rail-opening cities. Column (4) excludes 2020–2022. Column (5) excludes Beijing, Shanghai, Tianjin, and Chongqing. Standard errors clustered at the city level in parentheses. Firm and year fixed effects are included in all columns. *** p < 0.01, * p < 0.10.

Table 7. Continuous openness intensity.

Variable	(1) Years Open	(2) ln(1 + Years)
Openness intensity	0.0404 **
	(0.0190)
ln(1 + years open)		0.1479 **
		(0.0605)
Firm FE	Yes	Yes
Year FE	Yes	Yes
Controls	Yes	Yes
N	14,567	14,567

Notes: Dependent variable: ln(1 + green-patent applications). Openness intensity is the cumulative number of years since the city launched its open-data platform. Firm and year fixed effects are included; standard errors clustered at the city level in parentheses; controls follow Table 1. ** p < 0.05.
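To make the two intensity specifications easier to compare, the sketch below evaluates the implied change in ln(1 + green-patent applications) at a few platform ages, using the Table 7 point estimates. The function names and the chosen range of years are illustrative assumptions; the calculation ignores sampling uncertainty and is not a result reported by the authors.

```python
import math

BETA_LINEAR = 0.0404  # Table 7, column (1): coefficient on years open
BETA_LOG = 0.1479     # Table 7, column (2): coefficient on ln(1 + years open)


def linear_spec_effect(years: int) -> float:
    """Implied log-point effect under the linear-in-years specification."""
    return BETA_LINEAR * years


def log_spec_effect(years: int) -> float:
    """Implied log-point effect under the ln(1 + years) specification."""
    return BETA_LOG * math.log(1 + years)


print("years  linear   log")
for years in range(0, 11, 2):
    print(f"{years:5d}  {linear_spec_effect(years):.4f}  {log_spec_effect(years):.4f}")
```

The logarithmic specification implies a larger effect in the first years after launch and a flatter profile later, whereas the linear specification adds the same increment every year.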

Table 8. Robustness to alternative green-innovation measures.

Variable	(1) Green Invention Grants	(2) Green Invention Share
Open	0.3682 ***	0.0636 ***
	(0.0858)	(0.0210)
Firm FE	Yes	Yes
Year FE	Yes	Yes
Controls	Yes	Yes
N	14,567	14,567

Notes: Column (1): dependent variable is ln(1 + green invention-patent grants); column (2): dependent variable is green invention patents divided by total green patents. Firm and year fixed effects are included; standard errors clustered at the city level in parentheses; controls follow Table 1. *** p < 0.01.

Table 9. Mechanism tests for the three coupling channels.

Variables	(1) dev	(2) GI	(3) govintv	(4) GI	(5) gcredit	(6) GI
Open	−0.0439 **	0.2384 ***	−0.0163 ***	0.2312 ***	0.0048 ***	0.2216 ***
	(0.0177)	(0.0734)	(0.0036)	(0.0754)	(0.0011)	(0.0719)
dev		−0.0199 **
		(0.0087)
govintv				−0.7226
				(0.4763)
gcredit						0.3856 ***
						(0.1404)
Firm FE	Yes	Yes	Yes	Yes	Yes	Yes
Year FE	Yes	Yes	Yes	Yes	Yes	Yes
Controls	Yes	Yes	Yes	Yes	Yes	Yes
N	12,923	12,923	13,729	13,729	12,135	12,135
Adj. R²	0.345	0.987	0.903	0.987	0.554	0.987

Notes: Columns (1), (3), and (5) are stage-1 regressions with the mediator as the dependent variable; columns (2), (4), and (6) are stage-2 regressions with ln(1 + green-patent applications) as the dependent variable and Open and the mediator both included. Standard errors clustered at the city level in parentheses. Same control vector as Table 3. Firm and year fixed effects are included in all columns. *** p < 0.01, ** p < 0.05.
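One common back-of-the-envelope summary of a two-stage mediation table is the product of the stage-1 and stage-2 coefficients for each channel. The sketch below computes that product from the Table 9 point estimates. It is an illustrative assumption rather than a statistic reported in the paper: the authors present the stepwise regressions only, the estimation samples differ across channels, and no standard errors are derived here.

```python
# Coefficients copied from Table 9:
# (stage-1 effect of Open on the mediator, stage-2 effect of the mediator on GI).
channels = {
    "dev": (-0.0439, -0.0199),
    "govintv": (-0.0163, -0.7226),
    "gcredit": (0.0048, 0.3856),
}


def indirect_effect(a: float, b: float) -> float:
    """Product-of-coefficients indirect effect (point estimate only)."""
    return a * b


indirect = {name: indirect_effect(a, b) for name, (a, b) in channels.items()}

for name, value in indirect.items():
    print(f"{name}: implied indirect effect = {value:.5f} log points")
```

All three products are positive, matching the sign of the direct Open coefficient. The stage-2 coefficient on govintv is not statistically significant in Table 9, so the product for that channel should be read with particular caution.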

Table 10. Joint test of the three coupling channels.

Variable	(1) Green Patent Applications
Open	0.2254 ***
	(0.0724)
dev	−0.0164
	(0.0108)
govintv	−0.3862
	(0.4484)
gcredit	0.3382 **
	(0.1420)
Joint F-test (three mediators)	F = 2.88, p = 0.037
Firm FE	Yes
Year FE	Yes
Controls	Yes
N	11,326

Notes: Dependent variable: ln(1 + green-patent applications); all three mediators are entered simultaneously. Firm and year fixed effects are included; standard errors clustered at the city level in parentheses; controls follow Table 1. *** p < 0.01, ** p < 0.05.

Table 11. Heterogeneity results by ecosystem position.

Variables	(1) Non-Heavy-Pol.	(2) Non-SOE	(3) Green-Exec	(4) Low-Fin-Dev	(5) Early-Open	(6) High-Openness
Open	0.2194 ***	0.2287 ***	0.3449 ***	0.3584 ***	0.3327 ***	0.3189 ***
	(0.0800)	(0.0802)	(0.0722)	(0.1105)	(0.1102)	(0.0836)
Firm FE	Yes	Yes	Yes	Yes	Yes	Yes
Year FE	Yes	Yes	Yes	Yes	Yes	Yes
Controls	Yes	Yes	Yes	Yes	Yes	Yes
N	10,680	8479	9982	7205	7004	9209
Adj. R²	0.988	0.989	0.990	0.974	0.994	0.992

Notes: Each column reports the baseline DID specification (Equation (14)) estimated on the subsample indicated in the column header. Column-header abbreviations are defined as follows: (1) Non-heavy-pol. = firms outside heavily polluting industries; (2) Non-SOE = non-state-owned enterprises; (3) Green-exec = firms whose top management has prior environmental-protection experience; (4) Low-fin-dev = cities below the median financial-development index; (5) Early-open = cities that adopted PDO earlier than the sample median adoption year; (6) High-openness = cities above the median data-quality (openness) score. Standard errors clustered at the city level in parentheses. Same controls as Table 3. Firm and year fixed effects are included in all columns. *** p < 0.01.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

