1. Introduction
On a spring morning in 2016, the municipal government of Hangzhou switched on a new kind of urban infrastructure: not a road, a substation, or a fibre line, but an open-data portal that, by close of business, had exposed several hundred government-held datasets to firms, researchers, and citizens. Within three years, green-patent filings by Hangzhou-headquartered listed firms had grown markedly faster than those of otherwise comparable firms in cities that had not yet opened their data. The pattern repeats across China: cities that opened public data earlier and more comprehensively tend, after a lag, to host firms whose green-innovation output pulls ahead. Yet the pattern is far from uniform. Among firms in the same city, ownership structure, managerial background, and pre-existing absorptive capacity appear to partition the response into sharply different regimes. Some firms accelerate; others do not move at all.
This unevenness sits awkwardly with the dominant theoretical lenses applied to data and innovation. Resource-based and knowledge-based accounts treat data as an input whose marginal product is positive but bounded by absorptive capacity [1]; information-economics accounts emphasise signal-precision gains that loosen credit constraints [2]; and digital-economics arguments highlight reductions in search and verification cost. Each lens captures something, but each treats the firm in isolation from the ecosystem in which it acts. The heterogeneity we observe, together with the comparative-static pattern across mechanisms, is difficult to derive from any single lens taken alone.
We argue that public data openness (PDO) is better read as an exogenous shock to a complex adaptive system (CAS): the green-innovation ecosystem [3,4]. Firms are heterogeneous, bounded-rational agents whose innovation strategies co-evolve with a selection environment composed of customers, suppliers, regulators, and financiers [5,6]. Public data do not enter agents’ production functions as a passive factor; they reshape the fitness landscape on which agents adapt, and their effect propagates through three coupling channels that are inscribed in the ecosystem rather than within any single firm. PDO tightens supply–demand alignment by reducing signal noise; it recalibrates the role of the state by compressing the information rents on which discretionary intervention rests [7,8]; and it amplifies green-credit access by sharpening the posterior beliefs that lenders form about firm green credibility. From this CAS frame we derive four formal propositions (a main effect and three channel effects), each of which yields a testable hypothesis with a sign restriction.
We test these predictions on Chinese A-share listed firms over 2013–2022, using the staggered roll-out of municipal open-data platforms as a quasi-natural experiment. The empirical strategy combines a multi-period difference-in-differences (DID) baseline with two safeguards demanded by the recent econometric literature: the Callaway and Sant’Anna [9] estimator (hereafter CSDID), which corrects the heterogeneous-treatment-effect biases that contaminate two-way fixed-effect DID (hereafter TWFE-DID) [10,11], and double/debiased machine learning (DDML) [12], which residualises high-dimensional controls without imposing a linear specification.
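To make the cohort-by-period comparison behind heterogeneity-robust DID concrete, the following is a minimal sketch on synthetic data. It computes an unconditional group-time average treatment effect by comparing one adoption cohort with never-treated units, which is the kind of building block that estimators in the Callaway and Sant’Anna family aggregate. The panel, the adoption years, the effect size of 0.25, and the helper names `simulate_panel`, `mean_change`, and `att_gt` are illustrative assumptions, not the paper's data or code; covariates, weighting, and inference are omitted.

```python
import random
from statistics import mean


def simulate_panel(seed=0, n_per_cohort=40, true_effect=0.25):
    """Synthetic unit-year panel with staggered adoption (illustrative only)."""
    rng = random.Random(seed)
    rows = []  # (unit, cohort, year, outcome); cohort None = never treated
    unit = 0
    for cohort in (2016, 2018, None):  # hypothetical adoption years
        for _ in range(n_per_cohort):
            unit_effect = rng.gauss(0.0, 1.0)  # time-invariant unit heterogeneity
            for year in range(2013, 2023):
                treated = cohort is not None and year >= cohort
                y = (unit_effect + 0.1 * (year - 2013)      # common trend
                     + (true_effect if treated else 0.0)    # assumed effect
                     + rng.gauss(0.0, 0.1))                 # idiosyncratic noise
                rows.append((unit, cohort, year, y))
            unit += 1
    return rows


def mean_change(rows, cohort, base_year, year):
    """Average within-unit outcome change from base_year to year for one cohort."""
    base, later = {}, {}
    for unit, c, t, y in rows:
        if c == cohort:
            if t == base_year:
                base[unit] = y
            if t == year:
                later[unit] = y
    return mean(later[u] - base[u] for u in base)


def att_gt(rows, cohort, year):
    """Cohort's change since its last pre-adoption year, net of the never-treated change."""
    base_year = cohort - 1
    return (mean_change(rows, cohort, base_year, year)
            - mean_change(rows, None, base_year, year))


rows = simulate_panel()
post = att_gt(rows, 2016, 2019)     # post-adoption period
placebo = att_gt(rows, 2018, 2015)  # pre-adoption period
print(round(post, 3), round(placebo, 3))
```

Under these assumptions the post-adoption estimate should sit close to the simulated effect and the pre-adoption placebo close to zero. Because each cohort is compared only with never-treated units, already-treated units never serve as controls, which is the contamination that the heterogeneity-robust estimators are designed to avoid.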
The evidence supports the CAS reading. The main effect is positive and robust across TWFE-DID, CSDID, and DDML specifications. Each of the three coupling channels operates in the direction implied by its proposition. The heterogeneity signature is also consistent with the absorptive-capacity primitive in our model: PDO is most consequential for firms whose ecosystems and managerial cognition are best able to convert data into actionable green-innovation effort, and least consequential where regime lock-in or limited absorptive capacity blocks the conversion.
In bringing these system-theoretic primitives to bear on the data-element literature, the paper provides both a formal account of why uniform data-openness reforms yield uneven returns and firm-level causal evidence on the channels through which they operate. This focus distinguishes our study from Lv and Zhang [13], who examine how open government data raise green economic efficiency at the regional level: we shift the level of analysis to the firm, identify a causal effect on green innovation rather than aggregate efficiency, and open the mechanism black box through a derived three-channel decomposition. A full statement of the paper’s contributions is deferred to Section 6.1, where it can be read against the evidence.
This study is positioned within the scope of system practice in social science because public data openness is not examined as an isolated digital policy, but as a system-level intervention that changes the interaction rules among firms, governments, financial institutions, and markets. In this system, firms are heterogeneous adaptive agents; governments provide institutional signals and regulatory constraints; financial institutions allocate green credit according to information precision; and markets transmit demand-side feedback. Public data openness alters the information architecture connecting these actors, thereby changing feedback loops, coordination costs, and the fitness landscape of green-innovation strategies. This framing allows the paper to move beyond a linear “policy–outcome” logic and to explain why the same data-opening reform produces uneven innovation responses across firms and regions. In short, public data openness reshapes the adaptive structure of the corporate green-innovation system by changing information flows, feedback loops, agent interactions, and the selection environment in which firms, governments, markets, and financial institutions co-evolve.
The rest of the paper proceeds as follows. Section 2 develops the theoretical framework and derives the hypotheses. Section 3 describes the research design. Section 4 presents the empirical results. Section 5 reports the mechanism and heterogeneity analyses. Section 6 discusses theoretical contributions, articulates the paper’s contribution to systems research, draws out policy implications, and concludes.
2. Theoretical Framework and Hypotheses
2.1. Conceptual Foundations
Three strands of theory underpin our analysis. First, the data-as-factor literature argues that data are characterised by non-rivalry, near-zero replication cost, and combinatorial value, so that their economic worth grows with re-use, integration, and depth of processing [2,14]. Public data openness is the institutional mechanism that scales this potential by exposing previously siloed government datasets to a heterogeneous community of users [13]. The literature, however, has largely modelled PDO as a static input that linearly enters firms’ production or knowledge-accumulation functions, under-weighting the dynamic, agent-level adaptation it triggers. In the Chinese setting this adaptation is shaped by distinctive institutional features (an unusually salient role for local government in resource allocation, a state-influenced supply of green credit, and a fragmented, municipality-by-municipality data-governance landscape), so that public data openness operates on a selection environment in which administrative signals are first-order rather than peripheral.
Second, the innovation-ecosystem and CAS literature conceives of innovation as the emergent outcome of a population of heterogeneous, bounded-rational agents whose strategies co-evolve with a selection environment [3,4]. CAS supplies three theoretical primitives that we exploit: agents adapt locally, so exogenous shocks propagate through micro-level revision rather than instantaneous re-equilibration; the system exhibits emergent macro-patterns that cannot be read off individual incentives; and shocks reshape the fitness landscape, altering both the level and the relative payoffs of competing strategies. A complementary multi-level perspective [5] explicates how niche innovations penetrate incumbent regimes under landscape pressure, while dynamic-capability theory [6] supplies the firm-level micro-foundation for adaptation.
Third, the corporate green-innovation literature converges on a patent-based measurement convention [15] and identifies two families of antecedents, internal (absorptive capacity, R&D intensity, managerial cognition [1,16]) and external (environmental regulation, industrial policy, and financial development [7,8,17]), that interact within an ecosystem [18,19,20]. The CAS reading we adopt integrates these strands by treating internal absorptive capacity as the firm-level adaptation kernel and external institutions and markets as the selection environment, with PDO operating as an exogenous landscape shock.
Figure 1 visualises the resulting theoretical framework: PDO enters the green-innovation ecosystem as an exogenous landscape shock and propagates through three coupling channels (supply–demand alignment, recalibration of government intervention, and amplification of green credit), each of which feeds back to firm-level fitness and the equilibrium share of green-innovation strategies.
2.2. The Green-Innovation Ecosystem as a Complex Adaptive System
We model the green-innovation ecosystem as a four-tuple $\langle \mathcal{N}, \mathcal{S}, F, W \rangle$, where $\mathcal{N}$ is a finite set of heterogeneous firm agents, $\mathcal{S}$ is the strategy space of innovation portfolios, $F$ is a fitness function mapping strategy–environment pairs to expected payoffs, and $W$ is a transition kernel that governs agent-level adaptation [3]. Public data openness $D_o$ enters the environment as an exogenous variable; the policy question is how a perturbation of this variable propagates through $W$ into the equilibrium distribution of strategies in $\mathcal{S}$.
In this framework, the system boundary is defined as the urban green-innovation ecosystem in which listed firms interact with local governments, financial institutions, suppliers, customers, and data-platform operators. The agents are not assumed to optimise under complete information; rather, they update their innovation strategies through bounded rationality and local learning. Public data openness changes the system’s information topology by reducing data fragmentation and increasing observability across agents. As a result, the reform affects not only individual firms’ knowledge stocks, but also the feedback loops through which market demand, regulatory intervention, and green finance jointly select and reinforce green-innovation strategies. The emergent macro-pattern observed in green-patent data is therefore the population-level footprint of these reconfigured feedback loops, rather than the simple sum of independent firm-level responses to a digital-policy shock. A feature specific to the Chinese reform reinforces this reading: because municipal data platforms were launched in a staggered, locally initiated manner rather than through a single national mandate, the reform itself supplies the heterogeneous, exogenous landscape shocks our identification exploits, while the agents adapting to them remain embedded in a strongly state-mediated institutional context.
2.3. Knowledge Accumulation and Replicator Dynamics
Let $K_i$ denote the green-innovation-relevant knowledge stock of firm $i$. Following Cohen and Levinthal [1] and Teece [6], knowledge evolves according to Equation (1), whose terms are the depreciation of the existing stock, internal R&D, and the publicly accessible data stock $D_o$ converted through an increasing, concave absorptive-capacity function $A(\cdot)$. Equation (1) embeds the Cohen–Levinthal formalism in an open-data environment: the marginal contribution of PDO to firm i’s knowledge stock is $\lambda A'(K_i)$, which is positive for every firm. Because $A(\cdot)$ is increasing and concave, this marginal contribution is positive but decreasing in the level of absorptive capacity. Public data openness thus always raises the knowledge stock, and firms with greater absorptive capacity attain a higher level of converted knowledge $A(K)$, even though the marginal productivity of an additional unit of data diminishes. The cross-firm heterogeneity we exploit below therefore reflects differences in the level of absorptive capacity and in ecosystem position, not a monotone ranking of this marginal term.
At the population level, let $x_s$ denote the share of firms adopting green-innovation strategy $s$. Following Hofbauer and Sigmund [21], adaptation in $x_s$ follows the replicator equation:
$$\dot{x}_s = x_s\,(F_s - \bar{F}), \tag{2}$$
with $\bar{F} = \sum_{s'} x_{s'} F_{s'}$ the average fitness. Equation (2) states that strategies with above-average fitness expand their share and those with below-average fitness contract. The macro-level patterns observed in patent data are the aggregate footprint of this population-level adaptation; the question is how changes in $D_o$ alter the fitness landscape that shapes (2).
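The replicator logic of Equation (2) can be sketched numerically. The snippet below is a minimal illustration with two strategies (green and conventional), integrated with simple Euler steps. The fitness functions are hypothetical: green fitness rises with an openness index `d_open` and falls with the green share (a crowding term added only so that an interior equilibrium exists), while conventional fitness is constant. None of these functional forms or parameter values come from the paper.

```python
def green_share(d_open, x0=0.1, steps=10000, dt=0.01):
    """Euler-integrate a two-strategy replicator equation; return the final green share.

    Assumed (hypothetical) fitness:
      green        = 1 + 0.5 * d_open - x   (x is the current green share)
      conventional = 1
    """
    x = x0
    for _ in range(steps):
        f_green = 1.0 + 0.5 * d_open - x
        f_conv = 1.0
        f_bar = x * f_green + (1.0 - x) * f_conv  # population-average fitness
        x += dt * x * (f_green - f_bar)           # share grows iff fitness is above average
    return x


low = green_share(0.4)
high = green_share(1.0)
print(round(low, 3), round(high, 3))
```

With these assumed payoffs the dynamics reduce to dx/dt = x(1 - x)(0.5·d_open - x), so the green share settles near 0.5·d_open: a higher openness index shifts the resting point of the population toward the green strategy, which is the comparative static the section goes on to formalise.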
2.4. The Selection Landscape and the Main Effect
Following the production-function tradition [7] and the data-as-factor literature [2], we specify fitness as a generalised Cobb–Douglas function in which $D_o$ enters as a distinct factor:
$$F_i = K_i^{\alpha} L_i^{\beta} D_o^{\gamma} M_i, \tag{3}$$
where $L_i$ is labour input, $M_i$ denotes the institutional and market environment, and $\alpha, \beta, \gamma$ are output elasticities. Differentiating (3) with respect to $D_o$ yields the direct fitness effect of public data openness:
$$\frac{\partial F_i}{\partial D_o} = \gamma\,\frac{F_i}{D_o} > 0. \tag{4}$$
PDO uniformly elevates the fitness of strategies that rely on external information, scaled by $\gamma$. Combining (2) with (4), the equilibrium share of green-innovation strategies is strictly increasing in $D_o$. Two features of this derivation are non-trivial. First, because $A(\cdot)$ appears in (1), the marginal effect of $D_o$ on $K_i$, and hence on $F_i$, is heterogeneous across agents in a way that yields the empirical heterogeneity signature we look for. Second, because fitness is multiplicative in $M_i$, PDO can shift $F_i$ not only directly but also via the institutional and market channels we develop next.
Proposition 1 (Main Effect). An increase in public data openness shifts the selection landscape such that the equilibrium share of green-innovation strategies in the firm population is strictly increasing in $D_o$.
Hypothesis 1. Public data openness has a positive effect on corporate green-innovation activity.
Because the marginal effect of public data openness on a firm’s knowledge stock is λ·A′(K) (Equation (1)), the same shock is expected to produce different responses across agents according to their absorptive capacity and ecosystem position. We therefore state three directional expectations before turning to the data. First, on absorptive capacity and managerial cognition, the effect should be larger for firms better able to convert data into green-innovation effort: firms with environmentally experienced executives, non-state firms, and firms outside heavily polluting industries. Second, on regional financial development, the effect should be larger where formal credit markets are thin, because open data substitutes for missing financial infrastructure (the credit channel of H4). Third, on openness quality and maturity, the effect should be larger in cities with earlier or higher-quality data openness. The heterogeneity analysis in Section 5.2 tests these stated priors rather than interpreting subsample differences post hoc.
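The direct fitness effect behind Proposition 1 can be checked numerically. The sketch below assumes the simplest Cobb–Douglas form consistent with the description in Section 2.4, namely fitness equal to K^alpha · L^beta · D^gamma · M, and verifies that a finite-difference derivative with respect to the openness term matches the closed form gamma · F / D. The functional form, the elasticity values, and all function names are assumptions made for illustration only.

```python
def fitness(k, l, d, m, alpha=0.3, beta=0.5, gamma=0.2):
    """Cobb-Douglas fitness with data openness d as a separate factor (assumed form)."""
    return (k ** alpha) * (l ** beta) * (d ** gamma) * m


def marginal_effect_analytic(k, l, d, m, gamma=0.2):
    """Closed-form derivative of fitness with respect to d: gamma * F / d."""
    return gamma * fitness(k, l, d, m, gamma=gamma) / d


def marginal_effect_numeric(k, l, d, m, h=1e-6):
    """Central finite difference in d, used only as a cross-check."""
    return (fitness(k, l, d + h, m) - fitness(k, l, d - h, m)) / (2.0 * h)


analytic = marginal_effect_analytic(k=2.0, l=3.0, d=1.5, m=1.2)
numeric = marginal_effect_numeric(k=2.0, l=3.0, d=1.5, m=1.2)
print(analytic, numeric)
```

Because the derivative is proportional to fitness itself, anything that raises the institutional and market term M also scales up the payoff to openness under this assumed form, which is why the multiplicative structure matters for the channel arguments.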
2.5. Coupling Channel I—Supply–Demand Alignment
A central CAS prediction is that exogenous shocks alter the institutional component of fitness through the informational topology that connects supply and demand. Equation (5) defines a supply–demand mismatch index from firm i’s supply and demand quantities. Mismatch raises coordination cost, which Equation (6) specifies as a convex quadratic in the index. The cost enters fitness through the institutional channel: higher coordination cost lowers $M_i$. PDO narrows the information gap between supply and demand sides (standardised, machine-readable datasets reduce signal noise about regulatory trends, emissions monitoring, procurement, and technology adoption), so the mismatch index is decreasing in $D_o$ (Equation (7)). By the chain rule applied through (6) and the institutional channel, fitness is increasing in $D_o$ (Equation (8)).
Proposition 2. Public data openness raises green-innovation fitness by compressing supply–demand mismatch.
Hypothesis 2. Public data openness promotes corporate green innovation by reducing supply–demand mismatch.
Interpreted in system terms, this is a market-feedback coupling mechanism: by improving the visibility of demand-side information and regulatory trends, public data openness strengthens the feedback loop from market signals to firms’ green-innovation strategies, so the channel runs continuously rather than as a one-off mediator.
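The chain of effects in this channel can be traced with a small numerical sketch. It assumes a mismatch index that falls with openness, a convex quadratic coordination cost in that index, and an institutional term equal to a baseline net of the cost. The specific functional forms, parameter values, and function names are hypothetical and chosen only to show the direction of each link.

```python
def mismatch(d_open, m0=0.8):
    """Assumed mismatch index, decreasing in openness (hypothetical form)."""
    return m0 / (1.0 + d_open)


def coordination_cost(m, c=2.0):
    """Convex quadratic coordination cost in the mismatch index."""
    return 0.5 * c * m ** 2


def institutional_component(d_open, base=1.0):
    """Assumed institutional term: a baseline net of coordination cost."""
    return base - coordination_cost(mismatch(d_open))


values = [institutional_component(d) for d in (0.0, 0.5, 1.0, 2.0)]
print([round(v, 3) for v in values])
```

Each step moves in the direction the proposition requires: more openness lowers mismatch, lower mismatch lowers cost, and lower cost raises the institutional term that multiplies fitness. The convexity of the cost also implies that the gain is largest where initial mismatch is highest.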
2.6. Coupling Channel II—Recalibration of Government Intervention
Aghion et al. [7] and Chang et al. [8] show that government intervention in industrial and environmental policy is non-monotonic in its productivity effects: mild intervention corrects externalities, but excessive intervention crowds out private incentives. We parameterise this through a component that is quadratic in intervention intensity and multiplies baseline fitness (Equation (9)). This component has an interior optimum. When ambient intervention exceeds that optimum, fitness is decreasing in further intervention. Public data openness compresses the information rents on which discretionary intervention rests [8,16]: once procurement, subsidy allocation, and regulatory targeting are observable to firms and citizens, the marginal return to additional state engagement falls. Formally, intervention intensity is decreasing in $D_o$ (Equation (10)). Substituting (10) into the derivative of (9) yields a positive effect of $D_o$ on fitness in the over-intervention regime, the empirically relevant case for many Chinese municipalities. PDO thus operates as a corrective mechanism on the state side of the ecosystem, not just on the firm side.
Proposition 3. When ambient government intervention exceeds the social optimum, public data openness raises green-innovation fitness by dampening intervention.
Hypothesis 3. Public data openness promotes corporate green innovation by reducing the intensity of government intervention.
Interpreted in system terms, this is an institutional-feedback mechanism: greater transparency of policy implementation and subsidy allocation recalibrates the state’s role from discretionary actor to feedback channel, so that policy signals correct rather than crowd out the selection of green-innovation strategies.
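The regime condition in Proposition 3 can be illustrated with a short sketch. It assumes a concave quadratic multiplier in intervention intensity, 1 + a·g - b·g², whose interior optimum is a / (2b), and an intervention level that decays exponentially as openness rises. Both functional forms, the parameter values, and the function names are hypothetical; the point is only the sign of the fitness change on either side of the optimum.

```python
import math


def intervention_multiplier(g, a=1.0, b=0.5):
    """Assumed concave quadratic fitness multiplier in intervention intensity g."""
    return 1.0 + a * g - b * g ** 2


def optimum(a=1.0, b=0.5):
    """Interior optimum of the quadratic: a / (2b)."""
    return a / (2.0 * b)


def intervention(d_open, g0, k=0.3):
    """Assumed response: openness dampens intervention exponentially from g0."""
    return g0 * math.exp(-k * d_open)


def fitness_gain(g0, d0=0.0, d1=1.0):
    """Change in the multiplier when openness rises from d0 to d1."""
    return (intervention_multiplier(intervention(d1, g0))
            - intervention_multiplier(intervention(d0, g0)))


over = fitness_gain(g0=2.0)   # starts above the optimum of 1.0
under = fitness_gain(g0=0.5)  # starts below the optimum
print(over > 0, under < 0)
```

In this sketch the same dampening of intervention raises fitness when the starting point is above the optimum and lowers it when the starting point is below, which is why the proposition is stated conditionally on the over-intervention regime.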
2.7. Coupling Channel III—Amplification of Green-Credit Access
The credit channel operates through information asymmetry between firms and lenders. Banks form posterior beliefs about a firm’s green credibility from observable signals $s$; under Bayesian updating with prior $\pi_0$ that the firm is green-type,
$$P(g \mid s) = \frac{\pi_0\, f(s \mid g)}{\pi_0\, f(s \mid g) + (1 - \pi_0)\, f(s \mid \neg g)}, \tag{11}$$
where $f(s \mid g)$ and $f(s \mid \neg g)$ are the conditional densities of the signal under green-type and non-green-type firms respectively. PDO sharpens signal precision (standardised emissions, energy-use, and regulatory-compliance data tighten the distribution of $s$ around its true mean), producing, by a standard monotone-likelihood-ratio argument:
$$\frac{\partial P(g \mid s)}{\partial \left[ 1/\mathrm{Var}(s \mid g) \right]} > 0 \quad \text{for high signal realisations}. \tag{12}$$
Translating into the equilibrium green-credit ratio (the share of credit allocated to certified green projects), Equation (13) states that this ratio is increasing in $D_o$.
The step from Equation (12) to Equation (13) follows by the chain rule. Equation (12) shows that the posterior $P(g \mid s)$ is increasing in signal precision $1/\mathrm{Var}(s \mid g)$ for high signal realisations; aggregating over the population of high-signal green-type firms, the equilibrium green-credit ratio inherits this monotonicity, so it is increasing in signal precision. Because public data openness raises signal precision, the derivative of precision with respect to $D_o$ is positive, and the product of the two partial derivatives is positive, which is Equation (13).
Proposition 4. Public data openness amplifies green-credit access by reducing information asymmetry in the credit market.
Hypothesis 4. Public data openness promotes corporate green innovation by raising the proportion of green credit.
Interpreted in system terms, this is a financial-feedback mechanism: sharper environmental and compliance signals let lenders distinguish credible green innovators from symbolic adopters, so that capital flows reinforce the strategies the data infrastructure makes legible.
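The Bayesian updating step behind this channel can be sketched with a worked example. The snippet assumes that the signal is normally distributed with the same standard deviation for both firm types, with a higher mean for green-type firms; the prior, the means, the standard deviations, and the function names are illustrative assumptions rather than quantities from the paper.

```python
import math


def normal_pdf(s, mean, sd):
    """Density of a normal distribution at s."""
    return math.exp(-((s - mean) ** 2) / (2.0 * sd ** 2)) / (sd * math.sqrt(2.0 * math.pi))


def posterior_green(s, sd, prior=0.3, mean_green=1.0, mean_other=0.0):
    """Bayes posterior that a firm is green-type given signal s (assumed normal signals)."""
    like_green = normal_pdf(s, mean_green, sd)
    like_other = normal_pdf(s, mean_other, sd)
    return prior * like_green / (prior * like_green + (1.0 - prior) * like_other)


# High signal (above the midpoint of the two means): precision raises the posterior.
noisy = posterior_green(s=0.8, sd=0.5)
precise = posterior_green(s=0.8, sd=0.25)

# Low signal (below the midpoint): precision lowers the posterior.
low_noisy = posterior_green(s=0.2, sd=0.5)
low_precise = posterior_green(s=0.2, sd=0.25)

print(round(noisy, 3), round(precise, 3), round(low_noisy, 3), round(low_precise, 3))
```

Halving the noise moves the lender's belief further in whichever direction the signal already points. This is the sense in which sharper signals help lenders separate credible green innovators from symbolic adopters, and it is why the monotonicity result is stated for high signal realisations only.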
Taken together, the three channels are not three independent mediating variables but three interlocking coupling mechanisms—market-feedback, institutional-feedback, and financial-feedback—that jointly reconfigure the feedback architecture of the green-innovation ecosystem. Public data openness operates as a system-level enabler precisely because it activates these mechanisms simultaneously, rather than because it raises any single firm-level input.
6. Discussion and Conclusions
6.1. Theoretical Contributions
The paper makes four contributions. Theoretically, it brings CAS and innovation-ecosystem theory to bear on the data-element literature, providing a formal scaffold—knowledge accumulation, replicator dynamics, and an explicit fitness landscape—that generates the comparative-static predictions the empirical analysis tests. Methodologically, it triangulates conventional DID with heterogeneity-robust estimators and machine-learning controls, addressing critiques that have rendered much of the existing DID-based open-data literature vulnerable. Substantively, it decomposes the policy’s causal architecture into three derived coupling channels and shows that mediation analyses confirm each. Practically, it reframes public data openness from a uniform public good into a systemic enabler whose returns scale with absorptive capacity, motivating a tiered policy design.
Within the CAS apparatus developed above, the paper makes two theoretical moves. First, it imports the formal apparatus of CAS (replicator dynamics, an explicit fitness landscape, and a knowledge-accumulation law with absorptive-capacity heterogeneity) into the literature on data elements and innovation policy. Where the prevailing approach treats data as an input that linearly augments output, the CAS reading treats public data openness as an environmental shock that propagates through coupling mechanisms inscribed in the ecosystem rather than within any single firm [3,4]. Second, it derives the three coupling channels from a common model primitive rather than enumerating them post hoc. This permits sign-restricted testing and a more disciplined mediation analysis. Both moves prepare the ground for the broader contribution to system research articulated in Section 6.2 and respond to the call by Yin et al. [18] for system-theoretic frames that articulate how digital infrastructure interacts with green-innovation performance.
A further theoretical implication is that the empirical heterogeneity is not a nuisance to be controlled for, but a prediction of the model. Because the marginal effect of $D_o$ on $K_i$ is $\lambda A'(K_i)$, the same shock produces different fitness gains across agents whose absorptive capacity differs, which is exactly the pattern we observe across ownership, industry, managerial cognition, and regional data quality. Future work that treats heterogeneity as evidence about the structure of the adaptation kernel is likely to yield further identification leverage.
6.2. Contribution to Systems Research
Beyond its substantive contribution to the data-element and green-innovation studies, this study contributes to system research in three distinct ways. First, it extends system practice in social science by treating China’s public data openness reform as a system-level intervention rather than a conventional digital-policy shock. The analysis shows how a policy that appears administrative at the surface can restructure the adaptive relations among firms, governments, markets, and financial institutions, and that the macro-pattern observed in green-patent data is the population-level footprint of this restructuring rather than the additive sum of independent firm-level responses. Second, it contributes to complex-system research by formalising corporate green innovation as an emergent outcome of heterogeneous agents adapting on a changing fitness landscape. This approach explains why policy effects are not uniform, but depend on absorptive capacity, managerial cognition, and regional ecosystem conditions, and it converts heterogeneity from a nuisance to be conditioned away into direct evidence about the structure of the adaptation kernel. Third, it contributes to digital-system research by showing that data elements function not only as production inputs, but also as connectors that reshape information flows, feedback loops, and cross-agent coordination. In this sense, public data openness operates as a systemic enabler of green innovation rather than a simple resource supplement, and the three coupling mechanisms—market-feedback alignment, institutional-feedback recalibration, and financial-feedback amplification—jointly illustrate how a single information-architecture reform can re-wire the selection environment of an entire innovation ecosystem.
6.3. Policy Implications
Three implications follow for the design of data infrastructure. First, dataset quality and update frequency, rather than headline coverage, should be the operative metric: only data that meaningfully reduce signal noise compress supply–demand mismatch and shift fitness. Second, complementary investments in firm-level absorptive capacity (digital infrastructure, data-literacy training, and R&D tax incentives) are pre-conditions for PDO returns, not optional add-ons [1,6]. Third, the credit channel is policy-actionable: pairing PDO portals with green-credit certification protocols would amplify the H4 mechanism we identify and would do so most strongly in financially under-developed regions where the channel binds.
6.4. Limitations and Future Research
The study has limitations that open avenues for future research. It focuses on Chinese A-share listed firms over 2013–2022; extension to private firms and other jurisdictions would broaden external validity. Because the sample excludes unlisted small and medium-sized enterprises, which rely more on external open data than on internal R&D, our estimates may be a lower bound if such firms benefit more from openness, or an upper bound if listed firms’ greater absorptive capacity lets them convert open data more effectively; the net direction is a boundary condition rather than a settled magnitude. The sample window also ends in 2022, so post-2022 developments (further municipal platform rollouts, evolving data-openness quality, and post-pandemic normalisation) fall outside our estimates and are flagged as a scope condition. The patent-based measure captures innovation effort and output but not market diffusion or downstream environmental outcomes. The CAS model abstracts from inter-firm strategic interaction; richer agent-based simulations could quantify emergent dynamics such as cluster formation and regime shifts [5]. Finally, our mediation analysis identifies the direction and presence of the three channels but not their relative welfare weights; explicit structural estimation of Equations (1)–(13) would permit such a decomposition. Pursuing these avenues would deepen the integration of system theory with the empirical evaluation of data-element policies.
6.5. Conclusions
This paper has reframed corporate green innovation under public data openness through the lens of CAS. Treating data as a strategic factor of production and PDO as an exogenous shock to the fitness landscape of the green-innovation ecosystem, we derived four formal propositions—a main effect and three coupling channels operating through supply–demand alignment, the recalibration of government intervention, and the amplification of green credit—and translated each into a sign-restricted, testable hypothesis. Exploiting the staggered municipal roll-out of open-data platforms across China between 2013 and 2022, we identified the causal effect using a multi-period DID design and reinforced inference with the Callaway–Sant’Anna estimator and DDML, two safeguards that jointly address heterogeneous-treatment-effect mis-weighting and high-dimensional confounding.
The empirical evidence converges on a single, theoretically coherent picture. PDO raises firm-level green-innovation activity by roughly a quarter relative to comparable closed-data cities, with the effect concentrated in substantive invention patents rather than utility-model patents, consistent with a shift in the population of innovation strategies toward higher-fitness, knowledge-intensive configurations. All three derived channels operate in the direction implied by the model: supply–demand mismatch contracts, discretionary government intervention is dampened in the over-intervention regime, and the green-credit ratio expands. Heterogeneity analyses confirm the model’s most distinctive prediction: the same exogenous shock yields markedly different fitness gains across agents whose absorptive capacity, managerial cognition, and ecosystem position differ.
Taken together, these results reposition PDO from a uniform public good into a systemic enabler whose marginal returns scale with the absorptive capacity of recipient firms and regions. The policy implication is that uniform national rollouts will continue to generate uneven returns; tiered designs that pair data infrastructure with investments in firm-level absorptive capacity, data-literacy training, and green-credit certification protocols are likely to capture a substantially larger share of the potential welfare gains. More broadly, the analysis illustrates the value of bringing system-theoretic primitives—replicator dynamics, fitness landscapes, and absorptive-capacity heterogeneity—to bear on the evaluation of data-element policies, and points to a research agenda in which heterogeneity is treated not as a nuisance to be conditioned out but as direct evidence on the structure of the adaptation kernel itself.