1. Introduction
Public data is a fundamental resource for promoting high-quality economic development and fostering innovation. Comprehensively promoting the openness, sharing, and utilization of public data is essential for enhancing social governance capabilities and implementing innovation-driven development strategies [1,2,3]. The Chinese government emphasizes the need to strengthen the aggregation, sharing, and open development of public data, promote interconnectivity, and break down data barriers. These efforts are of significant importance for improving the efficiency of social resource allocation, promoting green technological innovation, and achieving sustainable development goals [2,4,5]. However, a large amount of public data still remains trapped within individual government departments, creating data barriers that severely restrict the effective utilization of data elements. In light of this, leveraging the publicly available data resources of different regions to promote enterprises’ practice and development in green technological innovation has become a pressing issue that requires urgent attention [6].
As emphasized by You et al. [7], data plays a crucial role in enterprises’ green technological innovation; nevertheless, the actual value of public data openness remains underexplored in the existing literature. Current research focuses primarily on theoretical frameworks, and its analysis of how different types of data influence innovation is still shallow; the relationship between public data and green technological innovation in particular remains understudied. The literature suggests that public data can provide valuable information to enterprises, helping them better understand market dynamics and technological trends [8]. Because openness allows enterprises to obtain this information at low or even zero cost, it can reduce the uncertainty surrounding green technological innovation, further incentivizing enterprises to invest in innovation and promoting continuous technological development [9].
Although some scholars regard the openness of public data as a potential means of promoting green technological innovation, other studies indicate that openness does not necessarily stimulate innovation in practice. The nature of data varies, and different types of data may have markedly different effects on green technological innovation [10,11,12,13]. By source, data can be categorized into personal data, enterprise data, and public data. Personal and enterprise data are typically collected by enterprises according to their own needs and entail high acquisition, processing, and storage costs, which gives them strong proprietary characteristics and creates competitive barriers [14]. In contrast, public data is collected by the government in the course of fulfilling its responsibilities and inherently possesses sharing attributes [15]. However, once public data is open, it is no longer held exclusively by any single enterprise, so competitors can access the same information and pursue similar technological development. In this case, the returns an enterprise can capture from innovation may be limited, which could make it hesitant to invest in green technological innovation.
Thus, the impact of public data openness on green technological innovation has no definitive answer. Whether public data openness can effectively promote green technological innovation and realize the empowering role of data elements still requires further theoretical analysis and empirical validation. Existing studies of public data openness have examined its impact on carbon emissions [16,17,18], income inequality [19], and enterprise efficiency [20,21], but the literature still lacks analyses and tests of its value from the perspective of technological innovation. Because technological innovation is a core driving force of economic transformation and sustainable development, the relationship between public data openness and technological innovation urgently needs in-depth investigation.
The current literature has four clear research gaps. First, the actual value of public data openness in promoting green technological innovation has not been fully explored; most existing studies focus on constructing theoretical frameworks and lack in-depth analysis of the specific paths and mechanisms of its impact. Second, research on the heterogeneous effects of different types of data on green technological innovation is insufficient; in particular, the role and mechanism of public data, with its inherent sharing attributes, remain unclear. Third, existing studies rarely analyze and test the value of public data openness from the perspective of technological innovation; whether public data openness can effectively promote enterprises’ green technological innovation remains controversial and lacks sufficient empirical verification. Fourth, traditional research methods have limitations in high-dimensional data analysis, and few studies have used interpretable machine learning methods to explore the importance of the control variables affecting green technological innovation and their nonlinear relationships with it, which limits the accuracy and comprehensiveness of research conclusions.
Aiming at the above research gaps, this paper focuses on solving the following core research questions: (1) Can public data openness effectively promote green technological innovation, and what is its net effect? (2) Through what specific mechanisms does public data openness affect green technological innovation, and what are the specific action paths of these mechanisms? (3) Are there heterogeneous characteristics in the impact of public data openness on green technological innovation, and what are the specific manifestations of these heterogeneous characteristics?
Based on this, this paper first analyzes the impact of public data openness on green technological innovation. Using a panel dataset of Chinese prefecture-level cities from 2003 to 2022, this study treats the launch of local government data open platforms as an exogenous policy shock to public data openness and empirically tests its effect on green technological innovation using a double machine learning (DML) approach. Furthermore, this paper employs the SHapley Additive exPlanations (SHAP) method from interpretable machine learning to rank the importance of the control variables that may influence green technological innovation and to investigate the nonlinear relationships between these control variables and green technological innovation. In the robustness checks, this study reinforces the credibility of the baseline regression in several ways, including adjusting the research sample, conducting dynamic time-effect tests, changing the dependent variable, and replacing the machine learning prediction model. Additionally, this paper cross-validates the heterogeneity analysis with grouped regressions and the causal forest method, examining variations in model explanatory power to reveal heterogeneous characteristics. Finally, building on a theoretical analysis of the mechanisms through which public data openness exerts its influence, this paper empirically tests its role in reducing the uncertainty surrounding green technological innovation from three perspectives: the siphoning effect of aggregating innovative talent, the multiplier effect of enhancing entrepreneurial vitality, and the leapfrogging effect of promoting government transparency.
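For readers less familiar with DML, the estimating equations can be sketched as follows. This is the standard partially linear formulation with a partialling-out score, written under the assumption that it resembles the baseline specification; the symbols are illustrative ($Y_{it}$: green technological innovation of city $i$ in year $t$; $D_{it}$: an indicator equal to 1 once the city has launched a government data open platform; $X_{it}$: control variables), and the paper's actual specification, such as its fixed effects and choice of learners, may differ.

```latex
\[
\begin{aligned}
Y_{it} &= \theta_0 D_{it} + g_0(X_{it}) + U_{it}, & \mathbb{E}[U_{it} \mid D_{it}, X_{it}] &= 0,\\
D_{it} &= m_0(X_{it}) + V_{it}, & \mathbb{E}[V_{it} \mid X_{it}] &= 0,
\end{aligned}
\]
\[
\hat{W}_{it} = Y_{it} - \hat{\ell}(X_{it}), \qquad
\hat{V}_{it} = D_{it} - \hat{m}(X_{it}), \qquad
\hat{\theta} = \frac{\sum_{i,t} \hat{V}_{it}\,\hat{W}_{it}}{\sum_{i,t} \hat{V}_{it}^{2}}
\]
```

Here $\ell_0(X_{it}) = \mathbb{E}[Y_{it} \mid X_{it}]$, and $\hat{\ell}$ and $\hat{m}$ are machine learning predictions obtained by cross-fitting: each observation's residuals come from models trained on the other folds, so overfitting in the nuisance estimates does not leak into $\hat{\theta}$. The coefficient $\theta_0$ plays the role of the policy effect.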
The marginal contributions of this paper are reflected primarily in three aspects. First, in terms of theory and impact mechanisms, this study incorporates public data openness into the classic product quality model and explores the net effect of public data openness on green technological innovation from both the enterprise and the consumer perspectives, thereby expanding research on how public data openness creates value and deepening the understanding of this relationship. Unlike previous studies, which often analyze the mechanisms influencing green technological innovation from the perspectives of R&D investment and environmental regulation, this study builds on the policy attributes of public data to systematically analyze and verify the macro effects of public data openness and the mechanisms through which it drives green technological innovation. Second, in terms of research theme, this study provides a theoretical analysis and empirical corroboration of how public data openness affects green technological innovation, clarifying its significant role in driving the green innovation process. Existing literature rarely explores the relationship between public data and green technological innovation in depth, and whether public data can effectively promote enterprises’ green technological innovation remains debated. Through an empirical analysis of local government public data openness policies, this study substantiates the positive impact of public data openness on green technological innovation and also fills a gap in the literature by assessing the importance of the control variables that influence green technological innovation.
Third, in terms of research methodology, this study combines the SHAP method from interpretable machine learning with the double machine learning method, mitigating the limitations of traditional policy evaluation methods in high-dimensional data analysis. It compares and explains the differences in the predictive effects and SHAP values of the control variables that may influence green technological innovation, introducing a machine learning research paradigm that supports the wider application of machine learning methods.
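To make the SHAP idea concrete, the following Python sketch computes exact interventional Shapley values for a single prediction by enumerating every feature coalition, then ranks features by their mean absolute Shapley value, the usual global importance measure in SHAP analyses. It is an illustration under stated assumptions, not the paper's pipeline: the toy model, the background sample, and all function names are hypothetical, and brute-force enumeration is only feasible for a handful of features (the `shap` package relies on faster algorithms, such as Tree SHAP for tree ensembles).

```python
import itertools
import math

import numpy as np


def shapley_values(predict, x, background):
    """Exact interventional Shapley values of predict() at one observation x.

    The value of a coalition S is the mean prediction when the features in S
    are fixed at x and the remaining features are taken from each background
    row. Enumerates 2**(p-1) coalitions per feature, so only small p is feasible.
    """
    p = len(x)

    def value(coalition):
        data = background.copy()
        idx = list(coalition)
        if idx:
            data[:, idx] = x[idx]
        return predict(data).mean()

    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p! for coalitions of this size
            weight = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for s in itertools.combinations(others, size):
                phi[j] += weight * (value(s + (j,)) - value(s))
    return phi


def mean_abs_shap(predict, rows, background):
    """Global importance: mean absolute Shapley value of each feature over rows."""
    return np.mean([np.abs(shapley_values(predict, r, background)) for r in rows], axis=0)


# Hypothetical toy model: one additive feature and one interaction pair.
def toy_model(z):
    return 2.0 * z[:, 0] + z[:, 1] * z[:, 2]


rng = np.random.default_rng(0)
background = rng.normal(size=(200, 3))
x0 = np.array([1.0, 2.0, -1.0])
phi = shapley_values(toy_model, x0, background)
# Efficiency property: attributions sum to f(x0) minus the mean background prediction.
print(phi, phi.sum(), toy_model(x0[None, :])[0] - toy_model(background).mean())
print(mean_abs_shap(toy_model, background[:20], background))
```

Because an interaction term is split between the features involved, plotting a feature's Shapley values against its own values (a dependence plot) is what reveals the kind of nonlinear relationship the text refers to.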
5. Discussion
5.1. Comparative Analysis Between This Study and Existing Literature
This study systematically examines the causal relationship between public data openness and green technological innovation, and conducts a comprehensive comparison and dialogue between its core findings and existing relevant literature.
First, in terms of the core effect, most existing studies agree that public data openness can reduce information asymmetry and lower R&D trial-and-error costs, thereby promoting corporate innovation. For example, Dong et al. (2026) [49] and Yin et al. (2025) [50] find that government data openness significantly improves green innovation efficiency. The baseline regression results of this paper show that public data openness has a significantly positive effect on urban green technological innovation. This conclusion is consistent with the mainstream view in the existing literature and further supports the soundness and reliability of the theoretical framework and empirical design of this paper.
Second, regarding heterogeneity, existing studies mostly explore regional differences along single dimensions such as economic development level and city size. Building on this, this paper extends the analysis to three dimensions: urban agglomerations, industrial foundations, and transportation location. The results indicate that the driving effect of public data openness is more significant in the Yangtze River Delta and middle reaches of the Yangtze River urban agglomerations, in non-traditional industrial bases, and in transportation hub cities, whereas it is relatively weak in the Beijing–Tianjin–Hebei, Chengdu–Chongqing, and Pearl River Delta urban agglomerations and in traditional industrial bases. This conclusion enriches the discussion of spatial heterogeneity in existing studies and provides new empirical evidence for explaining the regional imbalance in the effectiveness of data openness policies.
Third, in terms of mechanisms, previous literature mostly explains innovation-driving paths from the perspectives of R&D investment, environmental regulation, and digital finance. Based on the policy attributes and public goods characteristics of public data, this paper proposes and empirically tests three transmission mechanisms: the siphoning effect of attracting innovative talent, the multiplier effect of stimulating entrepreneurial vitality, and the leapfrogging effect of improving government transparency. Together, these mechanisms show how public data openness aggregates innovation factors, activates market entities, and optimizes the institutional environment, deepening the understanding of how data elements empower green technological innovation.
Fourth, in terms of research methods, most existing policy evaluation studies rely on traditional difference-in-differences, fixed effects, or synthetic control methods, which are limited in handling high-dimensional covariates, nonlinear relationships, and endogeneity. This paper adopts the double machine learning (DML) method combined with the SHAP interpretability framework, which both yields robust estimates of causal effects and quantifies the importance of the control variables and their nonlinear relationships with the outcome. Because DML estimates the nuisance functions with flexible machine learning models and combines orthogonalized scores with cross-fitting, it alleviates the curse of dimensionality and the specification bias of traditional econometric models, improving the accuracy and credibility of the conclusions relative to similar studies.
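The following Python sketch illustrates, on simulated data only, why partialling-out with cross-fitting can remove a bias that a linearly specified control leaves behind. Everything in it is a hypothetical assumption rather than the paper's specification: the data-generating process, the ridge-penalized cubic polynomial nuisance learner (standing in for whatever learners the study uses, such as random forests or lasso), the five folds, and all function names.

```python
import numpy as np


def poly_features(x: np.ndarray, degree: int = 3) -> np.ndarray:
    """Intercept plus element-wise powers 1..degree of every column (no cross terms)."""
    return np.column_stack([np.ones(len(x))] + [x ** k for k in range(1, degree + 1)])


def fit_predict(x_train, t_train, x_test, degree=3, ridge=1e-3):
    """Hypothetical nuisance learner: ridge-penalized polynomial least squares."""
    a_train = poly_features(x_train, degree)
    coef = np.linalg.solve(a_train.T @ a_train + ridge * np.eye(a_train.shape[1]),
                           a_train.T @ t_train)
    return poly_features(x_test, degree) @ coef


def dml_plr(y, d, x, n_folds=5, seed=0):
    """Partialling-out DML estimate of theta in Y = theta*D + g(X) + U, with cross-fitting."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # balanced random fold labels
    y_res = np.empty(len(y))
    d_res = np.empty(len(y))
    for k in range(n_folds):
        test = folds == k
        # Nuisances are fit on the other folds and evaluated on fold k only.
        y_res[test] = y[test] - fit_predict(x[~test], y[~test], x[test])
        d_res[test] = d[test] - fit_predict(x[~test], d[~test], x[test])
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    psi = (y_res - theta * d_res) * d_res  # orthogonal score evaluated at theta-hat
    se = np.sqrt(np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / len(y))
    return theta, se


# Simulated data (hypothetical): the confounder enters through x1**2,
# which a linear control for x1 cannot absorb.
rng = np.random.default_rng(42)
n = 4000
x = rng.uniform(-2, 2, size=(n, 2))
d = 0.8 * x[:, 0] ** 2 + rng.normal(size=n)
y = 1.5 * d + 2.0 * x[:, 0] ** 2 + x[:, 1] + rng.normal(size=n)

theta, se = dml_plr(y, d, x)
naive = np.linalg.lstsq(np.column_stack([np.ones(n), d, x]), y, rcond=None)[0][1]
print(f"DML: {theta:.3f} (se {se:.3f}); linear-control OLS: {naive:.3f}; true: 1.5")
```

In this setup the OLS coefficient absorbs the omitted quadratic confounding and lands well above the true value of 1.5, while the cross-fitted DML estimate recovers it; the gap comes entirely from how flexibly the controls are modeled, not from the treatment model itself.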
5.2. Core Contributions
Compared with existing literature, the substantial contributions of this paper are mainly reflected in the following three aspects:
Theoretical contribution: This paper incorporates public data openness into the analytical framework of green technological innovation, clarifies the net effect and heterogeneous performance of data elements in green innovation, and expands the theoretical boundary of research on the digital economy empowering green development. Unlike most studies, which focus on the micro-enterprise level, this paper conducts its research at the urban macro level, reveals the governance value of public data openness, and provides a new theoretical perspective for understanding the coordinated development of data elements and ecological innovation.
Methodological contribution: This paper introduces double machine learning and the SHAP interpretability method into the field of public policy evaluation, mitigating the model specification bias, the difficulty of high-dimensional control, and the weak interpretability of results that affect traditional methods. By constructing a generalized interactive double machine learning model and combining it with causal forests to identify micro-level heterogeneity, it forms a more rigorous and flexible policy evaluation paradigm that can serve as a methodological reference for subsequent research on data policies and innovation policies.
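As a rough illustration of what an interactive DML model does with a binary policy indicator, the following Python sketch estimates an average treatment effect with cross-fitting and the doubly robust (AIPW) score. We assume here that the "generalized interactive" model corresponds to the interactive regression model for a binary treatment; that reading, the polynomial outcome regressions, the Newton-Raphson logistic propensity model, the trimming threshold, the simulated data, and all function names are hypothetical. The causal forest step is not covered.

```python
import numpy as np


def poly_features(x, degree=2):
    """Intercept plus element-wise powers 1..degree of every column."""
    return np.column_stack([np.ones(len(x))] + [x ** k for k in range(1, degree + 1)])


def fit_ls(a, t, ridge=1e-6):
    """Ridge-stabilized least squares coefficients (stand-in outcome learner)."""
    return np.linalg.solve(a.T @ a + ridge * np.eye(a.shape[1]), a.T @ t)


def fit_logit(a, d, n_iter=30, ridge=1e-6):
    """Logistic regression by Newton-Raphson (stand-in propensity learner)."""
    beta = np.zeros(a.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a @ beta)))
        grad = a.T @ (d - p) - ridge * beta
        hess = a.T @ (a * (p * (1.0 - p))[:, None]) + ridge * np.eye(a.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def dml_irm_ate(y, d, x, n_folds=5, seed=0, clip=0.01):
    """Cross-fitted DML estimate of the ATE using the doubly robust (AIPW) score."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    a = poly_features(x)
    psi = np.empty(len(y))
    for k in range(n_folds):
        test, train = folds == k, folds != k
        treated, control = train & (d == 1), train & (d == 0)
        g1 = a[test] @ fit_ls(a[treated], y[treated])  # estimate of E[Y | D=1, X]
        g0 = a[test] @ fit_ls(a[control], y[control])  # estimate of E[Y | D=0, X]
        m = 1.0 / (1.0 + np.exp(-(a[test] @ fit_logit(a[train], d[train]))))
        m = np.clip(m, clip, 1.0 - clip)  # trim extreme propensity scores
        dt, yt = d[test], y[test]
        psi[test] = g1 - g0 + dt * (yt - g1) / m - (1.0 - dt) * (yt - g0) / (1.0 - m)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(y))


# Simulated data (hypothetical): confounded binary treatment with heterogeneous effects.
rng = np.random.default_rng(7)
n = 5000
x = rng.normal(size=(n, 2))
d = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(0.8 * x[:, 0] - 0.5 * x[:, 1])))).astype(float)
tau = 1.0 + 0.5 * x[:, 0] ** 2  # individual effects; population ATE = 1.5
y = tau * d + x[:, 0] ** 2 + x[:, 1] + rng.normal(size=n)
ate, se = dml_irm_ate(y, d, x)
print(f"DML-IRM ATE: {ate:.3f} (se {se:.3f}); true ATE: 1.5")
```

Unlike the partially linear model, this score does not assume a constant effect: the treated and control outcome regressions are fitted separately, so heterogeneous effects such as `tau` above are averaged rather than forced into a single coefficient.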
Practical contribution: The conclusions of this paper have clear policy implications. The regional heterogeneity results can provide a basis for the government to implement differentiated and precise data openness strategies; the three action mechanisms indicate a path for localities to optimize the innovation ecosystem and improve the data governance system; the robust positive effect provides empirical support for the nationwide promotion of public data platform construction and the release of data element value. Overall, the research conclusions can directly serve the policy practice of optimizing public services and achieving high-quality economic development in the digital era.
6. Conclusions
6.1. Conclusion and Policy Recommendations
The open sharing and utilization of public data is an essential component of the foundational institutional framework for data resources and plays a fundamental and leading role in the development and utilization of data elements. Against this backdrop, this study employs the DML method to empirically assess the impact of public data openness on green technological innovation. The empirical results indicate that public data openness significantly promotes green technological innovation. The conclusions remain valid after a series of robustness tests, including adjusting the research sample, removing outliers, and accounting for interference from concurrent policies. Additionally, we analyze the heterogeneity of policy effects by urban agglomeration, industrial foundation, and transportation hub status, and the causal forest results align closely with the grouped regression findings. The siphoning effect of gathering innovative talent, the multiplier effect of stimulating entrepreneurial vitality, and the leapfrogging effect of enhancing government transparency are key mechanisms through which public data openness exerts its green innovation effects.
This study provides important policy implications for improving the public data openness system and promoting green technological innovation.
Firstly, increase investment in public data openness platforms to enhance their technological capabilities and data quality. The government should systematically increase investment in and construction of these platforms, ensuring technological completeness and data reliability. This should include effective allocation of financial resources and the establishment of intelligent data management systems that make data acquisition, storage, and analysis convenient and efficient. Additionally, cities at or above the prefecture level should develop public data openness platforms tailored to their own economic and social characteristics, providing local enterprises with timely and comprehensive data related to green technology. A standardized data release system should be established to ensure the authenticity and standardization of published data, enhancing enterprises’ capacity to respond to policies and market demands. Meanwhile, the government should regularly organize training and promotional activities on data usage to improve the awareness and capabilities of enterprises and the public, ensuring that public data plays a substantive role in green technology research and innovation decision-making.
Secondly, establish a cross-regional data sharing mechanism to enhance the interconnection of public data. In light of the significant regional disparities among city clusters such as Beijing–Tianjin–Hebei, the Yangtze River Delta, and the middle reaches of the Yangtze River, it is recommended to establish a cross-regional data sharing mechanism to achieve effective interconnectivity among cities. This mechanism can provide comprehensive market information for enterprise decision-making and stimulate the innovative drive of regional enterprises, optimizing resource allocation. The government should lead the establishment of collaborative and communication platforms across regions, allowing successful experiences, technical solutions, and best policy practices to be effectively shared. Simultaneously, cooperation should be strengthened to develop relevant cross-regional data standards and interoperability agreements, fostering closer collaborative relationships and enhancing the trust foundation to create a favorable data ecosystem for green technological innovation.
Thirdly, formulate policies to attract high-end talent and encourage cooperation in the green technology sector. The government should develop targeted policies to attract and cultivate high-end talent in green technology, particularly individuals who can engage in research and innovation activities on public data openness platforms. Incentives such as special funds, tax reductions, and scholarships should encourage deep collaboration between universities, research institutions, and enterprises to cultivate talent that meets the demands of green technological innovation. Furthermore, an innovative talent exchange and cooperation platform should be established to facilitate tight collaboration between higher education institutions and enterprises, promoting talent mobility and knowledge sharing. Additionally, collaboration between national and local governments should be encouraged, leveraging local industrial characteristics to conduct targeted training and continuing education programs to enhance the skills of existing employees, providing robust talent support for the ongoing development of green technologies.
Finally, enhance policy transparency and public participation to ensure information access and policy feedback. The government should commit to improving policy transparency, ensuring that the public and enterprises can conveniently access green technology-related policy information. Specific measures include establishing multi-tiered online platforms to publish policy information, implementation details, and related laws and regulations, along with regular updates for easy access. Furthermore, the government should encourage broad participation from various sectors in the formulation, evaluation, and revision of policies, creating a strong interactive mechanism. Active public participation can enhance policy credibility and ensure that policies are dynamically adjusted to meet market demands. On this basis, the government should regularly publish assessment reports on policy implementation effectiveness and solicit feedback through forums and public hearings, ensuring continuous optimization of policy design to remain relevant and aligned with current needs.
6.2. Research Limitations and Future Research Directions
This study still has certain limitations. First, it only takes the launch of open data platforms as a proxy variable for public data openness, which cannot fully measure the quality, depth, and actual utilization efficiency of data openness. Second, the analysis is conducted at the prefecture-level city level without further combining micro-enterprise data to reveal the differentiated mechanisms. Third, the spatial spillover effects and cross-regional interactions of public data openness have not been considered. Future research can be expanded in three aspects: first, construct a multi-dimensional indicator system for the quality of public data openness to improve the measurement accuracy of core variables; second, use micro-enterprise data to further explore the heterogeneous impacts and micro-mechanisms; third, introduce spatial econometric methods to investigate the spatial spillover effects and regional collaborative innovation effects of data openness, so as to improve the relevant research system.