A Hybrid Data Envelopment Analysis–Random Forest Methodology for Evaluating Green Innovation Efficiency in an Asymmetric Environment

Abstract: The accurate evaluation of green innovation efficiency is a critical prerequisite for enterprises seeking to achieve sustainable development goals and to improve both environmental performance and economic efficiency. This paper evaluates the green innovation efficiency of 72 new-energy enterprises using a hybrid method that combines Data Envelopment Analysis (DEA) with a random forest model. The non-parametric DEA model is combined with the parametric Stochastic Frontier Analysis (SFA) model to measure the real green innovation efficiency after removing environmental and random factors. The random forest model, which can capture nonlinear relationships, is then used to evaluate the factors that affect green innovation efficiency. The resulting evaluation method is designed to assess the green innovation efficiency of new-energy enterprises comprehensively. By applying it, companies can gain a clear understanding of their current green innovation performance, which facilitates informed decision-making and accelerates sustainable development.


Introduction
With the increasing global emphasis on sustainable development, corporate green innovation has emerged as a crucial driver of economic growth and environmental protection. However, accurately assessing the efficiency of corporate green innovation faces numerous challenges, among which data asymmetry is particularly prominent. The diversity of green innovation input-output indicators and the difficulty of quantifying them, the nonlinearity of the innovation process, and the complexity of firm heterogeneity and external environments all complicate the collection, processing, and analysis of data for measuring green innovation efficiency.
The asymmetry of corporate data stems from various factors such as different business areas, market competition, economic cycle fluctuations, differences in management levels, and external environments, resulting in unbalanced performance across indicators. This complexity reflects the diversity and challenges of the business environment, and the asymmetry of corporate data must be understood and addressed comprehensively to support effective decision making and management. In dealing with asymmetric data, the DEA (Data Envelopment Analysis) model exhibits significant advantages, making it a powerful tool for evaluating efficiency. The DEA model is applicable to situations with multiple inputs and outputs. As a non-parametric method, it does not require assumptions about the probability distribution of the data or the form of the production function, and is therefore suitable for handling various types of data, including asymmetric data. DEA assesses the performance of units by comparing their relative efficiency rather than relying on a specific mathematical model. This allows DEA to handle asymmetric data effectively and to provide judgments of relative efficiency, making it flexible and widely applicable in practical settings.
DEA is a non-parametric efficiency evaluation method used to assess the efficiency of decision-making units (DMUs). DEA was first proposed by Charnes, Cooper, and Rhodes in 1978 [1], leading to the CCR model. This model has since been used in numerous applications [2][3][4]. Subsequently, Banker, Charnes, and Cooper (1984) extended the DEA method by introducing the BCC model [5], which relaxed the constant-returns-to-scale assumption of the CCR model. However, Fried et al. [6] pointed out that enterprises' inefficiency is affected not only by internal mismanagement but also by external environments and random errors, and therefore proposed a three-stage DEA model. Nevertheless, traditional three-stage DEA models also have shortcomings. The efficiency measurement in the first stage of the traditional three-stage DEA model operates under the premise of equal contraction proportions for every input [7]. In reality, however, different inputs exhibit different elasticities and do not decrease proportionally, so the proportional measure ignores the slack in resource utilization. This can bias the evaluation results and fail to fully reflect the efficiency level of decision-making units. The slack-based measure (SBM) model introduced by Tone [8] can effectively address this deficiency.
Based on the above analysis, this article embeds the SBM model into the three-stage DEA model, adopting a non-parametric and non-oriented SBM model in the first and third stages. By considering slack in resource utilization, it evaluates the efficiency level of each DMU more comprehensively and provides more accurate evaluation results. In practical applications, many regression problems exhibit nonlinear relationships, and traditional linear regression models often have difficulty capturing the complex patterns in the data. As a powerful ensemble learning method, the random forest demonstrates significant advantages in handling nonlinear regression problems due to its non-parametric, highly flexible, and robust nature. This article applies the random forest to study the factors affecting green innovation efficiency, aiming to provide decision makers with more detailed and comprehensive information that can aid in taking appropriate actions.
The remaining sections of this paper are arranged as follows. Section 2 summarizes previous research on green innovation efficiency. Section 3 presents the formulas for the three-stage DEA model and the random forest model used in this article. Section 4 examines the green innovation efficiency measured by the three-stage DEA model and discusses the factors that influence green innovation. Section 5 outlines the study findings, along with the paper's limitations and future directions.

Literature Review
The rapid evolution of the modern economy has brought environmental concerns to the forefront, prompting a heightened societal focus on ecological issues. Green innovation has emerged as a critical research area attracting growing scholarly attention. Green innovation, rooted in the idea of "sustainable development," was initially used in the 1980 "World Conservation Strategy Report" [9]. Subsequently, literature related to sustainable innovation [10,11], eco-innovation [12,13], green innovation [14][15][16][17], and environmental innovation [18] has gradually increased. Scholars hold different views on the understanding of green innovation. Chen et al. [19] defined "green innovation" as advancements in hardware or software that contribute to eco-friendly products or processes. These innovations encompass areas such as sustainable product design and environmentally responsible corporate management practices. Wu et al. [20] believed that green innovation is a product of the combination of innovation theory and ecological views, which aims to maximize economic benefits while obtaining new knowledge and technologies to reduce environmental pollution. Rennings [21] argued that green innovation has a "double externality," with spillover effects both in the production stage and in the diffusion stage, resulting in a certain degree of reduction in internal costs and external environmental costs. Bernauer et al. [22] discussed the concept of green innovation as being within the same category as environmental innovation and eco-innovation. Zhang [23] and Schiederig [14], among other scholars, conducted detailed literature reviews and comparative analyses of definitions, revealing that green innovation, eco-innovation, environmental innovation, and sustainable innovation share a high degree of consistency in their core concerns and goals. Disregarding the subtle differences in their definitions, the terms are often used interchangeably or even equated in the literature. Currently, there are three major interpretations of the definition of green innovation in academia: equating green innovation with innovations that contribute positively to the ecological environment, equating green innovation with innovations that introduce environmental performance, and equating green innovation with environmental innovation or the optimization and innovation of environmental performance [23].
When evaluating green innovation efficiency and its influencing factors, scholars have adopted diverse strategies. Most studies measure efficiency from input-output indicators using Stochastic Frontier Analysis (SFA) [24,25], DEA [26][27][28][29], and related methods. Some scholars have also assessed green innovation efficiency through spatial econometrics [30] or evaluated it using the entropy method [31]. Xiao et al. [32] used an improved SFA model to conduct a thorough assessment of green innovation efficiency in the Yangtze River Economic Belt. However, applying the SFA model requires presetting a production function, which, to a certain extent, increases the subjectivity of the evaluation. In contrast, the DEA model operates without assumptions regarding the form of the production function and can make evaluation results more objective. Thus, DEA has become the mainstream method for studying green innovation efficiency. Table 1 presents relevant studies that use the DEA model to measure green innovation efficiency.
Regarding the evaluation index system, the existing literature primarily constructs such a system from two aspects: input and output. This encompasses three dimensions: input variables, desirable outputs, and undesirable outputs, as illustrated in Table 2. Tian et al. [39] divided the input-output indicators into two stages: scientific and technological research and development (R&D) and achievement transformation. For the R&D stage, the input-output indicators include the number of R&D personnel, whereas for the achievement transformation stage, indicators encompass technology introduction and transformation expenditure, sales revenue of new products, etc. Zhang et al. [40] categorized inputs into human, material, and financial resources, while the selected innovation output indicators are broadly divided into two types: scientific and technological outcomes and economic benefits. Ma and Zhu [41] divided innovation inputs into R&D investment and production investment. R&D investment is represented by R&D funding and personnel, while production investment is expressed by employee compensation. For output indicators, they selected the number of patent applications and intangible assets.
When examining factors affecting green innovation efficiency, scholars have primarily focused on two levels: macro-environment and micro-level factors. The macro-environment encompasses the institutional landscape [50][51][52], market industry [34,53], and related international trade relations [54]. At the micro-level, internal factors related to enterprises mainly include the level of awareness of enterprise personnel [55], enterprise costs [56,57], and social responsibility [58,59]. Empirical research on green innovation efficiency differs depending on the research question. Hong et al. [60] analyzed the influencing factors of innovation efficiency in China's pharmaceutical manufacturing industry and found that two external macro-factors, namely market competition intensity and government policy support, as well as the internal micro-factor of enterprise size, are essential for achieving higher levels of innovation efficiency. Wenbo [61] studied the impact of production factors, economic benefits, internal management, and the social environment on green innovation. Kang et al. [62] examined whether and how environmental regulations drive green innovation, aiming to explore the influencing mechanism of green innovation efficiency. Yalabik [63] found that factors such as market competition, consumption, and environmental protection pressure can significantly affect firms' green technology innovation efficiency. Gong et al. [64] provided a detailed analysis of how factors such as the agglomeration effect of outward foreign direct investment influence industrial green innovation efficiency. Kuang et al. [65] tested the influencing mechanism of green innovation efficiency from the perspective of the shadow economy, exploring potential pathways to enhance green innovation efficiency.
Regarding research methods for influencing factors, the random forest model, as an ensemble learning method, exhibits good robustness and generalization capabilities, and is suitable for various types of datasets and problems. In 1995, Ho [66] first proposed the concept of random decision forests. He suggested building a classifier from a large number of decision trees, combined in a complementary or weighted manner to construct a new classifier, namely, the random decision forest. Random decision forests address the overfitting that can occur with single decision trees. In 2001, Breiman [67] integrated bagging algorithms, random subspace algorithms, and classification and regression trees to propose the traditional random forest. Subsequently, the traditional random forest has been widely applied in numerous fields such as ecology [68][69][70], medicine [71][72][73], management [74,75], and economics [76,77], and has achieved good results in solving routine classification or regression problems. Xu et al. [78] applied the random forest to data from gastric cancer patients to predict their postoperative survival status and assist doctors in assessing treatment decisions. Xie et al. [75] integrated sampling techniques and cost penalties into the random forest and used bank customer data as an example to predict customer churn. Susana et al. [79] applied the random forest method to unbalanced samples to identify groups of enterprises, enabling public institutions to direct public investment subsidies to them.
The traditional DEA model has limitations in efficiency evaluation: it does not consider the impact of environmental variables and random factors on green innovation efficiency, which biases the evaluation results. The measurement of green innovation efficiency has mainly stayed at the macro-level, such as the province and industry, and there are few studies at the enterprise level. Although new-energy companies are an important force in promoting green low-carbon transformation and achieving sustainable development, research on measuring their green innovation efficiency using the DEA model remains limited. Research on the influencing factors of green innovation efficiency is mainly based on linear regression models, which cannot effectively analyze nonlinear relationships. There is therefore a gap in research on nonlinear influence relationships, and it is difficult to accurately evaluate the factors affecting green innovation efficiency.
Against this backdrop, this paper establishes a research framework that combines a three-stage DEA model with an SBM model, excluding environmental and random factors, to provide an accurate measure of green innovation efficiency. Considering the advantages of the random forest model in exploring influencing factors, this paper selects the random forest model to analyze the influencing factors of green innovation efficiency. The main contributions of this paper are as follows:
Firstly, by embedding the SBM model into the three-stage DEA, this paper comprehensively evaluates the efficiency level of DMUs by considering the slack in resource utilization, providing more accurate evaluation results. By combining the parametric SFA model with the non-parametric DEA model, this paper uses their respective advantages to better handle asymmetric data, thereby assessing the efficiency level of units more comprehensively and proposing improvement suggestions.
Secondly, unlike linear regression methods, the random forest model adopted in this paper can not only rank the influencing factors but can also visually demonstrate their nonlinear effects on green innovation efficiency through partial dependence plots. This facilitates a deeper understanding of how various factors influence green innovation efficiency.

The Three-Stage DEA Model
The three-stage DEA model framework for analyzing green innovation efficiency is illustrated in Figure 1.

The First Stage: SBM Model
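To assess efficiency from both input and output perspectives, this paper utilizes the non-oriented SBM model. The SBM model formula is as follows:
$$
\rho = \min \frac{1 - \frac{1}{m}\sum_{i=1}^{m} \frac{s_i^-}{x_{ik}}}{1 + \frac{1}{s_1 + s_2}\left(\sum_{r=1}^{s_1} \frac{s_r^g}{y_{rk}^g} + \sum_{t=1}^{s_2} \frac{s_t^b}{y_{tk}^b}\right)}
$$
$$
\text{s.t.}\quad x_k = X\lambda + s^-,\quad y_k^g = Y^g\lambda - s^g,\quad y_k^b = Y^b\lambda + s^b,\quad \lambda \ge 0,\ s^- \ge 0,\ s^g \ge 0,\ s^b \ge 0
$$
The SBM model incorporates slack variables to account for differences in input and output levels. $s^- \in R^m$ represents slack in input resources, $s^b \in R^{s_2}$ reflects slack in undesirable outputs, and $s^g \in R^{s_1}$ represents slack in desirable outputs. The model considers $m$ input variables, $s_1$ desirable output variables, and $s_2$ undesirable output variables.
$x_k$, $y_k^g$, and $y_k^b$ represent the input, desirable output, and undesirable output values for the $k$-th decision-making unit. The collective data for all decision-making units are represented by $X$, $Y^g$, and $Y^b$. The weights assigned to each of the $n$ decision-making units are represented by $\lambda \in R^n$.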
The efficiency value of the evaluated DMU is denoted by ρ. Technical efficiency (TE) is determined under the assumption of constant returns to scale (CRS), while pure technical efficiency (PTE) is calculated assuming variable returns to scale (VRS), which adds the constraint that the weights λ sum to 1. Scale efficiency (SE) is calculated as the ratio of TE to PTE (SE = TE/PTE).
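As a rough illustration of how slacks translate into the efficiency value ρ, the following minimal sketch evaluates the SBM objective for one DMU given a set of slacks. It does not solve the underlying optimization problem (in practice the optimal slacks come from a linear program, for which the paper uses MAXDEA); the function name `sbm_efficiency` and all numbers are hypothetical and chosen only for illustration.

```python
def sbm_efficiency(x, y_good, y_bad, s_minus, s_good, s_bad):
    """SBM objective value for one DMU, given its data and a set of slacks.

    Assumption: the slacks are supplied by the caller (e.g. from an LP solver);
    this function only evaluates the ratio, it does not optimise it.
    """
    m, s1, s2 = len(x), len(y_good), len(y_bad)
    # Numerator: 1 minus the average proportional input slack.
    input_term = 1.0 - sum(s / xi for s, xi in zip(s_minus, x)) / m
    # Denominator: 1 plus the average proportional output slack
    # (shortfall in desirable outputs, excess in undesirable outputs).
    output_slack = sum(s / y for s, y in zip(s_good, y_good))
    output_slack += sum(s / y for s, y in zip(s_bad, y_bad))
    output_term = 1.0 + output_slack / (s1 + s2)
    return input_term / output_term


# Hypothetical DMU with 2 inputs, 1 desirable and 1 undesirable output.
rho = sbm_efficiency(x=[10.0, 4.0], y_good=[5.0], y_bad=[2.0],
                     s_minus=[2.0, 1.0], s_good=[1.0], s_bad=[0.5])

# A DMU with no slack at all lies on the frontier and scores exactly 1.
rho_frontier = sbm_efficiency(x=[10.0, 4.0], y_good=[5.0], y_bad=[2.0],
                              s_minus=[0.0, 0.0], s_good=[0.0], s_bad=[0.0])

# Scale efficiency for hypothetical CRS and VRS scores of the same DMU.
te, pte = 0.30, 0.50
se = te / pte
```

Because every slack enters the ratio, a DMU scores 1 only when all input and output slacks are zero, which is the property that distinguishes the SBM measure from proportional contraction.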

The Second Stage: SFA Model
In the second stage, this paper decomposes input slack variables into components representing environmental factors, random factors, and managerial inefficiency. By excluding the environmental and random factors, this paper obtains the input redundancy attributable solely to managerial inefficiency. This can be expressed as:
$$
S_{nk} = f(Z_k; \beta_n) + V_{nk} + U_{nk}
$$
In this expression, $S_{nk}$ denotes the slack variable associated with the $n$-th input of the $k$-th decision-making unit. The influence of environmental factors is denoted by $f(Z_k; \beta_n) = Z_k\beta_n$, where $Z_k$ represents observed environmental variables and $\beta_n$ is the corresponding parameter vector. The mixed error term is $V_{nk} + U_{nk}$, where $V_{nk}$ is the random error and $U_{nk}$ is managerial inefficiency, with variance parameters $\sigma_{vn}^2$ and $\sigma_{un}^2$. Frontier 4.1 software is used to perform SFA regression analysis, yielding estimates for $\beta_n$, $\sigma^2$, and the parameter $\gamma$. These estimates are then used to calculate $\sigma_{vn}$ and $\sigma_{un}$ using the formulas below:
$$
\gamma = \frac{\sigma_{un}^2}{\sigma_{un}^2 + \sigma_{vn}^2}, \qquad \sigma^2 = \sigma_{un}^2 + \sigma_{vn}^2
$$
The parameter $\gamma$ quantifies the proportion of variance attributed to managerial inefficiency within the total variance. When $\gamma$ is close to 1, managerial inefficiency has a more significant impact. Conversely, when $\gamma$ is close to 0, random factors have a greater influence.
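The managerial inefficiency term can be isolated using the following formula:
$$
E[U_{nk} \mid \varepsilon] = \sigma_* \left[ \frac{\varphi(\lambda\varepsilon/\sigma)}{\Phi(\lambda\varepsilon/\sigma)} + \frac{\lambda\varepsilon}{\sigma} \right], \qquad \sigma_* = \frac{\sigma_{un}\sigma_{vn}}{\sigma}, \qquad \sigma = \sqrt{\sigma_{un}^2 + \sigma_{vn}^2}
$$
The mixed error term is represented by $\varepsilon = V_{nk} + U_{nk}$, with $\lambda = \sigma_{un}/\sigma_{vn}$. $\varphi$ denotes the probability density function and $\Phi$ denotes the distribution function of the standard normal distribution.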
After isolating the managerial inefficiency term U, the random factor term V can be calculated using the following formula:
$$
E[V_{nk} \mid V_{nk} + U_{nk}] = S_{nk} - Z_k\beta_n - E[U_{nk} \mid V_{nk} + U_{nk}]
$$
Next, the input variables are adjusted using the SFA model to derive new input values, which are calculated as follows:
$$
X^*_{nk} = X_{nk} + [\max(Z_k\beta_n) - Z_k\beta_n] + [\max(V_{nk}) - V_{nk}]
$$
In this formula, $X^*_{nk}$ represents the adjusted input, while $X_{nk}$ denotes the original input. The term $[\max(Z_k\beta_n) - Z_k\beta_n]$ adjusts for the influence of environmental factors. The term $[\max(V_{nk}) - V_{nk}]$ adjusts for the influence of random factors, ensuring that all decision-making units are evaluated under equivalent conditions.
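The second-stage adjustment can be sketched in a few lines of Python. This is only an illustrative outline, not the Frontier 4.1 procedure: it assumes that the fitted environmental effects Z_k β_n and the estimates of σ² and γ are already available, that 0 < γ < 1 (so that λ = σ_un/σ_vn is finite), and that the usual conditional-expectation formula for a positive composed error applies. All function names and numbers are hypothetical.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def split_variance(sigma2, gamma):
    """Recover (sigma_u, sigma_v) from sigma^2 and gamma; assumes 0 < gamma < 1."""
    return math.sqrt(gamma * sigma2), math.sqrt((1.0 - gamma) * sigma2)


def expected_inefficiency(eps, sigma_u, sigma_v):
    """E[U | eps] for the composed error eps = V + U."""
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
    sigma_star = sigma_u * sigma_v / sigma
    z = eps * (sigma_u / sigma_v) / sigma  # lambda * eps / sigma
    return sigma_star * (norm_pdf(z) / norm_cdf(z) + z)


def adjust_inputs(x, slack, env_fit, sigma_u, sigma_v):
    """Adjusted inputs X* for one input across all DMUs.

    x       : original input values X_nk
    slack   : first-stage input slacks S_nk
    env_fit : fitted environmental effects Z_k * beta_n (assumed given)
    """
    eps = [s - f for s, f in zip(slack, env_fit)]           # V + U
    u = [expected_inefficiency(e, sigma_u, sigma_v) for e in eps]
    v = [e - ui for e, ui in zip(eps, u)]                   # random part
    max_f, max_v = max(env_fit), max(v)
    # Raise every DMU to the least favourable environment and luck.
    return [xi + (max_f - f) + (max_v - vi)
            for xi, f, vi in zip(x, env_fit, v)]


# Hypothetical numbers for three DMUs and one input.
sigma_u, sigma_v = split_variance(sigma2=4.0, gamma=0.75)
x = [100.0, 80.0, 120.0]
adjusted = adjust_inputs(x, slack=[10.0, 5.0, 20.0],
                         env_fit=[6.0, 4.0, 9.0],
                         sigma_u=sigma_u, sigma_v=sigma_v)
```

Since both bracketed terms are non-negative, adjusted inputs are never smaller than the originals: units operating in favourable environments or with good luck have their inputs raised to the common baseline.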

The Third Stage: The SBM Model after Adjusting the Input Variables
By reintroducing the adjusted inputs $X^*_{nk}$ and the original outputs into the SBM model, this paper can re-evaluate efficiency. This approach removes the influence of environmental and random factors, resulting in a more accurate representation of green innovation efficiency.

Random Forest Model
In this paper, a random forest model is used to analyze the factors influencing new-energy companies' green innovation efficiency. The random forest model is generally implemented through the following steps. First, the bootstrap method is used to extract subsamples with sample size $n$ from the original data, and $m$ feature variables are determined to form the dataset $D = \{x_{i1}, x_{i2}, x_{i3}, \ldots, x_{in}\}$. Second, a regression tree is constructed for each subsample, denoting the $j$-th regression tree as $t_j(x)$. Third, the results of all regression trees are combined to obtain the final estimate by averaging, $\bar{t}(x) = \frac{1}{J}\sum_{j=1}^{J} t_j(x)$, where $J$ is the number of trees.
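The three steps (bootstrap sampling, one regression tree per subsample, averaging) can be sketched with the standard library alone. For brevity the sketch below replaces full regression trees with depth-1 trees (stumps), so it is a simplified stand-in rather than the model used in the paper; `fit_stump`, `fit_forest`, and the toy data are hypothetical names and values.

```python
import random


def fit_stump(rows, targets, feature_ids):
    """Fit a depth-1 regression tree using only the candidate features."""
    best = None
    for j in feature_ids:
        for threshold in sorted({r[j] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[j] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[j] > threshold]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((t - ml) ** 2 for t in left)
                   + sum((t - mr) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, ml, mr)
    if best is None:  # no valid split: fall back to the subsample mean
        mean = sum(targets) / len(targets)
        return lambda x: mean
    _, j, threshold, ml, mr = best
    return lambda x: ml if x[j] <= threshold else mr


def fit_forest(rows, targets, n_trees=50, n_features=2, seed=0):
    """Bagged stumps: bootstrap rows, random feature subset, average."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]       # step 1: bootstrap
        feats = rng.sample(range(p), n_features)         # random features
        trees.append(fit_stump([rows[i] for i in idx],   # step 2: one tree
                               [targets[i] for i in idx], feats))
    return lambda x: sum(t(x) for t in trees) / len(trees)  # step 3: average


# Toy data: the target depends only on feature 0, through a step function.
rows = [[float(i), float((i * 7) % 10), float((i * 3) % 10)] for i in range(20)]
targets = [1.0 if i >= 10 else 0.0 for i in range(20)]
forest = fit_forest(rows, targets, n_trees=50, n_features=2, seed=0)
low, high = forest([2.0, 5.0, 5.0]), forest([17.0, 5.0, 5.0])
```

A full implementation would grow deeper trees and redraw the candidate features at every split; with a single split per tree, drawing them once per tree amounts to the same thing.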
Compared with traditional multiple regression analysis, the random forest has clear advantages. It does not require a functional form to be specified, it can rank the importance of the independent variables, and it can further provide partial dependence plots.
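To make the idea of a partial dependence plot concrete, the following sketch computes the curve behind such a plot for any prediction function: the feature of interest is forced to each grid value for every observation, and the predictions are averaged. The function `partial_dependence` and the toy model are hypothetical; in practice the prediction function would be the fitted random forest.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each value in `grid`."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)       # copy so the data are not changed
            modified[feature] = value  # force the feature to the grid value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


# Hypothetical stand-in for a fitted model: nonlinear in feature 0.
def toy_model(x):
    return x[0] ** 2 + 0.5 * x[1]


rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
pd_curve = partial_dependence(toy_model, rows, feature=0, grid=[0.0, 1.0, 2.0])
# pd_curve is [1.5, 2.5, 5.5]: the quadratic shape in feature 0 is recovered,
# shifted by the average contribution of feature 1 (0.5 * 3.0 = 1.5).
```

Plotting the grid against the curve gives the partial dependence plot; a curved line signals a nonlinear effect that a linear regression coefficient would not reveal.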

Variable Selection and Data Sources
Input variables are chosen based on three aspects: labor, capital, and energy. Labor input: The number of R&D personnel directly reflects the human resource investment of enterprises in green innovation. Capital input: R&D expenditure, as a measure of capital investment, reflects the financial support of enterprises for green technology research and development. Energy input: Comprehensive energy consumption reflects the energy consumption level of the enterprise in its production and operation, and is an important indicator of energy utilization efficiency and green development level. The selection of these three indicators takes into account the characteristics of green innovation and reflects the enterprises' investment in green innovation.
Output variables are categorized as either desirable or undesirable. Desirable outputs are selected based on technological and economic factors. Technological output is measured by the number of green patent applications. These patents represent innovations in environmentally friendly technologies, products, or solutions, reflecting a company's commitment to sustainable development. Main business income serves as the economic output variable, representing the sales revenue generated through core operations. Greenhouse gas emissions are chosen as the undesirable output variable, reflecting the new-energy companies' contribution to advancing the dual carbon target.
A detailed description of input and output variables, environmental factors, and data sources employed in the study is presented in Table 3.
As shown in Figure 2 and Table 4, the first-stage green innovation efficiency of 72 new-energy enterprises in 2022 was determined using the MAXDEA software, based on the SBM model. Environmental and random factors are not excluded in this calculation.
Figure 2 indicates a relatively low mean technical efficiency of 0.309 for the 72 new-energy companies. Pure technical efficiency averages 0.445, suggesting a low level of technology and management within the sample. Scale efficiency, representing the rationality of company size and its influence on efficiency, averages 0.702. This suggests that scale efficiency is higher than pure technical efficiency within the sample.
According to Table 4, for both technical efficiency and pure technical efficiency, a large number of enterprises fall in the range below 0.5, followed by a large number with an efficiency value of 1, and only a small number in the range of 0.5-1. For scale efficiency, the largest proportion of enterprises falls in the range of 0.5-0.8. This shows that new-energy enterprises differ widely in green innovation efficiency.
DEA is a non-parametric efficiency evaluation method that evaluates the relative efficiency of each DMU by constructing the efficiency frontier of the DMUs. DEA does not need the weights of the input-output indicators to be set in advance, thereby avoiding the influence of subjective factors on the weight setting. The relative importance of each input and output indicator can be indirectly reflected through the analysis of slack variables.
Table 5 reveals that input improvement values are negative across labor, capital, and energy inputs, indicating excessive resource utilization. Companies appear to use more resources than necessary to achieve their outputs. While economic output is relatively close to the target value, suggesting a focus on economic benefits, there is a significant gap between target and actual values for technical output. The DMUs show large room for improvement in green patent applications, which indicates that this output has a great impact on their efficiency. The analysis of the slack variables thus provides the direction for improving efficiency.
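The sign convention behind these improvement values can be illustrated with a small sketch. Assuming that the target value of an input is the actual value minus its slack, and the target value of a desirable output is the actual value plus its slack, the improvement rate is the relative gap between target and actual. The function name and all figures below are hypothetical and are not taken from Table 5.

```python
def improvement_rates(actual, target):
    """Relative improvement (target - actual) / actual for each indicator."""
    return [(t - a) / a for a, t in zip(actual, target)]


# Hypothetical DMU: R&D personnel, R&D expenditure, energy consumption.
inputs = [120.0, 50.0, 300.0]
input_slacks = [30.0, 10.0, 60.0]
input_targets = [x - s for x, s in zip(inputs, input_slacks)]
input_rates = improvement_rates(inputs, input_targets)

# Hypothetical desirable output: green patent applications.
patents, patent_slack = [8.0], [4.0]
patent_targets = [y + s for y, s in zip(patents, patent_slack)]
patent_rates = improvement_rates(patents, patent_targets)
```

Negative rates for inputs therefore mean the input could be reduced without lowering output, while a positive rate for green patent applications means output falls short of the frontier target.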
The input slack variables from the first stage are used as dependent variables in a regression analysis, with the environmental factors as independent variables. The SFA regression analysis, conducted using Frontier 4.1, is summarized in Table 6. Table 6 shows that the one-sided error LR test is significant at the 1% level, rejecting the hypothesis of no managerial inefficiency. This implies that the slack variables of the three inputs are affected by managerial inefficiency. The γ value of 1 indicates that managerial inefficiency dominates, while random factors have a limited impact on green innovation efficiency. These findings support the use of the SFA model. Although not all regression coefficients of the environmental variables on the slacks of the individual input variables are significant, the LR one-sided error test passes at the 1% significance level. Therefore, the adjustment of the input variables still needs to take into account all five of the environmental variables mentioned above.
Environmental regulation intensity is positively correlated with the comprehensive energy consumption slack variable at the 1% significance level. Stricter environmental regulation may require companies to adjust or improve production processes, which may lead to some increase in energy consumption.
The technological market environment is also positively correlated with the comprehensive energy consumption slack variable at the 1% significance level. An improved technological market environment may encourage new-energy companies to undertake technological updates and transformations, accompanied by a certain increase in energy consumption. However, as the technology gradually matures, companies are expected ultimately to reduce energy consumption through new technologies and more efficient production methods.
At the 1% significance level, the educational environment is positively correlated with the R&D expenditure slack variable but negatively correlated with the comprehensive energy consumption slack variable. A higher local education level often intensifies competition in technological innovation, which may prompt companies to boost R&D expenditure to remain competitive. Increased local educational expenditure may also offer new-energy companies better access to talent and technological support, facilitating the transition from high-energy-consumption stages to more efficient and sustainable production modes.
Economic development level exhibits a positive correlation (p < 0.10) with the R&D personnel slack variable and a negative correlation (p < 0.01) with the comprehensive energy consumption slack variable. This suggests that as regional economies grow, more investment opportunities and innovative projects arise, leading to increased demand for R&D personnel. With technological progress, production optimization, and economic structural adjustment, energy consumption may trend downward.
Regional openness exhibits a negative correlation (p < 0.01) with the R&D expenditure slack variable. This suggests that open regions, with their favorable innovation ecosystems, facilitate more efficient use of R&D funds by fostering external cooperation and bringing in advanced technology, innovative management practices, and R&D resources.
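The second-stage results feed into an adjustment of each input before efficiency is re-estimated. The sketch below assumes the adjustment form commonly used in three-stage DEA, in which every company is moved to the least favorable environment and the least favorable random shock observed in the sample. The paper's own formula is not shown in this section, so this form, the function name `adjust_input`, and the sample numbers are all assumptions for illustration.

```python
def adjust_input(x, env_effect, noise):
    """Adjust one input across companies.

    x          : observed input values, one per company
    env_effect : fitted environmental effect on that input's slack
    noise      : estimated random-error component of that input's slack

    Assumed form:
      x_adj = x + (max(env_effect) - env_effect) + (max(noise) - noise)
    """
    if not (len(x) == len(env_effect) == len(noise)):
        raise ValueError("all sequences must have the same length")
    worst_env = max(env_effect)
    worst_noise = max(noise)
    return [
        xi + (worst_env - ei) + (worst_noise - vi)
        for xi, ei, vi in zip(x, env_effect, noise)
    ]


# Hypothetical values for three companies.
adjusted = adjust_input(
    x=[100.0, 80.0, 120.0],
    env_effect=[5.0, 2.0, 8.0],
    noise=[1.0, -1.0, 0.0],
)
print(adjusted)  # [103.0, 88.0, 121.0]
```

Under this form, a company operating in a favorable environment has its input raised the most, so that all companies are compared under the same external conditions.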


Green Innovation Efficiency Analysis in the Third Stage
The SBM model was used to re-evaluate the green innovation efficiency of new-energy companies, using the adjusted input variables in place of the originals while keeping the output variables unchanged. This re-evaluation, illustrated in Figure 3 and Table 7, provides a more accurate assessment of efficiency by eliminating the influence of environmental and random factors. A comparison of the green innovation efficiencies in the first and third stages reveals that all efficiency types improved after removing environmental and random factors. The average technical efficiency increased from 0.309 to 0.337, the average pure technical efficiency from 0.445 to 0.454, and the average scale efficiency from 0.702 to 0.796. This suggests that the first-stage assessment underestimated green innovation efficiency because of environmental impacts, highlighting the constraints imposed on new-energy companies by external conditions. Although efficiency improved after adjustment, significant room for further improvement remains.
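As a small worked check on the reported means, the relative change of each efficiency type between the first and third stages can be computed directly. The snippet uses only the six mean values quoted above; the helper name `relative_change` is illustrative.

```python
# Mean efficiencies reported in the text: (first stage, third stage).
means = {"TE": (0.309, 0.337), "PTE": (0.445, 0.454), "SE": (0.702, 0.796)}


def relative_change(before: float, after: float) -> float:
    """Percentage change from the first-stage mean to the third-stage mean."""
    return 100.0 * (after - before) / before


changes = {name: round(relative_change(b, a), 2) for name, (b, a) in means.items()}
print(changes)  # {'TE': 9.06, 'PTE': 2.02, 'SE': 13.39}
```

On these figures, scale efficiency shows the largest relative gain after adjustment and pure technical efficiency the smallest.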
In the third stage, the number of companies achieving DEA effectiveness remains at 14. For technical efficiency and pure technical efficiency, the largest group of enterprises still falls in the range below 0.5, accounting for more than half of the sample, whereas for scale efficiency the largest group falls in the range 0.8-1. Most enterprises therefore have low technical and pure technical efficiency but high scale efficiency, so the emphasis should be placed on improving enterprise technology and management.
As can be seen from Table 8, the improvements in slack variables in the third stage are similar to those in the first stage: the input variables show redundancy, the number of green patent applications among the output variables has large room for improvement, and economic output closely approaches its target value. Analyzing input redundancy and output insufficiency allows resource utilization efficiency to be evaluated and indicates how both input and output inefficiencies can be reduced. These insights can help managers make informed decisions that promote rational resource allocation, improve green innovation efficiency, and drive sustainable development.
After the removal of environmental and random factors, the 72 new-energy companies are classified into four groups according to their pure technical efficiency (PTE) and scale efficiency (SE). With pure technical efficiency on the X-axis and scale efficiency on the Y-axis, and with the mean values (0.454, 0.796) as the boundaries, the companies are divided into four types: high-tech high-scale, high-tech low-scale, low-tech high-scale, and low-tech low-scale, as shown in Figure 4, where each scatter point represents a sample company.
High-tech High-scale: This group contains 14 enterprises, accounting for 19.44% of the total. The PTE and SE of these 14 enterprises both equal one, placing them on the efficiency frontier. Although these businesses have reached the DEA efficiency frontier, they can still develop further by looking for new growth opportunities, maintaining sensitivity to competitive dynamics, and adapting strategies to capitalize on new opportunities.
High-tech Low-scale: This group contains 11 enterprises, accounting for 15.28% of the total. Their PTE is high, but their SE is low. These enterprises should therefore focus on scale efficiency: weighing factors such as strategy, market demand, capital and resources, and risk, they should keep their scale under reasonable control and provide efficient products and services at an appropriate scale.
Low-tech High-scale: This group contains 32 enterprises, accounting for 44.44% of the total, the largest proportion. Their PTE is low, while their SE is relatively high. These enterprises should therefore focus on pure technical efficiency, raising their level of technology, management, and resource utilization by rationally allocating R&D resources, optimizing management processes, and improving the professional competence and innovation awareness of employees.
Low-tech Low-scale: This group contains 15 enterprises, accounting for 20.83% of the total. Both their PTE and SE are low. These enterprises should not only enhance their technological capabilities, management practices, and resource utilization but also consider the optimal size for their operations.
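The four-way grouping described above can be sketched as a simple threshold rule on the two mean values. The cut-offs 0.454 and 0.796 and the group counts come from the text; the treatment of scores exactly equal to a mean (counted as "high" here), the function name `classify`, and the sample scores are assumptions for illustration.

```python
from collections import Counter

PTE_MEAN = 0.454  # mean pure technical efficiency reported in the text
SE_MEAN = 0.796   # mean scale efficiency reported in the text


def classify(pte: float, se: float) -> str:
    """Assign a company to one of the four groups by the mean cut-offs."""
    tech = "High-tech" if pte >= PTE_MEAN else "Low-tech"    # boundary rule assumed
    scale = "High-scale" if se >= SE_MEAN else "Low-scale"   # boundary rule assumed
    return f"{tech} {scale}"


# Hypothetical (PTE, SE) scores for four companies.
sample = {"A": (1.0, 1.0), "B": (0.9, 0.5), "C": (0.2, 0.95), "D": (0.1, 0.3)}
groups = Counter(classify(pte, se) for pte, se in sample.values())
print(groups)

# Shares implied by the group counts reported in the text.
counts = {
    "High-tech High-scale": 14,
    "High-tech Low-scale": 11,
    "Low-tech High-scale": 32,
    "Low-tech Low-scale": 15,
}
total = sum(counts.values())
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}
print(total, shares)
```

The counts sum to the 72 sample companies, and the computed shares match the percentages quoted for each group.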

Analysis of Influencing Factors of Green Innovation Efficiency Based on Random Forest Model
This paper analyzes the importance of the factors influencing new-energy companies' green innovation efficiency based on the random forest model. To further enhance the interpretability of the random forest model, the influencing factors are analyzed using partial dependence plots. Traditional regression methods represent the influence of an independent variable on the dependent variable only as an average trend, through a regression coefficient, whereas partial dependence plots from the random forest model show how the effect of an independent variable on the dependent variable varies across its levels.
The mean square error is 0.039, the root mean square error is 0.198, and the mean absolute error is 0.157. All three values are small relative to the 0-1 range of the efficiency scores, indicating that the gap between actual and predicted values is small and that the model predicts well.
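For reference, the three error measures quoted above can be computed as in the following sketch. The function name and the sample values are hypothetical; only the definitions are standard. Note that the root mean square error is the square root of the mean square error, so the reported 0.198 and 0.039 are consistent with each other up to rounding.

```python
import math


def regression_errors(actual, predicted):
    """Return (MSE, RMSE, MAE) for paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(r) for r in residuals) / n
    return mse, rmse, mae


# Hypothetical efficiency scores and model predictions.
mse, rmse, mae = regression_errors([0.2, 0.5, 1.0, 0.4], [0.3, 0.4, 0.8, 0.4])
print(round(mse, 3), round(rmse, 3), round(mae, 3))

# Consistency of the reported figures: squaring the RMSE recovers the MSE.
print(round(0.198 ** 2, 3))  # 0.039
```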
Figure 5 presents the importance ranking of the factors influencing green innovation efficiency based on the random forest model. Among these factors, ownership concentration, R&D personnel structure, and operational capability rank in the top three, exerting a significant influence on new-energy companies' green innovation efficiency.
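The text does not state which importance measure underlies the ranking in Figure 5. One common choice is permutation importance: the increase in prediction error when the values of a single factor are shuffled. The sketch below illustrates that idea in plain Python, with a hypothetical `toy_model` standing in for a fitted random forest. It is an illustration of the general technique, not a reproduction of the paper's computation.

```python
import random


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def permutation_importance(predict, rows, target, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(target, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with feature j replaced by its shuffled value.
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(target, [predict(r) for r in shuffled]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a fitted model: uses feature 0, ignores feature 1."""
    return 0.1 + 0.8 * row[0]


rows = [(0.1, 5.0), (0.2, 3.0), (0.4, 9.0), (0.5, 1.0),
        (0.7, 4.0), (0.8, 2.0), (0.9, 7.0), (1.0, 6.0)]
target = [toy_model(r) for r in rows]
scores = permutation_importance(toy_model, rows, target)
print(scores)
```

A factor the model never uses receives an importance of zero, while shuffling a factor the model relies on raises the error.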
The relationship between ownership concentration and green innovation efficiency is illustrated in Figure 6. When the concentration of ownership is high, the resources of an enterprise are more likely to be concentrated in the hands of a few major shareholders. These large shareholders usually have stronger decision-making and resource allocation abilities, and can promote the implementation of green innovation projects more efficiently. High ownership concentration means that the interests of major shareholders are more consistent with the interests of the enterprise as a whole. In this case, major shareholders have more incentive to promote green innovation because it not only helps to enhance the social image and brand value of the company, but can also bring long-term economic benefits.
The relationship between the R&D personnel structure and green innovation efficiency is illustrated in Figure 7. Initially, the newly added R&D personnel require time to adapt to the company's working environment, products, and technology, leading to a temporary decrease in efficiency. Over time, the company's R&D team gradually establishes a more mature collaborative mechanism and accumulates experience in green innovation, resulting in an increase in green innovation efficiency. Overall, this change may stem from the developmental process of the R&D team, starting from the initial adaptation period to subsequent synergistic effects, ultimately leading to improved efficiency in green innovation.
The relationship between operational capability and green innovation efficiency is illustrated in Figure 8. The enhancement of a company's operational capability necessitates optimizing resource allocation to improve production efficiency and accelerate asset turnover. Such changes in resource allocation may have a short-term impact on the input and efficiency of green innovation. However, with the continuous optimization of resource allocation and the adoption of new technologies, green innovation efficiency is expected to gradually increase and achieve long-term improvements.
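The curves discussed for Figures 6 to 8 are partial dependence plots. As a minimal sketch of how such a curve is obtained, the function below fixes one factor at each grid value for every company, keeps the other factors as observed, and averages the model's predictions. The `toy_model` function, the feature layout, and the data are hypothetical stand-ins for the fitted random forest and the paper's variables.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # override only the chosen feature
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a fitted model with two input factors."""
    ownership, rd_share = row
    return 0.2 + 0.5 * ownership + 0.1 * rd_share


# Hypothetical observations: (ownership concentration, R&D personnel share).
rows = [(0.3, 0.1), (0.5, 0.2), (0.7, 0.3)]
curve = partial_dependence(toy_model, rows, feature_index=0, grid=[0.0, 0.5, 1.0])
print([round(v, 2) for v in curve])  # [0.22, 0.47, 0.72]
```

With a linear stand-in the curve is a straight line; with a random forest the same procedure can reveal the non-monotonic patterns described above, such as an initial dip followed by a rise.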


Conclusions
The three-stage DEA model reveals that, after the second-stage SFA adjustment, TE, PTE, and SE all show some improvement. However, significant potential for further enhancement remains, highlighting the impact of external environmental constraints on new-energy companies' green innovation efficiency. Even after adjustment, SE consistently exceeds PTE, and low-tech high-scale enterprises form the largest group. Therefore, improving green innovation efficiency requires a focus on raising pure technical efficiency through advances in technology and management practices.
The factors affecting the green innovation efficiency of new-energy companies are studied using the random forest model. To further improve the interpretability of the random forest model, the important influencing factors are analyzed using partial dependence plots. The study finds that, among these factors, ownership concentration, R&D personnel structure, and operational capability rank in the top three in importance, exerting an important influence on new-energy companies' green innovation efficiency.
To improve green innovation efficiency, enterprises and government need to work together on the following. At the enterprise level, strengthen technological innovation capacity, increase investment in research and development, strengthen key core technologies, and develop more efficient, clean, and low-carbon new-energy technologies and products. Optimize the energy management system, actively promote clean production, and reduce pollutant emissions. Build up the talent team by introducing and cultivating green innovation talent, enhancing the talent support available for green innovation in enterprises.
At the government level, improve the policy support system and increase the policy support for the green innovation of new-energy enterprises.Strengthen industry supervision, establish a sound green innovation standard system, strengthen the supervision and management of green innovation activities, and guide the green and healthy development of enterprises.Foster a favorable environment for innovation, strengthen intellectual property protection, and create a market environment for fair competition.

Discussion
In the new-energy field, green innovation efficiency serves as a pivotal indicator of the sustainable development ability and competitiveness of enterprises. With growing global environmental awareness and the transformation of energy structures, the new-energy industry faces unprecedented opportunities and challenges, and improving green innovation efficiency is crucial to promoting its high-quality development. Green innovation efficiency is the key for new-energy enterprises to achieve both economic and environmental benefits. China's new-energy industry is developing rapidly, but it also faces challenges such as tight resource and environmental constraints and the need for breakthroughs in core technologies. Improving green innovation efficiency can steer the new-energy industry in a high-end, intelligent, and green direction, thereby freeing it from dependence on traditional resources and achieving sustainable development.
The managerial implications of this study for new-energy companies are as follows. (1) Pointing out the direction for improvement and raising green innovation performance: The research results can help management identify an enterprise's shortcomings in green innovation, such as low resource allocation efficiency and poor control of undesirable outputs, and take targeted measures to improve its green innovation performance. (2) The research results can help management understand in depth the factors affecting green innovation efficiency, identify the strengths and weaknesses of the enterprise, and provide a scientific basis for formulating green transformation and upgrading strategies, thereby optimizing resource allocation and enhancing competitiveness. (3) Promoting a change in management concepts and strengthening awareness of green development: This study emphasizes the importance of green innovation and sustainable development, and encourages enterprises to fully integrate green development principles into management practices.
This research has limitations that call for further exploration. (1) Because green data, such as enterprises' comprehensive energy consumption and greenhouse gas emissions, are available for only a limited number of years, this study selects only 2022 as the research period. Future studies should extend the time span to explore the dynamic evolution of enterprises' green innovation efficiency. (2) The selected index system is not yet complete, which limits the depth and breadth of the conclusions. Future studies can add qualitative indicators to the quantitative ones and combine the two to further improve the index system.

Figure 2. The efficiency mean of the first stage.

Figure 3. The efficiency mean of the third stage.


Table 1. Relevant DEA studies on green innovation efficiency.

Table 2. Overview of research on green innovation efficiency indicator systems.

Table 3. Green innovation efficiency index system.
This paper analyzes A-share listed companies in the new-energy sector. Companies with ST or *ST designations, those without disclosed ESG or social responsibility reports, and those with missing indicators are excluded. This resulted in a sample of 72 new-energy listed companies. Data for 2022 are collected from company annual reports, ESG reports, the CNRDS database, and statistical yearbooks.
4.2. Three-Stage DEA Model for Green Innovation Efficiency Analysis
4.2.1. Green Innovation Efficiency Analysis in the First Stage
As seen in Figure

Table 4. Green innovation efficiency in the first stage.

Table 5. Slack variable analysis in the first stage.

Table 6. Regression results of the SFA model.

Table 7. Green innovation efficiency in the third stage.

Table 8. Slack variable analysis in the third stage.