Studying the Diffusion Effect of Policy Combinations on New Energy Vehicles Based on Reinforcement Learning

Li, Zhuangzhuang; Luo, Hua

doi:10.3390/electronics15040779

by Zhuangzhuang Li and Hua Luo *

School of Economics and Finance, Shanghai International Studies University, Shanghai 201620, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(4), 779; https://doi.org/10.3390/electronics15040779

Submission received: 15 January 2026 / Revised: 4 February 2026 / Accepted: 6 February 2026 / Published: 12 February 2026

(This article belongs to the Special Issue New Trends in Machine Learning, System and Digital Twins)

Abstract

The development of the new energy vehicle (NEV) industry has become a key driver of the global low-carbon transition. Understanding the policy effect on NEV diffusion is essential to promote sustainable growth. In this study, we propose a new approach that combines a two-layer small-world network involving consumers and enterprises and evolutionary game theory to study the diffusion effect of industrial and trade policies on enterprises’ low-carbon production strategies and consumer preferences. Different from existing diffusion models, we integrate reinforcement learning (RL) into the decision-making process of enterprises and use SHapley Additive exPlanations (SHAP) to decode the micro-level decision logic of enterprises. In terms of the decision-making mechanism, the simulation results show that the Q-learning algorithm better fits the real market diffusion trend of NEVs compared with traditional algorithms; in terms of policy effects, industrial policies and trade policies exhibit a synergistic effect. SHAP analysis reveals that enterprises are more concerned about NEV market maturity than the impact of policy parameters on decision-making; Sobol sensitivity analysis indicates that consumer subsidies have a greater impact on the market diffusion of NEVs than trade policies.

Keywords:

reinforcement learning; diffusion; new energy vehicles; policy combinations; evolutionary game; complex network

1. Introduction

The sustained expansion of the global economy has increased energy consumption and carbon emissions, exacerbating environmental issues such as global warming, glacial melt, and regional air pollution, with long-term impacts on ecosystems and public health. The transport sector is a major contributor to energy use and greenhouse gas emissions, with fuel vehicles generating significant emissions in both the usage and manufacturing phases. New energy vehicles (NEVs), with their lower lifecycle greenhouse gas emissions [1], are widely regarded as a key technological pathway.

To meet emission reduction targets and drive the green transition, countries have adopted a range of industrial policies, including carbon pricing, fuel taxes, purchase subsidies, infrastructure investments, and public procurement [2]. Furthermore, some countries support their domestic NEV industries through trade policies. The Chinese government has actively supported the NEV industry by reducing enterprises’ costs. In December 2023, the Tariff Commission announced a tariff reduction plan for 2024 that included a 0% tariff on lithium chloride, lithium carbonate, and cobalt carbonate, which are key raw materials for NEVs, highlighting the importance of tariff policies in shaping industry development [3]. At the same time, the industry has faced escalating external trade barriers. In 2024, the U.S. and the EU imposed tariffs on China’s NEVs to protect their domestic industries and to respond to the rapid global expansion of China’s NEVs. The U.S. introduced a 100% tariff [4], while the EU imposed anti-subsidy tariffs ranging from 17.4% to 37.6%, depending on each manufacturer’s level of cooperation [5].

Despite the growing adoption of NEVs, there remains a lack of comprehensive understanding of how industrial and trade policies influence the market diffusion of NEVs. Specifically, the interplay between industrial and trade policies, and its impact on consumer behavior and enterprise strategies, has not been fully explored. This impact is manifested in the diffusion of enterprises’ low-carbon strategies and of consumers’ preferences for NEVs.

Existing research has explored how industrial policies affect the diffusion of low-carbon technologies, especially the market diffusion of NEVs, by constructing an evolutionary game model based on complex networks. Policies targeting enterprises, such as carbon taxes [6], production subsidies, penalties [7], R&D subsidies, and tax breaks [8], have been extensively discussed. However, few studies have explored the diffusion effect of NEVs from the perspective of trade policies, and few studies have examined how the coordination between industrial policies and trade policies affects the diffusion of NEVs. Based on the international trade environment over the past two years, it is necessary to study the diffusion effects of NEVs from the perspective of trade policies.

In the existing literature, the evolution of consumer preferences or enterprises’ low-carbon production strategies is based on traditional evolutionary rules, such as the Fermi rule. Under this rule, the agents in the model imitate the preferences or strategies of their neighbors with a certain probability, which is influenced by the utility differences among consumers or the profit differences among enterprises. However, these imitation-based evolutionary rules struggle to capture the trial-and-error and learning behaviors of real-world agents in their decision-making processes [9], leading to considerable deviations in simulation results. For instance, a phenomenon of “excessive market diffusion” arises [7,10], where the market diffusion rate reaches a high level in a relatively short period, which deviates from the actual trend of market diffusion.

In reality, enterprises often exhibit the characteristic of “learning by doing” [11], and reinforcement learning (RL) can reflect the learning processes of agents from a mathematical perspective. Therefore, it is necessary to introduce RL algorithms to replace existing evolutionary rules in the model, so as to better fit the market diffusion trend of NEVs. However, a key challenge remains: while RL can capture the learning behaviors of agents in the model, its decision-making process is inherently opaque. Existing studies mostly infer micro-logic only from macro-diffusion curves, lacking a quantitative framework to explicitly explain why agents shift strategies under complex scenarios.

Furthermore, in the models for studying the diffusion of the NEV market, some research focuses on the evolution of enterprises’ low-carbon production strategies, while others focus on the evolution of consumers’ purchasing strategies. In the NEV market, both consumers and manufacturers are indispensable components. Few studies have considered the co-evolution of consumer preferences and enterprises’ production strategies [10,12,13].

The gaps in the existing literature outlined above call for further in-depth research. This paper attempts to answer the following questions. First, how do trade policies affect the market diffusion rate of NEVs, and is there a synergistic effect between trade policies and industrial policies? Second, can the introduction of RL algorithms better reflect the trial-and-error and learning processes of enterprises, thereby making simulation results more consistent with real-world data? Third, how can we explain the decision-making process of enterprises?

To address the aforementioned questions, this paper proposes a two-layer small-world network including consumers and enterprises and takes the RL algorithm Q-learning as the decision-making mechanism of enterprises in the model. Furthermore, we introduce the SHapley Additive exPlanations (SHAP) method to open the “black box” of RL, thereby exploring how industrial policies and trade policies affect the diffusion rate of NEVs. The main contributions of this paper are shown below:

We propose a two-layer small-world network evolutionary game model that incorporates both enterprises and consumers, and we take into account the consumer-driven dynamic process of demand changes, thereby better reflecting the real-world NEV market.
We propose an interpretable decision-making framework based on RL and SHAP. We adopt the Q-learning algorithm as the decision-making process of enterprises in the model to reflect their learning behavior. Meanwhile, we use the SHAP method to evaluate the importance of the features affecting enterprises’ decisions.
We explore the synergistic effects of industrial and trade policies on the market diffusion of NEVs. Specifically, we analyze how export market shares, raw material import tariffs, and consumer subsidies jointly influence consumer preferences, enterprises’ production strategies, and market diffusion effects.

The remainder of this paper is organized as follows. Section 2 presents the related literature. Section 3 formulates the diffusion model of the NEV market. Section 4 introduces an RL-based enterprise strategy updating mechanism and interpretability framework based on SHAP. Section 5 presents the simulation results under different scenarios and the discussion on the model. Section 6 summarizes the key findings of this paper, discusses the limitations of the model, and outlines potential directions for future research.

2. Related Work

2.1. Single-Policy Analysis from the Industrial Policy Perspective

Under the framework of evolutionary games and complex networks, scholars classify industrial policies into supply-side and demand-side ones.

Supply-side policies mainly target automakers; through instruments such as production subsidies, carbon taxes [6], asymmetric penalties [7], R&D subsidies, and tax incentives [8], governments aim to improve the profitability and production enthusiasm of enterprises. Thus, governments can promote the diffusion rate of the NEV market on the supply side.

Reference [14] emphasizes that enterprise expectations about government interventions, such as financial subsidies and regulatory policies, are crucial in shaping both the likelihood and pace of low-carbon strategy adoption. Reference [15] explores the strategic dynamics between governments and manufacturers under varying carbon tax and subsidy frameworks. The results highlight that when both instruments are dynamic, manufacturers are more strongly encouraged to implement low-carbon production. Reference [16] studies how subsidy schemes and the dual-credit mechanism impact the competitive behavior of NEV manufacturers versus fuel vehicle (FV) manufacturers. The research reveals that increased battery recycling rates substantially stimulate NEV demand under the dual-credit policy; moreover, the study argues that direct subsidies to NEV producers are more effective than indirect forms of support in enhancing their market competitiveness. In addition, the government also formulates some indirect policies to guide the production behavior of enterprises, such as incorporating the NEV industry into the carbon market [12] and accelerating the construction of charging infrastructure [17].

On the demand side, consumer preference is a key factor in predicting how NEVs will be adopted and accepted in the market. Several studies have highlighted that a positive consumer attitude toward NEVs does not always lead to actual purchasing decisions. Consequently, researchers have explored a range of determinants that affect consumer NEV adoption, including price, environmental awareness, income levels, living environments, and prevailing social values. For instance, if the cost of an NEV exceeds what a buyer considers acceptable, they might instead opt for a traditional fuel vehicle. Government support mechanisms also influence choices. In addition, individual psychological drivers, such as the desire for social recognition, environmental values, and personal taste, also contribute to consumer decision-making. Accessibility to charging stations in one’s vicinity further impacts adoption willingness. As the number of NEV users increases, so does the market demand, enhancing the profitability of producers. In essence, consumer preference significantly shapes demand trends, and many studies have assessed how different policy tools can affect such preference, providing relevant policy insights [18,19,20,21,22].

Reference [23] reveals that consumer subsidies, government procurement, and public information campaigns all contribute to the promotion of NEV adoption. However, the article warns that overly generous subsidies may result in diminishing marginal effects and indicates that government procurement plays only a minor role in accelerating NEV diffusion, while public information initiatives are comparatively more impactful. Similarly, reference [22] points out that fostering consumers’ environmental awareness is crucial for advancing low-carbon technologies. The article argues that governments should prioritize cultivating environmental consciousness among citizens as a long-term strategy to encourage NEV adoption. Under the framework of agent-based modeling (ABM), reference [24] simulates and analyzes the interaction mechanism between consumers, energy companies, and the government in the electric vehicle market; the results show that policy incentives (e.g., price subsidies and charging infrastructure construction), improvements in charging speed, and site accessibility can effectively improve user experience and the adoption rate. Similarly, the impact of consumer psychological factors (e.g., range anxiety and environmental awareness) on the adoption of NEVs is analyzed in [25].

Existing studies have extensively explored the impact of domestic industrial policies on the diffusion of the NEV market by constructing evolutionary game models based on complex networks. However, these studies typically treat the NEV market as a closed market, largely neglecting the impact of changes in the international trade environment and international trade policies. In the context of recent global trade frictions, factors such as cross-border tariffs and export barriers have become non-negligible external shocks. To address this limitation, our model extends the traditional policy framework by explicitly incorporating raw material import tariffs and export market share as dynamic variables to simulate the diffusion of NEVs in an open economy.

2.2. Policy Mix Analysis

Most of the above literature studies the impact of industrial policies on enterprises’ low-carbon production decisions and consumers’ NEV purchase decisions from a single-policy perspective. However, studying policy combinations is arguably more important. Accordingly, recent studies on the diffusion of NEVs have moved beyond analyzing single policies, shifting toward exploring the combined effects of multiple policy interventions.

Reference [26] employs a system dynamics model to assess how electric vehicle-targeted policies interact with electric vehicle charging infrastructure (EVCI), revealing strong synergies, particularly between oil-to-electric (O2E) subsidies and infrastructure policies. Similarly, reference [27] studies eight different policy combinations, including battery capacity limits, dual credit schemes, fuel vehicle license restrictions, and R&D incentives; the article concludes that integrated policies consistently outperform single measures. Reference [28] analyzes the impact of tax incentives on enterprises and consumers from the perspective of demand-side and supply-side policies; the results show that supply-side policies and demand-side policies have synergistic effects, especially in the promotion of the NEV market. Reference [29] also emphasizes the necessity of coordinating supply-side policies and demand-side policies.

The recent literature has shifted from single-policy analysis to exploring the synergistic effects of a “policy mix”. However, the existing literature mainly focuses on the combination of domestic industrial policies, with fewer studies considering the synergistic effects of trade policies and domestic industrial policies. Therefore, it is necessary to consider the impact of combinations of trade and industrial policies on the diffusion of the NEV market.

2.3. Strategy Updating Rules in Evolutionary Game Models

Under the framework of evolutionary game theory, strategy updating rules are crucial in determining how agents evolve their strategies over time based on interactions and payoffs. These rules represent the decision-making mechanisms through which agents adjust their behavior in response to the strategies and outcomes of others. Various updating rules have been proposed and extensively studied, each with distinct implications for the dynamics of strategy evolution. Common strategy updating rules include imitating the best, unconditional imitation, and the Fermi rule [30]. In the diffusion model of NEVs, common strategy updating mechanisms include the Fermi rule and the segmented probability algorithm [7,10]. In most current studies, agents change their strategies by imitating or learning from their neighbors’ behavior with a certain probability. These rules, however, do not cover all the ways in which agents update their strategies, because agents can usually explore autonomously, especially in complex systems [9].

To address this limitation, it is essential to move beyond simple imitation-based rules. Real-world enterprises do not merely mimic the production strategies of their competitors and partners in the NEV market. Enterprises are relatively rational; their goal is profit maximization. Therefore, they learn and accumulate experience through trial and error, gradually finding the optimal strategy. RL can help agents complete tasks or achieve optimal strategies by interacting with the environment and accumulating experience. Therefore, introducing RL to better simulate enterprises’ decision-making behavior is necessary.

2.4. Research Gaps

Despite extensive studies on NEV diffusion and policy measures under the framework of ABM, complex networks and evolutionary games, several key research gaps remain underexplored.

First, when studying supply-side policies, existing models ignore the dynamic changes in market demand, which are usually determined by consumer preferences; when studying demand-side policies, some models ignore the low-carbon production decision-making process of enterprises.

Second, in existing models, agents decide mainly by imitating the strategies of their neighbors with a certain probability. In reality, however, agents rely more on experience and on a self-directed trial-and-error learning process.

Third, under the framework of complex networks and evolutionary games, few studies have examined the impact of policy mixes on the diffusion of NEVs. Most existing research has primarily focused on industrial policies, while trade policies, such as import tariffs, and the influence of overseas markets have been largely overlooked.

3. Diffusion Model Design

In this section, we illustrate the structure of the NEV diffusion model. Before introducing the detailed modeling framework, we present a table (Table 1) of notation to help clarify the variables and parameters used throughout this paper.

3.1. Overall Model Structure

The evolutionary game based on complex networks is composed of three elements: network structure, different participants and strategy evolutionary rules. The main participants in the diffusion model of NEVs include the government, automobile manufacturers and consumers. In the model, the government is responsible for formulating relevant policies, enterprises choose between producing NEVs and FVs, and consumers choose between purchasing NEVs and FVs. Based on a small-world network, this paper establishes a strategic game model to explore decision-making and diffusion mechanisms of enterprises and consumers. The overall model structure is illustrated in Figure 1.

3.2. Basic Assumptions of the Model

In the consumer network, there are $N_{1}$ consumers, and the average degree of the network is $d_{1}$. Consumers are categorized as green, white, or brown. Green consumers are environmentally conscious and consistently prefer NEVs, with an initial share denoted by $\alpha(t = 0)$. White consumers exhibit moderate environmental concern, and their preferences may change under appropriate incentives. Considering the objective limitations in the real world (such as the lack of charging stations in remote areas and rapid battery degradation in cold regions), some consumers prefer FVs; these types of consumers are regarded as brown consumers [10].
In the enterprise network, there are $N_{2}$ enterprises, and the average degree of the network is $d_{2}$. Each enterprise chooses whether to produce NEVs [7]. The initial share of NEV producers is denoted by $\gamma(t = 0)$.
Each agent can observe the strategies and rewards of others through their network connections [7,23]. This reasonably approximates the transparency of today’s Internet society, where market data, competitors’ situations, and the current status of friends are accessible.
Enterprises adopting the same strategy produce homogeneous products and equally share the market demand [23]. This simplification abstracts away brand-specific noise, allowing the model to focus exclusively on the macro-level competition between the two technological categories.
The total market size is constant [31]. While the real-world automotive market fluctuates, assuming a static baseline is essential to isolate the policy effect from the confounding factors of market expansion. This allows the model to specifically quantify how policy drives the shift in consumer preference and enterprise production strategy.
The tariffs imposed by other countries on China’s NEVs will lead to a decline in the export market share; the reduction in import tariffs on NEV raw materials will reduce enterprise production costs.

To characterize the impact of trade policy shocks, we assume that, at some point, developed economies such as the United States and the European Union impose high tariffs on China’s NEVs, thereby suppressing NEV exports. To reflect this scenario, this paper incorporates an export market share parameter into the model, representing the ability of Chinese enterprises to expand into international markets. If developed economies such as the United States and the European Union impose tariffs on China’s NEVs, this parameter will decline, affecting the expected profits of enterprises and the evolution of their production strategies.

Furthermore, we assume that the Chinese government, in order to promote the development of the NEV industry, reduces import tariffs on key raw materials for NEVs (such as lithium, cobalt, and nickel) to lower costs for enterprises. This scenario is simplified in the model by stating that when import tariffs on raw materials fall, the enterprises’ unit costs decrease accordingly, thereby increasing their overall profits and strengthening their incentive to continue producing NEVs.

3.3. Small-World Network

The small-world network adopted in this paper follows the model proposed by Watts and Strogatz in 1998 [32]. This network captures two fundamental features commonly observed in real-world systems: a short average path length and a high clustering coefficient. In such networks, even with limited direct connections, any two nodes can be linked through only a few intermediaries. The small-world property emerges by randomly rewiring some edges of a regular ring lattice, thereby maintaining local clustering while significantly reducing the average path length. A typical example is a social network, where individuals are connected through a few mutual acquaintances. Similarly, platforms such as Facebook and Instagram demonstrate this phenomenon; though linked to a small circle of friends, users can reach nearly anyone via friends of friends.
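The rewiring construction described above can be sketched in a few lines of plain Python. This is a minimal illustration of the Watts-Strogatz procedure, not the authors’ implementation: the function name `watts_strogatz`, the adjacency-dictionary representation, and the example sizes are all assumptions made for illustration.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Build a Watts-Strogatz small-world graph as an adjacency dict.

    n: number of nodes; k: even number of ring neighbours per node;
    p: probability of rewiring each lattice edge.
    (Names and representation are illustrative assumptions.)
    """
    if k % 2 != 0 or not 0 < k < n:
        raise ValueError("k must be even and satisfy 0 < k < n")
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}

    # Step 1: regular ring lattice, k/2 neighbours on each side of every node.
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)

    # Step 2: with probability p, move the far end of each lattice edge
    # to a uniformly chosen node that is not already a neighbour.
    for step in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + step) % n
            if rng.random() < p and j in adj[i]:
                candidates = [v for v in range(n)
                              if v != i and v not in adj[i]]
                if candidates:
                    new_j = rng.choice(candidates)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new_j)
                    adj[new_j].add(i)
    return adj


# Illustrative sizes only; they are not the paper's calibrated parameters.
consumer_net = watts_strogatz(n=100, k=4, p=0.1, seed=1)
```

Rewiring moves edges rather than adding or deleting them, so the average degree stays at `k` while a few long-range shortcuts shorten typical path lengths.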

With the rapid development of the Internet society, relationships between enterprises have become increasingly close. Enterprises engage in both cooperation and competition, naturally exhibiting small-world characteristics. In evolutionary game models based on complex networks, a scale-free network is often considered an alternative. A key feature of scale-free networks is that the node degree follows a power-law distribution, meaning that a small number of “hub” nodes can connect to a large number of other nodes. While this structure demonstrates strong robustness against random failures, it is fragile against targeted attacks because influence is overly concentrated in a few dominant nodes. However, this structure does not align with the current reality of China’s NEV market. The Chinese market presents a diverse and competitive landscape, rather than a market dominated by just one or two super-hubs. Therefore, the small-world network is more suitable for describing the current market structure.

Based on this, this paper employs a two-layer small-world network to model social interactions among consumers and relationships among enterprises.

3.3.1. Consumer Modeling

The profits of both types of manufacturers are influenced by market demand. As more and more manufacturers choose to produce NEVs, consumer preferences and purchasing decisions for NEVs are constantly evolving, leading to dynamic shifts in consumer demand, which in turn determine market demand at every moment. The existing literature often overlooks the dynamics of consumer demand during the diffusion of NEVs, hindering enterprises from making production decisions based on more realistic market demand. Therefore, this paper incorporates the dynamic changes in consumer demand into the diffusion model.

Consumer preferences reflect the extent to which they prefer different products. Preference theory in economics assumes that consumers are rational and will maximize their utility or satisfaction based on their preferences. Consumer preferences can be represented by a utility function. This paper introduces a group behavior mechanism developed within the framework of social coordination games to describe consumers’ dynamic preferences [13] and considers consumers’ dynamic preferences as a component of market demand in a network-based evolutionary game model. As consumers communicate and influence each other, their opinions on products and purchasing decisions constantly change, so social interaction also plays a decisive role in consumer preferences. This paper examines the evolution of consumer preferences for NEVs and FVs and therefore naturally views consumer purchasing decisions as being influenced by a combination of social interaction and utility value.

Consumers’ total utility includes social utility and economic utility. Economic utility primarily considers government policy incentives such as subsidies; social utility, in reality, manifests itself as the willingness to coordinate with others. Because consumers’ purchasing strategies are influenced by their neighbors in the network, social coordination, like economic factors, is also a key factor influencing consumer purchasing decisions. According to reference [13], consumers’ total utility can be expressed as

$$U_{i} = s + \frac{k_{i}}{d_{i}}, \tag{1}$$

where $s$ represents consumers’ economic utility and $\frac{k_{i}}{d_{i}}$ represents consumers’ social utility. Let $d_{i}$ be the degree of consumer $i$ in the network (i.e., the number of neighbors connected to consumer $i$), and let $k_{i}$ denote the number of neighbors who adopt the same strategy as consumer $i$.
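As a small worked illustration of Equation (1), the sketch below computes the total utility of one consumer from a neighbor list. The function name `total_utility`, the dictionary-based network, and the numeric value of the economic utility are hypothetical; the source does not specify how $s$ is computed, so it is simply passed in as a number here.

```python
def total_utility(i, strategy, adj, s):
    """Equation (1): economic utility s plus social utility k_i / d_i.

    i: consumer id; strategy: dict mapping consumer id -> chosen strategy;
    adj: dict mapping consumer id -> set of neighbour ids;
    s: economic utility of consumer i (an assumed, externally supplied value).
    """
    d_i = len(adj[i])
    if d_i == 0:
        # Assumption: an isolated consumer has no social utility term.
        return s
    # k_i counts neighbours that currently use the same strategy as i.
    k_i = sum(1 for j in adj[i] if strategy[j] == strategy[i])
    return s + k_i / d_i


# Toy network: consumer 0 is linked to consumers 1 and 2.
adj = {0: {1, 2}, 1: {0}, 2: {0}}
strategy = {0: "NEV", 1: "NEV", 2: "FV"}
u0 = total_utility(0, strategy, adj, s=0.5)  # 0.5 + 1/2
```

In this toy case one of consumer 0’s two neighbors shares its strategy, so the social term is 1/2.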

The strategy updating of consumers follows the Fermi rule.

$$P_{(i \to j)} = \frac{1}{1 + \exp [(U_{i} - U_{j}) / k]}, \tag{2}$$

where $U_{i}$ denotes the total utility of consumer $i$ and $U_{j}$ denotes the total utility of consumer $j$. Equation (2) indicates that when the utility of consumer $i$ is lower than that of consumer $j$, consumer $i$ is more likely to adopt the strategy of consumer $j$; conversely, if the utility of consumer $i$ exceeds that of consumer $j$, consumer $i$ is less likely to adopt consumer $j$’s strategy. This bounded rationality is captured by the noise parameter $k$: as $k$ approaches zero, individuals behave almost fully rationally, whereas as $k$ tends to infinity, individuals in a noisy and uncertain environment find it difficult to make fully rational choices. According to reference [33], the noise level in this paper is set to $k = 0.1$.
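The behavior of Equation (2) is easy to check numerically. The sketch below is a minimal illustration; the function name `fermi_probability` and the overflow guard are assumptions added for this example, while the default noise level 0.1 is the value stated in the text.

```python
import math


def fermi_probability(u_i, u_j, k=0.1):
    """Equation (2): probability that consumer i adopts consumer j's strategy.

    u_i, u_j: total utilities of consumers i and j; k: noise level.
    """
    x = (u_i - u_j) / k
    if x > 700:
        # Guard against overflow in math.exp; the probability is ~0 here.
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


p_equal = fermi_probability(1.0, 1.0)    # equal utilities: coin flip
p_worse = fermi_probability(0.5, 1.5)    # i is worse off: imitation likely
p_better = fermi_probability(1.5, 0.5)   # i is better off: imitation unlikely
```

With equal utilities the probability is exactly one half; with a utility gap of 1 and $k = 0.1$, imitation is nearly certain in one direction and nearly impossible in the other.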

3.3.2. Enterprise Modeling

The average output for NEV and FV manufacturers can be expressed as

$$q_{NEV}(t) = \frac{\alpha(t) Q}{\gamma(t) N_{2}}, \tag{3}$$

$$q_{FV}(t) = \frac{(1 - \alpha(t)) Q}{(1 - \gamma(t)) N_{2}}. \tag{4}$$

At time $t$, let $Q$ be the total market demand in the automotive industry. In this model, the consumer network constructed with $N_{1}$ nodes serves as a representative sample of the overall automotive market. Consequently, the variable $\alpha(t)$, which denotes the proportion of green consumers within the network, functions as a proxy for the demand proportion of NEVs in the automotive market. Based on this mapping logic, the market demand for NEVs can be calculated by multiplying the simulated $\alpha(t)$ by the automotive market demand $Q$. The term $\gamma(t)$ represents the proportion of enterprises producing NEVs in the enterprise network at time $t$. The term $1 - \alpha(t)$ represents the share of consumers who prefer FVs at time $t$, including both white and brown consumers. Affected by policy incentives and social interactions, some white consumers develop a preference for NEVs, which causes $1 - \alpha(t)$ to decrease gradually.

Considering that China’s automobile industry exports NEVs to overseas markets every year, the average output of enterprises producing NEVs after considering international trade can be expressed as

$$q_{NEV}(t) = \frac{\alpha(t) Q + \mathit{export} \times Q}{\gamma(t) N_{2}}, \tag{5}$$

where $\mathit{export}$ represents the ratio of China’s NEV exports to domestic market demand.
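To make Equations (3) to (5) concrete, the following sketch computes the average output per enterprise for both vehicle types. The function name `average_outputs` and all numeric inputs are illustrative assumptions rather than calibrated values from the paper; setting `export` to zero recovers Equation (3).

```python
def average_outputs(alpha, gamma, Q, n2, export=0.0):
    """Average output per NEV producer (Eq. 5) and per FV producer (Eq. 4).

    alpha: share of consumers preferring NEVs; gamma: share of enterprises
    producing NEVs; Q: total market demand; n2: number of enterprises;
    export: ratio of NEV exports to domestic market demand.
    """
    if not 0.0 < gamma < 1.0:
        # Both denominators must be non-zero for the formulas to apply.
        raise ValueError("gamma must lie strictly between 0 and 1")
    q_nev = (alpha * Q + export * Q) / (gamma * n2)
    q_fv = (1.0 - alpha) * Q / ((1.0 - gamma) * n2)
    return q_nev, q_fv


# Hypothetical numbers: 30% NEV demand, 25% of 100 firms producing NEVs.
closed = average_outputs(alpha=0.3, gamma=0.25, Q=1000, n2=100)
opened = average_outputs(alpha=0.3, gamma=0.25, Q=1000, n2=100, export=0.1)
```

In this toy case each NEV producer sells 12 units in the closed economy and 16 units once exports equal to 10% of domestic demand are added, while the FV output is unaffected.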

Based on the basic assumptions of the model, the government’s import tariffs on NEV raw materials directly affect the production costs of enterprises. The import tariff policy parameter is therefore denoted by $\mathit{material\_tariff}$; taking into account the policy impact of import tariffs, the average production cost of NEVs is expressed as $c_{NEV} \times (1 + \mathit{material\_tariff})$.

Therefore, the profit of enterprises can be expressed as

$$W_{NEV} = (p_{NEV} - c_{NEV})\, q_{NEV}(t) - c_{NEV} \times \mathit{material\_tariff} \times q_{NEV}(t), \tag{6}$$

$$W_{FV} = (p_{FV} - c_{FV})\, q_{FV}(t), \tag{7}$$

where $p_{NEV}$ and $p_{FV}$ represent the unit prices of NEVs and FVs, while $c_{NEV}$ and $c_{FV}$ denote their respective production costs per unit. Notably, this model adopts an asymmetric setting by introducing dynamic trade variables exclusively for NEVs, while treating the FV sector as a stable baseline. This specification reflects the distinct policy landscape of the past two years, where trade policies have specifically targeted the NEV industry. Given the FV industry’s lower sensitivity to these shifts, holding its parameters constant functions as a controlled experiment, effectively isolating the marginal impacts of these policies on NEV diffusion.
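Equations (6) and (7) can be checked with a short numerical sketch. The function name `profits` and the prices, costs, and quantities below are hypothetical values chosen only to make the arithmetic visible; they are not the paper’s parameter settings.

```python
def profits(p_nev, c_nev, q_nev, p_fv, c_fv, q_fv, material_tariff=0.0):
    """Per-enterprise profit for NEV (Eq. 6) and FV (Eq. 7) producers.

    p_*: unit prices; c_*: unit production costs; q_*: average outputs;
    material_tariff: import tariff rate on NEV raw materials.
    """
    # Eq. (6): base margin minus the tariff-induced cost on each unit.
    w_nev = (p_nev - c_nev) * q_nev - c_nev * material_tariff * q_nev
    # Eq. (7): the FV sector is treated as a stable baseline without tariffs.
    w_fv = (p_fv - c_fv) * q_fv
    return w_nev, w_fv


# Hypothetical inputs: a 10% raw material tariff on NEV producers.
w_nev, w_fv = profits(p_nev=20.0, c_nev=15.0, q_nev=10.0,
                      p_fv=18.0, c_fv=14.0, q_fv=5.0,
                      material_tariff=0.1)
```

Equation (6) is equivalent to applying the tariff-inclusive unit cost $c_{NEV} \times (1 + \mathit{material\_tariff})$ directly, so lowering the tariff raises NEV profit linearly in output.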

4. An Interpretable RL-Based Decision-Making Framework

This section will elaborate on the enterprise decision-making mechanism based on Q-learning, then construct a micro-interpretable framework rooted in SHAP, and complete the parameter setting, parameter calibration, and model validation. The core content structure of this section is illustrated in Figure 2.

4.1. Q-Learning-Based Strategy Update Algorithm

Arrow’s “learning by doing” [11] emphasizes that “learning” is an indispensable economic phenomenon in which enterprises gain efficiency and competitive advantage not through external shocks but through the accumulation of production experience and dynamic interaction with the environment. In our model, the enterprise’s objective is to maximize the Q-value. Under conditions of bounded rationality and incomplete information, enterprises cannot perfectly predict dynamic market demands or neighbors’ actions. Instead, like the enterprises described by Arrow, they must interact with the environment, processing signals from market demand, government policy, and competitor actions, to acquire information. The Q-learning algorithm mathematically formalizes this process: the Q-value acts as the enterprise’s experience stock, updated continuously through trial and error.

Recent advances in machine learning, especially in RL, have enabled agents to acquire optimal strategies through iterative interaction and feedback [34]. Among these, Q-learning, as a model-free algorithm with convergence and scalability [35], has been increasingly applied to evolutionary game models [36], offering new insights into cooperation mechanisms and strategic evolution. Therefore, this paper innovatively uses the Q-learning algorithm as an updating mechanism for enterprise production strategies.

In the evolutionary game, manufacturers aim to maximize profits but lack full market information. To adapt under uncertainty, we introduce the Q-learning algorithm, enabling enterprises to refine their production strategies through experience with changing market demand. Each enterprise maintains a Q-table, i.e., a matrix of size

n_{s} \times n_{a}

, where

n_{s}

and

n_{a}

are the number of states and actions respectively. In this study we define the state set

S = {0, 1}

and action set

A = {0, 1}

, where 0 denotes producing FVs and 1 denotes producing NEVs. It is worth noting that the state set

S = {0, 1}

focuses explicitly on the enterprise’s internal technological status (FV or NEV production) rather than external environmental variables such as policy intensity, market share and neighbors’ strategies. This design is based on the rationale that environmental information is implicitly captured through the reward function. Furthermore, keeping the state set parsimonious is crucial for computational tractability in a multi-agent evolutionary game. Accordingly, each manufacturer’s Q-table can be expressed as

Q = \begin{bmatrix} Q (0, 0) & Q (0, 1) \\ Q (1, 0) & Q (1, 1) \end{bmatrix},

(8)

where

Q (s, a)

is the expected cumulative reward when the enterprise is in state

s \in S

and chooses action

a \in A

.

In each round of the evolutionary game, a manufacturer compares its profits with all of its neighbors. The total reward R for that round is calculated as the sum of the profit differences obtained through interactions with its neighboring manufacturers; the reward function can be expressed as

R = \sum_{i = 1}^{n} r_{i},

(9)

where n denotes the number of neighboring manufacturers and

r_{i}

represents the profit difference obtained from the interaction with each neighbor.
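A short sketch of Equation (9) follows. The text does not state the sign convention of r_{i}, so this example assumes r_{i} is the enterprise's own profit minus the profit of neighbor i; the function name is hypothetical.

```python
def round_reward(own_profit, neighbor_profits):
    """Total reward R for one round, Equation (9).

    Assumption: r_i = own profit - profit of neighbor i, so outperforming
    a neighbor contributes a positive term and underperforming a negative one.
    """
    return sum(own_profit - w for w in neighbor_profits)


# One enterprise with profit 2.0 and three neighbors.
example_reward = round_reward(2.0, [1.0, 2.5, 0.5])
```

Under this reading, an enterprise with no neighbors receives a reward of zero, and the reward scales with the number of neighbors as well as with the profit gaps.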

After each round of the game, the manufacturer updates its Q-table according to Equation (10).

Q (s, a) = Q (s, a) + α [R + γ \max_{a^{'}} Q (s^{'}, a^{'}) - Q (s, a)],

(10)

where

α

denotes the learning rate,

γ

is the discount factor, and

\max_{a^{'}} Q (s^{'}, a^{'})

represents the maximum Q-value corresponding to the state at time

t + 1

. In addition to updating Q-values iteratively, enterprises also need to make strategy selections based on the current state’s Q-values in each round of the game. To balance “exploration” and “exploitation,” this paper adopts the

ϵ

-greedy policy as the decision update mechanism for enterprises: with probability

1 - ϵ

, the action with the highest current Q-value is chosen; with probability

ϵ

, a random alternative is explored.

a_{i} = \begin{cases} \arg\max_{a^{'} \in A} Q_{i} (s^{'}, a^{'}), & \text{with probability } 1 - ϵ \\ a_{j} \ (a_{j} \in A), & \text{with probability } ϵ \end{cases} .

(11)

In summary, the framework of the Q-learning algorithm is shown in Algorithm 1.

Algorithm 1: Q-learning algorithm for enterprise decision-making
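The update rule in Equation (10) and the selection rule in Equation (11) can be sketched as a small agent class. This is an illustrative reading and not a reproduction of Algorithm 1: the class name, the tie-breaking rule, uniform exploration over A, and the convention that the next state equals the action just taken (the state records what the enterprise produces) are all assumptions.

```python
import random


class EnterpriseAgent:
    """One enterprise with a 2x2 Q-table; state/action 0 = FV, 1 = NEV."""

    def __init__(self, state=0, alpha=0.1, gamma=0.9, epsilon=0.05, seed=None):
        self.q = [[0.0, 0.0], [0.0, 0.0]]  # q[s][a], laid out as in Equation (8)
        self.state = state
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)

    def select_action(self):
        """Epsilon-greedy choice in the current state, Equation (11)."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice((0, 1))  # explore: uniform over A (assumption)
        row = self.q[self.state]
        return 0 if row[0] >= row[1] else 1  # exploit; ties go to action 0 (assumption)

    def update(self, action, reward, next_state):
        """Apply Equation (10) and return the TD error of this step."""
        td_error = reward + self.gamma * max(self.q[next_state]) - self.q[self.state][action]
        self.q[self.state][action] += self.alpha * td_error
        self.state = next_state
        return td_error


# Two hand-picked steps with exploration switched off.
agent = EnterpriseAgent(state=0, epsilon=0.0)
first_td = agent.update(action=1, reward=1.0, next_state=1)   # FV producer tries NEVs
second_td = agent.update(action=0, reward=0.5, next_state=0)  # then switches back
```

In a full simulation, the reward passed to `update` would be the round reward R from Equation (9), computed after all enterprises have acted.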

Convergence Analysis of Q-Learning Algorithm

Before calibrating the model, we first verify the effectiveness of its internal decision-making mechanism, namely whether the Q-learning algorithm used by the enterprise agents converges to a stable strategy in our simulation environment. Convergence is indicated by a reduction in the temporal difference (TD) error, which represents the gap between the agent’s ‘expected return’ and ‘actual return’. When this error drops to a stable low level, it indicates that the agent has learned the optimal strategy. Under the three different policy scenarios mentioned above, we ran 50 repeated experiments of 250 steps each and recorded the average absolute TD error of the model. The learning rate

α

is 0.1, the discount factor

γ

is 0.9, and the probability

ϵ

in the greedy strategy is set to 0.05. Results are shown in Figure 3.

As can be seen from Figure 3, under the three different policy scenarios, the average absolute TD error shows a similar downward trend during the model operation. When

t = 250

, the error value is around 0.1, and the enterprise has basically learned the optimal production strategy. This also means that the Q-learning algorithm is effective in the model.
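One plausible way to compute the curve plotted in Figure 3 is to average the absolute TD error over agents within each run and then over runs, step by step. The sketch below assumes this order of averaging, which the text does not specify, and uses a hypothetical nested-list layout for the logged errors.

```python
from statistics import fmean


def mean_abs_td_curve(td_errors):
    """Average absolute TD error per step.

    td_errors[r][t] holds the TD errors of all agents at step t of run r
    (hypothetical logging layout). Returns one value per step.
    """
    n_steps = len(td_errors[0])
    curve = []
    for t in range(n_steps):
        per_run = [fmean(abs(e) for e in run[t]) for run in td_errors]
        curve.append(fmean(per_run))
    return curve


# Toy log: 2 runs, 2 steps, 2 agents.
toy = [
    [[1.0, -1.0], [0.5, -0.5]],  # run 0
    [[3.0, -3.0], [0.0, 0.0]],   # run 1
]
curve = mean_abs_td_curve(toy)
```

With 50 runs of 250 steps, the same function would return a 250-point curve whose flattening at a low level is the convergence signal described above.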

4.2. Interpretability Framework Based on SHAP

Although RL (Q-learning) endows enterprises with learning capabilities in dynamic environments, their decision-making processes are usually regarded as a “black box”, making it difficult to directly quantify the specific contributions of various factors (e.g., market demand, cost advantage, and local NEV share) to the final decisions. To open this black box and reveal the micro-level decision-making mechanisms, this study introduces the SHAP method to construct a post-modeling interpretability framework.

The Shapley method can calculate the marginal contribution of each feature to the model’s prediction results. This attribution method is based on cooperative game theory and possesses desirable properties, including efficiency, symmetry, the dummy player property, and additivity, which ensures the fairness of contribution allocation [37]. This enables researchers to clearly understand the extent to which each input feature affects the final prediction. As a feature attribution method, the Shapley value can explain how a machine learning model generates predictions, thereby enhancing the credibility and transparency of the decision mechanism.

4.2.1. Construction of Surrogate Model

Inspired by [38], this paper adopts the Random Forest Classifier (RFC) as the surrogate model. The target variable y represents the specific strategy adopted by the enterprise at time t, corresponding to the action selected from the action space

A = {0, 1}

; the input feature space F includes six variables that influence the enterprise’s decision-making, namely NEV market maturity (enterprise diffusion rate), cost advantage (relative cost), domestic and overseas market demand, the state of enterprises at the previous time step, and the number of neighbors of each agent in the network.

To evaluate the performance of the RFC, this paper collects the decision results and relevant variables of enterprises at each time step under different policy experiments and randomly divides the collected data into a training set (70%) and a testing set (30%), with specific parameter settings referring to the approach in [39]. The training results show that the RFC achieves an accuracy of 0.8135 on the training set and 0.8117 on the testing set. The consistency between these two metrics indicates that the surrogate model has strong generalization ability and is free from overfitting. An accuracy rate exceeding 80% demonstrates that the RFC can effectively capture the underlying decision-making logic of enterprises in the model, laying a reliable foundation for subsequent SHAP analysis.
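The surrogate-fitting step can be sketched with scikit-learn. The data below are synthetic placeholders, not simulation output, and the hyperparameters (100 trees, fixed seed) are assumptions because the settings taken from [39] are not listed here; only the 70/30 split and the use of accuracy follow the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def fit_surrogate(X, y, seed=0):
    """Fit an RFC on a 70/30 split and return (model, train accuracy, test accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)  # assumed settings
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    return model, train_acc, test_acc


# Synthetic stand-in: 500 decisions, six features, label driven mostly by feature 0.
rng = np.random.default_rng(0)
X = rng.random((500, 6))
y = (X[:, 0] + 0.1 * rng.standard_normal(500) > 0.5).astype(int)
model, train_acc, test_acc = fit_surrogate(X, y)
```

Comparing `train_acc` with `test_acc` mirrors the generalization check described above; a large gap between the two would point to overfitting of the surrogate.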

4.2.2. SHAP Value

Based on the trained RFC, we adopt the idea of the Shapley value in game theory to decompose the marginal contribution of each feature to the prediction results. The SHAP value

ϕ_{j}

of feature j is calculated as follows:

ϕ_{j} = \sum_{C \subseteq F \setminus \{j\}} \frac{| C |! (M - | C | - 1)!}{M!} [f_{x} (C \cup \{j\}) - f_{x} (C)],

(12)

where F denotes the set of all input features, M is the number of features, C denotes a subset of F that does not include feature j, and

f_{x} (C)

represents the expected model prediction when only the features in the subset are known or provided.
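Equation (12) can be evaluated directly by enumerating all subsets, which is feasible for a handful of features such as the six used here. The sketch below is a generic exact Shapley computation under that assumption; SHAP libraries typically use faster tree-specific algorithms, and the toy value functions stand in for f_{x} and are not the paper's model.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values per Equation (12).

    value_fn maps a frozenset of feature names C to f_x(C);
    features is the full feature set F.
    """
    features = list(features)
    m = len(features)
    phi = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for size in range(m):  # |C| ranges over 0 .. M-1
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                c = frozenset(subset)
                total += weight * (value_fn(c | {j}) - value_fn(c))
        phi[j] = total
    return phi


# Toy additive model: each feature contributes a fixed amount when present.
effects = {"maturity": 0.3, "demand": 0.2, "cost": -0.1}
additive = shapley_values(lambda c: sum(effects[f] for f in c), effects)

# Toy interaction: the prediction is 1 only when both features are known.
pair = shapley_values(lambda c: 1.0 if {"a", "b"} <= c else 0.0, ["a", "b"])
```

For the additive toy model each Shapley value equals the feature's own effect, and for the interaction example the credit is split evenly, which illustrates the additivity and symmetry properties mentioned in Section 4.2.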

4.3. Parameter Settings and Model Calibration

4.3.1. Static Parameter Settings

To perform the simulation, the static parameters in the model are based on values reported in previous studies. According to reference [7], the number of enterprises in the small-world network at the initial stage of the model is 300 [33]. The number of consumers in the network at the initial stage of the model is 1000 [12], and the proportion of brown consumers is 0.1 [33]. When the average degree in the network is 6, the diffusion effect of low-carbon technology is the best [40]; therefore, the average degrees of the consumer network and the enterprise network in the model are both 6. The average price and production cost of NEVs are 339,800 yuan and 250,000 yuan respectively, while the average price and production cost of FVs are 128,900 yuan and 55,400 yuan, respectively [7]. During the simulation experiment, normalizing parameters such as price and cost will not affect the simulation results [33]; therefore, the average price of NEVs is set to 1 in the model, the relative price of FVs is 0.38, the relative cost of NEVs is 0.74, and the relative cost of FVs is 0.16. When the reconnection probability in the small-world network is set to 0.1, the small-world network can reduce the average path length of the network while maintaining high clustering characteristics. Therefore, the reconnection probability of nodes in the small-world network in the model is set to 0.1 [32]. In summary, the model parameters are shown in Table 2.
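The normalized prices and costs quoted above follow from dividing each value by the average NEV price and rounding to two decimals, as the short check below shows. The dictionary keys are illustrative names.

```python
# Average prices and production costs in yuan, as reported in the text [7].
raw_values = {"p_nev": 339_800, "c_nev": 250_000, "p_fv": 128_900, "c_fv": 55_400}

# Normalize by the average NEV price so that p_nev becomes 1.
base = raw_values["p_nev"]
normalized = {name: round(value / base, 2) for name, value in raw_values.items()}
```

After normalization the unit margin is 1 - 0.74 = 0.26 for NEVs and 0.38 - 0.16 = 0.22 for FVs, so the two vehicle types start from comparable per-unit margins before any policy is applied.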

4.3.2. Dynamic Parameter Settings

In the model, there are two parameters that change dynamically over time: the proportion of enterprises producing NEVs

γ (t)

and the proportion of green consumers

α (t)

. In the initial stage,

α (t = 0)

is 0.2 and

γ (t = 0)

is 0.1 [33].

When the government implements relevant policies, policy parameters affect consumers’ utility functions and enterprises’ profit functions; therefore, at the next time step, consumer preferences will change, thus affecting the proportion of green consumers in the consumer network. This proportion of green consumers can be regarded as the market demand for NEVs. In the enterprise network, enterprises’ production decisions will be affected by changes in profit functions and market demand, thereby influencing each enterprise’s production strategy and ultimately altering the proportion of enterprises producing NEVs at each time step.

4.3.3. Model Calibration

Under the Q-learning framework, the learning rate

α

, the discount factor

γ

, and the exploration probability

ϵ

are calibrated against historical data released by China Energy News [41]. The dataset is derived from the real-time news released by China Energy News on 13 October 2025, which provides detailed data on the market penetration rate of NEVs in China from 2018 to September 2025. As the website is supervised by People’s Daily, an official and authoritative media organization in China, the dataset selected for this paper is compliant and reliable. We selected the monthly data on the penetration rate of China’s NEV market from March 2021 to September 2025 and searched for the optimal parameters by minimizing the root mean squared error (RMSE) between the simulated and observed penetration rates. We traversed combinations of parameters (with

α

set from 0.1 to 0.9 and a step of 0.1;

γ

set from 0.1 to 0.9 and a step of 0.1; and

ϵ

set from 0.05 to 0.25 and a step of 0.05). The optimal combination of parameters is

α = 0.1

,

γ = 0.9

,

ϵ = 0.05

. The RMSE is 0.1145.
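The calibration procedure amounts to an exhaustive grid search over 9 x 9 x 5 = 405 parameter combinations. The sketch below assumes a hypothetical `simulate(alpha, gamma, epsilon)` callable that returns a simulated penetration-rate series aligned with the observed monthly data; the toy simulator at the end only exercises the search and is not the paper's model.

```python
from itertools import product
from math import sqrt


def rmse(simulated, observed):
    """Root mean squared error between two equally long series."""
    if len(simulated) != len(observed):
        raise ValueError("series must have the same length")
    return sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed))


def parameter_grid():
    """All (alpha, gamma, epsilon) combinations described in the text."""
    alphas = [round(0.1 * k, 1) for k in range(1, 10)]    # 0.1 .. 0.9
    gammas = [round(0.1 * k, 1) for k in range(1, 10)]    # 0.1 .. 0.9
    epsilons = [round(0.05 * k, 2) for k in range(1, 6)]  # 0.05 .. 0.25
    return list(product(alphas, gammas, epsilons))


def calibrate(simulate, observed):
    """Return the grid point with the lowest RMSE and that RMSE."""
    best = min(parameter_grid(), key=lambda p: rmse(simulate(*p), observed))
    return best, rmse(simulate(*best), observed)


# Toy check: a fake simulator whose output is just its own parameters.
grid = parameter_grid()
best, best_error = calibrate(lambda a, g, e: [a, g, e], [0.1, 0.9, 0.05])
```

Because the simulation is stochastic, a practical version would average several runs per grid point before computing the RMSE; that detail is left out of this sketch.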

4.3.4. Model Validation

Under the framework of complex networks and evolutionary games, we compared the simulation data with the real data under the scenario of consumer subsidy policy. Meanwhile, we also replaced the Q-learning algorithm with two other representative enterprise decision-making mechanisms. The first one is the Fermi rule, which can be expressed as Equation (2); the second one is the segmented probability algorithm, which can be expressed as follows:

P (i \to j) = \begin{cases} 1 & \text{if } W_{i} \leq (1 / 2) W_{j} \\ \frac{W_{j} - W_{i}}{W_{i}} & \text{if } (1 / 2) W_{j} < W_{i} < W_{j} \\ 0 & \text{if } W_{i} \geq W_{j} \end{cases} .

(13)
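Equation (13) translates into a three-branch function. The sketch assumes positive profits, for which the three cases do not overlap; the branch order (checking W_{i} \geq W_{j} first) is a choice made here to keep the function well defined if profits happen to be negative.

```python
def segmented_switch_probability(w_i, w_j):
    """Probability that enterprise i adopts neighbor j's strategy, Equation (13).

    w_i: own profit, w_j: neighbor's profit (assumed positive).
    """
    if w_i >= w_j:
        return 0.0  # never imitate a neighbor who is not doing better
    if w_i <= 0.5 * w_j:
        return 1.0  # always imitate when the neighbor earns at least twice as much
    return (w_j - w_i) / w_i  # intermediate case: relative profit gap


# The probability falls from 1 to 0 as the own profit approaches the neighbor's.
probabilities = [segmented_switch_probability(w, 4.0) for w in (1.0, 2.0, 3.0, 4.0)]
```

The middle branch equals 1 at W_{i} = W_{j} / 2 and tends to 0 as W_{i} approaches W_{j}, so the rule is continuous across the three segments.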

We calculated the relative error between the simulation data and the real data and simultaneously compared the trends in the diffusion rate over time under different algorithms. The results are shown in Figure 4. It can be seen from Figure 4a that the overall relative error of the Q-learning algorithm is lower, and in Figure 4b, the diffusion trend is closer to that of the real data.

The results in Figure 4b reveal a critical divergence: the curves of the other two mechanisms rise too sharply, similar to the simulation results reported in some of the literature, while the Q-learning curve follows a more gradual, realistic trajectory. In a complex, dynamic environment, simple imitation rules often lead to “over-diffusion.” Agents blindly copy neighbors’ strategies without considering their own cost structures or historical constraints, causing the simulation to overestimate the speed of adoption compared to reality. In contrast, the Q-learning mechanism introduces a necessary “learning cost.” Enterprises need time to accumulate positive feedback before committing to a final production strategy. This effectively slows down the diffusion rate, prevents premature saturation, and allows the model to more accurately capture the real diffusion trend of the NEV market.

In summary, the co-evolutionary process of consumers and enterprises is shown in Figure 5.

5. Evaluation

This section analyzes the effect of consumer subsidies, import tariffs on NEV raw materials, tariffs imposed by other countries on China’s NEVs, and policy combinations on the diffusion of NEVs. The simulations are implemented in Python 3.11.14. In each experiment, the model focuses on three indicators: the proportion of enterprises producing NEVs, the proportion of green consumers, and the average profit of the two types of enterprises. In the following policy combination experiments, we graphically present the enterprise diffusion rate. This choice is grounded in the model’s co-evolutionary logic: unlike previous studies that treat demand as exogenous, this paper incorporates dynamic consumer preferences to drive market demand, which in turn dictates enterprise production decisions through profit feedback; consequently, the enterprise diffusion rate effectively serves as a proxy for the overall NEV market share, reflecting the aggregate outcome of this co-evolution.

Each experimental model runs for 250 steps, each step representing one month. This temporal scope is deliberately selected to fully encompass the strategic period of China’s “New Energy Vehicle Industry Development Plan (2021–2035)” and the national “Carbon Peaking” target (2030). Extending the horizon into the early 2040s further ensures a sufficient duration to observe the complete diffusion lifecycle, from the current growth phase to future market saturation, and allows the Q-learning algorithm adequate iterations to reach a stable convergence equilibrium.

The simulation results are averaged over 50 repeated model runs. After the model runs, this paper uses SHAP values to interpret the micro-level decision-making behaviors of enterprises and simultaneously employs the Sobol method to conduct a sensitivity analysis on the policy parameters of the model.

5.1. Single-Policy Analysis

5.1.1. Consumer Subsidy

Consumer subsidies reduce the cost of buying an NEV, enhancing its price competitiveness against traditional FVs. The promotion of NEVs also helps reduce greenhouse gas emissions and other harmful pollutants, improve air quality, and mitigate climate change. Therefore, consumer subsidies are not only a strategy to promote technological and market development but also a key measure for achieving sustainable development. They also increase public awareness and acceptance of NEVs and their environmental advantages, strengthening overall societal support for NEVs.

To explore the impact of consumer subsidy policies on the diffusion rate of NEVs, this paper will change the value of the consumer subsidy factor s (0, 0.2, 0.4, 0.6, 0.8, and 1.0) and observe the diffusion rate in both networks. Since the social utility value in the model ranges from 0 to 1, the economic utility value is also set to the same scale to ensure consistency of magnitude. Simulation results are shown in Figure 6.

From the perspective of overall dynamic evolution, the diffusion rates of both enterprises and consumers show an upward trend as the subsidy factor s increases. Diffusion behavior in consumer networks responds more quickly, exhibiting greater sensitivity and rapid propagation. While the diffusion rate of enterprises also increases with the subsidy factor s, it rises more slowly and exhibits a more gradual convergence process. This difference stems primarily from the RL mechanism employed by enterprises during their strategic evolution. Rather than directly imitating their neighbors, each enterprise continuously learns and optimizes its strategy through a Q-learning algorithm based on historical experience and reward feedback. Therefore, compared to consumers’ direct behavioral responses to policy incentives, enterprises exhibit greater rationality and adaptability in response to subsidy adjustments, demonstrating a gradual learning process in their strategic evolution.

From a policy perspective, the higher the consumer subsidy, the higher the diffusion rate. This result demonstrates the positive impact of government subsidies on consumers. With the continued implementation of the policy, consumers, as rational economic agents seeking to maximize utility, increasingly favor NEVs, leading to a continuous increase in market demand for them. As demand continues to rise, profits for NEV manufacturers also increase, attracting other enterprises to join the NEV market. Figure 6 also shows that when the subsidy factor s is low, the diffusion effect at both the enterprise and consumer levels is not particularly strong. This means that when the subsidy factor is too low, subsidized consumers do not change their preferences. When the subsidy factor s is high (0.6, 0.8, and 1.0), the enterprise and consumer diffusion rates are both high but differ in speed: the higher the subsidy factor, the faster the diffusion. Over time, the enterprise and consumer diffusion rates become very close under high-subsidy conditions.

This experiment shows that low-intensity consumer subsidies have a weaker effect on firm diffusion rates, while high-intensity consumer subsidies are more effective. Among the high-intensity levels, the speed of diffusion still varies with the subsidy factor. If the government wants to rapidly develop the NEV market, it should provide higher subsidies to consumers, but this will place fiscal pressure on the government. Therefore, policymakers need to rationally design and quantify consumer financial incentives based on their goals.

Based on the simulation results in Figure 6b, we observe distinct consumer behaviors across different subsidy levels. In low-subsidy scenarios (

s \leq 0.2

), consumer demand remains stagnant. Notably, when

s = 0.4

, the adoption rate shows a downward trend after an initial rise, indicating that the incentive is insufficient to sustain long-term market diffusion. Conversely, in high-subsidy scenarios (

s \geq 0.6

), the market saturates too rapidly. In contrast,

s = 0.5

represents a critical “tipping point” where the market successfully transitions from instability to sustainable growth. Therefore, to effectively assess the impact of trade policies, we set the subsidy factor at a baseline of

s = 0.5

in the subsequent experiments. This “moderate support benchmark” ensures sufficient market resilience, allowing us to observe the marginal effects of import tariffs and export fluctuations without these secondary effects being masked by extreme subsidy conditions.

5.1.2. Import Tariff Policies on NEV Raw Materials

The government’s reduction in import tariffs on key raw materials for NEVs (e.g., lithium, cobalt, and nickel) can lower manufacturers’ production costs, thereby reducing the overall cost of NEV manufacturing. This decrease in costs directly enhances the profitability of NEV producers and increases their willingness to supply. To examine the impact of raw material import tariff adjustments on the diffusion rate of enterprise adoption, this study analyzes how varying tariff rates (0%, 5%, 10%, 15%, 20%, and 25%) influence the proportion of NEV manufacturers and their average cumulative profits. The lower bound of import tariffs reflects the current realistic baseline; the provisional tariff rate for key raw materials such as lithium and cobalt carbonate is 0% [3]. The upper bound benchmarks the threshold of protectionism, aligning with contemporary punitive standards, such as the 25% U.S. Section 301 tariffs on China’s NEVs [42]. Simulation results are shown in Figure 7.

Figure 7a shows that as the tariff level increases from 0% to 25%, the peak value of the enterprise diffusion rate shows a decreasing trend. This suggests that higher import tariffs increase the cost of NEV production, thereby slowing the diffusion of the low-carbon transition. Figure 7b shows the average cumulative profit curve for NEV manufacturers and reveals that as the raw material tariff increases from 0 to 0.25, the average profit continuously decreases from approximately 0.25 to around 0.15. Therefore, higher import tariffs weaken the profitability of NEV manufacturers. In contrast, in Figure 7c, under the low-tariff scenario, the maximum profit of FV manufacturers is slightly higher than that of NEV manufacturers. Some FV manufacturers experience a short-term profit boost as competitors switch to NEVs. However, their profits decline over time.

From a trade policy perspective, the results of this experiment reveal the dual impact of government tariff intervention on the industrial ecosystem. On the one hand, lowering import tariffs on key NEV raw materials can indirectly boost profits for NEV manufacturers, accelerating the industry’s low-carbon transition through network neighborhood effects and evolutionary game mechanisms. On the other hand, indiscriminate tariff increases can lead to a decline in overall profits and delay the low-carbon transition in the medium and long term.

5.1.3. Other Countries’ Tariffs on China’s NEVs

Both theoretical and empirical research indicate that the tariffs imposed by other countries on China’s NEVs affect the overseas market share of enterprises [43]. Therefore, to explore the impact of these tariffs on the diffusion rate, this experiment assumes that the overseas market share of NEV enterprises varies from 0% to 50%. Simulation results are shown in Figure 8.

As illustrated in Figure 8, a decline in the overseas market share tends to hinder the enterprise adoption rate. In this scenario, the diffusion rate of enterprises remains within the range of 0.7 to 0.8. Therefore, when NEV manufacturers are exposed to external competition or changes in the international policy environment, the suppressive impact of a reduced overseas market share on their diffusion performance appears relatively limited. A smaller overseas market share indicates a contraction in potential external demand and profit margins, which correspondingly reduces the cumulative positive rewards in the Q-learning framework, thus diminishing the incentive to adopt NEV production strategies. Nevertheless, due to the “neighbor effect” embedded in the network structure, once a certain proportion of enterprises transition toward NEV production, their local networks generate a demonstration effect that continuously strengthens the learning motivation of surrounding enterprises. Consequently, even under conditions of a low (or even zero) overseas market share, a considerable number of enterprises continue to pursue green production pathways. Hence, as the overseas market share decreases from 0.5 to 0, the diffusion rate declines slightly but remains stable. In other words, as long as the cost–benefit balance in the domestic market is not fundamentally reversed, the internal learning dynamics among enterprises can largely offset the adverse effects of declining external demand.

At the consumer level, the market acceptance of NEVs has gradually increased under the subsidy policy. Government subsidies have created a relatively rigid demand for NEVs among domestic consumers. This endogenous demand helps buffer the negative impact of a decline in export share, further stabilizing the development of the industry. In other words, when overseas market demand significantly declines, although enterprises face pressure on the revenue side, the diffusion rate shows only a slight decrease.

5.2. Policy Combination Analysis

5.2.1. Consumer Subsidies and Import Tariffs on NEV Raw Materials

Figure 9 shows the impact of the policy combination of import tariffs on NEV raw materials and consumer subsidies on the diffusion rate.

With low consumer subsidies, diffusion rates remain low regardless of import tariff levels. As shown in the first two columns of Figure 9, adoption rates fluctuate only slightly between 0.04 and 0.06. This indicates that without sufficient demand-side impetus, reducing enterprise costs by lowering raw material import tariffs is insufficient to promote the widespread adoption of NEVs.

When the consumer subsidy is 0.4, the domestic market is in its initial development stage. If import tariffs are reduced from 25% to 0, the diffusion rate can be significantly increased from 0.16 to 0.47. This means that when domestic market demand rises steadily in the early stage, reducing the cost of raw materials for producing NEVs can indirectly incentivize enterprises to transition to low-carbon practices. Especially when government departments face fiscal pressure, they can consider using trade policies to reduce related costs in the industrial chain, thereby supporting the NEV industry.

Under a high-subsidy scenario, the negative impact of import tariffs is significantly reduced. Even with the maximum import tariff level, an adoption rate of 0.46 is maintained when the consumer subsidy

s = 1

. This suggests that when governments implement relevant policies, they should design the primary policy from the perspective of stimulating demand, and import tariffs can be considered as a supplementary policy.

5.2.2. Consumer Subsidies and Tariffs Imposed by Other Countries on China’s NEVs

Figure 10 shows the impact of the policy combination of consumer subsidies and tariffs imposed by other countries on China’s NEVs on the diffusion rate.

When the consumer subsidy level is close to 1, the diffusion rate is generally high, approaching 0.7, regardless of changes in overseas market share. This indicates that when subsidies are high, consumers’ willingness to purchase NEVs increases significantly, in turn incentivizing enterprises’ low-carbon transition. When consumer subsidies are low, the diffusion rates show a downward trend, especially when the overseas market share is low. Figure 10 shows that the diffusion rates are low under low subsidies and only rebound when the overseas market share is high. Under low subsidies, consumers’ willingness is weak, resulting in slower adoption of NEVs.

With a high overseas market share, the diffusion rate of NEVs is more stable, maintaining a high level even when consumer subsidies are low. This is because a high overseas market share provides enterprises with additional profit margins, helping to mitigate the negative impact of low domestic demand and low subsidies. In this scenario, enterprises can gain more market opportunities through exports, driving the adoption of NEVs. When the overseas market share is low, the role of subsidy policies becomes even more critical. If subsidy levels are low while the overseas market share is also small, the diffusion rate declines significantly. This is because overall demand for NEV production is overly dependent on the domestic market, which in turn is constrained by low subsidies. This leads to insufficient incentives for enterprises to transform and a low diffusion rate.

Therefore, the government should promote the balanced development of domestic and international market demand. Although the tariffs imposed by European and American countries on China’s NEVs have led to a decline in the overseas market share, the demand in the domestic market and the rigid support of subsidy policies have sustained the development of the NEV industry. Accordingly, the government should strengthen domestic incentives while promoting the synergy between the upstream and downstream of the industry chain, facilitating the overall development of the industry. In addition, the government should enhance policy support for expanding international markets, such as providing export tax rebates and promoting international cooperation. By diversifying market strategies, the government can mitigate the uncertainties of external markets and strengthen global competitiveness.

5.2.3. Import Tariffs on NEV Raw Materials and Tariffs Imposed by Other Countries on China’s NEVs

Figure 11 illustrates the impact on NEV diffusion of the policy combination of import tariffs on NEV raw materials and tariffs imposed by other countries on China’s NEVs. The different policy combinations shown in this figure allow us to observe how changes in tariffs and the overseas market share affect the diffusion rate and to further analyze the interactions between these policies.

When raw material tariffs are low (e.g., 0 and 0.05) and overseas market shares are high, the diffusion rates are generally high, ranging from 0.68 to 0.71. This combination provides enterprises with both market and cost incentives, accelerating the diffusion of NEVs. As raw material tariffs gradually increase (e.g., from 0.1 to 0.25), the diffusion rates gradually decrease, particularly when overseas market shares are low (0 or 0.1). For example, when the tariff is 0.25 and the overseas market share is close to 0, the diffusion rate is only 0.50, indicating that the combination of higher tariffs and lower market shares severely suppresses diffusion rates. In this scenario, enterprises face higher production costs and limited export opportunities, lacking sufficient market demand support, ultimately inhibiting the diffusion of NEVs.

Even if other countries impose tariffs on China’s NEVs, China can still respond to this external trade environment through appropriate trade policies. By flexibly adjusting import tariffs, strengthening overseas market development, and continuing to promote subsidy policies, the government can help China’s NEV industry maintain its competitiveness in the global market and promote its sustainable development.

5.3. Micro-Level Decision Mechanism Analysis Based on SHAP

To reveal the micro-level decision-making process of enterprises, we employ SHAP to interpret the trained RFC. The SHAP summary plot reflects the contribution and direction of each feature to the model output.
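The attribution idea behind the summary plot can be sketched with an exact Shapley computation over a small feature set. This is only an illustration under stated assumptions: the feature names and the toy value function `toy_value` are hypothetical, and the brute-force enumeration below is not the estimator applied to the RFC in this study. The symbols follow Table 1 ($ϕ_{j}$, $F$, $M$, $C$, $f_{x} (C)$).

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values phi_j by enumerating all coalitions C of F \\ {j}.

    value_fn(frozenset) -> float plays the role of f_x(C) in Table 1.
    """
    M = len(features)
    phi = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for size in range(M):
            # Shapley weight |C|! (M - |C| - 1)! / M!
            weight = factorial(size) * factorial(M - size - 1) / factorial(M)
            for coalition in combinations(others, size):
                C = frozenset(coalition)
                # Marginal contribution of feature j to coalition C
                total += weight * (value_fn(C | {j}) - value_fn(C))
        phi[j] = total
    return phi


def toy_value(C):
    """Hypothetical f_x(C): predicted probability of choosing NEV production."""
    v = 0.2  # baseline prediction with no feature known (assumed)
    if "market_maturity" in C:
        v += 0.4
    if "domestic_demand" in C:
        v += 0.2
    if "market_maturity" in C and "domestic_demand" in C:
        v += 0.1  # interaction term, split evenly between the two features
    return v  # "n_neighbors" never changes the prediction in this toy example


FEATURES = ["market_maturity", "domestic_demand", "n_neighbors"]
phi = shapley_values(toy_value, FEATURES)
for name, value in phi.items():
    print(f"{name}: {value:.3f}")
```

In this toy setting the contributions sum to the difference between the full prediction and the baseline, and a feature that never changes the prediction receives a value of zero. Exact enumeration scales exponentially with the number of features, so it is practical only for small feature sets.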

As can be seen from Figure 12, the most influential features for enterprises’ production decisions are NEV market maturity, domestic market demand, and the state at the previous time step. The more mature the NEV market, the more strongly enterprises are encouraged to produce NEVs; conversely, low domestic and overseas market demand hinders enterprises’ low-carbon transition. An enterprise that produced FVs at the previous time step is more willing to switch to NEV production at the next time step, while some NEV-producing enterprises maintain their production status. A lower relative cost of NEV production also facilitates enterprises’ low-carbon transition.

From the perspective of network structure, the number of an enterprise’s neighbors exhibits a relatively low SHAP value, indicating a limited marginal impact on decision-making outcomes. However, this result does not contradict the theoretical importance of the small-world network. In the model, the network serves as a fundamental information channel through which enterprises observe neighbors, acquire information, and update their production strategies. The existence of such a network structure is a necessary precondition for localized learning and strategy adaptation. Once this information channel is established, increasing or decreasing the number of neighbors will either introduce redundant information or lead to information insufficiency. Therefore, a low SHAP value reflects the very limited incremental effect of the number of neighbors in an already well-connected network, rather than negating the core role of the small-world network in enabling information acquisition and strategic adjustment.

5.4. Sobol Sensitivity Analysis

In this study, the parameters of the Q-learning algorithm (e.g., learning rate $α$, discount factor $γ$, and exploration rate $ϵ$) describe the behavioral characteristics of enterprises and were calibrated in Section 4.3.3 to match historical diffusion data. Likewise, the network topology parameters $d_{1}$ and $d_{2}$ are fixed based on established standards [40]. Varying these behavioral and structural parameters would therefore cause the model to deviate from its empirically validated baseline. The policy variables, by contrast, represent uncertain external conditions and actionable levers. To evaluate the robustness of policy interventions while maintaining the model’s empirical and theoretical footing, we restrict the global sensitivity analysis exclusively to the three exogenous policy variables.

In this study, the Sobol sensitivity analysis method is employed to examine the policy parameters. This technique is a widely used global sensitivity analysis approach in mathematical modeling and statistics, designed to evaluate how the uncertainty in model input parameters affects the variability of model outputs. The method is based on variance decomposition, which allows for identifying the contribution of each input parameter to the model output, including both individual and interactive effects among parameters [44,45,46]. Specifically, the Sobol method quantifies the relative importance of model parameters by calculating Sobol indices derived from the variance decomposition principle. These indices are classified into two types: the first-order Sobol index, representing the direct effect of a single parameter on the model output, and the total-effect Sobol index, which measures the combined influence of a parameter and its interactions with other parameters. The results of the sensitivity analysis are shown in Figure 13.
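As a minimal sketch of how first-order and total-effect indices can be estimated by Monte Carlo sampling, the snippet below applies the Saltelli (2010) first-order estimator and the Jansen total-effect estimator to a toy function. The function `toy_model`, its coefficients, the uniform input ranges on [0, 1], and the sample size are all assumptions chosen for illustration; the toy function is not the diffusion simulation, and its indices are not the results reported in Figure 13.

```python
import random


def sobol_indices(model, n_params, n_samples=20000, seed=0):
    """Estimate first-order (S1) and total-effect (ST) Sobol indices.

    Inputs are assumed independent and uniform on [0, 1].
    """
    rng = random.Random(seed)
    # Two independent sample matrices A and B (n_samples x n_params)
    A = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    f_A = [model(x) for x in A]
    f_B = [model(x) for x in B]

    all_y = f_A + f_B
    mean_y = sum(all_y) / len(all_y)
    var_y = sum((y - mean_y) ** 2 for y in all_y) / len(all_y)
    # Centering f_B reduces estimator variance without changing its expectation
    f_B_centered = [y - mean_y for y in f_B]

    s1, st = [], []
    for i in range(n_params):
        # AB_i: rows of A with column i replaced by column i of B
        f_AB = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        v_i = sum(fb * (fab - fa)
                  for fa, fb, fab in zip(f_A, f_B_centered, f_AB)) / n_samples
        vt_i = 0.5 * sum((fa - fab) ** 2
                         for fa, fab in zip(f_A, f_AB)) / n_samples
        s1.append(v_i / var_y)
        st.append(vt_i / var_y)
    return s1, st


def toy_model(x):
    """Hypothetical response to (subsidy, tariff, export share)."""
    subsidy, tariff, export = x
    return (3.0 * subsidy - 1.0 * tariff + 0.3 * export
            + 3.0 * (subsidy - 0.5) * (tariff - 0.5))  # interaction term


s1, st = sobol_indices(toy_model, n_params=3)
for name, a, b in zip(("subsidy", "tariff", "export"), s1, st):
    print(f"{name}: S1={a:.2f}, ST={b:.2f}")
```

For this toy function the analytic first-order indices are roughly 0.83, 0.09, and 0.01, and the interaction term raises the total-effect index of the second input to about 0.16. A gap between ST and S1 is therefore the signature of interaction effects, while an input with no interaction term has ST approximately equal to S1.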

Figure 13 shows the first-order (S1) and total-effect (ST) Sobol indices for the three policy parameters. The consumer subsidy exhibits S1 close to 1 for consumer diffusion rate and S1 close to 0.9 for enterprise diffusion rate, indicating that subsidies alone explain almost all output variance. This reflects the model structure: subsidies directly enter the consumers’ economic utility, which immediately affects adoption probability and subsequently propagates through the enterprise layer by altering demand and profitability. The ST index is also high, confirming that subsidies dominate not only as an independent factor but also through interactions, since all other parameters exert their influence primarily via enterprise profit, which is also shaped by consumer demand.

Import tariffs show a moderate S1 value but a significantly higher ST value. This indicates that tariffs influence diffusion through interactions with consumers’ adoption and enterprise learning dynamics. Since tariffs enter the enterprise profit function, their effect is amplified when enterprises update their strategies via Q-learning. The learning process accumulates profit differences over time, causing even small cost changes to produce sustained impacts on enterprise adoption.
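The accumulation mechanism described above can be sketched with a tabular Q-learning update. This is a simplified illustration, assuming a two-state, two-action Q-table and hypothetical values for $α$, $γ$, and the profits; the calibrated values from Section 4.3.3 and the full state definition are not reproduced here. The reward follows Table 1, where $R$ is the sum of profit differences between an enterprise and its neighbors.

```python
import random


def reward_from_neighbors(own_profit, neighbor_profits):
    """R: sum of profit differences r_i between an enterprise and its neighbors."""
    return sum(own_profit - p for p in neighbor_profits)


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy selection over one row of the Q-table."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # explore
    return max(range(len(q_row)), key=q_row.__getitem__)  # exploit


def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """One tabular Q-learning step; returns the TD error."""
    td_target = reward + gamma * max(q_table[next_state])
    td_error = td_target - q_table[state][action]
    q_table[state][action] += alpha * td_error
    return td_error


# Hypothetical setting: states and actions are indexed 0 = FV, 1 = NEV.
ALPHA, GAMMA = 0.1, 0.9  # illustrative values, not the calibrated ones
q_table = [[0.0, 0.0], [0.0, 0.0]]

# A small but persistent profit advantage of NEV production over two neighbors.
R = reward_from_neighbors(own_profit=1.0, neighbor_profits=[0.5, 0.5])

for _ in range(2000):
    q_update(q_table, state=1, action=1, reward=R, next_state=1,
             alpha=ALPHA, gamma=GAMMA)

print(f"R = {R:.2f}, learned Q(NEV, NEV) = {q_table[1][1]:.2f}")
print("greedy action:", choose_action(q_table[1], epsilon=0.0, rng=random.Random(0)))
```

With a constant reward $R$ and a self-transition, the Q-value approaches $R / (1 - γ)$, so a per-step profit difference is magnified by a factor of $1 / (1 - γ)$ in the learned value. This is one way to read the statement that small cost changes can have sustained effects on enterprise strategy.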

The export market share has the smallest first-order and total-effect Sobol indices, which means that its impact on the diffusion results is limited. This is because export share does not directly change consumer decisions, and it only affects enterprise profits slightly compared with subsidies or import tariffs. The total-effect index is only slightly higher than the first-order index, showing that export share interacts weakly with other parameters. In our model, most enterprise profits come from domestic demand, which is strongly influenced by consumer subsidies. Therefore, changes in export share contribute very little to the overall variance in NEV diffusion outcomes.

However, in real-world policy environments, it is unrealistic for governments to provide high levels of consumer subsidies indefinitely, as long-term subsidy schemes impose considerable fiscal pressure. The sensitivity analysis shows that consumer subsidies play a dominant role, implying that subsidies are most effective during the early stage of industry development when stimulating demand is crucial. Once the industry enters a more stable growth phase, governments may consider gradually phasing out subsidies to reduce fiscal burdens while relying more on policy combinations to support sustainable development. In addition, the NEV industry is exposed to uncertainties in the global market. Although import tariffs on raw materials and export market share exhibit smaller Sobol indices compared with consumer subsidies, these factors may still become important under real external shocks. When confronted with changes in international trade conditions, governments can employ trade policies as complementary tools to mitigate adverse impacts and maintain the competitiveness of the NEV sector.

5.5. Discussion

Although this model is calibrated using data from China’s NEV market, the proposed framework is generalizable. Methodologically, the integration of complex networks with evolutionary game theory can serve as a template for analyzing ‘supply–demand co-evolution’ in other technological transitions, provided that the model parameters are recalibrated to reflect local market contexts. Furthermore, the policy insights into the synergy between domestic incentives and trade barriers can provide valuable reference points for other emerging economies with similar energy structures and stages of development.

Although the Q-learning algorithm used in this study relies on a simple tabular structure (Q-table) and can be efficiently executed on a CPU, the computational cost grows substantially when the state–action space or the number of interacting agents increases. In large-scale social simulations, or when deep neural networks are introduced to approximate value functions, RL algorithms can be accelerated using GPUs to parallelize matrix operations and batch updates.

6. Conclusions

This paper proposes a two-layer small-world network model to study the co-evolution of enterprise low-carbon production strategies and consumer preferences under an open-economy framework. By innovatively integrating the Q-learning algorithm with the SHAP method, we reconstructed the micro-decision mechanism of enterprises. The experimental results demonstrate that the Q-learning algorithm can not only converge to a stable strategy, but also outperform traditional evolutionary rules in fitting real-world diffusion trends.

The results of the single-policy simulation indicate that low-level consumer subsidies ($s = 0.2$) exert a weak impact, while high-level subsidies yield the optimal diffusion effect ($s = 0.8, 1.0$); however, discrepancies exist in the diffusion rates. By reducing import tariffs on raw materials, the government can effectively lower enterprises’ production costs, thereby facilitating their low-carbon transition. If enterprises’ export market shares are affected, their diffusion rates will fluctuate by approximately 10%.

The results of the policy mix simulation show that when consumer subsidies are guaranteed to stimulate a certain level of demand, the government can amplify the policy effect of subsidies by reducing import tariffs on raw materials, which means there exists a synergistic effect between the two. When domestic market demand is low, changes in import tariffs exert no significant impact on the diffusion effect. When overseas market access is impeded, the market diffusion rate can still be maintained at a relatively high level as long as stable or growing domestic market demand is ensured. When domestic market demand is sluggish, the government can also boost the diffusion rate by developing overseas markets.

SHAP analysis results indicate that enterprises prioritize market maturity most in their decision-making process. In terms of policy factors, enterprises attach the greatest importance to domestic market demand and changes in relative costs, while the overseas market has a relatively small impact. The Sobol sensitivity analysis results indicate that consumer subsidies play a dominant role, whereas trade policies are more suitable as auxiliary policy instruments.

However, there are certain limitations to this study that should be noted:

Currently, the model identifies consumer subsidies as the primary driver of the utility function. While this specification captures the core economic incentive mechanism, the modeling of consumer behavior still involves certain simplifications. The model only focuses on policy incentives and social network effects, yet it simplifies the critical non-monetary attributes in real-world vehicle purchase decisions, such as range anxiety, charging infrastructure convenience, brand loyalty, and vehicle safety performance.
The model assumes relatively constant production technology levels; in reality, costs and prices of NEVs typically change over time due to technological maturity.
The social network topology is static throughout the simulation; however, social relationships and information channels are dynamically evolving.
The model employs a simplified representation of policy transmission mechanisms, where government policies can be transmitted instantly and perfectly to enterprises and consumers without causing any distortion; however, in reality, policy implementation often involves complex intermediate processes, resulting in a lag in policy effects.

Addressing these simplifications to better align the model with real-world complexities constitutes the central priority of our future research.

Author Contributions

Conceptualization, methodology, formal analysis, resources, writing—review and editing, supervision, project administration, and funding acquisition, H.L.; software, validation, investigation, data curation, writing—original draft preparation, and visualization, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the HSSF Foundation of the MOE in China (No. 23YJA790053) and the 8th Mentor-Led Program of Shanghai International Studies University (No. 2025DSYL017).

Data Availability Statement

The data and code in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

NEVs	New energy vehicles
FVs	Fuel vehicles
RL	Reinforcement learning
SHAP	SHapley Additive exPlanations
RFC	Random Forest Classifier
ABM	Agent-based Modeling
TD error	Temporal Difference error

References

1. Shang, H.; Sun, Y.; Huang, D.; Meng, F. Life cycle assessment of atmospheric environmental impact on the large-scale promotion of electric vehicles in China. Resour. Environ. Sustain. 2024, 15, 100148.
2. Stechemesser, A.; Koch, N.; Mark, E.; Dilger, E.; Klösel, P.; Menicacci, L.; Nachtigall, D.; Pretis, F.; Ritter, N.; Schwarz, M.; et al. Climate policies that achieved major emission reductions: Global evidence from two decades. Science 2024, 385, 884–892.
3. Shenzhen Battery Industry Association. Temporary Import Tariff Rates for Lithium Chloride, Lithium Carbonate, and Cobalt Carbonate Reduced to 0% Starting January 1, 2024. Available online: http://www.szbattery.org/news/regulations/910.html (accessed on 23 October 2025).
4. The New York Times. Few Chinese Electric Cars Are Sold in U.S., But Industry Fears a Flood. The New York Times, 17 May 2024. Available online: https://cn.nytimes.com/business/20240517/china-electric-vehicles-biden-tariffs/dual/ (accessed on 23 October 2025).
5. The New York Times. European Union Hits E.V.s from China with Extra Tariffs Up to 38%. The New York Times, 12 June 2024. Available online: https://www.nytimes.com/2024/06/12/business/eu-china-ev-tariffs.html (accessed on 23 October 2025).
6. Zhao, T.; Liu, Z. A novel analysis of carbon capture and storage (CCS) technology adoption: An evolutionary game model between stakeholders. Energy 2019, 189, 116352.
7. Shi, Y.; Han, B.; Zeng, Y. Simulating policy interventions in the interfirm diffusion of low-carbon technologies: An agent-based evolutionary game model. J. Clean. Prod. 2020, 250, 119449.
8. Hu, Y.; Wang, Z.; Li, X. Impact of policies on electric vehicle diffusion: An evolutionary game of small world network analysis. J. Clean. Prod. 2020, 265, 121703.
9. Xie, K.; Szolnoki, A. Reinforcement learning in evolutionary game theory: A brief review of recent developments. Appl. Math. Comput. 2026, 510, 129685.
10. Shi, Y.; Wei, Z.; Shahbaz, M.; Zeng, Y. Exploring the dynamics of low-carbon technology diffusion among enterprises: An evolutionary game model on a two-level heterogeneous social network. Energy Econ. 2021, 101, 105399.
11. Arrow, K.J. The economic implications of learning by doing. Rev. Econ. Stud. 1962, 29, 155–173.
12. Zhang, Z.; Han, Z. Exploring coevolution in the diffusion of green products between consumers and enterprises—An agent-based model of two-layer heterogeneous networks. J. Clean. Prod. 2024, 450, 141689.
13. Fan, R.; Chen, R. Promotion policies for electric vehicle diffusion in China considering dynamic consumer preferences: A network-based evolutionary analysis. Int. J. Environ. Res. Public Health 2022, 19, 5290.
14. Wu, B.; Liu, P.; Xu, X. An evolutionary analysis of low-carbon strategies based on the government–enterprise game in the complex network context. J. Clean. Prod. 2017, 141, 168–179.
15. Chen, W.; Hu, Z.H. Using evolutionary game theory to study governments and manufacturers’ behavioral strategies under various carbon taxes and subsidies. J. Clean. Prod. 2018, 201, 123–141.
16. Li, J.; Ku, Y.; Liu, C.; Zhou, Y. Dual credit policy: Promoting new energy vehicles with battery recycling in a competitive environment? J. Clean. Prod. 2020, 243, 118456.
17. Zheng, Y.; Liu, D.; An, F.; Wang, J.; Gao, X.; Jia, N. Impact of charging infrastructure construction on electric vehicle diffusion based on a multi-agent model. iScience 2025, 28, 112257.
18. McCoy, D.; Lyons, S. Consumer preferences and the influence of networks in electric vehicle diffusion: An agent-based microsimulation in Ireland. Energy Res. Soc. Sci. 2014, 3, 89–101.
19. He, Z.; Zhou, Y.; Wang, J.; Li, C.; Wang, M.; Li, W. The impact of motivation, intention, and contextual factors on green purchasing behavior: New energy vehicles as an example. Bus. Strategy Environ. 2021, 30, 1249–1269.
20. Zhao, H.; Bai, R.; Liu, R.; Wang, H. Exploring purchase intentions of new energy vehicles: Do “mianzi” and green peer influence matter? Front. Psychol. 2022, 13, 951132.
21. Wang, Y.; Fan, R.; Lin, J.; Chen, F.; Qian, R. The effective subsidy policies for new energy vehicles considering both supply and demand sides and their influence mechanisms: An analytical perspective from the network-based evolutionary game. J. Environ. Manag. 2023, 325, 116483.
22. Zeng, Y.; Dong, P.; Shi, Y.; Wang, L.; Li, Y. Analyzing the co-evolution of green technology diffusion and consumers’ pro-environmental attitudes: An agent-based model. J. Clean. Prod. 2020, 256, 120384.
23. Fan, R.; Chen, R.; Wang, Y.; Wang, D.; Chen, F. Simulating the impact of demand-side policies on low-carbon technology diffusion: A demand-supply coevolutionary model. J. Clean. Prod. 2022, 351, 131561.
24. Piñas, J.M.; Martinez-Gil, F.; Santacruz, M.S.; Fernández, F. Electric vehicle market dynamics: A multi-agent approach to policy, infrastructure, and consumer behavior. Energy Policy 2025, 206, 114800.
25. Cannavacciuolo, L.; Maione, V.; Ponsiglione, C.; Primario, S. Agent-based modelling of electric vehicle adoption: A multidimensional perspective assessment. Res. Transp. Bus. Manag. 2025, 61, 101407.
26. Zhu, L.; Shang, W.L.; Wang, J.; Li, Y.; Lee, C.; Ochieng, W.; Pan, X. Diffusion of electric vehicles in Beijing considering indirect network effects. Transp. Res. Part D Transp. Environ. 2024, 127, 104069.
27. Deng, R.; Shen, N.; Zhao, Y. Diffusion model to analyse the performance of electric vehicle policies: An evolutionary game simulation. Transp. Res. Part D Transp. Environ. 2024, 127, 104037.
28. Fan, X.; Li, C. “To be unfolding” or “be on its last legs”—Preferential tax policies between supply and demand and the development of the new energy vehicle industry based on an ABM model. Energy Policy 2025, 200, 114552.
29. Wu, Y.; Yin, A.; Zheng, Y.; Wang, X.; Zhang, S. Research on the diffusion of electric vehicles based on dynamic games under the influence of environmental regulations. Energy Econ. 2025, 146, 108502.
30. Chen, Q. The study of cooperative behaviour in network evolutionary games based on reinforcement learning. Oper. Res. Fuzzilogy 2024, 14, 1073–1085.
31. Li, F.; Cao, X.; Ou, R. A network-based evolutionary analysis of the diffusion of cleaner energy substitution in enterprises: The roles of PEST factors. Energy Policy 2021, 156, 112385.
32. Watts, D.J.; Strogatz, S.H. Collective dynamics of ‘small-world’ networks. Nature 1998, 393, 440–442.
33. Li, F.; Cao, X.; Sheng, P. Impact of pollution-related punitive measures on the adoption of cleaner production technology: Simulation based on an evolutionary game model. J. Clean. Prod. 2022, 339, 130703.
34. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998; Volume 1.
35. Liang, J.; Miao, H.; Li, K.; Tan, J.; Wang, X.; Luo, R.; Jiang, Y. A review of multi-agent reinforcement learning algorithms. Electronics 2025, 14, 820.
36. Lin, J.; Long, P.; Liang, J.; Dai, Q.; Li, H.; Yang, J. The coevolution of cooperation: Integrating Q-learning and occasional social interactions in evolutionary games. Chaos Solitons Fractals 2025, 194, 116165.
37. Li, M.; Sun, H.; Huang, Y.; Chen, H. Shapley value: From cooperative game to explainable artificial intelligence. Auton. Intell. Syst. 2024, 4, 2.
38. Tao, Y.; Ao, Y.; Long, Y.; Martek, I. Thermal performance of post-disaster housing and its impact on occupant comfort: An integrated ML-ABM approach. Build. Environ. 2025, 281, 113205.
39. Serre, L.; Amyot-Bourgeois, M.; Astles, B. Use of Shapley Additive Explanations in interpreting agent-based simulations of military operational scenarios. In Proceedings of the 2021 Annual Modeling and Simulation Conference (ANNSIM), Fairfax, VA, USA, 19–22 July 2021; IEEE: New York, NY, USA, 2021; pp. 1–12.
40. Wang, L.; Zheng, J. Research on low-carbon diffusion considering the game among enterprises in the complex network context. J. Clean. Prod. 2019, 210, 1–11.
41. China Energy News. China’s New Energy Vehicle Penetration Rate Reaches a Record High of 58.37%. Elephant News, 13 October 2025. Available online: https://www.cnenergynews.cn/article/4OvPMRXLgGn (accessed on 18 December 2025).
42. China Briefing. Breaking Down the US-China Trade Tariffs: What’s in Effect Now? China Briefing. 27 November 2025. Available online: https://www.china-briefing.com/news/us-china-tariff-rates-2025/ (accessed on 10 December 2025).
43. Sun, J.; Tan, C.; Zhao, W. Economic Impact Assessment of US and EU Tariffs on China’s New Energy Vehicle Industry. Res. Financ. Econ. Issues 2025, 66–79.
44. Saltelli, A.; Annoni, P.; Azzini, I.; Campolongo, F.; Ratto, M.; Tarantola, S. Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Comput. Phys. Commun. 2010, 181, 259–270.
45. Saltelli, A. Making best use of model evaluations to compute sensitivity indices. Comput. Phys. Commun. 2002, 145, 280–297.
46. Sobol, I.M. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simul. 2001, 55, 271–280.

Figure 1. Overall diffusion model structure.

Figure 2. An interpretable RL-based decision-making framework.

Figure 3. Convergence analysis of the Q-learning algorithm under three policy scenarios. (a) Scenario 1: Consumer subsidy. (b) Scenario 2: Import tariff policies on raw materials for NEVs. (c) Scenario 3: Other countries’ tariffs on China’s NEVs.

Figure 4. Model validation and comparison of different algorithms. (a) Relative error of simulation data versus real data over time. (b) Comparison of simulated diffusion rate fit against real data.

Figure 5. Multi-agent evolution process in a two-layer small-world network.

Figure 6. (a) The impact of consumer subsidies on the enterprise diffusion rate. (b) The impact of consumer subsidies on the consumer diffusion rate.

Figure 7. (a) The impact of raw material import tariffs on the enterprise diffusion rate. (b) The impact of raw material import tariffs on the average cumulative profits of NEV enterprises. (c) The impact of raw material import tariffs on the average cumulative profits of FV enterprises.

Figure 8. The impact of other countries’ tariffs on the enterprise diffusion rate.

Figure 9. The impact of consumer subsidies and raw material tariffs on the enterprise diffusion rate.

Figure 10. The impact of consumer subsidies and tariffs imposed by other countries on the enterprise diffusion rate.

Figure 11. The impact of raw material tariffs and tariffs imposed by other countries on the enterprise diffusion rate.

Figure 12. SHAP summary plot for predictions based on RFC.

Figure 13. Sobol sensitivity analysis.

Table 1. Summary of all symbols and variable definitions used in the model.

Symbol	Definition
1. Network Structure
$N_{1}$ / $N_{2}$	The number of consumers or enterprises in the network
$d_{1}$ / $d_{2}$	Average degree of consumer and enterprise networks
$β$	Rewiring probability in the network
2. Consumer Modeling
$α (t)$	Proportion of green consumers at time t (the proportion of demand for NEVs)
$ρ$	Proportion of brown consumers
$U_{i}$	Total utility of consumer i
s	Consumer subsidy factor (economic utility parameter)
$d_{i}$	The number of neighbors of consumer i
$k_{i}$	The number of neighbors with the same preference as consumer i
k	Noise intensity level in Fermi rule
3. Enterprise Modeling
$γ (t)$	Proportion of NEV-producing enterprises at time t
Q	Total market demand size (constant)
$q_{NEV}$ / $q_{FV}$	Average output for two types of enterprises
$p_{NEV}$ / $p_{FV}$	Price of NEVs and FVs
$c_{NEV}$ / $c_{FV}$	Production cost of NEVs and FVs
$W_{NEV}$ / $W_{FV}$	Profit for two types of enterprises
$export$	Ratio of NEV exports to domestic market demand
$material\_tariff$	Import tariff rate on raw materials of NEVs
4. Q-learning
S/A	State space and action space
$n_{s}$ / $n_{a}$	The number of states and actions of enterprises
$Q (s, a)$	Q-value, expected cumulative reward for action a in state s
$r_{i}$	The profit difference between an enterprise and its neighbor i
n	The number of neighbors of enterprises
R	The sum of profit differences between an enterprise and its neighbors
$α$	Learning rate
$γ$	Discount factor
$ϵ$	Exploration rate in $ϵ$ -greedy strategy
5. SHAP Value
$ϕ_{j}$	SHAP value representing the marginal contribution of feature j
F	The set of all input features
M	Total number of input features
C	A coalition (subset) of features excluding feature j ( $C \subseteq F \setminus \{j\}$ )
$f_{x}(C)$	Expected model prediction given feature coalition C

Table 2. Key static parameters in the model.

Symbol	Description	Value	Source
Network Structure Parameters
$N_{1}$	Number of consumer nodes	1000	[12]
$N_{2}$	Number of enterprise nodes	300	[7]
$d_{1, 2}$	Average degree of networks ( $d_{1} = d_{2}$ )	6	[40]
$β$	Network rewiring probability	0.1	[32]
Vehicle Attributes
$p_{NEV}$	Price of NEVs	1	[7]
$p_{FV}$	Price of FVs	0.38	[7]
$c_{NEV}$	Production cost of NEVs	0.74	[7]
$c_{FV}$	Production cost of FVs	0.16	[7]
Other Settings
Q	Total market size	300	[33]
k	Noise intensity in Fermi rule	0.1	[33]
$ρ$	Proportion of brown consumers	0.1	[10]
