1. Introduction
As the COVID-19 pandemic continues to spread and rebound, medical waste disposal in China faces new challenges in both quantity and quality. On the one hand, the challenge in “quantity” is the explosive growth of medical waste production. For example, in Wuhan, the first city to suffer a COVID-19 outbreak, medical waste production rose from 45 t/d before the outbreak to 155–195 t/d after the outbreak (Yu et al., 2020; Singh et al., 2020) [1, 2], whereas the maximum daily disposal capacity is only 49 tons. Zhenjiang, by contrast, has seen a relatively limited spread of COVID-19, and its confirmed COVID-19 cases amount to only 0.02% of the confirmed cases in Wuhan. The amount of medical waste produced after the outbreak is shown in Figure 1. Even so, medical waste in Zhenjiang increased rapidly from 372.42 tons in the first quarter of 2020 to 532.22 tons in the fourth quarter, and the growth rate of infectious medical waste is particularly prominent. On the other hand, the challenge in “quality” is the higher requirements for medical waste disposal. After the COVID-19 outbreak, the Ministry of Ecology and Environment of China issued the “New Coronavirus-Infected Pneumonia Epidemic Medical Waste Emergency Disposal Management and Technical Guidelines (Trial Implementation)”, which requires disposal enterprises to increase the collection frequency, shorten the residence time of waste before disposal, set up isolation zones, increase cleaning and disinfection, and take other measures to carry out the harmless disposal of medical waste according to stricter emergency disposal management and technical standards.
China has been strengthening its medical waste management system and financial support. In particular, after the outbreak of severe acute respiratory syndrome in 2003, China promulgated a series of laws and regulations on medical waste, such as medical waste management regulations and technical specifications for centralized medical waste disposal, and invested in the construction of medical waste disposal stations and facilities (Wei et al., 2021) [3]. However, because of the dual challenges in “quantity” and “quality” mentioned above, problems have appeared in China’s medical waste disposal process, such as untimely disposal, simplified disposal procedures, mixed disposal with ordinary household garbage, and illegal reselling of medical waste. These inappropriate disposal measures may cause significant harm to the environment and human health and may even exacerbate the spread of diseases such as typhoid, cholera, acquired immunodeficiency syndrome, and COVID-19 (Taslimi et al., 2020; Sarkodie et al., 2021) [4, 5].
Regarding China’s medical waste management system, most of the current medical waste management policies were issued in 2001–2006, and the reward and penalty mechanism for medical waste disposal remains incomplete (Cao et al., 2021) [6]. Under the established regional centralized disposal system for medical waste (Chen et al., 2021) [7], Chinese local governments evaluate and select medical waste disposal enterprises, and medical waste from various medical institutions is collected and transferred to the disposal enterprises for centralized disposal (Song et al., 2018) [8]. This system leads to a co-governance dilemma in medical waste management. The fundamental reason lies in the static reward and penalty mode adopted by local governments, which leaves disposal enterprises without proactive motivation to dispose of medical waste properly (Cao et al., 2021) [6]. Concretely, the disposal charges set by disposal enterprises are subject to the official guide price, which local governments approve once a year. In other words, when disposal enterprises improve the disposal quality, local governments rarely give additional compensation, apart from a slight increase in the approved price of medical waste disposal. Meanwhile, for violations found in daily supervision, local governments usually impose fixed fines rather than penalties that vary continuously with the behavior of disposal enterprises. Therefore, optimizing the reward and penalty model to improve the effectiveness of medical waste management is a key measure to break through the co-governance dilemma.
In fact, scholars have explored endogenous mechanisms for overcoming social dilemmas by adopting game theory, and the classical metaphor for investigating social problems is the prisoner’s dilemma (PD) game (Tanimoto 2015) [9]. Building on the concept proposed by Neumann et al. (1944) [10], Nash (1949) drove game theory forward and made it more applicable in various fields, such as economics, information science, statistical physics, and other social sciences [11]. However, as the applications by Acevedo et al. (2005) [12], Pothos et al. (2011) [13], Bahbouhi et al. (2017) [14], and Babajanyan et al. (2020) [15] illustrate, a theoretical premise of the prisoner’s dilemma is that it is a symmetric game, that is, the two prisoners have equal strength. This premise differs sharply from the current situation of medical waste disposal in China, where local governments are strong and disposal enterprises are weak. Therefore, an asymmetric game model is more suitable for describing the behavioral interaction mechanism of the multiple agents in China’s medical waste disposal. Furthermore, the development of the evolutionarily stable strategy and the Nowak classifications has driven evolutionary game theory to become one of the most exciting fields in science (Tanimoto 2019) [16]. Hofbauer et al. (2003) and Dindo et al. (2011) also proposed that evolutionary game theory is an effective tool for studying the influence of institutional evolution through the selection of strategies, such as multi-agent learning adjustment and group imitation [17, 18]. Combined with evolutionary game tools, the asymmetric game model has been widely used in drug quality supervision, COVID-19 pandemic control in the medical field, and waste disposal in other fields (Ding et al., 2018; Rong et al., 2020; Wei et al., 2020; Yu et al., 2020) [19, 20, 21, 22].
Taking the asymmetric game model as the starting point, this paper establishes evolutionary game systems of disposal enterprises and local governments under four scenarios: static reward and static penalty, dynamic reward and static penalty, static reward and dynamic penalty, and dynamic reward and dynamic penalty. The evolutionary stability of each game system is then analyzed, and its evolutionary trajectory and influencing factors are studied. The main questions addressed in this paper are as follows:
- (1) What are the evolution characteristics of the game system between disposal enterprises and local governments under different reward and penalty modes?
- (2) Which reward and penalty mode implemented by local governments in supervising medical waste disposal has the best effect?
- (3) How do rewards and penalties affect the evolution of the behavioral strategies of disposal enterprises and local governments?
The remainder of this paper is structured as follows. Section 2 summarizes the relevant literature. Section 3 establishes and analyzes the evolutionary game system of disposal enterprises and local governments under the static reward and penalty mode. Section 4 establishes and analyzes the evolutionary game system of disposal enterprises and local governments under the dynamic reward and penalty mode, in response to the shortcomings identified in Section 3. Section 5 provides a case study to simulate and discuss the evolution of the game systems under different reward and penalty modes. Section 6 gives the main conclusions and limitations.
  4. Construction and Analysis of the Evolutionary Game Model under Dynamic Reward and Penalty Mode
Meng et al. (2021) [52] and Liu et al. (2021) [53] showed that the dynamic reward and penalty mechanism is an effective way to relieve the burden of government financial expenditure. Under this mechanism, the rewards or penalties set by local governments vary continuously, and their amount depends on the behavioral strategies of disposal enterprises. For example, the higher the probability that disposal enterprises choose the “low-quality disposal” strategy, the higher the upper limit of the reward and penalty can be set, so as to produce a more effective incentive and deterrent effect. Conversely, when this probability is low, the upper limit of the reward and penalty can be reduced, which avoids both the failure of the reward and penalty mode and the additional financial expenditure caused by inflexible and unrealistic static reward and penalty policies. Therefore, we assume that government rewards and penalties are linear functions of the probability of the disposal enterprise’s behavioral strategy. That is, the reward is 
, and the penalty is 
.
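The linear assumption can be illustrated with a minimal sketch. The function name `dynamic_amount`, the argument `upper_limit`, the variable `y` (probability that the enterprise chooses high-quality disposal), and the specific form `upper_limit * (1 - y)` are assumptions made here for illustration; they are one plausible reading of the description above, not the paper's own formulas.

```python
def dynamic_amount(upper_limit: float, y: float) -> float:
    """Linear reward or penalty that grows with the low-quality probability (1 - y).

    Hypothetical form: `upper_limit` is the maximum reward or penalty and
    `y` is the probability that the enterprise chooses high-quality disposal.
    """
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must be a probability in [0, 1]")
    return upper_limit * (1.0 - y)


# Illustrative values only: a cap of 4 monetary units.
for y in (0.0, 0.5, 1.0):
    print(y, dynamic_amount(4.0, y))
```

Under this reading, the amount equals the cap when every enterprise chooses low-quality disposal and falls to zero when all of them choose high-quality disposal.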
According to the change of reward or penalty, the dynamic reward and penalty mode can be further divided into dynamic reward mode, dynamic penalty mode, and dynamic reward and dynamic penalty mode. Among them, the dynamic reward mode means that the reward is set as a dynamic parameter with the change of disposal enterprise behavior strategy while the penalty is set as a fixed parameter; therefore, it is also called the dynamic reward and static penalty mode. The dynamic penalty mode means that the penalty is set as a dynamic parameter while the reward is set as a fixed parameter; therefore, it is also called the static reward and dynamic penalty mode. Similarly, the dynamic reward and dynamic penalty mode means that reward and penalty are set as dynamic parameters.
  4.1. Game Model Construction and Stability Analysis under Dynamic Reward and Static Penalty Mode
We replace 
 in Formulas (2) and (4) with 
, and obtain the replicator dynamic equations of local governments and disposal enterprises under the dynamic reward and static penalty mode as follows:
According to Formula (6), the four fixed equilibrium points are (0, 0), (1, 0), (0, 1), and (1, 1). When  and ,  is another system equilibrium point, where  and .
The Jacobian matrix of the game system under the dynamic reward and static penalty mode is given in Formula (7):
Based on the above Jacobian matrix, the stability of equilibrium point (1, 0) is also affected by the relationship between 
 and 
. If 
 is met, the equilibrium point (1, 0) is an asymptotically stable point of the game system, whereas the stability of the other equilibrium points is not affected, as shown in Table 6.
  4.2. Game Model Construction and Stability Analysis under Static Reward and Dynamic Penalty Mode
We replace 
 in Formulas (2) and (4) with 
 and obtain the replicator dynamic equations of local governments and disposal enterprises under the static reward and dynamic penalty mode as follows:
According to Formula (8), the four fixed equilibrium points are (0, 0), (1, 0), (0, 1), and (1, 1). When  and ,  is another system equilibrium point, where  and .
The Jacobian matrix of the game system under the static reward and dynamic penalty mode is given in Formula (9):
Based on the above Jacobian matrix, the 
 and 
 signs at equilibrium point 
 are affected by the relationship between 
 and 
. However, regardless of whether the relationship between 
 and 
 changes, 
 cannot evolve into a stable point. The stability results of the five equilibrium points of the game system are shown in 
Table 7 with 
 as an example. The results show that 
, 
, 
, and 
 are all saddle points or unstable points, whereas 
 is asymptotically stable. Therefore, the game system stabilizes at 
.
  4.3. Game Model Construction and Stability Analysis under Dynamic Reward and Dynamic Penalty Mode
Similarly, we replace 
 and 
 with 
 and 
, respectively, in Formulas (2) and (4) and obtain the replicator dynamic equations of local governments and disposal enterprises under the dynamic reward and dynamic penalty mode as follows:
According to Formula (10), the four fixed equilibrium points are (0, 0), (1, 0), (0, 1), and (1, 1). When  and ,  is another system equilibrium point, where  and .
The Jacobian matrix of the game system under the dynamic reward and dynamic penalty mode is given in Formula (11):
The results of stability analysis showed that the stability of equilibrium point (1, 0) is consistent with that of the dynamic reward and static penalty mode, whereas the 
 and 
 signs at equilibrium point (1, 1) change in the same way as under the static reward and dynamic penalty mode. Therefore, we take 
 and 
 as the analysis conditions. The stability results of the five equilibrium points of the game system are shown in 
Table 8. Obviously, (0, 0), (1, 0), (0, 1), and (1, 1) are all saddle points, whereas 
 ultimately evolves into an asymptotically stable point.
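The local stability test used throughout this section, which reads the signs of the determinant and trace of the Jacobian at each equilibrium point, can be sketched numerically. The example below is only an illustration under stated assumptions: the payoff structure in `field`, the parameter names `C_G`, `M`, `F`, `L`, `S`, and their values are hypothetical and do not reproduce Formulas (2) to (11); `x` denotes the probability of strict supervision and `y` the probability of high-quality disposal, and the dynamic penalty is assumed to be `F * (1 - y)`.

```python
from typing import Tuple

# Hypothetical parameters (not the paper's calibration):
# supervision cost, reward, penalty cap, government loss, enterprise cost saving.
C_G, M, F, L, S = 2.0, 1.0, 4.0, 3.0, 1.5


def field(x: float, y: float, dynamic_penalty: bool) -> Tuple[float, float]:
    """Replicator field (dx/dt, dy/dt) of an assumed inspection game."""
    f = F * (1.0 - y) if dynamic_penalty else F  # assumed linear dynamic penalty
    dx = x * (1.0 - x) * ((1.0 - y) * (f + L) - C_G - y * M)
    dy = y * (1.0 - y) * (x * (M + f) - S)
    return dx, dy


def jacobian(x: float, y: float, dynamic_penalty: bool, h: float = 1e-6):
    """Central-difference Jacobian ((a11, a12), (a21, a22)) at (x, y)."""
    fxp, fxm = field(x + h, y, dynamic_penalty), field(x - h, y, dynamic_penalty)
    fyp, fym = field(x, y + h, dynamic_penalty), field(x, y - h, dynamic_penalty)
    a11 = (fxp[0] - fxm[0]) / (2 * h)
    a12 = (fyp[0] - fym[0]) / (2 * h)
    a21 = (fxp[1] - fxm[1]) / (2 * h)
    a22 = (fyp[1] - fym[1]) / (2 * h)
    return (a11, a12), (a21, a22)


def classify(x: float, y: float, dynamic_penalty: bool, tol: float = 1e-6) -> str:
    """Classify an equilibrium from the signs of det J and tr J."""
    (a11, a12), (a21, a22) = jacobian(x, y, dynamic_penalty)
    det = a11 * a22 - a12 * a21
    tr = a11 + a22
    if det < -tol:
        return "saddle"
    if det > tol and tr < -tol:
        return "stable"      # asymptotically stable
    if det > tol and tr > tol:
        return "unstable"
    if det > tol:
        return "center"      # tr J = 0: the linearization is inconclusive
    return "degenerate"


# Interior equilibria of the assumed model: (0.3, 0.625) with a static
# penalty and (0.5, 0.5) with the dynamic penalty.
print("static interior point:", classify(0.3, 0.625, dynamic_penalty=False))
print("dynamic interior point:", classify(0.5, 0.5, dynamic_penalty=True))
print("corner (0, 0), static:", classify(0.0, 0.0, dynamic_penalty=False))
```

In this hypothetical game, making the penalty depend on `y` adds a negative diagonal term to the Jacobian at the interior point, which turns a zero trace into a negative one. This mirrors, in a simplified setting, how a dynamic penalty can convert a non-convergent interior equilibrium into an asymptotically stable one.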
  5. Case and Simulation Analysis
Medical waste disposal in Zhenjiang City, Jiangsu Province, China is used as the case study. We simulate the evolution of the behavioral strategies of local governments and disposal enterprises under different reward and penalty modes. The simulation results can serve as a reference for designing reward and penalty models for medical waste disposal in other areas.
  5.1. A Case Study of Medical Waste Disposal in China
Zhenjiang New Universe Solid Waste Disposal Co., Ltd. is a medical waste disposal enterprise designated by the Zhenjiang municipal government in Jiangsu Province, China and holds a hazardous waste operation license issued by the provincial and municipal environmental protection departments. This disposal enterprise is responsible for the collection, disposal and comprehensive utilization of medical waste from more than 190 medical institutions. According to the operating situation, the regulatory cost savings of Zhenjiang’s local government under “relaxed supervision” may reach 50% (), and the disposal cost savings of Zhenjiang New Universe Solid Waste Disposal Co., Ltd. can reach about 70% () when it reduces the disposal standard. For the convenience of calculation, we worked with the relevant personnel of the Zhenjiang local government and Zhenjiang New Universe Solid Waste Disposal Co., Ltd. and set strict supervision costs to , high-quality disposal costs to , government rewards to , government penalties to , risk costs to , and remediation costs for risk events to . In addition, the discovery probability of low-quality disposal and the occurrence probability of risk events are set as  and , respectively, because of the formalism of relaxed supervision and the contingency of risk events.
  5.2. Numerical Simulation
In this section, we use the MATLAB software to perform numerical simulation and compare the evolution trajectory of the game system under the four reward and penalty modes. Finally, we identify the optimal reward and penalty mode and analyze the influence of rewards and penalties on the behavior strategies of disposal enterprises and local governments.
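A simulation of this kind can be sketched as follows. The paper's simulations were run in MATLAB; the Python sketch below is a stand-in that uses a hand-written fourth-order Runge-Kutta integrator so that it has no external dependencies. The payoff structure in `field`, the parameters `C_G`, `M`, `F`, `L`, `S`, their values, and the starting point (0.2, 0.2) are all hypothetical and are not the Zhenjiang calibration or the paper's replicator equations.

```python
from typing import List, Tuple

# Hypothetical parameters (illustrative only, not the paper's values).
C_G, M, F, L, S = 2.0, 1.0, 4.0, 3.0, 1.5


def field(x: float, y: float, dynamic_penalty: bool) -> Tuple[float, float]:
    """Replicator field of an assumed inspection game.

    x: probability of strict supervision; y: probability of high-quality disposal.
    """
    f = F * (1.0 - y) if dynamic_penalty else F  # assumed linear dynamic penalty
    dx = x * (1.0 - x) * ((1.0 - y) * (f + L) - C_G - y * M)
    dy = y * (1.0 - y) * (x * (M + f) - S)
    return dx, dy


def simulate(x0: float, y0: float, dynamic_penalty: bool,
             t_end: float = 200.0, dt: float = 0.01) -> List[Tuple[float, float]]:
    """Integrate the field with classical RK4 and return the whole trajectory."""
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(int(round(t_end / dt))):
        k1 = field(x, y, dynamic_penalty)
        k2 = field(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], dynamic_penalty)
        k3 = field(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], dynamic_penalty)
        k4 = field(x + dt * k3[0], y + dt * k3[1], dynamic_penalty)
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        path.append((x, y))
    return path


def late_amplitude(path: List[Tuple[float, float]]) -> float:
    """Peak-to-peak range of y over the second half of the run."""
    tail = [p[1] for p in path[len(path) // 2:]]
    return max(tail) - min(tail)


static_path = simulate(0.2, 0.2, dynamic_penalty=False)
dynamic_path = simulate(0.2, 0.2, dynamic_penalty=True)
print("static penalty, late range of y:", late_amplitude(static_path))
print("dynamic penalty, late range of y:", late_amplitude(dynamic_path))
print("dynamic penalty, end point:", dynamic_path[-1])
```

In this assumed model, the static penalty run keeps oscillating, whereas the dynamic penalty run settles at the interior point (0.5, 0.5). Plotting `x` against `y` for each path gives phase portraits comparable in kind to those discussed in the following subsection.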
  5.2.1. Evolution Trajectory of Game System under Different Modes
Based on the initial assignment of parameters, the evolution trajectory of the game system under the static reward and static penalty mode, static reward and dynamic penalty mode, and dynamic reward and dynamic penalty mode are simulated, as shown in 
Figure 3a,c,d, respectively. Parameter 
 is adjusted to 
 to satisfy the analysis condition of 
 under the dynamic reward and static penalty mode, and the evolution trajectory of this game system is shown in 
Figure 3b. The comparison of the evolution trajectories of the game systems under the four reward and penalty modes reveals the following: (1) Under the static reward and static penalty mode (
Figure 3a), the evolution trajectory of the game system is a closed-loop orbit that oscillates around the equilibrium point 
. Both game players keep learning and adjusting their behavioral strategies according to the benefits, but they cannot evolve to a stable state; (2) Under the dynamic reward and static penalty mode (
Figure 3b), the equilibrium point 
 evolves into a stable point after repeated games between local governments and disposal enterprises. At this point, local governments choose the “strict supervision” strategy, but disposal enterprises still choose the “low-quality disposal” strategy; (3) Under the static reward and dynamic penalty mode (
Figure 3c) and the dynamic reward and dynamic penalty mode (
Figure 3d), the evolution trajectories of the game system show a spiral state and gradually converge after a short-term shock. Although the game system hardly evolves to a stable state, it tends to be close to the equilibrium point 
 or 
. That is, under a certain probability, local governments choose the “strict supervision” strategy and disposal enterprises choose the “high-quality disposal” strategy.
  5.2.2. Behavior of Game Players under Different Modes
The simulation results show that the reward and penalty models have a remarkable impact on the evolution trajectory of the game system. However, the evolution direction under the dynamic reward and static penalty mode does not conform to the social benefits of medical waste disposal. Therefore, the improved models, namely, the static reward and dynamic penalty mode and the dynamic reward and dynamic penalty mode, are more reasonable and conform to actual needs. We simulate the evolution law of behavior to further compare the effects of the two improved models on the behavior of local governments and disposal enterprises, and the results are shown in 
Figure 4a,b, respectively. The behavior of local governments and disposal enterprises tends to stabilize after short-term shocks under both reward and penalty modes, and the amplitude of the shocks under the dynamic reward and dynamic penalty mode is considerably higher than that under the static reward and dynamic penalty mode. Furthermore, compared with the dynamic reward and dynamic penalty mode, under the static reward and dynamic penalty mode the period needed for local governments and disposal enterprises to stabilize is shorter, the probability that disposal enterprises choose the “high-quality disposal” strategy is slightly higher, and the probability that local governments implement the “strict supervision” strategy is substantially lower. Overall, the static reward and dynamic penalty mode performs better than the dynamic reward and dynamic penalty mode.
  5.2.3. Impact of Reward and Penalty on the Behavior of Game Players
The static reward and dynamic penalty model is used as an example to further simulate the influence of reward (
) and penalty (
) on the behavior of local governments and disposal enterprises. The assignment of 
 and 
 is changed, whereas the assignment of the other parameters is consistent with the original assignment. The simulation results are shown in 
Figure 5a,b, and 
Figure 6a,b. 
Figure 5a,b show that as the amount of reward increases, the oscillation range of the behavior of local governments and disposal enterprises becomes larger, the period needed for the game process to stabilize becomes longer, and the effect on the behavior of local governments is markedly greater than that on disposal enterprises. Although the probability of disposal enterprises choosing the “high-quality disposal” strategy increases slightly, the tendency of local governments to favor the “relaxed supervision” strategy is more pronounced. Similarly, Figure 6a,b show that as the amount of penalty increases, the oscillation range of the behavior becomes smaller, the period needed to reach stability becomes shorter, and the effect on the behavior of local governments is again greater than that on disposal enterprises. In addition, the probability of disposal enterprises choosing the “high-quality disposal” strategy increases only slightly, whereas local governments choose the “relaxed supervision” strategy with a higher probability.
  5.3. Results Discussion
The stability analysis and simulation results show that the behavioral interaction mechanism between local governments and disposal enterprises is complex and diverse. In the given realistic assumptions, when the conditions, “
 and 
”, are met, the game system between local governments and disposal enterprises has no ESS under the static reward and static penalty mode. This is consistent with the research conclusions of Meng et al. (2021) [
52] and Liu et al. (2021) [
53] in other fields. Conversely, if the above conditions are not met, the game system under the static reward and static penalty mode has at least one ESS. In particular, the stable point 
 indicates that local governments choose the “relaxed supervision” strategy and disposal enterprises choose the “high-quality disposal” strategy, which is the most ideal state.
The optimization results show that the game system has an ESS under the dynamic reward or dynamic penalty mode, and the optimization effect is remarkable. However, under the dynamic reward and static penalty mode, although local governments choose the “strict supervision” strategy, disposal enterprises still choose the “low-quality disposal” strategy, which does not meet the realistic needs of local governments. Meanwhile, under the other two modes (static reward and dynamic penalty, and dynamic reward and dynamic penalty), the probability of disposal enterprises choosing the “high-quality disposal” strategy is about 0.8 in both cases, but the probability of local governments choosing the “strict supervision” strategy is about 0.6 and 0.8, respectively. In summary, the static reward and dynamic penalty mode is the most effective scheme for reducing the supervision pressure on local governments and improving the disposal quality of medical waste. It approaches the ideal state in which local governments do not need to supervise strictly all the time and disposal enterprises still dispose of medical waste to high standards.
Rewards and penalties have an important influence on the interaction mechanism between local governments and disposal enterprises, and their influence on the behavior of local governments is more pronounced. On the one hand, long-term high rewards place greater financial pressure on local governments, which may lead to management dilemmas (Wang et al., 2020) [54]. Therefore, the probability of local governments turning to relaxed supervision may increase. Furthermore, after receiving a government reward, disposal enterprises may behave opportunistically and use the reward for other risky projects instead of improving the disposal quality of medical waste, which renders the government reward ineffective. On the other hand, when the increase in the government penalty is small relative to the cost that disposal enterprises incur to improve the disposal quality of medical waste, the deterrent force of the penalty is weak, and the disposal quality of medical waste is difficult to improve markedly. Nevertheless, given the favorable corporate image and public reputation brought by high-quality disposal behavior, disposal enterprises begin to pursue the excess returns of long-term development. In summary, the dynamic reward and penalty model could be a useful tool for capacity-limited regulators to regulate an enterprise’s environmental behavior effectively, and governments should pay attention to the transition from campaign-style enforcement to normal enforcement in China (Jin et al., 2017) [55]. That is, even if local governments no longer strictly supervise them, disposal enterprises will continue to improve the disposal quality of medical waste.
  6. Conclusions
Based on evolutionary game theory, this paper constructs a game system between medical waste disposal enterprises and local governments and then analyzes and compares the stability of the game system under four modes, namely, the static reward and static penalty mode, dynamic reward and static penalty mode, static reward and dynamic penalty mode, and dynamic reward and dynamic penalty mode. Compared with existing case studies and survey research on medical waste disposal (Sarker et al., 2014; Niyongabo et al., 2019; Ahmad et al., 2021) [1, 42, 43], this study reveals the internal evolution mechanism of the behavioral strategies of disposal enterprises and local governments under the influence of rewards and penalties and verifies the game system through numerical simulations. The main research conclusions and related suggestions are as follows.
First, the static reward and static penalty model implemented by local governments is not universally applicable. In some circumstances, the behavioral strategy choices of medical waste disposal enterprises and local governments enter a vicious circle, and the game system cannot evolve to a stable strategy combination, which is not conducive to improving the disposal quality of medical waste. Therefore, local governments need to optimize the static reward and penalty mode and make dynamic adjustments according to the behavior of medical waste disposal enterprises to achieve the best reward and penalty effect.
Second, among the three optimized dynamic reward and penalty models, the static reward and dynamic penalty model has the best effect. This mode can minimize the financial burden on local governments and improve the disposal quality of medical waste. However, information asymmetry exists between medical waste disposal enterprises and local governments: disposal enterprises may deliberately hide medical waste disposal information in pursuit of high profits, in which case the penalty policy of local governments cannot achieve its deterrent effect. Accordingly, local governments should broaden the channels of information feedback, improve the transparency of the behavior of medical waste disposal enterprises with the help of third-party forces, such as the public and the media, and implement the dynamic adjustment of penalties.
Finally, local governments play a leading role in medical waste disposal, and their reward and penalty policies and supervision measures have a remarkable impact on the behavioral strategies of medical waste disposal enterprises. In particular, disposal enterprises tend to adopt a high-quality disposal strategy as rewards and penalties increase. However, excessive rewards not only put pressure on government finances but also reduce the marginal effect of incentives. Consequently, local governments should set different reward and penalty levels for different types of medical waste disposal enterprises. For example, if the disposal capacity of disposal enterprises is insufficient, local governments can increase the reward and reduce the penalty; if disposal enterprises are strong, local governments can give priority to supervision and penalty and use reward as a secondary means.
This paper has some limitations, and the research will be expanded in two respects in the future. First, the basic nature of evolutionary game theory implies incomplete information between game players, which may lead to moral hazard and adverse selection. The design of a co-governance mechanism with the participation of other stakeholders (such as the media and the public) is worthy of further discussion in order to standardize the disposal of medical waste more effectively. Second, the behavioral strategy choices of medical waste disposal enterprises do not depend entirely on government rewards and penalties and are also disturbed by random external factors. We will draw on the concept of stochastically stable equilibrium (Foster et al., 1990) [56] and expand the game model accordingly.