Research on Two-Way Logistics Operation with Uncertain Recycling Quality in Government Multi-Policy Environment

Abstract: With growing emphasis on environmental protection, countries have introduced a number of circular economy policies aimed at guiding enterprises toward green development and green innovation. Enterprises are also encouraged to strengthen the combination of forward logistics and reverse logistics, which provides an effective way to ease resource shortages and reduce environmental pollution. However, the quality of recycled products in reverse logistics is often uncertain, which not only increases the risk of remanufacturers' production decisions but also affects the sales of new products in forward logistics to a certain extent. Based on the uncertainty of the recycling quality of waste products and the controllability of the remanufacturing technology level, this paper studies the impact of various regulatory environments on the operation of a two-way logistics system. We solve the decision model of the system through game theory. The results show that government policy can improve product recycling quality and remanufacturing technology. However, recycling rewards and punishments, remanufacturing technology subsidies, and recycling payment factors need to be within a certain range to ensure the effectiveness of the trading market. Once recycling regulation has been implemented, it is beneficial for manufacturers to improve their remanufacturing technology level. This means that the combined effect of multiple policies is more conducive to the operation of a two-way logistics system.

Author Contributions: [...] and C.G.; Methodology, C.G.; Visualization, Y.T. and C.G.; Formal analysis, Y.T.; Writing - original draft preparation, Y.T.; Writing - review and editing, C.G.; The final manuscript [...]


Introduction
With the development of economy and technology, the update speed of various products has accelerated, and the demand for products of consumers has gradually increased. This shortens the life cycle of the product to some extent. According to statistics, 41.8 million tons of waste electrical and electronic equipment (WEEE) were generated in the world in 2014. This data is expected to grow at an annual rate of 4 to 5% [1]. At the same time of product renewal and iteration, the disposal of waste products is becoming increasingly serious. If not properly treated, WEEE can pollute the soil and groundwater with hazardous substances, such as lead-based solder, arsenic, and selenium, which pose threats to human health and the environment [2]. Besides, certain components and materials (such as plastics and metals) can be remanufactured to increase resource utilization and reduce manufacturing costs. For example, recycling one ton of cell phones would recover 40 times as much gold as mining one ton of raw ore [3].
Clearly, the disposal of waste products is both a challenge for reducing environmental pollution and an opportunity to improve resource utilization. How to reduce environmental pollution and improve the utilization rate of resources has become the key to effectively dealing with environmental [...] smooth development of the remanufacturing process, which guarantees the "closed" nature of the two-way logistics system. Based on the above analysis, the two-way logistics system under a multi-policy environment has important research significance. A two-way logistics system involves multiple policies; in other words, multiple policies can be applied in the system to improve resource utilization and the environment. This inspires us to study the joint implementation of policies. Our goal is to draw management insights by exploring the following research questions:
• What should decision makers choose to protect their own interests under the "regulation menu" given by the government?
• How do different government policies affect systems with uncertain recycling quality, including their impact on pricing, profit, recycling quality, and remanufacturing technology level?
• What is the impact of joint and separate implementation of policies on system decision-making?
Based on these goals, we establish system decision models with uncertain recycling quality under different policy environments. We find that the implementation of government policies can effectively improve the quality of recycling and the level of remanufacturing technology, which is conducive to the smooth development of the system's remanufacturing activities. Compared with a single policy, a joint policy is more beneficial to the system. The government policy menu needs to be set within certain limits to ensure the effectiveness of market transactions. These findings can provide a reference for the formulation of government policies. Under different policies, the optimal decisions of the manufacturer and retailer are adjusted accordingly. We conduct a specific analysis in Section 5.
The rest of this paper is organized as follows: Section 2 outlines the literature related to our work. Section 3 delineates our modeling assumptions and notation. Section 4 establishes and solves the decision models under four policy environments. Section 5 analyzes the solutions. Section 6 presents the numerical simulation. Section 7 concludes the paper.

Literature Review
Taking the two-way logistics system as the research object, we discuss the effect of different policies on the system when recycling quality is uncertain.
Recycling and remanufacturing form the key link between reverse and forward logistics. Recently, researchers have explored channel selection, production, and pricing decisions for recycling and remanufacturing. Savaskan et al. [16] analyzed the profits of a closed-loop supply chain when the retailer, a third party, or the manufacturer is responsible for recycling. System profits benefit more when the retailer is in charge of recycling. Shulman et al. [17] found that efficiency is highest when manufacturers are in charge of recycling in the case of a bilateral monopoly. Wenhui Zhou et al. [18] developed a two-period model with a selling period run by a manufacturer and a recycling period conducted by two recyclers. Their result shows that when the competitiveness difference between the two recyclers is large, one recycler cannot survive in the market because it fails to offer an attractive price to consumers, so that no consumer "sells" WEEE to it. Jie Wei et al. [19] analyzed the optimal pricing and recovery rate of a closed-loop supply chain under information symmetry and asymmetry. In an Internet environment, Li et al. [20] studied the pricing of a green dual-channel supply chain. In a multiple-manufacturer environment, Bandyopadhyay and Anand [21] discussed the optimal decision under competition. Atasu et al. [22] analyzed the factors affecting remanufacturing profits in monopolistic and competitive environments. Zongsheng Huang et al. [23] considered the impact of stochastic disturbances in the recovery process on enterprise decision-making and coordinated the supply chain through contract design. Through investigations and experiments, Abbey et al. [24] studied the influence of consumers' attention to the quality of remanufactured products on their willingness to pay. Through behavioral experiments, Agrawal et al. [25] studied the impact of remanufactured products on the perceived value of new products and found that third-party remanufactured products could increase the perceived value of new products. Wu and Zhou [26] analyzed the optimal recycling channel in a competitive environment and showed that dealer recycling is a wise choice. Wu C.H. [27] studied the impact of different levels of product detachability on decision making under manufacturer remanufacturing and third-party remanufacturing.
According to the above content, we know that the existing literature on the impact of technical level on decision-making in remanufacturing activities is relatively scarce. The influence of different remanufacturing technology level on remanufacturer is significant. This is not only reflected in the cost of upgrading remanufacturing technology, but also in the positive utility of its effective use of resources and the establishment of social image. Therefore, research on the level of remanufacturing technology cannot be ignored.
Recycling quality significantly affects remanufacturing activities. The design, decision-making, and implementation of a two-way logistics system help reduce the negative impact of low-quality recycled products on remanufacturing. Pokharel and Liang [28] analyzed the optimal recovery price and quantity by dividing product quality into n grades with different recycling prices. Based on this, Gönsch J [29] studied the optimal decision-making problem under an uncertain supply-demand relationship for remanufactured components. Chen W et al. [30] considered the effects of consumer product preferences, recycling quantity, and quality uncertainty on supply chain design. Bhattacharya and Kaur [31] studied the income segmentation problem caused by recovered products of different quality levels. Jeihoonian M et al. [32] discussed the effect of uncertain quality on the network design of a closed-loop supply chain, taking durable goods as an example. Pinçe et al. [33] defined the quality level of used products by the length of the life cycle and studied how a remanufacturing manufacturer dynamically allocates the product's life cycle between warranty claims and sales of refurbished products. Xie G et al. [34] studied the quality improvement of supply chain participants under different supply chain structures for markets with quality competition.
Most scholars set the product quality as an exogenous variable [28][29][30][31][32][33][34]. They argue that product quality is not directly related to corporate recycling and government regulation. However, in real life, the realization of value payment of waste products is an important standard to determine the product entering the reverse logistics. The recycler can determine whether to recycle the product according to the quality of the product. In addition, the recycler can determine the value cost by evaluating the quality of recycled products. In other words, the recycler is proactive in determining the recycled product and its quality. Therefore, it is more appropriate to set the quality of recycled products as an endogenous variable of the recycler in the system.
A sustainable development policy is a powerful tool for the government to improve the recycling rate of resources, reduce pollutant emissions, and promote a green environment in industry. In the two-way logistics system environment, Ma et al. [35] compared the decision-making results and impacts of various entities with or without government subsidy, and the research shows that the policy is beneficial to traditional sales channels. However, its advantage or disadvantage cannot be determined for direct sales channels. Esmaeili et al. [36] established a profit model of closed-loop supply chain considering carbon emission reduction subsidies and conducted sensitivity analysis. Arya et al. [37] studied the influence of corporate social responsibility (CSR) subsidies on supply chain decision-making. Luo et al. [38] studied the decision-making of enterprise competition and cooperation strategy considering carbon emission reduction technology under carbon trading policy. When the recovery rates are the same, Aksen D et al. [39] analyzed the BP models under government support and legislation decision-making environments. Result shows that a higher subsidy is obtained in a government support environment. Toptal et al. [40] discussed the optimal decision-making of green enterprises under different policies of carbon cap-and-trade and carbon trading. Dong et al. [41] designed a variety of contracts to achieve the coordination of the entire supply chain under the carbon trading mechanism. Drake et al. [42] studied the relationship between recovery quality and quantity. It was found that if the remanufacturing cost is low and the recovery quality is low, the recovery subsidy will not have any effect on the total amount of recovery. Esenduran et al. [43] considered the competition between a manufacturer's remanufacturing and a third-party remanufacturer's products, and analyzed the influence of government regulation on remanufacturing.
In studies of government rewards and punishments, taxation, subsidies, and other mechanisms in two-way logistics systems, most scholars regard the amount of recovery as the government's assessment standard. This standard helps the government control the recycling amount but fails to improve the quality of recycled products. The number of recycled products has increased steadily year by year with growing public environmental awareness, enterprises' appreciation of remanufacturing, and government guidance on recycling. On this basis, policy makers are gradually paying more attention to the quality of recycling. Improving recycling quality can improve material utilization and production efficiency. Meanwhile, it also reduces, to a certain extent, the loss of, and pollution from, harmful components caused by non-professional treatment.
In summary, few researchers focus on the impact of different types of policy combinations on system decision-making. Chen, Jen-Yi et al. [44] studied the effects of unit manufacturing and innovation effort subsidy on positive supply chain decisions. It is necessary to discuss the impact of multiple policy environments on the decision-making of two-way logistics systems.
Based on this, the quality of waste products is denoted as an endogenous variable. In the case of uncertain quality levels of recycling, we considered the level of remanufacturing technology and studied the impact of the four combinations of the following types of policies on system decision-making.
(1) Mandatory reward and punishment policy: recycling regulation. This acts on recyclers and is a reward and punishment policy for recycling standards. If the recycler fails to meet the standard, it will be punished by the government.
(2) Selective incentive policy: remanufacturing incentive. It acts on manufacturers and is an incentive policy for remanufacturing technology. Manufacturers who execute according to the policy will be rewarded by the government.
There are several ways of implementing the two types of policies: neither policy is implemented, only one policy is implemented, or both policies are implemented at the same time.

Symbol Definition
Based on the uncertainty of recycling quality, this paper considers a two-way logistics system consisting of one manufacturer, one retailer and consumers. Wherein, the retailer is responsible for selling new products and recycling waste products, and the manufacturer for disposal and remanufacture of waste products. The following decision-making models are established for calculation: (1) no policies; (2) implementation of only one government policy (recycling regulation, remanufacturing incentive); (3) simultaneous implementation of two government policies. The manufacturer can obtain the relevant incentive subsidy from government only by improving remanufacturing technology.
According to Giutini R [12], remanufactured products usually do not differ from newly manufactured products in function, performance, or quality. Combined with the above analysis, the following assumptions and symbol definitions are made for the system.
(1) The wholesale price of a unit product is $\omega$ and the retail price of a unit product is $p$, with $p > \omega > 0$.
(2) The demand for products is $D(p) = a - bp$, where $a > 0$ is the market capacity of the product and $b > 0$ is the sensitivity of consumers to the selling price.
(3) The unit cost of manufacturing a new product with new materials is $c_n$.
(4) Waste products differ in their degree of wear when recycled. The quality factor of a recycled product is denoted $k \in (0, 1)$. The higher the value, the better the recycling quality.
(5) The unit recycling cost of the recycler is $R(k) = dk$, where $d > 0$ represents the maximum unit recovery price that the recycler is willing to pay.
(6) The recycler resells the recovered product to the manufacturer, and the manufacturer's recycling price paid to the recycler is $M(k) = ek$, where $e > 0$ represents the maximum unit recovery price that the manufacturer is willing to pay. The two payment coefficients satisfy $e > d > 0$.
(7) The amount of recycling is $q(k) = f_1 + f_2 k$, where $f_1 > 0$ is the market's basic recoverable amount and $f_2 > 0$ is the sensitivity of the recycled quantity to the quality of the recovery.
(8) Following the representation of the sales effort function in Savaskan R [16], let $I(k) = hk^2$ be the effort cost function for improving product quality, where $h > 0$ is the recovery effort cost factor.
(9) The manufacturer's remanufacturing technology level is $\mu$, with $\mu \in [0, 1)$. The initial remanufacturing technology level is zero. Let $J(\mu) = H\mu^2$ be the effort cost function of the manufacturer for improving the level of remanufacturing technology, where $H > 0$ is the remanufacturing technology level cost factor.
(10) After the product is recycled, the manufacturer is responsible for remanufacturing it, and the unit remanufacturing cost is $c_r(k) = c_n - c_s k - c_t \mu$. Here $c_s$ represents the maximum cost savings from remanufacturing wastes of different quality, and $c_t$ represents the maximum cost savings from upgrading remanufacturing technology. These coefficients satisfy [...]
(11) [...] This represents the total recovery cost of the recycler. It consists of two parts: the cost of recovery effort and the cost of recovery quantity. In this article, the retailer is responsible for recycling.
(12) To strengthen the supervision of recycling of waste products, the government determines the standard recycling rate $\tau_0$ of the retailer. The actual recovery rate is assumed to be $\tau(p, k) = q(k)/D(p)$. The government recycling regulation takes $\tau_0$ as the criterion for recycling reward or punishment. If the actual recovery rate is greater than the standard recovery rate, the retailer is rewarded. Otherwise, the retailer is punished. Let $\varepsilon$ denote the reward and punishment degree for recycling. When $\varepsilon = 0$, there is no reward or punishment; when $\varepsilon > 0$, the unit reward and punishment amount is $\varepsilon(\tau - \tau_0)$. The total reward and punishment amount is $A(p, k) = D(p)\,\varepsilon(\tau - \tau_0)$.
(13) When the government implements the remanufacturing incentive, the government subsidy is $J_0(\mu) = H_0 \mu^2$, where $H_0$ is the degree of subsidy for remanufacturing technology and satisfies $0 < H_0 < H$.
(14) Manufacturer's manufacturing cost: [...] This consists of two parts: the total cost of producing with new materials and the total cost of remanufacturing with old products. Remanufacturing old products reduces the unit production cost to a certain extent, so it needs to be distinguished from the use of new materials. There are two extreme situations:
The extreme case of non-recycling: [...] This means that all products sold are made from new materials.
The extreme case of perfect-recycling production: [...] This means that all products sold are remanufactured from old products.
(15) Profit is denoted $\pi_i^j$, $i \in \{m, r\}$, $j \in \{1, 2, 3, 4\}$, where $m$ and $r$ represent the manufacturer and the retailer, respectively, and $j = 1, 2, 3, 4$ indicates no policies, implementation of recycling regulation, implementation of remanufacturing incentive, and implementation of both policies, respectively. $\pi^j$ represents the sum of the profits of the manufacturer and the retailer in case $j$.
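The building blocks in assumptions (2) to (14) can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation. The functions `recovery_cost` and `manufacturing_cost` are assumptions reconstructed from the verbal descriptions in (11) and (14), since the corresponding formulas are not reproduced in the text; all function names are hypothetical.

```python
def demand(p, a, b):
    """D(p) = a - b*p, assumption (2)."""
    return a - b * p


def recycled_quantity(k, f1, f2):
    """q(k) = f1 + f2*k, assumption (7)."""
    return f1 + f2 * k


def unit_reman_cost(k, mu, c_n, c_s, c_t):
    """c_r = c_n - c_s*k - c_t*mu, assumption (10)."""
    return c_n - c_s * k - c_t * mu


def recovery_cost(k, d, h, f1, f2):
    # ASSUMPTION: effort cost I(k) = h*k^2 plus quantity cost R(k)*q(k) = d*k*q(k),
    # matching the "two parts" described in (11).
    return h * k**2 + d * k * recycled_quantity(k, f1, f2)


def manufacturing_cost(D, q, c_n, c_r):
    # ASSUMPTION: D - q units are made from new materials at c_n,
    # q units are remanufactured at c_r, matching the "two parts" in (14).
    return c_n * (D - q) + c_r * q


def reward_punishment(D, q, eps, tau0):
    """A = D * eps * (tau - tau0) with tau = q / D, assumption (12)."""
    return D * eps * (q / D - tau0)
```

Under these assumptions the two extreme situations in (14) correspond to `q = 0` (cost `c_n * D`) and `q = D` (cost `c_r * D`).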

Model Building
The following decision-making models are constructed according to the above symbol-setting and conditional assumptions.

Decision Model under Case 1
Case 1: No policies. In this case, the production, sales, and recycling activities of the system are not subject to government intervention. Among them, the manufacturer's profit model consists of two parts: (1) the proceeds from the production of products with new materials, (2) the benefits from the remanufacturing of old products. The retailer's profit model consists of three parts: (1) the revenue from the sale of the product, (2) the revenue from recycling, (3) the cost of effort resulting from recycling. This situation will serve as a basic model to facilitate comparison with other situations.
Since the manufacturer is the dominant player, the order of decisions is as follows: the manufacturer first determines the wholesale price ω, and then the retailer decides the retail price p and the recycling quality k. In this case the government does not interfere with the activities of the manufacturer and retailer, so ε = 0 and µ = 0. The profit maximization models of the manufacturer and retailer are established as follows.
[...], the profit function given by Equation (2) under the no-policy case is a concave function with a unique maximum. The same holds for the models that follow.
Proof. This can be proved by constructing the Hessian matrix. The two-way logistics system is a game of complete information. Therefore, backward induction is used to solve the model, yielding the equilibrium decision variables and equilibrium profits (see Table 1).
The specific analysis of the results is presented in Section 5.
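The backward induction for Case 1 can be sketched as follows. Because the profit functions are not reproduced in the text, this sketch assumes a retailer profit of $(p - \omega)D(p) + (e - d)k\,q(k) - hk^2$ and a manufacturer whose wholesale decision only affects $(\omega - c_n)D(p)$; these forms follow the verbal description of the profit components but are assumptions, as are the function names. Under them, the Hessian of the retailer's profit in $(p, k)$ is diagonal and negative definite when $h > (e - d)f_2$.

```python
def solve_case1(a, b, c_n, d, e, f1, f2, h):
    """Backward induction for the no-policy case (eps = 0, mu = 0), assumed model."""
    delta1 = e - d
    if h <= delta1 * f2:
        raise ValueError("assumed retailer profit is not concave in k")
    # Stage 2 (retailer), first-order conditions for a given wholesale price w:
    #   d/dp: (a - b*p) - b*(p - w) = 0          -> p = (a + b*w) / (2*b)
    #   d/dk: delta1*(f1 + 2*f2*k) - 2*h*k = 0   -> k = delta1*f1 / (2*(h - delta1*f2))
    k = delta1 * f1 / (2 * (h - delta1 * f2))
    # Stage 1 (manufacturer): maximise (w - c_n) * D(p(w)); k does not depend on w.
    w = (a + b * c_n) / (2 * b)
    p = (a + b * w) / (2 * b)
    return w, p, k


def retailer_profit(p, k, w, a, b, d, e, f1, f2, h):
    """Assumed retailer profit: sales margin + recycling margin - effort cost."""
    return (p - w) * (a - b * p) + (e - d) * k * (f1 + f2 * k) - h * k**2
```

With the example values from the numerical section and a hypothetical `h = 1.0`, `solve_case1(1.0, 0.1, 3.0, 0.6, 0.9, 0.1, 0.8, 1.0)` returns the wholesale price, retail price, and recycling quality, and small perturbations of `p` or `k` lower `retailer_profit`.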

Decision Model under Case 2
Case 2: Strong regulation. In this case, the government implements recycling regulation. This is a mandatory policy: the government requires companies to achieve sustainable development goals according to certain standards, and companies that fail to do so are punished. Enterprises facing this kind of policy must comply; otherwise, both their profits and their corporate social image suffer. In this article, the government sets a recovery rate standard to assess recycling activities. Compared with Case 1, the retailer's profit now includes the government's reward or punishment based on the recycling rate standard.
In this case, the government implements recycling regulation, which acts on the retailer. That is, $\varepsilon > 0$, $\mu = 0$. The decision sequence is the same as in Case 1. The total reward and punishment amount is $A(p, k) = D(p)\,\varepsilon(\tau - \tau_0)$. The following models are established:
Similarly, the decision models under Case 2 are solved by backward induction. The resulting equilibrium solution and optimal profit are shown in Table 1.
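A sketch of how the recycling regulation shifts the equilibrium is given below. It assumes the same profit structure as the no-policy case plus the transfer $A(p, k) = \varepsilon\,(q(k) - \tau_0 D(p))$ in the retailer's profit; the closed forms are derived from that assumed model, not taken from Table 1, and the function name is hypothetical.

```python
def solve_case2(a, b, c_n, d, e, f1, f2, h, eps, tau0):
    """Backward induction with recycling regulation (eps > 0, mu = 0), assumed model."""
    delta1 = e - d
    if h <= delta1 * f2:
        raise ValueError("assumed retailer profit is not concave in k")
    # The transfer eps*(q(k) - tau0*D(p)) adds eps*f2 to the retailer's
    # first-order condition in k and eps*tau0*b to the one in p.
    k = (delta1 * f1 + eps * f2) / (2 * (h - delta1 * f2))
    # Manufacturer anticipates D(p(w)) = (a - b*w - b*eps*tau0) / 2.
    w = (a + b * c_n - b * eps * tau0) / (2 * b)
    p = (a + b * w + b * eps * tau0) / (2 * b)
    return w, p, k
```

Setting `eps = 0` recovers the no-policy solution. For `eps > 0` the sketch yields a higher retail price, a lower wholesale price, and a higher recycling quality, which is the direction stated in the propositions of Section 5.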

Decision Model under Case 3
Case 3: Incentive promotion. In this case, the government implements the remanufacturing incentive. This is a flexible policy: the government promotes the sustainable development of enterprises through incentives, and enterprises are free to choose whether to respond. If they meet the conditions of the incentive policy, they are rewarded by the government; they can also choose not to respond. It should be noted that enterprises generally need to bear a certain cost to respond to government policy. In addition to the government's rewards, responding to the policy brings some potential benefits, such as a good social image and consumer trust. Compared with Case 1, the manufacturer's profit now includes the government's incentive for improving remanufacturing technology.
In this case, the government implements the remanufacturing incentive. That is, $\varepsilon = 0$, $\mu > 0$. The order of decisions is unchanged. Manufacturers obtain government subsidies by upgrading remanufacturing technology. The decision model is established as:
Similarly, the decision models under Case 3 are solved by backward induction. The resulting equilibrium solution and optimal profit are shown in Table 2.

Decision Model under Case 4
In this case, the recycling regulation and the remanufacturing incentive are implemented simultaneously. That is, $\varepsilon > 0$, $\mu > 0$. The following models are established:
Similarly, the decision models under Case 4 are solved by backward induction. The resulting equilibrium solution and optimal profit are shown in Table 2.
To simplify the expression of the equilibrium solution, let [...]. In Section 5, the results in the tables are compared and analyzed.

Model Analysis
In this section, we discuss the reasons behind each party's profit by comparing the solutions of the decision-making models in the four government policy environments. The following propositions are obtained by analyzing recycling quality, remanufacturing technology level, recycling effort cost coefficient, recycling reward and punishment, remanufacturing technology subsidy degree, and recovery payment coefficient.
Proposition 1. To ensure the effectiveness of market transactions, the degree of recycling reward and punishment has upper and lower limits. These limits are related to the difference between the recovery payment coefficients, $\Delta_1 = e - d$, and to the recovery effort cost factor $h$.

Proof.
(1) When $\varepsilon = 0$, the system is in an environment without regulation.
(2) When $\varepsilon > 0$, the system is in an environment with recycling regulation. The following constraints are satisfied: [...]
As $\Delta_1$ increases or $h$ decreases, the range of $\varepsilon$ decreases.
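One way to see where such a limit can come from is to require the equilibrium quality to stay inside $(0, 1)$. The sketch below assumes the retailer's optimal quality is $k^* = (\Delta_1 f_1 + \varepsilon f_2) / (2(h - \Delta_1 f_2))$, which follows from an assumed retailer profit of $(p - \omega)D(p) + \Delta_1 k\,q(k) - hk^2 + \varepsilon(q(k) - \tau_0 D(p))$. The paper's own constraints are not reproduced in the text, so the bound here is illustrative only and the function names are hypothetical.

```python
def optimal_quality(eps, d, e, f1, f2, h):
    """Retailer's optimal recycling quality under the assumed profit function."""
    delta1 = e - d
    return (delta1 * f1 + eps * f2) / (2 * (h - delta1 * f2))


def eps_upper_bound(d, e, f1, f2, h):
    """Largest eps for which optimal_quality stays below 1 (assumed model)."""
    delta1 = e - d
    return (2 * (h - delta1 * f2) - delta1 * f1) / f2
```

In this sketch the admissible range of `eps` shrinks when `e - d` grows or when `h` falls, which matches the direction stated in the proof.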

Proposition 2.
There are upper and lower limits on the difference between the transfer payment coefficients, $\Delta_1 = e - d$. When the recycling regulation is implemented, the manufacturer will reduce the recycling price of waste products. That is, when $\varepsilon > 0$, the range of $\Delta_1 = e - d$ decreases.
Proof. According to the assumptions, $0 < d < e < c_s$. The following inequality is obtained by Theorem 1. [...]
Proposition 3. The implementation of recycling regulation will increase the sales price and reduce the wholesale price. As the reward and punishment level $\varepsilon$ increases, the sales price gradually increases and the wholesale price gradually decreases.
Proof. According to Tables 1 and 2: $p_r^{1*} = p_r^{3*} < p_r^{2*} = p_r^{4*}$; $\omega_r^{1*} = \omega_r^{3*} > \omega_r^{2*} = \omega_r^{4*}$. Then, [...]
(1) The quality level always satisfies: [...] That is, the implementation of the recycling regulation can effectively improve the quality of the recycled product. Moreover, $\partial k_r^{2*}/\partial \varepsilon = \partial k_r^{4*}/\partial \varepsilon > 0$, which means that the greater the recycling reward and punishment, the better the quality of recycling.
(2) The quality of recycling increases as the difference in recovery payment factors $\Delta_1$ increases, because [...]
(3) The lower the cost of recovery effort, the easier it is to improve the quality of product recovery, because [...]
Proposition 5. Impact of the government policy environment on the remanufacturing technology level:
(1) The simultaneous implementation of both policies will help manufacturers improve their remanufacturing technology.
Proof. By comparing the equilibrium solutions, the difference in remanufacturing technology level between the two cases is always greater than zero.
(2) The level of remanufacturing technology ($\mu_m^{*}$) decreases as the cost of remanufacturing technology after subsidies ($\Delta H = H - H_0$) increases. At the same time, it increases as the level of reward and punishment ($\varepsilon$) increases. In addition, the sensitivity of $\mu_m^{*}$ to changes in $\Delta H$ differs.
Proof. The process of proof is shown in Appendix A (1).
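The comparative statics in Proposition 5 can be illustrated with a short sketch. It assumes that the part of the manufacturer's profit depending on the technology level is $c_t \mu\, q(k) - \Delta H \mu^2$, so that the first-order condition gives $\mu^* = c_t\, q(k^*) / (2\Delta H)$, with $k^*$ taken from the assumed retailer problem. These forms are assumptions reconstructed from the symbol definitions, not the expressions in Tables 1 and 2, and the function name is hypothetical.

```python
def optimal_tech_level(eps, dH, c_t, d, e, f1, f2, h):
    """mu* = c_t * q(k*) / (2 * dH) under the assumed model; dH = H - H0 > 0."""
    delta1 = e - d
    k = (delta1 * f1 + eps * f2) / (2 * (h - delta1 * f2))  # assumed k*
    q = f1 + f2 * k                                          # recycled quantity
    return c_t * q / (2 * dH)
```

With `eps = 0` this corresponds to the incentive-only case and with `eps > 0` to the joint-policy case. In the sketch the joint-policy level is higher, and the level falls as `dH` rises. Keeping the result below 1 requires `dH > c_t * q / 2`, which is one way a limit on `dH` can arise.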

Proposition 6.
To ensure the effectiveness of the trading market, there is a limit on $\Delta H = H - H_0$.
Proof. The process of proof is shown in Appendix A (2).
(2) Profit comparison between Case 3 and Case 1:
• $\pi_m^{3*} - \pi_m^{1*}$ increases as $H_0$ increases and $H$ decreases; equivalently, it decreases as $\Delta H = H - H_0$ increases. The rate of decline gradually slows as $\Delta H$ increases.
Proof. Taking the first-order partial derivative of $\pi_m^{3*} - \pi_m^{1*}$ with respect to $\Delta H$, we get: [...]
• The subsidized remanufacturing technology cost $\Delta H = H - H_0$ has upper and lower limits.
Proof. The process of proof is shown in Appendix A (3).
(3) Comparison of profits in Cases 2 and 3: there exists $H - H_0 = \Delta H^{(2-3)*}$ such that $\pi^{2*} = \pi^{3*}$.
(4) Comparison of profits in Cases 4 and 2: $\pi_m^{4*} - \pi_m^{2*}$ is always greater than zero, and the implementation of recycling regulation increases $\pi_m^{4*} - \pi_m^{2*}$. That is, once recycling regulation has been implemented, it is advantageous for manufacturers to raise their remanufacturing technology level.
Proof. The process of proof is shown in Appendix A (4).
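The sign of $\pi_m^{4*} - \pi_m^{2*}$ can be checked numerically under an assumed manufacturer profit of $(\omega - c_n)D(p) + (c_s k + c_t \mu - ek)\,q(k) - \Delta H \mu^2$. This form is reconstructed from the verbal description of the profit components and is an assumption, as are the function names and the value of `h`, which the numerical section does not list. Under this model the difference reduces to $c_t^2\, q(k^*)^2 / (4\Delta H)$.

```python
def manufacturer_profit(eps, tau0, dH, a=1.0, b=0.1, c_n=3.0, c_s=1.0,
                        c_t=0.5, e=0.9, d=0.6, f1=0.1, f2=0.8, h=1.0):
    """Equilibrium manufacturer profit under the assumed model.

    dH = H - H0 is the post-subsidy technology cost factor;
    dH=None means no technology upgrade (mu = 0). h is a hypothetical value.
    """
    delta1 = e - d
    k = (delta1 * f1 + eps * f2) / (2 * (h - delta1 * f2))
    q = f1 + f2 * k
    w = (a + b * c_n - b * eps * tau0) / (2 * b)
    p = (a + b * w + b * eps * tau0) / (2 * b)
    mu = 0.0 if dH is None else c_t * q / (2 * dH)
    upgrade_cost = 0.0 if dH is None else dH * mu**2
    return (w - c_n) * (a - b * p) + (c_s * k + c_t * mu - e * k) * q - upgrade_cost


def closed_form_gap(eps, dH, c_t=0.5, e=0.9, d=0.6, f1=0.1, f2=0.8, h=1.0):
    """c_t^2 * q(k*)^2 / (4 * dH): profit gain from upgrading, assumed model."""
    delta1 = e - d
    k = (delta1 * f1 + eps * f2) / (2 * (h - delta1 * f2))
    q = f1 + f2 * k
    return c_t**2 * q**2 / (4 * dH)
```

Comparing `manufacturer_profit(eps, tau0, dH) - manufacturer_profit(eps, tau0, None)` with `closed_form_gap(eps, dH)` shows the two agree in this sketch, and the gap is positive and grows with `eps`.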

(5) Comparison of profits in Cases 4 and 3: The implementation of recycling regulation will increase
Proof. The process of proof is shown in Appendix A (5).

Example Analysis
Through example analysis, this section verifies the effects of the recycling reward and punishment degree and the recovery rate standard on profits. Furthermore, the influence of $\Delta H$ on $\mu$ is analyzed. It is assumed that $a = 1$, $b = 0.1$, $c_n = 3$, $c_s = 1$, $c_t = 0.5$, $e = 0.9$, $d = 0.6$, $f_1 = 0.1$, $f_2 = 0.8$. Simulation results obtained with MATLAB are described as follows.

Effects of Recycling Reward or Punishment Degree and Criterion on Profit Difference
With the other parameters fixed, we analyze the effects of $\varepsilon \in (0, 10)$ and $\tau_0 \in (0.2, 0.8)$ on the profit difference, as shown in Figure 1.
When the recycling reward or punishment degree $\varepsilon$ and the criterion $\tau_0$ are within the above ranges, the maximum of $\pi_c^{1*} - \pi_c^{2*}$ is attained at $(\varepsilon, \tau_0) = (0.42, 0.8)$. Because $\tau_0$ is high at this point, it is difficult for recyclers to meet the standard, or meeting it requires greater effort costs. When the standard is far from the actual situation, it is difficult to mobilize the recycler's enthusiasm for recycling, and the recycling regulation acts as a punishment. $\pi_c^{1*} - \pi_c^{2*}$ increases as $\tau_0$ decreases, and the rate of increase gradually accelerates. Here, the recycler is more likely to meet the recovery rate standard, which improves its enthusiasm, and the recycling regulation acts as a reward.
This verifies the results of Proposition 7(1). From a retailer's perspective, recycling regulation can affect their own profits. Compared with the profit in the non-policy environment, retailers should adjust their decision-making variables according to the formulation of government policies to protect their profits. For example: increase the sales price to increase the sales revenue of individual products; increase the recycling price to increase the number of recycled products.
From a government perspective, recycling regulation can help companies improve according to their goals, but the impact of rewards and punishments or reward and punishment criteria on retailer profits is different. From the above picture, the impact of recycling rewards and punishments is more obvious. The government should design the above two indicators in light of the actual situation.

Effects of Recycling Reward or Punishment Degree and Remanufacturing Technology Cost after Subsidy on Profit Difference
With the other parameters fixed, we analyze the effects of $\varepsilon \in (0, 10)$ and $H - H_0 \in (5, 20)$ on the profit difference. The result is shown in Figure 2.
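A sensitivity analysis of this kind can be sketched as a grid sweep over the two parameter ranges. The snippet below is a minimal illustration only: `toy_profit_diff` is a hypothetical stand-in chosen to be positive, increasing in the reward or punishment degree, and decreasing in the cost after subsidy. It is not the closed-form profit difference derived in this paper, and the helper names `interior_grid` and `sweep` are likewise assumptions introduced for the example.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]  # (eps, delta_h, value of the profit difference)


def interior_grid(lo: float, hi: float, n: int) -> List[float]:
    """Return n evenly spaced points strictly inside the open interval (lo, hi)."""
    step = (hi - lo) / (n + 1)
    return [lo + step * (i + 1) for i in range(n)]


def sweep(f: Callable[[float, float], float],
          eps_bounds: Tuple[float, float],
          dh_bounds: Tuple[float, float],
          n: int = 30) -> List[Point]:
    """Evaluate f on an n-by-n grid over the two open parameter ranges."""
    eps_values = interior_grid(eps_bounds[0], eps_bounds[1], n)
    dh_values = interior_grid(dh_bounds[0], dh_bounds[1], n)
    return [(e, d, f(e, d)) for e in eps_values for d in dh_values]


def toy_profit_diff(eps: float, delta_h: float) -> float:
    """Hypothetical stand-in, NOT the paper's closed-form profit difference."""
    return eps ** 2 / 10.0 + 100.0 / delta_h


# Ranges taken from the text: eps in (0, 10), H - H_0 in (5, 20).
surface = sweep(toy_profit_diff, (0.0, 10.0), (5.0, 20.0))
best = max(surface, key=lambda p: p[2])
all_positive = all(value > 0 for _, _, value in surface)
print(f"max at eps={best[0]:.2f}, delta_h={best[1]:.2f}; all positive: {all_positive}")
```

Replacing `toy_profit_diff` with the equilibrium profit expressions of the model would reproduce the kind of surface plotted in the figure; the sweep itself does not depend on the functional form.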
When the recycling reward or punishment degree and the remanufacturing technology cost after subsidy are within the above range, $\pi_m^{4*} - \pi_m^{2*} > 0$. That is, once the recycling regulation is implemented, manufacturers are willing to improve their remanufacturing technology. The larger $\varepsilon$ is, the faster $\pi_m^{4*} - \pi_m^{2*}$ grows.
Figure 3 shows the effect of the remanufacturing technology cost after subsidy $\Delta H$ on the profit difference when $\tau_0 = 0.5$ and $\varepsilon = 0.25$ or $\varepsilon = 0.5$. In this figure, both $\pi_m^{4*} - \pi_m^{2*}$ and $\pi_m^{3*} - \pi_m^{1*}$ decrease as $\Delta H$ increases, and the rate of decrease gradually slows. A large $\Delta H$ means a high cost and a small subsidy, so manufacturers are less motivated to raise the level of remanufacturing technology.
This also verifies the results of Proposition 7(4). Remanufacturing incentives help raise the level of remanufacturing technology while also improving manufacturers' profits, which is a win-win situation. However, the two policies affect manufacturers' profits differently: in the figure, the effect of the remanufacturing subsidy is more pronounced than that of the reward and punishment degree. In this case, if the government adjusts the remanufacturing incentives, the manufacturer should react more cautiously. In the joint policy environment, the higher the degree of reward and punishment, the higher the manufacturer's profit, so the advantages of the joint policy are more obvious. The profits in the figure increase with the subsidy, and they become increasingly sensitive to it. The government can design policies based on changes in the manufacturer's sensitivity to this variable.
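The qualitative behavior read from Figure 3 can be stated compactly. Writing $\Delta\pi_m$ for either $\pi_m^{4*} - \pi_m^{2*}$ or $\pi_m^{3*} - \pi_m^{1*}$ (a shorthand introduced here for convenience, not part of the original notation), and assuming the difference is smooth in $\Delta H$ over the plotted range, a difference that falls with $\Delta H$ at a slowing rate corresponds to

$$
\frac{\partial \Delta\pi_m}{\partial \Delta H} < 0, \qquad \frac{\partial^2 \Delta\pi_m}{\partial (\Delta H)^2} > 0 .
$$

Read in the other direction, a larger subsidy lowers $\Delta H$, so the profit difference rises and does so at an increasing rate, which matches the growing sensitivity to subsidies noted above.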

Effects of Recycling Reward or Punishment Degree and the Remanufacturing Technology Cost after Subsidy on Remanufacturing Technology
With the other parameters fixed, we analyze the effects of $\varepsilon \in (0, 10)$ and $H - H_0 \in (5, 20)$ on the remanufacturing technology level difference $\mu_m^{4*} - \mu_m^{3*}$. The result is shown in Figure 4.
As $\Delta H$ increases, $\mu_m^{4*} - \mu_m^{3*}$ gradually decreases, because a higher $\Delta H$ reduces manufacturers' enthusiasm for improving remanufacturing technology. As $\varepsilon$ increases, $\mu_m^{4*} - \mu_m^{3*}$ gradually increases, because a higher reward or punishment degree encourages improvement of the remanufacturing technology level, which yields positive benefits. This provides support for the operational decision-making of the remanufacturer.
The figure above verifies the results of Proposition 5. The joint policy can effectively improve the level of remanufacturing technology. In the figure, recycling subsidies have a greater impact on the level of remanufacturing technology. This provides a reference for the government when implementing the joint policy.
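The two monotonicity findings for the technology level difference can be summarized in one line. With $\Delta\mu = \mu_m^{4*} - \mu_m^{3*}$ (a shorthand introduced here, not part of the original notation) and $\Delta H = H - H_0$, and assuming $\Delta\mu$ is differentiable over the plotted ranges $\varepsilon \in (0, 10)$ and $\Delta H \in (5, 20)$, the reported trends correspond to

$$
\frac{\partial \Delta\mu}{\partial \Delta H} < 0, \qquad \frac{\partial \Delta\mu}{\partial \varepsilon} > 0 .
$$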

Conclusions
This paper considers the uncertainty of recycled product quality and treats it as an endogenous variable. Taking into account the recycling quality level, the recovery rate standard, and the remanufacturing technology level, we establish manufacturer and retailer decision-making models under four policy environments (no policy, incentive promotion, strong regulation, and joint policy). The analysis yields the following conclusions, which provide a decision-making reference for the government and enterprises.
Suggestions on the formulation of government policies: to ensure the effectiveness of the trading market, the government should pay attention to the relevant constraints when designing policies. In formulating recycling regulations, the effective degree of reward and punishment is affected by many factors (Proposition 1), such as the recovery effort cost coefficient and the recovery payment coefficient. As the collectors' cost of recycling increases, the effective range of rewards and punishments shrinks. In this case, the government should consider reducing the size of rewards and punishments appropriately, thereby easing the recycling pressure on recyclers and ensuring that recycling activities proceed effectively. When the unit income earned by the recycler increases, the effective range of rewards and punishments expands, and the government may consider increasing the degree of rewards and punishments to further promote recycling activities.
In formulating remanufacturing incentives, the degree of remanufacturing subsidy is related to the degree of reward and punishment, the cost of recovery efforts, and the recovery payment factor (Proposition 6, Proposition 7). When the degree of reward and punishment is reduced, the effective range of remanufacturing technology subsidies expands. The government can then consider increasing subsidies for remanufacturing technology and shifting the focus of the system to remanufacturing.
Suggestions on pricing for enterprises: in reverse logistics, when recycling regulations are implemented, manufacturers will consider protecting their profits by reducing the recycling price of used products (Proposition 2). The retailer, in turn, will raise the recycling price to increase the amount recycled, thereby protecting its own profits.
Recycling regulations affect the setting of wholesale prices and retail prices, while remanufacturing incentives have no effect on them (Proposition 3). When recycling regulations are implemented, the retailer transfers the recovery pressure in reverse logistics to the forward sales segment. As a result, the sales price increases with the reward and punishment level. Manufacturers then reduce the wholesale price to increase the number of products sold, thus protecting their own interests.
The role of government policies in sustainable development and the circular economy: government policies can effectively improve the quality of product recycling (Proposition 4) and the level of remanufacturing technology (Proposition 5). This increases the amount of reusable materials and reduces the amount of harmful components that go unrecovered. The greater the reward and punishment, the higher the quality of recycling. The greater the subsidy, the higher the level of remanufacturing technology.
When recycling regulations have been implemented, it is advantageous for manufacturers to raise their remanufacturing technology level (Proposition 5). That is, the joint policy is more conducive to the operation of the two-way logistics system. When the two policies work together, manufacturers will raise their remanufacturing technology level as remanufacturing subsidies increase, in order to increase their own profits. It should be noted that the joint policy reduces the sensitivity of the remanufacturing technology level to subsidies. Therefore, manufacturers should be more cautious in determining the level of remanufacturing technology.
Against the background of environmental protection, this work analyzed the factors that influence a two-way logistics system with uncertain recycling quality. The decision outcomes under different government regulations were compared in order to improve the supervision and incentive effects on recycled products. The results provide a reference for government policy making and enterprise operational decision-making. In follow-up work, we will study the operation of systems with multiple manufacturers and retailers in the Internet environment.