Estimation of Initial Stock in Pollution Control Problem

Abstract: A two-player differential game of pollution control with an uncertain initial disturbance stock is considered. In keeping with contemporary policy in the resource extraction industry, our study is based on a resource extraction differential model with a rehabilitation process, in which the firms are required to compensate the local community for rehabilitating polluted and dilapidated areas. Because the initial pollution stock plays a critical role in production and its actual value cannot be determined exactly, we instead investigate the estimation of the initial stock by means of the Pontryagin maximum principle (PMP). The subsequent analysis, based on the normalized value of information (NVI), quantifies how different estimates of the initial stock affect the final payoff in both the cooperative and non-cooperative cases. With this guidance, a player can make a more judicious decision when choosing the value of the initial stock. A numerical example is also presented for illustration.


Introduction
Problems involving the value of information (VoI), a decision-analytic approach for evaluating the potential value contained in uncertainty, are ubiquitous across many subjects. As indicated in [1], VoI analysis has been in use since at least the 1990s. It is applied widely in economics, infrastructure, the environment, energy, medical systems and other fields; see, for example, [2][3][4][5][6][7]. Among these applications, the VoI contained in the estimated value of one specific model parameter, e.g., the estimated amount of oil underground, plays a significant role in economic activities.
Estimated values that carry VoI arise routinely. For instance, to calculate the velocity of an object moving with constant acceleration, the initial speed $v_0$ has to be observed. Likewise, an initial point must be chosen when the gradient descent method is used to solve an optimization problem. The list of such cases is long, but their common feature is that the estimated value is an indispensable part of the problem and may dramatically affect the interests of the decision maker.
The models on which such problems are based are numerous. However, studies that combine VoI with differential games, and that highlight how information about an uncertain parameter actually affects the outcome, are scarce. Previously, we studied uncertainty in differential game models in [8]. Continuing from [9], this paper concentrates on uncertainty in the initial stock. Compared with the former work, here we consider the pollution control problem with a rehabilitation process for the first time. Rather than continuing with the traditional model, we adopt the rehabilitation model, which reflects updated policy aimed at improving the regulation of resource extraction. From the discussion of how reclamation liability affects investment in [10,11] to the disadvantages of reclamation bonds in [12,13], policy concerning the rehabilitation process is still being reformed [10,14,15]. Therefore, on the basis of [16], we study a differential game in which two firms extract a resource while paying for reclamation. Related research on uncertainty in a dynamic game of pollution emissions appears in [17], where the uncertainty is expressed through the information available to the players, which can be acquired completely.
In this work, we give a comprehensive treatment of the estimation of the initial stock, covering overestimation and underestimation in both the cooperative and non-cooperative cases. Decision makers can thus see the full picture of how various estimates of the initial stock perform and make a reliable judgment afterwards. We simplify the model of [16] in order to focus on the role of the estimated initial stock rather than on finding an optimal solution that minimizes the firms' rehabilitation costs. The game is assumed to be homogeneous, in the sense that certain parameters of the two firms are identical.
The paper is organized as follows. In Section 2, we present the formulation of the problem and admissible results. The players' payoffs in the cooperative case under overestimation and underestimation of the initial stock are derived in Section 3, and the corresponding non-cooperative case is described in Section 4. The detailed VoI analysis of the problem, in terms of the estimation of the initial stock under the two possible conditions, is given in Section 5, and an illustrative example is provided in Section 6. Finally, Section 7 gives our conclusions.

Problem Statement
Consider a differential game in which two firms $z \in \{i, j\}$ work on resource extraction with disturbance stock $p_t$ at time $t \in [0, T]$. The extraction rate is $\gamma_z p_t$, the environmental disturbance rate is $e^z_t = \varepsilon_z \gamma_z p_t$, and the abatement is $a^z_t = \alpha_z \tau^z_t$, where $\gamma_z$, $\varepsilon_z$, $\alpha_z$ are positive constant parameters. The reclamation effort $\tau^z_t$ is the control variable.
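The system dynamics are given by
$$\dot p_t = \sum_{z \in \{i, j\}} \left(e^z_t - a^z_t\right) - \delta p_t = (\varepsilon_i \gamma_i + \varepsilon_j \gamma_j - \delta)\, p_t - \alpha_i \tau^i_t - \alpha_j \tau^j_t, \tag{1}$$
where $\delta > 0$ is the natural revitalization rate of the environment and the growth rate of the environmental pollution stock without abatement satisfies $\varepsilon_i \gamma_i + \varepsilon_j \gamma_j - \delta > 0$. The goal of the firms is to choose the optimal reclamation effort $\tau^z_t$ so as to minimize the cost they devote to the revitalization procedures. The cooperative objective functional is
$$K(p_0, \tau^i, \tau^j) = \int_0^T \left( \frac{(\tau^i_t)^2}{2} + \frac{(\tau^j_t)^2}{2} \right) dt + 2 f(p_T), \tag{2}$$
while the noncooperative objective functional is
$$K_z(p_0, \tau^i, \tau^j) = \int_0^T \frac{(\tau^z_t)^2}{2}\, dt + f(p_T), \quad z \in \{i, j\}, \tag{3}$$
where $p_0$ stands for the initial disturbance stock, the reclamation cost of firm $z$ is $(\tau^z_t)^2 / 2$, and the abandonment reclamation fee for each firm at the terminal time is $f(p_T) = \varphi p_T^2 / 2$, in which $\varphi$ is a constant coefficient of terminal cost. The coefficient $\varphi$ can also be adjusted in accordance with the latest policy.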
Since the game is assumed to be homogeneous, one can rewrite (1) as
$$\dot p_t = (2 \varepsilon \gamma - \delta)\, p_t - \alpha \left(\tau^i_t + \tau^j_t\right),$$
and the updated constraint is $2 \varepsilon \gamma - \delta > 0$. This game was presented in [16], where closed-loop solutions were considered. In our setting, we assume that the players do not have accurate information about the initial disturbance stock and cannot observe $p_t$ at any moment of time. Under such conditions, closed-loop control cannot be used, so we focus on the open-loop solution and estimate the value of information about the initial state of the system.
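To illustrate what open-loop control means here, the following minimal sketch integrates the stock for controls that depend on time only. It assumes homogeneous dynamics of the form dp/dt = (2εγ − δ)p − α(τ^i + τ^j) with initial time 0; the function name `simulate_stock`, the Runge-Kutta integrator and all numerical values are illustrative choices and are not taken from [16].

```python
import math

def simulate_stock(p0, growth, alpha, tau_i, tau_j, T, steps=10_000):
    """Integrate dp/dt = growth*p - alpha*(tau_i(t) + tau_j(t)) with classical RK4.

    growth stands for 2*eps*gamma - delta (assumed positive).
    tau_i and tau_j are open-loop controls, i.e. functions of time only.
    """
    def rhs(t, p):
        return growth * p - alpha * (tau_i(t) + tau_j(t))

    h = T / steps
    p, t = p0, 0.0
    for _ in range(steps):
        k1 = rhs(t, p)
        k2 = rhs(t + h / 2, p + h * k1 / 2)
        k3 = rhs(t + h / 2, p + h * k2 / 2)
        k4 = rhs(t + h, p + h * k3)
        p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return p

# Illustrative values only: eps = 0.5, gamma = 0.3, delta = 0.1.
growth = 2 * 0.5 * 0.3 - 0.1
no_effort = lambda t: 0.0
p_T = simulate_stock(p0=10.0, growth=growth, alpha=1.0,
                     tau_i=no_effort, tau_j=no_effort, T=5.0)
```

With zero effort the stock grows like p_0 exp((2εγ − δ)t), which is the growth without abatement mentioned above. Because the controls never see p_t, any error in p_0 propagates through the whole horizon.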

Cooperative Case
Following [18], to find an open-loop solution, we define the Hamiltonian function $H$ in accordance with (2), in which $\psi$ is the adjoint variable. The adjoint variable is obtained from the canonical system with the transversality condition $\psi(T) = -\frac{d}{dp_t}\left(\varphi p_t^2\right)\big|_{t=T}$. From the first-order extremality condition and the sign of the second derivative of $H$ with respect to the control variable, we obtain the optimal control $(\tau^z_t)^*$ for each firm, the corresponding optimal trajectory $p^*_t$ for both firms, and the total payoff of both firms. Note that with the controls $(\tau^z_t)^*$, $z \in \{i, j\}$, the players will not completely clean up the environment by time $T$, since $p^*_T > 0$.
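The steps of the maximum principle can be traced in a short sketch. It assumes the homogeneous dynamics dp/dt = (2εγ − δ)p − α(τ^i + τ^j), initial time 0, and the joint cost ∫(τ_i² + τ_j²)/2 dt + φ p_T². The closed-form expressions in the code are a derivation under these assumptions, and the function name `cooperative_open_loop` and the parameter values are hypothetical.

```python
import math

def cooperative_open_loop(p0, growth, alpha, phi, T):
    """Open-loop cooperative solution from the maximum principle (sketch).

    Assumed model: dp/dt = growth*p - alpha*(tau_i + tau_j),
    with growth = 2*eps*gamma - delta > 0, and
    joint cost = integral of (tau_i^2 + tau_j^2)/2 over [0, T] + phi*p(T)^2.
    """
    c = 2 * alpha**2 * phi * (math.exp(2 * growth * T) - 1)
    # Terminal stock solves p_T = exp(growth*T)*p0 - (c/growth)*p_T.
    p_T = growth * math.exp(growth * T) * p0 / (growth + c)

    def psi(t):
        # Adjoint equation dpsi/dt = -growth*psi with psi(T) = -2*phi*p_T.
        return -2 * phi * p_T * math.exp(growth * (T - t))

    def tau(t):
        # First-order condition -tau - alpha*psi = 0; second derivative is -1.
        return -alpha * psi(t)

    def p(t):
        used = (1 - math.exp(-2 * growth * t)) / growth
        return math.exp(growth * t) * (
            p0 - 2 * alpha**2 * phi * p_T * math.exp(growth * T) * used)

    joint_cost = phi * p_T * math.exp(growth * T) * p0
    return tau, p, p_T, joint_cost

# Illustrative parameter values only.
tau, p, p_T, K = cooperative_open_loop(p0=10.0, growth=0.2, alpha=1.0, phi=0.5, T=5.0)
```

In this sketch the terminal stock is strictly positive whenever p_0 > 0, in line with the remark that the firms do not fully clean up the environment by time T.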

Two Cases Over Estimation
Regarding the uncertainty of the initial pollution stock, two cases can occur. The measured value of the initial stock can be overestimated relative to the actual one, or it can be underestimated. There is also a third case, in which the estimate coincides with the actual value, but it reduces to the solution above and requires no further discussion.
Suppose $\hat p_0$ represents the estimated initial stock. Then the updated optimal control and optimal trajectory are given by (6) and (7), respectively.

Overestimation of the Initial Stock
When the estimated initial stock $\hat p_0$ is above the actual one (i.e., $\hat p_0 > p_0$), there are two possible outcomes, depending on how much $\hat p_0$ differs from $p_0$. If the difference is not very large, then we obtain $\hat p^*_T \ge 0$, as in the original problem. Otherwise, the firms may clean up and reclaim the environment by some time $\hat t < T$; in that event, $\hat p^*_T < 0$.
First, consider the case $\hat p^*_T \ge 0$. According to (7), this requirement leads to an inequality that holds if condition (8) is satisfied. Writing $\hat p_0$ in terms of $p_0$ as $\hat p_0 = r p_0$, we obtain inequality (8) in the form (9). When inequality (9) is satisfied, the total payoff of the players using controls (6) is evaluated over the whole interval up to $T$.
When condition (9) is not satisfied, we have $\hat p^*_T < 0$. This means that the firms will clean up and reclaim the environment by the time $\hat t < T$. To find the moment $\hat t$ such that $\hat p_{\hat t} = 0$, one can solve this equation for $e^{-2(2\varepsilon\gamma - \delta)\hat t}$ rather than for $\hat t$ explicitly. The resulting controls and trajectory are then defined for $t \in [t_0, \hat t]$, and the total payoff is computed accordingly.
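The two overestimation outcomes can be told apart numerically. The sketch below assumes the homogeneous dynamics dp/dt = (2εγ − δ)p − α(τ^i + τ^j), initial time 0, the joint terminal cost φ p_T², and that both firms apply the open-loop cooperative controls computed for the estimate r·p_0. The tolerance `theta`, the function name and the parameter values are constructs of this sketch, derived under those assumptions.

```python
import math

def overestimation_regime(r, growth, alpha, phi, T):
    """Classify the estimate r*p0 in the cooperative game (sketch).

    Returns (theta, t_hat). The true stock stays nonnegative up to T when
    r <= 1 + theta, and then t_hat is None. Otherwise t_hat < T is the time
    at which the true stock reaches zero. The true p0 cancels out of both.
    """
    E = math.exp(2 * growth * T)
    c = 2 * alpha**2 * phi * (E - 1)
    theta = growth / c
    if r <= 1 + theta:
        return theta, None
    # x = exp(-2*growth*t_hat), obtained by setting the stock to zero.
    x = 1 - (growth + c) / (2 * alpha**2 * phi * r * E)
    return theta, -math.log(x) / (2 * growth)

# Illustrative values only: a mild and a strong overestimate.
theta, t_mild = overestimation_regime(1.02, growth=0.2, alpha=1.0, phi=0.5, T=5.0)
_, t_strong = overestimation_regime(1.5, growth=0.2, alpha=1.0, phi=0.5, T=5.0)
```

Solving for the exponential first, as the text suggests, keeps the computation to one logarithm. In this sketch a larger overestimate leads to an earlier clean-up time.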

Underestimation of the Initial Stock
In the case when the players underestimate the initial stock, i.e., use a value $\hat p_0 < p_0$ as the initial condition (so that $0 < r < 1$), we again obtain $\hat p^*_T \ge 0$. The players use strategies (6), and the total payoff is identical to $K(p_0, (\hat\tau^i_t)^*, (\hat\tau^j_t)^*)$ obtained under condition (9).

Non-cooperative Case
Switching to (3), we again determine the new Hamiltonian function. Proceeding as in the cooperative case, we obtain the optimal control, the optimal trajectory and the payoff of each firm, and hence the total payoff of the firms.
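A sketch of the corresponding open-loop Nash computation is given below. It assumes the homogeneous dynamics dp/dt = (2εγ − δ)p − α(τ^i + τ^j), initial time 0, and that each firm minimizes its own cost ∫τ_z²/2 dt + φ p_T²/2 while taking the other firm's control as given. The formulas are a derivation under these assumptions, and the function name and parameter values are hypothetical.

```python
import math

def noncooperative_open_loop(p0, growth, alpha, phi, T):
    """Symmetric open-loop Nash solution (sketch).

    Each firm's adjoint satisfies dpsi/dt = -growth*psi with
    psi(T) = -phi*p_T, so its control is tau(t) = -alpha*psi(t).
    Returns (tau, p_T, cost_per_firm).
    """
    c = alpha**2 * phi * (math.exp(2 * growth * T) - 1)
    # Terminal stock solves p_T = exp(growth*T)*p0 - (c/growth)*p_T.
    p_T = growth * math.exp(growth * T) * p0 / (growth + c)

    def tau(t):
        return alpha * phi * p_T * math.exp(growth * (T - t))

    # Effort cost phi*c*p_T^2/(4*growth) plus terminal fee phi*p_T^2/2.
    cost_per_firm = phi * p_T**2 * (c + 2 * growth) / (4 * growth)
    return tau, p_T, cost_per_firm

# Illustrative parameter values only.
tau_n, p_T_n, K_z = noncooperative_open_loop(p0=10.0, growth=0.2, alpha=1.0, phi=0.5, T=5.0)
```

Under these assumptions each firm internalizes only its own terminal fee, so the effort is half of the cooperative effort for a given terminal stock and more pollution remains at time T.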

Two Cases Over Estimation
In the non-cooperative case, the classification of the estimates of the initial stock is identical to the previous one: overestimation, equality and underestimation. Assuming that both firms use the same estimate $\hat p_0$ of the initial stock, the corresponding optimal control and optimal trajectory are given by (12). Since the calculations for overestimation and underestimation are similar to those described above, the result for this case is stated directly in Proposition 2.

Normalized Value of Information for Estimation
We continue the line of research in [8,9], and define the problem of determining the value of information in a continuous time differential game. In contrast to the cooperative case, for a non-cooperative game, when using unreliable information, the costs of players could decline. In this case, we assume that the value of information about the initial condition is equal to zero.

Definition 2.
The normalized value of information in a noncooperative game for player $z \in \{i, j\}$ is denoted by $V_{NC}$. The NVI characterizes the maximal amount the decision maker may be willing to pay to obtain precise information about the value of the parameter of interest. The larger the values of $V_C$ and $V_{NC}$, the larger the amount that the decision maker loses when using inaccurate information.
Based on the results obtained in Section 3, we can formulate the following proposition.

Proposition 1. In the cooperative game (1) and (2), the value of information about the initial disturbance stock $p_0$ is given by $V_C$ as a function of $r$ and $\theta$.
Here, $r = \hat p_0 / p_0$, $\hat p_0$ is an estimate of the initial disturbance stock, and $\theta$ is a positive constant determined by the model parameters.
Proof. The proof follows directly from the results obtained in Section 3.
Note that one formula for the value of information applies when $0 < r < 1$ or $1 < r \le 1 + \theta$. If $r = 1$, then $V_C = 0$. If $r > 1 + \theta$, a different formula for the NVI applies.

Proposition 2.
In the noncooperative game (1), (3), the value of information about the initial disturbance stock $p_0$ is given by $V_{NC}$. Here, $r = \hat p_0 / p_0$ and $\hat p_0$ is an estimate of the initial disturbance stock.
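The remark that unreliable information can lower a player's cost in the non-cooperative game can be checked with a small sketch. It assumes the homogeneous dynamics dp/dt = (2εγ − δ)p − α(τ^i + τ^j), initial time 0, the per-firm cost ∫τ_z²/2 dt + φ p_T²/2, and that both firms plan with the same estimate r·p_0. It also takes the NVI to be the relative cost increase, set to zero when the cost declines. That definition, the cost formulas and all names are assumptions of this sketch.

```python
import math

def player_cost(r, p0, growth, alpha, phi, T):
    """Cost of one firm when both firms plan with r*p0 and the true stock is p0."""
    E = math.exp(2 * growth * T)
    c = alpha**2 * phi * (E - 1)
    D = growth + c
    scale = phi * E * p0**2 / D**2
    if r <= D / c:
        # Stock stays nonnegative until T: effort cost plus terminal fee.
        effort = r**2 * c * growth / 4
        terminal = (D - r * c)**2 / 2
        return scale * (effort + terminal)
    # Stock is cleaned up before T: only the effort cost remains.
    return scale * r * growth * D / 4

def nvi_noncooperative(r, p0, growth, alpha, phi, T):
    """Relative cost increase from misestimation, clipped at zero."""
    exact = player_cost(1.0, p0, growth, alpha, phi, T)
    loss = (player_cost(r, p0, growth, alpha, phi, T) - exact) / exact
    return max(0.0, loss)

# Illustrative parameter values only.
args = dict(p0=10.0, growth=0.2, alpha=1.0, phi=0.5, T=5.0)
mild_over = nvi_noncooperative(1.03, **args)
under = nvi_noncooperative(0.8, **args)
```

In this sketch a slight common overestimate makes both firms exert more effort than in the Nash solution, which moves them toward the cooperative outcome and reduces each firm's cost, so the clipped value is zero there.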

Comparison of Cooperative and Noncooperative Cases
In the considered game, the players are symmetric, so in the cooperative game each player receives half of the total payoff. The value of information for the total payoff then coincides with the value of information for each player separately. This makes it straightforward to compare the value of information about the initial disturbance stock in the cooperative and non-cooperative games.
The results of the comparison between $V_C$ and $V_{NC}$ are presented in Tables 1 and 2. Table 1 assumes $0 < \theta \le 1$, and Table 2 assumes $\theta > 1$.
Table 1. Comparison of NVI for $0 < \theta \le 1$.
Table 2. Comparison of NVI for $\theta > 1$.
It can be noticed that in most cases, the value of information in a cooperative game is higher than in a non-cooperative one, which indicates that the impact of the estimation of the initial stock on the players' payoffs in a cooperative game is much more significant. Logically, if the players choose cooperative behavior, they should pay more attention to the accuracy of information about the initial condition.

Comparison of Overestimation and Underestimation
To determine whether overestimation and underestimation of the same relative size have an identical impact on the final payoff, we first give an analytical comparison for the cooperative case. We compare the NVI under overestimation, where the estimate deviates from $p_0$ by $\beta p_0$, with the NVI under underestimation with the same deviation. Several cases arise, depending on the parameter values.
In the case of overestimation, $\hat p_0 = p_0 + \beta p_0$; in the case of underestimation, $\hat p_0 = p_0 - \beta p_0$. Comparing the corresponding NVI values in the first two cases shows that the loss caused by underestimation is always greater than or equal to the loss caused by overestimation at the same rate $\beta$.

3. For $\beta \ge 1$, underestimation is not meaningful, since the estimate $p_0 - \beta p_0$ would not be positive. However, it can still be noted that the loss in the case of overestimation outweighs the largest loss under underestimation only if $\beta > 1/\theta$. Similar analyses can be made for the non-cooperative case and for the terminal cost coefficient.
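The comparison of the two error directions can be sketched as follows. The sketch takes the cooperative NVI to be the relative increase in joint cost and uses a piecewise form, quadratic up to r = 1 + θ and linear beyond, that was derived for the homogeneous model with joint terminal cost φ p_T² and initial time 0. Both the definition and the piecewise form are assumptions of this sketch, and the function names and values are hypothetical.

```python
def nvi_cooperative(r, theta):
    """Relative increase in joint cost when planning with r*p0 (sketch)."""
    if r <= 1 + theta:
        # True stock stays nonnegative up to T.
        return (r - 1) ** 2 / theta
    # Stock is cleaned up before T; the loss grows linearly in r.
    return r - 1

def compare(beta, theta):
    """NVI for over- and underestimation by the same rate beta, 0 < beta < 1."""
    return nvi_cooperative(1 + beta, theta), nvi_cooperative(1 - beta, theta)

theta = 0.25  # illustrative value
for beta in (0.1, 0.25, 0.5, 0.9):
    over, under = compare(beta, theta)
    print(f"beta={beta:.2f}  over={over:.3f}  under={under:.3f}")
```

Under this form the two losses coincide while β ≤ θ, and underestimation is strictly more costly for θ < β < 1. The supremum of the underestimation loss is 1/θ, so overestimation exceeds it only when β > 1/θ, which matches the third case above.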
From Figures 1 and 2, the cooperative and non-cooperative games produce NVI curves of similar shape but different magnitude. Compared with overestimation (on the right), underestimation (on the left) brings more cost to the players overall, which corroborates the analytical comparison above. With the initial time fixed, the larger the terminal time, the more sharply the cost caused by underestimation increases. In contrast, the nearly linear NVI curve for overestimation is much flatter and more stable. Combined with the analytical results, this suggests that a decision maker who is given an observed range for the initial stock, e.g., $[A - B, A + B]$, should put more weight on overestimation, that is, lean toward $A + B$. Moreover, because the influence is larger in the cooperative case, the decision maker there needs to make an extra effort to ensure that the estimated initial stock is reliable.
In addition, it is meaningful to study the effect of changing the terminal cost coefficient $\varphi$. This parameter determines the penalty that a firm must pay to the regulator if it fails to reclaim the environment by the terminal time. Figure 3 shows that, in the case of underestimation, the NVI increases greatly as $\varphi$ increases. The overall plot is also in accordance with the behavior of the NVI under different terminal times. Hence, overestimating the initial pollution stock is the more acceptable choice when the decision maker cannot guarantee that the estimate is accurate enough to be relied upon.

Conclusions
In this paper, we studied how the estimated initial stock influences the performance of two players engaged in a rehabilitation process in cooperative and non-cooperative differential games. Two cases, overestimation and underestimation, were examined, and the NVI was applied to quantify the value of information contained in the initial stock parameter. By comparing the NVI under various terminal times in both the cooperative and non-cooperative cases, we first found that uncertainty in the initial stock has a larger effect in the cooperative case, so players should be especially vigilant there. In both cases, we also observed that overestimation of the initial stock has less impact on the final payoff than underestimation. As the numerical example demonstrates, overestimation has significantly less effect on the players' interests when the estimate deviates from the actual value. Therefore, the results indicate that decision makers can greatly reduce their cost by leaning toward overestimation when they receive the estimated initial stock from the observation team, especially in the cooperative case.
In addition, owing to the lack of historical data, we cannot identify a regular pattern relating the initial stock value that a player finally adopts to the estimated interval, so we are not able to carry out empirical research. Pollution control is our current concern; however, we expect to apply the analysis to other economic problems in which the estimate of an initial value plays a critical role in production activities.

Conflicts of Interest:
The authors declare no conflict of interest.