Game Theory of Pollution: National Policies and Their International Effects

In this paper we put forward a simple game-theoretical model of pollution control, in which each country controls its own pollution while the environmental effects of its policies do not stop at country borders. In our noncooperative differential game, countries as players minimize the present value of their own costs, defined as a linear combination of pollution costs and costs of environmentally friendly policies; the state vector of the system consists of the pollution stock per country. A player's time-varying decision is her investment into clean policies, while her expected costs also include pollution caused by her neighbors. We analyze three variants of this game: (1) a Nash game in which each player chooses her investment into clean policies such that her expected costs are minimal, (2) a game in which the players imitate their neighbors' investments into clean policies without taking into account how successful the neighbors are in terms of costs, and (3) a game in which each player imitates her neighbors' investments into clean policies if this behavior seems to bring a profit. In each of these scenarios, we show under which conditions the countries have incentives to act in an environmentally friendly way. We argue that the different results of these games can be used to understand and design effective environmental policies.


Introduction
Current worldwide environmental policy goals aim at lowering emissions in the air in order to fight global warming [1]. These policies include energy generation via renewable energy sources (RES) and various mechanisms to clean the air. For example, the European Union (EU) is striving to achieve 20% of energy generated from RES by 2020 and to reach a minimum of 27% of renewable generated energy by 2030, while aiming to reduce greenhouse gas emissions by at least 40% by 2030 compared to their level in 1990 [2,3]. Objectives for 2050 are even more challenging, with a reduction of the carbon emissions by 80-95% [4]. All around the world (e.g., in China [5], Japan [6], New Zealand [7], United States of America [8,9] and Turkey [10]) countries turn to energy generation via RES.
However, some countries are more committed to green policies than others, and the behavior of each country may influence the pollution of other countries.
For example, each country's air policies contribute to the air quality of its neighbors [11][12][13]. In recent decades, countries have therefore recognized the need to cooperate in order to fight global warming. There are many joint policies aiming at the reduction of greenhouse gas emissions, e.g., the Kyoto Protocol [14] or the United Nations Framework Convention on Climate Change [15].
Besides joint policies, every country has its own interests and standards regarding its more or less green policies. Of course, the government of each country keeps its own costs in mind and tries to avoid policies that harm its economy. This may give rise to conflicting objectives and behavior among different countries. In discussions among countries belonging to the same geographical region, some countries, such as the Scandinavian ones or Germany, act as leaders trying to impose emission reduction strategies on other countries that are less inclined toward green policies [16][17][18]. In some countries, such as the United States of America, there are nation-wide policies that take into account the geographical characteristics of the country, such as the fact that downwind states suffer from cross-border pollution by upwind states and need to be protected [19].
There are many different strategies to achieve emission reduction. While some countries mainly focus on penalizing emissions via taxes [20], other countries provide funding for low-emission technologies or approaches, such as implementing solar plants. After many discussions about the impact of taxes on emissions [20], there is evidence that this impact is positive [21]. On top of the national incentives, the European Union offers super-credits for car manufacturers to produce low-carbon vehicles [22]. Furthermore, quite a few countries have implemented low-emission zones [23,24] or driving restrictions [25] in big cities to at least lower pollution locally.
Fighting global warming involves both global and national policies. On the one hand, countries need to cooperate; on the other hand, they often do not want to lose independence in their environmental choices.
There is much research regarding the influence of local pollution on the global or local environment of a country [11,26]. Additionally, some researchers model and analyze the role of cooperative country associations fighting global warming (e.g., [27][28][29][30]). However, in those associations, punishment from an external source often plays a role [31]. Krass et al. (2013) [32], for example, address the ability to force firms to invest into emission-reducing technologies and produce their goods in a more environmentally friendly way. They use a two-player Stackelberg model to find the level of emission punishment that maximizes welfare. One of their main conclusions is that taxes may have a positive effect but have to be used carefully, since extremely high taxes can have the opposite effect. In contrast to this pollution reduction forced by an external source, Barrett (1994) [33] models self-enforcement of international environmental agreements. Additionally, Nkuiya (2012) [34], Nkuiya et al. (2015) [35] and Miller and Nkuiya (2016) [36] investigate voluntary participation in climate treaties, also including the possibility of a sudden regime shift, using both cooperative and non-cooperative game theory. Moreover, Lazkano et al. (2016) [37] study how adaptation costs influence the decision of developed and developing countries to join international agreements. Furthermore, there is much research on the influence of dominant players on the pollution behavior of weaker players. Garrab and Breton [38], for example, examine two different groups of players, namely signatories and defectors. In their model, signatories punish defectors with higher pollution costs. Their main focus lies on the comparison of Nash and Stackelberg information structures and their influence on the pollution policies of the players.
In this paper we adopt a game-theoretical approach to model and understand interactions among EU countries and their subsequent choices of investment into green policies. Different EU countries are individual players in the game. We investigate different strategies of the players to invest into green policies. First, each player has a cost function to be minimized with respect to its choice of investment into green policy. Such costs could include the development costs for pollution reduction technologies or costs for cleaning the environment [39]. Second, the investment into environmentally friendly policies is influenced by the neighbors of a country: a country may imitate the behavior of its neighbors. While the choice to behave in an environmentally friendly way will decrease the pollution stock of the country, it is expected to be costly, in proportion to the pollution stock in the country. However, pollution increases the costs for a country as well. Turnock et al. (2016) [40] estimated that air pollution reducing technologies caused an economic benefit of 232 billion US dollars annually in the EU due to the prevention of premature deaths. Thus, aside from the property value reduction and health costs, countries should be penalized and rewarded for increases and decreases in pollution, respectively. We assume that this penalty/benefit for pollution flow can be regulated by the EU.
We will first focus on optimal time-varying investment decisions per country when a country minimizes the present value of its own costs and we will observe the behavior of the entire system when we increase the cost/benefit for pollution flow per country. Additionally, we will also consider two different types of imitation behavior as possible strategies of the countries involved. Imitation has had a central role in evolutionary game theory, focusing on the properties of the attractors of the underlying dynamical games [41][42][43]. We make a direct connection to the existing results in the field. Subsequently, we compare the outcomes of all three scenarios and map the observed phenomena to the challenges in implementing green policies worldwide.
The remainder of the paper is organized as follows: In Section 2 we present the basic assumptions for our models and a short description of each variant of the model. Afterwards, we provide a short stability analysis. In Section 3, we perform different case studies on all variants of the model. We finish the paper with a discussion of the results and directions for future work in Section 4.

Models
In this section, we will introduce the modeling framework for our pollution game.

Basics
Let us assume that N is the set of EU countries, where country i ∈ N has a pollution stock x_i(t) at time t ∈ [0, T], with T > 0 a fixed and known time horizon. The pollution stock x_i changes according to the differential equation (1), where N_i ⊂ N is the neighborhood of country i, i.e., country i itself and all countries neighboring it. The neighborhood of a country is defined by the connections in a network of countries (see Figure 1 as an example). The initial pollution stock x_i(0) > 0 is fixed and known a priori to all countries, and ψ_ji ∈ [0, 1] is the rate at which country j pollutes the environment of country i. Values of ψ_ji can be arranged in an adjacency matrix of a network in which countries that pollute each other are connected (ψ_ji > 0). Control u_i ∈ [0, 1] can be interpreted as the investment of country i into environmentally friendly (clean) policies. If u_i = 0, country i is not investing into clean policies, while if u_i = 1, country i is putting its maximal effort into the investment. For example, a country can invest a lot into renewable energy or confine itself to coal-fired power stations. Please note that in our model, the investment into clean policies influences the current investments of the other countries as well. There exist other models in which u_i is interpreted as a pollution abatement level for country i; in these models, each country's costs depend only on its own abatement level [33].
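A minimal sketch of these dynamics follows. It assumes one form of (1) that is consistent with the description in this section: every country j in N_i (including i itself, with ψ_ii = 1) contributes ψ_ji(1 − u_j)x_j, and the term u_i x_i is subtracted. With this assumed form, the derivative of the right-hand side with respect to x_i equals 1 − 2u_i, the expression used in the stability analysis. The function name, the dictionary layout and the two-country values are hypothetical and serve only as an illustration.

```python
def pollution_flow(i, x, u, psi, neighbors):
    """Assumed right-hand side of (1) for country i.

    x, u      : dicts mapping country -> pollution stock / investment in [0, 1]
    psi       : dict mapping (j, i) -> rate at which j pollutes i
                (psi[(i, i)] = 1 is an assumption of this sketch)
    neighbors : dict mapping country -> list of countries in N_i (i included)
    """
    inflow = sum(psi[(j, i)] * (1.0 - u[j]) * x[j] for j in neighbors[i])
    abatement = u[i] * x[i]  # reduction due to investment into clean policies
    return inflow - abatement


# Hypothetical two-country example (not data from the case studies).
x = {"A": 10.0, "B": 20.0}
u = {"A": 0.5, "B": 0.25}
psi = {("A", "A"): 1.0, ("B", "B"): 1.0, ("A", "B"): 0.1, ("B", "A"): 0.1}
neighbors = {"A": ["A", "B"], "B": ["B", "A"]}
flow_A = pollution_flow("A", x, u, psi, neighbors)
flow_B = pollution_flow("B", x, u, psi, neighbors)
```

In this example country A receives pollution from B in addition to its own, so its stock still grows although it invests u_A = 0.5.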
The last term of (1), u_i(t)x_i(t), can be seen as the reduction of the pollution stock due to investments into clean policies. Each country i has pollution costs defined as in (2), where e_i(βẋ_i(t)) with β ≥ 1 can be interpreted as the environmental costs that are caused by the pollution flow and λ_i u_i(t) defines the costs for clean policies. These costs for clean policies can vary per country. Constant β ≥ 1 denotes a factor defined by an external party (e.g., European Commission) as a rate of punishment for the pollution stock. In Section 2.2 we will introduce different versions of the model based on how the investment u_i into clean policy is defined:
1. Nash game: Country i minimizes pollution costs (2) with respect to its investment u_i(t) at each time t ∈ [0, T], where other countries are assumed to do the same (Section 2.2.1).
2. Countries imitate the behavior of their neighbors independently of the neighbors' costs (Section 2.2.2).
3. Countries imitate the investments of their neighbors dependent on the neighbors' costs, such that more profitable neighbors influence a country in a stronger way.
To give function e_i from (2) a more specific form, let us define it as in (3). In this case, the environmental costs e_i grow exponentially with the pollution flow ẋ_i. Increasing parameter β increases the impact of the current pollution flow on the costs. The last term of Function (3) ensures that the costs are zero when no pollution flow takes place.
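The cost structure described above can be sketched as follows. The sketch assumes that e_i takes the form exp(βẋ_i) − 1, i.e., exponential in the pollution flow with a last term that makes the costs zero at zero flow, and that the individual costs are the sum of e_i and the policy costs λ_i u_i. Both forms and all function names are assumptions made for illustration.

```python
import math


def environmental_cost(flow, beta):
    """Assumed form of e_i: exponential in the pollution flow, zero at zero flow."""
    return math.exp(beta * flow) - 1.0


def country_cost(flow, u_i, beta, lam):
    """Assumed individual costs: environmental costs plus policy costs lam * u_i."""
    return environmental_cost(flow, beta) + lam * u_i


# Zero pollution flow leaves only the policy costs (lam = 4 as in the case studies).
cost_at_rest = country_cost(0.0, 0.5, beta=1.0, lam=4.0)
```

Because `math.exp` overflows for large arguments in double precision, an implementation working with flows of hundreds of Mt would presumably need to rescale its units; the sketch does not address this.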
Thus, in our model, the individual costs for each player are given by (4).

Nash Game: Optimizing Individual Costs
In this variant of the model, each player minimizes her individual costs at time t ∈ [0, T] defined in (2), i.e., the optimal strategy u_i* minimizes the costs (4) for each i ∈ N. The strategies u* therefore form a Nash equilibrium of the game at each time t ∈ [0, T].
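One way to approximate such an equilibrium numerically is to iterate best responses until the strategies stop changing. The sketch below uses a plain grid search over [0, 1] for each best response; this is an illustrative choice and not necessarily the procedure used for the results in this paper. The two toy cost functions are hypothetical and unrelated to (4).

```python
def best_response(cost_i, grid_size=1001):
    """Grid search for the u in [0, 1] that minimizes cost_i(u)."""
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    return min(grid, key=cost_i)


def nash_by_best_response(costs, u0, tol=1e-9, max_iter=100):
    """Iterate simultaneous best responses until the strategies stop changing.

    costs[i](u_i, u) returns player i's cost when she plays u_i while the
    other players play according to the list u.
    """
    u = list(u0)
    for _ in range(max_iter):
        new_u = [best_response(lambda v, i=i: costs[i](v, u)) for i in range(len(u))]
        if max(abs(a - b) for a, b in zip(new_u, u)) < tol:
            return new_u
        u = new_u
    return u


# Hypothetical toy costs: each player wants u_i = 0.5 + 0.25 * (other player's u).
toy_costs = [
    lambda v, u: (v - 0.5 - 0.25 * u[1]) ** 2,
    lambda v, u: (v - 0.5 - 0.25 * u[0]) ** 2,
]
u_star = nash_by_best_response(toy_costs, [0.0, 0.0])
```

For the toy costs the exact fixed point is u = 2/3 for both players; the grid search recovers it up to the grid resolution.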

Basic Imitation Behavior
In contrast to the model described in Section 2.2.1, we can also think of countries that are influencing other countries' behavior. Considering the pollution stock defined by Equation (1), country i is now no longer optimizing its own costs but rather imitating the average of investments into clean policies of all its neighbors. Please note that countries can thus be influenced by multiple countries at once.
A country i's decision u_i is influenced by each neighbor in the same way. For this basic imitation approach, the change of investment u_i of player i is defined as in (5).
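As an illustration of equally weighted imitation, the sketch below assumes an averaging rule in which u_i moves toward its neighbors' investments at a rate equal to the mean of the differences u_j − u_i over all neighbors j ≠ i. This rule, the explicit Euler discretization and the three-country chain are assumptions made for the example.

```python
def basic_imitation_step(u, neighbors, dt=0.01):
    """One explicit Euler step of an assumed averaging rule:
    du_i/dt = mean over neighbors j != i of (u_j - u_i)."""
    new_u = {}
    for i, u_i in u.items():
        others = [j for j in neighbors[i] if j != i]
        drift = sum(u[j] - u_i for j in others) / len(others) if others else 0.0
        new_u[i] = u_i + dt * drift
    return new_u


# Hypothetical three-country chain A - B - C with different initial investments.
u = {"A": 0.0, "B": 0.5, "C": 1.0}
neighbors = {"A": ["A", "B"], "B": ["B", "A", "C"], "C": ["C", "B"]}
for _ in range(5000):
    u = basic_imitation_step(u, neighbors)
spread = max(u.values()) - min(u.values())
```

In this connected example the investments of all three countries approach a common value, while the value itself depends on the starting investments.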

More Advanced Imitation Behavior
In this variant of the model we assume that countries with a successful investment have a bigger impact on the decisions of other countries. Here successful means that the pollution costs of a country are low. Similar to the consensus protocol described in [44] or [45], the change of investment u_i of player i is defined as in (6), with S_ij(t) being the sigmoid function S_ij(t) = 1 / (1 + exp(c_i(t) − c_j(t))). A country is therefore more influenced by countries that have lower costs than by countries paying much for their (less environmentally friendly) behavior.
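The weighting can be sketched as follows. The function `sigmoid_weight` evaluates S_ij exactly as written in the text; the drift function assumes a consensus-type rule in which each difference u_j − u_i is multiplied by S_ij and summed over the neighbors j ≠ i. The rule and all names are assumptions made for illustration.

```python
import math


def sigmoid_weight(c_i, c_j):
    """S_ij = 1 / (1 + exp(c_i - c_j)), as written in the text."""
    return 1.0 / (1.0 + math.exp(c_i - c_j))


def weighted_imitation_drift(i, u, c, neighbors):
    """Assumed consensus-type rule: sum over neighbors j != i of S_ij * (u_j - u_i)."""
    return sum(
        sigmoid_weight(c[i], c[j]) * (u[j] - u[i])
        for j in neighbors[i]
        if j != i
    )


# Hypothetical values: equal investments give zero drift regardless of the costs.
u = {"A": 0.4, "B": 0.4}
c = {"A": 1.0, "B": 3.0}
neighbors = {"A": ["A", "B"], "B": ["B", "A"]}
drift_A = weighted_imitation_drift("A", u, c, neighbors)
```

Two properties follow directly from the formula: equal costs give S_ij = 0.5, and S_ij + S_ji = 1 for any pair of countries.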

Stability Analysis
Here we briefly discuss linear stability properties of the system defined by differential Equation (1). Equilibria of this equation satisfy (7), where x_j*, u_i*, and u_j* are the equilibrium pollution of country j, the optimal clean investment strategy for country i, and the optimal clean investment strategy for country j. The derivative of the right-hand side of Equation (1) with respect to x_i is 1 − 2u_i. This means that the equilibrium given by (7) is an attractor for u_i* < 0.5 and a repeller for u_i* > 0.5. This implies that if u_i* is small for each i ∈ N, there is a single attractor x_i*. If the values for u_i* are all bigger than 0.5, then there is a single repeller. Therefore, in the case of the strategy u_i* of country i minimizing costs (4), our interest is whether or not u_i* gets (and stays) under 0.5. This analysis, however, gives us a good idea only about the situation with all countries being in the same neighborhood and might not help us much when some players belong to several neighborhoods. For both imitation behavior cases, where the strategies of the players are given by (5) and (6), respectively, strategy u_i* is at an attracting equilibrium if u_i* = u_j* for each j ∈ N_i \ {i}. This means that if all countries were within the same neighborhood, we would expect their optimal strategies to converge asymptotically to each other. The attracting equilibria we find coincide with the results on imitation in evolutionary games [46,47].

Implementation
The models from Section 2.2 were implemented numerically. The software to find optimal investments was developed using Eclipse IDE for Java Developers, Version Neon.3, Release (4.6.3) with execution environment JavaSE-1.8 provided by Eclipse Foundation Inc. (Ottawa, Canada).
Depending on the selected model, a computation step in our simulation is defined by the following sub-procedures.

• Nash game: When each country wants to minimize the present value of its own costs, the simulation starts with initial values for the pollution stock x_0. Then, we use a fixed point approach to compute u_i* for each country such that the costs c_i become minimal for each country i ∈ N. The optimization itself is done by jcobyla [48], a software implementation based on Powell's numerical optimization method for constrained problems with unknown derivatives of the objective function [49]. While u* can be calculated analytically for some specific scenarios, this is not easily possible for larger problems. We proceed with the next computation step by computing the pollution x*, discretizing the differential Equation (1) via a fourth-order Runge-Kutta approach with step size 0.01.

• Imitation game: Considering that the countries imitate other countries' behavior, we start a computation step with values for x_i* and u_i* from the Nash game for the initial phase. Using those values, we can compute the investment into clean policies by applying either Formula (5) or (6). Then, we again use a fourth-order Runge-Kutta approach to compute the pollution stock. Afterwards, we continue with the next computation step until we reach the defined number of total computation steps corresponding to time T.
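The integration step used in both procedures can be sketched as a classical fourth-order Runge-Kutta step with step size 0.01. The sketch is written in Python for brevity and is not the Java implementation described above; the function name and the test equation dx/dt = −x are illustrative assumptions.

```python
import math


def rk4_step(f, t, x, h=0.01):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(t, x), x a list."""
    k1 = f(t, x)
    k2 = f(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k1)])
    k3 = f(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k2)])
    k4 = f(t + h, [xi + h * ki for xi, ki in zip(x, k3)])
    return [
        xi + h / 6 * (a + 2 * b + 2 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


def decay(t, x):
    """Test system dx/dt = -x with known solution x(t) = x(0) * exp(-t)."""
    return [-xi for xi in x]


# Integrate from t = 0 to t = 1 in 100 steps of size 0.01.
x, t = [1.0], 0.0
for _ in range(100):
    x = rk4_step(decay, t, x)
    t += 0.01
error = abs(x[0] - math.exp(-1.0))
```

In a full simulation, `f` would evaluate the right-hand side of (1) for all countries with the investments u held fixed during the step.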

Settings of the Case Studies
In all our case studies, we examined a network based on the geographical structure of the EU. The 28 member countries of the EU are the players of the game. A country i pollutes the environment of country j if i and j share a geographical border. Those borders also include maritime borders [50]. An overview of all neighbor relations is provided in Table 1. Thus, influence among countries is bi-directional, i.e., if country i can pollute the environment of country j, country j can also pollute the environment of country i. In this case, we assume that ψ_ji = ψ_ij > 0. The resulting network is shown in Figure 1. It does not capture phenomena such as the fact that downwind states suffer much more from the pollution by upwind states than the other way around. However, our simulation software offers the possibility to choose any symmetric as well as asymmetric values for ψ_ij.
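The construction of the network from border pairs can be sketched as follows. The three border pairs are a small illustrative subset and not the full set of relations in Table 1; the rate 0.1 and the function names are hypothetical.

```python
def build_psi(borders, rate):
    """Build a symmetric rate table psi[(j, i)] from undirected border pairs."""
    psi = {}
    for a, b in borders:
        psi[(a, b)] = rate
        psi[(b, a)] = rate  # bi-directional influence: psi_ij = psi_ji
    return psi


def neighborhoods(borders):
    """N_i: country i itself plus every country sharing a border with it."""
    n = {}
    for a, b in borders:
        n.setdefault(a, {a}).add(b)
        n.setdefault(b, {b}).add(a)
    return n


# Illustrative subset of land borders; 0.1 is a hypothetical rate.
borders = [("Portugal", "Spain"), ("Spain", "France"), ("France", "Germany")]
psi = build_psi(borders, rate=0.1)
N = neighborhoods(borders)
```

An asymmetric setting, such as upwind and downwind countries, would simply assign different values to `psi[(a, b)]` and `psi[(b, a)]`.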
We start all our numerical studies with an initial pollution stock x_0 based on values of carbon dioxide emissions from 2010 [51]. These values are given in Mt. Due to the scalability of our model, we do not specify a fixed time unit. In the remainder of the paper, we will consider λ_i to be fixed to 4 for all countries, as this value seems to be rather realistic and also because better data are of limited availability. In the future, we would like to find and use more appropriate data in order to model the influence of λ_i in a more realistic way. However, the influence of this parameter is limited in any case due to the much bigger impact of the term exp(β x_i) on the cost function.

Optimizing Individual Costs
Without any pressure from a joint administration like the European Commission, each country wants to minimize its individual costs. In our model, those costs are influenced by the geographical neighbors in such a way that neighbors pollute the country's environment and the country has to pay for this (see (4)). Apart from that, the neighbors do not influence the costs of another country.
Our results show that there is a strong increase of the pollution stock compared to the initial values. We start with values in the range of about 0 Mt to 832 Mt and end with a pollution stock of 2000 Mt. However, after the initial phase, both the pollution stock x and the countries' investment u into environmentally friendly policies converge to the equilibrium values. This behavior is shown in Figure 2a. It is remarkable that the pollution stock x_i for all countries converges to the same value while there is no consensus in the underlying investments u, which is consistent with the equilibrium from Section 2.3. An overview of the different investments into green policies of the countries is shown in Figure 2b.
Furthermore, sooner or later the countries start acting in a more environmentally friendly way. The more neighbors a country has, the higher the chance that it starts fighting pollution earlier, since all neighbors additionally pollute its environment and increase its costs. Germany, the country with the highest number of neighbors (see Table 1), acts in the most environmentally friendly way.

Influence of the External Control Parameter β on Pollution Costs
When the countries' investments into environmentally friendly policies only depend on their own costs, we observe that the pollution stock increases a lot until it reaches a saturation point. In order to lower this pollution increase, an external administration (e.g., European Commission) can punish pollution flow by increasing the corresponding costs. We model this punishment by using a factor β > 1 that influences the costs for pollution (also see (2)). The punishing party then becomes a leader in a Stackelberg game, while the countries as followers minimize their pollution costs. The leader wants to find an optimal β* that minimizes the total sum ζ_β(T) = ∑_{i∈N} x_i(T) of all pollution stocks x_i(T) at time T, when equilibrium values of the x_i are reached:
β* = arg min_{β∈B} ζ_β(T), (8)
where the compact set B ⊂ (1, ∞) of feasible policies is known a priori. The EU countries acting as the followers in the game choose their strategies u_i minimizing their costs (4). Their strategy is the best response to the leader's choice (8), while the leader can take this best response into account in advance.
The dynamics of the system in this game are given by (1). The sum of all pollution stocks at equilibrium decreases with increasing β. This behavior is displayed in Figure 3. Thus, the leader should choose β > 1 sufficiently large in order to decrease the overall pollution down to a satisfactory level.
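When the feasible set B is approximated by a finite list of candidates, the leader's choice reduces to evaluating the followers' response for each candidate and keeping the best one. The sketch below illustrates this; `toy_zeta` is a hypothetical stand-in for the simulated total pollution ζ_β(T) and is not derived from the model.

```python
def choose_beta(candidates, total_pollution):
    """Return the candidate beta with the smallest total pollution zeta_beta(T)."""
    return min(candidates, key=total_pollution)


def toy_zeta(beta):
    """Hypothetical stand-in for zeta_beta(T): strictly decreasing in beta."""
    return 1.0 / beta


# All candidates lie in (1, infinity), as required for the feasible set B.
best_beta = choose_beta([1.5, 2.0, 4.0, 8.0], toy_zeta)
```

In an actual study, `total_pollution` would run the follower simulation up to time T for the given β. With a decreasing ζ_β(T), the largest feasible β is selected, which matches the observation above.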
The simulation results, displayed in Figure 4, show that we are indeed able to lower the pollution stock, while the remaining behavior, such as the consensus in pollution, does not change. Compared to the case without punishment (Figure 2a), even a weak external punishment (Figure 4a) can halve the pollution stocks x_i(T), all of which approach the attractor x_i* given by (7). A strong punishment can force countries to behave in a more ecologically friendly way than in the initial situation. For larger values of β, the pollution stocks x_i(T) for all countries i, and thus the sum of all pollution stocks ζ_β(T), even converge to zero (see Figure 3).

Imitation Behavior
In Section 3.3, an external administration needs to punish the countries heavily in order to reduce the pollution stock to a moderate level. This strong penalty from an external force leads to a loss of independence regarding individual environmental choices. Therefore, we investigate whether an imitation behavior of countries can also reduce the pollution stock. In this case, we do not consider such a strong influence from an external administration. In the following, we distinguish between a basic imitation according to (5) and a more advanced imitation according to (6).
In both cases, we observe that the players agree on their investment into environmentally friendly policies but converge to different pollution stocks x_i(T). This is due to the fact that in our model all countries of the EU are connected and imitate each other in terms of their investment u. This result is not very surprising, as similar results by Ranjbar-Sahraei (e.g., in [46,47]) show. Comparing Figures 5a and 6a, we can see that this coinciding investment differs for different starting values for u. The results of Figure 5 are obtained when considering a short initial phase in which each country minimizes its own costs. Afterwards, the countries start imitating the neighbors' investment strategies according to (5) or (6). In contrast to this, the results displayed in Figure 6 are obtained after a long initial phase. Compared to the short initial phase, we consider here twice as much time in which the countries minimize their own costs. Of course, the investment into clean policies determines the pollution stock as well. For a consensus on a non-environmentally friendly behavior, the pollution stock may even increase exponentially. The more advanced imitation behavior is more robust against the initial conditions. In both cases, see Figures 5b and 6b, we do not notice an exponential growth but rather a convergence of the pollution stock towards different values x_i(T) per country. Again, the countries differ in the amount of pollution they produce. Furthermore, when the countries perform a more advanced imitation behavior, they converge to a lower amount of pollution stock (see Figure 6).

Figure 5. Pollution stock x and investment u into environmentally friendly policies for each player with both a basic and a more advanced imitation behavior with a short initial phase in which all players minimize their individual costs. All results are obtained with the same initial conditions, such as the initial values for x_i(0). (a) Basic imitation behavior after a short initial phase; (b) More advanced imitation behavior after a short initial phase.

Discussion
In this work, we introduce different game-theoretical models in order to give a starting point for understanding the pollution behavior of the EU countries. This is important because it allows us to find feasible ways to control and improve the pollution behavior of countries. Our case studies demonstrate that the pollution stock can indeed be reduced by the influence of an external force that increases the costs for pollution. However, this scenario is not very realistic, since the countries would lose their independence of decision. We believe that no country would accept such a strong intervention from an external party. However, if each country acts only according to its own interests, thus minimizing its individual costs, the pollution stock increases a lot before it starts to saturate.
With basic imitation, we observe that the increase of pollution can be reduced depending on the initial conditions. The basic imitation approach, where a country is influenced equally strongly by each of its direct neighbors, is very susceptible to the initial conditions. Starting with large values for u may end up in an exponential pollution growth. Using the same initial conditions for the more advanced imitation approach, where a country is more influenced by neighbors that pay less for applying their strategy, the pollution stock does not grow exponentially. Additionally, with the same initial phase, the pollution can be reduced remarkably by applying a more advanced imitation behavior instead of a basic imitation behavior. Figure 7 shows the differences in the remaining pollution stock between the two imitation strategies.
Thus, we have seen that incentives can be found to make all EU countries act in an environmentally friendly way. Our algorithms can be adapted to model the behavior of countries worldwide.
The case studies shown here only consider a network of 28 countries, but in principle all countries of the world could be included. Our numerical software can also handle different network structures. For example, we can add or remove countries or even consider a completely different association of countries. Furthermore, the initial conditions, such as the initial pollution values, can easily be modified. Thus, we can easily update our settings by using the most accurate data available. For future work, we want to examine alternative objectives for the countries involved. Imagine the countries do not minimize their own costs but jointly minimize global costs consisting of the sum of all individual costs. Then, all countries would have one common goal instead of only addressing their own welfare. In this case, we can evaluate both the individual pollution per country and the global pollution. Additionally, with a feasible measurement of the political power of countries, we could also model a stronger influence of those countries that have more political impact.
Furthermore, we wish to add more details to our models. Until now, the adjacency matrix has been a symmetric matrix with fixed values. This is due to the fact that it is hard to measure the source of each country's pollution and thus to find reliable data on the influence of other countries on the pollution of neighboring countries. Considering geographical conditions, this adjacency matrix can just as well be asymmetric (as it is in the case of the downwind countries). We can easily model more realistic behavior by changing the entries of the adjacency matrix.