A Resource Extraction Model with Technology Adoption under Time Inconsistent Preferences

Abstract: A two-stage non-standard optimal control problem with time inconsistent preferences is studied. In an infinite horizon setting, a time consistent (sophisticated) decision maker chooses the time of switching between two consecutive regimes. The second regime corresponds to the implementation of a new technology, and a cost must be paid at the switching time. Although the problem is formulated for a general discount function, special attention is devoted to models with non-constant discounting and heterogeneous discounting. The problem is solved by transforming it into a finite horizon problem with free terminal time. The corresponding dynamic programming equations are presented, and conditions for the derivation of the switching time by decision makers with different degrees of sophistication are studied. A resource extraction model with technology adoption is solved in detail. Effects of the adoption of different discount functions are illustrated numerically.


Introduction
In the optimal management of a natural resource, one problem of interest is whether or not it is profitable to change to a new technology and, in the affirmative case, when to do it. This is of particular interest in the case of nonrenewable natural resources, since if the new technology implies a more efficient extraction or exploitation, we can extend the actual availability of the resource.
From a formal perspective, the former question is a controlled endogenous regime shift or two-stage (or multiple-stage) optimal control problem. While there is a rich literature on regime shifts (some recent contributions on the topic are Gromov and Gromova [1] for switches in differential games, Gromov et al. [2] for optimal control problems with infinite switches, or the different chapters in Haunschmied et al. [3] for a good recent overview of applications to Economics), there are fewer papers focusing on the optimal timing of switching. Some papers studying this last problem, for the case of one decision maker, are Tomiyama [4] and Amit [5], who derived necessary conditions for the optimal switching time in a finite time horizon, while Makris [6] focused on the infinite time horizon case. The extension to problems with more than one agent has been studied, for instance, in Dawid and Gezer [7] and in Long et al. [8]. In all these models, the switching time involves a trade-off between immediate costs and potential future benefits.
Regarding the study of biases in intertemporal decision processes, variable rates of time preference have received considerable attention in recent years. These biases, which lead to decisions that are not fully rational from an axiomatic point of view, are supported by experimental evidence showing that decision makers are more impatient in their short-term choices than in their long-term ones when they face similar decisions (see, for instance, Thaler [9]). Thus, payoffs in the near future are discounted at higher instantaneous rates than payoffs in the long run, for instance, by using a discount function of the type

$$\theta(s-t) = e^{-\int_0^{s-t} \rho(\tau)\,d\tau}, \qquad (1)$$

where $\rho'(\cdot) \neq 0$. The case when $\rho'(\cdot) \le 0$ describes a situation in which decision makers are more impatient for short-term decisions, whereas for $\rho'(\cdot) \ge 0$ the effect is the opposite. For $\rho'(\cdot) = 0$, we recover the standard case with a constant discount rate. However, since the work by Strotz [10] it is well known that when the instantaneous discount rate depends on the position of the decision maker, as in (1), standard optimization techniques fail to provide time consistent solutions. Karp [11] analyzed dynamic optimization problems in a continuous time setting with non-constant discount rates, and obtained, in the infinite time horizon case, a dynamic programming equation (or modified Hamilton-Jacobi-Bellman equation) that characterizes time consistent solutions in this framework. Later on, Marín-Solano and Navas [12] extended the approach to the finite horizon case and studied the application to a nonrenewable resource problem with non-constant discounting.
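The time inconsistency described above can be illustrated with a small numerical sketch. The discount function used here (a mixture of two exponentials) and all payoff values are illustrative assumptions, not taken from the paper; the point is only that a ranking of two dated payoffs can reverse as the decision maker moves forward in time, which cannot happen under a constant discount rate.

```python
import math

# Illustrative discount function (an assumption of this sketch):
# a mixture of two exponentials with a slow and a fast rate.
W, R_SLOW, R_FAST = 0.5, 0.1, 1.0

def theta(tau):
    """Discount factor applied to a payoff enjoyed tau units of time ahead."""
    return W * math.exp(-R_SLOW * tau) + (1.0 - W) * math.exp(-R_FAST * tau)

def prefers_late(t, s_early=10.0, s_late=11.0, pay_early=1.0, pay_late=1.4):
    """True if the t-agent ranks the later, larger payoff above the earlier one."""
    return pay_late * theta(s_late - t) > pay_early * theta(s_early - t)

def exponential_prefers_late(t, rate=0.1, s_early=10.0, s_late=11.0,
                             pay_early=1.0, pay_late=1.4):
    """Same comparison with a constant discount rate: the ranking never changes."""
    late = pay_late * math.exp(-rate * (s_late - t))
    early = pay_early * math.exp(-rate * (s_early - t))
    return late > early

if __name__ == "__main__":
    print(prefers_late(0.0))    # the 0-agent plans to wait for the later payoff
    print(prefers_late(10.0))   # the 10-agent takes the earlier payoff instead
```

With these hypothetical numbers, the agent at time 0 plans to wait, but the agent at time 10 no longer follows that plan, whereas the exponential comparison gives the same answer at both dates.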
A different type of time inconsistent preferences was analyzed in Marín-Solano and Patxot [13], who introduced and studied a problem with heterogeneous discounting in a deterministic setting. In that paper, in a finite horizon setting, payments of utilities derived from consumption enjoyed during the planning horizon are discounted at a rate ($\rho_1$) different from that ($\rho_2$) of the final function, representing, e.g., savings to be enjoyed after retirement. This can be justified in the sense that it seems questionable to assume that the enjoyment of different goods should be discounted at the same rate. In that model, the final function can be seen as a function of a good that is somehow different.
The introduction of different (constant) discount rates can be justified in different ways in our model with a regime switch. First, note that $e^{-\rho_2 (T-t)} = e^{-\rho_1 (T-t)} \cdot e^{(\rho_1-\rho_2)(T-t)}$. For $\rho_2 > \rho_1$, the last factor is an increasing function of t. This means that, as time t approaches T, the t-agent assigns a higher value to the final term, which in our model corresponds to the moment in which the new technology is adopted. Hence, the decision maker has a bias towards the present, but this bias decreases as the moment of technology adoption gets nearer. This is in agreement with the psychological perceptions of many decision makers, who can assign an increasing value to a change in regime as it gets nearer. For example, if a decision maker placed at time 0 exploits a natural resource and has the option to introduce a new technology at a future date T, but there exists some uncertainty regarding the actual effectiveness of the new technology, this uncertainty can be internalized by the decision maker at time 0 by applying to payoffs obtained after T a discount rate different from that applied to current payoffs before that time. There are other potential justifications for introducing different discount rates. If, after the regime switch, the firm is more efficient, it could have access to better financial conditions, and this could reduce the discount rate ($\rho_2 < \rho_1$ in this case). An opposite effect ($\rho_2 > \rho_1$) could be present if we introduce mortality rates (of the business) in the long term (i.e., after T), maybe because society stops using the resources (oil, natural gas, ...).
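The factorization above can be checked numerically. The following minimal sketch uses hypothetical values for the two rates and for T; the function name `relative_weight` is introduced here for illustration only.

```python
import math

def relative_weight(t, T, rho1, rho2):
    """Extra factor e^{(rho1 - rho2)(T - t)} applied to the final term,
    relative to discounting it at the rate rho1 used for current payoffs."""
    return math.exp((rho1 - rho2) * (T - t))

def final_term_discount(t, T, rho2):
    """Discount factor e^{-rho2 (T - t)} applied by the t-agent to the final term."""
    return math.exp(-rho2 * (T - t))

if __name__ == "__main__":
    T, rho1, rho2 = 10.0, 0.03, 0.08   # hypothetical values with rho2 > rho1
    for t in (0.0, 5.0, 10.0):
        print(t, relative_weight(t, T, rho1, rho2))
```

For rho2 > rho1 the printed factor rises towards 1 as t approaches T, which is the sense in which the bias against the post-switch regime fades as adoption gets nearer.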
The objective of this paper is to combine the above ideas, by extending previous results in standard optimal control problems with two regimes, in which the switching time is a decision variable, to a framework with time inconsistent preferences. Special attention is paid to the case of non-constant discounting (an area of increasing interest in Economics) and to the case of heterogeneous discounting. In order to solve the problem, we transform the infinite horizon problem with a switching time into a finite horizon problem with free terminal time. Then we find necessary conditions on the terminal time to be satisfied by decision makers with different degrees of sophistication (or rationality). The procedure proposed to solve the problem is then applied to a model of management of a natural resource, in which the agent has to decide when to implement a new technology.
The paper is organized as follows. Section 2 presents the main problem for an arbitrary discount function. Three particular classes of discount functions are described. Section 3 collects and derives some theoretical results that will be used in the paper. A procedure for solving the problem is presented in Section 4. In Section 5, we solve in detail a resource extraction model with technology adoption. Numerical illustrations showing the effects of introducing the different discount functions are presented in Section 6. Section 7 concludes the paper.

The Problem: Regime Switching with Time Inconsistent Preferences
In this Section we will state the general problem for the case of one decision maker with time inconsistent preferences. First, we introduce the general model in which future utility streams are discounted through a general discount function. Later on, we present some specifications for this discount function. The most relevant one is that of non-constant discounting (Problem A), a model that has been widely explored in a continuous time setting during the last fifteen years. A modified version of non-constant discounting is presented later on (Problem B). As we will see in Section 4, this modified version simplifies the resolution of the problem. Although it is less realistic and departs from the standard model with non-constant discounting (it lies somewhere between non-constant discounting and standard exponential discounting), it will serve to illustrate the difficulties that non-constant discounting introduces into the problem. As a final specification, we will present a third problem (Problem C) in which the decision maker discounts the future by using constant discount rates, but these discount rates can be different for the different utilities and costs.

The General Model
First, we state the general model. For simplicity, we will restrict our analysis to the one-dimensional case, so that there is just one state variable x ∈ R and one control variable u ∈ R. The extension to multidimensional problems is straightforward.
The decision maker maximizes a flow of utilities enjoyed along an infinite planning horizon [0, ∞), but has the possibility to change to a better technology at any moment T ∈ [0, ∞). This change can modify the state dynamics, improve the payoffs, or have both effects. The utility function is given by

$$F(x,u,s) = \begin{cases} F_1(x,u) & \text{if } s < T, \\ F_2(x,u) & \text{if } s \ge T, \end{cases} \qquad (2)$$

and the state dynamics is driven by

$$\dot{x}(s) = \begin{cases} f_1(x(s),u(s)) & \text{if } s < T, \\ f_2(x(s),u(s)) & \text{if } s \ge T. \end{cases} \qquad (3)$$

Finally, the agent incurs a cost Ω(x(T), T) at the moment T in which she implements the new technology. The objective of the decision maker is to maximize

$$\int_0^T d_1(s,0)\,F_1(x(s),u(s))\,ds + \int_T^\infty d_2(s,0)\,F_2(x(s),u(s))\,ds - d_3(T,0)\,\Omega(x(T),T). \qquad (4)$$

Functions $F_i$ and $f_i$, for i ∈ {1, 2}, and $d_j$, for j ∈ {1, 2, 3}, are assumed to be, at least, continuously differentiable in all their arguments. In addition, we will assume that the second integral converges.
In the previous expression, functions $d_j(s,t)$, for j ∈ {1, 2, 3}, represent the way the agent at time t (the t-agent) discounts the different utilities (profits and costs) enjoyed at a future time s. Hence, it is natural to assume that $\partial d_j(s,t)/\partial s < 0$ (later enjoyments are valued less than more recent ones) and $\lim_{s\to\infty} d_j(s,t) = 0$. Unless $d_j(s,t) = e^{-\rho(s-t)}$ for all j, with ρ a (positive) constant number, time preferences become time inconsistent, in the sense that what is optimal for the agent at time t is no longer optimal for the agent at a future time t' > t. In order to find time consistent decision rules, we have to solve a sequential game with a continuous set of players, described by all of the t-agents. In the literature of non-constant discounting, these agents are said to be sophisticated.

Remark 1.
In economic models and, in particular, in the resource model studied in Section 5 in this paper, the utility functions depend just on the control variable (representing consumption, extraction rate, harvest rate, ...). In that case, it is common to assume that $F_i(u)$, for i ∈ {1, 2}, is continuously differentiable, strictly increasing, and concave. In addition, the state dynamics is usually specified in terms of a continuously differentiable and concave (possibly linear) production function $g_i(x)$. These conditions facilitate the fulfillment of the conditions in Benveniste and Scheinkman (1979) for the concavity and differentiability of the value function.

Remark 2.
The extension of the theoretical results in the paper to multidimensional problems with x ∈ X ⊂ R n and u ∈ U ⊂ R m is straightforward, provided that the value functions are sufficiently smooth.
In order to solve the problem (2)-(4), we can proceed backwards in time. For t > T, i.e., once the new technology has been adopted, the agent at time t aims to maximize in the control variable u the payoff function

$$\int_t^\infty d_2(s,t)\,F_2(x(s),u(s))\,ds \qquad (5)$$

given the dynamics

$$\dot{x}(s) = f_2(x(s),u(s)), \qquad x(t) = x_t. \qquad (6)$$

In order to find time consistent decision rules (or time consistent policies) followed by sophisticated agents, we can apply the by now well-established procedures described in, e.g., Karp [11], Ekeland and Lazrak [14], Marín-Solano and Shevkoplyas [15] or Yong [16], among others. Later on, for t < T, the t-agent maximizes the general payoff function

$$\int_t^T d_1(s,t)\,F_1(x(s),u(s))\,ds + \int_T^\infty d_2(s,t)\,F_2(x(s),u(s))\,ds - d_3(T,t)\,\Omega(x(T),T) \qquad (7)$$

given the dynamics (3). In this problem, for s > T, the control decision rule u(s) = φ(x(s), s) is taken as given, and is the one calculated in the resolution of Problem (5)-(6). Hence, we have to compute the decision rule u(s) for the initial period s ∈ [t, T]. In addition, we must derive the switching time T.

Particular Cases
Although we will study how to solve the general problem stated in Section 2.1, in the present paper we will pay special attention to some particular cases that arise in economic applications and, in particular, in the management of a natural resource.

Problem A: Non-Constant Discounting
The standard procedure in economics is to assume that the discount function depends on the time distance between the moment t in which a decision is taken and the moment s in which utility derived from that decision will be enjoyed. In that case, $d_j(s,t) = \theta_j(s-t)$, for j ∈ {1, 2, 3}. Functions $\theta_j(\tau)$, τ ∈ [0, ∞), are assumed to be continuously differentiable. The corresponding instantaneous discount rates are given by

$$\rho_j(\tau) = -\frac{\theta_j'(\tau)}{\theta_j(\tau)}.$$

As usual, we assume that $\rho_j(\tau) > 0$, for all τ, and $\lim_{\tau\to\infty} \rho_j(\tau) > 0$. Present-biased preferences are represented by a nonincreasing discount rate ($\rho'(s) \le 0$).
In addition, it is commonly assumed that the discount rate is unique, so that $\theta_j(\tau) = \theta(\tau)$ and $\rho_j(\tau) = \rho(\tau)$, for all τ ∈ [0, ∞). As a result, the intertemporal utility function (7) becomes

$$\int_t^T \theta(s-t)\,F_1(x(s),u(s))\,ds + \int_T^\infty \theta(s-t)\,F_2(x(s),u(s))\,ds - \theta(T-t)\,\Omega(x(T),T). \qquad (8)$$

Problem A consists in looking for time consistent strategies maximizing (8) subject to (3) and to the future behavior of the agent. If the discount rate is constant and given by ρ > 0, then the discount function is an exponential, $\theta(\tau) = e^{-\rho\tau}$, and we recover the so-called Discounted Utility model that has been widely used in Economics. In that case, time preferences are time consistent and we simply have to find the optimal switching time for a standard optimal control problem. However, the problem becomes much more complicated in the case of (time-distance) non-constant discounting.
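As a concrete illustration of these assumptions, the sketch below computes the instantaneous discount rate, minus the derivative of the discount function divided by the discount function, for one admissible choice: a mixture of two exponentials. The mixture and its parameter values are assumptions of this example, chosen because the rate is positive, nonincreasing, and has a positive limit.

```python
import math

# Hypothetical parameters of a two-exponential mixture discount function.
W, R_SLOW, R_FAST = 0.5, 0.1, 1.0

def theta(tau):
    """Discount function theta(tau) = W e^{-R_SLOW tau} + (1 - W) e^{-R_FAST tau}."""
    return W * math.exp(-R_SLOW * tau) + (1.0 - W) * math.exp(-R_FAST * tau)

def theta_prime(tau):
    """Derivative of theta with respect to tau."""
    return (-W * R_SLOW * math.exp(-R_SLOW * tau)
            - (1.0 - W) * R_FAST * math.exp(-R_FAST * tau))

def rho(tau):
    """Instantaneous discount rate -theta'(tau) / theta(tau)."""
    return -theta_prime(tau) / theta(tau)

if __name__ == "__main__":
    for tau in (0.0, 1.0, 5.0, 20.0):
        print(tau, rho(tau))
```

The rate starts at the weighted average of the two rates and declines towards the slow rate, so short-term choices are discounted more heavily than long-term ones.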

Problem B: Modified Non-Constant Discounting
Problem A describes the standard model of non-constant discounting. Note that the decision maker, at time t < T, discounts future enjoyments at time s > T by taking as a reference point the initial time t, so that $d_2(s,t) = \theta(s-t)$. This is, we think, the natural approach in a setting with non-constant discounting in which the agent discounts the future by using the same discount rate. In Problem B we make a slight modification of this approach, by writing $d_2(s,t) = \theta(T-t)\cdot\theta(s-T)$. Then, the intertemporal utility function becomes

$$\int_t^T \theta(s-t)\,F_1(x(s),u(s))\,ds + \theta(T-t)\left[\int_T^\infty \theta(s-T)\,F_2(x(s),u(s))\,ds - \Omega(x(T),T)\right]. \qquad (9)$$

Although this approach is, we think, questionable, it will help us to better understand the differences between non-constant and constant discounting. Note that, if the discount rate is constant, the discount function in Problems A and B is the same, $\theta(s-t) = e^{-\rho(s-t)} = \theta(T-t)\cdot\theta(s-T)$, and both problems become equivalent.
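The difference between the two specifications of the post-switch discount factor can be sketched numerically. The mixture discount function and the dates below are illustrative assumptions; only the two formulas, theta(s - t) for Problem A and theta(T - t) times theta(s - T) for Problem B, come from the text.

```python
import math

def mixture_theta(tau, w=0.5, r_slow=0.1, r_fast=1.0):
    """Hypothetical non-constant discount function (mixture of exponentials)."""
    return w * math.exp(-r_slow * tau) + (1.0 - w) * math.exp(-r_fast * tau)

def exponential_theta(tau, rate=0.1):
    """Standard exponential discount function with a constant rate."""
    return math.exp(-rate * tau)

def d2_problem_a(s, t, theta):
    """Problem A: the t-agent discounts a payoff at s > T from her own position t."""
    return theta(s - t)

def d2_problem_b(s, t, T, theta):
    """Problem B: discounting is restarted at the switching time T."""
    return theta(T - t) * theta(s - T)

if __name__ == "__main__":
    t, T, s = 0.0, 5.0, 10.0
    for name, th in (("mixture", mixture_theta), ("exponential", exponential_theta)):
        print(name, d2_problem_a(s, t, th), d2_problem_b(s, t, T, th))
```

With a constant rate the two factors coincide. With the declining-rate mixture, Problem B applies the high short-run rates a second time after T, so post-switch payoffs are discounted more heavily than in Problem A.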

Problem C: Heterogeneous Discounting
As a third particular case, we consider a situation in which the decision maker has a constant discount rate, but it is non-unique. In the present paper, Problem C is represented by the use of two different discount functions. More precisely, $d_1(s,t) = e^{-\rho_1(s-t)}$ and $d_2(s,t) = d_3(s,t) = e^{-\rho_2(s-t)}$. Several justifications for the use of heterogeneous discount rates in the problem of extraction of a natural resource with a regime switch (due to the implementation of a new technology) were presented in the Introduction. We refer to, e.g., Marín-Solano and Patxot [13] and de-Paz et al. [17] for the discussion of the rationale and quantitative and qualitative implications of the introduction of these time preferences in more general problems. As illustrated in those papers, there are some relevant qualitative effects appearing in real life situations that can be explained by the use of heterogeneous discount functions for goods of a different nature.
In the present paper, for simplicity, we will assume that both the cost of implementing the new technology and future utilities are discounted at the same discount rate ρ 2 . For the derivation of the theoretical results we will not make assumptions on the sign of ρ 1 − ρ 2 .

Preliminary Results
The standard switching conditions for our problem in standard optimal control theory are usually formulated in terms of the Hamiltonian functions (in present or current value forms) corresponding to the two regimes. Unfortunately, there is no easily manageable version of the Pontryagin maximum principle in problems with non-constant discounting (as illustrated in Karp [11], a problem with non-constant discounting can be rewritten as a standard problem with a constant discount rate by introducing an auxiliary term in the Hamiltonian function. However, such a term incorporates the solution to the problem in feedback form). In the present paper we will follow an alternative approach. The idea consists of transforming the problem with a switching time into a finite horizon problem with free terminal time and time inconsistent preferences. More precisely, we will divide the problem into several steps, described in Section 4.1. In these steps, we will need to make use of conditions for finding strategies in both the control variable u and the terminal time T. In this section we collect the main theorems that will be used at the different steps.

Dynamic Programming Equation in Infinite Horizon
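We summarize in this section some results presented in Marín-Solano and Shevkoplyas [15]. Let us consider the problem with an intertemporal utility function

$$\int_t^\infty d(s,t)\,F(x(s),u(s),s)\,ds \qquad (11)$$

subject to

$$\dot{x}(s) = f(x(s),u(s),s), \qquad x(t) = x. \qquad (12)$$

Functions d(s, t), F(x, u, s) and f(x, u, s) are assumed to be continuously differentiable in all their arguments. If u*(s) = φ(x(s), s) is a decision rule, then the corresponding payoff is given by

$$V(x,t) = \int_t^\infty d(s,t)\,F(x(s),\phi(x(s),s),s)\,ds. \qquad (13)$$

Following Ekeland and Lazrak [14], for ε > 0, let us define

$$u_\varepsilon(s) = \begin{cases} v(s) & \text{if } s \in [t, t+\varepsilon), \\ \phi(x(s),s) & \text{if } s \ge t+\varepsilon. \end{cases} \qquad (14)$$

If the t-agent can precommit her behavior during the period [t, t + ε), the payoff along the perturbed control path $u_\varepsilon$ is given by

$$V_\varepsilon(x,t) = \int_t^{t+\varepsilon} d(s,t)\,F(x(s),v(s),s)\,ds + \int_{t+\varepsilon}^\infty d(s,t)\,F(x(s),\phi(x(s),s),s)\,ds. \qquad (15)$$

Definition 1. A decision rule u*(s) = φ(x(s), s) is an equilibrium rule if the limit

$$P(x,\phi,v,t) = \lim_{\varepsilon\to 0^+} \frac{V_\varepsilon(x,t) - V(x,t)}{\varepsilon},$$

with $V_\varepsilon$ given by (15), attains its maximum for v = φ(x, t). Alternatively, equilibrium rules are characterized by the condition P(x, φ, v, t) ≤ 0.
From Theorem 6 in Marín-Solano and Shevkoplyas [15], let the value function be given by (13), with φ(x(s), s) as the equilibrium rule. If the value function is of class $C^1$, then the solution u = φ(x, t) to (16) is an equilibrium rule. Alternatively, the solutions to (17) are equilibrium rules.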

Dynamic Programming Equation in Finite Horizon
Next, let us consider the problem of a sophisticated agent maximizing

$$\int_t^T d(s,t)\,F(x(s),u(s),s)\,ds + G(x(T),t,T) \qquad (18)$$

subject to (12), with functions d(s, t), F(x, u, s), f(x, u, s), and G(x, t, T) continuously differentiable in all their arguments. This problem is similar to the one studied in Section 4.1 in Marín-Solano and Shevkoplyas [15], but now function G can also depend explicitly on t.
As above, for ε > 0, let us consider the variations (14). If the t-agent can precommit her behavior during the period [t, t + ε), the valuation along the perturbed control path $u_\varepsilon$ is given by

$$V_\varepsilon(x,t) = \int_t^{t+\varepsilon} d(s,t)\,F(x(s),v(s),s)\,ds + \int_{t+\varepsilon}^T d(s,t)\,F(x(s),\phi(x(s),s),s)\,ds + G(x(T),t,T). \qquad (19)$$

Then, equilibrium rules for the problem (18)-(12) are defined as in Definition 1.

Proposition 1.
If the value function V(x, t) is of class $C^1$ in all its arguments, then it satisfies the functional equation

Proof. See Appendix A.
In the previous proposition we have assumed that the equilibrium rule is already given. Next, we prove that the equilibrium rule can be obtained from the right-hand term of the functional equation. The resulting Equation (21) is the Dynamic Programming Equation for the problem (18)-(12).

Proposition 2.
If the value function is of class C 1 , then the solution u = φ(x, t) to the dynamic programming Equation (21) is an equilibrium rule.
Proof. See Appendix A.

A Free Terminal Time Problem
Finally, we study the problem with intertemporal utility function (18) subject to the dynamics (12), but now we consider that the final time T is also a decision variable. We will analyze the problem under different degrees of sophistication of the decision maker.
First, let us consider that the terminal time T can be decided by the agent at the initial time, according to her time preferences. Although this means that the terminal time can be precommitted by the 0-agent, which is not in the spirit of looking for time consistent decision rules, we will start with this simple approach to frame the problem.
For t ≤ T, let $V^T(x,t)$ denote the valuation along the equilibrium rule of the t-agent starting at initial state x(t) = x with terminal time T. If the 0-agent can decide the terminal time, she will simply maximize in T the function $V^T(x,0)$. For this standard optimization problem, it is rather straightforward to adapt the proof in Hartl and Sethi [18] for ordinary optimal control problems to our setting with a general discount function.

Proposition 3. Let us consider Problem (18)-(12) for a time consistent (or sophisticated) agent, with the terminal time T free. If the agent can decide the terminal time at t = 0, then a necessary condition for the optimality of T* from the perspective of the 0-agent is
Proof. It is similar to the proof of Proposition 4 in Marín-Solano and Navas [12] for the case of non-constant discounting.
Under no commitment in the terminal time, each T-agent (who can be seen, for every T, as a different player in our setting with time inconsistent preferences) will have to decide whether it is convenient for her to stop the problem (so the terminal time is T) or to continue. In order to make this decision, the T-agent has to compare the payment received if she finishes the problem at time T with the payment received at the future moment at which she will stop if she decides to continue at time T. Next we formalize this idea.

Definition 2.
A terminal strategy for sophisticates is a set I ⊂ [0, ∞) defined as follows: τ ∈ I if, and only if, The idea is that elements τ ∈ I are the terminal times at which the agent at time τ decides to stop if the problem has not finished previously. Then, the final time for a sophisticated agent T * is characterized as follows: Assume that T * is the terminal time. Then, for every s ∈ [t, T * ), every s-agent obtains higher profits by finishing the problem at time T * compared with finishing the problem at time s, i.e., V s (x(s), s) < V T * (x(s), s).

Proposition 4.
If T * ∈ (0, ∞) is the final time for a sophisticated agent, then
Next, if there exists ε > 0 such that the interval [T * , T * + ε) ⊂ I, and if the problem does not finish at time T * , it will finish immediately later. Since T * is the terminal time then, for all τ ∈ (T * , . This suggests the following definition.

Definition 3.
We say that the agent is ε-sophisticated if candidates for the terminal time T satisfy the following conditions: There exists δ > 0 such that: 1.
For all τ ∈ (T, T + δ), V T (x(T), T) ≥ V τ (x(T), T). If U is the set of points verifying these conditions, ε-sophisticated agents finish the problem at time T * = inf{T ∈ U}.
ε-sophisticated agents are partially myopic, in the sense that they analyze whether it is convenient for them to stop at time T * or to continue during a very short time period. The following proposition provides a necessary condition for a terminal time for ε-sophisticated agents.

Proof. See Appendix A.
Concerning the search for the terminal time for (fully) sophisticated agents, if T * is the terminal time for ε-sophisticated agents and there exists ε > 0 such that the interval [T * , T * + ε) ⊂ I, then T * is also the terminal time for sophisticated agents. This is indeed the situation that we find in the numerical resolution of the model of Section 5. In that model, if T * is the terminal time (corresponding to the switching time in the original model) for an ε-sophisticated agent, for problems with an initial state lower than x(T * ) we obtain a corner solution (condition (25) is satisfied), so that the agent decides to implement the new technology at the initial time. Since the state dynamics x(s) is (strictly) decreasing, every T-agent, for all T > T * , will choose to stop at time T.

Solving the Model: Decision Rules and Switching Times
In this Section we present a general procedure to solve Problem (7) subject to the state dynamics (3). The underlying idea consists in applying first the results presented in Section 3.1 in order to solve the problem for t ≥ T. Later on, the original problem with a regime switch is transformed into a finite horizon problem with final function and free terminal time. For that problem, we compute first the decision rule for arbitrary T, and finally find the switching time to be chosen by ε-sophisticated agents (we refer to the discussion in Section 3.3). We will assume that the regime switch can take place only once. This will be the case, indeed, in our setting in which the decision maker (e.g., a firm) has to decide when to change to a new and better technology. Once the firm has paid the cost of implementing the new technology, it will be profitable to maintain the improvement along the whole remaining planning horizon. After presenting the general procedure, we will make some remarks on some particularities that appear in our problems with non-constant discounting (Problem A), modified non-constant discounting (Problem B), and heterogeneous discounting (Problem C).

The General Model
We will solve the problem in several steps.
Step 1. The first step consists in solving the problem for t > T. Hence, we must solve the problem with intertemporal utility function (5) subject to (6). By applying the results presented in Section 3.1, the equilibrium decision rule can be derived by solving (16) or (17). Let $u^*(s) = \phi_2(x(s),s)$ denote the equilibrium decision rule for t ≥ T, and let

$$V_2(x,t) = \int_t^\infty d_2(s,t)\,F_2(x(s),\phi_2(x(s),s))\,ds$$

be the corresponding value function.

Step 2. Once we have derived the equilibrium rule $u(s) = \phi_2(x(s),s)$, for s ≥ T, we can solve the corresponding dynamical equation with initial condition $x(T) = x_T$, i.e.,

$$\dot{x}(s) = f_2(x(s),\phi_2(x(s),s)), \qquad x(T) = x_T.$$

Let $x^*(s) = \varphi_2(x_T,s)$ be its solution, and define $\bar\phi_2(x_T,s) = \phi_2(\varphi_2(x_T,s),s)$. By substituting in (7), along this trajectory, the payoff function can be rewritten as

$$\int_t^T d_1(s,t)\,F_1(x(s),u(s))\,ds + \int_T^\infty d_2(s,t)\,F_2(\varphi_2(x_T,s),\bar\phi_2(x_T,s))\,ds - d_3(T,t)\,\Omega(x_T,T).$$

By defining

$$G(x_T,t,T) = \int_T^\infty d_2(s,t)\,F_2(\varphi_2(x_T,s),\bar\phi_2(x_T,s))\,ds - d_3(T,t)\,\Omega(x_T,T), \qquad (27)$$

the intertemporal utility function can be rewritten as

$$\int_t^T d_1(s,t)\,F_1(x(s),u(s))\,ds + G(x(T),t,T), \qquad (28)$$

subject to

$$\dot{x}(s) = f_1(x(s),u(s)), \qquad x(t) = x. \qquad (29)$$

Step 3. Problem (28)-(29) is a non-standard optimal control problem with time inconsistent preferences in a finite planning horizon. Hence, we can solve it for an arbitrary "final" time T. Unlike the problem studied by Marín-Solano and Shevkoplyas [15], the present problem exhibits a "calendar effect", in the sense that the final function depends explicitly on t. By applying Proposition 2 we know that equilibrium rules $u(s) = \phi_1(x(s),s)$ for s ∈ [0, T) are the solutions to (30). For the calculation of the value function $V_1(x,t)$ we can solve the dynamic programming Equation (21). Alternatively, if we can solve explicitly (in closed form) the differential equation given by the state dynamics along the equilibrium rule, it can be more convenient to proceed as follows. Given the equilibrium rule $u(s) = \phi_1(x(s),s)$, for s < T, let $x(s) = \varphi_1(x,s)$ be the solution to

$$\dot{x}(s) = f_1(x(s),\phi_1(x(s),s)), \qquad x(t) = x, \qquad \text{for } t < s < T.$$

By defining $\bar\phi_1(x,s) = \phi_1(\varphi_1(x,s),s)$ and substituting in (28), we obtain

$$V_1(x,t) = \int_t^T d_1(s,t)\,F_1(\varphi_1(x,s),\bar\phi_1(x,s))\,ds + G(x_T,t,T), \qquad (31)$$

where $x_T = \varphi_1(x,T)$. In practice, for the computation of the equilibrium rule and the corresponding value function, Equations (30) and (31) have to be solved jointly.

Step 4. It remains to compute the switching time T*. Note that we have transformed the problem of finding the switching point into that of looking for the "optimal" terminal time in a finite horizon problem with a final function. Hence, we can use Proposition 5 to solve the problem for ε-sophisticated decision makers, as defined in Definition 3.
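Step 2 can be sketched numerically in a special case. The block below is only an illustration under stated assumptions that are not part of the general procedure: logarithmic utility, a linear post-switch rule u = lam * x, linear dynamics with growth rate a - gamma2 * lam along that rule, time-distance discounting with a hypothetical two-exponential mixture for both the post-switch utilities and the cost, and a cost alpha * ln(x) + beta. It evaluates the final function, the discounted continuation payoff minus the discounted cost, both in closed form and by quadrature along the post-switch trajectory.

```python
import math

W, R_SLOW, R_FAST = 0.5, 0.1, 1.0   # hypothetical mixture parameters

def theta(tau):
    """Illustrative discount function: mixture of two exponentials."""
    return W * math.exp(-R_SLOW * tau) + (1.0 - W) * math.exp(-R_FAST * tau)

def final_function_closed(x_T, t, T, lam, a, gamma2, alpha, beta):
    """Closed form of the final function under the assumptions of this sketch."""
    g = a - gamma2 * lam                      # growth rate of x after the switch
    total = 0.0
    for w, r in ((W, R_SLOW), (1.0 - W, R_FAST)):
        # integral over s >= T of w e^{-r (s - t)} [ln(lam x_T) + g (s - T)]
        total += w * math.exp(-r * (T - t)) * (math.log(lam * x_T) / r + g / r ** 2)
    return total - theta(T - t) * (alpha * math.log(x_T) + beta)

def final_function_quadrature(x_T, t, T, lam, a, gamma2, alpha, beta,
                              horizon=300.0, n=150000):
    """Same quantity by the trapezoidal rule on a truncated horizon."""
    g = a - gamma2 * lam
    h = horizon / n

    def integrand(s):
        x_s = x_T * math.exp(g * (s - T))     # state along the post-switch rule
        return theta(s - t) * math.log(lam * x_s)

    acc = 0.5 * (integrand(T) + integrand(T + horizon))
    for k in range(1, n):
        acc += integrand(T + k * h)
    return acc * h - theta(T - t) * (alpha * math.log(x_T) + beta)

if __name__ == "__main__":
    params = dict(lam=1.0 / 5.5, a=0.0, gamma2=1.0, alpha=0.0, beta=0.2)
    print(final_function_closed(10.0, 0.0, 5.0, **params))
    print(final_function_quadrature(10.0, 0.0, 5.0, **params))
```

In this example the final function depends on the position t of the decision maker and not only on the state at T, which is the "calendar effect" mentioned in Step 3.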

Particular Cases
Under non-constant discounting (Problem A), the decision rule after the switching point (i.e., for t ≥ T) is stationary. Hence, since the problem is autonomous, we can restrict our attention to stationary convergent Markovian strategies, i.e., strategies u(s) = φ(x(s)) for which there exists x ∞ < ∞ and a neighborhood U of x ∞ such that, for every x T ∈ U, the solution to (6) along u(s) = φ(x(s)) converges to x ∞ . For stationary convergent strategies, the integral (5) converges. Later on, in the implementation of Step 2, the final function G(x T , t, T) depends explicitly on t and T. This fact can complicate some calculations, such as those related to the derivation of the terminal time corresponding to the switching point. Before the implementation of the new technology (t < T) the decision rules are non-stationary, in general. However, in the model of the following Section we will present a situation (with a constant cost function) in which the equilibrium strategies are stationary along the whole planning horizon.
In Problem B, the decision rule after the switching point (i.e., for t ≥ T) coincides with that in Problem A and is, therefore, stationary. Concerning the final function, it is independent from t. If, in addition, the cost function Ω does not depend explicitly on T, then the final function G is also independent from T, simplifying in this way the search of the terminal or switching time. Later on, in the implementation of Step 2, as in the case of nonconstant discounting, the final function G(x T , t, T) depends explicitly on t and T. Before the implementation of the new technology (t < T) the decision rules are non-stationary, also for the case of constant (or null) cost.
Under heterogeneous discounting (Problem C) things are similar to Problem B. Equilibrium decision rules are stationary for t > T and non-stationary for t < T. Although the final function (27) depends explicitly on t and T for this model, its dependence is such that it can be removed, as we show in Section 5.3 when we solve a resource extraction model with technology adoption.

A Resource Extraction Model with Technology Adoption
In this Section we illustrate the previous results by applying them to the management of a natural resource whose owner has to decide when to adopt a new technology improving the extraction process. In the model, we assume that the utility function in both periods is the same, so that $F_1(x,u) = F_2(x,u)$. In particular, we take $F_1 = F_2 = \ln u$. Concerning the state dynamics, it is given by $\dot{x}(s) = a\,x(s) - \gamma_i\,u(s)$, with i = 1 before the switch (s < T) and i = 2 afterwards (s ≥ T). For a = 0 we recover the simplest model of the extraction of a nonrenewable resource, which is probably the most interesting case in our setting. If a > 0, the production function presents constant returns to scale. However, it implies an exponential, unlimited, growth of the resource, a property that is ecologically unrealistic in the setting of natural resources.
In any case, we will solve the model for an arbitrary a ≥ 0 (it can be easily checked that, under the assumptions made regarding Problems A, B, and C, the integrals converge). Concerning the remaining parameters, the improvement in the technology is represented by taking γ 1 > γ 2 > 0. For a two-player game discounting the future at constant (and unique) discount rates (and a = 0), this problem was studied in Long et al. [8].
The cost function is assumed to be Ω(x) = α ln x + β, with α, β ≥ 0. This choice can be justified economically in several ways. As we will illustrate later when we solve the model, it could correspond to a situation in which the cost is paid in units of the resource. Since, as we will show, the expression of the value function for time t ≥ T is $V(x_T) = A \ln x_T + B$, with A a positive number, if a fraction δ ∈ (0, 1) of the resource is paid in order to implement the technology, then the valuation becomes $V((1-\delta)x_T) = A \ln x_T + B + A \ln(1-\delta)$. In this case, α = 0 and β = −A ln(1 − δ). For our model with logarithmic utilities, paying in units of the resource implies that the cost is constant and independent of the amount of the resource. Probably, it would be more realistic to pay a fraction δ of the sum of discounted utilities after the implementation of the improvement, represented by the value function at time T. In that case, after paying the cost, the valuation would be $(1-\delta)V(x_T) = A \ln x_T + B - \delta A \ln x_T - \delta B$. Therefore, α = δA and β = δB in this setting.
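The two interpretations of the cost can be checked with a few lines of code. The numerical values of A, B and delta below are hypothetical; only the functional form V(x) = A ln x + B and the two payment schemes come from the text.

```python
import math

def value_after_switch(x, A, B):
    """Post-switch valuation V(x) = A ln x + B."""
    return A * math.log(x) + B

def cost_paid_in_resource(x, A, B, delta):
    """Loss in valuation when a fraction delta of the resource is given up."""
    return value_after_switch(x, A, B) - value_after_switch((1.0 - delta) * x, A, B)

def cost_paid_in_value(x, A, B, delta):
    """Loss in valuation when a fraction delta of the value itself is given up."""
    return delta * value_after_switch(x, A, B)

if __name__ == "__main__":
    A, B, delta = 5.5, -2.0, 0.1   # hypothetical values
    for x in (1.0, 10.0, 100.0):
        print(x, cost_paid_in_resource(x, A, B, delta), cost_paid_in_value(x, A, B, delta))
```

The first cost is the same for every stock level (alpha = 0, beta = -A ln(1 - delta)), while the second varies with ln x (alpha = delta A, beta = delta B).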
In the following we will solve the above problem for the three discount functions described in Section 2: nonconstant discounting, modified nonconstant discounting, and heterogeneous discounting. In the final step, we will derive the conditions for interior solutions, by making use of (24). Conditions for corner solutions can be written in a similar way.

Problem A
First, we will solve the problem stated in Section 2.2.1, corresponding to non-constant discounting. This is indeed the most interesting case. We proceed according to the following steps.
Solution for $t \geq T$. In that case, the $t$-agent has to solve the problem with the intertemporal utility function given by This problem has already been addressed in several papers (see, e.g., Marín-Solano and Navas [12]). It can be easily shown that a stationary linear decision rule exists for this problem, and it is given by By substituting (34) in (33) and solving the differential equation, we obtain
Transforming the switching time problem into a finite horizon problem. From (8), the payoff function of the $t$-agent at time $t < T$ is given by By taking (35) for $t = T$, substituting and simplifying, the functional above can be written as where Finally, the dynamics for $s < T$ is given by
Solving the problem for $t < T$. From Proposition 2 for the case $d(s, t) = \theta(s - t)$, first we solve By solving (38) for $u(s) = \phi(x(s))$ given as in (39), we obtain Therefore, By substituting (41) in (36), taking $s = T$ in (40) and substituting Therefore, and, from (39) and (41), the decision rule becomes i.e., Note that if the cost is paid in units of the resource, so that $\alpha = 0$ (the cost is constant), the decision rule is stationary.
Derivation of the switching time. It remains to compute the switching time for -sophisticated agents. We apply the results in Section 3.3 to problem (36)-(37) for the case in which the discount function is d(s, t) = θ(s − t). In Problem A, the terminal condition becomes It remains to compute the four terms appearing in Equation (44). First, note that, taking t = 0 and s = T * in (43), Concerning the second term, since Next, Finally, after several calculations, the fourth term in Equation (44) is given by By substituting (45)-(48) in (44), the switching condition is derived.

Problem B
Next, let us solve the problem stated in Section 2.2.2, corresponding to a modified version of nonconstant discounting. We proceed as in the previous case.
Solution for $t \geq T$. The $t$-agent has to solve the problem with payments given by (32) and dynamics (33), whose solution is (34)-(35). In addition, for $t \geq T$, the value function is given by
Transforming the switching time problem into a finite horizon problem. From (9) and (49), the intertemporal utility function of the $t$-agent at time $t < T$ is given by where
Solving the problem for $t < T$. As in the previous case, by applying Proposition 2 and guessing $V_2(x, t) = g(t) \ln x + h(t)$, we obtain (39)-(41). By following the same procedure as in Problem A we easily derive Therefore, and from (39) and (41), the decision rule becomes i.e.,
Derivation of the switching time. For the calculation of the switching time for -sophisticated agents, note that in Problem B the final function depends just on the state variable. Hence, the terminal condition simplifies to Next we compute the three terms appearing in Equation (54). Taking $t = 0$ and $s = T^*$ in (53), $\ln u|_{x = x(T^*),\, t = T^*,\, T = T^*} =$ Finally, By substituting (55)-(57) in (54), we obtain the switching condition.

Problem C
Finally, we solve the problem with heterogeneous discounting presented in Section 2.2.3.
Solution for $t \geq T$. In this case, the optimal decision rule for the problem (32)-(33) with $\theta(s) = e^{-\rho_2 s}$ is $u(s) = \phi(x(s)) = (\rho_2/\gamma_2)\,x(s)$, i.e., $u(s) = \bar{\phi}(x_T, s) = (\rho_2/\gamma_2)\,e^{(a - \rho_2)(s - T)} x_T$, and the corresponding value function is
Transforming the switching time problem into a finite horizon problem. From (10) and (58), the payoff function of the $t$-agent at time $t < T$ can be written as where $\bar{G}$
Solving the problem for $t < T$. By proceeding as in the previous cases, if the value function is $V_1(x, t) = g(t) \ln x + h(t)$, we obtain (39)-(41). By substituting these expressions in (59) and (60), we obtain dτ − ln(γ 1 g(s)) ds+

Derivation of the switching time. For the calculation of the switching time for -sophisticated agents, in order to apply the results in Section 3.3, we can write for $\bar{G}(x_T)$ given as in (60). Alternatively, we can apply Proposition 3 in Marín-Solano and Patxot [13] to the function $\bar{G}(x_T)$. It is straightforward to check that both procedures are equivalent. Indeed, note that, at the switching time $T^*$, where Since the decision rule is given by (41) with $g(\tau)$ given by (61), taking $t = 0$ and $s = T^*$, $\ln u|_{x = x(T^*),\, t = T^*,\, T = T^*} =$ ds $- \ln(\gamma_1 g(T^*)) + \ln x_0$.
In a similar way, and From (63)-(65) we derive the switching condition.
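As a sanity check on the post-switch stage of Problem C, the sketch below integrates the stock numerically under a linear extraction rule and compares it with the exponential path $e^{(a - \rho_2)(s - T)} x_T$. It rests on two assumptions that are not taken verbatim from the text: the dynamics is $\dot{x} = a x - \gamma_2 u$, and the rule is $u = \rho_2 x / \gamma_2$. The stock $x_T$ at the switching time, the horizon and the Euler step are hypothetical choices.

```python
import math

def simulate_post_switch(x_T, a, rho2, gamma2, horizon, dt=1e-3):
    """Euler integration of x' = a*x - gamma2*u with u = rho2*x/gamma2 (assumed forms)."""
    steps = int(round(horizon / dt))
    x = x_T
    for _ in range(steps):
        u = rho2 * x / gamma2            # assumed stationary linear rule
        x += (a * x - gamma2 * u) * dt   # assumed resource dynamics
    return x

def closed_form_stock(x_T, a, rho2, elapsed):
    """Stock after `elapsed` time units: x_T * exp((a - rho2) * elapsed)."""
    return x_T * math.exp((a - rho2) * elapsed)

# Benchmark-style parameters; x_T and horizon are hypothetical.
x_T, a, rho2, gamma2, horizon = 1000.0, 0.0, 0.15, 1.1, 10.0

x_num = simulate_post_switch(x_T, a, rho2, gamma2, horizon)
x_exact = closed_form_stock(x_T, a, rho2, horizon)
rel_error = abs(x_num - x_exact) / x_exact
print(x_num, x_exact, rel_error)
```

Under these assumptions the stock decays at rate $\rho_2 - a$ regardless of $\gamma_2$: a more efficient technology raises consumption per unit of stock without changing the depletion path.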

Numerical Illustration
Next, we illustrate numerically some of the previous results for the case of a nonrenewable natural resource ($a = 0$), focusing on the two main settings: non-constant discounting (Problem A) and heterogeneous discounting (Problem C). Additionally, we include the case of standard exponential discounting, where temporal preferences are time consistent (Problem S), which can be obtained from any of the other cases by eliminating the temporal bias. In the case of non-constant discounting, we take as a discount function a convex combination of two exponential functions, i.e., $\theta(\tau) = \nu e^{-\rho_1 \tau} + (1 - \nu) e^{-\rho_2 \tau}$, with $\nu \in (0, 1)$ and $\rho_1 < \rho_2$, for which the instantaneous discount rate is given by
$$r(\tau) = -\frac{\theta'(\tau)}{\theta(\tau)} = \frac{\nu \rho_1 e^{-\rho_1 \tau} + (1 - \nu) \rho_2 e^{-\rho_2 \tau}}{\nu e^{-\rho_1 \tau} + (1 - \nu) e^{-\rho_2 \tau}},$$
which decreases from $r(0) = \nu \rho_1 + (1 - \nu) \rho_2$ to $\rho_1 = \lim_{\tau \to +\infty} r(\tau)$. Regarding the heterogeneous discounting case (Problem C), we take as discount functions $\theta_1(s - t) = e^{-\rho_1 (s - t)}$ for the instantaneous utility before the introduction of the innovation and $\theta_2(s - t) = e^{-\rho_2 (s - t)}$, $\rho_1 \neq \rho_2$, for utility after the regime switch. In our benchmark case, we take the parameter values $\nu = 0.5$, $\rho_1 = 0.05$, $\rho_2 = 0.15$ defining the temporal preferences of the decision maker. Regarding the efficiency of the exploitation process, we assume $\gamma_1 = 1.3$ and $\gamma_2 = 1.1$. Note that the parameters $\gamma_1$ and $\gamma_2$ determine the efficiency in extraction before and after the introduction of the innovation, respectively. The lower the value of $\gamma_i$, $i \in \{1, 2\}$, the more efficient the extraction process. Regarding the cost of innovation, we assume that it is a fraction $\delta$ of the value of the project (given by the value function) for the decision maker at the switching time $T^*$, and in particular we set $\delta = 0.045$. Moreover, as initial resource stock, we take $x_0 = 1000$.
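The behaviour of the instantaneous discount rate $r(\tau) = -\theta'(\tau)/\theta(\tau)$ for the benchmark mixture can be verified with a few lines of code. This is an illustrative sketch: the function names are hypothetical, and the rate is written in a form divided through by $e^{-\rho_1 \tau}$ only to avoid numerical underflow for large $\tau$.

```python
import math

NU, RHO1, RHO2 = 0.5, 0.05, 0.15   # benchmark values from the text

def theta(tau, nu=NU, rho1=RHO1, rho2=RHO2):
    """Discount function: convex combination of two exponentials."""
    return nu * math.exp(-rho1 * tau) + (1.0 - nu) * math.exp(-rho2 * tau)

def rate(tau, nu=NU, rho1=RHO1, rho2=RHO2):
    """Instantaneous discount rate -theta'(tau)/theta(tau), assuming rho1 < rho2."""
    w = math.exp(-(rho2 - rho1) * tau)   # relative weight of the fast-decaying term
    return (nu * rho1 + (1.0 - nu) * rho2 * w) / (nu + (1.0 - nu) * w)

for tau in (0.0, 10.0, 50.0, 200.0):
    print(tau, theta(tau), rate(tau))
```

With the benchmark values the rate starts at $r(0) = 0.1$ and falls monotonically towards $\rho_1 = 0.05$.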
Finally, for the standard discounting case (Problem S), we use as a discount function $\theta(\tau) = e^{-\bar{\rho} \tau}$, where $\bar{\rho} = \rho_1 \rho_2 / (\rho_1 - \nu \rho_1 + \nu \rho_2)$ is obtained as the solution of
$$\int_0^{\infty} e^{-\bar{\rho} \tau}\, d\tau = \int_0^{\infty} \left[ \nu e^{-\rho_1 \tau} + (1 - \nu) e^{-\rho_2 \tau} \right] d\tau. \qquad (66)$$
The intuition behind (66) is to find a constant rate of time preference, $\bar{\rho}$, that shows a similar overall level of impatience to the one given by the non-constant discount function, an idea that was proposed in Strulik [19].
Table 1 collects the switching times and the resource stock left at that time for the nonconstant, heterogeneous and standard discounting cases. We can observe that the existence of some bias in the temporal preferences delays the adoption of the new technology, especially under non-constant discounting. In that case, the innovation is introduced almost twice as late as in the standard case. Looking now at Figure 1, it is interesting to observe that the evolution of the resource stock under nonconstant and standard discounting is very similar. However, despite this coincidence in the extraction rates, note that since the decision maker in Problem S introduces the innovation significantly earlier, she will consume more from that moment up to the time at which a decision maker with non-constant discounting preferences introduces it. This can be easily seen in the plot of the evolution of the consumption rate in Figure 2. In the case of heterogeneous discounting, due to the particular bias in this setting, we can observe that in the initial periods the decision maker undervalues all of the payoffs she will earn after the regime shift, so there is a significant overconsumption during these initial periods, which can be observed in the consumption rate. As the switching time approaches, this undervaluation decreases, and it disappears at $T^*$. Consequently, from $T^*$ on, the time consistent consumption rule coincides with that of a decision maker with standard discounting at a rate of time preference of $\rho_2$.
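The closed form for $\bar{\rho}$ can be checked against the benchmark values. The sketch below assumes that the equivalence criterion matches the total discount weight $\int_0^{\infty} \theta(\tau)\, d\tau$ of the two discount functions, which is the reading consistent with the stated formula; the function names are hypothetical.

```python
def equivalent_rate(nu, rho1, rho2):
    """Constant rate rho_bar = rho1*rho2 / (rho1 - nu*rho1 + nu*rho2)."""
    return rho1 * rho2 / (rho1 - nu * rho1 + nu * rho2)

def total_weight_mixture(nu, rho1, rho2):
    """Integral over [0, inf) of nu*exp(-rho1*t) + (1-nu)*exp(-rho2*t)."""
    return nu / rho1 + (1.0 - nu) / rho2

def total_weight_exponential(rho):
    """Integral over [0, inf) of exp(-rho*t)."""
    return 1.0 / rho

rho_bar = equivalent_rate(0.5, 0.05, 0.15)   # benchmark parameters from the text
print(rho_bar)
print(total_weight_exponential(rho_bar), total_weight_mixture(0.5, 0.05, 0.15))
```

For the benchmark parameters this gives $\bar{\rho} = 0.075$, which lies between the long-run rate $\rho_1 = 0.05$ and the initial rate $r(0) = 0.1$ of the non-constant discount function.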
Finally, we analyze the results in Table 2, which reports a sensitivity analysis with respect to some parameter values. In the setting of non-constant discounting, higher values of $\rho_2$ are associated with a higher impatience for short-run decisions, while in the heterogeneous discounting setting they imply an overvaluation of payoffs before the introduction of the innovation compared with payoffs after $T^*$. Similarly, in the case of standard discounting, with an overall constant impatience rate, the level of impatience increases with $\rho_2$, although in this last case there is no particular bias in the temporal preferences. In all three settings we can observe that an increase of $\rho_2$ delays the innovation, especially in the case of Problem A. Moreover, note that with nonconstant discounting, the long-term rate of time preference is always the same ($\rho_1 = 0.05$), so all of the resulting delays in $T^*$ can be attributed to the increase in short-term impatience. With regard to changes in the efficiency improvement associated with the innovation, lower values of $\gamma_2$ represent larger improvements in efficiency. When this happens, the innovation is adopted earlier in all three cases. On the contrary, increasing the cost of the innovation (a higher value of $\delta$) has the opposite effect, and in all cases the decision maker delays the regime shift. In conclusion, in terms of sustainability of the resource, it is clear that the sooner an improvement in the exploitation process is introduced, the larger the saving in wasted resources (note that one unit of consumption requires $\gamma_i$ units of the resource).
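The closing remark on resource savings can be made concrete with the benchmark efficiency parameters. This is only an arithmetic illustration: the helper names are hypothetical, and it uses nothing beyond the statement that one unit of consumption requires $\gamma_i$ units of the resource.

```python
def resource_needed(consumption, gamma):
    """Units of the resource used to deliver `consumption` units of consumption."""
    return gamma * consumption

def saving_share(gamma_before, gamma_after):
    """Share of the resource saved per unit of consumption after the switch."""
    return (gamma_before - gamma_after) / gamma_before

GAMMA1, GAMMA2 = 1.3, 1.1   # benchmark values from the text

before = resource_needed(1.0, GAMMA1)
after = resource_needed(1.0, GAMMA2)
share = saving_share(GAMMA1, GAMMA2)
print(before, after, share)
```

With $\gamma_1 = 1.3$ and $\gamma_2 = 1.1$, each unit of consumption requires about 15% less of the resource once the new technology is in place.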

Conclusions
In this paper we have studied the switching conditions between two different regimes, characterized by a possible change in the objective function and/or in the system dynamics, when the decision maker shows time inconsistent temporal preferences. In particular, we have focused on the cases of non-constant discounting and heterogeneous discounting. Each of these two settings induces a different bias in the temporal preferences. The main objective has been to analyze this framework from the perspective of a sophisticated agent, by transforming our original infinite horizon problem with a switching time into a finite horizon problem with free terminal time. After this, we derived the necessary conditions on the terminal time to be satisfied by decision makers with different degrees of sophistication (or rationality). Finally, the proposed procedure has been applied to a natural resource extraction model in which the decision maker has the option of implementing a more efficient exploitation technology.
There are several possible extensions of this work. In our resource extraction model we have focused on the case of log utility and a linear natural growth function. The extension to general isoelastic utilities or to non-linear growth functions would allow a richer analysis of the resource management problem. Another extension that we consider of special interest is the case of two agents where only one or both can decide on a regime shift.
Author Contributions: All authors have contributed equally to all the parts of the paper. All authors have read and agreed to the published version of the manuscript.
By dividing by $\epsilon$ and taking the limit $\epsilon \to 0^+$ we obtain and, from Definition 1, the result follows.
Proof of Proposition 4. Assume that $T^*$ is the terminal time. Then, for every $s \in [t, T^*)$, every $s$-agent obtains higher profits by finishing the problem at time $T^*$ compared with finishing the problem at time $s$, i.e., $V_s(x(s), s) < V_{T^*}(x(s), s)$. In particular, for $\epsilon > 0$, the $(T^* - \epsilon)$-agent will decide to continue until $T^*$. Therefore,