Article

Generalized Hyperbolic Discounting in Security Games of Timing

1 imec-DistriNet, Department of Computer Science, KU Leuven, 3001 Heverlee, Belgium
2 School of Computation, Information and Technology, Technical University of Munich, 85748 Garching, Germany
* Author to whom correspondence should be addressed.
Games 2023, 14(6), 74; https://doi.org/10.3390/g14060074
Submission received: 30 June 2023 / Revised: 12 November 2023 / Accepted: 26 November 2023 / Published: 30 November 2023
(This article belongs to the Special Issue Game Theory for Cybersecurity and Privacy)

Abstract: In recent years, several high-profile incidents have spurred research into games of timing. A framework emanating from the FlipIt model features two covert agents competing to control a single contested resource. In its basic form, the resource exists forever while generating value at a constant rate. As this research area evolves, attempts to introduce more economically realistic models have led to the application of various forms of economic discounting to the contested resource. This paper investigates the application of a two-parameter economic discounting method, called generalized hyperbolic discounting, and characterizes the game’s Nash equilibrium conditions. We prove that for agents discounting such that accumulated value generated by the resource diverges, equilibrium conditions are identical to those of non-discounting agents. The methodology presented in this paper generalizes the findings of several other studies and may be of independent interest when applying economic discounting to other models.

1. Introduction

Game theory’s great promise is to predict the evolution of a multi-agent interaction from agents’ preferences. Unfortunately, the particulars of scenarios tend to complicate the derivation of these predictions, and accurately describing a real scenario requires progressively more advanced models of agent behavior. This is markedly true for interactions that persist over time.
Research on games of timing dates back to the Cold War period, and various scenarios have by now motivated a sizable body of literature (see, for example, [1]). One effort is a series of papers centered around the so-called FlipIt model [2], which attempts to capture the dynamics of a persistent two-agent interaction involving a single contested resource. This resource is an abstract representation of, e.g., knowledge of cryptographic keys or passwords or ownership of cloud resources [3].
FlipIt models capture scenarios taken in part from a series of high-profile (cyber) attacks against supposedly well-protected targets—the Iranian industrial control systems [4], the U.S. Office of Personnel Management [5,6], large telecommunication providers [7,8], major health care insurance companies [9], and, recently, critical infrastructure companies in Ukraine [10]. An essential property shared by these threats is that against them, perfectly effective preventative investments are nearly impossible [11,12]. Mitigation strategies such as in-depth security audits or longer-term investments to reorganize existing IT infrastructures are therefore weighed against the economic incentives of would-be attackers.
While this line of work has explored diverse facets of security decision-making, a limitation of most studies on games of timing in general, and on FlipIt-like games in particular, is that they do not consider economic discounting in the valuation of the contested resource: although the environment changes over time, the value of the resource and the costs of attacking or defending it do not. A few more recent efforts have attempted to address this deficiency by considering the effect of exponential discounting on the valuation and defense costs over time [13,14,15]. Such studies have found new predictions of optimal defense investments, due mainly to the fact that an exponentially discounted investment has a finite cumulative valuation over time. Although mathematically appealing, exponential discounting is known to have low descriptive accuracy in most contexts. It remains to consider a more comprehensive approach to time-based discounting.
Our work meets this need by analyzing a FlipIt-like contested resource scenario under a more generic present-focused discounting framework using a two-parameter family of generalized hyperbolic discounting functions. These well-studied functions, introduced by Loewenstein and Prelec in 1992 [16], smoothly interpolate an agent’s temporal valuation between an arbitrary rate of exponential discounting and a constant valuation, nesting exponential and hyperbolic discounting as special cases. As we cover the entire space of generalized hyperbolic discounting functions, our conclusions subsume those of earlier research efforts involving persistent resource control—including some that do not involve discounting and work on exponential discounting [13,14].
Our primary technical contribution is to show that for a wide range of agent discounting preferences, agents’ strategic considerations and equilibrium conditions are the same as if they were not discounting. We also provide a complete characterization of player utilities and (Nash) equilibrium conditions for the FlipIt model in which both agents use the same strategy class, either periodic or exponential (defined in Section 4.5), and both agents apply a form of discounting with parameters from the same class, either sub-hyperbolic or super-hyperbolic (defined in Section 3). We show that generalized hyperbolic discounting unlocks the possibility of equilibria in which neither player moves, and we prove that periodic strategies no longer always strictly dominate the class of so-called renewal strategies, as they do for the model without discounting, a result first stated by van Dijk et al. [2]. Beyond these new discoveries, and because some of the derivations developed for this work can be used to extend results from less generic models, we believe that our framework is likely to be helpful in other timing-based games besides FlipIt.
We organized the rest of the paper as follows. In Section 2, we discuss related work. We introduce generalized hyperbolic discounting in Section 3 and describe our model, which builds on FlipIt, in Section 4. We present our analysis of the model in Section 5, and we discuss some interesting aspects of our findings in Section 6. Finally, we conclude in Section 7. Following the main document are appendices containing formal proofs and derivations to support the claims made in Section 5 and Section 6.

2. Related Work

Many interactions, including security interactions and especially those dealing with Advanced Persistent Threats (APTs), have an important temporal dimension. Studying such interactions necessitates comparing the value of present and future potential gains or losses. It is here that discounting models come into play, offering precise explanations for the valuations of economic resources, including privacy and security, by individuals, institutions, or societies over time.
Within the field of economics, the exponential discounted-utility model pioneered by Samuelson [17] and Ramsey [18] has become the predominant approach to discounting. The exponential model posits that the rate at which assets depreciate remains constant, allowing a single discount factor, $\beta$, to encapsulate all individuals’ disparate perceptions and behaviors at different times. Specifically, it discounts utility flows with the function $e^{-\beta t}$, where $t$ is the horizon of the utility flow and $\beta$ is a free parameter representing the discounting rate.
The exponential model is popular because it is relatively easy to understand and reason about, and it is mathematically tractable. There are also solid theoretical arguments for exponential discounting. Strotz [19] was the first to derive exponential discounting as the only approach to intertemporal choice available to decision-makers whose choices are always consistent, in the sense that they are never in conflict with themselves at later points in time. This theoretical effort was later repeated by others in different settings [20,21]. The consistency property has established exponential discounting as the dominant model, one that is often considered to be normatively correct [22].
However, the exponential discounted-utility model is only very rarely descriptively accurate: it does a poor job of predicting the behavior of agents in the real world. Numerous behavioral studies have uncovered facets of human and animal behavior that cannot be captured by the exponential discounting framework [23]. These include present-focused preferences, preference reversals, self-control problems [16,24], effects of temptation [25], psychometric distortions such as subjective time [26] and probability perception [27], magnitude effects, and myopic decision-making on account of our limited cognitive faculties [28]. The search for more descriptive accuracy has resulted in many other discounting models, including present-biased or quasi-hyperbolic discounting [29] and the hyperbolic and generalized hyperbolic models discussed below. Ericson and Laibson [30] present an excellent overview of theories of intertemporal choice.
Within the wealth of existing models, the model studied most by psychologists is the hyperbolic discounting function, first implied by Herrnstein [31]. It is a mathematically uncomplicated one-parameter model that has been shown to greatly outperform exponential discounting in terms of descriptive accuracy, and it incorporates present-focused preferences and preference reversals. Hyperbolic discounting scales utility with the function $(1 + \alpha t)^{-1}$, where $t$ is again the horizon of the utility flow, and where $\alpha$ is a free parameter related to the discounting rate. Hyperbolic discounting preserves proportionality between ratios of future valuations and the corresponding future times; e.g., the relative drop in valuation between times 10 and 20 is (almost) the same as the relative drop in valuation between times 100 and 200. This implements a strictly decreasing discount rate; compared with exponential discounting, hyperbolic discounting depreciates faster near $t = 0$ and slower for large $t$.
The hyperbolic model is often extended with an additional parameter $\beta$ to $(1 + \alpha t)^{-\beta/\alpha}$. It is then called generalized hyperbolic discounting or hyperboloid discounting. Here, the ratio $\beta/\alpha$ reflects the nonlinear scaling of the amount and delay [32,33]. Parameter $\alpha$ can also be interpreted as how much the function departs from constant-rate (exponential) discounting [16], with lower values of $\alpha$ corresponding to more (traditionally) “rational” behavior.
The generalized hyperbolic discounting function has many desirable properties making it an excellent choice for study. For one, it embeds several other major models: classical exponential discounting, hyperbolic discounting, and constant or no discounting, for $\alpha \to 0$, $\alpha = \beta$, and $\alpha \to \infty$, respectively. Generalized hyperbolic discounting can be shown to have quantitatively higher predictive accuracy beyond that which can be accounted for simply by the addition of an extra parameter to the hyperbolic model [33,34]. The additional parameter allows the modeling of very different agent types. For example, many studies have shown that humans, in many contexts, tend to exhibit behavior in accordance with $\alpha > \beta$, while animals such as pigeons exhibit behavior where $\alpha \approx \beta$; we refer to Vanderveldt et al. [23] for a review of these studies. Lastly, in addition to expressing the delay discounting behavior of agents with present-focused preferences, generalized hyperbolic discounting can be rationalized as a result of risk and probability discounting [35]. The underlying idea is that the value of a future reward should be discounted because there is a risk that the reward will not be realized. Generalized hyperbolic discounting can arise both when the hazard rate is horizon-dependent and when it is uncertain [36,37,38].
Our work builds on these behavioral and economic insights to advance the literature on games of timing [1] through the adoption of generalized hyperbolic discounting in the framework of a FlipIt-like game [2,39]. As such, our work contributes also to the overall space of security economics and, in particular, the application of game theory to security and privacy challenges [40,41].
FlipIt is a game motivated by persistent, stealthy, sophisticated attacks by an advanced persistent threat (APT) and models a situation where two players compete for the ownership of a resource generating value over time [2,39]. Apart from the inclusion of discounting, the interaction we investigate is exactly that of FlipIt described by van Dijk et al. [2], although our mathematical description of it is significantly different and more streamlined.
The interesting strategic aspects of FlipIt have led to numerous follow-up studies. Most closely related to our work are studies on the impact of exponential discounting in the framework of the FlipIt game [13,14,15]. These have found new predictions of optimal defense investments due largely to the fact that an exponentially discounted investment has a finite cumulative valuation over time. Throughout the paper, we will discuss the relationship of model variations with generalized hyperbolic discounting, exponential discounting, and no discounting in detail.
To the best of our knowledge, there is no comprehensive review article about the FlipIt game. Most of the literature has centered on the study of variations of the game’s structure, such as the types of involved players [42,43], the speed and efficacy of moves [42,43,44,45], the game’s time horizon [45,46], the structure of the resource and partial control [47,48,49], variations on the assumption of perfect stealthiness [42,43,44,46], discretization [45], the effects of budget constraints [46], and more sophisticated strategies involving machine learning [50]. FlipIt can also be embedded as a stage into other games [51]. More recently, Banik and Bopardikar [52] have formulated a variation of FlipIt as a zero-sum discrete control game, where the defender aims to stabilize a dynamical system. Miura et al. [53] have formulated a FlipIt-like game, augmented by an epidemic model, where defender and attacker compete to control as many computing resources as possible over a set period of time.
Although the concept of time forms the very core of the FlipIt game, van Dijk et al. [2] and most follow-up work consider many aspects of the interaction to remain unchanged as time progresses. This includes the value of the resource, as well as the costs to attack it. This presents some conceptual challenges, as nothing is everlasting, and also results in some questionable predictions, such as every resource being attacked by attackers at non-zero rates. In [13,14], Merlevede et al. introduced exponential discounting of future gains and costs, addressing some of these concerns. In [15], these authors also introduce a novel class of “discounted strategies” to the exponentially discounted game in which players move less frequently as the resource loses value. We are unaware of further studies considering discounting in the context of the FlipIt game.

3. Discounting

The term discounting implies that the future value of a measurable quantity is less than its present value. The effect of this value decrease can be formalized by a discount function $D(t)$, which gives a multiplicative factor expressing the value at a future time $t > 0$ relative to its present value (at time $t = 0$).
An intuitive measure of the behavior of the discount function is the discount rate:
$$\rho(t) = -\frac{D'(t)}{D(t)}.$$
The discount rate indicates how fast the value decreases at any time $t \ge 0$.
Our model discounts player gains and player costs along a member of the family of generalized hyperbolic discounting functions, introduced by Loewenstein and Prelec [16], which have the form
$$D(t) = \frac{1}{(1 + \alpha t)^{\beta/\alpha}}, \qquad \alpha > 0,\ \beta > 0.$$
Figure 1a illustrates the hyperbolic discount function, while Figure 1b displays the corresponding discount rates. For generalized hyperbolic discounting, discount rates are always strictly decreasing:
$$\rho(t) = \frac{\beta}{1 + \alpha t}.$$
Parameter $\alpha$ determines how much the function resembles the exponential function. For varying $\alpha$, the family of generalized hyperbolic discounting functions spans a wide variety of discounting behaviors, including exponential discounting for $\alpha \to 0$, true hyperbolic discounting for $\alpha = \beta$, and no discounting for $\alpha \to \infty$:
$$\lim_{\alpha \to 0} \frac{1}{(1 + \alpha t)^{\beta/\alpha}} = e^{-\beta t} \qquad (\text{exponential})$$
$$\left. \frac{1}{(1 + \alpha t)^{\beta/\alpha}} \right|_{\alpha = \beta} = \frac{1}{1 + \beta t} \qquad (\text{hyperbolic})$$
$$\lim_{\alpha \to \infty} \frac{1}{(1 + \alpha t)^{\beta/\alpha}} = 1 \qquad (\text{no discounting}).$$
Note that for exponential discounting ($D(t) = e^{-\beta t}$), the discount rate is equal to a constant value ($\rho(t) = \beta$).
The case of “true” hyperbolic discounting ($\alpha = \beta$) corresponds to a boundary condition separating two qualitatively distinct classes of discounting functions. We will refer to generalized hyperbolic discounting with $\alpha < \beta$ as super-hyperbolic discounting, and to discounting with $\alpha \ge \beta$ as sub-hyperbolic discounting. We include hyperbolic discounting ($\alpha = \beta$) as a member of the class of sub-hyperbolic discounting functions. The following observations largely explain the behavioral differences between these two classes of functions.
  • For $\alpha < \beta$ (super-hyperbolic discounting), the area under $D(t)$’s curve is finite and given by:
$$\int_{\tau=0}^{+\infty} D(\tau)\, d\tau = \frac{1}{\beta - \alpha}.$$
  • For $\alpha \ge \beta$ (sub-hyperbolic discounting), $\int_{\tau=0}^{T} D(\tau)\, d\tau$ does not converge for $T \to +\infty$; the sketch below illustrates both cases numerically.
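The dichotomy is easy to verify numerically. The following Python sketch is ours (the parameter choices $\alpha = 0.5$, $\beta = 1.0$ and $\alpha = \beta = 1.0$ are arbitrary); it evaluates the accumulated value of $D$, which approaches $1/(\beta - \alpha) = 2$ in the super-hyperbolic case and grows without bound in the sub-hyperbolic case.

```python
import numpy as np

def D(t, alpha, beta):
    """Generalized hyperbolic discount function D(t) = (1 + alpha*t)^(-beta/alpha)."""
    return (1.0 + alpha * t) ** (-beta / alpha)

def accumulated_value(alpha, beta, T, n=1_000_001):
    """Trapezoidal approximation of the integral of D over [0, T]."""
    t = np.linspace(0.0, T, n)
    return np.trapz(D(t, alpha, beta), t)

# Super-hyperbolic (alpha < beta): the area converges to 1 / (beta - alpha) = 2.
for T in (1e2, 1e4, 1e6):
    print(f"alpha=0.5, beta=1.0, T={T:.0e}: {accumulated_value(0.5, 1.0, T):8.4f}")

# Sub-hyperbolic (alpha >= beta): the accumulated value keeps growing (~ log T here).
for T in (1e2, 1e4, 1e6):
    print(f"alpha=1.0, beta=1.0, T={T:.0e}: {accumulated_value(1.0, 1.0, T):8.4f}")
```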

4. Model

This section introduces our model for stealthy timing-based security games with generalized hyperbolic-discounted costs and resource valuations. We model the same interaction first presented in van Dijk et al. [2] but present a simplified, more streamlined mathematical characterization and include discounting. Our model subsumes models without discounting [2] and with exponential discounting [14] as special cases (Table 1). Although adding generalized hyperbolic discounting is a small technical change, it has a large conceptual impact as a model for risk, ephemerality, or bounded rationality. It also necessitates an entirely different mathematical approach to model analysis.

4.1. Overview

In our two-player game, a defender (D) and an attacker (A) vie for control over a central resource. To obtain control, either player $i \in \{A, D\}$ can choose to pay a fixed instantaneous cost $c_i$ to ‘flip’ or immediately assume control of the resource. The last player to execute a move always controls the resource.
The controlling player accrues utility at a rate that decreases over time along a generalized hyperbolic function. The cost to execute a move is also time-discounted along a (possibly different) generalized hyperbolic function. Control is stealthy in the sense that neither player knows who controls the resource until the moment that they initiate an instantaneous flip. The remainder of this section formalizes the game.

4.2. Player Strategies

For player $i \in \{D, A\}$, define
$$t_i = (t_{i,0}, t_{i,1}, t_{i,2}, \ldots)$$
to be a strictly increasing sequence of real times at which player $i$ moves. The length of $t_i$ can be finite or infinite. A player strategy in this game is defined completely by a probability distribution over the set of possible $t_i$.

4.3. Player Control

The player control function indicates, for a given pair of move sequences $(t_D, t_A)$, whether the defender or the attacker is deriving utility from the resource at any given moment in time. In the particular case where different players’ moves collide in time, we define the outcome as a no-op, meaning that resource ownership remains unchanged. Thus, removing any such collisions if necessary, we may assume without loss of generality that $t_D \cap t_A = \emptyset$.
Let
$$t = t_D \cup t_A = (t_0, t_1, t_2, \ldots)$$
be the strictly increasing sequence of player move times. Then, for any time $t \ge t_0$, we define the latest flip time function by
$$\mathrm{LFT}: t \mapsto \max\{ t_k \in t : t_k \le t \}.$$
From time $t = 0$ until the time of the first flip $t_0$, the defender has control of the resource. The player control function can, therefore, be expressed as
$$\mathrm{PC}: t \mapsto \begin{cases} D & \text{if } t < t_0 \text{ or } \mathrm{LFT}(t) \in t_D \\ A & \text{if } \mathrm{LFT}(t) \in t_A. \end{cases}$$
The asymmetry of the player control function shows that the defender has an advantage due to starting the game in control of the resource.
We also define a player control indicator function
$$\mathrm{PC}_i: t \mapsto \mathbb{1}\{\mathrm{PC}(t) = i\},$$
which can be useful for integration. The player control indicator function tells us, for a given player $i \in \{D, A\}$ and time $t > 0$, whether that player controls the resource at that time.
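For finite move sequences, LFT and the player control function translate directly into code. A minimal sketch (function and variable names are ours):

```python
from bisect import bisect_right

def latest_flip_time(t, flips):
    """LFT(t): the latest move time in the sorted merged sequence `flips` that is <= t."""
    return flips[bisect_right(flips, t) - 1]

def player_control(t, t_D, t_A):
    """PC(t): 'D' if the defender controls the resource at time t, else 'A'.
    t_D and t_A are increasing move-time sequences, assumed disjoint."""
    flips = sorted(set(t_D) | set(t_A))
    if not flips or t < flips[0]:
        return "D"  # the defender starts the game in control
    return "D" if latest_flip_time(t, flips) in set(t_D) else "A"

# Defender flips at times 1 and 4, the attacker at 2.5:
print([player_control(t, [1.0, 4.0], [2.5]) for t in (0.5, 1.5, 3.0, 5.0)])
# -> ['D', 'D', 'A', 'D']
```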

4.4. Player Utilities

Each player’s utility is defined to be the difference between the player’s gains and the player’s costs:
$$u_i = G_i - C_i.$$
Both gains and costs are subject to generalized hyperbolic discounting. We define generalized hyperbolic discounting, player gains, and then player costs in the following subsections.

4.4.1. Gains

Players achieve gains when in control of the resource. Gains initially accrue value at some rate of $V$ units of value per unit of time. Gains decrease over time according to a player-dependent generalized hyperbolic discount function
$$D_i: [0, +\infty[\; \to\; ]0, 1], \qquad t \mapsto \frac{1}{(1 + \alpha_i t)^{\beta_i / \alpha_i}},$$
where $\alpha_i$ and $\beta_i$ are parameters characterizing player $i$’s (im)patience. We use reversed brackets to indicate open interval boundaries to avoid confusing intervals for ordered tuples, notably strategy profiles.
The discounted gain rate of player $i$ up to time $t$ may be determined by computing the expected weighted integral of $\mathrm{PC}_i$ up to $t$, normalized with respect to the total (discounted) value of the resource up to $t$:
$$G_i(t) = \frac{\mathbb{E}\left[ \int_{\tau=0}^{t} \mathrm{PC}_i(\tau) \cdot V \cdot D_i(\tau)\, d\tau \right]}{\int_{\tau=0}^{t} V \cdot D_i(\tau)\, d\tau}.$$
The expectation is taken over possible game outcomes resulting from the players’ strategies, represented by the stochastic process $\mathrm{PC}_i$. Normalization allows comparing player gains for different discount rates and interpreting gain as a fraction of total achievable gain.
The gain of player $i$ is the limit superior of her discounted gain rate as time $t$ moves to infinity:
$$G_i = \limsup_{t \to \infty} G_i(t).$$
Note that $G_i(t) \in [0, 1]$ for every $t$, and so $G_i \in [0, 1]$ as well. When the strategies are restricted to periodic or exponential ones, as described in Section 4.5, then $\lim_{t \to \infty} G_i(t)$ exists for each player $i \in \{D, A\}$, so that we can replace the limit superior with a limit. If we further assume that players have the same discounting factors, so that $(\alpha_D, \beta_D) = (\alpha_A, \beta_A)$, then $G_D + G_A = 1$ by the linearity of expectation.
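For deterministic move sequences, the expectation drops out and $G_i(t)$ can be approximated by direct numerical integration. A minimal sketch (with $V = 1$ and arbitrary parameters; names are ours):

```python
import numpy as np

def D(t, alpha, beta):
    return (1.0 + alpha * t) ** (-beta / alpha)

def gain_rate(t, t_D, t_A, alpha_i, beta_i, who="D", n=200_001):
    """Discounted gain rate G_i(t) for deterministic move sequences: the
    discounted time in control divided by the total discounted value up to t."""
    flips = sorted([(s, "D") for s in t_D] + [(s, "A") for s in t_A])
    tau = np.linspace(0.0, t, n)
    owner = np.full(n, "D", dtype=object)  # defender controls until the first flip
    for s, player in flips:                # later flips overwrite earlier ones
        owner[tau >= s] = player
    w = D(tau, alpha_i, beta_i)
    return np.trapz((owner == who) * w, tau) / np.trapz(w, tau)

# Defender flips at 1 and 4, the attacker at 2.5:
print(gain_rate(10.0, [1.0, 4.0], [2.5], alpha_i=0.5, beta_i=1.0, who="D"))
```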

4.4.2. Costs

When players perform a move, this comes at a fixed instantaneous cost of $c_i > 0$. As with gains, we discount costs according to a player-dependent generalized hyperbolic discount function
$$D_i^c: [0, +\infty[\; \to\; ]0, 1], \qquad t \mapsto \frac{1}{(1 + \alpha_i^c t)^{\beta_i^c / \alpha_i^c}},$$
where $\alpha_i^c$ and $\beta_i^c$ characterize $i$’s (im)patience, this time with respect to costs. Note that $\alpha_i^c$ and $\beta_i^c$ do not have to equal $\alpha_i$ and $\beta_i$, meaning that costs may be discounted at different rates from gains. This results in four discounting parameters per player and eight discounting parameters in total.
The discounted spending rate of player $i$ up to time $t$ may be determined by computing the expected weighted sum of the instantaneous costs of the moves made by $i$ up to time $t$, normalized with respect to the total (discounted) value of the resource:
$$C_i(t) = \frac{\mathbb{E}\left[ \sum_{\tau \in t_i,\, \tau \le t} c_i \cdot D_i^c(\tau) \right]}{\int_{\tau=0}^{t} V \cdot D_i(\tau)\, d\tau}.$$
As with gains, the expectation is taken with respect to the distribution used to define $t_i$. Scaling gains and costs by the same factor ($\int_{\tau=0}^{t} V \cdot D_i(\tau)\, d\tau$) makes the normalization operation neutral with respect to the behavior of utility-maximizing players. Finally, since we only deal with normalized costs and gains and since $c_i$ and $V$ are both free parameters, we can, without loss of generality, assume that
$$V = 1.$$
This assumption allows us to consider the instantaneous cost $c_i$ (implicitly, $c_i / V$) as a unitless value, expressing a fraction of the initial rate at which the resource accrues value per unit of time.
A player’s (total) cost is defined as the limit of her spending rate as time $t$ moves to infinity:
$$C_i = \lim_{t \to \infty} C_i(t).$$
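The spending rate admits the same numerical treatment as the gain rate; a short sketch along the same lines (arbitrary parameters, $V = 1$):

```python
import numpy as np

def D(t, alpha, beta):
    return (1.0 + alpha * t) ** (-beta / alpha)

def spending_rate(t, t_i, c_i, alpha_i, beta_i, alpha_c, beta_c, n=200_001):
    """Discounted spending rate C_i(t): discounted flip costs up to t,
    normalized by the total discounted resource value up to t."""
    spent = sum(c_i * D(s, alpha_c, beta_c) for s in t_i if s <= t)
    tau = np.linspace(0.0, t, n)
    return spent / np.trapz(D(tau, alpha_i, beta_i), tau)

print(spending_rate(10.0, [1.0, 4.0], c_i=0.1,
                    alpha_i=0.5, beta_i=1.0, alpha_c=0.5, beta_c=1.0))
```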

4.5. Restricted Strategies

The description of all possible player strategies given in Section 4.2 is the most general class of strategies for which our discounted FlipIt model defines a definite outcome. However, to exhibit effective strategies and facilitate analysis, it is necessary to introduce additional constraints that reduce the number of free parameters in a strategy specification.
Each of the two strategy classes we consider in this paper—the class of exponential and the class of periodic strategies—has been studied extensively in prior work. Both classes are described by a single real parameter corresponding to the expected number of moves per unit of time, which we will refer to as flip rate or move rate throughout the paper. Both classes are also motivated by real-world examples of mitigation strategies in the context of advanced persistent threats.

4.5.1. Exponential Strategies

An exponential strategy is characterized by having the time of its first flip, as well as its flip inter-arrival times (the times between subsequent flips), drawn from the same exponential distribution, described by the probability density function
$$f(t) = \nu e^{-\nu t}.$$
If player $i$ adopts an exponential strategy, we refer to the exponential distribution’s rate parameter $\nu_i$ as her flip rate, move rate, or play rate. The expected time between two of her moves then equals $1/\nu_i$.
Exponential strategies are fully characterized by the single parameter $\nu_i$ and are robust to information leakage due to their memorylessness properties. These properties make them a straightforward choice when the timing of moves might be observable.
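Sampling an exponential strategy takes only a few lines; the sketch below (names are ours) draws all move times up to a finite horizon:

```python
import numpy as np

def sample_exponential_strategy(nu, horizon, rng=np.random.default_rng(0)):
    """Move times for an exponential strategy with play rate nu: the first flip
    and every inter-arrival time are drawn from Exp(nu)."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / nu)  # numpy takes the mean 1/nu, not the rate
        if t > horizon:
            return times
        times.append(t)

print(sample_exponential_strategy(nu=0.5, horizon=20.0))
```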

4.5.2. Periodic Strategies with Random Phase

A strategy is a periodic strategy iff the time between consecutive moves is constant. If player $i$ adopts a periodic strategy, the time between her moves is the period of her strategy, and we denote it by $\delta_i$. The inverse of the period, $\nu_i = 1/\delta_i$, is her strategy’s play rate. In the context of periodic strategies, we refer to the time of the first flip, $t_{i,0}$, as the phase and denote it by $\varphi_i$. Periodic strategies with random phase are those periodic strategies whose phase is drawn uniformly at random from the positive values smaller than the period. A periodic strategy with random phase is fully characterized by the single real number $\delta_i$, or equivalently $\nu_i$. Formally,
$$t_{i,0} = \varphi_i \sim U[0, \delta] \qquad \text{and} \qquad t_{i,n+1} - t_{i,n} = \Delta t_{i,n} = \delta \quad \text{for all } n \ge 0,$$
where $U[0, \delta]$ denotes the uniform distribution between 0 and $\delta$.
As with exponential strategies, periodic strategies are specified by a single real parameter. They are of even more practical importance because decision-makers commonly implement them in real-world systems (e.g., Microsoft’s “Patch Tuesday” and secret rotation policies). An additional reason for looking into periodic strategies is that, when not discounting, they tend to perform outstandingly in the sense that they strictly dominate a wide class of strategies, including the exponential ones. This result was formalized in van Dijk et al. [2], and we revisit it in Section 6, where we state the property more precisely and show that it no longer applies in our discounted setting.
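The corresponding sampler for a periodic strategy with random phase is even simpler:

```python
import numpy as np

def sample_periodic_strategy(nu, horizon, rng=np.random.default_rng(0)):
    """Move times for a periodic strategy with play rate nu: a random phase
    drawn uniformly from [0, delta), then one flip every period delta = 1/nu."""
    delta = 1.0 / nu
    phase = rng.uniform(0.0, delta)
    return list(np.arange(phase, horizon, delta))

print(sample_periodic_strategy(nu=0.5, horizon=20.0))
```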

5. Analysis

In this section, we mathematically analyze our game and the behavior of participating rational players.
We begin by considering the case of sub-hyperbolic discounting ( α β ), where we state our principal result and show that the analysis of outcomes fundamentally reduces to that of the original FlipIt model without discounting as presented by van Dijk et al. [2]. For the case of super-hyperbolic discounting ( α < β ), our analysis requires many new investigations. We begin by addressing structural aspects of our game model that apply to both the exponential and the periodic strategy classes. We then proceed through a sequence of analyses involving successively more advanced notions of model outcomes, considering both strategy classes at each stage. The first stage addresses player utilities. The following stages continue the analyses addressing incentives, best responses, and, finally, Nash equilibria.

5.1. Sub-Hyperbolic Discounting

This section characterizes the player behavior of hyperbolic and sub-hyperbolic discounters, i.e., player behavior when $\alpha \ge \beta$.
Theorem 1 (Player utilities for (sub-)hyperbolic discounters).
For any strategy profile producing convergent utilities when not discounting, the utilities of players discounting with $\alpha \ge \beta$ are the same as when not discounting. Specifically, discounted gains equal non-discounted gains,
$$\lim_{t \to \infty} \frac{\mathbb{E}\left[ \int_{\tau=0}^{t} \mathrm{PC}_i(\tau)\, d\tau \right]}{t} = \lim_{t \to \infty} \frac{\mathbb{E}\left[ \int_{\tau=0}^{t} \mathrm{PC}_i(\tau)\, D_i(\tau)\, d\tau \right]}{\int_{\tau=0}^{t} D_i(\tau)\, d\tau},$$
and discounted costs equal non-discounted costs,
$$\lim_{t \to \infty} \frac{\mathbb{E}\left[ \sum_{\tau \in t_i,\, \tau \le t} c_i \right]}{t} = \lim_{t \to \infty} \frac{\mathbb{E}\left[ \sum_{\tau \in t_i,\, \tau \le t} c_i\, D_i^c(\tau) \right]}{\int_{\tau=0}^{t} D_i^c(\tau)\, d\tau}.$$
Proof Outline.
To start, we show that the running average functional form with an unbounded denominator has the property that restarting the game at any fixed future point in time gives a new functional form with the same limit. Given a number $\varepsilon > 0$, we construct a tracking mechanism on the functional form’s behavior, verifying that it adheres to the required convergence properties. The construction deterministically factors time into an infinite sequence of intervals $[s_k, s_{k+1}]$, where $s_k$ is the $k$th restart time, and $s_{k+1}$ is the infimum over the set of absolute times whose existence follows from applying the limit definition to the $k$th functional form and $\varepsilon$.
A key property of this construction is that in the limit as $k \to \infty$, the absolute restart times $s_k$ dominate the interval durations $(s_{k+1} - s_k)$. We do not assume this; we prove that it has to be true of our construction for each fixed $\varepsilon$, or else there is a direct contradiction to our original limit assumption.
Now, we come to the discounted functional form. The only real attachment we have to the original functional form is a simple bounding relationship, due to $D(\tau)$ decreasing in $\tau$, which can be applied to finite integration intervals. We use it to show that for all sufficiently large $k$, the discounted functional form on the interval $[s_k, s_{k+1}]$ has to be close to the same limit as the non-discounted functional form. To wrap things up, for every sufficiently large time $t$, the well-defined representation of the discounted functional form in terms of the interval construction has three parts: a finite part early in time; an unbounded middle part, which is close to the correct running average value; and the present-moment part, which is dominated by the restart time, and hence by the middle part.
The conclusion is that the limit value of the discounted functional form must be within $4\varepsilon$ of the non-discounted limit value, and since $\varepsilon$ was chosen arbitrarily at the beginning of the construction, the two limit values are the same. Due to notational differences between the functional form representations for gains and costs, the formal proofs are provided in different subsections, but the outline above is the same for both forms.
Appendix A gives the detailed proof.    □
Although we state and prove Theorem 1 for (sub-)hyperbolic discounting, the theorem and proof generalize to a much larger class of discounting functions (or other decreasing functions), the most crucial restriction being that the total discounted value accrued over time must diverge. Note that we impose no restrictions on the players’ strategy spaces except that player strategies must result in outcomes such that the limit expression for undiscounted utility exists, that is, $\lim_{t \to \infty} \frac{\mathbb{E}\left[ \int_{\tau=0}^{t} \mathrm{PC}_i(\tau)\, d\tau \right]}{t}$ must be defined. This is the case for the exponential and periodic strategy regimes.
Corollary 1.
When $\alpha_i \ge \beta_i$ and $\alpha_j \ge \beta_j$, our model of discounted security games of timing is strategically equivalent to that of FlipIt-like games without discounting.
Proof. 
Each player evaluates the benefit of a periodic or exponential strategy based on the outcome. The theorem states that discounting (sub-)hyperbolically does not change this outcome, irrespective of the strategy that produced that outcome. Therefore, discounting does not change the computation of the optimal strategy.    □

5.2. Super-Hyperbolic Discounting

When discounting super-hyperbolically ($\alpha_i < \beta_i$), player $i$’s calculated resource valuation is finite, even when accumulated over all of time.
With a finite total valuation, splitting the game into two parts turns out to be helpful for the derivation of player utilities: one (typically finitely long) part spanning from the start of the game until the first move by either player, and another part spanning the remainder of time. We refer to the gain accrued by the defender, who starts in control, over the first part as the defender advantage $\bar{D}_D$. The gain obtained by players during the second part is independent of their identity as attacker or defender, and we refer to player $i$’s gain over this part as $i$’s anonymous gain $\bar{G}_i$. Appendix B.1 formally defines the concepts of defender advantage and anonymous gain, presents some of their properties, and uses them to present generic close-to-analytic expressions for $G_D$, $C_D$, $G_A$, and $C_A$.

5.3. Player Utilities for Super-Hyperbolic Discounting

For super-hyperbolic discounting behavior, i.e., when α < β , player utilities depend on the specific values of parameters α and β . This section lists analytical expressions for player utilities for exponential and periodic play.

5.3.1. For Exponential Play

The expression for player gain is not an elementary function but can be expressed in terms of the exponential integral function.
Definition 1 (Exponential integral function).
Define the exponential integral function as:
$$E_r(x) = \int_{s=1}^{+\infty} \frac{e^{-sx}}{s^r}\, ds.$$
To ease notation, we also introduce a helper function.
Definition 2 (Helper function for exponential play).
Define the helper function for exponential play as:
$$f: \mathbb{R}_0^+ \times \mathbb{R}_0^+ \to\; ]0, 1[, \qquad (r, x) \mapsto e^{x} \cdot x \cdot E_r(x).$$
The anonymous gain for exponential play has an elegant description through this helper function.
Lemma 1 (Anonymous gains for exponential play).
If both players are playing exponentially, then player i’s anonymous gain is given by:
$$\bar{G}_i = \frac{\nu_i}{\nu_i + \nu_j} \cdot f\!\left( \frac{\beta_i - \alpha_i}{\alpha_i},\ \frac{\nu_i + \nu_j}{\alpha_i} \right).$$
Proof Outline.
The total anonymous gain is the expected gain generated by the resource after the first move by either player is made. Player $i$ moves at a rate, or expected frequency, of $\nu_i$, and her probability of having moved last, and therefore being in control, equals $\nu_i / (\nu_i + \nu_j)$ at any point in time following the first move. We also show that the time at which the first move arrives is distributed according to an exponential distribution with rate parameter $\nu_i + \nu_j$. Finally, we derive a formula for the total anonymous gain by taking expectations over the time of the first move.
The full proof is in Appendix B.2.    □
Lemma 2 (Costs for exponential play).
The cost of an exponential strategy with play rate $\nu_i$ is given by:
$$C_i = c_i \cdot \frac{\beta_i - \alpha_i}{\beta_i^c - \alpha_i^c} \cdot \nu_i.$$
Proof. 
The probability density of flipping at any time is constant and equal to the play rate $\nu_i$. The discounted instantaneous cost of performing a flip at time $t$ is $c_i \cdot D_i^c(t)$. Obtain the stated result by evaluating
$$C_i = (\beta_i - \alpha_i) \int_{t=0}^{+\infty} \nu_i \cdot c_i \cdot D_i^c(t)\, dt.$$
   □
The structure of the cost expression in Equation (19) reveals that the presence of different discounting factors for costs and gains complicates notation without impacting the results. We obtain the same range of behaviors by assuming costs and gains are discounted by the same parameters and varying only the instantaneous cost of a move $c_i$. We will assume that the discounting parameters for costs and gains are equal ($\alpha_i^c = \alpha_i$, $\beta_i^c = \beta_i$), essentially re-defining the instantaneous cost as
$$c_i \cdot \frac{\beta_i - \alpha_i}{\beta_i^c - \alpha_i^c}.$$
The utility functions for exponential play follow from Lemmas 1 and 2.
Theorem 2 (Utilities for exponential play).
If both players are playing exponentially and have $\alpha_i < \beta_i$, their utilities are given by:
$$u_D = -c_D \nu_D - \frac{\nu_A}{\nu_D + \nu_A} \cdot f\!\left( \frac{\beta_D - \alpha_D}{\alpha_D},\ \frac{\nu_D + \nu_A}{\alpha_D} \right) + 1$$
$$u_A = -c_A \nu_A + \frac{\nu_A}{\nu_D + \nu_A} \cdot f\!\left( \frac{\beta_A - \alpha_A}{\alpha_A},\ \frac{\nu_D + \nu_A}{\alpha_A} \right).$$
Proof. 
The attacker never obtains any utility before his first move. Therefore, his gain is equal to his anonymous gain. Subtracting costs from gains yields Equation (22).
To obtain the defender’s gain, we use the fact that if $\alpha_D = \alpha_A$ and $\beta_D = \beta_A$, then the players’ gains sum to one. This relation is discussed in Appendix B as Equation (A43). Writing the defender’s gain as one minus the expression for the anonymous gain and subtracting costs yields Equation (21).    □
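Numerically, the exponential integral is available in arbitrary-precision libraries; mpmath’s expint(r, x) computes $E_r(x)$ for real orders. A minimal sketch of Theorem 2’s utilities (parameter values are arbitrary):

```python
import mpmath as mp

def f(r, x):
    """Helper for exponential play (Definition 2): f(r, x) = e^x * x * E_r(x)."""
    return mp.exp(x) * x * mp.expint(r, x)

def utilities_exponential(nuD, nuA, aD, bD, aA, bA, cD, cA):
    """Utilities for exponential play (Theorem 2); requires alpha_i < beta_i."""
    n = nuD + nuA
    uD = -cD * nuD - nuA / n * f((bD - aD) / aD, n / aD) + 1
    uA = -cA * nuA + nuA / n * f((bA - aA) / aA, n / aA)
    return uD, uA

print(utilities_exponential(nuD=0.4, nuA=0.3, aD=0.5, bD=1.0,
                            aA=0.5, bA=1.0, cD=0.2, cA=0.2))
```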
Figure 2 illustrates gains and utilities for exponential play. The defender has a notable advantage over the attacker because she starts in control of the resource. The gain curves show this advantage decreasing as players move more often. Higher play rates by $i$ (right for $i = D$, up for $i = A$) correspond to $i$ moving and taking control of the resource more often, increasing $i$’s gain. Similarly, higher play rates by her opponent $j$ (up for $i = D$, right for $i = A$) decrease $i$’s gain.
Higher play rates by $i$ always increase her cost. The utility curves illustrate that these flip costs always, at some point, offset the increase in gain. We can also see where players’ utilities are maximal. The defender’s utility is always highest at the origin, where it equals one. The attacker’s utility reaches a maximum at some point along the abscissa (where exactly depends on $\alpha_A$, $\beta_A$, and $c_A$).

5.3.2. For Periodic Play

As for exponential play, we introduce a helper function to ease notation.
Definition 3 (Helper function for periodic play).
Define the helper function for periodic play as:
$$h_i(\nu) = \frac{\nu}{(2\alpha_i - \beta_i)(3\alpha_i - \beta_i)} \left( \left( 1 + \frac{\alpha_i}{\nu} \right)^{\frac{3\alpha_i - \beta_i}{\alpha_i}} - 1 \right).$$
We use the notation $h_i'(\nu)$ to refer to the helper function’s derivative, $\frac{d h_i(\nu)}{d \nu}$.
Unlike for exponential play, the expression for anonymous player gain depends on whether the player is faster or slower.
Lemma 3 (Anonymous gains for periodic play).
If both players play periodically and have $\alpha_i < \beta_i$, then the anonymous gain of the slower player is given by:
$$\bar{G}_i \big|_{\nu_i \le \nu_j} = \nu_i \cdot \left( h_i(\nu_j) - \frac{1}{2\alpha_i - \beta_i} \right).$$
The anonymous gain of the faster player is given by:
$$\bar{G}_i \big|_{\nu_i \ge \nu_j} = \nu_i \cdot \left( h_i(\nu_i) - \frac{1}{2\alpha_i - \beta_i} \right) + (\nu_j - \nu_i)\, h_i'(\nu_i).$$
The derivation of these non-trivial expressions can be found in Appendix B.3.
We can derive the costs and utilities for periodic play in the same way as for exponential play.
Lemma 4 (Costs for periodic play).
The cost of a periodic strategy with play rate $\nu_i$ is given by:
$$C_i = c_i \cdot \frac{\beta_i - \alpha_i}{\beta_i^c - \alpha_i^c} \cdot \nu_i.$$
Proof. 
The proof is the same as the proof for exponential play (Lemma 2).    □
The cost formulas for exponential and periodic play are identical, allowing us to redefine the instantaneous costs using Equation (20) as for exponential play. In the remainder of this paper, we will always assume that $\alpha_i^c = \alpha_i$ and $\beta_i^c = \beta_i$.
Theorem 3 (Utilities for periodic play).
If the defender is the faster player ($\nu_D \ge \nu_A$), player utilities are given by:
$$u_D \big|_{\nu_D \ge \nu_A} = -c_D \nu_D - \nu_A \cdot \left( h_D(\nu_D) - \frac{1}{2\alpha_D - \beta_D} \right) + 1$$
$$u_A \big|_{\nu_D \ge \nu_A} = -c_A \nu_A + \nu_A \cdot \left( h_A(\nu_D) - \frac{1}{2\alpha_A - \beta_A} \right).$$
If the defender is the slower player ($\nu_D \le \nu_A$), player utilities are given by:
$$u_D \big|_{\nu_D \le \nu_A} = -c_D \nu_D - \left[ \nu_A \cdot \left( h_D(\nu_A) - \frac{1}{2\alpha_D - \beta_D} \right) + (\nu_D - \nu_A)\, h_D'(\nu_A) \right] + 1$$
$$u_A \big|_{\nu_D \le \nu_A} = -c_A \nu_A + \nu_A \cdot \left( h_A(\nu_A) - \frac{1}{2\alpha_A - \beta_A} \right) + (\nu_D - \nu_A)\, h_A'(\nu_A).$$
Note that although it is not possible to evaluate the formulae in Theorem 3 for $\beta_i = 2\alpha_i$ or $\beta_i = 3\alpha_i$, these gaps in the function’s domain are removable singularities. To deal with these, we re-define $u_i$ as the continuous extension of the formulae above; Appendix B.3 lists the full definition.
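As an illustration only, the faster-defender utilities of Theorem 3 can be coded as below; this minimal sketch evaluates the closed forms directly, away from the removable singularities rather than via the continuous extension, and the parameter values are arbitrary:

```python
def h(nu, a, b):
    """Helper for periodic play (Definition 3); undefined at b = 2a and b = 3a."""
    return nu / ((2*a - b) * (3*a - b)) * ((1 + a / nu) ** ((3*a - b) / a) - 1)

def utilities_periodic_faster_defender(nuD, nuA, aD, bD, aA, bA, cD, cA):
    """Utilities for periodic play (Theorem 3) with nuD >= nuA > 0, alpha_i < beta_i."""
    assert nuD >= nuA > 0
    uD = -cD * nuD - nuA * (h(nuD, aD, bD) - 1 / (2*aD - bD)) + 1
    uA = -cA * nuA + nuA * (h(nuD, aA, bA) - 1 / (2*aA - bA))
    return uD, uA

print(utilities_periodic_faster_defender(nuD=0.4, nuA=0.3, aD=0.5, bD=0.8,
                                         aA=0.5, bA=0.8, cD=0.2, cA=0.2))
```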
Figure 3 illustrates gains and utilities for periodic play. A comparison of periodic to exponential gains (Figure 2) shows that they are similar but that the rate of change of players’ gains with respect to their move rates is higher for periodic play. Since the costs for periodic and exponential play are the same, this is also reflected in players’ utilities. The defender’s advantage decreases faster for periodic play as players start moving more quickly.
The super-hyperbolically discounted utility functions look, and behave, very similarly to the exponentially discounted utility functions derived in Merlevede et al. [14], both for periodic and for exponential play. The generalized hyperbolic model interpolates between the exponentially discounted and the undiscounted utilities, and this interpolation results in very “smoothly” changing utilities, where small changes to $\alpha_i$ and $\beta_i$ result in small changes to the player utilities and, ultimately, even in the behavior of rational players. As we will see in the coming sections and discuss in Section 6, discounting parameters for which $2\alpha_i < \beta_i$ result in a function with mostly the same characteristics as the (simpler) exponentially discounted utility functions, while the game with $\alpha_i < \beta_i < 2\alpha_i$ exhibits some new characteristics.

5.4. Player Incentives and Best Responses

Knowing players’ utility functions, our attention turns to how players behave as they try to optimize their utility. The first step involves player incentives.
Definition 4 (Player incentive).
Player $i$’s incentive is the partial derivative of her utility with respect to her play rate, expressed in standard mathematical notation as
$$\frac{\partial u_i}{\partial \nu_i}.$$
Player incentives are closely related to player best responses because the roots of a player’s incentive function point to the local optima of her utility function.
A player’s incentive is generally a function of both her own play rate and her opponent’s play rate. Appendix C.1 states expressions for the player incentives for exponential and periodic play.
Figure 4 and Figure 5 display players’ incentive functions for exponential and periodic play. The defender’s incentive to play is low when the attacker moves slowly. She generally has less motivation to move than the attacker, as she is always in control while the resource is the most valuable. Note that the graphs assume that executing a move is free ($c_D = c_A = 0$), causing incentives to always be positive. Increasing the cost of a move does not change the shape of the incentive function but shifts it downwards (by an amount $c_i$). Other properties of these graphs are discussed in the following sections.

5.4.1. Directionality of Incentives and Base Incentive

Using standard analytical techniques, we can show that incentives are decreasing for both exponential and periodic play (see Appendix C.2). Specifically, for exponential play, a player’s incentive is always strictly decreasing in her own play rate. For periodic play, the direction of a player’s incentive depends on whether she is the faster or slower-moving player. If she is the slower-moving player, her incentive is independent of her play rate, while her incentive is strictly decreasing in her play rate if she is the faster-moving player. We state this more precisely in Lemmas A18 and A22. The independence of incentive is clearly visible in Figure 5 as perfectly horizontal and vertical contour lines.
As players’ incentives are always decreasing, their incentive is maximal when they are not playing ( ν i = 0 ) , and the existence of a non-zero best response depends on their incentive when they do not play. This observation motivates the definition of base incentive.
Definition 5 (Base incentive).
Player $i$’s base incentive when playing against a player who moves at rate $\bar{\nu}_j$ is her incentive when not playing ($\nu_i = 0$). Define player $i$’s base incentive function as:
$$\mathrm{BI}_i(\nu) = \left. \frac{\partial u_i}{\partial \nu_i} \right|_{\nu_i = 0,\ \nu_j = \nu}.$$
Note that a player’s base incentive function is a function of her opponent’s play rate. Figure 6 displays base incentive functions for exponential play. Graphs of the base incentive for periodic play look very similar.
Using the concept of base incentive and leveraging what we know of the directionality of the incentive function, we can state the following.
Corollary 2 (Best responses for exponential play).
For exponential play, each player has a unique, single-valued best response determined by her base incentive and incentive as follows:
  • If her base incentive is negative, her best response is not to play.
  • If her base incentive is strictly positive, then her incentive function has a single root $\nu_i^* > 0$, and her best response is to play at rate $\nu_i^*$.
Corollary 3 (Best responses for periodic play).
Player $i$’s best response to her opponent playing at rate $\bar{\nu}_j$ can be characterized in terms of her base incentive as follows:
  • If her base incentive is strictly negative, then her unique best response is not to play.
  • If her base incentive is zero, then any play rate $\nu_i \in [0, \bar{\nu}_j]$ is a best response.
  • If her base incentive is strictly positive, then her incentive function has a single root $\nu_i^* > 0$, and her best response is to play at rate $\nu_i^*$.
Corollaries 2 and 3 imply that each of the contour lines in Figure 4 and Figure 5 corresponds to a best-response curve for a specific flip cost ($c_i$). The properties of the incentive functions stated above allow finding best responses as the roots of the incentive function using straightforward numerical procedures.
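Because each incentive function crosses zero exactly once when the base incentive is strictly positive, a plain bisection suffices. A generic sketch (names are ours), where `incentive` maps $\nu_i$ to $\partial u_i / \partial \nu_i$ at the opponent’s fixed play rate and `nu_max` is any upper bound on the root, such as $1/c_i$:

```python
def bisect_root(g, lo, hi, tol=1e-10):
    """Root of a strictly decreasing function g with g(lo) > 0 > g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

def best_response(incentive, nu_max):
    """Best response per Corollaries 2 and 3: stay idle if the base incentive
    (the incentive at rate 0) is non-positive, else play at the incentive's root.
    (A zero base incentive admits further best responses for periodic play.)"""
    if incentive(0.0) <= 0:
        return 0.0
    return bisect_root(incentive, 0.0, nu_max)
```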

5.4.2. Directionality and Origin of Base Incentive

We can analyze the behavior of the base incentive function using standard analytical techniques (see Appendix C.3). The attacker’s base incentive is always strictly decreasing. The behavior of the defender’s base incentive function is discontinuous at $\nu_A = 0$ and depends on the relative size of $2\alpha_D$ and $\beta_D$. If $2\alpha_D \ge \beta_D$, then her base incentive function is strictly decreasing, similar to the attacker’s. If $2\alpha_D < \beta_D$, then her base incentive is first strictly increasing, then strictly decreasing. We state this more precisely in Lemma A23.
As players’ base incentives are usually maximal when their opponents are barely moving at all, whether or not players choose to participate in the game is related to the sign of the base incentive near the origin. We can again analyze the behavior of base incentives near the origin using standard analytical techniques. For both exponential and periodic play, the base incentive of the defender is equal to $-c_D$ at the origin. The value of the right limit of the defender’s base incentive at zero ($\nu_A \to 0$) and the value of the attacker’s base incentive at zero ($\nu_D = 0$) depend on the relative size of $2\alpha_i$ and $\beta_i$. The origin base incentive becomes unboundedly large if $2\alpha_i > \beta_i$ and converges to specific values if $2\alpha_i < \beta_i$. We state this precisely in Lemmas A28 and A29.
The properties of the base incentive function, together with their values at the origin, allow us to state the following results on best responses to non-participatory players.
Corollary 4 (Defender best response to non-participatory attacker).
For exponential and periodic play, the defender’s best response to a non-participatory attacker is not to play.
Corollary 5 (Attacker best response to non-participatory defender).
For exponential and periodic play, the attacker’s best response to a non-participatory defender is as follows:
  • If $2\alpha_A \ge \beta_A$, move at non-zero rates, regardless of cost ($c_A$).
  • If $2\alpha_A < \beta_A$, move at non-zero rates iff $c_A < \frac{1}{\beta_A - 2\alpha_A}$, and at the zero rate otherwise.

5.5. Equilibria for Super-Hyperbolic Discounting

This section characterizes, and presents numerical procedures for finding, Nash equilibria when both players discount super-hyperbolically. It starts with an investigation of equilibria in which at least one player does not move, which we will refer to as non-participatory equilibria, for both the periodic and the exponential regimes (Section 5.5.1). It then discusses participatory equilibria for the exponential regime (Section 5.5.2) and the periodic regime (Section 5.5.3).

5.5.1. Non-Participatory Equilibria

Non-participatory equilibria are (Nash) equilibria in which at least one player never moves. The following can be stated as a corollary of Corollaries 4 and 5 and Lemma A28.
Corollary 6.
For both exponential and periodic play, the set of non-participatory equilibria is characterized as follows:
  • If $2\alpha_A < \beta_A$ and $c_A \ge \frac{1}{\beta_A - 2\alpha_A}$, then there is a non-participatory equilibrium in which neither player moves.
  • Otherwise, there may be an equilibrium in which only the attacker plays if $2\alpha_D < \beta_D$, or if $2\alpha_D = \beta_D$ and $c_D \ge \frac{1}{\alpha_D}$.
There are no other non-participatory equilibria.
Algorithm 1 presents a way to numerically determine the set of non-participatory strategy profiles.
Algorithm 1 Procedure for finding the set of non-participatory equilibria for exponential and periodic play
1: if $2\alpha_A < \beta_A$ and $c_A \ge \frac{1}{\beta_A - 2\alpha_A}$ then
2:     return $\{(0, 0)\}$ // Neither player moves
3: end if
4: if $2\alpha_D = \beta_D$ and $c_D \ge \frac{1}{\alpha_D}$ then
5:     return $\{(0, \nu_A^*)\}$ // Only attacker moves
6: end if
7: if $2\alpha_D < \beta_D$ then
       // Compute best response of attacker to non-participatory defender
8:     $\nu_A^* \leftarrow \mathrm{Bisect}\!\left( \left. \frac{\partial u_A}{\partial \nu_A} \right|_{\nu_A = x,\ \nu_D = 0},\ x_{\min} = 0,\ x_{\max} = 1/c_A \right)$ // Strictly decreasing
       // Inspect defender’s base incentive at $\nu_A^*$
9:     if $\mathrm{BI}_D(\nu_A^*) \le 0$ then
10:        return $\{(0, \nu_A^*)\}$ // Only attacker moves
11:    end if
12: end if
13: return $\emptyset$ // No non-participatory equilibria
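A direct transcription of Algorithm 1 into Python might look as follows; the two callables are assumed supplied (e.g., built from the closed-form utilities above), and the bisection is the routine sketched in Section 5.4.1:

```python
def bisect_root(g, lo, hi, tol=1e-10):  # as sketched in Section 5.4.1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

def non_participatory_equilibria(aD, bD, aA, bA, cD, cA,
                                 attacker_incentive_vs_idle_defender, BI_D):
    """Sketch of Algorithm 1. `attacker_incentive_vs_idle_defender(nu)` is the
    attacker's incentive at play rate nu when the defender never moves, and
    `BI_D` is the defender's base incentive function (both assumed given)."""
    if 2*aA < bA and cA >= 1.0 / (bA - 2*aA):
        return {(0.0, 0.0)}                 # neither player moves
    if 2*aD == bD and cD >= 1.0 / aD:
        nuA = bisect_root(attacker_incentive_vs_idle_defender, 0.0, 1.0 / cA)
        return {(0.0, nuA)}                 # only the attacker moves
    if 2*aD < bD:
        nuA = bisect_root(attacker_incentive_vs_idle_defender, 0.0, 1.0 / cA)
        if BI_D(nuA) <= 0:
            return {(0.0, nuA)}             # only the attacker moves
    return set()                            # no non-participatory equilibria
```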

5.5.2. Participatory Equilibria for Exponential Play

We now focus on equilibria in which both players move at non-zero rates, beginning with equilibria for the exponential strategy regime.
From our characterization of best responses for exponential play in terms of the base incentive function (Corollary 2) and the direction and the roots of this base incentive function (Lemma A23), we can bound the domain in which to look for participatory equilibria $(\nu_D^*, \nu_A^*)$:
  • $\nu_D^* \in\; ]0, \bar{\nu}_D[$, where $\bar{\nu}_D > 0$ is the (unique) root of the attacker’s base incentive function.
  • If $2\alpha_D \ge \beta_D$, then $\nu_A^* \in\; ]0, \bar{\nu}_A[$, where $\bar{\nu}_A > 0$ is the root of the defender’s base incentive function.
  • If $2\alpha_D < \beta_D$, then $\nu_A^* \in\; ]\bar{\nu}_A^{(1)}, \bar{\nu}_A^{(2)}[$, where $\bar{\nu}_A^{(1)}$ and $\bar{\nu}_A^{(2)}$ are the roots of the defender’s base incentive function, with $\bar{\nu}_A^{(2)} \ge \bar{\nu}_A^{(1)} > 0$.
If the mentioned roots do not exist, there is no participatory equilibrium.
We can use a numerical procedure to find the participatory equilibria for exponential play. We find equilibria by searching through the domain $]0, \bar{\nu}_D[$ for the fixed points $\nu_D^*$ of the function $\nu_D \mapsto \mathrm{BR}_D(\mathrm{BR}_A(\nu_D))$, equivalently the roots of the function
$$r:\; ]0, \bar{\nu}_D[\; \to\; ]{-\infty}, \bar{\nu}_D], \qquad \nu_D \mapsto \nu_D - \mathrm{BR}_D(\mathrm{BR}_A(\nu_D)).$$
The strategy profile $(\nu_D^*, \mathrm{BR}_A(\nu_D^*))$ is then an equilibrium.
We can obtain the same results by looking for fixed points of the function $\nu_A \mapsto \mathrm{BR}_A(\mathrm{BR}_D(\nu_A))$.
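Concretely, one can scan $]0, \bar{\nu}_D[$ for sign changes of $r$ and refine each bracket by bisection; a sketch with `BR_D` and `BR_A` assumed given (e.g., implemented with the best-response routine from Section 5.4.1):

```python
import numpy as np

def participatory_equilibria_exponential(BR_D, BR_A, nu_D_bar, grid=2000):
    """Roots of r(nu_D) = nu_D - BR_D(BR_A(nu_D)) on ]0, nu_D_bar[,
    located by sign changes on a grid and refined by bisection."""
    r = lambda x: x - BR_D(BR_A(x))
    xs = np.linspace(nu_D_bar / grid, nu_D_bar * (1.0 - 1.0 / grid), grid)
    vals = [r(x) for x in xs]
    equilibria = []
    for lo, hi, v_lo, v_hi in zip(xs, xs[1:], vals, vals[1:]):
        if v_lo * v_hi > 0:
            continue                      # no sign change in this bracket
        for _ in range(60):               # bisection refinement
            mid = 0.5 * (lo + hi)
            if r(lo) * r(mid) <= 0:
                hi = mid
            else:
                lo = mid
        nu_D = 0.5 * (lo + hi)
        equilibria.append((nu_D, BR_A(nu_D)))
    return equilibria
```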

5.5.3. Participatory Equilibria for Periodic Play

For periodic play, the insight that a player’s incentive is independent of her play rate when she is the slower player allows us to restrict the set of equilibria immediately. The following can be stated as a corollary of Corollary 3.
Corollary 7 (Faster play rate in participatory equilibrium).
In any participatory equilibrium, the faster-moving player moves at a rate corresponding to a root of the slower player’s base incentive function.
When the faster player moves at a rate $\bar{\nu}_f$ that is a root of the slower player’s base incentive, any rate $\nu_s \in\; ]0, \bar{\nu}_f]$ is a best response for the slower player. Whether there exists an equilibrium involving $\bar{\nu}_f$ depends on whether one of these play rates makes playing $\bar{\nu}_f$ a best response for the faster player ($\bar{\nu}_f = \mathrm{BR}_f(\nu_s)$).
We analyze the direction of the faster player’s incentive to see if such a $\nu_s$ exists. It turns out that we can express the direction of player incentives in terms of the base incentive function.
Lemma 5.
For periodic play, the direction of the incentive of the faster player playing at rate $\bar{\nu}_f$ with respect to the slower player’s play rate is opposite to the direction of the slower player’s base incentive at $\bar{\nu}_f$:
$$\frac{\partial^2 u_f}{\partial \nu_s\, \partial \nu_f} = -\frac{d\, \mathrm{BI}_{s,f}(\nu_f)}{d \nu_f} = -\mathrm{BI}_{s,f}'(\nu_f),$$
where $\mathrm{BI}_{s,f}$ is the base incentive function of the slower player evaluated using the faster player’s discounting parameters $\alpha_f$ and $\beta_f$.
Proof. 
For equal discounting parameters, the sum of player gains is constant (Equation (A43)). For exponential and periodic play, the impact of cost on incentive is constant, so the change of the faster player’s incentive with respect to the slower player’s move rate is opposite to the change of the slower player’s incentive with respect to the faster player’s move rate:
$$\frac{\partial}{\partial \nu_s} \frac{\partial u_f}{\partial \nu_f} = \frac{\partial}{\partial \nu_s} \frac{\partial}{\partial \nu_f} \left( -u_s - c_f \nu_f - c_s \nu_s \right) = -\frac{\partial}{\partial \nu_f} \frac{\partial u_s}{\partial \nu_s}.$$
The slower player’s incentive equals her base incentive (Lemma A22).    □
This gives us an elegant description of the faster player’s incentive.
Lemma 6.
For periodic play, the faster player’s incentive is given by
$$\frac{\partial u_f}{\partial \nu_f} = \mathrm{BI}_f(\nu_f) + (\nu_f - \nu_s)\, \mathrm{BI}_{s,f}'(\nu_f).$$
Proof. 
Lemma 5 implies that the faster player’s incentive changes linearly as a function of the slower player’s move rate, with slope $-\mathrm{BI}_{s,f}'(\nu_f)$. The incentive functions of the slower and the faster player are equal when play rates are equal; therefore, when $\nu_s = \nu_f$, player $f$ has an incentive of $\mathrm{BI}_f(\nu_f)$.    □
Lemma 6 and its Equation (33) reveal that the faster player’s incentive behaves linearly with respect to the slower player’s move rate. We leverage this to determine the play rates of the slower player for which the faster player’s move rate is a best response. We can state the following as a corollary of Corollary 3 and Lemma 6.
Corollary 8 (Slower play rate in participatory equilibrium for periodic play).
Let $\bar{\nu}_f > 0$ be the play rate of the faster-moving player. The move rates for the slower player to which $\bar{\nu}_f$ is a best response for the faster player can be characterized as follows:
$$S_s(\bar{\nu}_f) = \begin{cases} [0, \bar{\nu}_f] & \text{if } \mathrm{BI}_{s,f}'(\bar{\nu}_f) = 0 \text{ and } \mathrm{BI}_f(\bar{\nu}_f) = 0 \\ \emptyset & \text{if } \mathrm{BI}_{s,f}'(\bar{\nu}_f) = 0 \text{ and } \mathrm{BI}_f(\bar{\nu}_f) \ne 0 \\ \left\{ \dfrac{\mathrm{BI}_f(\bar{\nu}_f)}{\mathrm{BI}_{s,f}'(\bar{\nu}_f)} + \bar{\nu}_f \right\} \cap [0, \bar{\nu}_f] & \text{otherwise.} \end{cases}$$
Corollaries 7 and 8 and our knowledge of the base incentive function allow us to finally characterize the Nash equilibria for periodic play.
Theorem 4 (Participatory equilibria with faster defender for periodic play).
The set of equilibria where the defender moves faster than the attacker ($\nu_D \ge \nu_A$) is given by:
$$\left\{ (\nu_D^*, \nu_A^*) \;\middle|\; \nu_D^* \in \{ \nu_D \mid \mathrm{BI}_A(\nu_D) = 0 \},\ \nu_A^* \in S_A(\nu_D^*) \right\},$$
where $S_A$ is as in Corollary 8. If $2\alpha_A \ge \beta_A$ or $c_A < \frac{1}{\beta_A - 2\alpha_A}$, this set has zero or one element(s). Otherwise, it is empty.
Proof. 
Because the attacker’s base incentive evaluated with the defender’s parameters is always strictly decreasing ($\mathrm{BI}_{A,D}' < 0$), we can disregard the case where $\mathrm{BI}_{A,D}' = 0$, so $S_A(\nu_D^*)$ contains at most one element regardless of the value of $\nu_D^*$. The set $\{ \nu_D \mid \mathrm{BI}_A(\nu_D) = 0 \}$ contains at most one element. This is because the attacker’s base incentive function is strictly decreasing (Lemma A23). Therefore, there is at most one $\nu_D^*$. It exists if and only if the attacker’s base incentive is strictly positive at zero, which is the case if $2\alpha_A < \beta_A$ and $c_A < \frac{1}{\beta_A - 2\alpha_A}$ or if $2\alpha_A \ge \beta_A$ (Lemma A29). Whether a corresponding $\nu_A^*$ exists is easily verified by evaluating $S_A(\nu_D^*)$, that is, by evaluating $\mathrm{BI}_D(\nu_D^*)\, /\, \mathrm{BI}_{A,D}'(\nu_D^*) + \nu_D^*$.    □
Theorem 5 (Participatory equilibria with faster attacker for periodic play).
The set of equilibria where the attacker moves faster than the defender ($\nu_A \ge \nu_D$) is given by:
$$\left\{ (\nu_D^*, \nu_A^*) \;\middle|\; \nu_A^* \in \{ \nu_A \mid \mathrm{BI}_D(\nu_A) = 0 \},\ \nu_D^* \in S_D(\nu_A^*) \right\},$$
where $S_D$ is as in Corollary 8. This set has zero or one element(s) if $2\alpha_D > \beta_D$. If $2\alpha_D = \beta_D$, it has zero or one element(s) if $c_D < \frac{1}{\alpha_D}$ and is empty otherwise. If $2\alpha_D < \beta_D$, this set can have zero, one, two, or a continuum of elements.
Proof. 
The proof of Theorem 5 is identical to that of Theorem 4, except that if $2\alpha_D < \beta_D$, then the defender's base incentive function is first increasing and then decreasing (Lemma A23). This results in up to two elements in $\{\nu_A \mid \mathrm{BI}_D(\nu_A) = 0\}$ and introduces the possibility for $\mathrm{BI}'_{D,A}(\nu_A^*)$ to be equal to zero, resulting in a continuum of solutions if also $\mathrm{BI}_A(\nu_A^*) = 0$.    □
We can derive all periodic Nash equilibria algorithmically. Algorithm 2 derives the equilibria with faster defender play; a similar but slightly longer algorithm yields the equilibria for faster attacker play.
Algorithm 2 Algorithm for finding the set of participatory equilibria with faster-moving defender for periodic play
1: // Check if the attacker's base incentive has a root
   if $2\alpha_A < \beta_A$ and $c_A \geq \frac{1}{\beta_A - 2\alpha_A}$ then return $\emptyset$ end if
2: // Determine the defender's equilibrium play rate
   $\nu_D \leftarrow \mathrm{Bisect}\left(\mathrm{BI}_A(x),\ x_{\min} = 0,\ x_{\max} = 1/c_A\right)$ // $\mathrm{BI}_A$ is strictly decreasing
3: // Determine the attacker's equilibrium play rate
   $\nu_A \leftarrow \mathrm{BI}_D(\nu_D)/\mathrm{BI}'_{A,D}(\nu_D) + \nu_D$
4: if $\nu_A \notin\ ]0, \nu_D]$ then return $\emptyset$ end if
5: return $\{(\nu_D, \nu_A)\}$
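To make the procedure concrete, the following Python sketch mirrors Algorithm 2. The callables BI_A, BI_D, and dBI_AD—the attacker's and defender's base incentive functions and the derivative of the attacker's base incentive under the defender's discounting parameters—are hypothetical stand-ins for the closed-form expressions of Appendix C, and we substitute SciPy's Brent root finder for plain bisection; the sketch is illustrative rather than a definitive implementation.

```python
from scipy.optimize import brentq


def equilibria_faster_defender(BI_A, BI_D, dBI_AD, alpha_A, beta_A, c_A):
    """Sketch of Algorithm 2: participatory equilibria with a faster-moving
    defender under periodic play. Returns an empty set or a singleton.

    BI_A, BI_D -- base incentive functions of attacker and defender
    dBI_AD     -- derivative of the attacker's base incentive, evaluated
                  with the defender's discounting parameters (negative)
    """
    # Step 1: check whether the attacker's base incentive has a root.
    if 2 * alpha_A < beta_A and c_A >= 1 / (beta_A - 2 * alpha_A):
        return set()
    # Step 2: the defender's equilibrium play rate. BI_A is strictly
    # decreasing, and the guard above ensures a sign change on [0, 1/c_A]
    # whenever a root exists.
    nu_D = brentq(BI_A, 0.0, 1.0 / c_A)
    # Step 3: the attacker's equilibrium play rate.
    nu_A = BI_D(nu_D) / dBI_AD(nu_D) + nu_D
    # Step 4: keep the pair only if the attacker participates and is
    # indeed the slower player.
    if not (0.0 < nu_A <= nu_D):
        return set()
    return {(nu_D, nu_A)}
```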

6. Discussion

In this section, we interpret our results further and look closely at how generalized hyperbolic discounting qualitatively impacts player utilities and utility-maximizing player behavior.

6.1. Degenerate Discounting Behavior

Our principal result is that complex discounted behavior often degenerates into non-discounting behavior (Theorem 1). There is a mathematical component to this result and a practical component. We begin with the mathematical component and shed some light on how it was established.
Mathematically, our result falls into the space of games involving infinite time horizons. In the original FlipIt paper, a significant part was dedicated to proving that if players are using renewal strategies, then the formalism that defines the game’s outcome in terms of very long-term (until infinity) running averages is valid. The important point was to show that if we define a game with an infinite time horizon, then there is some way to calculate the outcome from a class of strategies that can be specified finitely. Specifically, an initial result was that the formalism works for renewal strategies.
Because of this background, earlier versions of our result about sub-hyperbolic discounting assumed that the strategies were renewal. This gave us the additional property that restarting the game at a later point in time yields essentially the same outcome pattern. However, we could not find any counterexample to the discounting equivalence even while violating the renewal assumption.
One key example function helped to provide insight:
$$\mathrm{PC}(t) = \begin{cases} 1 & \text{if } \exists m,\ \frac{m(m+1)}{2} \leq t < \frac{(m+1)^2}{2} \\ 0 & \text{otherwise.} \end{cases}$$
This example satisfies
$$\lim_{t\to\infty} \frac{\int_0^t \mathrm{PC}(\tau)\,d\tau}{t} = \frac{1}{2}.$$
However, for every integer m, there is a point in time s such that if we restart the game at time s, then the value of the functional form remains bounded away from its correct limit value for duration m. Well-behaved functions generated from renewal strategies essentially cannot depend on absolute time, so they do not have this property. Nevertheless, for this same example, the discounted limit value is also $\frac{1}{2}$.
If we perform our windowing construction using this example, we see that the restart time is quadratic in the window size. Iterating on this example, we see very quickly that whenever the restart time always dominates the next window size, then the discounted limit will converge. However, when we try to exemplify a window size that is about the same as the restart time, the original functional form does not converge. This gives us enough information to imagine the current proof structure.
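The example is also easy to probe numerically. The sketch below (our own illustration; the horizon and grid are arbitrary, and we take $\alpha = \beta = 1$, i.e., plain hyperbolic discounting, for which total valuation diverges) evaluates both the running average and the discounted average of $\mathrm{PC}$; both drift toward $\frac{1}{2}$, the discounted one only logarithmically slowly.

```python
import numpy as np

def pc(t):
    # PC(t) = 1 iff m(m+1)/2 <= t < (m+1)^2/2 for some integer m.
    # m is recovered by inverting the triangular numbers T_m = m(m+1)/2.
    m = np.floor((np.sqrt(8.0 * t + 1.0) - 1.0) / 2.0)
    return (t < (m + 1.0) ** 2 / 2.0).astype(float)

T, n = 1e5, 2_000_001                 # horizon and grid size (arbitrary)
tau = np.linspace(0.0, T, n)
dt = tau[1] - tau[0]
control = pc(tau)

alpha = beta = 1.0                    # hyperbolic discounting: total value diverges
D = (1.0 + alpha * tau) ** (-beta / alpha)

running_avg = control.sum() * dt / T
discounted_avg = (control * D).sum() / D.sum()
print(running_avg, discounted_avg)    # both tend to 0.5 as T grows
```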
The narrative we now follow does not assume that the function is well behaved but rather starts from a worst-case scenario in which the player control function is “optimized” to make the discounted limit deviate from the non-discounted one. In this framing, our tactic is to use the limit definition to construct, for each fixed $\varepsilon > 0$, and as a function of $t$, a monitor on the function's tail until the end of time. We use the properties of this monitor to show that whatever the function is doing, there is, at most, a finite amount of time after which it can no longer deviate from the truth of the original limit's definition by more than $4\varepsilon$. From this, we show that discounting (sub-)hyperbolically results in the same outcome as not discounting at all.
The main intuition here is that discounting hyperbolically or sub-hyperbolically still assigns an infinite value to future gains and costs. Therefore, whatever happens in the now is not as important as the long-term future, which, independent of any (sub-)hyperbolic discounting parameters, always tends toward the same thing. The source of gravity for this sameness is that, if we wait long enough, the discount rate drops over time for all possible parameter values until it becomes essentially flat. In any case, the result is both moderately philosophical and mathematically precise, which makes it interesting.
There are also interesting behaviorally grounded aspects. Experimental research suggests that humans do temporally discount the future but tend to do so sub-hyperbolically [30,56].5 We have shown that the behaviors of rational actors who do not discount, of those who discount sub-hyperbolically, and of those who discount hyperbolically are identical. Additionally, we observed that the behavior of super-hyperbolic discounters ($\alpha < \beta$) remains similar as long as $\alpha$ and $\beta$ are close to each other; there is no major jump or discontinuity in behavior. Consequently, the impact of discounting is absent or small for what appears to be the most relevant part of the parameter space. Our result has some implications that are perhaps counter-intuitive:
  • All existing research and results on FlipIt-like timing games without discounting carry over to the most commonly observed discounting behavior. This includes the results presented in van Dijk et al. [2] and follow-up work without discounting.
  • The players’ utilities, incentives, best responses, and the game’s equilibria are not impacted by the fact that the defender starts the game in control of the resource. Differences between defender and attacker emerge only when discounting super-hyperbolically.
Upon encountering a surprising result, we must evaluate if it represents a significant finding or if it is an unintended artifact of model imperfections. We discern some caveats that may temper some of the implications of Theorem 1: one related to commitment and two arguments for super-hyperbolic discounting in our context.

Commitment

Consider a choice between the following two options: a reward at time T or a somewhat larger reward a fixed duration later. For sufficiently large T, a hyperbolic discounter always prefers to wait for the larger reward, as her discount rate is lower for times further removed from the present. However, unless she somehow commits to her choice, she might choose to claim the smaller reward anyway when the time T actually comes because her discount rate is high for times close to the present. This is known as “preference reversal” or “time-inconsistent preferences”.
A utility-maximizing hyperbolic discounter knows that preference reversal can happen and, in the absence of commitments, has to take her own time-inconsistent preferences into account when making decisions. This can change her best response; for example, she could opt for a third option of claiming an even smaller reward sometime before T, even if she would prefer the larger reward after T when she opts for it.
Our model assumes that the decision on which strategy to execute is made at the start of the game. We, therefore, implicitly assume that players can (and, in fact, must) commit to this strategy and cannot change it later on if their preferences were to change. Setting fixed strategies at the start of the game is common in games of timing where no new information becomes available as the game progresses, but for discounters with time-inconsistent preferences, it takes away any tension that may exist between players’ desired long-run strategies today and what their future selves would choose to do.
Since we model instantaneous rewards and costs as constant over time, it appears somewhat unlikely that such tension exists, as agents are not rewarded for patience. Nevertheless, the role of commitment is something to take into account, and it could be interesting to extend the game with multiple decision-making points, e.g., in the style of extensive form games.

Organizational Actors

Experimental research suggests that individuals tend to discount sub-hyperbolically, but the actors in the modeled interaction typically represent highly sophisticated (adversarial) organizations. Little research exists on appropriate values for α and β in this context. Presumably, organizational actors tend to act more rationally in an economic sense, which might mean they do discount super-hyperbolically—for example, exponentially or close to exponentially (small α).

Rationalization of Risk

Sub-hyperbolic discounting appears to be a more appropriate model for present-focused preferences and delay discounting. However, super-hyperbolic discounting can arise naturally as a result of rationalizing risk. We can, instead of discounting the generated value, consider the risk that the resource might stop generating value at some point in time. Assigning a constant probability of disappearance to any fixed-length time interval results in an exponentially discounted model—an extreme form of super-hyperbolic discounting. However, the entire family of (generalized) hyperbolic discount functions can appear if we consider resources that have been around for longer to be less likely to disappear or by introducing uncertainty over the probability of disappearance.
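To make the last point concrete, here is a short, standard mixture calculation (the gamma prior is our illustrative choice, not part of the model). Suppose the resource disappears at an unknown constant hazard rate $\lambda$, and our uncertainty about $\lambda$ follows a gamma distribution with shape $\beta/\alpha$ and rate $1/\alpha$. The expected survival probability at time $\tau$ is then exactly the generalized hyperbolic discount function used in this paper:

$$\mathbb{E}\left[e^{-\lambda\tau}\right] = \int_0^{\infty} e^{-\lambda\tau}\,\frac{(1/\alpha)^{\beta/\alpha}}{\Gamma(\beta/\alpha)}\,\lambda^{\frac{\beta}{\alpha}-1} e^{-\lambda/\alpha}\,d\lambda = \left(\frac{1/\alpha}{1/\alpha + \tau}\right)^{\beta/\alpha} = \frac{1}{(1+\alpha\tau)^{\beta/\alpha}} = D(\tau).$$

A degenerate prior concentrated on a single hazard rate recovers the exponential case ($\alpha \to 0$ with $\beta$ fixed gives $D(\tau) \to e^{-\beta\tau}$), matching the constant-probability argument above.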

6.2. Short-Horizon and Long-Horizon Discounting

For both exponential and periodic play, our description of incentives often resulted in two separate cases based on whether $\alpha$ was more or less than half the value of $\beta$. It turns out that the behavioral shift around this point is similarly significant to that between super- and sub-hyperbolic discounters. The result is a partitioning of the parameter space into three zones: $\alpha_i \geq \beta_i$ is equivalent to FlipIt-like games without discounting [2]; $2\alpha_i < \beta_i$ results in games that exhibit mostly the same characteristics as the exponentially discounted model presented in Merlevede et al. [13,14]; and a “new” space, $\alpha_i < \beta_i < 2\alpha_i$, exhibits part of the behavior of exponential discounters but not all (e.g., no decreasing attacker best-response curves or the possibility of three equilibria).
In the remainder of this section, we refer to super-hyperbolic discounters with $2\alpha < \beta$ as short-horizon discounters and to super-hyperbolic discounters with $2\alpha \geq \beta$ as long-horizon discounters.
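In code, the resulting partition of the parameter space reads as follows (a trivial helper of our own; the labels are the zone names introduced above):

```python
def discounting_zone(alpha: float, beta: float) -> str:
    """Classify generalized hyperbolic discounting parameters (both > 0)."""
    if alpha >= beta:
        return "(sub-)hyperbolic: strategically equivalent to no discounting"
    if 2 * alpha >= beta:
        return "long-horizon super-hyperbolic"
    return "short-horizon super-hyperbolic: exponential-like behavior"
```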
Figure 7 and Figure 8 illustrate player behavior through best-response graphs for periodic play and varying costs for defenders and attackers. In all figures, α and β are chosen such that the value of the resource is halved over the course of the first unit of time.6 We discuss some features of these graphs over the course of the following sections.

6.3. Player Indifference for Periodic Play

A remarkable property of periodic strategies is that the incentive of the slower player is independent of her own play rate. This is visible as the vertical lines on Figure 7 and Figure 8. This independence resulted in a set-valued best-response function, an interesting mathematical analysis, and, sometimes, many Nash equilibria. It appears for all values of α and β and could, therefore, also be observed in related work that did not include discounting [2] or that modeled exponential discounting [14]. However, where does the player indifference come from, and does it hold up in a real-world setting?
To reason about this, we only need to concern ourselves with gains, as the impact of cost on incentives is constant irrespective of discounting parameters (Lemma 4). Careful thought7 reveals that irrespective of the precise play rate of the slower player, every move she makes remains guaranteed to be successful, resulting in an unchanged incentive. In contrast, for faster periodic or exponential play, moving more frequently increases the probability of moving while already in control.
In any real-life scenario, the assumptions made by our model are unlikely to hold exactly. As the qualitative analysis above shows, changing either the assumption of perfect stealthiness or perfect periodic-ness would disrupt the perfectly vertical or horizontal lines on our best-response curves, especially near their edges. However, the indifference is “stable” in the sense that the slope changes only slightly if assumptions are changed only slightly; Nash equilibria induced by indifference do not appear particularly brittle. It is interesting to see that the slower player’s indifference can be observed regardless of discounting method, and for all discounting values α and β, for both players and despite the importance of resource ownership at the start of the game.

6.4. Effective Move Cost and Motivation

Comparing the graphs in Figure 7 and Figure 8 from left to right reveals that both long- and short-horizon defenders always play at significantly reduced rates compared to non-discounters, while the behavior of attackers is more nuanced. We discern two distinct effects of discounting on player incentives. Note that these effects were previously observed by Merlevede et al. [14], as exponential discounting is essentially an extreme form of super-hyperbolic, short-horizon discounting.
  • Increased effective cost. Players must pay for the instantaneous cost of a move “up front”. They first pay for the move and accrue benefits from this move later, resulting in the benefits being discounted at a higher rate than the costs. For both players, this effect results in a slightly higher “effective cost” of a move and reduces their incentive.
  • Relevance of starting position. Another behavioral change is caused by the defender starting out in control of the resource. This invigorates the attacker—he will attack at reasonably high rates to reduce the time before his first move, so as to obtain the resource before its value has been discounted away, even in the absence of a defender. In contrast, this demotivates the defender, as she is sure to be in control of the resource when it is at its most valuable.
Higher effective costs cause both players to drop out of the game more quickly in response to high opponent play rates and cause their maximal best responses to be lower. However, the invigoration of the attacker more than offsets this effect at low move rates, while the defender's comfortable starting position causes the defender's best responses to drop even lower.
The effects of the starting position on player behavior are stronger for short-horizon players. For short-horizon attackers, eagerness to obtain the resource quickly can become such that their best responses become decreasing in the defender’s move rate, reaching a maximum when the defender does not move at all. Defenders can become demotivated enough not to respond at all to attacks as long as they happen infrequently.
Interestingly, more present-oriented defenders will tend to defend their resources less, while more present-oriented attackers will attack poorly defended targets more frequently than their more future-oriented counterparts.
Taking a step back, it is not hugely surprising that the identity of the player starting out in control is relevant only in the face of a finite (convergent) total valuation of the resource. However, in real interactions, it seems to make sense for the starting ownership to matter. This can be because of how we discount, or because we intuitively realize that no interaction will ever go on forever. In this context, discounting as the rationalization of risk again comes to mind.

6.5. Equilibrium Selection

Our analysis shows that, for periodic and exponential play, the discounted model allows for a single equilibrium for (sub-)hyperbolic discounting, two equilibria for long-horizon discounting, and possibly infinitely many equilibria for short-horizon discounting. Inspection of Figure 7 and Figure 8 and our theoretical results show that if two or three equilibria exist, they always map onto two or three of the following situations:
  • The attacker moves infrequently, and the defender does not respond.
  • Both players move at a low, non-zero rate.
  • Both players move at a higher, non-zero rate.
In fact, if multiple equilibria exist, there is always one of them in which the defender does not play.
Distinct equilibria result in distinct outcomes, and in games of timing, which are generally not zero-sum games, some outcomes may be objectively inferior to others in the sense that they are not Pareto-optimal. Upon examining the equilibrium situations described, it becomes clear that both players always fare better in the outcome where the defender refrains from playing. Intriguingly, this suggests that defenders should avoid making regular defensive moves in certain scenarios to prevent an arms race from harming both parties. This is a particularly poignant observation because such outcomes cannot be equilibrium outcomes when not discounting or when discounting sub-hyperbolically.

6.6. Perfectly Secure Systems

One of the foundational assumptions in the interaction with APTs that we model is the capability of advanced threat actors to execute indefensible attacks. Nevertheless, timing game analysis shows that defenders, through high-frequency maneuvering, can completely avoid system compromise by causing the attacker to drop out, effectively achieving “perfectly secure” systems. Unfortunately, such a strategy entails significant costs and is not an equilibrium solution. In the absence of attacks, defensive vigilance is likely to lapse.
When factoring in discounting, structural measures that make it harder to breach the system provide an alternative path to secure systems. Games with a super-hyperbolic discounting attacker invariably have a Nash equilibrium in which neither player moves provided that the (instantaneous) cost of attacks is high enough, that is, if
$$c_A \geq \frac{1}{\beta_A - 2\alpha_A}.$$
In fact, as c A is a unitless number that expresses the cost of an attack in relation to the worth of the resource to the attacker, it is possible to increase c A and achieve perfect security not only by security hardening but also by decreasing the resource’s worth to attackers—for example, by not storing sensitive data for longer than is absolutely necessary.
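A sketch of this equilibrium condition in code (the helper's name and interface are ours), including an example of how lowering the resource's worth to the attacker raises $c_A$ past the threshold:

```python
def admits_mutual_inactivity(alpha_A: float, beta_A: float, c_A: float) -> bool:
    """True iff (0, 0) is a Nash equilibrium: the attacker discounts
    short-horizon and the relative cost of an attack is high enough."""
    return 2 * alpha_A < beta_A and c_A >= 1.0 / (beta_A - 2.0 * alpha_A)

# Halving the resource's value to the attacker doubles c_A and can push a
# configuration across the threshold 1/(beta_A - 2*alpha_A) = 1 here.
print(admits_mutual_inactivity(0.5, 2.0, 0.8))  # c_A < 1 -> False
print(admits_mutual_inactivity(0.5, 2.0, 1.6))  # c_A >= 1 -> True
```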
In our opinion, the presence of a Nash equilibrium at ( 0 , 0 ) is a strong argument for super-hyperbolic discounting, as, practically speaking, most real systems are never attacked by most attackers, an outcome that is not an equilibrium outcome in the absence of discounting for the attacker.
An interesting path for future research to explore would be to define a two-stage game in which the defender (and, possibly, the attacker) could choose to invest before the game of timing starts, increasing the value of c A . This presents defenders with an alternative way of increasing the security of their system that may prevent the disastrous “arms race” scenario of Section 6.5.

6.7. Non-Optimality of Periodic Strategies

One of the major theoretical results holding in the non-discounted setting [2] that we touched on in Section 4.5.2 is that the class of periodic strategies strictly dominates the class of all renewal strategies. These are strategies that have the time of their first move as well as all inter-arrival times drawn from the same time-independent distribution.8 This theoretical result has been a major motivator for the interest in the class of periodic strategies.
In the context of super-hyperbolic discounting, too, we can show that a player playing an exponential strategy against a player with a periodic or exponential strategy can strictly increase her utility by playing a periodic strategy with the same rate parameter instead. Both strategies are equally costly, and the periodic strategy is optimal in the sense that it minimizes the probability of moving when already in control of the resource. Therefore, the set of periodic strategies strictly dominates the set of exponential strategies.
However, whereas the periodic strategy, when not discounting, ended up strictly dominating not only the class of exponential strategies but that of all renewal strategies, we can show that this is not the case when discounting super-hyperbolically. This observation also applies to the exponentially discounted model [14] but is a novel result presented in this paper.
Theorem 6 (Non-optimality of periodic strategies).
For super-hyperbolic discounting defenders facing an attacker playing a periodic strategy, the class of periodic strategies with random phase does not dominate the class of renewal strategies.
Proof. 
We prove this by giving a counterexample. We define a class of strategies that are renewal strategies that outperform the periodic strategy under certain conditions. Specifically, we define a renewal strategy resembling the periodic strategy but with a lower probability of moving near the beginning of the game, when the defender is likely to still be in control of the resource. The formal proof is in Appendix D.    □
The periodic strategy and renewal strategies are “flawed” in the sense that they are incapable of fully exploiting the knowledge of resource ownership at the start of the game. The optimality of periodic strategies relies on a form of time independence that is no longer satisfied when discounting super-hyperbolically. The search for new classes of analytically tractable strategies that are more suited to contexts with super-hyperbolic discounting remains an interesting open avenue for future research.

7. Conclusions

The model presented in this paper introduces four parameters governing the preferences of two players discounting along a generalized hyperbolic discounting function ( α A , α D , β A , and β D ) and is sufficiently general as to subsume several previously studied classes of such games, including exponential discounting [14,15] and the original FlipIt game [2], which does not involve discounting. The price of this generality is an increase in the complexity of derivations and formulae. The additional degrees of freedom also make it more difficult to clearly narrate our players’ expected behavior.
On the upside, we are able to provide a characterization of equilibrium configurations for the canonical regions of our parameter space, and our machinery brings together several different approaches into one unifying framework. Many of our generalized arguments subsume and even simplify the arguments and calculations from prior related work [2,14,15]. Examples include our notations surrounding anonymous gain and our focused usage of base incentives. Over the long run, these analysis techniques may prove more valuable than the current characterization results they produced here.
Two central markers roughly guide the equilibrium characterizations. If each player is discounting hyperbolically or slower, then the equilibrium conditions are exactly the same as when the players are not discounting [2]. On the other end of the discounting spectrum, where both players are discounting faster than hyperbolic, the equilibrium conditions are very similar to those obtained in earlier work where players discount exponentially [14,15].
The present work advances our collective understanding of games of timing in which control of a single resource flips back and forth between two time-based-discounting players, and it extends a series of discounting-centric investigations begun within the FlipIt framework. By showing that discounting (sub-)hyperbolically does not impact player utilities, we enable an extended interpretation of all existing results on the undiscounted model [2] and its variations. We have also proven that one of the principal results for this undiscounted model—the strict dominance of periodic over renewal strategies—no longer holds when discounting super-hyperbolically; this includes exponential discounting.
Our result stating that hyperbolically discounted, FlipIt-like security games of timing are strategically equivalent to those without discounting seems especially powerful. The proof that we present makes few assumptions about the game structure or discounting function and may have applications beyond the space of FlipIt-like games.
Future work can extend and apply the tools and techniques we have developed here to other applicable models. There are also interesting possible extensions starting from this paper, such as considering more general discounting-customized strategy classes, increasing the number of players, or adjusting the background context along the different dimensions that others have opened. Likewise, we can study our model in the context of a behavioral experiment [57,58].

Author Contributions

Conceptualization, J.M., B.J., J.G. and T.H.; Formal analysis, J.M. and B.J.; Writing—original draft, J.M., B.J. and J.G.; Writing—review and editing, J.M., B.J., J.G. and T.H.; Supervision, J.G. and T.H.; Funding acquisition, J.G. and T.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Divergent Behavior

This entire section is dedicated to proving Theorem 1, which states that scaling gains and costs using a generalized hyperbolic discounting function with diverging total valuation does not impact said gains and costs, irrespective of the strategies adopted by players. Discounting hyperbolically or sub-hyperbolically consequently has no impact on the strategic considerations of players, the implications of which we discuss in Section 6.1.
Expectations are with respect to the probability distribution over the player control function $\mathrm{PC}_i$ resulting from the player strategies. Appendix A.1 considers the statement made about gains (Equation (14)). Appendix A.2 proves the statement made about costs (Equation (15)).

Appendix A.1. Gains

We begin by considering the gain of player i. The entire proof involves only one player, so we will omit the subscript i. Our assumption that the non-discounted gain converges means that there is a constant value $v \in [0,1]$ such that
$$\lim_{t\to\infty} \frac{\mathbb{E}\left[\int_{\tau=0}^{t} \mathrm{PC}(\tau)\,d\tau\right]}{t} = v.$$
Before addressing the expression for discounted gains, we want to establish some additional structure around this limit statement. Therefore, let us define, for s , t 0 ,
$$f(s,t) = \frac{\mathbb{E}\left[\int_{\tau=s}^{s+t} \mathrm{PC}(\tau)\,d\tau\right]}{t}.$$
Function f expresses the non-discounted gain obtained over a period of duration t, starting at time s. We can thus express our main assumption Equation (A1) using this notation as:
$$\lim_{t\to\infty} f(0,t) = v.$$
Our first lemma says that the gain moves continuously with respect to both the start time and the duration.
Lemma A1.
f is continuous in both s and t.
Proof. 
If player i adopts a pure strategy, then $f(s,t)$ has the form $\frac{\int_{\tau=s}^{s+t}\mathrm{PC}(\tau)\,d\tau}{t}$, where $\mathrm{PC}(\tau)\in\{0,1\}$ for every $\tau$. In this case, for $s_1 \leq s_2$,
$$\left|f(s_2,t)-f(s_1,t)\right| = \left|\frac{\int_{\tau=s_1+t}^{s_2+t}\mathrm{PC}(\tau)\,d\tau - \int_{\tau=s_1}^{s_2}\mathrm{PC}(\tau)\,d\tau}{t}\right| \leq \frac{s_2-s_1}{t},$$
and similarly, for $t_1 \leq t_2$,
$$\left|f(s,t_2)-f(s,t_1)\right| = \left|\frac{1}{t_2}\int_{\tau=s+t_1}^{s+t_2}\mathrm{PC}(\tau)\,d\tau + \left(\frac{1}{t_2}-\frac{1}{t_1}\right)\int_{\tau=s}^{s+t_1}\mathrm{PC}(\tau)\,d\tau\right| \leq \frac{1}{t_2}(t_2-t_1) + \frac{t_2-t_1}{t_2 t_1}\cdot t_1 = 2\,\frac{t_2-t_1}{t_2}.$$
Therefore, for pure strategies, f is continuous in both s and t. Now, if we take an expectation over members of this class of functions, the result is still continuous.    □
Our next lemma shows that “starting” the game later does not impact the gain.
Lemma A2.
$\forall s \geq 0,\ \lim_{t\to\infty} f(s,t) = v.$
Proof. 
The following algebraic relation holds for all s and for all t > 0 :
$$f(s,t) = \frac{\mathbb{E}\left[\int_{\tau=0}^{s+t}\mathrm{PC}(\tau)\,d\tau\right] - \mathbb{E}\left[\int_{\tau=0}^{s}\mathrm{PC}(\tau)\,d\tau\right]}{t} = \frac{s+t}{t}\left(f(0,s+t) - \frac{s}{s+t}\,f(0,s)\right).$$
Therefore, if s is a non-negative real constant, we have by limit laws:
$$\lim_{t\to\infty} f(s,t) = \lim_{t\to\infty}\frac{s+t}{t}\cdot\left(\lim_{t\to\infty} f(0,s+t) - f(0,s)\cdot\lim_{t\to\infty}\frac{s}{s+t}\right) = 1\cdot\left(\lim_{t\to\infty}f(0,t) - f(0,s)\cdot 0\right) = 1\cdot(v-0) = v.$$
   □
Next, using $f$ together with the limit definition, we define a pair of sequences that will be unique with respect to $f$, $v$, and a fixed value of $\varepsilon$. These sequences provide us with notational anchors that we will eventually use to define boundary conditions for the discounted limit.
Definition A1.
Given a continuous function $f(s,t)$ with domain $\mathbb{R}_{\geq 0} \times \mathbb{R}_{\geq 0}$ and values in $[0,1]$, and given $v \in [0,1]$ such that $\forall s,\ \lim_{t\to\infty} f(s,t) = v$, fix $\varepsilon > 0$ and inductively define two sequences $(s_k)_{k=0}^{\infty}$ and $(t_k^*)_{k=0}^{\infty}$ as follows:
$$s_0 = 0$$
$$t_0^* = \max\left\{1,\ \inf\left\{t^* \in \mathbb{R} : \forall t \geq t^*,\ \left|f(s_0,t) - v\right| < \varepsilon\right\}\right\}.$$
For $k \geq 1$, assume that $s_m$ and $t_m^*$ are defined for $m < k$. Then, let
$$s_k = \sum_{m=0}^{k-1} t_m^*$$
$$t_k^* = \max\left\{1,\ \inf\left\{t^* \in \mathbb{R} : \forall t \geq t^*,\ \left|f(s_k,t) - v\right| < \varepsilon\right\}\right\}.$$
Intuitively, $t_k^*$ expresses the minimal interval duration over which to average gains so as to always be within $\varepsilon$ of the gain $v$, provided we start the game at time $s_k$. If player strategies do not change over time, $t_k^*$ should remain stable over time, but, e.g., $t_k^*$ might increase with $k$ if players move more slowly as time goes on. The value $s_k$ is simply the sum of all $t_i^*$ with $i < k$. We define $t_k^*$ as at least 1 to ensure that the sequence $s_k$ covers all of time. The next lemma enumerates some basic properties of this construction.
Lemma A3.
The constructed sequences $s_k$ and $t_k^*$ satisfy the following properties.
  • $s_{k+1} = s_k + t_k^*$
  • $t_k^* \geq 1$
  • $\lim_{k\to\infty} s_k = +\infty$
  • $\forall t \in \mathbb{R}^+,\ \exists!\, k,\ s_k < t \leq s_{k+1}$
  • $\left|f(s_k, t_k^*) - v\right| \leq \varepsilon$
  • $\forall t > t_k^*,\ \left|f(s_k,t) - v\right| < \varepsilon$
Proof. 
Each item follows directly from inspection of the definitions and previous properties in the list. □
Lemma A4.
If $t_k^* > 1$, then $\left|f(s_k, t_k^*) - v\right| = \varepsilon$.
Proof. 
If $t_k^* > 1$, then $t_k^* = \inf\{t^* \in \mathbb{R} : \forall t \geq t^*,\ |f(s_k,t) - v| < \varepsilon\}$. This definition tells us two things: $\forall t > t_k^*,\ |f(s_k,t) - v| < \varepsilon$, and $\forall t' < t_k^*,\ \exists t > t',\ |f(s_k,t) - v| \geq \varepsilon$. Since $f(s_k,t)$ is continuous in $t$ everywhere, in particular at $t_k^*$, and both of the conditions $|f(s_k,t) - v| < \varepsilon$ and $|f(s_k,t) - v| > \varepsilon$ would extend to an open interval around $t_k^*$, the only remaining option satisfying the definition of infimum is $|f(s_k, t_k^*) - v| = \varepsilon$. □
We now show the highly non-obvious result that $t_k^*$ becomes strongly dominated by $s_k$ as $k$ increases. We can interpret this as a constraint on how much $f(s,t)$ can deviate from its limit value as time proceeds. We show this in several steps.
Lemma A5.
The sequences $s_k$ and $t_k^*$ must satisfy the following constraint:
$$\lim_{k\to\infty} \frac{t_k^*}{s_k + s_{k+1}} = 0.$$
Proof. 
The proof is by contradiction. Every term in the sequence $\frac{t_k^*}{s_k+s_{k+1}}$ is in $(0,1)$, so if $\lim_{k\to\infty} \frac{t_k^*}{s_k+s_{k+1}} \neq 0$, then there is a constant $c \in (0,1)$ such that $\frac{t_k^*}{s_k+s_{k+1}} \geq c$ for infinitely many $k$. At most, a finite number of terms with $t_k^* = 1$ may satisfy this constraint because $\frac{1}{s_k+s_{k+1}} < \frac{1}{2k-1}$, which becomes less than $c$ after finitely many $k$. Therefore, there must be infinitely many $m$ with $t_m^* > 1$ satisfying $t_m^* \geq c\cdot(s_m+s_{m+1})$. By considering only those $m$ that are large enough to witness the convergence of $f(0,t)$ to within $c\varepsilon$ of $v$, there must be infinitely many $m$ satisfying the following three properties.
  • $t_m^* \geq c\cdot(s_m+s_{m+1})$,
  • $\left|f(0,s_m) - v\right| < c\varepsilon$,
  • $\left|f(s_m, t_m^*) - v\right| = \varepsilon$.
For such an m, we can evaluate the deviation error of f ( 0 , s m + 1 ) as follows:
f ( 0 , s m + 1 ) v = s m s m + 1 ( f ( 0 , s m ) v ) + t m * s m + 1 ( f ( s m , t m * ) v ) .
The first of the two terms on the right has absolute value at most
s m s m + 1 · c ε ,
while the second term has absolute value exactly
t m * s m + 1 · ε c ( s m + s m + 1 ) s m + 1 · ε = s m s m + 1 · c ε + c ε .
No matter how these numbers are added together with their signs, the absolute value of the sum will be at least c ε . We have thus exhibited a fixed constant c ε , and infinitely many m satisfying | f ( 0 , s m + 1 ) v | c ε , which contradicts lim t f ( 0 , t ) = v . □
Corollary A1.
$\lim_{k\to\infty} \frac{t_k^*}{s_{k+1}} = 0$
Proof. 
$$\lim_{k\to\infty} \frac{t_k^*}{s_k+s_{k+1}} = 0 \implies \lim_{k\to\infty} \frac{t_k^*}{2\,s_{k+1}} = 0 \implies \lim_{k\to\infty} \frac{t_k^*}{s_{k+1}} = 0. \qquad \square$$
Corollary A2.
$t_k^* = o(s_k)$, or, equivalently,
$$\lim_{k\to\infty} \frac{t_k^*}{s_k} = 0.$$
Proof. 
If not, then $\exists c \in (0,1)$ with $t_k^* \geq c\,s_k$ for infinitely many $k$. For each such $k$,
$$t_k^* = c\,t_k^* + (1-c)\,t_k^* \geq c\,t_k^* + (1-c)\,c\,s_k \quad [\text{assumption } t_k^* \geq c\,s_k]$$
$$= c\,s_{k+1} - c^2 s_k \quad [\text{definition, Equation (A7)}]$$
$$> c\,s_{k+1} - c^2 s_{k+1} \quad [s_k \text{ is strictly increasing in } k]$$
$$= c\,(1-c)\,s_{k+1}.$$
Therefore, d ( 0 , 1 ) , t k * d s k + 1 for infinitely many k, contradicting the previous Corollary. □
Having established sufficient structure around the convergence of $f(s,t)$, we now proceed to the discounted limit. Define the function $g(s,t)$ to be the discounted variant of $f(s,t)$, namely:
$$g(s,t) = \frac{\mathbb{E}\left[\int_{\tau=s}^{s+t}\mathrm{PC}(\tau)\,D(\tau)\,d\tau\right]}{\int_{\tau=s}^{s+t} D(\tau)\,d\tau}.$$
Let us define functions $a(s,t)$ and $b(s,t)$ to be the numerator and denominator of $g(s,t)$, respectively:
$$a(s,t) = \mathbb{E}\left[\int_{\tau=s}^{s+t}\mathrm{PC}(\tau)\,D(\tau)\,d\tau\right],$$
$$b(s,t) = \int_{\tau=s}^{s+t} D(\tau)\,d\tau.$$
And finally, using our sequences $s_k$ and $t_k^*$, define for each integer $k$,
$$a_k = a(s_k, t_k^*) \qquad b_k = b(s_k, t_k^*).$$
With this notation, we can finally outline our strategy for proving our main result from Equation (14), which, using our recent notation, is equivalent to
$$\lim_{t\to\infty} g(0,t) = v.$$
Given $f$, $v$, and $\varepsilon > 0$, we deterministically construct the sequences $s_k$ and $t_k^*$. This construction defines a map $t \mapsto m_t$, where $m_t$ is the unique integer with $s_{m_t} < t \leq s_{m_t+1}$. We thus obtain a well-defined representation of $g(0,t)$ as
$$g(0,t) = \frac{a(0,s_{m_t}) + a(s_{m_t},\, t-s_{m_t})}{b(0,s_{m_t}) + b(s_{m_t},\, t-s_{m_t})} = \frac{\sum_{k=0}^{m_t-1} a_k + a(s_{m_t},\, t-s_{m_t})}{\sum_{k=0}^{m_t-1} b_k + b(s_{m_t},\, t-s_{m_t})}.$$
We now construct an upper bound for $|g(0,t) - v|$ (for sufficiently large $t$) using three steps. First, we find an integer $k^*$ such that
$$\forall r \geq k^*,\quad \left|\frac{\sum_{k=k^*}^{r} a_k}{\sum_{k=k^*}^{r} b_k} - v\right| < 2\varepsilon.$$
We then construct an integer $K^* \geq k^*$ such that
$$\forall r \geq K^*,\quad \left|\frac{\sum_{k=0}^{r} a_k}{\sum_{k=0}^{r} b_k} - v\right| < 3\varepsilon.$$
Finally, we construct an integer $m^* \geq K^*$ such that
$$\forall t \geq s_{m^*},\quad \left|\frac{\sum_{k=0}^{m_t-1} a_k + a(s_{m_t},\, t-s_{m_t})}{\sum_{k=0}^{m_t-1} b_k + b(s_{m_t},\, t-s_{m_t})} - v\right| < 4\varepsilon.$$
The desired result then follows because $|g(0,t) - v|$ is exactly the bounded value in the expression from the previous step, the real number $s_{m^*}$ in that expression is just a computed time threshold $t = T^*$, and we can carry out this entire constructive process for an arbitrarily small $\varepsilon$.
Assumption A1.
Before doing the computations, we need a note about $v$ and $\varepsilon$. These will remain fixed for the entire proof. To avoid unnecessary edge cases for $v \in (0,1)$, by convention and without loss of generality, we assume that $\varepsilon$ is sufficiently small so as to ensure $0 < v - 4\varepsilon < v + 4\varepsilon < 1$. All of our calculations will be valid under these assumptions. If $v = 0$, then the calculations involving $v - k\varepsilon$ may involve some division by zero or multiplication of negative numbers. However, the result that $g(\cdot) > v - k\varepsilon$ will still hold because $g(s,t) \in [0,1]$ from the definitions. The other edge case, $v = 1$, is unproblematic with respect to the calculations in this proof.
We begin with the computation of k * .
Lemma A6.
$\exists k^*,\ \forall k \geq k^*,\quad v - 2\varepsilon < g(s_k, t_k^*) < v + 2\varepsilon.$
Proof. 
The discounting function $D(\tau) = \frac{1}{(1+\alpha\tau)^{\beta/\alpha}}$ is decreasing in $\tau$, so we can bound each value of the form $g(s_k, t_k^*)$ by replacing the function $D(\tau)$ by its constant minimum or maximum on the integration interval $[s_k, s_{k+1}]$. We have
$$g(s_k,t_k^*) \geq \frac{D(s_{k+1})\cdot\mathbb{E}\left[\int_{s_k}^{s_{k+1}}\mathrm{PC}(\tau)\,d\tau\right]}{D(s_k)\cdot\int_{s_k}^{s_{k+1}} 1\,d\tau} = \frac{D(s_{k+1})}{D(s_k)}\,f(s_k,t_k^*) \geq \frac{D(s_{k+1})}{D(s_k)}\,(v-\varepsilon),$$
$$g(s_k,t_k^*) \leq \frac{D(s_k)\cdot\mathbb{E}\left[\int_{s_k}^{s_{k+1}}\mathrm{PC}(\tau)\,d\tau\right]}{D(s_{k+1})\cdot\int_{s_k}^{s_{k+1}} 1\,d\tau} = \frac{D(s_k)}{D(s_{k+1})}\,f(s_k,t_k^*) \leq \frac{D(s_k)}{D(s_{k+1})}\,(v+\varepsilon).$$
Computing these ratios, we have
$$\frac{D(s_{k+1})}{D(s_k)} = \frac{(1+\alpha s_k)^{\beta/\alpha}}{(1+\alpha s_{k+1})^{\beta/\alpha}} = \left(\frac{1+\alpha s_k}{1+\alpha s_{k+1}}\right)^{\beta/\alpha} = \left(1 - \frac{\alpha t_k^*}{1+\alpha s_{k+1}}\right)^{\beta/\alpha},$$
$$\frac{D(s_k)}{D(s_{k+1})} = \left(\frac{1+\alpha s_{k+1}}{1+\alpha s_k}\right)^{\beta/\alpha} = \left(1 + \frac{\alpha t_k^*}{1+\alpha s_k}\right)^{\beta/\alpha}.$$
Since
$$\lim_{k\to\infty} \frac{t_k^*}{s_{k+1}} = \lim_{k\to\infty} \frac{t_k^*}{s_k} = 0,$$
we can find a sufficiently large number $k^*$ so as to make the ratio $\frac{D(s_{k+1})}{D(s_k)}$ as close to 1 as we wish, and the same applies to its inverse. Let us choose $k^*$ so that the term less than 1 is at least $1 - \frac{\varepsilon}{v-\varepsilon}$ and the term greater than 1 is at most $1 + \frac{\varepsilon}{v+\varepsilon}$. Then, we will have
$$g(s_k,t_k^*) \geq \left(1 - \frac{\varepsilon}{v-\varepsilon}\right)(v-\varepsilon) = v - 2\varepsilon, \qquad g(s_k,t_k^*) \leq \left(1 + \frac{\varepsilon}{v+\varepsilon}\right)(v+\varepsilon) = v + 2\varepsilon.$$
And so,
$$\forall k \geq k^*,\quad v - 2\varepsilon < g(s_k, t_k^*) < v + 2\varepsilon.$$
Corollary A3.
Given $k^*$ as above, we have
$$\forall r \geq k^*,\quad v - 2\varepsilon < \frac{\sum_{k=k^*}^{r} a_k}{\sum_{k=k^*}^{r} b_k} < v + 2\varepsilon.$$
Proof. 
Expressing the result of Lemma A6 in sequence notation, we have
$$\forall k \geq k^*,\quad (v-2\varepsilon)\cdot b_k < a_k < b_k\cdot(v+2\varepsilon).$$
Since $\sum_{k=k^*}^{r} b_k$ is positive for each $r \geq k^*$, we can write
$$v - 2\varepsilon = \frac{\sum_{k=k^*}^{r} (v-2\varepsilon)\,b_k}{\sum_{k=k^*}^{r} b_k} < \frac{\sum_{k=k^*}^{r} a_k}{\sum_{k=k^*}^{r} b_k} < \frac{\sum_{k=k^*}^{r} b_k\,(v+2\varepsilon)}{\sum_{k=k^*}^{r} b_k} = v + 2\varepsilon. \qquad \square$$
Since $\sum_k b_k$ diverges, the finite number of terms involving $a_k$ and $b_k$ with $k < k^*$ cannot significantly affect the ratio of sums after sufficiently many terms. The following lemma makes this precise.
Lemma A7.
$\exists K^* \geq k^*,\ \forall r \geq K^*,\quad v - 3\varepsilon < \frac{\sum_{k=0}^{r} a_k}{\sum_{k=0}^{r} b_k} < v + 3\varepsilon.$
Proof. 
We may write
$$\frac{\sum_{k=0}^{r} a_k}{\sum_{k=0}^{r} b_k} = \frac{\sum_{k=0}^{k^*-1} a_k + \sum_{k=k^*}^{r} a_k}{\sum_{k=0}^{k^*-1} b_k + \sum_{k=k^*}^{r} b_k}.$$
Since $\lim_{r\to\infty} \sum_{k=k^*}^{r} b_k = \infty$, and both $\sum_{k=0}^{k^*-1} a_k$ and $\sum_{k=0}^{k^*-1} b_k$ are finite, we may choose a $K^*$ large enough so that
$$\forall r \geq K^*,\quad \frac{\sum_{k=0}^{k^*-1} a_k}{\sum_{k=k^*}^{r} b_k} < \varepsilon \quad\text{and}\quad \frac{\sum_{k=0}^{k^*-1} b_k}{\sum_{k=k^*}^{r} b_k} < \frac{\varepsilon}{v-2\varepsilon}.$$
For the upper bound, we have
$$\frac{\sum_{k=0}^{r} a_k}{\sum_{k=0}^{r} b_k} = \frac{\sum_{k=0}^{k^*-1} a_k}{\sum_{k=0}^{r} b_k} + \frac{\sum_{k=k^*}^{r} a_k}{\sum_{k=0}^{r} b_k} < \frac{\sum_{k=0}^{k^*-1} a_k}{\sum_{k=k^*}^{r} b_k} + \frac{\sum_{k=k^*}^{r} a_k}{\sum_{k=k^*}^{r} b_k} < \varepsilon + (v+2\varepsilon) = v + 3\varepsilon.$$
For the lower bound, we have
$$\frac{\sum_{k=0}^{r} a_k}{\sum_{k=0}^{r} b_k} > \frac{\sum_{k=k^*}^{r} a_k}{\sum_{k=0}^{r} b_k} = \frac{\sum_{k=k^*}^{r} a_k}{\sum_{k=k^*}^{r} b_k}\left(1 - \frac{\sum_{k=0}^{k^*-1} b_k}{\sum_{k=0}^{r} b_k}\right) > \frac{\sum_{k=k^*}^{r} a_k}{\sum_{k=k^*}^{r} b_k}\left(1 - \frac{\sum_{k=0}^{k^*-1} b_k}{\sum_{k=k^*}^{r} b_k}\right) > (v-2\varepsilon)\left(1 - \frac{\varepsilon}{v-2\varepsilon}\right) = v - 3\varepsilon. \qquad \square$$
The next lemma formalizes a result that is analogous to $t_k^* = o(s_k)$ in terms of the discounted variant $g$. Intuitively, the result for $f$ is that as we go further in time, the potential accumulated mass of $f$ starting at $s_k$ and going on for $t_k^*$ is eventually (for all sufficiently large $k$) dominated by the potential accumulated mass of $f$ over the totality of time preceding $s_k$. In the discounted version, this dominance should be even greater because the potential value of the current interval is more heavily discounted compared to everything that was accumulated before.
Lemma A8.
$\lim_{k\to\infty} \frac{b_k}{b(0,s_k)} = 0$
Proof. 
Let $\delta > 0$. Since $\lim_{k\to\infty} \frac{t_k^*}{s_k} = 0$, we may choose an integer $N$ such that $\forall k \geq N,\ \frac{t_k^*}{s_k} < \delta$. Then, for $k \geq N$, we have
$$\frac{b_k}{b(0,s_k)} \leq \frac{D(s_k)\,t_k^*}{b(0,s_k)} < \frac{D(s_k)\,s_k\,\delta}{b(0,s_k)} < \frac{b(0,s_k)\,\delta}{b(0,s_k)} = \delta. \qquad \square$$
Finally, we have
Lemma A9.
$\exists m^*,\ \forall t \geq s_{m^*},\quad \left|\frac{\sum_{k=0}^{m_t-1} a_k + a(s_{m_t},\, t-s_{m_t})}{\sum_{k=0}^{m_t-1} b_k + b(s_{m_t},\, t-s_{m_t})} - v\right| < 4\varepsilon.$
Proof. 
Using the previous lemma, choose $m^* \geq K^*$ such that $\forall k \geq m^*,\ \frac{b_k}{b(0,s_k)} < \min\left\{\varepsilon,\ \frac{\varepsilon}{v-3\varepsilon}\right\}$. We have the relations
$$a(s_{m_t},\, t-s_{m_t}) \leq b(s_{m_t},\, t-s_{m_t}) \leq b_{m_t},$$
so for $t \geq s_{m^*}$ we will have $m_t \geq m^*$ and thus
$$\frac{a(s_{m_t},\, t-s_{m_t})}{b(0,s_{m_t})} \leq \frac{b(s_{m_t},\, t-s_{m_t})}{b(0,s_{m_t})} \leq \frac{b_{m_t}}{b(0,s_{m_t})} < \min\left\{\varepsilon,\ \frac{\varepsilon}{v-3\varepsilon}\right\}.$$
Using algebra similar to that of Lemma A7, together with the equivalence $\sum_{k=0}^{m_t-1} b_k = b(0,s_{m_t})$, we have for the upper bound,
$$\frac{\sum_{k=0}^{m_t-1} a_k + a(s_{m_t},\, t-s_{m_t})}{\sum_{k=0}^{m_t-1} b_k + b(s_{m_t},\, t-s_{m_t})} < \frac{\sum_{k=0}^{m_t-1} a_k}{\sum_{k=0}^{m_t-1} b_k} + \frac{a(s_{m_t},\, t-s_{m_t})}{b(0,s_{m_t})} < (v+3\varepsilon) + \varepsilon = v + 4\varepsilon.$$
For the lower bound, we have
$$\frac{\sum_{k=0}^{m_t-1} a_k + a(s_{m_t},\, t-s_{m_t})}{\sum_{k=0}^{m_t-1} b_k + b(s_{m_t},\, t-s_{m_t})} > \frac{\sum_{k=0}^{m_t-1} a_k}{\sum_{k=0}^{m_t-1} b_k}\left(1 - \frac{b(s_{m_t},\, t-s_{m_t})}{b(0,s_{m_t})}\right) > (v-3\varepsilon)\left(1 - \frac{\varepsilon}{v-3\varepsilon}\right) = v - 4\varepsilon. \qquad \square$$
Since $g(0,t) = \frac{\sum_{k=0}^{m_t-1} a_k + a(s_{m_t},\, t-s_{m_t})}{\sum_{k=0}^{m_t-1} b_k + b(s_{m_t},\, t-s_{m_t})}$, this completes the proof that $\lim_{t\to\infty} g(0,t) = v$.

Appendix A.2. Costs

The proof for costs is largely analogous to the proof for gains, with a few exceptions. Here, we supply a new, modified notation and provide a new Lemma A10 summarizing the different properties of this notation. We modify the statement and proof of Lemma A4; additionally, we provide modified proofs for Lemmas A5 and A9. The desired result for costs follows directly from applying the modifications and substituting the new notation into the proof from the previous subsection.
Define
$$f_c(s,t) = \frac{\mathbb{E}\left[\sum_{\tau \in t_i,\ s < \tau \leq s+t} c_i\right]}{t},$$
and assume that for some value $v_c$,
$$\lim_{t\to\infty} f_c(0,t) = v_c.$$
The function f c ( s , t ) has a few properties that are different from the f ( s , t ) defined for gains, which we summarize in the following lemma.
Lemma A10.
$f_c(s,t)$ is right continuous in $t$, and for every $\tau > 0$,
$$0 \leq f_c(s,\tau) - \lim_{t\to\tau^-} f_c(s,t) \leq \frac{c_i}{\tau}.$$
Proof. 
The support of $f_c(s,t)$ (without taking an expectation) is a set of functions having the form
$$C(s,t) = \frac{\sum_{\tau \in t_i,\ s < \tau \leq s+t} c_i}{t}.$$
If we imagine that time starts at $s$, then this function behaves as $\frac{k\,c_i}{t}$ from the time of player $i$'s $k$th flip (or the beginning of time $s$, if $k = 0$) until just before her $(k+1)$st flip. At the $(k+1)$st flip time $\tau$, there is an upward transition of the value by an amount $\frac{c_i}{\tau}$, at which point the function continues as the next $\frac{(k+1)\,c_i}{t}$. Therefore, it is right continuous, not left continuous at the flip times, generally decreasing except at the flip times, and at any point $\tau$ at which it is not continuous, it increases by an amount $\frac{c_i}{\tau}$.
Not all of these properties are preserved by taking weighted averages via an expectation over some members of this function class; however, right continuity, the maximum shift in the value at points of discontinuity, and the direction of the shift are preserved. The statement in the lemma succinctly formalizes these properties. □
The relation $f_c(s,t) = \frac{s+t}{t}\,f_c(0,s+t) - \frac{s}{t}\,f_c(0,s)$ shows that for every $s$,
$$\lim_{t\to\infty} f_c(s,t) = v_c.$$
Using this, for a fixed $\varepsilon$, inductively define sequences
$$s_k^c = \sum_{m=0}^{k-1} t_m^{*c}$$
$$t_k^{*c} = \max\left\{1,\ \inf\left\{t^* \in \mathbb{R} : \forall t \geq t^*,\ \left|f_c(s_k^c,t) - v_c\right| < \varepsilon\right\}\right\}.$$
These sequences have the identical basic properties given in Lemma A3. The next Lemma is a modified version of Lemma A4.
Lemma A11.
If $t_k^{*c} > 1$, then
$$\left|f_c(s_k^c, t_k^{*c}) - v_c\right| \geq \varepsilon - \frac{c_i}{t_k^{*c}}.$$
Proof. 
If $t_k^{*c} > 1$, then $t_k^{*c} = \inf\left\{t^* \in \mathbb{R} : \forall t \geq t^*,\ |f_c(s_k^c,t) - v_c| < \varepsilon\right\}$. This definition implies $\lim_{t\to t_k^{*c-}} |f_c(s_k^c,t) - v_c| \geq \varepsilon$. Then, from Lemma A10, we must have $\lim_{t\to t_k^{*c-}} f_c(s_k^c,t) \leq f_c(s_k^c, t_k^{*c}) \leq \lim_{t\to t_k^{*c-}} f_c(s_k^c,t) + \frac{c_i}{t_k^{*c}}$. Therefore, $|f_c(s_k^c, t_k^{*c}) - v_c| \geq \varepsilon - \frac{c_i}{t_k^{*c}}$. □
The next Lemma is more or less identical in its statement to that of Lemma A5, but it requires a new proof due to the different lower bound, which was derived in Lemma A11.
Lemma A12.
The newly constructed $s_k^c$ and $t_k^{*c}$ must satisfy the following constraint:
$$\lim_{k\to\infty} \frac{t_k^{*c}}{s_k^c + s_{k+1}^c} = 0.$$
Proof. 
We only need to modify the previous proof slightly. For the same reasons as before, if $\lim_{k\to\infty} \frac{t_k^{*c}}{s_k^c+s_{k+1}^c} \neq 0$, there must be infinitely many $m$ satisfying $t_m^{*c} \geq d\cdot(s_m^c+s_{m+1}^c)$ for a fixed constant $d$. We have a revised lower bound of the form $|f_c(s_m^c, t_m^{*c}) - v_c| \geq \varepsilon - \frac{c_i}{t_m^{*c}}$, and since there are infinitely many $m$ with $t_m^{*c} \geq \max\left\{1,\ \frac{c_i}{\varepsilon^2},\ d\cdot(s_m^c+s_{m+1}^c)\right\}$, the lower bound
$$\left|f_c(s_m^c, t_m^{*c}) - v_c\right| \geq \varepsilon(1-\varepsilon)$$
holds for infinitely many $m$. The upper bound was chosen arbitrarily using the limit definition to make the algebra simple, so we can now take $m$ large enough so that
$$\left|f_c(0,s_m^c) - v_c\right| < d\,\varepsilon(1-\varepsilon).$$
We may thus express $f_c(0,s_{m+1}^c) - v_c$ as the sum of two terms, one of which has absolute value at most
$$\frac{s_m^c}{s_{m+1}^c}\cdot d\,\varepsilon(1-\varepsilon),$$
and the other of which has absolute value at least
$$\frac{t_m^{*c}}{s_{m+1}^c}\cdot\varepsilon(1-\varepsilon).$$
Summing these will give a term having absolute value at least $d\,\varepsilon(1-\varepsilon)$, and having infinitely many $m$ satisfying $|f_c(0,s_{m+1}^c) - v_c| \geq D$ for the same fixed constant $D = d\,\varepsilon(1-\varepsilon)$ contradicts our main assumption that $\lim_{t\to\infty} f_c(0,t) = v_c$. □
The remainder of the proof steps, with one small exception, can be carried out exactly as before using the following new notations.
$$g_c(s,t) = \frac{\mathbb{E}\left[\sum_{\tau \in t_i,\ s < \tau \leq s+t} c_i\,D_i^c(\tau)\right]}{\int_{\tau=s}^{s+t} D_i^c(\tau)\,d\tau},$$
$$a_c(s,t) = \mathbb{E}\left[\sum_{\tau \in t_i,\ s < \tau \leq s+t} c_i\,D_i^c(\tau)\right],$$
$$b_c(s,t) = \int_{\tau=s}^{s+t} D_i^c(\tau)\,d\tau,$$
$$a_k^c = a_c(s_k^c, t_k^{*c}),$$
$$b_k^c = b_c(s_k^c, t_k^{*c}).$$
The remaining exception is that in the proof of Lemma A9, we wrote two inequalities containing and assuming the relation
$$a(s_{m_t},\, t-s_{m_t}) \leq b(s_{m_t},\, t-s_{m_t}).$$
This inequality is true for the gains but not necessarily for the costs, since it is possible for the flip costs to accrue early on in the interval that starts at $s_{m_t}$. However, the only property we required was much weaker, namely that for $m_t \geq m^*$, we have
$$\frac{a_c(s_{m_t}^c,\, t-s_{m_t}^c)}{b_c(0,s_{m_t}^c)} < \varepsilon.$$
And for this we could use any number of relations, which do not even depend on $v_c \leq 1$—for example,
$$a_c(s_{m_t}^c,\, t-s_{m_t}^c) \leq a_{m_t}^c < b_{m_t}^c\cdot(v_c+2\varepsilon)$$
together with
$$b_k^c = o\left(b_c(0,s_k^c)\right).$$
Therefore, with these specific modifications to the lemmas and their proofs, we can perform the substitution
$$f(\cdot),\ s_k,\ t_k^*,\ g(\cdot),\ a(\cdot),\ b(\cdot),\ a_k,\ b_k \ \longrightarrow\ f_c(\cdot),\ s_k^c,\ t_k^{*c},\ g_c(\cdot),\ a_c(\cdot),\ b_c(\cdot),\ a_k^c,\ b_k^c,$$
and the modified proof for gains shows that we have the desired result for costs also, namely that
$$\lim_{t\to\infty} g_c(0,t) = v_c.$$

Appendix B. Player Gains

Appendix B.1. Structural Aspects

This subsection derives closer-to-analytical expressions for player gains when discounting super-hyperbolically and introduces precise definitions for the defender advantage and player anonymous gain. It presents formulae for player gains in terms of these concepts.
The fact that player i's calculated resource valuation is finite, together with the restriction to strategies that allow us to replace $\limsup$ with limit, allows us to obtain a much closer-to-analytic expression for the gain of player i (starting from Equation (9)):
$$G_i = \limsup_{t\to\infty} \frac{\mathbb{E}\left[\int_{\tau=0}^{t} \mathrm{PC}_i(\tau)\,D_i(\tau)\,d\tau\right]}{\int_{\tau=0}^{t} D_i(\tau)\,d\tau} = \lim_{t\to\infty} \mathbb{E}\left[\int_{\tau=0}^{t} \frac{\beta_i - \alpha_i}{(1+\alpha_i\tau)^{\beta_i/\alpha_i}}\,\mathrm{PC}_i(\tau)\,d\tau\right],$$
where $\mathrm{PC}_i$ is the player control indicator function from Equation (6) that encodes the game's temporal outcomes, and expectations are over $\mathrm{PC}_i$.
At this point, it becomes useful to separate the expression for gain according to two time periods—the time before the first flip of the game, and all the time afterward. This allows us to work with two quantities, the first of which has a direct analytic representation, and the second of which has a representing formula that is uniform with respect to the identity of i (A or D):
$$G_i = \mathbb{E}\left[\int_{\tau=0}^{t_0} \frac{\beta_i - \alpha_i}{(1+\alpha_i\tau)^{\beta_i/\alpha_i}}\,d\tau\right]\cdot \mathbb{1}_{i=D} + \lim_{t\to\infty}\mathbb{E}\left[\int_{\tau=t_0}^{t} \frac{\beta_i - \alpha_i}{(1+\alpha_i\tau)^{\beta_i/\alpha_i}}\,\mathrm{PC}_i(\tau)\,d\tau\right].$$
We refer to the first term of this sum, $\bar{D}_i$, as the defender advantage because its value only accrues toward the defender's utility. We refer to the second term of the sum, $\bar{G}_i$, as the anonymous gain, because the expression is uniform in the identity of $i$ (which is not true of the gain as a whole).
Evaluating the defender advantage does not require structural knowledge of the strategy configuration and can be derived by calculus. The term evaluates to
$$\bar{D}_i = \mathbb{1}_{i=D}\left(1 - \mathbb{E}\left[\left(\frac{1}{1+\alpha_i t_0}\right)^{\frac{\beta_i-\alpha_i}{\alpha_i}}\right]\right),$$
which reduces to
$$\bar{D}_D = 1 - \mathbb{E}\left[\left(\frac{1}{1+\alpha_D t_0}\right)^{\frac{\beta_D-\alpha_D}{\alpha_D}}\right] \quad\text{and}\quad \bar{D}_A = 0.$$
Evaluating the anonymous gain requires structural knowledge of the strategies being employed by each player, and this expression is a prominent subject of our investigations in the following subsections. For reference, the anonymous gain transcribes directly to
$$\bar{G}_i = \lim_{t\to\infty}\mathbb{E}\left[\int_{\tau=t_0}^{t} \frac{\beta_i-\alpha_i}{(1+\alpha_i\tau)^{\beta_i/\alpha_i}}\,\mathrm{PC}_i(\tau)\,d\tau\right].$$
Because $G_i = \bar{D}_i + \bar{G}_i$, we can obtain what we need about the gain of players by focusing our attention on the defender advantage and the anonymous gain. Moreover, by extending these two notations only slightly, we may take advantage of additional symmetries of the game formulation. Formally, we define
$$\bar{D}_{i,j} = \mathbb{1}_{i=D}\left(1 - \mathbb{E}\left[\left(\frac{1}{1+\alpha_j t_0}\right)^{\frac{\beta_j-\alpha_j}{\alpha_j}}\right]\right), \quad\text{and}$$
$$\bar{G}_{i,j} = \lim_{t\to\infty}\mathbb{E}\left[\int_{\tau=t_0}^{t} \frac{\beta_j-\alpha_j}{(1+\alpha_j\tau)^{\beta_j/\alpha_j}}\,\mathrm{PC}_i(\tau)\,d\tau\right].$$
The first index $i$ of the notation identifies the player whose action strategy is being applied, while the second index $j$ corresponds to the player whose discounting method is being applied. This extended notation subsumes the original via $\bar{D}_i = \bar{D}_{i,i}$ and $\bar{G}_i = \bar{G}_{i,i}$. It also allows us to compute some terms from others even when the two players apply different discounting implementations. Such simplifications rely on the normalized gains of two competing players summing to one if the players apply the same discounting parameters. When players discount at different rates, this symmetry is broken, but its usefulness can still be recovered by using the following equation, which uses our extended notation:
$$\bar{D}_{D,j} + \bar{G}_{D,j} + \bar{G}_{A,j} = 1.$$
This works because the discounting of player $j$ is being applied equally to three types of game outcomes—the time before $t_0$, the time when the defender controls after $t_0$, and the time when the attacker controls after $t_0$—which together subsume all possible outcomes.
Finally, we may also use the property of finite total resource valuation to obtain a closer-to-analytic expression for the costs of player i, starting from Equation (13):
$$C_i = \lim_{t\to\infty} \frac{\mathbb{E}\left[\sum_{\tau \in t_i,\ \tau \leq t} c_i\,D_i^c(\tau)\right]}{\int_{\tau=0}^{t} D_i(\tau)\,d\tau} = \lim_{t\to\infty}\mathbb{E}\left[\sum_{\tau \in t_i,\ \tau \leq t} \frac{c_i\,(\beta_i-\alpha_i)}{(1+\alpha_i^c\tau)^{\beta_i^c/\alpha_i^c}}\right].$$
Throughout the remainder of this section and the following sections, we omit the subscripts of the discounting parameters $\alpha$ and $\beta$; which subscript applies should be clear from the context.

Appendix B.2. Exponential Play

We start with the derivation of anonymous player gains for exponential play. We can obtain the defender’s and the attacker’s gain from these expressions because players’ gains with equal discounting parameters sum to one, as is expressed by Equation (A43).
To obtain the anonymous player gain of player i, we derive a formula for the fraction of the total anonymous gain she obtains and a formula for the total anonymous gain.
Lemma A13.
For exponential play, each player i obtains a fraction of the total anonymous gain equal to:
$$\frac{\nu_i}{\nu_i+\nu_j}.$$
Proof. 
At any moment in time, the probability that player $i$ is the next player to move is equal to
$$p_i = \int_{\tau_i=0}^{+\infty} \nu_i e^{-\nu_i\tau_i} \int_{\tau_j=\tau_i}^{+\infty} \nu_j e^{-\nu_j\tau_j}\,d\tau_j\,d\tau_i = \frac{\nu_i}{\nu_i+\nu_j}.$$
Probability $p_i$ is, therefore, also equal to the probability that any flip by either player after time $t_0$ is made by player $i$.
Consider the set of all intervals between flips. For each such interval, with probability $p_i$, player $i$ is the player who receives gain over the entire interval. With probability $1-p_i$, she receives nothing. Therefore, her expected gain over each interval is $p_i$ times the value of the interval. By linearity of expectation, her total expected gain over this set of intervals is then $p_i$ times the total combined value of the intervals. □
Lemma A14.
For exponential play, the time of the first move is exponentially distributed with rate parameter $\nu_i + \nu_j$.
Proof. 
Let $X_i$ be the time of $i$'s first flip, and define the random variable
$$Z = \min\{X_i, X_j\}$$
as the time until the first flip by either player. Then
$$F_Z(z) = \Pr[Z \leq z] = \Pr[X_i \leq z \text{ or } X_j \leq z] = 1 - \Pr[X_i > z \text{ and } X_j > z] = 1 - \Pr[X_i > z]\cdot\Pr[X_j > z]$$
$$= 1 - \left(1 - (1 - e^{-\nu_i z})\right)\left(1 - (1 - e^{-\nu_j z})\right) = 1 - e^{-(\nu_i+\nu_j)z},$$
that is, $Z$ is distributed exponentially with rate parameter $\nu_i + \nu_j$. □
Lemma A15 (Anonymous gains for exponential play).
If both players are playing exponentially, then player i’s anonymous gain is given by:
$$\bar{G}_i = \frac{\nu_i}{\nu_i+\nu_j}\cdot f\!\left(\frac{\beta_i-\alpha_i}{\alpha_i},\ \frac{\nu_i+\nu_j}{\alpha_i}\right).$$
Proof. 
From Lemma A13, it follows that proving this theorem means showing that the total anonymous gain is given by
$$\bar{G} = f\!\left(\frac{\beta-\alpha}{\alpha},\ \frac{\nu_i+\nu_j}{\alpha}\right).$$
From Lemma A14, we know that the time of the first move is exponentially distributed with rate parameter ν i + ν j . The total anonymous gain is defined as the expected total gain obtained after the first move by either player:
$$\bar{G} = (\nu_i+\nu_j)(\beta-\alpha)\int_{t=0}^{+\infty} e^{-(\nu_i+\nu_j)t} \int_{\tau=t}^{+\infty} D_i(\tau)\,d\tau\,dt = (\nu_i+\nu_j)\int_{t=0}^{+\infty} e^{-(\nu_i+\nu_j)t}\,(1+t\alpha)^{-\left(\frac{\beta}{\alpha}-1\right)}\,dt.$$
A change of the integration variable to $s = 1+t\alpha$ and a couple of computation steps then yield Equation (A46):
$$\bar{G} = \frac{\nu_i+\nu_j}{\alpha}\, e^{\frac{\nu_i+\nu_j}{\alpha}} \int_{s=1}^{+\infty} e^{-\frac{\nu_i+\nu_j}{\alpha}s}\, s^{-\left(\frac{\beta}{\alpha}-1\right)}\,ds = \frac{\nu_i+\nu_j}{\alpha}\, e^{\frac{\nu_i+\nu_j}{\alpha}}\, E_{\frac{\beta}{\alpha}-1}\!\left(\frac{\nu_i+\nu_j}{\alpha}\right). \qquad \square$$
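As a numeric sanity check of this derivation (a sketch with arbitrary parameter values; mpmath's expint provides the generalized exponential integral $E_n$), the closed form can be compared against direct quadrature of the preceding integral:

```python
from mpmath import mp, mpf, exp, expint, quad, inf

mp.dps = 30
alpha, beta = mpf("0.5"), mpf("2.0")   # super-hyperbolic: beta > alpha
nu = mpf("1.3")                        # combined play rate nu_i + nu_j

# Direct quadrature of the total anonymous gain
direct = nu * quad(lambda t: exp(-nu * t) * (1 + alpha * t) ** (-(beta / alpha - 1)),
                   [0, inf])

# Closed form: (nu/alpha) * e^(nu/alpha) * E_{beta/alpha - 1}(nu/alpha)
x = nu / alpha
closed = (nu / alpha) * exp(x) * expint(beta / alpha - 1, x)
print(direct, closed)                  # agree to working precision
```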

Appendix B.3. Periodic Play

It seems reasonable to derive player gains for periodic play in the same way as for exponential play by deriving a formula for the total anonymous gain and a formula for the fraction of the gain assigned to each player. We can readily derive an expression for the total anonymous gain for periodic play in the same way as for exponential play9, but deriving the expression for the fraction turns out to be tricky.
We can divide time into intervals as in Lemma A13, and as for exponential play, the probability of a random interval being in control of a particular player remains constant at all times after the first flip.10 However, unlike for exponential play, an interval’s expected duration, and consequently expected valuation, is not independent of who controls it. Specifically, for periodic play, the faster-moving player can expect to remain in control of the resource for a longer time than the slower player.11 Compared to exponential play, players are also less likely to flip a resource already in their control (sometimes called a “flop”).
Therefore, the expression for i’s anonymous gain depends on whether i is the faster or the slower player. Dealing with this dependency in a proof similar to that of Lemma A13 is tricky, so we instead derive the anonymous gain of the slower and the faster player separately in Lemmas A16 and A17. By re-writing the results of these lemmas using the helper function for periodic play, we obtain Lemma 3. Throughout this section, we use the subscripts f and s to refer to the faster and the slower player.
Lemma A16 (Anonymous gain of the slower player).
For periodic play, the anonymous gain of the slower player is:
$$\bar{G}_s = \nu_s\,\frac{(\beta - 3\alpha - \nu_f) + \nu_f\left(\frac{\nu_f}{\alpha+\nu_f}\right)^{\frac{\beta-3\alpha}{\alpha}}}{(\beta-3\alpha)(\beta-2\alpha)}.$$
Proof. 
If a player controls the resource from time $t$ to time $T > t$, she obtains a normalized gain of
$$V(t,T) = (\beta-\alpha)\int_{\tau=t}^{T} D(\tau)\,d\tau = \left(\frac{1}{1+t\alpha}\right)^{\frac{\beta-\alpha}{\alpha}} - \left(\frac{1}{1+T\alpha}\right)^{\frac{\beta-\alpha}{\alpha}}.$$
If the slower player makes a move at time $t$, she remains in control of the resource until the faster player takes control. The faster player takes control at a time $T$ uniformly distributed between $t$ and $t + \frac{1}{\nu_f}$. This leaves the slower player an expected gain of:
$$V_s(t) = \nu_f\int_{T=t}^{t+\frac{1}{\nu_f}} V(t,T)\,dT = \frac{\nu_f}{\beta-2\alpha}\left(\left(\frac{\nu_f}{\alpha+\nu_f+t\alpha\nu_f}\right)^{\frac{\beta-2\alpha}{\alpha}} - \left(\frac{1}{1+t\alpha}\right)^{\frac{\beta-2\alpha}{\alpha}}\right) + \left(\frac{1}{1+t\alpha}\right)^{\frac{\beta-\alpha}{\alpha}}.$$
The probability density of the slower player to flip is $\nu_s$ at any point in time. We can, therefore, compute her anonymous gain as:
$$\bar{G}_s = \nu_s\int_{t=0}^{+\infty} V_s(t)\,dt.$$
Evaluation of Equation (A50) concludes the proof. □
Lemma A17 (Anonymous gain of the faster player).
For periodic play, the anonymous gain of the faster player is:
$$\bar{G}_f = \frac{\nu_f(\beta-3\alpha-\nu_s) + \nu_s\nu_f\left(\frac{\nu_f}{\alpha+\nu_f}\right)^{\frac{\beta-2\alpha}{\alpha}}}{(\beta-3\alpha)(\beta-2\alpha)} + \left(\frac{\nu_s}{\beta-3\alpha} - \frac{\nu_f}{\beta-2\alpha}\right)\left(\frac{\nu_f}{\alpha+\nu_f}\right)^{\frac{\beta-2\alpha}{\alpha}}.$$
Proof. 
If the faster player moves at time $t$, she remains in control of the resource either until she makes another move at time $t+\frac{1}{\nu_f}$ or until the slower player moves. She controls the entire interval with probability $\frac{\nu_f-\nu_s}{\nu_f}$. This leaves the faster player an expected gain of
$$V_f(t) = \frac{\nu_f-\nu_s}{\nu_f}\cdot V\!\left(t,\ t+\frac{1}{\nu_f}\right) + \nu_s\int_{T=t}^{t+\frac{1}{\nu_f}} V(t,T)\,dT.$$
To evaluate Equation (A51), we just need to combine Equations (A48) and (A49). The probability density of the faster player flipping is ν_f at any time. We can therefore compute her anonymous gain as:
\bar{G}_f = \nu_f\int_{t=0}^{+\infty} V_f(t)\,dt.
Evaluation of Equation (A52) concludes the proof. □
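Before moving on, the two closed forms can be cross-checked against the direct expression for the total anonymous gain given later in Note 9: the gains of the slower and the faster player must sum to it. A small numerical check (our addition, with test values chosen so that β ≠ 2α and β ≠ 3α):

```python
# Cross-check: G_s + G_f (Lemmas A16 and A17) equals the direct integral of Note 9.
from mpmath import mp, mpf, quad

mp.dps = 30
alpha, beta = mpf(1), mpf(4)        # test values with beta != 2*alpha, 3*alpha
nu_s, nu_f = mpf(1), mpf(3)         # slower and faster play rates

w = nu_f / (alpha + nu_f)
G_s = nu_s * ((beta - 3*alpha - nu_f)
              + nu_f * w ** ((beta - 3*alpha) / alpha)) \
      / ((beta - 3*alpha) * (beta - 2*alpha))
G_f = (nu_f * (beta - 3*alpha - nu_s)
       + nu_s * nu_f * w ** ((beta - 2*alpha) / alpha)) \
      / ((beta - 3*alpha) * (beta - 2*alpha)) \
      + (nu_s / (beta - 3*alpha) - nu_f / (beta - 2*alpha)) \
        * w ** ((beta - 2*alpha) / alpha)

# Direct expression: integrate the PDF of the first move against future value.
total = quad(lambda t: (nu_f + nu_s - 2*t*nu_f*nu_s)
             * (1 + t*alpha) ** (-(beta - alpha) / alpha), [0, 1/nu_f])

assert abs((G_s + G_f) - total) < mpf("1e-20")
```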
Throughout this section, we implicitly assumed that β ≠ 2α and β ≠ 3α. If β does happen to be one of these integer multiples of α, the formulae for \bar{G}_f and \bar{G}_s presented above are undefined. Luckily, the gaps in the functions' domains are removable singularities. The limits of \bar{G}_f and \bar{G}_s at these points are defined, and if we define a generalized periodic utility function as
\begin{cases} \lim_{\beta\to2\alpha} \bar{G}_i & \text{if } \beta = 2\alpha \\ \lim_{\beta\to3\alpha} \bar{G}_i & \text{if } \beta = 3\alpha \\ \bar{G}_i & \text{otherwise,} \end{cases}
then the resulting function can be evaluated everywhere and is regular around the points β = 2α and β = 3α. These limits are given by:
\lim_{\beta\to2\alpha} \bar{G}_f = \frac{1}{\alpha^2}\left(\nu_f(\alpha + \nu_s)\ln\!\left(1 + \frac{\alpha}{\nu_f}\right) - \nu_s\alpha\right)
\lim_{\beta\to2\alpha} \bar{G}_s = \frac{1}{\alpha^2}\left(\nu_s(\alpha + \nu_f)\ln\!\left(1 + \frac{\alpha}{\nu_f}\right) - \nu_s\alpha\right)
\lim_{\beta\to3\alpha} \bar{G}_f = \frac{\nu_f}{\alpha}\left(\frac{\alpha + \nu_s}{\alpha + \nu_f} - \frac{\nu_s}{\alpha}\ln\!\left(1 + \frac{\alpha}{\nu_f}\right)\right)
\lim_{\beta\to3\alpha} \bar{G}_s = \frac{\nu_s}{\alpha}\left(1 - \frac{\nu_f}{\alpha}\ln\!\left(1 + \frac{\alpha}{\nu_f}\right)\right).
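The removable-singularity claim can be probed numerically. The sketch below (ours) evaluates \bar{G}_s just off β = 2α and confirms convergence to the stated limit; the other three limits can be checked in the same way.

```python
# The singularity of G_s at beta = 2*alpha is removable: values just off the
# singular point approach the closed-form limit.
from mpmath import mp, mpf, log

mp.dps = 40
alpha, nu_f, nu_s = mpf(1), mpf(2), mpf(1)

def G_s(beta):
    w = nu_f / (alpha + nu_f)
    return nu_s * ((beta - 3*alpha - nu_f)
                   + nu_f * w ** ((beta - 3*alpha) / alpha)) \
           / ((beta - 3*alpha) * (beta - 2*alpha))

limit = (nu_s * (alpha + nu_f) * log(1 + alpha/nu_f) - nu_s * alpha) / alpha**2

for eps in (mpf("1e-4"), mpf("1e-6"), mpf("1e-8")):
    assert abs(G_s(2*alpha + eps) - limit) < 100 * eps
```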

Appendix C. Player Incentives

Appendix C.1. Expressions

We compute player incentives as the derivatives of the player utilities derived in Appendix B.

Appendix C.1.1. Exponential Play

By making the substitutions x_i = (ν_D + ν_A)/α_i and r_i = β_i/α_i, we can express the players' incentives for exponential play as:
\frac{\partial u_D}{\partial \nu_D} = e^{x_D}\,\frac{\nu_A}{\alpha_D^2}\left(E_{r_D-2}(x_D) - E_{r_D-1}(x_D)\right) - c_D
\frac{\partial u_A}{\partial \nu_A} = \frac{e^{x_A}}{\alpha_A}E_{r_A-1}(x_A) - e^{x_A}\,\frac{\nu_A}{\alpha_A^2}\left(E_{r_A-2}(x_A) - E_{r_A-1}(x_A)\right) - c_A
Defender and attacker incentives are also related. Assuming equal discounting parameters ( x_D = x_A = x, r_D = r_A = r ), we have:
\frac{\partial u_A}{\partial \nu_A} + c_A = \frac{e^{x}}{\alpha}E_{r-1}(x) - \left(\frac{\partial u_D}{\partial \nu_D} + c_D\right).
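The stated relation follows directly from the two incentive expressions above; the snippet below (ours) verifies it numerically for one arbitrary choice of equal discounting parameters, using mpmath's expint for E_r(x).

```python
# Numerical check of the incentive relation for equal discounting parameters.
from mpmath import mp, mpf, exp, expint

mp.dps = 30
alpha, beta = mpf("0.5"), mpf("1.2")
nu_D, nu_A = mpf("0.8"), mpf("0.6")
c_D, c_A = mpf("0.3"), mpf("0.2")

x = (nu_D + nu_A) / alpha
r = beta / alpha

dU_D = exp(x) * nu_A / alpha**2 * (expint(r - 2, x) - expint(r - 1, x)) - c_D
dU_A = (exp(x) / alpha * expint(r - 1, x)
        - exp(x) * nu_A / alpha**2 * (expint(r - 2, x) - expint(r - 1, x))
        - c_A)

lhs = dU_A + c_A
rhs = exp(x) / alpha * expint(r - 1, x) - (dU_D + c_D)
assert abs(lhs - rhs) < mpf("1e-25")
```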

Appendix C.1.2. Periodic Play

The player incentives when the defender is the faster player are:
\frac{\partial u_D}{\partial \nu_D}\bigg|_{\nu_D \ge \nu_A} = -c_D - \nu_A \cdot h_D'(\nu_D)
\frac{\partial u_A}{\partial \nu_A}\bigg|_{\nu_D \ge \nu_A} = -c_A + h_A(\nu_D) - \frac{1}{2\alpha_A - \beta_A}.
The player incentives when the defender is the slower player are:
\frac{\partial u_D}{\partial \nu_D}\bigg|_{\nu_D \le \nu_A} = -c_D - \nu_A \cdot h_D'(\nu_A)
\frac{\partial u_A}{\partial \nu_A}\bigg|_{\nu_D \le \nu_A} = -c_A + h_A(\nu_A) - \frac{1}{2\alpha_A - \beta_A} + (\nu_D - \nu_A)\left(h_A'(\nu_A) + \nu_A h_A''(\nu_A)\right).
Note that the formulae for slower and faster defender play yield the same value when both players play at the same rate.

Appendix C.2. Direction of Incentive

Appendix C.2.1. Exponential Play

Lemma A18 (Direction of incentive for exponential play).
For exponential play, players’ incentives are strictly decreasing in their play rates.
Proof. 
We show this separately for defender and attacker in Lemmas A19 and A20. □
Lemma A19.
For exponential play and β_D > α_D, the defender's incentive is strictly decreasing in her play rate.
Proof. 
Using the identity
-E_{r-1}(x) = \frac{dE_r(x)}{dx},
we can derive the rate of change of the defender's incentive with respect to her play rate:
\frac{\partial^2 u_D}{\partial \nu_D^2} = \frac{e^{x}\,\nu_A}{\alpha_D^3}\left(-E_{r-3}(x) + 2E_{r-2}(x) - E_{r-1}(x)\right),
where r = β_D/α_D and x = (ν_A + ν_D)/α_D. The sign of Equation (A66) is equal to the sign of:
-E_{r-3}(x) + 2E_{r-2}(x) - E_{r-1}(x).
Equation (A67) evaluates to a negative value, since E_{r−2}(x) is less than the average of E_{r−1}(x) and E_{r−3}(x):
  • The exponential integral function E_r(x) is positive: E_{r−3}(x) > 0, E_{r−2}(x) > 0 and E_{r−1}(x) > 0.
  • The exponential integral function E_r(x) is decreasing in its order r; that is, the partial derivative of E_r with respect to r is negative, and E_{r−3}(x) > E_{r−2}(x) > E_{r−1}(x).
  • The rate at which the decrease happens is itself decreasing in the order r; that is, the second-order partial derivative of E_r with respect to r is positive, and consequently E_{r−3}(x) − E_{r−2}(x) > E_{r−2}(x) − E_{r−1}(x).
The first two statements can be verified by inspecting the definition of the exponential integral function. The third statement follows from the identity (see Note 12):
\frac{\partial^j E_r(x)}{\partial r^j} = (-1)^j \int_1^{+\infty} (\ln t)^j\, t^{-r} e^{-xt}\, dt. □
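The identity can be spot-checked numerically for j = 1; the sketch below (ours) compares mpmath's numerical derivative of E_r(x) in the order r against the integral representation.

```python
# Check d/dr E_r(x) against the integral representation (j = 1 case).
from mpmath import mp, mpf, expint, quad, log, exp, inf, diff

mp.dps = 30
r, x = mpf("2.5"), mpf("1.3")

dE_dr = diff(lambda p: expint(p, x), r)              # numerical d/dr E_r(x)
integral = -quad(lambda t: log(t) * t**(-r) * exp(-x*t), [1, inf])

assert abs(dE_dr - integral) < mpf("1e-15")
```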
Lemma A20.
For exponential play and β_A > α_A, the attacker's incentive is strictly decreasing in her play rate.
Proof. 
Using Equation (A65), we can show that the derivative of the attacker's incentive function with respect to her play rate is
\frac{\partial^2 u_A}{\partial \nu_A^2} = \frac{e^{x}}{\alpha_A^3}\left(\nu_A E_{r-3}(x) - 2(\alpha_A + \nu_A)E_{r-2}(x) + (2\alpha_A + \nu_A)E_{r-1}(x)\right),
where x = (ν_A + ν_D)/α_A and r = β_A/α_A. The sign of Equation (A68) is the same as the sign of:
\nu_A E_{r-3}(x) - 2\left(\frac{\nu_A + \nu_D}{x} + \nu_A\right)E_{r-2}(x) + \left(2\,\frac{\nu_A + \nu_D}{x} + \nu_A\right)E_{r-1}(x),
in which we substituted α_A by (ν_A + ν_D)/x.
Since E_r(x) is decreasing in r, the terms in ν_D only make Equation (A69) smaller, so a sufficient condition for Equation (A69) to always be negative is that it is negative for ν_D = 0; i.e., if we can show that
0 > \nu_A E_{r-3}(x) - 2\left(\frac{\nu_A}{x} + \nu_A\right)E_{r-2}(x) + \left(2\,\frac{\nu_A}{x} + \nu_A\right)E_{r-1}(x),
then Equations (A69) and (A68) are always negative and the attacker's incentive is always decreasing in her play rate. Dividing by ν_A(E_{r−2}(x) − E_{r−1}(x)) > 0 and multiplying by x, we can rewrite Equation (A70) as:
2 > \left(\frac{E_{r-3}(x) - E_{r-2}(x)}{E_{r-2}(x) - E_{r-1}(x)} - 1\right)x.
We can show that for r > 1, the right-hand side is strictly increasing in x and that
\lim_{x\to+\infty}\left(\frac{E_{r-3}(x) - E_{r-2}(x)}{E_{r-2}(x) - E_{r-1}(x)} - 1\right)x = 2. □
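Numerically, the right-hand side indeed climbs toward 2 from below; a small illustration (ours, with an arbitrary order r > 1):

```python
# The right-hand side increases in x and stays strictly below its limit 2.
from mpmath import mp, mpf, expint

mp.dps = 30
r = mpf("3.5")

def rhs(x):
    x = mpf(x)
    ratio = (expint(r - 3, x) - expint(r - 2, x)) \
            / (expint(r - 2, x) - expint(r - 1, x))
    return (ratio - 1) * x

values = [rhs(x) for x in (0.5, 1, 2, 5, 10, 50, 200)]
assert all(a < b for a, b in zip(values, values[1:]))   # increasing in x
assert all(v < 2 for v in values)                       # bounded above by 2
```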

Appendix C.2.2. Periodic Play

As the utilities and incentives for periodic play are expressed in terms of the helper function for periodic play, we start by looking into the direction and curvature of the helper function for periodic play.
Lemma A21.
The helper function for periodic play, h(ν), is strictly convex and has a strictly downward direction; that is, for all ν > 0:
h'(\nu) = \frac{dh(\nu)}{d\nu} < 0 \qquad h''(\nu) = \frac{d^2 h(\nu)}{d\nu^2} > 0.
Proof. 
Compute the second derivative:
h''(\nu) = \frac{1}{\nu^3}\left(1 + \frac{\alpha}{\nu}\right)^{\frac{\alpha-\beta}{\alpha}}.
This expression is clearly strictly positive for all ν > 0.
Compute the first derivative:
h'(\nu) = \frac{\nu + (2\alpha - \beta - \nu)\left(1 + \frac{\alpha}{\nu}\right)^{\frac{2\alpha-\beta}{\alpha}}}{\nu(\beta - 2\alpha)(3\alpha - \beta)}.
We can see that h′(ν) tends to zero as ν → +∞. Since h′ is strictly increasing (h″ > 0), this implies that h′ is strictly negative. □
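As a consistency check (ours), one can differentiate the expression for h′ numerically and compare it with the closed form for h″, while also confirming the claimed signs:

```python
# h'' should be the derivative of h', with h' < 0 < h'' on the positive axis.
from mpmath import mp, mpf, diff

mp.dps = 30
alpha, beta = mpf("0.5"), mpf("1.2")

def h1(nu):   # closed form for h'(nu)
    w = 1 + alpha / nu
    return (nu + (2*alpha - beta - nu) * w ** ((2*alpha - beta) / alpha)) \
           / (nu * (beta - 2*alpha) * (3*alpha - beta))

def h2(nu):   # closed form for h''(nu)
    return (1 + alpha/nu) ** ((alpha - beta) / alpha) / nu**3

for nu in (mpf("0.3"), mpf(1), mpf(4)):
    assert abs(diff(h1, nu) - h2(nu)) < mpf("1e-15")
    assert h1(nu) < 0 < h2(nu)
```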
Lemma A22 (Direction of incentive for periodic play).
For periodic play, the direction of a player’s incentive depends on whether she is the slower- or the faster-moving player:
  • If she is the slower-moving player, her incentive is independent of her play rate.
  • If she is the faster-moving player, her incentive is strictly decreasing in her play rate.
Proof. 
Inspection of Equations (A63) and (A62) shows that players’ incentives are independent of their play rates when they are the slow player.
To see that the defender's incentive is strictly decreasing in her play rate when she is the faster player, compute the rate of change of the defender's incentive for faster defender play with respect to ν_D:
\frac{\partial^2 u_D}{\partial \nu_D^2}\bigg|_{\nu_D \ge \nu_A} = -\nu_A \cdot h_D''(\nu_D).
Since h″_D(ν_D) is always strictly positive (Lemma A21), the defender's incentive is always strictly decreasing in ν_D.
To see that the attacker's incentive is strictly decreasing in her play rate when she is the faster player, compute the rate of change of the attacker's incentive for faster attacker play with respect to ν_A:
\frac{\partial^2 u_A}{\partial \nu_A^2}\bigg|_{\nu_D \le \nu_A} = \left(1 + \frac{\alpha}{\nu_A}\right)^{-\frac{\beta}{\alpha}} \frac{\alpha(\nu_A - 2\nu_D) - \nu_A\nu_D - \beta(\nu_A - \nu_D)}{\nu_A^4}.
This expression is always negative ( =_s denotes "has the same sign as"):
\frac{\partial^2 u_A}{\partial \nu_A^2}\bigg|_{\nu_D \le \nu_A} =_s \alpha(\nu_A - 2\nu_D) - \nu_A\nu_D - \beta(\nu_A - \nu_D) \le \alpha(\nu_A - \nu_D) - \nu_A\nu_D - \beta(\nu_A - \nu_D) = (\alpha - \beta)(\nu_A - \nu_D) - \nu_A\nu_D \le 0. □
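A brute-force grid check (ours) of the sign claim, sweeping rates and scales with β = 1.5α (any β > α behaves the same way):

```python
# The closed-form curvature is negative whenever nu_A >= nu_D > 0 and beta > alpha.
import itertools

def curvature(alpha, beta, nu_A, nu_D):
    core = alpha*(nu_A - 2*nu_D) - nu_A*nu_D - beta*(nu_A - nu_D)
    return (1 + alpha/nu_A) ** (-beta/alpha) * core / nu_A**4

grid = [0.25, 0.5, 1.0, 2.0, 4.0]
for alpha, nu_D, nu_A in itertools.product(grid, repeat=3):
    if nu_A >= nu_D:
        assert curvature(alpha, 1.5*alpha, nu_A, nu_D) < 0
```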

Appendix C.3. Base Incentives

We now derive some properties of the base incentive function, which is a player’s incentive function when not playing. Remember that a player’s base incentive function is a function of her opponent’s play rate.
Lemma A23 (Direction of base incentive).
For exponential and periodic play, we can state the following about the direction of the players’ base incentive functions:
  • The attacker's base incentive function is strictly decreasing.
  • If 2α_D ≥ β_D, then the defender's base incentive function is strictly decreasing for strictly positive attack rates ( ν_A ∈ ]0, +∞[ ).
  • If 2α_D < β_D, then the defender's base incentive function is first strictly increasing, then strictly decreasing.
Proof. 
For the exponential regime, we include only the proof for the statement for the attacker (Lemma A24). For the periodic regime, we prove the statement for the attacker (Lemma A25) and for the defender (Lemmas A26 and A27). □
Lemma A24.
For exponential play, the attacker’s base incentive function is strictly decreasing.
Proof. 
The attacker’s base incentive function is given by
-c_A + \frac{e^{\nu_D/\alpha_A}}{\alpha_A}\,E_{\frac{\beta_A-\alpha_A}{\alpha_A}}\!\left(\frac{\nu_D}{\alpha_A}\right),
the derivative of which is
\frac{e^{\nu_D/\alpha_A}}{\alpha_A^2}\left(E_{\frac{\beta_A-\alpha_A}{\alpha_A}}\!\left(\frac{\nu_D}{\alpha_A}\right) - E_{\frac{\beta_A-2\alpha_A}{\alpha_A}}\!\left(\frac{\nu_D}{\alpha_A}\right)\right),
which is strictly negative as the exponential integral function is strictly decreasing in its order. □
Lemma A25.
For periodic play, the attacker’s base incentive function is strictly decreasing.
Proof. 
We have
\frac{\partial^2 u_A}{\partial \nu_D \partial \nu_A}\bigg|_{\nu_A \le \nu_D} = \frac{\partial}{\partial \nu_D}\left(-c_A + h_A(\nu_D) - \frac{1}{2\alpha_A - \beta_A}\right) = h_A'(\nu_D),
which is strictly negative by Lemma A21. □
Lemma A26.
For periodic play and 2α_D ≥ β_D, the defender's base incentive function is strictly decreasing for strictly positive attack rates ( ν_A ∈ ]0, +∞[ ).
Proof. 
Compute the direction and curvature of the defender’s incentive for slower defender play as:
\frac{\partial^2 u_D}{\partial \nu_A \partial \nu_D}\bigg|_{\nu_D < \nu_A} = -h_D'(\nu_A) - \nu_A h_D''(\nu_A) = \frac{1 - \left(1 + \frac{\alpha}{\nu_A}\right)^{\frac{\alpha-\beta}{\alpha}}\frac{(\beta - 2\alpha)^2 + (\beta - \alpha)\nu_A + \nu_A^2}{\nu_A^2}}{(\beta - 2\alpha)(\beta - 3\alpha)}
\frac{\partial^3 u_D}{\partial \nu_A^2 \partial \nu_D}\bigg|_{\nu_D < \nu_A} = \frac{2\alpha - \beta + \nu_A}{\nu_A^4}\left(1 + \frac{\alpha}{\nu_A}\right)^{-\frac{\beta}{\alpha}}.
From Equation (A72), we can see that the sign of 2α − β + ν_A determines the curvature of the defender's incentive. Since 2α ≥ β, the defender's incentive is strictly convex for ν_A > 0. To see that the direction is strictly negative, we compute the limit of the direction for ν_A → +∞:
\lim_{\nu_A \to +\infty} \frac{\partial^2 u_D}{\partial \nu_A \partial \nu_D}\bigg|_{\nu_D < \nu_A} = \frac{1 - \lim_{\nu_A \to +\infty}\left(1 + \frac{\alpha}{\nu_A}\right)^{\frac{\alpha-\beta}{\alpha}}\frac{(\beta - 2\alpha)^2 + (\beta - \alpha)\nu_A + \nu_A^2}{\nu_A^2}}{(\beta - 2\alpha)(\beta - 3\alpha)} = \frac{1 - 1\cdot(0 + 0 + 1)}{(\beta - 2\alpha)(\beta - 3\alpha)} = 0.
Since the direction is strictly increasing (the incentive is strictly convex) and has a limiting value of zero, it is always strictly negative. □
Lemma A27.
For periodic play and 2α_D < β_D, the defender's base incentive function is first strictly increasing, then strictly decreasing.
Proof. 
Equations (A72) and (A73) show that the direction of the defender's incentive is negative (and increasing) for large enough ν_A. Because the curvature of the incentive changes sign only once and the direction has an asymptotic root at ν_A → +∞, the direction has at most one real root and changes sign at most once. The proof, therefore, reduces to showing that the defender's base incentive is strictly increasing for ν_A → 0.
Looking at Equation (A71), we see that this is equivalent to showing that
f = \left(1 + \frac{\alpha}{\nu_A}\right)^{\frac{\alpha-\beta}{\alpha}}\,\frac{(\beta - 2\alpha)^2 + (\beta - \alpha)\nu_A + \nu_A^2}{\nu_A^2}
is smaller than one if β > 3α and greater than one otherwise. We can show that if β > 3α, then lim_{ν_A→0} f = 0, and that if β < 3α, then f becomes unboundedly large for small ν_A. □
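The two small-ν_A regimes are easy to see numerically; the sketch below (ours) contrasts β = 4α (where f vanishes) with β = 2.5α (where f blows up):

```python
# f -> 0 as nu_A -> 0 when beta > 3*alpha; f grows without bound when
# 2*alpha < beta < 3*alpha.
from mpmath import mpf

def f(alpha, beta, nu_A):
    return (1 + alpha/nu_A) ** ((alpha - beta) / alpha) \
           * ((beta - 2*alpha)**2 + (beta - alpha)*nu_A + nu_A**2) / nu_A**2

alpha = mpf(1)
for nu_A in (mpf("1e-2"), mpf("1e-4"), mpf("1e-6")):
    assert f(alpha, mpf(4), nu_A) < f(alpha, mpf(4), 10*nu_A)          # shrinking
    assert f(alpha, mpf("2.5"), nu_A) > f(alpha, mpf("2.5"), 10*nu_A)  # growing
```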

Appendix C.4. Origin of Base Incentive

The following results can be derived using standard analytical techniques and properties of the exponential integral function. We state them without proof.
Lemma A28 (Origin of the defender’s base incentive).
For both exponential and periodic play, we can characterize the origin of the defender’s base incentive as follows:
  • It is equal to −c_D for ν_A = 0.
  • If 2α_D > β_D, then it becomes unboundedly large for ν_A → 0.
  • If 2α_D = β_D, then it is equal to 1/α_D − c_D for ν_A → 0.
  • If 2α_D < β_D, then it is equal to −c_D for ν_A → 0.
Lemma A29 (Origin of the attacker’s base incentive).
For both exponential and periodic play, we can characterize the origin of the attacker’s base incentive as follows:
  • If 2α_A ≥ β_A, then it becomes unboundedly large for ν_D → 0.
  • If 2α_A < β_A, then it is equal to 1/(β_A − 2α_A) − c_A for ν_D = 0.

Appendix D. A Renewal Strategy Beating the Periodic Strategy

Proof of Theorem 6.
We prove this by showing that for every defender and periodic strategy, a renewal strategy exists that outperforms it.
Define RPD(ν) as the renewal strategy induced by a Dirac point distribution at 1/ν, resulting in a periodic strategy with the phase fixed to be equal to the period and moving at times (1/ν, 2/ν, …). This strategy outperforms the periodic strategy with rate parameter ν for the defender.
To keep the math simple, assume that α_D → 0; that is, assume that the defender discounts by the exponential function e^{−β_D t}. The cost of playing the renewal strategy RPD is then:
\beta_D \sum_{i=1}^{+\infty} c_D\, e^{-\beta_D i/\nu_D} = \frac{c_D \beta_D}{e^{\beta_D/\nu_D} - 1}.
This cost is strictly lower than the cost c_D ν_D of the periodic strategy with rate parameter ν_D. Because she starts in control of the resource, the total expected gain of the defender when playing renewal strategy RPD in response to an attacker playing a periodic strategy with ν_A ≤ ν_D is:
\beta_D \sum_{k=0}^{+\infty} \int_{\tau=k/\nu_D}^{k/\nu_D + 1/\nu_D} e^{-\beta_D \tau}\left(1 - \nu_A\left(\tau - \frac{k}{\nu_D}\right)\right) d\tau = 1 - \frac{\nu_A}{\beta_D} + \frac{\nu_A}{\nu_D}\cdot\frac{1}{e^{\beta_D/\nu_D} - 1}.
We can show that this gain is strictly higher than the gain derived from playing the periodic strategy with rate parameter ν D . As its cost is lower and its gain is higher, the renewal strategy RPD yields higher utility than the periodic strategy, at least under the assumed conditions. □
Although RPD is arithmetic, renewal strategies defined by a narrow (but not "infinitely narrow") distribution around 1/ν_D are non-arithmetic and will yield approximately the same utilities for super-hyperbolic discounters.
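The cost comparison in the proof rests on the elementary inequality e^y − 1 > y for y > 0; a quick check (ours) across a range of defender rates:

```python
# RPD's cost c_D*beta_D/(e^(beta_D/nu_D) - 1) is strictly below the periodic
# cost c_D*nu_D, since e^y - 1 > y for every y > 0.
from math import exp

c_D, beta_D = 0.3, 1.0
for nu_D in (0.1, 0.5, 1.0, 5.0, 50.0):
    rpd_cost = c_D * beta_D / (exp(beta_D / nu_D) - 1)
    assert rpd_cost < c_D * nu_D
```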

Notes

1. Similar formulations of the generalized hyperbolic discounting function have been referred to as hyperboloid (e.g., Estle et al. [54]) or hyperbola-like (e.g., Green and Myerson [55]) discounting functions.
2. In Merlevede et al. [13], these parameters are named λ_D and λ_A instead of β_D and β_A, and Λ_D and Λ_A instead of β_D^c and β_A^c.
3. A player can move at most once at a particular instance of time and a finite number of times over any finite time interval.
4. We limit ourselves to cases where both players discount super-hyperbolically and do not exhaustively cover scenarios where one player discounts super-hyperbolically and another sub-hyperbolically. As there is no discontinuity in player best responses near α = β, outcomes for mixed scenarios can be observed under super-hyperbolic discounting by bringing α and β close to each other.
5. That research primarily focuses on income or costs occurring at one particular point in time, not on continuous income and cost streams as is the case here.
6. We pick a value for r = β/α and then compute the parameters α and β as α = 2^{1/r} − 1 and β = α · r.
7. Define income resulting from a move as follows: when a player in control of the resource performs a move, it does not result in income; when a player not in control of the resource performs a move, flipping the resource, it results in income equal to the value generated by the resource from the time of the move until the resource flips again.
Start by splitting the game into two parts: the part of the game before the attacker's first move (ante) and the part after the attacker's first move (post).
  • (ante) If the attacker is the slower player, this part of the game always yields him precisely zero gain, irrespective of either player's play rate. If the defender is the slower player, this part of the game does yield her gain, but the expected amount only depends on the duration of ante, which is also independent of her play rate.
  • (post) The probability density of the slower player flipping at any time is constant and equal to her play rate. Every one of her moves results in an expected income; while it is difficult to determine this income exactly, it is independent of her own play rate, as the move is certain to result in a change of ownership, while the value of ownership depends only on the time and the play rate of the faster player. The slower player's gain is, therefore, proportional to her play rate, and her incentive is independent of her play rate.
8. To be precise, the periodic strategies strictly dominate the class of non-arithmetic or non-lattice renewal strategies. Arithmetic strategies are strategies for which all possible realizations happen at an integer multiple of some real number.
9. The CDF for the time of the first move is 1 − (1 − t_0 ν_f) 𝕀_{t_0 ≤ 1/ν_f} · (1 − t_0 ν_s) 𝕀_{t_0 ≤ 1/ν_s} = t_0 (ν_f + ν_s − t_0 ν_f ν_s) for t_0 ≤ 1/ν_f, where 𝕀 is the indicator function. The PDF for the time of the first move is its derivative, p(t_0) = (ν_f + ν_s − 2 t_0 ν_f ν_s) 𝕀_{t_0 ≤ 1/ν_f}. An expression for the total anonymous gain is, therefore,
(\beta - \alpha)\int_{t_0=0}^{+\infty} p(t_0)\int_{\tau=t_0}^{+\infty} D_i(\tau)\,d\tau\,dt_0 = \int_{t_0=0}^{1/\nu_f} (\nu_f + \nu_s - 2t_0\nu_f\nu_s)(1 + t_0\alpha)^{-\frac{\beta-\alpha}{\alpha}}\,dt_0.
We can confirm that this expression is equal to the sum of \bar{G}_s and \bar{G}_f as presented in Lemmas A16 and A17.
10. The probability of the faster player having moved last at any point in time is equal to \int_{\tau=0}^{1/\nu_f} \nu_f \cdot \tau \cdot \nu_s\, d\tau = \frac{\nu_s}{2\nu_f}. The slower player moved last with probability 1 − ν_s/(2ν_f).
11. The expected duration of the intervals owned by the faster-moving player is \int_{\tau=0}^{1/\nu_f} (1 - \tau\nu_s)\, d\tau = \frac{1}{\nu_f} - \frac{\nu_s}{2\nu_f^2}. The intervals of the slower-moving player have a shorter expected length of \int_{\tau=0}^{1/\nu_f} (1 - \tau\nu_f)\, d\tau = \frac{1}{2\nu_f}.
12. https://dlmf.nist.gov/8.19.E15 (accessed on 10 November 2023).

References

  1. Radzik, T. Results and Problems in Games of Timing; Lecture Notes-Monograph Series; Institute of Mathematical Statistics: Durham, NC, USA, 1996; pp. 269–292.
  2. van Dijk, M.; Juels, A.; Oprea, A.; Rivest, R.L. FlipIt: The Game of "Stealthy Takeover". J. Cryptol. 2012, 26, 655–713.
  3. Pawlick, J.; Farhang, S.; Zhu, Q. Flip the Cloud: Cyber-Physical Signaling Games in the Presence of Advanced Persistent Threats. In Decision and Game Theory for Security; Khouzani, M., Panaousis, E., Theodorakopoulos, G., Eds.; Number 9406 in Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2015; pp. 289–308.
  4. Kushner, D. The Real Story of Stuxnet. IEEE Spectrum 2013, 50, 48–53.
  5. Barrett, D.; Yadron, D.; Paletta, D. U.S. Suspects Hackers in China Breached about 4 Million People's Records, Officials Say. Available online: http://www.wsj.com/articles/u-s-suspects-hackers-in-china-behind-government-data-breach-sources-say-1433451888 (accessed on 7 October 2023).
  6. Perez, E.; Prokupecz, S. U.S. Data Hack May Be 4 Times Larger Than the Government Originally Said. Available online: http://edition.cnn.com/2015/06/22/politics/opm-hack-18-milliion (accessed on 7 September 2023).
  7. Gallagher, R. The Inside Story of How British Spies Hacked Belgium's Largest Telco. Available online: https://theintercept.com/2014/12/13/belgacom-hack-gchq-inside-story/ (accessed on 10 November 2023).
  8. Price, R. TalkTalk Hacked: 4 Million Customers Affected, Stock Plummeting, 'Russian Jihadist Hackers' Claim Responsibility. Available online: http://uk.businessinsider.com/talktalk-hacked-credit-card-details-users-2015-10 (accessed on 10 November 2023).
  9. Reuters. Premera Blue Cross Says Data Breach Exposed Medical Data. Available online: http://www.nytimes.com/2015/03/18/business/premera-blue-cross-says-data-breach-exposed-medical-data.html (accessed on 10 November 2023).
  10. ESET. APT Activity Report T2 2022. Available online: https://www.eset.com/int/business/resource-center/reports/eset-apt-activity-report-t2-2022 (accessed on 10 November 2023).
  11. Cole, E. (Ed.) Preface. In Advanced Persistent Threat; Syngress: Boston, MA, USA, 2013; pp. xv–xvi.
  12. Nadela, S. Enterprise Security in a Mobile-First, Cloud-First World. Available online: http://news.microsoft.com/security2015/ (accessed on 10 November 2023).
  13. Merlevede, J.; Johnson, B.; Grossklags, J.; Holvoet, T. Exponential Discounting in Security Games of Timing. In Proceedings of the Workshop on the Economics of Information Security (WEIS), Boston, MA, USA, 2–3 June 2019.
  14. Merlevede, J.; Johnson, B.; Grossklags, J.; Holvoet, T. Exponential Discounting in Security Games of Timing. J. Cybersecur. 2021, 7, tyaa008.
  15. Merlevede, J.; Johnson, B.; Grossklags, J.; Holvoet, T. Time-Dependent Strategies in Games of Timing. In Decision and Game Theory for Security; Alpcan, T., Vorobeychik, Y., Baras, J.S., Dán, G., Eds.; Number 11836 in Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2019; pp. 310–330.
  16. Loewenstein, G.; Prelec, D. Anomalies in Intertemporal Choice: Evidence and an Interpretation. Q. J. Econ. 1992, 107, 573–597.
  17. Samuelson, P. A Note on Measurement of Utility. Rev. Econ. Stud. 1937, 4, 155–161.
  18. Ramsey, F.P. A Mathematical Theory of Saving. Econ. J. 1928, 38, 543–559.
  19. Strotz, R.H. Myopia and Inconsistency in Dynamic Utility Maximization. Rev. Econ. Stud. 1955, 23, 165–180.
  20. Koopmans, T.C. Stationary Ordinal Utility and Impatience. Econometrica 1960, 28, 287–309.
  21. Fishburn, P.C.; Rubinstein, A. Time Preference. Int. Econ. Rev. 1982, 23, 677–694.
  22. Callender, C. The Normative Standard for Future Discounting. Australas. Philos. Rev. 2021, 5, 227–253.
  23. Vanderveldt, A.; Oliveira, L.; Green, L. Delay Discounting: Pigeon, Rat, Human—Does it Matter? J. Exp. Psychol. Anim. Learn. Cogn. 2016, 42, 141–162.
  24. O'Donoghue, T.; Rabin, M. Doing It Now or Later. Am. Econ. Rev. 1999, 89, 103–124.
  25. Laibson, D. A Cue-theory of Consumption. Q. J. Econ. 2001, 116, 81–119.
  26. Zauberman, G.; Kim, B.K.; Malkoc, S.A.; Bettman, J.R. Discounting Time and Time Discounting: Subjective Time Perception and Intertemporal Preferences. J. Mark. Res. 2009, 46, 543–556.
  27. Kahneman, D.; Tversky, A. Prospect Theory: An Analysis of Decision under Risk. Econometrica 1979, 47, 363–391.
  28. Frederick, S. Cognitive Reflection and Decision Making. J. Econ. Perspect. 2005, 19, 25–42.
  29. Laibson, D. Golden Eggs and Hyperbolic Discounting. Q. J. Econ. 1997, 112, 443–478.
  30. Ericson, K.M.; Laibson, D. Chapter 1—Intertemporal Choice. In Handbook of Behavioral Economics—Foundations and Applications 2; Elsevier: Amsterdam, The Netherlands, 2019; pp. 1–67.
  31. Herrnstein, R.J. Relative and Absolute Strength of Response as a Function of Frequency of Reinforcement. J. Exp. Anal. Behav. 1961, 4, 267–272.
  32. McKerchar, T.L.; Green, L.; Myerson, J. On the Scaling Interpretation of Exponents in Hyperboloid Models of Delay and Probability Discounting. Behav. Process. 2010, 84, 440–444.
  33. Myerson, J.; Green, L. Discounting of Delayed Rewards: Models of Individual Choice. J. Exp. Anal. Behav. 1995, 64, 263–276.
  34. McKerchar, T.L.; Green, L.; Myerson, J.; Pickford, T.S.; Hill, J.C.; Stout, S.C. A Comparison of Four Models of Delay Discounting in Humans. Behav. Process. 2009, 81, 256–259.
  35. Green, L.; Myerson, J.; Vanderveldt, A. Delay and Probability Discounting. In The Wiley Blackwell Handbook of Operant and Classical Conditioning; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2014; Chapter 13; pp. 307–337.
  36. Sozou, P.D. On Hyperbolic Discounting and Uncertain Hazard Rates. Proc. R. Soc. B Biol. Sci. 1998, 265, 2015–2020.
  37. Azfar, O. Rationalizing Hyperbolic Discounting. J. Econ. Behav. Organ. 1999, 38, 245–252.
  38. Fernandez-Villaverde, J.; Mukherji, A. Can We Really Observe Hyperbolic Discounting? 2002. Available online: https://ssrn.com/abstract=306129 (accessed on 13 July 2023).
  39. Bowers, K.D.; van Dijk, M.; Griffin, R.; Juels, A.; Oprea, A.; Rivest, R.L.; Triandopoulos, N. Defending against the Unknown Enemy: Applying FlipIt to System Security. In Decision and Game Theory for Security; Grossklags, J., Walrand, J., Eds.; Number 7638 in Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; pp. 248–263.
  40. Laszka, A.; Felegyhazi, M.; Buttyán, L. A Survey of Interdependent Security Games. ACM Comput. Surv. (CSUR) 2014, 47.
  41. Manshaei, M.; Zhu, Q.; Alpcan, T.; Bacşar, T.; Hubaux, J.P. Game Theory Meets Network Security and Privacy. ACM Comput. Surv. (CSUR) 2013, 45, 1–49.
  42. Laszka, A.; Johnson, B.; Grossklags, J. Mitigating Covert Compromises. In Proceedings of the Web and Internet Economics, Cambridge, MA, USA, 11 December 2013; pp. 319–332.
  43. Laszka, A.; Johnson, B.; Grossklags, J. Mitigation of Targeted and Non-Targeted Covert Attacks as a Timing Game. In Decision and Game Theory for Security; Das, S.K., Nita-Rotaru, C., Kantarcioglu, M., Eds.; Number 8252 in Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2013; pp. 175–191.
  44. Farhang, S.; Grossklags, J. FlipLeakage: A Game-Theoretic Approach to Protect against Stealthy Attackers in the Presence of Information Leakage. In Decision and Game Theory for Security; Zhu, Q., Alpcan, T., Panaousis, E., Emmanouil Tambe, E., Casey, W., Eds.; Number 9996 in Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; pp. 195–214.
  45. Johnson, B.; Laszka, A.; Grossklags, J. Games of Timing for Security in Dynamic Environments. In Decision and Game Theory for Security; Khouzani, M., Panaousis, E., Theodorakopoulos, G., Eds.; Number 9406 in Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2015; pp. 57–73.
  46. Zhang, M.; Zheng, Z.; Shroff, N.B. A Game Theoretic Model for Defending Against Stealthy Attacks with Limited Resources. In Decision and Game Theory for Security; Khouzani, M., Panaousis, E., Theodorakopoulos, G., Eds.; Number 9406 in Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2015; pp. 93–112.
  47. Laszka, A.; Horvath, G.; Felegyhazi, M.; Buttyán, L. FlipThem: Modeling Targeted Attacks with FlipIt for Multiple Resources. In Decision and Game Theory for Security; Poovendran, R., Saad, W., Eds.; Number 8840 in Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2014; pp. 175–194.
  48. Leslie, D.; Sherfield, C.; Smart, N.P. Threshold FlipThem: When the Winner Does Not Need to Take All. In Decision and Game Theory for Security; Khouzani, M., Panaousis, E., Theodorakopoulos, G., Eds.; Number 9406 in Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2015; pp. 74–92.
  49. Hu, P.; Li, H.; Fu, H.; Cansever, D.; Mohapatra, P. Dynamic Defense Strategy against Advanced Persistent Threat with Insiders. In Proceedings of the 2015 IEEE Conference on Computer Communications (INFOCOM), Hong Kong, China, 26 April–1 May 2015; pp. 747–755.
  50. Oakley, L.; Oprea, A. QFlip: An Adaptive Reinforcement Learning Strategy for the FlipIt Security Game. In Decision and Game Theory for Security; Alpcan, T., Vorobeychik, Y., Baras, J.S., Dán, G., Eds.; Number 11836 in Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2019; pp. 364–384.
  51. Zhang, R.; Zhu, Q. FlipIn: A Game-Theoretic Cyber Insurance Framework for Incentive-Compatible Cyber Risk Management of Internet of Things. IEEE Trans. Inf. Forensics Secur. 2020, 15, 2026–2041.
  52. Banik, S.; Bopardikar, S.D. FlipDyn: A Game of Resource Takeovers in Dynamical Systems. In Proceedings of the 2022 IEEE 61st Conference on Decision and Control (CDC), Cancun, Mexico, 6–9 December 2022; pp. 2506–2511.
  53. Miura, H.; Kimura, T.; Hirata, K. Modeling of Malware Diffusion with the FlipIt Game. In Proceedings of the 2020 IEEE International Conference on Consumer Electronics—Taiwan (ICCE-Taiwan), Taoyuan, Taiwan, 28–30 September 2020; pp. 1–2.
  54. Estle, S.; Green, L.; Myerson, J. When Immediate Losses are Followed by Delayed Gains: Additive Hyperboloid Discounting Models. Psychon. Bull. Rev. 2019, 26, 1418–1425.
  55. Green, L.; Myerson, J. A Discounting Framework for Choice with Delayed and Probabilistic Rewards. Psychol. Bull. 2004, 130, 769–792.
  56. Frederick, S.; Loewenstein, G.; O'Donoghue, T. Time Discounting and Time Preference: A Critical Review. J. Econ. Lit. 2002, 40, 351–401.
  57. Grossklags, J.; Reitter, D. How Task Familiarity and Cognitive Predispositions Impact Behavior in a Security Game of Timing. In Proceedings of the 2014 IEEE 27th Computer Security Foundations Symposium, Vienna, Austria, 19–22 July 2014; pp. 111–122.
  58. Reitter, D.; Grossklags, J. The Positive Impact of Task Familiarity, Risk Propensity, and Need for Cognition on Observed Timing Decisions in a Security Game. Games 2019, 10, 49.
Figure 1. Hyperbolic discount function and corresponding discount rates. Each blue curve corresponds to a different value of α. Parameter β is chosen so that D(4) = 1/3 (β is increasing in α). The dashed curve is a true hyperbolic discounting curve with α = β = 1/2. The exponential function crossing the point (4, 1/3) is shown in red.
Figure 2. Contour plots of gains and utilities for exponential play and super-hyperbolic discounting (with α_D = α_A = 0.5, β_D = β_A = 1, and c_D = c_A = 0.3). Warmer colors indicate higher values; the precise numerical values are not important.
Figure 3. Contour plots of gains and utilities for periodic play and super-hyperbolic discounting (with α_D = α_A = 0.5, β_D = β_A = 1, and c_D = c_A = 0.3). Warmer colors indicate higher values; the precise numerical values are not important.
Figure 4. Contour plots of incentives for exponential play and super-hyperbolic discounting (with α_D = α_A = 0.5, β_D = β_A = 1, and c_D = c_A = 0). Warmer colors indicate higher values; the precise numerical values are not important.
Figure 5. Contour plots of incentives for periodic play and super-hyperbolic discounting (with α_D = α_A = 0.5, β_D = β_A = 1, and c_D = c_A = 0). Warmer colors indicate higher values; the precise numerical values are not important.
Figure 6. Base incentive functions of the defender and the attacker for exponential play, c_i = 0, β_i = 1, and varying α_i.
Figure 7. Best responses of non-discounting (β/α = 1), long-horizon (β/α = 2), and short-horizon (β/α = 6) defender behavior for periodic play and changing defender costs (c_D). Attacker play rates (ν_A) are on the horizontal axis; defender play rates (ν_D) are on the vertical axis.
Figure 8. Best responses of non-discounting (β/α = 1), long-horizon (β/α = 2), and short-horizon (β/α = 6) attacker behavior for periodic play and changing attacker costs (c_A). Defender play rates (ν_D) are on the horizontal axis; attacker play rates (ν_A) are on the vertical axis.
Table 1. Discounting models used with FlipIt-like games with their discounting parameters and relation to the other models.

| Class of Discounting | No Discounting [2] | Exponential Discounting [14] | Generalized Hyperbolic Discounting |
| --- | --- | --- | --- |
| Discounting parameters | ∅ | {β_A, β_A^c, β_D, β_D^c} | {α_D, β_D, α_D^c, β_D^c, α_A, β_A, α_A^c, β_A^c} |
| Relation to other models | n.a. | lim_{λ→0} is None | lim_{α→0} is Exponential |