Ransomware and Reputation

: Ransomware is a particular form of cyber-attack in which a victim loses access to either his electronic device or ﬁles unless he pays a ransom to criminals. A criminal’s ability to make money from ransomware critically depends on victims believing that the criminal will honour ransom payments. In this paper we explore the extent to which a criminal can build trust through reputation. We demonstrate that there are situations in which it is optimal for the criminal to always return the ﬁles and situations in which it is not. We argue that the ability to build reputation will depend on how victims distinguish between different ransomware strands. If ransomware is to survive as a long term revenue source for criminals then they need to ﬁnd ways of building a good reputation.


Introduction
Broadly speaking, ransomware is a form of malware in which the victim of a cyber-attack is blackmailed.The term has recently become synonymous with crypto-ransomware wherein the files on a victims device are encrytped and a ransom is demanded for the key to decrypt those files [1][2][3][4].While ransomware dates back to the AIDS Trojan in 1989, it has come to prominence over the last five years or so with an explosion in the number of ransomware strands and victims [5].Critical to this rapid evolution has been the wider awareness of crypto-graphically sound techniques for encrypting files in a way that does not allow reverse engineering [6].Such techniques mean a victim has no way of recovering encrypted files without a back-up or the private key held by the criminals.
Ransomware offers a viable, long term business model for criminals [7].In particular, if a victim has no back-up and values the encrypted files then they may decide the ransom is worth paying.Survey evidence suggests that many businesses do indeed pay the ransom [8].We also have evidence of large profits for ransomware criminals [9,10].The example of South Korean web-hosting firm Nayana paying a $1 million ransom in 2017 amply demonstrates how lucrative ransomware can be for criminals.Given that the probability of being prosecuted is low, it is no surprise that we now see hundreds, if not thousands, of ransomware strands in the wild.Each strand may only last a few months before the authorities and security experts 'catch up' but new variants continually emerge.
While ransomware offers a long-term business model it is clear that this model relies on an element of trust between victim and criminal.Victims will only pay the ransom if they believe that paying the ransom gives them a good enough chance of getting their files back [11].This, in turn, suggests that criminals would benefit from a reputation for honouring ransom payments.CryptoLocker and CryptoWall are two examples of ransomware strands that had a good reputation for returning files [12,13].There are plenty of examples, like WannaCry, with a bad reputation.So, should the criminal return the files, take the money and run, or simply demand more money?Recent evidence suggests that victims who pay the ransom recover their data around 50% of the time [14]. 1 That may be enough to tempt victims to try their luck and pay up.So, have the criminals, consciously or not, hit upon a good strategy?
In this paper we analyze reputation formation in a simplified repeated game.In each period a new victim decides whether or not to pay the ransom and the criminal decides whether or not to return the files.We assume that victims learn from the past experience of other victims.We also allow that it is costly for the criminal to return access to files because of, say, the costs of a 'customer service' to guide victims on how to decrypt files.In a baseline case where victims are unresponsive to past experience we show that it is optimal for the criminal to not return files.In the other extreme where victims are highly responsive to past experience the criminal should always return the files.We then explore the middle ground to see where the tipping point lies above which it is optimal for the criminal to return files.We find that if a criminal's reputation will be based solely on their own actions then in all but the most extreme settings it is optimal for the criminal to return the files.If, however, the criminal's reputation is affected by what other criminals are doing then it may not be optimal to return the files.
We will discuss the implications of our findings for law enforcement more as we proceed.But we note here that the findings of this paper can help inform on the threat analysis of ransomware strands.If a particular strand has a poor reputation then it would be in the interests of law enforcement to advertise this, in order to undermine the business model of the criminals.Either way we would not expect the strand to last long.By contrast, a strand with a good reputation has a viable long term future, and so is of a higher threat.This may encourage copy-cats (who do not return files).Or it may be that criminals join behind a 'successful strand', as with Cerber [15].Or that 'successful strands' are 'reborn' in the game of cat and mouse with law enforcement and security experts, as we saw with Locky [16].All this suggests that reputation is a factor that should go into threat analysis, alongside more familiar factors such as the ease with which the strand can be detected by common anti-virus software.
Our work builds upon a number of papers using economics and game theory to analyze ransomware.Hernandez-Castro et al. (2017) show how the optimal ransom demand depends on the distribution of valuations in the population and the information criminals can discern about willingness to pay [7].Caulfield et al. (2019) analyze how criminals can learn over time about the distribution of valuations in the population [17].Caporusso et al. (2018) set out the basic rationale for using game theory to model the interaction between ransomware criminal and victim [18].Laszka et al. (2017) consider the strategic role of back-ups [19] while Cartwright et al. (2019) look at the role of irrational aggression and deterrence [20].We recognize that there is also a related literature looking at the game theory of hostage taking situations, typically framed in a terrorist context [20][21][22][23].
In none of the papers and models just mentioned is reputation explicitly modelled.Instead the focus is more on criminal and victim interaction in a one-shot context or how criminals can learn over time.The key contribution of our paper is, therefore, to move to a repeated game setting in which victims can collectively learn from past experience.This moves the focus somewhat away from the criminal onto how victims react to an attack.For instance, how do victims form beliefs and how does that influence their decision to pay the ransom [24].We would argue that such questions should critically inform policy and advice on how to intervene in the ransomware business model.In particular, the criminals have already had time to learn from experience about how to make ransomware work and so can be expected to have a more sophisticated, forward looking strategy.To be ahead of the game we need to understand optimal behaviour in this setting.
We proceed as follows.In Section 2 we provide the model, in Section 3 we have our results, and in Section 4 we conclude.1 Although it is not clear whether this is some criminals returning the files 100% of the time and some 0% of the time, or it is mixing by a particular criminal.

Model
There is a continuum of potential victims.Each victim i is characterized by the amount she values her files v i .The population distribution of valuations is given by function q, where q(v), for any v ≥ 0, is the proportion of victims who value their files more than v.We shall as an example consider the linear distribution q(v) = a − bv, where a and b are parameters. 2In stage 1 of the game a victim is randomly drawn from the population and her electronic device is attacked by a criminal.The victim knows her valuation v i .The only information the criminal has is that v i is drawn from distribution q.He does not, therefore, know v i .
In stage 2 of the game the criminal chooses a take-it-or-leave-it ransom demand r.In stage 3 of the game the victim decides whether or not to pay the ransom.Let p denote the victim's choice, where p = 1 indicates pay and p = 0 indicates not pay.In stage 4 of the game the criminal decides whether or not to release the files back to the victim.Let g denote the criminal's choice, where g = 1 indicates return and g = 0 indicates destroy.The payoff of the victim is given by ( For instance, the victims payoff is v i if she gets her files back without paying the ransom (which could be equivalent to not being attacked) while her payoff is −r if she pays the ransom and does not get her files back.The payoff of the criminal is given by where c ≥ 0 is the cost of returning files.For instance, the criminal's profit is r if the victim pays the ransom and the criminal does not return the files.
Let us briefly comment on the interpretation of the cost parameter c.If the criminal can effortlessly return a victim access to her files, c = 0, then it is a weakly dominant strategy for him to do so.This makes it is 'easy' for the criminal to build a reputation.In reality, however, we can expect that there are costs to returning access to files.These costs may include: checking the payment matches a particular victim, returning the key to that victim, and, perhaps most importantly, guiding the victim on how to decrypt their files and dealing with queries about files that fail to decrypt [13].This latter point is crucial in terms of reputation because giving back the private key is not enough-the files need to be successfully recovered by the victim in order that she would feel the ransom payment was honoured.Hence, c is likely to be positive.This provides an incentive for the criminal to not return files.
In the following we assume that the above game is repeated indefinitely in periods t = 1, 2, . . . .Let v t , g t , r t and p t denote the respective valuation and choices in period t.Given that victims are chosen randomly a victim will only ever be a victim in one period.Victims are assumed to be self-focused and so they ignore the externality effect that paying the ransom may have on future victims.Specifically, the payoff of the victim in period t is The criminal is forward-looking and discounts future payoffs with discount factor δ. His aggregate payoff is thus We assume that the objective of the victim in period t is to maximize u t and the objective of the criminal is to maximize Π.We also assume that both victims and criminal are risk neutral.We restrict q(v) to lie in the interval [0, 1] as appropriate.
Whether the victim is willing to pay the ransom will depend on his beliefs about the likelihood of the criminal returning the files.Let β t ∈ [0, 1] denote the perceived probability of the criminal returning the files in period t if the ransom is paid.For simplicity we assume that the victim believes the probability of retrieving the files if the ransom is not paid is 0. Let history at time t be given by the vector h t = {r 1 , p 1 , g 1 , . . ., r t−1 , p t−1 , g t−1 } of past ransom demands, whether the ransom was paid, and whether files were returned.We will assume that beliefs β t are a function of h t and r t . 3Clearly, in reality, a victim would not have access to the complete history.Instead they would need to pick up snippets of information from personal and social networks, forums, search engines etc.
To look at how beliefs may be shaped it is useful to relate our model to that on reputation in repeated games.The basic model in this literature involves a long-lived player interacting with a sequence of short lived players [25][26][27]. 4For instance, a monopolist interacting with a sequence of potential entrants [30,31].The key question addressed in the literature is whether the long-lived player has an incentive to generate a reputation.This could be a monopolist generating a reputation for being tough against entrants or, in our case, a criminal generating a reputation for honouring ransom payments.
Models of reputation are driven by some level of informational asymmetry between the long-lived and short-lived players.One variant on the theme is a game of imperfect information and perfect recall in which the short-lived players do not know the payoff function of the long-lived player but can observe all actions.In our setting this can be equated with victims not knowing c and/or δ [30].Or it could be the short-run players put some positive probability the criminal is a 'commitment type' who will always honour ransom payments [31,32].In either case beliefs β t are shaped by history h t because the past actions of the criminal reveal information about his type.
Throughout the following we take a partial equilibrium approach in which the criminal maximizes expected payoff taking as given the beliefs of victims.This allows us to abstract away from formally modelling incomplete information or equilibrium belief formation and focus on the incentives of the criminal.It also means we do not restrict to 'rational' victims.In an Appendix A, however, we briefly consider equilibrium belief formation and Bayes Nash equilibrium.

Results
As a trivial benchmark case let us begin with the case of independent beliefs in which β t is independent of h t for all t.An assumption of independent beliefs is appropriate in a setting where there are many strands of ransomware which are indistinguishable, or treated as indistinguishable by victims.Hence, victims lump all ransomware attacks together and beliefs are based on the overall probability of getting files back.In this case the criminal that we are modelling in our game would have no (or very limited effect) on the beliefs of victims. 5It is simple to show that the criminal's optimal strategy is to never return files.Proposition 1.In the case of independent beliefs, if c > 0 it is optimal for the criminal to set g t = 0 for all t.
Proof.For any period t we can see that the victim will pay the ransom if and only if r t < v t β t .Consider a specific period τ in which the ransom is paid.Given that β t for all t > τ is independent of g τ , returning the files costs the criminal c and has no benefit.The victim observes r t before making her decision and so beliefs may also be conditioned on this.In our results this will not be an issue because the ransom is constant over time.So in the text we use h t as shorthand for h t and r t .

4
A variation on this theme, less relevant to us, is a patient player interacting with a less patient player [28] or two long-lived players interacting a finite number of times [29].
To formally capture this one would need a game with multiple criminals in which beliefs are shaped by the collective behavior of independent criminals.Proposition 1 encapsulates the basic intuition that if reputation is irrelevant then criminals have no incentive to return files.In this setting the only rational belief for victims is β t = 0. But, then victims are not willing to pay and so ransomware is not profitable.We can already see in Proposition 1 that reputation is vital if ransomware is to be a long term business model [11].In particular, the ransomware business model is dependent on beliefs, β t , being somehow tied with history, h t .
To appreciate the potential power of reputation let us consider another extreme.Suppose that we have grim-trigger beliefs in which (1) β t = 0 if p t−1 = 1 and g t−1 = 0, (2) β t = β t−1 otherwise.In this case, if the criminal once fails to return the files when a victim pays then no other victim will ever trust him in the future.If the criminal always return the files then initial beliefs are retained. 6The following result shows that it can be optimal for the criminal to return files.Proposition 2. In the case of grim-trigger beliefs it is optimal for the criminal to set g t = 1 when p t = 1 for all t if δ is sufficiently large.
Proof.Suppose that the criminal (a) charges a ransom r t , where r t > c and q(r t β 1 ) > 0, and (b) sets g t = 1 if p t = 1, in all periods t.Consider period τ and suppose that the victim pays the ransom in this period.Should the criminal return the files?The expected future payoff of the criminal if he returns the files is If the criminal does not return the files his future payoff is Π τ = r .It is, therefore, optimal to return the files if It is clear that δ ∈ (0, 1).
To illustrate Propositions 1 and 2 consider the linear distribution q(v) = a − bv.In the case of independent beliefs suppose beliefs are fixed β t = β 1 for some initial belief β 1 .The expected profit of the criminal if he sets ransom r in each period t is Hence we get optimal ransom demand Consider next grim-trigger beliefs with the same initial belief β 1 .Suppose the criminal sets ransom r and always returns the files if the ransom is paid.His expected payoff is Here we get optimal ransom demand Note that the optimal ransom is higher in the case of grim-trigger beliefs because the criminal will pay the cost of returning the files.Returning the files is optimal (see the proof of Proposition 2) if For plausible parameter values the value of δ is likely to be low.For instance, if a = b = 1 and c = 0.01 we have δ = 0.02.

Sampling Recent Victims
Propositions 1 and 2 show that whether it is optimal for the criminal to return files will primarily depend on belief formation.For the remainder of the paper we consider a particular model of belief formation based on empirical frequencies.In motivating this approach we begin by recognizing that there is an extensive literature on learning in games [33,34].It is widely recognized that beliefs typically do not obey Bayes rule (see Selten (1991) for an early, light-hearted take on the matter, with reference to models of reputation [35]).Here we focus on a belief-based model of learning in which the victim's beliefs are based on the past record of the criminal in returning files.Our approach is similar to that of Young (1993) [34,36] and consistent with beliefs being biased by event frequencies [37].
Our model of belief formation can be explained as follows.Given period τ and history h τ let P(h(τ)) = {t : p t = 1} denote the subset of periods in which the ransom is paid.Let n be an exogenous parameter that measures sample size.If |P(h(τ))| < n then we assume β τ = β 1 for some initial beliefs β 1 .If |P(h(τ))| ≥ n then denote by P n (h(τ)) the n largest values in P(h(τ)), i.e., P n (h(τ)) ⊂ P(h(τ)) are the most recent n periods in which the ransom was paid.We assume that In interpretation, this means that beliefs are based on the proportion of times the criminal returned files the last n times the ransom was paid.So, we can think of victims as sampling the last n victims who paid the ransom. 7 The following result is analogous to Proposition 2 in showing that it can be optimal for the criminal to return the files.This must be the case if sample size is small.The logic here is that it is 'easy' for the criminal to get a 'bad reputation' when the sample size is small, and once he has that reputation nobody ever pays the ransom.So, the better long term strategy is to pay the ransom.Proposition 3. If the sample size is n = 1 and δ is sufficiently large then it is optimal for the criminal to set g t = 1 when p t = 1 for all t.
Proof.In period t we can see that the victim will pay the ransom if and only if r t < v t β t .Suppose that the victim pays the ransom in period τ.If the criminal returns the files and continues with a strategy of charging r τ and returning files, then his expected payoff in subsequent periods is Note that this equation takes account of the fact β t = 1 for all t > τ.If the criminal does not return the files his expected payoff is Π = r τ .This takes account of the fact that β t = 0 for all t > τ.It is, therefore, optimal to return the file if δ > δ, where δ is as defined in the proof of Proposition 2.

7
In comparing our model to other models of belief-based learning we make the following remarks.Young (1993) allows that the individual does not necessarily sample the last n events.This adds a further stochastic element.Other models allow the past to gradually be forgotten and so recent events are given higher weight [38].
Our next result shows that at the opposite limit of a large n it is not optimal to return the files to every victim.In this case the large sample size means that the criminal can take the money and run some of the time without it fundamentally influencing his reputation.Proposition 4. For any δ < 1 there exists n such that if n > n it is not optimal for the criminal to set g t = 1 when p t = 1 for all t.
Proof.We proceed by contradiction.Suppose that a victim in period τ pays the ransom.Moreover, assume the criminal has a perfect record of returning files and so β τ = 1.If the criminal returns the files then his expected payoff in future periods is at most where r is the optimal ransom for β t = 1.Fix a particular realization of values (v τ+1 , v τ+2 , . . .).
Given this we can determine (p τ+1 , p τ+2 , . . . ) and (g τ+1 , g τ+2 , . . .).Suppose that there exists some period t at which point n subsequent victims have paid the ransom.That is ∑ t t=τ+1 g t = n.Now suppose that the criminal does not return the files in period τ but does continue to return the files to those who pay in all subsequent periods t > τ.There will be at least n periods in which we have beliefs β t = (n − 1)/n.In this case the victim will only pay if r t < v t (n − 1)/n.We set r t = r (n − 1)/n.Consider again the realization of values (v τ+1 , v τ+2 , . . ., v t ).We have fixed things so that (p τ+1 , p τ+2 , . . ., p t ) and (g τ+1 , g τ+2 , . . ., g t ) will remain unchanged.The only difference is the lower ransom.The 'drop' in payoff compared to Π o is, therefore, This can be bounded The 'gain' in payoff compared to Π o from not returning the files in period τ is c.For sufficiently large n we have ∆ < c.Hence the criminal can increase his payoff.
In the proceeding we assumed that there existed some period t at which point n subsequent victims have paid the ransom.Potentially one can have a realization of values where this is not the case.Here, however, the drop in payoff from not returning the files is strictly less than ∆ (for t = τ + n).We observe, therefore, that irrespective of the sequences of valuations the criminal earns a payoff above Π o if he does not return the files in period τ.

Simulation Results on Sample Size
Propositions 3 and 4 formally demonstrate that it may or may not be optimal for the criminal to return files depending on how sensitive victim beliefs are to the criminal's actions.We considered, however, the relative extreme cases of n = 1 and n large.To explore what happens in the middle ground we report the results of simulations in which we see how the profit of the criminal is influenced by n.In the simulations we consider the linear distribution of valuations q(v) = 1 − v. 8 We track the profits of the criminal over 1000 victims.Throughout we assume that victims start with the 'optimistic' initial belief β 1 = 1 and then update beliefs according to Equation (12).Each data point is the average of 1000 simulations.
We consider two variants on criminal strategy.Let us first explain our baseline random model.This is characterized by an exogenous parameter w ∈ [0, 1].For any period t, if the victim in period t pays the ransom then the criminal returns the files with probability w.Moreover, the criminal sets ransom demand which corresponds to the optimal ransom demand for beliefs β t .So, in our baseline model the criminal will, over time, return the files 100w% of the time.
Our baseline model is somewhat biased by the effect which drives Proposition 3. To appreciate why, suppose that there are n instances in a row where the criminal does not return the files.Then beliefs will be zero and no victim will pay the ransom.This means the 'bad reputation' sits forever.To control for this we consider a lower-bound model in which the criminal always guarantees that beliefs are at least w.Specifically, for any period t, if the victim in period t pays the ransom then (1) the criminal returns the files with probability w, unless (2) failure to return the files would result in beliefs falling below w, i.e., it would be the case that β t+1 < w, in which case he returns the files, to make sure β t+1 > w.
Consider first the case where the cost of returning files is c = 0.01 and the discount factor is δ = 1. Figure 1 plots profit as a function of w for four different values of n.It is readily apparent that profit is increasing in w.In other words the criminals profit is increasing in the likelihood of returning the files and the optimal strategy is to always return files to those that pay the ransom.Figure 2 shows what happens if we increase the cost of returning files to c = 0.1 and lower the discount factor to δ = 0.9.Here we see that, with the exception of n = 2, profits are highest when w = 0 and so the criminal's optimal strategy is to not return files.Note, however, that this finding is critically dependent on two things: (1) a discount factor of 0.9 means the criminal is very much focused on the short term, and (2) it takes time for victims to update beliefs.So, the criminal essentially exploits victims during the time it takes them to collectively learn about his strategy.Clearly, this strategy is not consistent with a long term business model.To illustrate the criminal's trade-off in choosing between a high and low probability of returning files consider Figure 3.This plots profit over time for three different values of w when n = 100 and c = 0.1.We can see that the strategy of never returning files w = 0 succeeds in the short run because it takes time for victims to update beliefs.But, over the long run the strategy of always returning files w = 1 is a clear winner.You might think the short term advantage of setting w = 0 stems from our assumption that victims are initially optimistic (with beliefs β 1 = 1).This, however, is not the key factor.To illustrate, Figure 4 shows what happens when we set β 1 = 0.5.As you can see we get similar trade-offs across w.
The key factor driving the initial advantage of w = 0 is the time it takes for reputation to take hold.The smaller is n the sooner reputations are formed and so the sooner profits are higher with w = 1.In summary, our simulations suggest that a criminal interested in maximizing overall profit will take a long term view and always return files to victims that pay.We observed no gains from a strategy that entails sometimes not returning the files.This suggests that Proposition 4 captures a very specific case.Only the criminal interested in short term gains (low discount factor) would optimally choose to not return files.

Conclusions
In this paper we have looked at the incentives for ransomware criminals to return files to victims.We have done so by studying a repeated game in which a criminal interacts with a new victim each period.A victim is assumed to form beliefs based on the past actions of the criminal.In simple settings we were able to obtain explicit results on when it is optimal for the criminal to always return the files and when it is not.We then used simulations to further our understanding around this issue.
To summarize and interpret our results first imagine a world in which there was only one ransomware criminal.In this setting the beliefs of victims will likely be shaped by the actions of the criminal.If so, the criminal will do best to always return the files.This results from two different factors (1) returning the files gives the criminal a 'good reputation', and (2) the criminal can benefit in the long term from that good reputation.Only the most short term of criminals would take the money and run.So, in this stylized setting incentives are clearly to return files.
In reality we know that there are hundreds, if not thousands, of competing ransomware strands.So, next consider a world in which there is a large number of ransomware strands and no distinction between them.Then it is difficult for the criminal to create a personalized reputation.Put another way, the reputation of the criminal is likely to be influenced by factors outside of his control.In this setting the optimal strategy is to not return the files.This, though, cannot succeed in the long run and ultimately would mean that ransomware has no viable future [11].In the short run, some victims will no doubt keep on paying the ransom in the 'hope' of getting their files back.Ultimately, however, the more news spreads that files are not returned the less trust there will be.
As we discussed in the introduction, evidence suggests that victims recover files around 50% of the time [14].This would suggest we are somewhere in between the two polar cases discussed above.Some ransomware criminals, we would suggest, are trying to create a good reputation for their 'product' to create a long term viable business model.Other criminals, by contrast, simply want to cash in on short term gains and have no incentive to honour ransom payments.Indeed, we observe fake ransomware that does not encrypt files at all [6].This melting pot of competing motives could influence the long-term evolution in ransomware in different ways.
If ransomware is to survive in the long term then the criminals behind a particular strand of ransomware, call it Locky, would want to separate their reputation from that of other strands of ransomware.In practical terms this means having a 'brand' that is recognized on search engines or forums that victims may consult after attack.It also means the criminals behind Locky would have an incentive to eliminate any competitors that may muddle their reputation.On the flip side, competitors would have an incentive to imitate Locky so as to free-ride on their good reputation.So, we could have a game of cat and mouse between competing criminal gangs.
From a law enforcement perspective things are somewhat complex.On the one hand if they interfere in the reputation of Locky then this would undermine the business model of Locky.For instance, law enforcement could frame things in a way that lowers victim's beliefs about getting their files back (without, necessarily, explicit deception).This would, hopefully, have the long run effect of removing the criminal's desire to continue with Locky.But, on the flip side, if Locky typically does return files then there is a short term welfare loss of discouraging victims from paying the ransom and getting their files back.So, there is a sense in which some victims have to sacrifice their own interests to undermine the long-term business model.
It is interesting that current advice tends to skim over the likelihood of getting files back.This is possibly because of a recognition that victims who are 'desperate' are going to be tempted to pay the ransom.Other reasons for not paying the ransom are that the victim may be targeted more in the future or may only get some of their files back before being asked for a larger ransom [4].In our model we have abstracted away from such issues but the basic insights from our paper still hold true.In particular, victims are likely to be influenced by beliefs, whether that be the probability of getting files back or the probability of the ransom increasing.The long term viability of ransomware relies on a good reputation.
Informally, a Bayes Nash equilibrium in this setting consists of: (a) an optimal strategy for the criminal given his type and the belief formation of victims, (b) an optimal strategy for victims given their beliefs, and (c) victims form beliefs consistent with Bayes rule and the strategy of the criminal.
We introduce some preliminary notation.Given history h τ in period τ, let L τ be an indicator variable that takes value 1 if there is period t < τ, where p t = 1 and g t = 0 and takes value 0 otherwise.Let M τ be an indicator variable that takes value 1 if there is a period t < τ, where p t = 1 and takes value 0 otherwise.In interpretation L τ records if a victim paid and did not get her files back, and M τ records if any victim has paid.Also, let r (β) denote the optimal ransom to set in a one-shot game with beliefs equal to β.
It is simple to show that for δ sufficiently high the following is a Bayes Nash equilibrium with grim-trigger beliefs: 9 (a.i)Irrespective of type the criminal chooses ransom demand r (1) in all periods.(a.ii) Irrespective of type the criminal honours all ransomware payments, i.e., sets g t = 1 if p t = 1.(b) The victim in period t chooses p t = 1 if v t β t > r t and chooses p t = 0 otherwise.(c) The victim in period t has beliefs β t = 1 if L t = 0 or β t = 0 if L t = 1.Note that on the equilibrium path the criminal always honours ransom payments and so victims expect him to honour ransom payments.Hence the high type criminal maintains a good reputation.
It is simple to show that for δ sufficiently low the following is a Bayes Nash equilibrium with grim-trigger beliefs: (a.i) Irrespective of type the criminal chooses ransom demand r (β t ) in all periods.(a.ii)The low type criminal honours all ransomware payments, i.e., sets g t = 1 if p t = 1.The high type criminal never honours ransom payments, i.e., sets g t = 0 if p t = 1.(b) The victim in period t chooses p t = 1 if v t β t > r t and chooses p t = 0 otherwise.(c) The victim in period t has beliefs β t = z if M t = 0, beliefs β t = 0 if L t = 1 and beliefs β t = 1 if M t = 1 and L t = 0 otherwise.Here we have a separating equilibrium in which the low-cost criminal returns the files and a high-cost criminal does not.The first time a victim pays the type of criminal is, therefore, revealed.
Finally, we provide an example with independent beliefs.In the previous example the low-cost type had a weakly dominant strategy to return the files.Here assume that c is small but positive c > 0 for the low-cost criminal.Then both would prefer to not return the files if this would not change beliefs.The following is a Bayesian Nash equilibrium: (a.i) Irrespective of type the criminal chooses ransom demand r in all periods (where r can be any positive number).(a.ii) Irrespective of type the criminal does not honour ransomware payments, i.e., sets g t = 0 if p t = 1.(b) The victim in period t chooses p t = 1 if v t β t > r t and chooses p t = 0 otherwise.(c) The victim in period t has beliefs β t = 0.This equilibrium is characterized by 'pessimistic' beliefs, which make it impossible to generate a positive reputation.

Figure 1 .
Figure 1.Criminal's profit as a function of w for four different values of n when c = 0.01 and δ = 1.

Figure 2 .
Figure 2. Criminal's profit as a function of w for four different values of n when c = 0.1 and δ = 0.9.

Figure 3 .
Figure 3. Criminal's profit over time for three different values of w when c = 0.1.

Figure 4 .
Figure 4. Criminal's profit over time for three different values of w when c = 0.1 and β 1 = 0.5.