Auditable Blockchain Randomization Tool

Abstract: Randomization is an integral part of well-designed statistical trials, and is also a required procedure in legal systems. Implementation of honest, unbiased, understandable, secure, traceable, auditable and collusion-resistant randomization procedures is a matter of great legal, social and political importance. Given the juridical and social importance of randomization, it is important to develop procedures in full compliance with the following desiderata: (a) Statistical soundness and computational efficiency; (b) Procedural, cryptographical and computational security; (c) Complete auditability and traceability; (d) Any attempt by participating parties or coalitions to spuriously influence the procedure should be either unsuccessful or be detected; (e) Open-source programming; (f) Multiple hardware platform and operating system implementation; (g) User friendliness and transparency; (h) Flexibility and adaptability for the needs and requirements of multiple application areas (like, for example, clinical trials, selection of jury or judges in legal proceedings, and draft lotteries). This paper presents a simple and easy-to-implement randomization protocol that assures, in a formal mathematical setting, full compliance with the aforementioned desiderata for randomization procedures.


Introduction: Bad and Good Practices in Randomization
Randomization is a technique used in the design of statistical experiments: in a clinical trial, for example, patients are randomly assigned to distinct groups receiving different treatments, with the goal of studying and contrasting their effects. Randomization is nowadays considered a gold standard in statistical practice; its motivation is to prevent systematic biases (like an unfair or tendentious assignment process) that could distort (unintentionally or purposely) the conclusions of the study. For further comments on randomization, see [1][2][3]; for Bayesian perspectives, see [4,5]. In the legal context, randomization (also known as sortition or allotment) is routinely used for the selection of jurors or judges assigned to a given judicial case; see [6]. For these applications, our initial quotation, from the Roman emperor Julius Caesar, suggests the highest standards of technical quality and auditability; see [7].
Rerandomization is the practice of rejecting and discarding (for whatever reason) a given randomized outcome, which is subsequently replaced by a new randomization. Repeated rerandomization can be used to completely circumvent the haphazard, unpredictable or aimless nature of randomization, allowing a premeditated selection of a final outcome of choice. There are advanced statistical techniques capable of blending the best characteristics of random and intentional sampling; see, for example, [8][9][10][11][12]. Nevertheless, rerandomization is often naively used, or abused, with the excuse of (subjectively) "avoiding outcomes that do not look random enough"; see, for example, [13,14]. In the legal context, spurious manipulations of the randomization process are often linked to fraud, corruption and similar maladies; see [6] and references therein.
In order to comply with the best practices for randomization processes, the authors of [6] recommend the use of computer software having a long list of characteristics, for example, being efficient and fully auditable, well-defined and understandable, sound and flexible, secure and transparent. Such requirements are expressed by the following (revised) desiderata for randomization procedures: Given the juridical and social importance of the themata under scrutiny, we believe that it is important to develop randomization procedures in full compliance with the following desiderata: (a) Statistical soundness and computational efficiency, see [15][16][17][18]; (b) Procedural, cryptographical and computational security, see [19][20][21][22]; (c) Complete auditability and traceability, see [23][24][25]; (d) Any attempt by participating parties or coalitions to spuriously influence the procedure should be either unsuccessful or be detected, see [26][27][28]; (e) Open-source programming; (f) Multiple hardware platform and operating system implementation; (g) User friendliness and transparency, see [29,30]; (h) Flexibility and adaptability for the needs and requirements of multiple application areas (like, for example, clinical trials, selection of jury or judges in legal proceedings, and draft lotteries), see [6].
Such requirements conflate several complementary characteristics that may seem, at first glance, incompatible. For example, strong security is often (but wrongly) associated with excessive secrecy, a doctrine known as "security by obscurity"; computer routines may be efficient but are often thought of as hard to audit; and mathematically well-defined algorithms may be perceived as hard to understand. The bibliographical references given in the desiderata stated above already hint at technologies that can be used to achieve a fully compliant randomization procedure, most prominently the blockchain. This is the key technology supporting modern public ledgers, cryptocurrencies, and a host of related applications.
A technical challenge for the application under scrutiny is the generation of pseudo-random number sequences that reconcile complementary properties related to computational efficiency, statistical soundness, and cryptographic security. In this respect, the excellent statistical and computational characteristics of linear recurrence pseudo-random number generators (or their modern descendants and relatives), like [16], can be reconciled with the needs concerning unpredictability and cryptographic security by appropriate starts and restarts of the linear recurrence generator. A sequence start for a linear recurrence generator is defined by a seed specified by a vector of (typically 1 to 64) integers, while a restart is defined by a jump-ahead or skip-ahead specified by a single integer (kept small relative to the generator's full period), see [22].
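The distinction between a seed (start) and a jump-ahead (restart) can be illustrated with a minimal sketch. It assumes a plain 64-bit linear congruential generator, chosen only for brevity; it is not the generator of [16], and the class name `LCG` and its methods are hypothetical. The jump-ahead composes the affine update with itself by repeated squaring, so skipping k steps costs O(log k) operations instead of k.

```python
# Illustrative only: a 64-bit linear congruential generator (LCG).
# Assumption: this stands in for the linear recurrence generators discussed
# in the text; it is NOT the generator of [16].
M = 2**64
A = 6364136223846793005   # multiplier of Knuth's MMIX LCG
C = 1442695040888963407   # increment of Knuth's MMIX LCG


class LCG:
    def __init__(self, seed: int):
        # Sequence start: the seed fixes the initial state.
        self.state = seed % M

    def next(self) -> int:
        # One step of the recurrence x -> A*x + C (mod M).
        self.state = (A * self.state + C) % M
        return self.state

    def jump_ahead(self, k: int) -> None:
        """Restart: advance the state by k steps in O(log k) time."""
        acc_a, acc_c = 1, 0      # accumulated map, starts as the identity
        cur_a, cur_c = A, C      # map for 2**j steps, starts as one step
        while k > 0:
            if k & 1:
                # Apply the current power-of-two map after the accumulated one.
                acc_a, acc_c = (cur_a * acc_a) % M, (cur_a * acc_c + cur_c) % M
            # Square the current map: 2**j steps become 2**(j+1) steps.
            cur_a, cur_c = (cur_a * cur_a) % M, (cur_a * cur_c + cur_c) % M
            k >>= 1
        self.state = (acc_a * self.state + acc_c) % M


# Usage: stepping 1000 times and jumping ahead by 1000 reach the same state.
stepped, jumped = LCG(seed=42), LCG(seed=42)
for _ in range(1000):
    stepped.next()
jumped.jump_ahead(1000)
print(stepped.state == jumped.state)
```

Because the cost grows only logarithmically, the same routine handles very large jump-ahead values, which is what makes restarts from externally supplied integers practical.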
Unpredictable and cryptographically secure seeds and jump-aheads can be provided by high entropy bit streams extracted from blockchain transactions, an idea that has already been explored in the works of [31][32][33][34].
The next section develops a possible implementation of a fully compliant core randomization protocol based on blockchain technology, and also makes a simple prototype available for study and further research. Moreover, in order to make it simple and easy to use, we develop the prototype on top of a readily available cryptocurrency platform. We use Bitcoin for this example, but alternatives like Ethereum, or other cryptocurrencies whose miners work under the same incentive model, can be used with minor adaptations.

Results: Core Randomization Protocol in Blockchain
We intend to establish a protocol able to deliver pseudo-random numbers on demand from an auditable and immutable ledger. The procedure starts as follows: the user (the party that wants to receive a random number) sends a Bitcoin transaction with a record of its purpose embedded in it. (One way to embed a message in a transaction is the OP_RETURN script, which allows storing up to 40 bytes in a transaction.) The recipient of this transaction may be a proxy representing a competent authority, a pertinent regulatory agency, an agreed custodian, etc. When this transaction is first attached to the blockchain, we concatenate the transaction ID (a 32-byte number, written in hexadecimal) and the block header (an 80-byte number, written in hexadecimal). If someone generates more than one transaction for the same purpose, only the one that was attached first is used. The resulting 112-byte number is the input to some known Verifiable Delay Function (VDF), which should be calibrated according to the purpose of the random number. For instance, a less critical purpose should have a VDF that delays the result by just a few seconds, or may even skip the VDF step completely. A critical purpose, with significant interests involved, should have a more demanding VDF, with a delay of minutes or even hours. The final result, after the VDF, is the source for our seeds and jump-aheads.
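The data flow of the protocol can be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype from the supplementary materials: the function names are hypothetical, the delay step is modeled by iterated SHA-256 (which is sequential but is not a true VDF, since checking it requires recomputation), and the expansion of the final digest into seed words and a jump-ahead by hashing with a counter is one possible choice that the text does not prescribe.

```python
import hashlib


def delay_function(data: bytes, iterations: int) -> bytes:
    """Placeholder for a VDF: `iterations` sequential SHA-256 applications.

    Assumption: this models only the sequential delay. A real VDF also
    yields a short proof that lets auditors verify the output quickly.
    """
    digest = data
    for _ in range(iterations):
        digest = hashlib.sha256(digest).digest()
    return digest


def derive_seed_and_jump(txid_hex: str, header_hex: str,
                         iterations: int, seed_words: int = 4):
    """Map (transaction ID, block header) to a seed vector and a jump-ahead."""
    txid = bytes.fromhex(txid_hex)
    header = bytes.fromhex(header_hex)
    if len(txid) != 32 or len(header) != 80:
        raise ValueError("expected a 32-byte txid and an 80-byte block header")

    # 32 + 80 = 112 bytes, the input of the delay step.
    entropy = delay_function(txid + header, iterations)

    def word(index: int) -> int:
        # Hypothetical expansion: hash the delayed output with a counter
        # and keep 64 bits per word.
        block = hashlib.sha256(entropy + index.to_bytes(4, "big")).digest()
        return int.from_bytes(block[:8], "big")

    seed = [word(i) for i in range(seed_words)]
    jump_ahead = word(seed_words)
    return seed, jump_ahead


# Usage with dummy placeholder values (NOT real blockchain data).
dummy_txid = "ab" * 32
dummy_header = "00" * 80
seed, jump = derive_seed_and_jump(dummy_txid, dummy_header, iterations=10_000)
print(len(seed), jump < 2**64)
```

Anyone holding the same transaction ID and block header can rerun the derivation and compare results, which is the auditability property the protocol relies on; the iteration count plays the role of the calibration knob mentioned above.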
With the aid of this protocol, a different pseudo-random number can be obtained for each user that demands one. Note that the user has no incentive to try to manipulate his transaction ID, because he has no control over the block header. We assume that the user and the miner are not the same person, so a miner will only be interested in trying to control his block header if he is paid to do so. Since the last stage of our protocol involves the calculation of a VDF, it will take the miner a certain amount of time to decide whether the block he has found is of interest to the user. Thus, he might even lose his block, if some other miner broadcasts a block of his own before he finishes calculating the VDF.
In the following subsection, the miner's payoff and the necessary delay T for the Verifiable Delay Functions will be explicitly calculated.

Preventing Collusion for Spurious Manipulation
Suppose a malicious user tries to bribe a miner that controls a fraction p of the network's computational power. A prize P = nB, where B is the Bitcoin block reward, will be paid to the miner if he successfully mines what we call a "desirable block": a block that will deliver a random number in a set A, chosen by the malicious user. Let λ also be the average rate of incoming blocks and q the probability of a randomly generated number being an element of A, i.e., the measure of the set of desirable results for the malicious user. Finally, let T be the expected amount of time needed for the VDF calculations. The moment a miner finds a block that can be accepted by the network, he faces the decision of broadcasting it before checking the VDF, or calculating the VDF before broadcasting. If he decides to check the VDF before broadcasting, he might start another attempt to find a block right away.
First, we calculate the expected absolute payoffs of the first and second options, called $E_1$ and $E_2$, respectively. $E_1$ will be larger than $B$, since the miner might issue a desirable block by chance:

$$E_1 = B + qP = B\,(1 + nq). \tag{1}$$

On the other hand, if the miner chooses to calculate the VDF, he will receive the block reward and the prize $P$, but only with a certain probability, so that

$$\begin{aligned}
E_2 &= (B + P)\, q\, \Pr\{\text{no other node finding a block before } t = T\} \\
    &\quad + (B + P)\, \Pr\{\text{successfully mining a desirable block in another attempt}\} \\
    &= (B + P) \sum_{i=1}^{\infty} \Pr\{\text{successfully mining a desirable block after } i \text{ attempts}\}.
\end{aligned} \tag{2}$$

The probabilities inside the summation, in the last equation, can be calculated as the product of the probability of finding a desirable block after $i$ attempts (a geometric distribution with probability of success $q$) and the probability of finding and checking $i$ blocks before the rest of the network mines one. Considering

$$\Pr\{\text{attacker finding and analyzing } i \text{ blocks before another node mining one}\} = p^{\,i-1}\, e^{-(1-p)\lambda T},$$

it follows that

$$E_2 = (B + P) \sum_{i=1}^{\infty} q\,(1-q)^{i-1}\, p^{\,i-1}\, e^{-(1-p)\lambda T} = \frac{(B + P)\, q\, e^{-(1-p)\lambda T}}{1 - p + pq}.$$

Finally, in order to make accepting the bribe not lucrative, we must have $E_1 > E_2$, i.e.:

$$\lambda T > \frac{1}{1-p} \log\!\left( \frac{1+n}{1+nq} \cdot \frac{q}{1 - p + pq} \right).$$

Since for every $n > 0$ we have $\frac{1+n}{1+nq} < \frac{1}{q}$, if we choose

$$\lambda T^{*} = \frac{1}{1-p} \log\!\left( \frac{1}{q} \cdot \frac{q}{1 - p + pq} \right) = \frac{1}{1-p} \log\!\left( \frac{1}{1 - p + pq} \right),$$

we guarantee that the attack will not be lucrative for any bribe $P = nB$. Also, since it can be assumed that $p < 1/2$, a value $\lambda T^{*} = 2 \log\!\left(\frac{2}{1+q}\right) < 2 \log(2)$ will be high enough to prevent an attack for any bribe and any acceptable value of $p$.
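The delay bound can be checked numerically with a short sketch. It reads the closing bound as λT* = log(1/(1 − p + pq))/(1 − p), and it assumes the payoffs take the form E1 = B(1 + nq) and E2 = (B + P) q exp(−(1 − p)λT)/(1 − p + pq); these closed forms are a reading consistent with the stated bound, and the function names are hypothetical.

```python
import math


def min_delay_rate(p: float, q: float) -> float:
    """Return lambda*T* = log(1 / (1 - p + p*q)) / (1 - p)."""
    if not (0.0 <= p < 1.0 and 0.0 < q <= 1.0):
        raise ValueError("need 0 <= p < 1 and 0 < q <= 1")
    return math.log(1.0 / (1.0 - p + p * q)) / (1.0 - p)


def payoffs(n: float, p: float, q: float, lam_t: float, block_reward: float = 1.0):
    """Expected payoffs (E1, E2) under the assumed closed forms.

    E1: broadcast immediately; E2: evaluate the VDF before broadcasting.
    The bribe is P = n * block_reward.
    """
    e1 = block_reward * (1.0 + n * q)
    survive = math.exp(-(1.0 - p) * lam_t)   # no rival block during the delay
    e2 = block_reward * (1.0 + n) * q * survive / (1.0 - p + p * q)
    return e1, e2


# Usage: a miner with half of the hash power, and a target set of measure 1/2.
p, q = 0.5, 0.5
lam_t_star = min_delay_rate(p, q)
block_rate = 1.0 / 600.0                     # Bitcoin targets one block per ~10 minutes
print("lambda*T* =", lam_t_star)
print("T* in seconds =", lam_t_star / block_rate)
for n in (1, 100, 10_000):
    e1, e2 = payoffs(n, p, q, lam_t_star)
    print(n, e1 > e2)                        # honest broadcasting pays more
```

At the bound, E2 reduces to B(1 + n)q, which stays below E1 = B(1 + nq) for every q < 1, so the comparison holds for any bribe size. For p = q = 1/2 the required delay is 2 log(4/3) block intervals, a little under six minutes at Bitcoin's target block rate.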

Conclusions and Final Remarks
We formalized a simple and effective protocol to generate pseudo-random numbers on demand, in a fully auditable way. We have demonstrated that none of the involved parties has enough financial incentive to try to affect the random number outcome: the party that issues the transaction lacks this power, since it has no control over the block header; and the miners do not have enough financial incentive to collude with an attacker, provided a suitable Verifiable Delay Function is applied.
The essentially decentralized, yet completely traceable and auditable, nature of the protocol presented in this article makes the resulting randomization process eminently reliable without recourse to blind trust in any central authority. The authors believe the adoption of such a protocol by the Brazilian Supreme Court (STF), as recommended in [6], would significantly increase public confidence in the judicial system and be a contributing factor for political and social stability. A simple prototype of the randomization tool described in this article is available in the supplementary materials; it is not intended to be used in a full-fledged application, but only to provide a working example of the key procedures.