Provably Secure Covert Communication on Blockchain

Abstract: Blockchain is a public open ledger that provides data integrity in a distributed manner. It is the underlying technology of cryptocurrencies and an increasing number of related applications, such as smart contracts. The open nature of blockchain together with strong integrity guarantees on the stored data makes it a compelling platform for covert communication. In this paper, we suggest a method of securely embedding covert messages into a blockchain. We formulate a simplified ideal blockchain model based on existing implementations and devise a protocol that enables two parties to covertly communicate through the blockchain following that model. We also formulate a rigorous definition for the security and covertness of such a protocol based on computational indistinguishability. Finally, we show that our method satisfies this definition in the random oracle model for the underlying cryptographic hash function.


Introduction
Covert communication channels are designed to protect the relationship between the transmitter and the receiver by hiding the fact that secret communication is taking place in the first place. Such channels can be used, for example, in military communication or in the presence of an authoritarian government. Secure covert communication can be implemented as a combination of cryptography and steganography. Cryptography ensures that the communicated message stays private, while steganography is used to hide the fact that there is encrypted communication. However, steganography requires a medium: a channel where innocuous communication is taking place and to which both the transmitter and the receiver have access. In addition, the channel needs to be reliable so that the receiver gets the transmitted message with high probability and the message has not been tampered with.
The blockchain technology was first introduced as the underlying mechanism for cryptocurrency in Bitcoin [1] as an open decentralized method of providing trust. Blockchain is a public distributed ledger implemented as a continuously growing chain of blocks that is designed to provide authenticity of the data without any centralized parties. Its integrity is ensured collaboratively by the majority of the nodes participating in the blockchain network. Therefore, data recorded into a blockchain is inherently resilient to manipulation, since manipulation would require the adversary to control a large fraction of the nodes. Since its introduction, blockchain technology has attracted a lot of interest in various applications, such as smart contracts enabled by Ethereum [2], Internet of Things [3], healthcare [4] and medical data management [5]. Typically, blockchain networks, such as those underlying cryptocurrencies, are free for anyone to join. This openness together with strong integrity guarantees on the stored data makes blockchains an interesting platform for implementing covert communication channels.
There are three main advantages of blockchain compared to other cover mediums: (1) it is anonymous and free to join, meaning that the communicating parties have free access, (2) submitted data cannot be altered and, in particular, the integrity guarantees are not provided by any centralized party, but by the consensus of the entire network, and (3) published data cannot be removed, meaning that no authority can apply censorship to already published data. Since the blockchain is immutable, alteration of the covert messages is virtually impossible and the embedding of covert information is free to be fragile. This is not the case, for example, for images on an image board.
In this paper, we suggest a method of submitting covert messages through a blockchain considered as a payment platform. Due to the immutable nature of the blockchain, the sender's ability to embed into it is limited. However, since everybody is free to submit payments into the chain, we apply those payments to convey encrypted messages to a receiver. We start by providing a model for the blockchain, called a simplified ideal blockchain, where irrelevant technical details have been abstracted away. We then devise BLOCCE, a method of reliably embedding messages into, and extracting them from, a blockchain following this model by submitting payments, and show that the method is reliable and runs in expected polynomial time, excluding the time spent waiting for new blocks to appear in the chain. Based on the provable security of stegosystems, we then formulate a definition of secure covert communication on a blockchain based on the hardness of distinguishing the payload-carrying payments from random payments. Finally, we prove that our method satisfies this definition. The embedding rate of our method is low and the method is considered in a simplified model. Therefore, our proposal can be seen more as a proof-of-concept scheme than a practical scheme that is ready to be implemented. However, our proposal is the first to achieve a covert channel over a public blockchain with provable security.
The paper is organized as follows. Section 2 explains the preliminaries for the rest of the paper, such as the utilized cryptographic primitives and steganography. In Section 3, we describe our model of a simplified ideal blockchain. The suggested method is explained in detail in Section 4 and its security is studied and proved in Section 5. Finally, Sections 6 and 7 provide the discussion on future work and the conclusion, respectively.

Related Work
To the best of our knowledge, there are no suggestions implementing provably secure covert channels on a blockchain. However, there are methods and services that systematically insert arbitrary data into a blockchain [6]. The insertion can be based on either the input scripts, which unlock funds for the transaction, or the output scripts, which specify the receiver of the unlocked funds. Depending on the transaction type (coinbase transactions that generate new currency or normal ones that make payments), deviations from the standard payment templates enable users to insert a message into the blockchain. This can be achieved, for example, by using dead conditional branches in an input script or replacing receiver public keys or script hashes with arbitrary data. In particular, methods manipulating the receiver address bear similarities to the one suggested in this paper. However, none of the existing methods address the covertness and privacy of the inserted data, which is the main motivation of our proposal. See [6] for a recent survey on existing blockchain data insertion methods and [7] for the methods that are used to detect and thwart them.
Our work is also related to the wider topic of privacy of blockchain based applications. Privacy was not an inherent property of the original Bitcoin design [1]. In fact, due to its open and public nature, there are several threats to the privacy of blockchain users [8]. However, there are several suggestions in the literature that attempt to address these issues by improving the anonymity, for example, by unlinking payments from their origin [9], using zero-knowledge proofs [10] or blind signatures [11]. There are also suggestions that use blockchain as a medium for secure communication. Typically, these suggestions apply encryption to protect the stored application data. For example, a trusted blind escrow service with encrypted data using a blockchain was suggested in Ref. [12]. In Ref. [13], an energy trading platform with bi-directional encrypted message streams was suggested. A decentralized data storage based on blockchain with end-to-end encryption for the Internet of Things was suggested in [14]. In order to provide a tamper-proof media transaction framework, blockchain technology was combined with digital watermarking of the media in [15].

Notation
Let $D$ be a probability distribution. A random variable $V$ that is distributed according to $D$ is denoted by $V \sim D$. If an element $u$ is sampled from a probability distribution $D$, we denote $u \leftarrow D$. The uniform probability distribution on a set $X$ is denoted by $U(X)$. For probabilistic algorithms, we apply the standard notation [16]. When an algorithm $A$ is run with an input $x$ and it outputs $y$, we denote $y \leftarrow A(x)$. Algorithms can be given access to oracles. Oracles are considered as "black boxes" that can compute, for example, functions or other algorithms. An algorithm $A$ with oracle access to an oracle $O$ is denoted by $A^{O}$. We assume that calling an oracle and receiving its answer takes constant time $O(1)$.
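The length of a string $\sigma = \sigma_1 \sigma_2 \ldots \sigma_n \in \{0,1\}^n$ is denoted by $|\sigma| = n$. The least significant bit (LSB) of $\sigma = \sigma_1 \sigma_2 \ldots \sigma_n$ is $\sigma_n$. The binary string of length $s$ consisting solely of ones is denoted by $1^s$ and is often used as a security parameter for cryptographic primitives. A function $\epsilon : \mathbb{N} \to \mathbb{R}$ is negligible if for every $c > 0$ there exists $n_0 \in \mathbb{N}$ such that $|\epsilon(n)| < n^{-c}$ for all $n \geq n_0$.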

Cryptographic Primitives
Our method is based on the perceived randomness of the outputs of a hash function. Let $\{0,1\}^*$ denote the set of all finite-length strings over $\{0,1\}$ together with the empty string (Kleene closure). We follow the random oracle model, where a cryptographic hash function $H : \{0,1\}^* \to \{0,1\}^n$ is modeled as a random oracle. For each input string $m \in \{0,1\}^*$, the output $H(m)$ is independently and uniformly randomly selected from $\{0,1\}^n$ (and fixed for subsequent queries with the same input).
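The lazy-sampling view of a random oracle can be illustrated with a minimal Python sketch. The class name `RandomOracle` and its methods are hypothetical names chosen for this illustration; they are not part of the paper's construction.

```python
import secrets


class RandomOracle:
    """Lazily sampled random oracle H: {0,1}* -> {0,1}^n (illustrative only)."""

    def __init__(self, n_bits=256):
        self.n_bits = n_bits
        self._table = {}

    def query(self, m: bytes) -> int:
        # A fresh input receives an independent, uniformly random n-bit value;
        # a repeated input receives the value fixed at its first query.
        if m not in self._table:
            self._table[m] = secrets.randbits(self.n_bits)
        return self._table[m]

    def lsb(self, m: bytes) -> int:
        # Least significant bit of H(m), i.e. the last bit of the digest.
        return self.query(m) & 1
```

Because every fresh query is answered with uniformly random bits, the LSB of the digest of a new input is an unbiased coin flip, which is the property the embedding method relies on.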
A symmetric encryption scheme $SE = (\mathrm{Gen}, \mathrm{Enc}, \mathrm{Dec})$ consists of three algorithms: Gen is a key generation algorithm that on input a security parameter $1^s$ outputs a secret key $k \leftarrow \mathrm{Gen}(1^s)$, Enc is an encryption algorithm that on input a key $k$ and a plaintext message $m$ outputs a ciphertext $c \leftarrow \mathrm{Enc}(k, m)$, and Dec is a decryption algorithm that on input a key $k$ and a ciphertext $c$ outputs a plaintext message such that $\mathrm{Dec}(k, \mathrm{Enc}(k, m)) = m$.
Algorithm 1 Pseudorandom ciphertext experiment

A symmetric encryption scheme SE has pseudorandom ciphertexts under a chosen plaintext attack if a probabilistic polynomial time adversary is unable to distinguish the ciphertext $c \in \{0,1\}^n$ from a uniformly random string $\sigma \in \{0,1\}^n$ even if it is able to choose the plaintext message. Formally, a polynomial time adversary $A = (A_1, A_2)$ is run in two stages $A_1$ and $A_2$ and is given oracle access to $\mathrm{Enc}_k$, the encryption algorithm under a random secret key $k$. First, $A_1$ is run with oracle access to $\mathrm{Enc}_k$ and it outputs a plaintext message $m$ together with its internal state information $S$ that it may need in the second phase. Then, based on a coin toss, the second phase $A_2$ is run with either the correct encryption of the chosen plaintext message or a completely random string from the ciphertext space (together with the state information). This pseudorandom ciphertext experiment is defined in Algorithm 1.
The encryption scheme SE has pseudorandom ciphertexts under a chosen plaintext attack if the success advantage, that is, the adversary's advantage over random guessing in deciding which of the two cases occurred, is negligible for every probabilistic polynomial time adversary $A$. For an example of a cryptosystem with pseudorandom ciphertexts, see [17].
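The two-stage experiment described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1: the toy `enc` (a random nonce followed by the message XORed with a SHAKE-256 keystream) is only a stand-in for a scheme with pseudorandom ciphertexts, the function names are hypothetical, and the second stage is given only the challenge and the state.

```python
import hashlib
import secrets


def gen(s=16):
    # Stand-in for Gen(1^s): a uniformly random s-byte key.
    return secrets.token_bytes(s)


def enc(k, m):
    # Toy stand-in for Enc: random nonce followed by m XOR keystream.
    nonce = secrets.token_bytes(16)
    stream = hashlib.shake_256(k + nonce).digest(len(m))
    return nonce + bytes(x ^ y for x, y in zip(m, stream))


def prc_experiment(a1, a2):
    """Run one experiment; return True if the adversary guesses the coin."""
    k = gen()
    m, state = a1(lambda msg: enc(k, msg))  # stage 1 has oracle access to Enc_k
    b = secrets.randbits(1)                 # the coin toss
    c = enc(k, m)                           # b == 1: use the actual encryption
    if b == 0:
        c = secrets.token_bytes(len(c))     # random string of the same length
    return a2(c, state) == b
```

An adversary that ignores its input, for example `a1 = lambda oracle: (b"msg", None)` and `a2 = lambda c, state: 1`, wins about half of the time, which corresponds to zero advantage.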
A digital signature scheme is a public key authentication mechanism that enables a user to sign messages so that anybody can later verify the authenticity of that signature. A digital signature scheme $\Sigma = (\mathrm{Gen}, \mathrm{Sign}, \mathrm{Verify})$ consists of three algorithms such that Gen on input the security parameter $1^s$ outputs a private and public key pair $(s_k, p_k) \leftarrow \mathrm{Gen}(1^s)$, Sign outputs a digital signature $\sigma \leftarrow \mathrm{Sign}(s_k, m)$ of a message $m$ computed using the private key $s_k$, and Verify verifies the signature $\sigma$ by outputting $1 \leftarrow \mathrm{Verify}(p_k, m, \sigma)$ if and only if $\sigma$ is a valid signature for $m$. A digital signature scheme should be existentially unforgeable, meaning that no adversary should be able to generate valid signatures for any message $m$ without the private key $s_k$. For details, see for example [18].

Steganography
We follow the terminology of Hopper et al. [19], where Alice tries to convey a hidden message to Bob. The communication channel from Alice to Bob is modeled as a probability distribution on bit sequences. Each of the communication bits is augmented with a monotonically increasing timestamp. Communication on the channel is viewed as sampling from this distribution, possibly adaptively bit-by-bit. That is, a channel distribution $C$ produces elements of the form $(b_1, t_1), (b_2, t_2), \ldots$, where $b_i \in \{0,1\}$ and $t_i \leq t_{i+1}$ for every $i$. Let $\mathcal{H}$ denote a channel history of bits drawn from the channel. We denote by $C_{\mathcal{H}}$ the channel distribution conditioned on the history $\mathcal{H}$ of already sampled bits together with their timestamps.
Definition 1. A stegosystem $\Pi = (\mathrm{Embed}, \mathrm{Extract})$ with security parameter $s$ on a channel $C$ consists of two probabilistic algorithms such that
1. Embed, on input a key $k$, a hiddentext message $m$ and a channel history $\mathcal{H}$, outputs a stegotext $c$.
2. Extract, on input a key $k$ and a stegotext $c$, outputs a hiddentext message $m$.
In our investigation, the channel history $\mathcal{H}$ that is given as input to Embed and the stegotext $c$ given as input to Extract will be replaced by access to the blockchain. That is, in our investigation, the blockchain represents the channel history.
Definition 2. The reliability of a stegosystem $\Pi = (\mathrm{Embed}, \mathrm{Extract})$ with messages of length $n$ is the probability $\min_{m \in \{0,1\}^n,\, \mathcal{H}} \Pr[\mathrm{Extract}(k, \mathrm{Embed}(k, m, \mathcal{H})) = m]$.
To be useful, we require that the stegosystem has high reliability for hiddentext messages of some length. That is, there is $n \in \mathbb{N}$ such that for hiddentext messages of length $n$, the reliability is reasonably high (for example 2/3), meaning that messages can be reliably embedded and extracted.
We also need the stegosystem to be secure. The security is modeled using a chosen hiddentext attack by giving an adversary $A$ access to the channel $C$ through a sampling oracle $M$ and to an additional oracle, which is either $\Pi_k$, which outputs stegotexts embedded by the stegosystem $\Pi$ under a random key $k$, or an oracle $O$ that outputs random elements of $C_{\mathcal{H}}$ for the current history $\mathcal{H}$, each with probability 1/2 [20]. The adversary is free to choose the hidden message $m$ and needs to distinguish which of these two sets of oracles it was given. If the advantage of distinguishing between these cases, measured as a function of the security parameter $s$, is negligible for any adversary, then the stegosystem is secure. For details, see [20].

A Simplified Blockchain Model
We shall start by describing our model for the blockchain. In order to clarify our exposition, we apply a simplified model that abstracts away technical details that are irrelevant for our investigation. For example, in practice, consensus can be reached with several different mechanisms, such as proof-of-work or proof-of-stake. However, for our method, we do not need to know the working details of the implementation as long as immutability is guaranteed. Our model is similar to the idealized public ledger model from [21]. The simplified ideal blockchain $B = (C, \Sigma, H, \mathrm{Read}, \mathrm{Submit})$ consists of a chain of blocks $C$, an existentially unforgeable [18] digital signature scheme $\Sigma$, a cryptographic hash function $H$ that is modeled as a random oracle and two oracles Read and Submit that can be used to read and write data to the blockchain. We shall describe these components in detail below.
For privacy reasons, payment addresses should be used only once for each payment. While it is technically possible, address re-use is generally considered bad practice since it may lead to identity exposure (for Bitcoin, see for example https://en.bitcoin.it/wiki/Address_reuse). Typically, the pseudonyms (that is, the addresses) in a blockchain are made to look random. For example, in Bitcoin and Ethereum the address is computed by hashing the public key. In the random oracle model, the hash function outputs random strings. That is, for any public key $p_k$, the address $H(p_k)$ is a uniformly random string in $\{0,1\}^n$. The simplified ideal blockchain contains

1. Payee identities. In order to join the system, an individual $i$ generates a private and public key pair $(s_k^{(i)}, p_k^{(i)})$ using a digital signature scheme $\Sigma$. When paying, individuals are represented by their public keys $p_k^{(i)}$.

2. Recipient addresses. Payments to an individual $i$ are sent to a one-time address $a^{(i)} \in \{0,1\}^n$ that the recipient can publish after hashing her public key: $a^{(i)} = H(p_k^{(i)})$. Each payment address appears only once in a payment and a new private and public key pair $(s_k^{(i)}, p_k^{(i)})$ is generated to receive additional payments.

3. Money. The blockchain records the amount of money $\mu^{(i)}$ that is associated to an address $a^{(i)}$ and, consequently, to a public key $p_k^{(i)}$.

4. Initial status of the blockchain. The initial total amount of money is distributed among a finite set of $L$ individuals represented by their addresses $a^{(1)}, a^{(2)}, \ldots, a^{(L)}$. An amount of money $\mu^{(i)} \geq 0$ is associated to each of these addresses and the initial status of the blockchain $C_0 = \bigl((a^{(1)}, \mu^{(1)}), (a^{(2)}, \mu^{(2)}), \ldots, (a^{(L)}, \mu^{(L)})\bigr)$ is public knowledge and the first block of the chain.

5. Payments. Let $p_k^{(i)}$ and $p_k^{(j)}$ be two distinct public keys. A valid payment $P$ of amount $\mu$ from a user $i$ to a user $j$ is a tuple $P = (p_k^{(i)}, a^{(j)}, \mu, t, \sigma)$, where $a^{(j)} = H(p_k^{(j)})$ is the address of user $j$, $\mu$ is the amount of payment such that $\mu \leq \mu^{(i)}$, $t$ is a unique identifier or a timestamp for the payment and $\sigma$ is a digital signature on the payment computed with the private key $s_k^{(i)}$ of user $i$. Once the payment is published in the blockchain, $\mu^{(i)}$ and $\mu^{(j)}$ are (implicitly) updated accordingly.

6. The ideal chain of blocks. For the simplified ideal blockchain, all published payments are valid and are published in a tamper-proof chain $C = (C_0, C_1, C_2, \ldots)$ of payment blocks, where $C_0$ is the initial status of the blockchain and the block $C_{i+1}$ consists of all of the valid payments submitted to the blockchain after the publication of block $C_i$. In the ideal system, a new block appears after a finite and fixed amount of time.
Note that users are free to join the system by generating their own private and public key pairs $(s_k, p_k)$. Any address generated from a public key appearing as the recipient of a payment is valid even if it has never appeared in the blockchain before. In practice, addresses are substrings of the digest $H(p_k)$ in order to make them of a manageable length. However, for simplicity, we use the whole digest. For our proofs to work, it is imperative that a unique address is generated for every payment.
To enable access to the blockchain, in our model the users are given oracle access to oracles Read and Submit, which can be used to read the contents of the blockchain block by block and to submit new payments. These are defined as

1. Read(i) on input the block number $i \in \mathbb{N}_{\geq 0}$ outputs the block $C_i$ from the chain $C = (C_0, C_1, C_2, \ldots)$ if it has appeared already. If the last block that has appeared is $C_{i'}$, where $i' < i$, Read outputs $\bot$ indicating error.

2. Submit(P) on input a payment $P = (p_k^{(i)}, a^{(j)}, \mu, t, \sigma)$ verifies that the payment is valid and saves it to be published in the next block to appear. If the payment is invalid, it is discarded.
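To make the roles of Read and Submit concrete, the following Python sketch models a toy version of the simplified ideal blockchain. It is an illustration under several assumptions that are not in the paper: SHA-256 stands in for the random oracle $H$, signature verification is delegated to a caller-supplied `verify` function, balances are updated at submission time rather than at publication, and block publication is triggered manually through the hypothetical method `publish_block`.

```python
import hashlib
from dataclasses import dataclass


def H(public_key: bytes) -> bytes:
    # SHA-256 stands in for the random oracle H (an assumption of this sketch).
    return hashlib.sha256(public_key).digest()


@dataclass(frozen=True)
class Payment:
    payer_pk: bytes   # public key of the paying user i
    address: bytes    # recipient address a = H(public key of user j)
    amount: int       # amount of money mu
    ident: int        # unique identifier or timestamp t
    signature: bytes  # digital signature sigma


class SimplifiedBlockchain:
    def __init__(self, initial_balances, verify):
        self.balances = dict(initial_balances)
        self.blocks = [tuple(initial_balances.items())]  # block C_0
        self.pending = []
        self.seen_ids = set()
        self.verify = verify  # stand-in for Verify of the signature scheme

    def submit(self, p: Payment) -> None:
        payer = H(p.payer_pk)
        valid = (
            self.verify(p)
            and p.ident not in self.seen_ids
            and 0 <= p.amount <= self.balances.get(payer, 0)
        )
        if not valid:
            return  # invalid payments are discarded
        self.seen_ids.add(p.ident)
        # Simplification: balances change at submission, not at publication.
        self.balances[payer] -= p.amount
        self.balances[p.address] = self.balances.get(p.address, 0) + p.amount
        self.pending.append(p)

    def publish_block(self) -> None:
        # The next block consists of the valid payments submitted so far.
        self.blocks.append(tuple(self.pending))
        self.pending = []

    def read(self, i: int):
        # Returns block C_i, or None (standing in for the error symbol).
        return self.blocks[i] if 0 <= i < len(self.blocks) else None
```

A payment that exceeds the payer's balance is silently dropped, and `read(i)` returns `None` for a block that has not appeared yet, mirroring the error output of Read.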

The Suggested BLOCCE Scheme
In this section, we give a detailed description of our method called BLOCCE (Blockchain Covert Channel), $\mathrm{BLOCCE} = (\mathrm{Gen}_{\mathrm{BLOCCE}}, \mathrm{Embed}, \mathrm{Extract})$. We first give a general overview of the scheme. Then we describe the embedding and extraction procedures in detail. Finally, we show that embedding runs in expected polynomial time (excluding the time spent waiting for new blocks to appear on the chain) and that the method is reliable.

General Overview of BLOCCE
In our scenario, Alice attempts to convey a hidden message to Bob through the blockchain, while the adversary attempts to detect any such communication. Contrary to traditional steganography, Alice has no ability to alter the "covertexts" (that is, the blocks) stored in the blockchain due to the consensus mechanism. However, Alice has complete control over the payments she submits to the chain. In addition, the consensus mechanism will protect these payments from the possible modification attempts of the adversary. We will apply these payments and, in particular, the payment addresses to convey a hidden message to Bob. For simplicity, we shall send one bit for each block appearing in the chain. The general overview of the scheme is the following:

1. Alice generates fresh private and public key pairs and computes the corresponding payment addresses.

2. Alice generates payments (of small amounts) from her own account to these addresses and, depending on the hiddentext message $m$, orders them such that the least significant bits (LSBs) of the payment addresses form $m$.

3. Alice submits the payments in the correct order to the blockchain.

4. Bob reads the blockchain for payments made by Alice and reads the hiddentext message from the LSBs of the payment addresses.
Note that Alice does not lose any money by running the scheme, excluding the possible transaction costs, since she controls the generated key pairs $(s_k, p_k)$. While the transaction costs for certain blockchain implementations using the proof-of-work paradigm may be significant, there are also blockchains with other consensus mechanisms that do not require transaction costs. The scheme is depicted in a simplified form in Figure 1.

Embedding Into the Blockchain
In this section, we give a detailed description of the embedding algorithm. For simplicity, we assume that Alice sends exactly one payment to the blockchain for each published block. This means that we are able to embed at most one bit for each published block. It would be possible to embed multiple bits, but it would complicate the formulation of the method. The embedding of a single bit simplifies our exposition and makes it clearer for the reader. For the same reason, we assume that the length of the hidden text is fixed and known to both Alice and Bob. We would not need to make such an assumption, but it greatly simplifies the description of the algorithms and makes the discussion clearer.
Let $B = (C, \Sigma, H, \mathrm{Read}, \mathrm{Submit})$ be a simplified ideal blockchain, where $\Sigma = (\mathrm{Gen}_\Sigma, \mathrm{Sign}, \mathrm{Verify})$, and let the security parameter employed by the blockchain implementation be $1^s$. Let $H : \{0,1\}^* \to \{0,1\}^n$. Let also $SE = (\mathrm{Gen}_{SE}, \mathrm{Enc}, \mathrm{Dec})$ be a symmetric encryption scheme that has pseudorandom ciphertexts under the chosen plaintext attack and suppose that the security parameter $1^s$ is used for key generation. We note that it is crucial that an encryption scheme with pseudorandom ciphertexts is used. The argument of the security later in Section 5 is based on the pseudorandomness of the ciphertexts. It should be noted that the utility of such encryption schemes in the construction of steganographic algorithms has already been observed in the literature [17,22].

Algorithm 2 Embedding Algorithm
In the following, let the private and public key pair of Alice be $(s_k^{(A)}, p_k^{(A)})$. Let $\mathcal{H}$ denote the history of payments Alice has made through the blockchain. We denote by $M_{\mathcal{H}}$ the probability distribution on the amount of money in a payment of Alice conditioned on the history $\mathcal{H}$. It is important that the payments are made according to this distribution to prevent the adversary from detecting the communication. For any subsequent payments, we assume that $\mathcal{H}$ is updated with the most recent payment and $M_{\mathcal{H}}$ is the resulting probability distribution on the amount of money. For clarity, we also assume that Alice does not run out of money.
On input a security parameter $1^s$, the key generation algorithm $\mathrm{Gen}_{\mathrm{BLOCCE}}$ outputs a secret key of the form $(\lambda, k)$, where $\lambda$ is a uniformly random message start indicator such that $\lambda \in \{0,1\}^{n_\lambda}$, $n_\lambda \in \mathbb{N}$, that will enable Bob to detect the start of the hidden message, and $k$ is an encryption key $k \leftarrow \mathrm{Gen}_{SE}(1^s)$. The length $n_\lambda$ of the message start indicator will play a role in the consideration of the reliability of the extraction. We assume that the concatenated length of the message start indicator $\lambda$ together with the ciphertext $c \leftarrow \mathrm{Enc}(k, m)$ is known to both Alice and Bob and we denote this length by $N = n_\lambda + |c|$, where $|c|$ is the length of the ciphertext $c$. Embedding is described in Algorithm 2.
Note that since the algorithm waits for a new block to be published after the submission of a payment, only a single payment from $p_k^{(A)}$ is ever included in a single block. This will help Bob to extract the message.
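The embedding loop described in this section can be sketched in Python as follows. This is a hedged illustration, not the paper's Algorithm 2: SHA-256 stands in for the random oracle $H$, `fresh_keypair` returns random bytes instead of calling a real signature key generation, the payload bits (the start indicator followed by the ciphertext) are assumed to be prepared by the caller, and `submit`, `wait_for_block` and `sample_amount` are hypothetical callbacks standing in for Submit, the block waiting step and sampling from Alice's payment amount distribution.

```python
import hashlib
import secrets


def address(pk: bytes) -> bytes:
    # SHA-256 stands in for the random oracle H (an assumption of this sketch).
    return hashlib.sha256(pk).digest()


def lsb(a: bytes) -> int:
    # Least significant bit of an address.
    return a[-1] & 1


def fresh_keypair():
    # Placeholder for the signature key generation: random bytes only.
    sk = secrets.token_bytes(32)
    pk = hashlib.sha256(b"pk" + sk).digest()
    return sk, pk


def embed(payload_bits, submit, wait_for_block, sample_amount):
    """Embed one payload bit per block; return the key pairs that were used."""
    kept = []
    i = 0
    while i < len(payload_bits):
        sk, pk = fresh_keypair()
        a = address(pk)
        if lsb(a) != payload_bits[i]:
            continue  # wrong LSB: discard this key pair and try a new one
        submit(a, sample_amount())  # pay a sampled amount to address a
        wait_for_block()            # at most one payment per block
        kept.append((sk, pk))       # Alice keeps control of the funds
        i += 1
    return kept
```

Since Alice retains every private key in `kept`, the money sent to the generated addresses remains under her control, which matches the observation that she loses only possible transaction costs.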

Extraction
Extraction is straightforward. Bob reads the blocks from the chain and scans for any transactions made by Alice. Once Bob detects the secret message start indicator $\lambda$, he can read the encrypted hidden message. Since there is a single payment from $p_k^{(A)}$ in each block, the message can be gathered in the correct order. Extraction is described in Algorithm 3.

Algorithm 3 Extraction Algorithm
Wait until a block appears and read it: C = Read(j)
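The scanning step of extraction can be sketched in Python as follows. This is a minimal illustration under assumptions that are not in the paper: blocks are modeled as lists of `(payer_public_key, recipient_address)` pairs, decryption of the recovered ciphertext bits is left out, and the function names are hypothetical.

```python
def alice_lsbs(blocks, alice_pk):
    """Collect, in block order, the address LSBs of payments made by Alice."""
    bits = []
    for block in blocks:
        for payer_pk, addr in block:
            if payer_pk == alice_pk:
                bits.append(addr[-1] & 1)
    return bits


def find_payload(lsb_stream, start_indicator, cipher_len):
    """Return the cipher_len bits following the first match of the start
    indicator, or None if no complete payload has appeared yet."""
    n = len(start_indicator)
    for i in range(len(lsb_stream) - n + 1):
        if lsb_stream[i:i + n] == start_indicator:
            c = lsb_stream[i + n:i + n + cipher_len]
            return c if len(c) == cipher_len else None
    return None
```

The sketch stops at the first occurrence of the start indicator, so an accidental earlier match would lead to a wrong payload.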

Computational Complexity and Correctness
We shall now show that the embedding can be done in expected polynomial time, excluding the time spent waiting for new blocks to appear. That is, we consider here the computational complexity theoretic notion of "time", which is the number of computational steps needed for the algorithm to finish its task. Then, we show that the method is correct. That is, for a fixed $N \in \mathbb{N}$ and for any payload $(\lambda, c) \in \{0,1\}^N$, Alice can embed it into the blockchain and Bob is able to extract it with high reliability.
Let us first establish the computational complexity of the embedding algorithm.
Proposition 1. Suppose that Submit and $\mu \leftarrow M_{\mathcal{H}}$ run in $O(1)$ for any payment history $\mathcal{H}$. Let $c_E$ be the computational complexity of the encryption algorithm Enc, $c_t$ be the complexity of generating a unique identifier for a payment, $c_{\mathrm{Gen}_\Sigma}$ be the complexity of the digital signature key generation $\mathrm{Gen}_\Sigma$, $c_H$ be the complexity of computing the hash of a public key and $c_\sigma$ be the complexity of generating a signature for a payment. Embed runs in expected time of
$$O\bigl(c_E(N - n_\lambda) + 2N\,(c_{\mathrm{Gen}_\Sigma} + c_H) + N\,(c_t + c_\sigma)\bigr),$$
where $N$ is the number of embedded bits and $n_\lambda$ is the length of the message start indicator.
Proof. Let $m \in \{0,1\}^{N - n_\lambda}$ be arbitrary. First, Alice runs a single encryption of $m$, which takes $c_E(N - n_\lambda)$ steps. The while-loop is repeated until $i$ reaches the total number of embedded bits $N$. Now, $i$ is updated whenever the LSB $a_n$ of $a = H(p_k)$ is equal to $c_i$. Suppose that the while-loop ends in a total of $M$ iterations. Let $A_v$ denote the random variable on $\{0,1\}$ such that $A_v = 1$ whenever $a_n$ obtained from the call to $H$ in the $v$-th while-loop iteration equals $c_i$ and $A_v = 0$ otherwise. Since $H$ is a random oracle, $A_1, A_2, \ldots$ are independent and distributed according to the Bernoulli distribution with probability 1/2.
Let now $I$ denote the random variable corresponding to $i$. We have $I = \sum_{v=1}^{M} A_v$, where $M$ is the number of calls we had to make to $H$. That is, $I$ is distributed according to the binomial distribution $\mathrm{Bin}(M, 1/2)$, which has the expected value of $M/2$. Since we need to match $N$ values, we expect the while-loop to run $M = 2N$ times. In each of the loop iterations, we run $\mathrm{Gen}_\Sigma$. Finally, lines 12-18 are repeated exactly $N$ times.
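The expected number of loop iterations can be checked empirically with a short simulation. The sketch below is only an illustration: it replaces the LSB of a fresh address by a fair coin flip, which is what the random oracle assumption gives, and the names and parameter values are chosen for this example.

```python
import random


def iterations_to_match(n_bits, rng):
    """Count loop iterations until n_bits address LSBs have matched."""
    matched = 0
    iterations = 0
    while matched < n_bits:
        iterations += 1
        # The LSB of a fresh address equals the target bit with probability 1/2.
        if rng.getrandbits(1) == 1:
            matched += 1
    return iterations


rng = random.Random(0)
N = 64
trials = 2000
mean = sum(iterations_to_match(N, rng) for _ in range(trials)) / trials
print(mean / N)  # average iterations per embedded bit, expected to be near 2
```

The printed ratio estimates the average number of key generations per embedded bit, which the argument above predicts to be 2.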
Note that Embed has to wait for a new block to be generated by the blockchain for each of the bits. Therefore, the actual time spent embedding depends also on the blockchain implementation. Even though the computational complexity of embedding is low (virtually close to the complexity of the applied encryption algorithm), for certain blockchains with inefficient block generation the actual time spent embedding can be long. For example, if the block generation takes time $T$, which is significantly greater than the time needed to encrypt $m$, embedding $N$ bits takes time $N \cdot T$, since the majority of the time is spent waiting for new blocks. The same is naturally true for extraction.
Next, we prove that BLOCCE is reliable provided that the length of the message start indicator $\lambda$ is large enough.
Proposition 2. BLOCCE on a simplified ideal blockchain $B$ with a suitably long message start indicator $\lambda$ is correct and reliable. That is, for messages of any length, Bob receives Alice's message with the reliability of at least
$$1 - (L_A - n_\lambda + 1)\, 2^{-n_\lambda},$$
where $L_A$ is the total number of payments Alice has submitted into the blockchain and $n_\lambda$ is the length of the message start indicator $\lambda$.
Proof. By our assumptions, the simplified ideal blockchain is tamper-proof, meaning that the adversary is unable to prevent Alice's submissions from appearing on the chain. Therefore, we may assume that Bob receives every block submitted by Alice. For simplicity, let us assume that all of the message has already appeared on the chain if it has been sent.
Suppose that Alice transmitted $m$. Let us first show that the scheme is correct whenever Bob has detected the correct message start indicator $\lambda$. We shall later show that this happens with high probability. By the description of Embed, Alice submits the ciphertext $c$ directly after $\lambda$ by submitting a single bit for each new block. Since the blockchain is tamper-proof, Bob receives all of these blocks and, by the description of Extract, extracts the correct $c$. Finally, Bob decrypts $c$ to obtain the correct hidden message $m$.
Let us now show that the method is reliable. That is, let us show that Bob is able to detect the correct message start indicator $\lambda$ with high probability. By the description of Extract, Bob first scans the blockchain for all of the payments from $p_k^{(A)}$, extracts the LSB $a_n$ of each address $a$ and scans for the appearance of $\lambda \in \{0,1\}^{n_\lambda}$. There are two ways that the extraction can fail:

1. The LSBs of the addresses of the payments Alice has made before transmitting $(\lambda, c)$ accidentally form $\lambda$, which Bob misinterprets as the start of the hidden message.

2. Alice has not submitted any message, but the LSBs of the addresses of her payments form $\lambda$.

These two cases are similar, but provided that Alice has submitted the same number of payments in both cases, in the latter the reliability is lower. To see this, we observe that in the first case, we are trying to find a false match for $\lambda$ in the subchain of blocks that appeared before the true match for $\lambda$, while in the latter case, we have the whole blockchain to search for the false match. Therefore, we can restrict ourselves to the second case. By our assumptions, Alice submits exactly one payment for each published block. We now have to derive an upper bound on the probability of these payments forming $\lambda$. Now, $H$ is a random oracle and both $\lambda$ and the addresses are sampled from the uniform distribution.
Let $\alpha = \alpha_1 \alpha_2 \ldots \alpha_{L_A} \in \{0,1\}^{L_A}$ be the string of LSBs of the addresses extracted (in order) from all of Alice's payments recorded into the blockchain (thus far). Let $A_1, A_2, \ldots, A_{L_A}$ denote the random variables corresponding to the choice of $\alpha_1, \alpha_2, \ldots, \alpha_{L_A}$, each chosen independently and uniformly at random from $\{0,1\}$. We derive an upper bound for the probability of hitting $\lambda$, $\Pr[\lambda \text{ substring of } \alpha]$. We observe that $\lambda$ can appear in any starting position $i \in \{0, 1, \ldots, L_A - n_\lambda\}$, in which case we have $\lambda = \alpha_{i+1} \alpha_{i+2} \ldots \alpha_{i+n_\lambda}$. Therefore, we estimate $\Pr[\lambda \text{ substring of } \alpha]$ with the sum of the probabilities $\Pr[A_{i+1} A_{i+2} \ldots A_{i+n_\lambda} = \lambda]$ for $i \in \{0, 1, \ldots, L_A - n_\lambda\}$. This sum over-counts, since for large $L_A$, $\lambda$ can appear in multiple positions. However, we are only interested in deriving an upper bound. We have
$$\Pr[\lambda \text{ substring of } \alpha] \leq \sum_{i=0}^{L_A - n_\lambda} \Pr[A_{i+1} A_{i+2} \ldots A_{i+n_\lambda} = \lambda] = \sum_{i=0}^{L_A - n_\lambda} 2^{-n_\lambda} = (L_A - n_\lambda + 1)\, 2^{-n_\lambda}.$$
Therefore, the reliability of BLOCCE is at least
$$1 - (L_A - n_\lambda + 1)\, 2^{-n_\lambda}.$$
Interestingly, the reliability depends essentially only on the length of the message start indicator. Note that we have assumed that Alice has submitted a single bit for each block. If multiple bits are embedded into a single block, Proposition 2 needs to be updated accordingly. Finally, the result also assumes that addresses are not reused.
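The union bound in the proof is easy to evaluate numerically. The Python sketch below assumes only what the argument states: each of the $L_A - n_\lambda + 1$ possible starting positions matches $\lambda$ with probability $2^{-n_\lambda}$. The function name is hypothetical, and the result is clamped to zero when the bound becomes vacuous.

```python
from fractions import Fraction


def reliability_lower_bound(num_payments, n_lambda):
    """Lower bound 1 - (L_A - n_lambda + 1) / 2**n_lambda, clamped at 0."""
    if num_payments < n_lambda:
        # Too few payments for the start indicator to appear at all.
        return Fraction(1)
    false_match = Fraction(num_payments - n_lambda + 1, 2 ** n_lambda)
    return max(Fraction(0), 1 - false_match)
```

For instance, `reliability_lower_bound(1000, 32)` evaluates to 1 - 969/2^32, so a 32-bit start indicator already makes a false match very unlikely for a thousand payments, whereas an 8-bit indicator gives a vacuous bound for the same number of payments.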

Security
In this section, we consider the security of BLOCCE. Following the notions of security for a stegosystem [20], we derive a security definition for the blockchain based covert channel by modeling a chosen hiddentext attack of a probabilistic polynomial time adversary on BLOCCE. We start by listing our assumptions. We then proceed to the formulation of a security definition called payment indistinguishability that requires the adversary to distinguish the payload-carrying payments from random ones. Finally, we show that BLOCCE satisfies this definition.

Assumptions
Our security proofs are based on a simplified ideal blockchain $B$. In particular, we assume that digital signatures are existentially unforgeable and the applied cryptographic hash function is modeled as a random oracle. There are three participants in our scenario:

1. Alice represents the transmitter of our scheme and is known through her (payee identity) public key $p_k^{(A)}$. She has agreed with the recipient Bob beforehand on a secret key $(\lambda, k) \in \{0,1\}^{n_\lambda} \times \{0,1\}^{n_k}$ that is not known to anyone else. She attempts to send a confidential message $m$ to Bob through the simplified ideal blockchain $B$. Both Alice and Bob know the total number of embedded bits $N$. Finally, Alice is aware of her "normal" distribution of payment amounts $M_{\mathcal{H}}$ given her history of payments $\mathcal{H}$ and is able to sample from it.

2. Bob represents the recipient of our scheme. He expects a confidential message from Alice through the blockchain, knows the public key $p_k^{(A)}$ of Alice, as well as the secret key $(\lambda, k)$ and the total number of embedded bits $N$.

3. The adversary attempts to detect the presence of covert communication on the blockchain.

We assume that the adversary knows Alice and her public key $p_k^{(A)}$. The adversary also has complete access to the blockchain $B$ through the oracles Read and Submit. The job of the adversary is to distinguish the secret communication payments from regular payments.

Payment Indistinguishability
We apply a computational indistinguishability based approach for the security definition. In particular, we define a scenario where the adversary has to distinguish the payments containing the hidden message from a set of random payments. To formalize this into a rigorous security definition, we apply the chosen hiddentext attack described in Section 2.3. We model the situation by defining the following payment distinguishing experiment against a stegosystem $\Pi$, in which the adversary $A$ is given, each with probability 1/2, either a set of randomly generated addresses or a set of addresses containing the concealed message, and attempts to tell which of the two it received.
We give the adversary full control to choose the hiddentext message $m$ and to observe the blockchain. However, we do not give the adversary the power to block Alice from sending payments to the blockchain or to prevent Bob from observing the blockchain. We also do not give it access to the private keys generated by Alice or to the secret key $(\lambda, k)$ shared between Alice and Bob. Note that since the digital signature scheme is unforgeable, the adversary cannot masquerade as Alice and forge messages into the blockchain.
The probabilistic polynomial time adversary $A = (A_1, A_2)$ is modeled in two stages $A_1$ and $A_2$. In the first stage, it is given access to the full block history of the blockchain through the oracle Read, the ability to submit payments through the oracle Submit, as well as the public key $p_k^{(A)}$ of Alice. The job of the adversary in this stage is to output a hiddentext message $m$ that Alice is required to send to Bob. In the experiment, a coin is then tossed and, based on the outcome, either the stegosystem $\Pi$ is applied to send $m$ into the blockchain or a set of random valid payments from Alice is generated. In the second stage, $A_2$ is invoked with oracle access to the blockchain and it eventually outputs a bit, trying to distinguish whether the blockchain contained the hidden message or just a set of random valid payments. As with the pseudorandom ciphertext experiment, a string storing the internal state information $S$ of the adversary is also passed from $A_1$ to $A_2$.
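The flow of the two-stage experiment described above can be illustrated with a toy simulation. This is only a sketch under strong simplifying assumptions: the blockchain, the signatures and the oracles Read and Submit are abstracted away, the adversary only sees the sequence of address LSBs, the encryption scheme SE is replaced by a SHA-256 counter-mode keystream used as a stand-in, and the experiment is assumed to output 1 exactly when the adversary guesses the coin. All function names are hypothetical.

```python
import hashlib
import secrets


def keystream_bits(key, n):
    """Toy stand-in for SE: n bits from SHA-256 in counter mode (an assumption)."""
    bits, counter = [], 0
    while len(bits) < n:
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        bits.extend((byte >> shift) & 1 for byte in block for shift in range(8))
        counter += 1
    return bits[:n]


def random_bits(n):
    """n independent uniformly random bits."""
    return [secrets.randbits(1) for _ in range(n)]


def embed_lsbs(lam, key, message_bits):
    """LSB sequence of the payload: lambda followed by the encrypted message."""
    pad = keystream_bits(key, len(message_bits))
    return lam + [m ^ p for m, p in zip(message_bits, pad)]


def pay_dist_exp(stage1, stage2, n_lambda=16):
    """One run of the toy experiment; returns 1 if the adversary guesses the coin."""
    lam = random_bits(n_lambda)          # secret start indicator
    key = secrets.token_bytes(16)        # secret encryption key
    message_bits, state = stage1()       # adversary chooses the hiddentext
    b = secrets.randbits(1)              # experiment coin
    if b == 1:
        observed = embed_lsbs(lam, key, message_bits)
    else:
        observed = random_bits(n_lambda + len(message_bits))
    return int(stage2(state, observed) == b)


def zero_message():
    """Example first stage: ask for an all-zero hiddentext, keep no state."""
    return [0] * 256, None


def count_ones_guess(state, observed):
    """Example second stage: guess 'embedded' if ones are clearly under-represented."""
    return int(sum(observed) < 0.4 * len(observed))


trials = 400
success_rate = sum(
    pay_dist_exp(zero_message, count_ones_guess) for _ in range(trials)
) / trials
```

With the keystream in place, the bit-counting adversary should succeed in roughly half of the runs, which is what guessing achieves. The simulation illustrates the structure of the experiment only and is no evidence for the security of any concrete instantiation.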
The rigorous definition of the payment distinguishing experiment is the following.

$(s_k, p_k) \leftarrow \mathsf{Gen}_\Sigma(1^s)$
$(m, S) \leftarrow A_1^{\mathsf{Read},\mathsf{Submit}}(p_k)$ ▷ $S$ is internal state information of the adversary that can be passed to the second stage
…
▷ Actual message is sent to the blockchain
$\mathsf{Embed}((\lambda, k), m, B)$
…
end if
end procedure

Definition 3. Let $\Pi = (B, \mathsf{Gen}_\Pi, \mathsf{Embed}, \mathsf{Extract})$ be a blockchain stegosystem based on a simplified ideal blockchain $B = (C, \Sigma, H, \mathsf{Read}, \mathsf{Submit})$ and let $A = (A_1, A_2)$ be a two-stage probabilistic polynomial time adversary. The payment distinguishing experiment is defined by Algorithm 4.
We also define the advantage of an adversary in detecting the hidden message based on the payment distinguishing experiment.

Definition 4. The payment detection advantage of an adversary $A$ on a blockchain stegosystem $\Pi$ on a simplified ideal blockchain $B$ is denoted by $\mathrm{Adv}^{\mathrm{PAY\_DETECT}}_{A,\Pi,B}(s)$.

If the payment detection advantage of the adversary is significantly greater than 0, then, in practice, the adversary is able to detect the concealed message. For a secure system, we want this advantage to be negligible.

Definition 5. The blockchain stegosystem $\Pi$ securely embeds into the blockchain $B$ if for every probabilistic polynomial time adversary $A$, there is a negligible function $\epsilon$ such that $\mathrm{Adv}^{\mathrm{PAY\_DETECT}}_{A,\Pi,B}(s) \le \epsilon(s)$ for every $s \ge 1$.

Security Proof of BLOCCE
We shall now show that BLOCCE securely embeds into the blockchain. In particular, we derive an algorithm that reduces the problem of distinguishing the ciphertexts of the encryption scheme SE used by BLOCCE to the problem of distinguishing the payments made with BLOCCE. Since SE is assumed to have pseudorandom ciphertexts, this shows that there is no adversary that succeeds in the payment distinguishing experiment with non-negligible advantage.

Proposition 3. BLOCCE securely embeds into a simplified ideal blockchain $B$. For every probabilistic polynomial time adversary $A$ there is a probabilistic polynomial time adversary $A'$ such that
$$\mathrm{Adv}^{\mathrm{PAY\_DETECT}}_{A,\mathrm{BLOCCE},B}(s) \le \mathrm{Adv}^{\mathrm{PRC}}_{A',\mathrm{SE}}(s) \le \epsilon(s),$$
where SE is the encryption scheme used in BLOCCE and $\epsilon$ is a negligible function.
Proof. Let $A = (A_1, A_2)$ be any two-stage probabilistic polynomial time algorithm considered as an adversary against BLOCCE. We need to show that there is a negligible function $\epsilon$ such that $\mathrm{Adv}^{\mathrm{PAY\_DETECT}}_{A,\mathrm{BLOCCE},B}(s) \le \epsilon(s)$.
Suppose that there was an adversary $A$ that succeeds with a non-negligible advantage. Based on such an adversary, we shall construct a probabilistic polynomial time algorithm $A'$ that applies $A$ to achieve a high ciphertext distinguishing advantage for the symmetric encryption scheme SE.
In particular, we show that the ciphertext distinguishing advantage of $A'$ is at least the advantage of $A$ in the payment distinguishing experiment. By the assumptions, SE has pseudorandom ciphertexts (see Section 4.2), so the ciphertext distinguishing advantage is negligible for every adversary, and the claim follows.
For this, let $A' = (A'_1, A'_2)$ be a two-stage adversary that applies $A = (A_1, A_2)$, given below. In the description, we need to save the status information $S$ that $A_1$ outputs in order to invoke $A_2$ in the later stage. In addition, since we are emulating a payment distinguishing experiment, we also need to initialize a blockchain and to maintain its state. Therefore, we store the state and internal information of the blockchain into an information string $S'$ for a coherent second stage $A'_2$.
The adversary $A' = (A'_1, A'_2)$ is described in Algorithms 5 and 6.

Algorithm 5 First Stage of the Adversary
Initialize a blockchain $B$
$(m, S) \leftarrow A$ …
…
Embed $\lambda \| c$ into $B$ by simulating $\mathsf{Embed}$
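The shape of the reduction can be sketched in code. The following is a simplified illustration, not a transcription of Algorithms 5 and 6: the simulated blockchain is reduced to the sequence of address LSBs, the wrapped adversary is given as two plain callables, and the wrapper is assumed to sample the start indicator $\lambda$ itself. The names `make_prc_adversary`, `pay_stage1` and `pay_stage2` are hypothetical.

```python
import secrets


def make_prc_adversary(pay_stage1, pay_stage2, n_lambda=16):
    """Wrap a payment-distinguishing adversary into a ciphertext-distinguishing one.

    pay_stage1() -> (message_bits, state)
    pay_stage2(state, observed_lsbs) -> guessed bit
    """

    def prc_stage1():
        # Run the first stage of the wrapped adversary and forward its
        # hiddentext as the chosen plaintext of the ciphertext experiment.
        message_bits, state = pay_stage1()
        return message_bits, state

    def prc_stage2(state, challenge_bits):
        # The challenge is either an encryption of the message or a random
        # string. Prepend a fresh random start indicator and present the
        # result as the LSBs of Alice's payments on a simulated blockchain.
        lam = [secrets.randbits(1) for _ in range(n_lambda)]
        simulated_lsbs = lam + list(challenge_bits)
        # Output whatever the wrapped adversary outputs.
        return pay_stage2(state, simulated_lsbs)

    return prc_stage1, prc_stage2
```

Because the wrapper outputs exactly the bit of the wrapped adversary, it guesses its own coin correctly whenever the wrapped adversary does, which is the property the proof relies on.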

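2. Suppose now that $D = 0$ and $c$ is a uniformly random string. By the description of $\mathsf{Gen}_{\mathrm{BLOCCE}}$, $\lambda$ is also uniformly random, meaning that a uniformly random string $\lambda \| c$ gets embedded into the blockchain. By the description of the payment distinguishing experiment, this is equal to the case $b = 0$ and
$$\Pr[A' \text{ succeeds in PRC\_EXP} \mid D = 0] = \Pr[A \text{ succeeds} \mid D = 0].$$

We have established that $A'$, which we constructed to be an adversary to distinguish the ciphertexts of SE from random, succeeds in its experiment if and only if $A$ succeeds in its own payment distinguishing experiment. Therefore,
$$\Pr[A' \text{ succeeds in PRC\_EXP}] = \Pr[A \text{ succeeds}].$$
By the definition of advantage,
$$\mathrm{Adv}^{\mathrm{PRC}}_{A',\mathrm{SE}}(s) = \mathrm{Adv}^{\mathrm{PAY\_DETECT}}_{A,\mathrm{BLOCCE},B}(s).$$
By the definition of BLOCCE (see Section 4.2), the applied symmetric encryption scheme SE has pseudorandom ciphertexts under a chosen plaintext attack, and there is a negligible function $\epsilon$ such that $\mathrm{Adv}^{\mathrm{PRC}}_{A',\mathrm{SE}}(s) \le \epsilon(s)$ for every $s \ge 1$ (see Section 2.2). Since $A$ was any two-stage probabilistic polynomial time adversary and
$$\mathrm{Adv}^{\mathrm{PAY\_DETECT}}_{A,\mathrm{BLOCCE},B}(s) = \mathrm{Adv}^{\mathrm{PRC}}_{A',\mathrm{SE}}(s) \le \epsilon(s),$$
we have the claim and BLOCCE securely embeds into a simplified ideal blockchain.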

Discussion and Future Work
To simplify our investigations, we restricted ourselves to the embedding of a single bit for each block. For many applications, such as Bitcoin, the time to publish a new block is counted in minutes instead of seconds. Therefore, the throughput of our scheme is low. However, it is easy to increase the number of embedded bits per block. One way to do it would be to match multiple LSBs of the address. However, in such a case, the expected computation time for embedding grows exponentially in the number of bits, since the bits are drawn from the random oracle. Proposition 2 would also need to be updated accordingly. It is more efficient to submit several payments into the same block. In such a case, the ordering of the message bits has to be ensured, for example, by using the unique payment identifier $t$ or the public payer keys.
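The exponential cost of matching several LSBs at once can be made concrete with a short sketch. It assumes a toy address derivation, namely SHA-256 of a random 32-byte string standing in for hashing a freshly generated public key with the random oracle; the names `toy_address` and `find_matching_address` are hypothetical. Under the random oracle assumption each attempt matches $k$ target bits with probability $2^{-k}$, so the expected number of attempts is $2^k$.

```python
import hashlib
import secrets


def toy_address(public_key):
    """Toy address: SHA-256 of the key, a stand-in for the random oracle H."""
    return int.from_bytes(hashlib.sha256(public_key).digest(), "big")


def find_matching_address(target_bits, max_tries=1_000_000):
    """Draw fresh keys until the k address LSBs equal target_bits.

    target_bits lists the k lowest bits with the least significant bit last.
    Returns the matching key and the number of attempts that were needed.
    """
    if not target_bits:
        raise ValueError("need at least one target bit")
    k = len(target_bits)
    target = int("".join(str(bit) for bit in target_bits), 2)
    mask = (1 << k) - 1
    for tries in range(1, max_tries + 1):
        key = secrets.token_bytes(32)  # stand-in for generating a new key pair
        if toy_address(key) & mask == target:
            return key, tries
    raise RuntimeError("no matching address found within max_tries")


if __name__ == "__main__":
    for k in (1, 4, 8):
        runs = [find_matching_address([1] * k)[1] for _ in range(200)]
        print(k, sum(runs) / len(runs))  # averages should be near 2**k
```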
Our method requires the user to distribute the message bits over several payments. For many contemporary blockchains that apply the proof-of-work paradigm for their consensus mechanism, the transaction costs may be significant for larger hidden messages. However, there are newer consensus mechanisms that aim to address the energy usage issue of the proof-of-work paradigm, to improve block generation efficiency and, ultimately, to possibly remove transaction costs in certain applications. Such mechanisms include proof-of-stake, distributed proof-of-stake and Byzantine fault tolerance based mechanisms. For example, at the moment, Ethereum is planning on moving to the proof-of-stake paradigm. While there will be high transaction costs in certain use cases of blockchain, we believe that, in many applications, transaction costs will not be an issue and our method will prove to be useful.
The performance of our method can also be improved by pre-computing a list of $L$ addresses, where $L$ is significantly greater than $N$, the total length of the embedded message. Since the LSBs of these generated addresses are random, approximately one half will be ones and the rest zeros. Once Alice is ready to embed the payload $(\lambda, c)$, she can pick the addresses in order from the pre-generated list. This approach would also mitigate the computational complexity of embedding multiple bits into a single payment. Furthermore, addresses to embed $\lambda$ can be pre-computed immediately after the key $(\lambda, k)$ has been agreed on. However, we leave these considerations for future work.
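One possible reading of the pre-computation idea is sketched below. It is an assumption of this sketch, not something the scheme prescribes, that the pre-generated keys are sorted into two queues by the LSB of their address, and that a toy address (SHA-256 of a random 32-byte string) stands in for a real one. The names `precompute_pool` and `take_addresses` are hypothetical.

```python
import hashlib
import secrets
from collections import deque


def address_lsb(public_key):
    """LSB of a toy address (SHA-256 of the key, a stand-in for H)."""
    return hashlib.sha256(public_key).digest()[-1] & 1


def precompute_pool(size):
    """Generate `size` keys ahead of time and sort them by address LSB."""
    pool = {0: deque(), 1: deque()}
    for _ in range(size):
        key = secrets.token_bytes(32)  # stand-in for a fresh key pair
        pool[address_lsb(key)].append(key)
    return pool


def take_addresses(pool, payload_bits):
    """Pick one pre-generated key per payload bit, oldest first."""
    chosen = []
    for bit in payload_bits:
        if not pool[bit]:
            raise LookupError("pool exhausted for bit %d" % bit)
        chosen.append(pool[bit].popleft())
    return chosen
```

Each key is removed from the pool when it is used, which matches the requirement that addresses are not reused.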
It should be noted that it is important that the hidden message is first encrypted using an encryption scheme SE that has pseudorandom ciphertexts. If this is not the case, the adversary is able to detect the non-random nature of the LSBs of the addresses in Alice's payments. For the same reason, the underlying blockchain should not allow the reuse of addresses. In addition to being a privacy risk [23][24][25], it would render the payment indistinguishability approach to security inapplicable. For such systems, we would need to formulate a security definition that is based on the probability distribution of "normal" payments of Alice, and the payload-containing payments should be indistinguishable from this distribution. We leave these considerations also for future work.
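The effect of skipping the encryption step can be seen in a small experiment. The sketch below compares the fraction of one-bits in an ASCII plaintext with the fraction after XORing it with a keystream. The keystream (SHA-256 in counter mode) is only a toy stand-in for an encryption scheme with pseudorandom ciphertexts, and the names `to_bits`, `ones_fraction` and `toy_encrypt` are hypothetical.

```python
import hashlib


def to_bits(data):
    """Bits of a byte string, most significant bit of each byte first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def ones_fraction(bits):
    """Fraction of bits that are equal to one."""
    return sum(bits) / len(bits)


def toy_encrypt(key, data):
    """XOR with a SHA-256 counter-mode keystream (toy stand-in for SE)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


plaintext = b"attack at dawn " * 40
plain_bias = ones_fraction(to_bits(plaintext))
cipher_bias = ones_fraction(to_bits(toy_encrypt(b"demo key", plaintext)))
```

ASCII text never sets the top bit of a byte and repeats a small alphabet, so `plain_bias` sits visibly below one half, while `cipher_bias` should be close to one half. An adversary counting ones in the address LSBs could exploit exactly this gap.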
In this paper, we have not implemented BLOCCE in practice. Instead, we have considered it in the simplified ideal blockchain model that abstracts away details that are not relevant for the theoretical investigations. For example, the network is completely abstracted away in the simplified model. However, details such as the network are relevant when considering a practical implementation of the scheme. We leave it as future work to investigate the secure implementation of BLOCCE using an existing blockchain such as Ethereum.

Conclusions
We suggest BLOCCE, the first provably secure method of implementing a covert communication channel over a blockchain. Our proofs are given in the random oracle model, where the cryptographic hash function used by the blockchain is modeled as a random oracle. We formulate a simplified ideal blockchain that models the blockchain implementations underlying existing cryptocurrencies. Based on this model, we suggest a method of embedding a single bit for each block using payments submitted to the blockchain. We show that the method is reliable and runs in expected polynomial time. The method can be generalized to embed multiple bits to increase the throughput. To model the security of covert channels on a blockchain, we formulate the notion of payment indistinguishability, where the transmitted hidden message should be computationally indistinguishable from random payments. Finally, we show that BLOCCE satisfies this definition.

Figure 1. A simplified overview of the suggested method. Alice submits payments to the blockchain such that the LSBs of the addresses form the message $m$. Bob can read the message bits from those addresses and form $m$. In the actual method, encryption is also used and the start of the message is indicated.

Interpret $a$ as a bit representation $a_1 a_2 \ldots a_n \in \{0,1\}^n$
if $a_n = c_i$ then

Algorithm 4 Payment Distinguishing Experiment
procedure $\mathrm{PAY\_DIST\_EXP}^{\Pi,B}_{A}(1^s)$
…
end procedure

If $A$ is probabilistic polynomial time, so is $A'$. Suppose that $A'$ was run in a PRC_EXP experiment. Let $D$ denote the random variable corresponding to the experiment coin toss ($b$ of PRC_EXP line 4) such that $D = 1$ if $A'$ was given the correct $c \leftarrow \mathsf{Enc}(k, m)$ under a random key $k$ and $D = 0$ if it was given a random $c \leftarrow U(\{0,1\}^{N - n_\lambda})$. Depending on $D$, we have the following two cases:

1. Suppose first that $D = 1$ and $A'$ was given the correct $c \leftarrow \mathsf{Enc}(k, m)$. Then $A'$ embeds $\lambda \| c$ into the blockchain, which follows the payment distinguishing experiment for $A$ for the case $b = 1$. Since $A'_2$ outputs the same bit $b$ as $A_2$, we have
$$\Pr[A' \text{ succeeds in PRC\_EXP} \mid D = 1] = \Pr[A \text{ succeeds} \mid D = 1].$$
The variables used in Embed, Extract and the following proofs have been collected into Table 1 for easy reference.
for any payment $P \in C$ do
if $P$ is from $p_k^{(A)}$ then
Extract address $a$ from $P$ and get the LSB $a_n$
Scan if we have found the entire $\lambda \in \{0,1\}^{n_\lambda}$
$C = \mathsf{Read}(j)$
if $C = \perp$ then
Wait until a block appears and read it: $C = \mathsf{Read}(j)$
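The surviving extraction steps above (collect the LSBs of Alice's addresses, scan for the start indicator $\lambda$, then read the payload) can be sketched as follows. This is a simplified illustration that works on an already collected list of LSBs instead of reading blocks one by one; it assumes that the payload $\lambda \| c$ has $N$ bits in total, so that $c$ has $N - n_\lambda$ bits. The function name `extract_payload` is hypothetical.

```python
def extract_payload(lsbs, lam, total_bits):
    """Return the bits following the first occurrence of lam in lsbs.

    lsbs       -- LSBs of the addresses in Alice's payments, in order
    lam        -- the start indicator lambda as a list of bits
    total_bits -- N, the length of lambda followed by the ciphertext
    Returns None while the indicator or the full payload is not yet available.
    """
    n_lambda = len(lam)
    payload_len = total_bits - n_lambda
    for start in range(len(lsbs) - n_lambda + 1):
        if lsbs[start:start + n_lambda] == lam:
            begin = start + n_lambda
            payload = lsbs[begin:begin + payload_len]
            if len(payload) < payload_len:
                return None  # indicator found, but more blocks are needed
            return payload
    return None  # indicator not seen yet


if __name__ == "__main__":
    lam = [1, 1, 0, 1]
    history = [0, 0, 1] + lam + [1, 0, 1, 1, 0] + [0, 1]
    print(extract_payload(history, lam, 9))
```

The returned bits are still the ciphertext; in the scheme Bob would decrypt them with the shared key $k$ afterwards.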

Table 1. Used variables and notations.