A Small Subgroup Attack on Bitcoin Address Generation

Abstract: We show how a small subgroup confinement-like attack may be mounted on the Bitcoin address generation protocol by inspecting a special subgroup of the group associated with point multiplication. This approach does not undermine the security of the system, but it highlights the importance of using fair random sources during private key selection.


Introduction
Since its appearance in 2008 [1], Bitcoin (BTC) has attracted considerable attention from many communities, spanning several different fields, such as cryptography, computer science, and finance [2]. Although it was born as a decentralized digital currency, its underlying blockchain structure has been rapidly adopted for diverse purposes [3,4]. Instances of its use, beyond modern cryptocurrencies, are smart contracts [5], financial services [6], supply chain monitoring [7], and digital notarization [8].
The security of these distributed ledgers has been widely studied and usually relies on a mixture of cryptographic and consensus-reaching algorithms [9]. In the renowned case of BTC, consensus is reached via a hashcash proof of work [10], while public-key (asymmetric) cryptography is employed to ensure transaction legitimacy. In a nutshell, apart from some special cases, the owner of a bitcoin is any person who can prove (following the BTC protocol) ownership of the private key associated with the public address to which that bitcoin was previously sent.
Thus, it is crucial for such a cryptosystem to generate private keys that cannot be recovered from public information. In the BTC case, this depends on the difficulty of the Elliptic Curve Discrete Logarithm Problem (ECDLP) and on the preimage resistance of some hash functions. However, if the keys are not uniformly distributed, there may be a (relatively) small portion of the keyspace where they can be easily found (see, e.g., Reference [11]).
In this short note, we pursue the latter idea. In 2017, we conjectured that the key generation algorithms used by some BTC wallets to select private keys were flawed, that is, that there was a non-negligible probability that some private keys would lie inside a small portion of the keyspace with prescribed algebraic properties, small enough to be inspected entirely. More precisely, given the order $q$ of the BTC curve, in 2018 we scanned a reasonably small subgroup of the multiplicative group $(\mathbb{Z}/q\mathbb{Z})^*$ and found (against all odds) a few public addresses corresponding to elements of this subgroup. Immediately afterwards, we devised a method to warn the anonymous holders of those bitcoins about the problem. We then followed a prudent responsible disclosure policy and did not mention this note publicly until 2020 [12].

This paper is organized as follows: we begin by recalling, in Section 2, the mathematical notions employed for BTC address generation, while, in Section 3, the aforementioned group of weak private keys is determined and analyzed. The effectiveness of the method and possible directions for further work are finally discussed in Section 4.

Elliptic Curve Discrete Logarithm Problem
In this section, we recall some basic notions and facts about elliptic curves and their related discrete logarithm problem. For a more detailed discussion of these topics, we refer to References [13,14].
Since we are interested here only in cryptographic applications of elliptic curves, we present this celebrated mathematical object in its simplest version (the short Weierstrass form of a non-singular planar cubic). In the same spirit, in this short note the underlying field $\mathbb{F}$ may always be considered finite [15]. Given a prime integer $p$, we denote the prime field with $p$ elements by $\mathbb{F}_p$.

Definition 1 (Elliptic curve). Let $\mathbb{F}$ be a field of characteristic different from 2 and 3 and let $A, B \in \mathbb{F}$ be elements such that $\Delta = 4A^3 + 27B^2 \neq 0$. The elliptic curve defined by $A, B$ over $\mathbb{F}$ is the set
$$E = \{(x, y) \in \mathbb{F} \times \mathbb{F} : y^2 = x^3 + Ax + B\} \cup \{O\},$$
where $O$ denotes the point at infinity.

It is well known that on an elliptic curve $E$ one may define the chord-tangent point-addition law, which makes $E$ into an abelian group with $O$ as the identity element and with inverses given by $-(x, y) = (x, -y)$.
This operation reflects the following geometric construction: given two points $P_1, P_2 \in E$, the point $P_1 + P_2$ is the inverse of $Q$, the third point of intersection of the curve with the line connecting $P_1$ and $P_2$. When the underlying field $\mathbb{F}$ is finite, this construction still holds with a toric identification of $\mathbb{F} \times \mathbb{F}$, as in Figure 1. The point at infinity is a formal point that may be considered to be vertically aligned with every pair of inverse curve points.
The iterated sum of a given point naturally defines an action of Z on E, as follows.
Definition 2 (Scalar multiplication). Let $k$ be a positive integer and let $P$ be a point on an elliptic curve. We define $k \cdot P$ as
$$k \cdot P = \underbrace{P + P + \cdots + P}_{k \text{ times}}.$$
Given a point $P \in E$, we denote by $\langle P \rangle$ the subgroup generated by $P$ in $E$, i.e., all the points of $E$ that may be written as $k \cdot P$ for some $k \in \mathbb{Z}$.

Problem 1 (ECDLP). Let $E$ be an elliptic curve defined over a finite field $\mathbb{F}$, and let $P \in E$ be one of its points, called the base-point. Finding the discrete logarithm of any $Q \in \langle P \rangle$ amounts to finding an integer $k \in \mathbb{Z}$ such that $Q = k \cdot P$.
While computing a point multiple is computationally easy, solving the discrete logarithm problem is considered hard, except for a few special cases [13]. For this reason, scalar multiplication over elliptic curves is considered a valid one-way function, on which cryptographic protocols may be designed. Elliptic curves have many other cryptographically desirable properties, for which we refer to Reference [16].
In practical implementations, curves of large prime order are considered. The point groups of such curves are cyclic, so that every point other than $O$ is a generator and a valid base-point.
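The asymmetry between scalar multiplication and the discrete logarithm can be illustrated with a minimal sketch. The toy curve $y^2 = x^3 + 7$ over $\mathbb{F}_{17}$, the point $(1, 5)$, and all function names below are assumptions chosen here for illustration only; they do not come from the BTC protocol. Scalar multiplication uses double-and-add, whereas the logarithm is recovered by exhaustive search, which is feasible only because the toy group is tiny.

```python
# Toy curve y^2 = x^3 + A*x + B over F_17 (illustrative parameters only).
P_MOD = 17
A, B = 0, 7
O = None  # the point at infinity, identity of the group


def on_curve(pt):
    """Return True if pt is O or satisfies the curve equation."""
    if pt is O:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def add(p1, p2):
    """Chord-tangent addition of two curve points."""
    if p1 is O:
        return p2
    if p2 is O:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O  # p2 is the inverse of p1
    if p1 == p2:
        # Tangent slope (doubling).
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        # Chord slope.
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def mul(k, pt):
    """Compute k * pt with double-and-add, using O(log k) additions."""
    result, addend = O, pt
    while k > 0:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


def dlog_bruteforce(base, target, bound=1000):
    """Find k with k * base == target by trying k = 0, 1, 2, ... (O(k) additions)."""
    acc = O
    for k in range(bound):
        if acc == target:
            return k
        acc = add(acc, base)
    return None


G = (1, 5)  # a point on the toy curve, since 5^2 = 25 = 8 = 1 + 7 (mod 17)
```

For instance, `dlog_bruteforce(G, mul(11, G))` walks through the multiples of `G` one by one, which is exactly the strategy that becomes infeasible when the group order has hundreds of bits.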

The Bitcoin Address Generation
The BTC system is based on public-key cryptography: in short, addresses are derived from public keys, which are in turn derived from private keys held by users. All these steps are performed through one-way functions, which are considered infeasible to invert in a reasonable time.

Public Key Generation
In the BTC system, both private and public keys depend on an elliptic curve $E$, known as Secp256k1, which is specified by the Standards for Efficient Cryptography Group [17]. The parameters of $E$ are $A = 0$ and $B = 7$ over the prime field $\mathbb{F}_p$ with $p = 2^{256} - 2^{32} - 977$, i.e., its affine points form the zero set of
$$y^2 - x^3 - 7.$$
The base-point is defined as $P = (P_x, P_y)$, where
$P_x = 55066263022277343669578718895168534326250603453777594175500187360389116729240$,
$P_y = 32670510020758816978083085130507043184471273380659243275938904335757337482424$.
It may be verified that this curve is a cyclic group of prime order
$q = 115792089237316195423570985008687907852837564279074904382605163141518161494337$;
thus, $E$ is isomorphic to $\mathbb{Z}_q$ via the mapping $P \mapsto 1$. A private key is any integer $1 \le k \le q - 1$, so that the key space may be identified with $\mathbb{Z}_q^*$.
The corresponding public key is the point $k \cdot P$ of $E$. This curve may be visualized as a set of scattered points inside the $\mathbb{F}_p \times \mathbb{F}_p$ plane with a prescribed base-point $P$, whose multiples generate the whole curve, as in the small example of Figure 2.
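As a hedged illustration of the key derivation just described, the following sketch computes $k \cdot P$ on Secp256k1 with plain Python integers. The constants are the public domain parameters of the standard curve; the function names (`ec_add`, `ec_mul`, `public_key`) are hypothetical and introduced here only for this example. The code is not constant-time and is not meant for handling real keys.

```python
# Public Secp256k1 domain parameters (curve y^2 = x^3 + 7 over F_p).
P_FIELD = 2**256 - 2**32 - 977
Q_ORDER = 115792089237316195423570985008687907852837564279074904382605163141518161494337
BASE = (
    55066263022277343669578718895168534326250603453777594175500187360389116729240,
    32670510020758816978083085130507043184471273380659243275938904335757337482424,
)
INFINITY = None  # identity element of the curve group


def ec_add(p1, p2):
    """Chord-tangent addition on y^2 = x^3 + 7 over F_p."""
    if p1 is INFINITY:
        return p2
    if p2 is INFINITY:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return INFINITY
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P_FIELD) % P_FIELD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD) % P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    y3 = (lam * (x1 - x3) - y1) % P_FIELD
    return (x3, y3)


def ec_mul(k, point):
    """Double-and-add scalar multiplication."""
    result, addend = INFINITY, point
    while k > 0:
        if k & 1:
            result = ec_add(result, addend)
        addend = ec_add(addend, addend)
        k >>= 1
    return result


def public_key(k):
    """Map a private key 1 <= k <= q - 1 to the public point k * P."""
    if not 1 <= k < Q_ORDER:
        raise ValueError("private key out of range")
    return ec_mul(k, BASE)
```

Under these definitions, the private key $q - 1$ (that is, $-1$ modulo $q$) yields the inverse of the base-point, so the keys $1$ and $-1$ correspond to the two points sharing the $x$-coordinate $P_x$.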

Address Generation
Given any public key $Q = (Q_x, Q_y)$, the corresponding address may be computed by first applying two hash functions consecutively, SHA-256 [18] and RIPEMD-160 [19], then adding an initial byte and four final checksum bytes, and finally encoding the result to Base58 [20]. To sum up, the BTC address is obtained from any public key (we are omitting a few details) by performing
$$\mathrm{Base58}\big(c \,\|\, h \,\|\, \mathrm{checksum}\big), \qquad h = \text{RIPEMD-160}(\text{SHA-256}(Q)),$$
where $\|$ means string concatenation, and $c$ is a prescribed constant.
For the following discussion, it is important to notice that SHA-256 turns any input element into a 256-bit string, while RIPEMD-160 squeezes it into a 160-bit string. Therefore, even if the final address is 200 bits long, there may be, at most, $2^{160} \approx 10^{48}$ distinct BTC addresses.
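The encoding steps above can be sketched as follows. Several details are assumptions made for this example rather than statements from the text: the public key is serialized in uncompressed form (a `0x04` byte followed by the two 32-byte coordinates), the prescribed constant is taken to be the version byte `0x00`, and the checksum is taken to be the first four bytes of a double SHA-256. The call `hashlib.new("ripemd160")` works only if the local OpenSSL build exposes RIPEMD-160, which is not guaranteed on every system.

```python
import hashlib

# Base58 alphabet: digits and letters without 0, O, I, and l.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58(data: bytes) -> str:
    """Encode bytes in Base58; each leading zero byte becomes a '1'."""
    n = int.from_bytes(data, "big")
    digits = ""
    while n > 0:
        n, r = divmod(n, 58)
        digits = B58_ALPHABET[r] + digits
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + digits


def base58check(version: bytes, payload: bytes) -> str:
    """Append a 4-byte double-SHA-256 checksum, then Base58-encode."""
    body = version + payload
    checksum = hashlib.sha256(hashlib.sha256(body).digest()).digest()[:4]
    return base58(body + checksum)


def serialize_uncompressed(point) -> bytes:
    """Assumed serialization: 0x04 || x (32 bytes) || y (32 bytes)."""
    x, y = point
    return b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")


def hash160(data: bytes) -> bytes:
    """RIPEMD-160(SHA-256(data)); needs RIPEMD-160 support in hashlib."""
    sha = hashlib.sha256(data).digest()
    return hashlib.new("ripemd160", sha).digest()


def address_from_point(point) -> str:
    """Address of a public key under the assumptions stated above."""
    return base58check(b"\x00", hash160(serialize_uncompressed(point)))
```

Since `hash160` returns 20 bytes, the payload passed to Base58 is always 25 bytes (200 bits), while only the 160 hash bits actually vary; this is the counting argument used in the text.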

Subgroup Detection
Since private keys are elements of $\mathbb{Z}_q^*$, a straightforward brute-force attack on the Bitcoin system seems infeasible, as inverting the map
$$\mathbb{Z}_q^* \to \{\text{BTC addresses}\}, \qquad k \mapsto \text{address of } k \cdot P$$
would imply solving an ECDLP instance. However, there are a few small subgroups $H \le \mathbb{Z}_q^*$ that may be inspected, for which an exhaustive computation of all the possible keys and corresponding addresses may be carried out. In this way, one may compute the inverse of the restricted map
$$H \to \{\text{BTC addresses}\}, \qquad k \mapsto \text{address of } k \cdot P.$$

Since the keys are supposed to be uniformly distributed, there is no probabilistic argument suggesting their presence in specific small subgroups. However, assuming that this is the case, we need to choose a suitable subgroup. To this end, by considering the factorization of $q - 1$ into prime integers,
$$q - 1 = 2^6 \cdot 3 \cdot 149 \cdot 631 \cdot p_1 \cdot p_2 \cdot p_3,$$
where $p_1, p_2, p_3$ are three large primes, it is not difficult to check that the maximal subgroup of moderate size (i.e., one that can be checked today with an average computer) contains $N$ elements, where
$$N = 2^6 \cdot 3 \cdot 149 \cdot 631 = 18{,}051{,}648.$$
Such a group may be easily produced by taking any primitive element $t$ of $\mathbb{Z}_q^*$, such as $t = 7$, and considering the element $g = t^{p_1 p_2 p_3}$, which generates the subgroup
$$H = \langle g \rangle = \{ g^i : 0 \le i < N \}.$$

Indeed, we summarize in the following theorem two well-known results.

Theorem 1. Let $\mathbb{F}$ be a field. Then, any finite subgroup $G \le \mathbb{F}^*$ is cyclic. Moreover, for every positive integer $M$ dividing $|G|$, there is a unique subgroup $H \le G$ such that $|H| = M$.
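The construction of the subgroup of order $N = 2^6 \cdot 3 \cdot 149 \cdot 631$ can be sketched as follows. The decimal values of the three large prime factors of $q - 1$ are supplied here as an assumption and are checked by an assertion; the helper names are hypothetical. The sketch tries $t = 7$ first and falls back to other small bases, since by Theorem 1 any element of exact order $N$ generates the same subgroup.

```python
from itertools import islice

Q = 115792089237316195423570985008687907852837564279074904382605163141518161494337
SMALL_PRIMES = (2, 3, 149, 631)
N = 2**6 * 3 * 149 * 631  # order of the subgroup H

# Assumed values of the three large prime factors of q - 1.
P1 = 107361793816595537
P2 = 174723607534414371449
P3 = 341948486974166000522343609283189
assert N * P1 * P2 * P3 == Q - 1
COFACTOR = P1 * P2 * P3


def subgroup_generator(t):
    """Return g = t^(p1*p2*p3) mod q if it has exact order N, else None."""
    g = pow(t, COFACTOR, Q)
    # g^N = 1 always holds; the order is exactly N iff no proper
    # divisor N/r (r prime) already sends g to 1.
    if all(pow(g, N // r, Q) != 1 for r in SMALL_PRIMES):
        return g
    return None


def find_generator():
    """Try t = 7 first, then other small bases."""
    for t in (7, *range(2, 100)):
        g = subgroup_generator(t)
        if g is not None:
            return g
    raise ValueError("no generator found among the small bases tried")


G_H = find_generator()


def subgroup_keys():
    """Yield the N candidate private keys 1, g, g^2, ..., g^(N-1) mod q."""
    k = 1
    for _ in range(N):
        yield k
        k = k * G_H % Q


FIRST_KEYS = list(islice(subgroup_keys(), 5))
```

Because $N$ is even, $g^{N/2}$ is the unique element of order 2, namely $-1$ modulo $q$; together with $g^0 = 1$, this is why the two trivial keys belong to $H$.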

Subgroup Inspection
The group $H$ as previously defined has fewer than 20 million elements; therefore, we were able to construct, in a few days, the BTC addresses generated by all private keys $k \in H$ and to check whether they had appeared in the BTC blockchain between its creation and 2018.
We recall that an address appears in the blockchain whenever it receives any amount of bitcoin. Note that the number of addresses in the BTC blockchain does not correspond to the number of actual BTC users, as modern wallets handle many different addresses for each user.
With this procedure, we found four BTC addresses whose private keys belong to $H$, labeled (1) to (4) in what follows. Two of them, (3) and (4), came from the trivial keys $1$ and $-1$, and they might have been generated on purpose, but the remaining two addresses appear to be legitimate. In particular, a blockchain inspection (Reference [21], 2018) suggests that one of them, (2), has been used as a temporary address for moving a small amount of bitcoins, while the other, (1), has probably been used as a personal address, since its owner has stored some bitcoins there for 4 years.
To show that the private key of address (1) was really recovered, we used three of our own addresses,
A. 1FCuka8PYyfMULbZ7fWu5GWVYiU88KAU9W,
B. 1NChjA8s5cwPgjWZjD9uu12A5sNfoRHhbA,
C. 1695755gMv3fJxYVCDitMGaxGu7naSXYmv,
and we performed tiny transactions from each of them, as shown in Figure 3. These operations may be easily verified through any blockchain explorer, such as Reference [21], by searching for their transaction IDs.

Discussion and Conclusions
Although our approach has found only a few addresses that may be generated by secret keys in the considered subgroup, it is worth noting that this outcome is still unexpected. At the time of our analysis (June 2018), there were about $M = 4.5 \times 10^8$ BTC addresses that had appeared in the blockchain [21]. We inspected the $N = 18{,}051{,}648$ addresses generated from the secret keys in $H$. As we have already noticed in Section 2.2, the number of possible distinct addresses is, at most,
$$2^{160} \approx 10^{48},$$
which we may assume to be all attainable under the assumption of uniform distribution of the considered hash functions (SHA-256 and RIPEMD-160). Therefore, the probability of finding at least one BTC address in the blockchain from a random sampling performed $N$ times was
$$1 - \left(1 - \frac{M}{2^{160}}\right)^{N} \approx \frac{N M}{2^{160}} \approx 5.6 \times 10^{-33}.$$
We conclude that, if the BTC private keys had been generated uniformly at random, we would almost certainly not have found any address.
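The order of magnitude of this probability can be checked with a few lines of arithmetic. The sketch below assumes that each of the $N$ inspected addresses independently hits one of the $M$ used addresses with probability $M / 2^{160}$; the variable names are introduced here for illustration. The functions `log1p` and `expm1` are used because the naive expression `1 - (1 - x)**N` would round to zero in floating point for such a small `x`.

```python
import math

M = 4.5e8             # addresses seen in the blockchain (June 2018 estimate)
N = 18_051_648        # addresses generated from the keys in H
ADDRESS_SPACE = 2**160

# Probability that one uniformly random address is among the M used ones.
p_single = M / ADDRESS_SPACE

# P(at least one hit in N independent trials) = 1 - (1 - p_single)^N,
# evaluated in a numerically stable way.
p_at_least_one = -math.expm1(N * math.log1p(-p_single))

# First-order approximation N * M / 2^160 (also the expected number of hits).
expected_hits = N * p_single

print(f"P(at least one hit) ~ {p_at_least_one:.3e}")
print(f"N * M / 2^160       ~ {expected_hits:.3e}")
```

Both quantities agree to many digits, since the probability is far too small for the higher-order terms of the expansion to matter.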
There may be many reasons why such an unlikely event has occurred. It probably depends on the pseudo-random algorithms (or their implementations) used by wallets to generate users' addresses. Our 2017 conjecture started from this elementary fact from group theory: if $g \in H \le G$, then $g^a \in H$ for all $a$. Therefore, if private keys are obtained via iterated powers, once the wallet gets a key inside a subgroup, all the subsequent keys will lie in the same subgroup.
As for future research lines, it might be interesting to apply this approach to different cryptocurrencies, or to examine other algebraic structures that may be associated with the private key space, such as special cosets of $H$.

Funding: This note has been supported in part by the European Union's H2020 Programme under grant agreement number ERC-669891.
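The closure property behind this conjecture can be made concrete with a short sketch. The key generator below is purely hypothetical: it is not the algorithm of any known wallet and only models "keys obtained via iterated powers". Membership in the subgroup of order $N$ is tested with the criterion $k^N \equiv 1 \pmod{q}$, which characterizes that subgroup because $\mathbb{Z}_q^*$ is cyclic.

```python
Q = 115792089237316195423570985008687907852837564279074904382605163141518161494337
N = 18_051_648  # order of the subgroup H; N divides q - 1


def in_subgroup(k):
    """k lies in the unique subgroup of order N iff k^N = 1 (mod q)."""
    return pow(k, N, Q) == 1


def iterated_power_keys(seed, exponent, count):
    """Hypothetical flawed generator: each key is a power of the previous one."""
    keys = []
    k = seed
    for _ in range(count):
        keys.append(k)
        k = pow(k, exponent, Q)
    return keys


# Any element of the form t^((q-1)/N) lies in H; t = 7 follows the text.
SEED = pow(7, (Q - 1) // N, Q)
KEYS = iterated_power_keys(SEED, exponent=65537, count=10)
```

Every element of `KEYS` passes `in_subgroup`, whatever exponent is chosen: once a single key falls in $H$, this kind of generator never leaves it.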