Bitcoin is an important electronic and decentralized cryptographic currency system proposed by Satoshi Nakamoto [1]. It is based on a peer-to-peer architecture and needs no central authority or central bank to control the money supply within the system [2]. Bitcoin relies on a proof-of-work system to verify and authenticate the transactions that are carried out in the network. For further verification purposes, all transactions are public [2]. On the high economic relevance of Bitcoin, cf. the plethora of online press articles, such as those on the website of The Economist.
In this article, we analyze the public transaction history of the first four years of Bitcoin with respect to economic and network aspects. We have chosen this time period in order to limit the amount of data to be analyzed and to have a specific time frame that can be compared with later analyses in future work. This period gives valuable insights into the birth of an important electronic currency. The objective is to investigate the evolution of the Bitcoin economy during this initial period by enriching the data from the public ledger with off-network data such as business categories and geo-locations. These novel analyses supersede some preliminary results [4], especially in the geographic dimension, and give insights into the business distribution by country and into how businesses evolve over time in the network. The descriptive statistics reveal what the major Bitcoin businesses and markets are. Furthermore, regional differences in the business distribution are found. In the network analysis, the small world phenomenon is investigated and established for several subgraphs of the Bitcoin network. The analysis of the degree distribution and power law on the time, business, and country aggregation levels reveals that large portions of the network follow a power law distribution and can be considered scale-free networks. Moreover, further network characteristics are investigated.
This paper is structured as follows. In Section 2, we give an introduction to the technology of Bitcoin and the major entities and roles of this economy. In Section 3, related work is discussed. Section 4 presents the methods used in our study, in particular the data collection and storage and the metrics from social network analysis and graph theory that are applied in the later analysis. The empirical data we collected, its structure, and its refinement are presented in Section 5. Section 6 presents the results of our investigation, starting with the business-related analyses before turning the focus to the network structure of Bitcoin transactions. Section 7 concludes the article with a discussion of limitations and an outlook on future work.
2. Bitcoin Technology and Economy
A Bitcoin can be defined as a chain of digital signatures. An owner transfers the electronic coin to the next user by digitally signing a hash of the previous transaction and the public key of the next owner, and adding these to the end of the Bitcoin. The signatures can be verified by the payee to prove the chain of ownership [1].
To avoid inflation in the system, the currency has a predetermined upper limit of 21 million coins in circulation. Until that limit is reached, which might happen around the year 2140, the money supply will increase at a certain rate [5]. To provide some sort of anonymity, direct personally identifiable information is omitted from the transaction. Instead, the source and destination addresses are encoded in the form of public keys. Every public key that serves as a pseudonym has a corresponding private key, which is stored in the electronic wallet. The private keys are used to sign or authenticate transactions. To become part of the peer-to-peer network, one needs client software that runs either on the user's own device or as a cloud service [6].
A node in the network will not accept multiple transactions using the same inputs. The nodes accept only the first transaction they receive and reject all subsequent ones. This is done to prevent double spending by malicious users and is part of the proof-of-work concept [5].
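The first-seen rule described above can be illustrated with a minimal sketch. The `Node` class, its `accept` method, and the representation of an input as a `(transaction id, output index)` pair are hypothetical simplifications for illustration, not the data structures of an actual Bitcoin client.

```python
class Node:
    """Toy node that remembers which transaction outputs were already spent."""

    def __init__(self):
        # Set of (previous transaction id, output index) pairs already used.
        self.spent = set()

    def accept(self, tx_inputs):
        """Accept a transaction only if none of its inputs was spent before."""
        inputs = set(tx_inputs)
        if inputs & self.spent:
            return False  # at least one input is reused: reject as double spend
        self.spent |= inputs
        return True


node = Node()
first = node.accept([("tx_a", 0)])   # first transaction spending this output
second = node.accept([("tx_a", 0)])  # later transaction reusing the same output
print(first, second)
```

Only the first transaction that spends a given output is accepted; the second one is rejected because its input is already recorded as spent.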
The main idea behind the proof-of-work system is to make it expensive for a single user or a group of users to rewrite the history of transactions once it has been accepted as definite. This should prevent malicious users from double spending their Bitcoins [6].
The solution that Nakamoto [1] proposed is the use of a timestamp server that takes the hash of a block of items, timestamps it, and widely publishes the hash. The proof of work involves using a hash algorithm such as SHA-256 to find a specific value. The objective is to increment a nonce in the block until a value is found that gives the block's hash the required number of leading zero bits. The average work to do so is exponential in the number of zero bits, but the result can be easily verified. There is a predetermined target difficulty that is updated every 2016 blocks. This ensures that generating one block takes about 10 min on average. A block is only accepted by users if all transactions in it are valid and the Bitcoins have not been spent previously. Users show their acceptance by using the newly found hash in the “previous hash” section of the next block they attempt to generate, thus adding a new block to the chain. This chain is called the block chain or transaction log and contains the entire history of all transactions that have been carried out in the network [5].
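The nonce search and the chaining via the “previous hash” field can be sketched in a few lines. This is a simplified illustration under stated assumptions: the block layout (a string of previous hash, payload, and nonce), the function names, and the toy difficulty of 12 zero bits are all hypothetical, and the real protocol hashes a binary block header and compares it against a numeric target rather than counting bits.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def block_hash(prev_hash: str, payload: str, nonce: int) -> bytes:
    """Hash a toy block made of previous hash, payload, and nonce."""
    return hashlib.sha256(f"{prev_hash}|{payload}|{nonce}".encode()).digest()


def mine(prev_hash: str, payload: str, zero_bits: int):
    """Increment the nonce until the hash has enough leading zero bits."""
    nonce = 0
    while leading_zero_bits(block_hash(prev_hash, payload, nonce)) < zero_bits:
        nonce += 1
    return nonce, block_hash(prev_hash, payload, nonce).hex()


def verify(prev_hash: str, payload: str, nonce: int, zero_bits: int) -> bool:
    """Verification needs a single hash, however long the search took."""
    return leading_zero_bits(block_hash(prev_hash, payload, nonce)) >= zero_bits


ZERO_BITS = 12  # toy difficulty: about 2**12 attempts on average
genesis = "00" * 32
nonce1, hash1 = mine(genesis, "alice pays bob 1 BTC", ZERO_BITS)
# The second block commits to the first one through its "previous hash".
nonce2, hash2 = mine(hash1, "bob pays carol 1 BTC", ZERO_BITS)
print(nonce1, hash1)
print(nonce2, hash2)
```

Each additional required zero bit doubles the expected number of attempts in `mine`, while `verify` always costs one hash. Because the second block includes `hash1`, changing the first block would invalidate the second and force the work to be redone.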
The generation of blocks by users is called mining and is achieved by providing a certain amount of computation power to the network to solve the proof-of-work problem. Expending computation power is rewarded when a block is generated. There is competition for the reward, and the more computation power a user or group possesses, the better the chance of obtaining it. The reward is predetermined and started at 50 BTC. It decreases by half every 210,000 blocks. In this way, new Bitcoins are introduced to the network. This procedure will continue until the predetermined final amount of 21 million Bitcoins is in circulation, around the year 2140.
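The 21 million limit and the approximate end date follow from the halving schedule, as the short calculation below illustrates. It assumes that rewards are counted in integer units of 0.00000001 BTC and rounded down at each halving, and that every block takes exactly 10 min; the constant and function names are chosen for illustration.

```python
UNITS_PER_BTC = 10**8            # assumed smallest unit: 0.00000001 BTC
BLOCKS_PER_HALVING = 210_000
INITIAL_REWARD = 50 * UNITS_PER_BTC


def total_supply():
    """Sum all block rewards until the reward rounds down to zero."""
    total = 0
    reward = INITIAL_REWARD
    eras = 0
    while reward > 0:
        total += BLOCKS_PER_HALVING * reward
        reward //= 2  # halve the reward, rounding down to a whole unit
        eras += 1
    return total, eras


total, eras = total_supply()
minutes = eras * BLOCKS_PER_HALVING * 10      # 10 min per block on average
years = minutes / (60 * 24 * 365.25)
print(total / UNITS_PER_BTC, eras, round(years, 1))
```

Under these assumptions the rewards form a geometric series that stays just below 21 million BTC, and the roughly 132 years needed to mine all reward eras, counted from the start of the network in 2009, lead to the year 2140 mentioned above.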
The figure shows a general overview of the Bitcoin economy with its major participants. Users can exchange their fiat currencies for Bitcoins via exchange platforms or local exchanges (1); withdraw money from recently introduced Bitcoin ATMs (2); store Bitcoins in an online wallet (3); use payment services in transactions with online merchants (4); pay with Bitcoins in local shops or bars (5); gamble with Bitcoins on various gaming platforms (6); and incorporate transactions in a block, called mining (7), thus verifying the transactions and publishing them to the network via the block chain (8).
A large number of merchants and services already accept Bitcoins as a currency in exchange for their offerings. The services can be mainly categorized into exchanges, wallets, mining, payments, gambling, and vendors.
Exchanges: on exchanges, users can trade fiat currencies, other crypto currencies, and even gold for Bitcoins. The exchange platforms are mainly electronic, but there are also local exchanges.
Wallets: web wallets are similar to banks in the real economy in that Bitcoins owned by users are stored centrally on online platforms. The major advantage is that users can access their Bitcoins from every device connected to the web and need less effort to protect their wallet.
Mining: mining is the contribution to the coin generation process, mainly executed in a mining pool. For a definition and comparison of mining pools see [7]. In such mining pools, miners share their computing resources and each participant receives a reward in proportion to the computing power contributed.
Payments: payment services enable online merchants to accept Bitcoins in the same way as they accept Visa or PayPal payments in their local currency. They reduce transaction costs and avoid chargebacks, Bitcoin exchange rate risks, and identity theft.
Gambling: gambling services offer a wide variety of online games such as dice games, roulette, and other casino-related games where users can gamble with their Bitcoins.
Vendors: via online merchants, users can exchange their Bitcoins for almost every kind of product, such as multimedia content, electronics, travel, gift cards, and clothing. There are also vendors that function as marketplaces, similar to eBay.
Bitcoin services are quite unstable due to regulation and insecure platforms that facilitate theft. The online merchant Silk Road was shut down because of trading in illegal goods, while the online wallet services MyBitcoin and Instawallet were closed due to several thefts [8].
3. Related Work
Most of the recent research focuses on the de-anonymization of Bitcoin users by introducing clustering heuristics to form a user network. Such a network is essential for gaining knowledge and novel insights about the economic relationships between users. Furthermore, linking external information related to executed transactions is an important step in analyzing the Bitcoin economy.
Reid and Harrigan [2] developed a clustering heuristic to form a user network by creating meaningful groups of users out of the vast number of pseudonymous addresses (public keys) involved in transactions. The main idea is that multiple inputs from different public keys into a single transaction probably belong to the same user, since the use of the corresponding private keys is highly coordinated. In order to construct the user network, all public keys that belong to the same user need to be clustered into one node or user entity. This user network represents the flow of Bitcoins between users over time. The clustering heuristic holds if users do not share their private keys, but this is not always the case: web wallets, for example, pool many private keys and would therefore mistakenly be clustered as a single user [9]. The clustering approach was extended by Androulaki et al. [10], who use another property of the Bitcoin protocol that is more complex but less reliable than multi-input transactions. Since Bitcoins transacted from a single address need to be fully spent, the change is collected back to a newly generated address, the “shadow” or “change” address. In a transaction with two outputs, where one address has never appeared before in the block chain while the other address has, one can assume that the new address belongs to the user who initiated the transaction [10]. This approach is very reliable under the assumption that users rarely issue transactions to two different users. Pay-outs from mining pools or bets on gaming sites are examples where this is not always the case. Hence, Meiklejohn et al. [8] refined the clustering heuristics to account for these and other circumstances to increase the reliability.
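The multi-input heuristic amounts to computing connected components over addresses that appear together as inputs. The following sketch uses a union-find structure for this; it is one possible way to implement the idea, not necessarily the procedure used in the cited studies, and the transaction format (a pair of input and output address lists) and all address names are hypothetical.

```python
def cluster_addresses(transactions):
    """Group addresses into user entities with the multi-input heuristic.

    transactions: iterable of (input_addresses, output_addresses) pairs.
    """
    parent = {}

    def find(addr):
        # Return the representative of addr's cluster, with path halving.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs, outputs in transactions:
        for addr in list(inputs) + list(outputs):
            find(addr)  # register every address, even if never co-spent
        for addr in inputs[1:]:
            union(inputs[0], addr)  # all inputs of one transaction: same user

    clusters = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(members) for members in clusters.values())


transactions = [
    (["A", "B"], ["X"]),  # A and B are spent together
    (["B", "C"], ["Y"]),  # B and C are spent together, so A, B, C merge
    (["D"], ["A"]),       # D only pays A; outputs are never merged
]
user_nodes = cluster_addresses(transactions)
print(user_nodes)
```

Addresses A, B, and C end up in one user node because the relation is transitive, while D, X, and Y remain separate. A change-address heuristic would add further `union` calls between the inputs and the presumed change output, at the cost of the reliability issues discussed above.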
Besides the clustering heuristics, one wants to add further information related to transactions (e.g., IP addresses, geo-locations, businesses, and trade data from exchanges) to the user network. In their research, Reid and Harrigan [2] also proposed several methods to overcome anonymity, including the integration of off-network information such as email addresses, shipping addresses, IP addresses, or bank and credit card details. This information is mainly held by businesses that accept Bitcoin as payment and by other services such as exchanges, laundry services, and mixers. The researchers use a number of publicly available sources and integrate their information with the user network. For instance, they scraped the web site Bitcoin Faucet over time and were able to associate IP addresses with the public keys involved in the transactions. Thus, they could plot a map of geo-located IP addresses belonging to users who received Bitcoins over a period of one week and overlay it with the user network.
Another approach to obtaining an IP address related to a transaction was introduced by Kaminsky [11]. It exploits a leakage at the TCP/IP layer in the Bitcoin system. When a user is connected to every node in the peer-to-peer network, the first node that publishes a transaction can be safely assumed to be its initiator; thus, the related IP address can be linked to that user [11]. In a further study, the publicly available IP addresses related to transactions were downloaded from the site Blockchain.info and analyzed for a short time horizon. The data was then linked to publicly known IP addresses of anonymizing services such as Tor and proxies. The results show that around one percent of transactions could be related to anonymizing services.
One more valuable source of identifying information is the voluntary disclosure of public keys by users on Bitcoin forums or other social network sites such as Twitter streams [2].
Meiklejohn et al. [8] gathered external data from various Bitcoin services such as gambling, mining, exchanges, and vendors to link it with public keys that interact with those businesses. To this end, they engaged in 344 transactions with a wide variety of services. Another approach they propose is the collection of data from publicly available sources where users claim their own addresses. The site Blockchain.info provides this information in a convenient way by tagging transactions with the associated business. With the collected data and the applied clustering heuristics, they were able to classify a vast number of transactions in the user network [8].
The authors of [9] introduced a tool called BitIodine that includes external data from various sources such as Mt.Gox, Bitcointalk, and Blockchain.info in the analysis of the Bitcoin network. Through the APIs of Mt.Gox and Blockchain.info and several scrapers, the researchers are able to gather the most recent data about transactions and the associated Bitcoin users [9].
Linking external information to transactions and subsequently to the formed user network gives meaningful insights into transaction flows and the overall Bitcoin economy.
7. Conclusion, Limitations and Outlook
This explorative research examined the Bitcoin economy and network by introducing an enriched data model. The data model incorporates the Bitcoin user network introduced by Reid and Harrigan [2] and scraped information from several websites to construct new aggregates on the business and geographical levels. This information contains business tags, IP addresses, and geo-locations that could also be associated with Bitcoin users. Furthermore, trade data on the BTC/USD exchange rate and data on anonymizing services such as Tor were extracted.
To conduct the analysis on the business aggregation level, the tags related to a transaction were grouped into 13 categories. Over 54% of all transactions could be classified. The first analysis on business categories reveals that around 48% of all transactions are related to gambling services and almost 46% are associated with the dice game SatoshiDICE. The second and third largest businesses by number of transactions are the mining pool Deepbit with 4.3% and the exchange platform Mt.Gox with 1.7%, respectively. When analyzing the number of transactions and the transaction volume for particular businesses, different transaction patterns were found among the categories. Businesses such as exchanges, vendors, or wallets transact rather large amounts of Bitcoins, while businesses such as gambling or donation transact very small amounts. This was expressed by the T/V ratio (transaction to value ratio). For instance, the T/V ratio for gambling is 25 and that of the vendor business is 0.1. Further analysis of the transaction value distribution reveals that 63% of all transactions are in the range from 0.00000001 to 1.0 BTC; gambling businesses account for most of them, with a decreasing share in higher value ranges. In the ranges above 100 BTC, most transactions are related to the exchange business. The development of the business categories over time shows that Bitcoin Talk users and the donation services were among the most active participants because of their importance during the startup phase of Bitcoin. Later on, web wallets, media and news, and exchange platforms entered the Bitcoin economy. With the introduction of gambling services, the number of transactions became inflated, especially by the most popular game, SatoshiDICE. The differences between the number of transactions and the transaction volume for particular categories could also be observed over time.
The analysis on the geographic aggregation level was not done before on this scale and requires the exclusion of IP addresses that could be associated to Tor, Proxy or VPN services (~1.6% of transactions). When analyzing the Bitcoin economy geographically, one can see that the major markets are in the U.S. and Germany. The geographic distribution of the transaction volume reveals that Bitcoins are mainly used in countries with a good infrastructure during the analyzed time interval.
With the linkage of geo-locations to transactions with business tags, an innovative analysis of the distribution of businesses per country could be conducted. Northern European countries like Germany, Sweden, Russia, or France have a similar business distribution with a strong focus on mining at around 56%. In contrast, the U.S., Canada, and Brazil share a common business distribution with a focus on the gambling business at around 65%. A special case is the Chinese Bitcoin market, where 87% of the transaction volume is linked to the gambling business. Another finding is that countries such as Spain, the U.S., Canada, and Argentina were more engaged in the exchange business, at around 10%, indicating more speculative behavior. In the cases of Spain and Argentina this could be related to economic and financial distress and the search for new safe havens, while in the U.S. and Canada users are more market oriented and seek high abnormal returns through speculation on the Bitcoin exchange rate.
Investigation of the degree distribution and power law over time reveals that the Bitcoin network follows a power law distribution over large parts of the value range. Since 2010, the Bitcoin network can be considered a scale-free network, with a power law slope coefficient α between 2.0 and 2.6 in the time horizon from 2010 to 2013. The degree distributions for particular businesses show strong heavy tails for the gambling, mining, and exchange businesses. These business categories are mainly driven by one business with an abnormally high degree. The power law slope coefficient α for all business categories (except the wallet business) is in the range between two and three, indicating a scale-free network. On the country level, the degree distributions show a similar result. All considered countries have a power law slope coefficient α between two and three, indicating the existence of a scale-free network. The analysis reveals that the majority of the investigated subgraphs of the Bitcoin network are scale-free networks.
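To make the role of the coefficient α concrete, the sketch below estimates it for synthetic data whose density is proportional to x^(-α) above a lower bound x_min. It uses the standard continuous maximum-likelihood estimator α = 1 + n / Σ ln(x_i / x_min), which is an assumption for illustration and not necessarily the fitting procedure behind the values reported above; node degrees are integers, so a careful analysis would use a discrete variant. The function names and the chosen α of 2.3 are hypothetical.

```python
import math
import random


def sample_power_law(alpha, x_min, n, rng):
    """Draw n values from a continuous power law via inverse transform."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def estimate_alpha(values, x_min):
    """Maximum-likelihood estimate of alpha for the values >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


rng = random.Random(42)  # fixed seed so the run is reproducible
degrees = sample_power_law(alpha=2.3, x_min=1.0, n=20000, rng=rng)
alpha_hat = estimate_alpha(degrees, x_min=1.0)
print(round(alpha_hat, 2))
```

With 20,000 samples the estimate lies close to the true value of 2.3, that is, inside the range between two and three that is commonly associated with scale-free networks.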
To identify major hubs in the Bitcoin network, the degree centrality was analyzed. The results on the entire Bitcoin network in the most active time from September 2012 to April 2013 reveal that the major hub nodes are controlled by the exchange platform Mt.Gox, the gambling service SatoshiDICE, and the web wallet service Instawallet. Next, the degree centrality was analyzed on the business aggregates. The results show that the dominant services in a business category are also the major hubs in the network. This is especially the case for SatoshiDICE in the gambling business, Mt.Gox in the exchange business, Deepbit in the mining business, and Instawallet in the wallet business category.
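Degree centrality can be computed directly from an edge list, as in the following minimal sketch. It assumes an undirected graph without self-loops and normalizes each degree by the maximum possible degree n − 1; the node names are hypothetical and do not correspond to entities in the data set.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return degree / (n - 1) for every node of an undirected graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Toy user network: one service transacts with every other node.
edges = [
    ("exchange", "user1"),
    ("exchange", "user2"),
    ("exchange", "user3"),
    ("user1", "user2"),
]
centrality = degree_centrality(edges)
hub = max(centrality, key=centrality.get)
print(hub, centrality[hub])
```

The node connected to all others receives the maximum centrality of 1.0 and is identified as the hub, mirroring how dominant services stand out in the analysis above.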
The analysis of the average clustering coefficient indicates the existence of the small world phenomenon in the Bitcoin network over time as well as on the country and business aggregation levels. This kind of analysis requires high computation power and was therefore tested on smaller subgraphs on the business and country aggregation levels. The existence of the small world phenomenon could be demonstrated for the country aggregations China, Brazil, Italy, and Argentina. For the business aggregations, the wallet and vendor businesses were investigated. Only the wallet business could be considered a small world network; the vendor business did not meet the requirements. This shows that a rather high clustering coefficient is just a first indicator and needs further investigation.
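The two quantities usually compared in a small world test, the average clustering coefficient and the average shortest path length, can be sketched as follows for a small undirected graph. The function names and the toy graph are hypothetical, and the sketch assumes a connected graph; a complete test would compare both values against random networks of the same size.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_clustering(adj):
    """Mean share of connected neighbor pairs (0 for nodes with degree < 2)."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_shortest_path(adj):
    """Mean breadth-first-search distance over all ordered reachable pairs."""
    dist_sum, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs


# Toy graph: a triangle a-b-c with one extra node d attached to c.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
clustering = average_clustering(adj)
path_length = average_shortest_path(adj)
print(round(clustering, 3), round(path_length, 3))
```

A network is typically regarded as a small world when its clustering is much higher than that of a comparable random network while its average path length is similarly short, which is why a high clustering coefficient alone is only a first indicator. The all-pairs breadth-first search also shows why the analysis becomes expensive on large subgraphs.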
Further network statistics could be applied on a representative subgraph of the Bitcoin economy to identify clusters, hubs, brokers, and most central nodes in the network. Furthermore, particular Bitcoin nodes and their interaction in the network could be identified and a geographic visualization of the subgraph was realized. This gives new insights on the Bitcoin economy in a visual way.
Several interesting aspects of the Bitcoin economy could be covered in this work, but there are some limitations that could be addressed in future research. Extensions of this work should include the most recent data of the Bitcoin network to gain insight into new developments in the economy, such as the attack on Mt.Gox with the subsequent closing of the exchange platform, or the closing of the vendor Silk Road. Furthermore, the intense fluctuations of the BTC/USD exchange rate in late 2013, resulting in a record high of over $1,200 per Bitcoin, could be investigated. Another approach would be a time series analysis of economically distressed countries such as Spain, Cyprus, or Argentina, investigating how the Bitcoin economy evolved there during this time. One could also analyze the economic development in certain countries with appropriate economic measures and regress it against Bitcoin variables. Although events that explain the movements have been presented in this work, one could link these to the network analysis and also visually investigate the Bitcoin transaction flows.
Even though 54% of all transactions could be related to a business tag and category, only 1.5% of them are not associated with the major businesses SatoshiDICE, Mt.Gox, or Deepbit. Re-identification techniques introduced by Meiklejohn et al. [8] could be applied in addition to the Blockchain.info web scraper to link further businesses to transactions, especially for high-volume transactions. With their approach and modified clustering algorithms, Meiklejohn et al. could tag around 2200 out of 3.38 million user nodes in the network. The more conservative and reliable clustering algorithm applied in this study is based on the research by Reid and Harrigan [2] and resulted in around 6.3 million user nodes.
With sufficient computation power, future research could have a stronger focus on the network analysis of the Bitcoin economy. Complex network measures such as betweenness and closeness centrality, the average shortest path length, average clustering, and the simulation of random networks could then be applied on a much larger scale. Hence, the small world phenomenon could be investigated on large subgraphs or even on the entire Bitcoin network. Furthermore, the visualization of the Bitcoin economy could be extended to the time, country, and business aggregation levels. Overall, our methods and data provide a starting point for further research on Bitcoin in a variety of fields.