1. Introduction
The aim of the Bitcoin paper (cf. [1]) was to introduce a decentralized system to facilitate financial transactions without the need for a central intermediary as in traditional banking systems. Transactions in the Bitcoin system are intended to have similar properties to cash transactions. In cash transactions, the flow of money is physical. Cash is exchanged directly between parties without the need for any intermediary. This method is immediate and does not rely on any third-party processing. In traditional banking systems, money flow is centralized and controlled by financial institutions. Transactions are processed through a network of intermediaries, such as banks, clearinghouses, and payment processors. Money moves electronically between accounts, and transactions are recorded in the banks’ ledgers. In the Bitcoin blockchain system, the flow of money is decentralized and peer-to-peer. Transactions are verified and added to a public ledger by network participants through consensus mechanisms. Money is transferred directly between blockchain addresses without the need for intermediaries (cf. [2]).
Traditional cash transactions offer a high level of privacy because they are peer-to-peer and cash is fungible. Participants in a cash transaction do not need to know identifying information about each other. They only need to have a pseudonymous identity to complete a transaction. In the traditional banking system, sending money between accounts involves exchanging identification information between the account owners and sharing that information with at least one banking provider. Although banking providers are obligated by law to maintain the privacy of their users, there is less privacy in banking transactions than in cash transactions (cf. [3]). In blockchain systems, the creation of a transaction and its contents is transparent to every peer in the system.
Therefore, in public blockchain systems, it is easily readable which blockchain identities are participating in each transaction and how the blockchain state is being modified. This can lead to the users’ real-world identities being connected to their blockchain identity and thus to their blockchain activity. Depending on the use case of the respective blockchain system, this can, for example, leak the users’ financial transactions, storage metadata, or health records.
Further developments of blockchain systems have extended the use case from transactions between humans to transactions between humans and smart contracts (see Section 4) and to transactions solely between smart contracts.
This literature review centers on the concept of information privacy within a technological context, while legislative or regulatory constraints are beyond the scope of this work.
Several systematic literature reviews on the topic of privacy in blockchain systems have already been published. However, each focuses on a narrow topic; for example, Liang and Ji [4] review privacy challenges when combining blockchain systems with the Internet of Things (IoT), Herskind et al. [5] analyze available privacy-enhancing schemes for cryptocurrencies, and de Haro-Olmo et al. [6] focus on the privacy properties of blockchain systems relating to Europe’s General Data Protection Regulation (GDPR). Furthermore, studies conducted by Kus Khalilov and Levi [7] and Li et al. [8] review several methods to increase privacy in blockchain systems. They, however, only cover the blockchain use case of electronic cash, that is, cryptocurrencies. In our paper, we look beyond cryptocurrencies and explore the privacy properties of various types of blockchain systems.
In Section 2, the research questions that this work intends to answer are provided. The following section presents the methodology used to perform this systematic literature review. Section 4 introduces the background knowledge necessary for the rest of the paper. The results of the systematic literature review are split into three sections, with each one discussing one layer of blockchain systems: the on-chain (see Section 5), off-chain (see Section 6), and, finally, peer-to-peer network layers (see Section 7). In Section 8, the privacy impact of system design decisions and methods to increase and decrease information privacy is discussed. Finally, Section 10 concludes this paper with a summary.
The authors wish to emphasize that this systematic literature review focuses specifically on privacy within blockchain systems and not on security. As a result, any security vulnerabilities that do not affect privacy are not in the scope of this study.
2. Research Questions
This systematic literature review offers a general view of privacy attacks and protections in blockchain systems. The introduction of new privacy techniques often necessitates substantial modifications to existing systems or the development of entirely new systems. This requirement arises from the complexities inherent in integrating advanced privacy measures into preexisting architectures. Furthermore, the process of adapting to new systems can be slow and challenging for users, who may be resistant to change due to factors such as familiarity, usability concerns, and perceived risks associated with adopting unfamiliar technologies.
In this context, privacy schemes that enhance the privacy properties of existing blockchain systems become especially valuable. Building upon established infrastructures, these frameworks can improve privacy protections without requiring users to migrate to entirely new systems. This approach not only facilitates a smoother integration of privacy enhancements but also encourages broader adoption among users who may be hesitant to embrace radical changes.
The following research questions are examined in this literature review:
RQ1: What are the privacy impacts of blockchain design decisions?
RQ2: What mechanisms, vulnerabilities, or adversarial techniques in blockchain systems lead to unintended reductions in information privacy?
RQ3: What methods have been applied to strengthen and improve information privacy in existing blockchain systems?
3. Methodology
The literature review process is based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement (cf. [9]). Our methodology divides the review process into three steps that help to increase the reproducibility of the process: Identification, Screening, and Inclusion. The complete paper selection process is shown in Figure 1. This review is in compliance with the PRISMA 2020 guidelines.
3.1. Identification
For the keyword search, the IEEE Xplore (https://ieeexplore.ieee.org (accessed on 28 July 2025)) database was selected due to its relevance to computer science topics. Searches were also performed via the Scopus database (https://www.scopus.com (accessed on 28 July 2025)) due to its general scope for scientific record search. The following search queries were used, with the goal of including only documents that focus on privacy attacks and privacy analysis in blockchain systems:
blockchain AND “transaction tracking”.
blockchain AND “transaction tracing”.
blockchain AND “address tracking”.
blockchain AND “address tracing”.
blockchain AND “transaction linking”.
blockchain AND “address linking”.
blockchain AND “deanonymize”.
The paper identification and screening process was supported by the litstudy Python package (cf. [10]). The combined results of both databases comprised 329 documents, of which 23 were found to be duplicates. The remaining documents were marked as ineligible by automation tools if any of the following statements were true:
The document is not available in the English language (15 documents).
The metadata of the document is not available on Scopus (23 documents).
The document has been cited fewer than 3 times or has received fewer than 2 citations per year since it was published. This criterion was intended to filter out low-quality papers (74 documents).
In summary, 112 documents were marked as ineligible by automation tools, reducing the number of documents to be screened to 194.
3.2. Screening
Following the identification phase, the resulting records were screened both manually and automatically. For the automatic part, topic modeling was used, which extracted 7 core topics (see Figure 2) that were identified by the following keywords:
Topic 1: bitcoin, network, graph, peer, nodes.
Topic 2: Ethereum, account, detection, graph, classification.
Topic 3: contracts, smart, cryptocurrency, smart contracts, execution.
Topic 4: anonymity, users, bitcoin, addresses, mixing.
Topic 5: iot, devices, security, private, data.
Topic 6: energy, grid, trading, peer, token.
Topic 7: vehicles, cloud, authentication, privacy, scheme.
After evaluating the reports for each topic, it was found that 8 records belonging to topic 6 were of low relevance to the research questions and were therefore excluded.
The manual screening process consisted of inspecting the title and abstract of each record to determine its relevance to the research questions. This resulted in 100 records being removed and 86 papers left for retrieval. Of these, 75 records could be retrieved in full text.
The records were then assessed for eligibility based on their topic and quality.
3.3. Inclusion
After removing all documents that did not meet the necessary quality and relevance to the research questions, the resulting paper count was 36. Snowballing involved systematically reviewing the references obtained from the PRISMA screening step. The backward snowballing process was conducted iteratively, with each newly identified paper’s references further analyzed. Forward snowballing focused on tracking subsequent studies that cited the initial papers using Google Scholar and Scopus. Citations were manually verified, and the same criteria as in the rest of the screening methodology were applied across all snowballed papers. This dual approach expanded the dataset by 11 papers, bringing the final paper count to 47.
Table 1, Table 2, and Table 3 show the included references organized by topic, year, and related blockchain system.
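The record counts reported in Sections 3.1 to 3.3 can be cross-checked with a few lines of arithmetic. The following sketch only re-derives the totals stated above; the variable names are illustrative and not part of the review protocol.

```python
# Record counts as reported in Sections 3.1-3.3; names are illustrative.
identified = 329
duplicates = 23
ineligible = {"not_english": 15, "no_scopus_metadata": 23, "low_citations": 74}

after_dedup = identified - duplicates                 # 306
to_screen = after_dedup - sum(ineligible.values())    # 194
after_topic_filter = to_screen - 8                    # 186 (topic 6 removed)
sought_for_retrieval = after_topic_filter - 100       # 86
retrieved_full_text = 75
included_after_assessment = 36
final_count = included_after_assessment + 11          # 47 (snowballing)

print(to_screen, sought_for_retrieval, final_count)
```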
3.4. Limitations
This review restricted the focus to technical papers, ensuring a concentrated analysis of blockchain-specific privacy mechanisms. The exclusion of legal and economic papers allowed for the prioritization of technical depth over interdisciplinary perspectives. Papers that propose privacy techniques for use cases that have blockchain as a building block but are not blockchain-specific were excluded.
The exclusion of papers based on citations per year was implemented to prioritize studies with demonstrable relevance and impact within the field of blockchain privacy. While this approach ensures a focus on well-established research, it may inadvertently overlook recent, high-quality works that have not yet accumulated sufficient citations due to their novelty. This criterion introduces a potential bias toward older or more mainstream contributions, which could limit the review’s ability to capture emerging techniques or underrepresented perspectives.
4. Background
This section presents the essential concepts required for a thorough understanding of the subsequent content. Initially, the fundamental principles of blockchain systems are outlined, followed by a description of the account model employed within these systems. Additionally, privacy concerns specific to blockchain technology are presented. Finally, a threat model pertaining to privacy issues in blockchain systems is provided.
4.1. Blockchain Systems
A blockchain is a form of a distributed database, with data replicated between a network of peers and data consistency achieved through decentralized consensus. Data are not updated directly. Instead, a collection of change records represented by transactions is appended as aggregated entities called blocks. Transactions modify the state of the blockchain. In a blockchain system, all participating peers can validate state changes and must agree on the state stored in the distributed database. Thus, a blockchain represents the immutable history of state changes triggered by the transactions of peers up to the most recent block.
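The append-only structure described above can be illustrated with a minimal sketch. It assumes that each block commits to its predecessor through a SHA-256 hash of a JSON encoding; the block layout and the helper names (`block_hash`, `append_block`, `is_consistent`) are hypothetical and chosen only for illustration. Consensus and peer replication are left out.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's canonical JSON encoding (illustrative format)."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def append_block(chain: list, transactions: list) -> None:
    """Append a block that commits to the hash of its predecessor."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "transactions": transactions})

def is_consistent(chain: list) -> bool:
    """Check that every block references the hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain = []
append_block(chain, [{"from": "alice", "to": "bob", "amount": 5}])
append_block(chain, [{"from": "bob", "to": "carol", "amount": 2}])
print(is_consistent(chain))   # True

chain[0]["transactions"][0]["amount"] = 500   # tamper with history
print(is_consistent(chain))   # False
```

Because each block includes the hash of the previous one, changing an old transaction invalidates every later reference, which is what makes the recorded history tamper-evident in this sketch.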
The concept of blockchain was introduced by Nakamoto [1] in 2008 to realize a decentralized digital currency called Bitcoin, in which users interact peer-to-peer without the need for centralized coordination. Although the Bitcoin system uses the blockchain for value transfers, the concept is not limited to this purpose. Any state can be written to the decentralized database, with the only limiting factors being the memory space and processing power that the peers need to ensure data consistency. A blockchain system designed primarily for value transfer utilizes digital tokens, known as cryptocurrencies, to represent and facilitate the exchange of value.
Inspired by the Bitcoin blockchain system, Vitalik Buterin and Gavin Wood [55] developed the Ethereum blockchain, which allows for the creation of smart contracts in a programming language more expressive than the scripting language provided by Bitcoin. In the Bitcoin system, each address is usually controlled by a person, and the main purpose of the system is to send and receive digital tokens, that is, cryptocurrency. Ethereum works differently: it can run complex programs called smart contracts. These programs are stored directly on the blockchain and can perform many different tasks automatically, without the need for a person to control them. They can run autonomously, own assets and tokens, and transfer value and state across different accounts.
This makes Ethereum useful for more than just the transfer of value; it can also support more complex use cases like decentralized finance (DeFi) or digital identity systems (cf. [56]). In Ethereum, there are two types of accounts: some are controlled by a person, and these are called Externally Owned Accounts (EOAs), while others are controlled by smart contracts.
4.2. Account Model
The account model (cf. [55]) determines how the blockchain system stores value. The model keeps track of the amount of funds that each blockchain account owns and is able to use. In Bitcoin, this is carried out via the Unspent Transaction Output (UTXO) model. In blockchain systems using the UTXO model, transactions create outputs that can be used as input by other transactions. Each output can only be used once and must be fully spent. Thus, keeping track of all unspent transaction outputs that a given entity controls allows one to determine the available funds of the respective entity. Besides the original Bitcoin system, many blockchain systems such as Litecoin (https://litecoin.org/ (accessed on 28 July 2025)), Dash (https://www.dash.org/ (accessed on 28 July 2025)), Cardano (https://cardano.org/ (accessed on 28 July 2025)), or Monero (https://www.getmonero.org/ (accessed on 28 July 2025)) also use the UTXO model.
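The bookkeeping behind the UTXO model can be sketched as follows. This is a simplified illustration under stated assumptions: signatures, scripts, and fees are omitted, and the `Output` type and the helper functions are hypothetical names that do not correspond to any particular client implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Output:
    txid: str      # transaction that created the output
    index: int     # position within that transaction's outputs
    owner: str
    amount: int

def apply_transaction(utxos: set, txid: str, inputs: list, outputs: list) -> set:
    """Spend `inputs` in full and add the newly created outputs to the UTXO set."""
    if not all(i in utxos for i in inputs):
        raise ValueError("input is missing or already spent")
    if sum(i.amount for i in inputs) < sum(amt for _, amt in outputs):
        raise ValueError("outputs exceed inputs")
    created = {Output(txid, n, owner, amt) for n, (owner, amt) in enumerate(outputs)}
    return (utxos - set(inputs)) | created

def balance(utxos: set, owner: str) -> int:
    """Available funds are the sum of all unspent outputs an owner controls."""
    return sum(o.amount for o in utxos if o.owner == owner)

coin = Output("tx0", 0, "alice", 10)
utxos = {coin}
# Alice pays Bob 7; the remaining 3 return to her as a change output.
utxos = apply_transaction(utxos, "tx1", [coin], [("bob", 7), ("alice", 3)])
print(balance(utxos, "alice"), balance(utxos, "bob"))   # 3 7
```

Since an output must be spent in full, the payer receives the remainder as a new change output, a detail that later becomes relevant for address clustering.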
Ethereum does not use the UTXO model. Instead, Ethereum works more like a traditional bank. Each account has a balance, and when a transaction occurs, the amount of Ether (Ethereum’s built-in currency) in one account decreases, and the amount in another account increases. Each account is controlled by either a person or a smart contract, similar to how a bank account is controlled by its owner.
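For contrast, a balance-based account model reduces to updating two entries of a mapping. The sketch below is a deliberate simplification that ignores fees, nonces, and signatures; the `transfer` helper is a hypothetical name.

```python
def transfer(balances: dict, sender: str, recipient: str, amount: int) -> None:
    """Move `amount` between two accounts by updating their balances in place."""
    if amount <= 0 or balances.get(sender, 0) < amount:
        raise ValueError("invalid amount or insufficient balance")
    balances[sender] -= amount
    balances[recipient] = balances.get(recipient, 0) + amount

balances = {"alice": 10}
transfer(balances, "alice", "bob", 7)
print(balances)   # {'alice': 3, 'bob': 7}
```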
4.3. Privacy Issues in Blockchain Systems
The primary objective of privacy protections in blockchain systems is to reconcile the inherent transparency of distributed ledgers with the need to protect sensitive information. Participation in public blockchain systems exposes users to various privacy issues, as a large part of their interaction with the system is public, and some of the users’ data will be stored indefinitely. In the following subsections, we discuss the privacy properties from the perspective of the three layers: the on-chain, off-chain, and network layers.
4.3.1. On-Chain Layer
The on-chain layer comprises all interactions with data stored directly in the blockchain database. Users interact with a blockchain system by means of their public/private key pairs. A user’s identity is not always represented directly by their public key; instead, public keys can be transformed to create blockchain addresses. The public keys and addresses of a user are their pseudonyms in the blockchain system. Depending on the blockchain system, blockchain addresses can also be formed by other means, such as transforming a locking script.
Blockchain addresses can be created offline and are only visible to the system once a transaction occurs that involves the address. Identities on public blockchain systems are not verified by external parties. A blockchain system user can possess multiple addresses and can even send these to other users.
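The transformation of a public key into an address can be sketched as a hash computation that needs no network access. The scheme below (a truncated SHA-256 digest) is an assumption made for illustration and is not the address format of any specific blockchain system; the random bytes merely stand in for a real public key.

```python
import hashlib
import secrets

def address_from_public_key(public_key: bytes) -> str:
    """Derive a short pseudonym by hashing the public key (illustrative scheme)."""
    return hashlib.sha256(public_key).hexdigest()[:40]

# Stand-in for a real public key; no elliptic-curve math is performed here.
fake_public_key = secrets.token_bytes(33)
address = address_from_public_key(fake_public_key)
print(len(address))   # 40
```

Because the derivation is purely local, a user can generate any number of such pseudonyms before any of them becomes visible on-chain.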
In Bitcoin and many other blockchains, all transactions can be read by any participating peer in plain text. Thus, it is possible to survey the complete transaction history and retrace the value movements between addresses. The addresses under the control of a user are therefore their pseudonyms.
Given a pseudonymous address of a blockchain system, a user’s true identity can be revealed via their interaction with on-chain assets outside the blockchain system, for example, if the user exchanges their cryptocurrency for off-chain goods and services (see Section 6).
Furthermore, address clustering techniques enable blockchain observers to establish links between addresses and their users. One of the address clustering techniques, the multi-input heuristic, assumes that every transaction is created by a single user. Therefore, it links all input addresses of a transaction (cf. [11]). Figure 3 shows an example of two transactions that can be linked to the same user.
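The multi-input heuristic can be sketched with a union-find structure that merges all input addresses of each transaction into one cluster. The transaction data and the `cluster_addresses` helper are hypothetical and serve only as an illustration of the idea.

```python
def cluster_addresses(transactions: list) -> list:
    """Group addresses that appear together as inputs of any transaction."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    for inputs in transactions:
        for addr in inputs:
            find(addr)                       # register single-input addresses
        for addr in inputs[1:]:
            union(inputs[0], addr)

    clusters = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(c) for c in clusters.values())

# Each entry lists the input addresses of one hypothetical transaction.
txs = [["A1", "A2"], ["A2", "A3"], ["B1"]]
print(cluster_addresses(txs))   # [['A1', 'A2', 'A3'], ['B1']]
```

Address A2 appears as an input in both of the first two transactions, so the heuristic transitively attributes A1, A2, and A3 to the same user.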
4.3.2. Peer-to-Peer Network Layer
Blockchain systems utilize peer-to-peer (P2P) networks to share messages. This includes new transactions, which are not yet appended to the blockchain, and lists of peers to connect to.
The act of publishing transactions in blockchain systems impacts the privacy of its users: if users want to create a new transaction in the system, they need to send their transaction to other peers. In the gossip model, they send their transactions to all their direct peers. These direct peers propagate the new transaction to their peers and so forth. This way, the network is informed of the new transaction, which can later be included in a new block. The direct peers of a user now also know, besides various other networking information, the user’s IP address and thus their approximate location. Additionally, they can combine networking information with the data contained in the newly created transaction. This information is not limited to direct peers, as an adversary with access to a substantial amount of network traffic can also determine from which peer transactions originate.
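The propagation pattern described above can be sketched as a breadth-first flood over the peer graph. The topology and the `gossip` helper are hypothetical; real networks add delays, relay policies, and more elaborate propagation schemes that this sketch ignores.

```python
from collections import deque

def gossip(peers: dict, origin: str) -> dict:
    """Flood a transaction through the peer graph.

    Returns, for every reached peer, the neighbor it first received the
    transaction from (None for the origin itself).
    """
    first_heard_from = {origin: None}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neighbor in peers[node]:
            if neighbor not in first_heard_from:
                first_heard_from[neighbor] = node
                queue.append(neighbor)
    return first_heard_from

# Hypothetical topology: the user is connected to p1 and p2 only.
peers = {
    "user": ["p1", "p2"],
    "p1": ["user", "p3"],
    "p2": ["user", "p3"],
    "p3": ["p1", "p2"],
}
print(gossip(peers, "user"))
```

In this toy topology, p1 and p2 receive the transaction directly from the user's connection, which is the observation a curious direct peer can combine with the transaction contents.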
Although privacy considerations of the P2P network layer may include data from the network and transport layer (cf. [57]), the focus in this paper is only on blockchain system-relevant data and excludes privacy issues that arise explicitly outside of the P2P network layer.
4.3.3. Off-Chain Layer
The off-chain layer (cf. [58]) refers to any data storage, computation, or communication that occurs outside the blockchain system itself but remains linked to it through cryptographic references. These solutions are often critical for scaling blockchain systems, as they allow large datasets or frequent transactions to be handled efficiently without overburdening the underlying chain. For instance, decentralized storage systems like IPFS (https://ipfs.tech/ (accessed on 28 July 2025)) or Filecoin (https://filecoin.io/ (accessed on 28 July 2025)) keep data and files, such as high-resolution NFT artwork or legal documents, off-chain while storing only a compact hash of the data on-chain. Similarly, payment channels, such as Bitcoin’s Lightning Network, enable users to conduct numerous transactions off-chain, submitting only the final balance to the blockchain for settlement. Oracles, such as Chainlink (https://chainlinklabs.com/ (accessed on 28 July 2025)), further extend functionality by fetching real-world data, e.g., stock prices or weather conditions, through off-chain verification before delivering them to smart contracts.
Despite their utility, off-chain interactions can introduce privacy risks by exposing sensitive information through indirect channels. One major concern is identity linkage, where users inadvertently connect their blockchain activity to real-world identities. This can happen when someone shares their public wallet address on social media. Centralized exchanges such as Coinbase exacerbate this risk by requiring Know Your Customer (KYC) verification, which ties government-issued IDs to blockchain addresses during off-chain account creation.
Metadata leaks are another significant issue. Although payment channels improve scalability, they can potentially reveal transactional patterns, such as frequency, amounts, or counterparties, e.g., if intermediaries log routing data. Similarly, storing private records on IPFS can expose metadata like timestamps or geolocation tags unless files are encrypted before being hashed and anchored on-chain. Even hashed data carry risks: if an off-chain document (e.g., a medical report) uses predictable content, attackers could brute-force the original input by matching common phrases against the on-chain hash.
These vulnerabilities highlight the delicate balance between scalability, off-chain layer solutions, and privacy.
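The brute-force risk for hashed but predictable documents mentioned above can be sketched as a simple dictionary attack. The report template, the candidate values, and the use of SHA-256 are assumptions made for illustration.

```python
import hashlib

def digest(document: str) -> str:
    return hashlib.sha256(document.encode()).hexdigest()

def guess_preimage(on_chain_hash: str, candidates: list):
    """Return the candidate whose hash matches the on-chain value, if any."""
    for candidate in candidates:
        if digest(candidate) == on_chain_hash:
            return candidate
    return None

# Hypothetical report template with a small set of possible values.
template = "patient={name};result={result}"
anchored = digest(template.format(name="J. Doe", result="positive"))

candidates = [
    template.format(name=n, result=r)
    for n in ("J. Doe", "A. Smith")
    for r in ("positive", "negative")
]
print(guess_preimage(anchored, candidates))   # patient=J. Doe;result=positive
```

The attack succeeds whenever the space of plausible documents is small enough to enumerate; adding a secret random value to the document before hashing would make such enumeration impractical.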
4.4. Threat Model
While the original Bitcoin protocol includes several considerations for its security threat model, such as double spending, consensus protocol issues, and centralization risks, it also incorporates economic incentives for participants to behave according to the protocol. The threat model for privacy issues is limited to the linkage of blockchain addresses and the suggestion to avoid address reuse.
Van Landuyt et al. [59] investigated threat models for blockchain systems and identified the following privacy threats:
Linking blockchain entities to each other.
Linking blockchain entities to IP addresses.
Linking blockchain entities to real-world identities.
Linking entities to each other has a similar impact to linking entities to on-chain and off-chain data (see Section 4.3.1 and Section 4.3.3). Goldreich [60] incorporates the roles of participants into the threat models of communication protocols:
Honest participants adhere to the rules of the communication protocol.
Honest-but-curious participants follow the protocol’s rules but keep track of the information that they send and receive.
Malicious participants may disregard the protocol rules at any time. They can cease communication, disrupt the protocol’s execution, and send arbitrary messages.
With regard to the privacy threat model for blockchain systems, honest-but-curious actors can observe the on-chain state and combine the information gained by collecting P2P network data and off-chain data to impact the privacy of blockchain system users. If not stated otherwise, the privacy attacks and protections discussed in this paper consider only the honest-but-curious actor.
5. On-Chain Layer
Blockchain technology maintains a transparent append-only ledger, where all transactions are recorded and visible to all participants in the network. In many blockchain systems, transactions include at least the sender’s identifier, the recipient’s identifier, and the amount transferred or a smart contract payload. This information is stored on the blockchain and can be accessed by anyone.
Although real-world identities are not directly linked to blockchain addresses, transactions are still associated with these addresses. Blockchain addresses are pseudonymous, which means that they are not explicitly tied to individuals but can be tracked and analyzed. By examining transaction patterns and addresses’ activity, it is possible to make inferences about the entities involved.
The reuse of blockchain addresses can compromise privacy. If an address is reused for multiple transactions, it becomes easier to link those transactions together, potentially revealing the transaction history and patterns of an individual or entity. Analytical techniques can be employed to cluster related addresses and identify transaction flows.
Various tools and techniques exist for blockchain analysis, which allows for the exploration and tracking of transactions. Through sophisticated data analytics, it is possible to examine transaction flows, identify common addresses, and even associate addresses with known entities or individuals.
5.1. Privacy Attacks
5.1.1. Address Clustering
Address clustering techniques for account-based blockchain systems differ from those for UTXO-based systems. Because most wallets in account-based systems use a single address, users leak less linking information, and address clustering is therefore less essential for analyzing users’ behavior. However, if users purposely use multiple addresses, clustering becomes harder, as common heuristics (e.g., the multi-input heuristic [11]) cannot be used. In account-based blockchain systems, other user behavior has to be exploited. Victor [12] proposes an address clustering technique for Ethereum based on the deposit addresses of exchanges: cryptocurrency exchanges provide their users with deposit addresses, and all addresses that send funds to the same deposit address can then be considered to belong to the same entity.
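The deposit-address idea can be sketched as grouping senders by the deposit address they fund. The sketch assumes that the set of exchange deposit addresses is already known, which is itself a nontrivial step that is not covered here; the transfer data and the helper name are hypothetical.

```python
from collections import defaultdict

def cluster_by_deposit_address(transfers: list, deposit_addresses: set) -> dict:
    """Map each known deposit address to the set of addresses that funded it."""
    clusters = defaultdict(set)
    for sender, recipient in transfers:
        if recipient in deposit_addresses:
            clusters[recipient].add(sender)
    return dict(clusters)

# Hypothetical (sender, recipient) transfers; "dep1" is assumed to be a
# deposit address already attributed to an exchange.
transfers = [("0xA", "dep1"), ("0xB", "dep1"), ("0xC", "0xD")]
print(cluster_by_deposit_address(transfers, {"dep1"}))
```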
In UTXO-based systems, such as Bitcoin, CoinJoins (see Section 5.2.1) are commonly used to increase users’ on-chain privacy. CoinJoins directly decrease the effectiveness of the multi-input address clustering heuristic. To combat this, Wahrstätter et al. [13] describe how CoinJoin transactions of known CoinJoin systems can be detected and thus excluded from the address clustering process.
Zhang et al. [14] propose an improved address clustering scheme for Zcash by applying the multi-input heuristic, change heuristic, and shielded change heuristic together, which increases the clustered address ratio from 27% to 36%.
5.1.2. Address Classification
Liu et al. [15] introduce FA-GNN, which is a graph neural network with neighborhood node filtering and feature augmentation. It classifies Ethereum addresses, based on their own properties and the properties of their connected addresses, into categories including mining, mixing, and exchanges. An additional result of the paper is that the transaction amount is a more significant metric for node classification than the number of transactions between addresses or the degree of a neighbor. The authors also note that Ethereum addresses display a high degree of heterophily, meaning that they transact mostly with addresses that have a different classification to their own.
The work of Yang et al. [16] extends the work of Liu et al. [15] by using both the homophily and heterophily of Ethereum addresses to classify them via a graph neural network.
5.1.3. Transaction Graph Analysis
Blockchain systems can be seen as a transaction graph, where transactions alter the state of accounts or addresses, forming a historical record of transactions. In this graph, each transaction is represented as an edge, and each address is represented as a vertex. By analyzing the transaction graph, relationships between addresses can be discovered.
Node embeddings enable the mapping of a graph’s structure and the connections between nodes into a vector space. This vector space representation can be leveraged by machine learning models to analyze and understand the graph.
Wijaya et al. [17] demonstrate that transaction inputs can be linked between the hard forks of Monero, notably in October 2018, by comparing the key images of inputs. After a hard fork, a user’s outputs can be spent on each system separately. When this happens, the same key image is used for each output. This results in linking the user’s transactions, as well as decreasing the anonymity set for other users. Before the hard fork of October 2018, a minimum ring size of 5 was set, which was increased in later hard forks to 7, 11, and currently 16. The increase in ring size greatly reduces the impact of this attack.
Lin et al. [18] explore how the transaction patterns of addresses in Ethereum can be predicted. They employ a scoring method based on common neighbors between addresses to determine similar transaction behavior. In this work, the authors do not consider internal transactions that are created by smart contracts.
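A common-neighbor score can be sketched on an undirected view of the transaction graph. This is a generic illustration of the idea and not necessarily the exact scoring function used by the cited authors; the edge list and the helper name are hypothetical.

```python
def common_neighbor_score(edges: list, a: str, b: str) -> int:
    """Count addresses that have transacted with both `a` and `b`."""
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    return len(neighbors.get(a, set()) & neighbors.get(b, set()))

# Hypothetical (sender, recipient) edges of a small transaction graph.
edges = [("A", "X"), ("B", "X"), ("A", "Y"), ("Y", "B"), ("A", "Z")]
print(common_neighbor_score(edges, "A", "B"))   # 2
```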
Li et al. [19] propose a new embedding method called transE for Ethereum transactions, which includes both the timestamp and amount data to detect phishing scam accounts. Compared with other embedding methods, transE achieves higher performance.
Beres et al. [20] attempt to cluster Ethereum addresses by using time-of-day activity, gas prices, and the resulting graph structure of transactions. They find that using graph links achieves a 20% higher success rate in finding address pairs than using time and gas price data. Additionally, the authors identify three user errors that can link a deposit address and a withdrawal address of the Tornado Cash mixer:
The user uses the same address for deposit and withdrawal.
The user uses the same unique gas price for deposit and withdrawal.
There exists a transaction between a deposit and a withdrawal address.
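The second error in the list above, a unique gas price shared by a deposit and a withdrawal, can be sketched as follows. The records and the helper name are hypothetical, and "unique" is interpreted here as a gas price that occurs exactly once among the deposits and exactly once among the withdrawals.

```python
from collections import Counter

def link_by_unique_gas_price(deposits: list, withdrawals: list) -> list:
    """Pair a deposit and a withdrawal sharing a gas price used once per side."""
    dep_counts = Counter(price for _, price in deposits)
    wd_counts = Counter(price for _, price in withdrawals)
    wd_by_price = {price: addr for addr, price in withdrawals}
    return [
        (addr, wd_by_price[price])
        for addr, price in deposits
        if dep_counts[price] == 1 and wd_counts.get(price) == 1
    ]

# Hypothetical (address, gas price in wei) records for one pool.
deposits = [("0xD1", 20_000_000_000), ("0xD2", 13_370_000_001)]
withdrawals = [
    ("0xW1", 13_370_000_001),
    ("0xW2", 20_000_000_000),
    ("0xW3", 20_000_000_000),
]
print(link_by_unique_gas_price(deposits, withdrawals))   # [('0xD2', '0xW1')]
```

The common gas price of 20 gwei yields no link because two withdrawals share it, whereas the unusual value singles out one deposit and one withdrawal.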
Similarly, Du et al. [21] use the transaction network structure of Ethereum to construct MixBroker, a GNN employing GraphSAGE, which aims to find address pairs of Tornado Cash users.
For Bitcoin, Li and He [22] build their own GNN to deanonymize Bitcoin mixing graphs.
Dekhil et al. [25] also focus on CoinJoins in the Bitcoin blockchain. They apply several machine learning methods (including K-NN, XGBoost, and Random Forests) to detect whether transactions are part of the CoinJoin mixing graph. Their work emphasizes the critical role of transaction features, particularly the number of inputs and outputs, in enhancing the performance of machine learning models for this task.
Wu et al. [23] showcase TRacer, a tool that supports finding linkable addresses via community detection. The authors’ work considers not only transactions between EOAs but also a limited number of internal transaction types (swap and transfer).
Tornado Cash, a coin mixer on the Ethereum blockchain, is implemented as a smart contract that provides several denomination pools into which users can deposit coins and from which they can withdraw them. When depositing funds into a pool, the user also provides the commitment of a secret of their choosing. In the withdrawal transaction, the user is able to prove their prior deposit by providing a non-interactive zero-knowledge proof showing that they know the initial secret. The deposit and withdrawal transactions are not linked, thus obscuring the ownership of funds. Wu et al. [37] propose several heuristics for linking the sender and receiver of mixed Tornado Cash funds:
A deposit to and a withdrawal from the same pool occur within 3 min between two addresses.
A deposit and a withdrawal both use the exact same custom gas price.
At least three non-Tornado Cash transactions exist between two addresses that deposit to or withdraw from Tornado Cash.
Two addresses make the same number of deposits to and withdrawals from the same pool.
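The first heuristic in the list above can be sketched as a time-window join between deposits and withdrawals of the same pool. The records and the helper name are hypothetical, and the sketch assumes that the withdrawal follows the deposit within the 3 min (180 s) window.

```python
def link_by_time_window(deposits: list, withdrawals: list, window_s: int = 180) -> list:
    """Pair deposits with later withdrawals from the same pool within `window_s` seconds."""
    return [
        (d_addr, w_addr)
        for d_addr, d_pool, d_time in deposits
        for w_addr, w_pool, w_time in withdrawals
        if d_pool == w_pool and d_addr != w_addr and 0 <= w_time - d_time <= window_s
    ]

# Hypothetical (address, pool, unix timestamp) records.
deposits = [("0xD1", "1 ETH", 1_000), ("0xD2", "10 ETH", 5_000)]
withdrawals = [("0xW1", "1 ETH", 1_120), ("0xW2", "10 ETH", 9_000)]
print(link_by_time_window(deposits, withdrawals))   # [('0xD1', '0xW1')]
```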
5.2. Privacy Protections
Traditional blockchain architectures, such as those employed by Bitcoin and Ethereum, maintain a public ledger that records all transactions, making data easily accessible to anyone accessing the public network. This includes transaction payloads, transaction timestamps, and participants. This transparency is crucial for ensuring trust and accountability but poses challenges when it comes to protecting the privacy of individuals and entities involved in transactions. Exceptions are blockchain systems specifically designed to improve the privacy of their users (e.g., Monero, Dash, and Zcash).
On-chain privacy mechanisms address these challenges by introducing cryptographic techniques and obfuscation schemes that hide transaction details while still preserving the integrity and verifiability of the blockchain system by prohibiting the forecasting of transactions on a macro- and micro-level, as shown by Wei et al. [
27] and Sharma and Sharma [
28]. Other popular techniques include zero-knowledge proofs (cf. [
61]), ring signatures (cf. [
62]), and ownership obfuscation. These techniques are widely adopted in privacy-focused blockchain systems like Monero and Zcash as foundational elements to obscure transaction origins and enhance user anonymity while also being integrated into smart contract ecosystems on Ethereum and other platforms to bolster privacy-preserving functionalities in decentralized applications. Zero-knowledge proofs are cryptographic protocols enabling a prover to demonstrate the validity of a statement to a verifier without revealing any additional information. They satisfy three properties: completeness (valid statements are accepted), soundness (invalid statements are rejected), and zero-knowledge (no information beyond the statement’s validity is disclosed). Zero-knowledge proofs underpin privacy-preserving systems, enabling verifiable computations while maintaining confidentiality.
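To make the three properties concrete, the following sketch shows a Schnorr proof of knowledge of a discrete logarithm, made non-interactive with the Fiat-Shamir transform. It is a classic textbook construction chosen here for illustration and is not the proof system of any particular blockchain discussed in this review. The group parameters `P`, `Q`, and `G` are deliberately tiny toy values (an assumption of this example) and offer no security.

```python
import hashlib
import secrets

# Toy parameters (assumption, insecure): G generates a subgroup of prime order Q modulo P.
P, Q, G = 23, 11, 2


def _challenge(*values):
    """Fiat-Shamir challenge: hash the public values into an integer modulo Q."""
    data = "|".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret_x):
    """Prove knowledge of secret_x with y = G^secret_x mod P without revealing it."""
    y = pow(G, secret_x, P)
    k = secrets.randbelow(Q - 1) + 1      # fresh random nonce in [1, Q-1]
    t = pow(G, k, P)                      # commitment to the nonce
    c = _challenge(G, y, t)
    s = (k + c * secret_x) % Q            # response; the nonce masks the secret
    return y, (t, s)


def verify(y, proof):
    """Accept iff G^s == t * y^c mod P, which holds for honestly generated proofs."""
    t, s = proof
    c = _challenge(G, y, t)
    return pow(G, s, P) == (t * pow(y, c, P)) % P
```

An honest proof always verifies (completeness), a response that does not fit the challenge is rejected (soundness, which is only meaningful for realistically large groups), and the verifier sees only `y`, `t`, and `s`, in which the random nonce hides the secret (zero-knowledge).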
A range proof is a specific type of zero-knowledge proof used to demonstrate that a value lies within a specified range without revealing the value itself. This is particularly important in privacy-focused cryptocurrencies, where a user might want to prove that they have enough funds to complete a transaction without disclosing the exact amount.
5.2.1. Ownership Obfuscation
Coin mixing, often referred to as coin tumbling or coin laundering, is a process by which the traceability of cryptocurrency transactions is obscured, making it difficult to link a sender’s address to a receiver’s address. The fundamental principle behind coin mixing involves breaking the deterministic link between the input and output addresses in a transaction. This is achieved by pooling and subsequently redistributing funds among multiple participants, making it challenging for external observers to determine the true origin and destination of the funds.
Coin mixing employs various techniques to achieve its privacy-enhancing objectives. One common method is CoinJoin (cf. [
29]), where multiple users collaboratively create a transaction by combining their inputs and outputs. This results in a complex network of interconnected transactions, making it difficult for external entities to distinguish between the original inputs and outputs. Thus, it is difficult to identify the exact ownership of each output.
Figure 4 shows an example of three CoinJoin transactions mixing funds. In the example, the outputs and inputs marked with red connections belong to one user. As all transactions use the same denomination and each transaction has multiple inputs and outputs, the ownership of the outputs is obfuscated, and address A cannot be linked to address O.
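The role of a shared denomination can be illustrated with a small sketch. It models a single CoinJoin transaction in a strongly simplified way (one input per user, no fees, amounts as plain integers; all of these are assumptions of this example) and asks, for every output, which users could own it if an observer only compares amounts.

```python
def candidate_owners(inputs, outputs):
    """Map each output address to the users whose input amount equals the output amount.

    inputs:  dict of user -> input amount
    outputs: list of (output address, output amount)
    """
    return {
        out_addr: sorted(user for user, amount in inputs.items() if amount == out_amount)
        for out_addr, out_amount in outputs
    }


# Same denomination for everyone: every output has three plausible owners.
equal = candidate_owners(
    {"alice": 100, "bob": 100, "carol": 100},
    [("O1", 100), ("O2", 100), ("O3", 100)],
)

# Distinct amounts: every output can be matched to exactly one input.
unique = candidate_owners(
    {"alice": 100, "bob": 250, "carol": 70},
    [("O1", 250), ("O2", 70), ("O3", 100)],
)
```

With equal amounts, each output has an anonymity set of three users. With distinct amounts, the anonymity set shrinks to one, and the mixing provides no protection against amount matching.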
Alternatively, other privacy techniques such as confidential transactions and ring signatures can be used to introduce an additional layer of privacy. Confidential transactions hide the transaction amounts, and ring signatures let one member of a group of possible signers sign a transaction without revealing which member signed, making it challenging to identify the actual owner.
In the early days of CoinJoin usage, participants commonly used unique amounts of coins in CoinJoin transactions. This allowed observers to identify participants despite using CoinJoin, as the input and output amounts could easily be matched. This was improved by Duffield and Diaz [
30] and applied in the Dash cryptocurrency by using common amount denominations for mixing. Similarly, Whirlpool and Wasabi 2.0 (cf. [
24,
26]) also use common denominations to implement their CoinJoin service on the Bitcoin blockchain. Stütz et al. [
24] analyze the usage of Whirlpool and Wasabi by creating and applying transaction classifiers.
Figure 5 shows the number of Wasabi 2.0 and Whirlpool transactions over time.
In contrast, Maurer et al. [
31] propose a new CoinJoin protocol that allows arbitrary values while maintaining high anonymity.
Ra et al. [
32] propose a privacy scheme for permissioned blockchains, where users are anonymously authenticated and the tracking of malicious parties is possible via cryptographic solutions.
In their work, Nosouhi et al. [
33] propose UCoin, which is a mixing scheme that allows for the aggregation and thus de-linking of input addresses, output addresses, and amounts in transactions. They achieve this by extending the Dining Cryptographers (DC) scheme to reduce collision possibilities.
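For readers unfamiliar with the Dining Cryptographers idea, the following sketch shows one round of the basic DC-net, not the extension proposed for UCoin. Every pair of participants shares a random pad, each participant announces the XOR of its pads (the sender additionally XORs in its message), and the XOR of all announcements reveals the message without revealing who sent it. The function name and parameters are hypothetical.

```python
import secrets
from itertools import combinations


def dc_net_round(n, messages, bits=32):
    """Run one DC-net round with n participants.

    messages maps a participant index to the integer it wants to broadcast;
    participants without an entry contribute 0.
    """
    # One shared random pad per pair of participants.
    pads = {pair: secrets.randbits(bits) for pair in combinations(range(n), 2)}

    announcements = []
    for i in range(n):
        value = messages.get(i, 0)
        for pair, pad in pads.items():
            if i in pair:
                value ^= pad          # each pad enters exactly two announcements
        announcements.append(value)

    result = 0
    for value in announcements:
        result ^= value               # all pads cancel, the message(s) remain
    return announcements, result
```

If two participants send in the same round, the result is the XOR of both messages and neither can be recovered. This is the collision problem that schemes built on DC-nets have to address.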
5.2.2. Zero-Knowledge Proofs
DCAP (Decentralized Conditional Anonymous Payment System) is introduced by Lin et al. [
34]. Based on blockchain technology, it allows for anonymous transactions via cryptographic primitives. Similarly to Monero, it allows one to reveal details of obfuscated transactions. With Monero, the system’s users control the revealing process of their own transactions. Conversely, DCAP allows a trusted third party to deanonymize every transaction and user in the system, while the users stay anonymous between themselves.
Kerber et al. [
35] combine the privacy properties of Zcash with the PoS protocol Ouroboros Genesis (cf. [
36]) to form a privacy-preserving PoS scheme named Ouroboros Crypsinous. The protocol allows stakers to participate privately in the staking process at the on-chain level by using SNARKs and other cryptographic techniques to hide both the staker’s identity and their stake.
Zhao et al. [
38] propose a batch payment scheme with denomination privacy using homomorphic encryption. It includes several zero-knowledge proofs to prevent overspending, prevent negative payments, and prove correct ciphertext generation.
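The property that makes homomorphic encryption useful for hidden payment amounts is that ciphertexts can be combined without decrypting them. The sketch below uses a toy Paillier cryptosystem, which is additively homomorphic, purely as an illustration. It is an assumption of this example that Paillier is a suitable stand-in; the cited scheme may use a different construction. The primes are far too small to be secure, and `pow(x, -1, n)` requires Python 3.8 or later.

```python
import math
import secrets

# Toy key material (assumption, insecure): two small primes.
P_PRIME, Q_PRIME = 1009, 1013
N = P_PRIME * Q_PRIME
N_SQ = N * N
G = N + 1                                            # standard simple generator choice
LAM = (P_PRIME - 1) * (Q_PRIME - 1) // math.gcd(P_PRIME - 1, Q_PRIME - 1)  # lcm(p-1, q-1)
MU = pow(LAM, -1, N)                                 # valid because G = N + 1


def encrypt(m):
    """Encrypt an integer 0 <= m < N under the public key (N, G)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:                      # r must be invertible modulo N
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c):
    """Decrypt with the private values LAM and MU."""
    l_value = (pow(c, LAM, N_SQ) - 1) // N
    return (l_value * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo N."""
    return (c1 * c2) % N_SQ
```

A verifier holding only ciphertexts could thus check that encrypted amounts sum to an expected encrypted total. Range and correctness proofs are then needed to rule out negative or malformed values, since the arithmetic wraps around modulo `N`.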
5.2.3. Ring Signatures and Range Proofs
Kopp et al. [
39] propose a decentralized file storage system, which uses schemes such as ring signatures and range proofs to increase the privacy of its users. The system also supports proof of storage via sentinel blocks, which are verifiable blocks of data stored by the storage providers.
6. Off-Chain Layer
6.1. Privacy Attacks
Sabry et al. [
40] establish connections between advertisements and trades on the LocalBitcoins escrow service and on-chain transactions through timing and amount heuristics. This association is facilitated by the publicly accessible LocalBitcoins API, which discloses all advertisements and trades, along with widely recognized on-chain escrow addresses that inadvertently reveal a substantial number of on-chain transactions associated with LocalBitcoins. LocalBitcoins suspended trading and discontinued its service (https://localbitcoins.com/service_closure/ (accessed on 28 July 2025)) in February 2023.
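A timing and amount heuristic of this kind can be sketched as follows. The record layout, the function name, and the one-hour window are assumptions made for illustration and are not taken from the cited study. A trade is linked to an on-chain transaction only if exactly one transaction with the same amount appears within the window after the trade.

```python
def match_trades(trades, transactions, max_delay_s=3600):
    """Link off-chain trades to on-chain transactions by amount and timing.

    trades:       list of (trade_id, amount_sat, timestamp)
    transactions: list of (txid, amount_sat, timestamp)
    Returns a dict trade_id -> txid containing only unambiguous matches.
    """
    matches = {}
    for trade_id, amount, t_trade in trades:
        candidates = [
            txid
            for txid, tx_amount, t_tx in transactions
            if tx_amount == amount and 0 <= t_tx - t_trade <= max_delay_s
        ]
        if len(candidates) == 1:      # skip trades with zero or several candidates
            matches[trade_id] = candidates[0]
    return matches
```

Requiring a unique candidate trades recall for precision: common amounts in busy periods remain unmatched, whereas unusual amounts are linked with high confidence.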
Wu et al. [
41] describe the XBlockFlow framework to create a dataset containing known Ethereum addresses associated with money laundering. They achieve this by first collecting known money laundering addresses from reports and websites. Afterward, a taint analysis is performed, which traces the usage of funds and thus collects more addresses.
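The taint analysis step can be pictured as a breadth-first traversal of the transfer graph starting from the known addresses. The sketch below is a generic illustration and not the XBlockFlow implementation; the function name, the edge-list input, and the hop limit are assumptions.

```python
from collections import deque


def taint(transfers, seeds, max_hops=3):
    """Follow funds forward from seed addresses.

    transfers: iterable of (sender, receiver) pairs
    seeds:     addresses already known to be involved
    Returns a dict address -> number of hops from the nearest seed.
    """
    outgoing = {}
    for sender, receiver in transfers:
        outgoing.setdefault(sender, set()).add(receiver)

    hops = {address: 0 for address in seeds}
    queue = deque(seeds)
    while queue:
        address = queue.popleft()
        if hops[address] == max_hops:
            continue                   # do not expand beyond the hop limit
        for receiver in outgoing.get(address, ()):
            if receiver not in hops:   # first visit gives the shortest distance
                hops[receiver] = hops[address] + 1
                queue.append(receiver)
    return hops
```

A hop limit, or some other stopping rule, is needed in practice because funds eventually reach high-volume services such as exchanges, after which nearly every address would be marked as tainted.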
Hickey and Harrigan [
42] examine Bisq, a decentralized exchange built on the Bitcoin blockchain system. They note several information leaks in the Bisq trading protocol. In the Bisq DAO, every action corresponds to a blockchain transaction with one of multiple types. A significant number of these transactions are self-transfers, where an entity moves assets between its own addresses. This pattern allows for address clustering, as all addresses involved in self-transfers can be grouped to represent a single entity. The authors propose this heuristic to improve address clustering in Bitcoin.
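Grouping addresses connected by self-transfers is a connected-components problem, for which a union-find structure is a common choice. The sketch below assumes that self-transfers have already been identified (for example by their transaction type) and are given as (source address, destination address) pairs; this input format and the function name are hypothetical.

```python
def cluster_addresses(self_transfers):
    """Group addresses that are connected by self-transfers into entities."""
    parent = {}

    def find(address):
        parent.setdefault(address, address)
        while parent[address] != address:
            parent[address] = parent[parent[address]]   # path halving
            address = parent[address]
        return address

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for source, destination in self_transfers:
        union(source, destination)

    clusters = {}
    for address in list(parent):
        clusters.setdefault(find(address), set()).add(address)
    return sorted(sorted(cluster) for cluster in clusters.values())
```

Each returned list represents one presumed entity. The same structure can merge the output of several clustering heuristics, since every additional heuristic only contributes further `union` calls.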
In the context of privacy attacks based on off-chain data, Goldfeder et al. [
43] discuss how information leakage by web services and payment processors to third parties can severely impact users’ privacy by linking Bitcoin on-chain data to off-chain payment details. Similarly, Zhang et al. [
44] conduct the same analysis on the Litecoin blockchain system and also find that third-party tracker information can be linked to on-chain data.
Rohrer and Tschorsch [
45] propose an attack to infer the destination of Lightning Network transactions. This works by having malicious nodes in the transaction propagation network observe the timings of incoming transactions.
6.2. Protections
Notably, the literature review did not result in papers discussing privacy enhancements for off-chain data. While simply not storing private data on public blockchain systems or conducting computations with private data off-chain already enhances the privacy of its users, it is conceptually hard to prevent off-chain data from being linked to on-chain data.
However, users could be educated to recognize that off-chain data can be linked to their blockchain identities and which measures they can take to minimize the impact. The off-chain attacks discussed in this paper target external services that connect to a blockchain system. An educated user may limit the impact of off-chain privacy issues by using on-chain privacy techniques, such as ownership obfuscation via mixing.
Also, external services could introduce privacy techniques to reduce the amount of personal information that they gain when users use their service. For example, k-anonymity (cf. [
63]) could be used to obscure the blockchain address that a user is searching on a blockchain explorer.
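One possible way to realize such a lookup is prefix bucketing: the client sends only a short prefix of the hashed address, the service returns every record in that bucket, and the client filters locally. The sketch below is a hypothetical design for illustration; the `Explorer` class, its methods, and the use of SHA-256 prefixes are assumptions and do not describe any existing block explorer.

```python
import hashlib


def bucket_of(address, prefix_len=2):
    """Return the first hex characters of the SHA-256 hash of an address."""
    return hashlib.sha256(address.encode()).hexdigest()[:prefix_len]


class Explorer:
    """Hypothetical explorer that only answers queries for whole buckets."""

    def __init__(self, balances, prefix_len=2):
        self.prefix_len = prefix_len
        self.buckets = {}
        for address, balance in balances.items():
            self.buckets.setdefault(bucket_of(address, prefix_len), {})[address] = balance

    def query(self, prefix):
        # The service learns the prefix, not the address the user cares about.
        return dict(self.buckets.get(prefix, {}))


def private_lookup(explorer, address):
    """Fetch the bucket containing the address and filter on the client side."""
    records = explorer.query(bucket_of(address, explorer.prefix_len))
    return records.get(address), len(records)
```

The second return value is the size of the bucket, that is, the number of addresses among which the queried one is hidden. A shorter prefix yields larger buckets and stronger hiding at the cost of more transferred data, and a guarantee of at least k entries per bucket would have to be enforced by the service.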
If a user’s identity is required to be linked to off-chain data or computations, the user at least has more control. On-chain smart contracts could play an integral part in managing user consent, data access, and data deletion.
7. P2P Network Layer
7.1. Privacy Attacks
Here, we provide an overview of the key aspects of Bitcoin’s networking behavior, which is commonly inherited by alternative cryptocurrencies.
Nodes can discover peers through the following methods (cf. [
46,
47]):
Via predefined DNS servers that provide an initial set of node IP addresses.
Via a predefined set of IP addresses, as an alternative when DNS servers are not reachable.
Via a gossip protocol, enabling the sharing of node IP addresses among connected nodes.
Biryukov and Pustogarov [
48] propose a method to deanonymize Bitcoin clients using Tor. Counterintuitively, Tor can decrease the user’s privacy, as it makes it harder for the Bitcoin client to detect attackers, who share the IP addresses of Tor exit nodes. The authors discuss a technique to remove legitimate peers from a victim client’s peer pool. Additionally, they introduce an address cookie, which exploits Bitcoin’s peer discovery protocol to uniquely identify Bitcoin peers even between sessions. A fix (https://github.com/bitcoin/bitcoin/pull/5442 (accessed on 28 July 2025)) for the address cookie attack was proposed by the authors and merged into Bitcoin Core.
In [
47], the authors propose a technique for linking transactions to their P2P nodes based on message propagation timings and address advertisement messages. In addition to Bitcoin, they apply the method to the privacy-focused cryptocurrencies Dash, Monero, and Zcash.
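The simplest timing-based estimator, often called the first-spy estimator, attributes a transaction to the peer from which a well-connected observer first received it. The sketch below illustrates only this baseline idea and is not the method of the cited work, which additionally uses address advertisement messages. The observation format is an assumption.

```python
def first_spy(observations):
    """Guess the origin of each transaction as the peer that relayed it first.

    observations: iterable of (txid, peer_ip, arrival_time) recorded by an
    observer node that is connected to many peers.
    """
    first = {}
    for txid, peer_ip, arrival_time in observations:
        if txid not in first or arrival_time < first[txid][1]:
            first[txid] = (peer_ip, arrival_time)
    return {txid: peer_ip for txid, (peer_ip, _) in first.items()}
```

The estimate becomes more reliable the more peers the observer is connected to, which is why randomized relay delays and the protections discussed in Section 7.2 aim to decouple the first relayer from the true origin.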
Kim et al. [
49] examine and analyze network traffic on the P2P layer of the Bitcoin system. They describe a machine learning approach based on an autoencoder neural network, which allows for the detection of network traffic anomalies.
Several tools for examining the P2P network of cryptocurrencies have been proposed.
CoinSeer, demonstrated in [
50], is a modified Bitcoin Core client. The authors show that linking a node’s IP address to on-chain addresses is possible when users’ transaction patterns are noticeable enough to differentiate them from the rest. Zhu et al. [
51] describe an application that collects general P2P network data (IP addresses and gossiped transactions). Di Francesco Maesa et al. [
46] propose BITKER, a general-purpose P2P Bitcoin network data collection client, which allows for integration with external applications via an API. Their work enables the collection of research data of the P2P network for further processing.
7.2. Privacy Protections
Modinger et al. [
54] propose increasing network-level privacy by combining a flood-and-prune propagation method with a configurable Dining Cryptographers message-passing scheme.
The Dandelion protocol, originally proposed by Bojja Venkatakrishnan et al. [
52], has been deployed in several blockchain systems, including Monero, to enhance transaction anonymity. Similarly, Franzoni and Daza [
53] introduce a network-level privacy scheme wherein nodes in the P2P network are categorized as routable peers (actively propagating transactions) or non-routable peers (not propagating transactions). Non-routable peers are instructed not to advertise themselves in the overlay network, thereby reducing the attack surface without compromising network functionality. Furthermore, the authors describe a modification to the Dandelion protocol, where newly created transactions are mixed with other received transactions before propagation, obscuring the origin of transactions and strengthening privacy guarantees. This approach exemplifies how protocol-level design choices can mitigate deanonymization risks in blockchain systems.
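The core idea of the basic Dandelion protocol is a two-phase propagation: a transaction is first forwarded along a single path (the stem) and only later broadcast to all peers (the fluff). The sketch below simulates the stem phase only. It is a simplified model and not the modification proposed by Franzoni and Daza; the single-successor map, the parameter `fluff_prob`, and the hop limit are assumptions of this example.

```python
import random


def dandelion_stem(origin, successor, fluff_prob=0.1, rng=None, max_hops=100):
    """Simulate the stem phase and return the path taken by the transaction.

    successor maps every node to the single peer it forwards stem transactions to.
    The last node of the returned path is the one that starts the broadcast (fluff).
    """
    rng = rng or random.Random()
    node = successor[origin]            # the origin always forwards at least once
    path = [origin, node]
    # Every stem node switches to the fluff phase with probability fluff_prob.
    while rng.random() >= fluff_prob and len(path) < max_hops:
        node = successor[node]
        path.append(node)
    return path
```

An observer who sees the broadcast start at the last node of the path learns little about the origin, because that node is typically several hops away from it. A first-relayer estimate therefore points to the wrong peer.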
8. Discussion
8.1. Privacy Impact of Design Decisions
Blockchain systems, as introduced in the original Bitcoin paper, were not designed with privacy in mind. Although the data on-chain are pseudonymous, each interaction with a blockchain system introduces potential information leaks. The UTXO model inherently links users’ blockchain addresses if not used carefully. Although CoinJoins can obscure users’ identities, they only patch privacy issues on-chain.
Public access and permanent data storage make it inherently difficult for blockchain systems to preserve users’ privacy. Although the system does not require real-world credentials for interacting with the system, each pseudonymous interaction with the system can be observed publicly. Due to the persistent storage of transactions in blockchain systems, which represent the users’ interaction with blockchain systems on the on-chain level, users’ privacy can be degraded in the future if new privacy attacks are developed.
The inherent open design of blockchain systems both on-chain and on the P2P layer exposes users to serious privacy issues. The permanent historical record of all transactions provides malicious actors with a large attack surface, and the unauthenticated P2P layer allows one to observe virtually all network traffic (RQ1).
8.2. Methods to Decrease Information Privacy
In this paper, the current literature on the privacy of users of blockchain systems is reviewed. Attacks and protections can be divided into three layers: the on-chain layer, the off-chain layer, and the P2P network layer (see
Section 4). The reviewed privacy attacks (RQ2) reveal the inherent vulnerabilities in blockchain systems, expose design flaws, and illustrate that existing privacy techniques do not always provide true anonymity.
Although some attacks can be mitigated by privacy protections and schemes, these mitigations often come at a considerable cost in usability or require major changes to existing systems.
8.3. Methods to Increase Information Privacy in Existing Blockchain Systems
To enhance privacy in blockchain systems, several protocols have been proposed. These range from innovative cryptographic methods utilizing zero-knowledge proofs and shielded addresses to network propagation strategies designed to obscure the sender’s IP address. Despite significant advancements in cryptographic protocols, the propagation of transactions across the network continues to pose a privacy threat to its users. Although some studies have suggested that Dining Cryptographers networks or Mixnets can facilitate anonymous broadcasting among untrusting peers, such broadcasts can still enable the construction of transaction graphs, allowing for various attacks on user anonymity.
Updates to blockchain systems often require a hard fork, which is a major change in the system to which the majority of its vote-enabled users have to agree. This process can slow down responses to privacy concerns and may lead to disagreements among users about specific solutions.
Our research indicates that coin mixing is a popular and effective technique for improving privacy in existing blockchain systems (RQ3). This technique can be readily applied to UTXO-based blockchain systems due to their inherent capability for collaborative transaction construction. Additionally, blockchain systems that support smart contracts can be adapted to obscure the ownership of funds by facilitating the pooling of participants’ assets.
9. Materials and Methods
The authors declare that no generative artificial intelligence has been used in the preparation of this manuscript. Topic modeling was used to filter papers (see
Section 3).
10. Conclusions
This systematic literature review provides a comprehensive analysis of privacy challenges and solutions in the on-chain, off-chain, and peer-to-peer (P2P) network layers of blockchain systems. By synthesizing current research, this study discusses critical privacy vulnerabilities inherent in blockchain design and evaluates mechanisms to mitigate these risks.
The analysis reveals that blockchain transparency, while ensuring accountability, exposes users to significant privacy threats. At the on-chain layer (
Section 5), pseudonymous addresses are susceptible to deanonymization through transaction graph analysis, address clustering (e.g., multi-input heuristics), and behavioral patterns. Techniques like CoinJoin and zero-knowledge proofs (ZKPs) demonstrate effectiveness in obfuscating transaction links, yet their adoption faces challenges such as usability trade-offs and reliance on user compliance. The off-chain layer (
Section 6) introduces risks through metadata leaks and identity linkage via external services (e.g., exchanges and oracles), highlighting the need for encryption and user education to minimize exposure. At the P2P network layer (
Section 7), adversaries exploit the timing of transaction propagation and IP address correlations, underscoring the limitations of existing anonymization tools such as Tor and the potential of protocols like Dandelion++ to improve network-level privacy.
By addressing its research questions, this review identifies blockchain design decisions, such as public ledgers, UTXO models, and unauthenticated P2P communication, that fundamentally impact privacy (RQ1). Vulnerabilities such as address reuse, predictable transaction patterns, and off-chain data linkage exacerbate privacy degradation (RQ2). Although privacy-enhancing methods such as mixing protocols, ZKPs, and cryptographic obfuscation offer robust protections (RQ3), their integration often requires systemic changes or compromises in decentralization and scalability.
In conclusion, achieving privacy in blockchain systems requires a layered approach, combining cryptographic innovations, network protocol refinements, and user-centric practices. Future work should focus on harmonizing privacy enhancements with blockchain’s core principles of transparency and decentralization while advancing techniques to mitigate emerging threats across all layers.