Cell-Sequence-Based Covert Signal for Tor De-Anonymization Attacks

Xin, Ran; Wang, Yapeng; Huang, Xiaohong; Yang, Xu; Im, Sio Kei

doi:10.3390/fi17090403


by Ran Xin ¹, Yapeng Wang ¹,*, Xiaohong Huang ²,*, Xu Yang ¹ and Sio Kei Im ¹

¹ Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China

² Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China

* Authors to whom correspondence should be addressed.

Future Internet 2025, 17(9), 403; https://doi.org/10.3390/fi17090403

Submission received: 10 July 2025 / Revised: 1 September 2025 / Accepted: 1 September 2025 / Published: 4 September 2025

(This article belongs to the Special Issue Novel Approaches and Techniques for Privacy in Internet Communications)


Abstract

This research introduces a novel de-anonymization technique targeting the Tor network, addressing limitations in prior attack models, particularly concerning router positioning following the introduction of bridge relays. Our method exploits two specific, inherent protocol-level vulnerabilities: the absence of a continuity check for circuit-level cells and anomalous residual values in RELAY_EARLY cell counters, working by manipulating cell headers to embed a covert signal. This signal is composed of reserved fields, start and end delimiters, and a payload that encodes target identifiers. Using this signal, malicious routers can effectively mark data flows for later identification. These routers employ a finite state machine (FSM) to adaptively switch between signal injection and detection. Experimental evaluations, conducted within a controlled environment using attacker-controlled onion routers, demonstrated that the embedded signals are undetectable by standard Tor routers, cause no noticeable performance degradation, and allow reliable correlation of Tor users with public services and deanonymization of hidden service IP addresses. This work reveals a fundamental design trade-off in Tor: the decision to conceal circuit length inadvertently exposes cell transmission characteristics. This creates a bidirectional vector for stealthy, protocol-level de-anonymization attacks, even though Tor payloads remain encrypted.

Keywords:

privacy; Tor; deanonymization attack; network security

1. Introduction

Low-latency anonymous communication systems, such as Tor [1], were originally developed to provide critical privacy protections and support functionalities like hidden services. However, the widespread adoption of Tor as the predominant anonymity network has coincided with significant misuse. The same anonymity intended to protect vulnerable individuals and secure sensitive communications [2,3] has also provided opportunities for illegal activities. Specifically, Tor has become a common platform for cybercrime, hosting darknet markets for illegal goods and services, disseminating extremist content, and facilitating the distribution of child exploitation material and other harmful activities [4,5]. The use of hidden services to conceal the real locations of users further complicates law enforcement efforts to identify and apprehend perpetrators.

The extensive misuse of the anonymity provided by Tor now presents a major challenge to law enforcement and governmental organizations. Consequently, there is an urgent need to develop effective techniques for de-anonymizing malicious users and hidden services to improve public safety. Typically, de-anonymization attacks aim to identify either the websites visited by users or the actual IP addresses behind hidden services. Traditionally, executing these attacks involves three steps. First, attackers deploy malicious routers into the Tor network and wait until these routers become part of a user's communication path. Next, the routers evaluate their position within that path. Finally, based on their position, attackers decide whether to initiate an active or passive attack.

To simplify the explanation of attack techniques, prior studies often adopted a model that skipped initial positioning steps, assuming attackers had already optimally placed malicious routers within user paths. Before Tor introduced bridges (hidden onion routers) in 2008, this assumption was justified, as attackers could easily identify their positions, allowing research to focus solely on attack techniques. However, the adoption of bridges [6] significantly complicated positional identification, rendering the traditional simplified attack model unrealistic. As the statistician George E. P. Box observed, “all models are wrong; the practical question is how wrong they must be to not be useful.” Unfortunately, post-2008 research has largely overlooked this critical limitation, continuing to rely on outdated assumptions. Consequently, despite theoretical advancements, many proposed attacks remain practically ineffective.

The introduction of Tor bridges prevents malicious routers from accurately determining their positions, thereby disrupting previous de-anonymization attack strategies. To overcome this challenge, we propose an advanced attack methodology that makes the following contributions:

A More General Attack Model: We propose a more general and realistic attack model that remains effective in the modern Tor network. Unlike prior models that often assume attackers can easily determine their position within a circuit, our approach is designed to function under the uncertainty introduced by components like Tor bridges. By relaxing the need for a priori positional knowledge and instead incorporating a verification mechanism, our model overcomes a critical limitation that renders many previous attacks impractical.
Discovery of Inherent Vulnerabilities: We identify and exploit two previously unexamined, protocol-level vulnerabilities. These vulnerabilities are not mere implementation flaws but stem from a fundamental design trade-off within Tor, where the goal of concealing circuit length inadvertently exposes subtle characteristics of cell transmission. Our attack leverages these inherent weaknesses to embed a covert signal without disrupting normal operations.
An Adaptive and Stealthy Framework: We design a highly adaptive and stealthy attack framework operationalized by a finite state machine (FSM) embedded within each malicious router. Using this FSM, our routers can dynamically switch between several roles based on real-time circuit conditions, such as a signal injector, a detector, or a standard passive relay. This adaptability is crucial for evading detection and ensuring the resilience of the attack.
Experimental Validation: We validate our methodology through experiments in a controlled, semi-realistic network environment. The results demonstrate that our covert signal is both stealthy, remaining undetectable by standard Tor routers and causing no performance degradation, and highly effective, enabling the reliable correlation of Tor users with their destinations.

The remainder of this paper is organized as follows. In Section 2, we review related works on Tor de-anonymization. Section 3 provides the necessary background on the Tor protocol and its architecture, establishing the context for our attack model. In Section 4, our proposed adaptive circuit-level cell sequence attack is presented in detail, from the underlying protocol vulnerabilities to the finite state machine that governs its execution. Section 5 validates the effectiveness and stealthiness of our method through experiments conducted in a controlled, semi-realistic environment. Finally, Section 6 concludes the paper, summarizing our key findings and their implications.

2. Related Works

Research on Tor de-anonymization attacks focuses on two main categories: identifying anonymous user activities and revealing the IP addresses of hidden services. This section reviews relevant studies, highlighting their methodologies, limitations, and the defensive responses implemented by the Tor Project. After Tor introduced bridges in 2008, only Ling et al. [7] explicitly considered how attackers might identify the positions of malicious relays within a circuit. Other studies have generally assumed ideal relay positioning, avoiding this practical challenge.

2.1. Linkage of Client Activities

One of the earliest client de-anonymization attacks was proposed by Bauer et al. [8] in 2007. This attack required controlling multiple Tor relays, either by introducing new malicious nodes or hijacking existing ones. At the time, attackers could misrepresent relay bandwidth and uptime to increase the likelihood that compromised relays would be selected as entry or exit nodes. Once selected, malicious relays disrupted traffic to force users into establishing new connections, repeating this process until both entry and exit positions were compromised. Attackers then correlated traffic patterns from these relays to link a user's identity with the sites they visited. Bauer et al. further demonstrated [9] that certain applications, such as web browsing and email, were especially vulnerable due to their reliance on specific network ports. In response, the Tor Project introduced the “Bandwidth Authority” system [10], which verifies relay metrics and thus mitigates the risks posed by manipulated relay selection.

Subsequent research refined these methods using timing-based techniques. Abbott et al. [11] showed that compromised exit nodes could embed invisible signals in webpage traffic. Malicious entry nodes detected these signals, enabling attackers to trace users. Extending this approach, Wang et al. [12] demonstrated methods that embedded detectable patterns in modified web pages without relying on active scripts, such as JavaScript, thereby increasing stealth. To counteract such threats, the Tor Project implemented an HTTPS-only browsing mode [13], significantly limiting attackers’ ability to manipulate or insert identifiable traffic patterns.

Other researchers explored vulnerabilities at the protocol level. Fu et al. [14] introduced a technique whereby attackers manipulated individual encrypted data packets (cells) within Tor circuits. However, this attack depended heavily on the simultaneous presence of relays responsible for injecting and detecting malicious signals; if only an injection relay existed, the manipulated traffic became detectable, alerting network monitors. Ling et al. [15] proposed embedding signals via intentional packet delays, although this introduced unnatural traffic characteristics, potentially alerting network defenses. Similarly, Rochet and Pereira [16] exploited the method used by Tor for handling dropped data packets to embed recognizable signals. The Tor Project responded by introducing the “Vanguard” addon [17], which effectively neutralized such attacks by preventing malicious use of dropped packets.

While these protocol-level attacks improved the theoretical understanding of Tor vulnerabilities, their effectiveness was limited by unrealistic assumptions regarding relay positions and circuit conditions, particularly ignoring complications introduced by bridges.

2.2. Exposing the True IP Address of Hidden Services

Early efforts to identify hidden service locations began with Øverlier and Syverson [18] in 2006. They demonstrated how attackers could manipulate traffic using compromised clients and relays to identify hidden services’ positions within the network. This approach became considerably less effective following the introduction of guard nodes by Tor [19], which restricted relay selection and significantly complicated attackers’ strategies.

Subsequently, Ling et al. [7] presented a more refined attack involving compromised entry relays and special-purpose nodes known as rendezvous points. Their method attempted to confirm relay positions by embedding identifiable traffic signals within Tor cells. Although they accounted for the presence of bridges, their detection mechanism had significant limitations, primarily because it only functioned correctly for specific circuit types. When malicious relays were positioned in unexpected circuit roles, these signals became ambiguous and unreliable, undermining the accuracy of the attack.

Further advancements were presented by Biryukov et al. [20], who proposed an attack relying on identifiable patterns created by malicious rendezvous points and guard nodes. By sending specific sequences of network packets and observing the resulting responses, attackers could effectively reveal hidden service locations. Chakravarty et al. [21] introduced another sophisticated method using network traffic statistics (NetFlow data), correlating traffic flows to identify hidden service endpoints accurately.

More recent techniques, such as INFLOW [22] and the Duster attack [23], developed by Iacovazzi et al., employed novel watermarking strategies exploiting weaknesses in Tor’s congestion control mechanisms. Due to their subtlety and effectiveness, these attacks prompted significant defensive measures from the Tor Project, notably the deployment of authenticated acknowledgments in 2020 [24], effectively neutralizing vulnerabilities related to congestion control.

Despite these advancements, previous research often relied on overly optimistic assumptions regarding relay positioning or control over circuit conditions, neglecting practical difficulties posed by bridges. Our paper addresses these gaps by introducing an adaptive and realistic attack model capable of dynamically adjusting relay roles based on real-time circuit conditions, significantly enhancing the practicality and effectiveness of de-anonymization attacks against Tor.

3. More Background on Tor

In this section, we provide an overview of the key Tor concepts needed to understand the paper. Section 3.1 introduces the primary components of Tor. Section 3.2 explains the basic units of communication within Tor, called cells. In Section 3.3, we discuss the different types of circuits used in Tor and outline their construction processes. Section 3.4 discusses how these circuits facilitate anonymous interactions between Tor clients and application servers. Lastly, Section 3.5 explains how onion routers evaluate their positions within circuits.

3.1. Components of Tor

Tor is a widely used overlay network that provides anonymous communication over the Internet. It is an open-source project that offers anonymity for TCP-based applications. Tor consists of the following core components:

Tor Client: The client runs local software called the Onion Proxy (OP), which anonymizes client data by routing it through the Tor network.
Application Server: This server supports TCP applications, such as web services. It can either be a public service, accessible by Tor clients through an external circuit, or a hidden service (HS), which can only be accessed by Tor clients using an internal circuit. Hidden services may use Unix sockets rather than TCP sockets to avoid leaking information about their IP address on the local network.
Onion Routers (ORs): Onion Routers are the proxies that relay data between the Tor client and the application server. There are two types of onion routers: public onion routers and hidden onion routers, the latter of which are also known as bridges. Public onion routers are listed in the directory server and can be accessed by anyone. In contrast, hidden onion routers (bridges) are not listed publicly and can only be obtained through specific channels (e.g., websites, email, or Telegram bots). These bridges are used as the initial hop to access the Tor network in regions with heavy censorship.
Directory Servers: These servers store and distribute the public information about onion routers and hidden services, including their public keys and configuration details.

3.2. Communication Unit: Cells

Cells are the basic units of communication in the Tor network. They can be categorized into fixed-length cells and variable-length cells. In this paper, we focus on fixed-length cells, which are 512 bytes in size, as shown in Figure 1a. There are two subtypes of fixed-length cells: link-level cells and circuit-level cells.

Circuit-Level Cells: These cells are used for end-to-end communication between the Onion Proxy (OP) and other nodes in the Tor circuit. When the Cmd field of a fixed-length cell is set to RELAY or RELAY_EARLY, the cell is classified as a circuit-level cell (see Figure 1b). The different Rel_cmd values shown in Figure 1b define various types of circuit-level cells, each serving a distinct purpose. These include:

RELAY_COMMAND_EXTEND: Used for extending the circuit to the next node.
RELAY_COMMAND_BEGIN: Used to create a stream that will be multiplexed over the circuit.
RELAY_COMMAND_DATA: Carries application data to be sent over the circuit to the edge nodes.
RELAY_COMMAND_CONFLUX_LINK: Sent by the OP under the Conflux subprotocol to link circuits at the edge node.
RELAY_COMMAND_ESTABLISH_RENDEZVOUS: Used by the OP to register an OR as a rendezvous point.
RELAY_COMMAND_RENDEZVOUS1: Used by the HS to join the OP-side rendezvous circuit.

Link-Level Cells: These cells are used by Tor nodes to communicate with adjacent nodes within the circuit. When the Cmd field is set to a value other than RELAY or RELAY_EARLY, the cell is classified as a link-level cell. Values of the Cmd field for link-level cells include:

CREATE and CREATED: Used to establish the session key and create a circuit.
DESTROY: Indicates that the circuit should be torn down.
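
The classification rule above can be sketched in a few lines. This is only an illustration: the function name `classify_cell` and the use of command names as strings (rather than the numeric Cmd values carried on the wire) are assumptions made for readability.

```python
# Cmd values that mark a fixed-length cell as circuit-level (Section 3.2).
CIRCUIT_LEVEL_CMDS = {"RELAY", "RELAY_EARLY"}


def classify_cell(cmd: str) -> str:
    """Classify a fixed-length cell by the name of its Cmd field (hypothetical helper)."""
    return "circuit-level" if cmd in CIRCUIT_LEVEL_CMDS else "link-level"


for cmd in ("CREATE", "CREATED", "RELAY_EARLY", "RELAY", "DESTROY"):
    print(f"{cmd:12s} -> {classify_cell(cmd)}")
```

Only the Cmd field in the cell header is needed for this decision; the Rel_cmd field matters only once a cell is known to be circuit-level.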

3.3. Circuit Types and Construction

Tor circuits can be categorized into two main types based on the service being accessed: internal circuits for hidden services, and external circuits for public services. Before establishing a circuit, the Onion Proxy (OP) uses a path selection algorithm [25] to select multiple onion routers (ORs) from the directory servers. These ORs are then used to build the circuit incrementally, one hop at a time, as illustrated in Figure 2. The total number of routers in a circuit is called the path length.

Once the path is chosen, the OP begins constructing the circuit by connecting to the first onion router, OR₁. Tor employs TLS for authentication and encryption between adjacent nodes. The OP begins by sending a CREATE cell to OR₁ and performing a Diffie-Hellman key exchange to negotiate session keys. OR₁ responds with a CREATED cell, confirming the successful circuit establishment with the first hop. To extend the circuit further, the OP sends a circuit-level cell known as RELAY_EARLY_EXTEND to OR₁. Upon receiving this cell, OR₁ extracts the enclosed messages and forwards a corresponding CREATE cell to the next router, OR₂. When OR₂ confirms the extension by sending back a CREATED cell, the OP repeats this procedure to extend the circuit to OR₃. After each step, session keys are established between the OP and each router along the circuit. Once the full circuit is established, the OP can use it to communicate anonymously with the intended server.
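
The hop-by-hop construction described above can be traced with a short sketch. It is a simplification under stated assumptions: the function `build_circuit_trace` is hypothetical, only the cells named in the text are listed, and both the relaying of extension cells through intermediate routers and the confirmation returned to the OP are omitted.

```python
def build_circuit_trace(path_length: int) -> list[tuple[str, str, str]]:
    """Return (sender, receiver, cell) triples for an incrementally built circuit."""
    if path_length < 1:
        raise ValueError("path_length must be at least 1")
    # First hop: direct CREATE / CREATED exchange between the OP and OR1.
    trace = [("OP", "OR1", "CREATE"), ("OR1", "OP", "CREATED")]
    for hop in range(2, path_length + 1):
        prev, new = f"OR{hop - 1}", f"OR{hop}"
        # The OP asks the current last hop to extend the circuit by one router.
        trace.append(("OP", prev, "RELAY_EARLY_EXTEND"))
        trace.append((prev, new, "CREATE"))
        trace.append((new, prev, "CREATED"))
    return trace


trace = build_circuit_trace(3)
for sender, receiver, cell in trace:
    print(f"{sender:>3} -> {receiver:<3} {cell}")

extend_cells = sum(1 for _, _, cell in trace if cell == "RELAY_EARLY_EXTEND")
print("extension cells sent by the OP:", extend_cells)
```

For a circuit of length 3 the OP sends two extension cells, that is, one fewer than the path length.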

External Circuit: External circuits are used by Tor clients to access public servers anonymously. After establishing a circuit (default length: 3), the OP designates the last router (OR₃) as an exit point by sending a circuit-level cell RELAY_EARLY_BEGIN. As this cell passes through each router, each layer of encryption is removed sequentially, until the exit router can read the original request and establish a TCP connection to the intended public server. Application data is subsequently exchanged between the OP and the public server using circuit-level cells with Rel_cmd value equal to RELAY_COMMAND_DATA.

Internal Circuit: Internal circuits are used to connect clients to hidden services within the Tor network, protecting both clients’ and services’ locations. Internal circuits consist of four subtypes, each serving a different purpose:

HS-side Introduction Circuit (default length: 3): Created by the hidden service to designate an onion router as an introduction point. Clients use this introduction point to contact the hidden service.
Client-side Rendezvous Circuit (default length: 3): Established by the client to designate an onion router as a rendezvous point. The rendezvous point allows the hidden service to connect back to the client anonymously.
Client-side Introduction Circuit (default length: 4): Constructed by the client to connect to the introduction point of a hidden service. This circuit type includes an additional hop, with the fourth node being established by the hidden service.
HS-side Rendezvous Circuit (default length: 4): Built by the hidden service to connect to the client-established rendezvous point. Like the client-side introduction circuit, this also involves an additional hop, with the fourth node determined by the client.

3.4. Communication Between Tor Clients and Services

As previously discussed, Tor clients use external circuits to communicate with public services. The process for communication between a Tor client and a hidden service is illustrated in Figure 3. The process begins when the hidden service connects (1) to a node within the Tor network, requesting the node to act as an introduction point for the service. If the node agrees, the connection remains open. Otherwise, the hidden service attempts to connect to another node. These connections stay open indefinitely unless one of the nodes is restarted or chooses to close the connection. Next, the hidden service contacts (2) the directory server, requesting it to publish the hidden service's contact information. After this step, the hidden service is ready to accept connection requests from clients.

To retrieve data from the hidden service, the client first connects (3) to the directory server and queries it for the contact information of the hidden service, including the addresses of the introduction points. There may be multiple introduction points for each service. The client then selects a node to act as the rendezvous point, connects (4) to it, and requests that it listen for connections on behalf of the hidden service. The client continues attempting this process until a rendezvous point accepts the request. Once accepted, the client contacts (5) the introduction point and asks it to forward the rendezvous point information to the hidden service. The introduction point then forwards (6) the rendezvous point information to the hidden service. The hidden service evaluates whether to connect to the rendezvous point. If the connection is approved, the hidden service connects (7) to the rendezvous point and requests it to establish the connection to the waiting rendezvous circuit. The rendezvous point then forwards (8) this connection request to the client.

Finally, the rendezvous point can start relaying data between the client and the hidden service, creating an anonymous data tunnel (9) between them, as shown in Figure 3.

3.5. Evaluation of Onion Router Positions in Circuits

Before bridges were introduced into the Tor network in 2008, onion routers could easily determine whether they were at the entry position in a circuit by examining the IP address of the preceding router. Since the publicly accessible consensus file was compiled and updated hourly by directory authorities, an onion router could conclude it was not the entry node if the previous router's IP address appeared in that file. Directory authorities maintained the consensus file only for onion routers, not for Tor clients. Consequently, identifying the preceding node’s information allowed a malicious router to accurately evaluate its position and plan further attack steps.

However, the introduction of bridges significantly altered this dynamic. Bridges perform the same basic functions as standard onion routers but were specifically introduced to circumvent censorship. Unlike regular onion routers, bridges do not have their details listed publicly in the consensus document. Instead, their information is maintained privately by the bridge authority. Consequently, when a router observes an upstream node whose IP is absent from the public consensus, it can no longer reliably determine if it is the entry node. This uncertainty undermines traditional attack methods, as malicious routers can no longer confidently identify their position within the circuit, complicating subsequent attack strategies.
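
The consensus-based position check, and the way bridges break it, can be illustrated with a small sketch. The function `infer_position`, its return labels, and the example addresses (taken from documentation-reserved ranges) are hypothetical and serve only to make the reasoning concrete.

```python
def infer_position(prev_hop_ip: str, consensus_ips: set[str],
                   bridges_deployed: bool = True) -> str:
    """Classify this router's position from the previous hop's IP address."""
    if prev_hop_ip in consensus_ips:
        # The previous hop is a listed onion router, so it is not a client.
        return "not entry"
    if bridges_deployed:
        # An unlisted upstream IP may be a client or an unlisted bridge.
        return "ambiguous"
    # Without bridges, every unlisted upstream node must be a client.
    return "entry"


consensus = {"192.0.2.10", "192.0.2.11"}
print(infer_position("192.0.2.10", consensus))                            # not entry
print(infer_position("198.51.100.7", consensus, bridges_deployed=False))  # entry
print(infer_position("198.51.100.7", consensus))                          # ambiguous
```

The third call is the post-2008 situation: the same observation that once identified the entry position now leaves two possibilities open.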

4. Adaptive Circuit-Level Cell Sequence Attack

In this section, we describe the adaptive circuit-level cell sequence attack in detail. First, we identify two fundamental vulnerabilities in the protocol design of Tor in Section 4.1. Section 4.2 illustrates how these vulnerabilities can be exploited to inject covert signals into the network. Section 4.3 outlines the finite state machine embedded within malicious onion routers, which operationalizes the proposed attack. Finally, Section 4.4 provides an analytical assessment of the probability that an adversary can control both the entry and exit nodes of a Tor circuit.

4.1. Protocol-Level Vulnerabilities

To clearly present the terminology used throughout this paper, we define outbound direction as moving away from the OP, and inbound direction as moving toward the OP. Additionally, the downstream OR refers to the next router in the direction of circuit establishment, while the upstream OR is the router in the reverse direction.

4.1.1. Behavior of Circuit-Level Cells

Edge nodes, including onion proxies, exit routers, hidden servers, introduction points, and rendezvous points, exchange end-to-end messages through circuit-level cells. These cells are classified into two types: RELAY cells and RELAY_EARLY cells. RELAY cells serve general purposes and are utilized by all edge nodes. In contrast, RELAY_EARLY cells are designed to limit the length any circuit can reach and can only be initiated by the OP, subject to the following rules:

Content Rule: Circuit extension requests must be encapsulated exclusively within RELAY_EARLY cells. As illustrated in Figure 1, if the Rel_cmd field is RELAY_COMMAND_EXTEND, the cell header Cmd field must be RELAY_EARLY.
Maximum Quantity Rule: When establishing a new circuit, the OP randomly sets the maximum quantity to either 7 or 8. This means the circuit allows a maximum of 7 or 8 RELAY_EARLY cells to be sent outbound.
Continuity Rule: To partially conceal the circuit length, the OP sends the first Maximum Quantity circuit-level cells as RELAY_EARLY cells.
Direction Rule: For historical security reasons [26], the direction of RELAY_EARLY cells is restricted to outbound.

Upon receiving RELAY_EARLY cells, onion routers perform checks to ensure compliance with the rules outlined above, as illustrated in Figure 4. Checks for content and direction are straightforward. For the quantity check, each router uses a counter named remaining_relay_early_cells, initialized to eight. This counter decreases by one for each outbound RELAY_EARLY cell received. If the counter falls below zero, or if any rule is violated, the router immediately sends a DESTROY cell to terminate the circuit.
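
The per-router checks can be sketched as follows. This is a minimal model of the behaviour described above, not Tor's actual code: the `Cell` and `RelayEarlyChecker` classes are hypothetical, only the counter name `remaining_relay_early_cells` comes from the text, and `rel_cmd` is set to `None` for cells that are not recognized at this router.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    cmd: str                       # "RELAY" or "RELAY_EARLY"
    outbound: bool                 # True when moving away from the OP
    rel_cmd: Optional[str] = None  # visible only if the cell is recognized here


class RelayEarlyChecker:
    def __init__(self, initial: int = 8):
        self.remaining_relay_early_cells = initial

    def accept(self, cell: Cell) -> bool:
        """Return True to keep relaying, False to tear the circuit down."""
        # Content rule: extension requests must travel in RELAY_EARLY cells.
        if cell.rel_cmd == "RELAY_COMMAND_EXTEND" and cell.cmd != "RELAY_EARLY":
            return False
        if cell.cmd == "RELAY_EARLY":
            # Direction rule: RELAY_EARLY cells are outbound only.
            if not cell.outbound:
                return False
            # Quantity rule: at most `initial` RELAY_EARLY cells per circuit.
            self.remaining_relay_early_cells -= 1
            if self.remaining_relay_early_cells < 0:
                return False
        return True


checker = RelayEarlyChecker()
results = [checker.accept(Cell("RELAY_EARLY", outbound=True)) for _ in range(9)]
print(results)  # eight True values followed by one False
```

None of the three checks in this model looks at the order in which RELAY and RELAY_EARLY cells arrive; each cell is judged on its own fields and on the running counter.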

4.1.2. Absence of Continuity Validation (V1)

Figure 4 highlights a significant oversight: Tor lacks a continuity check for circuit-level cells. This absence enables adversaries to transmit covert collusive signals between malicious onion routers—one functioning as the signal injector and the other as the signal detector—without terminating the circuit. Specifically, when constructing a circuit, the OP initially sends path_length − 1 RELAY_EARLY cells to extend the circuit toward the edge node, with the remaining RELAY_EARLY cells, calculated as the maximum quantity minus (path_length − 1), reserved for tasks such as stream initiation or rendezvous point registration.

To send covert signals without triggering circuit termination, a malicious onion router modifies the header fields of relayed circuit-level cells, ensuring they pass the validation checks of downstream routers. However, the signal injector (i.e., a non-edge malicious onion router) cannot accurately determine the type of the victim circuit or its exact position within it. Therefore, it must preserve the initial sequence of unmodified RELAY_EXTEND cells. The number of such cells, denoted as $N_{reserved}$, corresponds to the maximum number of circuit extension cells relayed by the router across all circuit types, and is calculated as:

$N_{reserved} = path\_length_{max} - 2$

(1)
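
As a quick check of Equation (1), the sketch below evaluates it over the default circuit lengths listed in Section 3.3. The dictionary keys and the function name `reserved_cells` are hypothetical labels introduced for this example.

```python
# Default path lengths of the circuit types described in Section 3.3.
DEFAULT_PATH_LENGTHS = {
    "external": 3,
    "hs_side_introduction": 3,
    "client_side_rendezvous": 3,
    "client_side_introduction": 4,
    "hs_side_rendezvous": 4,
}


def reserved_cells(path_lengths: dict[str, int]) -> int:
    """Equation (1): N_reserved = path_length_max - 2."""
    return max(path_lengths.values()) - 2


print(reserved_cells(DEFAULT_PATH_LENGTHS))  # 2
```

One way to read the subtraction of 2: in the longest circuit the OP sends path_length_max − 1 extension cells, and the first of these is consumed by the first hop itself, so a non-edge router relays at most path_length_max − 2 of them.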

4.1.3. Residual Values in RELAY_EARLY Counters (V2)

A second vulnerability arises when the OP has used up its maximum quantity of RELAY_EARLY cells. Although the OP stops sending such cells, downstream ORs may still have nonzero counter values. Table 1 illustrates this with a three-hop circuit: after the OP transmits 8 RELAY_EARLY cells, the counters at OR₂ and OR₃ remain at 1 and 2, respectively, indicating that these onion routers still have room to accept more RELAY_EARLY cells.

This discrepancy forms the basis of the confirmation mechanism in our attack. Malicious routers can use these anomalous odd cells to infer their position in the circuit. Let OR_n represent a router at the n-th hop of a circuit with total length α. Although the upper bound C_n of received outbound RELAY_EARLY cells is 8, the actual count m_n is usually lower. The difference C_n − m_n indicates how many odd cells may be received. For example, in a three-hop circuit, where OR₃ is theoretically expected to receive a maximum of six RELAY_EARLY cells (m₃ = 6), the arrival of the 7th and 8th cells would be anomalous and indicative of a malicious injector.

By intentionally exhausting the downstream OR's counter, the injector ensures these odd cells will be observed. These cells serve two purposes: they allow the injector to determine its position within the circuit and act as an end marker for the covert signal. Section 4.2 further details the use of odd cells in our attack.
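
The residual counter values can be reproduced with a short calculation. The sketch assumes that OR_n receives the OP's RELAY_EARLY cells minus the n − 1 extension cells consumed by the routers before it, which is inferred from the three-hop example in the text rather than stated there as a general formula; the function name `residual_counters` is hypothetical.

```python
def residual_counters(path_length: int, max_quantity: int,
                      initial: int = 8) -> dict[int, int]:
    """Map hop index n to the counter value left at OR_n once the OP is done."""
    residual = {}
    for n in range(1, path_length + 1):
        # Assumption: each of the n - 1 upstream routers consumed one extension cell.
        received = max_quantity - (n - 1)
        residual[n] = initial - received
    return residual


print(residual_counters(3, 8))  # {1: 0, 2: 1, 3: 2}
print(residual_counters(3, 7))  # {1: 1, 2: 2, 3: 3}
```

With a maximum quantity of 8, OR₃ receives six cells and keeps a counter of 2, so two further RELAY_EARLY cells (the 7th and 8th) can reach it without violating the quantity rule.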

4.2. Attack Algorithm

This attack aims to confirm whether Alice (client) is communicating with Bob (a public server) via Tor, or whether Eve is operating a hidden service. To achieve this, we assume the attacker controls a small fraction of Tor’s onion routers by introducing nodes to the network—a widely accepted assumption in prior Tor attack models, given that onion routers are volunteer-operated [27]. For instance, an attacker could rent BuyVM virtual machines and incorporate them into the Tor network.

The attack proceeds as follows. Once a malicious onion router becomes part of a Tor circuit, it determines its role in the attack based on its inferred position in the circuit: either as a signal injector or a signal detector. If identified as an edge node, the router assumes the role of detector; otherwise, it acts as an injector. The injector embeds a covert signal within the cell sequence, while the detector attempts to recognize this signal and infer the injector’s position in the circuit.

If the injector is located at the entry point and the detector at the exit, they can collaboratively confirm that Alice is communicating with Bob as demonstrated in Figure 5. Alternatively, if the detector is positioned at the rendezvous point of a hidden service and the injector is at the entry, they can reveal the location of Eve’s hidden server. In addition, each malicious router is assigned a unique global identifier i, which facilitates the coordination of bidirectional attack vectors.

The following subsections elaborate on the responsibilities of each role. We first describe how the injector creates and embeds the covert signal across various circuit types without triggering circuit termination. Then, we explain how the detector identifies the signal and determines the injector’s position within the circuit.

4.2.1. Functionality of the Covert Signal Injector

If a malicious router receives a RELAY_EARLY_EXTEND cell as the first recognized circuit-level cell, it confirms that it is a non-edge node within the circuit. It then assigns a circuit identifier j ∈ {0, 1, …, 9}, and, together with its own global identifier i ∈ {000, 001, …, 101}, constructs a tuple (i, j), which may represent either Alice (the client) or Eve (a hidden service). The router now assumes the role of injector and begins generating the covert signal by modifying the Cmd field in the cell header.

As illustrated in Table 2, the signal consists of four fields: Reserved, Start Delimiter, Payload, and End Delimiter.

1. Reserved: This field preserves the initial unmodified RELAY_EARLY cells to ensure the signal passes the content check of downstream ORs. Its length is set to $N_{reserved} = 2$, given that the maximum Tor path length is 4. Thus, the second and third RELAY_EARLY_$α$ cells following the recognized RELAY_EARLY_EXTEND cell remain unchanged to form this field.
2. Start Delimiter: This field exploits Vulnerability 1 (the absence of continuity validation) to mark the beginning of the covert signal. Since the OP sends consecutive RELAY_EARLY cells until C_OP is exhausted, the following cell transitions are valid under normal conditions: RELAY_EARLY → RELAY_EARLY, RELAY_EARLY → RELAY, and RELAY → RELAY. However, the transition RELAY → RELAY_EARLY should never occur. The injector deliberately introduces this anomalous pattern to signal the start of the covert message.
3. Payload: The payload encodes the identifier pair $(i, j)$ into five segments of RELAY cells, separated by four RELAY_EARLY delimiters. To reduce signal size and minimize message overhead, this identifier pair is further compressed using the following encoding scheme:
First, the injector concatenates the three decimal digits of i (denoted $d_{1}, d_{2}, d_{3}$) with the single digit of j (denoted $c_{1}$) to form a four-digit decimal number:

$N = 1000 \cdot d_{1} + 100 \cdot d_{2} + 10 \cdot d_{3} + c_{1}$

(2)

This decimal value N is then converted into a 10-bit binary string in big-endian bit order:

$B = (b_{9}, b_{8}, \dots, b_{0})$

The binary sequence B is divided into five consecutive 2-bit segments $(b_{2 k}, b_{2 k + 1})$, for $k \in \{0, 1, 2, 3, 4\}$. Each pair is mapped to an integer $R_{k}$ that determines the number of RELAY cells in the k-th segment:

$R_{k} = 2 \cdot b_{2 k} + b_{2 k + 1}, \quad \text{for } k \in \{0, 1, 2, 3, 4\}$

(3)

Finally, each RELAY segment of length $R_{k}$ is inserted into the cell sequence, with RELAY_EARLY cells serving as delimiters between adjacent segments.
4. End Delimiter: This field leverages Vulnerability 2 by deliberately exhausting the counter of the adjacent downstream router. As a result, the edge node receives at least one anomalous (odd) RELAY_EARLY cell. If this edge node is also a malicious router, it can detect the presence of the odd cell, thereby identifying both the end of the covert signal and the position of the injector. Once the eighth RELAY_EARLY cell is transmitted, the injector completes the encoding process.
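The payload arithmetic of Equations (2) and (3) can be illustrated with a short sketch. It covers only the Payload field, not the Reserved field or the delimiters. The function names `encode_payload` and `payload_cells` are hypothetical. The sketch also assumes that the 10-bit string is read from its most significant bit, pair by pair, with the first bit of each pair as the high bit. The index convention in Equation (3) may be intended differently.

```python
def encode_payload(i: int, j: int) -> list[int]:
    """Return the five RELAY segment lengths R_0..R_4 for the pair (i, j)."""
    if not (0 <= i <= 101 and 0 <= j <= 9):
        raise ValueError("i must be in 0..101 and j in 0..9")
    n = 10 * i + j              # equals 1000*d1 + 100*d2 + 10*d3 + c1
    bits = format(n, "010b")    # 10-bit string, most significant bit first
    # Assumption: pairs are taken left to right; the first bit is the high bit.
    return [2 * int(bits[p]) + int(bits[p + 1]) for p in range(0, 10, 2)]


def payload_cells(segments: list[int]) -> list[str]:
    """Lay out RELAY segments with one RELAY_EARLY delimiter between neighbours."""
    cells: list[str] = []
    for k, length in enumerate(segments):
        if k > 0:
            cells.append("RELAY_EARLY")   # delimiter between adjacent segments
        cells.extend(["RELAY"] * length)
    return cells


segments = encode_payload(98, 7)   # N = 987 = 0b1111011011
print(segments)                    # [3, 3, 1, 2, 3]
print(payload_cells(segments).count("RELAY_EARLY"))  # 4
```

Under this reading, a segment of length zero appears as two adjacent RELAY_EARLY delimiters, so every pair in the identifier range maps to a distinct cell pattern.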

4.2.2. Measuring the Stealthiness and Effectiveness of the Covert Signal

As described in Section 4.2.1, covert signals can be injected from any non-edge position in all circuit types. These signals are defined by two key characteristics: stealthiness and effectiveness. A signal is stealthy if it remains undetected by other nodes in the circuit, meaning it does not disrupt the circuit or trigger any log alerts. It is effective if it successfully encodes the address of either Alice or Eve.

Stealth is a critical requirement for this attack, as a malicious relay cannot determine which types of circuits it will join. Therefore, the injected signal must remain undetectable across all circuit types to avoid exposure. In contrast, effectiveness is only necessary for specific target circuits. Because the injector must actively modify cells to generate the signal, it is important to assess both properties. In this section, we provide a theoretical measurement of the stealthiness and effectiveness of the covert signal.

We first demonstrate that all injected signals are stealthy. This means that the injector can modify the cell header without compromising the signal’s undetectability. The stealthiness of the signal is verified by ensuring that it does not violate the RELAY_EARLY rules enforced by downstream onion routers. Since the injector only modifies the cell header and does not alter the payload or the direction, the signal adheres to both the content and direction rules. Additionally, because there is no continuity check (V1), the signal is unaffected by the continuity rule.

When a malicious onion router receives an outbound RELAY_EARLY_EXTEND cell, it switches to injector mode. Regardless of how many RELAY_EARLY_$α$ cells it subsequently receives, the injector generates the signal by sending a sequence of discontinuous RELAY_EARLY_$α$ cells. The number of these cells is bounded by C, the maximum number of RELAY_EARLY_$α$ cells an onion router can receive, which is 8, as detailed in Section 4.1.3. These cells may also contain X unrecognized RELAY_EARLY_EXTEND cells, where X can range from 0 to 2 depending on the injector’s position. In other words, with an injector in the circuit, downstream onion routers will receive a maximum of C − X_min RELAY_EARLY cells, where X_min is the minimum number of unrecognized RELAY_EARLY_EXTEND cells (which is 0). This ensures that the total number of RELAY_EARLY cells received by downstream onion routers never exceeds 8. Consequently, the covert signal remains stealthy across all injector positions, as it does not violate any of the rules imposed by downstream onion routers.

Since the injector cannot determine its exact position within the circuit, it can only encode the address of the upstream node in the signal. As shown in Table 3, if the injector is not positioned at OR₁, it will be in an improper position, making the signal ineffective, as described by IE#3. However, even when the injector is positioned at OR₁, the signal may still fail to be effective. Specifically, as noted in IE#1, the padding mechanism implemented between the OP and OR₂ in the client-side onion circuit interferes with the signal, rendering it ineffective. Furthermore, if the signal is generated from OR₁ within the hidden server’s introduction circuit, it may encounter an insufficient number of outbound cells, leading to an incomplete signal.

4.2.3. Functionality of the Covert Signal Detector

A malicious onion router enters detector mode when it is positioned at the edge of a circuit. This is determined when the first circuit-level cell is received, and the Rel_cmd field corresponds to one of the following values: BEGIN, INTRODUCE1, ESTABLISH_INTRO, ESTABLISH_REND, or RENDEZVOUS1. However, as shown in Table 3, the signal is effective only in external circuits and server-side rendezvous circuits. Therefore, the router switches to detector mode when it receives a RELAY_COMMAND_BEGIN cell, indicating it is at the exit node of an external circuit, or a RELAY_COMMAND_RENDEZVOUS1 cell, indicating it is at the rendezvous point of a server-side rendezvous circuit.

Once in detector mode, the router monitors the cell sequence for the presence of the covert signal by identifying the start delimiter. It also checks for odd cells to determine whether the signal has been fully received. If multiple odd cells are detected, it suggests the signal was sent from an incorrect position, as detailed in Section 4.1.3. In this case, the signal is ineffective, but to maintain stealth, the detector continues processing the message as usual.

If only a single odd cell is detected, the signal is confirmed to have originated from the correct position, specifically from OR₁ (the entry node). The signal is considered effective, and the detector processes the message, recording the cell sequence pattern to decode the signal. The decoding process follows these steps:

Extracting Payload Data: The lengths of the five RELAY segments are extracted as decimal digits $R_{0}, R_{1}, R_{2}, R_{3}, R_{4}$ from the payload field of the signal.
Decompression: Each decimal digit is converted into a 2-bit binary string using the inverse function of Equation (3):

$f^{-1}(R_{k}) = (b_{2 k}, b_{2 k + 1})$

(4)

The resulting 2-bit segments are concatenated to form a 10-bit binary string:

$B = (b_{9}, b_{8}, \dots, b_{0})$

The 10-bit binary string is converted into a four-digit decimal number N, which represents the identifier pair $(i, j)$, where i is the global identifier of the injector that created the signal, and j is the circuit identifier corresponding to the OP at the injector.

The decoded signal provides a bidirectional attack vector, enabling the adversary to either correlate the OP with a public server or uncover the address of a hidden server.
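The decoding steps can be sketched as the inverse of the payload encoding. The function names `segment_lengths` and `decode_payload` are hypothetical, and the input is assumed to be the Payload field alone, with the Reserved field and both delimiters already stripped. As an assumption, the first segment is taken to carry the two most significant bits of N.

```python
def segment_lengths(cells: list[str]) -> list[int]:
    """Count RELAY cells between RELAY_EARLY delimiters in a payload field."""
    lengths = [0]
    for cell in cells:
        if cell == "RELAY_EARLY":
            lengths.append(0)      # a delimiter opens the next segment
        elif cell == "RELAY":
            lengths[-1] += 1
        else:
            raise ValueError(f"unexpected cell type: {cell}")
    return lengths


def decode_payload(segments: list[int]) -> tuple[int, int]:
    """Recover (i, j) from the five segment lengths R_0..R_4."""
    if len(segments) != 5 or any(not 0 <= r <= 3 for r in segments):
        raise ValueError("expected five segment lengths in 0..3")
    n = 0
    for r in segments:
        n = (n << 2) | r           # append each 2-bit group, high bits first
    if n > 1019:
        raise ValueError("decoded value lies outside the identifier range")
    return divmod(n, 10)           # i = leading three digits, j = last digit


payload = ["RELAY"] * 3 + ["RELAY_EARLY"] + ["RELAY"] * 3 + ["RELAY_EARLY"] \
    + ["RELAY"] + ["RELAY_EARLY"] + ["RELAY"] * 2 + ["RELAY_EARLY"] + ["RELAY"] * 3
print(decode_payload(segment_lengths(payload)))   # (98, 7)
```

The range check rejects bit patterns that cannot come from a valid identifier pair, which gives the detector a basic sanity test on a decoded sequence.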

4.3. Deployment of Malicious Routers

This subsection explains how malicious routers are strategically deployed within the Tor network, enabling them to operate either as signal injectors or detectors based on their position within a circuit. To achieve this adaptive functionality, each malicious router utilizes a finite state machine (FSM), illustrated in Figure 6, which governs transitions between operational states according to observed events within the network.

This state-driven approach allows malicious routers to flexibly adapt their roles based on network events while ensuring that no two malicious routers directly connect to each other in a circuit. Furthermore, throughout all operational states, the FSM ensures normal message transmission continues uninterrupted, preserving the covert nature of the attack.

Idle Mode (S): In Idle mode, the FSM waits passively for the router to be incorporated into a Tor circuit. During this stage, the router remains inactive but ready to shift modes upon detecting specific circuit-related events.

The transition from Idle mode to Controller mode (EVENT_S→C) occurs when the router successfully joins a circuit. Specifically, upon sending a CREATED cell in response to a previously received CREATE cell, the router moves to Controller mode, provided that the upstream node is not another malicious router. This ensures the circuit does not contain consecutive malicious routers, which keeps the signal effective.

Controller Mode (C): In Controller mode, the router examines the first received circuit-level cell to determine its subsequent role within the attack. Depending on the cell type, the router transitions accordingly:

If the first cell received is neither a circuit extension RELAY_EARLY_EXTEND nor indicative of a target edge position (RELAY_EARLY_BEGIN or RELAY_EARLY_RENDEZVOUS1), or if the cell suggests another malicious router is downstream, the FSM returns the router to Idle mode EVENT_C→S.
If the initial cell is a circuit-extension request RELAY_EARLY_EXTEND, the router transitions into Injector mode to actively inject signals EVENT_C→I.
If the initial cell indicates the router is at a circuit edge (either RELAY_EARLY_BEGIN or RELAY_EARLY_RENDEZVOUS1), it transitions into Detector mode to monitor incoming signals EVENT_C→D.

Injector Mode (I): Injector mode allows the router to actively embed covert signals into circuit-level cells. The details of signal generation are described in Section 4.2.1. After exhausting the downstream router’s RELAY_EARLY counter C, the injector completes its task and returns to Idle mode. This ensures the injected signals remain compliant with rules detailed in Section 4.1.1 and avoids suspicion.

Detector Mode (D): In Detector mode, the router passively inspects the cell sequence to identify covert signals, as detailed in Section 4.2.3. Upon successful decoding of a valid signal or if multiple odd cells (indicating an ineffective signal) are received, the detector transitions back to Idle mode EVENT_D→S, concluding its monitoring role.
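The four states and their transitions can be summarized as a lookup table. The sketch below is an abstract model only. The event labels such as `joined_circuit` and `first_cell_extend` are hypothetical names for the conditions described above, not identifiers from the modified router.

```python
from enum import Enum


class State(Enum):
    IDLE = "S"
    CONTROLLER = "C"
    INJECTOR = "I"
    DETECTOR = "D"


# (current state, event) -> next state; event labels are illustrative.
TRANSITIONS = {
    (State.IDLE, "joined_circuit"): State.CONTROLLER,         # EVENT_S->C
    (State.CONTROLLER, "first_cell_extend"): State.INJECTOR,  # EVENT_C->I
    (State.CONTROLLER, "first_cell_edge"): State.DETECTOR,    # EVENT_C->D
    (State.CONTROLLER, "first_cell_other"): State.IDLE,       # EVENT_C->S
    (State.CONTROLLER, "malicious_downstream"): State.IDLE,   # EVENT_C->S
    (State.INJECTOR, "counter_exhausted"): State.IDLE,
    (State.DETECTOR, "signal_decoded"): State.IDLE,           # EVENT_D->S
    (State.DETECTOR, "multiple_odd_cells"): State.IDLE,       # EVENT_D->S
}


def next_state(state: State, event: str) -> State:
    """Return the next state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = State.IDLE
for event in ["joined_circuit", "first_cell_extend", "counter_exhausted"]:
    state = next_state(state, event)
print(state)   # State.IDLE
```

Every path through the table ends in Idle mode, which matches the requirement that a router resumes ordinary relaying once its role on a circuit is complete.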

4.4. Analytical Probability of Entry-Exit Control

We now present an analytical evaluation of the probability that an adversary controlling both the entry and exit nodes of a Tor circuit can successfully link a client to the service being accessed. The relay selection algorithm of Tor assigns relays based on their advertised bandwidth, applying additional weighting factors defined by consensus parameters to balance relay usage effectively. Specifically, nodes designated as “Guard” can only be chosen as entry points, while “Exit” nodes are exclusively used as circuit exits. Nodes marked with both flags (“Guard + Exit”) receive adjusted weights to prevent dominance in either position.

To quantify this probability, let $B_{i}^{G}$ be the total advertised bandwidth of Guard relays controlled by the adversary, $B_{G}$ the total bandwidth of all Guard-only relays, and $B_{EE}$ the bandwidth of relays flagged as both Guard and Exit. The consensus applies a weighting factor $W_{E}$ for dual-flagged nodes in the Guard position, defined as $W_{E} = 1 - \frac{B}{3 B_{E}}$, and $W_{E} = 0$ if $B_{E} < B / 3$. The probability that the adversary controls the entry node is thus:

$P_{i} = \frac{B_{i}^{G}}{B_{G} + B_{EE} \cdot W_{E}}$

(5)

For the exit position, let $B_{j}^{E}$ denote the total Exit relay bandwidth of the adversary, $B_{E}$ the total bandwidth of all Exit-only relays, and $W_{G}$ the consensus-defined weighting for dual-flagged nodes in the Exit role, where $W_{G} = 1 - \frac{B}{3 B_{G}}$, and $W_{G} = 0$ if $B_{G} < B / 3$. The probability of controlling the exit node is:

$P_{j} = \frac{B_{j}^{E}}{B_{E} + B_{EE} \cdot W_{G}}$

(6)

Assuming the selection of entry and exit nodes is independent, the combined probability $P_{both}$ of simultaneously controlling both positions is:

$P_{both} = P_{i} \times P_{j} = \frac{B_{i}^{G}}{B_{G} + B_{EE} \cdot W_{E}} \times \frac{B_{j}^{E}}{B_{E} + B_{EE} \cdot W_{G}}$

(7)

Real-world incidents, such as the documented KAX17 attack [28], illustrate the practical relevance of this model. At its peak, KAX17 controlled approximately 10.3% of the Guard capacity of Tor and about 4.6% of its Exit capacity, resulting in roughly a 0.5% per-circuit probability of capturing both entry and exit positions. Although seemingly modest, this probability highlights how even moderate bandwidth control can yield significant deanonymization risks when scaled across many circuits.
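The KAX17 figure can be reproduced from Equation (7) with a few lines of arithmetic. In the sketch below, the function names and the sample bandwidth values passed to `capture_probability` are hypothetical. Only the 10.3% and 4.6% shares come from the text, and they are treated directly as $P_{i}$ and $P_{j}$.

```python
def position_weight(total_bw: float, position_bw: float) -> float:
    """Weight for dual-flagged relays: 1 - B / (3 * B_pos), or 0 when scarce."""
    if position_bw < total_bw / 3:
        return 0.0
    return 1.0 - total_bw / (3.0 * position_bw)


def capture_probability(adversary_bw: float, only_bw: float,
                        dual_bw: float, weight: float) -> float:
    """Equations (5) and (6): adversary share of the weighted position bandwidth."""
    return adversary_bw / (only_bw + dual_bw * weight)


# Reported KAX17 shares, used directly as P_i and P_j.
p_entry, p_exit = 0.103, 0.046
p_both = p_entry * p_exit
print(f"{p_both:.4%}")   # 0.4738%

# Hypothetical bandwidth units, purely to exercise the formulas.
w = position_weight(total_bw=300.0, position_bw=200.0)   # 0.5
print(capture_probability(10.0, 80.0, 40.0, w))          # 0.1
```

Because $P_{both}$ is a product of two shares, doubling both shares quadruples the per-circuit capture probability.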

To better illustrate this relationship, Figure 7 presents the catch probability against the controlled bandwidth of the adversary, using representative network parameters from Tor Metrics [29] for mid-2025. The visualization reveals that while the chances of capturing a single entry or exit position scale linearly with the network share of an adversary, the probability of compromising the full circuit grows quadratically. This underscores the critical link between resource investment and deanonymization success, quantifying how the risk escalates as the control of an adversary over the network increases.

5. Evaluation

This section presents experimental results demonstrating the effectiveness and stealthiness of the protocol-level de-anonymization attack described earlier. Our approach exploits two protocol vulnerabilities: V1, which allows the silent transmission of covert signals, and V2, which enables confirmation of these signals. We conjecture that the attack can compromise both external and internal circuits as long as no other adversary is already exploiting these same vulnerabilities.

The experiments were conducted in two phases. In the first phase, we assessed whether the two protocol vulnerabilities were already being exploited in the wild, establishing a baseline for our own attack. In the second phase, we evaluated the effectiveness of our method using external circuits. Since the attack algorithms for both external and internal circuits share the same strategy, results obtained from external circuits are representative.

To ensure ethical compliance, all traffic used in these experiments was generated solely by our team within a controlled environment. This approach prevented any impact on real Tor users and eliminated legal or privacy concerns.

5.1. Experimental Setup

To create a realistic testing environment, we used Tor version 0.4.8.16 (April 2025) and included three types of components in our experiments:

Malicious Onion Routers: We deployed two modified Tor routers on separate dedicated servers provided by BuyVM, located in different autonomous systems—one in Luxembourg and one in the United States. To simulate vanilla path selection in the second phase, we added a configuration option ActiFsm that toggles the finite state machine (FSM) for signal injection and detection. When ActiFsm is set to 1, the router operates as a malicious relay; when set to 0, it functions as a standard relay. This flexibility allowed us to test all injector–detector topologies as illustrated in Figure 8. Additionally, the modified source code logs every FSM state transition and records the corresponding cell sequences during signal injection and detection. To avoid interfering with real users in the second phase, these routers operated privately, listened on a non-standard port (60001), and disabled public directory listings via the PublishServerDescriptor 0 option.

Modified Tor Client: We adapted the Tor client to verify the attack. In the second phase, the client established a new circuit for each 10 KiB data upload to the server. The client was also configured to build circuits across all injector–detector topologies, simulating vanilla path selection using the EntryNodes, MiddleNodes, and ExitNodes options. All client traffic was routed over the attacker-controlled circuit using SOCKS to ensure reliable data delivery.

HTTPS Web Server: To serve as the destination for these external circuits, we deployed a standard HTTPS web server hosted on Amazon Web Services (AWS). The server accepted connections on port 443 and logged connection details for debugging and analysis.

5.2. Phase One: Detecting Exploitation in the Wild

In the first phase, we established a baseline by assessing whether the protocol vulnerabilities were already being exploited in the wild. We operated one router as a standard node, with PublishServerDescriptor set to 1 and ActiFsm set to 0, making it accessible to regular Tor users. Its advertised bandwidth was limited to 29 MiB/s (MaxAdvertisedBandwidth), and after 70 days of stable operation [30], the router acquired the entry, exit, stable, and fast flags. Data from Tor Metrics (Figure 9a) confirmed that the router primarily served as an exit node during a 30-day monitoring period.

We analyzed real-world Tor traffic by examining the sequences of RELAY_EARLY cells in circuits passing through our exit node, searching for inconsistencies or unusual patterns that could signal exploitation of V1. Among 13,457,823 observed circuits, the RELAY_EARLY cell sequences were consistent, and no irregular cells were found.

Based on our protocol analysis (Section 4.1.3), the expected number of RELAY_EARLY cells received by an exit node is typically 5 or 6. Counts of 7 or 8 would indicate possible exploitation, since in a standard three-hop circuit, the first two RELAY_EARLY cells are used for circuit extension. Even if the client sends the maximum allowed by the protocol, an exit relay should not receive more than six. Observing a higher number would therefore be a clear anomaly—an indicator that can be intentionally created by a malicious actor, but is extremely unlikely under normal conditions. Therefore, the consistent absence of such anomalies in our data suggests that V2 was not exploited during the monitoring period.
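The counting argument can be written as a simple threshold check. The sketch assumes a standard three-hop circuit and the budget of eight RELAY_EARLY cells described earlier. The function names and the sample counts are hypothetical and are not measurement data.

```python
from collections import Counter

RELAY_EARLY_BUDGET = 8         # per-circuit limit on RELAY_EARLY cells
EXTENDS_BEFORE_EXIT = 2        # extension cells consumed before the exit hop


def max_expected_at_exit() -> int:
    """Largest RELAY_EARLY count an exit should see on a three-hop circuit."""
    return RELAY_EARLY_BUDGET - EXTENDS_BEFORE_EXIT


def is_anomalous(count: int) -> bool:
    """Flag counts that exceed what a compliant client can deliver to the exit."""
    if not 0 <= count <= RELAY_EARLY_BUDGET:
        raise ValueError("count outside the protocol range 0..8")
    return count > max_expected_at_exit()


# Hypothetical per-circuit counts, for illustration only.
observed = [5, 6, 6, 5, 7, 6, 8]
summary = Counter("anomalous" if is_anomalous(c) else "expected" for c in observed)
print(summary["expected"], summary["anomalous"])   # 5 2
```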

5.3. Phase Two: Validation of Attack Performance

In the second phase, we evaluated the attack across the injector–detector topologies illustrated in Figure 8. Vanilla path selection results in six distinct topologies. However, in topologies (c) and (e), the finite state machine (FSM) at the upstream relay transitions from state C back to state S when it detects that the next node is also malicious, as described in Section 4.3. This mechanism ensures that the relay reverts to standard behavior and does not inject a signal. In topology (f), only a detector is present, which merely observes signals without modifying network traffic. For these reasons, we exclude topologies (c), (e), and (f) from further analysis.

The remaining topologies are summarized in Table 4. We demonstrate the stealthiness of the attack in these scenarios by verifying that the client can upload files without circuit disruption. The effectiveness of the signal in the third topology is confirmed through FSM log analysis. For the first and second topologies, we discuss the FSM logs in detail, noting that our relay at OR₃ can still record cell sequences even when ActiFsm is set to 0. Additionally, in the second topology, we deliberately removed the address of the detector router from the injector’s known node list to simulate a scenario in which OR₃ operates as a standard relay.

For each topology in Table 4, the client established a new circuit for every 10 KiB of data uploaded to the server, repeating this process 1020 times. In both the first and second topologies, all 1020 circuits completed without any disruptions, further confirming the signal’s stealthiness. The first topology represents a scenario in which the client selects OR₁ as the injector, with no detector included in the circuit; for the tuple $(098, 7)$, the injected and detected cell sequences match those of the third topology. The second topology simulates OR₂ as the injector, again with no detector present, and the edge relay receives more than one odd cell, as illustrated in Figure 10. This finding also applies to server-side rendezvous circuits (see Table 3), since the number of odd cells received depends only on the injector’s position, not on the distance between injector and detector.

In the third topology, all 1020 circuits completed without disruption, demonstrating the stealthiness of the signal. Using the relay identifier $i = 098$ for the injector, we analyzed FSM log files from both the injector and detector for circuits with identifiers $j = 1$ (Figure 11a,b) and $j = 7$ (Figure 11c,d). These results confirm the effectiveness of the covert signal, showing that the FSM can reliably encode and decode the signal from the observed cell sequences.

Across all 3060 tested circuits, no disruptions were observed, and the Tor process, which was running at the default notice log level, recorded no warnings. These results confirm that the attack signal remained stealthy across all injector–detector topologies.

To further validate the accuracy of our attack, we performed a time correlation analysis using the third topology, where both signal injection and detection are expected to succeed. In this setup, the client initiates a new external circuit every 10 s to connect to the web server. Our modified component, the injector, records timestamps for two critical events: the completion of sending the start and end signal fields. At the same time, the detector logs the corresponding timestamps when it recognizes these fields.

To evaluate the alignment between injection and detection times, we compute the Pearson correlation coefficient (r), which measures the strength of a linear relationship between the two time sequences. Specifically, x and y represent the injection and detection timestamps, while $\bar{x}$ and $\bar{y}$ are their respective mean values:

$r = \frac{\sum (x - \bar{x}) (y - \bar{y})}{\sqrt{\sum {(x - \bar{x})}^{2}} \cdot \sqrt{\sum {(y - \bar{y})}^{2}}}$

(8)

Figure 12 demonstrates a near-perfect correlation ($r > 0.999$) between the timestamps for both the start and end delimiters. This high degree of correlation confirms the precision of our protocol-level attack in linking sender and receiver within the Tor network. The marginal departure from an ideal $r = 1$ is consistent with the minute timing variations expected in a semi-realistic network environment incorporating live onion routers.
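Equation (8) can be computed directly from two timestamp lists. The sketch below uses only the standard library. The function name `pearson` and the sample timestamps are hypothetical and are not taken from the experiment logs.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical timestamps in seconds: one circuit every 10 s with a small,
# slightly varying delay between injection and detection.
injected = [0.0, 10.0, 20.0, 30.0, 40.0]
detected = [0.21, 10.19, 20.25, 30.18, 40.22]
r = pearson(injected, detected)
print(r > 0.999)   # True
```

A constant delay between injection and detection leaves r at exactly 1, so only the variation in delay pulls the coefficient below the ideal value.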

Notice that this time-correlation analysis serves only to demonstrate the accuracy of the attack. The attack itself does not rely on timing information. Moreover, the method remains efficient and stealthy, as it introduces no noticeable delay and requires only minimal modification to cell headers, regardless of whether a detector is present on the circuit.

5.4. Ethical Considerations

Our research strictly follows the safety guidelines published by the Tor Research Safety Board [31] to ensure the anonymity and privacy of legitimate Tor users. Specifically, we implement the following measures:

Controlled Traffic Generation: All experimental traffic is generated exclusively between our modified Tor client and a destination web server under our control. This ensures that our experiments do not interfere with or de-anonymize other users’ activities on the Tor network.
Private Attacker Infrastructure: The attacker-controlled onion routers are configured for private use only, meaning they do not publish their descriptors to the public Tor directory and listen on non-standard ports. This configuration prevents them from being incorporated into circuits used by other Tor users. Consequently, our active attack, which involves modifying cell headers, is limited to circuits initiated by our own client and passing through our controlled relays, thereby not affecting any other users’ traffic.
Minimal Data Logging: We log and store only the minimal amount of data essential for our analysis. Specifically, we record the sequence of cell headers and the timestamps observed by our injector and detector components for time-correlation analysis. No payload data or user-identifiable information beyond our controlled experiment is collected.

6. Discussion on Countermeasures

This section analyzes the limitations in the protocol of Tor that prevent the detection of our attack and proposes two practical countermeasures to mitigate these vulnerabilities.

6.1. Attack Detectability

Our experimental results show that the attack is not detectable by current Tor components. This undetectability arises because the attack takes advantage of fundamental weaknesses in the protocol, rather than breaking any explicit rules. Specifically, the attack succeeds due to two main vulnerabilities: first, Tor lacks a mechanism to check the continuity of RELAY_EARLY cells; second, there are no safeguards to detect leftover RELAY_EARLY counters at intermediate relays. Our review of the Tor source code confirms these weaknesses.

Moreover, while our attack involves active cell header modification, the introduced processing latency (measured in clock cycles) is negligible when compared to standard network jitter. This makes timing-based analysis an unreliable method for detecting the covert signal.

Currently, the defenses of Tor for RELAY_EARLY cells are limited to three main checks (version 0.4.8.16, April 2025). First, the variable remaining_relay_early_cells is initialized to eight in the or_circuit_new function for each circuit and is decremented with every use in the command_process_relay_cell function; if this limit is exceeded, the circuit is closed. Second, the same function ensures that RELAY_EARLY cells cannot be sent in the inbound direction. Third, the handle_relay_cell_command function requires that every circuit extension EXTEND relay command must be sent inside a RELAY_EARLY cell.

6.2. Potential Mitigations

Given these limitations, we propose two countermeasures that could effectively mitigate the risk of such protocol-level attacks.

The first vulnerability (V1) could be mitigated by introducing stricter validation of cell sequences. For instance, the protocol could be modified so that once a circuit has transmitted its first standard RELAY cell, all subsequent RELAY_EARLY cells on that circuit are rejected. This would prevent the injection of the RELAY → RELAY_EARLY transition that serves as the “start delimiter” for our covert signal, thereby neutralizing the attack.
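A minimal sketch of such a continuity rule is shown below, assuming per-circuit state is available at each relay. The class name `ContinuityCheck` is hypothetical, and the sketch models only the proposed rule rather than any existing Tor code.

```python
class ContinuityCheck:
    """Reject RELAY_EARLY cells once a plain RELAY cell has been seen."""

    def __init__(self) -> None:
        self.seen_relay = False

    def accept(self, cell_type: str) -> bool:
        """Return False for a RELAY_EARLY cell that follows any RELAY cell."""
        if cell_type == "RELAY":
            self.seen_relay = True
            return True
        if cell_type == "RELAY_EARLY":
            return not self.seen_relay
        return True                      # other cell types are not covered here


check = ContinuityCheck()
normal = ["RELAY_EARLY", "RELAY_EARLY", "RELAY", "RELAY"]
print(all(check.accept(cell) for cell in normal))   # True
print(check.accept("RELAY_EARLY"))                  # False
```

The rule needs one extra boolean per circuit, so the added state is small. Whether any legitimate client behavior depends on late RELAY_EARLY cells would need to be checked before deployment.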

The second vulnerability (V2), which concerns residual RELAY_EARLY counters, is more challenging to mitigate because it is directly tied to the core anonymity property of Tor. Introducing any mechanism that allows relays to infer their position within the circuit would undermine the fundamental design of Tor and could introduce new privacy risks.

As an alternative, a more comprehensive defense would be to extend the existing circuit-level padding of Tor, which is currently applied only to specific internal circuits, to all circuit types, including external and server-side onion service circuits. Broadening this approach would make it significantly harder for attackers to identify the position of their relay or reliably inject covert signals. However, such a change would incur additional bandwidth and latency costs, which could impact overall network performance.

While our protocol-level attack is currently undetectable within the existing architecture of Tor, implementing the proposed countermeasures could substantially raise the bar for similar attacks. However, any decision to deploy these defenses requires a careful trade-off between the security gains and the resulting costs in network performance and engineering complexity.

7. Conclusions

We have presented a new method that undermines the anonymity provided by the Tor network. Our approach subtly marks data flows by exploiting two specific design vulnerabilities within Tor, which arise from the inherent security trade-off between concealing circuit length from non-edge nodes and the resulting exposure of certain cell transmission characteristics.

Experimental results obtained within a controlled environment demonstrated the effectiveness and stealthiness of our attack. The embedded signals are consistently undetectable under normal network conditions, introducing no noticeable delays or disruptions. Importantly, standard onion routers are unaware of these signals, which can only be detected and decoded by attacker-controlled routers specifically configured for this purpose. This capability allows attackers to reliably associate Tor users with public services and identify the real IP addresses of onion services, significantly undermining the anonymity protections provided by Tor.

Our experiments also demonstrated the flexibility of malicious routers, which can adaptively alternate between injecting and detecting signals. This dynamic functionality is managed by a finite state machine on each compromised node, enabling sophisticated coordination and enhancing the resilience of the attack.

Our work reveals a critical vulnerability rooted in the design of Tor: the choice to obscure circuit lengths inadvertently creates new attack vectors by exposing patterns in data transmission. Even though payloads remain encrypted, the unencrypted cell headers present a bidirectional vector for protocol-level attacks.

Author Contributions

Conceptualization, R.X., Y.W. and X.H.; methodology, Y.W. and X.H.; software, R.X.; validation, X.Y. and S.K.I.; formal analysis, R.X.; investigation, R.X.; resources, Y.W. and X.H.; data curation, R.X.; writing—original draft preparation, R.X.; writing—review and editing, Y.W., X.H., X.Y. and S.K.I.; visualization, R.X.; supervision, Y.W. and X.H.; project administration, Y.W. and X.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request. Due to security considerations, the data are not publicly available.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Dingledine, R.; Mathewson, N.; Syverson, P. Tor: The Second-Generation Onion Router. In Proceedings of the 13th USENIX Security Symposium, San Diego, CA, USA, 9–13 August 2004; pp. 303–320.
2. Jeffries, A. The Pentagon Is Trying to Make the Internet More Anonymous. The Verge, 16 June 2014. Available online: https://www.theverge.com/2014/6/16/5814776/the-pentagon-isbuilding-ways-to-make-the-internet-more-anonymous (accessed on 13 March 2025).
3. Edman, M.; Yener, B. On anonymity in an electronic society: A survey of anonymous communication systems. ACM Comput. Surv. 2009, 42, 1–35.
4. Christin, N. Traveling the Silk Road: A Measurement Analysis of a Large Anonymous Online Marketplace. In Proceedings of the 22nd International World Wide Web Conference, Rio de Janeiro, Brazil, 13–17 May 2013; pp. 213–224.
5. Weimann, G. Terrorist Migration to the Dark Web. Perspect. Terrorism 2016, 10, 40–44.
6. Tor Project. August 2008 Progress Report. In The Tor Project Blog, August 2008. Available online: https://blog.torproject.org/august-2008-progress-report/ (accessed on 18 March 2025).
7. Ling, Z.; Luo, J.; Wu, K.; Fu, X. Protocol-level hidden server discovery. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM), Turin, Italy, 14–19 April 2013; pp. 1043–1051.
8. Bauer, K.; McCoy, D.; Grunwald, D.; Kohno, T.; Sicker, D. Low-resource routing attacks against Tor. In Proceedings of the ACM Workshop on Privacy in the Electronic Society, Alexandria, VA, USA, 29–30 October 2007; pp. 11–20.
9. Bauer, K.S.; Grunwald, D.; Sicker, D.C. Predicting Tor path compromise by exit port. In Proceedings of the 28th International Performance Computing and Communications Conference, Phoenix, AZ, USA, 14–16 December 2009; pp. 384–387.
10. Tor Project. Bandwidth Authority Measurements. Tor Project Network Health. Available online: https://tpo.pages.torproject.net/network-health/bandwidth_scanners/bandwidth_authority_measurements.html (accessed on 27 March 2025).
11. Abbott, T.G.; Lai, K.J.; Lieberman, M.R.; Price, E.C. Browser-based attacks on Tor. In Proceedings of the 7th International Symposium on Privacy Enhancing Technologies, Ottawa, ON, Canada, 10–12 June 2007; pp. 184–199.
12. Wang, X.; Luo, J.; Yang, M.; Ling, Z. A potential HTTP-based application-level attack against Tor. Future Gener. Comput. Syst. 2011, 27, 67–77.
13. Tor Project. HTTPS Everywhere. In Tor Project Support Glossary. Available online: https://support.torproject.org/glossary/https-everywhere/ (accessed on 27 March 2025).
14. Fu, X.; Ling, Z.; Luo, J.; Yu, W.; Jia, W.; Zhao, W. One cell is enough to break Tor’s anonymity. In Proceedings of the Black Hat Technical Security Conference, Las Vegas, NV, USA, 25–30 July 2009; pp. 578–589.
15. Ling, Z.; Luo, J.; Yu, W.; Fu, X.; Xuan, D.; Jia, W. A New cell-counting-based attack against Tor. IEEE/ACM Trans. Netw. 2012, 20, 1245–1261.
16. Rochet, F.; Pereira, O. Dropping on the edge: Flexibility and traffic confirmation in onion routing protocols. Proc. Priv. Enhancing Technol. 2018, 2018, 27–46.
17. Perry, M. Vanguards Technical README. In Vanguards GitHub Repository. Available online: https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md (accessed on 27 March 2025).
18. Øverlier, L.; Syverson, P. Locating hidden servers. In Proceedings of the IEEE Symposium on Security and Privacy, Berkeley, CA, USA, 21–24 May 2006; pp. 100–114.
19. Tor Project. Entry Guards. In Tor Project Support. Available online: https://support.torproject.org/about/entry-guards/ (accessed on 27 March 2025).
20. Biryukov, A.; Pustogarov, I.; Weinmann, R.-P. Trawling for Tor hidden services: Detection, measurement, deanonymization. In Proceedings of the IEEE Symposium on Security and Privacy, Berkeley, CA, USA, 19–22 May 2013; pp. 80–94.
21. Chakravarty, S.; Barbera, M.V.; Portokalidis, G.; Polychronakis, M.; Keromytis, A.D. On the effectiveness of traffic analysis against anonymity networks using flow records. In Proceedings of the 15th International Conference on Passive and Active Network Measurement, Los Angeles, CA, USA, 10–11 March 2014; pp. 247–257.
22. Iacovazzi, A.; Sarda, S.; Elovici, Y. Inflow: Inverse network flow watermarking for detecting hidden servers. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM), Honolulu, HI, USA, 16–19 April 2018; pp. 747–755.
23. Iacovazzi, A.; Frassinelli, D.; Elovici, Y. The DUSTER attack: Tor onion service attribution based on flow watermarking with track hiding. In Proceedings of the 22nd International Symposium on Research in Attacks, Intrusions, and Defenses (RAID), Beijing, China, 23–25 September 2019; pp. 213–225.
24. Fifield, D.; Perry, M. Proposal 324: Congestion Control Using Round Trip Time Measurements. Tor Project, Proposal 324, 2020. Available online: https://spec.torproject.org/proposals/324-rtt-congestion-control.txt (accessed on 27 March 2025).
25. Dingledine, R.; Mathewson, N. Tor Path Specification. Tor Project Specification, Version 3, 2021. Available online: https://spec.torproject.org/path-spec/index.html (accessed on 27 April 2025).
26. Tor Project. Tor Security Advisory: “Relay Early” Traffic Confirmation Attack. In The Tor Project Blog, 30 July 2014. Available online: https://blog.torproject.org/tor-security-advisory-relay-early-traffic-confirmation-attack (accessed on 14 April 2025).
27. Tor Project. Tor Protocol Specification. Tor Project Specification Repository. Available online: https://spec.torproject.org/ (accessed on 16 April 2025).
28. Nusenu. Is “KAX17” Performing De-Anonymization Attacks against Tor Users? Medium, 29 November 2021. Available online: https://nusenu.medium.com/is-kax17-performing-de-anonymization-attacks-against-tor-users-42e566defce8 (accessed on 30 July 2025).
29. Tor Project. Tor Metrics. Official Network Statistics Portal. Available online: https://metrics.torproject.org/ (accessed on 27 April 2025).
30. Dingledine, R. The Lifecycle of a New Relay. Available online: https://blog.torproject.org/lifecycle-of-a-new-relay/ (accessed on 27 April 2025).
31. Tor Project. Research Safety Board. Available online: https://research.torproject.org/safetyboard/ (accessed on 27 April 2025).

Figure 1. Tor cell structures. Numbers indicate field sizes in bytes. (a) Fixed-length cell (512 bytes). (b) Circuit-level cell, a subtype of the fixed-length cell used for end-to-end communication.

Figure 2. Tor circuit creation. Image from [1].

Figure 3. Establishment of the circuit when a Tor client communicates with a hidden service.

Figure 4. Verification procedure for receiving a circuit-level cell in the Tor onion router. Any violation of these rules results in circuit termination.

Figure 5. Injection and detection process in an external circuit attack.

Figure 6. Adaptive circuit-level cell sequence attack event-driven state machine.

Figure 7. Theoretical probabilities of capturing the entry ($P_{i}$), exit ($P_{j}$), and full circuit ($P_{both}$) as a function of the controlled bandwidth percentage of the adversary.

Figure 8. Illustration of the six possible injector–detector topologies evaluated in our experiments, generated by configuring the Tor client with different command-line options and toggling the ActiFsm setting on the malicious relays. Topologies (a,c,e) involve two malicious relays, while (b,d,f) involve only one. In the second phase of our evaluation, we systematically assess the effectiveness and stealthiness of the attack across all these configurations.

Figure 9. Results from the first phase of experiments. (a) Data from Tor Metrics showing the consensus weight fraction of the Luxembourg router and its probability of being chosen for different positions within a Tor circuit. The consensus weight fraction reflects the relay’s bandwidth as observed by itself and measured by directory authorities, influencing how often clients select it for their circuits. Position probability indicates the likelihood of the router being selected as a guard, middle, or exit node. (b) Probability distribution (PMF) of the number of RELAY_EARLY cells received per circuit when the Luxembourg router serves as the exit node.

Figure 10. FSM logs from the second topology, illustrating that the edge relay OR₃ receives two odd cells as specified by the attack algorithm. (a) Signal injection sequence sent from the injector at OR₂. (b) The corresponding cell sequence as received by the exit relay at OR₃. In this scenario, the exit relay is a standard node, as indicated by ActiFsm = 0, which disables the active detector mode of FSM but still allows for passive logging of the cell sequence.

Figure 11. Signal injection and detection for two example circuits, as captured by FSM log files. Each pair shows the sent cell sequence from the injector (left) and the corresponding received sequence at the detector (right). The “Variable Updates” column tracks the FSM’s internal state, with B representing the binary-encoded identifier tuple $(i, j)$ as it is constructed and decoded. The top pair (a,b) illustrates successful encoding and decoding of the identifier $(098, 1)$; the bottom pair (c,d) shows the process for $(098, 7)$. In both cases, the detector accurately reconstructs the original identifier, confirming the reliability of the attack.

Figure 12. Time-correlation analysis of injection and detection timestamps for start and end delimiters (3rd topology). The Pearson correlation coefficient (r) is computed for both. (a) Timestamps for the start delimiter. (b) Timestamps for the end delimiter.
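
The coefficient r reported in Figure 12 is the standard Pearson correlation between paired injection and detection timestamps. The sketch below shows how such a value could be computed. The function name `pearson_r` and the timestamp values are made up for illustration only; they are not taken from the experiments.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (the common 1/n factor cancels out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical timestamps in seconds: when the injector sent a delimiter
# and when the detector observed it. These numbers are illustrative only.
injected = [0.00, 1.50, 3.10, 4.40, 6.00]
detected = [0.11, 1.63, 3.20, 4.52, 6.13]

r = pearson_r(injected, detected)
print(f"r = {r:.4f}")
```

A value of r close to 1 indicates that detection times track injection times almost linearly, which is the property the figure uses to link the two observation points.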

Table 1. Decrement of RELAY_EARLY cell counters at each hop during successive transmissions from the client (OP) to intermediate onion routers (OR₁, OR₂, OR₃).

Sequence	Rel_cmd	Src → Dst	C_OP	C_OR1	C_OR2	C_OR3
			C_OP = 8	C_OR1 = 8	C_OR2 = 8	C_OR3 = 8
1st	EXTEND	OP → OR₁	C_OP = 7	C_OR1 = 7	C_OR2 = 8	C_OR3 = 8
2nd	EXTEND	OP → OR₂	C_OP = 6	C_OR1 = 6	C_OR2 = 7	C_OR3 = 8
3rd	$α$	OP → OR₃	C_OP = 5	C_OR1 = 5	C_OR2 = 6	C_OR3 = 7
4th	$α$	OP → OR₃	C_OP = 4	C_OR1 = 4	C_OR2 = 5	C_OR3 = 6
5th	$α$	OP → OR₃	C_OP = 3	C_OR1 = 3	C_OR2 = 4	C_OR3 = 5
6th	$α$	OP → OR₃	C_OP = 2	C_OR1 = 2	C_OR2 = 3	C_OR3 = 4
7th	$α$	OP → OR₃	C_OP = 1	C_OR1 = 1	C_OR2 = 2	C_OR3 = 3
8th	$α$	OP → OR₃	C_OP = 0	C_OR1 = 0	C_OR2 = 1	C_OR3 = 2

This example assumes the OP sets the maximum RELAY_EARLY cell count to 8. Rel_cmd denotes the subcommand field of a RELAY_EARLY cell; $α$ represents any valid subcommand value. C_OP indicates the remaining number of RELAY_EARLY cells the onion proxy (OP) is allowed to send. C_X = n denotes the remaining number of RELAY_EARLY cells that node X (e.g., OR₁, OR₂, OR₃) is allowed to receive.
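
The counter arithmetic in Table 1 can be reproduced with a few lines of code. The following is a minimal sketch, assuming (as the table does) a limit of 8 and that a cell addressed to hop k decrements the counters of hops 1 through k. The function name `simulate_relay_early` is hypothetical and does not correspond to anything in the Tor source.

```python
MAX_RELAY_EARLY = 8  # limit assumed in the Table 1 example


def simulate_relay_early(destinations, limit=MAX_RELAY_EARLY):
    """Return one (C_OP, C_OR1, ..., C_ORn) tuple per RELAY_EARLY cell sent.

    `destinations` lists, for each cell, the 1-based index of the hop it is
    addressed to; every hop up to and including that one receives the cell.
    """
    hops = max(destinations)
    c_op = limit
    c_or = [limit] * hops
    rows = []
    for dst in destinations:
        c_op -= 1                 # the OP spends one RELAY_EARLY cell
        for hop in range(dst):    # hops 1..dst each receive the cell
            c_or[hop] -= 1
        rows.append((c_op, *c_or))
    return rows


# Two EXTEND cells (to OR1, then OR2) followed by six cells addressed to OR3.
rows = simulate_relay_early([1, 2, 3, 3, 3, 3, 3, 3])
for n, row in enumerate(rows, start=1):
    print(n, row)
```

The last row is `(0, 0, 1, 2)`: the OP and OR₁ have exhausted their budgets, while OR₂ and OR₃ still hold residual values of 1 and 2, matching the final row of Table 1.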

Table 2. Detailed Process of Covert Signal Injection.

Cell Sequence	Received	Malicious Router Action	Sent	Signal Field
1st	`RELAY_EARLY_EXTEND`	Injector mode	`CREATE`	-
2nd	`RELAY_EARLY_$α$`	Forward	`RELAY_EARLY_$α$`	Reserved
3rd	`RELAY_EARLY_$α$`	Forward	`RELAY_EARLY_$α$`	Reserved
4th	`[Cmd]_$α$`	`[Cmd]` → `RELAY`	`RELAY_$α$`	Start Delimiter
5th	`[Cmd]_$α$`	`[Cmd]` → `RELAY_EARLY`	`RELAY_EARLY_$α$`	Start Delimiter
$(5 + R_{0})$ th	`[Cmd]_$α$` × $R_{0}$	`[Cmd]` × $R_{0}$ → `RELAY` × $R_{0}$	`RELAY_$α$` × $R_{0}$	Payload
$(6 + R_{0})$ th	`[Cmd]_$α$`	`[Cmd]` → `RELAY_EARLY`	`RELAY_EARLY_$α$`	Payload
$(6 + \sum_{k = 0}^{1} R_{k})$ th	`[Cmd]_$α$` × $R_{1}$	`[Cmd]` × $R_{1}$ → `RELAY` × $R_{1}$	`RELAY_$α$` × $R_{1}$	Payload
$(7 + \sum_{k = 0}^{1} R_{k})$ th	`[Cmd]_$α$`	`[Cmd]` → `RELAY_EARLY`	`RELAY_EARLY_$α$`	Payload
$(7 + \sum_{k = 0}^{2} R_{k})$ th	`[Cmd]_$α$` × $R_{2}$	`[Cmd]` × $R_{2}$ → `RELAY` × $R_{2}$	`RELAY_$α$` × $R_{2}$	Payload
$(8 + \sum_{k = 0}^{2} R_{k})$ th	`[Cmd]_$α$`	`[Cmd]` → `RELAY_EARLY`	`RELAY_EARLY_$α$`	Payload
$(8 + \sum_{k = 0}^{3} R_{k})$ th	`[Cmd]_$α$` × $R_{3}$	`[Cmd]` × $R_{3}$ → `RELAY` × $R_{3}$	`RELAY_$α$` × $R_{3}$	Payload
$(9 + \sum_{k = 0}^{3} R_{k})$ th	`[Cmd]_$α$`	`[Cmd]` → `RELAY_EARLY`	`RELAY_EARLY_$α$`	Payload
$(9 + \sum_{k = 0}^{4} R_{k})$ th	`[Cmd]_$α$` × $R_{4}$	`[Cmd]` × $R_{4}$ → `RELAY` × $R_{4}$	`RELAY_$α$` × $R_{4}$	Payload
$(10 + \sum_{k = 0}^{4} R_{k})$ th	`[Cmd]_$α$`	`[Cmd]` → `RELAY_EARLY`	`RELAY_EARLY_$α$`	End Delimiter

$α$ denotes any valid subcommand; $R_{k}$ indicates the number of RELAY cells encoded from the binary payload.
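
To make the layout in Table 2 concrete, the sketch below builds the command sequence sent for cells 2 onward and then recovers the run lengths $R_{0}, \ldots, R_{4}$ by counting RELAY cells between RELAY_EARLY separators. It is an illustration under stated assumptions, not the authors' implementation: the names `build_signal` and `read_signal` are hypothetical, and because the mapping from the binary payload to the $R_{k}$ values is not given in this table, the run lengths are taken directly as input.

```python
RELAY, RELAY_EARLY = "RELAY", "RELAY_EARLY"

# Cells 2-3 are the reserved field, cells 4-5 the start delimiter (Table 2).
PREFIX = [RELAY_EARLY, RELAY_EARLY, RELAY, RELAY_EARLY]


def build_signal(run_lengths):
    """Commands sent for cells 2..end of Table 2, given R_0..R_4."""
    seq = list(PREFIX)
    for r in run_lengths:
        seq += [RELAY] * r        # R_k cells rewritten to RELAY
        seq.append(RELAY_EARLY)   # separator; the final one is the end delimiter
    return seq


def read_signal(seq, groups=5):
    """Recover the run lengths, or None if the sequence does not fit the layout."""
    if seq[:len(PREFIX)] != PREFIX:
        return None
    runs, count = [], 0
    for cmd in seq[len(PREFIX):]:
        if cmd == RELAY:
            count += 1
        else:                     # a RELAY_EARLY cell closes the current run
            runs.append(count)
            count = 0
            if len(runs) == groups:
                break
    return runs if len(runs) == groups else None


example_runs = [3, 1, 2, 1, 4]    # arbitrary illustrative values for R_0..R_4
signal = build_signal(example_runs)
decoded = read_signal(signal)
print(decoded == example_runs, signal.count(RELAY_EARLY))
```

Counting the first cell (forwarded as `CREATE`), the sequence ends at position $10 + \sum_{k=0}^{4} R_{k}$, as in the last row of Table 2, and it contains eight RELAY_EARLY cells regardless of the payload.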

Table 3. Injector Positions and Corresponding Signal Properties Across Different Circuit Types.

Circ. Type	Circ. Path	Injector Positions and Signal Properties	Reason for Ineffectiveness
External_ClientSide	OR₁ -…- OR₃ (Exit)	$OR_{1} \in \{S \& E\}$; $OR_{2} \in \{S \& IE\#3\}$	$IE\#1$: Out-of-order signal; $IE\#2$: Incomplete signal; $IE\#3$: Improper position
Rend_ServerSide	OR₁ -…- OR₄ (Rendezvous)	$OR_{1} \in \{S \& E\}$; $OR_{2} \in \{S \& IE\#3\}$; $OR_{3} \in \{S \& IE\#3\}$
Intro_ClientSide	OR₁ -…- OR₄ (Introduction)	$OR_{1} \in \{S \& IE\#1\}$; $OR_{2} \in \{S \& IE\#3\}$; $OR_{3} \in \{S \& IE\#3\}$
Rend_ClientSide	OR₁ -…- OR₃ (Rendezvous)	$OR_{1} \in \{S \& IE\#1\}$; $OR_{2} \in \{S \& IE\#3\}$
Intro_ServerSide	OR₁ -…- OR₃ (Introduction)	$OR_{1} \in \{S \& IE\#2\}$; $OR_{2} \in \{S \& IE\#3\}$

S stands for stealthiness, E stands for effectiveness, and $IE\#n$ refers to ineffectiveness, where n is the corresponding reason number. The reasons listed in the last column apply to all rows.

Table 4. Injector–Detector Topologies.

Topology No.	Injector Node	Detector Node	Validation Objective
1st	OR₁	None	$S \& -$
2nd	OR₂	None	$S \& -$
3rd	OR₁	OR₃	$S \& E$

None indicates that no detector node is present in the corresponding topology.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
