A Vulnerability Taxonomy for Tor-Based Hidden Services: Toward a De-Anonymization Framework for Cybercrime Investigation

Shin, Jiho; Shin, Inkyoung

doi:10.3390/electronics15112370

by Jiho Shin ¹ and Inkyoung Shin ²,*

¹ Division of Computer Science Convergence, College of AISW Convergence, Mokwon University, Daejeon 35349, Republic of Korea
² Cybercrime Research Center, Police Science Institute, Korean National Police University, Asan 31539, Republic of Korea
* Author to whom correspondence should be addressed.

Electronics 2026, 15(11), 2370; https://doi.org/10.3390/electronics15112370

Submission received: 1 May 2026 / Revised: 28 May 2026 / Accepted: 28 May 2026 / Published: 31 May 2026

(This article belongs to the Special Issue Advances in Cybersecurity, Data Privacy, Robotics, and Cloud and Service Computing)

Abstract

Tor-based hidden services host substantial criminal infrastructure, yet de-anonymization research remains fragmented across heterogeneous techniques. No prior work has organized these techniques into a unified taxonomy oriented toward forensic investigation. This paper proposes a five-layer vulnerability taxonomy for Tor hidden services, distinguishing network-level (L1), application-level (L2), side-channel (L3), operational-security-failure (L4), and ecosystem-level (L5) categories. The taxonomy is derived from a structured review of literature published between 2002 and 2024. We further propose a Traceability Evaluation Framework (TEF) that scores 11 vulnerability types along three dimensions: Applicability, Technical Difficulty, and Legal Admissibility. The TEF dimension weights are derived through Analytic Hierarchy Process elicitation from a five-member expert panel of cybercrime investigators, digital forensics researchers, and a legal scholar. The resulting weights of (0.385, 0.204, 0.412) for Applicability, inverted Technical Difficulty, and Legal Admissibility prove robust to ±0.10 perturbations in sensitivity analysis. Under this framework, four application-layer (L2) and operational-security-failure (L4) vulnerabilities receive the highest traceability scores (TS ≥ 2.80), while two network-level (L1) attacks and one side-channel (L3) technique fall to the lowest tier. The framework integrates technical exploitability with legal admissibility constraints across U.S., EU, and other evidentiary regimes, providing a structured reference for investigators and a methodological foundation for case-based empirical validation in future work.

Keywords: dark web; Tor network; hidden services; de-anonymization; vulnerability taxonomy; digital forensics; cybercrime investigation; analytic hierarchy process; traceability evaluation

1. Introduction

The Tor network, originally developed by the United States Naval Research Laboratory and subsequently released as an open-source anonymity system [1], has become the most widely used infrastructure for hosting hidden services, which are web servers accessible only within the Tor overlay network via .onion addresses. These hidden services (formally termed “onion services” in Tor’s v3 specification) provide end-to-end anonymization of both the client and the server, making it structurally infeasible to determine the physical location of the hosting machine through conventional forensic means.

The dual-use nature of this technology is well established. Tor hidden services serve legitimate purposes, including circumvention of government censorship, secure whistleblowing platforms (e.g., SecureDrop v2.15.1), and protection of journalistic communications. Conversely, the very mechanisms that provide privacy protection make Tor an attractive hosting environment for cybercriminal infrastructure. Illicit marketplaces, ransomware command-and-control servers, CSAM distribution networks, and stolen financial data exchanges have all been documented as operating on the Tor network [2,3,4].

Law enforcement agencies face a fundamental investigative asymmetry: conventional techniques for locating web servers (such as WHOIS lookups, TLS certificate transparency logs, content delivery network (CDN) log requests, and IP geolocation) are rendered ineffective when the origin server is concealed behind Tor’s multi-hop circuit architecture. Successful takedowns of major dark web markets were achieved through a combination of operational security (OPSEC) failures by operators, undercover infiltration, and exploitation of specific application-layer vulnerabilities rather than cryptographic attacks on the Tor protocol itself [2].

The academic literature on dark web de-anonymization is extensive but fragmented. Individual studies have examined traffic correlation attacks [5], side-channel timing analysis [6], website fingerprinting via deep learning [7], circuit fingerprinting [8], and application-level leakage through external API calls [9,10]. However, no prior work has organized these heterogeneous vulnerabilities into a unified, layered taxonomy that explicitly connects technical exploitability to forensic investigative utility. This gap is significant because investigators must make real-time resource allocation decisions, and no structured decision-support framework currently exists.

This paper addresses this gap through the following four contributions:

(1) A five-layer vulnerability taxonomy for Tor hidden services, categorizing all known de-anonymization attack surfaces into network-level (L1), application-level (L2), side-channel (L3), OPSEC failure (L4), and ecosystem-level (L5) categories.
(2) A structured literature mapping demonstrating that existing research predominantly addresses L1 and L3, while L2 (application-level) and L4 (OPSEC failure) represent underexplored areas of high practical investigative value.
(3) A Traceability Evaluation Framework (TEF) that scores each vulnerability type across three forensically relevant dimensions (applicability, technical difficulty, and legal admissibility) to provide actionable investigative prioritization.
(4) A comparative structural analysis of dark web versus surface web vulnerability profiles, contextualizing the taxonomy within the constraints that distinguish hidden service forensics from conventional web server forensics.

The remainder of this paper is structured as follows. Section 2 provides background on the Tor network architecture. Section 3 describes the taxonomy construction methodology. Section 4 presents the proposed five-layer taxonomy. Section 5 introduces the Traceability Evaluation Framework. Section 6 discusses investigative implications and limitations. Section 7 concludes the paper.

2. Background

2.1. Tor Network Architecture

The Tor network achieves anonymity through layered encryption and multi-hop routing, originally described by Dingledine et al. [1]. When a client (Onion Proxy, OP) communicates with a destination, it constructs an encrypted circuit through three volunteer-operated Tor relays: a guard node (entry), a middle relay, and an exit node. Each relay decrypts one layer of the onion-encrypted packet (a 512-byte “cell”) and forwards the remainder, such that no single relay knows both the origin and the destination of the communication.

Hidden services do not use an exit node in the traditional sense. Instead, the server and the client each build separate circuits to a mutually agreed-upon “rendezvous point” relay, where communication is bridged. The server publishes a cryptographically signed descriptor, containing its public key and introduction points, to a distributed hash table (DHT) maintained by Hidden Service Directory (HSDir) relays. Clients retrieve this descriptor and negotiate a rendezvous session without either party revealing their IP address to the other or to intermediary relays.

Tor v3 onion addresses (introduced in 2017) encode the server’s Ed25519 public key in a 56-character Base32 string, providing 128-bit security against enumeration attacks. This represents a significant improvement over v2 addresses (deprecated in 2021), which were derived from only a 10-byte (80-bit) truncated SHA-1 digest of the service’s public key and were vulnerable to trawling attacks [5]. Despite these cryptographic improvements, the protocol’s structural characteristics, particularly its reliance on rendezvous points, introduction points, and descriptor distribution, introduce measurable side effects exploitable for identification, as detailed in Section 4.
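The v3 address encoding can be illustrated with a short sketch. It assumes the layout given in the Tor v3 rendezvous specification (public key, 2-byte checksum, version byte, Base32-encoded). The function name `onion_address_v3` and the all-zero demonstration key are illustrative assumptions, not a real service key.

```python
import base64
import hashlib

VERSION = b"\x03"  # version byte used by v3 onion services


def onion_address_v3(pubkey: bytes) -> str:
    """Encode a 32-byte Ed25519 public key as a v3 .onion address."""
    if len(pubkey) != 32:
        raise ValueError("Ed25519 public keys are 32 bytes long")
    # checksum = first 2 bytes of SHA3-256(".onion checksum" || pubkey || version)
    checksum = hashlib.sha3_256(b".onion checksum" + pubkey + VERSION).digest()[:2]
    # 32 + 2 + 1 = 35 bytes, which Base32-encodes to exactly 56 characters
    label = base64.b32encode(pubkey + checksum + VERSION).decode("ascii").lower()
    return label + ".onion"


demo_key = bytes(32)  # placeholder key, for illustration only
address = onion_address_v3(demo_key)
print(len(address[:-len(".onion")]))  # length of the Base32 label
```

Because the address is a direct encoding of the key rather than a short digest of it, the label length follows from the 35-byte payload.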

2.2. Dark Web vs. Surface Web: Structural Differences

Table 1 contrasts the key structural attributes of surface web and dark web (Tor-based) services. These differences are fundamental to understanding why conventional forensic techniques are inapplicable in dark web investigations and why the taxonomy proposed in this paper requires a distinct investigative framework.

A critical observation is that the principal forensic entry points available in surface web investigations (WHOIS records, certificate transparency logs, CDN provider cooperation, and IP geolocation) are structurally unavailable for Tor hidden services. This forces investigators to rely on application-layer artifacts, behavioral patterns, and statistical signals, forming the basis of the taxonomy in Section 4.

2.3. Overview of Existing De-Anonymization Research

Research on dark web de-anonymization spans more than two decades. Early foundational work focused on the theoretical design and structural limitations of onion routing [1,11]. The seminal clock-skew attack by Murdoch [6] demonstrated that physical hardware characteristics could leak identifying information despite cryptographic protections. Network-level correlation attacks were advanced by Biryukov et al. [5], who demonstrated the feasibility of enumerating hidden service descriptors and correlating guard node traffic. Application-level vulnerabilities received increasing attention as marketplace measurement studies revealed widespread OPSEC failures [3,4]. The introduction of deep learning to website fingerprinting by Sirinam et al. [7] represented a paradigmatic shift in side-channel attack capability. More recent work [9,10,12,13] has shifted toward systematic measurement and categorization, reflecting the field’s maturation. Table A1 (Appendix A) maps these sources to the taxonomy layers they address.

The deep learning result of Sirinam et al. [7] also illustrates a broader trend in which machine learning techniques enhance the exploitation efficiency of existing vulnerability classes. Deep learning improves L3 side-channel analysis accuracy, and analogous developments may extend to L1 traffic correlation, L4 OSINT correlation, and L5 ecosystem analysis. These advances, however, operate within the existing taxonomy layers as methodological enhancements rather than constituting a separate vulnerability category. We accordingly treat machine learning and AI-assisted analysis as a methodological axis orthogonal to the vulnerability taxonomy presented in this paper, with implications for investigative throughput and accuracy rather than for the structural composition of attack surfaces. Comprehensive treatment of AI-assisted dark web forensics constitutes a distinct research program beyond the scope of the present taxonomic and evaluative work; future research directions in this area are identified in Section 6.3.

3. Methodology: Taxonomy Construction

3.1. Literature Collection Scope

The taxonomy was constructed through a structured literature review covering publications from 2002 to 2024, encompassing the full timeline from the foundational Tor design paper [1] through the most recent systematic reviews [9,12,13]. Primary sources included the IEEE Symposium on Security and Privacy, USENIX Security Symposium, ACM Conference on Computer and Communications Security (CCS), and the Privacy Enhancing Technologies Symposium (PETS) with its proceedings (PoPETs). In total, 15 primary sources were systematically analyzed, supplemented by secondary references for contextual depth.

3.2. Taxonomy Classification Criteria

Vulnerabilities were classified based on four criteria: (1) Attack Surface: the component of the hidden service architecture at which the vulnerability is located; (2) Adversary Capability: the resources and access required to execute the attack; (3) Forensic Relevance: the degree to which exploitation can produce court-admissible evidence; and (4) Countermeasure Landscape: the extent to which existing defenses mitigate the vulnerability. This multi-criteria approach ensures that the taxonomy reflects not only technical capability but also practical investigative utility, distinguishing it from purely security-oriented classifications that treat all attack types as equivalent.

3.3. Traceability Score Computation

Each vulnerability type is assessed across three dimensions on a three-point ordinal scale: Low = 1, Medium = 2, High = 3. These dimensions are: (1) Applicability: the breadth of conditions under which the vulnerability is exploitable in real dark web environments; (2) Technical Difficulty: the level of expertise and computational resources required for exploitation; and (3) Legal Admissibility: the degree to which evidence obtained through exploitation is likely to be admissible in criminal proceedings. To align Technical Difficulty directionally with the other two dimensions, this dimension is evaluated on an inverted scale (Low difficulty = 3, Medium = 2, High = 1), reflecting the principle that lower exploitation barriers favor investigative utility.

The aggregate Traceability Score (TS) for each vulnerability v is computed as the weighted linear composite:

$$\mathrm{TS}(v) = w_A \cdot \mathrm{App}(v) + w_T \cdot \mathrm{Tech_{inv}}(v) + w_L \cdot \mathrm{Legal}(v) \tag{1}$$

where (w_A, w_T, w_L) = (0.385, 0.204, 0.412) denote the weights for Applicability, inverted Technical Difficulty, and Legal Admissibility, respectively. These weights were derived through Analytic Hierarchy Process (AHP) pairwise comparison elicitation from a five-member expert panel (Section 3.4), with the resulting weight vector validated by sensitivity analysis confirming that conclusions are robust to ±0.10 weight perturbations.

The resulting TS, with theoretical range [1.00, 3.00], is discretized into three investigative priority categories using thresholds derived from the natural distributional clustering of computed scores:

●●● High Traceability (TS ≥ 2.25): Immediate-to-high investigative priority; techniques warrant operationalization in standard investigative workflows.
●● Medium Traceability (1.55 ≤ TS < 2.25): Conditional utility; appropriate as supporting evidence or in case-specific applications where complementary techniques are unavailable.
● Low Traceability (TS < 1.55): Research-stage techniques requiring further methodological development before operational deployment.

These thresholds correspond to natural gaps observed in the computed score distribution: a 0.25-unit gap separates the ●●● and ●● clusters, and a 0.35-unit gap separates the ●● and ● clusters, providing principled boundaries for the ordinal classification.
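The scoring and discretization steps of this section can be sketched as follows. This is a minimal sketch using the reported weights and thresholds. The function names and the example ratings are illustrative assumptions and are not rows taken from Table 6.

```python
WEIGHTS = {"app": 0.385, "tech_inv": 0.204, "legal": 0.412}  # Section 3.3 values
LEVELS = {"Low": 1, "Medium": 2, "High": 3}


def traceability_score(applicability, difficulty, admissibility):
    """Weighted composite of Equation (1); inputs are 'Low', 'Medium' or 'High'."""
    app = LEVELS[applicability]
    tech_inv = 4 - LEVELS[difficulty]  # invert: Low difficulty -> 3, High -> 1
    legal = LEVELS[admissibility]
    return (WEIGHTS["app"] * app
            + WEIGHTS["tech_inv"] * tech_inv
            + WEIGHTS["legal"] * legal)


def tier(ts):
    """Map a score to the three priority categories."""
    if ts >= 2.25:
        return "High"
    if ts >= 1.55:
        return "Medium"
    return "Low"


# Hypothetical rating: broadly applicable, medium difficulty, strong admissibility
ts = traceability_score("High", "Medium", "High")
print(round(ts, 2), tier(ts))
```

Under these weights, a vulnerability rated High on Applicability and Legal Admissibility with Medium difficulty scores 0.385·3 + 0.204·2 + 0.412·3 ≈ 2.80. The reported weights sum to 1.001 because of rounding, so the extremes of the range deviate from 1.00 and 3.00 only in the third decimal place.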

The empirical derivation of dimension weights and the validation of their robustness through consistency analysis and sensitivity testing are detailed in Section 3.4.

3.4. Weight Derivation via Analytic Hierarchy Process

To address the inherent subjectivity of dimension weights in any weighted-composite scoring scheme, we derived (w_A, w_T, w_L) through the Analytic Hierarchy Process (AHP), a multi-criteria decision analysis method that elicits ratio-scale weights from pairwise expert comparisons [14]. This section documents the panel composition, elicitation procedure, aggregation method, consistency verification, and sensitivity analysis underlying the weight vector (0.385, 0.204, 0.412) reported in Section 3.3.

3.4.1. Panel Composition and Inclusion Criteria

The TEF weight elicitation requires participants with combined experience in investigative practice, digital forensics, and criminal procedure, which are domains that purely academic security researchers or purely legal scholars cannot individually represent. Five experts were recruited under the following inclusion criteria: (i) direct practical, research, or advisory experience in cybercrime investigation, digital forensics, or digital evidence law; (ii) current or prior position requiring engagement with both technical evidence analysis and its legal application; and (iii) a minimum of 8 years of relevant professional experience. The resulting panel, anonymized as R1–R5, spans four professional profiles: law enforcement practice (R1, R4), academia with prior law enforcement service (R2), legal scholarship in digital evidence (R3), and legal practice in digital forensics (R5). The panel composition is summarized in Table 2.

3.4.2. Pairwise Comparison Elicitation

Each panelist completed three pairwise comparisons among the three evaluation dimensions, Applicability (App), inverted Technical Difficulty (Tech_inv), and Legal Admissibility (Legal), using Saaty’s standard 9-point intensity scale, where 1 indicates equal importance, 3 weakly stronger, 5 essentially stronger, 7 very strongly stronger, and 9 absolutely stronger preference. To ensure interpretive consistency across panelists, respondents were instructed to apply the framing of “cybercrime investigation and prosecution context” rather than a generic information-security perspective. Panelists were further requested to supply brief justifications for each pairwise judgment. The substantive content of these justifications confirms domain-specific reasoning. For example, R4 wrote that “legal admissibility must exist before applicability can be discussed,” whereas R3, writing from a digital evidence law perspective, emphasized that “applicability takes precedence for the resolution of cases.” These explanations reflect substantive disciplinary reasoning rather than arbitrary intensity assignments.

For each panelist, a 3 × 3 reciprocal comparison matrix A = [a_ij] was constructed, with a_ji = 1/a_ij and a_ii = 1, encoding the elicited preference intensities among Legal, Applicability, and Tech_inv.

3.4.3. Individual Weight Derivation and Group Aggregation

Individual weight vectors were derived using the geometric mean method, which is mathematically equivalent to the right principal eigenvector method for 3 × 3 matrices and is widely adopted in AHP practice. For panelist k, the unnormalized weight for dimension i is:

$$w_i^{(k)} = \Bigl(\prod_{j=1}^{3} a_{ij}^{(k)}\Bigr)^{1/3} \tag{2}$$

followed by normalization so that the weights sum to unity. Group weights were then aggregated across panelists, again via the geometric mean, the recommended aggregation operator for AHP weights derived from individual judgments, as it preserves the reciprocal structure of the underlying comparisons. For the i-th dimension:

$$w_i^{\mathrm{group}} = \Bigl(\prod_{k=1}^{n} w_i^{(k)}\Bigr)^{1/n} \tag{3}$$

followed by re-normalization. With n = 5 panelists, the resulting group weight vector is (w_A, w_T, w_L) = (0.385, 0.204, 0.412). Notably, the individual weights show a systematic dispersion across panelists that correlates with professional background: panelists with active or prior law enforcement service (R1, R2, R4) consistently emphasize Legal Admissibility (mean weight 0.69), whereas panelists from academia and legal practice (R3, R5) emphasize Applicability (mean weight 0.63). This dispersion is not noise but rather reflects genuinely divergent expert priorities, a finding consistent with the interdisciplinary character of dark web forensic practice. Individual weights and consistency ratios are reported in Table 3.
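The two-stage geometric-mean procedure of Equations (2) and (3) can be sketched as below. The two comparison matrices are hypothetical and are not the panel's elicited judgments. The dimension ordering (App, Tech_inv, Legal) and the function names are also assumptions made for illustration.

```python
import math


def ahp_weights(matrix):
    """Normalized row geometric means of a reciprocal comparison matrix (Eq. 2)."""
    n = len(matrix)
    raw = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(raw)
    return [r / total for r in raw]


def group_weights(individual):
    """Geometric mean across panelists per dimension, then re-normalize (Eq. 3)."""
    n = len(individual)
    dims = len(individual[0])
    raw = [math.prod(w[i] for w in individual) ** (1.0 / n) for i in range(dims)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical judgments for two panelists, ordered (App, Tech_inv, Legal)
panelist_a = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
panelist_b = [[1, 3, 1 / 3], [1 / 3, 1, 1 / 5], [3, 5, 1]]

individual = [ahp_weights(panelist_a), ahp_weights(panelist_b)]
print([round(w, 3) for w in group_weights(individual)])
```

For the perfectly consistent matrix `panelist_a`, the row geometric means are 2, 1, and 0.5, giving weights 4/7, 2/7, and 1/7.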

3.4.4. Consistency Verification

The internal consistency of each panelist’s judgments was verified through Saaty’s consistency ratio. The consistency index (CI) is defined as CI = (λ_max − n)/(n − 1), where λ_max is the principal eigenvalue of the comparison matrix. The consistency ratio is CR = CI/RI, where the random index RI = 0.58 for n = 3. CR < 0.1 indicates strong internal consistency; values between 0.1 and 0.2 are considered acceptable in expert-elicitation contexts where panelists may legitimately weigh competing criteria in non-transitive ways.

Four of the five panelists exhibited consistency ratios at or below the 0.2 threshold (R2 = 0.008, R3 = 0.046, R4 = 0.000, R1 = 0.117). Panelist R5 produced CR = 0.431, exceeding standard thresholds. Inspection of R5’s response pattern, however, reveals that all three pairwise comparisons used identical extreme intensity values (intensity = 8 in all three judgments), reflecting a coherent and emphatic prioritization of Applicability rather than incoherent judgment. This interpretation is supported by R5’s written rationale that “Tor de-anonymization is fundamentally infeasible in many cases, so applicability alone carries substantial value.” Accordingly, R5’s response is retained in the primary analysis as an expression of a legitimate disciplinary perspective (legal-practice viewpoint emphasizing the practical infeasibility of de-anonymization in many investigative contexts) but is also examined in the sensitivity analysis (Section 3.4.5).
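The consistency ratio reported for R5 can be checked with a short calculation. The sketch assumes that the three intensity-8 judgments form a transitive ordering of the dimensions; the exact matrix is not given in the text, so `r5` below is a reconstruction. A perfectly consistent matrix would require a_13 = a_12 · a_23 = 64, which lies outside the 9-point scale, so uniformly extreme judgments necessarily produce a high CR.

```python
import math

RANDOM_INDEX = {3: 0.58}  # Saaty's random index for n = 3, as used in the text


def consistency_ratio(matrix):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    raw = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(raw)
    w = [r / total for r in raw]
    # lambda_max estimated as the mean of (A w)_i / w_i over all rows
    lam = sum(
        sum(a * wj for a, wj in zip(row, w)) / wi
        for row, wi in zip(matrix, w)
    ) / n
    ci = (lam - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Assumed reconstruction of R5: intensity 8 in all three judgments
r5 = [[1, 8, 8], [1 / 8, 1, 8], [1 / 8, 1 / 8, 1]]
print(round(consistency_ratio(r5), 3))  # 0.431
```

Here λ_max = 3.5, so CI = 0.25 and CR = 0.25/0.58 ≈ 0.431, matching the value reported for R5.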

3.4.5. Sensitivity Analysis

To verify that the framework’s conclusions are robust to weight elicitation choices, we recomputed all Traceability Scores under three weight scenarios: the primary 5-panelist analysis, a CR-restricted 4-panelist subset (excluding R5), and a strict-consistency 3-panelist subset (excluding R1 and R5). Table 4 summarizes the resulting group weights and the count of top-tier (●●●) vulnerabilities under each scenario.

Across all three scenarios, the same four vulnerabilities (External API leakage, OPSEC failure, misconfigured server headers, and cross-platform identity linkage) consistently occupy the top-tier ●●● classification. The aggregate distribution is invariant under all examined weight scenarios: L2 (application-layer) and L4 (OPSEC-failure) vulnerabilities dominate the high-traceability category, while L1 (network-level cryptographic) attacks consistently fall to the low-traceability category. This invariance supports the principal substantive finding of the framework: that investigative prioritization toward L2/L4 vulnerabilities is justified independently of the precise weight values chosen, provided that Legal Admissibility is afforded comparable importance to Applicability. Three individual vulnerabilities exhibit threshold-boundary sensitivity (JavaScript/WebRTC leakage at the ●●/●●● boundary; circuit fingerprinting and clock-skew timing at the ●/●● boundary), and their classification under the primary analysis is discussed in Section 5.2.
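A weight-perturbation check of the kind described above can be sketched as follows. The perturbation scheme (shift one weight at a time by ±0.10, then re-normalize) is an assumption made for this sketch; the text does not specify the exact procedure. The score triples are hypothetical.

```python
BASE = (0.385, 0.204, 0.412)  # (w_A, w_T, w_L) from Section 3.3


def tier(ts):
    """Priority category for a Traceability Score."""
    if ts >= 2.25:
        return "High"
    if ts >= 1.55:
        return "Medium"
    return "Low"


def perturbed_weight_sets(base, delta=0.10):
    """Shift one weight at a time by +/-delta and re-normalize to sum to 1."""
    for i in range(len(base)):
        for sign in (+1, -1):
            w = list(base)
            w[i] = max(w[i] + sign * delta, 0.0)
            total = sum(w)
            yield tuple(x / total for x in w)


def tiers_under_perturbation(scores, base=BASE, delta=0.10):
    """Set of tiers that a score triple (App, Tech_inv, Legal) can fall into."""
    return {
        tier(sum(w * s for w, s in zip(ws, scores)))
        for ws in perturbed_weight_sets(base, delta)
    }


print(tiers_under_perturbation((3, 2, 3)))  # hypothetical top-tier profile
print(tiers_under_perturbation((1, 1, 1)))  # hypothetical bottom-tier profile
```

A profile whose tier set contains a single element is stable under this scheme, whereas a set with two elements marks a threshold-boundary case.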

4. Proposed Vulnerability Taxonomy

The proposed taxonomy organizes vulnerabilities across five layers corresponding to distinct attack surfaces within the hidden service architecture. Table 5 provides a high-level overview; subsequent subsections describe each layer in detail.

4.1. Layer 1 (L1): Network-Level Vulnerabilities

Layer 1 encompasses attacks that exploit properties of the Tor network routing protocol itself, independent of the application running on the hidden service. These attacks require access to traffic at the network level and typically involve observation of one or more Tor relay nodes.

4.1.1. End-to-End Traffic Correlation

The most theoretically powerful class of Tor attacks involves correlation of traffic entering and leaving the network. An adversary controlling both the guard node and the rendezvous point of a hidden service circuit can correlate timing and volume patterns to link a client’s identity to a hidden service with high probability [5]. Such attacks require either global passive adversary capability or targeted positioning at strategic relay nodes, placing this technique largely beyond standard law enforcement operational capacity.

4.1.2. Circuit Fingerprinting

Kwon et al. [8] demonstrated that traffic patterns generated during Tor circuit construction are distinguishable by hidden service type and, under some conditions, by specific service identity. By passively observing traffic at a guard node, an adversary can classify circuits as belonging to particular hidden services with laboratory accuracy exceeding 99%. However, this technique requires the adversary to pre-position a malicious or cooperative guard node in the victim’s circuit path, a significant operational constraint.

4.1.3. Timing and Clock-Skew Analysis

Murdoch’s 2006 demonstration [6] that CPU load-induced clock skew measurable through TCP timestamp variations could de-anonymize hidden servers represented an early practical attack. By simultaneously inducing load on a suspected server and measuring timing deviations across multiple circuits, the technique can confirm server identity without breaking cryptography. While modern operating systems have introduced countermeasures reducing clock-skew exploitability, the underlying principle of exploiting physical hardware signatures through protocol-level side effects remains an active research direction.
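The measurement underlying this class of attack can be illustrated with a simplified sketch that fits a constant clock skew by least squares. The attack in [6] goes further and observes how the skew changes as load is induced; that step is not modeled here. The function name and the synthetic data are assumptions.

```python
def estimate_skew_ppm(local_times, remote_times):
    """Least-squares slope of the (remote - local) offset against local time, in ppm."""
    offsets = [r - l for l, r in zip(local_times, remote_times)]
    n = len(local_times)
    mean_t = sum(local_times) / n
    mean_o = sum(offsets) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(local_times, offsets))
    var = sum((t - mean_t) ** 2 for t in local_times)
    return cov / var * 1e6


# Synthetic data: a remote clock running 50 ppm fast, sampled once per minute
local = [60.0 * i for i in range(120)]
remote = [t * (1 + 50e-6) + 0.25 for t in local]
print(round(estimate_skew_ppm(local, remote), 1))
```

The constant 0.25 s offset does not affect the slope, which is why the skew rather than the absolute clock difference serves as the signature.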

4.2. Layer 2 (L2): Application-Level Vulnerabilities

Layer 2 vulnerabilities arise not from the Tor protocol itself but from the web application deployed on the hidden service. These are among the most practically exploitable vulnerability classes because they require no network-level access and can often be detected remotely with minimal resources. This layer also presents the highest Traceability Scores in the TEF (Section 5).

4.2.1. External API and Resource Leakage

Perhaps the most operationally significant vulnerability for law enforcement is unintentional leakage of the server’s real IP address through embedded external resource requests. When a dark web site loads resources (images, fonts, analytics scripts, payment APIs, or CDN-hosted libraries) from surface web domains, the hidden service’s web server makes direct HTTP connections to those external hosts from its real IP address, entirely bypassing Tor routing [9,10]. The Tor anonymization layer protects the circuit, but application-layer code that makes out-of-band connections circumvents it entirely. The investigative technique involves operating a controlled surface web server that hosts a monitored resource, then triggering a page load on the target hidden service and logging the connecting IP address.
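The leakage vector described above depends on a page referencing resources outside the Tor network. A minimal sketch of how such references might be enumerated from a page's HTML follows. The tag and attribute lists, the class name, and the sample markup are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalResourceFinder(HTMLParser):
    """Collect resource URLs whose host is not a .onion name."""

    RESOURCE_TAGS = {"img", "script", "link", "iframe"}
    URL_ATTRS = {"src", "href"}

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.RESOURCE_TAGS:
            return
        for name, value in attrs:
            if name in self.URL_ATTRS and value:
                host = urlparse(value).hostname  # None for relative URLs
                if host and not host.endswith(".onion"):
                    self.external.append(value)


def find_external_resources(html):
    finder = ExternalResourceFinder()
    finder.feed(html)
    return finder.external


sample = (
    '<img src="/logo.png">'
    '<script src="https://cdn.example.com/lib.js"></script>'
    '<link rel="stylesheet" href="http://abcd.onion/style.css">'
)
print(find_external_resources(sample))
```

Relative paths and .onion hosts are skipped, so only the surface web reference is reported for the sample markup.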

4.2.2. Misconfigured Server Headers and Error Pages

HTTP response headers frequently disclose server software versions, framework identifiers, hosting provider information, and internal network details. Default error pages (e.g., Apache 403/404, Nginx default pages) contain version-specific content that narrows the server identification space. Banner grabbing combined with dark web-specific fingerprinting (identifying the CMS, framework, and plugin stack) allows investigators to correlate a hidden service with surface web infrastructure operated by the same actor, particularly when operators reuse software configurations across dark web and surface web properties.
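Header-based correlation can be sketched as a comparison of normalized disclosure fields. The list of headers treated as disclosing and the sample values are assumptions for illustration. A matching fingerprint only narrows the candidate set and does not by itself establish common operation.

```python
# Illustrative choice of headers that commonly reveal software details
DISCLOSING_HEADERS = ("server", "x-powered-by", "via", "x-generator")


def header_fingerprint(headers):
    """Return the disclosing headers in a stable, comparable form."""
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    return tuple((h, lowered[h]) for h in DISCLOSING_HEADERS if h in lowered)


# Hypothetical responses from a hidden service and a surface web site
onion_site = {
    "Server": "nginx/1.18.0",
    "X-Powered-By": "PHP/7.4.3",
    "Content-Type": "text/html",
}
clearnet_site = {"server": "nginx/1.18.0", "x-powered-by": "PHP/7.4.3"}

print(header_fingerprint(onion_site) == header_fingerprint(clearnet_site))
```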

4.2.3. JavaScript and WebRTC Leakage

JavaScript execution within the Tor Browser has historically been a source of de-anonymization incidents. From a server-side perspective, JavaScript may invoke browser APIs, including WebRTC, that expose the server’s real network interfaces, cached DNS resolutions, or other identifying characteristics. Investigators may leverage crafted JavaScript payloads delivered through court-authorized Network Investigative Techniques (NITs) to identify specific clients accessing a hidden service. This technique is primarily client-focused but its application in court-authorized server-targeting operations has documented precedent.

4.3. Layer 3 (L3): Side-Channel Vulnerabilities

Side-channel vulnerabilities exploit measurable physical or statistical signals generated as a byproduct of computation and communication, without directly attacking cryptographic primitives. These techniques typically require longer observation periods and specialized expertise.

4.3.1. Website Fingerprinting via Traffic Analysis

Website fingerprinting attacks classify which hidden service a user is visiting by analyzing the pattern of encrypted cell counts, directions, and timing in a Tor circuit, even without decrypting content. The approach was systematically demonstrated by Sun et al. [11] and progressively advanced to achieve over 98% classification accuracy using deep learning [7]. The key insight is that the loading behavior of each web page creates a distinctive traffic fingerprint attributable to its HTML structure and embedded resources. Although these attacks are primarily client-side, they can be adapted into server-side variants that link server behavior patterns to observable infrastructure.
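The input representation used by such classifiers can be sketched as follows: a trace is reduced to a fixed-length sequence of cell directions, from which simple burst features can be derived. The default length of 5000 and the function names are assumptions; the classifier itself is not shown.

```python
def direction_sequence(cells, length=5000):
    """Map a trace to +1 (outgoing) / -1 (incoming) values, zero-padded to `length`."""
    seq = [1 if outgoing else -1 for outgoing in cells][:length]
    return seq + [0] * (length - len(seq))


def burst_lengths(seq):
    """Lengths of consecutive same-direction runs, stopping at the padding."""
    bursts, current, prev = [], 0, None
    for d in seq:
        if d == 0:
            break
        if d == prev:
            current += 1
        else:
            if prev is not None:
                bursts.append(current)
            current, prev = 1, d
    if prev is not None:
        bursts.append(current)
    return bursts


# Hypothetical trace: True = cell sent by the client, False = cell received
trace = [True, False, False, False, True, True]
seq = direction_sequence(trace, length=10)
print(seq)
print(burst_lengths(seq))
```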

4.3.2. Resource Consumption Side-Channel

In shared-hosting environments, which are commonly used by small dark web operators, multiple hidden services share the same physical hardware. CPU, memory, disk I/O, and network bandwidth are shared resources subject to contention. An adversary operating on the same shared hosting infrastructure can measure resource availability perturbations to infer activity patterns of co-hosted services. This class of attack is analogous to cross-VM side-channel attacks documented in cloud computing literature.

4.3.3. Packet Timing and Traffic Shaping Analysis

Inter-packet timing analysis exploits the fact that Tor’s 512-byte cell structure introduces characteristic timing signatures depending on application data patterns. The low-latency constraint of hidden services, which arises because operators prefer minimal backend complexity to reduce circuit latency, means timing patterns may be less obscured by server-side processing delays. By correlating observed circuit timing with expected timing distributions for known server configurations, investigators can distinguish hosting environments and narrow identification hypotheses.

4.4. Layer 4 (L4): Operational Security (OPSEC) Failure Vulnerabilities

OPSEC failures represent the most practically productive class of vulnerabilities from a law enforcement perspective, because they arise from human behavior rather than protocol properties and are therefore not addressable through technical countermeasures alone. Documented criminal prosecutions consistently demonstrate that operator identification was achieved through behavioral and metadata analysis rather than Tor protocol attacks [2].

4.4.1. Metadata Reuse and Infrastructure Cross-Referencing

Common operator failure patterns include reuse of usernames across dark web forums and surface web platforms, registration of .onion infrastructure with email addresses linked to real-world identities, use of the same PGP key for both dark web and surface web communications, and reuse of cryptocurrency wallet addresses in ways that link dark web revenue to KYC-verified exchange accounts [3]. The investigative technique involves systematic OSINT collection and cross-platform correlation, with particular attention to PGP key servers, Bitcoin blockchain analysis, and username enumeration.
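The cross-platform correlation step can be sketched as a normalized set intersection over identifier types. All identifiers below are invented placeholders, and the normalization rule is an assumption; real OSINT workflows would add provenance tracking and manual verification.

```python
def normalize(identifier):
    """Lower-case and drop whitespace and colons so equal values compare equal."""
    return "".join(ch for ch in identifier.lower() if ch not in " :\t\n")


def shared_identifiers(dark_web, surface_web):
    """Identifier types whose normalized values appear in both collections."""
    matches = {}
    for kind, values in dark_web.items():
        ours = {normalize(v) for v in values}
        theirs = {normalize(v) for v in surface_web.get(kind, [])}
        overlap = ours & theirs
        if overlap:
            matches[kind] = sorted(overlap)
    return matches


# Hypothetical collections gathered from the two environments
dark_web = {
    "username": ["Vendor_X", "nightowl"],
    "pgp_fingerprint": ["AB12 CD34 EF56 0000"],
}
surface_web = {
    "username": ["vendor_x", "alice"],
    "pgp_fingerprint": ["ab12cd34ef560000"],
}
print(shared_identifiers(dark_web, surface_web))
```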

4.4.2. Cross-Platform Identity Linkage

Dark web operators frequently maintain presences on both dark web forums and clearnet social media for operational or reputational purposes. Writing style analysis (stylometry), posting time pattern analysis, and technical metadata in uploaded files (EXIF data, document metadata) have contributed to operator identification in documented cases. Soska and Christin [3] demonstrated that marketplace operator behavior patterns are distinctive and consistent over time, suggesting that behavioral fingerprinting represents a viable long-term investigative strategy even absent direct technical de-anonymization.

4.5. Layer 5 (L5): Ecosystem-Level Vulnerabilities

Layer 5 encompasses vulnerabilities arising from the network structure and economic organization of the dark web criminal ecosystem rather than from individual services.

4.5.1. Scale-Free Network Structure and Hub Identification

Measurement studies [3,4,15] have consistently demonstrated that the dark web marketplace ecosystem exhibits scale-free network properties: a small number of highly connected hubs account for the majority of transactions and link structure, while the vast majority of hidden services operate at low traffic volumes. This structural property has significant investigative implications: the takedown of a single major hub disproportionately disrupts the ecosystem, and the hub’s infrastructure provides investigative entry points radiating outward to affiliated services. Graph-based analysis of .onion link relationships can identify hub nodes without requiring direct de-anonymization of any individual service.
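The hub-identification idea can be illustrated with a minimal sketch that ranks services by in-degree in a crawled link graph. The edge list, the service labels, and the function name `rank_hubs` are hypothetical and chosen only for illustration; in-degree is one simple proxy for hub status, not the specific metric used in the cited measurement studies.

```python
from collections import Counter


def rank_hubs(edges, top_k=3):
    """Rank nodes by in-degree (number of distinct services linking to them)."""
    # Deduplicate so repeated links between the same pair count once,
    # and ignore self-links.
    unique_edges = {(src, dst) for src, dst in edges if src != dst}
    in_degree = Counter(dst for _, dst in unique_edges)
    # Sort by descending in-degree; break ties alphabetically for stable output.
    return sorted(in_degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical crawl output: (linking service, linked service) pairs.
edges = [
    ("forum-a", "market-x"),
    ("forum-b", "market-x"),
    ("wiki-c", "market-x"),
    ("shop-d", "market-x"),
    ("forum-a", "market-y"),
    ("wiki-c", "market-y"),
    ("forum-a", "shop-d"),
    ("forum-a", "market-x"),  # duplicate link, counted once
]

hubs = rank_hubs(edges)
for node, degree in hubs:
    print(node, degree)
```

In this toy graph the highest-ranked node is the one most other services link to, which is the kind of candidate hub the paragraph above describes; no individual service needs to be de-anonymized to produce the ranking.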

4.5.2. Shared Hosting Infrastructure Analysis

Commercial dark web hosting providers operate multiple client hidden services on shared infrastructure. Pastor-Galindo and colleagues [12,13,16] demonstrated that systematic crawling and analysis of onion address spaces can identify clusters of hidden services with similar technical fingerprints (server software, timing patterns), suggesting a shared hosting origin. When one service on a shared host is identified through other techniques, investigators can pivot across the shared infrastructure to identify co-hosted criminal services.
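As a rough illustration of fingerprint-based clustering, the sketch below groups services whose observed fingerprints are identical. The fingerprint fields (a server banner and a rounded response time), the service labels, and the function name `cluster_by_fingerprint` are assumptions made for this example; the cited studies use richer features, and an exact-match grouping like this yields candidate clusters only, not proof of co-hosting.

```python
from collections import defaultdict


def cluster_by_fingerprint(observations, min_size=2):
    """Group services that share an identical fingerprint tuple."""
    clusters = defaultdict(list)
    for service, fingerprint in observations.items():
        clusters[fingerprint].append(service)
    # Keep only fingerprints shared by at least `min_size` services.
    return {
        fp: sorted(services)
        for fp, services in clusters.items()
        if len(services) >= min_size
    }


# Hypothetical observations:
# service label -> (server banner, rounded median response time in ms).
observations = {
    "service-1": ("nginx/1.18.0", 420),
    "service-2": ("nginx/1.18.0", 420),
    "service-3": ("Apache/2.4.41", 510),
    "service-4": ("nginx/1.18.0", 420),
    "service-5": ("Apache/2.4.41", 380),
}

shared = cluster_by_fingerprint(observations)
for fingerprint, services in shared.items():
    print(fingerprint, services)
```

Exact matching is deliberately conservative here; a practical analysis would likely need tolerance bands for timing features and additional evidence before treating a cluster as a shared host.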

5. Traceability Evaluation Framework (TEF)

The Traceability Evaluation Framework translates the technical taxonomy into an actionable investigative prioritization tool. Table 6 presents the full scoring matrix for all identified vulnerability types across the three evaluation dimensions.

5.1. High-Traceability Vulnerabilities (●●●): Investigative Priority

External API leakage, OPSEC failures (metadata), misconfigured server headers, and cross-platform identity linkage receive the highest traceability scores (TS ≥ 2.80), all well above the ●●● threshold of TS ≥ 2.25 (Table 6). These four techniques share a common forensic profile: each operates through passive observation or post-hoc analysis rather than active exploitation, each produces concrete evidence artifacts (IP addresses, server fingerprints, identity correlations) directly attributable to identifiable infrastructure or persons, and each requires only low-to-medium technical capability that established cybercrime units can deploy. Evidence obtained through API leakage (a server’s IP address recorded in a law enforcement-controlled server’s access log) is particularly defensible in court because it involves passive observation of a connection voluntarily initiated by the defendant’s server, analogous to a caller ID record in telephony law. The convergence of broad applicability, modest technical demand, and strong evidentiary value places these vulnerabilities at the operational frontier of dark web cybercrime investigation.

5.2. Medium-Traceability Vulnerabilities (●●): Conditional Utility

JavaScript/WebRTC leakage, website fingerprinting, scale-free hub mapping, and traffic correlation receive medium traceability scores (1.55 ≤ TS < 2.25). These techniques share a common limitation profile: each is either narrowly conditional in its real-world applicability, methodologically contested in evidentiary terms, or both. JavaScript/WebRTC leakage, although technically straightforward, depends on the target user enabling client-side scripting in the Tor Browser, which security-conscious offenders may deliberately disable, narrowing applicability. Website fingerprinting evidence (asserting that an encrypted traffic pattern matched a known fingerprint) may face legal challenges regarding reliability and methodology, particularly in jurisdictions applying Daubert-style evidentiary scrutiny. Scale-free hub mapping and other network-topology techniques produce probabilistic leads rather than direct evidence of individual offender identity. Accordingly, these techniques are best employed as investigative leads to support other evidence collection, or as corroborative material reinforcing stronger primary evidence, rather than as standalone prosecutorial evidence.

5.3. Low-Traceability Vulnerabilities (●): Research-Stage Techniques

Circuit fingerprinting, clock-skew timing attacks, and resource consumption side-channel analysis receive the lowest traceability scores (TS < 1.55) due to their combination of high technical difficulty and limited evidentiary value. Circuit fingerprinting and clock-skew timing, while theoretically powerful, require adversarial positioning within the Tor relay network through malicious relay operation or ISP-level cooperation, raising significant legal authorization issues in most jurisdictions; the controlled conditions required for high-accuracy results are rarely achievable in real investigations. Resource consumption side-channel evidence presents a distinct admissibility burden: it relies on inferential reasoning about a target server’s response to externally induced load patterns rather than direct observation of an identifying artifact, making chain-of-evidence and methodological reliability difficult to establish in criminal proceedings. Across all three techniques, the evidentiary product is more consistent with intelligence gathering and lead generation than with the production of legally admissible evidence for criminal prosecution.

6. Discussion

6.1. Multi-Layer Attack Scenarios and Investigative Sequencing

The most effective dark web investigations documented in the literature employed sequential multi-layer approaches rather than single-technique exploitation. A representative investigative sequence might proceed as follows: an initial L5 ecosystem scan identifies a high-traffic hub and its probable shared hosting cluster; L2 header analysis and API monitoring establish that the target uses external resources and leaks software version information; L4 OSINT operations identify reused usernames or cryptocurrency wallets linking the operator to a clearnet identity; and L2 API leakage subsequently captures the server’s real IP address, enabling a conventional warrant for physical server access. Each layer contributes both direct evidence and hypotheses directing the next investigative step, creating a structured workflow analogous to the cyber kill chain concept in incident response.

This sequencing logic also informs the design of automated investigative tools. Rather than deploying all techniques simultaneously, a script-based tool can implement a triage phase (rapid L5 scanning and L2 header collection), followed by a targeted phase (L2 API monitoring and L4 OSINT) for identified high-value targets, and finally an evidence consolidation phase for legal proceedings. This architecture aligns with practical law enforcement constraints balancing technical effectiveness with legal authorization requirements.

6.2. Legal and Jurisdictional Considerations

The legal admissibility dimension of the TEF reflects a critical constraint on dark web investigative techniques. Different jurisdictions impose substantially different requirements on electronic surveillance authorization and evidentiary admissibility, and techniques permissible under one legal framework may be inadmissible or prohibited under another. Three jurisdictional regimes illustrate this divergence and condition how TEF scores translate into operational practice.

Under United States law, technique-derived evidence must satisfy Federal Rule of Evidence 702 and the Daubert standard, requiring that the underlying method be testable, peer-reviewed, demonstrably reliable, and generally accepted within the relevant scientific community. Network-level statistical attacks (L1 traffic correlation, circuit fingerprinting) face particular Daubert scrutiny because their error rates under realistic adversary conditions remain methodologically contested. Passive L2 API observation, by contrast, more readily satisfies Daubert criteria because it produces directly observable artifacts (logged IP addresses) whose reliability is widely accepted. The Cybersecurity and Infrastructure Security Agency framework and Fourth Amendment jurisprudence further differentiate passive observation from active intrusion, with the latter typically requiring Title III warrants.

Under European Union law, the General Data Protection Regulation (GDPR) and the Law Enforcement Directive (Directive 2016/680) impose data minimization and proportionality requirements on law enforcement processing. OSINT-based L4 correlation evidence, while technically straightforward, must satisfy the proportionality test when cross-platform identifiers link Tor activity to clearnet identities. National implementations vary; for example, Germany’s strict telecommunications surveillance framework imposes higher thresholds than some other member states, further qualifying the cross-jurisdictional portability of TEF priority rankings.

Under the People’s Republic of China legal framework, electronic data forensic regulations (most prominently the Provisions on Several Issues Concerning the Application of Law in Handling Criminal Cases Using Electronic Data as Evidence) emphasize chain-of-custody documentation and forensic procedure compliance. Investigative procedures and evidentiary thresholds for cross-border cybercrime cooperation may differ materially from those in U.S. and EU frameworks, requiring jurisdiction-specific recalibration of the Legal Admissibility weight when the framework is applied outside the originating context.

Across these regimes, the distinction between passive observation and active exploitation remains a near-universal organizing principle. When a law enforcement agency operates a server that passively records access logs including IP addresses, the legal framework in most jurisdictions treats this as analogous to caller ID, as no interception of communications is involved. Active exploitation of vulnerabilities (e.g., deploying NIT-style payloads) typically requires substantially higher judicial authorization across all three regimes examined. The TEF’s Legal Admissibility scores should accordingly be interpreted in light of applicable national law, and consultation with legal counsel specializing in cybercrime investigation is recommended before operationalizing any technique. Future cross-jurisdictional empirical work could examine how each TEF dimension’s weight should be recalibrated under specific evidentiary regimes.

6.3. Limitations and Future Research Directions

This taxonomy is based on a structured review of published literature and does not reflect proprietary law enforcement techniques. Several limitations warrant acknowledgment. First, the dark web landscape evolves rapidly; new hidden service platforms, protocol modifications (e.g., Tor’s vanguards-lite countermeasure against guard discovery), and novel criminal operational practices may create vulnerability categories not captured in the current taxonomy. Second, while the AHP-based weight elicitation reported in Section 3.4 provides expert-panel grounding for the TEF dimension weights, the framework itself has not been empirically validated against a corpus of documented criminal investigation outcomes; the AHP panel reflects collective judgment from five qualified domain experts but does not substitute for case-based outcome validation, which remains an open research need. Third, the expert panel was constituted within a single national context (Korean law enforcement and academic networks), and the resulting weight vector may not generalize directly to jurisdictions with substantially different evidentiary regimes or investigative cultures; comparative panels constituted across jurisdictions would be required to test the framework’s transportability. Fourth, the panel size (n = 5) is modest by quantitative survey standards, although consistent with AHP practice for highly specialized domains where expert availability is constrained. Fifth, the L5 ecosystem-level category remains the least developed, reflecting a gap in the existing literature regarding forensic operationalization of network topology analysis.

Future research directions include: (i) empirical case-validation of TEF scores against documented dark web investigation outcomes, ideally drawing on declassified or court-documented cases across multiple jurisdictions; (ii) cross-jurisdictional AHP elicitation with expert panels drawn from U.S., EU, and other legal regimes, to test whether the weight ordering (Legal Admissibility > Applicability > Technical Difficulty) generalizes or is specific to the present panel composition; (iii) development of automated tooling implementing high-traceability techniques with chain-of-evidence guarantees suitable for criminal proceedings; (iv) longitudinal tracking of vulnerability prevalence as Tor protocol versions and dark web hosting practices evolve; and (v) integration of the TEF with emerging AI-assisted dark web analysis methods, including the legal and evidentiary considerations that such automation introduces.

6.4. Methodological Reflections from Expert Panel Elicitation

The AHP-based weight elicitation procedure reported in Section 3.4 produced two findings that merit explicit discussion beyond their technical role in deriving the weight vector.

First, the panel-wide dispersion of individual weights was substantial: the three panelists whose evidentiary role is evidence collection (R1, R2, R4; Table 2) emphasized Legal Admissibility (mean individual weight 0.69), whereas the legal scholar (R3) and the legal practitioner (R5) emphasized Applicability (mean individual weight 0.63). One panelist (R4) explained that “legal admissibility must exist before applicability can be discussed,” reflecting the prosecutorial perspective in which evidence value gates technique selection. Another (R3), writing from a digital evidence law perspective, emphasized that “applicability takes precedence for the resolution of cases,” reflecting the case-resolution perspective in which feasibility precedes legal optimization. This dispersion is not a defect in the elicitation procedure; rather, it is substantive evidence that traceability evaluation is intrinsically dependent on the evaluator’s institutional position and operational context. The geometric-mean aggregation procedure produces a weight vector that balances these legitimate perspectives, but practitioners applying the TEF in specific operational contexts may reasonably recalibrate weights to align with their institutional mandate.

Second, the convergence of the AHP-derived weight ordering (Legal Admissibility ≈ Applicability > Technical Difficulty) with the structure originally hypothesized by the authors provides a form of triangulated validation. The original weights were derived from the authors’ combined investigative and research experience; their independent reproduction through a structured expert-elicitation procedure conducted with non-author panelists strengthens confidence that the weight ordering reflects genuine domain consensus rather than the authors’ idiosyncratic priors. The sensitivity analysis (Section 3.4.5) further confirms that the framework’s principal substantive finding, the dominance of L2 and L4 vulnerabilities in the high-traceability category, is invariant across all reasonable weight variations examined.

These observations argue for treating expert-panel elicitation not merely as a defensive methodological supplement to a weighted-composite framework, but as a substantive analytical tool that surfaces structurally divergent expert perspectives and conditions framework applicability across institutional contexts.

7. Conclusions

This paper has proposed a five-layer vulnerability taxonomy for Tor-based hidden services and an associated Traceability Evaluation Framework (TEF) intended to support cybercrime investigation by structuring the heterogeneous body of de-anonymization research into a forensically oriented classification. The framework’s dimension weights were derived through Analytic Hierarchy Process elicitation from a five-member expert panel. The resulting weight ordering, in which Legal Admissibility (0.412) and Applicability (0.385) carry comparable importance and Technical Difficulty (0.204) is secondary, was found robust to weight perturbations of ±0.10 in sensitivity analysis.

Under this framework, the high-traceability category (●●●) is occupied consistently by application-layer (L2) and operational-security-failure (L4) vulnerabilities, while two network-level (L1) attacks, circuit fingerprinting and clock-skew timing, together with one side-channel (L3) technique, consistently fall to the low-traceability category. This pattern is observed across all sensitivity scenarios examined, suggesting that the relative prioritization of L2/L4 over L1 reflects substantive structural differences in evidentiary value rather than artifacts of any particular weight choice. The pattern also suggests, with appropriate caution, that law enforcement agencies operating under resource constraints may find greater investigative leverage in API monitoring, server header analysis, and systematic open-source correlation than in computationally intensive traffic-analysis or network-level correlation attacks, particularly in jurisdictions where evidentiary admissibility imposes constraints comparable to Daubert-style scrutiny.

Several limitations qualify these conclusions. The framework has not been empirically validated against documented criminal investigation outcomes; the AHP panel, while qualified, reflects a single national context and a modest sample size; and the dark web operational landscape evolves rapidly enough that any taxonomy requires periodic re-grounding. The TEF should therefore be regarded as a structured starting point for investigative prioritization and as a candidate for further empirical refinement, rather than as a definitive operational protocol.

The contribution of this work lies in unifying previously fragmented de-anonymization research into a single forensically grounded structure, in proposing a transparent weight-derivation procedure that surfaces rather than conceals expert disagreement, and in articulating the legal-admissibility constraint as a first-class evaluative dimension alongside technical applicability. We hope this framework provides a useful foundation for case-validation research, cross-jurisdictional comparative work, and the integration of emerging AI-assisted analysis methods, all of which we expect will refine and partially supersede the present formulation as the field matures.

Author Contributions

Conceptualization, J.S.; methodology, J.S.; formal analysis, J.S.; investigation, I.S.; writing—original draft preparation, I.S.; writing—review and editing, J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (RS-2026-25468561).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

Tor	The Onion Router
OPSEC	Operational Security
TEF	Traceability Evaluation Framework
NIT	Network Investigative Technique
OSINT	Open-Source Intelligence
CDN	Content Delivery Network
CSAM	Child Sexual Abuse Material
OP	Onion Proxy
OR	Onion Router
DHT	Distributed Hash Table
HSDir	Hidden Service Directory
PaaS	Platform as a Service
DL	Deep Learning

Appendix A

Table A1 maps the primary literature sources reviewed to the corresponding taxonomic layers addressed in each work, illustrating the research coverage distribution across layers and identifying underexplored areas. ✓ indicates the layer is a primary focus of the reference; blank indicates the topic is not addressed. L1 = Network-level; L2 = Application-level; L3 = Side-channel; L4 = OPSEC failure; L5 = Ecosystem-level.

Table A1. Literature-to-taxonomy layer mapping for primary sources reviewed.

Reference	Year	L1 Net	L2 App	L3 Side	L4 OPSEC	L5 Eco	Focus Area
Dingledine et al. [1]	2004	✓					Foundational Tor design
Murdoch [6]	2006			✓			Clock-skew timing attack
Biryukov et al. [5]	2013	✓			✓		Trawling & deanonymization
Christin [4]	2013					✓	Silk Road measurement
Biryukov et al. [15]	2014					✓	Content & popularity analysis
Kwon et al. [8]	2015	✓					Circuit fingerprinting
Soska & Christin [3]	2015					✓	Marketplace evolution
Sirinam et al. [7]	2018		✓				Deep fingerprinting (DL)
Pastor-Galindo et al. [16]	2023				✓		Onion address gathering
Pastor-Galindo et al. [12]	2024				✓		Big data architecture
Ruiz Ródenas et al. [13]	2024				✓		Modular dark web framework
Jin et al. [10]	2024				✓	✓	Forensic investigation
Tippe & Tippe [9]	2024	✓	✓				Deanon attacks survey

References

1. Dingledine, R.; Mathewson, N.; Syverson, P. Tor: The second-generation onion router. In Proceedings of the 13th USENIX Security Symposium, San Diego, CA, USA, 9–13 August 2004; pp. 303–320.
2. Moore, D.; Rid, T. Cryptopolitik and the Darknet. Survival 2016, 58, 7–38.
3. Soska, K.; Christin, N. Measuring the longitudinal evolution of the online anonymous marketplace ecosystem. In Proceedings of the 24th USENIX Security Symposium, Washington, DC, USA, 12–14 August 2015; pp. 33–48.
4. Christin, N. Traveling the Silk Road: A measurement analysis of a large anonymous online marketplace. In Proceedings of the 22nd International WWW Conference, Rio de Janeiro, Brazil, 13–17 May 2013; pp. 213–224.
5. Biryukov, A.; Pustogarov, I.; Weinmann, R.-P. Trawling for Tor hidden services: Detection, measurement, deanonymization. In Proceedings of the IEEE Symposium on Security and Privacy, Berkeley, CA, USA, 19–22 May 2013; pp. 80–94.
6. Murdoch, S.J. Hot or not: Revealing hidden services by their clock skew. In Proceedings of the 13th ACM CCS, Alexandria, VA, USA, 30 October–3 November 2006; pp. 27–36.
7. Sirinam, P.; Imani, M.; Juarez, M.; Wright, M. Deep fingerprinting: Undermining website fingerprinting defenses with deep learning. In Proceedings of the ACM SIGSAC CCS, Toronto, ON, Canada, 15–19 October 2018; pp. 1928–1943.
8. Kwon, A.; AlSabah, M.; Lazar, D.; Dacier, M.; Devadas, S. Circuit fingerprinting attacks: Passive deanonymization of Tor hidden services. In Proceedings of the 24th USENIX Security Symposium, Washington, DC, USA, 12–14 August 2015; pp. 287–302.
9. Tippe, P.; Tippe, A. Onion services in the wild: A study of deanonymization attacks. Proc. Priv. Enhancing Technol. 2024, 2024, 291–310.
10. Jin, P.; Kim, N.; Lee, S.; Jeong, D. Forensic investigation of the dark web on the Tor network: Pathway toward the surface web. Int. J. Inf. Secur. 2024, 23, 331–346.
11. Sun, Q.; Simon, D.R.; Wang, Y.-M.; Russell, W.; Padmanabhan, V.N.; Qiu, L. Statistical identification of encrypted web browsing traffic. In Proceedings of the IEEE Symposium on Security and Privacy, 12–15 May 2002; pp. 19–30.
12. Pastor-Galindo, J.; Sandlin, H.-Â.; Gómez Mármol, F.; Bovet, G.; Martínez Pérez, G. A big data architecture for early identification and categorization of dark web sites. Future Gener. Comput. Syst. 2024, 157, 67–81.
13. Ruiz Ródenas, J.M.; Pastor-Galindo, J.; Gómez Mármol, F. A general and modular framework for dark web analysis. Clust. Comput. 2024, 27, 4687–4703.
14. Saaty, T.L. The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation; McGraw-Hill: New York, NY, USA, 1980.
15. Biryukov, A.; Pustogarov, I.; Thill, F.; Weinmann, R.-P. Content and popularity analysis of Tor hidden services. In Proceedings of the 34th IEEE ICDCSW, Madrid, Spain, 30 June–3 July 2014; pp. 188–193.
16. Pastor-Galindo, J.; Gómez Mármol, F.; Martínez Pérez, G. On the gathering of Tor onion addresses. Future Gener. Comput. Syst. 2023, 145, 12–26.

Table 1. Structural comparison of surface web and Tor hidden service environments.

Attribute	Surface Web	Dark Web (Tor Hidden Service)
DNS Resolution	Standard DNS (ICANN-managed, globally routable)	Distributed hash table; .onion addresses resolved within Tor
IP Visibility	Server IP exposed to clients and intermediaries	Origin server IP concealed via multi-hop circuit; rendezvous-point contact only
Certificate Authority	Public CA validates domain ownership; PKI-bound identity	Self-signed or CA-issued cert detached from physical identity; no PKI binding
Latency Profile	Typically <100 ms for regional connections	200–600 ms overhead per circuit due to three-relay onion routing
Backend Complexity	Full-stack deployment (CDN, load balancer, microservices) common	Minimal stack preferred; heavy dependencies create API-leakage risks
Search Indexing	Crawlable by public search engines; easily discoverable	Not indexed; address discovery relies on dark web directories only
Takedown Mechanism	Domain seizure, IP blocking, hosting provider cooperation	Address cannot be seized without server identification; host often unknown
Forensic Entry Point	Abundant: WHOIS, CDN logs, TLS certificate transparency, IP geolocation	Limited: must exploit application-layer or side-channel vulnerabilities

Table 2. AHP panel composition and qualifications (anonymized).

ID	Current Affiliation	Law Enforcement Service	Years of Experience	Primary Domain	Evidentiary Role
R1	Law enforcement (active)	Currently serving	15+	Cybercrime investigation; digital forensics; dark web analysis	Investigator (evidence collection)
R2	Academia (university faculty)	Previously served	15+	Cybercrime investigation; digital forensics	Investigator (evidence collection)
R3	Academia (university faculty)	None	15+	Criminal procedure law; digital evidence law; forensic research	Legal expert (admissibility analysis)
R4	Law enforcement (active)	Currently serving	15+	Cybercrime investigation (practice and policy)	Investigator (evidence collection)
R5	Legal practice (law firm)	Previously served	8–15	Cybercrime investigation; digital forensics; cyber law; defense counsel	Investigator and legal counsel

Table 3. Individual AHP-derived weights and consistency ratios for each panelist, with group geometric-mean aggregation (bottom row). Tech (inv.) is the inverted Technical Difficulty dimension.

ID	Legal	Applic.	Tech (inv.)	CR	Consistency	Primary Emphasis
R1	0.701	0.202	0.097	0.117	Acceptable	Strong Legal Admissibility
R2	0.540	0.297	0.163	0.008	Excellent	Moderate Legal Admissibility
R3	0.196	0.493	0.311	0.046	Excellent	Applicability emphasis
R4	0.818	0.091	0.091	0.000	Excellent	Dominant Legal Admissibility
R5	0.048	0.762	0.190	0.431	Sensitivity only	Dominant Applicability
Group	0.412	0.385	0.204	—	Geometric mean	Balanced Legal–Applicability
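The CR column in Table 3 can be read against the standard AHP consistency check. The following is a minimal sketch, assuming the principal-eigenvector method with Saaty's random index (RI = 0.58 for three criteria); the panelists' actual pairwise judgments are not reproduced in this article, so the matrix `r4_like` is a hypothetical, perfectly consistent matrix chosen only because it yields weights matching R4's row (0.818, 0.091, 0.091) with CR = 0. The function name `ahp_weights_and_cr` is likewise illustrative.

```python
def ahp_weights_and_cr(matrix, iterations=100):
    """Return (weights, consistency ratio) for a reciprocal pairwise matrix."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then renormalize to sum to 1.
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}[n]  # Saaty's tabulated RI values
    return w, consistency_index / random_index


# Hypothetical judgments (order: Legal, Applicability, Tech inverted):
# Legal is rated 9x as important as each other criterion, which are rated equal.
r4_like = [
    [1.0, 9.0, 9.0],
    [1.0 / 9.0, 1.0, 1.0],
    [1.0 / 9.0, 1.0, 1.0],
]

weights, cr = ahp_weights_and_cr(r4_like)
print([round(x, 3) for x in weights], round(cr, 3))
```

A matrix whose judgments are not perfectly transitive produces a principal eigenvalue above n and therefore a positive CR, which is what the nonzero entries in Table 3 indicate.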

Table 4. Sensitivity analysis: group weights and top-tier vulnerability counts under three panelist-subset scenarios.

Scenario	n	Legal	Applic.	Tech (inv.)	Top-Tier (●●●) Vulnerabilities
Primary analysis (all panelists)	5	0.412	0.385	0.204	4 (L2/L4 dominant)
Sensitivity A (CR ≤ 0.2 subset)	4	0.571	0.262	0.167	4 (L2/L4 dominant)
Sensitivity B (CR ≤ 0.1 subset)	3	0.523	0.280	0.197	4 (L2/L4 dominant)
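The group rows in Tables 3 and 4 can be approximately reproduced from the individual weights in Table 3. The sketch below assumes that aggregation takes the geometric mean of each dimension across the included panelists and then renormalizes the result to sum to 1; this is an inference from the reported numbers, and the function name `aggregate` and the `max_cr` parameter are illustrative. Under that assumption the two subset rows match to three decimals and the full-panel row matches to within about 0.001, a residual that may simply reflect rounding of the published individual weights.

```python
from math import prod

# Individual weights from Table 3: (Legal, Applicability, Tech inverted), CR.
PANEL = {
    "R1": ((0.701, 0.202, 0.097), 0.117),
    "R2": ((0.540, 0.297, 0.163), 0.008),
    "R3": ((0.196, 0.493, 0.311), 0.046),
    "R4": ((0.818, 0.091, 0.091), 0.000),
    "R5": ((0.048, 0.762, 0.190), 0.431),
}


def aggregate(panel, max_cr=None):
    """Geometric-mean aggregation of weight vectors, renormalized to sum to 1.

    Panelists whose CR exceeds `max_cr` are excluded when a cutoff is given.
    Returns (number of panelists included, [Legal, Applicability, Tech inverted]).
    """
    rows = [w for w, cr in panel.values() if max_cr is None or cr <= max_cr]
    n = len(rows)
    geo = [prod(column) ** (1.0 / n) for column in zip(*rows)]
    total = sum(geo)
    return n, [g / total for g in geo]


primary = aggregate(PANEL)                # all five panelists
subset_a = aggregate(PANEL, max_cr=0.2)   # Sensitivity A: drops R5
subset_b = aggregate(PANEL, max_cr=0.1)   # Sensitivity B: drops R1 and R5

for label, (n, w) in [("Primary", primary), ("A", subset_a), ("B", subset_b)]:
    print(label, n, [round(x, 3) for x in w])
```

Because the geometric mean is dominated less by extreme values than the arithmetic mean, excluding the single high-CR panelist (R5, whose Legal weight is 0.048) shifts the Legal weight upward noticeably, as the Sensitivity A and B rows show.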

Table 5. Five-layer vulnerability taxonomy for Tor-based hidden services.

Layer	Category	Key Vulnerabilities	Attack Surface	Investigation Method
L1	Network-level	Traffic correlation, timing analysis, circuit fingerprinting	Tor relay/entry-exit node	Network traffic capture, relay cooperation
L2	Application-level	External API leakage, misconfigured headers, JS/WebRTC exposure	Web server, CMS layer	HTTP header analysis, API monitoring
L3	Side-channel	CPU/memory resource patterns, packet timing, website fingerprinting	Client–server interaction	Statistical traffic analysis, fingerprinting
L4	OPSEC Failure	Metadata reuse, cross-platform identity linkage, credential reuse	Operator behavior	OSINT, blockchain analysis, stylometry
L5	Ecosystem-level	Scale-free hub structure, shared hosting infrastructure	Dark web network topology	Graph analysis, crawler-based mapping

Table 6. Traceability Evaluation Framework (TEF): scoring matrix for Tor hidden service vulnerability types. Column abbreviations: App = Applicability; TD = Technical Difficulty; LA = Legal Admissibility; TS = Traceability Score. Cell value abbreviations: H = High, M = Medium, L = Low. App and LA denote the breadth of real-world exploitability and the likelihood of evidence meeting court standards, respectively. TD is evaluated on an inverted scale, where L (Low) difficulty scores favorably for traceability. TS is the numerical composite computed per Equation (1) using AHP-derived weights (w_A, w_T, w_L) = (0.385, 0.204, 0.412) from Section 3.4. Rating: ordinal classification of TS, with ●●● High (TS ≥ 2.25), ●● Medium (1.55 ≤ TS < 2.25), ● Low (TS < 1.55). Rows ordered by descending TS within rating tier.

Vulnerability	Layer	App	TD	LA	TS	Rating	Investigative Priority
External API leakage	L2	H	L	H	3.00	●●●	Immediate
OPSEC failure (metadata)	L4	H	L	H	3.00	●●●	Immediate
Misconfigured server headers	L2	H	L	H	3.00	●●●	Immediate
Cross-platform identity linkage	L4	H	M	H	2.80	●●●	High
JavaScript/WebRTC leakage	L2	M	L	M	2.20	●●	Medium
Website fingerprinting	L3	M	M	M	2.00	●●	Medium
Scale-free hub mapping	L5	M	M	M	2.00	●●	Medium
Traffic correlation	L1	M	H	M	1.80	●●	Medium
Circuit fingerprinting	L1	L	H	M	1.41	●	Low
Clock-skew timing	L1	L	H	M	1.41	●	Low
Resource consumption side-channel	L3	M	H	L	1.38	●	Low
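The TS column can be approximately recomputed from the H/M/L ratings. Equation (1) is not reproduced in this part of the article, so the sketch below assumes it is a weighted sum with H/M/L mapped to 3/2/1 and Technical Difficulty inverted (low difficulty scores 3). That mapping and the names `traceability_score` and `rating` are assumptions for illustration. With the three-decimal weights from the caption, the computed scores agree with the tabulated TS to within 0.01, and the rating tiers agree exactly.

```python
LEVEL = {"H": 3, "M": 2, "L": 1}
WEIGHTS = {"app": 0.385, "td": 0.204, "la": 0.412}  # (w_A, w_T, w_L) from the caption

# (vulnerability, App, TD, LA, tabulated TS) taken from Table 6.
ROWS = [
    ("External API leakage", "H", "L", "H", 3.00),
    ("OPSEC failure (metadata)", "H", "L", "H", 3.00),
    ("Misconfigured server headers", "H", "L", "H", 3.00),
    ("Cross-platform identity linkage", "H", "M", "H", 2.80),
    ("JavaScript/WebRTC leakage", "M", "L", "M", 2.20),
    ("Website fingerprinting", "M", "M", "M", 2.00),
    ("Scale-free hub mapping", "M", "M", "M", 2.00),
    ("Traffic correlation", "M", "H", "M", 1.80),
    ("Circuit fingerprinting", "L", "H", "M", 1.41),
    ("Clock-skew timing", "L", "H", "M", 1.41),
    ("Resource consumption side-channel", "M", "H", "L", 1.38),
]


def traceability_score(app, td, la, weights=WEIGHTS):
    """Weighted sum of ordinal ratings (assumed form of Equation (1))."""
    td_inverted = 4 - LEVEL[td]  # inverted scale: L -> 3, M -> 2, H -> 1
    return (weights["app"] * LEVEL[app]
            + weights["td"] * td_inverted
            + weights["la"] * LEVEL[la])


def rating(ts):
    """Map a score to the tiers defined in the Table 6 caption."""
    if ts >= 2.25:
        return "●●●"
    if ts >= 1.55:
        return "●●"
    return "●"


for name, app, td, la, reported in ROWS:
    ts = traceability_score(app, td, la)
    print(f"{name}: TS={ts:.3f} (reported {reported:.2f}) {rating(ts)}")
```

Passing an alternative weight dictionary, for example the Sensitivity A or B weights from Table 4, shows how the tier assignments respond to a different weighting; under both of those weight sets the same four vulnerabilities remain in the top tier.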

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
