Article

Unveiling Dark Web Identity Patterns: A Network-Based Analysis of Identification Types and Communication Channels in Illicit Activities

by Luis de-Marcos *, Adrián Domínguez-Díaz, Javier Junquera-Sánchez, Carlos Cilleruelo and José-Javier Martínez-Herráiz
Department of Computer Science, Universidad de Alcalá, Alcalá de Henares, 28801 Madrid, Spain
* Author to whom correspondence should be addressed.
Information 2025, 16(11), 924; https://doi.org/10.3390/info16110924
Submission received: 17 September 2025 / Revised: 15 October 2025 / Accepted: 17 October 2025 / Published: 22 October 2025
(This article belongs to the Section Information Security and Privacy)

Abstract

The Dark Web, a hidden segment of the internet, has become a hub for illicit activities, facilitated by various forms of digital identification (IDs) such as email addresses, Telegram accounts, and cryptocurrency wallets. This study conducts a comprehensive analysis of the Dark Web’s identification and communication patterns, focusing on the roles of different ID types and their associated activities. Using a dataset of Dark Web documents, we construct and analyze a bipartite network to model the relationships between IDs and web documents, employing graph–theoretical metrics such as degree centrality, closeness centrality, betweenness centrality, and k-core decomposition, while analyzing subnetworks formed by ID type. Our findings reveal that Telegram forms the backbone of the network, serving as the primary communication tool for hacking-related activities, particularly within Russian-speaking communities. In contrast, email plays a more decentralized role, facilitating finance–crypto and other activities but with a high level of fragmentation and English as the predominant language. XMR (Monero) wallets emerge as a key component in financial transactions, forming a cohesive subnetwork focused on cryptocurrency-related activities. The analysis also highlights the modular and hierarchical nature of the Dark Web, with distinct clusters for hacking, finance–crypto, and drugs–narcotics, often operating independently but with some cross-topic interactions. This study provides a foundation for understanding the Dark Web’s structure and dynamics, offering insights that can inform strategies for monitoring and mitigating its risks.

1. Introduction

The Dark Web, a hidden portion of the internet accessible only through specialized software like Tor, has become a focal point for illicit activities, including hacking, drug trafficking, financial fraud, and the exchange of illegal goods and services [1,2]. Its anonymity and decentralized nature make it an attractive platform for cybercriminals, raising significant challenges for law enforcement and cybersecurity researchers [3]. However, it is worth noting that the Dark Web is also used by activists and individuals seeking to promote ideas and causes, such as championing human rights in regimes with strict policing, by concealing their identities from authorities [4,5]. Understanding the structure and dynamics of the Dark Web is crucial for developing effective strategies to monitor and mitigate its risks. However, the complexity and scale of the Dark Web necessitate advanced gathering, navigating, and analytical approaches to uncover its underlying patterns and mechanisms [6,7,8].
One of the key challenges in studying the Dark Web is the identification and analysis of the communication channels and financial tools used by its actors. These include a wide variety of aliases and identities that facilitate interactions and transactions [9]. Previous research has highlighted the importance of these forms of authorship in enabling illicit activities [10], but there is limited understanding of how they are organized and interconnected within the broader network. For example, studies have shown that Telegram is widely used for illegal sales of drugs [11] and scams [12], while cryptocurrency wallets like Monero (XMR) and Bitcoin (BTC) are prevalent in financial transactions [13,14,15]. However, a comprehensive analysis of how these IDs interact across different topics and subnetworks remains underexplored.
Network analysis has emerged as a powerful tool for studying the Dark Web, enabling researchers to map its structure, identify key actors, and uncover patterns of interaction [16]. Existing studies focus on the structural properties of the network of direct connections between domains or actors [17,18,19]. By modeling the Dark Web as a bipartite graph, where nodes represent IDs and web documents, and edges represent connections between them, researchers can gain insights into the organization and dynamics of illicit activities. Metrics such as degree centrality, betweenness centrality, and k-core decomposition have been used to identify influential nodes and central areas of activity [20]. Nonetheless, another body of work has focused on specific aspects of the Dark Web, such as marketplaces or forums, leaving a gap in understanding the broader network and its subnetworks [2].
This paper aims to address this gap by providing a comprehensive analysis of the Dark Web’s identification and communication patterns, focusing on the roles of various ID types and their associated activities. By examining the overall network structure, subnetworks, and key metrics, we seek to uncover the underlying mechanisms that drive the Dark Web’s organization and dynamics. Our analysis builds on previous work in network science and Dark Web research, leveraging advanced analytical techniques to provide new insights into this complex and evolving ecosystem. The study is guided by the following research questions:
  • What are the predominant types of IDs used on the Dark Web, and how do they facilitate communication and coordination?
    This question seeks to identify the most common ID types (e.g., email, Telegram, cryptocurrency wallets) and their roles in enabling interactions within the Dark Web.
  • How are different topics of activity (e.g., hacking, finance, drugs–narcotics) distributed across the Dark Web network IDs, and what are their linguistic and structural characteristics?
    This question explores the thematic and linguistic patterns of Dark Web activities, focusing on how different topics are organized and interconnected between different forms of identification.
  • How do ID subnetworks differ in terms of connectivity, cohesion, and fragmentation?
    This question investigates and compares the overall structural organization of subnetworks of activity by ID type, including the largest connected components, subnetworks, and their metrics (e.g., density, average path, centrality).
  • How do specific ID types (e.g., Telegram, email, XMR wallets) contribute to the overall network, and what are their unique roles in facilitating different types of activities?
    This question examines the distinct roles of various ID types, focusing on their centrality, connectivity, and specialization within the network.
  • What insights can be gained from analyzing the central areas of activity (largest connected components and k-cores) of the Dark Web network, and how do they reveal the key areas of activity?
    This question focuses on identifying the core areas of activity within the network, using k-cores and connected components.
To address these research questions, this study pursues the following objectives: (1) identifying the most common ID types and their roles in enabling interactions; (2) exploring the thematic, linguistic, and structural patterns of Dark Web activities across ID types; (3) investigating and comparing the structural organization of subnetworks by ID type, including metrics like density and centrality; (4) examining the distinct roles of specific ID types in facilitating activities, focusing on centrality and specialization; and (5) identifying core areas of activity using connected components and k-cores.
By addressing these research questions through the stated objectives, this paper aims to contribute to a deeper understanding of the Dark Web’s structure, dynamics, and the roles of various ID types in facilitating illicit activities. The findings can inform future research and potential interventions aimed at monitoring and mitigating the risks associated with the Dark Web.

2. State of the Art

The Dark Web, a segment of the internet that is intentionally hidden and often associated with illicit activities, presents a complex landscape for researchers and law enforcement agencies. Understanding the dynamics of communication and interaction within this environment is crucial for developing effective strategies to combat cybercrime [21]. Cilleruelo et al. [18] investigate the interconnection between darknets, focusing on the structural properties and topological characteristics of these hidden networks. Their study reveals that darknets exhibit a bow-tie structure, similar to that of the surface web, where a few highly connected nodes act as central hubs for illicit activities. By employing graph analysis techniques, the authors analyze the relationships between various dark web platforms. Research by Alharbi et al. [19] provides valuable insights into the topological properties of the Tor dark web, also revealing a bow-tie structure similar to that of the surface web. Their findings indicate that a few highly connected nodes serve as hubs for illicit activities, highlighting the importance of understanding network structure. Figueras-Martín et al. [22] expand the understanding of darknets by analyzing the connectivity and content of Freenet, a lesser-known darknet. Their study highlights the prevalence of illegal content, particularly underage pornography, within Freenet. By comparing these findings with a similar study on the I2P darknet, the authors underscore the unique features and differences among various darknets, further enriching the discourse on the structural and content dynamics of these hidden networks. Similarly, Takaaki and Atsuo [17] focus on the content and visualization of Dark Web forums, emphasizing their role as central hubs for discussions related to various illicit activities. Their analysis illustrates how forums facilitate communication among users, providing a platform for the exchange of information and coordination of illicit activities.
Lee et al. [13] investigate cryptocurrency abuses in the Dark Web, uncovering the interconnectedness of various actors involved in financial transactions. Their findings highlight the financial dimensions of illicit activities, yet they do not examine the identification mechanisms that enable these interactions. A larger body of work [15,23,24] examines darknet activity through the lens of cryptocurrency and the Bitcoin blockchain, providing insights into the financial transaction patterns occurring within the Dark Web, including timing, geographies, and migration. However, these analyses focus mostly on illicit drugs and do not explore the identification patterns that facilitate these transactions. Nali et al. [11] focus on the identification and characterization of illegal sales on the Telegram messaging platform, emphasizing its role as a communication channel for illicit activities. While their research underscores the importance of Telegram in facilitating interactions, it primarily emphasizes the content of these communications. Arabnezhad et al. [9] investigate the linking of dark web aliases to real internet identities, providing insights into how users navigate the Dark Web. Although their study addresses the technical aspects of linking aliases, it does not offer a comprehensive analysis of the types of IDs used in communication. Kühn et al. [6] explore the evaluation of the Dark Web for cyber threat intelligence, discussing manual and semi-automated methods for analyzing the network. While their research contributes to understanding the structural organization of the Dark Web, it does not specifically address the identification patterns of users.
Despite the valuable contributions of these studies, there remains a significant gap in the literature regarding the structural properties that underlie the connections between dark web documents and the forms of IDs used. While various types of IDs, including usernames, email addresses, and cryptocurrency wallets, have been identified, the analyses often lack a comprehensive examination of how these IDs interact within the larger network structure. Further, much of the existing research emphasizes the role of forums as central hubs for communication and interaction [25,26], often at the expense of understanding the identification mechanisms that facilitate these interactions. This underrepresentation of identification patterns limits the understanding of how users navigate the Dark Web, communicate, and coordinate illicit activities, highlighting the need for further research that integrates both identification patterns and network structure.

3. Materials and Methods

3.1. Data Gathering and Preprocessing

The base dataset used in this study consists of Dark Web documents categorized into seven topics of malicious activities: drugs–narcotics, electronics, finance, finance–crypto, hacking, search-engine-index, and others. The categorization follows the MISP Dark Web taxonomy (https://www.misp-project.org/taxonomies.html#_dark_web, accessed on 21 April 2025), initially proposed by Dalins, Wilson, and Carman [27] and subsequently updated and extended by the MISP Project and the European Commission Joint Research Center. This taxonomy provides a standardized framework for classifying malicious activities on the Dark Web.
The dataset was constructed using a combination of automated and manual processes. An automated crawling system continuously monitored the Dark Web, generating daily reports that include statistics and a list of the most visited sites over the past 24 h. These reports were manually reviewed to filter out sites deemed “uninteresting.” Uninteresting sites are defined as those previously analyzed and those hosting non-malicious content (e.g., mirrors of Wikipedia or The New York Times). This manual review ensures that only relevant and non-redundant sites are included in the dataset. Over about four months (from 10 July to 19 November 2024), the dataset was compiled by manually inspecting the visited sites, scraping their content, and categorizing it based on the MISP Dark Web taxonomy. Periodic web scraping has been found to be an effective approach to gathering evidence about darknet marketplace users [28], while crawling-specific methodologies can identify operators of illegal Dark Web sites and connect them with the corresponding surface websites [29]. Malicious websites not covered by any topic were categorized under ‘other’. Upon closer inspection, we noticed that these were mostly community-driven question-and-answer (Q&A) websites.
The dataset comprises web archives (WARCs) divided into web documents that were processed using a Python (version 3.10.5) script to extract references to external communication channels and cryptocurrency accounts. The following types of account IDs were identified and extracted: email addresses, phone numbers, Telegram usernames, Pastebin links, Discord URLs, and cryptocurrency wallets (BTC, XMR, DASH, BNB, and ZEC). The extraction process utilized the “Restalker” package, which is designed to detect standard communication and cryptocurrency IDs. Custom regular expressions were also implemented to identify additional Telegram user IDs not covered by the default package. Table 1 presents the distribution of crawled Tor domains and number of initial web documents under each topic. To further enrich the dataset, the primary language of each document was determined using the “langdetect” package. This step helps in understanding the linguistic distribution of the Dark Web content and its potential geographic or cultural associations.
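The extraction step described above can be illustrated with a minimal sketch. It does not reproduce the Restalker package or the authors' custom expressions: the two patterns, the `PATTERNS` table, and the `extract_ids` helper are hypothetical simplifications, and lowercasing the extracted values is an assumed normalization.

```python
import re

# Hypothetical, simplified patterns for illustration only; production-grade
# extraction needs stricter validation (e.g., checksum checks for wallets).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "telegram": re.compile(r"(?:https?://)?t\.me/([A-Za-z][A-Za-z0-9_]{4,31})"),
}


def extract_ids(html: str) -> set[tuple[str, str]]:
    """Return the set of (id_type, value) pairs referenced in a document."""
    found = set()
    for id_type, pattern in PATTERNS.items():
        for match in pattern.finditer(html):
            # Use the captured group when the pattern defines one.
            value = match.group(1) if pattern.groups else match.group(0)
            # Assumed normalization: compare IDs case-insensitively.
            found.add((id_type, value.lower()))
    return found


sample = '<a href="https://t.me/Some_Channel">chat</a> contact: Admin@example.org'
ids_in_sample = extract_ids(sample)
```

Returning a set means an ID that appears several times in the same document is recorded once, which matches the "appears at least once" rule used to build the graph.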
The relationships between the extracted IDs and the WARCs were modeled as a bipartite graph, a type of network structure where nodes are divided into two distinct partitions, and edges only connect nodes from different partitions. In this case, the two partitions consist of (1) IDs (e.g., email addresses, cryptocurrency wallets, Telegram usernames) and (2) Dark Web documents. An edge is created between an ID and a document if the ID is referenced within the HTML content of that document. This bipartite structure allows for a clear representation of how external communication and financial channels are linked to specific Dark Web content. The bipartite graph was constructed by connecting each document node to the ID nodes that appeared at least once in its HTML code. For example, if a Dark Web document contained references to a Bitcoin wallet and a Telegram username, edges were drawn from the document to both the wallet and the username nodes. Only documents containing at least one reference to an ID were included in the graph, ensuring that the network focuses on meaningful connections. In this study, an edge represents a co-occurrence association between an ID and a document, rather than a confirmed communication or transaction event. The resulting bipartite network therefore reflects patterns of reference and association rather than direct, verified user-to-user interactions. Interpretations of connectivity and centrality should thus be understood as structural indicators of visibility and association frequency within the dataset. The final bipartite graph comprises 139,356 nodes (57,071 IDs and 82,285 documents) and 248,971 edges, representing the interactions between Dark Web documents and the external IDs they reference. This network serves as a foundation for analyzing the structure and dynamics of communication and financial activities on the Dark Web in this paper.
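As a minimal sketch of the construction just described, the following builds a small bipartite graph with NetworkX from a toy mapping of documents to extracted IDs. The `doc_ids` data, the node naming scheme, and the `build_bipartite_graph` helper are illustrative assumptions, not the authors' code.

```python
import networkx as nx

# Toy input: document identifier -> IDs found in its HTML (illustrative values).
doc_ids = {
    "doc1": [("telegram", "some_channel"), ("btc", "wallet_a")],
    "doc2": [("telegram", "some_channel")],
    "doc3": [],  # no ID references, so it is left out of the graph
}


def build_bipartite_graph(doc_ids):
    """Connect each document node to every ID node it references at least once."""
    graph = nx.Graph()
    for doc, ids in doc_ids.items():
        if not ids:
            continue  # only documents with at least one ID are included
        doc_node = ("doc", doc)
        graph.add_node(doc_node, bipartite="document")
        for id_type, value in ids:
            id_node = (id_type, value)
            graph.add_node(id_node, bipartite="id", id_type=id_type)
            # add_edge is idempotent, so repeated references yield one edge.
            graph.add_edge(doc_node, id_node)
    return graph


graph = build_bipartite_graph(doc_ids)
```

In this toy case the graph has four nodes (two documents, two IDs) and three edges. The shared Telegram handle is what links `doc1` and `doc2`, which is the kind of co-occurrence association the network captures.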

3.2. Analytical Tools

With the network represented as an undirected bipartite graph, where nodes are divided into two partitions (IDs and Dark Web documents), we employed a range of graph-theoretical metrics to analyze the structure and dynamics of the network. The bipartite nature of the graph ensures that edges only connect nodes from different partitions, reflecting the relationships between IDs and the documents in which they are referenced. To identify the main or influential actors in the network, we computed the following node-level metrics, adapted for bipartite graphs: degree, closeness centrality, and betweenness centrality. Degree is the number of edges connected to a node. For IDs, this represents how many documents reference them; for documents, it indicates how many IDs they contain. High-degree nodes are likely to be influential. Closeness centrality measures how close a node is to all other nodes in the network. Nodes with high closeness centrality can efficiently interact with others, making them central to the network’s communication flow. We used harmonic closeness centrality, which is appropriate for disconnected networks. Betweenness centrality quantifies the number of shortest paths that pass through a node. Nodes with high betweenness act as bridges between different parts of the network, playing a critical role in information flow. Degree, closeness, and betweenness were normalized to the range 0–1 so that results from different subgraphs can be compared.
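The node-level metrics above can be sketched on a toy graph with NetworkX. This is an illustration under stated assumptions rather than the authors' pipeline: the graph and node names are invented, and dividing the harmonic centrality by n − 1 is one possible normalization that the text does not specify.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy bipartite graph: IDs "A" and "B"; documents "d1" to "d3" (illustrative only).
graph = nx.Graph()
graph.add_edges_from([("A", "d1"), ("A", "d2"), ("B", "d2"), ("B", "d3")])
ids = {"A", "B"}

# Bipartite degree centrality: degree divided by the size of the opposite partition.
degree = bipartite.degree_centrality(graph, ids)

# Harmonic closeness: sum of 1/d(u, v) over reachable nodes, so unreachable
# pairs contribute nothing. Dividing by (n - 1) is an assumed normalization.
n = graph.number_of_nodes()
harmonic = {
    node: value / (n - 1) for node, value in nx.harmonic_centrality(graph).items()
}

# Bipartite betweenness, normalized by the maximum attainable in each partition.
betweenness = bipartite.betweenness_centrality(graph, ids)
```

Here document `d2` is referenced by both IDs, so it bridges the two halves of the graph and receives the highest betweenness, whereas `d1` and `d3` lie on no shortest path between other nodes.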
To analyze the categorized topics and areas of activity, we built subgraphs for each type of ID (e.g., email, BTC, Telegram) including all the ID nodes of the given type and all the documents connected with them. These subgraphs represent the interactions specific to each ID type, allowing us to identify the most influential actors within each category, the share of the network that each ID type represents, and its main topics of activity and languages. For example, the subgraph for Bitcoin wallets highlights documents and activities (e.g., hacking or finance) related to transactions in this cryptocurrency. By computing node-level metrics within each subgraph, we identified the most influential IDs and their roles in their respective subnetworks.
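A minimal sketch of the per-type subgraph extraction follows. It assumes each ID node carries an `id_type` attribute; that attribute name, the toy nodes, and the `id_type_subgraph` helper are hypothetical.

```python
import networkx as nx

# Toy graph with one Telegram ID and one BTC wallet (illustrative values).
graph = nx.Graph()
graph.add_node("tg:alice", kind="id", id_type="telegram")
graph.add_node("btc:w1", kind="id", id_type="btc")
for doc in ("d1", "d2", "d3"):
    graph.add_node(doc, kind="document")
graph.add_edges_from(
    [("tg:alice", "d1"), ("tg:alice", "d2"), ("btc:w1", "d2"), ("btc:w1", "d3")]
)


def id_type_subgraph(graph, id_type):
    """Keep all IDs of one type plus every document connected to them."""
    ids = {n for n, data in graph.nodes(data=True) if data.get("id_type") == id_type}
    docs = {doc for node in ids for doc in graph.neighbors(node)}
    # The induced subgraph drops IDs of other types along with their edges.
    return graph.subgraph(ids | docs).copy()


telegram = id_type_subgraph(graph, "telegram")
```

A document that references IDs of several types, such as `d2` here, belongs to more than one subgraph, so subgraph sizes are not expected to sum to the size of the full network.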
We computed the following metrics for both the overall bipartite network and its subnetworks: number of connected components, size of the largest connected component, density, average path length, and diameter. The number of connected components is the number of disconnected subgraphs within the network. The largest connected component (LCC) is the component that contains the most nodes. An LCC containing a high proportion of nodes indicates a cohesive network, while many small components suggest fragmentation. Density is the ratio of actual edges to possible edges. A dense network indicates strong interconnectivity, while a sparse network suggests limited interactions. Average path length is the average number of steps along the shortest paths between all pairs of nodes. For a bipartite graph, we computed all shortest paths from all nodes of the first partition to all nodes of the second partition. Shorter paths indicate efficient communication or interaction. Diameter is the longest shortest path between any two nodes of different partitions. A smaller diameter suggests a more tightly connected network. We also computed the average of the node metrics previously described. Average degree represents the number of edges per node. This provides an overview of the network’s connectivity. Average closeness and betweenness provide insights into the network’s overall centralization and bridging potential.
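The network-level metrics can be sketched as follows on a toy graph with two components. The `cross_partition_paths` helper is a hypothetical illustration of the cross-partition path computation described above, restricted to reachable ID and document pairs; it is not taken from the authors' code.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy graph: one five-node component and one isolated ID-document pair.
graph = nx.Graph()
graph.add_edges_from(
    [("A", "d1"), ("A", "d2"), ("B", "d2"), ("B", "d3"), ("C", "d4")]
)
ids = {"A", "B", "C"}

n_components = nx.number_connected_components(graph)
lcc_nodes = max(nx.connected_components(graph), key=len)
lcc = graph.subgraph(lcc_nodes)

# Bipartite density: edges divided by (number of IDs x number of documents).
density = bipartite.density(graph, ids)


def cross_partition_paths(graph, ids):
    """Average and maximum shortest-path length from IDs to reachable documents."""
    ids = set(ids) & set(graph)
    lengths = []
    for source in ids:
        distances = nx.single_source_shortest_path_length(graph, source)
        lengths.extend(d for target, d in distances.items() if target not in ids)
    return sum(lengths) / len(lengths), max(lengths)


avg_path, diameter = cross_partition_paths(lcc, ids)
```

Because every edge joins an ID to a document, all ID-to-document distances are odd, so the cross-partition diameter of a connected bipartite graph is always an odd number.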
The largest connected components of the main network were analyzed in detail, as they represent the core structure. By focusing on the LCCs, we gained insights into the most active and interconnected parts of the network. This analysis also revealed patterns of collaboration or coordination among IDs and dark web pages. To identify the central areas of activity, we also applied k-core decomposition, a method that recursively removes nodes with degrees less than k. The resulting k-cores represent densely connected subgraphs, highlighting the most active and cohesive regions of the network. Higher k-cores indicate areas with intense interaction, often involving influential actors.
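The k-core decomposition can be illustrated with NetworkX's `core_number` and `k_core` functions. The toy graph below is an assumption made for illustration: two IDs that co-occur in the same three documents, plus a peripheral chain.

```python
import networkx as nx

graph = nx.Graph()
# Dense region: two IDs that are both referenced by the same three documents.
graph.add_edges_from(
    (id_node, doc) for id_node in ("id_a", "id_b") for doc in ("d1", "d2", "d3")
)
# Peripheral chain attached to the dense region through document d3.
graph.add_edges_from([("id_c", "d3"), ("id_c", "d4")])

# Core number of a node: the largest k for which it belongs to the k-core.
core_numbers = nx.core_number(graph)

# The 2-core is what remains after recursively removing nodes of degree < 2.
two_core = nx.k_core(graph, k=2)
```

Removing `d4` (degree 1) lowers the degree of `id_c` to 1, so it is removed in turn. The recursion therefore strips the whole chain and leaves only the densely connected region.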
The analysis was conducted using NetworkX, a Python library for graph analysis, to extract and process subgraphs and compute metrics. In particular, the algorithms in its bipartite package were used to compute the metrics. For visualization, we used Gephi version 0.9.2 [30], employing the ForceAtlas 2 layout [31], which emphasizes community structure and node centrality.

3.3. Validation and Error Estimation

To assess the reliability of ID extraction, a manually annotated validation set of 500 documents was created. Each document was reviewed to verify the accuracy of automatically detected IDs. The extraction achieved a precision of 95.8%, a recall of 91.4%, and an F1-score of 93.6%.
The main sources of false positives were embedded non-functional email strings and incomplete Telegram handles. False negatives mostly involved nonstandard cryptocurrency address formats. These error rates indicate that extraction reliability is high enough to support structural inferences but not precise enumeration of all IDs. This validation step improves confidence in the overall network model and informs the interpretation of its results.
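For reference, the validation metrics follow the standard definitions sketched below. The counts are hypothetical and are not taken from the study's validation set, whose raw counts are not reported here.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard definitions from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)  # share of detected IDs that are correct
    recall = tp / (tp + fn)     # share of true IDs that were detected
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts for illustration only.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

In this framing, false positives such as non-functional email strings lower precision, while false negatives such as nonstandard wallet formats lower recall.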

4. Results

4.1. ID Types and Their Connections

Table 2 provides a comprehensive summary of the types of IDs found in the Dark Web documents, along with the number of documents connected to each ID type, their main topics, and primary languages. Email and Telegram are the most prevalent forms of identification in the dataset. Email IDs are present in 29,735 documents, while Telegram IDs are referenced in 76,967 documents, making them the two largest subnetworks. For email, the majority of connected documents are in English (22,385 documents) and classified under the finance–crypto topic (17,640 documents). This suggests that email is a common communication channel for cryptocurrency-related activities, which are predominantly conducted in English. In contrast, Telegram IDs are most frequently associated with documents in Russian (50,416 documents) and classified under hacking activities (55,482 documents). This indicates that Telegram is a preferred platform for communicating about hacking-related activities, particularly within Russian-speaking communities.
XMR (Monero) wallets form the third largest subnetwork, with 20 wallets connecting 17,644 documents. Almost all of these documents are in English (17,663 documents) and classified under the finance–crypto topic (17,597 documents). This highlights the significant role of Monero, a privacy-focused cryptocurrency, in financial transactions on the Dark Web, particularly in English-speaking contexts. BTC (Bitcoin) wallets create the fourth largest network, connecting 1944 documents. Most of these documents are in English (1106 documents), and the majority are classified under the ‘other’ category (689 documents). This suggests that Bitcoin, while still widely used, is less specialized in specific topics compared to Monero, and its use cases are more diverse.
All other ID types are significantly underrepresented, connecting fewer than 1000 documents each. For example, Paste IDs are connected to 623 documents, primarily in Russian (495 documents) and classified under hacking (589 documents). PGP IDs are linked to 895 documents, mostly in English (860 documents) and associated with search-engine-index activities (856 documents). Phone numbers, Discord URLs, and other cryptocurrency wallets (e.g., DASH, BNB, ZEC) have minimal representation. These IDs are often linked to hacking activities and are primarily in Russian or English.
The overall network consists of 57,071 IDs connected to 82,285 documents. The dominant topic across the network is hacking (57,223 documents), and the primary language is Russian (50,852 documents). This underscores the prominence of hacking-related activities and the significant role of Russian-speaking communities in the Dark Web ecosystem. The dominance of finance–crypto and hacking topics across multiple ID types (e.g., email, Telegram, XMR wallets) highlights the dual focus of the Dark Web on financial transactions and cybercriminal activities. The linguistic divide between English and Russian documents reflects the global nature of the Dark Web, with English being the primary language for financial activities and Russian dominating hacking-related content. The limited presence of other ID types (e.g., Skype URLs, DASH wallets) suggests that these are niche tools used as supplementary channels or in specific contexts, often within smaller, specialized communities.

4.2. Top Topics and Languages for the Largest ID Networks

Table 3 provides a detailed breakdown of the four largest networks by ID type, presenting the top three topics and their primary languages for each. The largest network, associated with Telegram IDs, is dominated by hacking activities, with 55,482 documents primarily in Russian (50,411 documents). This reinforces the observation that Telegram serves as a central platform for coordinating hacking-related activities, particularly within Russian-speaking communities. The second most common topic for Telegram IDs is finance–crypto, with 17,614 documents all in English. This indicates that Telegram is also widely used for cryptocurrency-related discussions, particularly in English-speaking contexts. The third topic, drugs–narcotics, is associated with 1671 documents, mostly in English (1614 documents), suggesting that Telegram also facilitates communication in the illicit drug trade, albeit to a lesser extent compared to hacking and finance–crypto.
For email IDs, the primary topic is finance–crypto, with 17,640 documents all in English. This underscores the role of email as a key communication tool for cryptocurrency-related activities, particularly in English-speaking environments. The second most common topic is hacking, with 9347 documents, primarily in Russian (7129 documents), indicating that email is also used in hacking-related activities, though to a lesser extent than Telegram. The third topic, search-engine-index, is associated with 1378 documents, mostly in English (1357 documents), suggesting that email is occasionally used in contexts related to search engines or indexing services on the Dark Web.
The network associated with XMR wallets is overwhelmingly focused on finance–crypto, with 17,597 documents all in English. This highlights the significant role of Monero in financial transactions on the Dark Web, particularly in English-speaking contexts. The second topic (search-engine-index) and third topic (hacking) are associated with only 30 and 14 documents, respectively, indicating minimal presence of XMR wallets in other activities. For BTC wallets, the primary topic is “other”, with 689 documents mostly in Portuguese (641 documents). This category includes documents dealing with questions and answers, indicating a niche use case for Bitcoin in Portuguese-speaking contexts. The presence of Portuguese in this category may be attributed to the specific moment and method of data collection, reflecting regional variations in the use of Bitcoin during the dataset’s compilation. The second topic, drugs–narcotics, is associated with 355 documents all in English, suggesting that Bitcoin is used in the trade of illicit drugs, particularly in English-speaking environments.
The dominance of Russian in hacking-related documents across multiple ID types, and particularly in Telegram, highlights the significant role of Russian-speaking communities in malicious activities on the Dark Web. In contrast, English is the predominant language for finance–crypto activities, which are more evenly distributed across ID types, reflecting the global nature of cryptocurrency transactions and the widespread use of English in financial contexts. The presence of Portuguese in the “other” category for BTC wallets suggests regional variations in the use of Bitcoin, particularly in Portuguese-speaking communities, which may be influenced by the specific timing and methodology of data collection.
To quantify linguistic bias, we analyzed the language distribution across all documents: Russian (43%), English (39%), Portuguese (9%), and others (9%). To assess representativeness, we added a small simulated validation subset of 3000 documents collected from lower-traffic domains predominantly in Portuguese and Spanish.
Results from this subset confirmed that while Russian–English polarization defines most high-traffic regions, smaller Portuguese and Spanish subnetworks exhibit coherent internal structures. Telegram remains central, but in these subsets, email plays a more prominent role (average degree 1.7× higher than global mean). This suggests that smaller linguistic communities rely on simpler and more direct coordination methods, partially compensating for their limited network scale.

4.3. Main Network

4.3.1. Overall Network Structure

Figure 1 presents the bipartite graph representing the complete network of IDs and web documents. At a high level, the graph reveals a densely connected structure with a large central cluster dominated by Telegram IDs, located in the central-bottom part of the figure. This central cluster highlights the significant role of Telegram as a hub for communication and coordination within the Dark Web. Surrounding this central cluster, email accounts are predominantly distributed across the peripheral areas of the graph, indicating their widespread use but less centralized role compared to Telegram IDs. Other ID types, such as BTC wallets and phone numbers, appear sporadically and are less densely connected, reflecting their niche or specialized use within the network.
Due to the complexity and scale of the complete network, finer details are difficult to discern. The graph also highlights the heterogeneity of the network, with different ID types occupying distinct regions. For instance, the central cluster of Telegram IDs suggests a high level of interconnectivity among these nodes, likely reflecting their role in facilitating group communication or coordination. In contrast, the peripheral distribution of email accounts indicates a more decentralized pattern of use, potentially associated with individual or less coordinated activities.
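As a minimal sketch of the bipartite model described above, the following Python snippet builds an undirected ID-document graph from extraction records. The record layout, the sample values, and the `build_bipartite` helper are hypothetical and chosen only for illustration; the study's actual pipeline is not described at this level of detail.

```python
from collections import defaultdict

# Hypothetical extraction records: (id_value, id_type, document).
records = [
    ("@chan_a", "telegram", "doc1"),
    ("@chan_a", "telegram", "doc2"),
    ("bob@example.org", "email", "doc2"),
    ("bob@example.org", "email", "doc3"),
    ("wallet_x", "xmr", "doc4"),
]

def build_bipartite(records):
    """Return (adjacency, id_types) for an undirected ID-document graph."""
    adjacency = defaultdict(set)
    id_types = {}
    for id_value, id_type, doc in records:
        # Tag each node with its side so an ID and a document never collide.
        id_node = ("id", id_value)
        doc_node = ("doc", doc)
        adjacency[id_node].add(doc_node)
        adjacency[doc_node].add(id_node)
        id_types[id_node] = id_type
    return dict(adjacency), id_types

adjacency, id_types = build_bipartite(records)
# Degree of a node = number of distinct neighbors on the other side.
degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
```

Edges only ever join an ID to a document, so every metric computed on this structure describes co-reference between IDs and documents, not direct contact between IDs.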

4.3.2. Largest Connected Component

The LCC (Figure 2) contains 106,924 nodes, comprising 47,218 IDs and 59,706 documents, which represents 76.72% of the complete network. This substantial portion underscores the centrality and cohesion of the LCC within the overall network structure. Compared to the complete network in Figure 1, most nodes in the periphery have disappeared, and a significant area of activity—originally visible in the central-left part of Figure 1 and constituting the second largest connected component—is no longer present.
The periphery of the LCC is almost exclusively dominated by email IDs, represented in blue, indicating that email is the prevalent form of communication in these outer regions. This suggests that email serves as a widespread but less centralized means of interaction, often connecting to documents at the edges of the network. In contrast, the central area of the graph, particularly the bottom-center region, is primarily populated by Telegram IDs, highlighting their role as a central platform for communication and coordination. Other ID types, such as Paste and Phone, are also present in this central area, though to a lesser extent, indicating their involvement in core activities but with a more limited scope compared to Telegram. BTC wallets, depicted in dark blue, are relatively scarce but are primarily concentrated in the top-left part of the figure. This region is notable for its abundance of both Telegram and email IDs, suggesting a convergence of communication channels and financial activities. The presence of BTC wallets in this area may indicate their use in facilitating transactions or financial interactions within this subset of the network.
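The extraction of the largest connected component can be sketched with a breadth-first search. The toy graph and the `connected_components` helper below are illustrative assumptions, not the study's code; the final assertion only checks that the node counts reported above for the LCC are internally consistent.

```python
from collections import deque

def connected_components(adjacency):
    """Return the connected components as sets of nodes, largest first."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = {start}
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Hypothetical toy graph: two IDs share doc:2, a third ID is isolated.
adjacency = {
    "id:t1": {"doc:1", "doc:2"},
    "id:e1": {"doc:2", "doc:3"},
    "id:x1": {"doc:4"},
    "doc:1": {"id:t1"},
    "doc:2": {"id:t1", "id:e1"},
    "doc:3": {"id:e1"},
    "doc:4": {"id:x1"},
}

components = connected_components(adjacency)
lcc = components[0]
lcc_share = len(lcc) / len(adjacency)  # fraction of all nodes in the LCC

# Consistency check on the figures reported in the text.
assert 47_218 + 59_706 == 106_924
```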

4.3.3. Second Largest Connected Component

Figure 3 illustrates the bipartite graph of the second largest connected component (2nd LCC) for the network of IDs and web documents. This component contains 17,614 documents, representing 12.63% of the overall network, and is connected by 52,864 edges. Notably, only 7 nodes represent IDs, with the majority of connections driven by four central IDs: two XMR wallets, one email address, and one Telegram bot. All documents in this component are classified under the finance–crypto topic, highlighting its focus on cryptocurrency-related activities. The structure of the 2nd LCC is characterized by four highly connected “whale” nodes that link all web documents in the component. The email address and Telegram bot belong to the OrangeFren exchange aggregator, a platform that facilitates cryptocurrency transactions. These two IDs are referenced by all documents within the component, indicating their central role in coordinating activities. The two XMR wallets are also highly influential, with degrees of 8394 and 9204, reflecting their extensive use in malicious financial activities. The remaining three IDs are peripheral, playing a minimal role in the overall structure.
The 2nd LCC is entirely independent of the LCC, meaning its activities are isolated from the main network. This independence suggests a specialized subnetwork focused exclusively on finance–crypto operations, likely centered around the OrangeFren platform and associated XMR wallets. The high degree of connectivity among the central IDs underscores their importance in facilitating transactions and communication within this component. Together, the first and second largest connected components account for 89.37% of all nodes in the network, indicating that the majority of activity is concentrated in these two components. The remaining 1848 components in the network are highly dispersed, reflecting a fragmented structure with many smaller, isolated subnetworks. This dispersion suggests that while the Dark Web contains a few large, cohesive networks of documents and IDs, part of its activity also occurs in smaller, specialized groups.
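The concentration of the 2nd LCC around its four central IDs can be checked with a short worked calculation. In a bipartite graph every edge has exactly one ID endpoint, so the ID degrees sum to the edge count. The sketch below uses only figures reported above and assumes that "referenced by all documents" corresponds to exactly one edge per document for the OrangeFren email address and Telegram bot; the variable names are illustrative.

```python
# Figures reported in the text for the second largest connected component.
documents = 17_614
total_edges = 52_864
xmr_wallet_degrees = [8_394, 9_204]

# Assumption: the OrangeFren email address and Telegram bot each have
# one edge to every document in the component.
orangefren_degrees = [documents, documents]

hub_edges = sum(xmr_wallet_degrees) + sum(orangefren_degrees)
hub_share = hub_edges / total_edges          # share of edges on the four hubs
peripheral_edges = total_edges - hub_edges   # edges left for the other three IDs
```

Under that assumption, the four central IDs account for 52,826 of the 52,864 edges (more than 99.9%), leaving 38 edges for the three peripheral IDs.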

4.3.4. k-Core Analysis: The 5-Core

Figure 4 presents the bipartite graph of the 5-core for the network of IDs and web documents. The 5-core is the maximal subgraph in which every node has at least five neighbors within the subgraph itself. This subgraph captures the most densely connected and central part of the network, highlighting the core areas of activity. It contains 2647 nodes (1.90% of the overall network) and 13,456 edges. In this subgraph, Telegram and email IDs remain the most prevalent forms of identification, maintaining a layout similar to that observed in the larger connected components. Telegram IDs continue to dominate the central areas, underscoring their role as a primary communication tool within the Dark Web. Email IDs are also prominently present, particularly in the peripheral regions. Additionally, the 5-core includes a notable number of Paste IDs, which are often used for sharing text-based information, such as code snippets or instructions. A small number of phone numbers are also present, reflecting their limited but specialized use in certain activities. Interestingly, the 5-core contains only one BTC wallet, suggesting that even in the most central and densely connected part of the network, cryptocurrency wallets play a relatively minor role compared to communication-focused IDs like Telegram and email.
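A k-core can be obtained by repeatedly peeling away nodes whose degree falls below k. The snippet below is a minimal sketch of that standard procedure on a hypothetical toy graph; the `k_core` helper is an illustration and is not taken from the study's implementation.

```python
def k_core(adjacency, k):
    """Return the node set of the k-core by iteratively removing nodes of degree < k."""
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    removed = set()
    stack = [node for node, d in degree.items() if d < k]
    while stack:
        node = stack.pop()
        if node in removed:
            continue
        removed.add(node)
        # Removing a node lowers the degree of its surviving neighbors,
        # which may push them below the threshold as well.
        for neighbor in adjacency[node]:
            if neighbor not in removed:
                degree[neighbor] -= 1
                if degree[neighbor] < k:
                    stack.append(neighbor)
    return set(adjacency) - removed

# Hypothetical toy graph: two IDs share two documents; doc:3 hangs off id:a.
adjacency = {
    "id:a": {"doc:1", "doc:2", "doc:3"},
    "id:b": {"doc:1", "doc:2"},
    "doc:1": {"id:a", "id:b"},
    "doc:2": {"id:a", "id:b"},
    "doc:3": {"id:a"},
}

two_core = k_core(adjacency, 2)    # the pendant document doc:3 is peeled off
three_core = k_core(adjacency, 3)  # no node keeps three neighbors, so it is empty
```

The cascade matters: a node that starts with enough neighbors can still drop out once its low-degree neighbors are removed, which is why the 5-core is much smaller than the set of nodes with degree five or more.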

4.4. Subnetworks by ID Type

4.4.1. Metrics of Subnetworks by ID Type

The metrics of the subnetworks containing only the nodes for each ID type (Table 4) offer insights into the structural characteristics and connectivity patterns while also revealing distinct differences in the organization and cohesion of subnetworks. All subnetworks exhibit relatively low densities, a common feature in networks of human activity where interactions are often sparse but meaningful [32]. Despite the low densities, the subnetworks display relatively short average path lengths, indicating efficient communication or interaction between nodes. This is further supported by high average closeness centrality values across all subnetworks, suggesting that nodes are generally close to each other within their respective subnetworks. The diameters of the subnetworks grow with their size, with larger subnetworks tending to have longer diameters.
Differing from the main graph, most subnetworks, except for Telegram, exhibit a low percentage of nodes in the LCC and a high number of connected components. This fragmentation suggests that these subnetworks consist of numerous small, independent areas of activity rather than a single cohesive structure. For example, the BTC Wallet subnetwork has 224 connected components, with only 28.8% of nodes in the LCC, while the Discord URL subnetwork has 66 connected components, with 19.9% of nodes in the LCC. Similarly, the Paste subnetwork has 289 connected components, with only 4.0% of nodes in the LCC. This pattern is consistent across most ID types, indicating that activities involving these IDs are highly decentralized and fragmented. The exception is the Telegram subnetwork, which has a relatively large LCC containing 75.1% of its nodes, similar to the main network’s LCC size of 76.72%. This suggests that Telegram IDs form the backbone of the network, providing a centralized structure for communication and coordination.
The relative size of the LCC in the main network (76.72%) is larger than that of all subnetworks, except for the Telegram subnetwork, which closely mirrors the main network’s cohesion. The XMR Wallet subnetwork also stands out, with 52.1% of its nodes in its LCC, reflecting its role in facilitating a significant portion of financial transactions. These observations suggest that Telegram IDs and XMR wallets play central roles in the network, while other ID types provide supporting or complementary areas of activity.
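The per-type comparison above can be sketched by filtering the edge list to a single ID type and then counting components. The edge list, the type labels, and the helper names below are hypothetical; the values in Table 4 are not reproduced here.

```python
from collections import defaultdict

# Hypothetical edges: (id_value, id_type, document).
edges = [
    ("t1", "telegram", "d1"), ("t1", "telegram", "d2"), ("t2", "telegram", "d2"),
    ("e1", "email", "d2"), ("e2", "email", "d3"), ("e3", "email", "d4"),
]

def subnetwork(edges, id_type):
    """Adjacency of the bipartite subgraph induced by one ID type and its documents."""
    adjacency = defaultdict(set)
    for id_value, kind, doc in edges:
        if kind == id_type:
            adjacency["id:" + id_value].add("doc:" + doc)
            adjacency["doc:" + doc].add("id:" + id_value)
    return dict(adjacency)

def component_sizes(adjacency):
    """Sizes of all connected components, largest first (iterative DFS)."""
    seen, sizes = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        sizes.append(size)
    return sorted(sizes, reverse=True)

def summarize(edges, id_type):
    adjacency = subnetwork(edges, id_type)
    sizes = component_sizes(adjacency)
    return {
        "nodes": len(adjacency),
        "components": len(sizes),
        "lcc_share": sizes[0] / len(adjacency),
    }

telegram = summarize(edges, "telegram")
email = summarize(edges, "email")
```

In this toy data the Telegram subnetwork forms a single component, while the email subnetwork splits into three pairs, mirroring in miniature the cohesive versus fragmented contrast described above.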
To examine temporal stability, we compared network metrics between two temporal snapshots: July 2024 (early collection period) and November 2024 (final collection period). We found that the Telegram subnetwork LCC size changed from 74.9% in July to 75.3% in November, the XMR subnetwork density changed from 0.051 to 0.049, and the mean degree centrality for all IDs showed a variation of ±3%. Overall, the network topology and community structure remained consistent. Topic-language correlations, such as those between Russian and hacking or English and finance, persisted with minimal shifts. These results suggest that the observed patterns represent relatively stable structures rather than transient clustering.
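The snapshot comparison can also be read as relative changes. The short sketch below uses the two pairs of values reported above; the dictionary keys are illustrative names, not identifiers from the study.

```python
# Values reported in the text for the two snapshots.
july = {"telegram_lcc_share": 0.749, "xmr_density": 0.051}
november = {"telegram_lcc_share": 0.753, "xmr_density": 0.049}

# Relative change of each metric with respect to the July snapshot.
relative_change = {
    metric: (november[metric] - july[metric]) / july[metric]
    for metric in july
}
```

With these figures, the Telegram LCC share rises by roughly 0.5% and the XMR subnetwork density falls by roughly 3.9% in relative terms, which is consistent with the reading that the structure is broadly stable.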

4.4.2. Telegram Subnetwork by Topic

Figure 5 presents the bipartite subnetwork representing only the Telegram IDs and the web documents connected to them, colored by topic. The graph reveals several distinct clusters, each corresponding to a specific topic of activity. The most prominent feature is the central cluster of hacking activities, which aligns with the prevalence of hacking as the dominant topic across the complete network. This central cluster underscores the significant role of Telegram in facilitating hacking-related communication and coordination, particularly within Russian-speaking communities, as previously observed. To the left of the figure, a separate cluster is dedicated to finance–crypto activities. This cluster is unconnected to the central hacking cluster and corresponds to the 2nd LCC described earlier. The isolation of this cluster suggests that finance–crypto activities involving Telegram IDs operate independently from hacking-related activities, likely focusing on cryptocurrency transactions and financial discussions.
Two additional clusters, both focused on drugs–narcotics, are located at opposite sides of the graph. These clusters form their own separate components, with no connections to the central hacking cluster or to each other. This separation indicates that drug-related activities on Telegram are highly specialized and operate in isolated niches, potentially reflecting distinct communities or marketplaces. Smaller clusters related to search-engine-index activities are scattered across the left part of the graph, while a cluster categorized under “other” is located at the bottom-left. These smaller clusters are part of the central connected component of the subgraph, suggesting cross-topic interactions between hacking, search-engine-index, and Q&A forums (classified under ‘other’). These interactions may indicate collaborations or overlaps between different illicit activities, such as the use of search engines for hacking purposes or the integration of Q&A forums into broader cybercriminal operations.

4.4.3. Email Subnetwork by Topic

Figure 6 illustrates the bipartite subnetwork representing only the email IDs and the web documents connected to them, colored by topic. The graph reveals a complex and dispersed structure, highlighting the role of email as a communication tool across various topics on the Dark Web. The central cluster of the graph is dominated by finance–crypto activities, which corresponds to the 2nd LCC of the overall network. This cluster underscores the importance of email in facilitating cryptocurrency-related transactions and discussions, particularly in English-speaking contexts. The prominence of finance–crypto activities in this central cluster aligns with the broader trend observed in the network, where email serves as a key communication channel for financial operations.
However, the subnetwork exhibits a high level of dispersion, making it difficult to identify clear patterns. This dispersion is reflected in the 3044 connected components within the email subnetwork, indicating that activities connected by email are often isolated and fragmented. Unlike the Telegram subnetwork, which displays a more cohesive structure, the email subnetwork consists of numerous small, independent areas of activity. This suggests that email is used in a decentralized manner, often for specialized or niche purposes rather than large-scale coordination. Within the central area of the graph, several clusters related to hacking activities are visible, with some connections between them. These clusters indicate that email is also used in hacking-related communications, though to a lesser extent than Telegram. Additionally, clusters related to finance and other topics are present, with similar levels of connectivity. The presence of these clusters suggests that email serves as a versatile tool, facilitating interactions across multiple topics, including hacking, finance, and general discussions (classified under “other”).

4.5. Qualitative Content Analysis

To contextualize quantitative findings, we conducted a manual content analysis of 50 randomly selected documents referencing Telegram and XMR IDs. The analysis revealed three primary functional patterns:
  • Telegram—primarily used for coordination, broadcast announcements, and contact exchange.
  • XMR wallets—predominantly used for payment instructions, escrow, and transaction verification.
  • Email—used for customer follow-ups, negotiation of trades, or technical inquiries.
This qualitative evidence suggests that Telegram’s network prominence reflects its broadcast and coordination role, while XMR’s cohesion corresponds to reliable financial functions. These insights help explain the observed structural distinctions between communication and transaction subnetworks.

5. Discussion

The findings of this study provide valuable insights into identification and reference patterns on the Dark Web, highlighting associations among IDs and documents rather than confirmed communication between individuals. Telegram appears structurally central, but this should be interpreted as a broadcast effect rather than evidence of direct interaction. Telegram’s function as a public coordination hub makes it highly visible and widely referenced across documents, aligning with previous research that highlights its widespread use in illicit activities like drugs and scams [11,12]. In contrast, emails play a more decentralized role, facilitating finance–crypto and other activities but with a high level of fragmentation. This suggests that email is used for specialized or niche purposes rather than large-scale coordination, reflecting its versatility as a communication tool. The study also highlights the cohesive structure of the Monero wallet subnetwork, which is heavily focused on finance–crypto activities. This finding emphasizes the significance of privacy-focused cryptocurrencies like Monero in facilitating financial transactions on the Dark Web, consistent with prior research on the role of cryptocurrencies in illicit markets [14]. Other ID types, such as BTC wallets and Paste, are associated with smaller, specialized subnetworks, reflecting their niche roles in the network. These findings contribute to a broader understanding of the Dark Web’s modular and hierarchical structure, where distinct subnetworks operate independently but are occasionally interconnected. The decentralized nature of email IDs and the cohesive structure of the XMR wallet subnetwork offer new insights into the diversity of communication and financial tools used on the Dark Web, challenging the notion of a monolithic Dark Web and instead portraying it as a collection of specialized, interconnected communities.
Criminals on the Dark Web often collaborate through command-and-control models, where central actors (e.g., via Telegram channels) broadcast instructions or share tools, enabling coordinated attacks such as ransomware campaigns. This involves steps like initial reconnaissance via shared hacking resources, exploitation through distributed tools, and monetization via cryptocurrency wallets, posing risks including data breaches, financial fraud, and drug trafficking.
The findings of this study also align with and expand upon the existing literature focused on extracting underlying networking information from Dark Web forums, particularly regarding the identification of key actors and the dynamics of criminal activities. Previous research has emphasized the importance of social network analysis in understanding the interactions and relationships among users within these forums. For instance, Phillips et al. [33] explored the social structures of Dark Web forums, identifying influential members and their roles in facilitating communication and coordination among various groups. Their work highlights the modular nature of these networks, where distinct subnetworks operate independently yet remain interconnected, a theme echoed in our findings regarding the diverse roles of different ID types. Similarly, Almukaynizi et al. [25] demonstrated how hacker social networks on Dark Web forums can be leveraged to predict future cyber threats. Their approach underscores the significance of user connectivity and interaction patterns in understanding the potential for illicit activities, reinforcing our observation that Telegram serves as a central communication tool for coordinating hacking-related operations. This centrality is further supported by the work of Sarkar et al. [26], who utilized social network analysis to predict enterprise cyber incidents based on interactions within Dark Web forums, emphasizing the relevance of these platforms in the broader context of cybersecurity.
In addition to focusing on actors and their interactions, L’Huillier et al. [34] provided a model for analyzing topics discussed within Dark Web forums, illustrating how specific themes emerge and evolve over time. This topic modeling complements our findings on the cohesive structure of the Monero wallet subnetwork, which is heavily focused on finance–crypto activities. The insights gained from their research highlight the significance of privacy-focused cryptocurrencies in facilitating financial transactions on the Dark Web, consistent with our results. Moreover, the work of Pete et al. [35] and Zhang et al. [36] further contributes to our understanding of the Dark Web ecosystem by examining the structural characteristics of various forums and the roles of different actors within them. Their analyses reveal that while some members act as hubs of information and coordination, others occupy more specialized roles, reflecting the complexity and diversity of interactions in these illicit spaces. Collectively, these studies underscore the multifaceted nature of Dark Web forums, where the interplay between communication channels, financial tools, and thematic activities shapes the landscape of illicit operations. By integrating insights from these works, our study not only contributes to the existing body of knowledge but also provides a foundation for future research aimed at monitoring and mitigating the risks associated with the Dark Web.
The practical implications of these findings are significant for law enforcement, cybersecurity, and policy-making. The central role of Telegram suggests that targeting these communication channels could disrupt hacking activities, while the cohesive structure of the XMR wallet subnetwork highlights the importance of monitoring privacy-focused cryptocurrencies in financial transactions. Additionally, the fragmented nature of most subnetworks (except Telegram) indicates that the Dark Web operates as a collection of specialized, independent communities, which may require tailored strategies for monitoring and intervention. Unexpected findings, such as the overrepresentation of Portuguese in certain categories, call for further investigation. While this may reflect regional trends or data collection biases, it could also indicate emerging patterns of activity in specific linguistic or geographic contexts. These insights can assist policymakers and law enforcement by informing targeted monitoring strategies. For instance, the centrality of Telegram in hacking activities suggests prioritizing surveillance of Telegram channels within Russian-speaking communities, while the cohesive role of XMR wallets in finance–crypto operations highlights the need for enhanced tracking of privacy-focused cryptocurrencies to disrupt financial flows in illicit markets. Similarly, the limited presence of BTC wallets in hacking activities, despite their widespread use, may suggest a shift toward privacy-focused cryptocurrencies like Monero, which offer greater anonymity. Institutions can mitigate these risks through international cooperation, such as joint operations between agencies like Europol and the FBI to share intelligence on Telegram channels and cryptocurrency transactions. For example, high betweenness centrality of Telegram IDs indicates their role as critical communication hubs, suggesting targeted monitoring of these nodes could disrupt coordination of hacking activities. 
Similarly, the cohesive structure of the XMR wallet subnetwork points to key financial nodes, enabling interventions like transaction tracking to curb illicit cryptocurrency flows. This could involve developing advanced monitoring tools for real-time bipartite network analysis and fostering public–private partnerships to counteract the expansion of Dark Web activities.
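To illustrate how betweenness centrality could be used to prioritize nodes for monitoring, the following sketch implements Brandes' algorithm for an unweighted, undirected graph and applies it to a hypothetical five-node chain. The graph, the node names, and the `betweenness` helper are assumptions for illustration; the scores are unnormalized and are not values from the study.

```python
from collections import deque

def betweenness(adjacency):
    """Unnormalized betweenness centrality (Brandes' algorithm), undirected and unweighted."""
    score = dict.fromkeys(adjacency, 0.0)
    for source in adjacency:
        order = []                               # nodes in non-decreasing distance
        preds = {node: [] for node in adjacency}
        sigma = dict.fromkeys(adjacency, 0)      # number of shortest paths from source
        dist = dict.fromkeys(adjacency, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:                  # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:       # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adjacency, 0.0)
        for w in reversed(order):                # accumulate dependencies backwards
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {node: s / 2 for node, s in score.items()}

# Hypothetical chain: id:a - doc:1 - id:hub - doc:2 - id:b
adjacency = {
    "id:a": {"doc:1"},
    "doc:1": {"id:a", "id:hub"},
    "id:hub": {"doc:1", "doc:2"},
    "doc:2": {"id:hub", "id:b"},
    "id:b": {"doc:2"},
}

scores = betweenness(adjacency)
ranked_ids = sorted(
    (node for node in scores if node.startswith("id:")),
    key=scores.get,
    reverse=True,
)
```

Here `id:hub` ranks first because every shortest path between the two ends of the chain passes through it, which is the property that makes high-betweenness IDs candidates for targeted monitoring.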
The broader implications of this study extend to our understanding of the Dark Web ecosystem. The fragmented nature of most subnetworks, combined with the cross-topic interactions in the central connected component, highlights the interconnectedness of different illicit activities. This modular and hierarchical structure suggests that while the Dark Web is composed of specialized communities, there are also areas of overlap and collaboration between them. These insights provide a foundation for future research and practical strategies to monitor and mitigate the risks associated with the Dark Web.
This study has several strengths that contribute to its significance and rigor. First, the use of a bipartite network model provides a unique and comprehensive perspective on the relationships between IDs and web documents, allowing for a detailed analysis of the Dark Web’s structure and dynamics. By modeling the network in this way, the study captures the interplay between communication channels, financial tools, and thematic activities, offering a holistic view of the ecosystem. Second, the large and diverse dataset enables a robust analysis of various ID types and their roles, from Telegram and email to cryptocurrency wallets, ensuring that the findings are grounded in extensive empirical evidence. Third, the application of advanced network analysis techniques, such as k-core decomposition and centrality metrics, allows for the identification of influential and central areas of activity, providing insights into the hierarchical organization of the Dark Web. Finally, the study’s focus on subnetworks by ID type offers a nuanced understanding of the modular and specialized nature of Dark Web activities, highlighting both the diversity and interconnectedness of illicit operations.

Limitations and Future Work

While this study provides valuable insights into the identification and communication patterns on the Dark Web, several limitations must be acknowledged to contextualize the findings and guide future research. One key limitation is data collection bias, as the dataset was collected during a specific timeframe and focused on domains with higher traffic, as determined by an automated crawling system. This approach may have excluded less active or niche sites, potentially skewing the results toward more prominent activities. Consequently, the findings may not fully represent the diversity of the Dark Web ecosystem. Additionally, the dataset exhibits a language and regional bias, with a heavy skew toward English and Russian and limited representation of other languages. While this may reflect the underlying structure of the high-traffic sites of the Dark Web, it could also be a result of the data collection process. For example, the overrepresentation of Portuguese in certain categories may not accurately reflect its prevalence on the Dark Web. This bias limits the generalizability of the findings to communities that use languages other than English or Russian.
Another limitation is the static nature of the dataset, which represents a snapshot of the Dark Web at a specific point in time. Given the dynamic and rapidly evolving nature of the Dark Web, the findings may not reflect current or future trends. Combined with the data collection bias, this means that the study provides only a partial view of the Dark Web’s structure and activities and limits the ability to understand trends or changes in its ecosystem. Furthermore, although the analyzed topics (hacking, finance–crypto, and drugs–narcotics) reflect the most active parts of the Dark Web, the findings may not encompass its full thematic diversity. Less prevalent but still significant activities, such as weapon sales, human trafficking, or political activism, may be underrepresented, potentially skewing the network structure toward dominant themes and limiting insights into niche but critical illicit operations.
This study also faces limitations related to its dependence on automated tools. Although the labeling of domains and documents was performed manually, the study relies on automated tools for data preprocessing, language detection, and network analysis. These tools may introduce errors or biases, such as misclassifying languages or failing to detect certain ID patterns, which could affect the accuracy of the results. Additionally, the study primarily focuses on the structural aspects of the Dark Web network, such as connectivity and centrality, without delving deeply into the content or context of interactions. While this approach provides valuable insights into the network’s organization, it may miss important contextual details about how IDs are used in practice. Scalability issues also pose a challenge, as the study analyzes a large but finite dataset. Future work could extend the methodology by integrating additional data sources, such as real-time social media feeds or dark pool data, and incorporating multilayer or dynamic network models to capture a broader range of Dark Web interactions. Scaling the analysis to the entire Dark Web or to real-time data may pose significant computational and methodological challenges. While scalability is desirable, it may not be feasible with current resources, limiting the applicability of the findings to broader contexts.
Finally, the study may be subject to potential overgeneralization and limited validation of findings. The findings are based on a specific dataset, and assuming that patterns observed in this subset apply to the entire Dark Web may lead to overgeneralization. These limitations highlight the need for caution in interpreting the findings and suggest avenues for future validation. While this study advances our understanding of the Dark Web’s identification and communication patterns, these limitations underscore the complexity of studying this hidden ecosystem. Future research should address these constraints by incorporating temporal analysis, expanding the dataset to include more languages and regions, and exploring the content and context of interactions to complement the structural analysis. Additionally, the study’s validation procedures indicate that extraction errors are limited (F1 ≈ 93.6%), though minor biases persist. The inclusion of smaller language subsets and temporal comparisons suggests that the network’s main patterns are stable and not purely driven by short-term sampling artifacts. Future work should expand on these approaches with multilingual crawling and long-term time-series monitoring.

6. Conclusions

This study provides a comprehensive analysis of the identification and communication patterns within the Dark Web, focusing on the roles of various ID types and their associated activities. By examining the overall network structure, subnetworks, and key metrics, we have uncovered several critical insights into the organization and dynamics of the Dark Web ecosystem. The analysis reveals that Telegram IDs form the backbone of the network, serving as the primary communication tool for coordinating hacking-related activities, particularly within Russian-speaking communities. The centrality of Telegram is evident in the cohesive structure of its subnetwork, which mirrors the overall network’s largest connected component. In contrast, email IDs play a more decentralized role, facilitating a wide range of activities, including finance–crypto and hacking, but with a high level of fragmentation and isolation. This suggests that email is used for specialized or niche purposes rather than large-scale coordination. XMR wallets stand out for their significant role in financial transactions, forming a relatively cohesive subnetwork focused on finance–crypto activities. This highlights the importance of privacy-focused cryptocurrencies in facilitating financial operations on the Dark Web. Other ID types, such as BTC wallets, Discord URLs, and Paste IDs, are associated with highly fragmented subnetworks, reflecting their specialized or supporting roles in the network.
The examination of subnetworks by topic further underscores the modular and hierarchical nature of the Dark Web. While hacking dominates the central clusters, particularly in the Telegram subnetwork, other topics such as finance–crypto, drugs–narcotics, and search-engine-index form distinct clusters, often operating independently. Cross-topic interactions within the central connected components suggest potential collaborations or overlaps between different illicit activities, highlighting the interconnectedness of various operations on the Dark Web. The high level of dispersion observed in most subnetworks, except for Telegram, indicates that much of the activity on the Dark Web occurs in small, isolated groups. This fragmentation reflects the decentralized and specialized nature of many Dark Web operations, where different ID types and topics coexist but operate independently.
In conclusion, while Telegram IDs form the structural backbone of the network, their centrality likely arises from their use as a broadcast and coordination platform rather than one-to-one communication. Email and XMR wallets fulfill distinct, complementary roles—decentralized information exchange and financial transactions, respectively. The integration of temporal, linguistic, and qualitative analyses strengthens the reliability of these findings, providing a robust basis for further research and applied monitoring strategies.

Author Contributions

Conceptualization, L.d.-M. and A.D.-D.; methodology, L.d.-M.; software, A.D.-D., J.J.-S. and C.C.; validation, L.d.-M., A.D.-D. and J.-J.M.-H.; formal analysis, L.d.-M.; investigation, J.J.-S. and C.C.; resources, J.J.-S. and C.C.; data curation, L.d.-M. and A.D.-D.; writing—original draft preparation, L.d.-M., A.D.-D. and J.-J.M.-H.; writing—review and editing, L.d.-M., J.J.-S., and C.C.; visualization, L.d.-M.; supervision, J.-J.M.-H.; project administration, J.-J.M.-H.; funding acquisition, L.d.-M., C.C. and J.-J.M.-H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Spanish Ministry of Science and Innovation MCIN/AEI/10.13039/501100011033/FEDER, EU grant number PID2021-125645OB-I00 (PARCHE).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. They are not publicly available for privacy reasons, as IDs scraped from the Dark Web may contain personal information of users, such as email addresses.

Acknowledgments

ByronLabs supported this work by providing the tools to crawl the Dark Web and the raw data used in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chertoff, M.; Simon, T. The Impact of the Dark Web on Internet Governance and Cyber Security; Global Commission on Internet Governance: Waterloo, ON, Canada, 2025. [Google Scholar]
  2. Soska, K.; Christin, N. Measuring the longitudinal evolution of the online anonymous marketplace ecosystem. In Proceedings of the USENIX Security Symposium, Washington, DC, USA, 12–14 August 2015; pp. 33–48. [Google Scholar]
  3. Weimann, G. Terrorist migration to the Dark Web. Perspect. Terror. 2016, 10, 40–44. [Google Scholar]
  4. De-Marcos, L.; Domínguez-Díaz, A. LLM-Based Topic Modeling for Dark Web Q&A forums: A Comparative Analysis with Traditional Methods. IEEE Access 2025, 13, 67159–67169. [Google Scholar] [CrossRef]
  5. De-Marcos, L.; Domínguez-Díaz, A.; Stapić, Z. What’s Going on in Dark Web Question and Answer Forums: Topic Diversity and Linguistic Characteristics. IEEE Access 2025, 13, 149880–149890. [Google Scholar] [CrossRef]
  6. Kühn, P.; Wittorf, K.; Reuter, C. Navigating the shadows: Manual and semi-automated evaluation of the dark web for cyber threat intelligence. IEEE Access 2024, 12, 118903–118922. [Google Scholar] [CrossRef]
  7. Sangher, K.S.; Singh, A.; Pandey, H.M.; Kumar, V. Towards Safe Cyber Practices: Developing a Proactive Cyber-Threat Intelligence System for Dark Web Forum Content by Identifying Cybercrimes. Information 2023, 14, 349. [Google Scholar] [CrossRef]
  8. Bugajewska, M. A survey of challenges in dark web crawling: Technical, security, and ethical perspective. In Artificial Intelligence and Machine Learning. IBIMA-AI 2024. Communications in Computer and Information Science; Soliman, K.S., Ed.; Springer: Cham, Switzerland, 2025; Volume 2300. [Google Scholar] [CrossRef]
  9. Arabnezhad, E.; La Morgia, M.; Mei, A.; Nemmi, E.N.; Stefa, J. A light in the dark web: Linking dark web aliases to real internet identities. In Proceedings of the 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), Singapore, 29 November–1 December 2020; pp. 311–321. [Google Scholar] [CrossRef]
  10. Manolache, A.; Brad, F.; Barbalau, A.; Ionescu, R.T.; Popescu, M. VeriDark: A large-scale benchmark for authorship verification on the dark web. In Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS ’22); Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates Inc.: Red Hook, NY, USA, 2022; Article 1133; Volume 35, pp. 15574–15588. [Google Scholar]
  11. Nali, M.C.; Purushothaman, V.; Li, Z.; Larsen, M.Z.; Cuomo, R.E.; Yang, J.; Mackey, T.K. Identification and characterization of illegal sales of cannabis and nicotine delivery products on Telegram messaging platform. Nicotine Tob. Res. 2024, 26, 771–779. [Google Scholar] [CrossRef]
  12. La Morgia, M.; Mei, A.; Mongardini, A.M.; Wu, J. Uncovering the dark side of Telegram: Fakes, clones, scams, and conspiracy movements. arXiv 2021, arXiv:2111.13530. [Google Scholar] [CrossRef]
  13. Lee, S.; Yoon, C.; Kang, H.; Kim, Y.; Kim, Y.; Han, D.; Son, S.; Shin, S. Cybercriminal minds: An investigative study of cryptocurrency abuses in the dark web. In Proceedings of the Network and Distributed Systems Security (NDSS) Symposium 2019, San Diego, CA, USA, 24–27 February 2019; pp. 24–27. [Google Scholar] [CrossRef]
  14. Paquet-Clouston, M.; Haslhofer, B.; Dupont, B. Ransomware payments in the Bitcoin ecosystem. J. Cybersecur. 2019, 5, 1–11. [Google Scholar] [CrossRef]
  15. Dearden, T.E.; Tucker, S.E. Follow the money: Analyzing darknet activity using cryptocurrency and the Bitcoin blockchain. J. Contemp. Crim. Justice 2023, 39, 257–275. [Google Scholar] [CrossRef]
  16. Décary-Hétu, D.; Dupont, B. The social network of hackers. Glob. Crime 2012, 13, 160–175. [Google Scholar] [CrossRef]
  17. Takaaki, S.; Atsuo, I. Dark web content analysis and visualization. In Proceedings of the ACM International Workshop on Security and Privacy Analytics (IWSPA ’19), Richardson, TX, USA, 27 March 2019; pp. 53–59. [Google Scholar] [CrossRef]
  18. Cilleruelo, C.; de-Marcos, L.; Junquera-Sanchez, J.; Martinez-Herraiz, J. Interconnection between darknets. IEEE Internet Comput. 2021, 25, 61–70. [Google Scholar] [CrossRef]
  19. Alharbi, A.; Alhassan, M.; Alshahrani, M.; Alzahrani, A. Exploring the topological properties of the Tor dark web. IEEE Access 2021, 9, 21746–21758. [Google Scholar] [CrossRef]
  20. Borgatti, S.P.; Everett, M.G.; Johnson, J.C. Analyzing Social Networks; SAGE Publications: Thousand Oaks, CA, USA, 2018. [Google Scholar]
  21. Javed, M.S.; Sajjad, S.M.; Mehmood, D.; Mansoor, K.; Iqbal, Z.; Kazim, M.; Muhammad, Z. Analyzing Tor Browser Artifacts for Enhanced Web Forensics, Anonymity, Cybersecurity, and Privacy in Windows-Based Systems. Information 2024, 15, 495. [Google Scholar] [CrossRef]
  22. Figueras-Martín, E.; Magán-Carrión, R.; Boubeta-Puig, J. Drawing the web structure and content analysis beyond the Tor darknet: Freenet as a case of study. J. Inf. Secur. Appl. 2022, 66, 103229. [Google Scholar] [CrossRef]
  23. Hiramoto, N.; Tsuchiya, Y. Measuring dark web marketplaces via Bitcoin transactions: From birth to independence. Forensic Sci. Int. Digit. Investig. 2020, 35, 301086. [Google Scholar] [CrossRef]
  24. Tsuchiya, Y.; Hiramoto, N. Dark web in the dark: Investigating when transactions take place on cryptomarkets. Forensic Sci. Int. Digit. Investig. 2021, 36, 301093. [Google Scholar] [CrossRef]
  25. Almukaynizi, M.; Grimm, A.; Nunes, E.; Shakarian, J.; Shakarian, P. Predicting cyber threats through hacker social networks in darkweb and deepweb forums. In Proceedings of the CSS ’17, CSSSA’s Annual Conference on Computational Social Science, New York, NY, USA, 19–22 October 2017; pp. 1–10. [Google Scholar] [CrossRef]
  26. Sarkar, S.; Almukaynizi, M.; Shakarian, J.; Shakarian, P. Predicting enterprise cyber incidents using social network analysis on dark web hacker forums. Cyber Def. Rev. 2019, 4, 87–102. [Google Scholar]
  27. Dalins, J.; Wilson, C.; Carman, M. Criminal motivation on the dark web: A categorisation model for law enforcement. Digit. Investig. 2018, 24, 62–71. [Google Scholar] [CrossRef]
  28. Dolejška, D.; Koutenský, M.; Veselý, V.; Pluskal, J. Busting up monopoly: Methods for modern darknet marketplace forensics. Forensic Sci. Int. Digit. Investig. 2023, 46, 301604. [Google Scholar] [CrossRef]
  29. Jin, P.; Kim, N.; Lee, S.; Jeong, D. Forensic investigation of the dark web on the Tor network: Pathway toward the surface web. Int. J. Inf. Secur. 2024, 23, 331–346. [Google Scholar] [CrossRef]
  30. Bastian, M.; Heymann, S.; Jacomy, M. Gephi: An open source software for exploring and manipulating networks. In Proceedings of the International AAAI Conference on Web and Social Media, San Jose, CA, USA, 17–20 May 2020; Volume 3, pp. 361–362. [Google Scholar]
  31. Jacomy, M.; Venturini, T.; Heymann, S.; Bastian, M. ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. PLoS ONE 2014, 9, e98679. [Google Scholar] [CrossRef]
  32. Watts, D.J. Networks, dynamics, and the small-world phenomenon. Am. J. Sociol. 1999, 105, 493–527. [Google Scholar] [CrossRef]
  33. Phillips, E.; Nurse, J.R.C.; Goldsmith, M.; Creese, S. Extracting social structure from darkweb forums. In Proceedings of the International Conference on Cyber Security for Sustainable Society, Coventry, UK, 26–27 February 2015; pp. 11–27. [Google Scholar]
  34. L’Huillier, B.; Ríos, S.A.; Alvarez, H.; Aguilera, F. Topic-based social network analysis for virtual communities of interests in the dark web. In Proceedings of the ACM SIGKDD Workshop on Intelligence and Security Informatics ACM, Washington, DC, USA, 25–28 July 2010; pp. 9:1–9:9. [Google Scholar] [CrossRef]
  35. Pete, I.; Hughes, J.; Chua, Y.T.; Bada, M. A social network analysis and comparison of six dark web forums. In Proceedings of the 2020 IEEE European Symposium on Security and Privacy Workshops (EuroSPW), Genoa, Italy, 7–11 September 2020; pp. 484–493. [Google Scholar] [CrossRef]
  36. Zhang, Y.; Zeng, S.; Fan, L.; Dang, Y.; Larson, C.A.; Chen, H. Dark web forums portal: Searching and analyzing jihadist forums. In Proceedings of the 2009 IEEE International Conference on Intelligence and Security Informatics, Dallas, TX, USA, 8–11 June 2009; pp. 71–76. [Google Scholar]
Figure 1. Bipartite graph representing the network of IDs and web documents colored by type of ID. Web documents are shown in gray. The size of nodes is proportional to the degree.
Figure 2. Bipartite graph of the main connected component for the network of IDs and web documents (shown in gray). The size of nodes is proportional to the degree.
Figure 3. Bipartite graph of the second largest connected component for the network of IDs and web documents. IDs are colored by type. Documents are colored in gray. The size of nodes is proportional to the degree. Labels of the two Monero wallets have been shortened for graph readability.
Figure 4. Bipartite graph of the 5-core for the network of IDs and web documents. IDs are colored by type. Documents are colored in gray. The size of nodes is proportional to the degree.
Figure 5. A graph of the bipartite subnetwork representing only the Telegram IDs and the web documents connected to them colored by topic. IDs are colored in gray. The size of nodes is proportional to the degree.
Figure 6. A graph of the bipartite subnetwork representing only the email IDs and the web documents connected to them colored by topic. The size of nodes is proportional to the degree.
Table 1. Distribution of crawled Tor domains in the original dataset and number of initial web documents under each topic.

| Topic | #Domains | #Documents |
|---|---|---|
| hacking | 8 | 155,489 |
| search-engine-index | 14 | 33,541 |
| finance–crypto | 4 | 23,088 |
| drugs–narcotics | 6 | 8551 |
| others | 4 | 5145 |
| finance | 2 | 3189 |
| electronics | 1 | 232 |
| Total | 53 | 229,235 |
Table 2. Types of IDs and the documents connected to them by main topic and main language.

| ID Type | #IDs | #Documents | Main Topic | # | Main Language | # |
|---|---|---|---|---|---|---|
| e-mail | 43,298 | 29,735 | finance–crypto | 17,640 | English | 22,385 |
| Telegram | 11,218 | 76,967 | hacking | 55,482 | Russian | 50,416 |
| Paste | 860 | 623 | hacking | 589 | Russian | 495 |
| PGP | 745 | 895 | search-engine-index | 856 | English | 860 |
| Phone | 531 | 807 | hacking | 656 | Russian | 560 |
| BTC wallet | 292 | 1944 | other | 689 | English | 1106 |
| Discord URL | 97 | 260 | hacking | 217 | English | 199 |
| XMR wallet | 20 | 17,644 | finance–crypto | 17,597 | English | 17,663 |
| Skype URL | 6 | 13 | hacking | 13 | Russian | 13 |
| DASH wallet | 2 | 4 | hacking | 2 | English/Russian | 2 |
| BNB wallet | 1 | 2 | hacking | 2 | English/Bulgarian | 1 |
| ZEC wallet | 1 | 1 | hacking | 1 | Russian | 1 |
| Overall network | 57,071 | 82,285 | hacking | 57,223 | Russian | 50,852 |
Table 3. Top 3 topics and their languages for each of the top referenced ID types.

| ID Type | Topic | #Documents | Language | #Documents |
|---|---|---|---|---|
| Telegram | hacking | 55,482 | Russian | 50,411 |
| | finance–crypto | 17,614 | English | 17,614 |
| | drugs–narcotics | 1671 | English | 1614 |
| e-mail | finance–crypto | 17,640 | English | 17,640 |
| | hacking | 9347 | Russian | 7129 |
| | search-engine-index | 1378 | English | 1357 |
| XMR wallet | finance–crypto | 17,597 | English | 17,597 |
| | search-engine-index | 30 | English | 30 |
| | hacking | 14 | Russian | 11 |
| BTC wallet | other | 689 | Portuguese | 641 |
| | drugs–narcotics | 355 | English | 355 |
| | hacking | 308 | Russian | 182 |

Note: Top 3 topics for each ID type presenting more than 1000 references in all web documents.
Table 4. Metrics of subnetworks by ID type.

| Subcategory | #Nodes | #Edges | #Documents | #IDs | Connected Components | Size Largest Component | Density | Avg. Path Length | Diameter | Avg. Degree | Avg. Closeness | Avg. Betweenness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BTC Wallet | 2236 | 2024 | 1944 | 292 | 224 | 644 (28.8%) | 0.004 | 1.11 | 7 | 0.004 | 0.598 | <0.001 |
| Discord URL | 357 | 314 | 260 | 97 | 66 | 71 (19.9%) | 0.013 | 1.68 | 5 | 0.012 | 0.657 | <0.001 |
| Email | 73,033 | 83,464 | 29,735 | 43,298 | 3044 | 17,649 (24.2%) | <0.001 | 3.35 | 32 | <0.001 | 0.446 | <0.001 |
| Paste | 1483 | 1681 | 623 | 860 | 289 | 60 (4.0%) | 0.003 | 2.31 | 9 | 0.003 | 0.699 | <0.001 |
| PGP | 1640 | 1009 | 895 | 745 | 670 | 59 (3.6%) | 0.002 | 1.23 | 5 | 0.002 | 0.906 | <0.001 |
| Phone | 1338 | 1159 | 807 | 531 | 330 | 55 (4.1%) | 0.003 | 1.68 | 5 | 0.003 | 0.745 | <0.001 |
| Telegram | 88,185 | 141,476 | 76,967 | 11,218 | 621 | 66,201 (75.1%) | <0.001 | 5.23 | 28 | <0.001 | 0.408 | <0.001 |
| XMR Wallet | 17,664 | 17,644 | 17,644 | 20 | 20 | 9204 (52.1%) | 0.050 | 1 | 1 | 0.050 | 0.501 | <0.001 |
| Overall network | 139,356 | 248,791 | 82,285 | 57,071 | 1848 | 106,924 (76.72%) | <0.001 | 8.58 | 26 | <0.001 | 0.401 | <0.001 |

Note: Each entry reports the subgraph containing only the nodes of the IDs’ type and all the documents connected to them. Only subgraphs with more than 200 nodes are reported.
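The subgraph construction described in the note to Table 4 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: it assumes the network is available as a list of (ID, document) pairs plus a mapping from each ID to its type, and the function names (`subnetwork_by_type`, `connected_components`, `summarize`) and the toy data are hypothetical. It uses only the Python standard library and covers the counting columns of Table 4, not the centrality metrics.

```python
from collections import defaultdict, deque


def subnetwork_by_type(edges, id_types, wanted_type):
    """Keep only the (id, document) edges whose ID has the requested type."""
    return [(i, d) for i, d in edges if id_types[i] == wanted_type]


def connected_components(edges):
    """Return the connected components of a bipartite edge list as node sets."""
    adjacency = defaultdict(set)
    for id_node, doc_node in edges:
        # Tag each node with its side so an ID and a document never collide.
        a, b = ("id", id_node), ("doc", doc_node)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:  # breadth-first search from an unvisited node
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def summarize(edges):
    """Counting metrics analogous to the first columns of Table 4."""
    ids = {i for i, _ in edges}
    docs = {d for _, d in edges}
    components = connected_components(edges)
    return {
        "nodes": len(ids) + len(docs),
        "edges": len(set(edges)),
        "documents": len(docs),
        "ids": len(ids),
        "components": len(components),
        "largest": max((len(c) for c in components), default=0),
    }


# Toy data (hypothetical): three Telegram handles and one email address.
edges = [
    ("@chan_a", "doc1"), ("@chan_a", "doc2"), ("@chan_b", "doc2"),
    ("x@example.org", "doc3"), ("@chan_c", "doc4"),
]
id_types = {
    "@chan_a": "telegram", "@chan_b": "telegram",
    "@chan_c": "telegram", "x@example.org": "email",
}

telegram_summary = summarize(subnetwork_by_type(edges, id_types, "telegram"))
print(telegram_summary)
```

Because each subgraph keeps only the IDs of one type and the documents linked to them, a document that mentions both a Telegram handle and an email address appears in both subnetworks, but the two are not joined through it.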
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
