Deep Learning-Based Community Detection Approach on Bitcoin Network

Essaid, Meryam; Ju, Hongteak

doi:10.3390/systems10060203

Deep Learning-Based Community Detection Approach on Bitcoin Network †

by Meryam Essaid ¹,* and Hongteak Ju ²

¹ Department of Robotics Engineering, Keimyung University, Daegu 42601, Korea

² Department of Computer Engineering, Keimyung University, Daegu 42601, Korea

* Author to whom correspondence should be addressed.

† This paper is an extended version of our paper published in Essaid, M.; Park, S.; Ju, H. Visualising Bitcoin’s Dynamic P2P Network Topology and Performance. In Proceedings of the 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), Seoul, Korea, 15–17 May 2019; pp. 141–145.

Systems 2022, 10(6), 203; https://doi.org/10.3390/systems10060203

Submission received: 30 September 2022 / Revised: 24 October 2022 / Accepted: 27 October 2022 / Published: 1 November 2022

Abstract

Community detection is essential in P2P network analysis as it helps identify connectivity structure, undesired centralization, and influential nodes. Existing methods primarily utilize topological data and neglect the rich content data. This paper proposes a technique combining topological and content data to detect communities inside the Bitcoin network using a deep feature representation algorithm and Deep Feedforward Autoencoders. Our results show that the Bitcoin network has a higher clustering coefficient, assortativity coefficient, and community structure than expected from a random P2P network. In the Bitcoin network, nodes prefer to connect to other nodes that share the same characteristics.

Keywords:

bitcoin; network topology; community detection; deep learning; deep autoencoders

1. Introduction

Bitcoin has recently attracted considerable attention because it combines cryptography [1,2], an e-money system [3], and blockchain [4,5]. Bitcoin propagates the balance records, known as the ledger, among its actors. Bitcoin actors communicate directly over TCP in one of the largest flooding P2P networks. Because of this distributed nature, the behavior of users determines the overall performance of the Bitcoin system. Community detection is a crucial area of research for understanding distributed networks. It seeks to discover small groups, called communities, within P2P networks and to measure how frequently the members of a given community connect to one another. Understanding the Bitcoin network structure is critical to the robustness and security of public blockchain networks.

A community is a group of nodes that are more densely connected to one another than to the rest of the network; a partition into such groups has high modularity. Community detection algorithms are used to examine how nodes are grouped, what characteristics the groups have, and whether they tend to strengthen or split apart in a given network. Popular community detection algorithms with low runtime complexity, such as the Louvain, Label Propagation, and Infomap methods, have been implemented and compared on peer-to-peer (P2P) networks. Existing methods [6,7,8,9,10,11,12] mainly use topological data and neglect the rich information available in content data. As the size and complexity of P2P networks increase, more sophisticated techniques are needed to detect communities. In [9], the authors proposed a method that monitors the connections of known nodes in the network and then progressively discovers other nodes by analyzing their mutual contacts, instead of relying on content characteristics or packet properties. In [10], the authors proposed a Decentralized Iterative Community Clustering Approach (DICCA) to reveal the community structure of large networks using the LFR benchmark model. The method identifies community clusters across an entire network without global knowledge of the network topology through the Parallel Decentralized Iterative Community Clustering Approach (PDICCA), a pipelined parallel implementation that transforms the serial process of DICCA into a parallelized one. Recent research [13,14,15,16,17,18,19,20] has shown that content data also helps measure the similarity between nodes. Nodes with similar content are highly likely to belong to the same community. Hence, combining topological data with content data could substantially benefit community detection. There are primarily two methods for fusing different elements in the network. 
Firstly, the Linear Incorporating Method (LIM) assumes that all components can be joined linearly and links the various aspects through linear regulators. Ruan et al. [13] proposed Community Discovery Inferred from Content Information and Link-structure (CODICIL), a linear approach that combines topology and content data into a newly generated network. CODICIL then computes node clusters by applying standard graph clustering algorithms to the new network. Secondly, the Probabilistic Incorporation Method (PIM) expresses a subjective preference for developing one piece of information by integrating different information sources. Yang et al. [14] proposed a stochastic model, Communities from Edge Structure and Node Attributes (CESNA). CESNA connects topology and node attributes through sparsity constraints to learn the relation between node attributes and communities. They also adopted the block coordinate descent method to optimize the model parameters. Bhih et al. [20] proposed a new community detection method that considers both topology and content sources of information. The algorithm tightly integrates the network’s attribute information, shared neighbors, and connectivity information to build a hybrid similarity matrix. The generated matrix is then used to cluster the network into possible subnetworks.

Nonetheless, the approaches above have several drawbacks. They cannot reveal latent links between nodes in the network. Furthermore, they cannot automatically weigh different features with suitably balanced parameters. To overcome these challenges, we propose a method to measure the structure of the Bitcoin network and obtain a deeper view of its divisions, also called communities. The proposed method combines topology and content data using a Deep Learning (DL) algorithm, which can capture deep non-linear relationships and is built on Deep Feedforward Autoencoders [21] and spectral clustering [22]. We designed each element of the proposed DL model for a particular purpose. (1) To create the Modularity Matrix (B), we used the Modularity Model (Q) to cluster topology data. (2) To generate the Markov matrix (M), we used the normalized-cut model (NCM) to cluster content data. (3) We applied spectral clustering (SC) to find the low-rank embedding used to reconstruct the spectral matrix. (4) To reproduce the input matrix, we used the Deep Feature Representations (DFR) algorithm to combine the modularity and Markov matrices. (5) We adopted Deep Feedforward Autoencoders (DFAE) to combine the different parameters of each node into a deep representation of the nodes.

1.1. Contribution

Through this research, we aim to: (1) Provide a quantitative examination of network properties and dynamicity in Bitcoin. The proposed method scans the network and tracks the nodes’ connectivity over time. Nodes in Bitcoin can be classified as reachable or unreachable; over 30% of the active nodes are unreachable (nodes without inbound peers). While 80% of reachable nodes churn heavily, 16% of active nodes are well connected and present across all the measurement periods. (2) Expand the current knowledge of Bitcoin, mainly its P2P structure. Based on the total number of a node’s inbound and outbound connections, active reachable Bitcoin nodes can be divided into three categories: light nodes, with a connection count below the default; medium nodes, with a connection count equal to or above the default; and heavy nodes, whose peer count approaches or exceeds 100. Using the proposed method, we observed that the Bitcoin topology is centrally dense. The Bitcoin network shows more community structure than random graph networks. Thus, we conclude that the Bitcoin topology is not a random graph; nodes establish links with similar nodes more often than they would in a random network. (3) Offer a quantitative review of community structure to better understand the underlying Bitcoin network. Discovering and understanding the topology of unstructured blockchain networks is essential from various perspectives, including performance and security. Knowledge of the network properties is a basis for future protocol designs that improve information propagation speed, scale up the network size, and defend against possible malicious attacks (e.g., eclipse attacks and BGP hijacks). A deeper understanding of the topology also supports the development of new practical algorithms. 
The Bitcoin topology relies heavily on heavy and long-running nodes; using these nodes in the propagation mechanism could improve the propagation delay by approximately 25 times compared to the default Bitcoin protocol. However, revealing information about these nodes can threaten the Bitcoin ecosystem; for instance, an attacker could target heavy and long-running nodes through the Internet routing infrastructure, using BGP hijacks or traffic interception, and autonomous systems could intercept and manipulate a significant fraction of Bitcoin traffic.

1.2. Structure of the Paper

The paper is organized as follows. Section 2 and Section 3 introduce the Bitcoin network and the preliminary deep learning model, respectively. Section 4 reports the experiment and the results of the Bitcoin community structure analysis. Section 5 discusses how our findings could be used to enhance network vitality and stability by analyzing network resilience and security and by improving the propagation mechanism. Finally, Section 6 summarizes the work and presents future directions.

2. Bitcoin & Blockchain Background

The Bitcoin network is designed as a peer-to-peer (P2P) architecture on top of the Internet, where computers that run the Bitcoin protocol are peered to each other in a mesh network, forming a flat and decentralized topology. In a P2P network, nodes simultaneously provide and consume services. Within the P2P network, there are no central servers, no special nodes, and no hierarchy; all nodes are equal. An outstanding example of a P2P network architecture was the early Internet, where nodes were equal on the Internet Protocol (IP) network, forming a flat topology. Nowadays, the Internet relies on a centralized, hierarchical architecture. However, the Internet Protocol is still P2P by nature. Another example of P2P technology is file sharing, with BitTorrent as a recent evolution of the P2P architecture.

Bitcoin is a virtual currency rooted in a distributed peer-to-peer network. In simple words, Bitcoin neither has nor requires a central server or authority. Users running Bitcoin communicate directly with each other in one of the largest P2P networks over the Internet. Through this distributed network, the behavior of its clients determines the overall performance of the Bitcoin system. Thus, Bitcoin’s P2P topology is much more than a network architecture choice. The P2P network architecture is the cornerstone of decentralization, security, and transparency.

Nodes are a significant part of the Bitcoin protocol. When an actor joins the Bitcoin network to exchange (receive or send) bitcoin, the computer used acts as a node. There are various types of nodes. On the one hand, many Bitcoin nodes are light: they download only the most recent block data (ledger) required to process and verify newly generated transactions, which lets them run the Bitcoin client quickly and efficiently without requiring extra storage or computational resources. On the other hand, full nodes store a complete copy of the ledger. These nodes download all Bitcoin blocks and transactions from the Genesis Block (block 0) up to the most recent block. Lastly, miner nodes verify and validate all newly generated transactions and include them in the blockchain in the form of blocks; by doing so, new bitcoins are minted, and miner nodes receive rewards as incentives for participating in the consensus process. A Bitcoin node can have one or many functions depending on its role in the network. All nodes have the routing function, meaning they can discover and connect to other nodes and propagate and validate transactions and blocks. Some nodes also have a wallet function, a miner function, a copy of the entire blockchain database, or run different protocols.

In the Bitcoin network [23,24,25], nodes (actors) hold the bitcoin currency (BTC), which is secured by SHA-256, a cryptographic hash function that is applied twice and must be brute-forced during mining. Participating nodes agree on a standard protocol to exchange the tokenized value while maintaining an identical copy of the ledger. To prevent the forging of bitcoins (BTC), the Bitcoin system [26] requires proof of work [27] from the miners involved. Any newly mined block must include a hash value that begins with a series of zeros, whose required length changes regularly, to make coin mining computationally hard. Besides the mining process, the Bitcoin protocol offers a peer discovery mechanism to ensure node connectivity, stability, and homogeneity of ledger replicas. The ledger and its changes are publicly available to all nodes in the network, enabling them to form the most current state of the system through majority consensus [28]. To join the Bitcoin network, actors can run different implementations of the P2P protocol [29,30,31,32,33]. Once a node joins the P2P network, it uses the peer discovery mechanism built into the protocol to learn about other active nodes. Address gossiping allows nodes to discover other potentially active nodes. By default, nodes in the Bitcoin network can establish up to 125 connections. Among these, 117 connections are inbound links, and eight connections are outgoing links.

Bitcoin nodes use a flooding P2P broadcasting method to propagate blocks and transactions to the network participants. It is relatively straightforward to interact with active nodes in the network. Connected nodes exchange their network view (ledger status) by sending newly received or created transactions and blocks. Freshly connected nodes begin with a DNS lookup against predefined servers that maintain records of several known-active nodes. Then, the new nodes establish outbound connections with the known-active nodes they received to either initialize or update the ledger. The nodes obey rules that support the consensus and influence network performance. Nodes only broadcast valid data to avoid endless broadcasting and forged data. The process of validating and rebroadcasting new data repeats until all the participating nodes in the network have updated their ledgers. However, nodes can ban other nodes for 24 h if they misbehave, for example by sending invalid data.
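
The flooding behavior described above can be illustrated with a minimal sketch. It assumes a hypothetical five-node overlay and models only the rule that a node forwards data it has not seen before; validation, banning, and network delays are deliberately left out, and the names `flood` and `overlay` are illustrative rather than part of any Bitcoin client.

```python
from collections import deque


def flood(adjacency, origin):
    """Return the hop count at which each node first receives a message."""
    received_at = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for peer in adjacency.get(node, ()):
            # A node relays only data it has not already received,
            # which is what keeps the flood from looping forever.
            if peer not in received_at:
                received_at[peer] = received_at[node] + 1
                queue.append(peer)
    return received_at


# Hypothetical overlay; every link is listed in both directions.
overlay = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
hops = flood(overlay, "a")
```

In this toy overlay the message needs three hops to reach the most distant node, which shows why well-connected nodes have a large effect on propagation delay.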

3. Proposed Methodology

In this section, we give an overview of the community detection [34,35] deep learning model, model basics, and model structure.

3.1. Deep Learning Model Preliminary

Consider a distributed network $G$ with a total of $n$ nodes and the edge matrix $A = [a_{ij}] \in \mathbb{R}^{n \times n}$, where $a_{ij}$ equals one if the nodes $i$ and $j$ are connected and zero otherwise. The degree of node $i$ is given by $\xi_{i} = \sum_{j} a_{ij}$, while $m = \frac{1}{2} \sum_{i} \xi_{i}$ represents the overall number of edges in the network. The cosine similarity between the nodes $i$ and $j$ is denoted by $o_{ij}$, while $O = [o_{ij}] \in \mathbb{R}^{n \times n}$ is the similarity matrix. In this paper, we used the N-Nearest Neighbors (NN) method to define the similarity matrix $O$.

In the proposed NN model, we kept $\eta$ adjacent edges (nearest neighbors) for each node and set all other values to zero. As a result, we obtained the sparse similarity matrix $S = [s_{ij}] \in \mathbb{R}^{n \times n}$. Considering two communities (for simplification purposes), to optimise the modularity $Q$, we redefined it as follows:

$$Q = \frac{1}{4m} \sum_{ij} \left( a_{ij} - \frac{\xi_{i} \xi_{j}}{2m} \right) \psi_{i} \psi_{j} \tag{1}$$

where $\psi$ is the community membership indicator ($\psi_{i} = 1$ if node $i$ is part of the first community and $\psi_{i} = -1$ if it is part of the second one). We also optimised the modularity $Q$ using eigenvectors and eigenvalues. With the modularity matrix $B = [b_{ij}] \in \mathbb{R}^{n \times n}$, where $b_{ij} = a_{ij} - \frac{\xi_{i} \xi_{j}}{2m}$, $Q$ becomes:

$$Q = \frac{1}{4m} \psi^{T} B \psi \tag{2}$$

$$\psi = [\psi_{ij}] \in \{-1, 1\} \tag{3}$$
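
As a worked illustration of Equations (1) and (2), the following sketch builds the modularity matrix $B$ and evaluates $Q$ for a two-community split. The toy graph (two triangles joined by one edge) and the function names are assumptions made for illustration only; they are not the Bitcoin dataset or the authors' implementation.

```python
def modularity_matrix(adj):
    """B with b_ij = a_ij - xi_i * xi_j / (2m), for a symmetric 0/1 matrix."""
    n = len(adj)
    degree = [sum(row) for row in adj]   # xi_i
    two_m = sum(degree)                  # 2m
    return [
        [adj[i][j] - degree[i] * degree[j] / two_m for j in range(n)]
        for i in range(n)
    ]


def two_way_modularity(adj, psi):
    """Q = psi^T B psi / (4m) for a +1/-1 membership vector psi."""
    n = len(adj)
    m = sum(sum(row) for row in adj) / 2
    b = modularity_matrix(adj)
    quadratic = sum(psi[i] * b[i][j] * psi[j] for i in range(n) for j in range(n))
    return quadratic / (4 * m)


# Toy graph: triangle {0, 1, 2}, triangle {3, 4, 5}, bridge edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adjacency[u][v] = adjacency[v][u] = 1

q_good = two_way_modularity(adjacency, [1, 1, 1, -1, -1, -1])
q_bad = two_way_modularity(adjacency, [1, -1, 1, -1, 1, -1])
```

Splitting the graph along the bridge gives $Q = 5/14 \approx 0.357$, while the alternating assignment scores lower, which matches the intuition that $Q$ rewards densely connected groups.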

To avoid the NP-hardness of this problem, we relaxed the variable $\psi_{ij}$ to take real values, using the following equation:

$$\max Q = \max_{\psi \in \mathbb{R}^{n \times n}} Tr(\psi^{T} B \psi) \tag{4}$$

where $Tr$ denotes the trace function. For clustering the content graph, we used the Normalized-Cut Method (NCM), which measures the cost of a cut relative to the actual connections within each cluster. The objective function is as follows:

$$N_{cut}(C_{1}, C_{2}, C_{3}, \dots, C_{k}) = \sum_{t=1}^{k} \frac{link(C_{t}, \bar{C_{t}})}{vol(C_{t})} \tag{5}$$

where $link(C_{t}, \bar{C_{t}})$ denotes the total links, based on content similarities, from the nodes in $C_{t}$ to all nodes not in $C_{t}$ (that is, $\bar{C_{t}}$), and $vol(C_{t})$ is the total count of inbound links within $C_{t}$. If node $i$ is denoted by $v_{i}$, the objective function must be minimised as follows:

$$\min_{\Phi \in \mathbb{R}^{n \times n}} Tr(\Phi^{T} L \Phi) \tag{6}$$

$$L = D - S, \quad \text{where } D = diag(d_{1}, d_{2}, d_{3}, \dots, d_{n}) \tag{7}$$

where $L$ is the Laplacian matrix of the similarity graph, which is normalized as $D^{-1} L = I - M$; $I$ denotes the identity matrix, and $M = D^{-1} S$ represents the Markov matrix. The solution matrix $\Phi = [\phi_{it}]$ comprises the eigenvectors of the $k$ smallest non-zero eigenvalues of the normalized Laplacian matrix $D^{-1} L$, with $\phi_{it} = \frac{1}{\sqrt{vol(C_{t})}}$ if $v_{i} \in C_{t}$ and $\phi_{it} = 0$ otherwise.
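
The construction of the sparse similarity matrix $S$ and the Markov matrix $M = D^{-1} S$ can be sketched as follows. This is a minimal illustration under stated assumptions: the feature vectors are hypothetical and non-negative, $d_{i}$ is taken to be the row sum of $S$, and $S$ is left unsymmetrized after the nearest-neighbor step, which a real pipeline might handle differently.

```python
import math


def cosine(u, v):
    """Cosine similarity o_ij between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sparse_similarity(features, eta):
    """Keep, for each node, only its eta most similar nodes; zero elsewhere."""
    n = len(features)
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ranked = sorted(
            ((cosine(features[i], features[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for score, j in ranked[:eta]:
            s[i][j] = score
    return s


def markov_matrix(s):
    """M = D^{-1} S, assuming d_i is the row sum of S."""
    rows = []
    for row in s:
        d = sum(row)
        rows.append([value / d if d else 0.0 for value in row])
    return rows


# Hypothetical content features for four nodes (two similar pairs).
features = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
S = sparse_similarity(features, eta=1)
M = markov_matrix(S)
```

Each row of $M$ sums to one, so $M$ can be read as the transition matrix of a random walk over the content-similarity graph.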

3.2. Model Description

As mentioned in the previous section, the Modularity Model (Q) represents the topology-based clusters, and the Normalized-Cut Model (NCM) represents the content-based clusters; both can be viewed as reproducing a matrix, namely the modularity matrix B and the Markov matrix M, respectively. Thus, applying Deep Feedforward Autoencoders (DFAE) helps combine network topology and content data in a non-linear fashion and automatically learns the balance between the different data sources, avoiding the need to tune a combination coefficient. The DFAE is the critical building element in our model. Finally, spectral clustering (SC) is used to find a low-rank embedding to reconstruct the spectral matrix. The proposed DFAE can support multiple layers to obtain additional features and a more accurate representation from the deep structures.

For a given network with adjacency matrix A, we constructed the modularity matrix B. Then, for each node, we selected its η closest nodes in the similarity matrix O using the Nearest Neighbors (NN) algorithm, resulting in the matrix S. From the matrix S, we built the Markov matrix M. The mixed spectral matrix $X = [B, M]$ was fed to the auto-encoder, which is split into encoding and decoding layers. The encoding layers map the input features (the matrix $X$) to a lower-dimensional matrix $H$ at the hidden layer ($h$). The function used to map the matrix $X$ into the matrix $H$ is as follows:

$$H = f(X) = \Gamma(W^{(H)} X + b^{(H)}) \tag{8}$$

where $W^{(H)}$ and $b^{(H)}$ are the parameters of the encoder, and $\Gamma$ is a non-linear function (such as the sigmoid $\sigma(x) = \frac{1}{1 + \exp(-x)}$). The decoder layers map $H$ back to a reconstruction $Y$ of the original data $X$ as follows:

$$Y = g(H) = \Gamma(W^{(Y)} H + b^{(Y)}) \tag{9}$$

where $W^{(Y)}$ and $b^{(Y)}$ are the parameters learned in the decoder, and $\Gamma$ is the decoding layers’ mapping function (sigmoid function). The suggested DFAE learns a representation of the input features by minimizing the reconstruction loss between the original data and its reconstruction. Therefore, the optimization problem is defined as follows:

$$\hat{\theta} = \arg\min_{\theta} L_{\theta}(X, Y) = \arg\min_{\theta} \sum_{i=1}^{n} \| x_{i} - y_{i} \|_{2} \tag{10}$$

where $\hat{\theta}$ is the parameter vector of the auto-encoder, and $L_{\theta}$ denotes the Euclidean distance loss function used to measure the error. To minimize the objective function $J(\theta)$ given in Equation (11), we randomly initialize each parameter and use backpropagation with stochastic gradient descent. After each iteration, we update the parameters $\theta = \{W^{(H)}, b^{(H)}, W^{(Y)}, b^{(Y)}\}$ according to Equations (12) and (13):

$$J(\theta) = \sum_{i=1}^{n} \| x_{i} - y_{i} \|_{2} \tag{11}$$

$$W_{ji}^{*} = W_{ji}^{*} - \alpha \frac{\partial}{\partial W_{ji}^{*}} J(\theta) \tag{12}$$

$$b_{j}^{*} = b_{j}^{*} - \alpha \frac{\partial}{\partial b_{j}^{*}} J(\theta) \tag{13}$$

where $\alpha$ is the learning rate and the wildcard $* \in \{H, Y\}$. By defining $z^{*} = W^{*} x + b^{*}$, we obtain:

$$\frac{\partial}{\partial W_{ji}^{*}} J(\theta) = \sum_{i=1}^{n} \frac{\partial J(\theta)}{\partial z_{j}^{*}} \cdot \frac{\partial z_{j}^{*}}{\partial W_{ji}^{*}} = \sum_{i=1}^{n} \delta_{j}^{*} x_{i}^{T} \tag{14}$$

$$\frac{\partial}{\partial b_{j}^{*}} J(\theta) = \sum_{i=1}^{n} \frac{\partial J(\theta)}{\partial z_{j}^{*}} \cdot \frac{\partial z_{j}^{*}}{\partial b_{j}^{*}} = \sum_{i=1}^{n} \delta_{j}^{*} \tag{15}$$

where $\delta_{j}^{*} = \frac{\partial J(\theta)}{\partial z_{j}^{*}}$ measures the error between the activation and the actual target value, and $s'$ denotes the derivative of the activation function. The error is defined by:

$$\delta_{j}^{(Y)} = - \sum_{i=1}^{n} (x_{ij} - y_{ij}) \cdot s'(z_{j}^{(Y)}) \tag{16}$$

$$\delta_{j}^{(H)} = \left( \sum_{i=1}^{n} W_{ij}^{(Y)} \delta_{i}^{(Y)} \right) \cdot s'(z_{j}^{(H)}) \tag{17}$$
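
The encoder, decoder, and gradient updates of Equations (8), (9), and (12) to (17) can be sketched for a single hidden layer as follows. This is a minimal illustration, not the authors' nine-layer model: it assumes a squared Euclidean loss (which adds a factor of 2 to the output error), sigmoid activations in both layers, and a toy one-hot input standing in for $X = [B, M]$. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_in, n_hidden, rng):
    """Randomly initialise theta = {W_H, b_H, W_Y, b_Y} with small weights."""
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w_y = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    return w_h, [0.0] * n_hidden, w_y, [0.0] * n_in


def forward(x, params):
    """Encoder (Eq. 8) followed by decoder (Eq. 9) for one input row x."""
    w_h, b_h, w_y, b_y = params
    h = [sigmoid(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w_h, b_h)]
    y = [sigmoid(sum(w * v for w, v in zip(row, h)) + b) for row, b in zip(w_y, b_y)]
    return h, y


def reconstruction_loss(data, params):
    """Sum of squared reconstruction errors over all rows (assumed loss)."""
    return sum(
        sum((xi - yi) ** 2 for xi, yi in zip(x, forward(x, params)[1])) for x in data
    )


def sgd_step(x, params, alpha):
    """One stochastic gradient update on a single row x, in place."""
    w_h, b_h, w_y, b_y = params
    h, y = forward(x, params)
    # Output-layer error; the sigmoid derivative is s'(z) = y * (1 - y).
    delta_y = [-2.0 * (xi - yi) * yi * (1.0 - yi) for xi, yi in zip(x, y)]
    # Hidden-layer error, propagated back through the decoder weights.
    delta_h = [
        sum(w_y[i][j] * delta_y[i] for i in range(len(y))) * h[j] * (1.0 - h[j])
        for j in range(len(h))
    ]
    for i, d in enumerate(delta_y):
        for j, hj in enumerate(h):
            w_y[i][j] -= alpha * d * hj
        b_y[i] -= alpha * d
    for j, d in enumerate(delta_h):
        for k, xk in enumerate(x):
            w_h[j][k] -= alpha * d * xk
        b_h[j] -= alpha * d


rng = random.Random(0)
rows = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # toy stand-in for X
params = init_params(n_in=4, n_hidden=2, rng=rng)
loss_before = reconstruction_loss(rows, params)
for _ in range(500):
    for row in rows:
        sgd_step(row, params, alpha=0.5)
loss_after = reconstruction_loss(rows, params)
```

The hidden activations returned by `forward` play the role of the low-dimensional representation $H$; in a stacked setting, they would become the input of the next auto-encoder.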

For the preliminary training of the auto-encoder, we used the matrix $X = [B, M]$ to obtain $H^{(i)}$ and then build $Y^{(i)}$. In the proposed method, we integrated both the modularity and normalized-cut models to combine topology and node content data for community detection. We trained the $i$-th auto-encoder to reconstruct the hidden layer of the $(i-1)$-th auto-encoder, where $\hat{\theta} = \arg\min_{\theta^{(i)}} L_{\theta^{(i)}}(H^{(i-1)}, Y^{(i)})$. Finally, we selected the K-Means algorithm as the clustering method to detect the communities. The overall complexity of the model is:

$$O(t_{p} t_{q} \gamma L (m + nk)) \tag{18}$$

where $L$ is the number of layers, $n$ denotes the number of nodes in the network, and $m$ is the number of edges; thus, the average degree is given by $\frac{2m}{n}$. Here, $\gamma$ is the largest dimension among the hidden layers, and $t_{p}$ and $t_{q}$ are the number of iterations and the number of parameters, respectively. We emphasize that the parameter $\gamma$ is a constant related to the model dimension, not to the total number of nodes in the network. Thus, the complexity is approximately linear in the number of nodes and links in the given network. We noted that an overly complex structure led the model to merely mimic the identity function throughout the preliminary training process: the model only learned to copy the inputs to the outputs without learning any meaningful representation of the input data. In contrast, an overly simple structure led to underfitting. The suggested DL model is an auto-encoder organized with five encoding and four decoding layers, as illustrated in Figure 1, and the dimension of every layer is configured to be smaller than both the input and output spaces.

3.3. Topology Dynamicity Analysis

The changes in the structural position of the individual nodes in the distributed network define the topology dynamicity.

The topology dynamicity can reveal the network’s structural vulnerabilities and evolutionary changes. It mainly helps discover the contribution of nodes to network evolution and disintegration. In Bitcoin, studying the structural positions of nodes in the network can help to understand the behavior of actors and reveal how nodes switch their roles in the network over time (e.g., level of interactions with other nodes). The structural position of each node in the network was measured using the essential properties of the social network analysis, i.e., betweenness centrality (BC), closeness centrality (CC), and degree centrality (DC). To compute the degree of dynamicity (DD) of a particular node, we used the following equation:

$$DD^{i} = \frac{\sum_{j=1}^{m} |V_{AN}^{i} - V_{NG_{(j)}}^{i}|}{m} \tag{19}$$

where $DD^{i}$ is the degree of dynamicity of the $i$-th node, $V_{AN}^{i}$ represents the structural position (BC, CC, or DC) of the $i$-th node in the aggregated network, and $V_{NG_{(j)}}^{i}$ indicates the structural position of the $i$-th node in the $j$-th network graph ($NG$). $m$ is the number of graphs considered in the analysis. A Bitcoin node may not exist in every network graph, since nodes can join and leave the network at any time. A node may exist in the $j$-th $NG$ and vanish in the $(j \pm 1)$-th $NG$. A change from the existing state to the vanishing state between two successive NGs negatively influences the node’s degree of dynamicity: a node present in two consecutive graphs shows a higher degree of dynamicity than a node seen in only one of them. To capture the contribution of the nodes to the topological dynamicity, Equation (19) becomes:

$$DD^{i} = \frac{\sum_{j=1}^{m} \alpha_{j, j-1} \times |V_{AN}^{i} - V_{NG_{(j)}}^{i}|}{m} \tag{20}$$

where $\alpha_{j, j-1}$ represents the states of a given node in consecutive graphs. The degree of dynamicity ($DD^{i}$) was normalised using the highest detected degree of dynamicity ($DD^{H}$) shown by an individual node in the network. Thus, the normalised degree of dynamicity ($NDD^{i}$) was calculated as follows, where $n$ is the total number of nodes in the graph ($NG$):

$$NDD^{i} = \frac{1 - (DD^{H} - DD^{i})}{n} \tag{21}$$
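
Equations (19) to (21) can be traced with a short sketch. The centrality values below are hypothetical. Because the text does not specify the values taken by $\alpha_{j, j-1}$, the weights are left as an optional caller-supplied list that defaults to 1, which reduces Equation (20) to Equation (19).

```python
def degree_of_dynamicity(aggregated_value, snapshot_values, alphas=None):
    """Eq. (20): weighted mean absolute deviation from the aggregated position.

    With alphas omitted, every weight is 1 and this reduces to Eq. (19).
    """
    m = len(snapshot_values)
    if alphas is None:
        alphas = [1.0] * m
    deviations = (
        a * abs(aggregated_value - v) for a, v in zip(alphas, snapshot_values)
    )
    return sum(deviations) / m


def normalised_dynamicity(dd_by_node, n):
    """Eq. (21), applied as written: (1 - (DD_H - DD_i)) / n."""
    dd_highest = max(dd_by_node.values())
    return {node: (1.0 - (dd_highest - dd)) / n for node, dd in dd_by_node.items()}


# Hypothetical degree-centrality values over four network graphs.
dd = {
    "node_a": degree_of_dynamicity(0.5, [0.5, 0.25, 0.75, 0.0]),
    "node_b": degree_of_dynamicity(0.25, [0.25, 0.25, 0.25, 0.25]),
}
ndd = normalised_dynamicity(dd, n=len(dd))
```

Here `node_b` keeps the same structural position in every graph and therefore has a degree of dynamicity of zero, while `node_a` moves between graphs and scores higher.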

To evaluate the level of randomness in the Bitcoin network, we compare the properties of the Bitcoin network with those of three random network models: (1) the Erdős-Rényi (ER) model, (2) the Barabási-Albert (BA) model, and (3) the Configuration Model (CM). The ER model generates graphs in which every connection exists independently with the same probability. As input parameters to the ER model, we use the peer count from our measurements and the default peer count from the Bitcoin documentation. The BA model creates scale-free graphs with a power-law degree distribution. We used the same number of nodes and edges found in our measurements as input parameters to the BA model. Since the degree distributions generated by the ER and BA models are not close to those observed in the Bitcoin network, we also used the Configuration Model (CM) with the empirical degree distribution. Finally, we calculated the topology properties of each random network and compared them with the topology properties found in the Bitcoin network. We want to emphasize that the proposed method could detect community structures in other public blockchain networks that share the same characteristics as Bitcoin.
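
The comparison against a random baseline can be sketched for one property, the average clustering coefficient, and one model, Erdős-Rényi. The observed graph below is a hypothetical six-node toy, not the measured Bitcoin topology, and the helper names are illustrative.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """G(n, p): each possible edge exists independently with probability p."""
    rng = random.Random(seed)
    graph = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph) if graph else 0.0


# Hypothetical observed graph: two triangles joined by the edge 2-3.
observed = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
n = len(observed)
edge_count = sum(len(peers) for peers in observed.values()) // 2
density = edge_count / (n * (n - 1) / 2)

c_observed = average_clustering(observed)
# Average the baseline over many random graphs with the same size and density.
c_random = sum(
    average_clustering(erdos_renyi(n, density, seed=s)) for s in range(200)
) / 200
```

An observed value well above the random baseline is the kind of evidence used to argue that a topology is not a random graph; the same pattern applies to assortativity or modularity.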

4. Experiment & Results

In this section, we introduce the method used to collect Bitcoin topology information and content data, present the experiment setup and settings, and then report the experimental results.

4.1. Data Collection

We used the Nodes-Probe technique [23,25,36] to collect the topology and content data. Nodes-Probe is based on the flooding P2P resource discovery technique. The method reconstructs the graph around every active node in the network and later combines these graphs to obtain an approximate topology. The Nodes-Probe agent creates a topology snapshot by repeating four simple steps: (1) prepare an IP list, (2) send ‘PING,’ (3) send ‘Version,’ and (4) send ‘GETADDR.’ The proposed peer inference method combines topology and content data to map, in real time, the reachable nodes to the peers that are their likely outbound connections.

The peer inference method was split into two phases:

(1) Timestamp Match: To identify a possible peer relationship between nodes, we compare the timestamps at which a node of interest and its candidate peers were recorded as alive; the timestamp values can be acquired from ‘ADDR’ reply messages. If both the node and a peer are active during the same period, we nominate that peer for the next phase of our peer inference. Furthermore, we apply unequal selection probabilities that favor peers with fresh timestamps.

(2) Ledger Status Match: According to the Bitcoin propagation protocol, when a node receives a newly mined block, it propagates it to all its outbound connections (peers) to update the ledger. Therefore, directly connected nodes are likely to have a similar ledger [34,35]. Thus, we compare the ledger status of the node of interest with that of the peers nominated in the previous phase. If both ledgers include the latest ten mined blocks, we assume that the pre-selected peer is an outbound connection of the node.
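
The two phases above can be sketched as a filter over candidate peers. The `NodeRecord` fields, the overlap test, and the addresses are hypothetical choices made for illustration. The probabilistic selection that favors fresh timestamps is replaced here by a deterministic sort on the freshest timestamp, which is a simplification of the described method.

```python
from dataclasses import dataclass, field


@dataclass
class NodeRecord:
    address: str
    first_seen: int   # start of observed activity (hypothetical field)
    last_seen: int    # freshest timestamp from an 'ADDR' reply (hypothetical)
    recent_blocks: set = field(default_factory=set)  # block heights held


def active_together(a, b):
    """Phase 1: the two activity intervals overlap."""
    return a.first_seen <= b.last_seen and b.first_seen <= a.last_seen


def infer_outbound_peers(node, candidates, latest_blocks):
    """Return candidates passing both phases, freshest timestamp first."""
    latest = set(latest_blocks)
    if not latest <= node.recent_blocks:
        return []  # the node itself does not hold the latest blocks
    matched = [
        peer for peer in candidates
        # Phase 2: the candidate's ledger also holds every latest block.
        if active_together(node, peer) and latest <= peer.recent_blocks
    ]
    return sorted(matched, key=lambda peer: peer.last_seen, reverse=True)


latest_ten = range(1, 11)
node = NodeRecord("node-0", 100, 200, set(latest_ten))
candidates = [
    NodeRecord("peer-1", 150, 190, set(latest_ten)),   # passes both phases
    NodeRecord("peer-2", 300, 400, set(latest_ten)),   # never active together
    NodeRecord("peer-3", 120, 195, set(range(1, 6))),  # ledger is behind
    NodeRecord("peer-4", 90, 199, set(range(0, 11))),  # passes both phases
]
inferred = [peer.address for peer in infer_outbound_peers(node, candidates, latest_ten)]
```

Only the candidates that overlap in time and hold the ten latest blocks survive, ordered so that the freshest peer comes first.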

To collect the node data, we ran Nodes-Probe on the Bitcoin main network for seven days, from 8 May 2022 to 14 May 2022, which allowed us to gather over 92,300 active nodes, with a daily average of 9500 active nodes. The dataset used in this work consists of 92,378 nodes and 914,694 edges, where the edges represent the nodes’ outbound connections. To validate the collected data, we compared the daily peer count measured by the proposed Nodes-Probe [36] with those of Bitnodes [37] and Bitcoin Monitoring [38]. The daily number of active nodes was nearly the same across the three datasets, differing by ±1% and ±5% from the totals found by Bitcoin Monitoring and Bitnodes, respectively.

By default, nodes can establish 125 connections in the Bitcoin network: 8 outbound and 117 inbound peers. According to the total number of outgoing connections, we classified the Bitcoin nodes into three categories: (1) Light nodes are nodes with fewer than eight outbound links, accounting for 9.74% on average. (2) Medium nodes are nodes with 8 to 12 outgoing links, making up approximately 73.26% of the discovered reachable nodes. (3) Heavy nodes (or well-connected nodes) are nodes with more than 12 outbound links, up to a maximum of 135 peers. Heavy nodes account for approximately 17% of the reachable nodes; 3% of heavy nodes have more than 100 outbound connections.
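
The three categories follow directly from the outbound link count and can be written as a small helper. The function names and the sample counts are illustrative; the thresholds are the ones stated above.

```python
from collections import Counter


def classify_node(outbound_links):
    """Map an outbound link count to the category used in this section."""
    if outbound_links < 8:
        return "light"
    if outbound_links <= 12:
        return "medium"
    return "heavy"


def category_shares(outbound_counts):
    """Fraction of nodes in each category for a list of outbound counts."""
    tally = Counter(classify_node(count) for count in outbound_counts)
    total = len(outbound_counts)
    return {name: tally[name] / total for name in ("light", "medium", "heavy")}


# Hypothetical outbound link counts for eight reachable nodes.
shares = category_shares([3, 8, 8, 9, 10, 12, 13, 135])
```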

4.2. Description of the Experiment

We calculated the node content similarity by applying cosine similarity to create the Bitcoin content-based topology. We ensured that topology and content data are in the same range using Z-score normalization. Then, we trained the model with well-defined parameters (Liveness, Freshness, Degree, Inbound Links, Outbound Links, BC, DC, and CC) and obtained the latent representation for each row in the input matrix. Furthermore, we used the K-Means algorithm to cluster the obtained representations and reported the result as the average performance over the experimental runs. Lastly, we used the Normalized Mutual Information (NMI) as a community comparison metric to evaluate the performance of the presented technique. The NMI is defined as follows:

NMI(C, C^{*}) = \frac{\hat{MI}(C, C^{*})}{\max(H(C), H(C^{*}))}

(22)

H(C) = -\sum_{C_{i}} \frac{|C_{i}|}{|C|} \log\left(\frac{|C_{i}|}{|C|}\right)

(23)

\hat{MI}(C, C^{*}) = \sum_{C_{i}, C_{j}^{*}} p(C_{i}, C_{j}^{*}) \log \frac{p(C_{i}, C_{j}^{*})}{p(C_{i})\, p(C_{j}^{*})}

(24)

where $H(C)$ is the entropy of the communities $C$ and $\hat{MI}(C, C^{*})$ is the mutual information between the detected communities $C$ and the ground-truth communities $C^{*}$.
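To make the NMI definition concrete, the sketch below computes the entropy, the mutual information, and their max-normalized ratio from two label lists. It is an illustrative, standard-library implementation and not the evaluation code used in the experiments; the function names are hypothetical, and returning 1.0 when both partitions consist of a single community is a convention assumed here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a partition given as one label per node."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(found, truth):
    """Mutual information between two partitions of the same node set."""
    n = len(found)
    joint = Counter(zip(found, truth))
    size_found = Counter(found)
    size_truth = Counter(truth)
    mi = 0.0
    for (ci, cj), n_ij in joint.items():
        p_ij = n_ij / n
        p_i = size_found[ci] / n
        p_j = size_truth[cj] / n
        mi += p_ij * math.log(p_ij / (p_i * p_j))
    return mi


def nmi(found, truth):
    """Mutual information normalized by the larger of the two entropies."""
    denom = max(entropy(found), entropy(truth))
    if denom == 0.0:
        return 1.0  # assumed convention: both partitions are one community
    return mutual_information(found, truth) / denom


# Identical partitions up to relabeling, and two independent partitions.
same = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
print(same, independent)
```

Because NMI depends only on how nodes are grouped, relabeling the communities does not change the score.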

4.3. Modularity Results

Modularity measures the strength of the division of a network into communities. As defined in Equation (1), the modularity value ranges between −1 and 1, and it generally falls between 0.3 and 0.9 when communities are detected across different networks. The community structure generated by the proposed method reaches a high modularity of around 0.8, which we attribute to the large number of nodes and edges in the graph and to the model combining both topology and content data. The proposed method could be further improved by focusing on label propagation and narrowing the focus in a graph.
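As an illustration of how a modularity score is obtained, the sketch below assumes that Equation (1) is the standard Newman modularity of an undirected, unweighted graph without self-loops, written per community as the fraction of intra-community edges minus the squared fraction of edge endpoints. The function name and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges        -- list of (u, v) pairs, each undirected edge listed once
    community_of -- dict mapping every node to its community label
    """
    m = len(edges)
    intra = defaultdict(int)       # edges with both endpoints in a community
    degree_sum = defaultdict(int)  # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two, q_one)
```

Placing every node in one community yields a modularity of 0, whereas the natural two-triangle split scores higher.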

4.4. Community Structure Analysis Result

To evaluate the proposed community detection model, we trained the DFAE model on two datasets with ground-truth communities from social and information networks [39]. The first dataset is the Amazon dataset, which contains customer co-purchase data based on the “who bought this item also bought” feature. If an item $i$ is frequently co-purchased with an item $j$, the Amazon graph contains an undirected edge between $i$ and $j$. The dataset also provides a ground-truth community for each item category. The second dataset is the YouTube social network, based on friend relationships and groups. For the ground-truth communities, we used the user-joined groups as references.

Before examining the community structure in the Bitcoin graphs, we first evaluate how well the proposed DFAE model detects communities by comparing its performance with four well-known community detection methods: (1) SBM [40], (2) CAN [18], (3) Louvain [41], and (4) LPA random [42]. As Figure 2 shows, the proposed DFAE achieved the best performance, followed by CAN and SBM. The NMI values of the other methods dropped significantly because of the severe randomness of the label-updating process, mainly because the Bitcoin network includes approximately 10 K nodes.

One of the most crucial issues to consider when evaluating a deep learning algorithm for real-time application is how it scales, that is, how variations in the input size affect the algorithm's performance. This growth can be characterized using Big O notation, which specifies the dominant term of an algorithm's time complexity and thereby describes how quickly the runtime grows as the input size increases. The runtime of the feed-forward algorithm is determined by the number of floating-point operations (FLOPs) required for one epoch multiplied by the total number of epochs needed to reach a desired level of accuracy. The number of FLOPs depends on the data count (data size) and the DL network architecture. Thus, in a feed-forward algorithm, the time complexity is a function of the number of layers, the number of neurons per layer, and the feature dimensions. To demonstrate that the proposed DFAE algorithm scales well as the network size increases, we ran it on simulated Bitcoin networks of various sizes, ranging from 5000 to 145,000 nodes. Figure 3 illustrates the execution time of the community detection algorithm with DFAE and with Louvain, with the number of iterations set to 100. Although the proposed DFAE method has a relatively small runtime compared to the Louvain algorithm, it scales in the same manner as the Louvain algorithm.
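The dependence of the runtime on data size and architecture can be sketched with a rough operation count. The estimate below assumes fully connected layers, counts one multiply and one add per weight, ignores biases and activations, and uses the common rule of thumb that the backward pass costs about twice the forward pass. The layer sizes are hypothetical and do not describe the DFAE architecture.

```python
def dense_forward_flops(layer_sizes):
    """FLOPs of one forward pass for one sample through dense layers.

    Each layer with n_in inputs and n_out outputs costs 2 * n_in * n_out
    operations (one multiply and one add per weight).
    """
    return sum(2 * n_in * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def training_flops(num_samples, layer_sizes, epochs, backward_factor=2):
    """Approximate total training FLOPs.

    backward_factor is an assumed rule of thumb: the backward pass costs
    about backward_factor times the forward pass.
    """
    per_sample = dense_forward_flops(layer_sizes) * (1 + backward_factor)
    return num_samples * per_sample * epochs


# Hypothetical autoencoder over 8 input features: 8 -> 64 -> 16 -> 64 -> 8.
layers = [8, 64, 16, 64, 8]
for nodes in (5_000, 145_000):
    print(nodes, training_flops(nodes, layers, epochs=100))
```

For a fixed architecture and epoch count, the estimate grows linearly with the number of input rows, which is the kind of scaling behavior the runtime experiment examines.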

The community structure results show that Bitcoin has 20 communities on average, whereas the ER network has five communities on average and both the scale-free (BA) network and the CM network have seven. Bitcoin thus contains considerably more communities than the random networks. Our analysis also shows that every network snapshot contains at least two large communities that include 16% of the discovered nodes, and occasionally a super community containing over 30% of the heavy nodes. A few communities have 15 to 20 well-connected nodes, as shown in Table 1, while the rest have fewer than six. Furthermore, we noticed an overlap between the discovered communities: the top 5 well-connected nodes participate in all of the discovered communities.

Figure 2. Normalized Mutual Information (NMI) performance curves balanced with coefficient λ.

Figure 3. Analysis of time complexity. Comparison of the runtime between the proposed method DFAE and Louvain on various network sizes, ranging from 5000 to 145,000 nodes, with a number of iterations set to 100.

4.5. Bitcoin Network Properties

As shown in Table 2, the value of each network property metric is higher in the Bitcoin network than in the generated random networks. The diameter of the Bitcoin network is 6, and the closest value among the random networks is the diameter of the CM graph, 5.897. Eliminating the heavy nodes from both graphs decreased the diameter to 5 in both networks. The results in [38] show that the Bitcoin test-net has a diameter of 5 and an average clustering coefficient of 0.052, whereas our results show that the average clustering coefficient of the main net is 0.068. Furthermore, both the average clustering coefficient and the assortativity coefficient of the Bitcoin graph were higher than expected from any of the generated random graphs. The topology of the Bitcoin network is very dense, ensuring connectivity among all nodes. In addition, Bitcoin nodes tend to establish connections with nodes like themselves more often than nodes in any of the random networks do. Because the Bitcoin network is extensive and its topology is very dense, we show only the top 10 Bitcoin communities to present the community structure more clearly. Figure 4 visualizes the top 10 communities in the snapshot taken on 14 May 2022 from the Bitcoin main network.
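For readers unfamiliar with the average clustering coefficient reported above, the following sketch computes it for an undirected graph stored as an adjacency dictionary. It uses the usual local definition (the fraction of a node's neighbor pairs that are themselves connected, with 0 for nodes that have fewer than two neighbors). The function name and the toy graph are hypothetical and unrelated to the measured topology.

```python
from itertools import combinations


def average_clustering(adj):
    """Average local clustering coefficient of an undirected graph.

    adj -- dict mapping each node to the set of its neighbors
    """
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        # Count neighbor pairs that are directly connected to each other.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Toy graph: a triangle (0, 1, 2) with one pendant node 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(average_clustering(toy))
```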

5. Discussion

Our findings could be used to enhance the vitality and stability of the network by analyzing its resilience and security and by improving the propagation mechanism. Our measurement results show that active reachable Bitcoin nodes play an essential role in the vitality and resilience of Bitcoin’s network: they allow newly joining nodes to connect to other peers and provide them with the current ledger state. The role played by Well-connected nodes could be extended for faster and more efficient data propagation. Well-connected nodes can be considered Bitcoin Master nodes and used to reduce propagation delays. Transactions and blocks could then be propagated in two steps: (1) from one cluster to the adjacent clusters in the network and (2) from Master nodes to ordinary peers in each cluster. In step (1), using Boundary Master nodes for data transfer between clusters would avoid redundancy in the data propagation. However, the strong influence of Well-connected nodes revealed by the dynamicity analysis may threaten the operation of the Bitcoin network. More thorough studies of the dynamics of Well-connected nodes are required to specify the danger posed to Bitcoin.

5.1. Performance Improvement

The community structure measurement results show that active reachable Bitcoin nodes play an essential role in the vitality and resilience of Bitcoin’s network, as they allow newly joining nodes to connect to other peers and provide them with the current ledger state. Specifically, the role played by Well-connected nodes (nodes with more than the default peer count and an up-to-date ledger) could be extended for faster and more efficient data propagation. Here, we provide an example improvement and the preliminary simulation results obtained when applying it to the network.

In our proposal, Well-connected nodes are considered Bitcoin Master nodes and used to reduce the propagation delay as follows. Let $N = \{n_{1}, n_{2}, \dots, n_{i}\}$ be the set of nodes in the Bitcoin network, where $i$ is the total number of nodes, and let $MN = \{mn_{1}, mn_{2}, \dots, mn_{j}\}$ be the set of Master nodes, where $j$ is the total number of Master nodes ($MN \subseteq N$).

Let $mn_{x} = \{mn_{x}, p_{1}, p_{2}, \dots, p_{k}\}$ denote the $x$th cluster, formed by the Master node $mn_{x}$ and its peers, where $k$ is the total number of peers in the cluster. Thus, we have $mn_{x} \subseteq N$ and $N = mn_{1} \cup mn_{2} \cup \dots \cup mn_{x}$. When a new node $q$ wants to join the network, it first connects to a Special Node ($SN$) learned from the $DNS$ seeds. The $SN$ provides node $q$ with the list of available Master nodes to connect to. Node $q$ selects the Master node $mn_{i} \in MN$ such that $distance(q, mn_{i}) \leq distance(q, mn_{x})$ for all $mn_{x} \in MN$. Some peers can be part of different clusters. Consider two clusters $C = \{c_{1}, c_{2}, \dots, c_{n}\}$ and $G = \{g_{1}, g_{2}, \dots, g_{n}\}$, and let $[c_{y}, g_{y}]$ denote the boundary nodes of the two clusters, where $c_{y} \in C$ and $g_{y} \in G$. Then, for all peers $c_{n} \in C$ and $g_{n} \in G$ with $c_{n} \neq c_{y}$ and $g_{n} \neq g_{y}$, we have $distance(c_{n}, g_{n}) \geq distance(c_{y}, g_{y})$. Here, $distance(x, y)$ denotes the geo-internet distance between the two nodes $x$ and $y$ in the Bitcoin network.
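The two selection rules above (a joining node picks its closest Master node, and the boundary nodes of two clusters are the closest cross-cluster pair) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and positions on a line with an absolute-difference distance stand in for the geo-internet distance, which is not specified further here.

```python
from itertools import product


def select_master(q, master_nodes, distance):
    """Return the Master node closest to the joining node q."""
    return min(master_nodes, key=lambda mn: distance(q, mn))


def boundary_pair(cluster_c, cluster_g, distance):
    """Return the pair (c_y, g_y) with the smallest cross-cluster distance."""
    return min(product(cluster_c, cluster_g),
               key=lambda pair: distance(pair[0], pair[1]))


# Hypothetical one-dimensional positions used as a stand-in distance.
position = {"q": 0, "mn1": 5, "mn2": 2,
            "c1": 0, "c2": 4, "g1": 5, "g2": 9}


def toy_distance(x, y):
    """Illustrative distance: absolute difference of the toy positions."""
    return abs(position[x] - position[y])


chosen = select_master("q", ["mn1", "mn2"], toy_distance)
boundary = boundary_pair(["c1", "c2"], ["g1", "g2"], toy_distance)
print(chosen, boundary)
```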

To evaluate the proposed propagation method, we simulated the network using the node count measured from the Bitcoin main network in our previous work [36]. In the experiment, a transaction generated by a given Master node (MN) is propagated from the MN to each directly connected node (a few of which are boundary nodes). The neighboring (master, boundary, and normal) nodes then record the latency with which each peer receives the transaction. For example, when a Bitcoin node $BN$ with $n$ direct connections (neighboring peers) broadcasts a transaction at time $t_{BN}$, the data is received at different times ($t_{1}, t_{2}, \dots, t_{n}$) by $BN$’s neighboring peers, where $t_{n} > t_{n-1} > \dots > t_{2} > t_{1}$. The time differences between the broadcast event and the arrival of the data at each peer ($\Delta t_{BN,1}, \Delta t_{BN,2}, \dots, \Delta t_{BN,n}$) are calculated as follows:

\Delta t_{BN,i} = t_{i} - t_{BN}

(25)

Hence, the largest difference, $\Delta t_{BN,n}$, is the time taken for the Bitcoin node $BN$ to finish propagating the data to all its peers.
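Equation (25) amounts to subtracting the broadcast time from each arrival time. The sketch below illustrates this with hypothetical integer timestamps in milliseconds; the function names are illustrative, and the values are not simulation results.

```python
def propagation_delays(t_bn, arrival_times):
    """Per-peer delays t_i - t_BN, ordered from first to last arrival."""
    return [t_i - t_bn for t_i in sorted(arrival_times)]


def completion_time(t_bn, arrival_times):
    """Time until the last peer has received the data."""
    return max(arrival_times) - t_bn


# Hypothetical timestamps in milliseconds (illustrative only).
broadcast_at = 1000
arrivals = [1050, 1020, 1100]
delays = propagation_delays(broadcast_at, arrivals)
finished_after = completion_time(broadcast_at, arrivals)
print(delays, finished_after)
```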

The simulation results indicate that information propagation using Master nodes could improve the propagation delay by approximately 25× compared to the Bitcoin default protocol, as shown in Figure 5. The most plausible explanation for this significant improvement is that Master-node propagation exploits the correlation results reflected in the communication cost, which is computed using the distance between the boundary nodes of Bitcoin clusters. Note that, in the main network, the latency might be affected by many events, such as loss of connections. To increase the accuracy of the collected latencies, we ran the simulation 100 times and report the average of the obtained results. We noticed that, after 100 runs, the latency remains the same irrespective of changes in the number of neighboring peers of the Master node, provided that all clusters are fully connected.

5.2. Analysis of Security Vulnerabilities

Bitcoin is a pseudo-anonymous network: each actor owns a public address that could, in theory, be mapped to an IP address through proper network analysis. Any successful mapping of actors’ IP addresses to real-world identities [24,43,44] threatens the anonymity of the Bitcoin network and the privacy of its actors, which represents a crucial defect in the cryptocurrency ecosystem. With the considerable adoption and use of Bitcoin and a market cap currently at 358.81B, a few sophisticated techniques have been proposed to map Bitcoin public addresses to IP addresses and real-world identities. In the Bitcoin network, nodes keep broadcasting their IP addresses so that other nodes can keep track of their presence in the network. Revealed IP addresses can therefore be collected easily by scanning the Bitcoin main network, and grouping them by geographic region (geolocation) or autonomous system (AS) is as simple as collecting them. Consider the K-anonymity attack, where “K” is the total number of nodes belonging to a particular AS or geo-region “X”: a node resident in X can hide only among the K nodes of that AS/geo-region. A previous study [30] showed that over 6% of the ASes participating in the Bitcoin network have a single node; in such cases, the anonymity of the node converges to a minimal one-out-of-one. Thus, the Bitcoin public addresses can easily be linked to the IP address, and all transactions from those public addresses can be traced back to the real user ID.
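The grouping step described above can be sketched as a simple count per AS. The example below assumes a hypothetical mapping from node IP address to AS number (documentation-range addresses and private-use AS numbers, not real measurements) and reports the anonymity-set size K for each node as well as the share of ASes hosting a single node.

```python
from collections import Counter


def anonymity_sets(node_to_as):
    """Map each node to K, the number of nodes sharing its AS."""
    as_sizes = Counter(node_to_as.values())
    return {node: as_sizes[asn] for node, asn in node_to_as.items()}


def singleton_as_share(node_to_as):
    """Fraction of ASes that host exactly one node."""
    as_sizes = Counter(node_to_as.values())
    singles = sum(1 for size in as_sizes.values() if size == 1)
    return singles / len(as_sizes)


# Hypothetical node-to-AS mapping (illustrative values only).
nodes = {
    "192.0.2.1": 64512,
    "192.0.2.2": 64512,
    "198.51.100.7": 64513,
}
k_values = anonymity_sets(nodes)
share = singleton_as_share(nodes)
print(k_values, share)
```

A node with K equal to 1 corresponds to the one-out-of-one case discussed above.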

6. Conclusions

In this work, we proposed a deep non-linear method that combines the topology data and content data of a distributed network to detect its community structure. The proposed approach integrates a deep feature representation (DFR) algorithm built on Deep Feedforward Autoencoders (DFAE). Our results showed that the Bitcoin graph displays a stronger community structure and higher assortativity and clustering coefficients than expected from a randomly distributed network. In Bitcoin, nodes tend to establish links with nodes like themselves more often than nodes in a random network do. Moreover, the presented results indicate that well-connected nodes take a leading role in the vitality and resilience of the Bitcoin network.

Our future work is split into two directions. On the one hand, we aim to improve the proposed method by focusing on the label propagation mechanism and narrowing the focus in a graph. On the other hand, we intend to enhance data propagation using a community-based propagation algorithm and to study the threat that the existing well-connected nodes pose to Bitcoin.

Author Contributions

Conceptualization, M.E.; methodology, M.E.; formal analysis, M.E. and H.J.; investigation, M.E.; resources, M.E.; data curation, M.E.; writing—original draft preparation, M.E.; writing—review and editing, M.E. and H.J.; funding acquisition, M.E. and H.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This research was supported by the BISA Research Grant of Keimyung University in 2021.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Judmayer, A.; Stifter, N.; Schindler, P.; Weippl, E. Blockchain: Basics. In Business Transformation through Blockchain; Palgrave Macmillan: London, UK, 2019; pp. 339–355.
2. Kasper, M.; Schindler, W.; Stöttinger, M. A stochastic method for security evaluation of cryptographic FPGA implementations. In Proceedings of the 2010 International Conference on Field-Programmable Technology, Beijing, China, 8–10 December 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 146–153.
3. Grym, A. The great illusion of digital currencies, BoF Economics Review, No. 1/2018. 2018. Available online: http://hdl.handle.net/10419/212992 (accessed on 11 January 2022).
4. Michael, J.; Cohn, A.L.A.N.; Butcher, J.R. Blockchain technology. Journal 2018, 1, 7.
5. Capece, G.; Ghiron, N.L.; Pasquale, F. Blockchain Technology: Redefining Trust for Digital Certificates. Sustainability 2020, 12, 18952.
6. Karrer, B.; Newman, M.E. Stochastic blockmodels and community structure in networks. Phys. Rev. E 2011, 83, 016107.
7. Yang, J.; Leskovec, J. Overlapping community detection at scale: A nonnegative matrix factorisation approach. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, Roma, Italy, 4–8 February 2013; Association for Computing Machinery: New York, NY, USA, 2013; pp. 587–596.
8. Yang, L.; Cao, X.; He, D.; Wang, C.; Wang, X.; Zhang, W. Modularity Based Community Detection with Deep Learning. IJCAI 2016, 16, 2252–2258.
9. Jusko, J.; Rehak, M. Identifying peer-to-peer communities in the network by connection graph analysis. Int. J. Netw. Manag. 2014, 24, 235–252.
10. Bhih, A.; Johnson, P.; Randles, M. Decentralized iterative approaches for community clustering in the networks. J. Supercomput. 2019, 75, 4894–4917.
11. Ding, S.; Yue, Z.; Yang, S.; Niu, F.; Zhang, Y. A novel trust model based overlapping community detection algorithm for social networks. IEEE Trans. Knowl. Data Eng. 2019, 32, 2101–2114.
12. Bonifazi, G.; Cecchini, S.; Corradini, E.; Giuliani, L.; Ursino, D.; Virgili, L. Investigating community evolutions in TikTok dangerous and non-dangerous challenges. J. Inf. Sci. 2022, 32, 2101–2114.
13. Ruan, Y.; Fuhry, D.; Parthasarathy, S. Efficient community detection in large networks using content and links. In Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 13–17 May 2013; Association for Computing Machinery: New York, NY, USA, 2013; pp. 1089–1098.
14. Yang, J.; McAuley, J.; Leskovec, J. Community detection in networks with node attributes. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining, Houston, TX, USA, 7–10 December 2013; Volume 13, pp. 1151–1156.
15. Pool, S.; Bonchi, F.; Leeuwen, M.V. Description-driven community detection. ACM Trans. Intell. Syst. Technol. 2014, 5, 1–28.
16. Chang, S.; Han, W.; Tang, J.; Qi, G.J.; Aggarwal, C.; Huang, T.S. Heterogeneous network embedding via deep architectures. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 119–128.
17. Tian, F.; Gao, B.; Cui, Q.; Chen, E.; Liu, T.Y. Learning deep representations for graph clustering. In Proceedings of the AAAI Conference on Artificial Intelligence, Quebec, QC, Canada, 27–31 July 2014; Volume 28. Available online: https://ojs.aaai.org/index.php/AAAI/article/view/8916 (accessed on 26 September 2022).
18. Nie, F.; Wang, X.; Huang, H. Clustering and projected clustering with adaptive neighbors. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 26–29 August 2014; pp. 977–986.
19. Hu, H.; Lin, Z.; Feng, J.; Zhou, J. Smooth representation clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 3834–3841.
20. Bhih, A.; Johnson, P.; Randles, M. An optimisation tool for robust community detection algorithms using content and topology information. J. Supercomput. 2020, 76, 226–254.
21. Bourlard, H. Auto-association by multilayer perceptrons and singular value decomposition. Biol. Cybern. 1988, 59, 291–294.
22. Chen, W.Y.; Song, Y.; Bai, H.; Lin, C.J.; Chang, E.Y. Parallel spectral clustering in distributed systems. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 568–586.
23. Essaid, M.; Park, S.; Ju, H. Visualising Bitcoin’s Dynamic P2P Network Topology and Performance. In Proceedings of the 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), Seoul, Korea, 15–17 May 2019; pp. 141–145.
24. Eisenbarth, J.P.; Cholez, T.; Perrin, O. A Comprehensive Study of the Bitcoin P2P Network. In Proceedings of the 2021 3rd Conference on Blockchain Research & Applications for Innovative Networks and Services (BRAINS), Paris, France, 27–30 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 105–112.
25. Essaid, M.; Kim, H.W.; Park, W.G.; Lee, K.Y.; Park, S.J.; Ju, H.T. Network usage of bitcoin full node. In Proceedings of the 2018 International conference on information and communication technology convergence (ICTC), Jeju Island, Korea, 19–21 October 2018; pp. 1286–1291.
26. Beukema, W. Formalising the Bitcoin Protocol. In 21th Twente Student Conference on It. 2014. Available online: https://allquantor.at/blockchainbib/pdf/beukema2014formalising.pdf (accessed on 11 January 2022).
27. Sriman, B.; Kumar, S.G.; Shamili, P. Blockchain technology: Consensus protocol proof of work and proof of stake. In Intelligent Computing and Applications; Springer: Berlin/Heidelberg, Germany, 2021; pp. 395–406.
28. Kostarev, G. Review of blockchain consensus mechanisms. Waves Platform, 31. 2017. Available online: https://medium.com/wavesprotocol/review-of-blockchain-consensus-mechanisms-f575afae38f2 (accessed on 26 September 2022).
29. Skudnov, R. Bitcoin Clients. 2012. Available online: https://bitcoin.org/en/ (accessed on 26 September 2022).
30. BTCD. Available online: https://github.com/btcsuite/btcd (accessed on 26 September 2022).
31. BitcoinJ. Available online: https://bitcoinj.github.io/getting-started (accessed on 26 September 2022).
32. Libbitcoin. Available online: https://github.com/libbitcoin/libbitcoin-system (accessed on 26 September 2022).
33. Python-bitcoinlib. Available online: https://github.com/petertodd/python-bitcoinlib (accessed on 26 September 2022).
34. Yang, T.; Jin, R.; Chi, Y.; Zhu, S. A Bayesian framework for community detection integrating content and link. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, 18–21 June 2009; pp. 615–622.
35. Wang, X.; Jin, D.; Cao, X.; Yang, L.; Zhang, W. Semantic community identification in large attribute networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30.
36. Essaid, M.; Park, S.; Ju, H.T. Bitcoin’s dynamic peer-to-peer topology. Int. J. Netw. Manag. 2020, 30, e2106.
37. Bitnodes. Available online: https://bitnodes.earn.com/ (accessed on 26 September 2022).
38. KIT DSN Bitcoin Monitoring. Available online: https://dsn.tm.kit.edu/bitcoin/ (accessed on 26 September 2022).
39. Stanford Large Network Dataset Collection. Available online: https://snap.stanford.edu/data/#communities (accessed on 26 September 2022).
40. Abbe, E. Community detection and stochastic block models: Recent developments. J. Mach. Learn. Res. 2017, 18, 6446–6531.
41. Louvain Algorithm for Community Detection. Available online: https://mons1220.tistory.com/129 (accessed on 26 September 2022).
42. Fu, J.; He, J.; Ge, M.; Zhang, K.; Zhang, Q. A seed-edge-based link clustering LPA for robust overlapping community detection. In Proceedings of the 13th IEEE Conference on Industrial Electronics and Applications (ICIEA), Wuhan, China, 31 May–2 June 2018; pp. 1715–1720.
43. Corradini, E.; Nicolazzo, S.; Nocera, A.; Ursino, D.; Virgili, L. A two-tier Blockchain framework to increase protection and autonomy of smart objects in the IoT. Comput. Commun. 2022, 181, 338–356.
44. Essaid, M.; Lee, K.; Kim, D.; Shin, H.; Ju, H.T. Mapping Out Bitcoin’s Pseudonymous actors. In Proceedings of the 2020 International Conference on Information Networking (ICOIN), Barcelona, Spain, 7–10 January 2020; pp. 802–806.

Figure 1. The Structure of the Proposed DL Model.

Figure 4. Top 10 Communities within the Bitcoin network. The color of the node indicates the node’s community. (Snapshot taken on 14 May 2022).

Figure 5. Data propagation time ($\Delta t_{BN,n}$) in the simulated Bitcoin network using both the default protocol and the Master node-based protocol.

Table 1. Community Size Distribution in Bitcoin Network.

Community No.	Nodes (%) ¹	Heavy Nodes (%) ²
1	4.44	0.9
2	3.76	0.9
3	7.71	19.99
4	6.49	10.75
5	3.76	0.9
6	1.46	0.9
7	5.71	0.9
8	5.56	0.9
9	3.16	0.9
10	1.65	0.9
11	6.12	4.83
12	5.47	0.9
13	1.78	0.9
14	2.68	0.9
15	4.36	0.9
16	4.17	0.9
17	1.39	0.9
18	1.32	0.9
19	4.81	0.9
20	1.8	0.9
21	1.94	0.9
22	6.26	8.66
23	1.19	0.9
24	0.39	0.9
25	3.64	0.9
26	8.98	36.87

¹ Percentage of active nodes; ² percentage of heavy nodes (snapshot taken on 14 May 2022).

Table 2. Network Properties of Bitcoin and Other Generated Networks.

	Bitcoin Network	ER Network	Bitcoin Network	ER Network
Diameter	6	5	5	5.897
Degree	16.576	15.98	15.81	15.98
Clustering coef.	0.068	0.045	0.044	0.044
Path length	3.621	3.163	3.366	3.570
Assortativity coef.	0.337	0.209	0.198	0.204

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
