Leveraging K-Means Clustering and Z-Score for Anomaly Detection in Bitcoin Transactions

Patel, Jinish; Reiner, Joseph; Stilwell, Brenden; Wahbeh, Abdullah; Seetan, Raed

doi:10.3390/informatics12020043

by Jinish Patel, Joseph Reiner, Brenden Stilwell, Abdullah Wahbeh * and Raed Seetan

Department of Computing and Security, College of Engineering and Science, Slippery Rock University, Slippery Rock, PA 16057, USA

* Author to whom correspondence should be addressed.

Informatics 2025, 12(2), 43; https://doi.org/10.3390/informatics12020043

Submission received: 8 March 2025 / Revised: 10 April 2025 / Accepted: 16 April 2025 / Published: 25 April 2025

(This article belongs to the Section Machine Learning)

Abstract

With the growing popularity of cryptocurrencies, detecting potential market manipulation and fraudulent activities has become crucial for maintaining market integrity. In this study, we aim to detect anomalous Bitcoin transactions using an integrated approach that combines clustering with statistical outlier detection. More specifically, anomalies were detected using three approaches: a distance-based method, which flags points whose distance from their cluster center exceeds the 95th percentile; a statistical method, which flags transactions in which any feature has an absolute Z-score greater than 3; and a hybrid approach, in which transactions flagged by either method are considered anomalous. Using a sample subset of Bitcoin transaction data from 2015, our results showed that the combined approach achieved the highest detection rate, flagging 6492 (6.61%) of the 98,151 transactions as anomalous.

Keywords: cryptocurrency transactions; anomaly detection; clustering; statistical analysis

1. Introduction

The blockchain has gained significant attention in recent years due to its potential to transform various industries. It operates on a decentralized network in which the ledger is maintained by a network of computers that validate and record transactions, thereby eliminating intermediaries, reducing costs, and increasing efficiency [1,2,3]. Immutability is a major feature of the blockchain: a transaction recorded on the blockchain cannot be altered or deleted, which enhances the integrity of the data and increases trust among participants [4,5]. Advanced cryptographic algorithms are used to secure blockchain transactions, making it difficult for malicious actors to manipulate the data [6,7]. The blockchain uses consensus mechanisms, such as Proof of Work (PoW) and Proof of Stake (PoS), to ensure the validity of a transaction before it is added to the ledger [8,9,10]. Moreover, blockchain technology can be used to establish smart contracts, whose conditions are written directly into lines of code and which execute automatically when those predefined conditions are met, hence eliminating intermediaries and significantly reducing the risk of fraud [11,12].

The blockchain is the underlying technology that enables cryptocurrencies to function. Cryptocurrencies are digital currencies that utilize cryptographic techniques to secure transactions and verify the transfer of assets [13]. The blockchain is revolutionizing the banking and finance sectors by enabling decentralization, security, and anonymity of financial transactions [14,15]. Over the last 15 years, the application of the blockchain and distributed ledger technologies (DLTs) in the financial domain has generated enthusiastic hype [16]. Popular cryptocurrencies include Bitcoin, Ethereum, Solana, XRP, and many more.

While the blockchain and cryptocurrencies offer transformative potential for the financial market as well as e-commerce, they also pose several challenges that must be addressed for enhancing their adoption and integration. These challenges are related to smart contract vulnerabilities [17], consensus mechanism security [18], transaction throughput [19], energy consumption [20], regulatory concerns [21], market volatility [22], and their impact on traditional financial systems [23].

An important area of research in cryptocurrency transactions is anomaly detection. Anomaly detection aims to identify suspicious patterns and behaviors that reflect fraudulent activities, such as money laundering or other illicit financial schemes [24,25]. Anomaly detection is essential in maintaining blockchain network integrity and security [26]. Given the complexity and volatile nature of cryptocurrency markets, detecting anomalies is crucial to maintaining confidence in such decentralized technology [27].

According to the literature, several techniques have been used for anomaly detection in crypto transactions. Techniques such as local outlier factor (LOF) [28], the transformer model [29], and Graph Neural Networks (GNNs) [30] have demonstrated promising capabilities in identifying anomalous patterns without prior labeling. Ensemble learning [31] has the potential to improve the accuracy of anomaly detection and enhance detection robustness by combining multiple weak learners. In addition, the use of explainable AI (XAI) techniques [32] has gained traction recently and has the potential to improve cryptocurrency ecosystem security and integrity as well as improve the interpretability of detected anomalies. Emerging frameworks like Heterogeneous Graph Transformers (HGTs) [33] and active learning [34] have also been explored to model complex, multi-relational data and to iteratively refine detection with minimal supervision.

In this study, we aim to use a hybrid approach that leverages K-means clustering with Z-score analysis to detect anomalies in crypto transactions. Clustering is an unsupervised machine learning technique that groups unlabeled data into clusters [32,33]. Each cluster consists of objects that are similar to one another and dissimilar to objects in other clusters [33]. The Z-score is a statistical measure that quantifies how many standard deviations a data point lies from the mean of a dataset [34]. More specifically, the objectives of this study are to explore whether a combination of K-means and Z-score effectively detects anomalous Bitcoin transactions in an unsupervised setting, and how the hybrid method compares to each individual technique in terms of anomaly detection rate and interpretability.

Compared to advanced and traditional approaches to anomaly detection in crypto transactions, the proposed approach offers significant practical advantages by avoiding reliance on labeled datasets that are rarely available. It is also considered far more interpretable and computationally less expensive than deep learning approaches, allowing for the rapid identification of anomalies and facilitating transparent analysis. Moreover, while robust global anomaly detection is possible with Isolation Forests and density-based algorithms, such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), these techniques tend to fail in capturing nuanced contextual outliers. In general, our approach balances a trade-off of interpretability, scalability, and context sensitivity with ease of implementation.

This study improves anomaly detection by combining clustering techniques and simple statistical analysis. The hybrid approach offers researchers and practitioners a simple and effective method for detecting anomalous cryptocurrency transactions. It is practical, adaptable, and helps to reduce false alarms, which in turn helps to identify suspicious activities efficiently.

The remainder of this paper is organized as follows. Section 2 surveys the literature on anomaly detection techniques in cryptocurrency transactions. Section 3 details the steps followed for anomaly detection using clustering techniques and Z-score. Section 4 presents findings and a discussion. Finally, Section 5 concludes with a summary and a discussion of limitations.

2. Literature Review

Several studies have attempted to address the issue of anomaly detection in cryptocurrencies and blockchain transactions. Sayadi et al. [35] used two machine learning algorithms to detect anomalies in Bitcoin transactions. The authors used a One-Class Support Vector Machine (OCSVM) algorithm for outlier detection, followed by a K-means algorithm to group similar outliers together based on anomaly type. The dataset consisted of Bitcoin statistics, with 90,514 cases and 33 attributes. Their experimental results showed that the methods generated good accuracy in terms of anomaly detection. Bael et al. [36] attempted to detect anomalies in Binance using metadata of 38,526 crypto wallets with unsupervised expectation maximization (EM) and Random Forest (RF). Using EM, the authors were able to engineer nine features of the wallets. RF was used to label suspicious wallets. Their experimental results showed that the proposed approach achieved high precision, recall, and F1 scores. Cunha et al. [37] used anomaly detection algorithms as well as active learning (AL) techniques to uncover anomalies in cryptocurrency transactions. The authors aimed to establish a supervised baseline for benchmarking various AL setups. Three supervised algorithms were tested, namely, RF, XGBoost, and logistic regression (LR). Their results showed that the anomaly detection algorithms performed well in terms of anomaly detection. The RF algorithm achieved a similar performance to the anomaly detection algorithms. Using the anomaly detection algorithms in AL configurations failed to yield a desirable performance.

Pocher et al. [25] used machine learning methods and transaction graph analysis to analyze Bitcoin transaction datasets for anomaly detection. Two graph-based models, namely, graph convolutional networks (GCNs) and graph attention networks (GATs), were used to analyze a dataset that consisted of 203,769 transaction nodes connected by 234,355 edges. Their results showed that GCNs outperform other classic approaches and GATs perform better than a simple dense network but not as well as GCNs. Martin et al. [38] used several machine learning models to detect anomalies in different cryptocurrency transactions. The authors used three different datasets and followed the GAT2VEC framework, an unsupervised learning approach, for feature extraction. Supervised learning algorithms including LR, stochastic gradient descent (SGD) classifier, and decision tree, and unsupervised learning algorithms including K-means and Support Vector Machines (SVMs) were used to detect anomalies in the transactions. Their results showed that the supervised algorithms performed well compared to the unsupervised ones. Elmougy and Manzi [39] attempted to detect fraudulent Bitcoin and Ethereum transactions. The authors used SVM, Random Forest, and LR algorithms to detect anomalies from over 30 million transactions. The results showed that the SVM model achieved accuracies of 98.7% and 82.6% on Bitcoin and Ethereum transactions, respectively. The RF achieved accuracies of 96.4% and 90.9% on Bitcoin and Ethereum transactions, respectively. Finally, LR achieved accuracies of 94.6% and 89.9% on Bitcoin and Ethereum transactions, respectively.

Shayegan et al. [40] presented a new method, a collective anomaly approach, for detecting fraudulent accounts in Bitcoin transactions based on a trimmed K-means clustering algorithm. The Eötvös Loránd University (ELTE) Bitcoin Project dataset was used. The dataset was processed and features were extracted. The results showed that 14 users committed fraud, including 26 addresses in nine cases. Henderi and Siddique [41] evaluated the performance of an Isolation Forest algorithm in detecting cryptocurrency transaction anomalies. The authors focused on transaction types and regions. Using a dataset of 78,600 transactions, their results showed that 3930 were flagged as anomalies. Both transaction types were vulnerable, and Asia and Africa had the highest average risk scores. Zhao et al. [42] combined mutual information with self-supervised learning to improve the use of unlabeled data. The authors used a dataset with 203,769 transaction nodes as well as 234,355 edges, with 42,019 legal transactions, 4545 illegal transactions, and the remaining labeled as unknown. Using the GCN method, their results showed that the novel loss function in the GCN was better by four points compared to traditional methods of cryptocurrency anomaly detection, while the self-supervised network improved performance by three points compared to the GCN method. Buchdadi and Al-Rawahna [43] attempted to detect anomalies in blockchain transactions using Isolation Forest and Autoencoder Neural Networks. Using simulated data from the Open Metaverse, the Isolation Forest and autoencoder achieved precisions of 0.85 and 0.87, respectively. Furthermore, the autoencoder and Isolation Forest achieved AUC-ROC scores of 0.85 and 0.82, respectively.

Kang and Buu [44] proposed a cryptocurrency transaction anomaly detection method called the disentangled prototypical graph convolutional autoencoder (DP-GCAE). Using a real Ethereum dataset, their proposed method achieved a better F1 score, with a 37.7% increase compared to other anomaly detection methods. Badawi and Al-Haija [45] proposed an anti-money laundering system to detect suspicious cryptocurrency transactions. The authors evaluated the proposed system using a Bitcoin anti-money laundering dataset, and their experiments showed that the proposed system resulted in improved accuracy. Song et al. [46] proposed a deep learning-based anomaly detection model called the anomaly VAE-Transformer. The model was evaluated for detecting four anomaly cases in Olympus DAO, a decentralized finance (DeFi) protocol. Their results showed that the model could quickly identify malicious attacks and structural changes in DeFi protocols. Kim et al. [47] proposed a mechanism to detect malicious events in blockchain networks. The authors used a data collection engine to generate periodic multi-dimensional data streams, which were analyzed using semi-supervised learning and an autoencoder (AE). Their results showed that the proposed approach detected malicious events in real time and reduced time complexity through feature prioritization. Snigdha et al. [48] used ensemble learning for anomaly detection in Bitcoin transactions by combining machine learning techniques and DBSCAN in a stacking framework. Their results showed that the proposed model achieved accuracies of 98% and higher with hyperparameter tuning.

Lenus [49] utilized the FP-Growth algorithm to identify frequent co-occurrence patterns among blockchain address categories, providing insights into how different types of addresses interact within blockchain networks. Using a large dataset of over 10 million addresses across various blockchain networks, the study uncovered frequent item sets and association rules that revealed how different address types, such as smart contracts, centralized exchanges, and liquid staking entities, commonly interact. By uncovering which address categories commonly appeared together, the analysis revealed the underlying transactional relationships that characterize blockchain ecosystems. Understanding these co-occurrence patterns is valuable for improving address classification and can enhance anomaly detection.

Recent research has emphasized the growing importance of anomaly detection in cryptocurrency transactions for many different reasons. However, limitations exist with respect to the methods and techniques used in the literature. Many supervised methods have shown high accuracies in detecting anomalies. However, they suffer from the unavailability of labeled datasets [24,37], which are often scarce or expensive to obtain. On the other hand, studies that utilized unsupervised learning approaches such as Isolation Forest, DBSCAN, and One-Class SVM do not require labeled data but often suffer from mixed results and findings [28,50]. In addition, ensemble learning approaches, which combine multiple machine learning models, have also been proposed for anomaly detection in cryptocurrency transactions [31]. While these methods can improve accuracy, they often require the careful selection of base models and hyperparameter tuning, adding complexity to the overall approach. Furthermore, graph-based methods show decent performance but suffer from their computational overhead [51,52]. Therefore, there is a need to explore new approaches that are simpler and at the same time effective for detecting anomalies in cryptocurrency transactions.

The hybrid approach proposed in this study, combining K-means and Z-score normalization, aims to address these limitations of existing approaches by offering a computationally efficient, interpretable, and adaptable framework for anomaly detection. Z-score normalization enhances the sensitivity of K-means by scaling features uniformly, allowing for the better separation of normal and abnormal clusters. Such an approach has the potential to take advantage of simpler models, like Isolation Forest and One-Class SVM, while at the same time helping to avoid their associated complexity and parameter tuning challenges. In other words, the proposed hybrid approach balances performance with scalability, making it a compelling alternative for real-world cryptocurrency anomaly detection systems.

3. Methodology

Figure 1 shows the research methodology for detecting anomalies in Bitcoin transactions. Using Google BigQuery [53], we obtained a dataset of transactions on the Bitcoin blockchain. Given the very large size of the dataset, we chose a small percentage of these transactions that fell within March 2015, as we needed the transactions to be sequential in time.

The features found in this dataset include “block_hash”, “block_number”, “block_timestamp”, “hash”, “input_count”, “input_value”, “lock_time”, “output_count”, “size”, and “virtual_size”. For analysis purposes, the block timestamp (“block_timestamp”), input count (“input_count”), and transaction size (“size”) were the key features to be analyzed. From these key features, additional features could be generated to gain further insights into the transaction dataset. First, we generated “time difference”, which is the difference in time between two consecutive transactions. If transactions occurred in rapid succession, this could be considered anomalous behavior. The second feature was “rolling transaction size mean”, a rolling statistic that tracked the mean transaction size over a window of one hundred transactions. This feature should be stable and consistent throughout the dataset, so any significant change would be considered an outlier and anomalous. Next, we extracted the hour of the day, day of the month, and day of the week for each transaction. These temporal features, together with “time difference”, could help to reveal patterns of when transactions occurred and could be useful for detecting anomalies. Finally, we dropped any cases with missing values and standardized the resulting feature matrix [54]. Standardization ensured that every feature contributed equally when we computed the distances later.
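The feature engineering steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the helper names `build_features` and `standardize`, the derived column names, and the synthetic demo data are all assumptions, and the rolling window of 100 is taken from the description of the rolling mean.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def build_features(tx: pd.DataFrame, window: int = 100) -> pd.DataFrame:
    """Derive time difference, rolling size mean, and temporal features."""
    tx = tx.sort_values("block_timestamp").reset_index(drop=True)
    ts = pd.to_datetime(tx["block_timestamp"])
    feats = pd.DataFrame({
        "input_count": tx["input_count"],
        "size": tx["size"],
        # Seconds elapsed since the previous transaction.
        "time_diff": ts.diff().dt.total_seconds(),
        # Mean transaction size over the last `window` transactions.
        "rolling_size_mean": tx["size"].rolling(window).mean(),
        "hour": ts.dt.hour,
        "day_of_month": ts.dt.day,
        "day_of_week": ts.dt.dayofweek,
    })
    # Drop rows with missing values (first diff, incomplete rolling windows).
    return feats.dropna().reset_index(drop=True)


def standardize(feats: pd.DataFrame) -> np.ndarray:
    """Scale every column to zero mean and unit variance."""
    return StandardScaler().fit_transform(feats)


# Synthetic stand-in for the BigQuery extract (hypothetical values).
rng = np.random.default_rng(0)
n = 500
offsets = np.sort(rng.integers(0, 7 * 86_400, n))
demo = pd.DataFrame({
    "block_timestamp": pd.Timestamp("2015-03-01") + pd.to_timedelta(offsets, unit="s"),
    "input_count": rng.integers(1, 10, n),
    "size": rng.integers(200, 1000, n),
})
features = build_features(demo)
X = standardize(features)
```

Because the first 99 rows have no complete rolling window, they are removed by the missing-value step, which is one reason the final case count can be smaller than the raw extract.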

Next, we applied the K-means clustering algorithm to the standardized feature matrix and computed the distance of each point to its assigned cluster center. We then followed three approaches to detect anomalies. First was a distance-based measure [55], where points with a distance greater than the 95th percentile were flagged as anomalies. The 95th percentile threshold was a heuristic choice intended to flag the most extreme 5% of points based on their distance from their assigned cluster centers [56]. Second, the Z-score measure [57] was used to detect anomalous points: transactions in which any standardized feature had an absolute Z-score greater than 3 were flagged as anomalies [58]. A threshold of |Z-score| > 3 is commonly employed in statistical analysis to identify extreme outliers. It is grounded in the empirical rule, which states that for a normal distribution, approximately 99.7% of data points lie within three standard deviations of the mean, so values beyond that range are very rare [59,60]. Third, we combined the two methods so that a point was considered anomalous if it was flagged by either method. To select the number of clusters, we used the elbow method [61], which involves running the clustering with different numbers of clusters and plotting the clustering error (inertia) for each count. The optimal number of clusters was identified as the point where the inertia began to level off, that is, where adding more clusters did not significantly improve the results, creating an elbow on the plot.
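The three detection rules can be illustrated with a short sketch. It rests on several assumptions not stated in the text: the 95th percentile is computed once over all distances (the text does not say whether the threshold is global or per cluster), distances are Euclidean, and the input matrix is already standardized so its entries can be read directly as Z-scores. The function name `detect_anomalies` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def detect_anomalies(X, n_clusters=7, pct=95.0, z_thresh=3.0, seed=0):
    """Return cluster labels plus distance, Z-score, and hybrid flags."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    # Euclidean distance from each point to its assigned cluster center.
    dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    # Assumption: one global percentile threshold over all distances.
    by_distance = dist > np.percentile(dist, pct)
    # X is assumed standardized, so |X| > 3 means |Z-score| > 3.
    by_zscore = (np.abs(X) > z_thresh).any(axis=1)
    # Hybrid rule: flagged by either method.
    return km.labels_, by_distance, by_zscore, by_distance | by_zscore


# Synthetic standardized data with a few planted extreme points.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
X[:10] += 8.0
X = (X - X.mean(axis=0)) / X.std(axis=0)

labels, by_distance, by_zscore, hybrid = detect_anomalies(X, n_clusters=3)
```

Since the hybrid flag is a logical OR, it can never flag fewer points than either individual rule, which is worth keeping in mind when comparing detection counts across the three approaches.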

Once results were obtained for each method, we used principal component analysis (PCA) to reduce the high-dimensional feature space into 2 dimensions, which in turn allowed for the simpler visualization of the resulting cluster structure and showed the detected anomalies in a simple scatter plot. Finally, we analyzed and reported on the clustering and anomaly detection results and then generated descriptive statistics comparing the normal and anomalous transactions.
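A projection of this kind might look like the sketch below, which uses scikit-learn's `PCA` on random stand-in data (the seven-column matrix is an assumption made only for illustration). It also reports the share of variance retained by the two components, a quick way to gauge how much information the 2D view discards.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the standardized feature matrix (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 7))

pca = PCA(n_components=2)
coords = pca.fit_transform(X)  # one (x, y) pair per transaction

# Fraction of total variance preserved by the 2D projection.
explained = float(pca.explained_variance_ratio_.sum())
```

The `coords` array can then be passed to any scatter-plot routine, colored by cluster label or by anomaly flag.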

4. Results and Discussions

The data collection step ended with a total of 98,151 cases after preprocessing and feature engineering. After feature engineering, the dataset contained the following features: “block_hash”, “block_number”, “block_timestamp”, “hash”, “input_count”, “input_value”, “lock_time”, “output_count”, “size”, “virtual_size”, “time difference”, “rolling transaction size mean”, and the temporal features described in Section 3 (hour of the day, day of the month, and day of the week).

Figure 2 shows the results of plotting the inertia across different possible numbers of clusters (K). Based on the figure, we chose a value of 7 for the number of clusters.
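An inertia curve like the one in Figure 2 can be produced along the following lines. The data here come from scikit-learn's `make_blobs` rather than the Bitcoin dataset, and the helper `inertia_curve` and the range of K values are assumptions for illustration only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data with four well-separated groups (not the Bitcoin data).
X, _ = make_blobs(n_samples=600, centers=4, cluster_std=0.6, random_state=0)


def inertia_curve(X, k_values, seed=0):
    """Fit K-means for each K and collect the within-cluster sum of squares."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in k_values
    ]


k_values = list(range(1, 9))
inertias = inertia_curve(X, k_values)
```

Plotting `inertias` against `k_values` gives the elbow plot; the chosen K is read off visually where the curve flattens, so the choice involves some judgment.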

The analysis of anomalies using the three different approaches resulted in the detection of 4913 (5.01%) anomalous transactions using distance-based measure, a total of 4670 (4.76%) anomalous transactions using the Z-score-based measure, and a total of 6492 (6.61%) anomalous transactions using the combined approach.
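The reported rates, and the overlap between the two individual methods, follow directly from these counts. The short check below assumes only that the combined set is exactly the union of the two flagged sets, as described in Section 3; the variable names are illustrative.

```python
total = 98_151
by_distance = 4_913
by_zscore = 4_670
hybrid = 6_492  # flagged by either method (union)


def rate(count: int, total: int) -> float:
    """Percentage of all transactions, rounded to two decimals."""
    return round(100 * count / total, 2)


rates = {
    "distance": rate(by_distance, total),
    "zscore": rate(by_zscore, total),
    "hybrid": rate(hybrid, total),
}

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
flagged_by_both = by_distance + by_zscore - hybrid
only_distance = hybrid - by_zscore
only_zscore = hybrid - by_distance
```

Under that assumption, 3091 transactions were flagged by both methods, 1822 by the distance rule alone, and 1579 by the Z-score rule alone.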

Table 1 shows the distribution of transactions, both normal and anomalous, across different clusters using the combined approach. As shown in the table, cluster 6 contains the highest number of anomalies, with 2123 transactions, which accounts for about 32.70% of all the anomalies. Cluster 4 contains 1532 anomalous transactions, representing roughly 23.60% of the total anomalies. Cluster 1 has 1115 anomalous transactions, representing 17.17% of the anomalies. Cluster 0 consists of 834 anomalies (12.84%), cluster 5 consists of 667 anomalies (10.28%), cluster 3 consists of 199 anomalies (roughly 3.07%), and cluster 2 is the smallest, with 22 anomalies (0.34%). Overall, Clusters 6, 4, and 1 are responsible for most of the anomalous transactions, containing 73.47% of all anomalies. On the other hand, clusters 0, 5, 3, and 2 represent the remaining 26.53% of anomalous transactions.

Figure 3 shows the distribution of the transactions in a 3D feature space, with each panel focusing on a different subset of the data. The left panel displays every transaction (normal and anomalous), color-coded by cluster. It shows how the transactions are grouped in space and which clusters are dominant. The middle panel shows the normal transactions only, color-coded by cluster. It shows which clusters mostly contain normal transactions and how they are positioned in the feature space compared to the left panel. Some clusters are barely visible in this panel, especially those with few normal transactions. Finally, the right panel shows the anomalous transactions only, color-coded by cluster. It shows which clusters contribute the most to the detected anomalies, with clusters containing large numbers of anomalies appearing more prominently. Overall, these visualizations illustrate how the clusters separate in a 3D space and where the normal versus anomalous points lie. They help to identify which clusters are predominantly normal, which ones are primarily anomalous, and whether the anomalies are tightly grouped or dispersed.

Our findings suggest that the hybrid approach yielded better results by combining different techniques, overcoming the drawbacks of each one and resulting in a higher anomaly detection rate. Our results from combining K-means clustering and Z-score analysis demonstrate the potential of this hybrid, which aligns with the findings of [32]. The identification of 6.61% of all transactions as anomalous by our combined algorithm provides insight into unusual transaction patterns in the Bitcoin network.

The distribution of anomalies across clusters reveals important patterns in our Bitcoin transaction dataset, which exhibited distinct clustering behaviors. Some clusters contained exclusively anomalous transactions, namely clusters 2, 3, 4, and 6, while clusters 0, 1, and 5 showed a mix of normal and anomalous transactions. This suggests that the transaction patterns most typical of anomalies are concentrated in particular clusters. Although clusters 0, 1, and 5 consist mostly of normal transactions, they do contain anomalies, suggesting that outlier patterns can exist in all clusters.

5. Conclusions

Overall, the clustering and anomaly detection results provide useful insights into transaction behaviors within the analyzed Bitcoin dataset. The proposed hybrid approach showed potential in isolating transactions that significantly deviate from normal patterns. Furthermore, the clustering analysis showed that some clusters are more prone to anomalies than others. Our findings could help to guide further analysis, allowing for targeted interventions or deeper investigations into the nature of the detected anomalies. However, these findings are preliminary, and the effectiveness of the proposed method has not yet been validated against known fraudulent transactions or benchmarked against established anomaly detection models.

This research is not without limitations. First, one notable limitation of this study is its reliance on a historical Bitcoin transaction dataset from 2015. While this dataset offers a clean and manageable structure, ideal for benchmarking anomaly detection techniques, it does not reflect the current state of the blockchain ecosystem, which has undergone substantial changes over the past decade. Second, only a subset of the data was used in the analysis due to computational resource constraints. This could have caused our methods to miss rare or less frequent anomalies in the dataset. Third, the 95th percentile threshold for distances is an arbitrary cutoff, which could lead to the method missing anomalies if the distribution of distances is skewed. Finally, while PCA helped to visualize the clusters, it could lead to the loss of information that is important for distinguishing between clusters.

Future research should validate the proposed hybrid approach across more recent and diverse blockchain datasets, conduct benchmark comparisons with established models, and explore adaptive thresholding methods to improve detection robustness. In addition, exploring the use of deep learning techniques, large language models, and explainable AI (XAI) techniques could enhance the detection rate as well as the interpretability and transparency of anomaly detection results. Finally, future work should explore additional features such as transaction velocity or address behavior patterns, as well as detecting anomalies over time, to potentially enhance model accuracy and provide deeper insights into anomalies.

Author Contributions

Conceptualization, J.P., J.R., B.S. and A.W.; methodology, J.P., J.R. and B.S.; software, J.P., J.R., B.S. and A.W.; validation, J.P., J.R., B.S., A.W. and R.S.; formal analysis, J.P., J.R. and B.S.; investigation, J.P., J.R. and B.S.; resources, A.W. and R.S.; data curation, J.P., J.R. and B.S.; writing—original draft preparation, J.P., J.R., B.S., A.W. and R.S.; writing—review and editing, A.W. and R.S.; visualization, J.P., J.R. and B.S.; supervision, A.W. and R.S.; project administration, A.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is available from Google BigQuery.

Conflicts of Interest

The authors declare no conflict of interest.

References

Habib, G.; Sharma, S.; Ibrahim, S.; Ahmad, I.; Qureshi, S.; Ishfaq, M. Blockchain Technology: Benefits, Challenges, Applications, and Integration of Blockchain Technology with Cloud Computing. Future Internet 2022, 14, 11. [Google Scholar] [CrossRef]
Pilkington, M. Chapter 11: Blockchain Technology: Principles and Applications. 2016. Available online: https://www.elgaronline.com/edcollchap/edcoll/9781784717759/9781784717759.00019.xml (accessed on 6 March 2025).
Zheng, Z.; Xie, S.; Dai, H.; Chen, X.; Wang, H. An Overview of Blockchain Technology: Architecture, Consensus, and Future Trends. In Proceedings of the 2017 IEEE International Congress on Big Data (BigData Congress), Honolulu, HI, USA, 25–30 June 2017; pp. 557–564. [Google Scholar] [CrossRef]
Politou, E.; Casino, F.; Alepis, E.; Patsakis, C. Blockchain Mutability: Challenges and Proposed Solutions. IEEE Trans. Emerg. Top. Comput. 2021, 9, 1972–1986.
Tariq, U.; Ibrahim, A.; Ahmad, T.; Bouteraa, Y.; Elmogy, A. Blockchain in internet-of-things: A necessity framework for security, reliability, transparency, immutability and liability. IET Commun. 2019, 13, 3187–3192.
Ferdous, M.S.; Chowdhury, M.J.M.; Hoque, M.A. A survey of consensus algorithms in public blockchain systems for crypto-currencies. J. Netw. Comput. Appl. 2021, 182, 103035.
Leng, J.; Zhou, M.; Zhao, J.L.; Huang, Y.; Bian, Y. Blockchain Security: A Survey of Techniques and Research Directions. IEEE Trans. Serv. Comput. 2022, 15, 2490–2510.
Lashkari, B.; Musilek, P. A Comprehensive Review of Blockchain Consensus Mechanisms. IEEE Access 2021, 9, 43620–43652.
Yan, S. Analysis on Blockchain Consensus Mechanism Based on Proof of Work and Proof of Stake. In Proceedings of the 2022 International Conference on Data Analytics, Computing and Artificial Intelligence (ICDACAI), Zakopane, Poland, 15–16 August 2022; pp. 464–467.
Zhang, C.; Wu, C.; Wang, X. Overview of Blockchain Consensus Mechanism. In Proceedings of the 2020 2nd International Conference on Big Data Engineering, BDE ’20, New York, NY, USA, 5 July 2020; pp. 7–12.
Khan, S.N.; Loukil, F.; Ghedira-Guegan, C.; Benkhelifa, E.; Bani-Hani, A. Blockchain smart contracts: Applications, challenges, and future trends. Peer Peer Netw. Appl. 2021, 14, 2901–2925.
Mohanta, B.K.; Panda, S.S.; Jena, D. An Overview of Smart Contract and Use Cases in Blockchain Technology. In Proceedings of the 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Bengaluru, India, 10–12 July 2018; pp. 1–4.
Tredinnick, L. Cryptocurrencies and the blockchain. Bus. Inf. Rev. 2019, 36, 39–44.
Amarasinghe, N.; Boyen, X.; McKague, M. A Survey of Anonymity of Cryptocurrencies. In Proceedings of the Australasian Computer Science Week Multiconference, ACSW ’19, New York, NY, USA, 29–31 January 2019; pp. 1–10.
Makarov, I.; Schoar, A. Cryptocurrencies and Decentralized Finance (DeFi). Brook. Pap. Econ. Act. 2022, 2022, 141–215.
Ali, O.; Ally, M.; Clutterbuck, P.; Dwivedi, Y. The state of play of blockchain technology in the financial services sector: A systematic literature review. Int. J. Inf. Manag. 2020, 54, 102199.
Kushwaha, S.S.; Joshi, S.; Singh, D.; Kaur, M.; Lee, H.-N. Systematic Review of Security Vulnerabilities in Ethereum Blockchain Smart Contract. IEEE Access 2022, 10, 6605–6621.
Yadav, A.K.; Singh, K.; Amin, A.H.; Almutairi, L.; Alsenani, T.R.; Ahmadian, A. A comparative study on consensus mechanism with security threats and future scopes: Blockchain. Comput. Commun. 2023, 201, 102–115.
Gracy, M.; Rebecca Jeyavadhanam, B. A Systematic Review of Blockchain-Based System: Transaction Throughput Latency and Challenges. In Proceedings of the 2021 International Conference on Computational Intelligence and Computing Applications (ICCICA), Nagpur, India, 18–19 June 2021; pp. 1–6.
Kohli, V.; Chakravarty, S.; Chamola, V.; Sangwan, K.S.; Zeadally, S. An analysis of energy consumption and carbon footprints of cryptocurrencies and possible solutions. Digit. Commun. Netw. 2023, 9, 79–89.
Yeoh, P. Regulatory issues in blockchain technology. J. Financ. Regul. Compliance 2017, 25, 196–208.
Khan, R.; Hakami, T.A. Cryptocurrency: Usability perspective versus volatility threat. J. Money Bus. 2021, 2, 16–28.
Kayani, U.; Hasan, F. Unveiling Cryptocurrency Impact on Financial Markets and Traditional Banking Systems: Lessons for Sustainable Blockchain and Interdisciplinary Collaborations. J. Risk Financ. Manag. 2024, 17, 58.
Lorenz, J.; Silva, M.I.; Aparício, D.; Ascensão, J.T.; Bizarro, P. Machine learning methods to detect money laundering in the bitcoin blockchain in the presence of label scarcity. In Proceedings of the First ACM International Conference on AI in Finance, in ICAIF ’20, New York, NY, USA, 15–16 October 2021; pp. 1–8.
Pocher, N.; Zichichi, M.; Merizzi, F.; Shafiq, M.Z.; Ferretti, S. Detecting anomalous cryptocurrency transactions: An AML/CFT application of machine learning-based forensics. Electron. Markets 2023, 33, 37.
Kim, J.; Nakashima, M.; Fan, W.; Wuthier, S.; Zhou, X.; Kim, I.; Chang, S.-Y. A Machine Learning Approach to Anomaly Detection Based on Traffic Monitoring for Secure Blockchain Networking. IEEE Trans. Netw. Serv. Manag. 2022, 19, 3619–3632.
Ul Hassan, M.; Rehmani, M.H.; Chen, J. Anomaly Detection in Blockchain Networks: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2023, 25, 289–318.
Yahia, A.; Mouhssine, Y.; Elalaoui, A.; Ouatik Elalaoui, S. Leveraging Machine Learning for Anomaly Detection Methods in Cryptocurrency: A Data-Driven Study. In Proceedings of the 2024 10th International Conference on Optimization and Applications (ICOA), Almería, Spain, 17–18 October 2024; pp. 1–5.
Wang, Z.; Ni, A.; Tian, Z.; Wang, Z.; Gong, Y. Research on blockchain abnormal transaction detection technology combining CNN and transformer structure. Comput. Electr. Eng. 2024, 116, 109194.
Sharma, A.; Singh, P.K.; Podoplelova, E.; Gavrilenko, V.; Tselykh, A.; Bozhenyuk, A. Graph Neural Network-Based Anomaly Detection in Blockchain Network. In Proceedings of the International Conference on Computing, Communications, and Cyber-Security, Delhi, India, 21–22 October 2022; Tanwar, S., Wierzchon, S.T., Singh, P.K., Ganzha, M., Epiphaniou, G., Eds.; Springer Nature: Singapore, 2023; pp. 909–925.
Hisham, S.; Makhtar, M.; Aziz, A.A. Combining Multiple Classifiers using Ensemble Method for Anomaly Detection in Blockchain Networks: A Comprehensive Review. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 8.
Agrawal, S.; Agrawal, J. Survey on Anomaly Detection using Data Mining Techniques. Procedia Comput. Sci. 2015, 60, 708–713.
Berkhin, P. A Survey of Clustering Data Mining Techniques. In Grouping Multidimensional Data: Recent Advances in Clustering; Kogan, J., Nicholas, C., Teboulle, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 25–71.
Varshney, R.P.; Sharma, D.K. Optimizing Time-Series forecasting using stacked deep learning framework with enhanced adaptive moment estimation and error correction. Expert. Syst. Appl. 2024, 249, 123487.
Sayadi, S.; Ben Rejeb, S.; Choukair, Z. Anomaly Detection Model Over Blockchain Electronic Transactions. In Proceedings of the 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC), Tangier, Morocco, 24–28 June 2019; pp. 895–900.
Baek, H.; Oh, J.; Kim, C.Y.; Lee, K. A Model for Detecting Cryptocurrency Transactions with Discernible Purpose. In Proceedings of the 2019 Eleventh International Conference on Ubiquitous and Future Networks (ICUFN), Zagreb, Croatia, 2–5 July 2019; pp. 713–717.
Cunha, L.L.; Brito, M.A.; Oliveira, D.F.; Martins, A.P. Active Learning in the Detection of Anomalies in Cryptocurrency Transactions. Mach. Learn. Knowl. Extr. 2023, 5, 1717–1745.
Martin, K.; Rahouti, M.; Ayyash, M.; Alsmadi, I. Anomaly detection in blockchain using network representation and machine learning. Secur. Priv. 2022, 5, e192.
Elmougy, Y.; Manzi, O. Anomaly Detection on Bitcoin, Ethereum Networks Using GPU-accelerated Machine Learning Methods. In Proceedings of the 2021 31st International Conference on Computer Theory and Applications (ICCTA), Alexandria, Egypt, 11–13 December 2021; pp. 166–171.
Shayegan, M.J.; Sabor, H.R.; Uddin, M.; Chen, C.-L. A Collective Anomaly Detection Technique to Detect Crypto Wallet Frauds on Bitcoin Network. Symmetry 2022, 14, 328.
Henderi; Siddique, Q. Anomaly Detection in Blockchain Transactions within the Metaverse Using Anomaly Detection Techniques. J. Curr. Res. Blockchain 2024, 1, 2.
Zhao, K.; Dong, G.; Bian, D. Detection of Illegal Transactions of Cryptocurrency Based on Mutual Information. Electronics 2023, 12, 1542.
Buchdadi, A.D.; Al-Rawahna, A.S.M. Anomaly Detection in Open Metaverse Blockchain Transactions Using Isolation Forest and Autoencoder Neural Networks. Int. J. Res. Metaverse 2025, 2, 24–51.
Kang, J.; Buu, S.-J. Graph Anomaly Detection with Disentangled Prototypical Autoencoder for Phishing Scam Detection in Cryptocurrency Transactions. IEEE Access 2024, 12, 91075–91088.
Badawi, A.A.; Al-Haija, Q.A. Detection of money laundering in bitcoin transactions. In Proceedings of the 4th Smart Cities Symposium (SCS 2021), Manama, Bahrain, 21–23 November 2021; pp. 458–464.
Song, A.; Seo, E.; Kim, H. Anomaly VAE-Transformer: A Deep Learning Approach for Anomaly Detection in Decentralized Finance. IEEE Access 2023, 11, 98115–98131.
Kim, J.; Nakashima, M.; Fan, W.; Wuthier, S.; Zhou, X.; Kim, I.; Chang, S.-Y. Anomaly Detection based on Traffic Monitoring for Secure Blockchain Networking. In Proceedings of the 2021 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), Sydney, Australia, 3–6 May 2021; pp. 1–9.
Snigdha, K.; Reddy, P.S.N.; Hema, D.; Gayathri, S. BitPredict: End-to-End Context-Aware Detection of Anomalies in Bitcoin Transactions using Stack Model Network. In Proceedings of the 2024 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI), Chennai, India, 9–10 May 2024; pp. 1–6.
Lenus, L. Discovering Co-Occurrence Patterns Among Blockchain Address Categories Using the FP-Growth Association Mining Algorithm. J. Curr. Res. Blockchain 2025, 2, 1.
Hariguna, T.; Al-Rawahna, A.S.M. Unsupervised Anomaly Detection in Digital Currency Trading: A Clustering and Density-Based Approach Using Bitcoin Data. J. Curr. Res. Blockchain 2024, 1, 70–90.
Pérez-Cano, V.; Jurado, F. Fraud Detection in Cryptocurrency Networks—An Exploration Using Anomaly Detection and Heterogeneous Graph Transformers. Future Internet 2025, 17, 44.
Han, Y.; Wang, X.; He, M.; Wang, X.; Guo, S. Intrusion Detection for Encrypted Flows Using Single Feature Based on Graph Integration Theory. IEEE Internet Things J. 2024, 11, 17589–17601.
Tigani, J.; Naidu, S. Google BigQuery Analytics; John Wiley & Sons: Hoboken, NJ, USA, 2014.
Elen, A.; Avuçlu, E. Standardized Variable Distances: A distance-based machine learning method. Appl. Soft Comput. 2021, 98, 106855.
Ghazal, T.M. Performances of K-Means Clustering Algorithm with Different Distance Metrics. Intell. Autom. Soft Comput. 2021, 30, 2.
Olson, K.L.; Bonetti, M.; Pagano, M.; Mandl, K.D. Real time spatial cluster detection using interpoint distances among precise patient locations. BMC Med. Inform. Decis. Mak. 2005, 5, 19.
Kathiresan, V.; Sumathi, P. An efficient clustering algorithm based on Z-Score ranking method. In Proceedings of the 2012 International Conference on Computer Communication and Informatics, Coimbatore, India, 10–12 January 2012; pp. 1–4.
Qi, B.; Zhang, P.; Rong, Z.; Wang, J.; Li, C.; Chen, J. Rapid Transformer Health State Recognition Through Canopy Cluster-Merging of Dissolved Gas Data in High-Dimensional Space. IEEE Access 2019, 7, 94520–94532.
Meijer, P.; Sobas, F.; Tsiamyrtzis, P. Assessment of accuracy of laboratory testing results, relative to peer group consensus values in external quality control, by bivariate z-score analysis: The example of D-Dimer. Clin. Chem. Lab. Med. 2024, 62, 1548–1556.
Curtis, A.E.; Smith, T.A.; Ziganshin, B.A.; Elefteriades, J.A. The Mystery of the Z-Score. Aorta 2018, 4, 124–130.
Syakur, M.A.; Khotimah, B.K.; Rochman, E.M.S.; Satoto, B.D. Integration K-Means Clustering Method and Elbow Method for Identification of The Best Customer Profile Cluster. IOP Conf. Ser. Mater. Sci. Eng. 2018, 336, 012017.

Figure 1. Research methodology.

Figure 2. Optimizing the number of clusters using the elbow method.

Figure 3. Distribution of different transactions across clusters.

Table 1. Cluster distribution summary.

Cluster	Total Points	Normal Points (Count, %)	Anomalous Points (Count, %)
0	74,470	73,636 (98.88%)	834 (1.12%)
1	17,434	16,319 (93.60%)	1115 (6.40%)
2	22	0 (0.00%)	22 (100.00%)
3	199	0 (0.00%)	199 (100.00%)
4	1532	0 (0.00%)	1532 (100.00%)
5	2469	1802 (72.99%)	667 (27.01%)
6	1123	0 (0.00%)	1123 (100.00%)
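
As a rough illustration of how the percentage columns in Table 1 follow from the raw counts, the sketch below recomputes each cluster's normal and anomalous shares in Python. The counts are copied from Table 1; the names `TABLE_1` and `summarize` are illustrative assumptions and are not taken from the study's own code.

```python
# Counts copied from Table 1: cluster id -> (normal points, anomalous points).
# TABLE_1 and summarize() are hypothetical names used only for this sketch.
TABLE_1 = {
    0: (73636, 834),
    1: (16319, 1115),
    2: (0, 22),
    3: (0, 199),
    4: (0, 1532),
    5: (1802, 667),
    6: (0, 1123),
}


def summarize(counts):
    """Return one row per cluster with its total and percentage shares."""
    rows = []
    for cluster, (normal, anomalous) in sorted(counts.items()):
        total = normal + anomalous
        rows.append({
            "cluster": cluster,
            "total": total,
            "normal_pct": round(100 * normal / total, 2),
            "anomalous_pct": round(100 * anomalous / total, 2),
        })
    return rows


rows = summarize(TABLE_1)
for row in rows:
    print(f"{row['cluster']}\t{row['total']}\t"
          f"{row['normal_pct']:.2f}%\t{row['anomalous_pct']:.2f}%")
```

Each percentage is a share of the cluster's own total rather than of the whole dataset, which is why the small clusters 2, 3, 4, and 6 can be 100% anomalous while contributing relatively few points overall.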

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
