Article

Machine Learning for Anomaly Detection in Blockchain: A Critical Analysis, Empirical Validation, and Future Outlook

1
Department of Computer Science, Shaheed Zulfiqar Ali Bhutto Institute of Science and Technology, University Karachi, Karachi 75600, Pakistan
2
Department of Computer Science, Shaheed Zulfiqar Ali Bhutto Institute of Science and Technology, Gharo Campus, Gharo 73210, Pakistan
*
Authors to whom correspondence should be addressed.
Computers 2025, 14(7), 247; https://doi.org/10.3390/computers14070247
Submission received: 10 May 2025 / Revised: 11 June 2025 / Accepted: 20 June 2025 / Published: 25 June 2025

Abstract

Blockchain technology has transformed how data are stored and transactions are processed in a distributed environment. Blockchain assures data integrity by validating transactions through the consensus of a distributed ledger involving several miners as validators. Although blockchain provides multiple advantages, it has also been subject to some malicious attacks, such as a 51% attack, which is considered a potential risk to data integrity. These attacks can be detected by analyzing the anomalous node behavior of miner nodes in the network, and data analysis plays a vital role in detecting and overcoming these attacks to make a secure blockchain. Integrating machine learning algorithms with blockchain has become a significant approach to detecting anomalies such as a 51% attack and double spending. This study comprehensively analyzes various machine learning (ML) methods to detect anomalies in blockchain networks. It presents a Systematic Literature Review (SLR) and a classification to explore the integration of blockchain and ML for anomaly detection in blockchain networks. We implemented Random Forest, AdaBoost, XGBoost, K-means, and Isolation Forest ML models to evaluate their performance in detecting Blockchain anomalies, such as a 51% attack. Additionally, we identified future research directions, including challenges related to scalability, network latency, imbalanced datasets, the dynamic nature of anomalies, and the lack of standardization in blockchain protocols. This study acts as a benchmark for additional research on how ML algorithms identify anomalies in blockchain technology and aids ongoing studies in this rapidly evolving field.

1. Introduction

Blockchain technology has recently become increasingly popular for its secure and decentralized data processing. However, it has encountered several security issues and shortcomings, such as malicious attacks and forking. When anomalies occur in blockchain networks, there can be serious consequences, such as transaction fraud, data tampering, and network disruption. Traditional anomaly detection methods designed for centralized systems are ill-suited to blockchain because of its decentralized nature [1]. Therefore, detecting anomalous behavior in blockchain is crucial to prevent security threats [2]. Anomaly detection is the task of identifying patterns in data that deviate from expected behavior. Machine learning (ML) attempts to “Automate the procedure of gaining knowledge from instances” [3]. ML methods are increasingly used for anomaly detection and can likewise be applied to monitor blockchain networks and to detect and prevent anomalies in them, which makes ML a promising tool for blockchain security. ML algorithms can be trained on historical data and then used to detect anomalous patterns in the blockchain [4].
Anomaly detection in blockchain systems plays a vital role in upholding the core principles of the confidentiality, integrity, and availability (CIA) triad. By identifying malicious transactions, it helps preserve data integrity; by preventing potential attacks, it ensures the continued availability of the network; and by analyzing suspicious behavior, it contributes to maintaining confidentiality. This integration of ML into blockchain security highlights how advanced technologies can effectively translate foundational cybersecurity principles into practical safeguards within decentralized systems [4,5]. Figure 1 shows the CIA triad.
The primary purpose of this study is to perform a systematic review that comprehensively examines machine-learning techniques for detecting anomalies in blockchain networks, with a focus on the 51% attack. Moreover, this study aims to evaluate the performance of ML models in anomaly detection. Additionally, we analyze the percentage of papers incorporating ML models in anomaly detection during the last five years. We believe this study will enable scholars to understand the various anomaly detection methods and the latest research on the topic. To the best of our knowledge, very few SLRs address the detection of abnormalities in blockchain networks using ML techniques, which is the motivation for this study. We read the articles thoroughly and followed the procedure of Kitchenham and Charters when selecting the research papers, examining (1) the main prediction work in anomaly detection, (2) the types of ML models used for anomaly detection in blockchain, (3) the approximation and accuracy of the ML models proposed for anomaly detection in blockchain, and (4) the strengths and limitations of the proposed ML technique. The main contributions of this study are as follows:
  • Evaluate the performance of ML models in anomaly detection, including supervised and unsupervised anomaly detection.
  • Compare the performance of each of these techniques.
  • Classify the techniques according to algorithm type, learning type, data requirements, objective, optimization problem, decision boundary type, performance evaluation, regularization parameter, and merits/demerits.
  • Cover the years 2019–2025, which is a relatively current period.

2. Background on Blockchain Technology

The blockchain is an electronic platform that stores digital data in blocks, all of which are linked to one another. The immutability of a blockchain prevents changes to the information stored in the blocks: any modification to a single bit of data in a block renders all subsequent blocks invalid [6]. Key information, such as customer data, payment details, and property contracts, is publicly available in the network and can be freely accessed and tracked by anybody, which creates a potential security risk [7]. However, each block has a digital signature corresponding to the string of data associated with that block, computed with a hashing algorithm. Changing even a single bit of the block string produces a different digital signature. Therefore, anyone connected to the network can verify the blocks in the blockchain, which allows network participants to recognize any malicious change immediately. Although blockchain provides numerous advantages to ensure security and trustworthiness, it is also vulnerable to security attacks, such as the 51% attack [8,9].
Figure 2 illustrates the layout of a Bitcoin block. The block header is 80 bytes long and contains the version, previous block hash, Merkle hash, timestamp, difficulty level, and nonce value. The block information is then hashed, and the resulting hash is stored in the next block’s header, which chains the blocks together [10].
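As a minimal sketch of the header layout described above, the following Python snippet packs the six fields into 80 bytes and hashes the result. The field widths and the double SHA-256 follow the Bitcoin header format; the helper names and all concrete values (timestamps, the `bits` value, the placeholder Merkle hashes) are toy assumptions and are not taken from a real block.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice, as Bitcoin does for block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def pack_header(version: int, prev_hash: bytes, merkle_root: bytes,
                timestamp: int, bits: int, nonce: int) -> bytes:
    """Serialize the six header fields into 80 bytes (little-endian integers)."""
    if len(prev_hash) != 32 or len(merkle_root) != 32:
        raise ValueError("hash fields must be 32 bytes each")
    # 4 + 32 + 32 + 4 + 4 + 4 = 80 bytes; "<" disables alignment padding.
    return struct.pack("<i32s32sIII", version, prev_hash, merkle_root,
                       timestamp, bits, nonce)


# Toy values (hypothetical): the first block has an all-zero previous hash.
genesis = pack_header(1, b"\x00" * 32, double_sha256(b"tx-set-0"),
                      1_700_000_000, 0x1D00FFFF, 0)
# The second header stores the hash of the first one, which links the blocks.
block_1 = pack_header(1, double_sha256(genesis), double_sha256(b"tx-set-1"),
                      1_700_000_600, 0x1D00FFFF, 0)

print(len(genesis), double_sha256(block_1).hex())
```

Because the previous-hash field sits inside the data that is hashed, changing any field of the first header changes the value the second header would need to store.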

2.1. Hashing and Digital Signatures

A digital signature is a technique that helps to ensure data accuracy and integrity [11]. In blockchain, a digital signature is associated with each block depending on its content. Figure 2 illustrates a partial view of three interconnected blocks. Each block generates a digital signature by considering factors such as the digital signature of the previous block and the Merkle hash. In Bitcoin, the difficulty level governs how long it takes to create a block. Therefore, a digital signature not only qualifies the block’s validity but also ensures integrity. This is crucial because miners must meet the difficulty requirement, such as generating a digital signature starting with 17 zeros, to validate a block. If an attacker alters the data in the Merkle tree, the resulting signature no longer matches, which invalidates the block and breaks the chain’s integrity [10]. To restore a valid chain, the attacker would have to generate a new signature for the altered block and for every subsequent block. This is very costly and almost impossible in practice, because the attacker must regenerate signatures not only for the altered blocks but also for the new blocks that are added in the meantime.
Figure 3 illustrates the structure of some interconnected Bitcoin blocks. In the Merkle tree, Tx1, Tx2, and Tx3 are hashed, and the resulting Merkle hash is then combined with a nonce value and the difficulty level to create another hash. This hash is stored in the following block, which forms the chain.
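The tamper-evidence argument can be illustrated with a toy chain. The sketch below is a simplification under stated assumptions: it uses a single SHA-256 instead of the double hash used by Bitcoin, omits the difficulty check, and represents blocks as plain dictionaries; the function names and transaction labels are hypothetical.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions):
    """Hash transactions pairwise up to one root (odd levels duplicate the last node)."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def block_hash(prev_hash: bytes, transactions, nonce: int) -> bytes:
    """Toy block hash over the previous hash, the Merkle root, and the nonce."""
    return sha256(prev_hash + merkle_root(transactions) + nonce.to_bytes(4, "little"))


def build_chain(tx_sets):
    chain, prev = [], b"\x00" * 32
    for txs in tx_sets:
        block = {"prev_hash": prev, "txs": list(txs), "nonce": 0}
        block["hash"] = block_hash(prev, block["txs"], block["nonce"])
        chain.append(block)
        prev = block["hash"]
    return chain


def first_invalid_block(chain):
    """Return the index of the first block with a broken hash or link, else None."""
    prev = b"\x00" * 32
    for index, block in enumerate(chain):
        recomputed = block_hash(block["prev_hash"], block["txs"], block["nonce"])
        if block["prev_hash"] != prev or recomputed != block["hash"]:
            return index
        prev = block["hash"]
    return None


chain = build_chain([[b"Tx1", b"Tx2", b"Tx3"], [b"Tx4"], [b"Tx5", b"Tx6"]])
ok_before = first_invalid_block(chain)   # untouched chain
chain[0]["txs"][1] = b"Tx2-tampered"     # alter one transaction in the first block
bad_after = first_invalid_block(chain)   # the first block no longer matches its hash
print(ok_before, bad_after)
```

If the attacker then recomputes the hash of the altered block, the mismatch simply moves to the next block, whose stored previous hash no longer agrees; this is why every subsequent block would have to be regenerated.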

2.2. Blockchain Mining

Blockchain mining is the process of creating a new block and incorporating it into the distributed ledger. Mining is mainly associated with Bitcoin, although block generation varies among cryptocurrencies. Mining nodes must solve computationally intensive mathematical problems, which typically requires specialized hardware with significant processing power. As soon as a miner solves a block, it broadcasts the result to the network; the other miners then stop working on that block and begin solving the puzzle for the next one. Because mining is computationally expensive, mining pools play a vital role in the process. A mining pool is a group of mining machines, so a pool possesses more hashing capability than a single miner, which gives it the advantage of obtaining the requisite hash faster [12]. Producing a Bitcoin block takes approximately ten minutes and makes heavy use of hardware capabilities. Miners who add a valid block to the chain receive a block reward plus the fees of the transactions it contains. The reward halves roughly every four years: it was 12.5 BTC between the 2016 and 2020 halvings and has been 3.125 BTC since the April 2024 halving.

2.3. Proof of Work (PoW)

The first type of consensus mechanism is proof of work (PoW), in which participants must solve a mathematical puzzle to mine a block. Bitcoin was the first cryptocurrency to adopt PoW, and many other popular cryptocurrencies followed the same practice. Miners are the primary stakeholders in PoW consensus and decide whether to authorize and include transactions in the blockchain. To mine a block under PoW consensus, miners must expend considerable effort and energy. The procedure entails trying candidate answers at random using a trial-and-error approach [7]; therefore, solving the puzzle may require one or many attempts. A block is recognized only if its hash does not exceed the “target hash” [13].
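The trial-and-error search can be sketched in a few lines of Python. This is an illustrative toy, not the Bitcoin mining procedure: it uses a single SHA-256, an assumed easy target with 12 leading zero bits, and sequential rather than random nonces; the function name `mine` and the block payload are hypothetical.

```python
import hashlib


def mine(block_data: bytes, target: int, max_nonce: int = 2**32):
    """Try nonces in order until the hash, read as an integer, is <= target."""
    for nonce in range(max_nonce):
        digest = hashlib.sha256(block_data + nonce.to_bytes(4, "big")).digest()
        if int.from_bytes(digest, "big") <= target:
            return nonce, digest.hex()
    return None  # no valid nonce within the allowed range


# Assumed toy target: the top 12 bits must be zero, so about 1 in 4096 hashes qualifies.
target = 2 ** (256 - 12) - 1
result = mine(b"toy block", target)
print(result)
```

Lowering the target makes qualifying hashes rarer, so the expected number of attempts grows; this is how the difficulty level controls the effort needed to mine a block.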
Since PoW consensus depends on the assumption that more than half of the network’s hash power is controlled by trustworthy miners, it remains prone to attacks that exploit more than half of the hash power. One of the major disadvantages of PoW is its high energy cost and hardware requirements. Existing literature reports that Bitcoin mining requires more power than 159 countries. However, the mining requirements and the actual time needed to mine can differ strikingly depending on the algorithms employed by each cryptocurrency. PoW mining is also slow compared with other consensus protocols. Mining power is concentrated as well: a few pools dominate the mining power of a network worldwide, and attacks against those pools are capable of destabilizing the Bitcoin network. Recent attacks have demonstrated that PoW is not immune to a 51% attack. Digital coins that rely on PoW consensus and have low hashing capability are more prone to the 51% attack, as obtaining the necessary hash power is comparatively simple [14]. According to [15], the probability that a node produces a new block in a proof-of-work (PoW) blockchain is given by the following equation.
$$P(T_i \le t) = 1 - \exp\left(-\frac{r_i}{D}\, t\right)$$
Here, node $i$ has processing capacity $r_i$, $T_i$ is the time the node takes to produce a new valid block, which follows a negative exponential distribution, and $D$ denotes the target difficulty value.
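To make the formula concrete, the following sketch evaluates it for hypothetical values. The units are an assumption made only for illustration: capacity and difficulty are chosen so that the ratio of capacity to difficulty is a rate per second, and the function name is not from the source.

```python
import math


def block_probability(r_i: float, difficulty: float, t: float) -> float:
    """P(T_i <= t) = 1 - exp(-(r_i / D) * t) for capacity r_i and difficulty D."""
    if r_i < 0 or difficulty <= 0 or t < 0:
        raise ValueError("r_i and t must be non-negative and difficulty positive")
    return 1.0 - math.exp(-(r_i / difficulty) * t)


# Hypothetical node: rate r_i / D = 1/600 per second, observed for 600 seconds.
baseline = block_probability(1.0, 600.0, 600.0)   # 1 - e^-1
doubled = block_probability(2.0, 600.0, 600.0)    # 1 - e^-2
print(round(baseline, 4), round(doubled, 4))
```

Doubling the processing capacity raises the probability of producing a block within the same interval, while raising the difficulty lowers it.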

2.4. 51% Attack on Blockchain Network

The 51% attack occurs when an attacker succeeds in obtaining more than half of the network’s hashing power. The attack involves forging a parallel chain that is kept completely hidden from the primary chain. The attacker later discloses this secret chain to the network so that it becomes the main chain [6], which is what makes the malicious activities possible. Because blockchain policy holds that the longest chain prevails, attackers with 51% or more of the hashing power can force the other nodes in the network to follow their chain. Attackers who hold less than half of the hashing power can still attempt malicious activities such as a double-spending attack and damage the credibility of the blockchain, but with a lower probability of success [8].
The network hashing rate directly influences the cost of an attack: as the network hashing rate increases, the price of the attack also rises. Hence, cryptocurrencies with a high network hashing rate are assumed to be much better protected against the 51% attack [16]. However, if a single entity comes to control the majority of the hash rate, a 51% attack becomes possible, which is dangerous for the blockchain.

2.5. Use Case of a 51% Attack

In July 2014, the Bitcoin mining pool GHash.io briefly exceeded 50% of the total hashrate of the network, raising significant concerns about the potential for a 51% attack, a scenario in which a single entity could compromise the integrity of the blockchain by reversing transactions, double spending, or censoring other users. Although GHash.io did not exploit this control, its dominance exposed a critical vulnerability in Bitcoin’s consensus mechanism and underscored the risks associated with mining centralization. In response, GHash.io voluntarily limited its hashrate to 42% and supported the formation of a monitoring committee to address future threats [14]. This incident served as a pivotal case study, emphasizing the need for robust decentralization to maintain trust and security in blockchain networks. Emerging ML techniques offer promising solutions by enabling the real-time monitoring of mining activity, detecting anomalous behavior, and predicting centralization trends, thereby supporting proactive measures to preserve network integrity.

2.6. Probability of 51% Attack Calculation

The Bitcoin framework proposed by Satoshi Nakamoto [17] calculates the probability of a successful attack for the different shares of computing power that an attacker could control. The race between the attacker chain and the honest chain can be characterized as a binomial random walk. The probability that the attacker chain overtakes the honest chain is calculated as follows:
$$q_z = \begin{cases} 1 & \text{if } p \le q \\ \left(\frac{q}{p}\right)^{z} & \text{if } p > q \end{cases}$$
In the above equation, $p$ represents the probability that an honest miner finds the next block, and $q$ represents the probability that the attacker finds the next block. Similarly, $q_z$ represents the probability that the attacker will ever catch up with the honest nodes from $z$ blocks behind. If the attacker’s probability is at least as large as that of the honest miners, the attacker will catch up to the honest chain successfully. However, many factors affect the probability of an attack, including the network’s mining power, the difficulty level in proof-of-work consensus, and other variables. The recipient must wait until the transaction has been added to a block and further blocks have confirmed it (six confirmations is the common convention); in general, $z$ blocks are added to the main chain after it. The recipient does not know how much progress the attacker has made. However, assuming that the honest blocks took the average expected time per block, the attacker’s potential progress follows a Poisson distribution with density $\lambda^{k} e^{-\lambda} / k!$ and expected value $\lambda = z\,q/p$ [18].
To determine the probability that the attacker can still catch up, the Poisson density for each amount of progress $k$ the attacker could have made is multiplied by the probability of catching up from that point.
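$$\sum_{k=0}^{\infty} \frac{\lambda^{k} e^{-\lambda}}{k!} \cdot \begin{cases} \left(\frac{q}{p}\right)^{(z-k)} & \text{if } k \le z \\ 1 & \text{if } k > z \end{cases}$$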
The final equation to calculate the probability of the 51% attack is rearranged to avoid summing the infinite tail of the distribution.
$$1 - \sum_{k=0}^{z} \frac{\lambda^{k} e^{-\lambda}}{k!} \left(1 - \left(\frac{q}{p}\right)^{(z-k)}\right)$$
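The final expression is straightforward to evaluate numerically. The sketch below is a Python rendering of this calculation under the assumptions of Nakamoto’s analysis ($p = 1 - q$ and $\lambda = z\,q/p$); the function name is chosen here for illustration.

```python
import math


def attacker_success_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash share q catches up from z blocks behind."""
    if not 0.0 <= q <= 1.0 or z < 0:
        raise ValueError("q must lie in [0, 1] and z must be non-negative")
    p = 1.0 - q
    if q >= p:
        return 1.0  # with at least half the hash power, the attacker always catches up
    lam = z * (q / p)  # expected attacker progress while z honest blocks are mined
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total


for z in (0, 1, 2, 5):
    print(z, round(attacker_success_probability(0.1, z), 7))
```

For an attacker with 10% of the hash power, the probability drops quickly as the number of confirmations $z$ grows, which is the reasoning behind waiting for several confirmations before accepting a payment.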

3. Literature Review

Blockchain has recently emerged as a promising technology in various areas and industries. However, it also suffers from various anomalous attacks, such as the majority attack, 51% attack, and double-spending. Therefore, detecting anomalous behavior in blockchain is necessary to prevent security threats. This literature review will focus on peer-reviewed research papers produced between 2019 and 2025 that aim to study the possibility of implementing ML algorithms to detect anomalies in the context of Blockchains. We perform a Systematic Literature Review (SLR) of ML models that are used to detect anomalies in the Blockchain network.
Our search yielded 250 papers published between 2019 and 2025 that propose ML-based approaches to anomaly detection in blockchain. From the selected research articles, we outline the ML techniques used for anomaly detection. We also note that, among the categories of anomaly detection systems, researchers have used unsupervised anomaly detection most often. Detecting anomalies with ML models is a promising development direction, as many studies use different ML models. Hence, we provide recommendations and guidelines for researchers based on this review of the pertinent literature.
Parsad et al. proposed a framework for Dynamic Miner Reputation and Weighted Block Acceptance (DRW-BA) that protects the blockchain from a 51% attack by tracking mining behavior within the adopted consensus mechanisms. The major limitation of this study is its scalability, owing to the computational overhead in real-time scenarios. Additionally, the model depends on the accuracy of the miners’ behavior data, which adversarial nodes could easily manipulate [1].
Hasan et al. proposed an under-sampling algorithm called XGBCLUS to handle the imbalance across the transaction dataset between anomalous and non-anomalous data. This research incorporates eXplainable Artificial Intelligence (XAI) methods and anomaly rules into tree-based ensemble classifiers for anomalous Bitcoin transaction identification. An approach known as the Shapley Additive exPlanation (SHAP) is used to quantify the contribution of each feature, and it is usable with ensemble learning. Further, they define the procedure for deciding whether a Bitcoin transaction is normal. This algorithm is contrasted with other basic over-sampling and under-sampling methods [3].
Mohammed et al. proposed an architecture based on blockchain and ML analysis to detect fraudulent transactions and attacks in blockchain. The architecture has two stages: first, an ML model checks the data from sensors and blocks abnormal transactions from entering the network; second, the same ML model checks the transactions stored in the blockchain and detects attacks on it. The major limitation of this study is the computational overhead in real-time scenarios, as the same ML model is used for both tasks, sensor data analysis and attack detection in blockchain [19].
Aponte-Novoa et al. present a comprehensive classification of miners in the Ethereum and Bitcoin blockchain networks. This classification aims to provide the distributed computing assumptions and to create miner profiles that help detect anomalous behavior in miner nodes and avert 51% attacks. They present a detailed analysis of the mining behavior of nodes in these well-known networks, based on real blockchain data collected from Ethereum and Bitcoin and methodically examined using data science techniques [14]. Duong, T., proposes a new approach, known as 2-hop Blockchain, that combines two consensus mechanisms, PoW and PoS, to mitigate the risk of double-spending attacks. This approach ensures that even if a miner node obtains more than 51% of the network’s mining power, it has few opportunities to take malicious action. The two mechanisms are applied in sequence: PoW in the first step and PoS in the second. The limitation is higher centralization and resource consumption [20].
Yun et al. propose a trust-based shard distribution (TBSD) scheme that assigns potentially malicious nodes to different shards in the blockchain network, so that malicious nodes cannot gain a dominating influence within a single shard. The TBSD model incorporates a genetic algorithm (GA) and a trust management system to prevent malicious nodes from gathering in a single shard, which reduces the likelihood of a majority (51%) attack in that shard [21].
Sayadi, S., et al., designed a new anomaly detection model that combines One-Class Support Vector Machine (OCSVM) and K-Means techniques to improve detection outcomes. The model has two phases: Phase 1 performs node behavioral analysis based on One-Class SVM, and Phase 2 detects anomalous patterns in Bitcoin transactions. In Phase 2, K-means clustering is used to validate the attack, and each type of attack is grouped based on the distance between the similarity indices. A limitation is that training the model on large datasets takes a long time [22].
Chen, X., et al., proposed a learning chain model over a blockchain that is a general learning predictive model. They develop differential privacy-based schemes by using decentralized SGD to protect the data privacy of each party. They also proposed a nearest aggregation algorithm capable of protecting the network from potential Byzantine attacks. A trade-off is observed between accuracy and privacy; whenever the budget for privacy is decreased, it amplifies the test error rate in all datasets. Furthermore, the model is compared with the higher differential privacy model, “Learning ChainEX,” but a similar test error has been found [23].

4. Research Methodology

A Systematic Literature Review (SLR) was carried out to achieve the objective of this research, following the process outlined by Kitchenham and Charters [24]. The approach involves planning, conducting, and reporting the research, with several phases within each stage. The planning phase includes six stages. First, we define the study questions in terms of the purpose of the review. Second, we develop a search strategy to find relevant research papers that address the research questions by specifying appropriate search terms. Third, we establish the procedures for study selection, including inclusion and exclusion criteria. Fourth, we set rules for quality assessment to evaluate the collected studies. Fifth, we detail an extraction strategy to gather data that answers the research questions. Finally, the sixth stage involves synthesizing the obtained data. Figure 4 shows the methodology followed in the SLR.

4.1. Research Questions

The main objective of this SLR is to provide an overview of, elucidate, and analyze the ML models utilized for anomaly detection in blockchain networks from 2019 to 2025. This study is based on three research questions (RQs):
RQ1: What is the percentage of papers that address anomaly detection in blockchain by implementing ML models?
RQ1 seeks to give the percentage of research papers that use ML models to detect anomalies in the blockchain.
RQ2: What kinds of ML algorithms are used for anomaly detection in blockchain?
RQ2 aims to specify the machine-learning techniques used to detect anomalies. It provides insight into the diversity and types of methods used in this research field, which can help understand trends, identify gaps, or guide future research.
RQ3: What is the accuracy and overall estimation of ML models?
RQ3 focuses on the estimation capabilities of ML models. The primary performance measure for these models is estimation accuracy. This question examines three key aspects: building the dataset, the performance metrics used, and the accuracy values achieved.

4.2. Search Strategy

The search strategy is based on research question 1 (RQ1), which calls for a quantitative analysis: the percentage of papers that address a particular application, anomaly detection in blockchain, using a specific approach, machine learning. This percentage is a measurable and precise piece of information that can be used to gauge the prevalence or popularity of this research focus. We constructed the search terms using the following procedure:
  • The search terms relevant to anomaly detection and machine learning are utilized in this SLR;
  • The research questions are analyzed to identify the key search terms for this SLR;
  • The main terms are also replaced with alternative key terms, such as malicious, outliers, and anomalous;
  • Boolean operators such as OR and AND are also utilized to restrict the search results.
The following keywords were used to retrieve papers that address the research questions: “Anomaly detection in blockchain”, “anomaly detection in blockchain AND machine learning”, “51% attack in blockchain”, “51 percent attack detection in blockchain”, “anomalous behavior of blockchain nodes detection”, “anomalous transactions detection AND machine learning”, “anomalous fork detection in blockchain”, “supervised ML models for anomaly detection in blockchain”. In this search, we utilized the following digital libraries: Google Scholar, Springer, ACM Digital Library, IEEE Xplore, and Elsevier. This review included 47 papers based on our inclusion and exclusion criteria. Table 1 shows the number of records found based on keyword searches on various digital libraries.
Figure 5 shows the publication trends from 2019 to 2025. The x-axis represents sequential years, while the y-axis on the right quantifies the number of publications, ranging from 100 to 2000. The bar chart indicates the specific publication count for each year, while the superimposed line graph traces the cumulative number of publications over this period. The data reveal a consistent upward trajectory in publication numbers until 2024, with a particular increase between 2019 and 2024. This trend appears to decline slightly in 2025, suggesting a potential stabilization in the publication output during this period.

4.3. Study Selection

Initially, we gathered 250 papers using the previously mentioned search terms, then we refined the selection to ensure only relevant papers were included in the SLR. The process of filtration and selection was as follows:
  • Duplicate articles retrieved from the various digital libraries were eliminated.
  • Inclusion and exclusion criteria based on the research questions were set to exclude irrelevant papers.
  • Books and lecture notes were excluded from the collected list.
  • Quality assessment criteria were applied to retain only those studies that best addressed our research goals.
Table 2 shows a brief overview of inclusion and exclusion criteria.

4.4. Quality Assessment Rules (QARs)

The Quality Assessment Rules (QARs) were the final step in determining the list of papers to be included in this SLR. These QARs are crucial for evaluating and ensuring the quality of the research papers. Ten QARs were identified, and each is scored between 0 and 1, so the total score of a paper ranges from 0 to 10. The scoring for each QAR is as follows: “fully answered” receives 1 point, “above average” receives 0.75, “average” receives 0.5, “below average” receives 0.25, and “not answered” receives 0. The total score for each paper is the sum of these marks. Papers scoring five or higher are included, while those scoring below five are excluded. The threshold of 5 was chosen because it is the midpoint of the scoring range, so that the included papers adequately address our research goals. The QARs are as follows:
  • QAR1: Are the objectives of the study clearly defined?
  • QAR2: Are the techniques for anomaly detection well-defined and discussed?
  • QAR3: Is the proposed specific application of anomaly detection well-defined?
  • QAR4: Are the implementation details of the proposed work included in the paper?
  • QAR5: To what extent are the experiments valid and justified?
  • QAR6: Is a sufficient dataset used to conduct experiments?
  • QAR7: Are the criteria in the estimation accuracy report accurate?
  • QAR8: Is the proposed approach compared with similar approaches?
  • QAR9: Is the analysis of the outcomes based on the proper techniques?
  • QAR10: Does the study have any implications for the academic fraternity or the industry?
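The scoring rule above can be expressed as a short sketch. The rating labels, point values, and threshold are taken from the text; the function names and the example ratings are hypothetical and do not correspond to any reviewed paper.

```python
# Points per rating label, as stated in the scoring scheme.
SCORE = {
    "fully answered": 1.0,
    "above average": 0.75,
    "average": 0.5,
    "below average": 0.25,
    "not answered": 0.0,
}
THRESHOLD = 5.0  # papers scoring five or higher are included


def qar_total(ratings):
    """Sum the points of the ten QAR ratings of one paper."""
    if len(ratings) != 10:
        raise ValueError("exactly ten QAR ratings are expected")
    return sum(SCORE[rating] for rating in ratings)


def is_included(ratings) -> bool:
    return qar_total(ratings) >= THRESHOLD


# Hypothetical papers used only to exercise the rule.
strong_paper = ["fully answered"] * 6 + ["not answered"] * 4
borderline_paper = ["average"] * 10
weak_paper = ["below average"] * 10

for name, ratings in [("strong", strong_paper), ("borderline", borderline_paper), ("weak", weak_paper)]:
    print(name, qar_total(ratings), is_included(ratings))
```

A paper rated “average” on every rule lands exactly on the threshold and is therefore included.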

4.5. Data Extraction Strategy

The final list of selected papers was analyzed in the data extraction step to collect the information needed to answer the research questions. The information collected from each paper comprises the title of the paper, year of publication, type of publication, type of anomaly application, and the data relevant to RQ1, RQ2, and RQ3. The information was unstructured, which made extraction difficult. Note also that not all papers addressed all three research questions.

4.6. Synthesis of Extracted Data

To compile the data acquired from the selected papers, we employed several processes to aggregate evidence to address the research questions (RQs). We followed the narrative synthesis method to organize and present the information per RQ1 and RQ2. For RQ3, which involved data from various papers with differing methods for calculating accuracy, we used binary outcomes to compare the quantitative results.
Table 3 presents the comprehensive details of the ML models utilized in various applications for anomaly detection in blockchain networks. Previous studies were analyzed based on the ML model, blockchain type, application, evaluation criteria, and study findings to draw the research gap, showing the need for more focus in the field.

5. Machine Learning Algorithms in Blockchain

Anomaly detection in blockchain using ML identifies unusual patterns or behaviors in blockchain transactions, blocks, or other activities that may indicate fraudulent activities, security breaches, or other anomalies. Various ML methods are commonly employed for this purpose. We examined some of these ML models to address Research Questions 2 and 3 (RQ2 and RQ3).

5.1. Support Vector Machine (SVM)

A support vector machine (SVM) is an ML algorithm for classification problems. It is a frequently cited algorithm that uses a hyperplane to separate one class of observations from another in a multidimensional space. The SVM is also used in one-class classification problems, where all the training data belong to a single class: the algorithm learns the normal data and classifies new data as either normal or anomalous. Although the SVM performs well on linearly separable datasets, it can also be applied to non-linearly separable datasets by allowing a certain proportion of errors. This is achieved by introducing slack variables $\xi_i$ and a parameter $C$ that limits the errors. The objective function minimized by the SVM is as follows [32].
m i n w , b , ξ i w 2 2 + C i = 1 n ξ i
Subject to
y i w T ϕ ( x i ) + b 1 ξ i , for all i = 1 , , n or ξ i 0 , for all i = 1 , , n
x i is the i-th input data pattern and y i { 1 , 1 } is the i-th output pattern, specifying the class membership. The ϕ represents the non-linear function, and b is a bias vector.
x i · w + b + 1 for all y i = + 1 and x i · w + b 1 for all y i = 1
The hyperplane is defined as w T · x = 0 , and to increase the margin between the hyperplane, we should decrease w .
m i n w , b 1 2 w 2
The upper bound C represents the error rate; the possibility of errors increases when it is larger and decreases when it is smaller.
m i n w , b , c 1 2 w 2 + C i = 1 n C i
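The interplay between the slack variables and $C$ can be illustrated with a short sketch. It is not the implementation used in the cited work: it assumes a linear kernel (so $\phi$ is the identity), and the function name `soft_margin_objective` and the toy data are hypothetical.

```python
def soft_margin_objective(w, b, X, y, C):
    """Evaluate 0.5 * ||w||^2 + C * sum(xi_i) for a fixed (w, b).

    Each slack is xi_i = max(0, 1 - y_i * (w . x_i + b)), i.e. the amount by
    which sample i violates the margin constraint.
    """
    slacks = []
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(wj * xj for wj, xj in zip(w, x_i)) + b)
        slacks.append(max(0.0, 1.0 - margin))
    norm_sq = sum(wj * wj for wj in w)
    return 0.5 * norm_sq + C * sum(slacks), slacks


# Hypothetical 2-D toy data: the second point lies inside the margin.
X = [(2.0, 0.0), (0.5, 0.0), (-2.0, 0.0), (-3.0, 0.0)]
y = [+1, +1, -1, -1]
w, b = (1.0, 0.0), 0.0

value_small_C, slacks = soft_margin_objective(w, b, X, y, C=0.1)
value_large_C, _ = soft_margin_objective(w, b, X, y, C=10.0)
print(slacks)                        # only the second sample has non-zero slack
print(value_small_C, value_large_C)  # the same violation costs more when C is large
```

With the same hyperplane, the single margin violation contributes little to the objective when $C = 0.1$ and dominates it when $C = 10$, which is why a large $C$ pushes the optimizer towards fewer training errors.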

5.2. K-Means

The k-means algorithm is an unsupervised algorithm that partitions the data samples into k clusters, with each observation belonging to the cluster with the nearest mean. It chooses the cluster centroids to minimize the inertia, also known as the within-cluster sum-of-squares criterion. The k-means algorithm is also used to detect anomalies in a dataset. Inertia can be considered a measure of how internally coherent the clusters are. However, it suffers from some limitations:
  • Inertia assumes that clusters are convex and isotropic, which is not always the case; it responds very poorly to elongated clusters and to manifolds with irregular shapes.
  • Inertia is a non-normalized metric; lower values are better, and zero is optimal. However, Euclidean distances tend to become inflated in very high-dimensional spaces.
The k-means objective is given by the following equation.
$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \| x - \mu_i \|^2$$
With the k-means algorithm, a set of $n$ samples $\{X_1, X_2, \ldots, X_n\}$ is partitioned into $k$ disjoint clusters $S = \{S_1, S_2, \ldots, S_k\}$ with $k \le n$, each of which is described by the mean $\mu_i$ of the samples within it; $\mu_i$ is the centroid of the points in $S_i$. The centroids are selected to minimize the within-cluster sum-of-squares criterion (inertia), that is, the squared distance between the points inside each partition and their centroid [38,39].
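As an illustration of the inertia criterion, the following sketch computes the within-cluster sum of squares for a fixed set of centroids and performs one centroid update. It is a minimal pure-Python example under stated assumptions, not the scikit-learn implementation; the function names and the four toy points are hypothetical.

```python
def nearest_centroid(point, centroids):
    """Return (index, squared distance) of the centroid closest to `point`."""
    best_idx, best_d2 = None, float("inf")
    for idx, c in enumerate(centroids):
        d2 = sum((p - q) ** 2 for p, q in zip(point, c))
        if d2 < best_d2:
            best_idx, best_d2 = idx, d2
    return best_idx, best_d2


def inertia(points, centroids):
    """Within-cluster sum of squares for the given centroids."""
    return sum(nearest_centroid(p, centroids)[1] for p in points)


def update_centroids(points, centroids):
    """One update step: move each centroid to the mean of its assigned points."""
    groups = {i: [] for i in range(len(centroids))}
    for p in points:
        groups[nearest_centroid(p, centroids)[0]].append(p)
    updated = []
    for i, c in enumerate(centroids):
        members = groups[i]
        if not members:
            updated.append(c)  # an empty cluster keeps its previous centroid
        else:
            updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return updated


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 0.0), (10.0, 0.0)]

before = inertia(points, centroids)
centroids = update_centroids(points, centroids)
after = inertia(points, centroids)
print(before, after)  # inertia drops once the centroids move to the cluster means
```

Repeating the assignment and update steps until the centroids stop moving gives the usual iterative procedure; each update can only keep the inertia the same or lower it.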

5.3. Random Forest

Breiman introduced Random Forest (RF) in 2001. As its name implies, a Random Forest is a forest made up of a large number of trees. A decision tree (DT) is used as the sub-model, and each tree is grown with its own random parameters (a bootstrap sample of the data and a random subset of the features), which reduces the dependence between trees. Like other ensemble algorithms, the Random Forest model generates predictions by combining multiple individual models. The RF procedure is composed of several steps. First, random bootstrap samples are drawn from the dataset. Next, a DT is built on each sample and yields its own prediction. Finally, the per-tree predictions are combined in a voting phase, and the class that receives the most votes becomes the final output [40]. The Gini index, which measures node impurity, can be written as
$$\text{Gini Index} = 1 - \sum_{i=1}^{n} P_i^2$$
$$\text{Gini Index} = 1 - \left[ (P_+)^2 + (P_-)^2 \right]$$
where $P_i$ is the probability of class $i$, and $P_+$ and $P_-$ are the probabilities of the positive and negative classes, respectively. Entropy is an alternative impurity measure used to decide whether a node should branch, based on the probabilities of the possible outcomes [41]. It is more expensive to compute than the Gini index because it involves a logarithm.
$$\text{Entropy} = -\sum_{i=1}^{c} p_i \log_2 (p_i)$$
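Both impurity measures can be computed directly from the class labels that reach a node. The sketch below is illustrative only; the function names are hypothetical, and the labels 0 (normal) and 1 (anomalous) are an assumed encoding.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Relative frequency of each class among the labels in a node."""
    counts = Counter(labels)
    total = len(labels)
    return [n / total for n in counts.values()]


def gini_index(labels):
    """Gini index: 1 - sum_i P_i^2."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels):
    """Entropy: -sum_i p_i * log2(p_i), in bits."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))


balanced_node = [0, 0, 1, 1]       # maximally impure for two classes
skewed_node = [0] * 19 + [1]       # 5% anomalies, close to pure
pure_node = [0, 0, 0, 0]

for name, node in [("balanced", balanced_node), ("skewed", skewed_node), ("pure", pure_node)]:
    print(name, gini_index(node), entropy(node))
```

Both measures are zero for a pure node and largest for a balanced one, so a split is preferred when it lowers the weighted impurity of the child nodes.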

5.4. XGBoost

Chen and Guestrin presented Extreme Gradient Boosting (XGBoost), a scalable ML system for tree boosting. It is based on gradient boosting and adds further techniques, such as a regularized objective, that help it outperform other gradient boosting models in prediction accuracy. XGBoost parallelizes the construction of each tree, while the trees themselves are added sequentially: in each iteration, a new weak learner is fitted so that the errors left by the previous trees are reduced. The final prediction of the model is the combination of the weak learners, as in other ensemble approaches [3]. The XGBoost objective function at iteration $t$ can be written as [36]
$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left( y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i) \right) + \Omega(f_t)$$
Let $\hat{y}_i^{(t)}$ denote the prediction for the $i$-th instance at the $t$-th iteration, so that $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$; the tree $f_t$ is added to minimize the above objective, in which $l$ is the loss function and $\Omega$ penalizes the complexity of the new tree [42].
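A single additive step of this objective can be sketched as follows. The example makes strong simplifying assumptions that are not taken from this study: the new tree $f_t$ has only one leaf, the loss is the squared error $l = \frac{1}{2}(y - \hat{y})^2$, and the complexity term reduces to an L2 penalty $\frac{1}{2}\lambda w^2$ on the leaf weight. Under the second-order approximation described by Chen and Guestrin, the optimal leaf weight is then $w^* = -G / (H + \lambda)$, where $G$ and $H$ are the sums of the first and second derivatives of the loss. The function names are hypothetical.

```python
def squared_error(y_true, y_pred):
    """Total loss: sum over samples of 0.5 * (y - y_hat)^2."""
    return sum(0.5 * (t - p) ** 2 for t, p in zip(y_true, y_pred))


def boosting_round(y_true, y_prev, lam=1.0):
    """One additive step y_hat^(t) = y_hat^(t-1) + f_t with a single-leaf tree."""
    grads = [p - t for t, p in zip(y_true, y_prev)]  # g_i: first derivative of the loss
    hess = [1.0 for _ in y_true]                     # h_i: second derivative of the loss
    leaf_weight = -sum(grads) / (sum(hess) + lam)    # w* = -G / (H + lambda)
    return [p + leaf_weight for p in y_prev], leaf_weight


y_true = [1.0, 1.0, 0.0, 1.0]
y_prev = [0.0, 0.0, 0.0, 0.0]   # predictions after iteration t-1

y_new, w_star = boosting_round(y_true, y_prev, lam=1.0)
loss_before = squared_error(y_true, y_prev)
loss_after = squared_error(y_true, y_new)
print(w_star, loss_before, loss_after)
```

The regularization parameter shrinks the step: with `lam=0.0` the leaf weight would equal the mean residual, whereas a larger value moves the predictions more cautiously.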

5.5. AdaBoost

Adaptive boosting, which its developers named AdaBoost, was the first truly successful and effective boosting approach for binary classification. Because it is applied to classification rather than regression, it is also known as discrete AdaBoost. The method does not discard weak learners; instead, it trains them one after another and iteratively corrects their errors so that, together, they form a strong learner. In each iteration, it assigns a weight to every training sample, increasing the weights of the samples that were misclassified so far, and the next weak learner is trained on this weighted training dataset. Afterwards, a large number of weak learners are combined into one strong learner: the final model is obtained by applying weighted voting to the weak learners [42,43].
Let $S = \{(x_1, y_1), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$ denote the training sample set in binary classification. The foundation of the AdaBoost method is an additive model, that is, a linear combination of base classifiers $h_t(x)$:
$$f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$$
The iteration index is denoted by $t \in \{1, \ldots, T\}$, the base classifiers $h_t(x)$ are trained with a base classification algorithm whose classification ability only needs to be slightly better than random guessing, and the weight coefficient of $h_t$ is denoted as $\alpha_t$. The final classifier after $T$ iterations is
$$F(x) = \operatorname{sign}(f(x)) = \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)$$
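The additive model and the weighted vote can be made concrete with a small sketch of discrete AdaBoost. It is an illustration under stated assumptions rather than the implementation evaluated in this study: the weak learners are one-dimensional threshold stumps, labels are encoded as $-1$ and $+1$, the coefficient follows the standard choice $\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}$ with $\varepsilon_t$ the weighted error, and all names and data are hypothetical.

```python
import math


def stump(threshold, polarity):
    """Weak learner h(x): returns `polarity` if x > threshold, else -polarity."""
    return lambda x: polarity if x > threshold else -polarity


def adaboost(X, y, candidates, T):
    """Train up to T rounds and return a list of (alpha_t, h_t) pairs."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(T):
        # Select the candidate with the lowest weighted error on the current weights.
        best_h, best_err = None, float("inf")
        for h in candidates:
            err = sum(w for w, x, t in zip(weights, X, y) if h(x) != t)
            if err < best_err:
                best_h, best_err = h, err
        if best_err >= 0.5:
            break  # no candidate is better than random guessing
        best_err = max(best_err, 1e-10)  # guard against division by zero
        alpha = 0.5 * math.log((1.0 - best_err) / best_err)
        ensemble.append((alpha, best_h))
        # Increase the weights of misclassified samples, decrease the others.
        weights = [w * math.exp(-alpha * t * best_h(x)) for w, x, t in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Final classifier F(x) = sign(sum_t alpha_t * h_t(x))."""
    f = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if f >= 0 else -1


# Toy data that no single stump can separate: the negatives sit in the middle.
X = [1, 2, 3, 4, 5, 6]
y = [+1, +1, -1, -1, +1, +1]
candidates = [stump(th, pol) for th in (0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5) for pol in (+1, -1)]

ensemble = adaboost(X, y, candidates, T=3)
predictions = [predict(ensemble, x) for x in X]
print(predictions)
```

On this toy set, every individual stump misclassifies at least two of the six samples, while the weighted vote of three stumps classifies all of them correctly.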
Table 4 compares five prominent ML algorithms, namely Support Vector Machine (SVM), K-means clustering, Random Forest, XGBoost, and AdaBoost, across multiple critical dimensions. The models are compared on characteristics including algorithmic type (supervised vs. unsupervised learning), learning paradigm, data requirements, optimization objectives, decision boundary characteristics, performance evaluation metrics, and key hyperparameters. The comparison shows that SVM, a supervised margin-based classifier, demonstrates strong performance in high-dimensional spaces but exhibits sensitivity to noise and outliers. In contrast, K-means, an unsupervised clustering algorithm, offers computational efficiency but requires a priori specification of cluster numbers and is sensitive to centroid initialization. Random Forest, an ensemble-based method, provides robust performance on large datasets but necessitates careful feature scaling. Meanwhile, XGBoost, a gradient-boosting technique, demonstrates superior resistance to overfitting but incurs higher computational costs. Finally, AdaBoost, a boosting algorithm, effectively reduces bias and variance but remains vulnerable to noisy data and outliers. This comparative analysis shows that each model involves performance trade-offs and should be selected based on specific research requirements and dataset characteristics.

6. Implementation Details

The primary focus of RQ3 is to analyze the accuracy and overall performance of ML models’ integration with the blockchain network. Therefore, three main aspects have been examined in this study. These key aspects are (1) the dataset building process (EDA), (2) the performance metrics, and (3) the accuracy values achieved. The details are given as follows:

6.1. Dataset Description

Obtaining real-time blockchain datasets presents significant challenges due to privacy and security constraints. Most enterprise networks operate on private or federated blockchains, such as permissioned Ethereum deployments, where data access is restricted. In contrast, public blockchain datasets (e.g., Bitcoin) are more readily accessible and can be sourced from open repositories like Kaggle or extracted via web scraping from blockchain explorers such as Blockchain.com. However, it is difficult to find all the required features in the available open-source datasets. Therefore, feature engineering is necessary to derive the desired features.
We used the Bitcoin Historical Dataset featured on Kaggle. The dataset contains eight files, and we used the “data_update_2021-05-08.csv” file to perform feature selection and anomaly detection. The dataset file consists of 682,676 rows and 19 columns.

6.2. Experimental Setup

We use the Kaggle platform for the experimental setup. The first step is to conduct Exploratory Data Analysis (EDA) to understand the main characteristics, patterns, and potential issues within a dataset. During the EDA process, relevant features are identified to detect anomalies. In the preprocessing stage, several essential tasks are carried out, including handling missing values, feature scaling, converting non-numeric columns, and splitting the dataset into features (X) and the target label (y).
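The preprocessing tasks listed above can be sketched in plain Python. This is a hedged illustration, not the pipeline used in the experiments: median imputation, z-score scaling, the 80/20 split, and all function names are assumptions chosen for the example.

```python
import random
import statistics


def impute_median(column):
    """Replace missing values (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in column]


def standardize(column):
    """Scale a numeric column to zero mean and unit (population) variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


def train_test_split(X, y, test_fraction=0.2, seed=42):
    """Shuffle row indices and split features X and labels y consistently."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


raw_column = [1.0, None, 3.0, 5.0]       # hypothetical column with a missing value
imputed = impute_median(raw_column)
scaled = standardize(imputed)

X = [[float(i)] for i in range(10)]      # hypothetical feature rows
y = [0] * 9 + [1]                        # hypothetical target label
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(imputed, len(X_train), len(X_test))
```

In practice, the scaling statistics are usually computed on the training split only and then applied to the test split, so that no information from the test data leaks into training.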

6.3. Feature Engineering

The features are selected based on the anomalous behavior of mining nodes that could indicate a hash rate capable of enabling a 51% attack. The following is the list of features:
  • Hash: It represents the hash value of a transaction.
  • Number of Transactions: It represents the number of transactions.
  • Timestamp: It represents the mining time of a transaction.
  • Height: It represents the block number.
  • Difficulty changes over time: A block’s difficulty measures how hard the block is to mine relative to the easiest possible difficulty; this feature tracks how that value changes over time.
  • Confirmations: It represents the number of times a transaction has been verified by subsequent blocks. More confirmations mean higher security and irreversibility.

6.4. Insert Anomalies

We applied a synthetic anomaly generation method to create anomalous transactions in the dataset. This method tests ML algorithms by generating synthetic anomalies that deviate from the normal data patterns. We insert anomalies into the relevant columns (Confirmations, Height, Difficulty, Number of Transactions, and Timestamp) so that 5% of the rows in the dataset are anomalous. A binary column is added to indicate whether each row contains an anomaly.
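One possible way to inject such anomalies is sketched below. The exact perturbation used in the experiments is not specified here, so the multiplicative distortion, the `scale` parameter, the column keys, and the `is_anomaly` column name are all assumptions made for illustration.

```python
import random


def inject_anomalies(rows, columns, fraction=0.05, scale=10.0, seed=0):
    """Return copies of `rows` in which `fraction` of them are perturbed.

    Each returned row gains a binary `is_anomaly` field; the input is not modified.
    """
    rng = random.Random(seed)
    n_anomalies = round(len(rows) * fraction)
    chosen = set(rng.sample(range(len(rows)), n_anomalies))
    result = []
    for i, row in enumerate(rows):
        new_row = dict(row)
        new_row["is_anomaly"] = 1 if i in chosen else 0
        if i in chosen:
            for col in columns:
                # Hypothetical distortion: push the value far outside its normal range.
                new_row[col] = row[col] * scale * rng.uniform(1.0, 2.0)
        result.append(new_row)
    return result


# Hypothetical block-level rows; the keys mirror the features listed in Section 6.3.
columns = ["confirmations", "height", "difficulty", "n_transactions", "timestamp"]
rows = [
    {
        "confirmations": 6.0,
        "height": 600000.0 + i,
        "difficulty": 1.5e13,
        "n_transactions": 2000.0,
        "timestamp": 1.6e9 + 600.0 * i,
    }
    for i in range(200)
]

labeled = inject_anomalies(rows, columns, fraction=0.05)
n_flagged = sum(r["is_anomaly"] for r in labeled)
print(n_flagged, len(labeled))
```

Fixing the seed keeps the injected anomalies reproducible, which makes it possible to evaluate every model against the same labeled rows.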

6.5. Train the ML Models

We build the models and train them on the training data. We implement the Random Forest, K-means, AdaBoost, XGBoost, and Isolation Forest ML models from scikit-learn.

6.6. Anomaly Detection

To detect anomalies, we consider instances that are either misclassified by the models or have a low prediction probability (below a specified threshold). For K-means clustering, anomalies are identified by measuring the distance of each data point from its nearest cluster center. Points that lie farthest from the cluster centers are considered anomalous. We also use synthetic anomalies, data points that deviate significantly from normal patterns, to evaluate detection performance.
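The two detection rules described above can be sketched as follows. The example is illustrative only: it assumes that cluster centers and class probabilities have already been produced by trained models, and the 5% flagging fraction, the 0.6 probability threshold, and the function names are hypothetical choices.

```python
import math


def distance_to_nearest(point, centroids):
    """Euclidean distance from `point` to its closest cluster center."""
    return min(math.dist(point, c) for c in centroids)


def flag_farthest(points, centroids, fraction=0.05):
    """Flag the points that lie farthest from their nearest cluster center.

    Ties at the cutoff distance are all flagged, so slightly more than
    `fraction` of the points can be returned as anomalies.
    """
    distances = [distance_to_nearest(p, centroids) for p in points]
    n_flag = max(1, round(len(points) * fraction))
    cutoff = sorted(distances, reverse=True)[n_flag - 1]
    return [1 if d >= cutoff else 0 for d in distances]


def flag_low_confidence(probabilities, threshold=0.6):
    """Flag predictions whose highest class probability is below `threshold`."""
    return [1 if max(p) < threshold else 0 for p in probabilities]


# Hypothetical data: 19 points near one center and a single distant point.
points = [(0.1 * i, 0.0) for i in range(19)] + [(50.0, 50.0)]
centroids = [(0.0, 0.0)]
distance_flags = flag_farthest(points, centroids, fraction=0.05)

# Hypothetical class probabilities from a supervised classifier.
probabilities = [(0.9, 0.1), (0.55, 0.45)]
confidence_flags = flag_low_confidence(probabilities, threshold=0.6)
print(distance_flags, confidence_flags)
```

Both rules depend on a cutoff, so the fraction and the threshold act as tuning parameters that trade false positives against false negatives.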

6.7. Model Evaluation and Comparison

Finally, we compare the accuracy, precision, recall, and F1-score of each model to determine the performance of ML models.
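For reference, the four metrics can be derived from the confusion-matrix counts as in the sketch below. The function names and the toy labels are hypothetical; anomalies are assumed to be encoded as the positive class 1.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives, false negatives."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for the positive class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test set with 5% anomalies.
y_true = [0] * 95 + [1] * 5

# A detector that finds 4 of the 5 anomalies and raises 1 false alarm.
y_detector = [1] + [0] * 94 + [1] * 4 + [0]
detector = classification_metrics(y_true, y_detector)

# A trivial baseline that labels every row as normal.
y_baseline = [0] * 100
baseline = classification_metrics(y_true, y_baseline)
print(detector)
print(baseline)
```

With a 5% anomaly rate, the all-normal baseline already reaches 95% accuracy while detecting nothing, which is why precision, recall, and F1-score are reported alongside accuracy.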

7. Findings

We implemented the Random Forest, AdaBoost, XGBoost, K-Means, and Isolation Forest ML models to analyze their performance in detecting anomalies, such as a 51% attack, in blockchain networks. The results are evaluated using the following metrics.

7.1. Confusion Matrix

The confusion matrix provides critical insight into how well a classification technique works. Confusion matrices are particularly useful for anomaly detection tasks, where the balance among true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) is critical [24].
Figure 6 shows the confusion matrices of the ML models for anomaly detection in blockchain. The matrices show that the Random Forest and XGBoost models outperformed the other ML models, producing high numbers of true positives and true negatives with minimal misclassification. AdaBoost performed well above average, but its false negative rate is higher. K-Means and Isolation Forest struggled with both false positives and false negatives. Hyperparameter tuning and feature engineering could improve the performance of AdaBoost and could also further optimize the Random Forest and XGBoost models. K-Means and Isolation Forest would need more significant adjustments, such as dimensionality reduction or hybrid modeling, to be used effectively for anomaly detection in blockchain systems. For anomaly detection, a model performs best when its TP and TN rates are high and its FP and FN rates are low.

7.2. Performance Matrix

Precision, recall, and F1-score are commonly used to compare the outcomes of anomaly detection models. Table 5 presents the performance matrix of the implemented ML models. It shows that Random Forest and XGBoost are the most effective models, with accuracies of 99.83% and 99.80%, respectively, together with good precision, recall, and F1-score values, which indicates that these models can be very effective for detecting anomalies. AdaBoost also performs well, with an accuracy of 98.45%. However, its recall is noticeably lower at 70.37%, meaning that although its positive predictions are precise, it does not detect all the anomalies. The K-Means algorithm performs considerably worse than these models, with a best accuracy of 48.54%. Its precision is high at 95%, so the points it labels as outliers are labeled with high certainty, but the model struggles with recall, and a significant number of outliers go undetected. The Isolation Forest algorithm performs poorly across all metrics even after optimization, which indicates that it is the least effective model for anomaly detection in the blockchain.
In Figure 7, the bar chart compares the results of the different ML models for anomaly detection. In terms of accuracy, Random Forest and XGBoost were very encouraging, with near-perfect scores of 1.00 and 0.998, respectively. AdaBoost was slightly lower at 0.98 but still performed well among the models presented here. K-Means and Isolation Forest, run on the same dataset, performed much worse: the accuracy of K-Means is 0.49, and the cross-validation accuracy of Isolation Forest, even after optimization, is 0.05. This contrast shows how much better supervised ensemble methods such as Random Forest and XGBoost are than unsupervised learning for this specific anomaly detection task.

7.3. Open Challenges

Several challenges arise when applying ML algorithms for anomaly detection in blockchain. These issues stem from the nature of blockchain data, the constantly evolving dynamics of malicious activities, and the limitations of existing models. Below are the key challenges identified, derived from the findings on ML model performance and further analyzed based on previous studies.
Scalability: Blockchain Size: Blockchains, especially those as large as Bitcoin or Ethereum, produce large volumes of data. Handling this volume effectively while detecting anomalies in real time is a major computational challenge [7].
Throughput and Latency: Because anomaly detection must run in real time, ML models must analyze high transaction volumes to identify irregularities. High throughput and low latency are hard to achieve in blockchain systems, especially with limited computational resources [44].
Imbalanced Dataset: Rare Anomalies: Anomalies in blockchain systems, such as fraud or attacks, occur very rarely compared with the number of legitimate transactions. This leads to an extremely imbalanced dataset in which normal transactions are many and anomalous ones are few. It is challenging to train an ML model on such rare occurrences without the learning algorithm being dominated by normal events [45]. Synthetic Data Generation: Creating sufficiently diverse synthetic data for rare cases, such as specific attack vectors, to improve model training and testing is not an easy task [46].
Dynamic Nature of Anomalies: Evolving Attack Vectors: Blockchain systems persistently face attacks by intelligent adversaries who change their approaches over time. As a result, ML models trained on historical data may fail to identify new forms of anomalies. Adaptation: Handling these dynamic anomalies requires flexible ML models that learn continuously, which makes model deployment and retraining more complex [47].
Feature Engineering: Complexity of Transaction Graphs: Blockchain systems present multifaceted, multi-dimensional graphs of transactions, wallets, addresses, and smart contracts. Extracting useful features from this raw, high-dimensional, and heterogeneous data is an ongoing challenge. Time-Series Data: Blockchain data are inherently temporal because every block and transaction processed by the system carries a timestamp. Combining time-series analysis with anomaly detection adds further complications [27].
Lack of Standardization in Blockchain Protocols: Diverse Blockchain Architectures: It is difficult to design a general model that runs across different blockchains because blockchains such as Bitcoin, Ethereum, and Hyperledger differ in their architectures, data structures, and consensus mechanisms. A flexible and adaptable approach is required. Cross-Blockchain Anomaly Detection: As blockchains and dApps become more interconnected, identifying anomalies that are distributed across chains is an emerging challenge [48].
Real-time Detection and False Positives: Real-time Anomaly Detection: Blockchain systems operate in real time, so threats must be detected in near real time to be contained. Achieving this with high accuracy and low latency is computationally challenging. False Positives and Negatives: Finding the right balance between identifying threats and minimizing false positives is also challenging. Aggressive models risk flagging genuine transactions, while more relaxed models might miss important threats [49].
Limited Labeled Data for Supervised Learning: Most of the available blockchain anomaly detection systems use labeled data for training, but labeled data are often hard to obtain, for example, when the blockchain environment is privacy-sensitive, when access is restricted, or when the type of attack is relatively new and therefore difficult to label [4].
Addressing these challenges will require advances in algorithm design and computational efficiency, collaboration between research institutions and enterprises, and a better understanding of blockchain and ML models.

7.4. Discussion

In this study, the reviewed literature was collected from 2019 to 2025. As Table 3 shows, several studies on applying ML algorithms in blockchain networks have been published in this period. This study also compared the ability of several ML algorithms to identify anomalies in blockchain networks. According to Table 5, the supervised learning models XGBoost and Random Forest achieved an accuracy of about 99.8%, backed by high precision, recall, and F1-score, which shows that such models are well suited to detecting anomalies in the blockchain network. These models rely on ensemble learning, cope well with missing values, and are relatively robust to overfitting, all of which contribute to their performance. From a computational perspective, however, training Random Forest can be costly when the number of trees is large. Although XGBoost is relatively fast and scalable, it can be computationally demanding, especially during hyperparameter tuning or when building models in high dimensions. AdaBoost also worked well, with an accuracy of 98.45%, although its lower recall indicates that it does not identify all anomalies. The accuracy of K-means was 48.54%, while that of Isolation Forest was 5.31%. These findings suggest that unsupervised learning approaches, despite their simplicity, may be insufficiently accurate for reliable anomaly detection in high-stakes blockchain environments.
Section 7.3 discussed open challenges that the research community in this field needs to address. Solving them will require new methodological approaches in algorithm design and computational scaling, closer collaboration between academia and industry, and improved knowledge of blockchain technology and ML models.

7.5. Limitations

This study emphasizes the need for anomaly detection in blockchain networks. Supervised learning, especially ensemble methods, outperforms unsupervised techniques, making it a preferred choice for safeguarding blockchain systems. However, the use of synthetically generated anomalies in the experiment limits how far the results generalize to, and can be implemented in, a real-time blockchain. Automated anomaly detection in decentralized systems also raises important ethical considerations, particularly concerning the potential for false positives. In financial contexts, such errors can lead to unwarranted transaction delays or denials and to financial loss, and they can adversely affect user trust and access.

7.6. Future Directions

This work demonstrates that targeted model tuning and feature extraction are needed to improve the results yielded by the models. In the future, we will work on more sophisticated techniques for model tuning, hybrid models, feature extraction, and real-time classification in order to propose a new framework that addresses the challenges of detecting anomalies, such as a 51% attack, in blockchain networks.

8. Conclusions

In this paper, we conducted a Systematic Literature Review (SLR) and employed a classification framework to investigate the integration of blockchain and ML for detecting anomalous transactions in blockchain networks. We reviewed studies that have applied ML methodologies to anomaly detection and explored how ML techniques can be used to identify anomalies. This study gives a clearer perspective on this relatively new field and serves as a basis for further research into the integration of blockchain and ML in anomaly detection. We also assessed the performance of various ML algorithms in detecting anomalies in blockchain systems. The evaluation showed that supervised learning models, particularly Random Forest and XGBoost, achieve the highest accuracy, and that ensemble strategies can improve outcomes and performance. Such models can manage missing values and are relatively robust to overfitting, which benefits the results. However, Random Forest and AdaBoost also have limitations: they take a long time to train, especially with a large number of trees or iterations. XGBoost is well known for its speed and scalability, although it demands considerable computational resources, more so when hyperparameters are being tuned or when the number of features is large. Overall, our findings emphasize optimized model tuning and feature extraction; future work will target hybrid architectures and real-time detection to mitigate blockchain anomalies, such as a 51% attack.

Author Contributions

Conceptualization, F.J.; Methodology, F.J.; Validation, M.R.; Formal analysis, F.J.; Data curation, F.J.; Writing—original draft, F.J.; Writing—review & editing, M.R.; Supervision, M.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgments

The authors gratefully acknowledge the support and resources provided by the SZABIST University, which contributed to the successful completion of this study. They also extend their appreciation to the editor and editorial team of Computers for the opportunity to publish this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Prasad, V.S.R.; Harshitha, G.; Sujitha, A.; Asmitha, M.; Aishwarya, A.; Priya, D.J. Strengthening Blockchain Security: Countering 51% Attacks Using Dynamic Miner Reputation and Weighted Block Acceptance (DRW-BA). Synth. Multidiscip. Res. J. 2025, 3, 1–13.
  2. Chen, S.; Liu, Y.; Zhang, Q.; Shao, Z.; Wang, Z. Multi-Distance Spatial-Temporal Graph Neural Network for Anomaly Detection in Blockchain Transactions. Adv. Intell. Syst. 2025, 2400898.
  3. Hasan, M.; Rahman, M.S.; Janicke, H.; Sarker, I.H. Detecting anomalies in blockchain transactions using machine learning classifiers and explainability analysis. Blockchain Res. Appl. 2024, 5, 100207.
  4. Cholevas, C.; Angeli, E.; Sereti, Z.; Mavrikos, E.; Tsekouras, G.E. Anomaly Detection in Blockchain Networks Using Unsupervised Learning: A Survey. Algorithms 2024, 17, 201.
  5. Tukur, Y.M.; Thakker, D.; Awan, I.U. Edge-based blockchain enabled anomaly detection for insider attack prevention in Internet of Things. Trans. Emerg. Telecommun. Technol. 2021, 32, e4158.
  6. Mishra, D.; Phansalkar, S. Blockchain Security in Focus: A Comprehensive Investigation into Threats, Smart Contract Security, Cross-Chain Bridges, Vulnerabilities Detection Tools & Techniques. IEEE Access 2025, 13, 60643–60671.
  7. Jain, A.K.; Gupta, N.; Gupta, B.B. A survey on scalable consensus algorithms for blockchain technology. Cyber Secur. Appl. 2025, 3, 100065.
  8. Yusuf, F.; Widayanti, R.; Putri, S.R.; Wellington, A. A Comprehensive Framework for Enhancing Blockchain Security and Privacy. Blockchain Front. Technol. 2025, 4, 171–182.
  9. Liu, Z.; Gao, H.; Lei, H.; Liu, Z.; Liu, C. Blockchain Anomaly Transaction Detection: An Overview, Challenges, and Open Issues. In Proceedings of the International Conference on Information Science, Communication and Computing, Chongqing, China, 2–5 June 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 126–140.
  10. Sayeed, S.; Marco-Gisbert, H. Assessing blockchain consensus and security mechanisms against the 51% attack. Appl. Sci. 2019, 9, 1788.
  11. Walker, H. How Digital Signatures and Blockchains Can Work Together. 2018. Available online: www.cryptomathic.com/news-events/blog/how-digital-signatures-and-blockchains-can-work-together (accessed on 1 August 2018).
  12. Acheson, N. How Bitcoin Mining Works. CoinDesk. 2018. Available online: https://www.coindesk.com/learn/how-bitcoin-mining-works-2 (accessed on 8 September 2023).
  13. Dey, S. Securing majority-attack in blockchain using machine learning and algorithmic game theory: A proof of work. In Proceedings of the 2018 10th Computer Science and Electronic Engineering (CEEC), Colchester, UK, 19–21 September 2018; pp. 7–10.
  14. Aponte-Novoa, F.A.; Orozco, A.L.S.; Villanueva-Polanco, R.; Wightman, P. The 51% attack on blockchains: A mining behavior study. IEEE Access 2021, 9, 140549–140564.
  15. Liu, Q.; Xu, Y.; Cao, B.; Zhang, L.; Peng, M. Unintentional forking analysis in wireless blockchain networks. Digit. Commun. Netw. 2021, 7, 335–341.
  16. Chaudhary, K.C.; Chand, V.; Fehnker, A. Double-spending analysis of bitcoin. In Proceedings of the Pacific Asia Conference on Information Systems, Dubai, United Arab Emirates, 22–24 June 2020.
  17. Nakamoto, S. Bitcoin: A Peer-to-Peer Electronic Cash System. 2008. Available online: https://assets.pubpub.org/d8wct41f/31611263538139.pdf (accessed on 1 August 2023).
  18. Sodhi, G.K.; Sharma, M.; Miglani, R. A Comprehensive Analysis of Blockchain Network Security: Attacks and Their Countermeasures. In Proceedings of the International Conference on Recent Trends in Image Processing and Pattern Recognition, Derby, UK, 7–8 December 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 276–291.
  19. Mohammed, M.A.; Boujelben, M.; Abid, M. A novel approach for fraud detection in blockchain-based healthcare networks using machine learning. Future Internet 2023, 15, 250.
  20. Duong, T.; Fan, L.; Katz, J.; Thai, P.; Zhou, H.S. 2-hop blockchain: Combining proof-of-work and proof-of-stake securely. In Proceedings of the European Symposium on Research in Computer Security, Guildford, UK, 14–18 September 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 697–712.
  21. Yun, J.; Goh, Y.; Chung, J.M. Trust-based shard distribution scheme for fault-tolerant shard blockchain networks. IEEE Access 2019, 7, 135164–135175.
  22. Sayadi, S.; Rejeb, S.B.; Choukair, Z. Anomaly detection model over blockchain electronic transactions. In Proceedings of the 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC), Tangier, Morocco, 24–28 June 2019; pp. 895–900.
  23. Chen, X.; Ji, J.; Luo, C.; Liao, W.; Li, P. When machine learning meets blockchain: A decentralized, privacy-preserving and secure design. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 1178–1187.
  24. Budgen, D.; Brereton, P. Performing systematic literature reviews in software engineering. In Proceedings of the 28th International Conference on Software Engineering, Shanghai, China, 20–28 May 2006; Volume 49, pp. 1051–1052.
  25. Buchdadi, A.D.; Al-Rawahna, A.S.M. Anomaly Detection in Open Metaverse Blockchain Transactions Using Isolation Forest and Autoencoder Neural Networks. Int. J. Res. Metaverse 2025, 2, 24–51.
  26. Rwibasira, M.; Suchithra, R. ADOBSVM: Anomaly detection on block chain using support vector machine. Meas. Sens. 2022, 24, 100503.
  27. Jatoth, C.; Jain, R.; Fiore, U.; Chatharasupalli, S. Improved classification of blockchain transactions using feature engineering and ensemble learning. Future Internet 2021, 14, 16.
  28. Kim, J.; Nakashima, M.; Fan, W.; Wuthier, S.; Zhou, X.; Kim, I.; Chang, S.Y. Anomaly detection based on traffic monitoring for secure blockchain networking. In Proceedings of the 2021 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), Sydney, Australia, 3–6 May 2021; pp. 1–9.
  29. Agarwal, R.; Barve, S.; Shukla, S.K. Detecting malicious accounts in permissionless blockchains using temporal graph properties. Appl. Netw. Sci. 2021, 6, 1–30.
  30. Signorini, M.; Pontecorvi, M.; Kanoun, W.; Di Pietro, R. BAD: A blockchain anomaly detection solution. IEEE Access 2020, 8, 173481–173490.
  31. Liao, Q.; Gu, Y.; Liao, J.; Li, W. Abnormal transaction detection of Bitcoin network based on feature fusion. In Proceedings of the 2020 IEEE 9th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 11–13 December 2020; Volume 9, pp. 542–549.
  32. Huang, D.; Chen, B.; Li, L.; Ding, Y. Anomaly detection for consortium blockchains based on machine learning classification algorithm. In Proceedings of the International Conference on Computational Data and Social Networks, Bangkok, Thailand, 16–18 December 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 307–318.
  33. Kumar, N.; Singh, A.; Handa, A.; Shukla, S.K. Detecting malicious accounts on the Ethereum blockchain with supervised learning. In Proceedings of the Cyber Security Cryptography and Machine Learning: Fourth International Symposium, CSCML 2020, Be’er Sheva, Israel, 2–3 July 2020; Proceedings 4. Springer: Berlin/Heidelberg, Germany, 2020; pp. 94–109.
  34. Poursafaei, F.; Hamad, G.B.; Zilic, Z. Detecting malicious Ethereum entities via application of machine learning classification. In Proceedings of the 2020 2nd Conference on Blockchain Research & Applications for Innovative Networks and Services (BRAINS), Paris, France, 28–30 September 2020; pp. 120–127.
  35. Song, J.; He, H.; Lv, Z.; Su, C.; Xu, G.; Wang, W. An efficient vulnerability detection model for ethereum smart contracts. In Proceedings of the Network and System Security: 13th International Conference, NSS 2019, Sapporo, Japan, 15–18 December 2019; Proceedings 13. Springer: Berlin/Heidelberg, Germany, 2019; pp. 433–442.
  36. Ostapowicz, M.; Żbikowski, K. Detecting fraudulent accounts on blockchain: A supervised approach. In Proceedings of the Web Information Systems Engineering–WISE 2019: 20th International Conference, Hong Kong, China, 19–22 January 2020; Proceedings 20. Springer: Berlin/Heidelberg, Germany, 2019; pp. 18–31.
  37. Baek, H.; Oh, J.; Kim, C.Y.; Lee, K. A model for detecting cryptocurrency transactions with discernible purpose. In Proceedings of the 2019 Eleventh International Conference on Ubiquitous and Future Networks (ICUFN), Zagreb, Croatia, 2–5 July 2019; pp. 713–717.
  38. Siddamsetti, S.; Tejaswi, C.; Maddula, P. Anomaly detection in blockchain using machine learning. J. Electr. Syst. 2024, 20, 619–634. [Google Scholar] [CrossRef]
  39. Li, Y.G. A clustering method based on K-means algorithm. Appl. Mech. Mater. 2013, 380, 1697–1700. [Google Scholar] [CrossRef]
  40. Hisham, S.; Makhtar, M.; Aziz, A.A. Combining multiple classifiers using ensemble method for anomaly detection in blockchain networks: A comprehensive review. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 404–422. [Google Scholar] [CrossRef]
  41. Primartha, R.; Tama, B.A. Anomaly detection using random forest: A performance revisited. In Proceedings of the 2017 International Conference on Data and Software Engineering (ICoDSE), Palembang, Indonesia, 1–2 November 2017; pp. 1–6. [Google Scholar]
  42. Awan, K.A.; Din, I.U.; Almogren, A.; Kim, B.S.; Guizani, M. Enhancing IoT Security with Trust Management Using Ensemble XGBoost and AdaBoost Techniques. IEEE Access 2024, 12, 116609–116621. [Google Scholar] [CrossRef]
  43. Wang, W.; Sun, D. The improved AdaBoost algorithms for imbalanced data classification. Inf. Sci. 2021, 563, 358–374. [Google Scholar] [CrossRef]
  44. Hafid, A.; Hafid, A.S.; Samih, M. Scaling blockchains: A comprehensive survey. IEEE Access 2020, 8, 125244–125262. [Google Scholar] [CrossRef]
  45. Zamyatin, A.; Al-Bassam, M.; Zindros, D.; Kokoris-Kogias, E.; Moreno-Sanchez, P.; Kiayias, A.; Knottenbelt, W.J. Sok: Communication across distributed ledgers. In Proceedings of the Financial Cryptography and Data Security: 25th International Conference, FC 2021, Virtual Event, 1–5 March 2021; Revised Selected Papers, Part II 25. Springer: Berlin/Heidelberg, Germany, 2021; pp. 3–36. [Google Scholar]
  46. Pang, G.; Shen, C.; Cao, L.; Hengel, A.V.D. Deep learning for anomaly detection: A review. ACM Comput. Surv. CSUR 2021, 54, 1–38. [Google Scholar] [CrossRef]
  47. Conti, M.; Kumar, E.S.; Lal, C.; Ruj, S. A survey on security and privacy issues of bitcoin. IEEE Commun. Surv. Tutor. 2018, 20, 3416–3452. [Google Scholar] [CrossRef]
  48. Jia, X.; Xu, J.; Han, M.; Zhang, Q.; Zhang, L.; Chen, X. International Standardization of Blockchain and Distributed Ledger Technology: Overlaps, Gaps and Challenges. CMES Comput. Model. Eng. Sci. 2023, 137, 1491–1523. [Google Scholar] [CrossRef]
  49. Sanjay Rai, G.; Goyal, S.; Chatterjee, P. Anomaly detection in blockchain using machine learning. In Computational Intelligence for Engineering and Management Applications: Select Proceedings of CIEMA 2022; Springer: Berlin/Heidelberg, Germany, 2023; pp. 487–499. [Google Scholar]
Figure 1. CIA-Triad.
Figure 2. A basic Bitcoin block design layout.
Figure 3. The structure of Bitcoin blocks.
Figure 4. SLR Methodology.
Figure 5. Number of Studies Published over Time.
Figure 6. Confusion matrices of ML algorithms.
Figure 7. Accuracy Comparison of ML models.
Table 1. Number of retrieved records from digital libraries according to pre-specified keywords.
| Digital Libraries | Keywords | Total Records |
| --- | --- | --- |
| Google Scholar | “Anomaly detection in blockchain”, “anomaly detection in blockchain AND machine learning” | 19,018 |
| Springer | “51% attack in blockchain”, “51 percent attack detection in blockchain” | 3784 |
| ACM Digital Library | “anomalous behavior of blockchain nodes detection” | 4779 |
| IEEE Xplore | “anomalous transactions detection in Blockchain AND machine learning”, “anomalous fork detection in blockchain” | 911 |
| Elsevier | | 7954 |
Table 2. Inclusion and Exclusion Criteria.
| Inclusion Criteria | Exclusion Criteria |
| --- | --- |
| Publication of articles in peer-reviewed journals | Eliminate duplicate articles |
| Accessible research articles | Exclude books, reports, lecture notes, and miscellany |
| Relevant content to anomaly detection in the blockchain network | Grey literature, such as blogs and government documents |
Table 3. Analysis of ML models used for Anomaly Detection.
| Ref. | Type | ML-Model | Blockchain | Application | Evaluation Criteria | Findings |
| --- | --- | --- | --- | --- | --- | --- |
| [1] | Journal | TCN | Ethereum | 51% attack detection | F1-score, AUC-ROC | Reduces 51% attack success to 7.3%, better than PoW (78.4%) and CBL-PoW (26.7%), but has complexity issues. |
| [2] | Journal | GNN | Bitcoin | Anomaly detection | AUC-ROC, AUC-PR | Achieves 1.5% AUC-ROC and 2.9% AUC-PR, but limited by dataset and scalability issues. |
| [25] | Journal | Isolation Forest, Autoencoder | Open Metaverse | Anomaly detection | Precision, Recall, F1-score, AUC-ROC | Isolation Forest: 0.85 precision; Autoencoder: 0.87. Depends on threshold. |
| [3] | Journal | Tree-Based Ensemble | Bitcoin | Anomaly detection | Accuracy, TPR, FPR, ROC-AUC | SHAP values help identify normal/fraudulent transactions effectively. |
| [27] | Journal | Ensemble Boosting | Cryptocurrency | Anomaly detection | Accuracy, precision, F-measure, recall | The ensemble boosting technique performs better than the other models evaluated. |
| [26] | Journal | ADOBSVM | Bitcoin | Anomaly detection | Accuracy, energy value matrices | SVM focuses on good security with less execution time. Its efficacy is measured by attack detection rate, error rate, execution time, and power consumption. |
| [28] | Conference | Auto-Encoder | Bitcoin | Detection of malicious events | Accuracy, F1 score | Detects malicious events in blockchain networks with reduced time complexity. |
| [29] | Journal | Ensemble (DT) | Ethereum | Malicious accounts | Balanced accuracy, precision, recall, F1 | Security analysis performed with the ensemble technique (ExtraTreesClassifier) classified the accounts as suspicious. It yielded an overall accuracy of 87.2% and 88.7%. |
| [14] | Journal | K-Means, DBSCAN, BIRCH | Bitcoin, Ethereum | Detection of anomalous behaviors and prevention of 51% attacks | Means, one standard deviation, hash rate calculation | Profiles are developed for any miner whose hash rate exceeds one per cent within previously established time intervals; these profiles can be used to identify anomalous behaviors. |
| [30] | Journal | BAD | Bitcoin | Anomalous fork detection | Complexity and overhead calculation | BAD can identify blocks that are different but hold the same transactions (or a subset). |
| [31] | Journal | KNN | Bitcoin | Abnormal transaction | Kendall correlation coefficient matrix, correlation heat map | The analysis outcomes indicate that KNN effectively identifies suspicious transactions in the nodes. |
| [32] | Journal | SVM + KNN | Blockchain | Malicious users | Precision, F1-score, recall, accuracy | Compared with the CNN algorithm, KNN and SVM are more appropriate: they consume one-third of the resources of CNN while reaching an accuracy above 0.9, which is 0.9 per cent less than CNN. |
| [33] | Conference | XGBoost | Ethereum | Malicious account | Precision, F1-score, recall, accuracy | The assessment is 96.21% accurate with a false positive rate of 3%. The ensemble approach provides considerable results (an F1 score of 0.996). |
| [34] | Journal | Ensemble | Ethereum | Malicious transaction | Precision, F1-score, recall, accuracy | The accuracy of the assessment is 96.21%, with a false positive rate of 3%. The ensemble approach yields high results in the benchmark (F1 score of 0.996). |
| [35] | Conference | Random Forest | Ethereum | Vulnerability detection | Precision, F1-score, recall, accuracy | The model is capable of identifying these vulnerabilities effectively as well as expeditiously. |
| [36] | Journal | XGBoost | Ethereum | Vulnerability detection | Accuracy, precision, F-measure, recall | The Ethereum Micro-F1 and Macro-F1 yield a more accurate Turing-complete Ethereum Virtual Contract of over 96%. |
| [37] | Conference | OCSVM | Bitcoin | Anomaly detection | Rand index (RI), confusion matrix | In the first stage, the OCSVM algorithm (accuracy of 0.9) flagged 15 anomalies among 27 data instances; in the second stage, K-means grouped those anomalies into 3 clusters, with a clustering result of 0.951. |
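The profile-based evaluation listed for [14] (means, one standard deviation, hash rate calculation) can be illustrated with a minimal sketch. This is not the procedure of [14]: the miner names, the sample hash-rate shares, and the rule "flag a miner whose latest share exceeds its own historical mean plus k standard deviations" are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def flag_anomalous_miners(history, latest, k=1.0):
    """Return miners whose latest share exceeds mean + k * std of their own history."""
    flagged = []
    for miner, shares in history.items():
        # Per-miner profile: historical mean plus k population standard deviations.
        threshold = mean(shares) + k * pstdev(shares)
        if latest[miner] > threshold:
            flagged.append(miner)
    return sorted(flagged)

# Hypothetical hash-rate shares (fraction of network hash rate) per past interval.
history = {
    "pool_a": [0.20, 0.22, 0.21, 0.19],
    "pool_b": [0.10, 0.11, 0.09, 0.10],
    "pool_c": [0.05, 0.05, 0.06, 0.04],
}
# Hypothetical shares observed in the most recent interval.
latest = {"pool_a": 0.21, "pool_b": 0.35, "pool_c": 0.05}

print(flag_anomalous_miners(history, latest))
```

With these made-up numbers only `pool_b` is flagged, because its share jumps well above its own profile while the other pools stay within one standard deviation of their means.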
Table 4. Comparison of the machine learning algorithms based on the given categories.
| Category | SVM | K-Means | Random Forest | XGBoost | AdaBoost |
| --- | --- | --- | --- | --- | --- |
| Algorithm type | Supervised learning | Unsupervised learning | Supervised learning | Supervised learning | Supervised learning |
| Learning type | Batch learning | Prototype-based learning | Ensemble-based learning | Gradient boosting | Boosting |
| Data requirement | Labeled data | Unlabeled data | Labeled data | Labeled data | Labeled data |
| Objective | Classification | Clustering | Classification or regression | Classification or regression | Classification or regression |
| Optimization problem | Margin maximization | Minimize within-cluster sum of squares | Minimize loss function | Gradient boosting optimization | Minimize classification error |
| Decision boundary type | Linear or non-linear | Not applicable | Linear or non-linear | Non-linear | Non-linear |
| Performance evaluation | Accuracy, F1-score | Inertia (within-cluster sum of squares) | Accuracy, Out-of-bag error | Accuracy, Log loss | Accuracy, Log loss |
| Regularization parameter | Term C | Number of clusters (k) | Number of estimators, Max depth | Learning rate, Max depth | Learning rate, Number of estimators |
| Advantages | Effective in high-dimensional spaces; Versatile | Simple; Computationally efficient; Scalable | High efficiency on large datasets; Fast convergence | Robust to overfitting; Handles missing values well | Reduces bias and variance; Resistant to overfitting |
| Disadvantages | Sensitive to noise and outliers; Memory-intensive | Sensitive to initial centers; Need to predefine number of clusters | Sensitive to feature scaling; Complex tuning | Computationally expensive; Sensitive to hyperparameters | Sensitive to noisy data; Affected by outliers |
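The K-Means entries in Table 4 name the within-cluster sum of squares (inertia) both as the optimization objective and as the evaluation measure. The following minimal sketch shows how that quantity is computed for a given cluster assignment. The function name `inertia` and the four sample points are assumptions made for illustration, not part of the study's implementation.

```python
def inertia(points, labels):
    """Within-cluster sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dims = len(members[0])
        # Centroid = coordinate-wise mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dims))
    return total

# Four hypothetical 2-D points forming two well-separated pairs.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = inertia(points, [0, 0, 1, 1])  # left pair vs. right pair
bad = inertia(points, [0, 1, 0, 1])   # bottom pair vs. top pair
print(good, bad)
```

The assignment that groups nearby points yields the lower inertia, which is the quantity K-Means tries to minimize for a fixed number of clusters.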
Table 5. Performance matrix of ML models.
| Model | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- |
| Random Forest | 0.998345 | 0.997803 | 0.969250 | 0.983319 |
| AdaBoost | 0.984507 | 0.983993 | 0.703657 | 0.820542 |
| XGBoost | 0.997998 | 0.998289 | 0.961878 | 0.979745 |
| K-Means | 0.485452 | 0.950279 | 0.483475 | 0.640886 |
| Isolation Forest (Optimized) | 0.053114 | 0.644343 | 0.006530 | 0.012929 |
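The F1 scores in Table 5 can be cross-checked against the reported precision and recall, assuming the standard definition of F1 as the harmonic mean of the two. The sketch below copies the Table 5 values. The helper name `f1_score` and the tolerance of 1e-5 are choices made for this illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from Table 5.
table5 = {
    "Random Forest": (0.997803, 0.969250, 0.983319),
    "AdaBoost": (0.983993, 0.703657, 0.820542),
    "XGBoost": (0.998289, 0.961878, 0.979745),
    "K-Means": (0.950279, 0.483475, 0.640886),
    "Isolation Forest (Optimized)": (0.644343, 0.006530, 0.012929),
}

for model, (p, r, reported) in table5.items():
    computed = f1_score(p, r)
    # Inputs are rounded to six decimals, so allow a small tolerance.
    status = "consistent" if abs(computed - reported) < 1e-5 else "mismatch"
    print(f"{model}: computed F1 = {computed:.6f}, reported = {reported:.6f} ({status})")
```

The gap between AdaBoost's accuracy (0.984507) and its recall (0.703657) shows why the F1 column is worth reading alongside accuracy.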
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Jumani, F.; Raza, M. Machine Learning for Anomaly Detection in Blockchain: A Critical Analysis, Empirical Validation, and Future Outlook. Computers 2025, 14, 247. https://doi.org/10.3390/computers14070247

