Near Real-Time Ethereum Fraud Detection Using Explainable AI in Blockchain Networks

Ertam, Fatih

doi:10.3390/app151910841

by

Fatih Ertam

Department of Digital Forensics Engineering, Technology Faculty, Firat University, 23200 Elazığ, Türkiye

Appl. Sci. 2025, 15(19), 10841; https://doi.org/10.3390/app151910841

Submission received: 28 August 2025 / Revised: 27 September 2025 / Accepted: 7 October 2025 / Published: 9 October 2025

(This article belongs to the Special Issue Artificial Intelligence on the Edge for Industry 4.0)

Abstract

Blockchain technologies have profoundly transformed information systems by providing decentralized infrastructures that enhance transparency, security, and traceability. Ethereum, in particular, supports smart contracts and facilitates the development of decentralized finance (DeFi), non-fungible tokens (NFTs), and Web3 applications. However, its openness also enables illicit activities, including fraud and money laundering, through anonymous wallets. Identifying wallets involved in large transfers or abnormal transactional patterns is therefore critical to ecosystem security. This study proposes an AI-based framework employing XGBoost, LightGBM, and CatBoost to detect suspicious Ethereum wallets, achieving test accuracies between 95.83% and 96.46%. The system provides near real-time predictions for individual or recent wallet addresses using a pre-trained XGBoost model. To improve interpretability, SHAP (SHapley Additive exPlanations) visualizations are integrated, highlighting the contribution of each feature. The results demonstrate the effectiveness of AI-driven methods in monitoring and securing Ethereum transactions against fraudulent activities.

Keywords:

blockchain; cryptocurrency forensics; ethereum; explainable AI; fraud detection

1. Introduction

Blockchain technology has driven a paradigm shift in numerous industries, including finance, supply chain management, healthcare, and digital identity systems [1,2]. It introduces a decentralised, transparent, and tamper-resistant framework for recording and verifying transactions [3]. Unlike conventional centralised systems, in which a single authority maintains and validates records, blockchain relies on a distributed ledger that is collectively maintained by a network of nodes. This decentralised consensus mechanism mitigates the risk of single points of failure while enhancing data integrity and auditability across untrusted environments [4]. Among the blockchain platforms developed to date, Ethereum distinguishes itself as a second-generation blockchain that extends beyond simple value transfer by offering a Turing-complete programming environment for deploying smart contracts, which are self-executing agreements encoded directly onto the blockchain [5]. Smart contracts facilitate the creation of decentralised applications (dApps), which function without intermediaries and execute complex logic in a deterministic and trustless manner. Consequently, Ethereum has emerged as the foundational infrastructure for a diverse range of decentralised finance (DeFi) protocols, non-fungible token (NFT) ecosystems, and autonomous governance models, positioning it as a pivotal element in shaping the future of internet-based services [6,7]. By introducing the concept of a programmable blockchain, Ethereum substantially expanded the functional scope of distributed ledger technology, enabling developers to build dApps that go beyond basic peer-to-peer financial transactions [3].
The integration of a Turing-complete virtual machine, designated as the Ethereum Virtual Machine (EVM), enables the deployment of sophisticated smart contracts capable of executing conditional logic, managing digital assets and automating multi-step workflows in a trustless and transparent environment [8]. This paradigm shift has laid the foundation for novel application domains such as decentralised finance (DeFi), tokenised assets, supply chain automation, decentralised autonomous organisations (DAOs), and identity management systems [9]. These domains leverage Ethereum’s programmable infrastructure to eliminate intermediaries, reduce operational costs, and enhance system resilience [10]. The accelerated growth in the adoption and market capitalisation of Ethereum has not only attracted legitimate innovation but also drawn the attention of malicious actors seeking to exploit the platform for financial gain. As Ethereum continues to serve as the foundational infrastructure for a vast array of decentralised applications, it has become increasingly susceptible to diverse forms of fraud and abuse. Within the context of the Ethereum ecosystem, the prevalence of fraudulent activities encompasses phishing attacks, Ponzi schemes, transaction manipulation, and the deployment of counterfeit decentralised applications (dApps). In the context of the blockchain ecosystem, phishing has been identified as the most prevalent and damaging form of attack, accounting for approximately 50% of all malicious incidents [11]. These fraudulent schemes typically employ deceptive tactics, such as the creation of fake websites, emails, or messaging platforms, with the aim of deceiving users into disclosing private credentials, including seed phrases or private keys. This ultimately results in unauthorised access to their digital wallets and the theft of crypto-assets. Ethereum, a prominent cryptocurrency, was originally developed by Vitalik Buterin. 
Ethereum functions as a decentralised digital asset transfer system, allowing individuals to send cryptocurrency to others for a minimal transaction fee. Irrespective of geographical location or background, Ethereum ensures secure, consistent, and cost-effective participation in digital transactions on a global scale. Ethereum’s decentralised architecture and the anonymity it affords have led to its emergence as a highly effective medium for cryptocurrency transactions. This has, in turn, rendered it an attractive tool for criminal networks seeking to conduct money laundering and other illicit financial activities [12,13]. The Ethereum ecosystem has recently experienced a notable surge in fraudulent activities, driven by the increasing sophistication of cybercriminal tactics and the integration of advanced technologies. As stated in the 2025 Crypto Crime Report by Chainalysis, the estimated value of illicit cryptocurrency transactions in 2024 was USD 40.9 billion. This figure is predicted to exceed USD 51 billion as more illicit addresses are identified [14]. Furthermore, the Ethereum network has been subject to advanced phishing techniques, including payload-based transaction phishing (PTXPhish). This method involves the manipulation of smart contract interactions through the use of malicious payloads, with the objective of deceiving users. A thoroughgoing investigation has revealed more than 130,000 PTXPhish transactions on the Ethereum blockchain, resulting in financial losses in excess of USD 341.9 million [15]. These developments highlight the pressing need for effective detection and mitigation strategies within the Ethereum ecosystem. The implementation of advanced security measures, user education, and continuous monitoring are of critical importance in ensuring the protection of users and maintaining trust in decentralised platforms.

The primary contributions of this study are outlined as follows:

An artificial intelligence model was developed using labeled data from publicly available blockchain datasets. The system extracts behavioral and transactional features of individual wallet addresses in near real time, and the model then classifies each address as either suspicious or benign based on learned patterns of fraudulent activity.
A near real-time monitoring framework was implemented for the identification and analysis of recently active wallet addresses. The system is designed to ingest on-chain transaction data in a continuous manner, with the capacity to detect newly active wallets. Utilizing a trained model, it is then able to evaluate the likelihood of these wallets being involved in illicit activities.
In order to enhance the interpretability of the model and thus support trust in automated decision-making processes, explainable artificial intelligence (XAI) techniques were incorporated. These techniques facilitate the attribution of model predictions to specific features or behaviors, thereby providing transparency into the rationale behind the classification of a wallet as suspicious.

The remainder of this paper is organized as follows. Section 2 reviews recent studies on Ethereum-based fraud detection, highlighting methodological advances and existing research gaps. Section 3 describes the materials and methods employed in this study, including dataset construction, feature engineering, and model development procedures. Section 4 presents and discusses the experimental results, emphasizing model performance, feature relevance, and comparative analyses. Section 5 outlines the main limitations of the current study, providing context for result interpretation. Section 6 discusses potential directions for extending this research, such as deployment across multiple blockchain networks and integration with blockchain monitoring systems. Finally, Section 7 concludes the paper by summarizing the key findings and their implications for future blockchain security research.

2. Related Works

Numerous studies have been conducted to detect fraudulent activities within the Ethereum network, employing a variety of machine learning algorithms and classification methods. A selection of significant contributions is outlined below.

2.1. Classical ML Approaches

Aziz et al. [16] investigated Ethereum fraud detection using various machine learning techniques, including RF, MLP, and ensemble methods, on a dataset with limited attributes. LGBM outperformed other models, achieving 98.60% accuracy, which improved to 99.03% after hyperparameter tuning. Results were also compared with other boosting algorithms such as XGBoost.

Steven et al. [17] focused on identifying malicious accounts involved in Ethereum transactions. They utilized the XGBoost algorithm and evaluated its performance using tenfold cross-validation. The model achieved a classification accuracy of 96.3%, and the study highlighted the three most influential features contributing to the model’s decision-making process.

Ravindranath et al. [18] evaluated ensemble learning models for detecting fraud in the Ethereum network. CatBoost and LightGBM showed strong performance, achieving 97–98.42% accuracy with oversampling. High F1 and AUC scores indicated reliable detection without overfitting. Among the tested resampling methods, K-Means SMOTE yielded the best results, with 98.42% accuracy and a 99.82% AUC. These findings highlight the effectiveness of ensemble models and advanced resampling in crypto fraud detection.

Dahiya et al. [19] proposed a neural network-based model for the detection of fraudulent transactions on the Ethereum blockchain. The performance of the model was benchmarked against several traditional machine learning classifiers, including Logistic Regression, Support Vector Machine (SVM), Gaussian Naive Bayes, and K-Nearest Neighbours. Among all models that were evaluated, the neural network demonstrated the highest level of accuracy, achieving 97.09%. This result indicates that the neural network possesses a superior capacity to capture and learn complex data patterns. The findings emphasise the efficacy of neural networks in differentiating between authentic and fraudulent Ethereum transactions.

2.2. Self-Supervised and Deep Learning Methods

Teng et al. [20] proposed a novel method for identifying anomalous smart contracts on the Ethereum platform. Their approach involves extracting transaction patterns through a data slicing technique, followed by training a detection model using LSTM networks. The results demonstrated high precision in distinguishing anomalous contracts from legitimate ones.

Ehsan et al. [21] aimed to identify malicious actors and categorize attacks based on behavior. They built a dataset from illicit Ethereum activities and applied feature selection methods such as PCA, Information Gain, and Ridge Regression. Classification using LightGBM, XGBoost, and others showed that models with Information Gain and LGBM/XGBoost reached 98% accuracy. XGBoost also completed analysis in 13.72 s. Additionally, the study improved blockchain security by categorizing fraud types, enhancing network reliability.

Liu et al. [22] introduced S_HGTNs, a framework for detecting anomalies in Ethereum smart contracts, focusing on financial fraud. It builds a Heterogeneous Information Network (HIN) from contract features, learns a relational matrix via a transformer, and classifies using node embeddings. Experiments show that the model outperforms traditional methods with higher accuracy and low variance, confirming its robustness and effectiveness.

2.3. Graph-Based Techniques

Tan et al. [23] proposed a fraud detection method on Ethereum by analyzing transaction records and using web crawlers to obtain labelled fraudulent addresses. These were used to reconstruct a transaction network, from which features were extracted via an amount-based network embedding. A Graph Convolutional Network (GCN) then classified addresses as legitimate or fraudulent. The system achieved 95% accuracy, demonstrating strong performance in identifying fraud.

Jin et al. [24] introduced Meta-IFD (Meta-Interaction-based Fraud Detection), an Ethereum fraud detection framework based on meta-interaction concepts. It combines generative and contrastive self-supervision to refine behavioral features and distinguish activity types. Using multi-view feature learning, Meta-IFD captures rich behavioral representations to detect fraud such as Ponzi schemes and phishing. Evaluations on real Ethereum data show its robustness and high accuracy, with the generative module addressing class imbalance and the contrastive module improving profile discrimination.

Tan et al. [25] proposed a framework for detecting fraudulent Ethereum transactions through analysis of transaction records. Labelled addresses were collected using web crawlers and used to build a transaction network from the public ledger. A network embedding method was employed to extract node features, which were then classified by a Graph Convolutional Network (GCN). The system achieved 96% accuracy, demonstrating its effectiveness in fraud detection on the Ethereum blockchain.

Given the rapid growth of blockchain technology and cryptocurrencies, phishing scams have emerged as a significant threat to transaction security. Existing detection methods frequently fail to capture critical neighbor information and its impact on fraudulent behaviors. In order to address these limitations, a phishing detection framework based on FAAN-GBM (Feature and Attention Augmented Network with Gradient Boosting Machine) has been proposed. This framework integrates basic, transaction, and interaction features of nodes while leveraging attention mechanisms and autoencoders to enhance feature representation. A recent experimental evaluation on authentic Ethereum datasets has demonstrated that the FAAN-GBM model exhibits superior performance in comparison to existing approaches, thereby significantly enhancing the accuracy of phishing fraud node detection [26].

The proliferation of smart contracts within the blockchain ecosystem has engendered a heightened imperative for efficacious phishing detection mechanisms. Existing methods frequently prove inadequate in capturing both global structural patterns in transaction networks and local semantic relationships in transaction data. This limitation restricts their capacity to detect complex phishing behaviors. To address these challenges, a dynamic feature fusion model has been proposed, combining graph-based representation learning with semantic feature extraction. The model constructs global graph representations of account relationships and extracts local contextual features from transactions. These features are then integrated via a dynamic multimodal fusion mechanism. A recent experimental evaluation on large-scale real-world blockchain datasets has demonstrated that this approach exhibits superior performance in terms of accuracy, F1 score, and recall when compared to existing benchmarks. This finding underscores the importance of jointly modeling structural and semantic information for effective phishing detection [27].

LMAE4Eth is a multi-view learning framework designed to improve Ethereum fraud account detection by integrating transaction semantics, masked graph embeddings, and expert knowledge. It utilises a transaction token comparative language model (TxCLM) to convert numerical transactions into semantically meaningful representations and a masked account graph autoencoder (MAGAE) focused on reconstructing account node features for advanced node-level detection. Scalability is achieved through layer-wise sampling, and features designed by experts are incorporated to improve model performance. Experimental results demonstrate that LMAE4Eth outperforms 15 baseline methods, achieving over 10% improvement in F1 score across two datasets and proving its effectiveness in detecting fraudulent accounts [28]. However, such approaches require extensive sequence pre-processing and lack the real-time deployment capabilities demonstrated in our work.

2.4. Hybrid Systems

Li et al. [29] addressed phishing detection on Ethereum as a graph classification task and proposed PDGNN (Phishing Detection Graph Neural Network), an end-to-end framework. It constructs a lightweight transaction network and extracts subgraphs linked to known phishing accounts. Using a Chebyshev-GCN, the model classifies accounts as phishing or legitimate. Experiments on five datasets show that PDGNN outperforms traditional methods and scales well to large networks.

Pahuja et al. [30] proposed a fraud detection approach based on the CRISP-DM (Cross-Industry Standard Process for Data Mining) framework for Ethereum transactions. Their method tackled data imbalance using resampling, applied correlation-based feature selection, and used ensemble learning to enhance accuracy. A comparison of ten classifiers showed ensemble models outperformed single ones, with LightGBM achieving the highest accuracy at 99.2%, surpassing other approaches on the same dataset.

2.5. Background on Ethereum and Fraud Typologies

Ethereum is a decentralised blockchain system that facilitates programmable transactions through smart contracts. In the context of the Ethereum blockchain, wallet addresses can be classified as either externally owned or contract-based. Common fraudulent behaviors include phishing, contract abuse, and address laundering, often observable via abnormal transaction frequency, unusually high or low gas usage, or multiple interactions with known blacklisted addresses. The comprehension of these behaviors was instrumental in the subsequent feature engineering process, which is elaborated in the following section.

3. Materials and Method

The graphical representation of the proposed method of the study is presented in Figure 1.

The pipeline commences with Phase 1, in which blockchain data is retrieved from the Ethereum mainnet via Web3 APIs. This phase involves the extraction of transactions from the preceding 1000 blocks or the 10 most recent active addresses. Phase 2 involves the implementation of feature engineering, which entails the transformation of raw transaction data into a set of 17 behavioral features. These features encompass metrics such as transaction counts, value statistics, and temporal patterns. Subsequent to this, data preprocessing is conducted through the utilization of Min–Max normalization. Phase 3 encompasses the training of models using three gradient boosting algorithms (XGBoost, LightGBM, CatBoost) with comprehensive hyperparameter tuning through 5-fold cross-validation. Phase 4 integrates SHAP for model interpretability, providing both local explanations for individual predictions and global feature importance analysis. Phase 5 demonstrates the deployment pipeline, achieving sub-50 millisecond feature extraction and sub-10 millisecond inference time, and outputting classification results with confidence scores and SHAP-based explanations.
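The Min–Max normalization step in Phase 2 can be illustrated with a minimal sketch. The function name and the toy values below are assumptions made for illustration; they are not taken from the study's code or dataset.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1].

    A constant column (max == min) is mapped to all zeros to avoid
    division by zero.
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Toy transaction counts for four wallets (made-up values).
total_transactions = [3, 10, 250, 47]
scaled = min_max_scale(total_transactions)
print(scaled)
```

In a deployed setting, the minimum and maximum would normally be computed on the training data and reused at inference time, so that new wallets are scaled consistently with the data the model was trained on.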

3.1. Feature Selection and Reference Dataset

In this study, the MetaMask API was employed to establish a secure connection to the Ethereum network. The most recent 1000 blocks were analysed programmatically through the API, enabling the extraction of 17 distinct features associated with a given wallet address. These features capture various behavioural and transactional characteristics of the wallet, including but not limited to transaction frequency, token interaction patterns, and gas usage metrics. Table 1 presents the extracted features and their definitions.
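As a rough illustration of how wallet-level features might be derived from raw transactions, the sketch below computes five of the feature names mentioned in this paper. The transaction record layout (`from`, `to`, `value_eth`, `timestamp`) and the exact feature definitions are assumptions made here for illustration; the authoritative definitions are those in Table 1.

```python
def extract_features(address, transactions):
    """Compute a few wallet-level features from a list of transaction dicts.

    Each transaction is assumed (hypothetically) to look like:
    {"from": str, "to": str or None, "value_eth": float, "timestamp": int}
    where timestamp is in Unix seconds.
    """
    address = address.lower()
    sent = [t for t in transactions if (t["from"] or "").lower() == address]
    received = [t for t in transactions if (t["to"] or "").lower() == address]
    involved = sent + received
    timestamps = [t["timestamp"] for t in involved]
    received_values = [t["value_eth"] for t in received]
    return {
        "TotalTransactions": len(involved),
        # Minutes between the first and last observed transaction.
        "Time Diff between first and last (Mins)":
            (max(timestamps) - min(timestamps)) / 60 if timestamps else 0.0,
        "MinValueReceived": min(received_values, default=0.0),
        "MaxValueReceived": max(received_values, default=0.0),
        "TotalEtherReceived": sum(received_values),
    }


# Toy transactions with placeholder addresses (not real wallets).
txs = [
    {"from": "0xAAA", "to": "0xBBB", "value_eth": 1.5, "timestamp": 1_700_000_000},
    {"from": "0xCCC", "to": "0xAAA", "value_eth": 0.25, "timestamp": 1_700_000_600},
    {"from": "0xDDD", "to": "0xAAA", "value_eth": 2.0, "timestamp": 1_700_003_600},
]
features = extract_features("0xAAA", txs)
print(features)
```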

A key challenge in this research is the limited availability of publicly accessible labelled datasets that classify Ethereum addresses as either suspicious or normal. To address this issue, the dataset employed by Aziz et al. [16] served as the primary dataset. It contains 9841 entries corresponding to transactions on the Ethereum network, each labelled as normal (label 0) or suspicious (label 1). Specifically, 7662 records are marked as normal, while the remaining 2179 are identified as suspicious, so suspicious records make up roughly 22% of the data. The original dataset encompasses 49 extracted features pertaining to transactional behavior, account activity, and smart contract interactions. For this research, a subset of 17 features that are both relevant and technically extractable in real time was selected from the original 49. This refined feature set was then used to construct a new dataset tailored to the requirements of the proposed detection system. The finalized dataset has been made publicly available via a GitHub repository [31].

3.2. Performance Metrics

To evaluate classification performance on the dataset constructed for this study, several widely accepted metrics were employed: Accuracy, Precision, Recall, and the F1-Score [32]. Together, these metrics provide a comprehensive view of the model’s effectiveness in correctly identifying both suspicious and benign wallet behaviors. The mathematical formulations are presented in Equations (1)–(4), where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, with suspicious wallets treated as the positive class. Accuracy is the ratio of correctly classified instances to the total number of instances. Precision is the proportion of correctly predicted suspicious wallets among all wallets predicted as suspicious, reflecting the model’s ability to avoid false positives. Recall, also known as sensitivity, is the proportion of actual suspicious wallets that were correctly identified by the model, highlighting its capacity to minimize false negatives. The F1-Score is the harmonic mean of precision and recall, offering a balanced metric that is particularly useful for imbalanced datasets. Used together, these metrics give a robust evaluation of the model’s classification capabilities.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\mathrm{F1\text{-}Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$
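Equations (1)–(4) can be checked with a few lines of Python. This is a minimal sketch: TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives (suspicious wallets being the positive class), and the confusion-matrix counts used below are made-up values rather than results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (e.g. no positive predictions).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts only.
m = classification_metrics(tp=80, tn=900, fp=10, fn=10)
print(m)
```

With these toy counts the accuracy is high even though one in nine suspicious wallets is missed, which is why precision, recall, and F1 are reported alongside accuracy on imbalanced data.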

3.3. Classification

In this study, several ensemble-based boosting algorithms were employed for the purpose of classification, including LightGBM (Light Gradient Boosting Machine), XGBoost (Extreme Gradient Boosting), and CatBoost [33]. The selection of these gradient boosting frameworks was made on the basis of their proven efficiency, scalability, and high predictive performance, particularly in the context of structured tabular data. Each of these algorithms employs decision tree ensembles with optimized boosting strategies, thereby enabling the model to capture complex patterns within the feature space and effectively distinguish between suspicious and normal wallet behaviors. XGBoost is an advanced implementation of gradient boosting machines that incorporates system optimization and algorithmic enhancements to improve efficiency, scalability, and model performance [34].

To optimize the objective function, XGBoost applies a second-order Taylor expansion to approximate the loss at iteration t. The approximated loss is given in Equation (5).

$$L^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \tag{5}$$

where $f_t(x_i)$ is the prediction from the newly added function (typically a regression tree) at iteration t, and $\Omega(f_t)$ denotes the regularization term given in Equation (6) that controls the complexity of the model.

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \tag{6}$$

Here, $T$ is the number of leaves in the tree and $w_j$ is the weight of leaf $j$. The terms $g_i$ and $h_i$ represent the first- and second-order derivatives of the loss function with respect to the prediction from the previous iteration, $\hat{y}_i^{(t-1)}$, and are defined as follows:

$$g_i = \frac{\partial\, l(y_i, \hat{y}_i^{(t-1)})}{\partial \hat{y}_i^{(t-1)}}, \quad h_i = \frac{\partial^2 l(y_i, \hat{y}_i^{(t-1)})}{\partial \left(\hat{y}_i^{(t-1)}\right)^2} \tag{7}$$

Here, $l(y_i, \hat{y}_i^{(t-1)})$ denotes the loss function comparing the true label $y_i$ and the predicted value $\hat{y}_i^{(t-1)}$. The gradient $g_i$ indicates the direction in which the loss increases most steeply, so its negative gives the descent direction, while the Hessian $h_i$ provides curvature information, allowing the algorithm to perform more accurate and stable updates during optimization.
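To make the second-order approximation concrete, the following sketch evaluates the gradient and Hessian of Equation (7) for the binary log loss and compares the approximated change in loss from Equation (5) with the exact change. It assumes the loss is applied to a raw score passed through a sigmoid, and it omits the regularization term; the sample values are illustrative only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y, z):
    """Binary log loss for true label y and raw score z."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def grad_hess(y, z):
    """First and second derivatives of the log loss with respect to z."""
    p = sigmoid(z)
    return p - y, p * (1 - p)


# One sample: true label 1, previous raw score 0, candidate update f_t(x) = 0.1.
y, z_prev, step = 1, 0.0, 0.1
g, h = grad_hess(y, z_prev)

# Per-sample term of Equation (5), without the regularization term.
approx_change = g * step + 0.5 * h * step ** 2
exact_change = log_loss(y, z_prev + step) - log_loss(y, z_prev)
print(g, h, approx_change, exact_change)
```

Because the gradient is negative for this sample, a positive update lowers the loss, and the quadratic approximation tracks the exact change closely for a small step.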

LightGBM is a gradient boosting framework based on decision tree algorithms, designed to be distributed and efficient. It uses histogram-based algorithms, grows trees leaf-wise, and introduces techniques such as Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) to reduce computation and memory usage, making it suitable for large-scale and high-dimensional data [35]. The loss function is given in Equation (8).

$$L^{(t)} = \sum_{i \in A} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] \tag{8}$$

where $A \subset \{1, \dots, n\}$ is selected using GOSS (Gradient-based One-Side Sampling).

The regularized objective can be defined as given in Equation (9).

$$L = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \lambda \|f\|^2 \tag{9}$$
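The selection of the subset A in Equation (8) can be sketched as follows. This is a simplified reading of GOSS as described in the LightGBM literature: samples with the largest absolute gradients are always kept, a random fraction of the rest is sampled, and the sampled small-gradient instances are up-weighted by (1 - a) / b. The function name, parameter names, and gradient values are illustrative assumptions, not the library's internal implementation.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) for the GOSS subset.

    top_rate (a):   fraction of samples kept for having the largest |gradient|.
    other_rate (b): fraction of all samples drawn at random from the rest.
    """
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top, rest = order[:n_top], order[n_top:]
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))
    # Up-weight the sampled small-gradient instances to compensate
    # for the ones that were dropped.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return top + sampled, weights


# Made-up gradients for ten samples.
grads = [0.9, -0.05, 0.02, -0.8, 0.1, 0.03, -0.01, 0.04, 0.06, -0.07]
idx, w = goss_sample(grads)
print(idx, w)
```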

CatBoost is a gradient boosting algorithm specifically designed to handle categorical features efficiently. It employs techniques such as ordered boosting and target statistics to reduce overfitting and eliminate prediction shift, which commonly arise in the processing of categorical variables [36].

At boosting iteration t, the prediction is updated as given in Equation (10):

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta f_t(x_i) \tag{10}$$

where $\eta$ denotes the learning rate, and $f_t$ is the decision function (typically a decision tree) added at iteration t.

To prevent target leakage and ensure unbiased gradient estimation, CatBoost introduces the ordered gradient, defined in Equation (11).

$$g_i^{(t)} = \left. \frac{\partial\, \ell(y_i, \hat{y}_i^{(t-1)})}{\partial \hat{y}_i^{(t-1)}} \right|_{\text{without } (x_i, y_i)} \tag{11}$$

where the gradient for sample i is calculated excluding the sample itself from the statistics, thus avoiding prediction shift.

The loss function minimized during training is expressed as shown in Equation (12).

$$L^{(t)} = \sum_{i=1}^{n} \ell(y_i, \hat{y}_i^{(t)}) \tag{12}$$

where $\ell$ is the chosen loss function (e.g., log loss or squared error).

For the transformation of categorical features, CatBoost computes a smoothed target statistic as presented in Equation (13).

$$\mathrm{TS}_j = \frac{\sum_{i \in B(x_{ij})} y_i + a \cdot p}{|B(x_{ij})| + a} \tag{13}$$

where:

$B(x_{ij})$ is the set of prior samples with the same categorical value as $x_{ij}$,
$p$ is the prior mean of the target,
$a$ is a regularization (smoothing) parameter.

This approach enables CatBoost to achieve state-of-the-art performance, particularly on datasets with high-cardinality categorical variables.
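The smoothed target statistic of Equation (13) can be sketched in a few lines. The version below processes samples in the order given and, for each one, uses only earlier samples with the same category, which is the idea behind the "prior samples" set B. Using the global target mean as the prior p and a single fixed ordering are simplifying assumptions for illustration; the toy data are made up.

```python
def ordered_target_statistics(categories, targets, prior, a=1.0):
    """Encode each categorical value using only the samples seen before it."""
    sums, counts = {}, {}
    encoded = []
    for cat, y in zip(categories, targets):
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        # Equation (13): (sum of prior targets + a * p) / (count + a)
        encoded.append((s + a * prior) / (c + a))
        # Only now add the current sample to the running statistics,
        # so its own label never leaks into its encoding.
        sums[cat] = s + y
        counts[cat] = c + 1
    return encoded


cats = ["x", "y", "x", "x", "y"]
ys = [1, 0, 0, 1, 1]
prior = sum(ys) / len(ys)  # assumed prior: global mean of the toy targets
ts = ordered_target_statistics(cats, ys, prior, a=1.0)
print(ts)
```

The first occurrence of each category falls back to the prior, and later occurrences move toward the category's observed target rate as more prior samples accumulate.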

For the purposes of this study, the dataset was divided into a training set and a testing set using an 80/20 split ratio, where 80% of the data was used for training the model and the remaining 20% was reserved for performance evaluation. The classification results obtained from the employed boosting algorithms were compared based on the metrics defined earlier. Table 2 presents a summary of the performance comparison across different classifiers.

Table 2 shows that the boosting algorithms typically generate comparable outcomes, with accuracy values ranging approximately from 95.83% to 96.46%, as evidenced by the test results. The XGBoost-based model was selected for utilization in this study, and all code implementations were written in Python 3.13.

Table 3 presents the near-real-time performance metrics and requirements.

The performance metrics in the table demonstrate the model’s efficiency. The mean processing time for a single instance was 0.72 ms, well below the specified requirement of 100 ms. The 95th and 99th percentile latencies were 0.76 ms and 1.03 ms, respectively, indicating that the model remains fast even in the slowest cases. Batch throughput was 12,021 samples per second, well above the predefined requirement, confirming the model’s high processing capacity. Additionally, the batch processing time per sample was 0.08 ms, which is consistent with this throughput (1/12,021 s ≈ 0.083 ms) and supports the system’s suitability for near real-time applications.

An evaluation of the resource usage revealed a memory consumption of 410 MB and a model size of 548.8 KB. These values are both well below the specified limits. These findings demonstrate that the model is both lightweight and portable, operating efficiently in terms of resources. A comprehensive evaluation of the performance metrics reveals that the requirements are being met with considerably higher performance, thereby substantiating the model’s reliability in delivering both high processing speed and minimal resource utilization.
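Latency figures of this kind (mean, 95th and 99th percentile, per-sample time from throughput) could be collected with a harness along the following lines. This is a sketch under stated assumptions: `dummy_predict` is a stand-in for the trained model's prediction call, and the sample inputs are synthetic, so the numbers it prints are unrelated to those in Table 3.

```python
import statistics
import time


def measure_latency_ms(predict, samples):
    """Time each single-sample prediction and return latencies in milliseconds."""
    latencies = []
    for sample in samples:
        start = time.perf_counter()
        predict(sample)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies_ms):
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100)
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
    }


def dummy_predict(features):
    """Hypothetical placeholder for a model's single-sample predict call."""
    return sum(features) > 1.0


samples = [[0.001 * i, 0.2, 0.3] for i in range(1000)]
stats = summarize(measure_latency_ms(dummy_predict, samples))
print(stats)

# Converting a batch throughput into a per-sample time:
per_sample_ms = 1000.0 / 12021  # about 0.083 ms per sample
print(round(per_sample_ms, 3))
```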

3.4. Hyperparameter Optimization

A comprehensive grid search was performed for each of the three gradient boosting algorithms. The optimal XGBoost configuration, obtained through 5-fold stratified cross-validation (2187 candidates, 10,935 fits in total), was:

n_estimators: 300
max_depth: 5
learning_rate: 0.2
subsample: 1.0
colsample_bytree: 1.0

This systematic approach supports reproducibility and helps to mitigate overfitting.
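The reported search size can be reproduced arithmetically: 2187 equals 3^7, which would be consistent with seven hyperparameters at three candidate values each, and 2187 candidates times 5 folds gives 10,935 fits. The grid below is purely hypothetical. The paper does not list the search grid, so the candidate values and the two extra parameters (`min_child_weight`, `gamma`) are assumptions chosen only so that the counts match and the reported optimum is included.

```python
from itertools import product

# Hypothetical grid: seven parameters with three values each (3**7 = 2187).
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1, 0.2],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
    "min_child_weight": [1, 3, 5],   # assumed, not reported in the paper
    "gamma": [0, 0.1, 0.2],          # assumed, not reported in the paper
}

# Enumerate every combination, as an exhaustive grid search would.
candidates = [dict(zip(param_grid, combo))
              for combo in product(*param_grid.values())]

n_folds = 5
total_fits = len(candidates) * n_folds
print(len(candidates), total_fits)

# The configuration reported as optimal in this section.
reported_best = {"n_estimators": 300, "max_depth": 5, "learning_rate": 0.2,
                 "subsample": 1.0, "colsample_bytree": 1.0}
covered = any(all(c[k] == v for k, v in reported_best.items())
              for c in candidates)
print(covered)
```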

3.5. Ablation Study

In this study, ablation results were obtained by removing each feature separately. The five most influential features are given in Table 4.

The findings suggest that the most critical features affecting the model’s prediction performance are ‘Time Diff between first and last (Mins)’ and ‘TotalTransactions’. For both features, removal results in a model accuracy reduction of approximately 1.06%, a considerably more substantial impact than that of the other features. Removing the other three features (MinValueReceived, TotalEtherReceived, and MaxValueReceived) resulted in accuracy losses of 0.26%, 0.26%, and 0.21%, respectively. The findings indicate that behavioral and temporal characteristics, including transaction timing and frequency, are more effective discriminators for the model than financial value-based features.
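A leave-one-feature-out ablation of this kind can be expressed as a short generic loop. In the sketch below, `evaluate` is a hypothetical callable that would, in practice, retrain the model on the given feature subset and return its test accuracy. Here it is replaced by a toy scoring function with made-up contributions, so the printed numbers are not the paper's results.

```python
def leave_one_out_ablation(features, evaluate):
    """Remove each feature in turn and record the drop in the score."""
    baseline = evaluate(features)
    drops = {}
    for removed in features:
        reduced = [f for f in features if f != removed]
        drops[removed] = baseline - evaluate(reduced)
    # Largest drop first: these are the most influential features.
    ranking = sorted(drops.items(), key=lambda kv: kv[1], reverse=True)
    return baseline, ranking


# Toy stand-in for "train and score a model": invented additive contributions.
toy_contribution = {"feature_a": 0.03, "feature_b": 0.01, "feature_c": 0.002}


def toy_evaluate(features):
    return 0.90 + sum(toy_contribution[f] for f in features)


baseline, ranking = leave_one_out_ablation(list(toy_contribution), toy_evaluate)
print(baseline)
for name, drop in ranking:
    print(name, round(drop, 4))
```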

4. Results and Discussion

4.1. SHAP (SHapley Additive exPlanations)

To enhance the interpretability of the XGBoost model selected for this study, the SHAP (SHapley Additive exPlanations) algorithm was employed. In the field of explainable artificial intelligence (XAI), SHAP has emerged as one of the most theoretically grounded, model-agnostic approaches for interpreting machine learning models. It is based on cooperative game theory, in particular on Shapley values, which fairly distribute the “payout” (here, the model output) among the input features according to their marginal contributions. SHAP assigns each feature a value that quantifies its contribution to a particular prediction. This contribution is the weighted average of the feature’s marginal effect over all subsets of the remaining features. The result is a set of additive feature attributions that sum to the difference between the model’s output for that instance and the expected model output. This makes SHAP both local (interpreting individual predictions) and global (aggregating attributions across many predictions) in scope [37].

In SHAP, the contribution of each feature i to the model’s prediction is calculated using the following Shapley value formula:

ϕ_{i} = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \left[ f(S \cup \{i\}) - f(S) \right]

(14)

where N is the set of all input features, S \subseteq N \setminus \{i\} is a subset of features excluding feature i, f(S) is the model prediction using only the features in subset S, f(S \cup \{i\}) is the prediction after adding feature i, and ϕ_{i} is the SHAP value representing the contribution of feature i to the model output.

This formulation guarantees several desirable properties: local accuracy (the SHAP values, added to the expected model output, equal the model prediction), missingness (features not in the model get zero contribution), and consistency (if a model changes so that the contribution of a feature increases, its SHAP value will not decrease). As such, SHAP provides a principled and intuitive way to interpret complex machine learning models. Algorithm 1 illustrates the procedure for generating the SHAP plot. Figure 2 depicts the contribution values of each feature to the classification outcome.
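To make the Shapley value formula and the local accuracy property concrete, the following sketch evaluates Equation (14) exactly for a toy model with three features. It enumerates every subset, so it is only practical for a handful of features; tree-specific SHAP implementations use faster algorithms. The function names and the toy value function are illustrative assumptions.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(features: Sequence[str],
                   value_fn: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values by enumerating all subsets S of N without i."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):  # |S| takes the values 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi


def toy_model(subset: FrozenSet[str]) -> float:
    """Hypothetical f(S): two main effects plus an a-b interaction."""
    score = 0.0
    if "a" in subset:
        score += 1.0
    if "b" in subset:
        score += 2.0
    if "a" in subset and "b" in subset:
        score += 4.0
    return score


phi = shapley_values(["a", "b", "c"], toy_model)
print(phi)  # a: 3.0, b: 4.0, c: 0.0 (up to floating-point rounding)
```

The interaction term of 4 is split equally between a and b, the unused feature c receives zero (missingness), and the values sum to f(N) - f(∅) = 7 (local accuracy).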

For instance, in a specific prediction case, a wallet with high TotalEtherSent and frequent outgoing transactions showed positive SHAP values, indicating strong association with suspicious behavior. In contrast, wallets with low diversity in interacting addresses showed negative SHAP values, correlating with benign activity. This insight is useful for forensic analysts investigating suspicious wallet activity.

Table 5 presents features according to SHAP importance.

On its own, the SHAP plot does not state the model’s prediction for a given instance. Instead, it shows which features the model used to make its prediction and how strongly each feature influenced the outcome. The vertical axis lists the features used in the model, ordered by their estimated impact on the model’s output. The horizontal axis shows the SHAP values, which quantify how far a feature’s value for a particular sample shifts the model’s output away from its expected baseline. Positive SHAP values indicate that the feature pushes the prediction higher (e.g., increasing the probability of the wallet being considered suspicious), whereas negative SHAP values indicate that the feature lowers the output. The colour of each point corresponds to the actual value of the feature, with red representing high values and blue indicating low values.

Mean absolute SHAP values quantify how much each feature contributes to the model’s predictions overall. Table 5 shows that the features with the highest mean absolute SHAP values dominate the model’s output. Specifically, “Time Diff between first and last (Mins)” and “UniqueReceivedFrom_Addresses”, with values of 1.693 and 1.453, respectively, exert the greatest influence on the model’s decisions. They are followed by “AvgValueReceived”, “TotalTransactions”, and “Received_tnx”, which also have substantial effects. Conversely, “NumberofCreated_Contracts”, “TotalEtherBalance”, and “MaxValSent” have the lowest values, indicating a comparatively limited influence on the predictions.
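The global ranking in Table 5 is an aggregate of per-sample SHAP values: for each feature, the absolute SHAP values are averaged over all explained samples. A minimal sketch of this aggregation is shown below, assuming the SHAP values are available as one row per sample and one column per feature. The function name and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def rank_by_mean_abs_shap(feature_names: Sequence[str],
                          shap_rows: Sequence[Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (feature, mean |SHAP|) pairs sorted from most to least important."""
    if not shap_rows:
        raise ValueError("at least one row of SHAP values is required")
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        if len(row) != len(feature_names):
            raise ValueError("row length does not match the number of features")
        for j, value in enumerate(row):
            totals[j] += abs(value)
    ranking = [(name, total / len(shap_rows))
               for name, total in zip(feature_names, totals)]
    ranking.sort(key=lambda pair: pair[1], reverse=True)
    return ranking
```

Because absolute values are averaged, a feature that pushes some wallets strongly towards "suspicious" and others strongly towards "normal" still ranks high, even though its signed contributions may cancel out.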

In summary, SHAP analysis offers a reliable indicator for explaining the model’s decision-making mechanism and identifying which features are significant. This enhances the model’s transparency and interpretability.

The integration of SHAP with XGBoost enhances the interpretability of tree ensemble models by providing a decomposition of the model’s predictions into individual feature contributions. For each prediction, expressed as f(x) = \sum_{t=1}^{T} f_{t}(x), SHAP computes the contribution of each feature, denoted as ϕ_{i}, in a manner that satisfies the efficiency property, \sum_{i} ϕ_{i} = f(x) - E[f(X)]. This property ensures that the sum of the feature contributions precisely accounts for the difference between the model’s prediction and its expected value, thereby enabling a rigorous and quantitative assessment of how each feature influences the model’s output.

Consequently, the SHAP framework provides a transparent and interpretable mechanism for understanding the decision-making process of complex ensemble models such as XGBoost.
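As an illustration of the efficiency property, the sketch below rebuilds a single prediction from a baseline and a set of feature contributions. It assumes that the SHAP values are expressed in log-odds (margin) space, which is a common default for tree explainers on binary gradient boosting classifiers but is not stated in this section. The baseline, the contribution values, and the 0.5 threshold are hypothetical.

```python
import math
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def explain_prediction(base_value: float,
                       contributions: Dict[str, float],
                       threshold: float = 0.5) -> Tuple[float, float, str]:
    """Reconstruct f(x) = E[f(X)] + sum of SHAP values, then map to a label."""
    margin = base_value + sum(contributions.values())
    probability = sigmoid(margin)
    label = "SUSPICION" if probability >= threshold else "NORMAL"
    return margin, probability, label


# Hypothetical values for one wallet (log-odds units).
margin, probability, label = explain_prediction(
    base_value=-0.5,
    contributions={"TotalTransactions": 1.2, "AvgValSent": -0.2},
)
```

Here the contributions shift the margin from the baseline of -0.5 to 0.5, which corresponds to a probability of about 0.62 and therefore to the "SUSPICION" label used in Algorithm 5.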

Algorithm 1 Fraud Detection in Blockchain Transactions Using XGBoost and SHAP

Input: Dataset file path

Output: Trained XGBoost model, normalization scaler, accuracy score, SHAP summary plot

1: Load dataset from CSV file

2: Separate input features X and target variable y:

3: Remove Address column from X

4: Extract class column as target y

5: Normalize features X using Min-Max scaling to obtain X_{scaled}

6: Split data into training and test sets:

7: (X_{train}, X_{test}, y_{train}, y_{test}) \leftarrow train_test_split(X_{scaled}, y, test_size = 0.2, random_state = 42)

8: Initialize XGBoost classifier with parameters:

9: use_label_encoder=False, eval_metric="logloss", tree_method="hist"

10: Train model M on training data (X_{train}, y_{train})

11: Save trained model and scaler to disk

12: Predict target values y_{pred} for X_{test} using model M

13: Calculate accuracy score between y_{test} and y_{pred}

14: Initialize SHAP explainer with model M

15: Compute SHAP values for test data X_{test}

16: Generate and save SHAP summary plot
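Step 5 of Algorithm 1 and the scaler reuse in Algorithm 5 both rely on Min-Max scaling, x' = (x - min) / (max - min), with the minimum and maximum learned once and then applied unchanged to new data. The following dependency-free sketch illustrates that behaviour. It is a stand-in for a library scaler such as scikit-learn's MinMaxScaler rather than the author's code, and mapping constant columns to 0.0 is an assumption.

```python
from typing import List, Sequence


class MinMaxColumnScaler:
    """Learn per-column min and max once, then rescale any later rows."""

    def fit(self, rows: Sequence[Sequence[float]]) -> "MinMaxColumnScaler":
        columns = list(zip(*rows))
        self.mins_ = [min(col) for col in columns]
        self.maxs_ = [max(col) for col in columns]
        return self

    def transform(self, rows: Sequence[Sequence[float]]) -> List[List[float]]:
        scaled = []
        for row in rows:
            scaled.append([
                # Constant columns carry no information; map them to 0.0.
                0.0 if hi == lo else (x - lo) / (hi - lo)
                for x, lo, hi in zip(row, self.mins_, self.maxs_)
            ])
        return scaled


scaler = MinMaxColumnScaler().fit([[0.0, 10.0], [5.0, 10.0], [10.0, 10.0]])
print(scaler.transform([[5.0, 10.0]]))   # [[0.5, 0.0]]
print(scaler.transform([[20.0, 10.0]]))  # [[2.0, 0.0]]
```

The second call shows a practical consequence for live data: a wallet whose feature value exceeds the range seen during fitting is mapped outside [0, 1], because the stored minimum and maximum are not updated at prediction time.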

4.2. Feature Extraction for Given Wallet Address

One of the modules developed in this study extracts the relevant features for any given Ethereum wallet address and then predicts whether the wallet is suspicious based on these features. Algorithm 2 shows the step-by-step feature extraction procedure.

4.3. Detection of the Last 10 Active Wallet Addresses and Extraction of the Properties of These Wallets and Model Estimation

Algorithm 3 presents the procedure for identifying the last 10 active Ethereum wallet addresses. This module identifies these addresses and extracts their features, after which the model predicts whether each wallet exhibits suspicious behavior. The script connects to an Ethereum node through Infura and iterates over blocks from the latest one backwards to the range limit. For every block, it collects the unique “from” and “to” addresses of all transactions. Once it has identified 10 unique active addresses, it stops and writes them to a CSV file.

Algorithm 2 Ethereum Account Feature Extraction

Input: Ethereum account address, start block, end block

Output: Transaction features for the account in a CSV file

1: Connect to Ethereum via Web3 provider

2: Initialize empty list transactions

3: Loop over blocks from start_block to end_block:

4: for each block number in the range do

5: Retrieve block with full transaction data

6: for each transaction in block do

7: if transaction is sent from or received by the given address then

8: Extract transaction data: block number, hash, sender, recipient, value in ETH, gas, timestamp

9: Append data to transactions

10: end if

11: end for

12: end for

13: Extract timestamps and compute:

14: Time difference between first and last transaction

15: Average time between sent transactions

16: Separate sent and received transactions

17: Compute the number of unique senders and recipients

18: Count the number of contract creations

19: Compute statistical values for sent and received ETH:

20: Min, Max, and Average values

21: Total ETH sent and received

22: Balance = Received - Sent

23: Build a feature dictionary with all computed metrics

24: Save the feature dictionary as a row in a CSV file
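Steps 13 to 23 of Algorithm 2 can be sketched without any network access once the wallet's transactions have been collected. The sketch below is illustrative and makes several assumptions: the transaction dictionary keys (`from`, `to`, `value_eth`, `timestamp`) are hypothetical, a contract creation is recognised by an empty `to` field, and empty groups yield 0.0. The output keys follow the feature names in Table 1.

```python
from typing import Dict, List, Sequence, Tuple


def _min_max_avg(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (min, max, mean), or zeros when there are no values."""
    if not values:
        return 0.0, 0.0, 0.0
    return min(values), max(values), sum(values) / len(values)


def compute_wallet_features(address: str, transactions: List[dict]) -> Dict[str, float]:
    """Aggregate one wallet's transactions into Table 1 style features."""
    addr = address.lower()
    txs = sorted(transactions, key=lambda t: t["timestamp"])
    outgoing = [t for t in txs if t["from"].lower() == addr]
    created = [t for t in outgoing if t["to"] is None]  # contract creations
    sent = [t for t in outgoing if t["to"] is not None]
    received = [t for t in txs if t["to"] is not None and t["to"].lower() == addr]

    min_sent, max_sent, avg_sent = _min_max_avg([t["value_eth"] for t in sent])
    min_recv, max_recv, avg_recv = _min_max_avg([t["value_eth"] for t in received])
    total_sent = sum(t["value_eth"] for t in sent)
    total_recv = sum(t["value_eth"] for t in received)

    stamps = [t["timestamp"] for t in txs]  # Unix seconds
    span_min = (stamps[-1] - stamps[0]) / 60.0 if stamps else 0.0
    sent_stamps = [t["timestamp"] for t in sent]
    gaps = [later - earlier for earlier, later in zip(sent_stamps, sent_stamps[1:])]
    avg_gap_min = sum(gaps) / len(gaps) / 60.0 if gaps else 0.0

    return {
        "Sent_tnx": len(sent),
        "Received_tnx": len(received),
        "NumberofCreated_Contracts": len(created),
        "UniqueReceivedFrom_Addresses": len({t["from"].lower() for t in received}),
        "UniqueSentTo_Addresses": len({t["to"].lower() for t in sent}),
        "MinValueReceived": min_recv,
        "MaxValueReceived": max_recv,
        "AvgValueReceived": avg_recv,
        "MinValSent": min_sent,
        "MaxValSent": max_sent,
        "AvgValSent": avg_sent,
        "TotalEtherSent": total_sent,
        "TotalEtherReceived": total_recv,
        "TotalEtherBalance": total_recv - total_sent,
        "TotalTransactions": len(sent) + len(received) + len(created),
        "TimeDiffBetweenFirstandLast": span_min,
        "AvgMinBetweenSentTnx": avg_gap_min,
    }
```

Addresses are compared in lowercase because the same Ethereum address can appear in checksummed (mixed-case) and lowercase form.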

Algorithm 3 Extracting Latest Active Ethereum Addresses

Input: Ethereum node access, block range N, number of addresses k

Output: A CSV file with the k most recent active Ethereum addresses

1: Connect to Ethereum using Web3

2: Get the latest block number

3: Set the scan range as the last N blocks

4: Initialize an empty set active_addresses

5: for block number from latest to latest - N (in reverse) do

6: Fetch the block with full transaction data

7: for each transaction in the block do

8: Add from and to addresses to active_addresses, if present

9: if size of active_addresses \geq k then

10: Return the list of active addresses

11: end if

12: end for

13: end for

14: Save the collected addresses into a CSV file
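The backward block scan of Algorithm 3 can be sketched as below. To keep the example self-contained, the node access is abstracted into a hypothetical `get_block` callable that returns a dictionary with a `transactions` list. With web3.py this role could be played by `w3.eth.get_block(number, full_transactions=True)`, but that wiring is an assumption and is not shown. The sketch returns the addresses and leaves writing the CSV file to the caller.

```python
from typing import Callable, Dict, List


def latest_active_addresses(get_block: Callable[[int], dict],
                            latest: int,
                            scan_range: int,
                            k: int) -> List[str]:
    """Scan blocks from `latest` backwards and collect up to k unique addresses."""
    # A dict is used as an ordered set: keys keep first-seen order,
    # so the most recently active addresses come first.
    seen: Dict[str, None] = {}
    stop = max(latest - scan_range, -1)  # never go below block 0
    for number in range(latest, stop, -1):
        block = get_block(number)
        for tx in block["transactions"]:
            for key in ("from", "to"):
                address = tx.get(key)
                if address:  # "to" is empty for contract creations
                    seen.setdefault(address.lower(), None)
                    if len(seen) >= k:
                        return list(seen)
    return list(seen)
```

If fewer than k addresses are found within the scan range, the function returns what it has collected, so callers should not assume the result always contains exactly k entries.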

After identifying the last 10 active addresses, the characteristics of each of these addresses were extracted as illustrated in Algorithm 4, and the resulting data were saved to a CSV file.

The pre-trained XGBoost model was then applied to the extracted features of the last 10 addresses to classify each wallet address as normal or suspicious. The algorithm developed for this module is presented in Algorithm 5.

A comparison of our model with existing approaches, such as those proposed by Aziz et al. [16] and Ehsan et al. [21], reveals that our model not only achieves similar or better accuracy, but also emphasizes real-time applicability and modular design. The majority of extant research concentrates exclusively on offline datasets, whereas our pipeline operates directly on live-chain data using the Web3 API. This architectural enhancement ensures the system’s deployability for security monitoring platforms. Furthermore, the explainability provided by SHAP not only fosters model transparency but also supports compliance with regulatory requirements in the context of blockchain forensics.

Algorithm 4 Extract Ethereum Account Features

Input: Ethereum address list from CSV, start and end block numbers

Output: Extracted features for each address saved in output CSV

1: Connect to Ethereum mainnet via Web3

2: Load Ethereum addresses from latest_active_addresses.csv

3: Define output CSV eth_account_features.csv

4: Get current block as latest_block, set start_block = latest_block - 500

5: for each address in input CSV do

6: if address is valid then

7: Initialize empty transaction list

8: for each block from start_block to latest_block do

9: Get block with full transactions

10: for each transaction in block do

11: if transaction involves address then

12: Collect transaction info (value, gas, timestamp, etc.)

13: end if

14: end for

15: Wait 0.5 s to avoid rate limits

16: end for

17: Compute:

• Number of sent/received transactions

• Number of contracts created

• Unique sent-to and received-from addresses

• Min, max, avg sent/received values

• Total Ether sent/received, balance

• Time statistics

18: Write all features to output CSV

19: end if

20: end for

Algorithm 5 Ethereum Account Fraud Classification Pipeline

Input: CSV file path

Output: Prediction (Normal or Suspicion)

1: Load: Saved scaler from scaler.pkl

2: Load: Trained model from xgboost_fraud_detection_model.pkl

3: Load dataset as dataframe df from CSV

4: Store the Address column separately in addresses

5: Remove the Address column from df

6: for all columns in df do

7: Convert values to numeric (coerce invalid values as NaN)

8: end for

9: Fill missing values in df with column medians

10: Load scaler using joblib

11: Apply scaler transformation to df to obtain X_new_scaled

12: Load XGBoost model using joblib

13: Predict labels for X_new_scaled using the loaded model

14: Create a new dataframe output_df with:

15: Address from original data

16: Class as "NORMAL" if prediction is 0, else "SUSPICION"

17: Save output_df to eth_account_predictions.csv
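The data-cleaning and labelling steps of Algorithm 5 (steps 6 to 9 and 14 to 16) can be illustrated without the model itself. The sketch below is a dependency-free approximation of what a dataframe library would do and is not the author's code. The function names are hypothetical, and using 0.0 for a column with no valid values is an assumption.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def to_number(value: object) -> float:
    """Convert a raw cell to float, coercing invalid values to NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def clean_rows(raw_rows: Sequence[Sequence[object]]) -> List[List[float]]:
    """Coerce every cell to numeric and fill gaps with the column median."""
    numeric = [[to_number(v) for v in row] for row in raw_rows]
    medians = []
    for j in range(len(numeric[0])):
        observed = [row[j] for row in numeric if not math.isnan(row[j])]
        medians.append(statistics.median(observed) if observed else 0.0)
    return [[medians[j] if math.isnan(v) else v for j, v in enumerate(row)]
            for row in numeric]


def label_predictions(addresses: Sequence[str],
                      predictions: Sequence[int]) -> List[Tuple[str, str]]:
    """Map class 0 to NORMAL and any other class to SUSPICION."""
    return [(address, "NORMAL" if pred == 0 else "SUSPICION")
            for address, pred in zip(addresses, predictions)]
```

As in step 9 of Algorithm 5, the medians are taken from the batch being scored, so the filled values for a given wallet depend on which other wallets are in the same file.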

5. Limitations

While the proposed system exhibits high performance in the realm of near real-time Ethereum fraud detection, it is imperative to acknowledge certain limitations to provide a balanced perspective on the scope and applicability of the work.

The paucity of high-quality, publicly labeled datasets for Ethereum fraud detection poses a significant constraint. The present study is based on a dataset comprising 9,841 transactions, which, while substantial, may not encompass the full spectrum of fraudulent behavior present in the current ecosystem.
Cybercriminals continually adapt their methods to circumvent detection systems, so training data collected at one point in time may not adequately represent evolving fraud patterns. This temporal bias has the potential to compromise the model’s efficacy in addressing novel attack vectors.
The binary classification approach (normal and suspicious) may be an oversimplification of the complex nature of blockchain activity. It is important to note that some transactions may fall into a gray area that is neither clearly fraudulent nor entirely legitimate. The current model does not differentiate between different fraudulent activities (e.g., phishing, Ponzi schemes, and money laundering).
Despite the fact that our set of 17 features captures fundamental behavioral patterns, it is possible that these features do not encompass all relevant fraud indicators. It is possible that more sophisticated fraud schemes may employ patterns not yet captured in our current feature set. The prevailing feature engineering approach is static, which may hinder its ability to adapt to the evolving nature of fraud.
At present, the system’s design is exclusively tailored for Ethereum, thereby limiting its generalizability to disparate blockchain networks characterized by varied transaction structures and consensus mechanisms.
The system’s reliance on external APIs (Ethereum nodes, Etherscan) introduces potential points of failure and rate-limiting constraints that could impact real-time performance.
Despite the system’s optimization for efficiency, concurrent processing of voluminous address groups can necessitate substantial memory resources, a factor that could impede the system’s applicability in environments characterized by resource constraints.
In the context of sophisticated attacks, entities that possess an understanding of the model’s decision boundaries may devise specially designed transactions with the intent to evade detection. It should be noted that this particular scenario has not been the focus of a comprehensive evaluation within the scope of our current research.
While SHAP offers interpretability, it concomitantly unveils the model’s decision-making process, leaving it vulnerable to exploitation by potential attackers.

While the current work demonstrates significant advances in near real-time Ethereum fraud detection with high interpretability, the identified limitations provide clear directions for future research and development. The proposed future work encompasses both technical enhancements and broader considerations of practical deployment, ethical implications, and societal impact. Addressing these limitations and pursuing the outlined research directions will contribute to the development of more robust, scalable, and trustworthy blockchain security systems. The rapid evolution of blockchain technology and fraud techniques necessitates continuous research and adaptation. The framework under consideration provides a solid foundation that can be extended and enhanced to meet emerging challenges in blockchain security and fraud detection.

6. Future Works

In the future, the robustness of models may be enhanced through the incorporation of token transaction graphs and smart contract call traces. In addition, the validation of the pipeline is planned to be conducted on real-time unlabeled data, employing human-in-the-loop verification. The development of a unified framework capable of operating across multiple blockchain networks (Ethereum, Binance Smart Chain, Polygon, etc.) would significantly increase the system’s utility and market viability. The extension of the binary classification to identify specific types of fraud (e.g., phishing, pyramid schemes, mixing services) will provide researchers with more actionable intelligence. The incorporation of graph-based features, while preserving the interpretability and efficiency of the prevailing approach through hybrid architectures, is expected to enhance the system’s effectiveness. Beyond rudimentary statistical measurements, more sophisticated temporal modeling techniques can be employed to identify patterns in fraud behavior over time. The development of specific models for various types of fraud, and the subsequent integration of these models to create hybrid systems, is a promising area of research.

7. Conclusions

This study presents a machine learning approach for detecting fraudulent Ethereum wallet addresses. The system can evaluate individual or recently active wallet addresses in near real time by leveraging a pre-trained XGBoost model, which classifies suspicious behavior with an accuracy of approximately 96%. Incorporating SHAP values into the model improves its interpretability and transparency by providing information on the contribution of each feature to the final decision. The findings of this study suggest that explainable artificial intelligence (XAI) techniques have the potential to substantially improve the trustworthiness and usability of blockchain analytics tools. In the future, several enhancements are planned. Firstly, the model will be extended to support multi-chain analysis, incorporating data from other popular blockchains such as Binance Smart Chain or Polygon. Furthermore, the feature set may be expanded to include more granular behavioral indicators derived from smart contract interactions and token transfers. Real-time streaming data integration is also a key development goal, which would allow the system to function continuously on live blockchain activity. Finally, the deployment of the solution as a publicly accessible API or dashboard has the potential to facilitate broader use in security monitoring, compliance, and financial auditing applications.

Funding

This work is supported by Scientific Research Projects Coordination Unit of Firat University, Türkiye, Project Numbers: TEKF.25.13 and ADEP.25.28.

Data Availability Statement

Data are available from the corresponding author upon reasonable request.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Xu, M.; Chen, X.; Kou, G. A systematic review of blockchain. Financ. Innov. 2019, 5, 27.
2. Ressi, D.; Romanello, R.; Piazza, C.; Rossi, S. AI-enhanced blockchain technology: A review of advancements and opportunities. J. Netw. Comput. Appl. 2024, 225, 103858.
3. Sun, J.; Jia, Y.; Wang, Y.; Tian, Y.; Zhang, S. Ethereum fraud detection via joint transaction language model and graph representation learning. Inf. Fusion 2025, 120, 103074.
4. Gad, A.G.; Mosa, D.T.; Abualigah, L.; Abohany, A.A. Emerging trends in blockchain technology and applications: A review and outlook. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 6719–6742.
5. Zheng, Z.; Su, J.; Chen, J.; Lo, D.; Zhong, Z.; Ye, M. Dappscan: Building large-scale datasets for smart contract weaknesses in dapp projects. IEEE Trans. Softw. Eng. 2024, 50, 1360–1373.
6. Han, H.; Shiwakoti, R.K.; Jarvis, R.; Mordi, C.; Botchie, D. Accounting and auditing with blockchain technology and artificial Intelligence: A literature review. Int. J. Account. Inf. Syst. 2023, 48, 100598.
7. Tripathi, G.; Ahad, M.A.; Casalino, G. A comprehensive review of blockchain technology: Underlying principles and historical background with future challenges. Decis. Anal. J. 2023, 9, 100344.
8. Ma, F.; Ren, M.; Fu, Y.; Wang, M.; Li, H.; Song, H.; Jiang, Y. Security reinforcement for Ethereum virtual machine. Inf. Process. Manag. 2021, 58, 102565.
9. Wu, S.; Yu, Z.; Wang, D.; Zhou, Y.; Wu, L.; Wang, H.; Yuan, X. Defiranger: Detecting DeFI price manipulation attacks. IEEE Trans. Dependable Secur. Comput. 2023, 21, 4147–4161.
10. Faqir-Rhazoui, Y.; Arroyo, J.; Hassan, S. A comparative analysis of the platforms for decentralized autonomous organizations in the Ethereum blockchain. J. Internet Serv. Appl. 2021, 12, 9.
11. Li, S.; Gou, G.; Liu, C.; Xiong, G.; Li, Z.; Xiao, J.; Xing, X. TGC: Transaction Graph Contrast Network for Ethereum Phishing Scam Detection. In Proceedings of the 39th Annual Computer Security Applications Conference, Austin, TX, USA, 4–8 December 2023; pp. 352–365.
12. Wu, J.; Lin, D.; Fu, Q.; Yang, S.; Chen, T.; Zheng, Z.; Song, B. Toward understanding asset flows in crypto money laundering through the lenses of Ethereum heists. IEEE Trans. Inf. Forensics Secur. 2023, 19, 1994–2009.
13. Wronka, C. Money laundering through cryptocurrencies-analysis of the phenomenon and appropriate prevention measures. J. Money Laund. Control 2022, 25, 79–94.
14. Chainalysis, T. The Chainalysis 2025 Crypto Crime Report. 2025. Available online: https://go.chainalysis.com/2025-Crypto-Crime-Report.html (accessed on 19 May 2025).
15. Chen, Z.; Hu, Y.; He, B.; Luo, D.; Wu, L.; Zhou, Y. Dissecting payload-based transaction phishing on Ethereum. arXiv 2024, arXiv:2409.02386.
16. Aziz, R.M.; Baluch, M.F.; Patel, S.; Ganie, A.H. LGBM: A machine learning approach for Ethereum fraud detection. Int. J. Inf. Technol. 2022, 14, 3321–3331.
17. Farrugia, S.; Ellul, J.; Azzopardi, G. Detection of illicit accounts over the Ethereum blockchain. Expert Syst. Appl. 2020, 150, 113318.
18. Ravindranath, V.; Nallakaruppan, M.; Shri, M.L.; Balusamy, B.; Bhattacharyya, S. Evaluation of performance enhancement in Ethereum fraud detection using oversampling techniques. Appl. Soft Comput. 2024, 161, 111698.
19. Dahiya, M.; Mishra, N.; Singh, R. Neural network based approach for Ethereum fraud detection. In Proceedings of the 2023 4th International Conference on Intelligent Engineering and Management (ICIEM), London, UK, 9–11 May 2023; pp. 1–4.
20. Hu, T.; Liu, X.; Chen, T.; Zhang, X.; Huang, X.; Niu, W.; Lu, J.; Zhou, K.; Liu, Y. Transaction-based classification and detection approach for Ethereum smart contract. Inf. Process. Manag. 2021, 58, 102462.
21. Ehsan, A.; Iqbal, Z.; Abuowaida, S.; Aljaidi, M.; Zia, H.U.; Alshdaifat, N.; Alshammry, N.K. Enhanced Anomaly Detection in Ethereum: Unveiling and Classifying Threats with Machine Learning. IEEE Access 2024, 12, 176440–176456.
22. Liu, L.; Tsai, W.T.; Bhuiyan, M.Z.A.; Peng, H.; Liu, M. Blockchain-enabled fraud discovery through abnormal smart contract detection on Ethereum. Future Gener. Comput. Syst. 2022, 128, 158–166.
23. Tan, R.; Tan, Q.; Zhang, P.; Li, Z. Graph neural network for ethereum fraud detection. In Proceedings of the 2021 IEEE international conference on big knowledge (ICBK), Auckland, New Zealand, 7–8 December 2021; pp. 78–85.
24. Jin, C.; Zhou, J.; Xie, C.; Yu, S.; Xuan, Q.; Yang, X. Enhancing Ethereum Fraud Detection via Generative and Contrastive Self-supervision. IEEE Trans. Inf. Forensics Secur. 2024, 20, 839–853.
25. Tan, R.; Tan, Q.; Zhang, Q.; Zhang, P.; Xie, Y.; Li, Z. Ethereum fraud behavior detection based on graph neural networks. Computing 2023, 105, 2143–2170.
26. Liu, S.Z.; Yu, X.Y.; Li, Y.T.; Zhang, H.; Guo, X.P.; Ma, C.H.; Long, H.X. Detection of Ethereum Phishing Fraud Nodes Based on Feature Enhancement Strategy and GBM. Electronics 2024, 13, 5060.
27. Sheng, Z.; Song, L.; Wang, Y. Dynamic Feature Fusion: Combining Global Graph Structures and Local Semantics for Blockchain Phishing Detection. IEEE Trans. Netw. Serv. Manag. 2025, 22, 4706–4718.
28. Jia, Y.; Wang, Y.; Sun, J.; Tian, Y.; Qian, P. LMAE4Eth: Generalizable and Robust Ethereum Fraud Detection by Exploring Transaction Semantics and Masked Graph Embedding. IEEE Trans. Inf. Forensics Secur. 2025, 20, 10260–10274.
29. Li, P.; Xie, Y.; Xu, X.; Zhou, J.; Xuan, Q. Phishing fraud detection on ethereum using graph neural network. In Proceedings of the International Conference on Blockchain and Trustworthy Systems, Chengdu, China, 4–5 August 2022; Springer: Singapore, 2022; pp. 362–375.
30. Pahuja, L.; Kamal, A. EnLEFD-DM: Ensemble Learning Based Ethereum Fraud Detection Using CRISP-DM Framework. Expert Syst. 2023, 40, e13379.
31. Github. Github Repository Dataset. 2025. Available online: https://github.com/fatihertam/ethereumfrauddetection (accessed on 19 May 2025).
32. Kilincer, I.F. Explainable AI supported hybrid deep learnig method for layer 2 intrusion detection. Egypt. Inform. J. 2025, 30, 100669.
33. Ahn, J.M.; Kim, J.; Kim, K. Ensemble machine learning of gradient boosting (XGBoost, LightGBM, CatBoost) and attention-based CNN-LSTM for harmful algal blooms forecasting. Toxins 2023, 15, 608.
34. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
35. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3149–3157.
36. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 2018, 31, 6639–6649.
37. Li, Z. Extracting spatial effects from machine learning model using local interpretation method: An example of SHAP and XGBoost. Comput. Environ. Urban Syst. 2022, 96, 101845.

Figure 1. Proposed method.

Figure 2. SHAP value.

Table 1. Ethereum wallet transaction features.

Feature Name	Description
Address	Ethereum wallet address.
Sent_tnx	Total number of standard (non-contract) transactions sent from the address.
Received_tnx	Total number of standard (non-contract) transactions received by the address.
NumberofCreated_Contracts	Number of smart contract creation transactions initiated by the account.
UniqueReceivedFrom_Addresses	Count of distinct sender addresses that sent Ether to this account.
UniqueSentTo_Addresses	Count of distinct recipient addresses this account has sent Ether to.
MinValueReceived	The smallest single Ether amount received in a transaction.
MaxValueReceived	The largest single Ether amount received in a transaction.
AvgValueReceived	Average Ether value received across all incoming transactions.
MinValSent	The smallest single Ether amount sent in a transaction.
MaxValSent	The largest single Ether amount sent in a transaction.
AvgValSent	Average Ether value sent across all outgoing transactions.
TotalEtherSent	Cumulative Ether sent from this address across all transactions.
TotalEtherReceived	Cumulative Ether received by this address across all transactions.
TotalEtherBalance	Net Ether balance after all incoming and outgoing transactions.
TotalTransactions	Total count of transactions including normal and contract creation ones.
TimeDiffBetweenFirstandLast	Time duration in minutes between the first and the most recent transaction.
AvgMinBetweenSentTnx	Average time in minutes between two consecutive sent transactions.

Table 2. Performance comparison of different classifiers.

Metric	XGBoost	LightGBM	CatBoost
Best Hyperparameters	colsample_bytree: 1.0; learning_rate: 0.2; max_depth: 5; n_estimators: 300; reg_alpha: 0; reg_lambda: 1.5; subsample: 1.0	bagging_fraction: 0.8; feature_fraction: 0.9; learning_rate: 0.2; max_depth: 5; n_estimators: 200; num_leaves: 31	depth: 7; iterations: 300; l2_leaf_reg: 3; learning_rate: 0.2
CV Accuracy Mean	0.9588	0.9646	0.9583
CV Accuracy Std	0.0090	0.0071	0.0066
CV F1 Mean	0.9585	0.9643	0.9580
CV F1 Std	0.0090	0.0071	0.0066
Test Accuracy	0.9589	0.9634	0.9584
Test Precision	0.9586	0.9633	0.9582
Test Recall	0.9589	0.9634	0.9584
Test F1	0.9581	0.9628	0.9575
Test ROC AUC	0.9882	0.9898	0.9880
Training Time (s)	833.96	471.86	220.65
Model Size (KB)	548.79	516.15	639.19
Latency (ms)	0.72	0.72	0.72

Table 3. Near-real-time performance metrics and requirements.

Metric	Value	Requirement
Average Processing Time	0.72 ms	<100 ms
P95 Response Time	0.76 ms	<200 ms
P99 Response Time	1.03 ms	<500 ms
Throughput (10 samples)	12,021.0 samples/s	>50 samples/s
Time per Sample (batch)	0.08 ms	<20 ms
Memory Usage	410.0 MB	<1000 MB
Model Size	548.8 KB	<1000 KB

Table 4. Ablation study.

Feature	Baseline Accuracy	Without Feature Accuracy	Accuracy Drop	Relative Accuracy Impact (%)	Baseline F1	Without Feature F1	F1 Drop	Relative F1 Impact (%)
Time Diff between first and last (Mins)	0.9589	0.9487	0.0102	1.0593	0.9581	0.9474	0.0107	1.1209
TotalTransactions	0.9589	0.9487	0.0102	1.0593	0.9581	0.9477	0.0104	1.0895
MinValueReceived	0.9589	0.9563	0.0025	0.2648	0.9581	0.9556	0.0026	0.2677
TotalEtherReceived	0.9589	0.9563	0.0025	0.2648	0.9581	0.9554	0.0027	0.2851
MaxValueReceived	0.9589	0.9568	0.0020	0.2119	0.9581	0.9558	0.0023	0.2375
Sent_tnx	0.9589	0.9573	0.0015	0.1589	0.9581	0.9565	0.0017	0.1725
TotalEtherSent	0.9589	0.9573	0.0015	0.1589	0.9581	0.9565	0.0016	0.1682
UniqueReceivedFrom_Addresses	0.9589	0.9573	0.0015	0.1589	0.9581	0.9566	0.0015	0.1598
AvgValueReceived	0.9589	0.9573	0.0015	0.1589	0.9581	0.9564	0.0017	0.1768
TotalEtherBalance	0.9589	0.9573	0.0015	0.1589	0.9581	0.9565	0.0016	0.1682
Avg min between sent tnx	0.9589	0.9578	0.0010	0.1059	0.9581	0.9569	0.0012	0.1291
MaxValSent	0.9589	0.9578	0.0010	0.1059	0.9581	0.9571	0.0010	0.1079
UniqueSentTo_Addresses	0.9589	0.9578	0.0010	0.1059	0.9581	0.9570	0.0011	0.1163
Received_tnx	0.9589	0.9584	0.0005	0.0530	0.9581	0.9575	0.0006	0.0602
MinValSent	0.9589	0.9584	0.0005	0.0530	0.9581	0.9577	0.0005	0.0478
NumberofCreated_Contracts	0.9589	0.9589	0.0000	0.0000	0.9581	0.9580	0.0001	0.0082
AvgValSent	0.9589	0.9599	−0.0010	−0.1059	0.9581	0.9592	−0.0011	−0.1119

Table 5. Features and their mean absolute SHAP values.

Feature	Mean Absolute SHAP Value
Time Diff between first and last (Mins)	1.6933
UniqueReceivedFrom_Addresses	1.4526
AvgValueReceived	1.1515
TotalTransactions	1.0833
Received_tnx	0.9854
Sent_tnx	0.9066
TotalEtherReceived	0.8005
TotalEtherSent	0.6614
MaxValueReceived	0.6227
MinValueReceived	0.6016
Avg min between sent tnx	0.5552
MinValSent	0.5397
AvgValSent	0.3124
UniqueSentTo_Addresses	0.2595
MaxValSent	0.2215
TotalEtherBalance	0.1875
NumberofCreated_Contracts	0.1364
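The ranking in Table 5 is a global importance measure: for each feature, the absolute SHAP values are averaged over all explained samples. The sketch below shows that aggregation on a small matrix of made-up SHAP values. The helper `mean_abs_shap` and the toy numbers are illustrative assumptions; in practice the per-sample values would come from a SHAP explainer applied to the trained model.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value.

    shap_rows: one list of per-feature SHAP values for each sample.
    Returns (feature, score) pairs sorted from most to least influential.
    """
    if not shap_rows:
        raise ValueError("shap_rows must not be empty")
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        if len(row) != len(feature_names):
            raise ValueError("each row needs one value per feature")
        for j, value in enumerate(row):
            totals[j] += abs(value)  # sign is discarded; only magnitude counts
    scores = [(name, total / len(shap_rows))
              for name, total in zip(feature_names, totals)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical SHAP values for two samples and two features.
    names = ["TotalTransactions", "Sent_tnx"]
    rows = [[1.0, -0.5], [-2.0, 0.5]]
    for name, score in mean_abs_shap(rows, names):
        print(f"{name}\t{score:.4f}")
```

Because the sign is discarded, this score indicates how strongly a feature moves predictions on average, not whether it pushes them toward the fraudulent or the legitimate class.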

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
