1. Introduction
In 2009, with the release of Satoshi Nakamoto’s Bitcoin [1], the concept of blockchain emerged in the public domain and has since undergone rapid development [2]. Unlike Bitcoin, Ethereum introduced support for the deployment of smart contracts, enabling a wide range of blockchain-based upper-layer applications [3]. Similar to traditional legal contracts, a smart contract defines a set of predefined rules and procedures that both parties in a transaction must follow [4]. Technically, a smart contract is implemented as a replicable and immutable piece of code deployed on the Ethereum blockchain, ensuring transparency, automation, and trustless execution.
To date, tens of millions of smart contracts have been deployed on the Ethereum blockchain, supporting applications across various domains, such as finance [5] and industry [6]. Given the substantial volume of digital assets managed by smart contracts, they have become prime targets for attackers seeking to exploit vulnerabilities for illicit gains. Because smart contract code is structurally similar to code written in traditional programming languages, attackers have frequently adopted vulnerability exploitation techniques from conventional software security. Specifically, they analyze the source code of smart contracts to identify and exploit security flaws, thereby extracting illegal profits. For instance, in 2016, the DAO smart contract suffered an attack due to two critical security vulnerabilities, leading to the theft of approximately USD 60 million worth of Ether and ultimately resulting in an Ethereum hard fork [7]. Similarly, the Cream.Finance contract was exploited through a reentrancy vulnerability, enabling attackers to steal over USD 130 million worth of digital assets [8].
With deepening research into smart contract security, traditional code vulnerability detection techniques have developed into a multi-layered defense system that includes static analysis tools, dynamic symbolic execution, formal verification, and machine-learning-based identification models. The widespread adoption of these detection approaches has significantly reduced the success rate of exploiting traditional code vulnerabilities such as reentrancy and integer overflow, from 34% in 2018 to 6% in 2022 (according to the ConsenSys Security Report). This shift has forced attackers to adopt alternative attack strategies, giving rise to smart contract honeypots.
The essence of a smart contract honeypot lies in its proactive nature as an attack strategy. These types of contracts are designed to deceive victims into believing they contain obvious vulnerabilities, leading them to believe that exploiting these flaws will result in illicit profits. However, in reality, the victims not only fail to benefit but also suffer financial losses. The concept of smart contract honeypots emerged in 2018 [9], and by October of the same year, the economic losses attributed to the honeypot attacks had already reached approximately USD 90,000. By 2023, Ethereum mainnet detection revealed a 217% annual growth rate in smart contract honeypots, with asset losses surpassing USD 4.3 million.
The pollution and damage that smart contract honeypots inflict on the Ethereum blockchain ecosystem are substantial and cannot be overlooked. This new form of attack has not only transformed the security landscape of smart contracts but also underscores the complex interaction between human behavior and technical vulnerabilities in decentralized systems. As a result, to improve the security of data applications in blockchain technology, smart contract honeypot detection has become one of the key research directions in the broader field of smart contract code security.
Specifically, current research on smart contract honeypot detection still exhibits certain limitations, and existing detection approaches encounter several unresolved challenges, including the following.
(1) Research on dynamic detection techniques remains relatively limited [10], with most proposed smart contract honeypot detection approaches confined to static analysis [9,11,12]. Although static detection techniques offer advantages such as rapid analysis and high code coverage, they still suffer from high false positive rates, extreme data imbalance, and potential symbolic path explosion. Moreover, when dealing with unknown types of smart contract honeypots, relying solely on static detection techniques with manually defined detection rules is costly and inefficient.
(2) Traditional standalone fuzzing frameworks struggle to efficiently generate valid test cases for complex conditional statements within smart contracts, resulting in low code and branch coverage [13]. This limitation prevents comprehensive security assessments of smart contracts, as critical vulnerabilities may remain undetected due to inadequate exploration of execution paths.
(3) Traditional fuzzing methods typically generate test cases in a completely random manner and fail to adequately consider initial assignments and execution order when constructing transaction sequences [14]. As a result, they struggle to detect honeypot traps that require strict triggering conditions and specific execution sequences, leading to a high false negative rate.
(4) The completely random mutation approach in genetic algorithms lacks dynamic feedback guidance and optimization [15,16], making it difficult for fuzzing to efficiently navigate complex conditional statements in smart contracts. As a result, it struggles to identify optimal solutions within a short time, leading to significant resource waste and increased time consumption.
(5) The range of smart contract honeypot types that existing detection schemes can identify remains incomplete [9,11], indicating substantial potential for further optimization and enhancement.
In summary, both pure approaches have clear weaknesses. Pure static analysis tools such as Mythril exhibit high false positive rates because they struggle to differentiate between real vulnerabilities and benign code patterns, and path explosion leaves their symbolic execution coverage of nested conditional statements insufficient. Pure dynamic analysis tools such as ContractFuzzer explore the input space inefficiently: randomly generated transaction sequences fail to trigger honeypot techniques in deeper code spaces and have difficulty reproducing specific combinations of block timestamp and chain state. We therefore propose a taint-guided hybrid fuzzing framework in which the two techniques reinforce each other. By combining their strengths, we avoid the disadvantages of relying on a single approach, improving the framework’s detection capabilities while ensuring both high code coverage and detection efficiency.
To address these challenges, we propose SCH-Hunter, a taint-based hybrid fuzzing framework specifically designed for detecting smart contract honeypots in the Ethereum ecosystem. Specifically, SCH-Hunter consists of five key components: static analysis, adaptive construction of transaction sequences, hybrid fuzzing, taint-based seed optimization, and honeypot detection and reporting.
4. Method
4.3. Adaptive Generation of Sequence
Smart contracts are typically composed of multiple functions, and most existing fuzzing approaches construct transaction sequences by selecting functions at random and generating parameter values at random. However, research has shown that the final execution state (i.e., the outcome) of a smart contract is often influenced by the current states of the variables in the code, and even minor changes in variables can lead to significant differences in the execution results of the generated transaction sequences. Most existing fuzzing methods ignore the variable dependencies between functions and therefore cannot fully explore potential risks within the contract’s code space. For honeypot techniques that require the smart contract to be in a specific state before the trap triggers, blindly constructing transaction sequences can leave these techniques undetected and waste resources. It is therefore crucial to take the interdependencies between functions and variables into account when generating transaction sequences, so that honeypot traps are effectively triggered and potential vulnerabilities identified.
For example, for the Balance Disorder honeypot smart contract shown in Listing 3, analysis reveals that the success of the honeypot in capturing a victim depends on three essential conditions: the smart contract honeypot must hold a certain contract balance to lure the victim; the victim must deposit funds into the smart contract; and the victim must then attempt to call the multiplicate function to withdraw the deposited funds and the contract’s balance. If the order of these function calls is altered, the honeypot trap will fail to trigger. For instance, if the victim directly calls the multiplicate function in an attempt to withdraw the contract balance without depositing any funds, the condition msg.value >= this.balance will not be satisfied, causing the function call to fail and the honeypot trap to go undetected. Thus, the transaction call chain required to trigger this honeypot trap is deposit() -> Command() -> multiplicate(). Any change in the order of these calls could result in the honeypot trap not being triggered, making it difficult to detect effectively.
Listing 3. Balance Disorder smart contract honeypot.
To address the issue described above, SCH-Hunter employs an adaptive transaction sequence construction module based on the RAW (Read-after-Write) principle, which consists of two main components: the data flow analyzer and the assignment range determination. Specifically, the process is as follows:
Compilation and CFG Construction: First, the source code of the smart contract is compiled into bytecode, and the corresponding control flow graph (CFG) is constructed.
Data Flow Analyzer: Using the data flow analyzer, the module extracts variable access types related to assignment and comparison operations from the CFG. It also captures the read–write dependencies of global variables between different functions in the smart contract, tracking how the state of these variables changes during the contract’s execution. Additionally, it extracts the conditional ranges of global variables involved in conditional statements.
Determination of Execution Priority and Parameter Ranges: Based on the captured data dependencies, the module calculates the execution priority between functions to determine the order in which the transaction sequence should be composed. Furthermore, it uses the determined conditional ranges of the involved global variables to set the initial assignment ranges of the parameters in the transaction sequence.
Fuzz Testing Integration: These determined parameters and transaction sequence order are then passed to the fuzz testing module, where optimized transaction sequence instances (i.e., test cases) are generated.
We use the Balance Disorder honeypot smart contract source code shown in Listing 3 to illustrate the execution flow of the adaptive sequence generation module.
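As a rough illustration of the RAW-based ordering described in the steps above, the following Python sketch schedules functions so that a function writing a state variable runs before the functions that read it. The helper `order_by_raw`, the function names, and the read/write sets are hypothetical and chosen only for illustration; they are not extracted from Listing 3 or from SCH-Hunter’s implementation.

```python
def order_by_raw(accesses):
    """Greedy read-after-write ordering of contract functions.

    accesses maps a function name to (variables_read, variables_written).
    At each step the function with the fewest reads still waiting on an
    unscheduled writer is chosen; ties keep the original insertion order,
    so cyclic dependencies do not block the ordering.
    """
    remaining = dict(accesses)
    written = set()   # state variables written by already scheduled functions
    order = []
    while remaining:
        def unsatisfied(name):
            reads, _ = remaining[name]
            pending = set()
            for other, (_, writes) in remaining.items():
                if other != name:
                    pending |= writes
            return len((reads & pending) - written)

        best = min(remaining, key=unsatisfied)
        order.append(best)
        written |= remaining.pop(best)[1]
    return order


# Hypothetical read/write sets for a three-function contract.
example_accesses = {
    "withdraw": ({"balances", "unlocked"}, {"balances"}),
    "unlock": ({"balances"}, {"unlocked"}),
    "deposit": (set(), {"balances"}),
}
print(order_by_raw(example_accesses))
```

In this hypothetical contract the sketch schedules `deposit` first, then `unlock`, then `withdraw`, mirroring the idea that state must be written before the functions that depend on it are called.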
4.5. Taint-Based Seed Optimization
Traditional fuzzing methods employ genetic algorithms for seed mutation but typically rely on fully random mutation strategies without optimization. Specifically, mutations occur randomly within the valid range of data types, making the approach simple and convenient. However, this purely random mutation often generates invalid or meaningless test cases, preventing the fuzzing process from quickly discovering optimal test cases. As a result, more fuzzing resources are wasted and detection efficiency declines. To address this issue, SCH-Hunter introduces a taint-based seed optimization module. This module guides and optimizes the seed mutation process in genetic algorithms, thereby reducing the number of ineffective mutations, minimizing resource waste, and enhancing the efficiency of the detection framework. The taint-based seed optimization module consists of four key components, shown in Figure 4: taint marking, taint propagation and monitoring, taint data classification, and seed mutation resource scheduling.
Compared with traditional static taint analysis approaches, the taint-based seed optimization in SCH-Hunter adopts a dynamic taint analysis technique to guide and optimize the seed mutation process. This approach offers advantages such as real-time tracking at runtime and the ability to dynamically update propagation rules, thereby reducing the likelihood of false positives. Moreover, the primary purpose of employing dynamic taint analysis is to guide the scheduling of mutation resources, rather than directly using taint analysis for honeypot classification. By integrating dynamic taint analysis with runtime feedback information from smart contracts, SCH-Hunter is able to enhance its detection accuracy more effectively.
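To make the idea of taint-guided mutation scheduling more concrete, the sketch below distributes a fixed mutation budget over input fields according to how far their tainted values propagated. The taint classes (`sink`, `branch`, `none`), their weights, and the helper names are assumptions made for illustration; the text does not specify SCH-Hunter’s actual classification or weighting.

```python
import random

# Hypothetical weights per taint class: fields whose tainted data reached a
# sensitive sink or a branch condition receive more mutation effort.
CLASS_WEIGHT = {"sink": 4.0, "branch": 2.0, "none": 0.5}


def mutation_weights(taint_report):
    """Map each input field to a normalised mutation probability.

    taint_report maps a field name to the taint class observed at runtime.
    """
    raw = {field: CLASS_WEIGHT[cls] for field, cls in taint_report.items()}
    total = sum(raw.values())
    return {field: weight / total for field, weight in raw.items()}


def schedule_mutations(taint_report, budget, rng):
    """Distribute `budget` mutations over the fields according to their weights."""
    weights = mutation_weights(taint_report)
    fields = list(weights)
    picks = rng.choices(fields, weights=[weights[f] for f in fields], k=budget)
    return {field: picks.count(field) for field in fields}


# Hypothetical taint report for one seed transaction.
report = {"msg.value": "sink", "amount": "branch", "memo": "none"}
print(mutation_weights(report))
print(schedule_mutations(report, budget=100, rng=random.Random(0)))
```

Under these assumed weights, a field whose value reaches a sensitive sink is mutated about eight times as often as a field that never influences control flow, which is one simple way to reduce ineffective mutations.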
5. Experiments
In this section, we conduct a series of experiments to evaluate the effectiveness and performance of SCH-Hunter by answering the following research questions:
(1) How effective is SCH-Hunter in detecting smart contract honeypot techniques? How does its detection performance compare to that of existing tools?
(2) How does SCH-Hunter perform in improving the code coverage of fuzz testing?
(3) Are the static analysis engine module, hybrid fuzzing module, and taint-based seed optimization module used in SCH-Hunter effective?
5.2. Effectiveness
To evaluate the detection capability of SCH-Hunter for smart contract honeypot techniques, we select the smart contract honeypot dataset (Dataset I) as the experimental dataset. SCH-Hunter is compared with HoneyBadger, a widely recognized and efficient smart contract honeypot detection tool based on symbolic execution, to assess its effectiveness in detecting smart contract honeypots. Table 5 presents the types of smart contract honeypot techniques that both tools can detect. Table 6 displays the detection results of these two approaches for ten common smart contract honeypot types. Additionally, Figure 5 illustrates the number of detected smart contract honeypots for each honeypot technique category.
To evaluate the detection capability of these two detection approaches, this section employs three evaluation metrics: Precision, Recall, and F1-Score. Precision measures the proportion of detected honeypots that are actual smart contract honeypots; Recall assesses the proportion of actual smart contract honeypots that the detection approach successfully identifies; F1-Score is the harmonic mean of Precision and Recall, balancing the trade-off between the two metrics. The calculation methods for these three evaluation metrics are as follows:
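In terms of true positives (TP), false positives (FP), and false negatives (FN), these three metrics take their standard form. The TP/FP/FN notation is assumed here for illustration and follows the verbal definitions above:

```latex
\begin{align}
\mathrm{Precision} &= \frac{TP}{TP + FP} \\
\mathrm{Recall} &= \frac{TP}{TP + FN} \\
\mathrm{F1\text{-}Score} &= \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\end{align}
```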
As shown in the detection results presented in Table 6, when evaluating the eight types of smart contract honeypots that both tools can detect, the average precisions of HoneyBadger and SCH-Hunter are 95.35% and 95.04%, respectively, while their average recall rates are 93.07% and 92.21% and their average F1-scores are 0.9390 and 0.9339, respectively. Although HoneyBadger demonstrates a slight advantage across all three metrics, SCH-Hunter achieves nearly equivalent detection performance for these eight types of smart contract honeypots, indicating that its detection capability is highly reliable. Additionally, as illustrated in Figure 5, the detection performance of both tools is closely matched. The figure also reveals that the Hidden State Update type of smart contract honeypot has the highest occurrence, suggesting that it remains one of the most widely used honeypot techniques in smart contracts.
The slightly lower performance of SCH-Hunter compared to HoneyBadger can be attributed to two main factors:
Smart contract honeypots are typically displayed in source code form on blockchain explorers (such as Etherscan), which entices victims into traps. Some honeypot features are more prominent in the source code. However, once the smart contract honeypot’s source code is compiled into bytecode, certain semantics and features may be lost. This makes it difficult for dynamic detection techniques, such as fuzzing, to accurately capture the actual behavior of the contract, leading to potential false negatives.
The smart contract honeypot detection module based on EVM runtime code instrumentation in SCH-Hunter still has room for optimization. There are certain specific features of some honeypot types that have not been fully considered, which results in both false positives and false negatives.
However, compared to HoneyBadger, SCH-Hunter demonstrated a 16.67% higher precision in detecting Type Deduction Overflow smart contract honeypots. This improvement is attributed to its ability to actually execute smart contracts and combine taint analysis to capture runtime information in real time, effectively identifying type deduction overflow issues that arise during contract execution. Additionally, SCH-Hunter can detect two extra smart contract honeypot techniques: Map Key Encoding Trick and Unexecuted Call. Across all ten types of smart contract honeypots, SCH-Hunter achieved an average recall rate of 91.77%. HoneyBadger cannot detect these two techniques primarily because it was designed to target the eight honeypot patterns commonly known at the time of its development; that is, it implements detection rules specifically tailored to the eight predefined categories listed in Table 4. The Unexecuted Call and Map Key Encoding Trick techniques are newer forms of smart contract honeypots that have emerged in recent years. Since HoneyBadger does not incorporate detection rules for them, it cannot identify them.
Admittedly, as observed in Table 5, SCH-Hunter exhibits instances of both false negatives and false positives. For example, for the Type Deduction Overflow honeypot technique, SCH-Hunter failed to detect one of the cases; the missed honeypot smart contract is shown in Listing 4. The reason for this false negative is that, starting from Solidity version 0.8.0, overflow checks are enabled by default, but the honeypot circumvents these checks by using an unchecked block. SCH-Hunter failed to identify the unchecked block and the potential overflow operation during detection. This indicates that SCH-Hunter still has certain compatibility limitations with newer versions of the Solidity compiler, which can lead to missed detections.
Additionally, as shown in Table 5, SCH-Hunter exhibits lower detection precision for the Hidden State Update honeypot technique than for the other nine categories, resulting in false positives. To explain the cause of this misclassification, we take the smart contract shown in Listing 5 as a representative example. The false positive stems from the fact that legitimate smart contracts may rely on block.timestamp or block.number to implement standard functionalities such as time locks. However, SCH-Hunter, due to its stringent parameter sensitivity, flags all dependencies on block parameters as suspicious and fails to distinguish between malicious and legitimate usage. Moreover, it overlooks the directionality of temporal constraints. These two oversights, namely insufficient handling of timestamp and block number semantics and an overly conservative detection rule, contribute to the false positives in such cases. Nevertheless, from an overall perspective, both the false positive rate and the false negative rate of SCH-Hunter remain within acceptable limits, indicating that the framework still demonstrates strong and reliable detection capabilities.
In summary, SCH-Hunter not only demonstrates strong detection capabilities but also covers a wider range of smart contract honeypot types. This answers the first question raised in this section.
Listing 4. The reason for the false negative for Type Deduction Overflow.
Listing 5. The reason for the false positive for Hidden State Update.
5.4. Component Evaluation
To evaluate the effectiveness of the static analysis engine module, the hybrid fuzzing module, and the taint-based seed optimization module in improving the performance of SCH-Hunter, we conducted ablation experiments that test the impact of each of these three modules separately.
The static analysis engine module is responsible for handling and identifying three specific types of smart contract honeypot techniques: Unexecuted Call, Map Key Encoding Trick, and Hidden Transfers. These three techniques manipulate the source code of smart contracts in particular ways to make them appear to have corresponding fund leakage vulnerabilities, thereby luring victims into traps. They are handled by the static analysis engine because their distinguishing features exist only at the source level. For example, the Unexecuted Call technique appears to initiate a fund transfer using an incorrect calling form (e.g., call.value(x)), but the call does not execute as expected and, in fact, never executes. After the smart contract is compiled, the resulting bytecode contains no executable call corresponding to this line of code, so nothing is run by the Ethereum Virtual Machine (EVM). Consequently, this honeypot trap reveals no features during fuzzing and is undetectable without static analysis.
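A source-level check of this kind can be sketched in a few lines of Python. The pattern below flags statements of the form `.call.value(<expr>);` that lack the trailing `()` needed to actually perform the call. The helper `find_unexecuted_calls` and the regular expression are illustrative assumptions (handling only one level of nested parentheses), not the detection rule used by SCH-Hunter, and the Solidity snippet is a made-up example.

```python
import re

# `.call.value(<expr>)` followed directly by `;` rather than by a second
# pair of parentheses. Handles one level of nested parentheses in <expr>.
UNEXECUTED_CALL = re.compile(
    r"\.call\.value\((?:[^()]|\([^()]*\))*\)\s*;"
)


def find_unexecuted_calls(source):
    """Return the 1-based line numbers whose code matches the pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # ignore trailing line comments
        if UNEXECUTED_CALL.search(code):
            hits.append(lineno)
    return hits


# Made-up Solidity 0.4-style snippet: line 3 omits the final `()`.
example = """\
contract Demo {
    function withdraw() public {
        msg.sender.call.value(address(this).balance);
        msg.sender.call.value(1 ether)();
    }
}
"""
print(find_unexecuted_calls(example))
```

Only the statement on line 3 is reported, since the statement on line 4 ends with `()` and therefore performs the call.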
To demonstrate the impact of the static analysis engine module on SCH-Hunter’s ability to detect these three types of smart contract honeypots, we disable the static analysis engine module, refer to the resulting variant as SCH-Hunter-NS, and compare its detection capability against the full version of SCH-Hunter on these three specific honeypot techniques.
Table 7 presents the comparison results between SCH-Hunter-NS and SCH-Hunter. When the static analysis engine module is disabled, relying solely on the fuzzing engine module to detect these three types of smart contract honeypot techniques yields no detection results, i.e., a complete failure to detect them. This is because the features of these three techniques exist only in the source code and are absent from the compiled bytecode that the Ethereum Virtual Machine (EVM) actually executes. Therefore, although the fuzzing engine performs well, it remains ineffective when dealing with source-level features.
From the above, it can be concluded that the static analysis engine module extends both the detection capability and the scope of SCH-Hunter, enabling it to detect a wider range of smart contract honeypots more accurately. This demonstrates that the static analysis engine module proposed in this framework is effective. It is also important to note that, for the Unexecuted-Transfer-type smart contract honeypots, SCH-Hunter exhibits two false negatives. This is because these two smart contracts use Solidity compiler version 0.5.x, which the current static analysis engine does not support, as it only supports Solidity compiler version 0.4.x.
To verify the effectiveness of the hybrid fuzzing module and the taint-based seed optimization module in enhancing SCH-Hunter’s performance, we use code coverage as the evaluation metric, since both modules are intended to improve it. We compare SCH-Hunter with SCH-Hunter-NTS, a fuzzing framework that uses a genetic algorithm based solely on traditional random mutation, without the hybrid fuzzing or the taint-guided mutation strategy. The experiment first uses the smart contract honeypot dataset as the test dataset. The comparison is based on the improvement in average code coverage after the same number of fuzzing iterations. Figure 7 presents the comparison results of code coverage between SCH-Hunter and SCH-Hunter-NTS.
In Figure 7, the x-axis represents the number of iterations during the fuzzing process, while the y-axis shows the average code coverage. When fuzzing the smart contract honeypot dataset, there is no large difference between the two frameworks in code coverage or in its rate of improvement for the same number of iterations. This can be attributed to the fact that smart contract honeypots generally have small codebases with low complexity, allowing a fuzzing method based solely on a traditional, fully random mutation genetic algorithm to achieve good code coverage quickly. Nonetheless, the code coverage of SCH-Hunter stabilizes after the 11th iteration, while that of SCH-Hunter-NTS stabilizes after the 15th. We attribute this earlier convergence to the symbolic execution module and the taint-based seed optimization module in the hybrid fuzzing framework.
In addition, we conducted comparative experiments on the long smart contract dataset, as shown in Figure 8. When dealing with smart contracts with higher code complexity and larger code sizes, SCH-Hunter’s hybrid fuzzing framework and taint-based seed optimization module show significant advantages in improving code coverage. When fuzzing alone cannot generate test cases that satisfy a condition within a short time, SCH-Hunter activates the symbolic execution module, which extracts the constraints of the complex condition and solves them. Because symbolic execution is invoked only for such conditions, this avoids the symbolic path explosion issue that may arise from relying solely on symbolic execution. Moreover, by continuously monitoring the propagation of tainted data and taint flow information, SCH-Hunter captures critical taint information and dynamically adjusts the mutation resource allocation weights, so that the seed mutation process focuses on high-value code areas, such as vulnerable code paths and conditional branch paths, and continues to explore deeper. The synergy of these two methods ensures that even when dealing with smart contracts with high code complexity or large codebases, SCH-Hunter can still maintain excellent code coverage and high detection efficiency.
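The interplay described above, random input generation that hands over to constraint solving once coverage stalls, can be sketched as follows. This is a toy model under stated assumptions: `target` stands in for an instrumented contract function, `MAGIC` is a made-up guard constant, and `solve_guard` is a stand-in for a real constraint solver that only handles equality guards by returning the comparison operands recorded at runtime.

```python
import random

MAGIC = 0x5EED1234  # hypothetical hard-to-guess guard constant


def target(x, observed):
    """Toy contract function that returns the ids of the branches taken.

    `observed` collects comparison operands, mimicking instrumentation.
    """
    covered = {"entry"}
    observed.add(MAGIC)  # operand of the guard below, seen at runtime
    if x == MAGIC:
        covered.add("deep")
    else:
        covered.add("shallow")
    return covered


def solve_guard(observed):
    """Stand-in for a constraint solver: inputs satisfying `x == c`."""
    return list(observed)


def hybrid_fuzz(iterations, stall_limit, rng, use_solver=True):
    """Random fuzzing that falls back to solving after `stall_limit` misses."""
    covered, observed = set(), set()
    queue, stall, solver_calls = [], 0, 0
    for _ in range(iterations):
        if use_solver and stall >= stall_limit and observed:
            queue.extend(solve_guard(observed))  # inject solved inputs
            solver_calls += 1
            stall = 0
        x = queue.pop() if queue else rng.getrandbits(32)
        new = target(x, observed) - covered
        covered |= new
        stall = 0 if new else stall + 1
    return covered, solver_calls


covered, calls = hybrid_fuzz(50, stall_limit=5, rng=random.Random(1))
print(sorted(covered), calls)
```

With the fallback enabled, the guarded branch is reached shortly after the first stall, whereas purely random 32-bit inputs are very unlikely to satisfy the equality within the same budget.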
To evaluate the effectiveness of the taint-based seed optimization in improving the detection efficiency of SCH-Hunter, we used average detection time as the primary metric. This is because the optimization module is designed to provide a positive feedback loop by reducing the time required for smart contract analysis.
We conducted a benchmark comparison between SCH-Hunter and its variant, SCH-Hunter-NTA, which employs a genetic algorithm with purely random mutations within the valid input domain without taint guidance. The evaluation was performed on Dataset II, which was composed of long smart contracts; we used this dataset because of its larger codebase and greater complexity, which make it more suitable for demonstrating the efficiency gains brought by this optimization module.
Figure 9 illustrates the comparison of average detection times between SCH-Hunter and SCH-Hunter-NTA. In the figure, the x-axis represents the number of iterations during the fuzzing process, while the y-axis indicates the corresponding average detection time.
As shown in Figure 9, SCH-Hunter completes 20 iterations in approximately 70 s, whereas SCH-Hunter-NTA requires around 135 s for the same number of iterations, i.e., SCH-Hunter needs roughly half the time. This result indicates that by continuously monitoring taint data propagation and taint flow information, SCH-Hunter can capture critical taint points and dynamically adjust the mutation resource allocation weights, allowing the mutation process to converge toward high-value code areas, such as vulnerable execution paths and conditional branches.
This strategy significantly reduces the number of ineffective mutations and resource consumption, resulting in faster detection times. Overall, the results demonstrate that SCH-Hunter achieves superior detection efficiency through its taint-guided optimization mechanism.
In conclusion, each of the modules proposed in this framework contributes to its overall performance, and together they effectively improve both code coverage and detection efficiency, which answers the third question of this section.