Heuristics Analyses of Smart Contracts Bytecodes and Their Classifications

Udokwu, Chibuzor; Mirhosseini, Seyed Amid Moeinzadeh; Craß, Stefan

doi:10.3390/electronics15010041

Chibuzor Udokwu *, Seyed Amid Moeinzadeh Mirhosseini and Stefan Craß

Austrian Blockchain Center, 1020 Vienna, Austria

* Author to whom correspondence should be addressed.

Electronics 2026, 15(1), 41; https://doi.org/10.3390/electronics15010041

Submission received: 3 November 2025 / Revised: 16 December 2025 / Accepted: 18 December 2025 / Published: 22 December 2025

(This article belongs to the Special Issue Innovative Architectures and Advanced Solutions for Network Security in the Era of Emerging Technologies)

Abstract

Smart contracts are deployed and represented as bytecodes in blockchain networks, and these bytecodes are machine-readable codes. Only a small number of deployed smart contracts have their verified human-readable code publicly accessible to blockchain users. To improve the understandability of deployed smart contracts, we explored rule-based classification of smart contracts using iterative integration of fingerprints of relevant function interfaces and keywords. Our classification system included categories for standard contracts such as ERC20, ERC721, and ERC1155, and non-standard contracts like FinDApps, cross-chain, governance, and proxy. To do this, we first identified the core function fingerprints for all ERC token contracts. We then used an adapted header extractor tool to verify that these fingerprints occurred in all of the implemented functions within the bytecode. For the non-standard contracts, we took an iterative approach, identifying contract interfaces and relevant fingerprints for each specific category. To classify these contracts, we created a rule that required at least two occurrences of a relevant fingerprint keyword or interface. This rule was stricter for standard contracts: the 100% occurrence requirement ensures that we only identify compliant token contracts. For non-standard contracts, we required a minimum of two relevant fingerprint occurrences to prevent hash collisions and the unintentional use of keywords. After developing the classifier, we evaluated its performance on sample datasets. The classifier performed very well, achieving an F1 score of over 99% for standard contracts and a solid 93% for non-standard contracts. We also conducted a risk analysis to identify potential vulnerabilities that could reduce the classifier’s performance, including hash collisions, an incomplete rule set, manual verification bottlenecks, outdated data, and semantic misdirection or obfuscation of smart contract functions. 
To address these risks, we proposed several solutions: continuous monitoring, continuous data crawling, and extended rule refinement. The classifier’s modular design allows for these manual updates to be easily integrated. While semantic-based risks cannot be completely eliminated, symbolic execution can be used to verify the expected behavior of ERC token contract functions with a given set of inputs to identify malicious contracts. Lastly, we applied the classifier to contracts deployed on the Ethereum main network.

Keywords: smart contracts; bytecodes; static analyses; classifier; tagging

1. Introduction

A blockchain network provides a transparent ledger for executing and verifying transactions. Smart contracts specify functions that are executed on the blockchain, and Solidity is a common programming language for writing them. Smart contracts deployed on a blockchain are represented by their bytecodes, which are not human-readable. Although tools exist that attempt to convert smart contract bytecodes to human-readable logical representations, such tools cannot regenerate the exact original Solidity code of the smart contracts [1,2]. There are also several open-source tools and blockchain explorers that provide crowd-sourced representations of smart contract code in Solidity. Users who have access to the source code of specific contracts submit it to these tools, and the tool verifies the correctness of the source by checking whether the compiler version specified in the contract generates the same bytecode as the contract that is already on-chain. However, only a small share of contracts have verified source code available through these tools, in some cases less than 1% of the total smart contracts deployed on the public network [3].

Blockchain has faced adoption challenges due to the complexity of the technology and of the systems that support it [4]. In particular, smart contracts deployed on a blockchain network are difficult to read and understand [5]. Hence, there is a need for an automated system for understanding and classifying functions in smart contracts deployed on the blockchain. There is also no clear, standardized classification scheme for describing various smart contracts. For this work, we broadly classify smart contracts into standard token contracts and non-standard contracts. Development of a standard contract follows a predefined interface that specifies functions and their logical executions [5,6]. Some tools automatically classify smart contracts into various categories, such as Etherscan labels (LabelCloud: https://etherscan.io/labelcloud (accessed on: 15 October 2025)), Wallet labels (WalletLabels: www.walletlabels.xyz (accessed on: 15 October 2025)), and Dedaub App (DedaubAPP: app.dedaub.com (accessed on: 15 October 2025)). However, these tools are mostly proprietary and do not reveal the logic behind their classifications. Hence, the performance of these tools and the resulting classification categories cannot be formally verified. Some research works have also attempted to classify smart contracts, applying machine learning models and algorithms for supervised and unsupervised classification [5,7,8]. There is limited work on applying a heuristic rule-based approach to classify smart contracts into standard and non-standard categories. The research in [6] applied weighted graphs to classify both standard tokens and non-standard wallet types. A heuristic approach has also been applied to classifying smart contracts, but it was limited to standard token contracts [9].

The objectives of this research are as follows: (1) Design an iterative rule-based system for classifying smart contracts into standard ERC token contracts and non-standard contracts by identifying core functions of ERC contracts and their fingerprints, and identifying relevant function interfaces and keyword fingerprints for classifying non-standard contracts. In this paper, we consider standard contracts as ERC token contracts with clearly defined function interfaces, while non-standard contracts are other categories of smart contracts that are not ERC token contracts. (2) Implement a PoC prototype of the classifier and formally evaluate the performance for specified categories. (3) Perform risk analyses to assess potential vulnerabilities of heuristic smart contract classifiers and design a mitigation strategy. (4) Apply the smart contract classifier to contract bytecodes on the Ethereum mainnet to obtain a time-series overview of the different types of contracts deployed across blocks of transactions in the network.

The rest of the paper is structured as follows. Section 2 provides technical background and related literature analyses on smart contract classifiers. Section 3 describes the methodology used in designing the classifier and evaluating its performance. Section 4 describes the architecture design and the benefits of the modular architecture adopted, and also shows the proof of concept implementation of the classifier and its performance evaluation. Section 5 shows the risks associated with heuristic smart contract classifiers and outlines mitigation strategies for the important risks. Section 6 shows the application of the classifier and time-series analyses of patterns in smart contracts deployed on a public blockchain network. Section 7 provides discussions related to the work done and results obtained in this paper. Section 8 provides conclusions of this work, limitations, and future work.

2. Background and Literature Review

2.1. Blockchain Concepts

Smart contracts and types: These are computer programs that run on the blockchain. Smart contracts specify functions that, when executed, result in transactions that are stored on the blockchain. They can also specify events that track the outcome of the execution of a smart contract function. Smart contracts can be broadly classified into standard contracts and non-standard contracts [10,11]. Standard token contracts specify functions for creating and interacting with blockchain tokens. The main standards for specifying tokens in the Ethereum Virtual Machine (EVM) blockchain environment include ERC20, ERC721, and ERC1155. ERC20 standard is used for specifying fungible tokens, while ERC721 and ERC1155 are used for non-fungible tokens (NFTs). There is also a new standard, the ERC404, that combines fungible and non-fungible tokens into one contract [12]. Non-standard contracts are other categories of smart contract applications, such as decentralized finance (DeFi), Governance, Cross-chain applications, and Proxy contracts. DeFi includes decentralized exchange (DEX) platforms for trading tokens, staking applications, and token lending and borrowing applications that enable the execution of financial-related functions without relying on traditional intermediaries [13]. Governance applications include decentralized autonomous organizations (DAOs) containing proposal and voting features for group-based governance, and access control applications for managing and administering smart contracts [14]. Cross-chains are applications that allow smart contracts to interact with objects outside of a particular blockchain, where such an entity could be another blockchain or a non-blockchain [15]. A bridge is a type of cross-chain that enables smart contract interaction with another blockchain, while an oracle enables interactions with non-blockchain entities. We also consider applications that interact with bridge and oracle applications as cross-chains. 
Cross-chain and interoperability applications are used interchangeably in this research. Proxies provide data references of the implementation of smart contracts, thereby providing a system for upgrading smart contracts [16]. There are also non-standard ERC tokens that do not fully implement the core functions of ERC token contracts; however, we do not consider this category in this research.

Smart contract parts: Functions contained in a smart contract can be directly implemented on the contract or in a library, which is used in developing the smart contract [17]. Interfaces can also be provided in a contract, which allows the contract to interact with functions specified in another contract. Hence, identifying and understanding various functions specified in a contract or called by a contract provides a meaningful way for categorizing different types of smart contracts.

Smart contract deployment: Smart contracts are compiled into machine-readable opcodes referred to as bytecode, containing instructions for the EVM to execute various functions in the contracts. Hence, functions in a bytecode are identified by their function fingerprints/signatures, which are a four-byte hash representation of the function header [5]. The function header specifies the function name and the function parameters. Therefore, hashing function headers provides a unique identification of a smart contract function in the bytecode. However, since the size of the fingerprint is only four bytes, there are possibilities for hash collisions where multiple smart contracts’ functions can share the same fingerprint. Nonetheless, identifying function fingerprints in a bytecode provides an approach for automated classification of contracts since bytecodes are not human-readable.
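To give a rough sense of how likely four-byte collisions are, the sketch below applies the standard birthday approximation to a 32-bit fingerprint space. It assumes fingerprints are uniformly distributed over the 2^32 possible values, which is an idealization; the function name and sample sizes are illustrative and not taken from the paper.

```python
import math

FINGERPRINT_SPACE = 2 ** 32  # number of distinct four-byte values


def collision_probability(n: int) -> float:
    """Birthday approximation of P(at least two of n headers share a fingerprint)."""
    return 1.0 - math.exp(-n * (n - 1) / (2 * FINGERPRINT_SPACE))


# Illustrative sample sizes only.
for n in (1_000, 77_163, 1_000_000):
    print(f"{n:>9} distinct headers -> {collision_probability(n):.4f}")
```

Under this approximation, the chance that some pair of headers collides reaches about one half at roughly 77,000 distinct headers. Collisions somewhere in a large public signature database are therefore expected, even though a collision with one specific fingerprint remains unlikely.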

2.2. Literature Review

Smart contract classifiers generally apply heuristics or machine learning-based approaches to categorize smart contracts into predefined categories. Such categories usually include standard contracts and non-standard thematic categories. Such non-standard categories also include vulnerability tagging on smart contracts.

The research [5] combined symbolic executions represented in bytecodes with natural language processing to automatically generate human-readable descriptions for functions in a smart contract. The research [7] transformed bytecodes into “opcode words” to extract features for classifying smart contracts into categories such as Voting, Auction, Entity management, Renting, and Trading, using ML models like Naive Bayes, SVM, and Logistic Regression, and compared them with tree-based ensembles (Random Forest, XGBoost). The research [9] applied a heuristic rule-based approach to classify standard contracts into ERC20 and ERC721 tokens. The research [18] applied an ML approach that first extracts features from bytecodes using opcode frequency and control flow features, then uses binary particle swarm optimization for feature selection and multi-stage ensemble classification with AdaBoost to classify smart contracts into non-standard categories like Governance, Finance, Gambling, Game, Wallet, and Social. The research [8] applied the LDA algorithm for unsupervised topical classification of smart contracts into Notary, Token, Game, Financial, and Blockchain interaction. The research [19] applied both LDA and LSTM to classify smart contracts into Entertainment, Tools, Management, Finance, Lottery, and IoT using an open-source dataset of Solidity codes. The research [6] applied control-weighted graphs on smart contract bytecodes to classify contracts comprising standard ERC tokens and non-standard multi-signature wallet types. For vulnerability tagging of smart contracts, the research [20] transformed bytecodes into bigrams and applied ML classifiers such as Random Forest and KNN to detect different types of vulnerabilities, including reentrancy, integer overflow/underflow, time dependency, transaction ordering, unchecked calls, callstack depth, signedness, concurrency, external calls, transaction origin use, and unchecked suicide in smart contracts.

Table 1 summarizes the result of the literature review. The analyses show that the ML approaches are the prevalent approach for classifying non-standard contracts. Such ML approaches utilize both supervised and unsupervised classification models. The F1 score is commonly used for supervised classifiers, while various methods that measure the cohesion of clusters are used for unsupervised classifiers. Both heuristic and ML approaches have been used for classifying standard token contracts. For non-standard contracts, the common repeating classification categories include governance, finance, wallet, and gaming.

3. Methodology

3.1. Method for Standard Token Contracts Identification

To identify token contracts such as ERC20, ERC721, and ERC1155, we first identify the core function fingerprints that must be contained in these types of contracts. Then we extract the implemented functions in the contract and check whether all the core functions of the standard occur among the extracted functions. The following core functions are used to identify each standard:

ERC20: totalSupply(), balanceOf(address), transfer(address, uint256), allowance(address, address), approve(address, uint256), transferFrom(address, address, uint256)
ERC721: balanceOf(address), ownerOf(uint256), getApproved(uint256), setApprovalForAll(address, bool), isApprovedForAll(address, address), transferFrom(address, address, uint256), safeTransferFrom(address, address, uint256), safeTransferFrom(address, address, uint256, bytes)
ERC1155: balanceOf(address, uint256), balanceOfBatch(address[], uint256[]), setApprovalForAll(address, bool), isApprovedForAll(address, address), safeTransferFrom(address, address, uint256, uint256, bytes), safeBatchTransferFrom(address, address, uint256[], uint256[], bytes)

The event signatures of the contracts can also be included. We adapted the open-source tool Contract Header (Header extractor: https://github.com/gsalzer/ethutils (accessed on: 15 October 2025)) for extracting the implemented functions in a smart contract bytecode. This approach is expected to provide up to 100% identification precision for standard token contracts. Although false positives due to hash collisions are theoretically possible, matching only on implemented functions further reduces the chances to a negligible level in practice for standard contracts.
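The all-core-functions rule can be sketched as a subset test. The snippet below is a minimal illustration, not the authors' implementation: it assumes the header extractor has already produced the set of implemented four-byte selectors as lowercase hex strings, and the names `CORE_SELECTORS` and `classify_standard` are hypothetical. The selector values are the commonly published ones for the function headers listed above.

```python
from typing import Dict, List, Set

# Four-byte selectors (hex, no 0x prefix) of the core functions listed above.
CORE_SELECTORS: Dict[str, Set[str]] = {
    "ERC20": {
        "18160ddd",  # totalSupply()
        "70a08231",  # balanceOf(address)
        "a9059cbb",  # transfer(address,uint256)
        "dd62ed3e",  # allowance(address,address)
        "095ea7b3",  # approve(address,uint256)
        "23b872dd",  # transferFrom(address,address,uint256)
    },
    "ERC721": {
        "70a08231",  # balanceOf(address)
        "6352211e",  # ownerOf(uint256)
        "081812fc",  # getApproved(uint256)
        "a22cb465",  # setApprovalForAll(address,bool)
        "e985e9c5",  # isApprovedForAll(address,address)
        "23b872dd",  # transferFrom(address,address,uint256)
        "42842e0e",  # safeTransferFrom(address,address,uint256)
        "b88d4fde",  # safeTransferFrom(address,address,uint256,bytes)
    },
    "ERC1155": {
        "00fdd58e",  # balanceOf(address,uint256)
        "4e1273f4",  # balanceOfBatch(address[],uint256[])
        "a22cb465",  # setApprovalForAll(address,bool)
        "e985e9c5",  # isApprovedForAll(address,address)
        "f242432a",  # safeTransferFrom(address,address,uint256,uint256,bytes)
        "2eb2c2d6",  # safeBatchTransferFrom(address,address,uint256[],uint256[],bytes)
    },
}


def classify_standard(implemented: Set[str]) -> List[str]:
    """Return every standard whose core selectors are all implemented."""
    return [name for name, core in CORE_SELECTORS.items() if core <= implemented]


# Hypothetical extractor output: the ERC20 core plus name() (06fdde03).
example = CORE_SELECTORS["ERC20"] | {"06fdde03"}
print(classify_standard(example))
```

Because the rule is a strict subset test, removing a single core selector from the implemented set drops the contract out of that category.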

3.2. Method for Non-Standard Contracts Identification

For the identification of non-standard contracts like DeFi, Governance, Cross-chain, and Proxy, we do not rely only on implemented function fingerprints but also include function fingerprints that occur in common interfaces, libraries, and keywords that are associated with the specific category. To identify the contract headers for specific keywords, we used the open-source dictionary (Four-byte: https://www.4byte.directory/ (accessed on: 15 October 2025)) first to identify all the functions that occur for a particular keyword and then derive the fingerprint. Hence, the fingerprints from the relevant keywords and commonly occurring interfaces provide the search criteria for identifying the non-standard contracts. When at least 2 of these relevant fingerprints associated with a particular category occur in the bytecode, the selected category is returned. This approach limits misclassifications due to hash collisions and unintended use of a function keyword. Limiting the rule to one occurrence of a relevant fingerprint would lower the precision of the classifier due to increased false positives. However, like most automated classifiers, this approach is not expected to provide 100% identification precision. The following common interfaces/libraries: ‘UniswapV2pair’, ‘UniswapV2Factory’, ‘CurvePool’, ‘BalancerPool’, ‘UniswapV3factory’, ‘UniswapV3Pool’, and the keywords: ‘DEPOSIT’, ‘STAKED’, ‘BORROW’, ‘LOAN’, ‘COLLATERAL’, ‘YIELD’, and ‘LIQUIDATE’ were used in identifying FinDApps. If at least 2 function fingerprints associated with any of the listed interfaces or keywords occur, the contract is classified as FinDApp. The function keywords ‘VOTE’, ‘PROPOSAL’, ‘DAO’, ‘GOVERNANCE’, and ‘DELEGATE’ were used in identifying governance contracts. The keywords ‘BRIDGE’ and ‘ORACLE’ were used in identifying cross-chain applications. For proxy contracts, however, we used the following commonly occurring function signatures: upgradeTo(address), upgradeToAndCall(address, bytes), implementation(), and admin(). In addition, a minimal proxy, which has no implemented function, is identified by the fixed bytecode signature ‘363d3d373d3d3d363d73’ that precedes the address the proxy points to.
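The two-occurrence rule and the minimal proxy check can be sketched as follows. This is a hedged illustration under several assumptions: the fingerprint lists are small samples (the proxy selectors correspond to the four signatures named above, and the FinDApp entries are a few Uniswap V2 pair functions chosen for illustration), fingerprints are searched as operands of the PUSH4 opcode (`0x63`) at byte boundaries, and the minimal proxy signature is assumed to sit at the start of the runtime bytecode. All function and variable names are hypothetical.

```python
import re
from typing import Dict, List, Set

MINIMAL_PROXY_PREFIX = "363d3d373d3d3d363d73"  # fixed signature quoted in the text
MIN_HITS = 2  # at least two relevant fingerprints per category

# Illustrative samples only; a real rule set would be far larger.
CATEGORY_FINGERPRINTS: Dict[str, Set[str]] = {
    "Proxy": {
        "3659cfe6",  # upgradeTo(address)
        "4f1ef286",  # upgradeToAndCall(address,bytes)
        "5c60da1b",  # implementation()
        "f851a440",  # admin()
    },
    "FinDApp": {
        "0902f1ac",  # getReserves()  (Uniswap V2 pair interface)
        "022c0d9f",  # swap(uint256,uint256,address,bytes)
        "0dfe1681",  # token0()
        "d21220a7",  # token1()
    },
}


def normalize(bytecode_hex: str) -> str:
    code = bytecode_hex.lower()
    return code[2:] if code.startswith("0x") else code


def matched_fingerprints(code: str, fingerprints: Set[str]) -> Set[str]:
    """Fingerprints that follow a 0x63 byte (PUSH4) at an even hex offset."""
    found = set()
    for fp in fingerprints:
        # Lookahead keeps overlapping candidates; the parity check enforces byte alignment.
        if any(m.start() % 2 == 0 for m in re.finditer("(?=63" + fp + ")", code)):
            found.add(fp)
    return found


def classify_non_standard(bytecode_hex: str) -> List[str]:
    code = normalize(bytecode_hex)
    if code.startswith(MINIMAL_PROXY_PREFIX):  # assumption: runtime code starts with it
        return ["Proxy"]
    return [category for category, fps in CATEGORY_FINGERPRINTS.items()
            if len(matched_fingerprints(code, fps)) >= MIN_HITS]


# Synthetic bytecode fragment with two proxy selectors (not a real contract).
print(classify_non_standard("0x6080" + "633659cfe6" + "14" + "635c60da1b" + "14"))
```

A byte-aligned match can still fall inside constant data rather than a real PUSH4 instruction, which is one reason a single occurrence is not treated as sufficient evidence.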

We adopted an iterative keyword selection and result optimization process. Although several other function keywords and function signatures could be associated with these non-standard contracts, we selected the ones that returned the most relevant results from a small sample of tests. Related keywords are iteratively added to the feature list until no new keyword results in improved classification performance.

3.3. Method for Evaluation

Metrics derived from the confusion matrix, such as precision, recall, and F1-score, provide a formal approach to evaluating the performance of classification systems. Precision measures the accuracy of positive classifications, while recall measures the classifier’s ability to correctly identify all positive occurrences. The F1-score combines the precision and recall of the classifier into a single score. To evaluate the performance of the classifier designed in this paper, we extracted 6500 records of recently active smart contracts, applied the smart contract classifier to identify ERC20, ERC721, ERC1155, DeFi, Governance, Cross-chain, and Proxy contracts, and stored the returned classification category. Since there is no open record of the true classification of smart contracts deployed in a public network like Ethereum, we relied on manual checks to collect the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) outcomes of the classifier in each category. For each category, we randomly selected 50 smart contract addresses (25 with a positive classification and 25 with a negative classification). We adopted the following exclusion and inclusion strategies when selecting the final contracts for evaluation: first, we removed minimal proxy contracts, which are contracts with no function implementation in them; second, we included only contracts whose Solidity source code is publicly accessible on Etherscan. Since Solidity is a high-level human-readable language, we checked the logic within the functions to see whether it matches the classification produced by the smart contract classifier, and from this we derived the TP, TN, FP, and FN values of the classifier.
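For reference, the three metrics follow directly from the TP, FP, and FN counts. The snippet below is a minimal sketch; the counts in the usage line are hypothetical and are not taken from the paper's evaluation.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for one category, for illustration only.
p, r, f1 = precision_recall_f1(tp=22, fp=3, fn=1)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

True negatives do not enter any of the three formulas, which is why a category can score well on F1 even when negatives are rare in the sample.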

4. Design and Implementation of Classifier

4.1. Architecture of Smart Contract Classifier

We used a sequence diagram and a class diagram to outline the architecture of the smart contract classifier developed in this paper. The classifier sequence diagram, as shown in Figure 1, shows the main components of the classifier and the dynamic interactions that occur between them. The main components include the crawler, header extractor, parser, classifier, and UI API. The crawler first extracts and stores bytecode data from a blockchain node. The header extractor reads the bytecode data, extracts the implemented function fingerprints, and passes the fingerprints to the parser. The parser extracts classification features from the list of implemented function fingerprints. For non-standard contracts, however, the parser extracts the classification features directly from the bytecode. The classifier uses the features contained in each smart contract bytecode to classify the address into a given category. The crawler checks the blockchain node for new smart contracts, and the interactions described above are repeated. The user interface API takes a smart contract address as input and returns its classification categories.

Following the sequences of interactions on the main components, as shown in Figure 2, the following data classes are relevant for the classifier: contract bytecode, fingerprints, features, and classification. The contract bytecode contains the bytecode data of every smart contract deployed on the blockchain and its deployment timestamp. The fingerprint data contains relevant function signatures for each of the classification categories. The parser uses the fingerprints to check the bytecode and extract relevant features of each contract. The feature contains implemented functions, functions from interfaces/libraries, function keywords, and events that describe a particular contract. The classifier checks the features in each contract and assigns them to specific categories following the classification rules earlier described in Section 3 for both standard and non-standard contracts.
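The four data classes can be pictured as simple records. The sketch below is only one possible reading of the description above: the class and field names are hypothetical, and the timestamp is assumed to be stored in Unix seconds.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ContractBytecode:
    address: str
    bytecode: str       # hex-encoded bytecode fetched by the crawler
    deployed_at: int    # deployment timestamp (Unix seconds assumed)


@dataclass
class Fingerprints:
    category: str       # e.g. "ERC20" or "Proxy"
    signatures: Set[str]  # relevant four-byte function signatures


@dataclass
class Features:
    address: str
    implemented_functions: Set[str] = field(default_factory=set)
    interface_functions: Set[str] = field(default_factory=set)
    keyword_functions: Set[str] = field(default_factory=set)
    events: Set[str] = field(default_factory=set)


@dataclass
class Classification:
    address: str
    categories: List[str] = field(default_factory=list)


# Placeholder values, for illustration only.
features = Features(address="0x" + "00" * 20, implemented_functions={"a9059cbb"})
print(features)
```

Keeping the fingerprint data separate from the extracted features is what would allow rules to be updated without re-crawling bytecode.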

The design of the smart contract classifier presented in this paper follows a modular design, ensuring the upgradeability and maintainability of the classifier. This ensures that newly deployed smart contracts can be added to the classifier and the classification identified. Modularity also ensures that the classification rules can easily be updated to add new smart contract categories or improve the performance of existing classification categories. Also, new classification features can be added for specific categories to improve the classifier’s performance.

4.2. PoC Implementation

The development of the proof-of-concept (PoC) of the classifier involves the implementation of the various components. We developed a Keccak-hash calculator in Python, adapted from an open-source Keccak implementation (https://github.com/aupiff/keccak (accessed on: 15 October 2025)), to derive the relevant function fingerprints from the selected smart contract standards, interfaces, and libraries. As stated earlier, the function fingerprints for keywords were extracted from the 4-byte database. The header extractor is adapted from the open-source Python tool introduced in Section 3.1. The parser is implemented with Python regular expressions.
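Deriving a fingerprint from a function header can be sketched in pure Python as shown below. This is a compact, unoptimized Keccak-256 written for illustration and is not the authors' adapted calculator. Note that Python's `hashlib.sha3_256` uses a different padding byte than the original Keccak used by Ethereum, so it does not produce the same selectors. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def _rol64(a: int, n: int) -> int:
    n %= 64
    return ((a << n) | (a >> (64 - n))) & MASK64


def _keccak_f1600(lanes):
    """Keccak-f[1600] permutation on a 5x5 array of 64-bit lanes."""
    r = 1
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota (round constants generated by an LFSR)
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    rate = 136  # bytes per block for a 256-bit digest
    padded = bytearray(data)
    pad_len = rate - (len(data) % rate)
    padded += b"\x01" + b"\x00" * (pad_len - 1)  # original Keccak padding (not SHA-3's 0x06)
    padded[-1] |= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for off in range(0, len(padded), rate):
        block = padded[off:off + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = _keccak_f1600(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def selector(header: str) -> str:
    """Four-byte fingerprint of a function header such as 'transfer(address, uint256)'."""
    canonical = header.replace(" ", "")  # canonical headers contain no spaces
    return keccak256(canonical.encode("ascii"))[:4].hex()


print(selector("transfer(address, uint256)"))
```

The header is normalized by stripping spaces before hashing, since headers written with a space after the comma would otherwise hash to a different value than the canonical form.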

4.3. Evaluation Results of the Smart Contract Classifier

Table 2 shows the F1 score performance analyses of the smart contract classifier on various categories. The standard smart contract categories, such as ERC20, ERC721, and ERC1155, show a high F1 score with the overall category average at 99%. For the non-standard contracts, proxy and cross-chain (interoperability applications) contracts show high classification performance with precision of 100% and 96% respectively. Both Governance and DeFi contracts have a similar precision of 88%; however, the governance category has a very good recall of 96%. The overall F1 score average of the non-standard contracts is at 93%, unlike the highly accurate standard contracts with a 99% F1 score.

5. Risk Analyses and Mitigation Strategies

5.1. Risk Analyses

This part of the paper identifies and quantifies the risks associated with the automated classification of smart contracts using a heuristic/rule-based approach. Table 3 summarizes seven vulnerabilities that can affect the smart contract classifier, including hash collisions, incomplete rule set, fake function headers, function fingerprint obfuscation, data completeness, and manual verification bottlenecks. By combining the estimated likelihood and severity of each identified vulnerability, we identified the critical and high-risk events that can be exploited. The identified high and critical risks include hash collisions, incomplete feature set, fake function headers, function fingerprint obfuscation, and manual verification bottlenecks.

Table 4 summarizes suitable risk treatment approaches that address the potential risks identified earlier. Extended fingerprinting and continuous monitoring are proposed as solutions to address hash collisions, incomplete feature sets, and manual verification bottlenecks. These are already implemented as part of the classifier update process. We use multiple fingerprints to ensure that a detected fingerprint occurrence is not the result of a hash collision or of an unintended use of a function keyword. The probability of a hash collision for our bytecode-level behavior fingerprints is extremely small (negligible in practice). Hashing therefore provides a practical, rather than a theoretical, guarantee of the uniqueness of a function. Moreover, in the evaluation of the classifier, none of the misclassifications were attributable to hash collisions. Thus, hash collision risk does not materially affect the empirical precision observed in our evaluation.

The indexer that extracts contract bytecodes actively scans the blockchain network for new smart contract addresses and their bytecodes and classifies them based on the defined classification rules. The modular design pattern of the smart contract classifier ensures that new classification categories and relevant keywords can be added without changing the entire classifier. Additional evaluation can also be done to add new keywords to the categories or to remove keywords that result in false positive classifications. For semantic-related risks such as function fingerprint obfuscation and fake function headers, a symbolic execution simulation can be implemented in a suitable environment to test smart contracts’ outputs for a given set of inputs and to check whether the outputs deviate from the output expected under the intended logic of the function.

5.2. Risk Mitigation with Simulation of Smart Contract Function Behaviour Using Symbolic Execution Environment

This part of the paper shows how symbolic execution for a smart contract function can be utilized to mitigate high-priority risks in heuristic smart contract classifiers such as fake function headers, function obfuscation, and static analysis constraints. With symbolic execution, the expected outputs of a smart contract function can be compared across various inputs to prevent Fake Function Headers, Semantic Misdirection, and Function Fingerprint Obfuscation, as previously identified in the risk analyses. However, this type of analysis is limited to smart contract functions where their expected behavior is known for a given set of inputs. We use pseudocode to illustrate the setup environment for executing the transfer function in ERC20, including notation definitions, input parameters, preconditions, the initial state of relevant variables, the call data context, and the verification procedure. The same process can be repeated for other similar functions to verify their actual behaviour against expected behaviour. Other similar functions that can be validated using this procedure include transferFrom in ERC20 and ERC721, safeTransferFrom in ERC721 and ERC1155, and safeBatchTransferFrom in ERC1155. These functions result in the status changes of sender and receiver balances. The procedure described below can be executed in an EVM bytecode simulation environment, such as Manticore (Manticore GitHub: https://github.com/trailofbits/manticore (accessed on: 15 October 2025)) or similar tools.

5.2.1. Symbolic Execution Verification Process of Standard Contract Functions

Figure 3 shows a flowchart of how symbolic execution can be used to verify functions in all standard contract categories. The process starts with setting up a simulation execution environment and selecting an ERC contract to verify. The verifier then selects a specific function in the contract, such as a write function that changes the blockchain state. The verifier then initializes the necessary input parameters for the function and checks the output against predefined conditions applicable to the function. If one of the functions is non-compliant, the contract as a whole is considered non-compliant. The process is repeated for all the functions in a standard contract and for all standard contract categories. The simulation parameters in Section 5.2.2 and Algorithm 1 illustrate this process for verifying the transfer function of an ERC20 contract.

Algorithm 1 Verify ERC20 Transfer Balance Delta

$TERMINALS \leftarrow Exec(C, σ, data, ctx)$
$SUCCESSFUL \leftarrow \{ t \in TERMINALS \mid t.status = SUCCESS \}$
for all $t \in SUCCESSFUL$ do
    $σ' \leftarrow t.σ$
    $bad\_sender \leftarrow (Balance(σ', sender) \neq b_{S0} - amt)$
    $bad\_receiver \leftarrow (Balance(σ', receiver) \neq b_{R0} + amt)$
    if $SAT(t.PC \land (bad\_sender \lor bad\_receiver))$ then
        return NON_COMPLIANT
    end if
end for
return COMPLIANT

5.2.2. Simulation Parameters and Algorithm for ERC20 Transfer Function

Notations used for the simulation description:

$C$: contract bytecode under test
$σ$: pre-state (storage)
$σ'$: post-state (storage)
$PC$: path constraints from symbolic execution
$Exec(C, σ, data, ctx)$: symbolic execution of a call, producing terminal states
$Balance(σ, a)$: abstract accessor for the balance mapping at address $a$ in state $σ$
$Selector(\text{“0xa9059cbb”})$: 4-byte function selector
$ABI(a, x)$: ABI encoding of parameters (address, uint256)

Simulation input Parameters and Preconditions:

$$\begin{array}{ll}
\textbf{INPUT:} & \\
C & \text{// contract bytecode} \\
sender,\ receiver & \text{// distinct EVM addresses} \\
b_{S0},\ b_{R0} \in \mathbb{N} & \text{// initial balances for sender, receiver} \\
\textbf{SYMBOLS:} & \\
amt \in \mathbb{N} & \text{// symbolic transfer amount} \\
\textbf{PRECONDITIONS:} & \\
amt > 0 & \\
amt \leq b_{S0} & \text{// sender has enough tokens} \\
sender \neq receiver &
\end{array}$$

Initial state of the relevant variables:

$$\begin{array}{l}
Balance(σ, sender) = b_{S0} \\
Balance(σ, receiver) = b_{R0} \\
\text{// Other storage unconstrained}
\end{array}$$

Call Data and Context:

$$\begin{array}{l}
sel := Selector(\text{"0xa9059cbb"}) \\
data := sel \parallel ABI(receiver, amt) \\
ctx.caller := sender \\
ctx.value := 0 \\
ctx.gas := \text{unconstrained but sufficient}
\end{array}$$
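As a minimal sketch of the call data construction above, the following Python snippet concatenates the selector with the ABI encoding of (address, uint256). The helper names and the receiver address are hypothetical and chosen only for illustration; the snippet relies only on the standard ABI layout of two 32-byte big-endian words.

```python
# Selector of the transfer function, as given in the notation above.
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")


def abi_encode_address_uint256(address_hex: str, amount: int) -> bytes:
    """ABI-encode (address, uint256) as two 32-byte big-endian words."""
    raw = address_hex[2:] if address_hex.startswith("0x") else address_hex
    addr = bytes.fromhex(raw)
    if len(addr) != 20:
        raise ValueError("an EVM address is exactly 20 bytes")
    if not 0 <= amount < 2**256:
        raise ValueError("amount must fit in uint256")
    # The address is left-padded with zero bytes to a full 32-byte word.
    return addr.rjust(32, b"\x00") + amount.to_bytes(32, "big")


def build_transfer_calldata(receiver_hex: str, amount: int) -> bytes:
    """data := sel || ABI(receiver, amt)"""
    return TRANSFER_SELECTOR + abi_encode_address_uint256(receiver_hex, amount)


# Hypothetical receiver address, used only for illustration.
receiver = "0x" + "11" * 20
data = build_transfer_calldata(receiver, 1000)
print(len(data))       # 4-byte selector + two 32-byte words
print(data[:4].hex())  # the selector
```

The resulting byte string is what a symbolic or concrete executor would receive as `data`, with `ctx.caller` set to the sender separately.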

Verification Procedure: Algorithm 1 gives the pseudocode of the symbolic execution that checks the logic of the transfer function in a smart contract. For every successful execution path, the simulation checks two conditions: that the sender's balance decreases by exactly the amount sent, and that the receiver's balance increases by exactly the amount sent. If either condition can be violated on a feasible path, the transfer function is considered non-compliant with the ERC20 standard.
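The balance-delta check of Algorithm 1 can be illustrated with a small Python sketch. Note the substitution: instead of a symbolic executor and a SAT query, this sketch enumerates every concrete amount allowed by the preconditions and runs a toy Python model of the contract, so it is a bounded stand-in rather than the symbolic procedure itself. The two contract models and all function names are hypothetical.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, int]
# A contract model maps (state, caller, receiver, amount) to (success, post_state).
TransferFn = Callable[[State, str, str, int], Tuple[bool, State]]


def verify_transfer_balance_delta(transfer: TransferFn, sender: str, receiver: str,
                                  b_s0: int, b_r0: int) -> str:
    """Bounded stand-in for Algorithm 1: try every amt with 0 < amt <= b_s0."""
    if sender == receiver:
        raise ValueError("precondition violated: sender must differ from receiver")
    for amt in range(1, b_s0 + 1):
        pre = {sender: b_s0, receiver: b_r0}
        success, post = transfer(dict(pre), sender, receiver, amt)
        if not success:
            continue  # only successful terminal states are checked
        bad_sender = post.get(sender, 0) != b_s0 - amt
        bad_receiver = post.get(receiver, 0) != b_r0 + amt
        if bad_sender or bad_receiver:
            return "NON_COMPLIANT"
    return "COMPLIANT"


def honest_transfer(state: State, caller: str, to: str, amt: int) -> Tuple[bool, State]:
    """Toy model of a transfer that moves exactly amt tokens."""
    if state.get(caller, 0) < amt:
        return False, state
    state[caller] -= amt
    state[to] = state.get(to, 0) + amt
    return True, state


def fee_taking_transfer(state: State, caller: str, to: str, amt: int) -> Tuple[bool, State]:
    """Hypothetical deviant token: the receiver silently gets one unit less."""
    if state.get(caller, 0) < amt:
        return False, state
    state[caller] -= amt
    state[to] = state.get(to, 0) + amt - 1
    return True, state


print(verify_transfer_balance_delta(honest_transfer, "S", "R", 50, 7))
print(verify_transfer_balance_delta(fee_taking_transfer, "S", "R", 50, 7))
```

The second model exposes the same transfer interface as the first, so a fingerprint check alone would not tell them apart; only the behavioural check does.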

6. Classifier Application and Analyses of Historical Contracts

6.1. High-Level Analyses of Deployed Smart Contracts

The classifier is applied to the bytecodes of smart contracts deployed on the Ethereum network to provide a time series view of the different types of smart contracts deployed in each block. We used the crawler tool to extract all the bytecodes deployed on the Ethereum main network, beyond the initial 6500 smart contracts used to evaluate the performance of the classifier. Figure 4 illustrates all the categories of smart contracts that we can classify, including standard contracts such as ERC20, ERC721, and ERC1155, as well as non-standard contracts like cross-chain (interoperability applications), DAO (governance), FinDApp (DeFi), and proxy contracts. Figure 5 shows the same categories, excluding proxy contracts. Each contract can be classified into more than one category; our time series figures only show the number of occurrences of each category.
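The per-category counting behind such a time series can be sketched as follows. This is only an illustration under assumed inputs: the record layout (block number plus a list of category labels), the bucket size, and the sample records are hypothetical and not taken from the published dataset.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def category_counts_per_bucket(records: Iterable[Tuple[int, List[str]]],
                               bucket_size: int) -> Dict[int, Counter]:
    """Count category occurrences per block range.

    A contract with several categories contributes one count to each of them,
    which mirrors counting category occurrences rather than contracts.
    """
    buckets: Dict[int, Counter] = {}
    for block_number, categories in records:
        start = (block_number // bucket_size) * bucket_size
        buckets.setdefault(start, Counter()).update(categories)
    return buckets


# Hypothetical classification results: (deployment block, assigned categories).
records = [
    (4_100_000, ["ERC20"]),
    (4_900_000, ["ERC20", "Governance"]),
    (15_200_000, ["ERC721"]),
    (15_700_000, ["ERC721", "Governance"]),
]
counts = category_counts_per_bucket(records, bucket_size=1_000_000)
for start in sorted(counts):
    print(start, dict(counts[start]))
```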

6.2. In-Depth Pattern Analyses of Smart Contract Categories

Based on the results obtained by applying the smart contract classifier to the bytecodes in the Ethereum public network, statistical and time series analyses can be used to identify deployment patterns in standard and non-standard contracts. Further advanced analyses can be used to identify ERC token contracts that do not fully comply with the functional requirements of the contract specification. By combining the classification results and their deployment time with external real-world data, we can also investigate the reasons for a sudden jump or drop in the deployment of a particular type of contract within a specific time frame.

7. Discussion

7.1. Research Implication and Patterns from the Deployed Smart Contracts

To develop the heuristic smart contract classifier in this work, we identified the relevant core function fingerprints for ERC standard contracts, as well as the relevant function keyword and interface fingerprints, and applied them to a sample dataset of deployed smart contract bytecodes. For standard contracts, all the core functions have to be implemented for a contract to be classified in this category. For non-standard contracts, iterative steps are repeated over the keywords to identify the set of fingerprints that gives the best results for each category. The average F1 score is 99% for standard contracts and 93% for non-standard contracts. The related work [9] also applied a heuristic approach to classify standard contracts like ERC20 and ERC721. Although that work did not explicitly report the overall performance of its classifier through an F1 score metric, its results show that most of the ERC contracts created during the period of that research (block 0 to 10 m) are compliant (all the core functions and events of the ERC type are implemented in the deployed bytecode). In this work, however, we focused only on compliant ERC contracts (standard contracts) and covered all three main types: ERC20, ERC721, and ERC1155. Our results show that ERC20 is the most deployed type of ERC contract during the analysis period (block 0 to block 23+ m), and these contracts are mostly used to implement different crypto token projects. Around block 15 m, there is a lot of deployment activity for standard ERC721 contracts, which coincides with the NFT boom from 2021 until early 2022. Although ERC1155 was introduced in 2018 and can also be used to implement NFTs, there is no deployment of this token type significant enough to be noticeable in the time series graph in Figure 5, even during the NFT boom period.
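The two classification rules described at the start of this subsection can be sketched in a few lines of Python. Several assumptions apply: the six ERC20 selectors below are the well-known selectors of the ERC20 interface functions and are assumed here to be the "core" set; the governance fingerprints are placeholders and not the keyword list used by the classifier; and "at least two occurrences" is read as at least two distinct matching fingerprints.

```python
from typing import Dict, FrozenSet, List, Set

# Well-known 4-byte selectors of the ERC20 interface functions
# (assumed here to be the "core" fingerprint set).
ERC20_CORE: FrozenSet[str] = frozenset({
    "18160ddd",  # totalSupply()
    "70a08231",  # balanceOf(address)
    "a9059cbb",  # transfer(address,uint256)
    "23b872dd",  # transferFrom(address,address,uint256)
    "095ea7b3",  # approve(address,uint256)
    "dd62ed3e",  # allowance(address,address)
})

STANDARD_RULES: Dict[str, FrozenSet[str]] = {"ERC20": ERC20_CORE}

# Placeholder fingerprints for illustration only; not real selectors.
NON_STANDARD_RULES: Dict[str, FrozenSet[str]] = {
    "Governance": frozenset({"aaaaaaa1", "aaaaaaa2", "aaaaaaa3"}),
}


def classify(selectors: Set[str]) -> List[str]:
    """Return every category whose rule the extracted selectors satisfy."""
    labels: List[str] = []
    for name, core in STANDARD_RULES.items():
        if core <= selectors:  # 100% occurrence of the core fingerprints
            labels.append(name)
    for name, fingerprints in NON_STANDARD_RULES.items():
        if len(fingerprints & selectors) >= 2:  # at least two relevant matches
            labels.append(name)
    return labels


print(classify(set(ERC20_CORE)))
print(classify(set(ERC20_CORE) - {"dd62ed3e"}))
print(classify({"aaaaaaa1", "aaaaaaa3"}))
```

Because a contract may satisfy several rules, the function returns a list of labels rather than a single category.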

The work [7] applied a supervised ML approach that combines random forest and XGBoost to classify smart contracts into non-standard categories like voting, trading, auction, renting, and entity management. Its performance is measured using the MCC score, which ranges from −1 to +1, with values closer to +1 indicating a better model. The maximum MCC obtained by that classifier is 0.82, for smart contracts classified as Auction. In our case, we applied supervised classification and used the F1 score to measure performance for categories such as FinDApp, governance, cross-chain, and proxy. The classifier developed in this paper has the best performance in the governance category, with an F1 score of 96%. The worst-performing category is FinDApp, with an F1 score of 88%. This score can be improved by iteratively testing the fingerprints for relevant keywords, removing fingerprints that result in FP classifications and testing new keywords to reduce FN classifications. Improving the search fingerprints will increase recall and precision and result in a better F1 score for this category. Still, further analyses of deployed non-standard smart contracts in the Ethereum network (as shown in Figure 5) show that FinDApps, such as DeFi applications, and governance applications, such as DAOs, are the most commonly deployed smart contract types. The graph shows that FinDApps are consistently deployed throughout all the blocks, whereas governance applications were significant during the early phases of the Ethereum network and again around block 15 m, which corresponds to the NFT boom from 2021 until early 2022. These significant deployments around block 15 m could imply that governance features were used for managing the NFT and crypto token smart contracts deployed during this period.

It is also important to note that since the introduction of proxies in the Ethereum network around 2018, most of the contracts deployed on the Ethereum blockchain network are minimal proxies (proxy contracts that have no executable function) that point to previously deployed smart contracts. It is also understandable that cross-chain applications that enable interoperability, such as oracles and bridges, are not significant in any of the blocks, since they only enable interactions between blockchain applications, networks, and external data sources and are not necessarily used to realize crypto tokens and NFTs.

7.2. Comparison of ML Approach and Heuristics in Token Contracts Identification

To establish the ground-truth performance of the smart contract classifier developed in this paper, we compared it with that of a supervised machine learning approach described in [8]. We collected sample data comprising 25 records from (Github Project https://github.com/giacomofi/Top-Trending-Contracts/tree/main/ (accessed on: 15 December 2025)), for which the LDA-based ML model predicted the selected smart contracts as token contracts. We compared them with our classifier's predictions for the same addresses. Our classifier maintained 100% precision, while the LDA-ML approach had a precision of 84%. The main reason for the FP detections in the ML approach is that it falsely identified smart contracts that used ERC20 libraries but did not implement any ERC20 functions as ERC20 contracts. The ML approach also falsely flags a contract that implements only one ERC20 function as an ERC20 contract. These false detections are common with smart contract classifiers that are based on Solidity code rather than on implemented bytecode functions, as in the method described in this paper. The summary of this comparison can be found in Table 5. Our predicted classifications for smart contracts can be found in our Classifier Data (Our Classifier Data DOI: https://zenodo.org/records/17752102 (accessed on: 15 December 2025)).
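The two precision figures can be reproduced from the outcome counts in Table 5. The short Python sketch below does only that arithmetic; the counts are read from the table (21 TP and 4 FP for the ML approach, 21 TP and 0 FP for our method), and the function name is illustrative.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("precision is undefined without positive predictions")
    return tp / (tp + fp)


# Counts read from Table 5 (25 contracts, all predicted as tokens in [8]).
ml_precision = precision(tp=21, fp=4)
our_precision = precision(tp=21, fp=0)

print(f"LDA-ML approach: {ml_precision:.0%}")
print(f"Our classifier:  {our_precision:.0%}")
```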

7.3. Beyond Main Classification Categories

In terms of main category coverage, our work shares similar categories with the related work [8], which used an unsupervised LDA ML model to produce thematic categories of non-standard contracts such as governance, finance, blockchain interaction, gaming, and notary. The categories we classified match this earlier work, excluding the gaming and notary (certification) applications. Our work can be extended to cover these missing categories by adding sub-categories with more complex classification rules. Governance, FinDApp, ERC20, and ERC721 applications share similar properties with gaming applications. FinDApp applications can be further broken down into DeFi, staking, DEX, payments, stablecoins, and various other types of financial applications. For DEXs, the core fingerprints of all DEXs can be extracted and matched with simple heuristics. For other types of FinDApps, however, complex rules have to be applied to correctly classify these sub-categories. For interoperable cross-chain applications, simple keyword-based heuristics can be applied to identify specific bridge and oracle applications. The sub-categories for proxy contracts can be mapped into minimal proxies and proxy applications. Minimal proxies can be identified by the specific signature that precedes the contract address they point to, while general proxy applications contain executable functions for governance and for updating the address of the execution contract the proxy points to. Hence, a simple heuristic approach can also be applied to the sub-categories of proxy contracts.
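The signature-based identification of minimal proxies can be sketched as follows. The sketch assumes the EIP-1167 minimal proxy layout (a fixed prefix, the 20-byte target address, and a fixed suffix); the text above does not name a specific proxy standard, so this choice and the function name are assumptions made for illustration.

```python
from typing import Optional

# Fixed byte sequences of the EIP-1167 minimal proxy runtime code
# (assumed layout: prefix + 20-byte target address + suffix).
EIP1167_PREFIX = bytes.fromhex("363d3d373d3d3d363d73")
EIP1167_SUFFIX = bytes.fromhex("5af43d82803e903d91602b57fd5bf3")


def minimal_proxy_target(runtime_code: bytes) -> Optional[str]:
    """Return the target address if the code is an EIP-1167 proxy, else None."""
    expected_len = len(EIP1167_PREFIX) + 20 + len(EIP1167_SUFFIX)
    if len(runtime_code) != expected_len:
        return None
    if not runtime_code.startswith(EIP1167_PREFIX):
        return None
    if not runtime_code.endswith(EIP1167_SUFFIX):
        return None
    start = len(EIP1167_PREFIX)
    return "0x" + runtime_code[start:start + 20].hex()


# Hypothetical target address, used only for illustration.
target = bytes.fromhex("be" * 20)
proxy_code = EIP1167_PREFIX + target + EIP1167_SUFFIX
print(minimal_proxy_target(proxy_code))
print(minimal_proxy_target(bytes.fromhex("6080604052")))
```

Variants of the minimal proxy pattern with a different prefix or suffix would not be matched by this exact-length check and would need their own signatures.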

7.4. Risk Implications

Exploiting the vulnerabilities associated with heuristic classifiers will further reduce the performance of the smart contract classifier developed in this work. Some of the vulnerabilities, such as hash collisions, incomplete feature sets or classification categories, data freshness, and manual verification bottlenecks, can be addressed by extended fingerprinting, manual refinements, continuous data crawling, and improved continuous performance monitoring. For commercial use of the classifier, these risk mitigation strategies can be implemented as part of the lifecycle management strategy for the classifier. Still, the resource requirements to maintain and update the classifier rules are quite high. Some of these update implications may not apply to ML-based classifiers [8,18,19]: although their initial development has higher resource requirements, they do not require constant updates of classification rules.

Semantic-based risks such as fake function headers and function fingerprint obfuscation are difficult to eliminate, especially for non-standard contracts. For standard contracts, although the process is rigorous, symbolic execution simulation can be used to test the expected outputs from smart contract functions for a given set of inputs. Hence, without an overview of the underlying code logic, ERC contract functions whose outputs deviate from the expected outputs can be identified and tagged as malicious. For non-standard contracts, it is difficult to test the functions for expected outputs, since the expected behaviors of most functions that occur in smart contracts are not standardized. Therefore, these risks are retained for heuristic classifiers of non-standardized contracts. These risks also apply to ML approaches whose classification features are based on function fingerprints [7,18,19]. The federated ML approach in [21] provides a potential alternative approach to risk-aware classification, with a strong emphasis on threat detection, policy enforcement, and explainable monitoring.

8. Conclusions

The objective of this research is to apply a heuristics-based approach that analyzes function fingerprint occurrences in smart contract bytecode to classify contracts into predefined categories. These categories include standard contracts like ERC20, ERC721, and ERC1155, and non-standard categories like FinDApps, cross-chain, governance, and proxy. To achieve this objective, we first identified core function fingerprints for all ERC token contracts and checked their 100% occurrence in the implemented functions in the bytecode. We adapted an existing header extractor tool to identify implemented functions. For non-standard contracts, we iteratively identified contract interfaces and relevant fingerprints for specific smart contract categories. We applied a rule that requires at least two occurrences of relevant fingerprint keywords or interfaces for a contract to be classified in a given category. Using the 100% occurrence rule for standard contracts ensures that we identify only compliant token contracts. The requirement of at least two relevant fingerprint occurrences in non-standard contracts guards against hash collisions and the unintentional use of a keyword in a contract. A performance evaluation was conducted on sample datasets using the classifier. The classifier achieved an F1 score of over 99% for standard contracts and about 93% for non-standard contracts. Furthermore, a risk analysis was conducted to identify potential vulnerabilities that can be exploited to reduce the performance of the classifier. These include potential hash collisions, an incomplete rule set, manual verification bottlenecks, data freshness, and semantic misdirection and obfuscation of smart contract functions. Continuous monitoring, continuous data crawling, and extended rule refinement are considered as potential solutions to these risks. The modular design of the classifier ensures that these manual updates can be easily integrated into the classifier. Semantic-based risks cannot be eliminated; however, they can be reduced by using symbolic execution to verify the expected behaviour of ERC token contract functions over a given set of inputs.

The main limitation of this work is the potential overgeneralization of the classifier performance due to the limited number of smart contracts used in the evaluation. This is due to the manual rigor required to verify that the predicted category matches the actual category, since there is no public data on the different categories of smart contracts. Another potential weakness of the classifier verification is the even split of the evaluation cases for each category into true and false cases (25 of each). This is done to handle both contract categories that have many true cases and those that have many false cases; however, it can introduce unintended bias into the verification. Future work can be explored in three areas. The first is improving the performance of the classifier through further rigorous testing of fingerprints, which improves precision by reducing FP, and through adding more relevant fingerprints, which improves recall by reducing FN. The second is to cover more types of smart contracts by using more complex rule combinations to identify sub-categories like gaming, DAO, access control, voting, DeFi, DEXs, staking, bridges, oracles, and minimal proxies. The third is to implement a symbolic execution environment for identifying semantic misdirection and obfuscation in ERC contract functions and to evaluate the amount of risk reduction achieved by this type of tool.

Author Contributions

Conceptualization, C.U.; methodology, C.U., S.A.M.M. and S.C.; validation, C.U. and S.A.M.M.; implementation, S.A.M.M. and C.U.; writing, C.U.; review and editing, C.U., S.C. and S.A.M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work is partially funded via the General Programme of the Austrian Research Promotion Agency (FFG), project Datenprovider Web3 (no. FO999923332).

Data Availability Statement

Public data about the research for the design and evaluation of the classifier can be found here: https://zenodo.org/records/17752102 (accessed on: 15 December 2025). There are two files; the first shows the keyword fingerprint selections as well as the sample bytecodes used in evaluating the classifier performance in various categories. The second file shows the classification results obtained by applying the classifier to bytecodes in the Ethereum public network.

Acknowledgments

Special thanks to Gernot Salzer and his team from TU Wien, for their feedback and technical support, especially in adapting the bytecode header extractor tool for this research. We also want to thank our project partner, Bitfly Explorer GmbH, for enabling this research and providing valuable feedback during the design, implementation, and evaluation of the smart contract classifier.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Suiche, M. Porosity: A decompiler for blockchain-based smart contracts bytecode. DEF Con 2017, 25, 1–29.
2. Grech, N.; Brent, L.; Scholz, B.; Smaragdakis, Y. Gigahorse: Thorough, declarative decompilation of smart contracts. In Proceedings of the 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), Montreal, QC, Canada, 25–31 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1176–1186.
3. Etherscan. Verified Contracts. Available online: https://etherscan.io/contractsverified (accessed on 13 June 2025).
4. Udokwu, C.; Kormiltsyn, A.; Thangalimodzi, K.; Norta, A. The state of the art for blockchain-enabled smart-contract applications in the organization. In Proceedings of the 2018 Ivannikov Ispras Open Conference (ISPRAS), Moscow, Russia, 22–23 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 137–144.
5. Li, X.; Chen, T.; Luo, X.; Zhang, T.; Yu, L.; Xu, Z. Stan: Towards describing bytecodes of smart contract. In Proceedings of the 2020 IEEE 20th International Conference on Software Quality, Reliability and Security (QRS), Macau, China, 11–14 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 273–284.
6. Di Angelo, M.; Salzer, G. Assessing the similarity of smart contracts by clustering their interfaces. In Proceedings of the 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Guangzhou, China, 29 December 2020–1 January 2021; IEEE: Piscataway, NJ, USA, 2020; pp. 1910–1919.
7. Sezer, S.; Eyhoff, C.; Prinz, W.; Rose, T. Exploiting Smart Contract Bytecode for Classification on Ethereum. In Proceedings of the PoEM Workshops, Riga, Latvia, 26 November 2020; pp. 11–22.
8. Ortu, M.; Ibba, G.; Destefanis, G.; Conversano, C.; Tonelli, R. Taxonomic insights into ethereum smart contracts by linking application categories to security vulnerabilities. Sci. Rep. 2024, 14, 23433.
9. Di Angelo, M.; Salzer, G. Identification of token contracts on Ethereum: Standard compliance and beyond. Int. J. Data Sci. Anal. 2023, 16, 333–352.
10. Fekih, R.B.; Lahami, M.; Jmaiel, M.; Bradai, S. Formal modeling and verification of erc smart contracts: Application to nft. In Proceedings of the 2023 IEEE Symposium on Computers and Communications (ISCC), Gammarth, Tunisia, 9–12 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 556–561.
11. Chirtoaca, D.; Ellul, J.; Azzopardi, G. A framework for creating deployable smart contracts for non-fungible tokens on the ethereum blockchain. In Proceedings of the 2020 IEEE International Conference on Decentralized Applications and Infrastructures (DAPPS), Oxford, UK, 3–6 August 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 100–105.
12. Long, H.W.; Si, Y.W. Token Fungibility Duality: Technical and Graphical Analysis on 404 Standards. In Proceedings of the 2024 IEEE International Conference on Blockchain (Blockchain), Copenhagen, Denmark, 19–22 August 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 252–259.
13. Jensen, J.R.; von Wachter, V.; Ross, O. An introduction to decentralized finance (defi). Complex Syst. Inform. Model. Q. 2021, 26, 46–54.
14. Ding, W.W.; Liang, X.; Hou, J.; Wang, G.; Yuan, Y.; Li, J.; Wang, F.Y. Parallel governance for decentralized autonomous organizations enabled by blockchain and smart contracts. In Proceedings of the 2021 IEEE 1st International Conference on Digital Twins and Parallel Intelligence (DTPI), Beijing, China, 15 July–15 August 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–4.
15. Ou, W.; Huang, S.; Zheng, J.; Zhang, Q.; Zeng, G.; Han, W. An overview on cross-chain: Mechanism, platforms, challenges and advances. Comput. Netw. 2022, 218, 109378.
16. Bodell, W.E., III; Meisami, S.; Duan, Y. Proxy hunting: Understanding and characterizing proxy-based upgradeable smart contracts in blockchains. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23), Anaheim, CA, USA, 9–11 August 2023; pp. 1829–1846.
17. Zou, W.; Lo, D.; Kochhar, P.S.; Le, X.B.D.; Xia, X.; Feng, Y.; Chen, Z.; Xu, B. Smart contract development: Challenges and opportunities. IEEE Trans. Softw. Eng. 2019, 47, 2084–2106.
18. Shi, C.; Xiang, Y.; Yu, J.; Gao, L.; Sood, K.; Doss, R.R.M. A bytecode-based approach for smart contract classification. In Proceedings of the 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Honolulu, HI, USA, 15–18 March 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1046–1054.
19. Tian, G.; Wang, Q.; Zhao, Y.; Guo, L.; Sun, Z.; Lv, L. Smart contract classification with a bi-lstm based approach. IEEE Access 2020, 8, 43806–43816.
20. El Haddouti, S.; Khaldoune, M.; Ayache, M.; Ech-Cherif El Kettani, M.D. Smart contracts auditing and multi-classification using machine learning algorithms: An efficient vulnerability detection in ethereum blockchain. Computing 2024, 106, 2971–3003.
21. AlSobeh, A.; Shatnawi, A.; Magableh, A. AspectFL: Aspect-Oriented Programming for Trustworthy and Compliant Federated Learning Systems. Information 2025, 16, 1048.

Figure 1. Interacting components of smart contract classifier.

Figure 2. Data model of smart contract classifier.

Figure 3. Standard contract function verification process with symbolic execution.

Figure 4. Smart contracts deployments and their categories based on block number.

Figure 5. Smart contracts deployments (excluding proxy) and their categories based on block number.

Table 1. Literature review of related works.

Approach	Method	Data Type	Contract	Categories	Evaluation
ML [5]	NLP	Static (Bytecode)	Non-standard	Function descriptions	Informal (interviews)
ML [7]	NB, SVM, LR, RF, XGBoost	Static (Bytecode)	Non-standard	Voting, Auction, Entity management, Renting, Trading	Formal (MCC Score)
Heuristic [9]	rule-based	Static (Bytecode)	Standard	ERC20, ERC721	Formal (F1 Score)
ML [18]	BPSO, AdaBoost	Static (Bytecode)	Non-standard	Governance, Finance, Gambling, Game, Wallet, Social	Formal (F1 Score)
ML [8]	Unsupervised LDA	Static (Code)	Non-standard	Notary, Token, Game, Financial, Blockchain interaction	Formal (Cohen’s kappa)
ML [19]	LSTM, LDA	Static (Code)	Non-standard	Entertainment, Tools, Management, Finance, Lottery, IoT	Formal (F1 Score)
ML [6]	Unsupervised weighted graphs	Static (Bytecode)	Standard, Non-standard	All ERC tokens, Multi-sig Wallets types	Formal (jaccard similarity)
ML [20]	RF, KNN	Static (Bytecode)	Non-standard	Reentrancy, Integer overflow, Tx-Ordering, transaction origin use, Unchecked suicide	Formal (F1 Score)

Table 2. Performance evaluation results of smart contract classifier.

Type	Classifier	Precision	Recall	F1-Score	Ave. F1
Standard	ERC20	100	96	98	99
	ERC721	100	100	100
	ERC1155	100	100	100
Non-standard	DeFi	88	85	88	93
	Governance	88	96	92
	Cross-chain	96	96	96
	Proxy	100	90	95
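As a quick consistency check on Table 2, the F1 score is the harmonic mean of precision and recall, and the "Ave. F1" column is the mean of the per-category F1 scores. The Python sketch below recomputes a few rows from the tabulated precision and recall values; the function name is illustrative, and values are rounded to whole percentages as in the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values taken from Table 2.
print(round(f1_score(100, 96)))  # ERC20
print(round(f1_score(88, 96)))   # Governance
print(round(f1_score(100, 90)))  # Proxy

# Average F1 per contract type, from the tabulated F1 column.
standard_avg = round((98 + 100 + 100) / 3)
non_standard_avg = round((88 + 92 + 96 + 95) / 4)
print(standard_avg, non_standard_avg)
```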

Table 3. Risk details with summarised exploitation and impact.

Risk ID	Vulnerability Description	Exploit Summary	Impact Summary	Likelihood	Severity	Risk Score
R1	Four-Byte Hash Collisions	Collision-based function mimicry	Misclassification; financial & reputational loss	High	Critical	Critical
R2	Incomplete Feature Set/Rule Coverage	Novel patterns bypassing rules	Misclassification; blind spots; reduced utility	Medium	High	High
R3	Static Analysis Constraints	Runtime-dependent logic bypassing static analysis	Undetected vulnerabilities; false security	Medium	High	High
R4	Fake Function Headers/Semantic Misdirection	Correct fingerprints with malicious logic	Deception; financial fraud risk	High	Critical	Critical
R5	Function Fingerprint Obfuscation	Assembly or proxies for obfuscation	Evasion; misclassification	Medium	High	High
R6	Data Freshness and Completeness	Crawler delays causing stale data	Delayed classification; reduced utility; inefficiencies	Medium	Medium	Medium
R7	Manual Verification Bottlenecks/Ground Truth Reliability	Dependence on small biased dataset	Overestimation; misinformed trust	High	Medium	High

Table 4. Risk vulnerabilities and mitigation strategies.

Risk ID	Vulnerability Description	Mitigation Summary	Implemented
R1	Four-Byte Hash Collisions	Extended fingerprinting; semantic analysis; hybrid ML	Extended fingerprinting
R2	Incomplete Feature Set/Rule Coverage	Hybrid ML; continuous monitoring; rule refinement	Continuous monitoring, Manual rule refinement
R3	Static Analysis Constraints	Selective dynamic analysis; Semantic analysis (symbolic execution), vulnerability assessment tool	Symbolic execution (ERC functions)
R4	Fake Function Headers/Semantic Misdirection	Semantic analysis; extended fingerprinting; hybrid ML	Symbolic execution (ERC functions)
R5	Function Fingerprint Obfuscation	Extended fingerprinting; semantic analysis; hybrid ML	Symbolic execution (ERC functions)
R6	Data Freshness and Completeness	Enhanced crawler scalability; performance monitoring	Enhanced crawler scalability
R7	Manual Verification Bottlenecks/Ground Truth Reliability	Enhanced data sourcing & ground truth generation (multi-source, semi-automated), Continuous performance monitoring	Continuous performance monitoring

Table 5. Comparison of Article [8] ground truth vs. Our Method classification outcomes for selected ERC-20 token contracts.

Token Contract	Ref. [8] Prediction	Ref. [8] Outcome	Our Method Prediction	Our Method Outcome	Explanation
0x0077d27cb82ff12322987b225bfce0bb6e8931b4	1	TP	1	TP
0x0079453f683380c7493d4bc4fa9baac97c5e693c	1	FP	0	TN	Non implemented erc20 function
0x007bead59a807eb50aef56e80e3aecbab9a3026e	1	TP	1	TP
0x007da60ea2a53c09f5cdb1b5339d8cebe4409744	1	TP	1	TP
0x007dfb0c30f55ccac0191387fe5ffc9cfde519c0	1	TP	1	TP
0x0080cfc1b3177a45a4459b2e85cd202c26b37eb9	1	TP	1	TP
0x008a548284F2E66A1150f4306492b0f5d82b3283	1	TP	1	TP
0x008da6dfe18c61844d614294932d52c50323d722	1	TP	1	TP
0x008eeef21c0dab336deba4c89d449c5e2593463d	1	TP	1	TP
0x008f1d94ad209a5cc9439BA515f619F1d015412e	1	TP	1	TP
0x0095a819919f3409e58128304b8b2b06b29e77be	1	TP	1	TP
0x0099686345e611F4c7646aaba8BCC535e150C20E	1	TP	1	TP
0x009c43B42AEFAC590C719E971020575974122803	1	TP	1	TP
0x009fa1ebc188022c4391c69ef63f1323d358e987	1	TP	1	TP
0x00A55375002f3cDa400383F479e7Cd57Bad029A9	1	TP	1	TP
0x00E3c1F30dC416dBF841435cB1b2188c1A268F7E	1	TP	1	TP
0x00E9303e0fA754751C417E33FdBC031F0cc01360	1	FP	0	TN	Non implemented erc20 function
0x00EAeA176307159B928CCD4A8b9b33c2955092Db	1	FP	0	TN	Non implemented erc20 function
0x00a73102c76647055e8b93f3d662cab686e5638e	1	TP	1	TP
0x00a9a70b94fc1f97141f99d90a3471cf49edadd9	1	TP	1	TP
0x00a8b738e453ffd858a7edf03bccfe20412f0eb0	1	TP	1	TP
0x00c3a4ea499cf8a68f26ec78fad0bd2be28c2769	1	TP	1	TP
0x00d14753f126286502a3aa6df97a949a951398c9	1	TP	1	TP
0x0107d006806d07d32efe5fad1c68b7b63b90e08c	1	FP	0	TN	Implemented only transfer function
0x0114622386c1a00686e594c70682d7aa0f8afa29	1	TP	1	TP

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
