Multi-Label Vulnerability Detection of Smart Contracts Based on Bi-LSTM and Attention Mechanism

: Smart contracts are decentralized applications running on blockchain platforms and have been widely used in a variety of scenarios in recent years. However, frequent smart contract security incidents have focused more and more attention on their security and reliability, and smart contract vulnerability detection has become an urgent problem in blockchain security. Most of the existing methods rely on ﬁxed rules deﬁned by experts, which have the disadvantages of single detection type, poor scalability, and high false alarm rate. To solve the above problems, this paper proposes a method that combines Bi-LSTM and an attention mechanism for multiple vulnerability detection of smart contract opcodes. First, we preprocessed the data to convert the opcodes into a feature matrix suitable as the input of the neural network and then used the Bi-LSTM model based on the attention mechanism to classify smart contracts with multiple labels. The experimental results show that the model can detect multiple vulnerabilities at the same time, and all evaluation indicators exceeded 85%, which proves the effectiveness of the method proposed in this paper for multiple vulnerability detection tasks in smart contracts.


Introduction
With the widespread application of blockchain technology [1] in different scenarios, it has moved from the blockchain 1.0 era, in which the programmable digital cryptocurrency system is the main feature, to the 2.0 era, in which the programmable financial system is the main feature [2].Blockchain originates from the underlying technology of Bitcoin, which is essentially a distributed ledger of shared transactions using cryptography to form a special chained data structure that is jointly maintained by all miners in the blockchain network (i.e., nodes or users in the blockchain network) according to a consensus protocol [3], and its main features include decentralization, time-series data, collective maintenance, programmability, and secure and trustworthy, etc.The concept of a smart contract was first proposed by computer scientist Nick Szabo in the 20th century: "A smart contract is a set of commitments defined in digital form, including an agreement that the participants in the contract can execute these commitments" [4].However, due to the lack of a credible execution environment, the exploration was only at the theoretical level until the emergence of blockchain technology provided a reliable execution environment for smart contracts.By integrating various technologies such as distributed data storage, smart contracts, consensus mechanisms [5], and encryption algorithms, blockchain can realize remote pointto-point value transfer without any trusted third party.A smart contract is a program deployed on the blockchain.The node makes the Ethereum virtual machine execute code by calling the interface of the smart contract to realize the execution of the smart contract.
As a representative technology of the blockchain 2.0 era, smart contracts offer broader and more practical applications than earlier blockchains designed only for cryptocurrencies.According to relevant statistics, the daily transaction volume on the Ethereum platform

•
We propose a model for multi-label classification of smart contract opcodes.Existing models perform dichotomous detection of contracts, which can only achieve the identification of one type of vulnerability, while our model can identify multiple vulnerabilities of smart contracts simultaneously.

•
Our approach performed well on the test set, exceeding 85% for each of the four evaluation indicators, as well as exceeding 85% detection accuracy for each of the five vulnerabilities.

•
We used the Word2Vec model to preprocess the opcode data and achieve higher classification accuracy compared to the NN Embedding model.
The rest of this paper is organized as follows.Section 2 details the background of Ethereum smart contracts, multi-label classification, and existing smart contract vulnerability detection methods.Section 3 describes five smart contract vulnerabilities and analyzes the relationship between contract source code, bytecode, and opcode.Section 4 explains our proposed method, and Section 5 presents experimental details and evaluation results.Section 6 concludes the paper and provides an outlook.

Background 2.1. Ethereum Smart Contract
A smart contract is a digital agreement that uses algorithms and programs to formulate contract terms, is deployed on the blockchain, and can be automatically executed according to the rules.Its original intention is to build smart contracts into physical entities to create various flexible and controllable intelligent assets, but due to the limitations of early computing conditions and the lack of application scenarios, smart contracts have not received extensive attention from researchers [8].In December 2013, Buterin proposed the Ethereum blockchain platform.In addition to realizing digital currency transactions based on the built-in Ether, it also provided a Turing-complete programming language (Solidity) to write smart contracts, thus applying smart contracts to the blockchain for the first time.Blockchain realizes decentralized storage, and smart contracts realize decentralized computing based on it [9].
With the development of smart contracts, more and more smart contracts are deployed on different blockchain platforms, such as Ethereum, Fabric, ESO, etc. [10].Among the many blockchain systems that support the operation of smart contracts, Ethereum, as the earliest blockchain platform recognized to support the operation of smart contracts, has now become the largest blockchain platform in the world.In recent years, among the various research works on smart contracts, most of them focus on the relatively mature smart contracts of the Ethereum platform, and the research in this paper is also based on the smart contracts of Ethereum platform.Ethereum customizes Turing-complete programming languages such as Solidity and Serpent to develop smart contracts to achieve special contract functions.In Ethereum, the smart contract source code is compiled into machine-recognizable bytecode by a compiler to form a contract deployment transaction, and each successful deployment generates a new contract account, which is then packaged into blocks for multiple distribution and replication among Ethereum nodes, and finally run on the Ethereum Virtual Machine (EVM), which is the core of smart contract operation and is mainly used to handle the deployment and execution of smart contracts [11].Since smart contracts run on the blockchain, their life cycle is not the same as that of ordinary programs.It is the different life cycle and characteristics of smart contracts that bring new security threats to it.

Multi-Label Classification
In order to effectively manage information in a structured manner, labeling technology emerges as the times require.As a brief overview of the data information, labels can be quickly indexed and accessed through labels after the data is marked.However, in the face of the trend of data diversification, a single label cannot comprehensively explain and generalize the data information; that is, it cannot better meet the actual needs of multiple tasks.The birth of multi-label technology has become inevitable and reasonable.Multi-labeling of data information can make the original information adapt to more application scenarios.With the development of artificial intelligence, multi-label classification technology has transitioned from classical machine learning to deep learning based on neural networks, and has wide and practical applications in sentiment analysis, information mining, image processing, and other fields.
Du et al. [12] proposed an end-to-end ML-Net model to solve the multi-label classification task of biomedical texts.The model combines a label prediction network with an automatic label number prediction mechanism, using the prediction confidence score of each label, and deep contextual information in the target document to achieve multi-label classification.Alhuzali et al. [13] proposed the SpanEmo model, which uses multi-label sentiment classification as span prediction, which can help sentiment analysis models learn the association between labels and words in sentences, and also introduces a loss function for modeling multiple coexisting emotions.Yogarajan et al. [14] studied the multimorbidity of patients, used a multi-label variant of medical text classification to enhance the prediction of concurrent medical codes, demonstrated new embeddings in health-related text, and dealt with imbalanced compared concentrated variants of embedding models for the multi-label medical text classification problem.We found that multi-label classification methods have achieved more and more results in the field of natural language processing.Smart contract vulnerability detection is a natural language processing problem.Most of the existing research in this field is based on binary classification methods, which can only detect whether the contract has vulnerabilities or not.Therefore, we used a multi-label classification method to detect smart contracts and multiple vulnerability types, so as to accurately, effectively, and quickly detect vulnerabilities in smart contracts to ensure the security of the blockchain platform.

Existing Vulnerability Detection Methods
A series of smart contract security vulnerabilities have not only caused huge financial losses to users but also raised serious questions about the fairness and trustworthiness of smart contracts.The security of smart contracts has become a common concern for researchers and developers, and researchers have tried various methods to detect vulnerabilities in smart contracts.The existing smart contract vulnerability detection methods are divided into two main categories, which are traditional vulnerability detection methods and machine learning-based vulnerability detection methods.Among them, traditional vulnerability detection methods include formal verification, symbolic execution, fuzzy testing, and intermediate representation, while machine learning-based vulnerability detection methods include traditional machine learning detection methods and deep learning-based detection methods [15].Table 1 lists the model frameworks included in each detection method.

Intermediate Representation
Slither [26]; Madmax [27]; Ethir [28]; SmartCheck [29] Machine Learning Dynamit [30]; ContractWard [31] Deep Learning R-CNN [32]; TSSC [33]; CBGRU [34] Formal verification transforms the concepts, judgments, and reasoning in a contract into a formal model through formal language.This eliminates ambiguities and implausibilities in the contract and then works with rigorous logic and proof to verify the correctness and security of the function functions in the smart contract.Common formal verification methods include model detection and deductive verification.When symbolic execution is applied in smart contract vulnerability detection, the variable values in the contract are first symbolized, then the instructions in the execution process are interpreted one by one, the execution status is updated and path constraints are collected during the interpretation and execution process, and finally, the exploration of all executable paths in the contract is completed and the corresponding security issues are discovered.Fuzzy testing generates a large number of normal and abnormal test cases from the target application, tries to provide the generated cases to the target application, and monitors the abnormal results in the execution state to discover security issues.Intermediate representation is performed by converting smart contract source code or bytecode into an intermediate representation with high-level semantic representation and then analyzing the intermediate representation of the contract to find security issues.The above traditional detection methods usually rely on fixed semantic rules, are not fully automated, require a second audit by technicians, and have the disadvantages of high false alarm rate and long detection time.
In recent years, machine learning and deep learning have been successfully used in the field of program security, and at the same time, they have promoted new developments in the field of smart contract vulnerability detection.Dynamit is a detection method based on traditional machine learning.It manually extracts features from transactions generated between contracts, adds labels to each transaction manually, and finally inputs a random forest (RF) model for training to obtain high-precision RF classification.This method can only detect whether the contract contains vulnerabilities; it cannot detect the types of vulnerabilities.ContractWard performs N-Gram feature extraction for smart contract opcodes, uses Oyente to detect contracts to add labels and inputs the data into the XGBoost model for training.This method can detect six types of vulnerabilities, and achieve multiple classifications, but the algorithm has large time and space complexity, and memory consumption is too high.R-CNN is a detection model based on a cyclic neural network (CNN), which converts the bytecode of the vulnerability contract into an RGB image and inputs it into CNN for detection.This method can only detect reentrancy vulnerabilities and is not as effective as it could be.TSSC performs a word embedding operation on the contract at the opcode level to obtain the feature representation of the contract and then inputs the long short-term memory neural network (LSTM) for contract vulnerability detection.This model can detect three vulnerabilities, but ignores the contract before and after the operation code logical relationship between them.CBGRU uses a combination of CNN and GRU models to perform binary detection on three types of vulnerabilities.This method cannot detect which types of vulnerabilities the contract has at the same time, and the CNN model is not good at processing data with sequential relationships.This research uses a model combining Bi-LSTM and an attention mechanism to detect multi-label vulnerabilities in smart contracts.It not only extracts the semantic relationship between the opcodes before and after the contract but also focuses on the important features that have a greater impact on the detection results, to ensure the model detection performance.

Smart Contract Vulnerability
Today, various types of smart contract vulnerabilities are emerging, and the reasons for the vulnerabilities are closely related to the functionality of the blockchain platform itself, developer writing, and contract design.Smart contracts are vulnerable to security vulnerabilities for three main reasons.(1) The current smart contract programming language is novel and crude, while contracts are relatively difficult to test in new operating environments.(2) Developers cannot predict the future state and environment of the contract, which makes it easy to write a contract that is vulnerable or susceptible to attack.
(3) Unlike traditional programs, smart contract binary code and its state are stored on a tamper-evident blockchain network, and cannot be modified once a smart contract is deployed.Therefore, detecting smart contracts before they are deployed can maximize the security of smart contracts after they are deployed.To achieve smart contract vulnerability detection, this paper conducts research work on the following types of vulnerabilities.
Reentrancy: Reentrancy is a vulnerability arising at the smart contract code level, due to the special fallback mechanism of Solidity smart contracts [35], which allows attackers to reenter the called function to attack the contract before the end of program command execution.Similar to most programming languages, Ethereum smart contracts make crosscontract function calls when handling business logic, but smart contracts often involve sensitive operations such as transferring money.The reentrancy vulnerability arises because an attacker designs malicious attack code in the fallback function to recursively call the contract's transfer function to steal ETH.
Integer overflow/underflow: The types of integer overflow in smart contracts include three types: multiplication overflow, addition overflow, and subtraction overflow.In Ethereum smart contracts, integers are generally specified as fixed size and unsigned integer types, which means that integer variables can only be values within a certain range, beyond which an integer overflow error will occur [36].Solidity language integer variable steps are generally incremented by 8, supporting types from uint8 to uint256.The uint8 type, for example, has a variable length of 8 bits and supports storing numbers in the range [0,255].The range of numbers supported for storage is [0, 255].If an attempt is made to store data in a variable of type uint8 that is larger than this range, the EVM will automatically truncate the high bits, resulting in an integer overflow error.The vulnerability occurs at the smart contract code level.
Transaction order dependency: The order of transaction execution in a blockchain network is determined by miners in the blockchain, and some contracts have strict requirements on the order of transaction execution [37].For example, user A and user B simultaneously submit transactions T1 and T2 respectively at time t.However, the execution order of T1 and T2 is decided by miners in the block, and if T1 is executed first, the contract state will change from S to S1 and vice versa, so the final contract state depends on the order of transaction execution selected by miners.If a malicious miner listens to the corresponding contract transaction in the block, he can change the current contract state by submitting a malicious transaction, thus having the opportunity to deploy an attack in advance.This vulnerability is a vulnerability that arises at the blockchain system level.
Timestamp dependency: A security vulnerability introduced by smart contracts that use strict block timestamps in their code to make important control flow decisions.The block timestamp is the timestamp when the block belonging to the current contract call transaction is packed.Block timestamps can be manipulated by miners within a certain range of fetching values [38].The precise timestamp is used as an important decision parameter in the contract, and while it is somewhat non-defeatable to a normal attacker, a miner-identified attacker can easily circumvent the limitations of the contract's timestamp design by constructing a malicious timestamp with a range of values.This vulnerability also occurs at the blockchain system level.

Smart Contract Source Code, Bytecode, Opcode
Smart contract development is similar to traditional software development in that a high-level language is used to code it.There are currently more than 40 platforms that support the deployment of smart contracts, all with corresponding contract development languages.Most of these platforms use the Solidity language for development, such as the Ethereum blockchain platform.The programming languages used to develop smart contracts are Vyper, Idris, Rust, etc., in addition to Solidity [39].The developed smart contract source code needs to be compiled before it can be deployed on the EVM.Program compilation is the translation of program code written in a high-level language into machine code that can be used for machine computation.The binary code used for execution in a smart contract is called bytecode (or EVM code), and EVM completes the execution of the contract by decompiling the bytecode into the corresponding opcode.The Ethereum smart contract is compiled by the compiler, which also generates an application call interface (ABI) for the blockchain to parse out the function selector to implement the contract function calls.We used the Remix tool [40] and Opcode-tool [41] provided by the Ethereum platform to compile source code to bytecode and decompile bytecode to opcode, respectively.Figure 1 shows the relationship between a piece of smart contract source code after compiling and decompiling to generate parts of bytecode and opcode.
Transaction order dependency: The order of transaction execution in a blockchain network is determined by miners in the blockchain, and some contracts have strict requirements on the order of transaction execution [37].For example, user A and user B simultaneously submit transactions T1 and T2 respectively at time t.However, the execution order of T1 and T2 is decided by miners in the block, and if T1 is executed first, the contract state will change from S to S1 and vice versa, so the final contract state depends on the order of transaction execution selected by miners.If a malicious miner listens to the corresponding contract transaction in the block, he can change the current contract state by submitting a malicious transaction, thus having the opportunity to deploy an attack in advance.This vulnerability is a vulnerability that arises at the blockchain system level.
Timestamp dependency: A security vulnerability introduced by smart contracts that use strict block timestamps in their code to make important control flow decisions.The block timestamp is the timestamp when the block belonging to the current contract call transaction is packed.Block timestamps can be manipulated by miners within a certain range of fetching values [38].The precise timestamp is used as an important decision parameter in the contract, and while it is somewhat non-defeatable to a normal attacker, a miner-identified attacker can easily circumvent the limitations of the contract's timestamp design by constructing a malicious timestamp with a range of values.This vulnerability also occurs at the blockchain system level.

Smart Contract Source Code, Bytecode, Opcode
Smart contract development is similar to traditional software development in that a high-level language is used to code it.There are currently more than 40 platforms that support the deployment of smart contracts, all with corresponding contract development languages.Most of these platforms use the Solidity language for development, such as the Ethereum blockchain platform.The programming languages used to develop smart contracts are Vyper, Idris, Rust, etc., in addition to Solidity [39].The developed smart contract source code needs to be compiled before it can be deployed on the EVM.Program compilation is the translation of program code written in a high-level language into machine code that can be used for machine computation.The binary code used for execution in a smart contract is called bytecode (or EVM code), and EVM completes the execution of the contract by decompiling the bytecode into the corresponding opcode.The Ethereum smart contract is compiled by the compiler, which also generates an application call interface (ABI) for the blockchain to parse out the function selector to implement the contract function calls.We used the Remix tool [40] and Opcode-tool [41] provided by the Ethereum platform to compile source code to bytecode and decompile bytecode to opcode, respectively.Figure 1 shows the relationship between a piece of smart contract source code after compiling and decompiling to generate parts of bytecode and opcode.A bytecode is a string of hexadecimal digit-encoded byte arrays, which are parsed in units of one byte, each of which represents an EVM instruction or an operation data [31].The bytecode contains a total of three parts: deployment code, runtime code, and auxdata.When EVM creates a contract, it creates the contract account first and then runs the deployment A bytecode is a string of hexadecimal digit-encoded byte arrays, which are parsed in units of one byte, each of which represents an EVM instruction or an operation data [31].The bytecode contains a total of three parts: deployment code, runtime code, and auxdata.When EVM creates a contract, it creates the contract account first and then runs the deployment code.After running, the runtime code and auxdata are stored on the blockchain, and finally, the deployment of the contract is completed by associating the storage address of both with the contract account.After deployment, the real execution is the runtime code, so only the runtime code needs to be decompiled to get the contract opcode.Currently, EVM has more than 140 instructions, including arithmetic, comparison, bitwise, cryptographic, stack, memory and storage, and jump instructions, Table 2 summarizes the commonly used Ethereum opcode assembly instructions.By consulting the EVM instruction table [42], we can analyze the bytecode and the opcode in Figure 1: the EVM reads in the hexadecimal numbers of the bytecode "60 80 60 40 . . .", and first parses the first byte "60", which corresponds to the "PUSH1" operation in the operation instruction table, because the PUSH1 instruction requires an input parameter, followed by "80", which is the parameter of the PUSH1 instruction, representing the pressing of a byte value into the stack at address 0 × 80, and then continues to parse the subsequent code.For the following three reasons, in this study we chose to detect vulnerabilities in smart contracts at the opcode level.(1) The source code is human-written, and the function names in the code are also human-specified.When the function names are changed, the uncertainty in making calls to the functions will affect the detection results, while the source code is long and contains more blank lines and comments, making it difficult to obtain a better feature representation.(2) The bytecode is not readable and it is difficult to directly extract the code syntax structure or the sequence information corresponding to the corresponding function.(3) The opcode is based on more than 140 operation instructions specified by EVM, which can accurately reflect the inner logic of the contract and ensure the reliability of the vulnerability detection model.During the actual dataset collection, we found that more than half of the Ethereum-based smart contracts were not open source, and only bytecode binary files were available, so we decompiled the bytecode into opcode to support model training.

Method
To address the problems of poor detection, incomplete automation, and slow detection in traditional detection methods for smart contract vulnerability detection, the goal of this paper was to design a method that can accurately and automatically detect whether a smart contract contains multiple vulnerabilities.The multi-label vulnerability detection framework proposed in this paper consists of two parts, as shown in Figure 2: the data pre-processing module and the vulnerability detection model.The data pre-processing starts with converting the smart contract bytecode into opcodes using the Opcode-tool, followed by converting the opcodes into a word embedding matrix for input to the neural network model using the word embedding model Word2Vec, while we add multi-label vectors to the generated word embedding matrix using Oyente, a static detection tool based on symbolic execution, and the word embedding

Data Preprocessing
In the field of natural language processing (NLP), syntactic analysis, dependency analysis, lexical analysis, and machine translation are indispensable for processing text data.Ordinary text is unstructured data, and unstructured text needs to be "structured" for further analysis and understanding.Structured data makes it easier to extract semantic features in both machine learning and deep learning domains.Word embedding is the main and common approach to solving this problem, which involves embedding a high-dimensional space of all words into a continuous vector space of lower dimensionality, where each word or phrase is mapped to a vector over the real number field [43].Commonly used word embedding methods are Word2Vec, GloVe, etc.The GloVe is based on traditional statistical methods and does not use neural networks, and the computation is relatively complex, and common Python toolkits do not integrate GloVe.Smart contract opcodes are composed of operation instructions and opcodes, and each opcode can be regarded as a word, which belongs to the textual form of data.To be able to extract and accurately characterize the smart contract vulnerabilities, we chose the Word2Vec model for word embedding operations.

Feature Extraction
Among the more than 140 operation instructions specified by EVM, not all of the opcodes can affect the vulnerability detection results of smart contracts.Opcodes with similar functions can be grouped into the same category to avoid dimensional disasters and reduce computation.For example, by deleting the number 1 after the PUSH1 operation instruction, the partially simplified opcodes were obtained, and are shown in Table 3.

Data Preprocessing
In the field of natural language processing (NLP), syntactic analysis, dependency analysis, lexical analysis, and machine translation are indispensable for processing text data.Ordinary text is unstructured data, and unstructured text needs to be "structured" for further analysis and understanding.Structured data makes it easier to extract semantic features in both machine learning and deep learning domains.Word embedding is the main and common approach to solving this problem, which involves embedding a highdimensional space of all words into a continuous vector space of lower dimensionality, where each word or phrase is mapped to a vector over the real number field [43].Commonly used word embedding methods are Word2Vec, GloVe, etc.The GloVe is based on traditional statistical methods and does not use neural networks, and the computation is relatively complex, and common Python toolkits do not integrate GloVe.Smart contract opcodes are composed of operation instructions and opcodes, and each opcode can be regarded as a word, which belongs to the textual form of data.To be able to extract and accurately characterize the smart contract vulnerabilities, we chose the Word2Vec model for word embedding operations.

Feature Extraction
Among the more than 140 operation instructions specified by EVM, not all of the opcodes can affect the vulnerability detection results of smart contracts.Opcodes with similar functions can be grouped into the same category to avoid dimensional disasters and reduce computation.For example, by deleting the number 1 after the PUSH1 operation instruction, the partially simplified opcodes were obtained, and are shown in Table 3.The simplified opcodes were vectorized by the Word2Vec model, which is a lightweight neural network consisting of an input layer, a hidden layer, and an output layer.The input layer was the one-hot encoding of the opcode O i , and according to the simplified set of operational instructions M, each opcode has a number i ∈ {1, . . . ,|M|}, and the One-hot encoding of O i represents a vector of dimension |M|, where the i-th element value is 1 and the rest of the elements are 0, such as O 1 = [1, 0, . . . , 0]T .The hidden layer sets a linear activation function, and when the model is trained, the neural network weights are obtained, and the number of weights is the same as the number of nodes in the hidden layer.Only the weight corresponding to position 1 is activated, thus generating a new vector to represent the operation code O i .Equation (1) represents the feature matrix of the operation code of contract p.The word vector dimension in this study was set to 300, where i represents the i-th opcode in contract p, since the number of opcodes differs for each contract, max denotes the maximum number of contract opcodes in the whole contract dataset, and the vacant position is filled to 0 when i < max.

Multi-Label
Smart contract multi-label classification is defined as follows: given a dataset SC = x p , l p P p=1 = [X, L], where X = [x 1 , x 2 , . . . ,x P ] T ∈ R P×d , L ∈ R P×K , X refers to the feature space corresponding to the dataset, L refers to the label space of the dataset, N refers to the sample size of the dataset, d refers to the feature vector dimension size, and K refers to the dimension size of the label vector.For the sample x p , l p , x p refers to the feature vector corresponding to the p-th contract in the dataset; that is, Equation (1), and accordingly, the label vector l p of the p-th contract consists of K labels, denoted as l p = l p1 , l p2 , . . .l pK , when it contains the k-th label then it has l pk = 1, otherwise l pk = 1.
We collected a total of 5700 verified smart contracts from the Ether website and manually added labels to all contracts by the Oyente static detection tool.The model proposed in this paper detects five types of vulnerabilities, which are reentrancy, integer overflow, integer underflow, transaction order dependency, and timestamp dependency, so the above-K value is 5.A contract corresponds to five labels, and the value of each label is 0 or 1.When the value is 0, it means that the contract has no certain vulnerability, and when the value is 1, it means that there is a vulnerability of that type, and each vulnerability label is independent of the other.For example, the label of contract p is Lable p = [1, 0, 1, 0, 0], which means contract p has reentrancy and integer underflow vulnerability.

Vulnerability Detection Model
The vulnerability detection model for smart contracts is the core module, which mainly designs and trains specific network models for the vulnerability detection framework, and adjusts the parameters of the models and algorithms in the training process in conjunction with the requirements to obtain the detection model with optimal parameters.Recurrent neural networks (RNN are commonly used to process sequential data and are widely used in the field of NLP.As the number of network layers increases and the training time lengthens, the original RNN is prone to gradient disappearance and gradient explosion, and to solve this problem, S. Hochreiter and J. Schmidhuber proposed the long short-term memory neural network (LSTM) [44].
LSTM is an improved model based on a RNN, which can efficiently solve multiple learning problems associated with sequential data, better capture the dependencies between longer distance operands in contract sequences, and have better performance in handling long sequence data [45].The LSTM model is formed by connecting multiple LSTM cells.The cell structure of LSTM is shown in Figure 3. Compared with an RNN, which has only one transmission state, c t (cell state), LSTM adds a h t (hidden state), including two transmission statuses.The input gate, output gate, and forget gate in the LSTM cell jointly realize the functions of long-term and short-term memory, and together with the cell state c t constitute the LSTM cell.The input gate determines how much input information Z i is retained in the cell structure, the forget gate determines how much information Z f should be retained from the hidden state h t−1 at the previous moment to the current moment, and the output gate determines how much information Z o is output from the cell state c t .Each gate completes the tasks of forgetting and remembering by regulating the cell information.The specific calculation formula is as follows: (2) where h t−1 and x t are the transfer of the previous state and the current input respectively, and the transfer h  The data processing process of LSTM can be summarized mation by forgetting and remembering the information in the tra only realize one-way information transmission, and cannot be smart contract datasets.Therefore, for our model we selected a bid memory neural network (Bi-LSTM) [46] to better capture bidirect cies.The data processing process of one layer of Bi-LSTM is shown The data processing process of LSTM can be summarized as passing valuable information by forgetting and remembering the information in the transmission state, but it can only realize one-way information transmission, and cannot be analyzed when modeling smart contract datasets.Therefore, for our model we selected a bidirectional long short-term memory neural network (Bi-LSTM) [46] to better capture bidirectional semantic dependencies.The data processing process of one layer of Bi-LSTM is shown in Figure 4. Whether it is LSTM or Bi-LSTM, when processing the contract dataset, it can only calculate the opcodes in the forward or reverse order.This mechanism will calculate the time step t depending on the calculation result at the time of t − 1, and the performance will drop dramatically when processing data sequences that are too long, so we introduced the attention mechanism [47].As shown in Figure 5, its essence can be described as a query (query) to a series of (key-value) correct mappings.It focuses limited attention on important opcodes and assigns weights to them, thereby saving resources and quickly obtaining the most effective feature information.To highlight the important information implicit in the smart contract opcode, the attention layer first calculates the feature matrix output from the Bi-LSTM layer to obtain its implicit representation H, which is calculated as shown in Equation ( 9), and then calculates the similarity between H and the random initialization parameter matrix , normalizes the weight coefficients assigned to the input feature vector in the overall semantic scenario using the softmax function to obtain the weights α, and finally multiplies the weight matrix α with the feature matrix to obtain the implicit features Y of the opcode, as shown in the following equation.
where denotes the attention layer weight matrix and denotes the bias vector of the attention layer.After the data are processed by the Bi-LSTM layer and attention layer, the output is still high-dimensional data compared to the classifiable vulnerability classes, and the output dimensions and parameters of each network layer are shown in Table 4.We added a fully-connected layer to convert the output vector into the dimensionality of a multi-label vector.In contrast to the Bi-LSTM and attention layers, which extract features, the ful- Whether it is LSTM or Bi-LSTM, when processing the contract dataset, it can only calculate the opcodes in the forward or reverse order.This mechanism will calculate the time step t depending on the calculation result at the time of t − 1, and the performance will drop dramatically when processing data sequences that are too long, so we introduced the attention mechanism [47].As shown in Figure 5, its essence can be described as a query (query) to a series of (key-value) correct mappings.It focuses limited attention on important opcodes and assigns weights to them, thereby saving resources and quickly obtaining the most effective feature information.To highlight the important information implicit in the smart contract opcode, the attention layer first calculates the feature matrix X c output from the Bi-LSTM layer to obtain its implicit representation H, which is calculated as shown in Equation ( 9), and then calculates the similarity between H and the random initialization parameter matrix W T , normalizes the weight coefficients assigned to the input feature vector in the overall semantic scenario using the softmax function to obtain the weights α, and finally multiplies the weight matrix α with the feature matrix X c to obtain the implicit features Y of the opcode, as shown in the following equation.
where W a denotes the attention layer weight matrix and b a denotes the bias vector of the attention layer.Whether it is LSTM or Bi-LSTM, when processing the contract dataset, it can only calculate the opcodes in the forward or reverse order.This mechanism will calculate the time step t depending on the calculation result at the time of t − 1, and the performance will drop dramatically when processing data sequences that are too long, so we introduced the attention mechanism [47].As shown in Figure 5, its essence can be described as a query (query) to a series of (key-value) correct mappings.It focuses limited attention on important opcodes and assigns weights to them, thereby saving resources and quickly obtaining the most effective feature information.To highlight the important information implicit in the smart contract opcode, the attention layer first calculates the feature matrix output from the Bi-LSTM layer to obtain its implicit representation H, which is calculated as shown in Equation ( 9), and then calculates the similarity between H and the random initialization parameter matrix , normalizes the weight coefficients assigned to the input feature vector in the overall semantic scenario using the softmax function to obtain the weights α, and finally multiplies the weight matrix α with the feature matrix to obtain the implicit features Y of the opcode, as shown in the following equation.

= tanh( • + )
= softmax( • ) where denotes the attention layer weight matrix and denotes the bias vector of the attention layer.After the data are processed by the Bi-LSTM layer and attention layer, the output is still high-dimensional data compared to the classifiable vulnerability classes, and the output dimensions and parameters of each network layer are shown in Table 4.We added a fully-connected layer to convert the output vector into the dimensionality of a multi-label vector.In contrast to the Bi-LSTM and attention layers, which extract features, the ful- After the data are processed by the Bi-LSTM layer and attention layer, the output is still high-dimensional data compared to the classifiable vulnerability classes, and the output dimensions and parameters of each network layer are shown in Table 4.We added a fullyconnected layer to convert the output vector into the dimensionality of a multi-label vector.In contrast to the Bi-LSTM and attention layers, which extract features, the fully-connected layer serves to map the learned distributed feature representation into the sample label space.When processing sequential samples, the fully-connected layer maps the embedding space to the hidden space and then transforms the hidden space to the sample label space to achieve the classification effect, and in the multi-label classification task, the sigmoid function is used as the activation function of the output layer to model the classes' Bernoulli probability distribution, as shown in Equation (12), and also use binary cross-entropy loss function as the activation function to calculate the loss for each label of a sample and take the average as the final loss to solve the multi-label classification problem, as shown in Equation (13).For the vector output by the fully connected layer, the calculation results of each category are respectively input into the sigmoid function, and the value of the vector is mapped to the probability value between [0, 1].We set the threshold probability to 0.5, and if the probability value is greater than 0.5, it is determined that the sample contains this type of vulnerability, and the output is assumed to be [0.9, 0.1, 0.2, 0.3, 0.7], indicating that the sample contains reentrancy and timestamp dependency vulnerabilities.
where z is the output of the fully connected layer, l k is the true value (0 or 1) corresponding to the k-th class, and σ(z) is the output value corresponding to the model; that is, the probability value corresponding to each vulnerability label.

Dataset and Parameter Settings
By consulting relevant research materials, we found that there is currently no good open-source smart contract vulnerability dataset.Although some datasets have been opened in similar work, the integrity of smart contracts and their data labels cannot be guaranteed.The Ethereum official website obtains the verified smart contract bytecode files to build a reliable experimental dataset, and then uses Oyente, a static detection tool based on symbolic execution, to add multi-label vectors to the contract.There are a total of 5450 contracts with multiple types of vulnerabilities.After verification and deletion of 28 missing value contracts and 132 duplicate contracts, there were 5290 remaining vulnerability contracts.At the same time, in order for the model to better detect vulnerabilities, we also constructed 1000 contract datasets that did not contain vulnerabilities for experiments, and the ratio of training set to test set was 8:2.The number of contracts containing each vulnerability in the final dataset is shown in Table 5.The experiments were run in the official docker container of PyTorch and trained by using Nvidia RTX3070 graphics card.Since the parameter settings of different models are different, we used the same parameters as much as possible to ensure a fair evaluation of each model.The hidden layer dimension of RNN, LSTM, and BiLSTM was set to 128, and their stacking layers were set to 5 layers.The learning rate was uniformly set to 0.001, the epoch was set to 100, the optimizer used the Adam optimizer, the batch size was set to 64, the dimension of the input word vector was 300, and the maximum length of the word vector was 1000.The loss function used binary cross-entropy.

Evaluation Indicators
The evaluation indicators commonly used in the experiment included: recall, precision, F1-score, and accuracy.Based on the confusion matrix in Table 6, the calculation of each indicator was as the following equation shows: Accuracy = TP + TN TP + TN + FP + FN (17) where TN (true negative) represents the number of samples whose predicted value and true value are both negative, FN (false negative) represents the number of samples whose predicted value is negative and the true value is positive.TP (true positive) represents the predicted value and the number of samples whose true values are all positive classes, and FP (false positive) represents the number of samples whose predicted values are positive classes and the true values are negative classes.The experiment in this paper was multilabel classification, so we chose micro-recall rate (micro-R), micro-precision rate (micro-P), and micro-F1-Score (micro-F1) to comprehensively evaluate the multi-vulnerability detection effect of the model.The calculation equation was as follows: micro − R = TP TP × FN (18) micro − P = TP TP × FP (19) micro

Experimental Results
To verify the effectiveness of the model designed in this study, RNN, LSTM, LSTM-ATT, Bi-LSTM, and TextCNN [48] were used to compare with our model.The evaluation results of the six models on the experimental dataset are shown in Table 7.It can be seen that the multiple multi-label evaluation indicators of our proposed model are higher than those of the comparison model, which reflects the effectiveness of the WBL-ATT model in the field of multi-label detection of smart contract vulnerabilities.The higher micro-R value of RNN and the lower micro-P value is because the RNN model suffers from severe overfitting, and the model is more inclined to predict the vulnerability of the sample as the category of vulnerabilities in the training set, while other models micro-P values are higher than RNN, indicating that it would be wise for us to introduce sequential model LSTM to process opcodes.By comparing the indicators of LSTM and Bi-LSTM, it is proved that extracting reverse semantic information between contract opcodes is conducive to detecting contract vulnerabilities.In addition, we also found that the indicators of LSTM-ATT were greatly improved compared to Bi-LSTM.Combined with the analysis of the experimental process, it is concluded that the addition of an attention mechanism can make the model balance features with a small number of samples and focus on important feature data, which has a greater impact on vulnerability detection results than Bi-LSTM.TextCNN is an improved CNN model for handling NPL tasks, but the network structure of this model is relatively simple, the feature extraction ability is limited, and the effect performance is general.Its metrics are much lower than those of our model.Figure 6 shows the accuracy values of the six models for each of the five vulnerabilities.The accuracy value is the ratio of the number of correctly predicted samples to the total number of samples, and it can be found that the accuracy of WBL-ATT that the model we proposed for the detection of all five vulnerabilities exceeded that of the other models, with TOD and timestamp dependency being much higher than in the other models.Due to the small number of samples containing reentrancy and timestamp dependency vulnerabilities in the dataset, RNN and LSTM performed well, but in fact, because these two models had overfitting phenomenon, the models tended to predict the samples as not containing these two vulnerabilities.Comparing the LSTM and Bi-LSTM models, we found that the bidirectional LSTM model performed better than the unidirectional LSTM model in all vulnerabilities.The TextCNN performed poorly except for on the integer underflow vulnerability, which verifies that the TextCNN model is simpler and therefore weak in feature extraction.Table 8 shows the precision, recall, and F1-score of the five vulnerabilities detected by the six models respectively.The precision value can measure the number of correct predictions in the samples predicted by the model; the recall value is used to indicate how many of the actual samples were correctly predicted.The F1 value is an indicator used in statistics to measure the accuracy of the binary classification model.It takes into account the precision and recall of the classification model and can be regarded as a weighted average of the precision and recall of the model.From the table, it can be found that RNN, LSTM, and Bi-LSTM all had serious overfitting.This is because the number of samples of integer overflow, integer underflow and TOD in the training samples was much larger than the number of reentrancy and timestamp dependency samples.The model trained on unbalanced data was more in- Table 8 shows the precision, recall, and F1-score of the five vulnerabilities detected by the six models respectively.The precision value can measure the number of correct predictions in the samples predicted by the model; the recall value is used to indicate how many of the actual samples were correctly predicted.The F1 value is an indicator used in statistics to measure the accuracy of the binary classification model.It takes into account the precision and recall of the classification model and can be regarded as a weighted average of the precision and recall of the model.From the table, it can be found that RNN, LSTM, and Bi-LSTM all had serious overfitting.This is because the number of samples of integer overflow, integer underflow and TOD in the training samples was much larger than the number of reentrancy and timestamp dependency samples.The model trained on unbalanced data was more inclined to include all three types of vulnerabilities, so the recall values of these three types of vulnerabilities were all 1, while the recall value of the other two types of vulnerabilities with fewer samples was 0. The LSTM-ATT and WBL-ATT with the added attention mechanism could reasonably balance different vulnerabilities, reflecting that the attention mechanism can effectively enhance the generalization ability of the model.Although the WBL-ATT model also had a low accuracy of vulnerability classification with a small number of samples, the precision value on reentrancy was only 72.26%, which was lower than those of LSTM-ATT and TextCNN, but timestamp dependency reached 78.97%, much higher than the values of LSTM-ATT and TextCNN, and among the three samples with a large number of vulnerabilities, integer overflow, integer underflow, TOD, and WBL-ATT performed well, and the precision value was greater than that of TextCNN and LSTM-ATT.In general, the WBL-ATT model had good generalization performance and performed well on samples with a large number of vulnerabilities.In order to verify the reliability of our proposed Word2Vec model for processing opcode data, we used the NN Embedding layer that comes with Pytorch to process word vectors to convert opcodes, and input the WBL-ATT model together with our proposed Word2Vec feature matrix.The results of the four evaluation indicators are shown in Figure 7.It can be seen that the data extracted by the Word2Vec model was more beneficial to the subsequent detection results, but it could not achieve end-to-end model training, which provides our new future research direction.
code data, we used the NN Embedding layer that comes with Pytorch to process word vectors to convert opcodes, and input the WBL-ATT model together with our proposed Word2Vec feature matrix.The results of the four evaluation indicators are shown in Figure 7.It can be seen that the data extracted by the Word2Vec model was more beneficial to the subsequent detection results, but it could not achieve end-to-end model training, which provides our new future research direction.

Conclusions and Future Work
Smart contracts have expanded the application scenarios of blockchain, and the number of contracts on various blockchain platforms has increased exponentially.However, frequent smart contract vulnerability attacks not only cause heavy losses to users but also bring into question the reliability of blockchain platforms.In our work, we proposed an approach combining Bi-LSTM and attention mechanisms to implement multi-label vulnerability detection for smart contracts.For the Ethereum smart contract dataset, the bytecode was parsed to obtain the corresponding opcode, and the Word2Vec word embedding model was used to convert the opcode into a word embedding matrix for the neural network input, and then the Bi-LSTM was used to extract the semantic information between the front and back of the opcode, while the attention mechanism was added to assign weights to the important features, and finally the fully connected layer and sigmoid function were used to achieve multi-label classification.We tested our proposed multi-label vulnerability classification detection model with several vulnerability classification detection models on a self-made multi-label vulnerability dataset, and the results showed that our proposed multi-label vul-

Conclusions and Future Work
Smart contracts have expanded the application scenarios of blockchain, and the number of contracts on various blockchain platforms has increased exponentially.However, frequent smart contract vulnerability attacks not only cause heavy losses to users but also bring into question the reliability of blockchain platforms.In our work, we proposed an approach combining Bi-LSTM and attention mechanisms to implement multi-label vulnerability detection for smart contracts.For the Ethereum smart contract dataset, the bytecode was parsed to obtain the corresponding opcode, and the Word2Vec word embedding model was used to convert the opcode into a word embedding matrix for the neural network input, and then the Bi-LSTM was used to extract the semantic information between the front and back of the opcode, while the attention mechanism was added to assign weights to the important features, and finally the fully connected layer and sigmoid function were used to achieve multi-label classification.We tested our proposed multi-label vulnerability classification detection model with several vulnerability classification detection models on a self-made multi-label vulnerability dataset, and the results showed that our proposed multi-label vulnerability classification detection model outperforms other models, indicating that our proposed approach performs well in handling the multi-label vulnerability detection task of smart contracts.
Although this research has contributed to the field of smart contract vulnerability detection, there are still limitations to a certain extent: the relationship between the code in a smart contract and different vulnerabilities is a key issue for vulnerability detection in smart contracts and requires further digging.The following aspects are planned to be optimized in future research work: (1) Locate the vulnerability of the contract that has detected the vulnerability, and find out which operation instructions cause the vulnerability of the contract.(2) Our method is only for vulnerability detection of smart contracts on the Ethereum platform, and the next step is to extend it to other blockchain platforms (such as Fabric, ESO) to achieve broader contract vulnerability detection.(3) With the advancement of technology in the NLP field, researchers have proposed more new frameworks, and we will try to use end-to-end technology to process opcodes to achieve vulnerability classification.
values at the top of the stack, add and push the result to the top of the stack.0 × MUL Pop the top two values from the stack, multiply and push the result to the top of the stack.0 × SUB Pop the values arg0 and arg1 from the stack in turn, and push the values of arg0-arg1 to the top of the stack.0 × ISZERO Pop the top value of the stack, if the value is 0, push 1 to the stack, otherwise push 0 to the stack.0 × CALLVALUE Get the transfer amount in the transaction.0 × POP Pop the top value from the stack.0 × MLOAD Pop the top of the stack and use the value as the index to load the 32 bytes after the index in memory to the top of the stack.0 × MSTORE Pop the values arg0 and arg1 from the stack in turn, and place arg1 at arg0 in memory.0 × JUMP Pop the top value of the stack and use this value as the destination address of the jump.0 × PUSH1 Put a 1-byte value on the top of the stack.0 × DUP1 Copy the first value in the current stack to the top of the stack.0 × SWAP1 Swap the first value and the second value in the stack.
matrix and multi-label vectors together form the dataset for model training and testing.The training set is fed into the vulnerability detection model for training to obtain a neural network model with high classification accuracy, and the model is finally validated on a test set.The vulnerability detection model consisted of a Bi-LSTM layer, an attention layer, a fully-connected layer and a sigmoid function to implement multi-label vulnerability detection for smart contracts classification.Electronics 2022, 11, x FOR PEER REVIEW 8 of 19 word embedding matrix and multi-label vectors together form the dataset for model training and testing.The training set is fed into the vulnerability detection model for training to obtain a neural network model with high classification accuracy, and the model is finally validated on a test set.The vulnerability detection model consisted of a Bi-LSTM layer, an attention layer, a fully-connected layer and a sigmoid function to implement multi-label vulnerability detection for smart contracts classification.
t and the current output y t of the next unit are updated through three gating units, W c , W i , W f , W o are weights, b c , b i , b f , b o are bias vectors, and σ is the activation function.are the transfer of the previous state and the and the transfer ℎ and the current output of the next unit a gating units, , , , are weights, , , , are bias tivation function.

Figure 6 .
Figure 6.Accuracy of five vulnerabilities detection.

Figure 6 .
Figure 6.Accuracy of five vulnerabilities detection.

Table 1 .
Existing smart contract vulnerability detection methods.

Table 4 .
The output and parameters of each network layer.

Table 5 .
Number of smart contracts.

Table 7 .
Model evaluation results.

Table 8 .
Precision, recall and F1-score of five vulnerabilities detection.