Smart Contract Vulnerability Detection Model Based on Multi-Task Learning

Efficient and rapid vulnerability detection is a key issue in the field of smart contract security. Most existing detection methods can only detect the presence of vulnerabilities in a contract and can hardly identify their types; furthermore, they scale poorly. To resolve these issues, in this study, we developed a smart contract vulnerability detection model based on multi-task learning. By setting auxiliary tasks to learn more directional vulnerability features, the detection capability of the model was improved to realize both the detection and the recognition of vulnerabilities. The model is based on a hard-sharing design and consists of two parts. First, the bottom sharing layer is mainly used to learn the semantic information of the input contract. The text representation is first transformed into a new vector by word and positional embedding, and then a neural network based on an attention mechanism is used to learn and extract the feature vector of the contract. Second, the task-specific layer is mainly employed to realize the functions of each task. A classical convolutional neural network was used to construct a classification model for each task, which learns and extracts features from the shared layer for training to achieve its respective task objective. The experimental results show that the model can better identify the types of vulnerabilities after adding the auxiliary vulnerability detection task. The model realizes the detection of vulnerabilities and recognizes three types of vulnerabilities. The multi-task model was observed to perform better than a single-task model and to be less expensive in terms of time, computation, and storage.


Introduction
The smart contract [1], a computerized transaction protocol that encodes contract terms in code and executes them automatically, long saw its practical application lag behind theoretical research owing to the lack of a trusted execution environment. In 2008, Satoshi Nakamoto proposed Bitcoin [2], a peer-to-peer (P2P) cryptocurrency system whose underlying blockchain technology is decentralized and provides a reliable execution environment for smart contracts. Since 2013, the emergence of blockchain-driven, Turing-complete application platforms (Ethereum [3], Hyperledger [4], etc.) has provided opportunities for the development of smart contracts, resulting in a sharp rise in the number of contracts. However, as an emerging technology, smart contracts have defects in both their programming languages and their execution systems [5]. First, developers use high-level languages to code smart contracts and implement various complex business logic. However, these high-level languages (e.g., Solidity [6]) are extremely error-prone, and the resulting programming errors may be exploited yet remain undetected until the contract is deployed on the blockchain. Second, smart contracts typically store and manage large amounts of financial assets and operate on an open network, where any user can join and view a contract without a trusted third party; this increases the risk of a contract being attacked, which can, in turn, bring substantial losses. Table 1 summarizes various attacks on smart contracts in recent years, each of which resulted in significant economic losses. In addition, because they run on distributed nodes of a blockchain platform, smart contracts cannot be modified after deployment. Therefore, the vulnerability detection [7] of smart contracts is essential prior to their deployment on a blockchain platform.
However, as most contract source codes are difficult to obtain, it is not practical to detect vulnerabilities only through smart contract source codes, nor to manually check whether a contract has vulnerabilities and identify their types.

Table 1. Attacks on smart contracts.

Attack                       Type of Vulnerability   Economic Loss                                   Time
The DAO attack [8]           Reentrancy              Direct costs of USD 60 million                  June 2016
The Parity Wallet hack [9]   Code injection          Over USD 280 million frozen                     July 2017
The ERC-20 campaign [10]     Integer overflow        Indirect losses running into billions of USD    April 2018

To resolve the security issues concomitant to smart contracts, researchers have developed various detection tools that can detect existing vulnerabilities. According to the technology employed, there are four main methods: symbolic analysis [11][12][13][14][15], formal verification [16][17][18][19][20][21][22][23], fuzzy testing [24][25][26][27], and other technologies [28][29][30][31]. Symbolic analysis is the most widely employed method; it converts uncertain inputs into symbolic values during program execution to facilitate execution and analysis. Symbolic execution can achieve a more accurate and comprehensive analysis of the program; however, it typically confronts problems such as path explosion caused by program branches and loops [32]. Formal verification primarily employs a rigorous, provable descriptive language or logic to describe the attributes and characteristics of the program, and constructs formal specifications using mathematical proof and reasoning to determine whether a security attribute meets expectations. However, formal verification entails a relatively low degree of automation and may report vulnerabilities that do not practically exist [33]. Fuzzy testing uses randomly generated test samples to trigger vulnerabilities in the program and monitors the program's abnormal behavior during execution [34]. This method can effectively detect smart contract vulnerabilities, but its precision is low, and it cannot detect contracts for which the source code and call-interface information are unavailable.
Other technologies mainly include program analysis and taint analysis, which are also widely used in program vulnerability mining and have a low cost but high false positive rate.
In recent years, machine learning (ML) methods applied to program analysis have become a new trend in the field of security detection; they offer a high degree of automation and break through the rule-based limitations of existing detection methods. ML-based methods can extract hidden features from massive data (not limited to a single information feature) and have good scalability, which has led researchers to introduce ML into smart contract vulnerability detection with meaningful results [35][36][37][38][39]. However, the existing ML-based smart contract detection methods still have a few shortcomings: (1) they can only distinguish between vulnerable and non-vulnerable contracts (a binary classification problem); (2) they require the source code of the smart contract, which limits their applicability; (3) they are not scalable, detecting only a few specific vulnerability types and not extending to others.
To resolve the above-mentioned issues, we developed a smart contract vulnerability detection model based on multi-task learning [40], an ML method that combines multiple learning tasks to complete multi-label classification and improve the accuracy and generalization ability of the model. By setting auxiliary tasks to learn more directional vulnerability features, the model improves the detection effect and realizes both the detection and the recognition of vulnerabilities.
The contributions of this paper are enumerated as follows:
• This paper proposes a smart contract vulnerability detection model and introduces multi-task learning into security detection. The model consists of two parts: (1) the bottom sharing layer, which uses neural networks based on the attention mechanism to learn the semantic information of input contracts and extract feature vectors; (2) the specific task layer, which uses the classical convolutional neural network (CNN) [41] to establish a classification network for each task branch. The captured features are learned from the sharing layer for detection and recognition to realize the detection of various vulnerabilities;
• The proposed multi-task learning model can effectively improve the precision of vulnerability detection in comparison with other methods. Compared to a single-task model, the multi-task model can complete multiple tasks at the same time, thereby saving costs in terms of time, computation, and storage. In addition, the model can be extended to support the learning and detection of new vulnerabilities;
• In this study, we collected and downloaded 149,363 smart contracts running on the real-world Ethereum network from the XBlock platform [42], used existing detection tools to detect and label them, and constructed an open-source dataset containing the labeling information. This dataset provides key attributes, including the address, bytecode, and source code, and can be employed for smart contract vulnerability detection research.
The remainder of this paper is organized as follows: In Section 2, the paper provides the necessary background information on smart contracts and their vulnerabilities and briefly describes the pertinent literature. In Section 3, the design of the smart contract vulnerability detection model based on multi-task learning is introduced in detail, and in Section 4, the experimental details and evaluation results are described in detail. Finally, Section 5 includes the discussion and conclusion of this paper.

Smart Contracts and Vulnerability
In this study, the authors used the smart contracts running on Ethereum [3] as the research object. Ethereum, one of the most popular blockchain platforms, provides a decentralized, Turing-complete Ethereum Virtual Machine (EVM) [43] to handle the execution and invocation of smart contracts through its dedicated cryptocurrency. The EVM is a stack-based machine with a 256-bit word size and a maximum stack depth of 1024 for performing transactions on Ethereum [16]. The business process is depicted in Figure 1. When a transaction is entered, it is internally converted to a message, which is then passed to the EVM for execution. When a smart contract is invoked, its bytecode is loaded and executed by the interpreter in the EVM. Unlike traditional applications, smart contract execution and transactions require gas, the unit used to pay miners, in cryptocurrency, for the computational cost of running the contract or transaction [44]. As code, smart contracts are compiled into bytecode for the EVM and executed on the blockchain platform through address invocation. As mentioned above, smart contracts have defects in their programming languages and execution systems. Researchers have collected this vulnerability information and created the Smart Contract Weakness Classification (SWC) Registry [45]. Considering the economic loss and harm caused by contract vulnerabilities in the real world, three classic and common vulnerabilities were selected for detection, namely the arithmetic vulnerability (SWC-101 [45]), reentrancy (SWC-107 [45]), and the contract containing an unknown address. They are described as follows.
(1) Arithmetic vulnerability: This type of vulnerability is also known as integer overflow or underflow, arithmetic problems, and so forth. It is very common because programming languages impose a fixed bit width on integer types, and it occurs when a result falls outside that range. For example, if a number is stored in the uint8 type, it is stored as an 8-bit unsigned number ranging from 0 to 255, and an arithmetic vulnerability occurs when an arithmetic operation tries to create a number outside of that range. The arithmetic vulnerability is one of the most common vulnerabilities in smart contracts, and malicious attackers have used it to steal large numbers of tokens, resulting in considerable economic losses.
(2) Reentrancy: The ability to call external contract code is one of the features of smart contracts, and contracts can send digital currency to external user addresses for transactions. Such calls to external contracts may cause reentrancy. An attacker uses a reentrancy vulnerability to perform a recursive callback of the main function and continuously carries out the "withdrawal" operation in the contract until the account balance of the contract is drained or the gas upper limit is reached. In the DAO attack of 2016 [8], a malicious attacker requested funds from the DAO contract several times before the contract balance was updated. The vulnerability was caused by a code error in which the developer failed to consider recursive calls. Although Ethereum resolved the attack with a blockchain hard fork, it still caused significant economic losses.
(3) Contract contains unknown address: Smart contracts are P2P computer transaction protocols. Thus, when a contract contains an unknown address, that address may be used for malicious activities. When this vulnerability occurs, the address must be checked. In addition, the code of the called contract must be checked for vulnerabilities.
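The uint8 wraparound described in item (1) can be simulated outside the EVM. The following Python sketch (an illustration, not Solidity code) shows both the unchecked modular behavior that causes SWC-101 and a SafeMath-style checked version:

```python
# Illustrative sketch: simulating uint8 wraparound arithmetic.
# In Solidity before 0.8.0, unsigned integer arithmetic wraps modulo 2**bits,
# which is the root cause of SWC-101 arithmetic vulnerabilities.

UINT8_MAX = 2**8 - 1  # 255

def uint8_add(a: int, b: int) -> int:
    """Add two uint8 values with EVM-style modular wraparound."""
    return (a + b) % (UINT8_MAX + 1)

def checked_uint8_add(a: int, b: int) -> int:
    """Safe addition: raise instead of silently overflowing (SafeMath-style)."""
    result = a + b
    if result > UINT8_MAX:
        raise OverflowError("uint8 addition overflow")
    return result

print(uint8_add(250, 10))  # wraps around to 4
```

An attacker exploiting such wraparound can, for instance, make a balance check pass with an absurdly large transfer amount, which is why checked arithmetic (or Solidity >= 0.8.0) is now standard practice.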

Methods for Smart Contract Vulnerability Detection
Automated vulnerability mining is an important area of software vulnerability mining. The authors examined the literature on five related types of research: symbolic execution, formal verification, fuzzy testing, other technologies, and ML-based methods.
(1) Symbolic execution: Oyente [11] was one of the earliest tools to use symbolic execution for detecting vulnerabilities in the source code or bytecode of smart contracts. It constructs the control flow graph, uses it to create inputs, and provides a symbolic execution engine for other tools. Osiris [12] improves on Oyente, using symbolic execution and taint analysis to detect vulnerabilities. Similarly, Manticore [15] analyzes contracts by executing symbolic transactions against the bytecode, tracking the contracts' states, and verifying the contracts. Maian [14] and Mythril [13] are also based on symbolic execution.
(2) Formal verification: Hirai et al. [46] used Isabelle/HOL to formalize contracts to prove their security. In addition, many methods have been developed to formalize smart contracts, such as ZEUS [18], which translates source code into the LLVM intermediate language, uses XACML to write validation rules, and then uses SeaHorn [47] for formal validation. Securify [22] and VerX [23] are two other mainstream formal verification tools.
(3) Fuzzy testing: Fuzzy testing has been widely used in the vulnerability mining of traditional programs; it attempts to expose vulnerabilities by executing the program with generated inputs. Echidna [24], published by Trail of Bits, is a complete fuzzy testing framework for analyzing smart contracts and simulation testing. ContractFuzzer [25] is another fuzzy testing scheme, which performs vulnerability detection by recording the instruction log during the execution of smart contracts. ILF [26] is a fuzzy testing scheme based on neural networks, which are used to generate better test cases.
(4) Other technologies: Program analysis and taint analysis are also commonly used in vulnerability detection. Program analysis determines the safety of a program by analyzing it to obtain its characteristics, while taint analysis marks key data and tracks its flow during program execution. SASC [28] is a smart contract vulnerability detection tool based on static program analysis that searches for control-flow characteristics to detect vulnerabilities through automatic analysis of the source code. Similarly, SmartCheck [29] and Slither [30] are detection tools based on program analysis. Sereum [31] uses taint analysis to trace data streams to detect vulnerabilities.
(5) ML-based methods: As mentioned above, the security of smart contracts has garnered public attention, and some achievements have been made in contract vulnerability detection using ML. TonTon Hsien-De Huang [35] proposed a method for the in-depth analysis of potential vulnerabilities that converts bytecode into an RGB image and then trains a CNN for automatic feature extraction and learning. Sun et al. [48] added an attention mechanism [49] to the CNN to further improve its accuracy in detecting vulnerabilities. Wesley et al. [36] used a long short-term memory (LSTM) neural network to learn vulnerabilities by sequential learning and realized a relatively fast detection of vulnerable contracts. However, these methods can only distinguish whether vulnerabilities exist; they are essentially binary classification models and cannot identify the types of vulnerabilities or detect multiple vulnerabilities. To realize the detection of multiple vulnerabilities, Momeni et al. [50] and ContractWard [37] used various ML algorithms (support vector machine, decision tree, random forest, XGBoost, AdaBoost, k-nearest neighbor, etc.) to establish an independent classification model for each vulnerability. The main difference is that the former used an abstract syntax tree to construct the vulnerability features of a smart contract, while the latter used an N-Gram language model [51] to extract bigram features from the simplified opcodes of smart contracts. Although these two methods detect multiple vulnerabilities, they remain binary classifiers built as separate models rather than a single unified model.
In addition, ESCORT [52] proposed a multi-output architecture that connects each vulnerability classification branch to a feature extractor based on a deep neural network (DNN), establishing a separate output for each vulnerability type and thereby detecting multiple vulnerabilities. This provided the preliminary idea for our model design. In this work, we propose a multi-task learning-based model for smart contract detection that not only detects the presence of vulnerabilities in a contract but also recognizes their types, that is, it performs multi-vulnerability detection.

Multi-Task Learning
In ML, one usually optimizes a model for a single specific metric: a single model is trained to perform one task, and it is then fine-tuned until its performance peaks. Although this single-model approach often achieves good results, focusing on a single task may ignore information from related tasks that could have helped the model perform better. In essence, related tasks carry useful information, and sharing representations between tasks enables the model to generalize better on the original task; this approach is known as multi-task learning [40].
The main purpose of multi-task learning is to jointly learn multiple related tasks so that the information contained in one task can be used by the others; by sharing this information, the learning efficiency and generalization ability of the model can be improved. With this method, smart contracts are treated as a special kind of long text sequence, and natural language processing (NLP) techniques are borrowed to process them. Multi-task learning has been widely used in the field of NLP with notable achievements. Collobert et al. [53] proposed a CNN model based on multi-task learning to demonstrate the excellent performance of multi-task learning in NLP, including part-of-speech tagging, named entity recognition, and semantic role labeling. Niu et al. [54] proposed a character-level CNN model to normalize disease names based on multi-task learning and introduced an attention mechanism to optimize the model. Liu et al. [55] designed three information-sharing mechanisms based on LSTM and used a task-specific sharing layer to model the text; their study found that sub-tasks could improve the performance of the main classification task. Yang et al. [56] proposed an attention-based multi-task BiLSTM-CRF model with embeddings from language models (ELMo) as vectors, which further improved entity recognition and normalization. In the current study, multi-task learning was incorporated into smart contract vulnerability detection, as it can improve the performance of the main task by sharing knowledge. By setting auxiliary tasks, the proposed model can learn more directional vulnerability features, improve the detection effect, and realize both the detection and the recognition of vulnerabilities.
There are two types of multi-task learning models: the hard parameter sharing method [57] and the soft parameter sharing method [58]. The former, illustrated in Figure 2, is the most common sharing strategy: different tasks share the whole model except for the output layers, which allows a general representation of multiple tasks to be trained at the same time and effectively reduces the over-fitting risk caused by limited training data. The latter, depicted in Figure 3, does not share the parameter structure directly; each task has its own model and parameters, and parameter similarity is encouraged by regularizing the parameters of similar parts of the models. In this study, the authors propose a multi-task learning framework based on the hard parameter sharing method, which shares the underlying parameters while keeping the top-level parameters of each task independent, thereby reducing the possibility of model over-fitting.
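As a rough illustration of hard parameter sharing, the following Python/NumPy sketch wires one shared bottom layer to two task-specific heads. The layer shapes, names, and plain linear layers are illustrative assumptions; the actual model uses the attention-based sharing layer and CNN heads described later.

```python
import numpy as np

# Hard-parameter-sharing sketch: one shared bottom layer feeds two
# task-specific heads (detection and recognition). Sizes are illustrative.
rng = np.random.default_rng(0)

class HardSharingModel:
    def __init__(self, in_dim=16, hidden=8, n_types=3):
        self.W_shared = rng.normal(size=(in_dim, hidden))      # shared by all tasks
        self.W_detect = rng.normal(size=(hidden, 1))           # detection head
        self.W_recognize = rng.normal(size=(hidden, n_types))  # recognition head

    def forward(self, x):
        h = np.maximum(0, x @ self.W_shared)              # shared representation (ReLU)
        detect = 1 / (1 + np.exp(-(h @ self.W_detect)))   # sigmoid: vulnerable or not
        logits = h @ self.W_recognize
        recognize = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        return detect, recognize

model = HardSharingModel()
d, r = model.forward(rng.normal(size=(2, 16)))
print(d.shape, r.shape)  # (2, 1) (2, 3)
```

Because both heads read the same shared representation, gradients from either task would update `W_shared`, which is exactly the knowledge-sharing effect hard sharing is meant to provide.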

The Source Codes, Bytecodes, and Operation Codes of Smart Contracts
Deploying a smart contract on a blockchain platform entails three steps. First, developers use a high-level language (e.g., Solidity [6]) to code the smart contract; second, the source code is compiled into bytecode with a compiler; third, the bytecode is uploaded to the EVM via an Ethereum client and translated into operation codes (opcodes). However, because contract source code is often unavailable and contains many developer-defined variables, it is not appropriate to analyze smart contracts using source code. Furthermore, smart contracts run on blockchains in the form of bytecode, which is easier to obtain. The relationships among the source codes, bytecodes, and opcodes of smart contracts are illustrated in Figure 4. The three text frames, from left to right, represent the three code types. The leftmost frame shows the source code of a smart contract, written in the high-level programming language Solidity, where ** is the exponentiation operator. The middle frame shows the bytecode, a sequence of hexadecimal digits in which each byte corresponds to a specific operation code; for example, the byte 60 in the first line corresponds to the opcode PUSH1 in the first line of the rightmost frame, which shows the opcodes. When processing smart contracts, the source code is first compiled into bytecode and then translated into opcodes, which are used as the data in the model.
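The byte-to-opcode correspondence can be sketched as a small disassembler. The opcode table below covers only a handful of real EVM instructions for illustration; a full disassembler would include the entire instruction set.

```python
# Minimal sketch: disassembling a short EVM bytecode fragment into opcodes.
# Only a tiny subset of the real EVM opcode table is included here.

OPCODES = {
    0x60: ("PUSH1", 1),   # followed by a 1-byte operand
    0x52: ("MSTORE", 0),
    0x01: ("ADD", 0),
    0x00: ("STOP", 0),
}

def disassemble(bytecode_hex: str) -> list:
    code = bytes.fromhex(bytecode_hex)
    ops, i = [], 0
    while i < len(code):
        name, operand_len = OPCODES.get(code[i], (f"UNKNOWN_{code[i]:02x}", 0))
        ops.append(name)
        i += 1 + operand_len  # skip the operand bytes of PUSH instructions
    return ops

print(disassemble("6080604052"))  # ['PUSH1', 'PUSH1', 'MSTORE']
```

The fragment `6080604052` is the familiar start of Solidity-compiled bytecode (PUSH1 0x80, PUSH1 0x40, MSTORE), showing how each byte maps one-to-one to an opcode.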

Data Acquisition
In this study, we collected and downloaded 149,363 smart contracts running on the real-world Ethereum network from the XBlock platform, with nine contract attributes, namely the address, contract code, timestamp, create value, create block number, created transaction hash, creation code, creator, and code [42]. Next, the data were cleaned with data-cleaning scripts to remove redundant, repeated, invalid, and vacant entries. We then used vulnerability detection tools to label all source code files and obtain the labeled data, including whether a contract has vulnerabilities (Flag) and the specific types of vulnerabilities (Label). Specifically, each contract has four tags, which are independent of each other for each type of vulnerability. Each label is a four-dimensional vector [x1, x2, x3, x4], where each element xi (i = 1, 2, 3, 4) has a value of 0 or 1. If xi = 1, the smart contract has the i-th vulnerability; otherwise, it does not. In particular, x1 represents the label "Flag", which indicates whether the smart contract has any vulnerability. The labels were assumed to be correct. Finally, the dataset constructed in this study contains 141,387 smart contracts and four kinds of tag information. A few samples of the smart contract dataset are presented in Table 2. Bytecodes are byte arrays encoded as hexadecimal digits, representing a specific sequence of operations and parameters; however, analyzing and modeling such long byte sequences consumes a large amount of memory, so it is impractical to use bytecode directly as model input. The bytecode representation of a smart contract has a one-to-one mapping to its opcodes, and thus, it is feasible to detect contract vulnerabilities at the opcode level. Therefore, the bytecode was converted into opcodes, which were used as the data in this research.
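The label construction described above can be sketched as follows. The vulnerability names and helper function are illustrative, not the dataset's actual field names:

```python
# Sketch of the four-dimensional label vector [x1, x2, x3, x4]:
# x1 is the "Flag" (any vulnerability present); x2..x4 mark the three
# specific vulnerability types detected by the labeling tools.

VULN_TYPES = ["arithmetic", "reentrancy", "unknown_address"]  # illustrative names

def make_label(found: set) -> list:
    """Build [flag, x2, x3, x4] from the set of vulnerabilities found."""
    type_bits = [1 if v in found else 0 for v in VULN_TYPES]
    flag = 1 if any(type_bits) else 0
    return [flag] + type_bits

print(make_label({"reentrancy"}))  # [1, 0, 1, 0]
print(make_label(set()))           # [0, 0, 0, 0]
```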
According to the Ethereum Yellow Paper [16], there are 142 operation instructions covering 10 functional categories, including arithmetic operations, bit-wise logic operations, block information operations, comparison, stack, memory, storage, jump instructions, and so forth. Using all of these instructions could lead to a dimensionality problem caused by the presence of too many instructions. Fortunately, some operating instructions have little to do with the behavior of the source code, and their role can be neglected in vulnerability detection [48]. Accordingly, the opcodes were simplified by grouping opcodes with similar functions; in addition, the operand following each push instruction can be removed [37]. The simplification rules are shown in Table 3, and sample data of simplified opcodes are shown in Table 4.
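The simplification idea of grouping similar opcodes and dropping PUSH operands can be sketched as follows. The mapping table is a small illustrative subset, not the full rule set of Table 3:

```python
# Sketch of opcode simplification: merge functionally similar opcodes into
# one representative token and drop the operand after each PUSH, following
# the idea in [37]. The mapping below is an illustrative subset.

SIMPLIFY = {
    "PUSH1": "PUSH", "PUSH2": "PUSH", "PUSH32": "PUSH",
    "DUP1": "DUP", "DUP2": "DUP",
    "SWAP1": "SWAP", "SWAP2": "SWAP",
}

def simplify_opcodes(opcodes):
    out = []
    skip_operand = False
    for tok in opcodes:
        if skip_operand:            # drop the operand following a PUSH
            skip_operand = False
            continue
        mapped = SIMPLIFY.get(tok, tok)
        if mapped == "PUSH":
            skip_operand = True
        out.append(mapped)
    return out

print(simplify_opcodes(["PUSH1", "0x80", "PUSH1", "0x40", "MSTORE"]))
# ['PUSH', 'PUSH', 'MSTORE']
```

Collapsing the 32 PUSHn variants (and similarly DUPn, SWAPn) into single tokens is what keeps the opcode vocabulary, and hence the input dimensionality, small.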

Data Imbalance
As a typical anomaly detection problem, the number of smart contracts with vulnerabilities is far lower than the number of normal contracts, and the number of contracts with a specific vulnerability is lower still. Thus, this is essentially an imbalanced classification problem. Most current methods for handling data imbalance are based on resampling, including under- and over-sampling [59], whose main idea is to reduce the influence of imbalanced data on the classifier by adjusting the class proportions. In particular, the under-sampling method balances the data by reducing the number of samples of the majority classes, while the over-sampling method improves classification performance by increasing the number of samples of the minority classes.
As depicted in Figure 5, the distribution of vulnerability categories in the original dataset is highly unbalanced. Therefore, resampling this dataset involved under-sampling the contracts without vulnerabilities, reducing the number of majority-class samples, and over-sampling the contracts with vulnerabilities, increasing the number of minority-class samples, to balance the dataset. Table 5 shows the number of different contracts before and after sampling, and the balanced dataset is shown in Figure 6.
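A minimal sketch of the resampling step, assuming simple random dropping and duplication (the counts below are made up; the actual numbers are those reported in Table 5):

```python
import random

# Sketch: under-sample the majority (no-vulnerability) class and
# over-sample the minority (vulnerable) class to a common target size.
random.seed(42)

def resample(samples, target):
    """Return exactly `target` samples: drop extras or duplicate at random."""
    if len(samples) >= target:
        return random.sample(samples, target)                # under-sampling
    return samples + random.choices(samples, k=target - len(samples))  # over-sampling

majority = [("contract", 0)] * 1000   # label 0: no vulnerability
minority = [("contract", 1)] * 50     # label 1: vulnerable

balanced = resample(majority, 200) + resample(minority, 200)
print(len(balanced))  # 400
```

Over-sampling by duplication is shown here for simplicity; in practice it should be applied only to the training split to avoid leaking duplicated samples into the test set.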

Model Design
We developed a smart contract vulnerability detection model based on multi-task learning; the model framework is shown in Figure 7. An efficient multi-task network structure must account for both the shared part and the task-specific parts. It needs to learn a generalized representation across different tasks to avoid over-fitting, while also learning the unique characteristics of each task to avoid under-fitting [40]. Therefore, the model is divided into two parts: the bottom sharing layer and the top specific-tasks layer.

The Bottom Sharing Layer
The design of the bottom sharing layer determines how knowledge is shared among all tasks in the multi-task learning model. In multi-task learning, the shared content mainly includes features, instances, and parameters. Feature-based multi-task learning is mainly used to learn common features among different tasks. Instance-based multi-task learning mainly attempts to identify data instances in one task that are useful for other tasks and then shares knowledge through the identified instances. Parameter-based multi-task learning uses the model parameters of one task to help learn the model parameters of other tasks [40]. As mentioned above, the opcode sequence of a smart contract is regarded as a special kind of long text, and NLP techniques are used to convert the text representation into a numerical vector representation and then extract text features for the specific tasks. In this study, we therefore chose to implement the multi-task learning model by sharing features.
In addition to what knowledge to share, another problem in the design of the sharing layer is how to share it, that is, the specific way in which knowledge is shared between tasks. In feature-based multi-task learning, knowledge sharing between tasks is realized mainly by learning key features. Existing studies have shown that the attention mechanism can make a neural network focus on key features. Therefore, the proposed model constructs a sharing-layer network based on the attention mechanism to enable the model to learn the features of smart contract sequences; its architecture is depicted in Figure 8.

The bottom sharing layer uses word embedding and positional embedding to describe the smart contract opcode sequence. Word embedding converts each input opcode into a word vector, mapping the opcode sequence into a multi-dimensional space. To make the model aware of the order of the opcode sequence, positional embedding uses the position information of the words to make a secondary representation of each word in the sequence. Adding the position vector to the word encoding combines the word-order information with the word vector to form a new representation for subsequent calculation, which better expresses the distance between words and completes the description of the input sequence. Therefore, the proposed model can learn the information of the opcode sequence. Following [49], the positional encoding is calculated as follows:

PE(pos, 2i) = sin(pos / 10000^(2i/d_model)),  PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))

where pos is the position in the sequence, i is the dimension index, and d_model is the embedding dimension. After encoding by word embedding and positional embedding, the opcodes are input into a multi-head attention layer for learning and extracting features. When the model processes each word in the input sequence, the self-attention network of the multi-head attention layer [49] helps the encoder pay attention to all words in the input sequence and allows the model to view other positions in the sequence to achieve a better encoding. The input consists of queries and keys of dimension d_k and values of dimension d_v.
The output of the self-attention network can be computed as follows:

Attention(Q, K, V) = softmax(QK^T / √d_k) V

where Q, K, and V are the query, key, and value matrices formed by multiplying the input vectors by three weight matrices, respectively. We compute the dot products of the query with all keys, divide each by √d_k, and apply a softmax function to obtain the weights of the values [49].
Each head of the multi-head attention layer performs the same self-attention calculation described above. The difference is that each head i projects the input matrices Q, K, and V through different linear transformations, maintaining independent weight matrices per head and thus generating different attention matrices. Finally, the vectors generated by the heads are concatenated and projected, resulting in a matrix that incorporates the information of all attention heads. The multi-head attention network structure used by the model is shown in Figure 9; it improves the performance of the attention layer and expands the model's ability to focus on different positions. The output of the multi-head attention layer is computed as follows:

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O,  where head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)

The model sends the matrix output by the multi-head attention network to a feed-forward network and outputs the result of the feature-sharing layer. The network structure is depicted in Figure 10. The output matrix can be computed as follows:

FFN(x) = max(0, x W_1 + b_1) W_2 + b_2
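A NumPy sketch of the scaled dot-product attention defined above (all dimensions and the random inputs are illustrative):

```python
import numpy as np

# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V, as in [49].

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (seq_len, seq_len) similarity scores
    return softmax(scores) @ V        # weighted sum of the values

rng = np.random.default_rng(1)
seq_len, d_k, d_v = 5, 8, 8
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_v))
out = attention(Q, K, V)
print(out.shape)  # (5, 8)
```

A multi-head version would simply run this computation h times with different learned projections of Q, K, and V, then concatenate the h outputs and apply the final projection W^O.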

The Top Specific Tasks Layer
To implement the multi-task learning model, we first had to determine which tasks the model needs to perform. The model is mainly used for the vulnerability detection of smart contracts, so the tasks are divided into detection and recognition. Detection is essentially a binary classification task, so the classic CNN is used to build a binary classification network. The feature vector computed by the shared layer is taken as the input of the convolution layer, and the important information in the feature vector is extracted by the CNN. Next, focal loss [60] is used to calculate the task loss:

Loss_fl = -α_t (1 - p_t)^γ log(p_t)    (7)

where p_t = p for positive samples and p_t = 1 - p for negative samples, with p the probability that sample i is predicted to be positive. Focal loss is a modification of the cross-entropy loss function, introduced primarily to resolve the severe imbalance between positive and negative samples in target detection and to reduce the weight of the many simple negative samples during training. γ is an adjustment factor that controls the rate at which the weight of simple samples is reduced, focusing training on hard-to-distinguish samples and reducing the influence of easy ones; when γ = 0, the loss reduces to the cross-entropy function. α is a balancing factor used to balance the proportion, and adjust the importance, of positive and negative samples. Following [60], the default values γ = 2 and α = 0.25 were used.
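A small NumPy sketch of the focal loss of Equation (7), using the defaults γ = 2 and α = 0.25 from [60] (the function name and vectorized form are illustrative):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    # Focal loss for the binary detection task, following [60]:
    #   FL = -alpha_t * (1 - p_t)^gamma * log(p_t),
    # where p_t = p for positives and 1 - p for negatives, and alpha_t
    # balances the two classes. Clipping avoids log(0).
    p = np.clip(p, 1e-7, 1.0 - 1e-7)
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

Note that with γ = 0 and a balanced α the expression reduces to (scaled) cross-entropy, as stated above; increasing γ shrinks the contribution of confidently classified samples.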
The recognition task is essentially a multi-label classification task. As with the detection task, the classic CNN is used to build the task network. The difference is that multiple neurons are added to the last fully connected layer of the branch network, and the probability output of each sample is calculated through the softmax function. Finally, the cross-entropy loss function is used to calculate the task loss:

Loss_ml = -(1/N) Σ_i Σ_{c=1}^{M} y_ic log(p_ic)    (8)

where N is the number of samples, M is the number of categories, p_ic is the predicted probability that sample i belongs to category c, and y_ic is an indicator (0 or 1): y_ic = 1 if the true category of sample i is c, and y_ic = 0 otherwise. The design of the task-level branch network is depicted in Figure 11.

Multi-task learning is a method of inductive transfer: an ML approach that learns multiple related tasks together through a shared representation. Therefore, whether the tasks are correlated is an important question in model design. According to [40], if two tasks use the same features to make decisions, the tasks can be considered similar. The proposed model uses the same datasets, learns features from them, and the detection and recognition tasks share those features. Therefore, the two tasks can be considered related.
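Equation (8) can be checked with a few lines of NumPy (illustrative sketch; `recognition_loss` is a hypothetical name, and `p` is assumed to hold softmax probabilities with one-hot labels `y`):

```python
import numpy as np

def recognition_loss(p, y):
    # Multi-class cross-entropy, Equation (8):
    #   Loss_ml = -(1/N) * sum_i sum_c y_ic * log(p_ic)
    # p: (N, M) predicted probabilities; y: (N, M) one-hot labels.
    p = np.clip(p, 1e-7, 1.0)   # avoid log(0)
    return float(-np.mean(np.sum(y * np.log(p), axis=1)))
```

A perfect one-hot prediction gives a loss of 0, and a uniform guess over two classes gives ln 2, matching the usual cross-entropy behavior.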
In addition, in multi-task learning, multiple related tasks share information and optimize parameters through the mutual adjustment of the loss function, with feedback to each sub-task improving the model. The shared loss function is calculated as follows:

Loss = (Loss_fl + Loss_ml) / 2    (9)

The recoverable fragment of the algorithm pseudocode (Algorithm 1) is as follows:

    for b_t in D do                  // b_t is a mini-batch of task t
        Compute loss Loss(θ):
            Loss_fl(θ) = Equation (7) for the detection task
            Loss_ml(θ) = Equation (8) for the recognition task
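The loop above can be sketched as a single training epoch in Python. This is an illustrative skeleton only: the callables `forward`, `detection_loss`, `recognition_loss`, and `update` are hypothetical stand-ins for the real shared network, the task losses of Equations (7) and (8), and the optimizer step.

```python
import numpy as np

def train_epoch(batches, forward, detection_loss, recognition_loss, update):
    # One epoch of the multi-task loop from Algorithm 1 (sketch).
    epoch_losses = []
    for batch in batches:                        # b_t: a mini-batch
        features = forward(batch["x"])           # shared-layer features
        loss_fl = detection_loss(features, batch["y_det"])    # Equation (7)
        loss_ml = recognition_loss(features, batch["y_rec"])  # Equation (8)
        loss = (loss_fl + loss_ml) / 2.0                      # Equation (9)
        update(loss)                             # back-propagate and step
        epoch_losses.append(loss)
    return float(np.mean(epoch_losses))
```

Because both losses are computed from the same shared features, each gradient step adjusts the shared layer for both tasks at once, which is the mechanism by which the tasks regularize each other.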

Experiment and Analysis
To verify the efficacy of our model, we conducted four groups of experiments. Experiment 1 was a multi-task learning experiment that tested different parameter settings and carried out vulnerability detection. The remaining experiments were comparative. In Experiment 2, the model was compared with existing vulnerability detection tools to verify its detection performance against the baseline methods. In Experiment 3, the model was compared with other ML-based vulnerability detection methods, and Experiment 4 was an ablation experiment. Each experimental dataset was divided into a training set, validation set, and test set at a ratio of 7:2:1.
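The 7:2:1 split can be reproduced with a short helper (illustrative sketch; `split_dataset` and the shuffling seed are assumptions, as the paper does not describe its splitting procedure):

```python
import numpy as np

def split_dataset(samples, seed=0):
    # Shuffle and split into training/validation/test sets at a 7:2:1 ratio,
    # as used for every experimental dataset in this section.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    n_train = int(0.7 * len(samples))
    n_val = int(0.2 * len(samples))
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test
```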

Experimental Setup
All the experiments were carried out under the settings listed in Table 6. In addition, the Jieba word segmentation library and the word2vec and GloVe word vector training tools were used in the experiments.

Evaluation Metrics
We evaluated the performance of our model in terms of precision, recall, and F1 score; each metric is detailed as follows.
The counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) are the basis for calculating the other metrics: true values represent the number of correctly predicted results, whereas false values represent the number of wrongly predicted results [61]. The precision metric is the proportion of predicted positives that are actually positive, indicating the accuracy of the classifier's positive predictions [62]. The recall metric is the proportion of actual positives that are correctly classified. These two metrics are computed as follows:

Precision = TP / (TP + FP),  Recall = TP / (TP + FN)

The F1 score is also a commonly used metric that combines precision and recall to quantify the accuracy of the overall decision. It is defined as the harmonic mean of precision and recall:

F1 = 2 × Precision × Recall / (Precision + Recall)
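The three metrics can be computed directly from the confusion-matrix counts (a minimal sketch; the function name is illustrative):

```python
def classification_metrics(tp, fp, fn):
    # Precision = TP / (TP + FP); Recall = TP / (TP + FN);
    # F1 is the harmonic mean of precision and recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, 8 true positives with 2 false positives and 2 false negatives give precision, recall, and F1 all equal to 0.8.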

Experiment 1: Multi-Task Learning Model Experiments
Experiment 1 was an investigation into the influence of parameters on detection performance. Owing to the large number and wide range of variables, it was difficult to select the optimal value for each combination of variables. Therefore, the values of some variables were changed while determining the other variables to find a relatively good combination of parameters.
This model involves multiple parameters (see Table 7). In addition, this experiment involved setting the neural network training parameters, including the epoch count and batch size. Based on experience, we initially set a batch size of 32 and an epoch count of 150, focusing on the change in training accuracy as the training process converged. As is evident in Figure 12, the training precision and test precision first increased with the epoch count. When the epoch count reached 100, the precision began to stabilize. Therefore, the epoch count was set to 100 to achieve high precision without making the training time too long.

From Figure 13, it is evident that as the batch size increased, the training time decreased. When the batch size was greater than 64, the time spent in training did not change significantly. Furthermore, the training metrics fluctuated within a certain range as the batch size increased: precision decreased first and then increased, while recall and F1 score increased first and then decreased, reaching their highest values at a batch size of 64. By weighing the training time against the training metrics, the batch size was finally set to 64. The parameter settings used in this experiment are shown in Table 7.
With the above parameter settings, we completed the multi-task learning experiment on the constructed datasets. To determine the influence of the two main tasks on the model, the loss curves of the two tasks during training and validation were plotted, as shown in Figures 15 and 16. For the recognition task, the training and validation loss curves fit well throughout. For the detection task, the curves fit well in the initial stage, but as the number of iterations increased, the fit degraded and then fluctuated within a certain range. Figure 17 shows the loss curves of the multi-task learning model during training and validation; these two curves also fit well. The better the fit between the training and validation loss curves, the stronger the generalization ability of the model. It can, therefore, be asserted that multi-task learning can improve the generalization performance of each single task.

Experiment 2: Baseline Model Experiment
To verify the effect of the datasets on the baseline models, mainstream vulnerability detection tools (SmartCheck [29], Securify [22], and Mythril [13]) were employed as the baselines for comparative experiments. SmartCheck [29] is a contract analysis tool based on static program analysis. It performs grammatical and lexical analyses of source code, describes the resulting abstract syntax tree in XML, and then uses XPath [63] to check the security attributes of smart contracts, detecting reentrancy, timestamp dependency, and denial-of-service vulnerabilities. Securify [22] is a static contract analysis tool based on symbolic abstraction that defines compliance and violation patterns for each security attribute. Starting from the bytecode of the contract, Securify symbolically analyzes the contract's dependency graph, extracts precise semantic facts, and matches these facts against the patterns to obtain detection results. Securify can detect vulnerabilities such as reentrancy, unvalidated input, transaction order dependence (TOD), frozen Ether, unhandled exceptions, and more. Mythril [13] is a taint analysis-based smart contract security tool that integrates mixed symbolic execution, taint analysis, and control flow checking to detect vulnerabilities such as reentrancy, integer overflow, unknown calls, transaction order dependence, and timestamp dependence. As mentioned above, in this study, we selected reentrancy, arithmetic vulnerability, and the "contract contains unknown address" vulnerability for detection; the results are summarized in Table 8. It is evident from Table 8 that, compared to the baseline models, the vulnerability detection performance of our model is noticeably improved. For arithmetic vulnerability, the precision, recall, and F1 score of our model improved by 17.85%, 31.83%, and 24.91%, respectively, compared to Mythril.
For reentrancy detection, compared to SmartCheck, Securify, and Mythril, the precision of our model was enhanced by 28.68%, 19.46%, and 20.73%, respectively; the recall improved by 34.77%, 21.23%, and 26.14%, respectively; and the F1 score improved by 31.69%, 20.3%, and 23.26%, respectively. For the unknown address vulnerability, the detection performance of our model was even better than for the previous two vulnerabilities. However, as the baseline models do not support the detection of this vulnerability type, no comparison data can be provided.

Experiment 3: Comparison with ML Methods
ML-based smart contract vulnerability detection has been a hot topic in recent studies. Several representative ML methods were selected for comparison, including RNN [64], LSTM [36], and ABCNN (attention-based CNN) [46]. The results are summarized in Table 9. As can be seen from Table 9, on our dataset, the performance of our model has obvious advantages over RNN, LSTM, and ABCNN. Taking ABCNN, the best-performing contrast model, as an example, our model improves the detection precision for the three vulnerability types, namely arithmetic vulnerability, reentrancy, and "contract contains unknown address", by 3.63%, 10.75%, and 5.44%, respectively, and the recall by 10%, 14.07%, and 7.29%, respectively. The F1 score increased by 6.67%, 12.29%, and 6.29%, respectively, so detection performance improved on all three metrics. These promising values confirm that the shared-layer design can effectively capture the code characteristics of vulnerable smart contracts and that the multi-task learning design improves detection accuracy.

Experiment 4: Ablation Experiments
To analyze the performance difference between our model and a single-task learning model, the detection performance of the two sub-tasks, detection and recognition, was examined separately. Taking the detection task as an example, the network structure was kept unchanged, the recognition branch was frozen and its loss term removed, and the output of the feature network was input directly into the binary classification network. The recognition task experiment was set up analogously.
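The ablation setup can be expressed as a variant of the shared loss of Equation (9): freezing one branch removes its loss term, reducing the model to a single-task learner. The following sketch and its flag names are illustrative assumptions, not the paper's actual implementation.

```python
def ablation_loss(loss_fl, loss_ml, freeze_recognition=False, freeze_detection=False):
    # Ablation sketch: with one branch frozen, only the remaining task's
    # loss drives training; otherwise the shared loss of Equation (9) is used.
    if freeze_recognition:
        return loss_fl          # single-task detection
    if freeze_detection:
        return loss_ml          # single-task recognition
    return (loss_fl + loss_ml) / 2.0
```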
As can be seen from Tables 10 and 11, the performance of our model was significantly improved compared to the single-task model in terms of both vulnerability detection and recognition. For example, compared to the single-task model, the precision, recall, and F1 score of vulnerability detection improved by 4.15%, 5.34%, and 4.74%, respectively. The model also attained high accuracy in recognizing vulnerability types, especially reentrancy, with significant improvement in all indicators. The results show that knowledge sharing through multi-task learning can achieve better performance than a single-task model. To assess the computing resources occupied by multi-task learning, the back-end program was called to check the proportion of resources used in the single-task and multi-task model experiments. As illustrated in Figure 18, the VIRT and SHR used by the multi-task learning model were higher than those used by the single-task model, while the RES was slightly lower. In general, although the memory required by multi-task learning is higher than that of any single-task model, it is far less than the sum of the resources used by the single-task models. At the same time, the multi-task model can complete multiple individual tasks simultaneously in approximately equal time, consuming far less time than the sum of the times needed to complete the single tasks separately. These data show that multi-task learning can share features among multiple tasks so that they execute in parallel, saving storage resources and improving execution speed at the same time.

Figure 18. Resource usage comparison. Virtual memory usage (VIRT) indicates the total virtual memory requested by a process, including the libraries, code, and data used by the process, and represents the total space the process can access. Resident memory usage (RES) is the resident memory of a process; it accurately indicates the physical memory usage of the current process by counting only the memory occupied by loaded library files. Shared memory (SHR) indicates the size of the virtual memory (VIRT) that can be shared.

Discussion and Conclusions
Vulnerability detection in smart contracts is an important issue for blockchain security. To improve on the performance of conventional methods, a smart contract vulnerability detection model based on multi-task learning is presented. The model takes advantage of the multi-task learning architecture, which comprises a shared bottom layer and a task-specific layer. First, the opcodes of smart contracts are converted into word and position vectors by word embedding and positional embedding and combined into a new vector representation for subsequent processing. Then, a neural network based on a multi-head attention mechanism learns and extracts the feature vector of the contracts. In the task layers, the classic CNN is used for the two sub-tasks of detection and recognition, training on the features captured by the sharing layer to detect contract vulnerabilities and recognize their types.
As an emerging machine learning method, multi-task learning has attracted increasing attention for its efficiency and generalization, yet its application to the vulnerability detection of smart contracts is seldom reported, which makes our work valuable in this field. Moreover, the experimental results show the effectiveness of our method. In the comparative experiments, several mainstream detection tools, such as SmartCheck, Securify, and Mythril, and some ML-based methods, such as RNN, LSTM, and ABCNN, were compared with our model. The results (Tables 8 and 9) show that our model was superior to the contrast models in detection precision. In addition, the ablation experiment results (Tables 10 and 11) show that our model reduced memory and time consumption compared to the single-task models.
Compared to popular detection tools (e.g., Securify [22], SmartCheck [29], Mythril [13]), our model is based on machine learning, which can extract hidden features from massive data to improve detection accuracy. Meanwhile, such tools are usually designed around grammatical or symbolic analysis, which limits the vulnerability types they can detect. For example, neither SmartCheck nor Securify can detect the arithmetic or "contract contains unknown address" vulnerabilities, whereas our model can.
Compared to common ML-based methods (e.g., RNN [64], LSTM [36], and ABCNN [46]), our model takes advantage of the multi-task learning architecture, in which the sharing layer learns the semantic information and extracts the important features for the task layer, helping to improve detection efficiency. In addition, a multi-head attention mechanism is introduced in the sharing layer, which further enhances this efficiency.
As shown in Tables 10 and 11, compared to the single-task model, our model had an obvious advantage in detection performance. This is because multi-task learning can improve the performance of related tasks by sharing information, allowing the tasks to complement each other [40]. Typically, different tasks have different local optima, and escaping them is a tough issue in single-task learning; multi-task learning mitigates this by sharing the related parts of the tasks, which helps the model learn common features. Moreover, compared to single-task learning, this sharing mechanism saves both computing power and memory space, as also shown by the experimental results (Figure 18).
Furthermore, our research group performed vulnerability labeling for 149,363 smart contracts collected from the XBlock platform, generating a new labeled dataset that will be released soon. This dataset may benefit future research on smart contract vulnerability detection.
Despite the contributions of this research, some limitations remain. For example, this work adopted the hard parameter sharing architecture, which may weaken the expressiveness of the model and reduce its generalization ability [65]. Furthermore, the loss weights of the tasks are not determined automatically. In future work, further optimization of the multi-task learning architecture will be explored. For example, the soft parameter sharing method will be used instead of hard parameter sharing, and the shared bottom layer will be replaced by multiple hidden networks with their own parameters to improve efficiency. In addition, mechanisms for automatically weighting the loss function will be studied.

Data Availability Statement: Publicly available datasets were analyzed in this study. The data will be made available to the public soon.

Conflicts of Interest:
The authors declare no conflict of interest.