Incorporating Code Structure and Quality in Deep Code Search

Developers usually search for reusable code snippets to improve software development efficiency. Existing code search methods, including methods based on full-text retrieval or deep learning, have two disadvantages: (1) they ignore structural information of code snippets, such as conditional statements and loop statements, and (2) they ignore quality information of code snippets, such as naming clarity and logical correctness. These disadvantages limit the performance of existing code search methods. In this paper, we propose a novel code search method named Structure and Quality based Deep Code Search (SQ-DeepCS). SQ-DeepCS introduces a code representation method called program slice to represent structural information as well as API usage of code snippets. Meanwhile, SQ-DeepCS introduces a novel deep neural network named Method-Description-Joint Embedding Neural Network (MD-JEnn) to weight the quality of code snippets. To evaluate the proposed methods, we train MD-JEnn and evaluate SQ-DeepCS by searching for code snippets with respect to the top-rated questions from Stack Overflow. We use four evaluation indicators to measure the effectiveness of SQ-DeepCS: FRank, SuccessRate@k, PrecisionRate@k, and Mean Reciprocal Rank (MRR). The experimental results show that our approach can provide better results than existing techniques when searching for relevant code snippets.


Introduction
To develop software efficiently, software developers often find and reuse existing code snippets by searching over professional codebases, such as GitHub [1][2][3][4]. Driven by information needs, developers submit queries expressed in natural language and expect code snippets satisfying their needs. However, code snippets and natural language queries are heterogeneous, and thus it is hard to locate code snippets that meet users' intent [5].
Traditional information retrieval methods for code search are usually based on text vocabulary matching [6]. For example, Lv et al. [7] combined text similarity and API sequence matching and proposed an extended Boolean model named CodeHow. Linstead et al. [8] proposed Sourcerer, a code search tool that combines structural information with text vocabulary information using information retrieval techniques. Since code snippets and natural language queries have obvious heterogeneous characteristics [9,10], code snippets that can fulfill information needs do not necessarily contain the submitted query words or natural language words with similar semantics. As a result, the performance of traditional text vocabulary-based code search methods is greatly limited.
Gu et al. [11] brought joint embedding technology to code search to deal with the flaw and proposed a code search tool named DeepCS. The key idea of joint embedding technology is to transform heterogeneous inputs into shared vector space. With joint embedding technology, DeepCS embedded code snippets and natural language descriptions into a high-dimensional vector space. As a result, a code snippet and its corresponding description will occupy nearby regions of the space. By calculating the vector similarity of the embedded vectors of the code snippet and its corresponding description, code search tools can retrieve related code snippets that are more in line with users' expectations.
With the development of machine learning technology in recent years, many different methods have gradually sprung up in the field of code search. Sachdev et al. [2] proposed an unsupervised technique named Neural Code Search (NCS). NCS extracts specific keywords from code snippets and uses only the word embedding mechanism to obtain the vector of code snippets. Yao [12] used Tree LSTM to process the abstract syntax tree of code snippets and proposed a new code search method named At-CodeSM. These methods used different techniques to extract the semantic features of code snippets and completed the code search task by comparing the similarity between vectors of code snippets and natural language queries.
Although DeepCS showed quite good results on some datasets, we could still notice certain disadvantages. Firstly, DeepCS ignored certain structural information when representing code snippets, such as conditional statements and loop statements. Structural information reflects the execution order of code snippets, and thus is an essential part of code semantics [13]. DeepCS treated structural information as chain links, which ignores the semantics contained in the structure of code snippets, and thus limited the performance of code search.
Secondly, DeepCS ignored the quality information of code snippets, such as naming clarity and logical correctness. The quality of code from large codebases may vary. Take the two code snippets shown in Figure 1 as an example. The method name of the first code snippet cannot clearly reflect the purpose of the code, and the variable names in the second code snippet do not conform to naming conventions. Meanwhile, the purposes of the two code snippets are the same; that is to say, they should have a similar ranking order in the results returned by code search tools. However, when representing code snippets, DeepCS assigns the same weight value to all features, such as method names and tokens. As a result, the ranking order of the first code snippet is far lower than that of the second because of its incomprehensible method name.

The goal of this paper is to overcome the aforementioned problems and to improve code search performance. For this purpose, we propose a novel code search method named SQ-DeepCS. Firstly, we introduce a novel code representation method called program slice to preserve structural information and data information when representing code snippets. Program slice is a formal representation of the function body; it preserves structural information on the basis of the API linear sequence [11]. Secondly, we introduce the attention mechanism [11,14] to weight the quality of code snippets and propose a novel deep learning model, MD-JEnn. MD-JEnn is a bi-directional long-short term memory (BLSTM) based deep learning model that leverages the attention mechanism to weight the quality of code snippets. To evaluate the proposed methods, we train MD-JEnn and evaluate SQ-DeepCS by searching for code snippets with respect to the top-rated questions from Stack Overflow. We use four evaluation indicators to measure the effectiveness of SQ-DeepCS.
The experimental results show that our approach can provide better results than existing techniques when searching for relevant code snippets.

Recurrent Neural Network
In code search, code snippets and natural language queries need to be embedded into vectors so that their semantic similarity can be measured. Variable-length sequential data, such as code snippets and natural language queries, are often processed by recurrent neural networks (RNN). An RNN is composed of multiple neural network units and takes sequential data as input [15][16][17]. RNN has the ability to map a sequential input into a sequence of hidden states. Compared with an ordinary fully connected network, the current time step output of the neurons in RNN hidden layers depends not only on the input of the current time step but also on the output of the previous time step. This feature of RNN is particularly suitable for processing code [18].
With the length of the input sequence increasing, RNN faces the problem of long-term dependencies [19,20]. To alleviate this problem, Xu introduced the bi-directional long-short term memory network (BLSTM) [21]. BLSTM combined memory cells and RNN to preserve memory information. BLSTM controls selective memory and forgetting of information through three gating units: the forget gate, the input gate, and the output gate. The forget gate determines how much information from the previous time step can be preserved to the current time step. The input gate determines how many input signals will be fused, and the output gate controls how much memory is finally output. The specific calculation method is defined as follows:

i_t = σ(W_i x_t + V_i h_{t−1} + b_i)
f_t = σ(W_f x_t + V_f h_{t−1} + b_f)
o_t = σ(W_o x_t + V_o h_{t−1} + b_o)
g_t = tanh(W_g x_t + V_g h_{t−1} + b_g)

where i_t, f_t, o_t, g_t represent the input state, the forget state, the output state, and the unit state of the current time step; x_t is the input signal of the current time step; h_{t−1} represents the output signal of the previous time step; and W, V, b are the coefficient matrices of BLSTM to be trained. By training these weights, BLSTM can selectively ignore or strengthen the current memory c or input signals according to the current input signals and memory information.
In this way, BLSTM better learns the semantic information of long sentences. c and h are determined by:

c_t = f_t ⊙ c_{t−1} + i_t ⊙ g_t
h_t = o_t ⊙ tanh(c_t)

where c_t represents the memory signal of the current time step t and h_t is the hidden state at t. The above process can be simplified as:

h_t = tanh(W_D [h_{t−1}; w_t] + b_D), t = 1, 2, ..., N_D

where h_t, t = 1, 2, ..., N_D, are the hidden states of BLSTM, [x; y] ∈ R^{2d} is a concatenation operation that integrates x and y, and tanh is a commonly used activation function. w_t ∈ R^d represents the embedded representation of the natural language word w_t. W_D and b_D are the weight matrix and bias matrix of BLSTM. Experimental results show that BLSTM can outperform RNN when processing long sequential data in code search [11].
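The gate computations above can be sketched for a single time step. The following toy uses scalar signals and dictionary-keyed weights purely for readability; the weight layout is illustrative, not the exact parameterization used in MD-JEnn:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, V, b):
    """One LSTM time step on scalar signals (toy illustration).

    W, V, b map each gate name ('i', 'f', 'o', 'g') to a scalar weight.
    """
    i_t = sigmoid(W['i'] * x_t + V['i'] * h_prev + b['i'])    # input gate
    f_t = sigmoid(W['f'] * x_t + V['f'] * h_prev + b['f'])    # forget gate
    o_t = sigmoid(W['o'] * x_t + V['o'] * h_prev + b['o'])    # output gate
    g_t = math.tanh(W['g'] * x_t + V['g'] * h_prev + b['g'])  # unit (candidate) state
    c_t = f_t * c_prev + i_t * g_t   # memory: keep part of the old, fuse part of the new
    h_t = o_t * math.tanh(c_t)       # hidden state: gated output of the memory
    return h_t, c_t
```

A bi-directional variant runs this recurrence once left-to-right and once right-to-left over the sequence and concatenates the two hidden states at each position.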

Attention Mechanism
When embedding sequential inputs into a vector space, RNN assigns the same information weight to each input feature. However, since different features of code snippets have different qualities, it is necessary to give higher weight to the high-quality features. The problem of assigning different weights to different features has been extensively studied, and one of the most effective methods is the attention mechanism [14,[22][23][24][25].
The attention method contains a randomly initialized global attention vector α ∈ R^d. For a set of combined context vectors c_1, c_2, ..., c_n, the attention method calculates an attention weight α_i for each c_i as the normalized inner product between the context vector and the global attention vector α:

α_i = exp(c_i^T α) / Σ_{j=1}^{n} exp(c_j^T α)

which is the softmax function in its conventional form. According to the properties of the softmax function, the attention weights are positive and add up to 1, and each α_i can be trained to represent the importance of its combined context vector. Finally, a representative vector v ∈ R^d combines the characteristics of each context vector as their weighted average:

v = Σ_{i=1}^{n} α_i c_i

In the domain of code search, Alon et al. introduced the attention mechanism to weight the critical paths of codes [14]. In our work, we leverage the attention mechanism to address the problem of differing code quality.
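The weighting and aggregation described above can be sketched in a few lines of pure Python. This is a toy on plain lists, not the trained d-dimensional version used in the model:

```python
import math

def attention_pool(contexts, alpha):
    """Aggregate context vectors c_1..c_n into one representative vector v
    using a global attention vector alpha (pure-Python sketch)."""
    # inner product of each context vector with the global attention vector
    scores = [sum(a * c for a, c in zip(alpha, ci)) for ci in contexts]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]   # softmax: positive, sums to 1
    dim = len(alpha)
    # v is the weighted average of the context vectors
    v = [sum(w * ci[j] for w, ci in zip(weights, contexts)) for j in range(dim)]
    return v, weights
```

Context vectors that align better with the (trainable) global vector receive larger weights and dominate the pooled representation.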

Joint Embedding Mechanism
Joint embedding mechanism, which is also known as multi-modal embedding, is usually used to model the relation between two heterogeneous data sources [26]. Consider two heterogeneous datasets X and Y with some semantic association. The semantic association can be expressed as a mapping function f: X → Y. Since X and Y are heterogeneous datasets, different embedding technologies need to be used in order to embed X and Y into a unified vector space. The semantic relation between X and Y is measured by calculating the similarity of the two embedded vectors. The goal of joint embedding is to make semantically similar concepts across the two modalities occupy nearby regions of the space [27]. This process can be expressed as:

X →^φ v_X → S(v_X, v_Y) ← v_Y ←^τ Y

where φ, τ are the different embedding functions that transform X and Y into the same semantic space by setting the dimensions of the two embedded vectors v_X, v_Y to be equal, and S(v_X, v_Y) represents a similarity measure (e.g., cosine) to evaluate the matching degree of v_X and v_Y. In this way, the mapping function can model the semantic relation between the two heterogeneous datasets X and Y.
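The pipeline above can be made concrete with a minimal sketch. Here `embed` is a hypothetical stand-in for the two modality-specific encoders φ and τ (a deterministic bag-of-words hash embedding, chosen only so both modalities land in the same R^d), and cosine plays the role of S:

```python
import math

def cosine(u, v):
    """Similarity measure S over the shared vector space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def embed(tokens, dim=16):
    """Hypothetical stand-in for phi/tau: maps any token sequence to R^dim
    via a deterministic bag-of-words hash (illustration only)."""
    vec = [0.0] * dim
    for tok in tokens:
        vec[sum(ord(ch) for ch in tok) % dim] += 1.0
    return vec

# Heterogeneous inputs land in the same space, so they become comparable.
code_vec = embed(["os", "create", "write", "file"])        # from code tokens
desc_vec = embed(["write", "content", "to", "a", "file"])  # from a description
score = cosine(code_vec, desc_vec)
```

Training a real joint embedding replaces the hash encoders with learned networks and pushes matching pairs toward higher S than non-matching pairs.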

Code Representation
Existing code representation methods generally treat a code snippet as three parts: the method name, the API sequence, and the tokens. The API sequence is generated by traversing the AST of the code snippet and treats the calls to API functions in the code snippet as a chain. As a result, the API sequence ignores the semantics contained in the structure of the code snippet. To overcome this disadvantage, this section introduces a novel code representation method called program slice to preserve structural information when representing code snippets, as follows:

Method Name Representation: The method name of each code snippet is divided into a sequence of tokens. These tokens are split according to the corresponding naming conventions, such as camelCasing or under_scores, and the token sequence is then lowercased. For example, the method name 'Write2File' will be transformed into the token sequence {write, 2, file}.
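The splitting rule above can be sketched with a short helper. The regular expression is one reasonable way to separate capitalized words, all-caps runs, lowercase runs, and digit runs; it is illustrative rather than the paper's exact tokenizer:

```python
import re

# Per underscore-separated part: all-caps runs (acronyms), capitalized words,
# lowercase runs, and digit runs become separate tokens.
_CAMEL = re.compile(r'[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+')

def split_method_name(name):
    """Split a method name on camelCase and under_scores, then lowercase."""
    tokens = []
    for part in name.split('_'):
        tokens.extend(_CAMEL.findall(part))
    return [t.lower() for t in tokens]
```

For example, `split_method_name('Write2File')` yields `['write', '2', 'file']`, matching the example in the text.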
Program Slice Representation: Program slice extends the API sequence to preserve structural information of the method body [11]. A program slice is generated in two steps: (1) parsing the AST of a code snippet [28,29] and (2) applying static analysis to the AST [30]. The statements of a code snippet are processed as follows:
• For each variable declaration statement, program slice analyzes the corresponding variable type and adds the new keyword before the variable type. Taking Golang as an example, var a string is transformed to new string.
• For each method invocation, program slice replaces each object with the name of its class or struct, where A, C, E are the names of the class or struct for objects a, c, e.
• For each operation expression, program slice preserves operation symbols and result types. For example, a = 1 + 2 is converted to +int.
Token Representation: The method body of a code snippet is tokenized according to blank spaces and processed in the same way as the method name representation. Duplicated tokens and tokens in a stop word list are removed. Figure 2 shows an example of code representation extracted from a Golang method.
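The slice rules can be illustrated with a deliberately tiny, regex-based sketch. A real implementation would walk the AST; the patterns below only cover the exact shapes of the two examples given above:

```python
import re

def slice_tokens(stmt):
    """Map a single Golang statement to program-slice tokens.
    Toy sketch covering only the two rules illustrated in the text."""
    # Rule: variable declaration -> 'new' keyword + variable type
    m = re.match(r'var\s+\w+\s+(\w+)\s*$', stmt)
    if m:
        return ['new ' + m.group(1)]
    # Rule: integer operation expression -> operator + result type
    m = re.match(r'\w+\s*=\s*\d+\s*([+\-*/])\s*\d+\s*$', stmt)
    if m:
        return [m.group(1) + 'int']
    return []  # statement shapes outside the sketch are skipped
```

For instance, `slice_tokens('var a string')` yields `['new string']` and `slice_tokens('a = 1 + 2')` yields `['+int']`, mirroring the two examples above.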

Model
Existing code search methods take code snippets and the corresponding code annotations as input and use joint embedding technology to model the semantic relation between code snippets and annotations. When using RNN to embed sequential inputs into a vector with semantic information, each input has the same influential weight. However, since different features of code snippets, such as method name and program slice, have different qualities, it is necessary to give higher weight to the high-quality features. Thus, we introduce attention mechanism to weight the quality of code snippets and propose a novel deep learning model MD-JEnn.
MD-JEnn embeds code snippets and descriptions into hidden vectors using BLSTM. With the attention mechanism, the hidden vectors are integrated into vectors that respectively represent code snippets and descriptions. MD-JEnn then jointly embeds the two vectors into the same space to calculate their similarity. By this means, a code snippet and its corresponding description can be embedded into nearby vectors through iterative training. The details of our model are introduced in the following subsections. Figure 3 shows the overall structure of MD-JEnn, which is divided into three components: the description embedding module, the method feature embedding module, and the similarity module. Each of them corresponds to a part of the joint embedding. The following subsections describe the detailed design of these modules. Figure 4 shows the detailed structure of MD-JEnn.

Description Embedding Module
Description embedding module (DE-Module) embeds natural language code annotations into description vectors. The first sentence in the code annotation of a code snippet usually summarizes the entire code snippet. To obtain the embedded description vector, DE-Module processes a natural language description in the following steps.

Firstly, DE-Module takes a natural language description as input and outputs embedded feature vectors. In DE-Module, BLSTM considers a natural language description D = {w_1, w_2, ..., w_{N_D}} as a sequence that contains N_D words. BLSTM takes the description as input and calculates the hidden state at each time step, updating the hidden state h_t at time t by concatenating the input word w_t and the preceding hidden state h_{t−1}:

h_t = tanh(W_D [h_{t−1}; w_t] + b_D), t = 1, 2, ..., N_D

where h_t, t = 1, 2, ..., N_D, are the hidden states of BLSTM, [x; y] ∈ R^{2d} is a concatenation operation that integrates x and y, and tanh is a commonly used activation function. w_t ∈ R^d represents the embedded representation of the natural language word w_t. W_D and b_D are the weight matrix and bias matrix of BLSTM (both matrices are bi-directional). In this way, a description is embedded into N_D d-dimensional feature vectors.

Secondly, since some words in a description are more important than others, it is necessary to assign higher weights to these important words. For example, the words write and file are more important as they express the key semantics of the description write to a file. To give higher weights to important words, MD-JEnn introduces the attention mechanism to aggregate the embedded vectors of the description into a representative vector by calculating a scalar weight for each word vector. The individual vectors are aggregated into a representative description vector d via attention:

α_i = exp(h_i^T α) / Σ_{j=1}^{N_D} exp(h_j^T α)
d = Σ_{i=1}^{N_D} α_i h_i

where α ∈ R^d is randomly initialized, h_t, t = 1, 2, ..., N_D, are the hidden states of the previous BLSTM layer, and α_i is the attention weight of each h_i. The exponents in α_i form the softmax function in its conventional form. The vector d can be treated as the representation of the input description; it considers both lexical information and semantic information. In this way, DE-Module can emphasize the key words in a description.

Method Feature Embedding Module
Method feature embedding module (MF-Module) embeds code representations into code vectors. A code snippet can be represented as a code representation C = [M, S, K] using the code representation method described in Section 3, where M = m_1, m_2, ..., m_{N_M} is the sequence of N_M tokens representing the method name of the code snippet; S = s_1, s_2, ..., s_{N_S} is the program slice with N_S consecutive tokens; and K = {k_1, k_2, ..., k_{N_K}} is the collection of tokens that appear in the code snippet. Each part of the code snippet is embedded into partial embedding vectors. In order to highlight the high-quality parts of the code snippet, these partial embedding vectors are concatenated into a representative code vector with the attention mechanism. MF-Module processes the code representation C in the following steps.

Firstly, MF-Module embeds the method name M, a sequence of separated tokens, using BLSTM:

h_t = tanh(W_M [h_{t−1}; m_t] + b_M), t = 1, 2, ..., N_M

where h_t, t = 1, 2, ..., N_M, are the hidden states of BLSTM, m_t ∈ R^d represents the embedded representation of the tokens, and W_M and b_M are the weight matrix and bias matrix of BLSTM.

Secondly, MF-Module embeds the program slice S into N_S d-dimensional hidden vectors h_t:

h_t = tanh(W_S [h_{t−1}; s_t] + b_S), t = 1, 2, ..., N_S

where h_t, t = 1, 2, ..., N_S, are the hidden states of the BLSTM, s_t ∈ R^d represents the embedded representation of the tokens in the program slice, and W_S and b_S are the weight matrix and bias matrix of BLSTM.

Finally, as the tokens K are not strictly ordered, MF-Module embeds the tokens K by a fully connected layer:

h_t = tanh(W_K k_t), t = 1, 2, ..., N_K

where h_t, t = 1, 2, ..., N_K, are the embedding vectors of the tokens, k_t ∈ R^d represents the embedded representation of the tokens, and W_K is the weight matrix of the fully connected layer.
Since different features of code snippets have different qualities, it is necessary to give higher weight to the high-quality features. After embedding the three components (method name, program slice, and tokens) of a code snippet, MF-Module emphasizes the high-quality parts of the code snippet with the attention mechanism:

[h_1, h_2, ..., h_{N_H}] = [x; y; z]
α_i = exp(h_i^T α) / Σ_{j=1}^{N_H} exp(h_j^T α)
c = Σ_{i=1}^{N_H} α_i h_i

where [x; y; z] is a concatenation operation that integrates x, y, and z (the hidden vectors of the method name, the program slice, and the tokens), α ∈ R^d is randomly initialized, h_t, t = 1, 2, ..., N_H, are N_H d-dimensional vectors, and α_i is the attention weight of each h_i. In this way, the code vector c can be viewed as the final representation of the code snippet.
In summary, a code snippet is first processed into three features: the method name, the program slice, and the tokens. In the training phase, each feature of <method name, program slice, tokens> is embedded into a feature vector by BLSTM, BLSTM, and MLP, accordingly. These feature vectors are then concatenated into a code vector through an attention layer.
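The fusion step above can be sketched as follows, assuming the three per-feature hidden-vector sequences have already been computed and are given as plain Python lists:

```python
import math

def fuse_with_attention(name_vecs, slice_vecs, token_vecs, alpha):
    """Concatenate the hidden vectors of the three code features and pool
    them into one code vector c with a single attention pass (sketch)."""
    hidden = name_vecs + slice_vecs + token_vecs   # [x; y; z]: N_H vectors in R^d
    scores = [sum(a * h for a, h in zip(alpha, hi)) for hi in hidden]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]            # one weight per hidden vector
    dim = len(alpha)
    c = [sum(w * hi[j] for w, hi in zip(weights, hidden)) for j in range(dim)]
    return c
```

Because every hidden vector gets its own weight, a low-quality feature (e.g. the hidden states of an unclear method name) can be assigned small weights and contribute little to the final code vector c.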

Similarity Module
Joint embedding mechanism needs a similarity measure to form a unified vector space. To jointly embed the code vectors c and the description vectors d, the similarity module uses cosine similarity as the measurement:

cos(c, d) = c^T d / (||c|| ||d||)

where c and d are the code vector and the description vector. Since the similarity marks the degree of correlation, the similarity module aims to make semantically similar vectors occupy adjacent spatial regions. To summarize, MD-JEnn receives a <code, description> pair as input and calculates their cosine similarity cos(c, d). The similarity module is only responsible for calculating the similarity; the semantic relation between code snippets and natural language descriptions is obtained through the model training described in the next subsection.

Model Training
To train MD-JEnn, a training dataset containing triples <C, D+, D−> needs to be constructed, where D+ represents a similar description that describes the functionality of code C, and D− represents a dissimilar description. In principle, the similarity of C and D+ should be higher than that of C and D−. Therefore, the training objective can be expressed as:

L = Σ_{<C, D+, D−>} max(0, λ − m(C, D+) + m(C, D−))

where m(A, B) denotes the cosine similarity of A and B, L is the loss function, and λ is a threshold by which the similarity of the positive sample should exceed that of the negative sample. The advantage of the loss function L is that it does not force the classification of a single sample but learns the relation between samples. This method reduces the difficulty of building datasets. Figure 5 is an example of a training dataset. In each epoch of model training, the ranking loss encourages the cosine similarity between a code snippet and its correct description to go up and the cosine similarities between a code snippet and incorrect descriptions to go down. In this way, the semantic relation between code snippets and natural language descriptions is established.
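The per-triple hinge term can be sketched in a couple of lines. The margin value 0.05 below is an illustrative choice, not a value reported in this paper:

```python
def ranking_loss(sim_pos, sim_neg, margin=0.05):
    """Hinge ranking loss for one <C, D+, D-> triple: zero once the
    positive similarity beats the negative one by at least `margin`."""
    return max(0.0, margin - sim_pos + sim_neg)

# Summed over a (hypothetical) batch of (m(C, D+), m(C, D-)) pairs:
batch = [(0.9, 0.2), (0.4, 0.5), (0.6, 0.58)]
total = sum(ranking_loss(p, n) for p, n in batch)
```

Only triples that violate the margin contribute gradient, which is why the loss learns relative ordering rather than per-sample labels.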

Experimental Setup
In order to evaluate the performance of SQ-DeepCS, we constructed a codebase to train our model and verify the results. The codebase consists of Golang code snippets from GitHub repos. To ensure data quality, we only chose repos with more than 20 stars. For code snippets with comments, we treat the comments as the descriptions of the code snippets. The code snippets and descriptions are represented as triples <number, code representation, description representation>. Code snippets without descriptions are represented as <number, code representation, none>. The codebase consists of 218,072 triples with descriptions and 566,103 triples without descriptions. As in [11], we use the code snippets with descriptions for training and all the code snippets for result verification.
For a natural language query, SQ-DeepCS returns the top K most relevant code snippets calculated by the MD-JEnn model. The use of SQ-DeepCS consists of three steps: (1) offline training, (2) offline codebase embedding, and (3) online code searching. (1) Offline training: SQ-DeepCS is trained by the method described in Section 4 on the code snippets with descriptions in the codebase. Before offline training, the codebase is preprocessed using the method described in Section 3. The offline training parameters are as follows: all BLSTMs have 200 hidden units in each direction, and the dimension of word embedding is 100. MD-JEnn has two types of multilayer perceptron (MLP): the MLP for embedding individual tokens has 100 hidden units, and the MLP for combining the embedding vectors of different aspects has 400. We consider K as 1, 5, and 10 when returning the top K relevant results. The MD-JEnn model is trained via the mini-batch Adam algorithm with a batch size of 128. We limit the vocabulary to the 20,000 most frequently used words in the training dataset. We build our model on Keras, an open-source deep learning framework, and train models on a server with one NVidia 1080Ti GPU. The training lasts for nearly 13 h over 200 epochs.

Evaluation Questions
To evaluate the effectiveness of SQ-DeepCS, we establish an evaluation question set consisting of 45 top voted Golang programming questions collected from Stack Overflow. We apply the following criteria to choose the 45 questions from the list of top voted Golang questions on Stack Overflow: (1) The question should be a concrete and achievable programming task. There are various questions on Stack Overflow, but some are not related to programming tasks, for example, 'When is the init() function run', 'What should be the values of GOPATH and GOROOT', and 'Removing packages installed with go get'. We only retain questions related to concrete and achievable programming tasks and filter out other problems such as knowledge sharing and judgement.

Evaluation Metrics
We submitted the 45 questions to SQ-DeepCS and obtained the corresponding search results. In order to evaluate the results, five experienced developers were invited to determine whether each result can or cannot resolve the submitted question. The decisions made by the developers are binary. When the developers held different opinions on a result, they discussed it to reach an agreement.
We use four evaluation indicators to measure the effectiveness of SQ-DeepCS: FRank, SuccessRate@k, PrecisionRate@k, and MRR. All of them are widely used in the research domain of information retrieval and code searching.
FRank (also known as the best hit rank) is the rank of the first correct result in the query result list. The assumption behind FRank is that users often browse down from the first search result. The smaller the FRank of the search result, the less effort the users make to obtain correct results. We use FRank to evaluate the effectiveness of a single search and record the corresponding value of each query.
SuccessRate@k (also known as the success percentage at k) represents the percentage of queries for which at least one correct result exists in the top k ranked results [31-33]:

SuccessRate@k = (1/|Q|) Σ_{q=1}^{|Q|} δ(FRank_q ≤ k)

where Q is the set of all queries and δ(·) is a function which returns 0 if the input is false and 1 otherwise. SuccessRate@k counts the number of queries with correct results in the first k results of all the queries and then divides it by the total number of queries.
A good code search engine should help users find correct search results in a shorter time so as to save their query cost. The higher the SuccessRate@k, the better the code search performance.
Precision@k is a variant of accuracy that calculates the average quality of query results. In our evaluations, it is calculated as:

Precision@k = (1/|Q|) Σ_{q=1}^{|Q|} relevant_{q,k} / k

where relevant_{q,k} represents the number of search results related to the query statement in the top k results of the qth query. Precision@k is important because developers often inspect multiple results of different usages to learn from [34]. A better code search algorithm should allow less noisy results so that users can obtain more relevant results. In our experiment, the higher the Precision@k, the better the code search performance. For SuccessRate@k and Precision@k, we recorded their values when k is set to 1, 5, and 10. For MRR, we only recorded the value when k equals 10. MRR is the average of the reciprocal ranks over all queries. The reciprocal rank of a query is the reciprocal of the rank of the first correct search result. MRR is calculated by the following formula:

MRR = (1/|Q|) Σ_{q=1}^{|Q|} 1 / FRank_q

The idea behind MRR is the reciprocal of the ranking: the score of a first-ranked correct result is 1, the score of a second-ranked one is 0.5, and the score of an nth-ranked one is 1/n. In our experiment, the score of not finding (NF) a correct result in the top 10 returned results is recorded as 1/11. The higher the MRR value, the better the code search performance.
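The FRank-derived metrics can be computed with a short helper. This is an illustrative sketch in which `None` marks an NF query (scored as 1/11 as described above); Precision@k would additionally need per-query relevance counts and is omitted here:

```python
def metrics(franks, k_values=(1, 5, 10), nf_rank=11):
    """Compute SuccessRate@k and MRR from per-query FRank values.

    franks: list with one entry per query; an int rank of the first
    correct result, or None when no correct result is in the top 10 (NF).
    """
    n = len(franks)
    # SuccessRate@k: fraction of queries whose first correct hit is within k
    success = {k: sum(1 for r in franks if r is not None and r <= k) / n
               for k in k_values}
    # MRR: mean reciprocal rank, with NF scored as 1/nf_rank
    mrr = sum(1.0 / (r if r is not None else nf_rank) for r in franks) / n
    return success, mrr
```

For example, with hypothetical judged results `[1, 3, None, 7]`, SuccessRate@1 is 0.25, SuccessRate@5 is 0.5, and SuccessRate@10 is 0.75.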

Compared Method
We compare our method with DeepCS. DeepCS embeds a code snippet and the corresponding description into four vectors: the method name vector, the API sequence vector, the token vector, and the description vector. DeepCS uses BLSTM and maxpooling to embed method names, API sequences, and descriptions. For code snippet tokens, DeepCS simply uses an MLP. The vectors of method name, API sequence, and tokens are fused into one vector through a fully connected layer.
We improved DeepCS by introducing program slice and attention mechanism. To understand the impact of introducing program slice and attention mechanism, we designed several comparison methods based on DeepCS: (1) P-DeepCS: Program slice based DeepCS (P-DeepCS) uses program slice instead of API sequence in DeepCS. This is to explore the impact of introducing program slice. (2) A-DeepCS: Attention based DeepCS (A-DeepCS) uses attention layers instead of maxpooling layers in DeepCS. This is to explore the impact of introducing attention mechanism.
(3) PA-DeepCS: Program slice and attention based DeepCS (PA-DeepCS) uses program slice to represent the method body and uses the attention mechanism to weight the embedded vectors. However, when fusing the vectors of method name, program slice, and tokens, PA-DeepCS still uses a fully connected layer. (4) SQ-DeepCS: SQ-DeepCS is our proposed method. The key difference between PA-DeepCS and SQ-DeepCS is that SQ-DeepCS adopts an attention layer to concatenate the vectors of method name, program slice, and tokens.
We also formulate experiments in the context of the code search techniques mentioned in Section 1: (1) NCS: NCS extracts specific keywords, such as method names, method invocations, and enums, as semantic features of code snippets. NCS combines these keywords with fastText [35] and conventional IR techniques, such as TF-IDF. The encoder of NCS adopts an unsupervised training mode. (2) At-CodeSM: At-CodeSM uses Tree LSTM [36] to process the abstract syntax tree of code snippets. At-CodeSM extracts three features of code snippets: the method name, the tokens, and the AST. For the tokens of code snippets, At-CodeSM uses LSTM and the attention mechanism to obtain the token vector. When fusing the vectors of method name, AST, and tokens, At-CodeSM uses a fusion layer.

Table 1 shows the evaluation queries and the corresponding FRank values (DCS: DeepCS; AD: A-DeepCS; PD: P-DeepCS; PAD: PA-DeepCS; SQD: SQ-DeepCS; NCS and At-CodeSM are not given in the table due to space limitations). In Table 1, 'NF' represents Not Found, which means there is no relevant result in the top K results of the query (here we consider K as 10). An FRank value of 1 indicates that a user can obtain a reusable code snippet from the first search result. The number of queries with an FRank of 1 is 8 for DeepCS and 10 for SQD, and the number of 'NF' queries is 20 for DeepCS and 16 for SQD. Therefore, SQD outperforms DeepCS from the perspective of FRank. The number of queries with an FRank of 1 is 6, 7, and 9 for AD, PD, and PAD, and the number of 'NF' queries is 22, 21, and 16, respectively. These results show that AD and PD perform even worse than DeepCS, while PAD outperforms DeepCS slightly but is still worse than SQD. For NCS and At-CodeSM, the number of 'NF' queries is 20 and 17. In this regard, NCS is similar to DeepCS and slightly inferior to At-CodeSM, while SQD still maintains a good effect. The number of queries with an FRank of 1 is 7 for NCS and 9 for At-CodeSM, which means that SQD still performs well even when only the top search result is considered.
Figure 6 shows the box-plot of FRank for the compared approaches. The vertical axis represents FRank values from 1 to 11, where 'NF' is regarded as an FRank value of 11. The symbol '+' and the horizontal line in each box represent the mean and median of FRank. We can observe that the average FRank score of SQ-DeepCS is 6.09, which is lower than that of DeepCS (6.94). This result shows that, from the perspective of the ranking of the first useful search result, we improve on DeepCS by about 0.85. The average FRank scores of AD, PD, and PAD are 6.96, 7.24, and 6.51, and those of NCS and At-CodeSM are 7.09 and 6.22. The results show that, at the FRank level, the performance of SQD is ahead of the other methods.

Table 2 shows the overall accuracy of SQ-DeepCS and the related approaches. The results show that the performance differences between A-DeepCS, P-DeepCS, and DeepCS are not significant. For example, S@10 of PD is 0.533, which is 2.3% lower than that of DeepCS. From these results, we can see that introducing the program slice or the attention mechanism alone may not improve the performance of DeepCS. SQ-DeepCS outperforms PA-DeepCS in all indicators except for P@10. The key difference between PA-DeepCS and SQ-DeepCS is that SQ-DeepCS adopts an attention layer to concatenate the vectors of method name, program slice, and tokens. The result implies that considering the quality of different parts of code snippets can significantly improve the search performance. By using the attention mechanism when fusing the method name, program slice, and tokens of a code snippet, SQ-DeepCS gives higher weight to the high-quality parts, so as to further improve the search performance.

Results
For NCS and At-CodeSM, we can see that SQ-DeepCS again leads in the search performance evaluated by S@k, P@k, and MRR. Compared with NCS, the MRR of SQ-DeepCS is improved by 12.3%. In addition, the difference between PA-DeepCS and At-CodeSM is worth noting: PA-DeepCS outperforms At-CodeSM in most indicators. The key difference between them is the representation of code snippets. The result implies that the program slice expresses the semantic information of code snippets better than the AST. We speculate that this is because program slicing removes the special features in the AST and uses more generalized common features instead. In the training process, we use MRR as the indicator of model convergence: training stops when the MRR of the validation set has not improved for 30 consecutive epochs. Figure 7 shows the MRR on the validation set at each epoch of the training process. We can observe that SQ-DeepCS achieves better MRR over nearly all epochs and stops training at the 144th epoch, 26.5% earlier than DeepCS (196th epoch). The results show that our model converges faster than DeepCS. Compared with DeepCS, our model also has fewer parameters to train: DeepCS has nearly 11 million parameters, while our model only needs to train 8 million, which is 27.3% fewer. This is reflected in the training time per epoch, where DeepCS costs 140 s and our model only needs 100 s.
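The stopping rule used above (halt once validation MRR has not improved for 30 consecutive epochs) is standard early stopping. A minimal sketch, where `train_epoch` and `eval_mrr` are assumed callbacks for one training pass and one validation-set MRR evaluation:

```python
def train_with_early_stopping(train_epoch, eval_mrr, patience=30, max_epochs=500):
    """Run training epochs, stopping once validation MRR has not improved
    for `patience` consecutive epochs. Returns the best epoch and its MRR."""
    best_mrr, best_epoch, stale = 0.0, 0, 0
    for epoch in range(1, max_epochs + 1):
        train_epoch(epoch)          # one pass over the training data
        mrr = eval_mrr()            # MRR on the validation set
        if mrr > best_mrr:
            best_mrr, best_epoch, stale = mrr, epoch, 0
        else:
            stale += 1
            if stale >= patience:   # no improvement for `patience` epochs
                break
    return best_epoch, best_mrr

# Toy run: MRR improves for three epochs, then plateaus.
scores = iter([0.10, 0.20, 0.30] + [0.25] * 50)
epoch, mrr = train_with_early_stopping(lambda e: None, lambda: next(scores), patience=5)
print(epoch, mrr)  # 3 0.3
```

With `patience=30` this reproduces the rule in the text; the 144th-epoch stop then means the best validation MRR was reached around epoch 114.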

Discussions
We briefly discuss the performance of SQ-DeepCS and the other code search models in Section 5.4. We propose a new representation of code snippets named the program slice. Compared with the API sequence used in DeepCS and the AST used in At-CodeSM, the program slice contains more structural information of code snippets. The program slice extracts common features based on the AST, which is more in line with the programming idea behind code snippets and makes SQ-DeepCS perform better than At-CodeSM. The introduction of the attention mechanism enables different parts of a code snippet to obtain different weights: the attention mechanism weights the quality of code snippets and gives higher weight to the high-quality parts. Through the comparison in Section 5.4, we find that limitations still exist in our research. Although the overall performance of our model is better than that of the existing methods, for some individual natural language queries, the accuracy of DeepCS, NCS, and At-CodeSM exceeds that of our model. In the experiments, we also found that when the function described by a query is complex, the performance of all five methods is unsatisfactory. We speculate that this is because many of the Golang methods collected from GitHub in our dataset only set parameters or update intermediate variables. Such code snippets are less reusable and thus hurt search performance.

Conclusions
In this paper, we propose a novel code search method named SQ-DeepCS. SQ-DeepCS introduces a code representation method called the program slice to represent the structural information as well as the API usage of code snippets. Meanwhile, SQ-DeepCS introduces a novel deep neural network named MD-JEnn to weight the quality of code snippets. We train the model and search for code snippets with respect to the top-rated questions from Stack Overflow. We use four evaluation indicators to measure the effectiveness of SQ-DeepCS: FRank, SuccessRate@k, PrecisionRate@k, and MRR. The experimental results show that our approach provides better results than existing techniques when searching for relevant code snippets. Our future research may focus on extracting features of complex code snippets. Meanwhile, we also plan to study why the proposed method performs worse than existing methods on some simple code snippets.