Article

Early-Stage Graph Fusion with Refined Graph Neural Networks for Semantic Code Search

by Longhao Ao 1 and Rongzhi Qi 1,2,*
1 College of Computer Science and Software Engineering, Hohai University, Nanjing 211100, China
2 Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing 211100, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(1), 12; https://doi.org/10.3390/app16010012
Submission received: 24 October 2025 / Revised: 5 December 2025 / Accepted: 11 December 2025 / Published: 19 December 2025

Abstract

Code search has received significant attention in computer science research. Its core objective is to retrieve the most semantically relevant code snippets by aligning the semantics of natural language queries with those of programming languages, thereby improving software development quality and efficiency. As public code repositories continue to expand rapidly, accurately understanding and efficiently matching relevant code has become a critical challenge. Furthermore, while numerous studies have demonstrated the efficacy of deep learning in code-related tasks, the mapping between queries and code and the semantic correlations within code are often inadequately addressed, disrupting structural integrity and limiting representational capacity during semantic matching. To overcome these limitations, we propose the Functional Program Graph for Code Search (FPGraphCS), a novel code search method built on functional program graph construction and an early fusion strategy. By incorporating the abstract syntax tree (AST), data dependency graph (DDG), and control flow graph (CFG), the method constructs a comprehensive multigraph representation enriched with contextual information. Additionally, we propose an improved metapath aggregation graph neural network (IMAGNN) model to extract code features with complex semantic correlations from heterogeneous graphs. Through metapath-associated subgraphs and dynamic metapath selection via a graph attention mechanism, FPGraphCS significantly enhances search capability. Experimental results demonstrate that FPGraphCS outperforms existing baseline methods, achieving an MRR of 0.65 and an ACC@10 of 0.842, a significant improvement over previous approaches.

1. Introduction

With the rapid development of software engineering in recent years, the volume of code stored in public and private repositories has grown exponentially, reaching unprecedented levels. Research data indicate that developers typically spend approximately one-third of their development time retrieving relevant code snippets from large-scale codebases [1], because reusing or modifying existing code improves software development efficiency and quality while significantly reducing development costs. Although code search can partially meet developers’ needs, its effectiveness has not yet met expectations because of the inherent complexity and diversity of code. Research on code search has generally progressed through two stages: early information retrieval methods, which rely on text matching, and deep learning-based methods, in which both queries and code are embedded in a shared vector space. For example, Sourcerer [2] constructed a code search engine that supports both keyword-based and structure-aware retrieval by modeling and indexing the semantic structure of source code, enabling the retrieval of program entities and semantics. QECK [3] automatically extends queries through pseudo-relevance feedback (PRF), whereas QueCos [4] generates natural language descriptions related to the original query, thus enriching the query’s semantic information. These information retrieval-based methods [5] depend primarily on keyword matching; however, they fail to bridge the semantic gap between code and natural language queries, significantly limiting the accuracy of search results.
In recent years, the focus of research has shifted toward deep learning (DL), where natural language queries and source code are jointly embedded into a shared vector space and their semantic match is computed from similarity in that space. This approach not only removes irrelevant keywords but also learns the semantic mapping between queries and code snippets [6]. Although graph neural networks (GNNs) have demonstrated potential in the code search domain, several challenges remain. First, matching natural language queries with code snippets introduces heterogeneity: the complex relationships between different types of nodes and edges in heterogeneous graphs have yet to be effectively addressed, leaving rich semantic information underutilized [7]. Second, most existing methods extract only partial feature information from the code; for example, SPT-Code [8] and GGNN [9] encode abstract syntax trees and control flow graphs, respectively. These methods overlook global dependencies and structural information under query conditions, yielding semantically fragmented code representations and making it difficult to ascertain the true query intent.
To address these challenges, FPGraphCS introduces an early fusion strategy that combines AST, DDG, and CFG into a unified functional program graph for improved code feature extraction. The selection of AST, DDG, and CFG for early graph fusion is driven by their complementary strengths in representing critical aspects of code semantics. While the AST captures the syntactic structure, the DDG models data dependencies, and the CFG represents control flow, together they provide a more comprehensive view of code, enabling more precise capture of code features and semantic information. Given the heterogeneous nature of the edges in the multi-graph, an improved graph neural network [10] is employed to process different edge types, such as control flow, data dependency, and syntactic relationships, enhancing the model’s ability to understand the semantic structure of the code. Furthermore, by integrating metapath-associated subgraphs and a graph attention mechanism, we dynamically assign weights to different nodes, emphasizing key information while capturing temporal dependencies within the code, thereby improving the accuracy of code representation and enhancing query matching performance. To validate the effectiveness of our method, we conducted extensive performance evaluations using the Java dataset from the open-source CodeSearchNet [11] dataset.
The main contributions of this paper are summarized as follows:
1. We propose an early fusion strategy that integrates the AST, DDG, and CFG of code statements to construct a functional program graph, enhancing the representation of code features.
2. For code representation, we design IMAGNN, which constructs metapath-associated subgraphs to mitigate information loss and redundant aggregation, while substituting mean pooling for neighbor-level attention to reduce computational overhead without compromising representational fidelity on heterogeneous graphs.
3. Extensive empirical evaluation on the publicly available CodeSearchNet benchmark demonstrates that FPGraphCS attains statistically significant and consistent gains over state-of-the-art baselines, substantiating its semantic precision and robustness in code search.
The remainder of this paper is organized as follows: Section 2 provides a detailed background on existing methods in code search. Section 3 presents the proposed approach. In Section 4, we describe the experimental setup, performance evaluation, and comparisons with baseline methods using the CodeSearchNet dataset. Finally, Section 5 discusses the results and outlines future research directions.

2. Related Work

2.1. Text-Based Code Representation

Traditional text-based code representation methods [12] rely primarily on code attribute information [13], such as method names and keywords [14]. The seminal work DeepCS [15] pioneered the application of deep learning to code search by proposing a deep neural network model (CODEnn) that abandons direct similarity matching. Instead, it jointly embeds natural language descriptions and source code into a shared high-dimensional vector space, enabling semantically related code and descriptions to be represented by proximate vectors for similarity computation. Compared with conventional information retrieval techniques, this approach demonstrates superior generalizability and retrieval accuracy, marking a milestone in neural code search. NCS [16] subsequently leverages unsupervised word embedding training on code and comments, combined with TF-IDF weighted averaging of these embeddings, to derive composite vector representations for both code and annotations. Ranking is then performed on the basis of similarity scores, resulting in increased search efficiency.

2.2. Structural Feature-Based Code Representation

Structural feature-based approaches [17] exploit graph-structured representations of code, including abstract syntax trees (ASTs), data dependency graphs (DDGs), and program dependency graphs (PDGs). Niu et al. [8] argued that simplistic text-based methods lack deep structural comprehension. They introduced a sequence-to-sequence architecture augmented by a pretraining task predicting code-AST structures, helping the model acquire complex syntactic patterns to enhance semantic matching [18]. Similarly, Guo et al. [19] emphasized modeling data dependency flows among variables and incorporated auxiliary pretraining objectives, such as data flow edge prediction and node alignment, to mitigate structural complexity and reinforce control flow awareness. Zhao et al. [20] identified the limitations of uni-faceted code modeling and proposed an enhanced program dependency graph representation that preserves syntactic cues while augmenting data and control dependencies, thereby increasing flexibility and robustness in capturing intricate structural and semantic code features. Wan et al. [21] reported that single-modality features inadequately capture full semantic richness. They combined code tokens, ASTs, and control flow graphs (CFGs), employed LSTM, Tree-LSTM [22], and GGNN [9] encoders, respectively, and applied intermediate fusion techniques to achieve a comprehensive semantic representation. Gu et al. [23] further substantiated the efficacy of multimodal features, proposing AST simplification by pruning redundant nodes and unifying node labels, followed by a multimodal fusion strategy. Ling et al. [24] highlighted the limitations of conventional graph representations when dealing with complex or unstructured data and proposed a variable-centric fine-grained flow graph based on the LLVM intermediate representation (IR), which models precise variable interactions via integrated data and control dependencies. Liu et al.
[25] proposed a code search framework utilizing bidirectional gated graph neural networks (BiGGNNs) combined with multihead attention, emphasizing the global structural context and deep semantic understanding. While these approaches advance code representation from multiple perspectives, insufficient functional semantic expressiveness constrains their ability to capture complex syntactic structures and logical dependencies comprehensively. Moreover, they generally lack mechanisms to allocate representational weights dynamically among heterogeneous features. To overcome these deficiencies, this work proposes a novel functional program graph representation framework that aims to comprehensively encode code semantics and thereby improve matching accuracy.

2.3. Heterogeneous Graph Representation Learning Methods

The evolution of heterogeneous graph representation learning has progressed from shallow graph decomposition and non-metapath random walk methods [26], which employ relatively simple low-dimensional node embeddings suited for small-scale heterogeneous graphs [27], to advanced deep learning-based methods capable of modeling large and complex graphs [28]. Shallow methods are limited in their ability to capture higher-order semantic and logical structures inherent in large heterogeneous graphs [29].
The advent of graph neural networks has facilitated multilevel neighborhood aggregation and feature propagation mechanisms that significantly enhance the capacity to learn expressive node embeddings from heterogeneous graph structures. These models capture intricate logical relationships between nodes and edges, as well as multiscale semantic contexts, resulting in improved accuracy and generalizability of node representations.
Current state-of-the-art heterogeneous graph learning paradigms encompass relation-subgraph-based, metapath-subgraph-based, and non-subgraph-based approaches. Relation-subgraph methods decompose the heterogeneous graph by edge type into multiple relation-specific subgraphs, each emphasizing particular semantic relations [30]. These methods aggregate neighborhood features within each subgraph and apply residual connections to maintain gradient flow. Notably, relational graph convolutional networks (RGCNs) [31] instantiate this paradigm by designing relation-specific convolutional filters associated with distinct adjacency and weight matrices to model heterogeneous connectivity patterns. The aggregated information is weighted by relation type and transformed via nonlinear activation functions across layers [32]. Metapath-subgraph methods introduce metapaths to define semantic sequences across heterogeneous node types, enabling the capture of diverse semantic proximities through path-based aggregation. For example, the metapath aggregated graph neural network (MAGNN) employs dedicated encoders per metapath, aggregating endpoint and intermediate node semantic information to enrich node embeddings and reinforce comprehensive relational understanding. In contrast, non-subgraph-based methods embed heterogeneous graphs by projecting nodes of diverse types into a shared semantic space via linear transformations, followed by type-agnostic or type-aware neighborhood aggregation strategies.

3. Methodology

3.1. Model Framework

This study introduces FPGraphCS, a semantic code-search method that projects functional program graph representations of source code and tokenized natural-language queries into a unified vector space, thereby enabling similarity-based retrieval. The architecture, illustrated in Figure 1, comprises two principal modules: the code processing module and the query processing module. An early fusion strategy is employed to integrate multidimensional code features, thereby substantially improving both the accuracy and robustness of code retrieval [33]. FPGraphCS differs from representative baselines such as MAGNN, GraphCodeBERT, and DGMS/MMAN in both its graph construction and its heterogeneous message aggregation strategy. It constructs a functional program graph that performs early fusion of the AST, CFG, and DDG into a unified heterogeneous multigraph prior to neural encoding, and defines metapaths that follow control-flow, data-dependency, and syntactic relations between statement nodes rather than schema-based patterns. Within the IMAGNN encoder, intra-metapath neighborhood interactions are aggregated via mean pooling, while semantic-level attention across metapaths is retained, yielding a lightweight yet expressive architecture specifically tailored to semantic code search.
Within the code processing module, comprehensive multimodal representations are initially extracted from the source code repository, encompassing ASTs, DDGs, and CFGs. These heterogeneous graph modalities are subsequently unified into a comprehensive functional program graph representation. Multilevel feature extraction is then conducted via an enhanced metapath aggregation graph neural network. Specifically, node-level features of the program graph are first encoded via advanced word embedding techniques. Contextual feature aggregation is performed on metapath-associated subgraphs, yielding enriched semantic representations of code statement nodes. This process culminates in the generation of high-dimensional semantic vector embeddings of the source code [34].
On the query processing side, user queries are embedded into high-dimensional vector representations employing state-of-the-art word embedding methodologies. Leveraging the pretrained feature extraction model, FPGraphCS encodes queries into semantic vectors. These vectors are then compared against code embeddings from the repository through similarity metrics, enabling the system to rank and return the top-k most semantically relevant code snippets.
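The ranking step described above is nearest-neighbour retrieval under a similarity metric. A minimal sketch using cosine similarity over toy embeddings (the vectors and helper names are illustrative, not the paper's implementation):

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_snippets(query_vec, code_vecs, k):
    """Rank code embeddings by similarity to the query; return the top-k indices."""
    order = sorted(range(len(code_vecs)),
                   key=lambda i: cosine_sim(query_vec, code_vecs[i]),
                   reverse=True)
    return order[:k]

# Toy repository of three 2-d code embeddings and one query embedding.
codes = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.1]
result = top_k_snippets(query, codes, 2)  # [0, 2]: snippet 0 aligns best with the query
```

In practice the code embeddings would be precomputed offline from the repository, so query time is dominated by the similarity scan (or an approximate nearest-neighbour index for large corpora).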

3.2. Functional Program Graph

Definition: Given the control flow graph $G_{CF} = (V_{CF}, E_{CF})$, the data dependency graph $G_{DD} = (V_{DD}, E_{DD})$, and the abstract syntax tree $G_{AST} = (V_{AST}, E_{AST})$, where $V_{CF}$, $V_{DD}$, and $V_{AST}$ represent the sets of nodes in the CFG, DDG, and AST, respectively, and $E_{CF}$, $E_{DD}$, and $E_{AST}$ are the corresponding directed edge sets, the functional program graph $G = (V, E)$ is defined as the graph formed by the nodes and directed edges of these three graphs.
  • $V$ is the set of nodes, $V = V_{CF}$, with $V_{DD} \subseteq V$ and $V_{AST} \subseteq V$. Each node represents a statement in the code.
  • $E$ is the set of directed edges, where each edge is represented as $e = (v_i, v_j, r) \in E$, with $v_i$ and $v_j$ arbitrary nodes in $V$ and $r \in R = \{0, 1, 2\}$ the type of edge.
  • $r = 0$ indicates a control flow edge, $r = 1$ a data dependency edge, and $r = 2$ an abstract syntax structure edge.
  • If $(v_i, v_j) \in E_{CF}$, then $(v_i, v_j, 0) \in E$; the edge originates from the CFG.
  • If $(v_i, v_j) \in E_{DD}$ and $(v_i, v_j) \notin E_{CF}$, then $(v_i, v_j, 1) \in E$; the edge originates from the DDG.
  • If $(v_i, v_j) \in E_{AST}$, $(v_i, v_j) \notin E_{CF}$, and $(v_i, v_j) \notin E_{DD}$, then $(v_i, v_j, 2) \in E$; the edge originates from the AST.
To unify syntactic, control flow, and data dependency semantics, we convert the AST, CFG, and DDG into a single functional program graph [35]. Algorithm 1 formalizes this transformation. Starting from an empty graph, the procedure runs in three ordered stages. First (Lines 2 to 10), all control flow edges from $G_{CFG}$ are inserted; any missing nodes are created, and each such edge is labelled 0. Next (Lines 11 to 21), the procedure iterates over the data dependency edges of $G_{DDG}$, adding absent nodes and inserting an edge with label 1 only if the pair $(v_i, v_j)$ is not already connected by an edge in $G_{CFG}$. Finally (Lines 22 to 32), syntactic structure edges from $G_{AST}$ are considered; missing nodes are added as needed, and an edge is inserted with label 2 only if it is absent from both $G_{CFG}$ and $G_{DDG}$. This ordering ensures that, when multiple relation types could connect the same pair of nodes, the semantically most informative relation is retained.
Time and Space Complexity: Constructing the functional program graph (Algorithm 1) takes $O(|E_{CFG}| + |E_{DDG}| + |E_{AST}|)$ time, where $|E_{CFG}|$, $|E_{DDG}|$, and $|E_{AST}|$ are the numbers of edges in the CFG, DDG, and AST, respectively. The space complexity is $O(|V_{CFG}| + |V_{DDG}| + |V_{AST}| + |E_{CFG}| + |E_{DDG}| + |E_{AST}|)$, as it is dominated by storing the graph structure and node features.
Algorithm 1 Building functional program graph.
Require: CFG G_CFG, DDG G_DDG, AST G_AST
Ensure: Functional program graph G_CF-DD-AST
1:  Initialize G_CF-DD-AST
2:  for each edge (v_i, v_j) ∈ E(G_CFG) do
3:      if v_i ∉ V(G_CF-DD-AST) then
4:          Add node v_i to G_CF-DD-AST
5:      end if
6:      if v_j ∉ V(G_CF-DD-AST) then
7:          Add node v_j to G_CF-DD-AST
8:      end if
9:      Add edge (v_i, v_j, 0) to G_CF-DD-AST
10: end for
11: for each edge (v_i, v_j) ∈ E(G_DDG) do
12:     if v_i ∉ V(G_CF-DD-AST) then
13:         Add node v_i to G_CF-DD-AST
14:     end if
15:     if v_j ∉ V(G_CF-DD-AST) then
16:         Add node v_j to G_CF-DD-AST
17:     end if
18:     if (v_i, v_j) ∉ E(G_CFG) then
19:         Add edge (v_i, v_j, 1) to G_CF-DD-AST
20:     end if
21: end for
22: for each edge (v_i, v_j) ∈ E(G_AST) do
23:     if v_i ∉ V(G_CF-DD-AST) then
24:         Add node v_i to G_CF-DD-AST
25:     end if
26:     if v_j ∉ V(G_CF-DD-AST) then
27:         Add node v_j to G_CF-DD-AST
28:     end if
29:     if (v_i, v_j) ∉ E(G_CFG) ∧ (v_i, v_j) ∉ E(G_DDG) then
30:         Add edge (v_i, v_j, 2) to G_CF-DD-AST
31:     end if
32: end for
33: return G_CF-DD-AST
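For readers who prefer executable form, Algorithm 1 can be sketched in a few lines of Python. The dictionary-based representation and function name below are our own illustration, not the authors' implementation; it preserves the CFG-before-DDG-before-AST precedence, so a node pair already connected by a more informative edge type is never relabelled:

```python
def build_functional_program_graph(cfg_edges, ddg_edges, ast_edges):
    """Fuse CFG, DDG, and AST edge sets into one labelled multigraph.

    Edge labels: 0 = control flow, 1 = data dependency, 2 = syntax.
    Each (v_i, v_j) pair keeps the first label assigned to it, following
    the CFG -> DDG -> AST insertion order of Algorithm 1.
    """
    nodes = set()
    edges = {}  # (v_i, v_j) -> label
    for label, edge_set in ((0, cfg_edges), (1, ddg_edges), (2, ast_edges)):
        for vi, vj in edge_set:
            nodes.add(vi)
            nodes.add(vj)
            if (vi, vj) not in edges:  # pair already covered by an earlier stage
                edges[(vi, vj)] = label
    return nodes, [(vi, vj, r) for (vi, vj), r in edges.items()]

# Toy statement nodes s1..s3: the (s1, s2) pair occurs in both the CFG and
# the DDG, so only the control-flow edge (label 0) is kept for it.
nodes, edges = build_functional_program_graph(
    cfg_edges=[("s1", "s2"), ("s2", "s3")],
    ddg_edges=[("s1", "s2"), ("s1", "s3")],
    ast_edges=[("s2", "s3"), ("s3", "s1")],
)
```

The single pass per input graph makes the linear time and space complexity stated above directly visible.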
The functional program graph is a multiattribute heterogeneous graph designed to capture the semantic features of code. It is formed by the fusion of the AST, CFG, and DDG, with the aim of overcoming the limitations of traditional program dependency graphs and CFGs in modeling complex syntactic structures. Although these two types of graph structures have long been widely used in code analysis, the continuous development of programming languages and the diversification of development goals suggest that a single-structure code representation may lead to insufficient expression of the sequential execution flow, control logic, and semantic relationships within code blocks.
Therefore, a functional program graph is designed to incorporate multiple code semantics. From the analysis of the code example in Figure 2, the AST includes method declarations (e.g., proc, store), control statements (e.g., if statements), and exception throwing (throw statements), which reflect method calls and the nesting relationships of syntax blocks. The CFG focuses on the execution path of the code. In the example, the if statement is used to determine whether the store method is called and whether an exception is thrown, illustrating the execution order and control logic of the program. The DDG, on the other hand, captures the variable definitions and interactions. For example, the variable in is passed to the store method after a conditional check, making the variable transfer relationship within the code more explicit. By jointly modeling these three types of features, the Functional Program Graph compensates for the limitations of single-modal information, providing richer contextual information. It also supplies higher-quality input to deep learning models, significantly enhancing the semantic expression of the code.

3.3. Code Feature Extraction

Following the construction of a functional program graph by merging the CFG, DDG, and AST (see Section 3.2 for details), FPGraphCS employs IMAGNN to compute latent code embeddings. The extraction pipeline comprises three successive stages: (i) node statement feature extraction, (ii) node statement context feature extraction, and (iii) feature fusion.

3.3.1. Node Statement Feature Extraction

First, word embedding techniques are applied to extract features from different types of nodes in the program graph. Each node is modeled with the code statement as its basic feature. However, since code statements vary in length, we first tokenize each statement to obtain the corresponding sequence of tokens [36]. Afterward, word embedding techniques are used to convert each token into a vector. The vectors of the token sequence are then averaged and integrated to form the feature vector for each statement. Specifically, suppose that the tokenized statement consists of n tokens, represented as:
$S = \{ s_1, s_2, \ldots, s_n \}$
where n is the length of the sequence. After word embedding, the token vector sequence is as follows:
$Z_S = \{ C_{s_1}, C_{s_2}, \ldots, C_{s_n} \}$
Finally, the vector representation of the node statement is obtained by averaging the token vectors in the sequence:
$C_u^d = \frac{1}{n} \sum_{i=1}^{n} C_{s_i}$
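The three steps above (tokenize, embed, average) can be sketched as follows; the embedding table and the whitespace tokenizer are toy placeholders, not the trained components used in the paper:

```python
def statement_vector(statement, embedding, dim):
    """Embed each token of a statement and average the vectors.

    Implements C_u = (1/n) * sum_i C_{s_i}; unknown tokens fall back
    to a zero vector of the same dimensionality.
    """
    tokens = statement.replace("(", " ").replace(")", " ").split()
    vectors = [embedding.get(tok, [0.0] * dim) for tok in tokens]
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(dim)]

# Toy 2-d embedding table for a few Java-like tokens.
emb = {"if": [1.0, 0.0], "store": [0.0, 1.0], "in": [1.0, 1.0]}
vec = statement_vector("store ( in )", emb, dim=2)  # mean of [0,1] and [1,1]
```

Averaging makes the statement vector independent of statement length, which is why variable-length statements can all be mapped to the same feature space.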

3.3.2. Node Statement Context Feature Extraction

The context feature extraction of the node is the core component of the entire model. Its main goal is to enhance the representation of the statement node and effectively construct the contextual relationships of statements within the code.
The model in this paper utilizes multi-hop information propagation across nodes via a metapath strategy, which specifies the traversal method of information within the heterogeneous graph, thereby enabling effective global context modeling, as shown in Figure 3.
Figure 3a illustrates the transformation of node content, which incorporates three different types of information to better represent the code’s execution logic, data dependencies, and syntactic structure. To project different types of nodes into the same vector space, a linear transformation is applied to each node type:
$h_u = W_A \cdot f_u^A$
where $f_u^A \in \mathbb{R}^{d_A}$ is the original feature vector, $h_u \in \mathbb{R}^{d}$ is the projection of node $u$, and $W_A$ is the parameter weight matrix. This ensures that all nodes in the graph have vector representations of the same dimensionality. The metapath, constructed on the basis of the functional program graph, facilitates the subsequent aggregation process.
The metapath consists of node types and edge types. Nodes can represent code statements, function calls, and other elements, whereas edge types include control flow edges, data dependency edges, and syntactic structure edges. A metapath can be described as a path containing complex relationships, with a starting node $A_1$ and an ending node $A_{k+1}$.
After feature preprocessing, the method constructs, for each metapath $P \in \Phi_X$ and each node $u$, a metapath-based local context $S_u^P$. This context is formally defined as follows: given a heterogeneous graph $G$ and a metapath $P$, the local subgraph centered at node $u$ on metapath $P$ is denoted by $S_u^P = (U_u^P, E_u^P)$, where an edge $(u_1, u_2) \in E_u^P$ if and only if it belongs to a metapath instance starting from node $u$ and following $P$ in the original graph $G$. Specifically, $S_u^P$ includes all nodes connected to $u$ via metapath $P$ and their corresponding edges.
The metapath-based local context $S_u^P$ captures the relationships between the target node $u$ and its neighbors as induced by the metapath $P$. This approach effectively addresses the information loss caused by missing intermediate nodes in metapaths within the HAN model, while simultaneously reducing redundant computations arising from overlapping metapaths in MAGNN.
Figure 3b depicts the intra-metapath aggregation module, wherein the refined metapath-aggregation graph neural network merges messages originating from neighbors linked by the same metapath. In stark contrast, the baseline architecture shown in Figure 4 is predicated upon hand-crafted metapaths and a neighbor-level attention mechanism. Although attention [37] mechanisms are routinely introduced at this stage in prior heterogeneous GNNs, a controlled ablation study on the DBLP and IMDB benchmarks in which both neighbor-level and semantic-level attention are removed and replaced by mean aggregation demonstrates that such attention is not essential. As summarized in Table 1, substituting mean pooling for attention yields statistically indistinguishable Micro-F1 scores while markedly reducing the average wall-clock time per training epoch. These findings indicate that mean aggregation constitutes an efficient alternative to attention for intra-metapath aggregation, conferring substantial computational savings without degrading predictive performance.
Formally, for each node u, the IMAGNN aggregates the features of neighbors on the basis of metapaths, producing a series of semantic feature vectors represented as follows:
$m_u = \left\{ z_u^P = \frac{1}{|S^P|} \sum_{(u,v) \in S^P} x_v \;\middle|\; P \in \Phi_X \right\}$
Here, $S^P$ denotes the set of all metapath instances related to metapath $P$, and $P(u, v)$ represents a metapath instance with target node $u$ and source node $v$. For each target node $u$, the IMAGNN selects specific metapath structures according to the specified metapath $P$, determining how to connect to neighboring nodes via different edge types. The features of neighboring nodes $w$, denoted $h_w$, are aggregated with weights according to edge types. The aggregation of information is defined as:
$h_u^{(t+1)} = \sum_{w \in N(u)} \frac{1}{|N(u)|} W h_w^{(t)}$
where $h_u^{(t+1)}$ is the feature of node $u$ at layer $t+1$, $N(u)$ is the set of neighbor nodes of $u$, and $\frac{1}{|N(u)|}$ is the mean-aggregation weight. $W$ is the weight matrix associated with the edge type. Through this metapath-based aggregation, the model propagates information across different edge types in the heterogeneous graph. At this stage, the features of neighboring nodes are combined using parameter-free mean aggregation, so that each neighbor contributes equally. This design replaces the neighbor-level attention commonly used in heterogeneous GNNs with a computationally cheaper yet numerically stable operator, and produces intermediate node representations that are subsequently refined by the semantic-level attention mechanism described below.
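The mean-aggregation update can be illustrated with one propagation step in plain Python. This is a minimal sketch: features are short lists, the edge-type weight matrix $W$ is reduced to a scalar, and the neighbor lists are hypothetical:

```python
def mean_aggregate(features, neighbors, W=1.0):
    """One intra-metapath propagation step: each node averages its
    neighbors' features with equal weights 1/|N(u)| after a linear map W."""
    updated = {}
    for u, nbrs in neighbors.items():
        if not nbrs:  # isolated node: keep its current feature
            updated[u] = features[u]
            continue
        updated[u] = [
            sum(W * features[w][d] for w in nbrs) / len(nbrs)
            for d in range(len(features[u]))
        ]
    return updated

# Toy 2-d features for three statement nodes and hypothetical neighbor lists
# induced by one metapath.
feats = {"s1": [1.0, 0.0], "s2": [0.0, 1.0], "s3": [1.0, 1.0]}
nbrs = {"s1": ["s2", "s3"], "s2": ["s1"], "s3": []}
out = mean_aggregate(feats, nbrs)  # s1 becomes the mean of s2 and s3
```

Because the aggregation has no learned per-neighbor parameters, its cost per layer is linear in the number of metapath edges, which is the source of the efficiency gain over attention.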
Figure 3c illustrates the subsequent semantic-level metapath aggregation, where a graph attention mechanism is employed to adaptively weight the contributions of different metapath-specific contexts for each node rather than individual neighbors. In this mechanism, the importance of a metapath P for node u is quantified by the following attention coefficients:
$S_i^P = \frac{1}{|V^P|} \sum_{u \in V^P} \tanh\left( W h_u^P + b_P \right)$
$e_{uw}^P = q_P^{T} \cdot S_i^P$
where $V^P$ represents the set of nodes on the metapath $P$, $h_u^i$ represents the feature vector of the target node $u$ at the current layer $i$, $b_P$ denotes the bias term associated with metapath $P$, $q_P^T$ represents the attention weight vector associated with metapath $P$, and $h^P(u, w)$ denotes the feature vector aggregated from the entire metapath instance. This aggregation process does not focus solely on neighboring nodes but also considers the information propagated throughout the entire metapath. The attention mechanism ensures that more relevant information is given higher weights, allowing for a more precise understanding of the global context and dependencies across distant nodes in the graph.

3.3.3. Feature Fusion

After completing node feature extraction and context feature modeling, the model performs feature aggregation to generate the final representation for each node. To achieve this, the model uses the Softmax function to normalize and update all selected paths:
$\beta_{uw}^P = \frac{\exp(e_{uw}^P)}{\sum_{v \in N(u)} \exp(e_{uv}^P)}$
$h_u^P = \sigma\left( \sum_{w \in N_u^P} \beta_{uw}^P \, h^P(u, w) \right)$
In the formula, $v \in N(u)$ ranges over the neighboring nodes of the target node $u$. For each target node $u$, the model collects the features of its neighboring nodes $w$ on the basis of the specified metapath and performs a weighted sum according to the edge type. In this process, the model dynamically assigns different weights to the features and aggregates information from different types of edges, thereby generating the final feature representation for each node. Next, the model applies an average pooling operation over the node features to generate a global feature representation of the graph. Suppose that the program graph contains $t$ nodes and that $h_i^{(y)}$ is the feature vector of node $i$ after the feature extraction process. The local node information is then aggregated into a global graph feature representation as follows:
$$h_{\text{graph}} = \frac{1}{t} \sum_{i=1}^{t} h_i^{(y)}$$
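The attention-weighted aggregation and average-pooling readout above can be sketched in plain Python (a minimal illustration with toy vectors; the raw attention scores $e_{uw}^{P}$ and the metapath-instance features are assumed to have been computed in the earlier steps, and ReLU stands in for the activation $\sigma$):

```python
import math

def softmax(scores):
    """Normalize raw attention scores into weights (the beta terms)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_node(attn_scores, neighbor_feats):
    """Weighted sum of metapath-instance feature vectors for one target
    node, followed by a nonlinearity (ReLU here, standing in for sigma)."""
    betas = softmax(attn_scores)
    dim = len(neighbor_feats[0])
    h = [sum(b * f[d] for b, f in zip(betas, neighbor_feats))
         for d in range(dim)]
    return [max(0.0, x) for x in h]

def graph_readout(node_feats):
    """Average pooling over all t node vectors -> global graph feature."""
    t = len(node_feats)
    dim = len(node_feats[0])
    return [sum(f[d] for f in node_feats) / t for d in range(dim)]

# Toy example: one target node with two metapath neighbors.
h_u = aggregate_node([0.2, 1.1], [[1.0, 0.0], [0.0, 1.0]])
h_graph = graph_readout([h_u, [0.5, 0.5]])
```

In practice these operations would run batched on GPU tensors (e.g., in PyTorch); the sketch only mirrors the two formulas.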

3.4. Model Training

During the training phase, we adopt a similarity-based loss function to optimize the model. Unlike traditional ranking-based loss functions, this loss function directly addresses node similarity in the graph structure. The loss to be minimized is as follows:
$$\text{loss} = \max\left(1 - \operatorname{sim}\left(p_{\text{graph}},\, p_{\text{query}}^{+}\right) + \max_{k} \operatorname{sim}\left(p_{\text{graph}},\, p_{\text{query}}^{k-}\right),\; 0\right)$$
We use all the code representations and their corresponding code comments, where the positive sample $p_{\text{query}}^{+}$ is the feature vector of the comment corresponding to the code, and the negative samples $p_{\text{query}}^{-}$ are randomly selected comment feature vectors. Cosine similarity is employed as $\operatorname{sim}$, so minimizing the loss maximizes the similarity between the target code's feature vector and its corresponding comment, while minimizing its similarity to the most similar negative comment feature vector.
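The loss with the hardest negative can be sketched in plain Python (a minimal illustration of the formula above with toy vectors; the helper names are hypothetical and the margin of 1 follows the formula):

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search_loss(p_graph, p_query_pos, p_query_negs, margin=1.0):
    """Hinge loss: push the positive comment's similarity at least
    `margin` above that of the hardest (most similar) negative."""
    pos = cosine_sim(p_graph, p_query_pos)
    hardest_neg = max(cosine_sim(p_graph, n) for n in p_query_negs)
    return max(margin - pos + hardest_neg, 0.0)
```

The loss is zero once the positive comment outscores every negative by the full margin, which is exactly the condition the minimization formula encodes.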

4. Experiments

4.1. Dataset

The dataset used in this experiment is derived from the Java subset of the CodeSearchNet corpus, which includes code search data for six programming languages: Java, Python, Ruby, JavaScript, PHP, and Go, as summarized by language in Table 2. We focus exclusively on the Java subset because, for the other programming languages, no effective tools are currently available for parsing the necessary graph structures, namely the AST, CFG, and DDG; moreover, Java code in other datasets is often preprocessed in ways that interfere with the extraction of these structures. The dataset contains a subset of code snippets that could not be parsed owing to syntax errors or other issues. These unparsed samples were excluded, and the following data preprocessing steps were implemented:
1. Code snippets for which the abstract syntax tree, control flow graph, or data dependency graph could not be extracted were removed. Such failures typically arise from syntax errors that impede the parsing tools.
2. Lambda expressions in Java code were excluded because the extraction tool Progex does not support features introduced after Java JDK 1.7.
3. Samples with node counts of zero or exceeding 600 were discarded. A node count of zero indicates an empty function body, whereas the upper threshold of 600 was set to keep algorithm runtime manageable.
4. Superfluous characters and symbols (e.g., \n, \t, []), as well as numeric literals that bear no semantic significance within the code, were removed, since their presence adversely impacts matching accuracy.
5. Functions containing only comments or non-executable code were excluded. Such functions, often used as placeholders or documentation, do not contribute to meaningful code retrieval and could degrade the dataset's quality.
6. Functions, variables, and statements following camelCase or snake_case naming conventions were tokenized into individual units. This segmentation reduces vocabulary size and improves semantic clarity, enabling the model to process each component of an identifier more efficiently.
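Step 6 (identifier splitting) can be illustrated with a small regex-based tokenizer; this is a sketch, as the paper does not specify its exact splitting implementation:

```python
import re

def split_identifier(name):
    """Split camelCase and snake_case identifiers into lowercase subtokens."""
    tokens = []
    for part in name.split("_"):          # snake_case boundaries
        # camelCase boundaries, acronym runs (e.g. HTTP), and digit runs
        tokens.extend(
            re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|\d+", part)
        )
    return [t.lower() for t in tokens if t]

# e.g. split_identifier("getUserName") and split_identifier("max_node_count")
# both yield three lowercase subtokens.
```

Splitting identifiers this way lets a single embedding vocabulary cover `getUserName`, `get_user_name`, and `username`-style variants with shared subtokens.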
Table 3 presents the sample size and proportions of the processed dataset.
To ensure a rigorous evaluation of the proposed enhanced metapath-aggregation GNN, we additionally employ two canonical heterogeneous information network (HIN) benchmarks drawn from different application domains: the Internet Movie Database (IMDB) and the dblp computer science bibliography (DBLP).
IMDB comprises three node types: movies, directors, and actors, as well as their corresponding relation edges. The content of each movie node is encoded as a bag-of-words (BoW) vector constructed from its associated plot keywords.
The dataset extracted from DBLP contains authors, papers, and venues as distinct node types. The author nodes are labeled with one of four research areas: Database, Data Mining, Artificial Intelligence, and Information Retrieval. For each author, a BoW feature vector is derived from the keywords of the papers that he or she has authored.
Key statistics for both datasets are summarized in Table 4. These two HIN benchmarks are further used to conduct auxiliary classification experiments, aiming to assess the generalization capability of the proposed model on domains characterized by distinct graph topologies and semantic heterogeneity. The corresponding classification performance on the DBLP and IMDB datasets is reported in Table 1.

4.2. Evaluation Metrics

To evaluate the effectiveness of the method, we use four commonly employed code evaluation metrics: MRR, ACC@1, ACC@5, and ACC@10. The selection of these cut-offs (k = 1, 5, 10) is based on typical developer practices. Developers often examine the top 5 or 10 results in a code search, as they do not rely solely on the top result but instead explore multiple results to identify the most relevant code. As such, ACC@5 and ACC@10 offer a more realistic reflection of retrieval scenarios, where developers review the initial results before making a decision. These metrics are particularly useful for assessing the model’s ability to retrieve relevant code snippets within a limited number of top results, which aligns with real-world development workflows.
MRR is defined as the average reciprocal rank of the first relevant result over a set of queries, formally calculated as follows:
$$\text{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\text{rank}_i}$$
ACC@k measures whether the correct answer appears within the top $k$ positions of the ranked results, defined formally as:
$$\text{ACC@}k = \frac{\text{count}}{|Q|}$$
where count is the number of queries whose correct answer appears within the top $k$ results.
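Both metrics follow directly from the ranked result lists; a minimal sketch (assuming the 1-based rank of each query's first relevant result is already known):

```python
def mrr(ranks):
    """Mean reciprocal rank over a set of queries; `ranks` holds the
    1-based rank of the first relevant result for each query."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def acc_at_k(ranks, k):
    """Fraction of queries whose correct answer appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

For example, ranks of 1, 2, and 4 over three queries give an MRR of (1 + 1/2 + 1/4) / 3 and an ACC@2 of 2/3.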
To assess the classification performance of the proposed metapath-aggregation graph neural network on heterogeneous information networks, we report two standard metrics, Macro-F1 and Micro-F1.
Macro-F1 is the unweighted arithmetic mean of the class-specific F1 scores; it therefore assigns equal importance to every class, irrespective of prevalence, and is particularly informative under label imbalance. Let $Q$ denote the set of class labels, and let $\text{Precision}_q$ and $\text{Recall}_q$ be the precision and recall for class $q \in Q$. Macro-F1 is given by:
$$\text{Macro-F1} = \frac{1}{|Q|} \sum_{q \in Q} \frac{2 \cdot \text{Precision}_q \cdot \text{Recall}_q}{\text{Precision}_q + \text{Recall}_q}$$
Micro-F1 pools the contingency counts over all classes before computing the F1 score, thereby weighting each class proportionally to its frequency and reflecting the overall predictive accuracy of the model. Let TP, FP, and FN denote the total numbers of true positives, false positives, and false negatives, respectively. Micro-F1 is defined as:
$$\text{Micro-F1} = \frac{2\,\text{TP}}{2\,\text{TP} + \text{FP} + \text{FN}}$$
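Both F1 variants can be computed from per-class contingency counts; a minimal sketch (assuming counts are given as hypothetical (TP, FP, FN) triples per class):

```python
def macro_f1(class_counts):
    """Unweighted mean of per-class F1; class_counts maps class -> (tp, fp, fn)."""
    f1s = []
    for tp, fp, fn in class_counts.values():
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

def micro_f1(class_counts):
    """Pool the contingency counts over all classes, then compute one F1."""
    tp = sum(c[0] for c in class_counts.values())
    fp = sum(c[1] for c in class_counts.values())
    fn = sum(c[2] for c in class_counts.values())
    return 2 * tp / (2 * tp + fp + fn)
```

The contrast is visible on skewed counts: a rare class with zero true positives drags Macro-F1 down by a full 1/|Q| share, while Micro-F1 barely moves.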

4.3. Implementation Details

To train the proposed model, data shuffling was performed on the dataset processed in Section 4.1. During pre-training, we focused on code snippets with a maximum length of 512 tokens. For each batch, code snippets exceeding the maximum length were truncated, while those shorter than the maximum length were padded with the special token "__pad" to achieve the specified length. The model was trained for 100 epochs using the Adam optimizer with an initial learning rate of 0.01. Early stopping was implemented, and the best-performing model was selected based on validation performance.
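The truncation and padding scheme can be sketched as follows (using the "__pad" token stated above; the helper name is hypothetical):

```python
def pad_or_truncate(tokens, max_len=512, pad_token="__pad"):
    """Truncate token sequences longer than max_len; pad shorter ones
    with the special token so every sequence in a batch has equal length."""
    if len(tokens) >= max_len:
        return tokens[:max_len]
    return tokens + [pad_token] * (max_len - len(tokens))
```

Equal-length sequences let each batch be stacked into a single tensor; the padding positions would be masked out of attention and loss computations.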
We trained all models on a Linux server running Ubuntu 20.04.5 LTS with an Nvidia GeForce RTX 3090 GPU. The models were implemented in Python 3.8 with PyTorch 1.12.0 (CUDA 11.6), and the heterogeneous graph modules were implemented using DGL 1.0.0.

4.4. Results and Analysis

4.4.1. Performance Study

To evaluate the performance of FPGraphCS, we conducted training on the dataset and compared it with several advanced baseline methods. These baseline methods [38] include text-based feature models such as NCS, DeepCS and NBoW, as well as structure-based feature models such as MMAN, DGMS and MRNCS. All reported scores are obtained by averaging multiple runs under the same configuration in order to reduce randomness and improve the stability of the evaluation.
Table 5 presents the experimental results, which show that our method achieves the best performance on the CodeSearchNet dataset. Specifically, the MRR of FPGraphCS exceeds those of NCS, DeepCS, and NBoW by 28%, 18%, and 10%, respectively. This demonstrates that the contextual information from functional multigraph representations can effectively improve the accuracy of code search. Additionally, FPGraphCS's MRR outperforms MMAN, DGMS, and MRNCS by 15%, 18%, and 5%, respectively, indicating that richer structural information enhances code semantics.

4.4.2. Baseline Rationale

This study evaluates a lightweight early graph-fusion method in a compute-constrained setting. FPGraphCS is trained on the standard CodeSearchNet-Java training split and contains markedly fewer trainable parameters than transformer-based encoders. By contrast, state-of-the-art pre-trained encoders (CodeBERT [39], GraphCodeBERT [19], and UniXcoder [40]) contain approximately 110 to 125 million parameters and are pre-trained on no fewer than two million functions. A direct head-to-head retraining would conflate architectural effects with differences in model capacity and pre-training data scale, thereby compromising both fairness and reproducibility. In line with established practice, we report the official test-set scores released by the respective authors on the same CodeSearchNet-Java split and present them in Table 5 as external references. Our primary comparisons therefore focus on methods retrained from scratch under the same data budget, namely BM25, NCS, DeepCS, NBoW, and MMAN.
Although these pre-trained models are highly effective for code search, they differ in the amount of code-specific data they were exposed to during pre-training, which makes a direct comparison challenging. Given these differences, we compare against methods retrained from scratch under the same data budget to ensure fairness.
The integration of large pre-trained encoders into FPGraphCS to investigate potential synergies is reserved for future work (Section 5).

4.4.3. Ablation Study

To study the impact of different graph structural representations on code search performance, we compared results obtained using only the control flow graph (FPGraphCS-CFG), only the data dependency graph (FPGraphCS-DDG), only the abstract syntax tree (FPGraphCS-AST), and the fusion of all three (FPGraphCS). As shown in Table 6, FPGraphCS significantly outperforms the single-graph variants in terms of MRR. Although using a single graph structure can improve code representation to some extent, the experimental data suggest that relying solely on one type of structural information is insufficient to fully capture the syntax, execution order, and control logic of the code. Combining all three structures therefore effectively integrates their individual characteristics, providing richer contextual information and thereby improving the accuracy of code search.

4.4.4. Sensitivity Analysis

To analyze the robustness of our proposed method, we examine two key parameters, code length and comment length, that may impact the effectiveness of code representation. Figure 5 illustrates the performance of FPGraphCS across various evaluation metrics under different parameter configurations.
Figure 5 clearly shows that even with a significant increase in both comment length and code length, FPGraphCS maintains stable performance. This result can be attributed to the design advantages of our functional program graph. Even when the input size is larger, the model is still capable of handling the data effectively. This demonstrates the model’s robustness in processing larger codebases and longer queries, highlighting its ability to scale without compromising accuracy. Additionally, we analyzed the characteristics of the dataset used in the experiments. On average, code snippets contain 80 tokens, while comments consist of 15 tokens. The distribution of code and comment lengths in the dataset is relatively uniform, with few outliers for long code snippets and comments. These characteristics were accounted for during preprocessing, which included tokenization and normalization, ensuring dataset consistency and improving reproducibility.

4.4.5. Time Analysis

To quantify computational efficiency, we benchmark IMAGNN against three state-of-the-art heterogeneous-graph models (HAN, RGCN, and MAGNN), with the results summarized in Figure 6. Our redesign removes the intra-metapath attention module and instead derives a semantic graph that decomposes the original HIN into a collection of metapath-specific subgraphs. This decomposition substantially reduces message-passing overhead and, consequently, the overall algorithmic complexity.
Figure 6 shows the average wall-clock time per training epoch versus the corresponding Micro-F1 for each approach. The proposed model attains a markedly lower training cost while matching or exceeding the predictive performance of all the baselines, thereby demonstrating superior efficiency without sacrificing effectiveness.

5. Conclusions

In this paper, we propose FPGraphCS, a novel code search method based on graph construction, which employs an optimized early fusion strategy to comprehensively learn the structural, control, and data dependency information of code through heterogeneous graph representation learning. This enables the effective establishment of semantic associations between natural language queries and source code. Furthermore, we introduce an enhanced metapath aggregation Graph Neural Network, which utilizes metapath-associated subgraphs and multi-hop path learning to thoroughly explore the contextual relationships between nodes in the graph, thereby improving the model’s ability to capture complex code features. Experimental results demonstrate that the proposed FPGraphCS method significantly outperforms existing baseline methods across several evaluation metrics, confirming its superior accuracy and robustness in code search. Specifically, FPGraphCS shows notable improvements in MRR and Top-k Accuracy, indicating its enhanced ability to handle code search tasks with high precision and effectiveness. The evaluation metrics of the proposed method exceed those of typical deep learning methods based on text and structural features by more than 5%.
Future work will focus on expanding experiments to datasets from other programming languages, such as Python, and optimizing the functional program graph to enhance its adaptability and performance in cross-language code search tasks. We also plan to refine the construction of functional program graphs and the FPGraphCS framework to better accommodate the nuances of language-specific constructs and tooling, ensuring that the model performs well across diverse programming environments. Furthermore, we will explore hybrid architectures that integrate large-scale pre-trained code models, such as CodeBERT, GraphCodeBERT, and UniXcoder, into our early graph fusion framework. This integration aims to further enhance retrieval effectiveness while respecting realistic computational budgets and resource constraints.

Author Contributions

Conceptualization, methodology, software, validation, formal analysis, investigation, data curation, and writing—original draft preparation, L.A.; supervision, writing—review and editing, and project administration, R.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used in this manuscript are publicly available datasets. Detailed information about these datasets is provided in Section Dataset of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mao, Y.; Wan, C.; Jiang, Y.; Gu, X. Self-Supervised Query Reformulation for Code Search. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, CA, USA, 3–9 December 2023; pp. 363–374. [Google Scholar]
  2. Lv, F.; Zhang, H.; Lou, J.-G.; Wang, S.; Zhang, D.; Zhao, J. CodeHow: Effective Code Search Based on API Understanding and Extended Boolean Model. In Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, Lincoln, NE, USA, 9–13 November 2015; pp. 260–270. [Google Scholar]
  3. Nie, L.; Jiang, H.; Ren, Z.; Sun, Z.; Li, X. Query Expansion Based on Crowd Knowledge for Code Search. IEEE Trans. Serv. Comput. 2016, 9, 771–783. [Google Scholar] [CrossRef]
  4. Wang, C.; Nong, Z.; Gao, C.; Li, Z.; Zeng, J.; Xing, Z.; Liu, Y. Enriching Query Semantics for Code Search with Reinforcement Learning. arXiv 2021, arXiv:2105.09630. [Google Scholar] [CrossRef] [PubMed]
  5. Huang, J.; Tang, D.; Shou, L.; Gong, M.; Xu, K.; Jiang, D.; Zhou, M.; Duan, N. CoSQA: 20,000+ Web Queries for Code Search and Question Answering. arXiv 2021, arXiv:2105.13239. [Google Scholar]
  6. Xie, Y.; Lin, J.; Dong, H.; Zhang, L.; Wu, Z. Survey of Code Search Based on Deep Learning. ACM Trans. Softw. Eng. Methodol. 2023, 33, 1–42. [Google Scholar] [CrossRef]
  7. Gao, X.; Jiang, X.; Wu, Q.; Wang, X.; Lyu, C.; Lyu, L. GT-SimNet: Improving Code Automatic Summarization via Multi-Modal Similarity Networks. J. Syst. Softw. 2022, 194, 111495. [Google Scholar] [CrossRef]
  8. Niu, C.; Li, C.; Ng, V.; Ge, J.; Huang, L.; Luo, B. SPT-code: Sequence-to-Sequence Pre-training for Learning Source Code Representations. In Proceedings of the 44th International Conference on Software Engineering, Pittsburgh, PA, USA, 22–27 May 2022; pp. 2006–2018. [Google Scholar]
  9. Gu, W.; Li, Z.; Gao, C.; Wang, C.; Zhang, H.; Xu, Z.; Lyu, M. CRaDLe: Deep Code Retrieval Based on Semantic Dependency Learning. Neural Netw. 2021, 141, 385–394. [Google Scholar] [CrossRef]
  10. Fu, X.; Zhang, J.; Meng, Z.; King, I. MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding. In Proceedings of the Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 2331–2341. [Google Scholar]
  11. Husain, H.; Wu, H.-H.; Gazit, T.; Allamanis, M.; Brockschmidt, M. CodeSearchNet Challenge: Evaluating the State of Semantic Code Search. arXiv 2019, arXiv:1909.09436. [Google Scholar]
  12. Sun, W.; Fang, C.; Ge, Y.; Hu, Y.; Chen, Y.; Zhang, Q.; Ge, X.; Liu, Y.; Chen, Z. A Survey of Source Code Search: A 3-Dimensional Perspective. ACM Trans. Softw. Eng. Methodol. 2024, 33, 166. [Google Scholar] [CrossRef]
  13. Salza, P.; Schwizer, C.; Gu, J.; Gall, H.C. On the Effectiveness of Transfer Learning for Code Search. IEEE Trans. Softw. Eng. 2023, 49, 1804–1822. [Google Scholar] [CrossRef]
  14. Chen, J.; Hu, X.; Li, Z.; Gao, C.; Xia, X.; Lo, D. Code Search Is All You Need? Improving Code Suggestions with Code Search. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Lisbon, Portugal, 14–20 April 2024; pp. 73:1–73:13. [Google Scholar]
  15. Gu, X.; Zhang, H.; Kim, S. Deep Code Search. In Proceedings of the 40th International Conference on Software Engineering, Gothenburg, Sweden, 27 May–3 June 2018; pp. 933–944. [Google Scholar]
  16. Sachdev, S.; Li, H.; Luan, S.; Kim, S.; Sen, K.; Chandra, S. Retrieval on Source Code: A Neural Code Search. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, Philadelphia, PA, USA, 18 June 2018; pp. 31–41. [Google Scholar]
  17. Xu, L.; Yang, H.; Liu, C.; Shuai, J.; Yan, M.; Lei, Y.; Xu, Z. Two-Stage Attention-Based Model for Code Search with Textual and Structural Features. In Proceedings of the 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Luxembourg, 9–12 March 2021; pp. 342–353. [Google Scholar]
  18. Hu, Y.; Cai, B.; Yu, Y. CSSAM: Code Search via Attention Matching of Code Semantics and Structures. arXiv 2022, arXiv:2208.03922. [Google Scholar]
  19. Guo, D.; Ren, S.; Lu, S.; Feng, Z.; Tang, D.; Liu, S.; Zhou, L.; Duan, N.; Svyatkovskiy, A.; Fu, S.; et al. GraphCodeBERT: Pre-training Code Representations with Data Flow. arXiv 2020, arXiv:2009.08366. [Google Scholar]
  20. Zhao, W.; Liu, Y. Utilising Edge Attention in Graph-Based Code Search. In Proceedings of the 34th International Conference on Software Engineering and Knowledge Engineering, SEKE 2022, Pittsburgh, PA, USA, 1–10 July 2022; pp. 60–66. [Google Scholar]
  21. Wan, Y.; Shu, J.; Sui, Y.; Xu, G.; Zhao, Z.; Wu, J.; Yu, P.S. Multi-Modal Attention Network Learning for Semantic Source Code Retrieval. In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, San Diego, CA, USA, 11–15 November 2019; pp. 13–25. [Google Scholar]
  22. Tai, K.S.; Socher, R.; Manning, C.D. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, 26 June–1 July 2015; pp. 1556–1566. [Google Scholar]
  23. Gu, J.; Chen, Z.; Monperrus, M. Multimodal Representation for Neural Code Search. arXiv 2021, arXiv:2107.00992. [Google Scholar]
  24. Zeng, C.; Yu, Y.; Li, S.; Xia, X.; Wang, Z.; Geng, M.; Bai, L.; Dong, W.; Liao, X. deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search. ACM Trans. Softw. Eng. Methodol. 2023, 32, 34. [Google Scholar] [CrossRef]
  25. Liu, S.; Xie, X.; Siow, J.; Ma, L.; Meng, G.; Liu, Y. GraphSearchNet: Enhancing GNNs via Capturing Global Dependencies for Semantic Code Search. IEEE Trans. Softw. Eng. 2023, 49, 2839–2855. [Google Scholar] [CrossRef]
  26. Yang, X.; Yan, M.; Pan, S.; Ye, X.; Fan, D. Simple and Efficient Heterogeneous Graph Neural Network. arXiv 2023, arXiv:2207.02547. [Google Scholar] [CrossRef]
  27. Fu, X.; King, I. MECCH: Metapath Context Convolution-based Heterogeneous Graph Neural Networks. Neural Netw. 2024, 170, 266–275. [Google Scholar] [CrossRef]
  28. Zhou, W.; Huang, H.; Shi, R.; Yin, K.; Jin, H. An Efficient Subgraph-Inferring Framework for Large-Scale Heterogeneous Graphs. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2024), Vancouver, BC, Canada, 20–27 February 2024; pp. 9431–9439. [Google Scholar]
  29. Ling, X.; Wu, L.; Wang, S.; Pan, G.; Ma, T.; Xu, F.; Liu, A.X.; Wu, C.; Ji, S. Deep Graph Matching and Searching for Semantic Code Retrieval. ACM Trans. Knowl. Discov. Data 2021, 15, 88. [Google Scholar] [CrossRef]
  30. Li, J.; Peng, H.; Cao, Y.; Dou, Y.; Zhang, H.; Yu, P.S.; He, L. Higher-Order Attribute-Enhancing Heterogeneous Graph Neural Networks. IEEE Trans. Knowl. Data Eng. 2023, 35, 560–574. [Google Scholar] [CrossRef]
  31. Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; van den Berg, R.; Titov, I.; Welling, M. Modeling Relational Data with Graph Convolutional Networks. arXiv 2017, arXiv:1703.06103. [Google Scholar] [CrossRef]
  32. Wang, X.; Ji, H.; Shi, C.; Wang, B.; Cui, P.; Yu, P.S.; Ye, Y. Heterogeneous Graph Attention Network. arXiv 2021, arXiv:1903.07293. [Google Scholar]
  33. Hu, F.; Wang, Y.; Du, L.; Li, X.; Zhang, H.; Han, S.; Zhang, D. Revisiting Code Search in a Two-Stage Paradigm. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, Singapore, 27 February–3 March 2023; pp. 994–1002. [Google Scholar]
  34. Zhu, Q.; Sun, Z.; Liang, X.; Xiong, Y.; Zhang, L. OCoR: An Overlapping-Aware Code Retriever. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, Melbourne, Australia, 21–25 September 2020; pp. 883–894. [Google Scholar]
  35. Ferrante, J.; Ottenstein, K.J.; Warren, J.D. The Program Dependence Graph and Its Use in Optimization. ACM Trans. Program. Lang. Syst. 1987, 9, 319–349. [Google Scholar] [CrossRef]
  36. Deng, Z.; Xu, L.; Liu, C.; Yan, M.; Xu, Z.; Lei, Y. Fine-Grained Co-Attentive Representation Learning for Semantic Code Search. In Proceedings of the 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Honolulu, HI, USA, 15–18 March 2022; pp. 396–407. [Google Scholar]
  37. Shuai, J.; Xu, L.; Liu, C.; Yan, M.; Xia, X.; Lei, Y. Improving Code Search with Co-Attentive Representation Learning. In Proceedings of the 28th International Conference on Program Comprehension, Seoul, Republic of Korea, 5–11 October 2020; pp. 196–207. [Google Scholar]
  38. Gu, W.; Wang, Y.; Du, L.; Zhang, H.; Han, S.; Zhang, D.; Lyu, M. Accelerating Code Search with Deep Hashing and Code Classification. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 2534–2544. [Google Scholar]
  39. Feng, Z.; Guo, D.; Tang, D.; Duan, N.; Feng, X.; Gong, M.; Shou, L.; Qin, B.; Liu, T.; Jiang, D.; et al. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. arXiv 2020, arXiv:2002.08155. [Google Scholar]
  40. Guo, D.; Lu, S.; Duan, N.; Wang, Y.; Zhou, M.; Yin, J. UniXcoder: Unified Cross-Modal Pre-training for Code Representation. arXiv 2022, arXiv:2203.03850. [Google Scholar]
Figure 1. Model architecture of FPGraphCS.
Figure 2. An example demonstrating the integrity of code structure.
Figure 3. Flowchart of the node context statement extraction process. (a) Node content transformation. (b) Intra-metapath aggregation. (c) Across-metapaths aggregation.
Figure 4. Predefined metapaths and neighbor attention. (a) Predefined metapaths. (b) Neighbor attention.
Figure 5. Different code lengths and comment lengths. (a) Code length. (b) Comment length.
Figure 6. Micro-F1 and average training-epoch time for heterogeneous graph neural networks on the DBLP dataset.
Table 1. Classification performance on the DBLP and IMDB datasets.

Model      DBLP (Macro-F1)   DBLP (Micro-F1)   IMDB (Macro-F1)   IMDB (Micro-F1)
MAGNN      93.61             94.07             60.79             60.93
MAGNN *    93.82             94.23             61.21             61.26
MAGNN †    93.27             93.61             60.12             60.30

* means removing neighbor attention, and † means removing semantic attention.
Table 2. CodeSearchNet corpus statistics by language.

Language     w/ Documentation   All
Go           347,789            726,768
Java         542,991            1,569,889
JavaScript   157,988            1,857,835
PHP          717,313            977,821
Python       503,502            1,156,085
Ruby         57,393             164,048
All          2,326,976          6,452,446
Table 3. Statistical description of the experimental datasets.

Dataset          Sample Size (Percentage)
Training set     393,008 (91.64%)
Validation set   12,608 (2.94%)
Test set         23,251 (5.42%)
Table 4. Statistical description of heterogeneous graph datasets.

Dataset   Nodes    Edges     Node Types
DBLP      26,128   119,783   4
IMDB      11,616   17,106    3
Table 5. Performance of different methods on the CodeSearchNet dataset.

Method       MRR     ACC@1   ACC@5   ACC@10
NCS          0.367   0.288   0.454   0.455
DeepCS       0.461   0.358   0.579   0.659
NBoW         0.544   0.447   0.660   0.725
MMAN         0.494   0.381   0.630   0.719
DGMS         0.465   0.339   0.613   0.706
MRNCS        0.596   0.503   0.706   0.768
FPGraphCS    0.651   0.549   0.775   0.842
Table 6. Ablation study on the CodeSearchNet dataset.

Method           MRR     ACC@1   ACC@5   ACC@10
FPGraphCS–CFG    0.624   0.522   0.750   0.821
FPGraphCS–AST    0.629   0.521   0.575   0.789
FPGraphCS–DDG    0.529   0.481   0.719   0.790
FPGraphCS        0.642   0.539   0.767   0.834

Share and Cite

MDPI and ACS Style

Ao, L.; Qi, R. Early-Stage Graph Fusion with Refined Graph Neural Networks for Semantic Code Search. Appl. Sci. 2026, 16, 12. https://doi.org/10.3390/app16010012
