Article

Contract-Graph Fusion and Cross-Graph Matching for Smart-Contract Vulnerability Detection

1 School of Computer Science, China University of Geosciences, Wuhan 430074, China
2 School of Computing, Newcastle University, Urban Science Building, Newcastle upon Tyne NE4 5TG, UK
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(19), 10844; https://doi.org/10.3390/app151910844
Submission received: 9 September 2025 / Revised: 1 October 2025 / Accepted: 7 October 2025 / Published: 9 October 2025

Abstract

Smart contracts empower many blockchain applications but are exposed to code-level defects. Existing methods do not scale to evolving codebases, do not represent complex control and data flows, and lack granular, calibrated evidence. To address these concerns, we present a contract-graph method with cross-graph matching for vulnerability detection: abstract syntax, control flow, and data flow are fused into a typed, directed contract-graph whose nodes are enriched with pre-trained code embeddings (GraphCodeBERT or CodeT5+). A Graph Matching Network (GMN) with cross-graph attention compares contract-graphs, aligns homologous sub-graphs associated with vulnerabilities, and supports statement-level interpretation, balancing broad structural coverage against discriminative pairwise alignment. The evaluation follows a deployment-oriented protocol with thresholds fixed on validation data, multi-seed averaging, and conservative estimates of sensitivity under low-false-positive budgets. On SmartBugs Wild, the method consistently and markedly exceeds strong rule-based and learning baselines and maintains higher sensitivity at matched false-positive rates; ablations attribute the gains to multi-graph fusion, pre-trained encoders, and cross-graph matching, with results stable across seeds.

1. Introduction

Smart contracts are a core element of the blockchain stack and face persistent security risks due to code-level weaknesses. Many vulnerabilities are related to Solidity features and suboptimal development practices [1]. High-impact incidents in 2023, including Lido DAO and Deus DAO with losses over USD 6.5 million [2], and other events, such as Euler Finance and CertiK [3], show that local design or implementation flaws can escalate to protocol-level failures. As decentralized finance and other critical applications grow more dependent on on-chain execution [3], detection methods must balance accuracy, scalability, and operational reliability.
Vulnerability detection has therefore become a central theme of blockchain security [4]. Traditional program analysis, including static analysis and formal verification, can provide strong guarantees for some bug classes, but it is difficult to adapt to large heterogeneous codebases and rapidly changing attack surfaces. Data-driven methods reduce part of this gap. Earlier neural approaches treated code as token sequences or shallow features and could not capture program structure or data dependencies [5]. Large language models improve context modeling but are expensive and often underrepresent complex control and data flows [6]. Graph-based analysis addresses these limitations by modeling code as graphs, so that graph neural networks combine structural and semantic signals and reason over paths and dependencies. Many existing graph methods still rely on a single modality such as the abstract syntax tree (AST), the control-flow graph (CFG), or the data-flow graph (DFG), or operate only at the contract level, which limits recognition of patterns that span structural views or recur across contracts [5,7].
Integrating multimodal program modeling with cross-contract pattern recognition is central to smart-contract vulnerability detection. At the representation level, a typed directed multi-graph that fuses abstract syntax, control flow, and data flow makes interactions among external calls, execution order, and state-variable dependencies explicit, which closes blind spots left by single-view methods [5]. While such a structural representation is helpful, it alone is insufficient. Contextual semantics from modern code encoders such as GraphCodeBERT and CodeT5+ add language-aware and domain-aware signals that improve discrimination under varied coding styles and obfuscation [8,9]. Effective generalization across contracts then requires alignment, where attention-based graph matching focuses inference on homologous sub-graphs that carry vulnerability evidence and improves performance on unseen contracts while preserving statement-level interpretability [7]. Audit realism also matters: threshold calibration, multi-seed reporting, and conservative estimates under low-FPR budgets control review cost without reducing sensitivity; on large real-world corpora such as SmartBugs Wild, multi-graph and cross-graph pipelines consistently surpass single-view baselines, with clear gains in low-FPR regimes [10].
Building on prior work and real-world audit requirements, this study pursues two practitioner-relevant objectives: robust cross-contract generalization and calibrated performance under low-false-positive budgets. Our contributions are threefold. First, we propose a unified multi-graph representation that fuses the AST, CFG, and DFG, enabling joint modeling of syntax, control, and data dependencies in a single space and improving coverage of execution semantics, including cross-dimensional triggers often missed by single-view methods. Second, we enrich node features with contextual semantics via GraphCodeBERT and CodeT5+, strengthening discrimination between benign and vulnerable variants across heterogeneous coding styles and obfuscation, thereby improving robustness. Third, we employ a Graph Matching Network with cross-graph attention to align homologous sub-graphs and concentrate evidence on vulnerability-relevant regions, which supports statement-level cues and downstream localization. We further adopt a deployment-oriented evaluation protocol with validation-calibrated thresholds, multi-seed reporting, and conservative estimates of the normalized true-positive rate under low-FPR budgets, controlling audit cost without sacrificing sensitivity. On SmartBugs Wild, comprising 47,398 unique Solidity files and about 203,716 contracts, we report sensitivity at the 1%, 5%, and 10% operating points. The method attains a macro F1 of 90.18%, surpasses strong baselines in macro precision and recall, and exhibits higher sensitivity in low-FPR regions. Ablations confirm complementary gains from multi-graph fusion, semantic encoders, and cross-graph matching, with results stable across seeds.
The rest of this paper is organized as follows: Section 2 reviews graph-based vulnerability detection and motivates multi-graph fusion with cross-graph alignment. Section 3 describes contract-graph construction, encoder integration, and the matching architecture. Section 4 presents datasets, baselines, metrics, and results, including low-false-positive sensitivity and ablations. Section 5 concludes with limitations and directions for future work and deployment.

2. Related Works

Traditional vulnerability detection relies primarily on rule-based static and dynamic analyses. Static analysis checks program syntax without execution and can identify common errors, such as buffer overflows, through predefined patterns [11]. However, its dependence on hand-crafted rules limits its adaptability to previously unseen or semantically complex vulnerabilities. Dynamic analysis, on the other hand, observes execution-dependent behaviors and identifies abnormal paths at runtime [12]. However, its effectiveness is fundamentally limited by test coverage, leaving much of the execution space unexplored. These inherent limitations have motivated the transition to neural approaches, in particular graph neural networks (GNNs), which are well suited to capturing the structural and semantic dependencies of source code and thus allow more general, robust, and scalable vulnerability detection beyond rigid rules or incomplete runtime traces.
Recent studies model program code as graphs to combine structural and semantic cues. However, single-view encodings (AST, CFG, or DFG only) often miss the control–data interactions that trigger vulnerabilities. Multi-graph and heterogeneous formulations address this gap by jointly encoding syntax, control flow, and data dependencies, and they report more stable detection and localization at the function and statement levels [5,8,13]. In smart contracts, attention mechanisms further separate control signals from data signals and mitigate spurious correlations, improving trigger localization through dual or heterogeneous attention [7,14]. Work on robustness and interpretability uses contextual graph augmentation and attribution [15]. These results suggest that fusion benefits from explicit relation separation and task-aware weighting, rather than naive concatenation.
Pre-trained code models provide contextual representations that encode long-range semantics and lexical regularity; CodeT5+ is a representative model for code understanding and generation [9]. Beyond sequence encoders alone, hybrid pipelines that combine sequence-level embeddings with graph encoders consistently improve software vulnerability detection and just-in-time (JIT) scenarios by injecting structural biases from program graphs [5,8,16]. Recent studies show that graph-aware alignment and cross-graph message passing can mitigate the heuristic stacking exposed in ablation analyses by selectively transferring information across modalities and attenuating cross-graph conflicts. These trends are consistent with evidence from hybrid static–dynamic pipelines and heterogeneous graph architectures, and they motivate tighter graph–text coupling as well as more stringent evaluation protocols that address data leakage, fragility under small edits, and generalization gaps [17,18,19].
In the field of smart contracts, surveys synthesize attack taxonomies, data sources, assessment practices, and deployment concerns, emphasizing the trade-offs between coverage, interpretability, and operational costs [20,21,22,23,24]. Contract-oriented learners extend coverage through transfer and multi-label learning [25], while program analysis and measurement studies reveal risks rooted in contract-specific semantics and inter-contract interactions [26]. Public datasets and benchmarks (e.g., SmartBugs Wild) enable reproducible evaluation and large-scale error analysis [10,27,28]. Across these threads, graph learning and pre-training are complementary: robust detection depends on modeling fine-grained interactions between syntax, control flow, and data flow. Our work follows this direction by fusing the AST, CFG, and DFG under a common backbone with cross-graph alignment and attention, aiming to precisely locate triggers and provide evidence-based reports at practical cost. Recent heterogeneous and semantic graph approaches (e.g., HGAT [29], MVD-HG [30], and SCVHunter [7]) empirically demonstrate the effectiveness of multi-graph, alignment-based, and attention-based designs by showing consistent gains on widely used smart-contract benchmarks and in ablation settings, and they further underline the need for transparent and reproducible evaluation protocols.

3. Proposed Approach

3.1. Overview

The proposed framework detects smart-contract vulnerabilities via contract-graph similarity (Figure 1). Given a pair of Solidity contracts, it produces a similarity score that reflects the likelihood of shared vulnerabilities. During graph construction, labeled training contracts (safe or vulnerable) are preprocessed to remove irrelevant code; static analysis extracts the AST, CFG, and DFG, which are merged into a typed, directed multi-graph encoding structural and semantic context for the pre-trained encoder. In the cross-graph similarity phase, the GMN consumes graph pairs and learns vulnerability-discriminating features by optimizing a supervised loss. At inference, the target contract is iteratively paired with vulnerable examples from the training set, converted to graph form, and scored by the trained GMN using cosine similarity. If any similarity score exceeds a predefined threshold, the contract is labeled as vulnerable; otherwise, after all examples have been evaluated, it is classified as secure.

3.2. Contract-Graph Construction

The contract-graph construction process transforms smart-contract source code into a structured graph representation by integrating key code features through static analysis, which involves two main steps: information extraction and information integration. First, static analysis tools extract three core structures from contract code: AST, CFG, and DFG. The AST captures the syntactic structure of code statements, and these statements are identified by nodeType. The CFG models execution paths and control logic, and the DFG tracks variable usage and inter-statement data dependencies. Afterwards, these structures are integrated into a unified contract-graph. Each CFG node is replaced by its corresponding AST subtree to preserve syntactic details, and redundant nodes are pruned during this replacement. Two virtual nodes, v_fun and v_loop, are introduced to capture the function and the loop contexts, respectively. Directed edges of the AST, CFG, and DFG are combined, with type labels indicating the origin of each edge. This merged graph contains both structural and semantic information: structural information contains control flow and data flow, while semantic information contains syntactic type. Its visual workflow is illustrated in Figure 2.
Given a contract $S$ with statement set $U$ and variable set $V$, static analyses derive the abstract syntax tree $T_{\mathrm{ast}} = P_{\mathrm{ast}}(S)$, the control-flow graph $G_{\mathrm{cfg}} = (V_{\mathrm{cfg}}, E_{\mathrm{cfg}})$, and the data-flow graph $G_{\mathrm{dfg}} = (V_{\mathrm{dfg}}, E_{\mathrm{dfg}})$. For each $u \in U$, a parser mapping $m: U \to 2^{V_{\mathrm{ast}}}$ selects the associated AST node subset, yielding a pruned subtree $T_u = T_{\mathrm{ast}}[m(u)]$. The contract is represented as a typed multi-graph $\mathrm{CG}(S) = (V, E, X, \psi, \tau)$ with $E = E_{\mathrm{ast}} \cup E_{\mathrm{cfg}} \cup E_{\mathrm{dfg}}$, edge types $\psi: E \to \{\mathrm{ast}, \mathrm{cfg}, \mathrm{dfg}\}$, and node types $\tau: V \to \Sigma_{\mathrm{node}}$; auxiliary vertices $v_{\mathrm{fun}}$ and $v_{\mathrm{loop}}$ encode function and loop scopes. Construction replaces eligible CFG nodes by the corresponding $T_u$, reconnects incident CFG edges to the subtree root, and updates $(\psi, \tau)$ consistently. Node features $X \in \mathbb{R}^{|V| \times d}$ combine contextual embeddings with structural descriptors, and a token-level alignment $\pi: V \to 2^{\{1, \dots, L\}}$ supports pooling and evidence tracing. Under a single-pass extraction regime, end-to-end assembly runs in $O(|V| + |E|)$ time; control-/data-flow metadata are obtained with Slither [31], and AST structures (including nodeType) with solc-typed-ast [32]. The complete pipeline is summarized in Algorithm 1.
Algorithm 1 BuildContractGraph: extraction and fusion
Require: Source code $S$, tokenizer, static analyzers
Ensure: $\mathrm{CG}(S) = (V, E, X, T)$ and alignment $\pi$
1: $T \leftarrow P_{\mathrm{ast}}(S)$; $(V_{\mathrm{cfg}}, E_{\mathrm{cfg}}) \leftarrow \mathrm{CFG}(S)$; $(V_{\mathrm{dfg}}, E_{\mathrm{dfg}}) \leftarrow \mathrm{DFG}(S)$
2: Build $m(u) = V(T_u)$ for $u \in U$; record token alignment $\pi$
3: Replace each CFG node by $T_u$; reconnect links; prune redundant leaves
4: Add $v_{\mathrm{fun}}$, $v_{\mathrm{loop}}$; connect statements within each function/loop
5: Set $E = E_{\mathrm{ast}} \cup E_{\mathrm{cfg}} \cup E_{\mathrm{dfg}}$ with type maps $\psi$, $\tau$
6: return $\mathrm{CG}(S) = (V, E, \cdot, T)$ and $\pi$    ▹ $X$ set in Section 3.3
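As a concrete sketch of Algorithm 1's fusion step (line 5), the snippet below merges illustrative AST/CFG/DFG edge lists into a single typed, directed edge set with the virtual scope vertices. The function name and toy edges are hypothetical; real extraction would come from solc-typed-ast and Slither as described above, and node replacement/pruning is omitted.

```python
def build_contract_graph(ast_edges, cfg_edges, dfg_edges):
    """Fuse AST/CFG/DFG edge lists into one typed, directed multi-graph.

    Each argument is a list of (src, dst) node-id pairs.  Returns the node
    set plus a typed edge list, mirroring E = E_ast u E_cfg u E_dfg with the
    edge-type map psi realized as a per-edge label.
    """
    edges = []
    for etype, pairs in (("ast", ast_edges), ("cfg", cfg_edges), ("dfg", dfg_edges)):
        edges.extend((u, v, etype) for u, v in pairs)
    nodes = {u for u, _, _ in edges} | {v for _, v, _ in edges}
    nodes |= {"v_fun", "v_loop"}   # virtual function/loop scope vertices
    return nodes, edges

# Toy statements s1..s3 with one edge of each modality
nodes, edges = build_contract_graph([("s1", "s2")], [("s2", "s3")], [("s1", "s3")])
```

The per-edge type label is what lets downstream layers treat AST, control-flow, and data-flow relations differently rather than collapsing them into one untyped adjacency.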

3.3. Pre-Training Phase

In contrast to GraphCodeBERT, which adopts a DFG-centric input, this approach integrates the AST, CFG, and DFG into a unified contract-graph (CG). Let $G(S) = (V, E)$ denote the CG for source code $S$, let $V = \mathrm{Var} = \{v_1, \dots, v_k\}$ be the variable set, and let $E = \{e_1, \dots, e_\ell\}$ be the set of directed, typed edges that encode AST, control-flow, and data-flow relations. The input sequence is $X = \{[\mathrm{CLS}], S, [\mathrm{SEP}], \mathrm{Var}\}$ with total length $L$. It is embedded into $H^0 \in \mathbb{R}^{L \times d}$ using token embeddings and positional encodings. Variable tokens receive specialized positional encodings to capture data-flow structure.
A 12-layer Transformer processes $H^0$ with masked multi-head attention:
$$\mathrm{Attn}(Q, K, V; M) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}} + M\right)V,$$
where $Q$, $K$, and $V$ are the query, key, and value projections, $d$ is the feature dimension, and $M$ is the attention mask. The mask permits attention between adjacent tokens, CG-connected tokens (via AST/CFG/DFG edges), code–variable matches, and special tokens ([CLS], [SEP]). Other entries are set to $-\infty$. For layer $n \in [1, 12]$,
$$G^{n} = \mathrm{LN}\big(\mathrm{Attn}(H^{n-1}) + H^{n-1}\big), \qquad H^{n} = \mathrm{LN}\big(\mathrm{FFN}(G^{n}) + G^{n}\big),$$
with LN denoting layer normalization and FFN a two-layer feed-forward network.
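A minimal numerical sketch of the masked attention above (single head, NumPy); mask entries set to $-\infty$ receive exactly zero weight, so disallowed token pairs cannot influence the output. All names and shapes here are illustrative.

```python
import numpy as np

def masked_attention(Q, K, V, M):
    """Single-head Attn(Q, K, V; M) = softmax(Q K^T / sqrt(d) + M) V.

    M holds 0 for permitted token pairs and -inf for disallowed ones;
    exp(-inf) = 0, so masked pairs get exactly zero attention weight.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + M
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w = w / w.sum(axis=-1, keepdims=True)                 # row-wise softmax
    return w @ V

L, d = 4, 8
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, L, d))   # three toy (L, d) projections
M = np.zeros((L, L))
M[0, 3] = -np.inf                          # token 0 may not attend to token 3
out = masked_attention(Q, K, V, M)
```

Because the masked weight is exactly zero, perturbing the value vector of token 3 leaves row 0 of the output unchanged, which is the property the CG mask relies on.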
Node and edge features for downstream tasks are derived from $H^{12}$. For node $i$, features are obtained by pooling over the token alignment $\pi(i)$ and concatenating type encodings and Skip-gram embeddings:
$$x_i = \big[\mathrm{pool}\big(H^{12}[\pi(i)]\big);\; W_{\mathrm{type}}\mathbf{1}_{\tau(i)};\; e_i^{\mathrm{sg}}\big],$$
where $\tau(i)$ is the node type, $W_{\mathrm{type}}$ is the type-embedding matrix, and $e_i^{\mathrm{sg}}$ is the Skip-gram embedding. Edge features $\alpha_{ij}$ include edge-type encodings and optional Skip-gram components to capture relational context.
Pre-training optimizes three objectives with a weighted sum:
$$\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{MLM}} + \lambda_2 \mathcal{L}_{\mathrm{DFG}} + \lambda_3 \mathcal{L}_{\mathrm{Align}}.$$
Here, $\mathcal{L}_{\mathrm{MLM}}$ (masked language modeling) [33] learns syntax and semantics, $\mathcal{L}_{\mathrm{DFG}}$ models variable dependencies and data flow, and $\mathcal{L}_{\mathrm{Align}}$ aligns variable representations between source-code and data-flow views. This design fuses static structure from the AST/CFG with data-flow semantics from the DFG and supports smart-contract vulnerability detection.

3.4. Cross-Graph Similarity with a Contract-Graph Matcher

Vulnerability detection is formulated as cross-graph similarity between two contract-graphs $\mathrm{CG}_1 = (V_1, E_1, X_1)$ and $\mathrm{CG}_2 = (V_2, E_2, X_2)$. Graph $\mathrm{CG}_1$ is derived from a known vulnerable contract, and $\mathrm{CG}_2$ from a target contract. Nodes correspond to statements or variables; edges encode abstract-syntax, control-flow, and data-flow relations. Feature matrices $X_1$ and $X_2$ are produced by a shared pre-trained encoder to ensure consistent representations across graphs.
Node and edge embeddings are initialized with multilayer perceptrons to place inputs in a common latent space suitable for graph matching:
$$\eta_i^{(0)} = \mathrm{MLP}_{\mathrm{node}}(x_i), \qquad e_{ij} = \mathrm{MLP}_{\mathrm{edge}}(\alpha_{ij}),$$
where $\eta_i^{(0)}$ is the initial hidden state of node $i$, $x_i$ is the preprocessed node feature, and $\alpha_{ij}$ is the raw edge feature for $(i, j)$. The embedding $e_{ij}$ encodes edge attributes in the same latent space as the node states.
Each iteration $t$ applies two operations: intra-graph aggregation and cross-graph attention. Intra-graph aggregation preserves local dependencies and computes messages between neighbors:
$$\mu_{j \to i}^{(t)} = f_m\big(\eta_i^{(t)}, \eta_j^{(t)}, e_{ij}\big), \qquad \forall (i, j) \in E_1 \cup E_2,$$
where $f_m$ is a feed-forward message function. Cross-graph attention enables interaction between nodes across graphs to compare vulnerability patterns. Attention weights are
$$a_{ij}^{(t)} = \mathrm{softmax}_j\big((\eta_i^{(t)})^{\top} W \eta_j'^{(t)}\big), \qquad v_{j \to i}^{(t)} = a_{ij}^{(t)} \eta_j'^{(t)},$$
where $W$ is a trainable projection and $\eta_j'^{(t)}$ denotes the hidden state of a node from the opposite graph. Node updates fuse the previous state with aggregated messages:
$$\eta_i^{(t+1)} = f_u\Big(\eta_i^{(t)},\; \textstyle\sum_j \mu_{j \to i}^{(t)},\; \textstyle\sum_j v_{j \to i}^{(t)}\Big),$$
with $f_u$ a feed-forward update rule. The resulting interactions yield similarity signals that highlight matched substructures indicative of shared vulnerabilities.
The hidden state of node $i$ at iteration $t + 1$ thus combines three components: the previous state, aggregated intra-graph messages, and aggregated cross-graph messages:
$$\eta_i^{(t+1)} = f_u\Big(\eta_i^{(t)},\; \textstyle\sum_{j : (j, i) \in E_1 \cup E_2} \mu_{j \to i}^{(t)},\; \textstyle\sum_j v_{j \to i}^{(t)}\Big),$$
where $f_u$ is a feed-forward function that maps the aggregated inputs to the next hidden state. A typical choice instantiates $f_u$ as a multilayer perceptron with a nonlinearity and optional gating; residual connections and layer normalization can be applied to improve stability. The cross-graph attention that yields $v_{j \to i}^{(t)}$ is illustrated in Figure 3.
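The two operations of one propagation step can be sketched numerically as follows. This is a schematic of the message flow, not the trained model: $f_m$ is simplified to the neighbor state, $f_u$ to an elementwise sum, and all shapes and names are illustrative.

```python
import numpy as np

def gmn_step(H1, H2, edges1, W):
    """One simplified GMN propagation step for graph 1.

    H1 (n1, d), H2 (n2, d): node states of the two graphs.
    edges1: directed (j, i) pairs of graph 1.
    W (d, d): trainable cross-graph projection (here a fixed toy matrix).
    """
    # Intra-graph aggregation: mu_{j->i} simplified to the neighbor state h_j
    msg = np.zeros_like(H1)
    for j, i in edges1:
        msg[i] += H1[j]
    # Cross-graph attention over all nodes of the opposite graph
    scores = H1 @ W @ H2.T                        # (n1, n2) logits h_i^T W h'_j
    scores = scores - scores.max(axis=1, keepdims=True)
    a = np.exp(scores)
    a = a / a.sum(axis=1, keepdims=True)          # softmax_j -> attention a_ij
    v = a @ H2                                    # sum_j a_ij h'_j
    # Update f_u simplified to a sum of state, intra messages, cross messages
    return H1 + msg + v

rng = np.random.default_rng(1)
H1 = rng.standard_normal((3, 4))
H2 = rng.standard_normal((5, 4))
H1_next = gmn_step(H1, H2, [(0, 1), (2, 1)], np.eye(4))
```

Running the same step with the graphs swapped updates graph 2; in the actual matcher both directions use learned MLPs for $f_m$ and $f_u$.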
After $T$ iterations, node embeddings are aggregated into graph-level representations by a permutation-invariant readout $f_a$. Common choices include mean pooling and max pooling; the resulting vectors summarize global vulnerability signals in a contract-graph:
$$g_1 = f_a\big(\{\eta_i^{(T)}\}_{i \in V_1}\big), \qquad g_2 = f_a\big(\{\eta_j^{(T)}\}_{j \in V_2}\big).$$
Cosine similarity is then computed after $\ell_2$ normalization to remove scale effects:
$$\hat{g}_1 = \frac{g_1}{\|g_1\|_2}, \qquad \hat{g}_2 = \frac{g_2}{\|g_2\|_2}, \qquad s = \hat{g}_1^{\top} \hat{g}_2.$$
The similarity score $s \in [-1, 1]$ quantifies alignment between two contracts. Values close to $1$ indicate high similarity and a greater likelihood of shared vulnerabilities; values near $-1$ indicate low similarity.
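A small sketch of the readout and similarity computation, assuming mean pooling for $f_a$ (one of the choices named above); the toy node-state matrices are illustrative.

```python
import numpy as np

def graph_similarity(H1, H2):
    """Mean-pooling readout f_a, then cosine similarity of the two
    l2-normalized graph vectors; returns s in [-1, 1]."""
    g1 = H1.mean(axis=0)
    g2 = H2.mean(axis=0)
    g1 = g1 / np.linalg.norm(g1)
    g2 = g2 / np.linalg.norm(g2)
    return float(g1 @ g2)

# Parallel pooled vectors give s = 1; orthogonal ones give s = 0
s_same = graph_similarity(np.array([[1.0, 0.0], [3.0, 0.0]]),
                          np.array([[2.0, 0.0]]))
s_orth = graph_similarity(np.array([[1.0, 0.0]]),
                          np.array([[0.0, 5.0]]))
```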
The pipeline is computationally efficient for large-scale analysis. Contract-graph construction runs in $O(|V| + |E|)$ time. For the pre-trained encoder with sequence length $L$ and feature dimension $d$, each layer incurs $O(L^2)$ for self-attention and $O(Ld)$ for projections and feed-forward blocks. For the GMN with $T$ iterations, intra-graph message passing costs $O\big(T(|E_1| + |E_2|)\big)$, and cross-graph attention costs $O\big(T\,|V_1|\,|V_2|\big)$. This complexity profile balances accuracy and efficiency and supports practical deployment in real-world detection workflows.

3.5. Training and Detection

Training uses a margin-based contrastive loss to separate similar and dissimilar contract pairs. Similar pairs comprise contracts that share the same vulnerability type; dissimilar pairs comprise one vulnerable and one secure contract. For a labeled pair $(S_1, S_2, y)$, $y = 1$ denotes a similar pair and $y = 0$ a dissimilar pair. The graph embeddings of $S_1$ and $S_2$ are $g_1$ and $g_2$, and the similarity score is $s = \cos(g_1, g_2)$. The loss is
$$\mathcal{L}_{\mathrm{pair}} = y(1 - s)^2 + (1 - y)\max(0,\, s - m)^2,$$
where $m \in (0, 1)$ is a margin hyperparameter. The margin enforces a minimum separation for dissimilar pairs and improves the model's discrimination between vulnerable and non-vulnerable patterns during training.
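The loss behaves as described: similar pairs are pulled toward $s = 1$, while dissimilar pairs are penalized only when their score exceeds the margin. A direct sketch, with a hypothetical margin $m = 0.5$:

```python
def pair_loss(s, y, m=0.5):
    """Margin-based contrastive loss L_pair = y(1-s)^2 + (1-y) max(0, s-m)^2.

    y = 1: similar pair, quadratic pull toward s = 1.
    y = 0: dissimilar pair, penalized only above the margin m.
    """
    return y * (1.0 - s) ** 2 + (1 - y) * max(0.0, s - m) ** 2

loss_sim = pair_loss(0.9, y=1)   # similar pair scored high: small loss
loss_dis = pair_loss(0.9, y=0)   # dissimilar pair scored high: penalized
loss_ok = pair_loss(0.3, y=0)    # dissimilar pair below margin: zero loss
```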
During detection, all vulnerable contracts from the training set are embedded to form the reference library $\mathcal{Z}$, which summarizes known vulnerability patterns. A target contract is processed by the same construction and encoding pipeline to obtain its embedding $g$. The maximum cosine similarity to the library identifies the most relevant pattern:
$$s(G) = \max_{z \in \mathcal{Z}} \cos(g, z).$$
The target is classified as vulnerable if $s(G) \ge \tau$, where $\tau$ is selected on a validation set to balance precision and recall: precision controls false positives, recall controls false negatives. This procedure yields a stable operating point for practical deployment. Algorithm 2 outlines graph construction, encoder pre-training and pooling, pairwise optimization of the matcher, construction of the reference library, and maximum-similarity inference.
Algorithm 2 TrainAndDetect: margin-based training and library-guided inference
Require: Labeled contracts $\{(S_i, y_i)\}$; validation-selected threshold $\tau$
1: Build $\mathrm{CG}(S)$ and alignment $\pi$ via Algorithm 1
2: Pre-train encoder; extract and pool node/edge features
3: Train the matcher on pairs $(S_1, S_2, y)$ by minimizing $\mathcal{L}_{\mathrm{pair}}$
4: Assemble vulnerable embedding library $\mathcal{Z} \leftarrow \{z = \mathrm{embed}(S) : y = 1\}$
5: Detect: embed target contract $G$ as $g$; compute $s(G)$; output vulnerable if $s(G) \ge \tau$, else secure
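Step 5's library-guided decision reduces to a maximum-cosine-similarity test against $\mathcal{Z}$. A sketch with toy 2-dimensional embeddings and an illustrative library (the paper's test-time threshold $\tau = 0.8$ is used as the default):

```python
import numpy as np

def detect(g, library, tau=0.8):
    """Score embedding g against the vulnerable reference library by
    maximum cosine similarity; flag as vulnerable if s(G) >= tau."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    s = max(cos(g, z) for z in library)
    return s, s >= tau

library = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # toy known patterns
s_hit, flag_hit = detect(np.array([0.9, 0.1]), library)   # near a pattern
s_miss, flag_miss = detect(np.array([1.0, -1.0]), library) # far from both
```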

4. Empirical Evaluation

The method is evaluated on three fronts: first, end-to-end accuracy against rule-based and learning-based baselines; second, the benefit of fusing the AST, CFG, and DFG over single views; and third, ablations of the key modules. Sensitivity is also studied under low-false-positive budgets.

4.1. Datasets

The SmartBugs Wild dataset [10] is used as a public corpus of Solidity contracts. It contains 47,398 unique .sol files (approximately 203,716 contracts). Labels cover reentrancy (RE), timestamp dependency (TD), and integer overflow/underflow (IO), plus non-vulnerable files.

4.2. Baselines and Setup

(1) Static analysis and graphs. ASTs are extracted with solc-typed-ast [32]. CFG/DFG and def-use metadata come from Slither [31]. These artifacts are merged into contract-level graphs (Section 3.2) using NetworkX [34].
(2) Encoder and matcher. GraphCodeBERT [33] is used (12 layers, hidden size 768, 12 heads, Adam). The contract-graph matcher uses embedding size 100, four hidden layers, learning rate $10^{-4}$, and a test-time threshold $\tau = 0.8$. Training runs for 85 iterations (Section 3.4).
(3) Baselines. For transparency and comparability, the baselines are organized into two families under a uniform evaluation interface. Rule-based: sFuzz [35], SmartCheck [36], Osiris [37], Oyente [38], and Mythril [39]. Learning-based: LineVul [40], GCN [41], TMP [42], AME [43], Peculiar [44], CBGRU [45], and CGE [46]. All experiments follow the configurations specified in the original works.
(4) Splits and protocol. A 60/20/20 train/validation/test split is used. Thresholds are chosen on validation data and fixed during testing. Each experiment uses five random seeds; means are reported.
(5) Metrics. Precision, recall, and $F_1$ follow the standard definitions, which are not repeated here. For completeness, $F_2$ and Fowlkes–Mallows (FM) are reported as deterministic functions of $(P, R)$.
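Since $F_2$ and FM are deterministic functions of $(P, R)$ (the standard forms $F_\beta = (1+\beta^2)PR/(\beta^2 P + R)$ and $\mathrm{FM} = \sqrt{PR}$), they can be recovered from any reported precision/recall pair. The sketch below uses the macro values from Section 4.3 purely as an illustration:

```python
import math

def f_beta(p, r, beta):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 2 weights recall."""
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

def fowlkes_mallows(p, r):
    """FM = sqrt(P * R), the geometric mean of precision and recall."""
    return math.sqrt(p * r)

# Macro precision/recall reported in Section 4.3, used here as example inputs
f1 = f_beta(0.9102, 0.8937, beta=1)   # harmonic mean of P and R
f2 = f_beta(0.9102, 0.8937, beta=2)
fm = fowlkes_mallows(0.9102, 0.8937)
```

With these inputs, $F_1$ lands at roughly 90.2%, consistent with the macro $F_1$ reported in the results.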

4.3. End-to-End Results

Table 1 reports precision, recall, and macro-averaged $F_1$ for reentrancy, timestamp dependency, and integer overflow. The proposed framework attains 91.02% macro precision, 89.37% macro recall, and 90.18% macro $F_1$, exceeding the strongest baseline, CGE [46], by 4.40, 3.03, and 3.71 percentage points, respectively. Pattern-based tools (sFuzz [35], SmartCheck [36], Mythril [39]) yield only 21.69–59.52% macro $F_1$, reflecting the brittleness of fixed rules, whereas learning-based methods perform more consistently. Within these, GNN-based approaches outpace sequence-only models such as LineVul [40] (75.96% macro $F_1$) by leveraging structural context.
Per-category analysis indicates the largest gain on reentrancy: the method achieves 91.29% $F_1$, a +6.01 percentage-point improvement over CGE, attributable to fused AST/CFG/DFG representations that capture syntax–execution–dependency interactions. Advantages also hold for timestamp dependency (90.95%, +2.72 pp) and integer overflow (88.30%, +2.41 pp). Auxiliary metrics corroborate these trends: for reentrancy, the $F_2$-score is 90.68% and the Fowlkes–Mallows coefficient is 91.30%, underscoring utility in security auditing, where both recall and precision are critical. Overall, combining multi-graph fusion, pre-trained semantic encoding, and cross-graph matching addresses limitations of rule-based and sequence-only baselines and yields consistent, across-task improvements.

4.4. Low-FPR Sensitivity

False-positive rate (FPR) is the proportion of non-vulnerable contracts incorrectly classified as vulnerable and is a key consideration in audits, where false alarms waste reviewer effort. To compute the observed FPR from available summary statistics, the confusion-matrix relations are inverted using prevalence $\pi \in (0, 1)$, recall $R$ (true-positive rate), and precision $P$:
$$P = \frac{\pi R}{\pi R + (1 - \pi)\,\mathrm{FPR}} \;\Longrightarrow\; \mathrm{FPR} = \frac{\pi R\,(1/P - 1)}{1 - \pi},$$
where $\mathrm{FPR}$ denotes the false-positive rate implied by $(P, R, \pi)$. Following [17], the low-FPR normalized sensitivity (vertically normalized partial ROC area) is
$$\mu_{\mathrm{TPR}}(\alpha) = \frac{1}{\alpha} \int_0^{\alpha} \mathrm{TPR}(u)\, du \;\in\; [0, 1], \qquad \alpha \in (0, 1),$$
which emphasizes detector behavior under a false-positive budget $\alpha$. Under the ROC convex-hull (ROCCH) model and the constraint $\mathrm{TPR}(u) \le R$, the tight lower bound is
$$\mu_{\mathrm{TPR}}^{\mathrm{lower}}(\alpha) = \begin{cases} \dfrac{R\,\alpha}{2\,\mathrm{FPR}}, & \alpha \le \mathrm{FPR}, \\[6pt] R\left(1 - \dfrac{\mathrm{FPR}}{2\alpha}\right), & \alpha > \mathrm{FPR}, \end{cases}$$
which is continuous at $\alpha = \mathrm{FPR}$. The protocol fixes the prevalence at $\pi = 5\%$ and reports $\mu_{\mathrm{TPR}}^{\mathrm{lower}}(\alpha)$ at $\alpha \in \{1\%, 5\%, 10\%\}$.
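The inversion and the lower bound can be checked numerically. The precision/recall inputs below are illustrative (not the reported per-class figures), with $\pi = 5\%$ as in the protocol:

```python
def implied_fpr(p, r, pi):
    """Invert P = pi*R / (pi*R + (1 - pi)*FPR) for the implied FPR."""
    return pi * r * (1.0 / p - 1.0) / (1.0 - pi)

def mu_tpr_lower(alpha, r, fpr):
    """ROCCH lower bound on the normalized partial TPR, with TPR(u) <= R."""
    if alpha <= fpr:
        return r * alpha / (2.0 * fpr)
    return r * (1.0 - fpr / (2.0 * alpha))

# Illustrative operating point: P = 0.91, R = 0.89 at prevalence 5%
fpr = implied_fpr(p=0.91, r=0.89, pi=0.05)
curve = [mu_tpr_lower(a, 0.89, fpr) for a in (0.01, 0.05, 0.10)]
```

Both branches give $R/2$ at $\alpha = \mathrm{FPR}$, confirming the stated continuity, and the curve is non-decreasing in $\alpha$ and bounded above by $R$.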
Figure 4 plots $\mu_{\mathrm{TPR}}^{\mathrm{lower}}(\alpha)$ versus $\alpha$ for reentrancy (RE), timestamp dependency (TD), and integer overflow (IO), comparing the proposed method with CGE, Peculiar, CBGRU, and Mythril. The proposed method attains the highest normalized mean TPR across all $\alpha$: for RE, the gains over the baselines range from $+11.82$ to $+54.91$ percentage points at $\alpha = 1\%$, from $+4.97$ to $+40.09$ at $\alpha = 5\%$, and from $+3.95$ to $+31.96$ at $\alpha = 10\%$. All curves are monotonically non-decreasing and upper-bounded by the recall $R$, consistent with the ROCCH constraint.
These advantages arise from the fused AST/CFG/DFG representation, which captures syntax–control–data interactions, and from cross-graph matching, which aligns vulnerability-relevant sub-graphs to reduce false positives. In contrast, the pattern-based Mythril [39] is constrained by fixed rules and shows limited adaptability to code variants; the sequence-based CBGRU underutilizes structural context; and Peculiar [44] emphasizes structural patterns yet may underexploit semantic cues. As a result, these baselines struggle to balance TPR and FPR in the low-FPR regime, whereas the proposed method delivers more favorable trade-offs for practical audits.

4.5. Visual Diagnostics

Figure 5 compares $F_1$ bars with overlaid recall and precision markers. When the two markers sit close to the top of a bar, the operating threshold is well calibrated; a visible separation indicates asymmetric errors, i.e., a preference for precision over recall or vice versa. For reentrancy, timestamp dependency, and integer overflow, the proposed method appears as the rightmost bar (the diagram is sorted by $F_1$), showing the highest $F_1$ in each class, and it also shows minimal marker gaps. For reentrancy, its $F_1$ is 91.29% (consistent with Table 1); the near-overlap of the recall and precision markers further indicates a balanced precision–recall trade-off.
Among the baselines, CGE is the closest competitor; however, for reentrancy, its recall is about four percentage points lower than its precision, suggesting missed positive cases at the selected threshold and reflecting the absence of explicit cross-graph interaction modeling. Peculiar shows a precision-skewed profile with a 3–5-percentage-point gap between its precision and recall markers, while CBGRU suppresses recall for integer overflow with a gap of about six percentage points. Rule-based tools such as Mythril occupy the leftmost positions (the lowest $F_1$) and display the largest marker gaps, in line with the fragility of fixed rules under code variants. Overall, the combined AST/CFG/DFG representation and cross-graph matching reduce false positives without sacrificing recall, yielding a more favorable precision–recall balance for audit-oriented use.

4.6. Fusion and Ablations

Table 2 reports per-class $F_1$ for single-graph input variants and ablations of the proposed framework. The full model fuses the AST, CFG, and DFG and attains the best performance (91.29% on reentrancy, 90.95% on timestamp dependency, and 88.30% on integer overflow), surpassing all single-graph inputs. The AST-only variant peaks at 84.16% $F_1$, reflecting its inability to encode control- or data-flow constraints, whereas the DFG-only variant, the strongest single-graph baseline at 86.41% $F_1$, still underperforms on control-intensive patterns due to limited modeling of execution structure.
Ablations corroborate the contribution of each component. Replacing the fused graph with the DFG alone yields consistent drops across classes. Removing the pre-trained encoder (GraphCodeBERT) reduces the reentrancy $F_1$ to 82.78% ($-8.51$ percentage points relative to the full model), indicating that semantic priors are necessary beyond structural cues. Eliminating the cross-graph matcher further degrades performance by weakening the alignment of vulnerability-relevant sub-graphs across contracts. Taken together, multi-graph fusion, pre-trained semantics, and cross-graph alignment act synergistically to provide more comprehensive code representations and higher detection accuracy.

5. Conclusions and Future Work

This work presents a graph-matching-network (GMN) framework that unites multi-graph fusion, semantic enrichment with GraphCodeBERT, and cross-graph matching for smart-contract vulnerability detection. By fusing the AST, CFG, and DFG into a single contract-graph, the method captures syntax, execution logic, and variable dependencies in one representation. The pre-trained encoder provides contextual semantic embeddings for nodes and edges, and a graph matching network with cross-graph attention aligns vulnerability-related sub-graphs across contracts, so that inference focuses on evidenced code regions and false positives are reduced. Experiments on SmartBugs Wild show that the GMN achieves a macro-average F1 of 90.18% across reentrancy, timestamp dependency, and integer overflow, surpasses rule- and learning-based baselines, and offers superior performance in low-false-positive regimes, supporting audit-oriented use. At the same time, the current scope is limited to single-contract analysis and a static model that does not adapt to the evolving characteristics of Solidity. Future work will model inter-contract interactions and transaction-level behaviors, introduce adaptive and continual learning to track emerging vulnerabilities and language updates, and evaluate robustness to code perturbations and adversarial modifications. We will also strengthen transparency through calibrated threshold selection, reproducible protocols, and evidence-based reports linking each detection to its supporting sub-graphs.

Author Contributions

Conceptualization, F.Y.; Methodology, X.L.; Software, Y.T.; Supervision, J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. The APC was funded by the authors.

Data Availability Statement

The data used to support the findings of this study are unavailable due to privacy or ethical restrictions and cannot be publicly shared.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Ivanov, N.; Li, C.; Yan, Q.; Sun, Z.; Cao, Z.; Luo, X. Security threat mitigation for smart contracts: A comprehensive survey. ACM Comput. Surv. 2023, 55, 326. [Google Scholar] [CrossRef]
  2. Tsai, C.C.; Lin, C.C.; Liao, S.W. Unveiling vulnerabilities in DAO: A comprehensive security analysis and protective framework. In Proceedings of the 2023 IEEE International Conference on Blockchain (Blockchain 2023), Danzhou, China, 17–21 December 2023; pp. 151–158. [Google Scholar]
  3. Wu, H.; Yao, Q.; Liu, Z.; Huang, B.; Zhuang, Y.; Tang, H.; Liu, E. Blockchain for finance: A survey. IET Blockchain 2024, 4, 101–123. [Google Scholar] [CrossRef]
  4. Li, S.; Zhou, Y.; Wu, J.; Li, Y.; Liu, X.; Zhou, J.; Zhang, Y. Survey of vulnerability detection and defense for Ethereum smart contracts. IEEE Trans. Netw. Sci. Eng. 2023, 10, 2419–2437. [Google Scholar]
  5. Qiu, F.; Liu, Z.; Hu, X.; Xia, X.; Chen, G.; Wang, X. Vulnerability Detection via Multiple-Graph-Based Code Representation. IEEE Trans. Softw. Eng. 2024, 50, 2178–2199. [Google Scholar] [CrossRef]
  6. Ding, H.; Liu, Y.; Piao, X.; Song, H.; Ji, Z. SmartGuard: An LLM-enhanced framework for smart contract vulnerability detection. Expert Syst. Appl. 2025, 269, 126479. [Google Scholar] [CrossRef]
  7. Luo, F.; Luo, R.; Chen, T.; Qiao, A.; He, Z.; Song, S.; Jiang, Y.; Li, S. SCVHunter: Smart Contract Vulnerability Detection Based on Heterogeneous Graph Attention Network. In Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE 2024), Lisbon, Portugal, 14–20 April 2024; pp. 170:1–170:13. [Google Scholar]
  8. Chen, D.; Feng, L.; Fan, Y.; Shang, S.; Wei, Z. Smart contract vulnerability detection based on semantic graph and residual graph convolutional networks with edge attention. J. Syst. Softw. 2023, 202, 111705. [Google Scholar] [CrossRef]
  9. Wang, Y.; Le, H.; Gotmare, A.D.; Bui, N.D.; Li, J.; Hoi, S.C. CodeT5+: Open Code Large Language Models for Code Understanding and Generation. In Proceedings of the 2023 Association for Computational Linguistics Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), Singapore, 6–10 December 2023; pp. 1069–1088. [Google Scholar]
  10. Durieux, T.; Ferreira, J.F.; Abreu, R.; Cruz, P. Empirical Review of Automated Analysis Tools on 47,587 Ethereum Smart Contracts. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, Seoul, Republic of Korea, 27 June–19 July 2020; pp. 530–541. [Google Scholar]
  11. Xiang, J.; Fu, L.; Ye, T.; Liu, P.; Le, H.; Zhu, L.; Wang, W. LuaTaint: A Static Analysis System for Web Configuration Interface Vulnerability of Internet of Things Device. IEEE Internet Things J. 2024, 12, 5970–5984. [Google Scholar] [CrossRef]
  12. Li, Y.; Ma, L.; Shen, L.; Lv, J.; Zhang, P. Open source software security vulnerability detection based on dynamic behavior features. PLoS ONE 2019, 14, E0221530. [Google Scholar] [CrossRef]
  13. Cai, J.; Li, B.; Zhang, J.; Sun, X.; Chen, B. Combine sliced joint graph with graph neural networks for smart contract vulnerability detection. J. Syst. Softw. 2023, 195, 111550. [Google Scholar] [CrossRef]
  14. Zhen, Z.; Zhao, X.; Zhang, J.; Wang, Y.; Chen, H. DA-GNN: A smart contract vulnerability detection method based on Dual Attention Graph Neural Network. Comput. Netw. 2024, 242, 110238. [Google Scholar] [CrossRef]
  15. Cao, S.; Sun, X.; Wu, X.; Lo, D.; Bo, L.; Li, B.; Liu, W. Coca: Improving and Explaining Graph Neural Network-Based Vulnerability Detection Systems. In Proceedings of the 2024 IEEE/ACM International Conference on Software Engineering (ICSE 2024), Lisbon, Portugal, 14–20 April 2024; pp. 155:1–155:13. [Google Scholar]
  16. Hussain, S.; Nadeem, M.; Baber, J.; Hamdi, M.; Rajab, A.; Al Reshan, M.S.; Shaikh, A. Vulnerability detection in Java source code using a quantum convolutional neural network with self-attentive pooling, deep sequence, and graph-based hybrid feature extraction. Sci. Rep. 2024, 14, 7406. [Google Scholar]
  17. Arp, D.; Quiring, E.; Pendlebury, F.; Warnecke, A.; Pierazzi, F.; Wressnegger, C.; Cavallaro, L.; Rieck, K. Pitfalls in Machine Learning for Computer Security. Commun. ACM 2024, 67, 104–112. [Google Scholar] [CrossRef]
  18. Guo, Y.; Bettaieb, S.; Casino, F. A comprehensive analysis on software vulnerability detection datasets: Trends, challenges, and road ahead. Int. J. Inf. Secur. 2024, 23, 3311–3327. [Google Scholar] [CrossRef]
  19. Wang, H.; Tang, Z.; Tan, S.H.; Wang, J.; Liu, Y.; Fang, H.; Xia, C.; Wang, Z. Combining structured static code information and dynamic symbolic traces for software vulnerability prediction. In Proceedings of the 46th International Conference on Software Engineering (ICSE 2024), Lisbon, Portugal, 14–20 April 2024. [Google Scholar]
  20. Jiao, T.; Xu, Z.; Qi, M.; Wen, S.; Xiang, Y.; Nan, G. A survey of Ethereum smart contract security: Attacks and detection. Distrib. Ledger Technol. Res. Pract. 2024, 3, 1–28. [Google Scholar] [CrossRef]
  21. Chu, H.; Zhang, P.; Dong, H.; Xiao, Y.; Ji, S.; Li, W. A survey on smart contract vulnerabilities: Data sources, detection and repair. Inf. Softw. Technol. 2023, 159, 107221. [Google Scholar] [CrossRef]
  22. Wei, Z.; Sun, J.; Zhang, Z.; Zhang, X.; Yang, X.; Zhu, L. Survey on quality assurance of smart contracts. ACM Comput. Surv. 2024, 57, 32. [Google Scholar] [CrossRef]
  23. Vidal, F.R.; Ivaki, N.; Laranjeiro, N. Vulnerability detection techniques for smart contracts: A systematic literature review. J. Syst. Softw. 2024, 217, 112160. [Google Scholar] [CrossRef]
  24. Wu, G.; Wang, H.; Lai, X.; Wang, M.; He, D.; Choo, K.-K.R. A comprehensive survey of smart contract security: State of the art and research directions. J. Netw. Comput. Appl. 2024, 226, 103882. [Google Scholar] [CrossRef]
  25. Sendner, C.; Petzi, L.; Stang, J.; Dmitrienko, A. Smarter Contracts: Detecting vulnerabilities in smart contracts with deep transfer learning (ESCORT). In Proceedings of the Network and Distributed System Security Symposium (NDSS 2023), San Diego, CA, USA, 27 February–3 March 2023; Internet Society: Reston, VA, USA, 2023; pp. 1–18. [Google Scholar]
  26. Ruaro, N.; Gritti, F.; McLaughlin, R.; Grishchenko, I.; Kruegel, C.; Vigna, G. Not your type! Detecting storage collision vulnerabilities in Ethereum smart contracts. In Proceedings of the Network and Distributed System Security Symposium (NDSS 2024), San Diego, CA, USA, 26 February–1 March 2024; Internet Society: Reston, VA, USA, 2024; pp. 1–16. [Google Scholar]
  27. Ferreira, J.F.; Durieux, T.; Maranhao, R. SmartBugs Wild Dataset: 47,398 Smart Contracts from Ethereum. Dataset. 2020. Available online: https://github.com/smartbugs/smartbugs-wild (accessed on 26 August 2025).
  28. Huang, Q.; Zeng, Z.; Shang, Y. An empirical study of integer overflow detection and false positive analysis in smart contracts. In Proceedings of the 8th ACM International Conference on Big Data and Internet of Things (BDIOT 2024), Macau, China, 14–16 September 2024; pp. 247–251. [Google Scholar]
  29. Ma, C.; Liu, S.; Xu, G. HGAT: Smart contract vulnerability detection method based on hierarchical graph attention network. J. Cloud Comput. 2023, 12, 93. [Google Scholar] [CrossRef]
  30. Xu, J.; Wang, T.; Lv, M.; Chen, T.; Zhu, T.; Ji, B. MVD-HG: Multigranularity smart contract vulnerability detection method based on heterogeneous graphs. Cybersecurity 2024, 7, 55. [Google Scholar] [CrossRef]
  31. Feist, J.; Grieco, G.; Groce, A. Slither: A static analysis framework for smart contracts. In Proceedings of the 2019 IEEE/ACM 2nd International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB), Montreal, QC, Canada, 27 May 2019; pp. 8–15. [Google Scholar]
  32. Consensys. solc-typed-ast: A Typed Solidity AST Library, Version v18.1.4. GitHub Repository. 2024. Available online: https://github.com/ConsenSys/solc-typed-ast (accessed on 26 April 2024).
  33. Guo, H.; Yu, Y.; Li, X. ContractFuzzer: Fuzzing smart contracts for vulnerability detection. In Proceedings of the 2020 IEEE International Conference on Software Testing, Verification and Validation (ICST), Porto, Portugal, 24–28 October 2020; pp. 191–201. [Google Scholar]
  34. Hasan, M.; Kumar, N.; Majeed, A.; Ahmad, A.; Mukhtar, S. Protein–Protein Interaction Network Analysis Using NetworkX. In Protein–Protein Interactions: Methods and Protocols; Springer: Berlin/Heidelberg, Germany, 2023; pp. 457–467. [Google Scholar]
  35. Nguyen, T.D.; Pham, L.H.; Sun, J.; Lin, Y.; Minh, Q.T. sFuzz: An efficient adaptive fuzzer for Solidity smart contracts. In Proceedings of the 42nd ACM/IEEE International Conference on Software Engineering (ICSE 2020), Seoul, Republic of Korea, 27 June–19 July 2020; pp. 778–788. [Google Scholar]
  36. Tikhomirov, S.; Voskresenskaya, E.; Ivanitskiy, I.; Takhaviev, R.; Marchenko, E.; Alexandrov, Y. SmartCheck: Static analysis of Ethereum smart contracts. In Proceedings of the 1st International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB), Gothenburg, Sweden, 25 May 2018; pp. 9–16. [Google Scholar]
  37. Torres, C.F.; Schütte, J.; State, R. Osiris: Hunting for Integer Bugs in Ethereum Smart Contracts. In Proceedings of the 2018 Annual Computer Security Applications Conference (ACSAC 2018), San Juan, PR, USA, 3–7 December 2018; pp. 664–676. [Google Scholar]
  38. Luu, L.; Chu, D.-H.; Olickel, H.; Saxena, P.; Hobor, A. Making smart contracts smarter. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS), Vienna, Austria, 24–28 October 2016; pp. 254–269. [Google Scholar]
  39. Chee, C.Y.M.; Pal, S.; Pan, L.; Doss, R. An analysis of important factors affecting the success of blockchain smart contract security vulnerability scanning tools. In Proceedings of the 5th ACM International Symposium on Blockchain and Secure Critical Infrastructure (BSCI 2023), Melbourne, Australia, 10–14 July 2023; pp. 105–113. [Google Scholar]
  40. Fu, M.; Tantithamthavorn, C. LineVul: A transformer-based line-level vulnerability prediction. In Proceedings of the 19th International Conference on Mining Software Repositories (MSR 2022), Pittsburgh, PA, USA, 23–24 May 2022; pp. 608–620. [Google Scholar]
  41. Zhang, H.; Lu, G.; Zhan, M.; Zhang, B. Semi-Supervised Classification of Graph Convolutional Networks with Laplacian Rank Constraints. Neural Process. Lett. 2022, 54, 2645–2656. [Google Scholar] [CrossRef]
  42. Zhuang, Y.; Liu, Z.; Qian, P.; Liu, Q.; Wang, X.; He, Q. Smart contract vulnerability detection using graph neural networks. In Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI 2020), Virtual, 11–17 July 2020; pp. 3283–3290. [Google Scholar]
  43. Liu, Z.; Xu, Q.; Chen, H.; Zhang, W. Hybrid analysis of integer overflow vulnerabilities in Ethereum smart contracts. Future Gener. Comput. Syst. 2021, 119, 91–100. [Google Scholar] [CrossRef]
  44. Wu, H.; Zhang, Z.; Wang, S.; Lei, Y.; Lin, B.; Qin, Y.; Zhang, H.; Mao, X. Peculiar: Smart contract vulnerability detection based on crucial data flow graph and pre-training techniques. In Proceedings of the 32nd IEEE International Symposium on Software Reliability Engineering (ISSRE 2021), Wuhan, China, 25–28 October 2021; pp. 378–389. [Google Scholar]
  45. Zhang, R.; Wang, P.; Zhao, L. Machine learning-based detection of reentrancy vulnerabilities in smart contracts. Future Gener. Comput. Syst. 2022, 127, 362–373. [Google Scholar]
  46. He, L.; Zhao, X.; Wang, Y. GraphSA: Smart Contract Vulnerability Detection Combining Graph Neural Networks and Static Analysis. In Proceedings of the 26th European Conference on Artificial Intelligence ECAI 2023, Krakow, Poland, 30 September–4 October 2023; Frontiers in Artificial Intelligence and Applications. IOS Press: Amsterdam, The Netherlands, 2023; pp. 1026–1036. [Google Scholar]
Figure 1. Overview of the contract-graph matching framework. Static analysis extracts the AST/CFG/DFG and constructs the fused contract-graph (CG). The training stage learns cross-graph similarity with a contrastive loss on CG pairs. The detection stage embeds a target CG, computes a similarity score s(G) with the matcher, and applies a decision threshold τ selected on validation and kept fixed at test time. Solid arrows indicate training; outlined arrows indicate detection.
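The caption's decision rule (score a target CG, then threshold at a τ fixed on validation) can be sketched as follows; the score distributions, false-positive budget, and helper names are illustrative assumptions, not the authors' code:

```python
import numpy as np

def pick_threshold(val_scores, val_labels, budget_fpr=0.05):
    """Choose tau on validation: the smallest similarity score that keeps
    the validation false-positive rate within the budget. tau is then
    frozen and reused unchanged at test time."""
    neg = np.sort(val_scores[val_labels == 0])[::-1]  # negatives, descending
    k = int(np.floor(budget_fpr * len(neg)))          # allowed false positives
    return neg[k] if k < len(neg) else neg[-1]

def detect(score, tau):
    """Flag a target contract-graph as vulnerable when its matcher
    similarity to the vulnerable reference exceeds the fixed threshold."""
    return score > tau

# Synthetic validation scores: positives near 0.8, negatives near 0.3.
rng = np.random.default_rng(1)
val_scores = np.concatenate([rng.normal(0.8, 0.1, 100),
                             rng.normal(0.3, 0.1, 100)])
val_labels = np.concatenate([np.ones(100), np.zeros(100)])
tau = pick_threshold(val_scores, val_labels, budget_fpr=0.05)
print(detect(0.9, tau))  # a high-similarity target is flagged -> True
```

Fixing τ on validation rather than tuning it on the test split is what the paper's deployment-oriented protocol requires.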
Figure 2. Contract-graph construction. (a) Source snippet (VictimBank) with transfer logic prone to reentrancy. (b) Abstract syntax tree (AST), (c) data-flow graph (DFG), and (d) control-flow graph (CFG) obtained by static analysis. (e) Fused contract-graph (CG), in which statement nodes are connected by typed dependency edges: control (orange) and data (green). For readability, AST leaves and syntax edges are pruned and virtual anchor nodes (functions and loops) are omitted in this rendering.
Figure 3. Cross-graph attention in the contract-graph matcher. Blue nodes represent a CG1 sub-graph with edges e12, e23, e1i, e3i; red nodes represent a CG2 sub-graph with edges eab, eaj. The arrows from vi in CG1 to {va, vb, vj} in CG2 indicate cross-graph attention.
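The attention drawn in Figure 3 can be sketched numerically. The dot-product similarity and the difference-style matching message below follow the common graph-matching-network formulation; the embedding sizes and values are arbitrary, not the paper's trained model:

```python
import numpy as np

def cross_graph_attention(h1, h2):
    """Cross-graph matching messages: each node of CG1 attends over all
    nodes of CG2 and receives the difference between itself and its
    attention-weighted soft match, mu_i = h1_i - sum_j a_ij * h2_j.

    h1: (n1, d) node embeddings of CG1; h2: (n2, d) embeddings of CG2.
    Returns an (n1, d) matrix of matching messages for CG1's nodes."""
    scores = h1 @ h2.T                              # (n1, n2) similarities
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)         # softmax over CG2 nodes
    return h1 - attn @ h2

rng = np.random.default_rng(0)
h1, h2 = rng.normal(size=(3, 8)), rng.normal(size=(4, 8))
print(cross_graph_attention(h1, h2).shape)  # -> (3, 8)
```

Nodes with close matches in the other graph yield near-zero messages, while unmatched (potentially vulnerability-relevant) sub-graphs produce large ones, which is what makes the comparison discriminative.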
Figure 4. Class-wise low-FPR sensitivity (lower bound) at prevalence π = 5%. The ordinate is the average normalized TPR (%) over [0, α]. Vertical dashed lines mark α ∈ {1%, 5%, 10%}.
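The ordinate of Figure 4 (average normalized TPR over an FPR budget [0, α]) can be computed from ROC points by interpolation. The ROC values below are illustrative, not results from the paper:

```python
import numpy as np

def low_fpr_sensitivity(fpr, tpr, alpha):
    """Average TPR over the FPR interval [0, alpha], as a percentage,
    so a perfect detector (TPR = 1 throughout the budget) scores 100."""
    grid = np.linspace(0.0, alpha, 1001)
    tpr_grid = np.interp(grid, fpr, tpr)  # fpr must be increasing
    return 100.0 * tpr_grid.mean()

# Illustrative ROC operating points (not from the paper)
fpr = np.array([0.0, 0.01, 0.05, 0.10, 1.0])
tpr = np.array([0.0, 0.60, 0.85, 0.92, 1.0])
print(round(low_fpr_sensitivity(fpr, tpr, 0.05), 1))  # -> 64.0
```

Averaging only over the low-FPR band rewards detectors that are sensitive where audit budgets actually operate, rather than rewarding area accumulated at unusably high false-positive rates.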
Figure 5. Bars represent F1; hollow circles and squares indicate recall and precision, respectively. The proposed method is consistently rightmost with aligned markers.
Table 1. Detection results by method. Columns 2–13 report per-class recall (R), precision (P), and F1 for reentrancy, timestamp, and integer overflow, with macro-averaged R/P/F1 (cols. 11–13). The three right-hand columns show per-class F2/Fowlkes–Mallows (FM) scores calculated from (P, R).
| Methods | Reentrancy R | Reentrancy P | Reentrancy F1 | Timestamp R | Timestamp P | Timestamp F1 | Overflow R | Overflow P | Overflow F1 | Macro R | Macro P | Macro F1 | F2/FM (RE) | F2/FM (TD) | F2/FM (IO) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sFuzz | 13.99 | 10.71 | 12.13 | 28.05 | 24.73 | 26.29 | 25.66 | 27.70 | 26.64 | 22.57 | 21.05 | 21.69 | 13.18/12.24 | 27.32/26.34 | 26.04/26.66 |
| SmartCheck | 17.24 | 46.86 | 25.21 | 78.81 | 47.65 | 59.39 | 69.79 | 41.35 | 51.93 | 55.28 | 45.29 | 45.51 | 19.73/28.42 | 69.69/61.28 | 61.35/53.72 |
| Osiris | 62.82 | 39.91 | 48.81 | 53.65 | 59.85 | 56.58 | 61.33 | 41.79 | 49.71 | 59.27 | 47.18 | 51.70 | 56.35/50.07 | 54.79/56.67 | 56.09/50.63 |
| Oyente | 63.20 | 45.08 | 52.62 | 57.01 | 59.17 | 58.07 | 58.13 | 58.53 | 58.33 | 59.45 | 54.26 | 56.34 | 58.50/53.38 | 57.43/58.08 | 58.21/58.33 |
| Mythril | 76.00 | 43.22 | 55.10 | 50.00 | 58.05 | 53.73 | 70.73 | 68.73 | 69.72 | 65.58 | 56.67 | 59.52 | 65.99/57.31 | 51.43/53.87 | 70.32/69.72 |
| LineVul | 72.84 | 83.57 | 77.84 | 65.80 | 88.90 | 75.63 | 73.42 | 75.45 | 74.42 | 70.69 | 82.64 | 75.96 | 74.76/78.02 | 69.41/76.48 | 73.82/74.43 |
| GCN | 74.37 | 73.70 | 74.03 | 79.25 | 74.03 | 76.55 | 71.02 | 68.61 | 69.79 | 74.88 | 72.11 | 73.46 | 74.24/74.03 | 78.15/76.60 | 70.52/69.80 |
| TMP | 76.16 | 76.26 | 76.21 | 74.52 | 78.36 | 76.39 | 68.58 | 71.62 | 70.07 | 73.09 | 75.41 | 74.22 | 76.18/76.21 | 75.26/76.42 | 69.17/70.08 |
| AME | 79.71 | 81.31 | 80.50 | 82.24 | 80.98 | 81.61 | 69.48 | 71.75 | 70.60 | 77.14 | 78.01 | 77.57 | 80.02/80.51 | 81.98/81.61 | 69.92/70.61 |
| Peculiar | 80.53 | 85.39 | 82.89 | 87.94 | 87.60 | 87.77 | 83.72 | 84.23 | 83.97 | 84.06 | 85.74 | 84.88 | 81.46/82.92 | 87.87/87.77 | 83.82/83.97 |
| CBGRU | 81.70 | 85.16 | 83.39 | 81.68 | 81.51 | 81.59 | 79.48 | 80.29 | 79.88 | 80.95 | 82.32 | 81.62 | 82.37/83.41 | 81.65/81.59 | 79.64/79.88 |
| CGE | 86.78 | 83.83 | 85.28 | 87.39 | 89.09 | 88.23 | 84.86 | 86.95 | 85.89 | 86.34 | 86.62 | 86.47 | 86.17/85.29 | 87.72/88.24 | 85.27/85.90 |
| Proposed | 90.27 | 92.34 | 91.29 | 90.38 | 91.54 | 90.95 | 87.45 | 89.17 | 88.30 | 89.37 | 91.02 | 90.18 | 90.68/91.30 | 90.61/90.96 | 87.79/88.31 |
Table 2. Single-view inputs and ablations on three vulnerabilities.
| Methods | Reentrancy R (%) | Reentrancy P (%) | Reentrancy F1 (%) | Timestamp R (%) | Timestamp P (%) | Timestamp F1 (%) | Overflow R (%) | Overflow P (%) | Overflow F1 (%) |
|---|---|---|---|---|---|---|---|---|---|
| AST only | 82.19 | 86.23 | 84.16 | 82.47 | 85.68 | 84.04 | 80.34 | 83.75 | 82.00 |
| CFG only | 83.45 | 86.78 | 85.08 | 83.19 | 85.27 | 84.22 | 81.27 | 83.53 | 82.39 |
| DFG only | 84.56 | 88.34 | 86.41 | 84.34 | 87.12 | 85.71 | 82.14 | 85.22 | 83.65 |
| Proposed (DFG only) | 84.56 | 88.34 | 86.41 | 84.34 | 87.12 | 85.71 | 82.14 | 85.22 | 83.65 |
| Proposed (no pre-trained encoder) | 80.58 | 85.12 | 82.78 | 81.22 | 85.89 | 83.49 | 79.34 | 83.62 | 81.42 |
| Proposed (no matcher) | 82.19 | 83.45 | 82.82 | 82.67 | 84.23 | 83.44 | 81.48 | 82.27 | 81.87 |
| Proposed (full) | 90.27 | 92.34 | 91.29 | 90.38 | 91.54 | 90.95 | 87.45 | 89.17 | 88.30 |
Notation: “no pre-trained encoder” removes the pre-trained code encoder; “no matcher” removes cross-graph matching.

Share and Cite

MDPI and ACS Style

Liang, X.; Tan, Y.; Song, J.; Yang, F. Contract-Graph Fusion and Cross-Graph Matching for Smart-Contract Vulnerability Detection. Appl. Sci. 2025, 15, 10844. https://doi.org/10.3390/app151910844

