Article

DyTSSAM: A Dynamic Dependency Analysis Model Based on DAST

1 Yunnan Provincial Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650504, China
2 School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(22), 4443; https://doi.org/10.3390/electronics14224443
Submission received: 17 October 2025 / Revised: 5 November 2025 / Accepted: 6 November 2025 / Published: 14 November 2025
(This article belongs to the Section Artificial Intelligence)

Abstract

Program dependence analysis plays a fundamental role in program comprehension, software maintenance, and defect detection. However, existing static approaches—such as those based on Program Dependence Graphs or Abstract Syntax Trees—struggle to model fine-grained syntactic changes and fail to capture how dependencies evolve as code changes over time. To address these limitations, this study proposes DyTSSAM, a dynamic dependency analysis model built upon the Dynamic Abstract Syntax Tree (DAST). DyTSSAM decomposes DAST into temporally ordered change subtrees to capture the minimal syntactic units of code evolution, and incorporates local–global dependency analysis to enrich node representations with heterogeneous dependency information. The model further integrates a dynamic structural-syntax layer and a temporal-semantic layer, which jointly learn dynamic syntactic structures and temporal dependency patterns through a dynamic graph neural network. Experiments conducted on real-world datasets compare DyTSSAM with seven state-of-the-art dynamic graph neural networks. Results demonstrate that DyTSSAM achieves significantly higher AUC and AP scores, improves fine-grained modeling of node- and subtree-level dependencies, and exhibits greater sensitivity in capturing dependency evolution throughout code changes. To support reproducibility and enable future research, the complete datasets, preprocessing code, and model implementation are publicly available on GitHub. These findings suggest that DyTSSAM provides an effective and scalable framework for dynamic program dependence analysis.

1. Introduction

Traditional program dependence analysis can be broadly categorized into static analysis and dynamic analysis. Static analysis, first introduced by M. E. Fagan in 1976 [1], detects potential issues in source code without executing the program. Based on the level of abstraction, static analysis encompasses both syntactic analysis and semantic analysis [2]. Among these, program dependence is a core element of semantic analysis, as it characterizes control-flow and data-flow relationships in programs, thereby supporting critical tasks such as code comprehension, defect detection, software maintenance, and testing.
However, most existing approaches construct dependency models from static code snapshots, such as Program Dependence Graphs (PDGs), Abstract Syntax Tree (AST) extensions, or reverse-engineering-driven methods [3,4,5]. These approaches typically analyze code at the function, class, or package level, relying on predefined rules or intermediate representations. As a result, they are ill-suited for modern development environments with frequent code modifications. Once the code changes, the original dependency relationships may quickly become invalid, and updating the models incurs substantial overhead.
In traditional PDG- or AST-based static analysis, even a minor code modification (e.g., adding a statement or renaming a variable) often requires reconstructing the entire dependency graph, since all affected control and data dependencies must be re-parsed from scratch. This results in repetitive full-graph recomputation and high maintenance costs for large-scale projects. In contrast, DyTSSAM leverages a change-driven DAST segmentation mechanism that isolates modifications into independent syntactic subtrees. During incremental updates, only the subtrees corresponding to changed code regions and their dependency-affected counterparts are re-encoded and updated, while the remaining parts of the DAST are preserved. This selective re-encoding strategy minimizes redundant computation and ensures that both direct and indirect dependency evolutions are consistently captured with high temporal granularity.
Moreover, these methods struggle to capture fine-grained syntactic changes at the node level (e.g., expression refactoring or variable renaming), exhibit limited accuracy when modeling complex dependency relationships, and lack the ability to reflect dynamic dependency evolution throughout the code development process.
For instance, during software refactoring, modifying the internal structure of a class or function may alter dependency relationships among modules, thereby affecting the accuracy of regression test selection and fault localization. Similarly, a seemingly simple variable renaming may break data-flow dependencies in downstream components, leading to inconsistent analysis results if dependency tracking is not updated in time. In practice, Deng et al. [3] employed the Dependency Finder tool to reverse-engineer project source code and identify dependencies among packages, classes, and functions. Based on these dependency relationships, they reduced regression testing time by pruning redundant test cases. Likewise, Zhang et al. [4] constructed Program Dependence Graphs (PDGs), where nodes represent program statements and edges capture both data and control dependencies, enabling deeper dependency-based analysis of program behaviors. These examples and empirical studies together demonstrate that even minor code changes can significantly affect dependency structures, further emphasizing the necessity for dynamic and fine-grained dependency analysis methods.
Consequently, a critical research question arises: how can program dependence analysis simultaneously capture fine-grained dependency variations, ensure high predictive accuracy, and effectively model the dynamic evolution of dependencies as the code evolves?
The Abstract Syntax Tree (AST) is a tree-structured representation of source code, where each node corresponds to a syntactic unit (e.g., variable declaration, function call, or conditional statement), and edges encode syntactic composition. To better model program evolution, Yao et al. [6] proposed the Dynamic Abstract Syntax Tree (DAST), which augments the AST with node provenance, node operations, and temporal information. This endows DAST with the properties of a dynamic graph, enabling it to faithfully capture step-by-step changes in program structure during development. Inspired by the success of dynamic graph neural networks in modeling temporal structural evolution, we propose to model DAST as a dynamic graph that evolves with both time and code modifications. Specifically, DAST nodes correspond to graph nodes, syntactic edges act as static structural edges, while dependency edges are dynamically updated as code changes occur, resulting in a sequence of temporal graphs. Within this framework, program dependence analysis can be naturally reformulated as a link prediction problem in dynamic graphs: given the current program structure and historical dependencies, the task is to predict new dependencies induced by code modifications and further analyze their corresponding evolutionary patterns.
This paper focuses on the task of predicting program dependencies during dynamic software development. Our main contributions are as follows:
  • Fine-grained dependency modeling. We propose a change-driven subtree segmentation and feature aggregation strategy based on DAST, which captures fine-grained structural evolution at both subtree and node levels. By incorporating a dynamic syntactic attention mechanism, our model adaptively aggregates subtree structural information, effectively identifying dependency variations caused by local syntactic changes such as expression replacement and variable renaming.
  • Improved predictive accuracy. We design the Dynamic Temporal Syntax-Semantics Aggregation Model (DyTSSAM), which leverages a dual-channel fusion mechanism of structural and temporal features. This enhances the model’s expressive power for discriminating complex dependencies. Experimental results show that DyTSSAM consistently outperforms existing models in terms of AUC and AP, achieving higher predictive accuracy at both node- and subtree-level granularity.
  • Support for dependency evolution modeling and analysis. By adopting a code-change–driven DAST segmentation strategy and integrating a temporal semantic attention mechanism, our model captures the evolutionary trends of dependencies throughout code development. This enables fine-grained characterization of dependency evolution without relying on repetitive full-scale static analysis.
The remainder of this paper is organized as follows. Section 2 reviews the background of program dependence analysis and dynamic spatiotemporal dependency modeling. Section 3 presents the theoretical foundation of our study. Section 4 and Section 5 detail the DyTSSAM model and experimental evaluation, respectively. Section 7 concludes the paper.

2. Related Work

2.1. Program Dependence Analysis Based on Static Program Data

Most existing program dependence analysis techniques are built on static source code, making them ill-suited for continuously evolving development environments. Agarwal et al. [7], by comparing the role of control dependence and data dependence in static slicing, pointed out that data dependence is more suitable for analyzing large-scale systems. However, this approach relies on complete program snapshots and cannot perceive code changes, requiring full re-analysis after each modification. Similarly, Kalhauge et al. [8] proposed the Binary Reduction algorithm, which performs static analysis on Java bytecode to construct class-level dependency graphs and effectively reveal system structure. Yet, the analysis results only reflect the state of a specific version; once classes are refactored or removed, the original dependencies become invalid, and incremental updates are not supported.
In terms of syntactic structure modeling, Zhang et al. [4] introduced the CMFVD framework, which uses Joern to generate Code Property Graphs (CPGs) and extract Program Dependence Graphs (PDGs) for vulnerability detection. Although this approach combines control and data dependencies, it requires parsing the complete codebase at once. If the source code is locally modified, the entire parsing process must be repeated, leading to inefficiency. Guo et al. [9] leveraged AST subtrees and neural machine translation (NMT) models to generate code comments, effectively integrating syntactic and semantic information. However, this approach depends on fixed AST structures and lacks the ability to capture code changes during evolution. Furthermore, Deng et al. [3] constructed multi-level software network graphs from packages, classes, to functions, and employed node2vec and graph-based feature learning for dependency prediction. While effective in static systems, this method relies on full-scale static analysis and requires graph reconstruction and retraining when common modifications such as function renaming or class splitting occur.
More recently, Roy et al. [10] proposed the Dynamic Syntax Tree Model (DSTM), which dynamically composes AST-based neural modules to learn adaptive and fine-grained code embeddings. This study represents a new direction in code representation learning that enhances the expressiveness of static program models. In addition, JLineVD+ [11] further extends this line of research by applying graph neural networks to Java code vulnerability detection, leveraging CodeBERT-based contextual embeddings and Node2vec-enhanced subgraph construction to capture both local and global code relationships.
In summary, static program-data-based approaches suffer from two major drawbacks: (1) their modeling granularity is typically confined to the function, class, or package level, lacking the capability to capture fine-grained syntactic changes at the subtree or node level; (2) their predictive accuracy is limited, making it difficult to remain robust in complex and frequently changing code environments.

2.2. Temporal Graph Evolution Analysis

Dynamic graph neural networks (DGNNs) have recently achieved remarkable progress in modeling the temporal evolution of graph structures, demonstrating strong dynamic representation capabilities. Sankar et al. [12] proposed DySAT, which employs a self-attention mechanism over both structural neighborhoods and temporal dynamics to capture graph evolution, outperforming traditional recurrent neural networks in link prediction. Pareja et al. [13] proposed EvolveGCN, which evolves graph convolutional network parameters through recurrent neural networks, enabling efficient adaptation to structural changes and achieving strong performance on link prediction, edge classification, and node classification tasks. Cui et al. [14] introduced DyGCN, extending GCN-based embeddings to dynamic graph scenarios with a local update mechanism that significantly improves efficiency. Li et al. [15] proposed SDGL, which decomposes spatiotemporal dependencies into long-term and short-term patterns, combining static and dynamic graphs for improved modeling of dynamic dependencies.
These studies validate the effectiveness of DGNNs in capturing temporal structural evolution. However, their applications have primarily focused on domains such as social networks and traffic prediction, with limited exploration of code syntax and semantic modeling. Moreover, most existing methods rely on time-snapshot processing of the full graph, making it difficult to capture fine-grained dependency variations during continuous code evolution.

2.3. Dynamic Spatiotemporal Dependency Modeling

At a finer level of dynamic graph modeling, Mu [16] proposed the CHGDS algorithm, which explicitly models node temporal effects via the Hawkes process and integrates dynamic sampling to learn temporal and structural semantics for event prediction in continuous-time dynamic graphs. Xia [17] introduced the DSTGRNN framework, which combines a dynamic graph generator and a dynamic graph convolution module to model spatiotemporal dependencies in traffic flow, achieving accurate predictions. These works show that DGNNs are capable of capturing multi-scale dependencies evolving over time in complex systems, making them particularly suitable for event-driven, locally updated, and multi-granularity scenarios.
In the domain of software evolution, Jiang et al. [18] proposed a DAST–GCN-based method for analyzing the impact scope of code changes. By extending DAST to encode change information and integrating a two-layer GCN with attention mechanisms, their approach effectively identifies the propagation range of code modifications, offering practical support for maintenance and testing.
Nevertheless, most of these approaches do not account for the dependency evolution characteristics unique to program development. Unlike social or traffic networks, code evolution often manifests as fine-grained local syntactic modifications. This demands models that can capture subtle dependency variations at the node and subtree level while maintaining stability and accuracy throughout continuous software evolution.
Motivated by the strong capability of DGNNs (e.g., DySAT, EvolveGCN, TGAT) to model temporal structural evolution—especially their advantage in capturing dynamic topological changes—we propose a model that deeply integrates DGNNs with program syntactic and semantic features. Building upon the hierarchical and type-sensitive characteristics of the Dynamic Abstract Syntax Tree (DAST), we introduce DyTSSAM, a model based on dynamic syntactic attention and temporal semantic attention. This model explicitly captures fine-grained syntactic evolution and long-range dependency propagation during code modifications, enabling dynamic modeling and prediction of program dependencies.
Specifically, this study addresses the following three research questions:
  • RQ1: Can the proposed dynamic syntactic and temporal semantic attention mechanisms effectively enhance the modeling of subtree-level dependency variations, thereby enabling finer-grained dependency analysis?
  • RQ2: Can DyTSSAM maintain high predictive accuracy when modeling complex program dependency relationships?
  • RQ3: Can DyTSSAM accurately capture dependency evolution trends during code modifications and demonstrate its advantages for dynamic dependency modeling?

3. Preliminary

3.1. Definition of DAST and Dependencies

Yao [6] and Jiang [18] developed the Dynamic Abstract Syntax Tree (DAST) and designed semantic dependency types tailored to its characteristics. Building upon their prior work, we refine DAST to improve the interpretability and structural clarity of syntactic representations. Specifically, we extend the definition of syntactic edge types through pattern-matching techniques, making the syntactic structure more explicit and distinguishable. Formally, we define DAST as a directed graph triple with two categories of edges, as shown in Equation (1):
$$G = (V, E^{\mathrm{syn}}, E^{\mathrm{dep}}), \tag{1}$$
Here, $V$ denotes the set of nodes across all time steps; $E^{\mathrm{syn}}$ represents all syntactic edges in the graph, capturing code syntax structures; and $E^{\mathrm{dep}}$ denotes dependency edges, encoding semantic information.
To clarify the semantics of dependency relationships, Table 1 lists the dependency edge types defined in DAST, including their parent and child node constraints and corresponding meanings. Each edge is directed from the dependent node to its dependency source, forming a labeled directed acyclic graph that reflects both control-flow and data-flow semantics.
Figure 1 illustrates an example of dependency creation and disappearance during code evolution. At time $t_1$, the programmer writes the code shown on the left of part (a). These statements are parsed into a base tree as shown in part (b, left), where the Assign node and the While node are connected by a control-flow dependency edge labeled VC. At time $t_2$, a new statement "k = 2" is added. The newly inserted code is parsed into a subtree and linked to the existing While subtree, producing an additional VC dependency between the new Assign node and the controlling While node. Finally, at time $t_3$, the programmer modifies the previous statement "j = 1" to "x = 2". This modification operation generates a new Assign subtree, which is connected to the original While subtree through a dashed line, indicating that it originates from an update rather than a new insertion. Consequently, the original VC dependency disappears, and a new VC edge is established to reflect the updated control-flow relationship.

3.2. Limitations of Traditional Dynamic Graph Neural Networks on DAST

Although dynamic graph neural networks (DGNNs) provide useful techniques for modeling the evolution of program dependencies, they are primarily designed for general-purpose domains such as social networks and traffic flows. Consequently, they exhibit limitations when applied to DAST, which is inherently tree-structured with parent–child relationships and multi-feature constraints.
As illustrated in Figure 2a, in traditional DGNNs, the feature representation of an if node is obtained solely by aggregating information from its local neighborhood layer by layer. This process cannot fully exploit the complete subtree’s syntactic structure and associated semantic features (e.g., edge labels and descendant nodes). However, in Abstract Syntax Trees (ASTs), similar subtree topologies may convey entirely different semantics under different combinations of node and edge types. For example, “assignment” and “function argument passing” may share a similar structure but express different semantics, as shown in Figure 2b. This decoupling between structure and semantics can lead to representational ambiguity in traditional GNN aggregation, ultimately reducing accuracy in capturing dynamic dependency changes in frequently evolving codebases.

3.3. Problem Definition

The goal of this study is to model and predict dependency relationships induced by code changes in a fine-grained manner, with particular emphasis on capturing the dynamic characteristics of dependencies throughout program evolution. We represent the temporal evolution of DAST as a sequence of graph snapshots $\{G_1, G_2, \ldots, G_T\}$, where each snapshot at time step $t$ is defined as follows (Equation (2)):
$$G_t = (V_t, E_t^{\mathrm{syn}}, E_t^{\mathrm{dep}}), \quad t = 1, \ldots, T. \tag{2}$$
The complete edge set at time $t$ is the union of syntactic and dependency edges (Equation (3)):
$$E_t = E_t^{\mathrm{dep}} \cup E_t^{\mathrm{syn}}. \tag{3}$$
Each dependency edge $e_{ij}^{\mathrm{dep}} \in E_t^{\mathrm{dep}}$ is associated with a type label $l_{ij}^{\mathrm{dep}} \in L^{\mathrm{dep}}$ and a feature vector $a_{ij}^{\mathrm{dep}} \in A_t$, where $A_t$ denotes the dependency edge feature matrix at time $t$. Syntactic edges follow the same formulation.
At each time step $t$, we leverage implicit structural information from the previous step, defined as $M_{t-1} = \{\, z_v^{t-1} \in \mathbb{R}^F \mid v \in V_{t-1} \,\}$, where $z_v^{t-1}$ is the stored representation of node $v$ integrating its historical syntactic and dependency context. Together with the updated syntactic structure after the current modification, denoted as $\tilde{G}_t = (V_t, E_t^{\mathrm{syn}})$, the prediction process is formalized as Equation (4):
$$P\big( (v_i, v_j) \in E_t^{\mathrm{dep}} \mid v_i, v_j \in V_t,\ \{G_t\}_{t=1}^{T-1},\ \tilde{G}_t \big). \tag{4}$$
Ultimately, our objective is to learn a function $f_\theta$ that takes as input the historical sequence of graphs and the current syntactic structure, and outputs the probability of a dependency edge existing between any two nodes (Equation (5)):
$$f_\theta : (v_i, v_j, G_{1:T}) \to \mathbb{R}. \tag{5}$$
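To make this interface concrete, the following minimal sketch (in PyTorch) illustrates the shape of $f_\theta$; the linear projection is a stand-in for the full DyTSSAM encoder described in Section 4, and the inner-product scorer anticipates Equation (23):

```python
import torch
import torch.nn as nn

class DependencyScorer(nn.Module):
    """Sketch of f_theta: maps a node pair, given encoded graph history,
    to the probability that (v_i, v_j) is a dependency edge at time t."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)  # stand-in for the full DyTSSAM encoder stack

    def forward(self, z_i: torch.Tensor, z_j: torch.Tensor) -> torch.Tensor:
        # z_i, z_j: node representations already conditioned on the graph
        # history {G_t} and the updated syntactic structure (Equation (4))
        h_i, h_j = self.proj(z_i), self.proj(z_j)
        return torch.sigmoid((h_i * h_j).sum(dim=-1))
```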

4. Methodology

Traditional static dependency analysis methods struggle to capture the dynamic evolution of code structure and dependencies during software development. Meanwhile, existing dynamic graph neural networks (DGNNs) remain insufficient for modeling syntactic structural changes and temporal semantic evolution. To address these challenges, we propose a Dynamic Temporal Syntax-Semantics Aggregation Model (DyTSSAM) for program dependency prediction based on the Dynamic Abstract Syntax Tree (DAST). The model is specifically designed to tackle the difficulty of fine-grained and accurate dependency modeling in continuously evolving codebases, while supporting dynamic characterization of dependency evolution.
As illustrated in Figure 3, DyTSSAM consists of the following four components:
(1) Change-Based Temporal Subtree Segmentation of DAST: The original DAST is segmented into a sequence of subtrees according to code modifications, where each change corresponds to an independent subgraph. This step ensures that dependency modeling remains aligned with the code evolution process and provides fine-grained temporal units for subsequent analysis.
(2) Dynamic Syntax Structure Aggregation (DSAM): The segmented subtrees are integrated at the syntactic-structural level to generate structural features that distinguish different syntactic patterns. The objective of this step is to capture structural variations at both the subtree and node levels, thereby providing a fine-grained foundation for dependency prediction.
(3) Dependency Source Analysis and Temporal Semantic Aggregation (LGDAM and TDAM): Heterogeneous dependency information from multiple sources is integrated, while historical temporal context is incorporated to model the evolving patterns of dependencies. The purpose of this step is to maintain contextual coherence and enhance predictive capability in handling complex dependency relationships.
(4) Contrastive Learning-Based Dependency Analysis (CLM): Based on the aggregated features, the model outputs potential dependency edges along with their types. The goal of this step is to improve both the accuracy and stability of dependency prediction throughout the code evolution process.
To ensure consistent notation throughout the model design and mathematical formulation, Table 2 summarizes the symbols used in Section 4.1, Section 4.2, Section 4.3 and Section 4.4. We use t to denote time steps, l for neural network layer indices, and employ abbreviated superscripts (syn for syntactic edges, dep for dependency edges).

4.1. Change-Based Temporal Subtree Segmentation of DAST

To accurately capture fine-grained syntactic changes in program dependency prediction and avoid semantic ambiguity or structural fragmentation caused by fixed time-window approaches, we adopt a change-driven subtree segmentation strategy based on code modification events. Existing discrete dynamic graph neural networks often employ fixed time windows to partition the original dynamic graph into a sequence of equally spaced subgraphs [12,13,14,15,16,17]. However, such an approach is limited when applied to Dynamic Abstract Syntax Trees (DASTs), which are inherently event-driven: it assumes structural evolution occurs uniformly over time, while in reality, code changes are discrete and dense semantic events, often concentrated on specific syntactic structures.
As illustrated in Figure 4, at time $t_3$ a modification occurs on the return subtree (highlighted in blue). Under fixed-window partitioning, this change may be forcibly split across two adjacent subgraphs and mixed with other unrelated modifications in the same window. This makes it difficult to precisely localize where the modification occurs syntactically and to distinguish structural differences before and after the change, thereby weakening the model's capacity to capture dependency evolution induced by code changes. More critically, fixed-window approaches typically treat time as a continuous and uniform dimension, which downplays the role of a meaningful syntactic modification as the fundamental semantic unit, making it difficult to analyze fine-grained syntactic structural changes.
To address this issue, we propose a temporal-change-based segmentation strategy, which treats each individual code modification as a basic unit to construct a semantically aligned sequence of subgraphs. This strategy ensures that each insertion, deletion, or modification of a syntactic structure corresponds to an independent change subgraph, thereby providing a clearer representation of how dependency relationships evolve with code modifications. The improved change-subgraph sequence is formally defined in Equation (2) of the previous section.
It is worth noting that the proposed change-based segmentation is not primarily designed as a computational optimization, but as a structural improvement to preserve syntactic integrity and semantic continuity during program evolution. Unlike fixed-window partitioning, which may fragment relevant code modifications or merge unrelated ones, our approach guarantees that each subgraph corresponds to a coherent syntactic change event. While this design does not significantly reduce wall-clock time or memory usage, it improves the fidelity of dependency evolution modeling by aligning segmentation boundaries with actual semantic edits.
Through change-based segmentation, the boundaries of subgraphs are guaranteed to align with syntactic modifications. As a result, the model achieves higher discriminability in capturing syntactic changes such as control-flow refactoring, variable renaming, and expression replacement, while providing semantically consistent input representations for dependency prediction.
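A minimal sketch of this segmentation step is shown below; the ChangeEvent fields are illustrative stand-ins for the modification records captured by the DAST plugin, not the exact data schema of our implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ChangeEvent:
    """One code-modification event captured in the DAST (field names are illustrative)."""
    timestamp: int
    op: str                                         # "insert" | "delete" | "update"
    nodes: list = field(default_factory=list)       # nodes of the changed subtree
    syn_edges: list = field(default_factory=list)   # syntactic edges inside the subtree

def segment_by_change(events: list[ChangeEvent]) -> list[dict]:
    """Change-driven segmentation: one subgraph per modification event,
    rather than one subgraph per fixed time window."""
    snapshots = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        snapshots.append({
            "t": ev.timestamp,
            "V_t": list(ev.nodes),
            "E_syn_t": list(ev.syn_edges),
            "op": ev.op,   # kept so deletions can later invalidate dependency edges
        })
    return snapshots
```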

4.2. Aggregating Dynamic Syntactic Structural Features

To enhance the model’s ability to represent complex code syntactic structures and fine-grained dependency relationships—and to overcome the limitation of existing models that can only aggregate direct neighbors while failing to incorporate historical syntactic structures—we design a Dynamic Syntax Attention Module (DSAM). The goal of DSAM is to aggregate information at the subtree level, thereby improving the model’s effectiveness in capturing syntactic changes and their influence on dependency evolution.
Conventional structural attention mechanisms typically aggregate information from either direct neighbors or temporal neighbors. Such designs are insufficient for handling complex program structures: they rely on the traditional notion of local adjacency, which may result in incomplete modeling of dependency relationships within complex syntactic constructs. Moreover, existing models lack effective integration of historical syntactic structures, making it difficult to reflect the dynamic evolution of dependencies in scenarios with frequent code modifications. To address these challenges, DSAM recursively aggregates the entire subtree’s syntactic information, constrains information flow through explicit syntactic relationships, and fuses historical syntax with current changes via a time-decay mechanism.
The DSAM consists of three key processes: subtree selection, attention computation, and dynamic updating. Unlike traditional models, DSAM performs recursive aggregation at the subtree level, controls information propagation through syntactic relations, and integrates historical features with current modifications by means of a time-decay mechanism.
The operation of DSAM is formally defined in Equation (6):
$$\mathrm{ASTAttn} : (X_t, E_t^{\mathrm{syn}}, A_t, M_{t-1}) \to (Z_t, M_t), \tag{6}$$
where $X_t$ is the set of node features in the current subgraph, $E_t^{\mathrm{syn}}$ is the set of syntactic edges, $A_t$ is the syntactic edge feature matrix, and $M_{t-1}$ denotes the historical memory bank. The outputs are the updated node representations $Z_t$ and the updated memory bank $M_t$.
First, DSAM selects valid subtrees for each node, as defined in Equation (7):
$$N_v^{\mathrm{valid}} = \{\, u \mid (u, v) \in E_t^{\mathrm{syn}} \wedge v \in \mathrm{children}(u) \,\}, \tag{7}$$
This ensures that the dynamic syntax attention layer computes attention along correct subtree directions. The proposed subtree-constrained attention mechanism limits the attention aggregation within each syntactically valid subtree derived from change-based segmentation. Each subtree corresponds to a code modification scope (e.g., a statement block or function body) detected in the DAST structure. The recursive aggregation proceeds along the subtree hierarchy until reaching leaf nodes, since leaf-level syntax tokens (identifiers, literals, operators) represent the minimal semantic units that carry no further compositional dependency. Therefore, we terminate the recursion at the leaf level rather than imposing an artificial maximum depth, ensuring that all structurally meaningful dependencies within each change scope are fully captured while preventing redundant cross-subtree interactions.
Figure 5 illustrates a DAST example requiring structural aggregation. Taking the if node (blue node) as an example, it demonstrates how the feature $n_1^f$ of the if node is aggregated with the syntactic edge feature $e_1^f$ and the child-node feature $n_2^f$ through the enhanced attention mechanism.
For each edge and node, attention computation is performed as follows: the query vector is defined in Equation (8), while the key and value vectors are given by Equations (9) and (10), respectively.
Query vector (based on the child node's initial feature):
$$Q_v^l = W_Q^l x_v^t, \tag{8}$$
Key vector (based on the parent node's feature and edge feature):
$$K_u^l = W_K^l \big[\, x_u^t \,\|\, \alpha_{uv} \,\big], \tag{9}$$
Value vector (same as Key):
$$V_u^l = W_V^l \big[\, x_u^t \,\|\, \alpha_{uv} \,\big], \tag{10}$$
Here, $x_v^t$ is the node feature at time step $t$, $\alpha_{uv}$ is the syntactic edge feature vector, and $\|$ denotes concatenation. These vectors are used to compute attention weights across nodes, ensuring that the model effectively integrates both node types and syntactic edge information.
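The following sketch illustrates this subtree-constrained attention, under the assumption that Equations (9) and (10) concatenate the parent-node and edge features before projection; class and argument names are illustrative:

```python
import torch
import torch.nn as nn

class SubtreeAttention(nn.Module):
    """Sketch of Equations (7)-(10): attention restricted to valid parent->child edges."""
    def __init__(self, node_dim: int, edge_dim: int, hid: int):
        super().__init__()
        self.W_Q = nn.Linear(node_dim, hid)
        self.W_K = nn.Linear(node_dim + edge_dim, hid)
        self.W_V = nn.Linear(node_dim + edge_dim, hid)

    def forward(self, x, edges, edge_feat, children):
        # x: [N, node_dim]; edges: list of (u, v); edge_feat: [len(edges), edge_dim];
        # children: {u: set of child node ids} from the DAST subtree structure
        by_child = {}
        for i, (u, v) in enumerate(edges):
            if v in children.get(u, ()):              # Eq. (7): keep only valid subtree edges
                by_child.setdefault(v, []).append((u, i))
        h = x.clone()
        for v, parents in by_child.items():
            q = self.W_Q(x[v])                                                  # Eq. (8)
            kv_in = torch.stack([torch.cat([x[u], edge_feat[i]]) for u, i in parents])
            k, val = self.W_K(kv_in), self.W_V(kv_in)                           # Eqs. (9)-(10)
            attn = torch.softmax(k @ q / q.shape[-1] ** 0.5, dim=0)
            h[v] = attn @ val   # aggregate parent-and-edge context into the child node
        return h
```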
When node subtree structures change during programming, the associated node features also change. To address this, the structural syntactic attention layer dynamically updates features based on the elapsed time since the last update, with the memory bank storing features from earlier timestamps.
As shown in Figure 6, consider an If node (blue node) created at time $t_1$ and modified at time $t_5$ (red node). First, the node feature from time $t_1$ is retrieved from the memory bank $M_t : V \to \mathbb{R}^F$. The time-decay parameter is then computed based on the elapsed time $\Delta t$ as defined in Equation (11):
$$\lambda_v = \sigma(w_\lambda \Delta t + b_\lambda), \tag{11}$$
where $\Delta t$ denotes the time interval between two consecutive code changes. We empirically found that the sigmoid form offers stable gradient behavior and a bounded output in $(0, 1)$, which facilitates smooth weighting of temporal dependencies. The parameters $w_\lambda$ and $b_\lambda$ are learned end-to-end during model training, and their initialization range was selected through a small-scale grid search on the validation set to ensure robustness to initialization.
The set of active nodes (to be updated) is defined as in Equation (12):
$$A_t = \{\, v \in V_t \,\}, \tag{12}$$
For each active node $v$, its feature is updated according to Equation (13):
$$z_v^t = (1 - \lambda_v)\, h_v^t + \lambda_v\, m_v^{t-1}, \tag{13}$$
where $h_v^t$ is the new feature computed via syntactic attention, $m_v^{t-1}$ is the node's previous feature from the memory bank, and $\lambda_v$ is the node's decay weight. Finally, the updated node feature and timestamp are stored back into the memory bank as specified in Equation (14):
$$M_t(v) = (z_v^t,\ t). \tag{14}$$
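A minimal sketch of this update cycle, assuming the memory bank is a plain dictionary keyed by node id (the released implementation may organize it differently):

```python
import torch

def decayed_update(h_new, memory, active_nodes, t_now, w_lam, b_lam):
    """Sketch of Equations (11)-(14): fuse fresh attention features with stored history.
    memory: dict mapping node id -> (stored feature z, last-update timestamp)."""
    out = {}
    for v in active_nodes:
        h = h_new[v]
        if v in memory:
            m_prev, t_prev = memory[v]
            dt = torch.tensor(float(t_now - t_prev))
            lam = torch.sigmoid(w_lam * dt + b_lam)   # Eq. (11): time-decay weight
            z = (1 - lam) * h + lam * m_prev          # Eq. (13): convex fusion
        else:
            z = h                                     # first observation: no history to fuse
        memory[v] = (z.detach(), t_now)               # Eq. (14): write back (z_v^t, t)
        out[v] = z
    return out
```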
Through DSAM, the model effectively integrates historical syntactic information with current subtree modifications, enabling the capture of more fine-grained structural variations at both node and subtree levels. This lays a solid foundation for subsequent dependency source analysis and temporal semantic modeling.

4.3. Dependency Source Analysis and Temporal Semantic Feature Aggregation

In the previous section, the Dynamic Syntax Attention Module (DSAM) enables the model to capture syntactic feature variations at the node level. However, relying solely on syntactic structure information is insufficient to fully model dependencies in the DAST. Program dependencies are not only determined by the current subtree structure but are also influenced by global/local differences in dependency sources and the temporal effects of historical dependencies. To further enhance predictive accuracy and stability, we design the Local–Global Dependency Analysis Module (LGDAM) and the Temporal Dependency Attention Module (TDAM). LGDAM distinguishes and models global versus local dependencies, while TDAM assigns varying importance to historical dependencies based on temporal semantics, thereby jointly modeling dependency sources and temporal evolution.
During the evolution of DAST, dependencies may arise either from the complete structure (global dependencies) or from partial observable subtrees and historical information (local dependencies).
As shown in Figure 7, at time $t_1$ the model has full structural visibility and can establish global dependencies (blue edges). In contrast, at time $t_3$, the model only observes a partial subtree structure, historical dependencies, and past node features. Dependencies inferred under such conditions are defined as local dependencies (red edges), which rely more heavily on historical context and carry lower confidence compared to global dependencies.
The LGDAM consists of an input stage and an output stage. The input stage fuses dependency edges, timestamps, and type labels, while the output stage produces enhanced dependency features. The overall input-output mapping of the module is formally defined in Equation (15):
$$\mathrm{LocGlobModule} : (E_t^{\mathrm{dep}}, T_t, Y_t) \to F_t^{\mathrm{enh}}, \tag{15}$$
Specifically, for each dependency edge, the enhanced feature is computed by an MLP that concatenates the edge representation, type label, and temporal encoding, as shown in Equation (16):
$$f_{i,t}^{\mathrm{enh}} = \mathrm{MLP}\big(\, e_{i,t}^{\mathrm{dep}} \,\|\, y_{i,t} \,\|\, \tau_{i,t} \,\big), \tag{16}$$
Here, $E_t^{\mathrm{dep}}$ denotes the set of historical dependency edges (with global/local indicators) at time step $t$, $T_t$ is the timestamp matrix, and $Y_t$ denotes dependency type labels. The output is an enhanced edge feature incorporating local and global dependency information.
At the output stage, node features are concatenated with dependency features for dependency type prediction, following the formulation in Equation (17):
$$h_t = \mathrm{MLP}\big(\, z_v^t \,\|\, z_u^t \,\|\, \alpha_{uv}^t \,\|\, \tau_t(u, v) \,\big), \tag{17}$$
In dynamic program evolution, dependencies are continuously established, updated, or invalidated as code changes. To model their varying temporal reliability, TDAM introduces a learnable time encoder based on cosine basis functions:
$$\mathrm{TimeEncoder}(\Delta t) = \cos(\Delta t \cdot b + \phi), \tag{18}$$
where $b$ and $\phi$ are learnable frequency and phase parameters. This encoding preserves smooth temporal continuity while allowing flexible decay adjustment.
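This encoder admits a direct implementation; the sketch below assumes a vector-valued output with one learnable frequency and phase per dimension:

```python
import torch
import torch.nn as nn

class TimeEncoder(nn.Module):
    """Sketch of Equation (18): learnable cosine basis over elapsed time."""
    def __init__(self, dim: int):
        super().__init__()
        self.b = nn.Parameter(torch.randn(dim))     # learnable frequencies
        self.phi = nn.Parameter(torch.zeros(dim))   # learnable phases

    def forward(self, dt: torch.Tensor) -> torch.Tensor:
        # dt: tensor of elapsed times [...]; returns encodings [..., dim]
        return torch.cos(dt.unsqueeze(-1) * self.b + self.phi)
```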
As shown in Figure 8, consider an Assign node (blue node) at time $t_5$. Its feature aggregates the dependency type feature $e_1^f$ and the features of related dependency nodes $n_2^f$ and $n_3^f$ (red features) through the temporal attention mechanism. Earlier dependencies that remain valid retain higher attention weights, while outdated or invalidated dependencies gradually decay and are masked out.
Formally, given a query node $x_t$ and $M$ historical dependencies $\{(k_j^{t_j}, e_j^{t_j}, t_j)\}_{j=1}^{M}$, TDAM constructs input keys as
$$k_j^{\mathrm{in}} = \big[\, k_j^{t_j} \,\|\, e_j^{t_j} \,\|\, t_j \,\big], \tag{19}$$
which are projected into query, key, and value spaces via linear transformations:
$$q^l = W_Q^l x_t, \qquad k_j^l = W_K^l k_j^{\mathrm{in}}, \qquad v_j^l = W_V^l k_j^{\mathrm{in}}. \tag{20}$$
A multi-head attention layer aggregates temporal information as
$$\mathrm{AttnOutput}_t = \mathrm{MultiHeadAttn}(q^l, k^l, v^l), \tag{21}$$
where invalidated edges (deleted or overwritten) are excluded upstream during DAST updates. If no historical dependencies exist, TDAM directly returns the original node feature (guard clause: `if len(key_nodes) == 0: return query`).
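A minimal sketch of this temporal aggregation, assuming the historical key features, edge features, and time encodings are pre-stacked into tensors; the hidden size must be divisible by the number of heads:

```python
import torch
import torch.nn as nn

class TDAM(nn.Module):
    """Sketch of Equations (19)-(21): multi-head attention over historical dependencies."""
    def __init__(self, node_dim, edge_dim, time_dim, hid, heads=4):
        super().__init__()
        self.W_Q = nn.Linear(node_dim, hid)
        self.W_K = nn.Linear(node_dim + edge_dim + time_dim, hid)
        self.W_V = nn.Linear(node_dim + edge_dim + time_dim, hid)
        self.attn = nn.MultiheadAttention(hid, heads, batch_first=True)

    def forward(self, query, key_nodes, key_edges, time_enc):
        # Guard clause from the text: with no history, return the node feature unchanged
        if len(key_nodes) == 0:
            return query
        k_in = torch.cat([key_nodes, key_edges, time_enc], dim=-1)   # Eq. (19)
        q = self.W_Q(query).view(1, 1, -1)                           # Eq. (20)
        k = self.W_K(k_in).unsqueeze(0)                              # [1, M, hid]
        v = self.W_V(k_in).unsqueeze(0)
        out, _ = self.attn(q, k, v)                                  # Eq. (21)
        return out.view(-1)
```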
Through this mechanism, TDAM dynamically learns the temporal relevance of historical dependencies, suppresses obsolete relations, and adaptively models evolving program semantics within DAST.

4.4. Contrastive Learning and Loss Function

In the previous sections, we enhanced the modeling of global/local dependencies and temporal semantics through LGDAM and TDAM. However, relying solely on the fusion of structural and semantic features may still limit the model’s discriminative power when faced with dependencies that are semantically similar but logically different.
To further improve discrimination in dependency prediction, we introduce a contrastive learning mechanism into the task. The core idea of contrastive learning is to reduce the semantic distance between positive samples while enlarging the representational gap between positive and negative samples, thereby enhancing the model’s ability to distinguish different dependency relationships. This mechanism is particularly well-suited for the Dynamic Abstract Syntax Tree (DAST), where dependencies between nodes are complex and dynamically evolving [19].
A key challenge lies in negative sample generation. Traditional approaches often adopt random sampling, which is problematic in the DAST context. Since dependencies in DAST are highly dependent on node types and syntactic–semantic structures, randomly generated negative samples often violate program semantics. This introduces noise during training and undermines model stability and generalization. To address this issue, we propose an improved two-level negative sampling strategy that incorporates semantic constraints.
Formally, for each candidate node pair $(v_u, v_v)$ within the same subgraph, the pair is accepted as a valid negative if it satisfies the following semantic and structural constraints:
$$(v_u, v_v) \notin E_t^{\mathrm{dep}}, \qquad v_u \neq v_v, \qquad R(v_u, v_v) = 0, \tag{22}$$
where $E_t^{\mathrm{dep}}$ denotes the set of existing dependency edges at time $t$, and $R(v_u, v_v)$ is a semantic validity indicator derived from the dependency formation rules defined in Table 1. A value of $R(v_u, v_v) = 0$ indicates that the node pair $(v_u, v_v)$ does not satisfy any valid dependency relation in the predefined type schema, and thus can be considered a high-quality negative sample. This formulation ensures that all generated negative pairs are syntactically valid but semantically implausible within the DAST structure, avoiding noise introduced by random pairings while maintaining structural consistency.
Statistical analysis shows that approximately 72–78% of the generated negative samples fall into this semantically constrained “high-quality” category, while the remaining 22–28% are randomly sampled as fallback when the constrained pool is exhausted. This balance provides sufficient diversity while preserving semantic validity.
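The two-level sampler can be sketched as follows; rule_check is a hypothetical predicate implementing the Table 1 schema, and the exhaustive pair enumeration is for clarity rather than efficiency:

```python
import random

def sample_negatives(nodes, dep_edges, rule_check, k):
    """Sketch of the two-level negative sampler around Equation (22).
    rule_check(u, v) plays the role of R(u, v): it returns True iff the pair
    could form a valid dependency type under the Table 1 schema."""
    existing = set(dep_edges)
    constrained, fallback = [], []
    for u in nodes:
        for v in nodes:
            if u == v or (u, v) in existing:
                continue
            if not rule_check(u, v):          # R(u, v) = 0: semantically implausible pair
                constrained.append((u, v))    # high-quality negative
            else:
                fallback.append((u, v))
    random.shuffle(constrained)
    random.shuffle(fallback)
    negatives = constrained[:k]
    if len(negatives) < k:                    # constrained pool exhausted: random fallback
        negatives += fallback[:k - len(negatives)]
    return negatives
```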
In the link prediction stage, the model estimates the probability of an edge between two nodes $v_i$ and $v_j$ using the final node representations $h_i^l$ and $h_j^l$ at the last layer $l$. The probability is computed via the inner product followed by a sigmoid activation, as given in Equation (23):
$$p_{ij}^t = \sigma\big( h_i^l \cdot h_j^l \big), \tag{23}$$
To address network sparsity, negative sampling is applied to balance the class distribution. For each positive edge pair $(v_i, v_j)$ and each negative sample pair $(v_i, v_m)$, the model is optimized using a cross-entropy loss function defined in Equation (24):
$$\mathcal{L}_t = \sum_{(v_i, v_j) \in E_t} \Big( -\log(p_{ij}^t) - \mathbb{E}_{v_m \sim p(v_j)} \log(1 - p_{im}^t) \Big), \tag{24}$$
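A minimal sketch of this objective, where the sampled negatives stand in for the expectation over $v_m \sim p(v_j)$:

```python
import torch

def link_loss(h, pos_edges, neg_edges, eps=1e-9):
    """Sketch of Equations (23)-(24): sigmoid inner-product scores with
    cross-entropy over positive edges and sampled negatives."""
    def score(pairs):
        i = torch.tensor([a for a, _ in pairs])
        j = torch.tensor([b for _, b in pairs])
        return torch.sigmoid((h[i] * h[j]).sum(dim=-1))   # Eq. (23)

    p_pos, p_neg = score(pos_edges), score(neg_edges)
    # Eq. (24): the mean over sampled negatives approximates the expectation term
    return -torch.log(p_pos + eps).sum() - torch.log(1 - p_neg + eps).mean()
```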
To further validate the sampling mechanism, we conducted a sensitivity analysis comparing the proposed semantic-constrained sampler with a fully random baseline. Results show that the AUC decreased by 6.6% and AP decreased by 5.1% when using purely random negative sampling, confirming that semantic filtering significantly reduces false negatives and improves model stability.
Through the combination of contrastive learning and the improved negative sampling strategy, our model strengthens its capacity to distinguish complex dependency relationships in dynamic prediction tasks, while maintaining both stability and accuracy as dependencies evolve with code changes.

5. Experiments and Analysis

In this section, we conduct experiments on a collected real-world dataset to evaluate the accuracy of the proposed model. Multiple performance metrics are employed for evaluation from different perspectives. The predictive performance of our model is compared against several baseline models. In addition, ablation experiments are performed to verify the effectiveness of different design modules in improving predictive accuracy.

5.1. Dataset and Data Preprocessing

Our experiments are based on the open-source plugin tool developed by Yao [6], which captures both programmer behaviors and code auto-generation patterns. Early code completion tools such as aiXcoder mainly used token-based completion, predicting one variable name or syntactic unit (token) at a time. With the advent of modern tools such as GitHub Copilot and CodeGeeX, multi-line semantic completion has become common, enabling the generation of coherent code fragments in a single step. To align with this paradigm, we extended the original tool to construct a dataset containing approximately 32,000 programming behavior records across 105 complete source files, recording both human and tool-generated edits together with their corresponding Dynamic Abstract Syntax Trees (DASTs).
All source projects in the dataset were implemented in Python 3.10, with project-level and file-level data provided in our public repository. Each DAST corresponds to an individual program, serving as a unified structural representation that records all code evolution behaviors throughout its lifetime. Within a single DAST, every addition, modification, and deletion operation is explicitly represented as a syntactic or dependency update applied to the corresponding subtree nodes. This design enables DyTSSAM to model the complete history of program evolution within one integrated structure, rather than relying on multiple temporal snapshots. Consequently, the model can analyze both short-term and long-term dependency changes while preserving full contextual information across all evolution stages.
Since raw DASTs contain numerous nodes and edges with limited semantic contribution (e.g., redundant wrapper nodes), we applied structural reduction to retain only node and edge types representing meaningful syntactic and dependency relations. Specifically, we preserved statement nodes such as Assign, AugAssign, Call, Return, Global, Expr, target, and iter, and block nodes including FunctionDef, ClassDef, If, For, While, and Module. All other node types were pruned as redundant to reduce noise and to focus the model on syntactically and semantically significant constructs. We note that this pruning strategy was chosen to simplify representations and lower computational overhead; we did not perform exhaustive formal verification of information preservation across all cases.
To ensure reproducibility, we adopted a stratified temporal splitting strategy for dataset partitioning. The dataset was divided into 80% training, 10% validation, and 10% testing subsets. Each subset preserves temporal continuity of code evolution sequences to prevent information leakage between time steps.
To support reproducibility, all datasets, preprocessing scripts, and model implementations have been released in a public GitHub repository (https://github.com/3134726136/DAST-DyTSSAM-Code/tree/master, accessed on 5 November 2025). The repository provides the complete behavioral dataset, the DAST construction and update pipeline, the processed data used for training, and full implementations of DyTSSAM and its ablation variants. These materials enable end-to-end replication of the data preparation and model training process described in this study.

5.2. Evaluation Metrics

To comprehensively assess performance on the dependency link prediction task, we adopt two widely used metrics: Area Under the ROC Curve (AUC) and Average Precision (AP). These metrics evaluate how effectively the model distinguishes positive dependency edges (true links) from negative pairs (non-existent links).
This task is inherently imbalanced: in our dataset, the ratio of positive to negative samples is approximately 1:12. Both AUC and AP are therefore chosen because they are threshold-independent and provide robust evaluation under severe class imbalance [20]. They focus on ranking quality rather than fixed decision thresholds, making them well aligned with link prediction tasks.
Formally, AUC measures the probability that a randomly chosen positive edge is ranked higher than a negative one, while AP summarizes the precision–recall trade-off across all thresholds:
$$\mathrm{AUC} = \frac{TP \times TN + \frac{1}{2} \times FP \times (FP - 1)}{P \times N},$$
$$\mathrm{AP} = \sum_k \frac{TP_k}{TP_k + FP_k} \times \Delta \mathrm{Recall}_k,$$
where $TP$, $FP$, $FN$, and $TN$ denote true positives, false positives, false negatives, and true negatives, respectively, and $P$ and $N$ denote the total numbers of positive and negative samples.
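Both metrics can be computed directly from ranked edge scores, for example with scikit-learn; the toy labels and scores below are illustrative:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# y_true marks ground-truth dependency edges among candidate node pairs,
# y_score holds the model's predicted probabilities for the same pairs.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_score = np.array([0.91, 0.85, 0.40, 0.62, 0.30, 0.22, 0.15, 0.08])

print(f"AUC = {roc_auc_score(y_true, y_score):.3f}")             # ranking quality
print(f"AP  = {average_precision_score(y_true, y_score):.3f}")   # precision-recall summary
```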

5.3. Baselines

To evaluate dependency prediction performance, we compare our proposed model with the following representative baselines:
  • CD-GCN [21]: Combines Graph Convolutional Networks (GCN) and Long Short-Term Memory (LSTM), extracting structural features via GCN and modeling temporal sequences with LSTM.
  • DySAT [12]: Employs structural and temporal self-attention mechanisms to jointly model graph structures and temporal dynamics, enabling flexible complexity and improved computational efficiency.
  • EvolveGCN [13]: Evolves GCN parameters over time using a recurrent neural network, effectively modeling structural changes.
  • TGAT [22]: Integrates self-attention with functional time encoding (via Bochner’s theorem), aggregating temporal neighbors to capture both dynamic topology and temporal interactions.
  • DyGNN [23]: Incorporates update and propagation modules, where updates modify node features upon edge arrivals and propagations diffuse the updates across neighbors.
  • HGNN+ [24]: Constructs hyperedge groups to capture high-order correlations among modalities or types, and uses adaptive fusion of hyperedge groups to integrate heterogeneous relational information.
  • JLineVD+ [11]: A recent code-specific graph neural network designed for vulnerability detection in Java. It enhances subgraph construction through semantic-aware partitioning and integrates pretrained code representations from CodeBERT to strengthen code-level feature extraction and relational reasoning.
These baselines represent convolutional, attention-based, evolutionary, higher-order, and code-specific paradigms, providing a comprehensive spectrum for evaluating DyTSSAM in fine-grained dependency modeling, predictive accuracy under complex dependencies, and dependency evolution analysis.

5.4. Experimental Setting

All models, including the proposed DyTSSAM and baselines, are implemented in PyTorch 2.4.0. Experiments are conducted on a machine with an Intel(R) Core(TM) i5-13600KF CPU and an NVIDIA GeForce RTX 4070Ti Super GPU. The dataset is split into training, validation, and test sets in an 80%/10%/10% ratio. To ensure robustness, each experiment is repeated 10 times, and the average results are reported. For training, all deep learning models use the Adam optimizer with a learning rate of 0.001 for 100 epochs. The key hyperparameters are configured as follows: node embedding dimension 32, edge feature dimension 16, temporal feature dimension 16, and hidden layer size 64.
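For reference, this configuration can be expressed as follows; the model object is a placeholder module rather than the actual DyTSSAM implementation (see the public repository for the latter):

```python
import torch
import torch.nn as nn

# Hyperparameters as reported above; nn.Linear is a stand-in module only.
config = {
    "node_dim": 32, "edge_dim": 16, "time_dim": 16, "hidden_dim": 64,
    "lr": 1e-3, "epochs": 100, "split": (0.8, 0.1, 0.1),
}
model = nn.Linear(config["node_dim"], config["hidden_dim"])  # placeholder
optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])
```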

5.5. Experiments Results

RQ1: Can the proposed dynamic syntax and temporal semantic attention mechanisms effectively enhance the modeling of subtree-level dependency changes, thereby enabling finer-grained dependency analysis?
To evaluate the effectiveness of DyTSSAM in fine-grained dependency modeling, we design a dual-granularity evaluation paradigm. Beyond the conventional subtree-level granularity, we introduce a more stringent node-level granularity to assess performance across different abstraction layers. For a predicted dependency edge, node-level evaluation requires both the source and target nodes to exactly match the ground truth; in contrast, subtree-level evaluation only requires the predicted nodes to reside within the ground-truth dependency subtrees. This design enables us to distinguish whether the model merely perceives that “a dependency exists within a code block” (subtree-level) or can accurately identify “the exact syntactic node pair” (node-level), thereby providing a comprehensive assessment of its ability to capture fine-grained syntactic evolution.
The results are presented in Figure 9. Figure 9a–c illustrate overall performance trends of all models at the node- and subtree-levels, while Figure 9d–i depict detailed per-model comparisons of node versus subtree AUC.
As shown in Figure 9a–i, most baseline models, including CDGCN, DyGCN, DySAT, EvolveGCN, and JLineVD+, exhibit a consistent "subtree > node" trend, achieving higher AUC at the subtree level (average gain: +3.8–7.2%). This indicates that these models can roughly localize structural regions where dependencies occur, but often fail to precisely identify the specific syntactic node pairs. For instance, DySAT reaches a node-level AUC of 0.710 ± 0.05 and a subtree-level AUC of 0.760 ± 0.04 (+7.1 ± 0.6%). Similarly, JLineVD+, which integrates pre-trained semantic embeddings from CodeBERT into its graph encoder, achieves competitive subtree-level accuracy (0.836 ± 0.06) but limited node-level precision (0.812 ± 0.05), suggesting that while semantic enhancement improves code representation, it still lacks fine-grained syntactic discrimination.
In contrast, DyTSSAM demonstrates a reversed pattern, performing significantly better at the node level than at the subtree level. As shown in Figure 9c, DyTSSAM attains a node-level AUC of 0.944 ± 0.03 compared to 0.854 ± 0.04 at the subtree level. This unusual "node > subtree" trend indicates that DyTSSAM not only recognizes structural dependency contexts but also accurately localizes specific syntactic node pairs induced by code changes. The improvement originates from the Dynamic Syntax Attention Module (DSAM), which leverages type-aware and subtree-constrained attention to aggregate intra-subtree interactions while preserving hierarchical syntax integrity. Moreover, DyTSSAM converges faster and more stably, reaching a node-level AUC of 0.931 ± 0.06 by the 5th epoch, compared with slower convergence and higher variance in models such as DyGCN and JLineVD+.
Figure 9c also shows that DyTSSAM achieves the highest overall AUC stability across epochs. Compared to the strongest baseline (EvolveGCN: 0.859 ± 0.05), DyTSSAM improves node-level discrimination by +12.3 ± 0.7% and maintains competitive subtree-level performance (+4.6 ± 0.5%). This consistency indicates that DyTSSAM's combination of DSAM and TDAM effectively mitigates temporal noise and enhances discriminative capability at fine syntactic granularity.
To further assess ranking quality, we conduct the same dual-granularity evaluation using the Average Precision (AP) metric (Figure 10). Figure 10a–c present the overall AP evolution, while Figure 10d–i depict per-model comparisons between node and subtree levels.
The AP results exhibit a consistent pattern: CDGCN, DyGCN, DySAT, EvolveGCN, and JLineVD+ perform better at the subtree level, indicating an emphasis on coarse structural recognition rather than node-level precision. For instance, JLineVD+ achieves 0.834 ± 0.03 at the subtree level and 0.805 ± 0.04 at the node level, reflecting improved semantic ranking but limited syntactic alignment. In contrast, DyTSSAM achieves the opposite pattern, with a node-level AP of 0.951 ± 0.03 exceeding the subtree-level AP of 0.867 ± 0.04 (+8.4 ± 0.6%). This demonstrates DyTSSAM's superior ability to focus on relevant syntactic interactions within changed subtrees while maintaining stable performance across epochs.
Overall, the results confirm that while existing dynamic GNNs and code representation models effectively capture coarse dependency regions, they remain constrained by fixed receptive fields or global embedding biases. DyTSSAM uniquely achieves higher accuracy at the node level—an inversion of conventional trends—demonstrating that its dynamic syntax and temporal semantic attention mechanisms jointly capture both the structure and evolution of fine-grained program dependencies with higher fidelity and stability.
RQ2: Can DyTSSAM maintain high predictive accuracy when modeling complex program dependency relationships?
To evaluate the predictive accuracy of DyTSSAM in modeling complex code dependencies, we conduct comparative experiments against both conventional graph neural networks and representative dynamic GNNs.
In addition to AUC and AP, which evaluate link prediction performance, we introduce two complementary metrics—Dependency Prediction Accuracy (Dep_Acc) and Mean Reciprocal Rank (MRR)—to assess the dependency type classification task. Dep_Acc measures the overall classification accuracy of predicted dependency types, while MRR quantifies the average inverse rank position of the correct dependency type in the prediction list, reflecting the model’s confidence in ranking relevant dependencies.
To further capture model sensitivity to positive dependencies, we also report Recall, which evaluates the proportion of correctly identified true dependency types. All reported results are averaged over ten independent runs with different random seeds to ensure statistical robustness and reproducibility.
The metric definitions are as follows:
$$\mathrm{Dep\_Acc} = \frac{TP + TN}{TP + FP + FN + TN},$$
$$\mathrm{Recall} = \frac{TP}{TP + FN},$$
$$\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i},$$
where $|Q|$ is the total number of dependency-type queries, and $\mathrm{rank}_i$ denotes the position of the correct type in the prediction list for the $i$-th query. Higher MRR values indicate that correct dependency types are ranked closer to the top.
Since the dependency type classification task involves multiple dependency categories (e.g., Assign_VD, VC, Arg, Ret, and Glo), all metrics are computed using discrete class predictions based on the argmax operation of the output logits, without applying any probability thresholds.
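For clarity, the MRR computation under this argmax-based protocol can be sketched as follows; the example logits are illustrative:

```python
import numpy as np

def mean_reciprocal_rank(logits: np.ndarray, true_types: np.ndarray) -> float:
    """Sketch of MRR for dependency-type prediction: rank of the correct type
    per query under a descending sort of class logits (no thresholds)."""
    order = np.argsort(-logits, axis=1)                        # best-ranked type first
    ranks = (order == true_types[:, None]).argmax(axis=1) + 1  # 1-indexed rank of truth
    return float((1.0 / ranks).mean())

# Example: 3 queries over 5 dependency types (e.g., Assign_VD, VC, Arg, Ret, Glo)
logits = np.array([[2.0, 0.1, 0.3, 0.0, -1.0],
                   [0.2, 1.5, 0.1, 0.9, 0.0],
                   [0.1, 0.2, 0.3, 2.1, 0.4]])
print(mean_reciprocal_rank(logits, np.array([0, 3, 3])))  # ranks 1, 2, 1 -> MRR ~ 0.833
```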
The results in Table 3 clearly demonstrate that DyTSSAM consistently outperforms all baselines across AUC, AP, Dep_Acc, and MRR. We further analyze the results as follows:
  • Dynamic Syntax Attention Layer (DSAM)—By recursively aggregating subtree information with syntactic constraints, DSAM effectively captures node types and hierarchical features. Compared to direct neighbor aggregation models (e.g., CDGCN), DyTSSAM achieves an AUC improvement of +8.86 percentage points, confirming DSAM’s superior ability to encode fine-grained syntactic dependencies.
  • Temporal Dependency Attention Module (TDAM)—With learnable temporal encodings and interval-aware weighting, TDAM dynamically captures dependency evolution (a minimal sketch of this weighting idea follows this list). This design substantially improves Dep_Acc, with DyTSSAM outperforming TGAT by +6.99 percentage points, indicating its capacity to model long-term dependency shifts and reduce noise from outdated edges.
  • Local–Global Dependency Analysis Module (LGDAM)—By jointly modeling global structural dependencies and local contextual information, LGDAM enhances representational completeness. DyTSSAM achieves a Recall improvement of +6.38 percentage points over DyGCN, showing that LGDAM effectively improves dependency coverage and robustness, especially in large, evolving codebases.
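As referenced in the TDAM item above, the following PyTorch sketch illustrates interval-aware temporal attention with a learnable decay: historical dependency features are time-encoded and down-weighted as their intervals grow. The class name, tensor shapes, and the single shared decay parameter are illustrative assumptions rather than the authors' exact layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalDependencyAttention(nn.Module):
    """Sketch of TDAM-style interval-aware attention (not the published layer).

    Each historical dependency feature observed dt steps ago is scored by
    attention and down-weighted by a learnable exponential decay exp(-lambda*dt).
    """
    def __init__(self, dim, time_dim=16):
        super().__init__()
        self.freq = nn.Parameter(torch.randn(time_dim))      # learnable time-encoding frequencies
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim + time_dim, 2 * dim)
        self.log_lambda = nn.Parameter(torch.zeros(1))       # learnable decay rate (simplified: shared)

    def forward(self, x, hist, dt):
        # x: (dim,) current node feature; hist: (n, dim) historical dependency
        # features; dt: (n,) intervals between each historical edge and now.
        t_enc = torch.cos(dt[:, None] * self.freq[None, :])  # (n, time_dim) time encoding
        k, v = self.kv(torch.cat([hist, t_enc], dim=-1)).chunk(2, dim=-1)
        score = (self.q(x)[None, :] * k).sum(-1) / x.shape[0] ** 0.5
        decay = torch.exp(-F.softplus(self.log_lambda) * dt) # older edges lose influence
        attn = F.softmax(score + torch.log(decay + 1e-9), dim=0)
        return x + attn @ v                                  # fused temporal representation
```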
JLineVD+, as a recent model that leverages semantic pre-training (CodeBERT) and enhanced subgraph construction, demonstrates solid performance (AUC 0.8310, AP 0.8000). This verifies that semantic embeddings can improve general representation quality. However, JLineVD+ still underperforms DyTSSAM on key dependency-specific metrics such as Dep_Acc (0.7603 vs. 0.8936) and MRR (0.7523 vs. 0.8300), indicating that while pretrained semantic knowledge aids static code understanding, it cannot fully capture the temporal evolution and fine-grained syntactic interactions crucial for dependency prediction. DyTSSAM's explicit temporal-semantic modeling and subtree-constrained aggregation thus offer a more effective solution for dynamic dependency analysis.
Overall, DyTSSAM achieves superior performance in DAST-based dependency prediction, excelling in both discriminative accuracy (AUC, Dep_Acc) and ranking quality (AP, MRR). Compared to dynamic GNNs and recent code-specific graph models such as JLineVD+, DyTSSAM exhibits stronger adaptability to evolving syntactic and semantic structures, confirming the synergistic effectiveness of DSAM, TDAM, and LGDAM in fine-grained, temporally aware dependency modeling.
RQ3: Can DyTSSAM accurately capture dependency evolution trends during code modifications and demonstrate its advantages for dynamic dependency modeling?
Traditional static dependency analysis methods require full re-parsing after every code modification, resulting in high update costs and poor scalability. In contrast, DyTSSAM employs a change-based temporal subtree segmentation of DAST, treating each code modification as an independent input unit. This design avoids frequent reconstruction of the full dependency graph, enabling the model to efficiently capture dependency shifts triggered by localized syntactic edits while maintaining stable predictive performance throughout continuous code evolution.
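As an illustration of change-based segmentation (a simplification, not the authors' DAST extraction pipeline), the sketch below uses Python's standard `ast` module to isolate the top-level statement subtrees that differ between two consecutive snapshots, so only these minimal units need re-analysis.

```python
import ast

def changed_subtrees(old_src, new_src):
    """Return the new-version top-level statements whose subtree changed.

    Compares the dumped AST of each top-level statement between two snapshots;
    a statement whose dump is absent from the old version is treated as a
    changed (or newly inserted) subtree. Illustrative only.
    """
    old_stmts = {ast.dump(s) for s in ast.parse(old_src).body}
    return [s for s in ast.parse(new_src).body if ast.dump(s) not in old_stmts]

old = "x = 1\nwhile x < 10:\n    x += 1\n"
new = "x = 1\nwhile x < 10:\n    x += 2\n"   # a single edit inside the loop body
for node in changed_subtrees(old, new):
    print(type(node).__name__, "subtree changed at line", node.lineno)
# -> While subtree changed at line 2  (the unchanged Assign is not re-analyzed)
```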
The results in Table 3 confirm DyTSSAM’s robustness under frequent updates. Specifically, DyTSSAM consistently achieves superior performance on AUC and AP, even under large-scale continuous modification settings, without significant performance degradation. Its Dep_Acc score of 0.8936 further indicates that dependency prediction accuracy remains high during sequential updates. By contrast, continuous-time dynamic models such as DyGNN suffer from pronounced performance drops (AUC = 0.4939), suggesting that edge-level update mechanisms can easily lead to dependency information entanglement and instability. DyTSSAM’s subtree-based modeling framework provides a more resilient alternative. By structuring updates around meaningful syntactic units rather than isolated edges, DyTSSAM is able to continuously integrate syntactic structure and temporal dependency signals across evolution steps. This allows the model to capture dependency evolution at both node-level and subtree-level granularities, ensuring consistency and interpretability in long code evolution sequences.
Moreover, compared to edge-level update models (e.g., DyGNN), DyTSSAM demonstrates stronger robustness when handling intra-function edits or structural refactorings. Subtree-level modeling mitigates the risks of dependency information loss or confusion that often arise from purely local update mechanisms. As a result, DyTSSAM can more clearly represent dependency evolution patterns, thereby enabling effective modeling of complex dependency dynamics during program evolution.
In summary, the experimental findings show that DyTSSAM reliably captures dependency evolution trends in code, maintaining both stability and accuracy when faced with fine-grained modifications and structural adjustments. These results highlight that a modeling framework centered on dynamic syntax and temporal semantics is well-suited for dependency prediction in evolving software systems, and provide new insights into the interplay between code evolution and dependency dynamics.

5.6. Case Study: Fine-Grained Dependency Evolution Analysis

To further interpret the experimental results and visually demonstrate how DyTSSAM captures fine-grained dependency evolution, we present a real-world case derived from our evaluation dataset. This case involves three consecutive code modification events that collectively illustrate the model’s ability to handle incremental structural updates, dependency creation, and deletion during program evolution.
We select EvolveGCN [13] as the primary comparative baseline for this analysis, since it is one of the most widely adopted dynamic graph neural network architectures. EvolveGCN dynamically updates graph convolutional parameters through recurrent neural units (e.g., GRU or LSTM), allowing temporal modeling without explicitly reconstructing intermediate graph states. While this mechanism is conceptually similar to DyTSSAM’s temporal modeling objective, the two differ fundamentally in how they handle structural updates: EvolveGCN evolves model parameters globally over time, whereas DyTSSAM explicitly models localized structural evolution within DASTs via subtree segmentation and syntax-constrained aggregation. Therefore, this comparison highlights the effectiveness of DyTSSAM’s explicit change-driven design in accurately capturing fine-grained dependency evolution.
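For reference, the schematic below captures the weight-evolution mechanism described above: a recurrent cell advances the GCN weight matrix between snapshots, so temporal change lives in the parameters rather than in explicit structural updates. It simplifies EvolveGCN (e.g., a GRU over weight columns, in the spirit of the -O variant) and is not a faithful reimplementation.

```python
import torch
import torch.nn as nn

class EvolvingGCNLayer(nn.Module):
    """Schematic of the EvolveGCN idea: a recurrent cell evolves the GCN weight
    matrix across snapshots (contrast with DyTSSAM's explicit subtree updates)."""
    def __init__(self, dim):
        super().__init__()
        self.W0 = nn.Parameter(torch.eye(dim))   # weight matrix for the first snapshot
        self.gru = nn.GRUCell(dim, dim)          # evolves the weight columns step by step

    def forward(self, A_hat, X, W_prev):
        # Treat the columns of W_prev as a batch of GRU hidden states and
        # advance them one time step.
        W_t = self.gru(W_prev.t(), W_prev.t()).t()
        return torch.relu(A_hat @ X @ W_t), W_t  # ordinary GCN propagation with evolved weights

# Usage over a snapshot sequence:
# layer = EvolvingGCNLayer(dim=8)
# H1, W1 = layer(A1, X1, layer.W0); H2, W2 = layer(A2, X2, W1); ...
```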
Stage 1: Initial Dependency Formation ($t_1$). At time $t_1$, the programmer writes the initial code segment shown in Figure 11a, whose corresponding DAST is illustrated in Figure 11b. Figure 11c,d depict the structural aggregation processes of DyTSSAM and EvolveGCN, respectively, focusing on the While node (highlighted in blue). In DyTSSAM, the Dynamic Syntax Attention Module (DSAM) identifies all subtrees rooted at the While node (red nodes in Figure 11c) and performs syntax-constrained aggregation to preserve all parent–child and syntactic-edge relationships. EvolveGCN, in contrast, aggregates only first- and second-order neighbors (red nodes in Figure 11d), which leads to the omission of long-range dependencies such as the variable–control (VC) link between the Assign and While nodes. This explains the lower dependency accuracy observed in Table 3 for EvolveGCN when modeling control-flow relations.
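The subtree-rooted aggregation in this stage can be pictured with the following sketch: recursive, type-aware attention restricted to syntactic children, so in-subtree relations such as the Assign–While link survive aggregation. The `SyntaxConstrainedAggregator` class and the simple `Node` structure are hypothetical illustrations, not the published DSAM.

```python
import torch
import torch.nn as nn
from dataclasses import dataclass, field

@dataclass
class Node:
    feat: torch.Tensor          # (dim,) node feature
    type_id: int                # AST node type (e.g., While, Assign)
    children: list = field(default_factory=list)

class SyntaxConstrainedAggregator(nn.Module):
    """Sketch of DSAM-style aggregation: each node attends only to its syntactic
    children, recursively, instead of to arbitrary k-hop graph neighbors."""
    def __init__(self, dim, n_types):
        super().__init__()
        self.type_emb = nn.Embedding(n_types, dim)   # type-aware bias
        self.att = nn.Linear(2 * dim, 1)
        self.proj = nn.Linear(dim, dim)

    def aggregate(self, node):
        h = node.feat + self.type_emb(torch.tensor(node.type_id))
        if not node.children:
            return h
        child_h = torch.stack([self.aggregate(c) for c in node.children])  # bottom-up recursion
        scores = self.att(torch.cat([h.expand_as(child_h), child_h], -1)).squeeze(-1)
        alpha = torch.softmax(scores, dim=0)         # attention over syntactic children only
        return torch.relu(self.proj(h + alpha @ child_h))
```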
Stage 2: New Subtree Integration and Temporal Fusion ($t_2$). At time $t_2$, a new line of code is added, as shown in Figure 12a, producing an updated DAST in Figure 12b. DyTSSAM first aggregates features of the newly inserted subtree (red nodes in Figure 12c) through DSAM, and then applies an update operation to propagate the new features to the parent If node, ensuring structural consistency. Next, the Temporal Dependency Attention Module (TDAM) fuses historical dependency information (highlighted in red) with the new structure, enhancing temporal continuity. This process enables DyTSSAM to correctly detect both a new variable–control (VC) dependency (global, shown in blue) and a variable–data (VD) dependency (local, derived from historical context, shown in red). In contrast, EvolveGCN (Figure 12d)—which only evolves temporal weights—fails to propagate prior dependency information into new subtrees, leading to incomplete dependency prediction.
Stage 3: Control-Flow Refactoring and Dependency Deletion ($t_3$). At time $t_3$, the programmer refactors the control flow by replacing the original If statement with a For loop and moving several statements outside the previous block (Figure 13a). The corresponding DAST, shown in Figure 13b, captures the resulting structural reorganization. DyTSSAM aggregates modified subtree features (blue nodes in Figure 13c) and performs a DSAM update operation to synchronize these changes with existing node features. This operation effectively removes obsolete If-related information, enabling the model to delete three outdated dependencies and infer three new global dependencies. EvolveGCN (Figure 13d), lacking a structure-aware update mechanism, fails to prune obsolete dependencies and cannot correctly identify the newly formed ones.
Discussion. This real-world case clearly demonstrates DyTSSAM’s advantages in maintaining both structural and temporal consistency during code evolution. By combining DSAM’s syntax-constrained aggregation with TDAM’s temporal dependency attention, DyTSSAM achieves accurate fine-grained modeling of dependency creation and disappearance across successive code edits. Compared to traditional dynamic GNNs such as EvolveGCN, which rely solely on weight evolution, DyTSSAM’s explicit structure–time alignment leads to better dependency prediction accuracy and robustness, consistent with the quantitative improvements observed in Section 5.5.

5.7. Ablation Study

To assess the contribution of different components within DyTSSAM to the representation learning of DAST, we designed four ablated variants and compared their performance on the dependency prediction task. This analysis highlights the role of each module in the overall framework:
  • DyTSSAM-V1: Removes both the dynamic syntax module and temporal semantic module, retaining only conventional GCN and GRU components. This serves as a baseline to evaluate performance without the core innovations.
  • DyTSSAM-V2: Retains the temporal semantic module but removes the dynamic syntax module, relying on standard GCN for structural aggregation. This variant isolates the effect of temporal semantics.
  • DyTSSAM-V3: Retains the dynamic syntax module but removes the temporal semantic module, replacing it with GRU-based temporal aggregation. This variant isolates the effect of dynamic syntax modeling.
  • DyTSSAM-V4: The complete model, serving as the benchmark for comparison.
By comparing V3 vs. V2, we assess the contribution of the dynamic syntax module in capturing fine-grained subtree and node-level dependency changes. By comparing V2 vs. V4, we assess the contribution of the temporal semantic module in modeling dependency evolution and improving predictive accuracy. Finally, comparing V1 vs. V4 highlights the necessity of the overall change-based subtree segmentation framework for modeling dependency evolution during code changes.
The experimental results are presented in Table 4 and Figure 14.
DyTSSAM-V1, which lacks both the dynamic syntax and temporal semantic modules, performs poorly (AUC = 0.6361, Recall = 0.587), a sharp drop of 31.79 percentage points in AUC and 26.3 percentage points in Recall compared to the full model. This demonstrates that without its core modules, the model cannot effectively capture structural and temporal signals in DAST, leading to severe degradation in predictive power.
DyTSSAM-V2, which retains only temporal semantics, also suffers significant declines (AUC = 0.7457, AP = 0.7425, Recall = 0.64, MRR = 0.7223), all markedly lower than the complete model. These results confirm that GCN-based aggregation alone cannot capture fine-grained subtree-level dependency variations, underscoring the essential contribution of the dynamic syntax module.
DyTSSAM-V3, which retains the dynamic syntax module but substitutes the temporal semantic module with GRU, shows a moderate performance drop (AUC = 0.8833, AP = 0.8733). The smaller decline compared to V2 indicates that dependency prediction relies more heavily on subtree structural modeling, while temporal information serves as a secondary but still valuable contributor to performance—particularly for capturing long-range dependency evolution.
In summary, the ablation study demonstrates that DyTSSAM’s performance gains primarily stem from its dynamic syntax module, which enables fine-grained modeling of subtree-level structural changes. The temporal semantic module and local–global dependency analysis further enhance predictive accuracy under complex dependency relationships. Finally, the change-based subtree segmentation framework proves necessary for effectively modeling dependency evolution during code changes, enabling DyTSSAM to achieve both stability and accuracy in dynamic dependency prediction.

6. Threats to Validity

Although DyTSSAM demonstrates strong performance in dynamic dependency prediction, several potential threats to validity remain.
Internal validity. The internal validity threat mainly arises from potential label noise and pruning bias in the dependency dataset. Dependency relationships were extracted automatically from Dynamic Abstract Syntax Trees (DASTs), and although manual inspection was performed for a subset of samples, minor annotation inconsistencies may remain, particularly for edge cases such as nested control flows or indirect variable references. Furthermore, the structural pruning applied to remove redundant syntactic nodes may bias the dependency distribution toward frequently occurring constructs. To mitigate these issues, all pruning and labeling procedures were verified against multiple projects, and experiments were repeated with different random seeds to confirm consistency.
External validity. Our dataset contains approximately 105 complete source files and primarily focuses on Python programs collected from specific IDE environments. As a result, performance may vary when applying DyTSSAM to larger projects, different programming languages, or alternative development tools. However, since DAST is language-agnostic by design, the framework can be adapted to other languages (e.g., Java or C#) with minimal preprocessing adjustments. Future work will include cross-language validation and integration with broader development environments to further enhance generalization.
Construct validity. Construct validity refers to the degree to which our experimental setup accurately reflects the intended research concept—i.e., the dynamic evolution of program dependencies. In this study, dependency evolution is modeled through link prediction over dynamic graphs, which serves as a proxy for real dependency changes. Although link prediction effectively captures structural dynamics, it may not fully reflect higher-level semantic or behavioral changes that occur during complex code refactoring. We mitigate this limitation by combining structural (syntactic) and temporal (semantic) attention mechanisms, yet we acknowledge that dependency evolution in real development scenarios is more nuanced. Future extensions will incorporate additional behavioral traces and commit-level metadata to improve the fidelity of dependency evolution modeling.

7. Conclusions

This paper addresses the key challenges in dynamically constructing program dependency relations and proposes DyTSSAM, a discrete-time dynamic graph neural network enhanced with attention mechanisms. Within its dynamic syntax layer, DyTSSAM employs type-aware and subtree-constrained attention to strengthen the modeling of fine-grained syntactic structures under code changes. Its temporal semantic layer introduces time encoding and dynamic weighting to capture the evolutionary patterns of dependencies over time. Furthermore, through a local–global dependency differentiation strategy, the model effectively integrates multiple levels of dependency information, enabling a more precise representation of dependency evolution throughout program development.
Experimental results demonstrate that DyTSSAM outperforms both static dependency methods and existing dynamic graph models across multiple metrics, including AUC, AP, Recall, and Dep_Acc. These findings confirm that DyTSSAM achieves a balanced advantage in fine-grained dependency modeling, complex dependency prediction accuracy, and evolutionary modeling capability. Ablation studies further validate that the dynamic syntax layer makes the largest contribution to modeling subtree structures, while the temporal semantic layer and local–global dependency analysis module provide complementary benefits in predicting complex dependencies. Moreover, the change-based subtree segmentation framework proves essential for capturing dependency evolution driven by code changes.
Despite its strong performance, DyTSSAM—as a deep learning model—still inherits the “black-box” nature of neural networks, limiting its interpretability. For future research, we plan to conduct deeper analyses of code evolution patterns and optimize data collection and preprocessing strategies to further improve predictive accuracy. Additionally, we will explore incorporating external factors related to code changes (e.g., developer behavior, version control system information) to build more practically applicable and interpretable models for program dependency prediction.

Author Contributions

Conceptualization, P.H.; formal analysis, Y.Z.; validation, Y.Z.; investigation, Y.Z.; resources, Y.Z.; data curation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, Y.J. and Y.Z.; visualization, Y.Z.; supervision, Y.J.; project administration, Y.Z.; funding acquisition, Y.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by The National Natural Science Foundation of China (No. 62162038, No. 61462049, No. 61063006, No. 60703116), and The National Key Research and Development Program of China (No. 2018YFB1003904).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

All datasets, source code, and experimental configurations used in this study are publicly available at the following GitHub repository: https://github.com/3134726136/DAST-DyTSSAM-Code, accessed on 5 November 2025.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fagan, M. Design and code inspections to reduce errors in program development. In Software Pioneers: Contributions to Software Engineering; Springer: Berlin/Heidelberg, Germany, 2011; pp. 575–607.
  2. Jin, Z.; Liu, F.; Li, G. Program comprehension: Present and future. Ruan Jian Xue Bao/J. Softw. 2019, 30, 110–126. (In Chinese). Available online: http://www.jos.org.cn/1000-9825/5643.htm (accessed on 21 July 2025).
  3. Deng, W.T.; Cheng, C.; He, P.; Chen, M.Y.; Li, B. Interaction prediction of multigranularity software system based on graph neural network. J. Softw. 2025, 36, 2043–2063. Available online: http://www.jos.org.cn/1000-9825/7207.htm (accessed on 21 July 2025).
  4. Zhang, Y.; Hu, Y.; Chen, X. Context and multi-features-based vulnerability detection: A vulnerability detection frame based on context slicing and multi-features. Sensors 2024, 24, 1351.
  5. Gu, S.; Chen, W. Function level code vulnerability detection method of graph neural network based on extended AST. Comput. Sci. 2023, 50, 283–290.
  6. Yao, W.; Jiang, Y.; Yang, Y. The metric for automatic code generation based on dynamic abstract syntax tree. Int. J. Digit. Crime Forensics 2023, 15, 20.
  7. Agarwal, S.; Agrawal, A.P. An empirical study of control dependency and data dependency for large software systems. In Proceedings of the 2014 5th International Conference—Confluence: The Next Generation Information Technology Summit, Noida, India, 25–26 September 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 877–879.
  8. Kalhauge, C.G.; Palsberg, J. Binary reduction of dependency graphs. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Tallinn, Estonia, 26–30 August 2019; ACM: New York, NY, USA, 2019; pp. 556–566.
  9. Guo, H.; Chen, X.; Huang, Y.; Wang, Y.; Ding, X.; Zheng, Z.; Zhou, X.; Dai, H. Snippet comment generation based on code context expansion. ACM Trans. Softw. Eng. Methodol. 2023, 33, 24.
  10. Roy, J.; Patel, R.; Simon, S. Dynamic syntax tree model for enhanced source code representation. J. Softw. Eng. Res. Dev. 2023, preprint.
  11. Lekeufack Foulefack, R.Z.; Marchetto, A. Enhanced graph neural networks for vulnerability detection in Java via advanced subgraph construction. In Proceedings of the IFIP International Conference on Testing Software and Systems, London, UK, 30 October–1 November 2024; Springer Nature: Cham, Switzerland, 2024; pp. 131–148.
  12. Sankar, A.; Wu, Y.; Gou, L.; Zhang, W.; Yang, H. DySAT: Deep neural representation learning on dynamic graphs via self-attention networks. In Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; ACM: New York, NY, USA, 2020; pp. 519–527.
  13. Pareja, A.; Domeniconi, G.; Chen, J.; Ma, T.; Suzumura, T.; Kanezashi, H.; Kaler, T.; Schardl, T.; Leiserson, C. EvolveGCN: Evolving graph convolutional networks for dynamic graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; AAAI Press: Palo Alto, CA, USA, 2020; Volume 34, pp. 5363–5370.
  14. Cui, Z.; Li, Z.; Wu, S.; Zhang, X.; Liu, Q.; Wang, L.; Ai, M. DyGCN: Efficient dynamic graph embedding with graph convolutional network. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 4635–4646.
  15. Li, Z.L.; Zhang, G.W.; Yu, J.; Xu, L.Y. Dynamic graph structure learning for multivariate time series forecasting. Pattern Recognit. 2023, 138, 109423.
  16. Mu, Z.; Zhuang, Y.; Tang, S. Contrastive Hawkes graph neural networks with dynamic sampling for event prediction. Neurocomputing 2024, 575, 127265.
  17. Xia, Z.; Zhang, Y.; Yang, J.; Xie, L. Dynamic spatial–temporal graph convolutional recurrent networks for traffic flow forecasting. Expert Syst. Appl. 2024, 240, 122381.
  18. Jiang, Y.; Huang, P.; Gu, J. Analysis of the impact scope of code changes based on DAST and GCN. J. Kunming Univ. Sci. Technol. (Nat. Sci.) 2024, 49, 118–127.
  19. Martínez, V.; Berzal, F.; Cubero, J.C. A survey of link prediction in complex networks. ACM Comput. Surv. 2016, 49, 1–33.
  20. Zhou, T. Discriminating abilities of threshold-free evaluation metrics in link prediction. Phys. A Stat. Mech. Its Appl. 2025, 615, 128529.
  21. Manessi, F.; Rozza, A.; Manzo, M. Dynamic graph convolutional networks. Pattern Recognit. 2020, 97, 107000.
  22. Xu, D.; Ruan, C.; Korpeoglu, E.; Kumar, S.; Achan, K. Inductive representation learning on temporal graphs. arXiv 2020, arXiv:2002.07962.
  23. Ma, Y.; Guo, Z.; Ren, Z.; Tang, J.; Yin, D. Streaming graph neural networks. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 25–30 July 2020; ACM: New York, NY, USA, 2020; pp. 719–728.
  24. Gao, Y.; Feng, Y.; Ji, J.R. HGNN+: General hypergraph neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 3181–3199.
Figure 1. Example of dependency creation and disappearance during DAST evolution. (a) Code changes over time ($t_1 \rightarrow t_2 \rightarrow t_3$). (b) Corresponding dependency tree updates, where solid blue arrows denote existing VC dependencies, new dependencies are added in later steps, and dashed lines represent modified subtree links originating from code updates. Blue arrows indicate global dependency relationships preserved across code versions.
Figure 2. Limitations of traditional dynamic graph neural networks on DAST. (a) Local aggregation process in traditional DGNNs. (b) Examples of structurally similar but semantically different AST subtrees.
Figure 3. Overview of the proposed DyTSSAM framework.
Figure 4. Illustration of DAST subgraph sequences segmented using fixed time intervals.
Figure 5. Example of structural aggregation in DSAM. (a) A DAST example illustrating the aggregation process. (b) The attention-based feature aggregation of the if node with its syntactic-edge and child-node features.
Figure 6. Example of update operations in the dynamic syntax layer.
Figure 7. Example of global and local dependencies. (a) Global dependencies established when the model has full structural visibility at time $t_1$. (b) Local dependencies inferred under partial structural observation at time $t_3$.
Figure 8. Illustration of the temporal semantic layer and decay weighting. (a) Temporal dependency aggregation of the Assign node at time $t_5$. (b) Temporal decay process where outdated dependencies gradually lose their influence.
Figure 9. Comparison of AUC performance between node-level and subtree-level tasks.
Figure 10. Comparison of AP performance between node-level and subtree-level tasks.
Figure 11. Case study at time $t_1$: initial code segment and dependency aggregation. (a) Original code snippet; (b) corresponding DAST; (c) DSAM-based aggregation in DyTSSAM; (d) neighbor-based aggregation in EvolveGCN.
Figure 12. Case study at time $t_2$: integration of new subtrees and temporal dependency fusion. (a) Modified code snippet with an added line; (b) updated DAST; (c) DyTSSAM aggregation and update operations; (d) EvolveGCN aggregation and update operations.
Figure 13. Case study at time $t_3$: control-flow refactoring and dependency update. (a) Refactored code snippet; (b) updated DAST after the control-flow change; (c) DSAM-based feature update and pruning of obsolete dependencies; (d) comparative EvolveGCN result lacking accurate dependency deletion.
Figure 14. Radar chart of evaluation metrics for different model variants.
Table 1. Dependency edge types and semantic definitions in DAST.

| Dependency Type | Parent Node Type | Semantic Description |
|---|---|---|
| Assign_VD | Assign → Assign, Call, Return, etc. | Variable dependency: a variable references another variable; the calling variable points to the referenced variable. |
| For_VD | For → Assign, Call, Return, etc. | Loop variable dependency: a statement accesses variables defined in the for loop header; the statement points to the loop variable. |
| VC | If, While, For → Assign, Call, Return, etc. | Control-flow dependency: statements inside a control block point to the controlling statement. |
| Arg | FunctionDef → Assign, Call, Return, etc. | Argument dependency: a statement in a function body references a parameter defined in the function header. |
| Call | Call → FunctionDef | Function call dependency: a statement calls a function; the Call node points to the corresponding FunctionDef node. |
| Ret | FunctionDef → Return | Return dependency: the return statement connects back to the function definition to represent the return path. |
| Glo | Assign → Global | Global declaration dependency: a global statement declares a local variable as global, pointing from the declaration to its defining statement. |
Table 2. Notation summary for the DyTSSAM framework.

| Symbol | Definition |
|---|---|
| $X_t$ | Node feature matrix at time step $t$ |
| $E_t^{\mathrm{syn}}$ | Set of syntactic edges in the AST at time $t$ |
| $E_t^{\mathrm{dep}}$ | Set of dependency edges in the DAST at time $t$ |
| $A_t$ | Feature matrix of syntactic edges at time $t$ |
| $M_t$ | Memory bank storing node representations up to time $t$ |
| $z_v^t$ | Updated representation of node $v$ at time $t$ |
| $h_v^t$ | Temporarily computed node feature before decay fusion |
| $\Delta t$ | Time interval between consecutive code changes |
| $\lambda_v$ | Learnable temporal decay weight for node $v$ |
| $W_Q^l, W_K^l, W_V^l$ | Linear transformation matrices at layer $l$ |
| $Q_v^l, K_u^l, V_u^l$ | Query, Key, and Value vectors at layer $l$ |
| $f_i^{\mathrm{enh}}$ | Enhanced dependency feature after LGDAM |
| $T_i$ | Temporal feature encoded by the TimeEncoder |
| $q, k_j, v_j$ | Query, Key, and Value representations in TDAM |
| $h$ | Output feature for dependency prediction |
Table 3. Overall performance comparison across all models (mean ± standard deviation over 10 runs).

| Model | AUC | Recall | AP | Dep_Acc | MRR |
|---|---|---|---|---|---|
| DyTSSAM | 0.9742 ± 0.031 | 0.8200 ± 0.048 | 0.9739 ± 0.029 | 0.8936 ± 0.050 | 0.8300 ± 0.043 |
| CDGCN | 0.8856 ± 0.057 | 0.8484 ± 0.062 | 0.8935 ± 0.068 | 0.8200 ± 0.074 | 0.7500 ± 0.051 |
| DySAT | 0.8271 ± 0.073 | 0.7467 ± 0.069 | 0.9185 ± 0.061 | 0.8133 ± 0.067 | 0.8400 ± 0.064 |
| EvolveGCN | 0.8750 ± 0.055 | 0.7613 ± 0.059 | 0.8650 ± 0.072 | 0.7879 ± 0.068 | 0.6300 ± 0.055 |
| DyGCN | 0.8566 ± 0.075 | 0.7562 ± 0.070 | 0.8700 ± 0.073 | 0.8241 ± 0.071 | 0.7620 ± 0.060 |
| TGAT | 0.7847 ± 0.069 | 0.5123 ± 0.084 | 0.7692 ± 0.091 | 0.8237 ± 0.073 | 0.7621 ± 0.075 |
| DyGNN | 0.4939 ± 0.081 | 0.3824 ± 0.075 | 0.4157 ± 0.090 | 0.6400 ± 0.087 | 0.6472 ± 0.084 |
| HGNN | 0.6719 ± 0.077 | 0.4679 ± 0.083 | 0.5873 ± 0.085 | 0.5863 ± 0.080 | 0.6324 ± 0.076 |
| JLineVD+ | 0.8310 ± 0.035 | 0.7901 ± 0.026 | 0.8000 ± 0.063 | 0.7603 ± 0.043 | 0.7523 ± 0.062 |

All results are averaged over 10 runs with different random seeds.
Table 4. Performance of ablated variants on dependency prediction.

| Model | AUC | AP | Recall | Dep_Acc | MRR |
|---|---|---|---|---|---|
| DyTSSAM-V1 | 0.6361 | 0.5593 | 0.5870 | 0.6746 | 0.4856 |
| DyTSSAM-V2 | 0.7457 | 0.7425 | 0.6400 | 0.7745 | 0.7223 |
| DyTSSAM-V3 | 0.8833 | 0.8733 | 0.7618 | 0.8724 | 0.8869 |
| DyTSSAM-V4 | 0.9540 | 0.9720 | 0.8500 | 0.9400 | 0.9215 |