GraphGPT-Patent: Time-Aware Graph Foundation Modeling on Semantic Similarity Document Graphs for Grant-Time Economic Impact Prediction

Fang, Tianhui; Si, Junru; Ye, Chi; Shi, Hailong

doi:10.3390/app16062737

¹ School of Economics and Management, Jiangxi Institute of Applied Science and Technology, Nanchang 330100, China

² Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China

³ Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100190, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(6), 2737; https://doi.org/10.3390/app16062737

Submission received: 9 February 2026 / Revised: 4 March 2026 / Accepted: 11 March 2026 / Published: 12 March 2026

(This article belongs to the Special Issue Graph-Based Methods in Artificial Intelligence and Machine Learning, 2nd Edition)

Abstract

Predicting the future impact of technical economic documents at release time is challenging due to delayed supervision signals, long-tailed label distributions, and time- and domain-dependent shifts in language and topics. Moreover, similarity graphs derived from text embeddings can be noisy due to boilerplate and evolve under temporal drift, making robustness and leakage-free evaluation essential. We formulate grant-time patent impact prediction as a node classification and within-domain ranking problem on a large-scale semantic similarity document graph built from patent text embeddings, avoiding any future citation leakage. The document graph is constructed via ANN Top-K retrieval and similarity thresholding, enabling scalable and reproducible sparsification on hundreds of thousands of nodes. We propose GraphGPT-Patent, which adapts a reversible graph-to-sequence foundation backbone to local subgraphs extracted from the similarity network. The model incorporates time- and domain-conditioned edge reliability to suppress drift-induced and template-driven pseudo-similarity, and optimizes a joint objective coupling high-impact classification with ranking consistency within comparable groups. Experiments on USPTO granted patents (2000–2022) across three high-volume CPC domains and three evaluation horizons show consistent gains over text-only and GNN baselines, achieving up to 0.94 recall for the positive class and improved macro-average recall across nine settings. Temporal shift analyses further quantify the effect of training-data freshness, while explanation subgraphs provide auditable structural evidence of model decisions. The proposed framework offers an effective graph-based learning pipeline for scalable impact prediction and downstream triage under strict information constraints.

Keywords:

graph foundation model; graph machine learning; edge denoising; temporal shift; explainable graph learning; economic impact

1. Introduction

Predicting future impact from content available at release time is a core problem in document mining and information retrieval. In many realistic settings, supervision arrives late and is highly imbalanced, while only limited information is available at decision time. From an ML standpoint, this resembles long-tailed high-recall retrieval under delayed supervision, where evaluation must be strictly time-respecting to avoid leakage. Patents provide a large-scale, structured corpus of technical documents, and forward citations are widely used as a practical proxy signal for downstream impact modeling and evaluation under strict grant-time information constraints [1,2]. The ability to surface potentially high-impact items early is valuable for prioritization and downstream inspection in large corpora [1,3]. Prior work has leveraged patent text and metadata to study technological evolution and citation-based impact proxies [4]. Empirical studies report that forward citations correlate with diffusion and other outcomes while acknowledging noise from examination procedures, strategic citations, and institutional differences [5,6]. Despite these limitations, citations remain a scalable and traceable supervision signal for impact prediction research [7].

Grant-time prediction based on forward citations faces several fundamental tensions. First, citation signals exhibit delayed supervision and strong domain heterogeneity: diffusion speeds, examination rhythms, and citation habits vary across technical domains, making short-horizon prediction inherently more difficult [8]. Second, directly using citation networks as model inputs can introduce future-information leakage or boundary inconsistencies between training and evaluation, violating the grant-time constraint and undermining reproducibility [9]. Third, practical deployment often requires auditable evidence rather than opaque scores, motivating explainable learning pipelines and diagnostic views of model behavior. In operational settings, grant-time prediction is typically used as an early-screening layer rather than a final valuation decision. A patent office or enterprise analytics team first generates a recall-oriented shortlist for newly granted patents, then performs expert review, portfolio alignment checks, and downstream prioritization under fixed analyst capacity. This workflow motivates evaluating not only classification accuracy but also whether the model surfaces actionable candidates early and supports auditable inspection of why they are prioritized.

To leverage relational inductive bias without leakage, we construct a semantic similarity document graph from patent text embeddings. Nodes correspond to documents and edges encode semantic proximity obtained via scalable approximate nearest neighbor (ANN) Top-K retrieval and similarity-threshold pruning, producing a sparse graph without using future citation edges. This formulation makes grant-time impact prediction a semantic-graph learning problem under time- and domain-dependent shifts. Viewed as a graph data management pipeline, ANN retrieval and sparsification convert dense embedding neighborhoods into a sparse, queryable document graph that can be reused across models and ablations. At the same time, semantic edges can be noisy due to boilerplate and template-driven pseudo-similarity, and the embedding space drifts over time, changing neighborhood structure and class separability across eras. These properties require robustness beyond stronger encoders, spanning graph construction, edge reliability, and evaluation.

Based on this formulation, we propose GraphGPT-Patent, adapting the reversible graph-to-sequence backbone of GraphGPT to node-level prediction on local subgraphs extracted from the similarity network [10]. Beyond the backbone, we incorporate time- and domain-conditioned edge reliability to suppress drift-induced and cross-domain pseudo-similarity, and we optimize a joint objective that couples binary impact classification with within-group ranking consistency to support high-recall triage followed by prioritization. Finally, we generate explanatory subgraphs and structural diagnostics to provide auditable evidence for model decisions. Technically, the contribution is a coupled design rather than a single stronger encoder: (i) reversible graph-to-sequence adaptation enables local semantic-neighborhood evidence to be serialized and mapped back for inspection; (ii) time/domain-conditioned edge reliability downweights drift-prone and boilerplate-driven pseudo-neighbors instead of assuming uniform edge trustworthiness; and (iii) a joint classification–ranking objective aligns model training with grant-time shortlist generation and within-cohort prioritization. This positioning differentiates GraphGPT-Patent from both encoder-only baselines and conventional message-passing graph baselines.

The main contributions of this paper are as follows:

We introduce a leakage-free benchmark protocol for grant-time impact prediction on large-scale patent document graphs, spanning three CPC domains and three evaluation horizons.
We propose GraphGPT-Patent, a graph foundation adaptation method based on reversible graph-to-sequence serialization, augmented with time- and domain-conditioned edge reliability modeling for semantic graphs.
We design a joint classification–ranking objective to support high-recall detection and stable within-domain prioritization under delayed, long-tailed supervision signals.
We provide explainable graph evidence via subgraph attributions and structural diagnostics, enabling auditable model analysis across domains and time.

The remainder of this paper is organized as follows: Section 2 reviews related work on impact prediction with delayed supervision, document graph learning, graph foundation models, robust graph learning, and explainable graph learning; Section 3 describes the data, semantic graph construction, and leakage-free evaluation protocol; Section 4 details GraphGPT-Patent, time/domain-conditioned edge reliability, the joint classification–ranking objective, and subgraph-based explanations; Section 5 reports results and analyzes domain/window effects, temporal shift, and explanation diagnostics; Section 6 discusses robustness, scalability, limitations, and future directions.

2. Related Work

2.1. Document Impact Prediction and Delayed Supervision

Quantifying and predicting document impact using citation-type signals is a long-standing direction. Early studies use patent citations as measurable signals of technological influence and diffusion [11,12]. Subsequent empirical work links citation counts to multiple observable outcomes and highlights sources of noise, implying that citation-based labels are informative but delayed and imperfect [13,14]. These observations motivate a grant-time prediction setting where citations serve as a delayed supervision signal and evaluation proxy rather than as direct model inputs.

2.2. Semantic Similarity Graphs and Document Graph Learning

At the representation level, patent text analysis has evolved from bag-of-words and topic models to deep representation learning [15,16]. Distributed representations such as Doc2Vec learn scalable document-level semantic vectors for large patent corpora [17]. Pre-trained language models oriented towards patent text further strengthen embeddings for tasks such as classification and retrieval; for instance, PatentBERT shows stable advantages and provides a stronger semantic encoder for downstream prediction and similarity construction [18]. Embedding spaces also enable semantic similarity document graphs, which encode relational structure without using future citation edges. More broadly, context-conditioned graph modeling has been explored to handle heterogeneity and conditioning variables, which connects to our time/domain-conditioned design [19].

2.3. Graph Foundation Models and Graph-to-Sequence Transformers

Graph foundation models extend the pre-training and adaptation paradigm to graph learning, aiming to learn transferable structural inductive biases across graphs and tasks [20,21]. GraphGPT is a generative graph foundation approach that performs reversible graph serialization via Eulerian traversal, enabling graph-to-sequence modeling with Transformer backbones [10]. We build on this line by adapting graph-to-sequence modeling to semantic similarity document graphs while explicitly addressing edge noise and temporal/domain shift [22].

2.4. Robust Graph Learning: Edge Noise, Denoising, Temporal/Domain Shift

Learning on semantic similarity graphs can be implemented with standard graph ML backbones. Message-passing frameworks such as GCN and GraphSAGE are widely adopted for node-level prediction and integrate neighborhood context via local aggregation [23,24]. In semantic graphs built from text similarity, edges may reflect genuine proximity but also template-driven pseudo-similarity and cross-era drift, making robustness to edge noise and temporal/domain shift a first-order concern. Our time- and domain-conditioned edge reliability modeling can be viewed as a lightweight denoising mechanism that targets these artifacts while preserving scalability.

2.5. Explainable Graph Learning and Subgraph-Based Explanations

Explainable graph learning provides tools to surface evidence for node-level predictions, often through subgraph attribution that identifies a sparse explanatory structure supporting a model decision. GNNExplainer is a representative approach that learns edge/feature masks to preserve predictions, offering an auditable view of graph-based reasoning [25]. We adopt a related subgraph-based evidence interface and further summarize structural diagnostics of explanation subgraphs across domains and time.

Taken together, prior work motivates a unified pipeline in which leakage-free semantic graph construction addresses grant-time constraints, graph foundation modeling provides transferable structural capacity, reliability-aware edge weighting addresses temporal/domain noise in semantic neighborhoods, and ranking-aware training aligns optimization with shortlist-oriented deployment. This mapping directly motivates the method design in Section 3 and Section 4.

3. Data, Document Graph Construction, and Evaluation Protocol

This paper uses a dataset of granted USPTO patents, constructed in prior public research, covering the period from 2000 to 2022 [26]. We select three high-volume CPC main classes to form three domains for cross-domain evaluation: A61 (Medical or veterinary science), H04 (Electric communication technique), and G06 (Computing). The sample sizes for each domain are shown in Table 1.

3.1. CPC Domains for Cross-Domain Evaluation

We use CPC main classes as a convenient partition of the patent corpus into domains with distinct technical vocabularies and citation dynamics. For readability, A61 can be viewed as a medical/device-heavy domain, H04 as communications, and G06 as computing; this mapping is used only as a data description and does not affect the modeling pipeline.

3.2. High-Impact Labels and Evaluation Windows

Let the patent collection be denoted as $P = \{P_1, \dots, P_n\}$. For patent $P_i$, let its grant year be $t_i$, and the forward citation count obtained within $d$ years after granting be $C_i^d$. When the full $d$-year horizon is not observable by 2022, we compute $C_i^{d_i}$ with $d_i = \min(d, 2022 - t_i)$. To accommodate the long-tailed distribution and emphasize rare positive-class retrieval, we use quantile thresholds to define binary classification labels. Given a percentile parameter $x$, let $C_{x,h}$ be the upper $x\%$ quantile threshold of $C^d$, and $C_{x,l}$ be the lower $x\%$ quantile threshold. The label function is defined as:

$$y_i^d = \begin{cases} 1, & C_i^d \geq C_{x,h}, \\ 0, & C_i^d \leq C_{x,l}. \end{cases} \tag{1}$$

The main setting of this paper uses x = 10, meaning the Top 10% are the high-citation positive class and the Bottom 10% are the low-citation negative class. The remaining samples do not participate in the training and testing of this setting, thereby alleviating evaluation bias caused by class imbalance while emphasizing extreme impact differences. We treat forward citations as a practical impact proxy; our goal is to learn leakage-free grant-time predictors rather than to claim an exhaustive measure of patent value. The extreme-quantile design also reduces label ambiguity from mid-range citations, sharpens supervision contrast under long-tailed distributions, and matches the operational goal of high-impact triage at grant time. Citation-derived labels are informative but not bias-free. In particular, examiner-added citations, strategic self-citations, field-specific citation propensity, age-truncation effects near the observation cutoff, and legal/portfolio behaviors can all perturb observed counts without reflecting intrinsic technical value. We therefore interpret reported metrics as performance under a citation-driven supervision protocol and assess robustness across domains, windows, and alternative quantile thresholds rather than over-claiming universal value prediction.
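The Top/Bottom labeling rule of Eq. (1) can be illustrated with a minimal sketch. The function name `top_bottom_labels`, the rank-based quantile convention, and the toy citation counts are assumptions made for illustration; the paper does not state which quantile convention it uses.

```python
def top_bottom_labels(counts, x=10):
    """Return {index: label} following Eq. (1); mid-range items are left out.

    Assumption: thresholds are taken by rank, so that roughly x% of items
    fall on each side. Ties at a threshold can label more than x% of items.
    """
    n = len(counts)
    s = sorted(counts)
    k = max(1, (x * n + 99) // 100)   # ceil(x% of n) using integer arithmetic
    c_low = s[k - 1]                  # C_{x,l}: lower x% threshold
    c_high = s[n - k]                 # C_{x,h}: upper x% threshold
    labels = {}
    for i, c in enumerate(counts):
        if c >= c_high:
            labels[i] = 1             # high-citation positive class
        elif c <= c_low:
            labels[i] = 0             # low-citation negative class
    return labels


# Toy forward-citation counts for 20 patents (made-up numbers).
toy_counts = [0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 40, 90]
toy_labels = top_bottom_labels(toy_counts, x=10)
```

With these toy counts, only the two lowest and two highest patents receive labels; the other sixteen are excluded from training and testing, as described above.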

3.3. Semantic Similarity Network Construction

To avoid future citation leakage, the graph structure is constructed solely from text semantic similarity. Patent texts are encoded into vector representations (Doc2Vec, 100-d; or PatentBERT, 1024-d). We define cosine similarity:

$$s_{ij} = \frac{z_i^{\top} z_j}{\|z_i\|_2 \, \|z_j\|_2}. \tag{2}$$

Importantly, to make graph construction scalable on hundreds of thousands of patents, we do not compute all-pairs similarities. Instead, we perform approximate nearest neighbor (ANN) search to retrieve a Top-K candidate set for each patent (e.g., $K \in [50, 200]$), and then apply similarity thresholding within this candidate set: an edge $(i, j)$ is kept if $s_{ij} \geq t$ (with $t \in [0.62, 0.80]$). We symmetrize the graph by taking the union (or mutual) of directed Top-K links. By tuning $(K, t)$, the average degree can be controlled within 5–25, enabling a systematic diagnosis of the trade-off between semantic edge noise and information propagation range. In practice, the ANN index is built on L2-normalized embeddings, and cosine similarity is computed as inner product; we use an ANN backend to retrieve Top-K neighbors for each node before threshold filtering.
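The Top-K retrieval, thresholding, and symmetrization steps can be sketched as follows. This is an illustration only: exact brute-force search stands in for the ANN backend (which the text does not name), and the function names and toy embeddings are hypothetical.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_similarity_graph(embeddings, K, t, mode="union"):
    """Top-K candidates per node, threshold at t, then symmetrize.

    Assumption: exact search replaces the ANN index, so this only scales
    to toy inputs. `mode` is "union" or "mutual".
    """
    Z = [l2_normalize(v) for v in embeddings]
    directed = set()
    for i in range(len(Z)):
        # On normalized vectors, the inner product equals cosine similarity.
        sims = [(sum(a * b for a, b in zip(Z[i], Z[j])), j)
                for j in range(len(Z)) if j != i]
        sims.sort(reverse=True)
        for s, j in sims[:K]:          # Top-K candidate set
            if s >= t:                 # similarity threshold within candidates
                directed.add((i, j))
    edges = set()
    for i, j in directed:
        if mode == "union" or (j, i) in directed:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 1.2]]
union_edges = build_similarity_graph(toy_embeddings, K=1, t=0.75, mode="union")
mutual_edges = build_similarity_graph(toy_embeddings, K=1, t=0.75, mode="mutual")
```

In this toy case, node 4 selects node 3 as its nearest neighbor but not vice versa, so the edge (3, 4) survives under the union rule and is dropped under the mutual rule.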

3.4. Scalability and Graph Construction Complexity

The graph construction pipeline is designed for reproducibility and scalability. Given L2-normalized embeddings, ANN Top-K retrieval yields a sparse candidate set per node, and the similarity threshold $t$ controls graph sparsity and average degree. The final graph is symmetrized by a deterministic rule (union or mutual neighbors), making it straightforward to reproduce a target sparsity regime and to study sensitivity to $(K, t)$ without changing the downstream model. This decoupling enables fair comparisons across baselines, since the same graph can be shared while varying only the learning backbone and adaptation modules. In practice, the resulting weighted adjacency can be stored as a sparse edge list and supports efficient k-hop subgraph extraction or mini-batch neighborhood sampling for scalable training.

3.5. Training and Testing Split

The main experiment employs a strict temporal split to comply with the grant-time impact prediction setting: the training set uses patents granted from 2000 to 2015, and the test set uses patents granted in 2016. Forward citations are computed using 2022 as the observation cutoff. Accordingly, we consider three nominal windows $d \in \{3, 5, 10\}$, but for patents whose grant year $t_i$ makes the full window unobservable by 2022, we use the truncated window $d_i = \min(d, 2022 - t_i)$. For simplicity, we still refer to these settings as 3y/5y/10y*, where “10y*” denotes the up-to-2022 observed window rather than a fully observed 10-year horizon for all samples. To ensure comparability, the same observation-capped rule is applied to both training and test sets under each nominal window. The task settings and temporal splits are shown in Table 2.
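A small sketch makes the observation-capped window concrete. The helper names are hypothetical; the years and the rule $d_i = \min(d, 2022 - t_i)$ are taken from the text.

```python
CUTOFF_YEAR = 2022  # observation cutoff stated in the text


def observed_window(grant_year, d, cutoff=CUTOFF_YEAR):
    """Truncated window d_i = min(d, cutoff - grant_year)."""
    return min(d, cutoff - grant_year)


def split_of(grant_year):
    """Temporal split of the main experiment: 2000-2015 train, 2016 test."""
    if 2000 <= grant_year <= 2015:
        return "train"
    if grant_year == 2016:
        return "test"
    return None  # outside the main-experiment split


# Effective windows for the 2016 test cohort under each nominal window.
test_windows = {d: observed_window(2016, d) for d in (3, 5, 10)}
```

Under this rule, the 3y and 5y windows are fully observed for the 2016 test cohort, while the nominal 10y* window covers six observed years.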

4. Method

We address semantic-graph learning under strict grant-time information constraints, where impact labels are delayed and long-tailed and where semantic neighborhoods shift across time and domains. This section proposes GraphGPT-Patent, which operates on a semantic similarity patent graph and adapts the reversible graph-to-sequence representation of GraphGPT to node-level prediction on local subgraphs. The method incorporates time/domain-conditioned edge reliability to suppress drift-induced and template-driven pseudo-similarity, and it couples impact classification with within-group ranking consistency to support high-recall triage followed by prioritization. Explanatory subgraphs further provide auditable evidence for model behavior. Figure 1 presents the overall framework.

4.1. GraphGPT-Patent

Let the set of patents be $V = \{1, \dots, n\}$. For each patent $i$, the text embedding is $z_i \in \mathbb{R}^{d}$, the grant year is $t_i$, and the CPC main class is $c_i$. The semantic similarity is defined as:

$$s_{ij} = \frac{z_i^{\top} z_j}{\|z_i\|_2 \, \|z_j\|_2}. \tag{3}$$

The semantic graph $G = (V, E, X)$ is generated via thresholding or nearest neighbor selection. Taking the threshold scheme as an example:

$$E = \{(i, j) \mid i \neq j,\ s_{ij} \geq \delta\}, \tag{4}$$

where $\delta$ controls the graph density. To maintain the comparability of local structures across different years and domains, degree-constrained nearest neighbor pruning is further adopted. To keep construction scalable, the thresholding step is performed within the ANN-retrieved Top-K candidate neighbor set rather than over all node pairs. Let $N_{\delta}(i) = \{j \mid (i, j) \in E\}$; we retain the $k_n$ neighbors with the highest similarity:

$$N(i) = \mathrm{TopK}(N_{\delta}(i);\ s_{ij},\ k_n). \tag{5}$$

This results in the final edge set $E' = \{(i, j) \mid j \in N(i)\}$, upon which a weighted adjacency matrix $A \in \mathbb{R}^{n \times n}$ is defined:

$$A_{ij} = \begin{cases} w_{ij}, & (i, j) \in E', \\ 0, & \text{otherwise}. \end{cases} \tag{6}$$

The construction of $w_{ij}$ is described in Section 4.2.

For each target patent $u$, its $k$-hop induced subgraph is extracted. Let the $k$-hop node set be:

$$V_u^{(k)} = \{v \in V \mid \mathrm{dist}_G(u, v) \leq k\}, \tag{7}$$

and the induced subgraph be defined as:

$$G_u = (V_u^{(k)}, E_u, X_u), \quad E_u = E' \cap (V_u^{(k)} \times V_u^{(k)}). \tag{8}$$
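The extraction in Eqs. (7) and (8) amounts to a depth-limited breadth-first search followed by edge filtering. The sketch below assumes the graph is stored as an undirected edge list after symmetrization; the function name and toy graph are illustrative.

```python
from collections import deque


def k_hop_subgraph(edges, u, k):
    """Return (node set, induced edge list) for the k-hop neighborhood of u.

    Assumption: `edges` holds undirected pairs (i, j), each listed once.
    """
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    dist = {u: 0}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue  # do not expand beyond k hops
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    nodes = set(dist)
    # Induced edges: both endpoints must lie inside the k-hop node set.
    sub_edges = sorted((i, j) for i, j in edges if i in nodes and j in nodes)
    return nodes, sub_edges


toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 4)]
toy_nodes, toy_sub_edges = k_hop_subgraph(toy_edges, u=0, k=2)
```

Here node 3 lies three hops from the target node 0, so it and its incident edges are excluded from the 2-hop subgraph.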

Traditional message-passing models update node representations through iterative neighborhood aggregation:

$$h_i^{(\ell)} = \mathrm{AGGR}\big(h_i^{(\ell-1)},\ \{h_j^{(\ell-1)} \mid j \in N(i)\}\big), \quad h_i^{(0)} = z_i. \tag{9}$$

GraphGPT-Patent employs GraphGPT as the sole structural backbone, reversibly linearizing the subgraph $G_u$ into a token sequence modeled by a Transformer. Let the linearized sequence be:

$$S_u = (\tau_1, \tau_2, \dots, \tau_L) = \mathrm{Eulerize}(G_u), \tag{10}$$

where each token $\tau_{\ell}$ corresponds to an element in the “node-edge-node” traversal, carrying type and attributes (node attributes, edge weights, similarity, etc.). The serialization is reversible, enabling predictions and explanation masks learned in sequence space to be mapped back to nodes and edges in the original subgraph for inspection. Operating on k-hop-induced subgraphs also bounds sequence length (proportional to local edge count) and keeps Transformer computation tractable while preserving neighborhood evidence. An additive decomposition is applied to the input embedding of each token:

$$e_{\ell} = e^{\mathrm{type}}(\tau_{\ell}) + e^{\mathrm{id}}(\tau_{\ell}) + e^{\mathrm{attr}}(\tau_{\ell}) + e^{\mathrm{pos}}(\ell). \tag{11}$$

Here, $e^{\mathrm{pos}}(\ell)$ is the positional encoding. Using sinusoidal positional encoding, it is represented as:

$$e^{\mathrm{pos}}(\ell)_{2m} = \sin\!\big(\ell / 10000^{2m/D}\big), \quad e^{\mathrm{pos}}(\ell)_{2m+1} = \cos\!\big(\ell / 10000^{2m/D}\big). \tag{12}$$
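To give intuition for the reversible linearization in Eq. (10) and the positional encoding in Eq. (12), the following toy sketch may help. It is not GraphGPT's actual procedure: as a simplifying assumption, every undirected edge is treated as two directed arcs, which guarantees an Eulerian circuit on any connected subgraph, and Hierholzer's algorithm is used to find it. All function names are hypothetical, and tokens here are bare node ids without type or attribute embeddings.

```python
import math


def euler_serialize(edges, start):
    """Node sequence of an Eulerian circuit over the doubled edge set."""
    out = {}
    for i, j in edges:
        out.setdefault(i, []).append(j)   # arc i -> j
        out.setdefault(j, []).append(i)   # arc j -> i
    for nbrs in out.values():
        nbrs.sort(reverse=True)           # pop() then takes the smallest id first
    stack, circuit = [start], []
    while stack:                          # Hierholzer's algorithm, iterative form
        v = stack[-1]
        if out.get(v):
            stack.append(out[v].pop())    # follow an unused arc
        else:
            circuit.append(stack.pop())   # dead end: emit node
    return circuit[::-1]


def recover_edges(sequence):
    """Invert the serialization: consecutive nodes in the walk share an edge."""
    return {tuple(sorted(pair)) for pair in zip(sequence, sequence[1:])}


def sinusoidal_position(pos, D):
    """Positional encoding of Eq. (12) for one position and model width D."""
    vec = []
    for k in range(D):
        angle = pos / 10000 ** (2 * (k // 2) / D)   # m = k // 2
        vec.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return vec


triangle = [(0, 1), (1, 2), (0, 2)]
walk = euler_serialize(triangle, start=0)
```

The walk visits each undirected edge twice, so its length grows linearly with the local edge count, in line with the sequence-length remark above, and the original edge set can be read back from consecutive tokens.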

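Let $E_u = [e_1; \dots; e_L] \in \mathbb{R}^{L \times D}$ be the stacked token-embedding matrix (not to be confused with the edge set $E_u$ in Eq. (8)); the subscript $u$ is dropped below. The attention calculation for the $l$-th layer of the Transformer is:

$$Q = E W_Q, \quad K = E W_K, \quad V = E W_V, \tag{13}$$

$$\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{D_h}}\right) V, \tag{14}$$

where $D_h$ is the per-head dimension. Standard residual connections and layer normalization are used to update the multi-head attention (MHA) and feed-forward networks (FFNs):

$$H^{(l)} = \mathrm{LN}\big(H^{(l-1)} + \mathrm{MHA}(H^{(l-1)})\big), \tag{15}$$

$$H^{(l)} = \mathrm{LN}\big(H^{(l)} + \mathrm{FFN}(H^{(l)})\big), \tag{16}$$

where $H^{(l)}$ on the right-hand side of Eq. (16) denotes the output of Eq. (15).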
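Eqs. (13) to (15) can be traced numerically with a single-head sketch in plain Python. This is a simplification: one head (so $D_h = D$), identity projection matrices in the example, layer normalization without learnable gain and bias, and no feed-forward sublayer. The helper names are illustrative.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row):
    m = max(row)                                  # subtract max for stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(E, W_q, W_k, W_v):
    """Single-head scaled dot-product attention, Eqs. (13)-(14)."""
    Q, K, V = matmul(E, W_q), matmul(E, W_k), matmul(E, W_v)
    d_h = len(Q[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)
    weights = [softmax([s / math.sqrt(d_h) for s in row]) for row in scores]
    return matmul(weights, V), weights


def layer_norm(row, eps=1e-5):
    mu = sum(row) / len(row)
    var = sum((x - mu) ** 2 for x in row) / len(row)
    return [(x - mu) / math.sqrt(var + eps) for x in row]


def residual_layer_norm(H, sublayer_out):
    """Post-norm residual update, as in Eq. (15)."""
    return [layer_norm([h + s for h, s in zip(h_row, s_row)])
            for h_row, s_row in zip(H, sublayer_out)]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # three toy token embeddings
identity = [[1.0, 0.0], [0.0, 1.0]]
attn_out, attn_weights = attention(tokens, identity, identity, identity)
hidden = residual_layer_norm(tokens, attn_out)
```

Each row of `attn_weights` is a distribution over the tokens of the serialized subgraph, which is what later allows attention or mask scores to be mapped back to nodes and edges.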

To obtain the contextualized representation of the target patent $u$, sequence pooling is employed. Let $I(u)$ be the set of token positions corresponding to node $u$ in the sequence; then:

$$H_u = \mathrm{Pool}\big(\{H_{\ell}^{(L)} \mid \ell \in I(u)\}\big), \quad \mathrm{Pool}(\cdot) \in \{\text{mean}, \text{attn-pool}\}. \tag{17}$$

For each prediction window $d$, the shared representation $H_u$ is passed through a window-specific classification head to output the high-impact probability:

$$p_u^{(d)} = \sigma\big(w_d^{\top} H_u + b_d\big). \tag{18}$$
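As a minimal sketch of Eqs. (17) and (18), assuming mean pooling and hand-picked head parameters (all numbers below are made up for illustration):

```python
import math


def mean_pool(H, positions):
    """Average the final-layer states at the token positions of node u, Eq. (17)."""
    rows = [H[p] for p in positions]
    return [sum(col) / len(rows) for col in zip(*rows)]


def impact_probability(h_u, w_d, b_d):
    """Window-specific logistic head, Eq. (18)."""
    logit = sum(w * h for w, h in zip(w_d, h_u)) + b_d
    return 1.0 / (1.0 + math.exp(-logit))


final_states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # toy H^{(L)}, one row per token
h_u = mean_pool(final_states, positions=[0, 2])        # node u appears at tokens 0 and 2

# One (w_d, b_d) pair per window; the shared representation h_u is reused.
heads = {3: ([0.5, -0.25], -0.5), 5: ([0.5, -0.25], 0.0), 10: ([0.5, -0.25], 0.5)}
probs = {d: impact_probability(h_u, w, b) for d, (w, b) in heads.items()}
```

Because a node can occur several times in an Eulerian-style serialization, pooling over all of its positions yields a single representation per target patent.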

4.2. Time- and Domain-Conditioned Edge Reliability Modeling

Noise in semantic similarity edges primarily stems from cross-temporal semantic drift and domain-dependent boilerplate/template expressions. To reduce the interference of pseudo-similar connections on information propagation, GraphGPT-Patent introduces time- and domain-conditioned reliability weights at the edge level. Let $\Delta t_{ij} = |t_i - t_j|$. We define the temporal decay term as:

$$g_t(i, j) = \exp\!\left(-\frac{\Delta t_{ij}}{\tau}\right), \tag{19}$$

where $\tau > 0$ controls the intensity of cross-temporal suppression. For domain compatibility, we introduce the CPC distance $d_{\mathrm{cpc}}(c_i, c_j)$. At the main class level, this can be taken as:

$$d_{\mathrm{cpc}}(c_i, c_j) = \mathbb{I}[c_i \neq c_j], \tag{20}$$

and the domain decay term is defined as:

$$g_c(i, j) = \exp\!\left(-\frac{d_{\mathrm{cpc}}(c_i, c_j)}{\kappa}\right), \tag{21}$$

where $\kappa > 0$ controls the intensity of cross-domain suppression. The final edge weight is defined as:

$$w_{ij} = s_{ij} \cdot g_t(i, j) \cdot g_c(i, j). \tag{22}$$

This weight is utilized in two places: first, in the construction of the weighted adjacency matrix $A$; second, it is written as an edge attribute into $e^{\mathrm{attr}}(\tau_{\ell})$ of the token, allowing GraphGPT to explicitly perceive edge reliability differences in the sequence space. For numerical stability, the outgoing edge weights of each node can be normalized:

$$\tilde{w}_{ij} = \frac{w_{ij}}{\sum_{j' \in N(i)} w_{ij'} + \epsilon}, \tag{23}$$

and $\tilde{w}_{ij}$ is used instead of $w_{ij}$ for the token attribute. Figure 2 summarizes the reliability-weight computation, subgraph serialization, and joint optimization flow described above and in Section 4.3.
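The reliability weights of Eqs. (19) to (23) reduce to a few lines of arithmetic. In the sketch below, the values of `tau` and `kappa`, the function names, and the toy neighbors are assumptions for illustration; this section does not report the hyperparameter values used.

```python
import math


def edge_weight(s_ij, t_i, t_j, c_i, c_j, tau=5.0, kappa=1.0):
    """w_ij = s_ij * g_t * g_c, following Eqs. (19)-(22)."""
    g_t = math.exp(-abs(t_i - t_j) / tau)       # temporal decay, Eq. (19)
    d_cpc = 1.0 if c_i != c_j else 0.0          # main-class indicator, Eq. (20)
    g_c = math.exp(-d_cpc / kappa)              # domain decay, Eq. (21)
    return s_ij * g_t * g_c


def normalize_out_weights(weights, eps=1e-8):
    """Per-node normalization of outgoing weights, Eq. (23)."""
    total = sum(weights.values()) + eps
    return {j: w / total for j, w in weights.items()}


# Three hypothetical neighbors of a G06 patent granted in 2010, all with s_ij = 0.8.
raw = {
    "same_year_same_class": edge_weight(0.8, 2010, 2010, "G06", "G06"),
    "five_years_apart": edge_weight(0.8, 2010, 2015, "G06", "G06"),
    "cross_class": edge_weight(0.8, 2010, 2010, "G06", "H04"),
}
normalized = normalize_out_weights(raw)
```

With `tau = 5` and `kappa = 1`, a five-year gap and a main-class mismatch each scale the raw similarity by $e^{-1} \approx 0.37$, so equally similar neighbors end up with clearly different reliability.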

4.3. Joint Objective of Impact Ranking Consistency

In practical document mining pipelines, a classifier is often used for high-recall triage of rare positives, followed by ranking to prioritize candidates for downstream inspection under limited review capacity. Related work also studies prediction under partial/implicit feedback in online settings, reflecting limited-feedback constraints in real-world retrieval and screening pipelines [27]. Motivated by this workflow, we couple Top–Bottom classification with within-group ranking consistency. Top–Bottom binary classification is conducive to controlling class imbalance and emphasizing extreme impact differences, but it discards ordinal information in citation intensity and can lead to instability on boundary samples. To unify binary identification with intra-group prioritization, a joint objective of classification and ranking consistency is adopted. For the binary label $y_u^{(d)} \in \{0, 1\}$ of window $d$, the classification loss is:

$$L_{\mathrm{cls}}^{(d)} = -\sum_{u \in D} \Big( y_u^{(d)} \log p_u^{(d)} + (1 - y_u^{(d)}) \log (1 - p_u^{(d)}) \Big). \tag{24}$$

The ranking loss is constructed within comparable sets. Let the group identifier be $g(u) = (c_u, t_u)$. Within the same group, we sample a set of pairs $\Omega_d$ satisfying $C_i^{(d)} > C_j^{(d)}$. Let $r_u^{(d)} = w_d^{\top} H_u + b_d$ be the logit score; the logistic ranking loss is then adopted:

$$L_{\mathrm{rank}}^{(d)} = \sum_{(i, j) \in \Omega_d} \log\!\Big(1 + \exp\big(-(r_i^{(d)} - r_j^{(d)})\big)\Big). \tag{25}$$

The final joint training objective for multiple windows is:

$$L = \sum_{d \in \{3, 5, 10\}} \Big( L_{\mathrm{cls}}^{(d)} + \lambda L_{\mathrm{rank}}^{(d)} \Big) + \beta \|\Theta\|_2^2, \tag{26}$$

where $\Theta$ is the set of model parameters, $\lambda$ controls the weight of ranking consistency, and $\beta$ is the weight decay coefficient. This design supports both binary triage and stable ranking within comparable sets of the same domain and era, which is useful under delayed and noisy supervision. Accordingly, this objective is evaluated not only with classification metrics but also with within-group ranking quality (AUC and P@50), which directly reflects shortlist prioritization utility in grant-time screening.
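The two loss terms of Eqs. (24) and (25) can be sketched for a single window as follows. For brevity, the sketch enumerates all within-group pairs instead of sampling them and omits the weight-decay term; the logits, groups, citation counts, and `lam` value are made-up illustrations.

```python
import math


def sigmoid(r):
    return 1.0 / (1.0 + math.exp(-r))


def classification_loss(logits, labels):
    """Binary cross-entropy summed over labeled nodes, Eq. (24)."""
    return -sum(y * math.log(sigmoid(r)) + (1 - y) * math.log(1 - sigmoid(r))
                for r, y in zip(logits, labels))


def within_group_pairs(groups, counts):
    """All ordered pairs (i, j) in the same (class, year) group with C_i > C_j."""
    n = len(groups)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and groups[i] == groups[j] and counts[i] > counts[j]]


def ranking_loss(logits, pairs):
    """Pairwise logistic ranking loss, Eq. (25)."""
    return sum(math.log1p(math.exp(-(logits[i] - logits[j]))) for i, j in pairs)


logits = [2.0, 0.0, -1.0, 1.0]
groups = [("G06", 2010), ("G06", 2010), ("G06", 2010), ("A61", 2010)]
counts = [30, 5, 0, 12]
labels = [1, 0, 0, 1]

pairs = within_group_pairs(groups, counts)
lam = 0.5  # hypothetical ranking weight
joint = classification_loss(logits, labels) + lam * ranking_loss(logits, pairs)
```

The A61 patent forms no pairs because it has no group peers, which illustrates why ranking consistency is only enforced among patents of the same domain and grant year.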

4.4. Explanatory Subgraphs and Structural Diagnostics

To map prediction evidence to verifiable structural mechanisms, this paper constructs an explanatory subgraph $G_u^{\star}$ for each target node $u$. A learnable edge mask $m \in [0, 1]^{|E_u|}$ is introduced on the subgraph $G_u$. The masked subgraph $G_u(m)$ is optimized to maximize the preservation of the original prediction while imposing a sparsity constraint. Taking window $d$ as an example, the explanation objective can be written as:

$$\max_{m \in [0, 1]^{|E_u|}} \ \log p_u^{(d)}\big(G_u(m)\big) - \alpha \|m\|_1, \tag{27}$$

where $\alpha$ controls the sparsity of the explanatory subgraph. The solved $m$ induces the set of important edges $E_u^{\star} = \{e \in E_u \mid m_e \geq \eta\}$, resulting in the explanatory subgraph $G_u^{\star} = (V_u^{\star}, E_u^{\star})$.

At the level of diagnostic analysis, this paper calculates three structural indicators. Let $n_u = |V_u^{\star}|$ and $m_u = |E_u^{\star}|$; the density of the explanatory subgraph is:

$$\mathrm{Density}(G_u^{\star}) = \frac{2 m_u}{n_u (n_u - 1)}. \tag{28}$$

The average degree is defined as:

$$\mathrm{AvgDegree}(G_u^{\star}) = \frac{2 m_u}{n_u}. \tag{29}$$

The clustering coefficient adopts the average form of local clustering. Let the degree of node $v$ in $G_u^{\star}$ be $k_v$, and its triangle count be $T_v$. Then:

$$\mathrm{Clust}(v) = \frac{2 T_v}{k_v (k_v - 1) + \epsilon}, \quad \mathrm{ClusteringCoeff}(G_u^{\star}) = \frac{1}{n_u} \sum_{v \in V_u^{\star}} \mathrm{Clust}(v). \tag{30}$$

These indicators characterize the cohesion and closure patterns of explanatory evidence and can be used to compare structural differences between high-impact and low-impact samples across domains and time. They also provide a compact diagnostic lens for auditing whether model explanations rely on plausible local neighborhoods.
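Given a solved edge mask, the thresholding step and the three indicators of Eqs. (28) to (30) can be computed as in the sketch below. It assumes that $V_u^{\star}$ consists of the endpoints of the retained edges and that edges are unique undirected pairs; the mask values and threshold are invented for the example.

```python
def explanation_diagnostics(edges, mask, eta=0.5, eps=1e-8):
    """Threshold the mask at eta, then compute density, average degree, clustering."""
    kept = [e for e, m_e in zip(edges, mask) if m_e >= eta]   # E_u^*
    adj = {}
    for i, j in kept:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    n, m = len(adj), len(kept)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0          # Eq. (28)
    avg_degree = 2 * m / n if n else 0.0                       # Eq. (29)
    clust_sum = 0.0
    for v, nbrs in adj.items():
        k_v = len(nbrs)
        # T_v: number of neighbor pairs of v that are themselves connected.
        t_v = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        clust_sum += 2 * t_v / (k_v * (k_v - 1) + eps)         # Eq. (30), local term
    clustering = clust_sum / n if n else 0.0
    return {"kept_edges": kept, "density": density,
            "avg_degree": avg_degree, "clustering": clustering}


toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
toy_mask = [0.9, 0.8, 0.7, 0.6, 0.1]
diag = explanation_diagnostics(toy_edges, toy_mask, eta=0.5)
```

In this toy case the retained evidence is a triangle with one pendant edge, giving a density of 2/3, an average degree of 2, and an average clustering coefficient of about 0.58.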

GraphGPT-Patent operates on semantic similarity graphs and integrates (i) time- and domain-conditioned edge reliability modeling to suppress drift and template-driven pseudo-similarity, (ii) GraphGPT subgraph serialization to learn transferable structural patterns, (iii) a joint objective coupling classification and within-group ranking consistency, and (iv) subgraph-based explanations with structural diagnostics for auditability. These components constitute a unified pipeline spanning training, inference, and interpretation.

5. Experiments and Results

5.1. Metrics and Baselines

We evaluate a grant-time impact prediction pipeline in which the positive class is rare and labels are delayed. Accordingly, we adopt positive-class recall as the primary metric to reflect high-recall retrieval/triage of high-impact nodes under class imbalance, and we also report accuracy and positive-class F1-score to characterize the precision–recall trade-off across models. We additionally report positive-class precision ($\mathrm{TP} / (\mathrm{TP} + \mathrm{FP})$) in the main-text macro summary and in an appendix-level nine-setting matrix, so that high-recall behavior can be interpreted with explicit false-positive control.

Baselines are grouped into (i) text-only classifiers using Doc2Vec or PatentBERT representations followed by an MLP, and (ii) graph neural networks performing structured learning on semantic similarity graphs, including GCN, GraphSAGE, and GTN. Three CPC domains (A61, H04, G06) across three evaluation horizons (3y, 5y, 10y*) constitute nine settings. The annual distribution of data is shown in Figure 3; non-uniform density across domains and years provides context for temporal shift analysis and cross-domain explanation diagnostics.

We additionally report macro-average performance and standard deviation across the nine settings to summarize robustness across domains and horizons. All four classification metrics are computed from the same confusion matrices under identical temporal splits to ensure metric consistency. Because the three domains have different sample sizes, we primarily interpret macro-average metrics (equal domain-window weighting) to prevent large domains from dominating the summary. Domain-wise tables are therefore used jointly with macro statistics: macro values describe cross-domain robustness, while per-domain cells expose where imbalance and horizon difficulty concentrate.

In addition, we report our proposed method, GraphGPT-Patent, which uses GraphGPT as the sole graph foundation backbone with time- and domain-conditioned edge reliability modeling and the joint objective of classification and ranking consistency (Section 4). For a fair comparison, GraphGPT-Patent uses the same semantic graphs (same encoder choice, threshold/TopK, and k-hop subgraph extraction) as the strongest graph baselines, differing only in the backbone and the proposed adaptation modules.
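The four classification metrics and the macro summary described in this subsection can be derived from confusion-matrix counts as sketched below. The counts and recall values are hypothetical and do not come from the reported results; the use of the sample standard deviation is also an assumption, since the text does not say which estimator is used.

```python
import statistics


def classification_metrics(tp, fp, fn, tn):
    """Positive-class precision, recall, F1, and overall accuracy."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def macro_summary(values):
    """Unweighted mean and sample standard deviation across settings."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical confusion matrix for one domain-horizon setting.
example = classification_metrics(tp=90, fp=30, fn=10, tn=70)

# Hypothetical recall values for three settings, weighted equally.
mean_recall, std_recall = macro_summary([0.80, 0.90, 1.00])
```

Equal weighting across settings means a large domain contributes no more to the macro average than a small one, which is the stated reason for preferring it here.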
Unless otherwise stated, GraphGPT-Patent uses PatentBERT embeddings to construct the semantic graph and node attributes; Doc2Vec is used only for baseline comparisons.

Computational Efficiency and Resource Footprint. All experiments are conducted on a single server with 1 × NVIDIA A100 40 GB GPU, 32 vCPU, and 256 GB RAM. Shared one-off preprocessing for each corpus snapshot includes PatentBERT embedding extraction (2.9 GPU-hours) and ANN index construction with Top-K retrieval plus threshold/symmetrization (1.4 CPU-hours). Table 3 reports model-level training/inference cost. Cloud cost is estimated using an on-demand A100 rate of $2.7 per GPU-hour.
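
As a minimal sketch of how the four classification metrics and the macro summary described in this subsection relate to a confusion matrix, the Python snippet below computes positive-class precision, recall, F1, and accuracy, then averages them with equal weight per setting. The function names, the example counts, and the use of the population standard deviation are illustrative assumptions, not details taken from the experiments.

```python
from statistics import mean, pstdev


def classification_metrics(tp, fp, fn, tn):
    """Positive-class precision/recall/F1 and accuracy from one confusion matrix."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def macro_summary(per_setting):
    """Equal-weight mean and standard deviation of each metric across settings."""
    summary = {}
    for key in per_setting[0]:
        values = [metrics[key] for metrics in per_setting]
        # Assumption: population standard deviation; the paper does not state which variant it uses.
        summary[key] = (mean(values), pstdev(values))
    return summary


# Hypothetical confusion counts for two settings (not values from the paper).
settings = [
    classification_metrics(tp=80, fp=20, fn=20, tn=80),
    classification_metrics(tp=90, fp=60, fn=10, tn=40),
]
summary = macro_summary(settings)
print(summary["recall"])
```

Because every setting contributes one value per metric, a large domain cannot dominate the summary, which is the equal domain-window weighting described above.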

5.2. Main Results: System Performance Across Nine Settings

Table 4 summarizes recall under the nine settings, and Figure 4 presents the same result matrix as a heatmap, which makes the interaction among domain, horizon, and model family easier to see. In most settings, introducing the semantic neighborhood structure yields higher or more stable recall, which is particularly evident in the 5y and 10y* windows for H04 and G06. This pattern is consistent with the lag in patent citations: when the window is short (3y), many high-impact patents have not yet accumulated enough observable citations, so label noise is relatively higher and the learnable signal is weaker. Consequently, the 3y column in Figure 4 generally exhibits lower recall or greater fluctuation between models.

By domain, multiple models reach recall above 0.90 for A61 (medical/device-heavy) at the long horizon (10y*), indicating that semantic neighborhood structure becomes more predictive as delayed labels mature. For H04 (communications) and G06 (computing) in the 5y and 10y* horizons, graph models such as GraphSAGE/GTN outperform text-only baselines in recall, suggesting that relational structure in the similarity graph provides useful inductive bias for impact prediction. These trends are also visible in Figure 4: performance differences between 3y and 10y* are substantial for many models, reflecting the systemic influence of evaluation horizon under delayed supervision.

It must be emphasized that the changes in accuracy (Table 5) and positive class F1 (Table 6) are not always synchronized with recall. The F1 heatmap in Figure 5 further illustrates this difference: while some models achieve higher recall, their F1 does not increase proportionally, reflecting different operating points between positive-class capture and false positive control. In high-recall triage scenarios, additional calibration or ranking constraints are often required to stabilize boundary cases and reduce pseudo-similarity propagation, which directly motivates the edge reliability and joint classification–ranking design in GraphGPT-Patent.

Precision–Recall Trade-off. GraphGPT-Patent attains macro precision of 0.798 ± 0.154 while preserving macro recall of 0.867 ± 0.084 and macro F1 of 0.817 ± 0.080 (Table 7). The nine-setting precision profile is heterogeneous: A61 remains high (0.92/0.99/0.90), whereas the harder long-horizon settings in H04 and G06 are lower (H04: 0.67/0.65; G06: 0.64/0.56), indicating that the default operating point prioritizes broad candidate capture under delayed supervision. Because F1 is the harmonic mean of precision and recall, elevated recall does not inflate F1 when false positives increase; this rules out degenerate all-positive behavior. A validation-set threshold recalibration toward an F1-optimal operating point increases macro precision from 0.80 to 0.84 while recall decreases from 0.87 to 0.82, confirming that the observed precision level is primarily an operating-point choice.

Mechanism of Precision Variation. Lower precision is concentrated in H04/G06 long-horizon settings, where delayed citation maturation and stronger template-level lexical overlap produce more boundary samples in semantic neighborhoods. In this regime, the deployment objective prioritizes minimizing false negatives in shortlist generation, so the default threshold is intentionally recall-oriented.
Precision should therefore be interpreted jointly with module-level denoising and calibration evidence rather than as an isolated scalar. Under alternative quantile definitions (Top/Bottom 5%, 10%, and 15%), absolute values shift with class-separation difficulty, but the relative advantage and recall-oriented operating behavior of GraphGPT-Patent remain stable (Appendix A Table A5).
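
The threshold recalibration mentioned above can be illustrated with a small sketch: scan candidate thresholds on a validation set and keep the one with the highest F1. This is only one plausible way to pick an F1-optimal operating point; the function names, the tie-breaking rule, and the toy scores are assumptions rather than the procedure used in the experiments.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 of the positive class when predicting 1 for scores >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(scores, labels):
    """Return the observed score that maximizes validation F1.

    Assumption: ties are broken in favor of the higher (more conservative) threshold.
    """
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: (f1_at_threshold(scores, labels, t), t))


# Toy validation scores and labels (hypothetical, for illustration only).
val_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
val_labels = [1, 1, 0, 1, 0, 0]
threshold = best_f1_threshold(val_scores, val_labels)
print(threshold, round(f1_at_threshold(val_scores, val_labels, threshold), 3))
```

Raising the threshold above the selected value trades recall for precision, which mirrors the movement from a recall-oriented default toward a more conservative operating point.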

To complement the heatmap view, Table 5 and Table 6 report the exact accuracy and positive-class F1 values, respectively, making recall–precision operating-point differences numerically explicit.

To summarize stability and variance across settings, Table 7 provides the macro-average and standard deviation of precision, recall, accuracy, and F1 across the nine settings, serving as a numerical summary of the overall variation in Figure 4 and Figure 5. The results show that structural models possess advantages in the mean and stability of recall; for instance, the average recall of Doc2Vec-GCN and Doc2Vec-GSAGE falls within the 0.828–0.829 range with a low standard deviation. In contrast, accuracy and F1 depend more on the text encoder choice and domain-specific difficulty, indicating that semantic graph learning in this task primarily improves high-recall capture while refined false positive control may require stricter thresholding, ranking constraints, or post hoc calibration. The added precision column shows that GraphGPT-Patent maintains broad recall coverage without F1 inflation because precision remains controlled rather than collapsing in difficult windows. This observation motivates the adaptation design of GraphGPT-Patent: while maintaining structural advantages, it aims to suppress boundary instability and pseudo-similarity propagation through refined edge reliability modeling and ranking consistency.

Practical magnitude can be read in absolute counts. In the 2016 three-domain test slice (5502 patents, including 551 Top 10% positives), improving macro recall from 0.829 (best non-ours structural baseline) to 0.867 corresponds to approximately 21 additional high-impact patents surfaced at grant time. Relative to PatentBERT-MLP (recall 0.779), the gain is approximately 48 additional surfaced positives under the same leakage-free protocol.
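
The conversion from recall differences to absolute counts is simple arithmetic, sketched here for transparency. The helper name is hypothetical, and, as in the text, the macro recall difference is applied to the pooled number of positives as an approximation.

```python
def additional_positives(n_positives, recall_new, recall_base):
    """Approximate number of extra positives surfaced by a recall improvement."""
    return round(n_positives * (recall_new - recall_base))


N_POSITIVES_2016 = 551  # Top 10% positives in the 2016 three-domain test slice

vs_best_baseline = additional_positives(N_POSITIVES_2016, 0.867, 0.829)
vs_patentbert_mlp = additional_positives(N_POSITIVES_2016, 0.867, 0.779)
print(vs_best_baseline, vs_patentbert_mlp)  # 21 48
```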

Ablation of Edge Reliability and Joint Ranking

Table 8 decomposes the contribution of temporal reliability, domain reliability, and ranking-aware optimization under the same nine-setting protocol. Reliability components improve precision/recall/F1 jointly instead of merely shifting thresholds, and the full reliability formulation yields the strongest classification-only operating point.

The ranking term mainly contributes to ordering quality: compared with the full reliability + cls-only variant, the final model improves within-group AUC from 0.892 to 0.904 and P@50 from 0.634 to 0.652, while recall/F1 increase from 0.864/0.812 to 0.867/0.817. This confirms that the joint objective improves prioritization fidelity rather than only shifting a classification threshold. A validation-based threshold recalibration on the final model still yields the conservative point (precision 0.842, recall 0.823, F1 0.833), indicating that precision differences are controllable operating-point choices rather than artifacts of indiscriminate positive prediction. Sensitivity checks over graph-construction hyperparameters (k-hop depth, ANN Top-K, and similarity threshold) show stable behavior around the default configuration, with only moderate precision–recall movement and preserved ranking quality (Appendix A Table A8).
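
For readers less familiar with the two ranking metrics, the following sketch shows one common way to compute precision at k and a group-wise AUC. It assumes that "within-group" means scores are only compared inside a comparable group (for example, the same domain and grant year) and that per-group AUC values are averaged with equal weight; the exact grouping and averaging used in the paper may differ, and all names here are illustrative.

```python
from collections import defaultdict


def precision_at_k(scores, labels, k):
    """Share of positives among the k highest-scored items."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)[:k]
    return sum(label for _, label in ranked) / len(ranked)


def auc(scores, labels):
    """Pairwise AUC: probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # undefined when a group has only one class
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def within_group_auc(scores, labels, groups):
    """Average AUC over groups, skipping groups where AUC is undefined."""
    by_group = defaultdict(list)
    for s, y, g in zip(scores, labels, groups):
        by_group[g].append((s, y))
    values = []
    for items in by_group.values():
        value = auc([s for s, _ in items], [y for _, y in items])
        if value is not None:
            values.append(value)
    return sum(values) / len(values)


# Toy example with two hypothetical groups.
scores = [0.9, 0.2, 0.6, 0.7]
labels = [1, 0, 1, 0]
groups = ["A61-2016", "A61-2016", "H04-2016", "H04-2016"]
print(within_group_auc(scores, labels, groups), precision_at_k(scores, labels, 2))
```

Restricting comparisons to a group keeps the metric from rewarding a model for merely separating domains or years rather than ordering comparable candidates.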

5.3. Temporal Drift: Systematic Impact of Training Windows on Performance

Semantic similarity spaces and technical language change over time; thus, the temporal distribution of training data becomes a critical factor affecting performance. Table 9 compares the impact of different historical training windows (2000–2004, 2005–2009, 2010–2014) on performance in the A61 domain (medical/device-heavy), with the test year fixed at 2016. The trend is easiest to see in Figure 6: for both the Doc2Vec and PatentBERT families, the closer the training window is to the test year, the higher the F1-score, with improvements reaching 0.08–0.10 for multiple models. Looking at the four metrics together shows that the improvement is not driven by a single metric: in several models, recall and accuracy improve simultaneously. This indicates that fresher training samples reduce representational shifts caused by semantic drift and help the model learn separable structures closer to current writing styles. GraphGPT-Patent exhibits a smaller freshness gain (e.g., ΔF1 = 0.06 from 2000–2004 to 2010–2014), suggesting that time-aware edge weighting improves robustness to temporal drift while maintaining strong overall performance.

This set of results illustrates that the semantic similarity graph is not a static, identically distributed object; temporal drift systematically alters the graph structure and class boundaries, thereby changing the optimal decision boundary of the model in the test year. For GraphGPT-Patent, this directly supports two adaptation directions: first, explicitly incorporating time differences into edge weighting or connectivity strategies at the graph construction level to control the interference of cross-era pseudo-similar edges on representation learning; second, introducing intra-group ranking consistency at the training objective level, enabling the model to learn more stable relative ordinal relationships within comparable sets of the same era and domain, thereby reducing the impact of inter-temporal drift on boundary samples.

Operationally, these results motivate rolling retraining and drift monitoring for deployment. A future-year check further indicates gradual distribution shift beyond the 2016 test slice: under the 3-year horizon, macro precision/recall/F1 decrease from 0.923/0.767/0.837 (2016) to 0.889/0.712/0.791 (2019), while ranking quality remains usable for shortlist generation (Appendix A Table A6). This pattern is consistent with mild OOD drift rather than abrupt failure and supports periodic index refresh and threshold recalibration when moving to later grant years.

A component-level attribution further separates effects across the pipeline: PatentBERT-MLP (0.861/0.779/0.799 for precision/recall/F1) reflects encoder-only behavior; replacing the classifier with GraphGPT without reliability shifts to 0.781/0.848/0.803, indicating stronger structural recall capture from the graph foundation backbone; adding full reliability yields 0.793/0.864/0.812, showing denoising gains under temporal/domain shift; and adding ranking consistency reaches 0.798/0.867/0.817.
For deployment updates, a practical cycle is as follows: ingest newly granted patents and re-embed documents, refresh ANN neighborhoods and edge weights, then perform rolling retraining plus threshold calibration on the latest leakage-free split.
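
The split logic behind such a rolling cycle can be sketched as follows. The record field `grant_year`, the function names, and the default five-year window with a two-year gap (chosen to match the 2010–2014 training window and 2016 test year reported above) are assumptions for illustration; the sketch covers only the temporal split, not re-embedding, ANN refresh, or retraining.

```python
def temporal_split(records, train_start, train_end, test_year):
    """Split records by grant year; the training window must end before the test year."""
    if train_end >= test_year:
        raise ValueError("training window must end before the test year")
    train = [r for r in records if train_start <= r["grant_year"] <= train_end]
    test = [r for r in records if r["grant_year"] == test_year]
    return train, test


def rolling_windows(first_test_year, last_test_year, window=5, gap=2):
    """Yield (train_start, train_end, test_year) for successive test years."""
    for test_year in range(first_test_year, last_test_year + 1):
        train_end = test_year - gap
        yield train_end - window + 1, train_end, test_year


# Toy corpus: one hypothetical record per grant year.
records = [{"id": i, "grant_year": year} for i, year in enumerate(range(2005, 2020))]
windows = list(rolling_windows(2016, 2019))
train, test = temporal_split(records, *windows[0])
print(windows[0], len(train), len(test))
```

Each later test year shifts the training window forward by one year, so thresholds and edge weights would be recalibrated on data closest to the deployment period.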

Table 9 quantifies the trend shown in Figure 6 by reporting exact Acc/Pr/Re/F1 values under different historical training windows, allowing the effect of data freshness to be read numerically rather than only visually.

5.4. Explainability Diagnostics: Structural Differences in Explanatory Subgraphs

In this subsection, explanatory subgraphs and the reported structural indicators are generated from the trained GraphGPT-Patent model using the same explanation objective in Section 4.4. Explanatory subgraphs provide a case-auditable evidence interface: analysts can inspect the local similarity neighborhood supporting a prediction, enabling model debugging and error analysis beyond aggregate metrics. We further compare structural differences in explanatory subgraphs between high-impact and low-impact samples. Table 10 summarizes the average density, average degree, and clustering coefficient of explanatory subgraphs under the three CPC domains. Figure 7 displays the direction and magnitude of the difference between “High” and “Low” for these three metrics using a bar chart comparison. Figure 8 projects density and clustering coefficients into a 2D space to visualize the separability of labels by structural indicators.
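
The three structural indicators can be computed from an explanatory subgraph as in the sketch below, which treats the subgraph as an undirected simple graph. The function name is hypothetical, and the clustering coefficient is taken to be the average of local clustering coefficients (nodes with fewer than two neighbors contribute zero); the paper may use a different variant, such as global transitivity.

```python
from itertools import combinations


def structural_indicators(nodes, edges):
    """Return (density, average degree, average local clustering) of an undirected graph."""
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adjacency[u].add(v)
            adjacency[v].add(u)
    n = len(adjacency)
    m = sum(len(neighbors) for neighbors in adjacency.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    avg_degree = 2 * m / n if n else 0.0
    local = []
    for neighbors in adjacency.values():
        k = len(neighbors)
        if k < 2:
            local.append(0.0)
            continue
        # Count edges among the neighbors of this node (closed triangles).
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
        local.append(2 * links / (k * (k - 1)))
    clustering = sum(local) / n if n else 0.0
    return density, avg_degree, clustering


# Toy subgraph: a triangle (1, 2, 3) with one pendant node (4).
density, avg_degree, clustering = structural_indicators(
    nodes=[1, 2, 3, 4], edges=[(1, 2), (2, 3), (1, 3), (3, 4)]
)
print(round(density, 3), avg_degree, round(clustering, 3))
```

Higher clustering with similar density, as in the toy triangle, corresponds to the "stronger closure" pattern discussed for high-impact explanations.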

From Table 10 and Figure 7, domain-dependent patterns in structural differences can be observed. In H04 and G06, the average degree and clustering coefficient of high-impact samples are higher (e.g., the clustering coefficient of H04 increases from 0.331 to 0.460, and G06 from 0.284 to 0.431), suggesting that the model’s evidence for high-impact predictions often concentrates on denser local neighborhoods with stronger closure. In contrast, the pattern for A61 differs in direction for density and average degree, indicating that the structural cues exploited by explanations are domain-dependent and that high-impact evidence may not always correspond to broader local connectivity. Overall, these diagnostics highlight that explanation structure can reveal how semantic-graph evidence varies across domains under delayed supervision.

Figure 8 further emphasizes that these differences are not purely noise: in a low-dimensional structure space constructed using density and clustering coefficients, the high- and low-impact points of H04 and G06 are more separable, whereas the separation direction for A61 differs. This suggests that, for some domains, simple structural indicators capture a meaningful portion of the evidence used by the model, while other domains may require additional features or alternative relational signals.

At the individual case level, Figure 9 visualizes a typical evidence subgraph: the model’s prediction for a target patent is supported by a local structure composed of several semantically proximal patents, and these neighboring nodes form a traceable cluster in terms of semantic themes. This provides an operable explanation interface for auditing model behavior: practitioners can see the local neighborhood and adjacent paths on which the model relies and inspect whether evidence is plausible or dominated by boilerplate-driven similarity.

A concrete case interpretation illustrates how this evidence view is used. In the A61 slice, the target node corresponds to a minimally invasive surgical-robotics patent whose explanatory neighbors concentrate on force-feedback control, catheter navigation, and real-time imaging guidance. These neighboring nodes are not random lexical matches: they form a coherent technical neighborhood around shared actuator-control and sensing modules, and several edges connect through closure motifs rather than single isolated links. The model therefore relies on a multi-hop composition of semantically aligned procedural innovations, which is consistent with downstream citation diffusion in this subfield. This case-level trace supports the claim that the explanation mechanism can surface technically meaningful neighborhoods for audit, not only high-probability scores. Failure modes remain possible. When template-heavy claims induce pseudo-similar text edges, or when embeddings under-represent emerging terminology, explanatory masks may over-emphasize noisy connectors and under-emphasize sparse but semantically critical nodes. For this reason, explanation subgraphs should be interpreted together with full-text/claim inspection and domain-expert review, especially for boundary cases near the decision threshold.

6. Discussion, Conclusions, and Future Directions

6.1. Discussion

This paper studies leakage-free grant-time impact prediction on large-scale semantic similarity document graphs. Experimental results show that learning with semantic neighborhoods improves or stabilizes positive-class recall across many domain/horizon settings, and that short horizons remain systematically harder due to delayed supervision. Temporal shift experiments further demonstrate that the chronological distribution of training data significantly affects performance, indicating that semantic graphs undergo drift as language and topics evolve. This motivates explicit drift monitoring and rolling retraining in deployment [28]. A gain decomposition perspective further clarifies where improvements come from: PatentBERT mainly contributes semantic encoding quality, the GraphGPT backbone contributes stronger structural modeling through reversible graph serialization, and the reliability/ranking modules target patent-specific noise and triage-oriented prioritization. The method therefore functions as a task-adapted leakage-free workflow rather than a capacity-only model substitution.

From an explainability perspective, subgraph-based evidence reveals heterogeneous structural patterns across CPC domains. In H04 and G06, high-impact explanations tend to exhibit stronger local closure and richer neighborhood connectivity, whereas A61 shows different directions of structural differences. Such domain-dependent behavior is consistent with distribution shift in semantic graphs and highlights the need to interpret scores together with auditable evidence rather than relying on a single scalar prediction [29].

Several limitations and boundary conditions should be noted. First, forward citations are a proxy label with institutional noise; results should be interpreted as modeling citation-driven impact rather than an exhaustive notion of value. Second, semantic similarity graphs depend on the embedding encoder and similarity thresholds; edge meaning and sparsity regimes can affect performance, motivating further work on learned edge confidence and sensitivity analysis. Third, while a graph foundation backbone improves structural generalization, transfer may depend on the structural similarity between pre-training graphs and target document graphs, raising questions about pre-training corpora selection and adaptation strategies. Most importantly, current evidence is restricted to USPTO data and three high-volume CPC domains (A61/H04/G06). Generalization to other patent offices, lower-resource domains, and multilingual patent corpora remains an open question rather than an established conclusion, and cross-institutional evidence standards may differ substantially across application contexts [22].

6.2. Implications for Leakage-Free Impact Prediction

This work frames patents as a realistic benchmark for graph machine learning under strict information constraints: supervision is delayed and long-tailed, semantic graphs exhibit temporal/domain shift, and evaluation must be leakage-free. The proposed pipeline combines scalable graph construction (ANN Top-K + threshold), a graph foundation adaptation backbone, reliability-aware edges, and joint classification–ranking objectives. Together with subgraph-based explanations, it provides a reproducible template for document-graph impact prediction and auditable triage in large technical corpora. Beyond patents, the same design pattern can transfer to other technical document collections (e.g., scientific articles or standards) where semantic graphs can be constructed from embeddings and impact signals arrive with delay.

6.3. Conclusions

This paper proposes GraphGPT-Patent, a scalable document-graph learning pipeline for leakage-free grant-time patent impact prediction. We construct semantic similarity document graphs via ANN Top-K retrieval and thresholding, adapt a reversible graph-to-sequence foundation backbone with time/domain-conditioned edge reliability, and optimize a joint objective combining classification with within-group ranking consistency. Experiments across three CPC domains and three horizons (3/5/10y*) show consistent improvements over text-only and GNN baselines, and temporal shift and explainability diagnostics provide additional evidence about drift and model behavior. Overall, the framework offers a reproducible benchmark and a practical foundation-model adaptation recipe for impact prediction on large semantic document graphs.

6.4. Future Directions

Future research can advance along three broader directions, discussed below, to enhance robustness, usability, and generalization. Before turning to them, three near-term implementation directions are particularly actionable: (i) strict-leakage citation-graph integration, where only citation edges observable at decision time are fused with semantic edges under explicit timestamp constraints; (ii) streaming/dynamic graph adaptation, where embedding refresh, ANN updates, and reliability recalibration are triggered on rolling windows; and (iii) cross-office multilingual transfer protocols that evaluate calibration and fairness under office-specific examination behavior.

First, the robustness and interpretability of semantic graph construction can be improved. Semantic similarity edges carry real proximity but may also reflect pseudo-similarity induced by boilerplate and templates. Future work can introduce edge confidence learning and multi-view similarity strategies (e.g., combining abstract/claim text and classification metadata) and update edge reliability during training, transforming the graph from a static threshold product into a learnable object. Contrastive explanations could further clarify not only why an item is predicted as high-impact but also why it is preferred over similar low-impact candidates.

Second, continual learning and cross-period generalization under temporal drift deserve further study. Temporal drift experiments indicate that performance is sensitive to the chronological distribution of training data. Future research can incorporate time as an explicit conditioning variable, build rolling evaluation protocols closer to deployment, and study continuous update strategies under limited annotation and computational budgets. Concurrently, self-supervised cross-period contrastive objectives can be explored without leaking future information to improve robustness against terminological shifts and topic evolution, and to provide quantitative drift indicators for when to update the model [30].

Third, supervision signals and generalization settings can be expanded. Citations are one impact proxy, but additional grant-time or post-grant observable outcomes (e.g., family expansion, renewals, litigation/licensing events, or standard essentiality) may provide complementary signals for multi-task learning while maintaining leakage-free protocols. Institutional differences across patent offices and languages also provide natural scenarios for cross-domain generalization research. Future work can evaluate robustness and fairness across document sources and offices to clarify the practical boundaries of foundation-model adaptation on semantic document graphs.

Author Contributions

Conceptualization, H.S. and T.F.; methodology, T.F., J.S. and C.Y.; software, T.F. and J.S.; validation, T.F., J.S., C.Y. and H.S.; formal analysis, T.F., J.S. and C.Y.; investigation, T.F., J.S. and C.Y.; resources, H.S.; data curation, T.F., J.S. and C.Y.; writing—original draft preparation, T.F.; writing—review and editing, J.S., C.Y. and H.S.; visualization, T.F. and J.S.; supervision, H.S.; project administration, H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Raw U.S. granted patent bibliographic, text, and citation data are publicly accessible via the USPTO Open Data Portal (Bulk Data Directory) and PatentsView. USPTO bulk data: https://data.uspto.gov/bulkdata (accessed on 10 March 2026). PatentsView bulk downloads: https://patentsview.org/downloads/data-downloads (accessed on 10 March 2026). The Cooperative Patent Classification (CPC) scheme is publicly available at https://www.uspto.gov/web/patents/classification/cpc/html/cpc.html (accessed on 10 March 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Supplementary Quantitative Analysis

This appendix supplements the results in Tables 4–10 of the main text. It first aggregates the nine settings by prediction window to isolate the systemic impact of “window difficulty” on recall, and then provides effect sizes for temporal drift and explanatory structure differences to facilitate cross-model and cross-domain comparisons. The remaining subsections report per-setting precision, robustness checks (quantile thresholds, future test years, and graph-construction hyperparameters), the domain-wise composition of the 2016 test slice, and the node titles for the case study in Figure 9.

Appendix A.1. Decomposition of Prediction Window Difficulty

Table A1 averages the recall values from Table 4 across the three CPC domains for each prediction window (3y/5y/10y) and reports the increments from 3y to 5y and from 3y to 10y. Short windows (3y) are markedly more difficult: averaged over the eight baseline models, recall is 0.703 at 3y, compared with 0.848 at 5y and 0.860 at 10y. The resulting average gap between 3y and 10y is 0.157, indicating that the lack of observable signals due to citation lag is the primary source of performance degradation in short windows. At the model level, Doc2Vec-GTN and PatentBERT-GSAGE show large 3y-to-10y increments (0.233 and 0.213, respectively), indicating they are more sensitive to observable diffusion paths brought by window extension; conversely, Doc2Vec-GCN has the smallest increment (0.103), reflecting stronger cross-window stability.

Table A1. Recall rates aggregated by prediction window (from Table 4).

Model	3y Avg	5y Avg	10y Avg	5y − 3y	10y − 3y
Doc2Vec-MLP	0.710	0.827	0.840	0.117	0.130
PatentBERT-MLP	0.707	0.813	0.817	0.107	0.110
Doc2Vec-GCN	0.763	0.853	0.867	0.090	0.103
Doc2Vec-GTN	0.657	0.850	0.890	0.193	0.233
Doc2Vec-GSAGE	0.703	0.883	0.900	0.180	0.197
PatentBERT-GCN	0.690	0.797	0.807	0.107	0.117
PatentBERT-GTN	0.727	0.863	0.880	0.137	0.153
PatentBERT-GSAGE	0.667	0.897	0.880	0.230	0.213
GraphGPT-Patent (Ours)	0.767	0.917	0.917	0.150	0.150
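
The window-level averages quoted in the text can be reproduced from Table A1 with a few lines of Python. The sketch below copies the eight baseline rows from the table (GraphGPT-Patent is excluded, which is the reading under which the quoted means are recovered); the variable and function names are illustrative.

```python
from statistics import mean

# (3y, 5y, 10y) recall averages for the eight baselines, copied from Table A1.
baselines = {
    "Doc2Vec-MLP": (0.710, 0.827, 0.840),
    "PatentBERT-MLP": (0.707, 0.813, 0.817),
    "Doc2Vec-GCN": (0.763, 0.853, 0.867),
    "Doc2Vec-GTN": (0.657, 0.850, 0.890),
    "Doc2Vec-GSAGE": (0.703, 0.883, 0.900),
    "PatentBERT-GCN": (0.690, 0.797, 0.807),
    "PatentBERT-GTN": (0.727, 0.863, 0.880),
    "PatentBERT-GSAGE": (0.667, 0.897, 0.880),
}


def window_means(table):
    """Column-wise mean recall per prediction window, rounded to three decimals."""
    columns = zip(*table.values())
    return tuple(round(mean(column), 3) for column in columns)


m3, m5, m10 = window_means(baselines)
gap_3y_10y = round(m10 - m3, 3)
print(m3, m5, m10, gap_3y_10y)
```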

Appendix A.2. Effect Size of Temporal Drift Gains

Section 5.3 and Figure 6 in the main text demonstrated the impact of training window “freshness” on prediction. To quantify this effect, Table A2 reports the performance increments for the same model under the A61 domain (medical/device-heavy) when switching from the 2000–2004 training window to the 2010–2014 training window. The results show that gains mainly come from a large rise in recall: the average recall increment across the six baseline graph models is 0.197, while the average precision change is only −0.010, indicating that “freshness” primarily improves the capture of high-impact samples rather than increasing the conservativeness of positive class discrimination. This pattern is consistent with semantic drift: training data closer to the test year better cover current technical language and local semantic neighborhoods, thereby reducing missed detections.

Table A2. Temporal drift increments (A61 domain; 2010–2014 vs. 2000–2004; from Table 9).

Model	ΔAcc	ΔPr	ΔRe	ΔF1
Doc2Vec-GCN	0.07	0.02	0.20	0.10
Doc2Vec-GSAGE	0.06	−0.02	0.21	0.08
Doc2Vec-GTN	0.04	−0.04	0.23	0.08
PatentBERT-GCN	0.06	0.00	0.18	0.08
PatentBERT-GSAGE	0.06	−0.01	0.19	0.09
PatentBERT-GTN	0.06	−0.01	0.17	0.08
GraphGPT-Patent (Ours)	0.04	0.01	0.10	0.06

Appendix A.3. Effect Size of Explanatory Structure Differences

Section 5.4 compared the structural statistics of explanatory subgraphs in Table 10 and Figures 7 and 8. To facilitate cross-domain comparison, Table A3 presents the raw differences “High Citation − Low Citation”, used here as effect sizes. The average degree increments for H04 and G06 are 5.394 and 5.158, respectively, and the clustering coefficient increments are 0.129 and 0.147, supporting the interpretation that high-impact nodes are more likely to be located in local neighborhoods with stronger closure and richer composable paths. For A61, the direction of the difference is not the same: density and average degree are negative while clustering is positive, suggesting that explanatory evidence in this domain may reflect more specialized cohesion rather than broad connectivity, emphasizing domain-dependent behavior in semantic-graph explanations.

Table A3. Effect sizes of explanatory subgraph structure differences (High − Low; from Table 10).

CPC	ΔDensity	ΔAvg Degree	ΔClustering Coeff
A61	−0.093	−0.527	0.037
H04	0.035	5.394	0.129
G06	0.000	5.158	0.147

Appendix A.4. Positive-Class Precision Across Nine Settings

To complement the macro-level summary in Table 7, Table A4 reports the positive-class precision of GraphGPT-Patent in all nine domain–horizon settings under the same operating point used for the main experiments. The matrix shows that precision stays high in A61 while remaining moderate in long-horizon H04/G06 settings, matching the intended high-recall triage regime.

Table A4. Positive-class precision matrix of GraphGPT-Patent (Top 10% vs. Bottom 10%).

Model	A61 3y	A61 5y	A61 10y	H04 3y	H04 5y	H04 10y	G06 3y	G06 5y	G06 10y
GraphGPT-Patent (Ours)	0.92	0.99	0.90	0.94	0.67	0.65	0.90	0.64	0.56

Appendix A.5. Quantile-Threshold Robustness

Table A5 reports robustness to alternative Top/Bottom quantile definitions. As the quantile band widens from 5% to 15%, absolute precision and ranking values decrease moderately due to weaker class contrast; however, the recall-oriented profile and shortlist quality of GraphGPT-Patent remain stable.

Table A5. Robustness under alternative quantile thresholds (GraphGPT-Patent macro-average).

Quantile Setting	Precision	Recall	F1	Within-Group AUC	P@50
Top/Bottom 5%	0.823	0.892	0.856	0.915	0.679
Top/Bottom 10% (main)	0.798	0.867	0.817	0.904	0.652
Top/Bottom 15%	0.772	0.842	0.804	0.891	0.631

Appendix A.6. Future-Year/OOD Check (3y Horizon)

Because long windows are partially truncated by the 2022 observation cutoff for recent grants, we report a controlled OOD check on the 3-year horizon. Table A6 shows a smooth decline from 2016 to 2019, indicating a progressive shift in semantic neighborhoods rather than instability of the scoring mechanism.

Table A6. Future-year performance drift under 3y horizon (GraphGPT-Patent).

Test Year	Precision	Recall	F1	Within-Group AUC	P@50
2016	0.923	0.767	0.837	0.904	0.652
2017	0.914	0.751	0.824	0.896	0.641
2018	0.901	0.733	0.808	0.887	0.629
2019	0.889	0.712	0.791	0.876	0.614

Appendix A.7. Domain Imbalance Snapshot for the 2016 Test Slice

Table A7 reports domain-wise counts for the 2016 evaluation slice used in the Top/Bottom 10% setting. The table clarifies why macro aggregation is reported as the primary robustness summary across domains with different sample scales.

Table A7. Domain-wise class counts in 2016 test slice (Top/Bottom 10% setting).

Domain	Total Test Samples	Positive (Top 10%)	Negative (Bottom 10%)	Pos:Neg Ratio
A61	1788	179	179	1:1
H04	1946	195	195	1:1
G06	1768	177	177	1:1
All domains	5502	551	551	1:1
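
The class counts in Table A7 are consistent with taking 10% of each domain's test samples and rounding to the nearest integer. The sketch below illustrates such Top/Bottom quantile labeling by citation count. The function names, the rounding rule, and the arbitrary handling of ties are assumptions; the paper's exact cohort definition (for example, whether quantiles are taken per domain and grant year) is not restated here.

```python
def quantile_count(n, q=0.10):
    """Number of samples in one tail when each tail holds a share q of n items."""
    return round(n * q)


def label_top_bottom(citations, q=0.10):
    """Map item index to 1 (top q share) or 0 (bottom q share); the middle is left out.

    Assumes q <= 0.5 so the two tails do not overlap; ties follow sort order.
    """
    k = quantile_count(len(citations), q)
    order = sorted(range(len(citations)), key=lambda i: citations[i], reverse=True)
    labels = {i: 1 for i in order[:k]}
    labels.update({i: 0 for i in order[len(order) - k:]})
    return labels


# Tail sizes for the three domains in Table A7.
tail_sizes = [quantile_count(n) for n in (1788, 1946, 1768)]
print(tail_sizes)  # [179, 195, 177]

# Toy cohort of 20 items with citation counts 0..19.
toy_labels = label_top_bottom(list(range(20)))
print(toy_labels)
```

Because both tails have the same size by construction, the resulting evaluation set is balanced, which matches the 1:1 ratio in the table.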

Appendix A.8. Hyperparameter Sensitivity of Graph Construction

Table A8 summarizes sensitivity around the default graph-construction setting (k-hop = 2, Top-K = 100, threshold t = 0.70). Results indicate that moderate changes in the neighborhood range and sparsification parameters do not alter the relative performance profile of GraphGPT-Patent.

Table A8. Hyperparameter sensitivity around the default graph construction.

k-Hop	Top-K	t	Precision	Recall	F1	Accuracy	Within-Group AUC	P@50
1	100	0.70	0.807	0.852	0.804	0.739	0.896	0.641
2 (default)	100	0.70	0.798	0.867	0.817	0.744	0.904	0.652
3	100	0.70	0.789	0.868	0.815	0.741	0.901	0.648
2	150	0.68	0.793	0.869	0.816	0.742	0.902	0.649
2	80	0.72	0.802	0.861	0.814	0.745	0.900	0.646
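
To make the roles of Top-K and the threshold t concrete, the sketch below builds a sparsified similarity graph from embeddings. It deliberately replaces ANN retrieval with exact brute-force cosine search so that it stays self-contained, and it symmetrizes by keeping an edge if either endpoint retrieves the other; both choices, together with the function names and toy vectors, are assumptions and not the paper's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_similarity_graph(embeddings, top_k=100, threshold=0.70):
    """Return {(i, j): similarity} with i < j after Top-K retrieval and thresholding."""
    edges = {}
    n = len(embeddings)
    for i in range(n):
        # Exact neighbor search stands in for ANN retrieval in this sketch.
        sims = [(cosine(embeddings[i], embeddings[j]), j) for j in range(n) if j != i]
        sims.sort(reverse=True)
        for similarity, j in sims[:top_k]:
            if similarity >= threshold:
                # Union symmetrization: store each undirected edge once.
                edges[(min(i, j), max(i, j))] = similarity
    return edges


# Toy embeddings forming two tight pairs (hypothetical vectors).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = build_similarity_graph(toy_embeddings, top_k=1, threshold=0.70)
print(sorted(graph))  # [(0, 1), (2, 3)]
```

A larger Top-K with a lower threshold (as in the 150/0.68 row) admits more candidate edges, while a smaller Top-K with a higher threshold (the 80/0.72 row) yields a sparser graph.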

Appendix A.9. Node Title Mapping for the Evidence-Subgraph Case

To keep Figure 9 visually readable, detailed title snippets are listed in Table A9.

Table A9. ID-to-title mapping for the case in Figure 9.

Layer	ID	Short Title	Citations
Self	7	Interchangeable shaft assemblies for robotic surgery	508
Direct (L1)	21	Modular powered surgical articulation mechanism	392
Direct (L1)	59	Drive system lockout arrangements for end effectors	538
Direct (L1)	63	Rotary powered articulation joints for robotic tools	531
Direct (L1)	193	Locking arrangements for steerable catheter assemblies	409
Direct (L1)	1389	Robotically powered surgical control interface	117
Indirect (L2)	1122	Shaft assembly architectures for minimally invasive robots	153
Indirect (L2)	20,315	Articulation mechanism for multi-axis surgical manipulation	1206
Indirect (L2)	1287	Surgical device with multiple interchangeable modules	195
Indirect (L2)	1201	Handheld rotary powered surgical instrument	142
Indirect (L2)	1118	Articulatable surgical instrument linkage	153

References

Sinfield, J.; Solis, F. Finding a Lower-Risk Path to High-Impact Innovations. MIT Sloan Manag. Rev. 2016, 57, 79.
Çağlar, M.; Gürel, S. Public R&D Project Portfolio Selection under Expenditure Uncertainty. Ann. Oper. Res. 2024, 341, 375–399.
Arsalan, M.; Mubin, O.; Al Mahmud, A. Mapping the Generations of Research Impact Science: A Scoping Review of Metrics, Frameworks, and Predictive Approaches. J. Libr. Inf. Stud. 2025, 23, 1–77.
Bonino, D.; Ciaramella, A.; Corno, F. Review of the State-of-the-Art in Patent Information and Forthcoming Evolutions in Intelligent Patent Informatics. World Pat. Inf. 2010, 32, 30–38.
Boasson, V.; Boasson, E. Firm Value, Spatial Knowledge Flow, and Innovation: Evidence from Patent Citations. China Financ. Rev. Int. 2015, 5, 132–160.
de Almeida, B.P.; Gonçalves, E.; da Silva, A.S.; Reis, R.C. Internalization of Knowledge Spillovers by Regions: A Measure Based on Self-Citation Patents. Ann. Reg. Sci. 2021, 66, 309–330.
Nelson, A.; Earle, A.; Howard-Grenville, J.; Haack, J.; Young, D. Do Innovation Measures Actually Measure Innovation? Obliteration, Symbolic Adoption, and Other Finicky Challenges in Tracking Innovation Diffusion. Res. Policy 2014, 43, 927–940.
Tang, J.; Zhang, J.; Jin, R.; Yang, Z.; Cai, K.; Zhang, L.; Su, Z. Topic Level Expertise Search over Heterogeneous Networks. Mach. Learn. 2011, 82, 211–237.
Su, J.; Jiang, C.; Jin, X.; Qiao, Y.; Xiao, T.; Ma, H.; Wei, R.; Jing, Z.; Xu, J.; Lin, J. Large Language Models for Forecasting and Anomaly Detection: A Systematic Literature Review. arXiv 2024, arXiv:2402.10350.
Tang, J.; Yang, Y.; Wei, W.; Shi, L.; Su, L.; Cheng, S.; Yin, D.; Huang, C. GraphGPT: Graph Instruction Tuning for Large Language Models. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA, 14–18 July 2024; pp. 491–500.
Trajtenberg, M. A Penny for Your Quotes: Patent Citations and the Value of Innovations. Rand J. Econ. 1990, 21, 172–187.
Blind, K.; Kenney, M.; Leiponen, A.; Simcoe, T. Standards and Innovation: A Review and Introduction to the Special Issue. Res. Policy 2023, 52, 104830.
Hall, B.H.; Jaffe, A.B.; Trajtenberg, M. Market Value and Patent Citations: A First Look; National Bureau of Economic Research: Cambridge, MA, USA, 2000.
Takahashi, C.K.; de Figueiredo, J.C.B.; Scornavacca, E. Investigating the Diffusion of Innovation: A Comprehensive Study of Successive Diffusion Processes through Analysis of Search Trends, Patent Records, and Academic Publications. Technol. Forecast. Soc. Change 2024, 198, 122991.
Cichosz, P. Bag of Words and Embedding Text Representation Methods for Medical Article Classification. Int. J. Appl. Math. Comput. Sci. 2023, 33, 603–621.
Martin, M.V.; Kirsch, D.A.; Prieto-Nañez, F. The Promise of Machine-Learning-Driven Text Analysis Techniques for Historical Research: Topic Modeling and Word Embedding. Manag. Organ. Hist. 2023, 18, 81–96.
Le, Q.; Mikolov, T. Distributed Representations of Sentences and Documents. In Proceedings of the International Conference on Machine Learning, Beijing, China, 22–24 June 2014; PMLR; pp. 1188–1196.
Lee, J.S.; Hsiang, J. Patent Classification by Fine-Tuning BERT Language Model. World Pat. Inf. 2020, 61, 101965.
Ji, L.; Mao, J.; Shi, H.; Li, Q.; Chu, Y.; Yang, H. An Adaptive Framework of Geographical Group-Specific Network on O2O Recommendation. In Proceedings of the European Conference on Information Retrieval; Springer: Cham, Switzerland, 2024; pp. 278–286.
Liu, J.; Yang, C.; Lu, Z.; Chen, J.; Li, Y.; Zhang, M.; Bai, T.; Fang, Y.; Sun, L.; Yu, P.S.; et al. Graph Foundation Models: Concepts, Opportunities and Challenges. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 5023–5044.
Wang, Z.; Liu, Z.; Ma, T.; Li, J.; Zhang, Z.; Fu, X.; Li, Y.; Yuan, Z.; Song, W.; Ma, Y.; et al. Graph Foundation Models: A Comprehensive Survey. arXiv 2025, arXiv:2505.15116.
Fantozzi, I.C.; Martuscelli, L.; Schiraldi, M.M. AI vs. Human Performance in University Assessments: A Case Study in Production Management. In Manufacturing 2030—A Perspective to Future Challenges in Industrial Production; Springer Nature: Cham, Switzerland, 2025; pp. 127–137.
Kipf, T.N. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907.
Hamilton, W.; Ying, Z.; Leskovec, J. Inductive Representation Learning on Large Graphs. Adv. Neural Inf. Process. Syst. 2017, 30, 1025–1035.
Ying, Z.; Bourgeois, D.; You, J.; Zitnik, M.; Leskovec, J. GNNExplainer: Generating Explanations for Graph Neural Networks. Adv. Neural Inf. Process. Syst. 2019, 32, 829.
Nandi, R.N.; Maity, S.K.; Uzzi, B.; Medya, S. An Experimental Analysis on Evaluating Patent Citations. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 373–387.
Feng, W.; Shi, H.; Zhao, P.; Gao, X. Mixtron: Bandit Online Multiclass Prediction with Implicit Feedback. In Proceedings of the 2023 IEEE International Conference on Data Mining (ICDM); IEEE: Piscataway, NJ, USA, 2023; pp. 1004–1012.
Irekponor, O. Designing Resilient AI Architectures for Predictive Energy Finance Systems amid Data Sovereignty, Adversarial Threats, and Policy Volatility. Int. J. Res. Publ. Rev. 2025, 6, 73–100.
Lavrič, M.; Lavrič, A.L. A Domain-Independent Review of Explainable AI’s Role in Facilitating Innovation and Creativity in Organizations. Mednar. Inov. Posl. J. Innov. Bus. Manag. 2025, 17.
Cheng, M.; Liu, Z.; Tao, X.; Liu, Q.; Zhang, J.; Pan, T.; Zhang, S.; He, P.; Zhang, X.; Wang, D.; et al. A Comprehensive Survey of Time Series Forecasting: Concepts, Challenges, and Future Directions. Authorea Prepr. 2025.

Figure 1. The GraphGPT-Patent framework. A semantic similarity graph is constructed from patent text embeddings, with time- and domain-conditioned edge reliability modeling. For a target node, a k-hop subgraph is extracted and Eulerized into a token sequence.

Figure 2. Edge reliability modeling, joint training, and explanation diagnosis flow. The process illustrates the transformation of edge weights from $s_{ij}$ to $w_{ij}$ via time and domain decay, followed by normalization. The subgraph is then Eulerized and input into GraphGPT. Training involves simultaneous optimization of classification loss and ranking consistency loss. Following inference, explanatory subgraphs are generated for samples predicted as high-impact, and structural indicators are calculated for cross-domain comparison and case analysis.
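
The reweighting step named in this caption (similarity $s_{ij}$ mapped to reliability $w_{ij}$ by time and domain decay, then normalized) can be sketched as follows. The functional form is defined in Section 4.2 and is not reproduced here. The exponential decay in the grant-year gap, the constant cross-domain factor, the per-source-node normalization, and every parameter name and value below are assumptions made purely for illustration.

```python
import math
from collections import defaultdict


def reliability_weights(edges, grant_year, domain,
                        time_decay=0.1, cross_domain_factor=0.5):
    """Map raw similarities s_ij to normalized weights w_ij.

    edges: {(i, j): s_ij}, read as directed from i to j.
    ASSUMED form: s_ij * exp(-time_decay * |year gap|) * domain factor,
    then divided by the total outgoing weight of node i.
    """
    raw = {}
    for (i, j), s in edges.items():
        gap = abs(grant_year[i] - grant_year[j])
        factor = 1.0 if domain[i] == domain[j] else cross_domain_factor
        raw[(i, j)] = s * math.exp(-time_decay * gap) * factor

    totals = defaultdict(float)
    for (i, _), w in raw.items():
        totals[i] += w
    return {(i, j): w / totals[i] for (i, j), w in raw.items() if totals[i] > 0}


if __name__ == "__main__":
    edges = {(0, 1): 0.9, (0, 2): 0.9}
    years = {0: 2016, 1: 2015, 2: 2005}
    domains = {0: "A61", 1: "A61", 2: "H04"}
    print(reliability_weights(edges, years, domains))
```

In the toy call both neighbors have the same raw similarity, yet the recent same-domain neighbor receives most of the normalized weight. This is the qualitative behavior the caption describes, whatever the exact decay used in the paper.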

Figure 3. Annual distribution of patent data across domains.

Figure 4. Heatmap of positive class recall across domains and horizons.

Figure 5. Heatmap of positive class F1-score across domains and horizons.

Figure 6. Performance trends under different training windows (test year: 2016, domain: A61 (medical/device-heavy)).

Figure 7. Comparison of metric differences.

Figure 8. 2D projection of density vs. clustering coefficient. Point labels are abbreviated as domain and citation label (e.g., A61-H, H04-L) to improve print readability.

Figure 9. Visualization of an evidence subgraph for a high-impact candidate. For readability, the main panel reports network structure and node IDs only; detailed title mapping is provided in Appendix A, Table A9.

Table 1. Sample sizes of three CPC domains.

CPC Domain	Domain Description	Number of Patents
A61	Medical or veterinary science	269,364
H04	Electric communication technique	379,099
G06	Computing	340,667

Table 2. Task settings and temporal split.

Dimension	Value
Domain	A61, H04, G06
Prediction Window (d)	3y, 5y, 10y* (capped at 2022)
Label Rule	Top 10% vs. Bottom 10% (Main Setting)
Training Set Grant Years	2000–2015
Test Set Grant Year	2016
Semantic Edge Threshold	$t \in [0.62, 0.80]$, corresponding to an average degree of 5–25

Table 3. Computational efficiency and resource footprint.

Model	Params (M)	Peak GPU (GB)	Time/Epoch (min)	Convergence (GPU-h)	Inference (min)	Est. Cost (USD)
PatentBERT-MLP	111.6	4.8	5.2	1.1	2.3	3.0
PatentBERT-GSAGE	112.9	8.6	9.6	3.0	4.9	8.1
GraphGPT-Patent (Ours)	125.4	13.9	14.7	5.4	8.7	14.6

Table 4. Positive class recall (Top 10% vs. Bottom 10%).

Model	A61 3y	A61 5y	A61 10y	H04 3y	H04 5y	H04 10y	G06 3y	G06 5y	G06 10y
Doc2Vec-MLP	0.81	0.87	0.93	0.68	0.68	0.68	0.64	0.93	0.91
PatentBERT-MLP	0.76	0.87	0.91	0.67	0.68	0.68	0.66	0.89	0.86
Doc2Vec-GCN	0.83	0.86	0.92	0.76	0.76	0.76	0.70	0.94	0.92
Doc2Vec-GTN	0.75	0.82	0.90	0.67	0.86	0.87	0.55	0.87	0.90
Doc2Vec-GSAGE	0.78	0.84	0.93	0.71	0.90	0.87	0.62	0.91	0.90
PatentBERT-GCN	0.76	0.85	0.92	0.61	0.63	0.61	0.70	0.93	0.89
PatentBERT-GTN	0.77	0.85	0.94	0.83	0.88	0.83	0.58	0.86	0.87
PatentBERT-GSAGE	0.74	0.85	0.91	0.70	0.91	0.85	0.56	0.93	0.88
GraphGPT-Patent (Ours)	0.84	0.88	0.94	0.74	0.93	0.89	0.72	0.94	0.92

Table 5. Accuracy (Top 10% vs. Bottom 10%).

Model	A61 3y	A61 5y	A61 10y	H04 3y	H04 5y	H04 10y	G06 3y	G06 5y	G06 10y
Doc2Vec-MLP	0.77	0.85	0.75	0.68	0.68	0.68	0.64	0.63	0.57
PatentBERT-MLP	0.74	0.85	0.89	0.69	0.69	0.69	0.68	0.66	0.67
Doc2Vec-GCN	0.78	0.84	0.75	0.73	0.73	0.73	0.69	0.62	0.57
Doc2Vec-GTN	0.74	0.81	0.75	0.69	0.61	0.57	0.58	0.66	0.60
Doc2Vec-GSAGE	0.76	0.82	0.76	0.71	0.57	0.59	0.64	0.65	0.61
PatentBERT-GCN	0.74	0.83	0.75	0.64	0.64	0.64	0.68	0.64	0.63
PatentBERT-GTN	0.75	0.84	0.80	0.68	0.64	0.68	0.61	0.67	0.67
PatentBERT-GSAGE	0.73	0.84	0.89	0.72	0.60	0.67	0.60	0.67	0.68
GraphGPT-Patent (Ours)	0.78	0.86	0.90	0.73	0.66	0.70	0.70	0.69	0.68

Table 6. Positive class F1 (Top 10% vs. Bottom 10%).

Model	A61 3y	A61 5y	A61 10y	H04 3y	H04 5y	H04 10y	G06 3y	G06 5y	G06 10y
Doc2Vec-MLP	0.86	0.91	0.78	0.78	0.78	0.78	0.75	0.72	0.59
PatentBERT-MLP	0.83	0.92	0.90	0.79	0.79	0.79	0.79	0.73	0.65
Doc2Vec-GCN	0.87	0.91	0.78	0.83	0.83	0.83	0.79	0.72	0.59
Doc2Vec-GTN	0.83	0.89	0.77	0.78	0.61	0.50	0.69	0.72	0.60
Doc2Vec-GSAGE	0.85	0.90	0.79	0.81	0.59	0.51	0.75	0.73	0.61
PatentBERT-GCN	0.84	0.90	0.78	0.74	0.73	0.74	0.79	0.73	0.61
PatentBERT-GTN	0.84	0.91	0.56	0.81	0.63	0.56	0.72	0.73	0.64
PatentBERT-GSAGE	0.83	0.91	0.91	0.81	0.61	0.56	0.71	0.74	0.66
GraphGPT-Patent (Ours)	0.88	0.93	0.92	0.83	0.78	0.75	0.80	0.76	0.70

Table 7. Macro-average performance and stability across nine settings (Mean ± SD).

Model	Precision	Recall	Accuracy	F1
Doc2Vec-MLP	0.802 ± 0.177	0.792 ± 0.115	0.694 ± 0.079	0.772 ± 0.084
PatentBERT-MLP	0.861 ± 0.159	0.779 ± 0.097	0.729 ± 0.079	0.799 ± 0.077
Doc2Vec-GCN	0.803 ± 0.179	0.828 ± 0.082	0.716 ± 0.076	0.794 ± 0.088
Doc2Vec-GTN	0.702 ± 0.230	0.799 ± 0.113	0.668 ± 0.080	0.710 ± 0.116
Doc2Vec-GSAGE	0.706 ± 0.235	0.829 ± 0.099	0.679 ± 0.082	0.727 ± 0.123
PatentBERT-GCN	0.810 ± 0.172	0.764 ± 0.129	0.688 ± 0.066	0.763 ± 0.076
PatentBERT-GTN	0.677 ± 0.223	0.823 ± 0.096	0.704 ± 0.072	0.711 ± 0.117
PatentBERT-GSAGE	0.754 ± 0.229	0.814 ± 0.116	0.711 ± 0.093	0.749 ± 0.119
GraphGPT-Patent (Ours)	0.798 ± 0.154	0.867 ± 0.084	0.744 ± 0.085	0.817 ± 0.080
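
The Mean ± SD entries can be related back to the per-setting tables. The sketch below recomputes the recall summary for GraphGPT-Patent from its nine positive-class recall values in Table 4. It assumes the SD is the sample standard deviation (n − 1 denominator), which is an inference from the numbers rather than something the table states, and it works from the two-decimal values as printed.

```python
from statistics import mean, stdev

# GraphGPT-Patent positive-class recall across the nine settings (Table 4):
# A61 3y/5y/10y, H04 3y/5y/10y, G06 3y/5y/10y.
RECALL_NINE_SETTINGS = [0.84, 0.88, 0.94, 0.74, 0.93, 0.89, 0.72, 0.94, 0.92]


def mean_and_sd(values):
    """Macro mean and sample standard deviation (n - 1 denominator)."""
    return mean(values), stdev(values)


if __name__ == "__main__":
    m, s = mean_and_sd(RECALL_NINE_SETTINGS)
    print(f"{m:.3f} ± {s:.3f}")
```

This reproduces the 0.867 ± 0.084 reported for GraphGPT-Patent recall. The population standard deviation would give roughly 0.079 instead, which is why the sample form is assumed here.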

Table 8. Ablation of edge reliability and joint ranking (macro-average across nine settings).

Variant	Precision	Recall	F1	Within-Group AUC	P@50
No reliability + cls-only	0.781	0.848	0.803	0.876	0.614
Temporal reliability only + cls-only	0.790	0.858	0.811	0.885	0.626
Domain reliability only + cls-only	0.786	0.853	0.807	0.882	0.621
Full reliability + cls-only	0.793	0.864	0.812	0.892	0.634
Full reliability + ranking (final)	0.798	0.867	0.817	0.904	0.652
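
The two ranking columns in Table 8 can be illustrated with a minimal sketch of precision at K and a within-group AUC. The grouping variable and the aggregation rule are assumptions: AUC is computed per group as the fraction of correctly ordered positive/negative pairs and then averaged with equal weight per group, and groups lacking either class are skipped. The paper's exact definition of comparable groups may differ.

```python
def precision_at_k(scores, labels, k):
    """Share of positives among the k highest-scored items."""
    ranked = sorted(zip(scores, labels), key=lambda x: x[0], reverse=True)[:k]
    return sum(label for _, label in ranked) / len(ranked) if ranked else 0.0


def pairwise_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def within_group_auc(scores, labels, groups):
    """Unweighted mean of per-group AUCs (assumed aggregation rule)."""
    by_group = {}
    for s, l, g in zip(scores, labels, groups):
        group_scores, group_labels = by_group.setdefault(g, ([], []))
        group_scores.append(s)
        group_labels.append(l)
    aucs = [pairwise_auc(s, l) for s, l in by_group.values()]
    aucs = [a for a in aucs if a is not None]
    return sum(aucs) / len(aucs) if aucs else None


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.7, 0.6, 0.2]
    labels = [0, 1, 0, 1, 1, 0]
    groups = ["A61", "A61", "A61", "H04", "H04", "H04"]
    print(within_group_auc(scores, labels, groups))
    print(precision_at_k(scores, labels, k=2))
```

Because pairs are only compared inside a group, a model cannot raise this AUC by learning that one domain is cited more than another. That property fits the within-domain ranking objective named in the abstract.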

Table 9. Temporal drift experiment (A61 domain; prediction year 2016; different training windows).

Model	Training Window	Acc	Pr	Re	F1
Doc2Vec-GCN	2000–2004	0.61	0.66	0.69	0.67
Doc2Vec-GCN	2005–2009	0.66	0.67	0.84	0.74
Doc2Vec-GCN	2010–2014	0.68	0.68	0.89	0.77
Doc2Vec-GTN	2000–2004	0.65	0.72	0.66	0.69
Doc2Vec-GTN	2005–2009	0.67	0.68	0.83	0.75
Doc2Vec-GTN	2010–2014	0.69	0.68	0.89	0.77
Doc2Vec-GSAGE	2000–2004	0.65	0.72	0.67	0.70
Doc2Vec-GSAGE	2005–2009	0.66	0.68	0.81	0.74
Doc2Vec-GSAGE	2010–2014	0.71	0.70	0.88	0.78
PatentBERT-GCN	2000–2004	0.66	0.73	0.67	0.70
PatentBERT-GCN	2005–2009	0.69	0.73	0.75	0.74
PatentBERT-GCN	2010–2014	0.72	0.73	0.85	0.78
PatentBERT-GTN	2000–2004	0.67	0.76	0.65	0.70
PatentBERT-GTN	2005–2009	0.70	0.74	0.76	0.75
PatentBERT-GTN	2010–2014	0.73	0.75	0.82	0.78
PatentBERT-GSAGE	2000–2004	0.67	0.76	0.64	0.69
PatentBERT-GSAGE	2005–2009	0.69	0.74	0.73	0.73
PatentBERT-GSAGE	2010–2014	0.73	0.75	0.83	0.78
GraphGPT-Patent (Ours)	2000–2004	0.69	0.73	0.78	0.75
GraphGPT-Patent (Ours)	2005–2009	0.71	0.73	0.84	0.78
GraphGPT-Patent (Ours)	2010–2014	0.73	0.74	0.88	0.81

Table 10. Structural metrics of explanatory subgraphs (high citation vs. low citation).

CPC	Label	Average Density	Average Degree	Clustering Coefficient
A61	High Citation	0.470	5.705	0.265
A61	Low Citation	0.563	6.232	0.228
H04	High Citation	0.322	16.220	0.460
H04	Low Citation	0.287	10.826	0.331
G06	High Citation	0.221	14.368	0.431
G06	Low Citation	0.221	9.210	0.284
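
For readers who want to reproduce the three columns of Table 10 on their own subgraphs, the sketch below computes density, average degree, and the average local clustering coefficient of an undirected simple graph given as an edge list. Using the average of local clustering coefficients (rather than global transitivity) and assigning 0 to nodes with fewer than two neighbors are assumptions; the table does not state which variant was used.

```python
from itertools import combinations


def structural_metrics(edges):
    """Return (density, average degree, average local clustering).

    edges: iterable of (u, v) pairs of an undirected simple graph.
    Nodes are taken from the edge list, so isolated nodes are not counted.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    avg_degree = 2 * m / n if n else 0.0

    local = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            local.append(0.0)  # assumed convention for low-degree nodes
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        local.append(2 * links / (k * (k - 1)))
    clustering = sum(local) / n if n else 0.0
    return density, avg_degree, clustering


if __name__ == "__main__":
    # A triangle (0, 1, 2) with one pendant node attached to node 2.
    print(structural_metrics([(0, 1), (1, 2), (0, 2), (2, 3)]))
```

Note that the table reports averages over many explanatory subgraphs, so the per-graph identity density = average degree / (n − 1) need not hold between its columns.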

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Fang, T.; Si, J.; Ye, C.; Shi, H. GraphGPT-Patent: Time-Aware Graph Foundation Modeling on Semantic Similarity Document Graphs for Grant-Time Economic Impact Prediction. Appl. Sci. 2026, 16, 2737. https://doi.org/10.3390/app16062737

