Article

Navigating Technological Frontiers: Explainable Patent Recommendation with Temporal Dynamics and Uncertainty Modeling

Business School, Lingnan Normal University, Zhanjiang 524048, China
Symmetry 2026, 18(1), 78; https://doi.org/10.3390/sym18010078
Submission received: 19 November 2025 / Revised: 10 December 2025 / Accepted: 14 December 2025 / Published: 2 January 2026

Abstract

Rapid technological innovation has made navigating millions of new patent filings a critical challenge for corporations and research institutions. Existing patent recommendation systems, largely constrained by their static designs, struggle to capture the dynamic pulse of an ever-evolving technological ecosystem. At the same time, their “black-box” decision-making processes severely limit their trustworthiness and practical value in high-stakes, real-world scenarios. To address this impasse, we introduce TEAHG-EPR, a novel, end-to-end framework for explainable patent recommendation. The core of our approach is to reframe the recommendation task as a dynamic learning and reasoning process on a temporal-aware attributed heterogeneous graph. Specifically, we first construct a sequence of patent knowledge graphs that evolve on a yearly basis. A dual-encoder architecture, comprising a Relational Graph Convolutional Network (R-GCN) and a Bidirectional Long Short-Term Memory network (Bi-LSTM), is then employed to simultaneously capture the spatial structural information within each time snapshot and the evolutionary patterns across time. Building on this foundation, we innovatively introduce uncertainty modeling, learning a dual “deterministic core + probabilistic potential” representation for each entity and balancing recommendation precision with exploration through a hybrid similarity metric. Finally, to achieve true explainability, we design a feature-guided controllable text generation module that can attach a well-reasoned, faithful textual explanation to every single recommendation. We conducted comprehensive experiments on two large-scale datasets: a real-world industrial patent dataset (USPTO) and a classic academic dataset (AMiner). The results are compelling: TEAHG-EPR not only significantly outperforms all state-of-the-art baselines in recommendation accuracy but also demonstrates a decisive advantage across multiple “beyond-accuracy” dimensions, including explanation quality, diversity, and novelty.

1. Introduction

Patents serve as a comprehensive chronicle of global technological progress and a critical asset for competitive advantage [1,2]. However, the exponential growth of patent filings has created a challenge of information overload. Identifying patents that are strategically relevant—such as emerging technologies or potential cross-domain applications—requires systems capable of analyzing complex, evolving relationships rather than performing simple keyword retrieval [3]. Consequently, patent recommendation has shifted from a static search task to a dynamic problem requiring deep semantic understanding and strategic foresight.
Existing research in patent recommendation has progressed from content-based analysis [4,5,6] to network-based approaches. Early methods leveraged homogeneous networks [7,8] to model direct relationships, such as citations or assignments. While useful, these approaches often fail to capture the multi-dimensional heterogeneity of the patent ecosystem. Recent advancements have adopted Heterogeneous Information Networks (HINs) and Knowledge Graphs (KGs) to model diverse entities and rich semantic paths [9,10,11,12].
Despite these improvements, two critical research gaps remain. First, most existing graph-based models treat patent data as a static snapshot, aggregating historical information while neglecting temporal dynamics. This static perspective fails to capture evolutionary patterns, such as the shifting research interests of an organization or the rapid rise of a technology cluster. Second, deep learning-based recommendation systems often suffer from a “black-box” limitation. In high-stakes R&D decision-making, recommendations lacking transparent, rational explanations are difficult for users to trust and adopt.
To address these challenges, we propose TEAHG-EPR, a novel framework for Explainable Patent Recommendation based on Temporal Evolution and Attributed Heterogeneous Graphs. We reframe the recommendation task as a learning process on a dynamic graph sequence. By constructing yearly snapshots of patent knowledge graphs, the model explicitly tracks the evolutionary trajectory of entities.
The framework employs a dual-encoder architecture: a Relational Graph Convolutional Network (R-GCN) [13] captures spatial structural information within each time snapshot, while a Bidirectional Long Short-Term Memory network (Bi-LSTM) models temporal evolutionary patterns. Furthermore, we introduce uncertainty modeling by learning a dual “deterministic core + probabilistic potential” representation for each entity, balancing recommendation precision with the exploration of latent opportunities. Finally, to ensure model transparency, we design a controllable text generation module using a Gated Feature Recurrent Unit (GFRU) [14] to provide feature-guided, faithful textual explanations for each recommendation.
The main contributions of this paper are as follows:
(1)
We propose TEAHG-EPR, an end-to-end framework that unifies structural, attribute, and temporal dimensions of patent data to address the limitations of static modeling.
(2)
We innovatively introduce uncertainty modeling into patent recommendation, utilizing Gaussian distributions to balance accuracy with the discovery of potential technological opportunities.
(3)
We design a controllable explanation module that generates relevant and traceable textual rationales, significantly enhancing system trustworthiness.
(4)
We demonstrate through experiments on real-world industrial (USPTO) and academic (AMiner) datasets that TEAHG-EPR outperforms state-of-the-art baselines in accuracy, diversity, and explanation quality.

2. Related Works

To clearly situate our academic contribution, this section systematically maps out the related work in the field of patent analysis and recommendation. We categorize the existing literature into three distinct streams: graph-based mining, content-based/hybrid techniques, and the emerging field of dynamic graph learning. This structure highlights the trajectory of technical evolution and identifies the specific gaps our work addresses.

2.1. Graph-Based Patent Mining and Knowledge Representation

The evolution of patent information mining has seen a paradigmatic shift from simple statistical analysis to complex network modeling [15].
Early research largely relied on homogeneous networks [16,17], constructing direct relationship graphs based on citations or assignments. The recommendation task was typically framed as a link prediction problem using structural similarity metrics [18]. While effective for local features, these methods were limited by their inability to distinguish the diverse entity types in the patent ecosystem.
To address this, researchers adopted Heterogeneous Information Networks (HINs) [19]. HINs capture complex semantics by modeling multiple node types (patents, assignees, IPCs). Approaches like those by Cheng et al. [20] and Zhang et al. [21] utilized meta-paths to quantify semantic similarity. However, HINs often struggle with the effective fusion of continuous attributes [22,23,24,25]. Recent advancements in Knowledge Graphs (KGs) [11] and multi-modal learning have sought to bridge this gap. For instance, AlignFusionNet [26] demonstrated the power of deep cross-modal alignment in 3D semantic prediction, a concept increasingly relevant for fusing patent texts with graph structures. Similarly, KTMN [27] introduced knowledge-driven modulation for visual reasoning, paralleling the need in patent analysis to use external knowledge to filter relevant technical signals. Despite these strides, most graph-based patent models remain static, aggregating historical data into a single snapshot.

2.2. Content-Based and Hybrid Recommendation Architectures

Parallel to graph methods, content analysis has evolved from keyword matching to deep semantic representation.
Early content-based methods focused on textual similarity [28,29] but often lacked predictive foresight. The introduction of Deep Learning significantly enhanced representation capabilities. Yoon et al. [30] and Cao et al. [31] utilized Doc2vec [32] and representation learning to capture latent semantics. However, navigating the vast search space of patent solutions remains computationally challenging, a problem sharing similarities with discrete optimization tasks like the Traveling Salesman Problem, where advanced search strategies such as Discrete Gorilla Troops Optimization [33] have proven vital for efficiency.
Hybrid recommendation systems attempt to combine the best of both worlds. Models like NeuACF [34] integrate neural networks with collaborative filtering to balance personalization and accuracy. While these methods fuse multi-source relationships [35,36,37,38,39,40,41,42], they often treat the recommendation process as a “black box,” lacking the explainability required for strategic R&D decision-making.

2.3. Dynamic Graph Learning and Temporal Evolution

A critical limitation in prior work is the insufficient modeling of temporal dynamics. The patent ecosystem is inherently evolutionary: technologies emerge, mature, and fade.
Dynamic Graph Learning has emerged as a powerful paradigm to handle such non-stationary data. Unlike static graph embeddings, dynamic approaches update node representations over time. Recent research in mobile social networks by Alilu et al. [43] highlighted the importance of adaptive data gathering schedules based on data variance to handle dynamic streams efficiently. This insight is directly applicable to patent recommendation, where the “variance” in technological trends dictates the value of information.
However, in the specific domain of patent recommendation, the application of dynamic graphs remains nascent. Most existing models [44,45] still rely on static snapshots or simple time-decay functions, failing to capture complex evolutionary patterns such as the shifting research trajectory of an assignee. Our proposed TEAHG-EPR framework specifically targets this gap by integrating temporal evolution modeling (via Bi-LSTM) directly into the heterogeneous graph learning process.

3. Materials and Methods

3.1. Problem Definition and Framework

3.1.1. Problem Definition

At its core, patent recommendation is the task of identifying, for a specific user such as a corporation or a research laboratory, a small set of patents from a vast corpus that align with its technological needs, strategic direction, or research interests. Formally, given a target user $o$ and a pool of candidate patents $P$, the challenge at any given time $t$ is to learn a ranking function $f$. This function should produce a Top-K list of patents, $\{p_1, p_2, \ldots, p_K\}$, maximizing the relevance between the recommended patents and the user $o$.
Compared with traditional product or paper recommendations, the patent recommendation scenario has its own unique complexity and challenges [44,45,46]. This study summarizes these as follows:
First is Dynamic Temporality. The technological landscape captured in patents is not fixed; it changes continuously. A patent's value, its position within a field, and its links to other technologies can shift substantially over time as new technologies emerge, citation networks grow, or market demands pivot. In a burgeoning field, for instance, an early patent might suddenly accrue a large number of citations several years after filing. An effective recommendation model must capture these temporal shifts to offer forward-looking suggestions; relying solely on outdated historical data misses the trajectory of technological progress [47].
Second is Structural Heterogeneity and Multi-faceted Attributes. Patent information forms a densely connected heterogeneous network comprising patents, inventors, companies, technical categories (IPCs), and more, linked by relations such as invention, ownership, citation, and transfer. In addition, each entity carries attributes, such as a patent's number of claims or a company's location. The central challenge is how to fuse this structural and attribute information effectively to improve recommendation accuracy [46].
Finally, and perhaps most critically for real-world use, is the need for Decision Support and Explainability. Patent recommendations guide consequential business decisions such as technology acquisition, R&D planning, and litigation avoidance, so an opaque system is unlikely to be adopted. Users need to trust the output. The system should therefore not merely present a list of patents; it must explain why these specific patents are relevant, for example by revealing the deep technical connections to the user's existing portfolio. This turns the recommendation into a genuine decision-making tool [45].
To tackle these challenges head-on, we frame patent recommendation as a learning problem on a Temporal Knowledge Graph. We start by constructing a sequence of yearly knowledge graph snapshots, denoted as $G_{seq} = \{G^{(t-L)}, \ldots, G^{(t)}\}$. Here, each $G^{(t)} = (V^{(t)}, E^{(t)}, A^{(t)})$ comprises the set of nodes (entities), edges (fact triples), and attributes up to time $t$. The ultimate goal is to learn a function $f: (o, G_{seq}) \to (\{p_1, p_2, \ldots, p_K\}, \{e_1, e_2, \ldots, e_K\})$. This function should generate not only a personalized list of recommended patents for user $o$ but also a corresponding textual explanation $e_i$ for each recommendation $p_i$.

3.1.2. Overall Framework

The model we designed for this task is called TEAHG-EPR (Temporal Evolution and Attribute Heterogeneous Graph for Explainable Patent Recommendation). It combines ideas from temporal evolution modeling, attribute-enhanced heterogeneous graph learning, and explainable text generation to handle the dynamic and heterogeneous nature of patent data. As shown in Figure 1, the architecture is built around four core modules that work in concert:
Module I: Temporal-Aware Attributed Heterogeneous Graph Construction. This is the foundation. It structures raw patent data into a sequence of yearly graph snapshots. Each snapshot integrates multiple node types (patents, companies, inventors, IPCs) and attaches attribute information to patents and organizations.
Module II: Neighborhood Aggregation and Temporal Evolution Representation. This module performs the feature learning. It uses a Relational Graph Convolutional Network (R-GCN) [13] to aggregate features from the local neighborhood within each graph snapshot, then feeds these per-snapshot features into a bidirectional LSTM to model how an entity's characteristics evolve over time.
Module III: Uncertainty-Aware Fine-Grained Recommendation. This is where the recommendation decision is made. We introduce a multi-dimensional Gaussian distribution to model the inherent uncertainty in each node's representation. A multi-head attention mechanism then creates a fine-grained, context-aware representation by focusing on the most important attributes. To score patents, we use KL divergence, which naturally captures the asymmetric nature of similarity between entities.
Module IV: Feature-Based Explainable Text Generation. Finally, this module generates a textual explanation for each recommendation. It extracts salient technical terms from the patent's description and claims, then uses a Gated Feature Recurrent Unit (GFRU) model to produce a well-grounded explanation.
While Figure 1 details the specific network components, we present a simplified data flow in Figure 2 to intuitively illustrate the information processing pipeline from input snapshots to the final dual-task output.

3.2. Constructing the Temporal-Aware Attributed Heterogeneous Graph

3.2.1. Patent Heterogeneous Graph

The world of patent data is a rich tapestry of different entities and the relationships between them. To capture this complexity, we construct a patent heterogeneous graph, defined as G = ( V , E , A ) , where V is the set of nodes, E is the set of edges, and A holds the node attributes.
Definition 1
(Heterogeneous Graph). A graph $G = (V, E)$ is considered heterogeneous if there exist mapping functions for nodes, $\Psi(\cdot): V \to \mathcal{T}$, and for edges, $\phi(\cdot): E \to \mathcal{R}$, such that the total number of node types and edge types is greater than two, i.e., $|\mathcal{T}| + |\mathcal{R}| > 2$.
In our specific context, the set of node types $\mathcal{T}$ includes Organizations (O), Patents (P), Inventors (I), IPC Classifications (C), and Assignees (A). The relationships connecting them, $\mathcal{R}$, are quite diverse. We define a set of six key relation types:
(1)
r1 (TransferOut): An organization transferring out a patent. A triplet (o, r1, p) means organization o was the seller of patent p.
(2)
r2 (TransferIn): The inverse, where an organization acquires a patent. So, (o, r2, p) indicates organization o obtained patent p.
(3)
r3 (InventedBy): The straightforward link between a patent and its creators, (p, r3, i).
(4)
r4 (BelongsTo): Connects a patent to its technical field via an IPC code, (p, r4, c).
(5)
r5 (OwnedBy): Represents the ownership link between a patent and its assignee, (p, r5, a).
(6)
r6 (Cites): The crucial citation link, where (pi, r6, pj) means patent i references patent j.
It is important to note that unlike traditional path-based models, our end-to-end TEAHG-EPR framework does not utilize pre-defined meta-paths during the training or inference phases. Instead, the R-GCN encoder implicitly captures high-order semantic dependencies through multi-hop message passing on the heterogeneous graph structure.

3.2.2. Attribute System

The raw graph structure is just one part of the story. The attributes of patents and organizations play a huge role in determining their relevance. After reviewing the literature [48] and consulting with domain experts, we’ve put together an attribute system that covers the technical, legal, and market dimensions.
For patents, we track 10 key indicators:
(1)
Technical Scope (A1): The number of unique 4-digit IPC classes a patent belongs to, which gives a sense of its technological breadth.
(2)
Number of Claims (A2): The total count of independent and dependent claims.
(3)
Patent Document Quality (A3): Measured by the total page count of the specification and claims.
(4)
Patent Competitiveness (A4): A composite score based on the strength of the assignee's patent portfolio; the detailed calculation is given below.
(5)
Technological Novelty (A5): A score calculated from the textual similarity of the patent's abstract to prior art; the detailed calculation is given below.
(6)
Grant Lag (A6): The time in years from the application date to the grant date.
(7)
Number of Inventors (A7): The size of the invention team.
(8)
Agent Status (A8): A binary variable indicating whether a patent agent was involved.
(9)
Forward Citations (A9): The number of times the patent is cited by later patents.
(10)
Backward Citations (A10): The number of prior art documents the patent cites.
To calculate patent competitiveness, C(P), we use the following formula:
$$C(P) = \frac{\sum_{a_i \in P_A,\; IPC_j \in P_I} F_{a_i, IPC_j}}{|P_A|}$$
Here, $P_A$ is the set of assignees for patent $P$, and $|P_A|$ is their count. $P_I$ is the set of all IPC codes. The term $F_{a_i} = [F_{a_i, IPC_1}, F_{a_i, IPC_2}, \ldots, F_{a_i, IPC_j}]$ is an assignee-IPC vector, where each element $F_{a_i, IPC_j}$ counts how many times assignee $a_i$ has patents in the $IPC_j$ class.
Technological novelty, $Nov_\alpha(P)$, is computed as:
$$Nov_\alpha(P) = e^{-\frac{p_m}{2}}$$
In this formula, we set the similarity threshold $\alpha$ to 0.3, and $p_m$ is the count of earlier patents whose textual similarity with patent $P$ is at least $\alpha$.
For organizations, we use three main attributes:
(1)
Geographic Region (A11): The organization’s location, encoded at the city level.
(2)
Organization Type (A12): Categorized as a company, university, research institute, or individual.
(3)
Patent Textual Features (A13): A vector representation derived from the abstracts of the organization’s historical patent portfolio.
All these attributes are collected into an attribute matrix $A \in \mathbb{R}^{m \times n}$, where $m$ is the total number of nodes and $n$ is the number of attribute dimensions. Each node $v_i$ thus has an associated attribute vector $A_{v_i} \in \mathbb{R}^n$.

3.2.3. Temporal Knowledge Graph

To capture the dynamic nature of the patent landscape, we structure our knowledge graph as a sequence of yearly snapshots. For a target prediction year $t$, we construct a series of graphs spanning the previous $L$ years up to the current year, giving the sequence $G_{seq} = (G^{(t-L)}, G^{(t-L+1)}, \ldots, G^{(t-1)}, G^{(t)})$.
Each graph snapshot $G^{(t)} = (V^{(t)}, E^{(t)}, A^{(t)})$ contains:
(1)
$V^{(t)}$: the set of nodes (entities) existing as of year $t$.
(2)
$E^{(t)}$: the set of edges (fact triples) existing as of year $t$.
(3)
$A^{(t)}$: the entity attribute matrix as of year $t$.
Consider a concrete example. Table 1 shows how the relationships of a patent, p1, evolve from 2019 to 2024. Some direct links, such as its inventors or its primary IPC class, tend to be static. The dynamic links, however, are where meaningful change occurs: over time, new patents cite p1, its inventors publish new work, and its technology may become associated with new IPC classes. These events cause the feature representations of the connected entities to change, which in turn drives the temporal evolution of p1's own attributes. By organizing data this way, our model can track these propagating effects through the network, building a robust, time-sensitive foundation for the feature extraction that follows.

3.3. Neighborhood Aggregation and Temporal Evolution Representation

This module is the core of our model's feature learning. Its job is to distill a rich, dynamic feature representation for each node, one that captures both its immediate structural environment and its historical trajectory. We break this down into two main steps.

3.3.1. Graph Neighborhood Aggregation Based on R-GCN

First, we must model the complex network structure within each yearly snapshot. The entities in our patent graph are interconnected through a variety of relationship types, forming a dense and intricate web. To effectively aggregate information from this heterogeneous neighborhood, we use a Relational Graph Convolutional Network (R-GCN) as our core aggregation component.
For any given node $v_i$, its feature representation $h_i^{(l)}$ at layer $l$ is updated according to the following rule:
$$h_i^{(l+1)} = \sigma\Big(\sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\Big)$$
This mechanism allows different relationship types to learn their own distinct semantic pathways for feature propagation, thanks to the relation-specific weight matrices $W_r^{(l)}$; this is key to preserving the graph's heterogeneity. In the formula, $\mathcal{N}_i^r$ is the set of neighbors of node $v_i$ under relation $r$, $c_{i,r} = |\mathcal{N}_i^r|$ is a normalization constant, and $W_0^{(l)}$ is the weight matrix for the node's self-loop. For the activation function $\sigma(\cdot)$, we use ReLU.
Deep GNNs can sometimes suffer from the over-smoothing problem [49], where node representations become too similar. To improve how information flows between layers and keep our representations distinct, we also introduce a gated residual connection mechanism:
$$h_i^{(l+1)} = z_i^{(l)} \odot h_i^{(l)} + (1 - z_i^{(l)}) \odot \tilde{h}_i^{(l+1)}$$
In which $\tilde{h}_i^{(l+1)}$ is the direct output of the R-GCN layer, and the gate vector $z_i^{(l)} = \sigma(W_z h_i^{(l)} + b_z)$ adaptively controls how much of the previous layer's information to carry over. The $\odot$ symbol denotes element-wise multiplication.
We run this R-GCN process independently on each graph snapshot $G^{(t)}$ in our temporal sequence $G_{seq}$. This gives us a series of node feature matrices, $H^{(t)} = \{h_1^{(t)}, h_2^{(t)}, \ldots, h_{|V|}^{(t)}\}$, where each $h_i^{(t)} \in \mathbb{R}^{d_s}$ is a static representation of node $i$ at year $t$, and $d_s$ is the feature dimension.
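To make the aggregation concrete, the following PyTorch sketch implements one R-GCN layer with the gated residual connection of Equations (3) and (4). It is a minimal illustration, not the paper's released code: the dense per-relation adjacency input and all dimension choices are our assumptions.

```python
import torch
import torch.nn as nn

class GatedRGCNLayer(nn.Module):
    """Minimal sketch of one R-GCN layer (Eq. (3)) with the gated
    residual connection (Eq. (4)). Input format is an assumption."""
    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        self.w_rel = nn.ModuleList(nn.Linear(dim, dim, bias=False)
                                   for _ in range(num_relations))
        self.w_self = nn.Linear(dim, dim, bias=False)   # W_0, self-loop
        self.gate = nn.Linear(dim, dim)                 # W_z, b_z

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h:   [N, dim] node features at layer l
        # adj: [R, N, N] one adjacency matrix per relation; the 1/c_{i,r}
        #      normalization is assumed to be baked into each adj[r]
        msg = self.w_self(h)
        for r, w in enumerate(self.w_rel):
            msg = msg + adj[r] @ w(h)           # relation-specific messages
        h_new = torch.relu(msg)                 # candidate h~_i^{(l+1)}
        z = torch.sigmoid(self.gate(h))         # gate z_i^{(l)}
        return z * h + (1 - z) * h_new          # gated residual update
```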
A critical challenge in patent recommendation is the “cold-start” problem for newly filed patents, which typically lack citation histories or transfer records. Our TEAHG-EPR framework addresses this through its attribute-driven heterogeneous graph structure. Even if a new patent node $p_{new}$ has no r6 (Cites) or r1/r2 (Transfer) edges at time $t$, it is immediately linked to existing entities via static relations: r3 (InventedBy) to inventors and r4 (BelongsTo) to IPC classes. Furthermore, its initial feature embedding $h_{p_{new}}^{(0)}$ is derived from its textual attributes (Section 3.2.2). Consequently, the R-GCN can aggregate information from these semantic neighbors (e.g., other patents by the same inventor or in the same IPC class) to generate a meaningful representation, enabling effective recommendation from the moment of filing.

3.3.2. Temporal Evolution Representation Based on Bidirectional LSTM

With a static representation for each node at each year, we now need to link these representations across time to understand how entities change. Constructing individual time series for every neighbor of every entity is computationally prohibitive, so we adopt a streamlined “aggregate-then-encode” strategy.
(1)
Building Relation-Specific Time Series
For any patent $p$ (or organization $o$), and for any time snapshot $\tau$ within our window ($t-L \le \tau \le t$), we first aggregate the features of all its neighbors under each specific relation type $r_i$. We use a simple but effective mean pooling operation to obtain a consolidated neighborhood representation for that relation at that point in time:
$$h_{p, r_i}^{(\tau)} = \mathrm{MeanPooling}\big(\{h_j^{(\tau)} \mid j \in N_{p, r_i}^{(\tau)}\}\big)$$
Here, $N_{p, r_i}^{(\tau)}$ is the set of neighbors of patent $p$ linked by relation $r_i$ at time $\tau$, and $h_j^{(\tau)}$ is the R-GCN-derived feature vector for neighbor $j$ at that time.
By repeating this for every snapshot in the time window $[t-L, t]$, we obtain a compact, information-rich time series for each relationship type connected to patent $p$:
$$S_{p, r_i} = [h_{p, r_i}^{(t-L)}, h_{p, r_i}^{(t-L+1)}, \ldots, h_{p, r_i}^{(t)}] \in \mathbb{R}^{(L+1) \times d_s}$$
The length of this sequence is $L + 1$, and each element has dimension $d_s$.
(2)
Encoding Temporal Patterns with Bi-LSTM
To capture the temporal dependencies within each of these relation-specific sequences, we feed each $S_{p, r_i}$ into a Bidirectional Long Short-Term Memory (Bi-LSTM) network, as illustrated in Figure 3. A Bi-LSTM is well suited here because it learns from both past-to-future and future-to-past contexts, giving a much richer view of the evolutionary trends:
$$A_{r_i}(p) = \mathrm{BiLSTM}(S_{p, r_i})$$
The output of the Bi-LSTM is a sequence of concatenated forward and backward hidden states. We take the final hidden state as a summary representation of the temporal evolution for that specific relational aspect. This summary vector has dimension $2d_h$, where $d_h$ is the hidden dimension of the LSTM. To standardize dimensions for the next stage, we pass it through a final linear projection layer:
$$A_{r_i}(p) = \mathrm{Linear}\big(\mathrm{BiLSTM}(S_{p, r_i})_{\mathrm{last}}\big) \in \mathbb{R}^{d_p}$$
The underlying forward pass of the LSTM is defined by the standard set of equations:
$$\begin{aligned} f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) \\ i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) \\ \tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) \\ c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\ o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) \\ h_t &= o_t \odot \tanh(c_t) \end{aligned}$$
The backward LSTM is computed analogously, and the hidden state at each step is the concatenation of the two directions: $h_t = [\overrightarrow{h}_t; \overleftarrow{h}_t]$.
(3)
Aggregating Multi-Dimensional Temporal Features
At this point, we have a set of temporal evolution vectors, $A_{r_i}(p)$, one for each of the $m$ relationship types connected to patent $p$. We stack these vectors to form a multi-dimensional temporal evolution attribute matrix:
$$A(p) = \mathrm{Stack}[A_{r_1}(p); A_{r_2}(p); \ldots; A_{r_m}(p)] \in \mathbb{R}^{m \times d_p}$$
We follow the same procedure to obtain a corresponding matrix $A(o)$ for any organization $o$. The resulting matrix $A(p)$ concisely encodes the historical dynamics of patent $p$ across the semantic dimensions defined by the relation types, and it becomes a crucial input for the fine-grained information aggregation module discussed next.
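A compact sketch of this “aggregate-then-encode” step (Equations (5) through (8)) is given below. It assumes the per-snapshot R-GCN features are already computed; tensor shapes follow the section above, while class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class RelationTemporalEncoder(nn.Module):
    """Sketch of the aggregate-then-encode strategy for one relation type:
    mean-pool the relation's neighbors per snapshot, encode the resulting
    series with a Bi-LSTM, and project the last hidden state to d_p."""
    def __init__(self, d_s: int, d_h: int, d_p: int):
        super().__init__()
        self.bilstm = nn.LSTM(d_s, d_h, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * d_h, d_p)   # map [h_fwd; h_bwd] -> d_p

    def forward(self, neighbor_feats: list[torch.Tensor]) -> torch.Tensor:
        # neighbor_feats[tau]: [n_neighbors_tau, d_s], the R-GCN features
        # of entity p's relation-r_i neighbors at snapshot tau
        series = torch.stack([f.mean(dim=0) for f in neighbor_feats])  # [L+1, d_s]
        out, _ = self.bilstm(series.unsqueeze(0))   # [1, L+1, 2*d_h]
        return self.proj(out[:, -1]).squeeze(0)     # A_{r_i}(p), shape [d_p]

# Stacking one such output per relation type yields A(p) in R^{m x d_p}.
```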

3.4. The Uncertainty-Aware Fine-Grained Recommendation Module

This is where our model moves from feature learning to actual recommendation. Rather than relying on a standard similarity score, we break from the tradition of deterministic embeddings and introduce uncertainty modeling with multi-dimensional Gaussian distributions, a first in patent recommendation. The aim is to capture both the core relevance of an entity and its potential for technological exploration. The detailed workflow of this module is depicted in Figure 4.

3.4.1. Fine-Grained Information Aggregation and Multi-Head Attention Mechanism

Following the extraction of temporal evolution features, the next step is intelligent fusion. Historical information varies in significance depending on context: for a deep-tech company, the track record of an inventor might be the most telling signal, whereas for a market-driven firm, the diversity of IPC classes might matter most [50]. To capture these subtle, context-dependent importances, we employ a multi-head attention mechanism.
First, let’s define the two key inputs for our attention mechanism:
(1)
Body Representation: For a patent $p$, we take its feature vector from the final R-GCN layer applied to the latest time snapshot, $G^{(t)}$. We call this $H(p) \in \mathbb{R}^{d_p}$. This vector primarily encodes the patent's static neighborhood structure at the current moment; it can be viewed as the patent's core identity.
(2)
Temporal Evolving Attribute Representation: For the same patent $p$, we use the temporal evolution feature matrix $A(p) \in \mathbb{R}^{m \times d_p}$ constructed in the previous section. This matrix holds the dynamic history of its $m$ different types of neighbors (inventors, IPCs, etc.) over the observed time window.
Now, we use the multi-head attention mechanism to let the patent’s core identity “query” its own history, figuring out which parts of its past are most relevant to its present. Specifically, we set the body representation H(p) as the Query, and the temporal evolution matrix A(p) as both the Key and the Value. This allows us to compute a weighted summary of its attributes:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h) W^O$$
where each head is computed as:
$$\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V)$$
and the attention function itself is the standard scaled dot-product attention:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\Big(\frac{Q K^T}{\sqrt{d_k}}\Big) V$$
In these equations, $h$ is the number of attention heads, the $W$ matrices are the learnable projection matrices for each head, and $d_k = d_p / h$ is the dimension of each head. Specifically for a patent $p$, the inputs are set up as:
$$Q = H(p) \in \mathbb{R}^{1 \times d_p}, \qquad K = V = A(p) \in \mathbb{R}^{m \times d_p}$$
The output of this process is a single, contextually weighted attribute vector $M(p) \in \mathbb{R}^{d_p}$. This vector adaptively combines the temporal information from different relational aspects according to their contribution to the patent's core identity.
To get our final, unified representation, we concatenate the patent's original body representation $H(p)$ with this new weighted attribute vector $M(p)$ and pass it through a Feed-Forward Network (FFN):
$$I(p) = \mathrm{FFN}([H(p); M(p)])$$
$$\mathrm{FFN}(x) = \mathrm{ReLU}(x W_1 + b_1) W_2 + b_2$$
Here, the $W$'s and $b$'s are learnable weights and biases. We follow the same procedure to obtain a final fused representation $I(o) \in \mathbb{R}^{d_o}$ for any organization $o$.
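The fusion described by Equations (11) through (16) can be sketched as follows, using PyTorch's built-in multi-head attention. The head count and module layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FineGrainedFusion(nn.Module):
    """Sketch of the fine-grained fusion: the body representation H(p)
    queries the temporal attribute matrix A(p) (Query = H, Key = Value = A),
    and the two are concatenated and fused by an FFN to give I(p)."""
    def __init__(self, d_p: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_p, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(2 * d_p, d_p), nn.ReLU(),
                                 nn.Linear(d_p, d_p))

    def forward(self, H: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        # H: [d_p] body representation; A: [m, d_p] temporal attributes
        q = H.view(1, 1, -1)            # Query  = H(p)
        kv = A.unsqueeze(0)             # Key = Value = A(p)
        M, _ = self.attn(q, kv, kv)     # weighted attribute vector M(p)
        return self.ffn(torch.cat([H, M.view(-1)], dim=-1))  # I(p)
```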

3.4.2. Node Uncertainty Modeling Based on Multi-Dimensional Gaussian Distribution

Now that we have a powerful deterministic feature vector $I(v_i)$ for each node, we can add another layer of sophistication. A single point-wise embedding cannot capture uncertainty. For example, the application boundaries of a foundational patent are often fuzzy, and a company's future technological needs are rarely set in stone. To model this, we introduce multi-dimensional Gaussian distributions.
Standard graph embedding methods map each entity to a fixed point in a vector space. However, this implies infinite precision and certainty, which is ill-suited to the dynamic and often ambiguous nature of patent data. For instance, the protection scope of a patent's claims is a region rather than a point, and an organization's future R&D interest is a fuzzy boundary rather than a specific target. To address this theoretical limitation, we adopt a probabilistic embedding strategy. We acknowledge that raw patent metrics (e.g., citation counts) often exhibit power-law or heavy-tailed distributions. However, our goal is not to fit the global topological distribution, but to model the local semantic uncertainty of individual entities in a latent representation space. The multivariate Gaussian distribution $N(\mu_{v_i}, \Sigma_{v_i})$ is the maximum entropy distribution for continuous variables given a mean and variance, making it the most robust and unbiased choice for modeling local uncertainty when the true latent distribution is unknown. Furthermore, the learned non-linear projection layers (Equations (17) and (18)) transform the potentially skewed input features into a latent manifold where the Gaussian approximation is numerically stable. Instead of representing a node $v_i$ as a single vector, we model it as a full probability distribution $N(\mu_{v_i}, \Sigma_{v_i})$, with a mean vector $\mu_{v_i} \in \mathbb{R}^d$ and a covariance matrix $\Sigma_{v_i} \in \mathbb{R}^{d \times d}$. This design mathematically decouples the “core technical identity” (the mean $\mu_{v_i}$) from the “potential exploration scope” (the covariance $\Sigma_{v_i}$), allowing the model to perform “soft matching”: identifying patents that may not perfectly align with the core keywords but fall within the organization's tolerable exploration radius. Crucially, we generate the parameters of this distribution directly from the deterministic feature representation, using a simple projection network to map the feature vector $I(v_i)$ to the Gaussian parameters:
$$\mu_{v_i} = I(v_i) W_\mu + b_\mu$$
$$\log \sigma_{v_i}^2 = I(v_i) W_\Sigma + b_\Sigma$$
where the $W$'s and $b$'s are learnable parameters. We predict the log-variance to ensure that the variance is always non-negative. With this, every node has a probabilistic representation $N(\mu_{v_i}, \mathrm{diag}(\sigma_{v_i}^2))$ that is centered on its deep temporal features while also capturing its inherent uncertainty.
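A minimal sketch of this projection (Equations (17) and (18)) is shown below; the layer names are ours.

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Sketch of the Gaussian projection: map the deterministic vector
    I(v) to the parameters of a diagonal Gaussian N(mu, diag(sigma^2)).
    Predicting the log-variance keeps the variance strictly positive."""
    def __init__(self, d: int):
        super().__init__()
        self.to_mu = nn.Linear(d, d)       # W_mu, b_mu
        self.to_logvar = nn.Linear(d, d)   # W_Sigma, b_Sigma

    def forward(self, I: torch.Tensor):
        mu = self.to_mu(I)
        logvar = self.to_logvar(I)         # log sigma^2, unconstrained
        return mu, logvar.exp()            # (mu, sigma^2)
```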

3.4.3. Organization-Patent Similarity Calculation and Recommendation List Generation

With both a deterministic feature vector $I(\cdot)$ and a probabilistic distribution $N(\cdot)$ for each entity, we can now design a hybrid similarity metric. The goal is to capture both the core relevance and the potential matching possibility between an organization and a patent.
(1)
Deterministic Core Relevance Measurement
The deterministic vectors $I(o)$ and $I(p)$ represent the most likely positions of the organization and patent in the feature space, capturing their core, unambiguous technical identities. We measure this with cosine similarity, which captures the alignment of vector directions in high-dimensional space:
$$sim_{\det}(o, p) = \frac{I(o) \cdot I(p)}{\|I(o)\| \, \|I(p)\|}$$
A high $sim_{\det}$ score indicates a direct, strong match between the organization's current technology stack and the patent's core subject matter.
(2)
Probabilistic Potential Matching Measurement
The probabilistic distributions $N_o$ and $N_p$, through their covariance matrices, capture the uncertainty and technological reach of each entity. An organization's distribution might cover its potential future R&D directions, while a patent's distribution could represent its various potential application domains. To measure how well these distributions align, we use the Kullback–Leibler (KL) divergence, transformed into a similarity score.
The KL divergence, $D_{KL}(N_o \,\|\, N_p)$, measures the information lost when using the patent's distribution $N_p$ to approximate the organization's needs distribution $N_o$; smaller values are better. Its asymmetry has an intuitive meaning here: a technologically broad organization (large variance) matched with a technologically specific patent (small variance) yields a small $D_{KL}$, while the reverse pairing would be a poor fit and yield a large $D_{KL}$.
We convert this distance into a similarity score using a negative exponential function:
$$sim_{\mathrm{prob}}(o, p) = \exp(-\lambda \, D_{KL}(N_o \,\|\, N_p))$$
The full formula for the KL divergence between two diagonal Gaussian distributions is:
$$D_{KL}(N_o \,\|\, N_p) = \frac{1}{2}\left(\sum_{i=1}^{d}\left(\log \frac{\sigma_{p,i}^2}{\sigma_{o,i}^2} + \frac{\sigma_{o,i}^2 + (\mu_{o,i} - \mu_{p,i})^2}{\sigma_{p,i}^2}\right) - d\right)$$
In which $d$ is the dimensionality of the feature space, and the temperature parameter $\lambda > 0$ controls the sensitivity of the score to the KL distance. This $sim_{\mathrm{prob}}$ score excels at uncovering “high-potential” patents that may not be a perfect core match but fall within the organization's exploratory boundaries.
(3)
Hybrid Recommendation Score and List Generation
The final recommendation score is a weighted combination of these two metrics, creating a system that benefits from both perspectives:
$$score(o, p) = (1 - \alpha) \, sim_{\det}(o, p) + \alpha \, sim_{\mathrm{prob}}(o, p)$$
The hyperparameter α ∈ [0, 1] allows us to balance the importance of core relevance versus potential matching. For example, a company looking for disruptive technologies might want a higher α, while one focused on incremental improvements might prefer a lower one.
For any target organization $o$, we calculate this hybrid score against all patents in the candidate pool $P_{\mathrm{candidate}}$, rank them in descending order, and take the top $K$ to form the final recommendation list:
$$\mathrm{RecList}(o) = \mathrm{TopK}_{p \in P_{\mathrm{candidate}}}\big(score(o, p)\big)$$
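The hybrid scoring pipeline (Equations (19) through (23)) can be sketched as follows; the values of alpha and lambda shown are placeholders rather than tuned settings.

```python
import torch
import torch.nn.functional as F

def kl_diag_gauss(mu_o, var_o, mu_p, var_p):
    """D_KL(N_o || N_p) for diagonal Gaussians (Eq. (21))."""
    return 0.5 * (torch.log(var_p / var_o)
                  + (var_o + (mu_o - mu_p) ** 2) / var_p - 1.0).sum(-1)

def hybrid_scores(I_o, mu_o, var_o, I_p, mu_p, var_p, alpha=0.3, lam=0.1):
    """Sketch of the hybrid score (Eqs. (19)-(22)).

    I_o: [d] organization vector; I_p: [P, d] for P candidate patents
    (mu/var shaped likewise). alpha and lam are illustrative values.
    """
    sim_det = F.cosine_similarity(I_o.unsqueeze(0), I_p, dim=-1)          # [P]
    sim_prob = torch.exp(-lam * kl_diag_gauss(mu_o, var_o, mu_p, var_p))  # [P]
    return (1 - alpha) * sim_det + alpha * sim_prob

# Top-K list for one organization (Eq. (23)):
#   scores = hybrid_scores(...); top_vals, top_idx = scores.topk(K)
```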

3.5. Feature-Based Explainable Text Generation

A good recommendation isn’t enough; we need to explain why it’s good. This section details our module for generating explanations, which is designed to be both controllable and deeply tied to the specifics of each recommendation.

3.5.1. Construction of Patent Technical Feature Document

To enhance the explainability of our recommendations, we draw inspiration from the concept of citation context [46] and construct technical feature documents for both organizations and patents.
For a recommended patent p, we aggregate the text from its specification (including sections like “Background,” “Summary of the Invention,” etc.) to form its technical feature document, denoted as Tp. This document directly reflects the patent’s technical details and innovative contributions.
Similarly, for a target organization o, we collect and combine the specification texts from all patents in its historical portfolio. This collection forms the organization’s technical feature document, denoted as Qo, which provides a comprehensive view of its technological foundation and research trajectory.
Once these documents are constructed, we apply standard text preprocessing steps, including tokenization, stop-word removal, and stemming, to obtain the clean word sequences needed for the subsequent feature extraction phase.

3.5.2. Extraction of Technical Characteristic Words Based on PMI

With our feature documents in hand, the next challenge is to distill the most representative keywords from them. We need to find the terms that best connect an organization to a recommended patent. For this, we use Pointwise Mutual Information (PMI) [51]. PMI is a great tool for this job because it measures the statistical association between two words. A high PMI score suggests they co-occur more often than by chance.
Given an organization’s feature document Qo and a patent’s feature document Tp, we first identify a set of candidate feature words, Fo and Fp, by filtering for high-frequency terms that appear in both documents. Then, for any pair of feature words (fo, fp), we calculate their PMI as:
$$PMI(f_o, f_p) = \log \frac{p(f_o, f_p)}{p(f_o)\, p(f_p)} = \log \frac{p(f_o \mid f_p)}{p(f_o)}$$
In which $p(f_o, f_p)$ is the joint probability of the two words appearing together, while $p(f_o)$ and $p(f_p)$ are their individual marginal probabilities. In practice, we estimate these probabilities from word counts within a co-occurrence window:
$$p(f_o) = \frac{count(f_o)}{\sum_{f \in F_o} count(f)}, \qquad p(f_p) = \frac{count(f_p)}{\sum_{f \in F_p} count(f)}, \qquad p(f_o, f_p) = \frac{count(f_o, f_p)}{\sum_{(f, f') \in F_o \times F_p} count(f, f')}$$
To find the single best explanatory feature word $f_p^*$ from the patent's feature set $F_p$, we select the word with the highest total PMI against all feature words in the organization's set $F_o$:
$$f_p^* = \arg\max_{f_p \in F_p} PMI(F_o, f_p), \qquad PMI(F_o, f_p) = \sum_{f_o \in F_o} PMI(f_o, f_p)$$
This sum is an approximation based on an independence assumption. We then select the top $N$ words with the highest scores to serve as the guiding features for our text generation model.
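The following sketch illustrates the PMI-based selection (Equations (24) through (26)). How the co-occurrence counts are collected (window size, candidate filtering) is left abstract, and the helper names are ours.

```python
import math
from collections import Counter

def top_explanatory_words(org_words, pat_words, cooc: Counter, n_top=5):
    """Sketch of PMI-based keyword selection: score each patent feature
    word by its total PMI against the organization's feature-word set,
    then return the n_top highest-scoring words.

    `cooc` maps (f_o, f_p) pairs to co-occurrence-window counts; building
    those windows is an implementation detail omitted here.
    """
    count_o, count_p = Counter(org_words), Counter(pat_words)
    total_o, total_p = sum(count_o.values()), sum(count_p.values())
    total_co = sum(cooc.values()) or 1

    def pmi(f_o, f_p):
        p_joint = cooc[(f_o, f_p)] / total_co
        if p_joint == 0:
            return 0.0  # practical guard for never-co-occurring pairs
        return math.log(p_joint / ((count_o[f_o] / total_o)
                                   * (count_p[f_p] / total_p)))

    scored = {f_p: sum(pmi(f_o, f_p) for f_o in count_o) for f_p in count_p}
    return sorted(scored, key=scored.get, reverse=True)[:n_top]
```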

3.5.3. Explanation Text Generation Based on GFRU

Standard text generation models such as RNNs, LSTMs, and GRUs can produce fluent sentences, but they often lack controllability; it is hard to ensure they stay on a specific topic. To solve this, we use a Gated Feature Recurrent Unit (GFRU) model. The key advantage of the GFRU is a parallel mechanism specifically designed to incorporate external feature words, which makes the generated explanations far more accurate and trustworthy.
The GFRU process unfolds in two stages: encoding and decoding.
(1)
Encoding Stage: Initializing the Decoder State
Before we start generating any text, we need to give the decoder a good starting point—an initial hidden state that’s rich with context. To ensure that the generated explanation is not merely a post hoc rationalization but is deeply rooted in the recommendation logic, we strictly condition the generator’s initialization on the specific entity representations learned by the recommendation module:
$$h_0 = \tanh(W_e [I(o); I(p)] + b_e)$$
Here, $I(o)$ and $I(p)$ are the deterministic feature representations from the previous section. The weights $W_e \in \mathbb{R}^{2d_o \times d_h}$ and bias $b_e \in \mathbb{R}^{d_h}$ are learnable parameters, and $d_h$ is the hidden state dimension. This initial state $h_0 \in \mathbb{R}^{d_h}$ encodes the core semantic understanding of the recommendation, giving the generation process a clear, relevant direction from the very beginning. This structural coupling ensures that the gradients from the explanation loss back-propagate to update the shared node embeddings $I(o)$ and $I(p)$ during the joint training process (detailed in Section 3.6). Thus, the explanation generation is intrinsically dependent on the exact feature states that determined the recommendation score.
(2)
Decoding Stage: Controllable Sequence Generation
During decoding, the GFRU needs three inputs at each time step: the previously generated word, the previous hidden state, and a guiding “feature topic vector.”
Let us first define this key feature topic vector. From the previous section, we have the top $N$ most relevant technical keywords $\{f_1, f_2, \ldots, f_N\}$. We take the pre-trained word embeddings of these keywords and aggregate them, for instance by mean pooling, to create a single feature topic vector:
$$x_f = \mathrm{MeanPooling}(\{\mathrm{Emb}(f_1), \mathrm{Emb}(f_2), \ldots, \mathrm{Emb}(f_N)\})$$
This vector xf captures the semantic essence of the core technical reason for the recommendation. It will be fed into the GFRU at every decoding step to keep the generation process anchored to these key concepts.
With all inputs defined, at each time step $n$ the GFRU takes the previous word's embedding $x_{n-1}$, the previous hidden state $h_{n-1}$, and the feature topic vector $x_f$ to produce the current hidden state $h_n$:
$$h_n = \mathrm{GFRU}(x_{n-1}, h_{n-1}, x_f)$$
The GFRU itself is composed of three sub-modules that work together:
(1)
The Context GRU: a standard GRU that takes the previous word $x_{n-1}$ and hidden state $h_{n-1}$ to produce a new candidate hidden state $h_n^\alpha$. Its main job is to keep the output sentence grammatically correct and fluent.
$$\begin{aligned} z_n^\alpha &= \sigma(W_z^\alpha [x_{n-1}; h_{n-1}] + b_z^\alpha) \\ r_n^\alpha &= \sigma(W_r^\alpha [x_{n-1}; h_{n-1}] + b_r^\alpha) \\ \tilde{h}_n^\alpha &= \tanh(W_h^\alpha [x_{n-1}; r_n^\alpha \odot h_{n-1}] + b_h^\alpha) \\ h_n^\alpha &= z_n^\alpha \odot h_{n-1} + (1 - z_n^\alpha) \odot \tilde{h}_n^\alpha \end{aligned}$$
(2)
The Feature GRU: a parallel GRU that, instead of the previous word, takes the constant feature topic vector $x_f$ as input at every step, along with the previous hidden state $h_{n-1}$. Its job is to produce a candidate hidden state $h_n^\beta$ infused with the semantics of our keywords.
$$\begin{aligned} z_n^\beta &= \sigma(W_z^\beta [x_f; h_{n-1}] + b_z^\beta) \\ r_n^\beta &= \sigma(W_r^\beta [x_f; h_{n-1}] + b_r^\beta) \\ \tilde{h}_n^\beta &= \tanh(W_h^\beta [x_f; r_n^\beta \odot h_{n-1}] + b_h^\beta) \\ h_n^\beta &= z_n^\beta \odot h_{n-1} + (1 - z_n^\beta) \odot \tilde{h}_n^\beta \end{aligned}$$
Its structure is identical to the context GRU, but its parameters are independent.
(3)
The Gated Fusion Unit (GFU): the unit that performs the dynamic fusion. The GFU takes the outputs of both GRUs, $h_n^\alpha$ and $h_n^\beta$, and dynamically decides how much weight to give each.
$$\begin{aligned} \bar{h}_n^\alpha &= \tanh(W_\alpha h_n^\alpha) \\ \bar{h}_n^\beta &= \tanh(W_\beta h_n^\beta) \\ \phi_n &= \sigma(W_\phi [\bar{h}_n^\alpha; \bar{h}_n^\beta] + b_\phi) \\ h_n &= (1 - \phi_n) \odot \bar{h}_n^\alpha + \phi_n \odot \bar{h}_n^\beta \end{aligned}$$
The gate vector $\phi_n$ acts as a learned switch. When its values are close to 0, the final hidden state $h_n$ relies on the Context GRU, generating fluent, connective words; when they are close to 1, it switches to the Feature GRU, injecting our keywords into the sentence. This dynamic balancing ensures the final text is both readable and factually grounded in the extracted features.
Finally, to predict the next word, we pass the final hidden state $h_n$ through a softmax layer over the vocabulary:
$$p(y_n \mid y_{<n}, h_0) = \mathrm{softmax}(W_v h_n + b_v)$$
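A minimal sketch of one GFRU decoding step (Equations (30) through (32)) is given below. For brevity it composes the two sub-GRUs from torch.nn.GRUCell building blocks rather than writing the gates out explicitly as the equations do; parameter names are illustrative.

```python
import torch
import torch.nn as nn

class GFRU(nn.Module):
    """Sketch of the GFRU cell: a context GRU (consumes the previous word)
    and a feature GRU (consumes the topic vector x_f) run in parallel,
    and a learned gate phi fuses their transformed hidden states."""
    def __init__(self, d_emb: int, d_h: int):
        super().__init__()
        self.ctx_gru = nn.GRUCell(d_emb, d_h)    # context branch (Eq. (30))
        self.feat_gru = nn.GRUCell(d_emb, d_h)   # feature branch (Eq. (31))
        self.w_a = nn.Linear(d_h, d_h)           # W_alpha
        self.w_b = nn.Linear(d_h, d_h)           # W_beta
        self.w_phi = nn.Linear(2 * d_h, d_h)     # W_phi, b_phi

    def forward(self, x_prev, h_prev, x_f):
        # x_prev, x_f: [B, d_emb]; h_prev: [B, d_h]
        h_a = torch.tanh(self.w_a(self.ctx_gru(x_prev, h_prev)))
        h_b = torch.tanh(self.w_b(self.feat_gru(x_f, h_prev)))
        phi = torch.sigmoid(self.w_phi(torch.cat([h_a, h_b], dim=-1)))
        return (1 - phi) * h_a + phi * h_b       # fused hidden state h_n
```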

3.5.4. Loss Function for Explanation Generation

To train this module, we use a standard cross-entropy loss function. Given a ground-truth explanation sequence $\{y_1, y_2, \ldots, y_{|S_{o,p}|}\}$, the model's objective is to minimize the difference between its predicted word probabilities and the actual words in the sequence:
$$\mathcal{L}_e = -\frac{1}{|T|} \sum_{(o, p) \in T} \frac{1}{|S_{o,p}|} \sum_{n=1}^{|S_{o,p}|} \log p(y_n \mid y_{<n}, h_0)$$
In which $T$ is the set of organization-patent pairs in our training set.

3.6. Model Training and Optimization

3.6.1. Joint Training Strategy

Our TEAHG-EPR model isn’t just a pipeline; it’s an integrated system with two core tasks: recommendation and explanation generation. To make them work together, we use a joint training strategy that optimizes for both recommendation accuracy and explanation quality at the same time.
For the recommendation task, we use the Bayesian Personalized Ranking (BPR) loss [52], a classic choice that’s well-suited for learning from the kind of implicit feedback we have (e.g., patents an organization has acquired).
$$\mathcal{L}_r = -\sum_{(o, p^+, p^-) \in D_r} \log \sigma\big(score(o, p^+) - score(o, p^-)\big)$$
In this setup, $D_r$ is our training set of triplets. Each triplet $(o, p^+, p^-)$ consists of an organization $o$, a “positive” patent $p^+$ that it has a known connection to, and a “negative” patent $p^-$ that it does not. The BPR loss simply pushes the model to score the positive patent higher than the negative one.
For the explanation generation task, as we’ve already discussed, the loss function is the standard cross-entropy loss, L e .
We then combine these two into a single joint loss function:
$$\mathcal{L}_{total} = \mu \mathcal{L}_r + \omega \mathcal{L}_e + \gamma \|\Theta\|_2^2$$
Here, $\mu$ and $\omega$ are balancing hyper-parameters that control the relative importance of the recommendation and explanation tasks. We also include a standard L2 regularization term over all model parameters $\Theta$, weighted by $\gamma$, to help prevent overfitting.
This joint training approach is powerful because it allows the two tasks to inform each other. A good recommendation provides a solid foundation for a good explanation. But in a rather elegant feedback loop, the signals from the explanation loss can also guide the recommendation module to learn more semantically meaningful feature representations.
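The joint objective (Equations (34) through (36)) can be sketched as follows; the loss weights shown are placeholders rather than the paper's tuned values.

```python
import torch
import torch.nn.functional as F

def joint_loss(score_pos, score_neg, expl_logits, expl_targets,
               params, mu=1.0, omega=0.5, gamma=1e-5):
    """Sketch of the joint objective: BPR ranking loss (Eq. (35)) plus
    token-level cross-entropy (Eq. (34)) plus L2 regularization (Eq. (36)).

    score_pos/score_neg: [B] hybrid scores for positive/negative patents.
    expl_logits: [B, T, V] decoder outputs; expl_targets: [B, T] word ids.
    mu, omega, gamma are illustrative weights.
    """
    l_r = -F.logsigmoid(score_pos - score_neg).mean()    # BPR ranking loss
    l_e = F.cross_entropy(expl_logits.flatten(0, 1),     # per-token CE
                          expl_targets.flatten())
    l2 = sum(p.pow(2).sum() for p in params)             # ||Theta||_2^2
    return mu * l_r + omega * l_e + gamma * l2
```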

3.6.2. Optimization Algorithms and Hyperparameter Settings

To update the model’s parameters, we use the Adam optimizer. It’s a go-to choice for this kind of work because it combines the benefits of momentum and adaptive learning rates, making it effective for handling the sparse gradients and non-stationary objectives common in deep learning. The update rules for Adam are as follows:
$$\begin{aligned} m_t &= \beta_1 m_{t-1} + (1 - \beta_1) \nabla_\Theta \mathcal{L}_{total} \\ v_t &= \beta_2 v_{t-1} + (1 - \beta_2) (\nabla_\Theta \mathcal{L}_{total})^2 \\ \hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\ \Theta_{t+1} &= \Theta_t - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \varepsilon} \end{aligned}$$
We use the standard decay rates of $\beta_1 = 0.9$ and $\beta_2 = 0.999$, with a small $\varepsilon = 10^{-8}$ for numerical stability.
For the learning rate schedule, we employ a cosine annealing strategy. This starts with a relatively high learning rate to converge quickly in the early stages and then gradually decreases it, allowing for fine-tuning as the model gets closer to a solution. The learning rate ηt at any step t is calculated as:
$$\eta_t = \eta_{\min} + \frac{1}{2} (\eta_{\max} - \eta_{\min}) \Big(1 + \cos\Big(\frac{t}{T} \pi\Big)\Big)$$
We set the initial learning rate $\eta_{\max}$ to $10^{-3}$ and the minimum $\eta_{\min}$ to $10^{-5}$, where $T$ is the total number of training iterations.
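In PyTorch, this optimizer and schedule combination can be set up as sketched below; the model placeholder and iteration count are illustrative.

```python
import torch

# Minimal sketch of the Section 3.6.2 setup: Adam (beta1=0.9, beta2=0.999,
# eps=1e-8) with cosine annealing from eta_max=1e-3 down to eta_min=1e-5.
model = torch.nn.Linear(8, 8)   # placeholder standing in for TEAHG-EPR
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), eps=1e-8)
T = 10_000                      # total training iterations (illustrative)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=T, eta_min=1e-5)

for step in range(T):
    # forward pass, joint loss, and loss.backward() would go here
    optimizer.step()            # update with the current learning rate
    scheduler.step()            # advance the cosine schedule
    optimizer.zero_grad()
```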
We set our batch size to 128. If we run into GPU memory constraints, we can use gradient accumulation, where we compute gradients for several smaller batches and only perform a parameter update after accumulating them. To prevent overfitting, we also use an early stopping strategy, monitoring the NDCG@10 on a validation set and halting training if there’s no improvement for 20 consecutive epochs.
The main hyperparameters for our model are summarized in Table 2.

3.6.3. Training Process

The complete training process for the TEAHG-EPR model is laid out in Algorithm 1. It shows how we orchestrate the different parts of the model within a single end-to-end training loop.
Algorithm 1: TEAHG-EPR Model Training Algorithm
Input: Temporal patent knowledge graph sequence $G_{seq}$, Training set $D_{train}$ = {(o, p+) | o has an interaction with p+}, Validation set $D_{val}$, Hyperparameter set $\Theta_{config}$.
Output: Trained model parameters $\Theta^*$.
1: Initialize model parameters $\Theta$
2: // Main training loop begins
3: for epoch = 1 to max_epochs do
4:    // Recommendation Module Training
5:    for each mini-batch $B_r$ in $D_{train}$ do
6:       For each positive pair (o, p+), sample a negative patent p− to form triplets {(o, p+, p−)}
7:       // Step 1: Extract Deterministic Features
8:       Apply R-GCN to each graph snapshot in $G_{seq}$ to get yearly node features {H(t)}
9:       Use Bi-LSTM to extract temporal evolution features A(o), A(p+), A(p−) for all entities in the triplets
10:     Aggregate fine-grained information via multi-head attention to get M(o), M(p+), M(p−)
11:     Concatenate and fuse to compute the final deterministic feature representations I(o), I(p+), I(p−)
12:     // Step 2: Generate Probabilistic Representations and Compute Scores
13:     Project the deterministic features I(.) to Gaussian distribution parameters (μ(.), Σ(.))
14:     Calculate deterministic similarities simdet(o, p+) and simdet(o, p−)
15:     Calculate probabilistic similarities simprob(o, p+) and simprob(o, p−)
16:     Compute the final recommendation scores score(o, p+) and score(o, p−)
17:     // Step 3: Compute Recommendation Loss
18:     Calculate the recommendation loss $\mathcal{L}_r$ using the BPR loss function
19:   end for
20:
21:   // Explanation Generation Module Training
22:   for each mini-batch $B_e$ in $D_{train}$ do
23:     For each positive pair (o, p+), extract its technical feature words
24:     Use the already computed I(o), I(p+) as the initial input for the GFRU
25:     Generate the explanation text with the GFRU and compute the cross-entropy loss $\mathcal{L}_e$
26:   end for
27:
28:   // Joint Optimization
29:   Compute the total loss $\mathcal{L}_{total} = \mu \mathcal{L}_r + \omega \mathcal{L}_e + \gamma \|\Theta\|_2^2$
30:   Perform a single backpropagation step and update all shared parameters $\Theta$ using the Adam optimizer
31:
32:   // Validation and Early Stopping
33:   Evaluate recommendation performance on the validation set $D_{val}$
34:   if performance has not improved for a specified number of epochs then
35:     break
36:   end if
37:   Update learning rate (cosine annealing)
38: end for
39:
40: return trained model parameters Θ*

3.7. Personalization and Diversity Optimization

3.7.1. Personalized Recommendation Based on Organizational Clustering

Getting a ranked list is just the first step. We also want the recommendations to be personalized and diverse.
Different types of organizations have vastly different needs. A university might be looking for foundational, research-oriented patents, while a company is likely more interested in applied technologies with high market value. To add a layer of personalization, we perform a post-recommendation clustering of the target organizations based on non-accuracy metrics.
We use two main metrics for this:
(1)
Intra-List Similarity (IntraSim): This measures the average similarity within a recommended list, giving us a handle on its diversity:
$$\mathrm{IntraSim}(o) = \frac{2}{K(K-1)} \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} sim(p_i, p_j)$$
Here, $K$ represents the length of the recommendation list, $p_i$ and $p_j$ are patents in the list, and $sim(p_i, p_j)$ is the cosine similarity between the patents (as in Equation (19)). The lower the IntraSim value, the higher the recommendation diversity.
(2)
Popularity: This measures the average popularity of the recommended patents, indicating how much the model leans towards mainstream or niche items.
$$\mathrm{Popularity}(o) = \frac{1}{K} \sum_{i=1}^{K} \log(1 + freq(p_i))$$
Here, $freq(p_i)$ is defined as the forward citation count of patent $p_i$. A lower Popularity score means the recommendations are less mainstream.
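Both metrics (Equations (39) and (40)) are straightforward to compute; a minimal sketch follows, with the patent embeddings and citation counts assumed given.

```python
import math
import torch
import torch.nn.functional as F

def intra_sim(emb: torch.Tensor) -> float:
    """IntraSim (Eq. (39)): mean pairwise cosine similarity of a Top-K list.

    emb: [K, d] embeddings of the recommended patents; lower = more diverse.
    """
    K = emb.size(0)
    sims = F.cosine_similarity(emb.unsqueeze(1), emb.unsqueeze(0), dim=-1)
    # Remove the K diagonal self-similarities, then average the rest;
    # counting both (i, j) and (j, i) matches the 2 / (K(K-1)) factor.
    return ((sims.sum() - K) / (K * (K - 1))).item()

def popularity(citation_counts: list[int]) -> float:
    """Popularity (Eq. (40)): mean log-scaled forward-citation count."""
    return sum(math.log1p(c) for c in citation_counts) / len(citation_counts)
```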
Using these two dimensions—IntraSim and Popularity—we can employ a K-means clustering algorithm to segment organizations into distinct behavioral profiles. This analysis reveals some fascinating user personas:
(1)
Mediators: High diversity, high popularity. These organizations tend to acquire a broad portfolio of well-known, popular patents.
(2)
Domain Leaders: Low diversity, high popularity. Their focus is narrow and deep, concentrating on the core, essential patents within their specific field.
(3)
Explorers: High diversity, low popularity. These are often research-oriented entities, constantly scouting for a wide range of novel, cutting-edge technologies that may not be popular yet.
(4)
Niche Specialists: Low diversity, low popularity. These organizations carve out a space for themselves by focusing intensely on a specific, often obscure, technological niche.
(5)
Emerging Players: Medium diversity, medium popularity. These organizations are typically in a technology accumulation phase, building out their portfolio.
(6)
Focused Experts: Extremely low diversity, medium popularity. They follow a highly specialized technological path, concentrating on a very narrow set of technologies.
The real power of this clustering is that it allows us to then tailor our recommendation strategies. For organizations identified as ‘Mediators’ or ‘Explorers,’ we can adjust the re-ranking process to boost the diversity of the final list. Conversely, for ‘Domain Leaders’ and ‘Niche Specialists,’ the strategy shifts to prioritizing specialization and relevance, ensuring the recommendations are as focused and pertinent as possible.

3.7.2. Re-Ranking and Quality Filtering of Explanations

To further polish the final recommendation list, we introduce a re-ranking step that considers not just the raw recommendation score, but also the quality of the generated explanation and the list’s diversity.
First, we score the quality of each explanation Eo,p:
$$Q(E_{o,p}) = \beta_1 \, \mathrm{BLEU}(E_{o,p}, E_{ref}) + \beta_2 \, \mathrm{Cov}(E_{o,p}, F_p)$$
This score is a weighted sum of the BLEU score against a reference text ($E_{ref}$, i.e., the patent's abstract) and the coverage of our extracted feature words ($F_p$) in the generated text.
Then, we compute a new ranking score that combines everything:
$$\mathrm{RankScore}(o, p) = \mathrm{score}(o, p) + \delta_1 \cdot Q(E_{o,p}) - \delta_2 \cdot \mathrm{RedundancyPenalty}(p)$$
The RedundancyPenalty term discourages recommending patents that are too similar to items already in the list, drawing inspiration from the Maximal Marginal Relevance (MMR) principle; $\delta_1$ and $\delta_2$ are balancing coefficients. Finally, we re-rank the patents based on this new RankScore, filter out any recommendations with an explanation quality below a certain threshold, and present the final Top-K list along with their corresponding explanations.
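The following is a minimal sketch of one way to realize this greedy re-ranking. Taking the redundancy penalty as the maximum similarity to already-selected items is our MMR-style reading, and all parameter values are illustrative:

def rerank(base_scores, expl_quality, sim_matrix,
           delta1=0.5, delta2=0.3, quality_threshold=0.2, top_k=10):
    # Greedy MMR-style re-ranking: at each step pick the candidate with
    # the best RankScore, penalizing similarity to already-chosen items,
    # and drop candidates whose explanation quality is below threshold.
    candidates = list(range(len(base_scores)))
    selected = []
    while candidates and len(selected) < top_k:
        def rank_score(i):
            redundancy = max((sim_matrix[i][j] for j in selected), default=0.0)
            return base_scores[i] + delta1 * expl_quality[i] - delta2 * redundancy
        best = max(candidates, key=rank_score)
        candidates.remove(best)
        if expl_quality[best] >= quality_threshold:
            selected.append(best)
    return selected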

3.8. Model Complexity Analysis

The time complexity of the proposed method is driven by four main components: R-GCN neighborhood aggregation, Bi-LSTM encoding, multi-head attention, and GFRU explanation generation.
(1)
R-GCN Aggregation: For an R-GCN with $L$ layers, the complexity per layer is roughly $O(|E| d_s^2)$, leading to a total of $O(L |E| d_s^2)$, where $|E|$ is the number of edges.
(2)
Bi-LSTM Encoding: For a sequence of length $n_T$, the complexity is $O(n_T d_h^2)$.
(3)
Multi-Head Attention: This comes in at $O(m d_p^2)$, where $m$ is the number of relation types.
(4)
GFRU Generation: For an explanation of length $|S|$, the complexity is $O(|S| d_h^2)$.
The overall time complexity is approximately $O\big(\mathrm{epochs} \cdot (N_{batch} (L |E_{sub}| d_s^2 + m d_p^2) + |S| d_h^2)\big)$, where $|E_{sub}|$ is the number of edges in a sampled subgraph. This is manageable in practice with parallel computing and batch processing.
The space complexity is primarily determined by the storage of node embeddings and model parameters. The main costs are the node embedding matrices, $O(|V| d_s)$; the final feature representations, $O(|V| d_o)$; and the model parameters, roughly $O(d_s^2 + d_h^2)$. The total space complexity is therefore around $O(|V|(d_s + d_o) + d_h^2)$.

4. Experiments

4.1. Datasets

To give our TEAHG-EPR model a thorough workout, we built and utilized two large-scale datasets with distinct characteristics. This wasn’t just about testing the model’s performance in its primary application area; we also wanted to see if its underlying framework was general enough to handle a different domain entirely. Our main proving ground is a real-world industrial dataset derived from United States patents [53], while a widely-used academic graph dataset [54] from AMiner serves as our testbed for generalization.

4.1.1. USPTO-Semiconductor Patent Dataset

Our core experimental data comes from the public records of the United States Patent and Trademark Office (USPTO) [53]. Specifically, we tapped into the patents-public-data project on the Google BigQuery platform. This resource is a lifesaver, offering well-structured data that is highly accessible via SQL queries, which greatly simplified our data construction process. To keep the computational load manageable while ensuring the data was representative, we zeroed in on a key sector known for rapid innovation and dense intellectual property: semiconductor manufacturing. We did this by filtering for all patents whose International Patent Classification (IPC) codes begin with “H01L”.
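As an illustration of how such a filter can be expressed, here is a hedged sketch of the kind of BigQuery query we mean. The table and field names (publications, country_code, grant_date, ipc.code, abstract_localized) follow our understanding of the public patents-public-data schema and should be verified against the live dataset:

from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT
  publication_number,
  grant_date,
  (SELECT text FROM UNNEST(abstract_localized)
   WHERE language = 'en' LIMIT 1) AS abstract_en
FROM `patents-public-data.patents.publications`
WHERE country_code = 'US'
  AND grant_date BETWEEN 20050101 AND 20241231
  AND EXISTS (SELECT 1 FROM UNNEST(ipc) AS c
              WHERE c.code LIKE 'H01L%')
"""
semiconductor_patents = client.query(query).to_dataframe()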
Turning this raw data into a temporal heterogeneous graph that our model could use involved a careful construction process. We started by setting a time window, pulling all granted U.S. semiconductor patents from 1 January 2005 to 31 December 2024. This 20-year span is long enough to capture significant technological shifts. From this pool, we extracted five core entity types: Patents, Inventors, Assignees/Organizations, four-digit primary IPC classes, and Applicants. A tricky but crucial detail here was handling inconsistencies in organization names—think “IBM” versus “International Business Machines Corporation.” To sort this out, we leveraged the pre-harmonized assignee IDs from the PatentsView project, which are integrated into BigQuery. This move was key to ensuring our organization nodes were unique and accurate. However, we acknowledge that this entity normalization strategy operates at a relatively coarse granularity. It aggregates all patents from diverse subsidiaries (e.g., Samsung Display vs. Samsung Electronics) into a single corporate node. While this approach effectively mitigates data sparsity by pooling interaction histories, it may introduce a degree of “semantic drift,” where the resulting node representation reflects an average of the conglomerate’s diverse interests rather than the specific needs of a subunit. To cut down on noise, we also filtered out any fringe entities, like inventors or organizations with fewer than five patents to their name.
Once the entities were defined, we built out the graph’s structure, strictly following the six relation types defined in Section 3.2.1. For the core recommendation task, we turned to the separate patent assignment dataset to extract two directed relationships for organization-patent interactions: TransferIn (an organization acquires a patent) and TransferOut (an organization divests a patent). The TransferIn relation is the heart of our recommendation task, with each record (o, r2, p, t) forming a positive interaction sample for training and evaluation. While TransferOut isn’t a direct label, we kept it in the graph as valuable context, helping the model understand an organization’s dynamic technology strategy more fully. Additionally, we collected the English abstract for each patent as the raw text input for our explanation module and computed the 13 initial attributes for each node based on the system defined in Section 3.2.2.
When it came to splitting the data, we used a time-sensitive “leave-one-out” strategy for each organization’s interaction history to mimic a real-world prediction scenario and prevent any data leakage. Here’s how it worked: for any organization with at least three acquisitions, all acquisitions except the two most recent went into the training set. The organization’s very last acquisition within 2024 was then held out as the positive sample for the test set, and its second-to-last acquisition became the positive sample for the validation set. This ensures that when we make a prediction, the model has only seen information from the past. During evaluation, we matched each positive sample (o, p+) with 100 randomly sampled patents p− that the organization had never interacted with, creating a ranked list for the model to sort. After all this meticulous processing, the final statistics for our USPTO-Semiconductor dataset are laid out in Table 3.
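A simplified sketch of this protocol, assuming each organization's history is a list of (patent_id, date) pairs (all names are placeholders):

import random

def leave_one_out_split(history):
    # history: {org: [(patent_id, date), ...]}, dates sortable.
    # Last acquisition -> test positive, second-to-last -> validation
    # positive, everything earlier -> training.
    train, val, test = {}, {}, {}
    for org, events in history.items():
        if len(events) < 3:
            continue  # require at least three acquisitions
        events = sorted(events, key=lambda e: e[1])
        test[org] = events[-1]
        val[org] = events[-2]
        train[org] = events[:-2]
    return train, val, test

def sample_negatives(all_patents, interacted, n=100, seed=0):
    # Pair each positive with 100 patents the organization never touched.
    rng = random.Random(seed)
    pool = [p for p in all_patents if p not in interacted]
    return rng.sample(pool, n)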

4.1.2. AMiner-DBLP Dataset

A great performance in one setting could just be a fluke or the result of overfitting to a specific data distribution. That is why testing the generalizability of our model’s framework is so important. We decided to port our model to a structurally similar but entirely different domain: academic paper recommendation. For this, we used the AMiner-DBLP dataset (Version 12) [54], a widely used benchmark in the academic community, and focused on the broad field of Computer Science.
Adapting our model to this dataset required some thoughtful analogies, since there is no direct “patent acquisition” behavior in academia. Our mapping strategy was as follows: we treated “Papers” as our patents, “Authors” as our inventors, and “Venues” (journals/conferences) as our IPC classes. The most critical adaptation was defining the user-item interaction. We designated “research organizations” as the “users” in our system, entities that we carefully extracted and normalized from the author affiliation field in the dataset. We then defined a positive interaction: if an author published a paper while affiliated with organization A, we considered organization A to have an interaction with that paper. The elegance of this setup is that it mirrors the accumulation of intellectual assets and output by an entity, making it semantically consistent with our organization-patent model.
With this setup, we extracted all relevant papers, citations, authors, and abstracts from the Computer Science domain between 2005 and 2024. The dataset partitioning followed the exact same time-sensitive, leave-one-out approach used for the USPTO dataset, with each organization’s most recent publication in 2024 serving as the test case. The final statistics for the constructed AMiner-CS dataset are detailed in Table 4. Although the specific node and edge types differ, its massive scale, inherent temporal nature, and complex heterogeneous structure provided an excellent second proving ground for testing the generalization power and robustness of the TEAHG-EPR framework.

4.2. Evaluation Metrics

A robust recommendation system, we believe, is more than just about hitting accuracy targets. It needs to be judged on multiple fronts—how well it predicts, how trustworthy its explanations are, and whether it broadens a user’s horizon. To that end, we’ve designed a multi-faceted evaluation framework. It encompasses core recommendation performance metrics, measures for assessing the quality and credibility of explanations, and finally, indicators for gauging the diversity and novelty of the recommendations themselves.

4.2.1. Recommendation Performance Metrics

These metrics form the bedrock of any recommendation system evaluation. They tell us how effectively our model can sift through a vast universe of potential patents to pinpoint those most relevant to a user. In our setup, for each positive sample in the test set—that is, a patent a target organization actually acquired—our model needs to rank it as highly as possible within a list that also includes 100 randomly sampled negative patents. We have chosen two widely used and complementary Top-K ranking metrics for this purpose.
The first is Hit Rate @K (HR@K), a straightforward measure that asks a simple question: “Does the correct item appear within the top K recommendations?” A higher HR@K indicates better recall, meaning the model is more successful at retrieving relevant items.
However, simply “hitting” the target is not the whole story. There is a world of difference between a recommendation ranked first and one that barely scrapes into the Top-10. To capture this finer nuance of ranking quality, we turn to Normalized Discounted Cumulative Gain @K (NDCG@K). The core idea behind NDCG@K is that correctly ranked recommendations should contribute more when they appear higher up the list. It elegantly incorporates a position-aware gain discount, making it more sensitive to the ranking quality than HR@K.
$$\mathrm{NDCG@}K = \frac{1}{|U_{test}|} \sum_{u \in U_{test}} \frac{\mathrm{DCG}_u@K}{\mathrm{IDCG}_u@K}$$
We report the results for K values of 10, 20, and 50 to provide a comprehensive view of the model’s performance across different list lengths.
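With a single held-out positive among 100 negatives, both metrics reduce to a check on the positive item's rank, averaged over all test users; a minimal sketch:

import math

def hr_ndcg_at_k(scores, pos_index, k):
    # scores: model scores for the 101 candidates; pos_index: position
    # of the held-out positive. With exactly one relevant item,
    # IDCG@K = 1, so NDCG@K = 1 / log2(rank + 1) when the positive
    # lands inside the top K, and both metrics are 0 otherwise.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    rank = ranked.index(pos_index) + 1  # 1-based rank of the positive
    if rank > k:
        return 0.0, 0.0
    return 1.0, 1.0 / math.log2(rank + 1)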

4.2.2. Explanation Quality Metrics

For any explainable recommendation system, the quality of the explanation is as vital as the accuracy of the recommendation itself. We’re looking for explanations that are not only fluent and easy to read, like something a human expert would write, but also genuinely faithful to the core reasons behind the recommendation. To assess this, we’ve adopted a two-pronged approach.
We first turn to two classic metrics from Natural Language Processing to gauge the generated text’s fluency and information overlap. BLEU-4 [55], borrowed from machine translation, assesses how well the generated text matches a reference text (in our case, the patent abstract) based on n-gram overlap, with particular emphasis on 4-grams. Meanwhile, ROUGE-L [56], commonly used in text summarization, evaluates information capture by measuring the longest common subsequence between the generated explanation and the reference.
But BLEU and ROUGE, while useful, can only tell us if the explanation sounds like a plausible abstract. They do not tell us if it truly captures why a specific patent is relevant to a particular organization. To address this core issue, we designed a more targeted metric: Keyword Coverage (KC) [57]. Here is how it works: we first extract the Top-5 most representative technical keywords from the generated explanation and the target patent’s description using our PMI approach. Then, we simply check what proportion of these key terms appear in the explanation. This KC score directly measures whether our GFRU is truly “guided” by the core recommendation rationale, as intended. It is a crucial step in verifying the genuine faithfulness of our generated explanations.
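The KC score itself is a simple proportion once the keyword sets exist; a sketch in which the PMI extraction step is assumed to have already produced the keywords:

def keyword_coverage(explanation: str, keywords) -> float:
    # Fraction of the extracted key terms that actually appear in the
    # generated explanation (case-insensitive substring match).
    text = explanation.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords) if keywords else 0.0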

4.2.3. Diversity and Novelty Metrics

A recommendation system that only ever suggests the most popular “hits” has limited long-term value. A truly intelligent system should also help users broaden their horizons, perhaps uncovering some unexpected but valuable “gems.” With this in mind, we have introduced two metrics that go beyond mere accuracy to evaluate the overall quality of the recommendations.
The first is Intra-List Similarity (ILS). This measures the average pairwise similarity among the items within a recommended list, serving as an inverse indicator of diversity. We use the IntraSim formula defined in Section 3.7.1, where patent similarity is calculated via cosine similarity on their final deterministic representations I(p). A lower ILS score means the recommended items are more dissimilar, indicating higher diversity.
The second metric is Catalogue Coverage @K. This is more straightforward: it measures the proportion of unique patents recommended across all test users, relative to the entire pool of candidate patents. Higher coverage suggests the model has the capability to explore and recommend long-tail or niche items, rather than getting stuck recommending only the most popular ones.
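Both list-level measures are straightforward to compute; a sketch, reusing intra_sim from the Section 3.7.1 sketch for ILS:

def catalogue_coverage(rec_lists, candidate_pool) -> float:
    # Proportion of the candidate pool that appears in at least one
    # user's Top-K recommendation list.
    recommended = set()
    for lst in rec_lists:
        recommended.update(lst)
    return len(recommended & set(candidate_pool)) / len(candidate_pool)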

4.3. Baselines for Comparison

To really understand where our TEAHG-EPR model stands, we needed to put it up against a thoughtfully curated lineup of competitors. We chose our baselines to represent a spectrum of technologies, from classic collaborative filtering to the latest in sequential and knowledge graph-based recommendation.

4.3.1. General Recommendation Models

This first group of baselines represents general-purpose recommendation methods that do not rely on complex graph structures or temporal information.
(1)
BPR-MF [52]: A true classic in the recommendation systems field. This model learns latent vectors for users and items through matrix factorization, optimized with a Bayesian Personalized Ranking (BPR) loss. In our setup, organizations are the “users” and patents are the “items.” BPR-MF is our foundational baseline, helping us measure the performance lift gained by moving to more complex, deep learning-based models.
(2)
LightGCN [58]: As a standard-bearer for graph collaborative filtering, LightGCN has made waves by simplifying the graph convolution mechanism. It learns embeddings by propagating them only on a user-item bipartite graph, which has proven to be both efficient and highly effective. For us, LightGCN represents the state of the art in general-purpose graph recommendation. Comparing against it will highlight the benefits of our more complex design, which explicitly handles the heterogeneous and temporal aspects of the data.

4.3.2. Sequential Recommendation Models

To specifically isolate and test the value of our model’s temporal modeling component, we included an advanced sequential recommendation model in our lineup.
(1) GLINT-RU [59]: This is a lightweight yet powerful sequential model that creatively uses a single-layer, dense selective GRU to capture temporal dependencies and fine-grained positional information. To adapt it to our task, we treat each organization’s history of patent acquisitions as a time-ordered sequence. GLINT-RU’s job is then to predict the next patent in that sequence. This direct comparison will be very telling: it will reveal whether our temporal heterogeneous graph framework can capture richer information than a pure-play sequential model.

4.3.3. Heterogeneous Graph/KG Models

This group represents our core competitors. These models, like ours, are designed to handle heterogeneous information, but they go about it in different ways. The comparison here is key to demonstrating the superiority of our specific design choices.
(1) DHGPF [60]: This is a powerful, two-stage denoising heterogeneous graph model that includes a pre-training phase. It automatically learns weights to aggregate semantic information from different meta-paths and uses a gating mechanism to filter out noise. DHGPF represents the cutting edge of meta-path-based methods, and a strong performance against it will underscore the advantages of our R-GCN + Bi-LSTM dynamic neighborhood aggregation paradigm.
(2) KGAT-SS [61]: This model cleverly combines a graph attention network (KGAT) with a dual-tower architecture, allowing it to fuse both entity (graph) semantics and textual semantics. It uses the graph attention network to learn high-order relations and leverages BERT to process textual information. As a powerful fusion model, KGAT-SS is a formidable opponent. A successful comparison will validate whether our unique design elements—temporal modeling, uncertainty representation, and controllable explanation generation—truly deliver a performance edge.

4.3.4. Explainable Recommendation Models

Finally, to ensure a fair and direct comparison on the explainability front, we chose an advanced model that is also capable of generating explanations.
(1) KAERR [62]: This is a knowledge-aware, explainable model designed for reciprocal recommendation scenarios. It models preferences from a dual perspective by fusing different meta-paths with an attention mechanism, and then uses those attention weights to generate explanations. Although its original application (reciprocal recommendation) is slightly different from ours, its core idea of using meta-path attention for explanations provides an excellent point of reference. Comparing against KAERR on our explanation quality metrics (BLEU, ROUGE, and especially KC) will directly show whether our GFRU-based controllable generation paradigm can produce more faithful and relevant explanations.

4.4. Implementation Details

To ensure our experiments are both transparent and reproducible, this section provides the specifics of our implementation environment and parameter settings.
Our TEAHG-EPR model was built using PyTorch 2.5 and the PyTorch Geometric (PyG) 2.1 library. All experiments were carried out on a high-performance server equipped with an NVIDIA GeForce RTX 3090 (24 GB) GPU and 512 GB of RAM. We used the Adam optimizer for end-to-end training, setting the initial learning rate to 1 × 10−3. This was paired with a cosine annealing strategy to dynamically adjust the learning rate during training, eventually decaying to a minimum of 1 × 10−5. A global L2 regularization coefficient, γ, was set to 1 × 10−5 to mitigate overfitting. The batch size was fixed at 128 for both datasets. Other key hyperparameters for our model, such as embedding dimensions, the number of attention heads, and the temporal window size, are already detailed in Table 2 of our methodology section.
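For concreteness, the optimizer and scheduler configuration described above can be wired up as follows; model, train_loader, and num_epochs are placeholders, and folding the L2 coefficient γ into Adam’s weight_decay is our reading of the setup:

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_epochs, eta_min=1e-5)

for epoch in range(num_epochs):
    for batch in train_loader:            # batch size fixed at 128
        optimizer.zero_grad()
        loss = model.compute_loss(batch)  # joint recommendation + explanation loss
        loss.backward()
        optimizer.step()
    scheduler.step()                      # cosine decay toward 1e-5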

4.5. Ablation Experiments

Simply showing that TEAHG-EPR outperforms other models isn’t enough. A deeper question remains: where does this strong performance actually come from? Is it the introduction of temporal modeling, the subtlety of our uncertainty design, or the fine-grained attention mechanism? To get to the bottom of this, we conducted a series of rigorous ablation studies.
Our approach was straightforward but effective: we took the full TEAHG-EPR architecture and, like a surgeon, systematically removed each of its core innovative components one by one. We then observed how much the model’s performance suffered as a result; this performance drop is precisely what quantifies the value of the removed component. All ablation experiments were conducted on our main USPTO-Semiconductor dataset, with the results presented in Table 5.
Our first variant, w/o Temporal, removes the Bi-LSTM module, stripping the model of its ability to capture dynamic evolution. It operates solely on the most recent graph snapshot, using the R-GCN encoding for direct recommendation. As you can clearly see from Table 5, the performance of the w/o Temporal model takes a significant hit, dropping by about 3.8% in HR@10 and a notable 5.2% in NDCG@10. This result powerfully validates the core argument we made in our introduction: patent recommendation is anything but a static matching problem. An organization’s innovation needs and a technology’s potential value are deeply embedded in the flow of time. By ignoring dynamic signals like the career trajectory of an inventor or the shifting popularity of a tech field, the model becomes short-sighted, unable to make forward-looking judgments. That performance drop is the price of ignoring time, and it is also the value contributed by our carefully designed temporal evolution module.
Next, we investigated the effectiveness of our much-touted “dual representation” design. For the w/o Uncertainty variant, we disabled the generation of Gaussian distributions and the KL divergence-based similarity calculation. The recommendation score was determined entirely by the cosine similarity between the deterministic feature vectors I(v) (effectively setting α = 0). The quantitative results in Table 5 provide strong empirical justification for this module. Removing the uncertainty component leads to a clear performance degradation: the HR@10 drops by 1.68% (0.8542 → 0.8398), while the NDCG@10 suffers a larger drop of 2.19% (0.6518 → 0.6375). This discrepancy is revealing. The deterministic vector I(v) is sufficient for retrieving obvious matches (maintaining a decent Hit Rate), but the uncertainty modeling is crucial for optimizing the ranking order (NDCG). By modeling the exploration scope via Gaussian distributions, the full model can correctly elevate “high-potential” patents—those that are logically relevant but semantically distant—to higher positions in the list. This confirms that uncertainty modeling is not redundant but acts as a critical “soft buffer” for refining recommendation precision.
In the w/o Attention variant, we replaced our multi-head attention mechanism with a much simpler approach: instead of having the body representation H(p) “query” the historical evolution matrix A(p), we just concatenated H(p) with a mean-pooled vector of the rows of A(p). The resulting performance drop (about 2.2% in HR@10 and 3.3% in NDCG@10) proves a simple truth: not all information is created equal. For a semiconductor patent, the historical dynamics of its inventors might be far more important than those of its assignee. The multi-head attention mechanism acts like an intelligent “information dispatcher,” dynamically assigning weights to historical information from different semantic dimensions (inventors, IPCs, citations, etc.) based on the specific context. Without it, the model is forced to treat all information sources as equally important, which allows crucial signals to be drowned out by noise, inevitably hurting performance.
Our final two variants turn the spotlight onto our explanation module and training strategy.
For w/o Joint-Training, we adopted a two-stage approach: first, we fully trained the entire recommendation module using only the recommendation loss L r . Then, we froze its parameters and trained the explanation generation module separately. The results show that while recommendation performance only dipped slightly, the explanation quality metrics—especially KC—took a noticeable tumble. This reveals a subtle but important benefit of joint training: the gradients flowing back from the explanation task seem to “nudge” the recommendation module into learning more semantically distinct feature representations, which in turn makes it easier for the explanation module to “understand” the reasons for a recommendation.
The w/o Feature-Guidance variant was the most extreme. We completely disabled the feature GRU within the GFRU, effectively turning it into a standard GRU model initialized by I(o) and I(p). Recommendation performance was almost unaffected, but the KC metric plummeted to a dismal 0.1234. This result could not be clearer: without external feature guidance, so-called “explanations” are just a string of fluent but generic platitudes. While this confirms the necessity of guidance, it does not prove that PMI is the optimal selection strategy. To address this, we perform a more rigorous comparative analysis in Section 4.6.3, where we benchmark our PMI-based approach against TF-IDF and Attention-based feature selection methods.

4.6. Overall Performance Comparison

In this section, we pit TEAHG-EPR against our six curated baselines across both datasets. The goal is to answer the most fundamental question: in the complex arenas of patent and academic recommendation, does our model genuinely deliver stronger performance? The results, laid out in Table 6 and Table 7, speak for themselves.

4.6.1. Analysis of Recommendation Performance

The results in Table 6 and Table 7 tell a layered story. The most immediate takeaway is that TEAHG-EPR consistently achieves the best performance across all recommendation metrics on both datasets. On the highly competitive USPTO dataset, our model delivers a significant 3.82% improvement in NDCG@10 over the strongest baseline, KGAT-SS, confirming the superiority of our overall framework.
But let us dig into the details. You will notice that simply incorporating graph structure marks the first major leap in performance. Both LightGCN and the more advanced KG models vastly outperform BPR-MF, which relies solely on user-item interactions. This is not surprising; in domains like patents and academia, the rich web of relationships (citations, collaborations) is a treasure trove of recommendation signals.
A more interesting comparison arises between the sequential and graph-based models. GLINT-RU, a powerful sequential model, performs slightly worse than LightGCN and is clearly outmatched by the more sophisticated graph models. This highlights a key insight: while an organization’s acquisition history is a timeline, treating it as an isolated sequence is insufficient. A patent’s value is determined more by its position within the entire heterogeneous knowledge network than by simply what was acquired before it. Our model, through its temporal heterogeneous graph, elegantly combines both of these perspectives.
Finally, the real showdown happens among the top-tier heterogeneous graph models. DHGPF and KGAT-SS are formidable opponents, with the former showing strength through pre-training and meta-path aggregation, and the latter gaining an edge by incorporating textual information. Yet, TEAHG-EPR still manages to pull ahead. We attribute this crucial performance advantage to our unique blend of temporal evolution modeling and uncertainty representation. Models like KGAT-SS still operate on a static snapshot of the knowledge graph, whereas our model captures the dynamic trends of how entity representations evolve over time. Furthermore, our uncertainty modeling allows the recommendation process to look beyond hard, “core-to-core” matches and explore softer, “core-to-potential” fits—a particularly valuable trait in a forward-looking domain like technology scouting.

4.6.2. Analysis of Explanation Quality

If recommendation performance is the “IQ” of our model, then explanation quality is its “EQ.” In this regard, TEAHG-EPR demonstrates an overwhelming advantage.
As seen in Table 6, KGAT-SS, despite also incorporating text, produces mediocre explanations, with a KC of only 0.1567. This is likely because it treats text as just another feature, lacking a dedicated mechanism for generating explanations. KAERR, which uses meta-path attention to generate explanations, fares better on the KC metric, achieving 0.4501. This suggests that graph-path-based explanations are more targeted than those derived from general text features.
However, our TEAHG-EPR achieves a striking 0.7850 on the KC metric, outperforming the next-best KAERR by over 74%. This is a decisive victory. This result clearly proves the massive success of our “PMI for core rationale extraction + GFRU for guided generation” paradigm. Our model is not just “guessing” at an explanation; it is purposefully constructing an explanation guided by the evidence we provide. At the same time, our model also scores best on the BLEU and ROUGE metrics, showing that we did not sacrifice fluency or information completeness in our pursuit of controllability. The joint training strategy allows the high-quality semantic representations learned by the recommendation module to serve as an excellent “starting point” for the explanation module, achieving a win-win in both substance and style. In short, TEAHG-EPR not only excels at recommending what, but also provides the clearest, most faithful, and most credible answers as to why.

4.6.3. Faithfulness Evaluation

While metrics like BLEU and Keyword Coverage measure the linguistic quality of explanations, they do not necessarily reflect whether the explanation is “faithful”—that is, whether it truly represents the logic behind the model’s prediction. As highlighted in recent literature [63], a faithful explanation should pinpoint the input features that were actually decisive for the output.
To quantitatively assess this, we conducted a “Fidelity (Perturbation) Test” [64]. The underlying hypothesis is straightforward: if we mask the key features identified by the explanation generator, the model’s recommendation score for the target item should drop significantly. We define the Fidelity Score as the average drop in prediction probability:
$$\mathrm{Fidelity} = \frac{1}{|D|} \sum_{(u,i) \in D} \Big( f(u, i \mid X) - f\big(u, i \mid X \setminus \{k_{exp}\}\big) \Big)$$
where $f(\cdot)$ is the prediction function, $X$ is the full feature set, and $\{k_{exp}\}$ is the set of keywords extracted by the explanation module. A higher score indicates greater faithfulness.
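The test is simple to implement once a scoring function and a masking routine exist; a sketch in which predict and mask are placeholder callables:

def fidelity(test_pairs, predict, mask, explain_keywords, features):
    # Average drop in the model's score when the keywords named by the
    # explanation are masked out of the input features.
    drops = []
    for o, p in test_pairs:
        full = predict(o, p, features)
        masked = predict(o, p, mask(features, explain_keywords[(o, p)]))
        drops.append(full - masked)
    return sum(drops) / len(drops)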
We compared our approach against three baseline explanation strategies:
(1) Random: Randomly masking keywords from the document.
(2) TF-IDF (Retrieval-based): Masking keywords with the highest TF-IDF scores, a common heuristic baseline [65].
(3) Attention-based (KGAT-SS): Masking node features with the highest attention weights from the graph attention layer.
The results are summarized in Table 8. TEAHG-EPR achieves a Fidelity Score of 0.421, which is notably higher than both the retrieval-based (0.152) and attention-based (0.285) baselines. This empirical evidence directly addresses the concern that feature-guided generation might be merely a narrative rationalization. The high Fidelity score demonstrates that if the keywords identified by our explanation module are removed, the model’s recommendation confidence drops significantly. This confirms that the generated text is evidence-grounded and faithfully reflects the actual factors driving the model’s scoring behavior.

4.7. Model Analysis and Discussion

Having established our model’s overall effectiveness, we wanted to dig deeper into its internal mechanics. A good model should not only be powerful but also robust and understandable. This section uses a series of detailed analyses to explore a few key questions: How sensitive is our model to its core hyperparameters? Does it achieve its accuracy at the cost of recommendation diversity? And what does it actually do in a concrete, real-world example?

4.7.1. Analysis of Hyperparameter Sensitivity

Our TEAHG-EPR model introduces several key hyperparameters that directly influence how it processes temporal information and balances certainty with uncertainty. Understanding the model’s sensitivity to these parameters is crucial for tuning and deployment. We investigated this on the USPTO-Semiconductor dataset by varying one parameter at a time while keeping others fixed.
(1) The Impact of Temporal Window Size L
The temporal window L dictates how many years of history the model can look back on. This is a critical parameter: too short, and it might miss long-term evolutionary trends; too long, and it could be swayed by outdated, irrelevant information. We tested values for L ranging from 1 to 7 years, with the results shown in Figure 5.
The trend is quite revealing. As L increases from 1 to 5, the model’s performance (measured by NDCG@10) steadily improves. This clearly indicates that capturing longer-term historical dynamics is vital for accurate recommendations. Things like an inventor’s career progression or the shifting popularity of a tech domain often take several years to become apparent. However, once L exceeds 5, performance begins to plateau and even slightly decline. This phenomenon reflects a critical trade-off.
Empirically, for the specific task of predicting imminent acquisitions, recent market trends appear to be more discriminative. In the fast-paced semiconductor sector, older data can act as noise that dilutes these immediate signals within our LSTM-based architecture.
However, we acknowledge that this “hard truncation” at 5 years is a limitation. As correctly noted in patent literature [1], foundational innovations often exhibit long latency periods, and their value may only emerge after a decade. A rigid window may systematically overlook such long-range dependencies. The performance drop beyond L = 5 suggests that our current model lacks an effective mechanism to filter “noise” from “foundation” in long historical sequences. Therefore, while L = 5 is the operational optimum for this study, future work should consider replacing this hard cutoff with a soft “temporal decay mechanism” or hierarchical attention. This would allow the model to retain access to long-term history (e.g., L = 10 or 20) while adaptively down-weighting outdated information rather than discarding it entirely.
(2) The Impact of Output Dimension dout
The representation dimension dout determines the “information capacity” the model has to characterize a patent or an organization. We tested a range of dimensions from 32 to 256, with the results plotted in Figure 6.
As expected, increasing the embedding dimension generally leads to better performance, with the most significant gains seen when moving from 32 to 128. This confirms that a sufficiently large representation space is necessary to capture the complex semantics of the patent heterogeneous graph. However, the performance boost from 128 to 256 is marginal, while the computational cost and risk of overfitting increase. This suggests that the model’s representational power starts to saturate around 128 dimensions. Given the trade-off, we selected dout = 128 as our standard configuration for all experiments.
(3) The Impact of Similarity Balance Coefficient α
The parameter α is perhaps the most philosophically interesting “knob” in our model. It directly controls whether the model’s final decision leans more towards “deterministic core matching” (α near 0) or “probabilistic potential-seeking” (α near 1). We observed how the model’s performance changed as α varied from 0 to 1, as shown in Figure 7.
This result offers the most profound insight and serves as further empirical validation for the uncertainty modeling module. At α = 0 (relying solely on cosine similarity), the performance corresponds exactly to the deterministic-only baseline. As we increase α to introduce the probabilistic KL divergence component, the performance (NDCG@10) does not drop but instead rises sharply, peaking at α = 0.6. This convex curve empirically proves that the probabilistic “potential-seeking” mechanism provides distinct, non-redundant information that complements the deterministic “core-matching” mechanism.
The key takeaway here is that our uncertainty modeling acts as a powerful “navigator,” not a “driver”; it cannot replace the fundamental need for core matching (as seen by the drop at α = 1). However, the significant performance gap between the peak (α = 0.6) and the baseline (α = 0) quantifies the specific gain attributed to our uncertainty design. It suggests that the optimal recommendation strategy is to place the majority of the weight (about 60%) on solid, core technological matches, while still dedicating a significant portion (about 40%) to exploring the potential opportunities revealed by the Gaussian distributions. This non-linear performance curve provides powerful evidence for the value of our “dual representation + hybrid metric” design: it’s not a simple sum of two parts, but an organic fusion of two different decision-making philosophies that ultimately achieves a 1 + 1 > 2 effect.
(4) The Impact of KL Divergence Temperature λ
The temperature parameter λ in Equation (20) controls the sensitivity of the probabilistic similarity score to the KL divergence distance. It determines how sharply the model penalizes the mismatch between the organization’s interest distribution and the patent’s potential distribution. To find the optimal setting, we evaluated the model’s performance (NDCG@10) on the USPTO dataset by varying λ across the range {0.01, 0.05, 0.1, 0.2, 0.5, 1.0}.
The results are presented in Figure 8. We observe a clear concave trend where performance peaks at λ = 0.1. When λ is too small (e.g., λ = 0.01), the exponential function in Equation (20) decays too rapidly, acting as a hard gate that assigns near-zero similarity to any pair with even minor divergence. This hinders the model from learning from “soft” matches. Conversely, when λ is too large (e.g., λ ≥ 0.5), the similarity scores become overly smoothed and indistinguishable, reducing the model’s discriminative power regarding uncertainty. The optimal value of λ = 0.1 provides a balanced “receptive field” for uncertainty, allowing the model to effectively identify high-potential candidates that fall within a reasonable exploration radius.
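Tying the two knobs together, here is a minimal sketch of the hybrid score, assuming diagonal Gaussians, that Equation (20) takes the form exp(−KL/λ), and that the hybrid metric weights the cosine and probabilistic terms by (1 − α) and α; the defaults mirror the optima reported above:

import numpy as np

def kl_diag_gauss(mu0, var0, mu1, var1):
    # Closed-form KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) ).
    return 0.5 * np.sum(var0 / var1 + (mu1 - mu0) ** 2 / var1
                        - 1.0 + np.log(var1 / var0))

def hybrid_score(i_o, i_p, mu_o, var_o, mu_p, var_p, alpha=0.6, lam=0.1):
    # Deterministic core matching via cosine similarity...
    cos = i_o @ i_p / (np.linalg.norm(i_o) * np.linalg.norm(i_p))
    # ...plus temperature-scaled probabilistic potential-seeking.
    prob = np.exp(-kl_diag_gauss(mu_o, var_o, mu_p, var_p) / lam)
    return (1 - alpha) * cos + alpha * prob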

4.7.2. Analysis of Personalization and Diversity

A recommendation system that only ever suggests the most popular “hits” has limited long-term value. A truly intelligent system should also help users broaden their horizons, perhaps uncovering some unexpected but valuable “gems.” With this in mind, we will now explore TEAHG-EPR’s performance in terms of personalization and recommendation diversity.
(1) Visualization of Organization Profiles
First, we wanted to test a core hypothesis: can our model truly “read” the different innovation strategies hidden behind each organization? To find out, we created “recommendation profiles” for each test organization in the USPTO dataset using the two non-accuracy metrics defined in Section 3.7.1: ILS and Popularity. A lower ILS score means the organization received a more diverse list of recommendations, while a lower Popularity score means the recommendations skewed towards more niche or cutting-edge technologies. We then plotted each organization’s (ILS, Popularity) score as a point in a 2D space and ran a K-Means clustering algorithm (with K = 6). The result was striking. As shown in Figure 9, the organizations don’t form a random cloud; they naturally coalesce into six distinct, well-defined clusters. This provides direct visual evidence that our model is indeed generating significantly different styles of recommendations for different organizations. A closer look at these clusters reveals that they align remarkably well with the profiles we defined in our methodology: we can clearly identify the Explorers (e.g., universities) in the bottom-left with their high-diversity, high-novelty recommendations; the Domain Leaders (e.g., industry giants) in the top-right receiving highly focused, popular recommendations; the Niche Specialists in the bottom-right, and so on for the Mediators, Emerging Players, and Focused Experts. This clear, interpretable clustering is a powerful testament to our model’s ability to capture implicit user preferences and strategic intents, providing a solid foundation for further personalization.
(2) Comparison of Diversity and Coverage
Having established the model’s personalization capabilities, we next wanted to see how it stacked up against the baselines in terms of diversity and novelty. We calculated the average Intra-List Similarity (ILS, lower is better) and Catalogue Coverage (@50, higher is better) for all models on the USPTO dataset, with the results shown in Figure 10. The story is clear. The collaborative filtering-based models, BPR-MF and LightGCN, perform the worst on diversity (highest ILS), falling into the classic “rich-get-richer” trap of recommending a small set of popular items. The knowledge graph-based models fare better, as they can leverage multi-hop relationships to find less obvious connections. However, a significant gap still remains between all baselines and TEAHG-EPR. Our model achieves the lowest ILS score while simultaneously reaching the highest coverage. We believe this decisive advantage stems directly from our unique uncertainty modeling. While traditional models ultimately rely on a deterministic similarity score, which naturally favors the “safest” matches, our hybrid metric, through its KL divergence-driven “potential-seeking” component, is explicitly encouraged to explore options at the boundary of a user’s core needs. This philosophical difference in design is what leads directly to TEAHG-EPR’s superior performance in both diversity and novelty, proving that our model finds a better balance between the seemingly contradictory goals of “precision” and “breadth.”

4.7.3. Cold-Start Performance Analysis

To empirically verify the model’s robustness to new entities, we conducted a targeted evaluation on a “Cold-Start” subset of the USPTO test set. This subset consists of patents filed within the last year of the dataset that have fewer than 3 citation links and zero transfer history. We compared TEAHG-EPR against two representative baselines: LightGCN (structure-heavy) and KGAT-SS (structure + text).
The results are presented in Table 9. As expected, the performance of all models drops compared to the full dataset. However, LightGCN suffers a catastrophic decline (−70% relative to its full performance) because it relies almost entirely on interaction history, which is missing for these nodes. In contrast, TEAHG-EPR maintains a robust NDCG@10 of 0.5124. It outperforms KGAT-SS (0.4532) by a significant margin, validating that our explicit modeling of temporal attribute evolution (via Bi-LSTM) and uncertainty helps bridge the gap for data-sparse items better than static graph attention methods.

4.8. Case Study

While quantitative metrics are essential, the true worth of a recommendation system is ultimately revealed in its ability to provide insightful, trustworthy advice in a real-world context. To bring our TEAHG-EPR model to life, we conducted an in-depth case study. We took on the role of a technology strategy consultant, with the goal of recommending emerging patents to the semiconductor giant Intel Corporation at a specific point in time: early 2022.
We chose Intel not only because it’s a data-rich “Domain Leader” in our USPTO dataset, but also because it was in the midst of a critical strategic pivot around 2022, investing heavily in advanced packaging, heterogeneous integration, and post-silicon transistor technologies. Our question was: could TEAHG-EPR, without being explicitly told about these strategies, “sniff out” these emerging technological directions simply by learning from historical data? We fed Intel as the target user into our model and asked for its Top-3 recommendations from the 2022 candidate patent pool. The results, shown in Table 10, were beyond our expectations in both their accuracy and the depth of their explanations.
At first glance, these three recommendations might seem a bit disparate, but their deep underlying logic becomes apparent when we analyze them against Intel’s real-world technological activities around 2022. Recommendation #1 (US 11,211,465) was a direct hit on a core technology. Intel had announced its new RibbonFET transistor architecture in 2021, which is precisely its commercial name for nanosheet technology. Without reading a single press release, our model correctly identified this “next-generation transistor” from its main competitor, Samsung, as a central concern for Intel. Recommendation #2 (US 11,289,397) was an equally on-point strategic match. With Moore’s Law slowing, heterogeneous integration via Chiplet technologies has become the industry’s consensus path forward, and Intel’s Foveros and EMIB are its flagship technologies. The model’s suggestion of a patent on hybrid bonding from its other key rival, TSMC—a key enabler for dense 3D integration—perfectly aligns with this strategy. Finally, Recommendation #3 (US 11,271,139) was a brilliant “exploratory” pick. While not Intel’s primary business at the time, silicon photonics is widely recognized as a disruptive technology for solving future data transfer bottlenecks in data centers. The model recommended a patent on integrating quantum dot light sources from a specialized German tech firm, a critical challenge in the field. The beauty of this recommendation is that it didn’t just stay in Intel’s comfort zone; it pointed to a new frontier that is highly relevant to its core competency (silicon manufacturing) but also full of future potential. This perfectly demonstrates the exploratory value brought by our model’s uncertainty modeling.
While the generated text provides the “what” of the recommendation, we also wanted to understand the “how” of the model’s reasoning. Since the internal message-passing of R-GCN is complex, we introduce meta-paths here strictly as a post hoc analytical tool to visualize the learned semantic connections. We define four key interpretive paths:
(1) MP1 ($O \xrightarrow{r_1} P \xrightarrow{r_4} C \xleftarrow{r_4} P \xleftarrow{r_2} O$): This path connects two organizations if they have dealt with patents in the same technical domain (IPC class). It is a good signal of shared technological interests.
(2) MP2 ($O \xrightarrow{r_1} P \xrightarrow{r_3} I \xleftarrow{r_3} P \xleftarrow{r_2} O$): This one highlights connections through talent. If two organizations are linked to patents by the same inventor, it could suggest talent mobility or collaboration.
(3) MP3 ($O \xrightarrow{r_1} P \xrightarrow{r_6} P \xleftarrow{r_2} O$): The classic technology lineage path. Citation links reveal pathways of innovation and technological inheritance.
(4) MP4 ($O \xrightarrow{r_1} P \xrightarrow{r_5} A \xleftarrow{r_5} P \xleftarrow{r_2} O$): This path uncovers connections through shared assignees, which can reflect complex corporate structures or intellectual property partnerships.
Using these paths as semantic probes, we unearth the hidden logical chains connecting Intel to the recommended nanosheet patent (US 11,211,465) within our vast temporal knowledge graph. Although we found multiple path instances, two stood out for their explanatory power, revealing the “why” from the perspectives of “talent spillover” and “technology evolution.” We visualize these two representative path instances in Figure 11. Path 1 (Talent & Knowledge Spillover) tells a fascinating story about people: it reveals that a key inventor on one of Intel’s older patents later became the inventor of the recommended cutting-edge nanosheet patent, which was subsequently acquired by a competitor. This sends a powerful signal: talent originating from Intel’s knowledge base is creating next-generation technology that the market values highly. Path 2 (Technology Domain Evolution) provides an irrefutable rationale from the technology itself: it shows that both the recommended patent and Intel’s own portfolio not only belong to the same core technical domain (IPC) but also share the same “technological roots” by citing the same foundational patent.
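As a post hoc probe, enumerating instances of a path such as MP3 amounts to constrained walks over typed edges. A sketch using networkx (a tooling choice of ours, not part of the model); the rel labels follow Section 3.2.1, and edge directions are simplified:

import networkx as nx

def mp3_instances(G: nx.MultiDiGraph, org_a: str, org_b: str):
    # Enumerate MP3 instances O -r1- P -r6- P -r2- O between two
    # organizations, where r6 is a patent-to-patent citation edge.
    targets = {v for _, v, d in G.out_edges(org_b, data=True)
               if d.get("rel") == "r2"}
    for _, p1, d1 in G.out_edges(org_a, data=True):
        if d1.get("rel") != "r1":
            continue
        for _, p2, d2 in G.out_edges(p1, data=True):
            if d2.get("rel") == "r6" and p2 in targets:
                yield (org_a, p1, p2, org_b)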

5. Conclusions

In this paper, we have tackled the tough challenges of delivering precise and explainable recommendations within the complex, ever-shifting ecosystem of patent information. We have argued that older, static models, with their “carving a mark on a moving boat” approach, are simply no longer adequate for the dual demands of foresight and reliability required by modern technology strategy. This is precisely why we built and implemented our innovative framework, TEAHG-EPR, from the ground up.
This study successfully integrates temporal evolution, attribute heterogeneity, uncertainty modeling, and controllable explanation generation into a unified framework. We did not just stop at applying R-GCN to a patent graph; with the help of a Bi-LSTM, we gave our model a sense of “memory” and “foresight,” allowing it to understand the life cycle of a technology through the lens of time. Nor did we settle for a rigid, deterministic profile for each entity. Instead, we wrapped it in a Gaussian distribution to sketch out its “boundary of possibilities,” and our experiments have shown that this dual-track perspective is key to broadening the horizons of the recommendations. Most importantly, our explanation module is no mere accessory. It is deeply intertwined with the main recommendation task, evolving alongside it. Such a configuration allows the final system not only to deliver an answer but also to tell the story of how it arrived at that answer.
The comprehensive experiments—from head-to-head comparisons against strong baselines to detailed ablation studies and a real-world case study—collectively validate the effectiveness of our design. TEAHG-EPR does not just set a new bar for conventional accuracy metrics; it also showcases its unique value in the areas that users truly care about: diversity, novelty, and the quality of its explanations.
Of course, this journey of exploration is not over. Future work could push deeper in several exciting directions. For instance, integrating larger-scale pre-trained language models into our text processing and explanation generation could lead to another leap in semantic understanding. Additionally, while the Gaussian assumption used in our uncertainty modeling provides computational efficiency and closed-form tractability, it may struggle to fully capture the heavy-tailed nature of patent value distributions observed in raw citation data. Future research could explore more flexible distributional forms, such as Student-t distributions or Elliptical distributions, or leverage Hyperbolic geometry embeddings to better model the hierarchical and scale-free properties of the patent ecosystem. Simultaneously, to address the limitation of coarse-grained organizational modeling, future studies could adopt hierarchical or multi-level graph representations. By explicitly modeling the nested relationship between parent companies and their subsidiaries, the system could reduce semantic drift and provide finer-grained recommendations tailored to specific R&D divisions. Furthermore, incorporating users’ real-time feedback in a streaming fashion into our dynamic graph is another tantalizing idea. All in all, we believe the design philosophy embodied by TEAHG-EPR—embracing dynamics, understanding uncertainty, and striving for transparency—offers a solid and inspiring new starting point for the future of intelligent technology recommendation and analysis systems.

Funding

The present research was supported by grant No. 2024A0505050018 from the Department of Science and Technology of Guangdong Province, China.

Data Availability Statement

The datasets analyzed in this study are derived from publicly available sources. The USPTO-Semiconductor dataset was constructed using data from the Google BigQuery ‘patents-public-data’ project [53] (available at: https://console.cloud.google.com/marketplace/product/google_patents_public_data/patents (accessed on 16 May 2025)) and the PatentsView project (https://patentsview.org/download/data-download-tables (accessed on 16 May 2025)). The AMiner-CS dataset was constructed from the publicly available AMiner-DBLP dataset (Version 12) [54] (available at: https://www.aminer.cn (accessed on 16 May 2025)). The pre-processing scripts and the final datasets generated and used for the experiments in this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Feng, Y. The Transition of the Patent System; Taylor & Francis: Abingdon, UK, 2025. [Google Scholar]
  2. Phillips, J. Experience & Abstraction: A Study of Speculative Knowledge Production in Reconceptualising Our Relation to the World. Ph.D. Dissertation, Goldsmiths, University of London, London, UK, 2024. [Google Scholar]
  3. Taherdoost, H. Innovation through research and development. In Signals and Communication Technology; Springer: Berlin/Heidelberg, Germany, 2024; Volume 10, pp. 78–83. [Google Scholar]
  4. Sherriff, G. How to read a patent: A survey of the textual characteristics of patent documents and strategies for comprehension. J. Patent Trademark Resour. Cent. Assoc. 2024, 34, 3. [Google Scholar]
  5. Ali, A.; Tufail, A.; De Silva, L.C.; Ahmed, S.; Khan, M.A.; Zafar, A.; Iqbal, S.; Rehman, A.; Khan, S.; Mahmood, T. Innovating patent retrieval: A comprehensive review of techniques, trends, and challenges in prior art searches. Appl. Syst. Innov. 2024, 7, 91. [Google Scholar] [CrossRef]
  6. Xue, D.; Shao, Z. Patent text mining based hydrogen energy technology evolution path identification. Int. J. Hydrogen Energy 2024, 49, 699–710. [Google Scholar] [CrossRef]
  7. Yang, P.; Wu, X.; Wen, P. Patent Technology Knowledge Recommendation by Integrating Large Language Models and Knowledge Graphs. Available online: https://ssrn.com/abstract=5603825 (accessed on 15 February 2025).
  8. Chung, J.; Choi, J.; Yoon, J. Exploring intra-and inter-organizational collaboration opportunities across a technological knowledge ecosystem: A metapath2vec approach. Expert Syst. Appl. 2025, 298, 129724. [Google Scholar] [CrossRef]
  9. MacLean, F. Knowledge graphs and their applications in drug discovery. Expert Opin. Drug Discov. 2021, 16, 1057–1069. [Google Scholar] [CrossRef]
  10. Chen, J.; Dong, H.; Hastings, J.; Karp, P.D.; Liu, Z.; Zhao, Z.; Chen, H.; Ding, Y.; Guo, Y.; Wu, J. Knowledge graphs for the life sciences: Recent developments, challenges and opportunities. arXiv 2023, arXiv:2309.17255. [Google Scholar] [CrossRef]
  11. Kejriwal, M. Knowledge graphs: A practical review of the research landscape. Information 2022, 13, 161. [Google Scholar] [CrossRef]
  12. Zhong, L.; Wu, J.; Li, Q.; Wang, X.; Chen, H.; Zhang, Y.; Li, J.; Li, Z.; Sun, Y.; Liu, W. A comprehensive survey on automatic knowledge graph construction. ACM Comput. Surv. 2023, 56, 1–62. [Google Scholar] [CrossRef]
  13. Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; van den Berg, R.; Titov, I.; Welling, M. Modeling relational data with graph convolutional networks. In European Semantic Web Conference; Springer International Publishing: Cham, Switzerland, 2018; pp. 593–607. [Google Scholar]
  14. Li, L.; Zhang, Y.; Chen, L. Generate neural template explanations for recommendation. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management, Virtual Event, 19–23 October 2020; pp. 755–764. [Google Scholar]
  15. Madani, F.; Weber, C. The evolution of patent mining: Applying bibliometrics analysis and keyword network analysis. World Patent Inf. 2016, 46, 32–48. [Google Scholar] [CrossRef]
  16. Franz, M.; Alberts, S.C. Social network dynamics: The importance of distinguishing between heterogeneous and homogeneous changes. Behav. Ecol. Sociobiol. 2015, 69, 2059–2069. [Google Scholar] [CrossRef]
  17. Mhatre, V.; Rosenberg, C. Homogeneous vs heterogeneous clustered sensor networks: A comparative study. In Proceedings of the 2004 IEEE International Conference on Communications, Paris, France, 20–24 June 2004; Volume 6, pp. 3646–3651. [Google Scholar]
  18. Ai, J.; Cai, Y.; Su, Z.; Zhang, L.; Li, X.; Wang, Y.; Liu, H.; Zhang, Z. Predicting user-item links in recommender systems based on similarity-network resource allocation. Chaos Solitons Fractals 2022, 158, 112032. [Google Scholar] [CrossRef]
  19. Sun, Y.; Han, J. Mining heterogeneous information networks: A structural analysis approach. ACM SIGKDD Explor. Newslett. 2013, 14, 20–28. [Google Scholar] [CrossRef]
  20. Cheng, R.; Huang, Z.; Zheng, Y.; Liu, X.; Chen, L.; Wang, Y. Meta paths and meta structures: Analysing large heterogeneous information networks. In Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint Conference on Web and Big Data; Springer International Publishing: Cham, Switzerland, 2017; pp. 3–7. [Google Scholar]
  21. Zhang, Y.; Tian, J.; Sun, J.; Liu, X.; Chen, H.; Li, Y. HKGAT: Heterogeneous knowledge graph attention network for explainable recommendation system. Appl. Intell. 2025, 55, 549. [Google Scholar] [CrossRef]
  22. Sun, Y.; Han, J. Mining Heterogeneous Information Networks: Principles and Methodologies; Morgan & Claypool Publishers: Rafael, CA, USA, 2012. [Google Scholar]
  23. Shi, C.; Li, Y.; Zhang, J.; Sun, Y.; Yu, P.S. A survey of heterogeneous information network analysis. IEEE Trans. Knowl. Data Eng. 2016, 29, 17–37. [Google Scholar] [CrossRef]
  24. Creusen, M.E.H. The importance of product aspects in choice: The influence of demographic characteristics. J. Consum. Mark. 2010, 27, 26–34. [Google Scholar] [CrossRef]
  25. Blazevic, M.; Sina, L.B.; Secco, C.A.; Tosato, P.; Ghidini, C.; Prandi, C.; Salomoni, P. Recommendation of scientific publications—A real-time text analysis and publication recommendation system. Electronics 2023, 12, 1699. [Google Scholar] [CrossRef]
  26. Xu, Z.; Qi, L.; Du, H.; Zhang, J.; Li, Y.; Chen, Z. AlignFusionNet: Efficient Cross-modal Alignment and Fusion for 3D Semantic Occupancy Prediction. IEEE Access 2025, 13, 125003–125015. [Google Scholar] [CrossRef]
  27. Shi, J.; Han, D.; Chen, C.; Zhang, Y.; Liu, H. KTMN: Knowledge-driven Two-stage Modulation Network for visual question answering. Multimed. Syst. 2024, 30, 350. [Google Scholar] [CrossRef]
  28. Vrochidis, S.; Papadopoulos, S.; Moumtzidou, A.; Kompatsiaris, Y. Towards content-based patent image retrieval: A framework perspective. World Patent Inf. 2010, 32, 94–106. [Google Scholar] [CrossRef]
  29. Du, W.; Wang, Y.; Xu, W.; Zhang, H.; Chen, X.; Liu, J. A personalized recommendation system for high-quality patent trading by leveraging hybrid patent analysis. Scientometrics 2021, 126, 9369–9391. [Google Scholar] [CrossRef]
  30. Yoon, B.; Kim, S.; Kim, S.; Park, Y.; Lee, J. Doc2vec-based link prediction approach using SAO structures: Application to patent network. Scientometrics 2022, 127, 5385–5414. [Google Scholar] [CrossRef]
  31. Cao, Y.; Hou, L.; Li, J.; Liu, Z.; Sun, M. Joint representation learning of cross-lingual words and entities via attentive distant supervision. arXiv 2018, arXiv:1811.10776. [Google Scholar] [CrossRef]
  32. Lau, J.H.; Baldwin, T. An empirical evaluation of doc2vec with practical insights into document embedding generation. arXiv 2016, arXiv:1607.05368. [Google Scholar] [CrossRef]
  33. Hosseinnejad, R.; Habibizad Navin, A.; Rasouli Heikalabad, S.; Mohammadi, M.; Karimi, H. Combining 2-Opt and Inversion Neighborhood Searches with Discrete Gorilla Troops Optimization for the Traveling Salesman Problem. Available online: https://ssrn.com/abstract=4980127 (accessed on 15 February 2025).
  34. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.-S. Neural collaborative filtering. In Proceedings of the 26th International World Wide Web Conference, Perth, Australia, 3–7 April 2017; pp. 173–182. [Google Scholar]
  35. Qi, Y.; Zhang, X.; Hu, Z.; Liu, Y.; Wang, J.; Chen, K. Choosing the right collaboration partner for innovation: A framework based on topic analysis and link prediction. Scientometrics 2022, 127, 5519–5550. [Google Scholar] [CrossRef]
  36. Yang, J.; Wang, Y.; Zang, B.; Liu, H.; Zhang, Q.; Chen, Z. Research on digital matching methods integrating user intent and patent technology characteristics. Sci. Rep. 2025, 15, 18539. [Google Scholar] [CrossRef]
  37. Petruzzelli, A.M.; Albino, V.; Carbonara, N. Technology districts: Proximity and knowledge access. J. Knowl. Manag. 2007, 11, 98–114. [Google Scholar] [CrossRef]
  38. Yoon, J.; Park, H.; Kim, K. Identifying technological competition trends for R&D planning using dynamic patent maps: SAO-based content analysis. Scientometrics 2013, 94, 313–331. [Google Scholar]
  39. Tang, J.; Wang, B.; Yang, Y.; Hu, W.; Zhang, L.; Yang, Q. Patentminer: Topic-driven patent analysis and mining. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 1366–1374. [Google Scholar]
  40. Grimaldi, M.; Cricelli, L.; Rogo, F. Valuating and analyzing the patent portfolio: The patent portfolio value index. Eur. J. Innov. Manag. 2018, 21, 174–205. [Google Scholar] [CrossRef]
  41. Yu, L.; Sun, L.; Du, B.; Zhang, M.; Liu, J. Heterogeneous graph representation learning with relation awareness. IEEE Trans. Knowl. Data Eng. 2022, 35, 5935–5947. [Google Scholar] [CrossRef]
  42. Ge, X.; Yang, Y.; Peng, L.; Zhang, H.; Li, X.; Wang, J. Spatio-temporal knowledge graph based forest fire prediction with multi source heterogeneous data. Remote Sens. 2022, 14, 3496. [Google Scholar] [CrossRef]
  43. Alilu, E.; Derakhshanfard, N.; Ghaffari, A. An Adaptive Data Gathering Scheduler Based on Data Variance for Energy Efficiency in Mobile Social Networks. Int. J. Networked Distrib. Comput. 2025, 13, 23. [Google Scholar] [CrossRef]
  44. He, X.; Wu, S.; Wu, Y.; Zhang, J.; Liu, H.; Chen, Z. Recommendation of patent transaction based on attributed heterogeneous network representation learning. J. China Soc. Sci. Tech. Inf. 2022, 41, 1214–1228. [Google Scholar]
  45. Ma, X.; Deng, Q.; Zhang, H.; Liu, Y.; Chen, J.; Wang, Z. Explainable paper recommendations based on heterogeneous graph representation learning and the attention mechanism. J. China Soc. Sci. Tech. Inf. 2024, 43, 802–817. [Google Scholar]
  46. Zhang, Z.; Yu, C.; Wang, J.; Liu, H.; Chen, Y.; Li, X. A temporal evolution and fine-grained information aggregation model for citation count prediction. Scientometrics 2025, 130, 2069–2091. [Google Scholar] [CrossRef]
  47. Jaffe, A.B.; Lerner, J. Innovation and Its Discontents: How Our Broken Patent System Is Endangering Innovation and Progress, and What to Do About It; Princeton University Press: Princeton, NJ, USA, 2011. [Google Scholar]
  48. Park, Y.; Yoon, J. Application technology opportunity discovery from technology portfolios: Use of patent classification and collaborative filtering. Technol. Forecast. Soc. Chang. 2017, 118, 170–183. [Google Scholar] [CrossRef]
  49. Li, J.; Zhang, Q.; Liu, W.; Wu, Z.; Chen, L.; Wang, J. Another perspective of over-smoothing: Alleviating semantic over-smoothing in deep GNNs. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 6897–6910. [Google Scholar] [CrossRef]
  50. Bodley-Scott, T.; Oymak, E. Enabling an environment for transformational strategic alliances. In University–Industry Partnerships for Positive Change; Policy Press: Bristol, UK, 2022; pp. 55–114. [Google Scholar]
  51. Sasikala, S.; Christopher, A.B.A.; Geetha, S.; Kannan, A.; Rajkumar, R. A predictive model using improved normalized point wise mutual information (INPMI). In Proceedings of the 2013 Eleventh International Conference on ICT and Knowledge Engineering, Bangkok, Thailand, 20–22 November 2013; pp. 1–9. [Google Scholar]
  52. Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian personalized ranking from implicit feedback. arXiv 2012, arXiv:1205.2618. [Google Scholar] [CrossRef]
  53. Toole, A.; Jones, C.; Madhavan, S. Patentsview: An Open Data Platform to Advance Science and Technology Policy. 2021. Available online: https://ssrn.com/abstract=3874213 (accessed on 15 February 2025).
  54. Tang, J.; Zhang, J.; Yao, L.; Li, J.; Zhang, L.; Su, Z. Arnetminer: Extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008; pp. 990–998. [Google Scholar]
  55. Zhang, S.; Wang, M.; Wang, W.; Liu, X.; Chen, Y.; Li, J.; Zhou, Z.; Huang, Q.; Wu, Y.; Sun, H. Adaptations of ROUGE and BLEU to better evaluate machine reading comprehension task. arXiv 2018, arXiv:1806.03578. [Google Scholar] [CrossRef]
  56. Lin, C.Y. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out; Association for Computational Linguistics: Barcelona, Spain, 2004; pp. 74–81. [Google Scholar]
  57. Deng, K.; Li, X.; Lu, J.; Zhou, X. Best keyword cover search. IEEE Trans. Knowl. Data Eng. 2014, 27, 61–73. [Google Scholar] [CrossRef]
  58. He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. Lightgcn: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, 25–30 July 2020; pp. 639–648. [Google Scholar]
  59. Zhang, S.; Wang, M.; Wang, W.; Liu, X.; Chen, Y.; Li, J.; Zhou, Z.; Huang, Q.; Wu, Y.; Sun, H. Glint-ru: Gated lightweight intelligent recurrent units for sequential recommender systems. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Toronto, ON, Canada, 3–7 August 2025; Volume 1, pp. 1948–1959. [Google Scholar]
  60. Zhang, Y.; Zhang, Y.; Liao, W.; Liu, H.; Chen, J.; Wang, X. Multi-view self-supervised learning on heterogeneous graphs for recommendation. Appl. Soft Comput. 2025, 174, 113056. [Google Scholar] [CrossRef]
  61. Chang, X. Research on recommendation algorithm based on knowledge graph. In Proceedings of the 2024 4th International Conference on Bioinformatics and Intelligent Computing, Beijing, China, 26–28 January 2024; pp. 66–75. [Google Scholar]
  62. Lai, K.H.; Yang, Z.R.; Lai, P.Y.; Wong, K.C.; Cheung, W.K. Knowledge-aware explainable reciprocal recommendation. Proc. AAAI Conf. Artif. Intell. 2024, 38, 8636–8644. [Google Scholar] [CrossRef]
  63. Zhang, Y.; Yang, L.; Chen, D.; Liu, X.; Wang, J.; Zhao, Z. Faithful Explainable Recommendation via Neural Logic Reasoning. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining (WSDM), Online, 4–8 March 2024; pp. 982–991. [Google Scholar]
  64. Agarwal, C.; Krishna, S.; Saxena, E.; Goel, A.; Saha, A.; Rudin, C. OpenXAI: Towards a Transparent Evaluation of Post-hoc Explanations. Adv. Neural Inf. Process. Syst. (NeurIPS) 2022, 35, 15784–15799. [Google Scholar]
  65. Lee, J.; Park, S.; Lee, J. Exploring potential R&D collaboration partners using embedding of patent graph. Sustainability 2023, 15, 14724. [Google Scholar] [CrossRef]
Figure 1. The overall architecture of the TEAHG-EPR model. It consists of four integrated modules: (I) Temporal-Aware Attributed Heterogeneous Graph Construction, (II) Neighborhood Aggregation & Temporal Evolution, (III) Uncertainty-Aware Fine-Grained Recommendation, and (IV) Feature-Based Explainable Text Generation. The model is trained end-to-end using a joint loss function, followed by a post-processing stage for personalization.
Figure 1. The overall architecture of the TEAHG-EPR model. It consists of four integrated modules: (I) Temporal-Aware Attributed Heterogeneous Graph Construction, (II) Neighborhood Aggregation & Temporal Evolution, (III) Uncertainty-Aware Fine-Grained Recommendation, and (IV) Feature-Based Explainable Text Generation. The model is trained end-to-end using a joint loss function, followed by a post-processing stage for personalization.
Symmetry 18 00078 g001
Figure 2. Simplified Data Flow Diagram of TEAHG-EPR. The workflow proceeds linearly from left to right: (1) Spatio-Temporal Encoding: The model processes the temporal graph sequence using R-GCN for spatial features and Bi-LSTM for temporal evolution. (2) Dual Representation: The fused features are mapped into a deterministic representation (for core identity) and an uncertainty modeling module (for exploration potential). (3) Multi-Task Output: These representations jointly drive the hybrid similarity scoring for recommendation and the feature-guided generator for textual explanations.
Figure 2. Simplified Data Flow Diagram of TEAHG-EPR. The workflow proceeds linearly from left to right: (1) Spatio-Temporal Encoding: The model processes the temporal graph sequence using R-GCN for spatial features and Bi-LSTM for temporal evolution. (2) Dual Representation: The fused features are mapped into a deterministic representation (for core identity) and an uncertainty modeling module (for exploration potential). (3) Multi-Task Output: These representations jointly drive the hybrid similarity scoring for recommendation and the feature-guided generator for textual explanations.
Symmetry 18 00078 g002
Figure 3. The “Aggregate-then-Encode” process for temporal evolution representation. For each time snapshot G(t), node features are first aggregated via R-GCN layers. The resulting sequence of snapshot-specific representations {h(tL),…,h(t)} is then fed into a Bidirectional LSTM to capture dynamic evolutionary patterns, culminating in the final Temporal Evolution Matrix.
Figure 3. The “Aggregate-then-Encode” process for temporal evolution representation. For each time snapshot G(t), node features are first aggregated via R-GCN layers. The resulting sequence of snapshot-specific representations {h(tL),…,h(t)} is then fed into a Bidirectional LSTM to capture dynamic evolutionary patterns, culminating in the final Temporal Evolution Matrix.
Symmetry 18 00078 g003
Figure 4. Detailed workflow of the Uncertainty-Aware Fine-Grained Recommendation Module. It illustrates how the deterministic Body Representation and Temporal Attribute Representation are fused via Multi-Head Attention into a unified deterministic vector. This vector is then projected into a probabilistic Gaussian distribution. The final recommendation score is a weighted combination of the deterministic cosine similarity and the probabilistic KL divergence.
Figure 4. Detailed workflow of the Uncertainty-Aware Fine-Grained Recommendation Module. It illustrates how the deterministic Body Representation and Temporal Attribute Representation are fused via Multi-Head Attention into a unified deterministic vector. This vector is then projected into a probabilistic Gaussian distribution. The final recommendation score is a weighted combination of the deterministic cosine similarity and the probabilistic KL divergence.
Symmetry 18 00078 g004
Figure 5. Impact of the temporal window size L on NDCG@10.
Figure 5. Impact of the temporal window size L on NDCG@10.
Symmetry 18 00078 g005
Figure 6. Impact of the output representation dimension dout on NDCG@10.
Figure 6. Impact of the output representation dimension dout on NDCG@10.
Symmetry 18 00078 g006
Figure 7. Impact of the similarity balance coefficient a on NDCG@10.
Figure 7. Impact of the similarity balance coefficient a on NDCG@10.
Symmetry 18 00078 g007
Figure 8. Impact of the KL divergence temperature λ on NDCG@10. The curve demonstrates that performance peaks at λ = 0.1, suggesting this value provides the optimal balance for the probabilistic similarity metric.
Figure 8. Impact of the KL divergence temperature λ on NDCG@10. The curve demonstrates that performance peaks at λ = 0.1, suggesting this value provides the optimal balance for the probabilistic similarity metric.
Symmetry 18 00078 g008
Figure 9. Visualization of Organization Clusters (K = 6).
Figure 9. Visualization of Organization Clusters (K = 6).
Symmetry 18 00078 g009
Figure 10. Comparison of Diversity and Coverage.
Figure 10. Comparison of Diversity and Coverage.
Symmetry 18 00078 g010
Figure 11. Visualization of Key Meta-Path Instances for Recommendation Explanation.
Figure 11. Visualization of Key Meta-Path Instances for Recommendation Explanation.
Symmetry 18 00078 g011
Table 1. A temporal relationship example of patent p1.
Table 1. A temporal relationship example of patent p1.
YearInventedByBelongsToCitesOwnedBy
2019i1, i2c1-a1
2020i1, i2c1p2a1
2021i1, i2c1p2, p3a1
2022i1, i2c1, c2p2, p3, p4a1
2023i1, i2c1, c2p2, p3, p4, p5a1
2024i1, i2, i3c1, c2p2, p3, p4, p5, p6a1
Table 2. Model Hyperparameter Settings.
Table 2. Model Hyperparameter Settings.
Hyper-ParameterSymbolValueDescription
Node Embedding Dimensionds128Output dimension of the R-GCN layers
Temporal Evolution Dimensiondp128Output dimension of the Bi-LSTM layers
Number of Attention Headsh8Number of heads in the multi-head attention
Hidden Dimensiondh256Hidden state dimension for the GFRU
FFN Dimensiondff512Intermediate layer dimension in the FFN
Temporal Window SizeL5Number of historical years to consider
Number of Negative SamplesK5Number of negative samples per positive instance
Recommendation Loss Weightμ1.0Weight for the recommendation task loss
Explanation Loss Weightω0.5Weight for the explanation generation task loss
L2 Regularization Coefficientγ10−5Strength of the L2 regularization
Similarity Balance Coefficientα0.6Weight for balancing deterministic and probabilistic similarity
KL Divergence Temperatureλ0.1Scaling parameter for the probabilistic distance
Table 3. Statistics of the USPTO-Semiconductor Dataset.
Table 3. Statistics of the USPTO-Semiconductor Dataset.
StatisticCountStatisticCount
Time Span2005–2024Cites Relations8,734,592
Patents1,254,321InventedBy Relations3,456,789
Organizations23,456BelongsTo Relations2,876,543
Inventors543,210OwnedBy Relations1,123,456
IPCs1234Acquired (Train)154,321
Total Nodes1,822,221Acquired (Val/Test)12,345/12,345
Total Edges16,345,701Sparsity~0%
Table 4. Statistics of the AMiner-CS Dataset.
Table 4. Statistics of the AMiner-CS Dataset.
StatisticCountStatisticCount
Time Span2005–2024Cites Relations25,789,123
Papers4,321,098Writes Relations15,432,109
Authors5,678,901PublishedIn Relations4,321,098
Organizations34,567Interactions (Train)12,345,678
Venues5432Interactions (Val/Test)23,456/23,456
Total Nodes10,040,000Total Edges45,542,330
Table 5. Performance Comparison of TEAHG-EPR and its Variants on the USPTO-Semiconductor Dataset (K = 10).
Table 5. Performance Comparison of TEAHG-EPR and its Variants on the USPTO-Semiconductor Dataset (K = 10).
ModelHR@10NDCG@10BLEU-4KC
TEAHG-EPR (Full Model)0.85420.65180.43150.7850
w/o Temporal0.82150.61800.42900.7795
w/o Uncertainty0.83980.63750.43080.7841
w/o Attention0.83550.63020.41880.7652
w/o Joint-Training0.84600.64330.39560.7104
w/o Feature-Guidance0.85390.65150.35210.1234
Table 6. Performance Comparison of All Models on the USPTO-Semiconductor Dataset.
Table 6. Performance Comparison of All Models on the USPTO-Semiconductor Dataset.
ModelHR@10NDCG@10HR@20NDCG@20HR@50NDCG@50BLEU-4ROUGE-LKC
BPR-MF0.45120.28450.58210.34560.73450.4123---
LightGCN0.71050.50110.81230.56780.90120.6234---
GLINT-RU0.69880.48900.79540.55010.88760.6012---
DHGPF0.80150.59870.88760.65430.94530.7011---
KGAT-SS0.83240.62780.90110.68010.95670.72560.38120.45030.1567
KAERR0.78500.57650.86540.63210.92340.68050.35430.43210.4501
TEAHG-EPR0.85420.65180.91560.70120.96780.74550.43150.48500.7850
Table 7. Performance Comparison of All Models on the AMiner-CS Dataset (K = 10).
Table 7. Performance Comparison of All Models on the AMiner-CS Dataset (K = 10).
ModelHR@10NDCG@10BLEU-4ROUGE-LKC
BPR-MF0.38760.2311---
LightGCN0.65430.4532---
GLINT-RU0.66120.4601---
DHGPF0.72010.5187---
KGAT-SS0.74350.53780.40110.47020.1899
KAERR0.70220.50130.38050.45650.4876
TEAHG-EPR0.76540.56010.45230.50110.8012
Table 8. Faithfulness Evaluation Results (Fidelity Score) (Higher values indicate that the explanation is more faithful to the model’s prediction logic).
Table 8. Faithfulness Evaluation Results (Fidelity Score) (Higher values indicate that the explanation is more faithful to the model’s prediction logic).
Explainer MethodFidelity Score (Prob. Drop)Relative Improvement
Random Selection0.032-
Retrieval-based (TF-IDF)0.152Baseline
Attention-based (KGAT-SS)0.285+87.5%
TEAHG-EPR (Ours)0.421+176.9%
Table 9. Recommendation Performance on Cold-Start Patents (USPTO) (Cold-Start: Degree ≤≤ 3 in citation/transfer graph).
Table 9. Recommendation Performance on Cold-Start Patents (USPTO) (Cold-Start: Degree ≤≤ 3 in citation/transfer graph).
ModelHR@10NDCG@10Relative Drop (vs. Full)
LightGCN0.21050.1532−69.4%
KGAT-SS0.61200.4532−27.8%
TEAHG-EPR0.68450.5124−21.4%
Table 10. Top-3 Patent Recommendations and Generated Explanations for Intel Corporation.
Table 10. Top-3 Patent Recommendations and Generated Explanations for Intel Corporation.
RankRecommended PatentGenerated Explanation
1US 11,211,465: “Nanosheet semiconductor device and method of manufacturing the same”“This patent is recommended because it aligns with Intel’s strategic focus on nanosheet transistor architecture. The technology, which involves vertically stacked gate-all-around FETs, is highly relevant given Intel’s recent advancements in RibbonFET and PowerVia technologies.”
2US 11,289,397: “Hybrid bonding with improved alignment”“We recommend this patent due to its strong relevance to hybrid bonding and advanced packaging. The proposed method for integrating HBM directly onto a logic die is a key technology, echoing Intel’s publicly stated goals in developing Foveros and EMIB chiplet integration.”
3US 11,271,139: “Methods for monolithic integration of quantum dot light emitting diodes on silicon”“This recommendation points to a potential new frontier: silicon photonics. The patent describes a method for integrating quantum dot emitters, which is highly relevant for future optical interconnects. This aligns with a potential long-term strategic direction for overcoming data transfer bottlenecks.”
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Huang, K.-W. Navigating Technological Frontiers: Explainable Patent Recommendation with Temporal Dynamics and Uncertainty Modeling. Symmetry 2026, 18, 78. https://doi.org/10.3390/sym18010078

AMA Style

Huang K-W. Navigating Technological Frontiers: Explainable Patent Recommendation with Temporal Dynamics and Uncertainty Modeling. Symmetry. 2026; 18(1):78. https://doi.org/10.3390/sym18010078

Chicago/Turabian Style

Huang, Kuan-Wei. 2026. "Navigating Technological Frontiers: Explainable Patent Recommendation with Temporal Dynamics and Uncertainty Modeling" Symmetry 18, no. 1: 78. https://doi.org/10.3390/sym18010078

APA Style

Huang, K.-W. (2026). Navigating Technological Frontiers: Explainable Patent Recommendation with Temporal Dynamics and Uncertainty Modeling. Symmetry, 18(1), 78. https://doi.org/10.3390/sym18010078

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop