3.2. Precision Recommendation Algorithm for Complete Demands
This study defines the recommendation algorithm for identifying patent supply solutions for complete demands as the “Precision Recommendation Algorithm for Complete Technological Demands” (PR-CTD). PR-CTD aims to identify the most precise patent supply solutions from the patent supply knowledge graph for individual complete enterprise demands. During the recommendation process, it must ensure that recommended patents achieve semantic correspondence and quantitative coverage across all five element types—Material, Method, Efficacy, Product, and Application. Under the objective of precise patent recommendation, PR-CTD emphasizes three aspects: (1) Element matching results must consist of five element entities from the same patent, avoiding “pseudo-matching results” formed by cross-patent element splicing; (2) Dual constraints of supply–demand element semantic matching and patent element type completeness must be simultaneously satisfied, ensuring recommendation results are reasonable in both element content and element quantity; (3) Patent recommendation results must possess interpretability, providing rationales for patent recommendations and offering matching bases for both supply and demand parties.
Building upon the traditional knowledge graph embedding model TransE and integrating LLMs with the enterprise demand lexical set, this study employs Retrieval-Augmented Generation (RAG) technology, enabling LLMs to transform demand texts into retrieval statements under demand constraints, compensating for deficiencies of traditional Trans models [44]. Compared with traditional single Trans models, this method offers the following advantages:
(1) Enhanced semantic understanding capability. Addressing TransE’s limitation of primarily relying on graph structure learning with limited semantic understanding, this study proposes a dual-vector encoding method combining BERT and TransE. BERT leverages semantic knowledge acquired during pre-training to effectively identify synonyms, hierarchical terms, and functionally equivalent expressions in supply–demand texts, enhancing the model’s understanding of natural language semantic differences. TransE retains its advantage in modeling knowledge graph structural relationships [45]. The fusion of both achieves unification of semantic understanding and structural modeling, resolving the issue where traditional TransE methods identify synonymous expressions as different entities, thereby affecting matching accuracy.
(2) Strengthened global consistency constraint mechanism. Addressing the lack of global constraints in Trans-series models, this study designs an intelligent constraint retrieval mechanism based on RAG. During the query generation stage, “same patent node constraints” are embedded in the LLM’s retrieval instructions. During the candidate patent generation stage, retrieval conditions for complete coverage of all five element types are prioritized. The enterprise demand lexical tree provides semantic support for RAG, ensuring that constraint conditions are accurately transmitted during semantic expansion. Specifically, the demand lexical tree is constructed using the five demand elements as the first-level nodes, forming a hierarchical demand terminology structure. Core demand terms extracted from enterprise technological demand texts are first organized through manual annotation, and then expanded using a large language model to generate semantically related expressions such as synonyms, alternative terms, and hypernyms. Based on this structure, each identified demand element can be expanded into an extended semantic set, which is used during the RAG retrieval process to improve semantic coverage between demand expressions and patent texts. This method simultaneously considers both element content matching and element completeness constraints, ensuring recommended patents not only accurately correspond to enterprise demand elements at the semantic level but also completely cover all five element types in content.
(3) Effective handling of data sparsity and sampling bias issues. Traditional deep learning models typically rely on large-scale training when addressing data sparsity and sampling bias issues, resulting in low efficiency and high costs. This study introduces RAG, transforming the training-dependent mode into an intelligent reasoning mode. Within this framework, the enterprise demand lexical tree serves as a terminology set for demand expansion, providing semantic context for LLMs; LLMs function as intelligent reasoning engines, understanding deep semantics of demands and generating high-quality retrieval statements; and the knowledge graph serves as the patent supply knowledge base, providing relational constraints for supply–demand matching. Through this triple assurance mechanism, RAG can transform demands into precise patent queries, effectively addressing sampling bias and sparse data issues in traditional methods.
The PR-CTD scheme is illustrated in Figure 2 and comprises five main stages:
Stage 1: Demand Parsing and Element Expansion
Demand parsing is the initial step of the PR-CTD algorithm, with the objective of identifying five element types from complete enterprise demands. The complete enterprise demand text is denoted as $D$, and five element types are extracted: Material, Method (Approach), Efficacy, Product, and Application (Use), as shown in Formula (1):
$$D = \{E_{Mat}, E_{Met}, E_{Eff}, E_{Pro}, E_{App}\} \quad (1)$$
where $E_{Mat}$ represents the material element set, $E_{Met}$ represents the method element set, $E_{Eff}$ represents the efficacy element set, $E_{Pro}$ represents the product element set, and $E_{App}$ represents the application element set.
To address the semantic diversity issue in demand expression, the PR-CTD algorithm introduces the demand lexical tree for element expansion. For each original element $e_i$, based on the hierarchical structure and semantic associations of the demand lexical tree $\mathcal{T}$, an expanded element set $\tilde{E}_i$ is generated.
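The element expansion step can be illustrated with a minimal sketch. The tree contents, the dictionary layout, and the function name `expand_element` are hypothetical and serve only to show how a first-level element type indexes related expressions; the actual lexical tree is built by manual annotation and LLM expansion as described above.

```python
# Hypothetical toy lexical tree: first-level nodes are the five element types,
# second-level keys are core demand terms, values are related expressions.
LEXICAL_TREE = {
    "Material": {"graphene": ["graphene sheet", "carbon nanomaterial"]},
    "Method": {"sintering": ["hot pressing", "thermal consolidation"]},
    "Efficacy": {},
    "Product": {},
    "Application": {},
}


def expand_element(element_type, term, tree=LEXICAL_TREE):
    """Return the original term followed by its related expressions, without duplicates."""
    related = tree.get(element_type, {}).get(term, [])
    expanded = [term]
    for expression in related:
        if expression not in expanded:
            expanded.append(expression)
    return expanded
```

A term that is absent from the tree is returned unchanged, so the expansion never loses the original demand element.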
Stage 2: Dual-Vector Encoding and Fusion
To fully utilize element semantic information and structural information from the patent supply knowledge graph, the PR-CTD algorithm employs a dual-vector encoding strategy combining BERT and TransE. BERT performs deep semantic encoding of expanded elements through a pre-trained language model, obtaining vector representations containing contextual information. TransE associates elements with corresponding entities in the knowledge graph, obtaining embedding vectors that reflect structural relationships among elements. Finally, these two vector types are unified through a weighted fusion strategy. For each element $e_i$, its BERT semantic vector $v_i^{B}$ and TransE structural vector $v_i^{T}$ are obtained respectively.
BERT Semantic Vector Encoding: Element text is input into the pre-trained BERT model, utilizing the hidden state corresponding to the [CLS] token in the final layer as the semantic representation of the element (Formula (2)):
$$v_i^{B} = \mathrm{BERT}(e_i)_{[CLS]} \quad (2)$$
TransE Structural Vector Acquisition: Element $e_i$ is mapped to its corresponding entity node $n_i$ in the patent supply knowledge graph, and its structural embedding vector pre-trained in the TransE model is extracted (Formula (3)):
$$v_i^{T} = \mathrm{TransE}(n_i) \quad (3)$$
To achieve effective fusion of BERT semantic vectors and TransE structural vectors, it is necessary to address dimension alignment and weight allocation issues. The semantic vectors output by the BERT model have a dimension of 768, while the TransE model embedding vectors have a dimension of 128. The two cannot directly undergo numerical operations. Therefore, a linear transformation matrix $W \in \mathbb{R}^{128 \times 768}$ and fusion weight $\alpha$ are introduced, employing normalized weighted fusion as shown in Formula (4):
$$v_i = \alpha \cdot \frac{W v_i^{B}}{\lVert W v_i^{B} \rVert} + (1 - \alpha) \cdot \frac{v_i^{T}}{\lVert v_i^{T} \rVert} \quad (4)$$
where $v_i$ represents the final fused vector representation of element $e_i$. Both the fusion weight $\alpha$ and transformation matrix $W$ are set as trainable parameters to construct a joint optimization objective based on the candidate patent set $P_{cand}$, as shown in Formula (5):
$$(\alpha^{*}, W^{*}) = \arg\max_{\alpha, W} \sum_{p \in P_{cand}} S(p) \quad (5)$$
Iterative updates are performed through the Adam optimizer with a learning rate set to 0.01. The objective of each iteration is to maximize the comprehensive matching score of the candidate patent set, synchronously updating the fusion weight parameter and matrix parameter through backpropagation. The comprehensive score of candidate patents is defined as the sum of weighted scores across three dimensions for all candidate patents: supply–demand element conformity, patent element structural rationality, and patent element coverage, used to measure the overall matching quality between the entire candidate set and enterprise technological demands. To ensure the meaningfulness of the fusion weight, $\alpha$ is constrained to the range [0, 1] after each parameter update.
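The fusion step can be sketched in plain Python as follows. This is an illustrative reading of the normalized weighted fusion, not the authors' implementation: the helper names are hypothetical, L2 normalization of each branch is an assumption, and the toy matrix is far smaller than the 128 × 768 projection described in the text.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def project(matrix, vector):
    """Apply a linear map given as a list of rows (stands in for the 128 x 768 matrix W)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def fuse(v_bert, v_transe, matrix, alpha):
    """Weighted fusion of a projected semantic vector and a structural vector.

    alpha weights the BERT branch and (1 - alpha) the TransE branch;
    alpha is clipped to [0, 1], mirroring the constraint applied after each update.
    """
    alpha = min(1.0, max(0.0, alpha))
    semantic = l2_normalize(project(matrix, v_bert))
    structural = l2_normalize(v_transe)
    return [alpha * s + (1 - alpha) * t for s, t in zip(semantic, structural)]


# Toy usage: a 3-dimensional "BERT" vector projected to 2 dimensions.
toy_matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
fused = fuse([3.0, 4.0, 0.0], [0.0, 2.0], toy_matrix, alpha=0.5)
```

Normalizing both branches before weighting keeps the fusion weight interpretable as a contribution ratio, since neither vector dominates through scale alone.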
Figure 3 illustrates the convergence trajectory of parameter $\alpha$ during the iteration process and the corresponding performance changes. Experimental results indicate that the $\alpha$ value continuously decreases over 20 iterations and converges to 0.367, with the corresponding comprehensive matching score improving from 13.80 to 14.91. After the 12th iteration, the rate of decrease in the $\alpha$ value slows significantly, indicating that the model gradually approaches an optimal state. The improvement magnitude of the comprehensive matching score becomes more gradual, with diminishing marginal returns. Considering the limited marginal benefit of further training, this study establishes $\alpha = 0.367$ at the 20th iteration as the optimal weight.
The optimal weight ratio determined through experimentation shows a BERT semantic information contribution of 36.7% and a TransE structural information contribution of 63.3%. This result indicates that in patent recommendation tasks, the contribution of structural information exceeds that of pure semantic information, aligning with the characteristics of patents having high structural organization and relatively explicit entity relationships.
Stage 3: Intelligent Constraint Retrieval Based on RAG
After obtaining demand vector representations, the PR-CTD algorithm employs RAG to screen candidate patent supply solutions from the patent supply knowledge graph. The objective of this stage is to efficiently screen a small-scale candidate patent set that highly matches enterprise technological demands from a large-scale patent collection, ensuring both retrieval precision and enhanced computational efficiency for subsequent stages. Unlike traditional query approaches, RAG integrates graph retrieval with LLM reasoning through a three-stage process of “retrieval–augmentation–generation,” achieving transformation from semantic retrieval to intelligent optimization. The prompt instructions for RAG-based candidate patent screening are provided in Appendix B, Figure A2 (Prompt template for candidate patent screening).
(1) Retrieval Stage
The objective of this stage is to execute constraint retrieval based on fused vectors in the patent supply knowledge graph, obtaining an initial candidate patent set with complete structure and relevant content.
Based on the fused vectors obtained in Stage 2, dual constraint retrieval is executed in the patent supply knowledge graph. Cypher query statements are constructed in the Neo4j graph database to identify patents possessing all five elements through pattern matching and to calculate similarity between the five-element fused vectors and patent element embedding vectors. Empirical results confirm that the selected similarity threshold can identify element correspondence relationships while ensuring semantic accuracy; it is therefore used as the patent retrieval threshold.
Constraint 1: Cypher query statements are constructed to ensure that the five element types of candidate patents must originate from the same patent node, expressed as Formula (6):
$$\forall k \in \{Mat, Met, Eff, Pro, App\}:\ \exists\, (p, r_k, e_k) \in G \quad (6)$$
Here $(p, r, e)$ represents a triple in the patent supply knowledge graph $G$, indicating that patent $p$ is connected to element $e$ through relation $r$. Constraint 1 aims to avoid pseudo-matching issues arising from splicing elements across different patents, ensuring element completeness of recommendation results.
Constraint 2: Patent ranking based on vector similarity. Priority is given to retrieving patents covering all five element types. When this condition cannot be satisfied, the requirement is relaxed to covering at least four element types with mandatory inclusion of Method and Efficacy, expressed as Formula (7):
$$P_{init} = \begin{cases} \{p \mid N(p) = 5\}, & \text{if such patents exist} \\ \{p \mid N(p) \ge 4,\ \{\text{Method}, \text{Efficacy}\} \subseteq \mathrm{Types}(p)\}, & \text{otherwise} \end{cases} \quad (7)$$
$P_{init}$ represents the initial candidate patent set obtained from retrieval, providing input data for subsequent augmentation and generation stages. $N(p)$ denotes the number of element categories covered by patent $p$, and $\mathrm{Types}(p)$ represents the set of element types contained in patent $p$.
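The relaxation logic of Constraint 2 can be sketched as a small filter. The input layout (a dictionary of patent identifiers with a set of element types and a precomputed similarity) and the function name `select_candidates` are assumptions made for illustration; in the study this step is executed as a graph query rather than in application code.

```python
ELEMENT_TYPES = {"Material", "Method", "Efficacy", "Product", "Application"}
REQUIRED_WHEN_RELAXED = {"Method", "Efficacy"}


def select_candidates(patents):
    """Return patent ids ranked by similarity, preferring full five-type coverage.

    patents: dict mapping patent id -> {"types": set of element types,
                                        "similarity": float}.
    """
    full = [pid for pid, info in patents.items() if ELEMENT_TYPES <= info["types"]]
    if full:
        chosen = full
    else:
        # Relaxed rule: at least four element types, Method and Efficacy mandatory.
        chosen = [
            pid
            for pid, info in patents.items()
            if len(info["types"] & ELEMENT_TYPES) >= 4
            and REQUIRED_WHEN_RELAXED <= info["types"]
        ]
    return sorted(chosen, key=lambda pid: patents[pid]["similarity"], reverse=True)
```

Because every element type is read from a single patent record, the filter cannot assemble a match from elements of different patents, which is the point of the same-patent constraint.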
(2) Augmentation Stage
The objective of the augmentation stage is to combine retrieved patents with demand information to construct a contextual environment for LLM reasoning. This enables the model to possess an information foundation for executing intelligent reasoning and provides knowledge support for subsequent intelligent screening.
The top 20 candidate patents are selected from the retrieval set $P_{init}$, their five element types are extracted, and these are combined with demand element information to form the augmented context $C$. Context $C$ comprises two core components, expressed as Formula (8):
$$C = \{E_D, P_{20}\} \quad (8)$$
where $E_D$ represents the complete enterprise demand element set, including demand elements and elements expanded based on the demand lexical tree, and $P_{20}$ represents the candidate patent set, containing the five element types for each patent.
(3) Generation Stage
The objective of this stage is to invoke LLMs to execute intelligent reasoning on the augmented context, screening out the patent set that most closely aligns with enterprise demands. This study employs Alibaba Cloud Tongyi Qianwen Plus (Qwen-Plus) as the reasoning model and combines prompt engineering to design task-oriented templates that guide the model in executing two analysis tasks:
(1) Element matching analysis: Based on information from the demand lexical tree, identify semantic relationships between demands and patents including synonyms, hypernym–hyponym relationships, and equivalent terms, evaluating the matching quality of the five element types in candidate patents.
(2) Comprehensive screening decision: Comprehensively evaluate each candidate patent across three dimensions: supply–demand element conformity (prioritizing patents with explicit element correspondence relationships), patent element structural rationality (prioritizing patents with strong technical associations among elements), and patent element coverage completeness (prioritizing patents covering all elements). Based on this evaluation, 10 patents are selected from the candidates to form the screened candidate patent set.
The entire RAG process achieves a complete workflow from vector retrieval through context augmentation to LLM intelligent optimization. The final output, the screened candidate patent set, serves as input for multi-dimensional matching score calculation in Stage 4, providing a high-quality candidate patent set for comprehensive scoring and precise recommendation.
Stage 4: Multi-Dimensional Matching Score Calculation
After obtaining the RAG-optimized candidate patent set, the PR-CTD algorithm assesses the matching degree of each candidate patent across three dimensions: supply–demand content conformity, patent element structural rationality, and patent element coverage, establishing a quantitative evaluation framework:
(1) Supply–Demand Content Conformity (For Complete Demands)
This metric measures the degree of semantic matching between elements of enterprise technological demands and patent supply. The cosine similarity between demand element vectors and patent element vectors is calculated, as shown in Formula (9):
$$S_{con}(p) = \frac{1}{5} \sum_{i=1}^{5} \cos\left(v_i^{D}, v_i^{p}\right) \quad (9)$$
where $v_i^{D}$ represents the vector representation of the $i$-th element type in the demand, and $v_i^{p}$ represents the vector representation of the corresponding element in the patent.
(2) Patent Element Structural Rationality
This metric is used to verify the logical relationships among elements within the patent, ensuring that recommendation results are not only semantically similar but also structurally rational. TransE is utilized to verify the structural rationality of elements within patents, as shown in Formula (10):
$$S_{str}(p) = \frac{1}{|R_p|} \sum_{(r, e) \in R_p} \sigma\left(-\lVert h_p + r - t_e \rVert_2\right) \quad (10)$$
where $h_p$ represents the TransE vector of the patent node, $r$ represents the relation vector, $t_e$ represents the element entity vector, and $R_p$ denotes the set of relations between the patent and its elements. $\sigma$ is the sigmoid function, used to convert the TransE distance metric into a standardized similarity score. The TransE model is based on the translation assumption $h + r \approx t$. When vectors of the patent node, relation, and element entity satisfy this assumption, the Euclidean distance $\lVert h_p + r - t_e \rVert_2$ approaches zero, so triples with smaller distances receive higher structural rationality scores.
The core significance of this metric lies in verifying the causal relationships and logical rationality between patents and their elements—specifically, whether each element is essential to implementing the technology. This ensures that recommended patents not only exhibit semantic similarity to demands at the textual level but also match demands in element composition, thereby avoiding misrecommendations arising from superficial textual similarity.
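The structural rationality check can be sketched as follows. The exact scoring form is an assumption: this sketch applies the sigmoid to the negated Euclidean distance and averages over the patent's triples, which preserves the stated property that smaller distances yield higher scores. The function name and input layout are hypothetical.

```python
import math


def structural_rationality(patent_vector, triples):
    """Average sigmoid-transformed TransE score over a patent's (relation, element) pairs.

    patent_vector: TransE embedding of the patent node (head).
    triples: list of (relation_vector, element_vector) pairs.
    Assumed scoring form: sigmoid(-||h + r - t||), so a perfect translation scores 0.5.
    """
    if not triples:
        return 0.0
    scores = []
    for relation, element in triples:
        distance = math.sqrt(
            sum((h + r - t) ** 2 for h, r, t in zip(patent_vector, relation, element))
        )
        scores.append(1.0 / (1.0 + math.exp(distance)))  # equals sigmoid(-distance)
    return sum(scores) / len(scores)


# A triple that satisfies h + r = t exactly, and one that violates it.
consistent = structural_rationality([1.0, 0.0], [([0.0, 1.0], [1.0, 1.0])])
inconsistent = structural_rationality([1.0, 0.0], [([0.0, 1.0], [1.0, 4.0])])
```

Under this assumed form the score is bounded above by 0.5, so any rescaling needed to make the three dimensions comparable would have to be applied separately.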
(3) Patent Element Coverage
This metric is used to evaluate the completeness of patent supply’s response to enterprise technological demand elements. It counts the number of elements in candidate patents whose semantic similarity with the five demand element types exceeds a threshold and calculates the coverage ratio, as shown in Formula (11):
$$S_{cov}(p) = \frac{1}{5} \sum_{i=1}^{5} \mathbb{1}\left[\max_{e \in E_p} \cos\left(v_i^{D}, v_e\right) > \tau\right] \quad (11)$$
This metric employs a threshold judgment mechanism: for each demand element type $i$, the algorithm checks whether a corresponding element exists in the patent with similarity exceeding threshold $\tau$. If such an element exists, the patent is considered to cover the $i$-th element type of the demand. The final coverage score is the ratio of the number of covered element categories to the total number of element categories. A patent with a coverage score of 1 indicates it contains all five element types. This study sets the similarity threshold $\tau$ to 0.7.
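The threshold judgment can be illustrated with a short sketch. It assumes that a "corresponding element" means a patent element of the same type as the demand element, and that vectors are plain lists; both the data layout and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def coverage_score(demand_vectors, patent_vectors, threshold=0.7):
    """Fraction of demand element types matched by a patent element above the threshold.

    demand_vectors: dict element type -> demand vector.
    patent_vectors: dict element type -> list of patent element vectors of that type.
    """
    covered = 0
    for element_type, demand_vector in demand_vectors.items():
        similarities = [
            cosine(demand_vector, candidate)
            for candidate in patent_vectors.get(element_type, [])
        ]
        if similarities and max(similarities) > threshold:
            covered += 1
    return covered / len(demand_vectors)


demand = {t: [1.0, 0.0] for t in ("Material", "Method", "Efficacy", "Product", "Application")}
patent = {
    "Material": [[1.0, 0.0]],
    "Method": [[1.0, 0.1]],
    "Efficacy": [[0.0, 1.0]],  # orthogonal to the demand vector, so not covered
    "Product": [[1.0, 0.0]],
    "Application": [[0.9, 0.1]],
}
score = coverage_score(demand, patent)
```

In this toy case four of the five element types exceed the 0.7 threshold, giving a coverage score of 0.8.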
Stage 5: Comprehensive Scoring and Optimal Selection
Based on the scores from the three dimensions above, the PR-CTD algorithm employs a weighted linear fusion strategy to calculate the comprehensive matching score for candidate patents, as shown in Formula (12):
$$S(p) = \beta_1 S_{con}(p) + \beta_2 S_{str}(p) + \beta_3 S_{cov}(p) \quad (12)$$
where $\beta_1$, $\beta_2$, and $\beta_3$ are weight parameters satisfying $\beta_1 + \beta_2 + \beta_3 = 1$. The weights are determined according to the functional roles of different evaluation dimensions in the patent recommendation process.
Supply–demand element similarity directly measures the semantic correspondence between enterprise technological demands and patent contents and therefore reflects the core relevance of recommended patents. Because semantic consistency between demand descriptions and patent technical information is the primary criterion for determining whether a patent can potentially address a technological need, this dimension plays the most critical role in recommendation quality. Therefore, β1 is set to 0.5.
Patent element structural rationality evaluates the logical consistency among internal patent elements based on the TransE model. This metric verifies whether the relationships among materials, methods, and technical effects within a patent follow a coherent technical logic and thus helps ensure the technical plausibility of the recommendation results. Since this dimension functions as a structural validation mechanism rather than a direct relevance indicator, β2 is set to 0.3.
Patent element coverage evaluates whether the recommended patents contain a relatively complete set of relevant technical elements, ensuring the completeness of the recommendation results. Compared with semantic relevance and structural rationality, this metric mainly serves as a supporting indicator. Therefore, β3 is set to 0.2.
Finally, the patent with the highest comprehensive score is selected as the recommendation result, as shown in Formula (13):
$$p^{*} = \arg\max_{p} S(p) \quad (13)$$
Simultaneously, the algorithm provides comprehensive matching evidence, including matching scores across all dimensions and comprehensive score calculations, ensuring interpretability and traceability of recommendation results.
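The weighted fusion and final selection reduce to a few lines of arithmetic. The sketch below uses the weights reported above (0.5, 0.3, 0.2); the candidate scores are made-up illustrative numbers, and the function names are hypothetical.

```python
BETA = (0.5, 0.3, 0.2)  # weights for conformity, structural rationality, coverage


def comprehensive_score(conformity, structure, coverage, beta=BETA):
    """Weighted linear fusion of the three dimension scores."""
    return beta[0] * conformity + beta[1] * structure + beta[2] * coverage


def recommend(candidates):
    """Return (best id, evidence), where evidence keeps every dimension score per patent."""
    evidence = {
        pid: {
            "conformity": scores[0],
            "structure": scores[1],
            "coverage": scores[2],
            "total": comprehensive_score(*scores),
        }
        for pid, scores in candidates.items()
    }
    best = max(evidence, key=lambda pid: evidence[pid]["total"])
    return best, evidence


# Illustrative scores only: (conformity, structural rationality, coverage).
best_id, evidence = recommend({"P1": (0.9, 0.4, 1.0), "P2": (0.8, 0.45, 0.8)})
```

Here P1 scores 0.5 × 0.9 + 0.3 × 0.4 + 0.2 × 1.0 = 0.77 against 0.695 for P2, and the returned evidence dictionary carries the per-dimension scores that support interpretability.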
In summary, PR-CTD achieves supply–demand matching from complete demands to precise patent recommendations by integrating the semantic expansion capability of demand lexical trees, the deep semantic understanding capability of BERT, and the structural modeling capability of TransE, forming a precision recommendation algorithm oriented toward complete demands.
3.3. Fuzzy Recommendation Algorithm for Incomplete Demands
This study defines the recommendation algorithm for constructing multi-candidate patent sets for incomplete demands as the “Fuzzy Recommendation Algorithm for Incomplete Technological Demands” (FR-ITD). Enterprise incomplete demands suffer from ambiguous demand expression and element deficiency, making it difficult to achieve accurate patent supply–demand matching relying solely on enterprise demand texts. Therefore, the core objective of the FR-ITD algorithm is to identify multiple patent supply solutions based on elements and demand categories of incomplete demands, providing demand parties with multiple potentially needed patents for comparative selection. FR-ITD emphasizes three aspects: (1) content elements of recommended patents must match incomplete demand elements to ensure supply–demand content relevance; (2) FR-ITD must reasonably expand the recommendation scope based on demand categories to provide enterprises with potentially needed related patents; (3) the recommended patent set should present diverse technical approaches, providing enterprises with multiple options for comparative selection.
Existing patent recommendation methods exhibit certain limitations when addressing incomplete demands, primarily manifested in three aspects: First, the demand element deficiency problem—incomplete demands provide only partial element information, and matching based solely on this information tends to introduce numerous irrelevant patents due to excessively broad demand scope. Second, the demand category positioning problem—existing patent recommendation methods do not consider categories of incomplete demands, overlooking enterprises’ technology directions, technology foundations, and capability requirements, making it difficult to grasp which patents enterprises truly need. Third, insufficient diversity of recommendation solutions—most recommendation algorithms target precise matching, neglecting the actual need for diversified recommendation solutions for incomplete demands.
To address these issues, this study designs a fuzzy recommendation algorithm (FR-ITD) integrating the patent supply knowledge graph, demand categories, and LLMs. This algorithm supplements enterprise demand information through demand categories, constructing an augmented query context integrating demand element and demand category information. Patent retrieval is executed in the patent supply knowledge graph, with LLMs utilized to evaluate and screen candidate patents. Finally, comprehensive consideration of supply–demand content conformity, demand category alignment, and patent supply solution type produces patent recommendation solutions with high supply–demand matching and diverse solutions. Compared with traditional patent recommendation methods, FR-ITD offers two advantages:
(1) Demand categories supplement demand information. The demand category matrix integrates enterprise technology foundation, technology direction, and capability requirement information, reasonably expanding demand categories and refining demand bases for patent matching, which is conducive to improving relevance and effectiveness of patent recommendation results.
(2) LLMs enhance recommendation effectiveness and efficiency. FR-ITD inputs both demand information and patent information into LLMs, enabling them to identify content associations between patent solutions and incomplete demands based on semantic understanding capabilities. This guides comprehensive judgment from supply–demand content and demand category dimensions, rapidly locating patents highly relevant to incomplete demands among large-scale candidate patents, facilitating improved patent recommendation efficiency.
The FR-ITD scheme is illustrated in
Figure 4. The algorithm comprises four main stages:
Stage 1: Priority Ranking of Demand Category Terms
To ensure that demand category terms highly relevant to incomplete demands are prioritized in the patent supply–demand matching process, this study conducts priority ranking of demand category terms based on the demand category term matrix and similarity between “demand category terms—demand elements”.
First, the comprehensive fitness score of each demand category term $c_i$ in the demand matrix is calculated, which is the average of similarity scores with other category terms, used to measure the association level with other category terms. A higher comprehensive fitness score indicates greater representativeness and importance in the demand, as shown in Formula (14):
$$F(c_i) = \frac{1}{N_i} \sum_{j=1}^{N_i} \mathrm{sim}(c_i, c_j) \quad (14)$$
where $\mathrm{sim}(c_i, c_j)$ represents the semantic similarity between demand category term $c_i$ and the $j$-th category term from a different axis, and $N_i$ is the number of terms from different axes relative to demand category term $c_i$. Based on the comprehensive fitness scores, the top 10 ranked demand keywords are selected to construct a candidate vocabulary pool.
Subsequently, the semantic similarity between each candidate term and demand elements is calculated. Element information is extracted from the demand, the BERT model generates element vector $v_E$, and the semantic similarity between candidate keyword $k_j$ and demand elements is calculated, as shown in Formula (15):
$$\mathrm{sim}(k_j, E) = \cos\left(v_{k_j}, v_E\right) \quad (15)$$
where $v_{k_j}$ represents the BERT vector representation of keyword $k_j$. Demand category terms are ranked according to semantic similarity scores, and the top 5 terms with the highest relevance to the current demand are selected as priority demand category terms for subsequent retrieval, as shown in Formula (16):
$$K_{top5} = \mathop{\mathrm{Top5}}_{k_j} \ \mathrm{sim}(k_j, E) \quad (16)$$
Stage 2: Augmented Query Context Construction
Augmented query context refers to integrating multiple types of information to form semantically more complete query content. FR-ITD organizes and combines enterprise demand texts, demand element information, and demand category terms to construct the augmented query context. This stage hierarchically organizes different types of information, providing retrieval conditions for the knowledge graph on one hand and supporting LLMs in understanding demands on the other, as shown in Formula (17):
$$C_{aug} = D_{text} \oplus [\mathrm{SEP}_1] \oplus E \oplus [\mathrm{SEP}_2] \oplus K_{top5} \oplus [\mathrm{SEP}_3] \oplus B \quad (17)$$
where $D_{text}$ represents the original text of the enterprise’s incomplete demand, preserving complete semantic information of the demand; $E$ denotes element information extracted from the demand, ensuring accuracy of patent recommendation direction; $K_{top5}$ represents the top 5 demand category terms from Stage 1, supplementing demand categories; $B$ comprises enterprise capability requirement terms and technology direction terms from the demand category matrix, providing enterprise background information for patent retrieval; and $[\mathrm{SEP}_1]$, $[\mathrm{SEP}_2]$, and $[\mathrm{SEP}_3]$ are semantic separators ensuring that LLMs can accurately distinguish different types of input information.
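The context assembly can be sketched as simple string concatenation. The separator tokens, the "; " joining convention, and the function name are assumptions for illustration; the study does not specify the literal separator strings.

```python
def build_augmented_context(
    demand_text,
    elements,
    top5_terms,
    background_terms,
    separators=("[SEP1]", "[SEP2]", "[SEP3]"),  # hypothetical separator tokens
):
    """Concatenate the four information blocks, each pair split by its own separator."""
    blocks = [
        demand_text,
        "; ".join(elements),
        "; ".join(top5_terms),
        "; ".join(background_terms),
    ]
    context = blocks[0]
    for separator, block in zip(separators, blocks[1:]):
        context += f" {separator} {block}"
    return context


example = build_augmented_context(
    "We need a corrosion-resistant coating.",
    ["coating", "corrosion resistance"],
    ["surface treatment"],
    ["pilot-scale production"],
)
```

Using a distinct separator per boundary lets a prompt refer to each block unambiguously, for example by instructing the model to treat everything after the third separator as enterprise background.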
Stage 3: Intelligent Retrieval and Screening Based on Knowledge Graph
Based on the augmented query context constructed in Stage 2, this stage executes intelligent retrieval and screening in the patent supply knowledge graph. This stage comprises three steps:
(1) Patent Retrieval Term Extraction
Patent retrieval terms are extracted from the augmented query context, including demand elements, top-5 demand category terms, enterprise capability requirement terms, and technology direction terms. These terms are integrated to form the patent supply knowledge graph query term set.
(2) Knowledge Graph Constraint Retrieval
Based on the query term set, composite retrieval conditions are constructed in the patent supply knowledge graph. Priority is given to matching patents containing demand elements while expanding to patents associated with demand category terms. Entity retrieval is executed through MATCH pattern matching and WHERE condition filtering to obtain the initial candidate patent set.
(3) LLM-Based Candidate Patent Screening
The initial candidate patent set and its element information, together with the augmented query context, are jointly input into the LLM. A prompt is designed to guide the LLM in executing supply–demand matching analysis. The patents retained by the LLM’s screening serve as the patents to be scored in Stage 4.
Stage 4: Patent Recommendation Scoring
Stage 3 obtained the screened candidate patent set. To evaluate these candidate patents, FR-ITD quantitatively assesses their matching degree from two dimensions: supply–demand content conformity and demand category alignment.
(1) Supply–Demand Content Conformity (For Incomplete Demands)
Supply–demand content conformity measures the degree to which candidate patents respond to identified elements in incomplete demands. For each patent $p_j$ in the candidate patent set, its supply–demand content conformity is calculated as shown in Formula (18):
$$S_{con}(p_j) = \frac{1}{|E_D|} \sum_{d \in E_D} \max_{e \in E_{p_j}} \cos\left(v_d, v_e\right) \quad (18)$$
where $E_D$ represents the demand element set, and $E_{p_j}$ represents the element set contained in patent $p_j$. $v_d$ and $v_e$ denote the BERT vector representations of demand elements and patent elements, respectively. By calculating similarity between demand elements and patent elements, this metric ensures that each known demand element can find corresponding responsive content in candidate patents, preventing recommendation results from deviating from demands.
(2) Demand Category Alignment
Demand category alignment is used to evaluate the degree of semantic matching between candidate patents and top-5 demand category terms, reflecting the response level of patent content to enterprise technological demand categories, as shown in Formula (19):
$$S_{cat}(p_j) = \frac{1}{|K_{top5}|} \sum_{k \in K_{top5}} \max_{e \in E_{p_j}} \cos\left(v_k, v_e\right) \quad (19)$$
where $K_{top5}$ represents the top-5 demand category terms from Stage 1, and $v_k$ denotes the vector representation of a top-5 demand category term. This metric measures the matching degree between patent content and demand categories by calculating semantic similarity between top-5 demand category terms and patent element vectors, enabling the algorithm to prioritize recommending patents that satisfy enterprise demand categories.
The final comprehensive score calculation is shown in Formula (20):
$$S(p_j) = \gamma_1 S_{con}(p_j) + \gamma_2 S_{cat}(p_j) \quad (20)$$
γ1 represents the weight parameter for supply–demand content conformity, while γ2 represents the weight parameter for demand category alignment. In this study, γ1 is set to 0.4 and γ2 is set to 0.6. Supply–demand content conformity measures the semantic correspondence between enterprise technological demands and patent technical content. It serves as a fundamental indicator for establishing supply–demand relationships in the patent recommendation process. The technical elements identified from demand texts represent the core content of enterprise technological needs. By semantically matching these elements with patent technical information, it is possible to determine whether a patent has the potential to address the technological problem described in the demand. Therefore, a moderate weight is assigned to supply–demand content conformity to ensure that the recommendation results maintain basic technical relevance while leaving sufficient space for demand category expansion. Demand category alignment evaluates the consistency between patent technologies and enterprise technological demands at the level of technological direction and capability foundation. Demand categories are typically constructed based on information such as the enterprise’s technological field, capability requirements, and technology development direction, which together reflect the overall technological positioning of the enterprise in the innovation process. In the patent recommendation framework, demand category alignment helps filter technical solutions that are consistent with the enterprise’s technological foundation and ensures that the recommended patents remain aligned with the enterprise’s technological development direction. Therefore, a relatively higher weight is assigned to this dimension, with γ2 = 0.6. 
Overall, the above weight settings are determined according to the functional roles of different evaluation dimensions in the patent recommendation process, ensuring the stability and rationality of the recommendation results.
Patents are ranked in descending order according to comprehensive scores. The method ultimately outputs a list of the top five recommended patents, with each patent accompanied by its supply–demand content conformity score and demand category alignment score, providing demand parties with multiple patent recommendation results with interpretable matching evidence.
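The final scoring and top-five output can be sketched as follows, using the reported weights of 0.4 and 0.6. The candidate scores are illustrative numbers only, and the function name and tuple layout are hypothetical.

```python
def rank_fuzzy_candidates(candidates, gamma1=0.4, gamma2=0.6, top_n=5):
    """Rank candidates by weighted score and keep the per-dimension evidence.

    candidates: dict patent id -> (content conformity, category alignment).
    Returns a list of (patent id, total, conformity, alignment), best first.
    """
    scored = [
        (pid, gamma1 * conformity + gamma2 * alignment, conformity, alignment)
        for pid, (conformity, alignment) in candidates.items()
    ]
    scored.sort(key=lambda row: row[1], reverse=True)
    return scored[:top_n]


# Illustrative scores only.
ranked = rank_fuzzy_candidates(
    {
        "P1": (0.9, 0.2),
        "P2": (0.5, 0.9),
        "P3": (0.7, 0.7),
        "P4": (0.3, 0.4),
        "P5": (0.6, 0.5),
        "P6": (0.1, 0.1),
    }
)
```

With these weights P2 (0.4 × 0.5 + 0.6 × 0.9 = 0.74) outranks P1 (0.48) even though P1 has the higher content conformity, which illustrates how the larger weight on category alignment broadens the recommendation beyond literal element matches.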
FR-ITD addresses the element deficiency problem of incomplete demands through a four-stage cascading mechanism of “demand category term priority ranking + augmented context construction + knowledge graph intelligent retrieval and screening + two-dimensional scoring,” integrating demand category information to achieve supply–demand matching from vague demands to diversified patent candidate sets.