3.1. Research Framework
This study proposes a demand-driven patent recommendation method aimed at providing enterprises with patent supply solutions that effectively meet their technological needs. The method addresses key challenges in the patent recommendation process, including semantic misalignment between supply and demand and incomplete demand content. The framework consists of four sequential stages, as illustrated in Figure 1.
Stage 1: Identification of supply–demand elements.
In this stage, enterprise technology demand texts and patent supply texts are collected separately. A topic clustering method is applied to identify the main constituent elements of both demand and supply content. Based on these results, an enterprise demand element system and a patent supply element system are constructed to provide a structured foundation and a semantic bridge for subsequent supply–demand correlation analysis.
Stage 2: Construction of the enterprise technical problem space and the patent solution space.
On the demand side, real enterprise technology demand texts are analyzed using a combination of content analysis and large language models to automatically extract and classify enterprise technical problems, forming a taxonomy of ten enterprise technical problem categories. On the supply side, patent texts are analyzed based on the forty inventive principles of TRIZ. These principles are optimized and reconstructed to establish a representative and discriminative classification system of patent solution types, forming a structured correspondence between enterprise problems and patent solutions.
Stage 3: Supply–demand correlation analysis based on element relationships.
According to the correspondence of supply–demand elements and the characteristics of demand types, the study categorizes supply–demand correlations into explicit correlations and implicit correlations. A hierarchical and progressive correlation pathway is constructed to reveal the matching patterns and logical mechanisms of different demand types, clarifying the direct matching mechanism for explicit demands and the reasoning-based matching mechanism for implicit demands.
Stage 4: Demand-driven patent recommendation.
In this stage, differentiated patent recommendation strategies are designed for explicit and implicit demands. Specific evaluation indicators and scoring methods are developed according to the characteristics of each demand type to quantitatively assess candidate patents and select those that best meet enterprise requirements as recommendation results. Finally, representative enterprise demand cases are used to validate the proposed method and to evaluate its recommendation performance and practical applicability.
3.2. Classification and Identification of Enterprise Technology Demands
According to structural functional theory, elements are the basic units that constitute objective phenomena [28]. Different elements perform specific functions and form a complete system through relatively stable interconnections. Although the specific technological demands of enterprises are diverse, from the perspective of element decomposition, these demands can be broken down into a set of fundamental units with intrinsic logical relationships.
Existing studies have not clearly defined or classified the elements of enterprise technological demands. Current academic research mainly relies on patents and scientific papers as analytical texts, extracting technical elements from them and constructing general classification systems that include dimensions such as products, methods, materials, components or parts, efficacy or performance, technical attributes, application fields, and influencing factors [29]. The technological element systems derived from such scientific and technical literature are characterized by academic rigor and specialization, focusing primarily on explaining the principles of technological innovation and implementation mechanisms.
To reveal the content structure and expression characteristics of enterprise technological demands, this study conducts content analysis based on authentic enterprise demand data. Accordingly, the provincial-level Technology Transfer and Commercialization Public Service Platforms established across China were selected as the primary data source. These platforms are constructed and managed under the supervision of provincial government departments and possess three key advantages: authenticity, comprehensiveness, and public welfare orientation.
First, compared with commercial patent transaction platforms or user-generated information on social media, provincial platforms offer higher data authenticity. Each platform enforces strict qualification review procedures, allowing only legally registered enterprises to submit and publish their technological demands. This mechanism ensures the traceability and reliability of enterprise information.
Second, these platforms demonstrate remarkable advantages in terms of data coverage and industrial representativeness. They aggregate a large number of enterprise technology demand records from multiple key sectors—such as energy, chemical engineering, power, electronic information, equipment manufacturing, and modern agriculture—covering enterprises of various sizes and types. This wide coverage provides a comprehensive reflection of the technological demand characteristics of China’s industrial sectors.
Third, the enterprise demand texts collected from these provincial platforms primarily focus on practical technological problems, transformation bottlenecks, and application objectives. As such, they objectively capture the real needs of enterprises in technological innovation and industrial upgrading, offering high policy relevance and strong application-oriented value.
Therefore, this study collected demand texts from provincial-level public service platforms for technology transfer and commercialization established in Anhui, Fujian, Gansu, Hebei, Henan, Heilongjiang, Hunan, Jilin, Jiangxi, Shanxi, Shaanxi, and Yunnan. These enterprise demand texts were compared with scientific and technical literature such as patents and research papers. The comparison revealed that enterprise technological demands place greater emphasis on practical application issues, focusing on technical challenges, directions for improvement, and expected outcomes, while paying relatively little attention to the principles of innovation or technical implementation paths. This difference in content focus suggests that element systems constructed from scientific literature are inadequate for accurately representing the content characteristics of enterprise technological demands. Therefore, it is necessary to refine and optimize existing technical element classification systems by integrating enterprise demand texts, in order to construct a classification framework better suited to describing enterprise technological demands.
In this study, 5000 enterprise technological demand texts in the new energy field were selected as the research sample. The BERTopic model was employed to perform topic modeling and semantic clustering analysis for extracting the content elements of enterprise technology demands. BERTopic integrates Transformer-based semantic encoding with the class-based TF-IDF (c-TF-IDF) weighting algorithm, which allows the model to preserve high-value semantic information during topic modeling and enhances both topic distinctiveness and interpretability [30]. The model parameters were configured as follows: the text embedding model paraphrase-multilingual-MiniLM-L12-v2 was used, with n_neighbors = 15, n_components = 2, min_cluster_size = 80, and min_samples = 10. These settings achieve a balance between topic coherence and clustering granularity, ensuring that the resulting clusters accurately reflect meaningful patterns in enterprise technological demands.
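The reported configuration can be sketched with the BERTopic library as follows. This is a minimal illustration, not the study's actual script: it assumes that n_neighbors and n_components are UMAP settings and that min_cluster_size and min_samples are HDBSCAN settings, and the distance metrics, the random seed, and the function names are assumptions introduced here.

```python
# Hyperparameters reported in the text. Assigning n_neighbors/n_components to
# UMAP and min_cluster_size/min_samples to HDBSCAN is an assumption.
EMBEDDING_MODEL = "paraphrase-multilingual-MiniLM-L12-v2"
UMAP_PARAMS = {"n_neighbors": 15, "n_components": 2}
HDBSCAN_PARAMS = {"min_cluster_size": 80, "min_samples": 10}


def build_topic_model():
    """Assemble a BERTopic model from the settings above.

    Imports are local so the constants can be inspected without the
    third-party packages (bertopic, umap-learn, hdbscan) installed.
    """
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from umap import UMAP

    # metric and random_state are assumptions; the text does not report them.
    umap_model = UMAP(metric="cosine", random_state=42, **UMAP_PARAMS)
    hdbscan_model = HDBSCAN(metric="euclidean", prediction_data=True, **HDBSCAN_PARAMS)
    return BERTopic(
        embedding_model=EMBEDDING_MODEL,
        umap_model=umap_model,
        hdbscan_model=hdbscan_model,
    )


def fit_topics(demand_texts):
    """Cluster a list of demand texts; returns (model, topic id per text)."""
    model = build_topic_model()
    topics, _probabilities = model.fit_transform(demand_texts)
    return model, topics
```

Calling fit_topics on the list of demand texts would return one topic id per text; the topics would then still need manual inspection before being grouped into content elements.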
Based on the clustering results, this study further decomposed enterprise technological demands into five fundamental content elements, which provide a clear structural foundation for subsequent analyses, including demand classification, demand identification, and supply–demand matching.
Table 1 summarizes the thematic clustering results of enterprise technological demands, while Table 2 presents the definitions and corresponding meanings of each element category.
Ideally, a complete and well-defined enterprise technology demand should include all five types of elements, thereby providing a comprehensive representation of the enterprise’s technological requirements. In practice, however, the level of detail in such demand disclosures varies considerably, and some demands may lack one or more elements. In the field of economics, Luo Yongtai classified consumer demand into explicit, semi-implicit, and implicit types based on consumers’ levels of information cognition and price sensitivity when analyzing market demand. Building upon this theoretical framework, the present study adapts and extends it to the context of enterprise technological needs by considering the completeness of demand content elements [31]. Accordingly, enterprise technology demands are categorized into explicit and implicit types. Explicit technological demands contain complete descriptions of all five elements, clearly articulating the required materials, implementation method, performance indicators, target product, and application scenario. In contrast, implicit technological demands are characterized by missing elements and provide only partial information, reflecting an enterprise’s incomplete understanding or imprecise expression of its technological requirements.
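The completeness criterion can be illustrated with a short sketch. It assumes that the terms extracted for each element are already available as a dictionary; the function name and the data layout are hypothetical.

```python
# The five content elements named in the text.
ELEMENTS = ("material", "method", "efficacy", "product", "application")


def classify_demand(extracted):
    """Label a demand by element completeness.

    `extracted` maps element names to lists of extracted terms (hypothetical
    layout). Returns ("explicit" or "implicit", list of missing elements).
    """
    missing = [e for e in ELEMENTS if not extracted.get(e)]
    label = "explicit" if not missing else "implicit"
    return label, missing
```

A demand that yields terms for only some elements is labeled implicit, and the list of missing elements indicates what the auxiliary data would need to supplement.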
For explicit technology demands, enterprises can provide detailed descriptions of the five content elements: material, method, efficacy, product, and application. The terms corresponding to each element carry clear technical meanings and effectively reflect the enterprise’s focal concerns and expected objectives regarding the required technology. Therefore, this study adopts a direct extraction method to identify representative technical terms within the five element categories and constructs a complete demand expression framework.
For implicit technological demands, the absence of one or more content elements often leads to identification results that are imprecise, unclear, or incomplete. The objective of identifying such demands is not to reconstruct the full technical details but rather to infer the potential categories of technological needs by leveraging multi-source data related to enterprise technological activities. To support the identification of implicit technological demand categories, this study incorporates three types of auxiliary data that are closely associated with enterprise technological activities: (1) Enterprise basic information. This refers to background data such as the industry to which the enterprise belongs and its main business activities. These data serve as the basis for determining the enterprise’s technological background and related fields. By analyzing the enterprise’s industry position and core business direction, it becomes possible to infer the likely application scenarios and technological domains, thereby providing contextual information for understanding enterprise technological needs. (2) Enterprise patent data. Patent data reflect an enterprise’s R&D achievements and indirectly reveal its technological innovation focus. They represent the technological foundation of what an enterprise is capable of accomplishing (“what it can do”). Such data provide valuable references for understanding an enterprise’s technological capabilities and strategic R&D orientation. (3) Enterprise technology recruitment information. This refers to publicly released information on job postings that express an enterprise’s demand for specific technical expertise. These recruitment postings typically list required technical skills and competencies, which reveal the technologies that enterprises are currently prioritizing. Moreover, they indirectly indicate potential deficiencies or capability gaps in the enterprise’s ongoing R&D activities.
- (1) Enterprise basic information: identifies the technological direction category.
Enterprise technological directions are closely related to their industrial attributes. Based on the national standard Industrial Classification for National Economic Activities (GB/T 4754–2017) [32], this study proposes a text semantic matching method that combines TF-IDF and cosine similarity to identify the technological direction categories of enterprises [33]. The information on each enterprise’s industry and main business was obtained from its official website and the National Enterprise Credit Information Publicity System. The enterprise text was vectorized and compared with the text entries of the national industry classification using cosine similarity, as shown in Equation (1):
$$\mathrm{Sim}(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}} \tag{1}$$
Here, $A$ and $B$ denote the TF-IDF vectors of the enterprise text and the industry classification text, while $A_i$ and $B_i$ represent the $i$-th dimension values of the vectors and $n$ is the vector dimension. The top three industry categories with the highest similarity scores were selected as candidate results. These were further verified manually according to the enterprise’s actual business scope to determine its final technological direction category.
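The TF-IDF and cosine-similarity matching described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions: texts are assumed to be tokenized already, the smoothed IDF variant is a choice made here because the text does not specify one, and the function names and industry entry names are hypothetical rather than actual GB/T 4754–2017 entries.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Build sparse TF-IDF vectors (dicts) for a list of tokenized texts."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF (an assumption; the text does not specify the variant).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_industry_candidates(enterprise_tokens, industry_entries, k=3):
    """Rank industry entries {name: tokens} by similarity to the enterprise text."""
    names = list(industry_entries)
    vectors = tfidf_vectors([enterprise_tokens] + [industry_entries[n] for n in names])
    query, rest = vectors[0], vectors[1:]
    scored = [(name, cosine(query, vec)) for name, vec in zip(names, rest)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

The function returns the k highest-scoring candidates with their scores, which corresponds to the candidate list that is then verified manually.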
- (2) Enterprise patent data: identifies the technological foundation category.
Enterprise patent data record past R&D investment and innovation activities, providing the basis for identifying technological foundations. In this study, patent data were collected from the IncoPat patent database, using enterprise names as retrieval keywords to obtain all published patents. The titles and abstracts of the retrieved patents were extracted, and the Chinese text was segmented using the jieba tool. A stopword list specific to technological domains (e.g., “method,” “system,” “device,” and other high-frequency general terms) was applied to remove non-informative words, while nouns and noun phrases were retained through part-of-speech tagging. Weighted TF-IDF was used to calculate the term importance scores, and the top ten technical keywords with the highest weights were selected. These keywords reflect the enterprise’s technological foundation and are used in this study as categorical delimiters for identifying implicit technological needs.
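A rough sketch of the keyword-ranking step is given below. It assumes that segmentation (for example with jieba) and part-of-speech filtering have already produced token lists for each patent title and abstract. The text does not define how the TF-IDF is weighted, so the title weight used here is purely hypothetical, as are the function and constant names.

```python
import math
from collections import Counter

# Example general terms quoted in the text; a real list would be longer.
DOMAIN_STOPWORDS = {"method", "system", "device"}
TITLE_WEIGHT = 2.0  # hypothetical: title tokens count more than abstract tokens


def top_keywords(patents, k=10):
    """Rank keywords over an enterprise's patents.

    `patents` is a list of (title_tokens, abstract_tokens) pairs that are
    already segmented and restricted to nouns and noun phrases.
    """
    docs = []
    for title_tokens, abstract_tokens in patents:
        counts = Counter()
        for tok in title_tokens:
            if tok not in DOMAIN_STOPWORDS:
                counts[tok] += TITLE_WEIGHT
        for tok in abstract_tokens:
            if tok not in DOMAIN_STOPWORDS:
                counts[tok] += 1.0
        docs.append(counts)

    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in doc)
    scores = Counter()
    for doc in docs:
        total = sum(doc.values()) or 1.0
        for term, weight in doc.items():
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            scores[term] += (weight / total) * idf  # accumulate across patents
    return [term for term, _ in scores.most_common(k)]
```

The returned list plays the role of the top ten technical keywords; with fewer than k distinct terms, all of them are returned.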
- (3) Enterprise technical capability requirements: identify the capability demand category.
This study also uses publicly available recruitment information as a data source, obtained from enterprise websites and major recruitment platforms. Technical recruitment data refer specifically to positions related to R&D, engineering, technical support, and system development. Managerial, administrative, and sales positions were excluded to ensure that the extracted information accurately reflects the enterprise’s technological capability needs. The recruitment texts were analyzed to identify sentences containing patterns such as “responsible for/conduct/complete + [technical task]” and “proficient in/familiar with/capable of/possess + [technical tool or method].” The extracted technical tasks and tools/methods were recorded as terms representing the enterprise’s technical capability requirements.
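The sentence patterns above can be approximated with regular expressions. The sketch below works on the English renderings of the patterns quoted in the text; the original recruitment texts are presumably Chinese, so the actual trigger phrases would differ, and the function and pattern names are hypothetical.

```python
import re

# Trigger phrases are the English renderings quoted in the text (assumption:
# the real rules operate on the original-language equivalents).
TASK_PATTERN = re.compile(
    r"\b(?:responsible for|conduct|complete)\s+([^.;]+)", re.IGNORECASE
)
SKILL_PATTERN = re.compile(
    r"\b(?:proficient in|familiar with|capable of|possess)\s+([^.;]+)", re.IGNORECASE
)


def extract_capability_terms(text):
    """Return technical tasks and tools/methods found in a recruitment text."""
    tasks = [match.strip() for match in TASK_PATTERN.findall(text)]
    skills = [match.strip() for match in SKILL_PATTERN.findall(text)]
    return {"tasks": tasks, "skills": skills}
```

Each captured phrase runs from the trigger to the next period or semicolon, so the output is a coarse candidate list that would still need cleaning before use as capability terms.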
3.3. Analysis of Supply-Demand Relationships
3.3.1. Construction of the Enterprise Technical Problem Space and the Solution Space
From the perspective of supply–demand theory, “supply” refers to the willingness and ability of the supplier to provide products or services that meet specific needs under the premise that demand exists. In the context of patent supply–demand matching, enterprises, as the demand side, put forward technological requirements, while patents, as the supply side, provide technological solutions that can serve as references or adoption options for enterprises. The relationship between the two is established around technical problems. This section explores the associations between patent supply and enterprise technological demands, taking the technical problem as the bridge connecting supply and demand.
As discussed in Section 3.2, from the perspective of element decomposition, enterprise technological demands consist of five content elements: material, method, efficacy, product, and application. Different types of elements correspond to various technological problems, such as substitutability problems of materials, process complexity problems of methods, and performance enhancement problems of efficacy. According to TRIZ theory, inventive problems can be categorized into six major types:
- (1) Contradiction problems: how to improve product quality or functionality without increasing resource consumption;
- (2) Diagnostic problems: how to identify and prevent system defects or deficiencies;
- (3) Trimming problems: how to simplify system structure or functions to reduce costs;
- (4) Analogical problems: how to transfer existing knowledge, technologies, or processes to new contexts;
- (5) Combinatorial problems: how to integrate different existing solutions to form a more optimal one;
- (6) Generative problems: how to propose entirely new technological solutions to satisfy unmet needs.
These types of inventive problems not only guide technological invention and innovation but also correspond to the core issues enterprises face when expressing their technological demands, thereby forming the theoretical foundation for demand identification and classification. However, in practice, enterprises encounter more specific and diverse technical problems.
Building on the six categories of inventive problems in TRIZ theory, this study combines real enterprise technological demands with a large language model–based automatic extraction and human-assisted classification method to identify and categorize enterprise technical problems, thereby revealing the underlying problem types within enterprise technology demands. A total of 9000 enterprise technology demand texts were collected from 12 provincial-level technology transfer and commercialization service platforms, covering the fields of new energy, new materials, advanced manufacturing, biomedicine, and information technology. These data comprehensively represent the technological needs of Chinese enterprises across diverse fields, ensuring the breadth and representativeness of the sample.
Since enterprise technical problems are expressed in sentence form, a large language model was used to extract problem statements from the demand texts. Specifically, a prompt was designed to guide the DeepSeek-R1 model in automatically identifying technical problem sentences within enterprise demands. Filtering rules were constructed using characteristic keywords such as “difficult,” “bottleneck,” “impact,” “poor,” “insufficient,” “low efficiency,” “high cost,” and “poor adaptability” to improve the accuracy of technical problem identification.
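The keyword-based filtering rules can be sketched as follows. The text does not state how the rules are combined with the model output, so this sketch assumes they act as a post-filter on candidate sentences; the keywords are the English renderings listed above, and the function names are hypothetical. The prompt and the call to DeepSeek-R1 are deliberately left out.

```python
import re

# Characteristic keywords quoted in the text (English renderings).
PROBLEM_KEYWORDS = (
    "difficult", "bottleneck", "impact", "poor", "insufficient",
    "low efficiency", "high cost", "poor adaptability",
)


def split_sentences(text):
    """Split a demand text into sentences on ., !, ? or ; followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?;])\s+", text) if s.strip()]


def filter_problem_sentences(candidates):
    """Keep sentences containing at least one keyword; return (sentence, hits)."""
    kept = []
    for sentence in candidates:
        lowered = sentence.lower()
        hits = [kw for kw in PROBLEM_KEYWORDS if kw in lowered]
        if hits:
            kept.append((sentence, hits))
    return kept
```

Returning the matched keywords alongside each sentence makes it easy to audit why a sentence was kept during the human-assisted review.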
Following the automatic classification of enterprise technical problems by the large language model, human-assisted classification was conducted to refine and validate the results. This study expanded upon the six original inventive problem types in TRIZ theory and developed ten categories of technical problems that better align with enterprise technological demands. These problem categories correspond to the five content elements of enterprise technological demands, collectively forming the Enterprise Technical Problem Space (Question-Space), which characterizes the contradictions and needs underlying enterprise technology demands, as shown in Table 3.
In the supply side analysis, this study aims to identify the content elements of patent supply. To ensure consistency with the method used for identifying enterprise demand elements, the BERTopic model was applied to perform topic clustering analysis on patent texts. A total of 100,000 patents in the field of lithium battery technology were selected as the sample data. Through model-based clustering, the study systematically identified and extracted the main element types of patent supply as follows: (1) Material element: the material basis entities on which patent technologies rely; (2) Method element: the operational processes, technical means, or implementation paths used to achieve technological objectives; (3) Efficacy element: the quantifiable performance outcomes achieved by patent solutions; (4) Product element: the functional carriers delivered by patent technologies; and (5) Application element: the specific application scenarios, deployment environments, or service domains of patent technologies.
In the process of supply–demand matching, patents provide technical responses to enterprise demands at the level of content elements, thereby forming the patent solution space (Solution-Space). This space aggregates patents that embody different inventive principles, reflecting the innovative paths and methodological features through which patents address enterprise technological problems. According to TRIZ theory, patent inventive principles can be summarized into forty classical paradigms, each representing a distinct mode of innovation. However, this classification framework presents two main limitations when applied to the categorization of patent solutions. First, the forty inventive principles are relatively abstract and simplistic, making it difficult to directly align them with the actual technical content of patents. Second, an individual patent often integrates multiple inventive principles, and applying the original TRIZ classification directly can lead to category dispersion, high overlap, and insufficient differentiation. Therefore, based on TRIZ theory and the actual technical content of patents, this study refines and reconstructs the system of inventive principles to develop a more representative and discriminative classification framework for patent supply solutions.
Using the aforementioned patents as the analytical dataset, this study conducts a classification analysis of patent supply solutions. First, a structured prompt template was designed to guide the large language model DeepSeek-R1 in interpreting each patent abstract and automatically matching it with the forty inventive principles of TRIZ, thereby labeling each patent with its corresponding inventive principle category. Second, based on the model’s automatic matching results, patents with semantically or conceptually similar inventive principles were merged into unified categories. Finally, with human-assisted review and inductive analysis, the merged categories were refined and named according to their technical characteristics, resulting in fifteen representative and distinct types of patent supply solutions, thus constructing the patent solution space, as shown in Table 4 [34].
Based on the decomposition of supply-demand elements and the analysis of technical problems and solutions, a correspondence exists between the demand side of enterprises and the supply side of patents in terms of element structure and problem-solving logic. The essence of supply-demand matching lies in establishing a logical correspondence between technical problems and solutions through content elements, thereby constructing a complete mapping system from demand problems to patent solutions.
3.3.2. Supply-Demand Correlation Based on Element Relationships
Integrating the logic of supply–demand matching, the concept of element decomposition, and the TRIZ theory, this study classifies supply–demand correlations into two categories, explicit correlation and implicit correlation, according to the correspondence of supply–demand elements and the type of demand. A hierarchical and progressive correlation path is established to illustrate how the completeness of enterprise demand expression influences the pattern of supply–demand matching.
- (1) Explicit supply–demand correlation
The explicit supply–demand correlation focuses on explicit technological demands of enterprises. Such demands possess a complete structure of five content elements, allowing for point-to-point alignment with patent supply elements at the content level.
Explicit supply–demand correlation is established on the basis of element correspondence. Both enterprise demands and patent supplies contain five categories of content elements, forming direct matching relationships in terms of content, as illustrated in Figure 2. In terms of element characteristics, this correlation emphasizes the symmetry and correspondence between supply and demand elements across content dimensions. Each demand element proposed by the enterprise can find a corresponding supply element in the patent data, while the patent supply provides responses within the same element dimension, thereby achieving direct content-level matching from the technical problem to its corresponding solution.
From a theoretical perspective, explicit supply–demand correlation reflects the fundamental principle of supply–demand theory and aligns with the logical framework of technical problem solving. Both sides achieve effective matching through element-level mapping between enterprise technological demands and patent supplies.
- (2) Implicit supply–demand correlation
Implicit correlation arises when enterprise technology demands are incomplete in terms of elements. In such cases, semantic matching between supply and demand is achieved through category supplementation, problem identification, and solution reasoning. This type of correlation targets implicit technology demands, whose central characteristic is the absence of direct element correspondence. Thus, multi-source information and reasoning mechanisms are required to uncover potential supply categories for matching.
In practice, to effectively address implicit technology demands, the original demand expressions must undergo category supplementation, problem abstraction, and element mapping, gradually transforming fuzzy expressions into content that can be matched by patent supply. This study further divides implicit correlation into two types: demand expansion-driven correlation and technical problem-driven correlation.
- ① Demand expansion-driven implicit correlation
This type relies on multi-source data to supplement implicit demand categories and construct demand content that can be responded to by patent supply. The correlation chain is expressed as “Enterprise implicit technology demand → Limited element identification → Demand category supplementation → Patent supply response,” as shown in Figure 3.
In this pathway, the supply–demand correlation consists of four main steps. First, element information is extracted from the enterprise’s implicit technology demand texts. Second, based on the implicit technology demand category supplementation method established in Section 3.2, the categories of implicit technological demands are defined. Third, the supplemented demand categories are combined with the identified elements to serve as the basis for patent supply identification. Finally, patents that meet the enterprise’s needs are matched by using both the demand categories and elements as filtering conditions, thereby achieving a responsive link between fuzzy enterprise demands and patent supply.
In terms of element characteristics, the demand expansion–driven implicit correlation pathway exhibits asymmetry between supply and demand elements. Enterprise implicit technology demands usually contain only partial demand elements, making it difficult to achieve one-to-one correspondence with the five categories of patent supply elements. Therefore, it is necessary to compensate for the missing elements through demand category expansion.
From a theoretical perspective, the demand expansion–driven implicit correlation pathway reflects the core concept of “demand identification and supply adaptation” in supply–demand theory. This pathway emphasizes supplementing unexpressed technological needs through external information and transforming them into a basis for patent supply–demand matching.
- ② Technical problem-driven implicit correlation
This type applies when enterprise implicit demands are expressed in vague terms. The process first identifies potential types of technical problems, and then uses these problem types together with demand elements as intermediaries to infer corresponding patent solutions. The correlation path follows progressive steps: “Enterprise implicit technology demand → Technical problem identification → Element attribution → Solution mapping → Patent supply matching,” as illustrated in Figure 4.
In this pathway, the establishment of the supply–demand correlation involves four main steps. First, the types of technical problems are identified from the enterprise’s implicit technology demand texts. Second, these technical problems are mapped to their corresponding demand content elements, achieving an initial alignment between problems and demand elements. Third, once the demand elements are defined, their corresponding supply elements on the patent side are identified and mapped to the fifteen categories of patent supply solution types, enabling reasoning and inference of potential patent solutions. Finally, patents with the capability to respond to enterprise demands are identified based on the corresponding patent supply solution types, thus achieving a reasoning-based matching process from demand problems to technological solutions.
In terms of element characteristics, the technical problem–driven implicit correlation pathway also exhibits asymmetry between supply and demand elements. However, this pathway is primarily driven by technical problems, which serve as intermediaries connecting the demand and supply sides. By locating the technical problem, the related demand elements can be determined, and subsequently, the corresponding patent supply elements can be matched. This process forms a supply–demand reasoning chain of “demand problem → demand element → supply element → solution”.
From a theoretical perspective, this pathway embodies the core concept of TRIZ theory, which focuses on deriving inventive principles and finding solutions based on identified technical problems. The technical problems summarized from enterprise implicit demands are addressed through element correspondence and solution inference, thereby providing logical support and interpretability for the matching of implicit technological demands.
In summary, the correlation between enterprise technology demands and patent supply is jointly influenced by the explicitness of demand expression, the completeness of content elements, and the responsiveness of patent supply. Explicit correlation applies to explicit enterprise demands, where the supply side can respond directly through element correspondence. Implicit correlation applies to implicit enterprise demands, which require information supplementation, problem identification, and reasoning mechanisms to construct a viable supply-demand matching chain.
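The three correlation pathways can be contrasted in a compact sketch. Everything below is illustrative: matching by term overlap is an assumption, the patent record layout is hypothetical, and the two mapping tables contain placeholder names only, since the actual problem categories and solution types are those defined in Tables 3 and 4.

```python
ELEMENTS = ("material", "method", "efficacy", "product", "application")

# Placeholder mappings (hypothetical); real entries come from Tables 3 and 4.
PROBLEM_TO_ELEMENT = {"problem_type_A": "material", "problem_type_B": "method"}
ELEMENT_TO_SOLUTIONS = {
    "material": {"solution_type_1"},
    "method": {"solution_type_2"},
}


def _overlaps(demand, patent, elements):
    """True if, for every listed element, demand and patent share a term."""
    return all(
        set(demand[e]) & set(patent["elements"].get(e, ())) for e in elements
    )


def explicit_match(demand, patent):
    """Explicit correlation: all five demand elements find a counterpart."""
    return _overlaps(demand, patent, ELEMENTS)


def expansion_match(demand, categories, patent):
    """Demand expansion-driven: known elements plus supplemented categories."""
    known = [e for e in ELEMENTS if demand.get(e)]
    shares_category = bool(set(categories) & set(patent["categories"]))
    return _overlaps(demand, patent, known) and shares_category


def problem_driven_match(problem_types, patent):
    """Problem-driven: problem -> demand element -> solution types -> patent.

    Demand and supply use the same five element names, so the
    demand-element-to-supply-element step is an identity mapping here.
    """
    solution_types = set()
    for problem in problem_types:
        element = PROBLEM_TO_ELEMENT.get(problem)
        solution_types |= ELEMENT_TO_SOLUTIONS.get(element, set())
    return patent["solution_type"] in solution_types
```

The explicit check requires a counterpart for every element, whereas the two implicit checks replace the missing elements with either supplemented categories or inferred solution types.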
3.4. Research on Demand-Driven Patent Supply Information Retrieval and Recommendation Methods
3.4.1. Patent Supply Identification and Retrieval Under Demand Orientation
Before implementing patent recommendation, it is necessary to construct a supply set of patents tailored to specific technological demands. From a large pool of patents, a subset with potential semantic relevance to a particular demand must be identified, serving as the foundation for subsequent supply-demand correlation scoring and recommendation. Centered on the core logic of “selecting supply based on demand,” this study, in combination with supply-demand mapping relationships, the content element framework, and patent solution classifications, proposes identification and retrieval strategies for patent supply targeting different types of enterprise technological demands.
- (1) Patent supply identification for explicit technological demands
Explicit demands are those in which enterprises clearly articulate their specific requirements in terms of materials, methods, efficacy, products, and applications. Such demands provide the necessary basis for direct identification of corresponding patents. For this type of demand, this study introduces a demand organization method centered on the “Demand Lexical Tree.” This method constructs a hierarchical and extensible terminology system of technological demands to support patent supply identification driven by explicit demands.
The “Demand Lexical Tree” is grounded in lexical field theory and knowledge ontology hierarchy. It takes the five core content elements of enterprise technological demands as the primary branches and builds multi-level sets of terms organized by hypernym-hyponym relations, synonymous expressions, and semantic associations. Together, these form a multi-layered network of demand terminology, as shown in Figure 5 [35].
Construction of the Demand Lexical Tree combines manual annotation with large language model-assisted classification. First, demand terms appearing under the five content elements are manually organized and preliminarily classified to ensure accuracy and representativeness. Second, a large language model is employed to expand the terminology, guided by prompts to understand the semantic and contextual attributes of each demand. This expansion generates synonymous expressions, scenario-specific substitutes, and hierarchical extensions of demand terms, assisting in determining hypernym-hyponym or co-level classification relationships.
The primary purpose of constructing the Demand Lexical Tree is not to expand the coverage of potential patent supply, but to clarify and constrain the semantic boundaries of patent retrieval. This improves the adaptability and efficiency of patent filtering while enhancing the consistency of demand expressions within the same technological field.
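The structure described above can be sketched as a small tree of terms. This is only an illustration under stated assumptions: the class name `LexicalNode`, its fields, and the example terms are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The five core content elements named in the text act as primary branches.
ELEMENTS = ("materials", "methods", "efficacy", "products", "applications")


@dataclass
class LexicalNode:
    """One term in the tree, with synonyms and hyponym children (hypothetical layout)."""
    term: str
    synonyms: List[str] = field(default_factory=list)
    children: List["LexicalNode"] = field(default_factory=list)

    def add(self, child: "LexicalNode") -> "LexicalNode":
        """Attach a hyponym (narrower term) and return it for chaining."""
        self.children.append(child)
        return child

    def terms(self) -> List[str]:
        """Collect this term, its synonyms, and all descendant terms (depth-first)."""
        collected = [self.term, *self.synonyms]
        for child in self.children:
            collected.extend(child.terms())
        return collected


def build_tree() -> Dict[str, LexicalNode]:
    """Create one empty primary branch per content element."""
    return {name: LexicalNode(name) for name in ELEMENTS}


# Invented example terms, for illustration only.
tree = build_tree()
coating = tree["materials"].add(LexicalNode("coating", synonyms=["surface layer"]))
coating.add(LexicalNode("ceramic coating"))
retrieval_terms = tree["materials"].terms()
```

Walking a single branch yields a bounded term set for retrieval, which matches the stated purpose of constraining semantic boundaries rather than widening coverage.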
- (2) Patent supply identification for implicit technological demands
Implicit demands are characterized by vague expressions and missing elements, making it difficult to directly identify corresponding patents through keyword matching. Based on the two types of supply-demand correlations, this study designs a patent supply identification strategy for implicit demands, which integrates category supplementation, content element mapping, and solution reasoning.
First, at the category recognition level, implicit demands are expanded through the category supplementation method, using external data sources such as enterprise basic information, historical R&D records, and capability requirements. This step establishes the semantic boundaries for supply retrieval.
Second, at the supply content expansion level, the limited element information already identified from implicit demands is transformed into corresponding term subsets as a retrieval basis. In parallel, the technical problem types underlying enterprise demands are identified, thereby enabling the conversion of demand expressions into problem statements. Once identified, these problems are mapped to corresponding patent solution categories following the path: “Enterprise implicit technological demand → Technical problem identification → Element attribution → Solution mapping → Patent supply matching.” Based on these mappings, patent keyword sets are constructed according to the TRIZ inventive principles associated with each solution type.
Finally, the method for identifying patent supply targeting implicit demands integrates three key sources of information: (1) limited element information extracted directly from implicit demands; (2) demand categories supplemented through external multi-source data; and (3) patent solution keyword sets derived from TRIZ inventive principles. Collectively, these form a retrieval-oriented keyword system for implicit demands, transforming vague demand expressions into actionable conditions for patent supply identification.
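A minimal sketch of how the three information sources could be merged into one retrieval keyword set is given below. The function name, the normalization rule (lowercasing and whitespace stripping), and the example terms are assumptions made for illustration; the study does not specify these details.

```python
def build_retrieval_keywords(element_terms, category_terms, solution_keywords):
    """Merge the three keyword sources into one de-duplicated list, preserving order.

    element_terms:     terms extracted directly from the implicit demand
    category_terms:    demand categories supplemented from external data
    solution_keywords: keywords tied to the mapped patent solution type
    """
    seen = set()
    merged = []
    for source in (element_terms, category_terms, solution_keywords):
        for term in source:
            key = term.strip().lower()  # assumed normalization, not from the study
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


# Invented example inputs; "segmentation" is one of the TRIZ inventive principles.
keywords = build_retrieval_keywords(
    ["Heat dissipation"],
    ["new energy vehicles", "heat dissipation"],
    ["segmentation"],
)
```

Keeping the sources in a fixed order means that terms stated by the enterprise itself come first, which can be convenient if later steps weight keywords by position.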
3.4.2. Research on Patent Recommendation Methods Based on Supply-Demand Elements
In view of the differences in element completeness between explicit and implicit technological demands, this study proposes a dual-path patent recommendation framework. For explicit technological demands, the BERT model is employed to construct semantic embedding vectors for the five core content elements, namely materials, methods, efficacy, products, and applications, and to calculate both content similarity and element coverage. BERT is chosen because it effectively captures contextual and bidirectional semantic relationships within technical texts, enabling a more accurate representation of domain-specific language and improving the precision of supply–demand semantic matching. For implicit technological demands, where element expressions are incomplete, a dual strategy combining BERT-based semantic modeling and BM25 keyword retrieval is adopted to compute content similarity and category matching, respectively.
In both types of demand, content similarity serves as the primary indicator. It measures the degree of semantic correspondence between the patent text and the enterprise’s technological demand, reflecting the extent to which the patent content aligns with the element-level semantics of the demand. Because explicit demands contain all five content elements and thus allow direct element-level alignment, element coverage is introduced to evaluate whether a patent comprehensively addresses all five dimensions of enterprise needs, thereby indicating the completeness and adaptability of the proposed technological solution. By contrast, implicit demands lack complete element information. As defined earlier in this study, their demand categories are supplemented through multi-source data. Consequently, category matching is employed to assess the consistency between patents and the supplemented demand categories, including TRIZ-based solution keywords, thereby reflecting the rationality of supply–demand matching at the domain and directional levels.
- (1) Patent recommendation method for explicit technological demands
Explicit demands are expressed with complete and clear coverage of all five elements, making them suitable for element-level matching based on semantic embedding models. In this study, a semantic vector space model is constructed using the BERT model, and the responsiveness of patent supply is quantified from two perspectives: content accuracy and element coverage [36].
- ➀ Embedding construction for explicit demand elements
Based on the five types of content element information extracted from enterprise demand texts in the earlier stage (materials $m$, methods $a$, efficacy $e$, products $p$, and applications $u$), an explicit demand element set is constructed:
$$D = \{T_m, T_a, T_e, T_p, T_u\}$$
In this set, $T_i$ ($i \in \{m, a, e, p, u\}$) denotes the set of terms under the $i$-th element type in the enterprise technology demand.
The BERT model is employed to perform contextual semantic encoding for each term $t_{ij} \in T_i$, yielding its vector representation:
$$\mathbf{v}_{ij} = \mathrm{BERT}(t_{ij})$$
To construct a unified semantic representation for each element type, a mean pooling strategy is applied to aggregate the set of term vectors [37], thereby obtaining the element-level vector representation:
$$\mathbf{E}_i = \frac{1}{|T_i|} \sum_{t_{ij} \in T_i} \mathbf{v}_{ij}$$
Finally, each explicit technological demand can be represented as a set of five embedding vectors $\{\mathbf{E}_m, \mathbf{E}_a, \mathbf{E}_e, \mathbf{E}_p, \mathbf{E}_u\}$, which serves as the core representation basis for supply-demand semantic matching.
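The mean pooling step can be sketched as follows. To keep the example runnable without a model download, `toy_encode` is a deliberately crude stand-in for a BERT encoder and is purely hypothetical; in practice each term would be encoded by a pretrained BERT model. The function names are likewise assumptions.

```python
def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    if not vectors:
        raise ValueError("cannot pool an empty set of term vectors")
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def embed_demand(element_terms, encode):
    """Map each element type to the mean of its term vectors.

    Element types with no terms are skipped, since a mean is undefined for them.
    """
    return {
        element: mean_pool([encode(term) for term in terms])
        for element, terms in element_terms.items()
        if terms
    }


def toy_encode(term):
    """Hypothetical placeholder encoder: NOT a semantic embedding."""
    return [float(len(term)), float(term.count("a"))]


# Invented demand terms, for illustration only.
demand = {"materials": ["alloy", "carbon fiber"], "methods": []}
vectors = embed_demand(demand, toy_encode)
```

Passing the encoder as an argument keeps the pooling logic independent of the embedding model, so the placeholder can be swapped for a real encoder without touching the aggregation code.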
- ➁ Semantic representation construction for patent texts
To obtain the semantic embedding representation $\mathbf{v}_{P_k}$ of a patent document $P_k$, this study adopts a deep semantic representation approach. Specifically, the title and abstract fields of each patent $P_k$ are concatenated and then fed into the BERT model to generate a unified semantic embedding vector [38]. This vector is subsequently used for semantic matching with the demand element embeddings. The formulation is as follows:
$$\mathbf{v}_{P_k} = \mathrm{BERT}(\mathrm{title}_k \oplus \mathrm{abstract}_k)$$
where $\oplus$ denotes text concatenation.
- ➂ Supply-demand semantic matching score: content accuracy (Accuracy)
Cosine similarity is employed to compute the semantic matching score between the five types of demand elements and each patent text [39]. Specifically, the semantic similarity between the embedding $\mathbf{E}_i$ of element $i$ and the semantic representation $\mathbf{v}_{P_k}$ of patent $P_k$ is calculated as follows:
$$\mathrm{sim}(\mathbf{E}_i, \mathbf{v}_{P_k}) = \frac{\mathbf{E}_i \cdot \mathbf{v}_{P_k}}{\|\mathbf{E}_i\| \, \|\mathbf{v}_{P_k}\|}$$
Following multi-metric score fusion methods and their applications in patent semantic matching, the similarity scores of the five elements are averaged with equal weights to obtain the overall content accuracy of patent $P_k$, denoted as $\mathrm{Accuracy}(P_k)$. This metric evaluates the patent’s overall responsiveness to all element categories at the semantic level and reflects the precision of its content matching:
$$\mathrm{Accuracy}(P_k) = \frac{1}{5} \sum_{i \in \{m, a, e, p, u\}} \mathrm{sim}(\mathbf{E}_i, \mathbf{v}_{P_k})$$
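The content accuracy computation can be illustrated with a short sketch: cosine similarity per element, followed by an equal-weight average. The function names and the two-dimensional toy vectors are assumptions for illustration; real BERT embeddings have several hundred dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def content_accuracy(element_vectors, patent_vector):
    """Equal-weight average of element-to-patent cosine similarities."""
    if not element_vectors:
        raise ValueError("at least one element vector is required")
    sims = [cosine(vec, patent_vector) for vec in element_vectors.values()]
    return sum(sims) / len(sims)


# Toy vectors: one element aligned with the patent, one orthogonal to it.
elements = {"materials": [1.0, 0.0], "methods": [0.0, 1.0]}
accuracy = content_accuracy(elements, [1.0, 0.0])
```

Here one element matches the patent perfectly and the other not at all, so the average lands halfway; this shows how a single poorly matched element pulls the overall accuracy down.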
- ➃ Element integrity assessment: coverage
Referring to the definition and evaluation methods of coverage in recommendation studies, this study designs the “Coverage” indicator to evaluate how well a patent responds to the five demand element categories. Coverage is defined as the ratio of the number of element categories for which a patent provides an effective match to the total number of elements:
$$\mathrm{Coverage}(P_k) = \frac{1}{5} \sum_{i \in \{m, a, e, p, u\}} \mathbb{I}\big(\mathrm{sim}(\mathbf{E}_i, \mathbf{v}_{P_k}) > \theta\big)$$
where $\mathbb{I}(\cdot)$ is an indicator function. If the similarity score $\mathrm{sim}(\mathbf{E}_i, \mathbf{v}_{P_k})$ between element $i$ and patent $P_k$ exceeds the threshold $\theta$, the element is considered to be effectively matched. This indicator measures the breadth of coverage of enterprise technological demands by a given patent, reflecting the patent’s ability to serve as a solution.
A threshold that is set too high may filter out potentially relevant elements, reducing the comprehensiveness of coverage. Conversely, a threshold that is set too low may introduce noise with insufficient semantic relevance, weakening the discriminative power of the coverage metric. Following common practices in recommendation systems and semantic similarity measures based on cosine similarity [40], this study sets $\theta$ to a fixed value.
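A brief sketch of the coverage indicator follows. The threshold used in the example (0.5) is an arbitrary placeholder chosen for illustration and should not be read as the value adopted in this study; the similarity scores are also invented.

```python
def coverage(similarities, threshold):
    """Fraction of element types whose similarity strictly exceeds the threshold."""
    if not similarities:
        raise ValueError("at least one element similarity is required")
    matched = sum(1 for score in similarities.values() if score > threshold)
    return matched / len(similarities)


# Invented element-to-patent similarity scores.
sims = {
    "materials": 0.82,
    "methods": 0.64,
    "efficacy": 0.41,
    "products": 0.77,
    "applications": 0.50,
}
cov = coverage(sims, threshold=0.5)  # placeholder threshold, not the study's value
```

Three of the five elements exceed the placeholder threshold, so coverage is three fifths. The "applications" score sits exactly on the threshold and does not count, because the definition requires the score to exceed it.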
- ➄ Comprehensive scoring of patent supply
By combining the two indicators, content accuracy ($\mathrm{Accuracy}(P_k)$) and element coverage ($\mathrm{Coverage}(P_k)$), this study constructs a comprehensive evaluation function for patent supply under explicit demand scenarios:
$$\mathrm{Score}_{\mathrm{exp}}(P_k) = \alpha \cdot \mathrm{Accuracy}(P_k) + (1 - \alpha) \cdot \mathrm{Coverage}(P_k)$$
Here, $\alpha \in [0, 1]$ is a weight parameter. To balance matching precision and element coverage, $\alpha$ is set to 0.5 in this study. This value indicates that both indicators are equally important in the comprehensive evaluation. It ensures that the recommended patents achieve high semantic matching precision while maintaining complete coverage of the five demand elements, thereby enhancing the overall comprehensiveness and reliability of the patent supply evaluation.
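The effect of the equal weighting can be seen in a small worked example. The sketch assumes the comprehensive score is a convex combination of the two indicators, which is the natural reading of a single weight parameter set to 0.5; the function name and the patent scores are invented for illustration.

```python
def explicit_score(accuracy, coverage, weight=0.5):
    """Weighted sum of content accuracy and element coverage (assumed convex form)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * accuracy + (1.0 - weight) * coverage


# Invented (accuracy, coverage) pairs for two candidate patents.
patents = {"P1": (0.82, 0.4), "P2": (0.70, 0.8)}
scores = {pid: explicit_score(acc, cov) for pid, (acc, cov) in patents.items()}
best = max(scores, key=scores.get)
```

With equal weights, P2 outranks P1 even though P1 has the higher content accuracy, because P2 covers more of the five demand elements. This is the trade-off the weight parameter is meant to control.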
- (2) Patent recommendation method for implicit technological demands
Implicit demands cannot be effectively addressed through direct element-level matching. To tackle this challenge, this study proposes a dual strategy that integrates BERT semantic modeling with the BM25 algorithm, performing selection and recommendation from two perspectives: content accuracy and category matching.
Content accuracy measures the extent to which patent supply responds to the already identified content elements of an implicit demand. Category matching, in turn, is based on the category labels of implicit demands and the keyword sets of patent supply solutions, using the BM25 algorithm to calculate the relevance between patent texts and category keywords.
For the limited element information contained in implicit demands, BERT embedding is employed with a mean pooling strategy to obtain element-level semantic vectors. Semantic similarity scores are then computed between demand element vectors and patent vectors in the BERT semantic space, and their equal-weight average is taken as the content accuracy.
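The BM25 algorithm is applied to compute the relevance score between the keyword set $Q$ of implicit demands and the patent supply texts [41]:
$$\mathrm{BM25}(Q, P_k) = \sum_{t \in Q} \mathrm{IDF}(t) \cdot \frac{f(t, P_k)\,(k_1 + 1)}{f(t, P_k) + k_1 \left(1 - b + b \cdot \frac{|P_k|}{\mathrm{avgdl}}\right)}$$
Here, $f(t, P_k)$ denotes the frequency of keyword $t$ in patent document $P_k$, and $\mathrm{IDF}(t)$ represents its inverse document frequency; $|P_k|$ is the length of the patent text and $\mathrm{avgdl}$ is the average text length over the candidate patents. The empirical parameters $k_1$ and $b$ are set following [42]. This metric reflects the degree of alignment between patent content and domain semantics as well as solution categories, serving as a supplement to content similarity evaluation.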
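For readers who want to see the BM25 scoring step concretely, a self-contained sketch is given below. It assumes pre-tokenized documents, uses the widely used defaults `k1=1.2` and `b=0.75` (not necessarily the values adopted in this study), and uses a smoothed IDF variant that stays non-negative; all of these are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query keywords with Okapi BM25."""
    if not documents:
        return []
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    unique_terms = set(query_terms)
    # Document frequency: number of documents containing each query term.
    doc_freq = {t: sum(1 for doc in documents if t in doc) for t in unique_terms}
    scores = []
    for doc in documents:
        counts = Counter(doc)
        score = 0.0
        for term in unique_terms:
            freq = counts[term]
            if freq == 0:
                continue
            # Smoothed IDF (assumed variant); always positive.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = 1.0 - b + b * len(doc) / avg_len
            score += idf * freq * (k1 + 1.0) / (freq + k1 * length_norm)
        scores.append(score)
    return scores


# Invented tokenized patent texts and category keywords.
docs = [
    ["heat", "sink", "fin"],
    ["battery", "cooling", "heat", "heat"],
    ["gear", "shaft"],
]
category_scores = bm25_scores(["heat", "cooling"], docs)
```

The second document scores highest because it contains both keywords, including the rarer one, while the third contains neither and scores zero.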
By integrating semantic similarity and category matching, this study constructs a comprehensive scoring function for patent supply under implicit technological demands, denoted as $\mathrm{Score}_{\mathrm{imp}}(P_k)$.
Considering that implicit demands, due to missing elements, rely more heavily on semantic understanding and alignment to achieve effective supply-demand matching, this study assigns semantic similarity a larger weight than category matching. This setting maintains a certain degree of category coverage while emphasizing the central role of semantic alignment in matching implicit demands.
Finally, all candidate patents are ranked in descending order according to their comprehensive scores, and the top ten patents are selected as the recommendation results. The choice of ten patents is based on three main considerations. First, from the perspective of experimental analysis, selecting ten patents provides sufficient samples to fully verify the effectiveness of the proposed recommendation method while maintaining a concise set that facilitates comparative evaluation. Second, from the perspective of content comparison, the Top-10 range ensures adequate diversity and representativeness, clearly illustrating the quantitative differences among the recommended patents. Third, from a practical application standpoint, enterprises typically focus on the top-ranked patents when assessing recommendation results; therefore, presenting the top ten patents aligns well with actual enterprise decision-making behavior.
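The final ranking step for implicit demands can be sketched as below. Several details are assumptions rather than the study's specification: the score is taken to be a weighted sum, the raw BM25 scores are rescaled by their maximum so that both terms lie on a comparable scale, and the weight of 0.7 is an arbitrary placeholder that merely favors semantic similarity.

```python
def rank_patents(candidates, weight, top_k=10):
    """Rank candidates by a weighted sum of accuracy and rescaled category score.

    candidates: list of (patent_id, content_accuracy, raw_category_score)
    weight:     share given to content accuracy; the rest goes to category matching
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    max_cat = max((cat for _, _, cat in candidates), default=0.0)
    scored = []
    for patent_id, accuracy, cat in candidates:
        # Max-rescaling of BM25 is an assumption made for this sketch.
        cat_norm = cat / max_cat if max_cat > 0 else 0.0
        scored.append((patent_id, weight * accuracy + (1.0 - weight) * cat_norm))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Invented candidates: (id, content accuracy, raw BM25 category score).
candidates = [("P1", 0.80, 2.0), ("P2", 0.60, 4.0), ("P3", 0.90, 0.0)]
top = rank_patents(candidates, weight=0.7, top_k=2)
top_ids = [patent_id for patent_id, _ in top]
```

In this toy case P3 has the highest content accuracy but no category match, so it drops out of the top two; this shows that category matching still influences the ranking even when semantic similarity carries most of the weight.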