Skip Content
You are currently on the new version of our website. Access the old version .
EntropyEntropy
  • Article
  • Open Access

30 January 2026

Link Prediction of Green Patent Cooperation Network Based on Multidimensional Features

,
,
and
School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China
*
Author to whom correspondence should be addressed.
This article belongs to the Section Multidisciplinary Applications

Abstract

The regional green patent cooperation network describes the structural characteristics of regional collaborative innovation, and the link prediction of the network can anticipate the overall evolution trend, as well as help organizations identify potential partners for technology collaboration. This paper proposes a link prediction model based on multidimensional features, which integrates prediction indicators of node features, path features, and content features. In the model, the entropy weight method is employed to integrate various node similarity indicators, the heterogeneous influence of intermediate links and nodes is incorporated to fully emphasize the issue of heterogeneous paths, and the content similarity feature indicator based on patent text topic analysis integrates multiple distance similarity metrics. To improve prediction accuracy, the Grey Wolf Optimizer (GWO) method is adopted to determine the optimal weights for the three-dimensional indicators. The comparative experimental results show that the multidimensional prediction model can improve prediction accuracy significantly. Finally, the proposed prediction model is applied to forecast the green patent cooperation network in the Beijing-Tianjin-Hebei region of China, and the prediction results are discussed based on the distribution of agent types and regional distribution.

1. Introduction

The patent cooperation network depicts the cooperative relationships formed by various patent applicants through joint patent filings. It constitutes a complex network encompassing inter-enterprise collaboration, collaboration between enterprises and universities, as well as collaboration between enterprises and research institutions. As an important manifestation of collaborative innovation among innovation agents, studies on the patent cooperation network can reveal the characteristics and pathways of collaborative innovation [1,2]. In recent years, to promote the innovation-driven effect of developed regions on their surrounding areas, the Chinese government has formulated strategies for regional collaborative innovation development. Through measures such as resource allocation, industrial restructuring, and tax incentives, the government aims to boost innovation cooperation among sub-regions within a major regional cluster. Typical examples of such collaborative development regions include the Beijing-Tianjin-Hebei region, the Yangtze River Delta region, and the Pearl River Delta region. Against this policy backdrop, numerous scholars have conducted research on intra-regional patent cooperation networks to evaluate the effectiveness and dynamics of regional collaborative innovation. Meanwhile, in the process of regional collaborative innovation, green technology innovation has attracted growing attention due to its significance for regional ecology, environmental protection, and sustainable development. Therefore, it is of great practical significance to study regional green patent cooperation networks, analyze their structural characteristics, and reveal their evolutionary trends through link prediction. On the one hand, predicting the future development trends of green technology innovation cooperation within a region enables the early identification of network structural deficiencies, the evaluation of the driving effect of collaborative innovation, and the optimization of policies for green technology collaborative innovation. On the other hand, link prediction can assist patent applicants in identifying potential partners and facilitating the realization of collaborative innovation.
Although extensive research on link prediction has been conducted based on knowledge cooperation networks, few studies have specifically focused on link prediction for regional green patent cooperation networks. Against this backdrop, this paper aims to construct a regional green patent cooperation network and perform link prediction to uncover its evolutionary trends. At the methodological level, the primary challenge lies in selecting appropriate prediction indicators and constructing an effective prediction model to enhance prediction accuracy. Existing link prediction studies have employed a wide array of indicators, such as those based on node similarity, path similarity, and content similarity, with each category encompassing multiple specific indicators. Scholars typically combine one or more types of indicators to improve prediction accuracy [3,4,5]. Building on existing research, this paper proposes a novel hybrid model based on multidimensional features. Compared with previous related studies, the proposed model exhibits the following distinctive features: (1) For indicators within the same category, a multi-indicator coupling method based on the entropy weight method is introduced. (2) The heterogeneity of intermediate links and intermediate nodes in the network is simultaneously incorporated into path features. (3) Content similarity, node similarity, and path feature indicators are all integrated into the prediction model, and a method for calculating the optimal weights of these three indicator categories is proposed.
The remainder of this paper is organized as follows. Section 2 reviews the related research. Section 3 elaborates on the construction method of the prediction model. Section 4 presents prediction experiments conducted on a green patent dataset of the Beijing-Tianjin-Hebei region and provides a comparative analysis of multiple prediction models. Section 5 performs a predictive network analysis of link prediction results for the Beijing-Tianjin-Hebei green patent cooperation network and offers policy recommendations for the development of green technology collaborative innovation in the region.

3. Research Method

To achieve link prediction for regional green patent cooperation networks, a comprehensive model based on multidimensional indicators is proposed. This model integrates predictive indicators from three dimensions: node features, path features, and content features. In terms of node features, the entropy weight method is employed to couple traditional node similarity indicators, enhancing the universality of the indicators. Regarding path features, the heterogeneous influences of intermediate links and intermediate nodes are integrated to fully emphasize the issue of heterogeneous paths. For content features, the LDA model is used to construct topic set vectors for nodes, and multiple similarity distances are integrated to calculate content similarity (CS). The methodological framework of the model construction is illustrated in Figure 1. In the framework, the network is composed by nodes and edges, and the letters in the node circle means the number of the patent applicants.
Figure 1. Method framework for this study.

3.1. Predictive Indicator Based on Node Similarity Metric (Coupling)

The network constructed based on patent cooperation data can be represented as G = (N, E, W), where N represents the set of nodes in the network, E represents the set of edges in the network, W represents the edge weight, which is the number of collaborations of the entities, and |N| represents the total number of network nodes.
Link prediction considering node feature similarity holds that the higher the similarity of nodes, the greater the possibility of cooperation [15]. For example, the CN index refers to the number of common neighbors the two nodes share; the more likely the two nodes are to cooperate. Node similarity indicators have evolved over the years, and there are currently 10 commonly used indicators. The formulas are shown in Table 2. In the formula, k x represents the degree of node x, Γ x and represents the neighbor nodes of node x.
Table 2. Definitions of 10 classic node similarity indicators.
The prediction accuracy of each of the above indicators varies in different networks. Therefore, in this paper, the 10 node similarity indicators are coupled using the entropy weight method, and the constructed coupled indicator is denoted as Coupling, as shown in Formula (1):
S x y C o u p l i n g = a 1 S x y C N + a 2 S x y L H N + a 3 S x y J a c c a r d + a 4 S x y P A +
Among them, a i represents the weight value based on the entropy weight method, a i 0 ,   1 . S x y C o u p l i n g represents the similarity score of nodes x and y under the coupling index, S x y C N represents the similarity score of nodes x and y under the common neighbor (CN) structure, and S x y L H N is similar.
The entropy weight method draws on the concept of entropy in thermodynamics and uses information entropy to describe the amount of information of an event. In this paper, it represents the amount of information that each indicator accounts for in node features: the smaller the entropy value, the greater the dispersion degree of the indicator, and thus the greater its contribution to indicator ranking. For weight calculation using the entropy weight method, the first step is to perform sum standardization on each indicator, as shown in Formula (2).
p i j = R i j j = 1 10 R i j  
where j denotes the indicator number, which ranges from 1 to 10 in this paper; i denotes the node pair number. Secondly, calculate the information entropy value of each indicator for all node pairs, as shown in Formula (3).
E j = 1 l n m i = 1 m p i j l n p i j  
where m represents the total number of node pairs to be predicted. Since the entropy value is inversely proportional to the contribution, a positive relationship transformation is performed on the entropy value, as shown in Formula (4).
a j = 1 E j j = 1 n ( 1 E j )  

3.2. Predictive Indicator Based on Heterogeneous Path Metric (HP)

The path characteristics in the cooperation network can also accurately predict the cooperative relationship. However, when constructing the path characteristics, it is necessary to distinguish the influence of different paths on the prediction results, so as to further improve the prediction accuracy of the path characteristics. In this study, in order to distinguish the heterogeneity of the paths between two entities, the heterogeneity of the intermediate edges and the heterogeneity of the intermediate nodes are simultaneously introduced into the path characteristics.
Intermediate edge heterogeneity: For a link with multiple intermediate nodes, the larger the degrees of the two end nodes, the more dispersed their cooperation intentions. Thus, the less likely these two nodes will cooperate [26]. Here, the product of the degrees of the two end nodes is used to represent the link weight, that is, the heterogeneity of the intermediate edge. The link weight is negatively correlated with the cooperation probability. Therefore, for two nodes x and y in the network with a link between them, the influence of the link weight can be expressed as ( k x   k y ) 1 . For the intermediate nodes between nodes x and y, all the intermediate nodes are represented by a set, denoted as Z = Z 1 , Z 2 , , Z M , and the total heterogeneity of the intermediate edge of this intermediate node set is denoted as S x y L , which is expressed by Formula (5):
S x y L = k x k Z 1 1 + m = 1 M 1 k Z m k Z m + 1 1 + k Z M k y 1
Intermediate node heterogeneity: Even with the heterogeneity of edge connections, when the product of the degrees of the nodes in the intermediate links is equal, it still cannot effectively differentiate the impact of paths on cooperation. Therefore, further consider the heterogeneity of intermediate nodes as a weighted weight to enhance the discrimination of path heterogeneity. Intermediate node heterogeneity refers to the higher the connectivity between intermediate nodes, the more stable the connection between end nodes, and the higher the possibility of cooperation [25].
For node pairs x and y in the network that have at least one intermediate node, all the intermediate nodes are recorded as a whole Z, and the number of intermediate nodes is represented by |Z| = M. When M = 1, it is a second-order path, and its connectivity is represented as whether the intermediate node Z and the two neighboring nodes x and y can form a triangular ring. | Z | represents the actual number of triangular rings formed by all the surrounding neighbors of Z, and | Z Λ | represents the number of all possible triangular rings that the surrounding neighbors of Z can form. Then, the second-order path heterogeneity score of the intermediate nodes Z of nodes x and y is recorded as Formula (6):
S x y N = Z Z Λ
When M = 2, it becomes a third-order path, and its connectivity lies in whether the intermediate node set Z = Z 1 , Z 2 and the two neighboring nodes x and y can form a quadrilateral loop. At this time, the influence of the intermediate node set on similarity is similar to that when M = 1, that is, the actual number of quadrilateral loops formed is divided by the possible number of quadrilateral loops.
Considering the characteristic that the prediction accuracy decreases rapidly with the increase of path length, as well as the time complexity issue of the calculation process, this paper sets the number of intermediate nodes to M = 2. The heterogeneity path (HP) index S x y H P based on the intermediate edges and intermediate nodes is constructed, and is expressed by Formula (7):
S x y H P = S x y L + S x y N M = 1 + α S x y L + S x y N M = 2
Among them, S x y H P represents the heterogeneity score of nodes x and y based on the heterogeneity of intermediate edges and intermediate nodes, S x y L represents the heterogeneity score of nodes x and y based solely on the heterogeneity of intermediate edges, S x y N represents the heterogeneity score of nodes x and y based solely on the heterogeneity of intermediate nodes, and α is the weight value of the third-order path heterogeneity index, with α 0 ,   1 .

3.3. Predictive Indicator Based on Content Similarity Metric (CS)

In addition to the network structure information, the technical field of the patent applicant also has a significant impact on the cooperative relationship. Therefore, the content similarity between nodes needs to be introduced into the prediction model. The specific steps are as follows: The patent abstract is taken as the main content for text analysis. The LDA model is used to divide the patent abstract into themes. Then, vector similarity indicators such as the Jaccard coefficient, Euclidean distance, and Manhattan distance are utilized to calculate the content similarity of the patent, representing the degree of association of the applicant’s technical field.
Before applying the LDA model for topic classification, the optimal number of topics needs to be determined first. In this paper, the CV consistency score method is used to calculate the optimal number of topics K. Based on this, the LDA model is applied to calculate the topic distribution probability of each patent, and the topic with the highest probability value is selected as the topic of that patent. The calculation of probability is as shown in Formula (8).
P w d = P w t · P t d
Among them, P(w|d) represents the probability of word w in document d, P(w|t) represents the probability of word w in a specific topic t, and P(t|d)represents the probability of document d in a specific topic t.
After obtaining the topic of each patent, the topic is assigned to each applicant entity of the patent, indicating that the research direction of this applicant includes this topic. After assigning the topics one by one, a topic set vector can be formed for each applicant, representing the complete set of research directions of this applicant. Finally, the patent content similarity between applicants is calculated based on the topic set vectors. To improve the prediction effect, six common vector similarity indicators were selected, as shown in Table 3. Based on the experimental results, the three indicators with the best prediction accuracy were selected, and then the entropy weight method was used for coupling.
Table 3. Vector similarity indicators.
The formula of similarity index for the coupled content is shown as follows:
S x y C S = b 1 S x y 1 + b 2 S x y 2 + b 3 S x y 3
Among them, b 1 , b 2 , b 3 represent the weight values based on the entropy weight method, and b 1 , b 2 , b 3 0 ,   1 . S x y C S represents the content similarity score of nodes x and y under the coupling index, while S x y 1   , S x y 2 , S x y 3 are the top three vector similarity indicators.

3.4. Comprehensive Index NPC Based on Network Topology Structure and Content Similarity

By integrating the coupling node similarity index Coupling, the path heterogeneity index HP, and the content similarity index CS, a new link prediction index is constructed and denoted as NPC (Node & Path & Content index). Its calculation is as shown in Formula (10):
S N P C = x 1 S C o u p l i n g + x 2 S H P + x 3 S C S
Among them, x 1 , x 2 , x 3 are the optimal weight values of each indicator obtained through mathematical programming. The mathematical model is as Formula (11):
m i n : f = A U C S N P C = A U C x 1 S C o u p l i n g + x 2 S H P + x 3 S C S S . T . x 1 + x 2 + x 3 = 1 0 x 1 , x 2 , x 3 1
Among them, A U C ( S N P C ) represents the prediction accuracy of the NPC indicator, while S N P C , S C o u p l i n g , S H P and S C S are the similarity score matrices generated for the corresponding indicators.

4. Empirical Research

4.1. Data Collection and Organization

This paper selected the cooperative data of green invention patents in the Beijing-Tianjin-Hebei region from 2020 to 2023. The selected case is located in the Beijing-Tianjin-Hebei (BTH) region of China. Since 2014, this region has been designated as a national priority under the Beijing-Tianjin-Hebei Coordinated Development Strategy, whose connotation encompasses both collaborative innovation and green sustainable development within the region. Geographically, the region comprises Beijing, China’s capital city, and Tianjin, a directly administered municipality, both of which are surrounded by Hebei Province, a less economically developed area. The motivation behind the BTH Coordinated Development Strategy is to drive the integrated development of Tianjin and Hebei through the radiating effect of Beijing, covering such dimensions as talent sharing, industrial restructuring, resource transfer, and green development. Therefore, the selected case serves as a typical representative of regional green technology collaboration under China’s coordinated development policies, and thus holds notable representativeness and significance for research.
The data were sourced from the IncoPat global patent database. During the data search, the green patent classification number, applicant address, application time, number of applicants, applicant type, and business registration address were limited in the initial search formula. Among them, the green patent classification number was a patent classification number compiled based on the Chinese “International Patent Classification Table” and the “International Patent Classification (IPC) Green Inventory” (https://www.wipo.int/classifications/ipc/green-inventory/home, accessed on 10 December 2025.) and the applicant address and business registration address were for the three regions of Beijing, Tianjin, and Hebei.

4.2. Network Analysis

Based on the cooperative data, the green patent cooperative network in the Beijing-Tianjin-Hebei region generated by Gephi software (v0.10.1) is shown in Figure 2. This network contains 1541 network nodes and 2246 edges. The degree of the applicant is reflected by the size of the node, and the number of cooperation times is reflected by the width of the edge. Nodes of different colors represent different levels of clustering. Observations show that the average distance between network nodes is relatively small, and the degree distribution of the nodes is very uneven, with strong heterogeneity, and there are some applicant entities with very high degrees. Compared with a random network of the same scale, it was found that this network has significantly smaller average distances than the random network and significantly larger clustering coefficients than the random network, which conforms to the small-world characteristic. In the double-logarithmic coordinate system, the degree and degree distribution of the cooperation network are approximately linear, and meet the significance requirements, thus it can be considered that the network has scale-free characteristics. The small-world and scale-free characteristics indicate that this network is suitable for using link prediction algorithms of social networks for prediction.
Figure 2. Green patent cooperation network in the Beijing-Tianjin-Hebei region.

4.3. Link Prediction

This paper selects the Area Under Curve (AUC) metric to evaluate the prediction effect of the model. Compared to accuracy, recall rate, and F1, among other binary classification evaluation metrics, the unique advantage of AUC lies in its focus on ranking results rather than specific scores. Therefore, it is more suitable for evaluating ranking-type predictions. AUC is the area under the Receiver Operating Characteristic (ROC) curve. There are different calculation formulas for it in machine learning and complex networks. In this experiment, the formula based on complex networks is chosen for prediction evaluation.
In the experiment, the network edge connections are divided into the training set T1, the test set T2, and the set of non-existent edges T3. The proportion of the training set in the network edge connections is 90%, and the test set is 10%. The sampling method for T3 is as follows: construct an adjacency matrix for all nodes, and then form the T3 dataset using all node pairs (including T1 and T2) with no direct connections in the adjacency matrix. Additionally, T3 needs to maintain a 1:1 ratio with the positive samples T2 in order to avoid accuracy distortion caused by sample bias [23]. To avoid the randomness of the experiment, the random division of the training set is taken 100 times. Each division will obtain a set of similarity scores for each indicator, and the average value is taken as the final score for each indicator. Each time, a random edge is selected from the test set T2, and a random edge is selected from the set of non-existent edges T3. If the similarity score of the edge selected from T2 is greater than that of T3, then the result of this random experiment is recorded as “add 1 point”; if the two values are equal, it is recorded as “add 0.5 points”. After independent comparison for n times, the arithmetic average of the experiments is taken as the evaluation value of the precision. The specific calculation is shown in Formula (12):
A U C = n + 0.5 n n
In the formula, n represents the number of times out of n experiments that the score of T2 was higher, n represents the number of times the two were equal, and the sampling frequency n is set to 672,400 based on experience. Drawing on similar studies, when the number of sampling iterations n is set to 672,400, the absolute error of AUC can be controlled within 0.001 at a confidence level of 90%, reaching a relatively stable state. Therefore, this paper sets the number of sampling iterations n to 672,400 [24].

4.3.1. Link Prediction Experiment Based on Node Similarity

First, predictions are made based on the 10 node similarity indicators given in Table 1. The results are shown in Table 4. It can be seen that the average prediction accuracy of the RA indicator is the highest (0.9171), while the average prediction accuracy of the LHN indicator is the lowest (0.8987). This suggests that the RA indicator exhibits slightly higher adaptability to this network compared to other indicators. However, the prediction values of other indicators do not differ much from this, indicating that each indicator’s prediction in this network has its merits. Therefore, this paper uses the entropy weight method to couple all 10 commonly used indicators to obtain the coupled indicators and their AUC values, and marks them in the last row of Table 3. It can be observed that the accuracy of the Coupling indicator after coupling (0.9330) is higher than that of the highest RA indicator (0.9171) among the single indicators, indicating the effectiveness of the entropy weight method in integrating indicator performance. This paper selects the coupled Coupling indicator as the node similarity indicator, and the similarity score matrix generated by this indicator is denoted as S C o u p l i n g .
Table 4. AUC values of 10 node similarity indicators and AUC values of the coupling indicators.

4.3.2. Link Prediction Based on Path Heterogeneity

Based on the previous analysis of the green patent cooperation network structure in the Beijing-Tianjin-Hebei region, it is evident that the degree of heterogeneity of the network nodes is relatively significant. Therefore, in the prediction indicators, the role of path heterogeneity needs to be emphasized. Based on Formula (7), using the given α value, the path heterogeneity of all node queues with reachable relationships is iteratively calculated to form a path heterogeneity score matrix.
To select a reasonable value for α, a sensitivity analysis was performed on α. Figure 3 illustrates the AUC accuracy of the HP indicator for various α values. It can be observed that as the α value decreases, the accuracy of the HP indicator also increases continuously, and it begins to converge when the α value is 0.1. At this time, the prediction accuracy of the HP indicator of this network is 0.8453. Therefore, the α value is set to 0.1, and the similarity score matrix generated by the HP indicator is recorded as S H P .
Figure 3. Changes in AUC accuracy of HP indicators under different α values.

4.3.3. Link Prediction Based on Content Similarity

After considering node similarity and path similarity, this section will calculate the content similarity between applicant nodes. Firstly, the patent abstracts of the training set are extracted. After tokenization processing and removal of stop words, a dictionary and corresponding corpus are established, and each word is assigned a unique number. Then, the CV consistency method is used to draw the topic consistency diagram as shown in Figure 4. The sensitivity analysis of the number of topics K reveals that the coherence is highest when K = 6, and there is a significant inflection point. Therefore, in the patent collaboration network of this paper, K = 6 is selected as the optimal number of topics.
Figure 4. Curve chart of topic consistency.
After obtaining the number of topics, an LDA topic model is constructed. Then, the topic probability distribution of each abstract is extracted from the model, and the topic with the highest probability is selected as the final topic of the patent, forming a comprehensive dataset based on patent abstracts—patent applicants—patent topics, totaling 4318 items. For each topic, several words correspond to it, representing the technical direction of the topic, as shown in Table 5.
Table 5. Topic number and topic keywords.
Then, the topic of each patent record is assigned to all applicants of the patent, and all topics linked to each applicant are aggregated to obtain the complete set of topic vectors associated with the applicant. The specific method is as follows: first, the LDA model is applied to patent texts for topic identification, generating the six aforementioned topics. Next, the topic corresponding to each individual patent is determined and assigned to the patent applicant nodes of that patent. In this way, if an applicant participates in multiple patent applications, the applicant will have a topic set—that is, a topic set vector for each network node. Then, assign the topic of each patent data to all the applicants of that patent, combine all the topics of the applicants, and obtain the set of all topic vectors associated with each applicant.
Using the six distance calculation methods listed in Table 3, the content similarity of the topic set vectors of node pairs is calculated. The AUC values corresponding to each distance are shown in Table 6. The experimental results show that the accuracy of the Jaccard, Euclidean, and Manhattan indicators is significantly higher than that of the Cosine, Pearson, and Tanimoto indicators. Therefore, these three indicators are selected, and the CS metric is calculated using Formula (9). The AUC value of CS is 0.7847, which is significantly higher than that of the single vector similarity indicator. The similarity score matrix obtained after coupling is denoted as S C S .
Table 6. AUC values of six content similarity indicators and AUC value of the CS indicator.

4.4. Link Prediction Based on NPC Metrics

After obtaining matrices S C o u p l i n g , S H P , and S C S , the three are weighted to form a new prediction model, labeled as S N P C . The weight values of S C o u p l i n g , S H P and S C S in the new model are set as x 1 , x 2 , and x 3 respectively. To obtain the optimal weights, a mathematical model with AUC of S N P C as the objective was established as described in Formula (11). The model is solved by means of the Grey Wolf Optimizer (GWO), and the optimal weights x 1 = 0.9357 , x 2 = 0.0636 , x 3 = 0.0007 , as shown in Table 7. The result indicates that S C o u p l i n g has the greatest impact on the prediction accuracy, while S C S has the least impact. The AUC obtained under the optimal weights is 0.9627, which is significantly better than that of the individual metric of S C o u p l i n g , S H P , and S C S .
Table 7. Comprehensive comparison of AUC values for each similarity metric.
From Table 7, it can also be observed that in the optimal weight combination of the NPC indicators, although the influence of S C S is very small, when it is included in the prediction indicators, the prediction accuracy (0.9504) is better than that considering only S C o u p l i n g and S H P . This indicates that although the weight value of the CS indicator is very small, it can still significantly improve prediction accuracy by being included, highlighting the necessity of considering comprehensive indicators. Similarly, the weight value of the path heterogeneity (HP) indicator is also small, but it also improves the overall prediction accuracy.

6. Conclusions

This paper, based on the green patent cooperation network in the Beijing-Tianjin-Hebei region from 2020 to 2023, established a new link prediction model. This model integrates the prediction indicators of node characteristics, path characteristics, and content characteristics, and improves each of these indicators. In terms of node similarity, the entropy weight method is used to couple 10 node similarity indicators to improve the prediction accuracy of multiple indicators in the patent cooperation network. In terms of path heterogeneity, the heterogeneity of intermediate edges and intermediate nodes in the network is simultaneously introduced into the path characteristics to construct a weighted prediction model for multi-level paths. In terms of content similarity, the LDA topic model is used to classify the patent abstracts, and the entropy weight method is coupled with the precision values of the higher accuracy Jaccard, Euclidean, and Manhattan indicators. Finally, considering the indicators of node similarity, path heterogeneity, and content similarity based on the model, a comprehensive prediction model NPC was constructed. The optimal weight values of the three-dimensional indicators were solved using GWO method which takes the AUC as the objective.
The results of the comparative experiments demonstrate that the NPC-based prediction model, which comprehensively incorporates the network’s node features, path features, and collaborative content, achieves significantly higher prediction accuracy than models relying on any single category of indicators. In the optimal weight combination of the NPC indicators, node similarity indicators contribute the most, followed by path indicators, while CS indicators make the smallest contribution. Despite their relatively low weight, CS indicators still remarkably improve the prediction accuracy, which underscores the necessity of indicator integration. An analysis of the predicted network was conducted from two dimensions: the distribution of agent types and regional distribution. The findings reveal that imbalances will persist in the future development of green technology collaborative innovation in the Beijing-Tianjin-Hebei region, and the driving effect of cross-regional collaborative innovation should be further strengthened.
Although the prediction performance of the model proposed in this paper is relatively satisfactory, there are still certain limitations and shortcomings. (1) The methodology adopted in this paper is indicator fusion, which integrates multiple mainstream prediction indicators to improve prediction accuracy. However, this method involves an excessive number of indicators, leading to extremely high computational complexity when applied to large-scale networks. Therefore, this method is not suitable for the prediction of large-scale networks. (2) Given that this method constructs a new prediction model based on indicator fusion, it is necessary to adjust the combination scheme of various indicators according to AUC during the fusion process, resulting in strong dependence on datasets. It is possible that indicator fusion may fail to improve prediction accuracy when applied to other datasets. (3) This paper only considers the link prediction of potential edges. However, practical patent cooperation networks are weighted networks, where existing cooperative connections still have the possibility of re-collaboration—in other words, the weights of edges also need to be predicted.
To address the aforementioned limitations, the future work of this study will focus on two aspects. First, more datasets of patent cooperation networks will be utilized to verify the effectiveness of the proposed method. Second, based on the weighted network data of patent cooperation, the weights of existing cooperative edges will be predicted, so as to more accurately evaluate the future evolutionary trends of the network.

Author Contributions

Conceptualization, M.Y.; methodology, X.G.; software, Y.Y.; formal analysis, M.Y.; investigation, Y.Y.; resources, Y.Y.; data curation, M.Y.; writing-original draft preparation, M.Y.; writing-review and editing, J.L.; visualization, M.Y.; supervision, X.G.; project administration, X.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CNCommon Neighbor
AAAdamic-Adar
HPIHub Promoted Index
RAResource Allocation
NSINeighbor Set Information
MPCPMeta-Path Computation Prediction
LDALatent Dirichlet Allocation
NPCNode & Path & Content
IPCInternational Patent Classification
AUCArea Under Curve
ROCReceiver Operating Characteristic

References

  1. Tsay, M.Y.; Liu, Z.W. Analysis of the patent cooperation network in global artificial intelligence technologies based on the assignees. World Pat. Inf. 2020, 63, 63102000. [Google Scholar] [CrossRef]
  2. Chen, W.; Qu, H.; Chi, K. Partner Selection in China Inter-organizational Patent Cooperation Network Based on Link Prediction Ap-proaches. Sustainability 2021, 13, 1003. [Google Scholar]
  3. Geum, Y.; Lee, S.; Yoon, B.; Park, Y. Identifying and Evaluating Strategic Partners for Collaborative R&D: Index-based Approach Using Patents and Publications. Technovation 2013, 33, 211–224. [Google Scholar]
  4. Zsolt, T.K.; Attila, I.K.; Tibor, C.; Beáta, F. Analysis and prediction of the Horizon 2020 R&D&I collaboration network. Expert Syst. Appl. 2024, 255, 124417. [Google Scholar]
  5. Wang, J.; Wang, N.; Zhao, W.; Feng, L. Identifying and evaluating R&D partners via patent-based multilayer networks from the perspective of knowledge complementarity: A case study. Comput. Ind. Eng. 2025, 204, 111085. [Google Scholar]
  6. Liu, Q. Discovering of Cooperation Pattern Evolution Path in Tsinghua University: A Patent Cooperation Network Perspective. In Proceedings of the 2015 2nd International Conference on Education Reform and Modern Management (ERMM2015), Hong Kong, China, 19–20 April 2015. [Google Scholar]
  7. Liu, W.; Song, Y.; Bi, K. Exploring the Patent Collaboration Network of China’s Wind Energy Industry: A Study Based on Patent Data from CNIPA. J. Renew. Sustain. Energy Rev. 2021, 144, 110989. [Google Scholar] [CrossRef]
  8. Mei, C.; Cong, J.; Jin, S. The Characteristics and Efficiency of Innovation Cooperation of Beijing-Tianjin-Hebei Region—Evidence from Patents and Papers cooperation. J. Hebei Univ. Technol. (Soc. Sci. Ed.) 2019, 11, 8–15. (In Chinese) [Google Scholar]
  9. Liu, Y.; Shao, X.; Tang, M.; Lan, H. Spatial-temporal Evolution of Green Innovation Network and Its Multidimensional Proximity Analysis: Empirical Evidence from China. J. Clean. Prod. 2021, 283, 124649. [Google Scholar] [CrossRef]
  10. Zhou, J. The Spatial-temporal Evolution and Cooperation Networks of Green Technology Innovation in the Yangtze River Delta Urban Agglomeration. Master’s Thesis, East China Normal University, Shanghai, China, 2019. (In Chinese) [Google Scholar]
  11. Fan, J.; Xiao, Z. Analysis of Spatial Correlation Network of China’s Green Innovation. J. Clean. Prod. 2021, 299, 126815. [Google Scholar] [CrossRef]
  12. Wang, K.; Bian, Y.; Cheng, Y. Exploring the Spatial Correlation Network Structure of Green Innovation Efficiency in the Yangtze River Delta, China. Sustainability 2022, 14, 3903. [Google Scholar] [CrossRef]
  13. Bai, Y.; Wang, J.; Jiao, J. A Framework for Determining the Impacts of a Multiple Relationship Network on Green Innovation. Sustain. Prod. Consum. 2012, 27, 471–484. [Google Scholar] [CrossRef]
  14. Dai, C.; Chen, L.; Li, B.; Li, Y. Link prediction in multi-relational networks based on relational similarity. Inf. Sci. 2017, 394–395, 198–216. [Google Scholar] [CrossRef]
  15. Liben-Nowell, D.; Kleinberg, J. The link-prediction problem for social networks. J. Am. Soc. Mation Sci. Technol. 2007, 58, 1019–1031. [Google Scholar] [CrossRef]
  16. Chuan, P.M.; Son, L.H.; Ali, M.; Khang, T.D.; Huong, L.T.; Dey, N. Link prediction in co-authorship networks based on hybrid content similarity metric. Appl. Intell. 2018, 48, 2470–2486. [Google Scholar] [CrossRef]
  17. Martínez, V.; Berzal, F.; Cubero, J.C. A survey of link prediction in complex networks. ACM Comput. Surv. 2016, 49, 1–33. [Google Scholar] [CrossRef]
  18. Haghani, S.; Keyvanpour, M.R. A systemic analysis of link prediction in social network. Artif. Intell. Rev. 2019, 52, 1961–1995. [Google Scholar] [CrossRef]
  19. Kumar, A.; Singh, S.S.; Singh, K.; Biswas, B. Link prediction techniques, applications, and performance: A survey. Phys. A Stat. Mech. Appl. 2020, 553, 124289. [Google Scholar] [CrossRef]
  20. Zhang, R.; Zhao, Z.; An, M. Structure Analysis and Link Prediction of Patent Technology Cooperation Network in Guang-dong-Hong Kong-Macao Greater Bay Area. China Invent. Pat. 2021, 18, 26–33. (In Chinese) [Google Scholar]
  21. Shi, A.; Qiu, J.; Zhao, S. Patent Technology Cooperation and Link Prediction Among Science and Technology Subjects in Beijing, Tianjin and Hebei. Mod. Inf. Technol. 2020, 4, 178–183. (In Chinese) [Google Scholar]
  22. Katz, L. A new status index derived from sociometric analysis. Psychometrika 1953, 18, 39–43. [Google Scholar] [CrossRef]
  23. Zhou, T.; Lü, L.; Zhang, Y.C. Predicting missing links via local information. Eur. Phys. J. B 2009, 71, 623–630. [Google Scholar] [CrossRef]
  24. Lü, L.; Jin, C.H.; Zhou, T. Similarity index based on local paths for link prediction of complex networks. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2009, 80, 046122. [Google Scholar] [CrossRef]
  25. Zhu, B.; Xia, Y. An information-theoretic model for link prediction in complex networks. Sci. Rep. 2015, 5, 13707. [Google Scholar] [CrossRef]
  26. Zhu, X.Z.; Tian, H.; Cai, S.M.; Huang, J.; Zhou, T. Predicting missing links via significant paths. Europhys. Lett. 2014, 106, 18008. [Google Scholar] [CrossRef]
  27. Liu, J.; Sun, W. Discovery of potential scientific and technical collaborative relationship based on link prediction. Inf. Stud. Theory Appl. 2017, 40, 88–921+21. (In Chinese) [Google Scholar]
  28. Wang, Z.; Han, W.; Sun, Z.; Pan, X. Research on scientific collaboration prediction based on the combination of network topology and node attributes. Inf. Stud. Theory Appl. 2019, 42, 116–120. [Google Scholar]
  29. Jeon, J.; Lee, C.; Park, Y. How to use patent information to search potential technology partners in open innovation. J. Intellect. Prop. Rights 2011, 16, 385–393. [Google Scholar]
  30. Wang, X.F.; Wang, Z.N.; Huang, Y.; Liu, Y.Q.; Zhang, J.; Heng, X.F.; Zhu, D.H. Identifying R&D partners through Sub-ject-Action-Object semantic analysis in a problem & solution pattern. Technol. Anal. Strateg. Manag. 2017, 29, 1167–1180. [Google Scholar]
  31. Kang, J.; Lee, J.; Jang, D.; Park, S. A methodology of partner selection for sustainable Industry-University cooperation based on LDA topic model. Sustainability 2019, 11, 3478. [Google Scholar] [CrossRef]
  32. Park, I.; Jeong, Y.; Yoon, B.; Mortara, L. Exploring potential R&D collaboration partners through patent analysis based on bibliographic coupling and latent semantic analysis. Technol. Anal. Strateg. Manag. 2015, 27, 759–781. [Google Scholar]
  33. Ding, J.D.; Guo, J. Mining potential author cooperative relationships based on the similarity of content and path. Inf. Stud. Theory Appl. 2021, 44, 123–128. (In Chinese) [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Article Metrics

Citations

Article Access Statistics

Multiple requests from the same IP address are counted as one view.