4.1. Datasets and Experimental Settings
The experiments are conducted on four datasets to evaluate the proposed model from complementary perspectives, including general link prediction capability, industrial-domain adaptability, and practical applicability in fault diagnosis. Specifically, WN18RR [21] and FB15k-237 [24] are used as public benchmark datasets to evaluate the general link prediction capability of the proposed model. WN18RR is derived from WordNet and exhibits a clear semantic hierarchy, making it suitable for evaluating the ability to model hierarchical relations and multi-hop reasoning patterns. FB15k-237 is derived from Freebase and removes inverse-relation leakage, making it more appropriate for evaluating generalization under complex relational settings. Beyond these general-purpose benchmarks, two industrial knowledge graph datasets are further used to assess the adaptability and robustness of the proposed model in domain-specific industrial scenarios. Among them, the Chinese industrial knowledge graph is constructed from Computer Numerical Control (CNC) fault diagnosis knowledge and is used to examine the practical applicability of the proposed model in a representative fault diagnosis scenario, while the English industrial knowledge graph is derived from a related Industry 4.0 production-line scenario and is used to evaluate cross-scenario adaptability in a broader industrial context.
The Chinese industrial knowledge graph is constructed from the book Practical Computer Numerical Control (CNC) Machine Tool Fault Diagnosis and Maintenance: 500 Cases [25]. Specifically, the Chinese dataset is built by extracting fault-related knowledge from the book and then performing entity standardization, relation normalization, triple construction, noise cleaning, and duplicate removal, so as to obtain an industrial link prediction dataset for a representative CNC fault diagnosis application scenario. This dataset is used to evaluate the practical applicability of the proposed industrial knowledge graph link prediction model in fault diagnosis, since its triples mainly describe diagnostic relations among fault phenomena, alarm information, fault locations, fault causes, and related operations. During entity standardization, synonymous expressions, variant names, and inconsistent naming formats were merged according to their semantic roles in fault knowledge. Relation normalization was conducted by mapping extracted relations into a predefined set of industrial fault-diagnosis relations. Triples with ambiguous entities, duplicated records, or insufficient relation evidence were removed during data cleaning. The final triples were split into training, validation, and test sets under a fixed random seed, while keeping the relation distribution approximately consistent across different splits.
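The seeded, relation-balanced split described above can be illustrated with a minimal sketch. The split percentages, the seed value, and the function name `split_by_relation` are assumptions for illustration only; the text does not state the values actually used.

```python
import random
from collections import defaultdict


def split_by_relation(triples, valid_pct=10, test_pct=10, seed=42):
    """Split (head, relation, tail) triples relation by relation, so that every
    split keeps roughly the same relation distribution.

    valid_pct, test_pct, and seed are illustrative assumptions.
    """
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    by_relation = defaultdict(list)
    for triple in triples:
        by_relation[triple[1]].append(triple)

    train, valid, test = [], [], []
    for relation in sorted(by_relation):  # sorted for a deterministic order
        group = by_relation[relation]
        rng.shuffle(group)
        n_valid = len(group) * valid_pct // 100
        n_test = len(group) * test_pct // 100
        n_train = len(group) - n_valid - n_test
        train.extend(group[:n_train])
        valid.extend(group[n_train:n_train + n_valid])
        test.extend(group[n_train + n_valid:])
    return train, valid, test
```

Splitting inside each relation group, rather than over the pooled triples, is one simple way to keep rare relations represented in all three splits.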
In contrast, the English industrial knowledge graph is mainly derived from a publicly available Industry 4.0 production-line dataset released on Zenodo [26] and later described in detail in the corresponding journal article [27]. More specifically, the English industrial dataset used in this study is not the full original dataset, but a reconstructed subgraph extracted from that benchmark dataset for the link prediction task. Although this dataset is not a fault-diagnosis dataset in a narrow sense, it represents a related industrial production-line scenario and is used as an additional industrial dataset to evaluate whether the proposed model remains effective beyond the representative fault diagnosis application dataset. This design allows the resulting dataset to retain realistic industrial semantics while remaining suitable for controlled experimental evaluation. For the English industrial dataset, relations and entities that were too sparse, duplicate-like, or mainly descriptive rather than relational were removed, and the remaining triples were reorganized into a link-prediction-oriented format.
For the Chinese industrial knowledge graph, entities are categorized into five types according to their semantic roles in fault knowledge, namely, error code, operation, phenomenon, fault location, and cause. Based on these categories, valid head- and tail-type sets are further constructed for each relation, which are used for type-constrained scoring and type-constrained hard negative sampling. These entity types correspond to key diagnostic elements in CNC fault diagnosis and provide explicit type boundaries for judging whether a candidate entity is valid under a specific diagnostic relation. Although these type definitions are derived from the fault diagnosis application, they also reflect a common characteristic of industrial knowledge graphs, namely, that many relations impose explicit validity constraints on the semantic roles of head and tail entities. The statistics of the four datasets are summarized in Table 2.
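As an illustration of how per-relation type sets can drive negative sampling, the following minimal sketch derives valid head and tail types from training triples and draws tail negatives only from type-valid entities. The function names, the entity-to-type mapping, and the uniform sampling are assumptions; the sampling strategy actually used may additionally rank candidates by difficulty.

```python
import random
from collections import defaultdict


def build_type_sets(train_triples, entity_type):
    """Collect the head and tail entity types observed for each relation.

    entity_type maps an entity to one of the five types named in the text.
    Only training triples are passed in, so no validation or test information
    enters the type sets.
    """
    head_types, tail_types = defaultdict(set), defaultdict(set)
    for head, relation, tail in train_triples:
        head_types[relation].add(entity_type[head])
        tail_types[relation].add(entity_type[tail])
    return head_types, tail_types


def sample_type_constrained_negatives(head, relation, known_triples,
                                      entity_type, tail_types, k, rng):
    """Draw up to k corrupted tails whose type is valid for the relation.

    Type-valid but incorrect entities are harder to reject than random ones,
    which is the sense of "hard negative" assumed in this sketch.
    """
    candidates = sorted(
        entity for entity, etype in entity_type.items()
        if etype in tail_types[relation]
        and (head, relation, entity) not in known_triples
    )
    return rng.sample(candidates, min(k, len(candidates)))
```

The same type sets can also be reused at scoring time to penalize candidates whose type falls outside the valid set for the query relation.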
Beyond the basic statistics, the four datasets differ substantially in their domains, semantic sources, and structural characteristics. WN18RR is a lexical knowledge graph derived from WordNet, and its textual semantics mainly come from WordNet glosses and relation labels. It contains clear hierarchical semantic relations and is therefore suitable for evaluating multi-hop reasoning and semantic hierarchy modeling. FB15k-237 is a general-domain knowledge graph derived from Freebase, where inverse-relation leakage has been removed. Compared with WN18RR, it contains more diverse relation patterns and is used to evaluate generalization under complex relational settings.
The Chinese industrial knowledge graph is constructed from CNC fault diagnosis and maintenance knowledge and is used as a representative application dataset in this study. Its entities and relations mainly describe diagnostic elements such as fault phenomena, alarm information, fault locations, fault causes, and operations. Therefore, it exhibits sparse local structures, concentrated diagnostic relations, explicit entity-type boundaries, and highly confusing candidates under certain diagnostic relations. These characteristics make it suitable for evaluating the practical value of the proposed industrial knowledge graph link prediction model in fault diagnosis applications.
The English industrial knowledge graph is derived from an Industry 4.0 production-line dataset and is used as an additional industrial dataset for cross-scenario validation. Although it is not a fault-diagnosis dataset in a narrow sense, it contains realistic industrial entities and relations from a production-line scenario, with a larger candidate space than the Chinese industrial knowledge graph. This dataset is therefore used to examine whether the proposed model remains effective in a related industrial scenario beyond the representative fault diagnosis application dataset. For both industrial datasets, semantic inputs are mainly constructed from entity names, relation names, and available domain descriptions.
All experiments were implemented using Python 3.8, PyTorch 1.10.0 (Meta AI, Menlo Park, CA, USA), CUDA 11.3 (NVIDIA Corporation, Santa Clara, CA, USA), and Transformers 4.18.0 (Hugging Face, New York, NY, USA). The experiments were conducted on a workstation equipped with an NVIDIA RTX A6000 GPU with 48 GB memory (NVIDIA Corporation, Santa Clara, CA, USA).
During preprocessing, the following steps are performed. First, inverse relations are constructed for each relation to enhance the modeling of bidirectional relation patterns. Second, pretrained text encoders are used to obtain semantic vectors of entities and relations. For the public datasets, textual inputs are mainly derived from WordNet glosses or Freebase labels, whereas for the industrial datasets, entity names, relation names, and domain descriptions are used. Third, a relation co-occurrence graph is constructed to learn relation priors, which provides structural support for subsequent dynamic relation prior generation. Finally, a type-constrained hard negative sampling strategy is employed during training to improve discrimination among highly confusing candidates.
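The first and third preprocessing steps can be sketched as follows. The `_inv` suffix and the definition of co-occurrence used here (two relations touching the same entity) are assumptions for illustration; the text does not specify how co-occurrence is counted.

```python
from collections import Counter, defaultdict
from itertools import combinations


def add_inverse_triples(triples, suffix="_inv"):
    """Append (tail, relation + suffix, head) for every (head, relation, tail)."""
    return list(triples) + [(t, r + suffix, h) for h, r, t in triples]


def relation_cooccurrence(train_triples):
    """Count, for each unordered relation pair, how many entities both touch.

    The resulting counts can serve as edge weights of a relation graph.
    """
    relations_at = defaultdict(set)  # entity -> relations incident to it
    for head, relation, tail in train_triples:
        relations_at[head].add(relation)
        relations_at[tail].add(relation)

    counts = Counter()
    for relations in relations_at.values():
        for rel_a, rel_b in combinations(sorted(relations), 2):
            counts[(rel_a, rel_b)] += 1
    return counts
```

Passing only training triples to `relation_cooccurrence` keeps the relation statistics free of validation and test information.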
To avoid potential information leakage, all trainable structural statistics, including relation co-occurrence statistics, dynamic relation prior inputs, type constraints used during training, and hard negative candidate pools, were constructed only from the training triples. Validation and test triples were not used for model training, relation-prior construction, or hard negative construction. For semantic initialization, only entity names, relation names, and externally available textual descriptions were used, and no validation or test triple labels were introduced during text embedding construction. Validation triples were used only for model selection, and test triples were used only for final evaluation. In the filtered evaluation protocol, known true triples from the training, validation, and test sets were removed from the corrupted candidate list following the standard link prediction setting, so as to avoid false-negative interference during ranking.
All experiments use the filtered setting for both head and tail prediction, and the evaluation metrics are the commonly used Mean Reciprocal Rank (MRR), Hits@1, Hits@3, and Hits@10. For the public datasets, preprocessing mainly introduces inverse relations, text-based semantic initialization, and relation co-occurrence priors to strengthen general link prediction performance. For the industrial datasets, entity type constraints, relation prior enhancement, and hard negative construction are further incorporated to improve the adaptability of the model in industrial knowledge graphs. For the Chinese industrial knowledge graph, these constraints directly reflect diagnostic role boundaries among fault phenomena, alarm information, fault locations, fault causes, and operations, thereby supporting the evaluation of the proposed model in a representative fault diagnosis application. For the English industrial knowledge graph, the same modeling strategy is applied to examine whether the proposed model can generalize to a related production-line knowledge graph. All baseline methods are evaluated under the same filtered setting, and the reported results are based on the same training, validation, and test splits for fair comparison.
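For reference, filtered ranking and the reported metrics can be computed as in the minimal sketch below. The function names and the optimistic tie handling (only strictly higher scores count against the target) are assumptions, not details taken from the experiments.

```python
def filtered_rank(scores, target, known_true):
    """Rank of `target` among candidates after filtering known true entities.

    scores maps each candidate entity to its model score. known_true holds the
    other entities that form true triples with the same query in the training,
    validation, or test set. They are skipped so that they do not push the
    target down the list.
    """
    target_score = scores[target]
    rank = 1
    for entity, score in scores.items():
        if entity == target or entity in known_true:
            continue
        if score > target_score:
            rank += 1
    return rank


def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """Mean Reciprocal Rank and Hits@k over a list of 1-based ranks."""
    n = len(ranks)
    mrr = sum(1.0 / rank for rank in ranks) / n
    hits = {k: sum(rank <= k for rank in ranks) / n for k in ks}
    return mrr, hits
```

In practice the ranks from head prediction and tail prediction are pooled before the metrics are averaged.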
The main hyperparameter settings used in the experiments are summarized in Table 3. The semantic weight and the type-constrained weight are further analyzed in the hyperparameter analysis. Unless otherwise specified, the same basic training configuration is used across datasets, while dataset-specific parameters are selected according to validation performance.
4.2. Results on Public Benchmark Datasets
To verify the effectiveness of the proposed structural-and-semantic fusion model on general knowledge graph link prediction tasks, comparative experiments are first conducted on the two public benchmark datasets WN18RR and FB15k-237. The results are reported in Table 4.
As shown in Table 4, the proposed method achieves the best results on both public datasets. On WN18RR, the proposed method obtains an MRR of 0.599 and Hits@1, Hits@3, and Hits@10 of 0.538, 0.631, and 0.724, respectively. On FB15k-237, the corresponding scores are 0.446, 0.354, 0.498, and 0.650. Compared with representative baselines, the proposed method shows clear and stable advantages in overall ranking quality and top-ranked candidate accuracy.
On WN18RR, the proposed method improves the MRR of NBFNet from 0.551 to 0.599, indicating that dynamic relation priors and the semantic fusion module can provide more effective complementary information in scenarios involving hierarchical relations and multi-hop semantic dependencies, thereby improving overall candidate ranking quality. Meanwhile, the Hits@10 score of 0.724 indicates stronger coverage of high-ranking candidates. On FB15k-237, the proposed method also outperforms strong baselines such as NBFNet, N-BERT, HittER, and N-Former, demonstrating good generalization ability in more diverse and structurally complex public benchmark scenarios.
Among all baselines, NBFNet is the most directly comparable structure-based reasoning model because it also emphasizes relation-centered multi-hop propagation. Compared with NBFNet, the proposed method improves MRR by 0.048 on WN18RR and by 0.031 on FB15k-237. These improvements suggest that dynamic relation priors and candidate-ranking-level semantic evidence can provide useful complementary signals beyond structural propagation. The improvements in Hits@1 also indicate that the correct entity is more frequently ranked at the top, which is important for practical knowledge graph completion systems where only a few top-ranked candidates are typically inspected.
Overall, the public benchmark results validate the effectiveness of the proposed structural-and-semantic fusion framework for general knowledge graph link prediction. Although this is not the primary focus of the study, the results show that the proposed method is not only effective in industrial scenarios but also consistently beneficial on general datasets.
4.3. Results on Industrial Datasets
After validating the general effectiveness of the model on public benchmarks, experiments are further conducted on the Chinese industrial knowledge graph and the English industrial knowledge graph to evaluate the adaptability and robustness of the proposed method in industrial scenarios. The results are shown in Table 5.
As shown in Table 5, the proposed method achieves the best performance on both the Chinese and English industrial knowledge graphs. On the Chinese industrial knowledge graph, the proposed method obtains an MRR of 0.8532 and Hits@1, Hits@3, and Hits@10 of 0.8235, 0.8784, and 0.9144, respectively. On the English industrial knowledge graph, the corresponding scores are 0.7994, 0.7908, 0.8042, and 0.8216. These results indicate that the proposed model performs strongly not only on the Chinese industrial graph, where semantic relations are more concentrated, but also on the larger and more challenging English industrial graph with a broader candidate space.
On the Chinese industrial knowledge graph, the proposed method improves the MRR of NBFNet from 0.8220 to 0.8532, indicating that relation-graph-based dynamic relation priors, hierarchy-aware relation propagation, candidate-ranking-level semantic evidence, and type-aware constraints can more effectively exploit relation contexts, textual semantics, and type boundaries in industrial knowledge, thereby strengthening the modeling of query-relevant reasoning evidence. Meanwhile, the improvement in Hits@10 to 0.9144 further demonstrates strong ranking performance under high-recall candidate scenarios. On the English industrial knowledge graph, the proposed method also outperforms strong baselines such as NBFNet and DistMult, indicating that it remains robust and effective in larger-scale industrial scenarios with more complex candidate spaces.
The industrial results are practically meaningful because industrial link prediction is often used to provide a short list of candidate entities for downstream fault analysis, maintenance knowledge retrieval, or decision support. On the Chinese industrial knowledge graph, the Hits@1 improvement from 0.7919 to 0.8235 over NBFNet indicates that the correct entity is more frequently ranked first, which can reduce manual verification cost in fault-related candidate recommendation. On the English industrial knowledge graph, the MRR improvement from 0.7784 to 0.7994 shows that the proposed framework remains effective when the candidate space becomes larger. These gains suggest that the combination of relation-adaptive structural reasoning, semantic candidate discrimination, and type-aware constraints is especially helpful for industrial graphs with sparse local structures and explicit relation-type boundaries.
Overall, the industrial dataset results show that the performance gains of the proposed method do not arise from a single module, but from the joint contribution of relation-adaptive structural reasoning, semantic candidate discrimination, and entity-type validity modeling. In industrial knowledge graphs, where type boundaries are explicit yet candidates are highly confusing, relying only on structural patterns or semantic information is insufficient to achieve stable and reliable ranking. By combining relation-category-aware propagation with type-aware scoring and type-constrained hard negative training, the proposed method forms a more complete discrimination mechanism for candidate ranking.
4.4. Relation-Type Analysis
To further analyze model behavior under different relation mapping patterns, this subsection reports the performance on four standard relation categories in knowledge graph link prediction, namely, 1-to-1, 1-to-N, N-to-1, and N-to-N. These categories are used to describe the mapping complexity between head entities and tail entities under a specific relation. A 1-to-1 relation usually indicates that one head entity corresponds to one tail entity, and the candidate competition is relatively weak. A 1-to-N relation indicates that one head entity may correspond to multiple tail entities, while an N-to-1 relation indicates that multiple head entities may correspond to the same tail entity. These two categories usually involve stronger candidate competition because several entities may be plausible under the same relation. An N-to-N relation indicates that multiple head entities and multiple tail entities are connected under the same relation, reflecting a more complex many-to-many mapping pattern. Therefore, relation-type analysis is useful for examining whether a model can maintain stable ranking performance under different mapping complexities, especially when multiple candidate entities are structurally or semantically similar.
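The four categories are commonly assigned from two per-relation averages: the number of distinct tails per head and the number of distinct heads per tail, each compared against a threshold of 1.5. The sketch below follows that convention; whether exactly this threshold was used in the present experiments is an assumption.

```python
from collections import defaultdict


def categorize_relations(triples, threshold=1.5):
    """Label each relation as 1-to-1, 1-to-N, N-to-1, or N-to-N."""
    tails_of = defaultdict(set)  # (relation, head) -> distinct tails
    heads_of = defaultdict(set)  # (relation, tail) -> distinct heads
    for head, relation, tail in triples:
        tails_of[(relation, head)].add(tail)
        heads_of[(relation, tail)].add(head)

    tail_counts, head_counts = defaultdict(list), defaultdict(list)
    for (relation, _), tails in tails_of.items():
        tail_counts[relation].append(len(tails))
    for (relation, _), heads in heads_of.items():
        head_counts[relation].append(len(heads))

    categories = {}
    for relation in tail_counts:
        tails_per_head = sum(tail_counts[relation]) / len(tail_counts[relation])
        heads_per_tail = sum(head_counts[relation]) / len(head_counts[relation])
        head_side = "N" if heads_per_tail >= threshold else "1"
        tail_side = "N" if tails_per_head >= threshold else "1"
        categories[relation] = f"{head_side}-to-{tail_side}"
    return categories
```

Per-category MRR is then obtained by grouping the test ranks by the category of each triple's relation.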
Following the standard setting in knowledge graph link prediction, the relations in the Chinese industrial knowledge graph are categorized into the above four types. The MRR values of the proposed method and NBFNet on these relation categories are reported in Table 6.
As shown in Table 6, the proposed method outperforms NBFNet across all four relation categories, although the magnitude of improvement varies considerably. Overall, the MRR values of 1-to-1 and N-to-N relations are substantially higher than those of 1-to-N and N-to-1 relations, indicating that both models perform well when relation mappings are relatively deterministic or structurally well connected. In contrast, one-to-many and many-to-one relations involve stronger candidate competition and are therefore more difficult to rank accurately.
More specifically, the proposed method achieves its largest improvements on 1-to-N and N-to-1 relations. For 1-to-N relations, the MRR increases from 0.7214 to 0.7869, while for N-to-1 relations, it increases from 0.6938 to 0.7615. These results suggest that under complex relation patterns with multiple highly competitive candidates, dynamic relation priors, the semantic fusion module, and the type-constrained hard negative strategy can better enhance fine-grained candidate discrimination. In industrial knowledge graphs, many candidates are highly similar in local structure or semantics, and a single structural propagation signal is often insufficient for stable discrimination. By jointly exploiting structural information, semantic information, and type constraints, the proposed method is able to build clearer decision boundaries among confusing candidates.
The larger gains on 1-to-N and N-to-1 relations further support the motivation of the proposed framework. In these relation categories, multiple candidate entities may satisfy similar structural patterns, making the ranking task more dependent on query-specific relation information, textual semantic evidence, and type validity. Therefore, the observed improvements indicate that the proposed mechanisms are particularly useful when candidate competition is strong.
For the relatively regular or densely connected 1-to-1 and N-to-N relations, the proposed method also maintains strong performance. The MRR of 1-to-1 relations increases from 0.9346 to 0.9438, and that of N-to-N relations increases from 0.9017 to 0.9162. Although these gains are smaller than those on 1-to-N and N-to-1 relations, they still indicate that the proposed model is effective not only for complex relation patterns but also for relatively regular ones. Overall, the results in Table 6 further verify the adaptability of the proposed model to different relation mapping patterns in industrial knowledge graphs, especially under strong candidate competition.
4.6. Hyperparameter Analysis
To further analyze the effect of key hyperparameters on performance, sensitivity analysis is conducted on the semantic branch weight and the type-constrained branch weight. These two hyperparameters control the contributions of semantic information and type-validity scoring evidence, respectively, in the final scoring function. Since the proposed method is designed primarily for industrial knowledge graph scenarios, the analysis is performed only on the Chinese industrial knowledge graph. Specifically, one parameter is fixed while the other is varied: each weight is held at 0.3 while the other is analyzed. The results are shown in Table 8.
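The one-at-a-time protocol can be illustrated with a short sketch. The additive form of `fused_score`, the names `w_sem` and `w_type`, and the candidate grid are assumptions made for illustration; the actual scoring function is the one defined in the method section.

```python
def fused_score(structural, semantic, type_validity, w_sem=0.3, w_type=0.3):
    """Hypothetical additive fusion of the three evidence branches.

    w_sem and w_type stand in for the semantic and type-constrained weights.
    """
    return structural + w_sem * semantic + w_type * type_validity


def one_at_a_time_settings(values=(0.2, 0.3, 0.4), fixed=0.3):
    """Enumerate sweep settings: vary one weight while the other stays fixed."""
    settings = [{"w_sem": value, "w_type": fixed} for value in values]
    settings += [{"w_sem": fixed, "w_type": value} for value in values]
    return settings
```

Each setting would be trained and evaluated separately, and the validation or test MRR recorded per setting yields one row of the sensitivity table.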
As shown in Table 8, both the semantic branch weight and the type-constrained branch weight exhibit a clear moderate-optimum pattern, meaning that excessively small or large values both lead to performance degradation. For the semantic branch weight, when it is set to 0.2, the MRR is 0.8476, indicating that the contribution of the semantic branch is relatively insufficient and the model still relies more heavily on structural propagation and type-validity evidence, making it difficult to distinguish candidates that are structurally similar but semantically different. At the best-performing intermediate setting, the model achieves the best MRR of 0.8532, indicating that semantic information, structural evidence, and type-validity evidence are well balanced. When the weight is further increased to 0.4, the MRR decreases to 0.8491, suggesting that overly strong semantic weighting may cause the model to depend too much on semantic similarity and weaken the influence of structural paths and rule constraints.
For the type-constrained branch weight, the variation reflects the role boundary of the type-constrained branch in industrial scenarios. At a smaller weight, the MRR is 0.8458, indicating that the influence of type constraints on the final score is too weak and the model cannot fully exploit explicit type boundaries in the industrial knowledge graph to suppress incorrect candidates. At the best-performing setting, the model reaches the best MRR of 0.8532, showing that the type-constrained branch forms a good complement to the structural and semantic branches. When the weight increases to 0.4, the MRR drops to 0.8484, suggesting that overly strong rule constraints may compress fine-grained differences among multiple valid candidates and thus impair final ranking performance.
Overall, the results in Table 8 indicate that the effectiveness of the proposed model relies on an appropriate balance among structural propagation, semantic discrimination, and type constraints, rather than the excessive strengthening of any single branch. Both the semantic branch and the type-constrained branch need to operate within an appropriate range so that they remain well balanced with structural evidence and jointly yield the best link prediction performance on industrial knowledge graphs.