Article
Peer-Review Record

Enhancing Domain-Specific Knowledge Graph Reasoning via Metapath-Based Large Model Prompt Learning

Electronics 2025, 14(5), 1012; https://doi.org/10.3390/electronics14051012
by Ruidong Ding * and Bin Zhou
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 28 January 2025 / Revised: 18 February 2025 / Accepted: 24 February 2025 / Published: 3 March 2025
(This article belongs to the Section Artificial Intelligence)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

I have a few comments, nothing too substantial but some corrections for the authors to make. I do recommend though that in section 3 you provide some specific examples throughout the description of the algorithm because reading line after line of formula without concrete examples is somewhat exasperating.

Line 8: "Using" --> "using"

Line 21: define a knowledge graph and provide an example of a triple (some examples should be introduced prior to section 2.1 whether in the introduction or in section 2 prior to your discussion starting in 2.1)

In your first equation (after line 74) you have h, r, t, but you wait until line 87 to define them, also I find the use of "head" and "tail" to be unclear, how about "instance" and "value" instead?

Lines 97-105: this paragraph makes it sound like this research is new or recent, but this research dates back to the 1960s as semantic networks, it is the application of neural networks that might be considered somewhat recent.

Lines 166-175: some of these are missing references (e.g., ERNIE, KO). 

Line 181: one of the references appears as [?]

Line 189: I don't think you need to specify "large language models" again since you have already introduced the term as LLMs earlier.

Figure 1: the portion on the right is a little blurry, see if you can improve the quality. 

Line 256: define "zero-shot"

Line 256: "Reasoning meta-path are" --> "Meta-paths are" 

Line 276: an example of Qi, Ti, Hi, would be helpful.

Line 281: where do the pre-defined templates come from?

Equation 1: this ends with a comma, but the next line starts a new paragraph with a capital letter; this should probably end with a period (or just remove the comma). (same with equation 2)

Equations 5 & 6: I could not find what the gamma or alpha are in these formulas (you later say gamma is a threshold but don't explain what the threshold represents)

Line 350: "These two steps can be iterated with each other" - this is unclear. Do you mean the two steps can be executed concurrently? Or that you do step 1 then step 2 repeatedly?

Throughout section 3, a few brief examples would be helpful to more clearly illustrate what all the sets and equations represent.

Line 393: this sentence goes beyond the margin.

Line 408: this line should not be indented.

Line 414: same

Line 434: you have a reference for TransE but don't say anything about it. It would be useful to supply a 1-2 sentence description to introduce it.

Section 3.3.3: the text in parts 1-6 are hard to read since you are not indenting or separating each "paragraph" with blank line. You might consider changing these to bullet points to clearly show where each new "paragraph" starts. (e.g., lines 467-470, 472-473).

Lines 493-496: you have references "denoted as [...]" but you don't say what X, Y, Q or Z are, so this is unclear.

Line 531-532: who does the pruning? Is this tested by hand (human) or by a program?

Line 546: don't indent.

Table 2: explain F1 and Hits@1 (what are they?).

Line 708: again flows past the margin, I wonder if instead of showing us the raw entity updates you can just rephrase this in English so that it is easier to decipher.

 

Author Response

Comments 1: Line 8: "Using" --> "using"

Response 1: Thank you for pointing this out. I have corrected the capitalization in line 8 from “Using” to “using” to maintain consistency in style. The change can be found on line [7] of the revised manuscript.

 

Comments 2: Line 21: define a knowledge graph and provide an example of a triple (some examples should be introduced prior to section 2.1 whether in the introduction or in section 2 prior to your discussion starting in 2.1)

Response 2: Thank you for your suggestion. I have addressed this comment by adding a definition of a knowledge graph along with an example of a triple in the Introduction section. The example is presented as Figure 1 to provide a visual illustration of the concept before the detailed discussion in Section 2.1. This change can be found on lines [22-27] of the revised manuscript, where Figure 1 is introduced and explained.

 

Comments 3: In your first equation (after line 74) you have h, r, t, but you wait until line 87 to define them, also I find the use of "head" and "tail" to be unclear, how about "instance" and "value" instead?

Response 3: I agree with this comment. Therefore, I have revised the definitions of the variables h, r, and t so that they are clear immediately upon their first appearance in the manuscript. I have also replaced the terms “head” and “tail” with “instance” and “value”, as suggested, to enhance clarity. These changes can be found on lines [87-106].

 

Comments 4: Lines 97-105: this paragraph makes it sound like this research is new or recent, but this research dates back to the 1960s as semantic networks, it is the application of neural networks that might be considered somewhat recent.

Response 4: Thank you for your insightful comment. I appreciate your clarification regarding the historical context of domain-specific knowledge graphs. To address your concern, I have revised the paragraph and added references to better reflect the long-standing development of these graphs while also acknowledging the recent impact of neural networks. This change can be found on lines [107-117] of the revised manuscript.

 

Comments 5: Lines 166-175: some of these are missing references (e.g., ERNIE, KO).

Response 5: Thank you for pointing this out. I have added the missing references in the revised manuscript. The references have been inserted in the paragraph on lines [190-194].

 

Comments 6: Line 181: one of the references appears as [?].

Response 6: Thank you for pointing this out. I have added the missing reference in the revised manuscript. The reference has been inserted in the paragraph on line [202].

 

Comments 7: Line 189: I don't think you need to specify "large language models" again since you have already introduced the term as LLMs earlier.

Response 7: Thank you for pointing this out. I have removed the redundant specification of "large language models" and replaced it with the abbreviation "LLMs" to maintain consistency with the earlier introduction of the term. This change can be found on line [210].

 

Comments 8: Figure 1: the portion on the right is a little blurry, see if you can improve the quality.

Response 8: Thank you for pointing this out. I have replaced the blurry portion on the right side of Figure 1 with a higher-resolution image to improve its clarity. The revised figure is now Figure 2 and can be found on line [301] of the manuscript.

 

Comments 9: Line 256: define "zero-shot"

Response 9: I have added a more detailed definition of “zero-shot” in parentheses to clarify that it refers to prompts that do not require any labeled training data or prior domain-specific fine-tuning. Instead, they rely entirely on the structured knowledge graph query feedback for the next step of reasoning. This change can be found on lines [303-307] of the revised manuscript.

 

Comments 10: Line 256: "Reasoning meta-path are" --> "Meta-paths are"  

Response 10: Thank you for pointing this out. I have corrected the grammatical error by changing the phrase "Reasoning meta-path are" to "Meta-paths are" on line [303].

 

Comments 11: Line 276: an example of Qi, Ti, Hi, would be helpful.  

Response 11: I agree with this comment, and I have added an example of Q_i, T_i, and H_i to illustrate the concepts more clearly. The example is included in the revised manuscript on lines [330-334].

 

 

Comments 12: Line 281: where do the pre-defined templates come from?  

Response 12: The predefined prompts mentioned here are simply manually written prompt templates, such as: "[triplets,...] These are the adjacent nodes currently retrieved from the knowledge graph. Please select the potentially relevant next one:"
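To make this concrete, a template of the kind quoted above could be assembled as follows. This is an illustrative sketch only: `build_prompt`, the toy triplets, and the exact wording are assumptions, not code from the paper.

```python
def build_prompt(triplets):
    """Serialize retrieved neighbor triplets into a selection prompt
    following the manually written template quoted above (wording assumed)."""
    listed = ", ".join(f"({h}, {r}, {t})" for h, r, t in triplets)
    return (
        f"[{listed}] These are the adjacent nodes currently retrieved "
        "from the knowledge graph. Please select the potentially "
        "relevant next one:"
    )

# Toy triplets for demonstration only
prompt = build_prompt([("aspirin", "treats", "headache"),
                       ("aspirin", "interacts_with", "warfarin")])
```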

 

Comments 13: Equation 1: this ends with a , but the next line starts a new paragraph with a capital letter, this should probably end with a period (or just remove the comma). (same with equation 2)

Response 13: Thank you for pointing this out. I have removed the unnecessary comma at the end of Equation 1 and Equation 2. The revised equations are now Equation 3 and Equation 4.

 

Comments 14: Equations 5 & 6: I could not find what the gamma or alpha are in these formulas (you later say gamma is a threshold but don't explain what the threshold represents)

Response 14: Thank you for your comment. I am puzzled, however, as to why γ and α were not clearly visible in the manuscript, as they were correctly included in the equations on our side; this may be due to a Unicode character or formatting issue. To address it, I have updated the formula code and hope this resolves the visibility problem. Additionally, I have added detailed explanations of γ and α to make the equations more readable. These changes can be found on lines [371-378] and in Equation 7 and Equation 8.

 

Comments 15: Line 350: "These two steps can be iterated with each other" - this is unclear. Do you mean the two steps can be executed concurrently? Or that you do step 1 then step 2 repeatedly?

Response 15: The idea here is that after completing the first step, we obtain the query results needed for the next step, and after completing the second step, we determine the next query target. This two-step process is repeated iteratively until the final answer is reached. Thank you for pointing this out; I have revised the sentence on line 350 to clarify the process. The revised sentence now reads: "These two steps are executed in sequence and iterated repeatedly." This change can be found on line [408] in the revised manuscript.
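The alternating process described in this response can be sketched as a short loop. Here `query_neighbors` stands in for the knowledge graph interface and `choose_next` for the LLM selection step; both names and the toy graph are illustrative assumptions.

```python
def iterate_reasoning(start, target, query_neighbors, choose_next, max_hops=5):
    """Step 1: query the KG for candidate neighbors; step 2: pick the
    next query target; repeat the two steps until the answer is reached."""
    current, path = start, [start]
    for _ in range(max_hops):
        if current == target:
            break
        candidates = query_neighbors(current)  # step 1: KG query
        current = choose_next(candidates)      # step 2: select next target
        path.append(current)
    return path

# Toy graph a -> b -> c; a trivial chooser picks the first candidate
graph = {"a": ["b"], "b": ["c"], "c": []}
path = iterate_reasoning("a", "c", graph.__getitem__, lambda cs: cs[0])
# path is ["a", "b", "c"]
```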

 

Comments 16: Throughout section 3, a few brief examples would be helpful to more clearly illustrate what all the sets and equations represent.

Response 16: Thank you very much for your valuable comment. While we have included some brief examples in the case study section of the experiments to demonstrate the practical application of our approach, we acknowledge that additional examples within Section 3 would be beneficial for a more comprehensive understanding.

However, we would like to explain the challenges in incorporating detailed examples directly into the methodological description. Our method involves a large number of iterative steps and continuous processing between large language models and knowledge graph nodes. Given the complexity and iterative nature of the approach, integrating scattered case descriptions within the methodological explanation could potentially disrupt the coherence and flow of the overall process description.

To address this concern, we have revised the manuscript by adding a detailed pseudocode to Section 3. This pseudocode provides a step-by-step breakdown of the method, which we believe will enhance the readability and clarity of our approach. The pseudocode can be found on page [14], Algorithm 1.

 

Comments 17: Line 393: this sentence goes beyond the margin.

Response 17: Thank you for pointing this out. I have reformatted the sentence to ensure it fits within the margin. The change can be found on line [453].

 

Comments 18: Line 408: this line should not be indented. Line 414: same.

Response 18: Thank you for pointing this out. I have removed the indentation from lines 408 and 414 to ensure proper formatting. These changes can be found on lines [466] and [472], respectively.

 

Comments 19: Line 434: you have a reference for TransE but don't say anything about it. It would be useful to supply a 1-2 sentence description to introduce it.

Response 19: I agree with this comment, and I have added a brief description of TransE to provide more context for readers. The revised paragraph now includes an explanation of TransE as a translation-based embedding model that maps entities and relations into a low-dimensional vector space. The specific change can be found on lines [494-499] of the revised manuscript.
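For readers of this record, the TransE idea mentioned in the response can be sketched in a few lines: a triplet (h, r, t) is scored by how close h + r lands to t in the embedding space. The toy vectors below are assumptions for demonstration, not trained embeddings.

```python
import math

def transe_score(h, r, t):
    """TransE plausibility score: L2 distance between h + r and t.
    Lower scores indicate more plausible triplets."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# h + r lands exactly on t, so the score is 0 (a plausible triplet)
true_score = transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
# h + r is far from t, so the score is larger (an implausible triplet)
false_score = transe_score([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
```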

 

Comments 20: Section 3.3.3: the text in parts 1-6 are hard to read since you are not indenting or separating each "paragraph" with blank line. You might consider changing these to bullet points to clearly show where each new "paragraph" starts. (e.g., lines 467-470, 472-473).

Response 20: Thank you for pointing this out. I have converted the text in parts 1-6 to bullet points and added blank lines between each section to improve readability. The revised content can be found on lines [487-561].

 

Comments 21: Lines 493-496: you have references "denoted as [...]" but you don't say what X, Y, Q or Z are, so this is unclear.

Response 21: Thank you for pointing this out. I have revised the text to clarify the meanings of the denoted variables [X], [Y], [Q], and [Z]. Specifically, I have added explanations for each variable to ensure clarity. The revised text can be found on lines [552-561] of the revised manuscript.

 

Comments 22: Line 531-532: who does the pruning? Is this tested by hand (human) or by a program?

Response 22: Thank you for raising this question. The pruning process described in lines 531-532 is performed by an automated program, not manually by humans. The program is designed to verify the existence of each triplet in the knowledge graph and prune any reasoning chains that contain non-existent triplets. This ensures the accuracy and reliability of the generated reasoning chains. The clarification has been added to the revised manuscript on lines [601-602].
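The automated check described here amounts to filtering reasoning chains by triplet membership in the graph. A minimal sketch, with all names and the toy data assumed for illustration:

```python
def prune_chains(chains, kg_triplets):
    """Keep only reasoning chains whose every triplet exists in the KG."""
    kg = set(kg_triplets)
    return [chain for chain in chains if all(t in kg for t in chain)]

kg = [("a", "r1", "b"), ("b", "r2", "c")]
chains = [
    [("a", "r1", "b"), ("b", "r2", "c")],  # fully grounded: kept
    [("a", "r1", "b"), ("b", "r9", "x")],  # non-existent triplet: pruned
]
kept = prune_chains(chains, kg)  # only the first chain survives
```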

 

Comments 23: Line 546: don't indent.

Response 23: Thank you for pointing this out. I have removed the indentation as suggested. The correction can be found on line [615] of the revised manuscript.

 

Comments 24: Table 2: explain F1 and Hits@1 (what are they?).

Response 24: Thank you for your suggestion. I have added explanations for the evaluation metrics F1 and Hits@1 in the text preceding Table 2. The F1 score is described as the harmonic mean of precision and recall, indicating the model's overall accuracy, while Hits@1 measures the proportion of times the correct answer is ranked first, reflecting the model's prioritization ability. These explanations can be found on lines [682-687] of the revised manuscript.
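Following the definitions given in this response, the two metrics can be sketched as follows. This is a minimal illustration; the answer sets and rankings are toy data.

```python
def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over answer sets."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def hits_at_1(rankings, answers):
    """Fraction of queries whose top-ranked candidate is the gold answer."""
    hits = sum(1 for ranked, gold in zip(rankings, answers) if ranked[0] == gold)
    return hits / len(rankings)

f1 = f1_score(["x", "y"], ["y", "z"])                 # precision = recall = 0.5
h1 = hits_at_1([["a", "b"], ["c", "d"]], ["a", "d"])  # 1 of 2 queries correct
```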

 

Comments 25: Line 708: again flows past the margin, I wonder if instead of showing us the raw entity updates you can just rephrase this in English so that it is easier to decipher.

Response 25: Thank you for pointing this out. I agree that the raw entity updates were difficult to decipher. I have rephrased the content in plain English to describe the entity updates more clearly. The revised text can be found on lines [864-871] of the revised manuscript.

Reviewer 2 Report

Comments and Suggestions for Authors

Comments

 

The approach taken in this paper to the task of reasoning in LLMs, using domain knowledge graph inference methods based on learned metapaths (DKGM-paths), is very interesting.

 

(1) In the introduction, a comparison of the author's inference method with existing LLMs' inference weaknesses such as “induction of inference paths, inference on knowledge graphs based on repeated queries,” and other arguments that are the strengths of this paper, or a research design map diagram would help convey the impact of the results.

 

(2) The construction of semantic graphs and knowledge graphs is quite dependent on the language of the prompts. Languages in which one conversational or explanatory content is redundant (e.g., Japanese), such as modifiers, particle expressions, etc., are easier to infer at the point of data input because the person entering the data provides many explanations. However, the author's argument is probably that there is very little inference-based knowledge such as input prompts in English and Chinese. The author's originality lies in the inference there and the discussion that fills in the chain. I would like to see a claim.

 

(3) After 3.3.1, in terms of actual applications, complex tasks such as medical care are assumed. Neighboring relations between derivation expressions and entities (heads, triplets, etc.) are handled in an iterative framework, but is there any preprocessing or cleansing process of the data here? If the language processing system in the input data and prompts were more explicit, it would be possible to explain the effectiveness of the knowledge graph in “languages with many explanations” and “languages with few explanations,” such that the knowledge graph is serialized to output for “languages with few explanations,” and the results would have a greater impact.

 

(4) It is important to emphasize the impact of the methodology in this case, but at the same time, the issues should also be clarified. I would also like to see clarification on specific data processing systems, such as processing and pre-processing on the test data set.

Comments for author File: Comments.pdf

Author Response

Comments 1:In the introduction, a comparison of the author's inference method with existing LLMs' inference weaknesses such as “induction of inference paths, inference on knowledge graphs based on repeated queries,” and other arguments that are the strengths of this paper, or a research design map diagram would help convey the impact of the results.

Response 1: Thank you for your valuable suggestion. I have revised the Introduction section to highlight the limitations of LLMs in handling structured knowledge graphs, such as their limited structural understanding, inefficient inference paths, and lack of fact verification mechanisms. These limitations are contrasted with the strengths of our proposed method, which integrates structured reasoning paths and iterative queries to enhance reasoning accuracy and reliability. The changes can be found on lines [41-64], lines [78-85], and lines [255-279] of the revised manuscript.

 

Comments 2: The construction of semantic graphs and knowledge graphs is quite dependent on the language of the prompts. Languages in which one conversational or explanatory content is redundant (e.g., Japanese), such as modifiers, particle expressions, etc., are easier to infer at the point of data input because the person entering the data provides many explanations. However, the author's argument is probably that there is very little inference-based knowledge such as input prompts in English and Chinese. The author's originality lies in the inference there and the discussion that fills in the chain. I would like to see a claim.

Response 2: Thank you for raising this insightful point regarding the impact of language on semantic and knowledge graph construction. In the manuscript, I address the issue of semantic sparsity differences, which explains the challenges in querying structured knowledge graphs using natural language. This issue is not about the difficulty of understanding the query intent in natural language but rather about the gap between the sparse semantic space of natural language queries and the dense structure of knowledge graphs.

Regarding the concern about different languages, I agree that languages like Japanese, with more redundant conversational or explanatory content, might provide richer context for inference at the data input stage. However, the core issue I aim to address is the "query construction" aspect rather than the "query intent" aspect.

Recent advancements in LLMs, especially since the advent of InstructGPT, have significantly improved their ability to understand human intent. With extensive parameter learning, LLMs rarely fail to accurately grasp the intent behind a query. The challenge, however, lies in the "query construction" aspect. Even if LLMs understand the human intent perfectly, they may still struggle with querying structured knowledge graphs due to the limited information available for knowledge retrieval. This limitation leads to factual insufficiency and potential hallucinations, as the models lack the necessary guidance on how to effectively query the knowledge stored in graphs.

Therefore, the novelty of this work lies in addressing this gap by constructing interfaces and automated iterative mechanisms to assist LLMs in building reasoning steps incrementally. This approach helps LLMs construct accurate reasoning chains, ensuring that they can correctly query and utilize the knowledge stored in structured knowledge graphs. The goal is to enhance the factual accuracy and reliability of reasoning by bridging the semantic sparsity gap, rather than focusing solely on understanding query intent.

I hope this clarifies the distinction and the focus of this research. The relevant discussion can be found in the Introduction and Methodology sections, specifically addressing the challenges of semantic sparsity and the proposed solutions.

And I have revised the section on "Semantic sparsity discrepancy" to more clearly distinguish between the challenges of understanding query intent and constructing effective queries for knowledge graphs. The revised text emphasizes that while LLMs have made significant progress in understanding human intent, the core issue lies in the limited information available for querying structured knowledge, leading to potential factual insufficiency and hallucinations. The changes can be found on lines [65-76] of the revised manuscript.

 

Comments 3: After 3.3.1, in terms of actual applications, complex tasks such as medical care are assumed. Neighboring relations between derivation expressions and entities (heads, triplets, etc.) in an iterative framework, but is there any preprocessing or cleansing process of data here? If the language processing system in the input data and prompts were more explicit, it would be possible to explain the effectiveness of the knowledge graph in “languages with many explanations” and “languages with few explanations,” such that the knowledge graph is serialized to output for “languages with few explanations,” and the results would have a greater impact.

Response 3: Thank you for your insightful question. I believe this is a follow-up to the previous question. Indeed, we do have a preprocessing pipeline, especially for the medical management data mentioned in the manuscript, as this is our primary application task. The preprocessing includes both query preprocessing and data cleansing for medical records:

Query Preprocessing:

The main goal is to transform the query into a series of simple templates designed manually. This can be achieved through some straightforward procedures, and specific examples can be found in Section 3.3.3 (Serialization). The purpose, as mentioned in the previous response, is to make the steps of querying and answering clearer for LLMs to gradually obtain information.

Medical Data Cleansing and Processing:

This part is mainly handled by another team responsible for our specific funding project, but I can provide a brief explanation. It involves filtering various forms, records, and data generated during the medical process, as well as expert experiences, based on keyword and relevance identification. The aim is to ensure that only data related to the actual medical management issues required by the project are included. They design specific algorithms and methods to achieve automated preprocessing before applying it to our step.

We will make sure to clearly present the query preprocessing steps in the manuscript, especially the part related to the templates used in the iterative reasoning framework.

 

Comments 4: It is important to emphasize the impact of the methodology in this case, but at the same time, the issues should also be clarified. I would also like to see clarification on specific data processing systems, such as processing and pre-processing on the test data set.

Response 4: Thank you for pointing this out. I have added a detailed description of the data preprocessing steps in the Experimental Results section of the manuscript. Specifically, we included a subsection titled "4.1.3 Data Preprocessing" to clarify the preprocessing and cleansing procedures applied to the test datasets. This section details how we transformed the query texts into structured templates.These changes can be found on lines [656-675] of the revised manuscript.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have done a great job at the paper; however, by addressing the following points, the paper's impact and clarity will improve significantly. 

  • Abstract and Introduction: Please simplify the language of some of the sentences in the abstract. They are overly complex at the moment. Simplifying the language and breaking down longer sentences would enhance readability.
  • Figures and Tables: Please consider adding more descriptive captions to figures (such as Figure 1 and Figure 3) to clarify their role in illustrating the key concepts in the paper.

Literature review

  • Please emphasise the motivation for metapath-based reasoning earlier in the introduction to better establish the research gap.
  • Also, a comparison with alternative prompt-based knowledge reasoning methods (beyond chain-of-thought) in the literature review could strengthen the justification for DKGM-path. Please consider this point.

Methodology

  • Please add a pseudo-code or a step-by-step flowchart summarizing the reasoning process. It would improve the readability of the paper.
  • The paper discusses experimental results but lacks a detailed justification for the chosen evaluation metrics (e.g., Hits@1, F1 score). Please explain why these are the most suitable for assessing multi-hop reasoning performance.

Experimentation and Validation

  • Baseline Comparisons: The DKGM-path method is compared against several baselines, but please provide a deeper discussion of the statistical significance of improvements. Also, consider adding standard deviation/error bars where applicable.
  • Ablation Studies: The method introduces several key components (meta-path construction, iterative verification, post-reasoning checks). An ablation study would clarify the contribution of each component.
  • Generalization to Other Domains: Since DKGM-path is domain-agnostic, testing on additional knowledge graph datasets beyond WebQSP and HotpotQA would strengthen the claims about generalizability.

Discussion and Future Work

  • The authors have done a good job at the limitations section, but potential solutions or mitigations (e.g., methods for reducing computational complexity) could also be briefly discussed.
  • The future work could also explore real-time applications or cross-domain scalability to highlight possible improvements.
  • Also, the paper refers to "meta-paths" and "reasoning chains" interchangeably at times. It would help to define these terms precisely and use them consistently.
Comments on the Quality of English Language

Grammar and Typos

There are some minor grammatical errors and awkward phrasings that need revision. For example:

  • "However, Using large language models for knowledge graph reasoning can also faces challenges..." → should be "However, using large language models for knowledge graph reasoning also faces challenges..."

Author Response

Comments 1: Abstract and Introduction: Please simplify the language of some of the sentences in the abstract. They are overly complex at the moment. Simplifying the language and breaking down longer sentences would enhance readability.

Response 1: Thank you for pointing this out. I agree. Therefore, I have simplified the language in the abstract and introduction to enhance readability. These changes can be found in the revised manuscript on lines [1-5] and lines [41-46].

 

Comments 2: Figures and Tables: Please consider adding more descriptive captions to figures (such as Figure 1 and Figure 3) to clarify their role in illustrating the key concepts in the paper.

Response 2: Thank you for pointing this out. I have revised the captions of Figure 2 and Figure 4 (formerly Figures 1 and 3) to provide more descriptive and detailed information that clarifies their role in illustrating the key concepts of the paper. These changes can be found on page [7] for Figure 2 and page [21] for Figure 4.

 

Comments 3: Please emphasise the motivation for metapath-based reasoning earlier in the introduction to better establish the research gap. Also, a comparison with alternative prompt-based knowledge reasoning methods (beyond chain-of-thought) in the literature review could strengthen the justification for DKGM-path. Please consider this point.

Response 3: Thank you for your valuable suggestions. I agree that emphasizing the motivation for metapath-based reasoning earlier in the introduction and comparing alternative prompt-based knowledge reasoning methods in the literature review are important to better establish the research gap and strengthen the justification for DKGM-path.

Therefore, I have made the following revisions:

In the Introduction: I have further clarified the purpose of the research and emphasized the limitations of existing LLMs when directly applied to knowledge graph reasoning. Specifically, I have highlighted the issue of semantic sparsity discrepancy and how it affects the construction of queries for LLMs. The revised text can be found on lines [50-85].

In the Literature Review: I have added a comparative analysis of alternative prompt-based knowledge reasoning methods beyond chain-of-thought. This section now includes a detailed discussion on how these methods differ from our proposed DKGM-path and why our approach is better suited to address the identified limitations. The revised section can be found on lines [254-279].

 

Comments 4: Please add a pseudo-code or a step-by-step flowchart summarizing the reasoning process. It would improve the readability of the paper.

Response 4: I agree with this comment. Therefore, I have added a detailed pseudo-code summarizing the iterative reasoning process to improve the readability of the paper. The new pseudo-code is inserted on page [14], Algorithm 1, immediately following the description of the reasoning steps.

 

Comments 5: The paper discusses experimental results but lacks a detailed justification for the chosen evaluation metrics (e.g., Hits@1, F1 score). Please explain why these are the most suitable for assessing multi-hop reasoning performance.

Response 5: Thank you for pointing this out. I have added a detailed explanation at the beginning of the experimental results section to justify the choice of evaluation metrics, specifically Hits@1 and F1 score. This new paragraph explains why these metrics are the most suitable for assessing multi-hop reasoning performance. The revision can be found on lines [682-687]. The added content is marked in red for clarity.

 

Comments 6: Baseline Comparisons: The DKGM-path method is compared against several baselines, but please provide a deeper discussion of the statistical significance of improvements. Also, consider adding standard deviation/error bars where applicable.

Response 6: Thank you for pointing this out. I have added a deeper discussion of the statistical significance of the improvements and included standard deviations in the tables to reflect the variability in the results.

Specifically, I have revised Tables 2, 3, and 4 to include standard deviations for each metric. For example, in Table 2, the performance of DKGM-path on the WebQSP dataset is now reported as 77.6 ± 0.7 for F1 and 78.4 ± 0.9 for Hits@1, indicating the mean performance and the standard deviation across multiple runs. Similar revisions have been made to other tables.

Additionally, I have added a discussion of the statistical significance of the improvements in the respective sections. The analysis shows that the performance gains of DKGM-path over the baselines are statistically significant (p < 0.05), demonstrating the robustness and effectiveness of our approach. These changes can be found on pages 14 to 18, Tables 2, 3, and 5 of the revised manuscript.
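To illustrate the kind of analysis added, the mean ± standard deviation and a paired t-test across matched runs can be sketched as follows. The scores here are invented placeholders, not the paper's results, and the paired t-test is shown only as one common choice of significance test:

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical Hits@1 scores from 5 matched runs (same seeds) of two methods.
dkgm_runs = [78.4, 77.9, 79.1, 78.0, 78.6]
baseline_runs = [75.2, 74.8, 75.9, 75.0, 75.4]

print(f"DKGM-path: {mean(dkgm_runs):.1f} \u00b1 {stdev(dkgm_runs):.1f}")
print(f"Baseline:  {mean(baseline_runs):.1f} \u00b1 {stdev(baseline_runs):.1f}")

# Paired t-test statistic on the per-run differences.
diffs = [a - b for a, b in zip(dkgm_runs, baseline_runs)]
t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
# Two-tailed critical value for df = 4 at alpha = 0.05 is about 2.776.
print(f"t = {t:.2f}; significant at the 0.05 level if |t| > 2.776")
```

Pairing the runs by seed controls for run-to-run variation, so even small mean differences can be significant when they are consistent across runs.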

 

Comments 7: Ablation Studies: The method introduces several key components (meta-path construction, iterative verification, post-reasoning checks). An ablation study would clarify the contribution of each component.

Response 7: Thank you for pointing this out. I agree with this comment. Therefore, I have added a new section titled "Ablation Studies" to evaluate the contribution of each key component of our method, including meta-path construction, iterative verification, and post-reasoning checks. This section systematically removes or modifies each component to observe its impact on overall performance, thereby clarifying their individual contributions. The new section can be found on lines [791-824].

 

Comments 8: Generalization to Other Domains: Since DKGM-path is domain-agnostic, testing on additional knowledge graph datasets beyond WebQSP and HotpotQA would strengthen the claims about generalizability.

Response 8: Thank you for pointing this out. I agree with this comment and have conducted additional experiments on the MedicalQA dataset, a domain-specific knowledge graph for medical question answering obtained from OpenKG (http://data.openkg.cn/dataset/mediacalqa). I have added a new table (Table 4) and a detailed analysis section to demonstrate the performance of DKGM-path on this dataset.

 

Comments 9: The authors have done a good job at the limitations section, but potential solutions or mitigations (e.g., methods for reducing computational complexity) could also be briefly discussed. The future work could also explore real-time applications or cross-domain scalability to highlight possible improvements.

Response 9: Thank you for pointing this out. We agree with this comment and have added a new section titled "Future Work" to the revised manuscript. In this section, we outline several potential directions for future research to address the limitations of the current DKGM-path method and highlight possible improvements.

Specifically, we highlight that the current design of DKGM-path prioritizes reasoning accuracy and interpretability over real-time performance, which is suitable for complex reasoning tasks in domain-specific contexts. However, recognizing the potential need for real-time capabilities in certain scenarios, such as automated monitoring of domain data, we outline plans to investigate more efficient prompting mechanisms and optimize the interaction between the knowledge graph and the LLM.

These revisions can be found in the "Future Work" section on lines [931-952] of the revised manuscript.

 

Comments 10: Grammar and Typos

There are some minor grammatical errors and awkward phrasings that need revision. For example:

"However, Using large language models for knowledge graph reasoning can also faces challenges..." → should be "However, using large language models for knowledge graph reasoning also faces challenges..."

Response 10: Thank you for pointing this out. I agree with this comment. Therefore, I have carefully reviewed and revised the entire manuscript to correct minor grammatical errors and awkward phrasings. The specific example you mentioned has been corrected, and I have addressed other similar issues throughout the manuscript to ensure clarity and grammatical accuracy.
