Context Is King: Large Language Models’ Interpretability in Divergent Knowledge Scenarios

Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The article introduces Context-Driven Divergent Knowledge Evaluation, a novel methodology for evaluating the interpretability of LLMs in different contexts. It is well-written and presents interesting results.
Author Response
We appreciate the positive feedback and are delighted that the reviewer found our methodology and results novel and interesting.
Reviewer 2 Report
Comments and Suggestions for Authors
This paper proposes the "Context-Driven Divergent Knowledge Evaluation (CDK-E)" as a novel context-injection method to evaluate the interpretability of LLMs. The work offers several novel contributions:
1) designing divergent contexts that deliberately introduce information conflicting with the model's inherent knowledge, in order to systematically evaluate the model's ability to adapt to context;
2) generating a dedicated dataset, the DKD, so that the context design forms part of a controlled-variable experiment;
3) conducting a comprehensive evaluation that employs LLM-based metrics and semantic similarity across multiple dimensions to assess the model's contextual interpretability.
The paper is well-written and easy to follow. The evaluations are comprehensive, and the proposed methodology shows some novelty compared with existing approaches such as RAG and CoT.
Author Response
We are grateful for the reviewer’s recognition of the novelties in our proposed methodology and for their positive evaluation of the manuscript.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors focus on interpretability through interaction techniques. They introduce the Context-Driven Divergent Knowledge Evaluation (CDK-E) methodology, designed to evaluate the interpretability of large language models within context-divergent scenarios, and they present a diagram visualizing the method. The authors argue that CDK-E alone does not fully address all challenges related to explainability. I fully agree with that argument; I would expect some statistics on past achievements in the evaluation process. The topic should be pursued in future research to collect more evidence on the LLM evaluation approach. My question is whether the results are domain-dependent. Perhaps the authors could also expand on the bias issue and differentiate it further.
Author Response
We thank the reviewer for their valuable feedback and insightful comments. Based on the suggestion regarding future research directions, we have updated the Implications for Future Research section to include the following sentence:
We believe this addition aligns with the reviewer’s suggestion to explore these aspects further and provides a clear path for future investigation.