Article
Peer-Review Record

Bridging Text and Knowledge: Explainable AI for Knowledge Graph Classification and Concept Map-Based Semantic Domain Discovery with OBOE Framework

Appl. Sci. 2025, 15(22), 12231; https://doi.org/10.3390/app152212231
by Raúl A. del Águila Escobar 1,*, María del Carmen Suárez-Figueroa 2, Mariano Fernández López 3 and Boris Villazón Terrazas 4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Reviewer 5: Anonymous
Submission received: 21 October 2025 / Revised: 10 November 2025 / Accepted: 14 November 2025 / Published: 18 November 2025
(This article belongs to the Special Issue Explainable Artificial Intelligence Technology and Its Applications)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

 

The paper is written more like a project report than a research paper. It is too long, with too many bullet lists, numbered items, and code and prompt details – only the innovative ones should be kept.

Also, these “Inputs and outputs” descriptions (e.g., row 243) are not something one would expect in a research paper to this extent. Trivial things like topic assignment can be explained in one sentence; there is nothing innovative in applying LDA. Table A.1 can likewise be explained in one sentence, with no need for Appendix A or a table. It would be more interesting to see which “default topic vocabulary” was used in the research.

In 3.2.2 (B), the representation of the collection of triples is not explained: what is the subject, object, predicate… does it come from a syntactic tree or some semantic layer? And which particular Stanza part/module was used for that (link the module, not the general site)?

Not SpaCy but spaCy is the name of the library.

The phrase “domains identified on the specific domain” is not clear.

The paper presents an application of the OBOE framework for explainable classification of knowledge graphs and concept maps derived from text. While the topic is timely and the case study demonstrates the potential of combining explainable AI techniques with graph-based representations, the submission does not meet the standards of a scientific research article.

It is not clear why R. A. del Águila Escobar, M. C. Suárez-Figueroa, and M. Fernández-López, “OBOE: an Explainable Text Classification Framework”, International Journal of Interactive Multimedia and Artificial Intelligence, vol. 8, no. 6, pp. 24-37, 2024, http://dx.doi.org/10.9781/ijimai.2022.11.001, is not cited.

Strengths: The paper offers a clear and structured description of the existing OBOE framework and its modular components. The topic of explainability and hybrid XAI–LLM systems is relevant and of current interest.

The major weakness is the lack of novelty. The paper does not introduce a new model, dataset, or algorithmic contribution. It merely applies an existing framework (OBOE) to a specific use case. There is no methodological innovation or comparative analysis that would advance the state of the art.

The work reads as a case study or technical demonstration rather than a scientific investigation. It focuses on showcasing the framework rather than addressing a clear research question or hypothesis. The reported results are descriptive and lack statistical or comparative rigor. No baselines, ablation studies, or reproducible experimental settings are provided. 

The discussion remains at a conceptual level, reiterating the framework’s design rather than providing critical analysis, limitations, or insights grounded in empirical evidence.

While the topic is interesting and relevant, the manuscript should be categorized as a technical report or case study rather than a scientific research paper. It lacks sufficient novelty, methodological contribution, and empirical validation to justify publication in its current form. I therefore recommend major revision.

Comments on the Quality of English Language

Quality of language is not the problem, but the style of writing is appropriate for a case study or project report, not for a scientific paper.

Author Response

Response to Reviewer 1

MDPI Applied Sciences

We deeply appreciate your comprehensive review and especially thank you for highlighting the need to clarify the fundamental research contribution of our work. Your feedback has been invaluable in helping us articulate more clearly that this paper presents a fundamentally different problem.

Major Comments:

Comment 1: "The paper is written more like a project report than a research paper. It is too long, with too many bullet lists, numbered items, and code and prompt details."

Response: We acknowledge this important observation and have substantially restructured the manuscript to align with research paper standards:

  • Restructured the Methods and Results sections
  • Eliminated extensive listings and prompt details, keeping only the innovative methodological aspects
  • Significantly reduced input/output descriptions (from 42 mentions to 8 essential ones)
  • Transformed bullet-point lists into concise prose where appropriate
  • Moved technical implementation details to supplementary materials

Comment 2: "Trivial things like topic assignment can be explained in one sentence. There is nothing innovative in applying LDA. Table A.1 can be explained in one sentence, with no need for Appendix A or a table."

Response: We have streamlined the presentation of standard techniques:

  • Condensed the LDA topic assignment explanation to essential methodological points
  • Removed Table A.1 and replaced it with a brief textual description
  • Focused the narrative on our novel contributions: the integration of KGE with clustering for domain discovery and the triple-role application of LLMs
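For reference, the condensed LDA topic-assignment step is entirely standard; a minimal sketch (toy documents and scikit-learn defaults, not the paper's corpus, vocabulary, or hyperparameters) looks like this:

```python
# Minimal sketch of standard LDA topic assignment: bag-of-words, fit LDA,
# assign each document its highest-probability topic. All inputs here are
# illustrative toy data, not the manuscript's corpus or settings.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the match ended with a late goal and a penalty",
    "the striker scored twice before the final whistle",
    "the new phone ships with a faster chip and camera",
    "the laptop battery and display were upgraded this year",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=42)
doc_topic = lda.fit_transform(X)  # rows are per-document topic distributions

# Topic assignment: each document goes to its highest-probability topic.
assignments = doc_topic.argmax(axis=1)
print(assignments)
```

This one-paragraph-plus-snippet level of detail is what the revised manuscript now devotes to the step.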

Comment 3: "In 3.2.2. (B) Representation collection of triples is not explained, what is subject, object, predicate… is it from syntactic tree or some semantic layer?"

Response: We have added a comprehensive explanation of triple extraction methodology and included Algorithm 2 to clarify the process.
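The kind of extraction the added explanation covers can be sketched as follows. This is a hypothetical, self-contained illustration of pairing a verb's nominal-subject and object dependents into (subject, predicate, object) triples; in the actual pipeline the dependency parse would come from a parser such as Stanza's depparse processor, and the hard-coded parse below exists only to keep the example runnable:

```python
# Hypothetical sketch of dependency-based triple extraction: pair each
# verb's nsubj and obj dependents into a (subject, predicate, object)
# triple. The parse is hard-coded here for self-containment; a real
# pipeline would obtain it from a dependency parser.

# Each token: (id, text, head_id, deprel), for the sentence "Cats chase mice".
parse = [
    (1, "Cats", 2, "nsubj"),  # nominal subject of "chase"
    (2, "chase", 0, "root"),  # main verb -> predicate
    (3, "mice", 2, "obj"),    # direct object of "chase"
]

def extract_triples(tokens):
    """Return (subject, predicate, object) triples from a dependency parse."""
    words = {tid: text for tid, text, _, _ in tokens}
    deps_of = {}  # head id -> {deprel: dependent id}
    for tid, _, head, rel in tokens:
        deps_of.setdefault(head, {})[rel] = tid
    triples = []
    for tid, text, _, _ in tokens:
        deps = deps_of.get(tid, {})
        if "nsubj" in deps and "obj" in deps:
            triples.append((words[deps["nsubj"]], text, words[deps["obj"]]))
    return triples

print(extract_triples(parse))  # [('Cats', 'chase', 'mice')]
```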

Comment 4: "Not SpaCy but spaCy is the name of library."

Response: Corrected throughout the manuscript. All instances now properly use "spaCy" with correct capitalization.

Comment 5: "It is not clear why [OBOE paper citation] is not cited?"

Response: We acknowledge this oversight. The original OBOE framework paper has now been properly included in Bibliography as reference [1] and is referenced throughout the manuscript to establish the foundation upon which this work builds.

Comment 6: "The paper does not introduce a new model, dataset, or algorithmic contribution. It merely applies an existing framework (OBOE) to a specific use case."

Response: We sincerely thank you for this crucial feedback, as it highlights that the fundamental contribution of our work was not sufficiently clear. We have substantially revised the manuscript to emphasize that this is not merely an application of the existing OBOE framework: while the original OBOE addressed a supervised text classification task with prediction explanations, this work departs from that framework and addresses conceptually different problems.

The original OBOE was designed to explain supervised text classification—answering "why does this text belong to class X?" In contrast, our work tackles two interconnected challenges: (1) explainable classification of knowledge graphs—answering "why does this knowledge graph belong to category Y?" and (2) unsupervised domain discovery—answering "how and why do concepts within the graph cluster into semantic domains?"

This distinction is not merely technical but represents a shift from explaining text-level predictions to explaining both graph-level classification and internal semantic structure. This fundamentally different dual objective required novel methodological contributions:

  • Integration of Knowledge Graph Embeddings (TransE, ConvKB, ComplEx) for semantic representation
  • Comparative analysis of hierarchical vs. spectral clustering for domain discovery
  • Triple-role application of LLMs: structure explanation, hallucination verification, and evaluation at scale
  • QualIT-inspired prompting strategies with Chain-of-Thought reasoning
  • New evaluation paradigm: Development of multi-faceted evaluation combining clustering metrics, LLM-based assessment, and human-aligned scoring for structural explanations
  • Moreover, its implications differ regarding the types of real-world problems to which it can be applied (Sections 1 and 5.5)
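As a hedged illustration of the TransE intuition behind the KGE integration listed above: TransE models a relation as a translation in embedding space, so a triple (h, r, t) is plausible when h + r lands close to t. The vectors below are made-up toy values, not trained embeddings from the manuscript:

```python
# Toy illustration of the TransE scoring idea: a relation is a translation,
# so plausibility of (h, r, t) is the negative distance between h + r and t.
# All vectors are illustrative placeholders, not trained embeddings.
import numpy as np

def transe_score(h, r, t):
    """Negative translation error: higher (closer to 0) = more plausible."""
    return -np.linalg.norm(h + r - t)

h = np.array([0.2, 0.5])         # head entity embedding
r = np.array([0.3, -0.1])        # relation embedding (a translation)
t_true = np.array([0.5, 0.4])    # tail consistent with h + r
t_false = np.array([-0.6, 0.9])  # unrelated tail

# The consistent tail scores strictly higher than the unrelated one.
assert transe_score(h, r, t_true) > transe_score(h, r, t_false)
```

ConvKB and ComplEx replace this translational score with a convolutional and a complex-bilinear score respectively, but the role of the embeddings in the downstream clustering is the same.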

The revised abstract and introduction now clearly articulate this distinction, emphasizing that we address "explainable domain discovery" as a new XAI paradigm.

Comment 7: "The reported results are descriptive and lack statistical or comparative rigor. No baselines, ablation studies, or reproducible experimental settings are provided."

Response: We acknowledge this critical observation and have substantially enhanced the statistical rigor of our evaluation:

Statistical Testing: Added comprehensive statistical testing (Appendix B), which confirms that our results are not merely descriptive but demonstrate statistically significant improvements with reproducible experimental protocols.

Systematic Comparisons: While direct comparison with existing frameworks is not feasible (as they solve different problems), we established internal comparisons using several embeddings and clustering methods and their effect on the evaluation results:

  • KGE methods (TransE, ConvKB, ComplEx)
  • Effect of clustering algorithms (hierarchical vs. spectral)
  • Contribution of LLM components (generation, verification, evaluation)

Reproducibility: Enhanced experimental settings documentation:

  • Complete hyperparameter specifications
  • Random seeds fixed for all experiments (seed=42)
  • Detailed computational requirements (GPU, memory, runtime)

We believe these revisions have substantially strengthened the manuscript, clarifying its novel contribution to the XAI field and addressing all your concerns.

Sincerely,

The Authors

Reviewer 2 Report

Comments and Suggestions for Authors

The paper presents an implementation of the OBOE (explanations based on concepts) framework in the context of knowledge organization, using concept maps derived from natural language texts. In this paper, the Authors discovered how graph structures derived from texts were classified and explained based on the semantic similarity of concepts present in these graphs. The topic is interesting, and the paper corresponds well with the journal’s aim and scope.
In the Introduction, the Authors presented the basic information about the knowledge graphs, the aim of the paper, and their contributions. The information about the structure of the paper is missing. Next, the research problem description should be added to this section. In section 2, the Authors presented the analysis of the selected frameworks. I suggest adding the summary, based on the attached Table 1.
The results description is well-constructed. The discussion is robust and effectively enhances the Conclusions section. The figures are also well-prepared.

Author Response

We appreciate your positive feedback and valuable suggestions for improving the manuscript structure.

Comments:

Comment 1: "The information about the structure of the paper is missing."

Response: Added at the end of the Introduction:

"The remainder of this paper is structured as follows: Section 2 reviews related work and positions our contribution; Section 3 presents the methodology including KGE integration and clustering approaches; Section 4 details experimental results; Section 5 discusses findings, limitations, and practical implications; Section 6 concludes with future research directions."

Comment 2: "Next, the research problem description should be added to this section."

Response: Expanded the Introduction with a clear problem statement:

"This work addresses two interconnected challenges: (1) explainable classification of knowledge graphs—answering "why does this knowledge graph belong to category Y?" and (2) unsupervised domain discovery—answering "how and why do concepts within the graph cluster into semantic domains?""

Comment 3: "I suggest adding the summary, based on the attached Table 1."

Response: Added a comprehensive summary section after Table 1, synthesizing the comparative analysis of frameworks and highlighting the unique positioning of our approach in addressing structural explanation rather than prediction justification.

We believe these revisions have substantially strengthened the manuscript, clarifying its novel contribution to the XAI field and addressing all your concerns. We are grateful for the opportunity to improve our work and look forward to your further feedback.

Sincerely,

The Authors

Reviewer 3 Report

Comments and Suggestions for Authors

This paper presents a description of a partially supervised OBOE system for customizable explainable text classification. The authors examine its applicability in scenarios where knowledge is organized as concept maps derived from text arrays.

The relevance of this work lies in the fact that the use of explainable artificial intelligence has become crucial in solving various problems, particularly in understanding the decision-making processes of "black-box" algorithms in text classification tasks.

The synthesis of traditional and generative methods is a distinct strength of this work.

The authors also employ a comprehensive validation system for their research results.

Furthermore, the code and data are available on GitHub, which complies with open science requirements.

This work is innovative and has great potential for publication in the journal "Applied Sciences". However, comments and suggestions are needed to help clarify the results for readers. 

Comments and Suggestions for the authors:

  1. The main weakness of this study is the limited scope of the empirical data. The authors draw conclusions based on only two small corpora.
  2. The paper would be significantly improved if the authors conducted a comparative analysis of OBOE's performance with similar approaches.
  3. In Section 5, please remove duplicate descriptions from the methods section and add practical implications.
  4. Please edit Sections 3 and 4 to systematize and simplify the description of component inputs and outputs.
  5. Please improve the visualization of the research results.
  6. Please improve the stylistic editing of the text.

Comments for author File: Comments.pdf

Author Response

We greatly appreciate your positive assessment of our work's relevance and your constructive suggestions for improvement.

Comments:

Comment 1: "The main weakness of this study is the limited scope of the empirical data. The authors draw conclusions based on only two small corpora."

Response: We have expanded our experimental validation:

  • Added evaluation on the Reuters corpus (21,578 documents) in addition to Amazon Reviews and BBC News
  • This dataset provides highly specialized linguistic characteristics not present in Amazon or BBC
  • Increased the diversity of domains tested (now covering product reviews, news articles, and multi-topic classification)
  • In addition, most practical XAI applications involve specialized corpora of 1K–10K documents, such as corporate compliance (reviewing policy documents) or medical diagnosis support (case histories). We believe that our scale reflects these realistic scenarios better than web-scale datasets
  • Acknowledged in the limitations section that further validation on domain-specific corpora with external knowledge resources would strengthen generalizability

Comment 2: "The paper would be significantly improved if the authors conducted a comparative analysis of OBOE's performance with similar approaches."

Response: We appreciate this suggestion and have addressed it within the constraints of our novel problem domain:

  • Added Section 2.5 "Comparative Positioning" explaining why direct quantitative comparison with existing frameworks is not feasible (as they address different problems: prediction explanation vs. domain discovery)
  • Nevertheless, to address this suggestion we improved Table 1 to emphasize qualitative comparison with related approaches (topic modeling, clustering methods, LLM-based systems)
  • Provided comparative analysis between hierarchical and spectral clustering within our framework
  • Emphasized that our contribution establishes a new baseline for explainable domain discovery

Comment 3: "In Section 5, please remove duplicate descriptions from the methods section and add practical implications."

Response:

  • Removed redundant methodological descriptions from the Discussion section
  • Added a new subsection (5.5) discussing some practical implications

Comment 4: "Please edit Sections 3 and 4 to systematize and simplify the description of component inputs and outputs."

Response: Substantially simplified these sections:

  • Consolidated component descriptions into concise prose

Comment 5: "Please improve the visualization of the research results."

Response: We have included visualizations both in the main body of the article and in the appendices to facilitate the interpretation of the results in each section:

  • Added Figure 2: Box plot showing cross-validation variance
  • Added Figures 3–5: Radar plots comparing topic modeling metrics (coherence, relevance, and coverage) across corpora and between hierarchical and spectral clustering strategies

Added several figures in the appendices:

  • UMAP projections (Figures B.2.1–3), accompanied by a discussion of the visualizations
  • Silhouette, dendrogram, and UMAP projection as an illustrative example from the analysis of Topic 3 in the BBC corpus
  • Topic-by-topic comparison of coherence, coverage, and relevance metrics between hierarchical and spectral clustering across corpora (Amazon, BBC, and Reuters) — Figures E.1 to E.3

Comment 6: "Please improve the stylistic editing of the text."

Response: We acknowledge this important observation and have substantially restructured the manuscript to align with this comment:

  • Restructured the writing to make the document more readable
  • Eliminated extensive listings and prompt details, keeping only the innovative methodological aspects
  • Significantly reduced input/output descriptions (from 42 mentions to 8 essential ones)
  • Transformed bullet-point lists into concise prose where appropriate
  • Moved technical implementation details to supplementary materials

We believe these revisions have substantially strengthened the manuscript, clarifying its novel contribution to the XAI field and addressing all your concerns. We are grateful for the opportunity to improve our work and look forward to your further feedback.

Sincerely,

The Authors

Reviewer 4 Report

Comments and Suggestions for Authors

This paper extends the OBOE framework to the classification and explanation of knowledge graphs derived from concept maps, integrating topic modeling, KGE-based embeddings, hierarchical clustering, and LLM-enhanced explanation verification. The modular design is well-structured and supports flexible adaptation across different datasets. The introduction of hallucination-prevention mechanisms and multi-level evaluation metrics demonstrates meaningful progress toward interpretable AI.

However, some methodological decisions remain insufficiently justified, such as hyperparameter thresholds and embedding dimensionality. Additionally, the paper would benefit from expanded failure case analysis and visualization of embedding behavior to help readers better understand the interpretability pipeline. Overall, this is a valuable contribution with solid potential for future development.

I have some revision suggestions about this paper as follows

1. Compare multiple KGE methods and clustering strategies to better justify architectural choices.

2. Discuss explanation degradation when topics overlap or entity extraction is noisy to improve robustness claims.

3. Embedding plots, dendrogram cuts, and user refinement screenshots would enhance interpretability and reproducibility.

Comments on the Quality of English Language

The quality of the English language is generally good and clear. The manuscript is readable, well-structured, and technically understandable.

Author Response

Thank you for recognizing the value of our contribution and providing specific technical suggestions.

Comments:

Comment 1: "Compare multiple KGE methods and clustering strategies to better justify architectural choices."

Response: Enhanced our comparative analysis:

  • Expanded the evaluation to include the TransE, ConvKB, and ComplEx embedding architectures
  • Included Spectral Clustering, which operates directly on similarity matrices—the natural representation for semantic relationships in knowledge graphs—using eigendecomposition to identify clusters from graph connectivity patterns rather than geometric distance. This complements hierarchical clustering's variance-minimization approach, as spectral methods excel at discovering the non-convex semantic structures and overlapping domains common in knowledge graphs. Both methods start from the same semantic similarity foundation (cosine similarity of embeddings) but reveal different structural patterns: hierarchical clustering exposes taxonomic relationships through dendrograms, while spectral clustering captures latent semantic communities through graph partitioning
  • Provided a detailed comparison between hierarchical and spectral clustering in the explanation results
  • Added justification for each architectural choice based on performance metrics

Comment 2: "Discuss explanation degradation when topics overlap or entity extraction is noisy to improve robustness claims."

Response: Added comprehensive failure analysis:

  • New subsection 5.9 "Failure Analysis and Robustness"

Comment 3: "Embedding plots, dendrogram cuts, and user refinement screenshots would enhance interpretability and reproducibility."

Response: Enhanced visualizations:

We have included visualizations both in the main body of the article and in the appendices to facilitate the interpretation of the results in each section:

  • Added Figure 2: Box plot showing cross-validation variance
  • Added Figures 3–5: Radar plots comparing topic modeling metrics (coherence, relevance, and coverage) across corpora and between hierarchical and spectral clustering strategies

Added several figures in the appendices:

  • UMAP projections (Figures B.2.1–3), accompanied by a discussion of the visualizations
  • Silhouette, dendrogram, and UMAP projection as an illustrative example from the analysis of Topic 3 in the BBC corpus
  • Topic-by-topic comparison of coherence, coverage, and relevance metrics between hierarchical and spectral clustering across corpora (Amazon, BBC, and Reuters) — Figures E.1 to E.3

We believe these revisions have substantially strengthened the manuscript, clarifying its novel contribution to the XAI field and addressing all your concerns. We are grateful for the opportunity to improve our work and look forward to your further feedback.

Sincerely,

The Authors

Reviewer 5 Report

Comments and Suggestions for Authors

Dear Authors,
Thank you for the opportunity to review your manuscript "Bridging Text and Knowledge: Explainable AI for Knowledge Graph Classification and Concept Map-Based Semantic Domain Discovery with OBOE Framework" sent to Applied Sciences Journal.
I put below a list of recommendations for the revised version of the article.

- within the Abstract, you use the "OBOE" acronym at line 19. I recommend you also include the full description of the acronym, so that the readers understand what you are referring to.

- the Introduction chapter needs to clarify the following aspects: the research gap, the research goal, and the research questions. At this moment, they are a bit mixed and merged, and the reader can't fully understand them. Thus, I recommend you clearly define them.

- sections "2.3. Knowledge Graphs and Concept Maps" and "2.4. Knowledge Graph Embeddings" are too short (only 3 - 4 lines of text). Please think about merging them or reorganizing the texts.

- under "Table 1. Comparative overview of recent explainable text classification frameworks", I recommend you specify the source of the data.

- at the end of the Related Work chapter, you should define and describe the research hypotheses, because they are completely missing from your manuscript now. Each modern research article should take into consideration at least one research hypothesis. Later in the article, based on the research results, you should validate (or not) the hypothesis. This way, the readers know what is your contribution to the field of knowledge.

- within text, you have too many spaces and blanks (for example: lines 437, 439, 444). Please take into consideration this issue and update it.

- in my version of the pdf file, the values from the first column of the "Table 5 LLM Based explanation evaluation" are unclear. Please revise it.

- in the Discussion chapter, I recommend you include a discussion about the validation of the research hypotheses.

Best Wishes!

Author Response

Thank you for your detailed review and specific recommendations for improving the scientific rigor of our manuscript.

Comments:

Comment 1: "Within the Abstract, you use 'OBOE' acronym at line 19. I recommend you also include the full description."

Response: Corrected. The abstract now includes: "OBOE (explanatiOns Based On concEpts) framework"

Comment 2: "The Introduction chapter needs to clarify: the research gap, the research goal and the research questions."

Response:

Restructured the Introduction and Related Work with clear delineation:

  • Research Questions: Added three explicit RQs focusing on KGE representation, clustering comparison, and LLM evaluation
  • Added Research Hypothesis and Research Gap (Section 2.6)
  • Remarked the research goal in the Introduction:

    This work addresses two interconnected challenges: (1) explainable classification of knowledge graphs—answering "why does this knowledge graph belong to category Y?" and (2) unsupervised domain discovery—answering "how and why do concepts within the graph cluster into semantic domains?"

Comment 3: "Sections 2.3 and 2.4 are too short (only 3-4 lines of text)."

Response: Merged these sections into a comprehensive "Knowledge Representation and Embeddings" section with expanded content on both concept maps and KGE techniques.

Comment 4: "You should define and describe the research hypotheses."

Response: Added explicit research hypotheses after the literature review (Section 2.6). Each hypothesis is validated in the Results section and discussed in the Discussion.

Comment 5: "Within text, you have too many spaces and blanks."

Response: Thoroughly reviewed and corrected all formatting issues, removing unnecessary blank lines and ensuring consistent spacing throughout.

Comment 6: "The values from the first column of Table 5 are unclear."

Response: Reformatted Table 5 and 6 with clear column headers and improved readability. All metrics are now properly labeled and values are clearly displayed.

Comment 7: "In the Discussion chapter, include a discussion about the validation of the research hypotheses."

Response: Sections 5.1 to 5.4 now explicitly discuss how our results support or refute each hypothesis, with quantitative evidence and implications for the field.

We believe these revisions have substantially strengthened the manuscript, clarifying its novel contribution to the XAI field and addressing all your concerns. We are grateful for the opportunity to improve our work and look forward to your further feedback.

Sincerely,

The Authors

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I confirm that the authors have carefully and comprehensively addressed all comments and suggestions provided during the review process. The revised version of the manuscript demonstrates substantial improvement in structure, clarity, and scientific rigor. The authors successfully transformed the paper from a project-style report into a well-balanced research article, reducing unnecessary detail while emphasizing methodological innovation and conceptual contributions.

All major issues raised, such as clarification of the triple extraction methodology, correction of terminology, inclusion of the missing OBOE citation, and the articulation of the paper’s original contribution beyond the existing framework, have been adequately resolved. The revised manuscript now clearly defines its dual objectives, methodological novelty, and evaluation rigor, supported by additional statistical analysis, baselines, and reproducibility details.

Given the extent and quality of the revisions, we find that the authors have fully met the requirements outlined in the previous review round. The manuscript is now clear, coherent, and scientifically sound. We recommend the paper for acceptance in its current form.

Comments on the Quality of English Language

Quality of language is not the problem, but the style of writing is appropriate for a case study or project report, not for a scientific paper.
