Verified Language Processing with Hybrid Explainability
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper addresses important problems and proposes some solutions.
However, the article is not written in the format of a paper. Currently, it is 85 pages long, which is excessively long and, as a result, will draw limited interest from readers. To me, it seems that the article was written as a thesis and submitted without being transformed into a paper. It poses more research questions than are typically addressed in a single paper.
Some pictures do not display, for example, Fig. 16(h), Fig. 22.
This reviewer suggests resubmission of the paper after the article has been transformed accordingly.
Author Response
Please see the attached document.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The study is explicitly limited to "simple and factoid sentences" and to current datasets where "entities only have at most one preposition." This severely restricts the applicability and generalizability of LaSSI. Real-world language is far more complex, abounding in multi-prepositional phrases, nuanced meanings, and non-factoid expressions (e.g., opinions, hypotheticals, metaphors, sarcasm).
The authors' statement that "currently available datasets are unsuitable to test our system under our premises," together with the need for "completely re-annotating these datasets," is a major red flag. It suggests that the current evaluation might rest on a highly specialized or custom dataset that does not reflect the diversity and complexity of general NLP tasks. This raises concerns about the external validity of their "preliminary results."
While transformers might struggle with direct paraconsistent reasoning using cosine similarity, there are other AI approaches explicitly designed for logical reasoning and knowledge representation (e.g., symbolic AI, logic programming, fuzzy logic, answer set programming). The paper doesn't sufficiently compare LaSSI against these established methods, which are inherently better suited for logical inference.
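To make the failure mode behind this comment concrete, here is a minimal sketch, assuming the publicly available sentence-transformers library and the all-MiniLM-L6-v2 checkpoint (both our illustrative choices; the review names neither, and the sentence pair is likewise hypothetical). Cosine similarity between the embeddings of a sentence and its negation is typically high, so the raw score alone cannot distinguish entailment from inconsistency:

```python
# Illustrative sketch (not from the paper under review): cosine similarity
# over sentence embeddings scores a sentence and its negation as highly
# similar, so it cannot by itself separate entailment from contradiction.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

premise = "Alice plays football."           # hypothetical sentence pair,
negation = "Alice does not play football."  # not taken from the paper

embeddings = model.encode([premise, negation])
score = util.cos_sim(embeddings[0], embeddings[1]).item()

# Typically prints a value close to 1.0 even though the sentences are
# logically inconsistent; this is the gap logic-based methods target.
print(f"cosine similarity: {score:.3f}")
```

A logic-programming or answer-set encoding of the same pair would instead derive an explicit contradiction, which is the kind of head-to-head comparison the reviewer asks for.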
Large Language Models (LLMs) are constantly evolving, and fine-tuning, prompt engineering, or integration with external knowledge bases (e.g., Retrieval-Augmented Generation - RAG) can significantly enhance their logical reasoning and understanding of complex sentence structures, even for "deep sentence understanding." The paper's criticism of transformers appears somewhat outdated or overly generalized without a detailed breakdown of the transformer models used and their specific training/fine-tuning.
While it is true that large transformers are resource-intensive, dismissing them entirely on this basis, without detailed metrics for the specific tasks where LaSSI claims superiority, is a weak argument. For many real-world applications, the performance gains of transformers might justify the resource cost. The stated intention to compare with "other lightweight NLP methods" in future work is a significant omission for a current paper claiming novel contributions.
Many modern approaches in AI focus on automatically extracting knowledge from text (e.g., knowledge graph embedding, open information extraction) or leveraging very large pre-existing KBs (e.g., Wikidata, DBpedia) rather than relying on manual curation. The paper doesn't discuss how LaSSI could integrate with or benefit from these automated or pre-existing large-scale knowledge sources, which is crucial for real-world applications where manual KB maintenance is impractical.
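As a concrete instance of the alternative described here, the sketch below pulls facts from a pre-existing large-scale KB (Wikidata's public SPARQL endpoint) rather than a manually curated one; the specific query, entity, and property identifiers are our own illustrative choices and do not appear in the review or the paper:

```python
# Illustrative sketch: consulting a pre-existing large-scale KB (Wikidata)
# instead of manually curating facts. Query and identifiers are examples.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?riverLabel WHERE {
  ?river wdt:P31 wd:Q4022 ;   # instance of: river
         wdt:P17 wd:Q145 .    # country: United Kingdom
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "kb-integration-sketch/0.1"},  # WDQS requests one
)
for row in response.json()["results"]["bindings"]:
    print(row["riverLabel"]["value"])  # e.g., river names usable as KB facts
```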
How does "paraconsistent reasoning" translate into practical benefits for real-world NLP problems? In what scenarios is it crucial to handle contradictory information in this manner? This application-level justification is missing.
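One scenario that would answer this question is fact fusion from conflicting sources (e.g., two traffic reports that disagree). The sketch below is our own minimal Belnap-style four-valued bookkeeping, not anything from the paper; it shows the practical benefit: the contradiction stays local to the disputed claim instead of making every other claim derivable, as classical explosion would:

```python
# Illustrative sketch of paraconsistent (Belnap four-valued) bookkeeping:
# a knowledge base may hold evidence both for and against a claim, and the
# conflict stays local instead of trivialising the rest of the KB.
from dataclasses import dataclass

@dataclass(frozen=True)
class Evidence:
    pro: bool  # some source supports the claim
    con: bool  # some source supports its negation

    @property
    def label(self) -> str:
        return {(True, False): "true", (False, True): "false",
                (True, True): "both", (False, False): "unknown"}[(self.pro, self.con)]

def tell(kb: dict, claim: str, negated: bool = False) -> None:
    """Record one source's report about a claim, merging with prior evidence."""
    v = kb.get(claim, Evidence(False, False))
    kb[claim] = Evidence(v.pro or not negated, v.con or negated)

kb: dict = {}
tell(kb, "road_closed(a66)")                # hypothetical source A
tell(kb, "road_closed(a66)", negated=True)  # hypothetical source B disagrees
tell(kb, "raining(newcastle)")              # an unrelated, undisputed claim

print(kb["road_closed(a66)"].label)    # both -> contradiction is contained
print(kb["raining(newcastle)"].label)  # true -> unaffected by the conflict
```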
Author Response
Please see the attached document.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
See the attached file.
Comments for author File: Comments.pdf
Author Response
Please see the attached document.
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
This manuscript proposes LaSSI (Logical, Structural, and Semantic text Interpretation), a pipeline that enables verified, explainable natural language processing (NLP) by transforming full-text sentences into first-order logic (FOL) representations. Unlike current state-of-the-art neural approaches, LaSSI provides both human- and machine-readable explanations and aims to distinguish between implication, indifference, and inconsistency in sentence similarity tasks, a nuance not captured by standard models or datasets. The approach is grounded in hybrid explainability, combining syntactic analysis (via Stanford CoreNLP), semantic enrichment (using resources like ConceptNet and GeoNames), and formal logic (via Montague Grammar and the Parmenides ontology). It provides a three-stage explanation framework: a priori (contextual and semantic enrichment), ad hoc (rewriting into logical programs), and ex post (deriving confidence-based similarity scores). The authors benchmark LaSSI against pre-trained language models (Sentence Transformers, ColBERTv2, DeBERTaV2+AMR-LDA) and show its superior ability to distinguish entailment types, recognize logical connectives, and perform spatiotemporal reasoning.
Strengths:
1. The paper introduces a functional logic-based pipeline that merges formal linguistics, ontologies, and NLP, pioneering verified AI in natural language understanding.
2. Multiple research questions (e.g., on logical entailment, structure sensitivity, and scalability) are rigorously tested through theoretical analysis, controlled experiments, and benchmarks on datasets derived from ConceptNet and crafted test cases.
3. The LaSSI pipeline includes fine-grained intermediate representations, dependency parsing, logical rewriting, and confidence-based similarity, offering clear reasoning traceability (illustrated well in Figures 3–5 and 14–16).
4. The authors provide links to source code (LaSSI GitHub), datasets, and ontologies, enhancing reproducibility and facilitating further work in this domain.
5. Intuitive experimental design: the manuscript demonstrates how LLMs and AMR-based models fall short in capturing logical semantics, especially spatiotemporal and structural nuances, and contrasts them against LaSSI's logic-grounded method.
Weaknesses:
1. While the method is conceptually rich, the high technical complexity and dense formalism (e.g., logical rule rewriting, tabular semantics, FOL transformations) may hinder accessibility for broader NLP audiences.
2. The datasets used are carefully constructed but limited in scale and diversity. There is insufficient exploration of performance on real-world, noisy, large-scale corpora.
3. Scalability concerns in practice.
4. The approach's success depends on the quality and coverage of hand-crafted rules and ontologies (e.g., Parmenides). The process of expanding or adapting this framework to other domains or languages is not fully addressed.
5. Although grounded partly in Italian linguistic theory, the generalization to typologically diverse languages is not empirically demonstrated, even though it is claimed to be applicable.
6. The manuscript is unusually long.
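To give a feel for the implication/indifference/inconsistency distinction that the summary above credits to LaSSI, here is a deliberately toy classifier over hand-built logical atoms; it is our own illustration, not the paper's pipeline, which derives such representations automatically from full text:

```python
# Toy illustration (not the paper's pipeline): once sentences are reduced
# to logical atoms (predicate, arguments, polarity), a pair can be labelled
# as implication, inconsistency, or indifference by structural comparison.
def classify(a, b):
    pred_a, args_a, pos_a = a
    pred_b, args_b, pos_b = b
    if pred_a == pred_b and args_a == args_b:
        return "implication" if pos_a == pos_b else "inconsistency"
    return "indifference"

s1 = ("play", ("alice", "football"), True)   # "Alice plays football."
s2 = ("play", ("alice", "football"), False)  # "Alice does not play football."
s3 = ("eat", ("bob", "an_apple"), True)      # "Bob eats an apple."

print(classify(s1, s1))  # implication
print(classify(s1, s2))  # inconsistency
print(classify(s1, s3))  # indifference
```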
Suggestions: for challenges related to LLMs, such as hallucination, inherent biases, and stochastic parroting, the authors can include more recent and relevant research as suggested below:
- Unlocking LLMs: Addressing Scarce Data and Bias Challenges in Mental Health and Therapeutic Counselling (URL: https://aclanthology.org/2024.nlpaics-1.26/)
- Intentional Biases in LLM Responses (DOI: 10.1109/UEMCON59035.2023.10316060)
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (URL: https://dl.acm.org/doi/10.1145/3442188.3445922)
- Ask the Experts: Sourcing a High-Quality Nutrition Counseling Dataset through Human-AI Collaboration (URL: https://aclanthology.org/2024.findings-emnlp.674/)
- Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models (URL: https://proceedings.neurips.cc/paper_files/paper/2023/file/f26119b4ffe38c24d97e4c49d334b99e-Paper-Conference.pdf)
- Automating Bias Testing of LLMs (DOI: 10.1109/ASE56229.2023.00018)
The English could be slightly improved to express the research more clearly and to correct typographical errors.
Author Response
Please see the attached document.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I believe the paper is still too lengthy (55 pages) to draw much interest from readers.
Author Response
Only if the reviewer provides clear guidance can we ensure that we meet their expectations; given the lack of specific information, we can only guess. We think that further removing experiments would come at the detriment of the paper's accessibility, as it would no longer remark on the problems of competing approaches. Moreover, the Related Work section, which remarks on those problems, is essential. Therefore, the only part that can be shrunk is the methodological section. We have also removed the supplementary material and the appendix, and we refer readers to our previous version on Preprint as a technical report should more information be needed.
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have answered all the questions and improved the manuscript.
Author Response
Our heartfelt thanks go to the reviewer, who recognized the good faith of our answers and acknowledged the validity of our arguments and of the proposed methodology.