Verified Language Processing with Hybrid Explainability †
Abstract
1. Introduction
- RQ №1
- Is it possible to create a plug-and-play and explainable NLP pipeline for sentence representation? By exploiting white-box reasoning, we can visualise the outcome of each inference and sentence-generation step (Section 5.3). Furthermore, by designing the pipeline in a modular way, each component can easily be replaced to adapt to different linguistic needs. Through declarative context-free rewritings of the NLP representation (Section 6), we ensure pipeline versatility and future extensibility: behaviour is adapted by changing the inner rules rather than the code. We can therefore see how the full text is transformed at every stage, allowing any errors to be identified and corrected through a human-in-the-loop approach.
- RQ №2
- Can pre-trained language models correctly capture the notion of sentence similarity? The previous result should imply the impossibility of accurately deriving the notion of equivalence, since equivalence requires entailment to hold in both directions (an if-and-only-if relationship), but not vice versa. Meanwhile, the notion of sentence indifference should be kept distinct from the notion of conflict. We designed empirical experiments over dedicated datasets to address the following sub-questions (an illustrative probing sketch follows this list):
- (a)
- Can pre-trained language models capture propositional calculus? These experiments rest on the following considerations. First-Order Logic (FOL) is more expressive than propositional calculus, and pre-trained models are assumed to reason on arbitrary sentences, which are representable in FOL under Montague’s assumption [17]. Since propositional calculus is less expressive than, and included in, FOL, any inability to capture propositional calculus also rules out extending the approach to sound inference in FOL. Current experiments (Section 4.2.1) show that pre-trained language models cannot adequately capture the information carried by the logical connectives of propositional calculus.
- (b)
- Can pre-trained language models distinguish between active and passive sentences? Experiments (Section 4.2.2) show that their intermediate representations (AMR, tokens, and embeddings) are insufficient to distinguish them faithfully.
- (c)
- Can pre-trained language models correctly capture minimal FOL extensions for spatiotemporal reasoning? Spatiotemporal reasoning demands specific part-of and is-a inferences, which call for minimal FOL extensions (eFOLs). The paper’s results confirm the observations from RQ №2(a), showing that these approaches cannot reason over eFOLs (Section 4.2.3). Additional experiments show that such models fail to derive correct solutions even after re-training (Section 5.3).
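As a concrete illustration of the kind of probe behind RQ №2(a) and RQ №2(b), the sketch below compares cosine similarities of off-the-shelf sentence embeddings for connective- and voice-controlled sentence pairs. It assumes the sentence-transformers package and the all-MiniLM-L6-v2 checkpoint purely for illustration; the datasets and models actually evaluated are those described in Section 4.2.

```python
# Minimal probe (illustrative only): do off-the-shelf sentence embeddings
# separate logically distinct connectives and active/passive voice?
# Assumes the `sentence-transformers` package; the checkpoint below is an
# example, not necessarily the one evaluated in Section 4.2.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

pairs = [
    # Propositional connectives (RQ №2(a)): conjunction vs. disjunction.
    ("Alice eats and Bob drinks", "Alice eats or Bob drinks"),
    # Negation should push sentences apart if connectives were captured.
    ("The traffic flows in Newcastle", "The traffic does not flow in Newcastle"),
    # Active vs. passive voice (RQ №2(b)): same meaning, different surface form.
    ("The cat chased the mouse", "The mouse was chased by the cat"),
    # Swapped roles: different meaning, similar surface form.
    ("The cat chased the mouse", "The mouse chased the cat"),
]

for a, b in pairs:
    ea, eb = model.encode([a, b], convert_to_tensor=True)
    print(f"{util.cos_sim(ea, eb).item():.3f}  {a!r} vs {b!r}")
# If the scores for the logically different pairs are as high as those for the
# paraphrase pair, cosine similarity alone cannot ground entailment or equivalence.
```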
2. Related Works
2.1. General Explainable and Verified Artificial Intelligence (GEVAI)
2.2. Pre-Trained Language Models
2.2.1. Sentence Transformers
Sentence Transformers (Section 2.2.1) | Neural Information Retrieval (IR) (Section 2.2.2) | Generative Large Language Model (LLM) (Section 2.2.3) | GEVAI (Section 2.1) | |||
---|---|---|---|---|---|---|
MPNet [33] | RoBERTa [34] | MiniLMv2 [35] | ColBERTv2 [36] | DeBERTaV2+AMR-LDA [12] | LaSSI (This Paper) | |
Task | Document Similarity | Query Answering | Entailment Classification | Paraconsistent Reasoning | ||
Sentence Pre-Processing | Word Tokenisation + Position Encoding | • AMR with Multi-Word Entity Recognition • AMR Rewriting | • Dependency Parsing • Generalised Graph Grammars • Multi-Word Entity Recognition • Logic Function Rewriting | |||
Similarity/Relationship Inference | Permuted Language Modelling | – | Annotated Training Dataset | Factored by Tokenisation | • Logical Prompts • Contrastive Learning | • Knowledge Base-driven Similarity • TBox Reasoning
Learning Strategy | Static Masking | Dynamic Masking | Annotated Training Dataset | • Autoregression • Sentence Distance Minimisation | ||
Final Representation | One vector per sentence | Many vectors per sentence | Classification outcome | Extended First-Order Logic (FOL) | ||
Pros | Deriving Semantic Similarity through Learning | Generalisation of document matching | Deriving Logical Entailment through Learning | • Reasoning Traceability • Paraconsistent Reasoning • Not biased by documents | |
Cons | • Cannot express propositional calculus • Semantic similarity does not entail implication capturing | • Inadequacy of AMR • Reasoning limited by Logical Prompts • Biased by probabilistic reasoning | Heavily Relies on Upper Ontology |
2.2.2. Neural Information Retrieval (IR)
2.2.3. Generative Large Language Model (LLM)
3. Materials and Methods
3.1. A Priori
3.1.1. Syntactic Analysis Using Stanford CoreNLP
3.1.2. Generation of SetOfSingletons
- Multi-Word Entities: In [21], Algorithm S8 performs node grouping [48] over the nodes connected by compound edge labels while efficiently visiting the graph with a Depth-First Search (DFS); an illustrative sketch follows this list. We then identify whether a subset of these nodes acts as a specification (extra) of the primary entity of interest or whether it should be treated as a single entity, also considering the type of information for disambiguation purposes.
- Multiple Logical Functions: Since graphs cannot directly represent n-ary relationships, we group multiple adverbial phrases into one SetOfSingletons. These are then semantically disambiguated by their function during the Logical Sentence Analysis ([21], Section 3.2.3). Figure 3 provides a simple example, where each MULTIINDIRECT contains either one adverbial phrase or a conjunction. Supplement III.1 in [21] provides a more compelling example, where the SetOfSingletons contains multiple Singletons.
- Coordination: For coordination induced by conj relationships, we derive the coordination type (AND, NEITHER, or OR) from the associated cc relationships.
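To make the grouping step concrete, the following is a minimal sketch of compound-edge grouping and cc-based coordination typing. It assumes a toy encoding of dependency edges as (head, dependent, label) triples and is not the actual Algorithm S8 from [21].

```python
# Illustrative sketch (not Algorithm S8 from [21]): nodes linked by `compound`
# dependency edges are merged into one multi-word entity via a depth-first
# search, and `conj`/`cc` edges yield a coordination type. The edge encoding
# below is an assumption made for this example only.
from collections import defaultdict

def group_compounds(edges):
    """edges: iterable of (head, dependent, label) dependency triples."""
    adj = defaultdict(set)
    for h, d, lab in edges:
        if lab == "compound":            # treated as undirected for grouping
            adj[h].add(d)
            adj[d].add(h)
    groups, seen = [], set()
    for node in adj:                     # DFS over compound-connected components
        if node in seen:
            continue
        stack, component = [node], []
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            component.append(n)
            stack.extend(adj[n] - seen)
        groups.append(component)
    return groups

def coordination_type(edges):
    """Derive AND / OR / NEITHER for conj-coordinated nodes from cc edges."""
    ccs = {d.lower() for _, d, lab in edges if lab == "cc"}
    if "neither" in ccs or "nor" in ccs:
        return "NEITHER"
    if "or" in ccs:
        return "OR"
    return "AND"                          # default when only `and` (or no cc) occurs

edges = [("centre", "city", "compound"),
         ("flow", "Saturdays", "obl"),
         ("busy", "crowded", "conj"), ("crowded", "and", "cc")]
print(group_compounds(edges))             # [['centre', 'city']]
print(coordination_type(edges))           # AND
```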
3.2. Ad Hoc
flow(Traffic, None)[(SPACE:Newcastle[(type:stay in place), (extra:city centre)]), (TIME:Saturdays[(type:defined)])]
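For readability, the Ad Hoc representation above can be read as a relationship carrying a subject, an (optional) object, and typed SPACE/TIME properties. The sketch below renders the example with hypothetical Python dataclasses; the class and field names are illustrative assumptions, not the LaSSI internals.

```python
# Hypothetical rendering of the Ad Hoc representation above; class and field
# names are illustrative assumptions, not the actual LaSSI data model.
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Singleton:
    name: str
    properties: dict = field(default_factory=dict)   # e.g. {"type": ..., "extra": ...}

@dataclass
class Relationship:
    verb: str
    source: Singleton
    target: Optional[Singleton]                       # None ≈ no explicit object
    space: list[Singleton] = field(default_factory=list)
    time: list[Singleton] = field(default_factory=list)

traffic_flow = Relationship(
    verb="flow",
    source=Singleton("Traffic"),
    target=None,                                      # mirrors flow(Traffic, None)
    space=[Singleton("Newcastle", {"type": "stay in place", "extra": "city centre"})],
    time=[Singleton("Saturdays", {"type": "defined"})],
)
print(traffic_flow)
```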
3.3. Ex Post
4. Results
4.1. Theoretical Results
4.1.1. Cosine Similarity
4.1.2. Confidence Metrics
4.2. Classification
4.2.1. Capturing Logical Connectives and Reasoning
4.2.2. Capturing Simple Semantics and Sentence Structure
4.2.3. Capturing Simple Spatiotemporal Reasoning
5. Discussion
5.1. Using Short Sentences
5.2. LaSSI Ablation Study
5.3. Explainability Study
5.3.1. Explicate Problem
5.3.2. Define Requirements
- Req №1
- The trained model used by the explainer should minimise the degradation of classification performance.
- Req №2
- The explainer should provide an intuitive explanation of why the text correlates with the classification outcome.
- Req №3
- The explainer should derive connections between semantically entailing words that are relevant to the classification task.
- (a)
- The presence of one single feature should not be sufficient to derive the classification; when this occurs, the model overfits a specific dataset rather than learning to understand the general context of the passage.
5.3.3. Design and Develop
- TF-IDFVec+DT: TF-IDF Vectorisation [65] is a straightforward approach for representing each document within a corpus as a vector, where each dimension holds the TF-IDF value [39] of a word occurring in the document. After vectorising the corpus, we fit a Decision Tree (DT) to learn the correlation between word frequency and classification outcome. Stopwords such as “the” occur frequently within the text, yielding high term frequencies but low IDF scores; we nevertheless retain all occurring words to minimise our bias when training the classifier. As this violates Req №3(a), we pair this mechanism with the following, attention-based one (minimal sketches of both follow this list).
- DistilBERT+Train: DistilBERT [66] is a transformer model designed to be fine-tuned on tasks that use entire sentences (potentially masked) to make decisions [67]. It uses WordPiece subword segmentation to extract features from the full text, so this approach goes beyond the straightforward word tokenisation of the former. It therefore does not violate Req №3(a) as long as the attention mechanism does not focus on one single word to draw its conclusions; if it does, this reveals the model’s inability to draw correlations across the two sentences.
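The following minimal sketches show how the two baselines might be instantiated, assuming scikit-learn and the Hugging Face transformers/datasets libraries; the toy corpus, hyper-parameters, and checkpoints are placeholders rather than the configuration evaluated in Section 5.3.4.

```python
# Baseline 1 sketch: TF-IDF vectorisation + Decision Tree with scikit-learn.
# `texts` and `labels` are placeholders for the corpus used in Section 5.3;
# no stopword list is passed, so every occurring word is retained.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

texts = ["The traffic flows in Newcastle on Saturdays",
         "The city centre does not have traffic on Saturdays"]
labels = [0, 1]

tfidf_dt = make_pipeline(TfidfVectorizer(), DecisionTreeClassifier(random_state=0))
tfidf_dt.fit(texts, labels)
print(tfidf_dt.predict(["Traffic in the Newcastle city centre"]))
```

```python
# Baseline 2 sketch: fine-tuning DistilBERT for sequence classification with the
# Hugging Face Trainer; the toy corpus, epochs, and batch size are illustrative
# assumptions only.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

texts = ["The traffic flows in Newcastle on Saturdays",
         "The city centre does not have traffic on Saturdays"]
labels = [0, 1]

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

ds = Dataset.from_dict({"text": texts, "label": labels})
ds = ds.map(lambda batch: tok(batch["text"], truncation=True,
                              padding="max_length", max_length=64), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilbert-out", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to="none"),
    train_dataset=ds,
)
trainer.train()   # attention weights can then be inspected with respect to Req №3(a)
```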
5.3.4. Artifact Evaluation
Performance Degradation
Intuitiveness
Explanation Through Word Correlation
5.3.5. Final Considerations
6. Conclusions and Future Works
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhang, T.; Subburathinam, A.; Shi, G.; Huang, L.; Lu, D.; Pan, X.; Li, M.; Zhang, B.; Wang, Q.; Whitehead, S.; et al. GAIA-A Multi-Media Multi-Lingual Knowledge Extraction and Hypothesis Generation System. In Proceedings of the 2018 Text Analysis Conference, TAC 2018, Gaithersburg, MD, USA, 13–14 November 2018; Available online: http://www.kianasun.com/publication/tac2018/tac2018.pdf (accessed on 21 May 2025).
- Yoshida, S.; Tsurumaru, H.; Hitaka, T. Man-Assisted Machine Construction of a Semantic Dictionary for Natural Language Processing. In Proceedings of the 9th International Conference on Computational Linguistics, COLING’82, Prague, Czech Republic, 5–10 July 1982; ACADEMIA, Publishing House of the Czechoslovak Academy of Sciences: Prague, Czech Republic, 1982; pp. 419–424. [Google Scholar]
- Ayşe, Ş.; Zeynep, O.; İlknur, P. Extraction of Semantic Word Relations in Turkish from Dictionary Definitions. In Proceedings of the ACL 2011 Workshop on Relational Models of Semantics, Portland, OR, USA, 23 June 2011; Kim, S.N., Kozareva, Z., Nakov, P., Ó Séaghdha, D., Padó, S., Szpakowicz, S., Eds.; Omnipress, Inc.: Madison, WI, USA, 2011; pp. 11–18. [Google Scholar]
- Speer, R.; Chin, J.; Havasi, C. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. Assoc. Adv. Artif. Intell. (AAAI) 2017, 4444–4451. [Google Scholar] [CrossRef]
- Auer, S.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z. DBpedia: A nucleus for a web of open data. In The Semantic Web, Proceedings of the 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Republic of Korea, 11–15 November 2007; Proceedings; Springer: Berlin/Heidelberg, Germany, 2007; pp. 722–735. [Google Scholar] [CrossRef]
- Mendes, P.; Jakob, M.; Bizer, C. DBpedia: A Multilingual Cross-Domain Knowledge Base. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, 21–27 May 2012; Calzolari, N., Choukri, K., Declerck, T., Doğan, M.U., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S., Eds.; European Language Resources Association (ELRA): Istanbul, Turkey, 2012; pp. 1813–1817. Available online: https://aclanthology.org/L12-1323/ (accessed on 21 May 2025).
- Bergami, G. A framework supporting imprecise queries and data. arXiv 2019, arXiv:1912.12531. [Google Scholar] [CrossRef]
- Talmor, A.; Herzig, J.; Lourie, N.; Berant, J. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; Burstein, J., Doran, C., Solorio, T., Eds.; Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 4149–4158. [Google Scholar] [CrossRef]
- Kreutz, C.K.; Wolz, M.; Knack, J.; Weyers, B.; Schenkel, R. SchenQL: In-depth analysis of a query language for bibliographic metadata. Int. J. Digit. Libr. 2022, 23, 113–132. [Google Scholar] [CrossRef]
- Li, F.; Jagadish, H.V. NaLIR: An interactive natural language interface for querying relational databases. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 22–27 June 2014; Association for Computing Machinery: New York, NY, USA, 2014; pp. 709–712. [Google Scholar] [CrossRef]
- Tammet, T.; Järv, P.; Verrev, M.; Draheim, D. An Experimental Pipeline for Automated Reasoning in Natural Language (Short Paper). In Automated Deduction—CADE 29, Proceedings of the 29th International Conference on Automated Deduction, Rome, Italy, 1–4 July 2023; Proceedings; Springer Nature: Cham, Switzerland, 2023; pp. 509–521. [Google Scholar] [CrossRef]
- Bao, Q.; Peng, A.Y.; Deng, Z.; Zhong, W.; Gendron, G.; Pistotti, T.; Tan, N.; Young, N.; Chen, Y.; Zhu, Y.; et al. Abstract Meaning Representation-Based Logic-Driven Data Augmentation for Logical Reasoning. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, 11–16 August 2024; Ku, L.W., Martins, A., Srikumar, V., Eds.; Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 5914–5934. [Google Scholar] [CrossRef]
- Bender, E.M.; Gebru, T.; McMillan-Major, A.; Shmitchell, S. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the FAccT ’21: 2021 ACM Conference on Fairness, Accountability, and Transparency, Virtual Event, 3–10 March 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 610–623. [Google Scholar] [CrossRef]
- Mirzadeh, I.; Alizadeh, K.; Shahrokhi, H.; Tuzel, O.; Bengio, S.; Farajtabar, M. GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models. arXiv 2024, arXiv:2410.05229. [Google Scholar] [CrossRef]
- Badyal, N.; Jacoby, D.; Coady, Y. Intentional Biases in LLM Responses. arXiv 2023, arXiv:2311.07611. [Google Scholar] [CrossRef]
- Harrison, J. Handbook of Practical Logic and Automated Reasoning; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar] [CrossRef]
- Montague, R. English as a Formal Language. In Logic and Philosophy for Linguists; De Gruyter Mouton: Berlin, Germany; Boston, MA, USA, 1975; pp. 94–121. [Google Scholar] [CrossRef]
- Brown, B. Inconsistency measures and paraconsistent consequence. In Measuring Inconsistency in Information; Grant, J., Martinez, M.V., Eds.; College Press: Joplin, MO, USA, 2018; Chapter 8; pp. 219–234. [Google Scholar]
- Carnielli, W.; Esteban Coniglio, M. Paraconsistent Logic: Consistency, Contradiction and Negation; Springer: Cham, Switzerland, 2016. [Google Scholar] [CrossRef]
- Fox, O.R.; Bergami, G.; Morgan, G. LaSSI: Logical, Structural, and Semantic text Interpretation. In Database Engineered Applications, Proceedings of the 28th International Symposium, IDEAS 2024, Bayonne, France, 26–29 August 2024; Proceedings; Springer: Berlin/Heidelberg, Germany, 2025; pp. 106–121. [Google Scholar] [CrossRef]
- Fox, O.R.; Bergami, G.; Morgan, G. Verified Language Processing with Hybrid Explainability: A Technical Report. arXiv 2025, arXiv:2507.05017. [Google Scholar] [CrossRef]
- Seshia, S.A.; Sadigh, D.; Sastry, S.S. Toward verified artificial intelligence. Commun. ACM 2022, 65, 46–55. [Google Scholar] [CrossRef]
- Bergami, G.; Fox, O.R.; Morgan, G. Extracting Specifications through Verified and Explainable AI: Interpretability, Interoperability, and Trade-offs (In Press). In Explainable Artificial Intelligence for Trustworthy Decisions in Smart Applications; Springer: Berlin/Heidelberg, Germany, 2025; Chapter 2.
- Ma, L.; Kang, H.; Yu, G.; Li, Q.; He, Q. Single-Domain Generalized Predictor for Neural Architecture Search System. IEEE Trans. Comput. 2024, 73, 1400–1413. [Google Scholar] [CrossRef]
- Zini, J.E.; Awad, M. On the Explainability of Natural Language Processing Deep Models. ACM Comput. Surv. 2023, 55, 103:1–103:31. [Google Scholar] [CrossRef]
- Ayoub, J.; Yang, X.J.; Zhou, F. Combat COVID-19 infodemic using explainable natural language processing models. Inf. Process. Manag. 2021, 58, 102569. [Google Scholar] [CrossRef]
- Duan, H.; Dziedzic, A.; Papernot, N.; Boenisch, F. Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models. arXiv 2023, arXiv:2305.15594. [Google Scholar] [CrossRef]
- Gonen, H.; Blevins, T.; Liu, A.; Zettlemoyer, L.; Smith, N.A. Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Albuquerque, NM, USA, 29 April–4 May 2025; Chiruzzo, L., Ritter, A., Wang, L., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 785–798. [Google Scholar]
- Balloccu, S.; Reiter, E.; Li, K.J.H.; Sargsyan, R.; Kumar, V.; Reforgiato, D.; Riboni, D.; Dusek, O. Ask the experts: Sourcing a high-quality nutrition counseling dataset through Human-AI collaboration. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, FL, USA, 12–16 November 2024; Al-Onaizan, Y., Bansal, M., Chen, Y.N., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 11519–11545. [Google Scholar] [CrossRef]
- Morales, S.; Clarisó, R.; Cabot, J. Automating Bias Testing of LLMs. In Proceedings of the 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), Luxembourg, 11–15 September 2023; pp. 1705–1707. [Google Scholar] [CrossRef]
- Tsamoura, E.; Hospedales, T.; Michael, L. Neural-Symbolic Integration: A Compositional Perspective. AAAI Conf. Artif. Intell. 2021, 35, 5051–5060. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.u.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. Available online: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf (accessed on 21 May 2025).
- Song, K.; Tan, X.; Qin, T.; Lu, J.; Liu, T. MPNet: Masked and Permuted Pre-training for Language Understanding. arXiv 2020. [Google Scholar] [CrossRef]
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019. [Google Scholar] [CrossRef]
- Wang, W.; Bao, H.; Huang, S.; Dong, L.; Wei, F. MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online Event, 1–6 August 2021; Zong, C., Xia, F., Li, W., Navigli, R., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 2140–2151. [Google Scholar] [CrossRef]
- Santhanam, K.; Khattab, O.; Saad-Falcon, J.; Potts, C.; Zaharia, M. ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; Carpuat, M., de Marneffe, M.C., Meza Ruiz, I.V., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 3715–3734. [Google Scholar] [CrossRef]
- Strobl, L.; Merrill, W.; Weiss, G.; Chiang, D.; Angluin, D. What Formal Languages Can Transformers Express? A Survey. Trans. Assoc. Comput. Linguist. 2024, 12, 543–561. [Google Scholar] [CrossRef]
- Chen, D.; Manning, C.D. A Fast and Accurate Dependency Parser using Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, Doha, Qatar, 25–29 October 2014; Moschitti, A., Pang, B., Daelemans, W., Eds.; ACL: Cambridge, MA, USA, 2014; pp. 740–750. [Google Scholar] [CrossRef]
- Manning, C.D.; Raghavan, P.; Schütze, H. Introduction to Information Retrieval; Cambridge University Press: Cambridge, UK, 2008; Available online: https://nlp.stanford.edu/IR-book/information-retrieval-book.html (accessed on 21 May 2025).
- Hicks, M.T.; Humphries, J.; Slater, J. ChatGPT is bullshit. Ethics Inf. Technol. 2024, 26, 38. [Google Scholar] [CrossRef]
- Chen, Y.; Wang, D.Z. Knowledge expansion over probabilistic knowledge bases. In Proceedings of the International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, 22–27 June 2014; Dyreson, C.E., Li, F., Özsu, M.T., Eds.; Association for Computing Machinery: New York, NY, USA, 2014; pp. 649–660. [Google Scholar] [CrossRef]
- Kyburg, H.E. Probability and the Logic of Rational Belief; Wesleyan University Press: Middletown, CT, USA, 1961; Available online: https://www.jstor.org/stable/186504 (accessed on 21 May 2025).
- Graydon, M.S.; Lehman, S.M. Examining Proposed Uses of LLMs to Produce or Assess Assurance Arguments; NTRS-NASA Technical Reports Server: Hampton, VA, USA, 2025. Available online: https://ntrs.nasa.gov/citations/20250001849 (accessed on 21 May 2025).
- Ahlers, D. Assessment of the accuracy of GeoNames gazetteer data. In GIR ’13, Proceedings of the 7th Workshop on Geographic Information Retrieval Orlando, FL, USA, 5 November 2013; Association for Computing Machinery: New York, NY, USA, 2013; pp. 74–81. [Google Scholar] [CrossRef]
- Chang, A.X.; Manning, C. SUTime: A library for recognizing and normalizing time expressions. In LREC’12, Proceedings of the Eighth International Conference on Language Resources and Evaluation, Istanbul, Turkey, 23–25 May 2012; Calzolari, N., Choukri, K., Declerck, T., Doğan, M.U., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S., Eds.; European Language Resources Association (ELRA): Paris, France, 2012; pp. 3735–3740. Available online: https://aclanthology.org/L12-1122/ (accessed on 21 May 2025).
- Qi, P.; Zhang, Y.; Zhang, Y.; Bolton, J.; Manning, C.D. Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Online, 5–10 July 2020. [Google Scholar]
- Bergami, G.; Fox, O.R.; Morgan, G. Matching and Rewriting Rules in Object-Oriented Databases. Mathematics 2024, 12, 2677. [Google Scholar] [CrossRef]
- Junghanns, M.; Petermann, A.; Rahm, E. Distributed Grouping of Property Graphs with Gradoop. In Proceedings of the Datenbanksysteme für Business, Technologie und Web (BTW 2017), Stuttgart, Germany, 6–10 March 2017; Mitschang, B., Nicklas, D., Leymann, F., Schöning, H., Herschel, M., Teubner, J., Härder, T., Kopp, O., Wieland, M., Eds.; Gesellschaft für Informatik: Bonn, Germany, 2017; Volume P-265, pp. 103–122. Available online: https://dl.gi.de/handle/20.500.12116/678 (accessed on 21 May 2025).
- Bergami, G.; Zegadło, W. Towards a Generalised Semistructured Data Model and Query Language. SIGWEB Newsl. 2023, 2023, 4. [Google Scholar] [CrossRef]
- Niles, I.; Pease, A. Towards a standard upper ontology. In FOIS’01, Proceedings of the 2nd International Conference on Formal Ontology in Information Systems, Ogunquit, ME, USA, 17–19 October 2001; Association for Computing Machinery: New York, NY, USA, 2001; pp. 2–9. [Google Scholar] [CrossRef]
- Winstanley, P. Is An Upper Ontology Useful? In Proceedings of the 7th ISKO UK Biennial Conference 2023, Glasgow, UK, 24–25 July 2023; Svarre, T., Davies, S., Slavic, A., Vernau, J., Brown, N., Eds.; CEUR Workshop Proceedings. CEUR-WS.org: Aachen, Germany, 2023; Volume 3661. [Google Scholar]
- Wong, P.C.; Whitney, P.; Thomas, J. Visualizing Association Rules for Text Mining. In Proceedings of the 1999 IEEE Symposium on Information Visualization (InfoVIS ’99), San Francisco, CA, USA, 24–29 October 1999; p. 120. [Google Scholar] [CrossRef]
- Hinman, P.G. Fundamentals of Mathematical Logic; A K Peters: Wellesley, MA, USA; CRC Press: Boca Raton, FL, USA, 2005. [Google Scholar]
- Hugging Face. Sentence Transformers. Available online: https://huggingface.co/sentence-transformers (accessed on 24 February 2025).
- Simon, H.A. The Science of Design: Creating the Artificial. Design Issues 1988, 4, 67–82. Available online: http://www.jstor.org/stable/1511391 (accessed on 10 May 2025). [CrossRef]
- Johannesson, P.; Perjons, E. An Introduction to Design Science; Springer: Berlin/Heidelberg, Germany, 2021. [Google Scholar] [CrossRef]
- Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67. [Google Scholar] [CrossRef]
- Dewi, C.; Tsai, B.J.; Chen, R.C. Shapley Additive Explanations for Text Classification and Sentiment Analysis of Internet Movie Database. In Recent Challenges in Intelligent Information and Database Systems; Szczerbicki, E., Wojtkiewicz, K., Nguyen, S.V., Pietranik, M., Krótkiewicz, M., Eds.; Springer: Singapore, 2022; pp. 69–80. [Google Scholar] [CrossRef]
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Ichter, B.; Xia, F.; Chi, E.H.; Le, Q.V.; Zhou, D. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems 35, Proceedings of the Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, 28 November–9 December 2022; Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2022. [Google Scholar]
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Krishnapuram, B., Shah, M., Smola, A.J., Aggarwal, C.C., Shen, D., Rastogi, R., Eds.; Association for Computing Machinery: New York, NY, USA, 2016; pp. 1135–1144. [Google Scholar]
- Ribeiro, M.T.; Singh, S.; Guestrin, C. Anchors: High-Precision Model-Agnostic Explanations. In Proceedings of the AAAI’18: AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 1527–1535. [Google Scholar] [CrossRef]
- Visani, G.; Bagli, E.; Chesani, F. OptiLIME: Optimized LIME Explanations for Diagnostic Computer Algorithms. In Proceedings of the CIKM 2020 Workshops Co-Located with 29th ACM International Conference on Information and Knowledge Management (CIKM 2020), Galway, Ireland, 19–23 October 2023; Conrad, S., Tiddi, I., Eds.; CEUR Workshop Proceedings. CEUR-WS.org: Aachen, Germany, 2020; Volume 2699. [Google Scholar]
- Watson, D.S.; O’Hara, J.; Tax, N.; Mudd, R.; Guy, I. Explaining Predictive Uncertainty with Information Theoretic Shapley Values. In Advances in Neural Information Processing Systems 36, Proceedings of the Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, 10–16 December 2023; Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2023. [Google Scholar]
- Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Long Beach, CA, USA, 4–9 December 2017; pp. 4768–4777. [Google Scholar]
- Bengfort, B.; Bilbro, R.; Ojeda, T. Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning, 1st ed.; O’Reilly Media, Inc.: Santa Rosa, CA, USA, 2018. [Google Scholar]
- Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108. [Google Scholar]
- Bai, J.; Cao, R.; Ma, W.; Shinnou, H. Construction of Domain-Specific DistilBERT Model by Using Fine-Tuning. In Proceedings of the International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2020, Taipei, Taiwan, 3–5 December 2020; pp. 237–241. [Google Scholar] [CrossRef]
- Crabbé, J.; van der Schaar, M. Evaluating the robustness of interpretability methods through explanation invariance and equivariance. In NIPS ’23, Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Curran Associates Inc.: Red Hook, NY, USA, 2023. [Google Scholar] [CrossRef]
- Slack, D.; Hilgard, S.; Jia, E.; Singh, S.; Lakkaraju, H. Fooling LIME and SHAP: Adversarial Attacks on Post Hoc Explanation Methods. In Proceedings of the AIES ’20: AAAI/ACM Conference on AI, Ethics, and Society, New York, NY, USA, 7–9 February 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 180–186. [Google Scholar] [CrossRef]
- Amini, R.; Norouzi, S.S.; Hitzler, P.; Amini, R. Towards Complex Ontology Alignment Using Large Language Models. In Knowledge Graphs and Semantic Web; Tiwari, S., Villazón-Terrazas, B., Ortiz-Rodríguez, F., Sahri, S., Eds.; Springer: Cham, Switzerland, 2025; pp. 17–31. [Google Scholar]
- Kruskal, J.B.; Wish, M. Multidimensional Scaling; Quantitative Applications in the Social Sciences; SAGE Publications, Inc.: Thousand Oaks, CA, USA, 1978. [Google Scholar] [CrossRef]
- Mead, A. Review of the Development of Multidimensional Scaling Methods. J. R. Stat. Soc. Ser. (Stat.) 1992, 41, 27–39. [Google Scholar] [CrossRef]
- Agarwal, S.; Wills, J.; Cayton, L.; Lanckriet, G.R.G.; Kriegman, D.J.; Belongie, S.J. Generalized Non-Metric Multidimensional Scaling. In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, AISTATS 2007, San Juan, Puerto Rico, 21–24 March 2007; Meila, M., Shen, X., Eds.; JMLR Proceedings. PMLR: New York, NY, USA, 2007; Volume 2, pp. 11–18. Available online: http://proceedings.mlr.press/v2/agarwal07a.html (accessed on 21 May 2025).
- Quist, M.; Yona, G. Distributional Scaling: An Algorithm for Structure-Preserving Embedding of Metric and Nonmetric Spaces. J. Mach. Learn. Res. 2004, 5, 399–420. [Google Scholar]
- Costa, C.F.; Nascimento, M.A.; Schubert, M. Diverse nearest neighbors queries using linear skylines. GeoInformatica 2018, 22, 815–844. [Google Scholar] [CrossRef]
- Botea, V.; Mallett, D.; Nascimento, M.A.; Sander, J. PIST: An Efficient and Practical Indexing Technique for Historical Spatio-Temporal Point Data. GeoInformatica 2008, 12, 143–168. [Google Scholar] [CrossRef]
| Metric | Average | Clustering | SGs | LGs | Logical | T1 | T2 | T3 | T4 | T5 | T6 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | – | HAC | 0.28 | 0.38 | 1.00 | 0.34 | 0.34 | 0.36 | 0.36 | 0.36 | 0.39 |
| | | k-Medoids | 0.28 | 0.38 | 1.00 | 0.34 | 0.34 | 0.36 | 0.36 | 0.36 | 0.39 |
| F1 | Macro | HAC | 0.25 | 0.41 | 1.00 | 0.37 | 0.37 | 0.39 | 0.39 | 0.34 | 0.42 |
| | | k-Medoids | 0.25 | 0.41 | 1.00 | 0.37 | 0.37 | 0.39 | 0.39 | 0.34 | 0.42 |
| | Weighted | HAC | 0.18 | 0.31 | 1.00 | 0.26 | 0.26 | 0.29 | 0.29 | 0.24 | 0.33 |
| | | k-Medoids | 0.18 | 0.31 | 1.00 | 0.26 | 0.26 | 0.29 | 0.29 | 0.24 | 0.33 |
| Precision | Macro | HAC | 0.27 | 0.48 | 1.00 | 0.42 | 0.42 | 0.51 | 0.51 | 0.28 | 0.56 |
| | | k-Medoids | 0.27 | 0.48 | 1.00 | 0.42 | 0.42 | 0.51 | 0.51 | 0.28 | 0.56 |
| | Weighted | HAC | 0.20 | 0.38 | 1.00 | 0.30 | 0.30 | 0.43 | 0.43 | 0.20 | 0.51 |
| | | k-Medoids | 0.20 | 0.38 | 1.00 | 0.30 | 0.30 | 0.43 | 0.43 | 0.20 | 0.51 |
| Recall | Macro | HAC | 0.38 | 0.50 | 1.00 | 0.47 | 0.47 | 0.48 | 0.48 | 0.49 | 0.51 |
| | | k-Medoids | 0.38 | 0.50 | 1.00 | 0.47 | 0.47 | 0.48 | 0.48 | 0.49 | 0.51 |
| | Weighted | HAC | 0.28 | 0.38 | 1.00 | 0.34 | 0.34 | 0.36 | 0.36 | 0.36 | 0.39 |
| | | k-Medoids | 0.28 | 0.38 | 1.00 | 0.34 | 0.34 | 0.36 | 0.36 | 0.36 | 0.39 |
| Metric | Average | Clustering | SGs | LGs | Logical | T1 | T2 | T3 | T4 | T5 | T6 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | – | HAC | 0.44 | 0.39 | 1.00 | 0.50 | 0.50 | 0.44 | 0.50 | 0.19 | 0.39 |
| | | k-Medoids | 0.44 | 0.39 | 1.00 | 0.50 | 0.50 | 0.44 | 0.50 | 0.19 | 0.39 |
| F1 | Macro | HAC | 0.46 | 0.38 | 1.00 | 0.52 | 0.52 | 0.45 | 0.52 | 0.13 | 0.36 |
| | | k-Medoids | 0.46 | 0.38 | 1.00 | 0.52 | 0.52 | 0.45 | 0.52 | 0.13 | 0.36 |
| | Weighted | HAC | 0.36 | 0.30 | 1.00 | 0.49 | 0.49 | 0.43 | 0.49 | 0.09 | 0.27 |
| | | k-Medoids | 0.36 | 0.30 | 1.00 | 0.49 | 0.49 | 0.43 | 0.49 | 0.09 | 0.27 |
| Precision | Macro | HAC | 0.42 | 0.34 | 1.00 | 0.51 | 0.51 | 0.52 | 0.51 | 0.09 | 0.29 |
| | | k-Medoids | 0.42 | 0.34 | 1.00 | 0.51 | 0.51 | 0.52 | 0.51 | 0.09 | 0.29 |
| | Weighted | HAC | 0.33 | 0.27 | 1.00 | 0.51 | 0.51 | 0.57 | 0.51 | 0.06 | 0.23 |
| | | k-Medoids | 0.33 | 0.27 | 1.00 | 0.51 | 0.51 | 0.57 | 0.51 | 0.06 | 0.23 |
| Recall | Macro | HAC | 0.58 | 0.52 | 1.00 | 0.56 | 0.56 | 0.52 | 0.56 | 0.29 | 0.53 |
| | | k-Medoids | 0.58 | 0.52 | 1.00 | 0.56 | 0.56 | 0.52 | 0.56 | 0.29 | 0.53 |
| | Weighted | HAC | 0.44 | 0.39 | 1.00 | 0.50 | 0.50 | 0.44 | 0.50 | 0.19 | 0.39 |
| | | k-Medoids | 0.44 | 0.39 | 1.00 | 0.50 | 0.50 | 0.44 | 0.50 | 0.19 | 0.39 |
| Metric | Average | Clustering | SGs | LGs | Logical | T1 | T2 | T3 | T4 | T5 | T6 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | – | HAC | 0.21 | 0.23 | 1.00 | 0.28 | 0.29 | 0.27 | 0.29 | 0.21 | 0.29 |
| | | k-Medoids | 0.21 | 0.23 | 1.00 | 0.28 | 0.29 | 0.27 | 0.29 | 0.21 | 0.29 |
| F1 | Macro | HAC | 0.24 | 0.28 | 1.00 | 0.37 | 0.38 | 0.34 | 0.38 | 0.21 | 0.37 |
| | | k-Medoids | 0.24 | 0.28 | 1.00 | 0.37 | 0.37 | 0.34 | 0.38 | 0.21 | 0.37 |
| | Weighted | HAC | 0.13 | 0.15 | 1.00 | 0.21 | 0.22 | 0.18 | 0.22 | 0.11 | 0.20 |
| | | k-Medoids | 0.13 | 0.15 | 1.00 | 0.21 | 0.20 | 0.18 | 0.22 | 0.11 | 0.20 |
| Precision | Macro | HAC | 0.35 | 0.37 | 1.00 | 0.46 | 0.49 | 0.35 | 0.54 | 0.20 | 0.37 |
| | | k-Medoids | 0.35 | 0.37 | 1.00 | 0.46 | 0.36 | 0.35 | 0.54 | 0.20 | 0.37 |
| | Weighted | HAC | 0.20 | 0.20 | 1.00 | 0.37 | 0.43 | 0.19 | 0.53 | 0.11 | 0.20 |
| | | k-Medoids | 0.20 | 0.20 | 1.00 | 0.37 | 0.20 | 0.19 | 0.53 | 0.11 | 0.20 |
| Recall | Macro | HAC | 0.41 | 0.46 | 1.00 | 0.54 | 0.54 | 0.52 | 0.54 | 0.41 | 0.56 |
| | | k-Medoids | 0.41 | 0.46 | 1.00 | 0.54 | 0.56 | 0.52 | 0.54 | 0.41 | 0.56 |
| | Weighted | HAC | 0.21 | 0.23 | 1.00 | 0.28 | 0.29 | 0.27 | 0.29 | 0.21 | 0.29 |
| | | k-Medoids | 0.21 | 0.23 | 1.00 | 0.28 | 0.29 | 0.27 | 0.29 | 0.21 | 0.29 |
| Metric | Average | Clustering | SGs (Ad Hoc: Y) | SGs (Ad Hoc: N) | LGs (Ad Hoc: Y) | LGs (Ad Hoc: N) | Logical (Ad Hoc: Y) | Logical (Ad Hoc: N) |
|---|---|---|---|---|---|---|---|---|
| Accuracy | – | HAC | 0.21 | 0.21 | 0.23 | 0.24 | 1.00 | 0.33 |
| | | k-Medoids | 0.21 | 0.21 | 0.23 | 0.24 | 1.00 | 0.33 |
| F1 | Macro | HAC | 0.24 | 0.24 | 0.28 | 0.30 | 1.00 | 0.42 |
| | | k-Medoids | 0.24 | 0.24 | 0.28 | 0.30 | 1.00 | 0.42 |
| | Weighted | HAC | 0.13 | 0.13 | 0.15 | 0.16 | 1.00 | 0.23 |
| | | k-Medoids | 0.13 | 0.13 | 0.15 | 0.16 | 1.00 | 0.23 |
| Precision | Macro | HAC | 0.35 | 0.35 | 0.37 | 0.37 | 1.00 | 0.36 |
| | | k-Medoids | 0.35 | 0.35 | 0.37 | 0.37 | 1.00 | 0.36 |
| | Weighted | HAC | 0.20 | 0.20 | 0.20 | 0.20 | 1.00 | 0.20 |
| | | k-Medoids | 0.20 | 0.20 | 0.20 | 0.20 | 1.00 | 0.20 |
| Recall | Macro | HAC | 0.41 | 0.41 | 0.46 | 0.48 | 1.00 | 0.63 |
| | | k-Medoids | 0.41 | 0.41 | 0.46 | 0.48 | 1.00 | 0.63 |
| | Weighted | HAC | 0.21 | 0.21 | 0.23 | 0.24 | 1.00 | 0.33 |
| | | k-Medoids | 0.21 | 0.21 | 0.23 | 0.24 | 1.00 | 0.33 |
| Model | Accuracy | F1 (Macro) | F1 (Weighted) | Precision (Macro) | Precision (Weighted) | Recall (Macro) | Recall (Weighted) |
|---|---|---|---|---|---|---|---|
| TF-IDFVec+DT | 0.95 | 0.93 | 0.94 | 0.95 | 0.94 | 0.92 | 0.94 |
| DistilBERT+Train | 0.76 | 0.51 | 0.69 | 0.45 | 0.64 | 0.61 | 0.76 |
| Explainer | Model | Req №1 | Req №2 | Req №3 | Req №3(a) |
|---|---|---|---|---|---|
| SHAP | TF-IDFVec+DT | ◐ | ⬤ | ◐ | 〇 |
| | DistilBERT+Train | 〇 | ⬤ | ◐ | 〇 |
| LIME | TF-IDFVec+DT | ◐ | ⬤ | ◐ | 〇 |
| | DistilBERT+Train | 〇 | ⬤ | ◐ | 〇 |
| LaSSI | – | ⬤ | 〇 | ⬤ | ⬤ |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).