Informing Design and Research Concerning Conversationally Explainable AI Systems by Collecting and Distilling Human Explanatory Dialogues
Abstract
1. Introduction
- A methodology is proposed for empirically grounding CXAI in human explanatory communication using dialogue distillation of experimentally collected human dialogues. This methodology extends previous human-centred approaches to CXAI by enabling identification of explanatory dialogue capabilities which might be important for CXAI systems to possess.
- A dataset of 35 collected non-expert dialogues concerning AI-based personality estimation, encompassing a total of 779 utterances, is publicly released.
- A distillation of the collected dialogues identifies 11 different dialogue capabilities used by the participants in the study. Most interestingly in relation to previous work, the study reveals that interlocutors frequently explain predictions with reference to warrants (general rules or patterns [33]), either explicitly or implicitly, a behaviour which is difficult to emulate when targeting opaque models such as deep neural nets and random forests. The study also reveals situations where utterances by explainees presuppose false information, and where explainers signal these presupposition violations. While identification of presuppositions in user utterances has received limited attention in prior work on CXAI, the study provides empirical evidence of its function in analogous human–human interactions.
2. Materials and Methods
2.1. Experimental Setup
2.2. Recruitment of Participants
2.3. Collected Data
2.4. Dialogue Distillation
2.4.1. Technical Assumptions
2.4.2. Normative Assumptions
2.4.3. Example
- (1)
- R: I really want to know what these results are based on…why am I so low on openness? kind of disagree with that
  O: “openness to experience”: the score was calculated on the basis that you don’t prefer high-energy, loud music… (D 19)
- U: I really want to know what the result is based on…why am I so low on openness? kind of disagree with that
  S: You seem to prefer low-energy music.
- U: I really want to know what the result is based on…why am I so low on openness? kind of disagree with that
  S: You seem to prefer low-energy, silent music.
- U: I really want to know what the result is based on…why am I so low on openness? kind of disagree with that
  S: You seem to prefer high-energy, loud music.
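Each distilled system turn above characterises a feature value relative to a reference value. The following is a minimal sketch of how such an utterance could be produced, not the implementation used in the study. The names `FeatureScale` and `characterise`, the reference value of 0, the wording of the neutral reply, and the orientation of the scale (negative for high-energy, positive for low-energy, as one operator describes it in the corpus) are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureScale:
    """A bipolar preference scale (hypothetical structure, for illustration only)."""
    low_label: str           # label used when the value is below the reference
    high_label: str          # label used when the value is above the reference
    reference: float = 0.0   # assumed reference value (midpoint of the scale)


def characterise(scale: FeatureScale, value: float) -> str:
    """Render a feature value as a distilled system utterance."""
    if value > scale.reference:
        label = scale.high_label
    elif value < scale.reference:
        label = scale.low_label
    else:
        # Assumed wording for the neutral case; not taken from the corpus.
        return "You don't seem to have a clear preference here."
    return f"You seem to prefer {label} music."


# Assumed orientation: -5 = high-energy, 5 = low-energy.
energy = FeatureScale(low_label="high-energy", high_label="low-energy")
print(characterise(energy, 3.8))
```

Which of the candidate characterisations is appropriate depends on the feature values actually available to the system, so the sketch takes the value as input rather than copying the wording of the human operator.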
3. Results
3.1. Types of Explananda
- (2)
- O: You scored 5 on Openness. You scored 0.5 on Conscientiousness. ……
  R: Why is it a 5 on Openness… (D 2)
- (3)
- O: You have been rated the highest in openness
  R: Oh wow, why? :) (D 15)
- (4)
- R: What kind of information does the test give you?…
  O: apparently, you are very open
  O: almost 5 (out of −5 to 5 where 0 is the median)
  R: It’s interesting, I wonder what song would give this trait (D 13)
- (5)
- O: on extroversion, you scored pretty close to the median
  R: Do you know the link with music features? (D 13)
- (6)
- O: You have high scores for openess…
  O: Extraversion is just below the median value
  O: Agreeablenes is even more below
  R: I think about the word agreeablenes, don’t know what to think about that :) (D 26)
3.2. Explanation Triggers and Query Types
- (7)
- R: I wonder if music influences the personality or if it’s only the other way
  O: yeah I was thinking that too (D 13)
- (8)
- O: the AI calculates the results based on a statistical model for each personality trait…
  R: hm ok, so it takes into account many other people’s statistics then
  O: a 1000 users apparently (D 19)
- (9)
- O: You scored 5 on Openness. You scored 0.5 on Conscientiousness. You scored −2.3 on Extraversion. You scored −0.7 on Agreeableness. You scored −0.5 on Neuroticism.
  O: Openness to experience describes a dimension of cognitive style that distinguishes imaginative, creative people from down-to-earth, conventional people. Conscientiousness concerns the way in which we control, regulate, and direct our impulses.… (D 2)
3.3. Types of Explanantia
- (10)
- R: what do you base this conclusion on
  O: This conclusion is based on the score from your ratings of the music you listened to (D 2)
- (11)
- O: You scored −2.3 on Extraversion, which tells me that you are likely more introverted.
  O: The Extraversion score was based on Dancable/Non-Dancable music, Happy/Sad music, Intrumental/Non-instrumental music and music with/without spoken words. (D 2)
- (12)
- R: Why is it a 5 on Openness, based on what variables …
  O: The 5 on Openness comes from the total score of four Preferences. Loud music is −5 and silent music is 5—you scored 4.5. Other preferences were acoustic music at −5 and non-acoustic on 5, where you scored −1. The high-energy music preference was −5 and low-energy music was 5—you scored 3.8. High-tempo preference was −5 and low-tempo music was 5—you scored 0,8. The total of these 4 preferences were a 5 on Openness (D 2)
- (13)
- R: Can I get my test results? What are the scores and their interpretation?…
  O: you have scored low on “openness” and “neuroticism”
  O: slightly higher on “extraversion”…
  R: I really want to know what these results are based on…why am I so low on openness? kind of disagree with that
  O: “openness to experience”: the score was calculated on the basis that you don’t prefer high-energy, loud music…
  O: “Conscientiousness” was based on the fact that you prefer non-live music and music with spoken words
  O: “Extraversion” is based on the fact that you prefer music with spoken words while you disprefer sad/depressed/angry music, non-instrument music, non-danceable music (D 19)
- (14)
- O: Your neuroticism was −1.7…
  O: Do you not enjoy death metal
  R: not particularly
  R: but slightly
  O: Fair enough
  O: Explains the sightly negative score (D 15)
- (15)
- O: You have high on openness, and very low on neutoticism. neutral on the rest
  R: oh, low on neuroticism, but I pressed so fast on the growling-dislike-button!…
  O: Maybe if you were neurotic you would not like the growling, or you could. I can see reasons in both directions (D 34)
- (16)
- O: on extroversion, you scored pretty close to the median
  R: Do you know the link with music features?
  O: in the explanations chart there is a strong relationship between extroversion and dancability and liveliness
  O: but if I’m reading this correctly, your preferences didn’t indicate strongly one way or another about those features (D 13)
- (17)
- O: you have scored low on “openness” and “neuroticism”
  O: slightly higher on “extraversion”…
  R: I really want to know what these results are based on ……
  O: “Extraversion” is based on the fact that you prefer music with spoken words while you disprefer sad/depressed/angry music, non-instrument music, non-danceable music (D 19)
- (18)
- O: apparently, you are very open
  O: almost 5 (out of −5 to 5 where 0 is the median)
  R: It’s interesting, I wonder what song would give this trait
  O: well I actually can tell you something about that I think
  O: not which song in particular, but how openness relates to features of the music
  R: Oh great I’m interested
  O: for example, openness apparently has a strong relationship with acousticness
  O: and I think this is saying you showed a preference for accoustic music
  R: Okay, yes I think I did
  R: What about extraversion?
  O: on extroversion, you scored pretty close to the median
  R: Do you know the link with music features?
  O: in the explanations chart there is a strong relationship between extroversion and dancability and liveliness
  O: but if I’m reading this correctly, your preferences didn’t indicate strongly one way or another about those features (D 13)
- (19)
- R: Why do you think I’m a very aggreeable person?
  R: * disagreeable
  O: I don’t know, but the results says that a person scores high in that category if they prefer non-live music, major mode, acoustic music, and non-instrumental music … (D 8)
- (20)
- O: You have been rated the highest in openness
  R: Oh wow, why? :)
  O: Openness describes a dimension of style that is imaginative
  O: creative people
  O: down to earth
  O: and conventional
  R: are these three different grades?
  O: Agreeableness is rated 0.5
  O: The reasons that I gave was all for openness (D 15)
- (21)
- R: … Is a lower score on agreeableness a negative quality to have? …
  O: “Agreeableness reflects individual differences in concern with cooperation and social harmony. Agreeable individuals value getting along with others.”
  O: I guess it means that you are a bit less likely to like cooperation and socialising? (D 6)
3.4. Response Strategies
- (22)
- R: But the result could be because of a preference of music with lower tempo
  R: am I correct? // …
  O: Yes this is correct. You preferred music with lower tempo. (D 2)
3.5. Argumentative Structure
- (23)
- O: you have scored low on “openness” and “neuroticism”
  O: slightly higher on “extraversion”…
  R: I really want to know what these results are based on …
  O: “openness to experience”: the score was calculated on the basis that you don’t prefer high-energy, loud music
  R: well, I would not consider any of the music pieces I listened to as high-energy and loud music…but ok, I see…
  O: “Conscientiousness” was based on the fact that you prefer non-live music and music with spoken words
  R: mmm ic
  O: “Extraversion” is based on the fact that you prefer music with spoken words while you disprefer sad/depressed/angry music, non-instrument music, non-danceable music
  R: yes, this sounds about right
  O: “Agreeableness” is based on the fact that you prefer non-instrumental music, non-live music, and perhaps acoustic music…
  R: agreeableness’s explanation also sounds good to me (D 19)
- (13′)
- U: I really want to know what the result is based on… why am I so low on openness? kind of disagree with that
  S: You seem to prefer low-energy, loud music.
  …
- (18′)
- S: I think you rate high on openness.
  U: It’s interesting, I wonder why
  S: Openness is associated with a preference for acoustic music.
  …
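The distilled examples above answer a why-question either with a datum (a characterisation of the user's feature value) or with a warrant (a general rule or pattern [33]). The sketch below shows one way a system might represent this structure; the class `Argument`, the function `explain`, the `prefer` parameter, and the fallback reply are hypothetical names and choices made for illustration, and the datum wording is adapted from corpus example (18) rather than quoted from a distilled dialogue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Argument:
    """Toulmin-style argument: a claim supported by a datum and/or a warrant."""
    claim: str
    datum: Optional[str] = None     # specific circumstance, e.g. a feature value characterisation
    warrant: Optional[str] = None   # general rule or pattern linking datum to claim


def explain(argument: Argument, prefer: str = "datum") -> str:
    """Answer a why-question about the claim, using the preferred part if available."""
    order = ("datum", "warrant") if prefer == "datum" else ("warrant", "datum")
    for part in order:
        text = getattr(argument, part)
        if text is not None:
            return text
    # Assumed fallback when neither datum nor warrant is available.
    return "I don't have any information about that."


openness = Argument(
    claim="I think you rate high on openness.",
    datum="You seem to prefer acoustic music.",
    warrant="Openness is associated with a preference for acoustic music.",
)
print(explain(openness, prefer="warrant"))
```

Keeping datum and warrant as separate optional fields reflects the observation that a warrant may be hard to obtain for opaque models: in that case the field simply stays empty and the system falls back on the datum.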
3.6. Faithfulness and Sources of Evidence
3.7. Answer Unavailability
- (24)
- R: … Among my answers, which ones tilted me the most towards being non-agreeable and slightly open?
  O: Your agreeableness reflects individual differences in concern with cooperation and social harmony. (D 3)
- (25)
- R: So, what did they base the low conscientiousnss on?
  O: hard to say, any specific music you rated high or low (D 33)
- (26)
- R: So there were no specific answers in the test that led to these results? For instance my slight disliking of death metal.
  O: Not really, your answer for the tracks is not something i can see, What i can see is What your answers acumilated to. Then showing me What personality traits you have
  R: I see, that makes is a lot clearer. (D 3)
- (24′)
- U: Among my answers, which ones tilted me the most towards being non-agreeable?
  S: I don’t have any information about that.
- (7′)
- U: I wonder if a preference for acoustic music influences openness or if it’s only the other way
  S: I don’t have any information about that.
- (15′)
- S: I think you rate low on neuroticism.
  U: oh, is this because I don’t like growling?
  S: Sorry, I don’t understand what “growling” means in this context.
- (14′)
- U: do you think death metal fans are 5 or −5 in neuroticism?
  S: Sorry, I don’t understand what “death metal” means in this context.
- (26′)
- U: So there were no specific answers in the test that led to these results? For instance my slight disliking of death metal.
  S: Sorry, I don’t understand what “death metal” means in this context.
- (21′)
- U: Is a lower score on agreeableness a negative quality to have?
  S: Sorry, I don’t understand what “negative quality” means in this context.
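The distilled examples in this section use two distinct fallback replies: one when the system understands the question but has no answer, and one when the question contains a term outside the system's domain. A minimal sketch of that decision is given below. It assumes that some upstream component has already extracted the relevant terms from the user utterance; the function `respond`, the `VOCABULARY` set and its contents are hypothetical, and only the two reply templates are taken from the examples.

```python
from typing import Iterable, Optional, Set

# Reply templates taken from the distilled examples above.
NO_INFORMATION = "I don’t have any information about that."
UNKNOWN_TERM = "Sorry, I don’t understand what “{term}” means in this context."

# Hypothetical domain vocabulary; a real system would derive it from its model and lexicon.
VOCABULARY: Set[str] = {"openness", "agreeableness", "neuroticism", "acoustic music"}


def respond(terms: Iterable[str], answer: Optional[str],
            vocabulary: Set[str] = VOCABULARY) -> str:
    """Return the answer if available, otherwise signal why no answer can be given."""
    for term in terms:
        if term.lower() not in vocabulary:
            # The question presupposes a concept the system does not know.
            return UNKNOWN_TERM.format(term=term)
    if answer is None:
        # The question is understood, but the requested information is unavailable.
        return NO_INFORMATION
    return answer


print(respond(["death metal"], answer=None))
print(respond(["agreeableness"], answer=None))
```

Checking for unknown terms before checking for a missing answer means the user learns which part of the question was not understood, rather than receiving a generic refusal.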
3.8. Feedback and Grounding
- (27)
- O: the AI calculates the results based on a statistical model for each personality trait
  O: I doubt if it can discriminate different phases
  R: hm ok, so it takes into account many other people’s statistics then
  O: of a particular person
  R: yes, what you described is a very general picture
  O: a 1000 users apparently
  R: aha I see, quite a small set
  R: ok, I am pretty satisfied with the explanation
  O: users of the music website Last.fm
  O: I am glad :) (D 19)
- (28)
- R: I think these results kind of make sense for myself, but I am having hard times understanding my openness raiting. It’s interesting that I am extravert but not open based on these results…don’t you see a confusion there?
  R: agreeableness’s explanation also sounds good to me
  R: i see now
  O: “Neuroticism” is based on the fact that you disprefer acoustic, high-energy music while you slightly prefer non-danceable music… (D 19)
- (29)
- O: Do you agree or disagree with the results
  R: but it is hard to understand what in the music made the AI think that I am 0 in extraversion…
  O: ?
  O: Yeah, so disagree? (D 15)
- (30)
- R: Canbyou give me my results
  O: opennesa 5…
  O: neurotisk −1.2-ish…
  R: Why am i neurotisk
  O: no, minus 1.2
  R: Which means i am not neurotisk
  O: guess so (D 10)
- (31)
- R: … I listen to a lot of heavy metal. Is there a correlation between loud and fast music and my score?
  O: You seem to prefer slower and less loud music
  R: haha that’s the complete opposite. … (D 8)
3.9. Anaphora
3.10. Turn-Taking and Complex Explanantia
- (13″)
- U: I really want to know what the result is based on…why am I so low on openness? kind of disagree with that
  S: You seem to prefer low-energy music.
  U: ok
  S: Also, you seem to prefer loud music.
  …
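In the distilled example above, a complex explanans is split across turns, with the second part introduced by a continuation marker after the user has given feedback. One way to sketch this behaviour is with a generator that yields one part per system turn, so that a dialogue manager can wait for user feedback between parts. The function name `deliver_incrementally` and the default marker "Also, " are assumptions for illustration; the handling of user feedback itself is left out.

```python
from typing import Iterator, List


def deliver_incrementally(parts: List[str], marker: str = "Also, ") -> Iterator[str]:
    """Yield one explanation part per system turn, marking continuations."""
    for index, part in enumerate(parts):
        if index == 0:
            yield part
        else:
            # Lower-case the first letter so the marker reads as part of the sentence.
            yield marker + part[:1].lower() + part[1:]


turns = deliver_incrementally([
    "You seem to prefer low-energy music.",
    "You seem to prefer loud music.",
])
print(next(turns))  # first system turn; the caller can now wait for feedback such as "ok"
print(next(turns))  # second system turn, with continuation marker
```

Because the generator is lazy, the remaining parts are only produced if the dialogue manager asks for them, which leaves room for the user to interrupt or change topic.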
3.11. Ellipsis and Explanandum Co-Referencing
- (32)
- O: …You scored 5 in Openness, a bit under −2 in Conscientiousness, a bit over 1 in Extraversion, a bit under −4 in Agreeableness, and almost −3 in Neurotiscm
  R: Why do you think I’m a very aggreeable person?
  R: * disagreeable
  O: I don’t know, but the results says that a person scores high in that category if they prefer non-live music, major mode, acoustic music, and non-instrumental music. Maybe you have other preferences (D 8)
3.12. Reliability and Epistemic Stance
- (33)
- O: the AI calculates the results based on a statistical model for each personality trait
  O: I doubt if it can discriminate different phases
  R: hm ok, so it takes into account many other people’s statistics then
  O: of a particular person
  R: yes, what you described is a very general picture
  O: a 1000 users apparently
  R: aha I see, quite a small set…
  O: users of the music website Last.fm…
  R: very specific website - most of the people right now don’t really use Last.fm, maybe that would explain a lot…
  O: I agree (D 19)
- (34)
- O: I can see that you have very high scores on openness
  R: That is to be expected
  R: Selection bias for people that participate in this kind of experiment
  R: or any experiment (D 33)
3.13. Synthesis of Findings
4. Discussion
4.1. Implications for Future Work
4.2. Validity
4.3. Limitations
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier; KDD ’16. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 13–17 August 2016; pp. 1135–1144.
- Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions; NIPS’17. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 4–9 December 2017; pp. 4768–4777.
- Lakkaraju, H.; Bach, S.H.; Leskovec, J. Interpretable decision sets: A joint framework for description and prediction. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1675–1684.
- Rudin, C.; Chen, C.; Chen, Z.; Huang, H.; Semenova, L.; Zhong, C. Interpretable machine learning: Fundamental principles and 10 grand challenges. Stat. Surv. 2022, 16, 1–85.
- Marques-Silva, J.; Ignatiev, A. No silver bullet: Interpretable ML models must be explained. Front. Artif. Intell. 2023, 6, 1128212.
- Molnar, C. Interpretable Machine Learning: A Guide For Making Black Box Models Explainable, 3rd ed.; Christoph Molnar: Munich, Germany, 2025.
- Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artif. Intell. 2019, 267, 1–38.
- Rohlfing, K.J.; Cimiano, P.; Scharlau, I.; Matzner, T.; Buhl, H.M.; Buschmeier, H.; Esposito, E.; Grimminger, A.; Hammer, B.; Häb-Umbach, R.; et al. Explanation as a Social Practice: Toward a Conceptual Framework for the Social Design of AI Systems. IEEE Trans. Cogn. Dev. Syst. 2021, 13, 717–728.
- Finke, J.; Horwath, I.; Matzner, T.; Schulz, C. (De)Coding Social Practice in the Field of XAI: Towards a Co-constructive Framework of Explanations and Understanding Between Lay Users and Algorithmic Systems. In Proceedings of the Artificial Intelligence in HCI, Online, 26 June–1 July 2022; Degen, H., Ntoa, S., Eds.; Springer: Cham, Switzerland, 2022; pp. 149–160.
- Lakkaraju, H.; Slack, D.; Chen, Y.; Tan, C.; Singh, S. Rethinking Explainability as a Dialogue: A Practitioner’s Perspective. arXiv 2022, arXiv:2202.01875.
- Dazeley, R.; Vamplew, P.; Foale, C.; Young, C.; Aryal, S.; Cruz, F. Levels of explainable artificial intelligence for human-aligned conversational explanations. Artif. Intell. 2021, 299, 103525.
- Mariotti, E.; Alonso, J.M.; Gatt, A. Towards Harnessing Natural Language Generation to Explain Black-box Models. In Proceedings of the 2nd Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence, Online, 18 December 2020; pp. 22–27.
- Sokol, K.; Flach, P.A. Glass-Box: Explaining AI Decisions With Counterfactual Statements Through Conversation With a Voice-enabled Virtual Assistant. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, Stockholm, Sweden, 13–19 July 2018; pp. 5868–5870.
- Mindlin, D.; Beer, F.; Sieger, L.N.; Heindorf, S.; Esposito, E.; Ngonga Ngomo, A.C.; Cimiano, P. Beyond one-shot explanations: A systematic literature review of dialogue-based xAI approaches. Artif. Intell. Rev. 2025, 58, 81.
- Werner, C. Explainable AI through Rule-based Interactive Conversation. In Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference, Copenhagen, Denmark, 30 March–2 April 2020.
- Kuźba, M.; Biecek, P. What Would You Ask the Machine Learning Model? Identification of User Needs for Model Explanations Based on Human-Model Conversations. In Proceedings of the ECML PKDD 2020 Workshops, Ghent, Belgium, 14–18 September 2020; Koprinska, I., Kamp, M., Appice, A., Loglisci, C., Antonie, L., Zimmermann, A., Guidotti, R., Özgöbek, Ö., Ribeiro, R.P., Gavaldà, R., et al., Eds.; Springer: Cham, Switzerland, 2020; pp. 447–459.
- Slack, D.; Krishna, S.; Lakkaraju, H.; Singh, S. Explaining machine learning models with interactive natural language conversations using TalkToModel. Nat. Mach. Intell. 2023, 5, 873–883.
- Feldhus, N.; Ravichandran, A.M.; Möller, S. Mediators: Conversational Agents Explaining NLP Model Behavior. arXiv 2022, arXiv:2206.06029.
- Nguyen, V.B.; Schlötterer, J.; Seifert, C. From Black Boxes to Conversations: Incorporating XAI in a Conversational Agent. In Proceedings of the Explainable Artificial Intelligence, Lisbon, Portugal, 26–28 July 2023; Longo, L., Ed.; Springer: Cham, Switzerland, 2023; pp. 71–96.
- Wijekoon, A.; Corsar, D.; Wiratunga, N.; Martin, K.; Salimi, P. Tell me more: Intent Fulfilment Framework for Enhancing User Experiences in Conversational XAI. arXiv 2024, arXiv:2405.10446.
- Malandri, L.; Mercorio, F.; Mezzanzanica, M.; Nobani, N. ConvXAI: A system for multimodal interaction with any black-box explainer. Cogn. Comput. 2023, 15, 613–644.
- Mindlin, D.; Robrecht, A.S.; Morasch, M.; Cimiano, P. Measuring User Understanding in Dialogue-Based XAI Systems. In Proceedings of the 27th European Conference on Artificial Intelligence (ECAI 2024), Including PAIS 2024, Santiago de Compostela, Spain, 19–24 October 2024; Frontiers in Artificial Intelligence and Applications; Volume 392, pp. 1148–1155.
- Berman, A. Argumentative Dialogue As Basis For Human-AI Collaboration. In Proceedings of the HHAI 2024 Workshops, Malmö, Sweden, 10–11 June 2024.
- Berman, A.; Larsson, S. Assessing Conversational Capabilities of Explanatory AI Interfaces. In Proceedings of the International Conference on Human-Computer Interaction, Gothenburg, Sweden, 22–27 June 2025; Springer: Berlin/Heidelberg, Germany, 2025; pp. 3–21.
- Wang, D.; Yang, Q.; Abdul, A.; Lim, B.Y. Designing theory-driven user-centric explainable AI. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Scotland, UK, 4–9 May 2019; pp. 1–15.
- Ehsan, U.; Wintersberger, P.; Liao, Q.V.; Mara, M.; Streit, M.; Wachter, S.; Riener, A.; Riedl, M.O. Operationalizing human-centered perspectives in explainable AI. In Proceedings of the Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, Originally, Yokohama, Japan, 8–13 May 2021; pp. 1–6.
- Liao, Q.V.; Varshney, K.R. Human-Centered Explainable AI (XAI): From Algorithms to User Experiences. arXiv 2022, arXiv:2110.10790.
- Kim, J.; Maathuis, H.; Sent, D. Human-centered evaluation of explainable AI applications: A systematic review. Front. Artif. Intell. 2024, 7, 1456486.
- Shneiderman, B. Human-Centered AI; Oxford University Press: Oxford, UK, 2022.
- Capel, T.; Brereton, M. What is human-centered about human-centered AI? A map of the research landscape. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany, 23–28 April 2023; pp. 1–23.
- Booshehri, M.; Buschmeier, H.; Cimiano, P. A Model of Factors Contributing to the Success of Dialogical Explanations; ICMI ’24. In Proceedings of the 26th International Conference on Multimodal Interaction, San José, Costa Rica, 4–8 November 2024; pp. 373–381.
- Liao, Q.V.; Gruen, D.; Miller, S. Questioning the AI: Informing Design Practices for Explainable AI User Experiences. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Oahu, HI, USA, 25–30 April 2020; pp. 1–15.
- Toulmin, S.E. The Uses of Argument; Cambridge University Press: Cambridge, UK, 2003.
- Kuhlen, A.K.; Brennan, S.E. Language in dialogue: When confederates might be hazardous to your data. Psychon. Bull. Rev. 2013, 20, 54–72.
- John, O.P.; Srivastava, S. The Big-Five Trait Taxonomy: History, Measurement, and Theoretical Perspectives. In Handbook of Personality: Theory and Research, 2nd ed.; Pervin, L.A., John, O.P., Eds.; Guilford Press: New York, NY, USA, 1999; pp. 102–138.
- Melchiorre, A.B.; Schedl, M. Personality Correlates of Music Audio Preferences for Modelling Music Listeners. In Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization, Genoa, Italy, 12–18 July 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 313–317.
- Bird, S.; Klein, E.; Loper, E. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit; O’Reilly Media: Sebastopol, CA, USA, 2009.
- Jönsson, A.; Dahlbäck, N. Distilling dialogues—A method using natural dialogue corpora for dialogue systems development. In Proceedings of the 6th Applied Natural Language Processing Conference, Seattle, WA, USA, 29 April–4 May 2000; Association for Computational Linguistics: Stroudsburg, PA, USA, 2000; pp. 44–51.
- Larsson, S.; Santamarta, L.; Jönsson, A. Using the process of distilling dialogues to understand dialogue systems. In Proceedings of the 6th International Conference on Spoken Language Processing (ICSLP2000/INTERSPEECH2000), Beijing, China, 16–20 October 2000; pp. 374–377.
- Larman, C.; Basili, V.R. Iterative and incremental developments: A brief history. Computer 2003, 36, 47–56.
- Strauss, A.; Corbin, J. Basics of Qualitative Research; Sage Publications: Thousand Oaks, CA, USA, 1990.
- Larsson, S. Issue-Based Dialogue Management; University of Gothenburg: Gothenburg, Sweden, 2002.
- Ginzburg, J. The Interactive Stance; Oxford University Press: New York, NY, USA, 2012.
- Maraev, V.; Bernardy, J.P.; Ginzburg, J. Dialogue management with linear logic: The role of metavariables in questions and clarifications. Trait. Autom. Des Langues 2020, 61, 43–67.
- Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
- Hempel, C.G.; Oppenheim, P. Studies in the Logic of Explanation. Philos. Sci. 1948, 15, 135–175.
- Larsson, S.; Myrendal, J. Dialogue acts and updates for semantic coordination. In Proceedings of the 21st Workshop on the Semantics and Pragmatics of Dialogue, Saarbrücken, Germany, 15–17 August 2017; pp. 59–66.
- Breitholtz, E. Enthymemes and Topoi in Dialogue: The Use of Common Sense Reasoning in Conversation; Brill: Leiden, The Netherlands, 2020.
- Berman, A.; Gregoromichelaki, E.; Parai, C. From Interpretability to Clinically Relevant Linguistic Explanations: The Case of Spinal Surgery Decision-Support. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence—Volume 1: IAI, Porto, Portugal, 23–25 February 2025; INSTICC: Lisbon, Portugal, 2025; pp. 909–920.
- Ducrot, O. Topoï et formes topiques. Bull. D’études Linguist. Française 1988, 22, 1–14.
- Grice, H.P. Logic and conversation. Syntax Semant. 1975, 3, 43–58.
- Sacks, H.; Schegloff, E.A.; Jefferson, G. A simplest systematics for the organization of turn-taking for conversation. Language 1974, 50, 696–735.
- Hosseini, S.A. Dialogues Incorporating Enthymemes and Modelling of Other Agents’ Beliefs. Ph.D. Thesis, King’s College, London, UK, 2016.
- Chakraborti, T.; Kulkarni, A.; Sreedharan, S.; Smith, D.E.; Kambhampati, S. Explicability? Legibility? Predictability? Transparency? Privacy? Security? The emerging landscape of interpretable agent behavior. In Proceedings of the Twenty-Ninth International Conference on Automated Planning and Scheduling, Berkeley, CA, USA, 11–15 July 2019; Volume 29, pp. 86–96.
- Bench-Capon, T.J. Specification and implementation of Toulmin dialogue game. In Proceedings of the JURIX 1998, Groningen, The Netherlands, 8–9 December 1998; Volume 98, pp. 5–20.
- Shaheen, Q.u.a.; Toniolo, A.; Bowles, J.K.F. Dialogue Games for Explaining Medication Choices. In Rules and Reasoning: 4th International Joint Conference, Oslo, Norway, 29 June–1 July 2020; Gutiérrez-Basulto, V., Kliegr, T., Soylu, A., Giese, M., Roman, D., Eds.; Springer: Cham, Switzerland, 2020; pp. 97–111.
- Prakken, H. Coherence and Flexibility in Dialogue Games for Argumentation. J. Log. Comput. 2005, 15, 1009–1040.
- Sklar, E.I.; Azhar, M.Q. Explanation through argumentation. In Proceedings of the 6th International Conference on Human-Agent Interaction, Southampton, UK, 15–18 December 2018; pp. 277–285.
- Vassiliades, A.; Bassiliades, N.; Patkos, T. Argumentation and explainable artificial intelligence: A survey. Knowl. Eng. Rev. 2021, 36, e5.
- Feustel, I.; Rach, N.; Minker, W.; Ultes, S. Enhancing Model Transparency: A Dialogue System Approach to XAI with Domain Knowledge. In Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Kyoto, Japan, 18–20 September 2024; Kawahara, T., Demberg, V., Ultes, S., Inoue, K., Mehri, S., Howcroft, D., Komatani, K., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 248–258.
- Schindler, C.; Feustel, I.; Rach, N.; Minker, W. Automatic Generation of Structured Domain Knowledge for Dialogue-based XAI Systems. In Proceedings of the 15th International Workshop on Spoken Dialogue Systems Technology, Bilbao, Spain, 27–30 May 2025; Torres, M.I., Matsuda, Y., Callejas, Z., del Pozo, A., D’Haro, L.F., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 1–11.
- Peräkylä, A. Validity in Research on Naturally Occurring Social Interaction. In Qualitative Research: Issues of Theory, Method and Practice, 3rd ed.; Silverman, D., Ed.; Sage: London, UK, 2011; pp. 365–382.
- Seedhouse, P. Conversation analysis as research methodology. In Applying Conversation Analysis; Palgrave Macmillan: London, UK, 2005; pp. 251–266.
- Hase, P.; Bansal, M. Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; Jurafsky, D., Chai, J., Schluter, N., Tetreault, J., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 5540–5552.
- Kamar, E. Directions in Hybrid Intelligence: Complementing AI Systems with Human Intelligence. In Proceedings of the IJCAI, New York, NY, USA, 9–15 July 2016; pp. 4070–4073.
- Lai, V.; Tan, C. On Human Predictions with Explanations and Predictions of Machine Learning Models: A Case Study on Deception Detection. In Proceedings of the FAT*, Atlanta, GA, USA, 29–31 January 2019.
- Vasconcelos, H.; Jörke, M.; Grunde-McLaughlin, M.; Krishna, R.; Gerstenberg, T.; Bernstein, M.S. When do XAI methods work? A cost-benefit approach to human-AI collaboration. In Proceedings of the CHI Workshop on Trust and Reliance in AI-Human Teams, New Orleans, LA, USA, 30 April 2022; pp. 1–15.




| | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 | Total |
|---|---|---|---|---|---|---|
| Channel | Colleagues | Colleagues | Students | Public | Public | |
| Date | Jun 2022 | Jun 2022 | Apr 2024 | Sep 2024 | Oct 2024 | |
| Chats/participant | 1 | 1 | 2 | 2 | 2 | |
| Participants | 3 | 4 | 15 | 46 | 6 | 74 |
| Utterances | 113 | 114 | 190 | 284 | 78 | 779 |
| Dialogues | 1 | 2 | 12 | 18 | 2 | 35 |
| Excluded | 1 | 0 | 0 | 5 | 2 | 8 |
| In scope | 1 | 1 | 5 | 4 | 2 | 13 |

| | Operator | Respondent | Total |
|---|---|---|---|
| Utterances/dialogue | 12.5 (14.7) | 9.7 (10.7) | 22.3 (24.6) |
| Tokens/utterance | 9.5 (10.6) | 8.6 (8.6) | 9.1 (9.8) |

| | Trial 4 | Trial 5 | Total |
|---|---|---|---|
| Gender | | | |
| Female | 7 | 2 | 9 |
| Male | 9 | 2 | 11 |
| Other | 1 | 0 | 1 |
| Age | | | |
| 18–24 years old | 1 | 0 | 1 |
| 25–34 years old | 3 | 0 | 3 |
| 35–44 years old | 2 | 1 | 3 |
| 45–54 years old | 6 | 2 | 8 |
| 55–64 years old | 3 | 1 | 4 |
| 65–74 years old | 1 | 0 | 1 |
| 75 years old or older | 1 | 0 | 1 |
| Education level | | | |
| No schooling completed | 0 | 0 | 0 |
| Completed high school/gymnasium | 0 | 0 | 0 |
| Completed university degree | 17 | 4 | 21 |

| Type of Information | Corpus Example(s) |
|---|---|
| Predicted value or class, e.g., that a person is extraverted or scores −2.3 on extraversion | 11, 13–15, 18, 20, 23 and 32 |
| Feature value characterisation, i.e., whether a feature value is high or low in relation to a (potentially implicit) reference value, e.g., that a person likes low-energy music | 13, 14, 23 and 32 |
| Warrant, i.e., general inference rules/patterns used by the model when making its predictions, e.g., that it associates a preference for silent music with scoring high on openness | 18–20 |
| Features based on which a prediction is made, e.g., that openness is estimated based on preferences concerning danceability, valence, instrumentalness, and speechiness | 11 |
| General definition of a term, e.g., that “openness to experience” describes a dimension of cognitive style that distinguishes imaginative, creative people from down-to-earth, conventional people | 20, 21, 23 and 24 |
| Potential implications of a specific prediction, e.g., that low agreeableness implies being less likely to cooperate and socialise | 21 |
| Inference steps or calculations on which a prediction is based | 12 |
| Model information, e.g., model type and nature of training data | 33 and 34 |

| Dialogue Capability | Corpus Example(s) |
|---|---|
| Question answering and information delivery | |
| Answer wh-question, e.g., concerning prediction outcomes, meaning of terms, or explanations for predictions | 4, 6, 10, 12, 13, 18 and 20 |
| Deliver explanation unpromptedly, assuming that the system has some means to determine whether an explanation should be provided together with the prediction | 9 |
| Select most relevant answer, e.g., datum (feature value characterisation), warrant or features, assuming that the system has some means of assessing relevance¹ | See Feature value characterisation, Warrant and Features in Table 4 |
| Confirm/disconfirm hypothetical explanation, e.g., concerning whether a datum supports a prediction | 22 |
| Provide multiple answers, either in a single utterance and/or incrementally across multiple utterances and using continuation markers when appropriate² | 11–13 (single utterance); 20 (incrementally) |
| Provide contradictory evidence contrastively, e.g., that a particular circumstance supports a prediction, while another circumstance speaks against it | 17 |
| Context management | |
| Resolve ellipsis, e.g., implicit content of why-questions | 3–5 |
| Grounding and meta-communication | |
| Deliver additional information when user provides an acknowledgement, if such information is available³ | 23 |
| Signal presupposition violation if the user’s utterance presupposes that the system holds a view which it in fact does not | 30 and 31 |
| Signal answer unavailability if the user asks a question for which no answer is available | 7′, 24′ and 25 |
| Provide negative understanding feedback if a sub-phrase in the user’s utterance cannot be mapped onto the system’s knowledge representations | 14′ and 15′ |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Berman, A.; Howes, C. Informing Design and Research Concerning Conversationally Explainable AI Systems by Collecting and Distilling Human Explanatory Dialogues. Information 2026, 17, 123. https://doi.org/10.3390/info17020123

