The Role of XAI in Advice-Taking from a Clinical Decision Support System: A Comparative User Study of Feature Contribution-Based and Example-Based Explanations

Du, Yuhan; Antoniadi, Anna Markella; McNestry, Catherine; McAuliffe, Fionnuala M.; Mooney, Catherine

doi:10.3390/app122010323


by Yuhan Du ¹, Anna Markella Antoniadi ¹, Catherine McNestry ², Fionnuala M. McAuliffe ² and Catherine Mooney ¹,*

¹ UCD School of Computer Science, University College Dublin, D04 V1W8 Dublin, Ireland
² UCD Perinatal Research Centre, School of Medicine, University College Dublin, National Maternity Hospital, D02 NX40 Dublin, Ireland
* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(20), 10323; https://doi.org/10.3390/app122010323

Submission received: 30 September 2022 / Revised: 7 October 2022 / Accepted: 9 October 2022 / Published: 13 October 2022

(This article belongs to the Special Issue Decision Support Systems for Disease Detection and Diagnosis)


Abstract

Explainable artificial intelligence (XAI) has shown benefits in clinical decision support systems (CDSSs); however, it is still unclear to CDSS developers how to select an XAI method to optimize the advice-taking of healthcare practitioners. We performed a user study with healthcare practitioners, based on a machine learning-based CDSS for the prediction of gestational diabetes mellitus, to explore and compare two XAI methods: explanation by feature contribution and explanation by example. Participants were asked to make estimates for both correctly and incorrectly predicted cases to determine whether there were any over-reliance or self-reliance issues. We examined the weight of advice and healthcare practitioners’ preferences. Our statistical tests showed no significant difference between the two XAI methods regarding advice-taking. The CDSS explained by either method had a substantial impact on the decision-making of healthcare practitioners; however, both methods may lead to over-reliance issues. We identified the inclination towards CDSS use as a key factor in advice-taking from an explainable CDSS among obstetricians. Additionally, we found that different types of healthcare practitioners had differing preferences for explanations; therefore, we suggest that CDSS developers should select XAI methods according to their target users.

Keywords:

artificial intelligence; explainable AI; XAI; machine learning; clinical decision support system; CDSS; advice-taking; reliance; trust; user study

1. Introduction

Clinical decision support systems (CDSSs) can assist healthcare practitioners in various decisions and patient care tasks [1], and are expected to be especially helpful in low-resource settings. With the substantial progress artificial intelligence (AI) has made in the medical field towards clinical application [2], much research has been done on the development of AI/machine learning-based CDSSs to support decision-making in many aspects, such as identifying prescriptions with a high risk of medication error [3], diagnosing breast cancer [4] and predicting its recurrence [5], recommending treatment and predicting prognosis in patients with hepatocellular carcinoma [6], predicting patient quality of life in amyotrophic lateral sclerosis [7], automating sleep spindle detection [8], etc. AI/machine learning-based CDSSs are able to process large-scale and complex data and generalize patterns from historical data to predict the outcomes of new and unseen data, having great potential to improve healthcare delivery.

With the advancements in machine and deep learning techniques, several reviews have identified that, among numerous studies on the application of AI/machine learning to clinical decision support, artificial neural networks (ANNs) or support vector machines (SVMs) are the most common algorithms used across medical domains [9,10,11]. The algorithmic complexity of these algorithms often yields high performance; however, they are black boxes, and their reasoning is difficult for healthcare practitioners to understand. This lack of explainability poses a barrier to the use of AI/machine learning-based CDSSs. A model by Caruana et al. [12] suggested that patients with asthma had a lower risk of dying from pneumonia, but this was because these patients tended to be admitted to the Intensive Care Unit (ICU) and to receive aggressive care. This showed that a machine learning-based system can reflect true patterns in the training data but still be problematic if trusted blindly in clinical practice. Incorporating explainable AI (XAI) in CDSSs could help prevent such mistakes and increase the trustworthiness and acceptability of these systems, as well as their potential to be adopted in clinical practice [13]. Moreover, the World Health Organization AI Guidelines for Health endorsed ensuring explainability as a principle for the appropriate use of AI for health [14].

A literature review on XAI in CDSSs identified a distinct lack of application of XAI in AI/machine learning-based CDSSs and a lack of user studies for XAI-enabled CDSSs [13]. Nevertheless, the limited number of user studies has shown the importance of XAI in CDSSs. A user study by Panigutti et al. [15] showed that an explanation leads to greater advice-taking and implicit trust in a CDSS, and healthcare providers preferred it when an explanation was presented. This is consistent with Antoniadi et al. [16], who found that healthcare professionals preferred an explanation if given the option. Schoonderwoerd et al. [17] and Hwang et al. [18] also found that clinicians need explanations of CDSS outputs. Furthermore, Bussone et al. [19] found that providing a comprehensive list of patient facts to explain the CDSS outputs had a positive impact on healthcare professionals’ trust and reliance but led to over-reliance, whereas a selective list resulted in self-reliance. As shown in Figure 1, over-reliance refers to healthcare practitioners putting too much trust in the CDSS, even following its incorrect outputs, which may lead to CDSS misuse; self-reliance means that healthcare practitioners neglect its correct outputs, which may result in CDSS disuse [20]. Appropriate reliance on a CDSS should be promoted, where healthcare practitioners follow the correct outputs but reject the incorrect ones.

Aside from the limited number of user studies in general, there is a lack of comparisons of different types of XAI methods with healthcare practitioners in CDSSs. According to Wang et al. [21], the effects of explanations differ considerably between decision-making tasks in which people have different levels of domain expertise, and the effects of different established XAI methods also differ. To date, it is still unclear which XAI method works best for a CDSS and which underlying factors affect a method’s effectiveness. The effectiveness of a method depends not only on user preference or understanding, but also on whether it guides the user towards a correct decision. Therefore, there is a need to investigate appropriate advice-taking from a CDSS with both correct and incorrect predictions, as suggested by Panigutti et al. [15].

One of the main ways to categorize XAI methods is by the granularity of their explanations, that is, whether an explanation discloses the overall model behavior (global explanation) or provides information about a suggestion for a specific patient (local explanation). Since the majority of developers of XAI-enabled CDSSs have focused on providing local explanations [13], we focused on two common and intuitive XAI methods that provide local explanations: explanation by feature contribution and explanation by example. Through a user study, we aimed to explore and compare the impact of these XAI methods on the advice-taking of healthcare practitioners in an explainable CDSS, using both correctly and incorrectly predicted cases, and to explore healthcare practitioners’ preferences. The CDSS we used is a machine learning-based CDSS developed for the prediction of gestational diabetes mellitus (GDM) [22]. We ask the following research questions (RQs):

RQ1: Does an explainable CDSS have any impact on healthcare practitioners’ decision-making?
RQ2: Which XAI method leads to higher overall advice-taking for both correctly and incorrectly predicted cases? Is this associated with healthcare practitioners’ clinical expertise?
RQ3: Do these XAI methods lead to appropriate advice-taking, over-reliance or self-reliance? Is this associated with healthcare practitioners’ clinical expertise?
RQ4: Which XAI method, or combination of methods, do healthcare practitioners prefer in a CDSS?
RQ5: Is advice-taking from an explainable CDSS associated with healthcare practitioners’ expertise, prior experience of CDSSs, their attitude towards CDSS use, etc.?

2. Materials and Methods

In this section, we describe the recruitment of participants, the CDSS and the two XAI methods used in the survey, the decision-making task, and the experimental design and data collection process. Figure 2 shows an overview of the study workflow.

2.1. Participant Recruitment

We ran an online survey, using an anonymous Google form, with healthcare practitioners caring for pregnant women who have or may develop GDM, including obstetricians, midwives, general practitioners (GPs) and dietitians. The survey was distributed to healthcare practitioners in Ireland by email invitation. Ethical approval for this study was granted by the University College Dublin Human Research Ethics Committee (LS-E-22-02-Mooney) on 6 January 2022. Written informed consent was obtained from all participants.

2.2. CDSS for the Prediction of GDM

To evaluate the impact of explanations on participants’ advice-taking from a CDSS, we used our machine learning-based CDSS developed to predict the risk of GDM in women with overweight and obesity [22]. The CDSS consists of three different models that predict GDM in theoretical, normal antenatal visit and remote settings. In this study, we used the model for the normal antenatal visit setting because of its ease of use in clinical practice. The model was developed on a population from the Pregnancy Exercise and Nutrition Research Study (PEARS) dataset [23], the majority of whom are white Irish women. Only descriptive features available in clinical routine were included in this model. After data preparation, oversampling with the synthetic minority oversampling technique (SMOTE) and feature selection, the model was trained using an SVM, a black-box machine learning method, in Python 3.8.8 with scikit-learn 0.24.2 (see [22] for more details). The model predicts GDM risk from five features available at the first antenatal visit in Ireland: the pregnant woman’s family history of diabetes mellitus, age, body weight, serum white cell count and gestational age at the first antenatal visit. Using a threshold of 0.5, it classifies pregnant women as being at low or high risk of developing GDM.
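The oversampling step can be illustrated with a minimal pure-Python sketch of the core SMOTE idea: each synthetic minority-class sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. This is a generic illustration under stated assumptions, not the preprocessing code of [22]; the function name `smote_oversample`, the default `k = 5` and the toy feature vectors are hypothetical. Libraries such as imbalanced-learn provide complete SMOTE implementations.

```python
import math
import random

def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from minority-class feature vectors (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority-class samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the base sample, excluding itself
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda other: math.dist(base, other),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic

# Hypothetical, already-scaled minority-class (GDM) feature vectors.
minority = [[0.2, 1.0], [0.4, 0.8], [0.3, 1.3], [0.9, 0.5]]
new_samples = smote_oversample(minority, n_new=6, k=2)
print(len(new_samples))
```

Because every synthetic point is an interpolation between two real minority samples, it stays within the region already occupied by the minority class rather than duplicating existing records.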

2.3. Explanation by Feature Contribution

To explain the effect of each feature on a given prediction of our SVM model, we used Shapley additive explanations (SHAP) [24], a post hoc XAI method based on game theory that can explain the output of any machine learning model. SHAP has been applied to improve explainability in machine learning-based CDSSs [16,25], and it has been shown to generate reliable explanations for our CDSS [22].

We provided bar plots of SHAP values which showed the contribution of each feature to a prediction. A red bar suggests the feature increases the predicted GDM risk, whereas a blue bar suggests the feature decreases the risk. The length of the bar indicates the magnitude of the contribution. We also provided some textual explanations of which features increase or decrease the GDM risk. Figure 3 shows the CDSS prediction and explanation by feature contribution for one of the exemplar cases presented in the Google form survey.
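SHAP values are Shapley values of a model’s prediction. With only five features, they can be computed exactly by enumerating every coalition of features, as in the minimal pure-Python sketch below. It assumes that features outside a coalition are replaced by a single baseline (reference) vector, a simplification: the shap library’s model-agnostic KernelExplainer, for example, estimates these values against a background dataset. The toy logistic `risk_model`, its weights and the patient and baseline values are hypothetical and are not the SVM of [22]. A positive value corresponds to a red, risk-increasing bar and a negative value to a blue, risk-decreasing bar.

```python
from itertools import combinations
from math import exp, factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take the patient's value; the rest take the baseline value.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical toy risk model (NOT the study's SVM): logistic function of a weighted sum.
FEATURES = ["family history", "age", "weight", "white cell count", "gestational age"]
WEIGHTS = [0.8, 0.05, 0.02, 0.1, -0.03]
BIAS = -6.0

def risk_model(z):
    return 1.0 / (1.0 + exp(-(BIAS + sum(w * v for w, v in zip(WEIGHTS, z)))))

patient = [1, 34, 95.0, 9.5, 13]   # hypothetical patient
baseline = [0, 31, 80.0, 8.0, 14]  # hypothetical reference values
phi = shapley_values(risk_model, patient, baseline)
for name, contribution in zip(FEATURES, phi):
    direction = "increases" if contribution > 0 else "decreases" if contribution < 0 else "does not change"
    print(f"{name}: {contribution:+.3f} ({direction} risk)")
```

The contributions always sum to `risk_model(patient) - risk_model(baseline)` (the Shapley efficiency property), which is why the bar lengths together account for the gap between the prediction and the reference output.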

2.4. Explanation by Example

For the example-based explanation, we presented, for each case, the most similar case from the training set whose outcome matched the CDSS prediction for the original case. The most similar case was selected as the nearest neighbor using standard Euclidean distance. The maternal characteristics of the most similar case are presented both in text and in a table alongside those of the original case. Figure 4 shows the CDSS prediction and explanation by example for one of the exemplar cases presented in the Google form survey.
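The selection rule above can be sketched in a few lines of Python: filter the training set to cases whose outcome equals the CDSS prediction, then take the one at the smallest Euclidean distance. The function name, the dictionary layout and the toy values are hypothetical, and the sketch assumes the feature vectors are already on comparable scales; with raw values, a feature measured in large units (such as body weight in kg) would dominate the distance.

```python
import math

def most_similar_case(query, training_cases, predicted_outcome):
    """Return the training case nearest to `query` (Euclidean distance) among
    the training cases whose outcome equals the CDSS prediction for `query`."""
    candidates = [case for case in training_cases if case["outcome"] == predicted_outcome]
    if not candidates:
        raise ValueError("no training case has the predicted outcome")
    return min(candidates, key=lambda case: math.dist(query, case["features"]))

# Hypothetical, already-scaled feature vectors.
training_cases = [
    {"id": "A", "features": [0.1, 0.4, 0.7], "outcome": "high"},
    {"id": "B", "features": [0.2, 0.5, 0.6], "outcome": "low"},
    {"id": "C", "features": [0.9, 0.1, 0.3], "outcome": "high"},
]
query = [0.2, 0.5, 0.6]
example = most_similar_case(query, training_cases, predicted_outcome="high")
print(example["id"])  # A
```

Case B is identical to the query but is skipped because its outcome does not match the prediction; this is what makes the example support the predicted risk category rather than simply show the closest patient.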

2.5. Decision-Making Task

The decision-making task is to predict GDM risk, on a scale of 0–100%, in women with a body mass index (BMI) > 25 kg/m². Participants were first presented with only the patient’s five maternal characteristics at the first antenatal visit (family history of diabetes mellitus, age, body weight, serum white cell count and gestational age) and were asked to make an initial estimate of the patient’s GDM risk based on their experience and expertise. They were then presented with the risk category (i.e., high or low risk) predicted by our CDSS and an explanation, along with the maternal characteristics, and were asked to make a second, final estimate.

2.6. Experimental Design and Data Collection

We adopted a within-subject design in this study. Each participant was asked to perform the decision-making task twice for each case, both before and after the CDSS prediction and explanation were presented, as described in Section 2.5. Four different, yet comparable cases were selected for the survey; these cases emanated from the independent test set in the development of the CDSS (see [22]). These four cases were women with BMI 25–39.9 kg/m² recruited at the National Maternity Hospital in Dublin, Ireland between 2013 and 2016 in the PEARS study [23]. Two of them were correctly predicted by our CDSS and two were incorrectly predicted, for the purpose of identifying any self-reliance or over-reliance on the CDSS. These cases were confirmed to be analogous by a qualified obstetrician, and were grouped in pairs.

To compare the two types of explanations, one pair of cases (one correctly and one incorrectly predicted) was explained by feature contribution and the other pair by example, with the assignment randomized for each participant. The order of the correctly and incorrectly predicted cases within each pair was also randomized to prevent order effects. Because of time restrictions, we did not ask participants to interact with the CDSS prototype directly; instead, we presented the cases, the predictions of our CDSS and the explanations as text and figures in the Google form.

To measure the impact of CDSS predictions and explanations on participants’ GDM risk estimation, we used the weight of advice (WOA) as a dependent variable. WOA is defined as:

\mathrm{WOA} = \frac{|\text{final estimate} - \text{initial estimate}|}{|\text{advice} - \text{initial estimate}|}

It measures the percentage shift in judgment after advice and quantifies how much participants follow the advice they receive. It has been employed in several advice-taking studies [26,27] and was previously used to measure healthcare providers’ advice-taking from an AI-based CDSS [15]. In this study, the initial and final estimates are participants’ estimates of GDM risk before and after seeing the CDSS prediction and explanation, on a scale of 0–100%. The CDSS performs binary classification, predicting whether a case is at low or high risk of GDM; therefore, the advice is either 0% or 100%. For final estimates between the initial estimate and the advice, WOA ranges from 0 to 1, where 1 means that a participant follows the CDSS prediction completely and 0 means that the participant takes no advice from the CDSS.
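The WOA calculation can be sketched directly from the definition above. The function names are hypothetical, and the mapping of the binary CDSS output to an advice value of 0% or 100% follows the text. The sketch implements the formula exactly as written, including the absolute values, and returns `None` when the initial estimate equals the advice, the case in which WOA is undefined.

```python
def cdss_advice(risk_category):
    """Map the binary CDSS output to an advice value on the 0-100% scale."""
    return 100.0 if risk_category == "high" else 0.0

def weight_of_advice(initial, final, advice):
    """WOA = |final - initial| / |advice - initial|, or None when undefined."""
    if advice == initial:
        return None  # the initial estimate already equals the advice
    return abs(final - initial) / abs(advice - initial)

# Hypothetical participant: initial estimate 30%, CDSS predicts "high", final estimate 65%.
print(weight_of_advice(30, 65, cdss_advice("high")))  # 0.5
```

For example, an initial estimate of 0% for a case the CDSS labels low risk gives an advice value of 0% and therefore no defined WOA.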

At the end of the survey, participants were asked which explanation or combination they preferred. They were asked to leave any comments (optional). They were also asked about their clinical expertise (type of healthcare practitioner, years of experience), their prior experience of CDSS, and their inclination towards CDSS use (11-point Likert scale, from 0 = “Not at all inclined” to 10 = “Extremely inclined”).

The survey (see Form S1) was adapted, with changes, from the one described in [15], and was carefully designed in consultation with computer science experts and qualified healthcare practitioners. All names in the survey are pseudonyms. The survey was tested by qualified obstetricians before dissemination.

3. Results

A total of 26 healthcare practitioners completed this survey. We discarded one response from a GP who made an initial estimate of 0 for a case, which led to an undefined WOA value (initial estimate = advice). Responses from 25 participants were retained for analysis: 19 (76%) obstetricians, two (8%) midwives and four (16%) dietitians. The mean length of clinical experience was 11.68 years (standard deviation = 7.010), ranging from 3 to 34 years. The majority of participants (20, 80%) had no prior experience of using a CDSS or any similar computer system, whereas five (20%, all obstetricians) did. In general, participants were inclined to use a CDSS (11-point Likert scale mean = 6.24, standard deviation = 2.538). Most participants (19, 76%) showed a high inclination (11-point Likert scale > 5) towards the use of CDSSs. Only a small proportion (6, 24%) were not inclined (11-point Likert scale < 5) to use a CDSS. None gave a neutral answer (11-point Likert scale = 5).

All data analysis was performed in R version 3.6.3.

3.1. RQ1: Impact on Decision-Making

Figure 5 shows the distribution of WOA for our CDSS with explanation by feature contribution or by example. One-sample one-sided Wilcoxon signed-rank tests indicated that WOA for both XAI methods was significantly greater than zero (explanation by feature contribution: V = 780, p-value < 0.001; explanation by example: V = 861, p-value < 0.001). This shows that the participating healthcare practitioners took a substantial amount of advice from our CDSS with either type of explanation, and that our explainable CDSS had a significant impact on their decision-making.
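The one-sample signed-rank statistic can be reproduced with a short pure-Python sketch. V is the sum of the ranks of the positive differences after zero differences are dropped, mirroring the V reported by R’s `wilcox.test`; the p-value uses the normal approximation with tie and continuity corrections, which R applies by default when zeros or ties are present (otherwise, for fewer than 50 observations, R uses the exact distribution, so p-values can differ). The function names are hypothetical. Because WOA is never negative, every non-zero WOA value contributes its rank, so V = n(n + 1)/2 for n non-zero values. This is consistent with the statistics reported above: each method was applied to 50 responses (25 participants × 2 cases), and removing the 22% (11) and 18% (9) zero-WOA responses reported in Section 3.2 leaves 39 and 41 values, giving 39 × 40/2 = 780 and 41 × 42/2 = 861.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        start = end + 1
    return ranks

def signed_rank_test_greater(x, mu=0.0):
    """One-sample, one-sided (greater) Wilcoxon signed-rank test of x against mu.
    Returns (V, p) with a normal approximation (tie and continuity corrected)."""
    d = [v - mu for v in x if v != mu]  # zero differences are dropped
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    abs_d = [abs(v) for v in d]
    ranks = average_ranks(abs_d)
    V = sum(r for r, v in zip(ranks, d) if v > 0)
    counts = {}
    for a in abs_d:
        counts[a] = counts.get(a, 0) + 1
    variance = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in counts.values()) / 48
    z = (V - n * (n + 1) / 4 - 0.5) / math.sqrt(variance)
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail standard normal probability
    return V, p

# Hypothetical WOA values: only the number of zeros (11 of 50) mirrors the study.
woa = [0.0] * 11 + [0.5] * 20 + [1.0] * 19
V, p = signed_rank_test_greater(woa)
print(V)  # 780.0
```

The same function applied to paired differences (for example, WOA under one XAI method minus WOA under the other) gives the paired-samples statistic, although a two-sided p-value would need the two-tailed version of the normal probability.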

3.2. RQ2: Overall Advice-Taking

Table 1 shows the comparison between the WOA for these two XAI methods using paired-samples two-sided Wilcoxon signed-rank tests. We found no statistically significant difference between these methods in the advice-taking of all participating healthcare practitioners (V = 536.5, p-value = 0.775). The rate of no advice-taking (WOA = 0) was 22% for explanation by feature contribution and 18% for explanation by example; a Fisher’s exact test showed no significant difference between these XAI methods (p-value = 0.803). Additionally, there was no significant difference between these methods in any subgroup of participants grouped by clinical expertise, including the type of healthcare practitioner and years of experience (see Table 1). Overall, we did not find significant differences in the impact of the two XAI methods on the advice-taking of healthcare practitioners, regardless of their clinical expertise.
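The comparison of no-advice-taking rates can be sketched as a two-sided Fisher’s exact test on a 2×2 table. The sketch below is a minimal pure-Python version that, like R’s `fisher.test`, sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. The function name is hypothetical, and the counts assume 50 responses per method (25 participants × 2 cases), so that 22% and 18% correspond to 11 and 9 zero-WOA responses.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that is
    no more likely than the observed one (with a small relative tolerance)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = {x: comb(row1, x) * comb(n - row1, col1 - x) / total for x in range(lo, hi + 1)}
    p_obs = probs[a]
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))

# Rows: XAI method; columns: responses with WOA = 0 and WOA > 0.
# Counts derived from the reported rates, assuming 50 responses per method.
p = fisher_exact_two_sided(11, 39, 9, 41)
print(round(p, 3))  # 0.803
```

Under these assumed counts, the result agrees with the reported p-value of 0.803. The same call with the corresponding counts for correct and incorrect cases would cover the comparisons in Section 3.3.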

3.3. RQ3: Appropriate Advice-Taking, Over-Reliance or Self-Reliance

For both XAI methods, the paired-samples two-sided Wilcoxon signed-rank tests found no significant difference between the WOA for correct and incorrect cases predicted by our CDSS (explanation by feature contribution: V = 164.5, p-value = 0.429; explanation by example: V = 87, p-value = 0.206; see Figure 6). Interestingly, Figure 6 shows that for explanation by example, the WOA for the incorrect cases was higher than that for the correct cases, although this difference was not statistically significant. Additionally, we found no significant difference in the WOA between correct and incorrect cases among any subgroups of participants grouped by their clinical expertise, as shown in Table 2. Regarding the rate of no advice-taking for all participants, Fisher’s exact tests showed no significant difference between these cases for both XAI methods (p-value = 1 for both methods). Our results suggest that neither XAI method showed a sign of appropriate advice-taking when incorporated in our CDSS, and both methods may lead to over-reliance issues, regardless of the expertise of healthcare practitioners.

One-sample one-sided Wilcoxon signed-rank tests showed that the WOA of our CDSS with either XAI method for the correctly predicted cases was significantly greater than zero (explanation by feature contribution: V = 190, p-value < 0.001; explanation by example: V = 210, p-value < 0.001), which did not suggest any overall self-reliance among the participants for the two XAI methods.

3.4. RQ4: Preferred XAI Method

When asked which type of explanation they would prefer in a CDSS, almost half (12, 48%) of participants preferred to see both explanations, as shown in Figure 7. For participants who preferred one over the other, there was a clear preference for explanation by feature contribution (10, 40%) over explanation by example (1, 4%). Only two (8%) participants preferred none of the explanations in a CDSS.

Among all participating obstetricians, more than half (10, 52.6%) preferred explanation by feature contribution. The rest preferred both explanations (7, 36.8%) or none of the explanations (2, 10.5%). Among the two participating midwives, one preferred explanation by example and one preferred both explanations. All participating dietitians preferred both explanations.

One participant justified their preference for explanation by feature contribution by noting that a single similar example was not helpful, and suggested using a large group of similar examples instead:

“I don’t think comparing to one similar patient is helpful, this does not predict the risk to our patient. If comparing to other patients it should be a comparison to a large group of similar patients, not just one case.” (Obstetrician)

One participant commented that they preferred to see both explanations in a CDSS for better clarity:

“visually clearer” (Dietitian)

3.5. RQ5: Factors Associated with Advice-Taking

Having identified no significant difference between the two XAI methods, we examined factors that may affect healthcare practitioners’ overall advice-taking from an explainable CDSS. Figure 8 compares the WOA of our CDSS, explained by either XAI method, between subgroups of participants. Two-sided Wilcoxon rank-sum tests showed no significant difference in WOA by type of healthcare practitioner, years of experience or prior experience of CDSSs. For some of these comparisons, this may be due to the small sample size.

Obstetricians who explicitly expressed a higher inclination (11-point Likert scale ⩾ 7) towards CDSS use had significantly higher overall WOA than those who expressed a lower inclination (11-point Likert scale < 7) (W = 909, p-value = 0.049). This indicates that the inclination towards CDSS use was an important factor in advice-taking from an explainable CDSS among obstetricians. However, this did not hold for the entire cohort of participants, either because of differences in advice-taking behavior between types of healthcare practitioners or because of the limited sample size.
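The rank-sum comparison between two independent subgroups can be sketched in Python. R’s `wilcox.test` reports W as the rank sum of the first sample minus n_x(n_x + 1)/2, which equals the number of pairs in which the first-sample value is larger (ties counting one half); the sketch computes W that way and uses the normal approximation with tie and continuity corrections, which R applies by default when ties are present. The function name and the WOA values are hypothetical.

```python
import math

def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test on two independent samples.
    Returns (W, p): W counts pairs with x_i > y_j (ties count 1/2), which equals the
    rank sum of x minus n_x(n_x + 1)/2; p uses a normal approximation with tie and
    continuity corrections."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    nx, ny = len(x), len(y)
    W = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    n = nx + ny
    counts = {}
    for v in list(x) + list(y):
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values()) / (n * (n - 1))
    variance = nx * ny / 12 * ((n + 1) - tie_term)
    if variance <= 0:
        raise ValueError("all values are tied; the test is undefined")
    diff = W - nx * ny / 2
    z = (diff - math.copysign(0.5, diff)) / math.sqrt(variance) if diff != 0 else 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return W, min(1.0, p)

# Hypothetical WOA values for higher- and lower-inclination groups.
high_inclination = [1.0, 0.8, 0.9, 0.6, 1.0]
low_inclination = [0.2, 0.5, 0.0, 0.4]
W, p = wilcoxon_rank_sum(high_inclination, low_inclination)
print(W)  # 20.0
```

Here every higher-inclination value exceeds every lower-inclination value, so W reaches its maximum of 5 × 4 = 20. With small samples and no ties, R would report an exact p-value instead of this approximation.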

3.6. Others

We investigated the association between healthcare practitioners’ preference for explanations and their advice-taking. Figure 9 shows the comparison of WOA between the two XAI methods for participants who preferred explanation by feature contribution only. A paired-samples two-sided Wilcoxon signed-rank test indicated that, although these healthcare practitioners had a clear preference for explanation by feature contribution, there was no significant difference in their actual advice-taking between the two XAI methods (V = 85.5, p-value = 0.717).

In this study, we found that initial estimates of GDM risk varied widely among healthcare practitioners for the same case. For the four cases used in the survey, the ranges of initial estimates were 5–80%, 10–90%, 9–90% and 8–92%. This is potentially because there is no unified way to predict and quantify GDM risk on a scale of 0–100%.

We received positive feedback on the clinical applicability of our CDSS for the prediction of GDM:

“… I can see its clinical applicability”. (Obstetrician)

However, some participants expressed concerns about our CDSS, including the input features it used and the lack of a universal definition of, and diagnostic criteria for, GDM, which they saw as a barrier to its clinical application:

“… Height matters.” (Dietitian)

“… there is no internationally accepted definition of GDM and no accepted screening protocol … I think it is hard to understand the added clinical benefit of a risk prediction tool …”. (Obstetrician)

4. Discussion

Although explanations of CDSS outputs are deemed highly important for the appropriate use and acceptance of these systems, it is not yet clear which types of explanation are most useful to their end-users. In the era of XAI, a variety of techniques to explain machine learning models are available, but little is known about how they relate to user understanding [13]. We conducted a user study with healthcare practitioners caring for pregnant women in Ireland to investigate and compare two types of XAI methods, based on the machine learning-based CDSS we developed to predict GDM among women with overweight and obesity.

Our results, primarily based on obstetricians, showed that a CDSS explained by either feature contribution or example can have a substantial impact on the decision-making of healthcare practitioners, demonstrating the effectiveness of our explainable CDSS. It has been argued in the literature that example-based methods, such as k-nearest neighbors (KNN), are motivated by human reasoning [28,29,30,31]; thus, they may be appropriate for machine learning models used in the medical domain, as they can explain the model’s reasoning by allowing users to compare a case with similar case(s) and apply their expertise to assess the quality of the suggestion [32]. In our work, we found no significant difference between the two XAI methods, explanation by feature contribution and explanation by example, regarding advice-taking from a CDSS. Additionally, we found that both methods may lead to over-reliance issues, i.e., healthcare practitioners may be misled by an incorrect prediction of the CDSS. A previous study that compared rule-based and example-based explanations also found that both types of explanation led to over-reliance on the model’s predictions, although the participants in that study were not healthcare practitioners [33].

Despite many participating healthcare practitioners preferring both types of explanation, we noticed a clear preference for explanation by feature contribution over explanation by example, while the preference for XAI methods differed according to the expertise of healthcare practitioners. More than half of the obstetricians preferred an explanation by feature contribution only, whereas all participating dietitians preferred to have both an explanation by feature contribution and by example. The vast majority of participants preferred to have at least one of the XAI methods in a CDSS; therefore, we suggest that CDSS developers should incorporate some XAI methods in AI-based CDSSs where possible. It might be helpful to include multiple options of explanations for healthcare practitioners with different clinical expertise and cognitive styles. Healthcare practitioners have personal preferences concerning the information that they provide in their diagnosis [17], which would affect their preferred type of displayed information and explanation by a CDSS.

In addition, although most participating healthcare practitioners had no prior experience of using a CDSS, the majority were highly inclined to use a CDSS, indicating the potential for CDSS use in clinical settings. We identified the inclination towards CDSS use as an important factor that impacts the advice-taking among obstetricians. Computer-based training [34], involving machine learning education and its application in medical schools [35], may improve healthcare practitioners’ inclination towards technology use and may eventually enhance the adoption of machine learning-based systems in clinical practice. We did not find any association between the advice-taking of healthcare practitioners and their clinical expertise, years of experience or their prior experience of CDSS use.

One limitation of this study is the small sample size, despite our best efforts to encourage participation. Our CDSS was developed mainly on data from white Irish women, which limited our target participants to healthcare practitioners caring for pregnant women in Ireland. In addition, anecdotal feedback on our survey suggests that midwives and dietitians may be less comfortable than obstetricians with making decisions on GDM risk, which may explain the limited number of responses from these groups. The decision-making task and the features used in our CDSS were not exactly the same as those that healthcare practitioners normally perform or see in a clinical setting, which may have put some healthcare practitioners off. Another limitation is that we did not provide any information on the accuracy of our CDSS, which might have affected healthcare practitioners’ over-reliance on the system.

Future work could be conducted on the evaluation of CDSSs with other explanation methods, including example-based explanations with more than one example, with a focus on identifying the XAI methods that will lead to appropriate reliance on the system (i.e., users following correct suggestions and rejecting incorrect ones). User studies with other types of healthcare practitioners, preferably larger samples if possible, are necessary. Focus groups could also lead to the elicitation of further insights, knowledge, and opinions on the CDSS and its explanations.

5. Conclusions

This user study showed that our CDSS explained by either feature contribution or example had a significant impact on healthcare practitioners’ decision-making, demonstrating great potential in clinical settings. A comparison between these XAI methods showed no significant difference in advice-taking from the CDSS; however, both methods may lead to over-reliance issues. Although many healthcare practitioners preferred to have both explanations in a CDSS, there was a clear preference for explanation by feature contribution over explanation by example. We identified the inclination towards CDSS use as an important factor in advice-taking from an explainable CDSS among obstetricians. We also found that healthcare practitioners with different expertise preferred different types of explanations; therefore, CDSS developers should carefully select XAI methods based on the target users of the CDSS.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app122010323/s1, Form S1: The Google form used to conduct this survey.

Author Contributions

Conceptualization, Y.D. and C.M. (Catherine Mooney); methodology, Y.D., A.M.A., C.M. (Catherine McNestry), F.M.M. and C.M. (Catherine Mooney); formal analysis, Y.D.; investigation, Y.D., A.M.A., C.M. (Catherine McNestry), F.M.M. and C.M. (Catherine Mooney); writing—original draft preparation, Y.D.; writing—review and editing, Y.D., A.M.A., C.M. (Catherine McNestry), F.M.M. and C.M. (Catherine Mooney); supervision, C.M. (Catherine Mooney). All authors have read and agreed to the published version of the manuscript.

Funding

The PEARS study was supported by University College Dublin and the National Maternity Hospital Medical Fund.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Human Research Ethics Committee of University College Dublin (protocol code LS-E-22-02-Mooney; date of approval: 6 January 2022).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

No additional data are available.

Acknowledgments

We would like to thank everyone who helped disseminate the survey and all participating healthcare practitioners.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

CDSS	Clinical Decision Support System
XAI	Explainable Artificial Intelligence
AI	Artificial Intelligence
ANN	Artificial Neural Network
SVM	Support Vector Machine
RQ	Research Question
GP	General Practitioner
PEARS	Pregnancy Exercise and Nutrition Research Study
GDM	Gestational Diabetes Mellitus
SHAP	Shapley Additive Explanations
BMI	Body Mass Index
WOA	Weight of Advice
HCP	Healthcare Practitioner
KNN	K-Nearest Neighbours

References

Sutton, R.T.; Pincock, D.; Baumgart, D.C.; Sadowski, D.C.; Fedorak, R.N.; Kroeker, K.I. An overview of clinical decision support systems: Benefits, risks, and strategies for success. NPJ Digit. Med. 2020, 3, 1–10.
Rajpurkar, P.; Chen, E.; Banerjee, O.; Topol, E.J. AI in health and medicine. Nat. Med. 2022, 28, 31–38.
Corny, J.; Rajkumar, A.; Martin, O.; Dode, X.; Lajonchère, J.P.; Billuart, O.; Bézie, Y.; Buronfosse, A. A machine learning–based clinical decision support system to identify prescriptions with a high risk of medication error. J. Am. Med. Inform. Assoc. 2020, 27, 1688–1694.
Ragab, M.; Albukhari, A.; Alyami, J.; Mansour, R.F. Ensemble deep-learning-enabled clinical decision support system for breast cancer diagnosis and classification on ultrasound images. Biology 2022, 11, 439.
Massafra, R.; Latorre, A.; Fanizzi, A.; Bellotti, R.; Didonna, V.; Giotta, F.; La Forgia, D.; Nardone, A.; Pastena, M.; Ressa, C.M.; et al. A clinical decision support system for predicting invasive breast cancer recurrence: Preliminary results. Front. Oncol. 2021, 11, 576007.
Choi, G.H.; Yun, J.; Choi, J.; Lee, D.; Shim, J.H.; Lee, H.C.; Chung, Y.H.; Lee, Y.S.; Park, B.; Kim, N.; et al. Development of machine learning-based clinical decision support system for hepatocellular carcinoma. Sci. Rep. 2020, 10, 14855.
Antoniadi, A.M.; Galvin, M.; Heverin, M.; Hardiman, O.; Mooney, C. Development of an explainable clinical decision support system for the prediction of patient quality of life in amyotrophic lateral sclerosis. In Proceedings of the 36th Annual ACM Symposium on Applied Computing, Virtual Event, Republic of Korea, 22–26 March 2021; pp. 594–602.
Wei, L.; Ventura, S.; Ryan, M.A.; Mathieson, S.; Boylan, G.B.; Lowery, M.; Mooney, C. Deep-spindle: An automated sleep spindle detection system for analysis of infant sleep spindles. Comput. Biol. Med. 2022, 150, 106096.
Safdar, S.; Zafar, S.; Zafar, N.; Khan, N.F. Machine learning based decision support systems (DSS) for heart disease diagnosis: A review. Artif. Intell. Rev. 2018, 50, 597–623.
Bertl, M.; Metsallik, J.; Ross, P. A systematic literature review of AI-based digital decision support systems for post-traumatic stress disorder. Front. Psychiatry 2022, 13, 923613.
Mazo, C.; Aura, C.; Rahman, A.; Gallagher, W.M.; Mooney, C. Application of Artificial Intelligence Techniques to Predict Risk of Recurrence of Breast Cancer: A Systematic Review. J. Pers. Med. 2022, 12, 1496.
Caruana, R.; Lou, Y.; Gehrke, J.; Koch, P.; Sturm, M.; Elhadad, N. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015; pp. 1721–1730.
Antoniadi, A.M.; Du, Y.; Guendouz, Y.; Wei, L.; Mazo, C.; Becker, B.A.; Mooney, C. Current Challenges and Future Opportunities for XAI in Machine Learning-Based Clinical Decision Support Systems: A Systematic Review. Appl. Sci. 2021, 11, 5088.
World Health Organization. Ethics and Governance of Artificial Intelligence for Health: WHO Guidance. 2021. Available online: https://www.who.int/publications/i/item/9789240029200 (accessed on 4 October 2022).
Panigutti, C.; Beretta, A.; Giannotti, F.; Pedreschi, D. Understanding the Impact of Explanations on Advice-Taking: A User Study for AI-Based Clinical Decision Support Systems. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, New Orleans, LA, USA, 29 April–5 May 2022; Association for Computing Machinery: New York, NY, USA, 2022.
Antoniadi, A.M.; Galvin, M.; Heverin, M.; Wei, L.; Hardiman, O.; Mooney, C. A Clinical Decision Support System for the Prediction of Quality of Life in ALS. J. Pers. Med. 2022, 12, 435.
Schoonderwoerd, T.A.; Jorritsma, W.; Neerincx, M.A.; van den Bosch, K. Human-centered XAI: Developing design patterns for explanations of clinical decision support systems. Int. J. Hum.-Comput. Stud. 2021, 154, 102684.
Hwang, J.; Lee, T.; Lee, H.; Byun, S. A Clinical Decision Support System for Sleep Staging Tasks with Explanations from Artificial Intelligence: User-Centered Design and Evaluation Study. J. Med. Internet Res. 2022, 24, e28659.
Bussone, A.; Stumpf, S.; O’Sullivan, D. The role of explanations on trust and reliance in clinical decision support systems. In Proceedings of the 2015 International Conference on Healthcare Informatics, Dallas, TX, USA, 21–23 October 2015; IEEE: Dallas, TX, USA, 2015; pp. 160–169.
Parasuraman, R.; Riley, V. Humans and automation: Use, misuse, disuse, abuse. Hum. Factors 1997, 39, 230–253.
Wang, X.; Yin, M. Are explanations helpful? A comparative study of the effects of explanations in AI-assisted decision-making. In Proceedings of the 26th International Conference on Intelligent User Interfaces, College Station, TX, USA, 14–17 April 2021; pp. 318–328.
Du, Y.; Rafferty, A.R.; McAuliffe, F.M.; Wei, L.; Mooney, C. An explainable machine learning-based clinical decision support system for prediction of gestational diabetes mellitus. Sci. Rep. 2022, 12, 1170.
Kennelly, M.A.; Ainscough, K.; Lindsay, K.L.; O’Sullivan, E.; Gibney, E.R.; McCarthy, M.; Segurado, R.; DeVito, G.; Maguire, O.; Smith, T.; et al. Pregnancy exercise and nutrition with smartphone application support: A randomized controlled trial. Obstet. Gynecol. 2018, 131, 818–826.
Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4765–4774.
Hu, C.A.; Chen, C.M.; Fang, Y.C.; Liang, S.J.; Wang, H.C.; Fang, W.F.; Sheu, C.C.; Perng, W.C.; Yang, K.Y.; Kao, K.C.; et al. Using a machine learning approach to predict mortality in critically ill influenza patients: A cross-sectional retrospective multicentre study in Taiwan. BMJ Open 2020, 10, e033898.
Harvey, N.; Fischer, I. Taking advice: Accepting help, improving judgment, and sharing responsibility. Organ. Behav. Hum. Decis. Process. 1997, 70, 117–133.
Yaniv, I. Receiving other people’s advice: Influence and benefit. Organ. Behav. Hum. Decis. Process. 2004, 93, 1–13.
Hsu, K.H.; Chiu, C.; Chiu, N.H.; Lee, P.C.; Chiu, W.K.; Liu, T.H.; Hwang, C.J. A case-based classifier for hypertension detection. Knowl.-Based Syst. 2011, 24, 33–39.
Cai, C.J.; Jongejan, J.; Holbrook, J. The Effects of Example-Based Explanations in a Machine Learning Interface. In Proceedings of the 24th International Conference on Intelligent User Interfaces, Marina del Rey, CA, USA, 17–20 March 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 258–262.
McDermid, J.A.; Jia, Y.; Porter, Z.; Habli, I. Artificial intelligence explainability: The technical and ethical dimensions. Philos. Trans. R. Soc. A 2021, 379, 20200363.
Kenny, E.M.; Keane, M.T. Explaining Deep Learning using examples: Optimal feature weighting methods for twin systems using post-hoc, explanation-by-example in XAI. Knowl.-Based Syst. 2021, 233, 107530.
Caruana, R.; Kangarloo, H.; Dionisio, J.D.; Sinha, U.; Johnson, D. Case-based explanation of non-case-based learning methods. In Proceedings of the AMIA Symposium, American Medical Informatics Association, Washington, DC, USA, 6–10 November 1999; p. 212.
van der Waa, J.; Nieuwburg, E.; Cremers, A.; Neerincx, M. Evaluating XAI: A comparison of rule-based and example-based explanations. Artif. Intell. 2021, 291, 103404.
Matthews, M.; Doherty, G.; Coyle, D.; Sharry, J. Designing mobile applications to support mental health interventions. In Handbook of Research on User Interface Design and Evaluation for Mobile Technology; IGI Global: Hershey, PA, USA, 2008; pp. 635–656.
Kolachalama, V.B.; Garg, P.S. Machine learning and medical education. NPJ Digit. Med. 2018, 1, 1–3.

Figure 1. Interaction between healthcare practitioners and a CDSS, classified into over-reliance, self-reliance, and appropriate reliance.

Figure 2. Workflow overview. The order of the explanations and of the patients/cases is randomized.

Figure 3. CDSS prediction and explanation by feature contribution for an exemplar case.

Figure 4. CDSS prediction and explanation by example for an exemplar case.

Figure 5. WOA for explanation by feature contribution and explanation by example. This figure shows that our CDSS with either type of explanation had a significant impact on decision-making (WOA significantly greater than zero).

Figure 6. WOA for explanation by feature contribution and explanation by example for correctly and incorrectly predicted cases.

Figure 7. Healthcare practitioners’ preference for explanations in a CDSS.

Figure 8. Comparison of the overall WOA of an explainable CDSS between subgroups of participants. (a) Comparison between obstetricians and other HCPs (midwives and dietitians). (b) Comparison between HCPs with ⩽ and >10 years of experience. (c) Comparison between HCPs with and without prior experience of CDSS use. (d) Comparison between all HCPs who were less and more inclined to CDSS use. (e) Comparison between obstetricians who were less and more inclined to CDSS use. Significant differences are highlighted in bold. HCP: healthcare practitioner.

Figure 9. WOA for explanation by feature contribution and explanation by example among participants who preferred explanation by feature contribution.
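The WOA plotted in Figures 5, 6, 8 and 9 is commonly defined in the advice-taking literature (Harvey and Fischer; Yaniv) as WOA = |final − initial| / |advice − initial|: the distance a participant moves from their initial estimate relative to the distance between the advice and that initial estimate. The Python sketch below assumes this definition. Truncating values above 1 and excluding cases where the advice equals the initial estimate are common conventions that we assume here, because this section does not restate how the study handled them. The estimates in the example are hypothetical.

```python
from statistics import median
from typing import Optional


def weight_of_advice(initial: float, advice: float, final: float,
                     cap: bool = True) -> Optional[float]:
    """Weight of advice (WOA) = |final - initial| / |advice - initial|.

    0 means the advice was ignored and 1 means it was fully adopted.
    Returns None when the advice equals the initial estimate, because the
    ratio is then undefined. With cap=True, values above 1 (moving beyond
    the advice) are truncated to 1; this convention is an assumption.
    """
    distance = abs(advice - initial)
    if distance == 0:
        return None
    woa = abs(final - initial) / distance
    return min(woa, 1.0) if cap else woa


# Hypothetical (initial estimate, CDSS advice, final estimate) triples, in %.
cases = [(30, 70, 50), (20, 60, 20), (50, 10, 0), (40, 40, 45)]
woas = [w for w in (weight_of_advice(*c) for c in cases) if w is not None]
print(woas)          # [0.5, 0.0, 1.0]
print(median(woas))  # 0.5
```

Because the definition uses absolute values, it measures how far a participant moved rather than the direction of the move; signed variants exist in the literature, and which form suits a given analysis is a design choice.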

Table 1. Comparison of WOA between the two XAI methods for all participants and for subgroups of participants. The median was used as the cutoff for continuous features (years of experience).

Subgroup	No. of Participants	WOA Median (Feature Contribution)	WOA Median (Example)	V	p-Value
All	25	0.4	0.380	536.5	0.775
Type of HCP¹
Obstetricians	19	0.5	0.4	304	0.654
Others (midwives, dietitians)	6	0.367	0.236	36	0.824
Years of experience
>10 years	12	0.317	0.314	113.5	0.958
⩽10 years	13	0.5	0.4	164.5	0.790

¹ HCP: healthcare practitioner.
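The V column in Tables 1 and 2 is consistent with the Wilcoxon signed-rank statistic as reported by R's `wilcox.test`, that is, the sum of the ranks of the positive paired differences; we assume that definition here. The unit of pairing (participant or case) is not restated in this section. The Python sketch below computes V with zero differences dropped and tied absolute differences given average ranks, then obtains an exact two-sided p-value by enumerating every sign assignment of the observed ranks via a subset-sum count. It is not the authors' analysis code. With ties, statistical packages often switch to a normal approximation, so p-values can differ from those in the tables. The paired values are hypothetical.

```python
from typing import List, Sequence, Tuple


def signed_rank_v(x: Sequence[float], y: Sequence[float]) -> Tuple[float, List[float]]:
    """Wilcoxon signed-rank statistic V for paired data.

    Zero differences are dropped, tied absolute differences receive their
    average rank, and V is the sum of the ranks of the positive differences.
    Returns V together with the ranks of all nonzero differences.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length (paired data)")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over the run of tied absolute differences starting at i
        # (exact equality; round inputs first if they carry float noise).
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    v = sum(r for r, d in zip(ranks, diffs) if d > 0)
    return v, ranks


def exact_two_sided_p(v: float, ranks: Sequence[float]) -> float:
    """Exact two-sided p-value for V by enumerating all sign assignments.

    Under the null hypothesis each nonzero difference is equally likely to be
    positive or negative, so V is the rank sum of a uniformly random subset.
    Ranks are doubled so that average (half-integer) ranks become integers,
    and a subset-sum count gives the number of subsets for every total.
    """
    doubled = [int(round(2 * r)) for r in ranks]
    total = sum(doubled)
    counts = [0] * (total + 1)  # counts[s]: subsets whose doubled sum is s
    counts[0] = 1
    for r in doubled:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    centre = total / 2  # null mean of the doubled V
    extreme = abs(2 * v - centre)
    hits = sum(c for s, c in enumerate(counts) if abs(s - centre) >= extreme)
    return hits / 2 ** len(doubled)


# Hypothetical paired WOA values for the two explanation types.
feature_contribution = [0.5, 0.75, 0.25, 1.0, 0.0, 1.0]
example = [0.25, 0.0, 0.25, 0.25, 0.5, 0.0]
v, ranks = signed_rank_v(feature_contribution, example)
print(v, exact_two_sided_p(v, ranks))  # 13.0 0.1875
```

Counting subset sums keeps the exact test cheap even for a few dozen pairs, since the work grows with the number of pairs times the total rank sum rather than with 2 to the power of the number of pairs.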

Table 2. Comparison of WOA between correctly and incorrectly predicted cases for the two XAI methods in subgroups of participants. The median was used as the cutoff for continuous features (years of experience).

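Subgroup	No. of Participants	Feature Contribution: WOA Median (Correct)	Feature Contribution: WOA Median (Incorrect)	Feature Contribution: V	Feature Contribution: p-Value	Example: WOA Median (Correct)	Example: WOA Median (Incorrect)	Example: V	Example: p-Value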
Type of HCP¹
Obstetricians	19	0.6	0.5	94.5	0.711	0.333	0.5	32	0.066
Others (midwives, dietitians)	6	0.45	0.333	12	0.281	0.236	0.271	12	0.844
Years of experience
>10 years	12	0.4	0.317	44	0.351	0.25	0.392	25	0.301
⩽10 years	13	0.5	0.5	39	1	0.333	0.429	22	0.610

¹ HCP: healthcare practitioner.
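Tables 1 and 2 form subgroups by splitting participants at the median of a continuous feature (years of experience). A minimal Python sketch of such a split follows. Participants exactly at the median are assigned to the lower group to match the "⩽10 years" label. The years of experience and per-participant WOA values are hypothetical, and summarizing each participant by a single WOA value is our assumption for illustration.

```python
from statistics import median
from typing import List, Sequence, Tuple


def median_split(values: Sequence[float]) -> Tuple[float, List[int], List[int]]:
    """Split participant indices at the median of a continuous feature.

    Participants exactly at the median go to the '<=' group, matching the
    '⩽10 years' versus '>10 years' grouping in Tables 1 and 2.
    """
    cutoff = median(values)
    low = [i for i, v in enumerate(values) if v <= cutoff]
    high = [i for i, v in enumerate(values) if v > cutoff]
    return cutoff, low, high


def subgroup_median_woa(woa: Sequence[float], members: Sequence[int]) -> float:
    """Median WOA over the participants listed in members."""
    return median(woa[i] for i in members)


years = [3, 12, 8, 25, 10, 15]         # hypothetical years of experience
woa = [0.5, 0.3, 0.4, 0.2, 0.6, 0.35]  # hypothetical per-participant WOA
cutoff, low, high = median_split(years)
print(cutoff, low, high)  # 11.0 [0, 2, 4] [1, 3, 5]
print(subgroup_median_woa(woa, low), subgroup_median_woa(woa, high))  # 0.5 0.3
```

With an even number of participants the median can fall between two observed values, as here; with an odd number, the participant at the median joins the lower group, so the two subgroups need not be the same size.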

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

