Designing an Interpretability-Based Model to Explain the Artificial Intelligence Algorithms in Healthcare

The lack of interpretability in artificial intelligence models (i.e., deep learning, machine learning, and rules-based) is an obstacle to their widespread adoption in the healthcare domain. The absence of understandability and transparency frequently leads to (i) inadequate accountability and (ii) a consequent reduction in the quality of the predictive results of the models. Conversely, interpretability in the predictions of AI models fosters clinicians' understanding of, and trust in, these complex models. Data protection regulations worldwide also emphasize the plausibility and verifiability of AI models' predictions. To help tackle this challenge, we designed an interpretability-based model whose algorithms approximate human-like reasoning through statistical analysis of the datasets, calculating the relative weights of the variables derived from the features of the medical images and the patient's symptoms. The relative weights represent the importance of the variables in predictive decision-making. In addition, the relative weights are used to find the positive and negative probabilities of having the disease, yielding high-fidelity explanations. The primary goal of our model is thus to give insight into the prediction process of AI models and to explain how their predictions are reached. Consequently, our model contributes accurate, interpretable predictions. Two experiments on COVID-19 datasets demonstrate the effectiveness and interpretability of the new model.


Introduction
In recent years, with the development of artificial intelligence (AI), technologies such as machine learning (ML) and deep learning (DL) have achieved great success in computer vision, natural language processing, speech recognition, and other fields. ML models have also been widely applied to vital real-world tasks, such as face recognition [1,2], malware detection, and smart medical analysis [3]. Although AI outperforms humans in many meaningful tasks, its performance and applications have also been questioned due to the lack of interpretability [4]. A model is interpretable when it is small and simple enough to be comprehended in its entirety. Ideally, the user should understand the learning process well enough to realize how it forms the decision boundaries from the training data and why the model has these rules [5]. For ordinary users, a machine-learning model, particularly a deep neural network (DNN), is similar to a black box: we give it an input, and it feeds back a decision result. No one knows exactly the basis for its decisions or whether the decisions it makes are reliable. With the wide application of AI solutions in healthcare, it becomes increasingly critical to improve understanding of the working mechanism of the model and to turn the black box of artificial intelligence into a white box [6]. Building trust in machine-learning models has become a prerequisite for the ultimate adoption of AI systems. Hence, it is crucial to improve model transparency and interpretability.

Motivation
Healthcare presents particular ethical, legal, and regulatory problems since decisions can have an immediate impact on people's well-being or lives [10]. One of the major implementation challenges highlighted is the inability to explain the decision-making progress of AI systems to physicians and patients [10]. Clinicians must be confident that AI systems can be trusted because they must provide the best treatment to each patient. Therefore, developing interpretable models might be a step toward trustworthy AI in healthcare. The area of explainable AI seeks to gain knowledge into how and why AI models make predictions while retaining high levels of predictive performance. Although the interpretability of AI models holds significant promises for health care, it is still in its early stages. Among other things, it is unclear what constitutes a sufficient explanation and how its quality should be assessed. Furthermore, the benefit of interpretability of AI systems has yet to be demonstrated in reality [11].
In this paper, we contribute to the larger goal of creating trustworthy AI models in healthcare by designing a new model that extends the state of the art of interpretability techniques, applying statistics and probability rules to produce accurate interpretations of the predictions of AI algorithms.

Background: Statistics and Probability Techniques
In general, the interpretability-based model relies on statistics and probability rules to train the datasets.
We distinguish two interpretability strategies that adopt probabilities in their findings, have solid theoretical backgrounds, and are easy to implement: (i) Local Interpretable Model-Agnostic Explanations (LIME) and (ii) Deep Learning Important FeaTures (DeepLIFT).

Local Interpretable Model-Agnostic Explanations (LIME)
Ribeiro et al. [12] introduced a surrogate model that uses a trained local model to interpret a single sample. The black-box model is explained by taking an instance of interest, perturbing it to generate new sample points, and obtaining their predicted values. LIME uses this new dataset to train an interpretable model (such as linear regression or a decision tree) to obtain a close local approximation to the black-box model. The framework consists of two parts, LIME and SP-LIME: while LIME approximates the model locally with high fidelity, SP-LIME selects non-redundant instances (basically covering all features) to explain the global behavior of the model. Additionally, LIME can interpret the classification results of medical images and can also be applied to related tasks in natural language processing, such as topic classification, part-of-speech tagging, etc. Because the starting point of LIME is model-independence, it has broad applicability [13].
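As an illustration of the local-surrogate idea behind LIME (not the LIME library itself), the following sketch perturbs an instance, weights the samples by proximity, and fits a weighted linear model whose coefficients serve as the local explanation; the black-box function here is a hypothetical stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box classifier: probability driven by a nonlinear
# mix of two features (stands in for any opaque model).
def black_box(X):
    return 1.0 / (1.0 + np.exp(-(3.0 * X[:, 0] - 2.0 * X[:, 1] ** 2)))

x0 = np.array([0.5, 0.2])          # instance to explain

# 1) Perturb around the instance and query the black box.
Z = x0 + rng.normal(scale=0.3, size=(500, 2))
y = black_box(Z)

# 2) Weight samples by proximity to x0 (exponential kernel).
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.25)

# 3) Fit a weighted linear surrogate: its coefficients are the local explanation.
A = np.hstack([Z, np.ones((len(Z), 1))])      # add intercept column
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
print("local feature weights:", coef[:2])
```

Near x0, the surrogate recovers the signs of the local gradient: feature 0 pushes the prediction up, feature 1 (through the −2x₁² term) pushes it down.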

Deep Learning Important FeaTures or DeepLIFT
DeepLIFT is a method for dissecting the output prediction of a neural network on a given input by backpropagating the contributions of all neurons in the network to each feature of the input. DeepLIFT assigns value to neurons depending on their activity relative to a reference, so it can yield informative attributions even when the local gradient is zero, where gradient-based findings might be deceptive. However, DeepLIFT can produce surprisingly distinct attribution maps from input CT images that differ only by minor perturbations and are visually identical. Moreover, DeepLIFT can reveal dependencies that other methods overlook by distinguishing between negative and positive contributions [14].
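DeepLIFT's difference-from-reference idea can be illustrated in its simplest case, a single linear unit, where the contribution of feature i is w_i(x_i − x_ref,i) and the contributions sum exactly to the output change ("summation to delta"); the weights and inputs below are illustrative:

```python
import numpy as np

# Linear-rule sketch of DeepLIFT: attribute the change in output relative
# to a reference input, feature by feature. Values are illustrative.
w = np.array([2.0, -1.0, 0.5])       # weights of the linear unit
x = np.array([1.0, 3.0, -2.0])       # input being explained
x_ref = np.array([0.0, 0.0, 0.0])    # reference ("baseline") input

contrib = w * (x - x_ref)            # per-feature contributions
delta_out = w @ x - w @ x_ref        # total output change
print(contrib, delta_out)

# Summation-to-delta: the contributions exactly explain the output change.
assert np.isclose(contrib.sum(), delta_out)
```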

Definition of Concepts
Some neighboring concepts are occasionally used as synonyms for transparency, including interpretability, explainability, and understandability [15]. However, there is a subtle difference between explainability and interpretability. The model is considered inherently interpretable if a person can comprehend its underlying workings, either the complete model at once or at least the elements of the model relevant to a specific prediction. It may include understanding decision criteria and cut-offs and the ability to compute the model's outputs manually. In contrast, we consider the model's prediction explainable if a process can offer (partial) knowledge about the model's workings. These details include identifying which elements of input were most significant for the resulting forecast or which adjustments to input would result in a different prediction [5]. Moreover, transparency implies that the behavior of artificial intelligence and its related components are understandable, explainable, and interpretable by humans. Besides, understandability means that the decisions made by the artificial intelligence model can reach a certain degree of understanding [16].
Before introducing specific interpretability problems and corresponding solutions, we briefly introduce what interpretability is and why it is needed. In data mining and machine-learning scenarios, interpretability is defined as the ability to explain a model or present it in terms understandable to humans [17]. In essence, interpretability is the interface between humans and the decision model, which is both an accurate proxy for the decision model and understandable by humans [18]. In top-down machine learning, which builds models on a set of statistical rules and assumptions, interpretability is critical because it is the cornerstone of the defined rules and assumptions. Furthermore, model interpretability is a critical means of verifying that the assumptions are robust and that the defined rules are well suited to the task. Unlike top-down tasks, bottom-up machine learning usually corresponds to the automation of manual and onerous tasks. Given a batch of training data, the model automatically learns the mapping between the input data and the output categories by minimizing the learning error. In bottom-up learning tasks, the model is built automatically, so we do not know its learning process or its working mechanism. Therefore, interpretability aims to help people understand how a machine-learning model learns [8].

Related Works
In recent years (2016-2022), the interpretability of the various artificial intelligence models has attracted great attention from the academic and business sectors. Researchers have successively proposed several interpretation methods to mitigate the "black box" problem of these models. We distinguish three interpretability categories, each with its own characteristics, advantages, and disadvantages. Rules-based interpretation models: A linear method was proposed in [19] by adding regularization to the tree, reducing the nodes of the decision tree and solving the problem of a vast number of nodes without losing accuracy. An interpretable tree framework was proposed in [20] which can be applied to classification and regression problems by extracting, measuring, pruning, and selecting rules from tree collections and computing frequent variable interactions. This model also forms a rules-based learner, a simplified tree ensemble learner (STEL), and uses it in prediction. A method was proposed in [21] that learns rules to globally explain the behavior of black-box machine-learning models used to solve classification problems. It works by first extracting the important conditions at the instance level and then running a genetic algorithm with a suitable fitness function over the rules. These rules represent the patterns by which the model makes decisions and help to understand its behavior.

Bayesian Nonparametric Approach
Guo et al. [11] designed a Bayesian nonparametric model to define an infinite-dimensional parameter space. In other words, the size of this model can adapt to the change in the AI model as the data are increased or decreased. This model can be determined according to how many data parameters are selected. It only needs a small assumption to learn data and perform clustering. The increasing data also can be continuously aggregated into corresponding classes. At the same time, this model also performs predictions. According to the specific learning problem, a spatial data model composed of all parameters related to this problem can be solved.

GAM
A global attribution method called GAM was proposed in [4], which explains patterns of neural-network predictions across subpopulations. The global interpretation of GAM describes the nonlinear representation learned by the neural network. GAM also provides adjustable subpopulation granularity and the ability to trace global interpretations back to specific samples.

MAPLE
MAPLE may be used almost identically as an explanation of a black-box model or as a predictive model; the main difference is that in the first case MAPLE is fitted to the black-box model's predictions, whilst in the second it is fitted to the response variable. MAPLE has several intriguing features compared with LIME, mentioned below: (i) it avoids the trade-off between model performance and model interpretability, since MAPLE is a highly accurate predictive model capable of generating correct predictions; (ii) it finds global trends by using local examples and explanations. MAPLE stands out from other interpretability frameworks due to its use of the training distribution [22].

Anchors
Anchors is a model-independent, rule-based local explainer approach [23]. Anchors ensure that the forecasts of occurrences in the same anchor are nearly identical. In other words, anchors identify the qualities that are sufficient to correct the forecast while modifying the other attributes that do not affect the prediction. The bottom-up approach, in which anchors are built sequentially, is one method of anchor construction. Anchors, in particular, begin with an empty rule and extend it with one feature in each iteration until the resulting rule has the greatest estimated accuracy [23].
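A minimal sketch of this bottom-up construction, using a toy black box over binary features and an assumed precision threshold (this is not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy black box over two binary features: predicts 1 iff feature 0 is 1.
def model(X):
    return (X[:, 0] == 1).astype(int)

x0 = np.array([1, 1])                 # instance to explain
pred = model(x0[None])[0]

def precision(anchor, n=1000):
    """Estimated precision of a candidate anchor: fix the anchored features
    at x0's values, sample the rest, and see how often the prediction holds."""
    Z = rng.integers(0, 2, size=(n, 2))
    for i in anchor:
        Z[:, i] = x0[i]
    return float(np.mean(model(Z) == pred))

# Bottom-up construction: start from the empty rule and greedily add the
# feature that most improves precision until it exceeds a threshold.
anchor, tau = [], 0.95
while precision(anchor) < tau:
    best = max((i for i in range(2) if i not in anchor),
               key=lambda i: precision(anchor + [i]))
    anchor.append(best)
print("anchor fixes features:", anchor)
```

Here the search correctly discovers that fixing feature 0 alone is sufficient to preserve the prediction, regardless of feature 1.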

SHAP
A game-theory concept was used to quantify the impact of each feature on the prediction process. The Shapley value [24] is a mechanism from coalitional game theory that can properly distribute benefits across players (features) when the players' contributions are unequal. In other words, Shapley values are founded on the assumption that features work together to influence the model's prediction toward a specific value; the method then attempts to distribute their contributions fairly across all feature subsets. In particular, the Shapley value distributes the difference between the prediction and the average prediction equitably among the feature values of the instance to be explained. The Shapley value fulfills three intriguing properties [25].
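The classic Shapley formula can be computed exactly for a toy value function (an additive one here, chosen for illustration); each feature's Shapley value is its weighted average marginal contribution over all subsets of the other features:

```python
from itertools import combinations
from math import factorial

# Illustrative coalitional game: the "model output" on a feature subset is
# additive in per-feature payoffs (feature 1 matters twice as much as 0).
payoff = {0: 1.0, 1: 2.0, 2: 0.0}
features = [0, 1, 2]
v = {frozenset(): 0.0}
for r in range(1, len(features) + 1):
    for S in combinations(features, r):
        v[frozenset(S)] = sum(payoff[i] for i in S)

def shapley(i):
    """Weighted average marginal contribution of feature i over all subsets."""
    n, total = len(features), 0.0
    for r in range(n):
        for S in combinations([j for j in features if j != i], r):
            S = frozenset(S)
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (v[S | {i}] - v[S])
    return total

phi = [shapley(i) for i in features]
print(phi)
# Efficiency property: contributions sum to the full-coalition value.
assert abs(sum(phi) - v[frozenset(features)]) < 1e-9
```

For an additive game the Shapley value of each feature equals its own payoff, which makes the fairness of the allocation easy to verify by hand.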

Perturbation-Based Methods
Perturbation is the most basic method for examining the impact of modifying an AI model's input properties on its output. This can be accomplished by eliminating, masking, or changing specific input variables, then conducting the forward pass (output calculation) and comparing the results to the original output. This is comparable to the sensitivity analysis conducted in parametric control system models. The input features that have the greatest influence on the output are ranked first. It is computationally intensive since a forward pass must be performed after perturbing each collection of characteristics in the input [26]. In the case of picture data, the perturbation is accomplished by covering sections of the image with a grey patch and thereby obscuring them from the system's perspective. It can give both positive and negative evidence by identifying the responsible characteristics [27].
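A minimal occlusion-style sketch of this procedure, with a hypothetical model standing in for the AI system: each feature is masked in turn, the forward pass is rerun, and features are ranked by how much the output changes.

```python
import numpy as np

# Hypothetical model standing in for any AI system's forward pass.
def forward(x):
    return 4.0 * x[0] + 0.5 * x[1] - 2.0 * x[2]

x = np.array([1.0, 1.0, 1.0])
baseline = forward(x)

impact = []
for i in range(len(x)):
    x_masked = x.copy()
    x_masked[i] = 0.0                      # occlude one feature
    impact.append(abs(forward(x_masked) - baseline))

ranking = np.argsort(impact)[::-1]         # most influential feature first
print(ranking)
```

As the text notes, this is computationally intensive in general: one forward pass per perturbed feature set, which for images means one pass per occluding patch position.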

Attention Based
The fundamental concept of attention is motivated by the way people pay attention to various areas of a picture or other data sources in order to interpret them. The technique employed attention mechanisms to show the detection process, which included an image model and a language model [28]. The language model discovered dominant and selective characteristics to learn the mapping between visuals and diagnostic reports using that attention mechanism [26].

Concept Vectors
A unique approach called Testing with Concept Activation Vectors (TCAV) was developed in [29] to explain the characteristics learnt by successive layers, in terms of human-understandable concepts, to domain experts who do not have deep-learning knowledge. It uses the directional derivative of the network in concept space, similar to how saliency maps use it in input feature space. It was put to the test explaining diabetic retinopathy (DR) level predictions, and it effectively recognized the existence of microaneurysms and aneurysms in the retina. This gave medical practitioners explanations that were easily interpretable in terms of the presence or absence of a specific concept or physical structure in the image [26]. However, many clinical concepts, such as the texture or form of a structure, cannot be adequately defined in terms of presence or absence and require a continuous scale of assessment.

Similar Images
In [30], a method was proposed that analyzes the layers of a 3D-CNN using a Gaussian mixture model (GMM) and binary encoding of training and test images based on their GMM components to yield comparable 3D images as explanations. As an explanation for its conclusion, the algorithm returned activation-wise similar training images using an atlas. It was demonstrated on 3D MNIST and an MRI dataset, where it yielded images with identical atrophy conditions. However, it was shown that in some circumstances activation similarity depended on the spatial orientation of the images, which might influence the choice of the returned images [26].

Textual Justification
This model explains its decision in terms of words or phrases that describe its logic and can communicate directly with both expert and ordinary users [26]. A justification model that used inputs from the classifier's visual characteristics, as well as prediction embeddings, was employed to construct a diagnostic phrase and visual heatmaps for breast-mass categorization [31]. In order to generate justifications in the presence of a restricted quantity of medical reports, the justification generator was trained using a visual word constraint loss [26].

Intrinsic Explainability
Intrinsic explainability explains its decisions in terms of human visible decision limits or variables. For a few dimensions where the decision boundaries can be viewed, they generally comprise relatively simpler models such as regression, decision trees, and SVM [32].

Recurrent Neural Network (RNN)
Ref. [33] proposed an RNN model that combines a two-layer attention mechanism for sequential numerical data. The method gives a detailed explanation of the prediction results while retaining the relative accuracy of the RNN.

Limits of the Existing Solutions
Significant progress has been made in explaining the decisions of the AI models, particularly those implemented in medical diagnosis. Understanding the features responsible for a particular prediction helps model designers iron out dependability problems so that end users may acquire trust and make better decisions [26]. Almost all of these strategies aim towards local explainability or justifying decisions in a particular case. However, it is essential to consider the characteristics of a black-box that might make the wrong decision for the wrong reason. It is a significant issue that can have an impact on performance when the system is implemented in the real world [26].
Moreover, when considering the above interpretability methods, there is a lack of quantitative judgments, which indicates their low explanation fidelity. The AI algorithms in the healthcare domain make their decisions in the domain of positive or negative test results, which explains their low accuracy [34]. There is therefore a need to make explainability approaches more comprehensive and intertwined with uncertainty methods [26]. In response, and to take a role in tackling these challenges, we contribute to the field by designing an interpretability-based model that explains the predictions of AI algorithms in healthcare. This approach simulates human-like reasoning abilities and constructs explanations by describing the various features of the medical image and the symptoms of the patient. We validate our model by performing experiments on COVID-19 datasets to demonstrate its effectiveness and interpretability. Table 1 shows the limitations of the existing interpretability methods.

The Interpretability-Based Model
The core principle of the new model is the use of statistics and probability rules to train the datasets, finding the relative weights of the variables, which represent their relative importance in determining the prediction and the probability of having the disease. The variables are either the symptoms of the patient or the characteristics of the affected parts of the organ as shown in the medical image. The relative weights are calculated by dividing the weight of each variable by the sum of the weights of all variables. The result of training the dataset is the likelihood of positive infection. Our model contains two interpretability algorithms for training the datasets: the first interprets the predictions of neural-network models, shown in Figure 1; the second explains the decisions of the rules-based models more precisely, as shown in Figure 2.
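The relative-weight calculation described above takes only a few lines; the counts below are illustrative, not taken from the paper's tables:

```python
# Relative weights: each variable's weight (e.g., its count in the training
# data) divided by the sum of all weights. Counts here are illustrative.
counts = {"GGO": 65, "consolidation": 33, "pleural effusion": 4}
total = sum(counts.values())
relative_weights = {k: v / total for k, v in counts.items()}
print(relative_weights)

# By construction the relative weights sum to 1.
assert abs(sum(relative_weights.values()) - 1.0) < 1e-9
```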

Dataset Requirements
The datasets must be labeled and representative. They must include all the salient features of the disease's medical image and the symptoms. Furthermore, they should include the counting details, i.e., the number of variables with specific descriptions, characteristics, and values.

Defining the Variables
The variables of the interpretable-based model are (i) the symptoms of the disease and (ii) the spatial features of the medical image.

Relative Weights as Ranking
Explanations contain variables encoded by treating each as a vector in R^n: the relative weight ranks and considers each variable's association with a particular prediction. Each variable is a vector that answers the question: How important is this variable for a particular prediction? Or, to what extent does this variable figure in the decision of the machine-learning model? We treat variables as relative weighted conjoined rankings.



Creating the Explanations
The interpretability-based model algorithms use the relative weights of the variables, to define the average of repetitions of the variables' characteristics in the dataset, which measures the relative importance of the variables in the prediction of the AI algorithm.
The outcomes of our model include the positive and negative probabilities of having the disease.
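As a minimal sketch of this explanation step, assuming illustrative relative weights: the positive probability is the sum of the relative weights of the variables observed in the patient, and the negative probability is its complement.

```python
# Illustrative relative weights for a few symptoms (not the paper's values).
relative_weights = {"fever": 0.172, "dry cough": 0.172,
                    "headache": 0.029, "myalgia": 0.065}

def likelihoods(observed, weights):
    """Positive probability = sum of relative weights of observed variables;
    negative probability = its complement."""
    pos = sum(weights[v] for v in observed)
    return pos, 1.0 - pos

pos, neg = likelihoods(["fever", "dry cough"], relative_weights)
print(pos, neg)
```

The relative weights themselves double as the explanation: each term in the sum tells the clinician how much an observed variable contributed to the positive probability.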

Validating the Interpretability-Based Model
Research in interpretability inherently faces the challenge of effective and reliable validation [17]. Identifying the appropriate validation methodology for a proposed approach is an open research question [35]. This paper validates the interpretability-based model using real datasets for COVID-19 patients. We use one dataset for medical images and another for symptoms, as shown in Tables 2 and 3, respectively. The data were gathered from open-access data [36] and evaluated to improve clinical decisions and treatment. There are a total of 112 confirmed COVID-19 patients (ages 12-89 years), including 51 males (ages 25-89 years) and 61 females (ages 12-86 years). The data are public and available on the web [36].

Validating Our Model to Interpret the Predictions of the Neural-Networks Model
The neural-networks model is used to provide the predictions for COVID-19 patients by training the dataset of the chest CT images and making the prediction based on analyzing the medical image of the tested patient. However, the machine-learning models do not produce explanations for the predictions.
The suggested model provides the interpretation based on the saved counting data for a set of chest CT images, shown in Table 2 and Figure 4, where the relative weights are calculated by dividing the weight of each variable by the sum of all weights [36].
We validate the model by applying Algorithm 1 of the interpretability-based model to the counting data in Table 2, as shown in Figure 2: (1) Define the variables of the explanation set: distribution of pulmonary lesions (no lesion, peripheral, central, diffuse), involvement of the lung (no involvement, single lobe, unilateral multilobe, bilateral multilobe), GGO, crazy-paving pattern, consolidation, linear opacities, air bronchogram, cavitation, bronchiectasis, pleural effusion, pericardial effusion, lymphadenopathy, and pneumothorax. (2) Train the dataset by finding the relative weights of the variables, as shown in Table 4, and generate all the probable explanations for the patient by summing the related relative weights and calculating the positive and negative probabilities using the following formulas:
+LR = sum of the relative weights of the observed variables
−LR = 1 − (+LR)
For instance, if the physician suspects a COVID-19 case after applying a deep-learning model to the CT image, they may use the interpretability-based model to explain the prediction for the patient. To do so, they should note the features present in the CT image, e.g., GGO, bronchiectasis, pericardial effusion, consolidation, involvement of the lung (bilateral multilobe), and distribution of pulmonary lesions (peripheral). Using our model, the positive likelihood ratio (+LR) according to Table 4 is:
18.5% + 4.4% + 0.3% + 9.4% + 13.5% + 13.1% = 59.2%
whereas the negative likelihood ratio (−LR) is:
100% − 59.2% = 40.8%
The physician can explain the likelihood ratios for the patient using the indication of the relative weights of the features. Table 5 includes all the possible predictions of our model based on the trained dataset.
Where (+) represents the existence of the symptom and (−) represents its absence.
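The arithmetic of this worked example can be checked directly; the weights below are the Table 4 values quoted in the text:

```python
# Relative weights (in percent) of the CT features observed in the patient,
# as quoted in the worked example above (Table 4 values).
weights = {"GGO": 18.5, "bronchiectasis": 4.4, "pericardial effusion": 0.3,
           "consolidation": 9.4, "bilateral multilobe involvement": 13.5,
           "peripheral distribution": 13.1}

pos_lr = sum(weights.values())   # +LR: sum of observed relative weights
neg_lr = 100.0 - pos_lr          # -LR: complement
print(pos_lr, neg_lr)
```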

Validating Our Model to Interpret the Predictions of the Rules-Based Models
Generally, the rules-based models provide low explanation fidelity because their decisions are in the domain of positive or negative results. We apply the rules-based interpretability Algorithm 2 to the dataset in Table 3, which includes the symptoms of COVID-19 patients who have at most 13 symptoms, to generate the explanations as shown in Figure 1 and as follows:

1. Define the variables of the explanation model, which are the symptoms: Fever, Dizziness, Palpitation, Throat pain, Nausea and vomiting, Headache, Abdominal pain and diarrhea, Expectoration, Dyspnea, Myalgia, Chest distress, Fatigue, and Dry Cough.

2. Train the dataset by calculating the relative weights of the variables, dividing the ratio weight of each symptom by the sum of all weights, as shown in Table 6.

3. Generate the explanations for the patient by summing the related relative weights and calculating the positive and negative probabilities:
+LR = sum of the relative weights of the observed variables
−LR = 1 − (+LR)

For example, if a patient exhibits signs and symptoms of COVID-19, e.g., dry cough, fever (37.9 °C), headache, and myalgia, the physician suspects a COVID-19 case and recommends that the patient take a polymerase chain reaction (PCR) test. The physician may use the interpretability-based model to explain the prediction for the patient. According to our model, the positive likelihood ratio (+LR) based on Table 6 is:
17.2% + 16.5% + 2.9% + 17.2% + 6.5% = 60.3%
whereas the negative likelihood ratio (−LR) is:
100% − 60.3% = 39.7%
The physician can explain the likelihood ratios for the patient using the indication of the relative weights of the symptoms. Table 7 includes all the possible predictions of our model based on the trained dataset, where (+) represents the existence of a symptom, (−) represents its absence, and '.' represents a probable prediction that is not mentioned.

Discussion
Our model represents significant progress in explaining the decisions of AI models in medical diagnosis. Additionally, understanding the features responsible for a particular prediction helps model designers iron out dependability problems so that end users may acquire trust and make better decisions [26]. The interpretability-based model contributes by (i) reducing the cost of mistakes in the medical domain, which is highly affected by wrong predictions, and (ii) minimizing the influence of AI model bias by elaborating on the decision-making criteria, which builds trust for all users [37].
In this section, we highlight and compare our model with the other interpretability methods that were described in Section 4. When we apply the datasets of symptoms and CT images, we find that our model is able to distinguish feature importance, which is represented by the relative weight of each variable. In addition, it satisfies model-independence, unlike other related algorithms, since it applies statistics and probability rules. Moreover, the new model is able to identify the relevant features for each instance. However, the accuracy of the predictions of the interpretability-based model is a limitation, because it depends on the correctness of the trained dataset, which may not hold for some medical images, where the readings of the variables in the dataset are determined by the experience of the medical staff.

Conclusions and Future Works
In this paper, we build an interpretability-based model: a methodology for producing explanations, based on statistics and probability rules, to supplement existing interpretability techniques for AI algorithms.
Our model provides both the decisions and the explanations. Additionally, it provides the likelihood of positive and negative infection and the ability to trace explanations to specific samples. We demonstrated the use of our model on real datasets for COVID-19 patients, one for the CT images and another for the rules-based model for symptoms, where the suggested model illuminates the explanation patterns across learned sets. Furthermore, with explanations across subpopulations, convolutional neural network predictions become more transparent. A possible next step is to reduce the complexity of the interpretable rules-based algorithms, which offsets their interpretability. Another future work area is to apply our model in other industries, e.g., marketing, insurance, and financial services, to optimize their AI systems.