Global Translation of Classification Models

Abstract: The widespread and growing use of machine learning models, particularly in critical areas such as law, necessitates global interpretability. Models that cannot be audited are vulnerable to biases inherited from the datasets that were used to develop them. Moreover, locally interpretable models are vulnerable to adversarial attacks. To address this issue, the present paper proposes a new methodology that can translate any existing machine learning model into a globally interpretable one. MTRE-PAN is a hybrid SVM-decision tree architecture that leverages the interpretability of linear hyperplanes by creating a set of polygons that delimit the decision boundaries of the target model. Moreover, the present paper introduces two new metrics: certain and boundary model parities. These metrics can be used to accurately evaluate the performance of the interpretable model near the decision boundaries. They are used to compare MTRE-PAN to a previously proposed interpretable architecture called TRE-PAN. As in the case of TRE-PAN, MTRE-PAN aims at providing global interpretability. The comparisons are performed over target models developed using three benchmark datasets: Abalone, Census and Diabetes data. The results show that MTRE-PAN generates interpretable models that have a lower number of leaves and a higher agreement with the target models, especially around the most important regions in the feature space, namely the decision boundaries.


Introduction
Since 2018, the European Union (EU) has placed regulations on personal data usage and algorithmic decision making systems [1]. As a result, EU citizens are entitled to explanations of algorithmic decisions and are able to contest them [1]. In the United States (US), regulatory bodies have begun investigating the widespread usage of artificial intelligence (AI). In 2014 and 2016, the executive office of the National Science and Technology Committee published two reports related to the ethical usage of AI and its regulatory recommendations [2]. This was followed by the introduction of the National Security Commission Artificial Intelligence Act of 2018 that established a formal committee to review the usage of AI and recommend necessary regulations [3].
Laws that regulate the use of machine learning (ML) applications are difficult to draft since they require extensive technical knowledge to accurately assess the outcomes produced by the underlying algorithms. However, these laws are needed to prevent misuse and decision failures. For instance, recidivism prediction instruments are widely used but are also the subject of controversy because they can inherit biases from the training data [4]. In the US, the court presiding over State of Wisconsin vs. Eric L. Loomis used an algorithm, COMPAS, to recommend sentencing, and sentenced the accused to 6 years in prison [5]. The defense argued that the use of a black-box algorithm violated Mr. Loomis's right to due process since the algorithm was a trade secret. On appeal to the Wisconsin Supreme Court, the judgment was upheld [5]. The court's decision was heavily criticized by law scholars as having "failed to protect due process rights" [6]. These systems may perpetuate a cycle of incarceration [4]. In order to overcome some of these legal and ethical pitfalls, ML models need to be interpretable and open to auditing [7].
In response to this concern, the IEEE published "The IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems", a set of guiding principles for ethical AI usage [8]. These principles formed the foundation of the IEEE P7000 series of standards addressing AI standardization. Subsequently, the P7001 and P7003 standards required transparency and algorithmic bias considerations for autonomous systems, highlighting the need for interpretability for all ML models.
In general, ML models can be classified into two main categories in terms of interpretability. The first category consists of easily interpretable models, such as Bayesian networks [9], decision trees [10] and random forests [11]. The second category includes more complex models such as neural networks [12] and support vector machines (SVMs) [13]. This second category of models is often more accurate and generalizes better to new data [14]. However, it suffers from reduced interpretability [15]. In fact, the more complex the model, the less interpretable it becomes. More examples of this category include deep neural networks [12], convolutional neural networks [16] and recurrent networks [17]. Similarly, the interpretability of SVMs decreases with higher-order SVMs, which rely on RBF or polynomial kernels as opposed to the simpler linear kernels.
In [15], Lipton divides the notion of interpretability into two main categories: transparency and post hoc explanation. Transparency aims to deliver model- or global-level interpretability, whereas post hoc explanation is a per-input, "after the fact" explanation that provides a local level of interpretability. Both the local and the global interpretability of ML models have been investigated in previous studies. These studies propose a translation mechanism, where a non-interpretable model is translated to an interpretable model. For instance, the local interpretable model-agnostic explanations (LIME) technique translates a non-interpretable model to a locally interpretable one by sampling data around a query from the non-interpretable model [18]. The sampled data are labeled using the non-interpretable model and are then used to train a simple linear separator. The weights of the linear separator are provided as the explanation. An example of a global interpretation technique is TRE-PAN [19], which translates a neural network by training a decision tree model using data generated from the original model. These two approaches treat the target model as a black-box where only the outcome produced for a given input is available. A different type of global translation technique relies on complete knowledge of the architecture and parameters of the target model. For instance, the internal structure of a neural network was used to generate rules from an induced decision tree in CRED [20].
Global interpretation is the focus of this paper. A decision tree that relies on linear hyperplanes as separators is used to provide global interpretability for a target neural network model. The target model is considered to be a black-box. The present paper also introduces two metrics that can more accurately compare the target model and the interpretable model, specifically around the decision boundaries. Agreement between the target model and the interpretable model near the decision boundaries is critical since these regions delineate between different outcomes of the target model.

Related Work
Several methods have been proposed in the literature for translating non-interpretable ML models to interpretable ML models [18,19]. As mentioned above, these methods fall into two categories: transparency and post hoc explanation or, in other words, global translation and local translation, respectively [15]. Global translation corresponds to transparency because it aims to provide a comprehensive understanding of the behavior of the target model. Local translation corresponds to post hoc explanation, as it focuses on a subspace of the entire model. Lipton [15] further divides these two categories, where simulatability, decomposability and algorithmic transparency are sub-categories of global translation, whereas text explanations, visualization, local explanations, and explanation by example are sub-categories of local translation.

Global Translation
The aim of algorithmic transparency is to create an interpretable model that estimates the target model by translating it into a model whose behavior is understood [15]. For instance, TRE-PAN generates a decision tree which describes the behavior of a deep neural network [16,17]. This approach treats the target model as a black-box and is not limited to neural networks. It uses the target model to generate data that are used to train the interpretable decision tree. For each child node in the tree, enough data are generated to obtain the best split for the constrained sample space inherited from the parent node. TRE-PAN uses two-out-of-three splits in its decision trees. For each node, the top three features are selected, and the corresponding thresholds are established based on the potential gain in entropy from the split. A sample is assigned to the left subtree when at least two of the three resulting conditions are satisfied.
The motivation behind TRE-PAN is that decision trees require substantially more training data than neural networks in order to achieve the same accuracy. These data may not be available. Therefore, the non-interpretable model, once trained with the available data, can be used to generate the additional synthetic data needed to train the interpretable model [19]. One limitation of TRE-PAN is that two-out-of-three splits are more difficult to interpret than binary splits since three different conditions are evaluated at each node. Moreover, the depth of the tree in TRE-PAN is primarily dictated by the complexity of the non-linear decision boundaries of the target model under consideration [19]. As TRE-PAN generates data near the decision boundaries, the information gain from splitting is likely to be greater than the gains from splitting regions that are farther away from the decision boundaries. This is anticipated because the data will have a more balanced proportion of positive and negative samples near the decision boundaries, requiring a higher number of splits to represent them. In fact, when the decision boundary of the target model has a non-linear shape, representing the area constrained by this shape requires several rectangles of varying sizes. Therefore, these boundaries often correspond to a large number of leaves in the TRE-PAN decision tree. Limiting the depth of the tree comes at the cost of lower accuracy [19].
An alternative global translation approach only applicable to neural networks was proposed in [20]. This approach requires access to the hidden nodes of the target neural network. Rules describing the global behavior of the network are extracted using the "continuous/discrete rule extractor via decision tree induction" (CRED) algorithm [20]. This algorithm builds a decision tree by clustering data around the training samples that activate a hidden node for a specific output class. CRED builds decision trees for each layer, generates intermediate rules, and combines them into global rules.
Another rule extraction technique was proposed in [21]. This technique is labeled "rule extraction by reverse engineering the neural networks" (RxREN). It extracts the rules in two phases. RxREN requires access to the internal architecture of the target neural network as well as the training data. In the first phase, RxREN prunes the input nodes based on their significance. In the second phase, the misclassified training examples are used to infer the feature ranges of the remaining (i.e., significant) input nodes, and to create threshold rules [21]. A subsequent improvement to RxREN, named "deep neural network rule extraction via decision tree induction" (DeepRED) was proposed in [22]. DeepRED combines RxREN and CRED. It first uses RxREN to prune the input nodes, and then uses a modified version of CRED to develop the explainable decision tree [22]. The modification consists of extracting intermediate rules from the target neural network model before merging them into complete decision trees [22].

Local Translation
The objective of local translation is to observe a limited subset of the feature space and attempt to explain it using specific input examples. Therefore, the focus of local translation is on explaining individual decisions, rather than the behavior of the entire model. An example of this type of translation is the local interpretable model-agnostic explanations (LIME) technique [18].
This technique relies on a post hoc approach to explain local classification results. Specifically, LIME uses a linear model to represent a decision derived from a non-interpretable model. Given a target model and an input vector with a corresponding class prediction, the input is perturbed to generate synthetic data in the local neighborhood of the input vector under consideration. These synthetic data are then weighted according to their distance from the original input and used to train the interpretable linear model. By observing the feature weights of this linear model, the features that dictated the classification can be identified.
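The perturb, weight and fit procedure described above can be sketched in a few lines. This is a minimal illustration and not the LIME library itself: the circular `target` function, the Gaussian perturbation scale, the exponential distance kernel and the weighted least-squares fit are all assumptions chosen for the example.

```python
import math
import random


def target(x):
    """Hypothetical black-box model: +1 inside the unit circle, -1 outside."""
    return 1 if x[0] ** 2 + x[1] ** 2 < 1.0 else -1


def solve(a, y):
    """Solve the square system a z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def local_linear_explanation(model, query, n_samples=2000, scale=0.3, width=0.5, rng=None):
    """Fit a distance-weighted linear model to the black-box labels around `query`."""
    rng = rng or random.Random(0)
    rows, labels, kernel = [], [], []
    for _ in range(n_samples):
        x = [q + rng.gauss(0.0, scale) for q in query]  # perturb the query
        dist2 = sum((xi - qi) ** 2 for xi, qi in zip(x, query))
        rows.append(x + [1.0])                          # trailing 1.0 is the intercept column
        labels.append(model(x))                         # label with the black-box model
        kernel.append(math.exp(-dist2 / width ** 2))    # closer samples weigh more
    k = len(query) + 1
    # Weighted normal equations: (X^T W X) beta = X^T W y.
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(kernel, rows)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(kernel, rows, labels)) for i in range(k)]
    beta = solve(xtwx, xtwy)
    return beta[:-1], beta[-1]


# Explain the prediction at a point on the right edge of the circle.
feature_weights, intercept = local_linear_explanation(target, query=[1.0, 0.0])
print(feature_weights)
```

At the query (1, 0) the boundary is locally vertical, so the fitted weight of feature 1 should dominate and be negative, while the weight of feature 2 should stay close to zero.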
There are two potential limitations to LIME: random explanations and unconvincing explanations. LIME uses randomly perturbed data to create a local linear model. Therefore, it can generate different explanations for the same input depending on the distribution of the sampled data. Moreover, the local explanation can become less reliable if the input vector is near a non-linear decision boundary of the target model [18].
Other local translation techniques include utilizing a heat map to visualize the activation patterns in the input data [15,23,24]. For instance, given an input image, the pixels which were the most influential in selecting a predicted class can be identified [25]. In particular, it is possible to decompose the output of an image classifier such that each pixel in an image is weighted on how much it contributed to deciding the final outcome as in [26]. This decomposition is applied on a layer-by-layer basis, where the weights are propagated from one layer to the next layer. The decomposition can be done on pre-trained models, but does require access to the internal parameters of the models.
A similar decomposition technique can be applied to neural networks using DeepLIFT [24]. In this case, the contribution of the input feature to each layer is decomposed into a summation of weights as the value of the feature is propagated from the input layer to the output layer [24]. This technique requires a significant amount of domain knowledge, as a "reference" input (i.e., near the decision boundary) needs to be used as the baseline [24].
In [27], locally interpretable models such as LIME and DeepLIFT are considered to belong to the same "class of additive feature attribution methods". These methods decompose the target model into a sum of weights of the input features. A technique for finding these weights according to some properties (e.g., local accuracy, missingness, and consistency) is needed. Towards this goal, the "Shapley additive explanation" (SHAP) is proposed as a unified measure of feature importance [27]. The output of a given model is compared to the output of the same model after the removal of some features. The difference is assumed to be proportional to the contribution of the missing features [27].
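To make the "removal of features" idea concrete, the following sketch computes exact Shapley values by enumerating every feature subset. It is an illustration under stated assumptions, not the SHAP library: a feature is "removed" by replacing it with a hypothetical baseline value, and `toy_model` is an invented example. Exact enumeration is exponential in the number of features, so it is only practical for very small inputs.

```python
import itertools
import math


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline input."""
    n = len(x)

    def value(subset):
        # Features outside the subset are "removed" by using the baseline value.
        masked = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(masked)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(x):
    """Hypothetical model with one interaction term and one additive term."""
    return x[0] * x[1] + x[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

In this example the interaction term is shared equally between the first two features, and the attributions add up to the difference between the model output at `x` and at the baseline, which is the local accuracy property mentioned above.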
Since SHAP is a local explanation technique, it is unable to capture the global behavior of the model. SHAP explains examples in a post hoc fashion and describes the aggregate behavior of a target model by only considering a representative sample of the data. Therefore, it is unable to describe the decision boundaries of the target model and is vulnerable to adversarial attacks [28].
In general, while local translation is easier to implement than global translation, it is prone to adversarial manipulation in various applications, including image classification and insurance decision support systems [15]. For example, an adversarial fake image can be overlaid on top of a real image, causing the model to misclassify the image [29]. A globally interpretable model is more resilient to such adversarial attacks and provides an opportunity to audit how a given decision is reached. The latter is important, as it can identify gaps in the inference mechanisms used by the target model.

Evaluation Metrics
A survey of previous research on model translation indicates that most of the techniques use a common metric to describe how accurately the interpretable model represents the target model. TRE-PAN calls this metric fidelity [19]; CRED refers to it as accuracy [20]; and RxREN calls it rule accuracy [21]. This metric is called model parity in the present paper. It compares the outcomes of the target model and the interpretable model on a test dataset. However, model parity is unable to represent the behavior of the target model near the decision boundaries. Data near the decision boundaries are significantly sparser than in the rest of the feature space. When validation data are sampled uniformly from the entire feature space, they often fail to focus on the boundaries. As a result, interpretable models may generate high parity values but fail near the decision boundaries. The parity metric needs to consider the entropy of the subspace when evaluating the match between the target model and the interpretable model. Ideally, if the entropy of a subspace is too high, all data in that subspace should be considered mislabeled.

Materials and Methods
The global translation technique proposed in this paper translates a target ML model into a hybrid model consisting of a decision tree with linear SVM classifiers at each node. This hybrid ML architecture, which was previously proposed in [30,31], is extended to global translation. The proposed technique, MTRE-PAN, leverages the interpretability of the decision tree and the generalizability of SVM. It uses SVM with linear kernels instead of higher order kernels to facilitate interpretability. As in TRE-PAN, MTRE-PAN treats the target model as a black-box.

Model Overview
Let $f(x)$, $x \in \mathbb{R}^N$, represent a pre-trained, non-interpretable model, where $x$ is the feature vector consisting of $N$ dimensions and $f(x)$ is a binary classifier with codomain $\{-1, 1\}$. MTRE-PAN builds an interpretable model for $f$ consisting of a decision tree that uses hyperplanes to split each node into subtrees.
Each node $k$ in MTRE-PAN is associated with a weight matrix $C_k \in \mathbb{R}^{N \times M}$ and a bias vector $b_k \in \mathbb{R}^{M}$, with one column of $C_k$ and one entry of $b_k$ per linear constraint. $C_k$ and $b_k$ form the set of linear constraints for each node. When applied simultaneously, they combine to form a convex hyper-polygon, where each node represents a mutually exclusive partition of the space of $f$. In the first phase, MTRE-PAN is trained using the original training data used to develop $f$, along with additional training data sampled from $f$. These data consist of the input set $Q = \{x_1, x_2, x_3, \ldots, x_m\}$ and the corresponding label set $Y = \{l_1, l_2, l_3, \ldots, l_m\}$. Let $Parent(k)$ represent the parent node of node $k$. The set of samples that are passed from a parent node to its left and right children is defined below:

$$Q_k = \{x_i \in Q_{Parent(k)} : x_i^T C_{Parent(k)} \le b_{Parent(k)}\} \quad \text{if } k \text{ is a left child} \quad (1)$$

$$Q_k = \{x_i \in Q_{Parent(k)} : x_i^T C_{Parent(k)} > b_{Parent(k)}\} \quad \text{if } k \text{ is a right child} \quad (2)$$

Leaf nodes inherit a label from the SVM classifier according to the side of the split they fall into.

Figures 1 and 2 show two instances of an MTRE-PAN tree for the same function $f$, one at depth 1 and the second at depth 2. The target model $f$ is a circle, where samples inside the circle are labeled "1" and those outside the circle are labeled "−1". The feature space for this synthetic example consists of two dimensions: the horizontal (feature 1) and vertical (feature 2) positions of the sample. In Figure 2, the right child of the root node is expanded by training an SVM on data sampled from a subspace of the feature space of $f(x)$, such that any data used in the training must satisfy $x_i^T C_1 > b_1$. Similarly, the left child of the root node is expanded by training an SVM on data sampled from a subspace of the feature space of $f(x)$, such that any data used in the training must satisfy $x_i^T C_1 \le b_1$. As the nodes continue to be split, the MTRE-PAN estimate of $f$ improves. The splits induced by the new leaf nodes in Figure 2a start to converge to the boundaries of $f$ (Figure 2b).

In MTRE-PAN, the decision to split a node is made according to the standard binary gain measure $G$ [32] of Equation (3), which is computed from $Q_k$, the data of node $k$, where $p$ is the number of positive examples in $Q_k$ and $n$ is the number of negative examples in $Q_k$. The entropy $E$ is defined as follows:

$$E(\alpha) = -\alpha \log_2 \alpha - (1 - \alpha) \log_2 (1 - \alpha) \quad (4)$$

where $\alpha$ is the probability of a positive example. Nodes whose gains fall below a preset threshold are considered uncertain and therefore are candidates for further splitting. The main steps of MTRE-PAN are outlined in Algorithms 1 and 2. Each node generated by MTRE-PAN may sample the original non-interpretable model for more training data to add to its dataset $Q$ if the available data are not sufficient to represent the subspace. This is accomplished by calling sampling-on-demand, as discussed next.
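As a small worked example of the entropy test that drives splitting, the sketch below assumes that $E$ is the standard binary entropy with base-2 logarithms and that the probability is estimated as $p / (p + n)$. The function names are hypothetical, and the default cutoff is the value 0.0808 listed among the hyperparameters.

```python
import math


def binary_entropy(alpha):
    """Binary entropy (base 2) of a positive-class probability alpha."""
    if alpha <= 0.0 or alpha >= 1.0:
        return 0.0  # a pure node carries no uncertainty
    return -alpha * math.log2(alpha) - (1.0 - alpha) * math.log2(1.0 - alpha)


def node_entropy(p, n):
    """Entropy of a node holding p positive and n negative examples."""
    if p + n == 0:
        raise ValueError("node has no data")
    return binary_entropy(p / (p + n))


def is_uncertain(p, n, cutoff_entropy=0.0808):
    """A node whose entropy exceeds the cutoff is a candidate for splitting."""
    return node_entropy(p, n) > cutoff_entropy


print(node_entropy(50, 50))    # balanced node: maximal entropy
print(binary_entropy(0.99))    # about 0.0808
print(is_uncertain(999, 1))    # nearly pure node
```

Under this assumption, a cutoff entropy of 0.0808 corresponds to a node in which roughly 99% of the samples share the same label.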

Algorithm 1: Building the MTRE-PAN tree.

  Begin Initializing Variables
    Load training data of the target model (Tf);
    Enter entropy threshold (Et);
    Enter variance cutoff (Vc);
    Initialize an empty priority queue (Pq) ordered on gain as in Equation (3);
  Begin Building the MTRE-PAN tree
    Create the Root node of the tree (kRoot);
    Populate kRoot with Tf;
    Call Sampling_on_Demand (Algorithm 2) on kRoot;
    Place kRoot in the queue Pq;
    while the queue Pq is not empty do
      Dequeue a node (k) from Pq;
      if the entropy of k is above Et then
        Train a linear classifier (Sep_k) on the data in k;
        Add the parameters of Sep_k to the constraints of k (Equations (1) and (2));
        Create two new empty nodes for the left (childL) and right (childR) children of k;
        Use C_k and b_k to split the data already sampled and stored in k;
        As in Equations (1) and (2), data are partitioned (Q_l and Q_r) between childL and childR;
        Call Sampling_on_Demand for both childL and childR;
        Attach childL and childR as the children of k;
        Place childL and childR in the queue Pq;
    Return the Root node kRoot;

Algorithm 2: Sampling on demand.

  Begin Initializing Variables
    Load input node k;
    Load trained target model as a function f(x), x ∈ R^N;
    Load user-defined entropy threshold Et;
    Load user-defined variance cutoff Vc;
    Load the constraints (C_k and b_k) of node k;
    Load the data (Q_k) stored in k;
  Begin Sample the subspace of the node
    Calculate the point estimate (Pe) of the entropy of Q_k stored in k using Equation (6);
    Calculate the variance (Pv) of the point estimate from Equation (7);
    Build a bounding box (Box) as a set of constraints for every feature of Q_k. This bounding box constrains the maximum and minimum of each feature, thereby surrounding all the data in Q_k;
    while Pv is greater than Vc do
      Sample data uniformly along the dimensions of Box;
      Discard samples that do not satisfy the constraints C_k and b_k of the node k. If k is the left child of its parent, Equation (1) is used to accomplish this process, otherwise Equation (2) is used;
      Add the data to Q_k, the list of data of node k;
      Recalculate Pe and Pv on Q_k;
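The control flow of Algorithm 1 can be sketched as follows. This is a simplified illustration and not the authors' implementation. Several stand-ins are assumed: the linear SVM is replaced by a hyperplane placed halfway between the two class centroids, the priority queue is ordered by node entropy (highest first) instead of the gain of Equation (3), sampling on demand is omitted, and the circular `target` function and all helper names are hypothetical.

```python
import heapq
import itertools
import math
import random


def target(x):
    """Hypothetical black-box target: +1 inside the unit circle, -1 outside."""
    return 1 if x[0] ** 2 + x[1] ** 2 < 1.0 else -1


def entropy(labels):
    """Binary entropy (base 2) of a list of +1/-1 labels."""
    if not labels:
        return 0.0
    a = sum(1 for l in labels if l == 1) / len(labels)
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)


def fit_separator(points, labels):
    """Stand-in for the linear SVM: hyperplane halfway between class centroids.

    Returns (w, b); a point x goes left when w.x <= b and right otherwise.
    """
    dim = len(points[0])
    pos = [p for p, l in zip(points, labels) if l == 1]
    neg = [p for p, l in zip(points, labels) if l == -1]
    mu_p = [sum(p[d] for p in pos) / len(pos) for d in range(dim)]
    mu_n = [sum(p[d] for p in neg) / len(neg) for d in range(dim)]
    w = [n - p for p, n in zip(mu_p, mu_n)]
    b = sum(wd * (p + n) / 2 for wd, p, n in zip(w, mu_p, mu_n))
    return w, b


def new_node(depth):
    return {"points": [], "labels": [], "depth": depth, "children": None}


def build_tree(points, labels, entropy_threshold, max_depth):
    root = new_node(0)
    root["points"], root["labels"] = list(points), list(labels)
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    queue = [(-entropy(root["labels"]), next(tie), root)]
    while queue:
        neg_e, _, node = heapq.heappop(queue)
        e = -neg_e
        if e == 0.0 or e <= entropy_threshold or node["depth"] >= max_depth:
            continue  # certain node or depth budget exhausted: keep as a leaf
        w, b = fit_separator(node["points"], node["labels"])
        left, right = new_node(node["depth"] + 1), new_node(node["depth"] + 1)
        for p, l in zip(node["points"], node["labels"]):
            child = left if sum(wd * pd for wd, pd in zip(w, p)) <= b else right
            child["points"].append(p)
            child["labels"].append(l)
        if not left["points"] or not right["points"]:
            continue  # degenerate split: keep the node as a leaf
        node["separator"] = (w, b)
        node["children"] = (left, right)
        for child in (left, right):
            heapq.heappush(queue, (-entropy(child["labels"]), next(tie), child))
    return root


def leaves(node):
    if node["children"] is None:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


random.seed(0)
pts = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(2000)]
lbl = [target(p) for p in pts]
tree = build_tree(pts, lbl, entropy_threshold=0.0808, max_depth=6)
leaf_nodes = leaves(tree)
weighted = sum(len(n["labels"]) * entropy(n["labels"]) for n in leaf_nodes) / len(pts)
print(len(leaf_nodes), round(entropy(lbl), 3), round(weighted, 3))
```

The last line prints the number of leaves, the entropy of the root data, and the size-weighted entropy of the leaves. Because every split partitions the data of its node, the weighted leaf entropy can never exceed the root entropy.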

Sampling on Demand
After a leaf node is created in Algorithm 1, it might be necessary to generate more data to measure the entropy of the subspace encapsulated by its constraints. The leaf node was created by imposing additional constraints on the region of the feature space available to the parent node. The decision to sample more data is made by comparing the variance of the entropy point estimate, computed from the data assigned to the leaf, to a predefined variance cutoff as shown in Algorithm 2.
The constraints associated with the leaf node (i.e., Equations (1) and (2)) make the subspace available for sampling a hyper-polygon. Data were already generated by the parent and distributed to the left and right child nodes in accordance with their respective constraints. A bounding box is built around these data. The bounding box is simply a hyper-rectangle that surrounds the data by placing bounds on the maximum and minimum of each feature plus a slight margin. The bounding box is uniformly sampled, each sample is labeled using f, and a sample that fits within the constraints of the leaf node is added to the dataset of that node.
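The entropy is measured using a point estimate of the probability, because the value of $\alpha$ (the probability) in Equation (4) is unknown. Every sampled data point can be considered to come from a series of independent and identically distributed indicator random variables $I = \{I_1, I_2, I_3, \ldots, I_j\}$. Since $\mathbb{E}[I_i] = \alpha$, $\alpha$ can be estimated with the sample mean $\hat{\alpha}$:

$$\hat{\alpha} = \frac{1}{j} \sum_{i=1}^{j} I_i \quad (5)$$

and the entropy as

$$\hat{E} = -\hat{\alpha} \log_2 \hat{\alpha} - (1 - \hat{\alpha}) \log_2 (1 - \hat{\alpha}) \quad (6)$$

In order to determine when enough samples have been collected, the variance of $\hat{\alpha}$ is observed. Since the variance of an indicator random variable is $\mathrm{var}(I_i) = \alpha (1 - \alpha)$, the variance of the point estimate is

$$\mathrm{var}(\hat{\alpha}) = \frac{\alpha (1 - \alpha)}{j} \approx \frac{\hat{\alpha} (1 - \hat{\alpha})}{j} \quad (7)$$

While sampling the data, if the variance of the point estimate falls below a user-defined variance cutoff, sampling is stopped, and the entropy estimate of Equation (6) is considered to be an accurate estimate.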
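The sampling loop of Algorithm 2 can be sketched as rejection sampling inside the bounding box. This is a minimal illustration under several assumptions: the circular `target` function, the `(w, b, is_left)` encoding of each constraint, the margin value, and the `max_draws` safety cap are hypothetical choices made for the example.

```python
import random


def target(x):
    """Hypothetical black-box target: +1 inside the unit circle, -1 outside."""
    return 1 if x[0] ** 2 + x[1] ** 2 < 1.0 else -1


def satisfies(x, constraints):
    """Check the node constraints; each one is a tuple (w, b, is_left).

    A left-child constraint requires w.x <= b, a right-child one w.x > b.
    """
    for w, b, is_left in constraints:
        score = sum(wi * xi for wi, xi in zip(w, x))
        if (score <= b) != is_left:
            return False
    return True


def sample_on_demand(model, data, constraints, variance_cutoff,
                     margin=0.05, max_draws=200_000, rng=None):
    """Grow `data` until the variance of the point estimate meets the cutoff."""
    rng = rng or random.Random(0)
    data = list(data)
    dim = len(data[0])
    # Bounding box around the inherited data, widened by a small margin.
    lo = [min(p[d] for p in data) - margin for d in range(dim)]
    hi = [max(p[d] for p in data) + margin for d in range(dim)]
    positives = sum(1 for p in data if model(p) == 1)

    def point_estimate():
        j = len(data)
        alpha_hat = positives / j                        # sample mean of the indicators
        return alpha_hat, alpha_hat * (1.0 - alpha_hat) / j

    alpha_hat, variance = point_estimate()
    draws = 0
    while variance > variance_cutoff and draws < max_draws:
        draws += 1
        x = tuple(rng.uniform(l, h) for l, h in zip(lo, hi))
        if not satisfies(x, constraints):
            continue  # outside the hyper-polygon of this node
        data.append(x)
        positives += model(x) == 1
        alpha_hat, variance = point_estimate()
    return data, alpha_hat, variance


# Right child of a root split on feature 1 at zero, i.e. the constraint x0 > 0.
right_child = [((1.0, 0.0), 0.0, False)]
seed_rng = random.Random(1)
initial = [(seed_rng.uniform(0.01, 2.0), seed_rng.uniform(-2.0, 2.0)) for _ in range(20)]
data, alpha_hat, variance = sample_on_demand(target, initial, right_child,
                                             variance_cutoff=1e-4)
print(len(data), round(alpha_hat, 3), variance)
```

One caveat of this estimator is that a small inherited sample containing a single class has an estimated variance of zero, so the loop stops immediately even if the subspace still contains a decision boundary.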

Results
Multiple experiments were conducted to assess the efficacy of MTRE-PAN and compare it to that of TRE-PAN [16,17]. In its proposed implementation, TRE-PAN generates an interpretable decision tree for each target model using a two out of three split as described in Section 2.1. In order to simplify the comparison with MTRE-PAN with respect to the depth of the resulting interpretable models, the C4.5 binary implementation of TRE-PAN was used [33]. MTRE-PAN, the model proposed in the present paper, consists of a hybrid combination of a binary decision tree and a linear SVM classifier at each node of the tree. Both TRE-PAN and MTRE-PAN were used to generate interpretable models with varying tree depths for several target models. The first target model is a simple function with a circular boundary delineating the negative and positive samples. This model was used earlier to illustrate the methodology. The remaining target models are feed forward neural networks, which were trained using three public domain datasets.
The hyperparameters of MTRE-PAN and TRE-PAN include the maximum depth, the cutoff entropy, the cutoff variance and the margin as described in Table 1. All the ML target models follow the same general architecture that consists of the following:
- An input layer and two hidden layers, each with a number of nodes equal to the number of input variables in the dataset.
- An output layer consisting of a single node.
- All the nodes use a sigmoid activation function.

Table 1. Hyperparameters of MTRE-PAN and TRE-PAN.
Maximum Depth: The maximum allowable depth of the interpretable decision tree. Value: 10.
Cutoff Entropy: A leaf node with an entropy higher than the Cutoff Entropy is considered uncertain and is a candidate for further splitting. Value: 0.0808.
Cutoff Variance: Places an upper limit on the number of data points sampled in a given leaf. A lower cutoff variance will result in a more accurate value for the sample entropy.

The following metrics are used to compare the interpretable models:
- Certain model parity: This metric is similar to the model parity, except that the samples in the validation dataset that are assigned to uncertain nodes (i.e., nodes with entropy above the cutoff entropy) are labeled uncertain. These samples cannot match any label from f and as such are considered misses. This metric is calculated as the number of validation samples that are assigned to certain nodes and whose labels match those of f, divided by the total number of validation samples. It therefore takes into consideration uncertain nodes that require further expansion.
- Boundary model parity: This metric is similar to the certain model parity. However, the validation dataset is limited to the samples that are near the decision boundary within a predefined margin (Table 1). The boundary model parity measures the progress of the interpretable model toward replicating the behavior of the target model near the decision boundaries. It identifies an interpretable model that may have a high certain model parity but may not fare well around the decision boundaries.
- Leaf count: The number of leaves in the interpretable decision tree generated by either MTRE-PAN or TRE-PAN.
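The three parity metrics can be sketched as follows. This is an illustrative reading of the definitions above and not the authors' code: the function names, the boolean `certain` flags (one per validation sample, true when the sample falls in a certain leaf) and the precomputed `near_boundary` mask are assumptions.

```python
def model_parity(target_labels, interp_labels):
    """Fraction of validation samples on which both models agree."""
    matches = sum(t == i for t, i in zip(target_labels, interp_labels))
    return matches / len(target_labels)


def certain_model_parity(target_labels, interp_labels, certain):
    """Like model parity, but samples in uncertain leaves count as misses."""
    matches = sum(1 for t, i, c in zip(target_labels, interp_labels, certain)
                  if c and t == i)
    return matches / len(target_labels)


def boundary_model_parity(target_labels, interp_labels, certain, near_boundary):
    """Certain model parity restricted to samples near the decision boundary."""
    idx = [k for k, near in enumerate(near_boundary) if near]
    if not idx:
        raise ValueError("no validation samples fall within the margin")
    return certain_model_parity([target_labels[k] for k in idx],
                                [interp_labels[k] for k in idx],
                                [certain[k] for k in idx])


# Small hand-made validation set of six samples.
target_labels = [1, 1, -1, -1, 1, -1]
interp_labels = [1, 1, -1, 1, -1, -1]
certain = [True, False, True, True, False, True]
near_boundary = [False, True, True, True, True, False]

print(model_parity(target_labels, interp_labels))                                   # 4 of 6 agree
print(certain_model_parity(target_labels, interp_labels, certain))                  # 3 of 6
print(boundary_model_parity(target_labels, interp_labels, certain, near_boundary))  # 1 of 4
```

In this toy example the model parity looks reasonable, while the boundary model parity is much lower, which is the situation the boundary metric is meant to expose.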

Synthetic Data
MTRE-PAN makes use of linear separators at each node of the tree. This is similar to LIME [18]. However, unlike LIME, MTRE-PAN uses multiple separators whose constraints define a hyper-polygon at each leaf node. As the tree grows, the set of polygons from the root to a leaf node decreases in entropy after every split. This corresponds to a decrease in the total area of the uncertain polygons. Therefore, the certain polygons start to approach the boundaries of the target model.
In order to illustrate this aspect, MTRE-PAN was applied to the simple function f introduced earlier, which consists of a circle with a radius of 1. The samples that fall inside the circle are positive and those outside are negative. TRE-PAN was also applied to the same synthetic function. Results for MTRE-PAN and TRE-PAN are provided in Tables 2 and 3, respectively. Figure 3 is a visualization of the polygons generated by MTRE-PAN for f at depths 9 and 12. It illustrates the convergence of the polygons to f. That is, the collective area of the uncertain polygons becomes smaller at depth 12 compared to depth 9. Moreover, as expected, the figure shows that the uncertain polygons always contain the decision boundaries of f. Otherwise, the polygon would not include both positive and negative labels and would have an entropy of 0. This behavior is also seen in Figure 4 for TRE-PAN. The uncertain polygons (i.e., hyper-rectangles in this case) also contain the decision boundaries of f.
Since the uncertain polygons are mutually exclusive, they can be used as an estimate of the decision boundaries and the overall behavior of the underlying model. As mentioned above, the accuracy of the interpretable model in representing the decision boundaries depends on the values of the cutoff variance and cutoff entropy. If the cutoff variance is high, it may not be possible to generate enough data to accurately label a polygon as positive, negative, or uncertain. On the other hand, if it is low, more data are needed to ensure that the sample variance of the entropy is below the cutoff variance. Similarly, if the cutoff entropy is high, it may not be possible to decide whether a leaf node is certain or uncertain. This may potentially lead to labeling polygons that contain a decision boundary as certain. A cutoff entropy close or equal to zero with a sufficiently low cutoff variance will ensure that no decision boundary is missed.
If a decision boundary falls within a certain polygon (i.e., a polygon with an entropy lower than the cutoff entropy), it is still possible to estimate the missing decision boundary. This entails finding neighboring leaves that do not have the same label since an estimated boundary is simply a shared constraint that separates neighboring polygons of different labels.
MTRE-PAN provides a global explanation of the function f in the form of a set of uncertain polygons and of polygons that have an estimated boundary as a constraint. The remaining polygons provide additional constraints that help define the limits of this estimate in the feature space. This characteristic is important, as it avoids the unbounded plane issue observed in LIME. In LIME, a plane is generated as a local estimator of the decision boundary. However, the plane is not delimited in the feature space.
Tables 2 and 3 show that the parity of both interpretable models of f is high for depths greater than 3. However, MTRE-PAN produces interpretable models with a lower number of leaves and with certain model parity and boundary model parity similar to those produced by TRE-PAN. At depth 12, the interpretable model generated by MTRE-PAN consists of 305 leaves, whereas the one generated by TRE-PAN includes 775 leaves.

Abalone Data
The Abalone dataset consists of recorded physical characteristics of abalone mollusks [34]. It includes 4177 samples, where each sample has 8 features representing the physical characteristics of abalone gastropods and an integer label. The input features are sex, length, diameter, height, whole weight, shucked weight, viscera weight, and shell weight. The label, rings, is an integer number that represents the age of abalone mollusks. For the purpose of this study, it was converted to −1 for all values below the median and 1 for all values above the median in order to enable binary classification. The target model for this dataset achieved an 84.4% accuracy over the validation dataset. The performance metrics of the corresponding interpretable models generated by MTRE-PAN and TRE-PAN are reported in Tables 4 and 5, respectively. From the results, both MTRE-PAN and TRE-PAN approach the target model in terms of model parity. MTRE-PAN begins to achieve a non-zero certain model parity earlier than TRE-PAN (depth 3 vs. depth 5). MTRE-PAN also starts from a higher certain model parity than TRE-PAN (51.81% vs. 15%). Both models converge near the boundaries, with MTRE-PAN achieving a non-zero boundary model parity sooner (depth 4 vs. depth 6) and with a significantly higher starting parity (45.87% vs. 16.40%). As the leaf count indicates, MTRE-PAN at depth 4 has a significantly lower number of leaf nodes than the closest TRE-PAN tree (with respect to parity) at depth 9 (i.e., 7 leaf nodes vs. 201 leaf nodes). The cost of training the linear classifiers of MTRE-PAN is offset by the exponentially higher number of splits that TRE-PAN needs to achieve a similar parity. It can also be argued that while the separators used by TRE-PAN are simple, the high number of nodes complicates the interpretability of the resulting decision tree.

US Adult Census Data
This dataset is drawn from the 1994 US adult census data [35]. It includes 13 input variables and one output variable for 32,561 individuals who responded to the census. The input variables are age, work class, level of education, education years, marital status, occupation, relationship, race, sex, capital gain, capital loss, hours per week, and native country. The output variable is the income of the individual. In the original dataset, the income is a binary label that is set to −1 if the income is less than USD 50,000 and 1 otherwise. The target neural network model for the US Adult Census Data achieved an accuracy of 82.9%. The performance metrics of the corresponding interpretable models generated by MTRE-PAN and TRE-PAN for this dataset are included in Tables 6 and 7, respectively. As with the Abalone dataset, these results reflect an overall higher certain model parity with MTRE-PAN compared to TRE-PAN. However, both MTRE-PAN and TRE-PAN struggle when attempting to converge to the decision boundary within the margin. They need depths of 6 and 9, respectively, before they start to show convergence toward the decision boundary. While the certain model parity increases steadily for both TRE-PAN and MTRE-PAN, the boundary model parity stagnates at depth 9 for TRE-PAN. The decision boundaries for the target model associated with the US Adult Census dataset are more complex than those of the Abalone dataset. This complexity is compounded by the increased number of dimensions in this dataset compared to the Abalone dataset.

Diabetes Diagnosis Data
This dataset covers a population of 798 Pima Indian women. It consists of eight input features, which were selected based on the WHO suggested predictors for diabetes mellitus [36]. The label is either 1 or −1 based on whether or not diabetes is detected for each individual. The eight input features are age, number of pregnancies, plasma glucose concentration, diastolic blood pressure, triceps skin fold thickness, 2-h serum insulin, body mass index, and diabetes pedigree function. The target model for this dataset achieved an accuracy of 70.8%, which is lower than the previous two target models. The performance metrics of the corresponding interpretable models generated by MTRE-PAN and TRE-PAN are included in Tables 8 and 9, respectively. Unlike the Census data, the boundary model parity converges faster. However, the starting value of the non-zero model parity is still significantly lower compared to the Abalone dataset. Considering that the Diabetes dataset has the same number of dimensions as the Abalone dataset, it is likely that the lower starting boundary model parity and target model accuracy (70.8% vs. 84.4%) are due to the Diabetes dataset being more non-linear than the Abalone dataset. Moreover, even though MTRE-PAN and TRE-PAN are close in terms of boundary model parity at depths 6 and 7 (i.e., 14.37% and 13.18%, respectively), MTRE-PAN converges faster over 4 levels, reaching a boundary model parity of 45.17% compared to the 23.43% achieved by TRE-PAN over the same number of levels.

Explanation
TRE-PAN uses the decision tree as the explanation of the target model [19]. However, this decision tree may not represent the decision boundaries of the target model. In fact, as shown in the above tables, at extremely shallow depths, model parity can reach 90% while the boundary model parity is low (e.g., Table 3).
MTRE-PAN represents the target model using several matrices that place constraints on subsets of the feature space of the target model. These subsets can be certain positive, certain negative, or uncertain (i.e., boundary polygons). The constraints take the form of $x^{T} C \leq b$ as indicated in Equations (1) and (2). Since the uncertain polygons contain the boundary of the target model, they can be used to generate linear approximations of the decision boundaries. Each linear approximation created by this approach is similar to the explanation employed by LIME [18]. However, in the case of MTRE-PAN, the linear approximation is bounded within the uncertain polygon used to generate it.
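To make the constraint form concrete, the following sketch tests whether a sample lies inside a polygon defined by a constraint matrix and an offset vector. It is an illustration under stated assumptions: the function name in_polygon is hypothetical, each column of C is assumed to hold the normal of one half-space constraint (Equations (1) and (2) are not reproduced in this section, so the orientation of C is a guess), and the unit-square polygon is a toy example rather than one of the polygons produced by MTRE-PAN.

```python
def in_polygon(x, C, b, tol=1e-9):
    """Return True if x^T C <= b holds for every constraint column of C.

    x: list of n feature values
    C: n-by-k matrix as a list of n rows; column j is assumed to be the
       normal vector of constraint j
    b: list of k offsets
    """
    for j in range(len(b)):
        lhs = sum(x[i] * C[i][j] for i in range(len(x)))
        if lhs > b[j] + tol:
            return False
    return True

# Hypothetical 2-D polygon: the unit square 0 <= x1 <= 1, 0 <= x2 <= 1,
# written as four half-plane constraints (one per column).
C = [[1, -1, 0, 0],
     [0, 0, 1, -1]]
b = [1, 0, 1, 0]

inside = in_polygon([0.5, 0.5], C, b)
outside = in_polygon([1.5, 0.5], C, b)
```

Under the same assumptions, a linear boundary estimate attached to an uncertain polygon would only be consulted for samples for which in_polygon returns True, which is what keeps the estimate bounded.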
The explanation process is illustrated using the Diabetes Diagnosis dataset. The MTRE-PAN model at depth 8 shown in Table 8 has a total of 85 polygons. Each of these polygons is constrained by a bounding box, where the bounds are defined by the maximum and minimum of each feature, as shown in Table 10. Equations (10)-(12) each represent a different polygon from the set of 85 polygons mentioned above. Specifically, Equations (10) and (11) represent a certain positive and a certain negative polygon, respectively. Equation (12) depicts an uncertain polygon (i.e., boundary polygon) with a linear estimate of the boundary as defined in Equation (13).
[Equations (10)-(13): constraint matrices not recoverable from the source; the only surviving fragment, preceding the label (12), is the row 29, −5.31, 3.02, 0.63, 0.36, −18.31.]
According to Equation (10), MTRE-PAN indicates that at depth 8, all the patients that fall within this specific certain polygon are considered positive for diabetes. The original Diabetes dataset includes 92 such patients, one of whom is shown in Table 11. Moreover, based on the weights of the features in Equation (10), TST (triceps skin fold thickness) and 2SI (2-h serum insulin) do not contribute to the positive prediction in this polygon, whereas the most predictive feature is DPF (diabetes pedigree function). This latter feature represents the existence of a family history of diabetes for the patient and is known to be a significant predictive factor for diabetes [37].

Discussion
The performance metrics used in the present study facilitate the comparison of the characteristics of MTRE-PAN and TRE-PAN. Model parity measures the difference between the interpretable model and the target model without any distinction between certain and uncertain polygons. This metric is subject to sudden changes in accuracy because the uncertain polygons are treated as though they are certain. This trend can be observed in Tables 4-7. For example, in Table 4, the model parity at depth 2 increases to 93% from its previous value of 35% while the certain model parity remains at 0%. This means that all of the nodes currently in the tree exceed the cutoff entropy and are uncertain. The model parity stabilizes as the depth and certain model parity increase. When the total area of the uncertain polygons diminishes, the variance of the model parity also diminishes, since most of the data now reside within certain polygons and will never be re-labeled.
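The role of the cutoff entropy can be illustrated with a short sketch. The assumptions are substantial and should be read as such: the entropy is taken to be the Shannon entropy of the target model's −1/+1 labels for the samples falling in a node, the cutoff value of 0.1 bits is hypothetical, and the function names are illustrative only.

```python
from math import log2

def label_entropy(labels):
    """Shannon entropy (in bits) of a list of -1/+1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(1 for y in labels if y == 1) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

def node_status(labels, cutoff=0.1):
    """Classify a node as certain positive, certain negative, or uncertain.

    cutoff is a hypothetical value chosen for illustration only.
    """
    if label_entropy(labels) > cutoff:
        return "uncertain"
    positives = sum(1 for y in labels if y == 1)
    return "certain positive" if 2 * positives >= len(labels) else "certain negative"
```

With such a rule, a tree in which every node mixes both classes has no certain nodes at all, which is consistent with a certain model parity of 0% alongside a high model parity.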
Certain model parity increases as the depth of the tree in the interpretable model increases. This metric includes an uncertain class label for the data that fall within uncertain nodes. This intermediary class is always considered to be a miss when calculating the certain model parity. Therefore, a given sample will not transition from a true positive/negative to a false positive/negative as the tree is expanded. However, this transition can occur when model parity is used. The entropy of a given polygon may fall below the cutoff entropy and become certain, and as a result, a sample might be falsely labeled. This case does not affect certain model parity since the sample was considered falsely labeled before the split occurred.
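The difference between the two metrics can be sketched in a few lines. This is a simplified reading of the definitions above, not the authors' implementation: each interpretable-model prediction is represented as a hypothetical (label, is_certain) pair, and both parities are computed as plain fractions over all samples.

```python
def model_parity(target, predicted):
    """Fraction of samples whose interpretable-model label matches the
    target model, regardless of whether the leaf is certain."""
    hits = sum(1 for t, (label, _) in zip(target, predicted) if t == label)
    return hits / len(target)

def certain_model_parity(target, predicted):
    """As model_parity, but a sample in an uncertain leaf is always a miss."""
    hits = sum(1 for t, (label, certain) in zip(target, predicted)
               if certain and t == label)
    return hits / len(target)

# Toy data: four samples, two of which fall in uncertain leaves.
target = [1, 1, -1, -1]
predicted = [(1, True), (1, False), (-1, False), (1, True)]

mp = model_parity(target, predicted)
cmp_ = certain_model_parity(target, predicted)
```

In this toy case the model parity is 0.75 while the certain model parity is 0.25, because two of the three agreeing samples sit in uncertain leaves and are counted as misses.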
Overall the certain model parity is a better indicator of the convergence of the interpretable model to the decision boundaries of the target model. A major difference between MTRE-PAN and TRE-PAN can be observed for the Census data in Tables 6 and 7. Based on certain model parity, MTRE-PAN at depth 3 approximates the target model as well as TRE-PAN at depth 9. Similar trends can be observed for the other datasets.
The ultimate goal of global translation is to translate the target model into an interpretable model. Therefore, boundary model parity is the most important metric because it illustrates progress towards this goal. It is a test of the ability of the model to represent the decision boundaries of the target model without the potential for easily interpretable data to exaggerate the accuracy of the translation. Any interpretable model generated by MTRE-PAN or TRE-PAN can achieve high levels of certain model parity. However, if the decision boundaries are complex, the model will fail at shallow depths. This can be observed when the Abalone (Table 4) and Diabetes models (Table 8) are compared. Both datasets have the same number of dimensions (i.e., 8 input features each). However, the decision boundaries for the diabetes dataset are more complex. Although both models have similar certain model parity, the boundary parity for the diabetes model does not converge easily.
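A sketch of boundary model parity under simplifying assumptions is given below. How the margin around the decision boundaries is computed is not described in this section, so the sketch takes a precomputed boolean mask, near_boundary, as a hypothetical input; it also assumes that agreement is measured by plain label equality.

```python
def boundary_model_parity(target, predicted, near_boundary):
    """Label agreement computed only over samples flagged as near the boundary.

    near_boundary: list of booleans, assumed to be precomputed elsewhere.
    Returns None when no sample is flagged.
    """
    pairs = [(t, p) for t, p, nb in zip(target, predicted, near_boundary) if nb]
    if not pairs:
        return None
    return sum(1 for t, p in pairs if t == p) / len(pairs)

# Toy data: the two samples far from the boundary are both labeled correctly.
target = [1, -1, 1, -1]
predicted = [1, 1, 1, -1]
near_boundary = [True, True, False, False]

bmp = boundary_model_parity(target, predicted, near_boundary)
```

Here the agreement over all four samples would be 0.75, but restricting the computation to the two samples near the boundary gives 0.5, which illustrates how easily classified data can inflate an unrestricted metric.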
The leaf count serves as an indication of the complexity of extracting information from the interpretable models. Since MTRE-PAN is able to achieve certain model parity comparable to TRE-PAN at shallow depths, it requires significantly fewer nodes. For instance, when comparing Tables 6 and 7, MTRE-PAN requires only 4 leaf nodes to represent the behavior of the target model nearly as well as TRE-PAN with 173 leaf nodes. In the worst case, the leaf count doubles with every added level.
Both TRE-PAN and MTRE-PAN build a tree that represents the target model. The explanation in TRE-PAN is a decision tree which can be expressed as a set of rules [20,21]. However, this study shows that TRE-PAN has limitations, as it is unable to provide an accurate explanation of the decision boundaries of the target model. In contrast, MTRE-PAN seeks to directly represent the decision boundaries of the target model. This is accomplished by translating these decision boundaries to a set of bounded linear estimators. One of the main benefits of this approach is that it reduces the complexity traditionally associated with providing a global explanation of the target model.

Conclusions
This paper proposes MTRE-PAN, a global translation model that can be used for any target classification model. MTRE-PAN builds the explainable model as a hybrid combination of a decision tree and SVM linear separators at each node. This explainable model organizes the feature space into a set of polygons that approaches the behavior of the target model. MTRE-PAN was inspired by TRE-PAN. The latter uses a decision tree to partition the feature space, whereas the former uses linear hyperplanes. The performance of MTRE-PAN was compared to that of TRE-PAN for three target classification models developed using public domain datasets. The results show that MTRE-PAN achieves a higher parity across all metrics at shallower depths. The study also shows that certain model parity and boundary model parity are able to provide a better evaluation of the interpretable model than the traditional model parity. Certain model parity quantifies the accuracy of the interpretable model within low entropy regions. Boundary model parity specifically uses data within a margin of the decision boundaries to test the ability of the interpretable model to globally represent the decision boundaries of the target model.
MTRE-PAN has a higher computational complexity than TRE-PAN because it uses linear SVM classifiers at each node. However, as the results show, TRE-PAN requires a much deeper tree to achieve comparable results. The additional levels in the tree offset the difference in computational complexity. This is especially true for highly non-linear target models.
Future work includes the pruning of the MTRE-PAN interpretable tree model around the decision boundaries. Dynamic pruning can help develop a more compact global explanation by pruning old constraints as new ones are added. MTRE-PAN can also be improved by developing a loss function that is specific to each target model. This loss function could use the probability distribution produced by the target model to locate the decision boundaries. However, this will be at the cost of a more model-dependent methodology. Finally, MTRE-PAN should be compared to other global interpretable models, and its ability to interpret biased data should be evaluated.