A Visual Mining Approach to Improved Multiple-Instance Learning

Multiple-instance learning (MIL) is a paradigm of machine learning that aims to classify a set (bag) of objects (instances), assigning labels only to the bags. This problem is often addressed by selecting an instance to represent each bag, transforming a MIL problem into standard supervised learning. Visualization can be a useful tool to assess learning scenarios by incorporating the users' knowledge into the classification process. Considering that multiple-instance learning is a paradigm that cannot be handled by current visualization techniques, we propose a multiscale tree-based visualization called MILTree to support MIL problems. The first level of the tree represents the bags, and the second level represents the instances belonging to each bag, allowing users to understand the MIL datasets in an intuitive way. In addition, we propose two new instance selection methods for MIL, which help users improve the model even further. Our methods can handle both binary and multiclass scenarios. In our experiments, SVM was used to build the classifiers. With support of the MILTree layout, the initial classification model was updated by changing the training set, which is composed of the prototype instances. Experimental results validate the effectiveness of our approach, showing that visual mining by MILTree can support exploring and improving models in MIL scenarios and that our instance selection methods outperform the currently available alternatives in most cases.


Introduction
Many machine learning problems can be solved by standard supervised learning techniques, in which an object is represented by a single feature vector [1]. However, there are problems in which the target of the classification is a set of several instances, each one represented by a separate feature vector. This is the case of multiple-instance learning (MIL) [2]. In MIL, an object, called a bag, contains a set of instances. MIL was introduced in [3] to solve the problem of drug activity prediction, and it has since been applied successfully to image classification [4], cancer detection via images or sequences [5,6], text categorization [7], speaker recognition [8] and web mining [9]. MIL is particularly suited to weak supervision scenarios, which standard machine learning pipelines handle poorly [10]. Although multi-class classification is possible with MIL, most studies address only binary classification, where a bag belongs to either a positive or a negative class: a bag is labeled as positive if it contains at least one positive instance; otherwise, it is labeled as negative.
Different supervised methods have been proposed to handle the MIL problem [11][12][13][14]. A widely used strategy is to convert the multiple-instance problem into a classical supervised learning problem by selecting a single feature vector (instance) among the several in each bag. This instance, often called the instance prototype, is later used to represent the bag in both the training and classification steps, under the assumption that it is sufficient to represent the bag correctly [2,15,16]. Take, as an example, the image classification case where each image is considered a bag and its segmented regions (represented by separate feature vectors) are the instances. Under this strategy, the hypothesis is that one region in each image (the instance prototype) can represent the whole image and distinguish that "bag" from the others. Selecting a prototype by its relevance often needs to account for different factors such as similarity measures and the bias of the importance selection algorithm [17]. In this context, users should play a central role in defining the selection criteria and adjusting parameters for the target model.
Visualization techniques have been successfully employed to help users in standard classification tasks [18][19][20][21]. However, these techniques are not directly applicable to multiple-instance data for two reasons. First, they do not scale well, which is a problem since MIL datasets are often large due to the granularity of instances. Second, visualization methods are often designed to visualize all instances in the same space. An adequate visualization for this task should distinguish between bags and instances, reflecting the typical structure of an MIL dataset in the same layout.
Unlike previous approaches, in this paper, we propose the use of visualization to support user intervention in the multi-instance classification pipeline. We target the tasks of instance prototype selection and training set building by the analyst, performed interactively after a preliminary solution has been obtained by an automatic procedure. The central visualization in the approach is the MILTree, a multi-scale visualization technique based on the Neighbor-Joining similarity tree [22]. In the first level, only bags are placed, reducing the amount of data to be visualized compared to laying out all instances on the visual plane. The second level projects the instances belonging to a single bag, allowing the user to explore its contents. Each bag in the first level is connected to its instances in the second level, producing an intuitive structure that facilitates data analysis.
In addition to the visual support to MIL, we also propose two new instance selection methods with multiclass support: one integrating the MILTree with Salient Instance Selection (MILTree-SI) and the other with Medoids Selection (MILTree-Med). MILTree-SI is based on the MILSIS method [16], which assumes that negative bags only have negative instances, while our method considers that negative bags can have positive instances as well. This makes it suitable for applications such as image and text classification. The MILTree-Med method uses the k-Medoids clustering algorithm to partition the unlabeled instances in an attempt to find positive and negative clusters, thus identifying the instance prototypes as their medoids.
In that context, the main contributions of this work are:
• MILTree: a novel tree layout for multiple-instance data visualization;
• MILTree-Med and MILTree-SI: two new methods for instance prototype selection;
• MILSIPTree: a visual methodology to support multiple-instance data classification.

Related Work
The first studies in the MIL field included the Diverse Density (DD) [3], DD with Expectation Maximization (EM-DD) [23] and MI-SVM [11]. Later, several methods were proposed using instance selection strategies, such as MILES [15], MILIS [2] and IS-MIL [12]. These methods tackle the MIL problem by converting it into regular supervised learning. This is carried out by choosing instance prototypes (IP) for each bag, which then can be used to learn a classifier. MILSIS [16] is also a method based on instance selection that aims to identify instance prototypes named Salient Instances, which are true positive instances in positive bags.
When visualization, data mining and machine learning are studied in the same context, it is possible to provide better tools for exploring and understanding data [24]. Studies relating these topics become even more important in the case of unstructured data since we need to obtain representations of those objects in reduced dimensions [25,26]. There are several related studies aiming at supporting the classification process by means of visual tools [18][19][20][21]. To process multidimensional data, previous visual mining approaches have employed both multidimensional projections and trees.
Multidimensional projections map high dimensional vectors into points in a low dimensional space, such as 2D or 3D [27]. The result of this projection is a point placement on a plane, normally corresponding closely to a similarity relationship so that an instance is likely to be placed close to other similar instances and far from those that are not similar [28]. A large variety of techniques can be used to perform projection, such as Principal Component Analysis (PCA) [29], Multidimensional Scaling (MDS) [30], Least Square Projection (LSP) [31], Local Affine Multidimensional Projection (LAMP) [32], t-SNE and UMAP. Although projections have improved greatly in precision and performance, they are prone to producing overlapping points, causing clutter that hampers the interaction for datasets that are reasonably large.
On the other hand, similarity trees enforce the separability of the points by including edges between elements and causing branches between groups of similar points. They are constructed from the distances between the instances to be displayed. The Neighbor-Joining (NJ) method, originally proposed for reconstructing phylogenetic trees [22], is of particular interest. NJ builds unrooted trees, aiming at minimizing the tree length and number of tree branches by finding pairs of close instances and creating a branch from them. In this paper, we employ an improved NJ tree layout algorithm [33] that runs faster than the original NJ [22].
Our paper presents a novel contribution by extending the notion of visually supporting classification tasks to the case of multiple-instance learning, providing a methodology that assigns visual layouts to MIL tasks. We use a novel multi-level NJ tree, allowing users to explore MIL datasets, select the training set, create a model, visualize classification results, as well as update the current model using novel methods, as detailed in the next section.

Background and Related Concepts
In this section, we briefly present the main concepts regarding multi-instance learning and visual support for classification tasks.

Multiple Instance Learning
In supervised learning, we define classification as follows: given an instance space $X$ (also called input space) composed of individual instances (each one represented by a feature vector) and a label space $Y$ (output space) composed of classes that can be assigned to each sample, the task is to build a classifier, i.e., a map $f : X \to Y$. The classifier is often obtained by using a training set of instances as input with their corresponding true labels [1].
In MIL, an object is represented by different parts. Although the object itself has its own label, each part may, in principle, have a different label. As a result, the classical supervised learning formulation performs poorly in multi-instance scenarios.
Formally, a multiple-instance learning algorithm learns a classifier $f_{MIL} : 2^X \to \{-1, +1\}$, taking as input a dataset composed of bags $B_i$, with $i = 1, \dots, n$. Each bag contains a set of instances; the $j$-th instance inside a given bag is denoted as $B_{ij}$, with $j = 1, \dots, n_i$, in which $n_i$ is the number of instances inside the bag $B_i$. Considering a binary classification scenario, positive and negative bags are denoted, respectively, by $B_i^+$ and $B_i^-$. For the sake of simplicity, we will denote a bag as $B$ when it represents either positive or negative bags.
One of the crucial steps in this process is how to assign a label to the bag given the labels of its instances. The first MIL study [3], related to drug activity, assigned a positive label to a bag that has at least one positive instance. This approach makes sense for that application but has not been successful for other datasets [2]. The state-of-the-art strategies often select a prototype instance in order to represent the bag, and there are many different heuristics that can be used to perform this task.
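As a minimal illustration of the labeling rule used in the first MIL study, the sketch below assigns a bag label from instance labels under the "at least one positive instance" assumption. The function name `bag_label` and the toy bags are hypothetical and only serve the example; in practice instance labels are usually unknown, which is why prototype selection is needed.

```python
from typing import Sequence


def bag_label(instance_labels: Sequence[int]) -> int:
    """Return +1 if at least one instance is labeled +1, otherwise -1."""
    return 1 if any(label == 1 for label in instance_labels) else -1


# Toy bags with (normally hidden) instance labels.
toy_bags = {
    "B1": [-1, -1, 1],   # one positive instance -> positive bag
    "B2": [-1, -1],      # no positive instance  -> negative bag
}
toy_labels = {name: bag_label(labels) for name, labels in toy_bags.items()}
print(toy_labels)
```

Under the relaxed assumption adopted later in this paper, where negative bags may also contain positive instances, this rule no longer applies directly.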
In the following sections, we will give further details about two strategies for instance selection employed in multiple-instance learning. Moreover, we discuss ideas for visualization and interaction to improve prototype instance selection.

Salient Instance Selection Strategy
In MILSIS [16], the authors perform prototype selection in positive bags, obtaining Salient Instances via "Rough Selection" and "Fine Selection".
Rough Selection obtains two optimal positive instances from all positive bags, which are basically those with the highest and lowest values of salience. First, all instances in negative bags are grouped in a set $B^-$. Then, the salience $Sal(B^+_{ij})$ for each instance in a given bag $B^+_i$ is computed as follows:

$$Sal(B^+_{ij}) = \sum_{B^+_{ik} \in B^+_i \setminus \{B^+_{ij}\}} d(B^+_{ij}, B^+_{ik}) \tag{1}$$

where $d(\cdot, \cdot)$ is the Euclidean distance function. A high salience value indicates that the instance is different from the other instances in the bag. After computing saliences inside $B^+_i$, instances are sorted from the maximum ($j = 1$) to the minimum ($j = m$) salience values. This is used to estimate the probability that the $B^+_{i1}$ (maximum) and $B^+_{im}$ (minimum) instances are positive given the set $B^-$ (see Equation (2)), and then select an optimal positive instance, which will represent $B_i$:

$$\Pr(l(B_{ij}) = +1 \mid B^-) = 1 - \exp\left(-\frac{D(B_{ij}, B^-)}{\sigma^2}\right) \tag{2}$$

where $l(\cdot)$ is the label function, $\sigma$ is a scaling factor larger than 0, and $D(B_{ij}, B^-)$ is the minimum distance between $B_{ij}$ and all instances in $B^-$:

$$D(B_{ij}, B^-) = \min_{B^-_{rt} \in B^-} d(B_{ij}, B^-_{rt}) \tag{3}$$

Since $\Pr(l(B_{ij}) = +1 \mid B^-)$ increases with $D(B_{ij}, B^-)$ [16], from Equation (3), it is possible to estimate how likely an instance is to be labeled as positive or negative by its distance to the set of negative instances. Finally, the probabilities of each bag are compared in order to find the optimal positive instance, i.e., the one with maximum distance to $B^-$.
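The Rough Selection step for a single positive bag can be sketched as follows. This is an illustrative sketch rather than the MILSIS implementation: salience is taken as the sum of Euclidean distances to the other instances of the same bag, the probability is assumed to have the form 1 - exp(-D / sigma^2), and all function names and the toy data are hypothetical.

```python
import math


def salience(bag, j):
    """Sum of distances from instance j to the other instances of its bag."""
    return sum(math.dist(bag[j], other) for k, other in enumerate(bag) if k != j)


def min_distance_to_negatives(instance, negatives):
    """D(B_ij, B^-): distance to the closest instance of the negative set."""
    return min(math.dist(instance, neg) for neg in negatives)


def positive_probability(instance, negatives, sigma=1.0):
    """Assumed form of the estimate: grows with the distance to B^-."""
    return 1.0 - math.exp(-min_distance_to_negatives(instance, negatives) / sigma ** 2)


positive_bag = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
negatives = [(0.0, 0.5), (1.0, 0.0)]  # all instances from negative bags

# Sort instance indices from highest to lowest salience.
order = sorted(range(len(positive_bag)),
               key=lambda j: salience(positive_bag, j), reverse=True)
most_salient, least_salient = order[0], order[-1]

# Of the two extremes, keep the one farthest from the negative set.
candidate = max((most_salient, least_salient),
                key=lambda j: min_distance_to_negatives(positive_bag[j], negatives))
print(order, candidate)
```

In this toy bag the isolated instance at (5, 5) has the highest salience and is also the farthest from the negative set, so it is kept as the candidate positive instance.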
In Fine Selection, an optimal negative instance is selected from $B^-$ by its maximum distance to the optimal positive ones. Then, starting with the optimal positive instances obtained in the rough selection, $B^+_{i1}$ and $B^+_{im}$, it finds a true positive instance in each bag. These true positive instances will be part of the Salient Instances. Algorithm 1 summarizes the whole procedure. Note that $B = B^+ \cup B^-$; $n^+$ and $n^-$ are, respectively, the number of instances inside positive and negative bags; and SalNum is the number of salient instances.

Medoids Instance Selection Strategy
The Medoids Instance Selection [34,35] strategy computes the medoid for each bag using the k-Medoids algorithm. Each medoid will represent a bag, and then the multiple-instance learning task is reduced to a traditional supervised learning task. The k-Medoids algorithm is adapted to cluster the multiple-instance data so as to partition the unlabeled training bags into $k$ groups. After that, it re-represents each bag by a $k$-dimensional feature vector, where the value of the $i$-th feature is the distance between the bag and the medoid of the $i$-th group. In other words, the medoid of each group has the minimum average distance to the other bags in the same group, and all bags are represented as $k$-dimensional feature vectors in a regular supervised learning approach.

Visual Multiple-Instance Learning
Our approach to tackle the multiple-instance learning problem consists of two main features: a tree-based visualization to encode the MIL data (including instances and bags representations), coupled with new heuristics based on that visualization, to convert MIL into a standard machine learning problem.
The data under analysis can be visualized in the bag space or the instance space using MILTree. We also identify prototypes for each bag, which allows training a classifier using those prototypes. Two methods were designed to identify prototypes: MILTree-SI and MILTree-Med, both using the MILTree visualization proposed in this work. In this section, after defining additional notation, we describe the MILTree layout and then the MILTree-SI and MILTree-Med instance prototype selection methods.

Additional Notation
In addition to the notation described in Section 3.1, for each bag we designate two special instance prototypes, denoted by $B_{protoProj}$ and $B_{protoClass}$. The $B_{protoProj}$ is used for visualization purposes, denoting the instance prototype that will be used to map bags in the MILTree's bag space layout, while $B_{protoClass}$ denotes the prototype used to create the classification model. Initially they are the same; however, in order to keep the same visualization layout while updating the classification model, $B_{protoClass}$ can change, but $B_{protoProj}$ does not change, which preserves the MILTree layout throughout the visual mining process.

Creating a Multiple-Instance Tree (MILTree)
The improved NJ algorithm [33] begins with a star tree formed by all $m$ objects on the distance matrix, represented by leaf nodes arranged in a circular configuration and connected by branches to a single central node. Then, it iteratively finds the closest neighboring pair among all possible pairs of nodes by the criterion of minimum evolution, which attempts to minimize the sum of all edge lengths for all nodes of the tree. Afterwards, the closest pair is clustered into a new internal node, and the distances from this node to the remaining ones are computed to be used in subsequent iterations. The algorithm stops when $m - 2$ virtual nodes have been inserted into the tree, i.e., when the star tree is completely resolved into a binary tree.
Algorithm 2 illustrates the NJ tree procedure, which starts by computing the divergence of each node, i.e., the sum of the distances from node $i$ to all other nodes:

$$r_i = \sum_{k \neq i} D_{ik} \tag{4}$$

Then, it computes a new distance matrix based on the divergence $r_i$ in order to find the closest pair of nodes $i, j$:

$$c_{i,j} = D_{ij} - \frac{r_i + r_j}{m - 2} \tag{5}$$

After finding the pair of nodes $i, j$, a virtual node $u$ is created as a parent of both $i$ and $j$. The length of the edge connecting $u$ to $i$ is:

$$s_{iu} = \frac{D_{ij}}{2} + \frac{r_i - r_j}{2(m - 2)} \tag{6}$$

and the length of the edge connecting $u$ to $j$ is:

$$s_{ju} = D_{ij} - s_{iu} \tag{7}$$

Finally, the pair of nodes $i, j$ is replaced by $u$ in the matrix $D$, and the distances between $u$ and all other nodes are computed by Equation (8), where $k \neq i$, $k \neq j$ and $k = 1..m$:

$$D_{ku} = \frac{D_{ik} + D_{jk} - D_{ij}}{2} \tag{8}$$

The algorithm then iterates, finding and joining pairs of nodes until $m - 2$ virtual nodes are inserted into the tree.
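The joining loop can be sketched in a few lines of Python. This is a sketch of the original Neighbor-Joining iteration with the standard update rules, which we assume correspond to Equations (4) to (8); it is not the improved algorithm of [33] and it produces only the tree edges, not the visual layout. The function name `neighbor_joining` and the toy matrix are hypothetical.

```python
def neighbor_joining(matrix):
    """Return tree edges (parent, child, length) from a symmetric distance matrix.

    Leaves keep ids 0..m-1; virtual nodes receive ids m, m+1, ...
    """
    m = len(matrix)
    dist = {i: {j: float(matrix[i][j]) for j in range(m) if j != i} for i in range(m)}
    edges, next_id = [], m
    while len(dist) > 2:
        n = len(dist)  # number of nodes still to be joined
        r = {i: sum(row.values()) for i, row in dist.items()}  # divergence
        # Closest pair under the corrected distance D_ij - (r_i + r_j) / (n - 2).
        i, j = min(
            ((a, b) for a in dist for b in dist[a] if a < b),
            key=lambda p: dist[p[0]][p[1]] - (r[p[0]] + r[p[1]]) / (n - 2),
        )
        d_ij = dist[i][j]
        s_iu = d_ij / 2 + (r[i] - r[j]) / (2 * (n - 2))  # edge u-i
        s_ju = d_ij - s_iu                               # edge u-j
        u, next_id = next_id, next_id + 1
        edges += [(u, i, s_iu), (u, j, s_ju)]
        # Distances from the virtual node u to every remaining node k.
        new_row = {k: (dist[i][k] + dist[j][k] - d_ij) / 2
                   for k in dist if k not in (i, j)}
        del dist[i], dist[j]
        for k, row in dist.items():
            del row[i], row[j]
            row[u] = new_row[k]
        dist[u] = new_row
    a, b = dist  # the last two nodes are connected directly
    edges.append((a, b, dist[a][b]))
    return edges


# Additive toy distances from a tree with leaf edges 1, 2, 3, 4 and inner edge 5.
D = [[0, 3, 9, 10],
     [3, 0, 10, 11],
     [9, 10, 0, 7],
     [10, 11, 7, 0]]
nj_edges = neighbor_joining(D)
print(nj_edges)
```

For four leaves the loop runs twice and creates two virtual nodes, matching the stopping criterion of m - 2 insertions; because the toy distances are additive, the recovered edge lengths reproduce the tree that generated them.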
To map multiple-instance data to a visual tree structure, the MILTree was developed as a two-level NJ tree, with bags and instances projected in different levels. We group the instance matrix data by the previously known bags using the instance prototypes as nodes of the second level of the NJ. The layout of the tree projects the data into visual space.

Algorithm 2. NJ tree procedure. In each iteration:
1. Compute $r_i$ for $i = 1..m$ and find $c_{i,j}$, which is the closest pair of instances $i, j$ (see Equations (4) and (5)).
2. Replace $i, j$ by $u$ and update the distance matrix $D$ with the new node $u$ (see Equation (8)).
3. Define $u$ as parent of both $i$ and $j$.
4. Set $v = v - 1$.

A subset of the Corel-1000 dataset is used to illustrate the bag and instance projection levels in Figure 1. In this dataset, each image is a bag, composed of feature vectors (instances) that are extracted from disjoint regions of the image, with an average of 4.5 instances per bag.
In the first level (bag space projection), the red points represent positive bags (100 images of the flower category) and the blue points represent negative bags (100 images selected uniformly from the remaining categories of the dataset). When a bag is selected, its instances are projected in the second level of the multiple-instance tree (instance space projection). Algorithm 3 contains the complete procedure to build the MILTree, which starts by grouping the matrix data $D$ in bags $B_i$ (first for loop), where $i$ denotes the index of bags. We iterate over the instances in the matrix $D$, each line $D_m$ representing the distance from instance $m$ to all other instances. In each iteration, a new bag $B_i$ is created, and all its instances $B_{ij}$ are added to $B_i$, where $j$ denotes the index of instances belonging to bag $B_i$. Afterwards, we compute $B_{i,protoProj}$ and $B_{i,protoClass}$ for each $B_i$ using either MILTree-SI or MILTree-Med (second for loop). Recall that $B_{i,protoProj}$ is used in MILTree and $B_{i,protoClass}$ in the classification process. Every $B_{i,protoProj}$ is included in a set $P$, which will be used later to create a bag distance matrix in the NJ tree procedure. Finally, all bags $B_i$ are processed by the MILTree using $P$, creating the bag space projection.
Instance projection: Since our MILTree has two levels, when a user interacts with a bag $i$, the instances $B_{ij}$ will form an instance space projection $B_i$.tree. To create $B_i$.tree, the NJ-Tree algorithm takes as input the instances of $B_i$ and the prototype $B_{i,protoProj}$.
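The first-level input of the MILTree can be pictured as a bag distance matrix built from one prototype per bag. The sketch below illustrates that step only, assuming Euclidean distance between prototypes; `bag_distance_matrix` and `select_prototype` are hypothetical names, and the selector stands in for either MILTree-SI or MILTree-Med.

```python
import math


def bag_distance_matrix(bags, select_prototype):
    """Build the bag-level distance matrix from one prototype per bag.

    bags: list of bags, each a list of feature vectors (tuples of floats).
    select_prototype: callable returning the index of the instance used
        as the projection prototype of a bag (hypothetical placeholder).
    """
    prototypes = [bag[select_prototype(bag)] for bag in bags]  # the set P
    matrix = [[math.dist(p, q) for q in prototypes] for p in prototypes]
    return prototypes, matrix


toy_bags = [[(0.0, 0.0), (1.0, 0.0)], [(3.0, 4.0)]]
P, D_bags = bag_distance_matrix(toy_bags, select_prototype=lambda bag: 0)
print(D_bags)
```

The resulting matrix has one row per bag instead of one row per instance, which is what keeps the first level of the tree small; the second level would apply the same NJ procedure to the instances of a single selected bag.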

Instance Prototype Selection Methods
We propose two new prototype selection methods, MILTree-SI and MILTree-Med, based on those proposed by [16], as described in Section 3.2. Both the SI and Med approaches compute two prototypes per bag: $B_{ix}$ and $B_{iy}$. The first, $B_{ix}$, is used both in the visualization and to build the classification model, while $B_{iy}$ is an alternative prototype that can be used to update the classification model. Computing $B_{iy}$ offers an option to automatically change the prototypes of bags that are poorly represented by $B_{ix}$, for example, those that are misclassified in the training set, improving the multiple-instance classification model.
In contrast with the original method [16], we assume that not only positive bags but also negative bags could have both positive and negative instances. This is true for more complex data such as images and text. Consider, for example, the problem of discriminating between photos of flowers (positive) and photos of the classes person and animals (negative), in which each image is a bag and its instances are disjoint regions of the image. If we use the classic definition of a positive bag, all images (bags) containing a flower in at least one region (instance) are considered positive; to be considered negative, the image must not contain any flower. However, in this application, we are often interested in the main object in the scene. A photo whose main subject is an animal may contain a region with a flower, for example, in the background, or a person may carry a flower in a photo, although the person is the main object. Similar examples apply to contexts such as text, video and speech classification. Therefore, we compute both optimal positive and negative prototypes.

MILTree-SI
The optimal negative instance is defined as the one most distant from all the true positive instances of $B^+$ obtained by the original Salient Instance Selection method: $Sal(B^-_{ij})$ is computed using Equation (1), then a true negative instance is obtained for each negative bag according to Equation (3). Equations (1) and (3) are reproduced below for clarity:

$$Sal(B^+_{ij}) = \sum_{B^+_{ik} \in B^+_i \setminus \{B^+_{ij}\}} d(B^+_{ij}, B^+_{ik})$$

$$D(B_{ij}, B^-) = \min_{B^-_{rt} \in B^-} d(B_{ij}, B^-_{rt})$$

From all true negative instances, the optimal negative instance will be the one furthest from the set $B^+$. Afterwards, we select the optimal positive instance from $B^+$ using a similar procedure, but this time selecting the instance in $B^+$ with the maximum distance to the optimal negative instance found previously.
Finally, as we already have the optimal instances (positive and negative), we compute the instance prototypes $B_{ix}$ and $B_{iy}$ for positive and negative bags: $B_{ix}$ is the instance with the highest salience, and $B_{iy}$ is the one with the second-highest salience. Figure 2 shows the selection of prototypes (which could be $B_{ix}$ or $B_{iy}$) from negative bags using MILTree-SI.
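The final ranking step of MILTree-SI can be sketched as below. The sketch covers only the choice of the two highest-salience instances of a bag, with salience taken as the sum of Euclidean distances to the other instances of the bag; the search for optimal positive and negative instances is omitted, and `si_prototypes` is a hypothetical name.

```python
import math


def si_prototypes(bag):
    """Return indices (x, y) of the highest and second-highest salience instances."""
    def salience(j):
        return sum(math.dist(bag[j], other) for k, other in enumerate(bag) if k != j)

    ranked = sorted(range(len(bag)), key=salience, reverse=True)
    x = ranked[0]
    y = ranked[1] if len(ranked) > 1 else ranked[0]  # single-instance bag: reuse x
    return x, y


toy_bag = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
x, y = si_prototypes(toy_bag)
print(x, y)
```

Returning indices rather than copies keeps both prototypes tied to actual instances of the bag, so they can be highlighted directly in the instance space layout.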

MILTree-Med
Clustering algorithms have been frequently used for selecting prototypes in a feature space defined by the instances in MIL, as presented in Section 3.2.2. The original methods create a new artificial instance that represents a bag by choosing, for instance, a cluster centroid. MILTree-Med, unlike other methods, works in the instance space of each bag, selecting an actual instance without creating new ones, which better complies with the visualization scalability. Each bag is considered a cluster, and the prototype is the medoid of the cluster. Since we want to find two prototypes, the k-Medoids algorithm is applied with $k = 2$. Since all bags may contain positive and negative instances, we want to identify potentially positive and negative clusters.
The medoids of the sub-clusters are the instance prototypes $B_{ix}$ and $B_{iy}$. Figure 3 shows the selection of prototypes using MILTree-Med. Bag represents either a positive or negative bag, $c$ is the centroid of the bag, $m_1$ and $m_2$ are the medoids of the two sub-clusters and $d$ denotes the distances between each medoid and the centroid.
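For the small bags considered here, the two medoids of a bag can be found by exhaustive search over instance pairs. The sketch below uses that search as a stand-in for the iterative k-Medoids algorithm with k = 2: it returns the pair of instances minimizing the total distance of every instance to its nearest medoid. The name `two_medoids` is hypothetical, and how the two medoids are mapped to B_ix and B_iy is left open.

```python
import math
from itertools import combinations


def two_medoids(bag):
    """Return the index pair minimizing the k-medoids cost for k = 2."""
    if len(bag) < 2:
        return (0, 0)  # a single-instance bag has only one possible prototype

    def cost(pair):
        # Each instance contributes its distance to the nearest of the two medoids.
        return sum(min(math.dist(p, bag[m]) for m in pair) for p in bag)

    return min(combinations(range(len(bag)), 2), key=cost)


toy_bag = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
m1, m2 = two_medoids(toy_bag)
print(m1, m2)
```

Because both medoids are actual instances of the bag, no artificial points are introduced, in line with the requirement that prototypes remain visible in the instance space projection. The exhaustive search grows quadratically with bag size, so an iterative k-Medoids routine would be preferable for large bags.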

Updating Instance Prototypes Using MILTree
To create the first MILTree layout, as well as the classifier, we set $B_{i,protoClass} = B_{i,protoProj} = B_{ix}$, i.e., the first instance prototype selected by either the MILTree-SI or the MILTree-Med heuristic. The MILTree was developed so that the user can spot those bags that are poorly represented by the first prototype selection. For those bags, users can then set $B_{i,protoClass} = B_{iy}$ or manually inspect the bags to select a more representative one.
Two visual representations are available to the user:
1. Prototype highlighting: MILTree highlights the current prototype $B_{ix}$ with a darker color and also $B_{iy}$, which is the alternative prototype, with a lighter shade. Thus, by inspecting both, the user can validate $B_{protoClass}$ or update it according to their knowledge by selecting $B_{iy}$ or even another instance in the instance space layout. Figure 4 shows the instance prototypes $B_{ix}$ and $B_{iy}$ projected in the MILTree's instance space layout. In Figure 4a, the SI selection method is used, and in Figure 4b, the medoids selection is used instead.

2. SVM class match tree: the InstancePrototypes ClassMatch tree uses color to contrast the bags that were misclassified, considering a training or validation set for which the labels are known. A similar approach has been successfully used in [21] and [33]. In this approach, the instances $B_{protoClass}$ are used to build an SVM classifier, and the MILTree plots a layout we call the InstancePrototypes ClassMatch tree, with colors according to the classification result: pale green for correctly classified and red for misclassified bags. Figure 5a displays the MILTree generated for a subset of the Corel-1000 dataset, where red bags represent positive bags (images of horses) and blue bags represent negative bags (random images from other categories). To find the InstancePrototypes ClassMatch tree for this dataset, we allow users to select a training set to create an SVM classifier. Figure 5b shows the training set that was used to create the classifier; dark red bags are the ones used for training, while the pale blue ones are used as validation/test. Finally, the InstancePrototypes ClassMatch tree shows the classification results (see Figure 5c), where dark red points are misclassified bags, probably with non-representative prototypes.
Updating the prototypes will improve the model, as indicated by the results presented later in Section 6.

Application of MILTree to Multiple-Instance Learning Scenarios
In this section, we present three case studies that illustrate the practical usefulness of MILTree. The case studies were carried out on a Dell workstation Z620 equipped with an Intel Core CPU (E5-2690, 3.40 GHz) and 16 GB of memory.
The first presents a binary multiple-instance image classification problem using the Corel People dataset. The second one describes a multi-class scenario using images from five classes of the Corel dataset. Lastly, the MIL benchmark dataset Musk1 is used in the third case study. More information about each dataset can be found in Section 6.

Case 1: Instance Space Layout in a Binary Classification Problem
We demonstrate that an appropriate selection of instance prototypes can influence the accuracy of the classification. We use MILTree for the layout and MILTree-Med as the instance prototype selection method, and the updates of prototypes are performed in the Instance Space Layout. For this case, we use the Corel People binary dataset with 200 bags (images) and 938 instances (feature vectors extracted from image regions), which includes 100 images from the class People (positive) and 100 images randomly selected from all other classes (negative) of the Corel-1000 dataset. Figure 6a shows the projection of the bags using MILTree: red bags represent positive bags (images of people) and blue bags represent negative bags (images from other categories). The first step in the classification process is to select the training set: 20% of the images are selected to train the model, while the remaining 80% are used for validation and test. Due to the nature of the NJ algorithm, MILTree positions the bags that better characterize the class they belong to as far as possible from the core of the tree (external points), while the bags located in the core of the tree (internal points) have features that overlap with other classes. This characteristic can be extremely useful in identifying a representative sample by selecting both external and internal bags to create our training set so as to build a classifier that is neither too restrictive nor too general. This visual selection strategy has been demonstrated to produce better results in standard supervised learning when compared with random selection strategies [21]. Note that, in our representation, when the user selects a bag she/he is actually selecting its instance prototype $B_{protoClass}$. Figure 6b shows the selected training set for the People dataset, where red bags represent the training set and blue bags represent the test dataset.
The selected training set is then used to create an initial classifier (in our experiments, an SVM classifier). After applying the model over the validation set, we display the InstancePrototypes ClassMatch tree that highlights, in a contrasting color, the misclassified training bags, whose instance prototypes are likely to be non-representative. Figure 6c shows the InstancePrototypes ClassMatch tree for the People dataset, highlighting in red the misclassified points belonging to the training set. We then use the second level projection of MILTree to explore the instance space projection of those bags, aiming to improve the classification model.
To improve the MIL classification model, users have two options. The first option is to automatically change the $B_{protoClass}$ of all misclassified training bags identified in the InstancePrototypes ClassMatch tree by replacing it with the alternative instance prototypes $B_{iy}$. The second option, which we show in this case study, is to visually explore the instance space projection of all misclassified bags highlighted in the InstancePrototypes ClassMatch tree and individually choose how to update the instance prototypes.
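The first, automatic option can be sketched as a single pass over the training bags. The `Bag` record, the `update_misclassified` function and the threshold rule used as a classifier are all hypothetical; `predict` stands in for the trained SVM, and retraining after the update is left out.

```python
from dataclasses import dataclass


@dataclass
class Bag:
    label: int            # known bag label (+1 or -1)
    proto_proj: tuple     # prototype used for the layout; never changed
    proto_class: tuple    # prototype used to train the classifier
    alternative: tuple    # alternative prototype (B_iy)


def update_misclassified(bags, predict):
    """Swap the classification prototype of every misclassified training bag."""
    updated = []
    for idx, bag in enumerate(bags):
        if predict(bag.proto_class) != bag.label:
            bag.proto_class = bag.alternative
            updated.append(idx)
    return updated


# Stand-in classifier: positive when the first feature is above zero.
def predict(x):
    return 1 if x[0] > 0 else -1


training_bags = [
    Bag(label=1, proto_proj=(-1.0, 0.0), proto_class=(-1.0, 0.0), alternative=(2.0, 0.0)),
    Bag(label=-1, proto_proj=(-3.0, 1.0), proto_class=(-3.0, 1.0), alternative=(-2.0, 0.0)),
]
updated = update_misclassified(training_bags, predict)
print(updated)
```

Only the classification prototype is replaced, so the tree layout computed from the projection prototypes stays stable while the model is retrained on the updated set.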
Figure 7 shows the instance space layout of each highlighted bag, here referred to as A, B, C, D, E, F and G. Each instance space layout is a new NJ tree formed by instances that belong to the explored bag. We show the four possible visual steps required to update the prototypes. Note that the first, second and third steps could be executed automatically, but by exploring them manually, the user is allowed to see and control those steps individually. For instance, the bag C required a manual selection of the correct prototype, so a fourth step is added. This allows users to explore and analyze the instances in the instance space layout.
Using the example shown in Figure 7, we follow the four steps. The first step presents the initial status: the green and red points represent, respectively, the correctly classified and misclassified instances in the training set; the current instance prototypes ($B_{ix}$) are highlighted with larger circles. In the second step, the alternative prototypes $B_{iy}$ selected by MILTree-Med are shown. In our example, it makes no sense to update the prototypes of D, E, F and G because all instances were classified in the same class. However, inside bags A, B and C, two classes are available, so users can then accept the automatic MILTree-Med update by changing the prototype to the one with the correct classification result. In the third step, the new instance prototypes of bags A, B and C are shown, but the alternative prototype $B_{iy}$ is still misclassified in C. Due to this, a fourth step can be carried out to manually choose a new instance prototype, in this case a correctly classified one (green point) that is near the previous prototype.
After updating the bags' prototypes detected through the InstancePrototypes ClassMatch tree, we retrain the classification model. An accuracy of 72% was achieved before the update, and by updating only three bags, it was possible to increase it to 75%. In Figure 8a, we show the classification results using MILTree, and the corresponding ClassMatch tree is shown in Figure 8b. Note that the ClassMatch tree shows the bags that were misclassified in the whole dataset, including training and validation/test sets, while the InstancePrototypes ClassMatch tree only shows the bags that were misclassified in the training set. This case study demonstrates the positive impact of selecting representative instances on the accuracy of the classifier and that the visual exploration can play an important role by making this task easier.

Case 2: Bag Space Layout and a Multiclass Classification Problem
Here, we show the impact of adding new instances from bags that already exist in the training set to update and improve the classification model. Thus, in the updated model, some bags can be represented by more than one instance prototype. To update the model, the MILTree layout was used only in the bag space, and the MILTree-SI automatic selection method was used to detect new prototypes. We tested our approach with the multiclass Corel-300 dataset, containing 5 classes, 300 bags and 1293 instances. Figure 9a shows the ground truth of this dataset in the bag space layout.
As in the previous case study, an initial training set was selected using MILTree (see Figure 9b) to build an initial classifier for the Corel-300 dataset. We then show the InstancePrototypes ClassMatch tree to identify bags that have unsuitable instance prototypes, as shown in Figure 9c. Finally, we add all the alternative prototypes automatically detected by the MILTree-SI selection method to the current model. Thus, the instances $B_{iy}$ for all red bags (see Figure 9c) are added to the classification model.
Note again that in this case study, the new instance prototypes are added to the previously developed model. This strategy can be useful in multiclass problems because with more classes it can be difficult to select a single instance to represent each bag. The MILTree-SI method ranks the instances, and therefore, $B_{ix}$ and $B_{iy}$ are considered, respectively, the best and second-best prototypes, so that it is intuitive to add the second prototype to the model. In contrast, the MILTree-Med method uses clustering in an attempt to obtain a pair of positive and negative prototypes, so it is highly recommended to choose just one of them because adding both instances to the model could increase the class overlap.
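As a rough illustration of this update, the sketch below builds a training set in which every bag contributes its best-ranked instance and each misclassified (red) bag additionally contributes its second-best instance. The function name, the `ranked` dictionary and the toy identifiers are hypothetical; the ranking itself is assumed to come from MILTree-SI and is not reproduced here.

```python
def build_training_set(ranked, labels, misclassified):
    """Collect (instance, bag label) pairs for training.

    ranked:        bag id -> instances ordered from best to worst prototype
                   (assumed to be the output of an instance ranking step)
    labels:        bag id -> bag label
    misclassified: ids of bags whose first prototype was misclassified
    """
    training = []
    for bag_id, instances in ranked.items():
        # Every bag is represented by its best prototype, B_ix.
        training.append((instances[0], labels[bag_id]))
        # Misclassified bags also contribute the second-best prototype, B_iy.
        if bag_id in misclassified and len(instances) > 1:
            training.append((instances[1], labels[bag_id]))
    return training


# Hypothetical toy input: three bags, one of them flagged as misclassified.
ranked = {"bag1": ["i11", "i12"], "bag2": ["i21", "i22", "i23"], "bag3": ["i31"]}
labels = {"bag1": "horse", "bag2": "people", "bag3": "horse"}
updated = build_training_set(ranked, labels, misclassified={"bag2"})
```

With MILTree-Med, following the recommendation above, one would replace the prototype instead of appending a second one.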
After updating the classifier, Figure 9d shows the classified MILTree, and Figure 9e shows its corresponding ClassMatch tree, where green and red points represent bags correctly classified and misclassified, respectively. The accuracy before the update was 82.6%, and after including only eight new instances, it increased to 83.8%. It is also worth mentioning that the accuracy achieved using a standard classification method, without our visual mining tool, was 78%. This demonstrates that the selection of a proper training set and the process of updating the model using the proposed methods can help to create a classifier with improved performance, even in multiclass scenarios.

Case 3: Adding New Bags Using the MILTree Visualization
In this case study, we illustrate how to use MILTree to identify and select both new prototypes and new bags to update a model and improve its performance, assuming that the initial training set is not representative enough to build a classifier with good class discrimination capabilities. The ClassMatch tree is generated based on the initial classification results to identify misclassified bags, and MILTree-SI is used as the instance prototype selection method. As in the second case study, we only make use of the bag space layout of MILTree. We tested our system using the Musk1 benchmark dataset, consisting of 92 bags and 476 instances. Figure 10a shows the projection of the Musk1 dataset in the bag space layout of MILTree, in which blue bags represent negative bags and red bags represent positive bags.
First, an initial training set was selected using MILTree (see Figure 10b) to build the first classifier, and then the InstancePrototypes ClassMatch tree was generated to highlight the misclassified bags in the training set, as shown in Figure 10c. As in the second case study, the alternative instance prototypes $B_{iy}$ of all red bags (misclassified bags) are added to the initial model. In addition, we use the ClassMatch tree to show the classifier results in the validation set to identify possible new bags that can be used to update the model as well. This is an interesting strategy because the validation set contains bags that were unseen in the training stage. Figure 10d presents the ClassMatch tree obtained with the first classification result for the Musk1 dataset.
Users can employ different strategies to keep updating the classification model using our layout by searching for representative bags in branches with high error rates and including those new bags in the training set. Instead of randomly selecting those bags, the visualization helps the user to identify subspaces, or regions of the feature space, that are poorly represented in the training set by looking at the tree branches with a high error rate, such as those annotated with ellipses in Figure 10d.
The user can then assess the results with the confusion matrix of the validation set. In our example, around half of the negative bags (blue) were misclassified as positive (red), as shown in Figure 11. The layout shows a concentration of those errors in branches belonging to the same class and located in different regions of the tree; in other words, they are not neighbors in the tree projection, as shown in Figure 10e. This may indicate that this class covers a wide range of features and may contain subclasses.
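For readers who want to reproduce this kind of check outside the tool, a confusion matrix over bag labels and the per-class error rate can be computed as in the following minimal sketch. The function names and the toy label lists are hypothetical and are not taken from the Musk1 experiment.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) label pairs over the validation bags."""
    return Counter(zip(true_labels, predicted_labels))


def misclassification_rate(matrix, true_class):
    """Fraction of bags of `true_class` that received any other label."""
    total = sum(n for (t, _), n in matrix.items() if t == true_class)
    wrong = sum(n for (t, p), n in matrix.items()
                if t == true_class and p != true_class)
    return wrong / total if total else 0.0


# Hypothetical validation labels: half of the negative bags are predicted positive.
true_labels = ["neg", "neg", "neg", "neg", "pos", "pos", "pos", "pos"]
predicted = ["neg", "neg", "pos", "pos", "pos", "pos", "pos", "neg"]
matrix = confusion_matrix(true_labels, predicted)
negative_error = misclassification_rate(matrix, "neg")
```

A high error rate for a single class, as in this toy case, is the kind of signal that would prompt inspecting the corresponding branches of the ClassMatch tree.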
After inspecting and analyzing the ClassMatch tree layout, the user selects new bags and updates the initial model. Figure 10f highlights the two bags that were included in the model building process. Figure 10g shows the classified MILTree of the Musk1 test dataset, and Figure 10h presents its corresponding ClassMatch tree after the last model update. The accuracy on the Musk1 test dataset using the initial model was 73.9%; after updating it using new instance prototypes, we achieved 75.2%; and finally, after a second update using new bags, we obtained an accuracy of 83.2%. This demonstrates once again that visual mining selection strategies are helpful in the MIL scenario.
To evaluate the performance of the proposed methods when compared to state-of-the-art MIL methods, Section 6 presents quantitative experiments performed over different datasets.

Experiments and Results
This section presents the experiments carried out to compare the proposed methods with the MIL methods available for each dataset. We evaluate the average precision, average recall and average accuracy of both MILTree-Med and MILTree-SI using the MILTree layout for five MIL benchmark datasets, the Corel-1000 and Corel-2000 image classification datasets, as well as the Biocreative text classification dataset. Furthermore, we perform experiments on a large-scale dataset and on a multiclass problem, both not addressed in previous works. The proposed methodology, called MILSIPTree, is used to carry out the complete multiple-instance classification process. Note that MILSIPTree is supported by both the MILTree-Med and MILTree-SI instance prototype selection methods and by the MILTree layout as well.
The source code of the proposed methods and the multi-instance datasets are publicly available to allow reproducibility. The LIBSVM package was applied to train all SVMs using settings that are comparable with the results obtained by the competing methods. For the Musk1 and Musk2 datasets, we employed the nu-SVC classifier (Nu-Support Vector Classification), with a Nu value equal to 0.6. For the image datasets (Elephant, Fox, Tiger, Corel-1000 and Corel-2000) and the text dataset (Biocreative), we used the C-SVC classifier (C-Support Vector Classification), with a Cost value equal to 1. No kernel was used because we intended to show how the proposed methods help obtain a feature space with linearly separable classes, which requires less effort on designing the classifier.
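To make these settings concrete, the sketch below maps each dataset to the corresponding options of LIBSVM's `svm-train` command-line tool (`-s` selects the SVM type, `-t` the kernel type, `-n` the nu value and `-c` the cost). Reading "no kernel" as the linear kernel (`-t 0`) is an assumption of this sketch, and the `SETTINGS` table and `svm_train_options` helper are hypothetical names introduced only for illustration.

```python
# svm-train options documented by LIBSVM:
#   -s 0 = C-SVC, -s 1 = nu-SVC, -t 0 = linear kernel, -c = cost, -n = nu
SETTINGS = {
    "Musk1": ("nu-SVC", 0.6),
    "Musk2": ("nu-SVC", 0.6),
    "Elephant": ("C-SVC", 1),
    "Fox": ("C-SVC", 1),
    "Tiger": ("C-SVC", 1),
    "Corel-1000": ("C-SVC", 1),
    "Corel-2000": ("C-SVC", 1),
    "Biocreative": ("C-SVC", 1),
}


def svm_train_options(dataset):
    """Return the svm-train option list for one dataset (linear kernel assumed)."""
    svm_type, value = SETTINGS[dataset]
    if svm_type == "nu-SVC":
        return ["-s", "1", "-t", "0", "-n", str(value)]
    return ["-s", "0", "-t", "0", "-c", str(value)]


musk1_options = svm_train_options("Musk1")
fox_options = svm_train_options("Fox")
```

The returned list can be prepended to the training file name when invoking `svm-train`; all other parameters are left at the tool's defaults here, which may differ from the configuration actually used in the experiments.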

Benchmark Datasets
Five standard MIL benchmarks were used: the Musk1 and Musk2 datasets [3], as well as the Elephant, Fox and Tiger image datasets [11]. These have been widely used in multiple-instance learning studies.
Musk1 and Musk2 are real-world benchmark datasets available at the UCI machine learning repository [36]. The Musk data were generated in research on drug activity prediction, in which a drug molecule is represented by a bag, and the different structural conformations of this molecule are considered as instances. Musk1 contains 47 positive bags and 45 negative bags, and the number of instances contained in each bag ranges from 2 to 40. Musk2 contains 39 positive bags and 63 negative bags, and the number of instances contained in each bag ranges from 1 to 1044. Each instance is represented by 166 continuous attributes. Table 1 shows the detailed information on the Musk datasets.
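As a quick consistency check on these figures, the average number of instances per bag (the Inst/Bag quantity reported in Table 1) can be derived from the counts quoted in the text. The sketch below does this for Musk1 only, using the 476 instances mentioned in the third case study; the helper name is hypothetical.

```python
def instances_per_bag(num_instances, num_bags):
    """Average number of instances per bag."""
    return num_instances / num_bags


# Musk1: 47 positive and 45 negative bags, 476 instances in total.
musk1_bags = 47 + 45
musk1_ratio = instances_per_bag(476, musk1_bags)
print(f"Musk1: {musk1_bags} bags, {musk1_ratio:.2f} instances per bag")
```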
The image datasets named Elephant, Fox and Tiger were built with the goal of discriminating images containing elephants, foxes and tigers from those that do not, respectively. In this case, bags are images, and instances are regions of interest within the images. More details about these datasets are given in Table 2.
We split each dataset into 30% training and 70% testing data. Our methods allow the training set to be small because they provide smart ways to select it so that it is representative enough. The model update was similar to that of case studies 1 and 2: the initial model is updated by either changing the prototypes using MILTree-Med or including new prototypes using MILTree-SI. Table 3 compares the results achieved by MILTree-SI and MILTree-Med in detail. Additionally, in Table 4, we compare MILTree-SI and MILTree-Med with the accuracy reported by nine MIL algorithms from the literature: four baseline methods, namely EM-DD [23], DD-SVM [37], mi-SVM [11] and MI-SVM [11], and five methods that use instance selection, namely MILES [15], MILIS [2] and MILD-B [38], as well as the most recent state-of-the-art results from MILSIS [16] and MILDE [4]. The best accuracies are shown in bold.
The actual number of bags used to update the initial model was between three and eight. These bags were selected through the visual analysis of the data and by choosing alternative prototypes that were automatically detected by using either the SI or the Med approach. The results show that the proposed MILTree-SI and MILTree-Med methods are very competitive, especially MILTree-Med, which achieved an overall average performance of 82.8%.
Table 7 presents the accuracy of different methods from the literature, including EM-DD [23], mi-SVM [11], MI-SVM [11], DD-SVM [37] and SMILES [14]. We can see that our MILTree-SI and MILTree-Med methods achieve high classification accuracy on the sub-datasets. In particular, MILTree-Med outperforms all the others in all but three datasets. The competing methods EM-DD, MI-SVM and DD-SVM select only one instance as a prototype, which is often not sufficient. Moreover, our method was competitive when compared with mi-SVM and SMILES, even though both use all instances from each bag to build the classifier.

Multiple-instance Multiclass Datasets
In this section, we turn our attention to the performance of MILTree-Med and MILTree-SI using the MILTree layout for solving multiclass classification problems. The baseline methods, such as EM-DD, mi-SVM, MI-SVM and DD-SVM, were originally proposed for binary classification. Our MILTree layout and the MILTree-Med and MILTree-SI methods also support multiclass datasets. We extend MILTree-Med and MILTree-SI to the multiclass case using a one-against-all scheme, which decomposes the problem into a number of binary classifiers that are created to separate each class from the remaining ones. We used the Corel-2000 dataset with 2000 images, 20 classes and 100 images per class. Details about segmentation and feature extraction were mentioned in Section 6.2. Two experiments were carried out: one using the first 10 categories in the dataset (Corel-1000), and a second one using the complete dataset with all 20 categories (Corel-2000). Figure 12 shows images randomly sampled from the 20 categories.
Table 8 presents the classification accuracy rates, including the results of DD-SVM, MILES and MILIS, as reported by the original papers, and the results of MI-SVM and mi-SVM, as reported in [2]. Bold values indicate the method that obtains the best performance in each dataset. From Table 8, we can see that MILTree-Med and MILTree-SI outperform the competing methods, which we attribute to the efficient bag selection strategy used for training and the efficient instance prototype selection performed inside each bag.
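The one-against-all decomposition used in this section can be sketched as follows. To keep the example self-contained, each binary classifier is a simple nearest-centroid scorer rather than the SVM used in the experiments; this substitution, the function names and the toy data are assumptions made purely for illustration.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(values) / n for values in zip(*points))


def train_one_vs_rest(X, y):
    """Build one binary model per class: that class against all remaining ones."""
    models = {}
    for c in sorted(set(y)):
        positives = [x for x, label in zip(X, y) if label == c]
        rest = [x for x, label in zip(X, y) if label != c]
        models[c] = (centroid(positives), centroid(rest))
    return models


def predict(models, x):
    """Pick the class whose binary model gives the highest score."""
    def score(c):
        class_center, rest_center = models[c]
        # Positive when x is closer to the class than to the remaining classes.
        return dist(x, rest_center) - dist(x, class_center)
    return max(models, key=score)


# Hypothetical prototype instances for three classes.
X = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
y = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_rest(X, y)
```

In the setting of this paper, the training vectors would be the instance prototypes selected for each bag, and each binary model would be a linear SVM.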

Scalability Analysis on a MIL Text Classification
In MIL text classification, each document is represented as a bag and the document paragraphs as instances. The Biocreative dataset used in this experiment has 1623 documents (papers) extracted from biomedical journals, belonging to three text categories: Components, Processes and Functions, all referring to Gene Ontologies (GOs) [39]. It contains 34,569 instances, posing a challenge to conventional visualizations. Table 9 details this dataset. Each text document in the collection has a protein identification, an associated article ID in PUBMED and a description text. We split the dataset into about 10% training and 90% testing data. After selecting the training set with MILTree and using the proposed prototype selection methods, all training bags were correctly classified, which means that all instances chosen as instance prototypes by the MILTree-SI and MILTree-Med selection methods were representative.
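The bag construction for text can be illustrated with a minimal sketch that turns a document into a bag of paragraph instances. The assumption that paragraphs are separated by blank lines, the function name and the toy document are hypothetical; the feature extraction applied to each paragraph is not shown.

```python
def document_to_bag(text):
    """Split a document into paragraphs; each non-empty paragraph is one instance."""
    return [paragraph.strip() for paragraph in text.split("\n\n") if paragraph.strip()]


# Hypothetical document with three paragraphs separated by blank lines.
document = "First paragraph.\n\nSecond paragraph,\nstill the same one.\n\n\n\nThird paragraph."
bag = document_to_bag(document)
```

Each bag then inherits the label of its document, while the individual paragraphs remain unlabeled, which matches the MIL setting described above.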
To compare our methods with other state-of-the-art methods, we employed the Weka machine learning package (http://www.cs.waikato.ac.nz/ml/weka) (accessed on 20 August 2021). There are no previous results for the three categories of the Biocreative dataset employing the aforementioned methods, such as MILES, SMILES and others. Previous work only shows results for one category, such as [40], which presents results only for the "Process" category. For this reason, we compare our methods with multi-instance methods available in Weka, such as DD, EM-DD, MI-SVM, MIWrapper [41], TLDSimple [42] and MIBoost [43]. Table 10 shows the results, where both the MILTree-Med and MILTree-SI methods supported by the MILTree layout obtained higher accuracy. The results can be explained by the visual discrimination of the categories "Components", "Processes" and "Functions" in the bag space projection of MILTree (see Figure 13a), which provides a clear guideline for users when identifying representative bags for the training set. This corroborates previous results that favor the strategy of selecting samples from both the internal and external parts of the MILTree. In Figure 13b, we show the training sample selected following the established guidelines.

MILTree Layout Bag Positioning
This experiment aims at evaluating how the bag positions in the MILTree layout are related to good candidates for training set selection. As mentioned in Section 5.1, the MILTree projects a bag belonging to a given class as an external point of the tree if it is furthest from the remaining classes. At the core of the MILTree (internal points) will be the bags that are closest to other classes, as well as the ones that overlap in feature space.
We followed a similar methodology to investigate the impact of bag positioning on the classification results. Table 11 shows the results of each multi-instance classification for both collections. When using just external points, the model is often unable to represent boundary elements, resulting in a classifier that does not take into account the degree of overlap between the classes. Using only internal bags, we add this information to the training set, but by combining external and internal bags, we have a sample containing both the more class-distinct elements and the ones belonging to the decision boundary region, resulting in a more accurate classifier.

User Study
In this section, we present a user study to evaluate the usability of the MILTree layout for multiple-instance learning problems. We conducted the user study with five participants, three male and two female, who were all undergraduate or graduate students. The age of the participants ranged from 20 to 33. All participants, except one who received additional guidance, had previous knowledge about supervised learning and classification models. All users performed the same task, which was to build a multiple-instance model for the Corel-300 dataset (see Section 5.2 for details). We prepared two 10-minute videos to explain how to use MILTree for multiple-instance classification, in which we used datasets other than Corel-300 as examples. All participants watched the training videos prior to starting the study. We also introduced MILTree to all participants and showed how they could use it. All participants used MILTree-SI as the instance prototype selection method and were instructed to use around 20% of the data to train the model, while the remaining 80% would be used for validation and testing.
After finishing the task, each participant was asked to answer a multiple-choice questionnaire, with the questions shown in Table 12, and to justify the grade in a few sentences. For each question, the answers 0, 1, 2, 3 and 4 were available, where 0 is the worst score and 4 is the best score. The grades had the following meanings: 0, No; 1, Little; 2, Fair; 3, Good; 4, Excellent.
Table 13 presents the means of the grades given to the questions. These results indicate that the participants found MILTree to provide effective support for MIL. All participants agreed that MILTree provided a good understanding of the multiple-instance data structure and supported them in the classification task. They left comments such as the following: "Identifying the classes within dataset is very simple because, generally, instances of the same class are closer"; "In two simple steps it is possible to identify and update misclassified bags using Prototypes-ClassMatch tree"; "ClassMatch tree is useful because we can identify new bags that could be more representative for each class." About MILTree as a tool for multi-instance classification, the majority of participants said that it is easy to use, leaving comments such as: "It is a very useful tool for some tasks with

Conclusions
In this paper, we propose MILTree for visual data mining in multiple-instance learning scenarios using an intuitive two-level tree structure that resembles MIL data models. While visually supporting data understanding, our approach also handles multiclass problems: the MILTree-SI selection method aims to uncover the most representative instances in both positive and negative bags, where negative bags could also have positive instances; the MILTree-Med method uses a clustering algorithm to partition unlabeled instances in search of positive and negative clusters to identify adequate instance prototypes.
Besides producing comparable or better accuracy with respect to state-of-the-art methods, our MILTree-based techniques allow the user to take part in every step of the multiple-instance classification process, such as data exploration, sampling for training, model updating (both automatic and manual) and validating the classification model.
The method has been tested on datasets of various sizes, and users have found positive aspects of the approach, as well as limitations, mainly related to interactive functions in the current prototype system.
Our methods and techniques combined are, to the best of our knowledge, the first complete set of visual tools to support MIL.
Because we deal with data that are organized in multiple levels (bag and instance levels in the case of MIL), future work can explore related tasks, such as hierarchical clustering or learning from label proportions [45,46], in which the data are organized in groups that can be viewed as bags, and only the proportion of each class in each bag is known. An alternative high-precision way of organizing the samples in bags might be to use multidimensional projections. While we would lose the hierarchical organization, there might be benefits in the precision of the display. This is an avenue worth pursuing. A visual alternative for very large datasets is also an expected development from this work.

Figure 1. Layouts of Bag and Instance Spaces for the Corel-1000 subset (with instance prototypes highlighted) in the MILTree, with a total of 200 bags and 824 instances.

Figure 2. Selection of instance prototypes on negative bags using MILTree-SI. $B_i^+$ represents positive bags, $B_i^-$ represents negative bags and $d(B_{i1}^+, B_{ij}^-)$ represents the Euclidean distance between the optimal positive instance and a given negative instance.

Figure 4. Methods for selecting instance prototypes $B_{ix}$ and $B_{iy}$. Both (a) and (b) project the same instances from a positive bag $B_i^+$ of the Musk1 dataset on the MILTree's instance space layout.

Figure 5. MILTree's bag space layout for a subset of the Corel-1000 dataset (100 images of the horse category and 100 random images selected from the remaining categories), with the projection of its ground truth (a), selected training set (b) and InstancePrototypes ClassMatch tree (c).

Figure 6. MILTree's Bag Space Layout for the People Category of the Corel-1000 dataset, with the projection of its ground truth (a), selected training set (b) and InstancePrototypes ClassMatch tree (c).

Figure 7. Instance Space Layout of each bag with an unsuitable instance prototype. A, B, C, D, E, F and G represent red bags.

Figure 8. Classification result in the Bag Space Layout of MILTree for the People Category of the Corel-1000 dataset using a classification model with new instance prototypes (a) and corresponding ClassMatch tree (b).
Figure 9a displays the Corel-300 dataset in the bag space projection, where bags are points and the different colors represent different classes.

Figure 9. MILTree's Bag Space Layout for the Corel-300 dataset. Visualization of the classification process from training set selection to classification result inspection. Visualization of the ground truth of the dataset (a), selected training set (b), InstancePrototypes ClassMatch tree where bags with unsuitable instance prototypes are identified (c), visualization of the classification result (d) and its corresponding ClassMatch tree (e). Note that the InstancePrototypes ClassMatch tree only shows the bags that were misclassified in the training set, whereas the ClassMatch tree shows the bags that were misclassified in the test data and training set. Hence, for evaluating the classification results, users should only inspect the ClassMatch tree (e).
Figure 10. Visualization of the classification process for the Musk1 dataset. (a) Ground truth. (b) Selected training set. (c) InstancePrototypes ClassMatch tree. (d) ClassMatch tree of initial classification result. (e) Misclassified bags selected in the ground truth. (f) Selection of new bags located in the branches where bags were misclassified. (g) Classification result using updated model with new instance prototypes and new bags. (h) ClassMatch tree of final classification result.

Figure 11. Confusion matrix for the Musk1 dataset for classification results after using the initial classification model. The blue color represents the negative class, and the red color represents the positive class.

Figure 12. Images randomly sampled from 20 categories of the COREL dataset and the corresponding segmentation results. Segmented regions are shown in their representative colors.
Three training sets are used in this analysis; the first training set is composed only of external instances, the second training set is composed only of internal instances and the third training set is composed of a combination of the first and second training sets.

Figure 13. Bag space projection of MILTree for the Biocreative dataset using MILTree-SI, with the projection of its ground truth (a) and the selected training data (red bags) (b).
The Corel's Cat3 and Cat6 subdatasets and MILTree-Med were used for this experiment. For Cat3, a total of 47 bags were selected as training examples, while the remaining 153 bags were used as the test set. For Cat6, 44 bags were selected for training, while the remaining 156 bags were used as the test set.

Table 1. Musk datasets and the average number of instances per bag (Inst/Bag) for each dataset.

Table 2. Image datasets and the average number of instances per bag (Inst/Bag) for each dataset.

Table 3. Results of classification using MILTree-Med and MILTree-SI on the benchmark datasets.

Table 6. Classification results using MILTree-SI on the Corel Dataset.

Table 7. Comparison between MILTree-SI/MILTree-Med and related methods from the literature on the Corel Dataset.

Table 8. Comparison between MILTree-SI/MILTree-Med and related methods from the literature on the Corel-1000 and Corel-2000 Datasets.

Table 9. Biocreative dataset. Total number of bags and instances for each category.

Table 10. Comparison of classification accuracy between MILTree-SI/MILTree-Med and baseline methods on the Biocreative dataset. The bold value indicates the method that obtains the best performance.

Table 11. Results of multi-instance classification using three types of training set.

Table 12. The questionnaire used in the evaluation.
Is the ClassMatch tree useful for discovering new bags that help to improve or update the model?
5. Do you feel that MILTree provides useful support in the multiple-instance classification process?

Table 13. Means of the results obtained in the evaluation using the questionnaire for multi-instance classification of the Corel-300 dataset.

Table 16. Contrast Estimation between the methods on each row with respect to the methods on each column, considering different datasets. Positive values indicate that the method in the row presented higher average accuracy than the method in the column. The proposed methods are MILTree-Med (MT-Med) and MILTree-SI (MT-SI).