# A Probabilistic Bag-to-Class Approach to Multiple-Instance Learning


## Abstract


**Dataset:** Breast tissue images available at https://bioimage.ucsb.edu/research/bio-segmentation, extracted feature vectors available at https://figshare.com/articles/MIProblems_A_repository_of_multiple_instance_learning_datasets/6633983. BreakHis data available at https://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis/. Code available at https://github.com/kajsam/ProbabilisticBag2Class.

**Dataset License:** CC BY 4.0

## 1. Introduction

#### 1.1. Classification of Weakly Supervised Data

#### 1.2. Multi-Instance Learning

#### 1.3. Bag Density and Class Sparsity

#### 1.4. A Probabilistic Bag-to-Class Approach to Multi-Instance Learning

- presenting the hierarchical model for general, non-standard MI assumptions (Section 3.3),
- introducing bag-to-class dissimilarity measures (Section 3.5), and
- identifying two properties for bag-to-class divergence (Section 4.1).

## 2. Related Work and State-of-the-Art

## 3. Theoretical Background and Intuitions

#### 3.1. Notation

- $X$: instance random vector
- $C$: class, either $pos$ or $neg$
- $B$: bag
- $P(\cdot)$: probability distribution
- $\mathbf{x}_{ki}$: feature vector (instance) in set $k$, $i = 1, \dots, n_k$
- $\mathbb{X}_k$: set of feature vectors $k$ of size $n_k$
- $y_k$: bag label
- $\mathcal{X}$: sample space for instances
- $\mathcal{X}^{+}$: sample space for positive instances
- $\mathcal{X}^{-}$: sample space for negative instances
- $\mathcal{B}_{pos}$: sample space of positive bags
- $\mathcal{B}_{neg}$: sample space of negative bags
- $P(C \mid \mathbb{X}_k)$: posterior class probability, given the instance sample
- $\Theta$: parameter random vector
- $\theta_k$: parameter vector
- $P_{bag}(X) = P(X \mid B)$: probability distribution for instances in bag $B$
- $P(X \mid \theta_k)$: parameterized probability distribution of bag $k$
- $P_{pos}(X) = P(X \mid pos)$: probability distribution for instances from the positive class
- $P_{neg}(X) = P(X \mid neg)$: probability distribution for instances from the negative class
- $\tau_i$: instance label
- $\pi_k$: probability of positive instances
- $D(P_k, P_\ell) = D(P_k(X), P_\ell(X))$: divergence from $P_k(X)$ to $P_\ell(X)$
- $f_k(\mathbf{x}) = f(\mathbf{x} \mid \theta_k)$: probability density function (PDF) for bag $k$
- $D(f_k, f_\ell) = D(f_k(\mathbf{x}), f_\ell(\mathbf{x}))$: divergence from $f_k(\mathbf{x})$ to $f_\ell(\mathbf{x})$

#### 3.2. The Non-Vectorial Space of Probability Functions

#### 3.3. Hierarchical Distributions

- Step 1: Draw $\pi_k$ from $P(\Pi_{pos})$, $\theta_k^{+}$ from $P(\Theta^{+})$, and $\theta_k^{-}$ from $P(\Theta^{-})$. These three parameters define the bag.
- Step 2: For $i = 1, \dots, n_k$, draw $\tau_i$ from $P(\mathcal{T} \mid \pi_k)$; draw $\mathbf{x}_i$ from $P(X \mid \theta_k^{+})$ if $\tau_i = 1$, and from $P(X \mid \theta_k^{-})$ otherwise.
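The two-step generative process can be sketched in a few lines. This is a minimal illustration, not the paper's exact settings: the Beta hyper-distribution for $\pi_k$ and the Gaussian choices for the bag-level means and the instance noise are assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_bag(n_k, rng):
    """Draw one bag under the two-step hierarchical model.

    The hyper-distributions below (Beta for pi_k, Gaussians for the
    component means) are illustrative assumptions only.
    """
    # Step 1: bag-level parameters define the bag.
    pi_k = rng.beta(2, 8)            # probability of positive instances
    theta_pos = rng.normal(15, 1)    # mean of the positive component
    theta_neg = rng.normal(0, 1)     # mean of the negative component

    # Step 2: instance labels tau_i, then instances x_i.
    tau = rng.random(n_k) < pi_k
    x = np.where(tau,
                 rng.normal(theta_pos, 1, n_k),
                 rng.normal(theta_neg, 1, n_k))
    return x, tau

x, tau = draw_bag(200, rng)
print(x.shape, tau.mean())  # instances and empirical positive fraction
```

Only `x` would be observed in practice; the labels `tau` and the bag parameters remain latent, which is exactly the weak-supervision setting.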

#### 3.4. Dissimilarities in MI Learning

- Equality and orthogonality: An f-divergence attains its minimum when the two probability functions are equal and its maximum when they are orthogonal. Hence two identical bags have minimum dissimilarity, and two bags with no shared sample space have maximum dissimilarity. A definition of orthogonal distributions can be found in [47].
- Monotonicity: The f-divergences possess a monotonicity property that can be seen as an analogue of the triangle inequality for distances: for a family of densities with monotone likelihood ratio, if $a < \theta_1 < \theta_2 < \theta_3 < b$, then $D(P(X \mid \theta_1), P(X \mid \theta_2)) \le D(P(X \mid \theta_1), P(X \mid \theta_3))$. This holds, e.g., for Gaussian densities with equal variance and mean $\theta$. In other words, if $\theta_3$ is farther from $\theta_1$ than $\theta_2$ is, the divergence is at least as large. f-divergences are not symmetric by definition, although some of them are.
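The monotonicity property is easy to check numerically for the equal-variance Gaussian example, where the KL-divergence has the closed form $KL(N(\mu_1, \sigma^2) \,\|\, N(\mu_2, \sigma^2)) = (\mu_1 - \mu_2)^2 / (2\sigma^2)$:

```python
import numpy as np

def kl_gauss(mu1, mu2, sigma=1.0):
    """Closed-form KL divergence between two Gaussians of equal variance:
    KL(N(mu1, s^2) || N(mu2, s^2)) = (mu1 - mu2)^2 / (2 s^2)."""
    return (mu1 - mu2) ** 2 / (2 * sigma ** 2)

theta1, theta2, theta3 = 0.0, 1.0, 2.5   # theta1 < theta2 < theta3
d12 = kl_gauss(theta1, theta2)
d13 = kl_gauss(theta1, theta3)
assert d12 <= d13   # monotonicity: farther parameter, larger divergence
print(d12, d13)     # 0.5 3.125
```

In this family the divergence grows quadratically in the mean difference, so the monotonicity inequality holds for any $\theta_1 < \theta_2 < \theta_3$.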

#### 3.5. Bag-to-Class Dissimilarity

## 4. Properties for Bag-Level Classification

#### 4.1. Properties for Bag-to-Class Divergences

#### 4.2. A Class-Conditional Dissimilarity for MI Classification

#### 4.3. Bag-Level Divergence Classification

## 5. Experiments

#### 5.1. Simulated Data and Class Sparsity

- Sim 1: $\nu^{+} = 15$, $\eta^{+} = 1$, $\pi^{-} = 0$: no positive instances in negative bags.
- Sim 2: $\nu^{+} = 15$, $\eta^{+} = 1$, $\pi^{-} = 0.01$: positive instances in negative bags.
- Sim 3: $\nu^{+} = 0$, $\eta^{+} = 100$, $\pi^{-} = 0$: positive and negative instances have the same expectation of the mean, but unequal variance.
- Sim 4: $P(\nu^{+} = 15) = P(\nu^{+} = -15) = 0.5$, $\eta^{+} = 1$, $\pi^{-} = 0.01$: positive instances are sampled from two distributions with unequal mean expectation.
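As a concrete illustration of class sparsity, the sketch below generates negative bags for the first three settings (Sim 4's two-valued $\nu^{+}$ is omitted for brevity). Reading $\nu^{+}$ as the expectation and $\eta^{+}$ as the variance of the positive component's mean, and using unit-variance Gaussian instance noise, are assumptions made here for illustration, not the paper's exact generative settings.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative reading of the settings: nu_plus / eta_plus are the
# expectation and variance of the positive component's mean; pi_minus is
# the positive-instance probability inside negative bags.
SIMS = {
    "Sim 1": dict(nu_plus=15, eta_plus=1,   pi_minus=0.0),
    "Sim 2": dict(nu_plus=15, eta_plus=1,   pi_minus=0.01),
    "Sim 3": dict(nu_plus=0,  eta_plus=100, pi_minus=0.0),
}

def sample_negative_bag(n, nu_plus, eta_plus, pi_minus, rng):
    """Negative bag: mostly negative instances, with positives occurring
    at probability pi_minus (the class-sparsity mechanism)."""
    theta_pos = rng.normal(nu_plus, np.sqrt(eta_plus))
    tau = rng.random(n) < pi_minus
    return np.where(tau, rng.normal(theta_pos, 1, n), rng.normal(0, 1, n))

for name, cfg in SIMS.items():
    bag = sample_negative_bag(500, rng=rng, **cfg)
    print(name, round(bag.mean(), 2))
```

With $\pi^{-} = 0.01$ (Sim 2), a negative bag of 500 instances contains about five positive instances on average, which is what makes the standard MI assumption fail.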

#### 5.2. The Impact of Pdf Estimation and Comparison to Other Methods

#### 5.3. Comparison to State-of-the-Art Methods

- GMMs are fitted with $1, \dots, 100$ components, and the number of components is chosen by minimum AIC. To save computation time, the number of components is estimated on 10 bags sampled from the training set, and the median number of components is used to fit the bag PDFs in the rest of the algorithm; see Table 4. For the class PDFs, a random subsample of 10% of the instances is taken from each bag, again to reduce computation time.
- Integrals: importance sampling.
- Classification: To estimate the threshold $t$, the training set is used to estimate $f_{pos}^{train}(\mathbf{x})$ and $f_{neg}^{train}(\mathbf{x})$, and the divergences between the bags in the validation set and these two class PDFs are calculated. The threshold $\widehat{t}$ that gives the highest validation accuracy then serves as the threshold for the test set.
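The three ingredients above can be sketched as follows. This is a simplified stand-in, not the paper's implementation: `scikit-learn`'s `GaussianMixture` replaces the paper's GMM fitting, plain Monte Carlo sampling from the fitted density replaces the paper's importance sampling, and the helper names (`fit_gmm_aic`, `kl_divergence_mc`, `pick_threshold`) are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_aic(X, max_components=10, seed=0):
    """Fit GMMs with 1..max_components components; keep the minimum-AIC fit."""
    best = None
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(X)
        if best is None or gmm.aic(X) < best.aic(X):
            best = gmm
    return best

def kl_divergence_mc(f, g, n=5000):
    """Monte Carlo estimate of KL(f || g): sample from f, average the
    log-density ratio. A simple stand-in for importance sampling."""
    x, _ = f.sample(n)
    return float(np.mean(f.score_samples(x) - g.score_samples(x)))

def pick_threshold(scores, labels):
    """Threshold on a divergence-based score that maximizes validation accuracy."""
    cands = np.sort(scores)
    accs = [np.mean((scores >= t) == labels) for t in cands]
    return cands[int(np.argmax(accs))]

# Toy usage: two well-separated 1-D "class PDFs".
rng = np.random.default_rng(0)
f_hat = fit_gmm_aic(rng.normal(0.0, 1.0, (200, 1)), max_components=3)
g_hat = fit_gmm_aic(rng.normal(3.0, 1.0, (200, 1)), max_components=3)
print(kl_divergence_mc(f_hat, g_hat, n=2000))  # clearly positive
```

The threshold search mirrors the validation step: compute the divergence-based score for each validation bag, then keep the cut-point $\widehat{t}$ with the highest accuracy.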

#### 5.4. Results

## 6. Discussion

#### 6.1. Point-of-View

#### 6.2. Conclusions and Future Work

## Author Contributions

## Funding

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| MI | multi-instance |
| PDF | probability density function |
| IS | instance space |
| ES | embedded space |
| BS | bag space |
| KL | Kullback–Leibler |
| SVM | support vector machine |
| AIC | Akaike information criterion |
| GMM | Gaussian mixture model |
| KDE | kernel density estimation |
| ROC | receiver operating characteristic |
| AUC | area under the ROC curve |

## Appendix A

#### Appendix A.1. Non-Symmetric Divergences

#### Appendix A.2. Class-Conditional Bag-to-Class Divergence

**Class-conditional KL-divergence:**

## References

1. Cheplygina, V.; de Bruijne, M.; Pluim, J.P. Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Med. Image Anal. 2019, 54, 280–296.
2. Gelasca, E.D.; Byun, J.; Obara, B.; Manjunath, B.S. Evaluation and Benchmark for Biological Image Segmentation. In Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, USA, 12–15 October 2008; pp. 1816–1819.
3. Kandemir, M.; Zhang, C.; Hamprecht, F.A. Empowering Multiple Instance Histopathology Cancer Diagnosis by Cell Graphs. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention—MICCAI 2014, Boston, MA, USA, 14–18 September 2014; pp. 228–235.
4. Doran, G.; Ray, S. Multiple-Instance Learning from Distributions. J. Mach. Learn. Res. 2016, 17, 1–50.
5. Zhang, M.L.; Zhou, Z.H. Multi-instance clustering with applications to multi-instance prediction. Appl. Intell. 2009, 31, 47–68.
6. Zhou, Z.H.; Zhang, M.L.; Huang, S.J.; Li, Y.F. Multi-instance multi-label learning. Artif. Intell. 2012, 176, 2291–2320.
7. Tang, P.; Wang, X.; Huang, Z.; Bai, X.; Liu, W. Deep patch learning for weakly supervised object classification and discovery. Pattern Recognit. 2017, 71, 446–459.
8. Wang, X.; Yan, Y.; Tang, P.; Bai, X.; Liu, W. Revisiting multiple instance neural networks. Pattern Recognit. 2018, 74, 15–24.
9. Dietterich, T.G.; Lathrop, R.H.; Lozano-Pérez, T. Solving the multiple instance problem with axis-parallel rectangles. Artif. Intell. 1997, 89, 31–71.
10. Xu, Y.Y. Multiple-instance learning based decision neural networks for image retrieval and classification. Neurocomputing 2016, 171, 826–836.
11. Qiao, M.; Liu, L.; Yu, J.; Xu, C.; Tao, D. Diversified dictionaries for multi-instance learning. Pattern Recognit. 2017, 64, 407–416.
12. Weidmann, N.; Frank, E.; Pfahringer, B. A Two-Level Learning Method for Generalized Multi-instance Problems. In Proceedings of the European Conference on Machine Learning, Cavtat-Dubrovnik, Croatia, 22–26 September 2003; pp. 468–479.
13. Foulds, J.; Frank, E. A review of multi-instance learning assumptions. Knowl. Eng. Rev. 2010, 25, 1–25.
14. Cheplygina, V.; Tax, D.M.J.; Loog, M. Multiple Instance Learning with Bag Dissimilarities. Pattern Recognit. 2015, 48, 264–275.
15. Amores, J. Multiple Instance Classification: Review, Taxonomy and Comparative Study. Artif. Intell. 2013, 201, 81–105.
16. Carbonneau, M.A.; Cheplygina, V.; Granger, E.; Gagnon, G. Multiple Instance Learning: A survey of Problem Characteristics and Applications. Pattern Recognit. 2018, 77, 329–353.
17. Maron, O.; Lozano-Pérez, T. A framework for multiple-instance learning. In Advances in Neural Information Processing Systems, Denver, CO, USA, 30 November–5 December 1998; MIT Press: Cambridge, MA, USA, 1998; Volume 10, pp. 570–576.
18. Xu, X.; Frank, E. Logistic Regression and Boosting for Labeled Bags of Instances; Lecture Notes in Computer Science; Dai, H., Srikant, R., Zhang, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2004; pp. 272–281.
19. Tax, D.M.J.; Loog, M.; Duin, R.P.W.; Cheplygina, V.; Lee, W.J. Bag Dissimilarities for Multiple Instance Learning. In Similarity-Based Pattern Recognition; Lecture Notes in Computer Science; Pelillo, M., Hancock, E.R., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; Volume 7005, pp. 222–234.
20. Zhou, Z.H.; Sun, Y.Y.; Li, Y.F. Multi-instance Learning by Treating Instances As non-I.I.D. Samples. In Proceedings of the 26th Annual International Conference on Machine Learning—ICML '09, Montreal, QC, Canada, 14–18 June 2009; pp. 1249–1256.
21. Cheplygina, V.; Tax, D.M.J.; Loog, M. Dissimilarity-Based Ensembles for Multiple Instance Learning. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 1379–1391.
22. Boiman, O.; Shechtman, E.; Irani, M. In defense of Nearest-Neighbor based image classification. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
23. Lee, W.J.; Cheplygina, V.; Tax, D.M.J.; Loog, M.; Duin, R.P.W. Bridging structure and feature representations in graph matching. Int. J. Pattern Recognit. Artif. Intell. 2012, 26, 1260005.
24. Scott, S.; Zhang, J.; Brown, J. On generalized multiple-instance learning. Int. J. Comput. Intell. Appl. 2005, 5, 21–35.
25. Ruiz-Muñoz, J.F.; Castellanos-Dominguez, G.; Orozco-Alzate, M. Enhancing the dissimilarity-based classification of birdsong recordings. Ecol. Inform. 2016, 33, 75–84.
26. Sørensen, L.; Loog, M.; Tax, D.M.J.; Lee, W.J.; de Bruijne, M.; Duin, R.P.W. Dissimilarity-Based Multiple Instance Learning. In Structural, Syntactic, and Statistical Pattern Recognition; Hancock, E.R., Wilson, R.C., Windeatt, T., Ulusoy, I., Escolano, F., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 129–138.
27. Schölkopf, B. The Kernel Trick for Distances. In Proceedings of the 13th International Conference on Neural Information Processing Systems, Denver, CO, USA, 27 November–2 December 2000; MIT Press: Cambridge, MA, USA, 2000; pp. 283–289.
28. Wei, X.S.; Wu, J.; Zhou, Z.H. Scalable Algorithms for Multi-Instance Learning. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 975–987.
29. Kullback, S.; Leibler, R.A. On Information and Sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
30. Sahu, S.K.; Cheng, R.C.H. A fast distance-based approach for determining the number of components in mixtures. Can. J. Stat. 2003, 31, 3–22.
31. Sudharshan, P.; Petitjean, C.; Spanhol, F.; Oliveira, L.E.; Heutte, L.; Honeine, P. Multiple instance learning for histopathological breast cancer image classification. Expert Syst. Appl. 2019, 117, 103–111.
32. Zhang, G.; Yin, J.; Li, Z.; Su, X.; Li, G.; Zhang, H. Automated skin biopsy histopathological image annotation using multi-instance representation and learning. BMC Med. Genom. 2013, 6, S10.
33. Li, W.; Zhang, J.; McKenna, S.J. Multiple instance cancer detection by boosting regularised trees. In Lecture Notes in Computer Science; Springer: Basel, Switzerland, 2015; Volume 9349, pp. 645–652.
34. Tomczak, J.M.; Ilse, M.; Welling, M. Deep Learning with Permutation-invariant Operator for Multi-instance Histopathology Classification. arXiv 2017, arXiv:1712.00310.
35. Mercan, C.; Aksoy, S.; Mercan, E.; Shapiro, L.G.; Weaver, D.L.; Elmore, J.G. Multi-Instance Multi-Label Learning for Multi-Class Classification of Whole Slide Breast Histopathology Images. IEEE Trans. Med. Imaging 2018, 37, 316–325.
36. Xu, Y.; Zhu, J.Y.; Chang, E.I.; Lai, M.; Tu, Z. Weakly supervised histopathology cancer image segmentation and classification. Med. Image Anal. 2014, 18, 591–604.
37. McCann, M.T.; Bhagavatula, R.; Fickus, M.C.; Ozolek, J.A.; Kovaĉević, J. Automated colitis detection from endoscopic biopsies as a tissue screening tool in diagnostic pathology. In Proceedings of the 2012 19th IEEE International Conference on Image Processing, Orlando, FL, USA, 30 September–3 October 2012; pp. 2809–2812.
38. Dundar, M.M.; Badve, S.; Raykar, V.C.; Jain, R.K.; Sertel, O.; Gurcan, M.N. A multiple instance learning approach toward optimal classification of pathology slides. In Proceedings of the International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2732–2735.
39. Samsudin, N.A.; Bradley, A.P. Nearest neighbour group-based classification. Pattern Recognit. 2010, 43, 3458–3467.
40. Kraus, O.Z.; Ba, J.L.; Frey, B.J. Classifying and segmenting microscopy images with deep multiple instance learning. Bioinformatics 2016, 32, i52–i59.
41. Hou, L.; Samaras, D.; Kurc, T.M.; Gao, Y.; Davis, J.E.; Saltz, J.H. Efficient Multiple Instance Convolutional Neural Networks for Gigapixel Resolution Image Classification. arXiv 2015, arXiv:1504.07947.
42. Jia, Z.; Huang, X.; Chang, E.I.; Xu, Y. Constrained Deep Weak Supervision for Histopathology Image Segmentation. IEEE Trans. Med. Imaging 2017, 36, 2376–2388.
43. Jiang, B.; Pei, J.; Tao, Y.; Lin, X. Clustering Uncertain Data Based on Probability Distribution Similarity. IEEE Trans. Knowl. Data Eng. 2013, 25, 751–763.
44. Kriegel, H.P.; Pfeifle, M. Density-based Clustering of Uncertain Data. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining—KDD '05, Chicago, IL, USA, 21–24 August 2005; pp. 672–677.
45. Ali, S.M.; Silvey, S.D. A General Class of Coefficients of Divergence of One Distribution from Another. J. R. Stat. Soc. Ser. B (Methodol.) 1966, 28, 131–142.
46. Csiszár, I. Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hung. 1967, 2, 299–318.
47. Berger, A. On orthogonal probability measures. Proc. Am. Math. Soc. 1953, 4, 800–806.
48. Gibbs, A.L.; Su, F.E. On Choosing and Bounding Probability Metrics. Int. Stat. Rev. 2002, 70, 419–435.
49. Møllersen, K.; Dhar, S.S.; Godtliebsen, F. On Data-Independent Properties for Density-Based Dissimilarity Measures in Hybrid Clustering. Appl. Math. 2016, 7, 1674–1706.
50. Møllersen, K.; Hardeberg, J.Y.; Godtliebsen, F. Divergence-based colour features for melanoma detection. In Proceedings of the 2015 Colour and Visual Computing Symposium (CVCS), Gjøvik, Norway, 25–26 August 2015; pp. 1–6.
51. Eguchi, S.; Copas, J. Interpreting Kullback–Leibler Divergence with the Neyman–Pearson Lemma. J. Multivar. Anal. 2006, 97, 2034–2040.
52. Kass, R.E.; Raftery, A.E. Bayes Factors. J. Am. Stat. Assoc. 1995, 90, 773–795.
53. Taneja, I.J.; Kumar, P. Generalized non-symmetric divergence measures and inequalities. J. Interdiscip. Math. 2006, 9, 581–599.
54. McLachlan, G.; Peel, D. Finite Mixture Models; Wiley Series in Probability and Statistics; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2000.
55. Sheather, S.J.; Jones, M.C. A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation. J. R. Stat. Soc. Ser. B (Methodol.) 1991, 53, 683–690.
56. Wei, X.S.; Zhou, Z.H. An empirical study on image bag generators for multi-instance learning. Mach. Learn. 2016, 105, 155–198.
57. Andrews, S.; Tsochantaridis, I.; Hofmann, T. Support vector machines for multiple-instance learning. Adv. Neural Inf. Process. Syst. 2003, 15, 561–568.
58. Venkatesan, R.; Chandakkar, P.; Li, B. Simpler Non-Parametric Methods Provide as Good or Better Results to Multiple-Instance Learning. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015.
59. Sun, M.; Han, T.X.; Liu, M.-C.; Khodayari-Rostamabad, A. Multiple Instance Learning Convolutional Neural Networks for object recognition. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 3270–3275.

**Figure 1.** Breast tissue images [2]. The image segments are not labeled.

**Figure 3.** Parametric generative model. Bags are realizations of random parameter vectors, sampled according to the respective class distributions. Instances are realizations of feature vectors, sampled according to the respective bag distributions. Only the instance sets are observed.

**Figure 5.** (**a**) One positive bag in the training set gives small variance for the class PDF. (**b**) Ten positive bags in the training set, and the variance has increased.

**Figure 6.** An example of ROC curves for the $cKL$, $rKL$ and $rBH$ classifiers. The performance increases when the number of positive bags in the training set increases from 1 (dashed line) to 10 (solid line). For comparison, the sensitivity-specificity pairs for the bag-to-bag KL and BH classifiers are displayed for 100 positive and negative bags in the training set.

| Sim | $\mathit{pos}$ | $\mathit{rBH}$ ($\mathit{neg}$: 5) | $\mathit{rKL}$ ($\mathit{neg}$: 5) | $\mathit{cKL}$ ($\mathit{neg}$: 5) | $\mathit{rBH}$ ($\mathit{neg}$: 10) | $\mathit{rKL}$ ($\mathit{neg}$: 10) | $\mathit{cKL}$ ($\mathit{neg}$: 10) | $\mathit{rBH}$ ($\mathit{neg}$: 25) | $\mathit{rKL}$ ($\mathit{neg}$: 25) | $\mathit{cKL}$ ($\mathit{neg}$: 25) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 61 | 69 | 85 | 62 | 72 | 89 | 61 | 73 | 92 |
| 1 | 5 | 63 | 75 | 86 | 64 | 82 | 94 | 68 | 84 | 97 |
| 1 | 10 | 69 | 86 | 87 | 73 | 91 | 95 | 75 | 91 | 98 |
| 2 | 1 | 57 | 61 | 75 | 59 | 61 | 78 | 58 | 55 | 75 |
| 2 | 5 | 59 | 67 | 79 | 60 | 68 | 84 | 62 | 63 | 85 |
| 2 | 10 | 64 | 77 | 80 | 66 | 78 | 86 | 68 | 72 | 86 |
| 3 | 1 | 51 | 55 | 71 | 52 | 58 | 73 | 50 | 57 | 74 |
| 3 | 5 | 53 | 61 | 76 | 53 | 66 | 81 | 52 | 65 | 83 |
| 3 | 10 | 58 | 73 | 78 | 58 | 76 | 84 | 57 | 76 | 87 |
| 4 | 1 | 55 | 61 | 70 | 56 | 62 | 73 | 56 | 58 | 69 |
| 4 | 5 | 56 | 63 | 75 | 57 | 64 | 81 | 59 | 59 | 80 |
| 4 | 10 | 60 | 74 | 77 | 62 | 76 | 85 | 63 | 69 | 84 |
| 5 | 1 | 64 | 61 | 62 | 67 | 63 | 66 | 64 | 62 | 67 |
| 5 | 5 | 73 | 69 | 63 | 74 | 70 | 67 | 75 | 71 | 72 |
| 5 | 10 | 74 | 70 | 62 | 75 | 73 | 69 | 76 | 74 | 72 |
| 6 | 1 | 68 | 68 | 67 | 66 | 68 | 68 | 68 | 71 | 68 |
| 6 | 5 | 65 | 64 | 67 | 68 | 68 | 69 | 70 | 71 | 74 |
| 6 | 10 | 66 | 64 | 66 | 70 | 69 | 72 | 72 | 73 | 74 |

| | KDE (Epan.) | KDE (Gauss.) | GMMs |
|---|---|---|---|
| $cKL$ | 90 | 92 | 94 |
| $rKL$ | 82 | 92 | 96 |

| Data Set | $40\times$ | $100\times$ | $200\times$ | $400\times$ |
|---|---|---|---|---|
| Dimension | 23 | 26 | 25 | 24 |
| Rep 1 | 66 | 55 | 52 | 70 |
| Rep 2 | 58 | 49 | 69 | 71 |
| Rep 3 | 59 | 50 | 50 | 70 |
| Rep 4 | 47 | 49 | 58 | 73 |
| Rep 5 | 63 | 59 | 72 | 74 |

**Table 5.** Accuracy and standard deviation. Best results, and those within one standard deviation of the best, in bold.

| Data Set (Magnification) | $40\times$ | $100\times$ | $200\times$ | $400\times$ |
|---|---|---|---|---|
| MI-SVM poly [57] | 86.2 (2.8) | 82.8 (4.8) | 81.7 (4.4) | 82.7 (3.8) |
| Non-parametric [58] | 87.8 (5.6) | 85.6 (4.3) | 80.8 (2.8) | 82.9 (4.1) |
| MILCNN [59] | 86.1 (4.2) | 83.8 (3.1) | 80.2 (2.6) | 80.6 (4.6) |
| CNN [31] | 85.6 (4.8) | 83.5 (3.9) | 83.1 (1.9) | 80.8 (3.0) |
| SVM [31] | 79.9 (3.7) | 77.1 (5.5) | 84.2 (1.6) | 81.2 (3.6) |
| rKL | 83.4 (4.1) | 84.9 (4.2) | 88.3 (3.6) | 84.0 (2.8) |
| cKL | 81.5 (3.2) | 85.2 (3.5) | 88.1 (3.6) | 85.0 (3.5) |

| Data Set (Magnification) | $40\times$ | $100\times$ | $200\times$ | $400\times$ |
|---|---|---|---|---|
| rKL | 91.4 (2.4) | 91.3 (2.2) | 94.4 (1.9) | 91.6 (1.7) |
| cKL | 88.4 (2.6) | 89.7 (1.6) | 91.9 (2.7) | 91.7 (2.4) |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Møllersen, K.; Hardeberg, J.Y.; Godtliebsen, F.
A Probabilistic Bag-to-Class Approach to Multiple-Instance Learning. *Data* **2020**, *5*, 56.
https://doi.org/10.3390/data5020056
