# SVM-Based Multiple Instance Classification via DC Optimization


## Abstract


## 1. Introduction

## 2. A DC Decomposition of the SVM-Based MIL

## 3. Solving DC-MIL Using a Nonsmooth DC Algorithm

- It iteratively builds two separate piecewise-affine approximations of the component functions, grouping the corresponding information in two separate bundles.
- It combines the two convex piecewise-affine approximations and generates a DC piecewise-affine model.
- The DC (hence, nonconvex) model is locally approximated using an auxiliary quadratic program, whose solution is used to certify approximate criticality, or to generate a descent search-direction to be explored via a backtracking line-search approach.
- Whenever no descent is achieved along the search direction, the bundle of the first function is enriched, which yields a better model function; this enrichment is the fundamental feature of any cutting-plane algorithm.
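The first two points above can be illustrated with a minimal Python sketch. It assumes the standard cutting-plane form of each convex model, namely the maximum over bundle elements $(g,\alpha)$ of $g^{\top}d-\alpha$, and takes the DC model as the difference of the two. The names `cutting_plane_model` and `dc_model`, and the toy functions $f_1(z)=|z|$ and $f_2(z)=0.5|z|$, are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple

# A bundle is a list of (subgradient, linearization error) pairs.
Bundle = List[Tuple[Sequence[float], float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def cutting_plane_model(bundle: Bundle, d: Sequence[float]) -> float:
    """Convex piecewise-affine model of f_k(z + d) - f_k(z) (assumed form)."""
    return max(dot(g, d) - alpha for g, alpha in bundle)


def dc_model(bundle1: Bundle, bundle2: Bundle, d: Sequence[float]) -> float:
    """DC piecewise-affine model: difference of the two convex models."""
    return cutting_plane_model(bundle1, d) - cutting_plane_model(bundle2, d)


# Toy data (assumption): f1(z) = |z| and f2(z) = 0.5|z| around z = 1.
# Bundle 1 holds the subgradient at z = 1 (error 0) and the one taken
# at y = -1, whose linearization error is f1(z) - f1(y) - g*(z - y) = 2.
bundle1: Bundle = [([1.0], 0.0), ([-1.0], 2.0)]
bundle2: Bundle = [([0.5], 0.0)]

predicted_change = dc_model(bundle1, bundle2, [-1.0])
```

For the displacement $d=-1$ the model value coincides with the true change $f(0)-f(1)=-0.5$ of $f=f_1-f_2$. For larger displacements the two can differ, because the model of $f_2$ only underestimates $f_2$.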

**Algorithm 1** DCPCA Main Iteration

1: Solve QP(I) and obtain $(\overline{d},\overline{v})$ ▹ Find the search direction and the predicted reduction

2: if $|\overline{v}|\le \theta$ then ▹ Stopping test

3: set $z^{*}=z$ and exit ▹ Return the approximate critical point $z^{*}$

4: end if

5: Set $t=1$ ▹ Start the line search

6: if $f(z+t\overline{d})-f(z)\le mt\overline{v}$ then ▹ Descent test

7: set $z:=z+t\overline{d}$ ▹ Make a serious step

8: calculate $g_{+}^{(1)}\in \partial f_{1}(z)$ and $g_{+}^{(2)}\in \partial f_{2}(z)$

9: update $\alpha_{i}^{(1)}$ for all $i\in I$ and $\alpha_{l}^{(2)}$ for all $l\in L$

10: set $\mathcal{B}_{1}=\mathcal{B}_{1}\setminus \{(g_{i}^{(1)},\alpha_{i}^{(1)}):\alpha_{i}^{(1)}>\epsilon ,\,i\in I\}\cup \{(g_{+}^{(1)},0)\}$

11: set $\mathcal{B}_{2}=\mathcal{B}_{2}\cup \{(g_{+}^{(2)},0)\}$

12: update I and L appropriately, and go to 1

13: else if $t\|\overline{d}\|>\eta$ then ▹ Closeness test

14: set $t=\sigma t$ and go to 6 ▹ Reduce the step size and iterate the line search

15: end if

16: Calculate $g_{+}^{(1)}\in \partial f_{1}(z+t\overline{d})$ ▹ Make a null step

17: calculate $\alpha_{+}^{(1)}=f_{1}(z)-f_{1}(z+t\overline{d})+t\,g_{+}^{(1)\top}\overline{d}$

18: set $\mathcal{B}_{1}=\mathcal{B}_{1}\cup \{(g_{+}^{(1)},\alpha_{+}^{(1)})\}$, update I appropriately, and go to 1
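Steps 5 to 18 of Algorithm 1 can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: it takes the pair $(\overline{d},\overline{v})$ as given instead of solving QP(I), leaves out the bundle bookkeeping of steps 8 to 12, and uses illustrative default values for $m$, $\sigma$ and $\eta$. The function names `line_search` and `null_step_element` and the toy objectives are assumptions.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def move(z: Vector, d: Vector, t: float) -> Vector:
    """Return the trial point z + t*d."""
    return [zi + t * di for zi, di in zip(z, d)]


def line_search(f: Callable[[Vector], float], z: Vector, d: Vector, v: float,
                m: float = 0.1, sigma: float = 0.5,
                eta: float = 1e-3) -> Tuple[str, float, Vector]:
    """Backtracking line search (steps 5-7 and 13-15); parameters are assumed."""
    t = 1.0
    norm_d = math.sqrt(dot(d, d))
    fz = f(z)
    while True:
        trial = move(z, d, t)
        if f(trial) - fz <= m * t * v:   # descent test (step 6)
            return "serious", t, trial   # serious step (step 7)
        if t * norm_d > eta:             # closeness test (step 13)
            t *= sigma                   # reduce the step size (step 14)
        else:
            return "null", t, trial      # fall through to the null step


def null_step_element(f1: Callable[[Vector], float],
                      subgrad_f1: Callable[[Vector], Vector],
                      z: Vector, d: Vector, t: float) -> Tuple[Vector, float]:
    """New bundle element (g_+, alpha_+) for f1, as in steps 16-17."""
    trial = move(z, d, t)
    g = subgrad_f1(trial)
    alpha = f1(z) - f1(trial) + t * dot(g, d)
    return g, alpha


# Toy case 1 (assumption): f(z) = 0.5|z|, a good direction from z = 1.
kind, t, z_new = line_search(lambda z: 0.5 * abs(z[0]), [1.0], [-1.0], v=-0.5)

# Toy case 2 (assumption): f = f1 = |z| with f2 = 0, a poor direction from z = 0.
f1 = lambda z: abs(z[0])
sub_f1 = lambda z: [1.0 if z[0] >= 0 else -1.0]
kind2, t2, _ = line_search(f1, [0.0], [1.0], v=-1.0)
g_plus, alpha_plus = null_step_element(f1, sub_f1, [0.0], [1.0], t2)
```

In the first case the descent test passes at $t=1$ and a serious step is taken. In the second case the step size is halved until $t\|\overline{d}\|\le \eta$, after which a null step supplies a new element for the bundle of $f_1$.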

## 4. Results

The tables also report the average training correctness (**train**, %), the average CPU time (**cpu**, sec), the average number of function evaluations (**nF**), and the average number of subgradient evaluations of the two functions (**nG1** and **nG2**). The reliability results show a good and balanced performance of the DC-MIL approach equipped with DCPCA, both for the medium-size problems, where in one case (Tiger) DC-MIL slightly outperforms the other approaches, and for the large-size problems. Moreover, our approach appears efficient, as it achieves high training correctness in reasonably small execution times even on the large-size problems.

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| MIL | Multiple instance learning |
| SVM | Support vector machine |
| DC | Difference of convex |


| Data Set | Dimension | Instances | Bags |
|---|---|---|---|
| Elephant | 230 | 1320 | 200 |
| Fox | 230 | 1320 | 200 |
| Tiger | 230 | 1220 | 200 |
| Musk-1 | 166 | 476 | 92 |
| Musk-2 | 166 | 6598 | 102 |

| Data Set | Dimension | Instances | Bags |
|---|---|---|---|
| TST01 | 6668 | 3224 | 400 |
| TST02 | 6842 | 3344 | 400 |
| TST03 | 6568 | 3246 | 400 |
| TST04 | 6626 | 3391 | 400 |
| TST07 | 7037 | 3367 | 400 |
| TST09 | 6982 | 3300 | 400 |
| TST10 | 7073 | 3453 | 400 |

| Data Set | DC-MIL | MIL-RL | DC-SMIL | mi-SVM | MI-SVM | MICA | MIC$^{\mathit{Bundle}}$ |
|---|---|---|---|---|---|---|---|
| Elephant | 84.0 | 83.0 | 84.5 | 82.2 | 81.4 | 80.5 | 80.5 |
| Fox | 57.0 | 54.5 | 56.0 | 58.2 | 57.8 | 58.7 | 58.3 |
| Tiger | 84.5 | 75.0 | 81.0 | 78.4 | 84.0 | 82.6 | 79.1 |
| Musk-1 | 74.5 | 80.0 | 76.7 | - | - | - | 75.6 |
| Musk-2 | 74.0 | 73.0 | 79.0 | - | - | - | 76.8 |

| Data Set | DC-MIL | MIL-RL | mi-SVM | MI-SVM | MICA |
|---|---|---|---|---|---|
| TST01 | 94.3 | 95.5 | 93.6 | 93.9 | 94.5 |
| TST02 | 80.0 | 85.5 | 78.2 | 84.5 | 85.0 |
| TST03 | 86.5 | 86.8 | 87.0 | 82.2 | 86.0 |
| TST04 | 86.0 | 79.8 | 82.8 | 82.4 | 87.7 |
| TST07 | 79.8 | 83.5 | 81.3 | 78.0 | 78.9 |
| TST09 | 68.3 | 68.8 | 67.5 | 60.2 | 61.4 |
| TST10 | 78.0 | 77.5 | 79.6 | 79.5 | 82.3 |

| Data Set | Train (%) | CPU (s) | nF | nG1 | nG2 |
|---|---|---|---|---|---|
| Elephant | 91.0 | 3.14 | 500 | 243 | 208 |
| Fox | 79.9 | 3.05 | 500 | 81 | 80 |
| Tiger | 95.5 | 2.83 | 500 | 237 | 197 |
| Musk-1 | 96.9 | 1.29 | 500 | 197 | 177 |
| Musk-2 | 93.5 | 6.52 | 500 | 174 | 167 |

| Data Set | Train (%) | CPU (s) | nF | nG1 | nG2 |
|---|---|---|---|---|---|
| TST01 | 100.0 | 70.22 | 200 | 93 | 91 |
| TST02 | 94.2 | 69.87 | 200 | 83 | 82 |
| TST03 | 99.6 | 64.77 | 200 | 82 | 81 |
| TST04 | 93.5 | 67.58 | 200 | 84 | 83 |
| TST07 | 99.2 | 74.11 | 200 | 85 | 84 |
| TST09 | 94.4 | 67.99 | 200 | 82 | 81 |
| TST10 | 91.9 | 72.24 | 200 | 81 | 80 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Astorino, A.; Fuduli, A.; Giallombardo, G.; Miglionico, G.
SVM-Based Multiple Instance Classification via DC Optimization. *Algorithms* **2019**, *12*, 249.
https://doi.org/10.3390/a12120249
