# Tropical Balls and Its Applications to K Nearest Neighbor over the Space of Phylogenetic Trees

## Abstract


## 1. Introduction

- we show some properties of a tropical ball in the tropical projective torus;
- we show some properties of a tropical ball in a space of equidistant trees with a given set of leaves $\left[n\right]$;
- we compare tropical balls with balls defined by the ${L}_{2}$ norm and the ${L}_{\infty}$ norm;
- we define a tropical KNN algorithm; and
- we apply the tropical KNN algorithm to simulated data generated under a multispecies coalescent model.
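
The comparison between tropical balls and balls under the ${L}_{2}$ and ${L}_{\infty}$ norms can be illustrated with a small membership test. The sketch below is a minimal illustration, not the paper's `R` code. It assumes the tropical metric is the usual one on the tropical projective torus, $d_{\mathrm{tr}}(u,v)=\max_i(u_i-v_i)-\min_i(u_i-v_i)$. The function names and the sample vectors are hypothetical.

```python
import math


def tropical_distance(u, v):
    """Assumed tropical metric: max minus min of the coordinate-wise differences."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


def l2_distance(u, v):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def linf_distance(u, v):
    """Maximum-coordinate (L-infinity) distance."""
    return max(abs(a - b) for a, b in zip(u, v))


def in_ball(center, point, radius, metric):
    """True if `point` lies in the closed ball of `radius` around `center`."""
    return metric(center, point) <= radius


# Hypothetical vectors: `point` is `center` shifted by the constant vector (1, 1, 1).
center = [0.0, 0.0, 0.0]
point = [1.0, 1.0, 1.0]
radius = 0.5

for name, metric in [("tropical", tropical_distance),
                     ("L2", l2_distance),
                     ("L-infinity", linf_distance)]:
    print(name, metric(center, point), in_ball(center, point, radius, metric))
```

Under the assumed tropical metric, adding a constant to every coordinate does not change the distance, so the shifted point stays inside the tropical ball while it falls outside the ${L}_{2}$ and ${L}_{\infty}$ balls of the same radius.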

## 2. Preliminaries

**Definition 1.**

**Theorem 1.**

**Definition 2.**

**Theorem 2.**

## 3. Results

#### 3.1. Properties of Tropical Balls over the Space of Ultrametrics

**Lemma 1.**

**Proposition 1.**

**Lemma 2.**

**Lemma 3.**

**Theorem 3.**

**Theorem 4.**

**Theorem 5.**

#### 3.2. Examples in ${\mathcal{U}}_{4}$

**Example 1.**

**Example 2.**

**Example 3.**

#### 3.3. Approximation of a Tropical Ball

**Proposition 2.**

**Proof.**

**Proposition 3.**

**Proof.**

**Theorem 6.**

#### 3.4. Computational Results

`Mesquite` [18]. All computations were conducted on a 2019 Apple MacBook Pro with a 2.4 GHz 8-core Intel Core i9 processor and 64 GB of 2667 MHz DDR4 memory. The `R` code used for this simulation study can be found at polytopes.net/tropicalKNN.tar.

**Algorithm 1:** KNN Algorithm.

- Input: A data point $y\in {\mathbb{R}}^{e}$ from a test set, a training set $\{{x}_{1},\dots ,{x}_{m}\}$ with classes ${c}_{1},\dots ,{c}_{m}$, a metric $d$, and a positive integer $k>1$.
- Output: A class for $y$.
- Algorithm:
  - **for** $i=1,\dots ,m$ **do** compute $d(y,{x}_{i})$ **end for**
  - Order $d(y,{x}_{1}),\dots ,d(y,{x}_{m})$ from the smallest to the largest. Let $d(y,{x}_{{i}_{1}}),\dots ,d(y,{x}_{{i}_{k}})$ be the $k$ smallest distances.
  - Consider the classes of ${x}_{{i}_{1}},\dots ,{x}_{{i}_{k}}$, that is, ${c}_{{i}_{1}},\dots ,{c}_{{i}_{k}}$, and assign to $y$ the class that occurs most frequently among ${c}_{{i}_{1}},\dots ,{c}_{{i}_{k}}$.
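
The steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the `R` implementation used in the study. The metric is passed in as a function, and the default is the tropical metric, assumed here to be $d_{\mathrm{tr}}(u,v)=\max_i(u_i-v_i)-\min_i(u_i-v_i)$. The function names and the toy training data are hypothetical.

```python
from collections import Counter


def tropical_distance(u, v):
    """Assumed tropical metric: max minus min of the coordinate-wise differences."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


def knn_classify(y, train_points, train_labels, k, metric=tropical_distance):
    """Classify `y` by majority vote among its k nearest training points."""
    # Step 1: compute d(y, x_i) for every training point, remembering the index i.
    distances = [(metric(y, x), i) for i, x in enumerate(train_points)]
    # Step 2: order the distances from smallest to largest and keep the first k.
    distances.sort()
    nearest_labels = [train_labels[i] for _, i in distances[:k]]
    # Step 3: return the most frequent class among the k neighbours.
    # Ties in the vote are not handled specially in this sketch.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data with two classes, "A" and "B".
train_points = [[0, 0, 0], [0, 1, 0], [0, 5, 9], [1, 6, 9]]
train_labels = ["A", "A", "B", "B"]

print(knn_classify([2, 2, 3], train_points, train_labels, k=3))
print(knn_classify([0, 5, 8], train_points, train_labels, k=3))
```

Passing a different function as `metric`, for example a Euclidean distance, gives the classical KNN classifier with the same voting rule.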

`R` and we use the KNN algorithm implemented in the “class” package in `R` [19].

`Mesquite`, available at http://mesquiteproject.org [18], to generate gene trees under the multispecies coalescent model. In this model, we have two parameters, the effective population size ${N}_{e}$ and species depth $SD$. In this simulation study, we set ${N}_{e}=100,000$ and varied

`Mesquite`.

## 4. Conclusions

## 5. Discussion

**Problem 1.**

**Problem 2.**

## 6. Materials and Methods

**Proof for Lemma 1.**

**Proof for Theorem 5.**

**Proof for Lemma 2.**

**Proof for Lemma 3.**

**Proof for Theorem 3.**

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Conflicts of Interest

## References

1. Maddison, W.P. Gene trees in species trees. Syst. Biol. **1997**, 46, 523–536.
2. Huson, D.H.; Klopper, T.; Lockhart, P.J.; Steel, M.A. Reconstruction of Reticulate Networks from Gene Trees; Research in Computational Molecular Biology, Proceedings; Springer: Berlin, Germany, 2005; pp. 233–249.
3. Weisrock, D.W.; Shaffer, H.B.; Storz, B.L.; Storz, S.R.; Voss, S.R. Multiple nuclear gene sequences identify phylogenetic species boundaries in the rapidly radiating clade of Mexican ambystomatid salamanders. Mol. Ecol. **2006**, 15, 2489–2503.
4. Taylor, J.W.; Jacobson, D.J.; Kroken, S.; Kasuga, T.; Geiser, D.M.; Hibbett, D.S.; Fisher, M.C. Phylogenetic species recognition and species concepts in fungi. Fungal Genet. Biol. **2000**, 31, 21–32.
5. Owen, M.; Yoshida, R. Continuous Spaces of Phylogenetic Trees; 2020; in preparation.
6. Speyer, D.; Sturmfels, B. Tropical mathematics. Math. Mag. **2009**, 82, 163–173.
7. Yoshida, R.; Zhang, L.; Zhang, X. Tropical Principal Component Analysis and its Application to Phylogenetics. Bull. Math. Biol. **2019**, 81, 568–597.
8. Akian, M.; Gaubert, S.; Viorel, N.; Singer, I. Best approximation in max-plus semimodules. Linear Algebra Appl. **2011**, 435, 3261–3296.
9. Cohen, G.; Gaubert, S.; Quadrat, J. Duality and separation theorems in idempotent semimodules. Linear Algebra Appl. **2004**, 379, 395–422.
10. Monod, A.; Lin, B.; Yoshida, R.; Kang, Q. Tropical Geometry of Phylogenetic Tree Space: A Statistical Perspective. 2019. Available online: https://arxiv.org/pdf/1805.12400.pdf (accessed on 9 March 2021).
11. Fix, E.; Hodges, J. Discriminatory Analysis, Nonparametric Discrimination: Consistency Properties; Technical Report 4; USAF School of Aviation Medicine, Randolph Field: San Antonio, TX, USA, 1951.
12. Saadatfar, H.; Khosravi, S.; Joloudari, J.H.; Mosavi, A.; Shamshirband, S. A New K-Nearest Neighbors Classifier for Big Data Based on Efficient Data Pruning. Mathematics **2020**, 8, 286.
13. Terrell, G.R.; Scott, D.W. Variable Kernel Density Estimation. Ann. Stat. **1992**, 20, 1236–1265.
14. Costa, J.A.; Hero, A.O. Manifold learning using Euclidean k-nearest neighbor graphs [image processing examples]. In Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, QC, Canada, 17–21 May 2004; Volume 3, p. iii-988.
15. Maclagan, D.; Sturmfels, B. Introduction to Tropical Geometry; Graduate Studies in Mathematics, Vol. 161; American Mathematical Society: Providence, RI, USA, 2015.
16. Billera, L.; Holmes, S.; Vogtmann, K. Geometry of the space of phylogenetic trees. Adv. Appl. Math. **2001**, 27, 733–767.
17. Lin, B.; Sturmfels, B.; Tang, X.; Yoshida, R. Convexity in Tree Spaces. SIAM Discret. Math. **2017**, 3, 2015–2038.
18. Maddison, W.P.; Maddison, D. Mesquite: A Modular System for Evolutionary Analysis. Version 2.72. 2009. Available online: http://mesquiteproject.org (accessed on 9 March 2021).
19. Ripley, B. Package “Class”. 2020. Available online: http://www.stats.ox.ac.uk/pub/MASS4/ (accessed on 9 March 2021).

**Figure 1.** An equidistant tree with species ${S}_{1},\,{S}_{2},\,{S}_{3}$. Leaves in the tree represent the observable species ${S}_{1},\,{S}_{2},\,{S}_{3}$ in the given set of labels, and internal nodes represent their common ancestors. Filled black circles represent observable states and unfilled circles represent unobservable states. The number on each branch is its branch length, and the total branch length from the root to each leaf is the same for all leaves.

**Figure 2.** The first example for visualizing a tropical ball. LEFT: The tree corresponding to the center of the tropical ball. RIGHT: The tropical ball centered around the ultrametric corresponding to the equidistant tree in ${\mathcal{U}}_{4}$.

**Figure 3.** The second example for visualizing a tropical ball. LEFT: The tree corresponding to the center of the tropical ball. RIGHT: The tropical ball centered around the ultrametric corresponding to the equidistant tree in ${\mathcal{U}}_{4}$.

**Figure 4.** The third example for visualizing a tropical ball. LEFT: The tree corresponding to the center of the tropical ball is the tree where $b=0$ and $c=0$ in the picture. In this case, the center of the tropical ball is on the boundary between two orthants in the tree space. RIGHT: The tropical ball centered around the ultrametric corresponding to the equidistant tree in ${\mathcal{U}}_{4}$.

**Figure 5.** Accuracy rates for the classical KNN, tropical KNN, and weighted tropical KNN on simulated coalescent models.

**Figure 6.** DBSCAN results for $c=5$ (**Top**) and $c=10$ (**Bottom**). We set minPts = 5 in both cases, with eps = 1 for $c=5$ and eps = 0.5 for $c=10$.


© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Yoshida, R.
Tropical Balls and Its Applications to K Nearest Neighbor over the Space of Phylogenetic Trees. *Mathematics* **2021**, *9*, 779.
https://doi.org/10.3390/math9070779
