# Tropical Balls and Its Applications to K Nearest Neighbor over the Space of Phylogenetic Trees

## Abstract

## 1. Introduction

- we show some properties of a tropical ball in the tropical projective torus;
- we show some properties of a tropical ball in a space of equidistant trees with a given set of leaves $\left[n\right]$;
- we compare tropical balls with balls defined by the ${L}_{2}$ norm and the ${L}_{\infty}$ norm;
- we define a tropical KNN algorithm; and
- we apply the tropical KNN algorithm to simulated data generated under a multispecies coalescent model.
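
To make the comparison between the three kinds of balls concrete, the following Python sketch tests whether a point lies in a ball of a given radius under each metric. It assumes the standard tropical metric on the tropical projective torus, $d_{\mathrm{tr}}(x,y)=\max_i (x_i-y_i)-\min_i (x_i-y_i)$, which is not restated in this section. The function names are illustrative and are not taken from the authors' `R` code.

```python
import math

def tropical_dist(x, y):
    """Tropical metric (assumed definition): max minus min of the coordinate differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return max(diffs) - min(diffs)

def l2_dist(x, y):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def linf_dist(x, y):
    """Maximum-coordinate (L-infinity) distance."""
    return max(abs(a - b) for a, b in zip(x, y))

def in_ball(dist, center, radius, point):
    """True if `point` lies in the closed ball of the given radius around `center`."""
    return dist(center, point) <= radius

center = [0.0, 0.0, 0.0]
shifted = [5.0, 5.0, 5.0]  # center translated by 5 * (1, 1, 1)

print(in_ball(tropical_dist, center, 1.0, shifted))  # True
print(in_ball(l2_dist, center, 1.0, shifted))        # False
print(in_ball(linf_dist, center, 1.0, shifted))      # False
```

The point `shifted` lies in every tropical ball around `center` because the tropical metric is unchanged by translation along $(1,\dots,1)$, whereas it falls outside the ${L}_{2}$ and ${L}_{\infty}$ balls of radius 1.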

## 2. Preliminaries

## 3. Results

#### 3.1. Properties of Tropical Balls over the Space of Ultrametrics

#### 3.2. Examples in ${\mathcal{U}}_{4}$

#### 3.3. Approximation of a Tropical Ball

#### 3.4. Computational Results

`Mesquite` [18]. All computations were conducted on a 2019 Apple MacBook Pro with a 2.4 GHz 8-core Intel Core i9 processor and 64 GB of 2667 MHz DDR4 memory. The `R` code used for this simulation study can be found at polytopes.net/tropicalKNN.tar.

Algorithm 1: KNN Algorithm.

- Input: A data point $y\in {\mathbb{R}}^{e}$ from a test set, a training set $\{{x}_{1},\dots ,{x}_{m}\}$ with categories ${c}_{1},\dots ,{c}_{m}$, a metric $d$, and a positive integer $k>1$.
- Output: A class for $y$.
- Algorithm:
  1. **for** $i=1,\dots ,m$ **do** compute $d(y,{x}_{i})$ **end for**
  2. Order $d(y,{x}_{1}),\dots ,d(y,{x}_{m})$ from smallest to largest. Let $d(y,{x}_{{i}_{1}}),\dots ,d(y,{x}_{{i}_{k}})$ be the $k$ smallest distances.
  3. Consider the categories ${c}_{{i}_{1}},\dots ,{c}_{{i}_{k}}$ of ${x}_{{i}_{1}},\dots ,{x}_{{i}_{k}}$, and assign to $y$ the class with the highest frequency among ${c}_{{i}_{1}},\dots ,{c}_{{i}_{k}}$.
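
The three steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation, which is written in `R`. It assumes the tropical metric $d_{\mathrm{tr}}(x,y)=\max_i (x_i-y_i)-\min_i (x_i-y_i)$ as the default choice of $d$; the function names and the toy data are hypothetical.

```python
from collections import Counter

def tropical_dist(x, y):
    """Tropical metric (assumed definition): max minus min of the coordinate differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return max(diffs) - min(diffs)

def knn_classify(y, train_points, train_labels, k, dist=tropical_dist):
    """Classify y by majority vote among its k nearest training points under `dist`."""
    # Step 1: compute the distance from y to every training point.
    distances = [dist(y, x) for x in train_points]
    # Step 2: order the indices by distance and keep the k smallest.
    # sorted() is stable, so equal distances are broken by training-set order.
    nearest = sorted(range(len(train_points)), key=lambda i: distances[i])[:k]
    # Step 3: assign the most frequent category among the k neighbors.
    # If two categories tie, Counter returns the one encountered first.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: four training vectors in two categories.
train_points = [[0, 1, 1], [0, 1, 2], [0, 5, 5], [0, 6, 5]]
train_labels = ["A", "A", "B", "B"]

print(knn_classify([0, 1, 1.5], train_points, train_labels, k=3))  # A
print(knn_classify([0, 5.5, 5], train_points, train_labels, k=3))  # B
```

Passing a different `dist` function, such as a Euclidean distance, recovers the classical KNN classifier with the same voting step.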

`R` and we use the KNN algorithm implemented in the “class” package in `R` [19].

`Mesquite`, available at http://mesquiteproject.org [18], to generate gene trees under the multispecies coalescent model. This model has two parameters: the effective population size ${N}_{e}$ and the species depth $SD$. In this simulation study, we set ${N}_{e}=100{,}000$ and varied

`Mesquite`.

## 4. Conclusions

## 5. Discussion

**Figure 1.** An equidistant tree with species ${S}_{1}, {S}_{2}, {S}_{3}$. Leaves in the tree represent the observable species ${S}_{1}, {S}_{2}, {S}_{3}$ in the given set of labels, and internal nodes represent their common ancestors. Filled black circles represent observable states and unfilled circles represent unobservable states. The number on each branch is its branch length, and the total branch length from the root to each leaf is the same for all leaves.
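
The equal root-to-leaf property described in the caption of Figure 1 means that the pairwise leaf distances satisfy the three-point condition for an ultrametric: among the three distances of any triple of leaves, the maximum is attained at least twice. The Python sketch below checks this condition. The distances are hypothetical values chosen for illustration and are not the branch lengths shown in Figure 1.

```python
from itertools import combinations

def is_ultrametric(leaves, d, tol=1e-9):
    """Check the three-point condition for pairwise distances d[frozenset((i, j))]."""
    for i, j, k in combinations(leaves, 3):
        triple = sorted([d[frozenset((i, j))],
                         d[frozenset((i, k))],
                         d[frozenset((j, k))]])
        # The two largest distances in every triple must coincide.
        if abs(triple[2] - triple[1]) > tol:
            return False
    return True

leaves = ["S1", "S2", "S3"]

# Hypothetical equidistant tree: S1 and S2 form a cherry at height 0.4,
# and the root sits at height 1.0 above all three leaves.
equidistant = {
    frozenset(("S1", "S2")): 0.8,
    frozenset(("S1", "S3")): 2.0,
    frozenset(("S2", "S3")): 2.0,
}

# Hypothetical counterexample: the two largest distances differ.
not_equidistant = {
    frozenset(("S1", "S2")): 0.8,
    frozenset(("S1", "S3")): 2.0,
    frozenset(("S2", "S3")): 1.5,
}

print(is_ultrametric(leaves, equidistant))      # True
print(is_ultrametric(leaves, not_equidistant))  # False
```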

**Figure 2.** The first example for visualizing a tropical ball. LEFT: The tree corresponding to the center of the tropical ball. RIGHT: The tropical ball centered around the ultrametric corresponding to the equidistant tree in ${\mathcal{U}}_{4}$.

**Figure 3.** The second example for visualizing a tropical ball. LEFT: The tree corresponding to the center of the tropical ball. RIGHT: The tropical ball centered around the ultrametric corresponding to the equidistant tree in ${\mathcal{U}}_{4}$.

**Figure 4.** The third example for visualizing a tropical ball. LEFT: The tree corresponding to the center of the tropical ball is the tree where $b=0$ and $c=0$ in the picture. In this case, the center of the tropical ball is on the boundary between two orthants in the tree space. RIGHT: The tropical ball centered around the ultrametric corresponding to the equidistant tree in ${\mathcal{U}}_{4}$.

**Figure 5.** Accuracy rates for the classical KNN, tropical KNN, and weighted tropical KNN on simulated coalescent models.

**Figure 6.** DBSCAN results for $c=5$ (**Top**) and $c=10$ (**Bottom**). In both cases minPts = 5; eps = 1 for $c=5$ and eps = 0.5 for $c=10$.
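
The roles of the two DBSCAN parameters named in the caption of Figure 6 can be illustrated with a minimal Python sketch: eps is the neighborhood radius, and minPts is the number of points (counting the point itself) that a neighborhood must contain for its center to be a core point. The metric is passed in as a parameter. Using the tropical metric here is an assumption made for illustration, since this section does not state which metric produced Figure 6, and the toy data are hypothetical.

```python
def tropical_dist(x, y):
    """Tropical metric (assumed definition): max minus min of the coordinate differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return max(diffs) - min(diffs)

def dbscan(points, eps, min_pts, dist):
    """Minimal DBSCAN: returns one label per point, with -1 marking noise."""
    n = len(points)
    # eps-neighborhood of every point; each point is its own neighbor.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n  # None = not yet visited
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # only core points expand the cluster
        cluster += 1
    return labels

# Hypothetical toy data: two tight groups and one isolated point.
points = [[0, 0], [0, 0.5], [0, 1.0], [0, 10], [0, 10.5], [0, 11], [0, 50]]
print(dbscan(points, eps=1, min_pts=3, dist=tropical_dist))
# [0, 0, 0, 1, 1, 1, -1]
```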

