# Computational and Statistical Analyses of Insertional Polymorphic Endogenous Retroviruses in a Non-Model Organism

## Abstract

## 1. Introduction

## 2. Experimental Section

#### 2.1. Fluorescent In Situ Hybridization (FISH)

#### 2.2. Animal Samples

#### 2.3. Junction Fragment Analysis

#### 2.4. De Novo Clustering of Host-Virus Junction Fragments

#### 2.5. Mixture Model

When animal $j$ carries CrERV $i$, the read count $n_{ij}$ follows a Poisson distribution with mean $\lambda_{ij}$:

$$f(n_{ij} \mid \lambda_{ij}) = \frac{\lambda_{ij}^{n_{ij}} e^{-\lambda_{ij}}}{n_{ij}!}, \qquad n_{ij} = 0, 1, 2, \ldots \tag{1}$$

where $\lambda_{ij} = \lambda_{i} \times \lambda_{j}$. Furthermore, when animal $j$ does not carry CrERV $i$, the read count follows a truncated geometric distribution with parameter $0 < p < 1$:

$$g(n_{ij} \mid p) = \frac{p\,(1-p)^{n_{ij}}}{1-(1-p)^{K+1}}, \qquad n_{ij} = 0, 1, \ldots, K \tag{2}$$

The Poisson mean factorizes as $\lambda_{ij} = \alpha_{i} \times \beta_{j}$, where $\alpha_{i}$ and $\beta_{j}$ (the factors written $\lambda_{i}$ and $\lambda_{j}$ above) are CrERV- and animal-specific parameters, respectively. In Equation (2), the geometric distribution is the discrete analogue of the exponential distribution, so the probability mass function decreases with the number of false-positive counts; we take $K = 9$, and the probability of a false positive is $1 - P(n_{ij} = 0)$ under Equation (2). The truncation means that if at least $K + 1 = 10$ reads are observed, the corresponding CrERV must be present according to our model. The likelihood of the above mixture model is:

$$L = \prod_{i} \prod_{j=1}^{N} \left[ \pi_{i}\, f(n_{ij} \mid \lambda_{ij}) + (1 - \pi_{i})\, g(n_{ij} \mid p) \right]$$

where $\pi_{i}$ is the mixing probability of the read count being generated from a Poisson distribution and can be interpreted as the prevalence of CrERV $i$ among the $N$ individuals. $f(n_{ij} \mid \lambda_{ij})$ is the probability mass function of the Poisson distribution given in Equation (1), and $g(n_{ij} \mid p)$ is the probability mass function of the truncated geometric distribution given in Equation (2). The parameters can be estimated efficiently with a standard computational statistical tool, the expectation-maximization (EM) algorithm [34].

#### 2.6. Validation of Mixture Model via Replicated Individuals

#### 2.7. Principal Component Analysis

#### 2.8. Determine the Relationship of Animals via Ensemble Cluster

Let $Y$ be a binary matrix with one row per CrERV and one column per animal, in which $Y_{ij} = 1$ indicates that animal $j$ carries CrERV $i$. Letting $X = Y^{T}Y$ be a consensus co-association matrix that integrates the clustering solutions inferred by multiple CrERVs [35], we see that $X_{ij}$ is the number of shared viruses between animals $i$ and $j$. This approach can be extended by replacing the binary matrix $Y$ with a probability matrix, $Z$, estimated from the mixture model, where $Z_{ij}$ is the probability that individual $j$ carries CrERV $i$.

The co-association matrix can also be weighted as $X = Y^{T}WY$, where $W$ is an $N \times N$ diagonal matrix (with $N$ here the number of CrERVs, so that the product is defined) and $W_{j}$, its $j$th diagonal entry, is the weight for the $j$th virus. We expect that CrERVs that have recently integrated into the genome will have a low prevalence in the population, and we take $W_{j}$ to be inversely proportional to the estimated prevalence of CrERV $j$ in the mixture model.
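As an illustration of the co-association construction, the sketch below computes the unweighted matrix $Y^{T}Y$ and a prevalence-weighted version $Y^{T}WY$ with NumPy. The function name `co_association`, the toy matrix, and the choice of proportionality constant 1 for the weights are assumptions made for this example only.

```python
import numpy as np


def co_association(Y, prevalence=None):
    """Return Y^T Y, or Y^T W Y with W = diag(1 / prevalence) if given.

    Y has one row per CrERV and one column per animal; entries may be
    binary calls or mixture-model probabilities.
    """
    Y = np.asarray(Y, dtype=float)
    if prevalence is None:
        return Y.T @ Y
    # Assumed weighting: exactly 1 / prevalence (constant of proportionality 1)
    w = 1.0 / np.asarray(prevalence, dtype=float)
    return Y.T @ (w[:, None] * Y)


# Toy data: 4 CrERVs (rows) in 3 animals (columns)
Y = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 1],
              [0, 1, 1]])

X = co_association(Y)                         # X[0, 1] == 2 shared CrERVs
X_w = co_association(Y, prevalence=Y.mean(axis=1))
```

In the toy example, animals 0 and 1 share CrERVs 0 and 2, so the unweighted entry is 2, while the weighted entry is 1.5 + 1 = 2.5 because the rarer CrERV 0 (prevalence 2/3) counts for more than CrERV 2 (prevalence 1).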

#### 2.9. Visualization of Animal Relatedness by Hierarchical Clustering

#### 2.10. Phylogenetic Analysis

## 3. Results and Discussion

#### 3.1. Overview of Research Objectives and Experimental Design

#### 3.2. Fluorescent In Situ Hybridization Analysis of CrERV Locations

#### 3.3. De Novo Clustering Analyses of CrERV-Host Junction Fragments

**Figure 1.** A schematic diagram of the workflow. For illustration purposes, three different CrERV integration sites, indicated by red, orange, and blue, are shown from three different animals (marked MD1, 2, and 3). The CrERV-host junction fragment is enriched by PCR from the DNA of each animal; libraries are then prepared, pooled, and sequenced on the Ion Torrent platform. This results in a dataset of reads from all the CrERV-host junction fragments. These reads are clustered using the clustering pipeline to obtain three clusters representing the red, orange, and blue host genomic regions. Reads in a cluster are then separated into their animals of origin to obtain a read table. The read table is processed using mixture modeling to obtain the probability of correct assignment of each CrERV-host junction fragment cluster to each animal in the sample. In actual sequence data with higher read counts, low probabilities would reflect a spurious assignment of a read for a specific CrERV to an animal.

**Figure 2.** Fluorescent in situ hybridization showing the genomic distribution of CrERV integrations. Two nuclei of the mule deer cell line [36] are shown, one in interphase (left) and the other in metaphase (right). The signal from the Alexa Fluor 594-labeled CrERV probe is shown in red and the 4',6-diamidino-2-phenylindole (DAPI)-stained chromosomal DNA in blue. CrERV integration sites appear distributed along each chromosome.

#### 3.4. Estimating the Probability of a CrERV Assignment Using a Mixture Model

**Figure 3.** Features of clusters representing CrERV-host junction fragments obtained by de novo clustering. (**a**) Frequency distribution of inter-cluster distances. The x-axis is a measure of the pairwise distance between clusters and the y-axis represents the frequency of all pairwise distances for the data set of 3160 cluster consensus sequences. The pairwise distance among clusters will be small if clusters are derived from closely related sequences, e.g., from a similar repeat region, or if two sets of reads from different regions are merged. The peak in the frequency distribution near 2 in our data indicates that the sequences returned by our clustering are well separated. Inset shows an expanded scale for inter-cluster distances between 1.1 and 1.65. The threshold for combining closely related clusters was chosen as 1.2; (**b**) Frequency distribution of cluster diameters. The y-axis represents the frequency that a cluster diameter is observed and the x-axis represents cluster diameters, which are computed as the average distance of each read assigned to the cluster to the cluster consensus sequence. The majority of clusters have a value close to one, indicating near perfect identity of individual reads in the cluster to the consensus sequence.

**Figure 4.** Distribution of reads assigned to CrERVs. (**a**) The frequency distribution of the average number of reads per CrERV for each of the 55 animals, which include 10 replicate animals. Variation in read count can arise from differences in sample preparation, pooling, and individual sequencing runs. The data show that there is, on average, a read count of 50 for CrERVs from most animals, but two animals have an average of over 200 reads per CrERV; (**b**) The frequency distribution of non-zero read counts for the 3160 CrERVs. The majority of CrERV integration sites are represented by a read count of between 1 and 100, but four CrERVs have over 850 reads.

**Figure 5.** Distribution of non-zero read counts less than 10 for each CrERV. The histogram shows the proportion of the 3160 CrERV integration sites in 55 animals that have read counts from one to nine.

**Table 1.** The proportion of mismatched CrERVs between two replicates for 10 animals. The data depict the proportion of mismatches at read count thresholds of 1–10 and for the mixture model. The read count threshold that minimizes the difference between replicates is in bold.

Animal ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mixture Model |
---|---|---|---|---|---|---|---|---|---|---|---|
M191 | 0.083 | 0.048 | **0.039** | 0.044 | 0.050 | 0.053 | 0.059 | 0.065 | 0.068 | 0.076 | 0.049 |
M389 | 0.174 | 0.101 | 0.067 | 0.047 | 0.038 | **0.033** | 0.034 | 0.034 | 0.037 | 0.034 | 0.034 |
M350 | 0.196 | 0.106 | 0.061 | 0.041 | 0.035 | 0.034 | 0.029 | 0.029 | 0.027 | **0.025** | 0.025 |
M261 | 0.061 | **0.032** | 0.033 | 0.033 | 0.040 | 0.039 | 0.040 | 0.043 | 0.042 | 0.041 | 0.033 |
M369 | 0.110 | 0.040 | 0.027 | 0.025 | 0.028 | 0.027 | 0.026 | **0.021** | 0.023 | 0.024 | 0.028 |
M167 | 0.157 | 0.079 | 0.053 | 0.047 | 0.040 | 0.035 | 0.035 | **0.034** | 0.035 | 0.035 | 0.047 |
M371 | 0.208 | 0.094 | 0.057 | 0.047 | 0.041 | 0.037 | 0.035 | **0.034** | 0.038 | 0.039 | 0.051 |
M376 | 0.070 | **0.055** | 0.060 | 0.064 | 0.066 | 0.071 | 0.072 | 0.075 | 0.080 | 0.081 | 0.062 |
M272 | 0.249 | 0.127 | 0.077 | 0.064 | 0.061 | 0.059 | **0.056** | 0.060 | 0.063 | 0.064 | 0.038 |
M273 | 0.103 | 0.057 | 0.042 | 0.034 | 0.030 | **0.027** | 0.028 | 0.030 | 0.028 | **0.027** | 0.027 |
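To make the quantity in Table 1 concrete, the sketch below shows one way a replicate mismatch proportion at a fixed read count threshold could be computed. It assumes a CrERV is called present when its read count is at least the threshold and that the denominator is the full set of CrERVs; neither detail is stated in the surviving text, and the function name and toy counts are hypothetical.

```python
import numpy as np


def mismatch_proportion(rep1, rep2, threshold):
    """Fraction of CrERVs whose presence call differs between two replicates.

    A CrERV is called present when its read count is >= threshold
    (an assumed calling rule for illustration).
    """
    call1 = np.asarray(rep1) >= threshold
    call2 = np.asarray(rep2) >= threshold
    return float(np.mean(call1 != call2))


# Toy read counts for six CrERVs in two replicates of one animal
rep1 = [0, 1, 3, 12, 0, 7]
rep2 = [0, 0, 4, 15, 2, 0]

by_threshold = {t: mismatch_proportion(rep1, rep2, t) for t in range(1, 11)}
```

In this toy example the mismatch proportion falls from 0.5 at a threshold of 1 to 0 at a threshold of 10, because low counts seen in only one replicate stop being called present as the threshold rises.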

#### 3.5. CrERV Distribution in Mule Deer

A subset of 479 CrERVs had an estimated probability of $10^{-6}$ or less of correct assignment to any animal. The histogram of pairwise distances of cluster sequences from these low-probability CrERVs to all CrERVs is bimodal, with peaks near 1.4 and 1.7 (Figure 7). The CrERV integration site sequences represented by the second peak are likely unique sequences, based on the large pairwise distance between them and all other identified CrERV integration sites.

**Figure 6.** CrERV prevalence among animals and the number of CrERV integration sites per animal estimated from the mixture model. (**a**) A histogram of the total expected number of animals in which a CrERV insertion site was identified according to the mixture model after merging the replicate animals. These data are derived as the sum of probabilities for each of the 3160 CrERVs in the 45 animals. Various percentiles of the distribution are shown by dotted lines; the median number of animals in which any CrERV is found is 1.01 and only 5% of CrERVs are found in 15 or more animals; (**b**) The estimated number of CrERVs for each of the 45 animals is calculated by summing all probabilities of a CrERV integration for each animal. The horizontal axis gives the range of values, and the height of the corresponding bar gives the number of animals in that range. These data demonstrate that the majority of animals have between 200 and 280 CrERV integrations.
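The two summaries in Figure 6 are row and column sums of the mixture-model probability matrix. A small sketch with a made-up 2 × 3 probability matrix (rows are CrERVs, columns are animals; the function name is hypothetical) illustrates the calculation.

```python
import numpy as np


def expected_counts(prob):
    """Return (expected animals per CrERV, expected CrERVs per animal)."""
    P = np.asarray(prob, dtype=float)
    animals_per_crerv = P.sum(axis=1)   # sum over animals, one value per CrERV
    crervs_per_animal = P.sum(axis=0)   # sum over CrERVs, one value per animal
    return animals_per_crerv, crervs_per_animal


# Toy probabilities: 2 CrERVs (rows) in 3 animals (columns)
P = [[0.9, 0.1, 0.0],
     [1.0, 1.0, 0.5]]
per_crerv, per_animal = expected_counts(P)
```

Here the first CrERV is expected in 1.0 animals and the second in 2.5, while the three animals are expected to carry 1.9, 1.1, and 0.5 CrERVs, respectively.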

**Figure 7.** Inter-cluster distance for CrERVs with low probability of assignment to any animal. The histogram shows the frequency distribution of pairwise distances for cluster sequences representing the subset of 479 CrERVs with an estimated probability of correct assignment less than $10^{-6}$.

**Figure 8.** The heatmap depicts the entries in a square matrix in which the (i, j) entry is the number of CrERV assignments in common, as predicted by the mixture model, for animals i and j. Entries along the diagonal are the number of predicted CrERVs total for each animal. Darker (redder) colors indicate larger values, and the rows and columns are sorted automatically by the plotting function so as to keep similar columns close together. A total of 55 animals, including the replicates, are shown.

#### 3.6. The Relatedness of Mule Deer Based on Shared CrERVs

**Figure 9.** Hierarchical clustering results of ensemble cluster data depicting relatedness of animals. (**a**) All CrERVs contribute equally; (**b**) low-frequency CrERVs are weighted to increase their contribution. The depth of the branch indicates the p-value of two animals carrying CrERVs independently. The replicate animals are merged in this analysis, which is based on 45 animals. Blue underlines indicate the two outlier groups also shown in Figure 8 and Figure 10. Stars indicate the animals that change positions when low-frequency CrERVs are weighted (red centers are Montana deer, yellow centers are Oregon deer). The gray-centered cluster has gained support and repositioned three animals when weighting of low-frequency CrERVs is imposed. Blue brackets indicate those animal groups also supported by the phylogenetic analysis shown in Figure 10.

**Figure 10.** Unrooted consensus population trees obtained from MrBayes using different probability cutoffs. The top and bottom rows represent a probability cutoff for the presence of a CrERV of 0.01 and 0.99, respectively. The left and right columns represent the full dataset and a reduced dataset of only loci polymorphic for a CrERV, respectively. The analysis is based on the complete data set of 55 animals, which includes the 10 replicates. Oregon mule deer are shown in blue, Montana mule deer in red, replicate Montana mule deer in green, and Oregon blacktail deer in magenta. Nodes have the following colors: Black: ≥ 0.95 posterior; Gray: ≥ 0.75 and < 0.95 posterior; White: < 0.75 posterior.

**Figure 11.** Map displaying geographical location and relatedness of animals. Points on the map give locations of the samples taken from the 45 animals. Squares denote blacktail deer; triangles and circles are mule deer sampled in Montana and Oregon, respectively. Points of the same color represent related groups of two to six animals shown in the bottom dendrogram of Figure 9. Inset displays the first two (scaled) principal component scores, after rotation indicated by the dotted gray PC score axes, derived from the 1268 × 1268 correlation matrix of the vectors of mixture-model estimated probabilities for the 1268 viruses present in at least two animals.
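The principal component scores in the Figure 11 inset can be approximated as follows. This is a hedged sketch rather than the authors' procedure: it standardizes each retained CrERV across animals (which is equivalent to working with the virus-by-virus correlation matrix), takes the leading components by singular value decomposition, and omits the additional scaling and rotation mentioned in the caption. Treating "present in at least two animals" as an expected count of at least 2 is an assumption, as is the function name.

```python
import numpy as np


def pca_scores(prob, n_components=2, min_animals=2.0):
    """Animal scores on the leading principal components.

    prob: CrERV-by-animal matrix of mixture-model probabilities.
    CrERVs with expected count below min_animals, or with no variation
    across animals, are dropped before standardizing.
    """
    P = np.asarray(prob, dtype=float)
    keep = (P.sum(axis=1) >= min_animals) & (P.std(axis=1) > 0)
    X = P[keep].T                                   # animals x CrERVs
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each CrERV
    U, s, _ = np.linalg.svd(Xs, full_matrices=False)
    return U[:, :n_components] * s[:n_components]   # scores = Xs @ V
```

The returned array has one row per animal and one column per component; plotting the first column against the second gives an unrotated analogue of the inset.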

#### 3.7. Discussion

## 4. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Schnable, P.S.; Ware, D.; Fulton, R.S.; Stein, J.C.; Wei, F.; Pasternak, S.; Liang, C.; Zhang, J.; Fulton, L.; Graves, T.A.; et al. The B73 maize genome: Complexity, diversity, and dynamics. Science **2009**, 326, 1112–1115.
2. De Koning, A.P.J.; Gu, W.; Castoe, T.A.; Batzer, M.A.; Pollock, D.D. Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet. **2011**, 7.
3. Lander, E.S.; Linton, L.M.; Birren, B.; Nusbaum, C.; Zody, M.C.; Baldwin, J.; Devon, K.; Dewar, K.; Doyle, M.; FitzHugh, W.; et al. Initial sequencing and analysis of the human genome. Nature **2001**, 409, 860–921.
4. Kazazian, H.H. Mobile elements: Drivers of genome evolution. Science **2004**, 303, 1626–1632.
5. Bourque, G.; Leong, B.; Vega, V.B.; Chen, X.; Lee, Y.L.; Srinivasan, K.G.; Chew, J.L.; Ruan, Y.; Wei, C.L.; Ng, H.H.; et al. Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res. **2008**, 18, 1752–1762.
6. Feschotte, C. Transposable elements and the evolution of regulatory networks. Nat. Rev. Genet. **2008**, 9, 397–405.
7. Jern, P.; Coffin, J.M. Effects of retroviruses on host genome function. Annu. Rev. Genet. **2008**, 42, 709–732.
8. Feschotte, C.; Gilbert, C. Endogenous viruses: Insights into viral evolution and impact on host biology. Nat. Rev. Genet. **2012**, 13, 283–296.
9. Stoye, J.P. Studies of endogenous retroviruses reveal a continuing evolutionary saga. Nat. Rev. Microbiol. **2012**, 10, 395–406.
10. Marchi, E.; Kanapin, A.; Magiorkinis, G.; Belshaw, R. Unfixed endogenous retroviral insertions in the human population. J. Virol. **2014**, 148.
11. Belshaw, R.; Dawson, A.L.A.; Woolven, A.J.; Redding, J.; Burt, A.; Tristem, M. Genomewide screening reveals high levels of insertional polymorphism in the human endogenous retrovirus family HERV-K(HML2): Implications for present-day activity. J. Virol. **2005**, 79, 12507–12514.
12. Elleder, D.; Kim, O.; Padhi, A.; Bankert, J.G.; Simeonov, I.; Schuster, S.C.; Wittekindt, N.E.; Motameny, S.; Poss, M. Polymorphic integrations of an endogenous gammaretrovirus in the mule deer genome. J. Virol. **2012**, 86, 2787–2796.
13. Ávila-Arcos, M.C.; Ho, S.Y.W.; Ishida, Y.; Nikolaidis, N.; Tsangaras, K.; Hönig, K.; Medina, R.; Rasmussen, M.; Fordyce, S.L.; Calvignac-Spencer, S.; et al. One hundred twenty years of koala retrovirus evolution determined from museum skins. Mol. Biol. Evol. **2013**, 30, 299–304.
14. Gilbert, C.; Ropiquet, A.; Hassanin, A. Mitochondrial and nuclear phylogenies of Cervidae (Mammalia, Ruminantia): Systematics, morphology, and biogeography. Mol. Phylogenet. Evol. **2006**, 40, 101–117.
15. Hedges, S.B.; Dudley, J.; Kumar, S. TimeTree: A public knowledge-base of divergence times among organisms. Bioinformatics **2006**, 22, 2971–2972.
16. Slotkin, R.K.; Martienssen, R. Transposable elements and the epigenetic regulation of the genome. Nat. Rev. Genet. **2007**, 8, 272–285.
17. Contreras-Galindo, R.; Kaplan, M.H.; Leissner, P.; Verjat, T.; Ferlenghi, I.; Bagnoli, F.; Giusti, F.; Dosik, M.H.; Hayes, D.F.; Gitlin, S.D.; et al. Human endogenous retrovirus K (HML-2) elements in the plasma of people with lymphoma and breast cancer. J. Virol. **2008**, 82, 9329–9336.
18. Kewitz, S.; Staege, M.S. Expression and Regulation of the Endogenous Retrovirus 3 in Hodgkin’s Lymphoma Cells. Front. Oncol. **2013**, 3.
19. Huang, G.; Li, Z.; Wan, X.; Wang, Y.; Dong, J. Human endogenous retroviral K element encodes fusogenic activity in melanoma cells. J. Carcinog. **2013**, 12.
20. Takeuchi, K.; Katsumata, K.; Ikeda, H.; Minami, M.; Wakisaka, A.; Yoshiki, T. Expression of endogenous retroviruses, ERV3 and lambda 4-1, in synovial tissues from patients with rheumatoid arthritis. Clin. Exp. Immunol. **1995**, 99, 338–344.
21. García-Montojo, M.; de la Hera, B.; Varadé, J.; de la Encarnación, A.; Camacho, I.; Domínguez-Mozo, M.; Arias-Leal, A.; García-Martínez, Á.; Casanova, I.; Izquierdo, G.; et al. HERV-W polymorphism in chromosome X is associated with multiple sclerosis risk and with differential expression of MSRV. Retrovirology **2014**, 11.
22. Treangen, T.J.; Salzberg, S.L. Repetitive DNA and next-generation sequencing: Computational challenges and solutions. Nat. Rev. Genet. **2013**, 13, 36–46.
23. Contreras-Galindo, R.; Kaplan, M.H.; He, S.; Contreras-Galindo, A.C.; Gonzalez-Hernandez, M.J.; Kappes, F.; Dube, D.; Chan, S.M.; Robinson, D.; Meng, F.; et al. HIV infection reveals widespread expansion of novel centromeric human endogenous retroviruses. Genome Res. **2013**, 23, 1505–1513.
24. Li, J.; Akagi, K.; Hu, Y.; Trivett, A.L.; Hlynialuk, C.J.W.; Swing, D.A.; Volfovsky, N.; Morgan, T.C.; Golubeva, Y.; Stephens, R.M.; et al. Mouse endogenous retroviruses can trigger premature transcriptional termination at a distance. Genome Res. **2012**, 22, 870–884.
25. Li, N.; Carrel, L. Escape from X chromosome inactivation is an intrinsic property of the Jarid1c locus. Proc. Natl. Acad. Sci. USA **2008**, 105, 17055–17060.
26. Miller, A.; Gustashaw, K.; Wolff, D.J.; Rider, S.H.; Monaco, A.P.; Eble, B.; Schlessinger, D.; Gorski, J.L.; van Ommen, G.J.; Weissenbach, J. Three genes that escape X chromosome inactivation are clustered within a 6 Mb YAC contig and STS map in Xp11.21–p11.22. Hum. Mol. Genet. **1995**, 4, 731–739.
27. Iskow, R.C.; McCabe, M.T.; Mills, R.E.; Torene, S.; Pittard, W.S.; Neuwald, A.F.; van Meir, E.G.; Vertino, P.M.; Devine, S.E. Natural mutagenesis of human genomes by endogenous retrotransposons. Cell **2010**, 141, 1253–1261.
28. Witherspoon, D.J.; Xing, J.; Zhang, Y.; Watkins, W.S.; Batzer, M.A.; Jorde, L.B. Mobile element scanning (ME-Scan) by targeted high-throughput sequencing. BMC Genomics **2010**, 11.
29. Ray, A.; Rahbari, R.; Badge, R.M. IAP Display: A Simple Method to Identify Mouse Strain Specific IAP Insertions. Mol. Biotechnol. **2011**, 47, 243–252.
30. Ciuffi, A.; Ronen, K.; Brady, T.; Malani, N.; Wang, G.; Berry, C.C.; Bushman, F.D. Methods for integration site distribution analyses in animal cell genomes. Methods **2009**, 47, 261–268.
31. Kamath, P.; Elleder, D.; Bao, L. The Population History of Endogenous Retroviruses in Mule Deer (Odocoileus hemionus). J. Hered. **2014**, 105, 173–187.
32. Malhotra, R.; Elleder, D.; Bao, L.; Hunter, D.; Acharya, R.; Poss, M. Clustering Pipeline for Determining Consensus Sequences in Targeted Next-Generation Sequencing. ArXiv E-Prints **2014**. arXiv:1410.1608.
33. Edgar, R.C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics **2010**, 26, 2460–2461.
34. Dempster, A.; Laird, N.; Rubin, D. Maximum Likelihood from Incomplete Data via the EM Algorithm. J. R. Stat. Soc. B **1977**, 39, 1–38.
35. Strehl, A.; Ghosh, J. Cluster Ensembles—A Knowledge Reuse Framework for Combining Multiple Partitions. J. Mach. Learn. Res. **2002**, 3, 583–617.
36. Raymond, G.; Olsen, E.; Lee, K. Inhibition of protease-resistant prion protein formation in a transformed deer cell line infected with chronic wasting disease. J. Virol. **2006**, 80, 596–604.
37. Dunn, J.C.A. Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters. J. Cybern. **1973**, 3, 32–57.
38. Davies, D.L.; Bouldin, D.W. A Cluster Separation Measure. IEEE Trans. Pattern Anal. Mach. Intell. **1979**, PAMI-1, 224–227.
39. Latch, E.K.; Heffelfinger, J.R.; Fike, J.A.; Rhodes, O.E. Species-wide phylogeography of North American mule deer (Odocoileus hemionus): Cryptic glacial refugia and postglacial recolonization. Mol. Ecol. **2009**, 18, 1730–1745.
40. Ilie, L.; Fazayeli, F.; Ilie, S. HiTEC: Accurate error correction in high-throughput sequencing data. Bioinformatics **2011**, 27, 295–302.
41. Kelley, D.R.; Schatz, M.C.; Salzberg, S.L. Quake: Quality-aware detection and correction of sequencing errors. Genome Biol. **2010**, 11.
42. Liu, Y.; Schröder, J.; Schmidt, B. Musket: A multistage k-mer spectrum-based error corrector for Illumina sequence data. Bioinformatics **2013**, 29, 308–315.
43. Liu, Y.; Schmidt, B.; Maskell, D.L. DecGPU: Distributed error correction on massively parallel graphics processing units using CUDA and MPI. BMC Bioinformatics **2011**, 12.
44. Medvedev, P.; Scott, E.; Kakaradov, B.; Pevzner, P. Error correction of high-throughput sequencing datasets with non-uniform coverage. Bioinformatics **2011**, 27, i137–i141.
45. Lindsay, B.G. Mixture Models: Theory, Geometry and Applications; Institute of Mathematical Statistics and American Statistical Association: Philadelphia, PA, USA, 1996.
46. McLachlan, J.G.; Krishnan, T. EM Algorithm Extensions. In Wiley Series in Probability and Statistics; John Wiley & Sons, Inc.: New York, NY, USA, 1997.
47. De Queiroz, A.; Gatesy, J. The supermatrix approach to systematics. Trends Ecol. Evol. **2007**, 22, 34–41.
48. Rokas, A.; Williams, B.L.; King, N.; Carroll, S.B. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature **2003**, 425, 798–804.
49. Kolaczkowski, B.; Thornton, J.W. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature **2004**, 431, 461–463.
50. Gadagkar, S.R.; Rosenberg, M.S.; Kumar, S. Inferring species phylogenies from multiple genes: Concatenated sequence tree versus consensus gene tree. J. Exp. Zool. B. Mol. Dev. Evol. **2005**, 304, 64–74.
51. Mossel, E.; Vigoda, E. Phylogenetic MCMC algorithms are misleading on mixtures of trees. Science **2005**, 309, 2207–2209.
52. Edwards, S.V.; Liu, L.; Pearl, D.K. High-resolution species trees without concatenation. Proc. Natl. Acad. Sci. USA **2007**, 104, 5936–5941.
53. Kubatko, L.S.; Degnan, J.H. Inconsistency of phylogenetic estimates from concatenated data under coalescence. Syst. Biol. **2007**, 56, 17–24.
54. Degnan, J.H.; Rosenberg, N.A. Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol. Evol. **2014**, 24, 332–340.
55. Rannala, B.; Yang, Z. Phylogenetic inference using whole genomes. Annu. Rev. Genomics Hum. Genet. **2008**, 9, 217–231.
56. Degnan, J.H.; Rosenberg, N.A. Discordance of species trees with their most likely gene trees. PLoS Genet. **2006**, 2.
57. Degnan, J.H. Anomalous unrooted gene trees. Syst. Biol. **2013**, 62, 574–590.
58. Rosenberg, N.A.; Tao, R. Discordance of species trees with their most likely gene trees: The case of five taxa. Syst. Biol. **2008**, 57, 131–140.
59. Rosenberg, N.A. Discordance of species trees with their most likely gene trees: A unifying principle. Mol. Biol. Evol. **2013**, 30, 2709–2713.
60. Heled, J.; Drummond, A.J. Bayesian inference of species trees from multilocus data. Mol. Biol. Evol. **2010**, 27, 570–580.
61. Liu, L. BEST: Bayesian estimation of species trees under the coalescent model. Bioinformatics **2008**, 24, 2542–2543.
62. Jewett, E.M.; Rosenberg, N.A. iGLASS: An improvement to the GLASS method for estimating species trees from gene trees. J. Comput. Biol. **2012**, 19, 293–315.
63. Pickrell, J.K.; Pritchard, J.K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. **2012**, 8.

© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Bao, L.; Elleder, D.; Malhotra, R.; DeGiorgio, M.; Maravegias, T.; Horvath, L.; Carrel, L.; Gillin, C.; Hron, T.; Fábryová, H.;
et al. Computational and Statistical Analyses of Insertional Polymorphic Endogenous Retroviruses in a Non-Model Organism. *Computation* **2014**, *2*, 221-245.
https://doi.org/10.3390/computation2040221
