Open Access Review
Genes 2018, 9(3), 123; doi:10.3390/genes9030123

Delimiting Coalescence Genes (C-Genes) in Phylogenomic Data Sets

Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA 92521, USA
Division of Vertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
Authors to whom correspondence should be addressed.
Received: 1 December 2017 / Revised: 2 February 2018 / Accepted: 19 February 2018 / Published: 26 February 2018
(This article belongs to the Special Issue Estimating Phylogenies from Large Genomic Datasets)

Summary coalescence methods have emerged as a popular alternative for inferring species trees with large genomic datasets, because these methods explicitly account for incomplete lineage sorting (ILS). However, statistical consistency of summary coalescence methods is not guaranteed unless several model assumptions are true, including the critical assumption that recombination occurs freely among but not within coalescence genes (c-genes), which are the fundamental units of analysis for these methods. Each c-gene has a single branching history, and large sets of these independent gene histories should be the input for genome-scale coalescence estimates of phylogeny. By contrast, numerous studies have reported the results of coalescence analyses in which complete protein-coding sequences are treated as c-genes even though exons for these loci can span more than a megabase of DNA. Empirical estimates of recombination breakpoints suggest that c-genes may be much shorter, especially when large clades with many species are the focus of analysis. Although this idea has been challenged recently in the literature, the inverse relationship between c-gene size and taxon sampling in a dataset (the ‘recombination ratchet’) is a fundamental property of c-genes. For taxonomic groups characterized by genes with long intron sequences, complete protein-coding sequences are likely not valid c-genes and are inappropriate units of analysis for summary coalescence methods unless they occur in recombination deserts that are devoid of ILS. Finally, it has been argued that coalescence methods are robust when the assumption of no recombination within loci is violated, but recombination must matter at some scale because ILS, a by-product of recombination, is the raison d’être for coalescence methods. That is, extensive recombination is required to yield the large number of independently segregating c-genes used to infer a species tree.
If coalescent methods are powerful enough to infer the correct species tree for difficult phylogenetic problems in the anomaly zone, where concatenation is expected to fail because of ILS, then there should be a decreasing probability of inferring the correct species tree using longer loci with many intralocus recombination breakpoints (i.e., increased levels of concatenation).
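The inverse relationship between c-gene size and taxon sampling described above can be illustrated with a toy calculation. The sketch below is not the authors' method: it assumes that recombination breakpoints fall along the sequence as a Poisson process whose rate is proportional to the total branch length of the tree, and that every branch has the same length. The function names, the per-site recombination rate, and the branch length in generations are all hypothetical values chosen only for illustration.

```python
def total_branch_length(n_taxa, mean_branch_gens):
    """Total tree length in generations (toy assumption: equal branch lengths).

    A rooted, fully bifurcating tree with n tips has 2n - 2 branches.
    """
    return (2 * n_taxa - 2) * mean_branch_gens


def expected_cgene_length(recomb_rate_per_bp_per_gen, tree_length_gens):
    """Expected spacing in bp between breakpoints under the toy Poisson model.

    Breakpoints accumulate at rate r * T per site, so the mean distance
    between consecutive breakpoints is 1 / (r * T).
    """
    return 1.0 / (recomb_rate_per_bp_per_gen * tree_length_gens)


if __name__ == "__main__":
    r = 1e-8            # hypothetical recombination rate per bp per generation
    mean_branch = 1e6   # hypothetical mean branch length in generations
    for n_taxa in (4, 16, 64):
        tree_len = total_branch_length(n_taxa, mean_branch)
        length = expected_cgene_length(r, tree_len)
        print(f"{n_taxa:>3} taxa: expected c-gene length ~ {length:.2f} bp")
```

Under these assumptions, each added taxon adds branch length on which recombination can occur, so the expected length of a non-recombining segment can only shrink as sampling increases. Real estimates depend on the actual tree, on variation in recombination rate along the genome, and on which breakpoints alter the genealogy.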
Keywords: coalescence genes; phylogenomics; protein-coding sequences; recombination breakpoints; recombination ratchet

Figure 1

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).


MDPI and ACS Style

Springer, M.S.; Gatesy, J. Delimiting Coalescence Genes (C-Genes) in Phylogenomic Data Sets. Genes 2018, 9, 123.



Genes, EISSN 2073-4425, published by MDPI AG, Basel, Switzerland