Open Access Article

Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2

1 Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
2 Department of Human Genetics and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
* Author to whom correspondence should be addressed.
Genes 2020, 11(2), 141; https://doi.org/10.3390/genes11020141
Received: 11 January 2020 / Revised: 21 January 2020 / Accepted: 24 January 2020 / Published: 29 January 2020
(This article belongs to the Special Issue A Tale of Genes and Genomes)
Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to distinguish among duplicated sequences. Specialized approaches to interrogate specific paralogs are comparatively slow and have a high degree of computational complexity, limiting their effective application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. This approach is based on the tabulation of unique k-mer sequences from short-read data sets, and can analyze a 20X coverage human genome in approximately 20 min. We applied our approach to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps from 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. We identify nine genes where none of the analyzed samples have a copy number of two and 92 genes where the majority of samples have a copy number other than two, and we describe rare copy-number variation affecting multiple genes at the APOBEC3 locus.
Keywords: copy-number variation; gene duplication; k-mer
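
The idea of tabulating unique k-mers to estimate copy number can be illustrated with a toy sketch. This is not the QuicK-mer2 implementation: the function names below are hypothetical, and the sketch assumes plain string matching with no reverse-complement handling, no GC or coverage correction, and a set of control k-mers assumed to be present at two copies.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-length substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def unique_kmers(reference, k):
    """Return the k-mers that occur exactly once in the reference string."""
    counts = Counter(kmers(reference, k))
    return [km for km in kmers(reference, k) if counts[km] == 1]


def count_in_reads(reads, targets, k):
    """Tabulate how often each target k-mer appears across all reads."""
    wanted = set(targets)
    depth = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in wanted:
                depth[km] += 1
    return depth


def estimate_copy_number(depth, region_kmers, control_kmers, control_copies=2):
    """Scale the median depth of a region's k-mers by the median depth of
    control k-mers, which are assumed (hypothetically) to be at control_copies."""
    control = median(depth[km] for km in control_kmers)
    region = median(depth[km] for km in region_kmers)
    return control_copies * region / control
```

Because only k-mers that are unique in the reference are counted, a depth signal can be attributed to one paralog rather than shared among duplicated sequences, which is the property the abstract describes as paralog-specific.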
MDPI and ACS Style

Shen, F.; Kidd, J.M. Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2. Genes 2020, 11, 141.
