Open Access Article
Genes 2018, 9(10), 486; https://doi.org/10.3390/genes9100486

De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data

1 Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
2 Science for Life Laboratory, Department of Biochemistry and Biophysics (DBB), Stockholm University, 114 19 Stockholm, Sweden
3 Science for Life Laboratory, Department of Medical Sciences, Molecular Medicine, Uppsala University, 752 36 Uppsala, Sweden
4 Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, 752 36 Uppsala, Sweden
* Author to whom correspondence should be addressed.
Received: 28 August 2018 / Revised: 21 September 2018 / Accepted: 5 October 2018 / Published: 9 October 2018
(This article belongs to the Special Issue Advances in Single Molecule, Real-Time (SMRT) Sequencing)

Abstract

The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.
Keywords: de novo assembly; SMRT sequencing; GRCh38; human reference genome; human whole-genome sequencing; population sequencing; Swedish population
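
The abstract reports that the NS have an elevated GC-content and that including them in the GRCh38 reference improves alignment and variant calling. The following is a minimal sketch in Python of that idea, assuming the novel sequences are available as FASTA records: it appends them to a reference as extra contigs and reports the GC fraction of each contig. The function names, the `NS_` prefix, and the toy sequences are hypothetical illustrations and are not taken from the authors' pipeline.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def augment_reference(
    reference: Dict[str, str], novel: Dict[str, str], prefix: str = "NS_"
) -> Dict[str, str]:
    """Return a new contig mapping with novel sequences appended after the reference.

    The prefix is a hypothetical naming convention that keeps added contigs
    distinguishable from the original reference contigs.
    """
    combined = dict(reference)
    for name, seq in novel.items():
        new_name = prefix + name
        if new_name in combined:
            raise ValueError(f"duplicate contig name: {new_name}")
        combined[new_name] = seq
    return combined


# Toy data only; real inputs would be the GRCh38 FASTA and the assembled NS contigs.
ref = dict(parse_fasta([">chr1", "ACGTACGT", "AATT"]))
ns = dict(parse_fasta([">contig1", "GGCCGGCC", ">contig2", "GCGCAT"]))
augmented = augment_reference(ref, ns)

for contig, sequence in augmented.items():
    print(contig, len(sequence), round(gc_content(sequence), 2))
```

In practice the augmented FASTA would then be indexed and used as the alignment target for the short-read data. Keeping the original contigs unchanged and only appending new ones means that coordinates on the existing chromosomes stay comparable between the two references.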
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Ameur, A.; Che, H.; Martin, M.; Bunikis, I.; Dahlberg, J.; Höijer, I.; Häggqvist, S.; Vezzi, F.; Nordlund, J.; Olason, P.; Feuk, L.; Gyllensten, U. De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data. Genes 2018, 9, 486.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Genes, EISSN 2073-4425, published by MDPI AG, Basel, Switzerland