Open Access Article
Int. J. Mol. Sci. 2018, 19(11), 3687; https://doi.org/10.3390/ijms19113687

Hierarchical Clustering of DNA k-mer Counts in RNAseq Fastq Files Identifies Sample Heterogeneities

1
Department of Anaesthesiology, HELIOS University Hospital Wuppertal, University of Witten/Herdecke, Heusnerstr. 40, 42283 Wuppertal, Germany
2
Institut für Virologie, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
3
Mathematisches Institut, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
*
Author to whom correspondence should be addressed.
Received: 5 November 2018 / Accepted: 15 November 2018 / Published: 21 November 2018
(This article belongs to the Section Biochemistry)

Abstract

We apply hierarchical clustering (HC) of DNA k-mer counts to multiple Fastq files. The tree structures produced by HC may reflect experimental groups and thereby indicate experimental effects, whereas clustering by preparation groups indicates the presence of batch effects. Hence, HC of DNA k-mer counts may serve as a diagnostic device. To provide a simple, readily applicable tool, we implemented sequential analysis of Fastq reads with low memory usage in an R package (seqTools) available on Bioconductor. The approach is validated by analysis of Fastq file batches containing RNAseq data. Analysis of three Fastq batches downloaded from ArrayExpress indicated experimental effects. Analysis of RNAseq data from two cell types (dermal fibroblasts and Jurkat cells) sequenced in our facility indicated the presence of batch effects. The observed batch effects were also present in reads mapped to the human genome and in reads filtered for high quality (Phred > 30). We propose that hierarchical clustering of DNA k-mer counts provides an unspecific diagnostic tool for RNAseq experiments. Further exploration is required once samples are identified as outliers in HC-derived trees.
Keywords: DNA-kmer; Fastq; RNAseq
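
The idea summarised in the abstract can be illustrated with a small Python sketch. It is not the seqTools implementation (which is an R package and reads Fastq files sequentially); it only shows the general technique on in-memory toy reads. The choice of k = 2, the Canberra distance, average linkage, and all sample names and sequences are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_counts(reads, k):
    """Count DNA k-mers over all reads; k-mers with non-ACGT symbols are skipped."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def profile(reads, k):
    """Relative k-mer frequencies in a fixed k-mer order, comparable across samples."""
    counts = kmer_counts(reads, k)
    total = sum(counts.values()) or 1
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def canberra(x, y):
    """Canberra distance; positions where both values are zero contribute nothing."""
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y) if a + b > 0)


def average_linkage(names, profiles):
    """Naive agglomerative clustering; returns the tree as nested tuples of names."""
    clusters = {i: (names[i], [i]) for i in range(len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                ma, mb = clusters[a][1], clusters[b][1]
                # Average pairwise distance between the members of both clusters.
                d = sum(canberra(profiles[p], profiles[q])
                        for p in ma for q in mb) / (len(ma) * len(mb))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        tree_a, members_a = clusters.pop(a)
        tree_b, members_b = clusters.pop(b)
        clusters[next_id] = ((tree_a, tree_b), members_a + members_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


def leaves(tree):
    """Flatten a nested-tuple tree into the list of its sample names."""
    if isinstance(tree, str):
        return [tree]
    return [name for sub in tree for name in leaves(sub)]


# Hypothetical toy samples: two AT-rich and two GC-rich read sets.
samples = {
    "at_rich_1": ["ATATATTAAT", "TTAATATAAA"],
    "at_rich_2": ["ATTATATAAT", "TATAATTATA"],
    "gc_rich_1": ["GCGCGGCCGC", "CCGGCGCGCG"],
    "gc_rich_2": ["GGCCGCGCCG", "CGCGGCGCGC"],
}
names = list(samples)
profiles = [profile(samples[name], k=2) for name in names]
tree = average_linkage(names, profiles)
print(tree)
```

In this toy setting the top-level split of the tree separates the AT-rich from the GC-rich samples. With real data, the same kind of tree would be inspected to see whether samples group by experimental condition or by preparation batch.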

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

Share & Cite This Article

MDPI and ACS Style

Kaisers, W.; Schwender, H.; Schaal, H. Hierarchical Clustering of DNA k-mer Counts in RNAseq Fastq Files Identifies Sample Heterogeneities. Int. J. Mol. Sci. 2018, 19, 3687.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Int. J. Mol. Sci. EISSN 1422-0067. Published by MDPI AG, Basel, Switzerland.