Open Access Article
Int. J. Mol. Sci. 2014, 15(5), 8491-8508

DEFLATE Compression Algorithm Corrects for Overestimation of Phylogenetic Diversity by Grantham Approach to Single-Nucleotide Polymorphism Classification

Kolling Institute of Medical Research, Royal North Shore Hospital, Pacific Hwy, St Leonards, NSW 2065, Australia
Sydney Medical School, the University of Sydney, NSW 2006, Australia
University of Cambridge Metabolic Research Laboratories, Box 289, Level 4 Wellcome Trust-MRC Institute of Metabolic Science, Addenbrooke's Hospital, Hills Road, Cambridge CB2 0QQ, UK
Author to whom correspondence should be addressed.
Received: 22 January 2014 / Revised: 28 March 2014 / Accepted: 4 May 2014 / Published: 13 May 2014
(This article belongs to the Collection Human Single Nucleotide Polymorphisms and Disease Diagnostics)


Improvements in the speed and cost of genome sequencing are yielding increasing numbers of novel non-synonymous single nucleotide polymorphisms (nsSNPs) in genes known to be associated with disease. The large number of nsSNPs makes laboratory-based classification infeasible, and analysis of familial co-segregation with disease is not always possible. In silico methods for classification or triage are thus utilised. A popular tool based on multiple-species sequence alignments (MSAs) and work by Grantham, Align-GVGD, has been shown to underestimate deleterious effects, particularly as the number of sequences increases. We utilised the DEFLATE compression algorithm to account for expected variation across a number of species. With the adjusted Grantham measure we derived a means of quantitatively clustering known neutral and deleterious nsSNPs from the same gene; this was then used to assign novel variants to the most appropriate cluster as a means of binary classification. Scaling of clusters allows for inter-gene comparison of variants through a single pathogenicity score. The approach improves upon the classification accuracy of Align-GVGD while correcting for sensitivity to large MSAs. Open-source code and a web server are made available.
Keywords: DEFLATE; compression; Grantham; variation; sequence alignment; nsSNP
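
The idea of using compression to gauge variation can be illustrated with a minimal sketch. It is not the authors' implementation: it simply assumes that the DEFLATE-compressed size of an alignment column, relative to its raw size, can act as a rough proxy for how much the column varies across species, and that a novel variant can be assigned to whichever cluster of known variants has the nearer mean score. The function names `deflate_ratio` and `nearest_cluster`, the example columns, and all numeric scores are hypothetical and chosen only for illustration.

```python
import random
import zlib

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def deflate_ratio(column: str) -> float:
    """Return compressed size / raw size for one alignment column.

    Hypothetical helper: a lower ratio means the column is more repetitive
    (more conserved), a higher ratio means it is more variable.
    """
    if not column:
        raise ValueError("column must contain at least one residue")
    raw = column.encode("ascii")
    # wbits=-15 requests a raw DEFLATE stream (no zlib header or checksum).
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    compressed = compressor.compress(raw) + compressor.flush()
    return len(compressed) / len(raw)


def nearest_cluster(score, neutral_scores, deleterious_scores):
    """Assign a score to the cluster whose mean is closer (an assumption,
    standing in for whatever clustering rule the paper actually uses)."""
    neutral_centre = sum(neutral_scores) / len(neutral_scores)
    deleterious_centre = sum(deleterious_scores) / len(deleterious_scores)
    if abs(score - neutral_centre) <= abs(score - deleterious_centre):
        return "neutral"
    return "deleterious"


# Illustrative columns: one fully conserved, one drawn at random (fixed seed).
rng = random.Random(0)
conserved_column = "A" * 200
variable_column = "".join(rng.choice(AMINO_ACIDS) for _ in range(200))

print("conserved ratio:", deflate_ratio(conserved_column))
print("variable ratio: ", deflate_ratio(variable_column))

# Made-up scores for known variants of one gene, purely for demonstration.
print(nearest_cluster(0.9, [0.8, 1.0], [0.1, 0.2]))
```

In this sketch the conserved column compresses to a much smaller fraction of its raw size than the variable one, which is the property a compression-based correction would rely on. How the paper combines such a measure with Grantham scores is described in the full text, not here.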


This is an open access article distributed under the Creative Commons Attribution License (CC BY 3.0).


MDPI and ACS Style

Schlosberg, A.; Lam, B.Y.H.; Yeo, G.S.H.; Clifton-Bligh, R.J. DEFLATE Compression Algorithm Corrects for Overestimation of Phylogenetic Diversity by Grantham Approach to Single-Nucleotide Polymorphism Classification. Int. J. Mol. Sci. 2014, 15, 8491-8508.

Int. J. Mol. Sci. EISSN 1422-0067. Published by MDPI AG, Basel, Switzerland.