Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis

Received: 24 March 2014; in revised form: 12 May 2014 / Accepted: 10 June 2014 / Published: 16 June 2014
(This article belongs to the Special Issue Identification and Roles of the Structure of DNA)
Abstract: As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation levels are naturally quantified as bounded-support data and are therefore non-Gaussian distributed. To capture this property, we introduce several non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Unsupervised clustering strategies based on non-Gaussian statistical models are then applied to cluster the data. We compare and analyze the different dimension reduction strategies and unsupervised clustering methods. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method and are meaningful tools for DNA methylation analysis. Moreover, among the non-Gaussian methods considered, the one that captures the bounded nature of DNA methylation data achieves the best clustering performance.
Keywords: non-Gaussian statistical models; dimension reduction; unsupervised learning; feature selection; DNA methylation analysis
