Open Access Article

Model-Based Clustering with Measurement or Estimation Errors

by Wanli Zhang and Yanming Di *
Department of Statistics, Oregon State University, Corvallis, OR 97330, USA
* Author to whom correspondence should be addressed.
Current address: Eli Lilly & Company, Shanghai, China.
Genes 2020, 11(2), 185; https://doi.org/10.3390/genes11020185
Received: 27 November 2019 / Revised: 4 February 2020 / Accepted: 5 February 2020 / Published: 10 February 2020
(This article belongs to the Special Issue Statistical Methods for the Analysis of Genomic Data)
Model-based clustering with finite mixture models has become a widely used clustering method. One of the recent implementations is MCLUST. When the objects to be clustered are summary statistics, such as regression coefficient estimates, they are naturally associated with estimation errors, whose covariance matrices can often be calculated exactly or approximated using asymptotic theory. This article proposes an extension to Gaussian finite mixture modeling, called MCLUST-ME, that properly accounts for the estimation errors. More specifically, we assume that the distribution of each observation consists of an underlying true component distribution and an independent measurement error distribution. Under this assumption, each unique value of the estimation error covariance corresponds to its own classification boundary, which consequently results in a different grouping from MCLUST. Through simulation and application to an RNA-Seq data set, we found that, under certain circumstances, explicitly modeling estimation errors improves clustering performance or provides new insights into the data compared with simply ignoring the errors; the degree of improvement depends on factors such as the distribution of the error covariance matrices.
Keywords: Gaussian finite mixture model; clustering analysis; uncertainty; expectation-maximization algorithm; classification boundary; gene expression; RNA-seq
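The error model described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes, for illustration, that the measurement error of each observation is zero-mean Gaussian with a known covariance matrix, so that an observation from component k has covariance equal to the component covariance plus its own error covariance. The function names (`gaussian_logpdf`, `membership_probs`) and all parameter values are hypothetical.

```python
import numpy as np

def gaussian_logpdf(y, mean, cov):
    """Log density of a multivariate normal N(mean, cov) evaluated at y."""
    d = y.shape[0]
    chol = np.linalg.cholesky(cov)          # cov = chol @ chol.T
    z = np.linalg.solve(chol, y - mean)     # z @ z is the Mahalanobis term
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + z @ z)

def membership_probs(y, err_cov, weights, means, covs):
    """Posterior component probabilities for one observation.

    Assumption (illustrative): the observed value is a true value drawn
    from component k plus independent zero-mean Gaussian error with known
    covariance err_cov, so its covariance under component k is
    covs[k] + err_cov.
    """
    log_terms = np.array([
        np.log(w) + gaussian_logpdf(y, m, c + err_cov)
        for w, m, c in zip(weights, means, covs)
    ])
    log_terms -= log_terms.max()            # stabilise before exponentiating
    probs = np.exp(log_terms)
    return probs / probs.sum()

# Toy two-component mixture in two dimensions (made-up values).
weights = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([3.0, 0.0])]
covs = [np.eye(2), 4.0 * np.eye(2)]

y = np.array([2.0, 0.0])
p_no_error = membership_probs(y, np.zeros((2, 2)), weights, means, covs)
p_large_error = membership_probs(y, 9.0 * np.eye(2), weights, means, covs)

print("no error:   ", p_no_error, "-> component", int(np.argmax(p_no_error)))
print("large error:", p_large_error, "-> component", int(np.argmax(p_large_error)))
```

In this toy setup the same point is assigned to different components depending on its error covariance, which is the sense in which each error covariance value has its own classification boundary. Setting every error covariance to zero reduces the calculation to the ordinary Gaussian mixture case.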
MDPI and ACS Style

Zhang, W.; Di, Y. Model-Based Clustering with Measurement or Estimation Errors. Genes 2020, 11, 185.
