Open Access Article

Characterizing Human Cell Types and Tissue Origin Using the Benford Law

Department of Molecular Biology, Faculty of Life Sciences, Ariel University, Ariel 40700, Israel
*
Author to whom correspondence should be addressed.
Cells 2019, 8(9), 1004; https://doi.org/10.3390/cells8091004
Received: 28 June 2019 / Revised: 27 August 2019 / Accepted: 28 August 2019 / Published: 29 August 2019
(This article belongs to the Special Issue Bioinformatics and Computational Biology 2019)
Processing massive transcriptomic datasets in a meaningful manner requires novel, possibly interdisciplinary, approaches. One principle that can address this challenge is the Benford law (BL), which posits that the occurrence probability of a leading digit in a large numerical dataset decreases as the digit's value increases. Here, we analyzed large single-cell and bulk RNA-seq datasets to test whether cell types and tissue origins can be differentiated based on the adherence of specific genes to the BL. We then used the Benford adherence scores of these genes as inputs to machine-learning algorithms and tested their separation accuracy. We found that genes selected based on their first-digit distributions can distinguish between cell types and tissue origins. Moreover, despite the simplicity of this novel feature-selection method, its separation accuracy is higher than that of the mean-expression-level approach and is similar to that of the differential expression approach. Thus, the BL can be used to obtain biological insights from massive amounts of numerical genomics data. This capability could be utilized in various biomedical applications, e.g., to resolve samples of unknown primary origin, identify possible sample contaminations, and provide insights into the molecular basis of cancer subtypes.
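To make the idea of a first-digit adherence score concrete, the following is a minimal sketch in Python. It uses the standard Benford probabilities, P(d) = log10(1 + 1/d) for leading digits d = 1, ..., 9, and scores a set of positive values by the mean absolute deviation (MAD) between the observed and expected first-digit frequencies. The abstract does not state which adherence score the authors used, so the MAD here is an assumption chosen for illustration, and the function names (`benford_expected`, `leading_digit`, `first_digit_distribution`, `benford_mad`) are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Benford probabilities for leading digits 1..9: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant digit of a positive finite number, else None."""
    if not math.isfinite(x) or x <= 0:
        return None
    # Scientific notation starts with the first significant digit;
    # 17 decimals avoids rounding a value such as 9.9999999 up to 1.0e+01.
    return int(f"{x:.17e}"[0])


def first_digit_distribution(values):
    """Observed relative frequency of each leading digit 1..9."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no positive finite values to analyze")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def benford_mad(values):
    """Mean absolute deviation from Benford (an assumed, illustrative score)."""
    expected = benford_expected()
    observed = first_digit_distribution(values)
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9


if __name__ == "__main__":
    # Powers of two are a classic sequence that follows the Benford law closely.
    benford_like = [2.0 ** n for n in range(1000)]
    # A constant vector is as far from Benford as a single-digit dataset can be.
    constant = [5.0] * 100
    print("powers of two MAD:", benford_mad(benford_like))
    print("constant MAD:", benford_mad(constant))
```

Under this sketch, a lower score means closer adherence to the BL. In a transcriptomic setting, `values` could be one gene's nonzero expression values across cells or samples, giving one score per gene that could then serve as a feature; zeros are skipped because they have no leading digit.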
Keywords: single-cell RNA sequencing; Benford law; Benford distribution; cell classification; machine learning
MDPI and ACS Style

Morag, S.; Salmon-Divon, M. Characterizing Human Cell Types and Tissue Origin Using the Benford Law. Cells 2019, 8, 1004.

