In systems biology, the complex relationships between different entities in the cell are modeled and analyzed using networks. Towards this aim, a rich variety of gene regulatory network (GRN) inference algorithms has been developed in recent years. However, most algorithms rely solely on gene expression data to reconstruct the network. Because unrelated genes can have similar expression profiles, predictions can contain connections between biologically unrelated genes. Therefore, to obtain more consistent results, computational methods should also consider previously known biological information, such as experimentally validated interactions between transcription factors and target genes. In this work, we propose XGBoost for gene regulatory networks (XGRN), a supervised algorithm that combines gene expression data with previously known interactions for GRN inference. The key idea of our method is to train a regression model for each known interaction of the network and then use this model to predict new interactions. The regression is performed by XGBoost, a state-of-the-art algorithm that uses an ensemble of decision trees. In detail, XGRN learns a regression model based on the gene expression of the two interactors and then provides predictions using the gene expression of other candidate interactors as input. When applied to benchmark datasets and a large real single-cell RNA-Seq experiment, XGRN achieved high performance compared to other unsupervised and supervised methods, demonstrating its ability to provide reliable predictions.
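The per-interaction idea described above can be illustrated with a minimal sketch. It is not the XGRN implementation: a small hand-written gradient boosting of decision stumps stands in for XGBoost so that the example needs only the Python standard library, the gene names and expression values are hypothetical toy data, and ranking candidates by the negative mean squared error between predicted and observed target expression is an assumption, since the scoring rule is not specified here.

```python
from statistics import mean

def fit_stump(x, residuals):
    """Find the single threshold on x that best splits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lv, rv = mean(left), mean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    if best is None:  # x is constant, so no split is possible
        m = mean(residuals)
        return (x[0], m, m)
    return best[1:]

def fit_boosted(x, y, n_rounds=100, learning_rate=0.3):
    """Stand-in for XGBoost: boost stumps on the residuals of a 1-D regression."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, residuals)
        stumps.append((t, lv, rv))
        pred = [p + learning_rate * (lv if xi <= t else rv) for xi, p in zip(x, pred)]
    return base, learning_rate, stumps

def predict(model, x):
    """Evaluate the boosted model on new inputs."""
    base, lr, stumps = model
    return [base + lr * sum(lv if xi <= t else rv for t, lv, rv in stumps) for xi in x]

def score_candidates(expr, known_tf, target, candidates):
    """Train on one known interaction, then score other candidate regulators.

    Assumed scoring rule: negative MSE between the target expression predicted
    from the candidate's expression and the observed target expression.
    """
    model = fit_boosted(expr[known_tf], expr[target])
    scores = {}
    for cand in candidates:
        pred = predict(model, expr[cand])
        scores[cand] = -mean([(p - y) ** 2 for p, y in zip(pred, expr[target])])
    return scores

# Hypothetical toy expression profiles (one value per sample).
expr = {
    "TF_A": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],   # known regulator of G1
    "G1":   [0.0, 2.0, 4.0, 6.0, 8.0, 10.0],  # target gene
    "TF_B": [0.1, 0.9, 2.1, 3.1, 3.9, 5.1],   # candidate that co-varies with TF_A
    "TF_C": [5.0, 0.0, 3.0, 1.0, 4.0, 2.0],   # candidate with an unrelated profile
}

scores = score_candidates(expr, known_tf="TF_A", target="G1", candidates=["TF_B", "TF_C"])
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy setting the candidate whose profile resembles the known regulator is ranked first. In practice the stand-in regressor would be replaced by an XGBoost model, and one such model would be trained for every known interaction.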
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.