Research in Computational Molecular Biology Focused on Comparative Genomics: Selected Papers from RECOMB-CG 2022

A special issue of Biology (ISSN 2079-7737). This special issue belongs to the section "Bioinformatics".

Deadline for manuscript submissions: closed (30 June 2022) | Viewed by 10494

Special Issue Editors


Dr. Lingling Jin
Guest Editor
Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5C9, Canada
Interests: bioinformatics; natural computing; genome evolution; comparative genomics; mathematical genomics

Dr. Siavash Mirarab
Guest Editor
Department of Electrical and Computer Engineering, UC San Diego, La Jolla, CA 92093-0407, USA
Interests: reconstruction of species trees from gene trees (phylogenomics); large-scale multiple sequence alignment; HIV transmission network reconstruction; metagenomic analyses using phylogenetic approaches

Special Issue Information

Dear Colleagues, 

This Special Issue presents selected papers from the 2022 RECOMB satellite workshop on Comparative Genomics (https://recombcg2022.usask.ca/). 

The RECOMB-CG conference brings together researchers in bioinformatics and genomics to discuss cutting-edge research in comparative genomics, with an emphasis on computational approaches and analyses, as well as novel experimental results. Topics of interest include genome evolution; population genomics; genome rearrangements; genomic variation, diversity, and dynamics; phylogenomics; comparative tools for genome assembly; comparison of functional networks; gene identification or annotation; the evolution of cancer genomes; comparative epigenomics; paleogenomics; phylodynamics; metagenomics; and related areas.

Dr. Lingling Jin
Dr. Siavash Mirarab
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Biology is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • comparative genomics
  • evolution
  • genome rearrangement
  • algorithms
  • phylogeny
  • statistical genomics
  • ancestral reconstruction
  • gene families
  • reconciliation
  • whole-genome duplication
  • metagenomics
  • cancer genomics

Published Papers (4 papers)

Research

28 pages, 1213 KiB  
Article
A Novel Algorithm for Feature Selection Using Penalized Regression with Applications to Single-Cell RNA Sequencing Data
by Bhavithry Sen Puliparambil, Jabed H. Tomal and Yan Yan
Biology 2022, 11(10), 1495; https://doi.org/10.3390/biology11101495 - 12 Oct 2022
Cited by 1 | Viewed by 2296
Abstract
With the emergence of single-cell RNA sequencing (scRNA-seq) technology, scientists are able to examine gene expression at single-cell resolution. Analysis of scRNA-seq data has its own challenges, which stem from its high dimensionality. Machine learning offers methods for gene (feature) selection from high-dimensional scRNA-seq data. Although multiple machine learning methods appear suitable for feature selection, such as penalized regression, there is no rigorous comparison of their performance across data sets, each of which poses its own challenges. Therefore, in this paper, we analyzed and compared multiple penalized regression methods for scRNA-seq data. For the scRNA-seq data sets we analyzed, the results show that sparse group lasso (SGL) outperforms the other six methods (ridge, lasso, elastic net, drop lasso, group lasso, and big lasso) in terms of area under the receiver operating characteristic curve (AUC) and computation time. Building on these findings, we proposed a new algorithm for feature selection using penalized regression methods. The proposed algorithm works by selecting a small subset of genes and applying SGL to select the differentially expressed genes in scRNA-seq data. By using hierarchical clustering to group genes, the proposed method bypasses the need for domain-specific knowledge for gene grouping information. In addition, the proposed algorithm provided consistently better AUC for the data sets used.
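
The abstract names sparse group lasso (SGL) but does not describe how it is computed. As a rough illustration only, and not the authors' implementation, the sketch below shows the standard proximal step for the SGL penalty on a single group of coefficients: an element-wise soft threshold (the lasso part) followed by a group-wise shrinkage (the group lasso part). The function names `soft_threshold` and `sgl_prox` and the parameters `l1` and `l2` are hypothetical, and the groups are assumed to have been formed beforehand, for example by hierarchical clustering of genes.

```python
import math


def soft_threshold(x, t):
    """Shrink x toward zero by t; values within [-t, t] become exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sgl_prox(group, l1, l2):
    """Proximal operator of l1*||b||_1 + l2*||b||_2 for one coefficient group.

    Assumption: `group` holds the coefficients of genes assigned to the same
    cluster. Individual genes can be zeroed by the l1 step, and the whole
    group can be zeroed by the l2 step.
    """
    shrunk = [soft_threshold(b, l1) for b in group]
    norm = math.sqrt(sum(b * b for b in shrunk))
    if norm <= l2:
        # The entire group is dropped from the model.
        return [0.0] * len(group)
    scale = 1.0 - l2 / norm
    return [scale * b for b in shrunk]


if __name__ == "__main__":
    # One strong gene survives; the weak genes in its group are removed.
    print(sgl_prox([3.0, -0.5, 0.2], l1=1.0, l2=1.0))
    # A group with only weak signals is removed as a whole.
    print(sgl_prox([0.5, 0.4], l1=0.1, l2=1.0))
```

Genes whose coefficients remain nonzero after repeated steps of this kind would be the selected features; the two penalty weights control how aggressively single genes and whole groups are discarded.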

32 pages, 10116 KiB  
Article
Modulating Gene Expression within a Microbiome Based on Computational Models
by Liyam Chitayat Levi, Ido Rippin, Moran Ben Tulila, Rotem Galron and Tamir Tuller
Biology 2022, 11(9), 1301; https://doi.org/10.3390/biology11091301 - 31 Aug 2022
Viewed by 3075
Abstract
Recent research in the field of bioinformatics and molecular biology has revealed the immense complexity and uniqueness of microbiomes, while also showcasing the impact of the symbiosis between a microbiome and its host or environment. A core property influencing this process is horizontal gene transfer between members of the bacterial community, which is used to maintain genetic variation. The essential effect of this mechanism is the exposure of genetic information to a wide array of members of the community, creating an additional “layer” of information in the microbiome named the “plasmidome”. From an engineering perspective, genetic information introduced to an environment must be directed into chosen species that will be able to carry out the desired effect instead of competing with and inhibiting it. Moreover, this process of information transfer raises concerns for the biosafety of genetic engineering of microbiomes, as exposure of genetic information to unwanted hosts can have unprecedented ecological impacts. Current technologies are usually experimentally developed for a specific host/environment, and at best deal only with the transformation process itself, ignoring the impact of horizontal gene transfer and gene-microbiome interactions that occur over longer periods of time in uncontrolled environments. The goal of this research was to design new microbiome-specific versions of engineered genetic information, providing an additional layer of compatibility to existing engineering techniques. The engineering framework is entirely computational and is agnostic to the selected microbiome or gene, reducing the problem to the following setup: microbiome species are defined as wanted or unwanted hosts of the modification. Then, every element related to gene expression (e.g., promoters, coding regions, etc.) and regulation is individually examined and engineered by novel algorithms to provide the defined expression preferences. Additionally, the synergistic effect of the combination of engineered gene blocks facilitates robustness to random mutations that might occur over time. This method has been validated using both computational and experimental tools, stemming from the research done in the iGEM 2021 competition by the TAU group.

24 pages, 6083 KiB  
Article
Learning Hyperbolic Embedding for Phylogenetic Tree Placement and Updates
by Yueyu Jiang, Puoya Tabaghi and Siavash Mirarab
Biology 2022, 11(9), 1256; https://doi.org/10.3390/biology11091256 - 24 Aug 2022
Cited by 3 | Viewed by 2559
Abstract
Phylogenetic placement, used widely in ecological analyses, seeks to add a new species to an existing tree. A deep learning approach was previously proposed to estimate the distance between query and backbone species by building a map from gene sequences to a high-dimensional space that preserves species tree distances. That approach then uses a distance-based placement method to place the queries on that species tree. In this paper, we examine the appropriate geometry for faithfully representing tree distances while embedding gene sequences. Theory predicts that hyperbolic spaces should provide a drastic reduction in distance distortion compared to the conventional Euclidean space. Nevertheless, hyperbolic embedding imposes its own unique challenges related to arithmetic operations, exponentially growing functions, and limited bit precision, and we address these challenges. Our results confirm that hyperbolic embeddings have substantially lower distance errors than Euclidean space. However, these better-estimated distances do not always lead to better phylogenetic placement. We then show that the deep learning framework can be used not just to place on a backbone tree but to update it to obtain a fully resolved tree. With our hyperbolic embedding framework, species trees can be updated remarkably accurately with only a handful of genes.
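
To make the geometric contrast concrete, the following minimal sketch compares Euclidean distance with hyperbolic distance in the Poincaré ball model. The choice of the Poincaré ball is an assumption made here for illustration; the abstract does not say which model of hyperbolic space the paper uses, and the function names are hypothetical.

```python
import math


def squared_norm(x):
    """Sum of squared coordinates of a point."""
    return sum(a * a for a in x)


def euclidean_distance(u, v):
    """Ordinary straight-line distance between two points."""
    return math.sqrt(squared_norm([a - b for a, b in zip(u, v)]))


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball.

    Uses d(u, v) = arcosh(1 + 2*|u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))).
    The denominator approaches zero near the boundary of the ball, which is
    one place where limited floating-point precision becomes a concern.
    """
    diff = squared_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - squared_norm(u)) * (1.0 - squared_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)


if __name__ == "__main__":
    origin = [0.0, 0.0]
    for r in (0.5, 0.9, 0.99, 0.999):
        p = [r, 0.0]
        print(r, euclidean_distance(origin, p), poincare_distance(origin, p))
```

Points that are nearly the same Euclidean distance from the origin can be far apart in hyperbolic distance when they lie close to the boundary. This growth is what lets a bounded region hold tree-like distances, and it is also why the arithmetic is sensitive to precision.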

22 pages, 2386 KiB  
Article
Distance-Based Phylogenetic Placement with Statistical Support
by Navid Bin Hasan, Metin Balaban, Avijit Biswas, Md. Shamsuzzoha Bayzid and Siavash Mirarab
Biology 2022, 11(8), 1212; https://doi.org/10.3390/biology11081212 - 12 Aug 2022
Viewed by 1491
Abstract
Phylogenetic identification of unknown sequences by placing them on a tree is routinely attempted in modern ecological studies. Such placements are often obtained from incomplete and noisy data, making it essential to augment the results with some notion of uncertainty. While the standard likelihood-based methods designed for placement naturally provide such measures of uncertainty, the newer and more scalable distance-based methods lack this crucial feature. Here, we adopt several parametric and nonparametric sampling methods for measuring the support of phylogenetic placements that have been obtained with the use of distances. Comparing the alternative strategies, we conclude that nonparametric bootstrapping is more accurate than the alternatives. We go on to show how bootstrapping can be performed efficiently using a linear algebraic formulation that makes it up to 30 times faster, and we implement this optimized version as part of the distance-based placement software APPLES. By examining a wide range of applications, we show that the relative accuracy of maximum likelihood (ML) support values as compared to distance-based methods depends on the application and the dataset. ML is advantageous for fragmentary queries, while distance-based support values are more accurate for full-length and multi-gene datasets. With the quantification of uncertainty, our work fills a crucial gap that has prevented the broader adoption of distance-based placement tools.
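
The idea of nonparametric bootstrapping for distance-based support can be sketched as follows. This is a simplified illustration and not the APPLES algorithm or its linear algebraic speed-up: it resamples alignment columns with replacement, recomputes a plain mismatch fraction (p-distance), and uses "closest reference sequence" as a stand-in for placement on a tree. The function names, the toy sequences, and the nearest-reference rule are all assumptions made for the example.

```python
import random
from collections import Counter


def p_distance(a, b, cols):
    """Fraction of the sampled alignment columns at which a and b differ."""
    return sum(a[i] != b[i] for i in cols) / len(cols)


def bootstrap_support(query, refs, replicates=100, seed=0):
    """Estimate support for each reference being the closest to the query.

    query: aligned query sequence (string).
    refs: dict mapping reference name to an aligned sequence of equal length.
    Returns a dict mapping reference name to the fraction of replicates in
    which that reference had the smallest distance to the query.
    """
    rng = random.Random(seed)
    n = len(query)
    votes = Counter()
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(n) for _ in range(n)]
        # Ties are broken by reference name so results are reproducible.
        best = min(sorted(refs), key=lambda name: p_distance(query, refs[name], cols))
        votes[best] += 1
    return {name: votes[name] / replicates for name in refs}


if __name__ == "__main__":
    references = {"A": "ACGTACGTAC", "B": "ACGTTCGTAA", "C": "TTGTACCTAC"}
    print(bootstrap_support("ACGTACGTAA", references, replicates=200))
```

A reference that wins in nearly all replicates has high support, while support split across several references signals an uncertain placement.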