Bioinformatics and Computational Genomics: Selected Papers from International Symposium on Bioinformatics Research and Applications (ISBRA 2022)

A special issue of Biomolecules (ISSN 2218-273X). This special issue belongs to the section "Bioinformatics and Systems Biology".

Deadline for manuscript submissions: closed (15 May 2023) | Viewed by 4239

Special Issue Editor


Prof. Dr. Alex Zelikovsky
Guest Editor
Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
Interests: bioinformatics; computational genomics; computational epidemiology; analysis of high-throughput sequencing data; statistical inference; discrete algorithms

Special Issue Information

Dear Colleagues,

This Special Issue will include a selection of about 15 full papers accepted and presented at the 19th International Symposium on Bioinformatics Research and Applications (ISBRA 2022), held at the University of Haifa, Israel, on November 14-17, 2022. ISBRA provides a forum for the exchange of ideas and results among researchers, developers, and practitioners working on all aspects of bioinformatics and computational biology and their applications.

Prof. Dr. Alex Zelikovsky
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass the pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts are available on the Instructions for Authors page. Biomolecules is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • biomolecular imaging, molecular evolution, molecular modelling and simulation
  • computational genomics, proteomics
  • computational genetic epidemiology
  • metagenomics
  • next-generation sequencing data analysis
  • AI and machine learning methods in bioinformatics and medical information
  • big data analytics in biology and medicine

Published Papers (2 papers)

Research

14 pages, 6660 KiB  
Article
Graph Random Forest: A Graph Embedded Algorithm for Identifying Highly Connected Important Features
by Leqi Tian, Wenbin Wu and Tianwei Yu
Biomolecules 2023, 13(7), 1153; https://doi.org/10.3390/biom13071153 - 20 Jul 2023
Cited by 4 | Viewed by 1691
Abstract
Random Forest (RF) is a widely used machine learning method with good performance on classification and regression tasks. It works well with small sample sizes, which benefits applications in biology. For example, gene expression data often involve far more features (p) than samples (n). Although RF often achieves high predictive accuracy, selecting important genes with RF is problematic: the important genes it selects are usually scattered across the gene network, which conflicts with the biological assumption of functional consistency between effective features. To improve feature selection by incorporating external topological information between genes, we propose the Graph Random Forest (GRF), which identifies highly connected important features by involving the known biological network when constructing the forest. The algorithm can identify effective features that form highly connected sub-graphs while achieving classification accuracy equivalent to RF. To evaluate the proposed method, we conducted simulation experiments and applied it to two real datasets: non-small cell lung cancer RNA-seq data from The Cancer Genome Atlas and a human embryonic stem cell RNA-seq dataset (GSE93593). The high classification accuracy, the connectivity of the selected sub-graphs, and the interpretable feature selection results suggest that the method is a helpful addition to graph-based classification models and feature selection procedures.
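The abstract does not describe how GRF builds its trees, so the sketch below does not implement GRF. It only illustrates the criterion the abstract contrasts with plain RF: whether the selected genes are scattered on the gene network or form connected sub-graphs. As a minimal sketch, assuming the network is given as an undirected edge list, it finds the connected components of the subgraph induced by the selected genes and reports the fraction that falls in the largest one. The function names and the toy gene labels are hypothetical.

```python
from collections import defaultdict


def selected_subgraph_components(edges, selected):
    """Connected components of the network subgraph induced by `selected` (hypothetical helper)."""
    selected = set(selected)
    adj = defaultdict(set)
    # Keep only edges whose both endpoints are selected genes
    for u, v in edges:
        if u in selected and v in selected:
            adj[u].add(v)
            adj[v].add(u)
    seen, components = set(), []
    for node in selected:
        if node in seen:
            continue
        # Iterative depth-first search collects one component
        stack, comp = [node], set()
        seen.add(node)
        while stack:
            x = stack.pop()
            comp.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        components.append(comp)
    return sorted(components, key=len, reverse=True)


def largest_component_fraction(edges, selected):
    """Share of selected genes lying in the largest connected component (1.0 = fully connected)."""
    comps = selected_subgraph_components(edges, selected)
    return len(comps[0]) / len(set(selected)) if comps else 0.0
```

On a toy network with edges A-B, B-C and D-E, selecting {A, B, C, E} gives a fraction of 0.75, whereas selecting {A, C, E} leaves three isolated genes (fraction 1/3), the kind of scattered selection the abstract attributes to plain RF.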

21 pages, 2742 KiB  
Article
Assessing the Resilience of Machine Learning Classification Algorithms on SARS-CoV-2 Genome Sequences Generated with Long-Read Specific Errors
by Bikram Sahoo, Sarwan Ali, Pin-Yu Chen, Murray Patterson and Alexander Zelikovsky
Biomolecules 2023, 13(6), 934; https://doi.org/10.3390/biom13060934 - 2 Jun 2023
Viewed by 1727
Abstract
The emergence of third-generation single-molecule sequencing (TGS) technology has revolutionized the generation of long reads, which are essential for genome assembly and have been widely employed in sequencing the SARS-CoV-2 virus during the COVID-19 pandemic. Although long-read sequencing has been crucial in understanding the evolution and transmission of the virus, the high error rate associated with these reads can compromise genome assembly and downstream biological interpretation. In this study, we evaluate the accuracy and robustness of machine learning (ML) models using six different embedding techniques on SARS-CoV-2 error-incorporated genome sequences. Our analysis includes two types of error-incorporated genome sequences: those generated using simulation tools to emulate the error profiles of long-read sequencing platforms and those generated by introducing random errors. We show that the spaced k-mers embedding method achieves high accuracy in classifying error-free SARS-CoV-2 genome sequences, and that the spaced k-mers and weighted k-mers embedding methods are highly accurate in predicting error-incorporated sequences. The fixed-length vectors generated by these methods contribute to the high accuracy achieved. Our study provides valuable insights that help researchers evaluate ML models effectively and better understand approaches for accurately identifying critical SARS-CoV-2 genome sequences.
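The abstract attributes part of the accuracy to the fixed-length vectors produced by the spaced k-mers embedding. As a minimal sketch, assuming a spaced k-mer is read through a binary mask ('1' keeps a position, '0' skips it) and counted over the ACGT alphabet, the function below maps a sequence of any length to a vector of length 4**k. The function name and the default mask "11011" are hypothetical choices for illustration, not the parameters used in the study.

```python
from itertools import product

ALPHABET = "ACGT"


def spaced_kmer_vector(seq: str, mask: str = "11011") -> list[int]:
    """Count spaced k-mers of `seq` into a fixed-length vector of size 4**k.

    `mask` is a hypothetical gapped pattern: '1' positions are kept and
    '0' positions are skipped, so k is the number of '1' characters.
    """
    seq = seq.upper()
    keep = [i for i, c in enumerate(mask) if c == "1"]
    k = len(keep)
    # Every possible k-mer over ACGT gets a fixed slot, so the output
    # length depends only on k, never on the sequence length
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    vec = [0] * len(index)
    for start in range(len(seq) - len(mask) + 1):
        window = seq[start:start + len(mask)]
        kmer = "".join(window[i] for i in keep)
        # k-mers containing ambiguous bases such as 'N' are skipped
        if kmer in index:
            vec[index[kmer]] += 1
    return vec
```

For example, `spaced_kmer_vector("ACGTACGT")` returns a 256-dimensional vector with four non-zero entries (one per window), and a full-length genome maps to a vector of the same size. That fixed length is what lets a standard classifier consume genomes of different lengths.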