Topic Editors

Prof. Dr. Zhiping Liu, Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
Prof. Dr. Han Zhang, College of Artificial Intelligence, Nankai University, Tianjin 300350, China
Prof. Dr. Junwei Han, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150086, China

Bioinformatics and Intelligent Information Processing

Abstract submission deadline: closed (31 July 2023)
Manuscript submission deadline: closed (30 November 2023)
Viewed by 16382

Topic Information

Dear Colleagues,

The 2023 Bioinformatics and Intelligent Information Processing Conference (BIIP2023), the annual conference of the Bioinformatics and Artificial Life Committee of the Chinese Association for Artificial Intelligence (CAAI), will be held in Jinan City, Shandong Province, China, from June 18th to June 20th, 2023. The conference is organized by the CAAI and hosted by the Bioinformatics and Artificial Life Committee of CAAI and the School of Control Science and Engineering of Shandong University. Given the current breakthroughs in large language models, it is of great significance and value to discuss how new AI technologies can be used to promote biomedical research, and BIIP2023 aims to provide a platform for scientists in related fields to do so. The conference will invite many distinguished experts and scholars in AI, life science, and medical science to give talks and run tutorials. Sessions will also be set up for talks on the latest research progress and trends in topics of interest. This topic collection will present novel and advanced interdisciplinary research achievements in bioinformatics and intelligent information processing. We warmly welcome scholars in related fields to submit their work to the journals participating in this topic collection. Topics include, but are not limited to, the following areas:

S1: Self-organization phenomena and mechanisms in natural and human-made systems

S2: Bioanalysis and intelligent processing algorithms

S3: Biological multi-omics data analysis

S4: Biological networks and systems biology

S5: Intelligent drug design

S6: Precision medicine and big data

S7: Biological and health big data analytics

S8: Biomedical image analysis

S9: Digital diagnosis and smart health

S10: Bioinformatics foundations of the brain and brain-like intelligence

S11: Artificial life systems and synthetic biology

S12: Artificial life and artificial intelligence

S13: Digital-based life and intelligent health

S14: Intelligent computing for digital-based life

S15: Other related fields

Prof. Dr. Zhiping Liu
Prof. Dr. Han Zhang
Prof. Dr. Junwei Han
Topic Editors

Keywords

  • bioinformatics
  • artificial intelligence
  • intelligent information processing
  • artificial life
  • models and algorithms
  • systems and simulators
  • systems biology
  • biomedical big data
  • large language models

Participating Journals

Journal Name                                  Impact Factor  CiteScore  Launched Year  First Decision (median)  APC
AI                                            3.1            7.2        2020           17.6 Days                CHF 1600
Entropy                                       2.1            4.9        1999           22.4 Days                CHF 2600
Genes                                         2.8            5.2        2010           16.3 Days                CHF 2600
International Journal of Molecular Sciences   4.9            8.1        2000           18.1 Days                CHF 2900
Machine Learning and Knowledge Extraction     4.0            6.3        2019           27.1 Days                CHF 1800

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics cooperates with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to post a preprint at Preprints.org prior to publication in order to:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea from being stolen with this time-stamped preprint article;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (7 papers)

15 pages, 911 KiB  
Article
A New and Lightweight R-Peak Detector Using the TEDA Evolving Algorithm
by Lucileide M. D. da Silva, Sérgio N. Silva, Luísa C. de Souza, Karolayne S. de Azevedo, Luiz Affonso Guedes and Marcelo A. C. Fernandes
Mach. Learn. Knowl. Extr. 2024, 6(2), 736-750; https://doi.org/10.3390/make6020034 - 29 Mar 2024
Viewed by 1659
Abstract
The literature on ECG delineation algorithms has seen significant growth in recent decades. However, several challenges still need to be addressed. This work aims to propose a lightweight R-peak-detection algorithm that does not require pre-setting and performs classification on a sample-by-sample basis. The novelty of the proposed approach lies in the utilization of the typicality eccentricity detection anomaly (TEDA) algorithm for R-peak detection. The proposed method for R-peak detection consists of three phases. Firstly, the ECG signal is preprocessed by calculating the signal’s slope and applying filtering techniques. Next, the preprocessed signal is inputted into the TEDA algorithm for R-peak estimation. Finally, in the third and last step, the R-peak identification is carried out. To evaluate the effectiveness of the proposed technique, experiments were conducted on the MIT-BIH arrhythmia database (MIT-AD) for R-peak detection and validation. The results of the study demonstrated that the proposed evolutive algorithm achieved a sensitivity (Se in %), positive predictivity (+P in %), and accuracy (ACC in %) of 95.45%, 99.61%, and 95.09%, respectively, with a tolerance (TOL) of 100 milliseconds. One key advantage of the proposed technique is its low computational complexity, as it is based on a statistical framework calculated recursively. It employs the concepts of typicity and eccentricity to determine whether a given sample is normal or abnormal within the dataset. Unlike most traditional methods, it does not require signal buffering or windowing. Furthermore, the proposed technique employs simple decision rules rather than heuristic approaches, further contributing to its computational efficiency.
(This article belongs to the Topic Bioinformatics and Intelligent Information Processing)
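
To make the recursive, buffer-free idea in the abstract above concrete, here is a minimal sketch of a sample-by-sample eccentricity test in the style of the general TEDA formulation (recursive mean and variance, normalized eccentricity compared against a Chebyshev-like threshold). It is an illustration only, not the authors' R-peak detector: the class name `TedaDetector`, the sensitivity parameter `m = 3`, and the scalar input are assumptions, and the paper's slope and filtering preprocessing and its final R-peak identification step are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class TedaDetector:
    """Hypothetical scalar TEDA-style outlier test (illustrative sketch)."""

    m: float = 3.0    # assumed sensitivity; threshold is (m^2 + 1) / (2k)
    k: int = 0        # number of samples seen so far
    mean: float = 0.0
    var: float = 0.0

    def update(self, x: float) -> bool:
        """Consume one sample and return True if it looks eccentric."""
        self.k += 1
        k = self.k
        if k == 1:
            # The first sample only initializes the statistics.
            self.mean = x
            self.var = 0.0
            return False
        # Recursive mean and variance: no buffering or windowing needed.
        self.mean = (k - 1) / k * self.mean + x / k
        dist2 = (x - self.mean) ** 2
        self.var = (k - 1) / k * self.var + dist2 / (k - 1)
        if self.var == 0.0:
            # All samples identical so far; nothing can be eccentric.
            return False
        eccentricity = 1.0 / k + dist2 / (k * self.var)
        normalized = eccentricity / 2.0
        return normalized > (self.m ** 2 + 1.0) / (2.0 * k)


if __name__ == "__main__":
    detector = TedaDetector()
    baseline = [0.0, 0.1] * 25          # synthetic, low-amplitude samples
    flags = [detector.update(x) for x in baseline]
    print(any(flags))                   # baseline samples
    print(detector.update(5.0))         # a large synthetic spike
```

Each update costs a constant number of arithmetic operations and stores only three numbers, which is the property the abstract points to when it describes the method as lightweight.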

17 pages, 4207 KiB  
Article
A Comprehensive Evaluation of Generalizability of Deep Learning-Based Hi-C Resolution Improvement Methods
by Ghulam Murtaza, Atishay Jain, Madeline Hughes, Justin Wagner and Ritambhara Singh
Genes 2024, 15(1), 54; https://doi.org/10.3390/genes15010054 - 29 Dec 2023
Viewed by 1350
Abstract
Hi-C is a widely used technique to study the 3D organization of the genome. Due to its high sequencing cost, most of the generated datasets are of a coarse resolution, which makes it impractical to study finer chromatin features such as Topologically Associating Domains (TADs) and chromatin loops. Multiple deep learning-based methods have recently been proposed to increase the resolution of these datasets by imputing Hi-C reads (typically called upscaling). However, the existing works evaluate these methods on either synthetically downsampled datasets, or a small subset of experimentally generated sparse Hi-C datasets, making it hard to establish their generalizability in the real-world use case. We present our framework, Hi-CY, which compares existing Hi-C resolution upscaling methods on seven experimentally generated low-resolution Hi-C datasets belonging to various levels of read sparsities originating from three cell lines on a comprehensive set of evaluation metrics. Hi-CY also includes four downstream analysis tasks, such as TAD and chromatin loops recall, to provide a thorough report on the generalizability of these methods. We observe that existing deep learning methods fail to generalize to experimentally generated sparse Hi-C datasets, showing a performance reduction of up to 57%. As a potential solution, we find that retraining deep learning-based methods with experimentally generated Hi-C datasets improves performance by up to 31%. More importantly, Hi-CY shows that even with retraining, the existing deep learning-based methods struggle to recover biological features such as chromatin loops and TADs when provided with sparse Hi-C datasets. Our study, through the Hi-CY framework, highlights the need for rigorous evaluation in the future. We identify specific avenues for improvements in the current deep learning-based Hi-C upscaling methods, including but not limited to using experimentally generated datasets for training.

12 pages, 2481 KiB  
Article
Lambda CI Binding to Related Phage Operator Sequences Validates Alignment Algorithm and Highlights the Importance of Overlooked Bonds
by Jacklin Sedhom and Lee A. Solomon
Genes 2023, 14(12), 2221; https://doi.org/10.3390/genes14122221 - 15 Dec 2023
Cited by 1 | Viewed by 1311
Abstract
Bacteriophage λ’s CI repressor protein controls a genetic switch between the virus’s lysogenic and lytic lifecycles, in part, by selectively binding to six different DNA sequences within the phage genome, collectively referred to as operator sites. However, the minimal level of information needed for CI to recognize and specifically bind these six unique-but-related sequences is unclear. In a previous study, we introduced an algorithm that extracts the minimal direct readout information needed for λ-CI to recognize and bind its six binding sites. We further revealed direct readout information shared among three evolutionarily related lambdoid phages: λ-phage, Enterobacteria phage VT2-Sakai, and Stx2 converting phage I, suggesting that the λ-CI protein could bind to the operator sites of these other phages. In this study, we show that λ-CI can indeed bind the other two phages’ cognate binding sites as predicted using our algorithm, validating the hypotheses from that paper. We go on to demonstrate the importance of specific hydrogen bond donors and acceptors that are maintained despite changes to the nucleobase itself, and another that has an important role in recognition and binding. This in vitro validation of our algorithm supports its use as a tool to predict alternative binding sites for DNA-binding proteins.

26 pages, 16374 KiB  
Article
Statistical Analysis of Imbalanced Classification with Training Size Variation and Subsampling on Datasets of Research Papers in Biomedical Literature
by Jose Dixon and Md Rahman
Mach. Learn. Knowl. Extr. 2023, 5(4), 1953-1978; https://doi.org/10.3390/make5040095 - 11 Dec 2023
Viewed by 2226
Abstract
The overall purpose of this paper is to demonstrate how data preprocessing, training size variation, and subsampling can dynamically change the performance metrics of imbalanced text classification. The methodology encompasses using two different supervised learning classification approaches of feature engineering and data preprocessing with the use of five machine learning classifiers, five imbalanced sampling techniques, specified intervals of training and subsampling sizes, statistical analysis using R and tidyverse on a dataset of 1000 portable document format files divided into five labels from the World Health Organization Coronavirus Research Downloadable Articles of COVID-19 papers and PubMed Central databases of non-COVID-19 papers for binary classification that affects the performance metrics of precision, recall, receiver operating characteristic area under the curve, and accuracy. One approach that involves labeling rows of sentences based on regular expressions significantly improved the performance of imbalanced sampling techniques verified by performing statistical analysis using a t-test documenting performance metrics of iterations versus another approach that automatically labels the sentences based on how the documents are organized into positive and negative classes. The study demonstrates the effectiveness of ML classifiers and sampling techniques in text classification datasets, with different performance levels and class imbalance issues observed in manual and automatic methods of data processing.

18 pages, 2565 KiB  
Article
Enhancing Electrocardiogram (ECG) Analysis of Implantable Cardiac Monitor Data: An Efficient Pipeline for Multi-Label Classification
by Amnon Bleich, Antje Linnemann, Benjamin Jaidi, Björn H. Diem and Tim O. F. Conrad
Mach. Learn. Knowl. Extr. 2023, 5(4), 1539-1556; https://doi.org/10.3390/make5040077 - 21 Oct 2023
Viewed by 2069
Abstract
Implantable Cardiac Monitor (ICM) devices are demonstrating, as of today, the fastest-growing market for implantable cardiac devices. As such, they are becoming increasingly common in patients for measuring heart electrical activity. ICMs constantly monitor and record a patient’s heart rhythm, and when triggered, send it to a secure server where health care professionals (HCPs) can review it. These devices employ a relatively simplistic rule-based algorithm (due to energy consumption constraints) to make alerts for abnormal heart rhythms. This algorithm is usually parameterized to an over-sensitive mode in order to not miss a case (resulting in a relatively high false-positive rate), and this, combined with the device’s nature of constantly monitoring the heart rhythm and its growing popularity, results in HCPs having to analyze and diagnose an increasingly growing number of data. In order to reduce the load on the latter, automated methods for ECG analysis are nowadays becoming a great tool to assist HCPs in their analysis. While state-of-the-art algorithms are data-driven rather than rule-based, training data for ICMs often consist of specific characteristics that make their analysis unique and particularly challenging. This study presents the challenges and solutions in automatically analyzing ICM data and introduces a method for its classification that outperforms existing methods on such data. It carries this out by combining high-frequency noise detection (which often occurs in ICM data) with a semi-supervised learning pipeline that allows for the re-labeling of training episodes and by using segmentation and dimension-reduction techniques that are robust to morphology variations of the sECG signal (which are typical to ICM data). As a result, it performs better than state-of-the-art techniques on such data with, e.g., an F1 score of 0.51 vs. 0.38 of our baseline state-of-the-art technique in correctly calling atrial fibrillation in ICM data. As such, it could be used in numerous ways, such as aiding HCPs in the analysis of ECGs originating from ICMs by, e.g., suggesting a rhythm type.

14 pages, 2920 KiB  
Article
A Comprehensive Self-Resistance Gene Database for Natural-Product Discovery with an Application to Marine Bacterial Genome Mining
by Hua Dong and Dengming Ming
Int. J. Mol. Sci. 2023, 24(15), 12446; https://doi.org/10.3390/ijms241512446 - 4 Aug 2023
Viewed by 1256
Abstract
In the world of microorganisms, the biosynthesis of natural products in secondary metabolism and the self-resistance of the host always occur together and complement each other. Identifying resistance genes from biosynthetic gene clusters (BGCs) helps us understand the self-defense mechanism and predict the biological activity of natural products synthesized by microorganisms. However, a comprehensive database of resistance genes is still lacking, which hinders natural product annotation studies in large-scale genome mining. In this study, we compiled a resistance gene database (RGDB) by scanning the four available databases: CARD, MIBiG, NCBIAMR, and UniProt. Every resistance gene in the database was annotated with resistance mechanisms and possibly involved chemical compounds, using manual annotation and transformation from the resource databases. The RGDB was applied to analyze resistance genes in 7432 BGCs in 1390 genomes from a marine microbiome project. Our calculation showed that the RGDB successfully identified resistance genes for more than half of the BGCs, suggesting that the database helps prioritize BGCs that produce biologically active natural products.

16 pages, 1733 KiB  
Article
Search for Dispersed Repeats in Bacterial Genomes Using an Iterative Procedure
by Eugene Korotkov, Yulia Suvorova, Dimitry Kostenko and Maria Korotkova
Int. J. Mol. Sci. 2023, 24(13), 10964; https://doi.org/10.3390/ijms241310964 - 30 Jun 2023
Cited by 2 | Viewed by 2177
Abstract
We have developed a de novo method for the identification of dispersed repeats based on the use of random position-weight matrices (PWMs) and an iterative procedure (IP). The created algorithm (IP method) allows detection of dispersed repeats for which the average number of substitutions between any two repeats per nucleotide (x) is less than or equal to 1.5. We have shown that all previously developed methods and algorithms (RED, RECON, and some others) can only find dispersed repeats for x ≤ 1.0. We applied the IP method to find dispersed repeats in the genomes of E. coli and nine other bacterial species. We identify three families of approximately 1.09 × 10^6, 0.64 × 10^6, and 0.58 × 10^6 DNA bases, respectively, constituting almost 50% of the complete E. coli genome. The length of the repeats is in the range of 400 to 600 bp. Other analyzed bacterial genomes contain one to three families of dispersed repeats with a total number of 10^3 to 6 × 10^3 copies. The existence of such highly divergent repeats could be associated with the presence of a single-type triplet periodicity in various genes or with the packing of bacterial DNA into a nucleoid.