Computational Analysis of Biomedical Data

A special issue of Life (ISSN 2075-1729). This special issue belongs to the section "Biochemistry, Biophysics and Computational Biology".

Deadline for manuscript submissions: closed (30 September 2022) | Viewed by 17406

Special Issue Editor


Guest Editor: Dr. Piotr Wojtek Dabrowski
1. School of Computing, Communication and Business (Faculty 4), HTW Berlin—University of Applied Sciences, 12459 Berlin, Germany
2. Bioinformatics Division (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
Interests: NGS; metagenomics; pathogen detection; software development; pipelining; research software engineering

Special Issue Information

Dear Colleagues,

As technology progresses, the amount of biomedical data available from diverse sources – ranging from the application of high-throughput technologies in diagnostics, through the collection and aggregation of (meta)data by public health agencies, to the increasingly machine-usable massive body of scientific publications – is rapidly increasing. However, using this information to actually improve people's lives still lags far behind its potential.

This Special Issue aims to collect high-quality research addressing this gap. We welcome submissions presenting approaches that generate added value from the analysis of biomedical data. Such approaches include, but are not limited to:

  • Novel algorithms or combinations of algorithms for biomedical data analysis
  • Tools for the analysis of biomedical data that, thanks to the application of good software engineering practices, offer sufficient robustness and reproducibility to be applied in clinical settings
  • Technical and organisational facilitators of cross-domain use and analysis of biomedical data

Dr. Piotr Wojtek Dabrowski
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Life is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • bioinformatics
  • omics data analysis
  • biomedical image processing
  • databases
  • data integration
  • software engineering

Published Papers (5 papers)


Research

17 pages, 3794 KiB  
Article
PathoLive—Real-Time Pathogen Identification from Metagenomic Illumina Datasets
by Simon H. Tausch, Tobias P. Loka, Jakob M. Schulze, Andreas Andrusch, Jeanette Klenner, Piotr Wojciech Dabrowski, Martin S. Lindner, Andreas Nitsche and Bernhard Y. Renard
Life 2022, 12(9), 1345; https://doi.org/10.3390/life12091345 - 30 Aug 2022
Cited by 4 | Viewed by 2069
Abstract
Over the past years, NGS has become a crucial workhorse for open-view pathogen diagnostics. Yet, long turnaround times result from using massively parallel high-throughput technologies as the analysis can only be performed after sequencing has finished. The interpretation of results can further be challenged by contaminations, clinically irrelevant sequences, and the sheer amount and complexity of the data. We implemented PathoLive, a real-time diagnostics pipeline for the detection of pathogens from clinical samples hours before sequencing has finished. Based on real-time alignment with HiLive2, mappings are scored with respect to common contaminations, low-entropy areas, and sequences of widespread, non-pathogenic organisms. The results are visualized using an interactive taxonomic tree that provides an easily interpretable overview of the relevance of hits. For a human plasma sample that was spiked in vitro with six pathogenic viruses, all agents were clearly detected after only 40 of 200 sequencing cycles. For a real-world sample from Sudan, the results correctly indicated the presence of Crimean-Congo hemorrhagic fever virus. In a second real-world dataset from the 2019 SARS-CoV-2 outbreak in Wuhan, we found the presence of a SARS coronavirus as the most relevant hit without the novel virus reference genome being included in the database. For all samples, clinically irrelevant hits were correctly de-emphasized. Our approach is valuable to obtain fast and accurate NGS-based pathogen identifications and correctly prioritize and visualize them based on their clinical significance. PathoLive is open source and available on GitLab and BioConda.
(This article belongs to the Special Issue Computational Analysis of Biomedical Data)
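The relevance-scoring idea summarized in the abstract can be pictured with a few lines of code. The sketch below is not PathoLive itself (the real pipeline builds on real-time alignment with HiLive2 and uses its own scoring model and data formats); it only illustrates, with hypothetical penalty factors and input structures, how alignment hits might be down-weighted by categories such as common contaminants or low-entropy regions before being aggregated per taxon.

```python
# Illustrative sketch only; all names, penalty values, and input formats are hypothetical.
from collections import defaultdict

# Hypothetical penalty factors per category of reference sequence.
PENALTIES = {
    "common_contaminant": 0.1,
    "low_entropy_region": 0.2,
    "widespread_nonpathogenic": 0.3,
}

def relevance_score(alignment_score: float, categories: set[str]) -> float:
    """Scale a raw alignment score by penalties for clinically less relevant hits."""
    score = alignment_score
    for cat in categories:
        score *= PENALTIES.get(cat, 1.0)
    return score

def aggregate_by_taxon(alignments):
    """alignments: iterable of (taxon, alignment_score, categories) tuples."""
    totals = defaultdict(float)
    for taxon, aln_score, cats in alignments:
        totals[taxon] += relevance_score(aln_score, cats)
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

if __name__ == "__main__":
    demo = [
        ("Crimean-Congo hemorrhagic fever virus", 60.0, set()),
        ("Homo sapiens", 60.0, {"common_contaminant"}),
        ("Escherichia coli", 40.0, {"widespread_nonpathogenic"}),
    ]
    for taxon, score in aggregate_by_taxon(demo).items():
        print(f"{taxon}: {score:.1f}")
```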

16 pages, 1991 KiB  
Article
A Reproducible Deep-Learning-Based Computer-Aided Diagnosis Tool for Frontotemporal Dementia Using MONAI and Clinica Frameworks
by Andrea Termine, Carlo Fabrizio, Carlo Caltagirone, Laura Petrosini and on behalf of the Frontotemporal Lobar Degeneration Neuroimaging Initiative
Life 2022, 12(7), 947; https://doi.org/10.3390/life12070947 - 23 Jun 2022
Cited by 9 | Viewed by 3111
Abstract
Despite Artificial Intelligence (AI) being a leading technology in biomedical research, real-life implementation of AI-based Computer-Aided Diagnosis (CAD) tools into the clinical setting is still remote due to unstandardized practices during development. However, few or no attempts have been made to propose a reproducible CAD development workflow for 3D MRI data. In this paper, we present the development of an easily reproducible and reliable CAD tool using the Clinica and MONAI frameworks that were developed to introduce standardized practices in medical imaging. A Deep Learning (DL) algorithm was trained to detect frontotemporal dementia (FTD) on data from the NIFD database to ensure reproducibility. The DL model yielded 0.80 accuracy (95% confidence intervals: 0.64, 0.91), 1 sensitivity, 0.6 specificity, 0.83 F1-score, and 0.86 AUC, achieving a comparable performance with other FTD classification approaches. Explainable AI methods were applied to understand AI behavior and to identify regions of the images where the DL model misbehaves. Attention maps highlighted that its decision was driven by hallmarking brain areas for FTD and helped us to understand how to improve FTD detection. The proposed standardized methodology could be useful for benchmark comparison in FTD classification. AI-based CAD tools should be developed with the goal of standardizing pipelines, as varying pre-processing and training methods, along with the absence of model behavior explanations, negatively impact regulators’ attitudes towards CAD. The adoption of common best practices for neuroimaging data analysis is a step toward fast evaluation of efficacy and safety of CAD and may accelerate the adoption of AI products in the healthcare system.
(This article belongs to the Special Issue Computational Analysis of Biomedical Data)
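As a rough illustration of the kind of standardized MONAI-based setup the abstract describes, the sketch below wires MONAI transforms, a 3D classifier, and a short training loop together. The file names, labels, network choice (a 3D DenseNet from MONAI), and hyperparameters are placeholders, not the authors' published pipeline; in their workflow the imaging data are additionally prepared with Clinica.

```python
# A minimal sketch, assuming preprocessed T1-weighted NIfTI volumes and binary labels.
import torch
from monai.data import Dataset, DataLoader
from monai.networks.nets import DenseNet121
from monai.transforms import Compose, LoadImaged, EnsureChannelFirstd, ScaleIntensityd, Resized

train_files = [  # hypothetical items: one preprocessed T1 image per subject
    {"img": "sub-01_T1w.nii.gz", "label": 1},  # 1 = FTD
    {"img": "sub-02_T1w.nii.gz", "label": 0},  # 0 = control
]
transforms = Compose([
    LoadImaged(keys="img"),
    EnsureChannelFirstd(keys="img"),
    ScaleIntensityd(keys="img"),
    Resized(keys="img", spatial_size=(96, 96, 96)),
])
loader = DataLoader(Dataset(train_files, transforms), batch_size=2, shuffle=True)

model = DenseNet121(spatial_dims=3, in_channels=1, out_channels=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

model.train()
for batch in loader:
    optimizer.zero_grad()
    logits = model(batch["img"])          # (batch, 2) class scores
    loss = loss_fn(logits, batch["label"])
    loss.backward()
    optimizer.step()
```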

25 pages, 8766 KiB  
Article
Recommendations of scRNA-seq Differential Gene Expression Analysis Based on Comprehensive Benchmarking
by Jake Gagnon, Lira Pi, Matthew Ryals, Qingwen Wan, Wenxing Hu, Zhengyu Ouyang, Baohong Zhang and Kejie Li
Life 2022, 12(6), 850; https://doi.org/10.3390/life12060850 - 7 Jun 2022
Cited by 5 | Viewed by 5614
Abstract
To guide analysts to select the right tool and parameters in differential gene expression analyses of single-cell RNA sequencing (scRNA-seq) data, we developed a novel simulator that recapitulates the data characteristics of real scRNA-seq datasets while accounting for all the relevant sources of variation in a multi-subject, multi-condition scRNA-seq experiment: the cell-to-cell variation within a subject, the variation across subjects, the variability across cell types, the mean/variance relationship of gene expression across genes, library size effects, group effects, and covariate effects. By applying it to benchmark 12 differential gene expression analysis methods (including cell-level and pseudo-bulk methods) on simulated multi-condition, multi-subject data of the 10x Genomics platform, we demonstrated that methods originating from the negative binomial mixed model such as glmmTMB and NEBULA-HL outperformed other methods. Utilizing NEBULA-HL in a statistical analysis pipeline for single-cell analysis will enable scientists to better understand the cell-type-specific transcriptomic response to disease or treatment effects and to discover new drug targets. Further, application to two real datasets showed the outperformance of our differential expression (DE) pipeline, with unified findings of differentially expressed genes (DEG) and a pseudo-time trajectory transcriptomic result. In the end, we made recommendations for filtering strategies of cells and genes based on simulation results to achieve optimal experimental goals.
(This article belongs to the Special Issue Computational Analysis of Biomedical Data)
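To make the listed variance components concrete, the following sketch (not the authors' simulator) draws negative-binomial counts via a gamma-Poisson mixture for a multi-subject, two-condition design, including subject-level random effects, a gene-wise mean/dispersion relationship, and per-cell library-size factors. All parameter values are arbitrary.

```python
# Toy multi-subject, two-condition scRNA-seq count simulator; values are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_subjects, cells_per_subject = 200, 8, 50

base_mean = rng.lognormal(mean=1.0, sigma=1.0, size=n_genes)       # gene baseline expression
dispersion = 0.1 + 1.0 / np.sqrt(base_mean)                         # mean/dispersion relationship
condition = np.repeat([0, 1], n_subjects // 2)                      # group assignment per subject
log_fc = np.zeros(n_genes)
log_fc[:20] = rng.normal(0.7, 0.2, size=20)                         # 20 truly DE genes
subject_effect = rng.normal(0.0, 0.3, size=(n_subjects, n_genes))   # between-subject variation

counts, labels = [], []
for s in range(n_subjects):
    mu_gene = base_mean * np.exp(subject_effect[s] + condition[s] * log_fc)
    lib = rng.lognormal(0.0, 0.3, size=cells_per_subject)            # per-cell library-size factor
    mu = np.outer(lib, mu_gene)                                      # cells x genes expected counts
    r = 1.0 / dispersion                                             # NB via gamma-Poisson mixture
    lam = rng.gamma(shape=r, scale=mu / r)
    counts.append(rng.poisson(lam))
    labels.extend([condition[s]] * cells_per_subject)

counts = np.vstack(counts)  # (cells, genes) matrix ready for benchmarking a DE method
print(counts.shape, sum(labels), "cells in condition 1")
```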

18 pages, 2004 KiB  
Article
Structured, Harmonized, and Interoperable Integration of Clinical Routine Data to Compute Heart Failure Risk Scores
by Kim K. Sommer, Ali Amr, Udo Bavendiek, Felix Beierle, Peter Brunecker, Henning Dathe, Jürgen Eils, Maximilian Ertl, Georg Fette, Matthias Gietzelt, Bettina Heidecker, Kristian Hellenkamp, Peter Heuschmann, Jennifer D. E. Hoos, Tibor Kesztyüs, Fabian Kerwagen, Aljoscha Kindermann, Dagmar Krefting, Ulf Landmesser, Michael Marschollek, Benjamin Meder, Angela Merzweiler, Fabian Prasser, Rüdiger Pryss, Jendrik Richter, Philipp Schneider, Stefan Störk and Christoph Dieterich
Life 2022, 12(5), 749; https://doi.org/10.3390/life12050749 - 18 May 2022
Cited by 1 | Viewed by 2820
Abstract
Risk prediction in patients with heart failure (HF) is essential to improve the tailoring of preventive, diagnostic, and therapeutic strategies for the individual patient, and effectively use health care resources. Risk scores derived from controlled clinical studies can be used to calculate the risk of mortality and HF hospitalizations. However, these scores are poorly implemented into routine care, predominantly because their calculation requires considerable efforts in practice and necessary data often are not available in an interoperable format. In this work, we demonstrate the feasibility of a multi-site solution to derive and calculate two exemplary HF scores from clinical routine data (MAGGIC score with six continuous and eight categorical variables; Barcelona Bio-HF score with five continuous and six categorical variables). Within HiGHmed, a German Medical Informatics Initiative consortium, we implemented an interoperable solution, collecting a harmonized HF-phenotypic core data set (CDS) within the openEHR framework. Our approach minimizes the need for manual data entry by automatically retrieving data from primary systems. We show, across five participating medical centers, that the implemented structures to execute dedicated data queries, followed by harmonized data processing and score calculation, work well in practice. In summary, we demonstrated the feasibility of clinical routine data usage across multiple partner sites to compute HF risk scores. This solution can be extended to a large spectrum of applications in clinical care.
(This article belongs to the Special Issue Computational Analysis of Biomedical Data)
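The score-calculation step at the end of such a pipeline can be pictured as a small function over a harmonized record. In the sketch below the variable names resemble typical MAGGIC-style inputs, but the point values are placeholders, not the published MAGGIC or Barcelona Bio-HF coefficients, and in the real setup the record would be retrieved from the openEHR-based core data set rather than constructed by hand.

```python
# Minimal sketch of an additive, points-based HF risk score; all point values are hypothetical.
from dataclasses import dataclass

@dataclass
class HFRecord:           # harmonized core-data-set record (illustrative subset)
    age: int
    ejection_fraction: float
    sbp: int              # systolic blood pressure, mmHg
    nyha_class: int       # NYHA functional class, 1-4
    diabetes: bool
    copd: bool

def age_points(age: int) -> int:          # placeholder banding
    return max(0, (age - 55) // 10) * 2

def ef_points(ef: float) -> int:          # placeholder banding
    return 6 if ef < 30 else 3 if ef < 40 else 0

def risk_points(r: HFRecord) -> int:
    points = age_points(r.age) + ef_points(r.ejection_fraction)
    points += 2 if r.sbp < 110 else 0
    points += {1: 0, 2: 2, 3: 4, 4: 6}[r.nyha_class]
    points += 3 if r.diabetes else 0
    points += 2 if r.copd else 0
    return points

print(risk_points(HFRecord(age=72, ejection_fraction=32.0, sbp=105,
                           nyha_class=3, diabetes=True, copd=False)))
```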

22 pages, 1715 KiB  
Article
Systems-Based Approach for Optimization of Assembly-Free Bacterial MLST Mapping
by Natasha Pavlovikj, Joao Carlos Gomes-Neto, Jitender S. Deogun and Andrew K. Benson
Life 2022, 12(5), 670; https://doi.org/10.3390/life12050670 - 30 Apr 2022
Viewed by 2555
Abstract
Epidemiological surveillance of bacterial pathogens requires real-time data analysis with a fast turnaround, while aiming at generating two main outcomes: (1) species-level identification and (2) variant mapping at different levels of genotypic resolution for population-based tracking and surveillance, in addition to predicting traits such as antimicrobial resistance (AMR). Multi-locus sequence typing (MLST) aids this process by identifying sequence types (ST) based on seven ubiquitous genome-scattered loci. In this paper, we selected one assembly-dependent and one assembly-free method for ST mapping and applied them with the default settings and ST schemes they are distributed with, and systematically assessed their accuracy and scalability across a wide array of phylogenetically divergent Public Health-relevant bacterial pathogens with available MLST databases. Our data show that the optimal k-mer length for stringMLST is species-specific and that genome-intrinsic and -extrinsic features can affect the performance and accuracy of the program. Although suitable parameters could be identified for most organisms, there were instances where this program may not be directly deployable in its current format. Next, we integrated stringMLST into our freely available and scalable hierarchical-based population genomics platform, ProkEvo, and further demonstrated how the implementation facilitates automated, reproducible bacterial population analysis.
(This article belongs to the Special Issue Computational Analysis of Biomedical Data)
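The assembly-free principle behind k-mer-based MLST typing can be summarized in a few lines. The sketch below is not stringMLST; it merely illustrates the idea of matching read k-mers against per-locus allele sequences and looking the resulting allele combination up in an ST profile table, with hypothetical data structures and an arbitrary k (as the abstract notes, the optimal k is species-specific).

```python
# Toy assembly-free ST caller: read k-mers vs. allele k-mers per locus, then profile lookup.
def kmers(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def call_st(reads, allele_db, st_profiles, k=35):
    """allele_db: {locus: {allele_id: sequence}}; st_profiles: {tuple(allele_ids): ST}."""
    read_kmers = set()
    for r in reads:
        read_kmers |= kmers(r, k)

    profile = []
    for locus in sorted(allele_db):
        # Pick the allele sharing the most k-mers with the read set.
        best = max(allele_db[locus],
                   key=lambda a: len(kmers(allele_db[locus][a], k) & read_kmers))
        profile.append(best)
    return st_profiles.get(tuple(profile), "unknown ST"), profile
```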
