Data Mining and Bioinformatic Tools for Health

A special issue of International Journal of Molecular Sciences (ISSN 1422-0067). This special issue belongs to the section "Molecular Informatics".

Deadline for manuscript submissions: closed (24 October 2022) | Viewed by 23239

Special Issue Editors


Guest Editor
Department of Chemistry and Biology “A. Zambelli”, Università di Salerno, Salerno, Italy
Interests: biochemistry; protein structure and function; bioinformatics; protein modelling; molecular docking; molecular dynamics simulations; rare diseases

Special Issue Information

Dear Colleagues,

In the last several years, the quantity of molecular information that can be associated with clinical data has increased considerably, thanks to approaches and resources that allow an “omics” view of diseases. Bioinformatics is therefore considered essential for managing this huge amount of data, with a view to better diagnosis and treatment of rare as well as complex diseases. Indeed, bioinformatics approaches can find relationships among genomics, transcriptomics, proteomics, metabolomics, interactomics and other “omics” data, which can elucidate the complex cross-talk among different levels and time scales. Data mining methods can enable the simulation of complex systems and the construction of dynamical networks, towards the development of predictive, preventive, and personalized medicine.

This Special Issue aims to show the state of the art and future perspectives of the application of bioinformatics in data mining for human health. We welcome contributions that illustrate both the development of new bioinformatics resources (including databases and tools) and the application of bioinformatics approaches aimed at developing new diagnostic and therapeutic approaches for the management of rare as well as complex diseases. The Special Issue will also include articles from the 16th virtual edition of the Bioinformatics and Computational Biology Conference (www.bbcc-meetings.it), 1–3 December 2021.

Prof. Dr. Maria Cubellis
Prof. Dr. Anna Marabotti
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. International Journal of Molecular Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. There is an Article Processing Charge (APC) for publication in this open access journal. For details about the APC please see here. Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • bioinformatics
  • data mining
  • genomics
  • transcriptomics
  • proteomics
  • metabolomics
  • personalized medicine
  • diagnosis
  • therapy

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (8 papers)

Research

14 pages, 1815 KiB  
Article
Machine Learning as a Support for the Diagnosis of Type 2 Diabetes
by Antonio Agliata, Deborah Giordano, Francesco Bardozzo, Salvatore Bottiglieri, Angelo Facchiano and Roberto Tagliaferri
Int. J. Mol. Sci. 2023, 24(7), 6775; https://doi.org/10.3390/ijms24076775 - 5 Apr 2023
Cited by 19 | Viewed by 4394
Abstract
Diabetes is a chronic, metabolic disease characterized by high blood sugar levels. Among the main types of diabetes, type 2 is the most common. Early diagnosis and treatment can prevent or delay the onset of complications. Previous studies examined the application of machine learning techniques for prediction of the pathology, and here an artificial neural network shows very promising results as a possible valuable aid in the management and prevention of diabetes. Additionally, its superior ability for long-term predictions makes it an ideal choice for this field of study. We utilized machine learning methods to uncover previously undiscovered associations between an individual’s health status and the development of type 2 diabetes, with the goal of accurately predicting its onset or determining the individual’s risk level. Our study employed a binary classifier, trained from scratch, to identify potential nonlinear relationships between the onset of type 2 diabetes and a set of parameters obtained from patient measurements. Three datasets were utilized, i.e., the National Center for Health Statistics’ biennial NHANES survey, MIMIC-III and MIMIC-IV. These datasets were then combined to create a single dataset with the same number of individuals with and without type 2 diabetes. Since the dataset was balanced, the primary evaluation metric for the model was accuracy. The outcomes of this study were encouraging, with the model achieving accuracy levels of up to 86% and a ROC AUC value of 0.934. Further investigation is needed to improve the reliability of the model by considering multiple measurements from the same patient over time.
(This article belongs to the Special Issue Data Mining and Bioinformatic Tools for Health)

16 pages, 8803 KiB  
Article
Molecular Docking and Dynamics Simulation Revealed the Potential Inhibitory Activity of New Drugs against Human Topoisomerase I Receptor
by Francesco Madeddu, Jessica Di Martino, Michele Pieroni, Davide Del Buono, Paolo Bottoni, Lorenzo Botta, Tiziana Castrignanò and Raffaele Saladino
Int. J. Mol. Sci. 2022, 23(23), 14652; https://doi.org/10.3390/ijms232314652 - 24 Nov 2022
Cited by 6 | Viewed by 2240
Abstract
Human Topoisomerase I (hTop1p) is a ubiquitous enzyme that relaxes supercoiled DNA through a conserved mechanism involving transient breakage, rotation, and binding. hTop1p is the molecular target of the chemotherapeutic drug camptothecin (CPT). CPT causes the hTop1p-DNA complex to slow down the binding process and clash with the replicative machinery during the S phase of the cell cycle, forcing cells to activate the apoptotic response. This gives hTop1p a central role in cancer therapy. Recently, two artesunic acid derivatives (compounds c6 and c7) have been proposed as promising inhibitors of hTop1p with possible antitumor activity. We used several computational approaches to obtain in silico confirmations of the experimental data and to form a comprehensive dynamic description of the ligand-receptor system. We performed molecular docking analyses to verify the ability of the two new derivatives to access the enzyme-DNA interface, and a classical molecular dynamics simulation was performed to assess the capacity of the two compounds to maintain a stable binding pose over time. Finally, we calculated the noncovalent interactions between the two new derivatives and the hTop1p receptor in order to propose a possible inhibitory mechanism like that adopted by CPT.
(This article belongs to the Special Issue Data Mining and Bioinformatic Tools for Health)

13 pages, 1304 KiB  
Article
hgtseq: A Standard Pipeline to Study Horizontal Gene Transfer
by Simone Carpanzano, Mariangela Santorsola, nf-core community and Francesco Lescai
Int. J. Mol. Sci. 2022, 23(23), 14512; https://doi.org/10.3390/ijms232314512 - 22 Nov 2022
Cited by 1 | Viewed by 2680
Abstract
Horizontal gene transfer (HGT) is well described in prokaryotes: it plays a crucial role in evolution, and has functional consequences in insects and plants. However, less is known about HGT in humans. Studies have reported bacterial integrations in cancer patients, and microbial sequences have been detected in data from well-known human sequencing projects. Few of the existing tools for investigating HGT are highly automated. Thanks to the adoption of Nextflow for life sciences workflows, and to the standards and best practices curated by communities such as nf-core, fully automated, portable, and scalable pipelines can now be developed. Here we present nf-core/hgtseq to facilitate the analysis of HGT from sequencing data in different organisms. We showcase its performance by analysing six exome datasets from five mammals. Hgtseq can be run seamlessly in any computing environment and accepts data generated by existing exome and whole-genome sequencing projects; this will enable researchers to expand their analyses into this area. Fundamental questions are still open about the mechanisms and the extent or role of horizontal gene transfer: by releasing hgtseq we provide a standardised tool which will enable a systematic investigation of this phenomenon, thus paving the way for a better understanding of HGT.
(This article belongs to the Special Issue Data Mining and Bioinformatic Tools for Health)

26 pages, 3641 KiB  
Article
Antiproliferative Activity Predictor: A New Reliable In Silico Tool for Drug Response Prediction against NCI60 Panel
by Annamaria Martorana, Gabriele La Monica, Alessia Bono, Salvatore Mannino, Silvestre Buscemi, Antonio Palumbo Piccionello, Carla Gentile, Antonino Lauria and Daniele Peri
Int. J. Mol. Sci. 2022, 23(22), 14374; https://doi.org/10.3390/ijms232214374 - 19 Nov 2022
Cited by 3 | Viewed by 2351
Abstract
In vitro antiproliferative assays still represent one of the most important tools in the anticancer drug discovery field, especially to gain insights into the mechanisms of action of anticancer small molecules. The NCI-DTP (National Cancer Institute Developmental Therapeutics Program) undoubtedly represents the most famous project aimed at rapidly testing thousands of compounds against multiple tumor cell lines (NCI60). The large amount of biological data stored in the National Cancer Institute (NCI) database and many other databases has led researchers in the fields of computational biology and medicinal chemistry to develop tools to predict the anticancer properties of new agents in advance. In this work, based on the available antiproliferative data collected by the NCI and the manipulation of molecular descriptors, we propose the new in silico Antiproliferative Activity Predictor (AAP) tool to calculate the GI50 values of input structures against the NCI60 panel. This ligand-based protocol, validated by both internal and external sets of structures, has proven to be highly reliable and robust. The obtained GI50 values of a test set of 99 structures present an error of less than ±1 unit. The AAP is more powerful for GI50 calculation in the range of 4–6, showing that the results strictly correlate with the experimental data. The encouraging results were further supported by the examination of an in-house database of curcumin analogues that have already been studied as antiproliferative agents. The AAP tool identified several potentially active compounds, and a subsequent evaluation of a set of molecules selected by the NCI for the one-dose/five-dose antiproliferative assays confirmed the great potential of our protocol for the development of new anticancer small molecules. The integration of the AAP tool in the free web service DRUDIT provides an interesting device for the discovery and/or optimization of anticancer drugs to the medicinal chemistry community. 
The training set will be updated with new NCI-tested compounds to cover more chemical spaces, activities, and cell lines. Currently, the same protocol is being developed for predicting the TGI (total growth inhibition) and LC50 (median lethal concentration) parameters to estimate toxicity profiles of small molecules.
(This article belongs to the Special Issue Data Mining and Bioinformatic Tools for Health)

15 pages, 1212 KiB  
Article
Artificial Intelligence Predictor for Alzheimer’s Disease Trained on Blood Transcriptome: The Role of Oxidative Stress
by Luigi Chiricosta, Simone D’Angiolini, Agnese Gugliandolo and Emanuela Mazzon
Int. J. Mol. Sci. 2022, 23(9), 5237; https://doi.org/10.3390/ijms23095237 - 7 May 2022
Cited by 9 | Viewed by 2448
Abstract
Alzheimer’s disease (AD) is an incurable neurodegenerative disease diagnosed by clinicians through healthcare records and neuroimaging techniques. These methods lack sensitivity and specificity, so new antemortem non-invasive strategies to diagnose AD are needed. Herein, we designed a machine learning predictor based on transcriptomic data obtained from the blood of AD patients and individuals without dementia (non-AD) through an 8 × 60 K microarray. The dataset was used to train different models with different hyperparameters. The support vector machines method allowed us to reach a Receiver Operating Characteristic score of 93% and an accuracy of 89%. High score levels were also achieved by the neural network and logistic regression methods. Furthermore, the Gene Ontology enrichment analysis of the features selected to train the model along with the genes differentially expressed between the non-AD and AD transcriptomic profiles shows the “mitochondrial translation” biological process to be the most interesting. In addition, inspection of the KEGG pathways suggests that the accumulation of β-amyloid triggers electron transport chain impairment, enhancement of reactive oxygen species and endoplasmic reticulum stress. Taken together, all these elements suggest that the oxidative stress induced by β-amyloid is a key feature learned by the model for the prediction of AD with high accuracy.
(This article belongs to the Special Issue Data Mining and Bioinformatic Tools for Health)

15 pages, 3660 KiB  
Article
Integrated OMICs Approach for the Group 1 Protease Mite-Allergen of House Dust Mite Dermatophagoides microceras
by Rei-Hsing Hu, Chun-Wen Cheng, Chia-Ta Wu, Jiunn-Liang Ko, Ko-Huang Lue and Yu-Fan Liu
Int. J. Mol. Sci. 2022, 23(7), 3810; https://doi.org/10.3390/ijms23073810 - 30 Mar 2022
Cited by 2 | Viewed by 1714
Abstract
House dust mites (HDMs) are one of the most important allergy-causing agents of asthma. In central Taiwan, the prevalence of sensitization to Dermatophagoides microceras (Der m), a particular mite species of HDMs, is approximately 80% and is related to the IgE cross-reactivity of Dermatophagoides pteronyssinus (Der p) and Dermatophagoides farinae (Der f). Integrated OMICs examination was used to identify and characterize the specific group 1 mite-allergic component (Der m 1). De novo draft genomic assembly and comparative genome analysis predicted in silico that the full-length Der m 1 allergen gene encodes 321 amino acids. Proteomics verified this result, and production of the recombinant protein implicated cysteine protease activity and proteolysis of the α chain of fibrinogen. In the sensitized mice, pathophysiological features and increased neutrophil accumulation were evident in the lung tissues and BALF, respectively, after inhalation of combined Der m 1 and 2. Principal component analysis (PCA) of mouse cytokines revealed that the cytokine profiles of the allergen-sensitized mouse model with combined Der m 1 and 2 were similar to those with Der m 2 alone but differed from those with Der m 1 alone. Regarding the possible sensitizing roles of Der m 1 in the cells, the fibrinogen cleavage products (FCPs) derived from combined Der m 1 and Der m 2 induced the expression of pro-inflammatory cytokines IL-6 and IL-8 in human bronchial epithelial cells. Der m 1 biologically functions as a cysteine protease and contributes to digestion of the α chain of fibrinogen in vitro. The combination of Der m 1 and 2 could induce cytokine expression patterns in mice similar to those induced by Der m 2, and the FCPs derived from Der m 1 have a synergistic effect with Der m 2 to induce the expression of pro-inflammatory cytokines in human bronchial epithelial cells.
(This article belongs to the Special Issue Data Mining and Bioinformatic Tools for Health)

13 pages, 2228 KiB  
Article
High-Integrity Sequencing of Spike Gene for SARS-CoV-2 Variant Determination
by Yu-Chieh Liao, Feng-Jui Chen, Min-Chieh Chuang, Han-Chieh Wu, Wan-Chen Ji, Guann-Yi Yu and Tsi-Shu Huang
Int. J. Mol. Sci. 2022, 23(6), 3257; https://doi.org/10.3390/ijms23063257 - 17 Mar 2022
Cited by 5 | Viewed by 2161
Abstract
For tiling of the SARS-CoV-2 genome, the ARTIC Network provided a V4 protocol using 99 pairs of primers for amplicon production, which is currently the widely used amplicon-based approach. However, this technique has regions of low sequence coverage and is labour-, time-, and cost-intensive. Moreover, it requires 14 pairs of primers in two separate PCRs to obtain spike gene sequences. To overcome these disadvantages, we proposed a single PCR to efficiently detect spike gene mutations. We proposed a bioinformatic protocol that can process FASTQ reads into spike gene consensus sequences to accurately call spike protein variants from sequenced samples or to fairly report cases of missing amplicons. We evaluated the in silico detection rate of primer sets that yield amplicon sizes of 400, 1200, and 2500 bp for spike gene sequencing of SARS-CoV-2 to be 59.49, 76.19, and 92.20%, respectively. The in silico detection rate of our proposed single PCR primers was 97.07%. We demonstrated the robustness of our analytical protocol against 3000 Oxford Nanopore sequencing runs of distinct datasets, thus ensuring high-integrity sequencing of spike genes for SARS-CoV-2 variant determination. Our protocol works well with the data yielded from versatile primer designs, making it easy to determine spike protein variants.
(This article belongs to the Special Issue Data Mining and Bioinformatic Tools for Health)

Other

12 pages, 1403 KiB  
Perspective
What Is a Digital Twin? Experimental Design for a Data-Centric Machine Learning Perspective in Health
by Frank Emmert-Streib and Olli Yli-Harja
Int. J. Mol. Sci. 2022, 23(21), 13149; https://doi.org/10.3390/ijms232113149 - 29 Oct 2022
Cited by 15 | Viewed by 3252
Abstract
The idea of a digital twin has recently gained widespread attention. While, so far, it has been used predominantly for problems in engineering and manufacturing, it is believed that a digital twin also holds great promise for applications in medicine and health. However, a problem that severely hampers progress in these fields is the lack of a solid definition of the concept behind a digital twin that would be directly amenable for such big data-driven fields requiring a statistical data analysis. In this paper, we address this problem. We will see that the term ‘digital twin’, as used in the literature, is like a Matryoshka doll. For this reason, we unstack the concept via a data-centric machine learning perspective, allowing us to define its main components. As a consequence, we suggest using the term Digital Twin System instead of digital twin because this highlights its complex interconnected substructure. In addition, we address ethical concerns that result from treatment suggestions for patients based on simulated data and a possible lack of explainability of the underlying models.
(This article belongs to the Special Issue Data Mining and Bioinformatic Tools for Health)