Advances in AI and Machine Learning for the Analysis of -Omics and Complex Molecular Data

A special issue of International Journal of Molecular Sciences (ISSN 1422-0067). This special issue belongs to the section "Molecular Informatics".

Deadline for manuscript submissions: closed (20 February 2025) | Viewed by 14338

Special Issue Editors

Prof. Dr. David P Kreil

Guest Editor

1. Department für Biotechnologie, Universität für Bodenkultur Wien (BOKU), Vienna, Austria
2. Institute of Advanced Research in Artificial Intelligence (IARAI), Vienna, Austria
Interests: machine learning; artificial intelligence; quantitative assays

Dr. Aleksandra Gruca

Guest Editor

Department of Computer Networks and Systems, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
Interests: machine learning; computational biology; bioinformatics; protein function

Special Issue Information

Dear Colleagues,

Increasingly, AI and machine learning spearhead efforts in analyzing the complex datasets generated by high-throughput -omic technologies. Advances in AI and machine learning, on the one hand, and progress in their applications, on the other hand, are traditionally pursued by different scientific communities, which we aim to bring together in this Special Issue of the IJMS.

We thus invite you to share your best work in the following domains:

(1) Advancing AI and machine learning for the analysis of -omics and complex molecular data. We welcome methodological advances or insights that robustly generalize to different data sources. Where complex algorithms or pipelines are introduced, individual steps need to be justified, such as through ablation studies.

(2) Applying AI and machine learning for novel insights into the mechanisms of biological processes or systems at the molecular level. We welcome novel insights concerning molecular functions, regulation mechanisms, pathways (regulation, signaling, metabolic, etc.), or molecular pathology. The identification of biomarkers is of interest if robust across cohorts or linked to mechanisms.

Novel insights should be developed in the context of complex systems, including, but not limited to, studies on organism interactions, healthy cohorts, or heterogeneous diseases, such as cardiovascular, autoimmune, or ageing-related diseases, and cancer.

We sincerely hope that this Special Issue can showcase your latest work!

This Special Issue is edited by members of COST Action AtheroNET CA21153 (Network for implementing multi-omics approaches in atherosclerotic cardiovascular disease prevention and research, www.atheronet.eu).

Prof. Dr. David P Kreil
Dr. Aleksandra Gruca
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. International Journal of Molecular Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. There is an Article Processing Charge (APC) for publication in this open access journal. For details about the APC please see here. Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

computational biology
bioinformatics
machine learning/AI
high-throughput data analysis
multi-omics
genomics
transcriptomics
proteomics
metabolomics
regulation mechanisms
pathway analysis (regulation, signaling, and metabolic)
molecular pathology
complex diseases (cancer, cardiovascular, autoimmune, ageing-related, etc.)
biomarkers
functional prediction/annotation
benchmarking

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)

Research

21 pages, 3486 KB

Open Access Article

Biologically Informed Machine Learning Prioritizes Dietary Supplements That Protect Neural Crest Cells from Ethanol-Induced Epigenetic Dysregulation and Developmental Impairment

by Xiaoqing Wang, Miao Bai, Shuoyang Wang, Hongjia Qian, Jie Liu, Wenke Feng, Huang-ge Zhang, Xiaoyang Wu and Shao-yu Chen

Int. J. Mol. Sci. 2026, 27(1), 295; https://doi.org/10.3390/ijms27010295 - 27 Dec 2025

Viewed by 901

Abstract

The impairment of neural crest cells (NCCs) plays a pivotal role in the pathogenesis of fetal alcohol spectrum disorders (FASD). Epigenetic regulators mediate ethanol-induced disruptions in NCC development and represent promising targets for nutritional interventions. Here, we developed a biologically informed machine learning framework to predict nutritional supplements that modulate five key epigenetic regulators (miR-34a, DNMT3a, HDAC, miR-125b, and miR-135a) and mitigate ethanol’s adverse effects on NCCs. The optimized models demonstrated robust predictive performance and identified a number of nutritional supplements that could attenuate ethanol-induced NCC impairment, including resveratrol, vitamin B12, emodin, quercetin, and broccoli sprout-derived compounds. Our optimized models also revealed structural features that are critical for mitigating ethanol-induced NCC impairment through specific epigenetic mechanisms. These findings support predictive modeling as a tool to prioritize nutritional supplements for further investigation and the development of dietary strategies to prevent or reduce the risk of FASD. Full article

(This article belongs to the Special Issue Advances in AI and Machine Learning for the Analysis of -Omics and Complex Molecular Data)

16 pages, 4225 KB

Open Access Article

Integrating Metabolomics Domain Knowledge with Explainable Machine Learning in Atherosclerotic Cardiovascular Disease Classification

by Everton Santana, Eliana Ibrahimi, Evangelos Ntalianis, Nicholas Cauwenberghs and Tatiana Kuznetsova

Int. J. Mol. Sci. 2024, 25(23), 12905; https://doi.org/10.3390/ijms252312905 - 30 Nov 2024

Cited by 4 | Viewed by 1748

Abstract

Metabolomic data often present challenges due to high dimensionality, collinearity, and variability in metabolite concentrations. Machine learning (ML) application in metabolomic analyses is enabling the extraction of meaningful information from complex data. Bringing together domain-specific knowledge from metabolomics with explainable ML methods can refine the predictive performance and interpretability of models used in atherosclerosis research. In this work, we aimed to identify the most impactful metabolites associated with the presence of atherosclerotic cardiovascular disease (ASCVD) in cross-sectional case–control studies using explainable ML methods integrated with metabolomics domain knowledge. For this, a subset from the FLEMENGHO cohort with metabolomic data available was used as the training cohort, including 63 patients with a history of ASCVD and 52 non-smoking controls matched by age, sex, and body mass index from the same population. First, Partial Least Squares Discriminant Analysis (PLS-DA) was applied for dimensionality reduction. The selected metabolites’ correlations were analyzed by considering their chemical categorization. Then, eXtreme Gradient Boosting (XGBoost) was used to identify metabolites that characterize ASCVD. Next, the selected metabolites were evaluated in an external cohort to determine their effectiveness in distinguishing between cases and controls. A total of 56 metabolites were selected for ASCVD discrimination using PLS-DA. The primary identified metabolites’ superclasses included lipids, organic acids, and organic oxygen compounds. Upon integrating these metabolites with the XGBoost model, the classification yielded a test area under the curve (AUC) of 0.75. SHAP analyses ranked cholesterol, 3-methylhistidine, and glucuronic acid among the most impactful features and showed the diversity of metabolites considered for building the ASCVD discriminator. 
Also using XGBoost, the selected metabolites achieved an AUC of 0.93 in an independent external validation cohort. In conclusion, the combination of different metabolites has the potential to build classifiers for ASCVD. Integrating metabolite categorization within the SHAP analysis further enhanced the interpretability of the model, offering insights into metabolite-specific contributions to ASCVD risk. Full article

19 pages, 3020 KB

Open Access Article

Multimodal Identification of Molecular Factors Linked to Severe Diabetic Foot Ulcers Using Artificial Intelligence

by Anita Omo-Okhuasuyi, Yu-Fang Jin, Mahmoud ElHefnawi, Yidong Chen and Mario Flores

Int. J. Mol. Sci. 2024, 25(19), 10686; https://doi.org/10.3390/ijms251910686 - 4 Oct 2024

Cited by 3 | Viewed by 3403

Abstract

Diabetic foot ulcers (DFUs) are a severe complication of diabetes mellitus (DM), which often lead to hospitalization and non-traumatic amputations in the United States. Diabetes prevalence estimates in South Texas exceed the national estimate and the number of diagnosed cases is higher among Hispanic adults compared to their non-Hispanic white counterparts. San Antonio, a predominantly Hispanic city, reports significantly higher annual rates of diabetic amputations compared to Texas. The late identification of severe foot ulcers minimizes the likelihood of reducing amputation risk. The aim of this study was to identify molecular factors related to the severity of DFUs by leveraging a multimodal approach. We first utilized electronic health records (EHRs) from two large demographic groups, encompassing thousands of patients, to identify blood tests such as cholesterol, blood sugar, and specific protein tests that are significantly associated with severe DFUs. Next, we translated the protein components from these blood tests into their ribonucleic acid (RNA) counterparts and analyzed them using public bulk and single-cell RNA sequencing datasets. Using these data, we applied a machine learning pipeline to uncover cell-type-specific and molecular factors associated with varying degrees of DFU severity. Our results showed that several blood test results, such as the Albumin/Creatinine Ratio (ACR) and cholesterol and coagulation tissue factor levels, correlated with DFU severity across key demographic groups. These tests exhibited varying degrees of significance based on demographic differences. Using bulk RNA-Sequenced (RNA-Seq) data, we found that apolipoprotein E (APOE) protein, a component of lipoproteins that are responsible for cholesterol transport and metabolism, is linked to DFU severity. Furthermore, the single-cell RNA-Seq (scRNA-seq) analysis revealed a cluster of cells identified as keratinocytes that showed overexpression of APOE in severe DFU cases. 
Overall, this study demonstrates how integrating extensive EHRs data with single-cell transcriptomics can refine the search for molecular markers and identify cell-type-specific and molecular factors associated with DFU severity while considering key demographic differences. Full article

21 pages, 16949 KB

Open Access Article

Combining Hyperspectral Techniques and Genome-Wide Association Studies to Predict Peanut Seed Vigor and Explore Associated Genetic Loci

by Zhenhui Xiong, Shiyuan Liu, Jiangtao Tan, Zijun Huang, Xi Li, Guidan Zhuang, Zewu Fang, Tingting Chen and Lei Zhang

Int. J. Mol. Sci. 2024, 25(15), 8414; https://doi.org/10.3390/ijms25158414 - 1 Aug 2024

Cited by 4 | Viewed by 2359

Abstract

Seed vigor significantly affects peanut breeding and agricultural yield by influencing seed germination and seedling growth and development. Traditional vigor testing methods are inadequate for modern high-throughput assays. Although hyperspectral technology shows potential for monitoring various crop traits, its application in predicting peanut seed vigor is still limited. This study developed and validated a method that combines hyperspectral technology with genome-wide association studies (GWAS) to achieve high-throughput detection of seed vigor and identify related functional genes. Hyperspectral phenotyping data and physiological indices from different peanut seed populations were used as input data to construct models using machine learning regression algorithms to accurately monitor changes in vigor. Model-predicted phenotypic data from 191 peanut varieties were used in GWAS, gene-based association studies, and haplotype analyses to screen for functional genes. Real-time fluorescence quantitative PCR (qPCR) was used to analyze the expression of functional genes in three high-vigor and three low-vigor germplasms. The results indicated that the random forest and support vector machine models provided effective phenotypic data. We identified Arahy.VMLN7L and Arahy.7XWF6F, with Arahy.VMLN7L negatively regulating seed vigor and Arahy.7XWF6F positively regulating it, suggesting distinct regulatory mechanisms. This study confirms that GWAS based on hyperspectral phenotyping reveals genetic relationships in seed vigor levels, offering novel insights and directions for future peanut breeding, accelerating genetic improvements, and boosting agricultural yields. This approach can be extended to monitor and explore germplasms and other key variables in various crops. Full article

20 pages, 23188 KB

Open Access Article

Boosting Clear Cell Renal Carcinoma-Specific Drug Discovery Using a Deep Learning Algorithm and Single-Cell Analysis

by Yishu Wang, Xiaomin Chen, Ningjun Tang, Mengyao Guo and Dongmei Ai

Int. J. Mol. Sci. 2024, 25(7), 4134; https://doi.org/10.3390/ijms25074134 - 8 Apr 2024

Cited by 6 | Viewed by 4648

Abstract

Clear cell renal carcinoma (ccRCC), the most common subtype of renal cell carcinoma, has the high heterogeneity of a highly complex tumor microenvironment. Existing clinical intervention strategies, such as target therapy and immunotherapy, have failed to achieve good therapeutic effects. In this article, single-cell transcriptome sequencing (scRNA-seq) data from six patients downloaded from the GEO database were adopted to describe the tumor microenvironment (TME) of ccRCC, including its T cells, tumor-associated macrophages (TAMs), endothelial cells (ECs), and cancer-associated fibroblasts (CAFs). Based on the differential typing of the TME, we identified tumor cell-specific regulatory programs that are mediated by three key transcription factors (TFs), whilst the TF EPAS1/HIF-2α was identified via drug virtual screening through our analysis of ccRCC’s protein structure. Then, a combined deep graph neural network and machine learning algorithm were used to select anti-ccRCC compounds from bioactive compound libraries, including the FDA-approved drug library, natural product library, and human endogenous metabolite compound library. Finally, five compounds were obtained, including two FDA-approved drugs (flufenamic acid and fludarabine), one endogenous metabolite, one immunology/inflammation-related compound, and one inhibitor of DNA methyltransferase (N4-methylcytidine, a cytosine nucleoside analogue that, like zebularine, has the mechanism of inhibiting DNA methyltransferase). Based on the tumor microenvironment characteristics of ccRCC, five ccRCC-specific compounds were identified, which would give direction of the clinical treatment for ccRCC patients. Full article
