Advances and Applications of Machine Learning for Bioinformatics

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 February 2026) | Viewed by 12317

Special Issue Information

Dear Colleagues,

The Special Issue "Advances and Applications of Machine Learning for Bioinformatics" aims to bring together cutting-edge research that highlights the applications, challenges, and opportunities of machine learning in bioinformatics. As the field of bioinformatics continues to expand, machine learning techniques offer powerful tools to analyze complex biological data, identify patterns, and derive meaningful insights. This Special Issue invites contributions on diverse topics, including, but not limited to, genomics, proteomics, systems biology, computational biology, and healthcare. Emphasis will be given to novel methodologies, algorithms, and case studies that demonstrate the effectiveness of machine learning in solving critical problems in biological sciences.

Prof. Dr. Malik Yousef
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • bioinformatics
  • genomics
  • proteomics
  • computational biology
  • systems biology
  • deep learning
  • biological data analysis
  • healthcare informatics
  • predictive modeling

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information is available in MDPI's Special Issue policies.

Published Papers (6 papers)

Research

24 pages, 2003 KB  
Article
SEN-Batch Pseudo-Labeling with NeuroStack for Robust Semi-Supervised Liver Classification
by Pranabes Gangopadhyay, Perumal Ganeshkumar, Tirtharaj Sen, Bidesh Chakraborty, Arindam Biswas and Prabu Pachiyannan
Appl. Sci. 2026, 16(7), 3446; https://doi.org/10.3390/app16073446 - 2 Apr 2026
Viewed by 714
Abstract
The liver is vital for metabolism, detoxification, and homeostasis. Untreated liver disease leads to severe consequences, stressing the need for early diagnosis. However, patient classification using statistical learning is limited by the scarcity of large, labeled datasets due to high acquisition and expertise cost. Surmounting this impediment, a novel Self-Evolving Neighborhood (SEN)-batched pseudo-labeling (PL) technique is proposed within the context of a semi-supervised learning framework. At its core, the NeuroStack model has been developed for labeling the datasets. The study examines the performance of the proposed PL algorithm across datasets like ILPD, BUPA Liver Disorder, and LFT. It is further compared to the state-of-the-art (SOTA) FixMatch. This study achieved the best accuracy of 98%, which is ≈11% higher than the FixMatch algorithm, and a confidence score of 97%, which is ≈12% higher than the FixMatch algorithm. The average accuracy, confidence score, F1-score and AUC across all the datasets are 94.6%, 94%, 0.96 and 0.98, respectively. The confidence interval was ±1.2 which is significantly lower than other algorithms. The experiments also achieved the best patient classification accuracy of 98% using the novel NeuroStack model which is adaptable for labeling any non-image datasets.
(This article belongs to the Special Issue Advances and Applications of Machine Learning for Bioinformatics)

17 pages, 2696 KB  
Article
BF-m7GPred: A Dual-Branch Feature Fusion Deep Learning Architecture for Identifying RNA N7-Methylguanosine Modification Sites
by Jiyu Chen, Xingyang Fan, Qiu Jie and Shutan Xu
Appl. Sci. 2026, 16(5), 2577; https://doi.org/10.3390/app16052577 - 7 Mar 2026
Viewed by 460
Abstract
RNA N7-methylguanosine (m7G) is an important post-transcriptional epigenetic modification that participates in key biological processes, including RNA processing, stability maintenance, and translational regulation. Medical research has shown that m7G modification and its related regulatory factors are closely related to many neurological diseases and tumors. The accurate prediction of m7G sites is thus critical for understanding their biological functions in diseases. In this work, we propose BF-m7GPred, a dual-branch deep learning framework that integrates single-nucleotide-level embeddings and motif-level embeddings for m7G modification site prediction. Our proposed context-aware module tokenizes RNA sequences using byte-pair encoding and encodes sequences with the pretrained foundation biological model DNABERT2. In parallel, the proposed feature fusion module transforms sequences into multiple feature matrices using multiple traditional encoders. We introduce a feature selection strategy tailored to the encoding characteristics of the two branches. On a benchmark dataset collected from m7G-Hub v2.0, BF-m7GPred achieves superior performance on the independent test set against existing methods. Furthermore, its generalization capability is validated through comparative experiments on 10 diverse RNA modification datasets.
(This article belongs to the Special Issue Advances and Applications of Machine Learning for Bioinformatics)

20 pages, 2434 KB  
Article
Machine Learning-Based Prediction of Autism Spectrum Disorder and Discovery of Related Metagenomic Biomarkers with Explainable AI
by Mustafa Temiz, Burcu Bakir-Gungor, Nur Sebnem Ersoz and Malik Yousef
Appl. Sci. 2025, 15(16), 9214; https://doi.org/10.3390/app15169214 (registering DOI) - 21 Aug 2025
Cited by 4 | Viewed by 3076
Abstract
Background: Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder characterized by social communication deficits and repetitive behaviors. Recent studies have suggested that gut microbiota may play a role in the pathophysiology of ASD. This study aims to develop a classification model for ASD diagnosis and to identify ASD-associated biomarkers by analyzing metagenomic data at the taxonomic level. Methods: The performances of five different methods were tested in this study. These methods are (i) SVM-RCE, (ii) RCE-IFE, (iii) microBiomeGSM, (iv) different feature selection methods, and (v) a union method. The last method is based on creating a union feature set consisting of the features with importance scores greater than 0.5, identified using the best-performing feature selection methods. Results: In our 10-fold Monte Carlo cross-validation experiments on ASD-associated metagenomic data, the most effective performance metric (an AUC of 0.99) was obtained using the union feature set (17 features) and the AdaBoost classifier. In other words, we achieve superior machine learning performance with a few features. Additionally, the SHAP method, which is an explainable artificial intelligence method, is applied to the union feature set, and Prevotella sp. 109 is identified as the most important microorganism for ASD development. Conclusions: These findings suggest that the proposed method may be a promising approach for uncovering microbial patterns associated with ASD and may inform future research in this area. This study should be regarded as exploratory, based on preliminary findings and hypothesis generation.
(This article belongs to the Special Issue Advances and Applications of Machine Learning for Bioinformatics)

19 pages, 990 KB  
Article
Machine Learning for Mortality Risk Prediction in Myocardial Infarction: A Clinical-Economic Decision Support Framework
by Konstantinos P. Fourkiotis and Athanasios Tsadiras
Appl. Sci. 2025, 15(16), 9192; https://doi.org/10.3390/app15169192 - 21 Aug 2025
Viewed by 3426
Abstract
Myocardial infarction (MI) remains a leading cause of in-hospital mortality. Early identification of high-risk patients is essential for improving clinical outcomes and optimizing hospital resource allocation. This study presents a machine learning framework for predicting mortality following MI using a publicly available dataset of 1700 patient records, and after excluding records with over 20 missing values and features with more than 300 missing entries, the final dataset included 1547 patients and 113 variables, categorized as binary, categorical, integer, or continuous. Missing values were addressed using denoising autoencoders for continuous features and variational autoencoders for the remaining data. In contrast, feature selection was performed using Random Forest, and PowerTransformer scaling was applied, addressing class imbalance by using SMOTE. Twelve models were evaluated, including Focal-Loss Neural Networks, TabNet, XGBoost, LightGBM, CatBoost, Random Forest, SVM, Logistic Regression, and a voting ensemble. Performance was assessed using multiple metrics, with SVM achieving the highest F1 score (0.6905), ROC-AUC (0.8970), and MCC (0.6464), while Random Forest yielded perfect precision and specificity. To assess generalizability, a subpopulation external validation was conducted by training on male patients and testing on female patients. XGBoost and CatBoost reached the highest ROC-AUC (0.90), while Focal-Loss Neural Network achieved the best MCC (0.53). Overall, the proposed framework outperformed previous studies in key metrics and maintained better performance under demographic shift, supporting its potential for clinical decision-making in post-MI care.
(This article belongs to the Special Issue Advances and Applications of Machine Learning for Bioinformatics)

24 pages, 1990 KB  
Article
Evaluating Skin Tone Fairness in Convolutional Neural Networks for the Classification of Diabetic Foot Ulcers
by Sara Seabra Reis, Luis Pinto-Coelho, Maria Carolina Sousa, Mariana Neto, Marta Silva and Miguela Sequeira
Appl. Sci. 2025, 15(15), 8321; https://doi.org/10.3390/app15158321 - 26 Jul 2025
Cited by 2 | Viewed by 2731
Abstract
The present paper investigates the application of convolutional neural networks (CNNs) for the classification of diabetic foot ulcers, using VGG16, VGG19 and MobileNetV2 architectures. The primary objective is to develop and compare deep learning models capable of accurately identifying ulcerated regions in clinical images of diabetic feet, thereby aiding in the prevention and effective treatment of foot ulcers. A comprehensive study was conducted using an annotated dataset of medical images, evaluating the performance of the models in terms of accuracy, precision, recall and F1-score. VGG19 achieved the highest accuracy at 97%, demonstrating superior ability to focus activations on relevant lesion areas in complex images. MobileNetV2, while slightly less accurate, excelled in computational efficiency, making it a suitable choice for mobile devices and environments with hardware constraints. The study also highlights the limitations of each architecture, such as increased risk of overfitting in deeper models and the lower capability of MobileNetV2 to capture fine clinical details. These findings suggest that CNNs hold significant potential in computer-aided clinical diagnosis, particularly in the early and precise detection of diabetic foot ulcers, where timely intervention is crucial to prevent amputations.
(This article belongs to the Special Issue Advances and Applications of Machine Learning for Bioinformatics)

Review

34 pages, 501 KB  
Review
An Overview of Existing Applications of Artificial Intelligence in Histopathological Diagnostics of Lymphoma: A Scoping Review
by Mieszko Czaplinski, Grzegorz Redlarski, Mateusz Wieczorek, Paweł Kowalski, Piotr Mateusz Tojza, Adam Sikorski and Arkadiusz Żak
Appl. Sci. 2026, 16(6), 2803; https://doi.org/10.3390/app16062803 - 14 Mar 2026
Viewed by 476
Abstract
Background: Artificial intelligence (AI) shows promising results in lymphoma detection, prediction, and classification. However, translating these findings into practice requires a rigorous assessment of potential biases, clinical utility, and further validation of research models. Objective: The goal of this study was to summarize existing studies on artificial intelligence models for the histopathological detection of lymphoma. Design: This study adhered to the PRISMA Extension for Scoping Reviews (PRISMA-ScR) guidelines. A systematic search was conducted across three major databases (Scopus, PubMed, Web of Science) for English-language articles and reviews published between 2016 and 2025. Seven precise search queries were applied to identify relevant publications, accounting for variations in study modality, algorithmic architectures, and disease-specific terminology. Results: The search identified 612 records, of which 36 articles met the inclusion criteria. These studies presented 36 AI models, comprising 30 diagnostic and six prognostic applications, with Convolutional Neural Networks (CNNs) being the predominant architecture. Regarding data sources, 83% (30/36) of datasets utilized Hematoxylin and Eosin (H&E)-stained images, while the remainder relied on diverse modalities, including IHC-stained slides, bone marrow smears, and other tissue preparations. Studies predominantly utilized retrospective, private cohorts with sample sizes typically ranging from 50 to 400 patients; only a minority leveraged open-access repositories (e.g., Kaggle, TCGA). The primary application was slide-level multi-class classification, distinguishing between specific lymphoma subtypes and non-neoplastic controls. Beyond diagnosis, a subset of studies explored advanced prognostic tasks, such as predicting chemotherapy response and disease progression (e.g., in CLL), as well as automated biomarker quantification (c-MYC, BCL2, PD-L1). Reported diagnostic performance was generally high, with accuracy ranging from 60% to 100% (clustering around 90%) and AUC values spanning 0.70 to 0.99 (predominantly >0.90). Conclusions: While AI models demonstrate high diagnostic accuracy, their translation into practice is limited by unstandardized protocols, morphological complexity, and the “black box” nature of algorithms. Critical issues regarding data provenance, image noise, and lack of representativeness raise risks of systematic bias, hence the need for rigorous validation in diverse clinical environments.
(This article belongs to the Special Issue Advances and Applications of Machine Learning for Bioinformatics)