This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

Integrated Bioinformatics and Machine Learning for Ascertainment and Validation of Biomarkers for Screening Breast Disease

1
National Key Laboratory, Heilongjiang University of Chinese Medicine, Harbin 150000, China
2
State Key Laboratory of Multimodal Artificial Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing 100000, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Genes 2025, 16(11), 1389; https://doi.org/10.3390/genes16111389
Submission received: 27 October 2025 / Revised: 11 November 2025 / Accepted: 15 November 2025 / Published: 18 November 2025
(This article belongs to the Section Bioinformatics)

Abstract

Background: This study aimed to identify potential biomarkers for diagnosing breast diseases and to elucidate their immune-related mechanisms. Methods: Three datasets were obtained from the Gene Expression Omnibus (GEO) database. The LIMMA package and weighted gene co-expression network analysis (WGCNA) were used to identify differentially expressed genes (DEGs) and key modules in benign breast disease (BBD) and breast cancer (BC). The intersecting genes underwent functional enrichment analysis. Three machine learning (ML) methods, namely LASSO regression, random forest, and support vector machine recursive feature elimination (SVM-RFE), were used to select core genes. The diagnostic performance of the core genes was evaluated by comparing their expression levels, plotting receiver operating characteristic (ROC) curves, and constructing a nomogram. The TCGA-BRCA dataset was used to assess the prognostic value of the core genes in patients with BC. Finally, immune cell infiltration was estimated using the CIBERSORT algorithm. Results: In total, 2579 DEGs were identified in BBD. WGCNA showed that the 1652 genes in the green and pink modules were strongly correlated with BBD. In BC, 2742 DEGs were identified, and the turquoise and red modules contained 7286 genes strongly correlated with BC. Intersecting these gene sets yielded 41 common genes, which were predominantly enriched in immune and inflammation regulation pathways. Integrated screening with the three ML algorithms identified Arrestin Domain Containing 1 (ARRDC1) and ATPase Sarcoplasmic/Endoplasmic Reticulum Ca2+ Transporting 2 (ATP2A2) as core genes. ROC analysis showed that the AUC for both genes was greater than 0.8. The calibration curve of the nomogram indicated close agreement between predicted risk and observed outcomes. Survival analysis in TCGA-BRCA showed that high expression of both genes was significantly associated with unfavorable prognosis. Immune infiltration analysis further demonstrated dysregulation of multiple immune cell types in patient samples. Conclusions: ARRDC1 and ATP2A2 are strongly linked to BBD and BC. These findings may improve our understanding of the pathogenesis and progression of both BBD and BC, and they offer prospective biomarkers and therapeutic targets for clinical treatment.
Keywords: benign breast disease; breast cancer; bioinformatics; machine learning; immune cell infiltration
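
Two steps named in the abstract, keeping only the genes selected by all three ML methods and scoring a single gene by its ROC AUC, can be illustrated with a minimal Python sketch. This is not the authors' pipeline (LIMMA and WGCNA are R packages, and the abstract does not describe the implementation). The gene labels (GENE_A to GENE_E), the expression values, and the function names `consensus_genes` and `auc` are all hypothetical and chosen only for illustration.

```python
from itertools import product


def consensus_genes(*selections):
    """Return the genes chosen by every selection method (set intersection)."""
    sets = [set(s) for s in selections]
    return set.intersection(*sets) if sets else set()


def auc(case_values, control_values):
    """ROC AUC in its Mann-Whitney form: the probability that a randomly
    chosen case has a higher value than a randomly chosen control.
    Ties count as one half."""
    if not case_values or not control_values:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for case, control in product(case_values, control_values):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical gene lists, standing in for the output of each method.
lasso = ["GENE_A", "GENE_B", "GENE_C"]
random_forest = ["GENE_A", "GENE_B", "GENE_D"]
svm_rfe = ["GENE_A", "GENE_B", "GENE_E"]
core = consensus_genes(lasso, random_forest, svm_rfe)

# Invented expression values for one gene in disease and control samples.
disease = [7.9, 8.4, 8.1, 7.2, 8.8]
control = [6.9, 7.4, 7.0, 7.6, 6.5]
gene_a_auc = auc(disease, control)

print(sorted(core), round(gene_a_auc, 2))
```

The pairwise AUC is quadratic in sample count, which is fine for a toy example; for real expression matrices a rank-based routine from a statistics library would be the more practical choice.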

Share and Cite

MDPI and ACS Style

Wang, Q.; Yang, S.; Zhang, Y.; Piao, C.; Liu, X.; Wu, X. Integrated Bioinformatics and Machine Learning for Ascertainment and Validation of Biomarkers for Screening Breast Disease. Genes 2025, 16, 1389. https://doi.org/10.3390/genes16111389

