Bioinformatics Strategies in Breast Cancer Research

Veneziano, Matteo; Savini, Isabella; Cortellesi, Elisa; Gasperi, Valeria; Gambacurta, Alessandra; Catani, Maria Valeria

doi:10.3390/biom15101409

Open Access Review

by Matteo Veneziano ¹, Isabella Savini ¹, Elisa Cortellesi ¹, Valeria Gasperi ¹, Alessandra Gambacurta ¹,² and Maria Valeria Catani ¹,*

¹ Department of Experimental Medicine, Tor Vergata University of Rome, 00133 Rome, Italy
² NAST Centre (Nanoscience & Nanotechnology & Innovative Instrumentation), Tor Vergata University of Rome, 00133 Rome, Italy
* Author to whom correspondence should be addressed.

Biomolecules 2025, 15(10), 1409; https://doi.org/10.3390/biom15101409

Submission received: 24 June 2025 / Revised: 9 September 2025 / Accepted: 29 September 2025 / Published: 2 October 2025

(This article belongs to the Special Issue Experimental and Bioinformatic Approaches for Biomarker Discovery in Disease: From Molecular Mechanisms to Therapeutic Challenges)

Abstract

Breast cancer is a heterogeneous disease and a leading cause of cancer-related deaths worldwide, underscoring the urgent need for effective biomarkers to guide diagnosis, prognosis, and therapeutic decisions. Bioinformatics methodologies, including genomics, transcriptomics, proteomics, and metabolomics data analysis, are essential for deciphering the complex molecular landscape of breast cancer. Bioinformatics tools facilitate the identification of differentially expressed genes, non-coding RNAs, and proteins, unraveling crucial pathways involved in tumor initiation, progression, and metastasis. By constructing and analyzing protein–protein interaction networks and signaling pathways, bioinformatics approaches can identify potential diagnostic, prognostic, and predictive biomarkers. Herein, we explore the role of bioinformatics in breast cancer research and its potential application in identifying novel therapeutic targets and predicting drug response, ultimately enabling the development of tailored treatment strategies. We also address the challenges and future directions in utilizing bioinformatics for biomarker discovery and validation, emphasizing the need for robust statistical methods, standardized data analysis pipelines, and collaborative efforts to translate bioinformatics insights into improved clinical outcomes for breast cancer patients.

Keywords: breast cancer; biomarker; genomics; transcriptomics; proteomics; metabolomics; drug response

1. Introduction

In 2022, GLOBOCAN estimated 2.3 million new cases of breast cancer (BC) worldwide, making it the most frequently diagnosed tumor and the leading cause of cancer-related deaths among women [1]. BC is a highly complex tumor characterized by significant histological and molecular heterogeneity, which influences progression, metastasis formation, and therapy response [2,3]. Consequently, diagnosis and the subsequent therapeutic plan depend on integrating the histological and molecular characteristics of the tumor, which take into account grade, stage, lymphovascular invasion, and the presence of axillary lymph node metastasis, as well as the status of steroid receptors [estrogen receptor (ER), progesterone receptor (PR)], human epidermal growth factor receptor 2 (HER2), and the Ki67 index (a nuclear marker of proliferating cells) [4].

Although BC classification is constantly evolving as new information from various sources is integrated, the standard for diagnosis has been established by the World Health Organization (WHO). The primary histologic classification of BC is based on whether or not the tumor is of epithelial origin. While less than 1% of BCs have a non-epithelial origin (e.g., sarcomas deriving from myofibroblasts and blood vessel cells), the vast majority of BCs are carcinomas deriving from the epithelial cells of breast lobules and milk ducts [4,5] (Figure 1). This latter group is histologically categorized as in situ carcinoma (further divided into lobular and ductal carcinoma in situ, LCIS and DCIS, respectively) and invasive carcinoma (IBC), which is able to infiltrate the surrounding mammary glandular tissue and metastasize to regional lymph nodes and distant organs. The most common IBC is invasive carcinoma of no special type (NST) [also referred to as invasive ductal carcinoma (IDC)], followed by invasive lobular carcinoma (ILC, approximately 10–15% of invasive BC), and other subtypes accounting for less than 10% of BC [4].

On the basis of molecular subtyping [6], BCs are further classified into luminal A-like (ER⁺, PR⁺, HER2⁻, Ki67^low), luminal B-like (ER⁺ and/or PR⁺, HER2^+/−, Ki67^low/high), HER2-enriched (ER⁻, PR⁻, HER2⁺, Ki67^high) and basal-like [triple-negative breast cancer (TNBC); ER⁻, PR⁻, HER2⁻, Ki67^high] [3]. BC also exhibits strong familial clustering, with BRCA1 and BRCA2 being well-known high-penetrance genes associated with increased risk, and a polygenic origin suggested by medium-low penetrance genes, such as PALB2, PTEN, PIK3CA, ATM, BARD1, CHEK2, RAD51C, RAD51D, and TP53 [7,8,9,10]. Analysis of global gene expression patterns reveals distinct BC subtypes with unique biological behaviors, treatment responses, and prognoses [11,12]. For instance, in DCIS, germline BRCA1 and BRCA2 mutations, HER2 amplification, and negative ER and PR status are observed, reflecting the full spectrum of luminal A, luminal B, HER2-enriched, and basal-like molecular subtypes. Conversely, classic ILCs predominantly express ER and PR and lack HER2 gene amplification/overexpression. Furthermore, specific etiological risk factors, such as BRCA1 and TP53 mutations, are usually associated with TNBC, while BRCA2 and PIK3CA mutations are linked to luminal A- and B-like BC subtypes [4].
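
The surrogate marker profiles listed above can be read as a simple decision rule. The following Python sketch is only an illustration of that reading: the function name, the boolean encoding of marker status, and the choice to label every hormone receptor-positive tumor that is not luminal A-like as luminal B-like are assumptions, and the Ki67 cut-off separating "low" from "high" is left to the caller because the text does not specify one.

```python
def surrogate_subtype(er: bool, pr: bool, her2: bool, ki67_high: bool) -> str:
    """Map marker status to a surrogate molecular subtype label (illustrative only)."""
    if not er and not pr:
        # Hormone receptor-negative tumors are split on HER2 status.
        # Ki67 is typically high in both groups, so it is not used here.
        return "HER2-enriched" if her2 else "basal-like (TNBC)"
    if er and pr and not her2 and not ki67_high:
        return "luminal A-like"
    # Assumption: all remaining hormone receptor-positive tumors are luminal B-like.
    return "luminal B-like"


if __name__ == "__main__":
    print(surrogate_subtype(er=True, pr=True, her2=False, ki67_high=False))
    print(surrogate_subtype(er=False, pr=False, her2=False, ki67_high=True))
```

Such a rule is a teaching aid, not a diagnostic tool: clinical subtyping relies on validated assays and thresholds.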

Due to the complex and heterogeneous nature of BC, traditional research methods often fall short in providing a comprehensive understanding of the molecular aberrations within tumors. This is where bioinformatics tools become invaluable, enabling the integration of high-throughput data and clinical factors to elucidate the underlying molecular mechanisms of BC. Genomics, transcriptomics and proteomics, along with single-cell analyses, have already proven effective for the discovery of new biomarkers and therapeutic targets. Advances in technology have enabled researchers to delve deeper into the molecular mechanisms of tumorigenesis, leading to the discovery of novel diagnostic and prognostic biomarkers and paving the way for the development of targeted therapeutic interventions. In the future, bioinformatics is expected to become tightly integrated with personalized clinical medicine: by detecting patient-specific molecular profiles, therapeutic agents may be selected for each patient in precision oncology, which could improve clinical outcomes [13].

Herein, we discuss how these bioinformatic strategies are changing our understanding of BC and how they are being employed to devise precision treatment approaches.

2. Bioinformatics Tools and Techniques for Biomarker Discovery

Bioinformatics is an interdisciplinary field that develops and applies mathematical, statistical, and computational methods to solve biological problems. It encompasses the utilization of software tools and databases for collection, organization, and analysis of large-scale biological and medical data, to gain a deeper understanding of biological systems. It mainly relies on “omics” technologies (genomics, transcriptomics, epigenomics, proteomics, lipidomics, metabolomics), each comprehensively focusing on a specific set of biological molecules.

2.1. Data Acquisition

Modern sequencing techniques, often referred to as high-throughput sequencing or next-generation sequencing (NGS) (Figure 2), have transformed nucleic acid analysis by providing increased speed, lower costs, and higher accuracy compared to first-generation Sanger sequencing [14]. One NGS approach is whole-genome sequencing (WGS), which involves sequencing an organism’s entire genome using DNA extracted from blood or biopsy tissues [15]. WGS is highly sensitive to structural variants, including deletions, insertions, inversions, duplications, and translocations. Furthermore, it enables the identification of single-nucleotide polymorphisms (SNPs), which can contribute to elucidating the underlying biology of diseases and support the development of effective, personalized treatment strategies [16,17]. Unlike WGS, which analyzes all genomic regions, whole-exome sequencing (WES) focuses exclusively on the coding regions. Approximately 85% of disease-associated mutations occur within exons, and WES covers around 95% of these regions, highlighting its significant role in research [18]. While both WGS and WES provide robust coverage of coding regions, WES may be more susceptible to sequence bias in large-scale applications, whereas WGS offers more reliable detection of structural variants [19]. Finally, Target Region Sequencing (TRS) analyzes a select set of genes or genomic regions with specific functions, often linked to particular diseases or phenotypes [20]. This technique has a wide range of applications, including detection of SNPs, insertions and deletions, copy number variations, and structural variants. It also allows discovery of germline or somatic mutations, linkage analysis for inherited diseases, and identification of biomarkers and therapeutic targets; one significant advantage of TRS is its ability to identify variants at low allele frequencies, even as low as 1%.
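
To make the 1% figure concrete, the variant allele frequency (VAF) at a position is the fraction of reads that support the variant allele. The sketch below is a minimal Python illustration with hypothetical read counts; the function names and the `min_alt_reads` guard (a minimum number of supporting reads, here 5) are assumptions introduced for the example and do not come from any specific variant caller.

```python
def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a position that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def passes_threshold(alt_reads: int, total_reads: int,
                     min_vaf: float = 0.01, min_alt_reads: int = 5) -> bool:
    """Keep a candidate variant only if it reaches the VAF cut-off and is
    supported by enough reads (min_alt_reads is an illustrative assumption)."""
    vaf = variant_allele_frequency(alt_reads, total_reads)
    return vaf >= min_vaf and alt_reads >= min_alt_reads


if __name__ == "__main__":
    # Hypothetical position covered by 1000 reads, 12 of which carry the variant.
    print(variant_allele_frequency(12, 1000))
    print(passes_threshold(12, 1000))
```

The example also shows why deep coverage matters for targeted panels: at 1000 reads, a 1% variant is supported by only about 10 reads.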

RNA sequencing (RNA-seq) is the preferred technique for studying the transcriptome and serves as a primary tool for expression analysis of coding and non-coding (nc) RNAs [14]. Both bulk RNA sequencing (which measures average gene expression across large cell populations) and single-cell RNA sequencing (scRNA-seq, which analyzes RNA profiles at the single-cell level) have been developed for gene expression analysis [21]. These methods should be viewed as complementary rather than interchangeable [22]. Additionally, global run-on sequencing (GRO-Seq) allows mapping of active, nascent RNA production, capturing transcriptional activity in real time rather than steady-state RNA levels [23]. These techniques can be useful for deciphering mechanisms of drug resistance arising during treatment of several diseases, including cancer, and for developing personalized medicine approaches tailored to each patient’s molecular profile [17].

More recent NGS applications include those employed in epigenomics and in chromosome conformation and interaction technologies. Epigenomics commonly utilizes chromatin immunoprecipitation followed by sequencing (ChIP-seq) for mapping histone modifications [24]; methyl-seq and bisulfite-seq (techniques that enzymatically or chemically convert unmethylated cytosines into uracil, later read as thymine) for DNA methylation studies [25]; and ATAC-seq (a technique that uses a hyperactive mutant Tn5 transposase to create tagged DNA fragments, which are subsequently PCR-amplified and sequenced) and DNase-seq (sequencing of DNase I-sensitive DNA regions not occupied by histones) for studying chromatin accessibility [26]. Investigations of the dynamic and regulated interplay of macromolecules within the cell utilize High-throughput Chromosome Conformation Capture (Hi-C) and Chromatin Interaction Analysis with Paired-End-Tag sequencing (ChIA-PET), the latter combining ChIP and sequencing to map long-range chromatin interactions [27,28].
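
The logic of bisulfite-based methylation calling can be sketched in a few lines of Python. This is a deliberately simplified model and not a real analysis pipeline: it assumes a single strand, complete conversion of every unmethylated cytosine, and error-free sequencing, and the function names and toy sequence are hypothetical.

```python
from typing import List, Set


def bisulfite_convert(sequence: str, methylated_positions: Set[int]) -> str:
    """Simulate the read obtained after conversion and amplification:
    unmethylated C is read as T, methylated C is protected and stays C."""
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylated_positions(reference: str, converted_read: str) -> List[int]:
    """Positions where the reference has C and the converted read still has C."""
    return [
        index
        for index, (ref_base, read_base) in enumerate(
            zip(reference.upper(), converted_read.upper())
        )
        if ref_base == "C" and read_base == "C"
    ]


if __name__ == "__main__":
    reference = "ACGTCCGA"                      # toy sequence, 0-based positions
    read = bisulfite_convert(reference, {1})    # only the C at position 1 is methylated
    print(read)
    print(call_methylated_positions(reference, read))
```

Comparing the converted read with the reference is what allows methylated cytosines to be distinguished from unmethylated ones.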

The main techniques used in proteomics involve gel-based, chromatographic, reverse-phase microarray, and mass-spectrometry-based analyses, together with bioinformatic approaches [29]. Proteins are first isolated from a mixture through 2D gel electrophoresis, affinity chromatography, size-exclusion chromatography or ion-exchange chromatography. Protein identification is the next step; the most common technique is mass spectrometry, in which ionized proteins or peptides are separated according to their mass-to-charge ratio (m/z), enabling their identification [29].
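
As a small worked example of the mass-to-charge ratio, the Python sketch below computes the m/z expected for a molecule of a given neutral monoisotopic mass. It assumes positive-ion mode in which each charge comes from one added proton; the function names and the example mass of 1000.0 Da are illustrative choices.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def mass_to_charge(neutral_mass: float, charge: int) -> float:
    """m/z of an ion carrying `charge` protons (positive-ion mode assumed)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS_DA) / charge


def neutral_mass_from_mz(mz: float, charge: int) -> float:
    """Invert the relation above to recover the neutral mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - charge * PROTON_MASS_DA


if __name__ == "__main__":
    # Hypothetical peptide of 1000.0 Da observed in charge states 1+ and 2+.
    for z in (1, 2):
        print(z, mass_to_charge(1000.0, z))
```

The same molecule therefore appears at different m/z values depending on its charge state, which identification software has to take into account.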

Finally, by using nuclear magnetic resonance (NMR), protein microarrays, and mass spectrometry, it is possible to investigate the complete set of low-molecular-weight metabolites (<1500 Da) produced by a cell as a result of its metabolic activity. In this context, cellular and organismal metabolic functions can be studied by considering both the endometabolome (which consists of all intracellular metabolites) and the exometabolome (which includes metabolites secreted into the growth medium or extracellular fluid) [30].

2.2. Data Collection and Organization

The holistic understanding of interactions and relationships within biological systems requires a systematic approach to data acquisition and organization. The advent of online databases with open access has enabled researchers to utilize, analyze, and interpret this wealth of information effectively [31].

Databases can be divided into two main categories: primary and secondary databases [32]. Primary databases contain experimental data that researchers generate and submit directly. An example is GenBank, a comprehensive genetic sequence database that houses nucleotide sequences primarily obtained through submissions from individual laboratories [33]; this database plays a crucial role in storing and disseminating publicly available DNA sequences. Secondary databases consist of curated analyses and interpretations of primary data, making them more refined [32].

Databases are further classified according to the specific type of biological data they curate (Table 1):

(1): Sequence databases. These encompass both primary and secondary collections of DNA, RNA, or protein sequences. Besides the above-mentioned GenBank collection, other online available databases include the following: (i) EMBL (EMBL Nucleotide Sequence Database), a comprehensive repository of nucleotide sequences and annotations maintained by EMBL’s European Bioinformatics Institute (EMBL-EBI), drawing data from public databases [34]; (ii) RNAcentral, a public resource that provides integrated access to a continually updated, extensive collection of non-coding RNA sequences [35]; (iii) UniProt, a freely accessible resource for protein sequences and their functional annotations. The UniProt Knowledgebase (UniProtKB), containing over 227 million sequences, is continuously updated by the UniProt team using machine learning and data extracted from scientific literature [36].
(2): Gene expression databases. These store information about gene expression patterns across various cell types, tissues, or organisms at different times or under specific conditions. These databases enable, for example, the comparison of gene expression levels in healthy versus tumor tissues or in tissues treated with a placebo versus those treated with a drug. Gene Expression Omnibus (GEO) is a public functional genomics data repository, enabling researchers to explore and analyze gene expression data, including both raw and processed data; it also offers web tools that allow users to analyze and interpret data [37]. The Cancer Genome Atlas (TCGA) is a comprehensive database, containing over 20,000 tumor and matched normal samples across 33 of the most prevalent forms of cancer, all molecularly characterized at the DNA (copy number changes, epigenetic modifications), RNA (messenger RNA and microRNA), and protein levels [38].
(3): Genetic databases. These contain information on genetic variants, including mutations, SNPs, and other genomic modifications linked to genetic diseases and pathological conditions. Examples include the following: (i) ClinVar (Clinical Variation), which catalogs various types of genetic variants (SNPs, copy number variations, inversions, and translocations) together with their association with diseases; clinical relevance information for these variants is contributed by clinical testing labs, research institutions, and expert groups [39]; (ii) Single-Nucleotide Polymorphism Database (dbSNP), which stores information on single nucleotide variants, microsatellites, insertions, and deletions that are prevalent in the human genome, useful for cancer research and genetic association studies [40,41].
(4): Molecular structure databases. These provide access to the three-dimensional (3D) structures of biological molecules. Understanding these structures is critical for elucidating their functions and roles within cells. Among the most significant are the following: (i) Protein Data Bank (PDB), an open-access repository that houses over 210,000 experimentally validated 3D structures of proteins and nucleic acids [42]. The database is updated weekly with relevant functional annotations sourced from various external biodata resources [43]; (ii) Structural Classification of Proteins (SCOP), a database that organizes proteins with known 3D structures based on their evolutionary and structural relationships [44].
(5): Molecular interaction databases. These focus on biomolecular interactions, particularly protein–protein interactions (PPIs). By identifying biological pathways and molecular patterns and by revealing new protein functions, these databases can elucidate the molecular basis of various pathologies, making them valuable tools for prevention, diagnosis, and therapy [45]. Databases in this category include the following: (i) Biological General Repository for Interaction Datasets (BioGRID), which provides comprehensive information on protein and genetic interactions across multiple species (including yeast, mice, and humans), thus allowing users to create intricate network graphs [46]; (ii) STRING, a key resource for studying physical and functional PPIs, deriving from experimental interaction databases, scientific literature, and computational predictions based on co-expression [47]; (iii) IntAct, a curated database system and analysis tool for investigating molecular interactions derived from scientific literature and direct data submissions. IntAct features over one million binary interactions and is continuously updated, with annotations that detail how even minor sequence changes can affect protein interactions [48].
(6): Biological pathway databases. These provide valuable insights into the biological roles of molecules and the metabolic pathways they participate in. The most common functional database is the Kyoto Encyclopedia of Genes and Genomes (KEGG), a comprehensive database designed to assign functional meanings to genes and genomes at both molecular and broader biological levels. This integrated resource combines 15 manually curated databases with one computationally generated database, organized into four main categories (systems, genomic, chemical, and health information). KEGG serves as a vital tool for studying metabolism, genetic pathways, organismal functions, and human diseases [49].

Table 1. Major public databases for omics data integration and analysis in biomedical research.

Database type	Name	Details	Website (accessed 10 June 2025)
Sequence	GenBank	DNA sequences	https://www.ncbi.nlm.nih.gov/genbank/
Sequence	EMBL	Nucleotide sequences and annotations	https://www.ebi.ac.uk/embl/
Sequence	RNAcentral	Non-coding RNA sequences and annotations	https://rnacentral.org/
Sequence	UniProt	Protein sequences and annotations	https://www.uniprot.org/
Gene expression	GEO	Multi-omics data	https://www.ncbi.nlm.nih.gov/geo/
Gene expression	TCGA	Multi-omics data	https://www.cancer.gov/ccg/research/genome-sequencing/tcga
Genetic	ClinVar	Genetic variants and associations with diseases	https://www.ncbi.nlm.nih.gov/clinvar/
Genetic	dbSNP	Small genetic variations	https://www.ncbi.nlm.nih.gov/snp/
Molecular structure	PDB	3D structure of proteins, nucleic acids and complexes with functional annotations	https://www.rcsb.org/
Molecular structure	SCOP	Evolutionary and structural relationships of proteins	https://scop.mrc-lmb.cam.ac.uk/
Molecular interactions	BioGRID	Protein, genetic and chemical interactions	https://thebiogrid.org/
Molecular interactions	STRING	Protein–protein interactions	https://string-db.org/
Molecular interactions	IntAct	Molecular interactions for macromolecular complexes	https://www.ebi.ac.uk/intact/
Biological pathways	KEGG	High-level functions of biological systems	https://www.kegg.jp/

BioGRID: Biological General Repository for Interaction Datasets; ClinVar: Clinical Variation; dbSNP: Single-Nucleotide Polymorphism Database; EMBL: European Molecular Biology Laboratory; GEO: Gene Expression Omnibus; KEGG: Kyoto Encyclopedia of Genes and Genomes; PDB: Protein Data Bank; RNAcentral: RNA Central Database; SCOP: Structural Classification of Proteins; STRING: Search Tool for the Retrieval of Interacting Genes/Proteins; TCGA: The Cancer Genome Atlas; UniProt: Universal Protein Resource.

2.3. Data Analysis

The final step in the bioinformatics pipeline is data analysis, where raw data are converted into meaningful biological insights. It is the culmination of all preceding steps (data acquisition, collection, and organization) and utilizes computational tools and algorithms to extract relevant information. Open-access tools, together with publicly available databases, empower researchers to collaborate and share data, thereby accelerating progress in molecular research.

While not exhaustive, here we describe a selection of the most common web-based tools employed in bioinformatics analyses, all of which are freely available and thus more accessible to the broader scientific community (Table 2).

NGS procedures generate many short reads, and alignment is necessary to identify the segment of the reference sequence to which each read corresponds [50]. Sequence alignment is fundamental in comparative studies and phylogenetic analysis, since it allows comparison of two or more sequences (DNA, RNA, peptides or proteins) to highlight similarities and differences and thereby infer potential functional, structural or evolutionary relationships. Alignment is evaluated using a scoring system: for nucleotides, a positive score is given when identical bases are present in both sequences, while protein scoring considers chemical and physical features of amino acids (amino acids with similar characteristics receive higher scores) [50]. The most commonly used amino acid substitution scoring matrices include PAM [51], BLOSUM [52], and JTT [53].
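
The scoring idea described above can be illustrated with a short Python sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). The scores used here (match +1, mismatch -1, gap -2) are arbitrary values chosen for the example; for proteins, the match/mismatch term would be replaced by a lookup in a substitution matrix such as BLOSUM. Read mappers used on NGS data rely on indexed heuristics rather than this exhaustive approach, so the sketch only shows how an alignment score is built.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of sequences a and b (score only, no traceback)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "AGT"))
```

Higher scores indicate more similar sequences under the chosen scheme, which is why the choice of scoring parameters or substitution matrix affects the alignment that is reported.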

Splicing variants from TCGA can be analyzed with the Tumor Splicing Variants database (TSVdb) [54], a user-friendly interface for visualizing gene expression, splicing patterns and clinical data, which helps to clarify the relationships between isoforms and patient prognosis. Another online platform for the analysis of multidimensional cancer genomics is the cBioPortal for Cancer Genomics, which outputs complex molecular profiling data (deriving from cell lines or tumor tissue) in an easily interpretable format (containing genetic, epigenetic, gene expression, and proteomic information) [55].

Accurate discrimination of long non-coding (lnc) RNAs from protein-coding genes within transcriptomic data is achieved using LncRNA-ID, a powerful tool that employs a machine learning model to assess coding potential based on a comprehensive set of transcriptional features [56]. Another useful online tool is LncBook 2.0, which incorporates lncRNA annotations at different omics levels, thus making it possible to decipher lncRNA signatures in different pathophysiological contexts [57].

User-friendly web platforms, such as miRNet 2.0 [58], miRTargetLink 2.0 [59], and miRDB [60], are also available for exploring miRNA-centric networks; they help to clarify the intricate relationships between miRNAs and their target genes, as well as their role in several biological processes and diseases.

A major application of RNA-seq data is, however, the analysis of differentially expressed genes (DEGs), which allows researchers to compare gene expression levels across samples under different conditions. For example, RNA-seq can be employed to investigate gene expression variations between healthy and tumor tissues, providing insights into molecular mechanisms underlying disease and facilitating the identification of potential biomarkers or therapeutic targets [17]. The functional roles of DEGs can be inferred through Gene Ontology (GO) analysis, which provides a hierarchical vocabulary covering the biological processes, cellular components, and molecular functions associated with genes [61]. GO is continuously updated to reflect new scientific discoveries, ensuring it remains current with the latest advancements in biological knowledge [62]. GO term annotation analysis (mapping the provided entries to GO subsets) or enrichment analysis (scanning for GO categories that are overrepresented in the input list) can be performed through the open-source web applications GOnet [63] and ShinyGO 0.82 [64]. Easy Visualization and Inference Toolbox for Transcriptome Analysis (eVITTA) is a powerful tool providing modules for analyzing and exploring studies published in NCBI GEO (easyGEO), detailed molecular- and systems-level functional profiling (easyGSEA), and customizable comparisons among experimental groups (easyVizR) [65]. Other commonly used web-based tools are Integrated Differential Expression and Pathway analysis (iDEP) 2.0 [66] and g:Profiler [67].
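
Overrepresentation of a GO category in a gene list is commonly assessed with a hypergeometric (one-sided Fisher) test. The Python sketch below shows that calculation with made-up numbers; it is an assumption that this particular statistic matches what any of the tools named above compute internally, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int,
                       sample: int, overlap: int) -> float:
    """Probability of seeing at least `overlap` annotated genes when `sample`
    genes are drawn at random, without replacement, from `population` genes
    of which `annotated` belong to the GO category."""
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(population, sample)


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background genes, 200 in the category,
    # 500 DEGs, 15 of which fall in the category.
    print(enrichment_p_value(20000, 200, 500, 15))
```

Because many categories are tested at once, the resulting p-values are normally adjusted for multiple testing (for example with the Benjamini-Hochberg procedure) before categories are reported as enriched.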

AlphaFold represents a breakthrough in protein structure prediction. It is an artificial intelligence (AI) program, developed by DeepMind and EMBL-EBI, that predicts the 3D structure of a protein based on its primary amino acid sequence, by using machine learning. This tool has transformed bioinformatics and molecular biology research fields, by enabling accurate prediction of protein structures, crucial for understanding protein functions and interactions with other biomolecules. The latest version, AlphaFold 3, released in 2024, can predict not only the structure of proteins but also that of DNA and RNA, as well as identifying ligands and their interactions. This advancement is made possible through a deep learning architecture, representing a significant leap forward in comprehending complex biochemical interactions [68].

Finally, molecular docking is a computational method used to predict binding affinity between ligands and receptors, particularly useful in drug discovery [69]. The process involves two main steps: (i) predicting the binding pose of the compound, which is the most energetically favorable arrangement when the compound is attached to the target, and (ii) estimating the binding energy that holds the compound to the target. Two key inputs are required: the 2D chemical structure of the compound and the 3D structure of the target, which can be obtained by X-ray crystallography or nuclear magnetic resonance spectroscopy [69,70,71]. AutoDock Vina 1.2.0 [72] and CB-Dock2 [73] are some of the freely accessible docking tools.
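
Docking scores are often expressed in kcal/mol. If such a score is interpreted as an approximate binding free energy (an assumption, since scoring functions are empirical estimates), it can be related to a dissociation constant through ΔG = RT ln(Kd). The Python sketch below applies this relation; the function name, the temperature of 298.15 K, and the example score of -8.0 kcal/mol are illustrative choices.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)


def dissociation_constant(delta_g_kcal_per_mol: float,
                          temperature_k: float = 298.15) -> float:
    """Kd (in mol/L, relative to a 1 M standard state) from a binding free energy."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    return math.exp(delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))


if __name__ == "__main__":
    # A hypothetical docking score of -8.0 kcal/mol corresponds to a Kd
    # in the low micromolar range under the assumptions stated above.
    print(dissociation_constant(-8.0))
```

More negative energies translate into smaller Kd values, that is, tighter predicted binding; such estimates are best used to rank compounds rather than as measured affinities.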

2.4. Machine Learning and AI

Biological data are challenging to analyze because of their complexity; therefore, there is a growing use of machine learning and AI tools aimed at creating informative and predictive models of the underlying biological processes [74]. These tools are especially valuable for handling datasets that are too large or intricate for human analysis, as well as for automating data analysis tasks. The fields of machine learning and AI are continuously evolving, as demonstrated by the sharp rise in global publication trends from 2018 to 2023, underscoring the growing influence of deep learning, especially in BC diagnosis and treatment [75]. While they are not a one-size-fits-all solution, particularly when datasets are insufficient or when the focus is on understanding rather than prediction, they remain a crucial resource in biological research [74].

3. Bioinformatics in BC Research

Despite advances in BC research, mortality remains high and continues to rise, highlighting the ongoing difficulties in early detection, personalized treatment, and monitoring. Consequently, identifying novel biomarkers is crucial for improving patient outcomes, and multi-omics technologies and bioinformatics-driven integration of the resulting datasets are yielding clinical breakthroughs in personalized treatment and prognostics.

3.1. Diagnostic and Prognostic Biomarkers

Early detection is a cornerstone of improved survival rates; nonetheless, it currently relies on imaging and histopathology, which may lack sensitivity for precancerous lesions or minimal residual disease, hence the need for novel diagnostic biomarkers. Biomarkers such as gene expression signatures (multigene signatures), circulating tumor DNA (ctDNA) and ncRNAs offer non-invasive, high-resolution tools for earlier detection. This could revolutionize screening, particularly for high-risk groups, such as patients with TNBC, which remains difficult to treat.

The combined analysis of two BC microarray datasets, together encompassing 30 BC and 33 normal breast samples, identified a total of 733 DEGs; GO enrichment and KEGG pathway analysis, as well as construction of a PPI network, led to the identification of 10 hub genes, strictly linked to growth and proliferation, among which six were strongly associated with BC progression. These DEGs are intimately linked to BC, displaying elevated expression across diverse BC subtypes, especially in TNBCs, and being notably correlated with diminished patient survival rates [76] (Table 3).
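
One common way to nominate hub genes from a PPI network is to rank nodes by degree, that is, by their number of distinct interaction partners. The Python sketch below illustrates this ranking on a toy edge list; it is an assumption that degree was the criterion used in the study cited above, and the gene labels are placeholders rather than real results.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def hub_genes(edges: Iterable[Tuple[str, str]], top_n: int = 10) -> List[Tuple[str, int]]:
    """Rank genes by degree in an undirected interaction network."""
    # Treat (a, b) and (b, a) as the same interaction and ignore self-loops.
    unique_edges = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}

    degree = Counter()
    for gene_a, gene_b in unique_edges:
        degree[gene_a] += 1
        degree[gene_b] += 1

    # Sort by descending degree, then by name so that ties are deterministic.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_n]


if __name__ == "__main__":
    toy_network = [
        ("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_A", "GENE_D"),
        ("GENE_B", "GENE_C"), ("GENE_C", "GENE_B"),  # duplicate interaction
    ]
    print(hub_genes(toy_network, top_n=2))
```

Other centrality measures (betweenness, closeness) can give different rankings, so the criterion used should be reported alongside the hub list.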

Bioinformatics can also represent a powerful tool for classifying BC functional subtypes and stages, as well as for assessing the best treatment and predicting recurrence [96,97]. For instance, Li and coworkers [98] reported a computational framework combining transcriptomic profiling (101 normal breast and 218 TNBC tissue samples from GEO database) and protein interaction network to identify molecular drivers of TNBC. Fifty-four TNBC-related genes were identified, mostly related to invasion and metastasis and viral carcinogenesis. The authors also developed a novel high-risk BC prediction model, by implementing a pre-existing algorithm, whose accuracy reached 95.394% for TNBC diagnosis and 86.598% for TNBC staging.

By using the 80-gene molecular subtyping BluePrint test, Kuilman’s group identified and characterized a rare, but biologically distinct group of early-stage BCs, referred to as “dual subtypes” displaying both luminal basal and HER2 features [99].

Combination of WGS and RNA-seq approaches allowed identification of several fusion genes (ESR1-CCDC170, BCL2L14-ETV6, ETV6-NTRK3, MYB-NFIB, and NOTCH/MAST kinase) that may drive BC progression and may be useful for identifying patients who need closer monitoring and more aggressive therapy [100,101].

scRNA-seq performed on BC cells from axillary lymph nodes provided better insight into the mechanisms underlying BC metastasis [79]. The authors analyzed five primary BC tissues (27,028 single cells) and 10 paired axillary lymph nodes (69,768 single cells), drawing a complete transcriptome profile visualized with UMAP, a manifold learning and dimension reduction algorithm. The study identified nine cancer cell subclusters, including CD44⁺/ALDH2⁺/ALDH6A1⁺ BC stem cells, as well as key genes involved either in lymph node metastasis (PTMA, STC2, CST3, and RAMP3 genes) or in interactions between metastatic cancer cells and immune cells (NECTIN2-TIGIT and LGALS1-PTPRC interactions). Based on these findings, the authors concluded that BC progression may be predicted by evaluating the transcriptome profile, gene set score, and cellular composition of cancer cell clusters [79].

Recent classification systems leveraging protein expression profiling have been proposed to more effectively discern the functional phenotypic variations contributing to BC complexity and heterogeneity; a more precise classification can indeed improve prognostic accuracy and treatment outcomes. For instance, proteomic profiling of 300 formalin-fixed paraffin-embedded (FFPE) BC specimens revealed distinct protein signatures linked to different immune responses and clinical outcomes. In particular, basal-like BCs can be classified into two subgroups (immune hot/favorable prognosis and immune cold/unfavorable prognosis); HER2-enriched BCs can be divided into three subgroups, depending on lipid metabolism, immune-response and extracellular matrix characteristics, while in TNBCs, four proteomic clusters (basal-immune hot, basal-immune cold, mesenchymal, and luminal) can be recognized [84].

Likewise, Jeon and coworkers [85] conducted a proteomic analysis of 56 FFPE BC biopsies based on immune subtypes (immune-inflamed, immune-excluded, and immune-desert) to investigate the relationship between immune characteristics and clinical outcomes. Although no differences in prognosis were detected, the three groups differed in their proteomic signatures. The immune-inflamed group, which responds better to immune checkpoint inhibitors, exhibited higher levels of coronin-1A, while immune-excluded/desert tumors (associated with an unfavorable response to immunotherapy) displayed upregulated α-1-antitrypsin levels. Furthermore, a positive correlation was observed between tumor-infiltrating lymphocytes (TILs), known to improve prognosis and treatment responses, and coronin-1A expression, while α-1-antitrypsin levels were negatively correlated with TILs [85].

Additionally, through high-throughput mass spectrometry, bioinformatic, and machine learning approaches, Azevedo and colleagues [86] have defined protein expression patterns specific to each BC subtype. Notably, TNBCs exhibited the most extensive proteomic alterations, with 343 overexpressed and 121 downregulated proteins. Distinct biological pathways and molecular changes were linked to each BC subtype, alongside unique patterns of oncoprotein and tumor suppressor expression. Similarly, proteomic profiling of 60 human BC cell lines identified over 13,000 proteins, enabling subtype classification. Importantly, specific protein signatures correlated with receptor (ER, PR, HER2) status, underscoring their potential utility in biomarker discovery and drug development [89].

Metabolomic studies have further delineated pathways associated with BC onset and progression. In one comprehensive study analyzing both the metabolome and lipidome of 330 TNBC samples and 149 paired normal breast tissues, TNBCs were stratified into three distinct metabolomic subtypes: (i) C1, enriched in ceramides and fatty acids; (ii) C2, enriched in oxidation-related metabolites and glycosyl transfer products; (iii) C3, characterized by minimal metabolic dysregulation [82]. Integrating these data with available genomic and transcriptomic profiles revealed subtype-specific metabolites as potential therapeutic targets. For instance, the transcriptomic luminal androgen receptor subtype overlapped with C1, where sphingosine-1-phosphate (a key metabolite in the ceramide pathway) emerged as a promising therapeutic candidate; meanwhile, the transcriptomic basal-like immune-suppressed subtype corresponded to C2 and C3, where N-acetyl-aspartyl-glutamate emerged as another potential target [82]. Elia and coworkers [102] linked proline catabolism to in vivo metastasis formation, further demonstrating that inhibition of proline dehydrogenase was sufficient to impair lung metastasis development in mouse models.

Also, changes in the epigenomic landscape serve as key discriminators between healthy and cancerous breast tissues. We recently identified a critical feed-forward regulatory loop involving EBF1, ETS2, KLF2 transcription factors, and miR-126. Disruption of this EBF1/ETS2/KLF2-miR-126-gene circuit promotes oncogenic transformation and progression in BC. Compared to healthy cells, the three transcription factors were significantly downregulated in BC due to epigenetic silencing or a “poised but not transcribed” promoter state. This downregulation altered the expression of cancer-related genes, thereby facilitating malignant transformation [81].

Distinct epigenetic regulation patterns have been reported across BC subtypes. For example, Karsli-Ceppioglu and coworkers [103] observed aberrant gene regulation linked to changes in H3K9ac and/or H3K27me3 epi-marks, especially in TNBC, luminal B and HER2⁺ tumors. Among DEGs associated with these histone modifications (79 DEGs for H3K9ac and 37 genes for H3K27me3), key transcription factors such as PAX3, DLX5, RUNX1, and GATA4 were strongly implicated in BC progression. Similarly, decreases in H3K9me2/3 levels coupled with increased activity of KDM3A/JMJD1A in BC contribute to aberrant expression of critical genes, including MYC, PAX3, WNT5A, and CDKN2A/B. These epigenetic alterations play pivotal roles during transformation and hold promise as diagnostic markers and therapeutic targets [104]. Beyond histone modifications, a wide array of epigenetic mechanisms—including DNA methylation and ncRNA regulation—drive essential processes underlying tumorigenesis and metastatic potential in BC. For a comprehensive overview of the BC epigenome, including these multifaceted alterations, see the detailed review in [105].

In 2023, Choi and Chae [87] introduced moBRCA-net, an interpretable deep learning-based framework for BC subtype classification utilizing multi-omics datasets: this framework consists of four modules (preprocessing, multi-omics data integration, omics-level feature importance learning, and classification) that, through a multi-omics integration strategy, combine gene expression, microRNA expression, and DNA methylation information for BC classification. Similarly, by using the Multi-Omics Factor Analysis v2 (MOFA+), an advanced statistical framework integrating transcriptomic, proteomic, and metabolomic data [106], Sharma and coworkers [107] successfully distinguished highly aggressive BCs from less aggressive forms. Furthermore, by using the Cancer Integration via Multikernel LeaRning (CIMLR) algorithm, integrating genomic, methylation, transcriptomic, microRNA expression, and protein data, Malighetti and colleagues [108] identified three prognostic biomarkers (LMO1, PRAME, and RSPO2), whose overexpression was consistently associated with worse outcome in both primary and metastatic BC patients.
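The simplest form of multi-omics integration can be sketched as standardizing each omics layer separately and then concatenating the features of each sample. The example below illustrates only this generic "early integration" step; it is not the architecture of moBRCA-net, MOFA+, or CIMLR, and the function names `zscore_columns` and `integrate`, as well as all numerical values, are hypothetical.

```python
from statistics import mean, pstdev

def zscore_columns(matrix):
    """Standardize each feature (column) across samples; constant columns become 0."""
    scaled_columns = []
    for col in zip(*matrix):
        mu, sigma = mean(col), pstdev(col)
        scaled_columns.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]

def integrate(omics_layers):
    """Concatenate standardized feature blocks sample by sample (early integration)."""
    scaled = [zscore_columns(layer) for layer in omics_layers]
    n_samples = len(scaled[0])
    if any(len(layer) != n_samples for layer in scaled):
        raise ValueError("all omics layers must describe the same samples")
    return [sum((layer[i] for layer in scaled), []) for i in range(n_samples)]

# Hypothetical values for three samples (rows); columns are features.
gene_expression = [[5.0, 1.0], [7.0, 1.0], [9.0, 1.0]]
mirna_expression = [[0.2], [0.4], [0.6]]
dna_methylation = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]

combined = integrate([gene_expression, mirna_expression, dna_methylation])
```

Each row of `combined` holds the five standardized features of one sample and could be passed to any downstream classifier. Standardizing per layer prevents an omics type with large numerical ranges from dominating the others, which is one reason dedicated frameworks learn omics-level weights instead of plain concatenation.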

Finally, the application of omics technologies and bioinformatics to plasma analysis has emerged as a powerful approach in BC research, offering significant promise for early detection, molecular subtype differentiation, and therapy response prediction. This approach is minimally invasive, less time-consuming, and more cost-effective compared to traditional methods. Plasma metabolomics, empowered by advanced bioinformatics, is enabling the discovery of novel biomarkers and therapeutic targets. Plasma metabolome and proteinogram analyses performed on 216 healthy, benign, and BC subjects identified specific metabolic profiles for each condition [83]. Specifically, glutamate and glutamine metabolism, alanine, aspartate, and glutamate metabolism, and arginine biosynthesis were found to be downregulated in BC patients; moreover, among the 31 differentially expressed proteins, aspartate aminotransferase, L-lactate dehydrogenase B chain, glutathione synthetase, and glutathione peroxidase 3 were closely linked to these metabolic pathways [83]. Two recent studies further exemplify the prognostic and predictive power of this approach. The first study [88] analyzed associations between 2074 circulating proteins and the risk of nine common cancers in a cohort of 337,822 cancer cases. It identified 21 proteins associated with BC risk. Among them, nine proteins (AOC2, SPN1, CD160, RALB, GDI2, CPNE1, ULK3, CTSF, and PLAUR) showed colocalized associations with multiple BC molecular subtypes. Notably, PLAUR exhibited a strong positive association with overall BC risk and all subtypes except HER2-enriched tumors [88]. The second study [80] employed a rigorous two-phase analytical framework combining proteome/transcriptome-wide association studies with Mendelian Randomization to identify plasma proteins that are not only associated with, but also causally linked to, BC risk. This approach leveraged large-scale high-throughput datasets and included extensive validation and sensitivity analyses to ensure robustness. The study identified five plasma proteins with strong causal associations with BC: PEX14 and CTSF showed positive causal effects, while SNUPN, CSK, and PARK7 were negatively associated with risk. Importantly, PEX14 was the only protein with a strong causal effect specific to the ER⁻ BC subtype, underscoring its subtype-specific importance [80].
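As a rough illustration of how Mendelian Randomization turns genetic associations into a causal estimate, the sketch below computes per-variant Wald ratios (variant effect on the outcome divided by variant effect on the exposure) and pools them with fixed-effect inverse-variance weighting. This is a textbook two-sample estimator and not necessarily the exact pipeline of [80]; the function names and all summary statistics are hypothetical, and the standard error uses a first-order approximation that ignores uncertainty in the exposure effect.

```python
from math import sqrt

def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Per-variant causal estimate and its first-order standard error."""
    estimate = beta_outcome / beta_exposure
    std_error = se_outcome / abs(beta_exposure)
    return estimate, std_error

def ivw_estimate(variants):
    """Fixed-effect inverse-variance weighted combination of Wald ratios."""
    ratios = [wald_ratio(bx, by, se) for bx, by, se in variants]
    weights = [1.0 / se ** 2 for _, se in ratios]
    total = sum(weights)
    estimate = sum(w * r for w, (r, _) in zip(weights, ratios)) / total
    return estimate, sqrt(1.0 / total)

# Hypothetical summary statistics for three protein-associated variants:
# (effect on plasma protein level, effect on BC log-odds, SE of the outcome effect).
instruments = [
    (0.50, 0.10, 0.02),
    (0.40, 0.08, 0.02),
    (0.25, 0.05, 0.01),
]

beta_ivw, se_ivw = ivw_estimate(instruments)
```

Because every invented variant has the same ratio of outcome to exposure effect (0.2), the pooled estimate is also 0.2; with real data, heterogeneity between the per-variant ratios is one of the signals examined in sensitivity analyses.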

3.2. Harnessing Bioinformatics for BC Therapy

Beyond diagnosis, biomarkers are also indispensable for tailoring therapies to individual tumor biology [109]. This is especially true given the high intra- and inter-tumoral heterogeneity of BC, which demands more precise, prognosis-based risk stratification of patients and avoidance of overtreatment in low-risk cases. Over the past decade, indeed, targeted therapy has become a central focus in oncology research and clinical practice, fundamentally transforming the treatment landscape for many cancers, including BC. A pivotal role is played by chemogenomic profiling that, by integrating genetic data with drug sensitivity patterns, maps molecular drivers of treatment response, decodes resistance mechanisms, and optimizes treatment selection, particularly in aggressive TNBC subtypes. In this context, Savage and co-workers [110] developed a library of 37 patient-derived xenografts (PDX) from hard-to-treat BCs, which retain the molecular and phenotypic characteristics of the original patient tumors. By combining different technologies (WGS, RNA-seq, reverse-phase protein array, drug sensitivity testing), the authors generated a comprehensive molecular profile of this PDX library and screened its metastatic potential and chemosensitivity, indicating that it can represent a valuable preclinical resource.

Pharmacogenomics also plays a pivotal role, as genomic testing can be used to identify specific mutations or polymorphisms that drive cancer growth and/or influence responses to drug therapy. Through this approach, BRCA1 and BRCA2 mutations have been identified as targetable genetic alterations in metastatic BC; targeted therapies (PARP inhibitors, such as olaparib) have indeed been shown to be effective in treating patients with BRCA-positive BCs [111].

By providing insights into BC molecular subtypes and risk of recurrence, the Prediction Analysis of Microarray 50 (PAM50) [112] appears useful for tailoring treatment plans. For instance, Ohara and collaborators [113] evaluated the response to neoadjuvant chemotherapy for ER⁺ BC, demonstrating that PAM50 gene expression profiling provided a more accurate prediction of response than immunohistochemistry alone [113].
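Centroid-based subtype callers such as PAM50 assign a tumor to the subtype whose reference expression profile it most closely resembles. The sketch below illustrates this idea with a nearest-centroid rule based on Spearman rank correlation; it is a simplified illustration rather than the validated assay, the five gene symbols merely stand in for the 50-gene panel, and the centroid and tumor values are hypothetical.

```python
from statistics import mean

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def nearest_centroid(sample, centroids):
    """Return the subtype whose centroid correlates best with the sample."""
    return max(centroids, key=lambda name: spearman(sample, centroids[name]))

GENES = ["ESR1", "PGR", "ERBB2", "MKI67", "KRT5"]
# Hypothetical centroids (arbitrary units), ordered as in GENES.
CENTROIDS = {
    "Luminal A": [9.0, 8.0, 4.0, 2.0, 3.0],
    "HER2-enriched": [3.0, 2.0, 9.0, 6.0, 4.0],
    "Basal-like": [1.0, 2.0, 3.0, 8.0, 9.0],
}

tumor = [8.5, 7.0, 3.5, 3.0, 2.0]  # hypothetical ER-high, low-proliferation profile
subtype = nearest_centroid(tumor, CENTROIDS)
```

A rank-based correlation is used here because it depends only on the ordering of genes within a sample, which makes the call less sensitive to platform-specific scaling than a distance on raw values.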

Proteomic approaches have been used to predict treatment response as well. For instance, by analyzing a BC cohort of 113 FFPE samples, before and after chemotherapy, Shenoy and colleagues [90] identified two proteins involved in proline biosynthesis, PYCR1 and ALDH18A1, significantly associated with chemotherapy resistance in a subtype-specific manner. In this context, a publicly available high-throughput proteomic dataset of over 13,000 proteins from 60 human BC cell lines, when combined with other omics data (genomics, transcriptomics, phosphoproteomics), aids in identifying markers predictive of drug response according to cancer subtype [89].

A strong predictor of therapy response and outcome is represented by the characterization of the immune tumor microenvironment (ITME), a complex and heterogeneous network consisting of varying stromal and immune cell populations, extracellular matrix, and signaling molecules [114]. Several computational tools, such as digital cytometry, allow the study of ITME. For example, CIBERSORT and its successor CIBERSORTx use bulk tumor RNA-seq data to digitally estimate the proportions of immune cell types, offering a more affordable alternative to scRNA-seq [115]. To overcome bias linked to these methods (such as inaccurate quantification of tumor-infiltrating immune cells), Fernandez and coworkers [116] developed the MIXTURE algorithm, which deconvolutes cell-type proportions of bulk tumor samples by using a leukocyte-validated gene signature, thus improving the accuracy of immune cell proportions related to outcome and response to immune checkpoint blockade. Recently, Zerdes and colleagues [117] characterized the spatial organization of ITME in early BC patients through machine learning approaches: by using biopsies from patients enrolled in the EORTC 10994/BIG 1–00 randomized phase III neoadjuvant trial (NCT00017095), they suggested that machine learning algorithms show promise in accurate characterization of the immune infiltrate, with valuable prognostic and therapeutic implications [117].
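The principle behind digital cytometry can be sketched as a constrained regression: bulk expression is modeled as a signature matrix of cell-type reference profiles multiplied by unknown, non-negative cell-type fractions. The example below solves this with non-negative least squares via multiplicative updates, a deliberately simple stand-in for the more robust regression schemes used by the published tools; the signature values, the choice of marker genes, and the function name `deconvolve` are assumptions made for illustration.

```python
def deconvolve(signature, bulk, n_iter=500):
    """Estimate non-negative cell-type fractions f such that signature * f ~ bulk.

    signature: list of gene rows, each holding reference expression per cell type.
    Multiplicative updates keep f non-negative; the result is rescaled to sum to 1.
    """
    n_types = len(signature[0])
    f = [1.0 / n_types] * n_types
    # Precompute S^T b and S^T S once.
    stb = [sum(row[k] * b for row, b in zip(signature, bulk)) for k in range(n_types)]
    sts = [[sum(row[k] * row[m] for row in signature) for m in range(n_types)]
           for k in range(n_types)]
    for _ in range(n_iter):
        denom = [sum(sts[k][m] * f[m] for m in range(n_types)) for k in range(n_types)]
        f = [f[k] * stb[k] / (denom[k] + 1e-12) for k in range(n_types)]
    total = sum(f)
    return [v / total for v in f]

CELL_TYPES = ["T cells", "B cells", "Monocytes"]
# Hypothetical signature matrix; rows are CD3E, CD19, CD14, PTPRC.
SIGNATURE = [
    [10.0, 0.5, 0.5],
    [0.5, 10.0, 0.5],
    [0.5, 0.5, 10.0],
    [5.0, 5.0, 5.0],
]

# Noise-free synthetic bulk sample built from known fractions.
true_fractions = [0.6, 0.3, 0.1]
bulk = [sum(s * t for s, t in zip(row, true_fractions)) for row in SIGNATURE]

estimated = deconvolve(SIGNATURE, bulk)
```

On this noise-free mixture the estimated fractions converge to the known ones. Real tumor data add noise, non-immune content, and collinear cell types, which is why signature validation, as emphasized for MIXTURE, matters in practice.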

Although many drugs have been approved for BC treatment [118], new therapies are urgently needed to address the limitations of current options, including resistance, toxicity, and poor outcomes in aggressive or advanced cases. In recent years, combining biological and computational approaches (drug databases, tools for the analysis of drug–target interactions, molecular docking, and AI technologies) has offered promising strategies for drug repurposing [119,120,121]. Several drugs originally approved for non-cancer or other cancer indications are currently under active investigation, although formal FDA (Food and Drug Administration) approval for BC is still lacking for most of them (Table 4).

For example, integration of transcriptomic data and structural docking enabled the repurposing of existing FDA-approved drugs, such as dolasetron and granisetron (two serotonin receptor antagonists, approved as anti-emetics in the context of cancer chemotherapy), which have been shown to behave as aromatase inhibitors, thus suggesting their usefulness in hormone-related BC treatment [94]. By interrogating the GSCALite web server, three potentially repurposable drugs (the mitogen-activated protein kinase kinase inhibitors trametinib, selumetinib, and refametinib) were found to target key genes (BUB1, ASPM, TTK, CCNA2, CENPF, RFC4, and CCNB1) involved in BC development [92].

Several pre-clinical and clinical studies have underlined both direct (inhibition of mitochondrial activity and mTOR signaling, AMPK activation, with subsequent reduction in protein synthesis, proliferation and cell growth) and indirect (reduction in gluconeogenesis, inflammation, insulin and IGF-1 levels, and stimulation of glucose uptake) effects of metformin (an anti-diabetic drug) on BC [122]. In addition, by combining metabolic markers, dynamic FDG-PET-CT imaging, transcriptomics, and metabolomics in BC patients, two distinct metabolic adaptation patterns to metformin have been identified: the first one (linked to metformin-resistance) showed increased expression of oxidative phosphorylation and fatty acid oxidation genes, while the second one (linked to metformin-sensitivity) displayed increased glucose uptake [93,123].

Finally, a novel computational drug repurposing approach has been developed that integrates omics data into a network-based machine learning framework, effectively capturing the complex interactions among drugs, genes, and BC subtypes. Using this method, ruxolitinib was successfully identified as a potential new drug for personalized BC treatment [95].

Table 4. Examples of repurposed and investigational drugs for BC: original indications and current evidence.

Drug	Class	Original FDA-Approved Use	Repurposing for BC (Approval Status)	Refs
Anastrozole	Aromatase inhibitor	Therapy in postmenopausal women with advanced HR⁺ BC; adjuvant therapy in early HR⁺ BC	Postmenopausal women at high risk of developing BC (UK-approved)	[124]
Azelastine	Histamine receptor antagonist	Allergy	HR⁺, HER2⁺ and TNBC subtypes (not yet approved, pre-clinical study)	[125]
Diclofenac	COX inhibitor	NSAID for pain and inflammation	TNBC (not yet approved, pre-clinical study)	[126]
Metformin	Mitochondrial complex I inhibitor	Type-2 diabetes mellitus	HR⁺, HER2⁺ and TNBC subtypes (not yet approved, pre-clinical and clinical studies)	[122]
Nebivolol	β-adrenergic receptor antagonist	Hypertension	TNBC (not yet approved, pre-clinical studies)	[127,128]
Olaparib	PARP inhibitor	Advanced BRCA-mutated ovarian cancer	Early and metastatic BRCA-mutated BC (FDA-approved)	[129]
Ruxolitinib	JAK inhibitor	Bone marrow and blood cancers	HR⁺ metastatic BC and TNBC (not yet approved, clinical study)	[130]
Trametinib	MEK inhibitor	BRAF-mutated melanoma, NSCLC, thyroid cancer, and low-grade gliomas	TNBC (not yet approved, clinical study)	[131]

BC: Breast Cancer; COX: Cyclooxygenase; FDA: Food and Drug Administration; HER2: Human Epidermal Growth Factor Receptor 2; HR: Hormone Receptor; JAK: Janus Kinase; MEK: Mitogen-Activated Protein Kinase Kinase; NSCLC: Non-Small Cell Lung Cancer; NSAID: Non-Steroidal Anti-Inflammatory Drug; PARP: Poly (ADP-Ribose) Polymerase; TNBC: Triple-Negative Breast Cancer; UK: United Kingdom.

In conclusion, integration of bioinformatics with genomic, proteomic, and pharmacological data is revolutionizing BC management by enabling increasingly precise therapy personalization. This multidisciplinary approach, indeed, not only allows understanding of the molecular mechanisms underlying therapy response but can also facilitate the discovery of novel biomarkers and/or the repurposing of existing drugs.

4. Conclusions

Due to the complex molecular landscape of BC, multi-omics approaches integrated with bioinformatics represent a relevant opportunity for more precisely defining the prognosis and selecting the best treatment for each individual patient. This field is continuously advancing, and some tests are already in use in some countries. Currently, five main gene expression profiling tests for BC are commercially available, namely Prosigna® (PAM50), Mammaprint® (based on the Amsterdam 70-gene BC gene signature), Oncotype DX® (21-gene recurrence score assay), Breast Cancer Index® (11-gene signature for predicting the risk of recurrence), and Endopredict® (12-gene prognostic test) [132]. These tests provide valuable information to personalize therapies and improve disease management.

However, there are still many challenges that need to be overcome to make progress in this area. The first challenge is represented by the difficulty in integrating omics data, such as genomics, transcriptomics, and proteomics, to obtain a better understanding of biomolecular mechanisms, as each omics provides unique information about different aspects of the molecular system under study. Due to the dimensions and complexity of these data, sophisticated bioinformatics and expertise are required for analysis, where machine learning and AI play a crucial role by enhancing speed and precision.

AI supports knowledge-based approaches (such as identifying relationships between biomarkers through models that better integrate well-known biological structures) and manages challenges like noisy data, missing information from some omics technologies, and reducing reliance on manual feature engineering [133,134].

Another obstacle to address concerns big data generated by many sequencing methods. These data derive from different sources, are highly heterogeneous and, often, not related to each other. Moreover, the error rate is quite high, thus making it difficult to discriminate genetic variants from sequencing errors and to interpret data since many variants are not clinically relevant to the diseases [135]. These problems can be solved by combining data from different datasets, e.g., PharmGKB [136] and ClinVar [39], which curate and collate information on many variants, especially in the field of drug response [137].

The lack of published guidelines about bioinformatics may create confusion on how to establish and validate bioinformatics pipelines. Appropriate quality control is crucial to ensure that the generated data are robust, accurate, reproducible, and traceable. To address these challenges, the Association for Molecular Pathology issued clinical practice guidelines and reports (containing 17 consensus recommendations) designed to standardize the validation process of clinical NGS bioinformatics pipelines [138].

We mentioned above the role of machine learning and deep learning for diagnosis, staging, treatment, and prognosis in BC studies. However, AI challenges include nonuniformity of data standards, high development costs of AI systems, and instability of AI models. In particular, the latter problem is due to several factors, including incorrect data selection and accuracy, inherent bias, and model misspecification; thus, stability testing is mandatory for ensuring that AI works reliably. Going forward, employing multimodal learning to combine medical imaging with holographic data will become a valuable asset in research [139].

Looking ahead, bioinformatics will need standardized procedures and publicly accessible data for analysis to guarantee consistency and reproducibility. Increased investment in computational infrastructure and educational initiatives focused on bioinformatics and AI will further advance their use. To boost the reliability and broad applicability of AI-driven models, it is crucial to foster interdisciplinary collaboration and conduct comprehensive clinical validation [140].

Author Contributions

M.V., I.S., E.C., V.G., A.G. and M.V.C., data curation, methodology, writing the original draft, review, and editing. A.G., M.V.C. and V.G., supervision, writing, review, and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been supported by Tor Vergata University of Rome-Progetti Ricerca Scientifica d’Ateneo 2021 n°. 0000147/2022 to M.V.C. We also thank the sources of private funding who provided the basis for the study: “Progetto cellule staminali” NAST Centre.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable to this article as no datasets were generated or analyzed during this current study.

Acknowledgments

During the preparation of this manuscript, the authors used Perplexity AI (version of 20 June 2025) for the purposes of assistance with English language editing and conceptual clarity. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Kim, J.; Harper, A.; McCormack, V.; Sung, H.; Houssami, N.; Morgan, E.; Mutebi, M.; Garvey, G.; Soerjomataram, I.; Fidler-Benaoudia, M.M. Global Patterns and Trends in Breast Cancer Incidence and Mortality across 185 Countries. Nat. Med. 2025, 31, 1154–1162.
Fumagalli, C.; Barberis, M. Breast Cancer Heterogeneity. Diagnostics 2021, 11, 1555.
Lüönd, F.; Tiede, S.; Christofori, G. Breast Cancer as an Example of Tumour Heterogeneity and Tumour Cell Plasticity during Malignant Progression. Br. J. Cancer 2021, 125, 164–175.
Tan, P.H.; Ellis, I.; Allison, K.; Brogi, E.; Fox, S.B.; Lakhani, S.; Lazar, A.J.; Morris, E.A.; Sahin, A.; Salgado, R.; et al. The 2019 World Health Organization Classification of Tumours of the Breast. Histopathology 2020, 77, 181–185.
Radu, I.; Scripcariu, V.; Panuța, A.; Rusu, A.; Afrăsânie, V.-A.; Cojocaru, E.; Aniței, M.G.; Alexa-Stratulat, T.; Terinte, C.; Șerban, C.F.; et al. Breast Sarcomas—How Different Are They from Breast Carcinomas? Clinical, Pathological, Imaging and Treatment Insights. Diagnostics 2023, 13, 1370.
Orrantia-Borunda, E.; Anchondo-Nuñez, P.; Acuña-Aguilar, L.E.; Gómez-Valles, F.O.; Ramírez-Valdespino, C.A. Subtypes of Breast Cancer. In Breast Cancer; Exon Publications: Brisbane, QLD, Australia, 2022; pp. 31–42.
Breast Cancer Association Consortium. Breast Cancer Risk Genes—Association Analysis in More than 113,000 Women. N. Engl. J. Med. 2021, 384, 428–439.
Taylor, A.; Brady, A.F.; Frayling, I.M.; Hanson, H.; Tischkowitz, M.; Turnbull, C.; Side, L. Consensus for Genes to Be Included on Cancer Panel Tests Offered by UK Genetics Services: Guidelines of the UK Cancer Genetics Group. J. Med. Genet. 2018, 55, 372–377.
McDevitt, T.; Durkie, M.; Arnold, N.; Burghel, G.J.; Butler, S.; Claes, K.B.M.; Logan, P.; Robinson, R.; Sheils, K.; Wolstenholme, N.; et al. EMQN Best Practice Guidelines for Genetic Testing in Hereditary Breast and Ovarian Cancer. Eur. J. Hum. Genet. 2024, 32, 479–488.
Mavaddat, N.; Dorling, L.; Carvalho, S.; Allen, J.; González-Neira, A.; Keeman, R.; Bolla, M.K.; Dennis, J.; Wang, Q.; Ahearn, T.U.; et al. Pathology of Tumors Associated With Pathogenic Germline Variants in 9 Breast Cancer Susceptibility Genes. JAMA Oncol. 2022, 8, e216744.
Feng, Y.; Spezia, M.; Huang, S.; Yuan, C.; Zeng, Z.; Zhang, L.; Ji, X.; Liu, W.; Huang, B.; Luo, W.; et al. Breast Cancer Development and Progression: Risk Factors, Cancer Stem Cells, Signaling Pathways, Genomics, and Molecular Pathogenesis. Genes Dis. 2018, 5, 77–106.
Koboldt, D.C.; Fulton, R.S.; McLellan, M.D.; Schmidt, H.; Kalicki-Veizer, J.; McMichael, J.F.; Fulton, L.L.; Dooling, D.J.; Ding, L.; Mardis, E.R.; et al. Comprehensive Molecular Portraits of Human Breast Tumours. Nature 2012, 490, 61–70.
Goetz, L.H.; Schork, N.J. Personalized Medicine: Motivation, Challenges, and Progress. Fertil. Steril. 2018, 109, 952–963.
Clark, A.J.; Lillard, J.W. A Comprehensive Review of Bioinformatics Tools for Genomic Biomarker Discovery Driving Precision Oncology. Genes 2024, 15, 1036.
Bagger, F.O.; Borgwardt, L.; Jespersen, A.S.; Hansen, A.R.; Bertelsen, B.; Kodama, M.; Nielsen, F.C. Whole Genome Sequencing in Clinical Practice. BMC Med. Genom. 2024, 17, 39.
Zhao, E.Y.; Jones, M.; Jones, S.J.M. Whole-Genome Sequencing in Cancer. Cold Spring Harb. Perspect. Med. 2019, 9, a034579.
Hong, M.; Tao, S.; Zhang, L.; Diao, L.T.; Huang, X.; Huang, S.; Xie, S.J.; Xiao, Z.D.; Zhang, H. RNA Sequencing: New Technologies and Applications in Cancer Research. J. Hematol. Oncol. 2020, 13, 166.
Rabbani, B.; Tekin, M.; Mahdieh, N. The Promise of Whole-Exome Sequencing in Medical Genetics. J. Hum. Genet. 2014, 59, 5–15.
Lelieveld, S.H.; Spielmann, M.; Mundlos, S.; Veltman, J.A.; Gilissen, C. Comparison of Exome and Genome Sequencing Technologies for the Complete Capture of Protein-Coding Regions. Hum. Mutat. 2015, 36, 815–822.
Pei, X.M.; Yeung, M.H.Y.; Wong, A.N.N.; Tsang, H.F.; Yu, A.C.S.; Yim, A.K.Y.; Wong, S.C.C. Targeted Sequencing Approach and Its Clinical Applications for the Molecular Diagnosis of Human Diseases. Cells 2023, 12, 493.
Tang, F.; Barbacioru, C.; Wang, Y.; Nordman, E.; Lee, C.; Xu, N.; Wang, X.; Bodeau, J.; Tuch, B.B.; Siddiqui, A.; et al. mRNA-Seq Whole-Transcriptome Analysis of a Single Cell. Nat. Methods 2009, 6, 377–382.
Wei, E.; Reisinger, A.; Li, J.; French, L.E.; Clanner-Engelshofen, B.; Reinholz, M. Integration of scRNA-Seq and TCGA RNA-Seq to Analyze the Heterogeneity of HPV+ and HPV- Cervical Cancer Immune Cells and Establish Molecular Risk Models. Front. Oncol. 2022, 12, 860900.
Lopes, R.; Agami, R.; Korkmaz, G. GRO-Seq, a Tool for Identification of Transcripts Regulating Gene Expression. Methods Mol. Biol. 2017, 1543, 43–55.
Chen, X.; Xu, H.; Shu, X.; Song, C.-X. Mapping Epigenetic Modifications by Sequencing Technologies. Cell Death Differ. 2025, 32, 56–65.
Guanzon, D.; Ross, J.P.; Ma, C.; Berry, O.; Liew, Y.J. Comparing Methylation Levels Assayed in GC-Rich Regions with Current and Emerging Methods. BMC Genom. 2024, 25, 741.
Mehrmohamadi, M.; Sepehri, M.H.; Nazer, N.; Norouzi, M.R. A Comparative Overview of Epigenomic Profiling Methods. Front. Cell Dev. Biol. 2021, 9, 714687.
Caldera, M.; Buphamalai, P.; Müller, F.; Menche, J. Interactome-Based Approaches to Human Disease. Curr. Opin. Syst. Biol. 2017, 3, 88–94.
Li, G.; Cai, L.; Chang, H.; Hong, P.; Zhou, Q.; Kulakova, E.V.; Kolchanov, N.A.; Ruan, Y. Chromatin Interaction Analysis with Paired-End Tag (ChIA-PET) Sequencing Technology and Application. BMC Genom. 2014, 15, S11.
Cui, M.; Cheng, C.; Zhang, L. High-Throughput Proteomics: A Methodological Mini-Review. Lab. Investig. 2022, 102, 1170–1181.
Meng, X.; Liu, Y.; Xu, S.; Yang, L.; Yin, R. Review on Analytical Technologies and Applications in Metabolomics. BIOCELL 2024, 48, 65–78.
Al-Harazi, O.; El Allali, A.; Colak, D. Biomolecular Databases and Subnetwork Identification Approaches of Interest to Big Data Community: An Expert Review. OMICS J. Integr. Biol. 2019, 23, 138–151.
Diniz, W.J.S.; Canduri, F. Bioinformatics: An Overview and Its Applications. Genet. Mol. Res. 2017, 16, 17.
Benson, D.A.; Cavanaugh, M.; Clark, K.; Karsch-Mizrachi, I.; Lipman, D.J.; Ostell, J.; Sayers, E.W. GenBank. Nucleic Acids Res. 2012, 41, D36–D42.
Stoesser, G. The EMBL Nucleotide Sequence Database. Nucleic Acids Res. 2002, 30, 21–26.
Sweeney, B.A.; Petrov, A.I.; Burkov, B.; Finn, R.D.; Bateman, A.; Szymanski, M.; Karlowski, W.M.; Gorodkin, J.; Seemann, S.E.; Cannone, J.J.; et al. RNAcentral: A Hub of Information for Non-Coding RNA Sequences. Nucleic Acids Res. 2019, 47, D221–D229.
Bateman, A.; Martin, M.-J.; Orchard, S.; Magrane, M.; Ahmad, S.; Alpi, E.; Bowler-Barnett, E.H.; Britto, R.; Bye-A-Jee, H.; Cukura, A.; et al. UniProt: The Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 2023, 51, D523–D531.
Clough, E.; Barrett, T. The Gene Expression Omnibus Database. Methods Mol. Biol. 2016, 1418, 93–110.
Ma, C.X.; Ellis, M.J. The Cancer Genome Atlas: Clinical Applications for Breast Cancer. Oncology 2013, 27, 1263–1269.
Landrum, M.J.; Lee, J.M.; Benson, M.; Brown, G.R.; Chao, C.; Chitipiralla, S.; Gu, B.; Hart, J.; Hoffman, D.; Jang, W.; et al. ClinVar: Improving Access to Variant Interpretations and Supporting Evidence. Nucleic Acids Res. 2018, 46, D1062–D1067.
Phan, L.; Zhang, H.; Wang, Q.; Villamarin, R.; Hefferon, T.; Ramanathan, A.; Kattman, B. The Evolution of dbSNP: 25 Years of Impact in Genomic Research. Nucleic Acids Res. 2025, 53, D925–D931.
Pettersson, E. Tri-Nucleotide Threading for Parallel Amplification of Minute Amounts of Genomic DNA. Nucleic Acids Res. 2006, 34, e49.
Burley, S.K.; Piehl, D.W.; Vallat, B.; Zardecki, C. RCSB Protein Data Bank: Supporting Research and Education Worldwide through Explorations of Experimentally Determined and Computationally Predicted Atomic Level 3D Biostructures. IUCrJ 2024, 11, 279–286.
Burley, S.K.; Bhikadiya, C.; Bi, C.; Bittrich, S.; Chao, H.; Chen, L.; Craig, P.A.; Crichlow, G.V.; Dalenberg, K.; Duarte, J.M.; et al. RCSB Protein Data Bank (RCSB.org): Delivery of Experimentally-Determined PDB Structures alongside One Million Computed Structure Models of Proteins from Artificial Intelligence/Machine Learning. Nucleic Acids Res. 2023, 51, D488–D508.
Andreeva, A.; Kulesha, E.; Gough, J.; Murzin, A.G. The SCOP Database in 2020: Expanded Classification of Representative Family and Superfamily Domains of Known Protein Structures. Nucleic Acids Res. 2020, 48, D376–D382.
Safari-Alighiarloo, N.; Taghizadeh, M.; Rezaei-Tavirani, M.; Goliaei, B.; Peyvandi, A.A. Protein-Protein Interaction Networks (PPI) and Complex Diseases. Gastroenterol. Hepatol. Bed Bench 2014, 7, 17–31.
Oughtred, R.; Rust, J.; Chang, C.; Breitkreutz, B.; Stark, C.; Willems, A.; Boucher, L.; Leung, G.; Kolas, N.; Zhang, F.; et al. The BioGRID Database: A Comprehensive Biomedical Resource of Curated Protein, Genetic, and Chemical Interactions. Protein Sci. 2021, 30, 187–200.
Szklarczyk, D.; Kirsch, R.; Koutrouli, M.; Nastou, K.; Mehryary, F.; Hachilif, R.; Gable, A.L.; Fang, T.; Doncheva, N.T.; Pyysalo, S.; et al. The STRING Database in 2023: Protein–Protein Association Networks and Functional Enrichment Analyses for Any Sequenced Genome of Interest. Nucleic Acids Res. 2023, 51, D638–D646.
del Toro, N.; Shrivastava, A.; Ragueneau, E.; Meldal, B.; Combe, C.; Barrera, E.; Perfetto, L.; How, K.; Ratan, P.; Shirodkar, G.; et al. The IntAct Database: Efficient Access to Fine-Grained Molecular Interaction Data. Nucleic Acids Res. 2022, 50, D648–D653.
Kanehisa, M.; Furumichi, M.; Tanabe, M.; Sato, Y.; Morishima, K. KEGG: New Perspectives on Genomes, Pathways, Diseases and Drugs. Nucleic Acids Res. 2017, 45, D353–D361.
Chowdhury, B.; Garai, G. A Review on Multiple Sequence Alignment from the Perspective of Genetic Algorithm. Genomics 2017, 109, 419–431.
Dayhoff, M.O.; Schwartz, R.M.; Orcutt, B.C. A Model of Evolutionary Change. In Atlas of Protein Sequence and Structure; National Biomedical Research Foundation: Waltham, MA, USA, 1978; pp. 345–352.
Henikoff, S.; Henikoff, J.G. Amino Acid Substitution Matrices from Protein Blocks. Proc. Natl. Acad. Sci. USA 1992, 89, 10915–10919.
Bernardi, G.; Bernardi, G. Compositional Constraints and Genome Evolution. J. Mol. Evol. 1986, 24, 1–11.
Sun, W.; Duan, T.; Ye, P.; Chen, K.; Zhang, G.; Lai, M.; Zhang, H. TSVdb: A Web-Tool for TCGA Splicing Variants Analysis. BMC Genom. 2018, 19, 405. [Google Scholar] [CrossRef]
Gao, J.; Aksoy, B.A.; Dogrusoz, U.; Dresdner, G.; Gross, B.; Sumer, S.O.; Sun, Y.; Jacobsen, A.; Sinha, R.; Larsson, E.; et al. Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the CBioPortal. Sci. Signal. 2013, 6, pl1. [Google Scholar] [CrossRef]
Achawanantakun, R.; Chen, J.; Sun, Y.; Zhang, Y. LncRNA-ID: Long Non-Coding RNA IDentification Using Balanced Random Forests. Bioinformatics 2015, 31, 3897–3905. [Google Scholar] [CrossRef]
Li, Z.; Liu, L.; Feng, C.; Qin, Y.; Xiao, J.; Zhang, Z.; Ma, L. LncBook 2.0: Integrating Human Long Non-Coding RNAs with Multi-Omics Annotations. Nucleic Acids Res. 2023, 51, D186–D191. [Google Scholar] [CrossRef]
Chang, L.; Xia, J. MicroRNA Regulatory Network Analysis Using MiRNet 2.0. Methods Mol. Biol. 2023, 2594, 185–204. [Google Scholar] [CrossRef]
Kern, F.; Aparicio-Puerta, E.; Li, Y.; Fehlmann, T.; Kehl, T.; Wagner, V.; Ray, K.; Ludwig, N.; Lenhof, H.-P.; Meese, E.; et al. MiRTargetLink 2.0—Interactive MiRNA Target Gene and Target Pathway Networks. Nucleic Acids Res. 2021, 49, W409–W416. [Google Scholar] [CrossRef]
Chen, Y.; Wang, X. MiRDB: An Online Database for Prediction of Functional MicroRNA Targets. Nucleic Acids Res. 2020, 48, D127–D131. [Google Scholar] [CrossRef] [PubMed]
Tomczak, A.; Mortensen, J.M.; Winnenburg, R.; Liu, C.; Alessi, D.T.; Swamy, V.; Vallania, F.; Lofgren, S.; Haynes, W.; Shah, N.H.; et al. Interpretation of Biological Experiments Changes with Evolution of the Gene Ontology and Its Annotations. Sci. Rep. 2018, 8, 5115. [Google Scholar] [CrossRef] [PubMed]
The Gene Ontology Consortium. The Gene Ontology Resource: 20 Years and Still GOing Strong. Nucleic Acids Res. 2019, 47, D330–D338. [Google Scholar] [CrossRef]
Pomaznoy, M.; Ha, B.; Peters, B. GOnet: A Tool for Interactive Gene Ontology Analysis. BMC Bioinform. 2018, 19, 470. [Google Scholar] [CrossRef]
Ge, S.X.; Jung, D.; Yao, R. ShinyGO: A Graphical Gene-Set Enrichment Tool for Animals and Plants. Bioinformatics 2020, 36, 2628–2629. [Google Scholar] [CrossRef]
Cheng, X.; Yan, J.; Liu, Y.; Wang, J.; Taubert, S. EVITTA: A Web-Based Visualization and Inference Toolbox for Transcriptome Analysis. Nucleic Acids Res. 2021, 49, W207–W215. [Google Scholar] [CrossRef] [PubMed]
Ge, X. IDEP Web Application for RNA-Seq Data Analysis. Methods Mol. Biol. 2021, 2284, 417–443. [Google Scholar] [PubMed]
Reimand, J.; Kull, M.; Peterson, H.; Hansen, J.; Vilo, J. G:Profiler—A Web-Based Toolset for Functional Profiling of Gene Lists from Large-Scale Experiments. Nucleic Acids Res. 2007, 35, W193–W200. [Google Scholar] [CrossRef] [PubMed]
Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.J.; Bambrick, J.; et al. Accurate Structure Prediction of Biomolecular Interactions with AlphaFold 3. Nature 2024, 630, 493–500. [Google Scholar] [CrossRef]
Agu, P.C.; Afiukwa, C.A.; Orji, O.U.; Ezeh, E.M.; Ofoke, I.H.; Ogbu, C.O.; Ugwuja, E.I.; Aja, P.M. Molecular Docking as a Tool for the Discovery of Molecular Targets of Nutraceuticals in Diseases Management. Sci. Rep. 2023, 13, 13398. [Google Scholar] [CrossRef]
Paggi, J.M.; Pandit, A.; Dror, R.O. The Art and Science of Molecular Docking. Annu. Rev. Biochem. 2024, 93, 389–410. [Google Scholar] [CrossRef]
Sugiki, T.; Kobayashi, N.; Fujiwara, T. Modern Technologies of Solution Nuclear Magnetic Resonance Spectroscopy for Three-Dimensional Structure Determination of Proteins Open Avenues for Life Scientists. Comput. Struct. Biotechnol. J. 2017, 15, 328–339. [Google Scholar] [CrossRef]
Eberhardt, J.; Santos-Martins, D.; Tillack, A.F.; Forli, S. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. J. Chem. Inf. Model. 2021, 61, 3891–3898. [Google Scholar] [CrossRef]
Liu, Y.; Yang, X.; Gan, J.; Chen, S.; Xiao, Z.-X.; Cao, Y. CB-Dock2: Improved Protein–Ligand Blind Docking by Integrating Cavity Detection, Docking and Homologous Template Fitting. Nucleic Acids Res. 2022, 50, W159–W164. [Google Scholar] [CrossRef]
Greener, J.G.; Kandathil, S.M.; Moffat, L.; Jones, D.T. A Guide to Machine Learning for Biologists. Nat. Rev. Mol. Cell Biol. 2022, 23, 40–55. [Google Scholar] [CrossRef]
Singh, A.; Singh, A.; Bhattacharya, S. Research Trends on AI in Breast Cancer Diagnosis, and Treatment over Two Decades. Discov. Oncol. 2024, 15, 772. [Google Scholar] [CrossRef]
Yan, S.; Yue, S. Identification of Early Diagnostic Biomarkers for Breast Cancer through Bioinformatics Analysis. Medicine 2023, 102, e35273. [Google Scholar] [CrossRef]
Wang, N.; Zhang, H.; Li, D.; Jiang, C.; Zhao, H.; Teng, Y. Identification of Novel Biomarkers in Breast Cancer via Integrated Bioinformatics Analysis and Experimental Validation. Bioengineered 2021, 12, 12431–12446. [Google Scholar] [CrossRef]
Golestan, A.; Tahmasebi, A.; Maghsoodi, N.; Faraji, S.N.; Irajie, C.; Ramezani, A. Unveiling Promising Breast Cancer Biomarkers: An Integrative Approach Combining Bioinformatics Analysis and Experimental Verification. BMC Cancer 2024, 24, 155. [Google Scholar] [CrossRef]
Xu, K.; Wang, R.; Xie, H.; Hu, L.; Wang, C.; Xu, J.; Zhu, C.; Liu, Y.; Gao, F.; Li, X.; et al. Single-Cell RNA Sequencing Reveals Cell Heterogeneity and Transcriptome Profile of Breast Cancer Lymph Node Metastasis. Oncogenesis 2021, 10, 66. [Google Scholar] [CrossRef]
Wang, Y.; Yi, K.; Chen, B.; Zhang, B.; Jidong, G. Elucidating the Susceptibility to Breast Cancer: An in-Depth Proteomic and Transcriptomic Investigation into Novel Potential Plasma Protein Biomarkers. Front. Mol. Biosci. 2024, 10, 1340917. [Google Scholar] [CrossRef] [PubMed]
Gambacurta, A.; Tullio, V.; Savini, I.; Mauriello, A.; Catani, M.V.; Gasperi, V. Identification of the EBF1/ETS2/KLF2-MiR-126-Gene Feed-Forward Loop in Breast Carcinogenesis and Stemness. Int. J. Mol. Sci. 2025, 26, 328. [Google Scholar] [CrossRef] [PubMed]
Xiao, Y.; Ma, D.; Yang, Y.-S.; Yang, F.; Ding, J.-H.; Gong, Y.; Jiang, L.; Ge, L.-P.; Wu, S.-Y.; Yu, Q.; et al. Comprehensive Metabolomics Expands Precision Medicine for Triple-Negative Breast Cancer. Cell Res. 2022, 32, 477–490. [Google Scholar] [CrossRef] [PubMed]
An, R.; Yu, H.; Wang, Y.; Lu, J.; Gao, Y.; Xie, X.; Zhang, J. Integrative Analysis of Plasma Metabolomics and Proteomics Reveals the Metabolic Landscape of Breast Cancer. Cancer Metab. 2022, 10, 13. [Google Scholar] [CrossRef]
Asleh, K.; Negri, G.L.; Spencer Miko, S.E.; Colborne, S.; Hughes, C.S.; Wang, X.Q.; Gao, D.; Gilks, C.B.; Chia, S.K.L.; Nielsen, T.O.; et al. Proteomic Analysis of Archival Breast Cancer Clinical Specimens Identifies Biological Subtypes with Distinct Survival Outcomes. Nat. Commun. 2022, 13, 896. [Google Scholar] [CrossRef] [PubMed]
Jeon, Y.; Lee, G.; Jeong, H.; Gong, G.; Kim, J.; Kim, K.; Jeong, J.H.; Lee, H.J. Proteomic Analysis of Breast Cancer Based on Immune Subtypes. Clin. Proteom. 2024, 21, 17. [Google Scholar] [CrossRef] [PubMed]
Azevedo, A.L.K.; Gomig, T.H.B.; Batista, M.; Marchini, F.K.; Spautz, C.C.; Rabinovich, I.; Sebastião, A.P.M.; Oliveira, J.C.; Gradia, D.F.; Cavalli, I.J.; et al. High-Throughput Proteomics of Breast Cancer Subtypes: Biological Characterization and Multiple Candidate Biomarker Panels to Patients’ Stratification. J. Proteom. 2023, 285, 104955. [Google Scholar] [CrossRef] [PubMed]
Choi, J.M.; Chae, H. MoBRCA-Net: A Breast Cancer Subtype Classification Framework Based on Multi-Omics Attention Neural Networks. BMC Bioinform. 2023, 24, 169. [Google Scholar] [CrossRef]
Smith-Byrne, K.; Hedman, Å.; Dimitriou, M.; Desai, T.; Sokolov, A.V.; Schioth, H.B.; Koprulu, M.; Pietzner, M.; Langenberg, C.; Atkins, J.; et al. Identifying Therapeutic Targets for Cancer among 2074 Circulating Proteins and Risk of Nine Cancers. Nat. Commun. 2024, 15, 3621. [Google Scholar] [CrossRef]
Kalocsay, M.; Berberich, M.J.; Everley, R.A.; Nariya, M.K.; Chung, M.; Gaudio, B.; Victor, C.; Bradshaw, G.A.; Eisert, R.J.; Hafner, M.; et al. Proteomic Profiling across Breast Cancer Cell Lines and Models. Sci. Data 2023, 10, 514. [Google Scholar] [CrossRef]
Shenoy, A.; Belugali Nataraj, N.; Perry, G.; Loayza Puch, F.; Nagel, R.; Marin, I.; Balint, N.; Bossel, N.; Pavlovsky, A.; Barshack, I.; et al. Proteomic Patterns Associated with Response to Breast Cancer Neoadjuvant Treatment. Mol. Syst. Biol. 2020, 16, e9443. [Google Scholar] [CrossRef]
Zheng, B.; Du, P.; Zeng, Z.; Cao, P.; Ma, X.; Jiang, Y. Propranolol Inhibits EMT and Metastasis in Breast Cancer through MiR-499-5p-Mediated Sox6. J. Cancer Res. Clin. Oncol. 2024, 150, 59. [Google Scholar] [CrossRef]
Alam, M.S.; Rahaman, M.M.; Sultana, A.; Wang, G.; Mollah, M.N.H. Statistics and Network-Based Approaches to Identify Molecular Mechanisms That Drive the Progression of Breast Cancer. Comput. Biol. Med. 2022, 145, 105508. [Google Scholar] [CrossRef]
Lord, S.R.; Collins, J.M.; Cheng, W.-C.; Haider, S.; Wigfield, S.; Gaude, E.; Fielding, B.A.; Pinnick, K.E.; Harjes, U.; Segaran, A.; et al. Transcriptomic Analysis of Human Primary Breast Cancer Identifies Fatty Acid Oxidation as a Target for Metformin. Br. J. Cancer 2020, 122, 258–265. [Google Scholar] [CrossRef]
Mujawar, T.; Tare, H.; Deshmukh, N.; Udugade, B.; Thube, U. Repurposing FDA-Approved Anastrozole-Based Drugs for Breast Cancer through Drug-Drug Transcriptomic Similarity and Cavity Detection Guided Blind Docking. Int. J. Drug Deliv. Technol. 2023, 13, 1172–1177. [Google Scholar] [CrossRef]
Firoozbakht, F.; Rezaeian, I.; Rueda, L.; Ngom, A. Computationally Repurposing Drugs for Breast Cancer Subtypes Using a Network-Based Approach. BMC Bioinform. 2022, 23, 143. [Google Scholar] [CrossRef] [PubMed]
Neagu, A.-N.; Whitham, D.; Buonanno, E.; Jenkins, A.; Alexa-Stratulat, T.; Tamba, B.I.; Darie, C.C. Proteomics and Its Applications in Breast Cancer. Am. J. Cancer Res. 2021, 11, 4006–4049. [Google Scholar]
Huang, Y.; Zeng, P.; Zhong, C. Classifying Breast Cancer Subtypes on Multi-Omics Data via Sparse Canonical Correlation Analysis and Deep Learning. BMC Bioinform. 2024, 25, 132. [Google Scholar] [CrossRef]
Li, M.; Guo, Y.; Feng, Y.-M.; Zhang, N. Identification of Triple-Negative Breast Cancer Genes and a Novel High-Risk Breast Cancer Prediction Model Development Based on PPI Data and Support Vector Machines. Front. Genet. 2019, 10, 180. [Google Scholar] [CrossRef]
Kuilman, M.M.; Ellappalayam, A.; Barcaru, A.; Haan, J.C.; Bhaskaran, R.; Wehkamp, D.; Menicucci, A.R.; Audeh, W.M.; Mittempergher, L.; Glas, A.M. BluePrint Breast Cancer Molecular Subtyping Recognizes Single and Dual Subtype Tumors with Implications for Therapeutic Guidance. Breast Cancer Res. Treat. 2022, 195, 263–274. [Google Scholar] [CrossRef]
Loo, S.K.; Yates, M.E.; Yang, S.; Oesterreich, S.; Lee, A.V.; Wang, X.-S. Fusion-Associated Carcinomas of the Breast: Diagnostic, Prognostic, and Therapeutic Significance. Genes Chromosomes Cancer 2022, 61, 261–273. [Google Scholar] [CrossRef]
Roy, S.; Gupta, D. Analysis of Breast Cancer Next-Generation Sequencing Datasets for Identifying Fusion Genes Responsible for the Cancer Progression. Inform. Med. Unlocked 2023, 41, 101306. [Google Scholar] [CrossRef]
Elia, I.; Broekaert, D.; Christen, S.; Boon, R.; Radaelli, E.; Orth, M.F.; Verfaillie, C.; Grünewald, T.G.P.; Fendt, S.-M. Proline Metabolism Supports Metastasis Formation and Could Be Inhibited to Selectively Target Metastasizing Cancer Cells. Nat. Commun. 2017, 8, 15267. [Google Scholar] [CrossRef] [PubMed]
Karsli-Ceppioglu, S.; Dagdemir, A.; Judes, G.; Lebert, A.; Penault-Llorca, F.; Bignon, Y.J.; Bernard-Gallon, D. The Epigenetic Landscape of Promoter Genome-Wide Analysis in Breast Cancer. Sci. Rep. 2017, 7, 6597. [Google Scholar] [CrossRef] [PubMed]
Zhao, Q.Y.; Lei, P.J.; Zhang, X.; Zheng, J.Y.; Wang, H.Y.; Zhao, J.; Li, Y.M.; Ye, M.; Li, L.; Wei, G.; et al. Global Histone Modification Profiling Reveals the Epigenomic Dynamics during Malignant Transformation in a Four-Stage Breast Cancer Model. Clin. Epigenet. 2016, 8, 34. [Google Scholar] [CrossRef] [PubMed]
Cortellesi, E.; Savini, I.; Veneziano, M.; Gambacurta, A.; Catani, M.V.; Gasperi, V. Decoding the Epigenome of Breast Cancer. Int. J. Mol. Sci. 2025, 26, 2605. [Google Scholar] [CrossRef] [PubMed]
Argelaguet, R.; Arnol, D.; Bredikhin, D.; Deloro, Y.; Velten, B.; Marioni, J.C.; Stegle, O. MOFA+: A Statistical Framework for Comprehensive Integration of Multi-Modal Single-Cell Data. Genome Biol. 2020, 21, 111. [Google Scholar] [CrossRef]
Sharma, A.; Debik, J.; Naume, B.; Ohnstad, H.O.; Sahlberg, K.K.; Borgen, E.; Børresen-Dale, A.-L.; Engebråten, O.; Fritzman, B.; Garred, Ø.; et al. Comprehensive Multi-Omics Analysis of Breast Cancer Reveals Distinct Long-Term Prognostic Subtypes. Oncogenesis 2024, 13, 22. [Google Scholar] [CrossRef]
Malighetti, F.; Villa, M.; Villa, A.M.; Pelucchi, S.; Aroldi, A.; Cortinovis, D.L.; Canova, S.; Capici, S.; Cazzaniga, M.E.; Mologni, L.; et al. Prognostic Biomarkers in Breast Cancer via Multi-Omics Clustering Analysis. Int. J. Mol. Sci. 2025, 26, 1943. [Google Scholar] [CrossRef]
Zhang, S.; Liu, K.; Liu, Y.; Hu, X.; Gu, X. The Role and Application of Bioinformatics Techniques and Tools in Drug Discovery. Front. Pharmacol. 2025, 16, 1547131. [Google Scholar] [CrossRef] [PubMed]
Savage, P.; Pacis, A.; Kuasne, H.; Liu, L.; Lai, D.; Wan, A.; Dankner, M.; Martinez, C.; Muñoz-Ramos, V.; Pilon, V.; et al. Chemogenomic Profiling of Breast Cancer Patient-Derived Xenografts Reveals Targetable Vulnerabilities for Difficult-to-Treat Tumors. Commun. Biol. 2020, 3, 310. [Google Scholar] [CrossRef]
Meisel, J.L.; Venur, V.A.; Gnant, M.; Carey, L. Evolution of Targeted Therapy in Breast Cancer: Where Precision Medicine Began. Am. Soc. Clin. Oncol. Educ. Book 2018, 38, 78–86. [Google Scholar] [CrossRef]
Nielsen, T.O.; Parker, J.S.; Leung, S.; Voduc, D.; Ebbert, M.; Vickery, T.; Davies, S.R.; Snider, J.; Stijleman, I.J.; Reed, J.; et al. A Comparison of PAM50 Intrinsic Subtyping with Immunohistochemistry and Clinical Prognostic Factors in Tamoxifen-Treated Estrogen Receptor–Positive Breast Cancer. Clin. Cancer Res. 2010, 16, 5222–5232. [Google Scholar] [CrossRef]
Ohara, A.M.; Naoi, Y.; Shimazu, K.; Kagara, N.; Shimoda, M.; Tanei, T.; Miyake, T.; Kim, S.J.; Noguchi, S. PAM50 for Prediction of Response to Neoadjuvant Chemotherapy for ER-Positive Breast Cancer. Breast Cancer Res. Treat. 2019, 173, 533–543. [Google Scholar] [CrossRef]
Rodríguez-Bejarano, O.H.; Parra-López, C.; Patarroyo, M.A. A Review Concerning the Breast Cancer-Related Tumour Microenvironment. Crit. Rev. Oncol. Hematol. 2024, 199, 104389. [Google Scholar] [CrossRef]
Le, T.; Aronow, R.A.; Kirshtein, A.; Shahriyari, L. A Review of Digital Cytometry Methods: Estimating the Relative Abundance of Cell Types in a Bulk of Cells. Brief. Bioinform. 2021, 22, bbaa219. [Google Scholar] [CrossRef]
Fernández, E.A.; Mahmoud, Y.D.; Veigas, F.; Rocha, D.; Miranda, M.; Merlo, J.; Balzarini, M.; Lujan, H.D.; Rabinovich, G.A.; Girotti, M.R. Unveiling the Immune Infiltrate Modulation in Cancer and Response to Immunotherapy by MIXTURE—An Enhanced Deconvolution Method. Brief. Bioinform. 2021, 22, bbaa317. [Google Scholar] [CrossRef]
Zerdes, I.; Matikas, A.; Mezheyeuski, A.; Manikis, G.; Acs, B.; Johansson, H.; Boyaci, C.; Boman, C.; Poncet, C.; Ignatiadis, M.; et al. Machine Learning-Based Spatial Characterization of Tumor-Immune Microenvironment in the EORTC 10994/BIG 1-00 Early Breast Cancer Trial. NPJ Breast Cancer 2025, 11, 23. [Google Scholar] [CrossRef]
National Cancer Institute. Drugs Approved for Breast Cancer. Available online: https://www.cancer.gov/about-cancer/treatment/drugs/breast (accessed on 12 May 2025).
Kulkarni, V.S.; Alagarsamy, V.; Solomon, V.R.; Jose, P.A.; Murugesan, S. Drug Repurposing: An Effective Tool in Modern Drug Discovery. Russ. J. Bioorg. Chem. 2023, 49, 157–166. [Google Scholar] [CrossRef]
Correia, A.S.; Gärtner, F.; Vale, N. Drug Combination and Repurposing for Cancer Therapy: The Example of Breast Cancer. Heliyon 2021, 7, e05948. [Google Scholar] [CrossRef] [PubMed]
Hernández-Lemus, E.; Martínez-García, M. Pathway-Based Drug-Repurposing Schemes in Cancer: The Role of Translational Bioinformatics. Front. Oncol. 2021, 10, 605680. [Google Scholar] [CrossRef]
Corleto, K.A.; Strandmo, J.L.; Giles, E.D. Metformin and Breast Cancer: Current Findings and Future Perspectives from Preclinical and Clinical Studies. Pharmaceuticals 2024, 17, 396. [Google Scholar] [CrossRef] [PubMed]
Lord, S.R.; Cheng, W.-C.; Liu, D.; Gaude, E.; Haider, S.; Metcalf, T.; Patel, N.; Teoh, E.J.; Gleeson, F.; Bradley, K.; et al. Integrated Pharmacodynamic Analysis Identifies Two Metabolic Adaption Pathways to Metformin in Breast Cancer. Cell Metab. 2018, 28, 679–688.e4. [Google Scholar] [CrossRef] [PubMed]
Cuzick, J.; Sestak, I.; Forbes, J.F.; Dowsett, M.; Cawthorn, S.; Mansel, R.E.; Loibl, S.; Bonanni, B.; Evans, D.G.; Howell, A. Use of Anastrozole for Breast Cancer Prevention (IBIS-II): Long-Term Results of a Randomised Controlled Trial. Lancet 2020, 395, 117–122. [Google Scholar] [CrossRef]
Moraca, F.; Arciuolo, V.; Marzano, S.; Napolitano, F.; Castellano, G.; D’Aria, F.; Di Porzio, A.; Landolfi, L.; Catalanotti, B.; Randazzo, A.; et al. Repurposing FDA-Approved Drugs to Target G-Quadruplexes in Breast Cancer. Eur. J. Med. Chem. 2025, 285, 117245. [Google Scholar] [CrossRef] [PubMed]
Yang, L.; Li, J.; Li, Y.; Zhou, Y.; Wang, Z.; Zhang, D.; Liu, J.; Zhang, X. Diclofenac Impairs the Proliferation and Glucose Metabolism of Triple-Negative Breast Cancer Cells by Targeting the c-Myc Pathway. Exp. Ther. Med. 2021, 21, 584. [Google Scholar] [CrossRef]
Kim, Y.J.; Jang, S.-K.; Kim, G.; Hong, S.-E.; Park, C.S.; Seong, M.-K.; Kim, H.-A.; Kim, K.S.; Kim, C.-H.; Park, K.S.; et al. Nebivolol Sensitizes BT-474 Breast Cancer Cells to FGFR Inhibitors. Anticancer Res. 2023, 43, 1973–1980. [Google Scholar] [CrossRef]
Nuevo-Tapioles, C.; Santacatterina, F.; Stamatakis, K.; Núñez de Arenas, C.; Gómez de Cedrón, M.; Formentini, L.; Cuezva, J.M. Coordinate β-Adrenergic Inhibition of Mitochondrial Activity and Angiogenesis Arrest Tumor Growth. Nat. Commun. 2020, 11, 3606. [Google Scholar] [CrossRef]
Tutt, A.N.J.; Garber, J.E.; Kaufman, B.; Viale, G.; Fumagalli, D.; Rastogi, P.; Gelber, R.D.; de Azambuja, E.; Fielding, A.; Balmaña, J.; et al. Adjuvant Olaparib for Patients with BRCA1- or BRCA2-Mutated Breast Cancer. N. Engl. J. Med. 2021, 384, 2394–2405. [Google Scholar] [CrossRef] [PubMed]
Makhlin, I.; McAndrew, N.P.; Wileyto, E.P.; Clark, A.S.; Holmes, R.; Bottalico, L.N.; Mesaros, C.; Blair, I.A.; Jeschke, G.R.; Fox, K.R.; et al. Ruxolitinib and Exemestane for Estrogen Receptor Positive, Aromatase Inhibitor Resistant Advanced Breast Cancer. NPJ Breast Cancer 2022, 8, 122. [Google Scholar] [CrossRef] [PubMed]
Zhu, S.; Wu, Y.; Song, B.; Yi, M.; Yan, Y.; Mei, Q.; Wu, K. Recent Advances in Targeted Strategies for Triple-Negative Breast Cancer. J. Hematol. Oncol. 2023, 16, 100. [Google Scholar] [CrossRef]
Bravaccini, S.; Mazza, M.; Maltoni, R. No More Disparities among Regions in Italy: Recent Approval of Genomic Test Reimbursability for Early Breast Cancer Patients in the Country. Breast Cancer Res. Treat. 2023, 201, 1–3. [Google Scholar] [CrossRef]
Jamialahmadi, H.; Khalili-Tanha, G.; Nazari, E.; Rezaei-Tavirani, M. Artificial Intelligence and Bioinformatics: A Journey from Traditional Techniques to Smart Approaches. Gastroenterol. Hepatol. Bed Bench 2024, 17, 241–252. [Google Scholar] [CrossRef]
Flores, J.E.; Claborne, D.M.; Weller, Z.D.; Webb-Robertson, B.-J.M.; Waters, K.M.; Bramer, L.M. Missing Data in Multi-Omics Integration: Recent Advances through Artificial Intelligence. Front. Artif. Intell. 2023, 6, 1098308. [Google Scholar] [CrossRef]
Vaseghi, H.; Akrami, S.M.; Rashidi-Nezhad, A. The Challenges in the Interpretation of Genetic Variants Detected by Genomics Techniques in Patients with Congenital Anomalies. J. Clin. Lab. Anal. 2023, 37, e24967. [Google Scholar] [CrossRef] [PubMed]
Thorn, C.F.; Klein, T.E.; Altman, R.B. PharmGKB: The Pharmacogenomics Knowledge Base. Methods Mol. Biol. 2013, 1015, 311–320. [Google Scholar] [CrossRef]
Field, M.A. Bioinformatic Challenges Detecting Genetic Variation in Precision Medicine Programs. Front. Med. 2022, 9, 806696. [Google Scholar] [CrossRef]
Roy, S.; Coldren, C.; Karunamurthy, A.; Kip, N.S.; Klee, E.W.; Lincoln, S.E.; Leon, A.; Pullambhatla, M.; Temple-Smolkin, R.L.; Voelkerding, K.V.; et al. Standards and Guidelines for Validating Next-Generation Sequencing Bioinformatics Pipelines. J. Mol. Diagn. 2018, 20, 4–27. [Google Scholar] [CrossRef]
Guo, J.; Hu, J.; Zheng, Y.; Zhao, S.; Ma, J. Artificial Intelligence: Opportunities and Challenges in the Clinical Applications of Triple-Negative Breast Cancer. Br. J. Cancer 2023, 128, 2141–2149. [Google Scholar] [CrossRef] [PubMed]
Ahn, J.S.; Shin, S.; Yang, S.-A.; Park, E.K.; Kim, K.H.; Cho, S.I.; Ock, C.-Y.; Kim, S. Artificial Intelligence in Breast Cancer Diagnosis and Personalized Medicine. J. Breast Cancer 2023, 26, 405. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Schematic representation of the current (5th edition) World Health Organization (WHO) classification of epithelial breast carcinomas, including main categories, subtypes, and their relative frequencies. The diagram distinguishes between in situ and invasive carcinomas, further detailing the predominant histological subtypes and their typical receptor status. BC: Breast Cancer; DCIS: Ductal Carcinoma in Situ; ER: Estrogen Receptor; HER2: Human Epidermal Growth Factor Receptor 2; IBC: Invasive Breast Carcinoma; IDC: Invasive Ductal Carcinoma; ILC: Invasive Lobular Carcinoma; LCIS: Lobular Carcinoma in Situ; NST: No Special Type; PR: Progesterone Receptor.

Figure 2. Overview of next-generation sequencing (NGS) applications in biomedical research. ATAC-seq: Assay for Transposase-Accessible Chromatin using sequencing; ChIA-PET: Chromatin Interaction Analysis by Paired-End Tag sequencing; ChIP-seq: Chromatin Immunoprecipitation sequencing; DNase-seq: DNase I hypersensitive sites sequencing; GRO-seq: Global Run-On sequencing; Hi-C: High-throughput Chromosome Conformation Capture; NGS: next-generation sequencing; RNA-seq: RNA sequencing; scRNA-seq: single-cell RNA sequencing; TRS: Targeted Resequencing; WES: whole-exome sequencing; WGS: whole-genome sequencing; Methyl-seq: Methylation sequencing; bisulfite-seq: Bisulfite sequencing.

Table 2. Widely used databases and free web-based tools for bioinformatics analyses and related website links.

Database/Tool	Analysis	Website (accessed on 10 June 2025)
AlphaFold 3	Prediction of protein, DNA and RNA structure and modeling of structural complexes	https://alphafold.ebi.ac.uk/
AutoDock Vina 1.2.0	Protein–ligand docking	https://vina.scripps.edu/
CB-Dock2	Protein–ligand blind docking	https://cadd.labshare.cn/cb-dock2/
cBioPortal for Cancer Genomics	Multi-omics cancer genomics data	http://cbioportal.org/
eVITTA	Transcriptome functional characterization	https://tau.cmmt.ubc.ca/eVITTA/
g:Profiler	Functional enrichment analysis at gene level	https://biit.cs.ut.ee/gprofiler/gost
GO	Gene functions, cellular processes and subcellular localization of proteins	http://www.geneontology.org/
GOnet	GO term annotation and enrichment analysis	https://tools.dice-database.org/GOnet/
iDEP 2.0	RNA-seq data analysis	https://bioinformatics.sdstate.edu/idep/
LncBook 2.0	Integration of human lncRNAs with multi-omics annotations	https://ngdc.cncb.ac.cn/lncbook/
LncRNA-ID	lncRNA identification	https://github.com/zhangy72/LncRNA-ID
miRNet 2.0	miRNA functions and interaction networks with genes, diseases, compounds, and transcription factors	https://www.mirnet.ca/
miRTargetLink 2.0	miRNA–mRNA interactions	https://ccb-compute.cs.uni-saarland.de/mirtargetlink2/
miRDB	miRNA–mRNA interactions and functional annotations	https://mirdb.org/mirdb/index.html
ShinyGO 0.82	Graphical gene-set enrichment	https://bioinformatics.sdstate.edu/go/
TSVdb	TCGA splicing variants	https://github.com/wenjie1991/TSVdb

CB-Dock2: Cavity Blind-Dock 2; eVITTA: easy Visualization and Inference Toolbox for Transcriptome Analysis; GO: Gene Ontology; g:Profiler: gene Profiler; iDEP 2.0: Integrated Differential Expression and Pathway analysis 2.0; lncRNA: long non-coding RNA; LncRNA-ID: Long Non-Coding RNA Identification; miRNA: microRNA; ShinyGO 0.82: Shiny Gene Ontology 0.82; TCGA: The Cancer Genome Atlas; TSVdb: Tumor Splicing Variants database.
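
Several of the enrichment tools listed in Table 2 (for example g:Profiler and ShinyGO) report whether a functional term is over-represented in a gene list. A statistic commonly used for this purpose is the one-sided hypergeometric test. The following is a minimal Python sketch of that test, not the implementation of any specific tool; the function name `hypergeom_enrichment_p` and all counts in the usage example are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, overlap: int) -> float:
    """One-sided over-representation p-value, P(X >= overlap).

    population: number of background genes
    annotated:  background genes annotated to the term
    sample:     size of the query gene list
    overlap:    query genes annotated to the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities of seeing `overlap` or more hits.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, 150 annotated to a term,
# a 10-gene query list of which 6 carry the annotation.
p_value = hypergeom_enrichment_p(20000, 150, 10, 6)
print(f"enrichment p-value: {p_value:.3e}")
```

In practice the p-values obtained for many terms would still need multiple-testing correction, which this sketch omits.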

Table 3. Most relevant studies on integrating experimental and bioinformatics strategies in BC research, conducted over the last five years (2020–2025).

BC Samples	Integrated Strategy	Main Findings	Refs
GEO and TCGA databases	Transcriptomic profiling; PPI network construction; survival analysis	10 hub genes (PBK, CCNA2, CDCA8, MELK, NUSAP1, BIRC5, CCNB2, HMMR, MAD2L1, and PRC1) strongly associated with BC evolution.	[76]
GEO and TCGA databases	Transcriptomic profiling; PPI network construction; survival analysis	23 hub genes negatively correlated with BC overall survival. Increased cell cycle gene (CDK1, CDC20, AURKA and MCM4) expression as predictive biomarker for poor prognosis.	[77]
TCGA and METABRIC databases	Transcriptomic profiling	Genes involved in cell communication (CACNG4 and CHRNA6), cell cycle regulation and DNA replication (PKMYT1) pathways, and invasion and metastasis (EPYC) as diagnostic and prognostic markers.	[78]
Fresh tissues and axillary lymph nodes	Single cell transcriptomic profiling	CD44⁺/ALDH2⁺/ALDH6A1⁺ cluster in BC stem cells. PTMA, STC2, CST3, and RAMP3 genes involved in lymph node metastasis.	[79]
ARIC study	Transcriptomic and proteomic profiling	Five plasma proteins with strong and causal links to BC: PEX14 and CTSF positively associated; SNUPN, CSK, and PARK7 negatively associated.	[80]
Fresh tissues; TCGA and GEO databases	Epigenomic and transcriptomic profiling; PPI network construction	Identification of a TFs/miR-126/gene FFL regulating cell identity/stemness. FFL disruption promotes oncogenic transformation and BC progression.	[81]
Fresh tissues	Genomic, transcriptomic, metabolomic and lipidomic profiling; gene–protein–reaction relationship construction	Subclassification of TNBCs in three metabolomic subtypes with prognostic value. N-acetyl-aspartyl-glutamate as potential therapeutic target for high-risk tumors.	[82]
Plasma samples	Metabolomic and proteomic profiling; PPI network construction; machine learning models for diagnostic efficacy evaluation	Downregulation of metabolism of specific amino acids. Among the 31 DEPs, four enzymes (GOT1, LDHB, GSS, GPX3) linked to deregulated metabolic pathways. Identification of plasma metabolic signature for BC.	[83]
FFPE samples	Proteomic profiling; survival analysis	Subclassification of basal-like, HER2-enriched and TNBCs based on immune responses and clinical outcomes.	[84]
FFPE samples; TCGA database	Proteomic profiling; survival analysis	Coronin-1A and α-1-antitrypsin as markers for immune subtype-stratification.	[85]
Fresh tissues; NCG database	Proteomic profiling; PPI network construction	Classification of BC subtypes based on oncoproteins/tumor suppressor DEPs.	[86]
TCGA database	Transcriptomic and epigenomic profiling	Development of a BC subtype classification framework (moBRCA-net).	[87]
Public protein-GWAS studies	Proteomic profiling/disease causal relationship construction	Genetically predicted concentrations of circulating AOC2, SPN1, CD160, RALB, GDI2, CPNE1, ULK3, CTSF, and PLAUR associated with BC risk and subtypes.	[88]
Cell lines	Proteomic profiling	~13,000 cell type-specific proteins correlated with HR status and molecular signatures. RB1 and CB2X as strong predictors of palbociclib response.	[89]
FFPE samples; TCGA database; cell lines	Proteome and metabolome profiling; PPI network construction; survival analysis	PYCR1 and ALDH18A1 associated with NAT resistance, tumor relapse and poor prognosis. PYCR1 KO: increased glutamine catabolism and chemotherapy-sensitivity in ER⁺ cells, decreased integrin and laminin expression in ER⁺ and TNBC.	[90]
Cell lines; murine models	Transcriptomic profiling; miRNA target gene prediction	EMT and metastasis inhibition by propranolol.	[91]
GEO and TCGA databases; PDB and PubChem databases	Transcriptomic profiling; PPI network construction; survival analysis; TFs/miRNA/genes network construction; drug sensitivity analysis; molecular modeling and docking	Seven key genes (BUB1, CCNB1, ASPM, TTK, CCNA2, CENPF, and RFC4), regulated by specific TFs and miRNAs, involved in BC progression with prognostic value. Trametinib, selumetinib, and refametinib repurposing for BCs.	[92]
Fresh tissues	Transcriptomic and lipidomic profiling	Upregulation of fatty acid oxidation genes depending on metformin resistance or sensitivity.	[93]
CMap database	Transcriptomic profiling; molecular modeling and docking	Dolasetron and granisetron repurposing as aromatase inhibitors.	[94]
METABRIC and LINCS databases	Genomic and transcriptomic profiling; drug–drug interaction analysis	Novel network-based approach for drug repurposing. BC subtype-specific ruxolitinib repurposing.	[95]

ALDH2: Aldehyde Dehydrogenase 2 Family Member; ALDH6A1: Aldehyde Dehydrogenase 6 Family Member A1; AOC2: Amine oxidase, copper containing 2; ARIC study: Atherosclerosis Risk in Communities study; ASPM: Assembly Factor for Spindle Microtubules; AURKA: Aurora kinase A; BC: Breast Cancer; BIRC5: Baculoviral IAP repeat containing 5; BUB1: Mitotic Checkpoint Serine/Threonine Kinase BUB1; CACNG4: Calcium voltage-gated channel auxiliary subunit gamma 4; CB2X: Cannabinoid receptor 2; CCNA2: Cyclin A2; CCNB1: Cyclin B1; CCNB2: Cyclin B2; CD160: CD160 antigen; CD44: CD44 Molecule (IN Blood Group); CDC20: Cell division cycle 20 homolog; CDCA8: Cell division cycle-associated 8; CDK1: Cyclin-dependent kinase 1; CENPF: Centromere Protein F; CHRNA6: Cholinergic receptor, nicotinic, alpha 6; CMap database: Connectivity Map database; CPNE1: Copine-1; CSK: C-terminal Src kinase; CST3: Cystatin C; CTSF: Cathepsin F; DEPs: Differentially expressed proteins; EMT: Epithelial–Mesenchymal Transition; EPYC: Epiphycan; ER: Estrogen Receptor; FFL: Feed Forward Loop; FFPE: Formalin-Fixed Paraffin-Embedded; GDI2: GDP dissociation inhibitor 2; GEO: Gene Expression Omnibus; GOT1: Glutamic-oxaloacetic transaminase 1; GPX3: Glutathione peroxidase 3; GSS: Glutathione synthetase; GWAS: Genome-Wide Association Study; HER2: Human Epidermal growth factor Receptor 2; HMMR: Hyaluronan Mediated Motility Receptor; HR: Hormone Receptor; KO: Knockout; LDHB: Lactate dehydrogenase B; LINCS: Library of Integrated Network-based Cellular Signatures; MAD2L1: Mitotic Arrest Deficient-like 1; MCM4: Minichromosome Maintenance Complex Component 4; MELK: Maternal Embryonic Leucine Zipper Kinase; METABRIC: Molecular Taxonomy of Breast Cancer International Consortium; miR-126: microRNA-126; miRNA: microRNA; NAT: Neoadjuvant treatment; NCG: Network of Cancer Genes; NUSAP1: Nucleolar and Spindle Associated Protein 1; PARK7: Parkinsonism associated deglycase; PBK: PDZ Binding Kinase; PDB: Protein Data Bank; PEX14: Peroxisomal biogenesis factor 14; PKMYT1: Protein Kinase, Membrane Associated Tyrosine/Threonine 1; PLAUR: Plasminogen Activator, Urokinase Receptor; PPI: Protein–Protein Interaction; PR: Progesterone Receptor; PRC1: Protein Regulator Of Cytokinesis 1; PTMA: Prothymosin alpha; PYCR1: Pyrroline-5-carboxylate reductase 1; RALB: RAS-related protein Ral-B; RB1: Retinoblastoma 1; RFC4: Replication Factor C Subunit 4; SNUPN: Snurportin 1; SPN1: Spen family transcriptional repressor 1; STC2: Stanniocalcin 2; TCGA: The Cancer Genome Atlas; TF: Transcription Factor; TNBC: Triple-Negative Breast Cancer; TTK: Dual-specificity protein kinase TTK; ULK3: Unc-51 Like Kinase 3.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

