3. Results
We conducted a retrospective study to comprehensively assess the microbial community in fecal samples. The research was carried out between October 2020 and June 2025 at the University of Medicine and Pharmacy of Craiova. A total of 99 participants were included and divided into two groups: 57 breast cancer patients and 42 healthy controls, the latter encompassing individuals with benign conditions and those with no detected lesions.
In Figure 2 (left), a histogram of the patients' ages is plotted, with a blue line showing a smooth kernel density estimate of the observed data and a red line showing the theoretical normal distribution based on the mean and standard deviation of the age dataset. The two curves agree reasonably well, with the largest difference on the left side. For further visual analysis, a quantile-quantile (Q-Q) plot is shown in Figure 2 (right); the more closely the blue dots align with the red line, the more plausible a normal distribution is. A few blue dots at the ends of the line, especially on the left, do not fit the Gaussian distribution well.
A Shapiro–Wilk test was performed to verify numerically whether the patients' ages follow a Gaussian distribution, yielding a test statistic of 0.976 and a p-value of 0.064. Because the p-value is above the standard threshold of α = 0.05, there is insufficient evidence to reject the null hypothesis (H0) of normality, so the data can be treated as approximately normally distributed.
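The quantile-quantile comparison described above can be illustrated with a minimal sketch that uses only the Python standard library. The ages below are hypothetical and are not the study data; the plotting positions (i − 0.5)/n are one common convention and are an assumption here, not necessarily the one used for Figure 2.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    # Normal distribution fitted with the sample mean and standard deviation.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i - 0.5) / n keep probabilities strictly inside (0, 1).
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical ages, for illustration only (not the study data).
ages = [37, 45, 51, 55, 58, 60, 62, 65, 68, 72, 79]
for expected, observed in qq_points(ages):
    print(f"{expected:6.1f}  {observed:3d}")
```

Points that lie close to the diagonal (expected ≈ observed) support normality; systematic departures at the tails correspond to the poorly fitting dots at the ends of the line.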
Data analysis revealed that among the 57 breast cancer patients, 13 were BRCA mutation carriers, while the remaining 44 did not exhibit a BRCA mutation. Among the BRCA mutation carriers, one patient was 37 years old and the remaining patients were over 50 years of age; most lived in urban areas. Specifically, 38 of the non-BRCA carriers and 8 of the BRCA carriers resided in urban areas (Table 1).
In Table 2, we present a detailed profile of the BRCA carriers, assessing age, family history of breast cancer, diagnostic procedure, histogenetic type, histopathological diagnosis, and molecular classification.
The BRCA carrier groups in our study displayed distinct profiles. BRCA1 carriers were predominantly over the age of 40 and typically had a family history of breast cancer (BC). Diagnosis often occurred through clinical examination combined with biopsy, revealing malignant breast lesions, primarily invasive carcinomas of either ductal or lobular type. These tumors were commonly triple-negative.
In contrast, BRCA2 carriers were all over 40 years old, with no family history of BC. Their diagnoses were often made following clinical examination, a surgical procedure, and histopathological analysis. The breast lesions were invariably malignant, presenting as invasive carcinomas of either ductal or lobular type, and were most commonly luminal A-type tumors.
These subgroup differences, along with previously noted distinctions between BRCA carriers and non-carriers, align with findings reported in the literature. However, many of the key correlations in our study (e.g., age, family history, and histopathological diagnosis) were trend-like rather than statistically significant, likely due to the small sample size. This limitation stems from our center’s recent establishment and the exploratory nature of this study.
Among breast cancer patients, urban residency was associated with more pronounced microbial dysbiosis. Additionally, BRCA carriers tended to have a distinct microbial signature compared to non-carriers, suggesting a potential interaction between genetic predisposition and gut microbiota composition.
In Table 3, we present a detailed profile of the non-BRCA breast cancer patients, assessing age, family history of breast cancer, diagnostic procedure, histogenetic type, histopathological diagnosis, and molecular classification.
Applying the Shannon diversity index (SDI) to the dataset yielded two clearly separable bacterial groups:
- Bacteria shown in shades of blue have a high diversity index;
- Bacteria shown in yellowish shades have a low to zero diversity index, meaning that their values show almost no variance across patients.
Initially, 18 bacterial species were analyzed. Based on the Shannon index, some of them were not informative, as indicated by the heatmap in Figure 3. Consequently, those highlighted in yellow were excluded: Proteus species, Klebsiella species, Enterobacter species, Hafnia alvei, Serratia species, Morganella morganii, Kluyvera species, Citrobacter species, Pseudomonas species, Clostridium species, and mold fungi. The analysis was therefore narrowed from the initial 18 species to 7 species with a Shannon index greater than 0.4, which are highlighted in blue: Fusobacterium nucleatum, Faecalibacterium prausnitzii, Blautia, Bifidobacterium and Lactobacillus, as well as members of the Firmicutes and Bacteroides phyla.
Findings revealed that BC patients exhibited reduced microbial diversity compared to healthy controls. Specifically, there was a relative enrichment of Firmicutes, particularly Clostridium clusters XIVa and IV, in BC patients. Conversely, Bacteroidetes phylum members, including genera such as Bifidobacterium, Odoribacter, Butyricimonas, and Coprococcus, were depleted in the BC cohort. These alterations suggest a dysbiotic gut microbiota in BC patients, characterized by an overrepresentation of certain Firmicutes and a reduction in beneficial Bacteroides, which may influence breast cancer development and progression.
To assess more thoroughly the importance of each bacterium in the decision-making process, an AI algorithm, namely Random Forest, was trained on the dataset. The main evaluation criteria (accuracy, precision, recall, and F1-score) all reached 100%, which may indicate overfitting caused by the small dataset. Figure 4 shows the importance of each bacterium as measured by the Random Forest algorithm. The Clostridium species has an almost negligible impact on the decision-making process, with a coefficient of 0.003, compared with 0.27 for Bacteroides and 0.226 for Firmicutes. The impact of Bacteroides and Firmicutes is thus roughly 90 and 75 times that of Clostridium species, respectively. Therefore, even though it has a sufficiently high Shannon diversity index, Clostridium species was eliminated from further analysis.
Breast cancer patients exhibited significantly reduced alpha diversity compared to healthy controls. BRCA carriers showed the lowest microbial diversity among the groups. Overall, BRCA carriers are more susceptible to developing BC compared to BRCA-negative individuals. When BC occurs in BRCA carriers, it tends to be of a higher grade and more aggressive, associated with higher recurrence risk scores and worse survival outcomes. Furthermore, BRCA1 carriers are more likely to develop aggressive BC with a poorer prognosis at a younger age compared to BRCA2 carriers.
The percentage for the dominant concentration level of each bacterium, reported in Table 4 for each of the three analyzed groups, is determined using the following formula:
where n represents the number of concentration levels (three in this study: low, moderate, and high) and m represents the number of patients in the analyzed group.
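A minimal sketch of this calculation follows, under a stated assumption: the dominant level is taken to be the most frequent of the n levels, and its patient count is divided by m, that is, percent = 100 × max(c_1, ..., c_n) / m, where c_i is the number of patients at level i. This reading is inferred from the definitions of n and m rather than taken from the formula itself; it is consistent with percentages such as 92.31% reported for the 13 BRCA carriers (12 of 13). The function name and the sample group are hypothetical.

```python
from collections import Counter

LEVELS = ("low", "moderate", "high")  # the n = 3 concentration levels

def dominant_level_percent(levels):
    """Return (dominant level, percent of patients at that level)."""
    m = len(levels)  # number of patients in the analyzed group
    counts = Counter(levels)
    # Dominant level: the most frequent of the n levels (assumed reading).
    dominant = max(LEVELS, key=lambda level: counts[level])
    return dominant, 100 * counts[dominant] / m

# Hypothetical group of 13 patients: 12 at "low", 1 at "moderate".
group = ["low"] * 12 + ["moderate"]
level, percent = dominant_level_percent(group)
print(level, round(percent, 2))  # low 92.31
```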
Fusobacterium nucleatum is commonly overrepresented in breast cancer patients, especially in BRCA carriers, and has been associated with tumor progression, immune modulation, and inflammatory responses. Increased levels of Blautia species have been linked to breast cancer, likely due to their role in obesity and hormone metabolism, both of which are risk factors for breast cancer. Faecalibacterium prausnitzii, an anti-inflammatory bacterium, is generally reduced in breast cancer patients, which could contribute to increased inflammation associated with tumorigenesis. Higher microbial diversity was noted in healthy individuals compared to breast cancer patients. Beneficial bacteria such as Bifidobacterium and Lactobacillus, which contribute to gut and systemic health, were more abundant in healthy controls (Table 4).
One of the main hypotheses of the study concerns the relationship between gut microbiota and cancer status (cancer vs. non-cancer); six bacterial types are significant for this analysis. To reduce the dimensionality of the data from a six-dimensional space (one dimension per bacterium) to a two- or three-dimensional space, an AI technique designed for this problem was chosen, namely Principal Component Analysis (PCA).
In both the 2D and 3D plots (Figure 5), the two groups are clearly separable, so the visual representation suggests a relationship between gut microbiota and cancer status. Cancer patients also appear more dispersed than healthy controls, which may indicate a higher variability of gut microbiota in patients with cancer. The linear relationship determined by the PCA algorithm between each PC and the six bacteria is as follows:
- 2D projection:
PC1 = −0.272961*Fusobacterium nucleatum + 0.455046*Faecalibacterium prausnitzii − 0.312610*Blautia + 0.442051*Bifidobacterium and Lactobacillus − 0.451844*Firmicutes + 0.470242*Bacteroides
PC2 = 0.712624*Fusobacterium nucleatum + 0.201637*Faecalibacterium prausnitzii + 0.531642*Blautia + 0.374008*Bifidobacterium and Lactobacillus − 0.154154*Firmicutes + 0.072255*Bacteroides
- 3D projection:
PC1 = −0.272961*Fusobacterium nucleatum + 0.455046*Faecalibacterium prausnitzii − 0.312610*Blautia + 0.442051*Bifidobacterium and Lactobacillus − 0.451844*Firmicutes + 0.470242*Bacteroides
PC2 = 0.712624*Fusobacterium nucleatum + 0.201637*Faecalibacterium prausnitzii + 0.531642*Blautia + 0.374008*Bifidobacterium and Lactobacillus − 0.154154*Firmicutes + 0.072255*Bacteroides
PC3 = 0.133008*Fusobacterium nucleatum − 0.131506*Faecalibacterium prausnitzii + 0.144051*Blautia − 0.688158*Bifidobacterium and Lactobacillus − 0.520384*Firmicutes + 0.447106*Bacteroides
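The principal component equations above are linear combinations, so a patient's position in the reduced space can be computed directly from the six bacterial values. The sketch below copies the reported loadings and projects one sample. It assumes the six inputs have already been centered and scaled in the same way as the data used to fit the PCA, which is not specified here; the sample values and the `project` helper are hypothetical.

```python
import math

FEATURES = ["Fusobacterium nucleatum", "Faecalibacterium prausnitzii", "Blautia",
            "Bifidobacterium and Lactobacillus", "Firmicutes", "Bacteroides"]

# Loadings copied from the equations above, in FEATURES order.
LOADINGS = {
    "PC1": [-0.272961, 0.455046, -0.312610, 0.442051, -0.451844, 0.470242],
    "PC2": [0.712624, 0.201637, 0.531642, 0.374008, -0.154154, 0.072255],
    "PC3": [0.133008, -0.131506, 0.144051, -0.688158, -0.520384, 0.447106],
}

def project(sample):
    """Map one (already centered and scaled) six-value sample to its PC scores."""
    return {pc: sum(w * x for w, x in zip(weights, sample))
            for pc, weights in LOADINGS.items()}

# Sanity check: each loading vector should have unit length, as PCA components do.
for pc, weights in LOADINGS.items():
    print(pc, round(math.sqrt(sum(w * w for w in weights)), 3))

# Hypothetical standardized sample, for illustration only.
print(project([1.2, -0.8, 0.9, -1.0, 1.1, -0.7]))
```

In PC1, Fusobacterium nucleatum, Blautia, and Firmicutes carry negative weights while Faecalibacterium prausnitzii, Bifidobacterium and Lactobacillus, and Bacteroides carry positive weights, so PC1 contrasts these two sets of taxa.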
A statistical experiment comparing two logistic regression models (a null model and a complete model) was conducted to validate the conclusion drawn from the visual analysis of the PCA results. For the null model, a log-likelihood value of 57 was obtained, and for the complete model, the log-likelihood value was 99. A higher log-likelihood indicates a better-fitting model. With a likelihood ratio (LR) test statistic of 84 and a p-value of 5.32 × 10⁻¹⁶, the null hypothesis (H0) can be rejected with a high degree of confidence, which supports the existence of a relationship between the main components of the study, namely gut microbiota and cancer.
The same process was followed to analyze whether there is a potential relationship between gut microbiota and the BRCA gene in cancer patients, with the PCA results presented in Figure 6.
This time, the two groups are not linearly separable, which suggests that gut microbiota is not related to BRCA status. To strengthen the conclusions of the visual analysis given by PCA, a statistical comparison of two logistic regression models (a null model and a complete model) was carried out. For the null model, a log-likelihood value of 44 was obtained, and for the complete model, the log-likelihood value was 50. With a likelihood ratio (LR) test statistic of 12 and a p-value of 0.062, the null hypothesis (H0) cannot be rejected, indicating that, in this dataset, there is no detectable relationship between gut microbiota and the BRCA gene in cancer patients.
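The two likelihood ratio tests reported above can be reproduced from the stated log-likelihood values with a short sketch. The statistic is LR = 2 × (log-likelihood of the complete model − log-likelihood of the null model), compared against a chi-squared distribution. The degrees of freedom are not stated in the text; the value of 6 used below (one parameter per bacterial predictor) is an assumption, although it does reproduce both reported p-values. The closed-form tail probability used here is exact only for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable; exact for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

def likelihood_ratio_test(ll_null, ll_full, df):
    """Return (LR statistic, p-value) for two nested models."""
    lr = 2 * (ll_full - ll_null)
    return lr, chi2_sf_even_df(lr, df)

DF = 6  # assumption: one parameter per bacterial predictor

print(likelihood_ratio_test(57, 99, DF))  # cancer vs. non-cancer: LR = 84
print(likelihood_ratio_test(44, 50, DF))  # BRCA vs. non-BRCA: LR = 12
```

Under this assumption the first test gives p ≈ 5.32 × 10⁻¹⁶ and the second p ≈ 0.062, so only the first falls below α = 0.05.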
Breast cancer patients showed an overrepresentation of pro-inflammatory and potentially pathogenic bacteria such as Fusobacterium nucleatum, while beneficial anti-inflammatory bacteria like Faecalibacterium prausnitzii were less abundant compared to healthy controls. This microbial dysbiosis may play a significant role in breast cancer progression and systemic inflammation.
The study found that the gut microbiota of breast cancer patients was significantly more diverse compared to healthy controls. This increased diversity was not observed in healthy women, suggesting that dysbiosis status may influence the relationship between the gut microbiota and breast cancer.
Certain bacteria were found to be more abundant in breast cancer patients, including E. coli. The count of E. coli in the patients’ stool was significantly elevated, exceeding the normal upper limit (9 × 10⁷ CFU/g). This suggests an overgrowth of E. coli, which is associated with the production of biogenic amines and ammonia. These metabolites can induce inflammation and contribute to a toxic environment in the gut. The report shows reduced levels of Enterococcus and Lactobacillus species, which are crucial for maintaining gut health and preventing colonization by pathogenic bacteria. Bifidobacterium levels, although within normal ranges, were at the lower end, indicating a potentially weakened protective flora. An elevated pH indicates a more alkaline environment, often due to the overgrowth of proteolytic bacteria like E. coli, which can produce alkaline metabolites.
The report highlights an overgrowth of bacteria capable of producing biogenic amines, such as histamine, which can contribute to inflammatory responses and may exacerbate conditions like cancer by promoting a pro-inflammatory microenvironment.
Certain strains of E. coli possess the pks pathogenicity island, which enables them to produce colibactin, a genotoxin that can induce DNA double-strand breaks in host cells. This could contribute to genomic instability and potentially accelerate cancer progression. The overgrowth of E. coli can lead to increased levels of lipopolysaccharides (LPS), which trigger systemic inflammation. Chronic inflammation is a well-known risk factor for cancer progression. Beneficial bacteria such as Lactobacillus and Bifidobacterium are known to produce short-chain fatty acids like butyrate, which possess anti-inflammatory and anti-cancer properties by inhibiting histone deacetylases and modulating immune responses. A reduced presence of these beneficial bacteria can compromise gut barrier integrity, leading to increased gut permeability. This can result in the translocation of bacterial endotoxins into the bloodstream, further promoting systemic inflammation. The overproduction of histamine due to dysbiosis can exacerbate inflammation and potentially contribute to cancer proliferation. Histamine has been shown to influence tumor growth by modulating immune responses and promoting angiogenesis (formation of new blood vessels to feed tumors).
The findings from the stool analysis provide valuable insights into the patients’ gut health, revealing a state of dysbiosis that could be linked to breast cancer diagnosis. Addressing these imbalances through dietary modifications, probiotics, and anti-inflammatory interventions may help improve the patients’ overall health and potentially slow cancer progression.
4. Discussion
This study presents a comprehensive characterization of gut microbiota alterations in breast cancer (BC) patients, offering evidence of a distinct microbial signature associated with both cancer presence and BRCA mutation status. Unlike prior research, which often emphasizes general associations between dysbiosis and malignancy, our work integrates advanced computational techniques—such as Shannon diversity index (SDI), Random Forest classification, and Principal Component Analysis (PCA)—to extract and validate microbiota patterns specific to breast cancer and its genetic subtypes. From an initial pool of 18 bacterial taxa, SDI filtering allowed for the identification of seven biologically informative species: Fusobacterium nucleatum, Faecalibacterium prausnitzii, Blautia, Bifidobacterium, Lactobacillus, and broader groups including Firmicutes and Bacteroidetes. These taxa were consistently altered in BC patients compared to healthy controls.
Our findings indicate a significantly reduced microbial diversity in BC patients compared to healthy controls, with BRCA mutation carriers exhibiting the lowest diversity among all subgroups. BRCA1 carriers, in particular, were more likely to present triple-negative tumors and malignant lesions diagnosed through biopsy and histopathology, suggesting a more aggressive disease phenotype. The microbial profiles corroborated this observation: 92.31% of BRCA1/2 mutation carriers had low levels of Faecalibacterium prausnitzii, a known anti-inflammatory bacterium, while 84.62% exhibited low abundance of beneficial Bacteroidetes. Moreover, over 76% of BRCA-positive patients had high levels of Firmicutes and 69.23% had elevated Fusobacterium nucleatum, highlighting the inflammatory skew of their microbiota.
This aligns with established models that correlate dysbiosis with systemic inflammation and impaired immune regulation, two hallmarks of tumorigenesis. Notably, F. nucleatum is known to activate NF-κB signaling via TLR4, thereby promoting chronic inflammation and immune evasion, mechanisms well-documented in colorectal and breast cancers. The pro-inflammatory milieu in BRCA carriers was also evident in their higher prevalence of Blautia spp., which are implicated in metabolic dysregulation and obesity, both risk factors for hormone-sensitive BC [6,7,8].
The identification of these bacterial profiles is further supported by machine learning outputs. Random Forest analysis assigned disproportionately high weights to Bacteroides (0.27) and Firmicutes (0.226) compared to the minimal contribution from Clostridium species (0.003), leading to the exclusion of the latter from subsequent analysis. PCA revealed a clear visual separation between cancer and non-cancer cohorts, supporting the hypothesis of a microbiota–tumor axis. Statistical validation through logistic regression reinforced these observations (likelihood ratio statistic = 84, p = 5.32 × 10⁻¹⁶), providing strong evidence for a relationship between microbiota composition and cancer status.
In contrast, microbiota differences between BRCA-positive and BRCA-negative cancer patients were not linearly separable by PCA, and logistic regression analysis failed to reject the null hypothesis (p = 0.062). While this result may reflect a true biological null, it is more likely explained by the small number of BRCA carriers (n = 13), limiting statistical power. Still, the observed trends—such as lower Bifidobacterium and Lactobacillus levels and higher microbial variability among BRCA carriers—suggest a potential interaction between genetic susceptibility and microbial ecology that warrants further investigation.
One of the most salient findings is the elevated presence of E. coli, particularly strains potentially harboring the pks pathogenicity island capable of producing colibactin. Although not sequenced directly, the abnormally high stool counts (exceeding 9 × 10⁷ CFU/g) imply potential genotoxic capacity. This microbial component could contribute to DNA double-strand breaks and genomic instability in host tissues, a mechanism that may act synergistically with inherited BRCA mutations to accelerate carcinogenesis. Further, elevated stool pH and depletion of protective flora such as Lactobacillus and Enterococcus corroborate a shift toward proteolytic and alkali-producing bacterial populations, conducive to a toxic and pro-inflammatory gut environment.
The hormonal axis further contextualizes the observed dysbiosis. Several gut bacteria, particularly within the Firmicutes phylum, express β-glucuronidase, an enzyme that reactivates conjugated estrogens in the gut, increasing their systemic bioavailability. This process may exacerbate the risk in estrogen receptor-positive BC, especially in postmenopausal women who rely on peripheral aromatization for estrogen production. In our dataset, Firmicutes were significantly overrepresented in both BRCA and non-BRCA BC patients, while Bacteroidetes and SCFA-producing taxa were consistently depleted, pointing to impaired estrogen metabolism and weakened anti-inflammatory buffering [9,10].
From a translational perspective, the demonstration that microbiota profiles can stratify cancer risk suggests potential for non-invasive diagnostic biomarkers and therapeutic targets. Microbiota-modulating strategies (such as dietary interventions, prebiotics, probiotics, and even fecal microbiota transplantation) could augment existing treatment modalities by restoring microbial balance, improving immune function, and potentially reducing tumor-promoting inflammation [11,12]. Notably, the PCA model’s ability to classify cancer status visually and statistically, using only microbiota data, reinforces the potential utility of these findings in early diagnostic frameworks.
Nevertheless, this study is constrained by several limitations. The modest sample size limits generalizability and statistical power, particularly for subgroup analyses like BRCA vs. non-BRCA comparisons. Additionally, the cross-sectional nature precludes causal inferences, and although patients were treatment-naïve and antibiotic-free, unmeasured environmental or dietary confounders may still have influenced microbiota composition. Lastly, despite sophisticated modeling approaches, the study lacks longitudinal data that would clarify whether observed dysbiosis is a driver or consequence of disease. Future studies should adopt prospective designs with repeated sampling and include functional microbial analysis, such as metagenomics or metabolomics, to better delineate causal mechanisms.
The findings of this study, which demonstrate a distinct gut microbial signature in breast cancer patients characterized by elevated levels of Firmicutes and Fusobacterium nucleatum, provide a critical foundation for exploring the underlying mechanisms. These bacterial taxa are increasingly recognized for their roles in promoting local and systemic inflammation [13], which is a known driver of carcinogenesis [14]. For instance, Fusobacterium nucleatum possesses adhesins that allow it to attach to epithelial cells and activate pro-inflammatory signaling pathways, such as NF-κB, through the stimulation of Toll-like receptor 4 (TLR4) on immune cells. This activation leads to the production of cytokines, which can create a pro-tumorigenic microenvironment and contribute to the chronic inflammation observed in various cancers, including breast cancer [15].
Beyond inflammation, the gut microbiota can significantly influence host hormonal metabolism, a central pathway in the etiology of hormone-receptor-positive breast cancer. Bacteria within the gut, particularly certain species of Firmicutes and other phyla, produce enzymes such as β-glucuronidase [16]. This enzyme deconjugates estrogen metabolites that have been excreted into the bile, allowing them to be reabsorbed into the enterohepatic circulation [17]. This process, if dysregulated, can lead to increased systemic estrogen levels, which can stimulate the proliferation of estrogen-sensitive breast cancer cells. The observed shift in the microbial community in our breast cancer cohort may therefore contribute to altered hormonal homeostasis, providing a plausible link between the microbial signature and disease pathogenesis [18,19,20].
The potential for direct genotoxic effects of the microbiota also warrants discussion. As highlighted in the literature, certain strains of E. coli can produce genotoxins like colibactin. This compound is known to induce DNA double-strand breaks in host cells, a critical event in the initiation of cancer. While this study did not specifically quantify these toxins, the presence of certain bacterial species in the breast cancer group, as identified by our analysis, suggests that a microenvironment conducive to genotoxic activity may be present. This mechanism, combined with chronic inflammation and altered hormonal metabolism, offers a multifaceted explanation for how a dysbiotic gut microbiota could contribute to the development and progression of breast cancer.
4.1. Data Analysis and Interpretation
The normal distribution is one of the most commonly used in modern science because many statistical methods, such as the Pearson coefficient, linear regression, the ANOVA test, the chi-squared test, the t test, and the Fisher test, rely to varying degrees on the data following a normal (Gaussian) distribution. The main parameters of a normal distribution are the mean and the standard deviation. If a dataset follows a normal distribution, it should be nearly symmetrical around the mean, with approximately 68% of the data in the interval mean ± 1 standard deviation and approximately 95% in the interval mean ± 2 standard deviations [21]. In our case, the mean age is 59.84 years and the standard deviation is 11.64 years. To evaluate the age distribution of the patients, we employed classical methods: a histogram, a quantile-quantile (Q-Q) plot, and one normality test, namely Shapiro–Wilk.
The SDI was applied to the data to eliminate bacteria whose levels are almost constant across all patients, regardless of whether they have cancer. If a bacterium has a Shannon index below a chosen threshold, it can be excluded. So far, there is no universal and standardized threshold for the index, but the heatmap for all bacteria shows an obvious difference between certain groups.
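The filtering idea can be sketched as follows. The sketch assumes that the Shannon index of a bacterium is computed over the proportions of patients at each concentration level, H = −Σ p_i ln p_i, with the natural logarithm; the logarithm base and the exact input used in the study are not stated, so both are assumptions. The threshold of 0.4 is the one reported in the Results, and the bacterium names and level data are hypothetical.

```python
import math
from collections import Counter

def shannon_index(levels):
    """Shannon index H = sum(-p * ln p) over the observed concentration levels."""
    total = len(levels)
    counts = Counter(levels)
    return sum(-(c / total) * math.log(c / total) for c in counts.values())

THRESHOLD = 0.4  # cut-off reported in the Results

# Hypothetical concentration levels per bacterium, for illustration only.
data = {
    "Bacterium A": ["low"] * 20,  # constant across patients, so H = 0
    "Bacterium B": ["low"] * 8 + ["moderate"] * 7 + ["high"] * 5,  # varied
}
for name, levels in data.items():
    h = shannon_index(levels)
    print(f"{name}: H = {h:.3f}, keep = {h > THRESHOLD}")
```

A bacterium whose level is identical in every patient has H = 0 and carries no information for separating groups, while three equally frequent levels give the maximum value ln 3 ≈ 1.10.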
For a more in-depth analysis of the importance of each bacterium in the decision process, we employed an AI algorithm, namely Random Forest. This approach was motivated mainly by the wish to analyze the impact of one particular bacterium, Clostridium species, which has a medium Shannon diversity index, rather than to evaluate the classification performance of an AI algorithm. We did not consider the performance in classifying cancer and non-cancer patients, not for lack of interest in integrating new and emerging technologies in the decision process, but because the dataset is small and the task is too easy for a complex architecture like Random Forest.
A simple way to understand how these bacteria combine in cancer and non-cancer patients is to represent all of them visually. This is straightforward when the data are two-dimensional (two bacteria analyzed) or three-dimensional (three bacteria analyzed), but higher dimensionality is difficult or impossible for humans to interpret. To solve this issue, we used Principal Component Analysis (PCA), a dimensionality reduction technique that can transform a six-dimensional space into a two- or three-dimensional space by creating principal components (PCs) from the six bacteria. Based on the plots produced by PCA, a clear separation between cancer and non-cancer patients can be observed. These conclusions are further strengthened by a statistical comparison of two regression models: a complete model that is fitted on the bacterial data and a null model that does not use them.
4.2. Limitations and Future Works
This study identifies a distinct microbial signature in breast cancer patients, characterized by an overrepresentation of certain Firmicutes and a reduction in beneficial Bacteroides compared to healthy individuals. The finding that the microbiota can be used to distinguish cancer from non-cancer patients is visually supported by the PCA plots. This investigation, while providing novel insights into the association between gut microbiota and breast cancer, is subject to several methodological limitations that warrant consideration and provide clear directions for future research.
The study’s primary limitation is the relatively modest sample size (n = 99), which restricted the statistical power and necessitates a cautious interpretation of the findings. The observed correlations, while biologically plausible, should be considered exploratory and hypothesis-generating. Future research should prioritize the recruitment of a significantly larger, geographically and ethnically diverse cohort to validate these preliminary findings and enhance their generalizability.
Another significant limitation, imposed by the size of the dataset, is that a more in-depth analysis of BRCA vs. non-BRCA cases was not possible. Any further division of the data, especially of the 13 BRCA cases, would fragment the clusters excessively and could produce subgroups of only 4–5 cases. In future research, once the dataset has grown significantly, key factors such as histotype, phenotype, stage, and age should be explored further for BRCA vs. non-BRCA cases.
While this study establishes a compelling association between a distinct microbial signature and breast cancer, it is inherently cross-sectional and cannot infer causality. The identified differences in microbiota composition could be a consequence, rather than a cause, of breast cancer or its treatment. To address this, future investigations should adopt a longitudinal study design. A prospective cohort study, following healthy individuals over time and collecting serial fecal samples, would be instrumental in determining whether these microbial signatures precede the diagnosis of breast cancer, thereby strengthening the argument for a causal role. This would also allow for the assessment of dynamic changes in the microbiota throughout the course of the disease and its treatment, offering a more comprehensive understanding of its role in breast cancer pathogenesis.