3.1. Age, Age of Menarche, Menopausal Status, Number of Pregnancies, and BMI
Both the LR and HR groups display discrepancies in the chosen cut-off values related to the breast cancer (BC) patients’ ages. Moreover, only a subset of them reports such data: regarding the LR group, two articles [
16,
49] define “60 years old (y/o)” as a discriminative value (
Figure 2A), while “40 y/o” is indicated in the other study [
47] (
Figure 2B). In contrast, the two HR studies [
44,
55] differ in the threshold used, with “50 y/o” and “40 y/o” as the selected values (
Figure 2C,D). Hence, the indication of a higher presence of older subjects in the HR group compared to the LR one cannot be fully supported due to both the limited number of available studies and the defined ranges.
This trend interestingly aligns with that of our cohort, which is skewed towards individuals over 50 years old (72.1%,
Figure 2E). However, when the two age groups (≤50 y/o and >50 y/o) are considered separately, HR cases in our cohort are almost twice as frequent as LR ones in both groups. Therefore, age is not a significant discriminating factor for patient risk frequency.
The next parameter analyzed was the age of menarche. None of the seventeen selected articles reports such data; therefore, the analysis was limited to the IOM cohort. By stratifying our patients, we observed that menarche most frequently occurred at the age of twelve, which was selected as the cut-off value (
Figure 3A and
Table 2). Although the majority of patients in the IOM cohort have an age of menarche at or below twelve, the risk class sub-analysis within the two categories (≤12 and >12) does not reveal a clear trend (
Figure 3B). Additionally, while there may be a trend starting from the age of 10, as indicated in
Figure 3B, the statistical analysis does not demonstrate a significant association with the age of menarche (
p = 0.70), even when excluding the 16 y/o value.
The patients’ menopausal condition is partially indicated in both the LR and HR groups, limiting the ability to compare their data with those of our cohort. Only three articles in the LR group [
16,
41,
42] documented this status: in two of them, the entire population considered is composed of post-menopausal subjects, while they are pre-menopausal in the other one (
Figure S1A). Conversely, two studies in the HR group [
43,
55] provide the menopausal status: patients are entirely pre-menopausal in the first study and mixed in the second one (35% and 65%, respectively,
Figure S1B). Therefore, due to the low quality and quantity of the data provided, no conclusion can be drawn. The IOM cohort, which predominantly consists of post-menopausal BC subjects (62.1%,
Figure S1C), suggests that this factor is not relevant. In fact, nearly identical percentages were obtained from the analysis (
p = 0.65) across the two examined categories (
Table S5).
Like the age of menarche, the number of pregnancies is another under-reported parameter, and no data are provided for either the LR or HR group. Following the previous approach, IOM-cohort subjects were stratified based on their number of pregnancies: most patients have had a maximum of two pregnancies (72.9%,
Figure 4A); thus, two was chosen as the cut-off value. Even in this case, the sub-analysis of LR/HR case percentages within the two defined categories (≤2 and >2) does not yield any evident indication. However, as the number of pregnancies increases, the relative percentage of HR patients progressively decreases, with a corresponding increase in the percentage of LR subjects (
Figure 4B and
Table 3). Since this analysis reveals a statistically significant trend (
p = 0.0027) between the number of pregnancies and EPclin results, higher parity in our cohort appears to be associated with a lower frequency of EPclin high-risk classification.
Body mass index (BMI) is a simple index of weight relative to height. It was analyzed in the IOM cohort to investigate differences between LR- and HR-classified patients. Following the World Health Organization’s guidelines [
56], individuals were stratified into four categories, namely underweight, normal, overweight and obese, along with one group that had no available data (
Figure S2A). The analysis of these four categories does not reveal any trend, and subsequent statistical evaluation confirms a lack of meaningful insights (
Figure S2B, Table S6).
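As a minimal sketch of the stratification described above, the following Python snippet computes BMI as weight in kilograms divided by the square of height in metres and assigns the standard WHO adult categories (underweight < 18.5, normal 18.5–24.9, overweight 25–29.9, obese ≥ 30). The function names and the handling of missing values are illustrative assumptions and do not represent the procedure actually applied to the IOM cohort.

```python
from typing import Optional


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def who_bmi_category(value: Optional[float]) -> str:
    """Map a BMI value to a WHO adult category (hypothetical helper).

    A missing value is assigned to a separate "no data" group,
    mirroring the fifth group mentioned in the text.
    """
    if value is None:
        return "no data"
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    return "obese"


# Example: a patient weighing 70 kg with a height of 1.75 m.
print(who_bmi_category(bmi(70, 1.75)))
```

Keeping the missing-data group explicit makes it straightforward to report how many patients were excluded from the category-wise comparison.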
3.2. Tumor Stage, Nodal Status, and Grade
Next, tumor stage and nodal status, both pivotal parameters contributing to the EndoPredict Clinical (EPclin) score, along with tumor grade, were analyzed and compared between the LR and HR groups as well as within the IOM cohort.
Since the EndoPredict test is intended and developed for primary non-metastatic BCs in women, only female patients with pT1-T3 breast tumors are eligible. As depicted in
Figure 5, the LR group displays a higher percentage of stage 1 tumors (pT1) compared to the HR group (72.2% versus 50.2%). Conversely, the HR group shows almost double the values (45.7% and 3.8%) for stage 2 (pT2) and stage 3 (pT3) tumors compared to the LR group (
Figure 5A,B). Even inside the pT1 tumor fraction itself, by excluding those studies [
16,
44,
51] that did not report complete sub-stratification of pT1 tumors, the smallest tumors (pT1ab) are more frequent in the LR group (19.7%) than in the HR group (10.1%) (
Figure 5C,D). Consequently, the greater proportion of larger tumors is a distinguishing trait of the HR group and is consistent with its higher number of high-risk subjects.
The statistical comparison of the LR and HR groups does not reach significance when the pT1 tumor sub-stratification is accounted for (
Figure 5F); however, the difference is significant for pT1 (as a whole group) and pT2, but not for pT3 tumor stages (
Figure 5E).
The IOM cohort displays a profile that appears closer to that of the LR group (
Figure 6). However, unlike the previous cohorts, only a small fraction of tumors (7.9%) are pT1ab, while pT1c tumors predominate (57.1%). In addition, the IOM cohort contains more pT2 tumors as well. The pT1c and pT2 stages are significantly more represented in the IOM cohort (90.7%) than in the LR group (78.3%) and are slightly more numerous compared to the HR group as well (85.1%). These data, together with the evidence that the percentage of high-risk EndoPredict results progressively rises as tumor size increases (
p < 0.001,
Table 4), suggest that the IOM cohort aligns more closely with a high-risk-profile group.
Although most of the tested subjects in both the LR (79.5%) and HR groups (62.2%) are characterized by node-negative BCs, the percentage of node-positive tumors is greater in the HR group (37.3%) compared to the LR group (20.5%) (
Figure 7A,B). This difference is statistically significant when two high-risk studies [
51,
53], whose enrolled subjects are all node-positive or all node-negative, are excluded from the analysis (
Figure 7C).
Our cohort, which includes a large proportion of node-negative BC subjects (72.1%,
Figure 8), appears to closely resemble the composition observed in the HR group. Furthermore, when the lymph node status is negative, the percentage of LR/HR scores is almost balanced (
Table 5); conversely, the percentage of EP tests considerably shifts towards high-risk-classified scores (
p < 0.001,
Table 5) when the lymph node status is positive.
By comparing the LR and HR groups according to tumor grade, which reflects cancer aggressiveness and ranges from 1 to 3, both categories show similar percentages of G2 tumors (65.0% versus 64.7%), while G1 and G3 values are basically reversed (
Figure 9A,B). Nevertheless, the comparison of the two groups by grade does not reach statistical significance (
Figure 9C).
IOM subjects, unlike those from the literature, display a complete absence of G1 cancers and a larger G3 fraction (
Figure 10). In our cohort, while G2 BCs include nearly the same percentages of low- and high-risk results, the G3 fraction has three times the percentage of high-risk subjects compared to low-risk ones (
p = 0.006,
Table 6). Thus, the higher prevalence of G3 cases clearly shifts our cohort towards a high-risk profile.
3.3. Molecular and Histological Features: ER, PR, Ki-67, Molecular Subtype, and Histology
Several immunophenotypic parameters were evaluated to assess potential associations between these characteristics and the frequency of high-risk cases. However, in the available literature, only a limited number of studies report specific information on the quantitative expression of the estrogen receptor (ER). Only one study, with a predominance of patients with a low EPclin score, provides precise stratification of patients into categories based on different levels of ER expression [
16]: a total of 57.1% of patients exhibit high ER expression, while 32.5% have medium ER levels, and 10.4% have low ER expression. The only study with a predominance of patients with a high EPclin score uses a combined stratification strategy based on the expression of both the ER and the progesterone receptor (PR) [
44]: a total of 69.6% of patients have a high percentage of hormone-receptor-positive cells, and 29% exhibit low expression, but this information is missing for 1.4% of subjects.
Overall, the IOM cohort exhibits a different distribution. According to ASCO guidelines [
57], only BCs with IHC ER expression above 10% clearly benefit from endocrine therapy, while more controversial data are available for the 1–9% category. On this basis, and adopting a stratification strategy found in the literature [
58], patients were divided into three categories. The percentages of patients with high (above or equal to 90%), medium (70–89%), and low (10–69%) ER expression are 89.3%, 5%, and 1.4%, respectively (
Figure 11). The differences in high- and low-risk frequencies across the three ER expression groups were statistically significant (
p < 0.001,
Table 7). Still, since the vast majority of patients in the cohort belong to the high-ER-expression group, the robustness of the analysis may be affected, and these data need to be interpreted with caution.
A total of six identified studies report details on PR positivity. Of these, four [
16,
47,
49,
53] have a higher frequency of low-risk patients, while two [
43,
55] show a predominance of high-risk individuals (
Figure 12A,B). In all the analyzed studies, the predominant fraction consisted of PR-positive patients. The average proportion of PR-negative patients in studies with a predominance of low-risk cases was 8.7%, while in studies with a predominance of high-risk cases, it was 6.7%. The difference in the frequency of PR positivity between the two groups of studies was not statistically significant (
Figure 12C).
Note that all previous statistical analyses comparing both LR and HR groups on the basis of a certain parameter and their interpretations are hypothesis-generating and not confirmatory due to the descriptive/scoping nature of our review.
Similarly to ER expression status, IOM patients were first stratified on the basis of the cut-off value (20%) reported in the guidelines of the Italian Ministry of Health and based on recommendations from the Italian Society of Medical Oncology (AIOM) [
6]. They show a higher proportion of PR-negative BCs compared to both previous groups of studies (19.3%,
Figure 12D). Among PR-negative patients, 67% exhibit a high-risk profile, while the remaining 33% are classified as low-risk (
Table 8). Similarly, 65% of PR-positive patients are at a high risk, while 35% are at a low risk. The differences in high- and low-risk frequencies between the patient groups are not statistically significant (
p = 0.88).
Few studies report detailed information on Ki-67 expression levels: three [
16,
42,
47] and five articles [
43,
48,
50,
52,
55] with a predominance of low-risk and high-risk patients, respectively. Moreover, since the Ki-67 thresholds used to stratify patients vary across studies, data aggregation and a comparison with our data are challenging. One study [
42] employs a Ki-67 threshold of 30% to classify patients, identifying 83.6% as having low Ki-67 expression and 12.2% as having high Ki-67 expression, with 4.2% of cases lacking this information (
Figure 13A). Two additional studies use lower, but comparable, thresholds: Filipits et al. set their threshold at 11%, while Jahn et al. use a threshold of 10%. The cohort in Filipits et al.’s study shows a distribution similar to that of Constantinidou et al., with 74.6% of patients classified as having low Ki-67 expression and 21.6% as having high Ki-67 expression, with 3.8% missing data (
Figure 13B). However, in Jahn et al.’s study, where the threshold is set at 10%, the distribution is reversed, with only 23.4% of patients having Ki-67 levels ≤ 10% and 76.6% exhibiting levels above it (
Figure 13C). A similar pattern is observed in studies focusing on high-risk patient groups. In two studies [
48,
55], a Ki-67 threshold of 20% is applied, and the proportion of patients classified as having high Ki-67 expression is substantially larger, with 58.6% falling into the low-Ki-67 category and 35.8% into the high-Ki-67 category (
Figure 13D). Likewise, in studies that use a threshold of 14% [
43,
50,
52], high Ki-67 expression is more prevalent than low Ki-67 expression. Specifically, 53.5% of patients exhibit Ki-67 levels above 14%, while 36.9% have Ki-67 levels at or below this threshold (
Figure 13E). These findings highlight the inconsistency in Ki-67 stratification criteria across studies, which complicates direct comparisons and meta-analyses. The observed trend of higher Ki-67 expression being more frequent in high-risk patient groups may suggest a potential association between Ki-67 levels and patient risk stratification. However, this trend can only be interpreted narratively because, as emphasized before, the wide range of cut-off values selected in the analyzed studies makes it difficult to draw inferential statements across studies. Nevertheless, further investigation of this trend would be reasonable.
According to the guidelines of the Italian Ministry of Health and based on recommendations from AIOM [
6], a Ki-67 threshold of 20% has been established as a significant parameter for risk classification. Consequently, the IOM patients were first stratified according to this criterion. Notably, the majority of patients in the cohort exhibited a Ki-67 value equal to 20% (
Figure 14A and
Table 9). This particular enrichment may be due to the fact that surpassing this threshold is one of the five criteria used to classify a patient as having an intermediate or high risk. Moreover, the proportions of high- and low-Ki-67 patients in the IOM cohort (49.3% and 46.4%, respectively) are comparable to those observed in studies featuring a predominance of high-risk patients (
Figure 13D,E and
Figure 14A). Although the limited number of available studies and the variability in Ki-67 thresholds prevent a comprehensive evaluation of these similarities, these findings suggest that the IOM cohort may be classified as a high-risk population.
Additionally, we further stratified our cohort by considering each individual Ki-67 value, from 10% to 70%. As depicted in
Figure 14B, we observed a clear discrimination between LR- and HR-classified patients when Ki-67 is above 25%. In fact, when its value is between 10% and 25%, the probability of being classified as having a high or low risk of recurrence varies, as it probably depends on other factors. In contrast, when the Ki-67 value exceeds 25%, the probability of a high-risk result progressively increases as Ki-67 becomes higher.
The statistical analysis performed additionally supports this hypothesis (
p = 0.0001,
Table 9). Further analysis with larger and more standardized cohorts could offer greater clarity on this hypothesis.
EndoPredict is used in hormone-receptor-positive disease and, under standard conditions, is primarily administered to patients with the Luminal A or Luminal B phenotype. In the analyzed studies, information on surrogate molecular subtypes is never explicitly reported, preventing a systematic assessment of its potential implications in risk evaluation.
In the IOM case series (
Figure 15), 60% of patients have a Luminal B phenotype, 35.7% have a Luminal A phenotype, while in 4.3% of cases, this information was not retrieved. Among the 50 patients with a Luminal A phenotype, 20 (40%) were classified as low-risk according to the EndoPredict test, while 30 (60%) were classified as high-risk (
Table 10). Among the 84 patients with a Luminal B phenotype, 26 (31%) were classified as low-risk, while 58 (69%) were classified as high-risk. The slight differences observed in risk distribution between the two phenotypic groups were not statistically significant (
p = 0.24). This finding should be interpreted in light of the clinical selection criteria for EndoPredict testing, which is performed on a subset of patients with intermediate clinical risk rather than on the entire Luminal A or B population. The EPclin risk classification depends, in fact, on the gene expression signature, which also adjusts for tumor size and nodal status, not on the luminal phenotype itself.
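The comparison above can be illustrated with the reported counts (Luminal A: 20 low-risk and 30 high-risk; Luminal B: 26 low-risk and 58 high-risk). The sketch below applies a Pearson chi-square test without continuity correction to this 2×2 table. This choice of test is an assumption: the text does not state which test produced p = 0.24, so the computed value need not coincide with the reported one, although it leads to the same conclusion of no significant difference.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the test statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: Luminal A, Luminal B; columns: low-risk, high-risk (counts from the text).
statistic, p_value = chi_square_2x2(20, 30, 26, 58)
print(f"chi-square = {statistic:.3f}, p = {p_value:.3f}")
```

A Fisher exact test or a continuity-corrected chi-square would give somewhat different p-values on the same table, which is why the test used should always be reported alongside the result.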
The histology of tumors was analyzed in studies where this information was available: five [
45,
46,
47,
49,
53] and three studies [
50,
51,
55] with a prevalence of low-risk and high-risk subjects, respectively. Considering only the invasive subtype, in the group of studies with a predominance of low-risk patients, the average percentage of ductal carcinomas is 78.53%, while the percentage of lobular carcinomas is 13.09%. Additionally, an average of 8.48% of patients in this group have no available information or exhibit a different phenotype (
Table S7). In the group of studies with a predominance of high-risk patients, the average percentage of ductal carcinomas is 82.66%, while the percentage of lobular carcinomas is 5.78%. Additionally, an average of 11.55% of patients in this group have no available information or exhibit a different phenotype (
Figure S3A,B, Table S7).
In the IOM cohort, the percentages of ductal and lobular carcinomas are 85% and 7.9%, respectively, while 7.1% of patients have no available information or exhibit a different phenotype (
Figures S3C and S4A–C). The majority of patients (119) have an invasive ductal histotype, and the differences observed between ductal and lobular samples are not statistically significant (
p = 0.28,
Table S8).
To further assess the independent contribution of clinical and biological factors, we performed a multivariable logistic regression including age, number of pregnancies, tumor size, nodal status, histological grade, and Ki-67 (
Table 11). Tumor size, nodal involvement, and the proliferative index remained independent predictors of a high-risk classification according to EPclin. Using Ki-67 as a continuous variable, pT1c (OR 7.12; 95% CI 1.25–40.60;
p = 0.027), pT2 (OR 38.99; 95% CI 5.39–281.95;
p = 0.0003), and node-positive disease (OR 5.24; 95% CI 1.48–18.57;
p = 0.010) were significantly associated with a high risk. Ki-67 was independently associated with a high risk as well (OR 1.11 per 1% increase; 95% CI 1.03–1.19;
p = 0.006). The model’s AUC was 0.846.
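To make the reported estimates easier to read, the following sketch shows how an odds ratio and its 95% CI relate to the underlying log-odds coefficient. It assumes that the intervals are Wald intervals (symmetric on the log scale), which the text does not state explicitly; the helper names are hypothetical. Because the published values are rounded, the recovered p-values are only approximate.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_ci(odds_ratio: float, lower: float, upper: float):
    """Recover the standard error of the log-odds, the z statistic and the
    two-sided p-value from an odds ratio and its 95% CI.

    Assumes a Wald interval: exp(beta +/- Z_95 * se).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


def rescale_or(odds_ratio: float, units: float) -> float:
    """Odds ratio for a change of `units` in a continuous predictor."""
    return odds_ratio ** units


# Estimates reported in the text for the model with continuous Ki-67.
reported = {
    "pT2": (38.99, 5.39, 281.95),
    "node-positive": (5.24, 1.48, 18.57),
    "Ki-67 per 1%": (1.11, 1.03, 1.19),
}
for name, (or_, lo, hi) in reported.items():
    se, z, p = wald_from_ci(or_, lo, hi)
    print(f"{name}: se = {se:.3f}, z = {z:.2f}, p = {p:.4f}")

# An OR of 1.11 per 1% corresponds to roughly 2.8 per 10 percentage points.
print(f"Ki-67 per 10%: OR = {rescale_or(1.11, 10):.2f}")
```

The very wide interval for pT2 reflects a large standard error on the log scale, which is typical when one category contains few low-risk cases.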
In an alternative specification using Ki-67 ≥ 20% (vs. <20%), pT2 (OR 27.08; 95% CI 4.12–178.20;
p = 0.0006), node-positive disease (OR 6.84; 95% CI 1.71–27.29;
p = 0.006), and Ki-67 ≥ 20% (OR 4.79; 95% CI 1.06–21.65;
p = 0.042) were significantly associated with a high risk, whereas pT1c showed a positive but non-significant trend (OR 3.95; 95% CI 0.81–19.24;
p = 0.089). Age showed a borderline inverse association with high-risk classification (OR 0.95 per year; 95% CI 0.91–1.00;
p = 0.044), while the number of pregnancies was not significant. The model’s AUC was 0.824. The complete results of the multivariable logistic regression models are reported in
Supplementary Table S4.