Survival of Breast Cancer by Stage, Grade and Molecular Groups in Mallorca, Spain

The aims of this study are: (1) to determine cause-specific survival by stage, grade, and molecular groups of breast cancer; (2) to identify factors which explain and predict the likelihood of survival and the risk of dying from this cancer; and (3) to determine the distribution of breast cancer cases by stage, grade, and molecular groups in females diagnosed in the period 2006–2012 in Mallorca (Spain). We collected data regarding age, date and diagnostic method, histology, laterality, sublocation, pathological or clinical tumor size (T), pathological or clinical regional lymph nodes (N), metastasis (M) and stage, histologic grade, estrogen and progesterone receptor status, HER-2 expression, Ki67 level, molecular classification, date of last follow-up or date of death, and cause of death. We identified 2869 cases. Cause-specific survival for the entire sample was 96% 1 year after diagnosis, 91% at 3 years and 87% at 5 years. Relative survival was 96.9% 1 year after diagnosis, 92.6% at 3 years and 88.5% at 5 years. The competing-risks regression model determined that patients over 65 years of age and patients with triple negative cancer have worse prognoses, and as stages progress, the prognosis for breast cancer worsens, especially from stage III.


Introduction
Breast cancer is the most common cancer and the leading cause of death from cancer in women around the world [1], including in Spain [2]. According to REDECAN, an estimated 34,750 new cases were expected to be diagnosed in women in Spain in 2022 [2].
According to EUROCARE-5, for the period 2000-2007, 5-year relative survival with breast cancer in Spain was 82.8% (81.9-83.6), slightly higher than the European average of 81.8% (81.6-82.0), with a range from 74% for Eastern Europe to 85% for Northern Europe [3]. Compared with the period 1995-1999, breast cancer survival has improved in Spain [4].
Analyses of prognostic factors on survival are essential for patients and professionals and impact on health policy. Scientific evidence has demonstrated the prognostic value of different variables, such as age, ethnicity, tumor size, histology, histological grade, stage at diagnosis, hormone receptor status, and the surgical and adjuvant treatment patients receive [5,6].
Although stage, grade, and molecular groups are prognostic factors in breast cancer [6], information about survival based on these variables continues to be scarce. Regarding stage, grouped classification (localized, regional extension and metastasized) [7] or four-stage classification (I, II, III and IV) [8,9] are widely used. However, the International Union Against Cancer TNM system (7th edition) classifies invasive breast cancer stage into eight categories (IA, IB, IIA, IIB, IIIA, IIIB, IIIC and IV) [10]. Information about survival by stage, grade, or molecular group, as well as the relationships among these variables, is vital for making clinical decisions. Progressive improvements in survival associated with early detection and better management and treatment of the disease have been observed [6].
The aims of this study are: 1. to determine cause-specific survival by stage, grade, and molecular group of breast cancer; 2. to identify factors which explain and predict the likelihood of survival and the risk of dying from this cancer; and 3. to determine the distribution of breast cancer cases by stage, grade, and molecular group.

Patient Involvement
Our research comprised a population-based, retrospective follow-up study of female patients living in Mallorca, diagnosed with invasive breast cancer (C50) between 2006 and 2012, identified through the Mallorca Cancer Registry. The total population of Mallorca in 2012 was 876,147 inhabitants. We excluded cases exclusively identified through the death certificate (DCO) and cases without follow-up (only for survival analysis).

Variables
The following data were collected: age at diagnosis, date and diagnostic method, histology, laterality, sublocation, pathological or clinical tumor size (T), pathological or clinical regional lymph node status (N), metastasis (M) and stage, histologic grade, estrogen and progesterone receptor status, HER-2 expression, Ki67 levels, molecular classification, date of last follow-up or date of death and cause of death (breast cancer or other causes).
We defined survival time as the time from the date of diagnosis to the date of last known vital status (death by any cause, date of loss to follow-up, or date of the end of follow-up on 31 December 2018). Vital status was categorized as alive (0), dead from breast cancer (1) or dead from other causes (2).

Statistical Analysis
We used the multiple imputation (MI) method to assign values when these were missing in the following main variables: laterality, sublocation, stage, histologic grade, and molecular classification. Three main steps were followed [11]. First, we ran the imputation model and replaced each missing value with sets of 5, 10, 15, and 20 imputations by applying the multiple imputation by chained equations (MICE) procedure. We performed MI using sex, age, diagnostic method, histology, time, and vital status. A more detailed description can be found in a previous manuscript [12]. Secondly, we independently analyzed the resulting imputed and complete data sets by applying a competing-risks regression. Finally, we combined the estimates from each set of 5, 10, 15, and 20 competing-risks regressions into a single model using Rubin's rules [13]. We selected the MI with five imputations because the increase to 10, 15, or 20 did not change the coefficient values, the standard errors, or the degrees of significance.
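The pooling step can be illustrated with a minimal sketch of Rubin's rules. The function name `pool_rubin` and the input values are hypothetical and chosen only for illustration; they are not estimates from this study, and the analysis itself was run in STATA.

```python
import math

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules).

    estimates: one coefficient per imputed data set (e.g., a log subhazard ratio)
    variances: the corresponding within-imputation variances (SE squared)
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = sum(estimates) / m                              # pooled point estimate
    w_bar = sum(variances) / m                              # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = w_bar + (1 + 1 / m) * b                         # total variance
    return q_bar, math.sqrt(total)

# Hypothetical coefficients from m = 5 imputed data sets (illustrative values only)
log_shr = [0.62, 0.58, 0.65, 0.60, 0.55]
se_squared = [0.04, 0.04, 0.05, 0.04, 0.05]
pooled_estimate, pooled_se = pool_rubin(log_shr, se_squared)
print(pooled_estimate, pooled_se)
```

The between-imputation term is what makes the pooled standard error reflect the uncertainty due to the missing values; if it barely changes when m grows, a small m such as 5 is usually considered sufficient.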
Before performing the survival analysis, we explored the relationships among the variables using contingency tables, on the basis of which the Chi-square independence test and the Cramer's V association index were performed.
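As an illustration of these two statistics, the sketch below computes the Pearson chi-square statistic and Cramér's V from a contingency table in plain Python. The function name and the counts are hypothetical; the table is a made-up 3 × 5 cross-tabulation and does not reproduce the study data.

```python
import math

def chi_square_and_cramers_v(table):
    """Return (chi-square statistic, degrees of freedom, Cramér's V).

    table: list of rows with observed counts; every row and column total
    must be positive so that all expected counts are non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    k = min(len(row_totals), len(col_totals))
    v = math.sqrt(chi2 / (n * (k - 1)))
    return chi2, dof, v

# Hypothetical counts: 3 rows (e.g., grades) by 5 columns (e.g., molecular groups)
counts = [
    [120, 40, 60, 10, 8],
    [150, 110, 140, 45, 40],
    [30, 90, 70, 60, 85],
]
chi2, dof, v = chi_square_and_cramers_v(counts)
print(round(chi2, 2), dof, round(v, 2))
```

Cramér's V rescales the chi-square statistic to the range 0 to 1, so the strength of association can be compared across tables of different size and sample count.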
As we knew the cause of death, we used Cause-Specific Survival (CSS), but we also calculated the Relative Survival (RS) by the Ederer II method [14] using life tables obtained from published official mortality data for the Balearic Islands [15]. Since survival studies such as EUROCARE and CONCORD have used RS, we decided to calculate both survival types in order to be able to compare them with each other and with the aforementioned studies.
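For reference, the Ederer II estimator can be sketched as follows. The notation is introduced here for illustration only: p_i is the observed survival of the patients over follow-up interval i, n_i is the number of patients alive at the start of interval i, and p*_ij is the life-table survival over that interval for a person in the general population matched to patient j (the matching variables, typically age, sex and calendar year, are an assumption and are not detailed in the text).

```latex
p_i^{*} = \frac{1}{n_i} \sum_{j=1}^{n_i} p_{ij}^{*},
\qquad
\mathrm{RS}(t_k) = \frac{\prod_{i=1}^{k} p_i}{\prod_{i=1}^{k} p_i^{*}}
```

Because the expected survival in each interval is averaged only over the patients still at risk at its start, the comparison population follows the ageing and attrition of the cohort, which distinguishes Ederer II from methods that fix the matched population at diagnosis.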
We applied the actuarial and Kaplan-Meier methods in our survival analysis to estimate the likelihood of survival and risk of death. We used the log-rank test to evaluate the statistical differences of the observed survival curves by each categorical variable and created graphic representations thereof in order to compare and observe the evolution of survival over time. Finally, we applied competing-risks regression models to identify the prognostic factors associated with mortality risk. The regression model included age, diagnostic method, sublocation, histology, stage, laterality, molecular classification, and histologic grade. Cases at stage IB were excluded because their survival rate was 100%. We tested the proportional hazard assumption for each covariate by introducing time-dependent variables.
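The Kaplan-Meier step can be illustrated with a short sketch. It assumes that, for cause-specific survival, only deaths from breast cancer (status 1) count as events, while patients alive at last follow-up (status 0) and deaths from other causes (status 2) are treated as censored. The function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    times:  follow-up time per patient
    events: 1 if the event of interest occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up (years) and vital status coded as in the text:
# 0 = alive, 1 = dead from breast cancer, 2 = dead from other causes
follow_up = [1.5, 2.0, 3.2, 4.1, 5.0, 6.3, 7.7, 9.0]
status = [1, 0, 2, 1, 0, 1, 0, 0]
events = [1 if s == 1 else 0 for s in status]  # assumption: other-cause deaths censored
for t, s in kaplan_meier(follow_up, events):
    print(t, round(s, 3))
```

The actuarial (life-table) method differs mainly in grouping follow-up into fixed intervals, such as years, rather than updating the estimate at each observed event time.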
Competing-risks regression [16] provides a valuable alternative to Cox regression [17] for survival data in the presence of competing risks. Competing-risks regression posits a model for the subhazard function of a failure event of primary interest in the presence of competing failure events that impede the event of interest. A competing event must not be confused with the usual right-censoring found in survival data, such as censoring due to loss to follow-up: whereas censoring merely prevents the event of interest from being observed, a competing event prevents it from occurring altogether. In our study, the event of interest was breast cancer death, while the competing failure event was death from other causes. This model estimates subhazard ratios, which are interpreted in a manner akin to the hazard ratios in Cox regression.
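The subhazard can be written out explicitly. The block below is a sketch that assumes the model in [16] is the subdistribution-hazard formulation of Fine and Gray. The symbols are notation introduced here rather than taken from the original analysis: T is the time to the first event, ε its cause (1 for breast cancer death, 2 for death from other causes), x the covariate vector, and β the coefficient vector.

```latex
\bar{h}_1(t) = \lim_{\Delta t \to 0}
  \frac{P\{\, t \le T < t + \Delta t,\ \varepsilon = 1 \mid T \ge t \ \text{or}\ (T < t \ \text{and}\ \varepsilon \ne 1) \,\}}{\Delta t}

\bar{h}_1(t \mid \mathbf{x}) = \bar{h}_{1,0}(t)\, \exp(\mathbf{x}^{\top} \boldsymbol{\beta}),
\qquad \mathrm{SHR}_k = \exp(\beta_k)

\mathrm{CIF}_1(t \mid \mathbf{x}) = 1 - \exp\{ -\bar{H}_{1,0}(t)\, \exp(\mathbf{x}^{\top} \boldsymbol{\beta}) \},
\qquad \bar{H}_{1,0}(t) = \int_0^t \bar{h}_{1,0}(u)\, du
```

Under this formulation, patients who have died from other causes remain in the risk set, so a subhazard ratio above 1 corresponds directly to a higher cumulative incidence of breast cancer death.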
We selected the covariates in the final competing-risks model using the Wald test. We performed the competing-risks regression model both before and after MI to compare the effect of the imputation procedure on the subhazard ratio estimation of covariates.
We used STATA 16 for MI and CSS analysis and the 'relsurv' R library for RS.

Results
We identified a total of 2885 breast cancer cases with diagnoses between 2006 and 2012. We excluded 16 DCO cases, so the final sample was 2869 cases. Of them, 98.8% were diagnosed by pathological methods (1.2% by clinical methods), and 82.0% had ductal/NST histology. There were 5.4% of cases with unknown laterality, 18.2% with unknown sublocation, 16.7% with unknown T, 17.8% with unknown N, 14.8% with unknown M, 16.7% with unknown stage, 22.7% with unknown histologic grade, and 22.0% with unknown molecular classification. After MI, 30.1% were in stage IA, 3.4% were in stage IB, 24.1% were in stage IIA, 15.7% were in stage IIB, 11.2% were in stage IIIA, 3.8% were in stage IIIB, 3.4% were in stage IIIC, and 8.3% were in stage IV. Table 1 presents a complete description of the sample and the distribution of the variables imputed after applying MI.

Table 2 shows CSS by stage and year before and after MI. If MI had not been applied, there would have been a slight overestimation in the initial stages; on the other hand, there would have been an underestimation in the more advanced stages (IIIB, IIIC and IV). Survival times with breast cancer seemed to stabilize for some stages (IB, IIA, IIIB, IIIC and IV) but not for others (IA, IIB, IIIA).

Table 2. CSS in percentages by year of follow-up and stage, before and after MI (years 8 to 13).

Before MI
Year   IA    IB   IIA   IIB  IIIA  IIIB  IIIC    IV  Total
8      96   100    89    86    69    62    52    16     81
9      96   100    88    85    67    54    49    14     80
10     96   100    87    84    66    51    49     9     79
11     96   100    87    81    66    51    49     6     78
12     94   100    87    81    63    51    49     6     77
13     94   100    87    76    63    51    49     6     76

After MI
Year   IA    IB   IIA   IIB  IIIA  IIIB  IIIC    IV  Total
8      95    98    88    86    70    64    55    25     81
9      95    98    88    85    68    58    52    23     80
10     94    98    87    84    67    55    52    19     79
11     94    98    87    82    67    55    52    16     79
12     93    98    87    82    65    55    52    16     78
13     93    98    87    77    65    55    52    16     77

In the same way, Table 3 shows CSS by molecular classification and year before and after MI.
Again, if MI had not been applied, there would have been a slight overestimation in all molecular groups, except in the case of the triple negative, in which there would have been a slight underestimation.

Table 3. Cause-specific survival (CSS) function in percentages by years of follow-up and molecular classification based on the actuarial method, before and after multiple imputation (MI) (m = 5).
Lum A = luminal A; Lum B = luminal B; Lum Ki67? = luminal with Ki67 unknown; HER-2 = HER-2 enriched; TN = triple negative.

Before MI
Year  Lum A  Lum B  Lum Ki67?  HER-2   TN  Total
1        99     97         98     95   92     97
2        97     95         95     88   86     94
3        97     92         93     84   83     92
4        96     89         91     82   79     89
5        95     87         89     80   77     87
6        94     84         86     76   75     85
7        94     82         84     75   73     83
8        93     79         83     73   73     82
9        93     78         81     68   72     80
10       93     75         80     68   72     79
11       93     75         80     68   72     79
12       93     75         78     68   72     78
13       93     69         78     68   72     77

After MI
Year  Lum A  Lum B  Lum Ki67?  HER-2   TN  Total
1        98     97         97     91   92     96
2        95     94         95     85   87     93
3        95     91         92     81   84     91
4        93     88         90     79   80     88
5        93     86         88     77   78     87
6        92     84         86     74   76     84
7        91     82         83     73   74     82
8        90     79         82     71   74     81
9        90     79         81     67   73     80
10       90     76         80     67   73     79
11       90     76         80     67   73     79
12       90     76         78     67   73     78
13       90     71         78     67   73     77

Table 4 shows CSS and RS at 5 years by stage and molecular classification before and after MI. Generally, a slightly higher RS can be observed compared to CSS before and after MI. In addition, both RS and CSS are slightly lower after MI. Survival curves showed differences in breast cancer survival (p < 0.001) by age and histology (Figure 1), stage, histologic grade, and molecular classification (Figure 2). Laterality and sublocation reached significance only at the 10% level (p = 0.09 and p = 0.07, respectively). Comparing each variable by pair of categories, all age groups presented differences (p < 0.05), except between the 15-44 and 45-54 groups and between the 45-54 and 55-64 groups. Breast cancer survival diminishes markedly in people over 75 years of age.
Ductal/NST and mixed carcinoma histologies have better survival compared to other carcinoma subtypes (p < 0.05); other neoplasms (non-epithelial and non-specific) have the worst survival. There were survival differences between all stages (p < 0.05), except between IA and IB and between IIA and IIB. All categories of histologic grade presented differences (p < 0.05); the prognosis worsened as the grade of differentiation decreased. All categories of molecular classification presented differences (p < 0.05), except between luminal B and luminal with Ki67 unknown, and between HER-2 enriched and triple negative; luminal A was the category with the best survival. Slight changes could be seen in the survival curves after applying MI (Figure 2). The Wald test included age, sublocation, stage, laterality, and molecular classification in the final competing-risks regression model. Therefore, we excluded diagnostic method, histology, and histologic grade. The exclusion of histologic grade was probably due to its relationship with molecular classification (χ²(8) = 594.44, p < 0.001; Cramer's V index = 0.38, p < 0.001). Table 5 shows the results of the competing-risks model before and after MI. After MI, the model determined that patients over 65 years old had worse prognoses than the 55-64 years old group. Also, patients with triple negative cancer had a worse prognosis than those with luminal cancer with Ki67 unknown.
As stages progress, the prognosis for breast cancer worsens, especially for stage IV. In general, standard errors were lower after MI, providing more accurate estimates of the risk of dying from this cancer. Finally, sublocation and laterality were no longer significant after MI.

Discussion
The CSS of breast cancer at 1, 3, and 5 years was 96%, 91%, and 87% respectively, while the RS 1, 3, and 5 years after diagnosis was 96.9%, 92.6%, and 88.5%, respectively. The RS was slightly higher than CSS, as previously observed in other studies for breast and other cancers, such as prostate cancer, for which early diagnosis is performed [18,19].
In these cancers, RS overestimates survival because of earlier diagnosis. In the study by De Lacerda et al., for example, the difference between CSS and RS 5 years after diagnosis (period 2000-2013, N = 653,181 cases) was similar to ours.
Comparing the RS obtained in our study with those of other population-based studies, we see that the CONCORD-3 (period 2010-2014) obtained a net survival at five years of between 70% and 85%. Most European countries, including Spain, the United States, Canada, and Australia had a survival rate of 85% or more, up to a maximum of 92.8%. In Spain, the CONCORD-3 study concluded that 5-year survival from diagnosis for the period 2000-2004 was 82.9%, while for the period 2005-2009, it was 84.6%, and for the period 2010-2014, it was 85.2% [20].
On the other hand, the EUROCARE-5 study (1999-2007) obtained a European-wide RS 5 years after diagnosis of 81.8% and 82.8% in Spain [3]. In the previous period (EUROCARE-4, 1995-1999), the RS was 79.4% in Europe and 80.3% in Spain [21]. According to REDECAN, survival rates by region showed slight differences. The survival rate obtained in our study is among the highest published to date. The management of breast cancer in Spain is partly associated with the implementation of Population Screening Programs in the 1990s. In our Autonomous Community, the program for the early detection of breast cancer began in 1998, making it one of the last in the country. Despite this, the data recorded by the Carlos III Health Institute [22] indicate that mortality in our Autonomous Community is among the lowest in the country. We believe that the private sector, which predominates in our region, is compensating for the late start and low coverage of our screening program. According to Grande et al., monitoring of regional epidemiological indicators for breast cancer is crucial to evaluate the different measures taken for breast cancer control [23].
In the CONCORD and EUROCARE studies, survival was shown to be increasing over time. The obtained results, i.e., 88.5% five years after diagnosis, confirmed this trend, which was attributed to the implementation of screening programs and therapeutic improvements [3].
We have observed that the survival of breast cancer cases does not stabilize after ten years; this is consistent with recent cure fraction studies that have shown that after ten years, only 50% of breast cancers are cured and that the time to cure depends on age, being lower in middle age women, and higher in young (15-44 years old) or oldest (65-74 years old) women in some stages but not in others [24]. In our study, survival stabilization could be also related to stage, as we have seen that it occurs after ten years in some stages but not in others.
In our study, breast cancer survival was associated only with age over 65, triple negative status, and stage. The competing-risks regression model showed that neither sublocation nor laterality affected survival after MI, although other studies have found a relationship between sublocation and survival [25].
On the other hand, age is a known independent prognostic factor in numerous studies [3]. It has been previously described that the age with the best overall survival is around 50; from there, it goes down with increasing age [26]. In our study, the age range with the best survival was 55-64 years, more or less coinciding with the age range of the population-based breast cancer screening program in Mallorca until 2006. From then on, the age included in the program was progressively extended, reaching 69 in 2011. The age extension has not so far resulted in improved survival in the women included in our study. This situation could be due to a delay in achieving adequate population coverage, due to the limited resources of the screening program. The regression model has shown that being older than 64 affects survival; this is especially the case for women above 75 years of age. Some authors relate these results to suboptimal treatment in this age group due to the presence of comorbidities, possible toxicities, preferences of the patient, etc. [27]. For the purpose of understanding the impact of the screening program (age 50-69 years) on survival, the Supplementary Material provides the results grouped by age as follows: (1) up to 49 years of age; (2) from 50 to 69 years of age; (3) over 70 years of age (Tables S1 and S2, Figure S1). It can be observed that patients who are in the screening program have a better survival rate compared to the group of older women and a similar survival rate to the group of younger women.
Stage at diagnosis is the factor that shows the most evident relationship with survival. All stages differ significantly from the reference group (IA), and the prognosis worsens as the stage at diagnosis progresses. For example, diagnosis at stage II (IIA and IIB) implies approximately double the likelihood of not surviving compared to stage IA. In stages IIIA and IIIB, this likelihood is approximately five times higher, while in stage IIIC (characterized by positive supraclavicular lymph nodes) it is already almost eight times that of stage IA. The worst prognosis is at stage IV, in which patients are almost 22 times more likely not to survive than those at the reference stage. Fortunately, only 26.7% of cases are in stage III or IV. These results demonstrate that the stage at the time of diagnosis is key to patient survival. Moreover, it is a variable on which we can take action, as it is not an intrinsic characteristic of the cancer itself. Therefore, it is necessary to develop and maintain early detection programs and rapid diagnosis protocols that allow diagnoses to be made at the earliest possible stage, as this will have a clear impact on patient survival.
To our knowledge, no survival study based on population data according to stage, categorized into eight levels, has been published. However, at the European level, Nordenskjöld et al. presented survival data by stage in a population-based study of diagnoses made between 1989 and 2013 in Sweden (N = 42,220). Those authors observed 5-year survival rates of 97.8% in stage I, 87.4% in stage II with negative nodes, 89.5% in stage II with positive nodes, 64.1% in stage III and 17.1% in stage IV [28]. Nationally, diagnoses between 2000 and 2012 from the Granada Registry had RS five years after diagnosis of 96.6% for stage I, 88.2% for stage II, 62.5% for stage III, and 23.3% for stage IV [29]. In all cases, the survival rates obtained in our sample were slightly higher, especially in stage IV.
Finally, our analysis of the molecular group using the competing-risks model showed that only belonging to the triple-negative group implies worse survival than the reference group (luminal with Ki67 unknown), i.e., the risk of dying from breast cancer in this group is almost double that of the other groups. If we analyze CSS and RS by molecular group, the triple-negative and the HER-2 enriched groups had the worst survival. This survival distribution was previously known and has been analyzed in recent years in several population-based studies [30,31]. The best survival rate was observed in the luminal A group; this was probably due to the fact that this group has therapeutic targets (hormone therapy), while research is still being carried out to determine which therapeutic targets may be effective for the treatment of triple-negative breast cancer. Additionally, triple-negative breast cancer is particularly heterogeneous [31].
Regarding the other factors included in our study that may affect survival, although not retained as significant in the competing-risks model, we obtained histology and histological grade data. Concerning these factors, it is necessary to note that histology likely affects survival, because it is a characteristic of the tumor itself. Hence, the histological group with the worst survival is "other neoplasms" (non-epithelial, non-specific), which includes sarcomas and non-specific histologies, both of which have poor prognoses. On the other hand, histological grade has been shown to be a prognostic factor associated with breast cancer survival, but due to its close relationship with molecular group, we were not able to include it in the competing-risks model [32].
The percentage of cases with missing stage was 16.8%, apparently higher than that found in the high-resolution CONCORD study, where it was 8% in European registries and 11% in registries in the United States. In some studies, it is assumed that if T and N are known, M can be considered 0 [33]. In our case, where M was unknown at initial diagnosis, a thorough review of the clinical history was performed to confirm, whenever possible, the value of M. Moreover, the authors of those studies decided to exclude unstaged cases. We have shown that multiple imputation of missing stage or molecular group avoids underestimating survival in advanced stages or triple-negative cases and overestimating it in early stages or in other molecular groups. Therefore, the use of multiple imputation made it possible to use data for all of the patients in the database and obtain unbiased and more accurate estimates of breast cancer survival. Along these lines, Derks et al. identified the need to apply competing-risks models in long follow-up survival studies on breast cancer in which other causes of death are taken into account, as well as their usefulness when values are missing [34].
One limitation of our study was the relatively high percentage of missing values regarding molecular group, specifically about Ki67, because clinicians did not use this metric during the study period. However, we overcame this by creating the category luminal with Ki67 unknown to maximize the information available about ER, PR and HER-2. This strategy allowed us to reduce the missing values in molecular groups from 56.8% to 22.0%. Ki67 is an independent prognostic variable which is currently being used in clinical practice to make therapeutic decisions [35]. However, its use is controversial, because there is no single helpful cutoff point [36], and other studies have not included it in their molecular classifications [37].
Another limitation was that we did not collect information about treatment, relapses, or risk mutations, such as BRCA1-2. It is essential to note the difficulty in collecting this type of information from cancer registries due to the complexity of searching for it in patients' medical records. We initially tried to collect the treatments received by patients with a sample taken from our database, but due to the complexity involved, we decided to focus on other aspects. In light of this fact, it should be noted that information about TNM is also challenging to find, and, in Spain, the cancer register of Mallorca is one of the few that collect it.
The strength of this study is that the sample was population-based, with high-quality data, as the negligible percentage of missing cases in terms of survival information and the minimal differences observed between cancer-specific and relative survival show. In our research, clinical records of each case were reviewed by trained professionals. We followed some of the cases for up to 13 years. Moreover, we knew the cause of death, which allowed us to calculate cause-specific survival. The application of the competing-risks model instead of the Cox model used in other studies [12] made it possible to obtain more realistic estimates of the risk of dying from breast cancer, considering that there are obviously patients who die from other causes. On the other hand, it was possible to collect a pathological diagnosis in many cases, which is unusual compared to other tumors which are frequently diagnosed by imaging or other clinical tests.

Conclusions
The breast cancer CSS and RS obtained in this study are good, i.e., above the average in Spain, confirming the improving trends for this cancer. We conclude that age, stage at diagnosis, and molecular classification are significant prognostic factors. Our data indicate that, among the molecular groups, triple-negative tumors have the worst prognosis. In addition, our study showed that breast cancer survival worsens from stage III onwards, although these cases represent only one quarter of the total. Therefore, reinforcing breast cancer early detection programs and developing rapid diagnosis protocols are essential.

Informed Consent Statement:
This retrospective study is based on data collected by the Mallorca Cancer Registry (Balearic Islands, Spain), which does not have individual consent from patients, because this information is required by the national health system in relation to cancer registries in Spain.

Data Availability Statement:
Data available on request due to privacy/ethical restrictions.

Conflicts of Interest:
The authors declare no conflict of interest.