Validating the ACOSOG Z0011 Trial Result: A Population-Based Study Using the SEER Database.

The Z0011 trial demonstrated that axillary lymph node dissection (ALND) could be omitted in spite of 1-2 metastatic sentinel lymph nodes. This study aimed to validate the results on a population-based database. The Surveillance, Epidemiology, and End Results (SEER) database was searched for patients comparable to the Z0011 participants. The type of axillary surgery was estimated using the total number of examined axillary lymph nodes (ALNs). Breast cancer-specific mortality (BCSM) was compared between patients with ≥10 ALNs (the sentinel lymph node dissection (SLND) and ALND group, or "SLND + ALND group") and patients with one or two ALNs (the "SLND group"). During 2010-2015, the SEER database included 7077 and 6620 patients categorized in the SLND group and the SLND + ALND group, respectively. Death was observed for 515 patients (7.3%) in the SLND group and 589 patients (8.9%) in the SLND + ALND group based on a median follow-up of 41 months. After propensity-score matching, the adjusted hazard ratio for BCSM in the SLND group (vs. the SLND + ALND group) was 1.038 (95% confidence interval: 0.798-1.350). Regardless of the SLND criteria, the outcomes were not significantly different between the two groups. This retrospective cohort study of Z0011-comparable patients revealed that ALND could be omitted based on the Z0011 strategy, even among patients with ≤2 dissected ALNs.


Introduction
Axillary management of breast cancer has evolved from conventional axillary lymph node dissection (ALND) to less invasive adjuvant therapies. For example, the American College of Surgeons Oncology Group (ACOSOG) Z0011 trial demonstrated that conventional ALND could be safely omitted without increasing recurrence or cancer-related death among clinically node-negative women with T1-T2 invasive breast cancer who were receiving breast-conserving therapy (BCT) with planned whole-breast irradiation and adequate systemic therapy, even if metastases were present in one or two sentinel lymph nodes (SLNs). The long-term Z0011 follow-up data also revealed non-inferior

Characteristics of the Z0011-Comparable SEER Cohort
The 23,138 Z0011-comparable SEER patients were assigned to the SLND group (7077 patients), the "SLND plus" group (9411 patients), and the SLND + ALND group (6620 patients) (Table 1). The median age was 60 years, the median tumor size was 1.8 cm, 86.1% of the patients had estrogen receptor (ER)-positive cancers, 58.1% of the patients received adjuvant chemotherapy, and 73.8% of the patients received radiation therapy (RT). The median follow-up was 41 months, and death was observed for 1760 patients (7.6%), including 865 breast cancer-specific mortality (BCSM) cases (3.7%). There was a significant increase in the ratio of SLND to SLND + ALND patients according to the year of diagnosis during 2010-2015.

Characteristics of the SLND Group
Among the Z0011-comparable SEER cohort, the SLND group had smaller and lower-grade tumors than the SLND + ALND group, as well as a higher ER-positive rate and a lower human epidermal growth factor receptor 2 (HER2) overexpression rate. The SLND group also had a far higher number of patients with only one positive LN. Adjuvant RT was slightly more common in the SLND group (74.9% vs. 71.8%; p < 0.001), while adjuvant chemotherapy was more common in the SLND + ALND group (49.0% vs. 69.7%; p < 0.001). The "SLND plus" group had characteristics that were intermediate between the SLND and SLND + ALND groups (Table 1).
Although the characteristics of the SLND group in the Z0011-comparable SEER cohort were generally similar to those in the original Z0011 cohort, adjuvant RT and chemotherapy were administered less frequently in the SEER cohort (Table S1). Furthermore, the SEER dataset only includes incomplete information regarding adjuvant therapies, which suggests that the actual rates of adjuvant therapy would be higher than our results.
Abbreviations: axillary lymph node (ALN); axillary lymph node dissection (ALND); confidence interval (CI); estrogen receptor (ER); human epidermal growth factor receptor 2 (HER2); hazard ratio (HR); invasive ductal carcinoma (IDC); invasive lobular carcinoma (ILC); progesterone receptor (PR); sentinel lymph node dissection (SLND).

Outcomes in the Z0011-Comparable Cohort
The 865 BCSM cases (3.7%) included 219 patients (3.1%) in the SLND group and 312 patients (4.7%) in the SLND + ALND group. The remaining 334 BCSM cases (3.5%) occurred in the "SLND plus" group, which we omitted from the comparison because the axillary surgery type was unclear. The SLND group had a significantly shorter follow-up period, which might be related to the increasing proportion of SLND cases (relative to SLND + ALND cases) during 2010-2015. We did not detect a significant difference in BCSM risk when we compared the SLND group to the SLND + ALND group (unadjusted hazard ratio (HR): 0.884, 95% confidence interval (CI): 0.743-1.051) (Figure 1). Univariate differences in BCSM were observed according to T category, histological grade and type, number of metastatic ALNs, node status, ER status, progesterone receptor (PR) status, chemotherapy status, and RT status. No significant differences in BCSM were observed according to age or HER2 status.
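The crude BCSM proportions quoted in this paragraph can be reproduced from the reported counts. The following is a minimal sketch, not the authors' code; the dictionary layout and the function name are illustrative assumptions. Crude proportions ignore the difference in follow-up time between groups, which is why the comparison itself relies on hazard ratios.

```python
# (BCSM deaths, patients) per group, exactly as reported in the text.
reported = {
    "SLND": (219, 7077),
    "SLND plus": (334, 9411),
    "SLND + ALND": (312, 6620),
}


def crude_rate_percent(events: int, patients: int) -> float:
    """Crude proportion of patients with the event, as a percentage."""
    return 100.0 * events / patients


# Percentages rounded to one decimal place, as in the text.
rates = {
    group: round(crude_rate_percent(events, patients), 1)
    for group, (events, patients) in reported.items()
}

# The three groups together should account for every BCSM case.
total_bcsm = sum(events for events, _ in reported.values())
```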
Furthermore, there was no significant difference in BCSM when we compared the SLND and SLND + ALND groups (adjusted HR: 1.065, 95% CI: 0.821-1.382) using a multivariable model (Table 2). The multivariable analysis revealed that BCSM was significantly influenced by T category, histological grade, age, number of metastatic ALNs, node status, ER status, PR status, HER2 status, chemotherapy status, and RT status. The effect of ALND omission on BCSM differed by the number of metastatic ALN(s) and RT status (interaction p = 0.008 and 0.023, respectively).
Cancers 2020, 12, 950
Subgroup analyses according to histological grade, age, T category, ER status, PR status, HER2 status, molecular subtype, node status and adjuvant therapy status revealed no significant difference in BCSM between the SLND and SLND + ALND groups (Figure 2). Among patients with two metastatic ALNs, the SLND group had a higher BCSM rate than the SLND + ALND group (HR: 1.576, 95% CI: 1.090-2.279). Interestingly, both ALNs were metastatic for all SLND patients with two ALNs in total.

Figure 2. Risk of breast cancer-specific mortality in each subgroup of the Z0011-comparable patients according to the clinicopathologic risk factors. Abbreviations: axillary lymph node (ALN); axillary lymph node dissection (ALND); confidence interval (CI); estrogen receptor (ER); human epidermal growth factor receptor 2 (HER2); hazard ratio (HR); invasive ductal carcinoma (IDC); invasive lobular carcinoma (ILC); progesterone receptor (PR).

Comparing BCSM after PS Matching
We performed propensity score (PS) matching to reduce bias related to the influence of patient and tumor characteristics on the decision to omit ALND. After omitting 422 patients in the SLND group and 424 patients in the SLND + ALND group because of missing data, PS matching was performed for 6655 patients in the SLND group and 6196 patients in the SLND + ALND group. The PSs were calculated using a logistic regression model with the following independent variables: age (≤50 vs. >50 years), T category, number of positive ALN(s), micro-/macro-metastasis, histological type and grade, ER status, PR status, HER2 status, and adjuvant therapy status. In total, we identified 7194 PS-matched patients (3597 patients in the SLND group, 3597 patients in the SLND + ALND group). The standardized differences before and after the PS matching are summarized in Table S2.
Among the 7194 PS-matched patients, the adjusted HR for BCSM in the SLND group was 1.038 (95% CI: 0.798-1.350). This result agrees with the result from the multivariable analysis and supports the Z0011 strategy. Figure 3 shows our adjusted HR and the reported HR for BCSM from the Z0011 trial [2].
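A confidence interval for a hazard ratio is symmetric on the log scale, so the reported estimate can be checked against its own interval. The sketch below is illustrative only: it assumes a conventional Wald-type 95% interval (z ≈ 1.96), which the text does not state explicitly, and the function name is hypothetical.

```python
import math


def log_scale_summary(hr: float, lower: float, upper: float, z: float = 1.959964):
    """Back-calculate log(HR), its implied standard error, and the
    geometric midpoint of the interval from a reported HR and 95% CI.

    Assumes a Wald interval of the form exp(log(HR) +/- z * SE).
    """
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    midpoint = math.exp((math.log(lower) + math.log(upper)) / 2)
    return log_hr, se, midpoint


# PS-matched result reported in the text: HR 1.038 (95% CI: 0.798-1.350).
log_hr, se, midpoint = log_scale_summary(1.038, 0.798, 1.350)

# An interval that contains 1 is consistent with no detectable difference.
ci_includes_one = 0.798 <= 1.0 <= 1.350
```

Under that assumption, the geometric midpoint of the interval should land on the reported point estimate, which is a quick internal-consistency check for any reported HR.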

Figure 3. Adjusted hazard ratios of the SLND group to the SLND + ALND group, for breast cancer-specific mortality after propensity score matching. Abbreviations: axillary lymph node (ALN); axillary lymph node dissection (ALND); confidence interval (CI); hazard ratio (HR); sentinel lymph node dissection (SLND).

Sensitivity Analysis of BCSM after PS Matching
We also performed sensitivity analyses based on the possibility that patients would be unintentionally excluded from the SLND group based on our operational definition. The SLND group was redefined using one, three or four as the total number of ALNs, and the BCSM outcomes were compared between those groups and the SLND + ALND group after PS matching. None of the ALN criteria provided a significant difference in BCSM between the SLND and SLND + ALND groups ( Figure S1).

Discussion
To validate the Z0011 strategy using the SEER dataset, we operationally defined the SLND group as having ≤2 ALNs. This allowed us to select patients who underwent SLND alone, with minimal inclusion of patients who underwent conventional ALND, as SLND removes an average of two LNs in most studies [6][7][8]. However, relative to patients who truly underwent SLND alone, patients in our SLND group might be more susceptible to recurrence and/or cancer-related mortality, given the relatively high possibility of a false negative SLND result when only a small number of ALNs is examined [9]. Our operational definition is also useful because patients in the SLND group are unique in that their SLND pathological results have little significance in determining whether to proceed with conventional ALND under the Z0011 strategy, as the number of positive SLNs could not exceed two. Thus, the applicability of the Z0011 results within that group is a clinically significant issue that needed to be validated separately.
Another retrospective study has evaluated an Asian Z0011-eligible cohort, and confirmed that ALND omission did not increase the risk of recurrence, even among patients with ≤2 SLNs [10]. However, that study did not evaluate cancer-related deaths because of limited information regarding mortality. In contrast, the present study involved a median follow-up of 41 months and identified 865 BCSM cases (3.7%) and 1760 cases (7.6%) of all-cause mortality. The present study also involved a large number of subjects, which allowed for a useful survival analysis despite the short follow-up period. Finally, the present study's mortality rate was similar to the Z0011 results, based on 5-year overall survival rates of 92.5% in the SLND-alone group and 91.8% in the ALND group [11].
Conventional ALND is thought to require ≥10 LNs to provide an adequate axillary assessment during breast cancer staging, and we used that cut-off to define our SLND + ALND group. These patients had at most one additional metastatic node identified via further ALND after the one or two SLN metastases were detected, which indicates that they are a unique subgroup with less metastatic burden than patients with ≥2 additional nodal metastases going through the same process, and that they would experience more favorable outcomes. However, the SLND patients might have poorer outcomes than patients with ≥3 SLNs, based on their higher possibility of false negative results. Thus, the lack of a significant difference in BCSM between the SLND and SLND + ALND groups supports the Z0011 strategy, and demonstrates that a small number of SLNs (e.g., ≤2 SLNs) had a minimal effect on BCSM when using the Z0011 strategy.
Our operational definition assumed that SLND was performed for all patients who had one or two ALN metastases in the final pathological report. Thus, the SLND + ALND group might have included some patients with upfront ALND but without SLND, based on the results of image-guided cytology. However, these patients all had small tumors and a low nodal metastatic burden, which suggests that their outcomes would be similar regardless of whether the metastatic LNs were detected via SLND or image-guided cytology. In addition, the initial method of detecting metastatic LNs would have little influence on patient outcomes if their nodal status is fixed based on the final pathological examination. That assumption is parallel to the previous attempts to expand the Z0011 strategy based on preoperative imaging and/or image-guided cytology [12][13][14][15][16].
The present study's retrospective design might have introduced biases that disguised poorer outcomes in the SLND group, relative to the SLND + ALND group, especially as we detected differences between the two groups that suggested that selection bias affected the decision to omit ALND (Table 1). Thus, we performed PS matching and re-assessed the outcomes, which failed to reveal a significant difference in BCSM between the SLND and SLND + ALND groups. We believe that this approach is useful for overcoming the limitations of a retrospective validation study, especially because a prospective validation of the Z0011 trial results would not be feasible now that long-term follow-up data and several validation studies are available.
Interestingly, among patients with two ALN metastases, the SLND group had a significantly higher BCSM rate, relative to the SLND + ALND group (Figure 2), although we suspect that this difference might have been intensified by our operational grouping. In this context, SLND patients with two ALN metastases might have a higher possibility of additional metastatic nodes that would be identified via further ALND, relative to patients with metastasis-free SLNs, which could be related to the ordinal position of the first positive LN among all removed SLNs. Yi et al. [17] have demonstrated that the first metastatic SLN was found to be the "hottest" SLN during SLND in 69% of their cases, and that the likelihood of metastatic disease decreased with each successive SLN that was evaluated. Thus, given that the SLN is the first in a regional lymphatic basin to accept drainage from the primary tumor [18][19][20], later SLNs would be less likely to be informative. Therefore, it might be prudent to cautiously apply the Z0011 strategy for patients with metastases in 100% of a small number of SLNs.
This study has several limitations. First, the median follow-up was relatively short, given the outcomes of early-stage and ER-positive tumors, which highlights the importance of prolonged follow-up with an analysis of late events. Second, the SEER database does not include detailed information regarding endocrine therapy or specific RT fields, such as high-tangential or nodal irradiation, which precludes related analyses. However, evaluating patients from the post-Z0011 era presumably means that most patients received adequate endocrine therapy, and the roles of specific RT fields in the Z0011 results remain unclear. Furthermore, we failed to detect significant differences in the adjusted HRs for BCSM from the multivariable and PS-matched analyses (Table 2 and Figure 3), which suggests that any potential bias is unlikely to change our findings. Third, possible caveats associated with having two very different patient populations, as shown in Table 1, should still be considered, although we calibrated those parameters with statistical methods.
This study's major strength is the use of the largest Z0011-comparable cohort from a population-based database, although the potential influence of selection bias should not be overlooked. Furthermore, to overcome the lack of information regarding axillary surgery type, we used an operational definition based on clinical experience and logical deduction. Moreover, our results might be meaningful in evaluating the Z0011 strategy in a contemporary cohort, although additional information is needed regarding late outcomes.

SEER Database and Cases
The Surveillance, Epidemiology, and End Results (SEER) database [21] is maintained by the US National Cancer Institute and covers 18 population-based registries from 1973 to 2016 (approximately 30% of American patients). We retrospectively identified Z0011-comparable patients during 2010-2015 using SEER*Stat 8.3.6 software. The retrospective search identified 23,138 women with T1-T2 invasive breast cancer, primary BCT, and one or two metastatic axillary lymph nodes (ALNs) (Figure 4). The SEER dataset did not specify the type of axillary surgery, which we operationally defined as SLND and/or ALND based on a set of three assumptions (Figure 5): (1) all 23,138 patients underwent SLND and/or ALND, (2) most patients with one or two examined ALNs underwent SLND alone, and (3) most patients with ≥10 ALNs underwent conventional ALND based on their SLND results. Based on those assumptions, we assigned patients with one or two examined ALNs to the SLND alone group (the "SLND group"), and patients with ≥10 examined ALNs to the "SLND + ALND group". Patients with three to nine ALNs were assigned to an "SLND plus group" because we could not determine whether they underwent SLND alone or SLND + ALND.
The primary outcome was defined as breast cancer-specific mortality (BCSM), based on a "breast"-related cause in the SEER dataset. Deaths from other causes were censored at the time of death.
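The operational grouping and the outcome coding described above amount to two small rules. The sketch below restates them under stated assumptions: the function names, the parameter names, and the literal cause label "breast" are hypothetical and do not correspond to actual SEER field names or codes.

```python
from typing import Optional


def assign_axillary_group(n_examined: int, slnd_max: int = 2, alnd_min: int = 10) -> str:
    """Assign the operational surgery group from the number of examined ALNs.

    Defaults follow the main analysis: 1-2 nodes -> SLND,
    >=10 nodes -> SLND + ALND, 3-9 nodes -> SLND plus (type unclear).
    """
    if n_examined < 1:
        raise ValueError("at least one examined ALN is required")
    if n_examined <= slnd_max:
        return "SLND"
    if n_examined >= alnd_min:
        return "SLND + ALND"
    return "SLND plus"


def bcsm_event(dead: bool, cause: Optional[str]) -> int:
    """Return 1 for a breast cancer death, 0 for a censored observation.

    Patients who are alive, or who died from another cause, are censored
    (for the latter, at the time of death).
    """
    return int(dead and cause == "breast")
```

Keeping the cut-offs as parameters makes it straightforward to rerun the same grouping with alternative SLND definitions.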

Sensitivity Analysis
The total number of lymph nodes acquired after SLND could be only one or ≥3 in the real world. Thus, the outcomes of "SLND alone" patients might be different from the outcomes in our "SLND group". To address that potential discrepancy, we performed a sensitivity analysis with other criteria for defining the SLND group based on the number of total ALNs (one, three and four ALNs). The corresponding risk estimates were calculated in the same way as the main analysis.

Propensity Score Matching
The effects of selection bias were minimized by matching propensity scores (PSs), which were calculated using a logistic regression model with the SLND group as the dependent variable and other variables that were selected based on their univariate associations with the SLND group. The logistic regression model for PS calculation included the following independent variables: age (≤50 vs. >50 years), T category, number of positive ALN(s), micro-/macro-metastasis, histologic type and grade, ER status, PR status, HER2 status, and adjuvant therapy status. Patients from the SLND and SLND + ALND groups were paired 1:1 using nearest-neighbor matching with a caliper width less than 0.25 standard deviations. Standardized differences were estimated before and after the matching to evaluate the covariates' balance, with absolute values of <0.1 considered indicative of well-balanced groups [22,23]. These analyses were performed with R software version 3.5.2.
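The matching and balance steps described above can be sketched as follows. This is a simplified illustration, not the authors' R procedure: it assumes the PSs have already been estimated, matches greedily without replacement on the raw PS scale, and takes the caliper as 0.25 standard deviations of the pooled PSs. None of those details are specified in the text, and the function names are hypothetical.

```python
import statistics


def match_nearest_neighbor(ps_treated, ps_control, caliper_sd=0.25):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns (treated_index, control_index) pairs whose PS distance is
    smaller than caliper_sd times the SD of all propensity scores.
    """
    caliper = caliper_sd * statistics.pstdev(list(ps_treated) + list(ps_control))
    available = set(range(len(ps_control)))
    pairs = []
    for i, p in enumerate(ps_treated):
        if not available:
            break
        # Closest remaining control; ties are broken by the lower index.
        j = min(available, key=lambda k: (abs(ps_control[k] - p), k))
        if abs(ps_control[j] - p) < caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


def standardized_difference(p_a: float, p_b: float) -> float:
    """Standardized difference for a binary covariate with prevalences p_a, p_b."""
    pooled_var = (p_a * (1 - p_a) + p_b * (1 - p_b)) / 2
    return (p_a - p_b) / pooled_var ** 0.5


# Toy propensity scores, purely illustrative (not study data).
pairs = match_nearest_neighbor([0.30, 0.62, 0.90], [0.28, 0.33, 0.60, 0.10])

# Unmatched chemotherapy rates from the text: 49.0% (SLND) vs. 69.7% (SLND + ALND).
d_chemo = standardized_difference(0.490, 0.697)
imbalanced_before_matching = abs(d_chemo) >= 0.1
```

In the toy example the third treated unit has no control within the caliper and is left unmatched, which is how a caliper trades sample size for closer pairs.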

Statistical Analyses
The two groups' characteristics were compared using the chi-squared test and two-sample t-tests. Survival curves were compared using the Kaplan-Meier method and log-rank test. Cox's proportional hazard regression models were used to calculate hazard ratios (HRs) and 95% confidence intervals (CIs) for the associations of BCSM with the prognostic variables and treatments. All tests were two-sided and p-values of ≤0.05 were considered statistically significant. The analyses were performed using IBM SPSS software (version 20.0; IBM Corp., Armonk, NY, USA) and SAS software (version 9.3; SAS Institute, Cary, NC, USA).
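For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate can be written in a few lines. The sketch below is a generic illustration rather than the SPSS or SAS routine used in the study; the function name and the toy follow-up times are hypothetical. Events are coded 1 for a BCSM death and 0 for censoring, including death from another cause.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    times:  follow-up time per patient (e.g., months)
    events: 1 if the patient died of breast cancer at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        n_at_risk -= removed
    return curve


# Toy data, purely illustrative (not study data).
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 0, 1, 0, 1])
```

Censored patients contribute to the risk set up to their last follow-up time and then drop out without changing the survival estimate, which is the same treatment applied to non-breast-cancer deaths in the primary outcome definition.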

Conclusions
This retrospective study of Z0011-comparable patients from the SEER database revealed that ALND could be omitted without increasing the risk of BCSM, based on a median follow-up of 41 months. Furthermore, our results suggest that the small number of SLNs had a minimal effect on the risk of BCSM based on the Z0011 strategy.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/12/4/950/s1. Figure S1: Adjusted hazard ratios of the SLND group to the SLND + ALND group, for breast cancer-specific mortality after propensity score matching, Table S1: Comparison between the Z0011-comparable cohort in this study and the ACOSOG Z0011 study cohort, Table S2: Standardized differences before and after propensity score matching for the Z0011-comparable cohort.