Web-Based Dynamic Nomograms for Predicting Overall Survival and Cancer-Specific Survival in Breast Cancer Patients with Lung Metastases

Background: Of patients who die from breast cancer, 60–70% have lung metastases. However, readily available tools for accurate risk stratification in patients with breast cancer lung metastases (BCLM) are lacking. We therefore developed web-based dynamic nomograms for BCLM to assess overall and cancer-specific survival quickly, accurately, and intuitively. Methods: Patients diagnosed with BCLM between 2010 and 2016 were extracted from the Surveillance, Epidemiology, and End Results (SEER) database. After excluding incomplete data, all patients were randomly assigned to training and validation cohorts (2:1). Patients’ basic clinical information, detailed pathological staging and treatment information, and sociological information were included in further analysis. Nomograms were constructed from Cox regression models and validated using the concordance index (C-index), calibration curves, time-dependent receiver operating characteristic (ROC) curves, and decision curve analysis (DCA). Web-based dynamic nomograms were published online. Results: A total of 3916 breast cancer patients with lung metastases were identified from the SEER database. In multivariate Cox regression analysis, overall survival (OS) and cancer-specific survival (CSS) were significantly associated with 13 variables: age, marital status, race, grade, T stage, surgery, chemotherapy, bone metastasis, brain metastasis, liver metastasis, estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor-2 (HER2). These variables were included in the OS and CSS nomograms. The time-dependent ROC curves, DCA, C-index, and calibration curves demonstrated the advantages of the nomograms.
Conclusions: Our web-based dynamic nomograms effectively integrate patient molecular subtype and sociodemographic characteristics with clinical characteristics and guidance, and are easy to use. ER-negative status should receive attention in the diagnosis and treatment of BCLM.


Background
Breast cancer is the most commonly diagnosed cancer and the second leading cause of cancer-related deaths among women in developed countries [1]. More than 90% of deaths caused by breast cancer are due to metastasis-related complications [2]. However, metastasis is a poorly understood process that begins when tumor cells detach from the primary tumor and enter the bloodstream [3]. These circulating tumor cells (CTCs) eventually lodge in the capillary beds of distant organs and extravasate through the blood vessel wall into the parenchyma, forming metastatic colonies at secondary sites [4]. Breast cancer preferentially metastasizes to the bone, brain, liver, and lungs, a phenomenon known as organotropism [5]. For breast cancer patients with metastases, 30–60% of patients have lesions

Study Design and Patient Selection
We used the SEER*Stat 8.3.8 program to retrieve and download patient information from the SEER 18 database. SEER is a public national cancer registry database containing cancer incidence data and detailed patient data from 18 regions of the United States, covering approximately 34.6% of the total population.
The study population included adult female breast cancer patients diagnosed with lung metastases from 2010 to 2016. We collected the patients' race, age, marital status, detailed tumor-node-metastasis (TNM) staging, primary-site radical resection, chemotherapy, ER, PR, and HER2 receptor status, and the presence of bone, brain, or liver metastasis. We excluded patients without pathological results or with incomplete information. Finally, 3916 patients were included in our study; the selection flowchart is shown in Figure 1. This type of retrospective research does not require ethics committee review.
Figure 1. Flowchart of patient selection. A total of 6060 BCLM patients were identified from the SEER database, and 2144 patients were excluded based on the exclusion criteria. The 3916 patients who met the requirements were randomly divided into training and validation cohorts at a 2:1 ratio. The training cohort was used to construct the nomograms, and the validation cohort was used to evaluate the resulting models. The nomograms were validated using calibration curves, the C-index, decision curve analysis (DCA), and time-dependent ROC curves.
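The 2:1 random assignment described above can be sketched as follows. This is an illustrative Python sketch only, not the authors' procedure (the study was carried out in R); the function name, the seed, and the use of integer patient IDs are assumptions made for the example.

```python
import random


def split_two_to_one(patient_ids, seed=2022):
    """Randomly split IDs into training and validation cohorts at about 2:1.

    The seed is a hypothetical value, fixed only to make the split repeatable.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(len(ids) * 2 / 3)  # two-thirds go to the training cohort
    return ids[:n_train], ids[n_train:]


# 3916 eligible patients, as reported in the text
train_ids, valid_ids = split_two_to_one(range(3916))
print(len(train_ids), len(valid_ids))  # 2611 1305
```

With 3916 patients, rounding two-thirds gives the 2611 and 1305 cases reported for the training and validation cohorts.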

Statistical Analysis and Dynamic Nomogram Publication
Patients were randomly divided at a 2:1 ratio into training and validation cohorts of 2611 and 1305 cases, respectively. Univariate Cox proportional hazards regression analysis was performed to screen candidate prognostic factors. Variables with p < 0.1 in univariate analysis were entered into a multivariate Cox proportional hazards regression analysis in the training cohort, and variables that remained significant (p < 0.05) were used to construct the nomograms. The nomograms predict 1-year, 3-year, and 5-year OS and CSS. The concordance index (C-index) was used to evaluate the discrimination ability and accuracy of the nomograms. Discrimination and calibration were evaluated by bootstrapping with 1000 resamples. Decision curve analysis (DCA) was used to evaluate the advantages and benefits of the new prediction model compared with individual factors. In addition, time-dependent ROC curves were used to evaluate the performance of the OS and CSS models. After validation, the web-based dynamic nomograms for BCLM OS and CSS were published online using the R package "DynNom" and shinyapps.io (https://www.shinyapps.io/) (accessed on 20 December 2022).
All statistical analyses were performed using R version 4.2.0 (http://www.r-project.org) (accessed on 20 December 2022). Statistical significance was set at p < 0.05 in two-tailed tests.
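To make the C-index concrete, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the code used in the study; the function name and the toy data are hypothetical, and pairs with tied survival times are simply skipped.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the patient with the
    shorter observed survival also has the higher predicted risk.

    events[i] is 1 if death was observed and 0 if the patient was censored.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's last follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (invented): survival in months, event indicator, nomogram total score
times = [5, 12, 20, 30]
events = [1, 1, 0, 1]
scores = [480, 350, 400, 300]
print(harrell_c_index(times, events, scores))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking; in the toy data, four of the five comparable pairs are ordered correctly.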

Patient Characteristics
Our study initially excluded 2144 cases (Figure 1): lack of autopsy or death certificate diagnosis (n = 5); incomplete or zero survival months (n = 482); lack of histological grade (n = 725); lack of ER, PR, and HER2 status (n = 358); and T0 or Tx according to the 6th edition AJCC staging (n = 574). A total of 3916 cases were finally included (2611 in the training cohort and 1305 in the validation cohort). Table 1 summarizes the patient characteristics. Among all BCLM patients, the median age was 62, 41.2% were married, and 72.3% were white. Pathologically, more than half of the patients had poorly differentiated tumors (55.8%), and 42.2% of patients had T4 disease. In terms of treatment, only 30.2% of patients underwent surgery, whereas most patients received chemotherapy (62.7%). Bone metastasis was the most common concurrent metastasis in patients with BCLM (53.5%), followed by liver metastasis (26.7%). Most patients were HER2-negative (70.7%). In addition, there were 460 triple-negative breast cancer (TNBC) patients, accounting for 11.7%. The longest follow-up of the entire cohort was 82 months, the average OS was 20.9 months, and the average CSS was similar at 21.0 months.

Establishment of a Prognostic Nomogram
Univariate and multivariate Cox regression analyses were used to calculate the weights of variables for OS and CSS (expressed as hazard ratios, HR) and to identify independent risk factors. Variables with significant differences in univariate analysis were included in the multivariate Cox regression model, in which OS and CSS were significantly related to 13 variables: age, marital status, race, grade, T stage, surgery, chemotherapy, bone metastasis, brain metastasis, liver metastasis, ER, PR, and HER2. Patient gender, insurance status, lymph node metastasis, and radiation therapy were excluded (Tables 2 and 3). All significant variables were integrated to build the OS and CSS nomograms. The nomograms of 1-year, 3-year, and 5-year OS are shown in Figure 2A, and the nomograms of 1-year, 3-year, and 5-year CSS are shown in Figure 3A. After adding up the scores associated with each variable and locating the total score on the bottom scale, the probability of OS and CSS can be estimated at 1, 3, and 5 years.
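As a reading aid, the relationship between a Cox model and a nomogram's points scale can be written out. This is the standard construction, not a formula reported in this study; the symbols $\beta_k$ (fitted coefficient), $x_k$ (value of variable $k$), $S_0(t)$ (baseline survival), $P_k$ (points), and $T$ (total points) are introduced here for illustration.

```latex
% Cox proportional hazards model and the implied survival probability
h(t \mid x) = h_0(t)\,\exp(\eta), \qquad
\eta = \sum_{k} \beta_k x_k, \qquad
S(t \mid x) = S_0(t)^{\exp(\eta)}

% Points for variable k, scaled so that the most influential variable spans 0 to 100
P_k(x_k) = 100 \cdot \frac{\beta_k x_k - \min_{x}\,\beta_k x}
                          {\max_{j}\left(\max_{x}\,\beta_j x - \min_{x}\,\beta_j x\right)}

% Total points determine the linear predictor, and hence S(t | x) at t = 1, 3, 5 years
T = \sum_{k} P_k(x_k), \qquad
\eta = \frac{T}{100}\,\max_{j}\left(\max_{x}\,\beta_j x - \min_{x}\,\beta_j x\right)
       + \sum_{k} \min_{x}\,\beta_k x
```

Because the total points and the linear predictor are linearly related, each total score maps to exactly one predicted survival probability at each time point.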

Verification of Prognostic Nomogram
The concordance index (C-index), calibration curves, time-dependent receiver operating characteristic (ROC) curves, and decision curve analysis (DCA) methods were used to verify the superiority of our nomogram.
DCA compares the net benefit of the nomogram with that of other factors to help clinicians make beneficial decisions. Figure 2D,G shows the DCA curves of the nomogram and other clinical factors for OS, and Figure 3D,G shows those for CSS. Compared with all other clinical factors, the nomogram showed a superior net benefit, suggesting good clinical applicability of the nomogram in this study.
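The net benefit plotted in a decision curve can be illustrated with a short sketch. It assumes a binary outcome (death within a fixed time horizon) with complete follow-up, which is a simplification: decision curves for survival data need censoring-aware estimates. The function names and toy data are hypothetical and are not taken from the study.

```python
def net_benefit(predicted_risk, died, threshold):
    """Net benefit of intervening in patients whose predicted risk >= threshold.

    Standard definition: TP/n - FP/n * pt / (1 - pt), with pt the threshold
    probability.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(died)
    flagged = [d for p, d in zip(predicted_risk, died) if p >= threshold]
    true_pos = sum(1 for d in flagged if d == 1)
    false_pos = sum(1 for d in flagged if d == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(died, threshold):
    """Reference strategy: intervene in every patient regardless of the model."""
    prevalence = sum(died) / len(died)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data (invented): predicted risk of death and observed outcome
risk = [0.9, 0.8, 0.6, 0.4, 0.2]
died = [1, 1, 0, 0, 0]
print(net_benefit(risk, died, 0.5), net_benefit_treat_all(died, 0.5))
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.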

Risk Stratification and Web-Based Dynamic Nomogram Publication
Total scores for patients with BCLM were obtained by adding the scores associated with each variable, and X-tile software (version 3.6.1; Yale University, New Haven, CT, USA) was used to calculate the cut-off values. The cut-off values for OS were 347 and 464, and the cut-off values for CSS were 394 and 548 (Figure 4A,B). Therefore, BCLM patients were classified as high-risk (score > 464), intermediate-risk (347 < score ≤ 464), or low-risk (score ≤ 347) for OS, and as high-risk (score > 548), intermediate-risk (394 < score ≤ 548), or low-risk (score ≤ 394) for CSS (Figure 4C,D).
Based on risk stratification, Kaplan-Meier survival curves were plotted for all BCLM patients, as shown in Figure 5. The 5-year OS rate was highest in the low-risk group at 34.5%, followed by the intermediate-risk group at 12.9%, and lowest in the high-risk group at 4.9% (Figure 5A). Likewise, the low-risk group had the highest 5-year CSS rate at 40.1%, followed by the intermediate-risk group at 14.5% and the high-risk group at 4.6% (Figure 5B). Survival outcomes differed significantly among the three groups (p < 0.001).
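The score thresholds reported above translate directly into a lookup. The sketch below is illustrative only: the cut-off values are those given in the text, while the function and constant names are hypothetical.

```python
# Cut-off values reported in the text (lower bound, upper bound)
OS_CUTOFFS = (347, 464)
CSS_CUTOFFS = (394, 548)


def risk_group(total_score, cutoffs):
    """Map a nomogram total score to a risk group.

    Scores at or below the lower cut-off are low-risk, scores above the
    upper cut-off are high-risk, and everything in between is intermediate.
    """
    low_max, mid_max = cutoffs
    if total_score <= low_max:
        return "low"
    if total_score <= mid_max:
        return "intermediate"
    return "high"


print(risk_group(400, OS_CUTOFFS))   # intermediate
print(risk_group(400, CSS_CUTOFFS))  # intermediate
print(risk_group(500, OS_CUTOFFS))   # high
```

The same total score can fall into different groups for OS and CSS because the two nomograms have different cut-offs; for example, a score of 500 is high-risk for OS but intermediate-risk for CSS.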
Finally, we published our web-based nomograms for predicting OS (Figure 5C) and CSS (Figure 5D) in BCLM patients, which can be accessed at https://nomogram-xiangyahospital.shinyapps.io/BCLMOSnomogram (accessed on 20 December 2022) and https://cssnomogram-xiangyahospital.shinyapps.io/DynNomapp (accessed on 20 December 2022) for individual survival prediction and risk analysis. Step-by-step instructions were produced to help researchers use the sites (Supplementary File).

Discussion
Risk assessment tools and surveillance methods for BCLM remain limited. We constructed a nomogram based on extensive population data from the SEER database and included breast cancer molecular subtypes, treatment modalities, and sociodemographic factors. Although Xie's study identified disease risk factors based on similar data, it did not build a complete risk assessment tool [12]. We developed a nomogram-based scoring system with clinical value and achieved good risk classification for OS and CSS in BCLM patients. We validated our model with multiple methods and found that it performs well and can provide comprehensive guidance for clinical practice. We published it online so that all readers can easily access and use it.
Our study found that surgical treatment is protective for OS and CSS in BCLM patients. Surgery was associated with prolonged survival and a 29.1% reduction in the risk of death. Several randomized clinical trials and retrospective studies have shown that primary tumor surgery can improve survival by reducing the tumor burden of metastatic breast cancer [13][14][15]. However, selection bias cannot be ignored: patients with a smaller tumor burden, younger age, or other more favorable characteristics may be preferentially selected for surgery, which may bias the final result in favor of the surgery group [16,17]. Some basic research suggests that surgical injury may upregulate genes involved in adhesion, invasion, and angiogenesis, leading to breast cancer metastasis in the lung [18]. Although the benefit of primary tumor surgery in patients with newly diagnosed breast cancer and lung metastases remains unclear, our results suggest that BCLM patients undergoing surgical treatment have a survival benefit.
Our study found significant differences in the prognosis of BCLM patients with different molecular subtypes. We separately analyzed the three crucial therapeutic targets of breast cancer, ER, PR, and HER2; positivity for each was associated with a reduction in the risk of death of 48.9%, 44.2%, and 25.8%, respectively. This suggests that more attention should be paid to evaluating ER status in BCLM patients. There is evidence of an interaction between the ER and HER2 pathways [19]. Zhao's research also found that molecular subtype strongly affects the prognosis of metastatic breast cancer [20]. The ER pathway can act as a bypass activation mechanism for downstream signals of the HER2 pathway [21]. Activation of the HER2 signaling pathway could further promote the activity of the ER pathway, leading to an impaired endocrine therapy response and possibly changing the tumor's response to HER2-targeted therapy [22]. Some neoadjuvant clinical studies of anti-HER2 therapy have shown that, compared with HER2+/ER− patients, HER2+/ER+ patients have a lower pathological complete response (pCR) rate [23]. This also suggests that lung metastasis in breast cancer patients may be closely related to ER expression at the level of molecular mechanisms.
Among sociodemographic factors, we found that marital status is an independent prognostic factor for BCLM patients. Being married was associated with a 25% reduction in the risk of death, similar to our previous research on colorectal cancer [24]. However, more than 99% of breast cancer patients are women, and hormonal cycles and stable hormone levels are closely related to the occurrence and development of breast cancer [25]. Unmarried patients often lack the care and support of a spouse and are more likely to suffer from chronic psychological distress, such as depression and helplessness, and to have unhealthy living habits, such as smoking and alcoholism [26]. Reduced psychosocial support and increased psychological stress may affect the immune and endocrine systems and accelerate tumor growth and patient death [27]. Furthermore, as industrialization progresses, marriage rates may decline further [28]. Therefore, in clinical work, psychosocial support from medical workers may help improve the survival of patients.
Our study randomly selected one-third of the patients as the validation cohort to confirm the performance of the nomogram. The favorable results in the validation cohort, including the C-index, time-dependent ROC curves, DCA, and calibration curves, support the generalizability of the new nomogram. We implemented our models as web-based dynamic nomograms, making them available to everyone online. However, our research still has some limitations. First, as this is a retrospective study, the nomogram still needs to be verified by prospective studies. Second, some information is missing from our data, such as metastatic tumor size, location, and treatment response; with further improvement of the database, these problems may be resolved. In addition, more accurate data are needed to verify the validity of the nomogram. Despite these limitations, the findings demonstrate the clinical value, sensitivity, and specificity of our constructed nomogram.

Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/jpm13010043/s1, File: Web-based dynamic nomograms for predicting overall survival and cancer-specific survival in breast cancer patients with lung metastases.
Author Contributions: Conception and design: K.W., Y.L., D.W. and Z.Z.; administrative support: Z.Z.; provision of study materials or patients: Y.L.; collection and assembly of data: K.W. and Y.L.; data analysis and interpretation: K.W., Y.L. and D.W.; writing-original draft: K.W., Y.L., D.W. and Z.Z.; writing-review and editing: all authors; final approval of the manuscript: all authors. All authors have read and agreed to the published version of the manuscript.
Funding: This project was supported by a grant to Yuqiang Li from the Natural Science Foundation of Hunan Province (No. 2022JJ40799). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: All patient files are available from the SEER database. More details supporting the findings of this study are available from the corresponding author upon request.