A Novel Prognostic Prediction Model Based on Pyroptosis-Related Clusters for Breast Cancer

Breast cancer (BC) is the most common cancer affecting women and the leading cause of cancer-related deaths worldwide. Compelling evidence indicates that pyroptosis is inextricably involved in the development of cancer and may activate tumor-specific immunity and/or enhance the effectiveness of existing therapies. We constructed a novel prognostic prediction model for BC, based on pyroptosis-related clusters, according to RNA-seq and clinical data downloaded from TCGA. The proportions of tumor-infiltrating immune cells differed significantly in the two pyroptosis clusters, which were determined according to 38 pyroptosis-related genes, and the immune-related pathways were activated according to GO and KEGG enrichment analysis. A 56-gene signature, constructed using univariate and multivariate Cox regression, was significantly associated with progression-free interval (PFI), disease-specific survival (DSS), and overall survival (OS) of patients with BC. Cox analysis revealed that the signature was significantly associated with the PFI and DSS of patients with BC. The signature could efficiently distinguish high- and low-risk patients and exhibited high sensitivity and specificity when predicting the prognosis of patients using KM and ROC analysis. Combined with clinical risk, patients in both the gene and clinical low-risk subgroup who received adjuvant chemotherapy had a significantly lower incidence of the clinical event than those who did not. This study presents a novel 56-gene prognostic signature significantly associated with PFI, DSS, and OS in patients with BC, which, combined with the TNM stage, might be a potential therapeutic strategy for individualized clinical decision-making.


Introduction
Female breast cancer (BC) has become the most commonly diagnosed cancer, with an estimated 2.3 million new cases, and is the fifth leading cause of cancer mortality worldwide, with 685,000 deaths occurring in 2020 [1]. Independently of histological subtypes, molecular subtypes were categorized as Luminal-like (including Luminal A and Luminal B), HER2positive, and triple-negative breast cancer (TNBC). These were based on tumor hormonal status, human epidermal growth factor receptor 2 (HER2) status, and proliferation marker protein Ki-67 (MKI67) status [2,3]. Based on TNM staging, breast cancer was categorized into five different stages (0, I II, III, IV) [4]. In the course of clinical personalized treatment, patients are further subdivided into advanced-stage (stages III and IV) and early-stage (stages 0, I and II) BC [5,6]. Molecular subtype and clinical staging could assist in deciding on therapeutic options for patients with BC [7], for instance, endocrine therapy for Luminallike BC [8,9], HER2-targeted therapy for HER2-positive BC [10,11], and chemotherapy for TNBC [12,13]. Moreover, some authors showed that the ADH/ALDH activities are lower in tumor cells than in normal parenchyma, suggesting that isoenzymes of ADH may

Identification of Pyroptosis-Related Clusters
Strawberry Perl software was used to merge RNA-seq files, and the Normalize Quantiles (edgeR, R package) was used to normalize RNA expression levels. A total of 51 pyroptosis-related genes were obtained from the Reactome website (https://reactome.org/ (accessed on 31 October 2021)), and 38 differential pyroptosis-related genes were determined using the R package "limma" (FDR ≤ 0.05). According to the 38 differential pyroptosis-related genes, pyroptosis-related clusters were built by the R package "Con-sensusClusterPlus (version 1.50.0, Wilkerson MD, Hayes DN. Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA)".

Bioinformatic Analysis of the Two Pyroptosis-Related Clusters
A total of 434 DEGs associated with the two pyroptosis-related clusters were identified. GO functional enrichment and KEGG pathway analysis were performed to discover biological processes overrepresented in the 434 gene list by R package "clusterProfiler", "org.Hs.eg.db", "enrichplot", "ggplot2", and "graphlayouts". To identify potential interactions between DEGs, a protein-protein interaction (PPI) network was analyzed using the String database (http://string-db.org/ (accessed on 12 November 2021)).

Risk Prognostic Model Construction and Evaluation
The dataset (N = 1025) was used to develop a clinical prognostic prediction model, and half of the dataset (N = 512) was randomly selected as a validation dataset, which was used to verify the accuracy of the RNA-based prognostic model as a predictor of PFI of patients with BC. Univariate and multivariate Cox proportional hazard models were constructed as risk prognostic models, and area under the curve (AUC) analysis was conducted to evaluate their accuracy.

Nomogram Construction
The independent predictors, evaluated by univariate and multivariate Cox regression, were included to establish a nomogram model by the R package "rms", "foreign", and "survival", aiming to evaluate the predictive power of independent predictors for 5-year, and 10-year PFI and DSS rates. Subsequently, a calibration plot was established to evaluate the accuracy of the nomogram prediction with the R package "rms" and "survival".

Statistical Analyses
Statistical analyses were conducted using SPSS statistical suite version 25.0 (IBM SPSS Statistics, Chicago, USA), Strawberry Perl version 5.32.1.1 (https://strawberryperl.com/ release-notes/5.32.1.1-32bit.html (accessed on 12 November 2021)), and R version 3.6.1 (R Project for Statistical Computing, https://www.r-project.org (accessed on 12 November 2021)). Statistical significance was defined as a two-sided p-value or adjusted p-value ≤ 0.05. The primary outcome in this study was PFI, the secondary outcome was DSS, and the event times of PFI, DSS, and OS were defined according to the guidelines for time-to-event endpoint definitions in the BC trial [27]. The PFI, DSS, and OS event times for the individual patients enrolled in this retrospective study were manually retrieved from TCGA clinical records and a previous study [28,29]. The Wilcoxon test (FDR ≤ 0.05) was used to identify the DEGs for variables of two groups using the R package "limma". The DEGs with prognostic value, identified using univariate Cox proportional hazard models (p-value ≤ 0.05), were further analyzed by multivariate Cox regression (default settings: backward, conditional, entry 0.05, removal 0.10), and 56 pyroptosis-related DEGs identified by this analysis were used to construct a formula for the calculation of prognostic risk scores. The results of univariate and multivariate Cox regression are presented as a hazard ratio (HR), with a 95% confidence interval (CI). The cut-off point was calculated by the R package "ggrisk". Next, BC cases were categorized as "low risk" or "high risk" based on risk scores being higher or lower than the cut-off point. Chi-square analysis was used to assess the correlation between BC clinicopathological features and risk subgroups. Kaplan-Meier (KM) analyses were applied to generate survival curves and the log-rank test was used to establish the significance of differences between curves. The receiver operating characteristic (ROC) curve was constructed to assess the prognostic performance of the risk score. A prognostic nomogram to predict individual survival based on the signature and clinical risk factors was constructed by Cox regression. The accuracy of the risk prognostic model was tested using AUC (95% CI) values.

Identification of Pyroptosis-Related Clusters
A total of 38 differential pyroptosis-related genes were determined between normal and tumor samples (normal, N = 113 and tumor, N = 1110) from 51 pyroptosis-related genes, and the expression levels of the 38 genes were shown using the R package "pheatmap" ( Figure 1A). Subsequently, based on the 38 differential pyroptosis-related genes, two pyroptosis-related clusters (cluster C1, N = 439; cluster C2, N = 586) were built using the R package "ConsensusClusterPlus" (Figure 1B-D), and principal components analysis (PCA) and t-Distributed Stochastic Neighbor Embedding analysis (tSNE) plots were constructed to show the two clusters with obvious alterations using the R package "Rtsne" (Figure 1E,F). KM analysis indicated that patients in the low-risk group had significantly longer PFI, DSS, and OS (PFI, p = 0.001, DSS, p = 7 × 10 −4 , OS, p < 0.0001) ( Figure 1G-I).

Identification of DEGs and Bioinformatic Analysis of the Two Pyroptosis-Related Clusters
Differential gene expression analysis was run on the two pyroptosis-related clusters using the Wilcoxon test. A total of 434 DEGs were identified and are shown in the pheatmap ( Figure 2A). GO functional enrichment ( Figure 2B) and KEGG pathway ( Figure  2C) analysis demonstrated that the DEGs were enriched in immune-related pathways, such as humoral immune response, adaptive immune response based on the somatic recombination of immune, B-cell-mediated immunity, complement activation, and immunoglobulin-mediated immune response. To further estimate the level of individual tumor-infiltrating immune cells in the two pyroptosis-related clusters, we performed CIBERSORT, and the landscape of tumor-infiltrating immune cells is shown in the barplot ( Figure 2D). Some types of tumor-infiltrating immune cells were variously distributed in different pyroptosis-related clusters, such as naïve B cells, plasma cells, T cells CD8, T cells CD4 memory resting, T cells CD4 memory-activated, T cells follicular helper, T cells gamma delta, NK cells resting, macrophages M0, macrophages M1, macrophages M2, dendritic cells resting, mast cells resting, mast cells activated, and neutrophils ( Figure 2E). Subsequently, the PPI was constructed to visualize the interactions among 434 DEGs ( Figure 2F).

Identification of DEGs and Bioinformatic Analysis of the Two Pyroptosis-Related Clusters
Differential gene expression analysis was run on the two pyroptosis-related clusters using the Wilcoxon test. A total of 434 DEGs were identified and are shown in the pheatmap ( Figure 2A). GO functional enrichment ( Figure 2B) and KEGG pathway ( Figure 2C) analysis demonstrated that the DEGs were enriched in immune-related pathways, such as humoral immune response, adaptive immune response based on the somatic recombination of immune, B-cell-mediated immunity, complement activation, and immunoglobulin-mediated immune response. To further estimate the level of individual tumor-infiltrating immune cells in the two pyroptosis-related clusters, we performed CIBERSORT, and the landscape of tumor-infiltrating immune cells is shown in the barplot ( Figure 2D). Some types of tumor-infiltrating immune cells were variously distributed in different pyroptosis-related clusters, such as naïve B cells, plasma cells, T cells CD8, T cells CD4 memory resting, T cells CD4 memory-activated, T cells follicular helper, T cells gamma delta, NK cells resting, macrophages M0, macrophages M1, macrophages M2, dendritic cells resting, mast cells resting, mast cells activated, and neutrophils ( Figure 2E). Subsequently, the PPI was constructed to visualize the interactions among 434 DEGs ( Figure 2F).

Construction of Risk Model Based on Pyroptosis-Related DEGs
The univariate Cox proportional hazard analysis identified 256 DEGs with prognostic value, 56 of which were determined by multivariate Cox regression to be the optimum prognostic models for predicting PFI risk in patients with BC ( Figure 3A). Risk scores were calculated using the formula construct according to multivariate Cox regression (Risk calculation formula). Based on the risk score, −0.956728925, which was calculated as the cut-off point, the patients were grouped into high-(N = 308) and low-(N = 717) risk groups. Patients with high-risk scores tended to present poorer clinical outcomes compared with patients with low-risk scores ( Figure 3B). The expression levels of the 56 genes are shown in a violin plots ( Figure 3C).
KM analysis indicated that patients in the low-risk group had significantly longer PFI, DSS, and OS in the training and validation datasets (all p < 0.0001) ( Figure 3D

Construction of Risk Model Based on Pyroptosis-Related DEGs
The univariate Cox proportional hazard analysis identified 256 DEGs with prognostic value, 56 of which were determined by multivariate Cox regression to be the optimum prognostic models for predicting PFI risk in patients with BC ( Figure 3A). Risk scores were calculated using the formula construct according to multivariate Cox regression (Supplementary Material: Risk calculation formula). Based on the risk score, −0.956728925, which was calculated as the cut-off point, the patients were grouped into high-(N = 308) and low-(N = 717)risk groups. Patients with high-risk scores tended to present poorer clinical outcomes compared with patients with low-risk scores ( Figure 3B). The expression levels of the 56 genes are shown in a violin plots ( Figure 3C).
KM analysis indicated that patients in the low-risk group had significantly longer PFI, DSS, and OS in the training and validation datasets (all p < 0.0001) ( Figure 3D

Clinicopathological Features
A total of 1025 female cases with BC recorded in TCGA were extracted for analysis in this study. The median patient age was 58 years (ranging from 26 to 90 years), while the median PFI was 767 days, and DSS was 825 days. The 10-year PFI rate for all analyzed cases was 87.6%, and 10-year DSS was 92.9%. BC tumor size, lymph node, and metastasis status (TNM) stage were defined as outlined by the Eighth Edition American Joint Committee on Cancer (AJCC) Staging Manual [4], and molecular subtype (PAM50) was derived from a previous report by Thorsson et al. [28]. In the age subgroup, the proportion of ≥61 y subgroup patients in the high-risk group was significantly higher than that in the ≤40 y and 41-60 y subgroup in the training dataset (χ 2 = 6.492, p = 0.040), but not in the validation dataset (χ 2 = 5.661, p = 0.059). In the molecular subgroup, the proportion of luminal A subgroup patients in the high-risk group was significantly lower than that in the other subgroup in the training dataset (χ 2 = 10.957, p = 0.027), but not in the validation dataset (χ 2 = 6.174, p = 0.187). Further, metastasis status was associated with a higher proportion of patients in the high-risk group for both the total dataset (χ 2 = 11.582, p = 0.001) and validation dataset (χ 2 = 7.243, p = 0.011). The demographic and clinical, pathologic characteristics of the patients with breast cancer are shown in Table 1.

Clinicopathological Features
A total of 1025 female cases with BC recorded in TCGA were extracted for analysis in this study. The median patient age was 58 years (ranging from 26 to 90 years), while the median PFI was 767 days, and DSS was 825 days. The 10-year PFI rate for all analyzed cases was 87.6%, and 10-year DSS was 92.9%. BC tumor size, lymph node, and metastasis status (TNM) stage were defined as outlined by the Eighth Edition American Joint Committee on Cancer (AJCC) Staging Manual [4], and molecular subtype (PAM50) was derived from a previous report by Thorsson et al. [28]. In the age subgroup, the proportion of ≥61 y subgroup patients in the high-risk group was significantly higher than that in the ≤40 y and 41-60 y subgroup in the training dataset (χ 2 = 6.492, p = 0.040), but not in the validation dataset (χ 2 = 5.661, p = 0.059). In the molecular subgroup, the proportion of luminal A subgroup patients in the high-risk group was significantly lower than that in the other subgroup in the training dataset (χ 2 = 10.957, p = 0.027), but not in the validation dataset (χ 2 = 6.174, p = 0.187). Further, metastasis status was associated with a higher proportion of patients in the high-risk group for both the total dataset (χ 2 = 11.582, p = 0.001) and validation dataset (χ 2 = 7.243, p = 0.011). The demographic and clinical, pathologic characteristics of the patients with breast cancer are shown in Table 1.

56-Gene Signature Associated with Prognosis of Patients with BC
Univariate and multivariate Cox proportional hazard regression analyses for 10-year PFI indicated that a higher 56-gene risk score was correlated with a higher incidence of clinical events (univariate analysis, HR = 6.257, 95% CI: 4.331-9.039, p < 0.001; multivariate analysis, HR = 5.643, 95% CI 3.894-8.175, p < 0.001). Furthermore, univariate and multivariate Cox proportional hazard regression analyses for 10-year DSS also indicated that a higher 56-gene risk score was correlated with a higher incidence of clinical events (univariate analysis, HR = 5.520, 95% CI: 3.407-8.944, p < 0.001; multivariate analysis, HR = 4.578, 95% CI 2.797-7.494, p < 0.001). The results of univariate and multivariate Cox proportional hazard regression analyses for 10-year PFI and DSS are shown in Table 2.
Furthermore, KM survival curves for 10-year PFI, DSS, and OS showed that the highrisk group had a worse prognosis in both the training (all, p < 0.0001) and validation (all, p < 0.0001) datasets ( Figure 3D

Evaluation of the Predictive Power of the Prognostic Signature
According to the AJCC cancer staging manual (eighth edition), the TNM stage is correlated with cancer prognosis [4,16,30]. Furthermore, age and intrinsic molecular subtype (PAM50) are closely linked to prognosis in patients with BC [31][32][33][34]. Furthermore, to validate the potential of the prognostic signature as a predictor of the PFI, DSS, and OS of patients with BC, the entire TCGA BC dataset was stratified by TNM stage, age, and molecular subtype. Patients were split into three age subgroups (≤40, 41-60, and ≥61 years old), three lymph node status subgroups (N0, N1, and N2-N3), three tumor size subgroups (T1, T2, and T3-T4), and five molecular subtype subgroups (PAM50, luminal A, luminal B, HER2, basal-like, and normal-like).
KM analysis indicated that patients in the low-risk group had significantly longer PFI, DSS, and OS in all three age subgroups (PFI, all subgroup, p < 0.0001; DSS, ≤40 y subgroup, p = 0.0082, 41-61 y subgroup, p < 0.0001, ≥61 y subgroup, p < 0.0001; OS, ≤40 y subgroup, p = 0.0038, 41-61 y subgroup, p < 0.0001, ≥61 y subgroup, p = 0.00057) (Figure 4A-C,G-I; Figure S1A-C). ROC curve analysis showed that the prognostic signature had good sensitivity and specificity for predicting PFI, DSS, and OS in all three age subgroups (PFI,   In the analyses of tumor size subgroups, KM curves also showed that patients in the low-risk group had a significantly better prognosis for PFI, DSS, and OS than those in the high-risk group (PFI, T1 subgroup, p < 0.0001, T2 subgroup, p < 0.0001, T3-T4 subgroup, p < 0.0001; DSS, T1 subgroup, p = 0.0026, T2 subgroup, p < 0.0001, T3-T4 subgroup, p < 0.0001; OS, T1 subgroup, p = 0.0095, T2 subgroup, p < 0.00012, T3-T4 subgroup, p < 0.0012) (Figure 5A  In the analyses of tumor size subgroups, KM curves also showed that patients in the low-risk group had a significantly better prognosis for PFI, DSS, and OS than those in the high-risk group (PFI, T1 subgroup, p < 0.0001, T2 subgroup, p < 0.0001, T3-T4 subgroup, p < 0.0001; DSS, T1 subgroup, p = 0.0026, T2 subgroup, p < 0.0001, T3-T4 subgroup, p < 0.0001; OS, T1 subgroup, p = 0.0095, T2 subgroup, p < 0.00012, T3-T4 subgroup, p < 0.0012) (Figure 5A  In KM analyses, the curves showed that patients in the low-risk group had a significantly better prognosis for PFI, DSS, and OS than those in the high-risk group for all the lymph node subgroups (PFI, all the lymph node subgroup, p < 0.0001; DSS, N0 subgroup, p < 0.0001, N1 subgroup, p = 0.0027, N2-N3 subgroup, P < 0.0001; OS, N0 subgroup, P < 0.0001, N1 subgroup, P = 0.024, N2-N3 subgroup, p = 0.0022) ( Figure 6A In the analyses of metastasis status subgroups, KM curves also showed that patients in the low-risk group had a significantly better prognosis for PFI than those in the highrisk group (M0 subgroup, p < 0.0001, M1 subgroup, p = 0.012) ( Figure 6D  In KM analyses, the curves showed that patients in the low-risk group had a significantly better prognosis for PFI, DSS, and OS than those in the high-risk group for all the lymph node subgroups (PFI, all the lymph node subgroup, p < 0.0001; DSS, N0 subgroup, p < 0.0001, N1 subgroup, p = 0.0027, N2-N3 subgroup, p < 0.0001; OS, N0 subgroup, p < 0.0001, N1 subgroup, p = 0.024, N2-N3 subgroup, p = 0.0022) ( Figure 6A  In the analyses of the five molecular subtype subgroups, KM curves also showed that patients in the low-risk group had significantly better prognosis for PFI and DSS than those in the high-risk group (PFI, normal-like subgroup, p < 0.0001, Luminal A subgroup,  In the analyses of metastasis status subgroups, KM curves also showed that patients in the low-risk group had a significantly better prognosis for PFI than those in the high-risk group (M0 subgroup, p < 0.0001, M1 subgroup, p = 0.012) ( Figure 6D In the analyses of the five molecular subtype subgroups, KM curves also showed that patients in the low-risk group had significantly better prognosis for PFI and DSS than those in the high-risk group (PFI, normal-like subgroup, p < 0.0001,  Table 3. Overall, these analyses indicate that the prognostic signature has a good predictive value.  Table 3. Overall, these analyses indicate that the prognostic signature has a good predictive value.

Nomogram Development
To apply the prognostic signature in clinical settings, based on the results of univariate and multivariate Cox proportional hazard regression analyses, nomograms were constructed to predict the PFI and DSS of BC patients at 5 and 10 years. Each risk factor corresponds to a designated point, determined by drawing a line perpendicular to the point's axis. The sum of the corresponding risk factor points located on the total points represents the probability of 5-and 10-year PFI or DSS, directly leading, straight down, to the 5-and 10-year PFI or DSS axis ( Figure 8A,B). The calibration curves demonstrated that the signature possesses high consistencies in nomogram-predicted and actual results when predicting the 5-and 10-year PFI ( Figure 8C,D) or DSS ( Figure 8E,F) rate of BC patients. Our data suggested that the nomograms for PFI and DSS exhibited a good predictive efficacy in 5-and 10-year PFI and DSS probabilities.
To apply the prognostic signature in clinical settings, based on the results of univariate and multivariate Cox proportional hazard regression analyses, nomograms were constructed to predict the PFI and DSS of BC patients at 5 and 10 years. Each risk factor corresponds to a designated point, determined by drawing a line perpendicular to the point's axis. The sum of the corresponding risk factor points located on the total points represents the probability of 5-and 10-year PFI or DSS, directly leading, straight down, to the 5-and 10-year PFI or DSS axis ( Figure 8A,B). The calibration curves demonstrated that the signature possesses high consistencies in nomogram-predicted and actual results when predicting the 5-and 10-year PFI ( Figure 8C,D) or DSS ( Figure 8E,F) rate of BC patients. Our data suggested that the nomograms for PFI and DSS exhibited a good predictive efficacy in 5-and 10-year PFI and DSS probabilities.

Relevance of the Prognostic Signature in Clinical Decision-Making
Patients were stratified into two groups for the evaluation of the AJCC stage by combining AJCC stages I and II (N = 772) into the low-clinical-risk group (marked as C-), and AJCC stage III and IV (N = 253) into the high-clinical-risk group (marked as C+) for statistical analysis. Combining the clinical-risk group and gene-risk group (the low-risk group was marked as G-, and the high-risk group was marked as G+), the total patients were classified into the following four subgroups, G-C-(N = 557), G-C+ (N = 160), G+C-(N = 215), G+C+ (N = 93). As expected, KM curves showed that patients in the G-C-subgroup had a significantly better prognosis for PFI, DSS, and OS than those in the other subgroups, and the worst was the G+C+ subgroup (PFI, p < 0.0001; DSS, p < 0.0001; OS, p < 0.0001) ( Figure 9A-C).
To further evaluate the prognostic signature's potential as a predictor of response to chemotherapy, KM analysis was performed in the four subgroups. In the G-C-subgroup, the patients who underwent adjuvant chemotherapy had a significantly better prognosis

Relevance of the Prognostic Signature in Clinical Decision-Making
Patients were stratified into two groups for the evaluation of the AJCC stage by combining AJCC stages I and II (N = 772) into the low-clinical-risk group (marked as C-), and AJCC stage III and IV (N = 253) into the high-clinical-risk group (marked as C+) for statistical analysis. Combining the clinical-risk group and gene-risk group (the low-risk group was marked as G-, and the high-risk group was marked as G+), the total patients were classified into the following four subgroups, G-C-(N = 557), G-C+ (N = 160), G+C-(N = 215), G+C+ (N = 93). As expected, KM curves showed that patients in the G-C-subgroup had a significantly better prognosis for PFI, DSS, and OS than those in the other subgroups, and the worst was the G+C+ subgroup (PFI, p < 0.0001; DSS, p < 0.0001; OS, p < 0.0001) ( Figure 9A-C).  To further evaluate the prognostic signature's potential as a predictor of response to chemotherapy, KM analysis was performed in the four subgroups. In the G-C-subgroup, the patients who underwent adjuvant chemotherapy had a significantly better prognosis than those who did not (PFI, p = 0.073; DSS, p = 0.0024; OS, p < 0.0001) ( Figure 9D-F). In the G-C+ and G+C-subgroup, the patients who underwent adjuvant chemotherapy had a significantly better prognosis for OS than those who did not, but not for PFI and DSS (G-C+, PFI, p = 0.34, DSS, p = 0.0089, OS, p = 0.00019; G+C-, PFI, p = 0.17, DSS, p = 0.23, OS, p = 0.02) ( Figure 9G-I,J-L). However, in the G+C+ subgroup, the patients who underwent adjuvant chemotherapy did not show a statistically better prognosis than those who did not (PFI, p = 0.2, DSS, p = 0.2, OS, p = 0.19) (Figure 9M-O). These results suggest that patients in the G-Csubgroup could benefit from adjuvant chemotherapy for PFI, DSS, and OS, while those in the G+C+ subgroup may not, and the patients in the G+C-and G-C+ subgroups could benefit from adjuvant chemotherapy only for OS, not for PFI and DSS.

Discussion
Breast cancer has become a serious threat to the health of women worldwide; thus, it is imperative to find an effective individualized precision therapy. Although some multigene prognosis tools have been developed to assist in clinical decision-making for patients with BC, the scope of application of these predictors has been somewhat limited; for example, Oncotype Dx and MammaPrint were developed for a specific type and clinical stage of BC [17,18]. Due to the disease heterogeneity, traditional chemotherapy, endocrine therapy, and targeted therapy struggle to achieve effective therapeutic effects for some patients with BC [35]. The role of immunity in BC is becoming clearer; immunotherapy in breast cancer is currently gaining ground, in combination with existing therapeutic strategies [19]. The development of pyroptosis, a highly inflammatory form of programmed cell death, is closely associated with the immune-related functions and infiltration of immune cells in tumors [23][24][25][26]. Therefore, more effort is needed to develop pyroptosis-related prognostic and diagnostic models for BC.
In the present study, two pyroptosis-related clusters were constructed by 38 differential pyroptosis-related genes. Further analysis found that some types of tumor-infiltrating immune cells were variously distributed in different pyroptosis-related clusters, such as naïve B cells, plasma cells, T cells CD8, T cells CD4 memory resting, T cells CD4 memoryactivated, T cells follicular helper, T cells gamma delta, NK cells resting, macrophages M0, macrophages M1, macrophages M2, dendritic cells resting, mast cells resting, mast cells activated, and neutrophils. A total of 434 DEGs were identified between the two pyroptosis-related clusters and GO functional enrichment and KEGG pathway analyses demonstrated that the DEGs were enriched in immune-related pathways, such as humoral immune response, adaptive immune response based on somatic immune recombination, Bcell-mediated immunity, complement activation, and immunoglobulin-mediated immune response. These results suggested that pyroptosis is closely related to immunity, which is in accordance with previous studies [36,37].
TNBC is characterized by high heterogeneity, high invasion, high metastatic potential, easy recurrence, and poor prognosis [13]. Indeed, great efforts have been dedicated to finding druggable targets for the personalized treatment of TNBC, leading to the discovery of potential and important targets and pathways linked to cancer development [9,43,44], immunity [45] and even pyroptosis [46]. Jiang YZ et al. classified TNBCs into four transcriptome-based subtypes based on the clinical, genomic, and transcriptomic data of a cohort of 465 primary TNBC [47], and the phase Ib/II FUTURE trial suggested a new concept for TNBC treatment, demonstrating the clinical benefit of subtyping-based targeted therapy for refractory metastatic TNBC [48]. Special attention was paid to that in this study. We also found that patients in the low-risk group had significantly better prognoses for PFI, DSS and OS than those in the high-risk group (PFI, p < 0.0001; DSS, p < 0.0001; OS, p < 0.0001). These results provide important indicators for further studies on the therapeutic value of pyroptosis in TNBC.
In the recent studies about BC, four multi-gene signatures (model) have been developed to predict prognosis [49][50][51][52]: a nine pyroptosis-associated lncRNA signature, a seven pyroptosis-related lncRNAs model, three different pyroptosis clusters and a three-gene signature. However, our model has several advantages as a predictor of prognosis in patients with BC. First, our study included 1025 female cases with BC, excluding males and cases with missing clinical information or RNA-seq data, which avoided the possibility of sex-specific effects and ensured more credible results. Second, PFI was chosen as a clinical outcome when constructing a prognostic prediction model, rather than OS, as OS is less sensitive to BC-specific progression. Third, the patient G-C-subgroup in our model could benefit from adjuvant chemotherapy for PFI, DSS, and OS, and patients in the G+C-and G-C+ subgroups could benefit from adjuvant chemotherapy in terms of OS, but not for PFI and DSS. The results could inform clinical decision-making regarding appropriate treatment strategies for patients with BC.
Notwithstanding, the study has a few shortcomings and limitations which should be acknowledged. First, it may take a long time to apply these findings to the clinic in the real world. Although limited by the follow-up time and the number of cases, our Breast Center has already started to establish a validation dataset for BC to verify the findings in this research. If the validation dataset is consistent with the results of this study, the finding will be applied to a prospective clinical study. Second, the biological functions of the 56 genes remain to be fully elucidated. Third, in the M1 subgroup (N = 16), the KM curve or ROC subgroup analyses did not reveal any significant difference for DSS and OS, and Luminal B (N = 176) and HER2 (N = 70) also showed no significant difference for OS. For the M1 subgroup, the number of clinical samples was too small, making it hard to draw a scientific conclusion. Additionally, the heterogeneity of the Luminal B and HER2 molecular subtypes, especially intratumor heterogeneity, presents substantial challenges in cancer treatment; therefore, further studies are needed to identify more accurate molecular models for these patient subgroups. Fourth, our study lacks an independent validation dataset. In the early stage of study design, we considered randomly dividing the full training set into training and validation datasets according to the different proportions, but the larger the sample size in the dataset, the higher the credibility of the established model, so the full dataset was selected for training and modeling. Although the verification dataset was randomly selected from the total dataset with an overlap in the sample points, this also could verify the reliability of the model.

Conclusions
In conclusion, we identified a novel 56-gene prognostic signature that is significantly associated with PFI, DSS and OS in patients with BC, and developed a nomogram based on this signature with a high prognostic prediction value. Moreover, our data suggested that the prognostic signature, combined with the TNM stage, might be a potential therapeutic strategy for individualized clinical decision-making. , ROC analysis showed the sensitivity and specificity of prognostic signature for predicting OS for high-and low-risk groups in the T1, T2, and T3-T4 subgroups; Figure  S2: KM and ROC curve analyses of patients stratified by lymph node status, metastasis status, and