Prognostication of Primary Tumor Location in Early-Stage Nodal Diffuse Large B-Cell Lymphoma: An Analysis of the SEER Database

Simple Summary
Few studies have examined the impact of primary tumor location on clinical outcome in patients with early-stage nodal diffuse large B-cell lymphoma (DLBCL). The objective of this study was to identify the association between primary tumor location and prognosis in early-stage nodal DLBCL using a large population-based cohort, and to explore the underlying molecular mechanism using a public database. Our results show that the prognosis of early-stage nodal DLBCL patients with tumors growing under the diaphragm is poorer. After screening differentially expressed genes (DEGs) and carrying out enrichment analysis, we found that early-stage nodal DLBCL arising at different sites has different genetic characteristics. These results emphasize the importance of the primary tumor site for clinical decision-making and prognosis in patients with early-stage nodal DLBCL.

Abstract
The prognostic role of primary tumor location for clinical outcomes of patients with early-stage nodal diffuse large B-cell lymphoma (DLBCL) remains uncertain. We evaluated the relationship between primary tumor site and overall survival (OS) in 9738 early-stage nodal DLBCL patients from the Surveillance, Epidemiology, and End Results (SEER) database. The primary site of the tumors was classified as supradiaphragm or subdiaphragm according to the definition of lymph node distribution in the Ann Arbor staging. The OS was significantly better for patients in the supradiaphragm group (n = 6083) than for those in the subdiaphragm group (n = 3655) (hazard ratio (HR) for the subdiaphragm group 1.24; 95% CI: 1.16–1.33; P < 0.001), and the difference persisted after propensity score matching (PSM) (HR 1.15; 95% CI: 1.07–1.24; P < 0.001). Gene enrichment analyses demonstrated that the subdiaphragm group has upregulated extracellular matrix (ECM)-related signaling, which reportedly can promote growth, invasion, and metastasis of cancer, and a downregulated interferon response, which is considered to have anti-tumor function. Our results indicate that the two tumor locations (supradiaphragm and subdiaphragm) carry different prognostic implications for overall survival, suggesting that tumor location could serve as a prognostic biomarker for early-stage nodal DLBCL patients.


Introduction
Diffuse large B-cell lymphoma (DLBCL) is the most common subtype of non-Hodgkin's lymphoma (NHL), accounting for approximately 25 percent of NHL cases [1]. ... significant statistical differences in the survival rates among patients with DLBCL in different sites, and lymphomas originating from the abdomen, pelvis, or thorax showed higher-risk genetic characteristics. It has been suggested that the location of lymphadenopathy be included in a new prognostic scoring system for early-stage DLBCL [20]. These findings indicate that the prognosis of patients with early-stage DLBCL does differ by site, whether nodal or extranodal, and that the influence of the primary site should be taken into consideration when making clinical decisions. The abovementioned authors divided the data into several groups and found a statistical difference in the survival rate of patients with nodal DLBCL. However, prognosis did not differ among some of the groups, such as abdominal and pelvic lymph nodes, thoracic lymph nodes, and Waldeyer's ring (WR). This kind of grouping may therefore not be suitable for inclusion in a prognostic scoring system, and it is vital to determine how nodal DLBCL at different sites should best be grouped.
Therefore, the focus of this study is to explore a feasible and simple grouping method for patients with early-stage nodal DLBCL based on the SEER database: patients with nodal DLBCL at different primary sites are divided into a subdiaphragm group (Sub-DLBCL) and a supradiaphragm group (Sup-DLBCL) according to the definition of lymph node distribution in Ann Arbor staging (still applicable to Lugano staging). We aim to compare the overall survival of patients with early-stage nodal DLBCL located on either side of the diaphragm during the rituximab era and to identify potential molecular mechanisms behind any difference, in order to help clinicians better predict the prognosis of DLBCL patients with tumors at different sites and to provide a basis for clinical decision-making.

Patient Enrollment
In our cohort, the cases came from the registries of the SEER-9 cancer incidence file of the US National Cancer Institute from 1973 to 2015. We extracted cases of DLBCL diagnosed from 2000 to 2014 using the ICD-O-3 codes 9680, 9684/3, and 9688/3. Because our goal was to study the overall survival of early-stage nodal DLBCL patients on both sides of the diaphragm during the rituximab era, and rituximab was first approved in 1997, we only included patients diagnosed from 2000 onward, a period expected to reflect widespread and rapid adoption of rituximab in the treatment of lymphomas. We included only nodal DLBCL patients and excluded those coded as lymph nodes of multiple regions or lymph node, NOS. Only stage I and stage II cases according to Ann Arbor staging with a defined primary site were included in our study. Patients who had died from causes other than the primary tumor and patients with no survival time or no clear primary site were excluded. The detailed inclusion and exclusion criteria are shown in Figure 1.

End Points
The primary endpoint of this study was the overall survival (OS). The OS was calculated from the beginning of the initial treatment until the time of death from any cause, or until the last follow-up time point.

Standard of Grouping
The primary site of nodal DLBCL is divided into the subdiaphragm (Sub-DLBCL) and supradiaphragm groups (Sup-DLBCL) according to the definition of lymph node distribution in Ann Arbor staging. The supradiaphragm group (Sup-DLBCL) includes lymph nodes of head, face, and neck/intrathoracic lymph nodes/lymph nodes of axilla or arm. The subdiaphragm group (Sub-DLBCL) includes intra-abdominal lymph nodes/pelvic lymph nodes/lymph nodes of inguinal region or leg. The Waldeyer's ring and tonsil are the lymphoid tissue above the diaphragm and the spleen is the lymphoid tissue below the diaphragm.
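The grouping rule above amounts to a simple lookup from primary site to group. The following is a minimal sketch rather than the code used in the study: the site labels are taken from the text, while the function name `assign_group`, the lowercase matching, and the decision to return `None` for sites outside both lists are assumptions made for illustration.

```python
# Hypothetical lookup implementing the grouping rule described in the text.
# Site labels follow the wording of the paper; everything else is illustrative.
SUPRADIAPHRAGM_SITES = {
    "lymph nodes of head, face, and neck",
    "intrathoracic lymph nodes",
    "lymph nodes of axilla or arm",
    "waldeyer's ring",  # lymphoid tissue above the diaphragm
    "tonsil",           # lymphoid tissue above the diaphragm
}
SUBDIAPHRAGM_SITES = {
    "intra-abdominal lymph nodes",
    "pelvic lymph nodes",
    "lymph nodes of inguinal region or leg",
    "spleen",           # lymphoid tissue below the diaphragm
}


def assign_group(primary_site):
    """Return 'Sup-DLBCL', 'Sub-DLBCL', or None for sites outside both lists."""
    site = primary_site.strip().lower()
    if site in SUPRADIAPHRAGM_SITES:
        return "Sup-DLBCL"
    if site in SUBDIAPHRAGM_SITES:
        return "Sub-DLBCL"
    return None  # e.g. multiple regions or NOS, which the study excluded


print(assign_group("Intrathoracic lymph nodes"))
print(assign_group("Pelvic lymph nodes"))
```

Returning `None` rather than raising an error keeps the excluded categories (multiple regions, NOS) easy to filter out in a later step.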

Statistical Analysis
The baseline characteristics of the patients were compared among the different groups using Student's t-test or an equivalent nonparametric test. The log-rank test was applied to compare the overall survival between different groups. The Cox proportional hazards model was used to calculate hazard ratios (HRs) for the anatomical sites, both in univariate analysis and after adjusting for the following covariates: sex, age at diagnosis, race, and Ann Arbor stage. Propensity score matching (PSM) at a 1:1 ratio was used to balance the covariates listed above between the groups, with the caliper value set to 0.05. Multivariate Cox regression analysis was used to explore the effect of primary site on overall survival in the matched data.
Statistical analysis was conducted using the R programming environment version 3.6.2 (http://cran.r-project.org, accessed date: 20 March 2020), IBM SPSS Statistics version 25 (IBM Corp., Armonk, NY, USA), and GraphPad Prism version 8. All P-values were two-sided, and the statistical significance threshold was set at P < 0.05 unless otherwise stated. All confidence intervals are reported at the 95% confidence level.
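To make the two-group log-rank comparison used in this section concrete, the sketch below computes the standard log-rank chi-square statistic and its two-sided P-value from follow-up times and event indicators. It is an illustration in plain Python, not the R or SPSS routine used in the study; the function name `logrank_test` and the toy data are assumptions.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*  : follow-up times (lists)
    events_* : 1 if death was observed, 0 if censored
    Returns (chi-square statistic with 1 degree of freedom, P-value).
    """
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of the deaths in group A
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (hypothetical): group A dies early, group B late.
chi2, p = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(round(chi2, 3), round(p, 4))
```

Censored observations contribute to the at-risk counts until their follow-up ends but never to the death counts, which is how the test accommodates incomplete follow-up.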

Microarray Data
The gene expression profiles (RNA-sequencing expression levels) and clinical data were downloaded from The Cancer Genome Atlas (TCGA) database (https://cancergenome.nih.gov/, accessed date: 10 January 2021). Patients and samples with missing staging or primary tumor location were excluded. After applying these exclusion criteria, ten supradiaphragm and four subdiaphragm patients were included in our study. This study was conducted in accordance with the publication guidelines of TCGA (http://cancergenome.nih.gov/publications/publicationguidelines, accessed date: 10 January 2021).

Data Preprocessing and Differential Expression Genes Screening
R statistical software (version 3.6.2; https://www.r-project.org/, accessed date: 20 March 2020) and Bioconductor analysis tools (http://www.bioconductor.org/, accessed date: 20 March 2020) were utilized to process the raw data. The edgeR package of R was used to generate the expression matrix and to screen differentially expressed genes (DEGs) between the Sup-DLBCL group and the Sub-DLBCL group, with the false discovery rate (FDR) threshold set at 0.05 and the minimum |log2 fold change| (|log2 FC|) threshold set at 2; the gene expression values were averaged when the same gene was measured by multiple probes. The differences between the two groups were statistically assessed by Student's t-test; the P-values were adjusted for multiple testing by the Benjamini-Hochberg (BH) method.
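The multiple-testing adjustment and thresholding step described above can be sketched as follows. This is an illustrative plain-Python version of the Benjamini-Hochberg procedure followed by an FDR and |log2 FC| filter; it does not reproduce edgeR's statistical model, and the function names, the gene labels, and the toy values are hypothetical. The fold-change cutoff is left as an explicit argument.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted P-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(results, fdr_cutoff, min_abs_log2fc):
    """results: list of (gene, log2_fold_change, raw_p_value).

    Returns {gene: 'up' or 'down'} for genes passing both thresholds.
    """
    adjusted = benjamini_hochberg([p for _, _, p in results])
    degs = {}
    for (gene, log2fc, _), fdr in zip(results, adjusted):
        if fdr < fdr_cutoff and abs(log2fc) > min_abs_log2fc:
            degs[gene] = "up" if log2fc > 0 else "down"
    return degs


# Hypothetical toy results, not data from the study.
toy = [("GENE_A", 2.5, 0.01), ("GENE_B", 0.4, 0.04),
       ("GENE_C", -1.8, 0.03), ("GENE_D", -3.0, 0.005)]
print(select_degs(toy, fdr_cutoff=0.05, min_abs_log2fc=1.0))
```

Applying the fold-change filter after the FDR adjustment, as here, keeps the adjustment based on all tested genes rather than only the large-effect ones.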

Functional and Pathway Enrichment Analysis
Functional enrichment analysis of gene ontology (GO) molecular function (MF), biological process (BP), and cellular component (CC) terms was conducted using the clusterProfiler package of R statistical software. KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analysis was also performed to identify the key biological pathways using the pathview package of R. Enrichment analyses were performed separately for the upregulated and downregulated genes. A P-value < 0.05 was taken to indicate statistically significant enrichment of GO or pathway terms, and the fold enrichment score was used to quantify the enrichment.
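Over-representation analyses of this kind are commonly based on a one-sided hypergeometric test. The sketch below shows that test together with one common definition of fold enrichment (the fraction of DEGs annotated to a term divided by the fraction of background genes annotated to it). Both the definition and the function names are assumptions for illustration; the exact computation used in the study is not stated in the text.

```python
from math import comb


def hypergeom_upper_tail(k, population, annotated, drawn):
    """P(X >= k) when `drawn` genes are sampled without replacement from
    `population` genes, of which `annotated` belong to the term."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def fold_enrichment(k, population, annotated, drawn):
    """Observed fraction of hits among the DEGs over the background fraction."""
    return (k / drawn) / (annotated / population)


# Hypothetical toy numbers: 20 background genes, 5 annotated to a term,
# 4 DEGs of which 2 carry the annotation.
print(round(hypergeom_upper_tail(2, 20, 5, 4), 4))
print(fold_enrichment(2, 20, 5, 4))
```

Running the test separately on upregulated and downregulated genes, as described above, simply means calling it once per gene list against the same background.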

Gene Set Enrichment Analysis
The R package "clusterProfiler" was utilized to conduct gene set enrichment analysis (GSEA) to determine the gene sets significantly enriched in the different subgroups examined. The expression matrix of the TCGA dataset was analyzed using the annotation file "hallmark gene sets" in the Molecular Signatures Database (MSigDB). The cutoff values were set to P < 0.05, adjusted P-value < 0.05, and FDR < 0.25.
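At the core of GSEA is a running-sum enrichment score (ES) computed over a ranked gene list. The sketch below implements only that raw score, in the usual weighted form with exponent 1; it omits the permutation step that yields the normalized enrichment score (NES) and P-values, and it is not the clusterProfiler implementation. The function name and the toy ranking are hypothetical.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Raw GSEA enrichment score.

    ranked   : list of (gene, ranking_metric), sorted from highest to lowest
    gene_set : set of gene names
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    hit_weights = [abs(score) ** weight if gene in gene_set else 0.0
                   for gene, score in ranked]
    total_hit_weight = sum(hit_weights)
    n_miss = sum(1 for gene, _ in ranked if gene not in gene_set)
    if total_hit_weight == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")

    running, best = 0.0, 0.0
    for (gene, _), w in zip(ranked, hit_weights):
        if gene in gene_set:
            running += w / total_hit_weight  # step up at a set member
        else:
            running -= 1.0 / n_miss          # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking: the set members sit at the bottom of the list.
ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0),
          ("D", -1.0), ("E", -2.0), ("F", -3.0)]
print(enrichment_score(ranked, {"E", "F"}))
```

A negative score means the gene set is concentrated at the bottom of the ranking, that is, among genes with lower expression in the group placed first in the comparison.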

Clinical Characteristics of Patients
A total of 9738 patients with early-stage nodal DLBCL were reported to the SEER database from 2000 to 2014. The baseline information of the patients is summarized in Table 1. Most of the patients were white and had received chemotherapy but no surgery or radiotherapy. Statistically significant differences in age, race, and Ann Arbor stage were observed between the Sup-DLBCL and Sub-DLBCL groups. This study compared 6083 (62.5%) Sup-DLBCL patients and 3655 (37.5%) Sub-DLBCL patients, among which 5371 cases (55.2%) were males and 4367 cases (44.8%) were females. The average age of all, Sup-DLBCL, and Sub-DLBCL patients was 59.1, 56.7, and 63.2 years, respectively. Sup-DLBCL was more frequent than Sub-DLBCL. Sub-DLBCL patients were older, with a larger proportion over 60 years old, than Sup-DLBCL patients. The median survival time for all, Sup-DLBCL, and Sub-DLBCL patients was 66.7

Survival Analyses
The Kaplan-Meier method was used to construct survival curves and the log-rank test was used to evaluate the differences among groups, in order to explore the overall survival of patients with early-stage nodal DLBCL located on either side of the diaphragm during the rituximab era. Table 2 presents the 5-year overall survival rate of patients with nodal DLBCL according to primary site. The 5-year overall survival rate of the whole cohort was 74.8%. The 5-year OS rates of Sup-DLBCL and Sub-DLBCL were 78.0% and 69.4%, respectively (P < 0.001). The median survival time of the whole dataset was 185.0 months, while that of the Sup-DLBCL group was undefined (not reached) and that of the Sub-DLBCL group was 145.0 months. The overall survival of patients whose primary site was located under the diaphragm was worse than that of patients whose primary site was located above the diaphragm, as shown in Figure 2 and Table 2.
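For readers unfamiliar with the Kaplan-Meier estimator, the sketch below shows how a survival curve and a median survival time are derived from follow-up times and event indicators, and why a median can be undefined when the curve never drops to 50%. It is a plain-Python illustration with hypothetical toy data, not the software used in the study.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with a death.

    events: 1 if death was observed at that time, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored cases leave the risk set
    return curve


def median_survival(curve):
    """First time at which survival is <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical toy data (months, event indicator).
curve = kaplan_meier([6, 12, 12, 20, 30], [1, 0, 1, 1, 0])
print(curve, median_survival(curve))
print(median_survival(kaplan_meier([5, 10, 15, 20], [1, 0, 0, 0])))
```

In the second call only one of four patients dies, so the estimated survival stays at 0.75 and the function returns `None`, the situation reported as an undefined median.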

Propensity Score Matching
Propensity score matching (PSM) on the baseline factors was carried out to reduce selection bias. There were 3638 matched pairs between the two groups. Figure 3a,b depicts the distribution map and histogram of the propensity score before and after matching. After PSM, all baseline factors were balanced except the Ann Arbor stage, as shown in Table 3. The matched data were then analyzed with Kaplan-Meier survival analysis and similar results were obtained. The 5-year overall survival rate and the median survival time of the whole matched dataset were 71.6% and 153.0 months, respectively. The 5-year OS rates of Sup-DLBCL and Sub-DLBCL were 73.5% and 69.6%, respectively (P < 0.001). The median survival time of Sup-DLBCL and Sub-DLBCL was 158.0 and 146.0 months, respectively, as shown in Table 2.
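The matching step can be illustrated with a greedy 1:1 nearest-neighbor match without replacement. This is a sketch under stated assumptions: propensity scores are taken as already estimated (for example by logistic regression on the baseline covariates), the 0.05 caliper is applied on the raw propensity score scale, and the matching order is by ascending score. The text does not specify these details, and the function name and patient identifiers are hypothetical.

```python
def match_one_to_one(group_a, group_b, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    group_a, group_b : dicts mapping patient id -> propensity score
    caliper          : maximum allowed score difference within a pair
    Returns a list of (id_from_a, id_from_b) pairs.
    """
    available = dict(group_b)
    pairs = []
    for a_id, a_score in sorted(group_a.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining candidate in the other group
        b_id = min(available, key=lambda k: abs(available[k] - a_score))
        if abs(available[b_id] - a_score) <= caliper:
            pairs.append((a_id, b_id))
            del available[b_id]  # each candidate is used at most once
    return pairs


# Hypothetical scores: s3 has no counterpart within the caliper.
sub = {"s1": 0.30, "s2": 0.62, "s3": 0.90}
sup = {"p1": 0.28, "p2": 0.33, "p3": 0.60, "p4": 0.70}
print(match_one_to_one(sub, sup))
```

Patients without a counterpart inside the caliper are dropped, which is why the matched cohort can be smaller than the smaller of the two original groups.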

Adjusted Model
Multivariable Cox proportional hazards regression analysis was then performed to adjust for baseline factors that may affect prognosis. The unadjusted HR for patients with the primary site located under the diaphragm was 1.48 (P < 0.001, 95% CI: 1.38–1.58) before PSM and 1.15 (P < 0.001, 95% CI: 1.07–1.24) after PSM. Only the factors that were significant in the univariate analysis were considered in the multivariate analysis; thus, age, stage, race, and primary site were included. Table 4 lists the univariate analysis before and after PSM, and Table 5 lists the unadjusted and adjusted models. Multivariate Cox proportional hazards regression analysis showed that, before PSM, the overall survival of patients with the primary site located under the diaphragm was worse than that of patients whose primary site was located above the diaphragm (P < 0.001, HR = 1.24, 95% CI: 1.16–1.33) after adjusting for age, stage, and race. After PSM, the overall survival of the Sub-DLBCL patients was still worse than that of the Sup-DLBCL patients (P < 0.001, HR = 1.15, 95% CI: 1.07–1.24).
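As a consistency check on the hazard ratios reported above, the standard error of log(HR) can be recovered from a 95% confidence interval and used to recompute a Wald z statistic and two-sided P-value. The sketch assumes the intervals are Wald intervals on the log scale, which is the usual output of Cox regression software but is not stated in the text; the function name is hypothetical, and rounding of the published values limits the precision of the result.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_summary(hr, ci_low, ci_high):
    """Recover SE(log HR), the Wald z statistic, and a two-sided P-value."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    z = math.log(hr) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_value


# Values as reported in the text for Sub-DLBCL versus Sup-DLBCL.
reported = [
    ("adjusted, before PSM", 1.24, 1.16, 1.33),
    ("after PSM", 1.15, 1.07, 1.24),
]
for label, hr, low, high in reported:
    se, z, p = wald_summary(hr, low, high)
    print(f"{label}: SE = {se:.4f}, z = {z:.2f}, P = {p:.2g}")
```

Both recomputed z statistics are well above 3 (roughly 6.2 and 3.7), in line with the reported P < 0.001.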

Identification of DEGs
Differentially expressed genes (DEGs) between the Sup-DLBCL group and the Sub-DLBCL group were then identified using data from the TCGA database. A total of 130 DEGs were identified using the "edgeR" package with the following thresholds: FDR < 0.05 and |log2 FC| > 1. Of these, 104 (80%) were upregulated and 26 (20%) were downregulated. The top 10 upregulated and top 10 downregulated genes identified in the gene expression analysis are presented in Table 6. In addition, volcano plots were generated to visualize the distribution of the DEGs, and a heatmap with a dendrogram clustering the DEGs was drawn using the "ggplot2" package in R software (Figure 4a,b).

GO and KEGG Pathway Analysis for DEGs
Functional enrichment analysis was applied to these DEGs to interrogate their biological functions (Figure 5). For the upregulated genes, network-forming collagen trimer, collagen network, and basement membrane collagen trimer were the most enriched GO terms in CC, and delayed rectifier potassium channel activity, extracellular matrix structural constituent, and extracellular matrix structural constituent conferring tensile strength were the most enriched GO terms in MF, while no BP or KEGG term was enriched. For the downregulated genes, late endosome, primary lysosome, and azurophil granule were significantly enriched in CC; detoxification of copper ion, stress response to copper ion, and cellular response to copper ion were significantly enriched in BP; only mineral absorption was significantly enriched in KEGG; and no MF term reached significance.

GSEA Analysis of the Gene Expression Files in the Sub-DLBCL Group Versus the Sup-DLBCL Group
The GSEA analysis was performed with the expression profile using a predefined "hallmark signature". Seven gene sets were significantly enriched in the Sup-DLBCL group compared to the Sub-DLBCL group, with the top three including interferon gamma response (P.adjust = 0.008, NES = −2.005), interferon alpha response (P.adjust = 0.008, NES = −2.015), and inflammatory response (P.adjust = 0.008, NES = −2.230) (Figure 6).

Discussion
In the present study, clinical outcomes of early-stage nodal DLBCL were analyzed to investigate the prognostic impact of tumor location by classifying lymphoma into two groups: Sup-DLBCL and Sub-DLBCL. Our findings suggested that Sub-DLBCL was associated with worse 5-year OS compared to Sup-DLBCL (HR = 1.24; P < 0.001). DLBCL under the diaphragm was less frequent, patients were older, and more of them were over 60 years old than in the Sup-DLBCL group. The 5-year OS of all patients in this study was 74.8%, in contrast to previous studies, which reported values between 62% and 65.5% [19,21]. This difference could be attributed to differences in the inclusion criteria of the populations. Our results suggested that the prognosis of DLBCL patients in the Sub-DLBCL group was poorer.
PSM was also performed to reduce selection bias and obtain more reliable results. The baseline factors other than Ann Arbor stage were balanced after PSM, and similar results were obtained, showing that the prognosis of DLBCL patients in the Sub-DLBCL group was poorer than that of the Sup-DLBCL group (HR = 1.15; P < 0.001). The HR became smaller after accounting for the baseline factors, indicating that these factors affected the results to a certain extent; nevertheless, the difference remained statistically significant, which supports the robustness of our results.
Recent advances in genomic analysis provide comprehensive insight into the molecular heterogeneity of DLBCL and help stratify risk in DLBCL. Currently, there are multiple classification systems available, including the international prognostic index (IPI), the cell-of-origin (COO) classification, and classifications that capture double-hit lymphoma (DHL) and related subtypes [8,11,22–24]. Other attempts are under way in addition to the aforementioned categorization systems. For instance, based on genomic aberrations, DLBCLs have been divided into four or five subtypes, as mentioned in the Introduction. In the present study, gene expression profile analysis using the TCGA public database was conducted, including 10 Sup-DLBCL and 4 Sub-DLBCL samples, in an attempt to clarify the molecular differences between the two groups and to provide a molecular foundation for grouping patients with early-stage nodal DLBCL. We identified 130 DEGs, of which 104 were upregulated and 26 downregulated. Among them, tumor necrosis factor receptor superfamily member 14 (TNFRSF14) is downregulated in Sub-DLBCL, which may map this group to the previously described EZB subtype [8,10,25]. As mentioned in the Introduction, the authors of those studies performed whole-exome sequencing and found that the EZB subtype mainly harbored EZH2 mutations and BCL2 translocations, mapping to the previously described C3 and BCL2 clusters, and exhibited frequent inactivation of TNFRSF14, with an outcome inferior to that of other patients within GCB DLBCL. In the present study, we found that TNFRSF14 is downregulated in Sub-DLBCL by analyzing RNA-sequencing data and screening DEGs; this downregulation may have a biological effect similar to TNFRSF14 inactivation in the EZB subtype and may help explain the worse outcome of Sub-DLBCL.
Considering that COO and MYC/BCL2 status may not influence the outcome among patients with early-stage DLBCL treated with the R-CHOP regimen [26–31], other factors (internal or external microenvironmental factors) may drive the poor prognosis of early-stage nodal DLBCL located under the diaphragm. This hypothesis, however, needs to be confirmed in future work, as we only included early-stage nodal DLBCL patients and analyzed only RNA-sequencing data, not DNA-sequencing data.
The DEGs generated by the comparison of the Sup-DLBCL and Sub-DLBCL groups were screened for further GO/KEGG analysis. Extracellular matrix-related terms were the most significantly enriched CC and MF terms in the upregulated genes. These genes have been reported to promote growth, survival, and invasion of cancer and to interact with fibroblasts and immune cells to promote metastasis and impair treatment [32–37]. Most previous studies of the tumor microenvironment (TME) in DLBCL have focused on tumor-associated immune cells such as T cells, natural killer (NK) cells, and tumor-associated macrophages (TAMs) [38,39]. Few studies have emphasized the extracellular matrix, an important part of the tumor microenvironment. Recently, a group calculated the abundance of immune-stromal components, obtained differentially expressed genes on that basis, and identified a genetic signature based on TME-related genes, comprising TIMP2, QKI, LCP2, LAMP2, ITGAM, CSF3R, and AAK1 [40]. In our research, we found that extracellular matrix structural constituents such as the network-forming collagen trimer and collagen type IV trimer were upregulated in the Sub-DLBCL group, which may contribute to the development of DLBCL located under the diaphragm. Indeed, studies have shown the effect of these extracellular matrix structural constituents on tumor progression. One study demonstrated that collagen IV can mediate LOXL2 to stimulate angiogenesis [41]. The type I-trimer collagen was found to be expressed in ductal infiltrating carcinomas and to promote migration of tumor cells [42]. More attention should be given in the future to the role of the extracellular matrix in early-stage nodal DLBCL.
However, no KEGG pathway term was significantly enriched in the upregulated genes. GO/KEGG enrichment analyses indicated that the downregulated DEGs were mainly involved in ion balance in vivo, such as that of zinc and cadmium ions. Maintenance of ion homeostasis is primarily mediated by ion channels, which have been reported to act as tumor suppressors or oncogenes in tumor development [43,44]. In our study, genes related to the homeostasis of zinc, cadmium, and other ions were downregulated in the Sub-DLBCL group with poor prognosis, which suggests that ion homeostasis may act as a tumor suppressor in early-stage nodal DLBCL. At present, its role in DLBCL remains unclear and deserves more attention in future work.
Besides the analysis of DEGs between the Sup-DLBCL and Sub-DLBCL groups, this study also provides GSEA results. Seven gene sets were enriched in the Sup-DLBCL group and not in the Sub-DLBCL group: interferon gamma response, interferon alpha response, inflammatory response, myogenesis, oxidative phosphorylation, complement, and TNFα signaling via NFκB. Interferons (IFNs) are cytokines that have antiviral, antitumor, and immunomodulatory properties and mediate the immune response. IFN-α is a member of the type I IFN family, while IFN-γ is the lone member of the type II IFN family. IFN-α and IFN-γ are considered to have proapoptotic, anti-proliferative, and immune-related functions, such as promoting myeloid cell activation and antigen presentation, which facilitate tumor elimination [45–48]. Sistigu et al. suggested that neoplastic cells can engage in a type I IFN response early after exposure to anthracyclines, and that the therapeutic activity of anthracyclines relies on type I IFN signaling in neoplastic cells [49]. IFN-γ induces a broad spectrum of tumor-protective mechanisms, and a recent study has shown that IFN-γ-dependent senescence induction is a key mechanism required to protect against those cancer cells that escape from cytotoxicity [50]. CC-122, a new chemical entity termed a pleiotropic pathway modifier, has been reported to have an anti-tumor effect in both activated B-cell (ABC) and germinal center B-cell DLBCL, mainly by inducing the degradation or short hairpin RNA-mediated knockdown of Aiolos and Ikaros. Moreover, it has been found to correlate with increased transcription of interferon (IFN)-stimulated genes independent of IFN-α, -β, and -γ production and/or secretion [51]. Our results suggest that Sup-DLBCL has a more active interferon response, which may underlie a better response to R-CHOP-based immunochemotherapy and a better prognosis than in patients with Sub-DLBCL.
This result may point to a way of stratifying early-stage nodal DLBCL patients based on the level of interferon response, and to a potential therapeutic target: IFN-γ or an IFN-γ analogue might in the future be included in combined therapies for early-stage nodal DLBCL patients with tumors under the diaphragm. Further experiments are required to validate the therapeutic role of IFN-γ or IFN-γ analogues and to unravel the specific molecular mechanisms that differ between the groups.
To the best of our knowledge, this is the first study providing a comprehensive descriptive analysis of patients with early-stage DLBCL based on the definition of lymph node distribution in Ann Arbor staging. Our results demonstrated that the Sub-DLBCL group has a worse prognosis, which suggests that when dealing with early-stage nodal DLBCL located under the diaphragm, clinicians should be more vigilant, monitor more closely for recurrence after treatment, and aim for cure or long-term survival. Many studies have compared the clinical characteristics and survival differences between nodal and extranodal sites and between patients with different extranodal sites [21,52,53], but comparisons of patients with different nodal sites remain limited. In the present manuscript, we therefore mainly compared the survival of patients with different nodal sites and explored the underlying molecular mechanism. The prognoses of Sup-DLBCL and Sub-DLBCL differ, perhaps partly because nodal DLBCL under the diaphragm is more difficult to diagnose, as its symptoms are not specific. Our study provides additional evidence for these differences between Sup-DLBCL and Sub-DLBCL and relates them to molecular mechanisms such as the extracellular matrix and the regulation of the inflammatory response.
This study has specific limitations since it is based on publicly available data: (1) Despite the use of propensity score matching to control selection bias, as in other clinical observational studies, selection bias still exists. (2) We cannot determine the type of chemotherapy, the duration of chemotherapy, or the patient's response to treatment; progression-free survival cannot be obtained due to the limitations of the database, and thus we cannot draw conclusions about differences in treatment response between the two groups. (3) There are no records of baseline performance status, B-symptoms, bulky disease, or lactate dehydrogenase levels in the SEER database, and only limited information is available about treatments. Therefore, adjustments for these potential prognostic factors are not feasible.

Conclusions
Despite these limitations, our study has shown that the primary site of early-stage nodal diffuse large B-cell lymphoma affects its prognosis, and that the prognosis of nodal DLBCL patients with tumors growing under the diaphragm is poorer. We screened DEGs by analyzing RNA-expression profiling data from the TCGA database and found several pathways, suggesting that the genetic characteristics of early-stage nodal DLBCL differ between primary sites. In clinical practice, the primary site of nodal DLBCL should be considered comprehensively and the optimal treatment should be selected with reference to the primary site.

Data Availability Statement: Publicly available datasets were analyzed in this study. These data can be found here: https://cancergenome.nih.gov/, accessed date: 20 March 2020; https://seer.cancer.gov/, accessed date: 20 March 2020.