1. Introduction
Chronic rhinosinusitis with nasal polyps (CRSwNP) is a heterogeneous inflammatory disease of the upper airway. It typically presents with nasal obstruction, rhinorrhea, and olfactory dysfunction, and it can markedly impair health-related quality of life (HRQoL) [1,2,3]. Although current treatment includes intranasal and systemic corticosteroids, endoscopic sinus surgery, and, more recently, biologics targeting type 2 inflammation, many patients with severe disease still relapse and require repeated systemic treatment or revision surgery [1,4,5,6]. These ongoing clinical challenges reflect the biological heterogeneity of CRSwNP, which involves immune endotypes, epithelial dysfunction, and tissue remodeling [1,7,8].
Eosinophilic subclassification remains one of the most widely used ways to stratify CRSwNP. In many studies, eosinophilic disease is associated with stronger type 2 immune signals, a higher risk of recurrence, and possible differences in treatment response [9,10,11,12]. However, eosinophilic classification is not standardized across studies or regions. The JESREC framework, for example, combines clinical and radiologic features with peripheral blood eosinophilia to identify eosinophilic CRS [10], whereas other approaches rely more directly on tissue eosinophil proportion or eosinophil counts per high-power field, with variable cutoffs that may also relate to recurrence risk [11,12,13]. The updated Chinese CRS guideline also notes the lack of a unified standard and summarizes several commonly used definitions [9]. Comparative studies further suggest that different tissue-eosinophilia thresholds identify overlapping, but not identical, clinical and immunologic states. Taken together, eosinophilic labels remain clinically useful, but they are only imperfectly standardized surrogates of the underlying tissue biology [12,13].
CRSwNP pathology is unlikely to be explained by eosinophil burden alone. Growing evidence points to epithelial barrier impairment, abnormal epithelial repair, and stromal remodeling as key features of the disease that go beyond traditional histologic classification [14,15,16]. Single-cell, spatial, and multi-scale transcriptomic studies also support this view, showing heterogeneous immune–epithelial interactions and remodeling programs across tissue states. These findings suggest that nominal histologic eosinophilic labels capture only part of the molecular heterogeneity of CRSwNP, including epithelial dysfunction, remodeling, and immune–epithelial organization [8,16,17]. We therefore asked how closely nominal histologic eosinophilic labels correspond to a broader multi-axis molecular burden framework across public single-cell, spatial, and bulk CRSwNP datasets, and whether molecular stratification could provide information beyond conventional pathology-based subclassification. We hypothesized that eosinophilic labels would still be informative, but incomplete, meaning that molecular burden would vary in a graded manner and align only partly with nominal eosinophilic categories across cohorts and platforms [8].
2. Materials and Methods
2.1. Study Design
This study integrated public transcriptomic datasets encompassing discovery single-cell RNA sequencing (scRNA-seq), independent scRNA-seq validation, GeoMx digital spatial profiling, and bulk transcriptomic replication cohorts. The aim was to determine how closely nominal histologic eosinophilic labels correspond to a broader multi-axis molecular burden framework across datasets and platforms.
2.2. Public Datasets and Overall Analytical Framework
The study comprised four analytical layers. First, we used a discovery scRNA-seq cohort [18] to construct the formal four-axis molecular burden framework and define core epithelial states and transcriptional programs. Second, we analyzed an independent validation scRNA-seq cohort as a conceptually aligned external support layer for selected epithelial and immune features. Third, a GeoMx digital spatial profiling dataset [16] provided orthogonal compartment-resolved support using platform-adapted modules. Finally, independent bulk transcriptomic cohorts served as a framework-level replication layer to evaluate incomplete concordance between nominal eosinophilic labels and molecular burden in both label-explicit cohorts, including GSE72713 [19] and Ishino [20], and control-versus-polyp datasets.
2.3. Construction of the Molecular Burden Framework
The formal four-axis molecular burden framework was constructed only in the discovery scRNA-seq cohort (n = 21 samples: Control, 5; CRSsNP, 5; non-eosinophilic CRSwNP, 5; eosinophilic CRSwNP, 6), where sample-level pseudobulk profiles allowed all prespecified dimensions to be evaluated in the same analytical layer. The framework was designed as an exploratory transcriptomic representation of recurrent CRSwNP pathobiology, not as an exhaustive classification system or a clinically validated score. We selected four dimensions based on prior evidence that CRSwNP tissue organization reflects inflammatory context, epithelial injury and repair, stromal/extracellular-matrix remodeling, and epithelial barrier or host-defense dysfunction.
Gene sets were obtained with “msigdbr” from MSigDB. Type 2 inflammatory context was represented by “GOBP_TYPE_2_IMMUNE_RESPONSE”; epithelial injury/remodeling by “HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION”; extracellular-matrix remodeling by “REACTOME_EXTRACELLULAR_MATRIX_ORGANIZATION”; and epithelial barrier/defense by “HALLMARK_APICAL_JUNCTION”, “HALLMARK_APICAL_SURFACE”, and “GOBP_DEFENSE_RESPONSE_TO_BACTERIUM”. Gene-set identifiers and the number of genes retained after filtering to the discovery pseudobulk matrix are provided in Supplementary Table S1.
Single-cell counts were aggregated by sample to generate pseudobulk expression matrices. Pseudobulk counts were normalized using trimmed mean of M-values (TMM) normalization and transformed to log counts per million. Signature scores were calculated with GSVA using the ssGSEA method and were z-scaled across samples. Barrier integrity and antibacterial defense were treated as protective programs; therefore, their signs were inverted and averaged to generate a burden-oriented barrier/defense impairment score. The discovery composite molecular burden was then defined as the unweighted mean of the four standardized burden-oriented dimensions: type 2 inflammation, epithelial injury/remodeling, extracellular-matrix remodeling, and barrier/defense impairment. Equal weighting was chosen as a parsimonious analytical choice to avoid assigning unsupported biological weights to individual axes. Samples were ranked by this composite score and, when needed for visualization, grouped into tertiles. These tertiles were operational strata for displaying graded molecular organization and were not intended to define fixed biological thresholds.
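The composite construction described above can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the function names are hypothetical, the inputs are assumed to be per-sample signature scores that have already been computed (for example, by ssGSEA), and the averaged impairment score is not re-standardized before entering the composite, a detail the text does not specify.

```python
from statistics import mean, stdev


def zscale(values):
    """Z-scale scores across samples using the sample SD (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite_burden(type2, injury, ecm, junction, surface, antibacterial):
    """Unweighted mean of four burden-oriented dimensions, one value per sample."""
    z_type2, z_injury, z_ecm = zscale(type2), zscale(injury), zscale(ecm)
    # Protective programs: flip the sign of each z-score, then average them
    # into a single barrier/defense impairment score per sample.
    protective = [zscale(junction), zscale(surface), zscale(antibacterial)]
    impairment = [mean(-z for z in per_sample) for per_sample in zip(*protective)]
    return [mean(dims) for dims in zip(z_type2, z_injury, z_ecm, impairment)]


def tertiles(scores):
    """Label each sample 'low', 'mid', or 'high' by the rank of its score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [""] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = ("low", "mid", "high")[rank * 3 // len(scores)]
    return labels
```

Because every input dimension is centered before averaging, the composite is itself centered at zero across samples, which is why tertiles here are rank-based display strata and not absolute thresholds.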
2.4. Consistency and Sensitivity Analyses
We evaluated the discovery burden representation using several descriptive consistency and sensitivity analyses. First, we calculated component-to-composite and inter-component correlations to assess internal consistency and overlap among the four prespecified dimensions. Second, we performed leave-one-component-out (LOCO) analyses, recalculating the composite burden after sequential removal of each component. Because LOCO correlations are partly expected when standardized correlated components are averaged, these analyses were interpreted as internal consistency checks rather than independent proof of robustness.
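As an illustration of the leave-one-component-out check, the following Python sketch recomputes the composite without each component and correlates it with the full composite. The function names and the dictionary layout are hypothetical, and the inputs are assumed to be z-scaled, burden-oriented component scores per sample.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def loco_correlations(components):
    """components: dict mapping component name -> per-sample scores.

    Returns, for each component, the correlation between the full composite
    and the composite recomputed without that component.
    """
    full = [mean(per_sample) for per_sample in zip(*components.values())]
    result = {}
    for left_out in components:
        kept = [v for name, v in components.items() if name != left_out]
        reduced = [mean(per_sample) for per_sample in zip(*kept)]
        result[left_out] = pearson(full, reduced)
    return result
```

The reduced composite shares three of its four inputs with the full composite, so high correlations are expected by construction; this is the reason such values are read as consistency checks only.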
To directly test whether the burden ordering depended on specific weighting choices, we performed alternative burden formulation analyses. These included an epithelial injury/remodeling downweighted composite, a type 2 inflammation downweighted composite, and a rank-based composite using the average rank across the four burden-oriented components. We compared each alternative formulation with the original discovery composite using Pearson and Spearman correlations and assessed the overlap of the high-burden tertile. We also examined median-based burden grouping and alternative rank-based stratification as additional checks of the discovery burden pattern.
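The alternative formulations can be sketched as follows in Python. This is an illustrative sketch only: the function names are hypothetical, and any downweighting factor passed in (for example, 0.5) is an assumption, because the text does not report the weights used.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def weighted_composite(components, weights):
    """Weighted mean across components for each sample."""
    n_samples = len(next(iter(components.values())))
    total = sum(weights[name] for name in components)
    return [
        sum(weights[name] * components[name][i] for name in components) / total
        for i in range(n_samples)
    ]


def rank_composite(components):
    """Average rank across components for each sample."""
    ranked = [average_ranks(v) for v in components.values()]
    return [mean(per_sample) for per_sample in zip(*ranked)]


def high_tertile(scores):
    """Indices of the samples in the top third by score."""
    k = max(1, len(scores) // 3)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def high_tertile_overlap(a, b):
    """Fraction of the high tertile under scoring a that is also high under b."""
    top_a, top_b = high_tertile(a), high_tertile(b)
    return len(top_a & top_b) / len(top_a)
```

Passing equal weights to `weighted_composite` reproduces the unweighted mean, so the original and alternative composites can be compared with `spearman` and `high_tertile_overlap` using the same inputs.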
2.5. Discovery Epithelial Single-Cell Analysis
To identify cellular correlates of disease burden, we extracted epithelial cells from the discovery cohort and analyzed them as a dedicated Seurat object. We then mapped sample-level burden assignments back to individual cells using matched identifiers. Discovery epithelial-state annotation was guided by canonical marker expression together with prior nasal/CRS single-cell epithelial atlases defining related basal, secretory, goblet, glandular, and ciliated compartments and their disease-associated remodeling states [21]. State composition was quantified as the cellular proportion of each state within each sample.
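State composition as a within-sample proportion can be illustrated with a short Python sketch. The function name and the input layout, one (sample, state) pair per epithelial cell, are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def state_proportions(cells):
    """cells: iterable of (sample_id, state) pairs, one per epithelial cell.

    Returns {sample_id: {state: fraction of that sample's epithelial cells}}.
    """
    counts = defaultdict(Counter)
    for sample_id, state in cells:
        counts[sample_id][state] += 1
    return {
        sample_id: {state: n / sum(c.values()) for state, n in c.items()}
        for sample_id, c in counts.items()
    }
```

Normalizing within each sample keeps samples with very different cell yields comparable, since each contributes one proportion per state.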
We evaluated functional epithelial programs using the AddModuleScore function in Seurat [22]. Unlike the inverted scaling used for the composite framework, we preserved these module scores in their native direction to accurately reflect underlying transcriptional activity. Scores were averaged across epithelial cells at the sample level prior to group comparisons. For trajectory inference using Slingshot [23], basal cells were used as the root based on their progenitor-like identity in prior CRS epithelial single-cell atlases and established airway epithelial lineage organization. Glandular and ciliated states were treated as terminal branches, and glandular-branch marker dynamics were examined as supportive evidence for epithelial state organization while recognizing that inferred trajectory orientation is conditional on root designation.
To provide tissue-level qualitative support for these epithelial remodeling-associated findings, we examined SERPINB4 immunofluorescence in relation to KRT5-defined basal/progenitor-associated epithelial compartments. Immunofluorescence staining was performed using anti-KRT5 antibody (Abcam, Cambridge, UK; ab52635; rabbit monoclonal; clone EP1601Y; 1:100) and anti-SERPINB4 antibody (OriGene Technologies, Rockville, MD, USA; UM500016; mouse monoclonal; clone UMAB16; 1:100). Nuclei were counterstained with DAPI (1 μg/mL; Thermo Fisher Scientific, Waltham, MA, USA; D1306), and sections were mounted with antifade mounting medium (Beyotime Biotechnology, Shanghai, China; P0126). Images were acquired using an Olympus BX53 fluorescence microscope (Olympus Corporation, Tokyo, Japan) and analyzed using ImageJ software (version 1.54g; National Institutes of Health, Bethesda, MD, USA; https://imagej.nih.gov/ij/). In this qualitative panel, SERPINB4 was interpreted as a remodeling/pathology-associated epithelial signal, with KRT5 serving as a contextual basal/progenitor anchor rather than as a stand-alone classification system.
2.6. Independent Validation scRNA-Seq Analysis
The independent validation scRNA-seq cohort was used as a conceptually aligned external support layer rather than as a de novo reconstruction of the four-axis discovery composite. Exact transfer of the discovery framework was not feasible because the validation dataset did not provide explicit eosinophilic subclassifications and did not contain the same full set of compartments needed to reconstruct all four discovery axes, including a fibroblast/stromal compartment comparable to the extracellular-matrix component in the discovery framework. We therefore retained the source study’s tissue and disease annotations and mapped them to burden-oriented comparison groups to test whether the main epithelial and immune features identified in the discovery analysis showed compatible behavior in an external dataset.
Within the epithelial compartment, we calculated module scores for wounding, barrier-associated, and secretory/remodeling-associated programs, together with basal, ciliated, and secretory-state modules. Wounding and barrier-associated programs were derived from public gene sets where possible, whereas state-associated modules used curated marker panels suitable for the validation epithelial atlas. Scores were summarized at the sample level before statistical comparison. Within immune cells, analyses focused on a curated myeloid remodeling panel and a curated type 2 T/NK context panel. These validation analyses were interpreted as external support for selected framework-related features rather than exact replication of the discovery composite burden score.
2.7. GeoMx Spatial Profiling Analysis
The GeoMx digital spatial profiling dataset was analyzed as an orthogonal compartment-resolved support layer based on multiplex digital spatial profiling methodology [24]. Because the GeoMx dataset used predefined epithelial, immune, and macrophage-enriched compartments and did not provide a fibroblast/stromal compartment comparable to the discovery extracellular-matrix axis, the four-axis discovery composite could not be transferred directly. Instead, we evaluated platform-adapted, conceptually aligned modules within their corresponding spatial compartments: epithelial injury/remodeling and epithelial barrier programs in epithelial regions, macrophage remodeling in macrophage-enriched regions, and type 2 immune context in immune regions. Region of interest (ROI)-level summaries were used as the analytical units. Cross-compartment relationships were assessed descriptively to determine whether epithelial injury/remodeling features showed partial coupling with macrophage remodeling or type 2 immune context across ROIs.
2.8. Bulk Transcriptomic Analysis
Bulk transcriptomic cohorts were used as a framework-level replication layer using platform-adapted signatures. Two label-explicit cohorts, GSE72713 (n = 9; Control, 3; non-eosinophilic CRSwNP, 3; eosinophilic CRSwNP, 3) and Ishino (n = 26; source groups Control, 6; nonECRS, 8; ECRS/Asp, 12; the latter two groups were treated here as nominally non-eosinophilic and eosinophilic CRSwNP labels, respectively), were used to examine whether nominal non-eosinophilic and eosinophilic CRSwNP labels showed incomplete concordance with bulk molecular burden. Additional control-versus-polyp cohorts included GSE179265 (CT, 7; NP, 17), GSE136825 (CT, 28; NP, 42), and GSE36830 (CT, 6; NP, 12) to assess broader case-versus-control generalizability. Because bulk RNA-seq averages multiple cell types and does not provide compartment-resolved fibroblast, epithelial, myeloid, or T/NK states, these analyses were not treated as an exact reconstruction of the discovery scRNA-seq four-axis composite.
The main bulk analysis used four conceptually aligned signatures. ‘Type2_Context’ was a curated type 2 context panel (‘IL4’, ‘IL5’, ‘IL13’, ‘HPGDS’, ‘GATA3’). ‘Myeloid_Remodeling’ was a curated myeloid/remodeling panel (‘CD68’, ‘CCL18’, ‘MMP9’, ‘FN1’, ‘SPP1’). ‘Wounding_Program’ was based on the public ‘GOBP_RESPONSE_TO_WOUNDING’ gene set, and ‘Barrier_Associated_Program’ was based on ‘HALLMARK_APICAL_JUNCTION’. Signature scores were calculated by ssGSEA and z-scaled within each cohort. A platform-adapted bulk composite was calculated as the sum of scaled type 2 context, wounding, and myeloid remodeling scores minus the scaled barrier-associated score. This bulk composite was used to evaluate directionality and relative effect magnitude rather than to claim exact identity with the discovery scRNA-seq composite.
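The platform-adapted bulk composite can be written out as a minimal Python sketch. The function names and dictionary keys are hypothetical, and the inputs are assumed to be per-sample signature scores from a single cohort, so that z-scaling happens within that cohort as described above.

```python
from statistics import mean, stdev


def zscale(values):
    """Z-scale scores across the samples of one cohort (sample SD, n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def bulk_composite(cohort):
    """cohort: dict with keys 'type2', 'wounding', 'myeloid', 'barrier',
    each mapping to per-sample signature scores from the same cohort.

    Returns scaled type 2 + wounding + myeloid minus scaled barrier.
    """
    z = {name: zscale(scores) for name, scores in cohort.items()}
    return [
        t + w + m - b
        for t, w, m, b in zip(z["type2"], z["wounding"], z["myeloid"], z["barrier"])
    ]
```

Calling `bulk_composite` once per cohort keeps the scaling cohort-specific, so composite values are comparable within a cohort but not as absolute values between cohorts.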
To assess whether bulk conclusions depended on selected curated panels, we performed alternative-signature sensitivity analyses. These alternative definitions were used only for sensitivity analyses and did not replace the main bulk analysis. The type 2 context panel was replaced by a public type 2 immune response gene set, and the myeloid panel was expanded to include additional myeloid/remodeling markers (‘CD163’, ‘MRC1’, ‘MSR1’, ‘POSTN’) in addition to the main myeloid genes. We then recalculated alternative bulk composites and compared original and alternative scores using Spearman correlations, effect sizes, and high-burden tertile overlap.
2.9. Statistical Analysis
The sample or ROI, rather than individual cells, was used as the primary analytical unit for burden and module comparisons. In the discovery framework, scRNA-seq data were aggregated to sample-level pseudobulk profiles before GSVA/ssGSEA scoring. In epithelial, immune, and validation scRNA-seq analyses, cell-level module scores were summarized at the sample level before group comparisons. In GeoMx analyses, ROI-level or compartment-resolved ROI summaries were used. This strategy was used to reduce pseudoreplication arising from treating cells or subregions from the same biological sample as independent observations.
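The sample-level summarization step can be illustrated with a brief Python sketch. The function name and the (sample, score) input layout are assumptions; the point is only that downstream tests then see one value per biological sample, not one per cell.

```python
from collections import defaultdict
from statistics import mean


def sample_level_means(cell_scores):
    """cell_scores: iterable of (sample_id, module_score), one pair per cell.

    Returns {sample_id: mean module score}, so each biological sample
    contributes a single observation to group comparisons.
    """
    by_sample = defaultdict(list)
    for sample_id, score in cell_scores:
        by_sample[sample_id].append(score)
    return {sample_id: mean(scores) for sample_id, scores in by_sample.items()}
```

With this summary, a sample that contributes thousands of cells carries the same weight in a rank-sum test as one that contributes a few hundred.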
Pairwise group comparisons were performed primarily using two-sided Wilcoxon rank-sum tests. P values were adjusted for multiple testing using the Benjamini–Hochberg false discovery rate method where applicable. Effect sizes were summarized using Cohen’s d for case-versus-control or high-versus-low comparisons. For the bulk multi-cohort effect-size summary, 95% confidence intervals for Cohen’s d were estimated by bootstrap resampling. Correlations were assessed using Pearson or Spearman correlation as appropriate. Spatial cross-compartment correlations were interpreted descriptively. Analyses and visualization were performed in R (version 4.2.2; R Foundation for Statistical Computing, Vienna, Austria; https://www.r-project.org/) using Seurat (version 4.4.0; https://satijalab.org/seurat/), GSVA (version 1.46.0), edgeR (version 3.40.2), msigdbr (version 25.1.1), ggplot2 (version 4.0.2), ggpubr (version 0.6.3), dplyr (version 1.2.0), tidyr (version 1.3.2), Slingshot (version 2.6.0), and related packages.
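Two of the statistical steps above, Benjamini–Hochberg adjustment and a bootstrap confidence interval for Cohen's d, can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, Cohen's d uses the pooled standard deviation, and the interval is a simple percentile bootstrap with within-group resampling, whereas the text specifies only "bootstrap resampling".

```python
import random
from statistics import mean, variance


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def cohens_d(a, b):
    """Cohen's d (a minus b) using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled ** 0.5


def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Cohen's d, resampling within each group."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        ra = [rng.choice(a) for _ in a]
        rb = [rng.choice(b) for _ in b]
        # Skip resamples where both groups are constant (pooled SD of zero).
        if len(set(ra)) > 1 or len(set(rb)) > 1:
            stats.append(cohens_d(ra, rb))
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper
```

With group sizes as small as those in the label-explicit cohorts, percentile intervals of this kind can be wide and unstable, so they are best read as rough indicators of uncertainty.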
4. Discussion
Our integrative multimodal analyses suggest that nominal eosinophilic labels in CRSwNP, while clinically informative, do not fully capture the molecular complexity of diseased tissue. Across single-cell, spatial transcriptomic [24], and bulk layers, the most reproducible disease-burden signals centered on epithelial injury and myeloid remodeling axes. This pattern aligns with growing evidence that CRSwNP tissue biology is shaped not solely by conventional inflammation, but by epithelial dysfunction, remodeling, and complex immune–epithelial interactions [14,16,25,26,27].
While eosinophilic subclassification remains clinically meaningful [1,2,9], our analyses suggest that it captures only a fraction of the underlying tissue-state heterogeneity. In the discovery cohort, histologic eosinophilic labels correlated with molecular burden, but this relationship was incomplete: nominally non-eosinophilic CRSwNP samples were not restricted to the low-burden end, frequently extending into the intermediate-to-high range. This discordance was also observed across independent label-explicit bulk cohorts (GSE72713 and Ishino). In both datasets, eosinophilic samples tended to occupy the higher-burden spectrum, whereas non-eosinophilic cases were distributed more broadly, resulting in substantial overlap across traditional clinical boundaries. These findings support an exploratory, graded model of CRSwNP molecular organization rather than a strictly categorical one.
This molecular continuum is biologically plausible in light of the current classification landscape. While eosinophilic subclassification retains clinical utility, its operational definitions vary widely across studies, regions, and practice settings [9,10,11]. For instance, EPOS 2020 considers tissue eosinophilia to be one indicator supporting type 2 inflammatory disease, not a universal, standalone histologic standard [1]. Conversely, JESREC utilizes a composite framework incorporating clinical, radiologic, and blood eosinophil metrics, avoiding reliance on a single tissue threshold [10]. Similarly, updated Chinese guidelines emphasize the absence of unified diagnostic criteria, acknowledging diverse metrics such as eosinophil proportions, counts per high-power field, and recurrence-based thresholds [9,11]. Geographic variations in CRS inflammatory signatures, particularly regarding type 2 prevalence and eosinophilic predominance, add another layer of complexity [17]. Against this backdrop, the incomplete correspondence between nominal eosinophilic labels and molecular burden is compatible with underlying biological heterogeneity.
A major observation of this study is that epithelial injury and remodeling features provided the most stable disease-burden signals across analytical platforms. In the discovery single-cell dataset, higher burden was reflected mainly by epithelial state reorganization, amplified wounding programs, and barrier alterations. In the independent validation cohort, the largest shifts again centered on the wounding and barrier axes, whereas epithelial state and immune features were less uniform. The bulk datasets showed a similar pattern: composite burden exhibited the strongest and most consistent case-versus-control effect, while epithelial wounding and myeloid remodeling also demonstrated broadly concordant positive effects. By comparison, the type 2 context was weaker, and barrier features, although directionally consistent, varied more across datasets. These observations align with the emerging consensus that CRSwNP is shaped not solely by immune inflammation, but by epithelial dysfunction, altered differentiation, impaired mucosal defense, and chronic remodeling [14,16,25]. Indeed, tight-junction disruption, barrier breakdown, and defective host defense have been repeatedly implicated in CRS pathophysiology [14,16] and likely perpetuate persistent tissue remodeling [28]. Recent single-cell and spatial atlases similarly highlight epithelial programs and immune–epithelial crosstalk, particularly those involving basal progenitor trajectories and remodeling-linked states, as central architectural features of nasal polyp biology [16,18]. Our findings extend this literature by suggesting that epithelial injury and remodeling dimensions are among the more reproducible axes spanning nominal eosinophilic categories. The immunofluorescence panel provided qualitative in situ support for epithelial remodeling features, not quantitative validation.
Notably, individual modules did not perform uniformly across datasets. Type 2 inflammation remains clinically important and therapeutically actionable in severe CRSwNP, as evidenced by current guidelines and the efficacy of type 2-targeted biologics [1,5,29,30]. However, in our bulk replication cohorts, the type 2 signature was less consistent than composite burden, epithelial wounding, or myeloid remodeling. Similarly, barrier and host-defense features displayed substantial cross-cohort variance. This variability does not diminish the clinical relevance of type 2 inflammation; it indicates that tissue-level transcriptomic organization cannot be reduced to a single inflammatory axis. Different analytical platforms and cohorts may capture immune signals differently, particularly within biologically heterogeneous CRSwNP populations [17,18,25]. Furthermore, because this study relies on retrospective datasets spanning diverse sampling schemes, compartments, and assay structures, these cross-platform differences warrant cautious interpretation. Technical factors, including tissue-site heterogeneity, cellular averaging in bulk RNA-seq, and within-individual dependence in single-cell analyses, can affect apparent effect sizes across datasets [31].
Among non-epithelial components, the most consistent signals arose from myeloid remodeling features rather than a uniformly strong type 2 context. In the validation single-cell dataset, myeloid remodeling provided the clearest non-epithelial signal, whereas the type 2 T/NK axis offered weaker directional support. Similarly, in the spatial dataset, the strongest compartment-level support emerged from epithelial injury and macrophage remodeling profiles, while type 2 immune scores were directionally concordant but more modest. The external layers, therefore, provide directional evidence for selected framework features, without exact one-to-one replication; in particular, the validation single-cell and GeoMx datasets lacked a stromal/fibroblast compartment comparable to the discovery extracellular-matrix axis. This pattern fits mechanistic frameworks linking macrophage remodeling, coagulation cascade imbalances, and extracellular matrix organization to nasal polyp biology, including evidence for excessive fibrin deposition and impaired fibrinolysis [28]. Recent cellular studies also connect specific myeloid states with type 2 pathophysiology and immune recruitment in eosinophilic CRSwNP [18]. Consequently, epithelial and myeloid remodeling states appear to be relatively stable organizational dimensions within our burden framework, even when immune signals vary across platforms [16,18]. This tissue-state view is consistent with recent interpretations that eosinophil- and neutrophil-associated inflammation in CRS should be considered within a broader tissue context rather than treated as mutually exclusive categories [32]. Histopathologic descriptions of CRSwNP nasal mucosa, including stromal edema, basement membrane thickening, and eosinophil-predominant inflammatory infiltrates, further fit this interpretation [33]. Eosinophilic subclassification, therefore, remains clinically relevant, but pathology-based labels and multidimensional molecular features appear to capture partially overlapping aspects of CRSwNP tissue biology. Prospective cohorts with harmonized eosinophilic definitions, matched single-cell/spatial/bulk profiling, and quantitative tissue validation will be needed before prognostic or therapeutic utility is inferred.