Multi-Algorithm Analysis Reveals Pyroptosis-Linked Genes as Pancreatic Cancer Biomarkers

Simple Summary
Pancreatic ductal adenocarcinoma (PDAC) is often diagnosed at advanced stages, resulting in limited treatment options and poor survival rates. To address this challenge, we conducted a comprehensive analysis of pyroptosis-related genes using advanced algorithms. Our study, involving 1273 PDAC cases, identified 357 pyroptosis-related genes. Notably, BHLHE40, IL18, BIRC3, and APOL1 were found to be related to unfavourable PDAC outcomes and were validated through experiments and multiple datasets. We developed a novel model and an accessible nomogram to predict PDAC prognosis. Our research enhances our understanding of PDAC and has significant implications for both research and clinical practice.

Abstract
Pancreatic ductal adenocarcinoma (PDAC) is often diagnosed at late stages, limiting treatment options and survival rates. Pyroptosis-related gene signatures hold promise as PDAC prognostic markers, but limited gene pools and small sample sizes hinder their utility. We aimed to enhance PDAC prognosis with a comprehensive multi-algorithm analysis. Using R, we employed natural language processing and latent Dirichlet allocation on PubMed publications to identify pyroptosis-related genes. We collected PDAC transcriptome data (n = 1273) from various databases, conducted a meta-analysis, and performed differential gene expression analysis on tumour and non-cancerous tissues. Cox and LASSO algorithms were used for survival modelling, resulting in a pyroptosis-related gene expression-based prognostic index. Laboratory and external validations were conducted. Bibliometric analysis revealed that pyroptosis publications focus on signalling pathways, disease correlation, and prognosis. We identified 357 pyroptosis-related genes, validating the significance of BHLHE40, IL18, BIRC3, and APOL1. Elevated expression of these genes strongly correlated with poor PDAC prognosis and guided treatment strategies. Our accessible nomogram model aids in PDAC prognosis and treatment decisions. We established an improved gene signature for pyroptosis-related genes, offering a novel model and nomogram for enhanced PDAC prognosis.


Introduction
Pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest cancers, often diagnosed at an advanced metastatic stage [1][2][3]. Despite significant advancements in treatment modalities, such as surgery, chemotherapy, and targeted therapies, the 5-year overall survival rate for PDAC patients in 2022 remained dishearteningly low at 11% [4]. This bleak prognosis primarily arises from 90% of PDAC cases being identified in advanced stages, extending beyond the pancreas and spreading systemically, with over 50% developing metastases [4]. This dire situation underscores the urgent need for new biomarkers to assist in PDAC risk assessment and to identify novel therapeutic targets.
Recent studies have shed light on the detrimental effects of first- and second-line chemotherapeutic drugs, which induce pyroptosis and exacerbate the progression and chemoresistance of PDAC [5]. Consequently, pyroptosis-related gene signatures have emerged as valuable tools for predicting prognosis in PDAC [6][7][8][9]. Pyroptosis, a relatively recently described form of inflammatory caspase-induced lytic programmed cell death, is notably activated in infected cells to eliminate pathogenic niches, incite inflammation, and attract immune cells [10]. Characterised by cell swelling, membrane rupture, the release of cellular contents, and initiation of a potent inflammatory response [11,12], pyroptosis is initiated by inflammatory caspases-1/4/5/11, which cleave and activate gasdermin D (GSDMD) to execute the process. The active N-terminal fragment of GSDMD binds to membrane lipids, disrupting membrane integrity and forming pores, ultimately causing changes in cell osmotic pressure, cellular swelling, and membrane rupture [13]. This cascade of events includes the maturation and secretion of numerous proinflammatory cytokines like IL-18 and IL-1β, which activate active and passive immunity, fostering a robust inflammatory response [14].
Pyroptosis plays a pivotal role in the immune response against infections, especially viral ones like HIV and COVID-19 [15,16]. However, its role in cancer development and treatment is complex, influenced by factors such as tumour heterogeneity, biological behaviours, and epigenetic characteristics [17]. Studies have shown that pyroptosis can contribute to the inflammatory microenvironment of tumours, promoting tumour cell growth and invasion [14]. Conversely, pyroptosis has been demonstrated to activate the immune response and enhance the effectiveness of immunotherapy. Moreover, various chemotherapy agents, such as decitabine (DAC), iron oxide, and glucose oxidase, have shown promise in inducing pyroptosis in cancer cells, thus triggering antitumour immunotherapeutic responses [17]. Nevertheless, our understanding of the function of pyroptosis in PDAC, particularly its impact on prognosis, remains limited. In this study, we harnessed natural language processing (NLP) to analyse the pyroptosis-related literature within the PubMed database comprehensively.
Our investigation focused on identifying research hotspots and extracting pertinent genes related to pyroptosis. Subsequently, we performed a meta-analysis using publicly available PDAC sequencing data, explicitly targeting pivotal genes involved in pyroptosis. Building upon this analysis, we developed a prognostic risk model based on the upregulation of four leading candidate pyroptosis-related genes to predict PDAC prognosis more accurately. Ultimately, our research underscores the significant role of heightened pyroptosis in PDAC prognosis.

Retrieval and Downloading of Pyroptosis-Related Publications
To access publications related to "Pyroptosis", we employed the pubquery package in R (version: 4.2.1). This package facilitated the retrieval and downloading of pertinent PubMed publications. A comprehensive record of search results in XML format, spanning publications until 31 December 2022, was obtained. Excel (Microsoft Corporation, Redmond, WA, USA) and R were primarily used for visualisation.

Natural Language Processing (NLP) and Latent Dirichlet Allocation (LDA)
To extract detailed publication data, including publication year, region, abstract, and research type, the Python programming language (version 3.11.1), known for its efficiency in object-oriented programming, was employed. The LDA technique was utilised to discern research topics covered in the publications. For this analysis, we set the number of identified topics to 50, considering factors such as appropriate perplexity, redundancy, and legibility.
Using the LDA algorithm, we computed topic probabilities for each article and assigned a topic to each publication based on these probabilities. Heatmaps were generated to visually represent research topics and publication dates [18,19]. For cluster analysis and the creation of thematic networks to discern relationships between themes, we employed the Louvain algorithm within Gephi software (version 0.9.2).
To establish connections between themes, we identified the two topics with the highest attribution probability in each article and counted their co-occurrences within each document. The codes used for the LDA analysis are described (Supplemental Material: Supplemental Information S1).
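The co-occurrence step described above can be illustrated with a minimal sketch. It assumes that each article is already represented by a list of LDA topic probabilities; the function names and the toy probabilities are hypothetical and are not taken from the supplemental code.

```python
from collections import Counter


def top_two_topics(topic_probs):
    """Return the indices of the two most probable topics, in ascending order."""
    ranked = sorted(range(len(topic_probs)), key=lambda i: topic_probs[i], reverse=True)
    return tuple(sorted(ranked[:2]))


def cooccurrence_counts(doc_topic_matrix):
    """Count how often each pair of topics forms the top two of a document."""
    counts = Counter()
    for probs in doc_topic_matrix:
        counts[top_two_topics(probs)] += 1
    return counts


# Hypothetical document-topic probabilities (3 documents, 4 topics).
docs = [
    [0.60, 0.30, 0.05, 0.05],
    [0.50, 0.10, 0.35, 0.05],
    [0.25, 0.55, 0.10, 0.10],
]
edges = cooccurrence_counts(docs)
print(edges)
```

Each key of `edges` is a topic pair and each value is an edge weight, which is the kind of weighted edge list a network tool such as Gephi can import for community detection.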

Retrieval and Acquisition of Pyroptosis-Related Genes
To compile a comprehensive list of pyroptosis-related genes, the Genes and Expression section of the NIH National Library of Medicine (https://www.ncbi.nlm.nih.gov/guide/genes-expression/ (accessed on 9 January 2024)) was utilised to identify gene symbols. Abstracts of relevant publications were retrieved. All genes and gene names appearing in these abstracts were extracted, and their frequency of occurrence was recorded. Additional verification was conducted to ensure data integrity and establish a correlation between the genes and pyroptosis, referencing databases such as GSEA (https://www.gsea-msigdb.org/gsea/index.jsp (accessed on 9 January 2024)), GeneCards (https://www.genecards.org/ (accessed on 9 January 2024)), and KEGG (https://www.genome.jp/kegg/ (accessed on 9 January 2024)). Inclusion criteria: Publications must have abstracts containing gene names or gene symbols. Exclusion criteria: Our researchers manually examined each identified gene, requiring explicit evidence in papers demonstrating the gene's association with pyroptosis. This association should originate from (1) molecular biology experiments, including but not limited to RT-qPCR at the transcriptional level and protein-level detection represented by Western blot; (2) sequencing data suggesting correlation; and (3) agreement by at least two authors that the gene is related to pyroptosis. A union of pyroptosis-related genes was created by consolidating information from all these databases.
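The extraction-and-counting step can be sketched as follows. This is only an illustration under simplifying assumptions: the symbol set and the abstracts are invented, matching is exact and case-sensitive, and spelling variants such as "IL-18" versus "IL18" are not reconciled, which a real pipeline would need to handle before manual review.

```python
import re
from collections import Counter

# Tokens made of letters, digits and hyphens, so symbols such as "CASP1" stay intact.
TOKEN = re.compile(r"[A-Za-z0-9-]+")


def count_gene_mentions(abstracts, known_symbols):
    """Count every occurrence of a known gene symbol across all abstracts."""
    counts = Counter()
    for text in abstracts:
        for token in TOKEN.findall(text):
            if token in known_symbols:
                counts[token] += 1
    return counts


# Hypothetical inputs for illustration only.
symbols = {"GSDMD", "CASP1", "IL18", "NLRP3"}
abstracts = [
    "NLRP3 activates CASP1, which cleaves GSDMD.",
    "GSDMD pore formation releases IL18; GSDMD is required.",
]
gene_counts = count_gene_mentions(abstracts, symbols)
print(gene_counts.most_common())
```

Sorting by frequency gives a ranked candidate list, after which each gene would still be checked manually against the inclusion and exclusion criteria.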

Meta-Analysis of Prognostic Implications of Pyroptosis-Related Core Genes in PDAC
To assess the overall survival (OS) implications of pyroptosis core genes in PDAC, we computed the hazard ratio (HR) for each gene using the log-rank test in R. If no significant heterogeneity was observed (I² < 50% and p > 0.05), we pooled the HRs of each pyroptosis gene from different bulk sequencing-based cohorts using a fixed-effects model. The Meta package in R (https://cran.r-project.org/web/packages/meta/index.html (accessed on 9 January 2024)) was used for conducting the meta-analysis, and GraphPad (https://www.graphpad.com/, Version 9 (accessed on 9 January 2024)) was used for visualisation. The meta-analysis included all transcriptional data mentioned above in Section 2.3.
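For readers unfamiliar with fixed-effects pooling, the sketch below shows the standard inverse-variance approach on the log-HR scale, together with Cochran's Q and I². It assumes that each cohort reports an HR with a 95% Wald-type confidence interval; the three cohorts are invented, and the code is not the R workflow used in this study.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(HR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def fixed_effect_pool(studies):
    """Pool (hr, ci_lower, ci_upper) tuples; return pooled HR, its 95% CI and I2 (%)."""
    log_hrs = [math.log(hr) for hr, _, _ in studies]
    weights = [1 / se_from_ci(lo, hi) ** 2 for _, lo, hi in studies]
    total_w = sum(weights)
    pooled_log = sum(w * y for w, y in zip(weights, log_hrs)) / total_w
    pooled_se = math.sqrt(1 / total_w)
    # Cochran's Q measures dispersion of study effects around the pooled effect.
    q = sum(w * (y - pooled_log) ** 2 for w, y in zip(weights, log_hrs))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    ci = (
        math.exp(pooled_log - 1.96 * pooled_se),
        math.exp(pooled_log + 1.96 * pooled_se),
    )
    return math.exp(pooled_log), ci, i2


# Hypothetical cohorts: (HR, lower 95% limit, upper 95% limit).
cohorts = [(0.85, 0.70, 1.03), (0.90, 0.78, 1.04), (0.88, 0.72, 1.08)]
pooled_hr, (ci_low, ci_high), i2 = fixed_effect_pool(cohorts)
print(round(pooled_hr, 3), round(ci_low, 3), round(ci_high, 3), round(i2, 1))
```

Under the rule stated above, a fixed-effects pooled estimate of this kind would be reported only when I² stays below 50%.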

mRNA Extraction and RT-qPCR
The RNeasy Mini Kit (QIAGEN, Hilden, Germany) was used for mRNA extraction. Following the instructions of the manufacturer, reverse transcription was performed using the High-Capacity RNA-to-DNA™ Kit (Thermo Fisher Scientific, Dreieich, Germany). RT-qPCR was conducted using the PowerUp™ SYBR™ Green Master Mix (Thermo Fisher Scientific, Germany). Primer sequences are available (Supplemental Materials: Table S3), and the primers were synthesised by Eurofins Genomics (Ebersberg, Germany). The primer concentration used was 500 nM. Gene expression levels were normalised to the housekeeping gene glyceraldehyde-3-phosphate dehydrogenase (GAPDH). The qPCR conditions involved denaturation at 95 °C for 15 s, annealing at 60 °C for 15 s, and extension at 72 °C for 1 min, repeated for 40 cycles. The results are expressed as relative expression values, calculated using the 2^(-ΔΔCt) method [21].
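The relative quantification can be written out as a short worked example. The Ct values below are invented for illustration; the function simply applies the 2^(-ΔΔCt) formula with GAPDH as the reference gene and assumes roughly 100% amplification efficiency.

```python
def relative_expression(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Fold change of a target gene versus control using the 2^(-ddCt) method."""
    delta_sample = ct_target_sample - ct_ref_sample      # dCt in the sample of interest
    delta_control = ct_target_control - ct_ref_control   # dCt in the control sample
    delta_delta = delta_sample - delta_control           # ddCt
    return 2 ** -delta_delta


# Hypothetical Ct values: target gene and GAPDH in a tumour line and a control line.
fold_change = relative_expression(
    ct_target_sample=24.0,
    ct_ref_sample=18.0,
    ct_target_control=26.0,
    ct_ref_control=18.0,
)
print(fold_change)  # ddCt = (24 - 18) - (26 - 18) = -2, so the fold change is 4.0
```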

Identification of Key Prognostic Genes and Establishment of a Scoring System for Pyroptosis-Related Genes Prognostic Index
We performed a univariate Cox regression analysis to assess the survival significance of pyroptosis-related genes, considering a significance threshold of p < 0.05. Subsequently, we employed the least absolute shrinkage and selection operator (LASSO) regression analysis to narrow the selection of candidate genes. The LASSO regression helped determine the optimal penalty parameter (λ) based on the minimum parameter. Subsequently, we calculated the regression coefficients of these selected pyroptosis-related genes, and a multivariate Cox regression analysis was performed to construct a scoring system. The following formula represents the scoring system:

$$\text{Risk score} = \sum_{k} \text{Coef}(\text{Gene}_k) \times \text{expr}(\text{Gene}_k)$$

In this formula, each gene is denoted as Gene_k, where k represents the gene index. Coef(Gene_k) corresponds to the regression coefficient of Gene_k obtained from the multivariate Cox regression analysis, while expr(Gene_k) represents the expression level of Gene_k. The risk score for an individual is calculated by summing the product of each gene's coefficient and expression level.
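As a minimal sketch of the scoring rule described above (the sum over genes of coefficient times expression), the example below computes a risk score for one patient. The coefficients and expression values are placeholders chosen for easy arithmetic; they are not the fitted coefficients of our model.

```python
def risk_score(coefficients, expression):
    """Sum of Coef(Gene_k) * expr(Gene_k) over all genes in the signature."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


# Placeholder coefficients (NOT the fitted values) for the four signature genes.
example_coefficients = {"BHLHE40": 0.2, "IL18": 0.1, "BIRC3": 0.3, "APOL1": 0.4}

# Hypothetical normalised expression values for a single patient.
example_expression = {"BHLHE40": 5.0, "IL18": 2.0, "BIRC3": 1.0, "APOL1": 3.0}

score = risk_score(example_coefficients, example_expression)
print(round(score, 3))  # 0.2*5 + 0.1*2 + 0.3*1 + 0.4*3 = 2.7
```

Because the score is linear in expression, a gene with a positive coefficient raises the score as its expression increases, which matches the interpretation of upregulated genes as risk factors.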

Validation of the Pyroptosis-Related Genes Prognostic Index Scoring System
We conducted a series of analyses to validate the effectiveness of our prognostic index scoring system based on pyroptosis-related genes. First, we generated receiver operating characteristic (ROC) curves for 1-year, 3-year, and 5-year survival using the "survivalROC" package (Version: 1.0.3.1). By calculating the corresponding area under the curve (AUC), we assessed the predictive accuracy of our scoring system. Furthermore, we categorised all patients into low-risk and high-risk groups using the optimal cut-off value of the risk score obtained from the "extra value" package. We utilised Kaplan-Meier survival curves to confirm the prognostic difference between these two groups. Considering the importance of external validation for prognostic features, we employed the ICGC-PACA-CA datasets (available at https://dcc.icgc.org/projects/PACA-CA (accessed on 9 January 2024)) to validate the prognostic value of our scoring system.
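To make the grouping and survival-curve steps concrete, the sketch below splits patients by a given cut-off and computes a Kaplan-Meier estimate in plain Python. The cut-off, follow-up times, and event indicators are invented, and how the optimal cut-off is chosen is outside the scope of this sketch.

```python
def split_by_cutoff(scores, cutoff):
    """Label each risk score as 'high' (above the cut-off) or 'low' (otherwise)."""
    return ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the patient died at times[i] and 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up in months with event indicators (1 = death, 0 = censored).
groups = split_by_cutoff([0.2, 1.5, 0.9], cutoff=0.9)
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
print(groups, curve)
```

In practice, one curve would be computed per risk group and the two curves compared with a log-rank test.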

Evaluation of Predictive Value and Construction of a Nomogram Prediction Model
To further assess our risk groupings' predictive value, we performed univariate and multivariate Cox regression analyses using the TCGA and ICGC-CA datasets. These analyses aimed to determine the association between risk groupings and the prognosis of PDAC patients. Additionally, we constructed nomograms for the TCGA and ICGC-CA datasets using the "rms" R package (open source package). These nomograms were designed to predict the survival probabilities of individuals with PDAC at 1, 3, and 5 years. Calibration curves were created to assess the accuracy of the nomogram predictions.

Statistical Analysis
The RT-qPCR data are presented as mean values and standard deviations from a minimum of three independent experiments. Statistical analysis was performed using R Studio (version: 2023.12.0+369) and Excel (version: 2019). In GSEA, a false discovery rate (FDR) threshold of 5% was applied to account for multiple testing. All p-values were calculated based on two-sided statistical tests, and p < 0.05 was considered statistically significant. Significance levels are * p < 0.05, ** p < 0.01, and *** p < 0.001.

Study Design
We employed data mining and NLP to analyse pyroptosis-related literature from the PubMed database (Figure 1). Our study aimed to identify key research trends, hotspots, and genes relevant to pyroptosis. In the discovery phase, we undertook a meta-analysis using public PDAC sequencing data, focusing on genes closely related to pyroptosis. In the training phase, we built a prognostic risk model based on the upregulation of four major pyroptosis-related genes to refine PDAC prognosis prediction. In the validation phase, we tested our model on new data to gauge its accuracy. We confirmed the model's effectiveness in predicting PDAC prognosis using tests like ROC and survival curves.

Identification of Pyroptosis-Related Genes through Bibliometric Analysis
We undertook a bibliometric analysis to pinpoint manuscripts discussing genes related to pyroptosis. Our assessment encompassed 4970 publications up to 31 December 2022. Through evaluating the relationship between publication year (x) and number of publications (y), we discerned that the function y = 1.615e^(0.39x) could represent the data. This exponential correlation suggests a swift surge in publication volume. By the close of 2022, publications tallied up to 2584, but this figure is projected to ascend to 3810 by the culmination of 2023, showcasing a marked upward trend (Figure 2A).
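An exponential trend of this form can be fitted by ordinary least squares on the logarithm of the counts. The sketch below shows the idea on synthetic points generated from the reported curve; it does not use the actual yearly counts, and how the year index x is defined is an assumption of the example.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by linear least squares on log(y); return (a, b)."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    b = sxy / sxx
    a = math.exp(mean_log - b * mean_x)
    return a, b


# Synthetic data generated from the reported curve, with x as an assumed year index.
xs = list(range(1, 11))
ys = [1.615 * math.exp(0.39 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(round(a, 3), round(b, 3))
```

With b = 0.39, the fitted curve implies growth by a factor of exp(0.39), roughly 1.48, per unit of x.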

Based on our LDA results, we earmarked prevalent themes in current pyroptosis research. The primary topic was "Signal pathways research", closely followed by "Disease-related research" and "Risk and prognosis research" (Figure 2B). Within "Signalling Pathways Research", key areas included "Inflammasome", "Caspase", and "GSDME and Therapy". In the "Disease-Related Research" domain, pivotal subjects included "IL Expression and Regulation", "Reperfusion Injury", and "Pyroptosis Model". The "Risk and Prognosis Research" dimension covered chronic illnesses, tumours, and infections. Moreover, there is considerable exploration around the roles of "GSDMD" and "lncRNA" in pyroptosis.

Transcriptome Meta-Analysis Suggests an Essential Role of Pyroptosis Genes in PDAC Prognosis
To underscore the role of pyroptosis signalling in PDAC, we carried out ssGSEA and found a notable downregulation of the pyroptosis signalling pathway in tumour samples compared to adjacent non-tumour tissues. The normalised enrichment score was −1.851, supported by a false discovery rate (FDR) q-value under 0.01 (Figure 3A). To delve deeper into the prognostic implications of the key genes driving pyroptosis (CASP1, CASP3, CASP4, CASP5, GSDMA, GSDMB, GSDMC, GSDMD, and GSDME), we analysed eight PDAC transcriptome datasets, totalling 1273 PDAC patient samples (Figure 3B). A meta-analysis of these datasets indicated that higher CASP1 expression correlated with a 12% decrease in death risk for PDAC patients, with HR = 0.88, 95% CI 0.80-0.96 (Figure 3C). In contrast, elevated expression of GSDMC (HR = 1.13, 95% CI 1.01-1.28; Figure 3D) and GSDME (HR = 1.15, 95% CI 1.01-1.28; Figure 3E) correlated with a 13% and 15% increase in the risk of PDAC-related mortality, respectively. For other pyroptotic genes, no statistically significant prognostic relationships were found. In essence, our data emphasise the pronounced impact of certain pyroptotic genes on the prognosis of PDAC patients.

Kaplan-Meier survival curves showed substantially poorer survival for the high-risk group in both the TCGA and ICGC sets (Figure 6H,I). Specifically, high-scoring TCGA patients had a median survival of 42.58 ± 20.74, whereas their low-scoring counterparts had a median survival of 61.50 ± 3.41 (p = 0.003). For the ICGC set, the survival was 38.00 ± 5.16 for the high-scoring group and 56.08 ± 5.69 for the low-scoring one (p = 0.025). Together, these results suggest that this pyroptosis gene-based scoring system holds promise as a prognostic tool for stratifying PDAC patients, potentially guiding clinical decisions.

Construction and Evaluation of a Prognostic Nomogram Based on Core Pyroptosis Gene Expression
To evaluate the clinical significance of our scoring system, we ran a univariate analysis on PDAC patients using the TCGA database. This encompassed age, gender, tumour grade, N stage, T stage, and our specific score. The study yielded an HR of 2.823 (95% CI, 1.653-4.823) for our risk score, highlighting its significance with a p-value of <0.001 (Figure 7A). This suggests that individuals in the high-risk category face almost three times (2.8-fold) the mortality risk compared to those in the low-risk group. Expanding our examination, we carried out a multivariate analysis factoring in patient age, tumour grade, and risk score. This confirmed the high-risk group as a substantial risk determinant for PDAC, evidenced by an HR of 2.77 (1.59-4.82) and a p-value < 0.001 (Figure 7B). A heatmap underscores the strong association between our score and diverse clinical determinants. We then crafted a nomogram integrating these clinical parameters to offer a precise prognosis evaluation (Figure 7C,D). To evaluate the predictive precision of our model, we reviewed the calibration curve across both the TCGA training set and the ICGC database, which reflected commendable accuracy. Our scoring model is open to the public and primed for clinical application, promoting its effortless incorporation into clinical decision-making processes. The compelling evidence from our findings implies that this scoring paradigm is a robust prognostic instrument for PDAC. Our scoring tool is available at: https://nomogram-uniheidelberg.shinyapps.io/DynNomapp/ (accessed on 9 January 2024), Figure 7E,F. See (Supplemental Information S2, Table S1 and Figure S1) for examples.
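As a rough consistency check on the reported interval, the standard error and a two-sided p-value can be recovered from the HR and its 95% CI. The sketch assumes a Wald-type interval that is symmetric on the log scale and a normal approximation; it is an illustration, not the computation performed in R.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Recover SE of log(HR), the Wald z statistic and a two-sided p-value."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Univariate result reported above for the risk score.
se, z, p = wald_from_ci(2.823, 1.653, 4.823)
print(round(se, 3), round(z, 2), p < 0.001)
```

The recovered z statistic is close to 3.8, which is in line with the reported p-value below 0.001.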


Discussion
This study aimed to investigate the potential of pyroptosis-related gene expression as a prognostic signature for treatment decision making in PDAC. We employed NLP and LDA to screen approximately 5000 pyroptosis-related publications, thereby identifying key research areas in the field. Through search algorithms, we identified 357 genes related to pyroptosis. Meta-analysis confirmed a strong correlation between the expression of pyroptosis effector genes and PDAC prognosis. We analysed multiple transcriptome datasets to explore further the differential expression of these 357 genes in PDAC tumours and adjacent tissues. Subsequently, utilising various algorithms, we developed a PDAC prognosis assessment model based on the expression levels of four key pyroptosis-related genes. This model exhibited excellent predictive performance for PDAC prognosis. Notably, this study represents the first comprehensive analysis to identify key genes among known pyroptosis-related genes using text analysis followed by analysis in PDAC. We have rigorously assessed the nomogram model and made it publicly available online for interested users.
To comprehensively identify genes related to pyroptosis, we followed a two-step approach. First, we downloaded all publications related to pyroptosis and used manual gene retrieval methods and multiple databases. This process resulted in a total of 357 pyroptosis-related genes. We aimed to maximise the confirmation of these genes and explore their expression and functional relevance in PDAC. Previous studies that constructed a pyroptosis-related gene signature in PDAC, such as the one conducted by Huang et al. [22], primarily used the literature and reviews to identify candidate genes, confirming around 33 pyroptosis-related genes for subsequent investigation and modelling. Regardless of the approach used for model construction, the impact of pyroptosis-related genes on the prognosis of PDAC remains evident. Our approach allowed us to adopt a more comprehensive perspective to avoid missing potential candidate genes and ensure knowledge completeness.
Additionally, we employed bibliometric analysis to assess the research landscape of pyroptosis, utilising LDA topic modelling from machine learning to identify significant research focuses. We found that the research on pyroptosis is still in its early stages, with significant attention directed towards "Signalling", followed by "Disease-related" and "Risk and prognosis". Furthermore, we discovered substantial gaps in pyroptosis-related research concerning drug development and clinical applications. Our study introduces a novel algorithm that offers a deeper research perspective than traditional bibliometric analysis of the pyroptosis literature [23,24]. Using new NLP tools and large language models (LLMs) like ChatGPT, our research represents a small branch of the relevant studies and potentially provides a more detailed approach to furthering the field [25].
Our study deviated from typical modelling approaches by initially keeping the data separate and giving priority to experimental validation. Many studies encounter difficulties validating their findings after using mathematical or computational models to analyse and understand a particular subject or problem. For instance, Zhao et al., 2022 [26], merged multiple datasets to identify a lactic acid metabolism-related gene signature in lung adenocarcinoma. In contrast, to minimise batch effects, we refrained from merging the datasets and instead performed differential analysis individually for each gene in the four PDAC GEO datasets before intersecting the results. Dvinge et al., 2014 [27], highlighted that even rigorous studies using the TCGA database might mask tumour characteristics due to variations in sample processing among different research institutions, emphasising the importance of investigating the diversity between normal and tumour cells. Therefore, we prioritised validating the expression differences of tumour-related genes between normal and tumour cells and employed protein analysis from multiple standard PDAC cell lines and large-scale databases.
Despite drawing on hundreds of transcriptome sequencing samples and cross-validating across multiple batches, our RT-qPCR results were not fully consistent with the transcriptomic data, which may reflect PDAC cell line composition and tumour heterogeneity. One possible explanation for this discrepancy is that the established cell lines we validated consist solely of PDAC cells, while the transcriptomic data originated from PDAC tumour tissue from patients. Spatial transcriptomic studies by Ma et al., 2022 [23], suggested that PDAC, a highly heterogeneous tumour, comprises a mixture of tumour cells, inflammatory cells, fibrotic tissue, and normal pancreatic tissue. The proportion of PDAC cells can vary significantly, ranging from 10% to 90%, while sequencing captures the entire tissue, resulting in quantitative discrepancies [25,28]. On the other hand, our validation process successfully identified four genes that exhibited high expression in established PDAC cell lines, and we further confirmed their protein expression levels using the Human Protein Atlas database. This enables precise localisation and establishes a solid foundation for future research endeavours.
We have identified the clinical roles of four key pyroptosis-related genes for the first time, but their specific mechanisms of action still need to be explored further. For instance, BHLHE40 (basic helix-loop-helix family member e40), also known as DEC1 or HLHB2, has been found to control circadian rhythm and cell differentiation [29][30][31][32]. Single-cell sequencing data obtained by Wang et al., 2023 [31], revealed that BHLHE40-driven pro-tumour neutrophils exhibit hyperactivated glycolysis in the pancreatic tumour microenvironment, promoting adverse outcomes in PDAC. BIRC3 (baculoviral IAP repeat containing 3), a member of the family of inhibitors of apoptosis proteins (IAPs), regulates cell death and survival [33,34]. It possesses both anti-apoptotic and pro-pyroptotic functions, promoting cell survival and protecting against pyroptosis while triggering cell death through activation of caspase-1 [33]. BIRC3 is highly expressed in PDAC and may contribute to cancer progression by modulating cell survival and death [35,36]. However, further research is necessary to elucidate its precise mechanisms and develop targeted treatment strategies [37].
APOL1, known as apolipoprotein L1, encodes a protein involved in lysosomal degradation and lipid metabolism [38,39]. The primary focus of APOL1 research has been on kidney diseases, particularly its association with focal segmental glomerulosclerosis and chronic kidney disease [40]. Hu et al., 2012 [38], used a mass spectrometry-based pipeline to identify APOL1 as a novel PDAC biomarker. Xu et al., 2023 [41], employed single-cell analysis and machine learning and discovered that elevated APOL1 levels predict PDAC prognosis and endocrine metabolism. Furthermore, Lin et al., 2021 [42], demonstrated that APOL1 could activate the NOTCH1 signalling pathway, promoting PDAC proliferation while inhibiting apoptosis. Our study also observed an HR of 1.29 (1.11-1.50) for APOL1 in PDAC, revealing high expression at both mRNA and protein levels, thus further substantiating the need for extended APOL1 research.
Interleukin-18 (IL-18) is a proinflammatory cytokine implicated in immune response regulation and in the pathogenesis of diverse diseases, including cancer and inflammation [43]. This potent cytokine modulates immune responses and inflammation in PDAC [44]. IL-18 promotes cytokine production and stimulates the activation of immune cells, including T cells and natural killer cells, which are pivotal in the anticancer immune response [45]. However, our findings indicate that IL-18 is associated with poor prognosis in PDAC across multiple datasets [46]. Thus, we speculate that while IL-18 influences immune responses, clearance of infections, and repair of damaged cells, its proinflammatory attributes may contribute to disease progression. Numerous gaps remain in pyroptosis research that necessitate further exploration.
The limitations of this study stem from the algorithm's lack of interpretability, which prevents us from fully understanding the scoring criteria used to evaluate the prognosis of PDAC patients based on the four genes. Furthermore, the interrelationships among these four genes are still unknown. We validated the four candidate genes at both the transcriptomic and protein levels using PDAC cell lines, a factor that may affect the model's effectiveness. Additionally, our reliance on transcriptomic profiling, as opposed to more advanced next-generation sequencing (NGS) techniques, for model construction represents another limitation. Despite these constraints, our validation across multiple datasets demonstrated the robust performance and clinical significance of the scoring system. Our research introduces an innovative model capable of identifying crucial genes from a vast body of literature, leveraging extensive transcriptomic data, and employing various machine learning algorithms. We have unveiled a clinically relevant pathway that can guide scientific investigations.

Conclusions
Using machine learning, we developed a novel model that identifies key genes by analysing vast transcriptomic data. Our results provide significant insights into the role of pyroptosis-related genes in PDAC prognosis. The identified gene features and our nomogram offer a promising predictive tool for patient outcomes and treatment planning. However, the study has limitations: it relies on public databases and needs further validation in broader clinical contexts. It is crucial to understand how these pyroptosis-related genes affect PDAC progression and how they interact with other pathways. Future work should focus on the functional roles of these genes in PDAC and their potential as therapeutic targets.

Figure 1. Study design and flowchart. The study unfolds in three phases, delineated by colour-coded sections. Pink signifies text mining and natural language processing; using R, we extracted 4970 pyroptosis-related articles from PubMed after thorough filtering. Bibliometric and LDA analyses pinpointed research trends. Yellow highlights model discovery, where meta-analysis and GSEA investigated pyroptosis genes, revealing their significance in PDAC. Blue represents training, encompassing univariate, multivariate, and LASSO regression analyses, leading to a prognostic model. Green represents the validation phase, which utilises methods like ROC analysis to affirm the model's efficacy. The model is available at https://nomogram-uniheidelberg.shinyapps.io/DynNomapp/ (accessed on 9 January 2024).

Figure 6. Prognostic index of pyroptosis-related genes in PDAC. (A) TCGA's univariate analysis identified BHLHE40, IL18, BIRC3, and APOL1 as PDAC risk factors with combined expression predicting prognosis. (B) LASSO regression optimally selected gene combinations, showing log lambda values. (C) Weight histogram for the chosen genes. (D) Patient scores from TCGA discerned high from low risk, with principal components analysis (PCA) reinforcing the distinction. (E) Similar scoring and PCA for the ICGC database. (F) TCGA's ROC and t-SNE analyses validate and visualise prognosis-based patient clustering. (G) Corresponding ROC and t-SNE analyses in ICGC. (H) Kaplan-Meier analysis in TCGA and (I) ICGC reveals survival differences between scoring groups.
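The scoring step summarised in panels (C) to (E) can be pictured as a weighted sum of the four genes' expression values followed by a split into risk groups. The following Python sketch is illustrative only and is not the authors' R workflow: the weights, the patient values, and the median cutoff are all assumptions made for the example, since the fitted coefficients and the cutoff rule are not given in this section.

```python
from statistics import median

# HYPOTHETICAL weights for illustration; the fitted LASSO coefficients
# are not reported in this section.
WEIGHTS = {"BHLHE40": 0.20, "IL18": 0.15, "BIRC3": 0.10, "APOL1": 0.25}

def prognostic_index(expression, weights=WEIGHTS):
    """Linear score: sum over genes of weight * expression."""
    return sum(weights[gene] * expression[gene] for gene in weights)

def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the cohort median.

    Assumption: a median cutoff; other cutoffs are equally possible.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Made-up expression values (arbitrary units) for four hypothetical patients.
patients = {
    "P1": {"BHLHE40": 5.0, "IL18": 3.0, "BIRC3": 2.0, "APOL1": 4.0},
    "P2": {"BHLHE40": 2.0, "IL18": 1.0, "BIRC3": 1.0, "APOL1": 1.0},
    "P3": {"BHLHE40": 6.0, "IL18": 4.0, "BIRC3": 3.0, "APOL1": 5.0},
    "P4": {"BHLHE40": 3.0, "IL18": 2.0, "BIRC3": 2.0, "APOL1": 2.0},
}

scores = {pid: prognostic_index(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)

for pid in sorted(scores):
    print(pid, round(scores[pid], 2), groups[pid])
```

With positive weights, higher expression of any of the four genes raises the score, which matches the direction of the reported association between elevated expression and poor prognosis.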


Figure 7. Clinical relevance of the pyroptosis-related prognostic index. (A) Univariate analysis showed that score is a significant risk factor for PDAC with HR = 2.82 (1.66-4.82), p < 0.001. (B) Multivariate analysis confirmed that score is a high-risk factor for PDAC with HR = 2.77 (1.59-4.82), p < 0.001. (C) Heatmap analysis demonstrates the relationship between the scores and clinical factors. (D) A nomogram was constructed using the scoring system and clinical factors. (E) The scoring platform/nomogram is accessible at https://nomogram-uniheidelberg.shinyapps.io/DynNomapp/ (accessed on 9 January 2024). (F) Calibration curves for the scoring model in TCGA and ICGC. *, p < 0.05.