Multimodal Deep Learning for Prediction of Progression-Free Survival in Patients with Neuroendocrine Tumors Undergoing 177Lu-Based Peptide Receptor Radionuclide Therapy

Baur, Simon; Ruhwedel, Tristan; Böke, Ekin; Kobus, Zuzanna; Lishkova, Gergana; Wetz, Christoph; Amthauer, Holger; Roderburg, Christoph; Tacke, Frank; Rogasch, Julian M.; Samek, Wojciech; Jann, Henning; Ma, Jackie; Eschrich, Johannes

doi:10.3390/cancers18081194


by Simon Baur ¹,*, Tristan Ruhwedel ², Ekin Böke ¹, Zuzanna Kobus ³,⁴, Gergana Lishkova ⁵, Christoph Wetz ², Holger Amthauer ², Christoph Roderburg ⁶, Frank Tacke ³, Julian M. Rogasch ², Wojciech Samek ¹,⁷,⁸, Henning Jann ³, Jackie Ma ¹ and Johannes Eschrich ³,⁹

¹ Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, 10587 Berlin, Germany
² Department of Nuclear Medicine, Charité—Universitätsmedizin Berlin, 13353 Berlin, Germany
³ Department of Hepatology and Gastroenterology, Charité—Universitätsmedizin Berlin, 13353 Berlin, Germany
⁴ Division of Interventional Radiology, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
⁵ Department of Endocrinology and Metabolism, Charité—Universitätsmedizin Berlin, 13353 Berlin, Germany
⁶ Clinic for Gastroenterology, Hepatology and Infectious Diseases, University Hospital Düsseldorf, Medical Faculty of Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
⁷ BIFOLD—Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
⁸ Department of Electrical Engineering and Computer Science, Technische Universität Berlin, 10623 Berlin, Germany
⁹ Berlin Institute of Health at Charité—Universitätsmedizin Berlin, BIH Biomedical Innovation Academy, BIH Charité Digital Clinician Scientist Program, Charitéplatz 1, 10117 Berlin, Germany
* Author to whom correspondence should be addressed.

Cancers 2026, 18(8), 1194; https://doi.org/10.3390/cancers18081194

Submission received: 9 March 2026 / Revised: 20 March 2026 / Accepted: 26 March 2026 / Published: 8 April 2026

(This article belongs to the Special Issue Advancing Therapeutic Strategies for Neuroendocrine Tumors: Towards Personalized and Multidisciplinary Care)


Simple Summary

Patients with metastatic neuroendocrine tumors (NETs) are often treated with peptide receptor radionuclide therapy (PRRT), but treatment responses vary considerably between individuals. Being able to estimate how long the disease will remain controlled after therapy could help physicians tailor follow-up strategies and treatment decisions. In this study, we explored whether artificial intelligence models can use routinely available clinical information to improve such predictions. We analyzed data from 116 patients treated with [¹⁷⁷Lu]Lu-DOTATOC, including laboratory biomarkers and PET/CT imaging acquired before therapy. We evaluated multiple deep learning models and fusion setups. Models relying on only one type of information, such as laboratory data or imaging alone, showed limited predictive ability. In contrast, a model that combined laboratory values with PET and CT imaging performed better and provided more informative predictions. These findings highlight the potential value of integrating different clinical data sources using deep learning methods to support personalized management of patients undergoing PRRT.

Abstract

Background/Objectives: Peptide receptor radionuclide therapy (PRRT) is an established treatment for metastatic neuroendocrine tumors (NETs), yet long-term disease control occurs only in a subset of patients. Predicting progression-free survival (PFS) could support individualized treatment planning. This study evaluates laboratory, imaging, and multimodal deep learning models for PFS prediction in PRRT-treated patients. Methods: In this retrospective, single-center study, 116 patients with metastatic NETs undergoing [¹⁷⁷Lu]Lu-DOTATOC PRRT were included. Clinical characteristics, laboratory values, and pretherapeutic somatostatin receptor positron emission tomography/computed tomography (SR-PET/CT) scans were collected. Seven models were trained to classify low- vs. high-PFS groups, including unimodal (laboratory, SR-PET, or CT) and multimodal fusion approaches. Performance was assessed via repeated 3-fold cross-validation with area under the receiver operating characteristic curve (AUROC) and area under the precision–recall curve (AUPRC). Explainability was evaluated by feature importance analysis and gradient-based saliency maps. Results: Forty-two patients (36%) displayed short PFS (≤1 year) and 74 patients displayed long PFS (>1 year). Groups were similar in most characteristics, except for higher baseline chromogranin A (p = 0.003), elevated γ-GT (p = 0.002), and fewer PRRT cycles (p < 0.001) in short-PFS patients. The Random Forest model trained only on laboratory biomarkers reached an AUROC of 0.59 ± 0.02. Unimodal three-dimensional convolutional neural networks using SR-PET or CT performed worse (AUROC 0.42 ± 0.03 and 0.54 ± 0.01, respectively). A multimodal fusion model integrating laboratory values, SR-PET, and CT, augmented with a pretrained CT branch, achieved the best results (AUROC 0.72 ± 0.01, AUPRC 0.80 ± 0.01). Explainability analyses provided insights into model predictions, with explainability patterns in the fusion model appearing physiologically plausible and predominantly tumor-focused. Conclusions: Multimodal deep learning combining SR-PET, CT, and laboratory biomarkers outperformed unimodal approaches for PFS prediction after PRRT. Upon external validation, such models may support risk-adapted follow-up strategies.

Keywords:

neuroendocrine tumors; peptide receptor radionuclide therapy; [¹⁷⁷Lu]Lu-DOTATOC; progression-free survival; somatostatin receptor PET/CT; multimodal deep learning; radiomics; predictive modeling; treatment outcome

1. Introduction

Neuroendocrine tumors (NETs) arise from neuroendocrine cells and represent a heterogeneous group of neoplasms with variable biological behavior and clinical presentation [1]. Although classified as rare, the reported incidence of NETs has been steadily increasing over recent decades [2]. Most frequently, NETs originate in the gastrointestinal tract or pancreas, collectively referred to as gastroenteropancreatic neuroendocrine tumors (GEP-NETs) [3]. In a subset of patients, the primary tumor site remains unknown despite extensive diagnostic work-up, referred to as NETs of unknown primary (CUP-NETs) [4].

For patients with advanced disease, available treatment options are limited and include somatostatin analogs, targeted therapies, chemotherapy, and peptide receptor radionuclide therapy (PRRT). PRRT with [¹⁷⁷Lu]Lu-DOTATATE or [¹⁷⁷Lu]Lu-DOTATOC has emerged as an effective treatment strategy for patients with metastatic NETs that express high levels of somatostatin receptors [5]. Clinical trials have demonstrated that ¹⁷⁷Lu-based PRRT significantly prolongs progression-free survival (PFS) [6]. More recently, the NETTER-2 trial evaluated [¹⁷⁷Lu]Lu-DOTATATE as a first-line therapy for patients with advanced grades 2 and 3 gastroenteropancreatic (GEP) NETs, demonstrating encouraging outcomes that support its expanded role in earlier lines of treatment [7]. Nonetheless, meta-analyses have demonstrated that PRRT achieves objective response rates in patients with advanced NETs ranging between 25.0% and 35.0%, depending on the response assessment criteria applied, and disease control rates between 79.0% and 83.0% [8]. Identifying patients who will not achieve long-term disease control or remission in advance is a clinical need and defines the rationale of the present study.

As of today, the histological Ki-67 proliferation index and serum chromogranin A (CgA) remain the most established prognostic biomarkers in patients undergoing PRRT. Elevated Ki-67 and CgA levels have been consistently associated with shorter PFS [9]. The multigene transcriptomic assay NETest has been proposed as a predictive tool for PRRT outcomes, showing promising initial results; however, its high cost and limited availability currently restrict clinical implementation [10]. Recent work by Ruhwedel et al. identified the De Ritis ratio (AST/ALT) as a prognostic biomarker in patients undergoing PRRT, with elevated values associated with shorter progression-free survival and overall survival [11,12].

Imaging-derived parameters have also been proposed to predict PRRT outcomes. Somatostatin receptor (SR) heterogeneity, high lesional SR expression—as assessed by the Krenning score—and the metastases-to-liver ratio (M/L ratio) have been investigated as potential predictors of therapy response [13,14]. Furthermore, radiomic signatures extracted from baseline somatostatin receptor PET (SR-PET) or CT imaging have demonstrated potential for stratifying patients [15]. However, the predictive value of these parameters remains limited, thus restricting their clinical applicability.

In recent years, artificial intelligence (AI), and particularly deep learning (DL), has emerged as a potent tool to extract high-dimensional patterns from heterogeneous biomedical data, including imaging, genomics, and clinical variables. In oncology, multimodal DL approaches leverage specialized architectures that process each data type in dedicated branches before combining them into a joint representation for prediction. Typically, convolutional neural networks are employed for image analysis, feed-forward networks handle structured clinical data, and sequence models such as recurrent or transformer architectures are used for genomic or temporal inputs. These modality-specific encoders are integrated via fusion strategies ranging from simple concatenation to attention-based transformers, enabling end-to-end optimization across all modalities [16,17]. Recent reviews highlight that such architectures can capture complementary information, mitigate modality-specific biases, and improve generalization across diverse patient populations [16,18]. This paradigm is particularly relevant for PRRT, where reliable biomarkers for predicting durable response remain scarce. Applying multimodal AI frameworks to NETs could therefore facilitate more accurate patient selection and personalized therapeutic strategies.

2. Materials and Methods

2.1. Study Design and Patient Cohort

This retrospective, single-center study included 116 consecutive patients with histologically confirmed NETs who received PRRT with [¹⁷⁷Lu]Lu-DOTATOC between 2015 and 2022 at Charité—Universitätsmedizin Berlin. Eligibility criteria comprised the following: (1) metastatic, progressive disease; (2) sufficient SR expression confirmed by pretherapeutic [⁶⁸Ga]Ga-DOTATOC PET/CT; (3) availability of both pretherapeutic laboratory data and imaging (SR-PET and CT scan); and (4) availability of clinical follow-up data.

Baseline laboratory values included liver function parameters—aspartate transaminase (AST), alanine transaminase (ALT), and gamma-glutamyl transferase (GGT)—as well as chromogranin A (CgA), measured within four weeks before PRRT initiation. Patients were excluded if they had undergone prior PRRT, had incomplete clinical records, or insufficient follow-up to assess disease progression.

A majority of the cohort analyzed in this study was included in earlier publications [11,19]. The present study adds more recently treated patients. It also differs methodologically by employing a multimodal deep learning framework that integrates laboratory values and imaging data (SR-PET/CT) for predictive modeling of PFS. Thus, while the patient cohort overlaps with previous studies, the methodological approach of the current work is distinct. Table 1 summarizes the patient characteristics.

2.2. Imaging Characteristics

Prior to PRRT initiation, all patients underwent a pretherapeutic [⁶⁸Ga]Ga-DOTA-based PET. The median interval between the pretherapeutic SR-PET and initiation of the first PRRT cycle was 36 days (IQR 44; Q1 15.5–Q3 59.5 days). PET/CT examinations were performed in our center with either a Philips Gemini TF 16 scanner with time-of-flight capability and a 16-row CT scanner [20] or a GE Discovery MI scanner with silicon photomultipliers and time-of-flight capability and a 64-row CT scanner [21]. The CT scans included in our model were exclusively those acquired simultaneously with the pretherapeutic SR-PET to guarantee spatial and temporal alignment between anatomical and functional imaging data. Among the 116 patients included, 74 underwent whole-body contrast-enhanced CT, while the remaining patients received whole-body non-contrast CT. The contrast-enhanced CT images were acquired during the venous contrast phase with a slice thickness of 3 mm. We deliberately used whole-body CTs to ensure that all lesions detected on SR-PET could be anatomically correlated with the corresponding CT scan across the entire field of view.

2.3. Peptide Receptor Radionuclide Therapy and Response Assessment

Patients received [¹⁷⁷Lu]Lu-DOTATOC PRRT with a median of 3 cycles (range: 1–7), with each administered at a standard dose of 200 mCi (7.40 GBq). Treatment cycles were scheduled at intervals of 10 to 12 weeks. Interim response assessment was performed using [⁶⁸Ga]Ga-DOTATOC PET/CT after every two cycles, with the first evaluation following the second treatment cycle. To minimize the risk of misinterpreting radiogenic edema as disease progression (pseudo-progression), interim staging was conducted at least two months after the most recent PRRT cycle [22]. Disease progression was determined by an interdisciplinary tumor board. In patients showing progressive disease, no additional PRRT cycles were administered. Following completion of therapy, patients underwent routine follow-up imaging every 3 to 6 months. Morphological evaluation was primarily based on CT.

2.4. Progression-Free Survival

PFS was defined as the time from the initiation of the first PRRT cycle until the date of documented disease progression or death from any cause. Disease progression was assessed according to RECIST 1.1 criteria as determined by the local interdisciplinary tumor board. Patients without a progression event were not included in the analysis. For the purpose of this study, we defined the PFS threshold at 1 year, as progression within the first year after PRRT initiation is generally considered to indicate insufficient therapeutic benefit and has been used previously in clinical studies [23]. This threshold was applied to dichotomize patients into low-PFS (≤1 year) and high-PFS (>1 year) groups for subsequent analyses.

2.5. Deep Learning Models

We evaluated seven predictive models for PFS classification, exploring the predictive value of laboratory parameters, imaging, and multimodal data. First, we trained a Random Forest classifier using only laboratory biomarkers. The laboratory biomarkers included were AST, ALT, CgA, and GGT. Subsequently, we developed two separate 3D convolutional neural networks (3D-CNNs) to predict the PFS based on PET or CT imaging data. To investigate the benefit of multimodal integration, we constructed fusion models that combined (1) PET imaging with laboratory biomarkers, (2) CT imaging with laboratory biomarkers, and (3) PET and CT imaging with laboratory biomarkers. In addition, an advanced fusion model was evaluated in which the CT branch was initialized from a pretrained MONAI 3D CT segmentation network [24] and fine-tuned for PFS prediction. Laboratory features were concatenated with the flattened image embeddings and processed by three fully connected layers. Figure 1 provides an overview of our modeling and analysis setup. Given the relatively small sample size (n = 116), we employed stringent regularization. All models were trained with a learning rate of 0.01 (cosine decay) and a dropout rate of 0.1 for 20 epochs with early stopping at 5 epochs. We found empirically that higher dropout rates were not beneficial for predictive performance. The Adam optimizer with a weight decay of 0.2 and binary cross-entropy loss was used for all experiments; the relatively high weight decay was deliberately chosen, as preliminary experiments with lower values led to frequent model collapse due to overfitting. All imaging data were preprocessed by removing visual artifacts, harmonizing voxel slopes using DICOM metadata, normalizing intensities across the training set, and resizing volumes to 75 × 50 × 50 voxels. All implementations and experiments were done with Python 3.11 and PyTorch 2.10.
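The training schedule described above can be illustrated with a minimal standard-library sketch. It assumes that "cosine decay" means the common closed form that anneals the learning rate from 0.01 towards a floor of 0 over the 20 epochs, and that "early stopping at 5 epochs" means a patience of 5 epochs without improvement in a monitored validation loss. Both readings are assumptions, and the names `cosine_decay_lr` and `EarlyStopping` are hypothetical, not taken from the study code.

```python
import math


def cosine_decay_lr(epoch: int, total_epochs: int = 20, base_lr: float = 0.01) -> float:
    """Cosine-annealed learning rate, decaying from base_lr towards 0 (assumed floor)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))


class EarlyStopping:
    """Signal a stop once the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Learning rate for each of the 20 epochs reported in the text.
schedule = [cosine_decay_lr(epoch) for epoch in range(20)]

# Illustrative (made-up) validation losses: improvement stops after the second epoch.
stopper = EarlyStopping(patience=5)
toy_losses = [1.00, 0.90, 0.95, 0.96, 0.97, 0.98, 0.99]
stop_flags = [stopper.step(loss) for loss in toy_losses]
```

With this form, the learning rate starts at 0.01, reaches half that value at epoch 10, and the toy run stops after five consecutive epochs without improvement.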

2.6. Statistical Analysis

To make sure our evaluation was reliable and not dependent on one specific data split, we used a repeated cross-validation approach. Specifically, we divided the dataset into three parts of approximately equal size (3-fold cross-validation). In each round, two parts were used to train the model and the remaining part was used to test it. This rotation continued until each part had served once as the test set. We then repeated this entire 3-fold process five times, each time reshuffling the data. This repetition reduced the chance that the results were influenced by a particular way of splitting the data, giving us a more robust estimate of performance. Finally, we reported the average performance across all 5 repetitions, along with the standard error to indicate how much the results varied. Performance was measured using two widely applied metrics: the area under the receiver operating characteristic curve (AUROC) and the area under the precision–recall curve (AUPRC). To assess whether the observed performance differences between model families were statistically significant, we conducted nonparametric significance testing. Each model was assigned to one of three predefined groups based on its input modality: unimodal (PET Only, CT Only, or Random Forest on tabular data), one-image fusion (PET or CT combined with clinical variables), and dual fusion (both PET and CT combined, with or without pretraining). We first compared one-image fusion models against unimodal models, and then dual-fusion models against one-image fusion models. For each comparison, we used the two-sided Mann–Whitney U test to evaluate whether performance differed between the two groups. To control for multiple testing, p-values were adjusted using the Bonferroni correction. In addition, we quantified the effect size with Cliff’s delta, which indicates how strongly one group tends to outperform the other. According to conventional interpretation, values of |δ| < 0.147, <0.33, <0.474, and ≥0.474 correspond to negligible, small, medium, and large effects, respectively.
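The evaluation protocol above can be sketched with the standard library alone. The block below generates reshuffled 3-fold splits, computes Cliff's delta with the thresholds quoted in the text, and applies a Bonferroni adjustment. It is an illustration under stated assumptions: the function names are hypothetical, the splits are plain shuffles (the text does not say whether folds were stratified), and the usage example feeds in the reported mean AUROCs only to show the mechanics, whereas the study presumably compared scores from individual runs.

```python
import random


def repeated_kfold(n_samples, n_splits=3, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx), reshuffling before each repeat."""
    rng = random.Random(seed)
    for rep in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::n_splits] for i in range(n_splits)]  # near-equal sizes
        for k, test in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield rep, k, train, test


def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b), estimated over all pairs."""
    greater = sum(x > y for x in a for y in b)
    less = sum(x < y for x in a for y in b)
    return (greater - less) / (len(a) * len(b))


def interpret_delta(delta):
    """Map |delta| to the conventional effect-size labels quoted in the text."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


def bonferroni(p_values):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# 116 patients, 3 folds, 5 repetitions -> 15 train/test splits.
splits = list(repeated_kfold(116))

# Illustration only: reported mean AUROCs of one-image fusion vs. unimodal models.
delta = cliffs_delta([0.68, 0.62], [0.59, 0.42, 0.54])
label = interpret_delta(delta)

# Hypothetical raw p-values for the two planned comparisons.
adjusted = bonferroni([0.004, 0.03])
```

Cliff's delta is a natural companion to the Mann–Whitney U test because both depend only on pairwise orderings of the two groups.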

2.7. Model Analysis and Explainability

To elucidate the decision-making process of our deep learning framework, we performed both representation space analysis and biomarker relevance assessment, as well as qualitative explainability through saliency maps. The aim was to provide transparency on how the model integrates multimodal information—namely, PET imaging and laboratory biomarkers—to arrive at its predictions.

2.7.1. UMAP Analysis of Feature Embeddings

Uniform Manifold Approximation and Projection (UMAP) [25] was employed to visualize the high-dimensional feature embeddings learned by the network. UMAP is a non-linear dimensionality reduction technique that projects complex feature spaces into two dimensions while preserving both local and global structural relationships. This makes it particularly suitable for identifying separable patient subgroups in latent space.

2.7.2. Feature Importance Analysis of Laboratory Biomarkers

Feature importance was assessed by computing input gradients with respect to the laboratory biomarker inputs. For each sample, the absolute gradient values at the tabular input layer were computed via backpropagation, normalized by the sample-wise maximum, and averaged across the dataset to yield a per-feature importance score. The resulting gradient-based saliency reflects each feature’s marginal contribution to the model output.
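The aggregation step described above (absolute value, normalization by the sample-wise maximum, averaging across the dataset) can be sketched as follows. The input gradients would come from backpropagation through the trained network; here they are made-up numbers, and the function name `gradient_feature_importance` is hypothetical.

```python
def gradient_feature_importance(gradients):
    """Aggregate per-sample input gradients into one importance score per feature.

    gradients: list of per-sample lists, each holding d(output)/d(feature).
    """
    n_features = len(gradients[0])
    totals = [0.0] * n_features
    for sample in gradients:
        magnitudes = [abs(g) for g in sample]
        peak = max(magnitudes)
        if peak == 0.0:
            continue  # an all-zero gradient contributes zero to every feature
        for j, m in enumerate(magnitudes):
            totals[j] += m / peak  # normalize by the sample-wise maximum
    return [t / len(gradients) for t in totals]


# Feature order follows the laboratory inputs named in the text.
features = ["AST", "ALT", "CgA", "GGT"]

# Illustrative (made-up) gradients for two patients.
toy_gradients = [
    [0.2, -0.4, 0.1, 0.0],
    [-1.0, 0.5, 0.5, 0.25],
]
importance = dict(zip(features, gradient_feature_importance(toy_gradients)))
```

Because each sample is scaled by its own maximum, every patient contributes on the same 0 to 1 scale regardless of the overall gradient magnitude for that patient.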

2.7.3. Qualitative Explainability

Gradient-based saliency maps were computed to localize regions within the PET scans that most strongly influenced the model’s classification decisions. For each patient, voxel-wise gradients from the PET Fusion and PET Only models were backpropagated and mapped to the input PET volume to generate saliency heatmaps. These were overlaid on the original scans in three anatomical planes (axial, coronal, and sagittal) to visually highlight spatial patterns of model attention.

3. Results

3.1. Clinical Characteristics

A total of 116 patients were included in the final study cohort, with a median age of 66 years (range: 36–87). A total of 41% of patients were female. The most common primary sites were the small intestine (42%) and pancreas (29%), with 17% of patients having a cancer of unknown primary (CUP). Most patients presented with hepatic metastases (73%), frequently accompanied by lymphonodal or osseous spread. The majority of tumors were G2 (71%) with a median Ki-67 index of 5% (range: 1–40). Patients received a median of 3 PRRT cycles (range: 1–7). Key baseline laboratory values, including chromogranin A (CgA) and γ-GT, are summarized in Table 1.

3.2. Progression-Free Survival

To evaluate the treatment outcomes in the study cohort, we first analyzed the PFS. Figure 2 illustrates the PFS distribution of the patient cohort. The median PFS for the total cohort was 15.7 months (interquartile range [IQR]: 9.1–26.7 months), indicating that half of the patients remained progression-free for at least this duration. No censored patients were included, as target values are necessary for a sample to be used in our deep learning setup. Our cohort therefore includes only patients who eventually showed progression. Patients were stratified into short-PFS (≤1 year) and long-PFS (>1 year) groups. Most clinical characteristics did not differ significantly between these two subgroups, including age, sex distribution, primary tumor location, metastatic pattern, tumor functionality, and histological grading (all p > 0.05). Notably, patients with shorter PFS had significantly higher baseline chromogranin A (CgA) levels (p = 0.003) and elevated γ-GT levels (p = 0.02). In addition, patients with early progression had received fewer PRRT cycles (p < 0.01), which is expected, as treatment is typically discontinued in the event of disease progression prior to completion of the planned cycles.

3.3. Deep Learning Predictive Model for Progression-Free Survival

We applied a series of unimodal and multimodal deep learning architectures to assess the added value of integrating imaging and laboratory data for predicting PFS. Our results demonstrate that integrating multiple data modalities consistently improves the model performance in PFS classification (Table 2). The baseline Random Forest model, trained solely on laboratory biomarkers, achieved moderate performance (AUROC: 0.59 ± 0.02, AUPRC: 0.67 ± 0.01, accuracy: 0.61 ± 0.02). In contrast, unimodal 3D convolutional neural networks trained on PET or CT data alone yielded a lower discriminative performance, particularly for the PET Only model (AUROC: 0.42 ± 0.03). The CT Only model performed slightly better (AUROC: 0.54 ± 0.01), though both remained inferior to the model using only laboratory parameters. Introducing laboratory features into imaging-based models led to improvements: the PET–laboratory fusion model achieved an AUROC of 0.68 ± 0.01 and the highest accuracy overall (0.65 ± 0.01), suggesting strong complementarity between PET imaging and laboratory data. Similarly, the CT–laboratory fusion model improved over the CT Only model across all metrics, reaching an AUROC of 0.62 ± 0.03. Further combining all three modalities (PET, CT, and laboratory data) resulted in additional performance gains. The PET–CT–laboratory fusion model achieved an AUROC of 0.69 ± 0.01 and matched the highest AUPRC score (0.80 ± 0.01), reinforcing the value of multimodal integration. Finally, initializing the CT branch with a pretrained model further boosted the AUROC to 0.72 ± 0.01, indicating that leveraging pretrained representations can enhance predictive performance. Figure 3 gives a visual overview of the predictive performance of all models. For additional insights into predictive performance, Figure 4 displays ROC and precision–recall curves of a single exemplary cross-validation fold for our PET-CT Fusion model. The curves show improved predictive performance compared with the Random Forest model and a random baseline. Statistical testing as described in Section 2.6 suggests that iteratively adding modalities significantly improves predictive performance (p < 0.01).
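For readers who want to reproduce the form in which these results are reported, the following standard-library sketch computes an AUROC from predicted probabilities and summarizes several repetitions as mean ± standard error. It assumes the positive class is long PFS and that the standard error is the sample standard deviation divided by the square root of the number of repetitions; the function names and all numbers in the usage example are illustrative, not study data.

```python
import math


def auroc(labels, scores):
    """AUROC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_and_se(values):
    """Mean and standard error (sample standard deviation / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


# Toy test fold: label 1 = long PFS (assumed positive class), 0 = short PFS.
fold_auroc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])

# Made-up AUROCs from three repetitions, summarized as mean and standard error.
mean_auroc, se_auroc = mean_and_se([0.70, 0.72, 0.74])
```

An AUROC of 0.5 corresponds to random ranking, which is why values below 0.5, such as the one reported for the PET Only model, indicate no usable discriminative signal.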

To further explore the clinical relevance of our model predictions, we performed a Kaplan–Meier analysis stratified by the model-derived probability of long PFS (see Figure 5). The plot shows a single representative cross-validation run of the PET Fusion model. Patients predicted by our model to have a high therapy response (probability ≥ 0.5) demonstrated a markedly prolonged PFS compared with those in the low-response group (probability < 0.5). The median PFS was 17.25 months in the high-response group versus 12.43 months in the low-response group, corresponding to a statistically significant difference (log-rank p = 0.0001; test statistic 15.18). As all patients in our cohort eventually experienced progression, no censoring occurred, and the survival curves therefore correspond directly to the proportion of patients who remained progression-free at a given time point. These findings indicate that our multimodal model is capable of clinically meaningful risk stratification, separating patients into distinct prognostic groups based solely on baseline data prior to therapy initiation.
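Because no patient in this cohort is censored, the quantities in the Kaplan–Meier comparison reduce to simple counts. The sketch below, using only the standard library, computes the progression-free fraction at a time point and a log-rank chi-square statistic with one degree of freedom for two fully observed groups. The function names are hypothetical, the toy PFS times are made up, and the code is not the analysis pipeline used in the study; it only shows how a statistic such as 15.18 maps to a p-value near 0.0001.

```python
import math
from statistics import median


def fraction_progression_free(times, t):
    """Share of patients whose progression occurs after time t (no censoring)."""
    return sum(x > t for x in times) / len(times)


def chi2_sf_1dof(stat):
    """Survival function of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def logrank_uncensored(times_a, times_b):
    """Log-rank statistic and p-value when every patient has an observed event."""
    observed_a = expected_a = variance = 0.0
    for t in sorted(set(times_a) | set(times_b)):
        n_a = sum(x >= t for x in times_a)  # at risk in group A just before t
        n_b = sum(x >= t for x in times_b)  # at risk in group B just before t
        d_a = sum(x == t for x in times_a)  # events in group A at t
        d = d_a + sum(x == t for x in times_b)
        n = n_a + n_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # the variance term is undefined for a single patient at risk
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    stat = (observed_a - expected_a) ** 2 / variance
    return stat, chi2_sf_1dof(stat)


# Made-up PFS times in months for a predicted low- and high-response group.
low_group, high_group = [1, 2, 3], [4, 5, 6]
stat, p_value = logrank_uncensored(low_group, high_group)
median_low, median_high = median(low_group), median(high_group)

# Consistency check on the reported values: chi-square 15.18 with 1 dof.
p_reported = chi2_sf_1dof(15.18)
```

The last line reproduces the order of magnitude of the reported log-rank p-value from the reported test statistic, assuming a chi-square reference distribution with one degree of freedom.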

3.4. Model Analysis and Explainability

3.4.1. UMAP Analysis

Figure 6 displays the UMAP projections of embeddings derived from different fusion strategies: PET imaging combined with laboratory biomarkers (PET Fusion), CT imaging combined with laboratory biomarkers (CT Fusion), joint PET and CT imaging fused with laboratory biomarkers (PET-CT Fusion), and the same PET-CT fusion model with the CT branch initialized from a pretrained network (PET-CT Fusion, pretrained). All UMAP projections were computed using a fixed random seed, with embeddings extracted from a single cross-validation fold both before and after the fusion layer, so that observed structural differences in the projections reflect the effect of the fusion mechanism rather than algorithmic stochasticity. PET-derived embeddings alone show weak class separation. In contrast, the fused PET-CT embeddings, particularly when incorporating pretrained CT features, exhibit markedly improved clustering, with tighter intra-class grouping and greater inter-class separation. This suggests that multimodal fusion enriches the feature space, enabling the model to capture subtle features associated with PRRT response.

3.4.2. Feature Importance

Our evaluation of feature importance of laboratory values is displayed in Figure 7. Across all model configurations, ALT, AST, CgA, and Gamma-GT consistently emerged as key discriminative variables. Notably, CgA exhibited the highest importance in both PET-CT fusion approaches, highlighting its strong association with the target outcome when combined with imaging-related features. The pretrained PET-CT fusion model generally preserved or enhanced biomarker relevance compared with the non-pretrained variant, suggesting that the integration of well-learned CT representations can strengthen the interpretive value of specific laboratory measures. The Random Forest baseline, while ranking the same biomarkers highly, demonstrated lower absolute importance values, underscoring the advantage of deep multimodal learning in capturing non-linear relationships between biochemical and imaging features.

3.4.3. Qualitative Explainability

Figure 8 and Figure 9 show the results of our qualitative explainability analysis. First, we analyzed the global distributions of gradient magnitudes across all test sets of an entire cross-validation (Figure 8a) for PET Only and PET Fusion models. The PET Only model exhibits an irregular, noisy distribution with substantial density fluctuations across the gradient range. In contrast, PET Fusion displays a smooth, unimodal distribution centered around moderate gradient magnitudes. Notably, PET Only shows a pronounced shift toward higher gradient values (spanning the full [0, 1] range with significant density beyond 0.6), which, coupled with the model’s poor generalization performance, is indicative of exploding gradients, a well-known phenomenon in deep learning that can hinder stable learning and meaningful representation formation [26,27,28,29]. Quantitative comparisons using multiple statistical distance metrics confirmed substantial divergence between the distributions of the two models: Wasserstein Distance (0.090), Kolmogorov–Smirnov Statistic (0.241, p < 0.001), Jensen–Shannon Divergence (0.418), Energy Distance (0.183), Bhattacharyya Distance (0.207), and Histogram Overlap (0.518) (Figure 8b). We further illustrate these differences at the individual-case level. An example of a raw PET scan is shown in Figure 9a. Figure 9b,c compare the corresponding saliency maps of the PET Only and PET Fusion models for the same patient. Brighter colors indicate voxels with stronger contributions to the model’s prediction. The PET Fusion model predominantly focuses on relevant tumorous regions, while the PET Only model assigns high importance to the bladder. In addition to visual inspection, we compared the gradient magnitude distributions of both models (Figure 9d,e). The PET Only model exhibits numerous large gradients and an irregular, fragmented distribution, whereas the PET Fusion model produces a smoother, more coherent gradient distribution with fewer extreme values. In total, our gradient analyses support our earlier findings from model performance (Table 2) and internal feature representations (Figure 6): the PET Fusion model learns more physiologically meaningful signal patterns associated with PRRT effectiveness, while the PET Only model fails to capture relevant information. The contrasting distributional shapes underscore fundamental differences in training stability: PET Fusion’s concentrated, bell-shaped profile reflects well-regulated gradient flow, while PET Only’s diffuse, erratic pattern signals optimization instability.
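Three of the distance measures listed above have short closed forms for one-dimensional samples, which the following standard-library sketch illustrates. It assumes gradient magnitudes normalized to [0, 1], equally sized samples for the Wasserstein distance, and a 10-bin histogram for the overlap. The text does not state the binning or the implementation used, so these are assumptions, and the function names and toy values are hypothetical.

```python
def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equally sized 1D samples."""
    if len(a) != len(b):
        raise ValueError("this simplified form needs equally sized samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    def cdf(sample, t):
        return sum(x <= t for x in sample) / len(sample)

    return max(abs(cdf(a, t) - cdf(b, t)) for t in set(a) | set(b))


def histogram_overlap(a, b, bins=10, lo=0.0, hi=1.0):
    """Sum of bin-wise minima of two normalized histograms (1.0 = identical)."""
    def hist(sample):
        counts = [0] * bins
        for x in sample:
            k = min(int((x - lo) / (hi - lo) * bins), bins - 1)
            counts[k] += 1
        return [c / len(sample) for c in counts]

    return sum(min(p, q) for p, q in zip(hist(a), hist(b)))


# Made-up normalized gradient magnitudes for two models.
model_a = [0.05, 0.15, 0.25, 0.35]
model_b = [0.25, 0.35, 0.45, 0.55]

distances = {
    "wasserstein": wasserstein_1d(model_a, model_b),
    "ks": ks_statistic(model_a, model_b),
    "overlap": histogram_overlap(model_a, model_b),
}
```

Note that the first two measures grow as the distributions diverge, whereas histogram overlap shrinks, so the values are not directly comparable with one another.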

4. Discussion

In this study, we developed and evaluated a multimodal deep learning model integrating somatostatin receptor PET, CT imaging, and laboratory biomarkers to predict progression-free survival in patients undergoing [¹⁷⁷Lu]Lu-DOTATOC PRRT. Our results show that unimodal imaging models alone—whether based on SR-PET or CT—were insufficient to provide clinically meaningful predictive performance and, in fact, performed worse than the model using only laboratory data. In contrast, combining complementary imaging modalities with laboratory biomarkers in a fusion architecture substantially enhanced the predictive accuracy and robustness. Importantly, we incorporated explainability into the model by leveraging three-dimensional gradient maps and biomarker relevance analyses, enabling interpretation of decision-driving features.

As reported previously, analyses of the cohort characteristics showed that baseline levels of CgA and gamma-GT were higher in patients with shorter PFS [11]. Consistent with this finding, earlier studies have demonstrated an inverse association between baseline CgA and clinical outcome in NET patients undergoing PRRT [30]. A similar pattern was observed for gamma-GT, which has been linked to hepatic tumor burden and poorer prognosis in NET [31], and which in our analysis was consistently higher in patients with early progression. Interestingly, in the explainability analysis of our multimodal model, CgA also emerged as a parameter with notable contribution to prediction (see Figure 7). This suggests that despite its limited value as a stand-alone biomarker, being strongly influenced by non-tumor-related factors such as proton-pump inhibitor therapy, renal dysfunction, or other comorbidities, CgA can still provide complementary prognostic information when integrated with imaging and other laboratory features. Although previous authors have applied machine learning approaches to predict the prognosis of patients with NETs [32,33], these models have typically relied on single data modalities, predominantly clinical and laboratory data. For example, Jiang et al. used deep learning on population-based data from the SEER registry, which contains only demographic, clinical, and pathological variables, to predict survival in pancreatic NETs [32]. Likewise, Gao et al. developed a machine learning model for prognosis estimation in gastroenteropancreatic NET patients with liver metastases using solely clinical parameters [33].

In addition to clinical and laboratory parameters, imaging information has increasingly been explored as a means of predicting outcomes after PRRT in NET patients. SR-PET/CT provides essential information on tumor burden and receptor expression, and several groups have investigated whether quantitative or radiomic features could be used for prognostic modeling. For example, in a recent study, Opalińska et al. found that a significant decrease in liver-normalized SUV_max in NET lesions on [⁶⁸Ga]Ga-DOTA-TATE PET/CT following PRRT was associated with a lower risk of disease progression over a 20-month follow-up [34]. This suggests that the PET/CT-derived, liver-normalized SUV_max in NET lesions may serve as an additional and independent predictor of treatment outcome. Furthermore, Laudicella et al. reported that the [⁶⁸Ga]Ga-DOTA-TATE PET/CT radiomic features HISTO_Skewness and HISTO_Kurtosis predicted the PRRT response for individual lesions of both primary and metastatic GEP-NETs, regardless of tumor origin, with AUCs of 0.745 and 0.722, respectively [15]. Importantly, in the CLARINET trial, Pavel et al. reported that deep learning models based on CT imaging alone failed to outperform conventional laboratory markers such as chromogranin A and specific growth rate (SLDr) [35]. Similarly, in our study, the models based solely on CT or SR-PET showed no meaningful prognostic value and performed worse than a baseline model using laboratory biomarkers. Only when laboratory and imaging data were combined in a multimodal fusion model did we observe a relevant increase in predictive performance. These results further underscore the complementary nature of PET and CT imaging in capturing distinct yet clinically relevant aspects of disease biology. While SR-PET emphasizes functional and metabolic activity, CT provides higher-resolution anatomical detail. From a clinical perspective, this implies that radiomic signatures from combined PET and CT imaging, augmented by biochemical markers, may reflect pathological differences more accurately than any modality alone. When integrated within a shared feature space alongside laboratory biomarkers, the combined imaging modalities seem to offer a richer and more complete representation of patient status. This multimodal synergy enables the network to detect patterns that may be too subtle to discern in either modality alone, thereby improving the robustness and generalizability of the learned representations.

We recognize several limitations of our study. First, the sample size was relatively small, which raises concerns about the robustness and generalizability of the findings. Training deep learning models on limited data can lead to overfitting; although we employed cross-validation and regularization techniques, a larger dataset would be needed to ensure the model’s performance is consistent and not an artifact of our particular cohort. Second, our analysis was retrospective. This inherently carries risks of selection bias (e.g., only patients who completed PRRT were included) and confounding factors that prospective studies could better control. Third, the study lacks an external validation cohort, which restricts the generalizability of our findings. However, given that all required inputs (laboratory parameters, as well as SR-PET and CT imaging) are routinely obtained as part of the standard diagnostic work-up in patients scheduled for PRRT, prospective validation in larger multicenter cohorts appears feasible. Fourth, the PET/CT images were acquired using different scanner systems, potentially introducing variability due to differences in reconstruction algorithms. Another methodological limitation concerns the heterogeneity of CT acquisition protocols. Among the 116 patients included, 42 underwent non-contrast CT, whereas the remaining patients received contrast-enhanced whole-body scans acquired during the venous contrast phase. The whole-body acquisition ensured complete anatomical coverage and optimal alignment between the PET and CT images. However, arterial phase imaging could have improved the visualization of certain lesions, especially hepatic metastases, and might have enhanced the accuracy of image-based analyses. A further limitation of our study is that we considered only a relatively small fraction of the potentially available clinical information. Additional data, such as genetic profiles, advanced laboratory parameters, histological images, or multiplex staining, might have provided further predictive value. At the same time, novel biomarkers, such as the NETest, are gaining increasing attention. Future models that integrate such high-specificity biomarkers with deep learning predictions could further enhance the accuracy and clinical utility of prognostic tools in NET patients undergoing PRRT.

In contrast to many previous studies in this field, our work makes a contribution with respect to explainability, moving beyond the paradigm of “black-box” deep learning models. While gradient-based visualization did highlight tumor regions, as expected, it also consistently emphasized areas such as the kidneys, spleen, and urinary bladder. In line with the observed AUROC of 0.42 for the SR-PET only model, these findings indicate that SR-PET data alone did not provide predictive value for PFS. Consistent with the inferior predictive performance, PET Only models produced noisier and less structured gradient maps, with strong activations concentrated in medically irrelevant regions (Figure 9). In contrast, the PET Fusion model—though not entirely free of spurious correlations, which are expected to some extent in any explainability method—yielded clearer, more coherent gradient patterns that tend to focus more on clinically relevant tumorous regions. In general, saliency in non-tumor regions likely reflects a mix of relevant and spurious correlations inherent to the imaging data. Importantly, these correlations are not necessarily harmful in our setting: predictive performance emerges only after fusion with laboratory features, as supported by our UMAP analysis of embedding space, suggesting that the model leverages clinically meaningful interactions rather than relying solely on non-medically relevant image cues.

From a clinical perspective, the ability to stratify patients according to their expected progression-free survival may have important implications for treatment planning. PRRT is a resource-intensive therapy associated with potential toxicity; therefore, improved patient selection is clinically relevant. Patients predicted to have a high risk of early progression might benefit from closer monitoring, earlier response assessment, or alternative treatment strategies. Because the multimodal framework proposed in this study relies exclusively on routinely acquired laboratory parameters and imaging data, it could potentially be integrated into clinical workflows such as interdisciplinary tumor board decision making. In addition, the modular architecture allows future incorporation of further data sources, including genomic or molecular biomarkers, which may further improve predictive performance.

5. Conclusions

In this study, we developed a multimodal deep learning framework combining SR-PET imaging, CT imaging, and laboratory biomarkers to predict progression-free survival in patients undergoing [¹⁷⁷Lu]Lu-DOTATOC PRRT. Our results demonstrate that models based on single imaging modalities alone provide limited prognostic value and perform worse than models using laboratory parameters. In contrast, integrating imaging and laboratory data within a multimodal fusion architecture substantially improves predictive performance and robustness.

Importantly, our approach incorporates explainability through three-dimensional gradient maps and biomarker relevance analysis, enabling insight into the model’s decision-making process. The results suggest that clinically meaningful prognostic information emerges primarily from the interaction between imaging features and biochemical markers rather than from individual modalities alone.

Although the present study is limited by its retrospective design, relatively small sample size, and the lack of external validation, the findings highlight the potential of multimodal deep learning models for improving prognostic stratification in NET patients undergoing PRRT. Future work should focus on validating this approach in larger multicenter cohorts and integrating additional data sources, such as genomic biomarkers or advanced laboratory markers, to further enhance the predictive performance and clinical utility.

Author Contributions

Conceptualization, J.E. and S.B.; methodology, S.B., E.B., J.M. and W.S.; software, S.B. and E.B.; validation, S.B., E.B., J.M. and W.S.; formal analysis, S.B. and E.B.; investigation, T.R., Z.K., G.L. and J.E.; data curation, T.R., Z.K., G.L. and J.E.; resources, C.R., F.T., H.A., C.W. and H.J.; visualization, S.B. and E.B.; writing—original draft preparation, J.E. and S.B.; writing—review and editing, C.R., F.T., H.A., C.W., H.J., J.M.R., J.M. and W.S.; supervision, J.E., J.M.R., C.W. and H.A.; project administration, J.E. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Senate of Berlin and the European Commission’s Digital Europe Programme (DIGITAL), grant TEF-Health (101100700). Johannes Eschrich is a participant in the BIH Charité Junior Digital Clinician Scientist Program funded by Charité—Universitätsmedizin Berlin and the Berlin Institute of Health at Charité.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of Charité—Universitätsmedizin Berlin (Approval No.: EA1/016/23; Date: 24 February 2023).

Informed Consent Statement

Due to the retrospective, non-interventional study design using anonymized data, patient consent was waived.

Data Availability Statement

The datasets generated and analyzed during the current study are not publicly available due to institutional and ethical restrictions but are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Pedraza-Arévalo, S.; Gahete, M.D.; Alors-Pérez, E.; Luque, R.M.; Castaño, J.P. Multilayered heterogeneity as an intrinsic hallmark of neuroendocrine tumors. Rev. Endocr. Metab. Disord. 2018, 19, 179–192.
2. Liu, E.H. Neuroendocrine Tumors: Epidemiology. In Neuroendocrine Tumours: Diagnosis and Management; Yalcin, S., Öberg, K., Eds.; Springer International Publishing: Cham, Switzerland, 2024; pp. 37–50.
3. Yao, J.C.; Hassan, M.M.; Phan, A.T.; Dagohoy, C.G.; Leary, C.C.; Mares, J.E.; Abdalla, E.K.; Fleming, J.B.; Vauthey, J.N.; Rashid, A.; et al. One hundred years after “carcinoid”: Epidemiology of and prognostic factors for neuroendocrine tumors in 35,825 cases in the United States. J. Clin. Oncol. 2008, 26, 3063–3072.
4. Hainsworth, J.D.; Greco, F.A.; Strosberg, J.R. Neuroendocrine Neoplasms of Unknown Primary Site. 2024. Available online: https://www.uptodate.com/contents/neuroendocrine-neoplasms-of-unknown-primary-site (accessed on 25 March 2026).
5. Pellat, A.; Cottereau, A.S.; Terris, B.; Coriat, R. Neuroendocrine carcinomas of the digestive tract: What is new? Cancers 2021, 13, 3766.
6. Strosberg, J.; El-Haddad, G.; Wolin, E.; Hendifar, A.; Yao, J.; Chasen, B.; Mittra, E.; Kunz, P.L.; Kulke, M.H.; Jacene, H.; et al. Phase 3 trial of 177Lu-Dotatate for midgut neuroendocrine tumors. N. Engl. J. Med. 2017, 376, 125–135.
7. Singh, S.; Halperin, D.; Myrehaug, S.; Herrmann, K.; Pavel, M.; Kunz, P.L.; Chasen, B.; Tafuto, S.; Lastoria, S.; Capdevila, J.; et al. [177Lu]Lu-DOTA-TATE plus long-acting octreotide versus high-dose long-acting octreotide for the treatment of newly diagnosed, advanced grade 2–3, well-differentiated, gastroenteropancreatic neuroendocrine tumours (NETTER-2): An open-label, randomised, phase 3 study. Lancet 2024, 403, 2807–2817.
8. Wang, L.F.; Lin, L.; Wang, M.J.; Li, Y. The therapeutic efficacy of 177Lu-DOTATATE/DOTATOC in advanced neuroendocrine tumors: A meta-analysis. Medicine 2020, 99, e19304.
9. Daskalakis, K.; Tsoli, M.; Wallin, G.; Kogut, A.; Srirajaskanthan, R.; Harlow, C.; Giovos, G.; Weickert, M.O.; Kos-Kudla, B.; Kaltsas, G. Modified histopathological grading optimizes prediction of survival outcomes in small intestinal neuroendocrine tumors. J. Clin. Endocrinol. Metab. 2024, 109, e2222–e2230.
10. Knigge, U.; Capdevila, J.; Bartsch, D.; Baudin, E.; Falkerby, J.; Kianmanesh, R.; Kos-Kudla, B.; Niederle, B.; Nieveen van Dijkum, E.; O’Toole, D.; et al. ENETS consensus recommendations for the standards of care in neuroendocrine neoplasms: Follow-up and documentation. Neuroendocrinology 2017, 105, 310–319.
11. Ruhwedel, T.; Rogasch, J.M.; Huang, K.; Jann, H.; Schatka, I.; Furth, C.; Amthauer, H.; Wetz, C. The prognostic value of the De Ritis ratio for progression-free survival in patients with NET undergoing [177Lu]Lu-DOTATOC-PRRT: A retrospective analysis. Cancers 2021, 13, 635.
12. Ruhwedel, T.; Rogasch, J.; Schatka, I.; Galler, M.; Steinhagen, P.; Wetz, C.; Amthauer, H. Beyond similarities: Overall survival and prognostic insights from [¹⁷⁷Lu]Lu-DOTATOC therapy in neuroendocrine tumors. Eur. J. Nucl. Med. Mol. Imaging 2025, 52, 3662–3671.
13. Werner, R.A.; Lapa, C.; Ilhan, H.; Higuchi, T.; Buck, A.K.; Lehner, S.; Bartenstein, P.; Bengel, F.; Schatka, I.; Muegge, D.O.; et al. Survival prediction in patients undergoing radionuclide therapy based on intratumoral somatostatin-receptor heterogeneity. Oncotarget 2016, 8, 7039.
14. Wetz, C.; Genseke, P.; Apostolova, I.; Furth, C.; Ghazzawi, S.; Rogasch, J.M.; Schatka, I.; Kreissl, M.C.; Hofheinz, F.; Grosser, O.S.; et al. The association of intra-therapeutic heterogeneity of somatostatin receptor expression with morphological treatment response in patients undergoing PRRT with [177Lu]-DOTATATE. PLoS ONE 2019, 14, e0216781.
15. Laudicella, R.; Comelli, A.; Liberini, V.; Vento, A.; Stefano, A.; Spataro, A.; Crocè, L.; Baldari, S.; Michelangelo, B.; Deandreis, D.; et al. [68Ga] DOTATOC PET/CT radiomics in the prediction of response in GEP-NETs undergoing [177Lu] DOTATOC PRRT: The “Theragnomics” concept. Cancers 2022, 14, 984.
16. Acosta, J.N.; Falcone, G.J.; Rajpurkar, P.; Topol, E.J. Multimodal biomedical AI. Nat. Med. 2022, 28, 1773–1784.
17. Baur, S.; Benova, A.; Cantú, E.D.; Ma, J. On the effectiveness of multimodal privileged knowledge distillation in two vision transformer based diagnostic applications. arXiv 2025, arXiv:2508.06558.
18. Rajpurkar, P.; Chen, E.; Banerjee, O.; Topol, E.J. AI in health and medicine. Nat. Med. 2022, 28, 31–38.
19. Wetz, C.; Ruhwedel, T.; Schatka, I.; Grabowski, J.; Jann, H.; Metzger, G.; Galler, M.; Amthauer, H.; Rogasch, J.M. Plasma markers for therapy response monitoring in patients with neuroendocrine tumors undergoing peptide receptor radionuclide therapy. Cancers 2023, 15, 5717.
20. Surti, S.; Kuhn, A.; Werner, M.E.; Perkins, A.E.; Kolthammer, J.; Karp, J.S. Performance of Philips Gemini TF PET/CT scanner with special consideration for its time-of-flight imaging capabilities. J. Nucl. Med. 2007, 48, 471–480.
21. Vandendriessche, D.; Uribe, J.; Bertin, H.; De Geeter, F. Performance characteristics of silicon photomultiplier based 15-cm AFOV TOF PET/CT. EJNMMI Phys. 2019, 6, 8.
22. Brabander, T.; van der Zwan, W.A.; Teunissen, J.J.; Kam, B.L.; de Herder, W.W.; Feelders, R.A.; Krenning, E.P.; Kwekkeboom, D.J. Pitfalls in the response evaluation after peptide receptor radionuclide therapy with [177Lu-DOTA0,Tyr3]octreotate. Endocr.-Relat. Cancer 2017, 24, 243–251.
23. Baudin, E.; Walter, T.; Docao, C.; Haissaguerre, M.; Hadoux, J.; Taieb, D.; Ansquer, C.; Dierickx, L.; De Mestier, L.; Deshayes, E.; et al. First multicentric randomized phase II trial investigating the antitumor efficacy of peptide receptor radionuclide therapy with 177Lutetium–Octreotate (OCLU) in unresectable progressive neuroendocrine pancreatic tumor: Results of the OCLURANDOM trial, On behalf of the ENDOCAN RENATEN network and GTE. Ann. D’Endocrinol. 2022, 83, 289–290.
24. Wasserthal, J.; Breit, H.C.; Meyer, M.T.; Pradella, M.; Hinck, D.; Sauter, A.W.; Heye, T.; Boll, D.T.; Cyriac, J.; Yang, S.; et al. TotalSegmentator: Robust segmentation of 104 anatomic structures in CT images. Radiol. Artif. Intell. 2023, 5, e230024.
25. McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv 2018, arXiv:1802.03426.
26. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; JMLR Workshop and Conference Proceedings. pp. 249–256.
27. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166.
28. Hochreiter, S. Untersuchungen zu dynamischen neuronalen Netzen. Diploma Technol. Univ. München 1991, 91, 31.
29. Ceni, A. Random orthogonal additive filters: A solution to the vanishing/exploding gradient of deep neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 10794–10807.
30. Aalbersberg, E.A.; Vries–Huizing, D.M.V.d.; Walraven, I.; Veen, B.J.d.W.d.; Kulkarni, H.; Singh, A.; Stokkel, M.P.M.; Baum, R.P. Parameters to predict progression-free and overall survival after peptide receptor radionuclide therapy: A multivariate analysis in 782 patients. J. Nucl. Med. 2019, 60, 1259–1265.
31. Schmidt, B.C.; Leiderer, M.T.; Amin, T.; Viol, F.; Huber, S.; Henes, F.O.; Schrader, J. Does gamma-glutamyltransferase correlate with liver tumor burden in neuroendocrine tumors? Endocrine 2024, 83, 511–518.
32. Jiang, C.; Wang, K.; Yan, L.; Yao, H.; Shi, H.; Lin, R. Predicting the survival of patients with pancreatic neuroendocrine neoplasms using deep learning: A study based on Surveillance, Epidemiology, and End Results database. Cancer Med. 2023, 12, 12413–12424.
33. Gao, F.; Chen, J.; Xu, X. Machine learning predicts prognosis in patients with gastroenteropancreatic neuroendocrine tumors with liver metastases. Discov. Oncol. 2025, 16, 743.
34. Opalińska, M.; Morawiec-Sławek, K.; Kania-Kuc, A.; Al Maraih, I.; Sowa-Staszczak, A.; Hubalewska-Dydejczyk, A. Potential value of pre-and post-therapy [68Ga] Ga-DOTA-TATE PET/CT in the prognosis of response to PRRT in disseminated neuroendocrine tumors. Front. Endocrinol. 2022, 13, 929391.
35. Pavel, M.; Dromain, C.; Ronot, M.; Schaefer, N.; Mandair, D.; Gueguen, D.; Elvira, D.; Jégou, S.; Balazard, F.; Dehaene, O.; et al. The use of deep learning models to predict progression-free survival in patients with neuroendocrine tumors. Future Oncol. 2023, 19, 2185–2199.

Figure 1. (Top) Overview of the proposed deep learning pipeline for PRRT response prediction. The model integrates 3D PET and 3D CT scans processed through separate 3D convolutional networks, along with laboratory biomarkers, via a concatenation-based fusion layer. The fused features are passed through fully connected layers to generate the prediction output. Solid arrows indicate forward pass flow; dashed arrows indicate post hoc explainability analyses derived via backpropagation, including gradient-based saliency maps (top right), UMAP-based visualization of learned feature embeddings (bottom center), and laboratory biomarker feature importance (bottom right). (Bottom) Overview of input modality combinations evaluated in our experiments.
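The concatenation-based fusion described in the Figure 1 caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: `encode_volume` is a deliberately simple placeholder standing in for the 3D convolutional networks, and the embedding size (16), the number of laboratory values (5), the hidden width (32), and the random weights are all assumptions chosen for illustration.

```python
import numpy as np


def encode_volume(volume, weights):
    """Placeholder for a 3D CNN encoder (NOT the authors' network).

    Averages the volume over its 8 octants, then applies a linear map + ReLU.
    """
    d, h, w = (s // 2 for s in volume.shape)
    octant_means = np.array([
        volume[i:i + d, j:j + h, k:k + w].mean()
        for i in (0, d) for j in (0, h) for k in (0, w)
    ])
    return np.maximum(weights @ octant_means, 0.0)


def fused_prediction(pet, ct, labs, params):
    """Concatenate both image embeddings with the lab vector, then apply dense layers."""
    fused = np.concatenate([
        encode_volume(pet, params["pet"]),
        encode_volume(ct, params["ct"]),
        labs,
    ])
    hidden = np.maximum(params["fc1"] @ fused, 0.0)  # fully connected layer + ReLU
    logit = params["fc2"] @ hidden                   # scalar output
    return float(1.0 / (1.0 + np.exp(-logit)))       # sigmoid -> probability


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
params = {
    "pet": rng.normal(0, 0.1, (16, 8)),
    "ct": rng.normal(0, 0.1, (16, 8)),
    "fc1": rng.normal(0, 0.1, (32, 16 + 16 + 5)),
    "fc2": rng.normal(0, 0.1, 32),
}
pet = rng.random((16, 16, 16))  # stand-in for a preprocessed PET volume
ct = rng.random((16, 16, 16))   # stand-in for a preprocessed CT volume
labs = rng.normal(size=5)       # stand-in for standardized laboratory values
y_hat = fused_prediction(pet, ct, labs, params)
```

The point of the sketch is only the data flow: each modality is encoded separately, the embeddings are joined into one vector, and the prediction is computed from that joint vector, so the dense layers can model interactions between imaging and laboratory features.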

Figure 2. Kaplan–Meier curve for progression-free survival (PFS) in the total study cohort (n = 116) of patients with neuroendocrine tumors treated with [¹⁷⁷Lu]Lu-DOTATOC PRRT. The vertical dashed red line indicates the split into high and low therapy response. No censored patients were included, and all patients eventually showed progression.

Figure 3. Barplot comparison of AUROC and AUPRC across models.

Figure 4. Comparison of predictive performance between the Random Forest baseline (laboratory values only) and the PET-CT Fusion model, shown for a single representative cross-validation fold. (Left): ROC curves showing True Positive Rate versus False Positive Rate; the dashed gray line represents a random baseline. (Right): Precision–Recall curves illustrating the trade-off between precision and recall; the dashed gray line indicates the all-positive baseline. The PET-CT Fusion model outperforms the Random Forest baseline, as reflected in higher AUROC and AUPRC values. For cross-validation metrics, refer to Table 2.
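For readers less familiar with the two reference lines in Figure 4, the following sketch shows how AUROC can be computed as a pairwise ranking probability and why the all-positive baseline in a precision–recall plot equals the prevalence of the positive class. The labels and scores are invented for illustration and are not study data.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def all_positive_precision(labels):
    """Precision of a classifier that labels every patient as positive."""
    return sum(labels) / len(labels)


# Invented example: 3 positives, 2 negatives.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2]
example_auroc = auroc(labels, scores)               # 5 of 6 pairs ranked correctly
example_baseline = all_positive_precision(labels)   # prevalence of the positive class
```

An AUROC of 0.5 corresponds to random ranking, so a value below 0.5 (such as the 0.42 reported for the PET Only model in Table 2) means the ranking is, on average, slightly worse than chance.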

Figure 5. Kaplan–Meier curves of the study cohort, stratified by our model's predicted output probabilities $\hat{y}$ (low PFS: $\hat{y} < 0.5$; high PFS: $\hat{y} \geq 0.5$). Note that as we did not include censored patients, and all patients in our cohort eventually showed progression, the y-axis represents the proportion of progression patients at a given time. Log-rank test: $p = 0.0001$.
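The stratification behind Figure 5 can be sketched in a few lines. Without censoring, the Kaplan–Meier estimator reduces to the empirical fraction of patients still progression-free after time t, which is what `survival_fraction` computes (assuming the curve follows the usual survival convention). Only the 0.5 threshold comes from the caption; the PFS times and predicted probabilities below are invented.

```python
def survival_fraction(pfs_months, t):
    """Fraction of patients still progression-free after time t (no censoring)."""
    return sum(1 for x in pfs_months if x > t) / len(pfs_months)


def stratify(pfs_months, y_hat, threshold=0.5):
    """Split PFS times into predicted low-PFS and high-PFS groups."""
    low = [t for t, p in zip(pfs_months, y_hat) if p < threshold]
    high = [t for t, p in zip(pfs_months, y_hat) if p >= threshold]
    return low, high


# Invented example data (months to progression, model output probability).
pfs_months = [4, 9, 14, 20, 30, 7]
y_hat = [0.2, 0.4, 0.7, 0.9, 0.6, 0.5]

low, high = stratify(pfs_months, y_hat)
low_at_12 = survival_fraction(low, 12)    # share of low group progression-free at 12 months
high_at_12 = survival_fraction(high, 12)  # share of high group progression-free at 12 months
```

Evaluating `survival_fraction` on a grid of time points for each group yields the two step curves; a log-rank test would then compare them, which is outside the scope of this sketch.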

Figure 6. UMAP projection of learned feature embeddings from different fusion strategies. (Row 1): Embeddings derived from PET imaging alone. The classes are intermixed, indicating that PET imaging alone does not discriminate well between them. (Row 2): Embeddings derived from CT imaging alone. As with PET imaging alone, the embeddings are scattered with no clear class distinction. Notably, the pretrained CT model displays better clustered embeddings, due to prior exposure to CT imaging. (Row 3): Fusion embeddings combining PET, CT, and laboratory biomarkers reveal markedly improved class separation, with tighter intra-class clustering and clearer inter-class boundaries. This illustrates the synergistic effect of multimodal integration in capturing disease-related variation that is not apparent in single-modality embeddings.

Figure 7. Feature importance of selected laboratory biomarkers across different fusion strategies and a Random Forest baseline. Importance values were computed using a permutation-based approach, with higher values indicating stronger contribution to model predictions. Four biomarkers—ALT, AST, CgA, and Gamma-GT—consistently ranked among the most relevant features across all model configurations. CgA exhibited the highest importance in both PET-CT fusion models, suggesting a strong association with the target outcome when combined with imaging-derived features.
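The permutation-based approach mentioned in the Figure 7 caption can be sketched as follows. This is a generic illustration of the technique, not the authors' code: the toy model, the accuracy metric, the number of repeats, and the data are all assumptions. The idea is to shuffle one feature column at a time and record how much the model's score drops.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Invented toy setup: the "model" looks only at feature 0 and ignores feature 1.
def toy_model(row):
    return int(row[0] > 500)


X = [[821, 30], [262, 28], [900, 25], [100, 40], [700, 33], [50, 29]]
y = [1, 0, 1, 0, 1, 0]
importances = permutation_importance(toy_model, X, y)
```

In this toy setup, shuffling the ignored feature cannot change any prediction, so its importance is exactly zero, while shuffling the feature the model relies on lowers accuracy. For a fusion model, the same procedure would be applied to the laboratory inputs while the imaging inputs are held fixed.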

Figure 8. Visual and quantitative comparison of global gradient distributions for PET Only and PET Fusion models. (a) Comparison of gradient magnitude distributions between PET Only and PET Fusion models. (b) Quantitative comparison of gradient distributions between PET Only and PET Fusion models across multiple distance metrics.

Figure 9. Example PET scan and comparison of gradient maps for a single sample between PET Only and PET Fusion models. Panels (b,c) show gradient heatmap overlays; (d,e) show the corresponding gradient distributions. Gradient maps were filtered with v_min = 0.3, omitting smaller gradients.
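The filtering step mentioned in the Figure 9 caption can be sketched as below. The caption only states that gradients below v_min = 0.3 were omitted; that the gradient magnitudes are first scaled to the range [0, 1] by their maximum is an assumption made here so that the threshold has a well-defined meaning.

```python
import numpy as np


def filter_gradient_map(grad, v_min=0.3):
    """Scale |gradient| to [0, 1] (assumed normalization) and zero out values below v_min."""
    magnitude = np.abs(grad)
    peak = magnitude.max()
    normalized = magnitude / peak if peak > 0 else magnitude
    return np.where(normalized >= v_min, normalized, 0.0)


# Tiny invented 3D gradient volume (1 x 2 x 2 voxels).
grad = np.array([[[0.1, -0.5], [1.0, -2.0]]])
filtered = filter_gradient_map(grad)  # only the two strongest voxels survive
```

The surviving voxels would then be rendered as a heatmap overlay on the PET volume, while the suppressed voxels stay transparent.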

Table 1. Summary of patient characteristics stratified by PFS. Categorical variables were compared with Fisher’s exact test, while continuous variables were compared with the Wilcoxon rank-sum (Mann–Whitney) test. Values are given as counts with percentages or as median (min–max). “Functionality” refers to the presence of clinically relevant hormone secretion by NETs. Parts of the presented cohort overlap with previously published studies [11,19]. See Section 2 (Methods) for further details.

Metric	Total	PFS ≤ 1 year	PFS > 1 year	p-Value
Patient Statistics
Patient count	116 (100%)	42 (36%)	74 (64%)
Age in years	66 (36–87)	66 (36–87)	66 (36–80)	0.945
Male	68 (59%)	23 (55%)	45 (61%)	0.560
Female	48 (41%)	19 (45%)	29 (39%)	0.560
Primary Location
Small intestine	49 (42%)	19 (45%)	30 (41%)	0.697
Pancreas	34 (29%)	8 (19%)	26 (35%)	0.090
Colon/rectum	12 (10%)	4 (10%)	8 (11%)	1.000
Stomach	1 (1%)	0 (0%)	1 (1%)	1.000
CUP	20 (17%)	11 (26%)	9 (12%)	0.073
Metastatic Spread
Hepatic	85 (73%)	28 (67%)	57 (77%)	0.276
Lymphonodal	75 (65%)	26 (62%)	49 (66%)	0.689
Osseous	35 (30%)	12 (29%)	23 (31%)	0.836
Peritoneal	19 (16%)	6 (14%)	13 (18%)	0.796
Pulmonal	5 (4%)	1 (2%)	4 (5%)	0.652
Functionality
Yes	40 (34%)	19 (45%)	21 (28%)	0.072
No	75 (65%)	22 (52%)	53 (72%)	0.045
Unknown	1 (1%)	1 (2%)	0 (0%)	0.362
Grading
G1	23 (20%)	8 (19%)	15 (20%)	1.000
G2	82 (71%)	29 (69%)	53 (72%)	0.833
G3	6 (5%)	2 (5%)	4 (5%)	1.000
Unknown	5 (4%)	3 (7%)	2 (3%)	0.351
Ki67 index %	5 (1–40)	5 (1–25)	5 (1–40)	0.501
Laboratory Parameters
CgA in μg/L	419 (24–99,590)	821 (25–99,590)	262 (24–15,100)	0.001
AST in U/L	28 (14–139)	32 (14–123)	28 (14–139)	0.123
ALT in U/L	28 (7–132)	27 (7–96)	28 (10–132)	0.774
γ-GT in U/L	61 (9–691)	95 (21–688)	50 (9–691)	0.014
De Ritis ratio	1.12 (0.46–3.43)	1.16 (0.46–3.43)	1.07 (0.53–2.87)	0.223
PRRT cycles	4 (1–7)	2 (1–4)	4 (1–7)	<0.001

Table 2. Performance metrics (AUROC and AUPRC) for all models. Values are reported as the mean ± standard error. Bolded values indicate the best performance in each metric. Significance markers denote statistical improvement over the next lower model family: * p < 0.01 (vs. unimodal); † p < 0.01 (vs. one-image fusion). All significant differences correspond to large effect sizes (Cliff’s $\delta > 0.8$).

Model	AUROC	AUPRC
RF (laboratory values only)	0.59 ± 0.02	0.67 ± 0.01
PET Only	0.42 ± 0.03	0.58 ± 0.03
CT Only	0.54 ± 0.01	0.57 ± 0.03
PET Fusion	0.68 ± 0.01 *	0.80 ± 0.01 *
CT Fusion	0.62 ± 0.03 *	0.72 ± 0.04 *
PET-CT Fusion	0.69 ± 0.01 †	0.80 ± 0.01
PET-CT Fusion (pretrained CT)	0.72 ± 0.01 †	0.80 ± 0.02
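The two summary statistics named in the Table 2 caption can be computed as in the sketch below. The per-fold scores are invented, and the use of five folds is an assumption; only the definitions of the standard error of the mean and of Cliff's δ are standard.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Mean and standard error of the mean (sample standard deviation / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b) over all pairs, ranging from -1 to 1."""
    greater = sum(x > y for x in a for y in b)
    less = sum(x < y for x in a for y in b)
    return (greater - less) / (len(a) * len(b))


# Invented per-fold AUROC values for two hypothetical model families.
fusion_scores = [0.70, 0.66, 0.69, 0.67, 0.68]
unimodal_scores = [0.45, 0.40, 0.38, 0.44, 0.43]

fusion_mean, fusion_se = mean_and_standard_error(fusion_scores)
delta = cliffs_delta(fusion_scores, unimodal_scores)  # every fusion fold beats every unimodal fold
```

A δ above 0.8, as reported in the caption, means that in the large majority of pairwise comparisons the higher-ranked model family scored better than the lower-ranked one.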

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Baur, S.; Ruhwedel, T.; Böke, E.; Kobus, Z.; Lishkova, G.; Wetz, C.; Amthauer, H.; Roderburg, C.; Tacke, F.; Rogasch, J.M.; et al. Multimodal Deep Learning for Prediction of Progression-Free Survival in Patients with Neuroendocrine Tumors Undergoing ¹⁷⁷Lu-Based Peptide Receptor Radionuclide Therapy. Cancers 2026, 18, 1194. https://doi.org/10.3390/cancers18081194
