Review Reports - Multimodal Deep Learning for Prediction of Progression-Free Survival in Patients with Neuroendocrine Tumors Undergoing 177Lu-Based Peptide Receptor Radionuclide Therapy

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Thanks to the authors for the interesting paper. I agree that by combining many data sources, including imaging, pathology, and genetics, multimodal deep learning enhances progression-free survival prediction, resulting in higher accuracy than unimodal models.

My comments:

What will the authors say about the pharmacokinetics of the imaging radiotracer and the therapeutic radiotracer? The peptide is the same, but the radionuclides are different. Will this cause a huge or a subtle difference in the biodistribution? The authors have indicated that NETs are expressed in different organs. My question is: what are the limiting organs for free Ga-68 and Lu-177, and could this result in false positives, especially in the imaging?

Which aspect of PRRT will lead to elevated AST/ALT? I know these markers can be triggered by many activities, including overconsumption of alcohol and many others.

Table 1: Pancreatic NETs sometimes make hormones such as gastrin, insulin, glucagon, and vasoactive intestinal peptide. My question is, did the authors consider all these hormones, or were there some specific primary hormones of interest? I'm looking at the functionality test.

Thanks to the authors for Figure 1. It gives a clear overview of the response prediction.

Figure 6: I know UMAP uses stochastic gradient descent. Did the authors make different runs on the same data for different visualizations, or was a fixed random seed set?

Author Response

Response to Reviewers
Multimodal Deep Learning for Prediction of Progression-Free Survival in Patients with Neuroendocrine Tumors Undergoing 177Lu-based Peptide Receptor Radionuclide Therapy
Manuscript ID: cancers-4219706
March 20, 2026

We sincerely thank the reviewer for their constructive and thoughtful comments, which have helped to strengthen our manuscript. Below, we provide point-by-point responses to each comment.

Comment 1
What will the authors say about the pharmacokinetics of the imaging radiotracer and the therapeutic radiotracer? The peptide is the same, but the radionuclides are different. Will this cause a huge or a subtle difference in the biodistribution? The authors have indicated that NETs are expressed in different organs. My question is: what are the limiting organs for free Ga-68 and Lu-177, and could this result in false positives, especially in the imaging?

Answer 1
We thank the reviewer for the question. Both [68Ga]Ga-DOTATOC and [177Lu]Lu-DOTATOC share the same peptide (DOTATOC) and bind predominantly to SSTR2. The receptor-mediated biodistribution is therefore broadly comparable. Physiological uptake of 68Ga-labeled somatostatin analogues is highest in the spleen, followed by the kidneys, adrenals, pituitary, and liver (Pauwels et al., Eur J Nucl Med Mol Imaging 2020;47:3033–3046, doi:10.1007/s00259-020-04918-4). Urinary excretion leads to prominent bladder activity. For 177Lu-based PRRT, the kidneys are the primary dose-limiting organ due to glomerular filtration, proximal tubular reabsorption, and interstitial retention of the radiopeptide, with the bone marrow as a secondary dose-limiting organ (Erbas & Tuncel, Semin Nucl Med 2016;46:462–478, doi:10.1053/j.semnuclmed.2016.04.006). The key difference between the two tracers lies in the physical half-life of the radionuclide (68 min for 68Ga vs. 6.7 days for 177Lu), which results in substantially higher cumulative absorbed doses to receptor-positive organs and kidneys during PRRT compared with the diagnostic PET scan.
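To put the half-life difference in numbers, a minimal sketch can compute the time-integrated activity (total decays per unit administered activity) under physical decay alone, using the half-lives quoted above. It deliberately ignores biological uptake and washout, the different emission types and energies of the two nuclides, and the much larger administered activity in PRRT, so the resulting ratio is only the physical-decay factor and not an absorbed-dose ratio. The function and constant names are our own, hypothetical choices.

```python
import math

# Half-lives as quoted in the answer above (physical decay only).
GA68_HALF_LIFE_H = 68 / 60        # 68 min, expressed in hours
LU177_HALF_LIFE_H = 6.7 * 24      # 6.7 days, expressed in hours

def time_integrated_activity(a0_mbq: float, half_life_h: float,
                             t_end_h: float = math.inf) -> float:
    """Integral of A0 * exp(-lambda * t) from 0 to t_end, in MBq*h.

    Physical decay only: biological uptake and clearance are ignored.
    """
    lam = math.log(2) / half_life_h   # decay constant in 1/h
    if math.isinf(t_end_h):
        return a0_mbq / lam           # closed form for complete decay
    return a0_mbq / lam * (1.0 - math.exp(-lam * t_end_h))

# Decays per MBq administered, 177Lu relative to 68Ga.
ratio = (time_integrated_activity(1.0, LU177_HALF_LIFE_H)
         / time_integrated_activity(1.0, GA68_HALF_LIFE_H))
print(f"177Lu / 68Ga time-integrated activity per MBq: {ratio:.0f}x")
```

For complete decay the ratio reduces to the ratio of half-lives, roughly 142 with the values above; per-organ dosimetry would additionally require the biological kinetics that this sketch leaves out.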
Regarding the potential for false positives: physiological uptake in SSTR-expressing organs is a well-recognized pattern on 68Ga-DOTA-PET and is generally distinguishable from pathological uptake by experienced readers. Importantly, our deep learning model uses imaging as one input alongside laboratory biomarkers, and the explainability analysis (Figures 8–9) confirmed that the fusion model focused predominantly on tumor regions rather than on sites of physiological tracer accumulation.

Comment 2
Which aspect of PRRT will lead to elevated AST/ALT? I know these markers can be triggered by many activities, including overconsumption of alcohol and many others.

Answer 2
We thank the reviewer for this question. Importantly, all baseline laboratory values in our study, including AST and ALT, were measured within four weeks before the first PRRT cycle. These values therefore reflect the patients’ pre-treatment baseline status and are not related to any effects of PRRT itself. We agree with the reviewer that AST and ALT are non-specific markers that can be elevated due to a wide range of conditions, including pre-existing liver disease, metabolic disorders, medication effects, or hepatic tumor involvement, among others. Given that 73% of our cohort presented with hepatic metastases, hepatic disease burden may have contributed to the observed elevations in some patients, though we cannot determine the exact underlying cause on an individual level. Regardless of their etiology, Ruhwedel et al. (reference [11]) demonstrated that the De Ritis ratio (AST/ALT) carries prognostic value in NET patients undergoing PRRT, suggesting that these parameters reflect clinically relevant disease characteristics. In the context of our predictive model, their value lies in providing a readily available baseline signal that, when combined with imaging features, contributes to the prognostic information captured by the multimodal framework.
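For completeness, the De Ritis ratio mentioned above is simply AST divided by ALT, both in the same unit. The short sketch below only computes the ratio; it implies no prognostic cut-off, the input values are hypothetical, and whether the model received AST and ALT separately or as a ratio is not stated in this answer.

```python
def de_ritis_ratio(ast_u_per_l: float, alt_u_per_l: float) -> float:
    """De Ritis ratio = AST / ALT, both activities in the same unit (e.g. U/L)."""
    if alt_u_per_l <= 0:
        raise ValueError("ALT must be positive to form the ratio")
    return ast_u_per_l / alt_u_per_l

# Hypothetical baseline values, not taken from the study cohort.
print(de_ritis_ratio(45.0, 30.0))  # 1.5
```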

Comment 3
Table 1: Pancreatic NETs sometimes make hormones such as gastrin, insulin, glucagon, and vasoactive intestinal peptide. My question is, did the authors consider all these hormones, or were there some specific primary hormones of interest? I’m looking at the functionality test.

Answer 3
We thank the reviewer for this question. In our study, “functionality” refers to the presence of a clinically manifest hormonal syndrome (e.g., carcinoid syndrome, insulinoma, gastrinoma), as documented in patient records. We did not systematically measure individual hormone levels (gastrin, insulin, glucagon, VIP) across the cohort. Specific hormonal subtypes were not included as model features because (1) they were not uniformly available, (2) individual syndrome subgroups were too small for meaningful model training, and (3) our aim was to integrate broadly available biomarkers with imaging. We acknowledge this as a limitation.

Comment 4
Figure 6: I know UMAP uses stochastic gradient descent. Did the authors make different runs on the same data for different visualizations, or was a fixed random seed set?

Answer 4
We thank the reviewer for this question. All UMAP visualizations presented in Figure 6 were computed with the same fixed random seed. The embeddings were extracted from a single cross-validation fold, both before and after the fusion layer, so that the structural differences visible in the projections directly reflect the effect of the fusion mechanism. Fixing the seed across all UMAP computations ensures that differences between the projections stem from the learned representations rather than from run-to-run variation of the stochastic UMAP optimization.
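Why a fixed seed matters can be shown with a toy stochastic optimizer. The sketch below is not UMAP; it is a hypothetical one-parameter stochastic gradient descent that samples one data point per step, which is enough to show that runs with the same seed are identical while runs with different seeds are not. In the umap-learn Python package, for example, the seed is passed as `umap.UMAP(random_state=42)`; the implementation and seed value used by the authors are not stated here.

```python
import random

def toy_sgd(data, steps=200, lr=0.05, seed=None):
    """Fit a single value x to `data` by SGD on squared error.

    Each step draws one random data point, so the trajectory (and the final
    x) depends entirely on the random number generator's state.
    """
    rng = random.Random(seed)          # independent, seedable generator
    x = 0.0
    for _ in range(steps):
        sample = rng.choice(data)
        x -= lr * 2.0 * (x - sample)   # gradient of (x - sample) ** 2
    return x

data = [1.0, 2.0, 3.0, 10.0]
run_a = toy_sgd(data, seed=42)
run_b = toy_sgd(data, seed=42)
print(run_a == run_b)  # True: same seed, identical result
```

A fixed seed makes repeated runs on the same input reproducible; it does not align embeddings computed from different inputs (such as features before and after fusion), whose layouts can still differ in orientation and position, so such panels are best compared on cluster structure.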


Reviewer 2 Report

Comments and Suggestions for Authors

The authors trained a deep learning model on 116 patients treated with [177Lu]Lu-DOTATOC, using laboratory biomarkers and PET/CT imaging acquired before therapy. The model integrates multiple sources of information to predict PFS, which supports precision medicine. The application is novel and has strong clinical meaning.

The sample size is very small, and models are prone to overfitting; how did the authors address this? Could you describe the regularization techniques employed?
Some of the text in Figure 1 is extremely small, and the content is unclear.
How was feature importance evaluated? Can you elaborate on the details?
Could the authors replace panels (a)-(c) in Figure 9 with clearer versions and explain the results in Figure 9?

Author Response

We sincerely thank the reviewer for their constructive and thoughtful comments, which have helped to strengthen our manuscript. Below, we provide point-by-point responses to each comment.

Comment 1
The sample size is very small, and models are prone to overfitting; how did the authors address this? Could you describe the regularization techniques employed?

Answer 1
We thank the reviewer for raising this important point. We agree that the relatively small sample size (n = 116) poses a risk of overfitting, and we took several measures to mitigate this.
Regarding regularization, all models were trained with a dropout rate of 0.1 and a weight decay of 0.2 in the Adam optimizer. These techniques constrain model complexity and reduce reliance on individual neurons or weights, thereby limiting overfitting. The high weight decay was a particularly important factor: in experimental runs with low weight decay values, training tended to collapse frequently due to overfitting.
We have included a discussion of these points in Section 2.5 (Deep Learning Models) of the manuscript. We further acknowledge that external validation in larger, multicenter cohorts remains an important next step, as stated in the Limitations and Conclusions sections.
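The two regularizers named above can be illustrated without a deep learning framework. The sketch assumes PyTorch-style semantics, in which Adam's `weight_decay` adds `weight_decay * param` to the gradient (coupled L2 regularization) and dropout rescales surviving activations by 1/(1 - p) during training. The answer does not say whether coupled or decoupled (AdamW-style) weight decay was used, so that choice, the learning rate, and the function names are assumptions.

```python
import math
import random
from typing import Dict, List, Sequence

def inverted_dropout(values: Sequence[float], p: float, rng: random.Random) -> List[float]:
    """Training-time dropout: zero each activation with probability p and
    rescale survivors by 1 / (1 - p) so the expected activation is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]

def adam_step(params: List[float], grads: List[float], state: Dict,
              lr: float = 1e-3, betas=(0.9, 0.999), eps: float = 1e-8,
              weight_decay: float = 0.2) -> List[float]:
    """One Adam update with coupled L2 weight decay (weight_decay * param is
    added to the gradient before the moment estimates are updated)."""
    b1, b2 = betas
    state["t"] += 1
    t = state["t"]
    updated = []
    for i, (p, g) in enumerate(zip(params, grads)):
        g = g + weight_decay * p                      # L2 penalty gradient
        state["m"][i] = b1 * state["m"][i] + (1 - b1) * g
        state["v"][i] = b2 * state["v"][i] + (1 - b2) * g * g
        m_hat = state["m"][i] / (1 - b1 ** t)         # bias correction
        v_hat = state["v"][i] / (1 - b2 ** t)
        updated.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return updated

rng = random.Random(0)
activations = inverted_dropout([1.0] * 10, p=0.1, rng=rng)

# With a zero data gradient, weight decay alone pulls both weights toward zero.
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
params = adam_step([1.0, -2.0], [0.0, 0.0], state)
print(params)
```

With a weight decay as large as 0.2, the penalty term can dominate the data gradient for small-gradient weights, which is one way to read the observation that low weight decay led to collapsing runs.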

Comment 2
Some of the text in Figure 1 is extremely small, and the content is unclear.

Answer 2
We thank the reviewer for this comment and have improved the readability of Figure 1 in the revised manuscript by increasing font sizes, clarifying the layout, and adding a legend.
We briefly clarify the figure content here. Figure 1 consists of two parts. The upper panel illustrates the overall deep learning pipeline: 3D PET and 3D CT scans are each processed through dedicated 3D convolutional branches, and the resulting image embeddings are concatenated with laboratory biomarkers at a fusion layer. The fused representation is then passed through fully connected layers to generate the final PRRT response prediction. Solid arrows represent the forward pass, while dashed arrows indicate the post-hoc backward analyses used for explainability, namely gradient-based saliency maps, feature importance, and UMAP-based embedding visualization. The lower panel provides a concise tabular overview of the seven input modality combinations evaluated in our experiments, ranging from unimodal setups (laboratory only, PET only, CT only) to full multimodal fusion with and without CT pretraining.
We have reworked the figure caption to clarify the structure and purpose of this figure in the updated manuscript.
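The concatenation-based fusion in the upper panel and the seven configurations in the lower panel can be summarized in a short sketch. The configuration names follow the model names used in this response; the mapping of modality flags to configurations is our reading of the text, and the embedding sizes and function names are hypothetical placeholders for the outputs of the 3D convolutional branches, not the authors' code.

```python
from typing import Dict, List, Sequence

# Our reading of the seven input configurations (hypothetical flags).
# "ct_pretrained" only changes how the CT branch is initialized,
# not what is concatenated at the fusion layer.
CONFIGS: Dict[str, Dict[str, bool]] = {
    "Laboratory Only":            {"pet": False, "ct": False, "lab": True,  "ct_pretrained": False},
    "PET Only":                   {"pet": True,  "ct": False, "lab": False, "ct_pretrained": False},
    "CT Only":                    {"pet": False, "ct": True,  "lab": False, "ct_pretrained": False},
    "PET Fusion":                 {"pet": True,  "ct": False, "lab": True,  "ct_pretrained": False},
    "CT Fusion":                  {"pet": False, "ct": True,  "lab": True,  "ct_pretrained": False},
    "PET CT Fusion":              {"pet": True,  "ct": True,  "lab": True,  "ct_pretrained": False},
    "PET CT Fusion (pretrained)": {"pet": True,  "ct": True,  "lab": True,  "ct_pretrained": True},
}

def fuse(config: Dict[str, bool], pet_emb: Sequence[float],
         ct_emb: Sequence[float], labs: Sequence[float]) -> List[float]:
    """Fusion layer input: concatenate whichever modality vectors are enabled."""
    fused: List[float] = []
    if config["pet"]:
        fused.extend(pet_emb)   # embedding from the 3D PET convolutional branch
    if config["ct"]:
        fused.extend(ct_emb)    # embedding from the 3D CT convolutional branch
    if config["lab"]:
        fused.extend(labs)      # laboratory biomarkers, appended as-is
    return fused

pet_emb = [0.1] * 4   # placeholder sizes; the real embedding sizes are not given here
ct_emb = [0.2] * 4
labs = [1.0, 2.0, 3.0]
fused = fuse(CONFIGS["PET CT Fusion"], pet_emb, ct_emb, labs)
print(len(fused))  # 11
```

The fused vector would then feed the fully connected layers that produce the response prediction.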

Comment 3
How was feature importance evaluated? Can you elaborate on the details?

Answer 3
We thank the reviewer for this question and are happy to elaborate on the feature importance methodology.
Feature importance for the laboratory biomarkers was derived using a gradient-based approach. Specifically, for each sample in the dataset, we enabled gradient computation with respect to the tabular (laboratory) input and performed a forward pass through the trained model. We then backpropagated through the network and computed the absolute value of the gradients at the tabular input layer. These absolute gradients reflect how sensitively the model output responds to small perturbations in each input feature; features with larger gradients are thus considered more influential to the model’s prediction. For each sample, the gradients were normalized by the sample’s maximum gradient value to ensure comparability across patients. Final importance scores were averaged across all samples.
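The aggregation steps described above (absolute input gradient, per-sample normalization by the maximum, mean over samples) can be sketched as follows. To stay framework-free, the sketch uses a hypothetical two-layer tanh network whose input gradient is written out by hand via the chain rule; in the actual pipeline the gradient would come from automatic differentiation through the trained fusion model (for instance an autograd framework such as PyTorch, which we assume here), and the weights below are placeholders, not trained values.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def forward_and_input_grad(x: Sequence[float], W: Matrix, b: Sequence[float],
                           v: Sequence[float]) -> Tuple[float, List[float]]:
    """Toy model f(x) = sum_k v[k] * tanh(W[k] . x + b[k]) and its gradient df/dx."""
    hidden = [math.tanh(sum(w * xj for w, xj in zip(row, x)) + bk)
              for row, bk in zip(W, b)]
    out = sum(vk * hk for vk, hk in zip(v, hidden))
    # Chain rule: df/dx_j = sum_k v_k * (1 - tanh^2(z_k)) * W_kj
    grad = [sum(v[k] * (1.0 - hidden[k] ** 2) * W[k][j] for k in range(len(W)))
            for j in range(len(x))]
    return out, grad

def gradient_feature_importance(samples: Sequence[Sequence[float]], W: Matrix,
                                b: Sequence[float], v: Sequence[float]) -> List[float]:
    """|df/dx| per sample, normalized by that sample's maximum, then averaged."""
    normalized = []
    for x in samples:
        _, grad = forward_and_input_grad(x, W, b, v)
        abs_grad = [abs(g) for g in grad]
        peak = max(abs_grad)
        normalized.append([g / peak if peak > 0 else 0.0 for g in abs_grad])
    n = len(normalized)
    return [sum(row[j] for row in normalized) / n for j in range(len(normalized[0]))]

# Hypothetical weights: feature 0 dominates, feature 2 is ignored by the model.
W = [[2.0, 0.5, 0.0], [1.0, 0.2, 0.0]]
b = [0.0, 0.0]
v = [1.0, 1.0]
samples = [[0.1, 0.2, 0.3], [-0.5, 1.0, 2.0], [0.0, 0.0, 1.0]]
print(gradient_feature_importance(samples, W, b, v))
```

Because each sample is scaled by its own maximum, the most influential feature of every patient scores 1.0 for that patient, so the averaged scores describe relative rather than absolute sensitivity.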
Figure 7 in the manuscript provides a visual overview of feature importance across all model configurations, including the Random Forest baseline, PET Fusion, CT Fusion, PET CT Fusion, and the pretrained PET CT Fusion model. Notably, CgA showed particularly high importance in the multimodal fusion models, which is consistent with the established literature linking elevated baseline CgA to shorter progression-free and overall survival in NET patients undergoing PRRT.
Importantly, this alignment with known prognostic markers also serves as a plausibility check for our model: the fact that features with established clinical relevance are independently identified as most influential by the gradient-based analysis suggests that the model is capturing meaningful biological signal rather than relying on spurious correlations. This reinforces confidence in the model’s decision-making process and supports the clinical interpretability of the multimodal framework.

Comment 4
Could the authors replace panels (a)-(c) in Figure 9 with clearer versions and explain the results in Figure 9?

Answer 4
We thank the reviewer for this comment and have redone panels (a)-(c) of Figure 9 to increase clarity.
Figure 9 presents a qualitative explainability analysis for a single representative patient, comparing gradient-based saliency maps and gradient magnitude distributions between the PET Only and PET Fusion models. Panel (a) shows the raw [68Ga]Ga-DOTA-PET scan in three anatomical planes. Panels (b) and (c) show the corresponding saliency heatmaps, and panels (d) and (e) the gradient magnitude distributions, for the PET Only and PET Fusion models, respectively.
The PET Only model produces high gradient activations concentrated on medically irrelevant structures such as the urinary bladder, with an irregular, spiky gradient distribution, consistent with its poor predictive performance (AUROC 0.42) and indicative of optimization instability. It is well established in the deep learning literature that models with erratic, high-magnitude gradient distributions tend to generalize poorly, while smoother, more concentrated distributions are characteristic of stable and well-trained networks. In contrast, the PET Fusion model yields smoother, more concentrated gradient distributions, with saliency focused predominantly on tumorous regions. This is clinically plausible and in line with the substantially better predictive performance of the fusion model, suggesting that the incorporation of laboratory biomarkers stabilizes training and guides the imaging branch toward more relevant anatomical structures.
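The saliency heatmaps and gradient magnitude distributions compared in Figure 9 can both be derived from the same input-gradient volume. The sketch below assumes the gradient of the model output with respect to each PET voxel is already available (as nested lists), scales the absolute values by their maximum for display, and bins them into a histogram. The max-normalization, the bin count, and the function name are our assumptions; the answer does not describe how the heatmaps were scaled or how the distributions were binned.

```python
from typing import List, Sequence, Tuple

Volume = Sequence[Sequence[Sequence[float]]]

def saliency_and_histogram(grad_volume: Volume,
                           n_bins: int = 10) -> Tuple[List[List[List[float]]], List[int]]:
    """Turn an input-gradient volume into a [0, 1] saliency map and a histogram of |grad|."""
    magnitudes = [abs(g) for plane in grad_volume for row in plane for g in row]
    peak = max(magnitudes)
    scale = 1.0 / peak if peak > 0 else 0.0
    # Saliency heatmap: absolute gradient per voxel, scaled to [0, 1].
    saliency = [[[abs(g) * scale for g in row] for row in plane] for plane in grad_volume]
    # Distribution of normalized gradient magnitudes; the last bin includes the maximum.
    counts = [0] * n_bins
    for m in magnitudes:
        idx = min(int(m * scale * n_bins), n_bins - 1)
        counts[idx] += 1
    return saliency, counts

# Hypothetical 2 x 2 x 2 gradient volume, not taken from the study data.
vol = [[[0.0, 0.5], [-1.0, 0.25]], [[0.1, -0.1], [0.0, 0.75]]]
saliency, counts = saliency_and_histogram(vol, n_bins=4)
print(counts)  # [4, 1, 1, 2]
```

A spiky model would show a few very large magnitudes against a mass of near-zero values, whereas a smoother one spreads its mass more evenly; this remains a qualitative comparison.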