
Open Access Article

Peer-Review Record

An Adaptive Attention 3D U-Net for High-Fidelity MRI-to-CT Synthesis: Bridging the Anatomical Gap with CBAM

Diagnostics 2026, 16(6), 875; https://doi.org/10.3390/diagnostics16060875

by Chaima Bensebihi^1,*, Nacer Eddine Benzebouchi^2, Nawel Zemmal^2,3, Abdallah Namoun^4,5,*, Aida Chefrour^3,6 and Siham Amrouch^1,3

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 9 January 2026 / Revised: 24 February 2026 / Accepted: 6 March 2026 / Published: 16 March 2026

(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript proposes a CBAM-enhanced 3D U-Net for MRI-to-CT synthesis and evaluates it on the SynthRAD2023 dataset.

I have concerns about how this design differs from Attention U-Net and other dual-attention architectures.

Please explain clearly how each metric is computed, and add a short justification of the clinical relevance of the reported HU error.
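One common way to define the HU-space metrics this comment asks about is sketched below. It is a minimal NumPy illustration, not the authors' code: it assumes both volumes are already in HU, that errors are averaged only over a binary body mask, and that PSNR uses a 3000 HU data range (the width of the [-1000, 2000] HU clipping window noted by Reviewer 2). The function names are hypothetical.

```python
import numpy as np

def masked_mae_hu(sct_hu, ct_hu, mask):
    """Mean absolute error in HU over voxels where mask is True."""
    m = mask.astype(bool)
    return float(np.mean(np.abs(sct_hu[m] - ct_hu[m])))

def masked_psnr(sct_hu, ct_hu, mask, data_range=3000.0):
    """PSNR in dB over masked voxels; data_range assumes a [-1000, 2000] HU window."""
    m = mask.astype(bool)
    mse = float(np.mean((sct_hu[m] - ct_hu[m]) ** 2))
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

# Toy example: a synthetic CT that is uniformly 40 HU too dense inside the mask.
ct = np.zeros((8, 8, 8))
sct = ct + 40.0
mask = np.zeros_like(ct, dtype=bool)
mask[2:6, 2:6, 2:6] = True
print(masked_mae_hu(sct, ct, mask))  # 40.0
print(masked_psnr(sct, ct, mask))
```

Stating which mask and which data range were used is what makes MAE and PSNR values comparable across studies.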

The manuscript lacks a comparison with previous studies.

The data availability statement is missing.

The conclusion section is very short.

Can you apply your method to clinical data, even a single case, to check its feasibility in clinical applications?

Add a clear subsection describing the dataset split strategy and fold construction.
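A patient-level fold construction of the kind requested here can be sketched in a few lines of Python. The sketch assumes one MRI/CT volume pair per patient ID and a fixed seed; the IDs and function names are hypothetical. The key property is that folds are built from unique patient IDs, so no patient can appear in both the training and the validation data of any fold.

```python
import random

def patient_level_folds(patient_ids, k=5, seed=42):
    """Split unique patient IDs into k disjoint folds (round-robin after a seeded shuffle)."""
    unique_ids = sorted(set(patient_ids))  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(unique_ids)
    return [unique_ids[i::k] for i in range(k)]

def assert_no_leakage(folds):
    """Raise ValueError if any patient ID appears in more than one fold."""
    seen = set()
    for fold in folds:
        overlap = seen.intersection(fold)
        if overlap:
            raise ValueError(f"patients in several folds: {sorted(overlap)}")
        seen.update(fold)

ids = [f"P{n:03d}" for n in range(1, 21)]  # hypothetical patient IDs
folds = patient_level_folds(ids, k=5)
assert_no_leakage(folds)
for i, val_ids in enumerate(folds):
    train_ids = [p for j, fold in enumerate(folds) if j != i for p in fold]
    print(f"fold {i}: {len(train_ids)} train / {len(val_ids)} validation patients")
```

Publishing the per-fold ID lists produced this way would also let readers verify the absence of leakage directly.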

Author Response

Kindly find attached our corrections to the reviews.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Re-Review Manuscript Title: An Adaptive Attention 3D U-Net for High-Fidelity MRI-to-CT Synthesis: Bridging the Anatomical Gap with CBAM

Review of the Manuscript

The manuscript is well-structured and addresses a clinically relevant problem (MRI-only workflows) with a technically sound solution. The integration of CBAM into skip connections is a novel, targeted enhancement that directly mitigates a key limitation of traditional U-Nets (noisy feature transfer). The use of the SynthRAD2023 dataset (a benchmark for radiotherapy planning) and comprehensive evaluation (quantitative + qualitative + ablation) strengthens the validity of results.

However, the work has limitations in generalizability, clinical translation, and technical rigor. While the model shows promise, it requires substantial refinement to move from a "proof-of-concept" to a clinically deployable tool.

Detailed Comments

Small, Homogeneous Dataset:

The study uses only the SynthRAD2023 brain dataset (n=322 test cases) and excludes the pelvis/CBCT data from the challenge. The dataset is limited to adult patients (no pediatric/elderly data) and lacks diversity in pathology (e.g., tumors, trauma). This limits generalizability to real-world clinical populations (e.g., children, patients with brain lesions).

No Data Augmentation:

The authors explicitly state no data augmentation was used (beyond normalization/cropping). 3D medical data is inherently limited, and augmentation (e.g., rotation, scaling, elastic deformation) is critical to prevent overfitting and improve robustness. The lack of augmentation raises concerns about the model’s ability to handle anatomical variability.
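For paired MRI-to-CT data, any geometric augmentation must be applied identically to both volumes, or the voxel-wise correspondence that the loss relies on is lost. The NumPy sketch below illustrates that constraint with a random left-right flip and a small integer translation. Which array axis is left-right (`lr_axis=0`) and the background fill values are assumptions, and rotation, scaling, or elastic deformation would additionally need an interpolating resampler.

```python
import numpy as np

def shift_volume(vol, shift, fill):
    """Translate vol by integer voxel offsets; exposed voxels get `fill`."""
    out = np.full_like(vol, fill)
    src, dst = [], []
    for s, n in zip(shift, vol.shape):
        s = int(s)
        assert abs(s) < n, "shift must be smaller than the volume size"
        if s >= 0:
            src.append(slice(0, n - s))
            dst.append(slice(s, n))
        else:
            src.append(slice(-s, n))
            dst.append(slice(0, n + s))
    out[tuple(dst)] = vol[tuple(src)]
    return out

def augment_pair(mri, ct, rng, lr_axis=0, max_shift=4):
    """Apply one random flip + translation identically to an MRI/CT pair."""
    if rng.random() < 0.5:
        mri = np.flip(mri, axis=lr_axis)
        ct = np.flip(ct, axis=lr_axis)
    shift = rng.integers(-max_shift, max_shift + 1, size=3)
    # Fill exposed voxels with each volume's minimum (assumed to be background).
    return (shift_volume(mri, shift, mri.min()),
            shift_volume(ct, shift, ct.min()))

rng = np.random.default_rng(0)
mri = rng.standard_normal((32, 32, 32))
ct = rng.uniform(-1.0, 1.0, size=(32, 32, 32))
mri_aug, ct_aug = augment_pair(mri, ct, rng)
```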

Fixed Hyperparameters:

All experiments use a fixed learning rate (1e-4), batch size (1), and 100 epochs—no hyperparameter tuning (e.g., learning rate scheduling, batch size optimization) is reported. This "one-size-fits-all" approach may not be optimal for different datasets or clinical scenarios.
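As an illustration of the scheduling this comment mentions, a warmup-plus-cosine learning-rate schedule is a common alternative to a fixed rate. In this plain-Python sketch the base rate (1e-4) and the epoch count (100) come from the review, while the 5 warmup epochs and the 1e-6 floor are assumptions; PyTorch users would typically use `torch.optim.lr_scheduler.CosineAnnealingLR` instead of hand-writing the formula.

```python
import math

def cosine_lr(epoch, total_epochs=100, base_lr=1e-4, min_lr=1e-6, warmup=5):
    """Learning rate for a 0-based epoch: linear warmup, then cosine decay to min_lr."""
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup
    progress = (epoch - warmup) / max(1, total_epochs - warmup - 1)  # 0 -> 1
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

for epoch in (0, 4, 5, 50, 99):
    print(epoch, f"{cosine_lr(epoch):.2e}")
```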

Simplified Preprocessing:

The center crop to 128×128×128 voxels is arbitrary and may discard critical peripheral anatomy (e.g., skull base, temporal bones).
The mask.nii.gz (provided in the dataset) is ignored, even though it could improve segmentation accuracy (e.g., focusing on brain tissue vs. background).
No bias field correction is applied to MRI data, which can introduce intensity inhomogeneities and degrade model performance.
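Both preprocessing points can be checked quantitatively by measuring how much of the provided mask survives the crop. The NumPy sketch below center-crops (or pads) a volume to 128×128×128 and reports the fraction of mask voxels retained. It assumes the mask is a binary array with the same shape as the image; the function names and the example shapes are hypothetical.

```python
import numpy as np

def center_crop_or_pad(vol, size=(128, 128, 128), fill=0):
    """Center-crop each axis to `size`, padding with `fill` where the volume is smaller."""
    out = np.full(size, fill, dtype=vol.dtype)
    src, dst = [], []
    for n, s in zip(vol.shape, size):
        if n >= s:  # crop: keep the central s voxels
            start = (n - s) // 2
            src.append(slice(start, start + s))
            dst.append(slice(0, s))
        else:       # pad: place the whole axis in the center
            start = (s - n) // 2
            src.append(slice(0, n))
            dst.append(slice(start, start + n))
    out[tuple(dst)] = vol[tuple(src)]
    return out

def mask_coverage(mask, size=(128, 128, 128)):
    """Fraction of foreground mask voxels that remain after center cropping."""
    total = np.count_nonzero(mask)
    kept = np.count_nonzero(center_crop_or_pad(mask.astype(np.uint8), size))
    return kept / total if total else 1.0

mask = np.zeros((180, 220, 200), dtype=bool)  # hypothetical volume shape
mask[30:150, 40:190, 20:180] = True           # hypothetical head region
print(f"{mask_coverage(mask):.1%} of mask voxels kept")
```

Reporting this coverage per patient would show whether the crop truncates anatomy consistently.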

Unvalidated Preprocessing:

The Z-score normalization for MRI and linear scaling for CT is standard but not validated for the SynthRAD2023 dataset. For example, the Z-score may not account for inter-patient intensity variability, and the CT clipping range ([-1000, 2000] HU) may exclude pathological densities (e.g., calcifications >2000 HU).
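The normalization choices also determine how a normalized error maps back to HU, which bears directly on Reviewer 3's question about an MAE reported both as 0.028 and as 41 HU. The sketch assumes the [-1000, 2000] HU clip mentioned here and a linear map onto a target range [lo, hi] that the review does not specify: under a [-1, 1] target a normalized MAE of 0.028 corresponds to about 42 HU, under a [0, 1] target to about 84 HU. Computing the Z-score statistics inside a mask is likewise an assumption, included because background voxels otherwise dominate the mean and standard deviation.

```python
import numpy as np

HU_MIN, HU_MAX = -1000.0, 2000.0  # CT clipping window cited in the review

def zscore_mri(mri, mask=None, eps=1e-8):
    """Z-score an MRI volume; statistics come from mask voxels when a mask is given."""
    ref = mri[mask.astype(bool)] if mask is not None else mri
    return (mri - ref.mean()) / (ref.std() + eps)

def ct_to_range(ct_hu, lo=-1.0, hi=1.0):
    """Clip CT to [HU_MIN, HU_MAX] and map it linearly onto [lo, hi]."""
    c = np.clip(ct_hu, HU_MIN, HU_MAX)
    return lo + (c - HU_MIN) * (hi - lo) / (HU_MAX - HU_MIN)

def range_to_ct(x, lo=-1.0, hi=1.0):
    """Inverse map back to HU (exact only inside the clipping window)."""
    return HU_MIN + (x - lo) * (HU_MAX - HU_MIN) / (hi - lo)

def normalized_mae_to_hu(mae, lo=-1.0, hi=1.0):
    """An MAE is a difference, so only the scale factor of the map matters."""
    return mae * (HU_MAX - HU_MIN) / (hi - lo)

print(f"{normalized_mae_to_hu(0.028, -1.0, 1.0):.1f} HU")  # 42.0 HU for a [-1, 1] target
print(f"{normalized_mae_to_hu(0.028, 0.0, 1.0):.1f} HU")   # 84.0 HU for a [0, 1] target
```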

High Computational Cost:

The model has 42 million parameters—large for 3D medical imaging, especially in clinical settings with limited GPU memory.
Training time: 9–10 hours per session (Tesla T4/V100).
Inference time: 0.68 seconds per volume (vs. 0.60 seconds for baseline)—a 13% increase that may be prohibitive in time-sensitive scenarios (e.g., intraoperative planning).
Memory usage: 7–12% increase during training due to CBAM’s attention map computations.
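These figures can be put in perspective with back-of-the-envelope arithmetic. The sketch assumes float32 storage, an Adam-style optimizer holding two extra values per parameter, and a 32-channel feature map at the full 128³ resolution; the 42 million parameters and the 0.68 s versus 0.60 s timings are taken from the review. Under these assumptions the weights, gradients, and optimizer state together need well under 1 GB, while a single full-resolution feature map already needs about 268 MB, so in 3D networks activation memory, rather than parameter count, is usually what limits GPU usage.

```python
def overhead_percent(t_new, t_base):
    """Relative increase of t_new over t_base, in percent."""
    return 100.0 * (t_new - t_base) / t_base

def training_state_mb(n_params, bytes_per_value=4, optimizer_slots=2):
    """Weights + gradients + optimizer state (Adam keeps two values per weight)."""
    return n_params * bytes_per_value * (2 + optimizer_slots) / 1e6

def feature_map_mb(channels, shape=(128, 128, 128), bytes_per_value=4):
    """Memory of one float32 activation tensor (batch size 1)."""
    n = channels
    for s in shape:
        n *= s
    return n * bytes_per_value / 1e6

print(f"{overhead_percent(0.68, 0.60):.1f}% slower inference")               # 13.3% slower inference
print(f"{training_state_mb(42e6):.0f} MB of weights, gradients, Adam state")  # 672 MB
print(f"{feature_map_mb(32):.0f} MB for one 32-channel 128^3 feature map")    # 268 MB
```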

CBAM’s Data Dependence:

CBAM’s performance is highly sensitive to data quality. The model was trained on pre-aligned, pre-processed data—if applied to real-world data (with misalignments, noise, or artifacts), CBAM may emphasize non-informative patterns (e.g., motion artifacts) rather than anatomical features.

Overfitting Risk:

The small dataset and lack of augmentation increase the risk of overfitting. The model may "memorize" training data (e.g., specific patient anatomies) rather than learning generalizable features. The authors do not report overfitting metrics (e.g., training vs. validation loss curves) to address this.
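The overfitting evidence requested here is usually a pair of per-epoch loss curves. A minimal plain-Python sketch of how such curves can be summarized, assuming they are available as lists: it finds the epoch with the lowest validation loss, the epoch at which patience-based early stopping would halt, and the validation-minus-training gap at the best epoch. The patience of 10 epochs and the function name are assumptions.

```python
def summarize_curves(train_losses, val_losses, patience=10):
    """Return (best_epoch, stop_epoch, gap) for per-epoch loss curves.

    best_epoch: epoch with the lowest validation loss seen before stopping.
    stop_epoch: first epoch after `patience` epochs without improvement
                (or the last epoch if that never happens).
    gap: validation minus training loss at best_epoch.
    """
    best_epoch, best_val = 0, float("inf")
    stop_epoch = len(val_losses) - 1
    for epoch, val in enumerate(val_losses):
        if val < best_val:
            best_epoch, best_val = epoch, val
        elif epoch - best_epoch >= patience:
            stop_epoch = epoch
            break
    gap = val_losses[best_epoch] - train_losses[best_epoch]
    return best_epoch, stop_epoch, gap

# Synthetic curves: training loss keeps falling, validation loss turns up after epoch 10.
train = [1.0 / (e + 1) for e in range(30)]
val = [0.1 + 0.01 * abs(e - 10) for e in range(30)]
print(summarize_curves(train, val))
```

A widening gap while training loss keeps falling is the pattern that would indicate memorization.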

Competition from Simpler Attention Mechanisms:

The manuscript does not compare CBAM to lighter attention mechanisms (e.g., Squeeze-and-Excitation (SE) blocks, Efficient Attention) that achieve similar performance with lower computational cost. For clinical use, a "good enough" model with one-tenth of the CBAM model's parameters is more practical.
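The cost side of this comparison can be quantified per attention module. The plain-Python sketch below counts parameters for an SE block and for CBAM, assuming a reduction ratio of 16 and a 7×7×7 spatial kernel (the defaults suggested in the original SE and CBAM papers, extended to 3D) and biased layers; the function names and channel widths are hypothetical. CBAM's channel branch has the same shape as an SE block, so its extra parameters come only from the spatial convolution.

```python
def se_params(channels, reduction=16, bias=True):
    """Parameters of a Squeeze-and-Excitation block: FC C -> C/r -> C."""
    hidden = max(1, channels // reduction)
    n = 2 * channels * hidden
    if bias:
        n += hidden + channels
    return n

def cbam_params(channels, reduction=16, kernel=7, dims=3, bias=True):
    """CBAM: SE-shaped shared channel MLP plus a 2-in/1-out spatial convolution."""
    spatial = 2 * kernel ** dims + (1 if bias else 0)
    return se_params(channels, reduction, bias) + spatial

for c in (32, 64, 128, 256, 512):  # hypothetical skip-connection widths
    print(c, se_params(c), cbam_params(c))
```

Parameter counts understate runtime cost: the spatial convolution is evaluated at every voxel, which is one reason measured latency and memory overheads can exceed what the parameter difference suggests.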

Low Resolution:

The 128×128×128 voxel resolution is sufficient for exploratory research but inadequate for clinical use. High-resolution images (e.g., 256×256×256) are required to capture fine anatomical details (e.g., skull sutures, small air cavities) critical for radiotherapy planning.

No Cross-Modal Generalization:

The model is trained only on brain MRI-to-CT synthesis. It is not tested on other modalities (e.g., CT-to-MRI) or anatomical regions (e.g., chest, abdomen), limiting its utility for multi-modal or multi-site workflows.

Unproven Clinical Utility:

The authors claim the model "reduces radiation exposure" and "streamlines workflow" but provide no clinical outcome data (e.g., time saved per case, reduction in CT scan requests, improved treatment plan accuracy). These claims are speculative without real-world testing.

No Multi-Center Validation:

The model is trained and tested on a single dataset (SynthRAD2023). It is not validated on data from other centers, scanners, or protocols—critical for ensuring generalizability to diverse clinical environments.

Overstated Comparisons:

The comparison to state-of-the-art (SOTA) methods (Table 3) is misleading:

The proposed model’s MAE (41 HU) is higher than the cGAN method (~38 HU) but the authors claim it is "lower." This is a quantitative error (likely a typo: 41 vs. 38).
The FedSynth CT-Brain method is a federated learning model (trained on multi-center data) but the proposed model is trained on a single dataset. Comparing their performance without accounting for data diversity is unfair.

No Longitudinal or Treatment Data:

The study does not test the model’s ability to:

Track disease progression (e.g., tumor growth) over time.
Support treatment response assessment (e.g., changes in bone density after radiation).
Generate CT images for retrospective studies (e.g., matching historical MRI data to CT for research).

Incomplete Methodological Details:

The 3D U-Net architecture (e.g., number of layers, filter sizes, activation functions) is not fully described.
The CBAM implementation (e.g., MLP layer sizes, kernel size for spatial attention) is not provided.
The training pipeline (e.g., data loading, loss function, optimizer parameters) is not open-sourced.
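For reference on the level of detail requested, the sketch below is a NumPy forward pass of CBAM (Woo et al., 2018) adapted to 3D volumes: channel attention from a shared two-layer MLP over average- and max-pooled descriptors, followed by spatial attention from a convolution over channel-wise mean and max maps. It is not the authors' implementation; the reduction ratio, the odd kernel size, and the random weights are illustrative assumptions, and a trainable version would use a deep-learning framework's linear and 3D convolution layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (C, D, H, W); w1: (C // r, C); w2: (C, C // r). Returns (C, 1, 1, 1)."""
    def mlp(v):  # shared two-layer MLP with ReLU, as in CBAM
        return w2 @ np.maximum(w1 @ v, 0.0)
    avg = x.mean(axis=(1, 2, 3))
    mx = x.max(axis=(1, 2, 3))
    return sigmoid(mlp(avg) + mlp(mx))[:, None, None, None]

def spatial_attention(x, kernel):
    """kernel: (2, k, k, k) with odd k, applied to the [mean; max] channel maps."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])         # (2, D, H, W)
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(desc, ((0, 0), (p, p), (p, p), (p, p)))  # zero "same" padding
    d, h, w = x.shape[1:]
    out = np.zeros((d, h, w))
    # Direct 3D cross-correlation (the convention deep-learning conv layers use).
    for c in range(2):
        for i in range(k):
            for j in range(k):
                for m in range(k):
                    out += kernel[c, i, j, m] * padded[c, i:i + d, j:j + h, m:m + w]
    return sigmoid(out)[None]                                 # (1, D, H, W)

def cbam3d(x, w1, w2, kernel):
    x = x * channel_attention(x, w1, w2)     # reweight channels ("what")
    return x * spatial_attention(x, kernel)  # then reweight voxels ("where")

rng = np.random.default_rng(0)
c, r = 32, 16
x = rng.standard_normal((c, 8, 8, 8))
w1 = 0.1 * rng.standard_normal((c // r, c))
w2 = 0.1 * rng.standard_normal((c, c // r))
kernel = 0.01 * rng.standard_normal((2, 7, 7, 7))
y = cbam3d(x, w1, w2, kernel)
print(y.shape)  # (32, 8, 8, 8)
```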

No Code or Model Availability:

The authors state the model is "published on Hugging Face" but do not provide a link to the repository (e.g., model weights, inference code). This makes it impossible to reproduce results or build on the work.
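Reproducibility also depends on seeding every random number generator in the pipeline, not only PyTorch's. A minimal sketch, assuming a Python/NumPy/PyTorch stack (PyTorch is treated as optional so the helper runs without it); the function name is hypothetical. Some CUDA operations may remain nondeterministic even with these flags (`torch.use_deterministic_algorithms` enforces stricter behavior), and data-loader workers need their own seeding via the `generator` or `worker_init_fn` arguments of `torch.utils.data.DataLoader`.

```python
import random
import numpy as np

def seed_everything(seed=42, deterministic=True):
    """Seed Python's and NumPy's global RNGs and, if installed, PyTorch's."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:  # Python/NumPy-only environment
        return
    torch.manual_seed(seed)           # default generators
    torch.cuda.manual_seed_all(seed)  # all CUDA devices; safe to call without a GPU
    if deterministic:
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

seed_everything(42)
```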

Grammar Issues and Typos

The manuscript has several grammatical errors and typos that detract from its professionalism. Key issues include:

"reliance on deep learning techniques has become widespread" (redundant "has become") → "reliance on deep learning techniques has become widespread" (remove "has become" for conciseness)
"the attention gate in the Attention U-Net model is one of the most common mechanisms" (remove "of these")
"the model was trained and executed in a Google Colab environment, equipped with an NVIDIA Tesla T4 GPU (16 GB of memory)" (use parentheses for clarity)
“table x” and “figure x” should be capitalized (“Table x”, “Figure x”); likewise “Column one” → “Column One”.
References: Inconsistent journal name formatting.
The resolution of Figures 1, 5, 7, and 8 should be improved.

Comments on the Quality of English Language

The quality of the English language needs to be improved.

Author Response

Kindly find attached our corrections to the reviews.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

I have the following concerns for the authors:

Inconsistent metric units — the manuscript reports MAE as 0.028 ± 0.007 and elsewhere as 41 HU; please clarify the unit/conversion used and ensure all tables/plots report the same (normalized vs HU) convention.
No independent/external test — results are reported from internal cross-validation only (SynthRAD2023); add an external hold-out cohort or multi-centre test to demonstrate generalizability.
Preprocessing under-specified — center-cropping to 128×128×128 while ignoring the provided mask.nii.gz (p.5) may truncate anatomy inconsistently; justify this choice and quantify anatomical coverage after cropping.
Possible data-leakage in splits — the manuscript must explicitly state patient-level split procedure (IDs per fold) and confirm no slice/volume overlap between train/val/test folds.
Reproducibility gaps — only torch.manual_seed(42) is mentioned; report full seed strategy (numpy/random/cudnn), exact optimizer hyperparameters (e.g., AdamW weight-decay), environment, and share training code/config.
Missing statistical significance — improvements in the ablation (Table 2) are shown as mean±SD but lack inferential testing or CIs; provide paired statistical tests or CIs to support claims of improvement.
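The paired analysis requested here needs per-case metric values for both models on the same test cases. A minimal NumPy sketch using a percentile bootstrap of the mean paired difference; the number of resamples, the 95% level, the function name, and the synthetic data are assumptions, and a paired t-test or Wilcoxon signed-rank test (e.g., `scipy.stats.ttest_rel`, `scipy.stats.wilcoxon`) would be equally reasonable choices.

```python
import numpy as np

def paired_bootstrap_ci(baseline, proposed, n_boot=10_000, alpha=0.05, seed=0):
    """Mean per-case difference (baseline - proposed) with a percentile bootstrap CI.

    Cases are resampled with replacement as pairs, preserving the pairing.
    For an error metric such as MAE, a CI entirely above zero supports the
    claim that the proposed model lowers the error on this test set.
    """
    d = np.asarray(baseline, dtype=float) - np.asarray(proposed, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, d.size, size=(n_boot, d.size))
    boot_means = d[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(d.mean()), float(lo), float(hi)

# Synthetic per-case MAE values (HU), for illustration only.
rng = np.random.default_rng(1)
baseline = rng.normal(45.0, 5.0, size=60)
proposed = baseline - rng.normal(3.0, 2.0, size=60)
mean_diff, lo, hi = paired_bootstrap_ci(baseline, proposed)
print(f"mean improvement {mean_diff:.2f} HU, 95% CI [{lo:.2f}, {hi:.2f}]")
```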

Author Response

Kindly find attached our corrections to the reviews.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

I am satisfied with the authors' response; the manuscript may be accepted for publication.
