Enhancing Reliable Prostate Lesion Detection: Integrating Multi-Expert Annotations and Tailored nnU-Net Ensemble Learning Strategies
Abstract
1. Introduction
1.1. Related Work
1.2. Aim of This Work
2. Materials and Methods
2.1. Study Sample
2.2. Image Acquisition
2.3. Multi-Reader Annotation
2.4. Dataset Characteristics
2.5. Training Strategy
2.6. Deep Learning Model Architecture and Training
2.6.1. Preprocessing
2.6.2. Network Architecture and Loss Function
2.6.3. Training Parameters
2.7. Evaluation
3. Results
4. Discussion
4.1. Clinical and Technical Relevance of Multi-Expert Training
4.2. Impact of Focal Loss and Anatomical Preprocessing
4.3. Broader Methodological Implications and Clinical Translation
4.4. Study Limitations and Fairness Considerations
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest



| Characteristics | Total Patients (n = 378, 100%) | Development Subset (n = 323, 85.45%) | Test Subset (n = 55, 14.55%) |
|---|---|---|---|
| Age (years), median (IQR) | 68 (63–73) | 68 (64–73) | 69 (62–71) |
| PSA (ng/mL), mean (SD) | 10.47 (10.37) | 10.72 (10.76) | 9.03 (7.66) |
| Prostate volume (mL), mean (SD) | 56.1 (29.7) | 55.1 (30.0) | 62.1 (27.5) |
| ISUP score, cases (% of all 378 patients) | | | |
| 0 | 197 (52.1%) | 168 (44.4%) | 29 (7.7%) |
| 1 | 80 (21.2%) | 68 (18.0%) | 12 (3.2%) |
| 2 | 53 (14.0%) | 46 (12.2%) | 7 (1.9%) |
| 3 | 35 (9.3%) | 29 (7.7%) | 6 (1.6%) |
| 4 | 8 (2.1%) | 7 (1.9%) | 1 (0.3%) |
| 5 | 5 (1.3%) | 5 (1.3%) | 0 (0.0%) |
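
The percentages in the ISUP rows are easiest to read once it is clear which denominator they use. The short sketch below recomputes them from the counts in the table above and suggests that every percentage, including those in the development and test columns, is taken relative to the full cohort of 378 patients rather than to the subset size. The names `ISUP_COUNTS` and `cohort_summary` are illustrative and do not come from the paper.

```python
# Per-grade case counts copied from the table above: grade -> (development, test).
ISUP_COUNTS = {0: (168, 29), 1: (68, 12), 2: (46, 7), 3: (29, 6), 4: (7, 1), 5: (5, 0)}


def cohort_summary(counts):
    """Recompute subset sizes and per-grade percentages of the whole cohort."""
    dev_total = sum(dev for dev, _ in counts.values())
    test_total = sum(test for _, test in counts.values())
    total = dev_total + test_total

    # Each row holds (total %, development %, test %), all relative to `total`.
    rows = {
        grade: tuple(round(100 * n / total, 1) for n in (dev + test, dev, test))
        for grade, (dev, test) in counts.items()
    }
    return {
        "total": total,
        "dev": dev_total,
        "test": test_total,
        "dev_pct": round(100 * dev_total / total, 2),
        "test_pct": round(100 * test_total / total, 2),
        "rows": rows,
    }


if __name__ == "__main__":
    summary = cohort_summary(ISUP_COUNTS)
    print(summary["total"], summary["dev"], summary["test"])
    for grade, percentages in summary["rows"].items():
        print(grade, percentages)
```

Under this reading the three percentages in each row add up consistently: the development and test shares of a grade sum to that grade's share of the total, up to rounding.
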

| Metric | U-Net (Baseline) | nnU-Net (Proposed) | Relative Change (%) | p-Value | 95% CI |
|---|---|---|---|---|---|
| Consensus-based Annotation Learning Strategy | | | | | |
| Average Precision (AP) | 0.69 | 0.75 | +8.7% | 0.012 | 0.70–0.78 |
| AUROC | 0.92 | 0.96 | +4.3% | 0.021 | 0.93–0.97 |
| Dice Coefficient | 0.51 | 0.52 | +1.9% | 0.26 | 0.49–0.54 |
| PI-CAI Score | 0.81 | 0.85 | +4.9% | 0.047 | 0.81–0.87 |
| Multi-expert Annotation Learning Strategy | | | | | |
| Average Precision (AP) | 0.76 | 0.81 | +6.6% | <0.01 | 0.77–0.83 |
| AUROC | 0.95 | 0.99 | +4.2% | <0.01 | 0.97–1.00 |
| Dice Coefficient | 0.51 | 0.56 | +9.8% | <0.01 | 0.52–0.58 |
| PI-CAI Score | 0.86 | 0.90 | +4.7% | 0.018 | 0.87–0.92 |
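
The percentage column in the results table appears to be the relative improvement of nnU-Net over the U-Net baseline, and the combined score row is consistent with the PI-CAI challenge ranking score, which averages AUROC and AP. The sketch below recomputes both from the rounded table values. It assumes the authors used that PI-CAI definition, which this excerpt does not state, and the helper names `relative_change` and `picai_score` are illustrative. Because the inputs are already rounded to two decimals, recomputed figures can differ from the table in the last digit (for example, the consensus Dice change comes out near 1.96% against the 1.9% reported).

```python
def relative_change(baseline, proposed):
    """Relative change of `proposed` over `baseline`, in percent."""
    return 100.0 * (proposed - baseline) / baseline


def picai_score(auroc, ap):
    """PI-CAI ranking score: the mean of case-level AUROC and lesion-level AP."""
    return (auroc + ap) / 2.0


# (baseline U-Net, proposed nnU-Net) pairs copied from the table above.
RESULTS = {
    "consensus": {"AP": (0.69, 0.75), "AUROC": (0.92, 0.96), "Dice": (0.51, 0.52)},
    "multi_expert": {"AP": (0.76, 0.81), "AUROC": (0.95, 0.99), "Dice": (0.51, 0.56)},
}

if __name__ == "__main__":
    for strategy, metrics in RESULTS.items():
        for name, (baseline, proposed) in metrics.items():
            change = relative_change(baseline, proposed)
            print(f"{strategy:12s} {name:6s} {change:+.2f}%")
        # Combined score for the proposed model, rebuilt from its AUROC and AP.
        score = picai_score(metrics["AUROC"][1], metrics["AP"][1])
        print(f"{strategy:12s} PI-CAI score (nnU-Net): {score:.3f}")
```

Recomputing the score this way is a quick consistency check on a results table: if the reported combined score drifts far from the mean of the reported AUROC and AP, one of the three values was probably transcribed incorrectly.
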
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Jozwiak, R.; Gonet, M.; Mycka, J.; Mykhalevych, I.; Radomski, D.S.; Tupikowski, K.; Lorenc, T.; Dolowy, J.; Zacharzewska-Gondek, A. Enhancing Reliable Prostate Lesion Detection: Integrating Multi-Expert Annotations and Tailored nnU-Net Ensemble Learning Strategies. Appl. Sci. 2026, 16, 3932. https://doi.org/10.3390/app16083932

