YOLO-Based Shading Artifact Reduction for CBCT-to-MDCT Translation Using Two-Stage Learning
Abstract
1. Introduction
- A two-stage learning strategy that separates global domain mapping (Stage 1) from localized artifact correction (Stage 2), reducing optimization complexity and improving training stability.
- A YOLO-based region correction loss that applies gradient magnitude minimization selectively to detected artifact regions through a fully differentiable formulation, enabling direct generator optimization while preserving anatomical structures.
- A self-regulating mechanism where YOLO detection confidence naturally decreases as artifacts diminish, providing automatic adjustment of correction intensity without manual intervention.
- Experimental validation demonstrating 14.0% artifact score reduction while maintaining structural similarity (SSIM > 0.96) on a dataset of 11,000 CBCT and 23,500 MDCT images, with ablation studies confirming the superiority of two-stage learning over joint training.
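The region correction loss and self-regulating mechanism described above can be sketched in a few lines. The following is an illustrative NumPy sketch only: the function name, box format, and exact weighting are our assumptions, not the paper's formulation, and the paper's version would be written with autograd-capable ops (e.g., in PyTorch) so the penalty can back-propagate into the generator.

```python
import numpy as np

def region_correction_loss(image, boxes, confidences):
    """Confidence-weighted mean squared gradient magnitude inside detected
    artifact boxes. `image` is a 2-D array; `boxes` are (x1, y1, x2, y2)
    pixel coordinates; `confidences` are YOLO detection scores in [0, 1].
    Hypothetical signature -- the paper's actual formulation may differ."""
    # Finite-difference image gradients (np.gradient returns axis-0 then
    # axis-1 derivatives, i.e., vertical then horizontal).
    gy, gx = np.gradient(image.astype(np.float64))
    grad_sq = gx ** 2 + gy ** 2  # squared gradient magnitude per pixel
    total, weight = 0.0, 0.0
    for (x1, y1, x2, y2), conf in zip(boxes, confidences):
        region = grad_sq[y1:y2, x1:x2]
        if region.size == 0:
            continue
        # Confidence weighting: as artifacts fade, detector confidence
        # drops and the penalty self-attenuates (the self-regulating idea).
        total += conf * region.mean()
        weight += conf
    return total / weight if weight > 0 else 0.0
```

Because the penalty is restricted to detected boxes, smooth anatomy outside those regions contributes nothing to the loss, which is how structure preservation is maintained while shading gradients inside artifact regions are suppressed.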
2. Related Work
2.1. Unpaired Image-to-Image Translation
2.2. Deep Learning for CT Image Enhancement
2.3. Object Detection in Medical Imaging
3. Materials and Methods
3.1. Dataset
3.2. Method Overview
3.3. Network Architecture
3.3.1. Generator Architecture
3.3.2. Discriminator Architecture
3.4. Stage 1: Base CycleGAN Training
3.4.1. Adversarial Loss
3.4.2. Cycle Consistency Loss
3.4.3. Total Loss Functions
3.5. Stage 2: YOLO-Based Fine-Tuning
3.5.1. YOLO Artifact Detector Training
3.5.2. Region Correction Loss
3.5.3. Stage 2 Generator Loss
3.5.4. Self-Regulating Mechanism
3.6. Implementation Details
4. Results
4.1. Evaluation Metrics
4.1.1. Artifact Score
4.1.2. Structure Preservation Metrics
4.1.3. Independent Artifact Metrics
4.2. Comparison Models
4.3. Quantitative Results
4.4. Qualitative Results
4.5. 3D Volumetric Consistency
4.6. Frequency-Domain Analysis
4.7. Ablation Study
4.7.1. Training Strategy
4.7.2. Loss Function
4.7.3. Pareto Front Analysis
4.8. YOLO Detector Performance
4.9. Computational Cost
5. Discussion
5.1. Effectiveness of Two-Stage Learning
5.2. Benefits of YOLO Integration
5.3. Preservation of Anatomical Structures
5.4. Clinical Implications
5.5. Limitations and Future Directions
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| CBCT | Cone-Beam Computed Tomography |
| MDCT | Multi-Detector Computed Tomography |
| CT | Computed Tomography |
| GAN | Generative Adversarial Network |
| CycleGAN | Cycle-Consistent Generative Adversarial Network |
| CNN | Convolutional Neural Network |
| YOLO | You Only Look Once |
| SSIM | Structural Similarity Index |
| NMS | Non-Maximum Suppression |
| LSGAN | Least Squares GAN |
| Model | Artifact Score (↓) | PSNR (↑) | SSIM (↑) | INU (↓) | SI (↓) |
|---|---|---|---|---|---|
| Baseline | 0.516 | 18.222 ± 1.520 | 0.572 ± 0.072 | 1.014 ± 0.837 | 0.208 ± 0.021 |
| | | [18.202, 18.243] | [0.571, 0.573] | [1.003, 1.026] | [0.208, 0.208] |
| Joint | 0.575 | 18.136 ± 1.565 | 0.561 ± 0.073 | 1.079 ± 1.050 | 0.217 ± 0.021 |
| | | [18.115, 18.157] | [0.560, 0.562] | [1.065, 1.093] | [0.217, 0.218] |
| Proposed | 0.444 | 18.282 ± 1.520 | 0.583 ± 0.072 | 0.974 ± 0.781 | 0.204 ± 0.021 |
| | | [18.261, 18.302] | [0.582, 0.584] | [0.964, 0.985] | [0.204, 0.205] |
| Model | Dice (↑) | SSIM (↑) |
|---|---|---|
| Fine-tuning (λ = 1.0) | 0.849 ± 0.078 | 0.971 ± 0.012 |
| Fine-tuning (λ = 3.0) | 0.788 ± 0.091 | 0.962 ± 0.015 |
| Fine-tuning (λ = 5.0) | 0.818 ± 0.095 | 0.968 ± 0.015 |
| Model | Artifact (↓) | PSNR (↑) | SSIM (↑) | INU (↓) | PSNR p-Value |
|---|---|---|---|---|---|
| Training Strategy | | | | | |
| Baseline (Stage 1 Only) | 0.516 | 18.222 | 0.572 | 1.014 | — |
| Joint Training | 0.575 | 18.136 | 0.561 | 1.079 | <0.001 (↓) |
| Random Init (no Stage 1) | 0.000 † | 16.104 | 0.693 | 0.023 | <0.001 (↓) |
| Proposed (Two-Stage) | 0.444 | 18.282 | 0.583 | 0.974 | <0.001 (↑) |
| Loss Function (all Two-Stage) | | | | | |
| TV Loss | 0.000 † | 16.242 | 0.671 | 0.071 | <0.001 (↓) |
| L1 ROI Loss | 0.554 | 18.113 | 0.594 | 2.541 | <0.001 (↓) |
| L2 Gradient (Proposed) | 0.444 | 18.282 | 0.583 | 0.974 | <0.001 (↑) |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Lee, Y.; Park, H.-C. YOLO-Based Shading Artifact Reduction for CBCT-to-MDCT Translation Using Two-Stage Learning. Mathematics 2026, 14, 1223. https://doi.org/10.3390/math14071223