A Balanced Multimodal Multi-Task Deep Learning Framework for Robust Patient-Specific Quality Assurance
Abstract
1. Introduction
- BMMQA is the first dedicated solution to modality imbalance in radiotherapy PSQA and multi-task multimodal learning, explicitly addressing the convergence asynchrony in which tabular features dominate training. A network forward mechanism that efficiently computes outputs for different modality combinations is also proposed.
- BMMQA introduces task-specific fusion protocols: attention-based softmax weighting for GPR regression dynamically balances modality contributions, while spatial concatenation for DDP preserves structural fidelity. This dual-path design resolves representational conflicts between global scalar prediction and fine-grained spatial mapping.
- BMMQA develops a theoretically grounded Shapley value framework for quantifying modality contributions, tailored explicitly for continuous-output regression. This framework underpins a dynamic balancing mechanism that modulates training to prevent modality suppression.
- BMMQA addresses multi-task multimodal learning by introducing a modality-contribution-based task weighting scheme: task-specific loss weights are assigned according to each modality's measured contribution, resolving the modality imbalance problem across tasks.
- Extensive experiments on clinically relevant datasets validate that BMMQA outperforms existing methods on both coarse-grained and fine-grained PSQA tasks, significantly improving modality collaboration and task generalization in real clinical settings.
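The Shapley-based contribution analysis named in the third bullet has a closed form when only two modalities are involved, which is what makes the per-epoch diagnosis cheap. The sketch below assumes, following the paper's notation table, that the value function v(S) is the negative MAE of GPR predictions made with modality subset S; the helper names are illustrative, not the paper's.

```python
import numpy as np

def neg_mae(pred, target):
    """Value function v(S): negative MAE of GPR predictions made with
    modality subset S on the validation set (higher is better)."""
    return -float(np.mean(np.abs(np.asarray(pred) - np.asarray(target))))

def shapley_two_modalities(y_true, pred_both, pred_img, pred_tab, pred_none):
    """Exact Shapley contributions for a two-player (image, tabular) game.

    With two modalities, each player's Shapley value is the average of its
    marginal gain when joining the empty coalition and when joining the
    other modality -- no sampling over permutations is needed.
    """
    v_empty = neg_mae(pred_none, y_true)  # e.g. predicting the training mean
    v_img = neg_mae(pred_img, y_true)
    v_tab = neg_mae(pred_tab, y_true)
    v_both = neg_mae(pred_both, y_true)
    phi_img = 0.5 * ((v_img - v_empty) + (v_both - v_tab))
    phi_tab = 0.5 * ((v_tab - v_empty) + (v_both - v_img))
    return phi_img, phi_tab
```

By the efficiency property, phi_img + phi_tab equals v({img, tab}) − v(∅), so the two contributions can be compared directly to diagnose which modality dominates training.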
2. Related Work
2.1. Multimodal PSQA
2.2. Imbalanced Multimodal Learning
3. Foundations
3.1. Clinically Validated Data Curation
3.2. Task Formulation
3.3. Shapley Value
4. Methodology
4.1. Overview
4.2. Modality Encoders
4.3. Task-Specific Modality Fusion
4.3.1. GPR-Oriented Fusion
4.3.2. DDP-Oriented Fusion
4.4. Task-Specific Decoders
4.4.1. GPR Decoder
4.4.2. Dose Difference Decoder
4.5. Dynamic Balancing Mechanism
4.5.1. Shapley-Based Imbalance Indicator
4.5.2. Adjustment Strategy
Algorithm 1: Per-epoch Training Procedure of BMMQA
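Only the caption of Algorithm 1 survives here, so the following is a minimal sketch of one plausible per-epoch loop, under two stated assumptions: modality contributions are probed on validation data at epoch end (Section 4.5.1), and the DDP loss weight is rescaled from the contribution ratio, damped by the balance factor r. The update rule itself is illustrative; the paper's exact adjustment strategy (Section 4.5.2) may differ.

```python
import numpy as np

def per_epoch(batches, step_fn, probe_fn, lam_ddp, r=16.0):
    """One epoch of the sketched BMMQA training loop.

    step_fn(batch, lam_ddp) -> performs one gradient update on the joint
                               loss L = L_GPR + lam_ddp * L_DDP.
    probe_fn()              -> (phi_img, phi_tab), Shapley-style modality
                               contributions measured on validation data.
    The lambda update below is an assumed, illustrative rule: it nudges
    lam_ddp by the (tabular / image) contribution ratio, damped by r.
    """
    for batch in batches:
        step_fn(batch, lam_ddp)
    phi_img, phi_tab = probe_fn()
    ratio = phi_tab / max(phi_img, 1e-8)   # > 1 when tabular dominates
    lam_ddp *= 1.0 + (ratio - 1.0) / r     # damped adjustment
    return lam_ddp
```

With a large r the weight barely moves per epoch, which matches the intuition that the balancing should modulate, not destabilize, training.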
5. Evaluation and Results
5.1. Implementation Setting
5.2. Evaluation Metrics
5.3. Baseline Methods
5.4. EXP-A: Comparative Experiment with PSQA Methods
5.4.1. GPR Result Analysis
5.4.2. DDP Result Analysis
5.4.3. Modality Contribution Dynamics During Training
5.5. EXP-B: Ablation Experiment with Modality Balance Factor r
5.6. Computational Resource Utilization
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Meijers, A.; Marmitt, G.G.; Siang, K.N.W.; van der Schaaf, A.; Knopf, A.C.; Langendijk, J.A.; Both, S. Feasibility of patient specific quality assurance for proton therapy based on independent dose calculation and predicted outcomes. Radiother. Oncol. 2020, 150, 136–141.
2. Zeng, X.; Zhu, Q.; Ahmed, A.; Hanif, M.; Hou, M.; Jie, Q.; Xi, R.; Shah, S.A. Multi-granularity prior networks for uncertainty-informed patient-specific quality assurance. Comput. Biol. Med. 2024, 179, 108925.
3. Park, J.M.; Kim, J.I.; Park, S.Y.; Oh, D.H.; Kim, S.T. Reliability of the gamma index analysis as a verification method of volumetric modulated arc therapy plans. Radiat. Oncol. 2018, 13, 175.
4. Huang, Y.; Pi, Y.; Ma, K.; Miao, X.; Fu, S.; Chen, H.; Wang, H.; Gu, H.; Shao, Y.; Duan, Y.; et al. Image-based features in machine learning to identify delivery errors and predict error magnitude for patient-specific IMRT quality assurance. Strahlenther. Onkol. 2023, 199, 498–510.
5. Kui, X.; Liu, F.; Yang, M.; Wang, H.; Liu, C.; Huang, D.; Li, Q.; Chen, L.; Zou, B. A review of dose prediction methods for tumor radiation therapy. Meta-Radiol. 2024, 2, 100057.
6. Tan, H.S.; Wang, K.; Mcbeth, R. Deep Evidential Learning for Dose Prediction. arXiv 2024, arXiv:2404.17126.
7. Bi, Q.; Lian, X.; Shen, J.; Zhang, F.; Xu, T. Exploration of radiotherapy strategy for brain metastasis patients with driver gene positivity in lung cancer. J. Cancer 2024, 15, 1994.
8. Liu, W.; Zhang, L.; Xie, L.; Hu, T.; Li, G.; Bai, S.; Yi, Z. Multilayer perceptron neural network with regression and ranking loss for patient-specific quality assurance. Knowl.-Based Syst. 2023, 271, 110549.
9. Li, H.; Peng, X.; Zeng, J.; Xiao, J.; Nie, D.; Zu, C.; Wu, X.; Zhou, J.; Wang, Y. Explainable attention guided adversarial deep network for 3D radiotherapy dose distribution prediction. Knowl.-Based Syst. 2022, 241, 108324.
10. Hu, T.; Xie, L.; Zhang, L.; Li, G.; Yi, Z. Deep multimodal neural network based on data-feature fusion for patient-specific quality assurance. Int. J. Neural Syst. 2022, 32, 2150055.
11. Han, C.; Zhang, J.; Yu, B.; Zheng, H.; Wu, Y.; Lin, Z.; Ning, B.; Yi, J.; Xie, C.; Jin, X. Integrating plan complexity and dosiomics features with deep learning in patient-specific quality assurance for volumetric modulated arc therapy. Radiat. Oncol. 2023, 18, 116.
12. Huang, Y.; Pi, Y.; Ma, K.; Miao, X.; Fu, S.; Zhu, Z.; Cheng, Y.; Zhang, Z.; Chen, H.; Wang, H.; et al. Deep learning for patient-specific quality assurance: Predicting gamma passing rates for IMRT based on delivery fluence informed by log files. Technol. Cancer Res. Treat. 2022, 21, 15330338221104881.
13. Li, C.; Su, Z.; Li, B.; Sun, W.; Wu, D.; Zhang, Y.; Li, X.; Xie, Z.; Huang, J.; Wei, Q. Plan complexity and dosiomics signatures for gamma passing rate classification in volumetric modulated arc therapy: External validation across different LINACs. Phys. Med. 2025, 133, 104962.
14. Sun, W.; Mo, Z.; Li, Y.; Xiao, J.; Jia, L.; Huang, S.; Liao, C.; Du, J.; He, S.; Chen, L.; et al. Machine learning-based ensemble prediction model for the gamma passing rate of VMAT-SBRT plan. Phys. Med. 2024, 117, 103204.
15. Huang, Y.; Cai, R.; Pi, Y.; Ma, K.; Kong, Q.; Zhuo, W.; Kong, Y. A feasibility study to predict 3D dose delivery accuracy for IMRT using DenseNet with log files. J. X-Ray Sci. Technol. 2024, 32, 1199–1208.
16. Xu, S.; Cui, M.; Huang, C.; Wang, H.; Hu, D. BalanceBenchmark: A Survey for Multimodal Imbalance Learning. arXiv 2025, arXiv:2502.10816.
17. Du, C.; Teng, J.; Li, T.; Liu, Y.; Yuan, T.; Wang, Y.; Yuan, Y.; Zhao, H. On Uni-Modal Feature Learning in Supervised Multi-Modal Learning. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; PMLR, 2023; Volume 202, pp. 8632–8656.
18. Hua, C.; Xu, Q.; Bao, S.; Yang, Z.; Huang, Q. ReconBoost: Boosting Can Achieve Modality Reconcilement. arXiv 2024, arXiv:2405.09321.
19. Peng, X.; Wei, Y.; Deng, A.; Wang, D.; Hu, D. Balanced Multimodal Learning via On-the-Fly Gradient Modulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 21–24 June 2022; pp. 8238–8247.
20. Xu, R.; Feng, R.; Zhang, S.X.; Hu, D. MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning. In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
21. Li, H.; Li, X.; Hu, P.; Lei, Y.; Li, C.; Zhou, Y. Boosting Multi-modal Model Performance with Adaptive Gradient Modulation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–3 October 2023; pp. 22214–22224.
22. Ma, H.; Zhang, Q.; Zhang, C.; Wu, B.; Fu, H.; Zhou, J.T.; Hu, Q. Calibrating Multimodal Learning. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; PMLR, 2023; Volume 202, pp. 23429–23450.
23. Wu, N.; Jastrzebski, S.; Cho, K.; Geras, K.J. Characterizing and Overcoming the Greedy Nature of Learning in Multi-modal Deep Neural Networks. In Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; PMLR, 2022; Volume 162, pp. 24043–24055.
24. Wei, Y.; Hu, D. MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance. arXiv 2024, arXiv:2405.17730.
25. Fan, Y.; Xu, W.; Wang, H.; Wang, J.; Guo, S. PMR: Prototypical Modal Rebalance for Multimodal Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 20029–20038.
26. Wang, W.; Tran, D.; Feiszli, M. What makes training multi-modal classification networks hard? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 14–19 June 2020; pp. 12695–12705.
27. Winterbottom, T.; Xiao, S.; McLean, A.; Moubayed, N.A. On modality bias in the TVQA dataset. arXiv 2020, arXiv:2012.10210.
28. Sun, Y.; Mai, S.; Hu, H. Learning to Balance the Learning Rates Between Various Modalities via Adaptive Tracking Factor. IEEE Signal Process. Lett. 2021, 28, 1650–1654.
29. Huang, Y.; Lin, J.; Zhou, C.; Yang, H.; Huang, L. Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably). arXiv 2022, arXiv:2203.12221.
30. Du, C.; Li, T.; Liu, Y.; Wen, Z.; Hua, T.; Wang, Y.; Zhao, H. Improving Multi-Modal Learning with Uni-Modal Teachers. arXiv 2021, arXiv:2106.11059.
31. Wei, Y.; Feng, R.; Wang, Z.; Hu, D. Enhancing Multimodal Cooperation via Sample-level Modality Valuation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024.
32. Wei, Y.; Li, S.; Feng, R.; Hu, D. Diagnosing and Re-learning for Balanced Multimodal Learning. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2024.
33. Yang, Z.; Wei, Y.; Liang, C.; Hu, D. Quantifying and Enhancing Multi-modal Robustness with Modality Preference. arXiv 2024, arXiv:2402.06244.
34. Miften, M.; Olch, A.; Mihailidis, D.; Moran, J.; Pawlicki, T.; Molineu, A.; Li, H.; Wijesooriya, K.; Shi, J.; Xia, P.; et al. TG 218: Tolerance limits and methodologies for IMRT measurement-based verification QA: Recommendations of AAPM Task Group No. 218. Med. Phys. 2018, 45, e53–e83.
35. McNiven, A.L.; Sharpe, M.B.; Purdie, T.G. A new metric for assessing IMRT modulation complexity and plan deliverability. Med. Phys. 2010, 37, 505–515.
36. Du, W.; Cho, S.H.; Zhang, X.; Hoffman, K.E.; Kudchadker, R.J. Quantification of beam complexity in intensity-modulated radiation therapy treatment plans. Med. Phys. 2014, 41, 021716.
37. Götstedt, J.; Karlsson Hauer, A.; Bäck, A. Development and evaluation of aperture-based complexity metrics using film and EPID measurements of static MLC openings. Med. Phys. 2015, 42, 3911–3921.
38. Crowe, S.; Kairn, T.; Kenny, J.; Knight, R.; Hill, B.; Langton, C.M.; Trapp, J. Treatment plan complexity metrics for predicting IMRT pre-treatment quality assurance results. Australas. Phys. Eng. Sci. Med. 2014, 37, 475–482.
39. Crowe, S.; Kairn, T.; Middlebrook, N.; Sutherland, B.; Hill, B.; Kenny, J.; Langton, C.M.; Trapp, J. Examination of the properties of IMRT and VMAT beams and evaluation against pre-treatment quality assurance results. Phys. Med. Biol. 2015, 60, 2587.
40. Nauta, M.; Villarreal-Barajas, J.E.; Tambasco, M. Fractal analysis for assessing the level of modulation of IMRT fields. Med. Phys. 2011, 38, 5385–5393.
41. Zhu, H.; Zhu, Q.; Wang, Z.; Yang, B.; Zhang, W.; Qiu, J. Patient-specific quality assurance prediction models based on machine learning for novel dual-layered MLC linac. Med. Phys. 2023, 50, 1205–1214.
42. Shapley, L. A value for n-person games. Ann. Math. Stud. 1953, 28, 307–317.
43. Molnar, C. Interpretable Machine Learning; Lulu Press: Morrisville, NC, USA, 2020.
44. Weber, R.J. Probabilistic values for games. In The Shapley Value: Essays in Honor of Lloyd S. Shapley; Cambridge University Press: New York, NY, USA, 1988; pp. 101–119.
45. Freund, Y.; Schapire, R.E. Game theory, on-line prediction and boosting. In Proceedings of the Ninth Annual Conference on Computational Learning Theory, Garda, Italy, 28 June–1 July 1996; pp. 325–332.
46. Wang, K.; Gou, C.; Duan, Y.; Lin, Y.; Zheng, X.; Wang, F.Y. Generative adversarial networks: Introduction and outlook. IEEE/CAA J. Autom. Sin. 2017, 4, 588–598.
47. Gemp, I.; McWilliams, B.; Vernade, C.; Graepel, T. EigenGame: PCA as a Nash Equilibrium. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual, 2021.
48. Hu, P.; Li, X.; Zhou, Y. SHAPE: An unified approach to evaluate the contribution and cooperation of individual modalities. arXiv 2022, arXiv:2205.00302.
49. Ono, T.; Hirashima, H.; Iramina, H.; Mukumoto, N.; Miyabe, Y.; Nakamura, M.; Mizowaki, T. Prediction of dosimetric accuracy for VMAT plans using plan complexity parameters via machine learning. Med. Phys. 2019, 46, 3823–3832.
50. Lay, L.M.; Chuang, K.C.; Wu, Y.; Giles, W.; Adamson, J. Virtual patient-specific QA with DVH-based metrics. J. Appl. Clin. Med. Phys. 2022, 23, e13639.
51. Thongsawad, S.; Srisatit, S.; Fuangrod, T. Predicting gamma evaluation results of patient-specific head and neck volumetric-modulated arc therapy quality assurance based on multileaf collimator patterns and fluence map features: A feasibility study. J. Appl. Clin. Med. Phys. 2022, 23, e13622.
52. Valdes, G.; Scheuermann, R.; Hung, C.; Olszanski, A.; Bellerive, M.; Solberg, T. A mathematical framework for virtual IMRT QA using machine learning. Med. Phys. 2016, 43, 4323–4334.
53. Huang, Y.; Pi, Y.; Ma, K.; Miao, X.; Fu, S.; Chen, H.; Wang, H.; Gu, H.; Shan, Y.; Duan, Y.; et al. Virtual patient-specific quality assurance of IMRT using UNet++: Classification, gamma passing rates prediction, and dose difference prediction. Front. Oncol. 2021, 11, 700343.
54. Yoganathan, S.; Ahmed, S.; Paloor, S.; Torfeh, T.; Aouadi, S.; Al-Hammadi, N.; Hammoud, R. Virtual pretreatment patient-specific quality assurance of volumetric modulated arc therapy using deep learning. Med. Phys. 2023, 50, 7891–7903.
55. Radosavovic, I.; Kosaraju, R.P.; Girshick, R.; He, K.; Dollár, P. Designing network design spaces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10428–10436.
56. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
57. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 22–25 July 2017; pp. 4700–4708.
58. Mehta, S.; Rastegari, M. MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178.
59. Suwanraksa, C.; Bridhikitti, J.; Liamsuwan, T.; Chaichulee, S. CBCT-to-CT translation using registration-based generative adversarial networks in patients with head and neck cancer. Cancers 2023, 15, 2017.
60. Liu, F.; Liu, J.; Fang, Z.; Hong, R.; Lu, H. Densely Connected Attention Flow for Visual Question Answering. In Proceedings of the IJCAI, Macao, China, 10–16 August 2019; pp. 869–875.
61. Xu, J.; Pan, Y.; Pan, X.; Hoi, S.; Yi, Z.; Xu, Z. RegNet: Self-regulated network for image classification. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 9562–9567.
62. Zeng, L.; Zhang, M.; Zhang, Y.; Zou, Z.; Guan, Y.; Huang, B.; Yu, X.; Ding, S.; Liu, Q.; Gong, C. TransQA: Deep hybrid transformer network for measurement-guided volumetric dose prediction of pre-treatment patient-specific quality assurance. Phys. Med. Biol. 2023, 68, 205010.
63. Yang, X.; Li, S.; Shao, Q.; Cao, Y.; Yang, Z.; Zhao, Y.Q. Uncertainty-guided man–machine integrated patient-specific quality assurance. Radiother. Oncol. 2022, 173, 1–9.
Reference | Category | Description (Modalities + Fusion Strategy) | Fine-Grained Task? | Modality Balance? |
---|---|---|---|---|
[11] | Early Fusion | Dose matrices encoded via DenseNet-121 (image) and plan complexity features (tabular) are separately reduced using PCA/Lasso and concatenated for Random Forest regression. | ✗ | ✗ |
[13] | Early Fusion | Plan and dosimetric features (tabular) are selected via Lasso and used as input to a logistic regression model for classification. | ✗ | ✗ |
[14] | Late Fusion | Two SVMs are trained separately on radiomics (image-like features) and plan complexity features; predictions are aggregated to improve GPR classification and regression. | ✗ | ✓ |
[10] | Late Fusion | Dose distribution matrices (image) and MU values (tabular) are processed by separate models; outputs are aggregated at the decision level as a baseline fusion strategy. | ✗ | ✓ |
[10] | Intermediate Fusion | 3D ResNet encodes MLC aperture images; spatial features are concatenated one-to-one with MU values (tabular), jointly optimized in an end-to-end model (FDF). | ✓ | ✗ |
[12] | Intermediate Fusion | Dose distribution is compressed into a 99D vector and concatenated with MU value; the fused vector is passed to fully connected layers. | ✓ | ✗ |
[15] | Intermediate Fusion | MU-weighted fluence maps generated from delivery log files are used as image input to DenseNet for GPR prediction under multiple gamma criteria. | ✓ | ✗ |
BMMQA | Intermediate Fusion | Dual-modality input (dose matrix + tabular metrics); image encoded by a pre-trained CNN, tabular by MLP; fusion uses task-specific heads: attention-weighted feature fusion for GPR, spatial concatenation for DDP; Shapley-based diagnostics and adaptive loss scaling address modality imbalance. | ✓ | ✓ |
Treatment Site | Abbreviation | Number of Plans |
---|---|---|
Head and Neck | H&N | 19 |
Chest | C | 138 |
Abdomen | A | 31 |
Pelvis | P | 22 |
Type | Ref | Metric Name | H&N | C | A | P
---|---|---|---|---|---|---
Complexity Metric (Input Tabular Modality) | [35] | Aperture area variability of distal MLC (d-AAV) | 0.30 | 0.42 | 0.35 | 0.43
 | | Aperture area variability of proximal MLC (p-AAV) | 0.31 | 0.44 | 0.41 | 0.43
 | | Leaf sequence variability of distal MLC (d-LSV) | 2.49 | 3.17 | 6.81 | 4.71
 | | Leaf sequence variability of proximal MLC (p-LSV) | 1.44 | 1.66 | 3.48 | 1.88
 | | Modulation complex score of distal MLC (d-MCS) | 1.38 | 0.90 | 0.97 | 0.82
 | | Modulation complex score of proximal MLC (p-MCS) | 1.21 | 0.70 | 0.68 | 0.73
 | [36] | Beam area of distal MLC (d-BA) | 0.27 | 0.56 | 0.34 | 0.38
 | | Beam area of proximal MLC (p-BA) | 0.22 | 0.53 | 0.37 | 0.35
 | | Union aperture areas of distal MLC (d-UAA) | 20.45 | 22.84 | 16.89 | 35.90
 | | Union aperture areas of proximal MLC (p-UAA) | 15.43 | 20.45 | 18.44 | 32.26
 | | Plan irregularity of distal MLC (d-PI) | 32.98 | 66.99 | 88.17 | 66.18
 | | Plan irregularity of proximal MLC (p-PI) | 31.57 | 66.84 | 92.33 | 68.85
 | | Plan modulation of distal MLC (d-PM) | 0.70 | 0.58 | 0.65 | 0.57
 | | Plan modulation of proximal MLC (p-PM) | 0.69 | 0.56 | 0.59 | 0.57
 | [37] | Circumference/area of distal MLC (d-C/A) | 0.36 | 1.01 | 1.12 | 1.00
 | | Circumference/area of proximal MLC (p-C/A) | 0.34 | 1.02 | 1.32 | 1.01
 | [38] | Closed leaf score of distal MLC (d-CLS) | 0.65 | 1.07 | 1.79 | 1.54
 | | Closed leaf score of proximal MLC (p-CLS) | 0.66 | 1.08 | 2.11 | 1.55
 | | Mean asymmetry distance of distal MLC (d-MAD) | 0.78 | 0.79 | 0.78 | 0.80
 | | Mean asymmetry distance of proximal MLC (p-MAD) | 0.77 | 0.81 | 0.79 | 0.79
 | [39] | Small aperture score 2 mm of distal MLC (d-SAS-2 mm) | 0.23 | 0.34 | 0.26 | 0.34
 | | Small aperture score 5 mm of distal MLC (d-SAS-5 mm) | 0.24 | 0.36 | 0.31 | 0.34
 | | Small aperture score 10 mm of distal MLC (d-SAS-10 mm) | 263.53 | 195.03 | 191.07 | 172.01
 | | Small aperture score 15 mm of distal MLC (d-SAS-15 mm) | 10.01 | 28.43 | 31.07 | 28.29
 | | Small aperture score 2 mm of proximal MLC (p-SAS-2 mm) | 9.87 | 29.62 | 38.17 | 29.59
 | | Small aperture score 5 mm of proximal MLC (p-SAS-5 mm) | 0.19 | 0.15 | 0.11 | 0.13
 | | Small aperture score 10 mm of proximal MLC (p-SAS-10 mm) | 0.10 | 0.08 | 0.11 | 0.08
 | | Small aperture score 15 mm of proximal MLC (p-SAS-15 mm) | 0.26 | 0.19 | 0.13 | 0.17
 | [40] | Average leaf gap of distal MLC (d-ALG) | 0.16 | 0.14 | 0.15 | 0.13
 | | Average leaf gap of proximal MLC (p-ALG) | 0.49 | 0.37 | 0.23 | 0.25
 | | Standard deviation of leaf gap of distal MLC (d-SLG) | 0.31 | 0.33 | 0.20 | 0.22
 | | Standard deviation of leaf gap of proximal MLC (p-SLG) | 0.68 | 0.53 | 0.31 | 0.39
 | [41] | MU value of each beam (MU) | 0.59 | 0.48 | 0.31 | 0.37
QA Metric (Target) | | Gamma Passing Rate (1%/1 mm) | 91.80 | 93.72 | 99.15 | 97.90
 | | Gamma Passing Rate (2%/2 mm) | 96.58 | 96.78 | 99.78 | 99.05
 | | Gamma Passing Rate (2%/3 mm) | 98.31 | 98.13 | 99.95 | 99.37
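The GPR targets in the table above come from gamma analysis of measured versus planned dose. As a rough illustration only (clinical gamma analysis per AAPM TG-218 operates on 2D/3D dose grids with interpolation and normalization choices not modeled here), a global-normalization gamma passing rate on a 1D dose profile can be sketched as:

```python
import numpy as np

def gamma_passing_rate(dose_eval, dose_ref, spacing_mm, dd_pct, dta_mm,
                       low_dose_cutoff=0.10):
    """Global-gamma passing rate on a 1D dose profile (illustrative only).

    For each evaluated point, gamma is the minimum over reference positions of
    sqrt((dist / DTA)^2 + (dose_diff / DD)^2); the point passes when gamma <= 1.
    Points below the low-dose cutoff (as a fraction of the reference maximum)
    are excluded, as is common practice.
    """
    dose_eval = np.asarray(dose_eval, dtype=float)
    dose_ref = np.asarray(dose_ref, dtype=float)
    dd = dd_pct / 100.0 * dose_ref.max()         # global dose criterion
    x = np.arange(dose_ref.size) * spacing_mm    # positions in mm
    gammas = []
    for i, d in enumerate(dose_eval):
        if d < low_dose_cutoff * dose_ref.max():
            continue                              # skip low-dose region
        dist_term = (x - x[i]) / dta_mm
        dose_term = (dose_ref - d) / dd
        gammas.append(float(np.sqrt(dist_term**2 + dose_term**2).min()))
    return 100.0 * float(np.mean(np.asarray(gammas) <= 1.0))
```

With identical profiles every point finds a zero-distance, zero-difference match and the passing rate is 100%, which is why tighter criteria such as 1%/1 mm (smaller DD and DTA denominators) produce the lower passing rates seen in the table.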
Symbol | Description
---|---
 | A single input sample, consisting of an image modality and a tabular modality.
 | Ground truths for the GPR and DDP tasks, respectively.
 | Latent feature map from the image encoder; requires spatial pooling before projection.
 | Latent tabular embedding vector produced by the tabular encoder.
 | Fused latent representations used in the GPR and DDP heads, respectively.
 | Matrix multiplication between the projection matrix and the tabular embedding vector.
 | Matrix multiplication between the projection matrix and the pooled image feature map. Because of the notational load of combining multi-task and multimodal symbols, spatial pooling is applied implicitly for simplicity and consistency, yielding a fixed-dimensional vector; this convention is used throughout the paper.
 | Attention weights for the image and tabular modalities, computed via softmax over scalar scores.
 | Power set of modality subsets considered for ablation.
 | GPR prediction computed using only the given modality subset.
 | Model predictions for the GPR and DDP tasks, using all available modalities by default.
 | Value function for a modality subset, defined as the negative MAE of predictions over the validation set; used to compute Shapley-style contributions.
 | Modality contributions computed via two-modality Shapley decomposition of the value function.
 | Adaptive weighting factor for the DDP loss, derived from modality contributions.
 | Loss terms for the GPR and DDP tasks.
 | Total training loss: the GPR loss plus the weighted DDP loss.
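The attention entries in the notation table (softmax over scalar modality scores) correspond to a GPR fusion of the following shape; the linear score vectors used here are illustrative stand-ins for the paper's learned projections.

```python
import numpy as np

def softmax(scores):
    s = np.asarray(scores, dtype=float)
    s = s - s.max()                  # shift for numerical stability
    e = np.exp(s)
    return e / e.sum()

def gpr_fusion(f_img, f_tab, w_img, w_tab):
    """Attention-weighted GPR fusion: each modality embedding is scored by a
    vector, the two scalar scores are softmax-normalized, and the embeddings
    are blended with the resulting weights.

    f_img : pooled image feature vector, shape (d,)
    f_tab : tabular embedding vector,    shape (d,)
    w_img, w_tab : score vectors,        shape (d,)  (illustrative stand-ins)
    """
    a_img, a_tab = softmax([float(w_img @ f_img), float(w_tab @ f_tab)])
    fused = a_img * f_img + a_tab * f_tab
    return fused, (a_img, a_tab)
```

Because the weights sum to 1, neither modality can be silently zeroed out of the GPR representation; for the fine-grained DDP head the framework instead concatenates features spatially to preserve structure.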
Parameter | Description | Value
---|---|---
Resized Dose Plan Array Shape | Dose plan arrays are preprocessed with ResizeWithPadOrCrop so that samples can be organized into batches | [512, 512]
Regression Head Architecture | Architecture of the GPR prediction head; the output dimension is 3 because three GPR criteria are predicted [1%/1 mm, 2%/2 mm, 2%/3 mm] | AdaptiveAvgPool2d([1, 1]) → Flatten() → Dropout(p = 0.1) → Linear(512, 3)
Decoder (DDP) Architecture | U-Net decoder channel dimensions | [768, 384, 192, 128, 32]
Dropout Rate | Dropout rate employed in the network | 0.1
Optimizer | Type of optimizer used | Adam
r | Modality balance factor used in computing the adaptive weighting factor; hyperparameters in this computation take their default values unless otherwise specified | 16
Epochs | Number of training epochs | 30
Batch Size | Size of the training batches | 32
Learning Rate | Learning rate for the optimizer | 
Dataset Split Ratio (EXP-A and EXP-B) | Train:Validation:Test split via multi-factor stratified sampling based on Figure 2 | 7:1:2
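The regression-head row above maps directly to a tiny forward pass. A NumPy sketch of AdaptiveAvgPool2d([1, 1]) → Flatten → Linear(512, 3) in inference mode (where Dropout is the identity):

```python
import numpy as np

def gpr_head_forward(feat, weight, bias):
    """Inference-time forward pass of the GPR regression head from the
    implementation table: adaptive average pooling to 1x1, flatten, then a
    512 -> 3 linear layer (dropout is disabled at inference).

    feat   : (batch, 512, H, W) encoder feature map
    weight : (512, 3) linear weights; bias : (3,)
    returns: (batch, 3) predictions, one per gamma criterion.
    """
    pooled = feat.mean(axis=(2, 3))   # AdaptiveAvgPool2d([1, 1]) + Flatten
    return pooled @ weight + bias     # Linear(512, 3)
```

Averaging over the spatial axes is what lets the head accept any H × W feature map, which is the point of using an adaptive pool before the fixed-size linear layer.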
Category | Method | Criteria | GPR MAE (%) All | GPR MAE (%) 95–100 | GPR MAE (%) 90–95 | GPR MAE (%) <90 | GPR RMSE (%) All | GPR RMSE (%) 95–100 | GPR RMSE (%) 90–95 | GPR RMSE (%) <90 | 
---|---|---|---|---|---|---|---|---|---|---|---
Tabular Modality | GBDT [41] | 1%/1 mm | 1.8666 | 1.6246 | 1.4805 | 3.3034 | 2.6750 | 2.3497 | 1.9099 | 4.3705 | 0.6055 |
2%/2 mm | 1.2371 | 0.9154 | 1.9578 | 4.3461 | 1.8317 | 1.2805 | 2.3500 | 5.2914 | 0.5992 | ||
2%/3 mm | 0.8798 | 0.7188 | 2.2047 | 7.8881 | 1.3434 | 1.0064 | 2.6306 | 7.8911 | 0.5740 | ||
RF [41] | 1%/1 mm | 2.0911 | 1.7536 | 1.5255 | 4.1621 | 3.0146 | 2.2812 | 1.8969 | 5.5119 | 0.5305 | |
2%/2 mm | 1.3554 | 0.9389 | 1.9412 | 7.0523 | 2.1110 | 1.2655 | 2.2275 | 7.7729 | 0.5030 | ||
2%/3 mm | 0.9515 | 0.6937 | 3.0804 | 10.4988 | 1.5628 | 0.9491 | 3.2777 | 10.5093 | 0.4784 | ||
PL [41] | 1%/1 mm | 2.0605 | 1.6859 | 1.7921 | 3.4642 | 3.1027 | 2.4075 | 2.6953 | 4.7986 | 0.4821 | |
2%/2 mm | 1.3673 | 1.0289 | 1.9915 | 5.2811 | 2.1353 | 1.4706 | 2.6279 | 6.5024 | 0.4697 | ||
2%/3 mm | 0.9664 | 0.7833 | 2.2965 | 9.8307 | 1.5707 | 1.1072 | 2.9172 | 9.8389 | 0.4349 | ||
Image Modality | UNET++ [53] | 1%/1 mm | 1.7783 | 1.3875 | 1.5326 | 3.1709 | 2.7271 | 2.2103 | 2.0726 | 4.4958 | 0.6165 |
2%/2 mm | 1.1216 | 0.8269 | 1.4684 | 5.4498 | 1.8683 | 1.2849 | 1.8351 | 6.4454 | 0.6218 | ||
2%/3 mm | 0.7704 | 0.6126 | 1.8108 | 9.2442 | 1.3504 | 0.9324 | 2.2840 | 9.2930 | 0.6258 | ||
Ranking Loss [8] | 1%/1 mm | 1.7604 | 1.2448 | 1.4286 | 3.6156 | 2.7392 | 1.8543 | 1.9004 | 5.0189 | 0.6286 | |
2%/2 mm | 1.1131 | 0.8030 | 1.5527 | 5.3085 | 1.8941 | 1.2502 | 1.9655 | 6.5941 | 0.6157 | ||
2%/3 mm | 0.7512 | 0.5939 | 1.7643 | 9.4602 | 1.3703 | 0.8979 | 2.4679 | 9.6021 | 0.6129 | ||
TransQA [62] | 1%/1 mm | 1.7308 | 1.4348 | 1.3519 | 3.2477 | 2.6730 | 2.3100 | 1.7716 | 4.5237 | 0.6344 | |
2%/2 mm | 1.1060 | 0.7795 | 1.5627 | 5.5561 | 1.8772 | 1.2354 | 1.9102 | 6.6028 | 0.6297 | ||
2%/3 mm | 0.7787 | 0.6049 | 1.9731 | 9.5395 | 1.3874 | 0.9178 | 2.5051 | 9.5951 | 0.6171 | ||
RegNet [61] | 1%/1 mm | 1.7576 | 1.5107 | 1.2643 | 3.4472 | 2.7098 | 2.1745 | 1.6929 | 4.8423 | 0.6302 | |
2%/2 mm | 1.0729 | 0.7354 | 1.5806 | 5.4991 | 1.8329 | 1.1409 | 1.9207 | 6.6090 | 0.6423 | ||
2%/3 mm | 0.7232 | 0.5699 | 1.6869 | 9.4960 | 1.3191 | 0.8856 | 2.1924 | 9.4994 | 0.638 | ||
Unbalanced Multimodality | ResNet18 [63] | 1%/1 mm | 1.7590 | 1.2781 | 1.5838 | 3.1678 | 2.6695 | 1.9935 | 2.0368 | 4.5546 | 0.6435 |
2%/2 mm | 1.0841 | 0.8023 | 1.4118 | 5.2429 | 1.8383 | 1.2270 | 1.9120 | 6.3409 | 0.6371 | ||
2%/3 mm | 0.7587 | 0.5900 | 1.9359 | 9.0621 | 1.3604 | 0.9105 | 2.5084 | 9.1153 | 0.6204 | ||
DenseNet121 [53] | 1%/1 mm | 1.7163 | 1.4312 | 1.2647 | 3.3849 | 2.5983 | 2.0865 | 1.6889 | 4.5853 | 0.6701 | |
2%/2 mm | 1.0943 | 0.7143 | 1.7679 | 5.5857 | 1.8488 | 1.1296 | 2.1092 | 6.4863 | 0.6681 | ||
2%/3 mm | 0.7978 | 0.6084 | 2.2694 | 8.4039 | 1.3743 | 0.9064 | 2.6525 | 8.9947 | 0.6399 | ||
MobileVit [62] | 1%/1 mm | 1.7383 | 1.3995 | 1.4173 | 3.2042 | 2.6178 | 2.0552 | 1.8679 | 4.5078 | 0.6577 | |
2%/2 mm | 1.1021 | 0.7656 | 1.6221 | 5.4503 | 1.8338 | 1.1682 | 1.9683 | 6.4490 | 0.6443 | ||
2%/3 mm | 0.7666 | 0.5865 | 2.0116 | 9.7678 | 1.3761 | 0.8995 | 2.4443 | 9.7681 | 0.6208 | ||
RegNet [61] | 1%/1 mm | 1.7231 | 1.5993 | 1.2690 | 3.0660 | 2.6364 | 2.2652 | 1.7408 | 4.4818 | 0.6493 | |
2%/2 mm | 1.0709 | 0.7864 | 1.3264 | 5.6330 | 1.8495 | 1.2493 | 1.7169 | 6.6010 | 0.6314 | ||
2%/3 mm | 0.7462 | 0.5641 | 2.0615 | 9.2111 | 1.3494 | 0.8660 | 2.5374 | 9.3495 | 0.6356 | ||
Balanced Multimodality | ResNet18 [63] | 1%/1 mm | 1.7047 | 1.3624 | 1.2807 | 2.9709 | 2.5799 | 2.0178 | 1.6799 | 4.3262 | 0.6629 |
2%/2 mm | 1.0437 | 0.7619 | 1.2372 | 4.9718 | 1.7894 | 1.1533 | 1.6703 | 6.0951 | 0.6512 | ||
2%/3 mm | 0.7323 | 0.5817 | 1.7598 | 8.4276 | 1.2979 | 0.8949 | 2.2805 | 8.7005 | 0.6453 | ||
DenseNet121 [53] | 1%/1 mm | 1.6554 | 1.4855 | 1.1600 | 3.1554 | 2.6145 | 2.1450 | 1.5494 | 4.6966 | 0.6631 | |
2%/2 mm | 1.0385 | 0.7599 | 1.2611 | 5.3775 | 1.8531 | 1.1958 | 1.7133 | 6.7708 | 0.6357 | ||
2%/3 mm | 0.7446 | 0.5613 | 1.9863 | 9.3201 | 1.3767 | 0.8744 | 2.5759 | 9.6852 | 0.6224 | ||
MobileVit [62] | 1%/1 mm | 1.7322 | 1.2182 | 1.2759 | 2.8021 | 2.5736 | 1.9130 | 1.6695 | 4.1985 | 0.6609 | |
2%/2 mm | 1.0779 | 0.7003 | 1.3006 | 5.0196 | 1.7886 | 1.0464 | 1.7484 | 6.0630 | 0.6551 | ||
2%/3 mm | 0.7672 | 0.6119 | 1.7489 | 8.0622 | 1.3283 | 0.9018 | 2.2418 | 8.5363 | 0.6354 | ||
RegNet [61] | 1%/1 mm | 1.6925 | 1.3151 | 1.0587 | 3.3435 | 2.6448 | 1.9737 | 1.3771 | 4.7989 | 0.6689 | |
2%/2 mm | 1.0814 | 0.7307 | 1.4890 | 5.6002 | 1.8345 | 1.0826 | 1.8753 | 6.6543 | 0.6770 | ||
2%/3 mm | 0.7582 | 0.5553 | 2.0203 | 9.1534 | 1.3198 | 0.8208 | 2.5249 | 9.2636 | 0.6814 |
Category | Method | DDP MAE (%) All | DDP MAE (%) 95–100 | DDP MAE (%) 90–95 | DDP MAE (%) <90 | DDP SSIM All | DDP SSIM 95–100 | DDP SSIM 90–95 | DDP SSIM <90 | 
---|---|---|---|---|---|---|---|---|---|---
Image Modality | UNET++ [53] | 4.7676 | 5.3073 | 4.8217 | 3.5283 | 0.9527 | 0.9526 | 0.9518 | 0.9551 | 0.9753 |
Ranking Loss [8] | 4.5994 | 5.1528 | 4.6368 | 3.3719 | 0.9559 | 0.9555 | 0.9552 | 0.9585 | 0.9784 | |
TransQA [62] | 4.6481 | 5.2112 | 4.6973 | 3.3718 | 0.9599 | 0.9591 | 0.9592 | 0.9634 | 0.9793 | |
RegNet [61] | 5.0810 | 5.6554 | 5.1256 | 3.7927 | 0.9370 | 0.9347 | 0.9364 | 0.9432 | 0.9793 | |
Unbalanced Multimodality | ResNet18 [63] | 4.8604 | 5.2371 | 4.7999 | 4.2309 | 0.9490 | 0.9553 | 0.9531 | 0.9260 | 0.9748 |
DenseNet121 [53] | 4.8091 | 5.2819 | 4.7740 | 3.9207 | 0.9551 | 0.9609 | 0.9583 | 0.9356 | 0.9764 | |
MobileVit [62] | 4.8784 | 5.4469 | 4.9394 | 3.5631 | 0.9568 | 0.9554 | 0.9561 | 0.9613 | 0.9774 | |
RegNet [61] | 4.7814 | 5.3558 | 4.8183 | 3.5116 | 0.9554 | 0.9533 | 0.9551 | 0.9604 | 0.9805 | |
Balanced Multimodality | ResNet18 [63] | 4.3027 | 4.7272 | 4.3588 | 3.2953 | 0.9535 | 0.9600 | 0.9575 | 0.9367 | 0.9773 |
DenseNet121 [53] | 4.0220 | 4.4545 | 3.9942 | 3.1993 | 0.9640 | 0.9650 | 0.9656 | 0.9581 | 0.9751 | |
MobileVit [62] | 4.4387 | 4.7451 | 4.3519 | 3.8250 | 0.9543 | 0.9598 | 0.9565 | 0.9453 | 0.9701 | |
RegNet [61] | 4.2745 | 4.7580 | 4.3112 | 3.1921 | 0.9579 | 0.9581 | 0.9573 | 0.9589 | 0.9778 |
Method Name | FLOPs (GMac) | Param Size (M)
---|---|---
UNET++ | 63.7 | 15.9
TransQA | 240.34 | 215.38
DenseNet121 | 6.4 | 14.62
BMMQA | 6.5 | 14.72
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zeng, X.; Ahmed, A.; Tunio, M.H. A Balanced Multimodal Multi-Task Deep Learning Framework for Robust Patient-Specific Quality Assurance. Diagnostics 2025, 15, 2555. https://doi.org/10.3390/diagnostics15202555