StrDiSeg: Adapter-Enhanced DINOv3 for Automated Ischemic Stroke Lesion Segmentation
Abstract
1. Introduction
- We present a comprehensive investigation into end-to-end fine-tuning of a large pretrained vision transformer for ischemic stroke lesion segmentation on NCCT and MRI;
- We design a multi-scale U-Net-style decoder with attention mechanisms for precise structural reconstruction from transformer features;
- We demonstrate experimentally that full fine-tuning significantly improves segmentation performance over partial or lightweight transfer, suggesting that large visual backbones pretrained on natural images may require full representational adaptation in clinical imaging domains with strong modality shifts.
2. Materials and Methods
2.1. Datasets
2.2. Proposed Model Architecture
2.2.1. Pretrained DINOv3 Encoder
2.2.2. Adapter-Based Fine-Tuning
2.2.3. Attention-Enhanced U-Net Decoder
2.3. Loss Function
2.4. Evaluation Metrics and Implementation Details
2.4.1. Evaluation Metrics
2.4.2. Implementation Details
3. Results
3.1. Quantitative Performance
3.2. Qualitative Visualization
3.3. Ablation Experiments
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| NCCT | Non-contrast computed tomography |
| DWI | Diffusion-weighted imaging |
| MRI | Magnetic resonance imaging |
| ViT-B | Vision transformer—base |
| CNNs | Convolutional neural networks |
| AIS | Acute ischemic stroke |
References
1. Smith, A.G.; Rowland Hill, C. Imaging Assessment of Acute Ischaemic Stroke: A Review of Radiological Methods. Br. J. Radiol. 2018, 91, 20170573.
2. Saver, J.L.; Goyal, M.; van der Lugt, A.; Menon, B.K.; Majoie, C.B.L.M.; Dippel, D.W.; Campbell, B.C.; Nogueira, R.G.; Demchuk, A.M.; Tomasello, A.; et al. Time to Treatment with Endovascular Thrombectomy and Outcomes from Ischemic Stroke: A Meta-analysis. JAMA 2016, 316, 1279–1289.
3. Wang, X.; Qi, G.; Cheng, X.; Shan, H.; Hong, L.; Dong, Q.; Liu, Y.; Zhou, F.; Li, S.; He, Z.; et al. Deep Learning for Identifying Ischemic Core in Acute Ischemic Stroke Based on Non-contrast Enhanced Computed Tomography. Stroke 2025, 56, ATP180.
4. Malik, M.; Chong, B.; Fernandez, J.; Shim, V.; Kasabov, N.K.; Wang, A. Stroke Lesion Segmentation and Deep Learning: A Comprehensive Review. Bioengineering 2024, 11, 86.
5. Pham, D.L.; Xu, C.; Prince, J.L. Current Methods in Medical Image Segmentation. Annu. Rev. Biomed. Eng. 2000, 2, 315–337.
6. Sharma, N.; Aggarwal, L.M. Automated Medical Image Segmentation Techniques. J. Med. Phys. 2010, 35, 3–14.
7. Hesamian, M.H.; Jia, W.; He, X.; Kennedy, P. Deep Learning Techniques for Medical Image Segmentation: Achievements and Challenges. J. Digit. Imaging 2019, 32, 582–596.
8. Gao, Y.; Jiang, Y.; Peng, Y.; Yuan, F.; Zhang, X.; Wang, J. Medical Image Segmentation: A Comprehensive Review of Deep Learning-Based Methods. Tomography 2025, 11, 52.
9. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015; Navab, N., Hornegger, J., Wells, W., Frangi, A., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; Volume 9351, pp. 234–241.
10. Isensee, F.; Jaeger, P.F.; Kohl, S.A.A.; Petersen, J.; Maier-Hein, K.H. nnU-Net: A Self-Configuring Method for Deep Learning-Based Biomedical Image Segmentation. Nat. Methods 2021, 18, 203–211.
11. Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation. IEEE Trans. Med. Imaging 2019, 39, 1856–1867.
12. Myronenko, A. 3D MRI Brain Tumor Segmentation Using Autoencoder Regularization. In BrainLes 2018: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries; Revised Selected Papers, Part II; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; Volume 4, pp. 311–320.
13. Oktay, O.; Schlemper, J.; Le Folgoc, L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. In Proceedings of the Medical Imaging with Deep Learning, Amsterdam, The Netherlands, 4–6 July 2018.
14. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
15. Hatamizadeh, A.; Yang, D.; Roth, H.R.; Xu, D. UNETR: Transformers for 3D Medical Image Segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV); IEEE: Piscataway, NJ, USA, 2022; pp. 1748–1758.
16. Tang, Y.; Yang, D.; Li, W.; Roth, H.R.; Landman, B.; Xu, D.; Nath, V.; Hatamizadeh, A. Self-Supervised Pre-training of Swin Transformers for 3D Medical Image Analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2022; pp. 20730–20740.
17. Hatamizadeh, A.; Nath, V.; Tang, Y.; Yang, D.; Roth, H.; Xu, D. Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images. arXiv 2022, arXiv:2201.01266.
18. Raghu, M.; Zhang, C.; Kleinberg, J.M.; Bengio, S. Transfusion: Understanding Transfer Learning for Medical Imaging. In Proceedings of the Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019.
19. Chen, S.; Ma, K.; Zheng, Y. Med3D: Transfer Learning for 3D Medical Image Analysis. arXiv 2019, arXiv:1904.00625.
20. Liu, C.C.; Pfeiffer, J.; Vulić, I.; Gurevych, I. FUN with Fisher: Improving Generalization of Adapter-Based Cross-Lingual Transfer with Scheduled Unfreezing. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers); Duh, K., Gomez, H., Bethard, S., Eds.; Association for Computational Linguistics: Mexico City, Mexico, 2024; pp. 1998–2015.
21. Han, C.; Wang, Q.; Cui, Y.; Wang, W.; Huang, L.; Qi, S.; Liu, D. Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning? In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 7 May 2024.
22. Siméoni, O.; Vo, H.V.; Seitzer, M.; Baldassarre, F.; Oquab, M.; Jose, C.; Khalidov, V.; Szafraniec, M.; Yi, S.; Ramamonjisoa, M.; et al. DINOv3. arXiv 2025, arXiv:2508.10104.
23. Liang, K.; Han, K.; Li, X.; Cheng, X.; Li, Y.; Wang, Y.; Yu, Y. Symmetry-Enhanced Attention Network for Acute Ischemic Infarct Segmentation with Non-Contrast CT Images. In Proceedings of the MICCAI 2021, Strasbourg, France, 27 September–1 October 2021.
24. Hernandez Petzsche, M.R.; de la Rosa, E.; Hanning, U.; Wiest, R.; Valenzuela, W.; Reyes, M.; Kirschke, J.S. ISLES 2022: A Multi-Center Magnetic Resonance Imaging Stroke Lesion Segmentation Dataset. Sci. Data 2022, 9, 762.
25. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
26. Liu, X.; Sanchez, P.; Thermos, S.; O’Neil, A.Q.; Tsaftaris, S.A. Learning Disentangled Representations in the Imaging Domain. Med. Image Anal. 2022, 80, 102516.
27. Zou, S.; Wang, G.; Zhang, Y.; Shen, Y.; Yuan, M.; Su, Y. Adaptive Feature Decoupled Network for Polyp Segmentation. Biomed. Signal Process. Control 2026, 111, 108327.
28. Xie, S.; Tu, Z. Holistically-Nested Edge Detection. arXiv 2015, arXiv:1504.06375.
29. Kerssies, T.; Cavagnero, N.; Hermans, A.; Norouzi, N.; Averta, G.; Leibe, B.; Dubbelman, G.; de Geus, D. Your ViT is Secretly an Image Segmentation Model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2025.
30. He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked Autoencoders Are Scalable Vision Learners. arXiv 2021, arXiv:2111.06377.
31. Wang, W.; Gao, Z.; Gu, L.; Pu, H.; Cui, L.; Wei, X.; Liu, Z.; Jing, L.; Ye, S.; Shao, J.; et al. InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency. arXiv 2025, arXiv:2508.18265.
32. Zhu, J.; Wang, W.; Chen, Z.; Liu, Z.; Ye, S.; Gu, L.; Tian, H.; Duan, Y.; Su, W.; Shao, J.; et al. InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models. arXiv 2025, arXiv:2504.10479.
33. Chen, Z.; Wu, J.; Wang, W.; Su, W.; Chen, G.; Xing, S.; Zhong, M.; Zhang, Q.; Zhu, X.; Lu, L.; et al. InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2024; pp. 24185–24198.







| Model | Dice ↑ | HD/HD95 ↓ | IoU ↑ | VC ↑ | VDP ↓ |
|---|---|---|---|---|---|
| SegResNet [12] | 0.423 | 3.807/3.717 | 0.337 | 0.574 | 0.576 |
| UNETR [15] | 0.347 | 4.011/3.986 | 0.261 | 0.410 | 0.673 |
| Swin UNETR [17] | 0.447 | 3.744/3.687 | 0.351 | 0.541 | 0.565 |
| nnU-Net [10] | 0.453 | 3.770/3.691 | 0.355 | 0.535 | 0.547 |
| UNet++ [11] | 0.441 | 3.759/3.677 | 0.352 | 0.585 | 0.540 |
| Attention U-Net [13] | 0.428 | 3.795/3.687 | 0.349 | 0.608 | 0.563 |
| Ours (Backbone Frozen) | 0.372 | 4.080/4.077 | 0.269 | 0.383 | 0.552 |
| Ours | 0.516 | 3.493/3.432 | 0.400 | 0.528 | 0.429 |

| Model | Dice ↑ | HD/HD95 ↓ | IoU ↑ | VC ↑ | VDP ↓ |
|---|---|---|---|---|---|
| SegResNet [12] | 0.805 | 3.018/2.445 | 0.695 | 0.810 | 0.175 |
| UNETR [15] | 0.791 | 3.197/2.660 | 0.678 | 0.796 | 0.190 |
| Swin UNETR [17] | 0.800 | 3.059/2.525 | 0.688 | 0.804 | 0.204 |
| nnU-Net [10] | 0.818 | 2.960/2.316 | 0.712 | 0.822 | 0.167 |
| UNet++ [11] | 0.814 | 2.985/2.354 | 0.706 | 0.817 | 0.178 |
| Attention U-Net [13] | 0.813 | 3.018/2.446 | 0.704 | 0.817 | 0.171 |
| Ours (Backbone Frozen) | 0.741 | 3.342/3.067 | 0.610 | 0.743 | 0.229 |
| Ours | 0.824 | 3.096/2.390 | 0.716 | 0.826 | 0.155 |
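
The Dice and IoU columns in the two comparison tables above are overlap scores between a predicted and a reference lesion mask. A minimal sketch of how such scores can be computed for one pair of binary masks follows. The function name `dice_and_iou`, the flat-list mask representation, and the convention of scoring two empty masks as 1.0 are assumptions made for illustration; the evaluation code behind the tables is not shown here and may aggregate differently (for example per slice or per volume).

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two flat binary masks of equal length.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")

    # Count voxels that are foreground in both masks, and in each mask alone.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - intersection

    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0

    dice = 2.0 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou


# Toy example: 8 voxels, 3 shared foreground voxels, 4 foreground voxels in each mask.
pred_mask = [0, 1, 1, 1, 0, 0, 1, 0]
ref_mask = [0, 1, 1, 0, 0, 1, 1, 0]
dice, iou = dice_and_iou(pred_mask, ref_mask)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

For a single mask pair the two scores are tied by IoU = Dice / (2 − Dice). Averages over many cases, such as the table entries, need not satisfy that identity.
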

| Setting | Backbone Trainable | Adapter | Decoder Attention | Dice ↑ |
|---|---|---|---|---|
| (1) All components disabled | × | × | × | 0.3405 |
| (2) Frozen backbone | × | ✓ | ✓ | 0.3719 |
| (3) No adapter | ✓ | × | ✓ | 0.5158 |
| (4) No decoder attention | ✓ | ✓ | × | 0.5135 |
| Full model | ✓ | ✓ | ✓ | 0.5161 |
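
Reading the component ablation as differences from the full model makes the individual contributions easier to compare. The sketch below is plain arithmetic on the Dice values in the table above; the dictionary keys are shorthand labels chosen here, not identifiers from the paper.

```python
# Dice values copied from the component ablation table.
ablation_dice = {
    "all components off": 0.3405,
    "frozen backbone": 0.3719,
    "no adapter": 0.5158,
    "no decoder attention": 0.5135,
    "full model": 0.5161,
}

full = ablation_dice["full model"]

# Absolute Dice lost by each ablated setting relative to the full model.
drops = {
    name: round(full - value, 4)
    for name, value in ablation_dice.items()
    if name != "full model"
}

for name, drop in drops.items():
    relative = 100.0 * drop / full
    print(f"{name:>22}: -{drop:.4f} Dice ({relative:.1f}% of the full-model score)")
```

On these numbers, freezing the backbone costs about 0.14 Dice, while removing the adapter or the decoder attention changes Dice by less than 0.003. The table reports no variance, so the smaller gaps should be read with caution.
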

| Setting | Dice loss | BCE loss | Auxiliary loss | Dice ↑ |
|---|---|---|---|---|
| (1) Only Dice | ✓ | × | × | 0.5088 |
| (2) Only BCE | × | ✓ | × | 0.4429 |
| (3) Dice + BCE | ✓ | ✓ | × | 0.5119 |
| (4) Dice + BCE + 0.3aux | ✓ | ✓ | ✓ | 0.5161 |
| (5) Dice + BCE + 0.8aux | ✓ | ✓ | ✓ | 0.5159 |
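
The loss ablation above combines a Dice term, a binary cross-entropy (BCE) term, and a weighted auxiliary term (weight 0.3 or 0.8). A minimal sketch of one way to assemble such a loss is given below. It assumes that the auxiliary term is the same Dice + BCE loss applied to a second, auxiliary prediction; the table does not specify this, so that choice, the function names, and the smoothing constants are all assumptions made for illustration.

```python
import math


def soft_dice_loss(probs, target, eps=1e-6):
    """1 - soft Dice between predicted probabilities and a binary target."""
    intersection = sum(p * t for p, t in zip(probs, target))
    denominator = sum(probs) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def bce_loss(probs, target, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def combined_loss(main_probs, aux_probs, target, aux_weight=0.3):
    """Dice + BCE on the main output plus a weighted auxiliary term.

    ASSUMPTION: the auxiliary term is Dice + BCE on an auxiliary prediction.
    """
    main = soft_dice_loss(main_probs, target) + bce_loss(main_probs, target)
    aux = soft_dice_loss(aux_probs, target) + bce_loss(aux_probs, target)
    return main + aux_weight * aux


# Toy example: four voxels, a confident prediction versus an uninformative one.
target = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
uninformative = [0.5, 0.5, 0.5, 0.5]
print(combined_loss(confident, confident, target))
print(combined_loss(uninformative, uninformative, target))
```

Setting `aux_weight=0.0` reduces the sketch to setting (3), Dice + BCE, and dropping either term in `main` corresponds to settings (1) and (2).
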
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Chen, Q.; Zhang, D.; Chen, Y.; Zhang, S.; Sun, Y.; Reis, F.; Li, L.M.; Yuan, L.; Jin, H.; Qiu, W. StrDiSeg: Adapter-Enhanced DINOv3 for Automated Ischemic Stroke Lesion Segmentation. Bioengineering 2026, 13, 133. https://doi.org/10.3390/bioengineering13020133

