Liver-VLM: Enhancing Focal Liver Lesion Classification with Self-Supervised Vision-Language Pretraining

Song, Jian; Hu, Yuchang; Wang, Hui; Chen, Yen-Wei

doi:10.3390/app152312578


Open Access Article


¹ School of Mathematical Sciences, Huaqiao University, Quanzhou 362021, China

² College of Information Science and Engineering, Ritsumeikan University, Osaka 567-8570, Japan

³ School of Information Science and Engineering, Shandong University, Qingdao 266237, China

* Authors to whom correspondence should be addressed.

Appl. Sci. 2025, 15(23), 12578; https://doi.org/10.3390/app152312578

Submission received: 20 May 2025 / Revised: 25 October 2025 / Accepted: 25 November 2025 / Published: 27 November 2025

(This article belongs to the Special Issue Machine Learning and Data Analysis: Bridging Theory and Real-World Solutions)


Abstract

Accurate classification of focal liver lesions (FLLs) is crucial for reliable clinical decision-making. Inspired by contrastive vision-language models such as CLIP and MedCLIP, we propose Liver-VLM for FLL classification, trained on a dedicated multi-phase 2D CT dataset. Liver-VLM aligns multi-phase CT image representations with class-specific textual descriptions by computing their similarity and optimizing it under a cross-entropy loss. We also design tailored, enriched textual prompts to stabilize optimization and enable robust classification even with limited labeled data, and we incorporate self-supervised pretraining and data augmentation strategies to further improve classification performance. Experimental results on an in-house MPCT-FLLs dataset demonstrate that Liver-VLM consistently outperforms existing VLMs, achieving an accuracy of 85.63 ± 3.18% and an AUC of 0.94 ± 0.01. Our findings highlight the efficacy of self-supervised learning and task-specific augmentation in overcoming data scarcity and distributional biases in medical image analysis.

Keywords: focal liver lesions (FLLs); multi-phase CT imaging; self-supervised learning; vision-language models (VLMs); data augmentation; phase shuffle strategy
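The alignment step described in the abstract (image representations scored against class-specific text descriptions, with the similarities trained under a cross-entropy loss) can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings below are toy vectors standing in for encoder outputs, the function names are hypothetical, cosine similarity is assumed as the similarity measure, and the temperature of 0.07 is borrowed from CLIP's default rather than taken from this article.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def similarity_logits(image_emb, text_embs, temperature=0.07):
    """Cosine similarity between one image embedding and each class text
    embedding, divided by a temperature (assumed value, not from the paper)."""
    img = l2_normalize(image_emb)
    return [
        sum(a * b for a, b in zip(img, l2_normalize(txt))) / temperature
        for txt in text_embs
    ]


def cross_entropy(logits, label):
    """Negative log-softmax probability of the true class (stable form)."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def predict(logits):
    """Index of the class whose text description is most similar."""
    return max(range(len(logits)), key=logits.__getitem__)


# Toy example: three hypothetical lesion classes, one text embedding each.
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.1, 0.0]  # stand-in for a multi-phase CT image embedding

logits = similarity_logits(image_emb, text_embs)
loss_if_class_0 = cross_entropy(logits, label=0)
loss_if_class_1 = cross_entropy(logits, label=1)
predicted_class = predict(logits)
```

In this sketch the loss is low when the image embedding points toward the text embedding of its true class and high otherwise, which is the signal that pulls matching image and text representations together during training.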

