Article

Liver-VLM: Enhancing Focal Liver Lesion Classification with Self-Supervised Vision-Language Pretraining

1 School of Mathematical Sciences, Huaqiao University, Quanzhou 362021, China
2 College of Information Science and Engineering, Ritsumeikan University, Osaka 567-8570, Japan
3 School of Information Science and Engineering, Shandong University, Qingdao 266237, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(23), 12578; https://doi.org/10.3390/app152312578
Submission received: 20 May 2025 / Revised: 25 October 2025 / Accepted: 25 November 2025 / Published: 27 November 2025

Abstract

Accurate classification of focal liver lesions (FLLs) is crucial for reliable clinical decision-making. Inspired by contrastive vision-language models such as CLIP and MedCLIP, we propose Liver-VLM for FLL classification, trained on a dedicated multi-phase 2D CT dataset. Liver-VLM aligns multi-phase CT image representations with class-specific textual descriptions by computing their similarity under a cross-entropy loss. Furthermore, we design tailored, enriched textual prompts that stabilize optimization and enable robust classification even with limited labeled data. Additionally, self-supervised pretraining and data augmentation strategies are incorporated to further improve classification performance. Experimental results on an in-house MPCT-FLLs dataset demonstrate that Liver-VLM consistently outperforms existing VLMs, achieving an accuracy of 85.63 ± 3.18% and an AUC of 0.94 ± 0.01. Our findings highlight the efficacy of self-supervised learning and task-specific augmentation in overcoming data scarcity and distributional biases in medical image analysis.
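The core mechanism the abstract describes — scoring each image against one text embedding per lesion class and training with a cross-entropy loss over the similarities — can be sketched as follows. This is an illustrative CLIP-style formulation only, not the authors' implementation; the embedding dimensions, class count, and the `temperature` parameter are assumptions for the sketch.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so the dot product is cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_style_loss(image_embs, text_embs, labels, temperature=0.07):
    """Cross-entropy over cosine similarities between image embeddings (B, D)
    and one text embedding per class prompt (C, D), as in CLIP-style classification."""
    img = l2_normalize(image_embs)                    # (B, D)
    txt = l2_normalize(text_embs)                     # (C, D)
    logits = img @ txt.T / temperature                # (B, C) scaled similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy example: 4 images, 3 hypothetical lesion classes, 8-dim embeddings.
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(4, 8))
text_embs = rng.normal(size=(3, 8))
labels = np.array([0, 2, 1, 0])
loss = clip_style_loss(image_embs, text_embs, labels)
```

Minimizing this loss pulls each image embedding toward the text embedding of its ground-truth class prompt, which is why enriched, class-specific prompts can stabilize optimization when labeled data is scarce.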
Keywords: focal liver lesions (FLLs); multi-phase CT imaging; self-supervised learning; vision-language models (VLMs); data augmentation; phase shuffle strategy

Share and Cite

MDPI and ACS Style

Song, J.; Hu, Y.; Wang, H.; Chen, Y.-W. Liver-VLM: Enhancing Focal Liver Lesion Classification with Self-Supervised Vision-Language Pretraining. Appl. Sci. 2025, 15, 12578. https://doi.org/10.3390/app152312578

AMA Style

Song J, Hu Y, Wang H, Chen Y-W. Liver-VLM: Enhancing Focal Liver Lesion Classification with Self-Supervised Vision-Language Pretraining. Applied Sciences. 2025; 15(23):12578. https://doi.org/10.3390/app152312578

Chicago/Turabian Style

Song, Jian, Yuchang Hu, Hui Wang, and Yen-Wei Chen. 2025. "Liver-VLM: Enhancing Focal Liver Lesion Classification with Self-Supervised Vision-Language Pretraining" Applied Sciences 15, no. 23: 12578. https://doi.org/10.3390/app152312578

APA Style

Song, J., Hu, Y., Wang, H., & Chen, Y.-W. (2025). Liver-VLM: Enhancing Focal Liver Lesion Classification with Self-Supervised Vision-Language Pretraining. Applied Sciences, 15(23), 12578. https://doi.org/10.3390/app152312578

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
