Article

ECSA: Mitigating Catastrophic Forgetting and Few-Shot Generalization in Medical Visual Question Answering

School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China
* Author to whom correspondence should be addressed.
Tomography 2025, 11(10), 115; https://doi.org/10.3390/tomography11100115
Submission received: 21 August 2025 / Revised: 27 September 2025 / Accepted: 15 October 2025 / Published: 20 October 2025

Abstract

Objective: Medical Visual Question Answering (Med-VQA), a key technology that integrates computer vision and natural language processing to assist in clinical diagnosis, possesses significant potential for enhancing diagnostic efficiency and accuracy. However, its development is constrained by two major bottlenecks: weak few-shot generalization capability stemming from the scarcity of high-quality annotated data and the problem of catastrophic forgetting when continually learning new knowledge. Existing research has largely addressed these two challenges in isolation, lacking a unified framework. Methods: To bridge this gap, this paper proposes a novel Evolvable Clinical-Semantic Alignment (ECSA) framework, designed to synergistically solve these two challenges within a single architecture. ECSA is built upon powerful pre-trained vision (BiomedCLIP) and language (Flan-T5) models, with two innovative modules at its core. First, we design a Clinical-Semantic Disambiguation Module (CSDM), which employs a novel debiased hard negative mining strategy for contrastive learning. This enables the precise discrimination of “hard negatives” that are visually similar but clinically distinct, thereby significantly enhancing the model’s representation ability in few-shot and long-tail scenarios. Second, we introduce a Prompt-based Knowledge Consolidation Module (PKC), which acts as a rehearsal-free non-parametric knowledge store. It consolidates historical knowledge by dynamically accumulating and retrieving task-specific “soft prompts,” thus effectively circumventing catastrophic forgetting without relying on past data. Results: Extensive experimental results on four public benchmark datasets, VQA-RAD, SLAKE, PathVQA, and VQA-Med-2019, demonstrate ECSA’s state-of-the-art or highly competitive performance. Specifically, ECSA achieves excellent overall accuracies of 80.15% on VQA-RAD and 85.10% on SLAKE, while also showing strong generalization with 64.57% on PathVQA and 82.23% on VQA-Med-2019. More critically, in continual learning scenarios, the framework achieves a low forgetting rate of just 13.50%, showcasing its significant advantages in knowledge retention. Conclusions: These findings validate the framework’s substantial potential for building robust and evolvable clinical decision support systems.
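The abstract describes CSDM only at a high level: a contrastive objective with debiased hard-negative mining over BiomedCLIP image-text embeddings. The PyTorch sketch below shows one minimal way such an objective can be written, combining in-batch InfoNCE with hard-negative reweighting and a false-negative (debiasing) correction; the function name, weighting scheme, and hyperparameters (tau, beta, tau_plus) are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch of a debiased, hard-negative-weighted contrastive loss in the
# spirit of CSDM as summarized in the abstract. All names and hyperparameters
# here are assumptions for illustration, not the authors' implementation.
import math

import torch
import torch.nn.functional as F


def debiased_hard_negative_nce(img_emb, txt_emb, tau=0.07, beta=1.0, tau_plus=0.1):
    """Image-to-text InfoNCE with hard-negative reweighting and a debiasing term.

    img_emb, txt_emb: (B, D) paired embeddings, e.g. pooled BiomedCLIP features.
    tau:      softmax temperature.
    beta:     concentration of the hard-negative weighting.
    tau_plus: assumed prior that an in-batch "negative" is actually a positive.
    """
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    sim = img @ txt.t() / tau                         # (B, B) similarity logits
    B = sim.size(0)

    pos = torch.exp(sim.diag())                       # matched image-text pairs
    neg_mask = ~torch.eye(B, dtype=torch.bool, device=sim.device)
    neg = torch.exp(sim).masked_fill(~neg_mask, 0.0)  # off-diagonal candidates

    # Hard-negative weighting: negatives more similar to the anchor get a
    # larger weight; weights average to 1 across the (B - 1) negatives.
    w = (beta * sim).masked_fill(~neg_mask, float("-inf")).softmax(dim=1) * (B - 1)
    weighted_neg = (w * neg).sum(dim=1)

    # Debiasing correction: subtract the expected contribution of false
    # negatives (pairs that are clinically the same but unlabeled), clamped
    # so the estimate stays positive.
    floor = (B - 1) * math.exp(-1.0 / tau)
    ng = torch.clamp((weighted_neg - (B - 1) * tau_plus * pos) / (1.0 - tau_plus), min=floor)

    return -torch.log(pos / (pos + ng)).mean()


# Toy usage with random features standing in for encoder outputs.
loss = debiased_hard_negative_nce(
    torch.randn(8, 512, requires_grad=True),
    torch.randn(8, 512, requires_grad=True),
)
loss.backward()
```

In this formulation, beta controls how sharply the loss focuses on visually similar but mismatched pairs, while tau_plus discounts candidates that are likely to be unlabeled positives, which is the debiasing role the abstract attributes to CSDM.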
Keywords: Medical Visual Question Answering (Med-VQA); multimodal learning; visual feature extraction; deep reasoning

Share and Cite

MDPI and ACS Style

Jia, Q.; Liu, S.; Chen, M.; Li, T.; Yang, J. ECSA: Mitigating Catastrophic Forgetting and Few-Shot Generalization in Medical Visual Question Answering. Tomography 2025, 11, 115. https://doi.org/10.3390/tomography11100115

